
Research Article
Multiagent Cooperative Learning Strategies for Pursuit-Evasion Games

Jong Yih Kuo,1 Hsiang-Fu Yu,2 Kevin Fong-Rey Liu,3 and Fang-Wen Lee1

1 Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, Taiwan
2 Department of Computer Science, National Taipei University of Education, Taipei, Taiwan
3 Department of Safety, Health and Environmental Engineering, Ming Chi University of Technology, New Taipei City, Taiwan

Correspondence should be addressed to Jong Yih Kuo; [email protected]

Received 20 June 2014; Revised 17 September 2014; Accepted 12 October 2014

Academic Editor: Saeed Balochian

Copyright © 2015 Jong Yih Kuo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This study examines the pursuit-evasion problem for coordinating multiple robotic pursuers to locate and track a nonadversarial mobile evader in a dynamic environment. Two kinds of pursuit strategies are proposed, one for agents that cooperate with each other and the other for agents that operate independently. This work further employs probabilistic theory to analyze the uncertain state information about the pursuers and the evaders and uses case-based reasoning to equip agents with memories and learning abilities. According to the concepts of assimilation and accommodation, both positive-angle and bevel-angle strategies are developed to assist agents in adapting to their environment effectively. The case study analysis uses the Recursive Porous Agent Simulation Toolkit (REPAST) to implement a multiagent system and demonstrates superior performance of the proposed approaches to the pursuit-evasion game.

1. Introduction

A multiagent system (MAS) comprises a set of agents that interact with each other. These agents may either share a common goal or have contradictory objectives [1, 2]. This work deals with cooperative agents trying to achieve a common goal. When coordinating multiple agents as a team for the same task, the agents must have the ability to handle unknown and uncertain situations and take the success of the whole team into account.

A multiagent pursuit-evasion game involves guiding one group of agents (pursuers) to cooperate with each other to catch another group of agents (evaders). However, this game varies with the type of environment in which it is played (e.g., plane, grid, and graph), the knowledge of the players (e.g., the evaders' position and strategy), the controllability of their motions (is there a limit on their speed, and can they make turns whenever they want?), and the meaning of a capture (are the evaders to be intercepted, seen, or surrounded?). Being complex and dynamic, pursuit-evasion problems are difficult to solve [3]. To address such problems, the robotics community has proposed several models [4, 5]. In these models, the motion of the evader is usually modeled by a stochastic process. There has been growing interest in modeling the game in which the evader is intelligent and has certain sensing capabilities [6]. Solutions to the problem proposed in other studies [7-9] include competitive coevolution, multiagent strategies, and multiagent communication algorithms.

The hybrid learning approach to RoboCup [10] includes a coach agent and multiple moving agents. Using case-based reasoning and genetic algorithms (CBR-GA), the coach agent decides on a strategy goal and assigns tasks to the moving agents. Every moving agent then executes its respective task to achieve the strategy goal. The proposed method also includes two kinds of agents (pursuer and evader). Unlike the hybrid learning approach to RoboCup, the proposed method does not use the coach agent to guide pursuers in catching evaders. The pursuers must search for the evaders themselves. If other evaders/pursuers are not in the sensing area of a pursuer, the pursuer does not know their positions. The pursuer thus uses case-based reasoning with assimilation and accommodation to catch evaders.


Owing to the uncertain environment and the dynamic information of agents at each moment, previous research has mainly examined how to improve the efficiency of communication and accuracy in the learning process. In a different manner, this study focuses on agents with a mental state and the ability to plan evolution by using strategic modules and combining learning methods. Armed with such capability, the agents can adapt to the dynamic environment.

The rest of the paper is organized as follows. Section 2 reviews the background and related works of pursuit-evasion problems. Section 3 introduces the agent's adaptive learning process. Section 4 describes the case study of the pursuit-evasion game. The conclusions and directions for future work are presented in the last section.

2. Background and Related Work

Pursuit-evasion problems have long been studied from the perspective of differential game theories. Nishimura and Ikegami observed random swarming and other collective predator motions in a prey-predator game model [11]. Jim and Giles [12] proposed a simple effective predator strategy which could enable predators to move to the closest capture position. This strategy does not prove to be very successful because the predators may block each other when they try to move to the same capture position. Haynes and Sen [13] used genetic programming to evolve strategies for both predators and prey. Hladek et al. [14] developed a multiagent control system using fuzzy inference for a group of two-wheeled mobile robots to execute a common task. They defined a pursuit-evasion task using fuzzy sets to establish a framework for inserting and updating expert knowledge in the form of rules by an inference system. Antoniades et al. [3] proposed several pursuit algorithms for solving complex multiplayer pursuit-evasion games. Their work revealed that reducing sensing overlap between pursuers and avoiding overassignment of pursuers to target locations could improve the capture time. Based on the Contract Net Protocol (CNP) [15], Zhou et al. [16] proposed a multiagent cooperative pursuit algorithm and extended the CNP by improving alliance decisions and forming dynamic alliances. They used a support decision matrix that includes credits of agents, degrees of hookup, and degrees of self-confidence to help agents make decisions during the negotiation process.

Rao and Georgeff [17] proposed the belief-desire-intention (BDI) model for agents in a distributed artificial intelligence environment. As its name implies, the BDI agent model involves belief, desire, and intention. In addition, this work applies the concepts of assimilation and accommodation, proposed by Piaget [18], to the pursuit-evasion game. These two concepts equip an agent with a mental state and the ability to plan an evolution process. Using different learning methods, the agent can also adapt to the dynamic environment effectively.

2.1. BDI Agent Model. On the foundation of BDI, Kuo et al. [19, 20] proposed an agent which can be completely specified to fulfill its intentions by the events it perceives, the actions it performs, the beliefs it holds, the goals it adopts, and the plans it has. A goal module describes the goals that an agent may adopt as well as the events to which it can respond. A belief module includes the information about the internal state that an agent of a certain class holds, the strategies it may perform, and the environment it is in. A plan module generates the plans that an agent may employ to achieve its goals. A plan is a sequence of actions or strategies derived through a reasoning mechanism.

In the proposed approach, belief denotes the agent's knowledge of the environment, including the orientations and locations of evaders; desire represents the agent's wish to catch all evaders; and intention stands for the agent's plan of actions.

2.2. Assimilation and Accommodation. According to Piaget's cognitive theory [18], each recognition system aims at equilibration of the inconsistent information about the world. Thus, an organism seeks not only adaptation (harmony of organism and world) but also organization (harmony within itself). Assimilation and accommodation represent forms of the maintenance and modification of these cognitive schemata [21]. Takamuku and Arkin [22] applied assimilation on domestic robots for social learning, enabling the robots to perform well under various situations. This study proposes both bevel-angle and positive-angle accommodation strategies for modifying the agent's cognitive structure.

2.3. Pursuit-Evasion Game. In a pursuit-evasion game, pursuers try to capture evaders by besieging them from all directions in a grid world. The game focuses mainly on the effectiveness of structures with varying degrees of pursuers' cooperation and control to entrap evaders efficiently [13]. The first mathematical formulation of graph searching was proposed by Parsons [23] in 1978. The formulation was inspired by an earlier research by Breisch [24], who put forward an approach to finding an explorer lost in a complicated system of dark caves.

Furthermore, the pursuit-evasion game varies with different approaches to searching for evaders, as described in the following.

2.3.1. Node Search and Mixed Search. Ferrari [25] proposed a new family of path planning algorithms. One algorithm is for searching for the goal, and the other is for searching for the obstacles. These algorithms can be utilized to easily parameterize the potential scale length and strength, thus providing better control over the moving object's path.

To address some unique demands from the game domain, Walsh and Banerjee [26] presented a new algorithm, called the "VRA*" algorithm, for path finding on game maps and also proposed the extension of a postsmoothing technique. Wong et al. presented a Bee Colony Optimization (BCO) algorithm for the symmetrical Traveling Salesman Problem (TSP) [27]. The BCO model is constructed algorithmically according to the collective intelligence observed in the foraging behavior of bees. The algorithm is integrated with a fixed-radius near-neighbor 2-opt (FRNN 2-opt) heuristic to further improve prior solutions generated by the BCO model.

Raboin et al. [28] considered multiagent pursuit scenarios in which there is a team of tracker agents and a moving target agent. The trackers observe the target continuously until the target is within at least one tracker's observation range at the end of the game. Their study described formalism and algorithms for game-tree search in partially observable Euclidean space.

Gerkey et al. [29] introduced a new class of searchers, the φ-searchers, which can be readily instantiated as a physical mobile robot, and proposed the first complete search algorithm for a single φ-searcher. They also showed how this algorithm can be extended to handle multiple searchers and gave various examples of computed trajectories. Their work aimed at coordinating teams of robots to execute tasks in application domains such as clearing a building for reasons of security or safety.

2.3.2. Game Theoretic. Game theoretic approaches to patrolling have increasingly become an interesting topic in recent years. Basilico et al. [30] presented a game theoretic scheme to determine the optimal patrolling strategy for a mobile robot that operates in environments with arbitrary topologies.

Prior research [31] proposed a game theory-based approach which uses a multirobot system to perform multitarget search in a dynamic environment. A dynamic programming equation is employed to estimate the utility function, which considers the a priori probability map, travel costs, and current decisions of other robots. According to this utility function, a utility matrix can be calculated for an N-robot nonzero-sum game. Thus, pure Nash equilibrium and mixed-strategy equilibrium can be utilized to guide the robots in making their decisions.

3. Adaptive Agent Learning

This section introduces the adaptive agent model, a refinement of previously developed agent models [19, 20]. This adaptive agent model contains a hybrid approach for a multiagent learning method. In addition, this model enables agents to learn and accumulate their experience and knowledge from other agents or the environment.

3.1. Adaptive Agent Model. The agent model, shown in Figure 1, is a cooperative learning model responsible for controlling the learning process. A goal module returns the goals that an agent may possibly adopt and the events to which it can respond. A belief module describes the information about the environment, the internal state that a certain class of agents may hold, and the strategies or tactics that the agent may perform. A plan module returns plans that are possibly employed to achieve the goals of the agent. A plan is a sequence of actions or strategies derived through reasoning. There are two types of ontology that provide the domain-specific and issue-specific knowledge, respectively.

Figure 1: Agent model (sensors, belief, goal, plan, strategy, action, and basic-action modules organized around adaptive and cooperative learning).

The mental state of an agent is represented by a first-order language as follows:

\[
\mathit{FOTerm} = \langle \mathit{Const} \rangle \mid \langle \mathit{FOV} \rangle \mid \mathit{functor}\langle \mathit{FuncSym} \times \mathit{seq}\ \mathit{FOTerms} \rangle, \qquad
\mathit{Atom} = \langle \mathit{Predicate} \times \mathit{seq}\ \mathit{FOTerms} \rangle. \tag{1}
\]

A first-order term (FOTerm) is defined by a set of constants (⟨Const⟩), first-order variables (⟨FOV⟩), and functions (functor⟨FuncSym × seq FOTerms⟩). An atom consists of a predicate and a set of FOTerms.

3.1.1. Belief. The beliefs of an agent describe the situations where the agent is. The beliefs are specified by a belief base, which contains the information the agent believes about the world as well as the information that is internal to the agent. The agent is assumed to have beliefs about its task and its environment. A belief is represented by a set of well-formed formulas such as

\[
\mathit{Belief} = \langle \mathit{Atom} \rangle \mid \neg\langle \mathit{Atom} \rangle \mid \langle \mathit{Atom} \rangle\,[\wedge\,\langle \mathit{Atom} \rangle]. \tag{2}
\]

For instance, a belief that there is an obstacle (obstacle_id) at a certain location (X1, Y1) can be represented as (obstacle_id, X1, Y1).

3.1.2. Goal. A goal is an agent's desire and describes the states of affairs that an agent would like to realize. For instance, CatchAllEvader(True) stands for a pursuer's goal to catch all evaders. A goal is represented by a set of well-formed formulas such as

\[
\mathit{Goal} = \mathit{achieved}\,\langle \mathit{Atom} \rangle. \tag{3}
\]

3.1.3. Basic Actions. Basic actions are generally used by an agent to manipulate its environment. Before performing a basic action, certain beliefs should be held, and the beliefs of the agent must be updated after the execution of the action. Basic actions are the only entities that can change the beliefs of an agent. Let $\alpha$ be an action with parameters $x$, and let $\varphi, \psi \in \mathit{belief}$. The programming constructs for basic actions are then expressed as

\[
\{\varphi\}\ \alpha(x)\ \{\psi\}, \tag{4}
\]

where $\varphi$ and $\psi$ are the precondition and the postcondition, respectively. The precondition indicates that an action cannot be performed unless certain beliefs are held. For example, a pursuer is assumed to catch an evader by moving from location (X0, Y0) to location (X1, Y1) with a basic action MoveTo(X1, Y1). Before this action is executed, the pursuer must be at position (X0, Y0), denoted by pursuer(self, X0, Y0). The condition that the pursuer senses the evader at position (X1, Y1) is denoted by evaderAt(X1, Y1). This action is then defined in terms of the following beliefs:

\[
\mathit{pursuer}(\mathit{self}, X0, Y0),\ \mathit{evaderAt}(X1, Y1). \tag{5}
\]

After executing an action, an agent must update its beliefs to make its postcondition true. According to the above example, after performing the catch action, the agent updates its beliefs and obtains

\[
\mathit{pursuer}(\mathit{self}, X1, Y1),\ \mathit{evaderAt}(X2, Y2), \tag{6}
\]

where (X2, Y2) is an evader's new location.
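As an illustration, the precondition/postcondition pattern of (4)-(6) can be sketched as a guarded action over a small belief base. This is a minimal sketch with assumed names (BeliefBase, move_to); it is not the implementation used in the case study.

```python
# Minimal sketch of a basic action guarded by a precondition (5) and updating
# beliefs toward its postcondition (6); names are illustrative, not the paper's.
class BeliefBase:
    def __init__(self, facts):
        self.facts = set(facts)        # e.g. {("pursuer", 0, 0), ("evaderAt", 1, 1)}

    def holds(self, fact):
        return fact in self.facts

    def update(self, remove=(), add=()):
        self.facts.difference_update(remove)
        self.facts.update(add)


def move_to(beliefs, x0, y0, x1, y1):
    """Basic action MoveTo(X1, Y1) of (4): {precondition} action {postcondition}."""
    precondition = [("pursuer", x0, y0), ("evaderAt", x1, y1)]
    if not all(beliefs.holds(f) for f in precondition):
        return False                   # precondition fails: action is not performed
    # ... the robot would physically move here ...
    beliefs.update(remove=precondition,
                   add=[("pursuer", x1, y1)])   # postcondition; a new evaderAt(X2, Y2)
    return True                                 # fact is added once the evader is sensed
```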

3.1.4. Strategies. The strategy of a player refers to one of the options that can be chosen in a setting. The choice depends not only on the agent's own actions but also on other agents' actions. A prior study [28] proposed a pure strategy in terms of a function. Given an agent's current information set, the function returns the agent's next move (i.e., its change in location between time $t_j$ and $t_{j+1}$). For example, if an evader has a pure strategy $\sigma_0$, then $\theta_0(t_{j+1}) = \theta_0(t_j) + \sigma_0(I_0(t_j))$, where $\theta_0(t_j)$ is the location of tracker agent 0 at time $t_j$ and $I_0(t_j)$ is the tracker's information set at time $t_j$. Suppose that a pursuer has a pure strategy set $\sigma = (\sigma_1, \ldots, \sigma_k)$ and the locations of the tracker agents are $\theta(t) = \{\theta_1(t), \ldots, \theta_k(t)\}$. Then $\theta(t_{j+1}) = \theta(t_j) + \sigma(I_0(t_j))$ for each time $t_j$. The strategy is further refined by adding an algorithm for selecting a plan according to different conditions. The algorithm can be represented by a function that generates a plan. The condition includes the beliefs of agents or the current situation of the environment. The proposed strategy is denoted by

\[
\mathit{Strategy} = \langle \mathit{conditions} \rangle\ \mathit{function}. \tag{7}
\]

3.1.5. Plans. A plan is a method for an agent to approach its goal and is generated according to the agent's strategy. A plan consists of beliefs, actions, and rewards, as expressed in the following equation:

\[
\mathit{Plan} = \langle \mathit{belief} \rangle\ \langle \mathit{basic\ action} \rangle\ [\langle \mathit{basic\ action} \rangle]\ \langle \mathit{reward} \rangle. \tag{8}
\]

A plan can be one or a set of actions for an agent, and the value of the plan is determined by the reward.

3.2. Case-Based Reasoning. According to previous studies [32, 33], a general case-based reasoning (CBR) cycle may include retrieving the most similar cases for a problem, reusing the information and knowledge in those cases to solve the problem, revising the proposed solution, and retaining the parts of this experience likely to be useful for future problem solving.

In order to solve a new problem, the cases which contain the most useful knowledge have to be identified first. Since the utility of a case cannot be evaluated directly a priori, the similarity between problem descriptions is used as a heuristic to estimate the expected utility of the cases. Therefore, the quality of this measure is crucial for the success of CBR applications. There are numerous similarity measures in use today. They are used not only in CBR but also in other fields, including data mining, pattern recognition, genetic algorithms, and machine learning. Euclidean distance is a typical similarity measure for objects in a Euclidean space [34]. Based on Euclidean distance, this work devises a simple distance function to evaluate the similarity between two cases.

3.2.1. Case Description. A case stands for an agent's mental state and the result of its past output. Each case consists of a goal, beliefs, an action, a plan, and a reward. The goal is what the agent wants to achieve or realize. The beliefs describe the situation the agent is in. The action is what the agent does under that situation. The plan is the approach the agent takes to achieve the goal. The reward serves to evaluate the result of the plan.
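For illustration, such a case can be kept as a plain record; the field names below are assumptions made for this sketch rather than the paper's data schema.

```python
# Illustrative case record: goal, beliefs, action, plan, and reward (names assumed).
from dataclasses import dataclass, field

@dataclass
class Case:
    goal: str                                  # e.g. "CatchAllEvader(True)"
    beliefs: tuple                             # features describing the situation
    action: str                                # what the agent did in that situation
    plan: list = field(default_factory=list)   # the sequence of moves or actions taken
    reward: float = 0.0                        # result of the plan (0-10)
```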

3.2.2. Similarity Relation. Consider the following:

\[
\mathrm{Sim}_{a,b} = \sum_{i=1}^{n} w_i \times \left| a_i - b_i \right|. \tag{9}
\]

This study proposes an evaluation function (Sim), shown as (9), for measuring the similarity between cases $a$ and $b$. In this equation, $a$ and $b$ represent a new problem and a case in a case base, respectively. The variable $n$ is the number of features in case $a$. The weight variable $w_i$ stands for the importance of the $i$th feature; the weight vector is freely defined by a user. $a_i$ and $b_i$ are the $i$th features of cases $a$ and $b$, respectively. In the case-retrieving stage of the CBR approach, the case with the smallest Sim value is always retrieved for reuse.
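A minimal sketch of (9) and the retrieval rule (the smallest Sim value wins), assuming cases carry numeric feature vectors in the beliefs field of the Case record sketched above.

```python
# Weighted feature distance of (9); the stored case with the smallest value is reused.
def sim(a_features, b_features, weights):
    return sum(w * abs(a - b) for w, a, b in zip(weights, a_features, b_features))

def retrieve(problem_features, case_base, weights):
    """Return the case most similar to the new problem (smallest Sim)."""
    return min(case_base, key=lambda c: sim(problem_features, c.beliefs, weights))
```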

3.3. Probabilistic Framework. Suppose that a finite two-dimensional environment EX with $n_c$ square cells contains an unknown number of fixed obstacles. We also assume that the environment has $n_p$ pursuers and $n_e$ evaders. Let $x_{p_k}$ and $x_{e_i}$ be the cells occupied by pursuer $k$ and evader $i$, where $1 \le k \le n_p$ and $1 \le i \le n_e$. The pursuers and evaders are restricted to move to cells not occupied by obstacles. Each pursuer collects information about EX at discrete time instances $t \in T = \{1, 2, \ldots, t_{\mathrm{end}}\}$.

Define $x_{p_k}(t) \subset \mathrm{EX}$ as the cell of pursuer $k$ at time $t$ and $x_{e_i}(t) \subset \mathrm{EX}$ as the cell of evader $i$ at time $t$; $o(t)$ is the set of cells where obstacles are detected. Let $U(x_{e_i}(t))$ denote the one-step reachable set for the evader when the evader is in cell $x$ at time $t$. Then the probability of the evader being in cell $x$ at time $t+1$ is given by (10):

\[
p(x, t+1 \mid x_{e_i}(t)) =
\begin{cases}
\dfrac{1}{\left| U(x_{e_i}(t)) - o(t) \right|}, & x \in U(x_{e_i}(t)),\\[2mm]
0, & x \in o(t),
\end{cases} \tag{10}
\]

where $p(x, t+1 \mid x_{e_i}(t))$ represents the probability of the evader being in cell $x$ at time $t+1$, according to the location where the pursuer detects the evader at time $t$.
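A sketch of the motion model (10) on the grid of Section 3.3, assuming the one-step reachable set is the 8-connected neighborhood plus the current cell; the helper names are illustrative and are reused by the later sketches.

```python
# Uniform evader motion model of (10): probability mass spread over the evader's
# obstacle-free one-step reachable cells (assumed 8-connected plus "stay").
def reachable(cell, grid_size):
    x, y = cell
    return {(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if 0 <= x + dx < grid_size and 0 <= y + dy < grid_size}

def evader_distribution(evader_cell, obstacles, grid_size):
    """p(x, t+1 | x_ei(t)): uniform over reachable, obstacle-free cells; 0 elsewhere."""
    free = reachable(evader_cell, grid_size) - set(obstacles)
    return {cell: 1.0 / len(free) for cell in free}
```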

3.4. Strategies for Plan Generation. When the distance between the retrieved case and the new case is greater than a threshold, the above approach cannot generate a useful plan for the new case. To overcome this problem, this study proposes two plan-generation strategies: the local-max strategy and the local-cooperative strategy.

3.4.1. Local-Max Strategy. This strategy is used only when there is a single pursuer for evaders. Let $S_k(y)$ be the set of all cells that lie within the sensing area of a pursuer $k$ at cell $y$. The total evasion probability of the evaders at time $t$ associated with the pursuer is then obtained by (11):

\[
P(y, t) = \sum_{z \in S_k(y)} p(z, t+1 \mid x_{e_i}(t)). \tag{11}
\]

By computing the evasion probabilities of all sensing cells, pursuer $k$ moves to the cell $x_{p_k}(t+1)$ that has the highest total evasion probability, expressed as follows:

\[
x_{p_k}(t+1) = \arg\max_{y \in U(x_{p_k}(t))} P(y, t). \tag{12}
\]
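A sketch of the local-max move of (11) and (12), reusing reachable() and evader_distribution() from the previous sketch; the 5 x 5 sensing area corresponds to a radius of two cells.

```python
# Local-max strategy of (11)-(12): move to the one-step cell whose 5x5 sensing area
# covers the largest total evasion probability.
def sensing_area(cell, grid_size, radius=2):
    x, y = cell
    return {(x + dx, y + dy) for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)
            if 0 <= x + dx < grid_size and 0 <= y + dy < grid_size}

def local_max_move(pursuer_cell, evader_prob, grid_size):
    """evader_prob: dict cell -> probability, e.g. from evader_distribution()."""
    def total(y):
        return sum(evader_prob.get(z, 0.0) for z in sensing_area(y, grid_size))
    return max(reachable(pursuer_cell, grid_size), key=total)
```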

3.4.2. Local-Cooperative Strategy. This strategy is for pursuers that cooperate with each other to catch the same evader. Initially, pursuers need to decide whether they can cooperate with each other or not. Let $d(x, y)$ be the Manhattan distance from cell $x$ to cell $y$; that is, if cells $x$ and $y$ lie in a two-dimensional space with coordinates $(x_1, x_2)$ and $(y_1, y_2)$, the distance is calculated using the following equation:

\[
d(x, y) = \max\left( \left| x_1 - y_1 \right|, \left| x_2 - y_2 \right| \right). \tag{13}
\]

Let $s_k$ be the farthest sensing cell of a pursuer $k$. For example, if the sensing area of a pursuer $k$ is a $5 \times 5$ square and the pursuer stands at the center of the area, the farthest distance that the pursuer can detect is two squares. If there exist two or more pursuers and one evader such that $d(x_{p_k}(t), x_{e_i}(t)) \le s_k$ holds for each of the pursuers, the pursuers can cooperate with each other.

Suppose that two or more pursuers execute the local-cooperative strategy. Let $\mathrm{Overlap}(x, y)$ be the number of overlap cells between the one-step reachable sets (the $3 \times 3$ regions) around cells $x$ and $y$. The overlap function is illustrated in Figure 2: the yellow grid represents the one-step reachable set for a pursuer P1, the dashed grid depicts the one-step reachable set for a pursuer P2, and the green grid represents the position of an evader E. The number of overlap cells thus equals four (i.e., the return value of $\mathrm{Overlap}(x, y)$).

Figure 2: Number of overlap cells.

Once the pursuers can cooperate with each other, they randomly select one of them as a leader. The other pursuers can determine their locations at time $t+1$ by (14):

\[
x_{p_k}(t+1) = \arg\min_{y \in U(x_{p_k}(t))} \mathrm{Overlap}\left( x_{p_l}(t+1), y \right), \tag{14}
\]

where $x_{p_l}(t+1)$ is the location of the leader. To avoid pursuers moving to the same location, each pursuer finds the minimum overlap area.
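A sketch of the cooperation test of (13) and the follower move of (14); the assumption that the leader is picked at random and moves by local-max continues the earlier sketches, and the helper names are illustrative.

```python
# Local-cooperative strategy of (13)-(14): followers minimize overlap with the leader.
import random

def d(a, b):
    """Distance of (13): maximum coordinate difference."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def can_cooperate(pursuer_cells, evader_cell, s_k=2):
    return len(pursuer_cells) >= 2 and all(
        d(p, evader_cell) <= s_k for p in pursuer_cells)

def overlap(x, y, grid_size):
    """Overlap(x, y): shared cells of the two 3x3 one-step reachable sets (Figure 2)."""
    return len(reachable(x, grid_size) & reachable(y, grid_size))

def cooperative_moves(pursuer_cells, evader_prob, grid_size):
    leader = random.randrange(len(pursuer_cells))
    leader_next = local_max_move(pursuer_cells[leader], evader_prob, grid_size)
    moves = []
    for i, p in enumerate(pursuer_cells):
        if i == leader:
            moves.append(leader_next)
        else:                                   # follower rule of (14)
            moves.append(min(reachable(p, grid_size),
                             key=lambda y: overlap(leader_next, y, grid_size)))
    return moves
```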

3.5. Reward Calculation. Each case has its own reward for evaluating its result. The higher the reward of a case is, the more likely the goal will be achieved. In this study, the reward is calculated using the following equation:

\[
R = \left| S_k(y) \cap U(x_{e_i}(t)) \right|. \tag{15}
\]

Reward $R$ represents the number of cells of the evader's one-step reachable set that lie within the pursuer's sensing area. The reward value is between 0 and 10. If an evader is caught, then the reward of the case will be 10.
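Using the helpers sketched above, the reward of (15) can be computed directly; the explicit capture check is an assumption derived from the statement that a caught evader yields a reward of 10.

```python
# Reward of (15): evader one-step cells covered by the pursuer's 5x5 sensing area;
# a capture (shared cell) scores 10, as stated in the text.
def reward(pursuer_cell, evader_cell, grid_size):
    if pursuer_cell == evader_cell:
        return 10
    return len(sensing_area(pursuer_cell, grid_size) & reachable(evader_cell, grid_size))
```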

3.6. Assimilation and Accommodation. Piaget's cognitive theory [18] reveals that assimilation refers to the tendency to interpret experience as much as possible through the existing cognitive structure of knowing. When an agent faces a new situation, it generates a new plan with the current cognitive structure, also called its strategy, to direct itself what to do next. The entire process is called assimilation. In other words, an agent assimilates a new situation with its current cognitive structure. However, if the current cognitive structure cannot explain the environment (i.e., the current cognitive structure cannot reach the equilibration of inconsistent information about the world), an agent has to use the strategy of accommodation. Accommodation refers to the realization that the current structure is insufficient for adequate understanding of the world and that the current structure must be changed until it can assimilate the new situation.

This study proposes two accommodation methods, bevel-angle and positive-angle strategies, for modifying a cognitive structure. After performing accommodation, an agent employs the modified strategy to assimilate the new situation. By constantly adjusting its cognitive structure, the agent can adapt more effectively to the environment.

3.6.1. Bevel-Angle Strategy. Suppose that an evader's location is in the one-step reachable set for a pursuer and the evader is at a bevel angle of the pursuer. The pursuer randomly chooses one of the following strategies when the evader is at 45° ($\pi/4$), 135° ($3\pi/4$), 225° ($5\pi/4$), or 315° ($7\pi/4$) of the pursuer. Let $\deg(x_{p_k}(t), x_{e_i}(t))$ be the angle between the pursuer $k$ and evader $i$, where $x_{e_i}(t)$ is the position of the evader and $x_{p_k}(t)$ is the position of the pursuer at time $t$.

Strategy 1. According to (16), the pursuer moves to the evader's position at time $t+1$:

\[
x_{p_k}(t+1) = x_{e_i}(t), \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ \tfrac{\pi}{4}, \tfrac{3\pi}{4}, \tfrac{5\pi}{4}, \tfrac{7\pi}{4} \right\}. \tag{16}
\]

Strategy 2. According to (17), the pursuer moves to 45° of the evader's position at time $t+1$. Let $x_j$ be an element of the one-step reachable set for the pursuer $k$ when it is in cell $x$ at time $t$:

\[
x_{p_k}(t+1) = x_j, \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ \tfrac{\pi}{4}, \tfrac{3\pi}{4}, \tfrac{5\pi}{4}, \tfrac{7\pi}{4} \right\} \wedge x_j \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) + 45^\circ. \tag{17}
\]

Strategy 3. According to (18), the pursuer moves to −45° of the evader's position at time $t+1$:

\[
x_{p_k}(t+1) = x_j, \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ \tfrac{\pi}{4}, \tfrac{3\pi}{4}, \tfrac{5\pi}{4}, \tfrac{7\pi}{4} \right\} \wedge x_j \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) - 45^\circ. \tag{18}
\]

Strategy 4. According to (19), the pursuer stays in the same place at time $t+1$:

\[
x_{p_k}(t+1) = x_{p_k}(t), \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ \tfrac{\pi}{4}, \tfrac{3\pi}{4}, \tfrac{5\pi}{4}, \tfrac{7\pi}{4} \right\}. \tag{19}
\]

(19)

3.6.2. Positive-Angle Strategy. Suppose that an evader's location is in the one-step reachable set for a pursuer and the evader is at a positive angle of the pursuer, where the angle includes 0°, 90° ($\pi/2$), 180° ($\pi$), and 270° ($3\pi/2$).

Strategy 1. According to (20), the pursuer moves to the evader's position at time $t+1$:

\[
x_{p_k}(t+1) = x_{e_i}(t), \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ 0, \tfrac{\pi}{2}, \pi, \tfrac{3\pi}{2} \right\}. \tag{20}
\]

Strategy 2. According to (21), the pursuer moves to 45° of the evader's position at time $t+1$:

\[
x_{p_k}(t+1) = x_j, \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ 0, \tfrac{\pi}{2}, \pi, \tfrac{3\pi}{2} \right\} \wedge x_j \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) + 45^\circ. \tag{21}
\]

Strategy 3. According to (22), the pursuer moves to −45° of the evader's position at time $t+1$:

\[
x_{p_k}(t+1) = x_j, \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ 0, \tfrac{\pi}{2}, \pi, \tfrac{3\pi}{2} \right\} \wedge x_j \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) - 45^\circ. \tag{22}
\]

Strategy 4. According to (23), the pursuer moves to 90° of the evader's position at time $t+1$:

\[
x_{p_k}(t+1) = x_j, \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ 0, \tfrac{\pi}{2}, \pi, \tfrac{3\pi}{2} \right\} \wedge x_j \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) + 90^\circ. \tag{23}
\]

Strategy 5. According to (24), the pursuer moves to −90° of the evader's position at time $t+1$:

\[
x_{p_k}(t+1) = x_j, \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ 0, \tfrac{\pi}{2}, \pi, \tfrac{3\pi}{2} \right\} \wedge x_j \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) - 90^\circ. \tag{24}
\]

Strategy 6. According to (25), the pursuer stays in the same place at time $t+1$:

\[
x_{p_k}(t+1) = x_{p_k}(t), \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ 0, \tfrac{\pi}{2}, \pi, \tfrac{3\pi}{2} \right\}. \tag{25}
\]

Combining both the bevel-angle and positive-angle strategies, the pursuer obtains 24 different plans to adapt to the environment effectively.
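For illustration, the choice between the two accommodation strategy sets can be driven by the discretized angle between pursuer and evader: bevel angles (odd multiples of 45°) admit the four options of Section 3.6.1, and positive angles (multiples of 90°) admit the six options of Section 3.6.2. The helper below is a sketch under the 8-connected grid assumption; the offset encoding and function names are not from the paper.

```python
# Accommodation sketch: admissible heading offsets (degrees relative to the evader's
# bearing) for bevel-angle versus positive-angle situations; None means "stay".
import math
import random

def angle_deg(pursuer_cell, evader_cell):
    dx = evader_cell[0] - pursuer_cell[0]
    dy = evader_cell[1] - pursuer_cell[1]
    return math.degrees(math.atan2(dy, dx)) % 360

def accommodation_offsets(pursuer_cell, evader_cell):
    a = angle_deg(pursuer_cell, evader_cell)
    if a % 90 == 45:                    # bevel angle: 45, 135, 225, or 315 degrees
        return [0, +45, -45, None]      # Strategies 1-4 of Section 3.6.1
    if a % 90 == 0:                     # positive angle: 0, 90, 180, or 270 degrees
        return [0, +45, -45, +90, -90, None]   # Strategies 1-6 of Section 3.6.2
    return [0]                          # otherwise simply head toward the evader

# The pursuer picks one admissible offset at random, as in the bevel-angle case.
offset = random.choice(accommodation_offsets((3, 3), (4, 4)))
```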

3.7. Plan Evolution Process. Plan evolution plays an important role in an agent's life cycle. When a pursuer finds an evader in its sensing area, it generates a pursuit plan. Initially, the pursuer searches its case base for similar cases. If similar cases exist, the pursuer chooses one of them as the plan. Otherwise, the pursuer uses one of the abovementioned strategies to generate a plan. When the pursuer faces a new state, it adjusts the plan accordingly. Figure 3 shows the agent's plan evolution process; a sketch of the resulting decision loop is given after the step list below.

(1) Problem Analysis. Before determining the location of the next step, a pursuer analyzes the environmental state and updates its goal with information from the environment. The pursuer chooses a plan according to its mental state.

(2) Case Retrieval. Comparing the similarity between the result of the analysis in Step 1 and the cases in the case base, the pursuer retrieves similar cases by the Sim function and sorts them according to their similarity levels.

(3) Similarity Calculation. Similarity is classified into low and high levels. It is possible that the pursuer retrieves a case with high-level similarity but obtains unsatisfactory results; thus, a reward R is added to prevent such a situation. If the case has high-level similarity and the reward R exceeds a predefined constant ε, the pursuer executes Step 4. If the case has low-level similarity or the reward R is smaller than the predefined constant ε, the pursuer executes Step 5.

(4) Case Reuse. If the retrieved case has high-level similarity and its result is acceptable, the pursuer reuses the case with revisions and then goes to Step 10.

(5) Strategy Assimilation. If no similar case exists in the case base or the result of the similar case is unsatisfactory, the pursuer uses strategy assimilation; that is, the pursuer generates a suitable plan with the current strategy (Steps 6-8) according to the situation.

(6) Cooperation. Before using the strategy module, the pursuer decides whether to cooperate with the others or not. If the decision is to cooperate with the others in the situation, then the local-cooperative strategy (i.e., Step 8) is executed. Otherwise, the pursuer executes the local-max strategy (i.e., Step 7).

(7) Application of the Local-Max Strategy. If the pursuer decides not to cooperate with the others, it executes the local-max strategy and moves to the cell in the one-step reachable set with the highest probability of containing an evader. Then go to Step 9.

(8) Application of the Local-Cooperative Strategy. If the cooperative mode is taken, the pursuer executes the local-cooperative strategy and moves to the cell which has the minimum overlap sensing area.

(9) Reward Calculation. The pursuer evaluates the plan and calculates a reward accordingly.

(10) Plan Transformation. The pursuer represents the plan by beliefs, actions, and rewards.

(11) Strategy Assessment. The pursuer evaluates whether the current strategy is suitable or not. The pursuer counts the number C of consecutive failures in catching an evader by using the strategy. If C is smaller than a constant σ, the strategy is effective; then go to Step 13. Otherwise, the strategy is accommodated in Step 12.

(12) Strategy Accommodation. The pursuer modifies the strategy to generate a new plan for the current environment.

(13) Case Revision. The actions and reward of the case are revised after the problem is solved.

(14) Case Retention. Parts of the experience that are likely to be useful for future problem solving are retained.

Figure 3: Plan evolution process.
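The 14 steps above amount to one decision loop per time step. The following is a minimal sketch of that loop, reusing the helper functions and the Case record from the earlier code examples; the feature encoding and the thresholds HIGH_SIM, EPS, and SIGMA are assumptions for illustration, and accommodation is only flagged rather than performed.

```python
# Outline of one plan-evolution step (Steps 1-14); feature encoding and thresholds
# HIGH_SIM (similarity), EPS (reward), and SIGMA (failures) are assumed values.
HIGH_SIM, EPS, SIGMA = 2.0, 5, 3

def plan_evolution_step(pursuer_cell, other_pursuers, evader_cell, obstacles,
                        case_base, weights, failures, grid_size):
    features = (*pursuer_cell, *evader_cell)                               # Step 1
    case = retrieve(features, case_base, weights) if case_base else None   # Step 2
    if case and sim(features, case.beliefs, weights) < HIGH_SIM \
            and case.reward > EPS:                                         # Step 3
        next_cell = case.plan[0]                                           # Step 4: reuse
    else:                                                                  # Step 5: assimilate
        prob = evader_distribution(evader_cell, obstacles, grid_size)
        pursuers = [pursuer_cell] + list(other_pursuers)
        if can_cooperate(pursuers, evader_cell):                           # Steps 6 and 8
            next_cell = cooperative_moves(pursuers, prob, grid_size)[0]
        else:                                                              # Step 7
            next_cell = local_max_move(pursuer_cell, prob, grid_size)
    r = reward(next_cell, evader_cell, grid_size)                          # Steps 9-10
    failures = 0 if r > EPS else failures + 1
    needs_accommodation = failures >= SIGMA                                # Steps 11-12
    case_base.append(Case("CatchAllEvader(True)", features, "MoveTo",
                          [next_cell], r))                                 # Steps 13-14
    return next_cell, failures, needs_accommodation
```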

4. Case Study

This study uses a pursuit-evasion game to demonstrate the proposed approach and implements a multiagent system on the Recursive Porous Agent Simulation Toolkit (REPAST) [35]. Figure 4 shows the GUI of the system, where the blue, red, and black circles represent pursuers, evaders, and obstacles, respectively. The yellow region depicts the sensing area of the pursuers.

4.1. Problem Formulation. The system contains multiple pursuing robots which search for multiple evading robots in a two-dimensional environment. The physical world is abstracted as a grid world, and the game is played according to the following settings and rules.

(i) The number of pursuers is n. The number of evaders is unknown to the pursuers.

(ii) Each pursuer has a sensing area comprising 5 × 5 cells, and the pursuer is at the center of the area, as shown in Figure 4.

(iii) During each moving step, the agents are allowed to either move one cell away from their current cells or stay in their current cells.

(iv) Each pursuer has no information about the environment other than that of its own 5-by-5 sensing area.

(v) All pursuers use the same case base.

(vi) An evader is captured when it occupies the same cell as a pursuer, and it will then be removed from the environment.

(vii) The game ends when all evaders have been captured.

Figure 4: Implementation of the multiagent system.

4.2. Evasion Policy. When an evader is caught, the system first checks whether the sum of the caught and uncaught evaders is smaller than the total number of the predefined evaders. If so, the system generates a new evader and chooses its type of movement. Otherwise, the system does nothing. Clearly, when the sum of the caught and uncaught evaders equals the number of the predefined evaders, the system stops generating new evaders and the number of uncaught evaders does not increase; it even decreases with each successful capture. The game thus eventually ends. In addition, when the system selects a movement type for a new evader, the choice varies with different experiments, which in turn affects the outcome of the experiment. This study investigates four types of movement, namely, random, clockwise, counterclockwise, and smart movements.

Random Movement. An evader randomly selects a location that can be reached in one step or just stays in the same location. The details of the algorithm are shown in Algorithm 1.

Clockwise Movement. An evader moves clockwise in a square path. The evader initially selects the direction of its first step and sets up a distance randomly. If the evader runs into an obstacle on the way, it changes its direction automatically. Algorithm 2 shows the algorithm.

Counterclockwise Movement. This movement type is similar to the clockwise movement except that an evader moves counterclockwise. The algorithm is shown in Algorithm 3.

Smart Movement. Evaders actively try to avoid pursuers. Each evader has a square sensing area; each side of the square is three cells long. If the evader finds any pursuer in the area, it attempts to move to the location that has fewer pursuers in the one-step reachable set of the area. Algorithm 4 presents the algorithm.

4.3. Results and Discussion. This study designs several experiments to compare the performance of the proposed approach, Hysteretic Q-Learning [36], and Reinforcement-Learning fusing (RL-Fusing) [37]. All these approaches are learning techniques for multiagent systems. Each experiment has three pursuers and one evader. In addition, the experiments run in "time step" units. One round is considered finished when the evader has been caught. The performance results are the average values obtained by running 30 rounds with the current learning policy after 100 training episodes for each algorithm.

4.3.1. Comparison of Approaches. Matignon et al. presented a decentralized reinforcement learning algorithm for independent learning robots called Hysteretic Q-Learning [36]. This algorithm computes a better policy in a cooperative multiagent system without additional information or communication between agents. Hysteretic Q-Learning performs the following update for an agent $i$ executing the action $a_i$ from state $s$ and ending up in state $s'$:

\[
\delta \leftarrow r + \gamma \max_{a'} Q_i(s', a') - Q_i(s, a_i),
\]
\[
Q_i(s, a_i) \leftarrow
\begin{cases}
Q_i(s, a_i) + \mathrm{inc}\,\delta, & \text{if } \delta > 0,\\
Q_i(s, a_i) + \mathrm{dec}\,\delta, & \text{otherwise},
\end{cases} \tag{26}
\]

where $Q_i(s, a_i)$ is the action value function of state $s$ and action $a_i$, $r$ is the reward it receives, $\gamma \in [0, 1]$ is the discount factor, and inc and dec are the rates of increase and decrease in $Q$-values, respectively. Partalas et al. proposed the RL-Fusing approach that uses coordinated actions and a fusing process to guide the agents [37]. The fusion process combines the decisions of all agents and then outputs a global decision, which the agents must follow as a team. Algorithm 5 shows the RL-Fusing algorithm. The set of strategies means the plans.
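For reference, the asymmetric update of (26) can be written directly. The tabular Q representation and the parameter values below are assumptions for the sketch, not the settings used in [36].

```python
# Hysteretic Q-Learning update of (26): increases are learned faster than decreases.
from collections import defaultdict

Q = defaultdict(float)            # Q[(state, action)], tabular form (assumed)
GAMMA, INC, DEC = 0.9, 0.9, 0.1   # discount and asymmetric learning rates (assumed)

def hysteretic_update(state, action, r, next_state, actions):
    delta = r + GAMMA * max(Q[(next_state, a)] for a in actions) - Q[(state, action)]
    Q[(state, action)] += (INC if delta > 0 else DEC) * delta
```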

4.3.2. Obstacles. The first experiment compares the performance of the proposed approach, Hysteretic Q-Learning, and RL-Fusing in environments with/without obstacles. The evader is assumed to move randomly in every round. In the environment with obstacles, 10 obstacles are randomly set up initially, and no evader is trapped in the obstacles.

Table 1 shows the performance results. As can be seen, all three methods require more capture steps in the environment with obstacles because pursuers need extra time to detect and avoid obstacles. Hysteretic Q-Learning takes the greatest number of capture steps in both environments because it does not support pursuer cooperation. Furthermore, RL-Fusing also takes more capture steps than the proposed scheme. Although RL-Fusing involves cooperation between pursuers, the generated strategies sometimes cause a pursuer to miss evaders in its sensing area. The pursuer thus needs to search again and requires more capture steps. In contrast, the proposed approach utilizes not only the cooperative strategy but also assimilation and accommodation, which enable a pursuer to learn from either success or failure experiences. Thus, the pursuer can keep in step with an evader and catch it in a smaller number of capture steps.

4.3.3. Movement Types. The second experiment measures the average number of capture steps required by the proposed approach, Hysteretic Q-Learning, and RL-Fusing under two types of evader movement. One is clockwise movement, and the other is random clockwise and counterclockwise movement.

The average number of capture steps required by the three approaches for different movement types is shown in Table 2. Similar to the results obtained in the first experiment, the proposed scheme outperforms Hysteretic Q-Learning and RL-Fusing. Among the three approaches, Hysteretic Q-Learning requires the highest number of capture steps, for the same reason that it does not support agent cooperation.


(1) while true do
(2)   next location ← generate randomly a location nearby the evader
(3)   if next location has an obstacle then
(4)     next location ← generate randomly a new location nearby the evader
(5)   end if
(6) end while

Algorithm 1: Random movement algorithm.

(1) direction ← generate randomly a direction
(2) distance ← generate randomly a distance shorter than the environment
(3) move ← 0
(4) while true do
(5)   get the next location according to the direction
(6)   move ← move + 1
(7)   if move equals distance or the next location has an obstacle then
(8)     the evader turns right
(9)     change the evader's direction
(10)    move ← 0
(11)  end if
(12) end while

Algorithm 2: Clockwise movement algorithm.

(1) direction ← generate randomly a direction
(2) distance ← generate randomly a distance shorter than the environment
(3) move ← 0
(4) while true do
(5)   get the next location according to the direction
(6)   move ← move + 1
(7)   if move equals distance or the next location has an obstacle then
(8)     the evader turns left
(9)     change the evader's direction
(10)    move ← 0
(11)  end if
(12) end while

Algorithm 3: Counterclockwise movement algorithm.

(1) while true do
(2)   if the evader's sensing area has a pursuer then
(3)     next location ← generate a location with the fewest pursuers detected
(4)   else
(5)     next location ← randomly generate a location nearby the evader
(6)   end if
(7) end while

Algorithm 4: Smart movement algorithm.

According to the RL-Fusing algorithm presented in Algorithm 5, when a state is not a coordinated state, the agent acts with a random or predefined policy, and a single pursuer thus has no learning ability. Therefore, the possibility of catching an evader is low, and RL-Fusing also takes more capture steps. In contrast, when the pursuer of the proposed scheme faces an evader, the pursuer observes the evader continuously. If the evader changes its moving type, the pursuer uses accommodation to change its catching strategy.


Require: a set of strategies, an initial state s0
(1) s ← s0
(2) while true do
(3)   if s is a coordinated state then
(4)     ρ ← RandomReal(0, 1)
(5)     if ρ < ε then
(6)       select a strategy randomly, the same for all agents
(7)     else
(8)       rank strategies
(9)       communicate ranks
(10)      receive ranks from other agents
(11)      average ranks
(12)      select the corresponding strategy
(13)    end if
(14)    execute the selected strategy
(15)    receive reward
(16)    transit to a new state s′
(17)    update Q-value
(18)  else
(19)    act with a predefined or random policy
(20)  end if
(21) end while

Algorithm 5: RL-Fusing algorithm.

Table 1: Average number of capture steps in different environments.

Method                 | Grid size | Without obstacles (steps) | With obstacles (steps)
Hysteretic Q-Learning  | 12 × 12   | 68                        | 73
RL-Fusing              | 12 × 12   | 61                        | 65
The proposed approach  | 12 × 12   | 24                        | 27

Table 2: Average number of capture steps for different movement types.

Method                 | Grid size | Clockwise movement (steps) | Random clockwise and counterclockwise movement (steps)
Hysteretic Q-Learning  | 12 × 12   | 56                         | 69
RL-Fusing              | 12 × 12   | 48                         | 56
The proposed approach  | 12 × 12   | 26                         | 32

If the moving type remains fixed, the pursuer keeps the same strategy by assimilation. As seen in Table 2, all three methods require more steps to capture evaders with random clockwise and counterclockwise movement because pursuers need extra time to detect the moving directions of the evaders.

Table 3: Average number of capture steps for evaders with smart movement in different grid sizes.

Method                 | Grid size 12 × 12 (steps) | Grid size 24 × 24 (steps)
Hysteretic Q-Learning  | 74                        | 212
RL-Fusing              | 63                        | 300
The proposed approach  | 63                        | 119

Table 4: Average number of capture steps with/without assimilation and accommodation.

Method                                  | Clockwise evader movement (steps) | Smart evader movement (steps)
With assimilation and accommodation     | 186                               | 278
Without assimilation and accommodation  | 242                               | 364

4.3.4. Smart Movement and Size of Grid. The third experiment involves evaders with smart movement. The evaders actively try to avoid pursuers when they sense any pursuer. The environment is of two grid sizes: 12 × 12 and 24 × 24.

Table 3 shows that the proposed scheme requires fewer capture steps in both grid sizes when compared with Hysteretic Q-Learning and RL-Fusing. When the grid size is smaller, RL-Fusing outperforms Hysteretic Q-Learning. However, as the grid size becomes larger, RL-Fusing requires more capture steps than Hysteretic Q-Learning.

Figure 5: Average number of capture steps under different training episodes and grid sizes (12 × 12 and 24 × 24).

This is because RL-Fusing depends mainly on its cooperative strategies to catch evaders, but not all strategies can guide pursuers to move close to the evaders. For example, strategy 3 in RL-Fusing [37] has all the pursuers stay at a distance of 3 from an evader. Because the sensing area of a pursuer is a 5 × 5 square, the pursuers easily lose the evader in a larger environment. In contrast, the proposed approach assists the pursuers in moving as close to the evader as possible by avoiding sensing overlap. Thus, the approach has the highest possibility of catching the evader.

4.3.5. Smart Movement, Training Episodes, and Size of Grid. The fourth experiment investigates the relationship between training episodes and grid sizes when using the proposed approach.

Figure 5 shows that, regardless of grid size (12 × 12 or 24 × 24), the greater the number of training episodes, the smaller the average number of capture steps required. This is because more training enriches the case base, thus enabling pursuers to make better strategies and decisions.

4.3.6. Assimilation and Accommodation. The last experiment evaluates the proposed approach with/without assimilation and accommodation. There are two types of evader movement: clockwise and smart. The results are listed in Table 4. Whether the approach is with or without assimilation and accommodation, the average number of capture steps for smart movement is greater than that for clockwise movement. This is because smart movement enables an evader to dynamically change its movement and position. In contrast, the movement of an evader with clockwise movement can be easily predicted; hence, the pursuers can take fewer steps for capture. Furthermore, the average number of capture steps required by the approach with assimilation and accommodation is smaller than that required by the approach without assimilation and accommodation for both movement types. The approach without assimilation and accommodation generates a fixed plan, which cannot be changed. In contrast, the approach with assimilation and accommodation enables the first pursuer to use the local-max strategy to find a better position, and the remaining pursuers can determine the cooperative positions according to the first pursuer's position. These pursuers can adjust their positions according to different catching situations and strategies. Accordingly, the approach with assimilation and accommodation requires fewer capture steps when compared with the approach without assimilation and accommodation.

5. Conclusion

This work proposes an effective strategy learning approach to the pursuit-evasion game in a dynamic environment. This method can facilitate a pursuer in learning and implementing effective plans from either success or failure experiences when no knowledge of the environment is available. The evolved plans fare extremely well when compared with some of the best constructed strategies. This study also demonstrates the superior performance of the proposed approach for the pursuit-evasion game by implementing a multiagent system on REPAST.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported by the Ministry of Science and Technology under Grant no. 102-2218-E-027-007.

References

[1] K. P. Sycara, "Multiagent systems," AI Magazine, vol. 19, no. 2, pp. 79–92, 1998.

[2] N. Vlassis, A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence, Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool, 2007.

[3] A. Antoniades, H. J. Kim, and S. Sastry, "Pursuit-evasion strategies for teams of multiple agents with incomplete information," in Proceedings of the 42nd IEEE Conference on Decision and Control, pp. 756–761, December 2003.

[4] L. J. Guibas, J.-C. Latombe, S. M. LaValle, D. Lin, and R. Motwani, "A visibility-based pursuit-evasion problem," International Journal of Computational Geometry & Applications, vol. 9, no. 4-5, pp. 471–493, 1999.

[5] J. P. Hespanha, G. J. Pappas, and M. Prandini, "Greedy control for hybrid pursuit-evasion games," in Proceedings of the European Control Conference, pp. 2621–2626, 2001.

[6] R. Vidal, O. Shakernia, H. J. Kim, D. H. Shim, and S. Sastry, "Probabilistic pursuit-evasion games: theory, implementation, and experimental evaluation," IEEE Transactions on Robotics and Automation, vol. 18, no. 5, pp. 662–669, 2002.

[7] R. E. Korf, "A simple solution to pursuit games," in Proceedings of the 2nd International Workshop on Distributed Artificial Intelligence, pp. 183–194, 1992.

[8] S. Nolfi and D. Floreano, "Coevolving predator and prey robots: do 'arms races' arise in artificial evolution?" Artificial Life, vol. 4, no. 4, pp. 311–335, 1998.

[9] N. J. Savill and P. Hogeweg, "Evolutionary stagnation due to pattern-pattern interactions in a coevolutionary predator-prey model," Artificial Life, vol. 3, no. 2, pp. 81–100, 1997.

[10] J. Y. Kuo, F.-C. Huang, S.-P. Ma, and Y.-Y. Fanjiang, "Applying hybrid learning approach to RoboCup's strategy," Journal of Systems and Software, vol. 86, no. 7, pp. 1933–1944, 2013.

[11] S. I. Nishimura and T. Ikegami, "Emergence of collective strategies in a prey-predator game model," Artificial Life, vol. 3, no. 4, pp. 243–260, 1998.

[12] K.-C. Jim and C. L. Giles, "Talking helps: evolving communicating agents for the predator-prey pursuit problem," Artificial Life, vol. 6, no. 3, pp. 237–254, 2000.

[13] T. Haynes and S. Sen, "Evolving behavioral strategies in predators and prey," in Adaption and Learning in Multi-Agent Systems, vol. 1042 of Lecture Notes in Computer Science, pp. 113–126, Springer, Berlin, Germany, 1995.

[14] D. Hladek, J. Vascak, and P. Sincak, "Hierarchical fuzzy inference system for robotic pursuit evasion task," in Proceedings of the 6th International Symposium on Applied Machine Intelligence and Informatics (SAMI '08), pp. 273–277, Herl'any, Slovakia, January 2008.

[15] R. G. Smith, "The contract net protocol: high-level communication and control in a distributed problem solver," IEEE Transactions on Computers, vol. 19, no. 12, pp. 1104–1113, 1980.

[16] P.-C. Zhou, B.-R. Hong, Y.-H. Wang, and T. Zhou, "Multi-agent cooperative pursuit based on extended contract net protocol," in Proceedings of the International Conference on Machine Learning and Cybernetics, vol. 1, pp. 169–173, August 2004.

[17] A. Rao and M. Georgeff, "BDI agents: from theory to practice," in Proceedings of the International Conference on Multi-Agent Systems, pp. 312–319, 1995.

[18] J. Piaget, The Equilibration of Cognitive Structures: The Central Problem of Intellectual Development, University of Chicago Press, 1985.

[19] J. Y. Kuo, S. J. Lee, C. L. Wu, N. L. Hsueh, and J. Lee, "Evolutionary agents for intelligent transport systems," International Journal of Fuzzy Systems, vol. 7, no. 2, pp. 85–93, 2005.

[20] J. Y. Kuo and F. C. Huang, "A hybrid approach for multi-agent learning systems," Intelligent Automation and Soft Computing, vol. 17, no. 3, pp. 385–399, 2011.

[21] S. Gebhardt, P. Grant, R. von Georgi, and M. T. Huber, "Aspects of Piaget's cognitive developmental psychology and neurobiology of psychotic disorders: an integrative model," Medical Hypotheses, vol. 71, no. 3, pp. 426–433, 2008.

[22] S. Takamuku and R. C. Arkin, "Multi-method learning and assimilation," Robotics and Autonomous Systems, vol. 55, no. 8, pp. 618–627, 2007.

[23] T. Parsons, "Pursuit-evasion in a graph," in Theory and Applications of Graphs, vol. 642 of Lecture Notes in Mathematics, pp. 426–441, Springer, Heidelberg, Germany, 1978.

[24] R. Breisch, "An intuitive approach to speleotopology," Southwestern Cavers, vol. 6, no. 5, pp. 72–78, 1967.

[25] F. Ferrari, "A new parameterized potential family for path planning algorithms," International Journal on Artificial Intelligence Tools, vol. 18, no. 6, pp. 949–957, 2009.

[26] K. Walsh and B. Banerjee, "Fast A* with iterative resolution for navigation," International Journal on Artificial Intelligence Tools, vol. 19, no. 1, pp. 101–119, 2010.

[27] L.-P. Wong, M. Y. H. Low, and C. S. Chong, "Bee colony optimization with local search for traveling salesman problem," International Journal on Artificial Intelligence Tools, vol. 19, no. 3, pp. 305–334, 2010.

[28] E. Raboin, D. S. Nau, U. Kuter, S. K. Gupta, and P. Svec, "Strategy generation in multi-agent imperfect-information pursuit games," in Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, pp. 947–954, 2010.

[29] B. P. Gerkey, S. Thrun, and G. Gordon, "Visibility-based pursuit-evasion with limited field of view," International Journal of Robotics Research, vol. 25, no. 4, pp. 299–315, 2006.

[30] N. Basilico, N. Gatti, and F. Amigoni, "Leader-follower strategies for robotic patrolling in environments with arbitrary topologies," in Proceedings of the 8th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS '09), pp. 57–64, May 2009.

[31] M. Yan, "Multi-robot searching using game-theory based approach," International Journal of Advanced Robotic Systems, vol. 5, no. 4, pp. 341–350, 2008.

[32] K.-D. Althoff, "Case-based reasoning," in Handbook on Software Engineering and Knowledge Engineering, pp. 549–587, 2001.

[33] E. K. Burke, B. L. MacCarthy, S. Petrovic, and R. Qu, "Multiple-retrieval case-based reasoning for course timetabling problems," Journal of the Operational Research Society, vol. 57, no. 2, pp. 148–162, 2006.

[34] J. Long, D. G. Schwartz, S. Stoecklin, and M. K. Patel, "Application of loop reduction to learning program behaviors for anomaly detection," in Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC '05), vol. 1, pp. 691–696, April 2005.

[35] The Repast Suite, 2014, http://repast.sourceforge.net.

[36] L. Matignon, G. J. Laurent, and N. Le Fort-Piat, "Hysteretic Q-Learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '07), pp. 64–69, San Diego, Calif, USA, October-November 2007.

[37] I. Partalas, I. Feneris, and I. Vlahavas, "A hybrid multiagent reinforcement learning approach using strategies and fusion," International Journal on Artificial Intelligence Tools, vol. 17, no. 5, pp. 945–962, 2008.




Owing to the uncertain environment and the dynamic information of agents at each moment, previous research has mainly examined how to improve the efficiency of communication and accuracy in the learning process. In a different manner, this study focuses on agents with a mental state and the ability to plan evolution by using strategic modules and combining learning methods. Armed with such capability, the agents can adapt to the dynamic environment.

The rest of the paper is organized as follows. Section 2 reviews the background and related works of pursuit-evasion problems. Section 3 introduces the agent's adaptive learning process. Section 4 describes the case study of the pursuit-evasion game. The conclusions and directions for future work are presented in the last section.

2. Background and Related Work

Pursuit-evasion problems have long been studied from the perspective of differential game theories. Nishimura and Ikegami observed random swarming and other collective predator motions in a prey-predator game model [11]. Jim and Giles [12] proposed a simple, effective predator strategy which could enable predators to move to the closest capture position. This strategy does not prove to be very successful because the predators may block each other when they try to move to the same capture position. Haynes and Sen [13] used genetic programming to evolve strategies for both predators and preys. Hladek et al. [14] developed a multiagent control system using fuzzy inference for a group of two-wheeled mobile robots to execute a common task. They defined a pursuit-evasion task using fuzzy sets to establish a framework for inserting and updating expert knowledge in the form of rules by an inference system. Antoniades et al. [3] proposed several pursuit algorithms for solving complex multiplayer pursuit-evasion games. Their work revealed that reducing sensing overlap between pursuers and avoiding overassignment of pursuers to target locations could improve the capture time. Following the Contract Net Protocol (CNP) [15], Zhou et al. [16] proposed a multiagent cooperative pursuit algorithm and extended the CNP by improving alliance decisions and forming dynamic alliances. They used a support decision matrix that includes credits of agents, degrees of hookup, and degrees of self-confidence to help agents make decisions during the negotiation process.

Rao and Georgeff [17] proposed the belief-desire-intention (BDI) model for agents in a distributed artificial intelligence environment. As its name implies, the BDI agent model involves belief, desire, and intention. In addition, this work applies the concepts of assimilation and accommodation proposed by Piaget [18] to the pursuit-evasion game. These two mechanisms equip an agent with a mental state and the ability to plan an evolution process. Using different learning methods, the agent can also adapt to the dynamic environment effectively.

2.1. BDI Agent Model. On the foundation of BDI, Kuo et al. [19, 20] proposed an agent which can be completely specified to fulfill its intentions by the events it perceives, the actions it performs, the beliefs it holds, the goals it adopts, and the plans it has. A goal module describes the goals that an agent may adopt as well as the events to which it can respond. A belief module includes the information about the internal state that an agent of a certain class holds, the strategies it may perform, and the environment it is in. A plan module generates the plans that an agent may employ to achieve its goals. A plan is a sequence of actions or strategies derived through a reasoning mechanism.

In the proposed approach, belief denotes the agent's knowledge of the environment, including the orientations and locations of evaders; desire represents the agent's wish to catch all evaders; and intention stands for the agent's plan of actions.

2.2. Assimilation and Accommodation. According to Piaget's cognitive theory [18], each recognition system aims at the equilibration of the inconsistent information about the world. Thus, an organism seeks not only adaptation (harmony of organism and world) but also organization (harmony within itself). Assimilation and accommodation represent forms of the maintenance and modification of these cognitive schemata [21]. Takamuku and Arkin [22] applied assimilation to domestic robots for social learning, enabling the robots to perform well under various situations. This study proposes both bevel-angle and positive-angle accommodation strategies for modifying the agent's cognitive structure.

2.3. Pursuit-Evasion Game. In a pursuit-evasion game, pursuers try to capture evaders by besieging them from all directions in a grid world. The game focuses mainly on the effectiveness of structures with varying degrees of pursuers' cooperation and control to entrap evaders efficiently [13]. The first mathematical formulation of graph searching was proposed by Parsons [23] in 1978. The formulation was inspired by an earlier research by Breisch [24], who put forward an approach to finding an explorer lost in a complicated system of dark caves.

Furthermore, the pursuit-evasion game varies with different approaches to searching evaders, as described in the following.

2.3.1. Node Search and Mixed Search. Ferrari [25] proposed a new parameterized potential family for path planning algorithms: one algorithm is for searching the goal, and the other is for searching the obstacles. These algorithms can be used to easily parameterize the potential scale length and strength, thus providing better control over the moving object's path.

To address some unique demands from the game domain, Walsh and Banerjee [26] presented a new path-finding algorithm for game maps, called VRA*, and also proposed an extension of a post-smoothing technique. Wong et al. presented a Bee Colony Optimization (BCO) algorithm for the symmetrical Traveling Salesman Problem (TSP) [27]. The BCO model is constructed algorithmically according to the collective intelligence observed in the foraging behavior of bees. The algorithm is integrated with a fixed-radius near-neighbor 2-opt (FRNN 2-opt) heuristic to further improve prior solutions generated by the BCO model.

Raboin et al. [28] considered multiagent pursuit scenarios in which there is a team of tracker agents and a moving target agent. The trackers continuously observe the target so that the target is within at least one tracker's observation range at the end of the game. Their study described formalism and algorithms for game-tree search in partially observable Euclidean space.

Gerkey et al. [29] introduced a new class of searchers, the φ-searchers, which can be readily instantiated as physical mobile robots, and proposed the first complete search algorithm for a single φ-searcher. They also showed how this algorithm can be extended to handle multiple searchers and gave various examples of computed trajectories. Their work aimed at coordinating teams of robots to execute tasks in application domains such as clearing a building for reasons of security or safety.

2.3.2. Game Theoretic. Game-theoretic approaches to patrolling have increasingly become an interesting topic in recent years. Basilico et al. [30] presented a game-theoretic scheme to determine the optimal patrolling strategy for a mobile robot that operates in environments with arbitrary topologies.

Prior research [31] proposed a game-theory-based approach which uses a multirobot system to perform multitarget search in a dynamic environment. A dynamic programming equation is employed to estimate the utility function, which considers the a priori probability map, travel costs, and the current decisions of other robots. According to this utility function, a utility matrix can be calculated for an N-robot nonzero-sum game. Thus, pure Nash equilibrium and mixed-strategy equilibrium can be utilized to guide the robots in making their decisions.

3. Adaptive Agent Learning

This section introduces the adaptive agent model, a refinement of previously developed agent models [19, 20]. This adaptive agent model contains a hybrid approach for a multiagent learning method. In addition, this model enables agents to learn and accumulate their experience and knowledge from other agents or the environment.

3.1. Adaptive Agent Model. The agent model shown in Figure 1 is a cooperative learning model responsible for controlling the learning process. A goal module returns the goals that an agent may possibly adopt and the events to which it can respond. A belief module describes the information about the environment, the internal state that a certain class of agents may hold, and the strategies or tactics that the agent may perform. A plan module returns plans that can possibly be employed to achieve the goals of the agent. A plan is a sequence of actions or strategies derived through reasoning. There are two types of ontology that provide the domain-specific and issue-specific knowledge, respectively.

Figure 1: Agent model (sensors, belief, goal, plan, strategy, basic action, action, and adaptive and cooperative learning).

The mental state of an agent is represented by a first-order language as follows:

$FOTerm = \langle Const \rangle \mid \langle FOV \rangle \mid functor\,\langle FuncSym \times seq\ FOTerms \rangle$
$Atom = \langle Predicate \times seq\ FOTerms \rangle \quad (1)$

A first-order term ($FOTerm$) is defined by a set of constants ($\langle Const \rangle$), first-order variables ($\langle FOV \rangle$), and functions ($functor\,\langle FuncSym \times seq\ FOTerms \rangle$). An atom consists of a predicate and a sequence of $FOTerms$.

3.1.1. Belief. The beliefs of an agent describe the situation the agent is in. The beliefs are specified by a belief base, which contains the information the agent believes about the world as well as the information that is internal to the agent. The agent is assumed to have beliefs about its task and its environment. A belief is represented by a set of well-formed formulas such as

$Belief = \langle Atom \rangle \mid \lnot \langle Atom \rangle \mid \langle Atom \rangle\, [\wedge \langle Atom \rangle] \quad (2)$

For instance, a belief that there is an obstacle (obstacle_id) at a certain location (X1, Y1) can be represented as (obstacle_id, X1, Y1).

3.1.2. Goal. A goal is an agent's desire and describes the states of affairs that an agent would like to realize. For instance, CatchAllEvader(True) stands for a pursuer's goal to catch all evaders. A goal is represented by a set of well-formed formulas such as

$Goal = achieved\,\langle Atom \rangle \quad (3)$

3.1.3. Basic Actions. Basic actions are generally used by an agent to manipulate its environment. Before performing a basic action, certain beliefs should be held, and the beliefs of the agent must be updated after the execution of actions. Basic actions are the only entities that can change the beliefs of an agent. Let $\alpha$ be an action with parameters $x$, and let $\varphi, \psi \in Belief$. The programming construct for basic actions is then expressed as

$\{\varphi\}\ \alpha(x)\ \{\psi\} \quad (4)$

where $\varphi$ and $\psi$ are the precondition and the postcondition, respectively. The precondition indicates that an action cannot


be performed unless certain beliefs are held. For example, a pursuer is assumed to catch an evader by moving from location (X0, Y0) to location (X1, Y1) with a basic action MoveTo(X1, Y1). Before this action is executed, the pursuer must be at position (X0, Y0), denoted by pursuer(self, X0, Y0). The condition that the pursuer senses the evader at position (X1, Y1) is denoted by evaderAt(X1, Y1). This action is then defined in terms of the following beliefs:

pursuer(self, X0, Y0), evaderAt(X1, Y1)   (5)

After executing an action, an agent must update its beliefs to make the postcondition true. In the above example, after performing the catch action, the agent updates its beliefs and obtains

pursuer(self, X1, Y1), evaderAt(X2, Y2)   (6)

where (X2, Y2) is the evader's new location.
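The pre/postcondition mechanism above can be illustrated with a small standalone Python sketch (this is not the authors' REPAST implementation; the belief-base representation and the move_to helper are assumptions made purely for illustration):

# A belief base is modeled as a set of ground atoms, e.g. ("pursuer", "self", 0, 0).
def holds(beliefs, atom):
    # Precondition check: the action may fire only if the atom is currently believed.
    return atom in beliefs

def move_to(beliefs, x0, y0, x1, y1):
    # Basic action MoveTo(X1, Y1) with precondition pursuer(self, X0, Y0).
    if not holds(beliefs, ("pursuer", "self", x0, y0)):
        return beliefs                      # precondition fails, action not performed
    # Postcondition: the position belief is replaced, as in (5)-(6).
    return (beliefs - {("pursuer", "self", x0, y0)}) | {("pursuer", "self", x1, y1)}

beliefs = {("pursuer", "self", 0, 0), ("evaderAt", 1, 1)}
beliefs = move_to(beliefs, 0, 0, 1, 1)      # the pursuer now believes it is at (1, 1)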

3.1.4. Strategies. The strategy of a player refers to one of the options that can be chosen in a setting. The choice depends not only on the agent's own actions but also on other agents' actions. A prior study [28] proposed a pure strategy in terms of a function: given an agent's current information set, the function returns the agent's next move (i.e., its change in location between time $t_j$ and $t_{j+1}$). For example, if an agent has a pure strategy $\sigma_0$, then $\theta_0(t_{j+1}) = \theta_0(t_j) + \sigma_0(I_0(t_j))$, where $\theta_0(t_j)$ is the location of tracker agent 0 at time $t_j$ and $I_0(t_j)$ is the tracker's information set at time $t_j$. Suppose that a pursuer has a pure strategy set $\sigma = (\sigma_1, \ldots, \sigma_k)$ and that the locations of the tracker agents are $\theta(t) = (\theta_1(t), \ldots, \theta_k(t))$. Then $\theta(t_{j+1}) = \theta(t_j) + \sigma(I_0(t_j))$ for each time $t_j$. The strategy is further refined by adding an algorithm for selecting a plan according to different conditions. The algorithm can be represented by a function that generates a plan. The condition includes the beliefs of agents or the current situation of the environment. The proposed strategy is denoted by

$Strategy = \langle conditions \rangle, function \quad (7)$

3.1.5. Plans. A plan is a method for an agent to approach its goal and is generated according to the agent's strategy. A plan consists of beliefs, actions, and rewards, as expressed in the following equation:

$Plan = \langle belief \rangle, \langle basic\ action \rangle, [\langle basic\ action \rangle], \langle reward \rangle \quad (8)$

A plan can be one or a set of actions for an agent, and the value of the plan is determined by the reward.

3.2. Case-Based Reasoning. According to previous studies [32, 33], a general case-based reasoning (CBR) cycle may include retrieving the most similar cases for a problem, reusing the information and knowledge in those cases to solve the problem, revising the proposed solution, and retaining the parts of this experience likely to be useful for future problem solving.

In order to solve a new problem, the cases which contain the most useful knowledge have to be identified first. Since the utility of a case cannot be evaluated directly a priori, the similarity between problem descriptions is used in heuristics to estimate the expected utility of the cases. Therefore, the quality of this measure is crucial for the success of CBR applications. There are numerous similarity measures in use today. They are used not only in CBR but also in other fields, including data mining, pattern recognition, genetic algorithms, and machine learning. Euclidean distance is a typical similarity measure for objects in a Euclidean space [34]. Following the idea of Euclidean distance, this work devises a simple distance function to evaluate the similarity between two cases.

3.2.1. Case Description. A case stands for an agent's mental state and the result of its past output. Each case consists of a goal, beliefs, an action, a plan, and a reward. The goal is what the agent wants to achieve or realize. The beliefs describe the situation the agent is in. The action is what the agent does under that situation. The plan is the approach the agent takes to achieve the goal. The reward serves to evaluate the result of the plan.

3.2.2. Similarity Relation. Consider the following:

$\mathrm{Sim}_{ab} = \sum_{i=1}^{n} w_i \times |a_i - b_i| \quad (9)$

This study proposes an evaluation function (Sim), shown as (9), for measuring the similarity between cases $a$ and $b$. In this equation, $a$ and $b$ represent a new problem and a case in a case base, respectively. The variable $n$ is the number of features in case $a$. The weight variable $w_i$ stands for the importance of the $i$th feature; the weight vector is freely defined by the user. $a_i$ and $b_i$ are the $i$th features of cases $a$ and $b$, respectively. In the case-retrieving stage of the CBR approach, the case with the smallest Sim value is always retrieved for reuse.
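As an illustration only (a minimal Python sketch, not the authors' implementation; the feature vectors, weights, and case fields shown are assumptions), the weighted distance of (9) and smallest-Sim retrieval can be written as:

def sim(a, b, w):
    # Weighted distance of (9): a smaller value means a more similar case.
    return sum(wi * abs(ai - bi) for wi, ai, bi in zip(w, a, b))

def retrieve(case_base, new_features, w):
    # Case retrieval: return the stored case with the smallest Sim value.
    return min(case_base, key=lambda case: sim(case["features"], new_features, w))

case_base = [
    {"features": [2, 3, 1], "plan": "local-max", "reward": 7},
    {"features": [5, 0, 4], "plan": "local-cooperative", "reward": 9},
]
best = retrieve(case_base, [4, 1, 4], w=[1.0, 0.5, 1.0])   # selects the second case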

3.3. Probabilistic Framework. Suppose that a finite two-dimensional environment EX with $n_c$ square cells contains an unknown number of fixed obstacles. We also assume that the environment has $n_p$ pursuers and $n_e$ evaders. Let $x_{p_k}$ and $x_{e_i}$ be the cells occupied by pursuer $k$ and evader $i$, where $1 \le k \le n_p$ and $1 \le i \le n_e$. The pursuers and evaders are restricted to move to cells not occupied by obstacles. Each pursuer collects information about EX at discrete time instances $t \in T = \{1, 2, \ldots, t_{\mathrm{end}}\}$.

Define $x_{p_k}(t) \subset \mathrm{EX}$ as the cell of pursuer $k$ at time $t$ and $x_{e_i}(t) \subset \mathrm{EX}$ as the cell of evader $i$ at time $t$; $o(t)$ is the set of cells where obstacles are detected. Let $U(x_{e_i}(t))$ denote the one-step reachable set for the evader when the evader is in cell $x$ at time $t$. Then the probability of the evader being in cell $x$ at time $t+1$ is given by the following (10):

$p(x, t+1 \mid x_{e_i}(t)) = \begin{cases} \dfrac{1}{|U(x_{e_i}(t)) - o(t)|}, & x \in U(x_{e_i}(t)) \\ 0, & x \in o(t) \end{cases} \quad (10)$

where $p(x, t+1 \mid x_{e_i}(t))$ represents the probability of the evader being in cell $x$ at time $t+1$, according to the location where the pursuer detects the evader at time $t$.
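A small Python sketch of the update in (10), assuming an 8-connected one-step reachable set and ignoring grid boundaries (both simplifications; this is not the authors' code):

def one_step_reachable(cell):
    # U(x): the cell itself plus its eight neighbours (a 3 * 3 region).
    x, y = cell
    return {(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}

def evader_probabilities(evader_cell, obstacles):
    # Equation (10): uniform probability over the reachable, obstacle-free cells.
    reachable = one_step_reachable(evader_cell) - obstacles
    p = 1.0 / len(reachable)
    return {cell: p for cell in reachable}

probs = evader_probabilities((4, 4), obstacles={(5, 5)})
# eight obstacle-free reachable cells remain, each with probability 0.125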

3.4. Strategies for Plan Generation. When the distance between the retrieved case and the new case is greater than a threshold, the above approach cannot generate a useful plan for the new case. To overcome this problem, this study proposes two plan-generation strategies: the local-max strategy and the local-cooperative strategy.

3.4.1. Local-Max Strategy. This strategy is used only when there is a single pursuer for the evaders. Let $S_k(y)$ be the set of all cells that lie within the sensing area of a pursuer $k$ at cell $y$. The total evasion probability of the evaders at time $t$ associated with the pursuer is then obtained by the following (11):

$P(y, t) = \sum_{z \in S_k(y)} p(z, t+1 \mid x_{e_i}(t)) \quad (11)$

By computing the evasion probabilities of all sensing cells, pursuer $k$ moves to the cell $x_{p_k}(t+1)$ that has the highest total evasion probability, expressed as follows:

$x_{p_k}(t+1) = \arg\max_{y \in U(x_{p_k}(t))} P(y, t) \quad (12)$
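Building on the previous sketch (and reusing its one_step_reachable helper and probs dictionary, which are illustrative assumptions rather than the authors' code), the local-max choice of (11)-(12) can be sketched as:

def sensing_area(cell, radius=2):
    # S_k(y): the 5 * 5 square centred on the pursuer at cell y.
    x, y = cell
    return {(x + dx, y + dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)}

def local_max_move(pursuer_cell, evader_probs, obstacles):
    # Equations (11)-(12): move to the reachable cell whose sensing area
    # covers the largest total evasion probability.
    def total_probability(y):
        return sum(evader_probs.get(z, 0.0) for z in sensing_area(y))
    candidates = one_step_reachable(pursuer_cell) - obstacles
    return max(candidates, key=total_probability)

next_cell = local_max_move((2, 2), probs, obstacles={(5, 5)})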

3.4.2. Local-Cooperative Strategy. This strategy is for pursuers that cooperate with each other to catch the same evader. Initially, pursuers need to decide whether they can cooperate with each other or not. Let $d(x, y)$ be the Manhattan distance from cell $x$ to cell $y$; that is, if cells $x$ and $y$ lie in a two-dimensional space with coordinates $(x_1, x_2)$ and $(y_1, y_2)$, the distance is calculated using the following equation:

$d(x, y) = \max(|x_1 - y_1|, |x_2 - y_2|) \quad (13)$

Let $s_k$ be the farthest sensing distance of a pursuer $k$. For example, if the sensing area of a pursuer $k$ is a 5 * 5 square and the pursuer stands at the center of the area, the farthest distance that the pursuer can detect is two squares. If there exist two or more pursuers and one evader such that $d(x_{p_k}(t), x_{e_i}(t)) \le s_k$ holds for each of these pursuers, the pursuers can cooperate with each other.

Suppose that two or more pursuers execute the local-cooperative strategy. Let Overlap$(x, y)$ be the number of overlapping cells of the one-step reachable sets (3 * 3 regions) around cells $x$ and $y$. The overlap function is illustrated in Figure 2: the yellow grid represents the one-step reachable set for a pursuer P1, the dashed grid depicts the one-step reachable set for a pursuer P2, and the green grid represents the position of an evader E. The number of overlap cells thus equals four (i.e., the return value of Overlap$(x, y)$).

Figure 2: Number of overlap cells.

Once the pursuers can cooperate with each other, they randomly select one of them as a leader. The other pursuers can determine their locations at time $t+1$ by the following (14):

$x_{p_k}(t+1) = \arg\min_{y \in U(x_{p_k}(t))} \mathrm{Overlap}(x_{p_L}(t+1), y) \quad (14)$

where $x_{p_L}(t+1)$ is the location of the leader. To avoid pursuers moving to the same location, each pursuer finds the minimum overlap area.
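A sketch of the cooperation test (13) and the follower rule (14), again in Python with the same illustrative helpers (the leader is assumed to have already chosen its next cell; none of this is the authors' code):

def grid_distance(x, y):
    # Equation (13).
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

def can_cooperate(pursuer_cells, evader_cell, s_k=2):
    # Two or more pursuers may cooperate when each of them senses the evader.
    return len(pursuer_cells) >= 2 and all(
        grid_distance(p, evader_cell) <= s_k for p in pursuer_cells)

def overlap(x, y):
    # Overlap(x, y): cells shared by the two 3 * 3 one-step reachable sets.
    return len(one_step_reachable(x) & one_step_reachable(y))

def cooperative_move(follower_cell, leader_next_cell, obstacles):
    # Equation (14): a follower moves to the candidate cell whose one-step
    # region overlaps least with the leader's next position.
    candidates = one_step_reachable(follower_cell) - obstacles
    return min(candidates, key=lambda y: overlap(leader_next_cell, y))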

3.5. Reward Calculation. Each case has its own reward for evaluating its result. The higher the reward of a case is, the more likely the goal will be achieved. In this study, the reward is calculated using the following equation:

$R = |S_k(y) \cap U(x_{e_i}(t))| \quad (15)$

Reward $R$ represents the number of cells of the evader's one-step reachable set that lie within the pursuer's sensing area. The reward value is between 0 and 10. If an evader is caught, the reward of the case will be 10.
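Under the same assumptions as the earlier sketches, the reward of (15) amounts to counting the sensed portion of the evader's one-step reachable set, with a fixed value of 10 for a capture:

def case_reward(pursuer_next_cell, evader_cell, captured=False):
    # Equation (15); a capture is worth the maximum reward of 10.
    if captured:
        return 10
    return len(sensing_area(pursuer_next_cell) & one_step_reachable(evader_cell))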

3.6. Assimilation and Accommodation. Piaget's cognitive theory [18] reveals that assimilation refers to the tendency to interpret experience as much as possible through the existing cognitive structure of knowing. When an agent faces a new situation, it generates a new plan with its current cognitive structure, also called its strategy, to direct what it should do next. The entire process is called assimilation; in other words, an agent assimilates a new situation with its current cognitive structure. However, if the current cognitive structure cannot explain the environment (i.e., it cannot reach the equilibration of inconsistent information about the world), the agent has to use accommodation. Accommodation refers to the realization that the current structure is insufficient for adequate understanding of the world and that the current structure must be changed until it can assimilate the new situation.

This study proposes two accommodation methods, the bevel-angle and positive-angle strategies, for modifying a cognitive structure. After performing accommodation, an agent employs the modified strategy to assimilate the new situation. By constantly adjusting its cognitive structure, the agent can adapt more effectively to the environment.

3.6.1. Bevel-Angle Strategy. Suppose that an evader's location is in the one-step reachable set for a pursuer and the evader is at a bevel angle of the pursuer. The pursuer randomly chooses one of the following strategies when the evader is at 45° (π/4), 135° (3π/4), 225° (5π/4), or 315° (7π/4) of the pursuer. Let $\deg(x_{p_k}(t), x_{e_i}(t))$ be the angle between the pursuer $k$ and evader $i$, where $x_{e_i}(t)$ is the position of the evader and $x_{p_k}(t)$ is the position of the pursuer at time $t$.

Strategy 1. According to (16), the pursuer moves to the evader's position at time $t+1$:

$x_{p_k}(t+1) = x_{e_i}(t)$, if $x_{e_i}(t) \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{\pi/4, 3\pi/4, 5\pi/4, 7\pi/4\}$. $\quad (16)$

Strategy 2. According to (17), the pursuer moves to +45° of the evader's position at time $t+1$. Let $x_j$ be an element of the one-step reachable set for the pursuer $k$ when it is in cell $x$ at time $t$:

$x_{p_k}(t+1) = x_j$, if $x_{e_i}(t) \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{\pi/4, 3\pi/4, 5\pi/4, 7\pi/4\}$ and $x_j \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) + 45^\circ$. $\quad (17)$

Strategy 3. According to (18), the pursuer moves to -45° of the evader's position at time $t+1$:

$x_{p_k}(t+1) = x_j$, if $x_{e_i}(t) \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{\pi/4, 3\pi/4, 5\pi/4, 7\pi/4\}$ and $x_j \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) - 45^\circ$. $\quad (18)$

Strategy 4. According to (19), the pursuer stays in the same place at time $t+1$:

$x_{p_k}(t+1) = x_{p_k}(t)$, if $x_{e_i}(t) \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{\pi/4, 3\pi/4, 5\pi/4, 7\pi/4\}$. $\quad (19)$

3.6.2. Positive-Angle Strategy. Suppose that an evader's location is in the one-step reachable set for a pursuer and the evader is at a positive angle of the pursuer, where the angle includes 0°, 90° (π/2), 180° (π), and 270° (3π/2).

Strategy 1. According to (20), the pursuer moves to the evader's position at time $t+1$:

$x_{p_k}(t+1) = x_{e_i}(t)$, if $x_{e_i}(t) \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{0, \pi/2, \pi, 3\pi/2\}$. $\quad (20)$

Strategy 2. According to (21), the pursuer moves to +45° of the evader's position at time $t+1$:

$x_{p_k}(t+1) = x_j$, if $x_{e_i}(t) \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{0, \pi/2, \pi, 3\pi/2\}$ and $x_j \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) + 45^\circ$. $\quad (21)$

Strategy 3. According to (22), the pursuer moves to -45° of the evader's position at time $t+1$:

$x_{p_k}(t+1) = x_j$, if $x_{e_i}(t) \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{0, \pi/2, \pi, 3\pi/2\}$ and $x_j \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) - 45^\circ$. $\quad (22)$

Strategy 4. According to (23), the pursuer moves to +90° of the evader's position at time $t+1$:

$x_{p_k}(t+1) = x_j$, if $x_{e_i}(t) \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{0, \pi/2, \pi, 3\pi/2\}$ and $x_j \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) + 90^\circ$. $\quad (23)$

Strategy 5. According to (24), the pursuer moves to -90° of the evader's position at time $t+1$:

$x_{p_k}(t+1) = x_j$, if $x_{e_i}(t) \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{0, \pi/2, \pi, 3\pi/2\}$ and $x_j \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) - 90^\circ$. $\quad (24)$

Strategy 6. According to (25), the pursuer stays in the same place at time $t+1$:

$x_{p_k}(t+1) = x_{p_k}(t)$, if $x_{e_i}(t) \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{0, \pi/2, \pi, 3\pi/2\}$. $\quad (25)$

Combining both the bevel-angle and positive-angle strategies, the pursuer obtains 24 different plans to adapt to the environment effectively.
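The common core of these strategies is "move one cell at the evader's bearing plus a rotation offset". A hedged Python sketch follows (the continuous-angle discretization and the helper names are assumptions, not the authors' formulation):

import math

def bearing(pursuer, evader):
    # deg(x_pk(t), x_ei(t)): angle from the pursuer to the evader, in degrees.
    return math.degrees(math.atan2(evader[1] - pursuer[1],
                                   evader[0] - pursuer[0])) % 360

def step_towards(pursuer, angle_deg):
    # Move one cell in the direction closest to the requested angle.
    rad = math.radians(angle_deg)
    return (pursuer[0] + round(math.cos(rad)), pursuer[1] + round(math.sin(rad)))

def apply_angle_strategy(pursuer, evader, offset_deg):
    # offset_deg in {0, +45, -45, +90, -90}; 0 moves straight at the evader,
    # and "stay in place" corresponds to returning the pursuer's own cell.
    return step_towards(pursuer, bearing(pursuer, evader) + offset_deg)

apply_angle_strategy((3, 3), (4, 4), +45)   # evader at 45 degrees -> move to (3, 4)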

3.7. Plan Evolution Process. Plan evolution plays an important role in an agent's life cycle. When a pursuer finds an evader in the sensing area, it generates a pursuit plan. Initially, the pursuer searches its case base for similar cases. If similar cases exist, the pursuer chooses one of them as the plan. Otherwise, the pursuer uses one of the abovementioned strategies to generate a plan. When the pursuer faces a new state, it adjusts the plan accordingly. Figure 3 shows the agent's plan evolution process.

(1) Problem Analysis. Before determining the location of the next step, a pursuer analyzes the environmental state and updates its goal with information from the environment. The pursuer chooses a plan according to its mental state.

(2) Case Retrieval. Comparing the similarity between the result of the analysis in Step 1 and the cases in the case base, the pursuer retrieves similar cases by the Sim function and sorts them according to their similarity levels.

(3) Similarity Calculation. Similarity is classified into low and high levels. It is possible that the pursuer retrieves a case with high-level similarity but obtains unsatisfactory results, and thus a reward R is added to prevent this. If the case has high-level similarity and the reward R exceeds a predefined constant ε, the pursuer executes Step 4. If the case has low-level similarity or the reward R is smaller than the predefined constant ε, the pursuer executes Step 5.

(4) Case Reuse. If the retrieved case has high-level similarity and its result is acceptable, the pursuer reuses the case with revisions and then goes to Step 10.

(5) Strategy Assimilation. If no similar case exists in the case base or the result of the similar case is unsatisfactory, the pursuer uses strategy assimilation; that is, the pursuer generates a suitable plan with the current strategy (Steps 6–8) according to the situation.

(6) Cooperation. Before using the strategy module, the pursuer decides whether to cooperate with the others or not. If the decision is to cooperate with the others in the situation, then the local-cooperative strategy (i.e., Step 8) is executed. Otherwise, the pursuer executes the local-max strategy (i.e., Step 7).

(7) Application of the Local-Max Strategy. If the pursuer decides not to cooperate with the others, it executes the local-max strategy and moves to the cell in the one-step reachable set with the highest probability of containing an evader. Then go to Step 9.

(8) Application of the Local-Cooperative Strategy. If the cooperative mode is taken, the pursuer executes the local-cooperative strategy and moves to the cell which has the minimum overlap of sensing area.

(9) Reward Calculation. The pursuer evaluates the plan and calculates a reward accordingly.

(10) Plan Transformation. The pursuer represents the plan by beliefs, actions, and rewards.

(11) Strategy Assessment. The pursuer evaluates whether the current strategy is suitable or not. The pursuer counts the number C of consecutive failures in catching an evader while using the strategy. If C is smaller than a constant σ, the strategy is effective; then go to Step 13. Otherwise, the strategy is accommodated in Step 12.

(12) Strategy Accommodation. The pursuer modifies the strategy to generate a new plan for the current environment.

(13) Case Revision. The actions and reward of the case are revised after the problem is solved.


Figure 3: Plan evolution process (Steps (1)–(14), with branches on $R > \varepsilon$ versus $R \le \varepsilon$ and $C < \sigma$ versus $C \ge \sigma$).

(14) Case Retention. Parts of the experience that are likely to be useful for future problem solving are retained.
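The control flow of Steps (1)–(14) can be summarized in a compact Python sketch. It reuses the sim and retrieve helpers from the earlier case-retrieval sketch, and the thresholds, the strategy callback, and the failure counter are all illustrative assumptions rather than the authors' parameters:

def plan_evolution_step(case_base, features, w, threshold, make_plan_with_strategy,
                        evaluate, failures, epsilon=5, sigma=3):
    # Steps (1)-(3): analyse the situation and retrieve the most similar case.
    case = retrieve(case_base, features, w) if case_base else None
    if case and sim(case["features"], features, w) <= threshold and case["reward"] > epsilon:
        plan, reward = case["plan"], case["reward"]        # Step (4): case reuse
    else:
        plan = make_plan_with_strategy()                   # Steps (5)-(8): assimilation
        reward = evaluate(plan)                            # Step (9)
    # Steps (11)-(12): accommodate (modify the strategy) after sigma straight failures.
    accommodate = failures >= sigma
    # Steps (13)-(14): revise and retain the experience for future retrieval.
    case_base.append({"features": features, "plan": plan, "reward": reward})
    return plan, accommodate

plan, accommodate = plan_evolution_step(
    case_base, [4, 1, 4], w=[1.0, 0.5, 1.0], threshold=2.0,
    make_plan_with_strategy=lambda: "local-max", evaluate=lambda p: 6, failures=0)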

4. Case Study

This study uses a pursuit-evasion game to demonstrate the proposed approach and implements a multiagent system on the Recursive Porous Agent Simulation Toolkit (REPAST) [35]. Figure 4 shows the GUI of the system, where the blue, red, and black circles represent pursuers, evaders, and obstacles, respectively. The yellow region depicts the sensing area of the pursuers.

4.1. Problem Formulation. The system contains multiple pursuing robots which search for multiple evading robots in a two-dimensional environment. The physical world is abstracted as a grid world, and the game is played according to the following settings and rules.

(i) The number of pursuers is $n$. The number of evaders is unknown to the pursuers.

(ii) Each pursuer has a sensing area comprising 5 * 5 cells, and the pursuer is at the center of the area, as shown in Figure 4.

(iii) During each moving step, the agents are allowed to either move one cell away from their current cells or stay in their current cells.

(iv) Each pursuer has no information about the environment other than that of its own 5-by-5 sensing area.

(v) All pursuers use the same case base.

(vi) An evader is captured when it occupies the same cell as a pursuer, and it is then removed from the environment.

(vii) The game ends when all evaders have been captured.

4.2. Evasion Policy. When an evader is caught, the system first checks whether the sum of the caught and uncaught evaders is smaller than the total number of predefined evaders. If so, the system generates a new evader and chooses its type of movement; otherwise, the system does nothing. Clearly, when the sum of the caught and uncaught evaders equals the number of predefined evaders, the system stops generating new evaders, and the number of uncaught evaders does not increase. The number of uncaught evaders even decreases with successful captures, so the game eventually ends. In addition, when the system selects a movement type for a new evader, the choice varies with different experiments, which in turn affects the outcome of the experiment. This study investigates four types of movement, namely, random, clockwise, counterclockwise, and smart movements.

Figure 4: Implementation of the multiagent system.

Random Movement. An evader randomly selects a location that can be reached in one step or just stays in the same location. The details of the algorithm are shown in Algorithm 1.

Clockwise Movement. An evader moves clockwise in a square path. The evader initially selects the direction of its first step and sets up a distance randomly. If the evader runs into an obstacle on the way, it changes its direction automatically. Algorithm 2 shows the algorithm.

Counterclockwise Movement. This movement type is similar to the clockwise movement, except that an evader moves counterclockwise. The algorithm is shown in Algorithm 3.

Smart Movement. Evaders actively try to avoid pursuers. Each evader has a square sensing area, and each side of the square is three cells long. If the evader finds any pursuer in the area, it attempts to move to the location that has the fewest pursuers in the one-step reachable set of the area. Algorithm 4 presents the algorithm.

4.3. Results and Discussion. This study designs several experiments to compare the performance of the proposed approach, Hysteretic Q-Learning [36], and Reinforcement-Learning fusing (RL-Fusing) [37]. All these approaches are learning techniques for multiagent systems. Each experiment has three pursuers and one evader. In addition, the experiments run in "time step" units. One round is considered finished when the evader has been caught. The performance results are the average values obtained by running 30 rounds with the current learning policy after 100 training episodes for each algorithm.

4.3.1. Comparison of Approaches. Matignon et al. presented a decentralized reinforcement learning algorithm for independent learning robots, called Hysteretic Q-Learning [36]. This algorithm computes a better policy in a cooperative multiagent system without additional information or communication between agents. Hysteretic Q-Learning updates the Q-value of an agent $i$ executing action $a_i$ in state $s$ and ending up in state $s'$ as follows:

$\delta \leftarrow r + \gamma \max_{a'} Q_i(s', a') - Q_i(s, a_i)$
$Q_i(s, a_i) \leftarrow \begin{cases} Q_i(s, a_i) + \mathrm{inc}\,\delta, & \text{if } \delta > 0 \\ Q_i(s, a_i) + \mathrm{dec}\,\delta, & \text{otherwise} \end{cases} \quad (26)$

where $Q_i(s, a_i)$ is the action value function of state $s$ and action $a_i$, $r$ is the reward the agent receives, $\gamma \in [0, 1]$ is the discount factor, and inc and dec are the rates of increase and decrease in $Q$-values, respectively. Partalas et al. proposed the RL-Fusing approach, which uses coordinated actions and a fusing process to guide the agents [37]. The fusion process combines the decisions of all agents and then outputs a global decision, which the agents must follow as a team. Algorithm 5 shows the RL-Fusing algorithm; the set of strategies corresponds to the plans.
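For illustration, the update rule (26) can be written as a short Python function (a sketch with arbitrary example values, not the implementation used in the experiments):

def hysteretic_q_update(Q, state, action, reward, next_state, actions,
                        gamma=0.9, inc=0.1, dec=0.01):
    # Equation (26): a larger learning rate is used when the TD error is positive.
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    delta = reward + gamma * best_next - Q.get((state, action), 0.0)
    rate = inc if delta > 0 else dec
    Q[(state, action)] = Q.get((state, action), 0.0) + rate * delta
    return Q

Q = hysteretic_q_update({}, state=0, action="up", reward=1.0, next_state=1,
                        actions=["up", "down", "left", "right"])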

4.3.2. Obstacles. The first experiment compares the performance of the proposed approach, Hysteretic Q-Learning, and RL-Fusing in environments with/without obstacles. The evader is assumed to move randomly in every round. In the environment with obstacles, 10 obstacles are randomly set up initially, and no evader is trapped in the obstacles.

Table 1 shows the performance results. As can be seen, all three methods require more capture steps in the environment with obstacles because pursuers need extra time to detect and avoid obstacles. Hysteretic Q-Learning takes the greatest number of capture steps in both environments because it does not support pursuer cooperation. Furthermore, RL-Fusing also takes more capture steps than the proposed scheme. Although RL-Fusing involves cooperation between pursuers, the generated strategies sometimes cause a pursuer to miss evaders in its sensing area. The pursuer thus needs to search again and requires more capture steps. In contrast, the proposed approach utilizes not only the cooperative strategy but also assimilation and accommodation, which enable a pursuer to learn from either success or failure experiences. Thus, the pursuer can keep in step with an evader and catch it in a smaller number of capture steps.

4.3.3. Movement Types. The second experiment measures the average number of capture steps required by the proposed approach, Hysteretic Q-Learning, and RL-Fusing under two types of evader movement: one is clockwise movement, and the other is random clockwise and counterclockwise movement.

The average number of capture steps required by the three approaches for different movement types is shown in Table 2. Similar to the results obtained in the first experiment, the proposed scheme outperforms Hysteretic Q-Learning and RL-Fusing.


(1) while true do
(2)   next_location ← randomly generate a location nearby the evader
(3)   if next_location has an obstacle then
(4)     next_location ← randomly generate a new location nearby the evader
(5)   end if
(6) end while

Algorithm 1: Random movement algorithm.

(1) direction ← randomly generate a direction
(2) distance ← randomly generate a distance shorter than the environment
(3) move ← 0
(4) while true do
(5)   get the next location according to the direction
(6)   move ← move + 1
(7)   if move equals distance or the next location has an obstacle then
(8)     the evader turns right
(9)     change the evader's direction
(10)    move ← 0
(11)  end if
(12) end while

Algorithm 2: Clockwise movement algorithm.

(1) direction ← randomly generate a direction
(2) distance ← randomly generate a distance shorter than the environment
(3) move ← 0
(4) while true do
(5)   get the next location according to the direction
(6)   move ← move + 1
(7)   if move equals distance or the next location has an obstacle then
(8)     the evader turns left
(9)     change the evader's direction
(10)    move ← 0
(11)  end if
(12) end while

Algorithm 3: Counterclockwise movement algorithm.

(1) while true do
(2)   if the evader's sensing area has a pursuer then
(3)     next_location ← generate a location with the fewest pursuers detected
(4)   else
(5)     next_location ← randomly generate a location nearby the evader
(6)   end if
(7) end while

Algorithm 4: Smart movement algorithm.
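Algorithm 4's decision rule can also be expressed as a runnable Python sketch (the 3 * 3 neighbourhood and the three-cell-wide sensing square follow the description above; everything else, including the helper names, is an assumption):

import random

def neighbours(cell):
    # One-step reachable cells around the evader, including staying put.
    x, y = cell
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

def smart_move(evader, pursuers, sense_radius=1):
    # Algorithm 4: if any pursuer is sensed, move where the fewest pursuers
    # are detected; otherwise move randomly.
    def sensed_from(cell):
        return [p for p in pursuers
                if max(abs(p[0] - cell[0]), abs(p[1] - cell[1])) <= sense_radius]
    if not sensed_from(evader):
        return random.choice(neighbours(evader))
    return min(neighbours(evader), key=lambda cell: len(sensed_from(cell)))

smart_move((3, 3), pursuers=[(4, 3), (1, 1)])   # steps away from the pursuer at (4, 3)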

Among the three approaches, Hysteretic Q-Learning requires the highest number of capture steps, for the same reason that it does not support agent cooperation. According to the RL-Fusing algorithm presented in Algorithm 5, when a state is not a coordinated state, the agent acts with a random or predefined policy, and a single pursuer thus has no learning ability. Therefore, the possibility of catching an evader is low, and RL-Fusing also takes more capture steps. In contrast, when a pursuer of the proposed scheme faces an evader, the pursuer observes the evader continuously. If the evader changes its moving type, the pursuer uses accommodation to change its catching strategy.


Require: A set of strategies, an initial state $s_0$
(1) $s \leftarrow s_0$
(2) while true do
(3)   if $s$ is a coordinated state then
(4)     $\rho \leftarrow$ RandomReal(0, 1)
(5)     if $\rho < \varepsilon$ then
(6)       select a strategy randomly, the same for all agents
(7)     else
(8)       rank strategies
(9)       communicate ranks
(10)      receive ranks from other agents
(11)      average ranks
(12)      select corresponding strategy
(13)    end if
(14)    execute selected strategy
(15)    receive reward
(16)    transit to a new state $s'$
(17)    update $Q$-value
(18)  else
(19)    act with a predefined or random policy
(20)  end if
(21) end while

Algorithm 5: RL-Fusing algorithm.

Table 1: Average number of capture steps in different environments.

Method                  Grid size   Without obstacles (steps)   With obstacles (steps)
Hysteretic Q-Learning   12 * 12     68                          73
RL-Fusing               12 * 12     61                          65
The proposed approach   12 * 12     24                          27

Table 2: Average number of capture steps for different movement types.

Method                  Grid size   Clockwise movement (steps)   Random clockwise and counterclockwise movement (steps)
Hysteretic Q-Learning   12 * 12     56                           69
RL-Fusing               12 * 12     48                           56
The proposed approach   12 * 12     26                           32

If the moving type remains fixed, the pursuer keeps the same strategy by assimilation. As seen in Table 2, all three methods require more steps to capture evaders with random clockwise and counterclockwise movement because pursuers need extra time to detect the moving directions of the evaders.

Table 3: Average number of capture steps for evaders with smart movement in different grid sizes.

Method                  Grid size 12 * 12 (steps)   Grid size 24 * 24 (steps)
Hysteretic Q-Learning   74                          212
RL-Fusing               63                          300
The proposed approach   63                          119

Table 4: Average number of capture steps with/without assimilation and accommodation.

Approach                                   Clockwise evader movement   Smart evader movement
With assimilation and accommodation        186                         278
Without assimilation and accommodation     242                         364

4.3.4. Smart Movement and Size of Grid. The third experiment involves evaders with smart movement; the evaders actively try to avoid pursuers when they sense any pursuer. The environment is tested with two grid sizes: 12 * 12 and 24 * 24.


[16] P-C Zhou B-R Hong Y-H Wang and T Zhou ldquoMulti-agent coopoerative pursuit based on extended contract netprotocolrdquo in Proceedings of the International Conference onMachine Learning and Cybernetics vol 1 pp 169ndash173 August2004

[17] A Rao and M Georgeff ldquoBDI agents from theory to practicerdquoin Proceedings of the International Conference on Multi-AgentSystems pp 312ndash319 1995

[18] J Piaget The Equilibration of Cognitive Structures The CentralProblem of Intellectual Development University of ChicagoPress 1985

[19] J Y Kuo S J Lee C L Wu N L Hsueh and J Lee ldquoEvolu-tionary agents for intelligent transport systemsrdquo InternationalJournal of Fuzzy Systems vol 7 no 2 pp 85ndash93 2005

[20] J Y Kuo and F C Huang ldquoA hybrid approach for multi-agentlearning systemsrdquo Intelligent Automation and Soft Computingvol 17 no 3 pp 385ndash399 2011

[21] S Gebhardt P Grant R von Georgi and M T HuberldquoAspects of Piagetrsquos cognitive developmental psychology andneurobiology of psychotic disordersmdashan integrative modelrdquoMedical Hypotheses vol 71 no 3 pp 426ndash433 2008

[22] S Takamuku and R C Arkin ldquoMulti-method learning andassimilationrdquo Robotics and Autonomous Systems vol 55 no 8pp 618ndash627 2007

[23] T Parsons ldquoPursuit-evasion in a graphrdquo in Theory and Appli-cations of Graphs vol 642 of Lecture Notes in Mathematics pp426ndash441 Springer Heidelberg Germany 1978

[24] R Breisch ldquoAn intuitive approach to speleotopologyrdquo South-western Cavers vol 6 no 5 pp 72ndash78 1967

[25] F Ferrari ldquoA new parameterized potential family for path plan-ning algorithmsrdquo International Journal on Artificial IntelligenceTools vol 18 no 6 pp 949ndash957 2009

[26] K Walsh and B Banerjee ldquoFast Alowast with iterative resolution fornavigationrdquo International Journal on Artificial Intelligence Toolsvol 19 no 1 pp 101ndash119 2010

[27] L-P Wong M Y H Low and C S Chong ldquoBee colonyoptimization with local search for traveling salesman problemrdquoInternational Journal on Artificial Intelligence Tools vol 19 no3 pp 305ndash334 2010

[28] E Raboin D S Nau U Kuter S K Gupta and P SvecldquoStrategy generation inmulti-agent imperfect-information pur-suit gamesrdquo in Proceedings of the International Conference onAutonomous Agents and Multiagent Systems pp 947ndash954 2010

[29] B PGerkey SThrun andGGordon ldquoVisibility-based pursuit-evasion with limited field of Viewrdquo International Journal ofRobotics Research vol 25 no 4 pp 299ndash315 2006

[30] N Basilico N Gatti and F Amigoni ldquoLeader-follower strate-gies for robotic patrolling in environments with arbitrarytopologiesrdquo in Proceedings of the 8th International Joint Confer-ence on Autonomous Agents and Multiagent Systems (AAMASrsquo09) pp 57ndash64 May 2009

[31] M Yan ldquoMulti-robot searching using game-theory basedapproachrdquo International Journal of Advanced Robotic Systemsvol 5 no 4 pp 341ndash350 2008

[32] K-DAlthoff ldquoCase-based reasoningrdquo inHandbook on SoftwareEngineering and Knowledge Engineering pp 549ndash587 2001

[33] E K Burke B L MacCarthy S Petrovic and R Qu ldquoMultiple-retrieval case-based reasoning for course timetabling prob-lemsrdquo Journal of the Operational Research Society vol 57 no2 pp 148ndash162 2006

[34] J Long D G Schwartz S Stoecklin and M K Patel ldquoAppli-cation of loop reduction to learning program behaviors foranomaly detectionrdquo in Proceedings of the International Confer-ence on Information Technology Coding and Computing (ITCC05) vol 1 pp 691ndash696 April 2005

[35] The Repast Suite The Repast Suite 2014 httprepastsource-forgenet

[36] L Matignon G J Laurent and N L Fort-Piat ldquoHysteretic Q-Learning an algorithm for decentralized reinforcement learn-ing in cooperative multi-agent teamsrdquo in Proceedings of theIEEERSJ International Conference on Intelligent Robots andSystems (IROS 07) pp 64ndash69 San Diego Calif USA October-November 2007

[37] I Partalas I Feneris and I Vlahavas ldquoA hybrid multiagentreinforcement learning approach using strategies and fusionrdquoInternational Journal on Artificial Intelligence Tools vol 17 no5 pp 945ndash962 2008

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 3: Research Article Multiagent Cooperative Learning ...downloads.hindawi.com/journals/mpe/2015/964871.pdf · Research Article Multiagent Cooperative Learning Strategies for Pursuit-Evasion

Mathematical Problems in Engineering 3

a fixed-radius near-neighbor 2-opt (FRNN 2-opt) heuristic to further improve prior solutions generated by the BCO model.

Raboin et al. [28] considered multiagent pursuit scenarios in which a team of tracker agents pursues a moving target agent. The trackers continuously observe the target, and the target must be within at least one tracker's observation range at the end of the game. Their study described a formalism and algorithms for game-tree search in partially observable Euclidean space.

Gerkey et al. [29] introduced a new class of searchers, the φ-searchers, which can be readily instantiated as physical mobile robots, and proposed the first complete search algorithm for a single φ-searcher. They also showed how this algorithm can be extended to handle multiple searchers and gave various examples of computed trajectories. Their work aimed at coordinating teams of robots to execute tasks in application domains such as clearing a building for reasons of security or safety.

2.3.2. Game Theoretic. Game-theoretic approaches to patrolling have attracted increasing interest in recent years. Basilico et al. [30] presented a game-theoretic scheme to determine the optimal patrolling strategy for a mobile robot that operates in environments with arbitrary topologies.

Prior research [31] proposed a game-theory-based approach in which a multirobot system performs multitarget search in a dynamic environment. A dynamic programming equation is employed to estimate the utility function, which considers the a priori probability map, travel costs, and the current decisions of other robots. According to this utility function, a utility matrix can be calculated for an N-robot nonzero-sum game. Thus, pure Nash equilibrium and mixed-strategy equilibrium can be utilized to guide the robots in making their decisions.

3. Adaptive Agent Learning

This section introduces the adaptive agent model, a refinement of previously developed agent models [19, 20]. The adaptive agent model embodies a hybrid approach to multiagent learning. In addition, the model enables agents to learn and to accumulate experience and knowledge from other agents or from the environment.

3.1. Adaptive Agent Model. The agent model shown in Figure 1 is a cooperative learning model responsible for controlling the learning process. A goal module returns the goals that an agent may adopt and the events to which it can respond. A belief module describes the information about the environment, the internal state that a certain class of agents may hold, and the strategies or tactics that the agent may perform. A plan module returns plans that may be employed to achieve the goals of the agent; a plan is a sequence of actions or strategies derived through reasoning. Two types of ontology provide the domain-specific and issue-specific knowledge, respectively.

Figure 1: Agent model, comprising goal, belief, plan, strategy, basic action, and sensor modules linked by adaptive and cooperative learning.

The mental state of an agent is represented in a first-order language as follows:

FOTerm = ⟨Const⟩ | ⟨FOV⟩ | functor⟨FuncSym × seq FOTerms⟩,
Atom = ⟨Predicate × seq FOTerms⟩.   (1)

A first-order term (FOTerm) is defined by a set of constants (⟨Const⟩), first-order variables (⟨FOV⟩), and functions (functor⟨FuncSym × seq FOTerms⟩). An atom consists of a predicate and a sequence of FOTerms.

3.1.1. Belief. The beliefs of an agent describe the situations the agent is in. They are specified by a belief base, which contains the information the agent believes about the world as well as information that is internal to the agent. The agent is assumed to have beliefs about its task and its environment. A belief is represented by a set of well-formed formulas such as

Belief = ⟨Atom⟩ | ¬⟨Atom⟩ | ⟨Atom⟩ [∧ ⟨Atom⟩].   (2)

For instance, a belief that there is an obstacle (obstacle_id) at a certain location (X1, Y1) can be represented as (obstacle_id, X1, Y1).

3.1.2. Goal. A goal is an agent's desire and describes the state of affairs that the agent would like to realize. For instance, CatchAllEvader(True) stands for a pursuer's goal to catch all evaders. A goal is represented by a well-formed formula such as

Goal = achieved ⟨Atom⟩.   (3)

3.1.3. Basic Actions. Basic actions are generally used by an agent to manipulate its environment. Certain beliefs must hold before a basic action is performed, and the beliefs of the agent must be updated after the action is executed. Basic actions are the only entities that can change the beliefs of an agent. Let α be an action with parameters x, and let φ, ψ ∈ Belief. The programming construct for basic actions is then expressed as

{φ} α(x) {ψ},   (4)

where φ and ψ are the precondition and postcondition, respectively. The precondition indicates that an action can be performed only if certain beliefs are held. For example, a pursuer is assumed to chase an evader from location (X0, Y0) to location (X1, Y1) with the basic action MoveTo(X1, Y1). Before this action is executed, the pursuer must be at position (X0, Y0), denoted by pursuer(self, X0, Y0), and it must sense the evader at position (X1, Y1), denoted by evaderAt(X1, Y1). The action is thus defined in terms of the following beliefs:

{pursuer(self, X0, Y0), evaderAt(X1, Y1)}.   (5)

After executing an action, the agent must update its beliefs to make the postcondition true. In the above example, after performing the catch action, the agent updates its beliefs and obtains

{pursuer(self, X1, Y1), evaderAt(X2, Y2)},   (6)

where (X2, Y2) is the evader's new location.
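The precondition/postcondition pattern in (4)–(6) can be mirrored directly in code. The following minimal sketch (all names such as BeliefBase and move_to are illustrative assumptions, not the paper's implementation) shows a belief base in which the MoveTo action checks its precondition against the current beliefs and rewrites them afterwards.

# Minimal sketch of precondition-checked basic actions over a belief base.
# All identifiers (BeliefBase, move_to, ...) are illustrative, not from the paper.
class BeliefBase:
    def __init__(self):
        self.facts = set()                      # e.g. ("pursuer", "self", 3, 4)

    def holds(self, fact):
        return fact in self.facts

    def update(self, remove=(), add=()):
        self.facts.difference_update(remove)
        self.facts.update(add)

def move_to(beliefs, x0, y0, x1, y1):
    """Basic action MoveTo(x1, y1) guarded by its precondition (5)."""
    pre = [("pursuer", "self", x0, y0), ("evaderAt", x1, y1)]
    if not all(beliefs.holds(f) for f in pre):
        return False                            # precondition violated, action skipped
    # Postcondition (6): the pursuer is now at (x1, y1); the evader's new
    # position will be added once it is sensed again.
    beliefs.update(remove={("pursuer", "self", x0, y0)},
                   add={("pursuer", "self", x1, y1)})
    return True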

3.1.4. Strategies. The strategy of a player refers to one of the options that can be chosen in a setting. The choice depends not only on the agent's own actions but also on the actions of other agents. A prior study [28] expressed a pure strategy as a function that, given an agent's current information set, returns the agent's next move (i.e., its change in location between times t_j and t_{j+1}). For example, if an evader has a pure strategy σ_0, then θ_0(t_{j+1}) = θ_0(t_j) + σ_0(I_0(t_j)), where θ_0(t_j) is the location of tracker agent 0 at time t_j and I_0(t_j) is the tracker's information set at time t_j. Suppose that a pursuer has a pure strategy set σ = (σ_1, ..., σ_k) and the locations of the tracker agents are θ(t) = (θ_1(t), ..., θ_k(t)); then θ(t_{j+1}) = θ(t_j) + σ(I_0(t_j)) for each time t_j. This work further refines the strategy by adding an algorithm for selecting a plan according to different conditions. The algorithm can be represented by a function that generates a plan, and the conditions include the beliefs of the agents or the current situation of the environment. The proposed strategy is denoted by

Strategy = ⟨conditions⟩ : function.   (7)
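As a concrete reading of (7) and of the pure-strategy update θ(t_{j+1}) = θ(t_j) + σ(I(t_j)), the sketch below pairs a condition predicate with a plan-generating function. The names (Strategy, select_plan, pure_strategy_step) are illustrative assumptions, not the paper's API.

# Sketch of Strategy = <conditions> : function, under assumed names.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Strategy:
    condition: Callable[[dict], bool]       # tests beliefs / environment state
    generate_plan: Callable[[dict], list]   # returns a plan (sequence of moves)

def select_plan(strategies, state):
    """Return the plan of the first strategy whose condition matches the state."""
    for s in strategies:
        if s.condition(state):
            return s.generate_plan(state)
    return []                               # no applicable strategy

# A pure strategy read as a displacement added to the current location.
def pure_strategy_step(theta_t, sigma, info_t):
    dx, dy = sigma(info_t)                  # sigma maps an information set to a move
    return (theta_t[0] + dx, theta_t[1] + dy)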

3.1.5. Plans. A plan is a method for an agent to approach its goal and is generated according to the agent's strategy. A plan consists of beliefs, actions, and a reward, as expressed in

Plan = ⟨belief⟩ ⟨basic action⟩ [⟨basic action⟩] ⟨reward⟩.   (8)

A plan can be one action or a set of actions for an agent, and the value of the plan is determined by the reward.

3.2. Case-Based Reasoning. According to previous studies [32, 33], a general case-based reasoning (CBR) cycle may include retrieving the most similar cases for a problem, reusing the information and knowledge in those cases to solve the problem, revising the proposed solution, and retaining the parts of this experience likely to be useful for future problem solving.

To solve a new problem, the cases that contain the most useful knowledge have to be identified first. Since the utility of a case cannot be evaluated directly a priori, the similarity between problem descriptions is used as a heuristic to estimate the expected utility of the cases. The quality of this measure is therefore crucial for the success of CBR applications. Numerous similarity measures are in use today, not only in CBR but also in fields such as data mining, pattern recognition, genetic algorithms, and machine learning. Euclidean distance is a typical similarity measure for objects in a Euclidean space [34]. Following this idea, this work devises a simple distance function to evaluate the similarity between two cases.

3.2.1. Case Description. A case stands for an agent's mental state and the result of its past output. Each case consists of a goal, beliefs, an action, a plan, and a reward. The goal is what the agent wants to achieve or realize. The beliefs describe the situation the agent is in. The action is what the agent does in that situation. The plan is the approach the agent takes to achieve the goal. The reward serves to evaluate the result of the plan.
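A case as described here maps naturally onto a small record type. The sketch below is an assumed minimal representation (field names chosen for illustration) that the retrieval and reward steps described later can operate on.

# Assumed minimal case record; field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Case:
    goal: str                                  # e.g. "CatchAllEvader"
    beliefs: tuple                             # numeric features describing the situation
    action: str                                # what the agent did
    plan: list = field(default_factory=list)   # sequence of actions/strategies
    reward: float = 0.0                        # quality of the outcome, 0..10

case_base: list[Case] = []                     # shared by all pursuers (see Section 4.1)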

3.2.2. Similarity Relation. This study proposes an evaluation function Sim, shown in (9), for measuring the similarity between cases a and b:

Sim_{ab} = Σ_{i=1}^{n} w_i · |a_i − b_i|.   (9)

In this equation, a and b represent a new problem and a case in the case base, respectively. The variable n is the number of features in case a. The weight w_i stands for the importance of the i-th feature, and the weight vector is freely defined by the user. a_i and b_i are the i-th features of cases a and b, respectively. In the case-retrieving stage of the CBR approach, the case with the smallest Sim value is always retrieved for reuse.
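A direct implementation of (9) and of the retrieve-smallest-Sim rule might look like the following sketch, reusing the Case record assumed above; the weights and feature encoding are left to the user, as the text states.

# Weighted feature distance from (9) and nearest-case retrieval.
def sim(a_features, b_features, weights):
    """Smaller value = more similar (Sim is a distance, not a similarity score)."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, a_features, b_features))

def retrieve(case_base, new_features, weights):
    """Return the stored case with the smallest Sim value, or None if the base is empty."""
    if not case_base:
        return None
    return min(case_base, key=lambda c: sim(new_features, c.beliefs, weights))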

3.3. Probabilistic Framework. Suppose that a finite two-dimensional environment EX with n_c square cells contains an unknown number of fixed obstacles. We also assume that the environment has n_p pursuers and n_e evaders. Let x_{p_k} and x_{e_i} be the cells occupied by pursuer k and evader i, where 1 ≤ k ≤ n_p and 1 ≤ i ≤ n_e. The pursuers and evaders are restricted to move to cells not occupied by obstacles. Each pursuer collects information about EX at discrete time instants t ∈ T = {1, 2, ..., t_end}.

Define x_{p_k}(t) ⊂ EX as the cell of pursuer k at time t and x_{e_i}(t) ⊂ EX as the cell of evader i at time t, and let o(t) be the set of cells where obstacles are detected. Let U(x_{e_i}(t)) denote the one-step reachable set of the evader when it is in cell x_{e_i}(t) at time t. The probability of the evader being in cell x at time t + 1 is then given by

p(x, t+1 | x_{e_i}(t)) = 1 / |U(x_{e_i}(t)) − o(t)|   if x ∈ U(x_{e_i}(t)),
p(x, t+1 | x_{e_i}(t)) = 0                            if x ∈ o(t),   (10)

where p(x, t+1 | x_{e_i}(t)) is the probability of the evader being in cell x at time t + 1, given the location at which the pursuer detected the evader at time t.
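The uniform distribution over the evader's reachable, obstacle-free neighbours in (10) is straightforward to compute on a grid. The sketch below assumes 8-connected one-step moves (including staying put), which matches the 3 × 3 reachable set used later; the helper names are illustrative.

# Evader location probabilities from (10), assuming a 3x3 one-step reachable set.
def one_step_reachable(cell, grid_size):
    x, y = cell
    n = grid_size
    return {(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if 0 <= x + dx < n and 0 <= y + dy < n}

def evader_probabilities(evader_cell, obstacles, grid_size):
    """Map each candidate cell to p(x, t+1 | x_e(t)) as in (10)."""
    reachable = one_step_reachable(evader_cell, grid_size)
    free = reachable - obstacles
    p = 1.0 / len(free) if free else 0.0
    return {cell: (p if cell in free else 0.0) for cell in reachable}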

3.4. Strategies for Plan Generation. When the distance between the retrieved case and the new case exceeds a threshold, that is, when no sufficiently similar case exists, the above approach cannot generate a useful plan for the new case. To overcome this problem, this study proposes two plan-generation strategies: the local-max strategy and the local-cooperative strategy.

3.4.1. Local-Max Strategy. This strategy is used only when there is a single pursuer for the evaders. Let S_k(y) be the set of all cells that lie within the sensing area of pursuer k when it is at cell y. The total evasion probability of the evaders at time t associated with the pursuer is then

P(y, t) = Σ_{z ∈ S_k(y)} p(z, t+1 | x_{e_i}(t)).   (11)

By computing the evasion probabilities of all sensing cells, pursuer k moves to the cell x_{p_k}(t+1) that has the highest total evasion probability:

x_{p_k}(t+1) = arg max_{y ∈ U(x_{p_k}(t))} P(y, t).   (12)

3.4.2. Local-Cooperative Strategy. This strategy is for pursuers that cooperate with each other to catch the same evader. Initially, the pursuers need to decide whether they can cooperate with each other. Let d(x, y) be the Chebyshev (chessboard) distance from cell x to cell y; that is, if cells x and y lie in a two-dimensional space with coordinates (x_1, x_2) and (y_1, y_2), the distance is calculated as

d(x, y) = max(|x_1 − y_1|, |x_2 − y_2|).   (13)

Let s_k be the farthest sensing distance of pursuer k. For example, if the sensing area of pursuer k is a 5 × 5 square and the pursuer stands at the center of the area, the farthest distance the pursuer can detect is two cells. If there exist two or more pursuers and one evader such that d(x_{p_k}(t), x_{e_i}(t)) ≤ s_k holds for each such pursuer k, the pursuers can cooperate with each other.

Suppose that two or more pursuers execute the local-cooperative strategy. Let Overlap(x, y) be the number of overlapping cells between the one-step reachable sets (3 × 3 regions) around cells x and y. The overlap function is illustrated in Figure 2: the yellow grid represents the one-step reachable set of pursuer P1, the dashed grid depicts the one-step reachable set of pursuer P2, and the green cell represents the position of evader E. In that example the number of overlapping cells, that is, the return value of Overlap(x, y), equals four.

Figure 2: Number of overlap cells.

Once the pursuers can cooperate with each other, they randomly select one of them as the leader. The other pursuers then determine their locations at time t + 1 by

x_{p_k}(t+1) = arg min_{y ∈ U(x_{p_k}(t))} Overlap(x_{p_l}(t+1), y),   (14)

where x_{p_l}(t+1) is the location of the leader. To avoid pursuers moving to the same location, each pursuer seeks the minimum overlap area.
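A small sketch of (13)-(14): follower pursuers evaluate their one-step moves against the leader's announced next cell and pick the move whose 3 × 3 reachable region overlaps the leader's the least. It reuses one_step_reachable from the earlier sketch, the leader is assumed to be chosen at random as the text states, and the function names are illustrative.

import random

# Local-cooperative strategy (13)-(14): minimize overlap with the leader's
# one-step reachable set so the team spreads its sensing coverage.
def chebyshev(a, b):
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))        # distance (13)

def overlap(x, y, grid_size):
    return len(one_step_reachable(x, grid_size) & one_step_reachable(y, grid_size))

def cooperative_moves(follower_cells, leader_next, obstacles, grid_size):
    """Return a next cell for every non-leader pursuer, per (14)."""
    moves = {}
    for cell in follower_cells:
        candidates = one_step_reachable(cell, grid_size) - obstacles
        moves[cell] = min(candidates, key=lambda y: overlap(leader_next, y, grid_size))
    return moves

# The leader itself would be picked at random among the cooperating pursuers,
# e.g. leader = random.choice(list(pursuer_cells)).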

3.5. Reward Calculation. Each case has its own reward for evaluating its result. The higher the reward of a case, the more likely the goal will be achieved. In this study, the reward is calculated as

R = |S_k(y) ∩ U(x_{e_i}(t))|.   (15)

Reward R is the number of cells of the evader's one-step reachable set that lie within the pursuer's sensing area. The reward value is between 0 and 10; if an evader is caught, the reward of the case is set to 10.
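Equation (15) can be computed with the same grid helpers defined in the earlier sketches; a short, assumed illustration:

# Reward (15): how much of the evader's next-step options the pursuer can see.
def reward(pursuer_cell, evader_cell, grid_size, caught=False):
    if caught:
        return 10                               # capture gets the maximum reward
    covered = sensing_area(pursuer_cell, grid_size) & one_step_reachable(evader_cell, grid_size)
    return len(covered)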

3.6. Assimilation and Accommodation. Piaget's cognitive theory [18] holds that assimilation refers to the tendency to interpret experience as much as possible through the existing cognitive structure of knowing. When an agent faces a new situation, it generates a new plan with its current cognitive structure, also called its strategy, to direct what it does next; this entire process is called assimilation. In other words, the agent assimilates a new situation into its current cognitive structure. However, if the current cognitive structure cannot explain the environment (i.e., it cannot reach equilibration given inconsistent information about the world), the agent has to resort to accommodation. Accommodation refers to the realization that the current structure is insufficient for an adequate understanding of the world and must be changed until it can assimilate the new situation.


This study proposes two accommodation methods, the bevel-angle and positive-angle strategies, for modifying a cognitive structure. After performing accommodation, an agent employs the modified strategy to assimilate the new situation. By constantly adjusting its cognitive structure, the agent can adapt more effectively to the environment.

3.6.1. Bevel-Angle Strategy. Suppose that an evader's location is in the one-step reachable set of a pursuer and the evader is at a bevel angle of the pursuer, that is, at 45° (π/4), 135° (3π/4), 225° (5π/4), or 315° (7π/4) from the pursuer. The pursuer then randomly chooses one of the following strategies. Let deg(x_{p_k}(t), x_{e_i}(t)) be the angle between pursuer k and evader i, where x_{e_i}(t) is the position of the evader and x_{p_k}(t) is the position of the pursuer at time t, and let x_j denote an element of the one-step reachable set U(x_{p_k}(t)).

Strategy 1. According to (16), the pursuer moves to the evader's position at time t + 1:

x_{p_k}(t+1) = x_{e_i}(t),
if x_{e_i}(t) ∈ U(x_{p_k}(t)) ∧ deg(x_{p_k}(t), x_{e_i}(t)) ∈ {π/4, 3π/4, 5π/4, 7π/4}.   (16)

Strategy 2. According to (17), the pursuer moves to a cell 45° beyond the evader's bearing at time t + 1:

x_{p_k}(t+1) = x_j,
if x_{e_i}(t) ∈ U(x_{p_k}(t)) ∧ deg(x_{p_k}(t), x_{e_i}(t)) ∈ {π/4, 3π/4, 5π/4, 7π/4}
∧ x_j ∈ U(x_{p_k}(t)) ∧ deg(x_{p_k}(t), x_j) = deg(x_{p_k}(t), x_{e_i}(t)) + 45°.   (17)

Strategy 3. According to (18), the pursuer moves to a cell −45° from the evader's bearing at time t + 1:

x_{p_k}(t+1) = x_j,
if x_{e_i}(t) ∈ U(x_{p_k}(t)) ∧ deg(x_{p_k}(t), x_{e_i}(t)) ∈ {π/4, 3π/4, 5π/4, 7π/4}
∧ x_j ∈ U(x_{p_k}(t)) ∧ deg(x_{p_k}(t), x_j) = deg(x_{p_k}(t), x_{e_i}(t)) − 45°.   (18)

Strategy 4. According to (19), the pursuer stays in the same place at time t + 1:

x_{p_k}(t+1) = x_{p_k}(t),
if x_{e_i}(t) ∈ U(x_{p_k}(t)) ∧ deg(x_{p_k}(t), x_{e_i}(t)) ∈ {π/4, 3π/4, 5π/4, 7π/4}.   (19)

3.6.2. Positive-Angle Strategy. Suppose that an evader's location is in the one-step reachable set of a pursuer and the evader is at a positive angle of the pursuer, where the angle is 0°, 90° (π/2), 180° (π), or 270° (3π/2).

Strategy 1. According to (20), the pursuer moves to the evader's position at time t + 1:

x_{p_k}(t+1) = x_{e_i}(t),
if x_{e_i}(t) ∈ U(x_{p_k}(t)) ∧ deg(x_{p_k}(t), x_{e_i}(t)) ∈ {0, π/2, π, 3π/2}.   (20)

Strategy 2. According to (21), the pursuer moves to a cell 45° beyond the evader's bearing at time t + 1:

x_{p_k}(t+1) = x_j,
if x_{e_i}(t) ∈ U(x_{p_k}(t)) ∧ deg(x_{p_k}(t), x_{e_i}(t)) ∈ {0, π/2, π, 3π/2}
∧ x_j ∈ U(x_{p_k}(t)) ∧ deg(x_{p_k}(t), x_j) = deg(x_{p_k}(t), x_{e_i}(t)) + 45°.   (21)

Strategy 3. According to (22), the pursuer moves to a cell −45° from the evader's bearing at time t + 1:

x_{p_k}(t+1) = x_j,
if x_{e_i}(t) ∈ U(x_{p_k}(t)) ∧ deg(x_{p_k}(t), x_{e_i}(t)) ∈ {0, π/2, π, 3π/2}
∧ x_j ∈ U(x_{p_k}(t)) ∧ deg(x_{p_k}(t), x_j) = deg(x_{p_k}(t), x_{e_i}(t)) − 45°.   (22)

Strategy 4. According to (23), the pursuer moves to a cell 90° beyond the evader's bearing at time t + 1:

x_{p_k}(t+1) = x_j,
if x_{e_i}(t) ∈ U(x_{p_k}(t)) ∧ deg(x_{p_k}(t), x_{e_i}(t)) ∈ {0, π/2, π, 3π/2}
∧ x_j ∈ U(x_{p_k}(t)) ∧ deg(x_{p_k}(t), x_j) = deg(x_{p_k}(t), x_{e_i}(t)) + 90°.   (23)

Strategy 5. According to (24), the pursuer moves to a cell −90° from the evader's bearing at time t + 1:

x_{p_k}(t+1) = x_j,
if x_{e_i}(t) ∈ U(x_{p_k}(t)) ∧ deg(x_{p_k}(t), x_{e_i}(t)) ∈ {0, π/2, π, 3π/2}
∧ x_j ∈ U(x_{p_k}(t)) ∧ deg(x_{p_k}(t), x_j) = deg(x_{p_k}(t), x_{e_i}(t)) − 90°.   (24)

Strategy 6. According to (25), the pursuer stays in the same place at time t + 1:

x_{p_k}(t+1) = x_{p_k}(t),
if x_{e_i}(t) ∈ U(x_{p_k}(t)) ∧ deg(x_{p_k}(t), x_{e_i}(t)) ∈ {0, π/2, π, 3π/2}.   (25)

Combining the bevel-angle and positive-angle strategies, the pursuer obtains 24 different plans with which to adapt to the environment effectively.
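The two accommodation families can be read as small tables of angular offsets relative to the evader's bearing. The sketch below is one assumed reading (8-connected grid, bearings in multiples of 45°, names chosen for illustration) of how a pursuer might enumerate and apply these candidate moves.

import math, random

# Angle-offset reading of the bevel-angle / positive-angle strategies (16)-(25).
# Offsets are relative to the evader's bearing; "stay" keeps the current cell.
BEVEL_OFFSETS    = [0, +45, -45, "stay"]                 # evader at 45/135/225/315 degrees
POSITIVE_OFFSETS = [0, +45, -45, +90, -90, "stay"]        # evader at 0/90/180/270 degrees

def bearing_deg(p, e):
    """Angle from pursuer cell p to evader cell e, in degrees [0, 360)."""
    return math.degrees(math.atan2(e[1] - p[1], e[0] - p[0])) % 360

def accommodate_move(pursuer, evader):
    ang = bearing_deg(pursuer, evader)
    offsets = BEVEL_OFFSETS if ang % 90 == 45 else POSITIVE_OFFSETS
    off = random.choice(offsets)                          # strategy chosen at random
    if off == "stay":
        return pursuer
    a = math.radians(ang + off)
    return (pursuer[0] + round(math.cos(a)), pursuer[1] + round(math.sin(a)))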

3.7. Plan Evolution Process. Plan evolution plays an important role in an agent's life cycle. When a pursuer finds an evader in its sensing area, it generates a pursuit plan. The pursuer first searches its case base for similar cases; if similar cases exist, the pursuer chooses one of them as the plan, and otherwise it uses one of the abovementioned strategies to generate a plan. When the pursuer faces a new state, it adjusts the plan accordingly. Figure 3 shows the agent's plan evolution process, which consists of the following steps.

(1) Problem Analysis. Before determining its location in the next step, a pursuer analyzes the environmental state and updates its goal with information from the environment. The pursuer chooses a plan according to its mental state.

(2) Case Retrieval. Comparing the result of the analysis in Step 1 with the cases in the case base, the pursuer retrieves similar cases using the Sim function and sorts them according to their similarity levels.

(3) Similarity Calculation. Similarity is classified into low and high levels. Because the pursuer may retrieve a case with high-level similarity yet obtain unsatisfactory results, the reward R is also checked. If the case has high-level similarity and its reward R exceeds a predefined constant ε, the pursuer executes Step 4. If the case has low-level similarity, or the reward R is smaller than ε, the pursuer executes Step 5.

(4) Case Reuse. If the retrieved case has high-level similarity and its result is acceptable, the pursuer reuses the case with revisions and then goes to Step 10.

(5) Strategy Assimilation. If no similar case exists in the case base, or the result of the similar case is unsatisfactory, the pursuer uses strategy assimilation; that is, the pursuer generates a suitable plan with the current strategy (Steps 6–8) according to the situation.

(6) Cooperation. Before using the strategy module, the pursuer decides whether to cooperate with the others. If it decides to cooperate, the local-cooperative strategy (Step 8) is executed; otherwise, the pursuer executes the local-max strategy (Step 7).

(7) Application of the Local-Max Strategy. If the pursuer decides not to cooperate with the others, it executes the local-max strategy and moves to the cell in its one-step reachable set with the highest probability of containing an evader. Then go to Step 9.

(8) Application of the Local-Cooperative Strategy. If the cooperative mode is taken, the pursuer executes the local-cooperative strategy and moves to the cell that has the minimum overlap with the other pursuers' sensing areas.

(9) Reward Calculation. The pursuer evaluates the plan and calculates a reward accordingly.

(10) Plan Transformation. The pursuer represents the plan by beliefs, actions, and rewards.

(11) Strategy Assessment. The pursuer evaluates whether the current strategy is suitable by counting the number C of consecutive failures in catching an evader with the strategy. If C is smaller than a constant σ, the strategy is considered effective; then go to Step 13. Otherwise, the strategy is accommodated in Step 12.

(12) Strategy Accommodation. The pursuer modifies the strategy to generate a new plan for the current environment.

(13) Case Revision. The actions and reward of the case are revised after the problem is solved.


Figure 3: Plan evolution process (Steps 1–14, branching on R > ε versus R ≤ ε and on C < σ versus C ≥ σ, interacting with the case base).

(14) Case Retention. Parts of the experience that are likely to be useful for future problem solving are retained in the case base.
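The fourteen steps amount to a CBR loop with an assimilation/accommodation escape hatch. The condensed sketch below is an assumed illustration of one decision cycle; it reuses the Case record and retrieve helper sketched in Section 3.2, the thresholds EPSILON (reward) and SIGMA (consecutive failures) are assumed constants, and the callables in ops are placeholders the surrounding agent would supply.

# One decision cycle of the plan evolution process (Steps 1-14), condensed.
EPSILON, SIGMA = 5, 3          # assumed reward and failure thresholds

def plan_evolution_step(state, case_base, weights, failures, ops):
    """ops: dict of callables supplied by the agent: analyze, similar, revise,
    cooperate, coop_plan, max_plan, reward, accommodate."""
    features = ops["analyze"](state)                           # Step 1
    case = retrieve(case_base, features, weights)              # Step 2
    if case and ops["similar"](case, features) and case.reward > EPSILON:
        plan = ops["revise"](case.plan, state)                 # Steps 3-4: reuse
    elif ops["cooperate"](state):                              # Steps 5-6: assimilation
        plan = ops["coop_plan"](state)                         # Step 8
    else:
        plan = ops["max_plan"](state)                          # Step 7
    r = ops["reward"](plan, state)                             # Steps 9-10
    if failures >= SIGMA:                                      # Steps 11-12: accommodation
        plan = ops["accommodate"](state)
    case_base.append(Case("CatchAllEvader", features,
                          plan[0] if plan else "stay", plan, r))   # Steps 13-14
    return plan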

4. Case Study

This study uses a pursuit-evasion game to demonstrate the proposed approach and implements a multiagent system on the Recursive Porous Agent Simulation Toolkit (REPAST) [35]. Figure 4 shows the GUI of the system, where the blue, red, and black circles represent pursuers, evaders, and obstacles, respectively. The yellow region depicts the sensing area of the pursuers.

4.1. Problem Formulation. The system contains multiple pursuing robots that search for multiple evading robots in a two-dimensional environment. The physical world is abstracted as a grid world, and the game is played according to the following settings and rules:

(i) The number of pursuers is n. The number of evaders is unknown to the pursuers.
(ii) Each pursuer has a sensing area comprising 5 × 5 cells, and the pursuer is at the center of the area, as shown in Figure 4.
(iii) During each moving step, the agents may either move one cell away from their current cells or stay in their current cells.
(iv) Each pursuer has no information about the environment other than that within its own 5 × 5 sensing area.
(v) All pursuers use the same case base.
(vi) An evader is captured when it occupies the same cell as a pursuer, and it is then removed from the environment.
(vii) The game ends when all evaders have been captured.

Figure 4: Implementation of the multiagent system.

4.2. Evasion Policy. When an evader is caught, the system first checks whether the sum of the caught and uncaught evaders is smaller than the total number of predefined evaders. If so, the system generates a new evader and chooses its type of movement; otherwise, the system does nothing. Clearly, when the sum of the caught and uncaught evaders equals the number of predefined evaders, the system stops generating new evaders; the number of uncaught evaders then no longer increases and in fact decreases with each successful capture, so the game eventually ends. In addition, when the system selects a movement type for a new evader, the choice varies across experiments, which in turn affects the outcome. This study investigates four types of movement, namely, random, clockwise, counterclockwise, and smart movement.

Random Movement. An evader randomly selects a location that can be reached in one step or simply stays in the same location. The details are shown in Algorithm 1.

Clockwise Movement. An evader moves clockwise along a square path. The evader initially selects the direction of its first step and sets a distance at random. If the evader runs into an obstacle on the way, it changes its direction automatically. Algorithm 2 shows the algorithm.

Counterclockwise Movement. This movement type is similar to clockwise movement except that the evader moves counterclockwise. The algorithm is shown in Algorithm 3.

Smart Movement. Evaders actively try to avoid pursuers. Each evader has a square sensing area whose sides are three cells long. If the evader finds any pursuer in this area, it attempts to move to the location in its one-step reachable set with the fewest pursuers nearby. Algorithm 4 presents the algorithm.

4.3. Results and Discussion. This study designs several experiments to compare the performance of the proposed approach, Hysteretic Q-Learning [36], and Reinforcement-Learning Fusing (RL-Fusing) [37], all of which are learning techniques for multiagent systems. Each experiment has three pursuers and one evader, and the experiments run in "time step" units. A round is considered finished when the evader has been caught. The reported results are averages over 30 rounds with the current learning policy after 100 training episodes for each algorithm.

4.3.1. Comparison of Approaches. Matignon et al. presented a decentralized reinforcement learning algorithm for independent learning robots called Hysteretic Q-Learning [36]. This algorithm computes a better policy in a cooperative multiagent system without additional information or communication between agents. Hysteretic Q-Learning updates the value of an agent i executing action a_i from state s and ending up in state s' as follows:

δ ← r + γ max_{a'} Q_i(s', a') − Q_i(s, a_i),
Q_i(s, a_i) ← Q_i(s, a_i) + inc · δ   if δ > 0,
Q_i(s, a_i) ← Q_i(s, a_i) + dec · δ   otherwise,   (26)

where Q_i(s, a_i) is the action value function of state s and action a_i, r is the reward the agent receives, γ ∈ [0, 1] is the discount factor, and inc and dec are the rates of increase and decrease of the Q-values, respectively. Partalas et al. proposed the RL-Fusing approach, which uses coordinated actions and a fusing process to guide the agents [37]. The fusion process combines the decisions of all agents and outputs a global decision that the agents must follow as a team. Algorithm 5 shows the RL-Fusing algorithm; the set of strategies corresponds to the plans.
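For reference, the asymmetric update in (26) is only a few lines of code. The sketch below is a generic illustration with assumed learning-rate values, not the experimental implementation used in the paper.

# Hysteretic Q-Learning update (26); inc > dec makes the agent "optimistic",
# so positive errors change Q faster than negative ones. Rates are assumed values.
from collections import defaultdict

GAMMA, INC, DEC = 0.9, 0.9, 0.1
Q = defaultdict(float)                        # Q[(state, action)]

def hysteretic_update(state, action, reward_r, next_state, actions):
    best_next = max(Q[(next_state, a)] for a in actions)
    delta = reward_r + GAMMA * best_next - Q[(state, action)]
    Q[(state, action)] += (INC if delta > 0 else DEC) * delta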

4.3.2. Obstacles. The first experiment compares the performance of the proposed approach, Hysteretic Q-Learning, and RL-Fusing in environments with and without obstacles. The evader is assumed to move randomly in every round. In the environment with obstacles, 10 obstacles are placed at random initially, and no evader is trapped inside the obstacles.

Table 1 shows the results. All three methods require more capture steps in the environment with obstacles because the pursuers need extra time to detect and avoid them. Hysteretic Q-Learning takes the greatest number of capture steps in both environments because it does not support pursuer cooperation. RL-Fusing also takes more capture steps than the proposed scheme: although it involves cooperation between pursuers, its generated strategies sometimes cause a pursuer to lose evaders from its sensing area, forcing it to search again. In contrast, the proposed approach utilizes not only the cooperative strategy but also assimilation and accommodation, which enable a pursuer to learn from both success and failure experiences. The pursuer can thus keep pace with an evader and catch it in fewer capture steps.

4.3.3. Movement Types. The second experiment measures the average number of capture steps required by the proposed approach, Hysteretic Q-Learning, and RL-Fusing under two types of evader movement: clockwise movement, and a random mixture of clockwise and counterclockwise movement.

The average number of capture steps required by the three approaches for the different movement types is shown in Table 2. As in the first experiment, the proposed scheme outperforms Hysteretic Q-Learning and RL-Fusing. Among the three approaches, Hysteretic Q-Learning requires the highest number of capture steps, again because it does not support agent cooperation.


while true do
  next_location ← randomly generate a location adjacent to the evader
  if next_location contains an obstacle then
    next_location ← randomly generate another location adjacent to the evader
  end if
end while

Algorithm 1: Random movement algorithm.

direction ← randomly generate a direction
distance ← randomly generate a distance shorter than the environment size
move ← 0
while true do
  get the next location according to the direction
  move ← move + 1
  if move equals distance or the next location has an obstacle then
    the evader turns right and changes its direction
    move ← 0
  end if
end while

Algorithm 2: Clockwise movement algorithm.

direction ← randomly generate a direction
distance ← randomly generate a distance shorter than the environment size
move ← 0
while true do
  get the next location according to the direction
  move ← move + 1
  if move equals distance or the next location has an obstacle then
    the evader turns left and changes its direction
    move ← 0
  end if
end while

Algorithm 3: Counterclockwise movement algorithm.

while true do
  if the evader's sensing area contains a pursuer then
    next_location ← generate the location with the fewest pursuers detected
  else
    next_location ← randomly generate a location adjacent to the evader
  end if
end while

Algorithm 4: Smart movement algorithm.

According to the RL-Fusing algorithm presented in Algorithm 5, when a state is not a coordinated state, the agent acts with a random or predefined policy, so a single pursuer has no learning ability. The possibility of catching an evader is therefore low, and RL-Fusing also takes more capture steps. In contrast, when a pursuer of the proposed scheme faces an evader, it observes the evader continuously. If the evader changes its movement type, the pursuer uses accommodation to change its catching strategy; if the movement type remains fixed, the pursuer keeps the same strategy by assimilation.


Require: a set of strategies, an initial state s_0
s ← s_0
while true do
  if s is a coordinated state then
    ρ ← RandomReal(0, 1)
    if ρ < ε then
      select a strategy randomly (the same for all agents)
    else
      rank strategies
      communicate ranks
      receive ranks from other agents
      average ranks
      select the corresponding strategy
    end if
    execute the selected strategy
    receive reward
    transit to a new state s'
    update Q-value
  else
    act with a predefined or random policy
  end if
end while

Algorithm 5: RL-Fusing algorithm.

Table 1: Average number of capture steps in different environments.

Method                  Grid size   Without obstacles (steps)   With obstacles (steps)
Hysteretic Q-Learning   12 × 12     68                          73
RL-Fusing               12 × 12     61                          65
The proposed approach   12 × 12     24                          27

Table 2: Average number of capture steps for different movement types.

Method                  Grid size   Clockwise movement (steps)   Random clockwise and counterclockwise movement (steps)
Hysteretic Q-Learning   12 × 12     56                           69
RL-Fusing               12 × 12     48                           56
The proposed approach   12 × 12     26                           32

As seen in Table 2, all three methods require more steps to capture evaders with randomly mixed clockwise and counterclockwise movement, because the pursuers need extra time to detect the evaders' moving directions.

Table 3: Average number of capture steps for evaders with smart movement in different grid sizes.

Method                  Grid size 12 × 12 (steps)   Grid size 24 × 24 (steps)
Hysteretic Q-Learning   74                          212
RL-Fusing               63                          300
The proposed approach   63                          119

Table 4: Average number of capture steps with/without assimilation and accommodation.

                                         Clockwise evader movement   Smart evader movement
With assimilation and accommodation      186                         278
Without assimilation and accommodation   242                         364

4.3.4. Smart Movement and Size of Grid. The third experiment involves evaders with smart movement; the evaders actively try to avoid pursuers when they sense any pursuer. The environment has two grid sizes, 12 × 12 and 24 × 24.

Table 3 shows that the proposed scheme requires fewer capture steps in both grid sizes than Hysteretic Q-Learning and RL-Fusing. When the grid size is smaller, RL-Fusing outperforms Hysteretic Q-Learning; however, as the grid size becomes larger, RL-Fusing requires more capture steps than Hysteretic Q-Learning.


Figure 5: Average number of capture steps under different training episodes (10–40) and grid sizes (12 × 12 and 24 × 24).

This is because RL-Fusing depends mainly on its cooperative strategies to catch evaders, but not all of these strategies guide the pursuers to move close to the evaders. For example, strategy 3 in RL-Fusing [37] has all the pursuers keep a distance of 3 from an evader; because the sensing area of a pursuer is a 5 × 5 square, the pursuers easily lose the evader in a larger environment. In contrast, the proposed approach helps the pursuers move as close to the evader as possible while avoiding overlapping sensing areas, and thus has the highest chance of catching the evader.

4.3.5. Smart Movement, Training Episodes, and Size of Grid. The fourth experiment investigates the relationship between training episodes and grid sizes when using the proposed approach.

Figure 5 shows that, regardless of grid size (12 × 12 or 24 × 24), the greater the number of training episodes, the smaller the average number of capture steps required. This is because more training enriches the case base, enabling pursuers to make better strategies and decisions.

4.3.6. Assimilation and Accommodation. The last experiment evaluates the proposed approach with and without assimilation and accommodation, for two types of evader movement: clockwise and smart. The results are listed in Table 4. Whether or not assimilation and accommodation are used, the average number of capture steps for smart movement is greater than that for clockwise movement. This is because smart movement enables an evader to change its movement and position dynamically, whereas the movement of a clockwise evader can be easily predicted, so the pursuers need fewer steps for capture. Furthermore, for both movement types, the average number of capture steps required with assimilation and accommodation is smaller than without. The approach without assimilation and accommodation generates a fixed plan which cannot be changed. In contrast, with assimilation and accommodation, the first pursuer uses the local-max strategy to find a better position, and the remaining pursuers determine their cooperative positions according to the first pursuer's position; these pursuers can adjust their positions according to different catching situations and strategies. Accordingly, the approach with assimilation and accommodation requires fewer capture steps than the approach without.

5. Conclusion

This work proposes an effective strategy-learning approach to the pursuit-evasion game in a dynamic environment. The method enables a pursuer to learn and implement effective plans from both success and failure experiences even when no knowledge of the environment is available. The evolved plans fare extremely well when compared with some of the best constructed strategies. This study also demonstrates the superior performance of the proposed approach for the pursuit-evasion game by implementing a multiagent system on REPAST.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported by the Ministry of Science and Technology under Grant no. 102-2218-E-027-007.

References

[1] K. P. Sycara, "Multiagent systems," AI Magazine, vol. 19, no. 2, pp. 79–92, 1998.
[2] N. Vlassis, A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence, Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool, 2007.
[3] A. Antoniades, H. J. Kim, and S. Sastry, "Pursuit-evasion strategies for teams of multiple agents with incomplete information," in Proceedings of the 42nd IEEE Conference on Decision and Control, pp. 756–761, December 2003.
[4] L. J. Guibas, J.-C. Latombe, S. M. LaValle, D. Lin, and R. Motwani, "A visibility-based pursuit-evasion problem," International Journal of Computational Geometry & Applications, vol. 9, no. 4-5, pp. 471–493, 1999.
[5] J. P. Hespanha, G. J. Pappas, and M. Prandini, "Greedy control for hybrid pursuit-evasion games," in Proceedings of the European Control Conference, pp. 2621–2626, 2001.
[6] R. Vidal, O. Shakernia, H. J. Kim, D. H. Shim, and S. Sastry, "Probabilistic pursuit-evasion games: theory, implementation, and experimental evaluation," IEEE Transactions on Robotics and Automation, vol. 18, no. 5, pp. 662–669, 2002.
[7] R. E. Korf, "A simple solution to pursuit games," in Proceedings of the 2nd International Workshop on Distributed Artificial Intelligence, pp. 183–194, 1992.
[8] S. Nolfi and D. Floreano, "Coevolving predator and prey robots: do "arms races" arise in artificial evolution?," Artificial Life, vol. 4, no. 4, pp. 311–335, 1998.
[9] N. J. Savill and P. Hogeweg, "Evolutionary stagnation due to pattern-pattern interactions in a coevolutionary predator-prey model," Artificial Life, vol. 3, no. 2, pp. 81–100, 1997.
[10] J. Y. Kuo, F.-C. Huang, S.-P. Ma, and Y.-Y. Fanjiang, "Applying hybrid learning approach to RoboCup's strategy," Journal of Systems and Software, vol. 86, no. 7, pp. 1933–1944, 2013.
[11] S. I. Nishimura and T. Ikegami, "Emergence of collective strategies in a prey-predator game model," Artificial Life, vol. 3, no. 4, pp. 243–260, 1998.
[12] K.-C. Jim and C. L. Giles, "Talking helps: evolving communicating agents for the predator-prey pursuit problem," Artificial Life, vol. 6, no. 3, pp. 237–254, 2000.
[13] T. Haynes and S. Sen, "Evolving behavioral strategies in predators and prey," in Adaption and Learning in Multi-Agent Systems, vol. 1042 of Lecture Notes in Computer Science, pp. 113–126, Springer, Berlin, Germany, 1995.
[14] D. Hladek, J. Vascak, and P. Sincak, "Hierarchical fuzzy inference system for robotic pursuit evasion task," in Proceedings of the 6th International Symposium on Applied Machine Intelligence and Informatics (SAMI '08), pp. 273–277, Herl'any, Slovakia, January 2008.
[15] R. G. Smith, "The contract net protocol: high-level communication and control in a distributed problem solver," IEEE Transactions on Computers, vol. 19, no. 12, pp. 1104–1113, 1980.
[16] P.-C. Zhou, B.-R. Hong, Y.-H. Wang, and T. Zhou, "Multi-agent cooperative pursuit based on extended contract net protocol," in Proceedings of the International Conference on Machine Learning and Cybernetics, vol. 1, pp. 169–173, August 2004.
[17] A. Rao and M. Georgeff, "BDI agents: from theory to practice," in Proceedings of the International Conference on Multi-Agent Systems, pp. 312–319, 1995.
[18] J. Piaget, The Equilibration of Cognitive Structures: The Central Problem of Intellectual Development, University of Chicago Press, 1985.
[19] J. Y. Kuo, S. J. Lee, C. L. Wu, N. L. Hsueh, and J. Lee, "Evolutionary agents for intelligent transport systems," International Journal of Fuzzy Systems, vol. 7, no. 2, pp. 85–93, 2005.
[20] J. Y. Kuo and F. C. Huang, "A hybrid approach for multi-agent learning systems," Intelligent Automation and Soft Computing, vol. 17, no. 3, pp. 385–399, 2011.
[21] S. Gebhardt, P. Grant, R. von Georgi, and M. T. Huber, "Aspects of Piaget's cognitive developmental psychology and neurobiology of psychotic disorders—an integrative model," Medical Hypotheses, vol. 71, no. 3, pp. 426–433, 2008.
[22] S. Takamuku and R. C. Arkin, "Multi-method learning and assimilation," Robotics and Autonomous Systems, vol. 55, no. 8, pp. 618–627, 2007.
[23] T. Parsons, "Pursuit-evasion in a graph," in Theory and Applications of Graphs, vol. 642 of Lecture Notes in Mathematics, pp. 426–441, Springer, Heidelberg, Germany, 1978.
[24] R. Breisch, "An intuitive approach to speleotopology," Southwestern Cavers, vol. 6, no. 5, pp. 72–78, 1967.
[25] F. Ferrari, "A new parameterized potential family for path planning algorithms," International Journal on Artificial Intelligence Tools, vol. 18, no. 6, pp. 949–957, 2009.
[26] K. Walsh and B. Banerjee, "Fast A* with iterative resolution for navigation," International Journal on Artificial Intelligence Tools, vol. 19, no. 1, pp. 101–119, 2010.
[27] L.-P. Wong, M. Y. H. Low, and C. S. Chong, "Bee colony optimization with local search for traveling salesman problem," International Journal on Artificial Intelligence Tools, vol. 19, no. 3, pp. 305–334, 2010.
[28] E. Raboin, D. S. Nau, U. Kuter, S. K. Gupta, and P. Svec, "Strategy generation in multi-agent imperfect-information pursuit games," in Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, pp. 947–954, 2010.
[29] B. P. Gerkey, S. Thrun, and G. Gordon, "Visibility-based pursuit-evasion with limited field of view," International Journal of Robotics Research, vol. 25, no. 4, pp. 299–315, 2006.
[30] N. Basilico, N. Gatti, and F. Amigoni, "Leader-follower strategies for robotic patrolling in environments with arbitrary topologies," in Proceedings of the 8th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS '09), pp. 57–64, May 2009.
[31] M. Yan, "Multi-robot searching using game-theory based approach," International Journal of Advanced Robotic Systems, vol. 5, no. 4, pp. 341–350, 2008.
[32] K.-D. Althoff, "Case-based reasoning," in Handbook on Software Engineering and Knowledge Engineering, pp. 549–587, 2001.
[33] E. K. Burke, B. L. MacCarthy, S. Petrovic, and R. Qu, "Multiple-retrieval case-based reasoning for course timetabling problems," Journal of the Operational Research Society, vol. 57, no. 2, pp. 148–162, 2006.
[34] J. Long, D. G. Schwartz, S. Stoecklin, and M. K. Patel, "Application of loop reduction to learning program behaviors for anomaly detection," in Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC '05), vol. 1, pp. 691–696, April 2005.
[35] The Repast Suite, 2014, http://repast.sourceforge.net.
[36] L. Matignon, G. J. Laurent, and N. Le Fort-Piat, "Hysteretic Q-Learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '07), pp. 64–69, San Diego, Calif, USA, October–November 2007.
[37] I. Partalas, I. Feneris, and I. Vlahavas, "A hybrid multiagent reinforcement learning approach using strategies and fusion," International Journal on Artificial Intelligence Tools, vol. 17, no. 5, pp. 945–962, 2008.


be performed only if certain beliefs are held. For example, a pursuer is assumed to catch an evader by moving from location $(X_0, Y_0)$ to location $(X_1, Y_1)$ with a basic action $MoveTo(X_1, Y_1)$. Before this action is executed, the pursuer must be at position $(X_0, Y_0)$, denoted by $pursuer(self, X_0, Y_0)$. The condition that the pursuer senses the evader at position $(X_1, Y_1)$ is denoted by $evaderAt(X_1, Y_1)$. This action is then defined in terms of the following beliefs:

$$pursuer(self, X_0, Y_0), \quad evaderAt(X_1, Y_1). \tag{5}$$

After executing an action, an agent must update its beliefs to make its postcondition true. According to the above example, after performing the catch action, the agent updates its beliefs and obtains

$$pursuer(self, X_1, Y_1), \quad evaderAt(X_2, Y_2), \tag{6}$$

where $(X_2, Y_2)$ is the evader's new location.
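As an illustration only, the following minimal Python sketch encodes the beliefs of (5)-(6) as tuples and shows how such a move action could check its precondition and update the belief base; the predicate names and the function are our own assumptions, not part of the original agent framework.

# Minimal sketch (assumed representation): beliefs kept as a set of predicate tuples.
def move_to(beliefs, x1, y1):
    """Apply MoveTo(x1, y1): precondition pursuer(self, x0, y0) and evaderAt(x1, y1)."""
    pursuer = next(b for b in beliefs if b[0] == "pursuer")   # ("pursuer", "self", x0, y0)
    if ("evaderAt", x1, y1) not in beliefs:
        return beliefs                                        # precondition not met, beliefs unchanged
    beliefs = set(beliefs)
    beliefs.discard(pursuer)
    beliefs.add(("pursuer", "self", x1, y1))                  # postcondition: the pursuer is now at (x1, y1)
    return beliefs

beliefs = {("pursuer", "self", 0, 0), ("evaderAt", 1, 1)}
beliefs = move_to(beliefs, 1, 1)   # the pursuer now believes it is at (1, 1)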

3.1.4. Strategies. The strategy of a player refers to one of the options that can be chosen in a given setting. The choice depends not only on the agent's own actions but also on the other agents' actions. A prior study [28] proposed a pure strategy in terms of a function: given an agent's current information set, the function returns the agent's next move (i.e., its change in location between time $t_j$ and $t_{j+1}$). For example, if an evader has a pure strategy $\sigma_0$, then $\theta_0(t_{j+1}) = \theta_0(t_j) + \sigma_0(I_0(t_j))$, where $\theta_0(t_j)$ is the location of tracker agent 0 at time $t_j$ and $I_0(t_j)$ is the tracker's information set at time $t_j$. Suppose that a pursuer has a pure strategy set $\sigma = (\sigma_1, \ldots, \sigma_k)$ and the locations of the tracker agents are $\theta(t) = (\theta_1(t), \ldots, \theta_k(t))$. Then $\theta(t_{j+1}) = \theta(t_j) + \sigma(I_0(t_j))$ for each time $t_j$. The strategy is further refined by adding an algorithm for selecting a plan according to different conditions. The algorithm can be represented by a function that generates a plan, and the conditions include the beliefs of agents or the current situation of the environment. The proposed strategy is denoted by

$$Strategy = \langle conditions \rangle\; function. \tag{7}$$

3.1.5. Plans. A plan is a method for an agent to approach its goal and is generated according to the agent's strategy. A plan consists of beliefs, actions, and rewards, as expressed in the following equation:

$$Plan = \langle belief \rangle\; \langle basic\ action \rangle\; [\langle basic\ action \rangle]\; \langle reward \rangle. \tag{8}$$

A plan can be one action or a set of actions for an agent, and the value of the plan is determined by the reward.

3.2. Case-Based Reasoning. According to previous studies [32, 33], a general case-based reasoning (CBR) cycle may include retrieving the most similar cases for a problem, reusing the information and knowledge in those cases to solve the problem, revising the proposed solution, and retaining the parts of this experience likely to be useful for future problem solving.

In order to solve a new problem, the cases that contain the most useful knowledge have to be identified first. Since the utility of a case cannot be evaluated directly a priori, the similarity between problem descriptions is used as a heuristic to estimate the expected utility of the cases. The quality of this measure is therefore crucial for the success of CBR applications. Numerous similarity measures are in use today, not only in CBR but also in other fields, including data mining, pattern recognition, genetic algorithms, and machine learning. Euclidean distance is a typical similarity measure for objects in a Euclidean space [34]. Based on the Euclidean distance, this work devises a simple distance function to evaluate the similarity between two cases.

3.2.1. Case Description. A case stands for an agent's mental state and the result of its past output. Each case consists of a goal, beliefs, an action, a plan, and a reward. The goal is what the agent wants to achieve or realize. The beliefs describe the situation the agent is in. The action is what the agent does in that situation. The plan is the approach the agent takes to achieve the goal. The reward serves to evaluate the result of the plan.
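For concreteness, a case with the five components above could be represented as a simple record; this is only an illustrative sketch, and the field types are our assumption rather than the paper's implementation.

from dataclasses import dataclass, field

@dataclass
class Case:
    goal: str                                  # what the agent wants to achieve (e.g., "catch evader")
    beliefs: dict                              # feature values describing the situation the agent is in
    action: str                                # the basic action taken in that situation
    plan: list = field(default_factory=list)   # the sequence of actions chosen to reach the goal
    reward: float = 0.0                        # evaluation of the plan's result (10 when the evader is caught)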

3.2.2. Similarity Relation. Consider the following evaluation function:

$$\mathrm{Sim}_{a,b} = \sum_{i=1}^{n} w_i \times \left| a_i - b_i \right|. \tag{9}$$

This study proposes the evaluation function (Sim) shown in (9) for measuring the similarity between cases $a$ and $b$. In this equation, $a$ and $b$ represent a new problem and a case in the case base, respectively. The variable $n$ is the number of features in case $a$. The weight $w_i$ stands for the importance of the $i$th feature; the weight vector is freely defined by the user. $a_i$ and $b_i$ are the $i$th features of cases $a$ and $b$, respectively. In the case-retrieving stage of the CBR approach, the case with the smallest Sim value is always retrieved for reuse.
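A direct reading of (9) gives the weighted distance below; the feature encoding and the retrieval helper are illustrative assumptions.

def sim(a, b, w):
    """Weighted distance of (9): smaller values mean more similar cases."""
    return sum(wi * abs(ai - bi) for wi, ai, bi in zip(w, a, b))

def retrieve(problem_features, case_base, w):
    """Return the stored case whose feature vector minimizes Sim."""
    return min(case_base, key=lambda case: sim(problem_features, case["features"], w))

# Example: two stored cases, equal feature weights; the first case is retrieved.
case_base = [{"features": [1, 0, 3], "plan": "local-max"},
             {"features": [4, 2, 1], "plan": "local-cooperative"}]
best = retrieve([1, 1, 3], case_base, w=[1.0, 1.0, 1.0])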

3.3. Probabilistic Framework. Suppose that a finite two-dimensional environment EX with $n_c$ square cells contains an unknown number of fixed obstacles. We also assume that the environment has $n_p$ pursuers and $n_e$ evaders. Let $x_{p_k}$ and $x_{e_i}$ be the cells occupied by pursuer $k$ and evader $i$, where $1 \le k \le n_p$ and $1 \le i \le n_e$. The pursuers and evaders are restricted to move to cells not occupied by obstacles. Each pursuer collects information about EX at discrete time instances $t \in T = \{1, 2, \ldots, t_{\mathrm{end}}\}$.

Define $x_{p_k}(t) \subset \mathrm{EX}$ as the cell of pursuer $k$ at time $t$ and $x_{e_i}(t) \subset \mathrm{EX}$ as the cell of evader $i$ at time $t$; $o(t)$ is the set of cells where obstacles are detected. Let $U(x_{e_i}(t))$ denote the one-step reachable set of the evader when the evader is in cell $x_{e_i}(t)$ at time $t$. Then the probability of the evader being in cell $x$ at time $t+1$ is given by

$$p(x, t+1 \mid x_{e_i}(t)) = \begin{cases} \dfrac{1}{\left| U(x_{e_i}(t)) - o(t) \right|}, & x \in U(x_{e_i}(t)), \\[4pt] 0, & x \in o(t), \end{cases} \tag{10}$$

where $p(x, t+1 \mid x_{e_i}(t))$ represents the probability of the evader being in cell $x$ at time $t+1$, given the location where the pursuer detected the evader at time $t$.
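The one-step evader distribution of (10) can be computed directly over the grid; the sketch below assumes the one-step reachable set is the 3 × 3 neighborhood clipped to the grid and that at least one reachable cell is obstacle-free.

def one_step_reachable(cell, n_rows, n_cols):
    """3 x 3 neighborhood of a cell (including the cell itself), clipped to the grid."""
    r, c = cell
    return {(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols}

def evader_distribution(evader_cell, obstacles, n_rows, n_cols):
    """Equation (10): uniform probability over the reachable, obstacle-free cells."""
    free = one_step_reachable(evader_cell, n_rows, n_cols) - set(obstacles)
    return {cell: 1.0 / len(free) for cell in free}

p = evader_distribution((2, 2), obstacles={(1, 2)}, n_rows=12, n_cols=12)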

3.4. Strategies for Plan Generation. When the distance between the retrieved case and the new case exceeds a threshold, the CBR approach described above cannot generate a useful plan for the new case. To overcome this problem, this study proposes two plan-generation strategies: the local-max strategy and the local-cooperative strategy.

3.4.1. Local-Max Strategy. This strategy is used only when a single pursuer pursues the evaders on its own. Let $S_k(y)$ be the set of all cells that lie within the sensing area of pursuer $k$ at cell $y$. The total evasion probability of the evaders at time $t$ associated with the pursuer is then obtained by

$$P(y, t) = \sum_{z \in S_k(y)} p(z, t+1 \mid x_{e_i}(t)). \tag{11}$$

By computing the evasion probabilities of all sensing cells, pursuer $k$ moves to the cell $x_{p_k}(t+1)$ that has the highest total evasion probability:

$$x_{p_k}(t+1) = \arg\max_{y \in U(x_{p_k}(t))} P(y, t). \tag{12}$$
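Equations (11)-(12) translate into a small search over the pursuer's one-step reachable cells. The sketch below reuses the one_step_reachable helper above and assumes the 5 × 5 sensing square described later in Section 4.

def sensing_area(cell, n_rows, n_cols, radius=2):
    """Cells within the pursuer's (2*radius+1) x (2*radius+1) sensing square, clipped to the grid."""
    r, c = cell
    return {(r + dr, c + dc)
            for dr in range(-radius, radius + 1) for dc in range(-radius, radius + 1)
            if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols}

def local_max_move(pursuer_cell, evader_prob, n_rows, n_cols):
    """Equations (11)-(12): move to the reachable cell whose sensing area covers
    the largest total evasion probability."""
    def total_prob(y):
        return sum(evader_prob.get(z, 0.0) for z in sensing_area(y, n_rows, n_cols))
    candidates = one_step_reachable(pursuer_cell, n_rows, n_cols)
    return max(candidates, key=total_prob)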

3.4.2. Local-Cooperative Strategy. This strategy is for pursuers that cooperate with each other to catch the same evader. Initially, pursuers need to decide whether they can cooperate with each other or not. Let $d(x, y)$ be the chessboard (Chebyshev) distance from cell $x$ to cell $y$; that is, if cells $x$ and $y$ lie in a two-dimensional space with coordinates $(x_1, x_2)$ and $(y_1, y_2)$, the distance is calculated as

$$d(x, y) = \max\left( \left| x_1 - y_1 \right|, \left| x_2 - y_2 \right| \right). \tag{13}$$

Let $s_k$ be the farthest sensing distance of pursuer $k$. For example, if the sensing area of pursuer $k$ is a $5 \times 5$ square and the pursuer stands at the center of the area, the farthest distance that the pursuer can detect is two cells. If there exist two or more pursuers $k$ and $l$ and one evader $i$ such that $d(x_{p_k}(t), x_{e_i}(t)) \le s_k$ and $d(x_{p_l}(t), x_{e_i}(t)) \le s_l$, the pursuers can cooperate with each other.

Suppose that two or more pursuers execute the local-cooperative strategy. Let $\mathrm{Overlap}(x, y)$ be the number of overlapping cells between the one-step reachable sets (each a $3 \times 3$ region) around cells $x$ and $y$. The overlap function is illustrated in Figure 2: the yellow grid represents the one-step reachable set of pursuer $P1$, the dashed grid depicts the one-step reachable set of pursuer $P2$, and the green grid represents the position of an evader $E$. The number of overlapping cells thus equals four (i.e., the return value of $\mathrm{Overlap}(x, y)$).

Figure 2: Number of overlap cells.

Once the pursuers can cooperate with each other, they randomly select one of them as a leader. Each other pursuer $k$ determines its location at time $t+1$ by

$$x_{p_k}(t+1) = \arg\min_{y \in U(x_{p_k}(t))} \mathrm{Overlap}\left( x_{p_L}(t+1),\, y \right), \tag{14}$$

where $x_{p_L}(t+1)$ is the location of the leader. To avoid pursuers moving to the same location, each pursuer selects the position with the minimum overlap area.
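The overlap test and the follower's move of (14) can be sketched as follows, again reusing the helpers above; the leader's next cell is assumed to have been chosen already (e.g., by the local-max strategy).

def overlap(x, y, n_rows, n_cols):
    """Overlap(x, y): number of cells shared by the 3 x 3 one-step reachable sets of x and y."""
    return len(one_step_reachable(x, n_rows, n_cols) & one_step_reachable(y, n_rows, n_cols))

def cooperative_move(follower_cell, leader_next_cell, n_rows, n_cols):
    """Equation (14): the follower moves to the reachable cell with minimum overlap
    with the leader's next position, spreading the pursuers apart."""
    candidates = one_step_reachable(follower_cell, n_rows, n_cols)
    return min(candidates, key=lambda y: overlap(leader_next_cell, y, n_rows, n_cols))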

3.5. Reward Calculation. Each case has its own reward for evaluating its result; the higher the reward of a case, the more likely the goal will be achieved. In this study, the reward is calculated using the following equation:

$$R = \left| S_k(y) \cap U(x_{e_i}(t)) \right|. \tag{15}$$

Reward $R$ is the number of cells of the evader's one-step reachable set that lie within the pursuer's sensing area. The reward value is between 0 and 10; if an evader is caught, the reward of the case is set to 10.
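Under the same helpers, the reward of (15) is a simple set intersection; the capture bonus of 10 is applied as described above.

def case_reward(pursuer_cell, evader_cell, caught, n_rows, n_cols):
    """Equation (15): count the cells of the evader's one-step reachable set that lie
    inside the pursuer's sensing area; a capture receives the maximum reward of 10."""
    if caught:
        return 10
    return len(sensing_area(pursuer_cell, n_rows, n_cols)
               & one_step_reachable(evader_cell, n_rows, n_cols))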

3.6. Assimilation and Accommodation. Piaget's cognitive theory [18] states that assimilation is the tendency to interpret experience as far as possible through the existing cognitive structure of knowing. When an agent faces a new situation, it generates a new plan with its current cognitive structure, also called its strategy, to direct what it should do next; this entire process is called assimilation. In other words, the agent assimilates a new situation with its current cognitive structure. However, if the current cognitive structure cannot explain the environment (i.e., it cannot reach equilibration given inconsistent information about the world), the agent has to use accommodation. Accommodation refers to the realization that the current structure is insufficient for an adequate understanding of the world and must be changed until it can assimilate the new situation.


This study proposes two accommodation methods, the bevel-angle and positive-angle strategies, for modifying a cognitive structure. After performing accommodation, an agent employs the modified strategy to assimilate the new situation. By constantly adjusting its cognitive structure, the agent can adapt to the environment more effectively.

3.6.1. Bevel-Angle Strategy. Suppose that an evader's location is in the one-step reachable set of a pursuer and the evader is at a bevel angle relative to the pursuer, that is, at 45° ($\pi/4$), 135° ($3\pi/4$), 225° ($5\pi/4$), or 315° ($7\pi/4$). The pursuer then randomly chooses one of the following strategies. Let $\deg(x_{p_k}(t), x_{e_i}(t))$ be the angle between pursuer $k$ and evader $i$, where $x_{e_i}(t)$ is the position of the evader and $x_{p_k}(t)$ is the position of the pursuer at time $t$.

Strategy 1. According to (16), the pursuer moves to the evader's position at time $t+1$:

$$x_{p_k}(t+1) = x_{e_i}(t), \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ \tfrac{\pi}{4}, \tfrac{3\pi}{4}, \tfrac{5\pi}{4}, \tfrac{7\pi}{4} \right\}. \tag{16}$$

Strategy 2. According to (17), the pursuer moves to the cell at +45° from the evader's direction at time $t+1$. Let $x_j$ be an element of the one-step reachable set of pursuer $k$ when it is in cell $x$ at time $t$:

$$x_{p_k}(t+1) = x_j, \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ \tfrac{\pi}{4}, \tfrac{3\pi}{4}, \tfrac{5\pi}{4}, \tfrac{7\pi}{4} \right\} \wedge x_j \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) + 45^\circ. \tag{17}$$

Strategy 3. According to (18), the pursuer moves to the cell at $-45^\circ$ from the evader's direction at time $t+1$:

$$x_{p_k}(t+1) = x_j, \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ \tfrac{\pi}{4}, \tfrac{3\pi}{4}, \tfrac{5\pi}{4}, \tfrac{7\pi}{4} \right\} \wedge x_j \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) - 45^\circ. \tag{18}$$

Strategy 4. According to (19), the pursuer stays in the same place at time $t+1$:

$$x_{p_k}(t+1) = x_{p_k}(t), \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ \tfrac{\pi}{4}, \tfrac{3\pi}{4}, \tfrac{5\pi}{4}, \tfrac{7\pi}{4} \right\}. \tag{19}$$

3.6.2. Positive-Angle Strategy. Suppose that an evader's location is in the one-step reachable set of a pursuer and the evader is at a positive angle relative to the pursuer, that is, at 0°, 90° ($\pi/2$), 180° ($\pi$), or 270° ($3\pi/2$).

Strategy 1. According to (20), the pursuer moves to the evader's position at time $t+1$:

$$x_{p_k}(t+1) = x_{e_i}(t), \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ 0, \tfrac{\pi}{2}, \pi, \tfrac{3\pi}{2} \right\}. \tag{20}$$

Strategy 2. According to (21), the pursuer moves to the cell at +45° from the evader's direction at time $t+1$:

$$x_{p_k}(t+1) = x_j, \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ 0, \tfrac{\pi}{2}, \pi, \tfrac{3\pi}{2} \right\} \wedge x_j \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) + 45^\circ. \tag{21}$$

Strategy 3. According to (22), the pursuer moves to the cell at $-45^\circ$ from the evader's direction at time $t+1$:

$$x_{p_k}(t+1) = x_j, \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ 0, \tfrac{\pi}{2}, \pi, \tfrac{3\pi}{2} \right\} \wedge x_j \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) - 45^\circ. \tag{22}$$

Strategy 4. According to (23), the pursuer moves to the cell at +90° from the evader's direction at time $t+1$:

$$x_{p_k}(t+1) = x_j, \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ 0, \tfrac{\pi}{2}, \pi, \tfrac{3\pi}{2} \right\} \wedge x_j \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) + 90^\circ. \tag{23}$$

Strategy 5. According to (24), the pursuer moves to the cell at $-90^\circ$ from the evader's direction at time $t+1$:

$$x_{p_k}(t+1) = x_j, \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ 0, \tfrac{\pi}{2}, \pi, \tfrac{3\pi}{2} \right\} \wedge x_j \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) - 90^\circ. \tag{24}$$

Strategy 6. According to (25), the pursuer stays in the same place at time $t+1$:

$$x_{p_k}(t+1) = x_{p_k}(t), \quad \text{if } x_{e_i}(t) \in U(x_{p_k}(t)) \wedge \deg(x_{p_k}(t), x_{e_i}(t)) \in \left\{ 0, \tfrac{\pi}{2}, \pi, \tfrac{3\pi}{2} \right\}. \tag{25}$$

Combining the bevel-angle and positive-angle strategies, the pursuer obtains 24 different plans for adapting to the environment effectively.
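One way to realize the angle-based plans is to work with the eight compass directions on the grid: the bevel angles are the diagonal directions, the positive angles are the axis-aligned ones, and each strategy rotates the pursuit direction by a multiple of 45° or keeps the pursuer in place. The sketch below is our interpretation of that scheme, not the authors' code; the direction table and the random choice among offsets are assumptions.

import math, random

# Index i corresponds to the angle i * 45 degrees; offsets are (row, column) steps.
DIRECTIONS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]

def angle_index(pursuer_cell, evader_cell):
    """Index (0-7) of the 45-degree-quantized direction from the pursuer to the evader."""
    dr, dc = evader_cell[0] - pursuer_cell[0], evader_cell[1] - pursuer_cell[1]
    angle = math.atan2(dr, dc) % (2 * math.pi)
    return round(angle / (math.pi / 4)) % 8

def accommodation_move(pursuer_cell, evader_cell):
    """Pick one angle strategy at random: stay put, head straight at the evader,
    or rotate the heading by +/-45 degrees (and +/-90 degrees for positive angles)."""
    idx = angle_index(pursuer_cell, evader_cell)
    bevel = idx % 2 == 1                                              # odd index = diagonal (bevel) angle
    offsets = [None, 0, 1, -1] if bevel else [None, 0, 1, -1, 2, -2]  # None means stay in place
    choice = random.choice(offsets)
    if choice is None:
        return pursuer_cell
    dr, dc = DIRECTIONS[(idx + choice) % 8]
    return (pursuer_cell[0] + dr, pursuer_cell[1] + dc)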

3.7. Plan Evolution Process. Plan evolution plays an important role in an agent's life cycle. When a pursuer finds an evader in its sensing area, it generates a pursuit plan. Initially, the pursuer searches its case base for similar cases. If similar cases exist, the pursuer chooses one of them as the plan; otherwise, the pursuer uses one of the abovementioned strategies to generate a plan. When the pursuer faces a new state, it adjusts the plan accordingly. Figure 3 shows the agent's plan evolution process.

(1) Problem Analysis. Before determining its location in the next step, a pursuer analyzes the environmental state and updates its goal with information from the environment. The pursuer chooses a plan according to its mental state.

(2) Case Retrieval. Comparing the result of the analysis in Step 1 with the cases in the case base, the pursuer retrieves similar cases using the Sim function and sorts them according to their similarity levels.

(3) Similarity Calculation. Similarity is classified into low and high levels. It is possible that the pursuer retrieves a case with high-level similarity but obtains unsatisfactory results; the reward R is therefore also checked to prevent such cases from being reused. If the case has high-level similarity and its reward R exceeds a predefined constant ε, the pursuer executes Step 4. If the case has low-level similarity or its reward R is smaller than ε, the pursuer executes Step 5.

(4) Case Reuse. If the retrieved case has high-level similarity and its result is acceptable, the pursuer reuses the case with revisions and then goes to Step 10.

(5) Strategy Assimilation. If no similar case exists in the case base, or the result of the similar case is unsatisfactory, the pursuer uses strategy assimilation; that is, the pursuer generates a suitable plan with the current strategy (Steps 6–8) according to the situation.

(6) Cooperation. Before using the strategy module, the pursuer decides whether to cooperate with the others or not. If the decision is to cooperate, the local-cooperative strategy (Step 8) is executed; otherwise, the pursuer executes the local-max strategy (Step 7).

(7) Application of the Local-Max Strategy. If the pursuer decides not to cooperate with the others, it executes the local-max strategy and moves to the cell in the one-step reachable set with the highest probability of containing an evader. Then go to Step 9.

(8) Application of the Local-Cooperative Strategy. If the cooperative mode is taken, the pursuer executes the local-cooperative strategy and moves to the cell that has the minimum overlap of sensing areas.

(9) Reward Calculation. The pursuer evaluates the plan and calculates a reward accordingly.

(10) Plan Transformation. The pursuer represents the plan by beliefs, actions, and rewards.

(11) Strategy Assessment. The pursuer evaluates whether the current strategy is suitable or not. The pursuer counts the number C of consecutive failures in catching an evader with the current strategy. If C is smaller than a constant σ, the strategy is effective; then go to Step 13. Otherwise, the strategy is accommodated in Step 12.

(12) Strategy Accommodation. The pursuer modifies the strategy to generate a new plan for the current environment.

(13) Case Revision. The actions and reward of the case are revised after the problem is solved.


Figure 3: Plan evolution process.

(14) Case Retention. Parts of the experience that are likely to be useful for future problem solving are retained in the case base.
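Putting the earlier sketches together, one pursuer's decision in a single time step might look like the following. The feature encoding, the thresholds (standing in for the ε check of Step 3), the assumption that stored cases also record their chosen move, and the choice of the first partner as leader are all illustrative assumptions, not the authors' implementation; the helpers sim, retrieve, evader_distribution, local_max_move, and cooperative_move come from the sketches above.

def pursuer_step(pursuer_cell, evader_cell, partner_cells, obstacles, case_base,
                 weights, n_rows, n_cols, sim_threshold=2.0, reward_threshold=5):
    """One decision cycle of a pursuer: case reuse first (Steps 2-4), otherwise
    assimilation with the local-max or local-cooperative strategy (Steps 5-8)."""
    # Step 1: a simple feature encoding of the situation (relative evader position, nearby partners).
    features = [evader_cell[0] - pursuer_cell[0], evader_cell[1] - pursuer_cell[1], len(partner_cells)]
    if case_base:
        case = retrieve(features, case_base, weights)                          # Step 2
        if sim(features, case["features"], weights) <= sim_threshold \
                and case["reward"] > reward_threshold:                         # Step 3
            return case["move"]                                                # Step 4: reuse the stored move
    prob = evader_distribution(evader_cell, obstacles, n_rows, n_cols)         # Steps 5-8: assimilation
    if partner_cells:                                                          # cooperate when partners also sense the evader
        leader_next = local_max_move(partner_cells[0], prob, n_rows, n_cols)   # partner 0 acts as leader
        return cooperative_move(pursuer_cell, leader_next, n_rows, n_cols)
    return local_max_move(pursuer_cell, prob, n_rows, n_cols)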

4. Case Study

This study uses a pursuit-evasion game to demonstrate the proposed approach and implements a multiagent system on the Recursive Porous Agent Simulation Toolkit (REPAST) [35]. Figure 4 shows the GUI of the system, where the blue, red, and black circles represent pursuers, evaders, and obstacles, respectively; the yellow region depicts the sensing area of the pursuers.

4.1. Problem Formulation. The system contains multiple pursuing robots that search for multiple evading robots in a two-dimensional environment. The physical world is abstracted as a grid world, and the game is played according to the following settings and rules:

(i) The number of pursuers is n. The number of evaders is unknown to the pursuers.
(ii) Each pursuer has a sensing area comprising 5 × 5 cells, and the pursuer is at the center of the area, as shown in Figure 4.
(iii) During each moving step, the agents are allowed either to move one cell away from their current cells or to stay in their current cells.
(iv) Each pursuer has no information about the environment other than that of its own 5 × 5 sensing area.
(v) All pursuers use the same case base.
(vi) An evader is captured when it occupies the same cell as a pursuer, and it is then removed from the environment.
(vii) The game ends when all evaders have been captured.

Figure 4: Implementation of the multiagent system.

4.2. Evasion Policy. When an evader is caught, the system first checks whether the sum of the caught and uncaught evaders is smaller than the total number of predefined evaders. If so, the system generates a new evader and chooses its type of movement; otherwise, the system does nothing. Clearly, when the sum of the caught and uncaught evaders equals the number of predefined evaders, the system stops generating new evaders, and the number of uncaught evaders no longer increases; it only decreases with successful captures, so the game eventually ends. In addition, when the system selects a movement type for a new evader, the choice varies between experiments, which in turn affects the outcome of the experiment. This study investigates four types of movement: random, clockwise, counterclockwise, and smart movements.

Random Movement. An evader randomly selects a location that can be reached in one step or simply stays in the same location. The details of the algorithm are shown in Algorithm 1.

Clockwise Movement. An evader moves clockwise along a square path. The evader initially selects the direction of its first step and sets a distance randomly. If the evader runs into an obstacle on the way, it changes its direction automatically. Algorithm 2 shows the algorithm.

Counterclockwise Movement. This movement type is similar to the clockwise movement, except that the evader moves counterclockwise. The algorithm is shown in Algorithm 3.

Smart Movement. Evaders actively try to avoid pursuers. Each evader has a square sensing area whose sides are three cells long. If the evader finds any pursuer in this area, it attempts to move to the location in its one-step reachable set that has the fewest pursuers nearby. Algorithm 4 presents the algorithm.

4.3. Results and Discussion. This study designs several experiments to compare the performance of the proposed approach, Hysteretic Q-Learning [36], and Reinforcement-Learning fusing (RL-Fusing) [37]. All of these approaches are learning techniques for multiagent systems. Each experiment has three pursuers and one evader. In addition, the experiments run in "time step" units, and one round is considered finished when the evader has been caught. The performance results are the average values obtained by running 30 rounds with the current learning policy after 100 training episodes for each algorithm.

4.3.1. Comparison of Approaches. Matignon et al. presented a decentralized reinforcement learning algorithm for independent learning robots called Hysteretic Q-Learning [36]. This algorithm computes a better policy in a cooperative multiagent system without additional information or communication between agents. Hysteretic Q-Learning updates the value function of an agent $i$ executing action $a_i$ in state $s$ and ending up in state $s'$ as follows:

$$\delta \leftarrow r + \gamma \max_{a'} Q_i(s', a') - Q_i(s, a_i),$$
$$Q_i(s, a_i) \leftarrow \begin{cases} Q_i(s, a_i) + \mathrm{inc}\cdot\delta, & \text{if } \delta > 0, \\ Q_i(s, a_i) + \mathrm{dec}\cdot\delta, & \text{otherwise,} \end{cases} \tag{26}$$

where $Q_i(s, a_i)$ is the action value function of state $s$ and action $a_i$, $r$ is the reward received, $\gamma \in [0, 1]$ is the discount factor, and inc and dec are the rates of increase and decrease in the $Q$-values, respectively. Partalas et al. proposed the RL-Fusing approach, which uses coordinated actions and a fusing process to guide the agents [37]. The fusion process combines the decisions of all agents and then outputs a global decision, which the agents must follow as a team. Algorithm 5 shows the RL-Fusing algorithm; the set of strategies corresponds to the plans.
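For reference, the hysteretic update of (26) is only a few lines; the table layout and the parameter values below are illustrative.

from collections import defaultdict

def hysteretic_q_update(Q, s, a, r, s_next, actions, inc=0.9, dec=0.1, gamma=0.9):
    """Equation (26): increase Q-values quickly on positive errors and decrease them
    slowly on negative ones, which keeps the learner optimistic in cooperative teams."""
    delta = r + gamma * max(Q[(s_next, a2)] for a2 in actions) - Q[(s, a)]
    Q[(s, a)] += (inc if delta > 0 else dec) * delta
    return Q

Q = defaultdict(float)
Q = hysteretic_q_update(Q, s="s0", a="north", r=1.0, s_next="s1", actions=["north", "south"])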

4.3.2. Obstacles. The first experiment compares the performance of the proposed approach, Hysteretic Q-Learning, and RL-Fusing in environments with and without obstacles. The evader is assumed to move randomly in every round. In the environment with obstacles, 10 obstacles are randomly set up initially, and no evader is trapped inside the obstacles.

Table 1 shows the performance results. As can be seen, all three methods require more capture steps in the environment with obstacles because pursuers need extra time to detect and avoid obstacles. Hysteretic Q-Learning takes the greatest number of capture steps in both environments because it does not support pursuer cooperation. Furthermore, RL-Fusing also takes more capture steps than the proposed scheme. Although RL-Fusing involves cooperation between pursuers, the generated strategies sometimes cause a pursuer to miss evaders in its sensing area; the pursuer then needs to search again and requires more capture steps. In contrast, the proposed approach utilizes not only the cooperative strategy but also assimilation and accommodation, which enable a pursuer to learn from both success and failure experiences. Thus, the pursuer can keep in step with an evader and catch it in a smaller number of capture steps.

4.3.3. Movement Types. The second experiment measures the average number of capture steps required by the proposed approach, Hysteretic Q-Learning, and RL-Fusing under two types of evader movement: clockwise movement, and random clockwise and counterclockwise movement.

The average number of capture steps required by the three approaches for the different movement types is shown in Table 2. Similar to the results obtained in the first experiment, the proposed scheme outperforms Hysteretic Q-Learning and RL-Fusing. Among the three approaches, Hysteretic Q-Learning requires the highest number of capture steps, again because it does not support agent cooperation.


while true do
    next_location ← randomly generate a location adjacent to the evader
    if next_location contains an obstacle then
        next_location ← randomly generate another location adjacent to the evader
    end if
end while

Algorithm 1: Random movement algorithm.

direction ← randomly generate a direction
distance ← randomly generate a distance shorter than the environment size
move ← 0
while true do
    get the next location according to direction
    move ← move + 1
    if move equals distance or the next location contains an obstacle then
        the evader turns right
        change direction accordingly
        move ← 0
    end if
end while

Algorithm 2: Clockwise movement algorithm.

direction ← randomly generate a direction
distance ← randomly generate a distance shorter than the environment size
move ← 0
while true do
    get the next location according to direction
    move ← move + 1
    if move equals distance or the next location contains an obstacle then
        the evader turns left
        change direction accordingly
        move ← 0
    end if
end while

Algorithm 3: Counterclockwise movement algorithm.

while true do
    if the evader's sensing area contains a pursuer then
        next_location ← the location with the fewest pursuers detected
    else
        next_location ← randomly generate a location adjacent to the evader
    end if
end while

Algorithm 4: Smart movement algorithm.

According to the RL-Fusing algorithm presented in Algorithm 5, when a state is not a coordinated state, an agent acts with a random or predefined policy, so a single pursuer has no learning ability. Therefore, the possibility of catching an evader is low, and RL-Fusing also takes more capture steps. In contrast, when a pursuer using the proposed scheme faces an evader, the pursuer observes the evader continuously; if the evader changes its movement type, the pursuer uses accommodation to change its catching strategy.


Require: a set of strategies, an initial state s0
s ← s0
while true do
    if s is a coordinated state then
        ρ ← RandomReal(0, 1)
        if ρ < ε then
            select a strategy randomly (the same for all agents)
        else
            rank the strategies
            communicate the ranks
            receive ranks from the other agents
            average the ranks
            select the corresponding strategy
        end if
        execute the selected strategy
        receive the reward
        transit to a new state s'
        update the Q-value
    else
        act with a predefined or random policy
    end if
end while

Algorithm 5: RL-Fusing algorithm.

Table 1: Average number of capture steps in different environments.

Method                   Grid size   Without obstacles (steps)   With obstacles (steps)
Hysteretic Q-Learning    12 × 12     68                          73
RL-Fusing                12 × 12     61                          65
The proposed approach    12 × 12     24                          27

Table 2: Average number of capture steps for different movement types.

Method                   Grid size   Clockwise movement (steps)   Random clockwise and counterclockwise movement (steps)
Hysteretic Q-Learning    12 × 12     56                           69
RL-Fusing                12 × 12     48                           56
The proposed approach    12 × 12     26                           32

If the movement type remains fixed, the pursuer keeps the same strategy by assimilation. As seen in Table 2, all three methods require more steps to capture evaders with random clockwise and counterclockwise movement, because pursuers need extra time to detect the moving directions of the evaders.

Table 3: Average number of capture steps for evaders with smart movement in different grid sizes.

Method                   Grid size 12 × 12 (steps)   Grid size 24 × 24 (steps)
Hysteretic Q-Learning    74                          212
RL-Fusing                63                          300
The proposed approach    63                          119

Table 4: Average number of capture steps with/without assimilation and accommodation.

Approach                                   Clockwise evader movement (steps)   Smart evader movement (steps)
With assimilation and accommodation        186                                 278
Without assimilation and accommodation     242                                 364

4.3.4. Smart Movement and Size of Grid. The third experiment involves evaders with smart movement; the evaders actively try to avoid pursuers when they sense any pursuer. The environment is of two grid sizes, 12 × 12 and 24 × 24.

Table 3 shows that the proposed scheme requires fewer capture steps in the different grid sizes when compared with Hysteretic Q-Learning and RL-Fusing. When the grid size is smaller, RL-Fusing outperforms Hysteretic Q-Learning; however, as the grid size becomes larger, RL-Fusing requires more capture steps than Hysteretic Q-Learning.


Figure 5: Average number of capture steps under different training episodes and grid sizes (x-axis: training episodes; y-axis: steps; series: grid sizes 12 × 12 and 24 × 24).

This is because RL-Fusing depends mainly on its cooperative strategies to catch evaders, but not all of these strategies guide the pursuers to move close to the evaders. For example, strategy 3 in RL-Fusing [37] requires all pursuers to keep a distance of 3 from an evader; because the sensing area of a pursuer is a 5 × 5 square, the pursuers easily lose the evader in a larger environment. In contrast, the proposed approach helps the pursuers move as close to the evader as possible by avoiding overlapping sensing areas; thus, the approach has the highest possibility of catching the evader.

4.3.5. Smart Movement, Training Episodes, and Size of Grid. The fourth experiment investigates the relationship between training episodes and grid sizes when using the proposed approach.

Figure 5 shows that regardless of grid size (12 × 12 or 24 × 24), the greater the number of training episodes, the smaller the average number of capture steps required. This is because more training enriches the case base, thus enabling pursuers to make better strategies and decisions.

4.3.6. Assimilation and Accommodation. The last experiment evaluates the proposed approach with and without assimilation and accommodation. There are two types of evader movement, clockwise and smart; the results are listed in Table 4. Whether or not the approach uses assimilation and accommodation, the average number of capture steps for smart movement is greater than that for clockwise movement. This is because smart movement enables an evader to dynamically change its movement and position, whereas the movement of a clockwise-moving evader can be easily predicted, so the pursuers can capture it in fewer steps. Furthermore, for both movement types, the approach with assimilation and accommodation requires fewer capture steps than the approach without them. The approach without assimilation and accommodation generates a fixed plan that cannot be changed. In contrast, the approach with assimilation and accommodation enables the first pursuer to use the local-max strategy to find a better position, and the remaining pursuers can determine their cooperative positions according to the first pursuer's position; these pursuers can then adjust their positions according to different catching situations and strategies, which accounts for the smaller number of capture steps.

5. Conclusion

This work proposes an effective strategy-learning approach to the pursuit-evasion game in a dynamic environment. The method enables a pursuer to learn and implement effective plans from both success and failure experiences, even when no knowledge of the environment is available. The evolved plans fare extremely well when compared with some of the best constructed strategies. This study also demonstrates the superior performance of the proposed approach for the pursuit-evasion game by implementing a multiagent system on REPAST.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported by the Ministry of Science and Technology under Grant no. 102-2218-E-027-007.

References

[1] K. P. Sycara, "Multiagent systems," AI Magazine, vol. 19, no. 2, pp. 79–92, 1998.
[2] N. Vlassis, A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence, Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool, 2007.
[3] A. Antoniades, H. J. Kim, and S. Sastry, "Pursuit-evasion strategies for teams of multiple agents with incomplete information," in Proceedings of the 42nd IEEE Conference on Decision and Control, pp. 756–761, December 2003.
[4] L. J. Guibas, J.-C. Latombe, S. M. LaValle, D. Lin, and R. Motwani, "A visibility-based pursuit-evasion problem," International Journal of Computational Geometry & Applications, vol. 9, no. 4-5, pp. 471–493, 1999.
[5] J. P. Hespanha, G. J. Pappas, and M. Prandini, "Greedy control for hybrid pursuit-evasion games," in Proceedings of the European Control Conference, pp. 2621–2626, 2001.
[6] R. Vidal, O. Shakernia, H. J. Kim, D. H. Shim, and S. Sastry, "Probabilistic pursuit-evasion games: theory, implementation, and experimental evaluation," IEEE Transactions on Robotics and Automation, vol. 18, no. 5, pp. 662–669, 2002.
[7] R. E. Korf, "A simple solution to pursuit games," in Proceedings of the 2nd International Workshop on Distributed Artificial Intelligence, pp. 183–194, 1992.
[8] S. Nolfi and D. Floreano, "Coevolving predator and prey robots: do 'arms races' arise in artificial evolution?" Artificial Life, vol. 4, no. 4, pp. 311–335, 1998.
[9] N. J. Savill and P. Hogeweg, "Evolutionary stagnation due to pattern-pattern interactions in a coevolutionary predator-prey model," Artificial Life, vol. 3, no. 2, pp. 81–100, 1997.
[10] J. Y. Kuo, F.-C. Huang, S.-P. Ma, and Y.-Y. Fanjiang, "Applying hybrid learning approach to RoboCup's strategy," Journal of Systems and Software, vol. 86, no. 7, pp. 1933–1944, 2013.
[11] S. I. Nishimura and T. Ikegami, "Emergence of collective strategies in a prey-predator game model," Artificial Life, vol. 3, no. 4, pp. 243–260, 1998.
[12] K.-C. Jim and C. L. Giles, "Talking helps: evolving communicating agents for the predator-prey pursuit problem," Artificial Life, vol. 6, no. 3, pp. 237–254, 2000.
[13] T. Haynes and S. Sen, "Evolving behavioral strategies in predators and prey," in Adaption and Learning in Multi-Agent Systems, vol. 1042 of Lecture Notes in Computer Science, pp. 113–126, Springer, Berlin, Germany, 1995.
[14] D. Hladek, J. Vascak, and P. Sincak, "Hierarchical fuzzy inference system for robotic pursuit evasion task," in Proceedings of the 6th International Symposium on Applied Machine Intelligence and Informatics (SAMI '08), pp. 273–277, Herl'any, Slovakia, January 2008.
[15] R. G. Smith, "The contract net protocol: high-level communication and control in a distributed problem solver," IEEE Transactions on Computers, vol. 19, no. 12, pp. 1104–1113, 1980.
[16] P.-C. Zhou, B.-R. Hong, Y.-H. Wang, and T. Zhou, "Multi-agent cooperative pursuit based on extended contract net protocol," in Proceedings of the International Conference on Machine Learning and Cybernetics, vol. 1, pp. 169–173, August 2004.
[17] A. Rao and M. Georgeff, "BDI agents: from theory to practice," in Proceedings of the International Conference on Multi-Agent Systems, pp. 312–319, 1995.
[18] J. Piaget, The Equilibration of Cognitive Structures: The Central Problem of Intellectual Development, University of Chicago Press, 1985.
[19] J. Y. Kuo, S. J. Lee, C. L. Wu, N. L. Hsueh, and J. Lee, "Evolutionary agents for intelligent transport systems," International Journal of Fuzzy Systems, vol. 7, no. 2, pp. 85–93, 2005.
[20] J. Y. Kuo and F. C. Huang, "A hybrid approach for multi-agent learning systems," Intelligent Automation and Soft Computing, vol. 17, no. 3, pp. 385–399, 2011.
[21] S. Gebhardt, P. Grant, R. von Georgi, and M. T. Huber, "Aspects of Piaget's cognitive developmental psychology and neurobiology of psychotic disorders—an integrative model," Medical Hypotheses, vol. 71, no. 3, pp. 426–433, 2008.
[22] S. Takamuku and R. C. Arkin, "Multi-method learning and assimilation," Robotics and Autonomous Systems, vol. 55, no. 8, pp. 618–627, 2007.
[23] T. Parsons, "Pursuit-evasion in a graph," in Theory and Applications of Graphs, vol. 642 of Lecture Notes in Mathematics, pp. 426–441, Springer, Heidelberg, Germany, 1978.
[24] R. Breisch, "An intuitive approach to speleotopology," Southwestern Cavers, vol. 6, no. 5, pp. 72–78, 1967.
[25] F. Ferrari, "A new parameterized potential family for path-planning algorithms," International Journal on Artificial Intelligence Tools, vol. 18, no. 6, pp. 949–957, 2009.
[26] K. Walsh and B. Banerjee, "Fast A* with iterative resolution for navigation," International Journal on Artificial Intelligence Tools, vol. 19, no. 1, pp. 101–119, 2010.
[27] L.-P. Wong, M. Y. H. Low, and C. S. Chong, "Bee colony optimization with local search for traveling salesman problem," International Journal on Artificial Intelligence Tools, vol. 19, no. 3, pp. 305–334, 2010.
[28] E. Raboin, D. S. Nau, U. Kuter, S. K. Gupta, and P. Svec, "Strategy generation in multi-agent imperfect-information pursuit games," in Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, pp. 947–954, 2010.
[29] B. P. Gerkey, S. Thrun, and G. Gordon, "Visibility-based pursuit-evasion with limited field of view," International Journal of Robotics Research, vol. 25, no. 4, pp. 299–315, 2006.
[30] N. Basilico, N. Gatti, and F. Amigoni, "Leader-follower strategies for robotic patrolling in environments with arbitrary topologies," in Proceedings of the 8th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS '09), pp. 57–64, May 2009.
[31] M. Yan, "Multi-robot searching using game-theory based approach," International Journal of Advanced Robotic Systems, vol. 5, no. 4, pp. 341–350, 2008.
[32] K.-D. Althoff, "Case-based reasoning," in Handbook on Software Engineering and Knowledge Engineering, pp. 549–587, 2001.
[33] E. K. Burke, B. L. MacCarthy, S. Petrovic, and R. Qu, "Multiple-retrieval case-based reasoning for course timetabling problems," Journal of the Operational Research Society, vol. 57, no. 2, pp. 148–162, 2006.
[34] J. Long, D. G. Schwartz, S. Stoecklin, and M. K. Patel, "Application of loop reduction to learning program behaviors for anomaly detection," in Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC '05), vol. 1, pp. 691–696, April 2005.
[35] The Repast Suite, 2014, http://repast.sourceforge.net.
[36] L. Matignon, G. J. Laurent, and N. Le Fort-Piat, "Hysteretic Q-Learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '07), pp. 64–69, San Diego, Calif, USA, October–November 2007.
[37] I. Partalas, I. Feneris, and I. Vlahavas, "A hybrid multiagent reinforcement learning approach using strategies and fusion," International Journal on Artificial Intelligence Tools, vol. 17, no. 5, pp. 945–962, 2008.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 5: Research Article Multiagent Cooperative Learning ...downloads.hindawi.com/journals/mpe/2015/964871.pdf · Research Article Multiagent Cooperative Learning Strategies for Pursuit-Evasion

Mathematical Problems in Engineering 5

at time 119905 Then the probability of the evader being in cell 119909 attime 119905 + 1 is given by the following (10)

119901 (119909 119905 + 1 | 119909119890119894

(119905))

=

110038161003816100381610038161003816119880 (119909119890119894

(119905)) minus 119900 (119905)10038161003816100381610038161003816

119909 isin 119880 (119909119890119894

(119905))

0 119909 isin 119900 (119905)

(10)

where 119901(119909 119905 + 1 | 119909119890119894(119905)) represents the probability of the

evader being in cell 119909 at time 119905 + 1 according to the locationwhere the pursuer detects the evader at time 119905

34 Strategies for Plan Generation When the distancebetween the retrieved case and the new case is smaller thana threshold the above approach cannot generate a usefulplan for the new case To overcome this problem this studyproposes two plan-generation strategies local-max strategyand local-cooperative strategy

341 Local-Max Strategy This strategy is used only whenthere is a single pursuer for evaders Let 119878

119896(119910) be the set of all

cells that liewithin the sensing area of a pursuer 119896 at cell119910Thetotal evasion probability of the evaders at time 119905 associatedwith the pursuer is then obtained by the following (11)

119875 (119910 119905) = sum

119911isin119878119896(119910)

[119901 (119911 119905 + 1 | 119909119890119894

(119905))] (11)

By computing the evasion possibilities of all sensing cellspursuer 119896 moves to cell 119909

119901119896(119905 + 1) that has the highest total

evasion probability expressed as follows

119909119901119896

(119905 + 1) = arg max119910isin119880(119909119901

119896(119905))

[119875 (119910 119905)] (12)

342 Local-Cooperative Strategy This strategy is for pursuersthat cooperate with each other to catch the same evaderInitially pursuers need to decide whether they can cooperatewith each other or not Let 119889(119909 119910) be theManhattan distancefrom cell 119909 to cell 119910 that is if cells 119909 and 119910 lie in a two-dimensional space with coordinates (119909

1 1199092) and (119910

1 1199102)

the Manhattan distance is calculated using the followingequation

119889 (119909 119910) = max (10038161003816100381610038161199091 minus 119910

1

1003816100381610038161003816 10038161003816100381610038161199092 minus 119910

2

1003816100381610038161003816) (13)

Let 119878119896be the farthest sensing cell of a pursuer 119896 For

example if the sensing area of a pursuer 119896 is a 5 lowast 5 squareand the pursuer stands at the center of the area the farthestdistance that the pursuer can detect is two squares If thereexists two or more pursuers and one evader correspondingwith 119889(119909

119901119896(119905) 119909119890119894(119905)) le 119904

119896and 119889(119909

119901119896(119905) 119909119890119894(119905)) le 119904

119896 the

pursuers can cooperate with each otherSuppose that two or more pursuers execute the local-

cooperative strategy Let Overlap(119909 119910) be the number ofoverlap cells within the one-step reachable set (a 3lowast3 region)around cells 119909 and 119910 The overlap function is illustrated inFigure 2 The yellow grid represents the one-step reachable

EP1

P2

Figure 2 Number of overlap cells

set for a pursuer 1198751 the dashed grid depicts the one-stepreachable set for a pursuer 1198752 and the green grid representsthe position of an evader 119864 The number of overlap cells thusequals four (ie the return value of Overlap(119909 119910))

Once the pursuers can cooperate with each other thepursuers randomly select one of them as a leader Otherpursuers can determine their locations at time 119905 + 1 by thefollowing (14)

119909119901119896

(119905 + 1) = arg min119910isin119880(119909119901

119896(119905))

[Overlap (119909119901119896

(119905 + 1) 119910)] (14)

where 119909119901119896

(119905 + 1) is the location of the leader To avoidpursuers moving to the same location each pursuer finds theminimum overlap area

35 Reward Calculation Each case has its own reward forevaluating its result The higher the reward of a case is themore likely the goal will be achieved In this study the rewardis calculated using the following equation

119877 =10038161003816100381610038161003816119878119896

(119910) cap 119880 (119909119890119894

(119905))10038161003816100381610038161003816 (15)

Reward 119877 represents the number of one-step reachablesets for an evader in a pursuerrsquos sensing areaThe reward valueis between 0 and 10 If an evader is caught then the reward ofthe case will be 10

36 Assimilation and Accommodation Piagetrsquos cognitive the-ory [18] reveals that assimilation refers to the tendency tointerpret experience as much as possible through existingcognitive structure of knowing When an agent faces a newsituation it generates a new plan with the current cognitivestructure also called its strategy to direct itself what to donextThe entire process is called assimilation In other wordsan agent assimilates a new situation with its current cognitivestructure However if the current cognitive structure cannotexplain the environment (ie the current cognitive structurecannot get the equilibration of inconsistent informationabout the world) an agent has to use the strategy of accom-modation Accommodation refers to the realization that thecurrent structure is insufficient for adequate understandingof the world and that the current structure must be changeduntil it can assimilate the new situation

6 Mathematical Problems in Engineering

This study proposes two accommodationmethods bevel-angle and positive-angle strategies for modifying a cogni-tive structure After performing accommodation an agentemploys themodified strategy to assimilate the new situationBy adjusting constantly its cognitive structure the agent canadapt more effectively to the environment

3.6.1. Bevel-Angle Strategy. Suppose that an evader's location is in the one-step reachable set for a pursuer and the evader is at a bevel angle of the pursuer. The pursuer randomly chooses one of the following strategies when the evader is at 45° ($\pi/4$), 135° ($3\pi/4$), 225° ($5\pi/4$), or 315° ($7\pi/4$) of the pursuer. Let $\deg(x_{p_k}(t), x_{e_i}(t))$ be the angle between the pursuer $k$ and the evader $i$, where $x_{e_i}(t)$ is the position of the evader and $x_{p_k}(t)$ is the position of the pursuer at time $t$.

Strategy 1. According to (16), the pursuer moves to the evader's position at time $t + 1$:

$x_{p_k}(t + 1) = x_{e_i}(t)$, if $x_{e_i}(t) \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{\pi/4, 3\pi/4, 5\pi/4, 7\pi/4\}$.  (16)

Strategy 2. According to (17), the pursuer moves to 45° of the evader's position at time $t + 1$. Let $x_j$ be an element of the one-step reachable set for the pursuer $k$ when it is in cell $x$ at time $t$:

$x_{p_k}(t + 1) = x_j$, if $x_{e_i}(t) \in U(x_{p_k}(t))$, $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{\pi/4, 3\pi/4, 5\pi/4, 7\pi/4\}$, $x_j \in U(x_{p_k}(t))$, and $\deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) + 45°$.  (17)

Strategy 3. According to (18), the pursuer moves to −45° of the evader's position at time $t + 1$:

$x_{p_k}(t + 1) = x_j$, if $x_{e_i}(t) \in U(x_{p_k}(t))$, $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{\pi/4, 3\pi/4, 5\pi/4, 7\pi/4\}$, $x_j \in U(x_{p_k}(t))$, and $\deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) - 45°$.  (18)

Strategy 4. According to (19), the pursuer stays in the same place at time $t + 1$:

$x_{p_k}(t + 1) = x_{p_k}(t)$, if $x_{e_i}(t) \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{\pi/4, 3\pi/4, 5\pi/4, 7\pi/4\}$.  (19)

3.6.2. Positive-Angle Strategy. Suppose that an evader's location is in the one-step reachable set for a pursuer and the evader is at a positive angle of the pursuer, where the angle includes 0°, 90° ($\pi/2$), 180° ($\pi$), and 270° ($3\pi/2$).

Strategy 1. According to (20), the pursuer moves to the evader's position at time $t + 1$:

$x_{p_k}(t + 1) = x_{e_i}(t)$, if $x_{e_i}(t) \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{0, \pi/2, \pi, 3\pi/2\}$.  (20)

Strategy 2. According to (21), the pursuer moves to 45° of the evader's position at time $t + 1$:

$x_{p_k}(t + 1) = x_j$, if $x_{e_i}(t) \in U(x_{p_k}(t))$, $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{0, \pi/2, \pi, 3\pi/2\}$, $x_j \in U(x_{p_k}(t))$, and $\deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) + 45°$.  (21)

Strategy 3. According to (22), the pursuer moves to −45° of the evader's position at time $t + 1$:

$x_{p_k}(t + 1) = x_j$, if $x_{e_i}(t) \in U(x_{p_k}(t))$, $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{0, \pi/2, \pi, 3\pi/2\}$, $x_j \in U(x_{p_k}(t))$, and $\deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) - 45°$.  (22)


Strategy 4. According to (23), the pursuer moves to 90° of the evader's position at time $t + 1$:

$x_{p_k}(t + 1) = x_j$, if $x_{e_i}(t) \in U(x_{p_k}(t))$, $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{0, \pi/2, \pi, 3\pi/2\}$, $x_j \in U(x_{p_k}(t))$, and $\deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) + 90°$.  (23)

Strategy 5. According to (24), the pursuer moves to −90° of the evader's position at time $t + 1$:

$x_{p_k}(t + 1) = x_j$, if $x_{e_i}(t) \in U(x_{p_k}(t))$, $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{0, \pi/2, \pi, 3\pi/2\}$, $x_j \in U(x_{p_k}(t))$, and $\deg(x_{p_k}(t), x_j) = \deg(x_{p_k}(t), x_{e_i}(t)) - 90°$.  (24)

Strategy 6. According to (25), the pursuer stays in the same place at time $t + 1$:

$x_{p_k}(t + 1) = x_{p_k}(t)$, if $x_{e_i}(t) \in U(x_{p_k}(t))$ and $\deg(x_{p_k}(t), x_{e_i}(t)) \in \{0, \pi/2, \pi, 3\pi/2\}$.  (25)

Combining the bevel-angle and positive-angle strategies, the pursuer obtains 24 different plans to adapt to the environment effectively.
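As an illustration of how such a plan can be realized on the grid, the sketch below assumes 8-connected integer cells; deg approximates the angle from pursuer to evader, and bevel_angle_move randomly applies one of Strategies 1–4 of the bevel-angle case. The positive-angle case is analogous with offsets of ±45° and ±90°. All names and the angle rounding are illustrative assumptions.

import math
import random

def deg(pursuer_cell, evader_cell):
    # Angle (degrees, in [0, 360)) from the pursuer to the evader on the grid.
    dx = evader_cell[0] - pursuer_cell[0]
    dy = evader_cell[1] - pursuer_cell[1]
    return math.degrees(math.atan2(dy, dx)) % 360

def neighbor_at_angle(cell, angle_deg):
    # The adjacent cell lying closest to the given angle from `cell`.
    step = {0: (1, 0), 45: (1, 1), 90: (0, 1), 135: (-1, 1),
            180: (-1, 0), 225: (-1, -1), 270: (0, -1), 315: (1, -1)}
    dx, dy = step[(round(angle_deg / 45) % 8) * 45]
    return (cell[0] + dx, cell[1] + dy)

def bevel_angle_move(pursuer_cell, evader_cell):
    # Strategies 1-4: move onto the evader, rotate the heading by +45 or -45 degrees,
    # or stay in place, chosen at random when the evader sits at a bevel angle.
    angle = deg(pursuer_cell, evader_cell)
    candidates = [evader_cell,                                  # Strategy 1: move to the evader
                  neighbor_at_angle(pursuer_cell, angle + 45),  # Strategy 2: +45 degrees
                  neighbor_at_angle(pursuer_cell, angle - 45),  # Strategy 3: -45 degrees
                  pursuer_cell]                                 # Strategy 4: stay
    return random.choice(candidates)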

3.7. Plan Evolution Process. Plan evolution plays an important role in an agent's life cycle. When a pursuer finds an evader in its sensing area, it generates a pursuit plan. Initially, the pursuer searches its case base for similar cases. If similar cases exist, the pursuer chooses one of them as the plan; otherwise, the pursuer uses one of the abovementioned strategies to generate a plan. When the pursuer faces a new state, it adjusts the plan accordingly. Figure 3 shows the agent's plan evolution process; a short code sketch of the case-reuse decision follows the step descriptions below.

(1) Problem Analysis. Before determining its location in the next step, a pursuer analyzes the environmental state and updates its goal with information from the environment. The pursuer chooses a plan according to its mental state.

(2) Case Retrieval. Comparing the similarity between the result of the analysis in Step 1 and the cases in the case base, the pursuer retrieves similar cases by the similarity (Sim) function and sorts them according to their similarity levels.

(3) Similarity Calculation. Similarity is classified into low and high levels. It is possible that the pursuer retrieves a case with high-level similarity but obtains unsatisfactory results; a reward $R$ is therefore used to prevent this. If the case has high-level similarity and the reward $R$ exceeds a predefined constant $\varepsilon$, the pursuer executes Step 4. If the case has low-level similarity or the reward $R$ is smaller than the predefined constant $\varepsilon$, the pursuer executes Step 5.

(4) Case Reuse. If the retrieved case has high-level similarity and its result is acceptable, the pursuer reuses the case with revisions and then goes to Step 10.

(5) Strategy Assimilation. If no similar case exists in the case base or the result of the similar case is unsatisfactory, the pursuer uses strategy assimilation; that is, the pursuer generates a suitable plan with the current strategy (Steps 6–8) according to the situation.

(6) Cooperation. Before using the strategy module, the pursuer decides whether to cooperate with the others. If the decision is to cooperate in the current situation, the local-cooperative strategy (i.e., Step 8) is executed; otherwise, the pursuer executes the local-max strategy (i.e., Step 7).

(7) Application of the Local-Max Strategy. If the pursuer decides not to cooperate with the others, it executes the local-max strategy and moves to the cell in the one-step reachable set with the highest probability of containing an evader. Then go to Step 9.

(8) Application of the Local-Cooperative Strategy. If the cooperative mode is taken, the pursuer executes the local-cooperative strategy and moves to the cell that has the minimum overlapping sensing area.

(9) Reward Calculation. The pursuer evaluates the plan and calculates a reward accordingly.

(10) Plan Transformation. The pursuer represents the plan by beliefs, actions, and rewards.

(11) Strategy Assessment. The pursuer evaluates whether the current strategy is suitable. The pursuer counts the number $C$ of consecutive failures in catching an evader with the strategy. If $C$ is smaller than a constant $\sigma$, the strategy is effective; then go to Step 13. Otherwise, the strategy is accommodated in Step 12.

(12) Strategy Accommodation. The pursuer modifies the strategy to generate a new plan for the current environment.

(13) Case Revision. The actions and reward of the case are revised after the problem is solved.


Figure 3: Plan evolution process (flow among Steps (1)–(14); case reuse when R > ε, strategy assimilation when R ≤ ε, and strategy accommodation when C ≥ σ).

(14) Case Retention. Parts of the experience that are likely to be useful for future problem solving are retained.
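The decision at the heart of Steps 2–8 can be sketched as follows. This is a simplified Python reading with illustrative names; the similarity test is reduced to a single threshold, and the strategy modules are passed in as callables, which is not necessarily how the original system is organized.

from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Case:
    similarity: float  # Sim value of the stored case against the current situation
    reward: float      # reward R recorded for the case
    plan: object       # the stored plan (beliefs, actions, rewards)

def select_plan(cases: List[Case], high_sim: float, epsilon: float, cooperate: bool,
                local_cooperative: Callable[[], object],
                local_max: Callable[[], object]) -> object:
    # Steps 2-4: reuse the most similar case when its similarity is high and R > epsilon.
    best: Optional[Case] = max(cases, key=lambda c: c.similarity, default=None)
    if best is not None and best.similarity >= high_sim and best.reward > epsilon:
        return best.plan
    # Steps 5-8: strategy assimilation -- generate a new plan with the current strategy,
    # cooperatively (local-cooperative) or individually (local-max).
    return local_cooperative() if cooperate else local_max()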

4. Case Study

This study uses a pursuit-evasion game to demonstrate the proposed approach and implements a multiagent system on the Recursive Porous Agent Simulation Toolkit (REPAST) [35]. Figure 4 shows the GUI of the system, where the blue, red, and black circles represent pursuers, evaders, and obstacles, respectively. The yellow region depicts the sensing area of the pursuers.

4.1. Problem Formulation. The system contains multiple pursuing robots that search for multiple evading robots in a two-dimensional environment. The physical world is abstracted as a grid world, and the game is played according to the following settings and rules (a short code sketch follows the list).

(i) The number of pursuers is $n$. The number of evaders is unknown to the pursuers.

(ii) Each pursuer has a sensing area comprising 5 × 5 cells, and the pursuer is at the center of the area, as shown in Figure 4.

(iii) During each moving step, the agents are allowed to either move one cell away from their current cells or stay in their current cells.

(iv) Each pursuer has no information about the environment other than that of its own 5 × 5 sensing area.

(v) All pursuers use the same case base.

(vi) An evader is captured when it occupies the same cell as a pursuer, and it is then removed from the environment.

(vii) The game ends when all evaders have been captured.
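A compact encoding of rules (ii)–(iv) and (vi), assuming integer cells and ignoring obstacles and grid borders; the helper names are illustrative.

def in_sensing_area(pursuer_cell, cell, radius=2):
    # Rules (ii) and (iv): a pursuer only perceives cells inside its 5 x 5 sensing square.
    return (abs(cell[0] - pursuer_cell[0]) <= radius
            and abs(cell[1] - pursuer_cell[1]) <= radius)

def legal_moves(cell):
    # Rule (iii): move one cell in any direction or stay in the current cell.
    x, y = cell
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

def is_captured(evader_cell, pursuer_cells):
    # Rule (vi): an evader is captured when a pursuer occupies its cell.
    return evader_cell in pursuer_cells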

4.2. Evasion Policy. When an evader is caught, the system first checks whether the sum of the caught and uncaught evaders is smaller than the total number of predefined evaders. If so, the system generates a new evader and chooses its type of movement; otherwise, the system does nothing.


Figure 4: Implementation of the multiagent system.

Clearly, when the sum of the caught and uncaught evaders equals the number of predefined evaders, the system stops generating new evaders, and the number of uncaught evaders does not increase; it only decreases with successful captures. The game is therefore guaranteed to end. In addition, when the system selects a movement type for a new evader, the choice varies with different experiments, which in turn affects the outcome of the experiment. This study investigates four types of movement, namely, random, clockwise, counterclockwise, and smart movements.

Random Movement. An evader randomly selects a location that can be reached in one step or just stays in the same location. The details are shown in Algorithm 1.

Clockwise Movement. An evader moves clockwise in a square path. The evader initially selects the direction of its first step and sets a distance randomly. If the evader runs into an obstacle on the way, it changes its direction automatically. Algorithm 2 shows the algorithm.

Counterclockwise Movement. This movement type is similar to the clockwise movement, except that an evader moves counterclockwise. The algorithm is shown in Algorithm 3.

Smart Movement. Evaders actively try to avoid pursuers. Each evader has a square sensing area whose sides are three cells long. If the evader finds any pursuer in this area, it attempts to move to the location that has the fewest pursuers within the one-step reachable set of the area. Algorithm 4 presents the algorithm.

4.3. Results and Discussion. This study designs several experiments to compare the performance of the proposed approach, Hysteretic Q-Learning [36], and Reinforcement-Learning fusing (RL-Fusing) [37]. All these approaches are learning techniques for multiagent systems. Each experiment has three pursuers and one evader. In addition, the experiments run in "time step" units. One round is considered finished when the evader has been caught. The performance results are the average values obtained by running 30 rounds with the current learning policy after 100 training episodes for each algorithm.

4.3.1. Comparison of Approaches. Matignon et al. presented a decentralized reinforcement learning algorithm for independent learning robots called Hysteretic Q-Learning [36]. This algorithm computes a better policy in a cooperative multiagent system without additional information or communication between agents. Hysteretic Q-Learning updates the Q-value of an agent $i$ executing action $a_i$ in state $s$ and ending up in state $s'$ as follows:

$\delta \leftarrow r + \gamma \max_{a'} Q_i(s', a') - Q_i(s, a_i)$

$Q_i(s, a_i) \leftarrow \begin{cases} Q_i(s, a_i) + \mathrm{inc}\,\delta & \text{if } \delta > 0 \\ Q_i(s, a_i) + \mathrm{dec}\,\delta & \text{otherwise} \end{cases}$  (26)

where $Q_i(s, a_i)$ is the action value function of state $s$ and action $a_i$, $r$ is the reward it receives, $\gamma \in [0, 1]$ is the discount factor, and inc and dec are the rates of increase and decrease in $Q$-values, respectively. Partalas et al. proposed the RL-Fusing approach, which uses coordinated actions and a fusing process to guide the agents [37]. The fusion process combines the decisions of all agents and then outputs a global decision, which the agents must follow as a team. Algorithm 5 shows the RL-Fusing algorithm; the set of strategies corresponds to the plans.
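A small Python sketch of the update in (26). The dictionary-based Q-table, the default of 0 for unseen state-action pairs, and the particular values of gamma, inc, and dec are illustrative assumptions rather than settings reported for the experiments.

def hysteretic_q_update(Q, state, action, reward, next_state, actions,
                        gamma=0.9, inc=0.9, dec=0.1):
    # TD error of ordinary Q-learning for agent i.
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    delta = reward + gamma * best_next - Q.get((state, action), 0.0)
    # Asymmetric update: learn quickly from positive errors and slowly from negative
    # ones, so an agent is not over-penalized for its teammates' exploratory actions.
    rate = inc if delta > 0 else dec
    Q[(state, action)] = Q.get((state, action), 0.0) + rate * delta
    return Q[(state, action)]

For example, starting from an empty table Q = {}, repeated calls such as hysteretic_q_update(Q, 's0', 'north', 1.0, 's1', ['north', 'south']) gradually raise Q[('s0', 'north')].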

4.3.2. Obstacles. The first experiment compares the performance of the proposed approach, Hysteretic Q-Learning, and RL-Fusing in environments with and without obstacles. The evader is assumed to move randomly in every round. In the environment with obstacles, 10 obstacles are randomly set up initially, and no evader is trapped in the obstacles.

Table 1 shows the performance results. As can be seen, all three methods require more capture steps in the environment with obstacles because the pursuers need extra time to detect and avoid obstacles. Hysteretic Q-Learning takes the greatest number of capture steps in both environments because it does not support pursuer cooperation. Furthermore, RL-Fusing also takes more capture steps than the proposed scheme. Although RL-Fusing involves cooperation between pursuers, the generated strategies sometimes cause a pursuer to miss evaders in its sensing area; the pursuer thus needs to search again and requires more capture steps. In contrast, the proposed approach utilizes not only the cooperative strategy but also assimilation and accommodation, which enable a pursuer to learn from both success and failure experiences. Thus, the pursuer can keep in step with an evader and catch it in a smaller number of capture steps.

4.3.3. Movement Types. The second experiment measures the average number of capture steps required by the proposed approach, Hysteretic Q-Learning, and RL-Fusing under two types of evader movement: one is clockwise movement, and the other is random clockwise and counterclockwise movement.

The average number of capture steps required by the three approaches for the different movement types is shown in Table 2. Similar to the results obtained in the first experiment, the proposed scheme outperforms Hysteretic Q-Learning and RL-Fusing. Among the three approaches, Hysteretic Q-Learning requires the highest number of capture steps for


(1) while true do
(2)   next_location ← randomly generate a location nearby the evader
(3)   if next_location has an obstacle then
(4)     next_location ← randomly generate a new location nearby the evader
(5)   end if
(6) end while

Algorithm 1: Random movement algorithm.

(1) direction ← randomly generate a direction
(2) distance ← randomly generate a distance shorter than the environment
(3) move ← 0
(4) while true do
(5)   get the next location according to the direction
(6)   move ← move + 1
(7)   if move equals distance or the next location has an obstacle then
(8)     the evader turns right
(9)     update the evader's direction
(10)    move ← 0
(11)  end if
(12) end while

Algorithm 2: Clockwise movement algorithm.

(1) direction ← randomly generate a direction
(2) distance ← randomly generate a distance shorter than the environment
(3) move ← 0
(4) while true do
(5)   get the next location according to the direction
(6)   move ← move + 1
(7)   if move equals distance or the next location has an obstacle then
(8)     the evader turns left
(9)     update the evader's direction
(10)    move ← 0
(11)  end if
(12) end while

Algorithm 3: Counterclockwise movement algorithm.

(1) while true do
(2)   if the evader's sensing area has a pursuer then
(3)     next_location ← generate the location with the fewest pursuers detected
(4)   else
(5)     next_location ← randomly generate a location nearby the evader
(6)   end if
(7) end while

Algorithm 4: Smart movement algorithm.
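For comparison with the pseudocode above, a runnable version of the smart movement policy (Algorithm 4) might look as follows; integer cells, a 3 × 3 evader sensing square, and the absence of obstacle checks are simplifying assumptions.

import random

def smart_move(evader_cell, pursuer_cells, sense_radius=1):
    ex, ey = evader_cell
    reachable = [(ex + dx, ey + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

    def pursuers_near(cell):
        # Number of pursuers within the 3 x 3 sensing square around `cell`.
        return sum(1 for px, py in pursuer_cells
                   if abs(px - cell[0]) <= sense_radius and abs(py - cell[1]) <= sense_radius)

    if pursuers_near(evader_cell) > 0:
        # A pursuer is sensed: move to the reachable cell with the fewest pursuers nearby.
        return min(reachable, key=pursuers_near)
    # Otherwise move (or stay) at random, as in the random movement algorithm.
    return random.choice(reachable)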

the same reason that it does not support agent cooperation. According to the RL-Fusing algorithm presented in Algorithm 5, when a state is not a coordinated state, an agent acts with a random or predefined policy, and a single pursuer thus has no learning ability. Therefore, the possibility of catching an evader is low, and RL-Fusing also takes more capture steps. In contrast, when a pursuer using the proposed scheme faces an evader, the pursuer observes the evader continuously. If the evader changes its moving type, the pursuer uses accommodation to change its catching


Require: a set of strategies, an initial state $s_0$
(1) $s \leftarrow s_0$
(2) while true do
(3)   if $s$ is a coordinated state then
(4)     $\rho \leftarrow$ RandomReal(0, 1)
(5)     if $\rho < \varepsilon$ then
(6)       select a strategy randomly, the same for all agents
(7)     else
(8)       rank strategies
(9)       communicate ranks
(10)      receive ranks from other agents
(11)      average ranks
(12)      select the corresponding strategy
(13)    end if
(14)    execute the selected strategy
(15)    receive reward
(16)    transit to a new state $s'$
(17)    update $Q$-value
(18)  else
(19)    act with a predefined or random policy
(20)  end if
(21) end while

Algorithm 5: RL-Fusing algorithm.
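The fusion step of Algorithm 5 (lines (8)–(12)) amounts to averaging the agents' strategy ranks and executing the best-ranked strategy. A minimal sketch of one plausible reading, assuming ranks_per_agent[i][s] is agent i's rank for strategy s (lower is better); the function name is illustrative.

def fuse_strategy(ranks_per_agent):
    n_strategies = len(ranks_per_agent[0])
    n_agents = len(ranks_per_agent)
    # Average each strategy's rank over all agents, then pick the best average rank.
    avg_rank = [sum(agent[s] for agent in ranks_per_agent) / n_agents
                for s in range(n_strategies)]
    return min(range(n_strategies), key=lambda s: avg_rank[s])

For example, fuse_strategy([[2, 1, 3], [3, 1, 2]]) selects strategy 1, whose average rank of 1.0 is the lowest.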

Table 1: Average number of capture steps in different environments.

Method                  Grid size   Without obstacles (steps)   With obstacles (steps)
Hysteretic Q-Learning   12 × 12     68                          73
RL-Fusing               12 × 12     61                          65
The proposed approach   12 × 12     24                          27

Table 2: Average number of capture steps for different movement types.

Method                  Grid size   Clockwise movement (steps)   Random clockwise and counterclockwise movement (steps)
Hysteretic Q-Learning   12 × 12     56                           69
RL-Fusing               12 × 12     48                           56
The proposed approach   12 × 12     26                           32

strategy. If the moving type remains fixed, the pursuer keeps the same strategy by assimilation. As seen in Table 2, all three methods require more steps to capture evaders with random clockwise and counterclockwise movement because the pursuers need extra time to detect the moving directions of the evaders.

Table 3: Average number of capture steps for evaders with smart movement in different grid sizes.

Method                  Grid size 12 × 12 (steps)   Grid size 24 × 24 (steps)
Hysteretic Q-Learning   74                          212
RL-Fusing               63                          300
The proposed approach   63                          119

Table 4: Average number of capture steps with/without assimilation and accommodation.

                                         Clockwise evader movement   Smart evader movement
With assimilation and accommodation      186                         278
Without assimilation and accommodation   242                         364

4.3.4. Smart Movement and Size of Grid. The third experiment involves evaders with smart movement. The evaders actively try to avoid pursuers when they sense any pursuer. The environment is of two grid sizes: 12 × 12 and 24 × 24.

Table 3 shows that the proposed scheme requires fewer capture steps in both grid sizes when compared with Hysteretic Q-Learning and RL-Fusing. When the grid size is smaller, RL-Fusing outperforms Hysteretic Q-Learning. However, as the grid size becomes larger, RL-Fusing requires more capture steps than Hysteretic Q-Learning. This is


Figure 5: Average number of capture steps (y-axis) versus training episodes (x-axis, 10–40) for grid sizes 12 × 12 and 24 × 24.

because RL-Fusing depends mainly on its cooperative strategies to catch evaders, but not all strategies can guide pursuers to move close to the evaders. For example, strategy 3 in RL-Fusing [37] requires all the pursuers to keep a distance of 3 from an evader. Because the sensing area of a pursuer is a 5 × 5 square, the pursuers easily lose the evader in a larger environment. In contrast, the proposed approach helps the pursuers move as close to the evader as possible by avoiding overlapping sensing areas; thus, the approach has the highest possibility of catching the evader.

4.3.5. Smart Movement, Training Episodes, and Size of Grid. The fourth experiment investigates the relationship between training episodes and grid sizes when using the proposed approach.

Figure 5 shows that, regardless of grid size (12 × 12 or 24 × 24), the greater the number of training episodes, the smaller the average number of capture steps required. This is because more training enriches the case base, thus enabling pursuers to make better strategies and decisions.

4.3.6. Assimilation and Accommodation. The last experiment evaluates the proposed approach with and without assimilation and accommodation. There are two types of evader movement: clockwise and smart. The results are listed in Table 4. Whether the approach is with or without assimilation and accommodation, the average number of capture steps for smart movement is greater than that for clockwise movement. This is because smart movement enables an evader to dynamically change its movement and position, whereas the movement of an evader with clockwise movement can be easily predicted; hence, the pursuers can capture it in fewer steps. Furthermore, the average number of capture steps required by the approach with assimilation and accommodation is smaller than that required by the approach without assimilation and accommodation for both movement types. The approach without assimilation and accommodation generates a fixed plan, which cannot be changed. In contrast, the approach with assimilation and accommodation enables the first pursuer to use the local-max strategy to find a better position, and the remaining pursuers can determine their cooperative positions according to the first pursuer's position. These pursuers can adjust their positions according to different catching situations and strategies. Accordingly, the approach with assimilation and accommodation requires fewer capture steps than the approach without assimilation and accommodation.

5. Conclusion

This work proposes an effective strategy learning approach to the pursuit-evasion game in a dynamic environment. The method enables a pursuer to learn and implement effective plans from both success and failure experiences when no knowledge of the environment is available. The evolved plans fare extremely well when compared with some of the best constructed strategies. This study also demonstrates the superior performance of the proposed approach for the pursuit-evasion game by implementing a multiagent system on REPAST.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported by the Ministry of Science and Technology under Grant no. 102-2218-E-027-007.

References

[1] K. P. Sycara, "Multiagent systems," AI Magazine, vol. 19, no. 2, pp. 79–92, 1998.
[2] N. Vlassis, A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence, Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool, 2007.
[3] A. Antoniades, H. J. Kim, and S. Sastry, "Pursuit-evasion strategies for teams of multiple agents with incomplete information," in Proceedings of the 42nd IEEE Conference on Decision and Control, pp. 756–761, December 2003.
[4] L. J. Guibas, J.-C. Latombe, S. M. LaValle, D. Lin, and R. Motwani, "A visibility-based pursuit-evasion problem," International Journal of Computational Geometry & Applications, vol. 9, no. 4-5, pp. 471–493, 1999.
[5] J. P. Hespanha, G. J. Pappas, and M. Prandini, "Greedy control for hybrid pursuit-evasion games," in Proceedings of the European Control Conference, pp. 2621–2626, 2001.
[6] R. Vidal, O. Shakernia, H. J. Kim, D. H. Shim, and S. Sastry, "Probabilistic pursuit-evasion games: theory, implementation, and experimental evaluation," IEEE Transactions on Robotics and Automation, vol. 18, no. 5, pp. 662–669, 2002.
[7] R. E. Korf, "A simple solution to pursuit games," in Proceedings of the 2nd International Workshop on Distributed Artificial Intelligence, pp. 183–194, 1992.
[8] S. Nolfi and D. Floreano, "Coevolving predator and prey robots: do 'arms races' arise in artificial evolution?," Artificial Life, vol. 4, no. 4, pp. 311–335, 1998.
[9] N. J. Savill and P. Hogeweg, "Evolutionary stagnation due to pattern-pattern interactions in a coevolutionary predator-prey model," Artificial Life, vol. 3, no. 2, pp. 81–100, 1997.
[10] J. Y. Kuo, F.-C. Huang, S.-P. Ma, and Y.-Y. Fanjiang, "Applying hybrid learning approach to RoboCup's strategy," Journal of Systems and Software, vol. 86, no. 7, pp. 1933–1944, 2013.
[11] S. I. Nishimura and T. Ikegami, "Emergence of collective strategies in a prey-predator game model," Artificial Life, vol. 3, no. 4, pp. 243–260, 1998.
[12] K.-C. Jim and C. L. Giles, "Talking helps: evolving communicating agents for the predator-prey pursuit problem," Artificial Life, vol. 6, no. 3, pp. 237–254, 2000.
[13] T. Haynes and S. Sen, "Evolving behavioral strategies in predators and prey," in Adaption and Learning in Multi-Agent Systems, vol. 1042 of Lecture Notes in Computer Science, pp. 113–126, Springer, Berlin, Germany, 1995.
[14] D. Hladek, J. Vascak, and P. Sincak, "Hierarchical fuzzy inference system for robotic pursuit evasion task," in Proceedings of the 6th International Symposium on Applied Machine Intelligence and Informatics (SAMI '08), pp. 273–277, Herl'any, Slovakia, January 2008.
[15] R. G. Smith, "The contract net protocol: high-level communication and control in a distributed problem solver," IEEE Transactions on Computers, vol. 19, no. 12, pp. 1104–1113, 1980.
[16] P.-C. Zhou, B.-R. Hong, Y.-H. Wang, and T. Zhou, "Multi-agent cooperative pursuit based on extended contract net protocol," in Proceedings of the International Conference on Machine Learning and Cybernetics, vol. 1, pp. 169–173, August 2004.
[17] A. Rao and M. Georgeff, "BDI agents: from theory to practice," in Proceedings of the International Conference on Multi-Agent Systems, pp. 312–319, 1995.
[18] J. Piaget, The Equilibration of Cognitive Structures: The Central Problem of Intellectual Development, University of Chicago Press, 1985.
[19] J. Y. Kuo, S. J. Lee, C. L. Wu, N. L. Hsueh, and J. Lee, "Evolutionary agents for intelligent transport systems," International Journal of Fuzzy Systems, vol. 7, no. 2, pp. 85–93, 2005.
[20] J. Y. Kuo and F. C. Huang, "A hybrid approach for multi-agent learning systems," Intelligent Automation and Soft Computing, vol. 17, no. 3, pp. 385–399, 2011.
[21] S. Gebhardt, P. Grant, R. von Georgi, and M. T. Huber, "Aspects of Piaget's cognitive developmental psychology and neurobiology of psychotic disorders—an integrative model," Medical Hypotheses, vol. 71, no. 3, pp. 426–433, 2008.
[22] S. Takamuku and R. C. Arkin, "Multi-method learning and assimilation," Robotics and Autonomous Systems, vol. 55, no. 8, pp. 618–627, 2007.
[23] T. Parsons, "Pursuit-evasion in a graph," in Theory and Applications of Graphs, vol. 642 of Lecture Notes in Mathematics, pp. 426–441, Springer, Heidelberg, Germany, 1978.
[24] R. Breisch, "An intuitive approach to speleotopology," Southwestern Cavers, vol. 6, no. 5, pp. 72–78, 1967.
[25] F. Ferrari, "A new parameterized potential family for path planning algorithms," International Journal on Artificial Intelligence Tools, vol. 18, no. 6, pp. 949–957, 2009.
[26] K. Walsh and B. Banerjee, "Fast A* with iterative resolution for navigation," International Journal on Artificial Intelligence Tools, vol. 19, no. 1, pp. 101–119, 2010.
[27] L.-P. Wong, M. Y. H. Low, and C. S. Chong, "Bee colony optimization with local search for traveling salesman problem," International Journal on Artificial Intelligence Tools, vol. 19, no. 3, pp. 305–334, 2010.
[28] E. Raboin, D. S. Nau, U. Kuter, S. K. Gupta, and P. Svec, "Strategy generation in multi-agent imperfect-information pursuit games," in Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, pp. 947–954, 2010.
[29] B. P. Gerkey, S. Thrun, and G. Gordon, "Visibility-based pursuit-evasion with limited field of view," International Journal of Robotics Research, vol. 25, no. 4, pp. 299–315, 2006.
[30] N. Basilico, N. Gatti, and F. Amigoni, "Leader-follower strategies for robotic patrolling in environments with arbitrary topologies," in Proceedings of the 8th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS '09), pp. 57–64, May 2009.
[31] M. Yan, "Multi-robot searching using game-theory based approach," International Journal of Advanced Robotic Systems, vol. 5, no. 4, pp. 341–350, 2008.
[32] K.-D. Althoff, "Case-based reasoning," in Handbook on Software Engineering and Knowledge Engineering, pp. 549–587, 2001.
[33] E. K. Burke, B. L. MacCarthy, S. Petrovic, and R. Qu, "Multiple-retrieval case-based reasoning for course timetabling problems," Journal of the Operational Research Society, vol. 57, no. 2, pp. 148–162, 2006.
[34] J. Long, D. G. Schwartz, S. Stoecklin, and M. K. Patel, "Application of loop reduction to learning program behaviors for anomaly detection," in Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC '05), vol. 1, pp. 691–696, April 2005.
[35] The Repast Suite, 2014, http://repast.sourceforge.net/.
[36] L. Matignon, G. J. Laurent, and N. L. Fort-Piat, "Hysteretic Q-Learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '07), pp. 64–69, San Diego, Calif, USA, October-November 2007.
[37] I. Partalas, I. Feneris, and I. Vlahavas, "A hybrid multiagent reinforcement learning approach using strategies and fusion," International Journal on Artificial Intelligence Tools, vol. 17, no. 5, pp. 945–962, 2008.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 6: Research Article Multiagent Cooperative Learning ...downloads.hindawi.com/journals/mpe/2015/964871.pdf · Research Article Multiagent Cooperative Learning Strategies for Pursuit-Evasion

6 Mathematical Problems in Engineering

This study proposes two accommodationmethods bevel-angle and positive-angle strategies for modifying a cogni-tive structure After performing accommodation an agentemploys themodified strategy to assimilate the new situationBy adjusting constantly its cognitive structure the agent canadapt more effectively to the environment

361 Bevel-Angle Strategy Suppose that an evaderrsquos locationis in the one-step reachable set for a pursuer and the evader isat a bevel angle of the pursuerThe pursuer randomly choosesone of the following strategies when the evader is at 45∘ (1205874)135∘ (31205874) 225∘ (51205874) and 315∘ (71205874) of the pursuer Letdeg(119909

119901119896(119905) 119909119890119894(119905)) be the angle between the pursuer 119896 and

evader 119894 where 119909119890119894(119905) is the position of the evader and 119909

119901119896(119905)

is the position of the pursuer at time 119905

Strategy 1 According to (16) the pursuermoves to the evaderrsquosposition at time 119905 + 1

119909119901119896

(119905 + 1) = 119909119890119894

(119905)

if 119909119890119894

(119905) isin 119880 (119909119901119896

(119905))

deg (119909119901119896

(119905) 119909119890119894

(119905)) isin120587

43120587

45120587

47120587

4

(16)

Strategy 2 According to (17) the pursuer moves to 45∘ of theevaderrsquos position at time 119905+1 Let 119909

119895be an element of the one-

step reachable set for the pursuer 119896 when it is in cell 119909 at time119905

119909119901119896

(119905 + 1) = 119909119895

if 119909119890119894

(119905) isin 119880 (119909119901119896

(119905))

cap deg (119909119901119896

(119905) 119909119890119894

(119905)) isin120587

43120587

45120587

47120587

4

cap 119909119895

isin 119880 (119909119901119896

(119905))

cap deg (119909119901119896

(119905) 119909119895) = deg (119909

119901119896(119905) 119909119890119894

(119905)) + 45

(17)

Strategy 3 According to (18) the pursuermoves tominus45∘ of theevaderrsquos position at time 119905 + 1

119909119901119896

(119905 + 1) = 119909119895

if 119909119890119894

(119905) isin 119880 (119909119901119896

(119905))

cap deg (119909119901119896

(119905) 119909119890119894

(119905)) isin120587

43120587

45120587

47120587

4

cap 119909119895

isin 119880 (119909119901119896

(119905))

cap deg (119909119901119896

(119905) 119909119895) = deg (119909

119901119896(119905) 119909119890119894

(119905)) minus 45

(18)

Strategy 4 According to (19) the pursuer stays in the sameplace at time 119905 + 1

119909119901119896

(119905 + 1) = 119909119901119896

(119905)

if 119909119890119894

(119905) isin 119880 (119909119901119896

(119905))

deg (119909119901119896

(119905) 119909119890119894

(119905)) isin120587

43120587

45120587

47120587

4

(19)

362 Positive-Angle Strategy Suppose that an evaderrsquos loca-tion is in the one-step reachable set for a pursuer and theevader is at a positive angle of the pursuer where the angleincludes 0∘ 90∘ (1205872) 180∘ (120587) and 270∘ (31205872)

Strategy 1 According to (20) the pursuer moves to theevaderrsquos position at time 119905 + 1

119909119901119896

(119905 + 1) = 119909119890119894

(119905)

if 119909119890119894

(119905) isin 119880 (119909119901119896

(119905))

deg (119909119901119896

(119905) 119909119890119894

(119905)) isin 0120587

2 120587

3120587

2

(20)

Strategy 2 According to (21) the pursuer moves to 45∘ of theevaderrsquos position at time 119905 + 1

119909119901119896

(119905 + 1) = 119909119895

if 119909119890119894

(119905) isin 119880 (119909119901119896

(119905))

cap deg (119909119901119896

(119905) 119909119890119894

(119905)) isin 0120587

2 120587

3120587

2

cap 119909119895

isin 119880 (119909119901119896

(119905))

cap deg (119909119901119896

(119905) 119909119895) = deg (119909

119901119896(119905) 119909119890119894

(119905)) + 45

(21)

Strategy 3 According to (22) the pursuer moves to minus45∘ ofthe evaderrsquos position at time 119905 + 1

119909119901119896

(119905 + 1) = 119909119895

if 119909119890119894

(119905) isin 119880 (119909119901119896

(119905))

cap deg (119909119901119896

(119905) 119909119890119894

(119905)) isin 0120587

2 120587

3120587

2

cap 119909119895

isin 119880 (119909119901119896

(119905))

cap deg (119909119901119896

(119905) 119909119895) = deg (119909

119901119896(119905) 119909119890119894

(119905)) minus 45

(22)

Mathematical Problems in Engineering 7

Strategy 4 According to (23) the pursuer moves to 90∘ of theevaderrsquos position at time 119905 + 1

119909119901119896

(119905 + 1) = 119909119895

if 119909119890119894

(119905) isin 119880 (119909119901119896

(119905))

cap deg (119909119901119896

(119905) 119909119890119894

(119905)) isin 0120587

2 120587

3120587

2

cap 119909119895

isin 119880 (119909119901119896

(119905))

cap deg (119909119901119896

(119905) 119909119895) = deg (119909

119901119896(119905) 119909119890119894

(119905)) + 90

(23)

Strategy 5 According to (24) the pursuer moves to minus90∘ ofthe evaderrsquos position at time 119905 + 1

119909119901119896

(119905 + 1) = 119909119895

if 119909119890119894

(119905) isin 119880 (119909119901119896

(119905))

cap deg (119909119901119896

(119905) 119909119890119894

(119905)) isin 0120587

2 120587

3120587

2

cap 119909119895

isin 119880 (119909119901119896

(119905))

cap deg (119909119901119896

(119905) 119909119895) = deg (119909

119901119896(119905) 119909119890119894

(119905)) minus 90

(24)

Strategy 6 According to (25) the pursuer stays in the sameplace at time 119905 + 1

119909119901119896

(119905 + 1) = 119909119901119896

(119905)

if 119909119890119894

(119905) isin 119880 (119909119901119896

(119905))

deg (119909119901119896

(119905) 119909119890119894

(119905)) isin 0120587

2 120587

3120587

2

(25)

Combining both the bevel-angle and positive-anglestrategies the pursuer obtains 24 different plans to adapt tothe environment effectively

37 Plan Evolution Process Plan evolution plays an importantrole in an agentrsquos life cycle When a pursuer finds an evaderin the sensing area it generates a pursuit plan Initially thepursuer searches its case base for similar cases If similar casesexist the pursuer chooses one of them as the plan Otherwisethe pursuer uses one of the abovementioned strategies togenerate a plan When the pursuer faces a new state thepursuer will adjust the plan accordingly Figure 3 shows theagentrsquos plan evolution process

(1) Problem Analysis Before determining the location in thenext step a pursuer analyzes the environmental state andupdates its goal with information from the environmentThepursuer chooses a plan according to its mental state

(2) Case Retrieval Comparing the similarity between theresult of the analysis in Step 1 and the cases in the case base

the pursuer retrieves similar cases by the (Sim) function andsorts them according to their similarity levels

(3) Similarity Calculation Similarity is classified into low andhigh levels It is possible that the pursuer retrieves a case withhigh-level similarity but obtains unsatisfactory results andthus a reward 119877 is added to prevent such If the case is withhigh-level similarity and the reward 119877 exceeds a predefinedconstant 120576 the pursuer executes Step 4 If the case is with low-level similarity or the reward 119877 is smaller than the predefinedconstant 120576 the pursuer executes Step 5

(4) Case Reuse If the retrieved case is with high-levelsimilarity and its result is acceptable the pursuer reuses thecase with revisions and then goes to Step 10

(5) Strategy Assimilation If no similar case exists in thecase base or the result of the similar case is unsatisfactorythe pursuer uses strategy assimilation that is the pursuergenerates a suitable plan with the current strategy (Steps 6ndash8) according to the situation

(6) Cooperation Before using the strategy module the pur-suer decides whether to cooperate with the others or not Ifthe decision is to cooperate with the others in the situationthen the local-cooperative strategy (ie Step 8) is executedOtherwise the pursuer executes the local-max strategy (ieStep 7)

(7) Application of the Local-Max Strategy If the pursuerdecides not to cooperate with the others it executes the local-max strategy and moves to the cell in the one-step reachableset with the highest probability of containing an evaderThengo to Step 9

(8) Application of the Local-Cooperative Strategy If thecooperative mode is taken the pursuer executes the local-cooperative strategy and moves to the cell which has theminimum overlap sensing area

(9) Reward Calculation The pursuer evaluates the plan andcalculates a reward accordingly

(10) Plan TransformationThe pursuer represents the plan bybeliefs actions and rewards

(11) Strategy Assessment The pursuer evaluates whether thecurrent strategy is suitable or not The pursuer counts thenumber 119862 of consecutive failures in catching an evader byusing the strategy If 119862 is smaller than a constant 120590 thestrategy is effective Then go to Step 13 Otherwise thestrategy is accommodated by Step 12

(12) Strategy AccommodationThe pursuer modifies the strat-egy to generate a new plan for the current environment

(13) Case Revision The actions and reward of the case arerevised after the problem is solved

8 Mathematical Problems in Engineering

(1) Problem analysis

(2) Case retrieval

(3) Similarity calculation (4) Case reuse

(5) Strategy assimilation

(6) Cooperation

(7) Application of local-maxstrategy

(8) Application of local-cooperativestrategy

(9) Reward calculation

(10) Plan transformation

(12) Strategy accommodation(13) Case revision

(14) Case retention

(11) Strategy assessment

Case base

R gt 120576

R le 120576

C lt 120590

C ge 120590

Figure 3 Plan evolution process

(14) Case Retention Parts of the experience that are likely tobe useful for future problem solving are retained

4 Case Study

This study uses a pursuit-evasion game to demonstrate theproposed approach and implements a multiagent system onthe Recursive Porous Agent Simulation Toolkit (REPAST)[35] Figure 4 shows the GUI of the system where theblue red and black circles represent pursuers evaders andobstacles respectively The yellow region depicts the sensingarea of the pursuers

41 Problem Formulation The system contains multiplepursuing robots which search for multiple evading robotsin a two-dimensional environment The physical world isabstracted as a grid world and the game is played accordingto the following settings and rules

(i) The number of pursuers is 119899 The number of evadersis unknown to the pursuers

(ii) Each pursuer has a sensing area comprising 5lowast5 cellsand the pursuer is at the center of the area as shownin Figure 4

(iii) During each moving step the agents are allowed toeither move one cell away from their current cells orstay in their current cells

(iv) Each pursuer has no information about the environ-ment other than that of its own 5-by-5 sensing area

(v) All pursuers use the same case base(vi) An evader is captured when it occupies the same

cell as a pursuer and it will be removed from theenvironment

(vii) The game ends when all evaders have been captured

42 Evasion Policy When an evader is caught the systemfirst checks whether the sum of the caught and uncaughtevaders is smaller than the total number of the predefinedevaders If so the system generates a new evader and choosesits type of movement Otherwise the system does nothingClearly when the sum of the caught and uncaught evaders

Mathematical Problems in Engineering 9

Figure 4 Implementation of multiagent system

equals the number of the predefined evaders the systemstops generating new evaders and the uncaught evaders donot increase The uncaught evaders even decrease with thesuccessful evader captures The game thus ends finally Inaddition when the system selects a movement type for anew evader the choice varies with different experimentswhich in turn affects the outcome of the experience Thisstudy investigates four types of movement namely randomclockwise counterclockwise and smart movements

Random Movement An evader randomly selects a locationthat can be reached in one step or just stays in the same loca-tion The details of the algorithm are shown in Algorithm 1

Clockwise Movement An evader moves clockwise in a squarepath The evader initially selects the direction of its first stepand sets up a distance randomly If the evader runs into anobstacle on the way it changes its direction automaticallyAlgorithm 2 shows the algorithm

Counterclockwise Movement This movement type is similarto the clockwise movement except that an evader movescounterclockwise The algorithm is shown in Algorithm 3

Smart Moving Evaders try actively to avoid pursuers Eachevader has a square sensing area Each side of the square isthree cells long If the evader finds any pursuer in the area itattempts to move to the location that has fewer pursuers inthe one-step reachable set of the area Algorithm 4 presentsthe algorithm

43 Results andDiscussion This study designs several experi-ments to compare the performance of the proposed approachHysteretic Q-Learning [36] and Reinforcement-Learningfusing (RL-Fusing) [37] All these approaches are the learningtechniques formultiagent systems Each experiment has threepursuers and one evader In addition the experiments runin ldquotime steprdquo unit One round is considered finished whenthe evader has been caught The performance results arethe average values obtained by running 30 rounds with thecurrent learning policy after 100 training episodes for eachalgorithm

431 Comparison of Approaches Matignon et al pre-sented a decentralized reinforcement learning algorithm for

independent learning robots called Hysteretic Q-Learning[36]This algorithm computes a better policy in a cooperativemultiagent system without additional information or com-munication between agents Hysteretic Q-Learning updatesthe equation for an agent 119894 executing the action 119886

119894from states

ending up in state 1199041015840 as follows

120575 larr997888 119903 + 120574max1198861015840

119876119894(1199041015840

1198861015840

) minus 119876119894(119904 119886119894)

119876119894(119904 119886119894) larr997888

119876119894(119904 119886119894) + inc120575 if 120575 gt 0

119876119894(119904 119886119894) + dec120575 else

(26)

where 119876119894(119904 119886119894) is the action value function of state 119904 and

action 119886119894 119903 is the reward it receives 120574 isin [0 1] is the discount

factor and inc anddec are the rates of increase anddecrease in119876-values respectively Partalas et al proposed the RL-Fusingapproach that uses coordinated actions and a fusing processto guide the agents [37] The fusion process combines thedecisions of all agents and then outputs a global decisionwhich the agents must follow as a team Algorithm 5 showsthe RL-Fusing algorithm The set of strategies means theplans

432 Obstacles The first experiment compares the perfor-mance of the proposed approach Hysteretic Q-Learningand RL-Fusing in environments withwithout obstacles Theevader is assumed to move randomly in every round In theenvironment with obstacles 10 obstacles are randomly set upinitially and no evader is trapped in the obstacles

Table 1 shows the performance results As can be seen allthreemethods requiremore capture steps in the environmentwith obstacles because pursuers need extra time to detectand avoid obstacles Hysteretic Q-Learning takes the greatestnumber of capture steps in both environments because itdoes not support pursuer cooperation Furthermore RL-Fusing also takes more capture steps than the proposedscheme Although RL-Fusing involves cooperation betweenpursuers the generated strategies sometimes cause a pursuerto miss evaders in its sensing area The pursuer thus needs tosearch again and requires more capture steps In contrast theproposed approach utilizes not only the cooperative strategybut also assimilation and accommodation which enable apursuer to learn from either success or failure experiencesThus the pursuer can keep in step with an evader and catchit in a smaller number of capture steps

433 Movement Types The second experimentmeasures theaverage number of capture steps required by the proposedapproach Hysteretic Q-Learning and RL-Fusing under twotypes of evader movement One is clockwise movementand the other is random clockwise and counterclockwisemovement

The average number of capture steps required by thethree approaches for different movement types is shown inTable 2 Similar to the results obtained in the first experimentthe proposed scheme outperforms Hysteretic Q-Learningand RL-Fusing Among the three approaches Hysteretic Q-Learning requires the highest number of capture steps for

10 Mathematical Problems in Engineering

(1) while true do(2) next location larr generates randomly a location nearby the evader(3) if next location has an obstacle then(4) next location larr generates randomly a new location nearby the evader(5) end if(6) end while

Algorithm 1 Random movement algorithm

(1) direction larr generates randomly a direction(2) distance larr generates randomly a distance shorter than the environment(3) move larr 0(4) while true do(5) get the next location according to the direction(6) move add 1(7) if move equals to distance or next location has an obstacle then(8) evader turns right(9) changes direction to evader(10) move larr 0(11) end if(12) end while

Algorithm 2 Clockwise movement algorithm

(1) direction larr generates randomly a direction(2) distance larr generates randomly a distance shorter than environment(3) move larr 0(4) while true do(7) if move equals to distance or next location has obstacle then(8) evader turns left(9) change direction to evader(10) move larr 0(11) end if(12) end while

Algorithm 3 Counterclockwise movement algorithm

(1) while true do(2) if evaderrsquos sensor area has pursuer then(3) next location larr generate a location with least pursuers detected(4) else(5) next location larr randomly generate a location nearby the evader(6) end if(7) end while

Algorithm 4 Smart movement algorithm

the same reason that it does not support agent cooperationAccording to the RL-Fusing algorithm presented in Algo-rithm 5 when a state is not the coordinated state the actionof an agent chooses a random or predefined policy and asingle pursuer thus has no learning ability Therefore the

possibility of catching an evader is low and RL-Fusing alsotakes more capture steps In contrast when the pursuer ofthe proposed scheme faces an evader the pursuer observesthe evader continuously If the evader changes its movingtype the pursuer uses accommodation to change its catching

Mathematical Problems in Engineering 11

Require A set of strategies an initial state 1199040

(1) 119904 larr 1199040

(2) while true do(3) if 119904 is coordinated state then(4) 120588 larr RandomReal(0 1)

(5) if 120588 lt 120576 then(6) select a strategy randomly the same for all agent(7) else(8) rank strategies(9) communicate ranks(10) receive ranks from other agents(11) average ranks(12) select corresponding strategy(13) end if(14) execute selected strategy(15) receive reward(16) transit to a new state 119904

1015840

(17) update 119876-value(18) else(19) act with a predefined or random policy(20) end if(21) end while

Algorithm 5 RL-Fusing algorithm

Table 1 Average number of capture steps in different environments

Grid sizeWithoutobstacles(steps)

With obstacles(steps)

HystereticQ-Learning 12 lowast 12 68 73

RL-Fusing 12 lowast 12 61 65The proposedapproach 12 lowast 12 24 27

Table 2 Average number of capture steps for different movementtypes

Grid sizeClockwisemovement(steps)

Randomclockwise andcounterclock-wise movement

(steps)HystereticQ-Learning 12 lowast 12 56 69

RL-Fusing 12 lowast 12 48 56The proposedapproach 12 lowast 12 26 32

strategy If the moving type remains fixed the pursuer keepsthe same strategy by assimilation As seen in Table 2 allthree methods require more steps to capture evaders withrandom clockwise and counterclockwise movement becausepursuers need extra time to detect the moving directions ofthe evaders

Table 3 Average number of capture steps for evaders with smartmovement in different grid sizes

Grid size 12 lowast 12(steps)

Grid size 24 lowast 24(steps)

HystereticQ-Learning 74 212

RL-Fusing 63 300The proposedapproach 63 119

Table 4 Average number of capture steps withwithout assimilationand accommodation

Evader movementtype

Clockwise

Evader movementtypeSmart

With assimilation andaccommodation 186 278

Without assimilationand accommodation 242 364

434 Smart Movement and Size of Grid The third experi-ment involves evaders with smart movementThe evaders tryactively to avoid pursuers when they sense any pursuer Theenvironment is of two grid sizes 12 lowast 12 and 24 lowast 24

Table 3 shows that the proposed scheme requires smallercapture steps in different grid sizes when compared withHysteretic Q-Learning and RL-Fusing When the grid sizeis smaller RL-Fusing outperforms Hysteretic Q-LearningHowever as the grid size becomes larger RL-Fusing requiresmore capture steps than Hysteretic Q-Learning This is

12 Mathematical Problems in Engineering

0

20

40

60

80

100

120

140

10 20 30 40Training episodes

Step

s

Grid size 12 lowast 12 (steps)Grid size 24 lowast 24 (steps)

Figure 5 Average number of capture steps under different trainingepisodes and grid sizes

because RL-Fusing depends mainly on its cooperative strate-gies to catch evaders but not all strategies can guide pursuersto move close to the evaders For example strategy 3 in RL-Fusing [37] is that once all the pursuers go at the distance of3 from an evader Because the sensing area of a pursuer is a5 lowast 5 square the pursuers easily lose the evader in a largerenvironment In contrast the proposed approach assists thepursuers to move as close to the evader as possible by theavoidance of sensing overlappingThus the approach has thehighest possibility of catching the evader

435 Smart Movement Training Episodes and Size of GridThe fourth experiment investigates the relationship betweentraining episodes and grid sizes when using the proposedapproach

Figure 5 shows that regardless of grid sizes (12 lowast 12 or24 lowast 24) the greater the number of training episodes is thesmaller the average number of capture steps is required Thisis becausemore training enriches the case base thus enablingpursuers to make better strategies and decisions

436 Assimilation and Accommodation The last experimentevaluates the proposed approach withwithout assimilationand accommodation There are two types of evader move-ment clockwise and smart The results are listed in Table 4Whether the approach is with or without assimilation andaccommodation the average number of capture steps forsmartmovement is greater than that for clockwisemovementThis is because smart movement can enable an evader todynamically change its movement and position In contrastmovement of an evader with clockwise movement can beeasily predicted hence the pursuers can take fewer steps forcapture Furthermore the average number of capture steps

Furthermore, for both movement types, the average number of capture steps required by the approach with assimilation and accommodation is smaller than that required by the approach without them. The approach without assimilation and accommodation generates a fixed plan which cannot be changed. In contrast, the approach with assimilation and accommodation enables the first pursuer to use the local-max strategy to find a better position, and the remaining pursuers determine their cooperative positions according to the first pursuer's position. These pursuers can then adjust their positions according to different catching situations and strategies. Accordingly, the approach with assimilation and accommodation requires fewer capture steps than the approach without them.
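The local-max strategy mentioned above admits an equally small sketch. In the Python fragment below, again ours and only illustrative, a pursuer moves to the cell in its one-step reachable set with the highest estimated probability of containing an evader; the dictionary named belief stands in for the probabilistic state information the pursuer maintains and is an assumed data structure, not the paper's.

    # Illustrative sketch (not the authors' code): the local-max strategy moves a
    # pursuer to the reachable cell judged most likely to contain an evader.
    def local_max_move(pos, belief, size=12):
        x, y = pos
        moves = [(x + dx, y + dy)
                 for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                 if 0 <= x + dx < size and 0 <= y + dy < size]   # move one cell or stay
        # belief[(x, y)] is the pursuer's estimated probability that an evader
        # occupies cell (x, y); cells it knows nothing about default to 0.
        return max(moves, key=lambda c: belief.get(c, 0.0))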

5. Conclusion

This work proposes an effective strategy learning approach to the pursuit-evasion game in a dynamic environment. The method enables a pursuer to learn and implement effective plans from both success and failure experiences when no knowledge of the environment is available. The evolved plans fare extremely well when compared with some of the best constructed strategies. This study also demonstrates the superior performance of the proposed approach for the pursuit-evasion game by implementing a multiagent system on REPAST.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported by the Ministry of Science and Technology under Grant no. 102-2218-E-027-007.

References

[1] K. P. Sycara, "Multiagent systems," AI Magazine, vol. 19, no. 2, pp. 79-92, 1998.
[2] N. Vlassis, A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence, Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool, 2007.
[3] A. Antoniades, H. J. Kim, and S. Sastry, "Pursuit-evasion strategies for teams of multiple agents with incomplete information," in Proceedings of the 42nd IEEE Conference on Decision and Control, pp. 756-761, December 2003.
[4] L. J. Guibas, J.-C. Latombe, S. M. LaValle, D. Lin, and R. Motwani, "A visibility-based pursuit-evasion problem," International Journal of Computational Geometry & Applications, vol. 9, no. 4-5, pp. 471-493, 1999.
[5] J. P. Hespanha, G. J. Pappas, and M. Prandini, "Greedy control for hybrid pursuit-evasion games," in Proceedings of the European Control Conference, pp. 2621-2626, 2001.
[6] R. Vidal, O. Shakernia, H. J. Kim, D. H. Shim, and S. Sastry, "Probabilistic pursuit-evasion games: theory, implementation, and experimental evaluation," IEEE Transactions on Robotics and Automation, vol. 18, no. 5, pp. 662-669, 2002.
[7] R. E. Korf, "A simple solution to pursuit games," in Proceedings of the 2nd International Workshop on Distributed Artificial Intelligence, pp. 183-194, 1992.
[8] S. Nolfi and D. Floreano, "Coevolving predator and prey robots: do 'arms races' arise in artificial evolution?" Artificial Life, vol. 4, no. 4, pp. 311-335, 1998.
[9] N. J. Savill and P. Hogeweg, "Evolutionary stagnation due to pattern-pattern interactions in a coevolutionary predator-prey model," Artificial Life, vol. 3, no. 2, pp. 81-100, 1997.
[10] J. Y. Kuo, F.-C. Huang, S.-P. Ma, and Y.-Y. Fanjiang, "Applying hybrid learning approach to RoboCup's strategy," Journal of Systems and Software, vol. 86, no. 7, pp. 1933-1944, 2013.
[11] S. I. Nishimura and T. Ikegami, "Emergence of collective strategies in a prey-predator game model," Artificial Life, vol. 3, no. 4, pp. 243-260, 1998.
[12] K.-C. Jim and C. L. Giles, "Talking helps: evolving communicating agents for the predator-prey pursuit problem," Artificial Life, vol. 6, no. 3, pp. 237-254, 2000.
[13] T. Haynes and S. Sen, "Evolving behavioral strategies in predators and prey," in Adaption and Learning in Multi-Agent Systems, vol. 1042 of Lecture Notes in Computer Science, pp. 113-126, Springer, Berlin, Germany, 1995.
[14] D. Hladek, J. Vascak, and P. Sincak, "Hierarchical fuzzy inference system for robotic pursuit evasion task," in Proceedings of the 6th International Symposium on Applied Machine Intelligence and Informatics (SAMI '08), pp. 273-277, Herl'any, Slovakia, January 2008.
[15] R. G. Smith, "The contract net protocol: high-level communication and control in a distributed problem solver," IEEE Transactions on Computers, vol. 19, no. 12, pp. 1104-1113, 1980.
[16] P.-C. Zhou, B.-R. Hong, Y.-H. Wang, and T. Zhou, "Multi-agent cooperative pursuit based on extended contract net protocol," in Proceedings of the International Conference on Machine Learning and Cybernetics, vol. 1, pp. 169-173, August 2004.
[17] A. Rao and M. Georgeff, "BDI agents: from theory to practice," in Proceedings of the International Conference on Multi-Agent Systems, pp. 312-319, 1995.
[18] J. Piaget, The Equilibration of Cognitive Structures: The Central Problem of Intellectual Development, University of Chicago Press, 1985.
[19] J. Y. Kuo, S. J. Lee, C. L. Wu, N. L. Hsueh, and J. Lee, "Evolutionary agents for intelligent transport systems," International Journal of Fuzzy Systems, vol. 7, no. 2, pp. 85-93, 2005.
[20] J. Y. Kuo and F. C. Huang, "A hybrid approach for multi-agent learning systems," Intelligent Automation and Soft Computing, vol. 17, no. 3, pp. 385-399, 2011.
[21] S. Gebhardt, P. Grant, R. von Georgi, and M. T. Huber, "Aspects of Piaget's cognitive developmental psychology and neurobiology of psychotic disorders - an integrative model," Medical Hypotheses, vol. 71, no. 3, pp. 426-433, 2008.
[22] S. Takamuku and R. C. Arkin, "Multi-method learning and assimilation," Robotics and Autonomous Systems, vol. 55, no. 8, pp. 618-627, 2007.
[23] T. Parsons, "Pursuit-evasion in a graph," in Theory and Applications of Graphs, vol. 642 of Lecture Notes in Mathematics, pp. 426-441, Springer, Heidelberg, Germany, 1978.
[24] R. Breisch, "An intuitive approach to speleotopology," Southwestern Cavers, vol. 6, no. 5, pp. 72-78, 1967.
[25] F. Ferrari, "A new parameterized potential family for path planning algorithms," International Journal on Artificial Intelligence Tools, vol. 18, no. 6, pp. 949-957, 2009.
[26] K. Walsh and B. Banerjee, "Fast A* with iterative resolution for navigation," International Journal on Artificial Intelligence Tools, vol. 19, no. 1, pp. 101-119, 2010.
[27] L.-P. Wong, M. Y. H. Low, and C. S. Chong, "Bee colony optimization with local search for traveling salesman problem," International Journal on Artificial Intelligence Tools, vol. 19, no. 3, pp. 305-334, 2010.
[28] E. Raboin, D. S. Nau, U. Kuter, S. K. Gupta, and P. Svec, "Strategy generation in multi-agent imperfect-information pursuit games," in Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, pp. 947-954, 2010.
[29] B. P. Gerkey, S. Thrun, and G. Gordon, "Visibility-based pursuit-evasion with limited field of view," International Journal of Robotics Research, vol. 25, no. 4, pp. 299-315, 2006.
[30] N. Basilico, N. Gatti, and F. Amigoni, "Leader-follower strategies for robotic patrolling in environments with arbitrary topologies," in Proceedings of the 8th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS '09), pp. 57-64, May 2009.
[31] M. Yan, "Multi-robot searching using game-theory based approach," International Journal of Advanced Robotic Systems, vol. 5, no. 4, pp. 341-350, 2008.
[32] K.-D. Althoff, "Case-based reasoning," in Handbook on Software Engineering and Knowledge Engineering, pp. 549-587, 2001.
[33] E. K. Burke, B. L. MacCarthy, S. Petrovic, and R. Qu, "Multiple-retrieval case-based reasoning for course timetabling problems," Journal of the Operational Research Society, vol. 57, no. 2, pp. 148-162, 2006.
[34] J. Long, D. G. Schwartz, S. Stoecklin, and M. K. Patel, "Application of loop reduction to learning program behaviors for anomaly detection," in Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC '05), vol. 1, pp. 691-696, April 2005.
[35] The Repast Suite, 2014, http://repast.sourceforge.net.
[36] L. Matignon, G. J. Laurent, and N. L. Fort-Piat, "Hysteretic Q-Learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '07), pp. 64-69, San Diego, Calif, USA, October-November 2007.
[37] I. Partalas, I. Feneris, and I. Vlahavas, "A hybrid multiagent reinforcement learning approach using strategies and fusion," International Journal on Artificial Intelligence Tools, vol. 17, no. 5, pp. 945-962, 2008.




Page 8: Research Article Multiagent Cooperative Learning ...downloads.hindawi.com/journals/mpe/2015/964871.pdf · Research Article Multiagent Cooperative Learning Strategies for Pursuit-Evasion

8 Mathematical Problems in Engineering

(1) Problem analysis

(2) Case retrieval

(3) Similarity calculation (4) Case reuse

(5) Strategy assimilation

(6) Cooperation

(7) Application of local-maxstrategy

(8) Application of local-cooperativestrategy

(9) Reward calculation

(10) Plan transformation

(12) Strategy accommodation(13) Case revision

(14) Case retention

(11) Strategy assessment

Case base

R gt 120576

R le 120576

C lt 120590

C ge 120590

Figure 3 Plan evolution process

(14) Case Retention Parts of the experience that are likely tobe useful for future problem solving are retained

4 Case Study

This study uses a pursuit-evasion game to demonstrate theproposed approach and implements a multiagent system onthe Recursive Porous Agent Simulation Toolkit (REPAST)[35] Figure 4 shows the GUI of the system where theblue red and black circles represent pursuers evaders andobstacles respectively The yellow region depicts the sensingarea of the pursuers

41 Problem Formulation The system contains multiplepursuing robots which search for multiple evading robotsin a two-dimensional environment The physical world isabstracted as a grid world and the game is played accordingto the following settings and rules

(i) The number of pursuers is 119899 The number of evadersis unknown to the pursuers

(ii) Each pursuer has a sensing area comprising 5lowast5 cellsand the pursuer is at the center of the area as shownin Figure 4

(iii) During each moving step the agents are allowed toeither move one cell away from their current cells orstay in their current cells

(iv) Each pursuer has no information about the environ-ment other than that of its own 5-by-5 sensing area

(v) All pursuers use the same case base(vi) An evader is captured when it occupies the same

cell as a pursuer and it will be removed from theenvironment

(vii) The game ends when all evaders have been captured

42 Evasion Policy When an evader is caught the systemfirst checks whether the sum of the caught and uncaughtevaders is smaller than the total number of the predefinedevaders If so the system generates a new evader and choosesits type of movement Otherwise the system does nothingClearly when the sum of the caught and uncaught evaders

Mathematical Problems in Engineering 9

Figure 4 Implementation of multiagent system

equals the number of the predefined evaders the systemstops generating new evaders and the uncaught evaders donot increase The uncaught evaders even decrease with thesuccessful evader captures The game thus ends finally Inaddition when the system selects a movement type for anew evader the choice varies with different experimentswhich in turn affects the outcome of the experience Thisstudy investigates four types of movement namely randomclockwise counterclockwise and smart movements

Random Movement An evader randomly selects a locationthat can be reached in one step or just stays in the same loca-tion The details of the algorithm are shown in Algorithm 1

Clockwise Movement An evader moves clockwise in a squarepath The evader initially selects the direction of its first stepand sets up a distance randomly If the evader runs into anobstacle on the way it changes its direction automaticallyAlgorithm 2 shows the algorithm

Counterclockwise Movement This movement type is similarto the clockwise movement except that an evader movescounterclockwise The algorithm is shown in Algorithm 3

Smart Moving Evaders try actively to avoid pursuers Eachevader has a square sensing area Each side of the square isthree cells long If the evader finds any pursuer in the area itattempts to move to the location that has fewer pursuers inthe one-step reachable set of the area Algorithm 4 presentsthe algorithm

43 Results andDiscussion This study designs several experi-ments to compare the performance of the proposed approachHysteretic Q-Learning [36] and Reinforcement-Learningfusing (RL-Fusing) [37] All these approaches are the learningtechniques formultiagent systems Each experiment has threepursuers and one evader In addition the experiments runin ldquotime steprdquo unit One round is considered finished whenthe evader has been caught The performance results arethe average values obtained by running 30 rounds with thecurrent learning policy after 100 training episodes for eachalgorithm

431 Comparison of Approaches Matignon et al pre-sented a decentralized reinforcement learning algorithm for

independent learning robots called Hysteretic Q-Learning[36]This algorithm computes a better policy in a cooperativemultiagent system without additional information or com-munication between agents Hysteretic Q-Learning updatesthe equation for an agent 119894 executing the action 119886

119894from states

ending up in state 1199041015840 as follows

120575 larr997888 119903 + 120574max1198861015840

119876119894(1199041015840

1198861015840

) minus 119876119894(119904 119886119894)

119876119894(119904 119886119894) larr997888

119876119894(119904 119886119894) + inc120575 if 120575 gt 0

119876119894(119904 119886119894) + dec120575 else

(26)

where 119876119894(119904 119886119894) is the action value function of state 119904 and

action 119886119894 119903 is the reward it receives 120574 isin [0 1] is the discount

factor and inc anddec are the rates of increase anddecrease in119876-values respectively Partalas et al proposed the RL-Fusingapproach that uses coordinated actions and a fusing processto guide the agents [37] The fusion process combines thedecisions of all agents and then outputs a global decisionwhich the agents must follow as a team Algorithm 5 showsthe RL-Fusing algorithm The set of strategies means theplans

432 Obstacles The first experiment compares the perfor-mance of the proposed approach Hysteretic Q-Learningand RL-Fusing in environments withwithout obstacles Theevader is assumed to move randomly in every round In theenvironment with obstacles 10 obstacles are randomly set upinitially and no evader is trapped in the obstacles

Table 1 shows the performance results As can be seen allthreemethods requiremore capture steps in the environmentwith obstacles because pursuers need extra time to detectand avoid obstacles Hysteretic Q-Learning takes the greatestnumber of capture steps in both environments because itdoes not support pursuer cooperation Furthermore RL-Fusing also takes more capture steps than the proposedscheme Although RL-Fusing involves cooperation betweenpursuers the generated strategies sometimes cause a pursuerto miss evaders in its sensing area The pursuer thus needs tosearch again and requires more capture steps In contrast theproposed approach utilizes not only the cooperative strategybut also assimilation and accommodation which enable apursuer to learn from either success or failure experiencesThus the pursuer can keep in step with an evader and catchit in a smaller number of capture steps

433 Movement Types The second experimentmeasures theaverage number of capture steps required by the proposedapproach Hysteretic Q-Learning and RL-Fusing under twotypes of evader movement One is clockwise movementand the other is random clockwise and counterclockwisemovement

The average number of capture steps required by thethree approaches for different movement types is shown inTable 2 Similar to the results obtained in the first experimentthe proposed scheme outperforms Hysteretic Q-Learningand RL-Fusing Among the three approaches Hysteretic Q-Learning requires the highest number of capture steps for

10 Mathematical Problems in Engineering

(1) while true do(2) next location larr generates randomly a location nearby the evader(3) if next location has an obstacle then(4) next location larr generates randomly a new location nearby the evader(5) end if(6) end while

Algorithm 1 Random movement algorithm

(1) direction larr generates randomly a direction(2) distance larr generates randomly a distance shorter than the environment(3) move larr 0(4) while true do(5) get the next location according to the direction(6) move add 1(7) if move equals to distance or next location has an obstacle then(8) evader turns right(9) changes direction to evader(10) move larr 0(11) end if(12) end while

Algorithm 2 Clockwise movement algorithm

(1) direction larr generates randomly a direction(2) distance larr generates randomly a distance shorter than environment(3) move larr 0(4) while true do(7) if move equals to distance or next location has obstacle then(8) evader turns left(9) change direction to evader(10) move larr 0(11) end if(12) end while

Algorithm 3 Counterclockwise movement algorithm

(1) while true do(2) if evaderrsquos sensor area has pursuer then(3) next location larr generate a location with least pursuers detected(4) else(5) next location larr randomly generate a location nearby the evader(6) end if(7) end while

Algorithm 4 Smart movement algorithm

the same reason that it does not support agent cooperationAccording to the RL-Fusing algorithm presented in Algo-rithm 5 when a state is not the coordinated state the actionof an agent chooses a random or predefined policy and asingle pursuer thus has no learning ability Therefore the

possibility of catching an evader is low and RL-Fusing alsotakes more capture steps In contrast when the pursuer ofthe proposed scheme faces an evader the pursuer observesthe evader continuously If the evader changes its movingtype the pursuer uses accommodation to change its catching

Mathematical Problems in Engineering 11

Require A set of strategies an initial state 1199040

(1) 119904 larr 1199040

(2) while true do(3) if 119904 is coordinated state then(4) 120588 larr RandomReal(0 1)

(5) if 120588 lt 120576 then(6) select a strategy randomly the same for all agent(7) else(8) rank strategies(9) communicate ranks(10) receive ranks from other agents(11) average ranks(12) select corresponding strategy(13) end if(14) execute selected strategy(15) receive reward(16) transit to a new state 119904

1015840

(17) update 119876-value(18) else(19) act with a predefined or random policy(20) end if(21) end while

Algorithm 5 RL-Fusing algorithm

Table 1 Average number of capture steps in different environments

Grid sizeWithoutobstacles(steps)

With obstacles(steps)

HystereticQ-Learning 12 lowast 12 68 73

RL-Fusing 12 lowast 12 61 65The proposedapproach 12 lowast 12 24 27

Table 2 Average number of capture steps for different movementtypes

Grid sizeClockwisemovement(steps)

Randomclockwise andcounterclock-wise movement

(steps)HystereticQ-Learning 12 lowast 12 56 69

RL-Fusing 12 lowast 12 48 56The proposedapproach 12 lowast 12 26 32

strategy If the moving type remains fixed the pursuer keepsthe same strategy by assimilation As seen in Table 2 allthree methods require more steps to capture evaders withrandom clockwise and counterclockwise movement becausepursuers need extra time to detect the moving directions ofthe evaders

Table 3 Average number of capture steps for evaders with smartmovement in different grid sizes

Grid size 12 lowast 12(steps)

Grid size 24 lowast 24(steps)

HystereticQ-Learning 74 212

RL-Fusing 63 300The proposedapproach 63 119

Table 4 Average number of capture steps withwithout assimilationand accommodation

Evader movementtype

Clockwise

Evader movementtypeSmart

With assimilation andaccommodation 186 278

Without assimilationand accommodation 242 364

434 Smart Movement and Size of Grid The third experi-ment involves evaders with smart movementThe evaders tryactively to avoid pursuers when they sense any pursuer Theenvironment is of two grid sizes 12 lowast 12 and 24 lowast 24

Table 3 shows that the proposed scheme requires smallercapture steps in different grid sizes when compared withHysteretic Q-Learning and RL-Fusing When the grid sizeis smaller RL-Fusing outperforms Hysteretic Q-LearningHowever as the grid size becomes larger RL-Fusing requiresmore capture steps than Hysteretic Q-Learning This is

12 Mathematical Problems in Engineering

0

20

40

60

80

100

120

140

10 20 30 40Training episodes

Step

s

Grid size 12 lowast 12 (steps)Grid size 24 lowast 24 (steps)

Figure 5 Average number of capture steps under different trainingepisodes and grid sizes

because RL-Fusing depends mainly on its cooperative strate-gies to catch evaders but not all strategies can guide pursuersto move close to the evaders For example strategy 3 in RL-Fusing [37] is that once all the pursuers go at the distance of3 from an evader Because the sensing area of a pursuer is a5 lowast 5 square the pursuers easily lose the evader in a largerenvironment In contrast the proposed approach assists thepursuers to move as close to the evader as possible by theavoidance of sensing overlappingThus the approach has thehighest possibility of catching the evader

435 Smart Movement Training Episodes and Size of GridThe fourth experiment investigates the relationship betweentraining episodes and grid sizes when using the proposedapproach

Figure 5 shows that regardless of grid sizes (12 lowast 12 or24 lowast 24) the greater the number of training episodes is thesmaller the average number of capture steps is required Thisis becausemore training enriches the case base thus enablingpursuers to make better strategies and decisions

436 Assimilation and Accommodation The last experimentevaluates the proposed approach withwithout assimilationand accommodation There are two types of evader move-ment clockwise and smart The results are listed in Table 4Whether the approach is with or without assimilation andaccommodation the average number of capture steps forsmartmovement is greater than that for clockwisemovementThis is because smart movement can enable an evader todynamically change its movement and position In contrastmovement of an evader with clockwise movement can beeasily predicted hence the pursuers can take fewer steps forcapture Furthermore the average number of capture steps

required by the approach with assimilation and accommo-dation is smaller than that required by the approach withoutassimilation and accommodation for both movement typesThe approach without assimilation and accommodation gen-erates a fixed plan which cannot be changed In contrastthe approach with assimilation and accommodation enablesthe first pursuer to use the local-max strategy to find abetter position and the remaining pursuers can determinethe cooperative positions according to the first pursuerrsquosposition These pursuers can adjust their positions accordingto different catching situations and strategies Accordinglythe approach with assimilation and accommodation requiresfewer capture steps when compared with the approachwithout assimilation and accommodation

5 Conclusion

This work proposes an effective strategy learning approachto the pursuit-evasion game in a dynamic environment Thismethod can facilitate a pursuer in learning and implementingeffective plans from either success or failure experiencesunder the condition of no knowledge of environment avail-able The evolved plans fare extremely well when comparedwith some of the best constructed strategies This studyalso demonstrates the superior performance of the proposedapproach for the pursuit-evasion game by implementing amultiagent system on REPAST

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This work was supported by the Ministry of Science andTechnology under Grant no 102-2218-E-027-007

References

[1] K. P. Sycara, "Multiagent systems," AI Magazine, vol. 19, no. 2, pp. 79–92, 1998.
[2] N. Vlassis, A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence, Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool, 2007.
[3] A. Antoniades, H. J. Kim, and S. Sastry, "Pursuit-evasion strategies for teams of multiple agents with incomplete information," in Proceedings of the 42nd IEEE Conference on Decision and Control, pp. 756–761, December 2003.
[4] L. J. Guibas, J.-C. Latombe, S. M. LaValle, D. Lin, and R. Motwani, "A visibility-based pursuit-evasion problem," International Journal of Computational Geometry & Applications, vol. 9, no. 4-5, pp. 471–493, 1999.
[5] J. P. Hespanha, G. J. Pappas, and M. Prandini, "Greedy control for hybrid pursuit-evasion games," in Proceedings of the European Control Conference, pp. 2621–2626, 2001.
[6] R. Vidal, O. Shakernia, H. J. Kim, D. H. Shim, and S. Sastry, "Probabilistic pursuit-evasion games: theory, implementation, and experimental evaluation," IEEE Transactions on Robotics and Automation, vol. 18, no. 5, pp. 662–669, 2002.
[7] R. E. Korf, "A simple solution to pursuit games," in Proceedings of the 2nd International Workshop on Distributed Artificial Intelligence, pp. 183–194, 1992.
[8] S. Nolfi and D. Floreano, "Coevolving predator and prey robots: do 'arms races' arise in artificial evolution?" Artificial Life, vol. 4, no. 4, pp. 311–335, 1998.
[9] N. J. Savill and P. Hogeweg, "Evolutionary stagnation due to pattern-pattern interactions in a coevolutionary predator-prey model," Artificial Life, vol. 3, no. 2, pp. 81–100, 1997.
[10] J. Y. Kuo, F.-C. Huang, S.-P. Ma, and Y.-Y. Fanjiang, "Applying hybrid learning approach to RoboCup's strategy," Journal of Systems and Software, vol. 86, no. 7, pp. 1933–1944, 2013.
[11] S. I. Nishimura and T. Ikegami, "Emergence of collective strategies in a prey-predator game model," Artificial Life, vol. 3, no. 4, pp. 243–260, 1998.
[12] K.-C. Jim and C. L. Giles, "Talking helps: evolving communicating agents for the predator-prey pursuit problem," Artificial Life, vol. 6, no. 3, pp. 237–254, 2000.
[13] T. Haynes and S. Sen, "Evolving behavioral strategies in predators and prey," in Adaption and Learning in Multi-Agent Systems, vol. 1042 of Lecture Notes in Computer Science, pp. 113–126, Springer, Berlin, Germany, 1995.
[14] D. Hladek, J. Vascak, and P. Sincak, "Hierarchical fuzzy inference system for robotic pursuit evasion task," in Proceedings of the 6th International Symposium on Applied Machine Intelligence and Informatics (SAMI '08), pp. 273–277, Herl'any, Slovakia, January 2008.
[15] R. G. Smith, "The contract net protocol: high-level communication and control in a distributed problem solver," IEEE Transactions on Computers, vol. 19, no. 12, pp. 1104–1113, 1980.
[16] P.-C. Zhou, B.-R. Hong, Y.-H. Wang, and T. Zhou, "Multi-agent cooperative pursuit based on extended contract net protocol," in Proceedings of the International Conference on Machine Learning and Cybernetics, vol. 1, pp. 169–173, August 2004.
[17] A. Rao and M. Georgeff, "BDI agents: from theory to practice," in Proceedings of the International Conference on Multi-Agent Systems, pp. 312–319, 1995.
[18] J. Piaget, The Equilibration of Cognitive Structures: The Central Problem of Intellectual Development, University of Chicago Press, 1985.
[19] J. Y. Kuo, S. J. Lee, C. L. Wu, N. L. Hsueh, and J. Lee, "Evolutionary agents for intelligent transport systems," International Journal of Fuzzy Systems, vol. 7, no. 2, pp. 85–93, 2005.
[20] J. Y. Kuo and F. C. Huang, "A hybrid approach for multi-agent learning systems," Intelligent Automation and Soft Computing, vol. 17, no. 3, pp. 385–399, 2011.
[21] S. Gebhardt, P. Grant, R. von Georgi, and M. T. Huber, "Aspects of Piaget's cognitive developmental psychology and neurobiology of psychotic disorders—an integrative model," Medical Hypotheses, vol. 71, no. 3, pp. 426–433, 2008.
[22] S. Takamuku and R. C. Arkin, "Multi-method learning and assimilation," Robotics and Autonomous Systems, vol. 55, no. 8, pp. 618–627, 2007.
[23] T. Parsons, "Pursuit-evasion in a graph," in Theory and Applications of Graphs, vol. 642 of Lecture Notes in Mathematics, pp. 426–441, Springer, Heidelberg, Germany, 1978.
[24] R. Breisch, "An intuitive approach to speleotopology," Southwestern Cavers, vol. 6, no. 5, pp. 72–78, 1967.
[25] F. Ferrari, "A new parameterized potential family for path planning algorithms," International Journal on Artificial Intelligence Tools, vol. 18, no. 6, pp. 949–957, 2009.
[26] K. Walsh and B. Banerjee, "Fast A* with iterative resolution for navigation," International Journal on Artificial Intelligence Tools, vol. 19, no. 1, pp. 101–119, 2010.
[27] L.-P. Wong, M. Y. H. Low, and C. S. Chong, "Bee colony optimization with local search for traveling salesman problem," International Journal on Artificial Intelligence Tools, vol. 19, no. 3, pp. 305–334, 2010.
[28] E. Raboin, D. S. Nau, U. Kuter, S. K. Gupta, and P. Svec, "Strategy generation in multi-agent imperfect-information pursuit games," in Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, pp. 947–954, 2010.
[29] B. P. Gerkey, S. Thrun, and G. Gordon, "Visibility-based pursuit-evasion with limited field of view," International Journal of Robotics Research, vol. 25, no. 4, pp. 299–315, 2006.
[30] N. Basilico, N. Gatti, and F. Amigoni, "Leader-follower strategies for robotic patrolling in environments with arbitrary topologies," in Proceedings of the 8th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS '09), pp. 57–64, May 2009.
[31] M. Yan, "Multi-robot searching using game-theory based approach," International Journal of Advanced Robotic Systems, vol. 5, no. 4, pp. 341–350, 2008.
[32] K.-D. Althoff, "Case-based reasoning," in Handbook on Software Engineering and Knowledge Engineering, pp. 549–587, 2001.
[33] E. K. Burke, B. L. MacCarthy, S. Petrovic, and R. Qu, "Multiple-retrieval case-based reasoning for course timetabling problems," Journal of the Operational Research Society, vol. 57, no. 2, pp. 148–162, 2006.
[34] J. Long, D. G. Schwartz, S. Stoecklin, and M. K. Patel, "Application of loop reduction to learning program behaviors for anomaly detection," in Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC '05), vol. 1, pp. 691–696, April 2005.
[35] The Repast Suite, 2014, http://repast.sourceforge.net/.
[36] L. Matignon, G. J. Laurent, and N. L. Fort-Piat, "Hysteretic Q-Learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '07), pp. 64–69, San Diego, Calif, USA, October-November 2007.
[37] I. Partalas, I. Feneris, and I. Vlahavas, "A hybrid multiagent reinforcement learning approach using strategies and fusion," International Journal on Artificial Intelligence Tools, vol. 17, no. 5, pp. 945–962, 2008.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 11: Research Article Multiagent Cooperative Learning ...downloads.hindawi.com/journals/mpe/2015/964871.pdf · Research Article Multiagent Cooperative Learning Strategies for Pursuit-Evasion

Mathematical Problems in Engineering 11

Require A set of strategies an initial state 1199040

(1) 119904 larr 1199040

(2) while true do(3) if 119904 is coordinated state then(4) 120588 larr RandomReal(0 1)

(5) if 120588 lt 120576 then(6) select a strategy randomly the same for all agent(7) else(8) rank strategies(9) communicate ranks(10) receive ranks from other agents(11) average ranks(12) select corresponding strategy(13) end if(14) execute selected strategy(15) receive reward(16) transit to a new state 119904

1015840

(17) update 119876-value(18) else(19) act with a predefined or random policy(20) end if(21) end while

Algorithm 5 RL-Fusing algorithm

Table 1 Average number of capture steps in different environments

Grid sizeWithoutobstacles(steps)

With obstacles(steps)

HystereticQ-Learning 12 lowast 12 68 73

RL-Fusing 12 lowast 12 61 65The proposedapproach 12 lowast 12 24 27

Table 2 Average number of capture steps for different movementtypes

Grid sizeClockwisemovement(steps)

Randomclockwise andcounterclock-wise movement

(steps)HystereticQ-Learning 12 lowast 12 56 69

RL-Fusing 12 lowast 12 48 56The proposedapproach 12 lowast 12 26 32

strategy If the moving type remains fixed the pursuer keepsthe same strategy by assimilation As seen in Table 2 allthree methods require more steps to capture evaders withrandom clockwise and counterclockwise movement becausepursuers need extra time to detect the moving directions ofthe evaders

Table 3 Average number of capture steps for evaders with smartmovement in different grid sizes

Grid size 12 lowast 12(steps)

Grid size 24 lowast 24(steps)

HystereticQ-Learning 74 212

RL-Fusing 63 300The proposedapproach 63 119

Table 4 Average number of capture steps withwithout assimilationand accommodation

Evader movementtype

Clockwise

Evader movementtypeSmart

With assimilation andaccommodation 186 278

Without assimilationand accommodation 242 364

434 Smart Movement and Size of Grid The third experi-ment involves evaders with smart movementThe evaders tryactively to avoid pursuers when they sense any pursuer Theenvironment is of two grid sizes 12 lowast 12 and 24 lowast 24

Table 3 shows that the proposed scheme requires smallercapture steps in different grid sizes when compared withHysteretic Q-Learning and RL-Fusing When the grid sizeis smaller RL-Fusing outperforms Hysteretic Q-LearningHowever as the grid size becomes larger RL-Fusing requiresmore capture steps than Hysteretic Q-Learning This is

12 Mathematical Problems in Engineering

0

20

40

60

80

100

120

140

10 20 30 40Training episodes

Step

s

Grid size 12 lowast 12 (steps)Grid size 24 lowast 24 (steps)

Figure 5 Average number of capture steps under different trainingepisodes and grid sizes

because RL-Fusing depends mainly on its cooperative strate-gies to catch evaders but not all strategies can guide pursuersto move close to the evaders For example strategy 3 in RL-Fusing [37] is that once all the pursuers go at the distance of3 from an evader Because the sensing area of a pursuer is a5 lowast 5 square the pursuers easily lose the evader in a largerenvironment In contrast the proposed approach assists thepursuers to move as close to the evader as possible by theavoidance of sensing overlappingThus the approach has thehighest possibility of catching the evader

435 Smart Movement Training Episodes and Size of GridThe fourth experiment investigates the relationship betweentraining episodes and grid sizes when using the proposedapproach

Figure 5 shows that regardless of grid sizes (12 lowast 12 or24 lowast 24) the greater the number of training episodes is thesmaller the average number of capture steps is required Thisis becausemore training enriches the case base thus enablingpursuers to make better strategies and decisions

436 Assimilation and Accommodation The last experimentevaluates the proposed approach withwithout assimilationand accommodation There are two types of evader move-ment clockwise and smart The results are listed in Table 4Whether the approach is with or without assimilation andaccommodation the average number of capture steps forsmartmovement is greater than that for clockwisemovementThis is because smart movement can enable an evader todynamically change its movement and position In contrastmovement of an evader with clockwise movement can beeasily predicted hence the pursuers can take fewer steps forcapture Furthermore the average number of capture steps

required by the approach with assimilation and accommo-dation is smaller than that required by the approach withoutassimilation and accommodation for both movement typesThe approach without assimilation and accommodation gen-erates a fixed plan which cannot be changed In contrastthe approach with assimilation and accommodation enablesthe first pursuer to use the local-max strategy to find abetter position and the remaining pursuers can determinethe cooperative positions according to the first pursuerrsquosposition These pursuers can adjust their positions accordingto different catching situations and strategies Accordinglythe approach with assimilation and accommodation requiresfewer capture steps when compared with the approachwithout assimilation and accommodation

5 Conclusion

This work proposes an effective strategy learning approachto the pursuit-evasion game in a dynamic environment Thismethod can facilitate a pursuer in learning and implementingeffective plans from either success or failure experiencesunder the condition of no knowledge of environment avail-able The evolved plans fare extremely well when comparedwith some of the best constructed strategies This studyalso demonstrates the superior performance of the proposedapproach for the pursuit-evasion game by implementing amultiagent system on REPAST

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This work was supported by the Ministry of Science andTechnology under Grant no 102-2218-E-027-007

References

[1] K. P. Sycara, "Multiagent systems," AI Magazine, vol. 19, no. 2, pp. 79–92, 1998.
[2] N. Vlassis, A Concise Introduction to Multiagent Systems and Distributed Artificial Intelligence, Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool, 2007.
[3] A. Antoniades, H. J. Kim, and S. Sastry, "Pursuit-evasion strategies for teams of multiple agents with incomplete information," in Proceedings of the 42nd IEEE Conference on Decision and Control, pp. 756–761, December 2003.
[4] L. J. Guibas, J.-C. Latombe, S. M. LaValle, D. Lin, and R. Motwani, "A visibility-based pursuit-evasion problem," International Journal of Computational Geometry & Applications, vol. 9, no. 4-5, pp. 471–493, 1999.
[5] J. P. Hespanha, G. J. Pappas, and M. Prandini, "Greedy control for hybrid pursuit-evasion games," in Proceedings of the European Control Conference, pp. 2621–2626, 2001.
[6] R. Vidal, O. Shakernia, H. J. Kim, D. H. Shim, and S. Sastry, "Probabilistic pursuit-evasion games: theory, implementation, and experimental evaluation," IEEE Transactions on Robotics and Automation, vol. 18, no. 5, pp. 662–669, 2002.
[7] R. E. Korf, "A simple solution to pursuit games," in Proceedings of the 2nd International Workshop on Distributed Artificial Intelligence, pp. 183–194, 1992.
[8] S. Nolfi and D. Floreano, "Coevolving predator and prey robots: do 'arms races' arise in artificial evolution?" Artificial Life, vol. 4, no. 4, pp. 311–335, 1998.
[9] N. J. Savill and P. Hogeweg, "Evolutionary stagnation due to pattern-pattern interactions in a coevolutionary predator-prey model," Artificial Life, vol. 3, no. 2, pp. 81–100, 1997.
[10] J. Y. Kuo, F.-C. Huang, S.-P. Ma, and Y.-Y. Fanjiang, "Applying hybrid learning approach to RoboCup's strategy," Journal of Systems and Software, vol. 86, no. 7, pp. 1933–1944, 2013.
[11] S. I. Nishimura and T. Ikegami, "Emergence of collective strategies in a prey-predator game model," Artificial Life, vol. 3, no. 4, pp. 243–260, 1998.
[12] K.-C. Jim and C. L. Giles, "Talking helps: evolving communicating agents for the predator-prey pursuit problem," Artificial Life, vol. 6, no. 3, pp. 237–254, 2000.
[13] T. Haynes and S. Sen, "Evolving behavioral strategies in predators and prey," in Adaption and Learning in Multi-Agent Systems, vol. 1042 of Lecture Notes in Computer Science, pp. 113–126, Springer, Berlin, Germany, 1995.
[14] D. Hladek, J. Vascak, and P. Sincak, "Hierarchical fuzzy inference system for robotic pursuit evasion task," in Proceedings of the 6th International Symposium on Applied Machine Intelligence and Informatics (SAMI '08), pp. 273–277, Herl'any, Slovakia, January 2008.
[15] R. G. Smith, "The contract net protocol: high-level communication and control in a distributed problem solver," IEEE Transactions on Computers, vol. 19, no. 12, pp. 1104–1113, 1980.
[16] P.-C. Zhou, B.-R. Hong, Y.-H. Wang, and T. Zhou, "Multi-agent cooperative pursuit based on extended contract net protocol," in Proceedings of the International Conference on Machine Learning and Cybernetics, vol. 1, pp. 169–173, August 2004.
[17] A. Rao and M. Georgeff, "BDI agents: from theory to practice," in Proceedings of the International Conference on Multi-Agent Systems, pp. 312–319, 1995.
[18] J. Piaget, The Equilibration of Cognitive Structures: The Central Problem of Intellectual Development, University of Chicago Press, 1985.
[19] J. Y. Kuo, S. J. Lee, C. L. Wu, N. L. Hsueh, and J. Lee, "Evolutionary agents for intelligent transport systems," International Journal of Fuzzy Systems, vol. 7, no. 2, pp. 85–93, 2005.
[20] J. Y. Kuo and F. C. Huang, "A hybrid approach for multi-agent learning systems," Intelligent Automation and Soft Computing, vol. 17, no. 3, pp. 385–399, 2011.
[21] S. Gebhardt, P. Grant, R. von Georgi, and M. T. Huber, "Aspects of Piaget's cognitive developmental psychology and neurobiology of psychotic disorders – an integrative model," Medical Hypotheses, vol. 71, no. 3, pp. 426–433, 2008.
[22] S. Takamuku and R. C. Arkin, "Multi-method learning and assimilation," Robotics and Autonomous Systems, vol. 55, no. 8, pp. 618–627, 2007.
[23] T. Parsons, "Pursuit-evasion in a graph," in Theory and Applications of Graphs, vol. 642 of Lecture Notes in Mathematics, pp. 426–441, Springer, Heidelberg, Germany, 1978.
[24] R. Breisch, "An intuitive approach to speleotopology," Southwestern Cavers, vol. 6, no. 5, pp. 72–78, 1967.
[25] F. Ferrari, "A new parameterized potential family for path planning algorithms," International Journal on Artificial Intelligence Tools, vol. 18, no. 6, pp. 949–957, 2009.
[26] K. Walsh and B. Banerjee, "Fast A* with iterative resolution for navigation," International Journal on Artificial Intelligence Tools, vol. 19, no. 1, pp. 101–119, 2010.
[27] L.-P. Wong, M. Y. H. Low, and C. S. Chong, "Bee colony optimization with local search for traveling salesman problem," International Journal on Artificial Intelligence Tools, vol. 19, no. 3, pp. 305–334, 2010.
[28] E. Raboin, D. S. Nau, U. Kuter, S. K. Gupta, and P. Svec, "Strategy generation in multi-agent imperfect-information pursuit games," in Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, pp. 947–954, 2010.
[29] B. P. Gerkey, S. Thrun, and G. Gordon, "Visibility-based pursuit-evasion with limited field of view," International Journal of Robotics Research, vol. 25, no. 4, pp. 299–315, 2006.
[30] N. Basilico, N. Gatti, and F. Amigoni, "Leader-follower strategies for robotic patrolling in environments with arbitrary topologies," in Proceedings of the 8th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS '09), pp. 57–64, May 2009.
[31] M. Yan, "Multi-robot searching using game-theory based approach," International Journal of Advanced Robotic Systems, vol. 5, no. 4, pp. 341–350, 2008.
[32] K.-D. Althoff, "Case-based reasoning," in Handbook on Software Engineering and Knowledge Engineering, pp. 549–587, 2001.
[33] E. K. Burke, B. L. MacCarthy, S. Petrovic, and R. Qu, "Multiple-retrieval case-based reasoning for course timetabling problems," Journal of the Operational Research Society, vol. 57, no. 2, pp. 148–162, 2006.
[34] J. Long, D. G. Schwartz, S. Stoecklin, and M. K. Patel, "Application of loop reduction to learning program behaviors for anomaly detection," in Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC '05), vol. 1, pp. 691–696, April 2005.
[35] The Repast Suite, 2014, http://repast.sourceforge.net/.
[36] L. Matignon, G. J. Laurent, and N. L. Fort-Piat, "Hysteretic Q-Learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '07), pp. 64–69, San Diego, Calif, USA, October–November 2007.
[37] I. Partalas, I. Feneris, and I. Vlahavas, "A hybrid multiagent reinforcement learning approach using strategies and fusion," International Journal on Artificial Intelligence Tools, vol. 17, no. 5, pp. 945–962, 2008.
