
Research Article

Cooperative Behaviours with Swarm Intelligence in Multirobot Systems for Safety Inspections in Underground Terrains

Chika Yinka-Banjo,1 Isaac O. Osunmakinde,2 and Antoine Bagula3

1 Department of Computer Science, Faculty of Science, University of Cape Town, Private Bag X3, Rondebosch, Cape Town 7701, South Africa
2 School of Computing, College of Science, Engineering and Technology, University of South Africa (UNISA), P.O. Box 392, Pretoria 0003, South Africa
3 Department of Computer Science, Faculty of Science, University of Western Cape, Private Bag X17, Bellville 7535, South Africa

Correspondence should be addressed to Chika Yinka-Banjo; chikagog@gmail.com

Received 6 February 2014; Revised 16 May 2014; Accepted 26 May 2014; Published 20 July 2014

Academic Editor: Leo Chen

Copyright © 2014 Chika Yinka-Banjo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Underground mining operations are carried out in hazardous environments. To prevent disasters from occurring, as often as they do in underground mines, and to prevent safety routine checkers from disasters during safety inspection checks, multirobots are suggested to do the job of safety inspection rather than human beings and single robots. Multirobots are preferred because the inspection task will be done in the minimum amount of time. This paper proposes a cooperative behaviour for a multirobot system (MRS) to achieve a preentry safety inspection in underground terrains. A hybrid QLACS swarm intelligent model based on Q-Learning (QL) and the Ant Colony System (ACS) was proposed to achieve this cooperative behaviour in MRS. The intelligent model was developed by harnessing the strengths of both the QL and ACS algorithms. The ACS optimizes the routes used for each robot while the QL algorithm enhances the cooperation between the autonomous robots. A description of a communicating variation within the QLACS model for cooperative behavioural purposes is presented. The performance of the algorithms in terms of without communication, with communication, computation time, path costs, and the number of robots used was evaluated by using a simulation approach. Simulation results show achieved cooperative behaviour between robots.

1 Introduction

Multirobot systems share the need to cooperate creatingthe problem of modelling behaviour When dealing withmultiple robots with randomness involved the dynamic andunpredicted nature of the environment has to be consideredHence the behaviouralmodelling systemhas to copewith therandom (dynamic and unpredictable) nature of the systemResearchers on the other hand have been captivated by thiscooperative and coordinated problem of multirobot systems(MRS) in recent times A list of literature on multiple robotsrsquocooperation implemented in space was reviewed in [1] Usingmultiple robots to achieve tasks has been more effective thanusing a single robot See for instance [2 3] (and all referencestherein) for some specific robotic tasks

For a full discussion of underground mines benefits anddisaster rates see [4] Safety is a major element in the under-ground mine despite a significant reduction through safetyresearch measures the number of disasters in undergroundmines in South Africa and the world at large remains high[5] To contribute to underground mine safety that is toprevent disasters from occurring interacting autonomousmultirobots are sent before mine operations resume toinspect how safe the underground mine is This is achievedby assessing the status of the rocks roofs and level of toxicgases

A multirobot planning algorithm for tunnel and corridorenvironments is implemented in [6] The overall problemformulation is implemented using a topological graph andspanning representation Activities such as a single robot


drive and differential robots drive that can rotate in place, are carried out at different positions of the graph. Different methods have been used to tackle coordination of MRS in different domains [7]. Complete task coordination by multirobots was handled in [3, 8]. An extension of a market-based approach was used. This was achieved by generalizing task descriptions into task trees, thereby allowing tasks to be traded in a market setting at a variable level of abstraction. Figure 1 shows an overview of the inspection environment.

We consider a scenario in which an MRS cooperates to achieve a common goal of safety inspection in a partially observable environment such as the underground terrain. Each robot needs to be guided using the proposed cooperative behavioural model. Finding a plan to achieve this cooperative model often involves solving an NP-hard problem, because it involves multiple interacting robots. The interaction, which comes as a result of minimizing the time involved in achieving underground inspection, requires reasoning among the robots. In this case, we use the QL technique to cleverly achieve the reasoning aspect of this work and combine it with optimal route finding using ACS. The two major questions answered in this problem are the following: (1) Which state/room should I inspect now without repetition and a collision with another robot? (2) How do I get there using the closest link? However, all robots have partial knowledge of the environment and they are all equipped with the necessary sensors required for the inspection. The major contributions of this paper are as follows:

(i) development of a hybrid QLACS swarm intelligent model based on Q-Learning and the ant colony system for addressing cooperative behaviours in MRS achieving underground terrain safety inspections;

(ii) detailed experimental evaluations of the proposed QLACS model conducted on a simulated publicly available underground tunnel; the proposed model is also benchmarked with related methods;

(iii) systematic description of worked scenarios on an optimal route finder and MRS cooperative inspection using QLACS, for ease of implementation by system engineers, robotics researchers, and practitioners.

We do not know of any study that analytically describes how a swarm intelligent model can easily be implemented and applied to a multirobot system in a heterogeneous environment. In the next section we discuss related work in the areas of multirobot systems for underground terrains' safety, reinforcement learning (RL), and ant colony optimization (ACO). We then detail our proposed cooperative behaviour framework and approach. The experimental results and evaluations of the safety inspections follow. Finally, we conclude by summarizing and discussing other future extensions.

2. Theoretical Background

In this section we review some of the related work on MRS for underground terrains safety, which is the foundational domain for our research. A look at RL algorithms and their

Figure 1: Overview of MRS inspection in an underground mine, showing a gateway, a router, robots A and B, roof inspection (cracks, crete, loose rocks), obstacles (poles, stones), and gas levels (NH3, CO, SO2).

paradigms follows in Section 2.1. ACO and its paradigm applied in MRS conclude Section 2.

2.1. Related Multirobot Systems for Underground Terrains Safety. Effective, exhaustive, quick, and safe navigation of an MRS is subject to its ability to perceive, interpret, and interact (cooperate) within its environment. MRS can be used to achieve different tasks autonomously in structured and unstructured environments. In these environments, the automation of the operations is effected in different areas of MRS research: biologically inspired robot teams, communication, architectures and task planning, localization and mapping, object transportation and manipulation, learning, and so forth [9]. However, an environment such as the underground terrain has been a grey application domain in MRS research.

As indicated in Section 1, Peasgood et al. [6] implemented a multiphase algorithm for multirobot planning in tunnel environments. The algorithm as presented assumed a centralized planning architecture. They further compared the multirobot planning with a sequential planner. Their future work was to consider a decentralized planner architecture and might explore hybridizing the two planning algorithms.

Thorp and Durrant-Whyte discussed a primer on field robots [10]. From their discussion, field robotics involves the automation of platforms such as air, sea, and land in harsh, unstructured environments such as underwater exploration, mining, agriculture, and highways. Field robots are made up of three parts: navigation and sensing, planning and control, and safety. Their work also discussed the challenges and progress of field robots. One of the major challenges of field robots in harsh environments such as an underground mine is the problem of position determination (localization). This is so because the global positioning system (GPS) can only help in an environment where the sky views are guaranteed. However, progress has been made in automating some of the environments that are cooperatively constrained.

The proposed model presented in [4] promised to handle the safety part of field robots presented by Thorp and Durrant-Whyte [10] in underground mines. Their model architecture had three layers: the first layer handles the cooperative behaviour of the model, the second layer deals with the scalability degree of the model, and the last layer


Steps:
initialize Q(s, a) arbitrarily
repeat (for each episode):
    initialize s
    repeat (for each step of episode):
        Choose a from s using policy derived from Q
        Take action a, observe r, s′
        Q(s, a) ← Q(s, a) + α[r + γ max_{a′} Q(s′, a′) − Q(s, a)]
        s ← s′
    until s is terminal

Algorithm 1: Q-learning algorithm.

handles the applicability of the model. This paper builds on what has already been proposed in [4].

An investigation into automating the underground mine environment after blasting, called "making safe", was carried out in [11] to ensure the safety of the environment after blasting. Blasting in an underground mine is the controlled use of explosives to excavate, break down, or remove rock. The need to investigate the stability of the environment after blasting, before any mining operation takes place, is of the highest priority; hence the reason for automation. The automation was centred on a persistent area of concern in South African underground mine operation called hanging walls, which is caused as a result of rock burst and fall of ground. There are also other persistent areas of concern, such as the levels of toxic gases, which pose great disaster threats to the lives of miners, for instance, heat sicknesses, explosions, pneumoconiosis (occupational safety and health-fall of ground management in South Africa, SAMRASS-code book for mines), and so forth. Some of these disasters might result in fatalities and/or disabilities. Again, when an accident happens during mining operations, rescuers find it difficult to respond immediately to accidents. Looking at the aforementioned concerns, there will be a need to create models for safety inspection of underground mine operations. For instance, monitoring the underground mine environment for detecting hazardous gases and/or smoke should be one of the important safety measures. Continuous monitoring of workers and equipment is another crucial safety measure [5]. Picking up the sound from roof cracking to monitor when a roof is about to fall is also a safety item.

2.2. Reinforcement Learning. For a robot to operate in a harsh, unstructured environment, considering every possible event in defining its behaviour is intricate [12]. It is, however, essential to develop robots that can conform to changes in their environment. RL is one of the artificial intelligence (AI) algorithms that can achieve learning by experience. This enhances robots' interaction with the environment. We investigate the use of RL to assist the behaviours of an MRS in safety inspection of underground mines. RL is a sequence of actions and state transitions with some associated rewards. What is being learned in RL is an optimal policy (what is the right thing to do in any of these states (s)) [13].

At any time step, each robot is in a specific state in relation to the environment and can take one of the following actions: inspect, ignore, or shutdown. Each robot receives feedback after performing an action, which explains the impact of the action in relation to achieving the goal. The effect of the action can be either a good or bad reward. This reward is measured in terms of values. Therefore, the value of taking an action a in any state s of the underground terrain is measured using the action-value function, called the Q-value, Q^π(s, a). When a robot starts from state s, takes action a, and uses a policy π, an expected return, which is defined as the sum of the discounted rewards, is achieved.

In this research, Q-learning, a method of RL, is explored. The purpose of RL methods is to study Q^π(s, a) values so as to achieve optimal actions in the states. QL is an online RL method that requires no model for its application and stores the reinforcement values outcome in a look-up table. The QL architecture used in this work consists of learning threads, which amount to the number of robots involved in the inspection behavioural task. Each robot in the learning thread carries out Q-learning in the environment. Algorithm 1 explains the QL algorithm used in the behavioural model.

α is the learning rate, set between 0 and 1. At 0, Q-values are never updated, hence nothing is learned; learning can occur quickly at 1. γ is the discount rate, set between 0 and 1 as well. This models the fact that future rewards are worth less than immediate rewards. max_{a′} Q(s′, a′) is the maximum reward that is attainable in the state following the current state, that is, the reward for taking the optimal action thereafter.
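To make Algorithm 1 concrete, the following is a minimal Python sketch of the tabular update above; the action names follow the paper, but the epsilon-greedy policy, the numeric learning and discount rates, and the dictionary-backed Q-table are illustrative assumptions rather than the paper's implementation.

    import random
    from collections import defaultdict

    ALPHA, GAMMA = 0.5, 0.8          # illustrative learning rate and discount rate
    ACTIONS = ["inspect", "ignore", "shutdown"]

    Q = defaultdict(float)           # Q[(state, action)], initialised to zero

    def q_update(state, action, reward, next_state):
        """One step of Algorithm 1: Q(s,a) <- Q(s,a) + alpha*[r + gamma*max_a' Q(s',a') - Q(s,a)]."""
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

    def choose_action(state, epsilon=0.1):
        """Epsilon-greedy policy derived from Q (one common choice; the paper does not fix one)."""
        if random.random() < epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])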

QL is a competitive and search-based algorithm inspired by computational theory. It is not necessarily a multiagent algorithm but can be adapted to a multiagent or multigoal scenario. The success of this algorithm relies on the value and policy iterations, which can be adjusted by some unfairness (heuristics) to fit the current problem scenario. The most competitive action is selected by its value, and the action leads to another state or condition. Both value and policy are updated after each decision. Harnessing QL for an MRS scenario increases the cost exponentially and the overall performance drops in the same direction. As the robots increase, costs such as completion time, memory usage, and awareness factor


Table 1: Variants of ACO.

SN   Year   Algorithm
1    1991   Ant system (AS)
2    1992   Elitist AS
3    1995   Ant-Q
4    1996   Ant colony system
5    1996   Max-Min AS (MMAS)
6    1997   Ranked based AS
7    1999   ANTS
8    2000   BWAS
9    2001   Hypercube AS

(other robots in the environment) search time increases. However, following our heuristic model of QL, which was mainly determined by the policy we set to achieve our goal, our QL performs well above traditional QL.

2.3. Ant Colony Optimization. ACO is a type of swarm intelligence (SI). Bonabeau et al. [14] defined SI as any attempt to design algorithms or distributed problem-solving devices inspired by the collective behaviour of social insect colonies and other animal societies. This implies that anytime something is inspired by swarms, it is called swarm intelligence. Researchers have been thrilled by swarms because of some of their fascinating features. For instance, the coordinated manner in which insect colonies work notwithstanding having no single member of the swarm in control, the coordinated ways in which termites build giant structures and how flocks move as one body, and the harmonized ways in which ants quickly and efficiently search for food can only be attributed to an emergent phenomenon [15, 16].

Ants, an example of a social insect colony, achieve their self-organizing, robust, and flexible nature not by central control but by stigmergy. Stigmergy, also known as a pheromone trail, describes the indirect communication that exists within the ant's colony. The indirect communication, which is triggered by pheromone trails, helps in the recruitment of more ants to the system. Ants also use pheromones to find the shortest paths to food sources. Pheromone evaporation prevents stagnation, which also helps to avoid premature convergence on a less than optimal path [17].

ACO is necessarily a multiagent based algorithm. The first implementation of this optimization algorithm was in a classical search based (combinatory) problem, the travelling salesman problem, giving the shortest path to a specified destination. After this feat, many researchers have used it or its variants to model different problems. In this work, a variant of ACO is used to find the optimal path for MRS. Table 1 describes variants of ACO and their meaning [18]; see also [19] and all references therein.

In AS, the pheromones are updated by all the ants that complete a tour. ACS is the modified version of AS, which introduces the pseudorandom proportional rule. In elitist AS, ants that belong to the edges of the global best tour get an additional amount of pheromone during the pheromone update. MMAS introduces an upper and lower bound to the values of the pheromone trails. All the solutions are ranked to conform to the ants' length in ranked-based AS [20].

Our interest is in implementing ACS to find the best possible round trip inspection path in our model environment. This best possible inspection path will be supplied as input, or made available, for the robots using QL for inspection decisions. In most route finding scenarios, the nodes are linearly joined and are not largely distributed. This is a scenario where ants can forage along at least two paths (a multigoal path environment, though in our case there is a path exposed to four different goals; see Section 3).

3. Proposed Cooperative MRS Behaviour Framework

The following are some of the factors we are considering in the course of building the model. Each robot has to learn to adjust to the unknown underground mine environment; in this case, each robot requires intelligence, and a suitable machine learning algorithm is adopted. No robot has global information of the environment because of its limited sensing capabilities. Each robot has limited communication capabilities; therefore, each robot has to keep track of the information of others to remain in a team. Figure 2 describes the proposed framework as an approach for building the distributed, coordinated inspecting model for MRS.

The framework indicates three layers of the model: the bottom layer, the middle layer, and the topmost layer. The learning capability of the MRS is achieved in the bottom layer with the reinforcement learning algorithm and swarm intelligence technique. This intelligence enables robot A to take action knowing the action of robot B and vice versa. At the middle layer, scalability, in terms of the number of robots the system can accommodate, is achieved. This is expedient because a team of robots tends to achieve tasks more quickly and effectively than single robots. This scalability is handled with some memory management techniques, as indicated in Figure 2. The real life implementation is achieved by using the information acquired from the topmost layer. Figure 3 is the breakdown of the framework in Figure 2.

There is a base-station or server that serves as a backup for the information captured and analysed from individual robots. The model proposed in this research deals with the way in which robots need to find their way within communication range and uses a broadcast approach for effective communication of navigation status. A team of robots cooperatively inspecting an area in the underground mine will need to know where they are and where to go next, so it is obviously a continuing problem, and our contribution in this case is that before robot R1 takes an action, it broadcasts its location and inspection status to other robots R2, R3, and so forth, and vice versa. An unreachable robot receives packets of information, based on the destination address, through rerouting from the nearer robots. The reliability of this broadcast method is the ability to determine the extent of the task executed already by looking at the memory of any leading robot in the team.


Figure 2: Framework of the proposed QLACS model for cooperative behaviours. The bottom (cooperative behavioural) layer provides the intelligence (QL and ACS) over a graph of the terrain; the middle (scalability) layer covers memory usage, local and global databases, and sensor inputs (gas meter, roof sound sensor); the topmost (application) layer covers MRS mine inspection for miners' operation, with MRS distributed knowledge base access, simple decision making, and interrobot communication.

Table 2: States and possible actions of the environment.

State                              Possible actions
1  A (lower left part (LLP))       Inspect, ignore
2  B (lower middle part (LMP))     Inspect, ignore
3  C (lower right part (LRP))      Inspect, ignore
4  D (middle left part (MLP))      Inspect, ignore
5  E (central part (MCP))          Inspect, ignore
6  F (upper left part (ULP))       Inspect, ignore
7  G (upper right part (URP))      Inspect, ignore
8  H (outside mine part (OMP))     Shutdown

3.1. Problem Formulations. Suppose we have seven rooms/states connected by doors/links representing underground mine regions, as shown in Figure 4 and labeled as shown in Table 2. We label each room A through G. The outside of the mine can be thought of as one big room (H). Notice that doors F and C lead outside the mine, H. We put two robots in rooms F and C as their starting states, respectively. The robots inspect one state at a time, considering the obstacles encountered in the process. The robots can change direction freely, having links/doors to other rooms/states. In each of the states in Figure 4, two actions are possible: inspection of

roof cracks (RC) and level of toxic gases (TG), or ignoring the state, as it has been inspected earlier. According to our proposed model, a robot can inspect at least half of the given underground mine region to attain its maximum performance, which in turn attracts a good reward. When the current state of a robot is the end point of the inspection task, the robots exit the mine using the exit points C and F, respectively. The possible transition states of the robots are displayed using the state diagram in Figure 5. The state diagram and the transition matrix are formed using the QL algorithm.

The global map is assumed to be known by the robots, but there is no prior knowledge of the local map. Robots only know what they have sensed themselves and what their teammates communicate to them. Not only does our improved QL depend on the map of the environment, but each robot also learns through experience about local changes. They explore and inspect from state to state until they get to their goal states. Our proposed model, QLACS, achieves an offline mode for the route finding algorithm (ACS). This means that the global view of the map would have been provided before the learning robots start.

The model is explained using simulations; the results are presented in Section 4. Some of the results also have graphical interpretations. Analysis by a state transition diagram is


Table 3: Initial reward matrix (robot's state versus robot's action; entries "50, 100" are the inspect/ignore rewards, 150 is the goal reward, and — means no transition).

Robot's state   A        B        C        D        E        F        G        H
A               —        50, 100  —        50, 100  —        —        —        —
B               50, 100  —        50, 100  —        50, 100  —        —        —
C               —        50, 100  —        —        —        —        50, 100  150
D               50, 100  —        —        —        50, 100  50, 100  —        —
E               —        50, 100  —        50, 100  —        50, 100  50, 100  —
F               —        —        —        50, 100  50, 100  —        50, 100  150
G               —        —        50, 100  —        50, 100  50, 100  —        —
H               —        —        50, 100  —        —        50, 100  —        —

Figure 3: Breakdown of the framework: (a) contributions of the framework, layer by layer, and (b) processes of the multirobot behavioural system. (a) Bottom layer: a new hybrid model, QLACS, giving an efficient cooperative and navigational model; middle layer: an efficient scalable system; topmost layer: acquired results stored for future use. (b) A hybrid model is built using the QL and ACS algorithms and two robots are tested for shortest round trip paths, good communication, time of performance, number of iterations used, and thorough inspection; if they are successful, the results are recorded in the database, memory is monitored with the utilization interface platform, one more robot is added to the system, and the results are used in the application layer; otherwise the system exits.

presented in Figure 5. The possible state actions of the robots are presented in Table 3. The states of the robots are reduced as follows: searching or inspecting for roof cracks and toxic gas levels in each state or room, and recording the outcome of the inspection in the robots' memories. The processes involved in achieving good communication while the robots inspect the states are broadcasting inspected states to the other robots, ignoring the inspected/broadcasted state, avoiding collision with obstacles and other robots, and finally moving to the next available state that has not been inspected.

The Q-learning algorithm is used to determine what action is to be selected in any particular state. Let the actions (a) = {inspect, ignore, shutdown}, the state space (s) = {dimensions in the topological map}, the sensor readings (s_r), and the hazardous conditions (H_c) = {roof crack (RC), toxic gas level (TG)}. Each robot is configured with an aerial scanner and a chemical sensitive sensor that will provide readings (s_r) to determine if there is a hazardous condition, H_c, in the environment. The selection of an action by a robot through the Q-learning algorithm is based on the information from the broadcast. The whole framework uses a decentralized Q-learning scheme such that each robot has its own thread with its Q-learning and table. All Q-values on the tables of the robots in the team are initialized to zero; that is, states corresponding to such positions have not been inspected. When a robot enters a new state, it broadcasts its information for the current state to all other robots in the environment. The broadcast indicates whether the current state has been


Figure 4: Model of the environment with two entrances and exits (2EE), showing the rooms A to G, the outside of the mine H, the links between rooms (including the link between A and B), and the starting positions of Robot A and Robot B.

Figure 5: Events and transition diagram of the modeled environment, with H as the goal state. Transitions between adjacent rooms carry rewards of (50, 100), and transitions into the goal state H (from C or F) carry a reward of 150.

inspected or not. The broadcast is a search of corresponding positions in the memories of all visible robots. If the resultant search returns a zero, the broadcast indicates that the state has not been inspected, and if it returns a value greater than zero, it indicates that the state is not available for inspection. All robots must receive a broadcast before they act in any particular state. When a robot gets to a state, it receives a broadcast and passes its value to the QL algorithm to make a decision. The robot carries out the decision of the QL algorithm. The policy of the Q-learning algorithm makes a robot carry out an inspection if the resultant search of the broadcast returns zero and ignore the state if the resultant search is greater than zero. A robot only shuts down if all states have been visited and inspected. As the broadcast information is passed to the Q-learning algorithm, the policy is iterated towards an optimal value and condition.

The broadcast avoids the cost of multiple inspections, and the QL ensures that robots take appropriate actions. The only state in which inspection is carried out would have sensor readings (s_r) indicating H_c. For taking an action (a) in state (s), the robot gets a reward (R), which is used in (10) to compute the Q-value for the current state, and it can send a broadcast to other robots. Figure 5 and Table 3 show the restrictions and possible transitions among node points, indicating the possible rewards for any particular action (a) in any state (s). Every other transition besides the goal state could result in a reward of 50 or 100, and at completion the reward is the maximum, 150. We consider it wasteful to make the robots scan first before sharing intelligence, because such action would slow them down and make them expend more energy. Robots are to provide the inspection results showing actions taken in different states and the nature of conditions detected in any particular state. The introduction of the broadcast approach to determine the team's exploration reduces the execution time and energy cost of the whole team and makes the collaboration effective. So, in a multirobot scenario, the task can be effectively executed if the robots are made to share intelligence as they progress. Robots do not have to waste time and energy in doing the same task already carried out by other robots in the team.
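As a rough sketch of this broadcast-as-memory-search idea (assuming each robot's memory is a simple per-state Q-value dictionary; the state names and stored values below are hypothetical), a robot can decide whether a state is still available as follows.

    def broadcast_search(state, robot_memories):
        """Return the largest Q-value recorded for `state` across all visible robots.
        0 means no robot has inspected the state yet; > 0 means it is not available."""
        return max(memory.get(state, 0) for memory in robot_memories)

    # Hypothetical memories (Q-values per state) of two robots:
    robot1_memory = {"F": 50, "E": 50}
    robot2_memory = {"C": 50, "B": 50}

    if broadcast_search("D", [robot1_memory, robot2_memory]) == 0:
        print("State D not inspected yet: inspect it")
    else:
        print("State D already inspected: ignore it")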

GIVEN: As a mathematical justification of the above, suppose a safety preinspection of toxic gases or rock fall, or some combination of the two, is being carried out on a piece of complex underground terrain as in Figure 1, say L km²; there is a limited number of MRS with different capacities R and precious inspection time T minutes. Every region/state x_1 in the terrain requires a capacity of robot R_1 of MRS and limited time T_1, while every state x_n requires robot R_n of MRS and inspection time T_n. Let P_1 be the positive reward of QL for correct behaviour on state x_1 and let P_n be the reward on state x_n. This research aims to maximise positive rewards by choosing optimal values for states x_1, ..., x_n as follows:

Maximise
    P_1 x_1 + P_2 x_2 + ... + P_n x_n   (objective function)
Subject to constraints
    x_1 + x_2 + ... + x_n ≤ L   (limited total states)
    R_1 x_1 + R_2 x_2 + ... + R_n x_n ≤ R   (limited MRS capacity)
    T_1 x_1 + T_2 x_2 + ... + T_n x_n ≤ T   (limited inspection time)
Non-negativity constraints
    x_1 ≥ 0, x_2 ≥ 0, ..., x_n ≥ 0   (MRS cannot inspect negative states).
(1)

In terms of solving this optimization problem, we use the proposed QLACS model to compare the time and the number of states inspected. The number of robots used is also compared. The graph results in Section 4 also give more insight into the solution of the problem.
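Formulation (1) is a standard linear program, so readers who want to experiment with it can solve a small instance directly; the sketch below uses SciPy's linprog, and all numbers (rewards, capacities, times, and limits) are invented for illustration rather than taken from the paper.

    import numpy as np
    from scipy.optimize import linprog

    # Illustrative data for n = 4 states (not from the paper).
    P = np.array([50, 50, 100, 150])        # positive QL rewards P_i per state
    R_req = np.array([1, 1, 2, 1])          # robot capacity R_i required per state
    T_req = np.array([5, 8, 6, 4])          # inspection minutes T_i required per state
    L, R_total, T_total = 4, 4, 20          # limits on states, MRS capacity, and time

    # linprog minimises, so negate P to maximise P.x; bounds default to x_i >= 0.
    res = linprog(c=-P,
                  A_ub=np.vstack([np.ones_like(P), R_req, T_req]),
                  b_ub=np.array([L, R_total, T_total]),
                  method="highs")
    print(res.x, -res.fun)                  # chosen state values and total reward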

3.2. Basic Navigation and Cooperative Behaviours Using QLACS. QLACS has two components. The first component is formed by an improved ACS and the second component


is formed by an improved QL. The improvement occurs because some heuristics were added to the ordinary QL and ACS to achieve the hybrid QLACS. However, the second component of QLACS, which is an improved QL, was initially used to solve the proposed problem. After much analysis, we realized that the system needs to be optimized for effective cooperation and communication.

Using the second component of QLACS to solve the basic navigation and cooperative behaviour, the possible actions were set for each robot as inspect, ignore, and shutdown (after reaching the goal state H). Also, a reward system that will reflect the possible actions of the robots was chosen. In other words, a robot gets 150 points only when the goal is achieved (shutdown), 100 points for ignoring an already inspected area (ignore), and 50 points for inspecting an uninspected area (inspect). Figure 5 shows the transition events of the model and Table 3 displays the possible state actions for each robot. The way the QLACS second component works here is based on both navigation and communication behaviours.
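A minimal sketch of this reward scheme in Python (the goal-state label H follows Figure 4; the fallback value of 0 for any other combination is our assumption):

    GOAL_STATE = "H"

    def reward(action, state):
        """Reward scheme of the second QLACS component: 50 for inspecting an
        uninspected state, 100 for ignoring an already inspected state, and
        150 for shutting down at the goal state H."""
        if action == "shutdown" and state == GOAL_STATE:
            return 150
        if action == "ignore":
            return 100
        if action == "inspect":
            return 50
        return 0   # assumption: no reward for any other combination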

Achieving navigational behaviour with the second component of QLACS has some cost associated with it. In our scenario, because we want the robots to share intelligence by broadcasting results (robots search through other robots' memory), our problem is not solely navigational but also cooperative. Our behavioural actions are inspect, ignore, and shutdown. We noted that these actions of interest are not navigation oriented; there is no way we could use them in making decisions on transition. The functions for decision can be integrated to assist the other. Therefore, our behavioural model is an integration of two behaviours: (1) navigational behaviour and (2) cooperating behaviour through decision making. The integration works with a route finding method called RandomStateSelector, which we introduced in the second component of QLACS to help determine where the robot goes from the starting point to the exit point. Two parts of the RandomStateSelector method are introduced in this work. The first one is RandomStateSelector_C_H, which is used to transit from state C to H, and the second one, RandomStateSelector_F_H, transits from state F to H. This method works, but not effectively, because some of the states are repeated several times as a result of the random selection method. However, the decision part of this second component of QLACS, which is handled by a method called CheckInspection, worked efficiently. CheckInspection is responsible for sharing the broadcast among the agents. The broadcast looks at all the stored Q-values on all the robots and returns a signal for the action that needs to be taken. Therefore, we concluded that the heuristically accelerated component of QLACS has proven to be effective by showing evidence of effective communication in making inspection decisions using our model. It did not guarantee the shortest possible time for inspection, because of repeated state decisions. In this light, we only harnessed the communication strength of the second component of QLACS for communication and cooperation. Figure 5 and Table 3 form the basis for the QLACS second component.

To take care of the random state selector problem encountered in implementing the algorithm used for the second part of QLACS, we introduced an optimized route finder

Table 4: Combined adjacency and weight matrix.

     F  G  E  D  A  B  C  H
F    0  1  1  1  0  0  0  1
G    1  0  2  0  0  0  1  0
E    1  2  0  2  0  2  0  0
D    1  0  2  0  2  0  0  0
A    0  0  0  2  0  2  0  0
B    0  0  2  0  2  0  1  0
C    0  1  0  0  0  1  0  1
H    1  0  0  0  0  0  1  0

Figure 6: Weighted map/graph of the model environment, with nodes A to G and edge weights of 1 or 2 representing the number of obstacles on each link.

algorithm. This route finder algorithm, which forms the first component of QLACS, is a swarm intelligence technique. Figure 6 and Table 4 form the basis of the exploration space of the agents (ants), which achieved the optimum route finding. The weighted graph in Figure 6 is based on the number of obstacles the ants will encounter from the points of entry F and C. The combined table in Table 4 contains the weights of the obstacles and the evidence of an edge between any two vertices (states); it shows whether there is a connection between any two vertices in the graph. Generally, a "1" or "2" depicts the existence of an edge, while a "0" represents the absence of an edge, that is, no transition between such vertices. The constructed graph is an undirected multigraph, providing evidence of some agents coming from F or C of the mine (logical space). It is bidirectional because agents can move in any particular direction (multigraph). This means that the same weights on the edges apply to both directions. The graph does not show H; we assume that once the agents reach F or C, they exit if all inspections have been done.
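For implementers, Table 4 translates directly into a data structure; the sketch below stores it as a nested Python dictionary and derives the neighbour list the ants can move to (the helper name is ours, not the paper's).

    # Combined adjacency/weight matrix of Table 4 as a nested dictionary.
    # A weight of 1 or 2 is the number of obstacles on the edge; 0 means no edge.
    W = {
        "F": {"F": 0, "G": 1, "E": 1, "D": 1, "A": 0, "B": 0, "C": 0, "H": 1},
        "G": {"F": 1, "G": 0, "E": 2, "D": 0, "A": 0, "B": 0, "C": 1, "H": 0},
        "E": {"F": 1, "G": 2, "E": 0, "D": 2, "A": 0, "B": 2, "C": 0, "H": 0},
        "D": {"F": 1, "G": 0, "E": 2, "D": 0, "A": 2, "B": 0, "C": 0, "H": 0},
        "A": {"F": 0, "G": 0, "E": 0, "D": 2, "A": 0, "B": 2, "C": 0, "H": 0},
        "B": {"F": 0, "G": 0, "E": 2, "D": 0, "A": 2, "B": 0, "C": 1, "H": 0},
        "C": {"F": 0, "G": 1, "E": 0, "D": 0, "A": 0, "B": 1, "C": 0, "H": 1},
        "H": {"F": 1, "G": 0, "E": 0, "D": 0, "A": 0, "B": 0, "C": 1, "H": 0},
    }

    def neighbours(state):
        """States reachable from `state` (non-zero weight), as used by the ants."""
        return [s for s, w in W[state].items() if w > 0]

    print(neighbours("F"))   # ['G', 'E', 'D', 'H']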

3.3. Hybrid Model Development. The parameters for building the analytical hybrid QLACS model (equations (2) through (11)) are presented in Tables 5 and 6.

3.3.1. Algorithmic and Mathematical Analysis

ACS starts. Computation of edge attractiveness:

    η_ij = 1 / D_ij.   (2)


Table 5: A list of parameters for the ACS model.

ACO parameter   Meaning
α               Pheromone influence factor
β               Influence of adjacent node distance
ρ               Pheromone evaporation coefficient
Q               Attractiveness constant
e               Visited edge
e′              Edge not visited
L_k             Length of the tour of ant k
τ               Pheromone concentration (amount)
η               Specific visibility function (attractiveness)
Δτ_k            Pheromone concentration deposited by the kth ant
P_r(i, j)       Probability of moving from i to j
D_ij            Visibility or distance between i and j
f_i             Fitness of an individual in a population
P_i             Probability of being selected among f_i
N               Number of individuals in the population
i, j            Any two adjacent nodes in the graph
M_k             Set of unvisited nodes

Table 6: A list of parameters for the QL model.

QL parameter   Meaning
Q              Q-value update
s              State
a              Action
R              Reward
γ              Learning rate

Computation of instantaneous pheromone by ant k:

    Δτ_k = Q / L_k.   (3)

Update of pheromone:

    τ_ij = (1 − ρ) · τ_ij + Δτ^k_ij.   (4)

Computation of edge probability:

    P_r(i, j) = [τ_ij]^α [η_ij]^β / Σ_{e′=(i,j)} [τ_ij]^α [η_ij]^β.   (5)

Adoption of the roulette wheel:

    Cumulative(P_r(i, j)) = Σ_{i=1}^{N+1} P_r(i, j),   (6)

    f_i = (Σ_{j=1}^{N} f_j) / N,   (7)

    P_i = f_i / Σ_{j=1}^{N} f_j.   (8)

Equations (2) through (8) build the complete route finding model. Equations (2) through (4) are prerequisite to (5). Equation (5) is prerequisite to roulette wheel selection. At the end of (8), new states are selected and the trail is updated. The best path from both directions is selected and used as input in Q-learning. Note that D_ij is the weight on the edges, P_r(i, j) is the chance of moving to a node over edge E(i, j), and L_k is the sum of visited nodes by ant k.
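A compact sketch of one ACS construction step covering (2) to (6) is given below, using the Example I parameters of Section 3.3.3 (α = 3, β = 2, ρ = 0.01, Q = 2.0) and a few edges of Table 4; the function structure and the handling of the tour length are our own simplifications, not the paper's code.

    import random

    ALPHA, BETA, RHO, Q_CONST = 3, 2, 0.01, 2.0   # Example I parameters (Section 3.3.3)

    # Edge weights (number of obstacles) for a few edges of Table 4, for illustration.
    D = {("F", "G"): 1, ("F", "E"): 1, ("F", "D"): 1,
         ("E", "B"): 2, ("E", "D"): 2, ("E", "G"): 1}
    tau = {edge: 0.01 for edge in D}              # initial pheromone on every edge

    def ant_step(current, tour_length):
        """Pick the next state from `current` using (2)-(6), updating pheromones by (3)-(4)."""
        options = [j for (i, j) in D if i == current]
        delta = Q_CONST / max(tour_length, 1)              # (3): instantaneous pheromone
        for j in options:                                  # (4): pheromone update
            tau[(current, j)] = (1 - RHO) * tau[(current, j)] + delta
        weights = {j: tau[(current, j)] ** ALPHA * (1.0 / D[(current, j)]) ** BETA
                   for j in options}                       # (2) and the numerator of (5)
        total = sum(weights.values())
        r, cumulative = random.random(), 0.0
        for j in options:                                  # (5)-(6): probability + roulette wheel
            cumulative += weights[j] / total
            if r <= cumulative:
                return j
        return options[-1]

    print(ant_step("F", 1))   # one of 'G', 'E', 'D', each with probability 0.33 (cf. Table 7)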

QL starts. Each robot in its QL thread computes its learning rate:

    γ = 0.5 / [1 + Frequency(s, a)].   (9)

Q-values are updated:

    Q(s, a) = R(s, a) + γ · max_{a′} Q(s′, a′),   (10)

making a broadcast (Decision = Inspect/Ignore/Shutdown):

    Q(s, a):  Inspect,  if Q = 0 and s_j ≠ goal state;
              Ignore,   if Q > 0 and s_j ≠ goal state;
              Shutdown, if Q ≥ 0 and s_j = goal state.   (11)

Equation (9) gives γ, the learning rate, which is always between zero and 1. It is calculated based on the frequency of action of each robot in inspecting states. Equations (9) to (11) are state dependent. The states are kept in a buffer and then accessed at run time. ACS and QL do not work simultaneously: ACS works to completion and QL takes the final output as its input; ACS is not repeatedly called while QL is working.
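The QL side of the hybrid, equations (9) to (11), can be sketched as follows; the function names are ours, and the frequency value of 1 and the zero next-state values mirror the first step of Robot 1 in Example II (Table 10).

    GOAL = "H"

    def learning_rate(frequency):
        """Equation (9): gamma = 0.5 / (1 + Frequency(s, a))."""
        return 0.5 / (1 + frequency)

    def decide(q_value, state):
        """Equation (11): broadcast decision based on the shared Q-value of `state`."""
        if state == GOAL:
            return "shutdown"
        return "inspect" if q_value == 0 else "ignore"

    def q_update(q_table, state, reward, gamma, next_values):
        """Equation (10): Q(s, a) = R(s, a) + gamma * max_a' Q(s', a')."""
        q_table[state] = reward + gamma * max(next_values, default=0)
        return q_table[state]

    q1 = {}                                      # Robot 1's memory
    g = learning_rate(1)                         # 0.25, as in Table 10
    print(decide(q1.get("F", 0), "F"))           # 'inspect'
    print(q_update(q1, "F", 50, g, [0, 0, 0]))   # 50.0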

Our behavioural model is an integration of two algorithms: (1) a route finding algorithm and (2) a communication and cooperative algorithm. The integration works the following way: the optimal route finder (ACS) determines where the robot goes from the starting point to the destination, while QL determines what action it takes when it gets to any of the states. This collaboration works effectively because the optimal route finder has been proven to give the best possible transitions and the heuristically accelerated QL has proven to be effective by showing evidence of effective communication in making inspection decisions. Consequently, it guarantees the shortest possible time for inspection in the absence of wasteful inspection decisions. This framework forms the basis for our cooperative behaviour model for MRS (QLACS). The pseudocode for the implementation of QLACS is outlined in Algorithm 2.

3.3.2. QLACS Hybrid Approach Evolution. Figure 7 is the architecture of the hybrid QLACS, explaining the handshake between the two components.

(i) Graph Construct Module. This is the interconnection of states in our model environment (see Figure 4). It encompasses all possible transitions from F to C and vice versa. Thus, from the model, we construct an adjacency weight graph matrix that can be traversed by any graph-oriented algorithm.


INPUT: Edge distances (obstacles), pheromones, ants' trail, associated probabilities, starting and terminating indexes, that is, from F or C
OUTPUT: Effective cooperation, inspection, and navigation

(1) Boolean CompletedFlag = False          // Boolean variable indicates completion for all the threads
(2) Declare CompletedThreadBuffer          // Data structure stores info about completed threads
(3) Initialize all Q-values to zero        // All Q-value positions are initialized to zero
(4) Initialize best paths from the ACO algorithm, starting from F and C
(5) While (CompletedFlag <> True)          // Checks for when all robots have finished and the flag is true
    Begin
      Create N number of threads in parallel
      Threads get current states in parallel from the ACO algorithm
      Threads get next states and indexes in parallel from the ACO algorithm
      Threads compare Q-values of all corresponding positions of current states (each robot broadcasts Q-value info)
      IF ((Q == 0) & (ThreadNextState <> GoalState))    // Checks if a particular state is available
      Begin
        State is available; robot with the CurrentThreadID inspects
        Compute and update Q-value
      End
      IF (Q > 0)    // Checks if a state is not available, because an already inspected state has Q > 0
      Begin
        State is already inspected; robot with the CurrentThreadID ignores
        Compute and update Q-value
      End
      IF ((Q == 0) & (ThreadNextState == GoalState))    // Checks for the goal state and shuts down
      Begin
        Compute and update Q-value
        Goal state is reached and inspection completed
        Thread with the CurrentThreadID shuts down
        Store CurrentThreadID in CompletedThreadBuffer
      End
      IF (Count[CompletedThreadBuffer] == NumberOfRobots)    // Learning stops when this happens
      Begin
        CompletedFlag = True
      End
    End    // End of While loop

Algorithm 2: Pseudocode for QLACS.
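The sketch below condenses the control flow of Algorithm 2 into a sequential Python loop; a faithful implementation would run one thread per robot as the pseudocode states, and the shutdown test at the end of each path is our simplification of the goal-state check. The two input paths are the ACS outputs quoted later in Example II.

    def run_qlacs(paths):
        """Sequential sketch of Algorithm 2: robots take turns along their ACS paths,
        broadcast a shared Q-table, and inspect, ignore, or shut down accordingly."""
        shared_q = {}                                 # broadcast Q-values (robot memories)
        done = [False] * len(paths)
        log, step = [], 0
        while not all(done):
            for rid, path in enumerate(paths):
                if done[rid]:
                    continue
                if step >= len(path) - 1:             # reached its terminating state
                    log.append((rid, path[-1], "shutdown"))
                    done[rid] = True
                    continue
                state = path[step]
                if shared_q.get(state, 0) == 0:       # broadcast: state not yet inspected
                    shared_q[state] = 50
                    log.append((rid, state, "inspect"))
                else:                                 # Q > 0: someone already inspected it
                    log.append((rid, state, "ignore"))
            step += 1
        return log

    # ACS outputs for the two robots, as listed in Example II (Algorithm 4):
    for entry in run_qlacs([["F", "E", "D", "A", "B", "C", "G", "C"],
                            ["C", "B", "A", "D", "F", "E", "G", "C"]]):
        print(entry)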

Figure 7: Hybrid architecture. For two robots, the route finding component links the graph construct, parameter initialization, state selection, transition, update, and convergence decision modules, while the cooperative component links the cooperative inspection, state broadcasting, state action, QLACS update, new state, decision, and goal state modules.


Example I of QLACS (using Figure 5 for optimized route finding).
Parameters used in the first component of QLACS: Q = 2.0, α = 3, β = 2, ρ = 0.01.
Starting state: F. Terminating states: C, F.
Terminating condition: when all states have been visited at least once and the ant is at a terminating state.
State space = {F, G, E, D, A, B, C}.
Initialize pheromone positions to 0.01.
Rand(0, 1) returns a random value between 0 and 1.
Equations:
(i) computation of attractiveness η_ij using (2);
(ii) computation of instantaneous pheromones by ant k for all potential states using (3);
(iii) pheromone update using (4);
(iv) computation of probabilities for the potential states using (5);
(v) adaptation of the roulette wheel using (6).
Equation (5) can be reduced as follows: let w(i, j) = [τ_ij]^α [η_ij]^β and Sum = Σ w(i, j) over the potential states, so that P_r(i, j) = w / Sum.

Algorithm 3: Parameters for QLACS Example I.
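The first cycle of the worked solution in Table 7 can be reproduced with a few lines of Python; the printed values match the table (τ ≈ 2.01, w ≈ 8.12, P_r ≈ 0.33 for each of the potential states).

    ALPHA, BETA, RHO, Q_CONST = 3, 2, 0.01, 2.0

    # From state F the potential states are D, E, G, each over an edge of weight 1.
    edges = {"D": 1, "E": 1, "G": 1}
    tau0, L_k = 0.01, 1

    delta = Q_CONST / L_k                                              # (3): 2.0
    tau = {j: (1 - RHO) * tau0 + delta for j in edges}                 # (4): 2.0099 each
    w = {j: tau[j] ** ALPHA * (1 / edges[j]) ** BETA for j in edges}   # (5) numerator: 8.12
    total = sum(w.values())                                            # 24.36
    prob = {j: w[j] / total for j in w}                                # 0.33 each, as in Table 7
    print(tau, w, prob)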

In our case there are eight states: primarily seven internal states and a common terminating state. Since the states are not just linear, the environment allows for multiple options; that is, an agent/ant can choose from any present state. This type of scenario is also called a multigoal scenario.

(ii) State Selection Module. Here the agents select the start and end states, which according to our model in Figure 4 are F and C. These states are selected based on the cumulative probability of two adjacent nodes in (6).

(iii) Transition Module. This module takes care of the transition rules of the agents by calculating the pheromone concentrations, distances, and probabilities using (2) through (4).

(iv) Update Module. After the transition from one state to another, an update of the pheromone is computed, after which multipath planning with the shortest path is achieved.

(v) Convergence Decision Module. This is where the best trail/path decision is taken. It is the end product of the first component of QLACS, which is then moved to the second component of QLACS for cooperative behaviour.

(vi) Cooperative Inspection Module. This is where the robots are deployed to start inspection. The robots are to use the acquired best paths, starting from F and C, respectively, as input for the second component of QLACS. The two robots are to use the learning rate from (9) to learn the environment and use (10) for cooperation.

(vii) State Broadcasting Module. This module handles the broadcasting behaviours of the two robots, which are achieved by using (10). Each robot checks its memory, represented by Q-values, before taking any decision.

(viii) State Action Module. State broadcasting by each robot is immediately followed by action selection. In other words, the state to inspect or ignore is decided here using (11).

(ix) QLACS Update Module. After each action selection, the Q-values of each robot are updated using (10).

(x) New State Module. This module takes care of the robot's new state after updating the Q-values. Each robot runs in its own thread, managing its memory, yet sharing information.

(xi) Final Decision Module. This decision module determines if the robot should exit the environment or still do more inspections. It is also controlled by (11).

(xii) Goal State Module. The completion of the second component of QLACS is getting to the goal state after successful inspection of states. This goal state, according to our model in Figure 4, is called H.

3.3.3. Analytical Scenario. For the benefit of robotic and system engineers, practitioners, and researchers, this paper uniquely presents scenarios on the ease of implementation of the proposed QLACS, as described in Algorithms 3 and 4 and Tables 7 and 10. The first component of QLACS is explained in Example I, shown in Algorithm 3 and Table 7. The first component, which achieved optimal route paths for the robots, has the parameters listed in Algorithm 3. The calculation for different state transitions for the first component of QLACS is analysed in Table 7.

Repeat steps 1–6 for subsequent current states until the termination condition and state are reached. At the end of seven updates we have Tables 8 and 9. The total length shown in Table 8 represents the total number of obstacles encountered by each agent while trailing to achieve the optimal route for the robots. The number of obstacles shown


Table 7: QLACS Example I navigational analytical solution.

Starting state: F. Potential states: D, E, G.
(1) Use (2): η_FD = 1, η_FE = 1, η_FG = 1.
(2) Use (3): L_k = 1; Δτ_k(FD) = 2/1 = 2, Δτ_k(FE) = 2, Δτ_k(FG) = 2.
(3) Use (4): τ_FD = (1 − 0.01) × 0.01 + 2 = 2.0099 ≈ 2.01, τ_FE = 2.01, τ_FG = 2.01.

First pheromone update:
F = 0.01, G = 2.01, E = 2.01, D = 2.01, A = 0.01, B = 0.01, C = 0.01.

(4) Use (5): w(FD) = [τ_FD]^α [η_FD]^β = (2.01)^3 = 8.12, w(FE) = 8.12, w(FG) = 8.12.
Sum = w(FD) + w(FE) + w(FG) = 24.36.
P_r(FD) = w/Sum = 8.12/24.36 = 0.33, P_r(FE) = 0.33, P_r(FG) = 0.33.

Probabilities:
F = 0, G = 0.33, E = 0.33, D = 0.33, A = 0, B = 0, C = 0, H = 0.

(5) Use (6): call Rand; Rand = 0.07 falls in state E. The roulette wheel selects E as the next state (end of roulette wheel).
(6) Update trail: F, E.

Current state: E. Potential states: B, D, G.
(1) Use (2): η_EB = 1/2, η_ED = 1/2, η_EG = 1.
(2) Use (3): L_k(F–E–D) = 1 + 2 = 3, L_k(F–E–B) = 1 + 2 = 3, L_k(F–E–G) = 1 + 2 = 3; Δτ_k = 2/3 = 0.67.
(3) Use (4): τ_EB = (1 − 0.01) × 2.01 + 0.67 = 2.66, τ_ED = 2.66, τ_EG = 2.66.

Second pheromone update:
F = 0.01, G = 2.66, E = 2.01, D = 2.66, A = 0.01, B = 2.66, C = 0.01.

(4) Use (5): w(EB) = (2.66)^3 × (1/2)^2 = 4.71, w(ED) = 4.71, w(EG) = (2.66)^3 × (1)^2 = 18.82.
Sum = 4.71 + 4.71 + 18.82 = 28.24.
P_r(EB) = 4.71/28.24 = 0.17, P_r(ED) = 0.17, P_r(EG) = 18.82/28.24 = 0.67.

Probabilities:
F = 0, G = 0.67, E = 0, D = 0.17, A = 0, B = 0.17, C = 0, H = 0.

(5) Use (6): call Rand; Rand = 0.9 falls in state D. The roulette wheel selects D as the next state (end of roulette wheel).
(6) Update trail: F, E, D.


Example II of QLACS (for good cooperation and communication between robots).
Parameters used in the second component of QLACS (using the output from the first component of QLACS):
Reward scheme: Inspect = 50, Ignore = 100, Shutdown = 150.
State space: optimized path from QLACS R1 = {F, E, D, A, B, C, G, C} and QLACS R2 = {C, B, A, D, F, E, G, C}.
Starting states: F, C. S_j = terminating state: C, then H.
Terminating condition: when all states have been visited.
Initialize Q-value positions to zeros.
Equations:
(i) compute the learning rate using (9);
(ii) compute the update on Q(s, a) using (10).

Algorithm 4: Parameters for QLACS Example II.

Table 8: Pheromone update of a full cycle.

Current state    F     G     E     D     A     B     C
F                0.01  2.01  2.01  2.01  0.01  0.01  0.01   (1st update)
E                0.01  2.66  2.01  2.66  0.01  2.66  0.01   (2nd update)
D                3.13  0.01  3.03  0.01  3.03  0.01  0.01   (3rd update)
A                0.01  0.01  0.01  0.45  0.01  0.45  0.01   (4th update)
B                0.01  0.01  0.67  0.01  0.67  0.01  0.7    (5th update)
C                0.01  0.91  0.01  0.01  0.01  0.89  0.01   (6th update)
G                0.71  0.01  0.69  0.01  0.01  0.01  0.71   (7th update)

C is the terminating state, and ant k has moved through all states at least once.
Trail: F, E, D, A, B, C, G, C.
Number of obstacles = 1 + 2 + 2 + 2 + 1 + 1 + 1 = 10.

Table 9: Probability update of a full cycle.

Current state    F     G     E     D     A     B     C
F                0     0.33  0.33  0.33  0     0     0
E                0     0.67  0     0.17  0     0.17  0
D                0.69  0     0.16  0     0.16  0     0
A                0     0     0     0.5   0     0.5   0
B                0     0     0.16  0     0.16  0     0.68
C                0     0.52  0     0     0     0.49  0
G                0.45  0     0     0     0     0     0.45

C = terminating state.

as 10 is calculated using the trail result in Table 8. Table 9 displays the probability update table for a full cycle of route finding. In the case of this first example, the first robot will terminate its inspection through C and then H.
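As a quick cross-check, the obstacle count reported in Table 8 can be recomputed by summing the Table 4 edge weights along the reported trail.

    # Edge weights from Table 4 along the trail of Table 8 (F, E, D, A, B, C, G, C).
    weights = {("F", "E"): 1, ("E", "D"): 2, ("D", "A"): 2, ("A", "B"): 2,
               ("B", "C"): 1, ("C", "G"): 1, ("G", "C"): 1}
    trail = ["F", "E", "D", "A", "B", "C", "G", "C"]
    obstacles = sum(weights[(trail[i], trail[i + 1])] for i in range(len(trail) - 1))
    print(obstacles)   # 10, the total number of obstacles reported in Table 8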

Algorithm 4 displays the parameters of the second example of QLACS. The second component of QLACS handles this aspect of the model, which is the communication and cooperative part. Once the first component hands its output to the second component, it becomes the second component's input and it runs with it. From Algorithm 4, QLACS R1 represents the input for robot 1 and QLACS R2 represents the input for robot 2, received from the first component of QLACS. The terminating condition is when all states have been visited.

The rest of Example II, displayed in Table 10, explains the cooperative behavioural scenario from one state to another for two robots. The tables in the last row of Table 10 show the communication and cooperative output achieved using the second component of QLACS.

4 Experimental Evaluations SafetyInspections for MRS Behaviours

The experimental results of implementing QLACS in anenvironment that consists of obstacles and links are tabu-lated in this section The experimental setup is explainedin Section 41 Different performance categories are shownin this section without communication category and withcommunication category In the without communicationcategory as displayed in Section 42 we found that robotscan inspect all states individually without knowing thatanother robot exists Robots can also inspect some of thestates thereby leaving some states not inspected The com-munication category is explained in Section 43 while theperformance of the QLACS measured with other existingmethods is tabulated in Section 44

4.1. Experimental Setup. Figure 8 displays the different sets of experiments conducted in the environment using two robots. Figure 8(a) shows how two robots resume inspection from two different entrances. In each set of experiments, the robots take readings of the roof cracks and the level of toxic gases using their sensors. The same behaviours happen in Figures 8(b) and 8(c), respectively, at different inspection locations. The inspection locations vary in the four experimental setups shown in Figure 8.

4.2. Experiment 1: Performance of QLACS without Cooperation. The result of implementing QLACS without communication in the proposed environment (Figures 1 and 4) is shown in Tables 11 and 12. In the case of Table 11, robot 1 (R1) enters the mine through state F, while robot 2 (R2) enters the mine through state C. However, each robot learns by inspecting some of the states and ignoring some of the


Table 10: QLACS Example II cooperative behaviour.

Starting simulation. Robot 1, starting state F:
(1) Use (9): γ(F) = 0.5/[1 + 1] = 0.5/2 = 0.25.
(2) Check the Q-value for state F (use (11)): Q(F, a) = 0. Selected action a = inspect.
(3) Use (10): Q(F, a) = 50 + 0.25(0) = 50.
End of value iteration.

Current state E, Robot 1:
(1) Use (9): γ(E) = 0.5/[1 + 1] = 0.25.
(2) Check the Q-value for state E (use (11)): Q(E, a) = 0. Selected action a = inspect.
(3) Use (10): Q(E, a) = 50 + 0.25(max(0, 0, 0)) = 50.
End of value iteration.

Current state C, Robot 2:
(1) Use (9): γ(C) = 0.5/[1 + 1] = 0.25.
(2) Check the Q-value for state C (use (11)): Q(C, a) = 0. Selected action a = inspect.
(3) Use (10): Q(C, a) = 50 + 0.25(max(0, 0, 0)) = 50.
End of value iteration.

Current state B, Robot 2:
(1) Use (9): γ(B) = 0.5/[1 + 1] = 0.25.
(2) Check the Q-value for state B (use (11)): Q(B, a) = 0. Selected action a = inspect.
(3) Use (10): Q(B, a) = 50 + 0.25(max(0, 0, 0)) = 50.
End of value iteration.

Current state D, Robot 1:
(1) Use (9): γ(D) = 0.5/[1 + 1] = 0.25.
(2) Broadcast (use (11)): Q(D, a) = 0. Selected action a = inspect.
(3) Use (10): Q(D, a) = 50 + 0.25(max(0, 0, 0)) = 50.
End of value iteration.

Current state A, Robot 2:
(1) Use (9): γ(A) = 0.5/[1 + 1] = 0.25.
(2) Broadcast (use (11)): Q(A, a) = 0. Selected action a = inspect.
(3) Use (10): Q(A, a) = 50 + 0.25(max(0, 0, 0)) = 50.
End of value iteration.

Current state A, Robot 1:
(1) Use (9): γ(A) = 0.5/[1 + 0.25] = 0.5/1.25 = 0.4.
(2) Broadcast (use (11)): Q(A, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(A, a) = 100 + 0.4(max(0, 0, 0)) = 100.
End of value iteration.

Current state D, Robot 2:
(1) Use (9): γ(D) = 0.5/[1 + 0.25] = 0.4.
(2) Broadcast (use (11)): Q(D, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(D, a) = 100 + 0.4(max(0, 0, 0)) = 100.
End of value iteration.

Current state B, Robot 1:
(1) Use (9): γ(B) = 0.5/[1 + 0.25] = 0.4.
(2) Broadcast (use (11)): Q(B, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(B, a) = 100 + 0.4(max(0, 0, 0)) = 100.
End of value iteration.

Current state F, Robot 2:
(1) Use (9): γ(F) = 0.5/[1 + 0.25] = 0.4.
(2) Broadcast (use (11)): Q(F, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(F, a) = 100 + 0.4(max(0, 0, 0)) = 100.
End of value iteration.

Current state C, Robot 1:
(1) Use (9): γ(C) = 0.5/[1 + 0.25] = 0.4.
(2) Broadcast (use (11)): Q(C, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(C, a) = 100 + 0.4(max(0, 0, 0)) = 100.
End of value iteration.

Current state E, Robot 2:
(1) Use (9): γ(E) = 0.5/[1 + 0.25] = 0.4.
(2) Broadcast (use (11)): Q(E, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(E, a) = 100 + 0.4(max(0, 0, 0)) = 100.
End of value iteration.

Current state G, Robot 1:
(1) Use (9): γ(G) = 0.5/[1 + 1] = 0.25.
(2) Broadcast (use (11)): Q(G, a) = 0. Selected action a = inspect.
(3) Use (10): Q(G, a) = 50 + 0.25(max(0, 0, 0)) = 50.
End of value iteration.

Current state G, Robot 2:
(1) Use (9): γ(G) = 0.5/[1 + 0.25] = 0.4.
(2) Broadcast (use (11)): Q(G, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(G, a) = 100 + 0.4(max(50, 50, 50)) = 100 + 20 = 120.
End of value iteration.

Current state C, Robot 1:
(1) Use (9): γ(C) = 0.5/[1 + 0.4] = 0.5/1.4 = 0.36.
(2) Broadcast (use (11)): Q(C, a) = 50. Selected action a = ignore.
(3) Use (10): Q(C, a) = 100 + 0.36(max(50, 50)) = 100 + 18 = 118.
End of value iteration.

Current state C, Robot 2:
(1) Use (9): γ(C) = 0.5/[1 + 0.36] = 0.5/1.36 = 0.37.
(2) Broadcast (use (11)): Q(C, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(C, a) = 100 + 0.37(max(50, 50)) = 100 + 0.37 × 50 = 118.5.
End of value iteration.

All QLACS 1 states exhausted. Goal state H, Robot 1:
(1) Use (9): γ(H) = 0.5/[1 + 1] = 0.25.
(2) Broadcast (use (11)): Q(H, a) = 0. Selected action a = shutdown.
(3) Use (10): Q(H, a) = 150 + 0.25(max(0, 0)) = 150.
End of value iteration.

All QLACS 2 states exhausted. Goal state H, Robot 2:
(1) Use (9): γ(H) = 0.5/[1 + 1] = 0.25.
(2) Broadcast (use (11)): Q(H, a) = 0. Selected action a = shutdown.
(3) Use (10): Q(H, a) = 150 + 0.25(max(0, 0)) = 150.
End of value iteration.

End of policy iteration.

Robot 1:
State   Inspect   Ignore   Shutdown
F       Yes       No       No
E       Yes       No       No
D       Yes       No       No
A       No        Yes      No
B       No        Yes      No
C       No        Yes      No
G       Yes       No       No
C       No        Yes      Yes, through H

Robot 2:
State   Inspect   Ignore   Shutdown
C       Yes       No       No
B       Yes       No       No
A       Yes       No       No
D       No        Yes      No
F       No        Yes      No
E       No        Yes      No
G       No        Yes      No
C       No        Yes      Yes, through H

states. Since there is no communication, they exit the mine after learning, consequently not inspecting all the states. The same routine is reflected in Table 12, but in this case each robot ends up inspecting all the states before exiting the mine. Analysis of Tables 11 and 12 shows that resources are wasted and the inspection job is neither effective nor efficient. Comparing the two tables, the time and distance costs are high, though higher in Table 12 because of the many repetitions. For instance, the time and distance costs in Table 12, column 2, are 4.80028 and ((F G E D A B C), (C B A D E G F)), respectively. It also shows that states are repeated in Tables 11 and 12. The memory usage is equally high. This led us to create a more efficient QLACS with communication by introducing some heuristics. The processes involved in Tables 11 and 12 are explained in Table 13.

One can see from Tables 11 and 12 that there is no good communication among the robots, hence the development of Table 14.

4.3. Experiment 2: Performance of QLACS with Good Cooperation. The heuristic added to the known QL made this experiment show good communication. This is where our contribution to communication is shown. As a robot inspects and learns the environment, it broadcasts its Q-values to the other robot in the form of a lookup table. In this case, each robot checks for Q-values: when a Q-value is equal to zero, the robot randomly selects a state for inspection; if a Q-value is greater than zero, the robot checks whether the state is available for inspection or is to be ignored. When a robot encounters a state with a Q-value equal to zero and the thread next to the state is equal to the goal state (H), then it shuts down; it must have checked the lookup table to see that all states have been inspected. The result in Table 14 shows good communication between the two robots: no states were inspected more than once. The iterations for every run, with their times, memory usage, and effective communication, are also displayed in Table 14.
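The decision heuristic described above can be summarised in a short sketch. The function and variable names are ours and only illustrate the rule as stated in this section, namely inspect when the broadcast Q-value for a state is zero, ignore when it is greater than zero, and shut down through the goal state H once every state has been inspected; it is not the authors' implementation.

# Illustrative sketch of the communication heuristic: every robot broadcasts
# its Q-values as a lookup table, and actions are chosen from the combined view.

def broadcast_q(robots, state):
    """Largest Q-value any robot in the team holds for this state."""
    return max(robot["q_table"].get(state, 0) for robot in robots)

def choose_action(robots, state, next_state, goal_state="H"):
    q_value = broadcast_q(robots, state)
    all_inspected = all(broadcast_q(robots, s) > 0 for s in "ABCDEFG")
    if all_inspected and next_state == goal_state:
        return "shutdown"          # exit through H once every state is covered
    if q_value == 0:
        return "inspect"           # nobody has inspected this state yet
    return "ignore"                # already inspected by a teammate (or self)

# Example: robot 2 has already inspected C, so robot 1 ignores it.
robots = [{"q_table": {"F": 50, "E": 50}}, {"q_table": {"C": 50, "B": 50}}]
print(choose_action(robots, "C", "G"))   # -> ignore
print(choose_action(robots, "D", "A"))   # -> inspect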

Comparing Table 14 with Tables 11 and 12, one cannot but notice the huge differences in the memory, time, and distance costs. The communication between the robots in Table 14 resulted in achieving good inspection; however, the random method of choosing the next inspection states did not give an optimized route, thereby increasing the time cost in passing through and checking already inspected states.

(a) Two robots starting inspection from two different entrances

(b) Two robots inspecting as they navigate in the environment

(c) Two robots inspecting different locations with different positions

Figure 8: Different experimental behaviours for two robots.

4.4. Experiment 3: Performance of QLACS for the Navigation Behaviour of the Proposed Model. The result of implementing the first component of QLACS for effective navigation of the proposed model is tabulated in Table 16. Table 15 shows


Table 11: QLACS without communication (inspecting some states).

Number of runs:         1        2        3        4        5        6        7
Iterations:             9        10       10       13       13       9        13
Time (sec):             4.30025  3.00017  3.4002   3.00017  3.10017  3.10018  2.70016
Memory usage (bytes):   18872    18160    19204    18208    17896    18164    18308
Inspected states (R1):  G | F, G, C | C | F, D, B, E, A | F, E, D, A, C | F | G, E, A
Inspected states (R2):  B, A, G | C | C, B, A | E, F | B, A | C, B, E | B, A, E, A, G

Table 12: QLACS without communication (inspecting all states).

Number of runs:         1        2        3        4        5        6        7
Iterations:             114      10       70       10       9        10       10
Time (sec):             4.30024  4.80028  4.10023  3.40019  3.40019  3.40019  3.5002
Memory usage (bytes):   28440    17960    27216    17000    18456    17672    17968
Inspected states (R1):  F, G, E, D, A, B, C (in every run)
Inspected states (R2):  C, B, A, D, E, G, F (in every run)

Table 13: Processes used in achieving Tables 11 and 12.

(a) For Table 11: inspecting some states.
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zeros.
(3) Select a random action if the Q-values of all possible actions at the current state are zeros.
(4) Select the action with the highest Q-value if there is a max Q-value.
(5) Compute and update the Q-value of the selected action.
(6) Get a new state among the possible states.
(7) If new state = goal state, then go to Steps 5 and 9.
(8) Repeat Steps 3 to 6 until Step 7.
(9) Shutdown.

(b) For Table 12: inspecting all states.
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zeros.
(3) Select a random action if the Q-values of all possible actions at the current state are zeros.
(4) Select the action with the highest Q-value if there is a max Q-value.
(5) Compute and update the Q-value of the selected action.
(6) Get a new state among the possible states.
(7) If new state = goal state, then go to Steps 5 and 10.
(8) Repeat Steps 3 to 7 until Step 9.
(9) All states except the goal state have taken action = inspect.
(10) Shutdown.

the selected parameters used in achieving the optimal path. The optimal path found after nine test runs is the path with a distance cost of 10 for both entries to the environment, displayed in Table 16. Therefore, columns 4, 5, 7, 8, and 9 can be used as the optimized path input from the first component of QLACS. QLACS will then use any of the optimized paths to navigate to the specified states and take decisions accordingly. For instance, the test run with nine ants gives 16 iterations in 6.00035 seconds and yields the shortest paths for both robots coming from entries F and C of the environment. The path that gives a cost of 10 is obtained from F E D A B C G C (1, 2, 2, 2, 1, 1, 1) and C B A D F E G C (1, 2, 2, 1, 1, 2, 1), respectively.

Alpha (α), Beta (β), and Rho (ρ) represent the heuristic properties of the ACS algorithm. The Alpha factor is the measure of the influence of pheromone concentration; it can influence the selection of the path with the highest pheromone concentration. Beta is the measure of the influence related to the distance between any two adjacent nodes; it is also a heuristic factor that can measure the influence of distance in selecting the next state, and it is not limited to pheromones. Rho has to do with the rate at which the pheromones evaporate; it shows how often new paths are explored by the agents (ants) rather than reinforcing old paths. Each agent cooperates by having access to the other agents' pheromone values. The pheromone value is initialized to 0.01, and it is continually updated until learning stops. This is similar to the Q-values in the other component of QLACS, which are initialized to zero and updated at every new step.

Table 16 gives the optimized routes to be used by the robots to achieve the inspection tasks. This resulted in combining Tables 14 and 16 to achieve Table 17. Table 17 shows optimized time cost, memory usage, route cost, and good communication.

4.5. Experiment 4: Benchmarking the New Hybrid Model (QLACS) with Popular Methods. Choosing the optimized paths, QLACS performs cooperative inspection behaviour through communication among the robots. Looking at the sample results in Table 17, QLACS chooses the shortest and complete possible inspection routes from different runs. In this case, the best paths were obtained from Table 16 by using 9 agents, 10 agents, and 12 agents. All the test runs gave best trail paths from both entrances, listing the complete states


Table 14: QLACS with communication.

Number of runs:         1        2        3        4        5        6        7
Iterations:             9        10       9        13       10       10       9
Time (sec):             3.30019  3.10017  3.10018  3.10018  2.90023  3.00017  3.10018
Memory usage (bytes):   15748    15948    15748    18232    16576    15948    15748
Inspected states (R1):  F, G, E | F, G, E, D | F, G, E | F, E | F, G, E, D, B | F, G, E, D | F, G, E
Inspected states (R2):  C, B, A, D | C, B, A | C, B, A, D | F, G, E, D, A | C, A | C, B, A | C, B, A, D

Table 15: Navigation behaviour parameter specification.

ACO properties:  Type of ACO | Population | Length of path | Pheromone coefficient β | Heuristic coefficient α | Evaporation rate ρ
Values:          ACS         | 9 or 12    | 8              | 2                       | 3                       | 0.01

Figure 9: Comparison of time costs for QL, ACS, and QLACS (time cost versus number of runs).

Figure 10: Comparison of route costs for QL and QLACS (route cost versus runs; path cost for R1 (QL), R2 (QL), and R1 and R2 (QLACS)).

with length 10, as shown in Table 16; that is, they have states from F to C and from C to F. The length is the addition of the weights along the trail edges. QLACS then uses the optimized paths to make decisions on inspections. The result in Table 17 shows that no state is inspected twice or visited more than required. The QLACS model concentrates on making the best inspection decisions for the MRS. The first run in Table 17 shows that nine agents (ants) were used for the route finding; the optimized route was achieved after nine iterations in 0.70004 sec, and the states were all inspected effectively without redundancies.

Figure 11: Number of ants/iterations required for QLACS to find the optimal path (versus runs).

Figure 12: Time and iteration required for QLACS to achieve optimal behaviour (versus number of runs).

4.5.1. Comparative Study of the Proposed Model (QLACS) with QL Only and ACS Only. Based on the results tabulated in Table 18 and Figure 9, the average time costs of achieving the MRS behaviours for QL, ACS, and QLACS were compared. Two robots will use an average of 3.08590 sec to achieve thorough inspection behaviour using QL, an average of 4.04309 sec to achieve optimal navigation behaviour using ACS, and an average of 0.92868 sec to achieve both navigation and cooperative behaviour using QLACS. The result shows


Table 16: Navigation behaviour computations.

Number of runs:            1        2        3        4        5        6        7        8        9
Number of ants:            5        7        8        9        10       11       12       15       28
Iterations:                12       26       14       16       23       14       16       23       22
Time (sec):                3.70021  3.70021  3.90022  6.00035  3.20019  3.30019  3.5002   4.30025  6.50037
Best trail distance (R1):  12       12       10       10       10       10       10       10       10
Best trail distance (R2):  10       12       12       10       10       12       10       10       10

Table 17: New hybrid model QLACS computations.

Number of runs:         1        2        3        4        5        6        7        8        9
Iterations:             9        9        9        9        9        9        9        9        9
Number of ants:         9        9        9        10       10       10       12       12       12
Time (sec):             0.70004  0.80005  0.90005  1.10006  1.10006  1.20007  0.80005  0.90005  0.90006
Memory usage (bytes):   10288    10280    10248    10288    10288    10288    10164    10712    10164
Inspected states (R1):  F, G, E, D | F, G, E, D | F, G, E, D | F, G, E, A, B | F, G, E, A, B | F, G, E, A, B | F, G, D, A | F, G, D, A | F, G, D, A
Inspected states (R2):  C, B, A | C, B, A | C, B, A | C, D | C, D | C, D | C, B, E | C, B, E | C, B, E

Table 18: Time cost comparison for QL, ACS, and QLACS.

Runs      QL time cost (sec)   ACS time cost (sec)   QLACS time cost (sec)
1         3.30019              3.70021               0.70004
2         3.10019              3.70021               0.8004
3         3.10018              3.90022               0.90005
4         3.10018              3.20019               0.90005
5         2.90023              3.5002                1.10006
6         3.00017              4.30025               1.20007
7         3.10018              6.00035               0.90006
Average   3.08590              4.04309               0.92868

that our proposed integrated algorithm performs better, with a reduced time cost. On the same note, the route costs for QL and QLACS were also compared. The results in Table 19 and Figure 10 show that the proposed model, QLACS, gave a much lower route cost than QL.

The number of ants and iterations used to achieve this is displayed in Figure 11. The more ants there are, the more iterations are needed. The best routes created in the five test runs shown in Figure 11 are run numbers 1 and 3; they used fewer ants and iterations to achieve the optimal routes. The optimal result for the proposed model is achieved under nine iterations for every run. The number of iterations remains constant for any number of agents and robots. The blue line with star markers in Figure 12 is the iteration value for the 9 runs. The red line shows the different amounts of time required for each run. The time is also almost stable for QLACS, though it fluctuated a little in the middle. This signifies that the proposed model is more robust and effective in finding the optimal route and coordinating the MRS. The reduced route cost and shorter computation time achieved with QLACS satisfied the criteria for cooperative behavioural purposes.

4.6. Experiment 5: Scalability of QLACS Using Two, Three, and Four Robots. In this section we present some results obtained by experimenting with the cooperative behavioural action of two, three, and four robots using the QLACS model. The performance of the two, three, and four robots was evaluated by running the simulation three times with the same number of agents. The performance of the proposed QLACS model shows good communication between the two, three, and four robots, in terms of the states inspected in the last four columns of Table 20. An almost stable time was used in achieving the inspection for all the robots. The details of the simulation results are laid out in Table 20.

A notable observation emanating from Table 20 is that there is a need for a larger mine area of inspection, because robot R1 in rows 4 to 6 could inspect only one state, while robots R1 and R2 in rows 7 to 9 could inspect only one state each. This implies that the size of the field of inspection should be proportional to the number of robots to be deployed.

5. Concluding Remarks

This paper has shown how an MRS behaves cooperatively in underground terrains. The QL algorithm was investigated for its potential to support good communication, while the ACS algorithm was explored for its good navigation properties. The strengths of the two algorithms were harnessed and optimized to form a hybrid model, QLACS, which addresses the behavioural situation in MRS effectively.

The analytical solution of the new model was explained as shown in Algorithms 3 and 4 and Tables 7 and 10. The hybrid architecture for the model was also described in Figure 7. The experimental setup describing different navigation locations for two robots was shown in Figure 8.

The new hybrid model, QLACS, for cooperative behaviour of MRS in an underground terrain was proven able to find the optimal route and handle cooperation between two robots


Table 19: Route cost comparison for QL and QLACS.

Runs      QL path cost (R1)   QL path cost (R2)   QLACS path cost (R1)   QLACS path cost (R2)
1         20                  20                  10                     10
2         32                  19                  10                     10
3         28                  14                  10                     10
4         27                  27                  10                     10
5         39                  30                  10                     10
Average   29.5                22                  10                     10

Table 20: Summary of scalability performance of QLACS.

Row   Number of robots   Time (sec)   States inspected by R1   R2   R3   R4
1     2                  1.00006      4                        3    -    -
2     2                  1.10007      4                        3    -    -
3     2                  0.80005      4                        3    -    -
4     3                  1.60009      1                        3    3    -
5     3                  1.7001       1                        3    3    -
6     3                  1.20006      1                        3    3    -
7     4                  1.10006      1                        1    3    2
8     4                  1.40006      1                        1    3    2
9     4                  1.0006       1                        1    3    2

effectively. The cooperative behavioural problem was narrowed down to two situations: navigational and cooperative behaviours. ACS was used to establish the shortest optimal route for the two robots, while QL was used to achieve effective cooperation decisions. The results achieved after integrating the two algorithms showed a reduction in route costs and computation times, as shown in Figures 9, 10, 11, and 12. The comparative analysis between QL, ACS, and QLACS proved that the last-named is more robust and effective in achieving cooperative behaviour for MRS in the proposed domain.

The results of this work will be used for future simulation of MRS applications in hazardous environments. Opportunities to expand this research area have surfaced, both in real-life implementations and in enlarging the model. The model used in this work can also be improved upon by increasing the state space.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the financial support of the Organisation for Women in Science for the Developing World (OWSDW) and L'Oreal-UNESCO for Women in Science, as well as the resources made available by the University of Cape Town (UCT) and the University of South Africa (UNISA), South Africa.

References

[1] J. Leitner, "Multi-robot cooperation in space: a survey," in Proceedings of the Advanced Technologies for Enhanced Quality of Life (AT-EQUAL '09), vol. 37, pp. 144–151, IEEE Computer Society, Iasi, Romania, July 2009.

[2] G. Dudek, M. R. M. Jenkin, E. Milios, and D. Wilkes, "A taxonomy for multi-agent robotics," Autonomous Robots, vol. 3, no. 4, pp. 375–397, 1996.

[3] A. Chella, G. Lo Re, I. Macaluso, M. Ortolani, and D. Peri, "Chapter 1: A networking framework for multi-robot coordination," in Recent Advances in Multi Robot Systems, A. Lazinica, Ed., pp. 1–14, 2008.

[4] C. O. Yinka-Banjo, I. O. Osunmakinde, and A. Bagula, "Autonomous multi-robot behaviours for safety inspection under the constraints of underground mine terrains," Ubiquitous Computing and Communication Journal, vol. 7, no. 5, pp. 1316–1328, 2012.

[5] S. Yarkan, S. Guzelgoz, H. Arslan, and R. Murphy, "Underground mine communications: a survey," IEEE Communications Surveys & Tutorials, vol. 11, no. 3, pp. 125–142, 2009.

[6] M. Peasgood, J. McPhee, and C. Clark, Complete and Scalable Multi-Robot Planning in Tunnel Environments, DigitalCommons, 2006.

[7] F. E. Schneider, D. Wildermuth, and M. Moors, "Methods and experiments for hazardous area activities using multi-robot system," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3559–3564, May 2004.

[8] R. Zlot and A. Stentz, "Market-based multirobot coordination for complex tasks," International Journal of Robotics Research, vol. 25, no. 1, pp. 73–101, 2006.

[9] L. E. Parker, "Current research in multi-robot systems," in Proceedings of the 7th International Symposium on Artificial Life and Robotics (ISAROB '03), vol. 7, pp. 1–15, 2003.

[10] C. Thorp and H. Durrant-Whyte, "Field robots," in Proceedings of the 10th International Symposium on Robotics Research, vol. 6, pp. 329–3240, 2003.

[11] S. R. Teleka, J. J. Green, S. Brink, J. Sheer, and K. Hlophe, "The automation of the 'Making Safe' process in South African hard-rock underground mines," International Journal of Engineering and Advanced Technology (IJEAT), vol. 1, pp. 1–7, April 2012.

[12] L. Cragg and H. Hu, "Application of reinforcement learning to a mobile robot in reaching recharging station operation," in Intelligent Production Machines and Systems, pp. 357–363, 2005.

[13] R. A. C. Bianchi, C. H. C. Ribeiro, and A. H. R. Costa, "On the relation between ant colony optimization and heuristically accelerated reinforcement learning," in Proceedings of the 1st International Workshop on Hybrid Control of Autonomous Systems, pp. 49–55, 2009.

[14] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, Oxford, UK, 1999.

[15] Y. Liu and K. M. Passino, Swarm Intelligence: Literature Overview, Department of Electrical Engineering, University of Ohio, 2000.

[16] L. Li, A. Martinoli, and Y. S. Abu-Mostafa, "Emergent specialization in swarm systems," in Intelligent Data Engineering and Automated Learning - IDEAL 2002, vol. 2412 of Lecture Notes in Computer Science, pp. 261–266, 2002.

[17] B. Englot and F. Hover, "Multi-goal feasible path planning using ant colony optimization," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 2255–2260, May 2011.

[18] Z. N. Naqiv, H. Matheru, and K. Chadha, "Review of ant colony optimization on vehicle routing problems and introduction to estimated-based ACO," in International Conference on Environment Science and Engineering (IPCBEE '11), vol. 8, pp. 161–166, 2011.

[19] M. Dorigo and T. Stutzle, "The ant colony optimization metaheuristic: algorithms, applications, and advances," Tech. Rep. IRIDIA 2000-32, 2000.

[20] R. Claes and T. Holvoet, "Cooperative ant colony optimization in traffic route calculations," in Advances on Practical Applications of Agents and Multi-Agent Systems, vol. 155 of Advances in Intelligent and Soft Computing, pp. 23–34, 2012.


drive and differential robots drive, which can rotate in place, are carried out at different positions of the graph. Different methods have been used to tackle the coordination of MRS in different domains [7]. Complete task coordination by multirobots was handled in [3, 8]. An extension of a market-based approach was used; this was achieved by generalizing task descriptions into task trees, thereby allowing tasks to be traded in a market setting at a variable level of abstraction. Figure 1 shows an overview of the inspection environment.

We consider a scenario in which an MRS cooperates to achieve a common goal of safety inspection in a partially observable environment such as an underground terrain. Each robot needs to be guided using the proposed cooperative behavioural model. Finding a plan to achieve this cooperative model often involves solving an NP-hard problem, because it involves multiple interacting robots. The interaction, which arises from minimizing the time involved in achieving the underground inspection, requires reasoning among the robots. In this case, we use the QL technique to achieve the reasoning aspect of this work and combine it with optimal route finding using ACS. The two major questions answered in this problem are the following: (1) which state/room should I inspect now, without repetition and without a collision with another robot? (2) How do I get there using the closest link? All robots have partial knowledge of the environment, and they are all equipped with the necessary sensors required for the inspection. The major contributions of this paper are as follows:

(i) development of a hybrid QLACS swarm intelligent model, based on Q-Learning and the ant colony system, for addressing cooperative behaviours in MRS achieving underground terrain safety inspections;

(ii) detailed experimental evaluations of the proposed QLACS model, conducted on a simulated, publicly available underground tunnel; the proposed model is also benchmarked against related methods;

(iii) systematic description of worked scenarios on an optimal route finder and MRS cooperative inspection using QLACS, for ease of implementation by system engineers, robotics researchers, and practitioners.

We do not know of any study that analytically describes how a swarm intelligent model can easily be implemented and applied to a multirobot system in a heterogeneous environment. In the next section, we discuss related work in the areas of multirobot systems for underground terrains' safety, reinforcement learning (RL), and ant colony optimization (ACO). We then detail our proposed cooperative behaviour framework and approach. The experimental results and evaluations of the safety inspections follow. Finally, we conclude by summarizing and discussing future extensions.

2. Theoretical Background

In this section we review some of the work related to MRS for underground terrains' safety, which is the foundational domain for our research. A look at RL algorithms and their

Figure 1: Overview of MRS inspection in an underground mine (gateway, router, robots A and B, roof inspection for cracks, crete, and loose rocks, obstacles such as poles and stones, and gas levels for NH3, CO, and SO2).

paradigms follows in Section 21 ACO and its paradigmapplied in MRS conclude Section 2

2.1. Related Multirobot Systems for Underground Terrains' Safety. Effective, exhaustive, quick, and safe navigation of an MRS is subject to its ability to perceive, interpret, and interact (cooperate) within its environment. MRS can be used to achieve different tasks autonomously in structured and unstructured environments. In these environments, the automation of operations is effected in different areas of MRS research: biologically inspired robot teams, communication architectures and task planning, localization and mapping, object transportation and manipulation, learning, and so forth [9]. However, an environment such as the underground terrain has been a grey application domain in MRS research.

As indicated in Section 1, Peasgood et al. [6] implemented a multiphase algorithm for multirobot planning in tunnel environments. The algorithm as presented assumed a centralized planning architecture. They further compared the multirobot planning with a sequential planner. Their future work was to consider a decentralized planner architecture and possibly explore hybridizing the two planning algorithms.

Thorp and Durrant-Whyte discussed a starter on field robots [10]. From their discussion, field robotics involves the automation of platforms such as air, sea, and land in harsh, unstructured environments such as underwater exploration, mining, agriculture, and highways. Field robots are made up of three parts: navigation and sensing, planning and control, and safety. Their work also discussed the challenges and progress of field robots. One of the major challenges of field robots in harsh environments such as an underground mine is the problem of position determination (localization). This is so because the global positioning system (GPS) can only help in an environment where sky views are guaranteed. However, progress has been made in automating some of the environments that are cooperatively constrained.

The proposed model presented in [4] promised to handle the safety part of field robots presented by Thorp and Durrant-Whyte [10] in underground mines. Their model architecture had three layers: the first layer handles the cooperative behaviour of the model, the second layer deals with the scalability degree of the model, and the last layer


Steps:
initialize Q(s, a) arbitrarily
repeat (for each episode):
    initialize s
    repeat (for each step of the episode):
        choose a from s using the policy derived from Q
        take action a, observe r, s′
        Q(s, a) ← Q(s, a) + α[r + γ max_a′ Q(s′, a′) − Q(s, a)]
        s ← s′
    until s is terminal

Algorithm 1: Q-learning algorithm.

handles the applicability of the model. This paper builds on what has already been proposed in [4].

An investigation into automating the underground mine environment after blasting, called "making safe", was carried out in [11] to ensure the safety of the environment after blasting. Blasting in an underground mine is the controlled use of explosives to excavate, break down, or remove rock. The need to investigate the stability of the environment after blasting, before any mining operation takes place, is of the highest priority, hence the reason for automation. The automation was centred on a persistent area of concern in South African underground mine operations called hanging walls, which is caused by rock bursts and falls of ground. There are also other persistent concerns, such as the levels of toxic gases, which pose great disaster threats to the lives of miners, for instance heat sickness, explosions, and pneumoconiosis (occupational safety and health: fall-of-ground management in South Africa, SAMRASS code book for mines). Some of these disasters might result in fatalities and/or disabilities. Again, when an accident happens during mining operations, rescuers find it difficult to respond immediately. Looking at the aforementioned concerns, there is a need to create models for the safety inspection of underground mine operations. For instance, monitoring the underground mine environment to detect hazardous gases and/or smoke should be one of the important safety measures. Continuous monitoring of workers and equipment is another crucial safety measure [5]. Picking up the sound from roof cracking to monitor when a roof is about to fall is also a safety item.

2.2. Reinforcement Learning. For a robot to operate in a harsh, unstructured environment, considering every possible event in defining its behaviour is intricate [12]. It is, however, essential to develop robots that can conform to changes in their environment. RL is one of the artificial intelligence (AI) techniques that can achieve learning by experience; this enhances robots' interaction with the environment. We investigate the use of RL to assist the behaviours of an MRS in the safety inspection of underground mines. RL is a sequence of actions and state transitions with some associated rewards. What is being learned in RL is an optimal policy (what is the right thing to do in any of these states s) [13].

At any time step, each robot is in a specific state in relation to the environment and can take one of the following actions: inspect, ignore, or shutdown. Each robot receives feedback after performing an action, which explains the impact of the action in relation to achieving the goal. The effect of the action can be either a good or a bad reward, measured in terms of values. Therefore, the value of taking an action a in any state s of the underground terrain is measured using the action-value function, the Q-value Q^π(s, a). When a robot starts from state s, takes action a, and uses a policy π, an expected return, defined as the sum of the discounted rewards, is achieved.

In this research, Q-learning, a method of RL, is explored. The purpose of RL methods is to learn the Q^π(s, a) values so as to achieve optimal actions in the states. QL is an online RL method that requires no model for its application and stores the reinforcement value outcomes in a look-up table. The QL architecture used in this work consists of learning threads, which amount to the number of robots involved in the inspection behavioural task. Each robot in the learning thread carries out Q-learning in the environment. Algorithm 1 explains the QL algorithm used in the behavioural model.

α is the learning rate, set between 0 and 1. At 0, Q-values are never updated, hence nothing is learned; learning can occur quickly at 1. γ is the discount rate, also set between 0 and 1; this models the fact that future rewards are worth less than immediate rewards. max_a′ is the maximum reward attainable in the state following the current state, that is, the reward for taking the optimal action thereafter.
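As a concrete illustration of the update in Algorithm 1, the following minimal sketch runs tabular Q-learning, Q(s, a) ← Q(s, a) + α[r + γ max_a′ Q(s′, a′) − Q(s, a)], on a toy two-state corridor. The environment, reward values, and parameter settings are illustrative assumptions of ours and are not the configuration used in this paper.

import random

# Toy illustration of Algorithm 1: tabular Q-learning on a two-state chain
# (state 0 -> state 1 -> terminal). Rewards and parameters are illustrative.
ALPHA, GAMMA, EPISODES = 0.5, 0.8, 200
N_STATES, ACTIONS = 3, (0, 1)              # state 2 is terminal
Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

def step(state, action):
    """Action 1 moves forward (reward 1 on reaching the terminal), action 0 stays."""
    if action == 1:
        next_state = state + 1
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
    else:
        next_state, reward = state, 0.0
    return next_state, reward

for _ in range(EPISODES):
    s = 0
    while s != N_STATES - 1:                   # repeat until s is terminal
        a = random.choice(ACTIONS)             # exploratory policy derived from Q
        s_next, r = step(s, a)
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s_next]) - Q[s][a])
        s = s_next

print(Q)   # action 1 ends up with the higher value in states 0 and 1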

QL is a competitive and search-based algorithm inspired by computational theory. It is not necessarily a multiagent algorithm but can be adapted to a multiagent or multigoal scenario. The success of this algorithm relies on the value and policy iterations, which can be adjusted by some unfairness (heuristics) to fit the current problem scenario. The most competitive action is selected by its value, and the action leads to another state or condition. Both value and policy are updated after each decision. Harnessing QL for an MRS scenario increases the cost exponentially, and the overall performance drops in the same direction. As the robots increase, costs such as completion time, memory usage, and awareness factor


Table 1: Variants of ACO.

SN   Year   Algorithm
1    1991   Ant system (AS)
2    1992   Elitist AS
3    1995   Ant-Q
4    1996   Ant colony system
5    1996   Max-Min AS (MMAS)
6    1997   Rank-based AS
7    1999   ANTS
8    2000   BWAS
9    2001   Hypercube AS

(other robots in the environment), and search time increase. However, following our heuristic model of QL, which was mainly determined by the policy we set to achieve our goal, our QL performs well above the traditional QL.

2.3. Ant Colony Optimization. ACO is a type of swarm intelligence (SI). Bonabeau et al. [14] defined SI as any attempt to design algorithms or distributed problem-solving devices inspired by the collective behaviour of social insect colonies and other animal societies. This implies that anytime something is inspired by swarms, it is called swarm intelligence. Researchers have been thrilled by swarms because of some of their fascinating features: for instance, the coordinated manner in which insect colonies work notwithstanding having no single member of the swarm in control, the coordinated ways in which termites build giant structures, how flocks move as one body, and the harmonized way in which ants quickly and efficiently search for food can only be attributed to an emergent phenomenon [15, 16].

Ants, an example of a social insect colony, achieve their self-organizing, robust, and flexible nature not by central control but by stigmergy. Stigmergy, also known as a pheromone trail, describes the indirect communication that exists within the ant colony. The indirect communication, which is triggered by pheromone trails, helps in the recruitment of more ants to the system. Ants also use pheromones to find the shortest paths to food sources. Pheromone evaporation prevents stagnation, which also helps to avoid premature convergence on a less than optimal path [17].

ACO is necessarily a multiagent-based algorithm. The first implementation of this optimization algorithm was on a classical search-based (combinatorial) problem, the travelling salesman problem, giving the shortest path to a specified destination. After this feat, many researchers have used it or its variants to model different problems. In this work, a variant of ACO is used to find the optimal path for the MRS. Table 1 describes variants of ACO and their meaning [18]; see also [19] and all references therein.

In AS, the pheromones are updated by all the ants that complete a tour. ACS is the modified version of AS which introduces the pseudorandom proportional rule. In elitist AS, ants that belong to the edges of the global best tour get an additional amount of pheromone during the pheromone update. MMAS introduces upper and lower bounds to the values of the pheromone trails. All the solutions are ranked to conform to the ants' tour lengths in rank-based AS [20].

Our interest is in implementing ACS to find the best possible round-trip inspection path in our model environment. This best possible inspection path will be supplied as input, or made available, to the robots using QL for inspection decisions. In most route finding scenarios, the nodes are linearly joined and are not largely distributed. This is a scenario where ants can forage along at least two paths (a multigoal path environment; in our case, there is a path exposed to four different goals, see Section 3).

3. Proposed Cooperative MRS Behaviour Framework

The following are some of the factors we considered in the course of building the model. Each robot has to learn to adjust to the unknown underground mine environment; in this case, each robot requires intelligence, and a suitable machine learning algorithm is adopted. No robot has global information about the environment because of its limited sensing capabilities. Each robot has limited communication capabilities; therefore, each robot has to keep track of the information of others to remain in a team. Figure 2 describes the proposed framework as an approach for building the distributed, coordinated inspection model for MRS.

The framework indicates three layers of the model: the bottom layer, the middle layer, and the topmost layer. The learning capability of the MRS is achieved in the bottom layer with the reinforcement learning algorithm and the swarm intelligence technique. This intelligence enables robot A to take an action knowing the action of robot B, and vice versa. At the middle layer, scalability, in terms of the number of robots the system can accommodate, is achieved. This is expedient because a team of robots tends to achieve tasks more quickly and effectively than a single robot. This scalability is handled with some memory management techniques, as indicated in Figure 2. The real-life implementation is achieved by using the information acquired from the topmost layer. Figure 3 is the breakdown of the framework in Figure 2.

There is a base station or server that serves as a backup for the information captured and analysed from individual robots. The model proposed in this research deals with the way in which robots need to find their way within communication range and uses a broadcast approach for effective communication of navigation status. A team of robots cooperatively inspecting an area in the underground mine will need to know where they are and where to go next, so it is obviously a continuing problem; our contribution in this case is that before robot R1 takes an action, it broadcasts its location and inspection status to the other robots R2, R3, and so forth, and vice versa. An unreachable robot receives packets of information, based on the destination address, through rerouting from the nearer robots. The reliability of this broadcast method is the ability to determine the extent of the task already executed by looking at the memory of any leading robot in the team.


Figure 2: Framework of the proposed QLACS model for cooperative behaviours (application/topmost layer: MRS mine inspection, miners for operation; scalability/middle layer: scalable system, global database, memory usage; cooperative behavioural/bottom layer: intelligence (QLACS = ACS + QL), gas meter, local database, roof sound sensor, graph; MRS distributed knowledge base access, simple decision making, interrobot communication).

Table 2: States and possible actions of the environment.

State                              Possible actions
1  A (lower left part (LLP))       Inspect, ignore
2  B (lower middle part (LMP))     Inspect, ignore
3  C (lower right part (LRP))      Inspect, ignore
4  D (middle left part (MLP))      Inspect, ignore
5  E (central part (MCP))          Inspect, ignore
6  F (upper left part (ULP))       Inspect, ignore
7  G (upper right part (URP))      Inspect, ignore
8  H (outside mine part (OMP))     Shutdown

3.1. Problem Formulations. Suppose we have seven rooms/states connected by doors/links, representing underground mine regions, as shown in Figure 4 and labelled as shown in Table 2. We label each room A through G. The outside of the mine can be thought of as one big room (H). Notice that doors F and C lead outside the mine, to H. We put two robots in rooms F and C as their starting states, respectively. The robots inspect one state at a time, considering the obstacles encountered in the process. The robots can change direction freely, having links/doors to other rooms/states. In each of the states in Figure 4, two actions are possible: inspection of roof cracks (RC) and the level of toxic gases (TG), or ignoring the state because it has been inspected earlier. According to our proposed model, a robot can inspect at least half of the given underground mine region to attain its maximum performance, which in turn attracts a good reward. When the current state of a robot is the end point of the inspection task, the robots exit the mine using the exit points C and F, respectively. The possible transition states of the robots are displayed using the state diagram in Figure 5. The state diagram and the transition matrix are formed using the QL algorithm.

The global map is assumed to be known by the robots, but there is no prior knowledge of the local map. Robots only know what they have sensed themselves and what their teammates communicate to them. Not only does our improved QL depend on the map of the environment, but each robot also learns through experience about local changes. They explore and inspect from state to state until they get to their goal states. Our proposed model, QLACS, achieves an offline mode for the route finding algorithm (ACS); this means that the global view of the map would have been provided before the learning robots start.

The model is explained using simulations; the results are presented in Section 4. Some of the results also have graphical interpretations. Analysis by a state transition diagram is


Table 3: Initial reward matrix (robot's state versus robot's action; "-" indicates no transition).

      A        B        C        D        E        F        G        H
A     -        50, 100  -        50, 100  -        -        -        -
B     50, 100  -        50, 100  -        50, 100  -        -        -
C     -        50, 100  -        -        -        -        50, 100  150
D     50, 100  -        -        -        50, 100  50, 100  -        -
E     -        50, 100  -        50, 100  -        50, 100  50, 100  -
F     -        -        -        50, 100  50, 100  -        50, 100  150
G     -        -        50, 100  -        50, 100  50, 100  -        -
H     -        -        50, 100  -        -        50, 100  -        -
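For illustration, the information in Tables 2 and 3 can be held in two small lookup structures, one for the per-state actions and one for the admissible transitions with their reward options. The encoding below is a hypothetical sketch of ours; the labels and values are copied from the two tables.

# Hypothetical encoding of Table 2 (states and possible actions) and Table 3
# (admissible transitions with their reward options: 50 = inspect, 100 = ignore,
# 150 = shutdown through the goal state H).

ACTIONS = {s: ("inspect", "ignore") for s in "ABCDEFG"}
ACTIONS["H"] = ("shutdown",)

TRANSITIONS = {                      # state -> {reachable state: reward options}
    "A": {"B": (50, 100), "D": (50, 100)},
    "B": {"A": (50, 100), "C": (50, 100), "E": (50, 100)},
    "C": {"B": (50, 100), "G": (50, 100), "H": (150,)},
    "D": {"A": (50, 100), "E": (50, 100), "F": (50, 100)},
    "E": {"B": (50, 100), "D": (50, 100), "F": (50, 100), "G": (50, 100)},
    "F": {"D": (50, 100), "E": (50, 100), "G": (50, 100), "H": (150,)},
    "G": {"C": (50, 100), "E": (50, 100), "F": (50, 100)},
    "H": {"C": (50, 100), "F": (50, 100)},
}

print(ACTIONS["C"], TRANSITIONS["C"])   # rooms reachable from C and their rewards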

Figure 3: Breakdown of the framework. (a) Contributions of the framework, layer by layer: bottom layer, a new hybrid model QLACS, an efficient cooperative and navigational model; middle layer, an efficient scalable system; topmost layer, acquired results stored for future use. (b) Processes of the multirobot behavioural system: acquire information from the bottom layer; build a hybrid model using the QL and ACS algorithms; test two robots for shortest round-trip paths, good communication, time of performance, number of iterations used, and thorough inspection; if successful, record the results in the database and use them in the application layer, otherwise add one more robot to the system and monitor memory utilization through the interface platform; then exit the system.

presented in Figure 5. The possible state actions of the robots are presented in Table 3. The states of the robots are reduced as follows: searching or inspecting for roof cracks and toxic gas levels in each state or room, and recording the outcome of the inspection in the robots' memories. The processes involved in achieving good communication while the robots inspect the states are broadcasting inspected states to the other robots and ignoring the inspected/broadcasted states, avoiding collision with obstacles and other robots, and finally moving to the next available state that has not been inspected.

The Q-learning algorithm is used to determine what action is to be selected in any particular state. Let the actions a = {inspect, ignore, shutdown}, the state space s = the dimensions in the topological map, the sensor readings s_r, and the hazardous conditions H_c = {roof crack (RC), toxic gas level (TG)}. Each robot is configured with an aerial scanner and a chemically sensitive sensor that provide readings s_r to determine whether there is a hazardous condition H_c in the environment. The selection of an action by a robot through the Q-learning algorithm is based on the information from the broadcast. The whole framework uses a decentralized Q-learning scheme such that each robot has its own thread with its own Q-learning and table. All Q-values in the tables of the robots in the team are initialized to zero, that is, states corresponding to such positions have not been inspected. When a robot enters a new state, it broadcasts its information for the current state to all other robots in the environment. The broadcast indicates whether the current state has been


Figure 4: Model of the environment with two entrances and exits (2EE), showing states A to G, the outside of the mine H, the link between A and B, and the starting positions of robots A and B.

Figure 5: Events and transition diagram of the modelled environment with H as the goal state (transitions between states are labelled with the rewards (50, 100), and transitions into the goal state with 150).

inspected or not. The broadcast is a search of corresponding positions in the memories of all visible robots. If the resultant search returns a zero, the broadcast indicates that the state has not been inspected; if it returns a value greater than zero, it indicates that the state is not available for inspection. All robots must receive a broadcast before they act in any particular state. When a robot gets to a state, it receives a broadcast and passes its value to the QL algorithm to make a decision; the robot then carries out the decision of the QL algorithm. The policy of the Q-learning algorithm makes a robot carry out an inspection if the resultant search of the broadcast returns zero, and ignore the state if the resultant search is greater than zero. A robot only shuts down if all states have been visited and inspected. As the broadcast information is passed to the Q-learning algorithm, the policy is iterated towards an optimal value and condition.

The broadcast avoids the cost of multiple inspections, and the QL ensures that robots take appropriate actions. The only state in which an inspection is carried out would have sensor readings s_r indicating H_c. For taking an action a in state s, the robot gets a reward R, which is used in (10) to compute the Q-value for the current state, and it can send a broadcast to other robots. Figure 5 and Table 3 show the restrictions and possible transitions among node points, indicating the possible rewards for any particular action a in any state s. Every other transition besides the goal state could result in a reward of 50 or 100, and at completion the reward is the maximum, 150. We consider it wasteful to make the robots scan first before sharing intelligence, because such action would slow them down and make them expend more energy. Robots are to provide the inspection results, showing the actions taken in different states and the nature of the conditions detected in any particular state. The introduction of the broadcast approach to determine the team's exploration reduces the execution time and energy cost of the whole team and makes the collaboration effective. So, in a multirobot scenario, the task can be effectively executed if the robots are made to share intelligence as they progress; robots do not have to waste time and energy in doing the same task already carried out by other robots in the team.

GIVEN. As a mathematical justification of the above, suppose a safety preinspection of toxic gases or rock fall, or some combination of the two, is being carried out on a piece of complex underground terrain as in Figure 1, say L km²; there is a limited number of MRS with different capacities R and precious inspection time T minutes. Every region/state x_1 in the terrain requires a capacity R_1 of MRS and limited time T_1, while every state x_n requires a capacity R_n of MRS and inspection time T_n. Let P_1 be the positive reward of QL for correct behaviour on state x_1 and let P_n be the reward on state x_n. This research aims to maximise positive rewards by choosing optimal values for the states x_1, ..., x_n as follows:

Maximise
P_1 x_1 + P_2 x_2 + ... + P_n x_n   (objective function)

subject to the constraints
x_1 + x_2 + ... + x_n ≤ L   (limited total states),
R_1 x_1 + R_2 x_2 + ... + R_n x_n ≤ R   (limited MRS capacity),
T_1 x_1 + T_2 x_2 + ... + T_n x_n ≤ T   (limited inspection time),

and the non-negativity constraints
x_1 ≥ 0, x_2 ≥ 0, ..., x_n ≥ 0   (MRS cannot inspect negative states).   (1)
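Formulation (1) is a small linear programme and can be checked numerically. The sketch below uses scipy.optimize.linprog on hypothetical values of P_i, R_i, T_i, L, R, and T (none of these numbers come from the paper); since linprog minimises, the objective is negated to maximise the total reward.

from scipy.optimize import linprog

# Hypothetical instance of formulation (1): three regions/states x1..x3.
P = [50, 50, 50]        # positive QL rewards per state (illustrative)
R_cap = [1, 2, 1]       # robot capacity needed per state (illustrative)
T_cap = [3, 4, 2]       # inspection time needed per state (illustrative)
L, R, T = 3, 3, 8       # limited total states, MRS capacity, inspection time

c = [-p for p in P]                       # negate: linprog minimises
A_ub = [[1, 1, 1], R_cap, T_cap]          # x1+x2+x3 <= L, R.x <= R, T.x <= T
b_ub = [L, R, T]
bounds = [(0, None)] * 3                  # non-negativity constraints

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print(result.x, -result.fun)              # chosen x values and the total reward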

In terms of solving this optimization problem, we use the proposed QLACS model to compare the time and the number of states inspected. The number of robots used is also compared. The graphed results in Section 4 give more insight into the solution of the problem.

3.2. Basic Navigation and Cooperative Behaviours Using QLACS. QLACS has two components. The first component is formed by an improved ACS, and the second component


is formed by an improved QL. The improvement occurs because some heuristics were added to the ordinary QL and ACS to achieve the hybrid QLACS. However, the second component of QLACS, the improved QL, was initially used alone to solve the proposed problem. After much analysis, we realized that the system needed to be optimized for effective cooperation and communication.

Using the second component of QLACS to solve the basic navigation and cooperative behaviour, the possible actions were set for each robot as inspect, ignore, and shutdown (after reaching the goal state H). A reward system that reflects the possible actions of the robots was also chosen: a robot gets 150 points only when the goal is achieved (shutdown), 100 points for ignoring an already inspected area (ignore), and 50 points for inspecting an uninspected area (inspect). Figure 5 shows the transition events of the model, and Table 3 displays the possible state actions for each robot. The way the QLACS second component works here is based on both navigation and communication behaviours.

Achieving navigational behaviour with the second component of QLACS has some cost associated with it. In our scenario, because we want the robots to share intelligence by broadcasting results (robots search through other robots' memory), our problem is not solely navigational but also cooperative. Our behavioural actions are inspect, ignore, and shutdown. We noted that these actions of interest are not navigation oriented; there is no way we could use them in making decisions on transitions. The functions for decision can, however, be integrated to assist each other. Therefore, our behavioural model is an integration of two behaviours: (1) navigational behaviour and (2) cooperating behaviour through decision making. The integration works with a route finding method called RandomStateSelector, which we introduced in the second component of QLACS to help determine where a robot goes from the starting point to the exit point. Two parts of the RandomStateSelector method are introduced in this work: the first, RandomStateSelector C H, is used to transit from state C to H, and the second, RandomStateSelector F H, transits from state F to H. This method works, but not effectively, because some of the states are repeated several times as a result of the random selection method. However, the decision part of this second component of QLACS, which is handled by a method called CheckInspection, worked efficiently. CheckInspection is responsible for sharing the broadcast among the agents; the broadcast looks at all the stored Q-values on all the robots and returns a signal for the action that needs to be taken. Therefore, we concluded that the heuristically accelerated component of QLACS has proven to be effective by showing evidence of effective communication in making inspection decisions using our model. It did not guarantee the shortest possible inspection time because of repeated state decisions. In this light, we harnessed only the communication strength of the second component of QLACS for communication and cooperation. Figure 5 and Table 3 form the basis for the QLACS second component.

To take care of the random state selection problem encountered in implementing the algorithm used for the second part of QLACS, we introduced an optimized route finding algorithm.

Table 4: Combined adjacency and weight matrix.

      F  G  E  D  A  B  C  H
  F   0  1  1  1  0  0  0  1
  G   1  0  2  0  0  0  1  0
  E   1  2  0  2  0  2  0  0
  D   1  0  2  0  2  0  0  0
  A   0  0  0  2  0  2  0  0
  B   0  0  2  0  2  0  1  0
  C   0  1  0  0  0  1  0  1
  H   1  0  0  0  0  0  1  0

Figure 6: Weighted map/graph of the model environment (states A to G connected by edges of weight 1 or 2).

This route finder algorithm, which forms the first component of QLACS, is a swarm intelligence technique. Figure 6 and Table 4 form the basis of the exploration space of the agents (ants), which achieved the optimum route finding. The weighted graph in Figure 6 is based on the number of obstacles the ants will encounter from the points of entry F and C. The combined table in Table 4 contains the weights of the obstacles and evidence of an edge between any two vertices (states); it shows whether there is a connection between any two vertices in the graph. Generally, a "1" or "2" depicts the existence of an edge, while a "0" represents the absence of an edge, that is, no transition between such vertices. The constructed graph is an undirected multigraph, providing evidence of some agents coming from F or C of the mine (logical space). It is bidirectional because agents can move in either direction along an edge (multigraph); this means that the same weights on the edges apply to both directions. The graph does not show H; we assume that once the agents reach F or C they exit, if all inspections have been done.
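A compact way to hold this combined adjacency and weight information is a dictionary of dictionaries keyed by state labels. The snippet below is a sketch of such a structure using the weights from Table 4; the variable and function names are ours, not the authors'.

```python
# Sketch: combined adjacency/weight matrix from Table 4.
# A weight of 0 means "no edge"; 1 or 2 is the obstacle cost of the edge.
STATES = ["F", "G", "E", "D", "A", "B", "C", "H"]

WEIGHTS = {
    "F": {"G": 1, "E": 1, "D": 1, "H": 1},
    "G": {"F": 1, "E": 2, "C": 1},
    "E": {"F": 1, "G": 2, "D": 2, "B": 2},
    "D": {"F": 1, "E": 2, "A": 2},
    "A": {"D": 2, "B": 2},
    "B": {"E": 2, "A": 2, "C": 1},
    "C": {"G": 1, "B": 1, "H": 1},
    "H": {"F": 1, "C": 1},
}

def neighbours(state):
    """States reachable from `state` (the graph is undirected)."""
    return list(WEIGHTS[state])

def edge_cost(i, j):
    """Obstacle weight D_ij between two adjacent states, 0 if no edge."""
    return WEIGHTS.get(i, {}).get(j, 0)

print(neighbours("F"), edge_cost("E", "B"))  # ['G', 'E', 'D', 'H'] 2
```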

3.3. Hybrid Model Development. The parameters for building the analytical hybrid QLACS model (equations (2) through (11)) are presented in Tables 5 and 6.

3.3.1. Algorithmic and Mathematical Analysis

ACS starts with the computation of edge attractiveness:

  \eta_{ij} = \frac{1}{D_{ij}}.  (2)


Table 5: A list of parameters for the ACS model.

  ACO parameter   Meaning
  α               Pheromone influence factor
  β               Influence of adjacent node distance
  ρ               Pheromone evaporation coefficient
  Q               Attractiveness constant
  e               Visited edge
  e'              Edge not visited
  L_k             Length of the tour of ant k
  τ               Pheromone concentration (amount)
  η               Specific visibility function (attractiveness)
  Δτ^k            Pheromone concentration deposited by the kth ant
  P_r(i, j)       Probability of moving from i to j
  D_ij            Visibility or distance between i and j
  f_i             Fitness of an individual in a population
  P_i             Probability of being selected among f_i
  N               Number of individuals in the population
  i, j            Any two adjacent nodes in the graph
  M_k             Set of unvisited nodes

Table 6: A list of parameters for the QL model.

  QL parameter   Meaning
  Q              Q-value update
  s              State
  a              Action
  R              Reward
  γ              Learning rate

Computation of the instantaneous pheromone deposited by ant k:

  \Delta\tau^{k} = \frac{Q}{L_{k}}.  (3)

Update of the pheromone:

  \tau_{ij} = (1 - \rho)\,\tau_{ij} + \Delta\tau^{k}_{ij}.  (4)

Computation of the edge probability:

  P_{r}(i, j) = \frac{[\tau_{ij}]^{\alpha}[\eta_{ij}]^{\beta}}{\sum_{e' = (i,j)}[\tau_{ij}]^{\alpha}[\eta_{ij}]^{\beta}}.  (5)

Adoption of the roulette wheel:

  \text{Cumulative}(P_{r}(i, j)) = \sum_{i=1}^{N+1} P_{r}(i, j),  (6)

  f_{i} = \frac{\sum_{j=1}^{N} f_{j}}{N},  (7)

  P_{i} = \frac{f_{i}}{\sum_{j=1}^{N} f_{j}}.  (8)

Equations (2) through (8) build the complete route finding model. Equations (2) through (4) are prerequisites to (5), and (5) is a prerequisite to the roulette wheel selection. At the end of (8), new states are selected and the trail is updated. The best path from each direction is selected and used as input to Q-learning. Note that D_ij is the weight (obstacle count) on an edge, P_r(i, j) is the chance of moving from node i to node j, and L_k is the length of the tour travelled by ant k.
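The following sketch shows one way equations (2) through (6) could be wired together for a single ant step: attractiveness from the edge weights, the pheromone-weighted probability of (5), and a roulette-wheel draw as in (6). It reuses the WEIGHTS structure sketched earlier and the parameter values of the worked example (α = 3, β = 2, ρ = 0.01, Q = 2.0); it is an illustrative reading of the equations, not the authors' implementation.

```python
import random

ALPHA, BETA, RHO, Q_CONST = 3, 2, 0.01, 2.0   # values used in Example I

def ant_step(current, pheromone, tour_length, weights, rng=random):
    """Pick the next state from `current` using equations (2)-(6).

    pheromone:   dict of dicts, tau[i][j], initialised to 0.01
    tour_length: obstacle cost of the partial tour so far
    weights:     adjacency/weight structure (see the Table 4 sketch)
    """
    candidates = list(weights[current])
    scores = {}
    for j in candidates:
        eta = 1.0 / weights[current][j]                        # (2)
        delta = Q_CONST / (tour_length + weights[current][j])  # (3), simplified L_k
        pheromone[current][j] = (1 - RHO) * pheromone[current][j] + delta   # (4)
        scores[j] = (pheromone[current][j] ** ALPHA) * (eta ** BETA)        # numerator of (5)
    total = sum(scores.values())
    r, cumulative = rng.random(), 0.0
    for j in candidates:                                       # (5)-(6): roulette wheel
        cumulative += scores[j] / total
        if r <= cumulative:
            return j
    return candidates[-1]
```

Called repeatedly from each entry point (F and C) until all states are covered, a loop of this kind produces candidate trails such as F E D A B C G C with total weight 10, the figure reported in Tables 8 and 16.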

QL starts. Each robot, in its QL thread, computes its learning rate:

  \gamma = \frac{0.5}{1 + \text{Frequency}(s, a)}.  (9)

Q-values are updated as

  Q(s, a) = R(s, a) + \gamma \max_{a'} Q(s', a'),  (10)

making a broadcast (Decision = Inspect/Ignore/Shutdown):

  \text{Decision}(s_j, Q) =
  \begin{cases}
    \text{Inspect},  & \text{if } Q = 0 \text{ and } s_j \neq \text{goal state},\\
    \text{Ignore},   & \text{if } Q > 0 \text{ and } s_j \neq \text{goal state},\\
    \text{Shutdown}, & \text{if } Q \geq 0 \text{ and } s_j = \text{goal state}.
  \end{cases}  (11)

Equation (9) computes gamma, the learning rate, which always lies between zero and 1; it is calculated from the frequency with which each robot has acted in inspecting states. Equations (9) to (11) are state dependent. The states are kept in a buffer and then accessed at run time. ACS and QL do not work simultaneously: ACS runs to completion and QL takes its final output as input; ACS is not called repeatedly while QL is working.
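A minimal sketch of equations (9) through (11) is given below: the frequency-based learning rate, the Q-value update, and the three-way broadcast decision. The table layout and helper names are assumptions made for illustration; only the formulas themselves come from the text.

```python
# Sketch of equations (9)-(11): learning rate, Q-update and decision.
GOAL_STATE = "H"
REWARD = {"inspect": 50, "ignore": 100, "shutdown": 150}

def learning_rate(frequency):
    """Equation (9): gamma = 0.5 / (1 + Frequency(s, a))."""
    return 0.5 / (1 + frequency)

def decide(state, q_table):
    """Equation (11): broadcast decision based on the shared Q-values."""
    q = q_table.get(state, 0.0)
    if state == GOAL_STATE:
        return "shutdown"
    return "inspect" if q == 0 else "ignore"

def update_q(state, action, q_table, frequency, next_qs):
    """Equation (10): Q(s, a) = R(s, a) + gamma * max Q(s', a')."""
    gamma = learning_rate(frequency)
    q_table[state] = REWARD[action] + gamma * (max(next_qs) if next_qs else 0.0)
    return q_table[state]

# First step of Example II: robot 1 inspects F with frequency 1.
q = {}
assert update_q("F", "inspect", q, frequency=1, next_qs=[0, 0, 0]) == 50.0
```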

Our behavioural model is thus an integration of two algorithms: (1) a route finding algorithm and (2) a communication and cooperation algorithm. The integration works in the following way: the optimal route finder (ACS) determines where a robot goes from the starting point to the destination, while QL determines what action it takes when it gets to any of the states. This collaboration works effectively because the optimal route finder has been proven to give the best possible transitions, and the heuristically accelerated QL has proven to be effective by showing evidence of effective communication in making inspection decisions. Consequently, it guarantees the shortest possible inspection time in the absence of wasteful inspection decisions. This framework forms the basis for our cooperative behaviour model for MRS (QLACS). The pseudocode for the implementation of QLACS is outlined in Algorithm 2.

3.3.2. QLACS Hybrid Approach Evolution. Figure 7 shows the architecture of the hybrid QLACS, explaining the handshake between the two components.

(i) Graph Construct Module. This is the interconnection of states in our model environment (see Figure 4). It encompasses all possible transitions from F to C and vice versa. Thus, from the model, we construct an adjacency/weight graph matrix that can be traversed by any graph-oriented algorithm.


INPUT: Edge distances (obstacles), pheromones, ants' trails, associated probabilities, starting and terminating indexes (that is, from F or C)
OUTPUT: Effective cooperation, inspection, and navigation

(1) Boolean CompletedFlag = False              // indicates completion for all the threads
(2) Declare CompletedThreadBuffer              // data structure stores info about completed threads
(3) Initialize all Q-values to zero
(4) Initialize best paths from the ACO algorithm, starting from F and C
(5) While (CompletedFlag <> True)              // checks whether all robots have finished
    Begin
      Create N threads in parallel
      Threads get current states in parallel from the ACO output
      Threads get next states and indexes in parallel from the ACO output
      Threads compare Q-values of all corresponding positions of current states (each robot broadcasts its Q-value info)
      IF ((Q == 0) AND (ThreadNextState <> GoalState))    // checks whether a particular state is available
      Begin
        State is available; robot with the current ThreadID inspects
        Compute and update Q-value
      End
      IF (Q > 0)                                          // state is not available: an already inspected state has Q > 0
      Begin
        State is already inspected; robot with the current ThreadID ignores
        Compute and update Q-value
      End
      IF ((Q == 0) AND (ThreadNextState == GoalState))    // checks for the goal state and shuts down
      Begin
        Compute and update Q-value
        Goal state is reached and inspection completed
        Thread with the current ThreadID shuts down
        Store current ThreadID in CompletedThreadBuffer
      End
      IF (Count[CompletedThreadBuffer] == NumberOfRobots) // learning stops when this happens
      Begin
        CompletedFlag = True
      End
    End   // of While loop

Algorithm 2: Pseudocode for QLACS.
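The sketch below is one possible Python rendering of Algorithm 2: each robot is a thread that walks its ACS-optimised path, consults the shared Q-table, and inspects, ignores, or shuts down accordingly. The class and method names are ours; the routes used are the optimised paths from Example II, and the Q-update is a simplified reading of equation (10).

```python
import threading

REWARD = {"inspect": 50, "ignore": 100, "shutdown": 150}
GOAL_STATE = "H"

class QlacsRobot(threading.Thread):
    """One robot = one learning thread walking an ACS-optimised route."""

    def __init__(self, name, route, shared_q, lock):
        super().__init__(name=name)
        self.route, self.shared_q, self.lock = route, shared_q, lock
        self.freq = {}                       # Frequency(s, a) per state
        self.log = []

    def gamma(self, state):                  # equation (9)
        return 0.5 / (1 + self.freq.get(state, 1))

    def run(self):
        for state in self.route + [GOAL_STATE]:
            with self.lock:                  # broadcast: read/write the shared Q-values
                q = self.shared_q.get(state, 0.0)
                if state == GOAL_STATE:
                    action = "shutdown"
                elif q == 0:
                    action = "inspect"
                else:
                    action = "ignore"
                self.shared_q[state] = REWARD[action] + self.gamma(state) * q  # (10), simplified
                self.log.append((state, action))
            self.freq[state] = self.freq.get(state, 0) + 1

# Optimised routes from Example II (robot 1 from F, robot 2 from C).
shared_q, lock = {}, threading.Lock()
r1 = QlacsRobot("R1", list("FEDABCG") + ["C"], shared_q, lock)
r2 = QlacsRobot("R2", list("CBADFEG") + ["C"], shared_q, lock)
for r in (r1, r2): r.start()
for r in (r1, r2): r.join()
print(r1.log); print(r2.log)
```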

Figure 7: Hybrid architecture of QLACS, showing the handshake between the ACS component (graph construct, state selection, transition, update, and convergence decision modules) and the QL component (cooperative inspection, state broadcasting, state action, QLACS update, new state, decision, and goal state modules).


Example I of QLACS (using Figure 6 for optimized route finding).
Parameters used in the first component of QLACS: Q = 2.0, α = 3, β = 2, ρ = 0.01.
Starting state: F. Terminating states: C, F.
Terminating condition: when all states have been visited at least once and the ant is at a terminating state.
State space = {F, G, E, D, A, B, C}.
Initialize all pheromone positions to 0.01.
Rand(0, 1) returns a random value between 0 and 1.
Equations:
(i) Computation of attractiveness η_ij using (2).
(ii) Computation of instantaneous pheromones by ant k for all potential states using (3).
(iii) Pheromone update using (4).
(iv) Computation of probabilities for the potential states using (5).
(v) Adoption of the roulette wheel using (6).
Equation (5) can be reduced as follows: let w(i, j) = [τ_ij]^α [η_ij]^β and Sum = Σ_{i=1}^{N+1} w(i, j); then P_r(i, j) = w / Sum.

Algorithm 3: Parameters for QLACS Example I.

In our case, there are eight states: seven internal states and a common terminating state. Since the states are not simply linear, the environment allows for multiple options; that is, an agent/ant can choose from any present state. This type of scenario is also called a multigoal scenario.

(ii) State Selection Module. Here the agents select the start and end states, which, according to our model in Figure 4, are F and C. These states are selected based on the cumulative probability of two adjacent nodes in (6).

(iii) Transition Module. This module takes care of the transition rules of the agents by calculating the pheromone concentrations, distances, and probabilities using (2) through (4).

(iv) Update Module. After a transition from one state to another, an update of the pheromone is computed, after which multipath planning with the shortest path is achieved.

(v) Convergence Decision Module. This is where the best trail/path decision is taken. It is the end product of the first component of QLACS, which is then passed to the second component of QLACS for cooperative behaviour.

(vi) Cooperative Inspection Module. This is where the robots are deployed to start inspection. The robots use the acquired best paths, starting from F and C, respectively, as input for the second component of QLACS. The two robots use the learning rate from (9) to learn the environment and use (10) for cooperation.

(vii) State Broadcasting Module. This module handles the broadcasting behaviours of the two robots, which are achieved by using (10). Each robot checks its memory, represented by Q-values, before taking any decision.

(viii) State Action Module. State broadcasting by each robot is immediately followed by action selection. In other words, the state to inspect or ignore is determined here using (11).

(ix) QLACS Update Module. After each action selection, the Q-values of each robot are updated using (10).

(x) New State Module. This module takes care of the robot's new state after updating the Q-values. Each robot runs in its own thread, managing its own memory yet sharing information.

(xi) Final Decision Module. This decision module determines whether the robot should exit the environment or carry out more inspections. It is also controlled by (11).

(xii) Goal State Module. The completion of the second component of QLACS is reaching the goal state after successful inspection of the states. This goal state, according to our model in Figure 4, is called H.

3.3.3. Analytical Scenario. For the benefit of robotic and system engineers, practitioners, and researchers, this paper uniquely presents scenarios on the ease of implementation of the proposed QLACS, as described in Algorithms 3 and 4 and Tables 7 and 10. The first component of QLACS is explained in Example I, shown in Algorithm 3 and Table 7. The first component, which achieved optimal route paths for the robots, has the parameters listed in Algorithm 3. The calculation of the different state transitions for the first component of QLACS is analysed in Table 7.

Repeat steps 1 through 6 for subsequent current states until the termination condition and state are reached. At the end of seven updates we have Tables 8 and 9. The total length shown in Table 8 represents the total number of obstacles encountered by each agent while trailing to achieve the optimal route for the robots. The number of obstacles, shown


Table 7: QLACS Example I, navigational analytical solution.

Starting state F; potential states D, E, G.
(1) Use (2): η_FD = 1, η_FE = 1, η_FG = 1.
(2) Use (3): L_k = 1; Δτ^k(FD) = 2/1 = 2, Δτ^k(FE) = 2, Δτ^k(FG) = 2.
(3) Use (4): τ_FD = (1 − 0.01)·0.01 + 2 = 2.0099 ≈ 2.01; τ_FE = 2.01; τ_FG = 2.01.
    First pheromone update: F = 0.01, G = 2.01, E = 2.01, D = 2.01, A = 0.01, B = 0.01, C = 0.01.
(4) Use (5): w(FD) = [τ_FD]^α [η_FD]^β = (2.01)^3 = 8.12; w(FE) = 8.12; w(FG) = 8.12.
    Sum = w(FD) + w(FE) + w(FG) = 24.36.
    P_r(FD) = w/Sum = 8.12/24.36 = 0.33; P_r(FE) = 0.33; P_r(FG) = 0.33.
    Probabilities: F = 0, G = 0.33, E = 0.33, D = 0.33, A = 0, B = 0, C = 0, H = 0.
(5) Use (6): call Rand; Rand = 0.07 falls in state E. The roulette wheel selects E as the next state (end of roulette wheel).
(6) Update trail: F E.

Current state E; potential states B, D, G.
(1) Use (2): η_EB = 1/2, η_ED = 1/2, η_EG = 1.
(2) Use (3): L_k(F E D) = 1 + 2 = 3; L_k(F E B) = 1 + 2 = 3; L_k(F E G) = 1 + 2 = 3; Δτ^k = 2/3 = 0.67.
(3) Use (4): τ_EB = (1 − 0.01)·2.01 + 0.67 = 2.66; τ_ED = 2.66; τ_EG = 2.66.
    Second pheromone update: F = 0.01, G = 2.66, E = 2.01, D = 2.66, A = 0.01, B = 2.66, C = 0.01.
(4) Use (5): w(EB) = (2.66)^3 · (1/2)^2 = 4.71; w(ED) = 4.71; w(EG) = (2.66)^3 · (1)^2 = 18.82.
    Sum = 4.71 + 4.71 + 18.82 = 28.24.
    P_r(EB) = 4.71/28.24 = 0.17; P_r(ED) = 0.17; P_r(EG) = 18.82/28.24 = 0.67.
    Probabilities: F = 0, G = 0.67, E = 0, D = 0.17, A = 0, B = 0.17, C = 0, H = 0.
(5) Use (6): call Rand; Rand = 0.9 falls in state D. The roulette wheel selects D as the next state (end of roulette wheel).
(6) Update trail: F E D.


Example II of QLACS (for good cooperation and communication between robots).
Parameters used in the second component of QLACS (using the output from the first component of QLACS):
Reward scheme: Inspect = 50, Ignore = 100, Shutdown = 150.
State space: optimized paths from the first component, QLACS R1 = {F, E, D, A, B, C, G, C} and QLACS R2 = {C, B, A, D, F, E, G, C}.
Starting states: F and C. s_j = terminating state C, then H.
Terminating condition: when all states have been visited.
Initialize all Q-value positions to zero.
Equations:
(i) Compute the learning rate using (9).
(ii) Compute the update on Q(s, a) using (10).

Algorithm 4: Parameters for QLACS Example II.

Table 8: Pheromone update of a full cycle.

  Current state    F      G      E      D      A      B      C      Update
  F                0.01   2.01   2.01   2.01   0.01   0.01   0.01   1st
  E                0.01   2.66   2.01   2.66   0.01   2.66   0.01   2nd
  D                3.13   0.01   3.03   0.01   3.03   0.01   0.01   3rd
  A                0.01   0.01   0.01   0.45   0.01   0.45   0.01   4th
  B                0.01   0.01   0.67   0.01   0.67   0.01   0.7    5th
  C                0.01   0.91   0.01   0.01   0.01   0.89   0.01   6th
  G                0.71   0.01   0.69   0.01   0.01   0.01   0.71   7th

C is the terminating state, and ant k has moved through all states at least once.
Trail: F E D A B C G C.
Number of obstacles = 1 + 2 + 2 + 2 + 1 + 1 + 1 = 10.

Table 9: Probability update of a full cycle.

  Current state    F      G      E      D      A      B      C
  F                0      0.33   0.33   0.33   0      0      0
  E                0      0.67   0      0.17   0      0.17   0
  D                0.69   0      0.16   0      0.16   0      0
  A                0      0      0      0.5    0      0.5    0
  B                0      0      0.16   0      0.16   0      0.68
  C                0      0.52   0      0      0      0.49   0
  G                0.45   0      0      0      0      0      0.45

C is the terminating state.

as 10, is calculated using the trail result in Table 8. Table 9 displays the probability update table for a full cycle of route finding. In this first example, the first robot terminates its inspection through C and then H.

Algorithm 4 displays the parameters of the second example of QLACS. The second component of QLACS handles this aspect of the model, which is the communication and cooperation part. Once the first component hands its output to the second component, it becomes the second component's input and it runs with it. In Algorithm 4, QLACS R1 represents the input for robot 1 and QLACS R2 represents the input for robot 2, both received from the first component of QLACS. The terminating condition is when all states have been visited.

The rest of Example II, displayed in Table 10, explains the cooperative behavioural scenario from one state to another for the two robots. The tables in the last row of Table 10 show the communication and cooperation output achieved using the second component of QLACS.

4. Experimental Evaluations: Safety Inspections for MRS Behaviours

The experimental results of implementing QLACS in an environment that consists of obstacles and links are tabulated in this section. The experimental setup is explained in Section 4.1. Different performance categories are shown: a without-communication category and a with-communication category. In the without-communication category, displayed in Section 4.2, we found that robots can inspect all states individually without knowing that another robot exists; robots can also inspect only some of the states, thereby leaving some states uninspected. The communication category is explained in Section 4.3, while the performance of QLACS measured against other existing methods is tabulated in Section 4.4.

4.1. Experimental Setup. Figure 8 displays the different sets of experiments conducted in the environment using two robots. Figure 8(a) shows how two robots resume inspection from two different entrances. In each set of experiments, the robots take readings of roof cracks and the level of toxic gases using their sensors. The same behaviours occur in Figures 8(b) and 8(c), respectively, at different inspection locations. The inspection locations vary in the four experimental setups shown in Figure 8.

4.2. Experiment 1: Performance of QLACS without Cooperation. The result of implementing QLACS without communication in the proposed environment (Figures 1 and 4) is shown in Tables 11 and 12. In the case of Table 11, robot 1 (R1) enters the mine through state F while robot 2 (R2) enters the mine through state C. However, each robot learns by inspecting some of the states and ignoring some of the


Table 10: QLACS Example II, cooperative behaviour.

Starting simulation, Robot 1, starting state F:
(1) Use (9): γ(F) = 0.5/[1 + 1] = 0.25.
(2) Check the Q-value for state F (use (11)): Q(F, a) = 0; selected action a = inspect.
(3) Use (10): Q(F, a) = 50 + 0.25(0) = 50. End of value iteration.

Current state E, Robot 1: γ(E) = 0.25; Q(E, a) = 0; action = inspect; Q(E, a) = 50 + 0.25(max(0, 0, 0)) = 50.
Current state C, Robot 2: γ(C) = 0.25; Q(C, a) = 0; action = inspect; Q(C, a) = 50 + 0.25(max(0, 0, 0)) = 50.
Current state B, Robot 2: γ(B) = 0.25; Q(B, a) = 0; action = inspect; Q(B, a) = 50 + 0.25(max(0, 0, 0)) = 50.
Current state D, Robot 1: γ(D) = 0.25; broadcast (use (11)): Q(D, a) = 0; action = inspect; Q(D, a) = 50 + 0.25(max(0, 0, 0)) = 50.
Current state A, Robot 2: γ(A) = 0.25; broadcast: Q(A, a) = 0; action = inspect; Q(A, a) = 50 + 0.25(max(0, 0, 0)) = 50.
Current state A, Robot 1: γ(A) = 0.5/[1 + 0.25] = 0.4; broadcast: Q(A, a) = 50, that is, Q > 0; action = ignore; Q(A, a) = 100 + 0.4(max(0, 0, 0)) = 100.
Current state D, Robot 2: γ(D) = 0.4; broadcast: Q(D, a) = 50, Q > 0; action = ignore; Q(D, a) = 100 + 0.4(max(0, 0, 0)) = 100.
Current state B, Robot 1: γ(B) = 0.4; broadcast: Q(B, a) = 50, Q > 0; action = ignore; Q(B, a) = 100 + 0.4(max(0, 0, 0)) = 100.
Current state F, Robot 2: γ(F) = 0.4; broadcast: Q(F, a) = 50, Q > 0; action = ignore; Q(F, a) = 100 + 0.4(max(0, 0, 0)) = 100.
Current state C, Robot 1: γ(C) = 0.4; broadcast: Q(C, a) = 50, Q > 0; action = ignore; Q(C, a) = 100 + 0.4(max(0, 0, 0)) = 100.
Current state E, Robot 2: γ(E) = 0.4; broadcast: Q(E, a) = 50, Q > 0; action = ignore; Q(E, a) = 100 + 0.4(max(0, 0, 0)) = 100.
Current state G, Robot 1: γ(G) = 0.5/[1 + 1] = 0.25; broadcast: Q(G, a) = 0; action = inspect; Q(G, a) = 50 + 0.25(max(0, 0, 0)) = 50.
Current state G, Robot 2: γ(G) = 0.4; broadcast: Q(G, a) = 50, Q > 0; action = ignore; Q(G, a) = 100 + 0.4(max(50, 50, 50)) = 100 + 20 = 120.
Current state C, Robot 1: γ(C) = 0.5/[1 + 0.4] = 0.36; broadcast: Q(C, a) = 50; action = ignore; Q(C, a) = 100 + 0.36(max(50, 50)) = 100 + 18 = 118.
Current state C, Robot 2: γ(C) = 0.5/[1 + 0.36] = 0.37; broadcast: Q(C, a) = 50, Q > 0; action = ignore; Q(C, a) = 100 + 0.37(max(50, 50)) = 100 + 18.5 = 118.5.

All QLACS R1 states exhausted. Goal state H, Robot 1: γ(H) = 0.25; broadcast: Q(H, a) = 0; action = shutdown; Q(H, a) = 150 + 0.25(max(0, 0)) = 150. End of value iteration.
All QLACS R2 states exhausted. Goal state H, Robot 2: γ(H) = 0.25; broadcast: Q(H, a) = 0; action = shutdown; Q(H, a) = 150 + 0.25(max(0, 0)) = 150. End of value iteration.

End of policy iteration.

Robot 1 (state: inspect / ignore / shutdown):
  F: yes/no/no; E: yes/no/no; D: yes/no/no; A: no/yes/no; B: no/yes/no; C: no/yes/no; G: yes/no/no; C: no/yes/yes (through H).
Robot 2 (state: inspect / ignore / shutdown):
  C: yes/no/no; B: yes/no/no; A: yes/no/no; D: no/yes/no; F: no/yes/no; E: no/yes/no; G: no/yes/no; C: no/yes/yes (through H).

states. Since there is no communication, they exit the mine after learning, consequently not inspecting all the states. The same routine is reflected in Table 12, but in this case each robot ends up inspecting all the states before exiting the mine. Analysis of Tables 11 and 12 shows that resources are wasted and the inspection job is not effective or efficient. Comparing the two tables, the time and distance costs are high, though higher in Table 12 because of many repetitions. For instance, the time and distance costs in Table 12, column 2, are 48.0028 sec and ((F, G, E, D, A, B, C), (C, B, A, D, E, G, F)), respectively. It also shows that states are repeated in Tables 11 and 12. The memory usage is equally high. This led us to create a more efficient QLACS with communication by introducing some heuristics. The processes involved in Tables 11 and 12 are explained in Table 13.

One can see from Tables 11 and 12 that there is no good communication among the robots, hence the evolution of Table 14.

4.3. Experiment 2: Performance of QLACS with Good Cooperation. The heuristic added to the known QL made this experiment show good communication. This is where our contribution to communication is shown. As a robot inspects and learns the environment, it broadcasts its Q-values to the other robot in the form of a lookup table. In this case, each robot checks the Q-values: when a Q-value is equal to zero, the robot randomly selects a state for inspection; if a Q-value is greater than zero, the robot checks whether the state is available for inspection or should be ignored. When a robot encounters a state with a Q-value equal to zero and the thread's next state is equal to the goal state (H), then it shuts down; it must have checked the lookup table to see that all states have been inspected. The result in Table 14 shows good communication between the two robots: no state was inspected more than once. The iterations for every run, with their times, memory usage, and effective communication, are also displayed in Table 14.

Comparing Table 14 with Tables 11 and 12, one cannot but notice the huge differences in the memory, time, and distance costs. The communication between the robots in Table 14 resulted in good inspection; however, the random method of choosing the next inspection state did not give an optimized route, thereby increasing the time cost in passing through and checking already inspected states.

Figure 8: Different experimental behaviours for two robots: (a) two robots starting inspection from two different entrances; (b) two robots inspecting as they navigate in the environment; (c) two robots inspecting different locations from different positions.

4.4. Experiment 3: Performance of QLACS for the Navigation Behaviour of the Proposed Model. The result of implementing the first component of QLACS for effective navigation of the proposed model is tabulated in Table 16. Table 15 shows


Table 11: QLACS without communication (inspecting some states).

  Number of runs         1           2        3           4              5           6        7
  Iterations             9           10       10          13             13          9        13
  Time (sec)             43.0025     30.0017  34.002      30.0017        31.0017     31.0018  27.0016
  Memory usage (bytes)   18872       18160    19204       18208          17896       18164    18308
  Inspected states (R1)  G           F, G, C  C           F, D, B, E, A  F, E, D, A  C        F, G, E, A
  Inspected states (R2)  B, A, G, C  C        B, A, E, F  B, A           C, B, E     B, A, E  A, G

Table 12: QLACS without communication (inspecting all states).

  Number of runs         1        2        3        4        5        6        7
  Iterations             114      10       70       10       9        10       10
  Time (sec)             43.0024  48.0028  41.0023  34.0019  34.0019  34.0019  35.002
  Memory usage (bytes)   28440    17960    27216    17000    18456    17672    17968
  Inspected states (R1)  F, G, E, D, A, B, C (in every run)
  Inspected states (R2)  C, B, A, D, E, G, F (in every run)

Table 13: Processes used in achieving Tables 11 and 12.

(a) For Table 11 (inspecting some states):
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zero.
(3) Select a random action if the Q-values of all possible actions at the current state are zero.
(4) Otherwise, select the action with the highest Q-value.
(5) Compute and update the Q-value of the selected action.
(6) Get the new state among the possible states.
(7) If the new state is the goal state, go to steps 5 and 9.
(8) Repeat steps 3 to 6 until step 7 holds.
(9) Shutdown.

(b) For Table 12 (inspecting all states):
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zero.
(3) Select a random action if the Q-values of all possible actions at the current state are zero.
(4) Otherwise, select the action with the highest Q-value.
(5) Compute and update the Q-value of the selected action.
(6) Get the new state among the possible states.
(7) If the new state is the goal state, go to steps 5 and 10.
(8) Repeat steps 3 to 7 until step 9 holds.
(9) Every state except the goal state has taken the action Inspect.
(10) Shutdown.

the selected parameters used in achieving the optimal path. The optimal path found after nine test runs is the path with a distance cost of 10 for both entries to the environment, displayed in Table 16. Therefore, columns 4, 5, 7, 8, and 9 can be used as the optimized path input from the first component of QLACS. QLACS will then use any of the optimized paths to navigate to the specified states and take decisions accordingly. For instance, the test run with nine ants gives 16 iterations in 60.0035 seconds and yields the shortest paths for both robots coming from entries F and C of the environment. The paths that give a cost of 10 are F E D A B C G C (weights 1, 2, 2, 2, 1, 1, 1) and C B A D F E G C (weights 1, 2, 2, 1, 1, 2, 1), respectively.

Alpha (α), Beta (β), and Rho (ρ) represent the heuristic properties of the ACS algorithm. The Alpha factor measures the influence of pheromone concentration, which can bias selection towards the path with the highest pheromone concentration. Beta measures the influence of the distance between any two adjacent nodes; it is a heuristic factor that weighs distance when selecting the next state and is not limited to pheromones. Rho is the rate at which the pheromones evaporate; it governs how often new paths are explored by the agents/ants rather than old paths being reinforced. Each agent cooperates by having access to the other agents' pheromone values. The pheromone value is initialized to 0.01 and is continually updated until learning stops, analogous to the Q-values in the QL component of QLACS, which are initialized to zero and updated at every new step.

Table 16 gives the optimized routes to be used by the robots to achieve the inspection tasks. Combining Tables 14 and 16 yields Table 17, which shows optimized time cost, memory usage, route cost, and good communication.

4.5. Experiment 4: Benchmarking the New Hybrid Model (QLACS) against Popular Methods. Choosing the optimized paths, QLACS performs cooperative inspection behaviour through communication among the robots. Looking at the sample results in Table 17, QLACS chooses the shortest and complete possible inspection routes from different runs. In this case, the best paths were obtained from Table 16 by using 9, 10, and 12 agents. All the test runs gave best trail paths from both entrances, listing the complete states


Table 14: QLACS with communication.

  Number of runs         1           2           3           4              5              6           7
  Iterations             9           10          9           13             10             10          9
  Time (sec)             33.0019     31.0017     31.0018     31.0018        29.0023        30.0017     31.0018
  Memory usage (bytes)   15748       15948       15748       18232          16576          15948       15748
  Inspected states (R1)  F, G, E     F, G, E, D  F, G, E     F, E           F, G, E, D, B  F, G, E, D  F, G, E
  Inspected states (R2)  C, B, A, D  C, B, A     C, B, A, D  F, G, E, D, A  C, A           C, B, A     C, B, A, D

Table 15: Navigation behaviour parameter specification.

  Type of ACO                   ACS
  Population (number of ants)   9 or 12
  Length of path                8
  Pheromone coefficient (β)     2
  Heuristic coefficient (α)     3
  Evaporation rate (ρ)          0.01

Figure 9: Comparison of time costs (sec) for QL, ACS, and QLACS over the number of runs.

Figure 10: Comparison of route costs for QL (R1 and R2) and QLACS (R1 and R2) over the runs.

with length 10, as shown in Table 16; that is, they have states from F to C and from C to F. The length is the sum of the weights along the trail edges. QLACS then uses the optimized paths to make decisions on inspections. The result in Table 17 shows that no state is inspected twice or visited more than required. The QLACS model concentrates on making the best inspection decisions for MRS. The first run in Table 17 shows that nine agents/ants were used for the route finding; the optimized route was achieved after nine iterations in 7.0004 sec, and all the states were inspected effectively without redundancies.

Figure 11: Number of ants/iterations required for QLACS to find the optimal path.

Figure 12: Time and iterations required for QLACS to achieve optimal behaviour.

4.5.1. Comparative Study of the Proposed Model (QLACS) with QL Only and ACS Only. Based on the results tabulated in Table 18 and Figure 9, the average time costs of achieving the MRS behaviours with QL, ACS, and QLACS were compared. Two robots use an average of 30.8590 sec to achieve thorough inspection behaviour using QL, an average of 40.4309 sec to achieve optimal navigation behaviour using ACS, and an average of 9.2868 sec to achieve both navigation and cooperative behaviour using QLACS. The result shows


Table 16: Navigation behaviour computations.

  Number of runs            1        2        3        4        5        6        7        8        9
  Number of ants            5        7        8        9        10       11       12       15       28
  Iterations                12       26       14       16       23       14       16       23       22
  Time (sec)                37.0021  37.0021  39.0022  60.0035  32.0019  33.0019  35.002   43.0025  65.0037
  Best trail distance (R1)  12       12       10       10       10       10       10       10       10
  Best trail distance (R2)  10       12       12       10       10       12       10       10       10

Table 17: New hybrid model (QLACS) computations.

  Number of runs         1           2           3           4              5              6              7           8           9
  Iterations             9           9           9           9              9              9              9           9           9
  Number of ants         9           9           9           10             10             10             12          12          12
  Time (sec)             7.0004      8.0005      9.0005      11.0006        11.0006        12.0007        8.0005      9.0005      9.0006
  Memory usage (bytes)   10288       10280       10248       10288          10288          10288          10164       10712       10164
  Inspected states (R1)  F, G, E, D  F, G, E, D  F, G, E, D  F, G, E, A, B  F, G, E, A, B  F, G, E, A, B  F, G, D, A  F, G, D, A  F, G, D, A
  Inspected states (R2)  C, B, A     C, B, A     C, B, A     C, D           C, D           C, D           C, B, E     C, B, E     C, B, E

Table 18: Time cost comparison for QL, ACS, and QLACS.

  Runs     QL time cost (sec)   ACS time cost (sec)   QLACS time cost (sec)
  1        33.0019              37.0021               7.0004
  2        31.0019              37.0021               8.004
  3        31.0018              39.0022               9.0005
  4        31.0018              32.0019               9.0005
  5        29.0023              35.002                11.0006
  6        30.0017              43.0025               12.0007
  7        31.0018              60.0035               9.0006
  Average  30.8590              40.4309               9.2868

that our proposed integrated algorithm performs better, with a reduced time cost. On the same note, the route costs for QL and QLACS were also compared. The results in Table 19 and Figure 10 show that the proposed model, QLACS, gave a much lower route cost than QL.

The numbers of ants and iterations used to achieve this are displayed in Figure 11. The more ants there are, the more iterations are needed. The best routes created in the five test runs shown in Figure 11 are run numbers 1 and 3; they used fewer ants and iterations to achieve the optimal routes. The optimal result for the proposed model is achieved within nine iterations for every run, and the iteration count remains constant for any number of agents and robots. The blue line with star markers in Figure 12 is the iteration value for 9 runs, while the red line shows the different amounts of time required for each run. The time is also almost stable for QLACS, though it fluctuated a little in the middle. This signifies that the proposed model is more robust and effective in finding the optimal route and coordinating MRS. The reduced route cost and shorter computation time achieved with QLACS satisfy the criteria for cooperative behavioural purposes.

4.6. Experiment 5: Scalability of QLACS Using Two, Three, and Four Robots. In this section we present the results obtained by experimenting with the cooperative behavioural actions of two, three, and four robots using the QLACS model. The performance of the two, three, and four robots was evaluated by running the simulation three times with the same number of agents. The performance of the proposed QLACS model shows good communication between the two, three, and four robots with respect to the states inspected, listed in the last four columns of Table 20. An almost stable time was used in achieving the inspection for all the robots. The details of the simulation results are laid out in Table 20.

A notable observation emanating from Table 20 is that there is a need for a larger mine area of inspection, because robot R1 in rows 4 to 6 could inspect only one state, while robots R1 and R2 in rows 7 to 9 could inspect only one state each. This implies that the size of the field of inspection should be proportional to the number of robots to be deployed.

5. Concluding Remarks

This paper has shown how MRS behave cooperatively in underground terrains. The QL algorithm was investigated for its potential for good communication, while the ACS algorithm was explored for its good navigation properties. The strengths of the two algorithms have been harnessed and optimized to form a hybrid model, QLACS, which addresses the behavioural situation in MRS effectively.

The analytical solution of the new model was explained, as shown in Algorithms 3 and 4 and Tables 7 and 10. The hybrid architecture for the model was described in Figure 7, and the experimental setup describing different navigation locations for two robots was shown in Figure 8.

The new hybrid model, QLACS, for cooperative behaviour of MRS in an underground terrain was proven able to find the optimal route and handle cooperation between two robots


Table 19: Route cost comparison for QL and QLACS.

  Runs     QL: path cost for R1   QL: path cost for R2   QLACS: path cost for R1   QLACS: path cost for R2
  1        20                     20                     10                        10
  2        32                     19                     10                        10
  3        28                     14                     10                        10
  4        27                     27                     10                        10
  5        39                     30                     10                        10
  Average  29.5                   22                     10                        10

Table 20: Summary of scalability performance of QLACS.

  Row   Number of robots   Time (sec)   States inspected: R1   R2   R3   R4
  1     2                  10.0006      4                      3    -    -
  2     2                  11.0007      4                      3    -    -
  3     2                  8.0005       4                      3    -    -
  4     3                  16.0009      1                      3    3    -
  5     3                  17.001       1                      3    3    -
  6     3                  12.0006      1                      3    3    -
  7     4                  11.0006      1                      1    3    2
  8     4                  14.0006      1                      1    3    2
  9     4                  10.006       1                      1    3    2

effectively. The cooperative behavioural problem was narrowed down to two situations: navigational and cooperative behaviours. ACS was used to establish the shortest optimal route for the two robots, while QL was used to achieve effective cooperative decisions. The results achieved after integrating the two algorithms showed a reduction in route costs and computation times, as shown in Figures 9, 10, 11, and 12. The comparative analysis between QL, ACS, and QLACS proved that the last-named is more robust and effective in achieving cooperative behaviour for MRS in the proposed domain.

The result of this work will be used for future simulation of MRS applications in hazardous environments. An indication of the expansion of this research area has conspicuously surfaced in both real-life implementation and model growth. The model used in this work can also be improved upon by increasing the state space.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the financial support of the Organisation for Women in Science for the Developing World (OWSDW), L'Oréal-UNESCO for Women in Science, and the resources made available by the University of Cape Town (UCT) and the University of South Africa (UNISA), South Africa.

References

[1] J. Leitner, "Multi-robot cooperation in space: a survey," in Proceedings of the Advanced Technologies for Enhanced Quality of Life (AT-EQUAL '09), vol. 37, pp. 144-151, IEEE Computer Society, Iasi, Romania, July 2009.

[2] G. Dudek, M. R. M. Jenkin, E. Milios, and D. Wilkes, "A taxonomy for multi-agent robotics," Autonomous Robots, vol. 3, no. 4, pp. 375-397, 1996.

[3] A. Chella, G. Lo Re, I. Macaluso, M. Ortolani, and D. Peri, "Chapter 1: A networking framework for multi-robot coordination," in Recent Advances in Multi Robot Systems, A. Lazinica, Ed., pp. 1-14, 2008.

[4] C. O. Yinka-Banjo, I. O. Osunmakinde, and A. Bagula, "Autonomous multi-robot behaviours for safety inspection under the constraints of underground mine terrains," Ubiquitous Computing and Communication Journal, vol. 7, no. 5, pp. 1316-1328, 2012.

[5] S. Yarkan, S. Guzelgoz, H. Arslan, and R. Murphy, "Underground mine communications: a survey," IEEE Communications Surveys & Tutorials, vol. 11, no. 3, pp. 125-142, 2009.

[6] M. Peasgood, J. McPhee, and C. Clark, Complete and Scalable Multi-Robot Planning in Tunnel Environments, DigitalCommons, 2006.

[7] F. E. Schneider, D. Wildermuth, and M. Moors, "Methods and experiments for hazardous area activities using a multi-robot system," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3559-3564, May 2004.

[8] R. Zlot and A. Stentz, "Market-based multirobot coordination for complex tasks," International Journal of Robotics Research, vol. 25, no. 1, pp. 73-101, 2006.

[9] L. E. Parker, "Current research in multi-robot systems," in Proceedings of the 7th International Symposium on Artificial Life and Robotics (ISAROB '03), vol. 7, pp. 1-15, 2003.

[10] C. Thorp and H. Durrant-Whyte, "Field robots," in Proceedings of the 10th International Symposium on Robotics Research, vol. 6, pp. 329-340, 2003.

[11] S. R. Teleka, J. J. Green, S. Brink, J. Sheer, and K. Hlophe, "The automation of the 'Making Safe' process in South African hard-rock underground mines," International Journal of Engineering and Advanced Technology (IJEAT), vol. 1, pp. 1-7, April 2012.

[12] L. Cragg and H. Hu, "Application of reinforcement learning to a mobile robot in reaching recharging station operation," in Intelligent Production Machines and Systems, pp. 357-363, 2005.

[13] R. A. C. Bianchi, C. H. C. Ribeiro, and A. H. R. Costa, "On the relation between ant colony optimization and heuristically accelerated reinforcement learning," in Proceedings of the 1st International Workshop on Hybrid Control of Autonomous Systems, pp. 49-55, 2009.

[14] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, Oxford, UK, 1999.

[15] Y. Liu and K. M. Passino, Swarm Intelligence: Literature Overview, Department of Electrical Engineering, University of Ohio, 2000.

[16] L. Li, A. Martinoli, and Y. S. Abu-Mostafa, "Emergent specialization in swarm systems," in Intelligent Data Engineering and Automated Learning - IDEAL 2002, vol. 2412 of Lecture Notes in Computer Science, pp. 261-266, 2002.

[17] B. Englot and F. Hover, "Multi-goal feasible path planning using ant colony optimization," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 2255-2260, May 2011.

[18] Z. N. Naqiv, H. Matheru, and K. Chadha, "Review of ant colony optimization on vehicle routing problems and introduction to estimated-based ACO," in International Conference on Environment Science and Engineering (IPCBEE '11), vol. 8, pp. 161-166, 2011.

[19] M. Dorigo and T. Stützle, "The ant colony optimization metaheuristic: algorithms, applications, and advances," Tech. Rep. IRIDIA/2000-32, 2000.

[20] R. Claes and T. Holvoet, "Cooperative ant colony optimization in traffic route calculations," in Advances on Practical Applications of Agents and Multi-Agent Systems, vol. 155 of Advances in Intelligent and Soft Computing, pp. 23-34, 2012.


Steps:
  Initialize Q(s, a) arbitrarily
  Repeat (for each episode):
    Initialize s
    Repeat (for each step of the episode):
      Choose a from s using a policy derived from Q
      Take action a; observe r, s'
      Q(s, a) <- Q(s, a) + \alpha [r + \gamma \max_{a'} Q(s', a') - Q(s, a)]
      s <- s'
    until s is terminal

Algorithm 1: Q-learning algorithm.
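As a minimal, self-contained sketch of Algorithm 1, the tabular update below could be run on the seven-state mine graph; the reward and transition callbacks, and the illustrative α, γ, and episode values, are placeholders standing in for the inspection rewards and adjacency structure described elsewhere in the paper.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPISODES = 0.8, 0.75, 200   # assumed illustrative values

def q_learning(actions, reward, step, start, is_terminal):
    """Tabular Q-learning as in Algorithm 1.

    actions(s) -> list of actions, reward(s, a) -> float,
    step(s, a) -> next state, is_terminal(s) -> bool.
    """
    Q = defaultdict(float)                 # Q(s, a), initialised arbitrarily (here: 0)
    for _ in range(EPISODES):
        s = start
        while not is_terminal(s):
            a = random.choice(actions(s))  # exploration policy simplified to random choice
            s_next = step(s, a)
            best_next = max((Q[(s_next, b)] for b in actions(s_next)), default=0.0)
            Q[(s, a)] += ALPHA * (reward(s, a) + GAMMA * best_next - Q[(s, a)])
            s = s_next
    return Q
```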

handles the applicability of the model. This paper builds on what has already been proposed in [4].

An investigation into automating the underground mine environment after blasting, called "making safe," was carried out in [11] to ensure the safety of the environment after blasting. Blasting in an underground mine is the controlled use of explosives to excavate, break down, or remove rock. The need to investigate the stability of the environment after blasting, before any mining operation takes place, is of the highest priority; hence the reason for automation. The automation was centred on a persistent area of concern in South African underground mine operation called hanging walls, which is caused by rock bursts and falls of ground. There are also other persistent concerns, such as the levels of toxic gases, which pose great disaster threats to the lives of miners, for instance heat sickness, explosions, and pneumoconiosis (occupational safety and health: fall-of-ground management in South Africa, SAMRASS code book for mines). Some of these disasters may result in fatalities and/or disabilities. Again, when an accident happens during mining operations, rescuers find it difficult to respond immediately. In view of the aforementioned concerns, there is a need to create models for the safety inspection of underground mine operations. For instance, monitoring the underground mine environment for hazardous gases and/or smoke should be one of the important safety measures. Continuous monitoring of workers and equipment is another crucial safety measure [5]. Picking up the sound from roof cracking, to monitor when a roof is about to fall, is also a safety item.

2.2. Reinforcement Learning. For a robot to operate in a harsh, unstructured environment, considering every possible event in defining its behaviour is intricate [12]. It is, however, essential to develop robots that can conform to changes in their environment. RL is one of the artificial intelligence (AI) algorithms that can achieve learning by experience. This enhances robots' interaction with the environment. We investigate the use of RL to assist the behaviours of an MRS in the safety inspection of underground mines. RL is a sequence of actions and state transitions with some associated rewards. What is being learned in RL is an optimal policy (what is the right thing to do in any of these states s) [13].

At any time step, each robot is in a specific state in relation to the environment and can take one of the following actions: inspect, ignore, or shutdown. Each robot receives feedback after performing an action, which explains the impact of the action in relation to achieving the goal. The effect of the action can be either a good or a bad reward, measured in terms of values. Therefore, the value of taking an action a in any state s of the underground terrain is measured using the action-value function, called the Q-value Q^π(s, a). When a robot starts from state s, takes action a, and follows a policy π, an expected return, defined as the sum of the discounted rewards, is achieved.

In this research, Q-learning, a method of RL, is explored. The purpose of RL methods is to learn the Q^π(s, a) values so as to achieve optimal actions in the states. QL is an online RL method that requires no model for its application and stores the reinforcement value outcomes in a lookup table. The QL architecture used in this work consists of learning threads, which amount to the number of robots involved in the inspection behavioural task. Each robot in a learning thread carries out Q-learning in the environment. Algorithm 1 explains the QL algorithm used in the behavioural model.

α is the learning rate, set between 0 and 1. At 0, Q-values are never updated and hence nothing is learned; learning occurs most quickly at 1. γ is the discount rate, also set between 0 and 1; it models the fact that future rewards are worth less than immediate rewards. max_{a'} Q(s', a') is the maximum reward attainable in the state following the current state, that is, the reward for taking the optimal action thereafter.

QL is a competitive and search-based algorithm inspired by computational theory. It is not necessarily a multiagent algorithm but can be adapted to a multiagent or multigoal scenario. The success of this algorithm relies on the value and policy iterations, which can be adjusted by some bias (heuristics) to fit the current problem scenario. The most competitive action is selected by its value, and each action leads to another state or condition. Both value and policy are updated after each decision. Harnessing QL for an MRS scenario increases the cost exponentially and the overall performance drops in the same direction: as the number of robots increases, costs such as completion time, memory usage, and awareness factor


Table 1: Variants of ACO.

  S/N   Year   Algorithm
  1     1991   Ant system (AS)
  2     1992   Elitist AS
  3     1995   Ant-Q
  4     1996   Ant colony system (ACS)
  5     1996   Max-Min AS (MMAS)
  6     1997   Rank-based AS
  7     1999   ANTS
  8     2000   BWAS
  9     2001   Hypercube AS

(other robots in the environment), and search time increase. However, following our heuristic model of QL, which is determined mainly by the policy we set to achieve our goal, our QL performs well above the traditional QL.

2.3. Ant Colony Optimization. ACO is a type of swarm intelligence (SI). Bonabeau et al. [14] defined SI as any attempt to design algorithms or distributed problem-solving devices inspired by the collective behaviour of social insect colonies and other animal societies. This implies that anytime something is inspired by swarms, it is called swarm intelligence. Researchers have been thrilled by swarms because of some of their fascinating features: for instance, the coordinated manner in which insect colonies work despite having no single member in control, the coordinated way in which termites build giant structures, the way flocks move as one body, and the harmonized way in which ants quickly and efficiently search for food can only be attributed to an emergent phenomenon [15, 16].

Ants, an example of a social insect colony, achieve their self-organizing, robust, and flexible nature not by central control but by stigmergy. Stigmergy, also known as a pheromone trail, describes the indirect communication that exists within the ant colony. This indirect communication, which is triggered by pheromone trails, helps in the recruitment of more ants to the system. Ants also use pheromones to find the shortest paths to food sources. Pheromone evaporation prevents stagnation, which also helps to avoid premature convergence on a less-than-optimal path [17].

ACO is necessarily a multiagent-based algorithm. The first implementation of this optimization algorithm was on a classical search-based (combinatorial) problem, the travelling salesman problem, giving the shortest path to a specified destination. After this feat, many researchers have used it or its variants to model different problems. In this work, a variant of ACO is used to find the optimal path for MRS. Table 1 lists variants of ACO and their meanings [18]; see also [19] and all references therein.

In AS, the pheromones are updated by all the ants that complete a tour. ACS is a modified version of AS which introduces the pseudorandom proportional rule. In elitist AS, ants that belong to the edges of the global best tour get an additional amount of pheromone during the pheromone update. MMAS introduces upper and lower bounds on the values of the pheromone trails. All solutions are ranked according to their length in rank-based AS [20].

Our interest is in implementing ACS to find the best possible round-trip inspection path in our model environment. This best possible inspection path is supplied as input, or made available, to the robots using QL for inspection decisions. In most route finding scenarios, the nodes are linearly joined and not largely distributed. Ours is a scenario where ants can forage along at least two paths (a multigoal path environment, though in our case there is a path exposed to four different goals; see Section 3).

3 Proposed Cooperative MRSBehaviour Framework

The following are some of the factors we are considering inthe course of building the model Each robot has to learnto adjust to the unknown underground mine environmentIn this case each robot requires intelligence and a suit-able machine learning algorithm is adopted No robot hasglobal information of the environment because of its limitedsensing capabilities Each robot has limited communicationcapabilities therefore each robot has to keep track of theinformation of others to remain in a team Figure 2 describesthe proposed framework as an approach for building thedistributed coordinated inspecting model for MRS

The framework indicates three layers of the model thebottom layer the middle layer and the topmost layer Thelearning capability of the MRS is achieved in the bottomlayer with the reinforcement learning algorithm and swarmintelligence technique This intelligence enables robot A totake action knowing the action of robot B and vice versa Atthe middle layer scalability in terms of the number of robotsthe system can accommodate is achieved This is expedientbecause a team of robots tends to achieve tasks more quicklyand effectively than single robots This scalability is handledwith some memory management techniques as indicated inFigure 2The real life implementation is achieved by using theinformation acquired from the topmost layer Figure 3 is thebreakdown of the framework in Figure 2

There is a base station or server that serves as a backup for the information captured and analysed from individual robots. The model proposed in this research deals with the way in which robots need to find their way within communication range and uses a broadcast approach for effective communication of navigation status. A team of robots cooperatively inspecting an area in the underground mine will need to know where they are and where to go next, so it is obviously a continuing problem, and our contribution in this case is that before robot R1 takes an action, it broadcasts its location and inspection status to other robots R2, R3, and so forth, and vice versa. An unreachable robot receives packets of information based on the destination address through rerouting from the nearer robots. The reliability of this broadcast method is the ability to determine the extent of the task executed already by looking at the memory of any leading robot in the team.


Figure 2: Framework of the proposed QLACS model for cooperative behaviours. (Bottom layer, cooperative behavioural layer: QLACS intelligence built from QL and ACS. Middle layer, scalability layer: memory usage and a global database for a scalable system. Topmost layer, application layer: MRS mine inspection before miners resume operation, supported by a gas meter, roof sound sensor, local database, distributed knowledge base access, simple decision making, and interrobot communication.)

Table 2: States and possible actions of the environment.

State | Possible actions
1. A (lower left part, LLP) | Inspect, ignore
2. B (lower middle part, LMP) | Inspect, ignore
3. C (lower right part, LRP) | Inspect, ignore
4. D (middle left part, MLP) | Inspect, ignore
5. E (central part, MCP) | Inspect, ignore
6. F (upper left part, ULP) | Inspect, ignore
7. G (upper right part, URP) | Inspect, ignore
8. H (outside mine part, OMP) | Shutdown

3.1. Problem Formulations. Suppose we have seven rooms/states connected by doors/links representing underground mine regions, as shown in Figure 4 and labelled as shown in Table 2. We label each room A through G. The outside of the mine can be thought of as one big room (H). Notice that doors from F and C lead outside the mine to H. We put two robots in rooms F and C as their starting states, respectively. The robots inspect one state at a time, considering the obstacles encountered in the process. The robots can change direction freely, having links/doors to other rooms/states. In each of the states in Figure 4, two actions are possible: inspection of roof cracks (RC) and level of toxic gases (TG), or ignoring the state, as it has been inspected earlier. According to our proposed model, a robot can inspect at least half of the given underground mine region to attain its maximum performance, which in turn attracts a good reward. When the current state of a robot is the end point of the inspection task, the robots exit the mine using the exit points C and F, respectively. The possible transition states of the robots are displayed using the state diagram in Figure 5. The state diagram and the transition matrix are formed using the QL algorithm.

The global map is assumed to be known by the robots, but there is no prior knowledge of the local map. Robots only know what they have sensed themselves and what their teammates communicate to them. Not only does our improved QL depend on the map of the environment, but each robot also learns through experience about local changes. They explore and inspect from state to state until they get to their goal states. Our proposed model QLACS achieves an offline mode for the route finding algorithm (ACS). This means that the global view of the map would have been provided before the learning robots start.

The model is explained using simulations; the results are presented in Section 4. Some of the results also have graphical interpretations. Analysis by a state transition diagram is


Table 3: Initial reward matrix. Entries give the possible rewards (50, 100) for inspecting or ignoring the next state; a dash marks the absence of a transition; 150 is the completion reward on entering H.

Robot's state \ Robot's action | A | B | C | D | E | F | G | H
A | — | 50, 100 | — | 50, 100 | — | — | — | —
B | 50, 100 | — | 50, 100 | — | 50, 100 | — | — | —
C | — | 50, 100 | — | — | — | — | 50, 100 | 150
D | 50, 100 | — | — | — | 50, 100 | 50, 100 | — | —
E | — | 50, 100 | — | 50, 100 | — | 50, 100 | 50, 100 | —
F | — | — | — | 50, 100 | 50, 100 | — | 50, 100 | 150
G | — | — | 50, 100 | — | 50, 100 | 50, 100 | — | —
H | — | — | 50, 100 | — | — | 50, 100 | — | —
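For readers who wish to prototype the model, the reward structure of Table 3 can be captured in a few lines of Python. This is only an illustrative sketch with names of our own choosing (EDGES, reward); it is not code from the paper.

    # Hypothetical sketch of the reward structure in Table 3.
    # Reachable transitions carry a reward of 50 (inspect) or 100 (ignore);
    # transitions into the goal state H carry the completion reward 150.
    INSPECT, IGNORE, GOAL = 50, 100, 150

    EDGES = {                        # adjacency taken from Figure 4 / Table 3
        'A': ['B', 'D'],
        'B': ['A', 'C', 'E'],
        'C': ['B', 'G', 'H'],
        'D': ['A', 'E', 'F'],
        'E': ['B', 'D', 'F', 'G'],
        'F': ['D', 'E', 'G', 'H'],
        'G': ['C', 'E', 'F'],
        'H': ['C', 'F'],
    }

    def reward(state, nxt, action):
        """Reward for moving from `state` to `nxt` and taking `action`."""
        if nxt not in EDGES[state]:
            return None              # no edge: the dash entries in Table 3
        if nxt == 'H':
            return GOAL              # completion / shutdown reward
        return INSPECT if action == 'inspect' else IGNORE

    print(reward('F', 'H', 'inspect'))   # 150
    print(reward('A', 'B', 'ignore'))    # 100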

Figure 3: Breakdown of the framework. (a) Contributions of the framework, layer by layer: bottom layer, a new hybrid model (QLACS) and an efficient cooperative and navigational model; middle layer, an efficient scalable system; topmost layer, acquired results stored for future use. (b) Processes of the multirobot behavioural system: acquire information from the bottom layer; build a hybrid model using the QL and ACS algorithms; test two robots for shortest round trip paths, good communication, time of performance, number of iterations used, and thorough inspection; if they are not successful, add one more robot to the system; otherwise record the results, monitor memory with the utilization interface platform, store the results in the database for use in the application layer, and exit the system.

presented in Figure 5. The possible state actions of the robots are presented in Table 3. The states of the robots are reduced as follows: searching or inspecting for roof cracks and toxic gas levels in each state or room and recording the outcome of the inspection in the robots' memories. The processes involved in achieving good communication while the robots inspect the states are broadcasting inspected states to the other robots and ignoring the inspected/broadcasted state, avoiding collision with obstacles and other robots, and finally moving to the next available state that has not been inspected.

The Q-learning algorithm is used to determine what action is to be selected in any particular state. Let the action (a) = {inspect, ignore, shutdown}, the state space (s) = {dimensions in the topological map}, the sensor readings (s_r), and the hazardous conditions (H_c) = {roof crack (RC), toxic gas level (TG)}. Each robot is configured with an aerial scanner and a chemically sensitive sensor that provide readings (s_r) to determine whether there is a hazardous condition H_c in the environment. The selection of an action by a robot through the Q-learning algorithm is based on the information from the broadcast. The whole framework uses a decentralized Q-learning scheme such that each robot has its own thread with its own Q-learning and table. All Q-values on the tables of the robots in the team are initialized to zero; that is, states corresponding to such positions have not been inspected. When a robot enters a new state, it broadcasts its information for the current state to all other robots in the environment. The broadcast indicates whether the current state has been


Figure 4: Model of the environment with two entrances and exits (2EE). (States A to G form the mine interior and H is outside the mine; the two robots start at the entrances F and C, and a link exists between A and B.)

Figure 5: Events and transition diagram of the modelled environment with H as the goal state. (Transitions between adjacent states carry rewards of (50, 100); transitions from C and F into H carry the reward 150.)

inspected or not. The broadcast is a search of the corresponding positions in the memories of all visible robots. If the resultant search returns zero, the broadcast indicates that the state has not been inspected; if it returns a value greater than zero, it indicates that the state is not available for inspection. All robots must receive a broadcast before they act in any particular state. When a robot gets to a state, it receives a broadcast and passes its value to the QL algorithm to make a decision. The robot carries out the decision of the QL algorithm. The policy of the Q-learning algorithm makes a robot carry out an inspection if the resultant search of the broadcast returns zero and ignore the state if the resultant search is greater than zero. A robot only shuts down if all states have been visited and inspected. As the broadcast information is passed to the Q-learning algorithm, the policy is iterated towards an optimal value and condition.

The broadcast avoids the cost of multiple inspections, and the QL ensures that robots take appropriate actions. The only state in which inspection is carried out would have sensor readings (s_r) indicating H_c. For taking an action (a) in state (s), the robot gets a reward (R), which is used in (10) to compute the Q-value for the current state, and can send a broadcast to other robots. Figure 5 and Table 3 show the restrictions and possible transitions among node points, indicating the possible rewards for any particular action (a) in any state (s). Every transition other than reaching the goal state could result in a reward of 50 or 100, and at completion the reward is the maximum, 150. We consider it wasteful to make the robots scan first before sharing intelligence, because such action would slow them down and make them expend more energy. Robots are to provide the inspection results showing actions taken in different states and the nature of conditions detected in any particular state. The introduction of the broadcast approach to determine the team's exploration reduces the execution time and energy cost of the whole team and makes the collaboration effective. So, in a multirobot scenario, the task can be effectively executed if the robots are made to share intelligence as they progress. Robots do not have to waste time and energy in doing the same task already carried out by other robots in the team.
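As an illustration of the broadcast just described, the lookup across all robots' Q-tables and the resulting inspect/ignore/shutdown choice can be sketched as follows; the function names are hypothetical and the sketch deliberately omits the Q-value update itself.

    # Sketch of the broadcast check: a robot inspects a state only if no robot
    # in the team has a nonzero Q-value recorded for it (i.e. it is uninspected).
    def broadcast_lookup(state, q_tables):
        """Return the largest Q-value any robot holds for `state`."""
        return max(q.get(state, 0) for q in q_tables)

    def decide(state, q_tables, goal_state='H'):
        q = broadcast_lookup(state, q_tables)
        if state == goal_state:
            return 'shutdown'
        return 'inspect' if q == 0 else 'ignore'

    # Example: robot 2 already inspected B, so robot 1 ignores it.
    q_r1, q_r2 = {'F': 50}, {'C': 50, 'B': 50}
    print(decide('B', [q_r1, q_r2]))   # ignore
    print(decide('D', [q_r1, q_r2]))   # inspect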

GIVEN. As a mathematical justification of the above, suppose a safety preinspection of toxic gases or rock fall, or some combination of the two, is being carried out on a piece of complex underground terrain in Figure 1, say L km²; there is a limited number of MRS with different capacities R and precious inspection time T minutes. Every region/state x_1 in the terrain requires a capacity R_1 of MRS and a limited time T_1, while every state x_n requires capacity R_n of MRS and inspection time T_n. Let P_1 be the positive reward of QL for correct behaviour on state x_1 and let P_n be the reward on state x_n. This research aims to maximise positive rewards by choosing optimal values for the states x_1, ..., x_n as follows:

Maximise
    P_1·x_1 + P_2·x_2 + ... + P_n·x_n   (objective function)

subject to the constraints
    x_1 + x_2 + ... + x_n ≤ L   (limited total states),
    R_1·x_1 + R_2·x_2 + ... + R_n·x_n ≤ R   (limited MRS capacity),
    T_1·x_1 + T_2·x_2 + ... + T_n·x_n ≤ T   (limited inspection time),

and the non-negativity constraints
    x_1 ≥ 0, x_2 ≥ 0, ..., x_n ≥ 0   (MRS cannot inspect negative states).   (1)
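Formulation (1) is a small linear programme, so any off-the-shelf LP solver can be used to explore it. The sketch below uses scipy.optimize.linprog with invented coefficients purely for illustration; since linprog minimises, the reward vector is negated.

    # Illustrative-only solution of the linear programme in (1) with made-up
    # coefficients; scipy minimises, so the rewards are negated to maximise.
    import numpy as np
    from scipy.optimize import linprog

    P = np.array([50, 50, 100])        # hypothetical QL rewards per state group
    R_cap = np.array([1, 1, 2])        # robot capacity needed per state group
    T_cap = np.array([3, 4, 5])        # inspection time needed per state group

    res = linprog(
        c=-P,                          # maximise P·x  ==  minimise -P·x
        A_ub=[np.ones(3), R_cap, T_cap],
        b_ub=[7, 4, 20],               # limits L (states), R (capacity), T (time)
        bounds=[(0, None)] * 3,        # non-negativity constraints
    )
    print(res.x, -res.fun)             # chosen x and the maximised reward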

In terms of solving this optimization problem, we use the proposed QLACS model to compare the time and the number of states inspected. The number of robots used is also compared. The graphed results in Section 4 also give more insight into the solution of the problem.

3.2. Basic Navigation and Cooperative Behaviours Using QLACS. QLACS has two components. The first component is formed by an improved ACS and the second component


is formed by an improved QL. The improvement occurs because some heuristics were added to the ordinary QL and ACS to achieve the hybrid QLACS. However, the second component of QLACS, which is an improved QL, was initially used to solve the proposed problem. After much analysis, we realized that the system needs to be optimized for effective cooperation and communication.

Using the second component of QLACS to solve the basic navigation and cooperative behaviour, the possible actions were set for each robot as inspect, ignore, and shutdown (after reaching the goal state H). Also, a reward system that reflects the possible actions of the robots was chosen. In other words, a robot gets 150 points only when the goal is achieved (shutdown), 100 points for ignoring an already inspected area (ignore), and 50 points for inspecting an uninspected area (inspect). Figure 5 shows the transition events of the model, and Table 3 displays the possible state actions for each robot. The way the QLACS second component works here is based on both navigation and communication behaviours.

Achieving navigational behaviour with the second component of QLACS has some cost associated with it. In our scenario, because we want the robots to share intelligence by broadcasting results (robots search through other robots' memory), our problem is not solely navigational but also cooperative. Our behavioural actions are inspect, ignore, and shutdown. We noted that these actions of interest are not navigation oriented; there is no way we could use them in making decisions on transitions. The functions for decision can be integrated to assist each other. Therefore, our behavioural model is an integration of two behaviours: (1) navigational behaviour and (2) cooperating behaviour through decision making. The integration works with a route finding method called RandomStateSelector, which we introduced in the second component of QLACS to help determine where the robot goes from the starting point to the exit point. Two parts of the RandomStateSelector method are introduced in this work. The first one is RandomStateSelector_C_H, which is used to transit from state C to H, and the second one, RandomStateSelector_F_H, transits from state F to H. This method works, but not effectively, because some of the states are repeated several times as a result of the random selection method. However, the decision part of this second component of QLACS, which is handled by a method called CheckInspection, worked efficiently. CheckInspection is responsible for sharing the broadcast among the agents. The broadcast looks at all the stored Q-values on all the robots and returns a signal for the action that needs to be taken. Therefore, we concluded that the heuristically accelerated component of QLACS has proven to be effective by showing evidence of effective communication in making inspection decisions using our model. It did not guarantee the shortest possible time for inspection because of repeated state decisions. In this light, we only harnessed the communication strength of the second component of QLACS for communication and cooperation. Figure 5 and Table 3 form the basis for the QLACS second component.

To take care of the random state selector problem encountered in implementing the algorithm used for the second part of QLACS, we introduced an optimized route finder

Table 4: Combined adjacency and weight matrix.

  | F | G | E | D | A | B | C | H
F | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1
G | 1 | 0 | 2 | 0 | 0 | 0 | 1 | 0
E | 1 | 2 | 0 | 2 | 0 | 2 | 0 | 0
D | 1 | 0 | 2 | 0 | 2 | 0 | 0 | 0
A | 0 | 0 | 0 | 2 | 0 | 2 | 0 | 0
B | 0 | 0 | 2 | 0 | 2 | 0 | 1 | 0
C | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1
H | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0

Figure 6: Weighted map/graph of the model environment. (Edge weights of 1 or 2 correspond to the number of obstacles between adjacent states, as listed in Table 4.)

algorithm. This route finder algorithm, which forms the first component of QLACS, is a swarm intelligence technique. Figure 6 and Table 4 form the basis of the exploration space of the agents (ants), which achieved the optimum route finding. The weighted graph in Figure 6 is based on the number of obstacles the ants will encounter from the points of entry F and C. The combined table in Table 4 contains the weights of the obstacles and evidence of an edge between any two vertices (states); it shows whether there is a connection between any two vertices in the graph. Generally, a "1" or "2" depicts the existence of an edge, while a "0" represents the absence of an edge, that is, no transition between such vertices. The constructed graph is an undirected multigraph, providing evidence of some agents coming from F or C of the mine (logical space). It is bidirectional because agents can move in either direction (multigraph); this means that the same weights on the edges apply to both directions. The graph does not show H; we assume that once the agents reach F or C, they exit if all inspections have been done.
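To make the traversal reproducible, the combined adjacency and weight matrix of Table 4 can be transcribed directly into an array; the helper cost function below is ours and simply sums the obstacle weights along a trail.

    # Combined adjacency/weight matrix from Table 4 (0 = no edge, 1 or 2 = number
    # of obstacles on the edge).  Node order: F, G, E, D, A, B, C, H.
    import numpy as np

    NODES = ['F', 'G', 'E', 'D', 'A', 'B', 'C', 'H']
    W = np.array([
        [0, 1, 1, 1, 0, 0, 0, 1],   # F
        [1, 0, 2, 0, 0, 0, 1, 0],   # G
        [1, 2, 0, 2, 0, 2, 0, 0],   # E
        [1, 0, 2, 0, 2, 0, 0, 0],   # D
        [0, 0, 0, 2, 0, 2, 0, 0],   # A
        [0, 0, 2, 0, 2, 0, 1, 0],   # B
        [0, 1, 0, 0, 0, 1, 0, 1],   # C
        [1, 0, 0, 0, 0, 0, 1, 0],   # H
    ])
    assert (W == W.T).all()          # the graph is undirected

    def cost(path):
        """Total obstacle weight along a trail such as 'FEDABCGC'."""
        return sum(W[NODES.index(a), NODES.index(b)] for a, b in zip(path, path[1:]))

    print(cost('FEDABCGC'))          # 10, the optimal trail reported later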

3.3. Hybrid Model Development. The parameters for building the analytical hybrid QLACS model (equations (2) through (11)) are presented in Tables 5 and 6.

3.3.1. Algorithmic and Mathematical Analysis

ACS starts. Computation of edge attractiveness:

    η_ij = 1 / D_ij.   (2)


Table 5: A list of parameters for the ACS model.

ACO parameter | Meaning
α | Pheromone influence factor
β | Influence of adjacent node distance
ρ | Pheromone evaporation coefficient
Q | Attractiveness constant
e | Visited edge
e′ | Edge not visited
L_k | Length of the tour of ant k
τ | Pheromone concentration (amount)
η | Specific visibility function (attractiveness)
Δτ_k | Pheromone concentration deposited by the kth ant
P_r(i, j) | Probability of moving from i to j
D_ij | Visibility or distance between i and j
f_i | Fitness of individual i in a population
P_i | Probability of being selected among the f_i
N | Number of individuals in the population
i, j | Any two adjacent nodes in the graph
M_k | Set of unvisited nodes

Table 6: A list of parameters for the QL model.

QL parameter | Meaning
Q | Q-value update
s | State
a | Action
R | Reward
γ | Learning rate

Computation of instantaneous pheromone by ant k:

    Δτ_k = Q / L_k.   (3)

Update of pheromone:

    τ_ij = (1 − ρ) · τ_ij + Δτ_ij^k.   (4)

Computation of edge probability:

    P_r(i, j) = [τ_ij]^α [η_ij]^β / Σ_{e′=(i,j)} [τ_ij]^α [η_ij]^β.   (5)

Adoption of the roulette wheel:

    Cumulative(P_r(i, j)) = Σ_{i=1}^{N+1} P_r(i, j),   (6)

    f_i = (Σ_{j=1}^{N} f_j) / N,   (7)

    P_i = f_i / Σ_{j=1}^{N} f_j.   (8)

Equations (2) through (8) build the complete route finding model. Equations (2) through (4) are prerequisite to (5), and equation (5) is prerequisite to roulette wheel selection. At the end of (8), new states are selected and the trail is updated. The best path from both directions is selected and used as input to Q-learning. Note that D_ij is the weight on the edges, P_r(i, j) is the chance of moving to a node along edge E(i, j), and L_k is the sum of visited nodes by ant k.
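A minimal sketch of one ant transition built from equations (2)–(6) is given below, using the Example I parameter values; the function and variable names are ours and this is not the authors' implementation.

    import random

    ALPHA, BETA, RHO, Q = 3, 2, 0.01, 2.0      # values used in Example I

    def ant_step(current, neighbours, dist, tau, tour_len):
        # One ACS move: equations (2)-(4) update the pheromone on each candidate
        # edge, equation (5) gives the edge probabilities, and the roulette wheel
        # of equation (6) picks the next state.
        eta = {j: 1.0 / dist[(current, j)] for j in neighbours}            # (2)
        delta = Q / tour_len                                               # (3)
        for j in neighbours:
            tau[(current, j)] = (1 - RHO) * tau[(current, j)] + delta      # (4)
        weights = {j: tau[(current, j)] ** ALPHA * eta[j] ** BETA
                   for j in neighbours}                                    # numerator of (5)
        total = sum(weights.values())
        r, acc = random.random() * total, 0.0                              # roulette wheel (6)
        for j, w in weights.items():
            acc += w
            if acc >= r:
                return j
        return neighbours[-1]

    # First move of Example I: from F, all three candidate edges have one obstacle.
    dist = {('F', 'D'): 1, ('F', 'E'): 1, ('F', 'G'): 1}
    tau = {edge: 0.01 for edge in dist}
    print(ant_step('F', ['D', 'E', 'G'], dist, tau, tour_len=1), tau)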

QL starts. Each robot, in its own QL thread, computes its learning rate:

    γ = 0.5 / [1 + Frequency(s, a)].   (9)

Q-values are updated:

    Q(s, a) = R(s, a) + γ · max_a′ Q(s′, a′),   (10)

making a broadcast (Decision = Inspect/Ignore/Shutdown):

    Q(s, a):  Inspect,  if Q = 0 and s_j ≠ goal state;
              Ignore,   if Q > 0 and s_j ≠ goal state;
              Shutdown, if Q ≥ 0 and s_j = goal state.   (11)

Equation (9) gives gamma, the learning rate, which always lies between zero and 1. It is calculated based on the frequency of action of each robot in inspecting states. Equations (9) to (11) are state dependent. The states are kept in a buffer and then accessed at run time. ACS and QL do not work simultaneously: ACS runs to completion and QL takes the final output as its input. ACS is not repeatedly called while QL is working.
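Equations (9)–(11) translate into three small functions. The sketch below mirrors the worked values that appear later in Table 10; it is illustrative only and uses our own names.

    def learning_rate(freq):
        """Equation (9): gamma = 0.5 / (1 + frequency of the (s, a) pair)."""
        return 0.5 / (1 + freq)

    def q_update(reward, gamma, next_qs):
        """Equation (10): Q(s, a) = R(s, a) + gamma * max over next-state Q-values."""
        return reward + gamma * max(next_qs, default=0)

    def decision(q_value, state, goal_state='H'):
        """Equation (11): inspect fresh states, ignore inspected ones, shut down at H."""
        if state == goal_state:
            return 'shutdown'
        return 'inspect' if q_value == 0 else 'ignore'

    # First steps of Table 10: robot 1 inspects F, then later ignores an inspected A.
    print(q_update(50, learning_rate(1), [0, 0, 0]))      # 50.0
    print(q_update(100, learning_rate(0.25), [0, 0, 0]))  # 100.0
    print(decision(50, 'A'))                              # ignore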

Our behavioural model is an integration of two algorithms: (1) a route finding algorithm and (2) a communication and cooperation algorithm. The integration works in the following way. The optimal route finder (ACS) determines where the robot goes from the starting point to the destination, while QL determines what action it takes when it gets to any of the states. This collaboration works effectively because the optimal route finder has been proven to give the best possible transitions, and the heuristically accelerated QL has proven to be effective by showing evidence of effective communication in making inspection decisions. Consequently, it guarantees the shortest possible time for inspection in the absence of wasteful inspection decisions. This framework forms the basis for our cooperative behaviour model for MRS (QLACS). The pseudocode for the implementation of QLACS is outlined in Algorithm 2.

3.3.2. QLACS Hybrid Approach Evolution. Figure 7 is the architecture of the hybrid QLACS, explaining the handshake between the two components.

(i) Graph Construct Module. This is the interconnection of states in our model environment (see Figure 4). It encompasses all possible transitions from F to C and vice versa. Thus, from the model we construct an adjacency weight graph matrix that can be traversed by any graph-oriented algorithm.


INPUT: Edge distances (obstacles), pheromones, ants' trail, associated probabilities, starting and terminating indexes, that is, from F or C.
OUTPUT: Effective cooperation, inspection, and navigation.

(1) Boolean CompletedFlag = False            // indicates completion for all the threads
(2) Declare CompletedThreadBuffer            // data structure storing info about completed threads
(3) Initialize all Q-values to zero          // all Q-value positions are initialized to zero
(4) Initialize best paths from the ACO algorithm, starting from F and C
(5) While (CompletedFlag <> True)            // checks whether all robots have finished and the flag is true
    Begin
        Create N number of threads in parallel
        Threads get current states in parallel from the ACO algorithm
        Threads get next states and indexes in parallel from the ACO algorithm
        Threads compare Q-values of all corresponding positions of current states (each robot broadcasts its Q-value info)
        IF ((Q == 0) AND (ThreadNextState <> GoalState))     // checks if a particular state is available
        Begin
            State is available; robot with the current thread ID inspects
            Compute and update Q-value
        End
        IF (Q > 0)                                           // a state is not available, because an already inspected state has Q > 0
        Begin
            State is already inspected; robot with the current thread ID ignores
            Compute and update Q-value
        End
        IF ((Q == 0) AND (ThreadNextState == GoalState))     // checks for the goal state and shuts down
        Begin
            Compute and update Q-value
            Goal state is reached and inspection completed
            Thread with the current thread ID shuts down
            Store current thread ID in CompletedThreadBuffer
        End
        IF (Count[CompletedThreadBuffer] == NumberOfRobots)  // learning stops when this happens
        Begin
            CompletedFlag = True
        End
    End of While loop

Algorithm 2: Pseudocode for QLACS.
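Algorithm 2 can be prototyped with one thread per robot sharing a lock-protected Q-table. The sketch below is a simplified reading of the pseudocode (a single shared table rather than one table per robot, and the raw reward without the discounted term of (10)); all names are hypothetical.

    import threading

    LOCK = threading.Lock()

    def robot_thread(name, path, q_table, goal_state, log):
        # One robot follows its ACS path and applies the QLACS decision rule:
        # inspect a state whose shared Q-value is still zero, otherwise ignore it,
        # and shut down on reaching the goal state.
        for state in path:
            with LOCK:                      # the broadcast becomes a shared lookup
                if state == goal_state:
                    log.append((name, state, 'shutdown'))
                    return
                if q_table.get(state, 0) == 0:
                    q_table[state] = 50     # reward only; the gamma term of (10) is omitted
                    log.append((name, state, 'inspect'))
                else:
                    log.append((name, state, 'ignore'))

    q_table, log = {}, []
    paths = {'R1': list('FEDABCG') + ['H'], 'R2': list('CBADFEG') + ['H']}
    threads = [threading.Thread(target=robot_thread, args=(n, p, q_table, 'H', log))
               for n, p in paths.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(log)                              # every state is inspected by exactly one robot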

Figure 7: Hybrid architecture. (The first component chains the graph construct, initialize parameters, state selection, transition, update, and convergence decision modules; its output feeds the second component, whose cooperative inspection, state broadcasting, state action, QLACS update, new state, decision, and goal state modules serve the two robots.)


Example I of QLACS (using Figure 5 for optimized route finding).
Parameters used in the first component of QLACS: Q = 2.0, α = 3, β = 2, ρ = 0.01.
Starting state: F. Terminating states: C, F.
Terminating condition: when all states have been visited at least once and the ant is at a terminating state.
State space = {F, G, E, D, A, B, C}.
Initialize all pheromone positions to 0.01.
Rand(0, 1) returns a random value between 0 and 1.
Equations:
(i) computation of attractiveness η_ij using (2);
(ii) computation of instantaneous pheromones by ant k for all potential states using (3);
(iii) pheromone update using (4);
(iv) computation of probabilities for the potential states using (5);
(v) adoption of the roulette wheel using (6).
Equation (5) can be reduced as follows: let w(i, j) = [τ_ij]^α [η_ij]^β and Sum = Σ w(i, j) over the potential states, so that P_r(i, j) = w / Sum.

Algorithm 3: Parameters for QLACS Example I.

In our case, there are eight states: primarily seven internal states and a common terminating state. Since the states are not just linear, the environment allows for multiple options; that is, an agent/ant can choose from any present state. This type of scenario is also called a multigoal scenario.

(ii) State Selection Module. Here the agents select the start and end states, which, according to our model in Figure 4, are F and C. These states are selected based on the cumulative probability of two adjacent nodes in (6).

(iii) Transition Module. This module takes care of the transition rules of the agents by calculating the pheromone concentrations, distances, and probabilities using (2) through (4).

(iv) Update Module. After a transition from one state to another, an update of the pheromone is computed, after which multipath planning with the shortest path is achieved.

(v) Convergence Decision Module. This is where the best trail/path decision is taken. This is the end product of the first component of QLACS, which is then moved to the second component of QLACS for cooperative behaviour.

(vi) Cooperative Inspection Module. This is where the robots are deployed to start inspection. The robots use the acquired best paths, starting from F and C, respectively, as input for the second component of QLACS. The two robots use the learning rate from (9) to learn the environment and use (10) for cooperation.

(vii) State Broadcasting Module. This module handles the broadcasting behaviours of the two robots, which are achieved by using (10). Each robot checks its memory, represented by Q-values, before taking any decision.

(viii) State Action Module. State broadcasting by each robot is immediately followed by action selection. In other words, the state to inspect or ignore is determined here using (11).

(ix) QLACS Update Module. After each action selection, the Q-values of each robot are updated using (10).

(x) New State Module. This module takes care of the robot's new state after updating the Q-values. Each robot runs in its own thread, managing its own memory yet sharing information.

(xi) Final Decision Module. This decision module determines whether the robot should exit the environment or still do more inspections. It is also controlled by (11).

(xii) Goal State Module. The completion of the second component of QLACS is getting to the goal state after successful inspection of the states. This goal state, according to our model in Figure 4, is called H.

3.3.3. Analytical Scenario. For the benefit of robotic and system engineers, practitioners, and researchers, this paper uniquely presents scenarios on the ease of implementation of the proposed QLACS, as described in Algorithms 3 and 4 and Tables 7 and 10. The first component of QLACS is explained in Example I, shown in Algorithm 3 and Table 7. The first component, which achieved optimal route paths for the robots, has the parameters listed in Algorithm 3. The calculation for the different state transitions of the first component of QLACS is analysed in Table 7.

Repeat steps 1–6 for subsequent current states until the termination condition and state are reached. At the end of seven updates we have Tables 8 and 9. The total length shown in Table 8 represents the total number of obstacles encountered by each agent while trailing to achieve the optimal route for the robots. The number of obstacles, shown


Table 7: QLACS Example I, navigational analytical solution.

Starting state F. Potential states: D, E, G.
(1) Use (2): η_FD = 1, η_FE = 1, η_FG = 1.
(2) Use (3): L_k = 1, so Δτ_k(FD) = 2/1 = 2, Δτ_k(FE) = 2, Δτ_k(FG) = 2.
(3) Use (4): τ_FD = (1 − 0.01) × 0.01 + 2 = 2.0099 ≈ 2.01; τ_FE = 2.01; τ_FG = 2.01.
First pheromone update: F = 0.01, G = 2.01, E = 2.01, D = 2.01, A = 0.01, B = 0.01, C = 0.01.
(4) Use (5): w(FD) = [τ_FD]^α [η_FD]^β = (2.01)^3 = 8.12; w(FE) = 8.12; w(FG) = 8.12. Sum = 8.12 + 8.12 + 8.12 = 24.36. P_r(FD) = 8.12/24.36 = 0.33; P_r(FE) = 0.33; P_r(FG) = 0.33.
Probabilities: F = 0, G = 0.33, E = 0.33, D = 0.33, A = 0, B = 0, C = 0, H = 0.
(5) Use (6): call Rand; Rand = 0.07 falls in state E, so the roulette wheel selects E as the next state. End of roulette wheel.
(6) Update trail: F → E.

Current state E. Potential states: B, D, G.
(1) Use (2): η_EB = 1/2, η_ED = 1/2, η_EG = 1.
(2) Use (3): L_k(F−E−D) = 1 + 2 = 3; L_k(F−E−B) = 1 + 2 = 3; L_k(F−E−G) = 1 + 2 = 3; Δτ_k = 2/3 = 0.67.
(3) Use (4): τ_EB = (1 − 0.01) × 2.01 + 0.67 = 2.66; τ_ED = 2.66; τ_EG = 2.66.
Second pheromone update: F = 0.01, G = 2.66, E = 2.01, D = 2.66, A = 0.01, B = 2.66, C = 0.01.
(4) Use (5): w(EB) = (2.66)^3 × (1/2)^2 = 4.71; w(ED) = 4.71; w(EG) = (2.66)^3 × (1)^2 = 18.82. Sum = 4.71 + 4.71 + 18.82 = 28.24. P_r(EB) = 4.71/28.24 = 0.17; P_r(ED) = 0.17; P_r(EG) = 18.82/28.24 = 0.67.
Probabilities: F = 0, G = 0.67, E = 0, D = 0.17, A = 0, B = 0.17, C = 0, H = 0.
(5) Use (6): call Rand; Rand = 0.9 falls in state D, so the roulette wheel selects D as the next state. End of roulette wheel.
(6) Update trail: F → E → D.
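The arithmetic in Table 7 can be verified in a few lines; the snippet below merely recomputes the published pheromone and probability values for the first two transitions.

    # Numerical check of the Example I figures in Table 7.
    alpha, beta, rho, q = 3, 2, 0.01, 2.0

    # First transition from F: three candidate edges, each with one obstacle.
    tau = (1 - rho) * 0.01 + q / 1          # equation (4): 2.0099 ~ 2.01
    w = tau ** alpha * (1 / 1) ** beta      # equation (5) numerator: ~8.12
    print(round(tau, 2), round(w, 2), round(w / (3 * w), 2))   # 2.01 8.12 0.33

    # Second transition from E: edges to B and D have weight 2, to G weight 1.
    tau2 = (1 - rho) * tau + q / 3          # ~2.66
    w_bd = tau2 ** alpha * (1 / 2) ** beta  # ~4.7
    w_g = tau2 ** alpha * 1 ** beta         # ~18.8
    s = 2 * w_bd + w_g
    print(round(w_bd / s, 2), round(w_g / s, 2))               # 0.17 0.67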


Example II of QLACS (for good cooperation and communication between robots).
Parameters used in the second component of QLACS (using the output from the first component of QLACS).
Reward scheme: inspect = 50, ignore = 100, shutdown = 150.
State space: optimized path from QLACS R1 = {F, E, D, A, B, C, G, C} and QLACS R2 = {C, B, A, D, F, E, G, C}.
Starting states: F, C. S_j = terminating state C, then H.
Terminating condition: when all states have been visited.
Initialize all Q-value positions to zero.
Equations:
(i) compute the learning rate using (9);
(ii) compute the update on Q(s, a) using (10).

Algorithm 4: Parameters for QLACS Example II.

Table 8: Pheromone update of a full cycle.

Current state | F | G | E | D | A | B | C | Update
F | 0.01 | 2.01 | 2.01 | 2.01 | 0.01 | 0.01 | 0.01 | 1st update
E | 0.01 | 2.66 | 2.01 | 2.66 | 0.01 | 2.66 | 0.01 | 2nd update
D | 3.13 | 0.01 | 3.03 | 0.01 | 3.03 | 0.01 | 0.01 | 3rd update
A | 0.01 | 0.01 | 0.01 | 0.45 | 0.01 | 0.45 | 0.01 | 4th update
B | 0.01 | 0.01 | 0.67 | 0.01 | 0.67 | 0.01 | 0.7 | 5th update
C | 0.01 | 0.91 | 0.01 | 0.01 | 0.01 | 0.89 | 0.01 | 6th update
G | 0.71 | 0.01 | 0.69 | 0.01 | 0.01 | 0.01 | 0.71 | 7th update

C = terminating state, and ant k has moved through all states at least once. Trail: F → E → D → A → B → C → G → C. Number of obstacles = 1 + 2 + 2 + 2 + 1 + 1 + 1 = 10.

Table 9: Probability update of a full cycle.

Current state | F | G | E | D | A | B | C
F | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 0
E | 0 | 0.67 | 0 | 0.17 | 0 | 0.17 | 0
D | 0.69 | 0 | 0.16 | 0 | 0.16 | 0 | 0
A | 0 | 0 | 0 | 0.5 | 0 | 0.5 | 0
B | 0 | 0 | 0.16 | 0 | 0.16 | 0 | 0.68
C | 0 | 0.52 | 0 | 0 | 0 | 0.49 | 0
G | 0.45 | 0 | 0 | 0 | 0 | 0 | 0.45

C = terminating state.

as 10, is calculated using the trail result in Table 8. Table 9 displays the probability update table for a full cycle of route finding. In the case of this first example, the first robot will terminate its inspection through C and then H.

Algorithm 4 displays the parameters of the second example of QLACS. The second component of QLACS handles this aspect of the model, which is the communication and cooperation part. Once the first component hands its output to the second component, it becomes the second component's input and the second component runs with it. In Algorithm 4, QLACS R1 represents the input for robot 1 and QLACS R2 represents the input for robot 2, both received from the first component of QLACS. The terminating condition is when all states have been visited.

The rest of Example II, displayed in Table 10, explains the cooperative behavioural scenario from one state to another for the two robots. The tables in the last row of Table 10 show the communication and cooperation output achieved using the second component of QLACS.

4. Experimental Evaluations: Safety Inspections for MRS Behaviours

The experimental results of implementing QLACS in an environment that consists of obstacles and links are tabulated in this section. The experimental setup is explained in Section 4.1. Different performance categories are shown in this section: a without-communication category and a with-communication category. In the without-communication category, as displayed in Section 4.2, we found that robots can inspect all states individually without knowing that another robot exists. Robots can also inspect some of the states, thereby leaving some states uninspected. The communication category is explained in Section 4.3, while the performance of QLACS measured against other existing methods is tabulated in Section 4.4.

4.1. Experimental Setup. Figure 8 displays different sets of experiments conducted in the environment using two robots. Figure 8(a) shows how two robots resume inspection from two different entrances. In each set of experiments, the robots take readings of the roof cracks and the level of toxic gases using their sensors. The same behaviours happen in Figures 8(b) and 8(c), respectively, at different inspection locations. The inspection locations vary in the four experimental setups shown in Figure 8.

4.2. Experiment 1: Performance of QLACS without Cooperation. The result of implementing QLACS without communication in the proposed environment (Figures 1 and 4) is shown in Tables 11 and 12. In the case of Table 11, robot 1 (R1) enters the mine through state F, while robot 2 (R2) enters the mine through state C. However, each robot learns by inspecting some of the states and ignoring some of the


Table 10: QLACS Example II, cooperative behaviour.

Starting simulation. Robot 1, starting state F: (1) use (9): γ(F) = 0.5/[1 + 1] = 0.25; (2) check the Q-value for state F (use (11)): Q(F, a) = 0, selected action a = inspect; (3) use (10): Q(F, a) = 50 + 0.25(0) = 50. End of value iteration.

Current state E, Robot 1: (1) γ(E) = 0.5/[1 + 1] = 0.25; (2) Q(E, a) = 0, selected action a = inspect; (3) Q(E, a) = 50 + 0.25(max(0, 0, 0)) = 50. End of value iteration.

Current state C, Robot 2: (1) γ(C) = 0.5/[1 + 1] = 0.25; (2) Q(C, a) = 0, selected action a = inspect; (3) Q(C, a) = 50 + 0.25(max(0, 0, 0)) = 50. End of value iteration.

Current state B, Robot 2: (1) γ(B) = 0.25; (2) Q(B, a) = 0, a = inspect; (3) Q(B, a) = 50 + 0.25(max(0, 0, 0)) = 50. End of value iteration.

Current state D, Robot 1: (1) γ(D) = 0.25; (2) broadcast (use (11)): Q(D, a) = 0, a = inspect; (3) Q(D, a) = 50 + 0.25(max(0, 0, 0)) = 50. End of value iteration.

Current state A, Robot 2: (1) γ(A) = 0.25; (2) broadcast: Q(A, a) = 0, a = inspect; (3) Q(A, a) = 50 + 0.25(max(0, 0, 0)) = 50. End of value iteration.

Current state A, Robot 1: (1) γ(A) = 0.5/[1 + 0.25] = 0.5/1.25 = 0.4; (2) broadcast: Q(A, a) = 50, that is, Q > 0, so a = ignore; (3) Q(A, a) = 100 + 0.4(max(0, 0, 0)) = 100. End of value iteration.

Current state D, Robot 2: (1) γ(D) = 0.4; (2) broadcast: Q(D, a) = 50, Q > 0, so a = ignore; (3) Q(D, a) = 100 + 0.4(max(0, 0, 0)) = 100. End of value iteration.

Current state B, Robot 1: (1) γ(B) = 0.4; (2) broadcast: Q(B, a) = 50, Q > 0, so a = ignore; (3) Q(B, a) = 100 + 0.4(max(0, 0, 0)) = 100. End of value iteration.

Current state F, Robot 2: (1) γ(F) = 0.4; (2) broadcast: Q(F, a) = 50, Q > 0, so a = ignore; (3) Q(F, a) = 100 + 0.4(max(0, 0, 0)) = 100. End of value iteration.

Current state C, Robot 1: (1) γ(C) = 0.4; (2) broadcast: Q(C, a) = 50, Q > 0, so a = ignore; (3) Q(C, a) = 100 + 0.4(max(0, 0, 0)) = 100. End of value iteration.

Current state E, Robot 2: (1) γ(E) = 0.4; (2) broadcast: Q(E, a) = 50, Q > 0, so a = ignore; (3) Q(E, a) = 100 + 0.4(max(0, 0, 0)) = 100. End of value iteration.

Current state G, Robot 1: (1) γ(G) = 0.5/[1 + 1] = 0.25; (2) broadcast: Q(G, a) = 0, a = inspect; (3) Q(G, a) = 50 + 0.25(max(0, 0, 0)) = 50. End of value iteration.

Current state G, Robot 2: (1) γ(G) = 0.5/[1 + 0.25] = 0.4; (2) broadcast: Q(G, a) = 50, Q > 0, so a = ignore; (3) Q(G, a) = 100 + 0.4(max(50, 50, 50)) = 100 + 20 = 120. End of value iteration.

Current state C, Robot 1: (1) γ(C) = 0.5/[1 + 0.4] = 0.5/1.4 = 0.36; (2) broadcast: Q(C, a) = 50, so a = ignore; (3) Q(C, a) = 100 + 0.36(max(50, 50)) = 100 + 18 = 118. End of value iteration.

Current state C, Robot 2: (1) γ(C) = 0.5/[1 + 0.36] = 0.5/1.36 = 0.37; (2) broadcast: Q(C, a) = 50, Q > 0, so a = ignore; (3) Q(C, a) = 100 + 0.37(max(50, 50)) = 100 + 0.37 × 50 = 118.5. End of value iteration.

All QLACS R1 states exhausted. Goal state H, Robot 1: (1) γ(H) = 0.5/[1 + 1] = 0.25; (2) broadcast: Q(H, a) = 0, a = shutdown; (3) Q(H, a) = 150 + 0.25(max(0, 0)) = 150. End of value iteration.

All QLACS R2 states exhausted. Goal state H, Robot 2: (1) γ(H) = 0.25; (2) broadcast: Q(H, a) = 0, a = shutdown; (3) Q(H, a) = 150 + 0.25(max(0, 0)) = 150. End of value iteration.


Table 10: Continued.

End of policy iteration.

Robot 1 (inspect/ignore/shutdown): F — inspect; E — inspect; D — inspect; A — ignore; B — ignore; C — ignore; G — inspect; C — ignore, then shutdown through H.
Robot 2 (inspect/ignore/shutdown): C — inspect; B — inspect; A — inspect; D — ignore; F — ignore; E — ignore; G — ignore; C — ignore, then shutdown through H.
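Two of the Q-values in Table 10 can likewise be recomputed directly from equations (9) and (10); this is only a numerical check of the published figures.

    # Reproduce two entries of Table 10.
    gamma_g = 0.5 / (1 + 0.25)                 # equation (9): 0.4
    q_g = 100 + gamma_g * max(50, 50, 50)      # equation (10): robot 2 ignores G
    print(q_g)                                 # 120.0

    gamma_c = 0.5 / (1 + 0.4)                  # ~0.36
    q_c = 100 + gamma_c * max(50, 50)          # robot 1 ignores C
    print(round(q_c, 1))                       # ~117.9 (reported as 118 in Table 10)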

states. Since there is no communication, they exit the mine after learning, consequently not inspecting all the states. The same routine is reflected in Table 12, but in this case each robot ends up inspecting all the states before exiting the mine. Analysis of Tables 11 and 12 shows that resources are wasted and the inspection job is not effective and efficient. Comparing the two tables, the time and distance costs are high, though higher in Table 12 because of the many repetitions. For instance, the time and distance costs in Table 12, column 2, are 48.0028 and ((F, G, E, D, A, B, C), (C, B, A, D, E, G, F)), respectively. It also shows that the states are repeated in Tables 11 and 12. The memory usage is equally high. This led us to create a more efficient QLACS with communication by introducing some heuristics. The processes involved in Tables 11 and 12 are explained in Table 13.

One can see from Tables 11 and 12 that there is no good communication among the robots, hence the evolvement of Table 14.

4.3. Experiment 2: Performance of QLACS with Good Cooperation. The heuristic added to the known QL made this experiment show good communication. This is where our contribution to communication is shown. As a robot inspects and learns the environment, it broadcasts its Q-values, in the form of a lookup table, to the other robot. In this case, each robot checks the Q-values: when a Q-value is equal to zero, the robot randomly selects a state for inspection; if a Q-value is greater than zero, the robot checks whether the state is available for inspection or is to be ignored. When a robot encounters a state with a Q-value equal to zero and the thread next to the state is equal to the goal state (H), then it shuts down; it must have checked the lookup table to see that all states have been inspected. The result in Table 14 shows good communication between the two robots: no states were inspected more than once. The iterations for every run, with their times, memory usage, and effective communication, are also displayed in Table 14.

Comparing Table 14 with Tables 11 and 12, one cannot but notice the huge differences in the memory, time, and distance costs. The communication between the robots in Table 14 resulted in achieving good inspection; however, the random method of choosing the next inspection states did not give an optimized route, thereby increasing the time cost of passing through and checking already inspected states.

Figure 8: Different experimental behaviours for two robots. (a) Two robots starting inspection from two different entrances. (b) Two robots inspecting as they navigate in the environment. (c) Two robots inspecting different locations with different positions.

4.4. Experiment 3: Performance of QLACS for the Navigation Behaviour of the Proposed Model. The result of implementing the first component of QLACS for effective navigation of the proposed model is tabulated in Table 16. Table 15 shows


Table 11: QLACS without communication (inspecting some states).

Number of runs | 1 | 2 | 3 | 4 | 5 | 6 | 7
Iterations | 9 | 10 | 10 | 13 | 13 | 9 | 13
Time (sec) | 43.0025 | 30.0017 | 34.002 | 30.0017 | 31.0017 | 31.0018 | 27.0016
Memory usage (bytes) | 18872 | 18160 | 19204 | 18208 | 17896 | 18164 | 18308
Inspected states (R1) | G | F, G | C | C, F, D | B, E, A | F, E, D, A | C, F, G, E, A
Inspected states (R2) | B, A, G | C | C, B, A, E, F | B, A | C, B, E | B, A, E | A, G

Table 12: QLACS without communication (inspecting all states).

Number of runs | 1 | 2 | 3 | 4 | 5 | 6 | 7
Iterations | 114 | 10 | 70 | 10 | 9 | 10 | 10
Time (sec) | 43.0024 | 48.0028 | 41.0023 | 34.0019 | 34.0019 | 34.0019 | 35.002
Memory usage (bytes) | 28440 | 17960 | 27216 | 17000 | 18456 | 17672 | 17968
Inspected states (R1) | F, G, E, D, A, B, C (all runs)
Inspected states (R2) | C, B, A, D, E, G, F (all runs)

Table 13: Processes used in achieving Tables 11 and 12.

(a) For Table 11 (inspecting some states):
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zeroes.
(3) Select a random action if the Q-values of all possible actions at the current state are zeroes.
(4) Select the action with the highest Q-value if it is the maximum Q-value.
(5) Compute and update the Q-value of the selected action.
(6) Get the new state among the possible states.
(7) If the new state = goal state, then go to Steps 5 and 9.
(8) Repeat Steps 3 to 6 until Step 7.
(9) Shutdown.

(b) For Table 12 (inspecting all states):
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zeroes.
(3) Select a random action if the Q-values of all possible actions at the current state are zeroes.
(4) Select the action with the highest Q-value if it is the maximum Q-value.
(5) Compute and update the Q-value of the selected action.
(6) Get the new state among the possible states.
(7) If the new state = goal state, then go to Steps 5 and 10.
(8) Repeat Steps 3 to 7 until Step 9.
(9) All states except the goal state have taken Action = Inspect.
(10) Shutdown.

the selected parameters used in achieving the optimal path. The optimal path found after nine test runs is the path with a distance cost of 10 for both entries to the environment, displayed in Table 16. Therefore, columns 4, 5, 7, 8, and 9 can be used as the optimized path input for the first component of QLACS. QLACS will then use any of the optimized paths to navigate to the specified states and take decisions accordingly. For instance, the test run for nine ants gives 16 iterations under 60.0035 seconds, giving the shortest paths to both robots coming from entries F and C of the environment. The paths that give the cost of 10 are obtained from F-E-D-A-B-C-G-C (weights 1, 2, 2, 2, 1, 1, 1) and C-B-A-D-F-E-G-C (weights 1, 2, 2, 1, 1, 2, 1), respectively.

Alpha (α), Beta (β), and Rho (ρ) represent the heuristic properties of the ACS algorithm. The Alpha factor is the measure of the influence of pheromone concentration on the selection of the path with the highest pheromone concentration. Beta is the measure of the influence related to the distance between any two adjacent nodes; it is a heuristic factor that measures the influence of distance in selecting the next state and is not limited to pheromones. Rho has to do with the rate at which the pheromones evaporate; it shows how often new paths are explored by the agents/ants rather than reinforcing old paths. Each agent cooperates by having access to the other agents' pheromone values. The pheromone value is initialized to 0.01 and is continually updated until learning stops. This is similar to the first component of QLACS, where we initialize all QLACS positions to zero and update them for every new step.

Table 16 gives the optimized routes to be used by the robots to achieve the inspection tasks. This resulted in combining Tables 14 and 16 to achieve Table 17. Table 17 shows optimized time cost, memory usage, route cost, and good communication.

4.5. Experiment 4: Benchmarking the New Hybrid Model (QLACS) with Popular Methods. Choosing the optimized paths, QLACS performs cooperative inspection behaviour through communication among the robots. Looking at the sample result in Table 17, QLACS chooses the shortest and complete possible inspection routes from different runs. In this case, the best paths were obtained from Table 16 by using 9 agents, 10 agents, and 12 agents. All the test runs gave best trail paths from both entrances, listing the complete states


Table 14: QLACS with communication.

Number of runs | 1 | 2 | 3 | 4 | 5 | 6 | 7
Iterations | 9 | 10 | 9 | 13 | 10 | 10 | 9
Time (sec) | 33.0019 | 31.0017 | 31.0018 | 31.0018 | 29.0023 | 30.0017 | 31.0018
Memory usage (bytes) | 15748 | 15948 | 15748 | 18232 | 16576 | 15948 | 15748
Inspected states (R1) | F, G, E | F, G, E, D | F, G, E | F, E | F, G, E, D, B | F, G, E, D | F, G, E
Inspected states (R2) | C, B, A, D | C, B, A | C, B, A, D | F, G, E, D, A | C, A | C, B, A | C, B, A, D

Table 15: Navigation behaviour parameter specification.

Type of ACO | Population | Length of path | Pheromone coefficient β | Heuristic coefficient α | Evaporation rate ρ
ACS | 9 or 12 | 8 | 2 | 3 | 0.01

Figure 9: Comparison of time costs for QL, ACS, and QLACS (time cost versus number of runs).

Figure 10: Comparison of route costs for QL and QLACS (path costs for R1 and R2 over the runs).

with length 10, as shown in Table 16; that is, they have states from F to C and from C to F. The length is the addition of weights along the trail edges. QLACS then uses the optimized paths to make decisions on inspections. The result in Table 17 shows that no state is inspected twice or visited more than required. The QLACS model concentrates on making the best inspection decisions for MRS. The first run in Table 17 shows that nine agents/ants were used for the route finding; the optimized route was achieved after nine iterations under 7.0004 sec, and all the states were inspected effectively without redundancies.

Figure 11: Number of ants/iterations required for QLACS to find the optimal path.

Figure 12: Time and iterations required for QLACS to achieve optimal behaviour.

4.5.1. Comparative Study of the Proposed Model (QLACS) with QL Only and ACS Only. Based on the results tabulated in Table 18 and Figure 9, the average time costs of achieving the MRS behaviours for QL, ACS, and QLACS were compared. Two robots use an average of 30.8590 sec to achieve thorough inspection behaviour using QL, an average of 40.4309 sec to achieve optimal navigation behaviour using ACS, and an average of 9.2868 sec to achieve both navigation and cooperative behaviour using QLACS. The result shows


Table 16: Navigation behaviour computations.

Number of runs | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Number of ants | 5 | 7 | 8 | 9 | 10 | 11 | 12 | 15 | 28
Iterations | 12 | 26 | 14 | 16 | 23 | 14 | 16 | 23 | 22
Time (sec) | 37.0021 | 37.0021 | 39.0022 | 60.0035 | 32.0019 | 33.0019 | 35.002 | 43.0025 | 65.0037
Best trail distance (R1) | 12 | 12 | 10 | 10 | 10 | 10 | 10 | 10 | 10
Best trail distance (R2) | 10 | 12 | 12 | 10 | 10 | 12 | 10 | 10 | 10

Table 17: New hybrid model (QLACS) computations.

Number of runs | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Iterations | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9
Number of ants | 9 | 9 | 9 | 10 | 10 | 10 | 12 | 12 | 12
Time (sec) | 7.0004 | 8.0005 | 9.0005 | 11.0006 | 11.0006 | 12.0007 | 8.0005 | 9.0005 | 9.0006
Memory usage (bytes) | 10288 | 10280 | 10248 | 10288 | 10288 | 10288 | 10164 | 10712 | 10164
Inspected states (R1) | F, G, E, D (runs 1–3) | F, G, E, A, B (runs 4–6) | F, G, D, A (runs 7–9)
Inspected states (R2) | C, B, A (runs 1–3) | C, D (runs 4–6) | C, B, E (runs 7–9)

Table 18: Time cost comparison for QL, ACS, and QLACS.

Runs | QL time cost (sec) | ACS time cost (sec) | QLACS time cost (sec)
1 | 33.0019 | 37.0021 | 7.0004
2 | 31.0019 | 37.0021 | 8.004
3 | 31.0018 | 39.0022 | 9.0005
4 | 31.0018 | 32.0019 | 9.0005
5 | 29.0023 | 35.002 | 11.0006
6 | 30.0017 | 43.0025 | 12.0007
7 | 31.0018 | 60.0035 | 9.0006
Average | 30.8590 | 40.4309 | 9.2868

that our proposed integrated algorithm performs better, with reduced time cost. On the same note, the route costs for QL and QLACS were also compared. The results in Table 19 and Figure 10 show that the proposed model, QLACS, gives a much lower route cost than QL.

The number of ants and iterations used to achieve this is displayed in Figure 11. The more ants there are, the more iterations are needed. The best routes created in the five test runs shown in Figure 11 are run numbers 1 and 3; they used fewer ants and iterations to achieve the optimal routes. The optimal result for the proposed model is achieved under nine iterations for every run. The iteration count remains constant for any number of agents and robots. The blue line with star markers in Figure 12 is the iteration value for the 9 runs, and the red line shows the different amounts of time required for each run. The time is also almost stable for QLACS, though it fluctuates a little in the middle. This signifies that the proposed model is more robust and effective in finding the optimal route and coordinating MRS. The reduced route cost and shorter computation time achieved with QLACS satisfy the criteria for cooperative behavioural purposes.

4.6. Experiment 5: Scalability of QLACS Using Two, Three, and Four Robots. In this section, we present some results obtained by experimenting with the cooperative behavioural action of two, three, and four robots using the QLACS model. The performance of the two, three, and four robots was evaluated by running the simulation three times with the same number of agents. The performance of the proposed QLACS model shows good communication between the two, three, and four robots under the states inspected in the last four columns of Table 20. An almost stable time was used in achieving the inspection for all the robots. The details of the simulation result are laid out in Table 20.

A notable observation emanating from Table 20 is that there is a need for a larger mine area of inspection, because robot R1 in rows 4 to 6 could inspect only one state, while robots R1 and R2 in rows 7 to 9 could inspect only one state each. This implies that the size of the field of inspection is proportional to the number of robots to be deployed.

5. Concluding Remarks

This paper has shown how MRS behave cooperatively in underground terrains. The QL algorithm has been investigated for its potential quality of good communication, while the ACS algorithm was explored for its good navigation properties. The strengths of the two algorithms have been harnessed and optimized to form a hybrid model, QLACS, which addresses the behavioural situation in MRS effectively.

The analytical solution of the new model was explained as shown in Algorithms 3 and 4 and Tables 7 and 10. The hybrid architecture for the model was also described in Figure 7. The experimental setup describing different navigation locations for two robots was shown in Figure 8.

The new hybrid model QLACS for cooperative behaviour of MRS in an underground terrain was proven able to find the optimal route and handle cooperation between two robots


Table 19: Route cost comparison for QL and QLACS.

Runs | QL path cost (R1) | QL path cost (R2) | QLACS path cost (R1) | QLACS path cost (R2)
1 | 20 | 20 | 10 | 10
2 | 32 | 19 | 10 | 10
3 | 28 | 14 | 10 | 10
4 | 27 | 27 | 10 | 10
5 | 39 | 30 | 10 | 10
Average | 29.5 | 22 | 10 | 10

Table 20: Summary of scalability performance of QLACS.

Row number | Number of robots | Time (sec) | States inspected by R1 | R2 | R3 | R4
1 | 2 | 10.0006 | 4 | 3 | — | —
2 | 2 | 11.0007 | 4 | 3 | — | —
3 | 2 | 8.0005 | 4 | 3 | — | —
4 | 3 | 16.0009 | 1 | 3 | 3 | —
5 | 3 | 17.001 | 1 | 3 | 3 | —
6 | 3 | 12.0006 | 1 | 3 | 3 | —
7 | 4 | 11.0006 | 1 | 1 | 3 | 2
8 | 4 | 14.0006 | 1 | 1 | 3 | 2
9 | 4 | 10.006 | 1 | 1 | 3 | 2

effectively. The cooperative behavioural problem was narrowed down to two situations: navigational and cooperative behaviours. ACS was used to establish the shortest optimal route for the two robots, while QL was used to achieve effective cooperation decisions. The results achieved after integrating the two algorithms showed a reduction in route costs and computation times, as shown in Figures 9, 10, 11, and 12. The comparative analysis between QL, ACS, and QLACS proved that the last-named is more robust and effective in achieving cooperative behaviour for MRS in the proposed domain.

The result of this work will be used for future simulation of MRS applications in hazardous environments. An indication of expansion of this research area has conspicuously surfaced in both real life implementations and model growth. The model used in this work can also be improved upon by increasing the state space.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the financial support of the Organisation for Women in Science for the Developing World (OWSDW), L'Oreal-UNESCO for Women in Science, and the resources made available by the University of Cape Town (UCT) and the University of South Africa (UNISA), South Africa.

References

[1] J. Leitner, "Multi-robot cooperation in space: a survey," in Proceedings of the Advanced Technologies for Enhanced Quality of Life (AT-EQUAL '09), vol. 37, pp. 144–151, IEEE Computer Society, Iasi, Romania, July 2009.

[2] G. Dudek, M. R. M. Jenkin, E. Milios, and D. Wilkes, "A taxonomy for multi-agent robotics," Autonomous Robots, vol. 3, no. 4, pp. 375–397, 1996.

[3] A. Chella, G. Lo Re, I. Macaluso, M. Ortolani, and D. Peri, "Chapter 1: A networking framework for multi-robot coordination," in Recent Advances in Multi Robot Systems, A. Lazinica, Ed., pp. 1–14, 2008.

[4] C. Yinka-Banjo, I. O. Osunmakinde, and A. Bagula, "Autonomous multi-robot behaviours for safety inspection under the constraints of underground mine terrains," Ubiquitous Computing and Communication Journal, vol. 7, no. 5, pp. 1316–1328, 2012.

[5] S. Yarkan, S. Guzelgoz, H. Arslan, and R. Murphy, "Underground mine communications: a survey," IEEE Communications Surveys & Tutorials, vol. 11, no. 3, pp. 125–142, 2009.

[6] M. Peasgood, J. McPhee, and C. Clark, Complete and Scalable Multi-Robot Planning in Tunnel Environments, Digital Commons, 2006.

[7] F. E. Schneider, D. Wildermuth, and M. Moors, "Methods and experiments for hazardous area activities using multi-robot system," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3559–3564, May 2004.

[8] R. Zlot and A. Stentz, "Market-based multirobot coordination for complex tasks," International Journal of Robotics Research, vol. 25, no. 1, pp. 73–101, 2006.

[9] L. E. Parker, "Current research in multi-robot systems," in Proceedings of the 7th International Symposium on Artificial Life and Robotics (ISAROB '03), vol. 7, pp. 1–15, 2003.

[10] C. Thorp and H. Durrant-Whyte, "Field robots," in Proceedings of the 10th International Symposium on Robotics Research, vol. 6, pp. 329–3240, 2003.

[11] S. R. Teleka, J. J. Green, S. Brink, J. Sheer, and K. Hlophe, "The automation of the 'Making Safe' process in South African hard-rock underground mines," International Journal of Engineering and Advanced Technology (IJEAT), vol. 1, pp. 1–7, April 2012.

[12] L. Cragg and H. Hu, "Application of reinforcement learning to a mobile robot in reaching recharging station operation," in Intelligent Production Machines and Systems, pp. 357–363, 2005.

[13] R. A. C. Bianchi, C. H. C. Ribeiro, and A. H. R. Costa, "On the relation between ant colony optimization and heuristically accelerated reinforcement learning," in Proceedings of the 1st International Workshop on Hybrid Control of Autonomous Systems, pp. 49–55, 2009.

[14] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, Oxford, UK, 1999.

[15] Y. Liu and K. M. Passino, Swarm Intelligence: Literature Overview, Department of Electrical Engineering, University of Ohio, 2000.

[16] L. Li, A. Martinoli, and Y. S. Abu-Mostafa, "Emergent specialization in swarm systems," in Intelligent Data Engineering and Automated Learning—IDEAL 2002, vol. 2412 of Lecture Notes in Computer Science, pp. 261–266, 2002.

[17] B. Englot and F. Hover, "Multi-goal feasible path planning using ant colony optimization," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 2255–2260, May 2011.

[18] Z. N. Naqiv, H. Matheru, and K. Chadha, "Review of ant colony optimization on vehicle routing problems and introduction to estimated-based ACO," in International Conference on Environment Science and Engineering (IPCBEE '11), vol. 8, pp. 161–166, 2011.

[19] M. Dorigo and T. Stutzle, "The ant colony optimization metaheuristic: algorithms, applications and advances," Tech. Rep. IRIDIA/2000-32, 2000.

[20] R. Claes and T. Holvoet, "Cooperative ant colony optimization in traffic route calculations," in Advances on Practical Applications of Agents and Multi-Agent Systems, vol. 155 of Advances in Intelligent and Soft Computing, pp. 23–34, 2012.



4 Mathematical Problems in Engineering

Table 1 Variant of ACO

SN Year Algorithm1 1991 Ant system (AS)2 1992 Elitist AS3 1995 Ant-Q4 1996 Ant colony system5 1996 Max-Min AS (MMAS)6 1997 Ranked based AS7 1999 ANTS8 2000 BWAS9 2001 Hypercube AS

(other robots in the environment) search time increasesHowever following our heuristic model of QL which wasmainly determined by the policy we set to achieve our goalour QL performs well above traditional QL

23 Ant Colony Optimization ACO is a type of swarmintelligence (SI) Bonabeau et al [14] defined SI as any attemptto design algorithms or distributed problem-solving devicesinspired by collective behaviour of social insect coloniesand other animal societies This implies that anytime some-thing is inspired by swarms it is called swarm intelligenceResearchers have been thrilled by swarms because of someof their fascinating features For instance the coordinatedmanner in which insect colonies work notwithstandinghaving no single member of the swarm in control thecoordinatedways inwhich termites build giant structures andhow the flocks move as one body and the harmonized waysin which ants quickly and efficiently search for food can onlybe attributed to an emergent phenomenon [15 16]

Ants an example of a social insect colony achievetheir self-organizing robust and flexible nature not bycentral control but by stigmergy Stigmergy also known as apheromone trail describes the indirect communication thatexists within the antrsquos colony The indirect communicationwhich is triggered by pheromone trails helps in recruitmentof more ants to the system Ants also use pheromones to findthe shortest paths to food sources Pheromone evaporationprevents stagnation which also helps to avoid prematureconvergence on a less than optimal path [17]

ACO is necessarily amultiagent based algorithmThefirstimplementation of this optimization algorithm is in a classicalsearch based (combinatory) problem the travelling salesmanproblem giving the shortest path to a specified destinationAfter this feat many researchers have used it or its variantsto model different problems In this work a variant of ACOis used to find the optimal path for MRS Table 1 describesvariants of ACO and their meaning [18] see also [19] and allreferences therein

In AS the pheromones are updated by all the ants thatcomplete a tour ACS is the modified version of AS whichintroduces the pseudorandom proportional rule In elitistAS ants that belong to the edges of the global best tour getan additional amount of pheromone during the pheromone

update MMAS introduces an upper and lower bound to thevalues of the pheromones trails All the solutions are rankedto conform to the antsrsquo length in ranked-based AS [20]

Our interest is in implementing ACS to find the best pos-sible round trip inspection path in our model environmentThis best possible inspection path will be supplied as inputor made available for the robots using QL for inspectiondecisions In most of the route finding scenarios the nodesare linearly joined and are not largely distributed This isa scenario where ants can forage along at least two paths(multigoal path environment though in our case there is apath exposed to four different goals see Section 3)

3 Proposed Cooperative MRSBehaviour Framework

The following are some of the factors we are considering inthe course of building the model Each robot has to learnto adjust to the unknown underground mine environmentIn this case each robot requires intelligence and a suit-able machine learning algorithm is adopted No robot hasglobal information of the environment because of its limitedsensing capabilities Each robot has limited communicationcapabilities therefore each robot has to keep track of theinformation of others to remain in a team Figure 2 describesthe proposed framework as an approach for building thedistributed coordinated inspecting model for MRS

The framework indicates three layers of the model thebottom layer the middle layer and the topmost layer Thelearning capability of the MRS is achieved in the bottomlayer with the reinforcement learning algorithm and swarmintelligence technique This intelligence enables robot A totake action knowing the action of robot B and vice versa Atthe middle layer scalability in terms of the number of robotsthe system can accommodate is achieved This is expedientbecause a team of robots tends to achieve tasks more quicklyand effectively than single robots This scalability is handledwith some memory management techniques as indicated inFigure 2The real life implementation is achieved by using theinformation acquired from the topmost layer Figure 3 is thebreakdown of the framework in Figure 2

There is a base-station or server that serves as a backupfor the information captured and analysed from individualrobots The model proposed in this research deals withthe way in which robots need to find their way withincommunication range and uses a broadcast approach foreffective communication of navigation status A team ofrobots cooperatively inspecting an area in the undergroundmine will need to know where they are and where togo next so it is obviously a continuing problem and ourcontribution in this case is that before robot R1 takes anaction it broadcasts its location and inspection status toother robots R2 R3 and so forth and vice versa Anunreachable robot receives packets of information based onthe destination address through rerouting from the nearerrobotsThe reliability of this broadcastmethod is the ability todetermine the extent of the task executed already by lookingat the memory of any leading robot in the team

Mathematical Problems in Engineering 5

MRS mine inspection Miners for operation

Scalable system

Global databaseMemory usage

IntelligenceQLACS

ACSQL

Gas meter Local database Roof sound sensor

Graph

MRS distributedknowledge baseaccess simpledecision makinginterrobotcommunication

Appl

icat

ion

laye

rTo

pmos

t lay

erSc

alab

ility

laye

rM

iddl

e lay

erC

oope

rativ

e beh

avio

ural

laye

rBo

ttom

laye

r

Figure 2 Framework of the proposed QLACS model for cooperative behaviours

Table 2 State and possible actions of the environment

State Possible Actions1 A (lower left part (LLP)) Inspect ignore2 B (lower middle part (LMP)) Inspect ignore3 C (lower right part (LRP)) Inspect ignore4 D (middle left part (MLP)) Inspect ignore5 E (central part (MCP)) Inspect ignore6 F (upper left part (ULP)) Inspect ignore7 G (upper right part (URP)) Inspect ignore8 H (outside mine part (OMP)) Shutdown

31 Problem Formulations Suppose we have seven roomsstates connected by doorslinks representing undergroundmine regions as shown in Figure 4 and labeled as shown inTable 2 We label each room 119860 through 119865 The outside of themine can be thought of as one big room (119867) Notice thatdoors 119865 and 119862 lead outside the mine 119867 We put two robotsin rooms 119865 and 119862 as their starting states respectively Therobots inspect one state at a time considering the obstaclesencountered in the process The robots can change directionfreely having linksdoors to other roomsstates In each ofthe states in Figure 4 two actions are possible inspection of

roof cracks (RC) and level of toxic gases (TG) or ignoringthe state as it has been inspected earlier According toour proposed model a robot can inspect at least half ofthe given underground mine region to attain its maximumperformance which in turn attracts a good reward Whenthe current state of a robot is the end point of the inspectiontask the robots exit the mine using the exit points 119862 and119865 respectively The possible transition states of the robotsare displayed using the state diagram in Figure 5 The statediagram and the transition matrix are formed using the QLalgorithm

The global map is assumed to be known by the robotsbut there is no prior knowledge of the local map Robotsonly know what they have sensed themselves and whattheir teammates communicate to them Not only does ourimprovedQL depend on themap of the environment but alsoeach robot learns through experience about local changesThey explore and inspect from state to state until they getto their goal states Our proposed model QLACS achievesan offline mode for the route finding algorithm (ACS) Thismeans that the global view of the map would have beenprovided before the learning robots start

The model is explained using simulations the results arepresented in Section 4 Some of the results also have graphicalinterpretations Analysis by a state transition diagram is

6 Mathematical Problems in Engineering

Table 3 Initial reward matrix

Robotrsquos actionA B C D E F G H

Robotrsquos state

A mdash 50 100 mdash 50 100 mdash mdash mdash mdashB 50 100 mdash 50 100 mdash 50 100 mdash mdash mdashC mdash 50 100 mdash mdash mdash mdash 50 100 150D 50 100 mdash mdash mdash 50 100 50 100 mdash mdashE mdash 50 100 mdash 50 100 mdash 50 100 50 100 mdashF mdash mdash mdash 50 100 50 100 mdash 50 100 150G mdash mdash 50 100 mdash 50 100 50 100 mdash mdashH mdash mdash 50 100 mdash mdash 50 100 mdash mdash

Bottom layer‐ A new hybrid model QLACS

‐ An efficient cooperative andnavigational model

Middle layer‐ An efficient scalable system

Topmost layer‐ Acquired results stored for future use

(a) (b)

Shortest round trip pathsGood communicationTime of performance

Number of iterations usedThorough inspection

Exit the system

Are theysuccessful

Monitor memorywith the

utilizationinterface platform

Record the results

Acquire informationfrom the bottom

layer

Build a hybrid modelusing QL and ACS

algorithms

Two robots

Test robots for

Add one morerobot to the

system

Results will beused in the

application layer

Database

Yes No

Database

Figure 3 Breakdown of the framework (a) Contributions of the framework Layer by layer and (b) processes of the multirobot behaviouralsystem

presented in Figure 5 The possible state actions of the robotsare presented in Table 3 The states of the robots are reducedas follows searching or inspecting for roof cracks and toxicgas levels in each state or room and recording the outcomeof the inspection in the robotsrsquo memories The processesinvolved in achieving good communication while the robotsinspect the states are broadcasting inspected states to theother robots and ignoring the inspectedbroadcasted stateavoiding collision with obstacles and other robots and finallymoving to the next available state that has not been inspected

The Q-learning algorithm is used to determine whataction is to be selected in any particular state Let theaction (119886) = inspect ignore shutdown state space (119904) =dimensions in the topological map sensor readings (119904

119903)

hazardous conditions (119867119888) = roof crack (RC) toxic gas level

(TG) Each robot is configured with an aerial scanner andchemical sensitive sensor that will provide readings (119904

119903)

to determine if there is a hazardous condition 119867119888 in the

environment The selection of an action by a robot throughthe Q-learning algorithm is based on the information fromthe broadcast The whole framework uses a decentralized 119876-learning scheme such that each robot has its own threadwith its Q-learning and table All 119876-values on the tables ofthe robots in the team are initialized to zero that is statescorresponding to such positions have not been inspectedWhen a robot enters a new state it broadcast its informationfor the current state to all other robots in the environmentThe broadcast indicates whether the current state has been

Mathematical Problems in Engineering 7

F

H

Houtside

Robot Bstarting position

Robot Astarting position

Linkbw A and B

mine

D

A B

E

G

C

Figure 4 Model of the environment with two entrances and exits(2EE)

F

BA (50 100)

(50 100)

(50 100)

(50 100)

(50 100) 150

150

(50 100)

(50 100)

(50

100

)

(50

100

)

(50

100

)

D E G

C

H

Goal state

Figure 5 Events and transition diagram of the modeled environ-ment with119867 as goal state

inspected or not The broadcast is a search of correspondingpositions in the memories of all visible robots If the resultantsearch returns a zero the broadcast indicates that the statehas not been inspected and if it returns a value greater thanzero it indicates that the state is not available for inspectionAll robots must receive a broadcast before they act in anyparticular state When a robot gets to a state it receives abroadcast and passes its value to the QL algorithm to makea decision The robot carries out the decision of the QLalgorithm The policy of the Q-learning algorithm makes arobot carry out an inspection if the resultant search of thebroadcast returns zero and ignores it if the resultant searchis greater than zero A robot only shuts down if all states havebeen visited and inspected As the broadcast information ispassed to the Q-learning algorithm the policy is iteratedtowards an optimal value and condition

The broadcast avoids the cost of multiple inspections andthe QL ensures that robots take appropriate actions Theonly state in which inspection is carried out would havesensor readings (119904

119903) indicating 119867

119888 For taking an action (119886)

in state (119904) the robot gets a reward (119877) which is used in(10) to compute the 119876-value for the current state and cansend a broadcast to other robots Figure 5 and Table 3 showthe restrictions and possible transitions among node pointsindicating the possible rewards for any particular action (119886)

in any state (119904) Every other transition besides the goal state

could result in a reward of 50 or 100 and at completionthe reward is the maximum 150 We consider it wasteful tomake the robots scan first before sharing intelligence becausesuch action would slow them down and make them expendmore energy Robots are to provide the inspection resultsshowing actions taken in different states and the nature ofconditions detected in any particular state The introductionof the broadcast approach to determine the teamrsquos explorationreduces the execution time and energy cost of the wholeteamandmakes the collaboration effective So in amultirobotscenario the task can be effectively executed if the robots aremade to share intelligence as they progress Robots do nothave to waste time and energy in doing the same task alreadycarried out by other robots in the team

GIVEN On a mathematical justification of the above sup-pose a safety preinspection of toxic gases or rock fall or somecombination of the two is being carried out on a piece ofcomplex underground terrain in Figure 1 say 119871Km2 thereis a limited number of MRS with different capacities 119877 andprecious inspection time 119879minutes Every regionstate 119909

1in

the terrain requires a capacity of robot1198771ofMRS and limited

time 1198791while every state 119909

119899requires robot 119877

119899of MRS and

inspection time 119879119899 Let 119875

1be the positive reward of QL for

correct behaviour on state 1199091and let 119875

119899be the reward on

state 119909119899 This research aims to maximise positive rewards by

choosing optimal values for states 1199091 119909

119899as follows

Maximise

1198751sdot 1199091+ 1198752sdot 1199092+ sdot sdot sdot + 119875

119899sdot 119909119899

(objective function)

Subject to constraints

1199091+ 1199092+ sdot sdot sdot + 119909

119899le 119871

(limited total states)

1198771sdot 1199091+ 1198772sdot 1199092+ sdot sdot sdot + 119877

119899sdot 119909119899le 119877

(limited MRS capacity)

1198791sdot 1199091+ 1198792sdot 1199092+ sdot sdot sdot + 119879

119899sdot 119909119899le 119879

(limited inspection time)

Non-negativity constraints

1199091ge 0 119909

2ge 0 119909

119899ge 0

(MRS cannot inspect negative states)

(1)

In terms of solving this optimization problem we use theproposed QLACS model to compare the time and number ofstates inspectedThe number of robots used is also comparedThe graphs results in Section 4 also give more insight into thesolution of the problem

32 Basic Navigation and Cooperative Behaviours UsingQLACS QLACS has two components The first componentis formed by an improved ACS and the second component

8 Mathematical Problems in Engineering

is formed by an improved QL The improvement occursbecause some heuristics were added to the ordinary QL andACS to achieve the hybrid QLACS However the secondcomponent ofQLACS which is an improvedQL was initiallyused to solve the proposed problem After much analysis werealized that the system needs to be optimized for effectivecooperation and communication

Using the second component of QLACS to solve the basicnavigation and cooperative behaviour the possible actionswere set for each robot as inspect ignore and shutdown (afterreaching the goal state 119867) Also a reward system that willreflect the possible actions of the robots was chosen In otherwords a robot gets 150 points only when the goal is achieved(shutdown) 100 points for ignoring an already inspected area(ignore) and 50 points for inspecting an uninspected area(inspect) Figure 5 shows the transition events of the modeland Table 3 displays the possible state action for each robotThe way the QLACS second component works here is basedon both navigation and communication behaviours

Achieving navigational behaviour with the second com-ponent of QLACS has some cost associated to it In ourscenario because we want the robots to share intelligenceby broadcasting results (robots search through other robotsrsquomemory) our problem is not solely navigational but alsocooperative Our behavioural actions are inspect ignoreand shutdown We noted that these actions of interest arenot navigation oriented there is no way we could usethem in making decisions on transition The functions fordecision can be integrated to assist the other Thereforeour behavioural model is an integration of two behaviours(1) navigational behaviour and (2) cooperating behaviourthrough decision making The integration works with aroute finding method called RandomStateSelector which weintroduced in the second component of QLACS to helpdetermine where the robot goes from the starting point tothe exit point Two parts of the RandomStateSelector methodare introduced in this work The first one is the Random-StateSelector C H which is used to transit from state 119862 to119867and the second one RandomStateSelector F H transits fromstate 119865 to 119867 This method works but not effectively becausesome of the states are repeated several times because of therandom selection method However the decision part of thissecond component of QLACS which is handled by a methodcalled CheckInspection worked efficiently CheckInspection isresponsible for sharing the broadcast among the agents Thebroadcast looks at all the stored 119876-values on all the robotsand returns a signal for the action that needs to be takenTherefore we concluded that the heuristically acceleratedcomponent of QLACS has proven to be effective by showingevidence of effective communication in making inspectiondecisions using our model It did not guarantee shortest pos-sible time for inspection because of repeated states decisionsIn this light we only harnessed the communication strengthof the second component of QLACS for communicationand cooperation Figure 5 and Table 3 form the basis for theQLACS second component

To take care of the random state selector problem encoun-tered in implementing the algorithm used for the secondpart of QLACS we introduced an optimized route finder

Table 4 Combined adjacency and weight matrix

119865 119866 119864 119863 119860 119861 119862 119867

119865 0 1 1 1 0 0 0 1119866 1 0 2 0 0 0 1 0119864 1 2 0 2 0 2 0 0119863 1 0 2 0 2 0 0 0119860 0 0 0 2 0 2 0 0119861 0 0 2 0 2 0 1 0119862 0 1 0 0 0 1 0 1119867 1 0 0 0 0 0 1 0

1

11

1

12

2 2

2

2

E

C

G

B

A

D

F

Figure 6 Weighted mapgraph of the model environment

algorithm This route finder algorithm which forms the firstcomponent of QLACS is a swarm intelligence techniqueFigure 6 andTable 4 form the basis of the exploration space ofthe agents (ants) which achieved the optimum route findingThe weighted graph in Figure 6 is based on the number ofobstacles the ants will encounter from the points of entry 119865and 119862 The combined table in Table 4 contains the weightsof the obstacles and evidence of an edge between any twovertices (states) It shows that there is a connection betweenany two vertices in a graph Generally a ldquo1rdquo or ldquo2rdquo depictsthe existence of an edge while a ldquo0rdquo represents the absenceof an edge that is no transition between such vertices Theconstructed graph is an undirected multigraph providingevidence of some agents coming from 119865 or 119862 of the mine(logical space) It is unidirectional because agents can movein any particular direction (multigraph) This means that thesameweights on the edges apply to both directionsThe graphdoes not show119867 we assume that once the agents reach 119865 or119862 they exit if all inspections have been done

33 HybridModel Development Theparameters for buildingthe analytical hybrid QLACS (equations (2) through (11))model are presented in Tables 5 and 6

331 Algorithmic and Mathematical Analysis

ACS Starts Computation of edge attractiveness

120578119894119895=

1

119863119894119895

(2)

Mathematical Problems in Engineering 9

Table 5 A list of parameters for the ACS model

ACO parameters Meaning120572 Pheromone influence factor120573 Influence of adjacent node distance120588 Pheromone evaporation coefficient119876 Attractiveness constant119890 Visited edge1198901015840 Edge not visited119871119896 Length tour of ant k

120591 Pheromone concentration (amount)120578 Specific visibility function (attractiveness)Δ120591119896 Pheromone concentration by Kth ant

119875119903(119894 119895) Probability of moving from i to j

119863119894119895 Visibility or distance between i and j

119891119894 Fitness of individual in a population119875119894 Probability of being selected among 119891

119894

119873 Number of individuals in the population119894 119895 Denotes any two adjacent nodes in the graph119872119896 Set of unvisited nodes

Table 6 A list of parameters for the QL model

QL Parameters Meaning119876 119876-value update119904 State119886 Action119877 Reward120574 Learning rate

Computation of instantaneous pheromone by ant 119896

Δ120591119896=

119876

119871119896

(3)

Update of pheromone

120591119894119895= (1 minus 120588) lowast 120591

119894119895+ Δ120591119896

119894119895 (4)

Computation of edge probability

119875119903(119894 119895) =

[120591119894119895]120572[120578119894119895]120573

Σ1198901015840=(119894119895)

[120591119894119895]120572[120578119894119895]120573 (5)

Adoption of roulette wheel

Cumulative (119875119903(119894 119895)) =

119873+1

sum119894=1

119875119903(119894 119895) (6)

119891119894=sum119873

119895=1119891119895

119873 (7)

119875119894=

119891119894

sum119873

119895=1119891119895

(8)

Equations (2) through (8) build the complete route find-ing model Equations (2) through (4) are prerequisite to (5)Equation (5) is prerequisite to roulette wheel selection At theend of (8) new states are selected and the trail is updatedThebest path from both directions is selected and used as inputin Q-learning Note that 119863

119894119895= weight on edges 119875

119903(119894 119895) =

chance of moving to a node 119864(119868 119895) 119871119896= sum of visited nodes

by ant 119896

QL Starts Each robot in its QL threadcomputes its learning rate

120574 =05

[1 + Frequency (119904 119886)] (9)

119876-values are updated

119876 (119904 119886) = 119877 (119904 119886) + 120574 (10)

making a broadcast (Decision = InspectIgnoreShut-down)

119876 (119904 119886) =

119876 = 0 Inspect if 119904119895

= goalstate119876 gt 0 Ignore if 119904

119895= goalstate

119876 ge 0 Shutdown if 119904119895= goalstate

(11)

Equation (9) is Gamma the learning rate which is alwaysbetween zero and 1 This equation is calculated based onthe frequency of action of each robot in inspecting statesEquations (9) to (11) are state-dependent The states are keptin a buffer and then accessed at run time ACS and QL do notwork simultaneously ACS works to completion and QL takesthe final output as its input ACS is not repeatedly called whileQL is working

Our behavioural model is an integration of two algo-rithms (1) route finding algorithm (2) communication andcooperative algorithm The integration works the followingway The optimal route finder (ACS) determines where therobot goes from the starting point to the destination whileQL determines what action it takes when it gets to any ofthe states This collaboration works effectively because theoptimal route finder has been proven to give the best possibletransitions and the heuristically accelerated QL has provento be effective by showing evidence of effective commu-nication in making inspection decisions Consequently itguarantees the shortest possible time for inspection in theabsence of wasteful inspection decisions This frameworkforms the basis for our cooperative behaviourmodel forMRS(QLACS)Thepseudocode for the implementation ofQLACSis outlined in Algorithm 2

332 QLACS Hybrid Approach Evolution Figure 7 is thearchitecture of hybrid QLACS explaining the handshakebetween the two components

(i) Graph Construct Module This is the interconnection ofstates in our model environment (see Figure 4) It encom-passes all possible transitions from 119865 to 119862 and vice versaThus from themodelwe construct an adjacencyweight graphmatrix that can be traversed by any graph-oriented algorithm

10 Mathematical Problems in Engineering

INPUT Edge distance(obstacles) pheromones antsrsquo trail associated probabilitiesstarting and terminating indexes that is from F or COUTPUT Effective cooperation inspection and navigation

(1) Boolean CompletedFlag = False Boolean variable indicates completion for all the threads(2) Declare CompletedThreadBuffer Data structure stores Info about completed thread(3) Initialize all 119876values to Zero All 119876values positions are initialized to zero(4) Initialize Best Paths From ACO algorithm starting from 119865 and 119862(5) While (CompletedFlag ltgt True) Checks for when all robots have finished and flag is true

BeginCreate N number of Threads in ParallelThreads get Current States in Parallel from ACO algorithmThreads get Next States and Indexes in Parallel from ACO algorithmThreads compare 119876values of all corresponding positions of Current States (Each Robot Broadcast119876value info)IF ((Q == 0)amp (ThreadNextState ltgt GoalState)) Checks if a particulate state is availableBeginState is Available Robot with the CurrentThreadID InspectsCompute and Update 119876value

EndIF (Q gt 0) checks if a state is not available because an already inspected state has 119876 gt 0BeginState is already inspected Robot with the CurrentThreadID IgnoreCompute and Update 119876value

EndIF ((Q == 0) amp (ThreadNextState == GoalState)) Checks for goal state and shuts downBeginCompute and Update 119876valueGoal state is reached and Inspection CompletedThread with the CurrentThreadID Shuts downStore CurrentThreadID in CompletedThreadBuffer

EndIF (Count [CompletedThreadBuffer] == NumberOfRobot) Learning stops when this happensBeginCompletedFlag = True

EndEnd of While Loop

Algorithm 2 Pseudocode for QLACS

No

No

Yes

Yes2

robots

Graphconstruct

Initializeparameters

module

Stateselectionmodule

Stateaction

module

QLACSupdatemodule

Statebroadcasting

module

Goal statemodule

New statemodule

Decisionmodule

Cooperativeinspection

module

Convergencedecisionmodule

Transitionmodule

Updatemodule

Figure 7 Hybrid architecture

Mathematical Problems in Engineering 11

Example I of QLACS (Using Figure 5 for optimized route finding)Parameters used in the first component of QLACS119876 = 20 120572 = 3 120573 = 2 120588 = 001

Starting State 119865Terminating State 119862119865Terminating condition When all states have been visited at least once and it is at terminating stateState Space = 119865 119866 119864119863 119860119861 119862Initialize pheromones positions to 001Rand = (0 1) returns a random value between 0 and 1Equations(i) Computation of attractiveness 120578

119894119895using (2)

(ii) Computation of instantaneous pheromones by ant 119896 for all potential states using (3)(iii) Pheromone update using (4)(iv) Computation of probabilities for the potential states using (5)(v) Adaptation of roulette Wheel using (6)Equation (5) can be reduced to Let 119908(119894 119895) = [120591

119894119895]120572

[120578119894119895]120573

Sum =119873+

sum119894=1

119908 (119894 119895)

So 119875119903(119894 119895) = 119908

sum

Algorithm 3 Parameters for QLACS Example I

In our case there are eight states primarily seven internalstates and a common terminating state Since the states arenot just linear the environment allows for multiple optionsthat is an agentant can choose from any present state Thistype of scenario is also called a multigoal scenario

(ii) State Selection Module Here the agents select the startand end states which according to our model in Figure 4 are119865 and 119862 These states are selected based on the cumulativeprobability of two adjacent nodes in (6)

(iii) Transition Module This module takes care of the tran-sition rules of the agents by calculating the pheromoneconcentrations distances and probabilities using (2) through(4)

(iv)UpdateModuleAfter transition fromone state to anotheran update of the pheromone is computed after whichmultipath planning with shortest path is achieved

(v) Convergence Decision Module This is where the besttrailpath decision is takenThis is the end product of the firstcomponent of QLACS which is then moved to the secondcomponent of QLACS for cooperative behaviour

(vi) Cooperative Inspection Module This is where the robotsare deployed to start inspection The robots are to use theacquired best paths starting from 119865 and 119862 respectively asinput for the second component of QLACS The two robotsare to use the learning rate from (9) to learn the environmentand use (10) for cooperation

(vii) State Broadcasting Module This module handles thebroadcasting behaviours of the two robots which areachieved by using (10) Each robot checks its memoryrepresented by 119876-values before taking any decision

(viii) State Action Module State broadcasting by each robot isimmediately followed by action selection In other words thestate to inspect or ignore is achieved here using (11)

(ix) QLACS Update Module After each action selection the119876-values of each robot are updated using (10)

(x) New State Module This module takes care of the robotrsquosnew state after updating the 119876-values Each robot runs in itsown threads managing its memory yet sharing information

(xi) Final Decision ModuleThis decision module determinesif the robot should exit the environment or still do moreinspections It is also controlled by (11)

(xii) Goal State Module The completion of the second com-ponent of QLACS is getting to the goal state after successfulinspection of statesThis goal state according to our model inFigure 4 is called119867

333 Analytical Scenario For the benefit of robotic andsystem engineers practitioners and researchers this paperuniquely presents scenarios on the ease of implementation ofthe proposed QLACS as described in Algorithms 3 and 4 andTables 7 and 10 The first component of QLACS is explainedin Example I shown in Algorithm 3 and Table 7 The firstcomponent that achieved optimal route paths for the robotshas the parameters listed in Algorithm 3 The calculation fordifferent states transition for the first component of QLACSis analysed in Table 7

Repeat steps 1ndash6 for subsequent current states until thetermination condition and state are reached At the end ofseven updates we have Tables 8 and 9 The total lengthshown in Table 8 represents the total number of obstaclesencountered by each agent while trailing to achieve theoptimal route for the robots The number of obstacle shown

12 Mathematical Problems in Engineering

Table7QLA

CSEx

ampleI

navigatio

nalanalytic

alsolutio

n

Startin

gsta

te119865

Potentialstates119863119864

119866(1)U

se(2)120578119865119863=1120578119865119864=1120578119865119866=1

(2)U

se(3)119871119896=1Δ120591119896(119865119863

)=21

=2Δ

120591119896(119865119864)=2Δ

120591119896(119865119866

)=2

(3)U

se(4)120591 119865119863=(1minus001)lowast001+2=20099=201120591119865119864=201120591119865119866=201

Firstp

heromon

eupd

ate

119865119866

119864

001

201

201

119863119860

119861

201

001

001

119862

001

(4)U

se(5)119908

(119865119863

)=[120591119865119863]120572[120578119865119863]120573=(201)3=812119908

(119865119864)=812119908

(119865119866

)=812

Sum

=119908(119865119863

)+119908(119865119864)+119908(119865119866

)=2436

119875119903(119865119863

)=

119908 sum

=812

2436=033119875119903(119865119864)=033119875119903(119865119866

)=033

Prob

abilitie

s119865

119866119864

0033

033

119863119860

119861

033

00

119862 0

(5)U

se(6)

119867 0

119865119866

119864

0033

033

119863119860

119861

033

00

119862 0

CallRa

ndRa

nd=007R

andfalls

instate119864

Rou

lette

wheelselects119864

asthen

extstateend

ofroulettewheel

(6)U

pdatetrail119865119864

Currentstate119864

Potentialstates119861119863

119866(1)U

se(2)120578119864119861=12120578119864119863=12120578119864119866=1

(2)U

se(3)119871119896=(119865

minus119864minus119863)=1+2=3119871119896=(119865

minus119864minus119861)=1+2=3119871119896=(119865

minus119864minus119866)=1+2=3

Δ120591119896=2 3=067

(3)U

se(4)120591 119864119861=(1minus001)lowast201+067=266120591119864119863=266120591119864119866=266

Second

pherom

oneu

pdate

119865119866

119864

001

266

201

119863119860

119861

266

001

266

119862

001

(4)U

se(5)119908

(119864119861)=(266)3lowast(12)2=471119908

(119864119863

)=471119908

(119864119866

)=(266)3lowast(1)2=1882

Sum

=471+471+1882=2824

119875119903(119864119861)=

119908 sum

=471

2824=017119875119903(119864119863

)=

471

2824=017119875119903(119864119866)=1882

2824=067

Prob

abilitie

s119865

119866119864

0067

0

119863119860

119861

017

0017

119862 0

(5)U

se(6)

119867 0

119865119866

119864

0067

0

119863119860

119861

017

0017

119862 0

CallRa

ndRa

nd=09Ra

ndfalls

instate119863

Rou

lette

wheelselects119863

asthen

extstateend

ofroulettewheel

(6)U

pdatetrail119865119864

119863

Mathematical Problems in Engineering 13

Example II of QLACS (For good cooperation and communication between robots)Parameters used in the second component of QLACS (Using output from the first component of QLACS)Reward Scheme Inspect = 50 Ignore = 100 Shutdown = 150State space Optimized path from QLACS R1 = 119865 119864119863 119860 119861 119862 119866 119862 and QLACS R2 = 119862 119861 119860119863 119865 119864 119866 119862

Starting State 119865119862119878119895= Terminating State 119862 then119867

Terminating condition When all states have been visitedInitialize 119876value positions to zerosEquations(i) Compute learning rate using (9)(ii) Compute update on 119876(119904 119886) using (10)

Algorithm 4 Parameters for QLACS Example II

Table 8 Pheromone update of a full cycle

Pheromone updateCurrent states 119865 119866 119864 119863 119860 119861 119862

119865 001 201 201 201 001 001 001 1st update119864 001 266 201 266 001 266 001 2nd update119863 313 001 303 001 303 001 001 3rd update119860 001 001 001 045 001 045 001 4th update119861 001 001 067 001 067 001 07 5th update119862 001 091 001 001 001 089 001 6th update119866 071 001 069 001 001 001 071 7th update119862 = terminating state and ant 119896 has moved through all states at least onceTrail 119865119864119863119860119861119862119866119862Number of obstacles = 1 + 2 + 2 + 2 + 1 + 1 + 1 + 1 = 10

Table 9 Probability update of a full cycle

Probability tableCurrent states 119865 119866 119864 119863 119860 119861 119862

119865 0 033 033 033 0 0 0119864 0 067 0 017 0 017 0119863 069 0 016 0 016 0 0119860 0 0 0 05 0 05 0119861 0 0 016 0 016 0 068119862 0 052 0 0 0 049 0119866 045 0 0 0 0 0 045119862 = terminating state

as 10 is calculated using the trail result in Table 8 Table 9displays the probability update table for a full cycle of routefinding In the case of this first example the first robot willterminate its inspection through 119862 and then119867

Algorithm 4 displays the parameters of the second exam-ple of QLACS The second component of QLACS handlesthis aspect of the model which is the communication andcooperative part Once the first component hands the outputto the second component it becomes the second componentinput and it runs with it From Algorithm 4 QLACS R1represents the input for robot 1 and QLACS R2 representsthe input for robot 2 received from the first component of

QLACS The terminating condition is when all states havebeen visited

The rest of Example II displayed in Table 10 explains thecooperative behavioural scenario from one state to anotherfor two robots The tables in the last row of Table 10 show thecommunication and cooperative output achieved using thesecond component of QLACS

4 Experimental Evaluations SafetyInspections for MRS Behaviours

The experimental results of implementing QLACS in anenvironment that consists of obstacles and links are tabu-lated in this section The experimental setup is explainedin Section 41 Different performance categories are shownin this section without communication category and withcommunication category In the without communicationcategory as displayed in Section 42 we found that robotscan inspect all states individually without knowing thatanother robot exists Robots can also inspect some of thestates thereby leaving some states not inspected The com-munication category is explained in Section 43 while theperformance of the QLACS measured with other existingmethods is tabulated in Section 44

41 Experimental Setup Figure 8 displays different sets ofexperiments conducted in the environment using two robotsFigure 8(a) shows how two robots resume inspection fromtwo different entrances In each set of experiments the robotstake the readings of the roof cracks and level of toxic gasesusing their sensors The same behaviours happen in Figures8(b) and 8(c) respectively at different inspection locationsThe inspection locations vary in the four experimental setupsshown in Figure 8

42 Experiment 1 Performance of QLACS without Coop-eration The result of implementing the QLACS withoutcommunication in the proposed environment (Figures 1 and4) is shown in Tables 11 and 12 In the case of Table 11 robot1 (R1) enters the mine through state 119865 while robot 2 (R2)enters the mine through state 119862 However each robot learnsby inspecting some of the states and ignoring some of the

14 Mathematical Problems in Engineering

Table 10 QLACS Example II cooperative behaviour

Starting simulation Robot 1Starting state F(1) Use (9)120574 (119865) =

05

[1 + 1]=05

2= 025

(2) Check the 119876-value for state 119865 (use (11))119876 (119865 119886) = 0

Selected action 119886 = inspect(3) use (10)119876 (119865 119886) = 50 + 025(0) = 50

End of value iteration

Current state E Robot 1(1) Use (9)120574 (119864) =

05

[1 + 1]=05

2= 025

(2) Check the Q-value for state 119864 (Use (11))119876 (119864 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119864 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current state C Robot 2(1) Use (9)120574 (119862) =

05

[1 + 1]=05

2= 025

(2) Check the Q-value for state 119862 (use (11))119876 (119862 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119862 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current state B Robot 2(1) Use (9)120574 (119861) =

05

[1 + 1]=05

2= 025

(2) Check the 119876-value for state 119861 (use (11))119876 (119861 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119861 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current stateD Robot 1(1) Use (9)120574 (119863) =

05

[1 + 1]=05

2= 025

(2) Broadcast (use (11))119876 (119863 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119863 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current State A Robot 2(1) Use (9)120574 (119860) =

05

[1 + 1]=05

2= 025

(2) Broadcast (use (11))119876 (119860 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119860 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iterationCurrent state A Robot 1(1) Use (9)120574 (119860) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119860 119886) = 50 that is 119876 gt 0

Selected action 119886 = Ignore(3) Use (10)119876 (119860 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iteration

Current stateD Robot 2(1) Use (9)120574 (119863) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119863 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119863 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iteration

Current state B Robot 1(1) Use (9)120574 (119861) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119861 119886) = 50 that is 119876 gt 0

Selected action 119886 = Ignore(3) Use (10)119876 (119861 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iterationCurrent state F Robot 2(1) Use (9)120574 (119863) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119865 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119865 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iteration

Current state C Robot 1(1) Use (9)120574 (119861) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119861 119886) = 50 that is 119876 gt 0

Selected action 119886 = Ignore(3) Use (10)119876 (119861 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iteration

Current state E Robot 2(1) Use (9)120574 (119864) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119864 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119864 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iterationCurrent state G Robot 1(1) Use (9)120574 (119866) =

05

[1 + 1]=05

2= 025

(2) Broadcast (Use (11))119876 (119866 119886) = 0

Selected action 119886 = Inspect(3) Use (10)119876 (119866 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current state G Robot 2(1) Use (9)120574 (119866) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119866 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119866 119886) = 100 + 04 (max (50 50 50))= 100 + 20 = 120

End of value iteration

Current state C Robot 1(1) Use (9)120574 (119862) =

05

[1 + 04]=05

14= 036

(2) Broadcast (Use (11))119876 (119862 119886) = 50

Selected action 119886 = Ignore(3) Use (10)119876 (119862 119886) = 100 + 036 (max (50 50))= 100 + 18 = 118

End of value iterationCurrent state C Robot 2(1) Use (9)120574 (119862) =

05

[1 + 036]=

05

136= 037

(2) Broadcast (Use (11))119876 (119862 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119862 119886) = 100 + 037 (max (50 50))= 100 + 037 lowast 50 = 1185

End of value iteration

All QLACS 1 states exhaustedGoal stateH Robot 1(1) Use (9)120574 (119867) =

05

[1 + 1]=05

2= 025

(2) Broadcast (use (11))119876 (119867 119886) = 0

Selected action 119886 = Shutdown(3) Use (10)119876 (119867 119886) = 150 + 025 (max (0 0)) = 150

End of value iteration

All QLACS 2 states exhaustedGoal stateH Robot 2(4) Use (9)120574 (119867) =

05

[1 + 1]=05

2= 025

(5) Broadcast (use (11))119876 (119867 119886) = 0

Selected action 119886 = Shutdown(6) Use (10)119876 (119867 119886) = 150 + 025 (max (0 0)) = 150

End of value iteration

Mathematical Problems in Engineering 15

Table 10 Continued

End of policy iteration

Robot 1 Robot 2Inspect Ignore Shutdown

119865 Yes No No119864 Yes No No119863 Yes No No119860 No Yes No119861 No Yes No119862 No Yes No119866 Yes No No119862 No Yes Yes through 119867

Inspect Ignore Shutdown119862 Yes No No119861 Yes No no119860 Yes No No119863 No Yes No119865 No Yes No119864 No Yes No119866 No Yes no119862 No Yes Yes through 119867

states Since there is no communication they exit the mineafter learning consequently not inspecting all the states Thesame routine is reflected in Table 12 but in this case eachrobot ends up inspecting all the states before exiting themine Analysis of Tables 11 and 12 shows that resources arewasted and the inspection job is not effective and efficientComparing the two tables the time and distance cost arehigh though higher in Table 12 because of many repetitionsFor instance the time and distance cost in Table 12 column2 are 480028 and ((F G E D A B C) (C B A D E GF)) respectively It also shows that the states are repeated inTables 11 and 12 The memory usage is equally high This ledus to create a more efficient QLACS with communication byintroducing some heuristicsThe processes involved in Tables11 and 12 are explained in Table 13

One can see from Tables 11 and 12 that there is no goodcommunication among the robots hence the evolvement ofTable 14

43 Experiment 2 Performance of QLACS with Good Coop-eration The heuristic added to the known QL made thisexperiment show good communication This is where ourcontribution to communication is shown As a robot inspectsand learns the environment it broadcast its 119876-values to theother robot which is in the form of a lookup table In thiscase each robot checks for119876-values when a119876-value is equalto zero the robot randomly selects a state for inspection Ifa 119876-value is greater than zero the robot checks if the stateis available for inspection or for ignoring When a robotencounters a state with a119876-value equal to zero and the threadnext to the state is equal to the goal state (119867) then it shutsdown It must have checked the lookup table to see thatall states have been inspected The result in Table 14 showsgood communication between two robots No states wereinspected more than once The iterations for every run withtheir times memory usage and effective communication arealso displayed in Table 14

Comparing Table 14 with Tables 11 and 12 one cannot butnotice the huge differences in thememory time and distancecosts The communication between the robots in Table 14resulted in achieving good inspection however the randommethod of choosing next inspection states did not give anoptimized route thereby increasing time cost in passing andchecking through inspected states

(a) Two robots starting inspection from two different entrances

(b) Two robots inspecting as they navigate in the environment

(c) Two robots inspecting different locations with different positions

Figure 8 Different experimental behaviours for two robots

44 Experiment 3 Performance of QLACS for the NavigationBehaviour of the ProposedModel The result of implementingthe first component of QLACS for effective navigation ofthe proposed model is tabulated in Table 16 Table 15 shows

16 Mathematical Problems in Engineering

Table 11 QLACS without communication (inspecting some states)

Number of runs 1 2 3 4 5 6 7Iterations 9 10 10 13 13 9 13Time (sec) 430025 300017 34002 300017 310017 310018 270016Memory usage (bytes) 18872 18160 19204 18208 17896 18164 18308Inspected states (R1) 119866 119865119866 119862 119862 119865119863 119861 119864 119860 119865 119864119863 119860 119862 119865 119866 119864 119860

Inspected states (R2) 119861119860 119866 119862 119862 119861 119860 119864 119865 119861 119860 119862 119861 119864 119861 119860 119864 119860 119866

Table 12 QLACS without communication (inspecting all states)

Number of runs 1 2 3 4 5 6 7Iterations 114 10 70 10 9 10 10Time (sec) 430024 480028 410023 340019 340019 340019 35002Memory usage (bytes) 28440 17960 27216 17000 18456 17672 17968

Inspected states (R1) F G E D AB C

F G E D AB C

F G E D AB C

F G E D AB C

F G E D AB C

F G E D AB C

F G E D AB C

Inspected states (R2) C B A D EG F

C B A D EG F

C B A D EG F

C B A D EG F

C B A D EG F

C B A D EG F

C B A D EG F

Table 13 Processes used in achieving Tables 11 and 12

(a) For Table 11

Inspecting some States(1) Initialize the starting and goal states(2) Initialize all 119876values to zeroes(3) Select a random action If 119876values of all possible actions atcurrent state are zeroes(4) Select the action with the highest 119876value If it is the Max119876value(5) Compute and update 119876value of the selected action(6) Get new state among possible states(7) If new state = goal state then go to Steps 5 and 9(8) Repeat Steps 3 to 6 until Step 7(9) Shutdown

(b) For Table 12

Inspecting all States(1) Initialize the starting and goal states(2) Initialize all 119876values to zeroes(3) Select a random action If 119876values of all possible actions atcurrent state are zeroes(4) Select the action with the highest 119876value If it is the Max119876value(5) Compute and update 119876value of the selected action(6) Get new state among possible states(7) If new state = goal state then go to Steps 5 and 10(8) Repeat Steps 3 to 7 until Step 9(9) All states except the goal state has taken Action = Inspect(10) Shutdown

the selected parameters used in achieving the optimal pathThe optimal path found after nine test runs is the path witha distance cost of 10 for both entries to the environmentdisplayed in Table 16 Therefore columns 4 5 7 8 and 9 canbe used as the optimized path input for the first component ofQLACS Then QLACS will use any of the optimized paths to

navigate to the specified states and take decisions accordinglyFor instance the test run result for nine ants gives 16 iterationsunder 600035 seconds and giving the shortest paths to bothrobots coming from entries 119865 and 119862 of the environmentThepath that gives us cost as 10 is obtained from FEDABCGC(1222111) and CBADFGEGC (12211221) respectively

Alpha (120572) Beta (120573) and Rho (120588) represent the heuristicproperties of the ACS algorithm The Alpha factor is themeasure of the influence of pheromone concentration thatcan influence the selection of the path with the highestpheromone concentration Beta is the measure of the influ-encewhich is related to the distance between any two adjacentnodes It is also a heuristic factor that can measure theinfluence distance in selecting the next state It is not limitedto pheromones Rho has to do with the rate at which thepheromones evaporate Rho shows how often new pathsare explored by the agentsants rather than reinforcing oldpaths Each agent cooperates by having access to other agentsrsquopheromone valuesThe pheromone value is initialized to 001and it is continually updated until learning stops It is similarto the first component of QLACS where we initialize allQLACS positions to zero and update for every new step

Table 16 gave the optimized route to be used by the robotsto achieve inspection tasksThis resulted in combining Tables14 and 16 to achieve Table 17 Table 17 shows optimized timecost memory usage route cost and good communication

45 Experiment 4 Benchmarking the New Hybrid Model(QLACS) with Popular Methods Choosing the optimizedpaths QLACS performs cooperative inspection behaviourthrough communication among robots Looking at the sam-ple result on Table 17 QLACS chooses the shortest andcomplete possible inspection routes from different runs Inthis case the best paths were obtained from Table 16 by using9 agents 10 agents and 12 agents All the test runs gave besttrail paths from both entrances listing the complete states


Table 14: QLACS with communication.

Number of runs:         1        2        3        4        5        6        7
Iterations:             9        10       9        13       10       10       9
Time (sec):             3.30019  3.10017  3.10018  3.10018  2.90023  3.00017  3.10018
Memory usage (bytes):   15748    15948    15748    18232    16576    15948    15748
Inspected states (R1):  F G E | F G E D | F G E | F E | F G E D B | F G E D | F G E
Inspected states (R2):  C B A D | C B A | C B A D | F G E D A | C A | C B A | C B A D

Table 15: Navigation behaviour parameter specification.

ACO properties:  Type of ACO | Population | Length of path | Pheromone coefficient (β) | Heuristic coefficient (α) | Evaporation rate (ρ)
Properties:      ACS         | 9 or 12    | 8              | 2                         | 3                         | 0.01

Figure 9: Comparison of time costs (in seconds) for QL, ACS, and QLACS against the number of runs.

Figure 10: Comparison of route costs for QL and QLACS against the number of runs (series: path cost for R1 (QL), path cost for R2 (QL), and path cost for R1 and R2 (QLACS)).

with length 10, as shown in Table 16; that is, they have states from F to C and from C to F. The length is the sum of the weights along the trail edges. QLACS then uses the optimized paths to make decisions on inspections. The result from Table 17 shows that no state is inspected twice or visited more than required. The QLACS model concentrates on making the best inspection decisions for MRS. The first run in Table 17 shows that nine agents (ants) were used for the route finding; the optimized route was achieved after nine iterations in under 0.70004 sec, and the states were all inspected effectively without redundancies.
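The inspection decision each robot makes at a state can be summarised in a short, illustrative Python sketch (assumptions: the paper's reward scheme of 50 for inspect, 100 for ignore, and 150 for shutdown; the learning rate gamma = 0.5 / (1 + visit frequency of the state); and the update Q(s, a) = R(s, a) + gamma * max Q of the next states; the function names are ours):

```python
def decide(q_broadcast, state, goal_state):
    """Action selection from the shared (broadcast) Q lookup table."""
    if state == goal_state:
        return "shutdown"                      # inspection complete, exit through the goal
    q = q_broadcast.get(state, 0)
    return "inspect" if q == 0 else "ignore"   # Q == 0: still uninspected; Q > 0: already done

def update_q(q_broadcast, visit_count, state, reward, next_state_values):
    """Q(s, a) = R(s, a) + gamma * max Q(s', a'), with gamma = 0.5 / (1 + frequency)."""
    visit_count[state] = visit_count.get(state, 0) + 1
    gamma = 0.5 / (1 + visit_count[state])
    q_broadcast[state] = reward + gamma * max(next_state_values, default=0)
    return q_broadcast[state]
```

With this rule, a state whose broadcast Q-value is still zero is inspected exactly once, which is why no state in Table 17 is visited more than required.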

Figure 11: Number of ants and iterations required for QLACS to find the optimal path, per run.

Figure 12: Time and iterations required for QLACS to achieve optimal behaviour, per run.

4.5.1. Comparative Study of the Proposed Model (QLACS) with QL Only and ACS Only. Based on the results tabulated in Table 18 and Figure 9, the average time costs of achieving the MRS behaviours with QL, ACS, and QLACS were compared. Two robots use an average of 3.08590 sec to achieve thorough inspection behaviour using QL, an average of 4.04309 sec to achieve optimal navigation behaviour using ACS, and an average of 0.92868 sec to achieve both navigation and cooperative behaviour using QLACS. The result shows


Table 16: Navigation behaviour computations.

Number of runs:            1        2        3        4        5        6        7        8        9
Number of ants:            5        7        8        9        10       11       12       15       28
Iterations:                12       26       14       16       23       14       16       23       22
Time (sec):                3.70021  3.70021  3.90022  6.00035  3.20019  3.30019  3.5002   4.30025  6.50037
Best trail distance (R1):  12       12       10       10       10       10       10       10       10
Best trail distance (R2):  10       12       12       10       10       12       10       10       10

Table 17: New hybrid model (QLACS) computations.

Number of runs:         1        2        3        4        5        6        7        8        9
Iterations:             9        9        9        9        9        9        9        9        9
Number of ants:         9        9        9        10       10       10       12       12       12
Time (sec):             0.70004  0.80005  0.90005  1.10006  1.10006  1.20007  0.80005  0.90005  0.90006
Memory usage (bytes):   10288    10280    10248    10288    10288    10288    10164    10712    10164
Inspected states (R1):  F G E D | F G E D | F G E D | F G E A B | F G E A B | F G E A B | F G D A | F G D A | F G D A
Inspected states (R2):  C B A | C B A | C B A | C D | C D | C D | C B E | C B E | C B E

Table 18: Time cost comparison for QL, ACS, and QLACS.

Runs      QL time cost (sec)   ACS time cost (sec)   QLACS time cost (sec)
1         3.30019              3.70021               0.70004
2         3.10019              3.70021               0.8004
3         3.10018              3.90022               0.90005
4         3.10018              3.20019               0.90005
5         2.90023              3.5002                1.10006
6         3.00017              4.30025               1.20007
7         3.10018              6.00035               0.90006
Average   3.08590              4.04309               0.92868

that our proposed integrated algorithm performs better, with reduced time cost. On the same note, the route costs for QL and QLACS were also compared. The results in Table 19 and Figure 10 show that the proposed model, QLACS, gave a much lower route cost than QL.
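As an illustrative cross-check of Table 18 (using the per-run values as reconstructed above, with the decimal point restored after the first digit), the reported averages follow directly from the columns; a minimal sketch:

```python
ql    = [3.30019, 3.10019, 3.10018, 3.10018, 2.90023, 3.00017, 3.10018]
acs   = [3.70021, 3.70021, 3.90022, 3.20019, 3.50020, 4.30025, 6.00035]
qlacs = [0.70004, 0.80040, 0.90005, 0.90005, 1.10006, 1.20007, 0.90006]
for name, column in (("QL", ql), ("ACS", acs), ("QLACS", qlacs)):
    print(name, round(sum(column) / len(column), 5))   # ~3.0859, ~4.04309, ~0.92868
```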

The number of ants and iterations used to achieve this is displayed in Figure 11. The more ants are used, the more iterations are needed. The best routes created in the five test runs shown in Figure 11 are run numbers 1 and 3; they used fewer ants and iterations to achieve the optimal routes. The optimal result for the proposed model is achieved under nine iterations for every run, and the number of iterations remains constant for any number of agents and robots. The blue line with star markers in Figure 12 is the iteration value for the nine runs, and the red line shows the different amounts of time required for each run. The time is also almost stable for QLACS, though it fluctuated a little in the middle. This signifies that the proposed model is more robust and effective in finding the optimal route and coordinating the MRS. The reduced route cost and shorter computation time achieved with QLACS satisfy the criteria for cooperative behavioural purposes.

4.6. Experiment 5: Scalability of QLACS Using Two, Three, and Four Robots. In this section, we present some results obtained by experimenting with the cooperative behavioural actions of two, three, and four robots using the QLACS model. The performance of the two, three, and four robots was evaluated by running the simulation three times with the same number of agents. The performance of the proposed QLACS model shows good communication between the two, three, and four robots, as seen from the states inspected in the last four columns of Table 20. An almost stable time was used in achieving the inspection for all the robots. The details of the simulation results are laid out in Table 20.
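How such multirobot runs can be organised is pictured in the small, illustrative Python sketch below (not the authors' code): each robot runs in its own thread and claims states through a shared lookup, mirroring the broadcast mechanism of QLACS; the routes and names here are hypothetical.

```python
import threading

lock = threading.Lock()
inspected = {}                       # shared "broadcast" table: state -> robot id

def robot(robot_id, route):
    for state in route:
        with lock:                   # the broadcast/lookup step is atomic
            if state not in inspected:
                inspected[state] = robot_id   # inspect once; other robots will ignore it

def run(routes):
    threads = [threading.Thread(target=robot, args=(rid, r)) for rid, r in routes.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return inspected

if __name__ == "__main__":
    # Hypothetical optimized routes for two robots entering at F and C (cf. Table 16).
    print(run({1: list("FGEDABC"), 2: list("CBADEGF")}))
```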

A notable observation emanating from Table 20 is that there is a need for a larger mine area of inspection, because robot R1 in rows 4 to 6 could inspect only one state, while robots R1 and R2 in rows 7 to 9 could inspect only one state each. This implies that the size of the field of inspection is proportional to the number of robots to be deployed.

5. Concluding Remarks

This paper has shown how MRS behave cooperatively in underground terrains. The QL algorithm was investigated for its potential quality of good communication, while the ACS algorithm was explored for its good navigation properties. The strengths of the two algorithms were harnessed and optimized to form a hybrid model, QLACS, which addresses the behavioural situation in MRS effectively.

The analytical solution of the new model was explained as shown in Algorithms 3 and 4 and Tables 7 and 10. The hybrid architecture for the model was described in Figure 7, and the experimental setup describing different navigation locations for two robots was shown in Figure 8.

The new hybrid model, QLACS, for cooperative behaviour of MRS in an underground terrain was proven able to find the optimal route and handle cooperation between two robots


Table 19: Route cost comparison for QL and QLACS.

Runs      QL: path cost for R1   QL: path cost for R2   QLACS: path cost for R1   QLACS: path cost for R2
1         20                     20                     10                        10
2         32                     19                     10                        10
3         28                     14                     10                        10
4         27                     27                     10                        10
5         39                     30                     10                        10
Average   29.5                   22                     10                        10

Table 20: Summary of scalability performance on QLACS.

Row   Number of robots   Time (sec)   States inspected by R1   R2   R3   R4
1     2                  1.00006      4                        3    -    -
2     2                  1.10007      4                        3    -    -
3     2                  0.80005      4                        3    -    -
4     3                  1.60009      1                        3    3    -
5     3                  1.7001       1                        3    3    -
6     3                  1.20006      1                        3    3    -
7     4                  1.10006      1                        1    3    2
8     4                  1.40006      1                        1    3    2
9     4                  1.0006       1                        1    3    2

effectively. The cooperative behavioural problem was narrowed down to two situations: navigational and cooperative behaviours. ACS was used to establish the shortest optimal route for the two robots, while QL was used to achieve effective cooperation decisions. The results achieved after integrating the two algorithms showed a reduction in route costs and computation times, as shown in Figures 9, 10, 11, and 12. The comparative analysis between QL, ACS, and QLACS proved that the last-named is more robust and effective in achieving cooperative behaviour for MRS in the proposed domain.

The result of this work will be used for future simulation of MRS applications in hazardous environments. An indication of the expansion of this research area has conspicuously surfaced in both real-life implementation and model extension. The model used in this work can also be improved upon by increasing the state space.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the financial support of the Organisation for Women in Science for the Developing World (OWSDW), L'Oréal-UNESCO for Women in Science, and the resources made available by the University of Cape Town (UCT) and the University of South Africa (UNISA), South Africa.

References

[1] J. Leitner, "Multi-robot cooperation in space: a survey," in Proceedings of the Advanced Technologies for Enhanced Quality of Life (AT-EQUAL '09), vol. 37, pp. 144–151, IEEE Computer Society, Iasi, Romania, July 2009.

[2] G. Dudek, M. R. M. Jenkin, E. Milios, and D. Wilkes, "A taxonomy for multi-agent robotics," Autonomous Robots, vol. 3, no. 4, pp. 375–397, 1996.

[3] A. Chella, G. Lo Re, I. Macaluso, M. Ortolani, and D. Peri, "Chapter 1: a networking framework for multi-robot coordination," in Recent Advances in Multi Robot Systems, A. Lazinica, Ed., pp. 1–14, 2008.

[4] C. O. Yinka-Banjo, I. O. Osunmakinde, and A. Bagula, "Autonomous multi-robot behaviours for safety inspection under the constraints of underground mine terrains," Ubiquitous Computing and Communication Journal, vol. 7, no. 5, pp. 1316–1328, 2012.

[5] S. Yarkan, S. Guzelgoz, H. Arslan, and R. Murphy, "Underground mine communications: a survey," IEEE Communications Surveys & Tutorials, vol. 11, no. 3, pp. 125–142, 2009.

[6] M. Peasgood, J. McPhee, and C. Clark, Complete and Scalable Multi-Robot Planning in Tunnel Environments, DigitalCommons, 2006.

[7] F. E. Schneider, D. Wildermuth, and M. Moors, "Methods and experiments for hazardous area activities using multi-robot system," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3559–3564, May 2004.

[8] R. Zlot and A. Stentz, "Market-based multirobot coordination for complex tasks," International Journal of Robotics Research, vol. 25, no. 1, pp. 73–101, 2006.

[9] L. E. Parker, "Current research in multi-robot systems," in Proceedings of the 7th International Symposium on Artificial Life and Robotics (ISAROB '03), vol. 7, pp. 1–15, 2003.

[10] C. Thorp and H. Durrant-Whyte, "Field robots," in Proceedings of the 10th International Symposium on Robotics Research, vol. 6, pp. 329–3240, 2003.

[11] S. R. Teleka, J. J. Green, S. Brink, J. Sheer, and K. Hlophe, "The automation of the 'Making Safe' process in South African hard-rock underground mines," International Journal of Engineering and Advanced Technology (IJEAT), vol. 1, pp. 1–7, April 2012.

[12] L. Cragg and H. Hu, "Application of reinforcement learning to a mobile robot in reaching recharging station operation," in Intelligent Production Machines and Systems, pp. 357–363, 2005.

[13] R. A. C. Bianchi, C. H. C. Ribeiro, and A. H. R. Costa, "On the relation between ant colony optimization and heuristically accelerated reinforcement learning," in Proceedings of the 1st International Workshop on Hybrid Control of Autonomous Systems, pp. 49–55, 2009.

[14] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, Oxford, UK, 1999.

[15] Y. Liu and K. M. Passino, Swarm Intelligence: Literature Overview, Department of Electrical Engineering, University of Ohio, 2000.

[16] L. Li, A. Martinoli, and Y. S. Abu-Mostafa, "Emergent specialization in swarm systems," in Intelligent Data Engineering and Automated Learning (IDEAL 2002), vol. 2412 of Lecture Notes in Computer Science, pp. 261–266, 2002.

[17] B. Englot and F. Hover, "Multi-goal feasible path planning using ant colony optimization," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 2255–2260, May 2011.

[18] Z. N. Naqiv, H. Matheru, and K. Chadha, "Review of ant colony optimization on vehicle routing problems and introduction to estimated-based ACO," in International Conference on Environment Science and Engineering (IPCBEE '11), vol. 8, pp. 161–166, 2011.

[19] M. Dorigo and T. Stützle, "The ant colony optimization metaheuristic: algorithms, applications, and advances," Tech. Rep. IRIDIA/2000-32, 2000.

[20] R. Claes and T. Holvoet, "Cooperative ant colony optimization in traffic route calculations," in Advances on Practical Applications of Agents and Multi-Agent Systems, vol. 155 of Advances in Intelligent and Soft Computing, pp. 23–34, 2012.


Page 5: Cooperative Behaviours with Swarm Intelligence in Multirobot

Mathematical Problems in Engineering 5

MRS mine inspection Miners for operation

Scalable system

Global databaseMemory usage

IntelligenceQLACS

ACSQL

Gas meter Local database Roof sound sensor

Graph

MRS distributedknowledge baseaccess simpledecision makinginterrobotcommunication

Appl

icat

ion

laye

rTo

pmos

t lay

erSc

alab

ility

laye

rM

iddl

e lay

erC

oope

rativ

e beh

avio

ural

laye

rBo

ttom

laye

r

Figure 2 Framework of the proposed QLACS model for cooperative behaviours

Table 2 State and possible actions of the environment

State Possible Actions1 A (lower left part (LLP)) Inspect ignore2 B (lower middle part (LMP)) Inspect ignore3 C (lower right part (LRP)) Inspect ignore4 D (middle left part (MLP)) Inspect ignore5 E (central part (MCP)) Inspect ignore6 F (upper left part (ULP)) Inspect ignore7 G (upper right part (URP)) Inspect ignore8 H (outside mine part (OMP)) Shutdown

31 Problem Formulations Suppose we have seven roomsstates connected by doorslinks representing undergroundmine regions as shown in Figure 4 and labeled as shown inTable 2 We label each room 119860 through 119865 The outside of themine can be thought of as one big room (119867) Notice thatdoors 119865 and 119862 lead outside the mine 119867 We put two robotsin rooms 119865 and 119862 as their starting states respectively Therobots inspect one state at a time considering the obstaclesencountered in the process The robots can change directionfreely having linksdoors to other roomsstates In each ofthe states in Figure 4 two actions are possible inspection of

roof cracks (RC) and level of toxic gases (TG) or ignoringthe state as it has been inspected earlier According toour proposed model a robot can inspect at least half ofthe given underground mine region to attain its maximumperformance which in turn attracts a good reward Whenthe current state of a robot is the end point of the inspectiontask the robots exit the mine using the exit points 119862 and119865 respectively The possible transition states of the robotsare displayed using the state diagram in Figure 5 The statediagram and the transition matrix are formed using the QLalgorithm

The global map is assumed to be known by the robotsbut there is no prior knowledge of the local map Robotsonly know what they have sensed themselves and whattheir teammates communicate to them Not only does ourimprovedQL depend on themap of the environment but alsoeach robot learns through experience about local changesThey explore and inspect from state to state until they getto their goal states Our proposed model QLACS achievesan offline mode for the route finding algorithm (ACS) Thismeans that the global view of the map would have beenprovided before the learning robots start

The model is explained using simulations the results arepresented in Section 4 Some of the results also have graphicalinterpretations Analysis by a state transition diagram is

6 Mathematical Problems in Engineering

Table 3 Initial reward matrix

Robotrsquos actionA B C D E F G H

Robotrsquos state

A mdash 50 100 mdash 50 100 mdash mdash mdash mdashB 50 100 mdash 50 100 mdash 50 100 mdash mdash mdashC mdash 50 100 mdash mdash mdash mdash 50 100 150D 50 100 mdash mdash mdash 50 100 50 100 mdash mdashE mdash 50 100 mdash 50 100 mdash 50 100 50 100 mdashF mdash mdash mdash 50 100 50 100 mdash 50 100 150G mdash mdash 50 100 mdash 50 100 50 100 mdash mdashH mdash mdash 50 100 mdash mdash 50 100 mdash mdash

Bottom layer‐ A new hybrid model QLACS

‐ An efficient cooperative andnavigational model

Middle layer‐ An efficient scalable system

Topmost layer‐ Acquired results stored for future use

(a) (b)

Shortest round trip pathsGood communicationTime of performance

Number of iterations usedThorough inspection

Exit the system

Are theysuccessful

Monitor memorywith the

utilizationinterface platform

Record the results

Acquire informationfrom the bottom

layer

Build a hybrid modelusing QL and ACS

algorithms

Two robots

Test robots for

Add one morerobot to the

system

Results will beused in the

application layer

Database

Yes No

Database

Figure 3 Breakdown of the framework (a) Contributions of the framework Layer by layer and (b) processes of the multirobot behaviouralsystem

presented in Figure 5 The possible state actions of the robotsare presented in Table 3 The states of the robots are reducedas follows searching or inspecting for roof cracks and toxicgas levels in each state or room and recording the outcomeof the inspection in the robotsrsquo memories The processesinvolved in achieving good communication while the robotsinspect the states are broadcasting inspected states to theother robots and ignoring the inspectedbroadcasted stateavoiding collision with obstacles and other robots and finallymoving to the next available state that has not been inspected

The Q-learning algorithm is used to determine whataction is to be selected in any particular state Let theaction (119886) = inspect ignore shutdown state space (119904) =dimensions in the topological map sensor readings (119904

119903)

hazardous conditions (119867119888) = roof crack (RC) toxic gas level

(TG) Each robot is configured with an aerial scanner andchemical sensitive sensor that will provide readings (119904

119903)

to determine if there is a hazardous condition 119867119888 in the

environment The selection of an action by a robot throughthe Q-learning algorithm is based on the information fromthe broadcast The whole framework uses a decentralized 119876-learning scheme such that each robot has its own threadwith its Q-learning and table All 119876-values on the tables ofthe robots in the team are initialized to zero that is statescorresponding to such positions have not been inspectedWhen a robot enters a new state it broadcast its informationfor the current state to all other robots in the environmentThe broadcast indicates whether the current state has been

Mathematical Problems in Engineering 7

F

H

Houtside

Robot Bstarting position

Robot Astarting position

Linkbw A and B

mine

D

A B

E

G

C

Figure 4 Model of the environment with two entrances and exits(2EE)

F

BA (50 100)

(50 100)

(50 100)

(50 100)

(50 100) 150

150

(50 100)

(50 100)

(50

100

)

(50

100

)

(50

100

)

D E G

C

H

Goal state

Figure 5 Events and transition diagram of the modeled environ-ment with119867 as goal state

inspected or not The broadcast is a search of correspondingpositions in the memories of all visible robots If the resultantsearch returns a zero the broadcast indicates that the statehas not been inspected and if it returns a value greater thanzero it indicates that the state is not available for inspectionAll robots must receive a broadcast before they act in anyparticular state When a robot gets to a state it receives abroadcast and passes its value to the QL algorithm to makea decision The robot carries out the decision of the QLalgorithm The policy of the Q-learning algorithm makes arobot carry out an inspection if the resultant search of thebroadcast returns zero and ignores it if the resultant searchis greater than zero A robot only shuts down if all states havebeen visited and inspected As the broadcast information ispassed to the Q-learning algorithm the policy is iteratedtowards an optimal value and condition

The broadcast avoids the cost of multiple inspections andthe QL ensures that robots take appropriate actions Theonly state in which inspection is carried out would havesensor readings (119904

119903) indicating 119867

119888 For taking an action (119886)

in state (119904) the robot gets a reward (119877) which is used in(10) to compute the 119876-value for the current state and cansend a broadcast to other robots Figure 5 and Table 3 showthe restrictions and possible transitions among node pointsindicating the possible rewards for any particular action (119886)

in any state (119904) Every other transition besides the goal state

could result in a reward of 50 or 100 and at completionthe reward is the maximum 150 We consider it wasteful tomake the robots scan first before sharing intelligence becausesuch action would slow them down and make them expendmore energy Robots are to provide the inspection resultsshowing actions taken in different states and the nature ofconditions detected in any particular state The introductionof the broadcast approach to determine the teamrsquos explorationreduces the execution time and energy cost of the wholeteamandmakes the collaboration effective So in amultirobotscenario the task can be effectively executed if the robots aremade to share intelligence as they progress Robots do nothave to waste time and energy in doing the same task alreadycarried out by other robots in the team

GIVEN On a mathematical justification of the above sup-pose a safety preinspection of toxic gases or rock fall or somecombination of the two is being carried out on a piece ofcomplex underground terrain in Figure 1 say 119871Km2 thereis a limited number of MRS with different capacities 119877 andprecious inspection time 119879minutes Every regionstate 119909

1in

the terrain requires a capacity of robot1198771ofMRS and limited

time 1198791while every state 119909

119899requires robot 119877

119899of MRS and

inspection time 119879119899 Let 119875

1be the positive reward of QL for

correct behaviour on state 1199091and let 119875

119899be the reward on

state 119909119899 This research aims to maximise positive rewards by

choosing optimal values for states 1199091 119909

119899as follows

Maximise

1198751sdot 1199091+ 1198752sdot 1199092+ sdot sdot sdot + 119875

119899sdot 119909119899

(objective function)

Subject to constraints

1199091+ 1199092+ sdot sdot sdot + 119909

119899le 119871

(limited total states)

1198771sdot 1199091+ 1198772sdot 1199092+ sdot sdot sdot + 119877

119899sdot 119909119899le 119877

(limited MRS capacity)

1198791sdot 1199091+ 1198792sdot 1199092+ sdot sdot sdot + 119879

119899sdot 119909119899le 119879

(limited inspection time)

Non-negativity constraints

1199091ge 0 119909

2ge 0 119909

119899ge 0

(MRS cannot inspect negative states)

(1)

In terms of solving this optimization problem we use theproposed QLACS model to compare the time and number ofstates inspectedThe number of robots used is also comparedThe graphs results in Section 4 also give more insight into thesolution of the problem

32 Basic Navigation and Cooperative Behaviours UsingQLACS QLACS has two components The first componentis formed by an improved ACS and the second component

8 Mathematical Problems in Engineering

is formed by an improved QL The improvement occursbecause some heuristics were added to the ordinary QL andACS to achieve the hybrid QLACS However the secondcomponent ofQLACS which is an improvedQL was initiallyused to solve the proposed problem After much analysis werealized that the system needs to be optimized for effectivecooperation and communication

Using the second component of QLACS to solve the basicnavigation and cooperative behaviour the possible actionswere set for each robot as inspect ignore and shutdown (afterreaching the goal state 119867) Also a reward system that willreflect the possible actions of the robots was chosen In otherwords a robot gets 150 points only when the goal is achieved(shutdown) 100 points for ignoring an already inspected area(ignore) and 50 points for inspecting an uninspected area(inspect) Figure 5 shows the transition events of the modeland Table 3 displays the possible state action for each robotThe way the QLACS second component works here is basedon both navigation and communication behaviours

Achieving navigational behaviour with the second com-ponent of QLACS has some cost associated to it In ourscenario because we want the robots to share intelligenceby broadcasting results (robots search through other robotsrsquomemory) our problem is not solely navigational but alsocooperative Our behavioural actions are inspect ignoreand shutdown We noted that these actions of interest arenot navigation oriented there is no way we could usethem in making decisions on transition The functions fordecision can be integrated to assist the other Thereforeour behavioural model is an integration of two behaviours(1) navigational behaviour and (2) cooperating behaviourthrough decision making The integration works with aroute finding method called RandomStateSelector which weintroduced in the second component of QLACS to helpdetermine where the robot goes from the starting point tothe exit point Two parts of the RandomStateSelector methodare introduced in this work The first one is the Random-StateSelector C H which is used to transit from state 119862 to119867and the second one RandomStateSelector F H transits fromstate 119865 to 119867 This method works but not effectively becausesome of the states are repeated several times because of therandom selection method However the decision part of thissecond component of QLACS which is handled by a methodcalled CheckInspection worked efficiently CheckInspection isresponsible for sharing the broadcast among the agents Thebroadcast looks at all the stored 119876-values on all the robotsand returns a signal for the action that needs to be takenTherefore we concluded that the heuristically acceleratedcomponent of QLACS has proven to be effective by showingevidence of effective communication in making inspectiondecisions using our model It did not guarantee shortest pos-sible time for inspection because of repeated states decisionsIn this light we only harnessed the communication strengthof the second component of QLACS for communicationand cooperation Figure 5 and Table 3 form the basis for theQLACS second component

To take care of the random state selector problem encoun-tered in implementing the algorithm used for the secondpart of QLACS we introduced an optimized route finder

Table 4 Combined adjacency and weight matrix

119865 119866 119864 119863 119860 119861 119862 119867

119865 0 1 1 1 0 0 0 1119866 1 0 2 0 0 0 1 0119864 1 2 0 2 0 2 0 0119863 1 0 2 0 2 0 0 0119860 0 0 0 2 0 2 0 0119861 0 0 2 0 2 0 1 0119862 0 1 0 0 0 1 0 1119867 1 0 0 0 0 0 1 0

1

11

1

12

2 2

2

2

E

C

G

B

A

D

F

Figure 6 Weighted mapgraph of the model environment

algorithm This route finder algorithm which forms the firstcomponent of QLACS is a swarm intelligence techniqueFigure 6 andTable 4 form the basis of the exploration space ofthe agents (ants) which achieved the optimum route findingThe weighted graph in Figure 6 is based on the number ofobstacles the ants will encounter from the points of entry 119865and 119862 The combined table in Table 4 contains the weightsof the obstacles and evidence of an edge between any twovertices (states) It shows that there is a connection betweenany two vertices in a graph Generally a ldquo1rdquo or ldquo2rdquo depictsthe existence of an edge while a ldquo0rdquo represents the absenceof an edge that is no transition between such vertices Theconstructed graph is an undirected multigraph providingevidence of some agents coming from 119865 or 119862 of the mine(logical space) It is unidirectional because agents can movein any particular direction (multigraph) This means that thesameweights on the edges apply to both directionsThe graphdoes not show119867 we assume that once the agents reach 119865 or119862 they exit if all inspections have been done

33 HybridModel Development Theparameters for buildingthe analytical hybrid QLACS (equations (2) through (11))model are presented in Tables 5 and 6

331 Algorithmic and Mathematical Analysis

ACS Starts Computation of edge attractiveness

120578119894119895=

1

119863119894119895

(2)

Mathematical Problems in Engineering 9

Table 5 A list of parameters for the ACS model

ACO parameters Meaning120572 Pheromone influence factor120573 Influence of adjacent node distance120588 Pheromone evaporation coefficient119876 Attractiveness constant119890 Visited edge1198901015840 Edge not visited119871119896 Length tour of ant k

120591 Pheromone concentration (amount)120578 Specific visibility function (attractiveness)Δ120591119896 Pheromone concentration by Kth ant

119875119903(119894 119895) Probability of moving from i to j

119863119894119895 Visibility or distance between i and j

119891119894 Fitness of individual in a population119875119894 Probability of being selected among 119891

119894

119873 Number of individuals in the population119894 119895 Denotes any two adjacent nodes in the graph119872119896 Set of unvisited nodes

Table 6 A list of parameters for the QL model

QL Parameters Meaning119876 119876-value update119904 State119886 Action119877 Reward120574 Learning rate

Computation of instantaneous pheromone by ant 119896

Δ120591119896=

119876

119871119896

(3)

Update of pheromone

120591119894119895= (1 minus 120588) lowast 120591

119894119895+ Δ120591119896

119894119895 (4)

Computation of edge probability

119875119903(119894 119895) =

[120591119894119895]120572[120578119894119895]120573

Σ1198901015840=(119894119895)

[120591119894119895]120572[120578119894119895]120573 (5)

Adoption of roulette wheel

Cumulative (119875119903(119894 119895)) =

119873+1

sum119894=1

119875119903(119894 119895) (6)

119891119894=sum119873

119895=1119891119895

119873 (7)

119875119894=

119891119894

sum119873

119895=1119891119895

(8)

Equations (2) through (8) build the complete route find-ing model Equations (2) through (4) are prerequisite to (5)Equation (5) is prerequisite to roulette wheel selection At theend of (8) new states are selected and the trail is updatedThebest path from both directions is selected and used as inputin Q-learning Note that 119863

119894119895= weight on edges 119875

119903(119894 119895) =

chance of moving to a node 119864(119868 119895) 119871119896= sum of visited nodes

by ant 119896

QL Starts Each robot in its QL threadcomputes its learning rate

120574 =05

[1 + Frequency (119904 119886)] (9)

119876-values are updated

119876 (119904 119886) = 119877 (119904 119886) + 120574 (10)

making a broadcast (Decision = InspectIgnoreShut-down)

119876 (119904 119886) =

119876 = 0 Inspect if 119904119895

= goalstate119876 gt 0 Ignore if 119904

119895= goalstate

119876 ge 0 Shutdown if 119904119895= goalstate

(11)

Equation (9) is Gamma the learning rate which is alwaysbetween zero and 1 This equation is calculated based onthe frequency of action of each robot in inspecting statesEquations (9) to (11) are state-dependent The states are keptin a buffer and then accessed at run time ACS and QL do notwork simultaneously ACS works to completion and QL takesthe final output as its input ACS is not repeatedly called whileQL is working

Our behavioural model is an integration of two algo-rithms (1) route finding algorithm (2) communication andcooperative algorithm The integration works the followingway The optimal route finder (ACS) determines where therobot goes from the starting point to the destination whileQL determines what action it takes when it gets to any ofthe states This collaboration works effectively because theoptimal route finder has been proven to give the best possibletransitions and the heuristically accelerated QL has provento be effective by showing evidence of effective commu-nication in making inspection decisions Consequently itguarantees the shortest possible time for inspection in theabsence of wasteful inspection decisions This frameworkforms the basis for our cooperative behaviourmodel forMRS(QLACS)Thepseudocode for the implementation ofQLACSis outlined in Algorithm 2

332 QLACS Hybrid Approach Evolution Figure 7 is thearchitecture of hybrid QLACS explaining the handshakebetween the two components

(i) Graph Construct Module This is the interconnection ofstates in our model environment (see Figure 4) It encom-passes all possible transitions from 119865 to 119862 and vice versaThus from themodelwe construct an adjacencyweight graphmatrix that can be traversed by any graph-oriented algorithm

10 Mathematical Problems in Engineering

INPUT Edge distance(obstacles) pheromones antsrsquo trail associated probabilitiesstarting and terminating indexes that is from F or COUTPUT Effective cooperation inspection and navigation

(1) Boolean CompletedFlag = False Boolean variable indicates completion for all the threads(2) Declare CompletedThreadBuffer Data structure stores Info about completed thread(3) Initialize all 119876values to Zero All 119876values positions are initialized to zero(4) Initialize Best Paths From ACO algorithm starting from 119865 and 119862(5) While (CompletedFlag ltgt True) Checks for when all robots have finished and flag is true

BeginCreate N number of Threads in ParallelThreads get Current States in Parallel from ACO algorithmThreads get Next States and Indexes in Parallel from ACO algorithmThreads compare 119876values of all corresponding positions of Current States (Each Robot Broadcast119876value info)IF ((Q == 0)amp (ThreadNextState ltgt GoalState)) Checks if a particulate state is availableBeginState is Available Robot with the CurrentThreadID InspectsCompute and Update 119876value

EndIF (Q gt 0) checks if a state is not available because an already inspected state has 119876 gt 0BeginState is already inspected Robot with the CurrentThreadID IgnoreCompute and Update 119876value

EndIF ((Q == 0) amp (ThreadNextState == GoalState)) Checks for goal state and shuts downBeginCompute and Update 119876valueGoal state is reached and Inspection CompletedThread with the CurrentThreadID Shuts downStore CurrentThreadID in CompletedThreadBuffer

EndIF (Count [CompletedThreadBuffer] == NumberOfRobot) Learning stops when this happensBeginCompletedFlag = True

EndEnd of While Loop

Algorithm 2 Pseudocode for QLACS

No

No

Yes

Yes2

robots

Graphconstruct

Initializeparameters

module

Stateselectionmodule

Stateaction

module

QLACSupdatemodule

Statebroadcasting

module

Goal statemodule

New statemodule

Decisionmodule

Cooperativeinspection

module

Convergencedecisionmodule

Transitionmodule

Updatemodule

Figure 7 Hybrid architecture

Mathematical Problems in Engineering 11

Example I of QLACS (Using Figure 5 for optimized route finding)Parameters used in the first component of QLACS119876 = 20 120572 = 3 120573 = 2 120588 = 001

Starting State 119865Terminating State 119862119865Terminating condition When all states have been visited at least once and it is at terminating stateState Space = 119865 119866 119864119863 119860119861 119862Initialize pheromones positions to 001Rand = (0 1) returns a random value between 0 and 1Equations(i) Computation of attractiveness 120578

119894119895using (2)

(ii) Computation of instantaneous pheromones by ant 119896 for all potential states using (3)(iii) Pheromone update using (4)(iv) Computation of probabilities for the potential states using (5)(v) Adaptation of roulette Wheel using (6)Equation (5) can be reduced to Let 119908(119894 119895) = [120591

119894119895]120572

[120578119894119895]120573

Sum =119873+

sum119894=1

119908 (119894 119895)

So 119875119903(119894 119895) = 119908

sum

Algorithm 3 Parameters for QLACS Example I

In our case there are eight states primarily seven internalstates and a common terminating state Since the states arenot just linear the environment allows for multiple optionsthat is an agentant can choose from any present state Thistype of scenario is also called a multigoal scenario

(ii) State Selection Module Here the agents select the startand end states which according to our model in Figure 4 are119865 and 119862 These states are selected based on the cumulativeprobability of two adjacent nodes in (6)

(iii) Transition Module This module takes care of the tran-sition rules of the agents by calculating the pheromoneconcentrations distances and probabilities using (2) through(4)

(iv)UpdateModuleAfter transition fromone state to anotheran update of the pheromone is computed after whichmultipath planning with shortest path is achieved

(v) Convergence Decision Module This is where the besttrailpath decision is takenThis is the end product of the firstcomponent of QLACS which is then moved to the secondcomponent of QLACS for cooperative behaviour

(vi) Cooperative Inspection Module This is where the robotsare deployed to start inspection The robots are to use theacquired best paths starting from 119865 and 119862 respectively asinput for the second component of QLACS The two robotsare to use the learning rate from (9) to learn the environmentand use (10) for cooperation

(vii) State Broadcasting Module This module handles thebroadcasting behaviours of the two robots which areachieved by using (10) Each robot checks its memoryrepresented by 119876-values before taking any decision

(viii) State Action Module State broadcasting by each robot isimmediately followed by action selection In other words thestate to inspect or ignore is achieved here using (11)

(ix) QLACS Update Module After each action selection the119876-values of each robot are updated using (10)

(x) New State Module This module takes care of the robotrsquosnew state after updating the 119876-values Each robot runs in itsown threads managing its memory yet sharing information

(xi) Final Decision ModuleThis decision module determinesif the robot should exit the environment or still do moreinspections It is also controlled by (11)

(xii) Goal State Module The completion of the second com-ponent of QLACS is getting to the goal state after successfulinspection of statesThis goal state according to our model inFigure 4 is called119867

333 Analytical Scenario For the benefit of robotic andsystem engineers practitioners and researchers this paperuniquely presents scenarios on the ease of implementation ofthe proposed QLACS as described in Algorithms 3 and 4 andTables 7 and 10 The first component of QLACS is explainedin Example I shown in Algorithm 3 and Table 7 The firstcomponent that achieved optimal route paths for the robotshas the parameters listed in Algorithm 3 The calculation fordifferent states transition for the first component of QLACSis analysed in Table 7

Repeat steps 1ndash6 for subsequent current states until thetermination condition and state are reached At the end ofseven updates we have Tables 8 and 9 The total lengthshown in Table 8 represents the total number of obstaclesencountered by each agent while trailing to achieve theoptimal route for the robots The number of obstacle shown

12 Mathematical Problems in Engineering

Table7QLA

CSEx

ampleI

navigatio

nalanalytic

alsolutio

n

Startin

gsta

te119865

Potentialstates119863119864

119866(1)U

se(2)120578119865119863=1120578119865119864=1120578119865119866=1

(2)U

se(3)119871119896=1Δ120591119896(119865119863

)=21

=2Δ

120591119896(119865119864)=2Δ

120591119896(119865119866

)=2

(3)U

se(4)120591 119865119863=(1minus001)lowast001+2=20099=201120591119865119864=201120591119865119866=201

Firstp

heromon

eupd

ate

119865119866

119864

001

201

201

119863119860

119861

201

001

001

119862

001

(4)U

se(5)119908

(119865119863

)=[120591119865119863]120572[120578119865119863]120573=(201)3=812119908

(119865119864)=812119908

(119865119866

)=812

Sum

=119908(119865119863

)+119908(119865119864)+119908(119865119866

)=2436

119875119903(119865119863

)=

119908 sum

=812

2436=033119875119903(119865119864)=033119875119903(119865119866

)=033

Prob

abilitie

s119865

119866119864

0033

033

119863119860

119861

033

00

119862 0

(5)U

se(6)

119867 0

119865119866

119864

0033

033

119863119860

119861

033

00

119862 0

CallRa

ndRa

nd=007R

andfalls

instate119864

Rou

lette

wheelselects119864

asthen

extstateend

ofroulettewheel

(6)U

pdatetrail119865119864

Currentstate119864

Potentialstates119861119863

119866(1)U

se(2)120578119864119861=12120578119864119863=12120578119864119866=1

(2)U

se(3)119871119896=(119865

minus119864minus119863)=1+2=3119871119896=(119865

minus119864minus119861)=1+2=3119871119896=(119865

minus119864minus119866)=1+2=3

Δ120591119896=2 3=067

(3)U

se(4)120591 119864119861=(1minus001)lowast201+067=266120591119864119863=266120591119864119866=266

Second

pherom

oneu

pdate

119865119866

119864

001

266

201

119863119860

119861

266

001

266

119862

001

(4)U

se(5)119908

(119864119861)=(266)3lowast(12)2=471119908

(119864119863

)=471119908

(119864119866

)=(266)3lowast(1)2=1882

Sum

=471+471+1882=2824

119875119903(119864119861)=

119908 sum

=471

2824=017119875119903(119864119863

)=

471

2824=017119875119903(119864119866)=1882

2824=067

Prob

abilitie

s119865

119866119864

0067

0

119863119860

119861

017

0017

119862 0

(5)U

se(6)

119867 0

119865119866

119864

0067

0

119863119860

119861

017

0017

119862 0

CallRa

ndRa

nd=09Ra

ndfalls

instate119863

Rou

lette

wheelselects119863

asthen

extstateend

ofroulettewheel

(6)U

pdatetrail119865119864

119863

Mathematical Problems in Engineering 13

Example II of QLACS (For good cooperation and communication between robots)Parameters used in the second component of QLACS (Using output from the first component of QLACS)Reward Scheme Inspect = 50 Ignore = 100 Shutdown = 150State space Optimized path from QLACS R1 = 119865 119864119863 119860 119861 119862 119866 119862 and QLACS R2 = 119862 119861 119860119863 119865 119864 119866 119862

Starting State 119865119862119878119895= Terminating State 119862 then119867

Terminating condition When all states have been visitedInitialize 119876value positions to zerosEquations(i) Compute learning rate using (9)(ii) Compute update on 119876(119904 119886) using (10)

Algorithm 4 Parameters for QLACS Example II

Table 8 Pheromone update of a full cycle

Pheromone updateCurrent states 119865 119866 119864 119863 119860 119861 119862

119865 001 201 201 201 001 001 001 1st update119864 001 266 201 266 001 266 001 2nd update119863 313 001 303 001 303 001 001 3rd update119860 001 001 001 045 001 045 001 4th update119861 001 001 067 001 067 001 07 5th update119862 001 091 001 001 001 089 001 6th update119866 071 001 069 001 001 001 071 7th update119862 = terminating state and ant 119896 has moved through all states at least onceTrail 119865119864119863119860119861119862119866119862Number of obstacles = 1 + 2 + 2 + 2 + 1 + 1 + 1 + 1 = 10

Table 9 Probability update of a full cycle

Probability tableCurrent states 119865 119866 119864 119863 119860 119861 119862

119865 0 033 033 033 0 0 0119864 0 067 0 017 0 017 0119863 069 0 016 0 016 0 0119860 0 0 0 05 0 05 0119861 0 0 016 0 016 0 068119862 0 052 0 0 0 049 0119866 045 0 0 0 0 0 045119862 = terminating state

as 10 is calculated using the trail result in Table 8 Table 9displays the probability update table for a full cycle of routefinding In the case of this first example the first robot willterminate its inspection through 119862 and then119867

Algorithm 4 displays the parameters of the second exam-ple of QLACS The second component of QLACS handlesthis aspect of the model which is the communication andcooperative part Once the first component hands the outputto the second component it becomes the second componentinput and it runs with it From Algorithm 4 QLACS R1represents the input for robot 1 and QLACS R2 representsthe input for robot 2 received from the first component of

QLACS The terminating condition is when all states havebeen visited

The rest of Example II displayed in Table 10 explains thecooperative behavioural scenario from one state to anotherfor two robots The tables in the last row of Table 10 show thecommunication and cooperative output achieved using thesecond component of QLACS

4 Experimental Evaluations SafetyInspections for MRS Behaviours

The experimental results of implementing QLACS in anenvironment that consists of obstacles and links are tabu-lated in this section The experimental setup is explainedin Section 41 Different performance categories are shownin this section without communication category and withcommunication category In the without communicationcategory as displayed in Section 42 we found that robotscan inspect all states individually without knowing thatanother robot exists Robots can also inspect some of thestates thereby leaving some states not inspected The com-munication category is explained in Section 43 while theperformance of the QLACS measured with other existingmethods is tabulated in Section 44

41 Experimental Setup Figure 8 displays different sets ofexperiments conducted in the environment using two robotsFigure 8(a) shows how two robots resume inspection fromtwo different entrances In each set of experiments the robotstake the readings of the roof cracks and level of toxic gasesusing their sensors The same behaviours happen in Figures8(b) and 8(c) respectively at different inspection locationsThe inspection locations vary in the four experimental setupsshown in Figure 8

42 Experiment 1 Performance of QLACS without Coop-eration The result of implementing the QLACS withoutcommunication in the proposed environment (Figures 1 and4) is shown in Tables 11 and 12 In the case of Table 11 robot1 (R1) enters the mine through state 119865 while robot 2 (R2)enters the mine through state 119862 However each robot learnsby inspecting some of the states and ignoring some of the

14 Mathematical Problems in Engineering

Table 10 QLACS Example II cooperative behaviour

Starting simulation Robot 1Starting state F(1) Use (9)120574 (119865) =

05

[1 + 1]=05

2= 025

(2) Check the 119876-value for state 119865 (use (11))119876 (119865 119886) = 0

Selected action 119886 = inspect(3) use (10)119876 (119865 119886) = 50 + 025(0) = 50

End of value iteration

Current state E Robot 1(1) Use (9)120574 (119864) =

05

[1 + 1]=05

2= 025

(2) Check the Q-value for state 119864 (Use (11))119876 (119864 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119864 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current state C Robot 2(1) Use (9)120574 (119862) =

05

[1 + 1]=05

2= 025

(2) Check the Q-value for state 119862 (use (11))119876 (119862 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119862 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current state B Robot 2(1) Use (9)120574 (119861) =

05

[1 + 1]=05

2= 025

(2) Check the 119876-value for state 119861 (use (11))119876 (119861 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119861 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current stateD Robot 1(1) Use (9)120574 (119863) =

05

[1 + 1]=05

2= 025

(2) Broadcast (use (11))119876 (119863 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119863 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current State A Robot 2(1) Use (9)120574 (119860) =

05

[1 + 1]=05

2= 025

(2) Broadcast (use (11))119876 (119860 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119860 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iterationCurrent state A Robot 1(1) Use (9)120574 (119860) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119860 119886) = 50 that is 119876 gt 0

Selected action 119886 = Ignore(3) Use (10)119876 (119860 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iteration

Current stateD Robot 2(1) Use (9)120574 (119863) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119863 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119863 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iteration

Current state B Robot 1(1) Use (9)120574 (119861) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119861 119886) = 50 that is 119876 gt 0

Selected action 119886 = Ignore(3) Use (10)119876 (119861 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iterationCurrent state F Robot 2(1) Use (9)120574 (119863) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119865 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119865 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iteration

Current state C Robot 1(1) Use (9)120574 (119861) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119861 119886) = 50 that is 119876 gt 0

Selected action 119886 = Ignore(3) Use (10)119876 (119861 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iteration

Current state E Robot 2(1) Use (9)120574 (119864) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119864 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119864 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iterationCurrent state G Robot 1(1) Use (9)120574 (119866) =

05

[1 + 1]=05

2= 025

(2) Broadcast (Use (11))119876 (119866 119886) = 0

Selected action 119886 = Inspect(3) Use (10)119876 (119866 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current state G Robot 2(1) Use (9)120574 (119866) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119866 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119866 119886) = 100 + 04 (max (50 50 50))= 100 + 20 = 120

End of value iteration

Current state C Robot 1(1) Use (9)120574 (119862) =

05

[1 + 04]=05

14= 036

(2) Broadcast (Use (11))119876 (119862 119886) = 50

Selected action 119886 = Ignore(3) Use (10)119876 (119862 119886) = 100 + 036 (max (50 50))= 100 + 18 = 118

End of value iterationCurrent state C Robot 2(1) Use (9)120574 (119862) =

05

[1 + 036]=

05

136= 037

(2) Broadcast (Use (11))119876 (119862 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119862 119886) = 100 + 037 (max (50 50))= 100 + 037 lowast 50 = 1185

End of value iteration

All QLACS 1 states exhaustedGoal stateH Robot 1(1) Use (9)120574 (119867) =

05

[1 + 1]=05

2= 025

(2) Broadcast (use (11))119876 (119867 119886) = 0

Selected action 119886 = Shutdown(3) Use (10)119876 (119867 119886) = 150 + 025 (max (0 0)) = 150

End of value iteration

All QLACS 2 states exhaustedGoal stateH Robot 2(4) Use (9)120574 (119867) =

05

[1 + 1]=05

2= 025

(5) Broadcast (use (11))119876 (119867 119886) = 0

Selected action 119886 = Shutdown(6) Use (10)119876 (119867 119886) = 150 + 025 (max (0 0)) = 150

End of value iteration

Mathematical Problems in Engineering 15

Table 10 Continued

End of policy iteration

Robot 1 Robot 2Inspect Ignore Shutdown

119865 Yes No No119864 Yes No No119863 Yes No No119860 No Yes No119861 No Yes No119862 No Yes No119866 Yes No No119862 No Yes Yes through 119867

Inspect Ignore Shutdown119862 Yes No No119861 Yes No no119860 Yes No No119863 No Yes No119865 No Yes No119864 No Yes No119866 No Yes no119862 No Yes Yes through 119867

states Since there is no communication they exit the mineafter learning consequently not inspecting all the states Thesame routine is reflected in Table 12 but in this case eachrobot ends up inspecting all the states before exiting themine Analysis of Tables 11 and 12 shows that resources arewasted and the inspection job is not effective and efficientComparing the two tables the time and distance cost arehigh though higher in Table 12 because of many repetitionsFor instance the time and distance cost in Table 12 column2 are 480028 and ((F G E D A B C) (C B A D E GF)) respectively It also shows that the states are repeated inTables 11 and 12 The memory usage is equally high This ledus to create a more efficient QLACS with communication byintroducing some heuristicsThe processes involved in Tables11 and 12 are explained in Table 13

One can see from Tables 11 and 12 that there is no goodcommunication among the robots hence the evolvement ofTable 14

43 Experiment 2 Performance of QLACS with Good Coop-eration The heuristic added to the known QL made thisexperiment show good communication This is where ourcontribution to communication is shown As a robot inspectsand learns the environment it broadcast its 119876-values to theother robot which is in the form of a lookup table In thiscase each robot checks for119876-values when a119876-value is equalto zero the robot randomly selects a state for inspection Ifa 119876-value is greater than zero the robot checks if the stateis available for inspection or for ignoring When a robotencounters a state with a119876-value equal to zero and the threadnext to the state is equal to the goal state (119867) then it shutsdown It must have checked the lookup table to see thatall states have been inspected The result in Table 14 showsgood communication between two robots No states wereinspected more than once The iterations for every run withtheir times memory usage and effective communication arealso displayed in Table 14

Comparing Table 14 with Tables 11 and 12 one cannot butnotice the huge differences in thememory time and distancecosts The communication between the robots in Table 14resulted in achieving good inspection however the randommethod of choosing next inspection states did not give anoptimized route thereby increasing time cost in passing andchecking through inspected states

(a) Two robots starting inspection from two different entrances

(b) Two robots inspecting as they navigate in the environment

(c) Two robots inspecting different locations with different positions

Figure 8 Different experimental behaviours for two robots

44 Experiment 3 Performance of QLACS for the NavigationBehaviour of the ProposedModel The result of implementingthe first component of QLACS for effective navigation ofthe proposed model is tabulated in Table 16 Table 15 shows

16 Mathematical Problems in Engineering

Table 11: QLACS without communication (inspecting some states).
Number of runs:        1        2        3        4        5        6        7
Iterations:            9        10       10       13       13       9        13
Time (sec):            43.0025  30.0017  34.002   30.0017  31.0017  31.0018  27.0016
Memory usage (bytes):  18872    18160    19204    18208    17896    18164    18308
Inspected states (R1), runs 1 to 7 combined: G, F, G, C, C, F, D, B, E, A, F, E, D, A, C, F, G, E, A
Inspected states (R2), runs 1 to 7 combined: B, A, G, C, C, B, A, E, F, B, A, C, B, E, B, A, E, A, G

Table 12: QLACS without communication (inspecting all states).
Number of runs:        1        2        3        4        5        6        7
Iterations:            114      10       70       10       9        10       10
Time (sec):            43.0024  48.0028  41.0023  34.0019  34.0019  34.0019  35.002
Memory usage (bytes):  28440    17960    27216    17000    18456    17672    17968
Inspected states (R1), every run: F, G, E, D, A, B, C
Inspected states (R2), every run: C, B, A, D, E, G, F

Table 13: Processes used in achieving Tables 11 and 12.

(a) For Table 11: inspecting some states
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zero.
(3) Select a random action if the Q-values of all possible actions at the current state are zero.
(4) Select the action with the highest Q-value if there is a maximum Q-value.
(5) Compute and update the Q-value of the selected action.
(6) Get a new state among the possible states.
(7) If the new state = goal state, then go to Steps 5 and 9.
(8) Repeat Steps 3 to 6 until Step 7.
(9) Shutdown.

(b) For Table 12: inspecting all states
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zero.
(3) Select a random action if the Q-values of all possible actions at the current state are zero.
(4) Select the action with the highest Q-value if there is a maximum Q-value.
(5) Compute and update the Q-value of the selected action.
(6) Get a new state among the possible states.
(7) If the new state = goal state, then go to Steps 5 and 10.
(8) Repeat Steps 3 to 7 until Step 9.
(9) All states except the goal state have taken Action = Inspect.
(10) Shutdown.
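Read literally, Table 13(b) is a single-robot tabular Q-learning episode: keep moving and updating Q-values until every state has been inspected, then shut down at the goal state. The Python sketch below is one possible rendering of those steps. The adjacency list, the 50/100/150 reward scheme, and the learning-rate rule gamma = 0.5/(1 + frequency) are taken from descriptions elsewhere in the paper, while the helper names and the purely random next-state choice are illustrative assumptions, not the authors' code.

    import random

    # Assumed adjacency of the model environment (H is the exit/goal state).
    NEIGHBOURS = {
        "F": ["G", "E", "D", "H"], "G": ["F", "E", "C"], "E": ["F", "G", "D", "B"],
        "D": ["F", "E", "A"], "A": ["D", "B"], "B": ["E", "A", "C"],
        "C": ["G", "B", "H"], "H": ["F", "C"],
    }
    REWARD = {"inspect": 50, "ignore": 100, "shutdown": 150}   # paper's reward scheme

    def run_episode(start="F", goal="H"):
        q = {s: 0 for s in NEIGHBOURS}            # Step 2: all Q-values start at zero
        freq = {s: 0 for s in NEIGHBOURS}
        state, trail = start, [start]
        while True:
            freq[state] += 1
            gamma = 0.5 / (1 + freq[state])        # learning rate rule used in the paper
            if state == goal:
                if all(q[s] > 0 for s in NEIGHBOURS if s != goal):
                    q[state] = REWARD["shutdown"] + gamma * max(q.values())
                    return trail                   # Steps 9-10: everything inspected, shut down
                action = "ignore"                  # reached the exit too early; keep moving
            else:
                action = "inspect" if q[state] == 0 else "ignore"
            q[state] = REWARD[action] + gamma * max(q[n] for n in NEIGHBOURS[state])
            state = random.choice(NEIGHBOURS[state])   # Step 6: move to a reachable new state
            trail.append(state)

Because the next state is chosen at random, run_episode() typically wanders through already inspected states before finally reaching H, which mirrors the repetition and high time cost reported for Table 12.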

the selected parameters used in achieving the optimal path. The optimal path found after nine test runs is the path with a distance cost of 10 for both entries to the environment, displayed in Table 16. Therefore, columns 4, 5, 7, 8, and 9 can be used as the optimized path input for the first component of QLACS. QLACS will then use any of the optimized paths to

navigate to the specified states and take decisions accordingly. For instance, the test run with nine ants gives 16 iterations in 60.0035 seconds and yields the shortest paths for both robots coming from entries F and C of the environment. The paths that give a cost of 10 are obtained from F-E-D-A-B-C-G-C (weights 1, 2, 2, 2, 1, 1, 1) and C-B-A-D-F-G-E-G-C (weights 1, 2, 2, 1, 1, 2, 2, 1), respectively.
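To make the cost-of-10 claim concrete, the snippet below sums edge weights along the reported trail. The weight table is an assumption read off the weighted map of the environment given earlier in the paper, and the helper name trail_cost is illustrative.

    # Illustrative check of a trail cost: sum of undirected edge weights along a route.
    WEIGHTS = {
        ("F", "E"): 1, ("F", "G"): 1, ("F", "D"): 1, ("F", "H"): 1,
        ("G", "E"): 2, ("G", "C"): 1, ("E", "D"): 2, ("E", "B"): 2,
        ("D", "A"): 2, ("A", "B"): 2, ("B", "C"): 1, ("C", "H"): 1,
    }

    def trail_cost(trail):
        """Sum the edge weights along a trail given as a string of state labels."""
        cost = 0
        for a, b in zip(trail, trail[1:]):
            # look the edge up in either direction; missing edges contribute 0 in this sketch
            cost += WEIGHTS.get((a, b), WEIGHTS.get((b, a), 0))
        return cost

    print(trail_cost("FEDABCGC"))   # 1+2+2+2+1+1+1 = 10, the optimal length in Table 16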

Alpha (α), Beta (β), and Rho (ρ) represent the heuristic properties of the ACS algorithm. The Alpha factor measures the influence of pheromone concentration and can bias selection towards the path with the highest pheromone concentration. Beta measures the influence of the distance between any two adjacent nodes; it is a heuristic factor that weighs distance, not pheromone, when selecting the next state. Rho is the rate at which pheromones evaporate; it governs how often new paths are explored by the agents (ants) rather than old paths being reinforced. Each agent cooperates by having access to the other agents' pheromone values. The pheromone value is initialized to 0.01 and is continually updated until learning stops; this mirrors the Q-learning component of QLACS, where all Q-values are initialized to zero and updated at every new step.
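The compact sketch below shows where these three parameters typically act in the ACS transition and evaporation rules, using the values from Table 15 (α = 3, β = 2, ρ = 0.01) and the 0.01 pheromone initialization mentioned above. The formulas follow the standard ACS rules rather than quoting the paper's equations, and the function names are assumptions for illustration.

    # Illustrative ACS helpers showing the roles of alpha, beta, and rho.
    ALPHA, BETA, RHO = 3, 2, 0.01   # values from Table 15

    def transition_probabilities(pheromone, distance, candidates):
        """P(i -> j) proportional to pheromone[j]**alpha * (1/distance[j])**beta."""
        weights = {j: (pheromone[j] ** ALPHA) * ((1.0 / distance[j]) ** BETA)
                   for j in candidates}
        total = sum(weights.values())
        return {j: w / total for j, w in weights.items()}

    def evaporate_and_deposit(pheromone, edge, deposit):
        """Evaporation at rate rho, then deposit of new pheromone on the edge just used."""
        pheromone[edge] = (1 - RHO) * pheromone[edge] + deposit
        return pheromone

    # Example: from F, the neighbours G, E, D all start with pheromone 0.01 and weight 1,
    # so the transition probabilities are uniform before any pheromone has been laid.
    probs = transition_probabilities({"G": 0.01, "E": 0.01, "D": 0.01},
                                     {"G": 1, "E": 1, "D": 1}, ["G", "E", "D"])
    print(probs)   # roughly 1/3 each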

Table 16 gives the optimized routes to be used by the robots to achieve the inspection tasks. Combining Tables 14 and 16 yields Table 17, which shows optimized time cost, memory usage, route cost, and good communication.

4.5. Experiment 4: Benchmarking the New Hybrid Model (QLACS) with Popular Methods. Choosing the optimized paths, QLACS performs cooperative inspection behaviour through communication among the robots. Looking at the sample results in Table 17, QLACS chooses the shortest complete inspection routes from the different runs. In this case, the best paths were obtained from Table 16 by using 9, 10, and 12 agents. All the test runs gave best trail paths from both entrances, listing the complete states


Table 14: QLACS with communication.
Number of runs:        1        2        3        4        5        6        7
Iterations:            9        10       9        13       10       10       9
Time (sec):            33.0019  31.0017  31.0018  31.0018  29.0023  30.0017  31.0018
Memory usage (bytes):  15748    15948    15748    18232    16576    15948    15748
Inspected states (R1), runs 1 to 7 combined: F, G, E, F, G, E, D, F, G, E, F, E, F, G, E, D, B, F, G, E, D, F, G, E
Inspected states (R2), runs 1 to 7 combined: C, B, A, D, C, B, A, C, B, A, D, F, G, E, D, A, C, A, C, B, A, C, B, A, D

Table 15: Navigation behaviour parameter specification.
Type of ACO: ACS; Population: 9 or 12; Length of path: 8; Pheromone coefficient (β): 2; Heuristic coefficient (α): 3; Evaporation rate (ρ): 0.01.

Figure 9: Comparison of time costs for QL, ACS, and QLACS (time cost versus number of runs).

Figure 10: Comparison of route costs for QL and QLACS (route cost versus runs; path costs for R1 and R2 under QL and under QLACS).

with length 10, as shown in Table 16; that is, they have states from F to C and from C to F. The length is the sum of the weights along the trail edges. QLACS then uses the optimized paths to make decisions on inspections. The result in Table 17 shows that no state is inspected twice or visited more than required. The QLACS model concentrates on making the best inspection decisions for the MRS. The first run in Table 17 shows that nine agents (ants) were used for the route finding; the optimized route was achieved after nine iterations in 7.0004 sec, and the states were all inspected effectively without redundancies.

Figure 11: Number of ants/iterations required for QLACS to find the optimal path (per run).

Figure 12: Time and iterations required for QLACS to achieve optimal behaviour (per run).

4.5.1. Comparative Study of the Proposed Model (QLACS) with QL Only and ACS Only. Based on the results tabulated in Table 18 and Figure 9, the average time costs of achieving the MRS behaviours with QL, ACS, and QLACS were compared. Two robots use an average of 30.8590 sec to achieve thorough inspection behaviour using QL, an average of 40.4309 sec to achieve optimal navigation behaviour using ACS, and an average of 9.2868 sec to achieve both navigation and cooperative behaviour using QLACS. The result shows


Table 16: Navigation behaviour computations.
Number of runs:            1        2        3        4        5        6        7       8        9
Number of ants:            5        7        8        9        10       11       12      15       28
Iterations:                12       26       14       16       23       14       16      23       22
Time (sec):                37.0021  37.0021  39.0022  60.0035  32.0019  33.0019  35.002  43.0025  65.0037
Best trail distance (R1):  12       12       10       10       10       10       10      10       10
Best trail distance (R2):  10       12       12       10       10       12       10      10       10

Table 17: New hybrid model QLACS computations.
Number of runs:        1       2       3       4        5        6        7       8       9
Iterations:            9       9       9       9        9        9        9       9       9
Number of ants:        9       9       9       10       10       10       12      12      12
Time (sec):            7.0004  8.0005  9.0005  11.0006  11.0006  12.0007  8.0005  9.0005  9.0006
Memory usage (bytes):  10288   10280   10248   10288    10288    10288    10164   10712   10164
Inspected states (R1): F G E D | F G E D | F G E D | F G E A B | F G E A B | F G E A B | F G D A | F G D A | F G D A
Inspected states (R2): C B A | C B A | C B A | C D | C D | C D | C B E | C B E | C B E

Table 18: Time cost comparison for QL, ACS, and QLACS.
Run      QL time cost (sec)   ACS time cost (sec)   QLACS time cost (sec)
1        33.0019              37.0021               7.0004
2        31.0019              37.0021               8.004
3        31.0018              39.0022               9.0005
4        31.0018              32.0019               9.0005
5        29.0023              35.002                11.0006
6        30.0017              43.0025               12.0007
7        31.0018              60.0035               9.0006
Average  30.8590              40.4309               9.2868

that our proposed integrated algorithm performs better, with reduced time cost. On the same note, the route costs for QL and QLACS were also compared. The results in Table 19 and Figure 10 show that the proposed QLACS model gave a much lower route cost than QL.
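As a quick arithmetic check of the averages quoted above, the snippet below recomputes them from the per-run values of Table 18 (with decimal points restored from the extracted figures, so they should be treated as approximate).

    # Recompute the average time costs reported in Table 18.
    ql    = [33.0019, 31.0019, 31.0018, 31.0018, 29.0023, 30.0017, 31.0018]
    acs   = [37.0021, 37.0021, 39.0022, 32.0019, 35.002, 43.0025, 60.0035]
    qlacs = [7.0004, 8.004, 9.0005, 9.0005, 11.0006, 12.0007, 9.0006]

    for name, series in (("QL", ql), ("ACS", acs), ("QLACS", qlacs)):
        print(name, round(sum(series) / len(series), 4))
    # QL 30.859, ACS 40.4309, QLACS 9.2868 -- matching the averages quoted in the text.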

The number of ants and iterations used to achieve this is displayed in Figure 11: the more ants, the more iterations. The best routes created in the five test runs shown in Figure 11 are run numbers 1 and 3; they used fewer ants and iterations to achieve the optimal routes. The optimal result for the proposed model is achieved within nine iterations for every run, and the iteration count remains constant for any number of agents and robots. The blue line with star markers in Figure 12 is the iteration value for the 9 runs; the red line shows the time required for each run. The time is also almost stable for QLACS, though it fluctuates a little in the middle. This signifies that the proposed model is more robust and effective in finding the optimal route and coordinating the MRS. The reduced route cost and shorter computation time achieved with QLACS satisfy the criteria for cooperative behavioural purposes.

4.6. Experiment 5: Scalability of QLACS Using Two, Three, and Four Robots. In this section we present results obtained by experimenting with the cooperative behavioural action of two, three, and four robots using the QLACS model. The performance of the two, three, and four robots was evaluated by running the simulation three times with the same number of agents. The proposed QLACS model shows good communication between the two, three, and four robots, as seen in the states inspected in the last four columns of Table 20. An almost stable time was used in achieving the inspection for all the robots. The details of the simulation results are laid out in Table 20.

A notable observation emanating from Table 20 is that there is a need for a larger mine area of inspection, because robot R1 in rows 4 to 6 could inspect only one state, while robots R1 and R2 in rows 7 to 9 could inspect only one state each. This implies that the size of the field of inspection should be proportional to the number of robots to be deployed.

5. Concluding Remarks

This paper has shown how an MRS behaves cooperatively in underground terrains. The QL algorithm was investigated for its capacity for good communication, while the ACS algorithm was explored for its good navigation properties. The strengths of the two algorithms were harnessed and optimized to form the hybrid model QLACS, which addresses the behavioural situation in MRS effectively.

The analytical solution of the new model was explained in Algorithms 3 and 4 and Tables 7 and 10. The hybrid architecture of the model was described in Figure 7, and the experimental setup describing different navigation locations for two robots was shown in Figure 8.

The new hybrid model QLACS for cooperative behaviour of MRS in an underground terrain was shown to be able to find the optimal route and handle cooperation between two robots


Table 19: Route cost comparison for QL and QLACS.
Run      QL path cost (R1)   QL path cost (R2)   QLACS path cost (R1)   QLACS path cost (R2)
1        20                  20                  10                     10
2        32                  19                  10                     10
3        28                  14                  10                     10
4        27                  27                  10                     10
5        39                  30                  10                     10
Average  29.5                22                  10                     10

Table 20: Summary of scalability performance on QLACS.
Row   Number of robots   Time (sec)   States inspected by R1   R2   R3   R4
1     2                  10.0006      4                        3    -    -
2     2                  11.0007      4                        3    -    -
3     2                  8.0005       4                        3    -    -
4     3                  16.0009      1                        3    3    -
5     3                  17.001       1                        3    3    -
6     3                  12.0006      1                        3    3    -
7     4                  11.0006      1                        1    3    2
8     4                  14.0006      1                        1    3    2
9     4                  10.006       1                        1    3    2

effectively. The cooperative behavioural problem was narrowed down to two situations: navigational and cooperative behaviours. ACS was used to establish the shortest optimal route for the two robots, while QL was used to achieve effective cooperation decisions. The results achieved after integrating the two algorithms showed a reduction in route costs and computation times, as shown in Figures 9, 10, 11, and 12. The comparative analysis of QL, ACS, and QLACS showed that the last-named is more robust and effective in achieving cooperative behaviour for MRS in the proposed domain.

The results of this work will be used for future simulation of MRS applications in hazardous environments. There is clear scope for expanding this research, both towards real-life implementations and towards larger models. The model used in this work can also be improved upon by increasing the state space.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the financial support of the Organisation for Women in Science for the Developing World (OWSDW) and L'Oreal-UNESCO for Women in Science, and the resources made available by the University of Cape Town (UCT) and the University of South Africa (UNISA), South Africa.

References

[1] J. Leitner, "Multi-robot cooperation in space: a survey," in Proceedings of the Advanced Technologies for Enhanced Quality of Life (AT-EQUAL '09), vol. 37, pp. 144-151, IEEE Computer Society, Iasi, Romania, July 2009.
[2] G. Dudek, M. R. M. Jenkin, E. Milios, and D. Wilkes, "A taxonomy for multi-agent robotics," Autonomous Robots, vol. 3, no. 4, pp. 375-397, 1996.
[3] A. Chella, G. Lo Re, I. Macaluso, M. Ortolani, and D. Peri, "Chapter 1: A networking framework for multi-robot coordination," in Recent Advances in Multi Robot Systems, A. Lazinica, Ed., pp. 1-14, 2008.
[4] C. Yinka-Banjo, I. O. Osunmakinde, and A. Bagula, "Autonomous multi-robot behaviours for safety inspection under the constraints of underground mine terrains," Ubiquitous Computing and Communication Journal, vol. 7, no. 5, pp. 1316-1328, 2012.
[5] S. Yarkan, S. Guzelgoz, H. Arslan, and R. Murphy, "Underground mine communications: a survey," IEEE Communications Surveys & Tutorials, vol. 11, no. 3, pp. 125-142, 2009.
[6] M. Peasgood, J. McPhee, and C. Clark, Complete and Scalable Multi-Robot Planning in Tunnel Environments, DigitalCommons, 2006.
[7] F. E. Schneider, D. Wildermuth, and M. Moors, "Methods and experiments for hazardous area activities using multi-robot system," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3559-3564, May 2004.


[8] R. Zlot and A. Stentz, "Market-based multirobot coordination for complex tasks," International Journal of Robotics Research, vol. 25, no. 1, pp. 73-101, 2006.
[9] L. E. Parker, "Current research in multi-robot systems," in Proceedings of the 7th International Symposium on Artificial Life and Robotics (ISAROB '03), vol. 7, pp. 1-15, 2003.
[10] C. Thorp and H. Durrant-Whyte, "Field robots," in Proceedings of the 10th International Symposium on Robotics Research, vol. 6, pp. 329-3240, 2003.
[11] S. R. Teleka, J. J. Green, S. Brink, J. Sheer, and K. Hlophe, "The automation of the 'Making Safe' process in South African hard-rock underground mines," International Journal of Engineering and Advanced Technology (IJEAT), vol. 1, pp. 1-7, April 2012.
[12] L. Cragg and H. Hu, "Application of reinforcement learning to a mobile robot in reaching recharging station operation," in Intelligent Production Machines and Systems, pp. 357-363, 2005.
[13] R. A. C. Bianchi, C. H. C. Ribeiro, and A. H. R. Costa, "On the relation between ant colony optimization and heuristically accelerated reinforcement learning," in Proceedings of the 1st International Workshop on Hybrid Control of Autonomous System, pp. 49-55, 2009.
[14] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, Oxford, UK, 1999.
[15] Y. Liu and K. M. Passino, Swarm Intelligence: Literature Overview, Department of Electrical Engineering, University of Ohio, 2000.
[16] L. Li, A. Martinoli, and Y. S. Abu-Mostafa, "Emergent specialization in swarm systems," in Intelligent Data Engineering and Automated Learning - IDEAL 2002, vol. 2412 of Lecture Notes in Computer Science, pp. 261-266, 2002.
[17] B. Englot and F. Hover, "Multi-goal feasible path planning using ant colony optimization," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 2255-2260, May 2011.
[18] Z. N. Naqiv, H. Matheru, and K. Chadha, "Review of ant colony optimization on vehicle routing problems and introduction to estimated-based ACO," in International Conference on Environment Science and Engineering (IPCBEE '11), vol. 8, pp. 161-166, 2011.
[19] M. Dorigo and T. Stutzle, "The ant colony optimization metaheuristic: algorithms, applications, and advances," Tech. Rep. IRIDIA-2000-32, 2000.
[20] R. Claes and T. Holvoet, "Cooperative ant colony optimization in traffic route calculations," in Advances on Practical Applications of Agents and Multi-Agent Systems, vol. 155 of Advances in Intelligent and Soft Computing, pp. 23-34, 2012.


Page 6: Cooperative Behaviours with Swarm Intelligence in Multirobot

6 Mathematical Problems in Engineering

Table 3 Initial reward matrix

Robotrsquos actionA B C D E F G H

Robotrsquos state

A mdash 50 100 mdash 50 100 mdash mdash mdash mdashB 50 100 mdash 50 100 mdash 50 100 mdash mdash mdashC mdash 50 100 mdash mdash mdash mdash 50 100 150D 50 100 mdash mdash mdash 50 100 50 100 mdash mdashE mdash 50 100 mdash 50 100 mdash 50 100 50 100 mdashF mdash mdash mdash 50 100 50 100 mdash 50 100 150G mdash mdash 50 100 mdash 50 100 50 100 mdash mdashH mdash mdash 50 100 mdash mdash 50 100 mdash mdash

Bottom layer‐ A new hybrid model QLACS

‐ An efficient cooperative andnavigational model

Middle layer‐ An efficient scalable system

Topmost layer‐ Acquired results stored for future use

(a) (b)

Shortest round trip pathsGood communicationTime of performance

Number of iterations usedThorough inspection

Exit the system

Are theysuccessful

Monitor memorywith the

utilizationinterface platform

Record the results

Acquire informationfrom the bottom

layer

Build a hybrid modelusing QL and ACS

algorithms

Two robots

Test robots for

Add one morerobot to the

system

Results will beused in the

application layer

Database

Yes No

Database

Figure 3 Breakdown of the framework (a) Contributions of the framework Layer by layer and (b) processes of the multirobot behaviouralsystem

presented in Figure 5 The possible state actions of the robotsare presented in Table 3 The states of the robots are reducedas follows searching or inspecting for roof cracks and toxicgas levels in each state or room and recording the outcomeof the inspection in the robotsrsquo memories The processesinvolved in achieving good communication while the robotsinspect the states are broadcasting inspected states to theother robots and ignoring the inspectedbroadcasted stateavoiding collision with obstacles and other robots and finallymoving to the next available state that has not been inspected

The Q-learning algorithm is used to determine whataction is to be selected in any particular state Let theaction (119886) = inspect ignore shutdown state space (119904) =dimensions in the topological map sensor readings (119904

119903)

hazardous conditions (119867119888) = roof crack (RC) toxic gas level

(TG) Each robot is configured with an aerial scanner andchemical sensitive sensor that will provide readings (119904

119903)

to determine if there is a hazardous condition 119867119888 in the

environment The selection of an action by a robot throughthe Q-learning algorithm is based on the information fromthe broadcast The whole framework uses a decentralized 119876-learning scheme such that each robot has its own threadwith its Q-learning and table All 119876-values on the tables ofthe robots in the team are initialized to zero that is statescorresponding to such positions have not been inspectedWhen a robot enters a new state it broadcast its informationfor the current state to all other robots in the environmentThe broadcast indicates whether the current state has been

Mathematical Problems in Engineering 7

F

H

Houtside

Robot Bstarting position

Robot Astarting position

Linkbw A and B

mine

D

A B

E

G

C

Figure 4 Model of the environment with two entrances and exits(2EE)

F

BA (50 100)

(50 100)

(50 100)

(50 100)

(50 100) 150

150

(50 100)

(50 100)

(50

100

)

(50

100

)

(50

100

)

D E G

C

H

Goal state

Figure 5 Events and transition diagram of the modeled environ-ment with119867 as goal state

inspected or not The broadcast is a search of correspondingpositions in the memories of all visible robots If the resultantsearch returns a zero the broadcast indicates that the statehas not been inspected and if it returns a value greater thanzero it indicates that the state is not available for inspectionAll robots must receive a broadcast before they act in anyparticular state When a robot gets to a state it receives abroadcast and passes its value to the QL algorithm to makea decision The robot carries out the decision of the QLalgorithm The policy of the Q-learning algorithm makes arobot carry out an inspection if the resultant search of thebroadcast returns zero and ignores it if the resultant searchis greater than zero A robot only shuts down if all states havebeen visited and inspected As the broadcast information ispassed to the Q-learning algorithm the policy is iteratedtowards an optimal value and condition

The broadcast avoids the cost of multiple inspections andthe QL ensures that robots take appropriate actions Theonly state in which inspection is carried out would havesensor readings (119904

119903) indicating 119867

119888 For taking an action (119886)

in state (119904) the robot gets a reward (119877) which is used in(10) to compute the 119876-value for the current state and cansend a broadcast to other robots Figure 5 and Table 3 showthe restrictions and possible transitions among node pointsindicating the possible rewards for any particular action (119886)

in any state (119904) Every other transition besides the goal state

could result in a reward of 50 or 100 and at completionthe reward is the maximum 150 We consider it wasteful tomake the robots scan first before sharing intelligence becausesuch action would slow them down and make them expendmore energy Robots are to provide the inspection resultsshowing actions taken in different states and the nature ofconditions detected in any particular state The introductionof the broadcast approach to determine the teamrsquos explorationreduces the execution time and energy cost of the wholeteamandmakes the collaboration effective So in amultirobotscenario the task can be effectively executed if the robots aremade to share intelligence as they progress Robots do nothave to waste time and energy in doing the same task alreadycarried out by other robots in the team

GIVEN On a mathematical justification of the above sup-pose a safety preinspection of toxic gases or rock fall or somecombination of the two is being carried out on a piece ofcomplex underground terrain in Figure 1 say 119871Km2 thereis a limited number of MRS with different capacities 119877 andprecious inspection time 119879minutes Every regionstate 119909

1in

the terrain requires a capacity of robot1198771ofMRS and limited

time 1198791while every state 119909

119899requires robot 119877

119899of MRS and

inspection time 119879119899 Let 119875

1be the positive reward of QL for

correct behaviour on state 1199091and let 119875

119899be the reward on

state 119909119899 This research aims to maximise positive rewards by

choosing optimal values for states 1199091 119909

119899as follows

Maximise

1198751sdot 1199091+ 1198752sdot 1199092+ sdot sdot sdot + 119875

119899sdot 119909119899

(objective function)

Subject to constraints

1199091+ 1199092+ sdot sdot sdot + 119909

119899le 119871

(limited total states)

1198771sdot 1199091+ 1198772sdot 1199092+ sdot sdot sdot + 119877

119899sdot 119909119899le 119877

(limited MRS capacity)

1198791sdot 1199091+ 1198792sdot 1199092+ sdot sdot sdot + 119879

119899sdot 119909119899le 119879

(limited inspection time)

Non-negativity constraints

1199091ge 0 119909

2ge 0 119909

119899ge 0

(MRS cannot inspect negative states)

(1)

In terms of solving this optimization problem we use theproposed QLACS model to compare the time and number ofstates inspectedThe number of robots used is also comparedThe graphs results in Section 4 also give more insight into thesolution of the problem

32 Basic Navigation and Cooperative Behaviours UsingQLACS QLACS has two components The first componentis formed by an improved ACS and the second component

8 Mathematical Problems in Engineering

is formed by an improved QL The improvement occursbecause some heuristics were added to the ordinary QL andACS to achieve the hybrid QLACS However the secondcomponent ofQLACS which is an improvedQL was initiallyused to solve the proposed problem After much analysis werealized that the system needs to be optimized for effectivecooperation and communication

Using the second component of QLACS to solve the basicnavigation and cooperative behaviour the possible actionswere set for each robot as inspect ignore and shutdown (afterreaching the goal state 119867) Also a reward system that willreflect the possible actions of the robots was chosen In otherwords a robot gets 150 points only when the goal is achieved(shutdown) 100 points for ignoring an already inspected area(ignore) and 50 points for inspecting an uninspected area(inspect) Figure 5 shows the transition events of the modeland Table 3 displays the possible state action for each robotThe way the QLACS second component works here is basedon both navigation and communication behaviours

Achieving navigational behaviour with the second com-ponent of QLACS has some cost associated to it In ourscenario because we want the robots to share intelligenceby broadcasting results (robots search through other robotsrsquomemory) our problem is not solely navigational but alsocooperative Our behavioural actions are inspect ignoreand shutdown We noted that these actions of interest arenot navigation oriented there is no way we could usethem in making decisions on transition The functions fordecision can be integrated to assist the other Thereforeour behavioural model is an integration of two behaviours(1) navigational behaviour and (2) cooperating behaviourthrough decision making The integration works with aroute finding method called RandomStateSelector which weintroduced in the second component of QLACS to helpdetermine where the robot goes from the starting point tothe exit point Two parts of the RandomStateSelector methodare introduced in this work The first one is the Random-StateSelector C H which is used to transit from state 119862 to119867and the second one RandomStateSelector F H transits fromstate 119865 to 119867 This method works but not effectively becausesome of the states are repeated several times because of therandom selection method However the decision part of thissecond component of QLACS which is handled by a methodcalled CheckInspection worked efficiently CheckInspection isresponsible for sharing the broadcast among the agents Thebroadcast looks at all the stored 119876-values on all the robotsand returns a signal for the action that needs to be takenTherefore we concluded that the heuristically acceleratedcomponent of QLACS has proven to be effective by showingevidence of effective communication in making inspectiondecisions using our model It did not guarantee shortest pos-sible time for inspection because of repeated states decisionsIn this light we only harnessed the communication strengthof the second component of QLACS for communicationand cooperation Figure 5 and Table 3 form the basis for theQLACS second component

To take care of the random state selector problem encoun-tered in implementing the algorithm used for the secondpart of QLACS we introduced an optimized route finder

Table 4 Combined adjacency and weight matrix

119865 119866 119864 119863 119860 119861 119862 119867

119865 0 1 1 1 0 0 0 1119866 1 0 2 0 0 0 1 0119864 1 2 0 2 0 2 0 0119863 1 0 2 0 2 0 0 0119860 0 0 0 2 0 2 0 0119861 0 0 2 0 2 0 1 0119862 0 1 0 0 0 1 0 1119867 1 0 0 0 0 0 1 0

1

11

1

12

2 2

2

2

E

C

G

B

A

D

F

Figure 6 Weighted mapgraph of the model environment

algorithm This route finder algorithm which forms the firstcomponent of QLACS is a swarm intelligence techniqueFigure 6 andTable 4 form the basis of the exploration space ofthe agents (ants) which achieved the optimum route findingThe weighted graph in Figure 6 is based on the number ofobstacles the ants will encounter from the points of entry 119865and 119862 The combined table in Table 4 contains the weightsof the obstacles and evidence of an edge between any twovertices (states) It shows that there is a connection betweenany two vertices in a graph Generally a ldquo1rdquo or ldquo2rdquo depictsthe existence of an edge while a ldquo0rdquo represents the absenceof an edge that is no transition between such vertices Theconstructed graph is an undirected multigraph providingevidence of some agents coming from 119865 or 119862 of the mine(logical space) It is unidirectional because agents can movein any particular direction (multigraph) This means that thesameweights on the edges apply to both directionsThe graphdoes not show119867 we assume that once the agents reach 119865 or119862 they exit if all inspections have been done

33 HybridModel Development Theparameters for buildingthe analytical hybrid QLACS (equations (2) through (11))model are presented in Tables 5 and 6

331 Algorithmic and Mathematical Analysis

ACS Starts Computation of edge attractiveness

120578119894119895=

1

119863119894119895

(2)

Mathematical Problems in Engineering 9

Table 5 A list of parameters for the ACS model

ACO parameters Meaning120572 Pheromone influence factor120573 Influence of adjacent node distance120588 Pheromone evaporation coefficient119876 Attractiveness constant119890 Visited edge1198901015840 Edge not visited119871119896 Length tour of ant k

120591 Pheromone concentration (amount)120578 Specific visibility function (attractiveness)Δ120591119896 Pheromone concentration by Kth ant

119875119903(119894 119895) Probability of moving from i to j

119863119894119895 Visibility or distance between i and j

119891119894 Fitness of individual in a population119875119894 Probability of being selected among 119891

119894

119873 Number of individuals in the population119894 119895 Denotes any two adjacent nodes in the graph119872119896 Set of unvisited nodes

Table 6 A list of parameters for the QL model

QL Parameters Meaning119876 119876-value update119904 State119886 Action119877 Reward120574 Learning rate

Computation of instantaneous pheromone by ant 119896

Δ120591119896=

119876

119871119896

(3)

Update of pheromone

120591119894119895= (1 minus 120588) lowast 120591

119894119895+ Δ120591119896

119894119895 (4)

Computation of edge probability

119875119903(119894 119895) =

[120591119894119895]120572[120578119894119895]120573

Σ1198901015840=(119894119895)

[120591119894119895]120572[120578119894119895]120573 (5)

Adoption of roulette wheel

Cumulative (119875119903(119894 119895)) =

119873+1

sum119894=1

119875119903(119894 119895) (6)

119891119894=sum119873

119895=1119891119895

119873 (7)

119875119894=

119891119894

sum119873

119895=1119891119895

(8)

Equations (2) through (8) build the complete route find-ing model Equations (2) through (4) are prerequisite to (5)Equation (5) is prerequisite to roulette wheel selection At theend of (8) new states are selected and the trail is updatedThebest path from both directions is selected and used as inputin Q-learning Note that 119863

119894119895= weight on edges 119875

119903(119894 119895) =

chance of moving to a node 119864(119868 119895) 119871119896= sum of visited nodes

by ant 119896

QL Starts Each robot in its QL threadcomputes its learning rate

120574 =05

[1 + Frequency (119904 119886)] (9)

119876-values are updated

119876 (119904 119886) = 119877 (119904 119886) + 120574 (10)

making a broadcast (Decision = InspectIgnoreShut-down)

119876 (119904 119886) =

119876 = 0 Inspect if 119904119895

= goalstate119876 gt 0 Ignore if 119904

119895= goalstate

119876 ge 0 Shutdown if 119904119895= goalstate

(11)

Equation (9) is Gamma the learning rate which is alwaysbetween zero and 1 This equation is calculated based onthe frequency of action of each robot in inspecting statesEquations (9) to (11) are state-dependent The states are keptin a buffer and then accessed at run time ACS and QL do notwork simultaneously ACS works to completion and QL takesthe final output as its input ACS is not repeatedly called whileQL is working

Our behavioural model is an integration of two algo-rithms (1) route finding algorithm (2) communication andcooperative algorithm The integration works the followingway The optimal route finder (ACS) determines where therobot goes from the starting point to the destination whileQL determines what action it takes when it gets to any ofthe states This collaboration works effectively because theoptimal route finder has been proven to give the best possibletransitions and the heuristically accelerated QL has provento be effective by showing evidence of effective commu-nication in making inspection decisions Consequently itguarantees the shortest possible time for inspection in theabsence of wasteful inspection decisions This frameworkforms the basis for our cooperative behaviourmodel forMRS(QLACS)Thepseudocode for the implementation ofQLACSis outlined in Algorithm 2

332 QLACS Hybrid Approach Evolution Figure 7 is thearchitecture of hybrid QLACS explaining the handshakebetween the two components

(i) Graph Construct Module This is the interconnection ofstates in our model environment (see Figure 4) It encom-passes all possible transitions from 119865 to 119862 and vice versaThus from themodelwe construct an adjacencyweight graphmatrix that can be traversed by any graph-oriented algorithm

10 Mathematical Problems in Engineering

INPUT Edge distance(obstacles) pheromones antsrsquo trail associated probabilitiesstarting and terminating indexes that is from F or COUTPUT Effective cooperation inspection and navigation

(1) Boolean CompletedFlag = False Boolean variable indicates completion for all the threads(2) Declare CompletedThreadBuffer Data structure stores Info about completed thread(3) Initialize all 119876values to Zero All 119876values positions are initialized to zero(4) Initialize Best Paths From ACO algorithm starting from 119865 and 119862(5) While (CompletedFlag ltgt True) Checks for when all robots have finished and flag is true

BeginCreate N number of Threads in ParallelThreads get Current States in Parallel from ACO algorithmThreads get Next States and Indexes in Parallel from ACO algorithmThreads compare 119876values of all corresponding positions of Current States (Each Robot Broadcast119876value info)IF ((Q == 0)amp (ThreadNextState ltgt GoalState)) Checks if a particulate state is availableBeginState is Available Robot with the CurrentThreadID InspectsCompute and Update 119876value

EndIF (Q gt 0) checks if a state is not available because an already inspected state has 119876 gt 0BeginState is already inspected Robot with the CurrentThreadID IgnoreCompute and Update 119876value

EndIF ((Q == 0) amp (ThreadNextState == GoalState)) Checks for goal state and shuts downBeginCompute and Update 119876valueGoal state is reached and Inspection CompletedThread with the CurrentThreadID Shuts downStore CurrentThreadID in CompletedThreadBuffer

EndIF (Count [CompletedThreadBuffer] == NumberOfRobot) Learning stops when this happensBeginCompletedFlag = True

EndEnd of While Loop

Algorithm 2 Pseudocode for QLACS

No

No

Yes

Yes2

robots

Graphconstruct

Initializeparameters

module

Stateselectionmodule

Stateaction

module

QLACSupdatemodule

Statebroadcasting

module

Goal statemodule

New statemodule

Decisionmodule

Cooperativeinspection

module

Convergencedecisionmodule

Transitionmodule

Updatemodule

Figure 7 Hybrid architecture

Mathematical Problems in Engineering 11

Example I of QLACS (Using Figure 5 for optimized route finding)Parameters used in the first component of QLACS119876 = 20 120572 = 3 120573 = 2 120588 = 001

Starting State 119865Terminating State 119862119865Terminating condition When all states have been visited at least once and it is at terminating stateState Space = 119865 119866 119864119863 119860119861 119862Initialize pheromones positions to 001Rand = (0 1) returns a random value between 0 and 1Equations(i) Computation of attractiveness 120578

119894119895using (2)

(ii) Computation of instantaneous pheromones by ant 119896 for all potential states using (3)(iii) Pheromone update using (4)(iv) Computation of probabilities for the potential states using (5)(v) Adaptation of roulette Wheel using (6)Equation (5) can be reduced to Let 119908(119894 119895) = [120591

119894119895]120572

[120578119894119895]120573

Sum =119873+

sum119894=1

119908 (119894 119895)

So 119875119903(119894 119895) = 119908

sum

Algorithm 3 Parameters for QLACS Example I

In our case there are eight states primarily seven internalstates and a common terminating state Since the states arenot just linear the environment allows for multiple optionsthat is an agentant can choose from any present state Thistype of scenario is also called a multigoal scenario

(ii) State Selection Module Here the agents select the startand end states which according to our model in Figure 4 are119865 and 119862 These states are selected based on the cumulativeprobability of two adjacent nodes in (6)

(iii) Transition Module This module takes care of the tran-sition rules of the agents by calculating the pheromoneconcentrations distances and probabilities using (2) through(4)

(iv)UpdateModuleAfter transition fromone state to anotheran update of the pheromone is computed after whichmultipath planning with shortest path is achieved

(v) Convergence Decision Module This is where the besttrailpath decision is takenThis is the end product of the firstcomponent of QLACS which is then moved to the secondcomponent of QLACS for cooperative behaviour

(vi) Cooperative Inspection Module This is where the robotsare deployed to start inspection The robots are to use theacquired best paths starting from 119865 and 119862 respectively asinput for the second component of QLACS The two robotsare to use the learning rate from (9) to learn the environmentand use (10) for cooperation

(vii) State Broadcasting Module This module handles thebroadcasting behaviours of the two robots which areachieved by using (10) Each robot checks its memoryrepresented by 119876-values before taking any decision

(viii) State Action Module State broadcasting by each robot isimmediately followed by action selection In other words thestate to inspect or ignore is achieved here using (11)

(ix) QLACS Update Module After each action selection the119876-values of each robot are updated using (10)

(x) New State Module This module takes care of the robotrsquosnew state after updating the 119876-values Each robot runs in itsown threads managing its memory yet sharing information

(xi) Final Decision ModuleThis decision module determinesif the robot should exit the environment or still do moreinspections It is also controlled by (11)

(xii) Goal State Module The completion of the second com-ponent of QLACS is getting to the goal state after successfulinspection of statesThis goal state according to our model inFigure 4 is called119867

333 Analytical Scenario For the benefit of robotic andsystem engineers practitioners and researchers this paperuniquely presents scenarios on the ease of implementation ofthe proposed QLACS as described in Algorithms 3 and 4 andTables 7 and 10 The first component of QLACS is explainedin Example I shown in Algorithm 3 and Table 7 The firstcomponent that achieved optimal route paths for the robotshas the parameters listed in Algorithm 3 The calculation fordifferent states transition for the first component of QLACSis analysed in Table 7

Repeat steps 1ndash6 for subsequent current states until thetermination condition and state are reached At the end ofseven updates we have Tables 8 and 9 The total lengthshown in Table 8 represents the total number of obstaclesencountered by each agent while trailing to achieve theoptimal route for the robots The number of obstacle shown

12 Mathematical Problems in Engineering

Table7QLA

CSEx

ampleI

navigatio

nalanalytic

alsolutio

n

Startin

gsta

te119865

Potentialstates119863119864

119866(1)U

se(2)120578119865119863=1120578119865119864=1120578119865119866=1

(2)U

se(3)119871119896=1Δ120591119896(119865119863

)=21

=2Δ

120591119896(119865119864)=2Δ

120591119896(119865119866

)=2

(3)U

se(4)120591 119865119863=(1minus001)lowast001+2=20099=201120591119865119864=201120591119865119866=201

Firstp

heromon

eupd

ate

119865119866

119864

001

201

201

119863119860

119861

201

001

001

119862

001

(4)U

se(5)119908

(119865119863

)=[120591119865119863]120572[120578119865119863]120573=(201)3=812119908

(119865119864)=812119908

(119865119866

)=812

Sum

=119908(119865119863

)+119908(119865119864)+119908(119865119866

)=2436

119875119903(119865119863

)=

119908 sum

=812

2436=033119875119903(119865119864)=033119875119903(119865119866

)=033

Prob

abilitie

s119865

119866119864

0033

033

119863119860

119861

033

00

119862 0

(5)U

se(6)

119867 0

119865119866

119864

0033

033

119863119860

119861

033

00

119862 0

CallRa

ndRa

nd=007R

andfalls

instate119864

Rou

lette

wheelselects119864

asthen

extstateend

ofroulettewheel

(6)U

pdatetrail119865119864

Currentstate119864

Potentialstates119861119863

119866(1)U

se(2)120578119864119861=12120578119864119863=12120578119864119866=1

(2)U

se(3)119871119896=(119865

minus119864minus119863)=1+2=3119871119896=(119865

minus119864minus119861)=1+2=3119871119896=(119865

minus119864minus119866)=1+2=3

Δ120591119896=2 3=067

(3)U

se(4)120591 119864119861=(1minus001)lowast201+067=266120591119864119863=266120591119864119866=266

Second

pherom

oneu

pdate

119865119866

119864

001

266

201

119863119860

119861

266

001

266

119862

001

(4)U

se(5)119908

(119864119861)=(266)3lowast(12)2=471119908

(119864119863

)=471119908

(119864119866

)=(266)3lowast(1)2=1882

Sum

=471+471+1882=2824

119875119903(119864119861)=

119908 sum

=471

2824=017119875119903(119864119863

)=

471

2824=017119875119903(119864119866)=1882

2824=067

Prob

abilitie

s119865

119866119864

0067

0

119863119860

119861

017

0017

119862 0

(5)U

se(6)

119867 0

119865119866

119864

0067

0

119863119860

119861

017

0017

119862 0

CallRa

ndRa

nd=09Ra

ndfalls

instate119863

Rou

lette

wheelselects119863

asthen

extstateend

ofroulettewheel

(6)U

pdatetrail119865119864

119863

Mathematical Problems in Engineering 13

Example II of QLACS (For good cooperation and communication between robots)Parameters used in the second component of QLACS (Using output from the first component of QLACS)Reward Scheme Inspect = 50 Ignore = 100 Shutdown = 150State space Optimized path from QLACS R1 = 119865 119864119863 119860 119861 119862 119866 119862 and QLACS R2 = 119862 119861 119860119863 119865 119864 119866 119862

Starting State 119865119862119878119895= Terminating State 119862 then119867

Terminating condition When all states have been visitedInitialize 119876value positions to zerosEquations(i) Compute learning rate using (9)(ii) Compute update on 119876(119904 119886) using (10)

Algorithm 4 Parameters for QLACS Example II

Table 8 Pheromone update of a full cycle

Pheromone updateCurrent states 119865 119866 119864 119863 119860 119861 119862

119865 001 201 201 201 001 001 001 1st update119864 001 266 201 266 001 266 001 2nd update119863 313 001 303 001 303 001 001 3rd update119860 001 001 001 045 001 045 001 4th update119861 001 001 067 001 067 001 07 5th update119862 001 091 001 001 001 089 001 6th update119866 071 001 069 001 001 001 071 7th update119862 = terminating state and ant 119896 has moved through all states at least onceTrail 119865119864119863119860119861119862119866119862Number of obstacles = 1 + 2 + 2 + 2 + 1 + 1 + 1 + 1 = 10

Table 9 Probability update of a full cycle

Probability tableCurrent states 119865 119866 119864 119863 119860 119861 119862

119865 0 033 033 033 0 0 0119864 0 067 0 017 0 017 0119863 069 0 016 0 016 0 0119860 0 0 0 05 0 05 0119861 0 0 016 0 016 0 068119862 0 052 0 0 0 049 0119866 045 0 0 0 0 0 045119862 = terminating state

as 10 is calculated using the trail result in Table 8 Table 9displays the probability update table for a full cycle of routefinding In the case of this first example the first robot willterminate its inspection through 119862 and then119867

Algorithm 4 displays the parameters of the second exam-ple of QLACS The second component of QLACS handlesthis aspect of the model which is the communication andcooperative part Once the first component hands the outputto the second component it becomes the second componentinput and it runs with it From Algorithm 4 QLACS R1represents the input for robot 1 and QLACS R2 representsthe input for robot 2 received from the first component of

QLACS The terminating condition is when all states havebeen visited

The rest of Example II displayed in Table 10 explains thecooperative behavioural scenario from one state to anotherfor two robots The tables in the last row of Table 10 show thecommunication and cooperative output achieved using thesecond component of QLACS

4 Experimental Evaluations SafetyInspections for MRS Behaviours

The experimental results of implementing QLACS in anenvironment that consists of obstacles and links are tabu-lated in this section The experimental setup is explainedin Section 41 Different performance categories are shownin this section without communication category and withcommunication category In the without communicationcategory as displayed in Section 42 we found that robotscan inspect all states individually without knowing thatanother robot exists Robots can also inspect some of thestates thereby leaving some states not inspected The com-munication category is explained in Section 43 while theperformance of the QLACS measured with other existingmethods is tabulated in Section 44

41 Experimental Setup Figure 8 displays different sets ofexperiments conducted in the environment using two robotsFigure 8(a) shows how two robots resume inspection fromtwo different entrances In each set of experiments the robotstake the readings of the roof cracks and level of toxic gasesusing their sensors The same behaviours happen in Figures8(b) and 8(c) respectively at different inspection locationsThe inspection locations vary in the four experimental setupsshown in Figure 8

42 Experiment 1 Performance of QLACS without Coop-eration The result of implementing the QLACS withoutcommunication in the proposed environment (Figures 1 and4) is shown in Tables 11 and 12 In the case of Table 11 robot1 (R1) enters the mine through state 119865 while robot 2 (R2)enters the mine through state 119862 However each robot learnsby inspecting some of the states and ignoring some of the

14 Mathematical Problems in Engineering

Table 10 QLACS Example II cooperative behaviour

Starting simulation Robot 1Starting state F(1) Use (9)120574 (119865) =

05

[1 + 1]=05

2= 025

(2) Check the 119876-value for state 119865 (use (11))119876 (119865 119886) = 0

Selected action 119886 = inspect(3) use (10)119876 (119865 119886) = 50 + 025(0) = 50

End of value iteration

Current state E Robot 1(1) Use (9)120574 (119864) =

05

[1 + 1]=05

2= 025

(2) Check the Q-value for state 119864 (Use (11))119876 (119864 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119864 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current state C Robot 2(1) Use (9)120574 (119862) =

05

[1 + 1]=05

2= 025

(2) Check the Q-value for state 119862 (use (11))119876 (119862 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119862 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current state B Robot 2(1) Use (9)120574 (119861) =

05

[1 + 1]=05

2= 025

(2) Check the 119876-value for state 119861 (use (11))119876 (119861 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119861 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current stateD Robot 1(1) Use (9)120574 (119863) =

05

[1 + 1]=05

2= 025

(2) Broadcast (use (11))119876 (119863 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119863 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current State A Robot 2(1) Use (9)120574 (119860) =

05

[1 + 1]=05

2= 025

(2) Broadcast (use (11))119876 (119860 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119860 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iterationCurrent state A Robot 1(1) Use (9)120574 (119860) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119860 119886) = 50 that is 119876 gt 0

Selected action 119886 = Ignore(3) Use (10)119876 (119860 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iteration

Current stateD Robot 2(1) Use (9)120574 (119863) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119863 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119863 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iteration

Current state B Robot 1(1) Use (9)120574 (119861) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119861 119886) = 50 that is 119876 gt 0

Selected action 119886 = Ignore(3) Use (10)119876 (119861 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iterationCurrent state F Robot 2(1) Use (9)120574 (119863) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119865 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119865 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iteration

Current state C Robot 1(1) Use (9)120574 (119861) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119861 119886) = 50 that is 119876 gt 0

Selected action 119886 = Ignore(3) Use (10)119876 (119861 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iteration

Current state E Robot 2(1) Use (9)120574 (119864) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119864 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119864 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iterationCurrent state G Robot 1(1) Use (9)120574 (119866) =

05

[1 + 1]=05

2= 025

(2) Broadcast (Use (11))119876 (119866 119886) = 0

Selected action 119886 = Inspect(3) Use (10)119876 (119866 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current state G Robot 2(1) Use (9)120574 (119866) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119866 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119866 119886) = 100 + 04 (max (50 50 50))= 100 + 20 = 120

End of value iteration

Current state C Robot 1(1) Use (9)120574 (119862) =

05

[1 + 04]=05

14= 036

(2) Broadcast (Use (11))119876 (119862 119886) = 50

Selected action 119886 = Ignore(3) Use (10)119876 (119862 119886) = 100 + 036 (max (50 50))= 100 + 18 = 118

End of value iterationCurrent state C Robot 2(1) Use (9)120574 (119862) =

05

[1 + 036]=

05

136= 037

(2) Broadcast (Use (11))119876 (119862 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119862 119886) = 100 + 037 (max (50 50))= 100 + 037 lowast 50 = 1185

End of value iteration

All QLACS 1 states exhaustedGoal stateH Robot 1(1) Use (9)120574 (119867) =

05

[1 + 1]=05

2= 025

(2) Broadcast (use (11))119876 (119867 119886) = 0

Selected action 119886 = Shutdown(3) Use (10)119876 (119867 119886) = 150 + 025 (max (0 0)) = 150

End of value iteration

All QLACS 2 states exhaustedGoal stateH Robot 2(4) Use (9)120574 (119867) =

05

[1 + 1]=05

2= 025

(5) Broadcast (use (11))119876 (119867 119886) = 0

Selected action 119886 = Shutdown(6) Use (10)119876 (119867 119886) = 150 + 025 (max (0 0)) = 150

End of value iteration

Mathematical Problems in Engineering 15

Table 10 Continued

End of policy iteration

Robot 1 Robot 2Inspect Ignore Shutdown

119865 Yes No No119864 Yes No No119863 Yes No No119860 No Yes No119861 No Yes No119862 No Yes No119866 Yes No No119862 No Yes Yes through 119867

Inspect Ignore Shutdown119862 Yes No No119861 Yes No no119860 Yes No No119863 No Yes No119865 No Yes No119864 No Yes No119866 No Yes no119862 No Yes Yes through 119867

states Since there is no communication they exit the mineafter learning consequently not inspecting all the states Thesame routine is reflected in Table 12 but in this case eachrobot ends up inspecting all the states before exiting themine Analysis of Tables 11 and 12 shows that resources arewasted and the inspection job is not effective and efficientComparing the two tables the time and distance cost arehigh though higher in Table 12 because of many repetitionsFor instance the time and distance cost in Table 12 column2 are 480028 and ((F G E D A B C) (C B A D E GF)) respectively It also shows that the states are repeated inTables 11 and 12 The memory usage is equally high This ledus to create a more efficient QLACS with communication byintroducing some heuristicsThe processes involved in Tables11 and 12 are explained in Table 13

One can see from Tables 11 and 12 that there is no goodcommunication among the robots hence the evolvement ofTable 14

43 Experiment 2 Performance of QLACS with Good Coop-eration The heuristic added to the known QL made thisexperiment show good communication This is where ourcontribution to communication is shown As a robot inspectsand learns the environment it broadcast its 119876-values to theother robot which is in the form of a lookup table In thiscase each robot checks for119876-values when a119876-value is equalto zero the robot randomly selects a state for inspection Ifa 119876-value is greater than zero the robot checks if the stateis available for inspection or for ignoring When a robotencounters a state with a119876-value equal to zero and the threadnext to the state is equal to the goal state (119867) then it shutsdown It must have checked the lookup table to see thatall states have been inspected The result in Table 14 showsgood communication between two robots No states wereinspected more than once The iterations for every run withtheir times memory usage and effective communication arealso displayed in Table 14

Comparing Table 14 with Tables 11 and 12, one cannot but notice the huge differences in the memory, time, and distance costs. The communication between the robots in Table 14 resulted in achieving good inspection; however, the random method of choosing the next inspection state did not give an optimized route, thereby increasing the time cost of passing and checking through already inspected states.

Figure 8: Different experimental behaviours for two robots. (a) Two robots starting inspection from two different entrances. (b) Two robots inspecting as they navigate in the environment. (c) Two robots inspecting different locations with different positions.

4.4. Experiment 3: Performance of QLACS for the Navigation Behaviour of the Proposed Model. The result of implementing the first component of QLACS for effective navigation of the proposed model is tabulated in Table 16. Table 15 shows


Table 11: QLACS without communication (inspecting some states).

Run   Iterations   Time (sec)   Memory usage (bytes)   Inspected states (R1)   Inspected states (R2)
1     9            43.0025      18872                  G                       B, A
2     10           30.0017      18160                  F, G                    G
3     10           34.002       19204                  C                       C
4     13           30.0017      18208                  C                       C, B, A, E
5     13           31.0017      17896                  F, D, B, E, A           F, B, A, C
6     9            31.0018      18164                  F, E, D, A, C           B, E
7     13           27.0016      18308                  F, G, E, A              B, A, E, A, G

Table 12: QLACS without communication (inspecting all states).

Run   Iterations   Time (sec)   Memory usage (bytes)   Inspected states (R1)   Inspected states (R2)
1     114          43.0024      28440                  F, G, E, D, A, B, C     C, B, A, D, E, G, F
2     10           48.0028      17960                  F, G, E, D, A, B, C     C, B, A, D, E, G, F
3     70           41.0023      27216                  F, G, E, D, A, B, C     C, B, A, D, E, G, F
4     10           34.0019      17000                  F, G, E, D, A, B, C     C, B, A, D, E, G, F
5     9            34.0019      18456                  F, G, E, D, A, B, C     C, B, A, D, E, G, F
6     10           34.0019      17672                  F, G, E, D, A, B, C     C, B, A, D, E, G, F
7     10           35.002       17968                  F, G, E, D, A, B, C     C, B, A, D, E, G, F

Table 13: Processes used in achieving Tables 11 and 12.

(a) For Table 11: inspecting some states
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zeroes.
(3) Select a random action if the Q-values of all possible actions at the current state are zeroes.
(4) Otherwise, select the action with the highest (max) Q-value.
(5) Compute and update the Q-value of the selected action.
(6) Get a new state among the possible states.
(7) If the new state = goal state, then go to Steps 5 and 9.
(8) Repeat Steps 3 to 6 until Step 7.
(9) Shutdown.

(b) For Table 12: inspecting all states
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zeroes.
(3) Select a random action if the Q-values of all possible actions at the current state are zeroes.
(4) Otherwise, select the action with the highest (max) Q-value.
(5) Compute and update the Q-value of the selected action.
(6) Get a new state among the possible states.
(7) If the new state = goal state, then go to Steps 5 and 10.
(8) Repeat Steps 3 to 7 until Step 9.
(9) All states except the goal state have taken Action = Inspect.
(10) Shutdown.

A brief sketch of routine (b) is given after this table.
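The following sketch is our own reconstruction of routine (b) over the adjacency of the model environment (Figure 6 and Table 4); it is not the authors' simulator, and the small exploration probability is an added assumption so that the walk eventually covers all states:

import random

def inspect_all_states(neighbours, start, goal):
    q = {s: 0.0 for s in neighbours}                    # Step 2: all Q-values start at zero
    inspected, state = set(), start                     # Step 1: starting state
    while True:
        choices = list(neighbours[state])
        if all(q[s] == 0.0 for s in choices) or random.random() < 0.2:
            nxt = random.choice(choices)                # Step 3 (plus assumed exploration)
        else:
            nxt = max(choices, key=lambda s: q[s])      # Step 4: greedy on the max Q-value
        q[nxt] = 50 + 0.5 * q[nxt]                      # Step 5: simplified value update
        inspected.add(nxt)                              # Step 9: track Action = Inspect
        state = nxt                                     # Step 6: move to the new state
        if state == goal and inspected >= set(neighbours) - {goal}:
            return inspected                            # Steps 7 and 10: shutdown at the goal

# Adjacency of the model environment with H as the goal state (from Table 4):
mine = {'F': ['G', 'E', 'D', 'H'], 'G': ['F', 'E', 'C'], 'E': ['F', 'G', 'D', 'B'],
        'D': ['F', 'E', 'A'], 'A': ['D', 'B'], 'B': ['E', 'A', 'C'],
        'C': ['G', 'B', 'H'], 'H': ['F', 'C']}
print(sorted(inspect_all_states(mine, 'F', 'H')))

Because the next state is chosen greedily or at random rather than along an optimized route, runs of this kind revisit states many times, which is exactly the waste reported in Tables 11 and 12.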

the selected parameters used in achieving the optimal path. The optimal path found after nine test runs is the path with a distance cost of 10 for both entries to the environment, displayed in Table 16. Therefore, columns 4, 5, 7, 8, and 9 can be used as the optimized path input for the first component of QLACS. QLACS will then use any of the optimized paths to

navigate to the specified states and take decisions accordingly. For instance, the test run with nine ants gives 16 iterations in 60.0035 seconds and yields the shortest paths for both robots coming from entries F and C of the environment. The paths that give a cost of 10 are obtained from FEDABCGC (1222111) and CBADFGEGC (12211221), respectively.

Alpha (α), Beta (β), and Rho (ρ) represent the heuristic properties of the ACS algorithm. The Alpha factor measures the influence of pheromone concentration; it biases selection towards the path with the highest pheromone concentration. Beta measures the influence of the distance between any two adjacent nodes; it is a heuristic factor that weighs distance, rather than pheromone, in selecting the next state. Rho is the rate at which pheromones evaporate; it governs how often new paths are explored by the agents (ants) rather than old paths being reinforced. Each agent cooperates by having access to the other agents' pheromone values. The pheromone value is initialized to 0.01 and is continually updated until learning stops, similar to the second component of QLACS, where all Q-value positions are initialized to zero and updated at every new step.
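For concreteness, the quantities this paragraph refers to (equations (2) through (5)) can be sketched as follows; the function names are ours, and the parameter values are those of Table 15 and Algorithm 3:

# Illustrative helpers for the ACS quantities discussed above (not the authors' code).
ALPHA, BETA, RHO, Q_CONST = 3, 2, 0.01, 20        # alpha, beta, rho, Q as in Table 15 / Algorithm 3

def attractiveness(distance):                      # equation (2): eta_ij = 1 / D_ij
    return 1.0 / distance

def instantaneous_pheromone(tour_length):          # equation (3): delta_tau_k = Q / L_k
    return Q_CONST / tour_length

def update_pheromone(tau, delta_tau):              # equation (4): tau_ij = (1 - rho) * tau_ij + delta_tau
    return (1.0 - RHO) * tau + delta_tau

def transition_probabilities(edges):
    # equation (5): P(i, j) proportional to tau^alpha * eta^beta over the candidate edges;
    # 'edges' maps each candidate next state j to its (tau, eta) pair.
    weights = {j: (tau ** ALPHA) * (eta ** BETA) for j, (tau, eta) in edges.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

# From state F the candidates D, E, and G all have distance 1 and pheromone 2.01 after the
# first update, so each is selected with probability about 0.33, as in the worked example of Table 7.
print(transition_probabilities({'D': (2.01, 1.0), 'E': (2.01, 1.0), 'G': (2.01, 1.0)}))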

Table 16 gives the optimized routes to be used by the robots to achieve the inspection task. Combining Tables 14 and 16 yields Table 17, which shows optimized time cost, memory usage, route cost, and good communication.

4.5. Experiment 4: Benchmarking the New Hybrid Model (QLACS) with Popular Methods. Choosing the optimized paths, QLACS performs cooperative inspection behaviour through communication among the robots. Looking at the sample result in Table 17, QLACS chooses the shortest complete inspection routes from the different runs. In this case, the best paths were obtained from Table 16 by using 9 agents, 10 agents, and 12 agents. All the test runs gave best trail paths from both entrances, listing the complete states


Table 14: QLACS with communication.

Run   Iterations   Time (sec)   Memory usage (bytes)   Inspected states (R1)   Inspected states (R2)
1     9            33.0019      15748                  F, G, E                 C, B, A, D
2     10           31.0017      15948                  F, G, E, D              C, B, A
3     9            31.0018      15748                  F, G, E                 C, B, A, D
4     13           31.0018      18232                  F, E                    F, G, E, D, A
5     10           29.0023      16576                  F, G, E, D, B           C, A
6     10           30.0017      15948                  F, G, E, D              C, B, A
7     9            31.0018      15748                  F, G, E                 C, B, A, D

Table 15: Navigation behaviour parameter specification.

Type of ACO   Population   Length of path   Pheromone coefficient (β)   Heuristic coefficient (α)   Evaporation rate (ρ)
ACS           9 or 12      8                2                           3                           0.01

Figure 9: Comparison of time costs for QL, ACS, and QLACS (time cost in seconds against the number of runs).

Figure 10: Comparison of route costs for QL and QLACS (path costs for R1 and R2 against the runs).

with length 10, as shown in Table 16; that is, they cover the states from F to C and from C to F. The length is the sum of the weights along the trail edges. QLACS then uses the optimized paths to make decisions on inspections. The result in Table 17 shows that no state is inspected twice or visited more than required; the QLACS model concentrates on making the best inspection decisions for the MRS. The first run in Table 17 shows that nine agents (ants) were used for route finding; the optimized route was achieved after nine iterations in 7.0004 sec, and all the states were inspected effectively without redundancies.
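As a quick check of the trail length quoted above, summing the listed edge weights for the first trail reproduces the cost of 10 (a small illustration only, not part of the original implementation):

# Edge weights along the trail F-E-D-A-B-C-G-C, as listed in the text.
fedabcgc_weights = (1, 2, 2, 2, 1, 1, 1)
print(sum(fedabcgc_weights))   # -> 10, the best trail distance reported in Table 16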

Figure 11: Number of ants/iterations required for QLACS to find the optimal path (against the runs).

Figure 12: Time and iterations required for QLACS to achieve optimal behaviour (against the number of runs).

4.5.1. Comparative Study of the Proposed Model (QLACS) with QL Only and ACS Only. Based on the results tabulated in Table 18 and Figure 9, the average time costs of achieving the MRS behaviours with QL, ACS, and QLACS were compared. Two robots use an average of 30.8590 sec to achieve thorough inspection behaviour using QL, an average of 40.4309 sec to achieve optimal navigation behaviour using ACS, and an average of 9.2868 sec to achieve both navigation and cooperative behaviour using QLACS. The result shows


Table 16: Navigation behaviour computations.

Run   Number of ants   Iterations   Time (sec)   Best trail distance (R1)   Best trail distance (R2)
1     5                12           37.0021      12                         10
2     7                26           37.0021      12                         12
3     8                14           39.0022      10                         12
4     9                16           60.0035      10                         10
5     10               23           32.0019      10                         10
6     11               14           33.0019      10                         12
7     12               16           35.002       10                         10
8     15               23           43.0025      10                         10
9     28               22           65.0037      10                         10

Table 17: New hybrid model QLACS computations.

Run   Iterations   Number of ants   Time (sec)   Memory usage (bytes)   Inspected states (R1)   Inspected states (R2)
1     9            9                7.0004       10288                  F, G, E, D              C, B, A
2     9            9                8.0005       10280                  F, G, E, D              C, B, A
3     9            9                9.0005       10248                  F, G, E, D              C, B, A
4     9            10               11.0006      10288                  F, G, E, A, B           C, D
5     9            10               11.0006      10288                  F, G, E, A, B           C, D
6     9            10               12.0007      10288                  F, G, E, A, B           C, D
7     9            12               8.0005       10164                  F, G, D, A              C, B, E
8     9            12               9.0005       10712                  F, G, D, A              C, B, E
9     9            12               9.0006       10164                  F, G, D, A              C, B, E

Table 18: Time cost comparison for QL, ACS, and QLACS.

Run       QL time cost (sec)   ACS time cost (sec)   QLACS time cost (sec)
1         33.0019              37.0021               7.0004
2         31.0019              37.0021               8.004
3         31.0018              39.0022               9.0005
4         31.0018              32.0019               9.0005
5         29.0023              35.002                11.0006
6         30.0017              43.0025               12.0007
7         31.0018              60.0035               9.0006
Average   30.8590              40.4309               9.2868

that our proposed integrated algorithm performs better, with a reduced time cost. On the same note, the route costs for QL and QLACS were also compared. The results in Table 19 and Figure 10 show that the proposed QLACS model gave a much lower route cost than QL.

The number of ants and iterations used to achieve this is displayed in Figure 11: the more ants, the more iterations. The best routes created in the five test runs shown in Figure 11 are run numbers 1 and 3; they used fewer ants and iterations to achieve the optimal routes. The optimal result for the proposed model is achieved in nine iterations for every run, and the iteration count remains constant for any number of agents and robots. The blue line with star markers in Figure 12 is the iteration value for the 9 runs; the red line shows the time required for each run. The time is also almost stable for QLACS, though it fluctuates a little in the middle. This signifies that the proposed model is more robust and effective in finding the optimal route and coordinating the MRS. The reduced route cost and shorter computation time achieved with QLACS satisfy the criteria for cooperative behavioural purposes.

4.6. Experiment 5: Scalability of QLACS Using Two, Three, and Four Robots. In this section we present results obtained by experimenting with the cooperative behavioural action of two, three, and four robots using the QLACS model. The performance for each number of robots was evaluated by running the simulation three times with the same number of agents. The proposed QLACS model shows good communication between the two, three, and four robots, as seen in the states inspected in the last four columns of Table 20. An almost stable time was used in achieving the inspection for all the robots. The details of the simulation results are laid out in Table 20.

A notable observation emanating from Table 20 is that a larger mine area of inspection is needed: robot R1 in rows 4 to 6 could inspect only one state, while robots R1 and R2 in rows 7 to 9 could inspect only one state each. This implies that the size of the field of inspection should be proportional to the number of robots deployed.

5. Concluding Remarks

This paper has shown how MRS behave cooperatively in underground terrains. The QL algorithm was investigated for its potential for good communication, while the ACS algorithm was explored for its good navigation properties. The strengths of the two algorithms were harnessed and optimized to form a hybrid model, QLACS, which addresses the behavioural situation in MRS effectively.

The analytical solution of the new model was explained as shown in Algorithms 3 and 4 and Tables 7 and 10. The hybrid architecture for the model was described in Figure 7, and the experimental setup describing different navigation locations for two robots was shown in Figure 8.

The new hybrid model QLACS for cooperative behaviour of MRS in an underground terrain was proven able to find the optimal route and handle cooperation between two robots


Table 19: Route cost comparison for QL and QLACS.

          QL                                     QLACS
Run       Path cost for R1   Path cost for R2    Path cost for R1   Path cost for R2
1         20                 20                  10                 10
2         32                 19                  10                 10
3         28                 14                  10                 10
4         27                 27                  10                 10
5         39                 30                  10                 10
Average   29.5               22                  10                 10

Table 20: Summary of scalability performance on QLACS.

                                          Number of states inspected
Row   Number of robots   Time (sec)    R1    R2    R3    R4
1     2                  10.0006       4     3     -     -
2     2                  11.0007       4     3     -     -
3     2                  8.0005        4     3     -     -
4     3                  16.0009       1     3     3     -
5     3                  17.001        1     3     3     -
6     3                  12.0006       1     3     3     -
7     4                  11.0006       1     1     3     2
8     4                  14.0006       1     1     3     2
9     4                  10.006        1     1     3     2

effectively. The cooperative behavioural problem was narrowed down to two situations: navigational and cooperative behaviours. ACS was used to establish the shortest optimal route for the two robots, while QL was used to achieve effective cooperation decisions. The results achieved after integrating the two algorithms showed a reduction in route costs and computation times, as shown in Figures 9, 10, 11, and 12. The comparative analysis between QL, ACS, and QLACS proved that the last-named is more robust and effective in achieving cooperative behaviour for MRS in the proposed domain.

The result of this work will be used for future simulation of MRS applications in hazardous environments. This research area can clearly be expanded, both in real-life implementations and in the scale of the model; the model used in this work can also be improved upon by increasing the state space.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the financial support of the Organisation for Women in Science for the Developing World (OWSDW) and L'Oreal-UNESCO for Women in Science, and the resources made available by the University of Cape Town (UCT) and the University of South Africa (UNISA), South Africa.

References

[1] J. Leitner, "Multi-robot cooperation in space: a survey," in Proceedings of the Advanced Technologies for Enhanced Quality of Life (AT-EQUAL '09), vol. 37, pp. 144-151, IEEE Computer Society, Iasi, Romania, July 2009.
[2] G. Dudek, M. R. M. Jenkin, E. Milios, and D. Wilkes, "A taxonomy for multi-agent robotics," Autonomous Robots, vol. 3, no. 4, pp. 375-397, 1996.
[3] A. Chella, G. Lo Re, I. Macaluso, M. Ortolani, and D. Peri, "Chapter 1: A networking framework for multi-robot coordination," in Recent Advances in Multi Robot Systems, A. Lazinica, Ed., pp. 1-14, 2008.
[4] C. Yinka-Banjo, I. O. Osunmakinde, and A. Bagula, "Autonomous multi-robot behaviours for safety inspection under the constraints of underground mine terrains," Ubiquitous Computing and Communication Journal, vol. 7, no. 5, pp. 1316-1328, 2012.
[5] S. Yarkan, S. Guzelgoz, H. Arslan, and R. Murphy, "Underground mine communications: a survey," IEEE Communications Surveys & Tutorials, vol. 11, no. 3, pp. 125-142, 2009.
[6] M. Peasgood, J. McPhee, and C. Clark, Complete and Scalable Multi-Robot Planning in Tunnel Environments, Digitalcommons, 2006.
[7] F. E. Schneider, D. Wildermuth, and M. Moors, "Methods and experiments for hazardous area activities using multi-robot system," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3559-3564, May 2004.
[8] R. Zlot and A. Stentz, "Market-based multirobot coordination for complex tasks," International Journal of Robotics Research, vol. 25, no. 1, pp. 73-101, 2006.
[9] L. E. Parker, "Current research in multi-robot systems," in Proceedings of the 7th International Symposium on Artificial Life and Robotics (ISAROB '03), vol. 7, pp. 1-15, 2003.
[10] C. Thorp and H. Durrant-Whyte, "Field robots," in Proceedings of the 10th International Symposium on Robotics Research, vol. 6, pp. 329-3240, 2003.
[11] S. R. Teleka, J. J. Green, S. Brink, J. Sheer, and K. Hlophe, "The automation of the 'Making Safe' process in South African hard-rock underground mines," International Journal of Engineering and Advanced Technology (IJEAT '12), vol. 1, pp. 1-7, April 2012.
[12] L. Cragg and H. Hu, "Application of reinforcement learning to a mobile robot in reaching recharging station operation," in Intelligent Production Machines and Systems, pp. 357-363, 2005.
[13] R. A. C. Bianchi, C. H. C. Ribeiro, and A. H. R. Costa, "On the relation between ant colony optimization and heuristically accelerated reinforcement learning," in Proceedings of the 1st International Workshop on Hybrid Control of Autonomous Systems, pp. 49-55, 2009.
[14] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, Oxford, UK, 1999.
[15] Y. Liu and K. M. Passino, Swarm Intelligence: Literature Overview, Department of Electrical Engineering, University of Ohio, 2000.
[16] L. Li, A. Martinoli, and Y. S. Abu-Mostafa, "Emergent specialization in swarm systems," in Intelligent Data Engineering and Automated Learning (IDEAL 2002), vol. 2412 of Lecture Notes in Computer Science, pp. 261-266, 2002.
[17] B. Englot and F. Hover, "Multi-goal feasible path planning using ant colony optimization," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 2255-2260, May 2011.
[18] Z. N. Naqiv, H. Matheru, and K. Chadha, "Review of ant colony optimization on vehicle routing problems and introduction to estimated-based ACO," in International Conference on Environment Science and Engineering (IPCBEE '11), vol. 8, pp. 161-166, 2011.
[19] M. Dorigo and T. Stutzle, "The ant colony optimization metaheuristic: algorithms, applications and advances," Tech. Rep. IRIDIA/2000-32, 2000.
[20] R. Claes and T. Holvoet, "Cooperative ant colony optimization in traffic route calculations," in Advances on Practical Applications of Agents and Multi-Agent Systems, vol. 155 of Advances in Intelligent and Soft Computing, pp. 23-34, 2012.


End of value iteration

Current state C Robot 1(1) Use (9)120574 (119862) =

05

[1 + 04]=05

14= 036

(2) Broadcast (Use (11))119876 (119862 119886) = 50

Selected action 119886 = Ignore(3) Use (10)119876 (119862 119886) = 100 + 036 (max (50 50))= 100 + 18 = 118

End of value iterationCurrent state C Robot 2(1) Use (9)120574 (119862) =

05

[1 + 036]=

05

136= 037

(2) Broadcast (Use (11))119876 (119862 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119862 119886) = 100 + 037 (max (50 50))= 100 + 037 lowast 50 = 1185

End of value iteration

All QLACS 1 states exhaustedGoal stateH Robot 1(1) Use (9)120574 (119867) =

05

[1 + 1]=05

2= 025

(2) Broadcast (use (11))119876 (119867 119886) = 0

Selected action 119886 = Shutdown(3) Use (10)119876 (119867 119886) = 150 + 025 (max (0 0)) = 150

End of value iteration

All QLACS 2 states exhaustedGoal stateH Robot 2(4) Use (9)120574 (119867) =

05

[1 + 1]=05

2= 025

(5) Broadcast (use (11))119876 (119867 119886) = 0

Selected action 119886 = Shutdown(6) Use (10)119876 (119867 119886) = 150 + 025 (max (0 0)) = 150

End of value iteration

Mathematical Problems in Engineering 15

Table 10 Continued

End of policy iteration

Robot 1 Robot 2Inspect Ignore Shutdown

119865 Yes No No119864 Yes No No119863 Yes No No119860 No Yes No119861 No Yes No119862 No Yes No119866 Yes No No119862 No Yes Yes through 119867

Inspect Ignore Shutdown119862 Yes No No119861 Yes No no119860 Yes No No119863 No Yes No119865 No Yes No119864 No Yes No119866 No Yes no119862 No Yes Yes through 119867

states Since there is no communication they exit the mineafter learning consequently not inspecting all the states Thesame routine is reflected in Table 12 but in this case eachrobot ends up inspecting all the states before exiting themine Analysis of Tables 11 and 12 shows that resources arewasted and the inspection job is not effective and efficientComparing the two tables the time and distance cost arehigh though higher in Table 12 because of many repetitionsFor instance the time and distance cost in Table 12 column2 are 480028 and ((F G E D A B C) (C B A D E GF)) respectively It also shows that the states are repeated inTables 11 and 12 The memory usage is equally high This ledus to create a more efficient QLACS with communication byintroducing some heuristicsThe processes involved in Tables11 and 12 are explained in Table 13

One can see from Tables 11 and 12 that there is no goodcommunication among the robots hence the evolvement ofTable 14

43 Experiment 2 Performance of QLACS with Good Coop-eration The heuristic added to the known QL made thisexperiment show good communication This is where ourcontribution to communication is shown As a robot inspectsand learns the environment it broadcast its 119876-values to theother robot which is in the form of a lookup table In thiscase each robot checks for119876-values when a119876-value is equalto zero the robot randomly selects a state for inspection Ifa 119876-value is greater than zero the robot checks if the stateis available for inspection or for ignoring When a robotencounters a state with a119876-value equal to zero and the threadnext to the state is equal to the goal state (119867) then it shutsdown It must have checked the lookup table to see thatall states have been inspected The result in Table 14 showsgood communication between two robots No states wereinspected more than once The iterations for every run withtheir times memory usage and effective communication arealso displayed in Table 14

Comparing Table 14 with Tables 11 and 12 one cannot butnotice the huge differences in thememory time and distancecosts The communication between the robots in Table 14resulted in achieving good inspection however the randommethod of choosing next inspection states did not give anoptimized route thereby increasing time cost in passing andchecking through inspected states

(a) Two robots starting inspection from two different entrances

(b) Two robots inspecting as they navigate in the environment

(c) Two robots inspecting different locations with different positions

Figure 8 Different experimental behaviours for two robots

44 Experiment 3 Performance of QLACS for the NavigationBehaviour of the ProposedModel The result of implementingthe first component of QLACS for effective navigation ofthe proposed model is tabulated in Table 16 Table 15 shows

16 Mathematical Problems in Engineering

Table 11 QLACS without communication (inspecting some states)

Number of runs 1 2 3 4 5 6 7Iterations 9 10 10 13 13 9 13Time (sec) 430025 300017 34002 300017 310017 310018 270016Memory usage (bytes) 18872 18160 19204 18208 17896 18164 18308Inspected states (R1) 119866 119865119866 119862 119862 119865119863 119861 119864 119860 119865 119864119863 119860 119862 119865 119866 119864 119860

Inspected states (R2) 119861119860 119866 119862 119862 119861 119860 119864 119865 119861 119860 119862 119861 119864 119861 119860 119864 119860 119866

Table 12: QLACS without communication (inspecting all states).
Number of runs: 1, 2, 3, 4, 5, 6, 7
Iterations: 114, 10, 70, 10, 9, 10, 10
Time (sec): 43.0024, 48.0028, 41.0023, 34.0019, 34.0019, 34.0019, 35.002
Memory usage (bytes): 28440, 17960, 27216, 17000, 18456, 17672, 17968
Inspected states (R1): F G E D A B C (every run)
Inspected states (R2): C B A D E G F (every run)

Table 13: Processes used in achieving Tables 11 and 12.

(a) For Table 11 (inspecting some states):
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zero.
(3) Select a random action if the Q-values of all possible actions at the current state are zero.
(4) Select the action with the highest Q-value if a maximum Q-value exists.
(5) Compute and update the Q-value of the selected action.
(6) Get a new state among the possible states.
(7) If the new state = goal state, then go to Steps 5 and 9.
(8) Repeat Steps 3 to 6 until Step 7.
(9) Shut down.

(b) For Table 12 (inspecting all states):
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zero.
(3) Select a random action if the Q-values of all possible actions at the current state are zero.
(4) Select the action with the highest Q-value if a maximum Q-value exists.
(5) Compute and update the Q-value of the selected action.
(6) Get a new state among the possible states.
(7) If the new state = goal state, then go to Steps 5 and 10.
(8) Repeat Steps 3 to 7 until Step 9 holds.
(9) All states except the goal state have taken Action = Inspect.
(10) Shut down.
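As a rough illustration of the Table 13(b) routine (one robot inspecting all states without communication), the sketch below runs a single robot's Q-value loop until every non-goal state has been inspected; the reward values and state names follow the paper, while the adjacency dictionary and function names are assumptions made for this example only.

import random

STATES = ["F", "G", "E", "D", "A", "B", "C"]          # internal states; H is the goal
REWARD = {"inspect": 50, "ignore": 100, "shutdown": 150}

# Assumed connectivity of the model environment, used only for this sketch.
NEIGHBOURS = {"F": ["G", "E", "D"], "G": ["F", "E", "C"], "E": ["F", "G", "D", "B"],
              "D": ["F", "E", "A"], "A": ["D", "B"], "B": ["A", "E", "C"],
              "C": ["G", "B"]}

def run_single_robot(start="F"):
    q = {s: 0.0 for s in STATES + ["H"]}
    inspected, state = set(), start
    while True:
        if len(inspected) == len(STATES) and state in ("F", "C"):
            q["H"] += REWARD["shutdown"]               # step (10): shut down via H
            return inspected, q
        action = "ignore" if state in inspected else "inspect"
        q[state] += REWARD[action]                     # steps (4)-(5): update the Q-value
        if action == "inspect":
            inspected.add(state)
        state = random.choice(NEIGHBOURS[state])       # step (6): move to a new state

states_covered, q_table = run_single_robot()
print(sorted(states_covered))                          # all seven states inspected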

Table 15 shows the selected parameters used in achieving the optimal path. The optimal path found after nine test runs is the path with a distance cost of 10 for both entries to the environment, displayed in Table 16. Therefore, columns 4, 5, 7, 8, and 9 can be used as the optimized path input for the first component of QLACS. QLACS then uses any of the optimized paths to navigate to the specified states and take decisions accordingly. For instance, the test run with nine ants gives 16 iterations in 60.0035 seconds and yields the shortest paths for both robots coming from entries F and C of the environment. The paths that give a cost of 10 are obtained from F-E-D-A-B-C-G-C (1, 2, 2, 2, 1, 1, 1) and C-B-A-D-F-G-E-G-C (1, 2, 2, 1, 1, 2, 2, 1), respectively.

Alpha (α), Beta (β), and Rho (ρ) represent the heuristic properties of the ACS algorithm. The Alpha factor measures the influence of pheromone concentration, favouring the selection of the path with the highest pheromone concentration. Beta measures the influence related to the distance between any two adjacent nodes; it is a heuristic factor that weighs distance when selecting the next state and is not limited to pheromones. Rho is the rate at which the pheromones evaporate; it determines how often new paths are explored by the agents (ants) rather than old paths being reinforced. Each agent cooperates by having access to the other agents' pheromone values. The pheromone value is initialized to 0.01 and is continually updated until learning stops, similar to the Q-values in QLACS, which are initialized to zero and updated at every new step.
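For readers implementing the navigation component, the following is a minimal sketch of how Alpha, Beta, and Rho enter the ACS computations (edge attractiveness, transition probability, and pheromone evaporation and deposit). The coefficient values mirror Table 15; the function names and the deposit constant are assumptions for illustration.

ALPHA, BETA, RHO = 3.0, 2.0, 0.01       # heuristic and evaporation coefficients from Table 15
Q_DEPOSIT = 2.0                          # assumed attractiveness constant Q

def transition_probabilities(tau, dist, current, candidates):
    """P(i, j) proportional to tau(i, j)^alpha * (1 / d(i, j))^beta."""
    weights = {j: (tau[(current, j)] ** ALPHA) * ((1.0 / dist[(current, j)]) ** BETA)
               for j in candidates}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def update_pheromone(tau, edge, tour_length):
    """Evaporate at rate rho, then deposit Q / L_k on the traversed edge."""
    tau[edge] = (1.0 - RHO) * tau[edge] + Q_DEPOSIT / tour_length

# Usage: every edge starts with a pheromone value of 0.01, as in the paper.
tau = {("F", "D"): 0.01, ("F", "E"): 0.01, ("F", "G"): 0.01}
dist = {("F", "D"): 1.0, ("F", "E"): 1.0, ("F", "G"): 1.0}
print(transition_probabilities(tau, dist, "F", ["D", "E", "G"]))  # roughly 1/3 each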

Table 16 gives the optimized routes to be used by the robots to achieve the inspection task. Combining Tables 14 and 16 yields Table 17, which shows optimized time cost, memory usage, route cost, and good communication.

4.5. Experiment 4: Benchmarking the New Hybrid Model (QLACS) with Popular Methods. Choosing the optimized paths, QLACS performs cooperative inspection behaviour through communication among the robots. Looking at the sample results in Table 17, QLACS chooses the shortest and complete possible inspection routes from different runs. In this case, the best paths were obtained from Table 16 by using 9 agents, 10 agents, and 12 agents. All the test runs gave best trail paths from both entrances.


Table 14: QLACS with communication.
Number of runs: 1, 2, 3, 4, 5, 6, 7
Iterations: 9, 10, 9, 13, 10, 10, 9
Time (sec): 33.0019, 31.0017, 31.0018, 31.0018, 29.0023, 30.0017, 31.0018
Memory usage (bytes): 15748, 15948, 15748, 18232, 16576, 15948, 15748
Inspected states (R1): F G E; F G E D; F G E; F E; F G E D B; F G E D; F G E
Inspected states (R2): C B A D; C B A; C B A D; F G E D A; C A; C B A; C B A D

Table 15: Navigation behaviour parameter specification.
Type of ACO: ACS
Population: 9 or 12
Length of path: 8
Pheromone coefficient (β): 2
Heuristic coefficient (α): 3
Evaporation rate (ρ): 0.01

Figure 9: Comparison of time costs for QL, ACS, and QLACS (time cost in seconds versus number of runs).

Figure 10: Comparison of route costs for QL and QLACS (path costs for R1 and R2 versus runs).

These trails list the complete states with length 10, as shown in Table 16; that is, they cover the states from F to C and from C to F. The length is the sum of the weights along the trail edges. QLACS then uses the optimized paths to make decisions on inspections. The results in Table 17 show that no state is inspected twice or visited more than required. The QLACS model concentrates on making the best inspection decisions for the MRS. The first run in Table 17 shows that nine agents (ants) were used for the route finding; the optimized route was achieved after nine iterations in 7.0004 sec, and all the states were inspected effectively without redundancies.
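Because the trail length is simply the sum of the edge weights along a tour, the cost-10 trail reported in Section 4.4 can be checked in a few lines; the list names below are illustrative, and the weights are those quoted with the trail.

# Trail F-E-D-A-B-C-G-C with the per-edge obstacle weights quoted in Section 4.4.
trail = ["F", "E", "D", "A", "B", "C", "G", "C"]
weights = [1, 2, 2, 2, 1, 1, 1]            # one weight per traversed edge
assert len(weights) == len(trail) - 1
print(sum(weights))                         # -> 10, the optimal route cost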

Figure 11: Number of ants/iterations required for QLACS to find the optimal path (versus runs).

Figure 12: Time and iterations required for QLACS to achieve optimal behaviour (versus number of runs).

4.5.1. Comparative Study of the Proposed Model (QLACS) with QL Only and ACS Only. Based on the results tabulated in Table 18 and Figure 9, the average time costs of achieving the MRS behaviours with QL, ACS, and QLACS were compared. Two robots use an average of 30.8590 sec to achieve thorough inspection behaviour using QL, an average of 40.4309 sec to achieve optimal navigation behaviour using ACS, and an average of 9.2868 sec to achieve both navigation and cooperative behaviour using QLACS.


Table 16: Navigation behaviour computations.
Number of runs: 1, 2, 3, 4, 5, 6, 7, 8, 9
Number of ants: 5, 7, 8, 9, 10, 11, 12, 15, 28
Iterations: 12, 26, 14, 16, 23, 14, 16, 23, 22
Time (sec): 37.0021, 37.0021, 39.0022, 60.0035, 32.0019, 33.0019, 35.002, 43.0025, 65.0037
Best trail distance (R1): 12, 12, 10, 10, 10, 10, 10, 10, 10
Best trail distance (R2): 10, 12, 12, 10, 10, 12, 10, 10, 10

Table 17: New hybrid model QLACS computations.
Number of runs: 1, 2, 3, 4, 5, 6, 7, 8, 9
Iterations: 9, 9, 9, 9, 9, 9, 9, 9, 9
Number of ants: 9, 9, 9, 10, 10, 10, 12, 12, 12
Time (sec): 7.0004, 8.0005, 9.0005, 11.0006, 11.0006, 12.0007, 8.0005, 9.0005, 9.0006
Memory usage (bytes): 10288, 10280, 10248, 10288, 10288, 10288, 10164, 10712, 10164
Inspected states (R1): F G E D (runs 1-3); F G E A B (runs 4-6); F G D A (runs 7-9)
Inspected states (R2): C B A (runs 1-3); C D (runs 4-6); C B E (runs 7-9)

Table 18: Time cost comparison for QL, ACS, and QLACS.
Run 1: QL 33.0019 sec, ACS 37.0021 sec, QLACS 7.0004 sec
Run 2: QL 31.0019 sec, ACS 37.0021 sec, QLACS 8.004 sec
Run 3: QL 31.0018 sec, ACS 39.0022 sec, QLACS 9.0005 sec
Run 4: QL 31.0018 sec, ACS 32.0019 sec, QLACS 9.0005 sec
Run 5: QL 29.0023 sec, ACS 35.002 sec, QLACS 11.0006 sec
Run 6: QL 30.0017 sec, ACS 43.0025 sec, QLACS 12.0007 sec
Run 7: QL 31.0018 sec, ACS 60.0035 sec, QLACS 9.0006 sec
Average: QL 30.8590 sec, ACS 40.4309 sec, QLACS 9.2868 sec

These results show that our proposed integrated algorithm performs better, with a reduced time cost. On the same note, the route costs for QL and QLACS were also compared: the results in Table 19 and Figure 10 show that the proposed QLACS model gives a much lower route cost than QL.

The number of ants and iterations used to achieve this is displayed in Figure 11. The more ants there are, the more iterations are required. The best routes created in the five test runs shown in Figure 11 are run numbers 1 and 3; they used fewer ants and iterations to achieve the optimal routes. The optimal result for the proposed model is achieved in nine iterations for every run, and the iteration count remains constant for any number of agents and robots. The blue line with star markers in Figure 12 is the iteration value for the 9 runs; the red line shows the time required for each run. The time is also almost stable for QLACS, though it fluctuates a little in the middle. This signifies that the proposed model is more robust and effective in finding the optimal route and coordinating the MRS. The reduced route cost and shorter computation time achieved with QLACS satisfy the criteria for cooperative behavioural purposes.

4.6. Experiment 5: Scalability of QLACS Using Two, Three, and Four Robots. In this section we present results obtained by experimenting with the cooperative behavioural action of two, three, and four robots using the QLACS model. The performance of the two, three, and four robots was evaluated by running the simulation three times with the same number of agents. The proposed QLACS model shows good communication between the two, three, and four robots, as seen from the states inspected in the last four columns of Table 20. An almost stable time was used in achieving the inspection for all the robots. The details of the simulation results are laid out in Table 20.

A notable observation emanating from Table 20 is that a larger mine area of inspection is needed, because robot R1 in rows 4 to 6 could inspect only one state, while robots R1 and R2 in rows 7 to 9 could inspect only one state each. This implies that the size of the field of inspection should be proportional to the number of robots deployed.
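A rough sketch of how such a scalability run can be driven is shown below: each robot executes its optimized route in its own thread while sharing a single lookup table, so that a state inspected by one robot is ignored by the others. The thread wiring, names, and example routes are assumptions for illustration, not the authors' implementation.

import threading

def run_robot(route, shared_q, lock):
    """One robot thread: inspect states not yet in the shared lookup table."""
    for state in route:
        with lock:                       # the broadcast lookup table is shared
            if shared_q.get(state, 0.0) == 0.0:
                shared_q[state] = 50.0   # inspect (reward value as in the paper)
            # otherwise ignore: another robot has already inspected this state

def run_simulation(routes):
    """routes: one optimized state sequence per robot (two, three, or four)."""
    shared_q, lock = {}, threading.Lock()
    threads = [threading.Thread(target=run_robot, args=(r, shared_q, lock))
               for r in routes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_q                      # states inspected across all robots

# Two-robot case using the routes of Table 17, run 1.
print(run_simulation([["F", "G", "E", "D"], ["C", "B", "A"]]))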

5. Concluding Remarks

This paper has shown how an MRS behaves cooperatively in underground terrains. The QL algorithm was investigated for its potential for good communication, while the ACS algorithm was explored for its good navigation properties. The strengths of the two algorithms were harnessed and optimized to form a hybrid model, QLACS, which addresses the behavioural situation in MRS effectively.

The analytical solution of the new model was explained as shown in Algorithms 3 and 4 and Tables 7 and 10. The hybrid architecture for the model was described in Figure 7, and the experimental setup describing different navigation locations for two robots was shown in Figure 8.

The new hybrid model QLACS for cooperative behaviour of MRS in an underground terrain was proven able to find the optimal route and to handle the cooperation between two robots effectively.


Table 19: Route cost comparison for QL and QLACS.
Run 1: QL path costs (R1, R2) = 20, 20; QLACS path costs (R1, R2) = 10, 10
Run 2: QL = 32, 19; QLACS = 10, 10
Run 3: QL = 28, 14; QLACS = 10, 10
Run 4: QL = 27, 27; QLACS = 10, 10
Run 5: QL = 39, 30; QLACS = 10, 10
Average: QL = 29.5, 22; QLACS = 10, 10

Table 20: Summary of scalability performance of QLACS.
Row 1: 2 robots, 10.0006 sec; states inspected R1 = 4, R2 = 3
Row 2: 2 robots, 11.0007 sec; states inspected R1 = 4, R2 = 3
Row 3: 2 robots, 8.0005 sec; states inspected R1 = 4, R2 = 3
Row 4: 3 robots, 16.0009 sec; states inspected R1 = 1, R2 = 3, R3 = 3
Row 5: 3 robots, 17.001 sec; states inspected R1 = 1, R2 = 3, R3 = 3
Row 6: 3 robots, 12.0006 sec; states inspected R1 = 1, R2 = 3, R3 = 3
Row 7: 4 robots, 11.0006 sec; states inspected R1 = 1, R2 = 1, R3 = 3, R4 = 2
Row 8: 4 robots, 14.0006 sec; states inspected R1 = 1, R2 = 1, R3 = 3, R4 = 2
Row 9: 4 robots, 10.006 sec; states inspected R1 = 1, R2 = 1, R3 = 3, R4 = 2

The cooperative behavioural problem was narrowed down to two situations: navigational and cooperative behaviours. ACS was used to establish the shortest optimal route for the two robots, while QL was used to achieve effective cooperation decisions. The results achieved after integrating the two algorithms showed a reduction in route costs and computation times, as shown in Figures 9, 10, 11, and 12. The comparative analysis of QL, ACS, and QLACS proved that the last-named is more robust and effective in achieving cooperative behaviour for MRS in the proposed domain.

The results of this work will be used for future simulation of MRS applications in hazardous environments. An indication of the expansion of this research area has conspicuously surfaced in both real-life implementation and model enhancement. The model used in this work can also be improved by increasing the state space.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the financial support of the Organisation for Women in Science for the Developing World (OWSDW), L'Oreal-UNESCO for Women in Science, and the resources made available by the University of Cape Town (UCT) and the University of South Africa (UNISA), South Africa.

References

[1] J. Leitner, "Multi-robot cooperation in space: a survey," in Proceedings of the Advanced Technologies for Enhanced Quality of Life (AT-EQUAL '09), vol. 37, pp. 144-151, IEEE Computer Society, Iasi, Romania, July 2009.

[2] G. Dudek, M. R. M. Jenkin, E. Milios, and D. Wilkes, "A taxonomy for multi-agent robotics," Autonomous Robots, vol. 3, no. 4, pp. 375-397, 1996.

[3] A. Chella, G. Lo Re, I. Macaluso, M. Ortolani, and D. Peri, "Chapter 1: A networking framework for multi-robot coordination," in Recent Advances in Multi Robot Systems, A. Lazinica, Ed., pp. 1-14, 2008.

[4] C. Yinka-Banjo, I. O. Osunmakinde, and A. Bagula, "Autonomous multi-robot behaviours for safety inspection under the constraints of underground mine terrains," Ubiquitous Computing and Communication Journal, vol. 7, no. 5, pp. 1316-1328, 2012.

[5] S. Yarkan, S. Guzelgoz, H. Arslan, and R. Murphy, "Underground mine communications: a survey," IEEE Communications Surveys & Tutorials, vol. 11, no. 3, pp. 125-142, 2009.

[6] M. Peasgood, J. McPhee, and C. Clark, Complete and Scalable Multi-Robot Planning in Tunnel Environments, Digitalcommons, 2006.

[7] F. E. Schneider, D. Wildermuth, and M. Moors, "Methods and experiments for hazardous area activities using multi-robot system," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3559-3564, May 2004.

[8] R. Zlot and A. Stentz, "Market-based multirobot coordination for complex tasks," International Journal of Robotics Research, vol. 25, no. 1, pp. 73-101, 2006.

[9] L. E. Parker, "Current research in multi-robot systems," in Proceedings of the 7th International Symposium on Artificial Life and Robotics (ISAROB '03), vol. 7, pp. 1-15, 2003.

[10] C. Thorp and H. Durrant-Whyte, "Field robots," in Proceedings of the 10th International Symposium on Robotics Research, vol. 6, pp. 329-3240, 2003.

[11] S. R. Teleka, J. J. Green, S. Brink, J. Sheer, and K. Hlophe, "The automation of the 'Making Safe' process in South African hard-rock underground mines," International Journal of Engineering and Advanced Technology (IJEAT '12), vol. 1, pp. 1-7, April 2012.

[12] L. Cragg and H. Hu, "Application of reinforcement learning to a mobile robot in reaching recharging station operation," in Intelligent Production Machines and Systems, pp. 357-363, 2005.

[13] R. A. C. Bianchi, C. H. C. Ribeiro, and A. H. R. Costa, "On the relation between ant colony optimization and heuristically accelerated reinforcement learning," in Proceedings of the 1st International Workshop on Hybrid Control of Autonomous Systems, pp. 49-55, 2009.

[14] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, Oxford, UK, 1999.

[15] Y. Liu and K. M. Passino, Swarm Intelligence: Literature Overview, Department of Electrical Engineering, The Ohio State University, 2000.

[16] L. Li, A. Martinoli, and Y. S. Abu-Mostafa, "Emergent specialization in swarm systems," in Intelligent Data Engineering and Automated Learning—IDEAL 2002, vol. 2412 of Lecture Notes in Computer Science, pp. 261-266, 2002.

[17] B. Englot and F. Hover, "Multi-goal feasible path planning using ant colony optimization," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 2255-2260, May 2011.

[18] Z. N. Naqiv, H. Matheru, and K. Chadha, "Review of ant colony optimization on vehicle routing problems and introduction to estimated-based ACO," in International Conference on Environment Science and Engineering (IPCBEE '11), vol. 8, pp. 161-166, 2011.

[19] M. Dorigo and T. Stutzle, "The ant colony optimization metaheuristic: algorithms, applications and advances," Tech. Rep. IRIDIA/2000-32, 2000.

[20] R. Claes and T. Holvoet, "Cooperative ant colony optimization in traffic route calculations," in Advances on Practical Applications of Agents and Multi-Agent Systems, vol. 155 of Advances in Intelligent and Soft Computing, pp. 23-34, 2012.



states Since there is no communication they exit the mineafter learning consequently not inspecting all the states Thesame routine is reflected in Table 12 but in this case eachrobot ends up inspecting all the states before exiting themine Analysis of Tables 11 and 12 shows that resources arewasted and the inspection job is not effective and efficientComparing the two tables the time and distance cost arehigh though higher in Table 12 because of many repetitionsFor instance the time and distance cost in Table 12 column2 are 480028 and ((F G E D A B C) (C B A D E GF)) respectively It also shows that the states are repeated inTables 11 and 12 The memory usage is equally high This ledus to create a more efficient QLACS with communication byintroducing some heuristicsThe processes involved in Tables11 and 12 are explained in Table 13

One can see from Tables 11 and 12 that there is no goodcommunication among the robots hence the evolvement ofTable 14

43 Experiment 2 Performance of QLACS with Good Coop-eration The heuristic added to the known QL made thisexperiment show good communication This is where ourcontribution to communication is shown As a robot inspectsand learns the environment it broadcast its 119876-values to theother robot which is in the form of a lookup table In thiscase each robot checks for119876-values when a119876-value is equalto zero the robot randomly selects a state for inspection Ifa 119876-value is greater than zero the robot checks if the stateis available for inspection or for ignoring When a robotencounters a state with a119876-value equal to zero and the threadnext to the state is equal to the goal state (119867) then it shutsdown It must have checked the lookup table to see thatall states have been inspected The result in Table 14 showsgood communication between two robots No states wereinspected more than once The iterations for every run withtheir times memory usage and effective communication arealso displayed in Table 14

Comparing Table 14 with Tables 11 and 12 one cannot butnotice the huge differences in thememory time and distancecosts The communication between the robots in Table 14resulted in achieving good inspection however the randommethod of choosing next inspection states did not give anoptimized route thereby increasing time cost in passing andchecking through inspected states

(a) Two robots starting inspection from two different entrances

(b) Two robots inspecting as they navigate in the environment

(c) Two robots inspecting different locations with different positions

Figure 8 Different experimental behaviours for two robots

44 Experiment 3 Performance of QLACS for the NavigationBehaviour of the ProposedModel The result of implementingthe first component of QLACS for effective navigation ofthe proposed model is tabulated in Table 16 Table 15 shows

16 Mathematical Problems in Engineering

Table 11 QLACS without communication (inspecting some states)

Number of runs 1 2 3 4 5 6 7Iterations 9 10 10 13 13 9 13Time (sec) 430025 300017 34002 300017 310017 310018 270016Memory usage (bytes) 18872 18160 19204 18208 17896 18164 18308Inspected states (R1) 119866 119865119866 119862 119862 119865119863 119861 119864 119860 119865 119864119863 119860 119862 119865 119866 119864 119860

Inspected states (R2) 119861119860 119866 119862 119862 119861 119860 119864 119865 119861 119860 119862 119861 119864 119861 119860 119864 119860 119866

Table 12 QLACS without communication (inspecting all states)

Number of runs 1 2 3 4 5 6 7Iterations 114 10 70 10 9 10 10Time (sec) 430024 480028 410023 340019 340019 340019 35002Memory usage (bytes) 28440 17960 27216 17000 18456 17672 17968

Inspected states (R1) F G E D AB C

F G E D AB C

F G E D AB C

F G E D AB C

F G E D AB C

F G E D AB C

F G E D AB C

Inspected states (R2) C B A D EG F

C B A D EG F

C B A D EG F

C B A D EG F

C B A D EG F

C B A D EG F

C B A D EG F

Table 13 Processes used in achieving Tables 11 and 12

(a) For Table 11

Inspecting some States(1) Initialize the starting and goal states(2) Initialize all 119876values to zeroes(3) Select a random action If 119876values of all possible actions atcurrent state are zeroes(4) Select the action with the highest 119876value If it is the Max119876value(5) Compute and update 119876value of the selected action(6) Get new state among possible states(7) If new state = goal state then go to Steps 5 and 9(8) Repeat Steps 3 to 6 until Step 7(9) Shutdown

(b) For Table 12

Inspecting all States(1) Initialize the starting and goal states(2) Initialize all 119876values to zeroes(3) Select a random action If 119876values of all possible actions atcurrent state are zeroes(4) Select the action with the highest 119876value If it is the Max119876value(5) Compute and update 119876value of the selected action(6) Get new state among possible states(7) If new state = goal state then go to Steps 5 and 10(8) Repeat Steps 3 to 7 until Step 9(9) All states except the goal state has taken Action = Inspect(10) Shutdown

the selected parameters used in achieving the optimal pathThe optimal path found after nine test runs is the path witha distance cost of 10 for both entries to the environmentdisplayed in Table 16 Therefore columns 4 5 7 8 and 9 canbe used as the optimized path input for the first component ofQLACS Then QLACS will use any of the optimized paths to

navigate to the specified states and take decisions accordinglyFor instance the test run result for nine ants gives 16 iterationsunder 600035 seconds and giving the shortest paths to bothrobots coming from entries 119865 and 119862 of the environmentThepath that gives us cost as 10 is obtained from FEDABCGC(1222111) and CBADFGEGC (12211221) respectively

Alpha (120572) Beta (120573) and Rho (120588) represent the heuristicproperties of the ACS algorithm The Alpha factor is themeasure of the influence of pheromone concentration thatcan influence the selection of the path with the highestpheromone concentration Beta is the measure of the influ-encewhich is related to the distance between any two adjacentnodes It is also a heuristic factor that can measure theinfluence distance in selecting the next state It is not limitedto pheromones Rho has to do with the rate at which thepheromones evaporate Rho shows how often new pathsare explored by the agentsants rather than reinforcing oldpaths Each agent cooperates by having access to other agentsrsquopheromone valuesThe pheromone value is initialized to 001and it is continually updated until learning stops It is similarto the first component of QLACS where we initialize allQLACS positions to zero and update for every new step

Table 16 gave the optimized route to be used by the robotsto achieve inspection tasksThis resulted in combining Tables14 and 16 to achieve Table 17 Table 17 shows optimized timecost memory usage route cost and good communication

45 Experiment 4 Benchmarking the New Hybrid Model(QLACS) with Popular Methods Choosing the optimizedpaths QLACS performs cooperative inspection behaviourthrough communication among robots Looking at the sam-ple result on Table 17 QLACS chooses the shortest andcomplete possible inspection routes from different runs Inthis case the best paths were obtained from Table 16 by using9 agents 10 agents and 12 agents All the test runs gave besttrail paths from both entrances listing the complete states

Mathematical Problems in Engineering 17

Table 14 QLACS with communication

Number of runs 1 2 3 4 5 6 7Iterations 9 10 9 13 10 10 9Time (sec) 330019 310017 310018 310018 290023 300017 310018Memory usage (bytes) 15748 15948 15748 18232 16576 15948 15748Inspected states (R1) 119865 119866 119864 119865 119866 119864119863 119865 119866 119864 119865 119864 119865 119866 119864119863 119861 119865 119866 119864119863 119865 119866 119864

Inspected states (R2) 119862 119861 119860119863 119862 119861 119860 119862 119861 119860119863 119865 119866 119864119863 119860 119862119860 119862 119861 119860 119862 119861 119860119863

Table 15 Navigation behaviour parameter specification

ACO properties Type of ACO Population Length of path Pheromonecoefficient 120573

Heuristiccoefficient 120572 Evaporation rate 120588

Properties ACS 9 or 12 8 2 3 001

0

10

20

30

40

50

60

70

0 1 2 3 4 5 6 7 8

Tim

e cos

t

Number of runs

QL time cost

ACS time cost

QLACS time cost

Figure 9 Comparison of time costs for QL ACS and QLACS

05

1015202530354045

0 1 2 3 4 5 6

Rout

e cos

t

Runs

Path cost for R1 (QL)

Path cost for R2 (QL)

Path cost for R1 and R2 (QLACS)

Figure 10 Comparison of route costs for QL and QLACS

with length 10 as shown in Table 16 that is they have statesfrom 119865 to 119862 and from 119862 to 119865 The length is the addition ofweights along the line (trail edges) Then the QLACS usesthe optimized paths to make decisions on inspections Theresult from Table 17 shows that no state is inspected twice orvisited more than required The QLACS model concentrateson making the best inspection decisions for MRS The firstrun on Table 17 shows that nine agentsants were used forthe route finding the optimized route was achieved after nineiterations under 70004 sec and the states where all inspectedeffectively without redundancies

0

5

10

15

20

25

30

1 2 3 4 5

Num

ber o

f ant

site

ratio

ns

Runs

AntsIteration

Figure 11 Number of antsiteration required for QLACS to findoptimal path

02468

101214

0 2 4 6 8 10

Tim

eite

ratio

n

Number of runs

Iteration

Time

Figure 12 Time and iteration required for QLACS to achieveoptimal behavior

451 Comparative Study of the ProposedModel (QLACS) withQL Only and ACS Only Based on the results tabulated inTable 18 and Figure 9 the average time costs of achieving theMRS behaviours for QL ACS and QLACS were comparedTwo robots will use an average of 308590 sec to achievethorough inspection behaviour using QL an average of404309 sec to achieve optimal navigation behaviour usingACS and an average of 92868 sec to achieve both navigationand cooperative behaviour using QLACS The result shows


Table 16: Navigation behaviour computations.
Run:                      1, 2, 3, 4, 5, 6, 7, 8, 9
Number of ants:           5, 7, 8, 9, 10, 11, 12, 15, 28
Iterations:               12, 26, 14, 16, 23, 14, 16, 23, 22
Time (sec):               37.0021, 37.0021, 39.0022, 60.0035, 32.0019, 33.0019, 35.002, 43.0025, 65.0037
Best trail distance (R1): 12, 12, 10, 10, 10, 10, 10, 10, 10
Best trail distance (R2): 10, 12, 12, 10, 10, 12, 10, 10, 10

Table 17: New hybrid model (QLACS) computations.
Run:                    1, 2, 3, 4, 5, 6, 7, 8, 9
Iterations:             9, 9, 9, 9, 9, 9, 9, 9, 9
Number of ants:         9, 9, 9, 10, 10, 10, 12, 12, 12
Time (sec):             7.0004, 8.0005, 9.0005, 11.0006, 11.0006, 12.0007, 8.0005, 9.0005, 9.0006
Memory usage (bytes):   10288, 10280, 10248, 10288, 10288, 10288, 10164, 10712, 10164
Inspected states (R1):  F,G,E,D; F,G,E,D; F,G,E,D; F,G,E,A,B; F,G,E,A,B; F,G,E,A,B; F,G,D,A; F,G,D,A; F,G,D,A
Inspected states (R2):  C,B,A; C,B,A; C,B,A; C,D; C,D; C,D; C,B,E; C,B,E; C,B,E

Table 18: Time cost comparison for QL, ACS, and QLACS (all times in seconds).
Run 1: QL 33.0019, ACS 37.0021, QLACS 7.0004
Run 2: QL 31.0019, ACS 37.0021, QLACS 8.004
Run 3: QL 31.0018, ACS 39.0022, QLACS 9.0005
Run 4: QL 31.0018, ACS 32.0019, QLACS 9.0005
Run 5: QL 29.0023, ACS 35.002, QLACS 11.0006
Run 6: QL 30.0017, ACS 43.0025, QLACS 12.0007
Run 7: QL 31.0018, ACS 60.0035, QLACS 9.0006
Average: QL 30.8590, ACS 40.4309, QLACS 9.2868

These results show that our proposed integrated algorithm performs better, with a reduced time cost. On the same note, the route costs for QL and QLACS were also compared; the results in Table 19 and Figure 10 show that the proposed QLACS model gives a much lower route cost than QL.

The number of ants and iterations used to achieve this is displayed in Figure 11: the more ants are used, the more iterations are needed. The best routes created in the five test runs shown in Figure 11 are run numbers 1 and 3, which used fewer ants and iterations to achieve the optimal routes. The optimal result for the proposed model is achieved within nine iterations for every run, and this iteration count remains constant for any number of agents and robots. The blue line with star markers in Figure 12 is the iteration value for the nine runs, while the red line shows the time required for each run. The time is also almost stable for QLACS, although it fluctuates a little in the middle. This signifies that the proposed model is more robust and effective in finding the optimal route and coordinating the MRS. The reduced route cost and shorter computation time achieved with QLACS satisfy the criteria for cooperative behavioural purposes.

4.6. Experiment 5: Scalability of QLACS Using Two, Three, and Four Robots. In this section we present results obtained by experimenting with the cooperative behavioural action of two, three, and four robots using the QLACS model. The performance for each robot count was evaluated by running the simulation three times with the same number of agents. The proposed QLACS model shows good communication between the two, three, and four robots, as reflected in the states inspected in the last four columns of Table 20, and an almost stable time was used in achieving the inspection for all the robots. The details of the simulation results are laid out in Table 20.

A notable observation emanating from Table 20 is that a larger mine area of inspection is needed: robot R1 in rows 4 to 6 could inspect only one state, while robots R1 and R2 in rows 7 to 9 could inspect only one state each. This implies that the size of the field of inspection should be proportional to the number of robots deployed.

5. Concluding Remarks

This paper has shown how an MRS behaves cooperatively in underground terrains. The QL algorithm was investigated for its potential for good communication, while the ACS algorithm was explored for its good navigation properties. The strengths of the two algorithms were harnessed and optimized to form a hybrid model, QLACS, which addresses the behavioural situation in MRS effectively.

The analytical solution of the new model was explained as shown in Algorithms 3 and 4 and Tables 7 and 10. The hybrid architecture for the model was described in Figure 7, and the experimental setup describing different navigation locations for two robots was shown in Figure 8.

The new hybrid model QLACS for cooperative behaviour of MRS in an underground terrain was proven able to find the optimal route and handle cooperation between two robots effectively.


Table 19: Route cost comparison for QL and QLACS (route cost in trail-edge weights).
Run 1: QL path cost R1 = 20, R2 = 20; QLACS path cost R1 = 10, R2 = 10
Run 2: QL 32, 19; QLACS 10, 10
Run 3: QL 28, 14; QLACS 10, 10
Run 4: QL 27, 27; QLACS 10, 10
Run 5: QL 39, 30; QLACS 10, 10
Average: QL 29.5 (R1), 22 (R2); QLACS 10 (R1), 10 (R2)

Table 20: Summary of scalability performance of QLACS.
Row 1: 2 robots, 10.0006 sec; states inspected: R1 = 4, R2 = 3
Row 2: 2 robots, 11.0007 sec; R1 = 4, R2 = 3
Row 3: 2 robots, 8.0005 sec; R1 = 4, R2 = 3
Row 4: 3 robots, 16.0009 sec; R1 = 1, R2 = 3, R3 = 3
Row 5: 3 robots, 17.001 sec; R1 = 1, R2 = 3, R3 = 3
Row 6: 3 robots, 12.0006 sec; R1 = 1, R2 = 3, R3 = 3
Row 7: 4 robots, 11.0006 sec; R1 = 1, R2 = 1, R3 = 3, R4 = 2
Row 8: 4 robots, 14.0006 sec; R1 = 1, R2 = 1, R3 = 3, R4 = 2
Row 9: 4 robots, 10.006 sec; R1 = 1, R2 = 1, R3 = 3, R4 = 2

The cooperative behavioural problem was narrowed down to two situations: navigational and cooperative behaviours. ACS was used to establish the shortest optimal route for the two robots, while QL was used to achieve effective cooperation decisions. The results achieved after integrating the two algorithms show a reduction in route costs and computation times, as shown in Figures 9, 10, 11, and 12. The comparative analysis between QL, ACS, and QLACS proved that the last-named is more robust and effective in achieving cooperative behaviour for MRS in the proposed domain.

The results of this work will be used for future simulation of MRS applications in hazardous environments. The need to expand this research area is evident, both in real-life implementation and in enlarging the model; the model used in this work can also be improved upon by increasing the state space.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the financial support of the Organisation for Women in Science for the Developing World (OWSDW), L'Oréal-UNESCO for Women in Science, and the resources made available by the University of Cape Town (UCT) and the University of South Africa (UNISA), South Africa.

References

[1] J. Leitner, "Multi-robot cooperation in space: a survey," in Proceedings of the Advanced Technologies for Enhanced Quality of Life (AT-EQUAL '09), vol. 37, pp. 144-151, IEEE Computer Society, Iasi, Romania, July 2009.
[2] G. Dudek, M. R. M. Jenkin, E. Milios, and D. Wilkes, "A taxonomy for multi-agent robotics," Autonomous Robots, vol. 3, no. 4, pp. 375-397, 1996.
[3] A. Chella, G. Lo Re, I. Macaluso, M. Ortolani, and D. Peri, "Chapter 1: A networking framework for multi-robot coordination," in Recent Advances in Multi Robot Systems, A. Lazinica, Ed., pp. 1-14, 2008.
[4] C. Yinka-Banjo, I. O. Osunmakinde, and A. Bagula, "Autonomous multi-robot behaviours for safety inspection under the constraints of underground mine terrains," Ubiquitous Computing and Communication Journal, vol. 7, no. 5, pp. 1316-1328, 2012.
[5] S. Yarkan, S. Guzelgoz, H. Arslan, and R. Murphy, "Underground mine communications: a survey," IEEE Communications Surveys & Tutorials, vol. 11, no. 3, pp. 125-142, 2009.
[6] M. Peasgood, J. McPhee, and C. Clark, Complete and Scalable Multi-Robot Planning in Tunnel Environments, DigitalCommons, 2006.
[7] F. E. Schneider, D. Wildermuth, and M. Moors, "Methods and experiments for hazardous area activities using multi-robot system," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3559-3564, May 2004.


[8] R. Zlot and A. Stentz, "Market-based multirobot coordination for complex tasks," International Journal of Robotics Research, vol. 25, no. 1, pp. 73-101, 2006.
[9] L. E. Parker, "Current research in multi-robot systems," in Proceedings of the 7th International Symposium on Artificial Life and Robotics (ISAROB '03), vol. 7, pp. 1-15, 2003.
[10] C. Thorp and H. Durrant-Whyte, "Field robots," in Proceedings of the 10th International Symposium on Robotics Research, vol. 6, pp. 329-3240, 2003.
[11] S. R. Teleka, J. J. Green, S. Brink, J. Sheer, and K. Hlophe, "The automation of the 'Making Safe' process in South African hard-rock underground mines," International Journal of Engineering and Advanced Technology (IJEAT), vol. 1, pp. 1-7, April 2012.
[12] L. Cragg and H. Hu, "Application of reinforcement learning to a mobile robot in reaching recharging station operation," in Intelligent Production Machines and Systems, pp. 357-363, 2005.
[13] R. A. C. Bianchi, C. H. C. Ribeiro, and A. H. R. Costa, "On the relation between ant colony optimization and heuristically accelerated reinforcement learning," in Proceedings of the 1st International Workshop on Hybrid Control of Autonomous Systems, pp. 49-55, 2009.
[14] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, Oxford, UK, 1999.
[15] Y. Liu and K. M. Passino, Swarm Intelligence: Literature Overview, Department of Electrical Engineering, University of Ohio, 2000.
[16] L. Li, A. Martinoli, and Y. S. Abu-Mostafa, "Emergent specialization in swarm systems," in Intelligent Data Engineering and Automated Learning - IDEAL 2002, vol. 2412 of Lecture Notes in Computer Science, pp. 261-266, 2002.
[17] B. Englot and F. Hover, "Multi-goal feasible path planning using ant colony optimization," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 2255-2260, May 2011.
[18] Z. N. Naqiv, H. Matheru, and K. Chadha, "Review of ant colony optimization on vehicle routing problems and introduction to estimated-based ACO," in International Conference on Environment Science and Engineering (IPCBEE '11), vol. 8, pp. 161-166, 2011.
[19] M. Dorigo and T. Stutzle, "The ant colony optimization metaheuristic: algorithms, applications, and advances," Tech. Rep. IRIDIA/2000-32, 2000.
[20] R. Claes and T. Holvoet, "Cooperative ant colony optimization in traffic route calculations," in Advances on Practical Applications of Agents and Multi-Agent Systems, vol. 155 of Advances in Intelligent and Soft Computing, pp. 23-34, 2012.



Alpha (120572) Beta (120573) and Rho (120588) represent the heuristicproperties of the ACS algorithm The Alpha factor is themeasure of the influence of pheromone concentration thatcan influence the selection of the path with the highestpheromone concentration Beta is the measure of the influ-encewhich is related to the distance between any two adjacentnodes It is also a heuristic factor that can measure theinfluence distance in selecting the next state It is not limitedto pheromones Rho has to do with the rate at which thepheromones evaporate Rho shows how often new pathsare explored by the agentsants rather than reinforcing oldpaths Each agent cooperates by having access to other agentsrsquopheromone valuesThe pheromone value is initialized to 001and it is continually updated until learning stops It is similarto the first component of QLACS where we initialize allQLACS positions to zero and update for every new step

Table 16 gave the optimized route to be used by the robotsto achieve inspection tasksThis resulted in combining Tables14 and 16 to achieve Table 17 Table 17 shows optimized timecost memory usage route cost and good communication

45 Experiment 4 Benchmarking the New Hybrid Model(QLACS) with Popular Methods Choosing the optimizedpaths QLACS performs cooperative inspection behaviourthrough communication among robots Looking at the sam-ple result on Table 17 QLACS chooses the shortest andcomplete possible inspection routes from different runs Inthis case the best paths were obtained from Table 16 by using9 agents 10 agents and 12 agents All the test runs gave besttrail paths from both entrances listing the complete states

Mathematical Problems in Engineering 17

Table 14 QLACS with communication

Number of runs 1 2 3 4 5 6 7Iterations 9 10 9 13 10 10 9Time (sec) 330019 310017 310018 310018 290023 300017 310018Memory usage (bytes) 15748 15948 15748 18232 16576 15948 15748Inspected states (R1) 119865 119866 119864 119865 119866 119864119863 119865 119866 119864 119865 119864 119865 119866 119864119863 119861 119865 119866 119864119863 119865 119866 119864

Inspected states (R2) 119862 119861 119860119863 119862 119861 119860 119862 119861 119860119863 119865 119866 119864119863 119860 119862119860 119862 119861 119860 119862 119861 119860119863

Table 15 Navigation behaviour parameter specification

ACO properties Type of ACO Population Length of path Pheromonecoefficient 120573

Heuristiccoefficient 120572 Evaporation rate 120588

Properties ACS 9 or 12 8 2 3 001

0

10

20

30

40

50

60

70

0 1 2 3 4 5 6 7 8

Tim

e cos

t

Number of runs

QL time cost

ACS time cost

QLACS time cost

Figure 9 Comparison of time costs for QL ACS and QLACS

05

1015202530354045

0 1 2 3 4 5 6

Rout

e cos

t

Runs

Path cost for R1 (QL)

Path cost for R2 (QL)

Path cost for R1 and R2 (QLACS)

Figure 10 Comparison of route costs for QL and QLACS

with length 10 as shown in Table 16 that is they have statesfrom 119865 to 119862 and from 119862 to 119865 The length is the addition ofweights along the line (trail edges) Then the QLACS usesthe optimized paths to make decisions on inspections Theresult from Table 17 shows that no state is inspected twice orvisited more than required The QLACS model concentrateson making the best inspection decisions for MRS The firstrun on Table 17 shows that nine agentsants were used forthe route finding the optimized route was achieved after nineiterations under 70004 sec and the states where all inspectedeffectively without redundancies

0

5

10

15

20

25

30

1 2 3 4 5

Num

ber o

f ant

site

ratio

ns

Runs

AntsIteration

Figure 11 Number of antsiteration required for QLACS to findoptimal path

02468

101214

0 2 4 6 8 10

Tim

eite

ratio

n

Number of runs

Iteration

Time

Figure 12 Time and iteration required for QLACS to achieveoptimal behavior

451 Comparative Study of the ProposedModel (QLACS) withQL Only and ACS Only Based on the results tabulated inTable 18 and Figure 9 the average time costs of achieving theMRS behaviours for QL ACS and QLACS were comparedTwo robots will use an average of 308590 sec to achievethorough inspection behaviour using QL an average of404309 sec to achieve optimal navigation behaviour usingACS and an average of 92868 sec to achieve both navigationand cooperative behaviour using QLACS The result shows

18 Mathematical Problems in Engineering

Table 16 Navigation behaviour computations

Number of runs 1 2 3 4 5 6 7 8 9Number of Ants 5 7 8 9 10 11 12 15 28Iterations 12 26 14 16 23 14 16 23 22Time (sec) 370021 370021 390022 600035 320019 330019 35002 430025 650037Best train distance (R1) 12 12 10 10 10 10 10 10 10Best trail distance (R2) 10 12 12 10 10 12 10 10 10

Table 17 New hybrid model QLACS computations

Number of runs 1 2 3 4 5 6 7 8 9Iterations 9 9 9 9 9 9 9 9 9Number of ants 9 9 9 10 10 10 12 12 12Time (sec) 70004 80005 90005 110006 110006 120007 80005 90005 90006Memory usage (bytes) 10288 10280 10248 10288 10288 10288 10164 10712 10164Inspected states (R1) 119865 119866 119864119863 119865 119866 119864119863 119865 119866 119864119863 119865 119866 119864 119860 119861 119865 119866 119864 119860 119861 119865 119866 119864 119860 119861 119865 119866119863 119860 119865 119866119863 119860 119865 119866119863 119860

Inspected states (R2) 119862 119861 119860 119862 119861 119860 119862 119861 119860 119862119863 119862119863 119862119863 119862 119861 119864 119862 119861 119864 119862 119861 119864

Table 18 Time cost comparison for QL ACS and QLACS

Runs QL time cost(sec)

ACS time cost(sec)

QLACS timecost (sec)

1 330019 370021 700042 310019 370021 80043 310018 390022 900054 310018 320019 900055 290023 35002 1100066 300017 430025 1200077 310018 600035 90006Average 308590 404309 92868

that our proposed integrated algorithm performs better withreduced time cost On the same note the route costs for theQL and QLACS were also compared The results in Table 19and Figure 10 show that the proposed model QLACS gave amuch lower route cost than the QL

The number of ants and iterations used to achieve thisis displayed in Figure 11 The more the ants the more theiterationThe best routes created in the five test runs shown inFigure 11 are run numbers 1 and 3 They used fewer ants anditerations to achieve the optimal routes The optimal resultfor the proposed model is achieved under nine iterations forevery run The iteration remains constant for any number ofagents and robotsThe blue line with starmarkers at Figure 12is the iteration value for 9 runs The red line shows thedifferent amounts of time required for each run The timeis also almost stable for the QLACS though it fluctuated alittle in the middle This signifies that the proposed modelis more robust and effective in finding the optimal routeand coordinating MRS The reduced route cost and shortercomputation time achieved with the QLACS satisfied thecriteria for cooperative behavioural purposes

46 Experiment 4 Scalability of QLACS Using Two Threeand Four Robots In this section we present some resultsobtained by experimenting with the cooperative behaviouralaction of two three and four robots using the QLACSmodelThe performance of the two three and four robots wasevaluated by running the simulation three times using thesame number of agents The performance of the proposedQLACSmodel shows good communication between the twothree and four robots under the states inspected in the lastfour columns of Table 20 An almost stable time was used inachieving the inspection for all the robots The detail of thesimulation result is laid out in Table 20

A notable observation emanating from Table 20 is thatthere is a need for a larger mine area of inspection becauserobot R1 in rows 4 to 6 could inspect only one state whilerobots R1 and R2 in rows 7 to 9 could inspect only one stateeach This implies that the size of the field of inspection isproportional to the number of robots to be deployed

5 Concluding Remarks

This paper has shown how MRS behave cooperatively inunderground terrains The QL algorithm has been inves-tigated for its potential quality of good communicationwhile the ACS algorithm was explored for good navigationproperties The strengths of the two algorithms have beenharnessed and optimized to form a hybrid model QLACSwhich addressed the behavioural situation inMRS effectively

The analytical solution of the newmodel was explained asshown in Algorithms 3 and 4 and Tables 7 and 10The hybridarchitecture for themodel was also described in Figure 7Theexperimental setup describing different navigation locationsfor two robots was shown in Figure 8

The new hybrid model QLACS for cooperative behaviourof MRS in an underground terrain was proven able to findthe optimal route andhandle cooperation between two robots

Mathematical Problems in Engineering 19

Table 19 Route cost comparison for QL and QLACS

Runs QL (sec) QLACS (sec)Path cost for R1 Path cost for R2 Path cost for R1 Path cost for R2

1 20 20 10 102 32 19 10 103 28 14 10 104 27 27 10 105 39 30 10 10Average 295 22 10 10

Table 20 Summary of scalability performance on QLACS

Row numbers Number of robots Time (sec) Number of states inspectedRobot 1 (R1) Robot 2 (R2) Robot 3 (R3) Robot 4 (R4)

1 2 100006 4 32 2 110007 4 33 2 80005 4 34 3 160009 1 3 35 3 17001 1 3 36 3 120006 1 3 37 4 110006 1 1 3 28 4 140006 1 1 3 29 4 10006 1 1 3 2

effectively The cooperative behavioural problem was nar-rowed down to two situations navigational and cooperativebehaviours ACS was used to establish the shortest optimalroute for two robots while the QL was used to achieveeffective cooperation decisions The results achieved afterintegrating the two algorithms showed a reduction in routecosts and computation times as shown in Figures 9 10 11 and12 The comparative analysis between QL ACS and QLACSproved that the last-named is more robust and effective inachieving cooperative behaviour for MRS in the proposeddomain

The result of this work will be used for future simulationof MRS applications in hazardous environments An indica-tion of expansion of this research area has conspicuously sur-faced in both real life implementations and model increaseThe model used in this work can also be improved upon byincreasing the state space

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors gratefully acknowledge the financial support ofthe Organisation for Women in Science for the DevelopingWorld (OWSDW) Lrsquooreal-Unesco for Women in Scienceand the resources made available by the University of Cape

Town (UCT) and University of South Africa (UNISA) SouthAfrica

References

[1] J Leitner ldquoMulti-robot cooperation in space a surveyrdquo inProceedings of the Advanced Technologies for Enhanced Qualityof Life (AT-EQUAL rsquo09) vol 37 pp 144ndash151 IEEE ComputerSociety Iasi Romania July 2009

[2] G Dudek M R M Jenkin E Milios and D Wilkes ldquoAtaxonomy for multi-agent roboticsrdquo Autonomous Robots vol 3no 4 pp 375ndash397 1996

[3] A Chella G Lo Re I Macaluso M Ortolani and D PerildquoChapter 1 A networking framework for multi-robot coordi-nationrdquo in Recent Advances in Multi Robot Systems A LazinicaEd pp 1ndash14 2008

[4] Yinka-Banjo C O Osunmakinde and I O Bagula AldquoAutonomous multi-robot behaviours for safety inspectionunder the constraints of underground mine Terrainsrdquo Ubiqui-tous Computing and Communication Journal vol 7 no 5 pp1316ndash1328 2012

[5] S Yarkan S Guzelgoz H Arslan and R Murphy ldquoUnder-ground mine communications a surveyrdquo IEEE Communica-tions Surveys amp Tutorials vol 11 no 3 pp 125ndash142 2009

[6] M Peasgood J McPhee and C Clark Complete and ScalableMulti-Robot Planning in Tunnel Environments Digitalcom-mons 2006

[7] F E Schneider D Wildermuth and M Moors ldquoMethods andexperiments for hazardous area activities using multi-robotsystemrdquo in Proceedings of the IEEE International Conference onRobotics and Automation pp 3559ndash3564 May 2004

20 Mathematical Problems in Engineering

[8] R Zlot and A Stentz ldquoMarket-based multirobot coordinationfor complex tasksrdquo International Journal of Robotics Researchvol 25 no 1 pp 73ndash101 2006


INPUT: Edge distances (obstacles), pheromones, ants' trail, associated probabilities, and starting and terminating indexes, that is, from F or C
OUTPUT: Effective cooperation, inspection, and navigation

(1) Boolean CompletedFlag = False            // indicates completion for all the threads
(2) Declare CompletedThreadBuffer            // data structure that stores info about completed threads
(3) Initialize all Q-values to zero          // all Q-value positions are initialized to zero
(4) Initialize best paths from the ACO algorithm, starting from F and C
(5) While (CompletedFlag <> True)            // checks whether all robots have finished and the flag is true
    Begin
      Create N threads in parallel
      Threads get current states in parallel from the ACO algorithm
      Threads get next states and indexes in parallel from the ACO algorithm
      Threads compare Q-values of all corresponding positions of current states (each robot broadcasts its Q-value info)
      IF ((Q == 0) AND (ThreadNextState <> GoalState))     // checks whether a particular state is available
      Begin
        State is available; robot with the CurrentThreadID inspects
        Compute and update Q-value
      End
      IF (Q > 0)                                            // a state is not available because an already inspected state has Q > 0
      Begin
        State is already inspected; robot with the CurrentThreadID ignores
        Compute and update Q-value
      End
      IF ((Q == 0) AND (ThreadNextState == GoalState))      // checks for the goal state and shuts down
      Begin
        Compute and update Q-value
        Goal state is reached and inspection completed; thread with the CurrentThreadID shuts down
        Store CurrentThreadID in CompletedThreadBuffer
      End
      IF (Count[CompletedThreadBuffer] == NumberOfRobots)   // learning stops when this happens
      Begin
        CompletedFlag = True
      End
    End                                                     // end of While loop

Algorithm 2: Pseudocode for QLACS.
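As a reading aid, the listing below is a minimal Python sketch of the loop in Algorithm 2, not the authors' implementation: two robot threads walk their ACS-optimized routes and share one Q-value lookup table, whose updates play the role of broadcasting. The thread handling, the per-state learning-rate recurrence, and the simplified Q-updates are illustrative assumptions read off the worked values in Table 10.

```python
import threading

# Optimized routes handed over by the ACS component (Algorithm 4): robot 1 follows
# F E D A B C G C and robot 2 follows C B A D F E G C, both exiting through goal H.
ROUTES = {"R1": ["F", "E", "D", "A", "B", "C", "G", "C", "H"],
          "R2": ["C", "B", "A", "D", "F", "E", "G", "C", "H"]}
REWARD = {"inspect": 50, "ignore": 100, "shutdown": 150}
STATES = "FGEDABCH"

q_table = {s: 0.0 for s in STATES}   # shared lookup table: writing here is the "broadcast"
gamma = {s: 1.0 for s in STATES}     # per-state learning-rate memory (equation (9))
lock = threading.Lock()
completed = []                       # stands in for CompletedThreadBuffer

def robot(name, route):
    for state in route:
        with lock:                   # read/update the broadcast Q-values atomically
            g = 0.5 / (1 + gamma[state])          # recurrence inferred from Table 10
            gamma[state] = g
            if state == "H":                      # goal state: inspection done, shut down
                q_table[state] = REWARD["shutdown"]
                completed.append(name)
            elif q_table[state] == 0:             # no robot has inspected this state yet
                q_table[state] = REWARD["inspect"]   # eq. (10), with zero future Q here
                print(name, "inspects", state)
            else:                                 # Q > 0: already inspected, so ignore
                q_table[state] = REWARD["ignore"]    # eq. (10), future-Q term omitted
                print(name, "ignores", state)

threads = [threading.Thread(target=robot, args=item) for item in ROUTES.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("all robots shut down:", sorted(completed) == sorted(ROUTES))
```

Because each robot only ignores states whose shared Q-value is already positive, no state is inspected twice, which is the cooperative behaviour reported for Table 14.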

[Figure 7: Hybrid architecture. The flowchart links a graph-construct and parameter-initialization stage for two robots to the state selection, transition, update, and convergence decision modules of the ACS component, and then to the cooperative inspection, state broadcasting, state action, QLACS update, new state, decision, and goal state modules of the QL component, with yes/no branches at the decision points.]


Example I of QLACS (using Figure 5 for optimized route finding)
Parameters used in the first component of QLACS: Q = 20, α = 3, β = 2, ρ = 0.01
Starting state: F; terminating states: C, F
Terminating condition: all states have been visited at least once and the ant is at a terminating state
State space = {F, G, E, D, A, B, C}
Initialize pheromone positions to 0.01
Rand = (0, 1) returns a random value between 0 and 1
Equations:
(i) Computation of attractiveness η_ij using (2)
(ii) Computation of instantaneous pheromones by ant k for all potential states using (3)
(iii) Pheromone update using (4)
(iv) Computation of probabilities for the potential states using (5)
(v) Adaptation of the roulette wheel using (6)
Equation (5) can be reduced to: let w(i, j) = [τ_ij]^α [η_ij]^β and Sum = Σ w(i, j) over the potential states, so that Pr(i, j) = w/Sum.

Algorithm 3: Parameters for QLACS Example I.
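The first transition of Example I can be traced with a short sketch. The snippet below is a non-authoritative illustration that applies the reduced form w(i, j) = [τ_ij]^α [η_ij]^β and Pr(i, j) = w/Sum together with the pheromone update worked out in Table 7; the neighbour set of F and the deposit numerator 2 are assumptions read off that worked example.

```python
import random

ALPHA, BETA, RHO = 3, 2, 0.01          # parameters listed in Algorithm 3

# Neighbour distances from starting state F (Table 7 uses eta = 1/distance = 1 here);
# the graph itself comes from Figure 4 and is assumed for this sketch.
distance = {"D": 1, "E": 1, "G": 1}
tau = {s: 0.01 for s in distance}       # pheromones initialised to 0.01

# Pheromone deposit and evaporation as worked out in Table 7:
# delta_tau = 2 / L_k and tau <- (1 - rho) * tau + delta_tau  (the deposit numerator 2
# is read off the worked values; equation (3) defines it generally).
L_k = 1
for s in tau:
    tau[s] = (1 - RHO) * tau[s] + 2 / L_k        # each becomes 2.0099, about 2.01

# Reduced probability rule from Algorithm 3: w = tau^alpha * eta^beta, Pr = w / Sum
w = {s: (tau[s] ** ALPHA) * ((1 / distance[s]) ** BETA) for s in distance}
total = sum(w.values())
prob = {s: w_s / total for s, w_s in w.items()}   # 0.33 each, as in Table 7

# Roulette-wheel adaptation (equation (6)): draw Rand in (0, 1) and walk the wheel
r, acc, nxt = random.random(), 0.0, None
for s, p in prob.items():
    acc += p
    if r <= acc:
        nxt = s
        break
nxt = nxt or s                                    # guard against float round-off
print("Rand =", round(r, 2), "-> next state:", nxt)
```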

In our case there are eight states: seven internal states and a common terminating state. Since the states are not simply linear, the environment allows for multiple options; that is, an agent (ant) can choose from any present state. This type of scenario is also called a multigoal scenario.

(ii) State Selection Module. Here the agents select the start and end states, which according to our model in Figure 4 are F and C. These states are selected based on the cumulative probability of two adjacent nodes in (6).

(iii) Transition Module. This module takes care of the transition rules of the agents by calculating the pheromone concentrations, distances, and probabilities using (2) through (4).

(iv) Update Module. After each transition from one state to another, an update of the pheromone is computed, after which multipath planning with the shortest path is achieved.

(v) Convergence Decision Module. This is where the best trail (path) decision is taken. It is the end product of the first component of QLACS, which is then passed to the second component of QLACS for cooperative behaviour.

(vi) Cooperative Inspection Module. This is where the robots are deployed to start inspection. The robots use the acquired best paths, starting from F and C, respectively, as input for the second component of QLACS. The two robots use the learning rate from (9) to learn the environment and use (10) for cooperation.

(vii) State Broadcasting Module. This module handles the broadcasting behaviours of the two robots, which are achieved by using (10). Each robot checks its memory, represented by Q-values, before taking any decision.

(viii) State Action Module. State broadcasting by each robot is immediately followed by action selection. In other words, the state to inspect or ignore is decided here using (11).

(ix) QLACS Update Module. After each action selection, the Q-values of each robot are updated using (10).

(x) New State Module. This module takes care of the robot's new state after updating the Q-values. Each robot runs in its own thread, managing its memory yet sharing information.

(xi) Final Decision Module. This decision module determines whether the robot should exit the environment or still do more inspections. It is also controlled by (11).

(xii) Goal State Module. The completion of the second component of QLACS is getting to the goal state after successful inspection of states. This goal state, according to our model in Figure 4, is called H.

3.3.3. Analytical Scenario. For the benefit of robotic and system engineers, practitioners, and researchers, this paper uniquely presents scenarios on the ease of implementation of the proposed QLACS, as described in Algorithms 3 and 4 and Tables 7 and 10. The first component of QLACS is explained in Example I, shown in Algorithm 3 and Table 7. This component, which achieves the optimal route paths for the robots, uses the parameters listed in Algorithm 3, and the calculations for transitions between different states are analysed in Table 7.

Repeat steps 1–6 for subsequent current states until the termination condition and terminating state are reached. At the end of seven updates we have Tables 8 and 9. The total length shown in Table 8 represents the total number of obstacles encountered by each agent while building the trail that achieves the optimal route for the robots.


Table 7: QLACS Example I navigational analytical solution.

Starting state F; potential states D, E, G.
(1) Use (2): η_FD = 1, η_FE = 1, η_FG = 1.
(2) Use (3): L_k = 1; Δτ_k(FD) = 2/1 = 2, Δτ_k(FE) = 2, Δτ_k(FG) = 2.
(3) Use (4): τ_FD = (1 − 0.01) × 0.01 + 2 = 2.0099 ≈ 2.01; τ_FE = 2.01; τ_FG = 2.01.
First pheromone update (states F, G, E, D, A, B, C): 0.01, 2.01, 2.01, 2.01, 0.01, 0.01, 0.01.
(4) Use (5): w(FD) = [τ_FD]^α [η_FD]^β = (2.01)^3 = 8.12; w(FE) = 8.12; w(FG) = 8.12.
    Sum = w(FD) + w(FE) + w(FG) = 24.36.
    Pr(FD) = w/Sum = 8.12/24.36 = 0.33; Pr(FE) = 0.33; Pr(FG) = 0.33.
Probabilities (states F, G, E, D, A, B, C, H): 0, 0.33, 0.33, 0.33, 0, 0, 0, 0.
(5) Use (6): call Rand; Rand = 0.07 falls in state E, so the roulette wheel selects E as the next state (end of roulette wheel).
(6) Update trail: F–E.

Current state E; potential states B, D, G.
(1) Use (2): η_EB = 1/2, η_ED = 1/2, η_EG = 1.
(2) Use (3): L_k(F–E–D) = 1 + 2 = 3, L_k(F–E–B) = 1 + 2 = 3, L_k(F–E–G) = 1 + 2 = 3; Δτ_k = 2/3 = 0.67.
(3) Use (4): τ_EB = (1 − 0.01) × 2.01 + 0.67 = 2.66; τ_ED = 2.66; τ_EG = 2.66.
Second pheromone update (states F, G, E, D, A, B, C): 0.01, 2.66, 2.01, 2.66, 0.01, 2.66, 0.01.
(4) Use (5): w(EB) = (2.66)^3 × (1/2)^2 = 4.71; w(ED) = 4.71; w(EG) = (2.66)^3 × (1)^2 = 18.82.
    Sum = 4.71 + 4.71 + 18.82 = 28.24.
    Pr(EB) = 4.71/28.24 = 0.17; Pr(ED) = 0.17; Pr(EG) = 18.82/28.24 = 0.67.
Probabilities (states F, G, E, D, A, B, C, H): 0, 0.67, 0, 0.17, 0, 0.17, 0, 0.
(5) Use (6): call Rand; Rand = 0.9 falls in state D, so the roulette wheel selects D as the next state (end of roulette wheel).
(6) Update trail: F–E–D.


Example II of QLACS (for good cooperation and communication between robots)
Parameters used in the second component of QLACS (using output from the first component of QLACS):
Reward scheme: Inspect = 50, Ignore = 100, Shutdown = 150
State space: optimized paths from the first component, QLACS R1 = {F, E, D, A, B, C, G, C} and QLACS R2 = {C, B, A, D, F, E, G, C}
Starting states: F and C; S_j = terminating state C, then H
Terminating condition: all states have been visited
Initialize Q-value positions to zeros
Equations:
(i) Compute the learning rate using (9)
(ii) Compute the update on Q(s, a) using (10)

Algorithm 4: Parameters for QLACS Example II.
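For readers who want to check the arithmetic of Example II, the short sketch below reproduces the learning-rate sequence and one Q-update from Table 10. The recurrence used for the learning rate is inferred from the worked values and is not the paper's formal statement of equations (9)–(11).

```python
# Learning-rate sequence and a sample Q-update, reproducing the worked values in
# Table 10. The recurrence gamma <- 0.5 / (1 + gamma), starting from gamma = 1,
# is inferred from those values (0.25, 0.4, 0.36, 0.37); the exact forms of
# equations (9)-(11) appear earlier in the paper.
REWARD = {"inspect": 50, "ignore": 100, "shutdown": 150}

gamma, gammas = 1.0, []
for _ in range(4):
    gamma = 0.5 / (1 + gamma)
    gammas.append(round(gamma, 2))
print(gammas)                                  # [0.25, 0.4, 0.36, 0.37]

# Q(s, a) = R(a) + gamma * max(Q of next states), e.g. the third visit to state C:
print(REWARD["ignore"] + 0.36 * max(50, 50))   # 118.0, matching Table 10
```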

Table 8: Pheromone update of a full cycle.

Current state      F      G      E      D      A      B      C
F                  0.01   2.01   2.01   2.01   0.01   0.01   0.01   (1st update)
E                  0.01   2.66   2.01   2.66   0.01   2.66   0.01   (2nd update)
D                  3.13   0.01   3.03   0.01   3.03   0.01   0.01   (3rd update)
A                  0.01   0.01   0.01   0.45   0.01   0.45   0.01   (4th update)
B                  0.01   0.01   0.67   0.01   0.67   0.01   0.70   (5th update)
C                  0.01   0.91   0.01   0.01   0.01   0.89   0.01   (6th update)
G                  0.71   0.01   0.69   0.01   0.01   0.01   0.71   (7th update)

C = terminating state, and ant k has moved through all states at least once.
Trail: F–E–D–A–B–C–G–C.
Number of obstacles = 1 + 2 + 2 + 2 + 1 + 1 + 1 = 10.

Table 9: Probability update of a full cycle.

Current state      F      G      E      D      A      B      C
F                  0      0.33   0.33   0.33   0      0      0
E                  0      0.67   0      0.17   0      0.17   0
D                  0.69   0      0.16   0      0.16   0      0
A                  0      0      0      0.5    0      0.5    0
B                  0      0      0.16   0      0.16   0      0.68
C                  0      0.52   0      0      0      0.49   0
G                  0.45   0      0      0      0      0      0.45

C = terminating state.

The number of obstacles, shown as 10, is calculated using the trail result in Table 8. Table 9 displays the probability update table for a full cycle of route finding. In this first example, the first robot terminates its inspection through C and then H.

Algorithm 4 displays the parameters of the second example of QLACS. The second component of QLACS handles this aspect of the model, namely the communication and cooperative part. Once the first component hands its output to the second component, that output becomes the second component's input and it runs with it. In Algorithm 4, QLACS R1 represents the input for robot 1 and QLACS R2 represents the input for robot 2, both received from the first component of QLACS. The terminating condition is when all states have been visited.

The rest of Example II, displayed in Table 10, explains the cooperative behavioural scenario from one state to another for the two robots. The tables in the last row of Table 10 show the communication and cooperative output achieved using the second component of QLACS.

4. Experimental Evaluations: Safety Inspections for MRS Behaviours

The experimental results of implementing QLACS in an environment that consists of obstacles and links are tabulated in this section. The experimental setup is explained in Section 4.1. Two performance categories are reported: without communication and with communication. In the without-communication category, displayed in Section 4.2, we found that robots can inspect all states individually without knowing that another robot exists; robots can also inspect only some of the states, thereby leaving some states uninspected. The communication category is explained in Section 4.3, while the performance of QLACS measured against other existing methods is tabulated in Section 4.4.

4.1. Experimental Setup. Figure 8 displays different sets of experiments conducted in the environment using two robots. Figure 8(a) shows how two robots resume inspection from two different entrances. In each set of experiments the robots take readings of the roof cracks and the level of toxic gases using their sensors. The same behaviours occur in Figures 8(b) and 8(c), respectively, at different inspection locations; the inspection locations vary in the four experimental setups shown in Figure 8.

4.2. Experiment 1: Performance of QLACS without Cooperation. The result of implementing QLACS without communication in the proposed environment (Figures 1 and 4) is shown in Tables 11 and 12. In the case of Table 11, robot 1 (R1) enters the mine through state F while robot 2 (R2) enters the mine through state C. However, each robot learns by inspecting some of the states and ignoring others.


Table 10: QLACS Example II cooperative behaviour.

Starting simulation, Robot 1, starting state F:
(1) Use (9): γ(F) = 0.5/[1 + 1] = 0.5/2 = 0.25.
(2) Check the Q-value for state F (use (11)): Q(F, a) = 0. Selected action a = inspect.
(3) Use (10): Q(F, a) = 50 + 0.25(0) = 50. End of value iteration.

Current state E, Robot 1:
(1) Use (9): γ(E) = 0.5/[1 + 1] = 0.25.
(2) Check the Q-value for state E (use (11)): Q(E, a) = 0. Selected action a = inspect.
(3) Use (10): Q(E, a) = 50 + 0.25(max(0, 0, 0)) = 50. End of value iteration.

Current state C, Robot 2:
(1) Use (9): γ(C) = 0.5/[1 + 1] = 0.25.
(2) Check the Q-value for state C (use (11)): Q(C, a) = 0. Selected action a = inspect.
(3) Use (10): Q(C, a) = 50 + 0.25(max(0, 0, 0)) = 50. End of value iteration.

Current state B, Robot 2:
(1) Use (9): γ(B) = 0.5/[1 + 1] = 0.25.
(2) Check the Q-value for state B (use (11)): Q(B, a) = 0. Selected action a = inspect.
(3) Use (10): Q(B, a) = 50 + 0.25(max(0, 0, 0)) = 50. End of value iteration.

Current state D, Robot 1:
(1) Use (9): γ(D) = 0.25.
(2) Broadcast (use (11)): Q(D, a) = 0. Selected action a = inspect.
(3) Use (10): Q(D, a) = 50 + 0.25(max(0, 0, 0)) = 50. End of value iteration.

Current state A, Robot 2:
(1) Use (9): γ(A) = 0.25.
(2) Broadcast (use (11)): Q(A, a) = 0. Selected action a = inspect.
(3) Use (10): Q(A, a) = 50 + 0.25(max(0, 0, 0)) = 50. End of value iteration.

Current state A, Robot 1:
(1) Use (9): γ(A) = 0.5/[1 + 0.25] = 0.5/1.25 = 0.4.
(2) Broadcast (use (11)): Q(A, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(A, a) = 100 + 0.4(max(0, 0, 0)) = 100. End of value iteration.

Current state D, Robot 2:
(1) Use (9): γ(D) = 0.5/[1 + 0.25] = 0.4.
(2) Broadcast (use (11)): Q(D, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(D, a) = 100 + 0.4(max(0, 0, 0)) = 100. End of value iteration.

Current state B, Robot 1:
(1) Use (9): γ(B) = 0.4.
(2) Broadcast (use (11)): Q(B, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(B, a) = 100 + 0.4(max(0, 0, 0)) = 100. End of value iteration.

Current state F, Robot 2:
(1) Use (9): γ(F) = 0.4.
(2) Broadcast (use (11)): Q(F, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(F, a) = 100 + 0.4(max(0, 0, 0)) = 100. End of value iteration.

Current state C, Robot 1:
(1) Use (9): γ(C) = 0.4.
(2) Broadcast (use (11)): Q(C, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(C, a) = 100 + 0.4(max(0, 0, 0)) = 100. End of value iteration.

Current state E, Robot 2:
(1) Use (9): γ(E) = 0.4.
(2) Broadcast (use (11)): Q(E, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(E, a) = 100 + 0.4(max(0, 0, 0)) = 100. End of value iteration.

Current state G, Robot 1:
(1) Use (9): γ(G) = 0.5/[1 + 1] = 0.25.
(2) Broadcast (use (11)): Q(G, a) = 0. Selected action a = inspect.
(3) Use (10): Q(G, a) = 50 + 0.25(max(0, 0, 0)) = 50. End of value iteration.

Current state G, Robot 2:
(1) Use (9): γ(G) = 0.5/[1 + 0.25] = 0.4.
(2) Broadcast (use (11)): Q(G, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(G, a) = 100 + 0.4(max(50, 50, 50)) = 100 + 20 = 120. End of value iteration.

Current state C, Robot 1:
(1) Use (9): γ(C) = 0.5/[1 + 0.4] = 0.5/1.4 = 0.36.
(2) Broadcast (use (11)): Q(C, a) = 50. Selected action a = ignore.
(3) Use (10): Q(C, a) = 100 + 0.36(max(50, 50)) = 100 + 18 = 118. End of value iteration.

Current state C, Robot 2:
(1) Use (9): γ(C) = 0.5/[1 + 0.36] = 0.5/1.36 = 0.37.
(2) Broadcast (use (11)): Q(C, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(C, a) = 100 + 0.37(max(50, 50)) = 100 + 0.37 × 50 = 118.5. End of value iteration.

All QLACS R1 states exhausted. Goal state H, Robot 1:
(1) Use (9): γ(H) = 0.5/[1 + 1] = 0.25.
(2) Broadcast (use (11)): Q(H, a) = 0. Selected action a = shutdown.
(3) Use (10): Q(H, a) = 150 + 0.25(max(0, 0)) = 150. End of value iteration.

All QLACS R2 states exhausted. Goal state H, Robot 2:
(1) Use (9): γ(H) = 0.5/[1 + 1] = 0.25.
(2) Broadcast (use (11)): Q(H, a) = 0. Selected action a = shutdown.
(3) Use (10): Q(H, a) = 150 + 0.25(max(0, 0)) = 150. End of value iteration.


Table 10: Continued. End of policy iteration.

Robot 1:
State   Inspect   Ignore   Shutdown
F       Yes       No       No
E       Yes       No       No
D       Yes       No       No
A       No        Yes      No
B       No        Yes      No
C       No        Yes      No
G       Yes       No       No
C       No        Yes      Yes (through H)

Robot 2:
State   Inspect   Ignore   Shutdown
C       Yes       No       No
B       Yes       No       No
A       Yes       No       No
D       No        Yes      No
F       No        Yes      No
E       No        Yes      No
G       No        Yes      No
C       No        Yes      Yes (through H)

Since there is no communication, they exit the mine after learning, consequently not inspecting all the states. The same routine is reflected in Table 12, but in this case each robot ends up inspecting all the states before exiting the mine. Analysis of Tables 11 and 12 shows that resources are wasted and the inspection job is neither effective nor efficient. Comparing the two tables, the time and distance costs are high in both, though higher in Table 12 because of the many repetitions. For instance, the time and distance costs in Table 12, column 2, are 48.0028 sec and ((F, G, E, D, A, B, C), (C, B, A, D, E, G, F)), respectively. The tables also show that states are repeated in both Tables 11 and 12, and the memory usage is equally high. This led us to create a more efficient QLACS with communication by introducing some heuristics. The processes involved in Tables 11 and 12 are explained in Table 13.

One can see from Tables 11 and 12 that there is no good communication among the robots, hence the development of Table 14.

4.3. Experiment 2: Performance of QLACS with Good Cooperation. The heuristic added to the standard QL made this experiment show good communication; this is where our contribution to communication is demonstrated. As a robot inspects and learns the environment, it broadcasts its Q-values to the other robot in the form of a lookup table. Each robot checks the broadcast Q-values: when a Q-value is equal to zero, the robot randomly selects a state for inspection; if a Q-value is greater than zero, the robot checks whether the state is available for inspection or should be ignored. When a robot encounters a state with a Q-value equal to zero and the next state in its thread is the goal state (H), it shuts down, having checked the lookup table to confirm that all states have been inspected. The result in Table 14 shows good communication between the two robots: no state was inspected more than once. The iterations for every run, with their times, memory usage, and effective communication, are also displayed in Table 14.
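The decision heuristic just described can be summarised in a few lines. The sketch below is one reading of that rule (equation (11) itself is not reproduced in this section), with illustrative names.

```python
GOAL = "H"   # goal/exit state in the model of Figure 4

def choose_action(q_value, next_state):
    # Interpretation of the Section 4.3 rule: Q == 0 marks a state no robot has
    # inspected yet, Q > 0 marks a state already inspected by the other robot.
    if q_value == 0 and next_state == GOAL:
        return "shutdown"   # everything accounted for; exit through H
    if q_value == 0:
        return "inspect"    # state still available (picked at random when several are)
    return "ignore"         # another robot already inspected this state
```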

Comparing Table 14 with Tables 11 and 12, one cannot but notice the huge differences in the memory, time, and distance costs. The communication between the robots in Table 14 resulted in good inspection; however, the random method of choosing the next inspection state did not give an optimized route, thereby increasing the time cost of passing through and checking already inspected states.

(a) Two robots starting inspection from two different entrances

(b) Two robots inspecting as they navigate in the environment

(c) Two robots inspecting different locations with different positions

Figure 8: Different experimental behaviours for two robots.

4.4. Experiment 3: Performance of QLACS for the Navigation Behaviour of the Proposed Model. The result of implementing the first component of QLACS for effective navigation of the proposed model is tabulated in Table 16.


Table 11: QLACS without communication (inspecting some states).

Number of runs          1         2        3          4             5                6                7
Iterations              9         10       10         13            13               9                13
Time (sec)              43.0025   30.0017  34.002     30.0017       31.0017          31.0018          27.0016
Memory usage (bytes)    18872     18160    19204      18208         17896            18164            18308
Inspected states (R1)   G         F, G     C          C             F, D, B, E, A    F, E, D, A, C    F, G, E, A
Inspected states (R2)   B, A, G   C        C, B, A    E, F, B, A    C, B, E          B, A, E          A, G

Table 12: QLACS without communication (inspecting all states).

Number of runs          1         2         3         4         5         6         7
Iterations              114       10        70        10        9         10        10
Time (sec)              43.0024   48.0028   41.0023   34.0019   34.0019   34.0019   35.002
Memory usage (bytes)    28440     17960     27216     17000     18456     17672     17968
Inspected states (R1)   F, G, E, D, A, B, C (in every run)
Inspected states (R2)   C, B, A, D, E, G, F (in every run)

Table 13: Processes used in achieving Tables 11 and 12.

(a) For Table 11 (inspecting some states):
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zeroes.
(3) Select a random action if the Q-values of all possible actions at the current state are zeroes.
(4) Otherwise, select the action with the highest Q-value.
(5) Compute and update the Q-value of the selected action.
(6) Get the new state among the possible states.
(7) If the new state = goal state, then go to steps 5 and 9.
(8) Repeat steps 3 to 6 until step 7.
(9) Shut down.

(b) For Table 12 (inspecting all states):
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zeroes.
(3) Select a random action if the Q-values of all possible actions at the current state are zeroes.
(4) Otherwise, select the action with the highest Q-value.
(5) Compute and update the Q-value of the selected action.
(6) Get the new state among the possible states.
(7) If the new state = goal state, then go to steps 5 and 10.
(8) Repeat steps 3 to 7 until step 9.
(9) All states except the goal state have taken action = inspect.
(10) Shut down.

(A minimal code sketch of both procedures is given below.)
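The sketch below contrasts the two procedures in Table 13; the adjacency graph, the reward value, and the step cap are assumptions added to make it self-contained, and the only difference between the two procedures is the termination test.

```python
import random

def inspect_without_communication(start, goal, neighbours, visit_all=False, max_steps=200):
    """Sketch of Table 13. With visit_all=False the robot stops as soon as it
    reaches the goal, possibly leaving states uninspected (Table 11); with
    visit_all=True it keeps moving until every state has been visited at least
    once (Table 12). The Q-table is private: nothing is broadcast, so a second
    robot would happily re-inspect the same states."""
    q = {s: 0.0 for s in neighbours}
    inspected, state = set(), start
    for _ in range(max_steps):                 # step cap is a safeguard, not in the paper
        options = neighbours[state]
        if all(q[s] == 0 for s in options):
            state = random.choice(options)                 # step (3): random action
        else:
            state = max(options, key=lambda s: q[s])       # step (4): highest Q-value
        q[state] += 50                         # step (5): reward for visiting (assumed 50)
        inspected.add(state)
        done = (inspected >= set(neighbours)) if visit_all else (state == goal)
        if done:
            break
    return inspected

# Illustrative adjacency loosely based on Figure 4 (an assumption; the figure is
# not reproduced in this section):
graph = {"F": ["G", "E", "D"], "G": ["F", "E", "C"], "E": ["F", "G", "D", "B"],
         "D": ["F", "E", "A"], "A": ["D", "B"], "B": ["A", "E", "C"],
         "C": ["B", "G"]}
print(sorted(inspect_without_communication("F", "C", graph, visit_all=True)))
```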

Table 15 shows the selected parameters used in achieving the optimal path. The optimal path found after nine test runs is the path with a distance cost of 10 for both entries to the environment, displayed in Table 16. Therefore, columns 4, 5, 7, 8, and 9 can be used as the optimized path input for the first component of QLACS. QLACS then uses any of the optimized paths to navigate to the specified states and take decisions accordingly. For instance, the test run with nine ants gives 16 iterations under 60.0035 seconds and yields the shortest paths for both robots coming from entries F and C of the environment. The paths that give a cost of 10 are obtained from F-E-D-A-B-C-G-C (1, 2, 2, 2, 1, 1, 1) and C-B-A-D-F-G-E-G-C (1, 2, 2, 1, 1, 2, 2, 1), respectively.

Alpha (α), Beta (β), and Rho (ρ) represent the heuristic properties of the ACS algorithm. The Alpha factor measures the influence of pheromone concentration, which can drive the selection of the path with the highest pheromone concentration. Beta measures the influence related to the distance between any two adjacent nodes; it is a heuristic factor that weighs distance in selecting the next state and is not limited to pheromones. Rho is the rate at which the pheromones evaporate; it determines how often new paths are explored by the agents (ants) rather than old paths being reinforced. Each agent cooperates by having access to the other agents' pheromone values. The pheromone value is initialized to 0.01 and is continually updated until learning stops, in the same way that, in the second component of QLACS, all Q-value positions are initialized to zero and updated at every new step.

Table 16 gives the optimized routes to be used by the robots to achieve the inspection tasks. Combining Tables 14 and 16 produces Table 17, which shows optimized time cost, memory usage, route cost, and good communication.

4.5. Experiment 4: Benchmarking the New Hybrid Model (QLACS) with Popular Methods. Using the optimized paths, QLACS performs cooperative inspection behaviour through communication among the robots. Looking at the sample results in Table 17, QLACS chooses the shortest complete inspection routes from the different runs; in this case the best paths were obtained from Table 16 by using 9, 10, and 12 agents. All these test runs gave best trail paths from both entrances, listing the complete states.


Table 14: QLACS with communication.

Number of runs          1            2             3            4                5                6             7
Iterations              9            10            9            13               10               10            9
Time (sec)              33.0019      31.0017       31.0018      31.0018          29.0023          30.0017       31.0018
Memory usage (bytes)    15748        15948         15748        18232            16576            15948         15748
Inspected states (R1)   F, G, E      F, G, E, D    F, G, E      F, E             F, G, E, D, B    F, G, E, D    F, G, E
Inspected states (R2)   C, B, A, D   C, B, A       C, B, A, D   F, G, E, D, A    C, A             C, B, A       C, B, A, D

Table 15: Navigation behaviour parameter specification.

ACO properties   Type of ACO   Population   Length of path   Pheromone coefficient (β)   Heuristic coefficient (α)   Evaporation rate (ρ)
Values           ACS           9 or 12      8                2                           3                           0.01

[Figure 9: Comparison of time costs for QL, ACS, and QLACS (time cost versus number of runs for the three algorithms).]

[Figure 10: Comparison of route costs for QL and QLACS (route cost versus runs; series: path cost for R1 (QL), path cost for R2 (QL), and path cost for R1 and R2 (QLACS)).]

These trails have length 10, as shown in Table 16; that is, they contain the states from F to C and from C to F, where the length is the sum of the weights along the trail edges. QLACS then uses the optimized paths to make decisions on inspections. The result in Table 17 shows that no state is inspected twice or visited more than required; the QLACS model concentrates on making the best inspection decisions for the MRS. The first run in Table 17 shows that nine agents (ants) were used for the route finding, the optimized route was achieved after nine iterations under 7.0004 sec, and the states were all inspected effectively without redundancies.

[Figure 11: Number of ants/iterations required for QLACS to find the optimal path (ants per iteration versus runs).]

[Figure 12: Time and iterations required for QLACS to achieve optimal behaviour (iteration count and time versus number of runs).]

4.5.1. Comparative Study of the Proposed Model (QLACS) with QL Only and ACS Only. Based on the results tabulated in Table 18 and Figure 9, the average time costs of achieving the MRS behaviours with QL, ACS, and QLACS were compared. Two robots use an average of 30.8590 sec to achieve thorough inspection behaviour using QL, an average of 40.4309 sec to achieve optimal navigation behaviour using ACS, and an average of 9.2868 sec to achieve both navigation and cooperative behaviour using QLACS.


Table 16: Navigation behaviour computations.

Number of runs             1         2         3         4         5         6         7         8         9
Number of ants             5         7         8         9         10        11        12        15        28
Iterations                 12        26        14        16        23        14        16        23        22
Time (sec)                 37.0021   37.0021   39.0022   60.0035   32.0019   33.0019   35.002    43.0025   65.0037
Best trail distance (R1)   12        12        10        10        10        10        10        10        10
Best trail distance (R2)   10        12        12        10        10        12        10        10        10

Table 17: New hybrid model (QLACS) computations.

Number of runs          1            2            3            4              5              6              7           8           9
Iterations              9            9            9            9              9              9              9           9           9
Number of ants          9            9            9            10             10             10             12          12          12
Time (sec)              7.0004       8.0005       9.0005       11.0006        11.0006        12.0007        8.0005      9.0005      9.0006
Memory usage (bytes)    10288        10280        10248        10288          10288          10288          10164       10712       10164
Inspected states (R1)   F, G, E, D   F, G, E, D   F, G, E, D   F, G, E, A, B  F, G, E, A, B  F, G, E, A, B  F, G, D, A  F, G, D, A  F, G, D, A
Inspected states (R2)   C, B, A      C, B, A      C, B, A      C, D           C, D           C, D           C, B, E     C, B, E     C, B, E

Table 18: Time cost comparison for QL, ACS, and QLACS.

Runs      QL time cost (sec)   ACS time cost (sec)   QLACS time cost (sec)
1         33.0019              37.0021               7.0004
2         31.0019              37.0021               8.004
3         31.0018              39.0022               9.0005
4         31.0018              32.0019               9.0005
5         29.0023              35.002                11.0006
6         30.0017              43.0025               12.0007
7         31.0018              60.0035               9.0006
Average   30.8590              40.4309               9.2868

These results show that our proposed integrated algorithm performs better, with a reduced time cost. On the same note, the route costs for QL and QLACS were also compared; the results in Table 19 and Figure 10 show that the proposed QLACS model gives a much lower route cost than QL.

The numbers of ants and iterations used to achieve this are displayed in Figure 11: the more ants, the more iterations. The best routes created in the five test runs shown in Figure 11 are run numbers 1 and 3, which used fewer ants and iterations to achieve the optimal routes. The optimal result for the proposed model is achieved within nine iterations for every run, and the iteration count remains constant for any number of agents and robots. In Figure 12, the blue line with star markers gives the iteration value for the nine runs, while the red line shows the time required for each run; the time is also almost stable for QLACS, fluctuating only slightly in the middle runs. This signifies that the proposed model is robust and effective in finding the optimal route and coordinating the MRS. The reduced route cost and shorter computation time achieved with QLACS satisfy the criteria for cooperative behavioural purposes.

4.6. Experiment 5: Scalability of QLACS Using Two, Three, and Four Robots. In this section we present results obtained by experimenting with the cooperative behavioural action of two, three, and four robots using the QLACS model. The performance with two, three, and four robots was evaluated by running the simulation three times for each number of robots. The proposed QLACS model shows good communication between the two, three, and four robots, as seen in the states inspected in the last four columns of Table 20, and an almost stable time is used in achieving the inspection for all the robots. The details of the simulation results are laid out in Table 20.

A notable observation emanating from Table 20 is that a larger mine area of inspection is needed, because robot R1 in rows 4 to 6 could inspect only one state, while robots R1 and R2 in rows 7 to 9 could inspect only one state each. This implies that the size of the field of inspection should be proportional to the number of robots to be deployed.

5. Concluding Remarks

This paper has shown how MRS behave cooperatively in underground terrains. The QL algorithm was investigated for its potential for good communication, while the ACS algorithm was explored for its good navigation properties. The strengths of the two algorithms were harnessed and optimized to form a hybrid model, QLACS, which addresses the behavioural situation in MRS effectively.

The analytical solution of the new model was explained in Algorithms 3 and 4 and Tables 7 and 10. The hybrid architecture for the model was described in Figure 7, and the experimental setup describing different navigation locations for two robots was shown in Figure 8.

The new hybrid model QLACS for cooperative behaviour of MRS in an underground terrain was proven able to find the optimal route and handle cooperation between two robots


Table 19: Route cost comparison for QL and QLACS.

Runs      QL path cost (R1)   QL path cost (R2)   QLACS path cost (R1)   QLACS path cost (R2)
1         20                  20                  10                     10
2         32                  19                  10                     10
3         28                  14                  10                     10
4         27                  27                  10                     10
5         39                  30                  10                     10
Average   29.5                22                  10                     10

Table 20: Summary of scalability performance of QLACS.

Row   Number of robots   Time (sec)   Number of states inspected: R1   R2   R3   R4
1     2                  10.0006      4                                3    -    -
2     2                  11.0007      4                                3    -    -
3     2                  8.0005       4                                3    -    -
4     3                  16.0009      1                                3    3    -
5     3                  17.001       1                                3    3    -
6     3                  12.0006      1                                3    3    -
7     4                  11.0006      1                                1    3    2
8     4                  14.0006      1                                1    3    2
9     4                  10.006       1                                1    3    2

effectively. The cooperative behavioural problem was narrowed down to two situations: navigational and cooperative behaviours. ACS was used to establish the shortest optimal route for the two robots, while QL was used to achieve effective cooperation decisions. The results achieved after integrating the two algorithms show a reduction in route costs and computation times, as shown in Figures 9, 10, 11, and 12. The comparative analysis between QL, ACS, and QLACS proved that the last-named is more robust and effective in achieving cooperative behaviour for MRS in the proposed domain.

The result of this work will be used for future simulation of MRS applications in hazardous environments. An indication of the expansion of this research area has conspicuously surfaced in both real-life implementations and model growth. The model used in this work can also be improved upon by increasing the state space.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the financial support of the Organisation for Women in Science for the Developing World (OWSDW), L'Oreal-UNESCO for Women in Science, and the resources made available by the University of Cape Town (UCT) and the University of South Africa (UNISA), South Africa.

References

[1] J. Leitner, "Multi-robot cooperation in space: a survey," in Proceedings of the Advanced Technologies for Enhanced Quality of Life (AT-EQUAL '09), vol. 37, pp. 144–151, IEEE Computer Society, Iasi, Romania, July 2009.

[2] G. Dudek, M. R. M. Jenkin, E. Milios, and D. Wilkes, "A taxonomy for multi-agent robotics," Autonomous Robots, vol. 3, no. 4, pp. 375–397, 1996.

[3] A. Chella, G. Lo Re, I. Macaluso, M. Ortolani, and D. Peri, "Chapter 1: A networking framework for multi-robot coordination," in Recent Advances in Multi Robot Systems, A. Lazinica, Ed., pp. 1–14, 2008.

[4] C. O. Yinka-Banjo, I. O. Osunmakinde, and A. Bagula, "Autonomous multi-robot behaviours for safety inspection under the constraints of underground mine terrains," Ubiquitous Computing and Communication Journal, vol. 7, no. 5, pp. 1316–1328, 2012.

[5] S. Yarkan, S. Guzelgoz, H. Arslan, and R. Murphy, "Underground mine communications: a survey," IEEE Communications Surveys & Tutorials, vol. 11, no. 3, pp. 125–142, 2009.

[6] M. Peasgood, J. McPhee, and C. Clark, Complete and Scalable Multi-Robot Planning in Tunnel Environments, DigitalCommons, 2006.

[7] F. E. Schneider, D. Wildermuth, and M. Moors, "Methods and experiments for hazardous area activities using multi-robot system," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3559–3564, May 2004.

[8] R. Zlot and A. Stentz, "Market-based multirobot coordination for complex tasks," International Journal of Robotics Research, vol. 25, no. 1, pp. 73–101, 2006.

[9] L. E. Parker, "Current research in multi-robot systems," in Proceedings of the 7th International Symposium on Artificial Life and Robotics (ISAROB '03), vol. 7, pp. 1–15, 2003.

[10] C. Thorp and H. Durrant-Whyte, "Field robots," in Proceedings of the 10th International Symposium on Robotics Research, vol. 6, pp. 329–340, 2003.

[11] S. R. Teleka, J. J. Green, S. Brink, J. Sheer, and K. Hlophe, "The automation of the 'Making Safe' process in South African hard-rock underground mines," International Journal of Engineering and Advanced Technology (IJEAT), vol. 1, pp. 1–7, April 2012.

[12] L. Cragg and H. Hu, "Application of reinforcement learning to a mobile robot in reaching recharging station operation," in Intelligent Production Machines and Systems, pp. 357–363, 2005.

[13] R. A. C. Bianchi, C. H. C. Ribeiro, and A. H. R. Costa, "On the relation between ant colony optimization and heuristically accelerated reinforcement learning," in Proceedings of the 1st International Workshop on Hybrid Control of Autonomous Systems, pp. 49–55, 2009.

[14] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, Oxford, UK, 1999.

[15] Y. Liu and K. M. Passino, Swarm Intelligence: Literature Overview, Department of Electrical Engineering, University of Ohio, 2000.

[16] L. Li, A. Martinoli, and Y. S. Abu-Mostafa, "Emergent specialization in swarm systems," in Intelligent Data Engineering and Automated Learning - IDEAL 2002, vol. 2412 of Lecture Notes in Computer Science, pp. 261–266, 2002.

[17] B. Englot and F. Hover, "Multi-goal feasible path planning using ant colony optimization," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 2255–2260, May 2011.

[18] Z. N. Naqiv, H. Matheru, and K. Chadha, "Review of ant colony optimization on vehicle routing problems and introduction to estimated-based ACO," in International Conference on Environment Science and Engineering (IPCBEE '11), vol. 8, pp. 161–166, 2011.

[19] M. Dorigo and T. Stutzle, "The ant colony optimization metaheuristic: algorithms, applications, and advances," Tech. Rep. IRIDIA/2000-32, 2000.

[20] R. Claes and T. Holvoet, "Cooperative ant colony optimization in traffic route calculations," in Advances on Practical Applications of Agents and Multi-Agent Systems, vol. 155 of Advances in Intelligent and Soft Computing, pp. 23–34, 2012.


Page 11: Cooperative Behaviours with Swarm Intelligence in Multirobot

Mathematical Problems in Engineering 11

Example I of QLACS (Using Figure 5 for optimized route finding)Parameters used in the first component of QLACS119876 = 20 120572 = 3 120573 = 2 120588 = 001

Starting State 119865Terminating State 119862119865Terminating condition When all states have been visited at least once and it is at terminating stateState Space = 119865 119866 119864119863 119860119861 119862Initialize pheromones positions to 001Rand = (0 1) returns a random value between 0 and 1Equations(i) Computation of attractiveness 120578

119894119895using (2)

(ii) Computation of instantaneous pheromones by ant 119896 for all potential states using (3)(iii) Pheromone update using (4)(iv) Computation of probabilities for the potential states using (5)(v) Adaptation of roulette Wheel using (6)Equation (5) can be reduced to Let 119908(119894 119895) = [120591

119894119895]120572

[120578119894119895]120573

Sum =119873+

sum119894=1

119908 (119894 119895)

So 119875119903(119894 119895) = 119908

sum

Algorithm 3 Parameters for QLACS Example I

In our case there are eight states primarily seven internalstates and a common terminating state Since the states arenot just linear the environment allows for multiple optionsthat is an agentant can choose from any present state Thistype of scenario is also called a multigoal scenario

(ii) State Selection Module Here the agents select the startand end states which according to our model in Figure 4 are119865 and 119862 These states are selected based on the cumulativeprobability of two adjacent nodes in (6)

(iii) Transition Module This module takes care of the tran-sition rules of the agents by calculating the pheromoneconcentrations distances and probabilities using (2) through(4)

(iv)UpdateModuleAfter transition fromone state to anotheran update of the pheromone is computed after whichmultipath planning with shortest path is achieved

(v) Convergence Decision Module This is where the besttrailpath decision is takenThis is the end product of the firstcomponent of QLACS which is then moved to the secondcomponent of QLACS for cooperative behaviour

(vi) Cooperative Inspection Module This is where the robotsare deployed to start inspection The robots are to use theacquired best paths starting from 119865 and 119862 respectively asinput for the second component of QLACS The two robotsare to use the learning rate from (9) to learn the environmentand use (10) for cooperation

(vii) State Broadcasting Module This module handles thebroadcasting behaviours of the two robots which areachieved by using (10) Each robot checks its memoryrepresented by 119876-values before taking any decision

(viii) State Action Module State broadcasting by each robot isimmediately followed by action selection In other words thestate to inspect or ignore is achieved here using (11)

(ix) QLACS Update Module After each action selection the119876-values of each robot are updated using (10)

(x) New State Module This module takes care of the robotrsquosnew state after updating the 119876-values Each robot runs in itsown threads managing its memory yet sharing information

(xi) Final Decision ModuleThis decision module determinesif the robot should exit the environment or still do moreinspections It is also controlled by (11)

(xii) Goal State Module The completion of the second com-ponent of QLACS is getting to the goal state after successfulinspection of statesThis goal state according to our model inFigure 4 is called119867

333 Analytical Scenario For the benefit of robotic andsystem engineers practitioners and researchers this paperuniquely presents scenarios on the ease of implementation ofthe proposed QLACS as described in Algorithms 3 and 4 andTables 7 and 10 The first component of QLACS is explainedin Example I shown in Algorithm 3 and Table 7 The firstcomponent that achieved optimal route paths for the robotshas the parameters listed in Algorithm 3 The calculation fordifferent states transition for the first component of QLACSis analysed in Table 7

Repeat steps 1ndash6 for subsequent current states until thetermination condition and state are reached At the end ofseven updates we have Tables 8 and 9 The total lengthshown in Table 8 represents the total number of obstaclesencountered by each agent while trailing to achieve theoptimal route for the robots The number of obstacle shown

12 Mathematical Problems in Engineering

Table7QLA

CSEx

ampleI

navigatio

nalanalytic

alsolutio

n

Startin

gsta

te119865

Potentialstates119863119864

119866(1)U

se(2)120578119865119863=1120578119865119864=1120578119865119866=1

(2)U

se(3)119871119896=1Δ120591119896(119865119863

)=21

=2Δ

120591119896(119865119864)=2Δ

120591119896(119865119866

)=2

(3)U

se(4)120591 119865119863=(1minus001)lowast001+2=20099=201120591119865119864=201120591119865119866=201

Firstp

heromon

eupd

ate

119865119866

119864

001

201

201

119863119860

119861

201

001

001

119862

001

(4)U

se(5)119908

(119865119863

)=[120591119865119863]120572[120578119865119863]120573=(201)3=812119908

(119865119864)=812119908

(119865119866

)=812

Sum

=119908(119865119863

)+119908(119865119864)+119908(119865119866

)=2436

119875119903(119865119863

)=

119908 sum

=812

2436=033119875119903(119865119864)=033119875119903(119865119866

)=033

Prob

abilitie

s119865

119866119864

0033

033

119863119860

119861

033

00

119862 0

(5)U

se(6)

119867 0

119865119866

119864

0033

033

119863119860

119861

033

00

119862 0

CallRa

ndRa

nd=007R

andfalls

instate119864

Rou

lette

wheelselects119864

asthen

extstateend

ofroulettewheel

(6)U

pdatetrail119865119864

Currentstate119864

Potentialstates119861119863

119866(1)U

se(2)120578119864119861=12120578119864119863=12120578119864119866=1

(2)U

se(3)119871119896=(119865

minus119864minus119863)=1+2=3119871119896=(119865

minus119864minus119861)=1+2=3119871119896=(119865

minus119864minus119866)=1+2=3

Δ120591119896=2 3=067

(3)U

se(4)120591 119864119861=(1minus001)lowast201+067=266120591119864119863=266120591119864119866=266

Second

pherom

oneu

pdate

119865119866

119864

001

266

201

119863119860

119861

266

001

266

119862

001

(4)U

se(5)119908

(119864119861)=(266)3lowast(12)2=471119908

(119864119863

)=471119908

(119864119866

)=(266)3lowast(1)2=1882

Sum

=471+471+1882=2824

119875119903(119864119861)=

119908 sum

=471

2824=017119875119903(119864119863

)=

471

2824=017119875119903(119864119866)=1882

2824=067

Prob

abilitie

s119865

119866119864

0067

0

119863119860

119861

017

0017

119862 0

(5)U

se(6)

119867 0

119865119866

119864

0067

0

119863119860

119861

017

0017

119862 0

CallRa

ndRa

nd=09Ra

ndfalls

instate119863

Rou

lette

wheelselects119863

asthen

extstateend

ofroulettewheel

(6)U

pdatetrail119865119864

119863

Mathematical Problems in Engineering 13

Example II of QLACS (For good cooperation and communication between robots)Parameters used in the second component of QLACS (Using output from the first component of QLACS)Reward Scheme Inspect = 50 Ignore = 100 Shutdown = 150State space Optimized path from QLACS R1 = 119865 119864119863 119860 119861 119862 119866 119862 and QLACS R2 = 119862 119861 119860119863 119865 119864 119866 119862

Starting State 119865119862119878119895= Terminating State 119862 then119867

Terminating condition When all states have been visitedInitialize 119876value positions to zerosEquations(i) Compute learning rate using (9)(ii) Compute update on 119876(119904 119886) using (10)

Algorithm 4 Parameters for QLACS Example II

Table 8 Pheromone update of a full cycle

Pheromone updateCurrent states 119865 119866 119864 119863 119860 119861 119862

119865 001 201 201 201 001 001 001 1st update119864 001 266 201 266 001 266 001 2nd update119863 313 001 303 001 303 001 001 3rd update119860 001 001 001 045 001 045 001 4th update119861 001 001 067 001 067 001 07 5th update119862 001 091 001 001 001 089 001 6th update119866 071 001 069 001 001 001 071 7th update119862 = terminating state and ant 119896 has moved through all states at least onceTrail 119865119864119863119860119861119862119866119862Number of obstacles = 1 + 2 + 2 + 2 + 1 + 1 + 1 + 1 = 10

Table 9 Probability update of a full cycle

Probability tableCurrent states 119865 119866 119864 119863 119860 119861 119862

119865 0 033 033 033 0 0 0119864 0 067 0 017 0 017 0119863 069 0 016 0 016 0 0119860 0 0 0 05 0 05 0119861 0 0 016 0 016 0 068119862 0 052 0 0 0 049 0119866 045 0 0 0 0 0 045119862 = terminating state

as 10 is calculated using the trail result in Table 8 Table 9displays the probability update table for a full cycle of routefinding In the case of this first example the first robot willterminate its inspection through 119862 and then119867

Algorithm 4 displays the parameters of the second exam-ple of QLACS The second component of QLACS handlesthis aspect of the model which is the communication andcooperative part Once the first component hands the outputto the second component it becomes the second componentinput and it runs with it From Algorithm 4 QLACS R1represents the input for robot 1 and QLACS R2 representsthe input for robot 2 received from the first component of

QLACS The terminating condition is when all states havebeen visited

The rest of Example II displayed in Table 10 explains thecooperative behavioural scenario from one state to anotherfor two robots The tables in the last row of Table 10 show thecommunication and cooperative output achieved using thesecond component of QLACS

4 Experimental Evaluations SafetyInspections for MRS Behaviours

The experimental results of implementing QLACS in anenvironment that consists of obstacles and links are tabu-lated in this section The experimental setup is explainedin Section 41 Different performance categories are shownin this section without communication category and withcommunication category In the without communicationcategory as displayed in Section 42 we found that robotscan inspect all states individually without knowing thatanother robot exists Robots can also inspect some of thestates thereby leaving some states not inspected The com-munication category is explained in Section 43 while theperformance of the QLACS measured with other existingmethods is tabulated in Section 44

41 Experimental Setup Figure 8 displays different sets ofexperiments conducted in the environment using two robotsFigure 8(a) shows how two robots resume inspection fromtwo different entrances In each set of experiments the robotstake the readings of the roof cracks and level of toxic gasesusing their sensors The same behaviours happen in Figures8(b) and 8(c) respectively at different inspection locationsThe inspection locations vary in the four experimental setupsshown in Figure 8

42 Experiment 1 Performance of QLACS without Coop-eration The result of implementing the QLACS withoutcommunication in the proposed environment (Figures 1 and4) is shown in Tables 11 and 12 In the case of Table 11 robot1 (R1) enters the mine through state 119865 while robot 2 (R2)enters the mine through state 119862 However each robot learnsby inspecting some of the states and ignoring some of the

14 Mathematical Problems in Engineering

Table 10 QLACS Example II cooperative behaviour

Starting simulation Robot 1Starting state F(1) Use (9)120574 (119865) =

05

[1 + 1]=05

2= 025

(2) Check the 119876-value for state 119865 (use (11))119876 (119865 119886) = 0

Selected action a = inspect. (3) Use (10): Q(F, a) = 50 + 0.25(0) = 50.
End of value iteration.

Current state E, Robot 1.
(1) Use (9): γ(E) = 0.5/[1 + 1] = 0.5/2 = 0.25.
(2) Check the Q-value for state E (use (11)): Q(E, a) = 0. Selected action a = inspect.
(3) Use (10): Q(E, a) = 50 + 0.25(max(0, 0, 0)) = 50.
End of value iteration.

Current state C, Robot 2.
(1) Use (9): γ(C) = 0.5/[1 + 1] = 0.5/2 = 0.25.
(2) Check the Q-value for state C (use (11)): Q(C, a) = 0. Selected action a = inspect.
(3) Use (10): Q(C, a) = 50 + 0.25(max(0, 0, 0)) = 50.
End of value iteration.

Current state B, Robot 2.
(1) Use (9): γ(B) = 0.5/[1 + 1] = 0.5/2 = 0.25.
(2) Check the Q-value for state B (use (11)): Q(B, a) = 0. Selected action a = inspect.
(3) Use (10): Q(B, a) = 50 + 0.25(max(0, 0, 0)) = 50.
End of value iteration.

Current state D, Robot 1.
(1) Use (9): γ(D) = 0.5/[1 + 1] = 0.5/2 = 0.25.
(2) Broadcast (use (11)): Q(D, a) = 0. Selected action a = inspect.
(3) Use (10): Q(D, a) = 50 + 0.25(max(0, 0, 0)) = 50.
End of value iteration.

Current state A, Robot 2.
(1) Use (9): γ(A) = 0.5/[1 + 1] = 0.5/2 = 0.25.
(2) Broadcast (use (11)): Q(A, a) = 0. Selected action a = inspect.
(3) Use (10): Q(A, a) = 50 + 0.25(max(0, 0, 0)) = 50.
End of value iteration.

Current state A, Robot 1.
(1) Use (9): γ(A) = 0.5/[1 + 0.25] = 0.5/1.25 = 0.4.
(2) Broadcast (use (11)): Q(A, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(A, a) = 100 + 0.4(max(0, 0, 0)) = 100.
End of value iteration.

Current state D, Robot 2.
(1) Use (9): γ(D) = 0.5/[1 + 0.25] = 0.5/1.25 = 0.4.
(2) Broadcast (use (11)): Q(D, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(D, a) = 100 + 0.4(max(0, 0, 0)) = 100.
End of value iteration.

Current state B, Robot 1.
(1) Use (9): γ(B) = 0.5/[1 + 0.25] = 0.5/1.25 = 0.4.
(2) Broadcast (use (11)): Q(B, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(B, a) = 100 + 0.4(max(0, 0, 0)) = 100.
End of value iteration.

Current state F, Robot 2.
(1) Use (9): γ(F) = 0.5/[1 + 0.25] = 0.5/1.25 = 0.4.
(2) Broadcast (use (11)): Q(F, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(F, a) = 100 + 0.4(max(0, 0, 0)) = 100.
End of value iteration.

Current state C, Robot 1.
(1) Use (9): γ(C) = 0.5/[1 + 0.25] = 0.5/1.25 = 0.4.
(2) Broadcast (use (11)): Q(C, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(C, a) = 100 + 0.4(max(0, 0, 0)) = 100.
End of value iteration.

Current state E, Robot 2.
(1) Use (9): γ(E) = 0.5/[1 + 0.25] = 0.5/1.25 = 0.4.
(2) Broadcast (use (11)): Q(E, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(E, a) = 100 + 0.4(max(0, 0, 0)) = 100.
End of value iteration.

Current state G, Robot 1.
(1) Use (9): γ(G) = 0.5/[1 + 1] = 0.5/2 = 0.25.
(2) Broadcast (use (11)): Q(G, a) = 0. Selected action a = inspect.
(3) Use (10): Q(G, a) = 50 + 0.25(max(0, 0, 0)) = 50.
End of value iteration.

Current state G, Robot 2.
(1) Use (9): γ(G) = 0.5/[1 + 0.25] = 0.5/1.25 = 0.4.
(2) Broadcast (use (11)): Q(G, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(G, a) = 100 + 0.4(max(50, 50, 50)) = 100 + 20 = 120.
End of value iteration.

Current state C, Robot 1.
(1) Use (9): γ(C) = 0.5/[1 + 0.4] = 0.5/1.4 = 0.36.
(2) Broadcast (use (11)): Q(C, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(C, a) = 100 + 0.36(max(50, 50)) = 100 + 18 = 118.
End of value iteration.

Current state C, Robot 2.
(1) Use (9): γ(C) = 0.5/[1 + 0.36] = 0.5/1.36 = 0.37.
(2) Broadcast (use (11)): Q(C, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(C, a) = 100 + 0.37(max(50, 50)) = 100 + 0.37 × 50 = 118.5.
End of value iteration.

All QLACS 1 states exhausted. Goal state H, Robot 1.
(1) Use (9): γ(H) = 0.5/[1 + 1] = 0.5/2 = 0.25.
(2) Broadcast (use (11)): Q(H, a) = 0. Selected action a = shutdown.
(3) Use (10): Q(H, a) = 150 + 0.25(max(0, 0)) = 150.
End of value iteration.

All QLACS 2 states exhausted. Goal state H, Robot 2.
(1) Use (9): γ(H) = 0.5/[1 + 1] = 0.5/2 = 0.25.
(2) Broadcast (use (11)): Q(H, a) = 0. Selected action a = shutdown.
(3) Use (10): Q(H, a) = 150 + 0.25(max(0, 0)) = 150.
End of value iteration.

Table 10: Continued.

End of policy iteration.

Robot 1:
State   Inspect   Ignore   Shutdown
F       Yes       No       No
E       Yes       No       No
D       Yes       No       No
A       No        Yes      No
B       No        Yes      No
C       No        Yes      No
G       Yes       No       No
C       No        Yes      Yes, through H

Robot 2:
State   Inspect   Ignore   Shutdown
C       Yes       No       No
B       Yes       No       No
A       Yes       No       No
D       No        Yes      No
F       No        Yes      No
E       No        Yes      No
G       No        Yes      No
C       No        Yes      Yes, through H
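The arithmetic traced in Table 10 can be checked with a few lines of code. The sketch below is only illustrative: the helper names and the simplified learning-rate bookkeeping are assumptions rather than the authors' implementation, but the reward scheme (inspect = 50, ignore = 100, shutdown = 150) and the update Q(s, a) = R(a) + γ(s) · max Q(s', a') follow the worked example and Algorithm 4.

# Python sketch of the QLACS value update used in Table 10 (illustrative only).
REWARD = {"inspect": 50.0, "ignore": 100.0, "shutdown": 150.0}

def learning_rate(step):
    # Equation (9) as traced above: gamma = 0.5 / (1 + previous gamma),
    # giving 0.25, then 0.4, 0.36, 0.37, ...
    gamma = 1.0
    for _ in range(step):
        gamma = 0.5 / (1.0 + gamma)
    return gamma

def q_update(q_table, state, action, next_q_values, gamma):
    # Equation (10): Q(s, a) = R(a) + gamma(s) * max over the next states' Q-values.
    q_table[(state, action)] = REWARD[action] + gamma * max(next_q_values, default=0.0)
    return q_table[(state, action)]

q = {}
print(q_update(q, "F", "inspect", [0.0, 0.0, 0.0], learning_rate(1)))    # 50.0, as in the first cell
print(q_update(q, "A", "ignore", [0.0, 0.0, 0.0], learning_rate(2)))     # 100.0, after the broadcast shows Q(A) = 50
print(q_update(q, "G", "ignore", [50.0, 50.0, 50.0], learning_rate(2)))  # 120.0, as for Robot 2 at G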

states. Since there is no communication, they exit the mine after learning, consequently not inspecting all the states. The same routine is reflected in Table 12, but in this case each robot ends up inspecting all the states before exiting the mine. Analysis of Tables 11 and 12 shows that resources are wasted and the inspection job is neither effective nor efficient. Comparing the two tables, the time and distance costs are high, though higher in Table 12 because of the many repetitions. For instance, the time and distance costs in Table 12, column 2, are 48.0028 sec and ((F, G, E, D, A, B, C), (C, B, A, D, E, G, F)), respectively. It also shows that the states are repeated in Tables 11 and 12. The memory usage is equally high. This led us to create a more efficient QLACS with communication by introducing some heuristics. The processes involved in Tables 11 and 12 are explained in Table 13.

One can see from Tables 11 and 12 that there is no good communication among the robots; hence the evolution of Table 14.

4.3. Experiment 2: Performance of QLACS with Good Cooperation. The heuristic added to the known QL made this experiment show good communication; this is where our contribution to communication is shown. As a robot inspects and learns the environment, it broadcasts its Q-values to the other robot in the form of a lookup table. Each robot then checks for Q-values: when a Q-value is equal to zero, the robot randomly selects a state for inspection; if a Q-value is greater than zero, the robot checks whether the state is available for inspection or should be ignored. When a robot encounters a state with a Q-value equal to zero and the thread next to the state is equal to the goal state (H), then it shuts down, having checked the lookup table to see that all states have been inspected. The result in Table 14 shows good communication between the two robots: no states were inspected more than once. The iterations for every run, with their times, memory usage, and effective communication, are also displayed in Table 14.
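A minimal sketch of this decision rule is given below, assuming a shared lookup table of Q-values; the function names and the neighbour-selection helper are illustrative assumptions, not the exact implementation.

# Python sketch of the broadcast heuristic described above (illustrative only).
import random

def choose_action(state, shared_q, goal_state="H"):
    # A zero Q-value means no robot has inspected the state yet.
    if state == goal_state:
        return "shutdown"              # lookup table shows every state accounted for
    if shared_q.get(state, 0.0) == 0.0:
        return "inspect"
    return "ignore"                    # another robot already inspected this state

def pick_next_state(shared_q, reachable_states):
    # Prefer a state nobody has inspected; otherwise pass through an inspected one.
    unvisited = [s for s in reachable_states if shared_q.get(s, 0.0) == 0.0]
    return random.choice(unvisited if unvisited else reachable_states)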

Comparing Table 14 with Tables 11 and 12, one cannot but notice the huge differences in the memory, time, and distance costs. The communication between the robots in Table 14 resulted in achieving good inspection; however, the random method of choosing the next inspection state did not give an optimized route, thereby increasing the time cost spent passing through and checking already inspected states.

Figure 8: Different experimental behaviours for two robots: (a) two robots starting inspection from two different entrances; (b) two robots inspecting as they navigate in the environment; (c) two robots inspecting different locations with different positions.

4.4. Experiment 3: Performance of QLACS for the Navigation Behaviour of the Proposed Model. The result of implementing the first component of QLACS for effective navigation of the proposed model is tabulated in Table 16. Table 15 shows


Table 11: QLACS without communication (inspecting some states).

Number of runs:         1        2        3        4        5        6        7
Iterations:             9        10       10       13       13       9        13
Time (sec):             43.0025  30.0017  34.002   30.0017  31.0017  31.0018  27.0016
Memory usage (bytes):   18872    18160    19204    18208    17896    18164    18308
Inspected states (R1):  G | F, G, C | C | F, D, B, E, A | F, E, D, A | C | F, G, E, A
Inspected states (R2):  B, A, G, C | C, B, A | E, F | B, A | C, B, E | B, A, E | A, G

Table 12: QLACS without communication (inspecting all states).

Number of runs:         1        2        3        4        5        6        7
Iterations:             114      10       70       10       9        10       10
Time (sec):             43.0024  48.0028  41.0023  34.0019  34.0019  34.0019  35.002
Memory usage (bytes):   28440    17960    27216    17000    18456    17672    17968
Inspected states (R1):  F, G, E, D, A, B, C (for every run)
Inspected states (R2):  C, B, A, D, E, G, F (for every run)

Table 13: Processes used in achieving Tables 11 and 12.

(a) For Table 11 (inspecting some states):
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zeroes.
(3) Select a random action if the Q-values of all possible actions at the current state are zeroes.
(4) Select the action with the highest Q-value if it is the maximum Q-value.
(5) Compute and update the Q-value of the selected action.
(6) Get a new state among the possible states.
(7) If the new state = goal state, then go to Steps 5 and 9.
(8) Repeat Steps 3 to 6 until Step 7.
(9) Shutdown.

(b) For Table 12 (inspecting all states):
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zeroes.
(3) Select a random action if the Q-values of all possible actions at the current state are zeroes.
(4) Select the action with the highest Q-value if it is the maximum Q-value.
(5) Compute and update the Q-value of the selected action.
(6) Get a new state among the possible states.
(7) If the new state = goal state, then go to Steps 5 and 10.
(8) Repeat Steps 3 to 7 until Step 9.
(9) All states except the goal state have taken Action = Inspect.
(10) Shutdown.
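For concreteness, the single-robot routine of Table 13 can be sketched as follows; depending on the random choices it may cover only some states within the step budget (as in Table 11) or all of them (as in Table 12). The adjacency list, helper names, and the placeholder Q-value update are assumptions made only to keep the example self-contained; they are not the authors' exact environment or update rule.

# Python sketch of the single-robot routine in Table 13 (illustrative only).
import random

def single_robot_episode(neighbours, start, goal, max_steps=500):
    q = {s: 0.0 for s in neighbours}              # Step 2: all Q-values start at zero
    inspected, state = set(), start               # Step 1: starting and goal states fixed
    for _ in range(max_steps):
        if state != goal:
            inspected.add(state)                  # non-goal states take Action = Inspect
        options = neighbours[state]
        if all(q[s] == 0.0 for s in options):     # Step 3: random action while all Q-values are zero
            nxt = random.choice(options)
        else:                                     # Step 4: otherwise follow the highest Q-value
            nxt = max(options, key=lambda s: q[s])
        q[nxt] += 1.0                             # Step 5: placeholder update of the selected action's value
        state = nxt
        if state == goal and len(inspected) == len(neighbours) - 1:
            break                                 # Steps 9 and 10: shut down once every state is inspected
    return inspected

# Hypothetical adjacency consistent with the trails F-E-D-A-B-C-G-C and C-B-A-D-F-E-G-C.
graph = {"A": ["B", "D"], "B": ["A", "C", "E"], "C": ["B", "G"], "D": ["A", "E", "F"],
         "E": ["B", "D", "F", "G"], "F": ["D", "E", "G"], "G": ["C", "E", "F", "H"], "H": ["G"]}
print(sorted(single_robot_episode(graph, "F", "H")))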

the selected parameters used in achieving the optimal path. The optimal path found after nine test runs is the path with a distance cost of 10 for both entries to the environment, displayed in Table 16. Therefore, columns 4, 5, 7, 8, and 9 can be used as the optimized path input for the first component of QLACS. QLACS will then use any of the optimized paths to navigate to the specified states and take decisions accordingly. For instance, the test run with nine ants gives 16 iterations in 60.0035 seconds and yields the shortest paths for both robots entering the environment at F and C. The paths that give a cost of 10 are F-E-D-A-B-C-G-C (edge weights 1, 2, 2, 2, 1, 1, 1) and C-B-A-D-F-E-G-C (edge weights 1, 2, 2, 1, 1, 2, 1), respectively.
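As a quick check, the distance cost of a trail is simply the sum of its edge weights; for robot 1's optimal trail the weights quoted above give:

# Edge weights of the optimal trail F-E-D-A-B-C-G-C quoted above.
trail_weights = [1, 2, 2, 2, 1, 1, 1]
print(sum(trail_weights))   # 10, the best trail distance reported in Table 16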

Alpha (α), Beta (β), and Rho (ρ) represent the heuristic properties of the ACS algorithm. The Alpha factor measures the influence of pheromone concentration, which can bias selection towards the path with the highest pheromone concentration. Beta measures the influence related to the distance between any two adjacent nodes; it is a heuristic factor that weighs distance when selecting the next state and is not limited to pheromones. Rho is the rate at which the pheromones evaporate; it governs how often new paths are explored by the agents (ants) rather than old paths being reinforced. Each agent cooperates by having access to the other agents' pheromone values. The pheromone value is initialized to 0.01 and is continually updated until learning stops. This is similar to the first component of QLACS, where we initialize all positions to zero and update them at every new step.
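The quantities described above can be written out as a short sketch. The function names are assumptions, but the formulas and constants mirror the worked example of Table 7 (edge weight w = τ^α · η^β with α = 3, β = 2) and the parameter values of Table 15 (ρ = 0.01, initial pheromone 0.01).

# Python sketch of the ACS selection and pheromone rules used in the first QLACS component.
ALPHA, BETA, RHO, TAU0 = 3.0, 2.0, 0.01, 0.01

def transition_probabilities(tau, eta, candidates):
    # Edge weight w = tau^alpha * eta^beta, normalised into selection probabilities;
    # e.g. w(F, D) = 2.01**3 * 1**2 = 8.12 and Pr(F, D) = 8.12 / 24.36 = 0.33 in Table 7.
    weights = {j: (tau[j] ** ALPHA) * (eta[j] ** BETA) for j in candidates}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def pheromone_update(tau_edge, tour_length, deposit=2.0):
    # tau <- (1 - rho) * tau + delta_tau, with delta_tau = deposit / L_k;
    # the first update in Table 7 gives (1 - 0.01) * 0.01 + 2/1 = 2.01.
    return (1.0 - RHO) * tau_edge + deposit / tour_length

print(round(pheromone_update(TAU0, 1.0), 2))   # 2.01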

Table 16 gives the optimized routes to be used by the robots to achieve the inspection tasks. This resulted in combining Tables 14 and 16 to achieve Table 17, which shows optimized time cost, memory usage, route cost, and good communication.

4.5. Experiment 4: Benchmarking the New Hybrid Model (QLACS) with Popular Methods. Choosing the optimized paths, QLACS performs cooperative inspection behaviour through communication among robots. Looking at the sample results in Table 17, QLACS chooses the shortest complete inspection routes from the different runs. In this case the best paths were obtained from Table 16 by using 9 agents, 10 agents, and 12 agents. All the test runs gave best trail paths from both entrances, listing the complete states


Table 14: QLACS with communication.

Number of runs:         1        2        3        4        5        6        7
Iterations:             9        10       9        13       10       10       9
Time (sec):             33.0019  31.0017  31.0018  31.0018  29.0023  30.0017  31.0018
Memory usage (bytes):   15748    15948    15748    18232    16576    15948    15748
Inspected states (R1):  F, G, E | F, G, E, D | F, G, E | F, E | F, G, E, D, B | F, G, E, D | F, G, E
Inspected states (R2):  C, B, A, D | C, B, A | C, B, A, D | F, G, E, D, A | C, A | C, B, A | C, B, A, D

Table 15: Navigation behaviour parameter specification.

ACO properties: Type of ACO: ACS; Population: 9 or 12; Length of path: 8; Pheromone coefficient (β): 2; Heuristic coefficient (α): 3; Evaporation rate (ρ): 0.01.

Figure 9: Comparison of time costs for QL, ACS, and QLACS (time cost versus number of runs).

Figure 10: Comparison of route costs for QL and QLACS (path cost for R1 and R2 versus runs).

with length 10, as shown in Table 16; that is, they have states from F to C and from C to F. The length is the sum of the weights along the trail edges. QLACS then uses the optimized paths to make decisions on inspections. The result in Table 17 shows that no state is inspected twice or visited more than required; the QLACS model concentrates on making the best inspection decisions for the MRS. The first run in Table 17 shows that nine agents (ants) were used for the route finding; the optimized route was achieved after nine iterations in 7.0004 sec, and the states were all inspected effectively without redundancies.

Figure 11: Number of ants/iterations required for QLACS to find the optimal path (versus runs).

Figure 12: Time and iterations required for QLACS to achieve optimal behaviour (versus number of runs).

4.5.1. Comparative Study of the Proposed Model (QLACS) with QL Only and ACS Only. Based on the results tabulated in Table 18 and Figure 9, the average time costs of achieving the MRS behaviours with QL, ACS, and QLACS were compared. Two robots will use an average of 30.8590 sec to achieve thorough inspection behaviour using QL, an average of 40.4309 sec to achieve optimal navigation behaviour using ACS, and an average of 9.2868 sec to achieve both navigation and cooperative behaviour using QLACS. The result shows


Table 16: Navigation behaviour computations.

Number of runs:            1        2        3        4        5        6        7        8        9
Number of ants:            5        7        8        9        10       11       12       15       28
Iterations:                12       26       14       16       23       14       16       23       22
Time (sec):                37.0021  37.0021  39.0022  60.0035  32.0019  33.0019  35.002   43.0025  65.0037
Best trail distance (R1):  12       12       10       10       10       10       10       10       10
Best trail distance (R2):  10       12       12       10       10       12       10       10       10

Table 17: New hybrid model QLACS computations.

Number of runs:         1        2        3        4        5        6        7        8        9
Iterations:             9        9        9        9        9        9        9        9        9
Number of ants:         9        9        9        10       10       10       12       12       12
Time (sec):             7.0004   8.0005   9.0005   11.0006  11.0006  12.0007  8.0005   9.0005   9.0006
Memory usage (bytes):   10288    10280    10248    10288    10288    10288    10164    10712    10164
Inspected states (R1):  F, G, E, D | F, G, E, D | F, G, E, D | F, G, E, A, B | F, G, E, A, B | F, G, E, A, B | F, G, D, A | F, G, D, A | F, G, D, A
Inspected states (R2):  C, B, A | C, B, A | C, B, A | C, D | C, D | C, D | C, B, E | C, B, E | C, B, E

Table 18: Time cost comparison for QL, ACS, and QLACS.

Runs      QL time cost (sec)   ACS time cost (sec)   QLACS time cost (sec)
1         33.0019              37.0021               7.0004
2         31.0019              37.0021               8.0043
3         31.0018              39.0022               9.0005
4         31.0018              32.0019               9.0005
5         29.0023              35.002                11.0006
6         30.0017              43.0025               12.0007
7         31.0018              60.0035               9.0006
Average   30.8590              40.4309               9.2868

that our proposed integrated algorithm performs better, with a reduced time cost. On the same note, the route costs for QL and QLACS were also compared. The results in Table 19 and Figure 10 show that the proposed QLACS model gave a much lower route cost than QL.

The number of ants and iterations used to achieve this is displayed in Figure 11. The more ants there are, the more iterations are needed. The best routes created in the five test runs shown in Figure 11 are run numbers 1 and 3; they used fewer ants and iterations to achieve the optimal routes. The optimal result for the proposed model is achieved in nine iterations for every run, and the iteration count remains constant for any number of agents and robots. The blue line with star markers in Figure 12 is the iteration value for the nine runs; the red line shows the amount of time required for each run. The time is also almost stable for QLACS, though it fluctuated a little in the middle. This signifies that the proposed model is more robust and effective in finding the optimal route and coordinating the MRS. The reduced route cost and shorter computation time achieved with QLACS satisfied the criteria for cooperative behavioural purposes.
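As a quick arithmetic check, the averages quoted in this subsection can be recomputed from the per-run values of Table 18 (transcribed here with their decimal points restored):

# Re-deriving the Table 18 averages for the QL, ACS, and QLACS time costs.
ql    = [33.0019, 31.0019, 31.0018, 31.0018, 29.0023, 30.0017, 31.0018]
acs   = [37.0021, 37.0021, 39.0022, 32.0019, 35.0020, 43.0025, 60.0035]
qlacs = [7.0004, 8.0043, 9.0005, 9.0005, 11.0006, 12.0007, 9.0006]
for name, times in (("QL", ql), ("ACS", acs), ("QLACS", qlacs)):
    print(name, round(sum(times) / len(times), 4))   # approx. 30.859, 40.4309, 9.2868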

4.6. Experiment 5: Scalability of QLACS Using Two, Three, and Four Robots. In this section we present some results obtained by experimenting with the cooperative behavioural action of two, three, and four robots using the QLACS model. The performance of the two, three, and four robots was evaluated by running the simulation three times with the same number of agents. The proposed QLACS model shows good communication between the two, three, and four robots, as seen from the states inspected in the last four columns of Table 20. An almost stable time was used in achieving the inspection for all the robots. The details of the simulation results are laid out in Table 20.

A notable observation emanating from Table 20 is that a larger mine area of inspection is needed, because robot R1 in rows 4 to 6 could inspect only one state, while robots R1 and R2 in rows 7 to 9 could inspect only one state each. This implies that the size of the field of inspection should be proportional to the number of robots deployed.

5. Concluding Remarks

This paper has shown how MRS behave cooperatively in underground terrains. The QL algorithm has been investigated for its potential quality of good communication, while the ACS algorithm was explored for its good navigation properties. The strengths of the two algorithms have been harnessed and optimized to form a hybrid model, QLACS, which addresses the behavioural situation in MRS effectively.

The analytical solution of the new model was explained as shown in Algorithms 3 and 4 and Tables 7 and 10. The hybrid architecture for the model was also described in Figure 7, and the experimental setup describing different navigation locations for two robots was shown in Figure 8.

The new hybrid model QLACS for cooperative behaviour of MRS in an underground terrain was proven able to find the optimal route and handle cooperation between two robots


Table 19: Route cost comparison for QL and QLACS.

          QL                                      QLACS
Runs      Path cost for R1   Path cost for R2     Path cost for R1   Path cost for R2
1         20                 20                   10                 10
2         32                 19                   10                 10
3         28                 14                   10                 10
4         27                 27                   10                 10
5         39                 30                   10                 10
Average   29.5               22                   10                 10

Table 20: Summary of scalability performance of QLACS.

                                        Number of states inspected
Row   Number of robots   Time (sec)     Robot 1 (R1)   Robot 2 (R2)   Robot 3 (R3)   Robot 4 (R4)
1     2                  10.0006        4              3              -              -
2     2                  11.0007        4              3              -              -
3     2                  8.0005         4              3              -              -
4     3                  16.0009        1              3              3              -
5     3                  17.001         1              3              3              -
6     3                  12.0006        1              3              3              -
7     4                  11.0006        1              1              3              2
8     4                  14.0006        1              1              3              2
9     4                  10.006         1              1              3              2

effectively. The cooperative behavioural problem was narrowed down to two situations: navigational and cooperative behaviours. ACS was used to establish the shortest optimal route for the two robots, while QL was used to achieve effective cooperation decisions. The results achieved after integrating the two algorithms showed a reduction in route costs and computation times, as shown in Figures 9, 10, 11, and 12. The comparative analysis between QL, ACS, and QLACS proved that the last-named is more robust and effective in achieving cooperative behaviour for MRS in the proposed domain.

The result of this work will be used for future simulation of MRS applications in hazardous environments. An indication of the expansion of this research area has conspicuously surfaced in both real-life implementations and model growth. The model used in this work can also be improved upon by increasing the state space.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the financial support of the Organisation for Women in Science for the Developing World (OWSDW), L'Oréal-UNESCO for Women in Science, and the resources made available by the University of Cape Town (UCT) and the University of South Africa (UNISA), South Africa.

References

[1] J. Leitner, "Multi-robot cooperation in space: a survey," in Proceedings of the Advanced Technologies for Enhanced Quality of Life (AT-EQUAL '09), vol. 37, pp. 144–151, IEEE Computer Society, Iasi, Romania, July 2009.

[2] G. Dudek, M. R. M. Jenkin, E. Milios, and D. Wilkes, "A taxonomy for multi-agent robotics," Autonomous Robots, vol. 3, no. 4, pp. 375–397, 1996.

[3] A. Chella, G. Lo Re, I. Macaluso, M. Ortolani, and D. Peri, "Chapter 1: A networking framework for multi-robot coordination," in Recent Advances in Multi Robot Systems, A. Lazinica, Ed., pp. 1–14, 2008.

[4] C. Yinka-Banjo, I. O. Osunmakinde, and A. Bagula, "Autonomous multi-robot behaviours for safety inspection under the constraints of underground mine terrains," Ubiquitous Computing and Communication Journal, vol. 7, no. 5, pp. 1316–1328, 2012.

[5] S. Yarkan, S. Guzelgoz, H. Arslan, and R. Murphy, "Underground mine communications: a survey," IEEE Communications Surveys & Tutorials, vol. 11, no. 3, pp. 125–142, 2009.

[6] M. Peasgood, J. McPhee, and C. Clark, Complete and Scalable Multi-Robot Planning in Tunnel Environments, Digitalcommons, 2006.

[7] F. E. Schneider, D. Wildermuth, and M. Moors, "Methods and experiments for hazardous area activities using multi-robot system," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3559–3564, May 2004.

[8] R. Zlot and A. Stentz, "Market-based multirobot coordination for complex tasks," International Journal of Robotics Research, vol. 25, no. 1, pp. 73–101, 2006.

[9] L. E. Parker, "Current research in multi-robot systems," in Proceedings of the 7th International Symposium on Artificial Life and Robotics (ISAROB '03), vol. 7, pp. 1–15, 2003.

[10] C. Thorp and H. Durrant-Whyte, "Field robots," in Proceedings of the 10th International Symposium on Robotics Research, vol. 6, pp. 329–3240, 2003.

[11] S. R. Teleka, J. J. Green, S. Brink, J. Sheer, and K. Hlophe, "The automation of the 'Making Safe' process in South African hard-rock underground mines," International Journal of Engineering and Advanced Technology (IJEAT), vol. 1, pp. 1–7, April 2012.

[12] L. Cragg and H. Hu, "Application of reinforcement learning to a mobile robot in reaching recharging station operation," in Intelligent Production Machines and Systems, pp. 357–363, 2005.

[13] R. A. C. Bianchi, C. H. C. Ribeiro, and A. H. R. Costa, "On the relation between ant colony optimization and heuristically accelerated reinforcement learning," in Proceedings of the 1st International Workshop on Hybrid Control of Autonomous System, pp. 49–55, 2009.

[14] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, Oxford, UK, 1999.

[15] Y. Liu and K. M. Passino, Swarm Intelligence: Literature Overview, Department of Electrical Engineering, University of Ohio, 2000.

[16] L. Li, A. Martinoli, and Y. S. Abu-Mostafa, "Emergent specialization in swarm systems," in Intelligent Data Engineering and Automated Learning—IDEAL 2002, vol. 2412 of Lecture Notes in Computer Science, pp. 261–266, 2002.

[17] B. Englot and F. Hover, "Multi-goal feasible path planning using ant colony optimization," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 2255–2260, May 2011.

[18] Z. N. Naqiv, H. Matheru, and K. Chadha, "Review of ant colony optimization on vehicle routing problems and introduction to estimated-based ACO," in International Conference on Environment Science and Engineering (IPCBEE '11), vol. 8, pp. 161–166, 2011.

[19] M. Dorigo and T. Stutzle, "The ant colony optimization metaheuristic: algorithms, applications, and advances," Tech. Rep. IRIDIA/2000-32, 2000.

[20] R. Claes and T. Holvoet, "Cooperative ant colony optimization in traffic route calculations," in Advances on Practical Applications of Agents and Multi-Agent Systems, vol. 155 of Advances in Intelligent and Soft Computing, pp. 23–34, 2012.


Page 12: Cooperative Behaviours with Swarm Intelligence in Multirobot

12 Mathematical Problems in Engineering

Table7QLA

CSEx

ampleI

navigatio

nalanalytic

alsolutio

n

Startin

gsta

te119865

Potentialstates119863119864

119866(1)U

se(2)120578119865119863=1120578119865119864=1120578119865119866=1

(2)U

se(3)119871119896=1Δ120591119896(119865119863

)=21

=2Δ

120591119896(119865119864)=2Δ

120591119896(119865119866

)=2

(3)U

se(4)120591 119865119863=(1minus001)lowast001+2=20099=201120591119865119864=201120591119865119866=201

Firstp

heromon

eupd

ate

119865119866

119864

001

201

201

119863119860

119861

201

001

001

119862

001

(4)U

se(5)119908

(119865119863

)=[120591119865119863]120572[120578119865119863]120573=(201)3=812119908

(119865119864)=812119908

(119865119866

)=812

Sum

=119908(119865119863

)+119908(119865119864)+119908(119865119866

)=2436

119875119903(119865119863

)=

119908 sum

=812

2436=033119875119903(119865119864)=033119875119903(119865119866

)=033

Prob

abilitie

s119865

119866119864

0033

033

119863119860

119861

033

00

119862 0

(5)U

se(6)

119867 0

119865119866

119864

0033

033

119863119860

119861

033

00

119862 0

CallRa

ndRa

nd=007R

andfalls

instate119864

Rou

lette

wheelselects119864

asthen

extstateend

ofroulettewheel

(6)U

pdatetrail119865119864

Currentstate119864

Potentialstates119861119863

119866(1)U

se(2)120578119864119861=12120578119864119863=12120578119864119866=1

(2)U

se(3)119871119896=(119865

minus119864minus119863)=1+2=3119871119896=(119865

minus119864minus119861)=1+2=3119871119896=(119865

minus119864minus119866)=1+2=3

Δ120591119896=2 3=067

(3)U

se(4)120591 119864119861=(1minus001)lowast201+067=266120591119864119863=266120591119864119866=266

Second

pherom

oneu

pdate

119865119866

119864

001

266

201

119863119860

119861

266

001

266

119862

001

(4)U

se(5)119908

(119864119861)=(266)3lowast(12)2=471119908

(119864119863

)=471119908

(119864119866

)=(266)3lowast(1)2=1882

Sum

=471+471+1882=2824

119875119903(119864119861)=

119908 sum

=471

2824=017119875119903(119864119863

)=

471

2824=017119875119903(119864119866)=1882

2824=067

Prob

abilitie

s119865

119866119864

0067

0

119863119860

119861

017

0017

119862 0

(5)U

se(6)

119867 0

119865119866

119864

0067

0

119863119860

119861

017

0017

119862 0

CallRa

ndRa

nd=09Ra

ndfalls

instate119863

Rou

lette

wheelselects119863

asthen

extstateend

ofroulettewheel

(6)U

pdatetrail119865119864

119863

Mathematical Problems in Engineering 13

Example II of QLACS (For good cooperation and communication between robots)Parameters used in the second component of QLACS (Using output from the first component of QLACS)Reward Scheme Inspect = 50 Ignore = 100 Shutdown = 150State space Optimized path from QLACS R1 = 119865 119864119863 119860 119861 119862 119866 119862 and QLACS R2 = 119862 119861 119860119863 119865 119864 119866 119862

Starting State 119865119862119878119895= Terminating State 119862 then119867

Terminating condition When all states have been visitedInitialize 119876value positions to zerosEquations(i) Compute learning rate using (9)(ii) Compute update on 119876(119904 119886) using (10)

Algorithm 4 Parameters for QLACS Example II

Table 8 Pheromone update of a full cycle

Pheromone updateCurrent states 119865 119866 119864 119863 119860 119861 119862

119865 001 201 201 201 001 001 001 1st update119864 001 266 201 266 001 266 001 2nd update119863 313 001 303 001 303 001 001 3rd update119860 001 001 001 045 001 045 001 4th update119861 001 001 067 001 067 001 07 5th update119862 001 091 001 001 001 089 001 6th update119866 071 001 069 001 001 001 071 7th update119862 = terminating state and ant 119896 has moved through all states at least onceTrail 119865119864119863119860119861119862119866119862Number of obstacles = 1 + 2 + 2 + 2 + 1 + 1 + 1 + 1 = 10

Table 9 Probability update of a full cycle

Probability tableCurrent states 119865 119866 119864 119863 119860 119861 119862

119865 0 033 033 033 0 0 0119864 0 067 0 017 0 017 0119863 069 0 016 0 016 0 0119860 0 0 0 05 0 05 0119861 0 0 016 0 016 0 068119862 0 052 0 0 0 049 0119866 045 0 0 0 0 0 045119862 = terminating state

as 10 is calculated using the trail result in Table 8 Table 9displays the probability update table for a full cycle of routefinding In the case of this first example the first robot willterminate its inspection through 119862 and then119867

Algorithm 4 displays the parameters of the second exam-ple of QLACS The second component of QLACS handlesthis aspect of the model which is the communication andcooperative part Once the first component hands the outputto the second component it becomes the second componentinput and it runs with it From Algorithm 4 QLACS R1represents the input for robot 1 and QLACS R2 representsthe input for robot 2 received from the first component of

QLACS The terminating condition is when all states havebeen visited

The rest of Example II displayed in Table 10 explains thecooperative behavioural scenario from one state to anotherfor two robots The tables in the last row of Table 10 show thecommunication and cooperative output achieved using thesecond component of QLACS

4 Experimental Evaluations SafetyInspections for MRS Behaviours

The experimental results of implementing QLACS in anenvironment that consists of obstacles and links are tabu-lated in this section The experimental setup is explainedin Section 41 Different performance categories are shownin this section without communication category and withcommunication category In the without communicationcategory as displayed in Section 42 we found that robotscan inspect all states individually without knowing thatanother robot exists Robots can also inspect some of thestates thereby leaving some states not inspected The com-munication category is explained in Section 43 while theperformance of the QLACS measured with other existingmethods is tabulated in Section 44

41 Experimental Setup Figure 8 displays different sets ofexperiments conducted in the environment using two robotsFigure 8(a) shows how two robots resume inspection fromtwo different entrances In each set of experiments the robotstake the readings of the roof cracks and level of toxic gasesusing their sensors The same behaviours happen in Figures8(b) and 8(c) respectively at different inspection locationsThe inspection locations vary in the four experimental setupsshown in Figure 8

42 Experiment 1 Performance of QLACS without Coop-eration The result of implementing the QLACS withoutcommunication in the proposed environment (Figures 1 and4) is shown in Tables 11 and 12 In the case of Table 11 robot1 (R1) enters the mine through state 119865 while robot 2 (R2)enters the mine through state 119862 However each robot learnsby inspecting some of the states and ignoring some of the

14 Mathematical Problems in Engineering

Table 10 QLACS Example II cooperative behaviour

Starting simulation Robot 1Starting state F(1) Use (9)120574 (119865) =

05

[1 + 1]=05

2= 025

(2) Check the 119876-value for state 119865 (use (11))119876 (119865 119886) = 0

Selected action 119886 = inspect(3) use (10)119876 (119865 119886) = 50 + 025(0) = 50

End of value iteration

Current state E Robot 1(1) Use (9)120574 (119864) =

05

[1 + 1]=05

2= 025

(2) Check the Q-value for state 119864 (Use (11))119876 (119864 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119864 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current state C Robot 2(1) Use (9)120574 (119862) =

05

[1 + 1]=05

2= 025

(2) Check the Q-value for state 119862 (use (11))119876 (119862 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119862 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current state B Robot 2(1) Use (9)120574 (119861) =

05

[1 + 1]=05

2= 025

(2) Check the 119876-value for state 119861 (use (11))119876 (119861 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119861 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current stateD Robot 1(1) Use (9)120574 (119863) =

05

[1 + 1]=05

2= 025

(2) Broadcast (use (11))119876 (119863 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119863 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current State A Robot 2(1) Use (9)120574 (119860) =

05

[1 + 1]=05

2= 025

(2) Broadcast (use (11))119876 (119860 119886) = 0

Selected action 119886 = inspect(3) Use (10)119876 (119860 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iterationCurrent state A Robot 1(1) Use (9)120574 (119860) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119860 119886) = 50 that is 119876 gt 0

Selected action 119886 = Ignore(3) Use (10)119876 (119860 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iteration

Current stateD Robot 2(1) Use (9)120574 (119863) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119863 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119863 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iteration

Current state B Robot 1(1) Use (9)120574 (119861) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119861 119886) = 50 that is 119876 gt 0

Selected action 119886 = Ignore(3) Use (10)119876 (119861 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iterationCurrent state F Robot 2(1) Use (9)120574 (119863) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119865 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119865 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iteration

Current state C Robot 1(1) Use (9)120574 (119861) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119861 119886) = 50 that is 119876 gt 0

Selected action 119886 = Ignore(3) Use (10)119876 (119861 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iteration

Current state E Robot 2(1) Use (9)120574 (119864) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119864 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119864 119886) = 100 + 04 (max (0 0 0)) = 100

End of value iterationCurrent state G Robot 1(1) Use (9)120574 (119866) =

05

[1 + 1]=05

2= 025

(2) Broadcast (Use (11))119876 (119866 119886) = 0

Selected action 119886 = Inspect(3) Use (10)119876 (119866 119886) = 50 + 025 (max (0 0 0)) = 50

End of value iteration

Current state G Robot 2(1) Use (9)120574 (119866) =

05

[1 + 025]=

05

125= 04

(2) Broadcast (use (11))119876 (119866 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119866 119886) = 100 + 04 (max (50 50 50))= 100 + 20 = 120

End of value iteration

Current state C Robot 1(1) Use (9)120574 (119862) =

05

[1 + 04]=05

14= 036

(2) Broadcast (Use (11))119876 (119862 119886) = 50

Selected action 119886 = Ignore(3) Use (10)119876 (119862 119886) = 100 + 036 (max (50 50))= 100 + 18 = 118

End of value iterationCurrent state C Robot 2(1) Use (9)120574 (119862) =

05

[1 + 036]=

05

136= 037

(2) Broadcast (Use (11))119876 (119862 119886) = 50 that is 119876 gt 0

Selected action 119886 = ignore(3) Use (10)119876 (119862 119886) = 100 + 037 (max (50 50))= 100 + 037 lowast 50 = 1185

End of value iteration

All QLACS 1 states exhaustedGoal stateH Robot 1(1) Use (9)120574 (119867) =

05

[1 + 1]=05

2= 025

(2) Broadcast (use (11))119876 (119867 119886) = 0

Selected action 119886 = Shutdown(3) Use (10)119876 (119867 119886) = 150 + 025 (max (0 0)) = 150

End of value iteration

All QLACS 2 states exhaustedGoal stateH Robot 2(4) Use (9)120574 (119867) =

05

[1 + 1]=05

2= 025

(5) Broadcast (use (11))119876 (119867 119886) = 0

Selected action 119886 = Shutdown(6) Use (10)119876 (119867 119886) = 150 + 025 (max (0 0)) = 150

End of value iteration

Mathematical Problems in Engineering 15

Table 10 Continued

End of policy iteration

Robot 1 Robot 2Inspect Ignore Shutdown

119865 Yes No No119864 Yes No No119863 Yes No No119860 No Yes No119861 No Yes No119862 No Yes No119866 Yes No No119862 No Yes Yes through 119867

Inspect Ignore Shutdown119862 Yes No No119861 Yes No no119860 Yes No No119863 No Yes No119865 No Yes No119864 No Yes No119866 No Yes no119862 No Yes Yes through 119867

states Since there is no communication they exit the mineafter learning consequently not inspecting all the states Thesame routine is reflected in Table 12 but in this case eachrobot ends up inspecting all the states before exiting themine Analysis of Tables 11 and 12 shows that resources arewasted and the inspection job is not effective and efficientComparing the two tables the time and distance cost arehigh though higher in Table 12 because of many repetitionsFor instance the time and distance cost in Table 12 column2 are 480028 and ((F G E D A B C) (C B A D E GF)) respectively It also shows that the states are repeated inTables 11 and 12 The memory usage is equally high This ledus to create a more efficient QLACS with communication byintroducing some heuristicsThe processes involved in Tables11 and 12 are explained in Table 13

One can see from Tables 11 and 12 that there is no goodcommunication among the robots hence the evolvement ofTable 14

43 Experiment 2 Performance of QLACS with Good Coop-eration The heuristic added to the known QL made thisexperiment show good communication This is where ourcontribution to communication is shown As a robot inspectsand learns the environment it broadcast its 119876-values to theother robot which is in the form of a lookup table In thiscase each robot checks for119876-values when a119876-value is equalto zero the robot randomly selects a state for inspection Ifa 119876-value is greater than zero the robot checks if the stateis available for inspection or for ignoring When a robotencounters a state with a119876-value equal to zero and the threadnext to the state is equal to the goal state (119867) then it shutsdown It must have checked the lookup table to see thatall states have been inspected The result in Table 14 showsgood communication between two robots No states wereinspected more than once The iterations for every run withtheir times memory usage and effective communication arealso displayed in Table 14

Comparing Table 14 with Tables 11 and 12 one cannot butnotice the huge differences in thememory time and distancecosts The communication between the robots in Table 14resulted in achieving good inspection however the randommethod of choosing next inspection states did not give anoptimized route thereby increasing time cost in passing andchecking through inspected states

(a) Two robots starting inspection from two different entrances

(b) Two robots inspecting as they navigate in the environment

(c) Two robots inspecting different locations with different positions

Figure 8 Different experimental behaviours for two robots

44 Experiment 3 Performance of QLACS for the NavigationBehaviour of the ProposedModel The result of implementingthe first component of QLACS for effective navigation ofthe proposed model is tabulated in Table 16 Table 15 shows

16 Mathematical Problems in Engineering

Table 11 QLACS without communication (inspecting some states)

Number of runs 1 2 3 4 5 6 7Iterations 9 10 10 13 13 9 13Time (sec) 430025 300017 34002 300017 310017 310018 270016Memory usage (bytes) 18872 18160 19204 18208 17896 18164 18308Inspected states (R1) 119866 119865119866 119862 119862 119865119863 119861 119864 119860 119865 119864119863 119860 119862 119865 119866 119864 119860

Inspected states (R2) 119861119860 119866 119862 119862 119861 119860 119864 119865 119861 119860 119862 119861 119864 119861 119860 119864 119860 119866

Table 12 QLACS without communication (inspecting all states)

Number of runs 1 2 3 4 5 6 7Iterations 114 10 70 10 9 10 10Time (sec) 430024 480028 410023 340019 340019 340019 35002Memory usage (bytes) 28440 17960 27216 17000 18456 17672 17968

Inspected states (R1) F G E D AB C

F G E D AB C

F G E D AB C

F G E D AB C

F G E D AB C

F G E D AB C

F G E D AB C

Inspected states (R2) C B A D EG F

C B A D EG F

C B A D EG F

C B A D EG F

C B A D EG F

C B A D EG F

C B A D EG F

Table 13 Processes used in achieving Tables 11 and 12

(a) For Table 11

Inspecting some States(1) Initialize the starting and goal states(2) Initialize all 119876values to zeroes(3) Select a random action If 119876values of all possible actions atcurrent state are zeroes(4) Select the action with the highest 119876value If it is the Max119876value(5) Compute and update 119876value of the selected action(6) Get new state among possible states(7) If new state = goal state then go to Steps 5 and 9(8) Repeat Steps 3 to 6 until Step 7(9) Shutdown

(b) For Table 12

Inspecting all States(1) Initialize the starting and goal states(2) Initialize all 119876values to zeroes(3) Select a random action If 119876values of all possible actions atcurrent state are zeroes(4) Select the action with the highest 119876value If it is the Max119876value(5) Compute and update 119876value of the selected action(6) Get new state among possible states(7) If new state = goal state then go to Steps 5 and 10(8) Repeat Steps 3 to 7 until Step 9(9) All states except the goal state has taken Action = Inspect(10) Shutdown

the selected parameters used in achieving the optimal pathThe optimal path found after nine test runs is the path witha distance cost of 10 for both entries to the environmentdisplayed in Table 16 Therefore columns 4 5 7 8 and 9 canbe used as the optimized path input for the first component ofQLACS Then QLACS will use any of the optimized paths to

navigate to the specified states and take decisions accordinglyFor instance the test run result for nine ants gives 16 iterationsunder 600035 seconds and giving the shortest paths to bothrobots coming from entries 119865 and 119862 of the environmentThepath that gives us cost as 10 is obtained from FEDABCGC(1222111) and CBADFGEGC (12211221) respectively

Alpha (120572) Beta (120573) and Rho (120588) represent the heuristicproperties of the ACS algorithm The Alpha factor is themeasure of the influence of pheromone concentration thatcan influence the selection of the path with the highestpheromone concentration Beta is the measure of the influ-encewhich is related to the distance between any two adjacentnodes It is also a heuristic factor that can measure theinfluence distance in selecting the next state It is not limitedto pheromones Rho has to do with the rate at which thepheromones evaporate Rho shows how often new pathsare explored by the agentsants rather than reinforcing oldpaths Each agent cooperates by having access to other agentsrsquopheromone valuesThe pheromone value is initialized to 001and it is continually updated until learning stops It is similarto the first component of QLACS where we initialize allQLACS positions to zero and update for every new step

Table 16 gave the optimized route to be used by the robotsto achieve inspection tasksThis resulted in combining Tables14 and 16 to achieve Table 17 Table 17 shows optimized timecost memory usage route cost and good communication

45 Experiment 4 Benchmarking the New Hybrid Model(QLACS) with Popular Methods Choosing the optimizedpaths QLACS performs cooperative inspection behaviourthrough communication among robots Looking at the sam-ple result on Table 17 QLACS chooses the shortest andcomplete possible inspection routes from different runs Inthis case the best paths were obtained from Table 16 by using9 agents 10 agents and 12 agents All the test runs gave besttrail paths from both entrances listing the complete states

Mathematical Problems in Engineering 17

Table 14 QLACS with communication

Number of runs 1 2 3 4 5 6 7Iterations 9 10 9 13 10 10 9Time (sec) 330019 310017 310018 310018 290023 300017 310018Memory usage (bytes) 15748 15948 15748 18232 16576 15948 15748Inspected states (R1) 119865 119866 119864 119865 119866 119864119863 119865 119866 119864 119865 119864 119865 119866 119864119863 119861 119865 119866 119864119863 119865 119866 119864

Inspected states (R2) 119862 119861 119860119863 119862 119861 119860 119862 119861 119860119863 119865 119866 119864119863 119860 119862119860 119862 119861 119860 119862 119861 119860119863

Table 15 Navigation behaviour parameter specification

ACO properties Type of ACO Population Length of path Pheromonecoefficient 120573

Heuristiccoefficient 120572 Evaporation rate 120588

Properties ACS 9 or 12 8 2 3 001

0

10

20

30

40

50

60

70

0 1 2 3 4 5 6 7 8

Tim

e cos

t

Number of runs

QL time cost

ACS time cost

QLACS time cost

Figure 9 Comparison of time costs for QL ACS and QLACS

05

1015202530354045

0 1 2 3 4 5 6

Rout

e cos

t

Runs

Path cost for R1 (QL)

Path cost for R2 (QL)

Path cost for R1 and R2 (QLACS)

Figure 10 Comparison of route costs for QL and QLACS

with length 10 as shown in Table 16 that is they have statesfrom 119865 to 119862 and from 119862 to 119865 The length is the addition ofweights along the line (trail edges) Then the QLACS usesthe optimized paths to make decisions on inspections Theresult from Table 17 shows that no state is inspected twice orvisited more than required The QLACS model concentrateson making the best inspection decisions for MRS The firstrun on Table 17 shows that nine agentsants were used forthe route finding the optimized route was achieved after nineiterations under 70004 sec and the states where all inspectedeffectively without redundancies

0

5

10

15

20

25

30

1 2 3 4 5

Num

ber o

f ant

site

ratio

ns

Runs

AntsIteration

Figure 11 Number of antsiteration required for QLACS to findoptimal path

02468

101214

0 2 4 6 8 10

Tim

eite

ratio

n

Number of runs

Iteration

Time

Figure 12 Time and iteration required for QLACS to achieveoptimal behavior

451 Comparative Study of the ProposedModel (QLACS) withQL Only and ACS Only Based on the results tabulated inTable 18 and Figure 9 the average time costs of achieving theMRS behaviours for QL ACS and QLACS were comparedTwo robots will use an average of 308590 sec to achievethorough inspection behaviour using QL an average of404309 sec to achieve optimal navigation behaviour usingACS and an average of 92868 sec to achieve both navigationand cooperative behaviour using QLACS The result shows

18 Mathematical Problems in Engineering

Table 16 Navigation behaviour computations

Number of runs 1 2 3 4 5 6 7 8 9Number of Ants 5 7 8 9 10 11 12 15 28Iterations 12 26 14 16 23 14 16 23 22Time (sec) 370021 370021 390022 600035 320019 330019 35002 430025 650037Best train distance (R1) 12 12 10 10 10 10 10 10 10Best trail distance (R2) 10 12 12 10 10 12 10 10 10

Table 17 New hybrid model QLACS computations

Number of runs 1 2 3 4 5 6 7 8 9Iterations 9 9 9 9 9 9 9 9 9Number of ants 9 9 9 10 10 10 12 12 12Time (sec) 70004 80005 90005 110006 110006 120007 80005 90005 90006Memory usage (bytes) 10288 10280 10248 10288 10288 10288 10164 10712 10164Inspected states (R1) 119865 119866 119864119863 119865 119866 119864119863 119865 119866 119864119863 119865 119866 119864 119860 119861 119865 119866 119864 119860 119861 119865 119866 119864 119860 119861 119865 119866119863 119860 119865 119866119863 119860 119865 119866119863 119860

Inspected states (R2) 119862 119861 119860 119862 119861 119860 119862 119861 119860 119862119863 119862119863 119862119863 119862 119861 119864 119862 119861 119864 119862 119861 119864

Table 18 Time cost comparison for QL ACS and QLACS

Runs QL time cost(sec)

ACS time cost(sec)

QLACS timecost (sec)

1 330019 370021 700042 310019 370021 80043 310018 390022 900054 310018 320019 900055 290023 35002 1100066 300017 430025 1200077 310018 600035 90006Average 308590 404309 92868

that our proposed integrated algorithm performs better withreduced time cost On the same note the route costs for theQL and QLACS were also compared The results in Table 19and Figure 10 show that the proposed model QLACS gave amuch lower route cost than the QL

The number of ants and iterations used to achieve thisis displayed in Figure 11 The more the ants the more theiterationThe best routes created in the five test runs shown inFigure 11 are run numbers 1 and 3 They used fewer ants anditerations to achieve the optimal routes The optimal resultfor the proposed model is achieved under nine iterations forevery run The iteration remains constant for any number ofagents and robotsThe blue line with starmarkers at Figure 12is the iteration value for 9 runs The red line shows thedifferent amounts of time required for each run The timeis also almost stable for the QLACS though it fluctuated alittle in the middle This signifies that the proposed modelis more robust and effective in finding the optimal routeand coordinating MRS The reduced route cost and shortercomputation time achieved with the QLACS satisfied thecriteria for cooperative behavioural purposes

46 Experiment 4 Scalability of QLACS Using Two Threeand Four Robots In this section we present some resultsobtained by experimenting with the cooperative behaviouralaction of two three and four robots using the QLACSmodelThe performance of the two three and four robots wasevaluated by running the simulation three times using thesame number of agents The performance of the proposedQLACSmodel shows good communication between the twothree and four robots under the states inspected in the lastfour columns of Table 20 An almost stable time was used inachieving the inspection for all the robots The detail of thesimulation result is laid out in Table 20

A notable observation emanating from Table 20 is thatthere is a need for a larger mine area of inspection becauserobot R1 in rows 4 to 6 could inspect only one state whilerobots R1 and R2 in rows 7 to 9 could inspect only one stateeach This implies that the size of the field of inspection isproportional to the number of robots to be deployed

5 Concluding Remarks

This paper has shown how MRS behave cooperatively inunderground terrains The QL algorithm has been inves-tigated for its potential quality of good communicationwhile the ACS algorithm was explored for good navigationproperties The strengths of the two algorithms have beenharnessed and optimized to form a hybrid model QLACSwhich addressed the behavioural situation inMRS effectively

The analytical solution of the newmodel was explained asshown in Algorithms 3 and 4 and Tables 7 and 10The hybridarchitecture for themodel was also described in Figure 7Theexperimental setup describing different navigation locationsfor two robots was shown in Figure 8

The new hybrid model QLACS for cooperative behaviourof MRS in an underground terrain was proven able to findthe optimal route andhandle cooperation between two robots

Mathematical Problems in Engineering 19

Table 19 Route cost comparison for QL and QLACS

Runs QL (sec) QLACS (sec)Path cost for R1 Path cost for R2 Path cost for R1 Path cost for R2

1 20 20 10 102 32 19 10 103 28 14 10 104 27 27 10 105 39 30 10 10Average 295 22 10 10

Table 20 Summary of scalability performance on QLACS

Row numbers Number of robots Time (sec) Number of states inspectedRobot 1 (R1) Robot 2 (R2) Robot 3 (R3) Robot 4 (R4)

1 2 100006 4 32 2 110007 4 33 2 80005 4 34 3 160009 1 3 35 3 17001 1 3 36 3 120006 1 3 37 4 110006 1 1 3 28 4 140006 1 1 3 29 4 10006 1 1 3 2

effectively The cooperative behavioural problem was nar-rowed down to two situations navigational and cooperativebehaviours ACS was used to establish the shortest optimalroute for two robots while the QL was used to achieveeffective cooperation decisions The results achieved afterintegrating the two algorithms showed a reduction in routecosts and computation times as shown in Figures 9 10 11 and12 The comparative analysis between QL ACS and QLACSproved that the last-named is more robust and effective inachieving cooperative behaviour for MRS in the proposeddomain

The result of this work will be used for future simulationof MRS applications in hazardous environments An indica-tion of expansion of this research area has conspicuously sur-faced in both real life implementations and model increaseThe model used in this work can also be improved upon byincreasing the state space

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors gratefully acknowledge the financial support ofthe Organisation for Women in Science for the DevelopingWorld (OWSDW) Lrsquooreal-Unesco for Women in Scienceand the resources made available by the University of Cape

Town (UCT) and University of South Africa (UNISA) SouthAfrica

References

[1] J Leitner ldquoMulti-robot cooperation in space a surveyrdquo inProceedings of the Advanced Technologies for Enhanced Qualityof Life (AT-EQUAL rsquo09) vol 37 pp 144ndash151 IEEE ComputerSociety Iasi Romania July 2009

[2] G Dudek M R M Jenkin E Milios and D Wilkes ldquoAtaxonomy for multi-agent roboticsrdquo Autonomous Robots vol 3no 4 pp 375ndash397 1996

[3] A Chella G Lo Re I Macaluso M Ortolani and D PerildquoChapter 1 A networking framework for multi-robot coordi-nationrdquo in Recent Advances in Multi Robot Systems A LazinicaEd pp 1ndash14 2008


Example II of QLACS (for good cooperation and communication between robots). Parameters used in the second component of QLACS (using output from the first component of QLACS):

Reward scheme: Inspect = 50, Ignore = 100, Shutdown = 150.
State space: optimized paths from QLACS, R1 = F, E, D, A, B, C, G, C and R2 = C, B, A, D, F, E, G, C.
Starting states: F and C. Terminating state: C, then H.
Terminating condition: when all states have been visited.
Initialize all Q-value positions to zeros.
Equations: (i) compute the learning rate using (9); (ii) compute the update on Q(s, a) using (10).

Algorithm 4: Parameters for QLACS Example II.
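For concreteness, the Algorithm 4 parameter set can be written down as a small configuration structure. The following is a minimal illustrative sketch in Python; the variable names and dictionary layout are ours, not part of the original model:

# Illustrative encoding of the Algorithm 4 parameters (names are ours).
REWARDS = {"inspect": 50, "ignore": 100, "shutdown": 150}
# Optimized routes handed over by the first (ACS) component of QLACS.
ROUTE_R1 = ["F", "E", "D", "A", "B", "C", "G", "C"]
ROUTE_R2 = ["C", "B", "A", "D", "F", "E", "G", "C"]
GOAL_STATE = "H"                     # reached through the terminating state C
# Q-table initialised to zero for every (state, action) pair, as in Algorithm 4.
STATES = sorted(set(ROUTE_R1) | set(ROUTE_R2))
Q = {(s, a): 0.0 for s in STATES for a in REWARDS}
print(Q[("F", "inspect")])           # 0.0 before learning starts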

Table 8: Pheromone update of a full cycle.

Current state | F | G | E | D | A | B | C | Update
F | 0.01 | 2.01 | 2.01 | 2.01 | 0.01 | 0.01 | 0.01 | 1st update
E | 0.01 | 2.66 | 2.01 | 2.66 | 0.01 | 2.66 | 0.01 | 2nd update
D | 3.13 | 0.01 | 3.03 | 0.01 | 3.03 | 0.01 | 0.01 | 3rd update
A | 0.01 | 0.01 | 0.01 | 0.45 | 0.01 | 0.45 | 0.01 | 4th update
B | 0.01 | 0.01 | 0.67 | 0.01 | 0.67 | 0.01 | 0.7 | 5th update
C | 0.01 | 0.91 | 0.01 | 0.01 | 0.01 | 0.89 | 0.01 | 6th update
G | 0.71 | 0.01 | 0.69 | 0.01 | 0.01 | 0.01 | 0.71 | 7th update

C = terminating state, and ant k has moved through all states at least once.
Trail: F, E, D, A, B, C, G, C. Number of obstacles = 1 + 2 + 2 + 2 + 1 + 1 + 1 = 10.

Table 9: Probability update of a full cycle.

Current state | F | G | E | D | A | B | C
F | 0 | 0.33 | 0.33 | 0.33 | 0 | 0 | 0
E | 0 | 0.67 | 0 | 0.17 | 0 | 0.17 | 0
D | 0.69 | 0 | 0.16 | 0 | 0.16 | 0 | 0
A | 0 | 0 | 0 | 0.5 | 0 | 0.5 | 0
B | 0 | 0 | 0.16 | 0 | 0.16 | 0 | 0.68
C | 0 | 0.52 | 0 | 0 | 0 | 0.49 | 0
G | 0.45 | 0 | 0 | 0 | 0 | 0 | 0.45

C = terminating state.

The route cost of 10 is calculated from the trail result in Table 8. Table 9 displays the probability update table for a full cycle of route finding. In this first example, the first robot terminates its inspection through C and then H.

Algorithm 4 displays the parameters of the second example of QLACS. The second component of QLACS handles this aspect of the model, namely the communication and cooperative part. Once the first component hands its output to the second component, that output becomes the second component's input, and the second component runs with it. In Algorithm 4, QLACS R1 represents the input for robot 1 and QLACS R2 the input for robot 2, both received from the first component of QLACS. The terminating condition is met when all states have been visited.
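The hand-over between the two components can be pictured with a short sketch; the function names below are illustrative placeholders that assume only what the text states, namely that the ACS component returns an optimized route per entrance and the QL component consumes it as its state space:

# Sketch of the QLACS hand-over (function names are placeholders, not the
# authors' code): the routing component returns an optimized route per robot,
# and the cooperation component uses it as its state space.
def acs_component(entry):
    # Returns the optimized routes reported for entrances F and C.
    routes = {"F": ["F", "E", "D", "A", "B", "C", "G", "C"],
              "C": ["C", "B", "A", "D", "F", "E", "G", "C"]}
    return routes[entry]

def ql_component(route, shared_q):
    # Placeholder: walk the route, taking inspect/ignore/shutdown decisions
    # while broadcasting Q-values through the shared lookup table.
    for state in route:
        shared_q.setdefault(state, 0.0)

shared_q = {}
for entry in ("F", "C"):             # two robots, two entrances
    ql_component(acs_component(entry), shared_q)
print(sorted(shared_q))              # the seven inspectable states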

The rest of Example II, displayed in Table 10, explains the cooperative behavioural scenario from one state to another for two robots. The tables in the last row of Table 10 show the communication and cooperative output achieved using the second component of QLACS.

4. Experimental Evaluations: Safety Inspections for MRS Behaviours

The experimental results of implementing QLACS in an environment consisting of obstacles and links are tabulated in this section. The experimental setup is explained in Section 4.1. Two performance categories are covered: a without-communication category and a with-communication category. In the without-communication category, presented in Section 4.2, we found that robots can inspect all states individually without knowing that another robot exists; robots can also inspect only some of the states, leaving other states uninspected. The communication category is explained in Section 4.3, while the performance of QLACS measured against other existing methods is tabulated in Section 4.4.

4.1. Experimental Setup. Figure 8 displays the different sets of experiments conducted in the environment using two robots. Figure 8(a) shows how two robots resume inspection from two different entrances. In each set of experiments the robots take readings of the roof cracks and the level of toxic gases using their sensors. The same behaviours occur in Figures 8(b) and 8(c), respectively, at different inspection locations. The inspection locations vary in the four experimental setups shown in Figure 8.

4.2. Experiment 1: Performance of QLACS without Cooperation. The result of implementing QLACS without communication in the proposed environment (Figures 1 and 4) is shown in Tables 11 and 12. In the case of Table 11, robot 1 (R1) enters the mine through state F while robot 2 (R2) enters the mine through state C. However, each robot learns by inspecting some of the states and ignoring some of the states.

Table 10: QLACS Example II cooperative behaviour.

Starting simulation, Robot 1, starting state F:
(1) Use (9): γ(F) = 0.5/[1 + 1] = 0.5/2 = 0.25.
(2) Check the Q-value for state F (use (11)): Q(F, a) = 0. Selected action a = inspect.
(3) Use (10): Q(F, a) = 50 + 0.25(0) = 50.
End of value iteration.

Current state E, Robot 1:
(1) Use (9): γ(E) = 0.5/[1 + 1] = 0.25.
(2) Check the Q-value for state E (use (11)): Q(E, a) = 0. Selected action a = inspect.
(3) Use (10): Q(E, a) = 50 + 0.25(max(0, 0, 0)) = 50.
End of value iteration.

Current state C, Robot 2:
(1) Use (9): γ(C) = 0.5/[1 + 1] = 0.25.
(2) Check the Q-value for state C (use (11)): Q(C, a) = 0. Selected action a = inspect.
(3) Use (10): Q(C, a) = 50 + 0.25(max(0, 0, 0)) = 50.
End of value iteration.

Current state B, Robot 2:
(1) Use (9): γ(B) = 0.5/[1 + 1] = 0.25.
(2) Check the Q-value for state B (use (11)): Q(B, a) = 0. Selected action a = inspect.
(3) Use (10): Q(B, a) = 50 + 0.25(max(0, 0, 0)) = 50.
End of value iteration.

Current state D, Robot 1:
(1) Use (9): γ(D) = 0.5/[1 + 1] = 0.25.
(2) Broadcast (use (11)): Q(D, a) = 0. Selected action a = inspect.
(3) Use (10): Q(D, a) = 50 + 0.25(max(0, 0, 0)) = 50.
End of value iteration.

Current state A, Robot 2:
(1) Use (9): γ(A) = 0.5/[1 + 1] = 0.25.
(2) Broadcast (use (11)): Q(A, a) = 0. Selected action a = inspect.
(3) Use (10): Q(A, a) = 50 + 0.25(max(0, 0, 0)) = 50.
End of value iteration.

Current state A, Robot 1:
(1) Use (9): γ(A) = 0.5/[1 + 0.25] = 0.5/1.25 = 0.4.
(2) Broadcast (use (11)): Q(A, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(A, a) = 100 + 0.4(max(0, 0, 0)) = 100.
End of value iteration.

Current state D, Robot 2:
(1) Use (9): γ(D) = 0.5/[1 + 0.25] = 0.4.
(2) Broadcast (use (11)): Q(D, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(D, a) = 100 + 0.4(max(0, 0, 0)) = 100.
End of value iteration.

Current state B, Robot 1:
(1) Use (9): γ(B) = 0.5/[1 + 0.25] = 0.4.
(2) Broadcast (use (11)): Q(B, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(B, a) = 100 + 0.4(max(0, 0, 0)) = 100.
End of value iteration.

Current state F, Robot 2:
(1) Use (9): γ(F) = 0.5/[1 + 0.25] = 0.4.
(2) Broadcast (use (11)): Q(F, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(F, a) = 100 + 0.4(max(0, 0, 0)) = 100.
End of value iteration.

Current state C, Robot 1:
(1) Use (9): γ(C) = 0.5/[1 + 0.25] = 0.4.
(2) Broadcast (use (11)): Q(C, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(C, a) = 100 + 0.4(max(0, 0, 0)) = 100.
End of value iteration.

Current state E, Robot 2:
(1) Use (9): γ(E) = 0.5/[1 + 0.25] = 0.4.
(2) Broadcast (use (11)): Q(E, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(E, a) = 100 + 0.4(max(0, 0, 0)) = 100.
End of value iteration.

Current state G, Robot 1:
(1) Use (9): γ(G) = 0.5/[1 + 1] = 0.25.
(2) Broadcast (use (11)): Q(G, a) = 0. Selected action a = inspect.
(3) Use (10): Q(G, a) = 50 + 0.25(max(0, 0, 0)) = 50.
End of value iteration.

Current state G, Robot 2:
(1) Use (9): γ(G) = 0.5/[1 + 0.25] = 0.4.
(2) Broadcast (use (11)): Q(G, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(G, a) = 100 + 0.4(max(50, 50, 50)) = 100 + 20 = 120.
End of value iteration.

Current state C, Robot 1:
(1) Use (9): γ(C) = 0.5/[1 + 0.4] = 0.5/1.4 = 0.36.
(2) Broadcast (use (11)): Q(C, a) = 50. Selected action a = ignore.
(3) Use (10): Q(C, a) = 100 + 0.36(max(50, 50)) = 100 + 18 = 118.
End of value iteration.

Current state C, Robot 2:
(1) Use (9): γ(C) = 0.5/[1 + 0.36] = 0.5/1.36 = 0.37.
(2) Broadcast (use (11)): Q(C, a) = 50, that is, Q > 0. Selected action a = ignore.
(3) Use (10): Q(C, a) = 100 + 0.37(max(50, 50)) = 100 + 0.37 x 50 = 118.5.
End of value iteration.

All QLACS R1 states exhausted. Goal state H, Robot 1:
(1) Use (9): γ(H) = 0.5/[1 + 1] = 0.25.
(2) Broadcast (use (11)): Q(H, a) = 0. Selected action a = shutdown.
(3) Use (10): Q(H, a) = 150 + 0.25(max(0, 0)) = 150.
End of value iteration.

All QLACS R2 states exhausted. Goal state H, Robot 2:
(1) Use (9): γ(H) = 0.5/[1 + 1] = 0.25.
(2) Broadcast (use (11)): Q(H, a) = 0. Selected action a = shutdown.
(3) Use (10): Q(H, a) = 150 + 0.25(max(0, 0)) = 150.
End of value iteration.

End of policy iteration.

Robot 1 | Inspect | Ignore | Shutdown
F | Yes | No | No
E | Yes | No | No
D | Yes | No | No
A | No | Yes | No
B | No | Yes | No
C | No | Yes | No
G | Yes | No | No
C | No | Yes | Yes, through H

Robot 2 | Inspect | Ignore | Shutdown
C | Yes | No | No
B | Yes | No | No
A | Yes | No | No
D | No | Yes | No
F | No | Yes | No
E | No | Yes | No
G | No | Yes | No
C | No | Yes | Yes, through H
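The arithmetic in Table 10 follows a simple pattern: reading the worked example, the learning rate appears to be refreshed as γ = 0.5/(1 + previous γ), starting from γ = 1, and the Q-value is the action reward plus γ times the best broadcast Q-value of the reachable states. The short sketch below reproduces a few of the figures above; it is our reading of equations (9)-(11), not the authors' code:

# Reproduces a few Table 10 values under our reading of equations (9)-(11).
REWARD = {"inspect": 50, "ignore": 100, "shutdown": 150}

def next_gamma(prev):
    # Learning-rate refresh as read off the worked example: 0.5 / (1 + prev).
    return 0.5 / (1.0 + prev)

gamma = next_gamma(1.0)                              # 0.25 on a first visit
q_f = REWARD["inspect"] + gamma * 0                  # Q(F, inspect) = 50
gamma = next_gamma(gamma)                            # 0.4 on a later visit
q_a = REWARD["ignore"] + gamma * max(0, 0, 0)        # Q(A, ignore)  = 100
q_g = REWARD["ignore"] + 0.4 * max(50, 50, 50)       # Q(G, ignore)  = 120
q_c = REWARD["ignore"] + 0.36 * max(50, 50)          # Q(C, ignore)  = 118
print(q_f, q_a, q_g, q_c)                            # 50.0 100.0 120.0 118.0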

Since there is no communication, the robots exit the mine after learning, consequently not inspecting all the states. The same routine is reflected in Table 12, but in that case each robot ends up inspecting all the states before exiting the mine. Analysis of Tables 11 and 12 shows that resources are wasted and the inspection job is neither effective nor efficient. Comparing the two tables, the time and distance costs are high in both, though higher in Table 12 because of the many repetitions. For instance, the time and distance costs in column 2 of Table 12 are 48.0028 sec and ((F, G, E, D, A, B, C), (C, B, A, D, E, G, F)), respectively. States are repeated in Tables 11 and 12, and the memory usage is equally high. This led us to create a more efficient QLACS with communication by introducing some heuristics. The processes involved in Tables 11 and 12 are explained in Table 13.

One can see from Tables 11 and 12 that there is no good communication among the robots; hence the development of Table 14.

4.3. Experiment 2: Performance of QLACS with Good Cooperation. The heuristic added to the standard QL is what makes this experiment show good communication; this is where our contribution to communication lies. As a robot inspects and learns the environment, it broadcasts its Q-values to the other robot in the form of a lookup table. Each robot checks these Q-values: when a Q-value is equal to zero, the robot randomly selects a state for inspection; when a Q-value is greater than zero, the robot checks whether the state is available for inspection or should be ignored. When a robot encounters a state with a Q-value equal to zero and the thread next to that state is the goal state (H), it shuts down, having checked the lookup table to confirm that all states have been inspected. The result in Table 14 shows good communication between the two robots: no state was inspected more than once. The iterations for every run, with their times, memory usage, and effective communication, are also displayed in Table 14.
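Rendered as pseudocode-style Python, the decision rule described above might look as follows; the helper names and data structures are ours, and this is a sketch of the heuristic rather than the exact implementation:

import random

def decide(state, shared_q):
    # Sketch of the cooperative decision rule: inspect unvisited states,
    # ignore states another robot has already inspected, and shut down
    # (through H) once the shared lookup table shows everything inspected.
    if shared_q and all(v > 0 for v in shared_q.values()):
        return "shutdown"
    if shared_q.get(state, 0) == 0:
        return "inspect"
    return "ignore"

def choose_next(neighbours, shared_q):
    # Random choice among states, preferring those not yet inspected,
    # as described for the with-communication experiments.
    unvisited = [s for s in neighbours if shared_q.get(s, 0) == 0]
    return random.choice(unvisited or list(neighbours))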

Comparing Table 14 with Tables 11 and 12, one cannot but notice the large differences in memory, time, and distance costs. The communication between the robots in Table 14 resulted in good inspection; however, the random method of choosing the next inspection state did not give an optimized route, thereby increasing the time cost in passing through and checking already inspected states.

Figure 8: Different experimental behaviours for two robots. (a) Two robots starting inspection from two different entrances. (b) Two robots inspecting as they navigate in the environment. (c) Two robots inspecting different locations with different positions.

4.4. Experiment 3: Performance of QLACS for the Navigation Behaviour of the Proposed Model. The result of implementing the first component of QLACS for effective navigation of the proposed model is tabulated in Table 16, and Table 15 shows the selected parameters used in achieving the optimal path.


Table 11: QLACS without communication (inspecting some states).

Number of runs | 1 | 2 | 3 | 4 | 5 | 6 | 7
Iterations | 9 | 10 | 10 | 13 | 13 | 9 | 13
Time (sec) | 43.0025 | 30.0017 | 34.002 | 30.0017 | 31.0017 | 31.0018 | 27.0016
Memory usage (bytes) | 18872 | 18160 | 19204 | 18208 | 17896 | 18164 | 18308
Inspected states (R1) | G | F, G | C | C | F, D, B, E, A | F, E, D, A, C | F, G, E, A
Inspected states (R2) | B, A, G | C | C, B, A, E | F, B, A | C, B, E | B, A, E | A, G

Table 12: QLACS without communication (inspecting all states).

Number of runs | 1 | 2 | 3 | 4 | 5 | 6 | 7
Iterations | 114 | 10 | 70 | 10 | 9 | 10 | 10
Time (sec) | 43.0024 | 48.0028 | 41.0023 | 34.0019 | 34.0019 | 34.0019 | 35.002
Memory usage (bytes) | 28440 | 17960 | 27216 | 17000 | 18456 | 17672 | 17968
Inspected states (R1) | F, G, E, D, A, B, C (in every run)
Inspected states (R2) | C, B, A, D, E, G, F (in every run)

Table 13: Processes used in achieving Tables 11 and 12.

(a) For Table 11, inspecting some states:
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zeroes.
(3) Select a random action if the Q-values of all possible actions at the current state are zeroes.
(4) Select the action with the highest Q-value if it is the max Q-value.
(5) Compute and update the Q-value of the selected action.
(6) Get a new state among the possible states.
(7) If the new state = goal state, then go to Steps 5 and 9.
(8) Repeat Steps 3 to 6 until Step 7.
(9) Shutdown.

(b) For Table 12, inspecting all states (a sketch of this procedure in code follows the table):
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zeroes.
(3) Select a random action if the Q-values of all possible actions at the current state are zeroes.
(4) Select the action with the highest Q-value if it is the max Q-value.
(5) Compute and update the Q-value of the selected action.
(6) Get a new state among the possible states.
(7) If the new state = goal state, then go to Steps 5 and 10.
(8) Repeat Steps 3 to 7 until Step 9.
(9) All states except the goal state have taken Action = Inspect.
(10) Shutdown.
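Procedure (b) can be sketched directly from these steps. The adjacency map and reward value below are illustrative placeholders (the real environment layout is given in Figures 1 and 4, which are not reproduced here):

import random

# Illustrative adjacency map; the real environment layout is in Figures 1 and 4.
NEIGHBOURS = {"F": ["E", "G"], "E": ["D", "G"], "D": ["A", "F"],
              "A": ["B", "D"], "B": ["C", "A"], "C": ["G", "B"],
              "G": ["C", "E"]}

def inspect_all_states(start):
    q = {s: 0.0 for s in NEIGHBOURS}           # steps 1-2: zero-initialised Q-values
    visited_order, state = [], start
    while any(v == 0.0 for v in q.values()):   # step 9: stop once all are inspected
        if q[state] == 0.0:
            visited_order.append(state)        # action = inspect
            q[state] = 50.0                    # step 5: update Q (reward assumed)
        state = random.choice(NEIGHBOURS[state])   # steps 3/6: move to a new state
    return visited_order                        # step 10: shutdown

print(inspect_all_states("F"))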

The optimal path found after nine test runs is the path with a distance cost of 10 for both entries to the environment, as displayed in Table 16. Therefore, columns 4, 5, 7, 8, and 9 can be used as the optimized path input for the first component of QLACS. QLACS then uses any of the optimized paths to navigate to the specified states and take decisions accordingly. For instance, the test run with nine ants gives 16 iterations in 60.0035 seconds and yields the shortest paths for both robots coming from entries F and C of the environment. The paths that give a cost of 10 are obtained from FEDABCGC (1, 2, 2, 2, 1, 1, 1) and CBADFGEGC (1, 2, 2, 1, 1, 2, 2, 1), respectively.
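The quoted distance cost of 10 is simply the sum of the edge weights along robot 1's trail; a quick check using the weights printed above:

# Cost of robot 1's optimized trail F-E-D-A-B-C-G-C with the quoted weights.
TRAIL_R1 = ["F", "E", "D", "A", "B", "C", "G", "C"]
EDGE_WEIGHT = {("F", "E"): 1, ("E", "D"): 2, ("D", "A"): 2,
               ("A", "B"): 2, ("B", "C"): 1, ("C", "G"): 1, ("G", "C"): 1}

def trail_cost(trail, weights):
    return sum(weights[(a, b)] for a, b in zip(trail, trail[1:]))

print(trail_cost(TRAIL_R1, EDGE_WEIGHT))    # 10, the distance cost in Table 16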

Alpha (α), Beta (β), and Rho (ρ) represent the heuristic properties of the ACS algorithm. The Alpha factor measures the influence of pheromone concentration, which can bias selection towards the path with the highest pheromone concentration. Beta measures the influence related to the distance between any two adjacent nodes; it is a heuristic factor that weighs distance when selecting the next state and is not limited to pheromones. Rho is the rate at which the pheromones evaporate; it governs how often new paths are explored by the agents (ants) rather than old paths being reinforced. Each agent cooperates by having access to the other agents' pheromone values. The pheromone value is initialized to 0.01 and is continually updated until learning stops, similar to the first component of QLACS, where all QLACS positions are initialized to zero and updated at every new step.
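For illustration, the conventional ACS transition and evaporation rules that these parameters drive can be sketched as follows. The exponent assignment and helper names are our assumptions based on the standard ACS formulation, not the authors' exact implementation:

import random

PHEROMONE_EXP = 2        # pheromone coefficient from Table 15
HEURISTIC_EXP = 3        # heuristic (distance) coefficient from Table 15
RHO = 0.01               # evaporation rate
PHEROMONE_INIT = 0.01    # initial pheromone on every edge, as stated in the text

def choose_next(current, neighbours, tau, dist):
    # Probability of moving to n is proportional to tau^a * (1/d)^b.
    weights = [(tau[(current, n)] ** PHEROMONE_EXP) *
               ((1.0 / dist[(current, n)]) ** HEURISTIC_EXP) for n in neighbours]
    return random.choices(neighbours, weights=weights, k=1)[0]

def evaporate(tau):
    # Decay all trails at rate rho so that new paths keep being explored.
    for edge in tau:
        tau[edge] *= (1.0 - RHO)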

Table 16 gives the optimized routes to be used by the robots to achieve the inspection tasks. Combining Tables 14 and 16 yields Table 17, which shows optimized time cost, memory usage, route cost, and good communication.

4.5. Experiment 4: Benchmarking the New Hybrid Model (QLACS) with Popular Methods. Having chosen the optimized paths, QLACS performs cooperative inspection behaviour through communication among the robots. Looking at the sample result in Table 17, QLACS chooses the shortest, complete inspection routes from the different runs. In this case, the best paths were obtained from Table 16 by using 9 agents, 10 agents, and 12 agents. All the test runs gave best trail paths from both entrances, listing the complete states with length 10, as shown in Table 16.


Table 14: QLACS with communication.

Number of runs | 1 | 2 | 3 | 4 | 5 | 6 | 7
Iterations | 9 | 10 | 9 | 13 | 10 | 10 | 9
Time (sec) | 33.0019 | 31.0017 | 31.0018 | 31.0018 | 29.0023 | 30.0017 | 31.0018
Memory usage (bytes) | 15748 | 15948 | 15748 | 18232 | 16576 | 15948 | 15748
Inspected states (R1) | F, G, E | F, G, E, D | F, G, E | F, E | F, G, E, D, B | F, G, E, D | F, G, E
Inspected states (R2) | C, B, A, D | C, B, A | C, B, A, D | F, G, E, D, A | C, A | C, B, A | C, B, A, D

Table 15: Navigation behaviour parameter specification.

ACO properties | Type of ACO | Population | Length of path | Pheromone coefficient (β) | Heuristic coefficient (α) | Evaporation rate (ρ)
Properties | ACS | 9 or 12 | 8 | 2 | 3 | 0.01

Figure 9: Comparison of time costs for QL, ACS, and QLACS (time cost versus number of runs).

Figure 10: Comparison of route costs for QL and QLACS (route cost versus runs; path costs for R1 and R2 under QL and under QLACS).

That is, the trails contain the states from F to C and from C to F, where the length is the sum of the weights along the trail edges. QLACS then uses the optimized paths to make decisions on inspections. The result in Table 17 shows that no state is inspected twice or visited more than required; the QLACS model concentrates on making the best inspection decisions for the MRS. The first run in Table 17 shows that nine agents (ants) were used for the route finding; the optimized route was achieved after nine iterations in 7.0004 sec, and the states were all inspected effectively without redundancies.

Figure 11: Number of ants/iterations required for QLACS to find the optimal path.

Figure 12: Time and iteration required for QLACS to achieve optimal behaviour (iteration and time versus number of runs).

4.5.1. Comparative Study of the Proposed Model (QLACS) with QL Only and ACS Only. Based on the results tabulated in Table 18 and Figure 9, the average time costs of achieving the MRS behaviours with QL, ACS, and QLACS were compared. Two robots use an average of 30.8590 sec to achieve thorough inspection behaviour using QL, an average of 40.4309 sec to achieve optimal navigation behaviour using ACS, and an average of 9.2868 sec to achieve both navigation and cooperative behaviour using QLACS.


Table 16: Navigation behaviour computations.

Number of runs | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Number of ants | 5 | 7 | 8 | 9 | 10 | 11 | 12 | 15 | 28
Iterations | 12 | 26 | 14 | 16 | 23 | 14 | 16 | 23 | 22
Time (sec) | 37.0021 | 37.0021 | 39.0022 | 60.0035 | 32.0019 | 33.0019 | 35.002 | 43.0025 | 65.0037
Best trail distance (R1) | 12 | 12 | 10 | 10 | 10 | 10 | 10 | 10 | 10
Best trail distance (R2) | 10 | 12 | 12 | 10 | 10 | 12 | 10 | 10 | 10

Table 17: New hybrid model QLACS computations.

Number of runs | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Iterations | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9
Number of ants | 9 | 9 | 9 | 10 | 10 | 10 | 12 | 12 | 12
Time (sec) | 7.0004 | 8.0005 | 9.0005 | 11.0006 | 11.0006 | 12.0007 | 8.0005 | 9.0005 | 9.0006
Memory usage (bytes) | 10288 | 10280 | 10248 | 10288 | 10288 | 10288 | 10164 | 10712 | 10164
Inspected states (R1) | F, G, E, D | F, G, E, D | F, G, E, D | F, G, E, A, B | F, G, E, A, B | F, G, E, A, B | F, G, D, A | F, G, D, A | F, G, D, A
Inspected states (R2) | C, B, A | C, B, A | C, B, A | C, D | C, D | C, D | C, B, E | C, B, E | C, B, E

Table 18: Time cost comparison for QL, ACS, and QLACS.

Runs | QL time cost (sec) | ACS time cost (sec) | QLACS time cost (sec)
1 | 33.0019 | 37.0021 | 7.0004
2 | 31.0019 | 37.0021 | 8.004
3 | 31.0018 | 39.0022 | 9.0005
4 | 31.0018 | 32.0019 | 9.0005
5 | 29.0023 | 35.002 | 11.0006
6 | 30.0017 | 43.0025 | 12.0007
7 | 31.0018 | 60.0035 | 9.0006
Average | 30.8590 | 40.4309 | 9.2868

The result shows that our proposed integrated algorithm performs better, with a reduced time cost. On the same note, the route costs for QL and QLACS were also compared; the results in Table 19 and Figure 10 show that the proposed QLACS model gives a much lower route cost than QL.
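The averages quoted above can be reproduced directly from the columns of Table 18:

# Column averages of Table 18 (time costs in seconds over seven runs).
ql    = [33.0019, 31.0019, 31.0018, 31.0018, 29.0023, 30.0017, 31.0018]
acs   = [37.0021, 37.0021, 39.0022, 32.0019, 35.002, 43.0025, 60.0035]
qlacs = [7.0004, 8.004, 9.0005, 9.0005, 11.0006, 12.0007, 9.0006]
for name, col in (("QL", ql), ("ACS", acs), ("QLACS", qlacs)):
    print(name, round(sum(col) / len(col), 4))   # 30.859, 40.4309, 9.2868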

The number of ants and iterations used to achieve this is displayed in Figure 11; the more ants there are, the more iterations are needed. The best routes created in the five test runs shown in Figure 11 are run numbers 1 and 3, which used fewer ants and iterations to achieve the optimal routes. The optimal result for the proposed model is achieved within nine iterations for every run, and the iteration count remains constant for any number of agents and robots. The blue line with star markers in Figure 12 is the iteration value for the nine runs; the red line shows the time required for each run. The time is also almost stable for QLACS, though it fluctuates a little in the middle. This signifies that the proposed model is robust and effective in finding the optimal route and coordinating the MRS. The reduced route cost and shorter computation time achieved with QLACS satisfy the criteria for cooperative behavioural purposes.

4.6. Experiment 5: Scalability of QLACS Using Two, Three, and Four Robots. In this section we present results obtained by experimenting with the cooperative behavioural action of two, three, and four robots using the QLACS model. The performance of each team size was evaluated by running the simulation three times with the same number of agents. The proposed QLACS model shows good communication between the two, three, and four robots, as seen from the states inspected in the last four columns of Table 20, and an almost stable time was used in achieving the inspection for all the robots. The details of the simulation results are laid out in Table 20.

A notable observation emanating from Table 20 is that a larger mine area of inspection is needed: robot R1 in rows 4 to 6 could inspect only one state, while robots R1 and R2 in rows 7 to 9 could inspect only one state each. This implies that the size of the field of inspection should be proportional to the number of robots deployed.

5. Concluding Remarks

This paper has shown how MRS behave cooperatively in underground terrains. The QL algorithm was investigated for its potential for good communication, while the ACS algorithm was explored for its good navigation properties. The strengths of the two algorithms were harnessed and optimized to form the hybrid QLACS model, which addresses the behavioural situation in MRS effectively.

The analytical solution of the new model was explained in Algorithms 3 and 4 and Tables 7 and 10. The hybrid architecture of the model was described in Figure 7, and the experimental setup describing different navigation locations for two robots was shown in Figure 8.

The new hybrid model QLACS for cooperative behaviour of MRS in an underground terrain was proven able to find the optimal route and to handle cooperation between two robots


Table 19: Route cost comparison for QL and QLACS.

Runs | QL: path cost for R1 | QL: path cost for R2 | QLACS: path cost for R1 | QLACS: path cost for R2
1 | 20 | 20 | 10 | 10
2 | 32 | 19 | 10 | 10
3 | 28 | 14 | 10 | 10
4 | 27 | 27 | 10 | 10
5 | 39 | 30 | 10 | 10
Average | 29.5 | 22 | 10 | 10

Table 20: Summary of scalability performance of QLACS.

Row number | Number of robots | Time (sec) | States inspected by R1 | R2 | R3 | R4
1 | 2 | 10.0006 | 4 | 3 | - | -
2 | 2 | 11.0007 | 4 | 3 | - | -
3 | 2 | 8.0005 | 4 | 3 | - | -
4 | 3 | 16.0009 | 1 | 3 | 3 | -
5 | 3 | 17.001 | 1 | 3 | 3 | -
6 | 3 | 12.0006 | 1 | 3 | 3 | -
7 | 4 | 11.0006 | 1 | 1 | 3 | 2
8 | 4 | 14.0006 | 1 | 1 | 3 | 2
9 | 4 | 10.006 | 1 | 1 | 3 | 2
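A quick consistency check on Table 20: in every row the robots jointly account for all seven inspectable states of the environment.

# Per-robot inspection counts from Table 20; each row sums to the 7
# inspectable states, so the team always achieves full coverage.
rows = [(2, [4, 3]), (2, [4, 3]), (2, [4, 3]),
        (3, [1, 3, 3]), (3, [1, 3, 3]), (3, [1, 3, 3]),
        (4, [1, 1, 3, 2]), (4, [1, 1, 3, 2]), (4, [1, 1, 3, 2])]
for n_robots, counts in rows:
    assert len(counts) == n_robots and sum(counts) == 7
print("full coverage in all", len(rows), "runs")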

effectively. The cooperative behavioural problem was narrowed down to two situations: navigational and cooperative behaviours. ACS was used to establish the shortest optimal route for the two robots, while QL was used to achieve effective cooperation decisions. The results achieved after integrating the two algorithms showed a reduction in route costs and computation times, as shown in Figures 9, 10, 11, and 12. The comparative analysis between QL, ACS, and QLACS proved that the last-named is more robust and effective in achieving cooperative behaviour for MRS in the proposed domain.

The results of this work will be used for future simulations of MRS applications in hazardous environments. This research area can be expanded both towards real-life implementations and towards larger models; in particular, the model used in this work can be improved by increasing the state space.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the financial support of the Organisation for Women in Science for the Developing World (OWSDW) and L'Oreal-UNESCO for Women in Science, as well as the resources made available by the University of Cape Town (UCT) and the University of South Africa (UNISA), South Africa.

References

[1] J. Leitner, "Multi-robot cooperation in space: a survey," in Proceedings of the Advanced Technologies for Enhanced Quality of Life (AT-EQUAL '09), vol. 37, pp. 144–151, IEEE Computer Society, Iasi, Romania, July 2009.
[2] G. Dudek, M. R. M. Jenkin, E. Milios, and D. Wilkes, "A taxonomy for multi-agent robotics," Autonomous Robots, vol. 3, no. 4, pp. 375–397, 1996.
[3] A. Chella, G. Lo Re, I. Macaluso, M. Ortolani, and D. Peri, "Chapter 1: a networking framework for multi-robot coordination," in Recent Advances in Multi Robot Systems, A. Lazinica, Ed., pp. 1–14, 2008.
[4] C. Yinka-Banjo, I. O. Osunmakinde, and A. Bagula, "Autonomous multi-robot behaviours for safety inspection under the constraints of underground mine terrains," Ubiquitous Computing and Communication Journal, vol. 7, no. 5, pp. 1316–1328, 2012.
[5] S. Yarkan, S. Guzelgoz, H. Arslan, and R. Murphy, "Underground mine communications: a survey," IEEE Communications Surveys & Tutorials, vol. 11, no. 3, pp. 125–142, 2009.
[6] M. Peasgood, J. McPhee, and C. Clark, Complete and Scalable Multi-Robot Planning in Tunnel Environments, Digitalcommons, 2006.
[7] F. E. Schneider, D. Wildermuth, and M. Moors, "Methods and experiments for hazardous area activities using multi-robot system," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3559–3564, May 2004.
[8] R. Zlot and A. Stentz, "Market-based multirobot coordination for complex tasks," International Journal of Robotics Research, vol. 25, no. 1, pp. 73–101, 2006.
[9] L. E. Parker, "Current research in multi-robot systems," in Proceedings of the 7th International Symposium on Artificial Life and Robotics (ISAROB '03), vol. 7, pp. 1–15, 2003.
[10] C. Thorp and H. Durrant-Whyte, "Field robots," in Proceedings of the 10th International Symposium on Robotics Research, vol. 6, pp. 329–3240, 2003.
[11] S. R. Teleka, J. J. Green, S. Brink, J. Sheer, and K. Hlophe, "The automation of the 'Making Safe' process in South African hard-rock underground mines," International Journal of Engineering and Advanced Technology (IJEAT), vol. 1, pp. 1–7, April 2012.
[12] L. Cragg and H. Hu, "Application of reinforcement learning to a mobile robot in reaching recharging station operation," in Intelligent Production Machines and Systems, pp. 357–363, 2005.
[13] R. A. C. Bianchi, C. H. C. Ribeiro, and A. H. R. Costa, "On the relation between ant colony optimization and heuristically accelerated reinforcement learning," in Proceedings of the 1st International Workshop on Hybrid Control of Autonomous Systems, pp. 49–55, 2009.
[14] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, Oxford, UK, 1999.
[15] Y. Liu and K. M. Passino, Swarm Intelligence: Literature Overview, Department of Electrical Engineering, University of Ohio, 2000.
[16] L. Li, A. Martinoli, and Y. S. Abu-Mostafa, "Emergent specialization in swarm systems," in Intelligent Data Engineering and Automated Learning (IDEAL 2002), vol. 2412 of Lecture Notes in Computer Science, pp. 261–266, 2002.
[17] B. Englot and F. Hover, "Multi-goal feasible path planning using ant colony optimization," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 2255–2260, May 2011.
[18] Z. N. Naqiv, H. Matheru, and K. Chadha, "Review of ant colony optimization on vehicle routing problems and introduction to estimated-based ACO," in International Conference on Environment Science and Engineering (IPCBEE '11), vol. 8, pp. 161–166, 2011.
[19] M. Dorigo and T. Stutzle, "The ant colony optimization metaheuristic: algorithms, applications, and advances," Tech. Rep. IRIDIA/2000-32, 2000.
[20] R. Claes and T. Holvoet, "Cooperative ant colony optimization in traffic route calculations," in Advances on Practical Applications of Agents and Multi-Agent Systems, vol. 155 of Advances in Intelligent and Soft Computing, pp. 23–34, 2012.


Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 15: Cooperative Behaviours with Swarm Intelligence in Multirobot


Table 10: Continued.

End of policy iteration

Robot 1:
State   Inspect   Ignore   Shutdown
F       Yes       No       No
E       Yes       No       No
D       Yes       No       No
A       No        Yes      No
B       No        Yes      No
C       No        Yes      No
G       Yes       No       No
C       No        Yes      Yes (through H)

Robot 2:
State   Inspect   Ignore   Shutdown
C       Yes       No       No
B       Yes       No       No
A       Yes       No       No
D       No        Yes      No
F       No        Yes      No
E       No        Yes      No
G       No        Yes      No
C       No        Yes      Yes (through H)

… states. Since there is no communication, they exit the mine after learning, consequently not inspecting all the states. The same routine is reflected in Table 12, but in this case each robot ends up inspecting all the states before exiting the mine. Analysis of Tables 11 and 12 shows that resources are wasted and the inspection job is neither effective nor efficient. Comparing the two tables, the time and distance costs are high in both, though higher in Table 12 because of the many repetitions. For instance, the time and distance costs in Table 12, column 2, are 48.0028 sec and ((F, G, E, D, A, B, C), (C, B, A, D, E, G, F)), respectively. The tables also show that states are repeated in Tables 11 and 12, and the memory usage is equally high. This led us to create a more efficient QLACS with communication by introducing some heuristics. The processes involved in Tables 11 and 12 are explained in Table 13.

One can see from Tables 11 and 12 that there is no good communication among the robots, hence the development of Table 14.

4.3. Experiment 2: Performance of QLACS with Good Cooperation. The heuristic added to the known QL made this experiment show good communication; this is where our contribution to communication is shown. As a robot inspects and learns the environment, it broadcasts its Q-values to the other robot in the form of a lookup table. Each robot then checks the Q-values: when a Q-value is equal to zero, the robot randomly selects a state for inspection; if a Q-value is greater than zero, the robot checks whether the state is available for inspection or is to be ignored. When a robot encounters a state with a Q-value equal to zero and the thread next to the state is equal to the goal state (H), then it shuts down, having checked the lookup table to confirm that all states have been inspected. The result in Table 14 shows good communication between the two robots: no states were inspected more than once. The iterations for every run, with their times, memory usage, and effective communication, are also displayed in Table 14.
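As a rough illustration of this decision rule (not the authors' implementation), the sketch below models the shared lookup table as a dictionary of Q-values keyed by state; the class name Robot and the helpers choose_next_state and step are hypothetical, and the broadcast is simplified to both robots reading and writing the same table.

```python
import random

STATES = ["A", "B", "C", "D", "E", "F", "G"]   # inspection states
GOAL = "H"                                     # exit / goal state

class Robot:
    """Sketch of the communication heuristic: both robots share one Q-value
    lookup table, so a state already inspected (nonzero Q-value) is skipped."""

    def __init__(self, name, shared_table):
        self.name = name
        self.shared = shared_table             # lookup table broadcast between robots

    def choose_next_state(self, candidates):
        # A zero Q-value marks a state nobody has inspected yet; pick one at random.
        unvisited = [s for s in candidates if self.shared.get(s, 0.0) == 0.0]
        return random.choice(unvisited) if unvisited else None

    def step(self, candidates):
        state = self.choose_next_state(candidates)
        if state is None:
            # Every state carries a nonzero Q-value: inspection is complete,
            # so the robot heads for the goal state and shuts down.
            return GOAL
        self.shared[state] = 1.0               # inspect and broadcast a nonzero Q-value
        return state

# Usage: two robots sharing one lookup table, so no state is inspected twice.
shared_q = {s: 0.0 for s in STATES}
r1, r2 = Robot("R1", shared_q), Robot("R2", shared_q)
visited = []
while any(value == 0.0 for value in shared_q.values()):
    for robot in (r1, r2):
        state = robot.step(STATES)
        if state != GOAL:
            visited.append((robot.name, state))
print(visited)
```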

Comparing Table 14 with Tables 11 and 12, one cannot but notice the huge differences in the memory, time, and distance costs. The communication between the robots in Table 14 resulted in achieving good inspection; however, the random method of choosing the next inspection state did not give an optimized route, thereby increasing the time cost of passing through and checking already inspected states.

Figure 8: Different experimental behaviours for two robots: (a) two robots starting inspection from two different entrances; (b) two robots inspecting as they navigate in the environment; (c) two robots inspecting different locations with different positions.

4.4. Experiment 3: Performance of QLACS for the Navigation Behaviour of the Proposed Model. The result of implementing the first component of QLACS for effective navigation of the proposed model is tabulated in Table 16.


Table 11: QLACS without communication (inspecting some states).

Number of runs          1        2        3        4        5        6        7
Iterations              9        10       10       13       13       9        13
Time (sec)              43.0025  30.0017  34.002   30.0017  31.0017  31.0018  27.0016
Memory usage (bytes)    18872    18160    19204    18208    17896    18164    18308
Inspected states (R1)   G, FG, C, C, FD, B, E, A, F, ED, A, C, F, G, E, A
Inspected states (R2)   BA, G, C, C, B, A, E, F, B, A, C, B, E, B, A, E, A, G

Table 12: QLACS without communication (inspecting all states).

Number of runs          1        2        3        4        5        6        7
Iterations              114      10       70       10       9        10       10
Time (sec)              43.0024  48.0028  41.0023  34.0019  34.0019  34.0019  35.002
Memory usage (bytes)    28440    17960    27216    17000    18456    17672    17968
Inspected states (R1)   F, G, E, D, A, B, C (same for all seven runs)
Inspected states (R2)   C, B, A, D, E, G, F (same for all seven runs)

Table 13: Processes used in achieving Tables 11 and 12.

(a) For Table 11: inspecting some states.
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zeroes.
(3) Select a random action if the Q-values of all possible actions at the current state are zeroes.
(4) Select the action with the highest Q-value if it is the maximum Q-value.
(5) Compute and update the Q-value of the selected action.
(6) Get a new state among the possible states.
(7) If new state = goal state, then go to Steps 5 and 9.
(8) Repeat Steps 3 to 6 until Step 7.
(9) Shutdown.

(b) For Table 12: inspecting all states.
(1) Initialize the starting and goal states.
(2) Initialize all Q-values to zeroes.
(3) Select a random action if the Q-values of all possible actions at the current state are zeroes.
(4) Select the action with the highest Q-value if it is the maximum Q-value.
(5) Compute and update the Q-value of the selected action.
(6) Get a new state among the possible states.
(7) If new state = goal state, then go to Steps 5 and 10.
(8) Repeat Steps 3 to 7 until Step 9.
(9) All states except the goal state have taken Action = Inspect.
(10) Shutdown.
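Read as code, the steps in Table 13(b) amount to a standard tabular Q-learning episode loop. The sketch below is an illustrative reconstruction only: the adjacency map, the reward of 100 for reaching the goal, and the learning rate and discount factor of 0.8 are placeholder assumptions, not values taken from the paper.

```python
import random

# Hypothetical adjacency map of the mine states (placeholder, not the paper's graph).
NEIGHBOURS = {
    "A": ["B", "D"], "B": ["A", "C"], "C": ["B", "G", "H"],
    "D": ["A", "E"], "E": ["D", "F", "G"], "F": ["E", "G"],
    "G": ["C", "E", "F", "H"], "H": [],
}
GOAL = "H"
ALPHA, GAMMA = 0.8, 0.8        # assumed learning rate and discount factor
REWARD_GOAL = 100.0            # assumed reward for reaching the goal state

def q_learn_inspection(start, episodes=50):
    """Sketch of Table 13(b): learn Q-values until all non-goal states are inspected."""
    q = {(s, a): 0.0 for s, acts in NEIGHBOURS.items() for a in acts}   # Step 2
    for _ in range(episodes):
        state, inspected = start, set()
        while state != GOAL:
            actions = NEIGHBOURS[state]
            if all(q[(state, a)] == 0.0 for a in actions):
                action = random.choice(actions)                         # Step 3
            else:
                action = max(actions, key=lambda a: q[(state, a)])      # Step 4
            reward = REWARD_GOAL if action == GOAL else 0.0
            best_next = max((q[(action, b)] for b in NEIGHBOURS[action]), default=0.0)
            # Step 5: compute and update the Q-value of the selected action.
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            inspected.add(state)            # every visited non-goal state is inspected
            state = action                  # Step 6: move to the new state
        if inspected == set(NEIGHBOURS) - {GOAL}:
            break                           # Steps 9-10: full coverage reached, shut down
    return q

q_values = q_learn_inspection("F")
print(sorted(q_values.items(), key=lambda kv: kv[1], reverse=True)[:5])
```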

Table 15 shows the selected parameters used in achieving the optimal path. The optimal path found after nine test runs is the path with a distance cost of 10 for both entries to the environment, displayed in Table 16. Therefore, columns 4, 5, 7, 8, and 9 can be used as the optimized path input for the first component of QLACS. QLACS will then use any of the optimized paths to navigate to the specified states and take decisions accordingly. For instance, the test run with nine ants gives 16 iterations under 60.0035 seconds and gives the shortest paths to both robots coming from entries F and C of the environment. The paths that give a cost of 10 are obtained from F-E-D-A-B-C-G-C (1, 2, 2, 2, 1, 1, 1) and C-B-A-D-F-G-E-G-C (1, 2, 2, 1, 1, 2, 2, 1), respectively.

Alpha (α), Beta (β), and Rho (ρ) represent the heuristic properties of the ACS algorithm. The Alpha factor is the measure of the influence of pheromone concentration, which can drive the selection of the path with the highest pheromone concentration. Beta is the measure of the influence related to the distance between any two adjacent nodes; it is a heuristic factor that measures the influence of distance in selecting the next state and is not limited to pheromones. Rho is the rate at which the pheromones evaporate; it governs how often new paths are explored by the agents/ants rather than old paths being reinforced. Each agent cooperates by having access to the other agents' pheromone values. The pheromone value is initialized to 0.01 and is continually updated until learning stops. This is similar to the first component of QLACS, where we initialize all QLACS positions to zero and update them at every new step.
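A hedged sketch of how these parameters enter the standard ACS rules is given below. The graph, edge weights, and exponent values are illustrative assumptions (only the evaporation rate and the initial pheromone value of 0.01 come from the text above), and the function names are hypothetical.

```python
# Illustrative weighted graph; the edges and weights are placeholders,
# not the paper's exact environment map.
EDGES = {
    ("F", "E"): 1, ("E", "D"): 2, ("D", "A"): 2, ("A", "B"): 2,
    ("B", "C"): 1, ("C", "G"): 1, ("G", "C"): 1, ("F", "G"): 2,
    ("G", "E"): 2, ("C", "B"): 1, ("B", "A"): 2, ("A", "D"): 2,
    ("D", "F"): 1, ("E", "G"): 2,
}
ALPHA, BETA = 2.0, 3.0          # assumed pheromone and heuristic exponents
RHO, TAU0 = 0.01, 0.01          # evaporation rate and initial pheromone, as in the text

pheromone = {edge: TAU0 for edge in EDGES}

def transition_probabilities(current, candidates):
    """ACS-style rule: p(i, j) is proportional to tau(i, j)^alpha * (1 / d(i, j))^beta."""
    scores = {
        nxt: (pheromone[(current, nxt)] ** ALPHA) * ((1.0 / EDGES[(current, nxt)]) ** BETA)
        for nxt in candidates
    }
    total = sum(scores.values())
    return {nxt: score / total for nxt, score in scores.items()}

def evaporate_and_deposit(route):
    """Global update: evaporate every trail, then deposit pheromone on the chosen route."""
    cost = sum(EDGES[edge] for edge in zip(route, route[1:]))
    for edge in pheromone:
        pheromone[edge] *= (1.0 - RHO)
    for edge in zip(route, route[1:]):
        pheromone[edge] += RHO * (1.0 / cost)
    return cost

# Usage: score the moves available from entry F, then reinforce one candidate route.
print(transition_probabilities("F", ["E", "G"]))
print(evaporate_and_deposit(["F", "E", "D", "A", "B", "C", "G", "C"]))
```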

Table 16 gave the optimized routes to be used by the robots to achieve the inspection tasks. This resulted in combining Tables 14 and 16 to achieve Table 17. Table 17 shows optimized time cost, memory usage, route cost, and good communication.

4.5. Experiment 4: Benchmarking the New Hybrid Model (QLACS) with Popular Methods. Choosing the optimized paths, QLACS performs cooperative inspection behaviour through communication among the robots. Looking at the sample results in Table 17, QLACS chooses the shortest complete possible inspection routes from the different runs. In this case the best paths were obtained from Table 16 by using 9 agents, 10 agents, and 12 agents. All the test runs gave best trail paths from both entrances, listing the complete states with length 10, as shown in Table 16.


Table 14: QLACS with communication.

Number of runs          1        2        3        4        5        6        7
Iterations              9        10       9        13       10       10       9
Time (sec)              33.0019  31.0017  31.0018  31.0018  29.0023  30.0017  31.0018
Memory usage (bytes)    15748    15948    15748    18232    16576    15948    15748
Inspected states (R1)   F, G, E, F, G, ED, F, G, E, F, E, F, G, ED, B, F, G, ED, F, G, E
Inspected states (R2)   C, B, AD, C, B, A, C, B, AD, F, G, ED, A, CA, C, B, A, C, B, AD

Table 15: Navigation behaviour parameter specification.

ACO properties   Type of ACO   Population   Length of path   Pheromone coefficient (β)   Heuristic coefficient (α)   Evaporation rate (ρ)
Properties       ACS           9 or 12      8                2                           3                           0.01

Figure 9: Comparison of time costs for QL, ACS, and QLACS (time cost versus number of runs for the QL, ACS, and QLACS series).

Figure 10: Comparison of route costs for QL and QLACS (route cost versus runs for the R1 and R2 path costs under QL and the common R1/R2 path cost under QLACS).

That is, they have states from F to C and from C to F. The length is the sum of the weights along the trail edges. QLACS then uses the optimized paths to make decisions on inspections. The result in Table 17 shows that no state is inspected twice or visited more than required; the QLACS model concentrates on making the best inspection decisions for the MRS. The first run in Table 17 shows that nine agents/ants were used for the route finding, the optimized route was achieved after nine iterations under 7.0004 sec, and the states were all inspected effectively without redundancies.
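Concretely, a trail's length is just the sum of its edge weights; for the first optimized route quoted above this gives

```latex
\[
  \operatorname{cost}(F \to E \to D \to A \to B \to C \to G \to C)
  = 1 + 2 + 2 + 2 + 1 + 1 + 1 = 10 .
\]
```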

Figure 11: Number of ants/iterations required for QLACS to find the optimal path (ants/iteration versus runs).

Figure 12: Time and iteration required for QLACS to achieve optimal behaviour (iteration and time versus number of runs).

4.5.1. Comparative Study of the Proposed Model (QLACS) with QL Only and ACS Only. Based on the results tabulated in Table 18 and Figure 9, the average time costs of achieving the MRS behaviours for QL, ACS, and QLACS were compared. Two robots will use an average of 30.8590 sec to achieve thorough inspection behaviour using QL, an average of 40.4309 sec to achieve optimal navigation behaviour using ACS, and an average of 9.2868 sec to achieve both navigation and cooperative behaviour using QLACS.


Table 16: Navigation behaviour computations.

Number of runs             1        2        3        4        5        6        7        8        9
Number of ants             5        7        8        9        10       11       12       15       28
Iterations                 12       26       14       16       23       14       16       23       22
Time (sec)                 37.0021  37.0021  39.0022  60.0035  32.0019  33.0019  35.002   43.0025  65.0037
Best trail distance (R1)   12       12       10       10       10       10       10       10       10
Best trail distance (R2)   10       12       12       10       10       12       10       10       10

Table 17: New hybrid model QLACS computations.

Number of runs          1          2          3          4            5            6            7          8          9
Iterations              9          9          9          9            9            9            9          9          9
Number of ants          9          9          9          10           10           10           12         12         12
Time (sec)              7.0004     8.0005     9.0005     11.0006      11.0006      12.0007      8.0005     9.0005     9.0006
Memory usage (bytes)    10288      10280      10248      10288        10288        10288        10164      10712      10164
Inspected states (R1)   F, G, E, D  F, G, E, D  F, G, E, D  F, G, E, A, B  F, G, E, A, B  F, G, E, A, B  F, G, D, A  F, G, D, A  F, G, D, A
Inspected states (R2)   C, B, A    C, B, A    C, B, A    C, D         C, D         C, D         C, B, E    C, B, E    C, B, E

Table 18: Time cost comparison for QL, ACS, and QLACS.

Runs      QL time cost (sec)   ACS time cost (sec)   QLACS time cost (sec)
1         33.0019              37.0021               7.0004
2         31.0019              37.0021               8.004
3         31.0018              39.0022               9.0005
4         31.0018              32.0019               9.0005
5         29.0023              35.002                11.0006
6         30.0017              43.0025               12.0007
7         31.0018              60.0035               9.0006
Average   30.8590              40.4309               9.2868

The results show that our proposed integrated algorithm performs better, with a reduced time cost. On the same note, the route costs for QL and QLACS were also compared. The results in Table 19 and Figure 10 show that the proposed QLACS model gave a much lower route cost than QL.
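As a quick consistency check on Table 18 (assuming the decimal placements shown there), the QLACS column averages to

```latex
\[
  \bar{t}_{\mathrm{QLACS}}
  = \tfrac{1}{7}\,(7.0004 + 8.004 + 9.0005 + 9.0005 + 11.0006 + 12.0007 + 9.0006)
  \approx 9.2868 \ \text{sec},
\]
```

which is about 30% of the QL average (30.8590 sec) and 23% of the ACS average (40.4309 sec).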

The number of ants and iterations used to achieve this is displayed in Figure 11: the more ants there are, the more iterations are required. The best routes created in the five test runs shown in Figure 11 are run numbers 1 and 3; they used fewer ants and iterations to achieve the optimal routes. The optimal result for the proposed model is achieved under nine iterations for every run, and the number of iterations remains constant for any number of agents and robots. The blue line with star markers in Figure 12 is the iteration value for the nine runs, while the red line shows the different amounts of time required for each run. The time is also almost stable for QLACS, though it fluctuates a little in the middle. This signifies that the proposed model is more robust and effective in finding the optimal route and coordinating the MRS. The reduced route cost and shorter computation time achieved with QLACS satisfied the criteria for cooperative behavioural purposes.

4.6. Experiment 5: Scalability of QLACS Using Two, Three, and Four Robots. In this section we present some results obtained by experimenting with the cooperative behavioural action of two, three, and four robots using the QLACS model. The performance of the two, three, and four robots was evaluated by running the simulation three times with the same number of agents. The performance of the proposed QLACS model shows good communication between the two, three, and four robots in terms of the states inspected, given in the last four columns of Table 20. An almost stable time was used in achieving the inspection for all the robots. The details of the simulation results are laid out in Table 20.

A notable observation emanating from Table 20 is that there is a need for a larger mine area of inspection, because robot R1 in rows 4 to 6 could inspect only one state, while robots R1 and R2 in rows 7 to 9 could inspect only one state each. This implies that the size of the field of inspection should be proportional to the number of robots to be deployed.

5. Concluding Remarks

This paper has shown how MRS behave cooperatively in underground terrains. The QL algorithm has been investigated for its potential quality of good communication, while the ACS algorithm was explored for its good navigation properties. The strengths of the two algorithms have been harnessed and optimized to form a hybrid model, QLACS, which addressed the behavioural situation in MRS effectively.

The analytical solution of the new model was explained as shown in Algorithms 3 and 4 and Tables 7 and 10. The hybrid architecture for the model was also described in Figure 7. The experimental setup describing different navigation locations for two robots was shown in Figure 8.

The new hybrid model QLACS for cooperative behaviour of MRS in an underground terrain was proven able to find the optimal route and handle cooperation between two robots effectively.


Table 19: Route cost comparison for QL and QLACS.

Runs      QL path cost (R1)   QL path cost (R2)   QLACS path cost (R1)   QLACS path cost (R2)
1         20                  20                  10                     10
2         32                  19                  10                     10
3         28                  14                  10                     10
4         27                  27                  10                     10
5         39                  30                  10                     10
Average   29.5                22                  10                     10

Table 20: Summary of scalability performance on QLACS.

Row   Number of robots   Time (sec)   States inspected by R1   R2   R3   R4
1     2                  10.0006      4                        3    -    -
2     2                  11.0007      4                        3    -    -
3     2                  8.0005       4                        3    -    -
4     3                  16.0009      1                        3    3    -
5     3                  17.001       1                        3    3    -
6     3                  12.0006      1                        3    3    -
7     4                  11.0006      1                        1    3    2
8     4                  14.0006      1                        1    3    2
9     4                  10.006       1                        1    3    2

The cooperative behavioural problem was narrowed down to two situations: navigational and cooperative behaviours. ACS was used to establish the shortest optimal route for the two robots, while QL was used to achieve effective cooperation decisions. The results achieved after integrating the two algorithms showed a reduction in route costs and computation times, as shown in Figures 9, 10, 11, and 12. The comparative analysis between QL, ACS, and QLACS proved that the last-named is more robust and effective in achieving cooperative behaviour for MRS in the proposed domain.

The results of this work will be used for future simulation of MRS applications in hazardous environments. An indication of expansion of this research area has conspicuously surfaced in both real-life implementations and model growth. The model used in this work can also be improved upon by increasing the state space.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors gratefully acknowledge the financial support of the Organisation for Women in Science for the Developing World (OWSDW), L'Oréal-UNESCO for Women in Science, and the resources made available by the University of Cape Town (UCT) and the University of South Africa (UNISA), South Africa.



Page 16: Cooperative Behaviours with Swarm Intelligence in Multirobot

16 Mathematical Problems in Engineering

Table 11 QLACS without communication (inspecting some states)

Number of runs 1 2 3 4 5 6 7Iterations 9 10 10 13 13 9 13Time (sec) 430025 300017 34002 300017 310017 310018 270016Memory usage (bytes) 18872 18160 19204 18208 17896 18164 18308Inspected states (R1) 119866 119865119866 119862 119862 119865119863 119861 119864 119860 119865 119864119863 119860 119862 119865 119866 119864 119860

Inspected states (R2) 119861119860 119866 119862 119862 119861 119860 119864 119865 119861 119860 119862 119861 119864 119861 119860 119864 119860 119866

Table 12 QLACS without communication (inspecting all states)

Number of runs 1 2 3 4 5 6 7Iterations 114 10 70 10 9 10 10Time (sec) 430024 480028 410023 340019 340019 340019 35002Memory usage (bytes) 28440 17960 27216 17000 18456 17672 17968

Inspected states (R1) F G E D AB C

F G E D AB C

F G E D AB C

F G E D AB C

F G E D AB C

F G E D AB C

F G E D AB C

Inspected states (R2) C B A D EG F

C B A D EG F

C B A D EG F

C B A D EG F

C B A D EG F

C B A D EG F

C B A D EG F

Table 13 Processes used in achieving Tables 11 and 12

(a) For Table 11

Inspecting some States(1) Initialize the starting and goal states(2) Initialize all 119876values to zeroes(3) Select a random action If 119876values of all possible actions atcurrent state are zeroes(4) Select the action with the highest 119876value If it is the Max119876value(5) Compute and update 119876value of the selected action(6) Get new state among possible states(7) If new state = goal state then go to Steps 5 and 9(8) Repeat Steps 3 to 6 until Step 7(9) Shutdown

(b) For Table 12

Inspecting all States(1) Initialize the starting and goal states(2) Initialize all 119876values to zeroes(3) Select a random action If 119876values of all possible actions atcurrent state are zeroes(4) Select the action with the highest 119876value If it is the Max119876value(5) Compute and update 119876value of the selected action(6) Get new state among possible states(7) If new state = goal state then go to Steps 5 and 10(8) Repeat Steps 3 to 7 until Step 9(9) All states except the goal state has taken Action = Inspect(10) Shutdown

the selected parameters used in achieving the optimal pathThe optimal path found after nine test runs is the path witha distance cost of 10 for both entries to the environmentdisplayed in Table 16 Therefore columns 4 5 7 8 and 9 canbe used as the optimized path input for the first component ofQLACS Then QLACS will use any of the optimized paths to

navigate to the specified states and take decisions accordinglyFor instance the test run result for nine ants gives 16 iterationsunder 600035 seconds and giving the shortest paths to bothrobots coming from entries 119865 and 119862 of the environmentThepath that gives us cost as 10 is obtained from FEDABCGC(1222111) and CBADFGEGC (12211221) respectively

Alpha (120572) Beta (120573) and Rho (120588) represent the heuristicproperties of the ACS algorithm The Alpha factor is themeasure of the influence of pheromone concentration thatcan influence the selection of the path with the highestpheromone concentration Beta is the measure of the influ-encewhich is related to the distance between any two adjacentnodes It is also a heuristic factor that can measure theinfluence distance in selecting the next state It is not limitedto pheromones Rho has to do with the rate at which thepheromones evaporate Rho shows how often new pathsare explored by the agentsants rather than reinforcing oldpaths Each agent cooperates by having access to other agentsrsquopheromone valuesThe pheromone value is initialized to 001and it is continually updated until learning stops It is similarto the first component of QLACS where we initialize allQLACS positions to zero and update for every new step

Table 16 gave the optimized route to be used by the robotsto achieve inspection tasksThis resulted in combining Tables14 and 16 to achieve Table 17 Table 17 shows optimized timecost memory usage route cost and good communication

45 Experiment 4 Benchmarking the New Hybrid Model(QLACS) with Popular Methods Choosing the optimizedpaths QLACS performs cooperative inspection behaviourthrough communication among robots Looking at the sam-ple result on Table 17 QLACS chooses the shortest andcomplete possible inspection routes from different runs Inthis case the best paths were obtained from Table 16 by using9 agents 10 agents and 12 agents All the test runs gave besttrail paths from both entrances listing the complete states

Mathematical Problems in Engineering 17

Table 14 QLACS with communication

Number of runs 1 2 3 4 5 6 7Iterations 9 10 9 13 10 10 9Time (sec) 330019 310017 310018 310018 290023 300017 310018Memory usage (bytes) 15748 15948 15748 18232 16576 15948 15748Inspected states (R1) 119865 119866 119864 119865 119866 119864119863 119865 119866 119864 119865 119864 119865 119866 119864119863 119861 119865 119866 119864119863 119865 119866 119864

Inspected states (R2) 119862 119861 119860119863 119862 119861 119860 119862 119861 119860119863 119865 119866 119864119863 119860 119862119860 119862 119861 119860 119862 119861 119860119863

Table 15 Navigation behaviour parameter specification

ACO properties Type of ACO Population Length of path Pheromonecoefficient 120573

Heuristiccoefficient 120572 Evaporation rate 120588

Properties ACS 9 or 12 8 2 3 001

0

10

20

30

40

50

60

70

0 1 2 3 4 5 6 7 8

Tim

e cos

t

Number of runs

QL time cost

ACS time cost

QLACS time cost

Figure 9 Comparison of time costs for QL ACS and QLACS

05

1015202530354045

0 1 2 3 4 5 6

Rout

e cos

t

Runs

Path cost for R1 (QL)

Path cost for R2 (QL)

Path cost for R1 and R2 (QLACS)

Figure 10 Comparison of route costs for QL and QLACS

with length 10 as shown in Table 16 that is they have statesfrom 119865 to 119862 and from 119862 to 119865 The length is the addition ofweights along the line (trail edges) Then the QLACS usesthe optimized paths to make decisions on inspections Theresult from Table 17 shows that no state is inspected twice orvisited more than required The QLACS model concentrateson making the best inspection decisions for MRS The firstrun on Table 17 shows that nine agentsants were used forthe route finding the optimized route was achieved after nineiterations under 70004 sec and the states where all inspectedeffectively without redundancies

0

5

10

15

20

25

30

1 2 3 4 5

Num

ber o

f ant

site

ratio

ns

Runs

AntsIteration

Figure 11 Number of antsiteration required for QLACS to findoptimal path

02468

101214

0 2 4 6 8 10

Tim

eite

ratio

n

Number of runs

Iteration

Time

Figure 12 Time and iteration required for QLACS to achieveoptimal behavior

451 Comparative Study of the ProposedModel (QLACS) withQL Only and ACS Only Based on the results tabulated inTable 18 and Figure 9 the average time costs of achieving theMRS behaviours for QL ACS and QLACS were comparedTwo robots will use an average of 308590 sec to achievethorough inspection behaviour using QL an average of404309 sec to achieve optimal navigation behaviour usingACS and an average of 92868 sec to achieve both navigationand cooperative behaviour using QLACS The result shows

18 Mathematical Problems in Engineering

Table 16 Navigation behaviour computations

Number of runs 1 2 3 4 5 6 7 8 9Number of Ants 5 7 8 9 10 11 12 15 28Iterations 12 26 14 16 23 14 16 23 22Time (sec) 370021 370021 390022 600035 320019 330019 35002 430025 650037Best train distance (R1) 12 12 10 10 10 10 10 10 10Best trail distance (R2) 10 12 12 10 10 12 10 10 10

Table 17 New hybrid model QLACS computations

Number of runs 1 2 3 4 5 6 7 8 9Iterations 9 9 9 9 9 9 9 9 9Number of ants 9 9 9 10 10 10 12 12 12Time (sec) 70004 80005 90005 110006 110006 120007 80005 90005 90006Memory usage (bytes) 10288 10280 10248 10288 10288 10288 10164 10712 10164Inspected states (R1) 119865 119866 119864119863 119865 119866 119864119863 119865 119866 119864119863 119865 119866 119864 119860 119861 119865 119866 119864 119860 119861 119865 119866 119864 119860 119861 119865 119866119863 119860 119865 119866119863 119860 119865 119866119863 119860

Inspected states (R2) 119862 119861 119860 119862 119861 119860 119862 119861 119860 119862119863 119862119863 119862119863 119862 119861 119864 119862 119861 119864 119862 119861 119864

Table 18 Time cost comparison for QL ACS and QLACS

Runs QL time cost(sec)

ACS time cost(sec)

QLACS timecost (sec)

1 330019 370021 700042 310019 370021 80043 310018 390022 900054 310018 320019 900055 290023 35002 1100066 300017 430025 1200077 310018 600035 90006Average 308590 404309 92868

that our proposed integrated algorithm performs better withreduced time cost On the same note the route costs for theQL and QLACS were also compared The results in Table 19and Figure 10 show that the proposed model QLACS gave amuch lower route cost than the QL

The number of ants and iterations used to achieve thisis displayed in Figure 11 The more the ants the more theiterationThe best routes created in the five test runs shown inFigure 11 are run numbers 1 and 3 They used fewer ants anditerations to achieve the optimal routes The optimal resultfor the proposed model is achieved under nine iterations forevery run The iteration remains constant for any number ofagents and robotsThe blue line with starmarkers at Figure 12is the iteration value for 9 runs The red line shows thedifferent amounts of time required for each run The timeis also almost stable for the QLACS though it fluctuated alittle in the middle This signifies that the proposed modelis more robust and effective in finding the optimal routeand coordinating MRS The reduced route cost and shortercomputation time achieved with the QLACS satisfied thecriteria for cooperative behavioural purposes

46 Experiment 4 Scalability of QLACS Using Two Threeand Four Robots In this section we present some resultsobtained by experimenting with the cooperative behaviouralaction of two three and four robots using the QLACSmodelThe performance of the two three and four robots wasevaluated by running the simulation three times using thesame number of agents The performance of the proposedQLACSmodel shows good communication between the twothree and four robots under the states inspected in the lastfour columns of Table 20 An almost stable time was used inachieving the inspection for all the robots The detail of thesimulation result is laid out in Table 20

A notable observation emanating from Table 20 is thatthere is a need for a larger mine area of inspection becauserobot R1 in rows 4 to 6 could inspect only one state whilerobots R1 and R2 in rows 7 to 9 could inspect only one stateeach This implies that the size of the field of inspection isproportional to the number of robots to be deployed

5 Concluding Remarks

This paper has shown how MRS behave cooperatively inunderground terrains The QL algorithm has been inves-tigated for its potential quality of good communicationwhile the ACS algorithm was explored for good navigationproperties The strengths of the two algorithms have beenharnessed and optimized to form a hybrid model QLACSwhich addressed the behavioural situation inMRS effectively

The analytical solution of the newmodel was explained asshown in Algorithms 3 and 4 and Tables 7 and 10The hybridarchitecture for themodel was also described in Figure 7Theexperimental setup describing different navigation locationsfor two robots was shown in Figure 8

The new hybrid model QLACS for cooperative behaviourof MRS in an underground terrain was proven able to findthe optimal route andhandle cooperation between two robots

Mathematical Problems in Engineering 19

Table 19 Route cost comparison for QL and QLACS

Runs QL (sec) QLACS (sec)Path cost for R1 Path cost for R2 Path cost for R1 Path cost for R2

1 20 20 10 102 32 19 10 103 28 14 10 104 27 27 10 105 39 30 10 10Average 295 22 10 10

Table 20 Summary of scalability performance on QLACS

Row numbers Number of robots Time (sec) Number of states inspectedRobot 1 (R1) Robot 2 (R2) Robot 3 (R3) Robot 4 (R4)

1 2 100006 4 32 2 110007 4 33 2 80005 4 34 3 160009 1 3 35 3 17001 1 3 36 3 120006 1 3 37 4 110006 1 1 3 28 4 140006 1 1 3 29 4 10006 1 1 3 2

effectively The cooperative behavioural problem was nar-rowed down to two situations navigational and cooperativebehaviours ACS was used to establish the shortest optimalroute for two robots while the QL was used to achieveeffective cooperation decisions The results achieved afterintegrating the two algorithms showed a reduction in routecosts and computation times as shown in Figures 9 10 11 and12 The comparative analysis between QL ACS and QLACSproved that the last-named is more robust and effective inachieving cooperative behaviour for MRS in the proposeddomain

The result of this work will be used for future simulationof MRS applications in hazardous environments An indica-tion of expansion of this research area has conspicuously sur-faced in both real life implementations and model increaseThe model used in this work can also be improved upon byincreasing the state space

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors gratefully acknowledge the financial support ofthe Organisation for Women in Science for the DevelopingWorld (OWSDW) Lrsquooreal-Unesco for Women in Scienceand the resources made available by the University of Cape

Town (UCT) and University of South Africa (UNISA) SouthAfrica

References

[1] J Leitner ldquoMulti-robot cooperation in space a surveyrdquo inProceedings of the Advanced Technologies for Enhanced Qualityof Life (AT-EQUAL rsquo09) vol 37 pp 144ndash151 IEEE ComputerSociety Iasi Romania July 2009

[2] G Dudek M R M Jenkin E Milios and D Wilkes ldquoAtaxonomy for multi-agent roboticsrdquo Autonomous Robots vol 3no 4 pp 375ndash397 1996

[3] A Chella G Lo Re I Macaluso M Ortolani and D PerildquoChapter 1 A networking framework for multi-robot coordi-nationrdquo in Recent Advances in Multi Robot Systems A LazinicaEd pp 1ndash14 2008

[4] Yinka-Banjo C O Osunmakinde and I O Bagula AldquoAutonomous multi-robot behaviours for safety inspectionunder the constraints of underground mine Terrainsrdquo Ubiqui-tous Computing and Communication Journal vol 7 no 5 pp1316ndash1328 2012

[5] S Yarkan S Guzelgoz H Arslan and R Murphy ldquoUnder-ground mine communications a surveyrdquo IEEE Communica-tions Surveys amp Tutorials vol 11 no 3 pp 125ndash142 2009

[6] M Peasgood J McPhee and C Clark Complete and ScalableMulti-Robot Planning in Tunnel Environments Digitalcom-mons 2006

[7] F E Schneider D Wildermuth and M Moors ldquoMethods andexperiments for hazardous area activities using multi-robotsystemrdquo in Proceedings of the IEEE International Conference onRobotics and Automation pp 3559ndash3564 May 2004

20 Mathematical Problems in Engineering

[8] R Zlot and A Stentz ldquoMarket-based multirobot coordinationfor complex tasksrdquo International Journal of Robotics Researchvol 25 no 1 pp 73ndash101 2006

[9] L E Parker ldquoCurrent research in multi-robot systemsrdquo inProceedings of the 7th International Symposium on Artificial Lifeand Robotics (ISAROB rsquo03) vol 7 pp 1ndash15 2003

[10] C Thorp and H Durrant-Whyte ldquoField robotsrdquo in Proceedingsof the 10th International Symposium on Robotics Research vol 6pp 329ndash3240 2003

[11] S R Teleka J J Green S Brink J Sheer and K HlopheldquoThe automation of the ldquoMaking Saferdquo process in South AfricanHard-Rock underground minesrdquo in International Journal ofEngineering and Advanced Technology (IJEAT rsquo12) vol 1 pp 1ndash7April 2012

[12] L Cragg and H Hu ldquoApplication of reinforcement learningto a mobile robot in reaching recharging station operationrdquo inIntelligent Production Machines and Systems pp 357ndash363 2005

[13] R A C Bianchi C H C Ribeiro and A H R Costa ldquoOnthe relation between ant colony optimization and heuristicallyaccelerated reinforcement learningrdquo in Proceedings of the 1stInternational Workshop on Hybrid Control of AutonomousSystem pp 49ndash55 2009

[14] E Bonabeau M Dorigo and G Theraulaz Swarm Intelligencefrom Natural to Artificial Systems Oxford University PressOxford UK 1999

[15] Y Liu and K M Passino Swarm Intelligence LiteratureOverview Department of Electrical Engineering University ofOhio 2000

[16] L Li A Martinoli and Y S Abu-Mostafa ldquoEmergent special-ization in swarm systemsrdquo in Intelligent Data Engineering andAutomated LearningmdashIDEAL 2002 vol 2412 of Lecture Notesin Computer Science pp 261ndash266 2002

[17] B Englot and F Hover ldquoMulti-goal feasible path planningusing ant colony optimizationrdquo in Proceedings of the IEEEInternational Conference on Robotics and Automation (ICRArsquo11) pp 2255ndash2260 May 2011

[18] Z N Naqiv H Matheru and K Chadha ldquoReview of ant colonyoptimization on vehice routing problems and introduction toestimated-based ACOrdquo in International Conference on Environ-ment Science and Engineering (IPCBEE rsquo11) vol 8 pp 161ndash1662011

[19] M Dorigo and T Stutzle ldquoThe ant colony optimization meta-heuristic algorithms applications and advancesrdquo Tech RepIRIDIA2000-32 2000

[20] R Claes andTHolvoe ldquoCooperative ant colony optimization intraffic route calculationsrdquo in Advances on Practical Applicationsof Agents and Multi-Agent Systems vol 155 of Advances inIntelligent and Soft Computing pp 23ndash34 2012

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 17: Cooperative Behaviours with Swarm Intelligence in Multirobot

Mathematical Problems in Engineering 17

Table 14 QLACS with communication

Number of runs 1 2 3 4 5 6 7Iterations 9 10 9 13 10 10 9Time (sec) 330019 310017 310018 310018 290023 300017 310018Memory usage (bytes) 15748 15948 15748 18232 16576 15948 15748Inspected states (R1) 119865 119866 119864 119865 119866 119864119863 119865 119866 119864 119865 119864 119865 119866 119864119863 119861 119865 119866 119864119863 119865 119866 119864

Inspected states (R2) 119862 119861 119860119863 119862 119861 119860 119862 119861 119860119863 119865 119866 119864119863 119860 119862119860 119862 119861 119860 119862 119861 119860119863

Table 15 Navigation behaviour parameter specification

ACO properties Type of ACO Population Length of path Pheromonecoefficient 120573

Heuristiccoefficient 120572 Evaporation rate 120588

Properties ACS 9 or 12 8 2 3 001

0

10

20

30

40

50

60

70

0 1 2 3 4 5 6 7 8

Tim

e cos

t

Number of runs

QL time cost

ACS time cost

QLACS time cost

Figure 9 Comparison of time costs for QL ACS and QLACS

05

1015202530354045

0 1 2 3 4 5 6

Rout

e cos

t

Runs

Path cost for R1 (QL)

Path cost for R2 (QL)

Path cost for R1 and R2 (QLACS)

Figure 10 Comparison of route costs for QL and QLACS

with length 10 as shown in Table 16 that is they have statesfrom 119865 to 119862 and from 119862 to 119865 The length is the addition ofweights along the line (trail edges) Then the QLACS usesthe optimized paths to make decisions on inspections Theresult from Table 17 shows that no state is inspected twice orvisited more than required The QLACS model concentrateson making the best inspection decisions for MRS The firstrun on Table 17 shows that nine agentsants were used forthe route finding the optimized route was achieved after nineiterations under 70004 sec and the states where all inspectedeffectively without redundancies

0

5

10

15

20

25

30

1 2 3 4 5

Num

ber o

f ant

site

ratio

ns

Runs

AntsIteration

Figure 11 Number of antsiteration required for QLACS to findoptimal path

02468

101214

0 2 4 6 8 10

Tim

eite

ratio

n

Number of runs

Iteration

Time

Figure 12 Time and iteration required for QLACS to achieveoptimal behavior

451 Comparative Study of the ProposedModel (QLACS) withQL Only and ACS Only Based on the results tabulated inTable 18 and Figure 9 the average time costs of achieving theMRS behaviours for QL ACS and QLACS were comparedTwo robots will use an average of 308590 sec to achievethorough inspection behaviour using QL an average of404309 sec to achieve optimal navigation behaviour usingACS and an average of 92868 sec to achieve both navigationand cooperative behaviour using QLACS The result shows

18 Mathematical Problems in Engineering

Table 16 Navigation behaviour computations

Number of runs 1 2 3 4 5 6 7 8 9Number of Ants 5 7 8 9 10 11 12 15 28Iterations 12 26 14 16 23 14 16 23 22Time (sec) 370021 370021 390022 600035 320019 330019 35002 430025 650037Best train distance (R1) 12 12 10 10 10 10 10 10 10Best trail distance (R2) 10 12 12 10 10 12 10 10 10

Table 17 New hybrid model QLACS computations

Number of runs 1 2 3 4 5 6 7 8 9Iterations 9 9 9 9 9 9 9 9 9Number of ants 9 9 9 10 10 10 12 12 12Time (sec) 70004 80005 90005 110006 110006 120007 80005 90005 90006Memory usage (bytes) 10288 10280 10248 10288 10288 10288 10164 10712 10164Inspected states (R1) 119865 119866 119864119863 119865 119866 119864119863 119865 119866 119864119863 119865 119866 119864 119860 119861 119865 119866 119864 119860 119861 119865 119866 119864 119860 119861 119865 119866119863 119860 119865 119866119863 119860 119865 119866119863 119860

Inspected states (R2) 119862 119861 119860 119862 119861 119860 119862 119861 119860 119862119863 119862119863 119862119863 119862 119861 119864 119862 119861 119864 119862 119861 119864

Table 18 Time cost comparison for QL ACS and QLACS

Runs QL time cost(sec)

ACS time cost(sec)

QLACS timecost (sec)

1 330019 370021 700042 310019 370021 80043 310018 390022 900054 310018 320019 900055 290023 35002 1100066 300017 430025 1200077 310018 600035 90006Average 308590 404309 92868

that our proposed integrated algorithm performs better withreduced time cost On the same note the route costs for theQL and QLACS were also compared The results in Table 19and Figure 10 show that the proposed model QLACS gave amuch lower route cost than the QL

The number of ants and iterations used to achieve thisis displayed in Figure 11 The more the ants the more theiterationThe best routes created in the five test runs shown inFigure 11 are run numbers 1 and 3 They used fewer ants anditerations to achieve the optimal routes The optimal resultfor the proposed model is achieved under nine iterations forevery run The iteration remains constant for any number ofagents and robotsThe blue line with starmarkers at Figure 12is the iteration value for 9 runs The red line shows thedifferent amounts of time required for each run The timeis also almost stable for the QLACS though it fluctuated alittle in the middle This signifies that the proposed modelis more robust and effective in finding the optimal routeand coordinating MRS The reduced route cost and shortercomputation time achieved with the QLACS satisfied thecriteria for cooperative behavioural purposes

46 Experiment 4 Scalability of QLACS Using Two Threeand Four Robots In this section we present some resultsobtained by experimenting with the cooperative behaviouralaction of two three and four robots using the QLACSmodelThe performance of the two three and four robots wasevaluated by running the simulation three times using thesame number of agents The performance of the proposedQLACSmodel shows good communication between the twothree and four robots under the states inspected in the lastfour columns of Table 20 An almost stable time was used inachieving the inspection for all the robots The detail of thesimulation result is laid out in Table 20

A notable observation emanating from Table 20 is thatthere is a need for a larger mine area of inspection becauserobot R1 in rows 4 to 6 could inspect only one state whilerobots R1 and R2 in rows 7 to 9 could inspect only one stateeach This implies that the size of the field of inspection isproportional to the number of robots to be deployed

5 Concluding Remarks

This paper has shown how MRS behave cooperatively inunderground terrains The QL algorithm has been inves-tigated for its potential quality of good communicationwhile the ACS algorithm was explored for good navigationproperties The strengths of the two algorithms have beenharnessed and optimized to form a hybrid model QLACSwhich addressed the behavioural situation inMRS effectively

The analytical solution of the newmodel was explained asshown in Algorithms 3 and 4 and Tables 7 and 10The hybridarchitecture for themodel was also described in Figure 7Theexperimental setup describing different navigation locationsfor two robots was shown in Figure 8

The new hybrid model QLACS for cooperative behaviourof MRS in an underground terrain was proven able to findthe optimal route andhandle cooperation between two robots

Mathematical Problems in Engineering 19

Table 19 Route cost comparison for QL and QLACS

Runs QL (sec) QLACS (sec)Path cost for R1 Path cost for R2 Path cost for R1 Path cost for R2

1 20 20 10 102 32 19 10 103 28 14 10 104 27 27 10 105 39 30 10 10Average 295 22 10 10

Table 20 Summary of scalability performance on QLACS

Row numbers Number of robots Time (sec) Number of states inspectedRobot 1 (R1) Robot 2 (R2) Robot 3 (R3) Robot 4 (R4)

1 2 100006 4 32 2 110007 4 33 2 80005 4 34 3 160009 1 3 35 3 17001 1 3 36 3 120006 1 3 37 4 110006 1 1 3 28 4 140006 1 1 3 29 4 10006 1 1 3 2

effectively The cooperative behavioural problem was nar-rowed down to two situations navigational and cooperativebehaviours ACS was used to establish the shortest optimalroute for two robots while the QL was used to achieveeffective cooperation decisions The results achieved afterintegrating the two algorithms showed a reduction in routecosts and computation times as shown in Figures 9 10 11 and12 The comparative analysis between QL ACS and QLACSproved that the last-named is more robust and effective inachieving cooperative behaviour for MRS in the proposeddomain

The result of this work will be used for future simulationof MRS applications in hazardous environments An indica-tion of expansion of this research area has conspicuously sur-faced in both real life implementations and model increaseThe model used in this work can also be improved upon byincreasing the state space

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The authors gratefully acknowledge the financial support ofthe Organisation for Women in Science for the DevelopingWorld (OWSDW) Lrsquooreal-Unesco for Women in Scienceand the resources made available by the University of Cape

Town (UCT) and University of South Africa (UNISA) SouthAfrica

References

[1] J. Leitner, "Multi-robot cooperation in space: a survey," in Proceedings of the Advanced Technologies for Enhanced Quality of Life (AT-EQUAL '09), vol. 37, pp. 144–151, IEEE Computer Society, Iasi, Romania, July 2009.

[2] G. Dudek, M. R. M. Jenkin, E. Milios, and D. Wilkes, "A taxonomy for multi-agent robotics," Autonomous Robots, vol. 3, no. 4, pp. 375–397, 1996.

[3] A. Chella, G. Lo Re, I. Macaluso, M. Ortolani, and D. Peri, "Chapter 1: A networking framework for multi-robot coordination," in Recent Advances in Multi Robot Systems, A. Lazinica, Ed., pp. 1–14, 2008.

[4] C. Yinka-Banjo, I. O. Osunmakinde, and A. Bagula, "Autonomous multi-robot behaviours for safety inspection under the constraints of underground mine terrains," Ubiquitous Computing and Communication Journal, vol. 7, no. 5, pp. 1316–1328, 2012.

[5] S. Yarkan, S. Guzelgoz, H. Arslan, and R. Murphy, "Underground mine communications: a survey," IEEE Communications Surveys & Tutorials, vol. 11, no. 3, pp. 125–142, 2009.

[6] M. Peasgood, J. McPhee, and C. Clark, Complete and Scalable Multi-Robot Planning in Tunnel Environments, Digitalcommons, 2006.

[7] F. E. Schneider, D. Wildermuth, and M. Moors, "Methods and experiments for hazardous area activities using multi-robot system," in Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3559–3564, May 2004.

[8] R. Zlot and A. Stentz, "Market-based multirobot coordination for complex tasks," International Journal of Robotics Research, vol. 25, no. 1, pp. 73–101, 2006.

[9] L. E. Parker, "Current research in multi-robot systems," in Proceedings of the 7th International Symposium on Artificial Life and Robotics (ISAROB '03), vol. 7, pp. 1–15, 2003.

[10] C. Thorp and H. Durrant-Whyte, "Field robots," in Proceedings of the 10th International Symposium on Robotics Research, vol. 6, pp. 329–3240, 2003.

[11] S. R. Teleka, J. J. Green, S. Brink, J. Sheer, and K. Hlophe, "The automation of the 'Making Safe' process in South African hard-rock underground mines," International Journal of Engineering and Advanced Technology (IJEAT '12), vol. 1, pp. 1–7, April 2012.

[12] L. Cragg and H. Hu, "Application of reinforcement learning to a mobile robot in reaching recharging station operation," in Intelligent Production Machines and Systems, pp. 357–363, 2005.

[13] R. A. C. Bianchi, C. H. C. Ribeiro, and A. H. R. Costa, "On the relation between ant colony optimization and heuristically accelerated reinforcement learning," in Proceedings of the 1st International Workshop on Hybrid Control of Autonomous System, pp. 49–55, 2009.

[14] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, Oxford, UK, 1999.

[15] Y. Liu and K. M. Passino, Swarm Intelligence: Literature Overview, Department of Electrical Engineering, University of Ohio, 2000.

[16] L. Li, A. Martinoli, and Y. S. Abu-Mostafa, "Emergent specialization in swarm systems," in Intelligent Data Engineering and Automated Learning – IDEAL 2002, vol. 2412 of Lecture Notes in Computer Science, pp. 261–266, 2002.

[17] B. Englot and F. Hover, "Multi-goal feasible path planning using ant colony optimization," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '11), pp. 2255–2260, May 2011.

[18] Z. N. Naqiv, H. Matheru, and K. Chadha, "Review of ant colony optimization on vehicle routing problems and introduction to estimated-based ACO," in International Conference on Environment Science and Engineering (IPCBEE '11), vol. 8, pp. 161–166, 2011.

[19] M. Dorigo and T. Stutzle, "The ant colony optimization metaheuristic: algorithms, applications, and advances," Tech. Rep. IRIDIA/2000-32, 2000.

[20] R. Claes and T. Holvoet, "Cooperative ant colony optimization in traffic route calculations," in Advances on Practical Applications of Agents and Multi-Agent Systems, vol. 155 of Advances in Intelligent and Soft Computing, pp. 23–34, 2012.


