a multi-agent system approach to load-balancing and resource allocation for distributed computing
![Page 1: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/1.jpg)
A Multi-Agent System Approach to Load Balancing and Resource Allocation
for Distributed Computing
Soumya Banerjee & Joshua Hecker
![Page 2: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/2.jpg)
Motivation
• Age of distributed computing
• Trend toward moving computation onto inexpensive but geographically distributed computers
• SETI@home, LHC@home
• Need for efficient allocation algorithms
![Page 3: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/3.jpg)
Decentralized Computing
• Can alleviate computing load on centralized monitors
• Robust to single-point failures
• Can achieve application-level resource management (nodes can manage resources better than a global monitor)
• Can scale more gracefully, since as the system grows a centralized monitor has to communicate with more and more nodes
• Can better respond to fluctuations in process requirements
• Scenario where the system has to "forget" past process requirements and completely rebuild new clusters after servicing one process, i.e. no locality
![Page 4: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/4.jpg)
Multi-Agent Systems
• An agent is a computing node; agents join together to form a cluster
• Multi-agent systems have emergent properties
• Have been used to model biological phenomena and real-life problems (left: Keepaway soccer, right: Ant foraging):
![Page 5: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/5.jpg)
Problem Statement and Assumptions
• A huge number of distributed nodes or agents
• Advantages to computing with geographically proximal computers due to network latency, bandwidth limitations, etc.
• There is a global data structure which holds a large number of tasks/processes
• A new process that comes into the system will declare a priori the number of threads that it can be parallelized into and its resource requirements ($CPU_{req}$)
• A cluster is a network of computers which together can completely service the resource requirements of a single task
• Over time clusters would be created, dissolved and created again dynamically in order to serve the resource requirements of the tasks in the queue
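The assumptions above can be written down as a small data model. This is a minimal sketch, not the authors' implementation: the names `Task`, `Cluster`, `threads`, `cpu_req` and `can_service` are hypothetical, and each node is assumed to contribute exactly one CPU.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    # Both values are declared a priori when the task enters the system.
    threads: int   # number of threads the task can be parallelized into
    cpu_req: int   # resource requirement (CPUreq on the slide)


@dataclass
class Cluster:
    # Node identifiers; assumption: every node contributes one CPU.
    nodes: set = field(default_factory=set)

    @property
    def cpu(self) -> int:
        return len(self.nodes)

    def can_service(self, task: Task) -> bool:
        # A cluster must completely cover a single task's requirement.
        return self.cpu >= task.cpu_req


# The global data structure holding pending tasks (hypothetical contents).
queue = deque([Task(threads=4, cpu_req=4), Task(threads=1, cpu_req=1)])
cluster = Cluster(nodes={"a", "b", "c", "d"})
```

Here `cluster.can_service(queue[0])` is true because four nodes cover a requirement of four CPUs, while a single-node cluster could only service the second task.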
![Page 6: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/6.jpg)
dRAP Algorithm
• dRAP: Distributed Resource Allocation Procedure
• Mode 1: an agent/node that is currently not part of a cluster and has no task assigned to it
1. agent looks at queue Q, examines unallocated tasks and takes on the task which minimizes $|CPU_{req} - 1|$
• Mode 2: an agent/node that is currently not part of a cluster and has a task assigned to it
1. keep on executing task
2. if the task requirements are not completely satisfied, i.e., $CPU_{req} > 1$, keep on querying your neighbors and try to form a cluster such that $CPU_{req} = CPU_{cluster}$
3. when task completes, go to Mode 1
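Modes 1 and 2 can be sketched as two small functions. This is an illustrative sketch under stated assumptions rather than the authors' code: tasks and nodes are plain dictionaries, the field names `cpu_req`, `allocated` and `free` are hypothetical, each node contributes one CPU, and neighbors are recruited in list order.

```python
def pick_task_single(queue):
    """Mode 1: choose the unallocated task minimizing |CPU_req - 1|."""
    candidates = [t for t in queue if not t["allocated"]]
    if not candidates:
        return None
    return min(candidates, key=lambda t: abs(t["cpu_req"] - 1))


def grow_cluster(agent, neighbors, cpu_req):
    """Mode 2: recruit free neighbors until CPU_cluster == CPU_req."""
    cluster = [agent]
    for other in neighbors:
        if len(cluster) == cpu_req:
            break  # requirement met, stop querying
        if other["free"]:
            other["free"] = False
            cluster.append(other)
    return cluster


# Hypothetical queue and neighborhood for illustration.
queue = [
    {"id": 0, "cpu_req": 5, "allocated": False},
    {"id": 1, "cpu_req": 2, "allocated": False},
    {"id": 2, "cpu_req": 3, "allocated": True},
]
agent = {"name": "n0", "free": False}
neighbors = [
    {"name": "n1", "free": True},
    {"name": "n2", "free": False},
    {"name": "n3", "free": True},
]

task = pick_task_single(queue)
cluster = grow_cluster(agent, neighbors, task["cpu_req"])
```

With this input the lone agent takes the task requiring 2 CPUs, since it is closest to its own capacity of 1, and recruits only `n1` before the requirement is met. If too few neighbors are free, the returned cluster is smaller than `cpu_req` and the agent would keep querying on later timesteps.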
![Page 7: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/7.jpg)
dRAP Algorithm
• Mode 3: an agent/node that is currently part of a cluster and has no task assigned to it
1. agent looks at queue Q, examines unallocated tasks and takes on the task which minimizes $|CPU_{req} - CPU_{cluster}|$
• Mode 4: an agent/node that is currently part of a cluster and has a task assigned to it
1. keep on executing task
2. when task completes, break up cluster and go to Mode 1
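The four modes are determined by two facts about an agent: whether it belongs to a cluster and whether it holds a task. The sketch below maps that state to the mode numbers on the slides and shows the Mode 3 selection rule. The function names `mode` and `pick_task_cluster` are hypothetical, and ties are broken by queue order, which the slides do not specify.

```python
def mode(in_cluster: bool, has_task: bool) -> int:
    """Map an agent's state to the mode numbers used on the slides."""
    if not in_cluster:
        return 2 if has_task else 1
    return 4 if has_task else 3


def pick_task_cluster(cpu_reqs, cpu_cluster):
    """Mode 3: index of the task minimizing |CPU_req - CPU_cluster|."""
    if not cpu_reqs:
        return None
    return min(range(len(cpu_reqs)),
               key=lambda i: abs(cpu_reqs[i] - cpu_cluster))


# A 4-CPU cluster choosing among tasks that need 8, 3 and 6 CPUs.
chosen = pick_task_cluster([8, 3, 6], cpu_cluster=4)
```

In this example the cluster selects the task at index 1, whose requirement of 3 CPUs is closest to the cluster's capacity of 4.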
![Page 8: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/8.jpg)
dRAP Algorithm
• Caveat: task list traversal requires O(nm) time per timestep, where n = number of tasks and m = number of clusters
• For entire simulation: $\sum_{i=0}^{n} (n - i)\,m \approx O(n^2 m)$
• Compare to FIFO scheduling, which drops to O(nm)
• Does our algorithm’s increased complexity per timestep provide enough decrease in scheduling rate to be effective?
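The whole-simulation bound follows from summing the per-timestep cost as the queue shrinks. The sketch below checks the sum against its closed form $m\,n(n+1)/2$, assuming that exactly one task leaves the queue per timestep and that m stays constant. Both are simplifications made here for illustration, and the function names are hypothetical.

```python
def total_traversal_cost(n: int, m: int) -> int:
    """Sum of per-timestep traversal costs as the queue shrinks from n to 0."""
    return sum((n - i) * m for i in range(n + 1))


def closed_form(n: int, m: int) -> int:
    """m * n * (n + 1) / 2, which grows as O(n^2 * m)."""
    return m * n * (n + 1) // 2


cost = total_traversal_cost(1000, 10)
```

For n = 1000 tasks and m = 10 clusters both functions give 5,005,000 task inspections, compared with n·m = 10,000 for a single pass.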
![Page 9: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/9.jpg)
Simulation
• Example screenshots of implementation (lines show clusters, red symbolizes task execution):
![Page 10: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/10.jpg)
Simulation
• Example screenshots of implementation (lines show clusters, red symbolizes task execution):
![Page 11: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/11.jpg)
Simulation
• Example screenshots of implementation (lines show clusters, red symbolizes task execution):
![Page 12: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/12.jpg)
Experiments
• Comparisons with a null model (FIFO scheduling algorithm)
• Time to empty queue (of 1000 tasks) = Tcomplete
• Average waiting time (averaged over 1000 tasks) = Twait
• Values given in simulation time steps:

| Algorithm | Tcomplete | Twait |
|---|---|---|
| RAP | 845.60 | 342.54 |
| FIFO | 1071.20 | 475.31 |
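The table values can be turned into relative reductions against the FIFO baseline. The short calculation below uses only the four numbers reported on the slide; the helper name `relative_reduction` and the dictionary layout are hypothetical.

```python
# Values copied from the slide (simulation time steps).
results = {
    "RAP": {"t_complete": 845.60, "t_wait": 342.54},
    "FIFO": {"t_complete": 1071.20, "t_wait": 475.31},
}


def relative_reduction(baseline: float, value: float) -> float:
    """Fraction by which `value` improves on `baseline`."""
    return (baseline - value) / baseline


reduction = {
    metric: relative_reduction(results["FIFO"][metric], results["RAP"][metric])
    for metric in ("t_complete", "t_wait")
}
```

This works out to a reduction of about 21% in Tcomplete and about 28% in Twait relative to FIFO.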
![Page 13: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/13.jpg)
Experiments
• Utilization experiments
• We compared the cluster utilization ability of our algorithm vs. the FIFO scheduling algorithm
• Calculation for each task: $Nodes_{req} / Nodes_{cluster}$ (averaged over total number of tasks)
• Optimal value is 100% (our algorithm always achieves this):

| Algorithm | Utilization |
|---|---|
| RAP | 100% |
| FIFO | 56% |
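The utilization metric can be sketched as the mean ratio of required nodes to allocated cluster nodes. The function name `utilization` and the sample allocations are hypothetical and are not taken from the experiments; they only illustrate why exact-fit clusters score 100% and oversized clusters score lower.

```python
def utilization(tasks):
    """Mean of Nodes_req / Nodes_cluster over all tasks, as a percentage."""
    ratios = [req / cluster for req, cluster in tasks]
    return 100.0 * sum(ratios) / len(ratios)


# Each pair is (nodes required by the task, nodes in the cluster serving it).
exact_fit = [(2, 2), (5, 5), (3, 3)]
oversized = [(2, 4), (5, 5), (3, 6)]

exact_score = utilization(exact_fit)
oversized_score = utilization(oversized)
```

Forming clusters so that the cluster size equals the requirement makes every ratio 1, which matches the slide's claim that the algorithm always reaches the optimum. Any scheduler that places tasks on larger clusters than they need falls below 100%.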
![Page 14: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/14.jpg)
Experiments
• Lastly, we looked at how the average waiting time and time to completion scaled with the number of nodes in the system
[Figure: two plots, "Scaling of Tcomplete" (Tcomplete vs. Nodes) and "Scaling of Twait" (Twait vs. Nodes)]
![Page 15: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/15.jpg)
Experiments
• Same data using log2 on axes and a power curve fit:
[Figure: "Scaling of Tcomplete", Tcomplete (log2) vs. Nodes (log2); fit $y = 63630\,x^{-0.927}$, $R^2 = 0.9976$]
[Figure: "Scaling of Twait", Twait (log2) vs. Nodes (log2); fit $y = 47010\,x^{-1.075}$, $R^2 = 0.9992$]
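A power curve $y = a\,x^{b}$ becomes a straight line on log-log axes, so it can be fitted by ordinary least squares on the logarithms. The sketch below shows that procedure. The raw measurements are not in the slides, so the points are synthetic, generated from the reported Tcomplete fit at the node counts shown on the axis, and `fit_power_law` is a hypothetical helper name.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b via a straight line in log-log space."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                      # slope = exponent
    a = math.exp(mean_y - b * mean_x)  # intercept = log of the prefactor
    return a, b


# Synthetic points lying exactly on the reported curve (not measured data).
nodes = [40, 80, 160, 320, 640]
t_complete = [63630 * n ** -0.927 for n in nodes]

a, b = fit_power_law(nodes, t_complete)
```

Because the synthetic points lie exactly on the curve, the fit recovers the reported prefactor and exponent. Exponents close to -1, as reported for both Tcomplete and Twait, mean that doubling the number of nodes roughly halves the time.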
![Page 16: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/16.jpg)
Optimizations Inspired by the Natural Immune System
• Operates under constraints of physical space
• Resource constrained (metabolic input, number of immune system cells)
• Performance scalability is an important concern (mice to horses) (Banerjee and Moses, 2010, in review)
![Page 17: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/17.jpg)
Search Problem
• Immune system cells have to search throughout the whole body to locate small quantities of pathogens
![Page 18: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/18.jpg)
Response Problem
• Have to respond by producing antibodies
![Page 19: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/19.jpg)
Nearly Scale-Invariant Search and Response
• How does the immune system search and respond in almost the same time irrespective of the size of the search space?
![Page 20: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/20.jpg)
Crivellato et al. 2004
Solution?
![Page 21: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/21.jpg)
Lymph Nodes (LN)
• A place in which immune system (IS) cells and the pathogen can encounter each other in a small volume
• Form a decentralized detection network
![Page 22: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/22.jpg)
Decentralized Detection Network
www.lymphadvice.com
![Page 23: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/23.jpg)
Lymph Node Dynamics
![Page 24: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/24.jpg)
Lymph Node Dynamics
![Page 25: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/25.jpg)
Lymph Node Dynamics
![Page 26: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/26.jpg)
![Page 27: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/27.jpg)
![Page 28: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/28.jpg)
![Page 29: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/29.jpg)
![Page 30: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/30.jpg)
Summary
• There are increasing costs to global communication as organisms grow bigger
• Semi-modular architecture balances the opposing goals of detecting pathogen (local communication) and recruiting IS cells (global communication)
• Can we emulate this modular RADAR strategy in distributed systems?
![Page 31: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/31.jpg)
Optimizations inspired by the immune system
![Page 32: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/32.jpg)
Optimizations inspired by the immune system
![Page 33: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/33.jpg)
Conclusions
• The move towards distributed computing necessitates efficient scheduling algorithms
• Decentralized scheduling of a large number of nodes leads to robustness, reduced load on a centralized monitor and better response to fluctuations in task queue requirements
• Multi-agent systems have emergent properties and have been used here to adaptively create and allocate clusters to match task demand
• The algorithm outperforms our null model (FIFO scheduling) on average waiting time, time to empty task queue and utilization
• Further, our algorithm is robust to adversarial attack (fluctuations in the processor requirements of tasks in the queue)
![Page 34: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/34.jpg)
Conclusions
• Value of immune system inspired approaches
• General theory of scaling of artificial immune systems
![Page 35: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/35.jpg)
Future Work
• Compare with more null algorithms
• Compare with algorithms used in industry, e.g. SLURM, which uses static allocation of nodes to clusters known as partitions
• Compare with the cluster allocation algorithm used by Google in MapReduce (this algorithm can improve on their locality optimization since it seeks to form clusters with its neighbors)
• … and sell to the highest bidder!
![Page 36: A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing](https://reader033.vdocuments.mx/reader033/viewer/2022042722/58ab85a51a28ab3e738b5cc5/html5/thumbnails/36.jpg)
Acknowledgements
• Dr. Dorian Arnold