Grid Computing: From Old Traces to New Applications
DESCRIPTION
The Failure Trace Archive. Grid Computing: From Old Traces to New Applications. Alexandru Iosup, Ozan Sonmez, Nezih Yigitbasi, Hashim Mohamed, Catalin Dumitrescu, Mathieu Jan, Dick Epema. Parallel and Distributed Systems Group, TU Delft. PowerPoint PPT Presentation
TRANSCRIPT
![Page 1: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/1.jpg)
April 20, 2023
Grid Computing: From Old Traces to New Applications
Fribourg, Switzerland
Alexandru Iosup, Ozan Sonmez, Nezih Yigitbasi, Hashim Mohamed, Catalin Dumitrescu, Mathieu Jan, Dick Epema
Parallel and Distributed Systems Group, TU Delft
Many thanks to our collaborators: U Wisc./Madison, U Chicago, U Dortmund, U Innsbruck, LRI/INRIA Paris, INRIA Grenoble, U Leiden, Politehnica University of Bucharest, Technion, …
DGSim | The Failure Trace Archive
![Page 2: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/2.jpg)
Alexandru Iosup
http://pds.twi.tudelft.nl/~iosup/
• Systems
  • The Koala grid scheduler
  • The Tribler BitTorrent-compatible P2P file-sharing system
  • The POGGI and CAMEO gaming platforms
• Performance
  • The Grid Workloads Archive (Nov 2006)
  • The Failure Trace Archive (Nov 2009)
  • The Peer-to-Peer Trace Archive (Apr 2010)
  • Tools: the DGSim trace-based grid simulator, GrenchMark workload-based grid benchmarking
• Team of 15+ active collaborators in NL, AT, RO, US
• Happy to be in Berkeley until September
![Page 3: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/3.jpg)
The Grid
A ubiquitous, always-on computational and data storage platform on which users can seamlessly run their (large-scale) applications
Shared capacity & costs, economies of scale
![Page 4: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/4.jpg)
The Dutch Grid: DAS System and Extensions
VU (85 nodes)
TU Delft (68) Leiden (32)
SURFnet6
10 Gb/s lambdas
UvA/MultimediaN (46)
UvA/VL-e (41)
DAS-3: a 5-cluster grid
• 272 AMD Opteron nodes: 792 cores, 1 TB memory
• Heterogeneous: 2.2-2.6 GHz single/dual-core nodes
• Myrinet-10G (excl. Delft)
• Gigabit Ethernet

DAS-4 (upcoming)
• Multi-cores: general purpose, GPU, Cell, …

Clouds
• Amazon EC2+S3, Mosso, …
![Page 5: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/5.jpg)
Many Grids Built
DAS, Grid’5000, OSG, NGS, CERN, …
Why grids and not The Grid?
![Page 6: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/6.jpg)
Agenda
1. Introduction
2. Was it the System?
3. Was it the Workload?
4. Was it the System Designer?
5. New Application Types
6. Suggestions for Collaboration
7. Conclusion
![Page 7: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/7.jpg)
The Failure Trace Archive
Failure and Recovery Events
20+ traces online
http://fta.inria.fr
D. Kondo, B. Javadi, A. Iosup, D. Epema, The Failure Trace Archive: Enabling Comparative Analysis of Failures in Diverse Distributed Systems, CCGrid 2010 (Best Paper Award)
![Page 8: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/8.jpg)
Was it the System?
• No
  • System can grow fast
  • Good data and models to support system designers
• Yes
  • Grid middleware unscalable [CCGrid06, Grid09, HPDC09]
  • Grid middleware failure-prone [CCGrid07, Grid07]
  • Grid resources unavailable [CCGrid10]
  • Inability to load balance well [SC|07]
  • Poor online information about resource availability
![Page 9: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/9.jpg)
Agenda
1. Introduction
2. Was it the System?
3. Was it the Workload?
4. Was it the System Designer?
5. New Application Types
6. Suggestions for Collaboration
7. Conclusion
![Page 10: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/10.jpg)
The Grid Workloads Archive
Per-Job Arrival, Start, Stop, Structure, etc.
6 traces online
http://gwa.ewi.tudelft.nl
1.5 yrs >750K >250
A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, D. Epema, The Grid Workloads Archive, FGCS 24, 672—686, 2008.
![Page 11: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/11.jpg)
Grid Systems
How Are Real Grids Used?
Data Analysis and Modeling
• Grids vs. parallel production environments such as clusters and (small) supercomputers:
  • Bags of single-processor tasks vs. single parallel jobs
  • Bigger bursts of job arrivals
  • More jobs
Parallel production environments
![Page 12: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/12.jpg)
Grid Workloads
Analysis: Grid Workload Components

Bags-of-Tasks (BoTs)
• BoT size = 2-70 tasks, most 5-20
• Task runtime highly variable, from minutes to tens of hours

Workflows (WFs)
• WF size = 2-1k tasks, most 30-40
• Task runtime of minutes
![Page 13: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/13.jpg)
Was it the Workload?
• No
  • Similar workload characteristics across grids
  • High utilization possible due to single-node jobs
  • High load imbalance
  • Good data and models to support system designers [Grid06, EuroPar08, HPDC08-10, FGCS08]
• Yes
  • Too many tasks (system limitation)
  • Poor online information about job characteristics + high variability of job resource requirements
  • How to schedule BoTs, WFs, and mixtures in grids?
![Page 14: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/14.jpg)
Agenda
1. Introduction
2. Was it the System?
3. Was it the Workload?
4. Was it the System Designer?
5. New Application Types
6. Suggestions for Collaboration
7. Conclusion
![Page 15: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/15.jpg)
Problems in Grid Scheduling and Resource Management
The System
1. Grid schedulers do not own resources themselves
  • They have to negotiate with autonomous local schedulers
  • Authentication/multi-organizational issues
2. Grid schedulers interface to local schedulers
  • Some may have support for reservations, others are queuing-based
3. Grid resources are heterogeneous and dynamic
  • Hardware (processor architecture, disk space, network)
  • Basic software (OS, libraries)
  • Grid software (middleware)
  • Resources may fail
  • Lack of complete and accurate resource information
![Page 16: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/16.jpg)
Problems in Grid Scheduling and Resource Management
The Workloads
4. Workloads are heterogeneous and dynamic
  • Grid schedulers may not have control over the full workload (multiple submission points)
  • Jobs may have performance requirements
  • Lack of complete and accurate job information
5. Application structure is heterogeneous
  • Single sequential jobs
  • Bags of Tasks: parameter sweeps (Monte Carlo), pilot jobs
  • Workflows, pipelines, chains-of-tasks
  • Parallel jobs (MPI); malleable, co-allocated
![Page 17: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/17.jpg)
The Koala Grid Scheduler
• Developed in the DAS system
  • Deployed on DAS-2 in September 2005
  • Ported to DAS-3 in April 2007
• Independent of grid middleware such as Globus
• Runs on top of local schedulers
• Objectives:
  • Data and processor co-allocation in grids
  • Supporting different application types
  • Specialized application-oriented scheduling policies
Koala homepage: http://www.st.ewi.tudelft.nl/koala/
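Processor co-allocation means placing the components of a single parallel job on several clusters at once. The sketch below is a hypothetical greedy placement used only to illustrate the idea; `co_allocate`, the cluster names, and the data shapes are assumptions, not Koala's actual policy or code.

```python
# Hypothetical sketch of processor co-allocation: split a job's per-component
# node requests across clusters that have enough idle nodes.

def co_allocate(request, free_nodes):
    """Greedily place per-component node counts onto clusters.

    request:    list of node counts, one per job component
    free_nodes: dict cluster_name -> idle node count
    Returns dict component_index -> cluster_name, or None if infeasible.
    """
    remaining = dict(free_nodes)
    placement = {}
    # Place the largest components first to reduce fragmentation.
    for idx in sorted(range(len(request)), key=lambda i: -request[i]):
        candidates = [c for c, n in remaining.items() if n >= request[idx]]
        if not candidates:
            return None  # co-allocation fails; a real scheduler would retry
        # Pick the cluster with the most idle nodes that still fits.
        best = max(candidates, key=lambda c: remaining[c])
        placement[idx] = best
        remaining[best] -= request[idx]
    return placement
```

For example, a two-component job asking for 32 and 16 nodes over clusters with 40 and 20 idle nodes lands one component on each cluster.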
![Page 18: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/18.jpg)
Koala in a Nutshell
• Parallel applications
  • MPI, Ibis, …
  • Co-allocation
  • Malleability
• Parameter sweep applications
  • Cycle scavenging
  • Run as low-priority jobs
• Workflows
A bridge between theory and practice
![Page 19: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/19.jpg)
Euro-Par 2008, Las Palmas, 27 August 2008
Inter-Operating Grids Through Delegated MatchMaking
Inter-Operation Architectures
• Independent
• Centralized
• Hierarchical
• Decentralized
• Hybrid hierarchical/decentralized: Delegated MatchMaking
![Page 20: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/20.jpg)
Inter-Operating Grids Through Delegated MatchMaking
The Delegated MatchMaking Mechanism
1. Deal with local load locally (if possible)
2. When local load is too high, temporarily bind resources from remote sites to the local environment
  • May build delegation chains
  • Delegate resource usage rights, do not migrate jobs
3. Deal with delegations each delegation cycle (delegated matchmaking)

[Diagram: when the local load is too high, a site sends a resource request, is delegated resource usage rights, and binds the remote resource.]

The Delegated MatchMaking Mechanism = Delegate Resource Usage Rights, Do Not Delegate Jobs
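The mechanism can be sketched in a few lines: each delegation cycle, an overloaded site obtains usage rights on idle remote nodes instead of migrating its jobs. This is an illustrative toy under stated assumptions (single-node tasks, full load visibility), not the SC|07 implementation; the `Site` class and its fields are invented for the example.

```python
# Toy model of one delegated-matchmaking cycle: usage rights move from
# idle sites to overloaded sites; the jobs themselves never move.

class Site:
    def __init__(self, name, nodes):
        self.name = name
        self.nodes = nodes      # locally owned nodes
        self.queued = 0         # waiting single-node tasks
        self.borrowed = 0       # rights obtained from remote sites
        self.lent = 0           # rights granted to remote sites

    def capacity(self):
        return self.nodes - self.lent + self.borrowed

def delegation_cycle(sites):
    """One cycle: delegate usage rights from idle to overloaded sites."""
    for s in sites:
        need = max(0, s.queued - s.capacity())
        for peer in sites:
            if need == 0:
                break
            if peer is s:
                continue
            idle = max(0, peer.capacity() - peer.queued)
            grant = min(idle, need)
            if grant:
                # Rights move; jobs stay at s and run under its control.
                peer.lent += grant
                s.borrowed += grant
                need -= grant
```

With two 10-node sites where one queues 15 tasks and the other 2, a single cycle delegates 5 node-usage rights to the overloaded site.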
![Page 21: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/21.jpg)
What is the Potential Gain of Grid Inter-Operation?
Delegated MatchMaking vs. Alternatives

• DMM:
  • High goodput
  • Low wait time
  • Finishes all jobs
• Even better for load imbalance between grids
• Reasonable overhead [see thesis]

[Chart: goodput of the Independent, Centralized, Decentralized, and DMM architectures; higher is better.]

Grid Inter-Operation (through DMM) delivers good performance
![Page 22: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/22.jpg)
4.2. Studies on Grid Scheduling [5/5]
Scheduling under Cycle Scavenging

[Architecture diagram: users submit PSA(s) via a JDF to the scheduler; the CS-Runner registers with the KCM, exchanges grow/shrink messages, submits launchers to cluster head nodes, and deploys, monitors, and preempts tasks; the KCM monitors and informs about idle/demanded resources.]

CS Policies:
• Equi-All: grid-wide basis
• Equi-PerSite: per cluster

Application-Level Scheduling:
• Pull-based approach
• Shrinkage policy
O. Sonmez, B. Grundeken, H. Mohamed, A. Iosup, D. Epema: Scheduling Strategies for Cycle Scavenging in Multicluster Grid Systems. CCGRID 2009: 12-19
Requirements
1. Unobtrusiveness: minimal delay for (higher-priority) local and grid jobs
2. Fairness
3. Dynamic Resource Allocation
4. Efficiency
5. Robustness and Fault Tolerance
Deployed as Koala Runner
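The pull-based approach with a shrinkage policy can be sketched as below: launchers on idle nodes pull tasks, and a shrink request preempts running tasks so higher-priority local and grid jobs are delayed minimally (the unobtrusiveness requirement). Class and method names are assumptions for illustration, not the actual CS-Runner code.

```python
# Minimal sketch of pull-based cycle scavenging with a shrink operation.
from collections import deque

class CSRunner:
    def __init__(self, tasks):
        self.todo = deque(tasks)   # parameter-sweep tasks still to run
        self.running = {}          # launcher_id -> task

    def pull(self, launcher_id):
        """A launcher on an idle node asks for work (pull-based approach)."""
        if not self.todo:
            return None
        task = self.todo.popleft()
        self.running[launcher_id] = task
        return task

    def shrink(self, n):
        """Preempt n tasks so higher-priority local/grid jobs can run.

        Preempted tasks return to the front of the queue, so the bag
        loses no tasks, only (some) partial work."""
        for launcher_id in list(self.running)[:n]:
            self.todo.appendleft(self.running.pop(launcher_id))
```

A preempted task is simply pulled again later, which is what makes the runner unobtrusive from the local scheduler's point of view.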
![Page 23: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/23.jpg)
Was it the System Designer?
• No
  • Mechanisms to inter-operate grids: DMM [SC|07], …
  • Mechanisms to run many grid application types: WFs, BoTs, parameter sweeps, cycle scavenging, …
  • Scheduling algorithms with inaccurate information [HPDC '08, '09, '10]
  • Tools for empirical and trace-based experimentation
• Yes
  • Still too many tasks
  • What about new application types?
![Page 24: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/24.jpg)
Agenda
1. Introduction
2. Was it the System?
3. Was it the Workload?
4. Was it the System Designer?
5. New Application Types
6. Suggestions for Collaboration
7. Conclusion
![Page 25: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/25.jpg)
Cloud Futures Workshop 2010 – Cloud Computing Support for Massively Social Gaming
MSGs are a Popular, Growing Market
• 25,000,000 subscribed players (from 150,000,000+ active)
• Over 10,000 MSGs in operation
• Market size: $7,500,000,000/year
Sources: MMOGChart, own research. Sources: ESA, MPAA, RIAA.
![Page 26: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/26.jpg)
Massively Social Gaming as New Grid/Cloud Application
1. Virtual world: explore, do, learn, socialize, compete+
2. Content: graphics, maps, puzzles, quests, culture+
3. Game analytics: player stats and relationships (e.g., Romeo and Juliet)

Massively Social Gaming: (online) games with massive numbers of players (100K+), for which social interaction helps the gaming experience
[SC|08, TPDS'10] [EuroPar09 Best Paper Award, CPE10] [ROIA09]
![Page 27: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/27.jpg)
Suggestions for Collaboration
• Scheduling mixtures of grid/HPC/cloud workloads
• Scheduling and resource management in practice
• Modeling aspects of cloud infrastructure and workloads
• Condor on top of Mesos
• Massively Social Gaming and Mesos
  • Step 1: Game analytics and social network analysis in Mesos
• The Grid Research Toolbox
  • Using and sharing traces: The Grid Workloads Archive and The Failure Trace Archive
  • GrenchMark: testing large-scale distributed systems
  • DGSim: simulating multi-cluster grids
![Page 28: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/28.jpg)
Alex Iosup, Ozan Sonmez, Nezih Yigitbasi, Hashim Mohamed, Dick Epema
Thank you! Questions? Observations?
More Information:• The Koala Grid Scheduler: www.st.ewi.tudelft.nl/koala
• The Grid Workloads Archive: gwa.ewi.tudelft.nl
• The Failure Trace Archive: fta.inria.fr
• The DGSim simulator: www.pds.ewi.tudelft.nl/~iosup/dgsim.php
• The GrenchMark perf. eval. tool: grenchmark.st.ewi.tudelft.nl
• Cloud research: www.st.ewi.tudelft.nl/~iosup/research_cloud.html
• Gaming research: www.st.ewi.tudelft.nl/~iosup/research_gaming.html
• see PDS publication database at: www.pds.twi.tudelft.nl/
email: [email protected]
Many thanks to our collaborators: U. Wisc.-Madison, U Chicago, U Dortmund, U Innsbruck, LRI/INRIA Paris, INRIA Grenoble, U Leiden, Politehnica University of Bucharest, Technion, …
DGSim
![Page 29: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/29.jpg)
Additional Slides
![Page 30: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/30.jpg)
The 1M-CPU Machine with Shared Resource Ownership
• The 1M-CPU machine
  • eScience (high-energy physics, earth sciences, financial services, bioinformatics, etc.)
• Shared resource ownership
  • Shared resource acquisition
  • Shared maintenance and operation
  • Summed capacity higher (more efficiently used) than the sum of individual capacities
![Page 31: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/31.jpg)
How to Build the 1M-CPU Machine with Shared Resource Ownership?
• Clusters of resources are ever more present
  • Top500 supercomputers: cluster systems from 0% to 75% share in 10 years (also from 0% to 50% of performance)
  • CERN WLCG: from 100 to 300 clusters in 2½ years

Source: http://www.top500.org/overtime/list/29/archtype/
Source: http://goc.grid.sinica.edu.tw/gstat//table.html
![Page 32: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/32.jpg)
How to Build the 1M-CPU Machine with Shared Resource Ownership?
Cluster size distribution over time, Top500, 1997-2007

[Chart: cluster size (log scale, 1 to 100,000) vs. date, Nov-97 to May-07; series: Median, Average, Q1, Q3, Max. Last 10 years: Median 10x, Average 20x, Max 100x. Last 4 years: now 0.5x/yr.]

Data source: http://www.top500.org

To build the 1M-CPU cluster:
- At the last 10 years' rate, another 10 years
- At the current rate, another 200 years
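The "another 10 years" estimate is a compound-growth extrapolation, and it is easy to check. The starting cluster size below (~10K CPUs) is an assumption chosen to match the slide's arithmetic, not a figure from the Top500 analysis; `years_to_reach` is a name invented for this sketch.

```python
# Back-of-the-envelope check of the growth extrapolation:
# years of compound growth needed to go from `current` to `target` CPUs.
import math

def years_to_reach(target, current, yearly_factor):
    """Solve current * yearly_factor**years == target for years."""
    return math.log(target / current) / math.log(yearly_factor)

# Historic pace from the chart: the maximum cluster size grew ~100x in
# 10 years, i.e. a yearly factor of 100**(1/10) ~ 1.58.
historic = 100 ** (1 / 10)

# Assuming today's largest cluster has ~10K CPUs (assumption):
print(round(years_to_reach(1_000_000, 10_000, historic)))  # 10 years
```

The same formula with a near-flat yearly factor (the "current rate") stretches the answer to centuries, which is the slide's second estimate.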
![Page 33: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/33.jpg)
How to Build the 1M-CPU Machine with Shared Resource Ownership?
• Cluster-based computing grids
• CERN's WLCG cluster size over time

[Chart, Year 1 vs. Year 2: Median: +5 procs/yr; Avg: +15 procs/yr; Max: 2x/yr]

Shared clusters grow on average slower than Top500 cluster systems!

Data source: http://goc.grid.sinica.edu.tw/gstat/
![Page 34: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/34.jpg)
How to Build the 1M-CPU Machine with Shared Resource Ownership?
Why doesn't CERN WLCG use larger clusters?
• Physics: dissipate heat from large clusters
• Market: pay the industrial power consumer rate, pay a special system building rate
• Collaboration: who pays for the largest cluster?

Why doesn't CERN WLCG opt for multi-cores?
• We don't know how to exploit multi-cores yet
• Executing large batches of independent jobs
![Page 35: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/35.jpg)
4.1. Grid Workloads [2/5]
BoTs are predominant in grids

• Selected Findings
  • Batches predominant in grid workloads: up to 96% of CPU time
  • Average batch size (Δ≤120s) is 15-30 jobs (500 max)
  • 75% of the batches are sized 20 jobs or less

| | Grid'5000 | NorduGrid | GLOW (Condor) |
|---|---|---|---|
| Submissions | 26k | 50k | 13k |
| Jobs | 808k (951k) | 738k (781k) | 205k (216k) |
| CPU time | 193y (651y) | 2192y (2443y) | 53y (55y) |

A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, The Characteristics and Performance of Groups of Jobs in Grids, Euro-Par, LNCS, vol. 4641, pp. 382-393, 2007.
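The batch (BoT) identification behind these numbers groups a user's successive submissions whose interarrival times are within a threshold Δ (120 s above). A minimal sketch of that grouping, assuming a simple `(user, submit_time)` trace format rather than the actual GWA schema:

```python
# Sketch of arrival-based batch identification: successive jobs by the
# same user submitted at most `delta` seconds apart form one batch.

def identify_batches(jobs, delta=120):
    """jobs: iterable of (user, submit_time) tuples.

    Returns a list of batches; each batch is the list of submit times
    of one user's consecutive near-in-time submissions."""
    per_user = {}
    for user, t in sorted(jobs, key=lambda j: j[1]):
        per_user.setdefault(user, []).append(t)

    batches = []
    for times in per_user.values():
        batch = [times[0]]
        for t in times[1:]:
            if t - batch[-1] <= delta:
                batch.append(t)        # gap within delta: same batch
            else:
                batches.append(batch)  # gap too large: close the batch
                batch = [t]
        batches.append(batch)
    return batches
```

For instance, a user submitting at t=0, 60, and 300 s yields two batches (sizes 2 and 1), since the 240 s gap exceeds Δ.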
![Page 36: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/36.jpg)
System Availability Characteristics
Resource Evolution: Grids Grow by Cluster
![Page 37: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/37.jpg)
System Availability Characteristics
Grid Dynamics: Grids Shrink Temporarily

Grid-level view
Average availability: 69%
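Grid-level availability like the 69% above is simply the fraction of time resources were up. A sketch of that computation, assuming a hypothetical FTA-like trace of `(start, end, state)` interval records (the real FTA format is richer):

```python
# Sketch: availability as the up fraction of the traced time, from
# (start, end, state) interval records with state "up" or "down".

def availability(intervals):
    """Return the fraction of traced time spent in the 'up' state."""
    up = sum(end - start for start, end, state in intervals if state == "up")
    total = sum(end - start for start, end, state in intervals)
    return up / total if total else 0.0
```

A node up for 69 of 100 traced hours gives 0.69, matching the shape of the grid-level figure reported on the slide.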
![Page 38: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/38.jpg)
Resource Availability Model

• Assume no correlation of failure occurrence between clusters
• Which site/cluster? fs, the fraction of failures at cluster s
• Weibull distribution for the IAT: the longer a node is online, the higher the chance that it will fail
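The Weibull observation corresponds to a shape parameter k > 1, for which the hazard rate grows with uptime. A small sketch using the standard library; the scale and shape values are illustrative, not the model's fitted parameters.

```python
# Sketch: Weibull failure interarrival times (IAT) and the hazard rate.
# Shape k > 1 gives an increasing hazard: the longer a node has been
# online, the more likely it is to fail soon.
import random

def weibull_hazard(t, scale, shape):
    """Instantaneous failure rate at uptime t for Weibull(scale, shape)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

random.seed(42)
scale, shape = 10.0, 1.5            # illustrative parameters, k > 1
iats = [random.weibullvariate(scale, shape) for _ in range(1000)]

# With k > 1 the hazard grows with uptime:
assert weibull_hazard(20, scale, shape) > weibull_hazard(5, scale, shape)
```

With k = 1 the Weibull reduces to the exponential (memoryless) case, so the whole "longer online, more likely to fail" effect lives in the shape parameter.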
![Page 39: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/39.jpg)
Grid Workloads
Load Imbalance Across Sites and Grids

• Overall workload imbalance: normalized daily load (5:1)
• Temporary workload imbalance: hourly load (1000:1)
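Imbalance ratios like 5:1 and 1000:1 compare the most- and least-loaded sites after normalizing load by site capacity. A sketch of that metric, with function and field names invented for illustration:

```python
# Sketch: load imbalance across sites as the ratio between the most- and
# least-loaded site, with load normalized by site capacity.

def imbalance(loads, capacities):
    """loads: dict site -> submitted work; capacities: dict site -> CPUs.

    Returns max/min of the capacity-normalized loads."""
    normalized = [loads[s] / capacities[s] for s in loads]
    return max(normalized) / min(normalized)
```

Two equally sized sites receiving 100 and 20 jobs in a day give the 5:1 overall ratio of the slide; hourly bursts concentrated on one site drive the temporary ratio far higher.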
![Page 40: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/40.jpg)
4.1. Grid Workloads [4/5]
Modeling Grid Workloads: Feitelson adapted

• Adapted to grids: percentage of parallel jobs, other values
• Validated with 4 grid and 7 parallel production environment traces

A. Iosup, T. Tannenbaum, M. Farrellee, D. Epema, M. Livny: Inter-operating Grids through Delegated MatchMaking. SC|07 (nominated for Best Paper Award)
![Page 41: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/41.jpg)
Grid Workloads
Modeling Grid Workloads: adding users, BoTs

• Single arrival process for both BoTs and parallel jobs
• Reduce over-fitting and complexity of "Feitelson adapted" by removing the RunTime-Parallelism correlated model
• Validated with 7 grid workloads

A. Iosup, O. Sonmez, S. Anoep, and D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Systems, HPDC, pp. 97-108, 2008.
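The key modeling idea, one arrival process shared by BoTs and parallel jobs, can be sketched as a toy generator. The distributions below (exponential interarrivals, the BoT fraction, the size ranges) are placeholders in the spirit of the model, not the fitted parameters from the paper; the BoT size range follows the 2-70 figure from the earlier workload-analysis slide.

```python
# Toy workload generator: a single arrival process produces all
# submissions, each tagged as a BoT or a parallel job.
import random

def generate(n, mean_iat=60.0, bot_fraction=0.7, seed=0):
    """Return n submissions as (kind, arrival_time, size) tuples."""
    rng = random.Random(seed)
    t, jobs = 0.0, []
    for _ in range(n):
        t += rng.expovariate(1.0 / mean_iat)   # one shared arrival process
        if rng.random() < bot_fraction:
            # BoT: a bag of single-processor tasks (size range per slides)
            jobs.append(("bot", t, rng.randint(2, 70)))
        else:
            # Parallel job: size = number of processors (placeholder range)
            jobs.append(("parallel", t, rng.randint(2, 128)))
    return jobs
```

Because both job kinds share one arrival process, the generator needs no separate per-type arrival model, which is exactly the complexity reduction the slide describes.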
![Page 42: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/42.jpg)
How To Compare Existing and New Grid Systems?
The Delft Grid Simulator (DGSim)

DGSim: …tudelft.nl/~iosup/dgsim.php

• Discrete event generator
• Generate realistic workloads
• Automate the simulation process (10,000s of tasks)
![Page 43: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/43.jpg)
How to Inter-Operate Grids?
Existing (Working?) Alternatives

Architectures: Independent, Centralized, Hierarchical, Decentralized
Systems: Condor, Globus GRAM, Alien, Koala, OAR, CCS, Moab/Torque, OAR2, NWIRE, OurGrid, Condor Flocking

Open questions: Load imbalance? Resource selection? Scale? Root ownership? Node failures? Accounting? Trust? Scale?
![Page 44: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/44.jpg)
Inter-Operating Grids Through Delegated MatchMaking [1/3]
The Delegated MatchMaking Architecture

1. Start from a hierarchical architecture
2. Let roots exchange load
3. Let siblings exchange load

Delegated MatchMaking Architecture = Hybrid hierarchical/decentralized architecture for grid inter-operation
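The three steps above determine, for each site, the set of nodes it may delegate to: its parent and siblings in the hierarchy, plus the other roots if it is itself a root. A sketch of building those delegation targets from a tree; the function and data shapes are assumptions for illustration.

```python
# Sketch of the hybrid architecture's links: hierarchy (parent/children),
# plus load exchange between siblings and between roots.

def delegation_targets(node, parent_of, roots):
    """Nodes that `node` may exchange load with.

    parent_of: dict child -> parent (tree edges)
    roots:     list of tree roots (one per grid)"""
    targets = set()
    parent = parent_of.get(node)
    if parent is not None:
        targets.add(parent)                              # hierarchy link
        targets |= {c for c, p in parent_of.items()
                    if p == parent and c != node}        # sibling links
    if node in roots:
        targets |= {r for r in roots if r != node}       # root peering
    return targets
```

In a two-grid setup with roots A and B, a leaf under A can delegate to A and to its siblings, while A itself can also delegate to B; this is what makes the architecture hybrid rather than purely hierarchical.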
![Page 45: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/45.jpg)
Massively Social Gaming on Clouds

Current Technology (MSGs):
• Upfront payment
• Cost and scalability problems
• Makes players unhappy

Our Vision:
• Scalability & automation
• Economy of scale with clouds

Ongoing Work:
• Content: the POGGI Framework
• Platform: edutain@grid
• Analytics: the CAMEO Framework

The Future:
• Happy players
• Happy cloud operators
• Million-user, multi-bn market
• Content, World Sim, Analytics
Publications: Gaming and Clouds
• 2008: ACM SC, TR Perf
• 2009: ROIA, CCGrid, NetGames, EuroPar (Best Paper Award), CloudComp, TR variability
• 2010: IEEE TPDS, Elsevier CCPE
• 2011: Book Chapter CAMEO

Graduation Forecast
• 2010/2011: 1 PhD, 2 MSc, 4 BSc
![Page 46: Grid Computing: From Old Traces to New Applications](https://reader035.vdocuments.mx/reader035/viewer/2022081421/56813d04550346895da6a8e3/html5/thumbnails/46.jpg)
Problems in Grid Scheduling and Resource Management
New Hypes, New Focus for Designers

6. Clouds
  • Large-scale, loosely coupled infrastructure and/or platform
  • Computation and storage have fixed costs (?)
  • Guaranteed good performance, e.g., no wait time (?)
  • Easy to port grid applications to clouds (?)
7. Multi-cores
  • Small- and mid-scale, tightly coupled infrastructure
  • Computation and storage have lower cost than grid (?)
  • Good performance (?)
  • Easy to port grid applications to multi-cores (?)