The Performance of Bags-Of-Tasks in Large-Scale Distributed Computing Systems

Alexandru Iosup, Ozan Sonmez, Shanny Anoep, and Dick Epema

ACM/IEEE Int’l. Symposium on High Performance Distributed Computing

Parallel and Distributed Systems Group, TU Delft


The VL-e project

• A grid project in the Netherlands (2004-)

• Natural gas money: VL-e 45 MEuro / 800 MEuro total research package

• Overall aim: … to design and build a virtual lab for (digitally) enhanced science (e-science) experiments (no in-vivo or in-vitro, but in-silico experiments).

• Goals:
1. create prototypes of application-specific e-science environments
2. design and develop re-usable ICT/grid components
3. validate with real-life applications in testbeds

Natural gas price → $$ for grid computing

The VL-e project: application areas

• Application areas: Data Intensive Science, Bio-Diversity, Bio-Informatics, Food Informatics, Medical Diagnosis & Imaging, Dutch Telescience
• Virtual Laboratory (VL): Application Oriented Services
• Grid Services: Harness multi-domain distributed resources
• Management of comm. & computing
• Philips, Unilever, IBM

[Figure, shown on two further slides: the application areas Bio-Diversity, Bio-Informatics, Food Informatics, Medical Diagnosis & Imaging, Dutch Telescience (with Philips, Unilever, IBM), labelled "Bags-of-Tasks"]

The Challenge

• Complete scientific work better, …
  • User-oriented performance metrics (time a critical performance component)
  • Bags-of-tasks for ease-of-use

• … in real systems
  • Workloads (now that real traces are available)
  • Information unavailability

• What to do?
  • Hint: the next 10% improvement won’t cut it!

The Challenge (cont’d.)

• System model: What is a good model for the study of large-scale distributed computing systems that run bags-of-tasks?

• Input model: What is a good model for bag-of-tasks workloads in large-scale distributed computing systems?

• What is the best setup for such a system/input?
  • How to find the best?
  • If a best is found, can there be another?

The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems

1. Introduction and Motivation
2. Context: System Model
3. Workload Model
4. Design Space Exploration
5. Conclusion

Context: System Model [1/4]

Overview

• System Model
1. Clusters: execute jobs
2. Resource managers: coordinate job execution
3. Resource management architectures: route jobs among resource managers
4. Task selection policies: create the eligible set
5. Task scheduling policies: schedule the eligible set


Resource Management Architectures: route jobs among resource managers

• Separated Clusters (sep-c)
• Centralized (csp)
• Decentralized (fcondor)


Task Selection Policies: create the eligible set

• Age-based:
1. S-T: Select Tasks in the order of their arrival.
2. S-BoT: Select BoTs in the order of their arrival.

• User priority based:
3. S-U-Prio: Select the tasks of the User with the highest Priority.

• Based on fairness in resource consumption:
4. S-U-T: Select the Tasks of the User with the lowest resource consumption.
5. S-U-BoT: Select the BoTs of the User with the lowest resource consumption.
6. S-U-GRR: Select the User in Round-Robin order; take all tasks of this user.
7. S-U-RR: Select the User in Round-Robin order; take one task of this user.
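As a rough illustration of how such policies order the same queue differently, here is a minimal Python sketch of three of them (S-T, S-BoT, S-U-RR). The `Task` record and its field names are hypothetical and not taken from the article; the sketch also assumes that a BoT's arrival is the arrival of its first task.

```python
from collections import deque, namedtuple

# Hypothetical task record: task name, owning user, bag-of-tasks id, arrival time.
Task = namedtuple("Task", ["name", "user", "bot_id", "arrival"])

def select_s_t(queue):
    """S-T: order individual tasks by their arrival time."""
    return sorted(queue, key=lambda t: t.arrival)

def select_s_bot(queue):
    """S-BoT: order whole BoTs by the arrival of their first task (assumption)."""
    first_arrival = {}
    for t in queue:
        first_arrival[t.bot_id] = min(first_arrival.get(t.bot_id, t.arrival), t.arrival)
    return sorted(queue, key=lambda t: (first_arrival[t.bot_id], t.bot_id, t.arrival))

def select_s_u_rr(queue):
    """S-U-RR: cycle over users, taking one task per user per turn."""
    per_user = {}
    for t in sorted(queue, key=lambda t: t.arrival):
        per_user.setdefault(t.user, deque()).append(t)
    users = deque(per_user)  # users ordered by their first arrival
    order = []
    while users:
        user = users.popleft()
        order.append(per_user[user].popleft())
        if per_user[user]:
            users.append(user)  # user still has tasks: back of the line
    return order

queue = [
    Task("a1", "alice", 1, 0.0),
    Task("a2", "alice", 1, 1.0),
    Task("b1", "bob", 2, 2.0),
    Task("a3", "alice", 1, 3.0),
    Task("b2", "bob", 2, 4.0),
]
names = lambda tasks: [t.name for t in tasks]
```

On this queue the three policies produce three different orders, which is the point of treating task selection as its own design dimension.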



Task Scheduling Policies: schedule the eligible set

• Information availability:
  • Known (K)
  • Unknown (U)
  • Historical records (H)

• Sample policies:
  • Earliest Completion Time (with Prediction of Runtimes) (ECT(-P))
  • Fastest Processor First (FPF)
  • (Dynamic) Fastest Processor Largest Task ((D)FPLT)
  • Shortest Task First w/ Replication (STFR)
  • Work Queue w/ Replication (WQR)

[Table: scheduling policies classified by Task Information (K, H, U) × Resource Information (K, H, U). Policies shown: ECT, FPLT, FPF, ECT-P, DFPLT, MQD, STFR, RR, WQR]
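To make the role of information availability concrete, the following is a minimal sketch of a greedy Earliest Completion Time rule for the case where both task sizes and processor speeds are known. It is an illustration only, not the implementation evaluated in the article; the function name, the input format, and the example speeds are assumptions.

```python
def ect_schedule(task_sizes, speeds):
    """Greedy ECT: place each task on the processor that would finish it earliest.

    task_sizes: work per task (runtime on a speed-1.0 processor), assumed known.
    speeds: relative processor speeds, assumed known.
    Returns (assignment, makespan); assignment[i] is the processor of task i.
    """
    free_at = [0.0] * len(speeds)  # time at which each processor becomes idle
    assignment = []
    for size in task_sizes:
        # Completion time of this task on every processor.
        finish = [free_at[p] + size / speeds[p] for p in range(len(speeds))]
        best = min(range(len(speeds)), key=lambda p: finish[p])
        free_at[best] = finish[best]
        assignment.append(best)
    return assignment, max(free_at)

# Hypothetical example: three tasks, one slow and one fast processor.
assignment, makespan = ect_schedule([4.0, 4.0, 2.0], speeds=[1.0, 2.0])
```

With unknown task sizes the `finish` estimate is not available, which is why policies such as FPF or WQR rely on processor speed alone or on replication.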


Workload Modeling 101: What Matters

• Job arrival process & job service time:
  • Self-similarity (burstiness) vs. Poisson [Leland & Ott ToN’94]

• Job grouping: bags-of-tasks dominant application type in multi-cluster grids and cycle-scavenging systems (the e-Science infrastructure) [IosupJSE EuroPar’07]

• Job size: almost always 1 CPU [IosupDELW Grid’06]

[Figure: No. packets/time unit vs. time units, for TimeUnit = 0.01s and TimeUnit = 100s; annotation: Longer queues]

A Bag-of-Tasks Workload Model

• Model:
  • Users, Bags-of-Tasks, Tasks
  • Heavy-tailed distributions for inter-arrival time, job service time → can model self-similar workloads

• More details (e.g., parameter values): see article

• Validation data: the Grid Workloads Archive (http://gwa.ewi.tudelft.nl/)
  • 7 long-term grid traces
  • >5 million tasks
  • >2500 users
  • >40k CPUs
  • Domains: HEP, graphics, AI, math, biomed, climate, finance, aero…
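A toy generator in the spirit of this model might look as follows. The Pareto distribution, the shape parameters, and the cap on bag size are illustrative assumptions; the article gives the actual distributions and parameter values.

```python
import random

def generate_bot_workload(num_bots, num_users, seed=0, arrival_shape=1.5,
                          size_shape=1.5, runtime_shape=2.0, max_bot_size=1000):
    """Return a list of (arrival_time, user, task_runtimes), one entry per BoT.

    All distribution choices and shape parameters are assumptions of this sketch.
    """
    rng = random.Random(seed)
    now = 0.0
    workload = []
    for _ in range(num_bots):
        # Heavy-tailed gaps: mostly short, occasionally very long.
        now += rng.paretovariate(arrival_shape)
        user = rng.randrange(num_users)
        # paretovariate returns values >= 1, so every bag has at least one task.
        bot_size = min(int(rng.paretovariate(size_shape)), max_bot_size)
        runtimes = [rng.paretovariate(runtime_shape) for _ in range(bot_size)]
        workload.append((now, user, runtimes))
    return workload

workload = generate_bot_workload(num_bots=50, num_users=5, seed=1)
```

Fixing the seed makes a generated workload reproducible, which matters when the same input is replayed against many system configurations.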


Design Space Exploration [1/5]

Overview

• Design space exploration: time to understand how our solutions fit into the complete system.

• Study the impact of:
  • The Task Scheduling Policy (s policies)
  • The Workload Characteristics (P characteristics)
  • The Dynamic System Information (I levels)
  • The Task Selection Policy (S policies)
  • The Resource Management Architecture (A policies)

s x 7P x I x S x A x (environment) → >2M design points
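The size of such a design space is the product of the number of options per dimension. The sketch below enumerates a deliberately tiny space; the option lists are hypothetical subsets chosen for illustration and do not reproduce the counts behind the >2M figure.

```python
from itertools import product
from math import prod

# Hypothetical, heavily reduced option lists (the article defines the real ones).
design_space = {
    "scheduling_policy": ["ECT", "FPF", "WQR"],
    "selection_policy": ["S-T", "S-BoT"],
    "architecture": ["sep-c", "csp", "fcondor"],
    "system_load": [0.2, 0.5, 0.95],
}

def count_design_points(space):
    """Number of configurations = product of the option counts."""
    return prod(len(options) for options in space.values())

def iter_design_points(space):
    """Yield every configuration as a dict, one per design point."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

total = count_design_points(design_space)
first = next(iter_design_points(design_space))
```

Even four small dimensions already give 54 configurations, so adding dimensions or options quickly multiplies the number of simulations needed.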


Design Space Exploration [2/5]

Experimental Setup

• Simulator:
  • DGSim [IosupETFL SC’07, IosupSE EuroPar’08]

• System:
  • DAS + Grid’5000 [Cappello & Bal CCGrid’07]
  • >3,000 CPUs: relative perf. 1-1.75

• Metrics:
  • Makespan
  • Normalized Schedule Length ~ speed-up

• Workloads:
  • Real: DAS + Grid’5000
  • Realistic: system load 20-95% (from workload model)
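The two metrics can be sketched per bag-of-tasks as below. The definitions used here are assumptions of this sketch: makespan is taken as the time from the earliest task submission to the latest task completion, and Normalized Schedule Length as the makespan divided by the total runtime of the bag's tasks on a reference processor. The article should be consulted for the exact definitions.

```python
def makespan(submit_times, finish_times):
    """Time from the earliest submission to the latest completion in one BoT."""
    return max(finish_times) - min(submit_times)

def normalized_schedule_length(submit_times, finish_times, reference_runtimes):
    """Makespan relative to the bag's total work on a reference processor (assumed definition)."""
    return makespan(submit_times, finish_times) / sum(reference_runtimes)

# Hypothetical bag of three tasks.
submit = [0.0, 0.0, 1.0]
finish = [5.0, 8.0, 9.0]
reference = [4.0, 6.0, 8.0]

ms = makespan(submit, finish)
nsl = normalized_schedule_length(submit, finish, reference)
```

Under this definition a lower value means the bag finished faster relative to its sequential work, which is why the metric behaves like an inverse speed-up.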


Design Space Exploration [3/5] Selected Results A

Design Guidelines for Scheduling Policies

• Influence of the information type:
  • (K,K): best balance between MS and NSL
  • (*,U),(U,*): surprisingly good (FPF) to surprisingly poor (WQR4x)
  • (*,H),(H,*): poor. Simple runtime predictors don’t work (see article)

• Where to invest time?
  • K -> H, K -> U: adapt for information type with lowest variation

[Figure labels: WQR4x, FPF]

Design Space Exploration [4/5] Selected Results B

Task Selection Only for Busy Systems

• Not much difference between task selection policies until the system load exceeds 50%.
• For DAS + Grid’5000 no change of task selection policy.

[Figure labels: Same performance; S-BoT; S-T]

Design Space Exploration [5/5] Selected Results C

Resource Management Architecture

• Centralized, separated, or distributed?
  • Centralized is best [Note: job overhead not considered.]
  • Distributed: good for system load below 50%; over 50% it does not finish all tasks.



Conclusion

• System Model = Resource Management Architecture + Task Selection Policy + Task Scheduling Policy
• Information availability framework
• BoT workload model
• Design space exploration: the performance of bags-of-tasks

Future Work?

• Better predictors
• (H,H) task scheduling policies


Thank you! Questions? Remarks? Observations?

Help building the Grid Workloads Archive:

http://gwa.ewi.tudelft.nl

• Contact: [email protected] [google “Iosup”]

• Web sites:
o http://www.vl-e.nl : VL-e project
o http://www.pds.ewi.tudelft.nl : PDS group articles & software
