SpeQuloS: A QoS Service for BoT Applications Using Best Effort Distributed Computing Infrastructures

Simon Delamare (1), Gilles Fedak (2), Derrick Kondo (3), Oleg Lodygensky (4)

(1) LIP/CNRS, Univ. Lyon, France

(2) LIP/INRIA, Univ. Lyon, France

(3) LIG/INRIA, Univ. Grenoble, France

(4) LAL/CNRS, Univ. Paris XI, France

High-Performance Parallel and Distributed Computing, 2012


Introduction

BE-DCI = “Best-Effort” Distributed Computing Infrastructure

→ Large computing power at low cost, avoids wasting resources
→ No availability guarantee

Desktop Grids

→ BOINC projects: Peta FLOPS for free

Grids used in Best-Effort mode

→ ≈ 40% of utilization in Grid5000@Lyon

Cloud “Spot” Instances

→ c1.large instance price: 0.12$/h (spot) vs. 0.32$/h (regular)

Relevant for BoT execution ...
- Bag of Tasks: set of independent tasks to compute

→ but low QoS level
- Especially compared to regular infrastructures


Performance Problem Addressed

BoT completion slows down at the end of execution → Tail Effect

[Figure: BoT completion ratio vs. time, showing the BoT completion curve, the tail part of the BoT, the ideal time, the actual completion time and the tail duration. The ideal time is obtained by continuing the completion rate observed at 90% of completion.]

Slowdown = (Tail Duration + Ideal Time) / Ideal Time

Measured by Slowdown:

$S = \frac{\text{RealCompletionTime}}{\text{IdealCompletionTime}}$
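The slowdown measure can be illustrated with a short Python sketch. It assumes, following the figure annotation, that the ideal time is a linear extrapolation of the completion rate observed at 90% of completion. The function names and the sample numbers are hypothetical and do not come from the talk.

```python
def ideal_completion_time(time_at_90_percent: float) -> float:
    """Extrapolate the completion time from the rate observed at 90% completion.

    Assumption: without a tail, the BoT would keep the average completion
    rate it had over its first 90% of tasks.
    """
    return time_at_90_percent / 0.9


def tail_slowdown(real_completion_time: float, time_at_90_percent: float) -> float:
    """Slowdown S = real completion time / ideal completion time."""
    return real_completion_time / ideal_completion_time(time_at_90_percent)


# Hypothetical execution: 90% of the tasks are done after 90 time units,
# but the last task only finishes after 250 time units.
ideal = ideal_completion_time(90.0)
tail_duration = 250.0 - ideal
slowdown = tail_slowdown(250.0, 90.0)
print(f"ideal={ideal:.1f} tail={tail_duration:.1f} S={slowdown:.2f}")
```

With these numbers the tail lasts longer than the ideal execution itself, which gives a slowdown of about 2.5.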


Slowdown by Tail Effect

Slowdown reported on BoT execution

[Figure: fraction of executions where the tail slowdown is < S, as a function of the tail slowdown S (completion time observed divided by ideal completion time, log scale from 0.1 to 100), for BOINC and XWHEP.]

- Best 50% of executions ⇒ S < 1.3
- 25% to 33% of executions ⇒ S > 2
- Worst 5% of executions ⇒ S > 4 to 10

                     Avg. % of BoT in tail    Avg. % of time in tail
BE-DCI Trace         BOINC      XWHEP         BOINC      XWHEP
Desktop Grids        4.65       5.11          51.8       45.2
Best Effort Grids    3.74       6.40          27.4       16.5
Spot Instances       2.94       5.19          22.7       21.6

→ Caused by no more than the last 7% of the BoT


How to improve the situation?

Better scheduling

QoS in Grid scheduling ([12], [20], [38])

→ Requires heavy modification of the middleware
→ No satisfactory solution for unreliable infrastructures ([7])

Addressing the tail effect

→ e.g. in MapReduce ([3], [39]), but requires precise information from compute nodes, which is hard to obtain in large DCIs.

Building Hybrid DCIs

Grid & Desktop Grid ([35],[36])

→ Mostly to offload Grid usage

Using Cloud computing ([10],[28],[37])

→ To address peak demands


SpeQuloS Service

→ Improving the QoS perceived by BE-DCI users
- Speeding up BoT execution
- Providing information on expected BoT execution time

By dynamic provisioning of Cloud resources

→ Monitoring BoT execution
→ Executing the tail on the Cloud

Features:
1. Our context: existing BE-DCIs and Clouds that we do not administer, treated as black boxes
2. Interface with users: QoS requests, state of completion, prediction of remaining time
3. Careful utilization of Cloud resources, with billing & accounting of usage


Framework

SpeQuloS modules:

Information: Collects QoS-related information from DGs

Oracle: Strategies to appropriately use Cloud resources / QoS prediction for users

Scheduler: Starts/stops Cloud resources, usage accounting

Credit System: Bills Cloud usage to the user, who spends “credits” to buy Cloud resource cpu.h

Implementation

Independent modules using Python & MySQL

Supported Clouds: EC2, OpenNebula, etc.

Supported DG middleware: BOINC & XtremWeb-HEP


Cloud Provisioning Strategies

When to start Cloud resources?
- At 90% of BoT completion (9C)
- At 90% of BoT assignment (9A)
- When the tail appears, detected by monitoring execution time variance (V)

How many Cloud resources to start (for a given amount of credits)?
- Greedy: as many as possible, for 1 hour of Cloud usage (G)
- Conservative: so that enough credits remain to run the Cloud resources up to an estimated completion time (C)

How to use Cloud resources?
- Flat: Cloud workers are not differentiated from BE-DCI workers (F)
- Reschedule: the scheduler reschedules tasks executing on the BE-DCI to the Cloud (R)
- Cloud Duplication: uncompleted tasks are duplicated to a dedicated Cloud infrastructure (D)
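The first two questions (when to start, how many to start) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `BotState` fields, the function names, and the rule that one credit buys one cpu.hour on one Cloud worker are hypothetical. The variance-based trigger (V) is left out because the slides give no detail on how the variance is monitored.

```python
from dataclasses import dataclass


@dataclass
class BotState:
    """Snapshot of a BoT execution (all field names are hypothetical)."""
    total_tasks: int
    completed_tasks: int
    assigned_tasks: int  # tasks handed to a worker at least once


def should_start_cloud(state: BotState, trigger: str) -> bool:
    """Decide whether Cloud resources should be started (9C and 9A only)."""
    if trigger == "9C":  # 90% of the BoT is completed
        return state.completed_tasks >= 0.9 * state.total_tasks
    if trigger == "9A":  # 90% of the BoT is assigned
        return state.assigned_tasks >= 0.9 * state.total_tasks
    raise ValueError(f"unsupported trigger: {trigger}")


def cloud_workers_to_start(credits_cpu_hours: float, sizing: str,
                           estimated_remaining_hours: float = 1.0) -> int:
    """Number of Cloud workers affordable with the given credits.

    Assumption: one credit buys one cpu.hour on one Cloud worker.
    """
    if sizing == "G":    # Greedy: spend all the credits over one hour
        hours = 1.0
    elif sizing == "C":  # Conservative: keep workers until estimated completion
        if estimated_remaining_hours <= 0:
            raise ValueError("estimated remaining time must be positive")
        hours = estimated_remaining_hours
    else:
        raise ValueError(f"unsupported sizing: {sizing}")
    return int(credits_cpu_hours // hours)


state = BotState(total_tasks=1000, completed_tasks=905, assigned_tasks=1000)
print(should_start_cloud(state, "9C"))           # True: 905 of 1000 tasks done
print(cloud_workers_to_start(30.0, "G"))         # 30 workers for one hour
print(cloud_workers_to_start(30.0, "C", 4.0))    # 7 workers for four hours
```

The sketch makes the trade-off visible: Greedy starts more workers but can run out of credits before the BoT ends, while Conservative starts fewer workers that can stay up until the estimated completion time.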


Experimentation Setup (1)

Simulations using availability traces of real BE-DCIs, various BoT workloads, and the BOINC and XWHEP middleware

BE-DCI availability traces:
- Desktop Grids: seti, nd (SETI@Home & NotreDame traces from FTA)
- Best Effort Grids: g5klyo, g5kgre (available resources in Grid5000 Lyon & Grenoble clusters in December 2010)
- Cloud Spot instances: spot10, spot100 (maximum number of instances for a renting cost of 10 or 100 $ per hour, fluctuates according to market price)

trace    length (days)  mean    deviation  min    max    av. quartiles (s)  unav. quartiles (s)  avg. power (nops/s)  power std. dev.
seti     120            24391   6793       15868  31092  61, 531, 5407      174, 501, 3078       1000                 250
nd       413.87         180     4.129      77     501    952, 3840, 26562   640, 960, 1920       1000                 250
g5klyo   31             90.573  105.4      6      226    21, 51, 63         191, 236, 480        3000                 0
g5kgre   31             474.69  178.7      184    591    5, 182, 11268      23, 547, 6891        3000                 0
spot10   90             82.186  3.814      29     87     4415, 5432, 17109  4162, 5034, 9976     3000                 300
spot100  90             823.95  4.945      196    877    1063, 5566, 22490  383, 1906, 10274     3000                 300


Experimentation Setup (2)

BoT workloads:

BoT      Size                       nops / task                    Arrival time
SMALL    1000                       3600000                        0
BIG      10000                      60000                          0
RANDOM   norm(µ = 1000, σ² = 200)   norm(µ = 60000, σ² = 10000)    weib(λ = 91.98, k = 0.57)

Simulation methodology:
- Reproducible executions without & with SpeQuloS
- SpeQuloS credits provisioned with 10% of the BoT workload (in Cloud resource cpu.hour equivalent)

→ 25000 BoT execution traces
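The credit provisioning rule (10% of the BoT workload, expressed in Cloud cpu.hours) can be worked through with a small Python sketch. The Cloud worker power of 3000 nops/s is an assumption made for the example; the slides do not state the power of the Cloud workers. The function name is hypothetical.

```python
def provisioned_credits(n_tasks: int, nops_per_task: float,
                        cloud_power_nops_per_s: float,
                        fraction: float = 0.10) -> float:
    """Credits, in Cloud cpu.hours, worth `fraction` of the BoT workload."""
    workload_nops = n_tasks * nops_per_task
    workload_cpu_hours = workload_nops / cloud_power_nops_per_s / 3600.0
    return fraction * workload_cpu_hours


# SMALL and BIG workloads from the table, on an assumed 3000 nops/s Cloud worker.
small = provisioned_credits(1000, 3_600_000, cloud_power_nops_per_s=3000.0)
big = provisioned_credits(10_000, 60_000, cloud_power_nops_per_s=3000.0)
print(f"SMALL: {small:.1f} cpu.h, BIG: {big:.1f} cpu.h")
# prints "SMALL: 33.3 cpu.h, BIG: 5.6 cpu.h"
```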


Strategies Comparison

Tail Removal Efficiency
→ Tail duration with SpeQuloS vs. tail duration without SpeQuloS

[Figure, three panels: fraction of BoT where tail efficiency > P, as a function of the Tail Removal Efficiency (percentage P, 0 to 100).
Flat deployment strategy: 9C-G-F, 9A-G-F, V-G-F, 9C-C-F, 9A-C-F, V-C-F.
Reschedule deployment strategy: 9C-G-R, 9A-G-R, V-G-R, 9C-C-R, 9A-C-R, V-C-R.
Cloud duplication deployment strategy: 9C-G-D, 9A-G-D, V-G-D, 9C-C-D, 9A-C-D, V-C-D.]

Best strategies are able to
- Suppress the tail for 50% of executions
- Halve the tail for 80% of executions

Flat (F) < Reschedule (R) & Cloud Duplication (D)

Tail Detection (V) triggers Cloud too late
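The tail removal efficiency used in this comparison can be sketched in Python. The slide only names the two quantities being compared, so the formula below, P = 100 × (1 − tail with SpeQuloS / tail without SpeQuloS), is an assumption that matches the 0 to 100 scale of the plots. The function names and sample values are hypothetical.

```python
from typing import List


def tail_removal_efficiency(tail_without: float, tail_with: float) -> float:
    """Percentage of the tail duration removed (assumed definition)."""
    if tail_without <= 0:
        raise ValueError("the reference execution has no tail to remove")
    return 100.0 * (1.0 - tail_with / tail_without)


def fraction_above(efficiencies: List[float], p: float) -> float:
    """Fraction of executions whose efficiency is greater than p (the plotted curve)."""
    return sum(1 for e in efficiencies if e > p) / len(efficiencies)


# Hypothetical pairs of (tail without SpeQuloS, tail with SpeQuloS), in seconds.
pairs = [(200.0, 0.0), (400.0, 0.0), (500.0, 200.0), (100.0, 80.0)]
effs = [tail_removal_efficiency(without, with_) for without, with_ in pairs]
print(fraction_above(effs, 50.0))  # 0.75: three of the four tails are more than halved
```

Under this reading, "suppress the tail" corresponds to P = 100 and "halve the tail" to P ≥ 50.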


Cloud Resources Consumption

Percentage of credits spent vs. credits provisioned (= 10% of BoT workload).

10% to 25% of the provisioned credits are actually used by Cloud resources

[Figure: percentage of credits used (0 to 50) for each combination of SpeQuloS strategies: 9C-G-F, 9C-G-R, 9C-G-D, 9C-C-F, 9C-C-R, 9C-C-D, 9A-G-F, 9A-G-R, 9A-G-D, 9A-C-F, 9A-C-R, 9A-C-D, V-G-F, V-G-R, V-G-D, V-C-F, V-C-R, V-C-D.]

→ ≈2.5% of BoT workload is executed on Cloud


Completion Time

Combination of strategies used: 9C-C-R

[Figure, six panels: completion time (s) with and without SpeQuloS on each BE-DCI (SETI, ND, G5KLYO, G5KGRE, SPOT10, SPOT100). Panels: BOINC & SMALL BoT, BOINC & BIG BoT, BOINC & RANDOM BoT, XWHEP & SMALL BoT, XWHEP & BIG BoT, XWHEP & RANDOM BoT.]

→ Up to 9x speedup
→ Depends on the middleware used and on BE-DCI volatility


Completion Time Prediction

→ Users can ask for a prediction at any moment of BoT execution

Predicted completion time:

$t_p = \alpha \times \frac{t(r)}{r}$

Current completion ratio: r

Time elapsed since submission: t(r)

α: adjustment factor, depends on the execution environment:
- DG server & middleware
- Application & BoT size
→ Adjusted after BoT execution to minimize the difference with the observed completion time

Statistical uncertainty (±x%): Success rate of prediction vs previous execution
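The prediction formula can be sketched in Python as follows. Two points are assumptions, since the slide does not spell them out: α is fitted by least squares over past executions, and a prediction counts as successful when the observed time falls within ±x% of the predicted time. The function names and sample values are hypothetical, and the 0.20 default is only an example value for x.

```python
from typing import List, Tuple


def predict_completion_time(alpha: float, elapsed: float, ratio: float) -> float:
    """t_p = alpha * t(r) / r, for a completion ratio r in (0, 1]."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("completion ratio must be in (0, 1]")
    return alpha * elapsed / ratio


def fit_alpha(history: List[Tuple[float, float, float]]) -> float:
    """Least-squares alpha over past executions (assumed fitting method).

    Each history entry is (elapsed time t(r), ratio r, observed completion time).
    """
    raw = [elapsed / ratio for elapsed, ratio, _ in history]
    observed = [obs for _, _, obs in history]
    return sum(x * y for x, y in zip(raw, observed)) / sum(x * x for x in raw)


def is_successful(predicted: float, observed: float, uncertainty: float = 0.20) -> bool:
    """Assumed success test: observed time within ±uncertainty of the prediction."""
    return abs(observed - predicted) <= uncertainty * predicted


# Two hypothetical past executions, both sampled at r = 0.5.
alpha = fit_alpha([(100.0, 0.5, 300.0), (200.0, 0.5, 600.0)])
t_p = predict_completion_time(alpha, elapsed=150.0, ratio=0.5)
print(alpha, t_p)                 # 1.5 450.0
print(is_successful(t_p, 500.0))  # True: 500 lies within 450 ± 90
```

An α above 1 means that past executions finished later than the naive linear extrapolation t(r) / r suggested, which is what a tail effect produces.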


Prediction Results

Completion Time Prediction:
- Made at 50% of BoT execution
- Uncertainty: ± 20%
- α adjusted after 30 executions with the same BE-DCI, middleware and BoT workload

Prediction success rate (%) by BoT category & middleware:

          SMALL            BIG              RANDOM
BE-DCI    BOINC   XWHEP    BOINC   XWHEP    BOINC   XWHEP    Mixed
seti      100     100      100     82.8     100     87.0     94.1
nd        100     100      100     100      100     96.0     99.4
g5klyo    88.0    89.3     96.0    87.5     75      75       85.6
g5kgre    96.3    88.5     100     92.9     83.3    34.8     83.3
spot10    100     100      100     100      100     100      100
spot100   100     100      100     100      76      3.6      78.3
Mixed     97.6    96.1     99.2    93.5     89.6    65.3     90.2

→ Successful prediction in 9 cases out of 10

→ Lower results with heterogeneous BoT

→ Needs a learning phase, with the same BoT (at least the same app.), executed on the same BE-DCI.


SpeQuloS Deployment in European Desktop Grid Initiative

EDGI project: Bringing European Desktop Grids computing resources to scientific communities.


Conclusion

BE-DCIs: “Low-cost” solution but poor QoS (tail effect)

SpeQuloS: Uses Cloud resources to improve the QoS delivered to BE-DCI users
- Efficiently removes the tail problem
  → Speeds up BoT execution
  → Only requires a few % of the workload to be executed on the Cloud
- Enables completion time prediction for users

→ A step towards BE-DCIs usability in the computing landscape?

Future work:
- Better strategies to anticipate problems (tail effect)
- Analysis of user feedback from SpeQuloS deployments

