the elasticity and plasticity in semi-containerized co ... · google trace vs. alibaba trace...

The Elasticity and Plasticity in Semi-Containerized Co-locating Cloud Workload: a view from Alibaba Trace

Qixiao Liu* and Zhibin Yu

Shenzhen Institute of Advanced Technology

Chinese Academy of Science

@SoCC 2018, Carlsbad, CA, U.S.A

1

Introduction

• Challenges in the cloud computing – Low resource utilization – Tail latency – IRU-QoS dilemma – Task scheduling, resource management, programming diagram, etc.

• Traces from industrial production environment – The Google trace released in 2011

• 12.7m machines, 670k jobs (mixed workload), 29 days.

– Alibaba released in 2017 • 1.3k machines, 23k jobs (also mixed workload), in 1 day.

2

Fraction External

Google trace vs. Alibaba trace • Google trace

1. Server heterogeneity

2. Priority information

3. Server failure

4. Mixed workload (production and non-production), but are

‘equal’ as jobs

• Alibaba trace 1. All servers are equipped with 64 CPUs,

>99% of servers: same memory and disk capacities.

2. No priority information

3. Negligible server failures

4. Online services and batch jobs are traced separately

More elaborative views into the co-location 3

• Elasticity – resist a distorting influence and

to return to its original size and shape when that influence or force is removed*

• Plasticity – non-reversible changes of shape

in response to applied forces*

* Elasticity, plasticity (physics), wikipedia

4

Elastic computing

100%

0

Resource

Resource need

Resource provision

5

Elastic computing

100%

0

Resource

Resource need

Resource provision

6

Elastic computing in co-location

100%

0

Resource

Resource need

Resource provision

Resource used by

batch tasks

Task eviction

7

Resource utilization Batch job performance ?

Ideal elastic computing in co-location plus plasticity

100%

0

Resource

Resource need

Resource provision

Resource used by

batch tasks

8

Outline

• Trace overview

• Shape the workload

• Statistics analysis of containers and batch jobs

• Co-location analysis

• Discussion

• Conclusion

9

Alibaba cluster management architecture

• Mixed workload – Online services

– Batch jobs

• Mixed entities (semi-containerization) – Container

– Tasks

• Mixed architecture – Concurrent schedulers: Sigma and Fuxi

– Level0-controller

– Novelty?

10

technical

legacy

It is in production.

Trace structure

Servers

Containers

Batch jobs

Static information: ID, Machine ID; Req CPU/mem/disk; Create time; Allocated CPU ID.

Static information: Jobid/taskid; Create/end time; Req CPU/memory; Status, #instance.

Static information: Machine ID; Config. CPU/mem/Disk; Event and time.

Runtime statistics: Container ID; Sample point; Used CPU/mem/disk; CPU load, CPI, mem miss.

Runtime statistics: Jobid/taskid; Instance start/end time; Status; Sequence; Used CPU/memory

Runtime statistics: Machine ID; Sample point; Used CPU/mem/disk; CPU load, CPI,

Sampled every 5min

Sampled every 5min

Execution based

11

Outline

• Trace overview




• Discussion

• Conclusion

12

Container • 11089 containers, each runs one online service, for 24 hours

• Container requests CPUs, memory and disk – Req. #CPU: 1, 4, 6, 8, 16 – Memory capacity (normalized): 0.002 to 0.318 – Disk capacity (normalized): 3e-11 to 0.113 – 25 <CPU, memory, disk> patterns for all containers, 19 are valid.

• Requested resource over server capacity (ROC):

(Resource_req/Server capacity)*100%

CPU Memory Disk

ROC 9.5% 10.9% 4.9%

ROC SD 4.4% 8.8% 2.1%

13

Batch job • Batch job structure: job, task, instance

• Job->task: DAG

• Task->instance: same computing logic, resource request

Batch job

Batch task Batch task

Batch instance

Batch instance

DAG

Same computing logic

12.9k

80k

11.9m 14

Batch job • Batch task requests CPU and memory

– #CPU: 0.45 to 8 (14 values in total, 0.05 basic unit) – Memory: 0.0027 to 0.1273 (750 values) – 989 <CPU, memory> patterns in 80k tasks

• Batch instance status: – Failed, interrupted, ready, running, terminated, wait – Failed/interrupted rate are 1.5%

• Google trace: ‘half submissions are resubmissions’

CPU Memory

ROC 0.8% 0.9%

ROC SD 0.5% 0.7%

15

Outline

• Trace overview


• Runtime statistics analysis of containers and batch jobs


• Discussion

• Conclusion

16

Container resource utilization

– Resource overprovisioning – Max vs. average resource used

• steady memory and disk utilization, but CPU varies significantly

Reserved but not used Avg. used max used

16 8 4 Mem A B C B Disk A C

CPU Memory Disk

17

0

1

2

3

4

L H L H

4 CPU 8 CPU

0

0.05

0.1

0.15

0.2

L H L H

4 CPU 8 CPU

Container performance

- Containers are guaranteed resources when load rises;

- Higher load increases resource utilization, but not hurting the performance.

L/H: low /high

CPU utilization

18

CPI CPU load

Batch instance resource utilization

– Resource overcommit, the amount of its actual used resources is greater than that it requested at submission.

– Both CPU and memory overcommit.

CPU utilization Memory utilization

19

Incremental resource allocation in Fuxi

• FUXI, VLDB 2014

20

Incremental resource allocation in Fuxi

• FUXI, VLDB 2014

• Start to run a batch instance with its initial resource request, increase its allocation when more resources become available.

• Batch instance with lower resource request has a better chance get to run

21

• Local queue in node

• Resource request: – Initial resource request (low)

– Actual (peak) request (high)

0.1

1

10

100

1000

Sta

rt D

elay

per

tas

k(s

)

Cluster wide resource allocation efficiency

Instances from the same task, get scheduled at the same time Start delay:

SD = 𝑠𝑡𝑎𝑟𝑡_𝑡𝑖𝑚𝑒𝑖 −𝑅𝑒𝑓_𝑠𝑡𝑎𝑟𝑡_𝑡𝑖𝑚𝑒𝑇

22

#servers: 1313

#instances per

task

Cluster wide resource allocation efficiency

23

0.1

1

10

100

1000Mean SD

Max SD

Sta

rt D

elay

(s)

#instances per

task

The latest one most likely delay the result delivery of the task/job

0%

20%

40%

60%

80%

100%

1 2 4 8 16 32

Cluster-wide batch instance performance

Normalized instance latency: NIL = 𝐸𝑥𝑒𝑐𝑢𝑡𝑖𝑜𝑛_𝑡𝑖𝑚𝑒𝑖/𝑅𝑒𝑓_𝑡𝑖𝑚𝑒𝑇

• most tasks have their avg NIL below 3. 24

The mean NIL

0%

20%

40%

60%

80%

100%

1 2 4 8 16 32

Mean NIL Max NIL

NIL

dis

trib

uti

on

Cluster-wide batch instance performance

• the max NIL of few tasks deviate from the average.

25

Outline

• Trace overview




• Discussion

• Conclusion

26

Container deployment

Containers reserve resources - 0~64 CPUs; - 0~150% memory; overbooking - 0~60% disk.

CPU

Memory

Disk

- Containers are deployed using different policies; - CPU remains the main constraint.

27

#S

erver

s

Batch instance scheduling in the co-location

BO: Batch instance only servers CA, CB, CC: low, medium, full resources reserved by containers No obvious difference to schedule a batch instance in the cluster: - Similar accumulated instance

execution time on all servers, although BO has more batch instances running.

28

0

5000

10000

15000

BO CA CB CC

#batch instances

320340360380400

BO CA CB CC

Instance execution

time (hours)

Resource allocate to batch instances

Max #CPU allowed for batch instances to use on servers does not depend on the #CPU used by containers.

-32

-24

-16

-8

0

8

16

24

32

Max #CPU used by

containers on each server

(sampled every 5 min)

Max #CPU used by batch a

instance on each server (max

CPUs during execution)

29

Resource utilization in the cluster

30

0%10%20%30%40%50%60%70%

CPU Memory Disk ContainerR

eso

urc

e use

d

BO CA CB CC

Outline

• Trace overview




• Discussion

• Conclusion

31

Discussion

• Elasticity – Resource overprovisioning (containers). – Resource overcommitment (batch instances). – Resource overbooking.

• Plasticity – Very low task eviction rate in the cluster (1.5%). – Accumulated batch instance execution time on most servers is similar. – SD increases radically when a task owns more than 1000 instances (there

are 1313 servers). – No obvious difference between the maximum allowed #CPU for batch

instance to use on most servers

32

Outline

• Trace overview




• Discussion

• Conclusion

33

Conclusion

• Alibaba presents a trace, using semi-containerized cluster management

• Concurrent traces for online services and batch jobs allow more elaborative characterization of the mixed workload

• Elasticity and plasticity in the cluster management promoted the batch job performance.

Thanks for attention!! Also @poster

34

the elasticity and plasticity in semi-containerized co ... · google trace vs. alibaba trace...

Documents