
The Case for Tiny Tasks in Compute Clusters

Kay Ousterhout*, Aurojit Panda*, Joshua Rosen*, Shivaram Venkataraman*, Reynold Xin*,

Sylvia Ratnasamy*, Scott Shenker*+, Ion Stoica*

* UC Berkeley, + ICSI

Setting

(Figure: a MapReduce/Spark/Dryad job made up of many tasks.)

Today’s tasks vs. Tiny Tasks

Use smaller tasks!

Why?

How?

Where?

Problem: Skew and Stragglers

Causes of slow tasks: a contended machine, or data skew.

Benefit: Handling of Skew and Stragglers

(Figure: today’s tasks vs. tiny tasks)

As much as 5.2x reduction in job completion time (95th percentile, Facebook trace)!
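Why do smaller tasks help? A toy simulation makes the intuition concrete. This is an illustration only, not the trace-driven methodology behind the 5.2x figure: it assumes a fixed amount of work spread over four slots, one of which runs at a quarter of normal speed (a contended machine), and it assigns each task greedily to whichever slot frees up first. The function name `makespan` and all numbers are hypothetical.

```python
import heapq

def makespan(task_durations, slot_speeds):
    """Job completion time when each task goes to the first slot that frees up.

    task_durations: work per task, in seconds on a normal-speed slot.
    slot_speeds: relative speed of each slot (1.0 = normal, 0.25 = contended).
    """
    # Min-heap of (time the slot becomes free, slot index); all slots start idle.
    free_at = [(0.0, slot) for slot in range(len(slot_speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for work in task_durations:
        start, slot = heapq.heappop(free_at)
        end = start + work / slot_speeds[slot]
        finish = max(finish, end)
        heapq.heappush(free_at, (end, slot))
    return finish

speeds = [1.0, 1.0, 1.0, 0.25]          # one straggling slot out of four
big = makespan([100.0] * 4, speeds)      # 400 s of work as 4 large tasks
tiny = makespan([1.0] * 400, speeds)     # the same work as 400 tiny tasks
print(big, tiny)                         # 400.0 124.0
```

With large tasks the job waits on the one task stuck on the slow slot; with tiny tasks the fast slots absorb most of the work and the slow slot only delays its last small task.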

Problem: Batch and Interactive Sharing

(Figure: a high-priority interactive job arrives while low-priority batch tasks occupy the slots; it must wait for them to finish.)

Clusters are forced to trade off utilization and responsiveness!

Benefit: Improved Sharing

(Figure: today’s tasks vs. tiny tasks)

High-priority tasks not subject to long wait times!
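Without preemption, an arriving high-priority task can start only when some running batch task finishes, so its wait is bounded by the batch task length. The sketch below assumes every slot runs back-to-back batch tasks of one fixed length, with start times staggered evenly across slots; `wait_for_slot`, `staggered`, and the numbers are hypothetical.

```python
def wait_for_slot(slot_phases, task_length, arrival):
    """Seconds a newly arrived task waits for the first slot to free up.

    Assumes no preemption and that slot i runs back-to-back batch tasks of
    task_length seconds, aligned so that one of them starts at slot_phases[i].
    """
    # Python's % returns a value in [0, task_length): the time remaining until
    # that slot's current batch task completes (0 if it just completed).
    return min((phase - arrival) % task_length for phase in slot_phases)

def staggered(num_slots, task_length):
    """Evenly staggered task start times across slots."""
    return [i * task_length / num_slots for i in range(num_slots)]

batch_wait = wait_for_slot(staggered(4, 600.0), 600.0, arrival=1000.0)
tiny_wait = wait_for_slot(staggered(4, 0.1), 0.1, arrival=1000.0)
print(batch_wait)        # 50.0: stuck behind 10-minute batch tasks
print(tiny_wait < 0.1)   # True: never longer than one tiny task
```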

Benefits: Recap

(1) Straggler mitigation

(2) Improved sharing

Prior work on (1): Mantri (OSDI ’10), Scarlett (EuroSys ’11), SkewTune (SIGMOD ’12), Dolly (NSDI ’13), …

Prior work on (2): Quincy (SOSP ’09), Amoeba (SOCC ’12), …

How?

Schedule task

Scheduling requirements: high throughput (millions of decisions per second) and low latency (milliseconds).

Solution: distributed scheduling (e.g., Sparrow Scheduler).

Launch task

Use an existing thread pool to launch tasks, and cache task binaries: task launch = RPC time (<1ms).
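The launch path above can be sketched as a long-lived worker that keeps a thread pool warm and caches task code, so launching a task is only a lookup plus a hand-off to an idle thread (in a real cluster, the remaining cost is the launch RPC). This is a minimal single-process illustration using Python's `concurrent.futures`; the `Worker` class and its method names are hypothetical, and a real framework would ship the binary over the network once rather than register a local function.

```python
from concurrent.futures import ThreadPoolExecutor

class Worker:
    """Long-lived worker: threads are reused and task code is cached."""

    def __init__(self, num_threads):
        # The pool creates threads as needed and reuses them across tasks.
        self._pool = ThreadPoolExecutor(max_workers=num_threads)
        self._binaries = {}  # binary_id -> callable, installed once per job

    def cache_binary(self, binary_id, fn):
        """Install task code once, e.g. when the job's first task arrives."""
        self._binaries[binary_id] = fn

    def launch(self, binary_id, *args):
        """Launch a task: a cache lookup plus a hand-off to the thread pool."""
        fn = self._binaries[binary_id]
        return self._pool.submit(fn, *args)

    def shutdown(self):
        self._pool.shutdown(wait=True)

worker = Worker(num_threads=4)
worker.cache_binary("word_count", lambda text: len(text.split()))
futures = [worker.launch("word_count", chunk) for chunk in ["a b c", "d e", "f"]]
print([f.result() for f in futures])  # [3, 2, 1]
worker.shutdown()
```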

Read input data

Distribute metadata (à la Flat Datacenter Storage, OSDI ’12).

Smallest efficient file block size: 8MB
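Distributing metadata lets a client find a block without asking a central metadata server. One way to do this, in the spirit of Flat Datacenter Storage (which, roughly, hashes a blob identifier together with a block index into a locator table that every client caches), is sketched below under simplifying assumptions: a static locator table, SHA-1 as the hash, and the 8MB block size from the slide. `locate_block`, the blob id, and the table contents are hypothetical.

```python
import hashlib

BLOCK_SIZE = 8 * 1024 * 1024  # 8MB blocks, as on the slide

def locate_block(blob_id, offset, locator_table):
    """Return (block index, server) for the byte at `offset` in blob `blob_id`.

    Every client holds the same locator_table, so the lookup is a local
    computation rather than a request to a central metadata server.
    """
    block_index = offset // BLOCK_SIZE
    # Stable hash of the blob id (Python's built-in hash() is salted per process).
    digest = hashlib.sha1(blob_id.encode("utf-8")).digest()
    blob_hash = int.from_bytes(digest[:8], "big")
    # Consecutive blocks of one blob land on consecutive table entries.
    server = locator_table[(blob_hash + block_index) % len(locator_table)]
    return block_index, server

table = ["server-a", "server-b", "server-c", "server-d"]
idx, server = locate_block("logs/2013-05-13", 20 * 1024 * 1024, table)
print(idx, server in table)  # 2 True
```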

Execute task + read data for next task

Problem: tons of tiny transfers!

Solution: framework-controlled I/O (enables optimizations, e.g., pipelining)
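Because the framework, not the task, issues the reads, it can prefetch the next task's input while the current task runs. A minimal sketch of that pipelining with a single background I/O thread; `run_pipelined`, `read`, and `execute` are hypothetical names, and the in-memory "blocks" stand in for real disk or network reads.

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipelined(task_inputs, read, execute):
    """Run tasks back to back, reading task i+1's input while task i executes.

    read(name) fetches a task's input; execute(data) computes on it.
    """
    results = []
    if not task_inputs:
        return results
    with ThreadPoolExecutor(max_workers=1) as io_thread:
        pending = io_thread.submit(read, task_inputs[0])
        for i in range(len(task_inputs)):
            data = pending.result()  # block until this task's input is ready
            if i + 1 < len(task_inputs):
                # Start the next read before computing, so I/O overlaps compute.
                pending = io_thread.submit(read, task_inputs[i + 1])
            results.append(execute(data))
    return results

blocks = {"block-0": [1, 2, 3], "block-1": [4, 5], "block-2": [6]}
print(run_pipelined(["block-0", "block-1", "block-2"], blocks.__getitem__, sum))
# [6, 9, 6]
```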

How low can you go?

Schedule task, launch task, read input data (one 8MB disk block), execute task + read data for next task: 100’s of milliseconds.
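A back-of-the-envelope check on that floor, assuming a disk that streams roughly 100 MB/s (an assumed figure, not one from the talk): reading a single 8MB block already takes tens of milliseconds, far more than the sub-millisecond launch RPC quoted above.

```latex
t_{\text{read}} = \frac{8\ \text{MB}}{100\ \text{MB/s}} = 0.08\ \text{s} = 80\ \text{ms},
\qquad
t_{\text{launch}} < 1\ \text{ms}
```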

Where?

(Figure: Original Job vs. Tiny Tasks Job. Map Tasks 1 … N each emit values for keys K1 … Kn; each Reduce Task collects every value for its keys.)

Original Reduce Phase → Tiny Tasks = ?

(Figure: Reduce Task 1 receives every value for key K1.)

Splitting Large Tasks

• Aggregation trees
  – Work for functions that are associative and commutative

• Framework-managed temporary state store

• Ultimately, need to allow a small number of large tasks
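Aggregation trees split one large reduce task (for example, all the K1 values above) into many tiny partial-aggregation tasks whose outputs are combined level by level. A minimal sketch with hypothetical parameters `leaf_size` and `fan_in`; it runs the tasks sequentially for clarity, whereas a framework would schedule each slice as its own task.

```python
from functools import reduce
from operator import add, sub

def tree_aggregate(values, combine, leaf_size=4, fan_in=2):
    """Aggregate `values` as a tree of small tasks instead of one large task.

    Each leaf task folds at most leaf_size values; each interior task folds at
    most fan_in partial results. Correct when `combine` is associative.
    """
    if not values:
        raise ValueError("nothing to aggregate")
    if leaf_size < 1 or fan_in < 2:
        raise ValueError("need leaf_size >= 1 and fan_in >= 2")
    # Leaf tasks: each reduces a small slice of the input.
    level = [reduce(combine, values[i:i + leaf_size])
             for i in range(0, len(values), leaf_size)]
    # Interior tasks: combine partial results until one remains.
    while len(level) > 1:
        level = [reduce(combine, level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0]

print(tree_aggregate(list(range(1, 101)), add))          # 5050, same as sum()
print(tree_aggregate([10, 1, 2, 3], sub, leaf_size=2))  # 10, but 10-1-2-3 == 4
```

The subtraction case shows why the restriction matters: regrouping changes the answer. This sketch keeps partial results in input order, so associativity alone suffices here; commutativity is also needed once partial results are combined in whatever order tasks happen to finish.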

Tiny tasks mitigate stragglers + improve sharing

Enabled by: distributed scheduling, launching tasks in an existing thread pool, distributed file metadata, and pipelined task execution.

Questions? Find me or Shivaram:

Backup Slides

Benefit of Eliminating Stragglers (based on Facebook trace)

5.2x reduction in job completion time at the 95th percentile!

Why Not Preemption?

• Preemption only handles sharing (not stragglers)

• Task migration is time-consuming

• Tiny tasks improve fault tolerance

Dremel/Drill/Impala

• Similar goals and challenges (supporting short tasks)

• Dremel statically assigns tablets to machines and rebalances if the query dispatcher notices that a machine is processing a tablet slowly: standard straggler mitigation

• Most jobs expected to be interactive (no sharing)

Scheduling Throughput

10,000 machines, 16 cores/machine, 100 millisecond tasks

Over 1 million task scheduling decisions per second
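The figure follows from the numbers on the slide, assuming every core runs 100 ms tasks back to back, so each slot needs a new scheduling decision every 100 ms:

```latex
\frac{10{,}000\ \text{machines} \times 16\ \text{cores/machine}}{0.1\ \text{s/task}}
  = \frac{160{,}000\ \text{slots}}{0.1\ \text{s}}
  = 1.6 \times 10^{6}\ \text{decisions/s}
```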

Sparrow: Technique

Place m tasks on the least loaded of d·m slaves

(Figure: several distributed schedulers probing slaves; a job with m = 2 tasks sends 4 probes (d = 2).)

More at tinyurl.com/sparrow-scheduler
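The batch-sampling rule on this slide fits in a few lines. A minimal sketch, assuming each slave's load is summarized by its queue length and that probe replies arrive instantly; `batch_sample` is a hypothetical name, and the sketch omits refinements of the real Sparrow scheduler such as late binding.

```python
import random

def batch_sample(queue_lengths, m, d=2, rng=random):
    """Choose slaves for a job's m tasks by batch sampling.

    Probe d*m slaves chosen uniformly at random and place the m tasks on the
    m least-loaded probed slaves. queue_lengths[s] is slave s's load.
    """
    num_probes = min(d * m, len(queue_lengths))
    probed = rng.sample(range(len(queue_lengths)), num_probes)
    probed.sort(key=lambda slave: queue_lengths[slave])
    return probed[:m]

loads = [5, 0, 3, 0, 7, 1]
chosen = batch_sample(loads, m=2, d=2, rng=random.Random(42))
print(len(chosen), len(set(chosen)))  # 2 2
```

Probing d·m slaves for the whole job, rather than d slaves per task, lets the scheduler share probe information across the job's tasks.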


Sparrow: Performance on TPC-H Workload

Within 12% of offline optimal; median queuing delay of 8ms
