The Case for Tiny Tasks in Compute Clusters
Kay Ousterhout*, Aurojit Panda*, Joshua Rosen*, Shivaram Venkataraman*, Reynold Xin*,
Sylvia Ratnasamy*, Scott Shenker*+, Ion Stoica*
* UC Berkeley, + ICSI
Setting
[Diagram: a MapReduce/Spark/Dryad job made up of many tasks]
Today’s tasks
Tiny Tasks
Use smaller tasks!
Why?
How?
Where?
Problem: Skew and Stragglers
Contended machine?
Data skew?
Benefit: Handling of Skew and Stragglers
[Figure panels: Today’s tasks, Tiny Tasks]
As much as 5.2x reduction in job completion time!
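The straggler argument can be sketched with a toy simulation. Everything here is an assumption made for illustration: the function name, the four machines, the quarter-speed contended machine, and the greedy "next free machine takes the next task" policy. It is not the trace-driven analysis behind the 5.2x figure and does not reproduce that number.

```python
import heapq

def job_completion_time(total_work, num_tasks, machine_speeds):
    """Completion time when each free machine pulls the next equal-sized task."""
    task_work = total_work / num_tasks
    # Min-heap of (time at which the machine becomes free, machine index).
    free_at = [(0.0, i) for i in range(len(machine_speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_tasks):
        start, i = heapq.heappop(free_at)
        done = start + task_work / machine_speeds[i]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish

# Hypothetical cluster: three healthy machines, one contended at quarter speed.
speeds = [1.0, 1.0, 1.0, 0.25]
coarse = job_completion_time(400.0, 4, speeds)    # one large task per machine
tiny = job_completion_time(400.0, 400, speeds)    # 100x smaller tasks
print(f"large tasks: {coarse:.1f}, tiny tasks: {tiny:.1f}")
```

With large tasks the job waits for the one task stuck on the slow machine. With tiny tasks the slow machine simply completes fewer of them, so the job finishes close to the time implied by the cluster's aggregate speed.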
Problem: Batch and Interactive Sharing
High priority interactive job arrives
Low priority batch task
Clusters forced to trade off utilization and responsiveness!
Benefit: Improved Sharing
[Figure panels: Today’s tasks, Tiny Tasks]
High-priority tasks not subject to long wait times!
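A rough way to see why: without preemption, a newly arrived high-priority task waits until some running task finishes, so its wait is bounded by the task duration. The sketch below assumes every slot runs back-to-back tasks of one fixed duration. The function name, the slot offsets, and the durations (in milliseconds) are hypothetical.

```python
def wait_for_slot(arrival_ms, task_duration_ms, slot_offsets_ms):
    """Time until the first slot frees, with no preemption.

    Each slot runs fixed-length tasks back to back, starting at its offset.
    """
    waits = []
    for offset in slot_offsets_ms:
        elapsed = (arrival_ms - offset) % task_duration_ms
        waits.append((task_duration_ms - elapsed) % task_duration_ms)
    return min(waits)

# Two busy slots; the high-priority task arrives at t = 5030 ms.
long_wait = wait_for_slot(5030, 60_000, [0, 2_000])  # 60 s batch tasks
short_wait = wait_for_slot(5030, 100, [0, 20])       # 100 ms tiny tasks
print(long_wait, short_wait)
```

In this model the wait can never exceed one task duration, which is why shrinking tasks shrinks the worst-case delay for interactive work.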
Benefits: Recap
(1) Straggler mitigation
Mantri (OSDI ’10), Scarlett (EuroSys ’11), SkewTune (SIGMOD ’12), Dolly (NSDI ’13), …
(2) Improved sharing
Quincy (SOSP ’09), Amoeba (SOCC ’12), …
Why?
How?
Where?
Schedule task
Scheduling requirements:
High throughput (millions per second)
Low latency (milliseconds)
Distributed scheduling (e.g., Sparrow scheduler)
Launch task
Use existing thread pool to launch tasks
+ Cache task binaries
Task launch = RPC time (<1ms)
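The launch path described above can be sketched as follows. This is a minimal illustration, not the design of any particular framework: the `Worker` class, its method names, and the use of an in-memory dictionary as the "binary cache" are all assumptions. The point is that the thread pool is created once, so launching a task costs a lookup plus a queue insertion rather than a process start.

```python
from concurrent.futures import ThreadPoolExecutor

class Worker:
    """Hypothetical worker that reuses one thread pool for every task."""

    def __init__(self, num_threads=4):
        # Created once at worker start-up, not once per task.
        self.pool = ThreadPoolExecutor(max_workers=num_threads)
        # Stand-in for cached task binaries: task name -> callable.
        self.binaries = {}

    def register(self, name, fn):
        self.binaries[name] = fn

    def launch(self, name, *args):
        # No process creation and no code download on the launch path.
        return self.pool.submit(self.binaries[name], *args)

    def shutdown(self):
        self.pool.shutdown(wait=True)

worker = Worker(num_threads=2)
worker.register("square", lambda x: x * x)
result = worker.launch("square", 7).result()
worker.shutdown()
print(result)
```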
Read input data
Smallest efficient file block size: 8MB
Distribute metadata (à la Flat Datacenter Storage, OSDI ’12)
Execute task + read data for next task
Tons of tiny transfers!
Framework-controlled I/O (enables optimizations, e.g., pipelining)
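Pipelining of this kind can be sketched as a loop that starts reading the next block before executing the current one. The function `run_pipelined` and its `read_block` and `execute` callbacks are hypothetical names chosen for the example. A real framework would read from disk or the network instead of calling a Python function.

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipelined(block_ids, read_block, execute):
    """Execute tasks in order while prefetching the next task's input."""
    block_ids = list(block_ids)
    results = []
    if not block_ids:
        return results
    with ThreadPoolExecutor(max_workers=1) as io_pool:
        pending = io_pool.submit(read_block, block_ids[0])
        for next_id in block_ids[1:] + [None]:
            data = pending.result()              # wait for the current input
            if next_id is not None:
                # Start the next read before computing on the current block.
                pending = io_pool.submit(read_block, next_id)
            results.append(execute(data))
    return results

# Toy stand-ins: "reading" block b yields three copies of b; the task sums them.
totals = run_pipelined([1, 2, 3], lambda b: [b] * 3, sum)
print(totals)
```

Because the framework, not the task, issues the reads, it is free to overlap them with computation in this way.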
How low can you go?
Pipeline: schedule task, launch task, read input data (8MB disk block), execute task + read data for next task
100’s of milliseconds
Why?
How?
Where?
Original Job
[Diagram: Map Task 1, Map Task 2, …]
Tiny Tasks Job
[Diagram: Map Tasks 1, 2, 3, 4, …, N]
Original Reduce Phase
[Diagram: Reduce Task 1, …, Reduce Tasks, holding records keyed K1, K2, K3, K5, …, Kn]
Tiny Tasks = ?
[Diagram: Reduce Task 1 holding only records keyed K1]
Splitting Large Tasks
• Aggregation trees
– Works for functions that are associative and commutative
• Framework-managed temporary state store
• Ultimately, need to allow a small number of large tasks
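An aggregation tree can be sketched as repeated rounds of small combines, where each group of `fan_in` values stands for one tiny task. The function name and the `fan_in` parameter are assumptions made for the example. It runs sequentially here, whereas a framework would run each group as a separate task.

```python
from functools import reduce

def tree_aggregate(values, combine, fan_in=4):
    """Reduce values level by level; each slice of fan_in values is one tiny task.

    Safe to regroup like this only if combine is associative and commutative.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(values)
    if not level:
        raise ValueError("need at least one value")
    while len(level) > 1:
        level = [reduce(combine, level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0]

# All records share one key, yet no single task touches more than fan_in values.
total = tree_aggregate(range(1, 101), lambda a, b: a + b, fan_in=4)
print(total)
```

Sum, max, and count fit this pattern. A function such as an exact median does not, which is one reason a few large tasks may still be needed.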
Tiny tasks mitigate stragglers + improve sharing
Distributed file metadata
Launch task in existing thread pool
Distributed scheduling
Pipelined task execution
Questions? Find me or Shivaram.
Backup Slides
Benefit of Eliminating Stragglers
Based on Facebook Trace
5.2x at the 95th percentile!
Why Not Preemption?
• Preemption only handles sharing (not stragglers)
• Task migration is time consuming
• Tiny tasks improve fault tolerance
Dremel/Drill/Impala
• Similar goals and challenges (supporting short tasks)
• Dremel statically assigns tablets to machines; rebalances if the query dispatcher notices that a machine is processing a tablet slowly (standard straggler mitigation)
• Most jobs expected to be interactive (no sharing)
Scheduling Throughput
10,000 machines
16 cores/machine
100 millisecond tasks
Over 1 million task scheduling decisions per second
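The throughput figure follows from simple arithmetic, worked through below. The calculation assumes every core runs one task at a time and is kept continuously busy. That assumption is added here and is not stated on the slide.

```python
machines = 10_000
cores_per_machine = 16
task_ms = 100

# One task slot per core (assumption): 160,000 concurrent tasks.
slots = machines * cores_per_machine

# Each slot finishes 1000 / task_ms tasks per second and needs a new one each time.
decisions_per_second = slots * 1000 // task_ms
print(decisions_per_second)
```

Under these assumptions the scheduler must make 1.6 million decisions per second, consistent with "over 1 million".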
Sparrow: Technique
Place m tasks on the least loaded of d·m slaves
[Diagram: a job with m = 2 tasks arrives at one of several schedulers, which sends 4 probes (d = 2) to slaves]
More at tinyurl.com/sparrow-scheduler
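The placement rule above can be sketched in a few lines. This is a simplified illustration of that one rule, not Sparrow's implementation: the function name, the use of queue length as the load measure, and the `rng` parameter are assumptions.

```python
import random

def batch_sample(queue_lengths, m, d=2, rng=random):
    """Probe d*m distinct workers at random; return the m least loaded of them."""
    num_probes = min(d * m, len(queue_lengths))
    probed = rng.sample(range(len(queue_lengths)), num_probes)
    probed.sort(key=lambda w: queue_lengths[w])   # least loaded first
    return probed[:m]

# Four workers with hypothetical queue lengths; m = 2 tasks, d = 2, so 4 probes.
queues = [5, 0, 3, 1]
chosen = batch_sample(queues, m=2, d=2, rng=random.Random(0))
print(chosen)
```

Each scheduler needs only the answers to its own probes, so many schedulers can run in parallel without shared state.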
Sparrow: Performance on TPC-H Workload
Within 12% of offline optimal; median queuing delay of 8ms
More at tinyurl.com/sparrow-scheduler