Coflow: A Networking Abstraction For Cluster Applications
UC Berkeley
Mosharaf Chowdhury, Ion Stoica
Cluster Applications
Multi-Stage Data Flows
» Computation interleaved with communication
Computation
» Distributed
» Runs on many machines
Communication
» Structured
» Between machine groups
A Flow
» Sequence of packets
» Independent
» Often the unit for network scheduling, traffic engineering, load balancing, etc.
Multiple Parallel Flows
» Independent
» Yet, semantically bound
» Shared objective
Communication Abstraction
Minimize Completion Time
Coflow
A collection of flows between two groups of machines that are bound together by application-specific semantics
Captures
1. Structure
2. Shared Objective
3. Semantics
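A minimal sketch of the definition above, assuming the shared objective is read as "the coflow is complete only when its last flow completes". The class names, field names, machine names, and timings are all hypothetical and only serve the illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Flow:
    src: str            # sending machine (hypothetical name)
    dst: str            # receiving machine (hypothetical name)
    finish_time: float  # seconds after the coflow started


@dataclass
class Coflow:
    pattern: str                                  # structure, e.g. "SHUFFLE"
    flows: List[Flow] = field(default_factory=list)

    def completion_time(self) -> float:
        # Assumed shared objective: the application can proceed only
        # once the slowest flow of the collection has finished.
        return max(flow.finish_time for flow in self.flows)


shuffle = Coflow("SHUFFLE", [
    Flow("mapper1", "reducer1", 2.0),
    Flow("mapper1", "reducer2", 7.0),
    Flow("mapper2", "reducer1", 4.0),
])
print(shuffle.completion_time())  # 7.0
```

Under this reading, finishing the 2-second flow any earlier does nothing for the application, which is why the flows are scheduled as one unit.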
We Want To…
Better schedule the network
» Intra-coflow
» Inter-coflow
Write the communication layer of a new application
» Without reinventing the wheel
Add unsupported coflows to an application, or replace an existing coflow implementation
» Independent of applications
Coflow API
[Figure: the Coflow API sits between Cluster Applications and The Network (Physically or Logically Centralized Controller)]
Coflow API
Goals
1. Separate intent from mechanisms
2. Convey application-specific semantics to the network
Coflow API
[Figure: MapReduce job timeline in which the shuffle finishes before the job finishes]
create(SHUFFLE) → handle
put(handle, id, content)
get(handle, id) → content
terminate(handle)
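The four calls above can be sketched as a toy in-memory service. This is only an illustration under assumptions: the class name `InMemoryCoflows`, the integer handles, the `block_id` parameter name, and the dictionary storage are hypothetical, and a real implementation would move data across the network rather than keep it in one process.

```python
import itertools
from typing import Dict, Tuple


class InMemoryCoflows:
    """Toy stand-in for the coflow API; nothing here touches a network."""

    def __init__(self) -> None:
        self._next_handle = itertools.count(1)
        self._patterns: Dict[int, str] = {}
        self._blocks: Dict[Tuple[int, str], bytes] = {}

    def create(self, pattern: str) -> int:
        # Called by the driver: declare intent (e.g. "SHUFFLE"), get a handle.
        handle = next(self._next_handle)
        self._patterns[handle] = pattern
        return handle

    def put(self, handle: int, block_id: str, content: bytes) -> None:
        # Called by a sender: publish one piece of data under the coflow.
        if handle not in self._patterns:
            raise KeyError(f"unknown or terminated coflow {handle}")
        self._blocks[(handle, block_id)] = content

    def get(self, handle: int, block_id: str) -> bytes:
        # Called by a receiver: fetch one piece of data from the coflow.
        return self._blocks[(handle, block_id)]

    def terminate(self, handle: int) -> None:
        # Called by the driver: the coflow is over, release its state.
        del self._patterns[handle]
        for key in [k for k in self._blocks if k[0] == handle]:
            del self._blocks[key]


net = InMemoryCoflows()
s = net.create("SHUFFLE")
net.put(s, "map0-part1", b"rows for reducer 1")
print(net.get(s, "map0-part1"))
net.terminate(s)
```

The caller states only the pattern and the data; how and when bytes move is left to whatever sits behind the handle, which is the separation of intent from mechanism the goals call for.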
Choice of algorithms
» Default
» WSS [1]
Choice of mechanism
» App vs. Network layer
» Pull vs. Push
[Figure: shuffle from mappers to reducers]
1. Orchestra, SIGCOMM’2011
Coflow Flexibility
[Figure: broadcast from the driver (JobTracker) and shuffle from mappers to reducers]
@driver
b ← create(BCAST)
…
put(b, id, content)
…
terminate(b)
@mapper
get(b, id)
…
Coflow Flexibility
[Figure: broadcast from the driver (JobTracker) and shuffle from mappers to reducers]
@driver
b ← create(BCAST)
s ← create(SHUFFLE, ord=[b ~> s])
put(b, id, content)
…
terminate(b)
terminate(s)
@mapper
get(b, id)
put(s, ids1)
…
Coflow Flexibility
Throughput-Sensitive Applications
Minimize Completion Time
[Figure: snapshots after 2, 4, and 7 seconds]
Free up resources without hurting application-perceived communication time
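One way to read this claim, sketched under stated assumptions: each flow has its own link of equal capacity, and the application only perceives the moment the last flow finishes. The function name `paced_rates` and the sizes are hypothetical; the sizes are chosen so that flows running at full speed would finish after 2, 4, and 7 seconds.

```python
def paced_rates(sizes, capacity):
    """Rates that let every flow finish together with the slowest flow."""
    # The largest flow cannot finish sooner than this, even at full speed.
    bottleneck_time = max(sizes) / capacity
    # Slowing the other flows to this pace does not delay the coflow.
    return [size / bottleneck_time for size in sizes]


sizes = [2.0, 4.0, 7.0]   # data units per flow (illustrative)
capacity = 1.0            # data units per second on each link (assumed)

rates = paced_rates(sizes, capacity)
finish_times = [size / rate for size, rate in zip(sizes, rates)]
freed = [capacity - rate for rate in rates]

for size, rate, spare in zip(sizes, rates, freed):
    print(f"size={size} rate={rate:.3f} freed={spare:.3f}")
```

All three flows now finish at about the 7-second mark, so the coflow is no slower than before, while the two smaller flows leave part of their links free for other traffic.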
HotNets 2012
Latency-Sensitive Applications
[Figure: aggregation tree with a Top-level Aggregator, Mid-level Aggregators, and Workers]
Meet Deadline [1, 2]
1. D3, SIGCOMM’2011
2. PDQ, SIGCOMM’2012
Limit impact to as few requests as possible
One More Thing…
1. Critical Path Scheduling
2. OpenTCP
3. Structured Streams
4. …
Coflow
UC Berkeley
Mosharaf Chowdhury http://www.mosharaf.com/
A semantically-bound collection of flows
Conveys application intent to the network
»Allows better management of network resources
»Provides greater flexibility in designing applications
Communication of a cluster application is represented by a partially-ordered set of coflows
Network allocation takes place among these partially-ordered sets of coflows
Critical Path Scheduling
[Figure: graph of coflows labeled S, B, and A]
Coflow API
Operation                                   Caller
create(PATTERN, [opt]) → handle             Driver
put(handle, id, content, [opt]) → result    Sender
get(handle, id, [opt]) → content            Receiver
terminate(handle, [opt]) → result           Driver
Throughput-Sensitive Applications
Minimize Completion Time [1]
[Figure: data flow in the MapReduce framework from the Map Stage to the Reduce Stage; local shuffles finish, then the shuffle finishes, then the job finishes]
1. Orchestra, SIGCOMM’2011
Coflow Resource Allocation
1. Weights [Across Apps]
[Figure: Job 1 with shuffle1 and Job 2 with shuffle2, each from mappers to reducers]
Weighted sharing between coflows
@driver
shuffle1 ← create(SHUFFLE, weight=1)
shuffle2 ← create(SHUFFLE, weight=2)
…
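A minimal sketch of what the weights could mean, assuming both coflows compete for a single bottleneck link and bandwidth is split in proportion to weight. The function name `weighted_shares` and the capacity value are hypothetical; the slide does not specify the sharing rule.

```python
def weighted_shares(weights, capacity):
    """Split one link's capacity among coflows in proportion to weight."""
    total = sum(weights.values())
    return {name: capacity * weight / total for name, weight in weights.items()}


shares = weighted_shares({"shuffle1": 1, "shuffle2": 2}, capacity=300.0)
print(shares)  # {'shuffle1': 100.0, 'shuffle2': 200.0}
```

With weights 1 and 2, shuffle2 receives twice the bandwidth of shuffle1 whenever both are active.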
Coflow Resource Allocation
2. Priorities [Across Apps]
[Figure: Job 1 with shuffle1 and Job 2 with shuffle2, each from mappers to reducers]
Strict priorities
@driver
shuffle1 ← create(SHUFFLE, pri=3)
shuffle2 ← create(SHUFFLE, pri=5)
…
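A minimal sketch of strict priorities on a single bottleneck link. Two things are assumptions rather than facts from the slides: that a larger `pri` value means a more important coflow, and that each coflow has a known bandwidth demand. The function name and the demand figures are hypothetical.

```python
def strict_priority_allocation(demands, priorities, capacity):
    """Serve coflows in priority order; lower ones get only the leftovers."""
    allocation = {}
    remaining = capacity
    # Assumption: a larger pri value means higher priority.
    for name in sorted(demands, key=lambda n: priorities[n], reverse=True):
        allocation[name] = min(demands[name], remaining)
        remaining -= allocation[name]
    return allocation


alloc = strict_priority_allocation(
    demands={"shuffle1": 80.0, "shuffle2": 60.0},
    priorities={"shuffle1": 3, "shuffle2": 5},
    capacity=100.0,
)
print(alloc)  # {'shuffle2': 60.0, 'shuffle1': 40.0}
```

Unlike weighted sharing, shuffle2 is fully satisfied before shuffle1 receives anything, so shuffle1 gets only the 40 units left over.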
Coflow Resource Allocation
3. Dependencies [Within Apps]
[Figure: Job 1 and Job 2 with driver, mappers, and reducers; the coflows shown are broadcast (b), shuffle1, shuffle2, and aggregation (agg)]
finishes_before (~>)
@driver
b ← create(BCAST)
shuffle2 ← create(SHUFFLE, ord=[b ~> shuffle2])
agg ← create(AGGR, ord=[shuffle2 ~> agg])
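The `~>` constraints form a partial order over coflows. As a sketch, assuming each constraint is available as a string of the form "x ~> y", a valid finishing order can be derived with a topological sort. The helper name `coflow_order` and the string encoding are hypothetical; `graphlib.TopologicalSorter` is part of the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter


def coflow_order(constraints):
    """Return one finishing order that respects every 'x ~> y' constraint."""
    sorter = TopologicalSorter()
    for constraint in constraints:
        first, later = (part.strip() for part in constraint.split("~>"))
        # 'first ~> later' reads as: first finishes before later.
        sorter.add(later, first)
    # static_order() raises graphlib.CycleError if the constraints conflict.
    return list(sorter.static_order())


order = coflow_order(["b ~> shuffle2", "shuffle2 ~> agg"])
print(order)  # ['b', 'shuffle2', 'agg']
```

Coflows that no constraint relates, such as shuffle1 in the other job, stay unordered with respect to this chain, which is what makes the order partial rather than total.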
Coflow Resource Allocation
Communication of a cluster application is represented by a partially-ordered set of coflows
Network allocation takes place among these partially-ordered sets of coflows
[Figure: graph of coflows labeled S, B, and A]