Erico Vanini – CISCO
Let It Flow: Resilient Asymmetric Load Balancing with Flowlet Switching
Erico Vanini*, Rong Pan*, Mohammad Alizadeh†, Parvin Taheri*, Tom Edsall*
* †
Load Balancing in Data Centers
• Goal: Avoid congestion hotspots
• Active research area
• Solved for symmetric topologies
• Still open question in asymmetric scenarios
Figure: multi-rooted tree with 1000s of server ports
Asymmetry Is Common in Practice
Figure: leaf-spine topology with leaves L0 to L5 and spines S0 to S5, connected by 10G and 2x10G links
Imbalanced striping: # of ports indivisible by # of switches in other tier
Zhou et al. “WCMP: Weighted cost multipathing for improved fairness in data centers,” CoNEXT 2014.
Dealing with Asymmetry
Figure: 50G of TCP traffic and 20G of UDP traffic over 40G links. With a local stateless scheme the two paths carry 25G / 20G of TCP traffic; with a congestion-aware scheme they carry 35G / 15G.

Scheme                         Throughput
ECMP-WCMP (Local Stateless)    65G
Congestion-Aware               70G
Handling asymmetry needs path congestion information, which varies
dynamically with traffic.
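The throughput numbers in this example can be reproduced with a small calculation. The sketch below is a hypothetical model, not something stated on the slide: it assumes the 20G UDP flow is unresponsive and occupies half of one 40G path, and that TCP gets whatever capacity remains on each path. The function and parameter names are illustrative.

```python
def delivered_gbps(tcp_path1, tcp_path2, link=40, udp_on_path2=20):
    """Total delivered throughput (Gbps) for a given TCP split.

    Assumed model: path 1 carries only TCP; path 2 is shared with an
    unresponsive UDP flow, so TCP can use at most link - udp_on_path2.
    """
    got_path1 = min(tcp_path1, link)
    got_path2 = min(tcp_path2, link - udp_on_path2)
    return got_path1 + got_path2 + udp_on_path2


# Equal split of the 50G TCP demand (equal-weight hashing).
equal_split = delivered_gbps(25, 25)      # 25 + 20 + 20
# Split that accounts for the UDP load on path 2.
aware_split = delivered_gbps(35, 15)      # 35 + 15 + 20
print(equal_split, aware_split)
```

Under these assumptions the equal split delivers 65G and the 35G / 15G split delivers 70G, which is why a static local weight cannot reach the better operating point here.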
Example: CONGA
1. Leaf switches (top-of-rack) track congestion to other leaves on different paths
2. Use this information to minimize bottleneck utilization
Figure: leaves L0, L1, L2; Congestion-To-Leaf Table @L0, indexed by Dest Leaf (L1, L2) and Path (0 to 3)
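A minimal sketch of the two steps above, assuming a simplified decision rule: for each path, take the larger of a local congestion metric and the remote metric from the Congestion-To-Leaf table, then pick the path where that bottleneck value is smallest. The table values, the `local` list, and the function name are hypothetical; CONGA's measurement and feedback machinery is not modeled.

```python
def pick_path(dest_leaf, congestion_to_leaf, local):
    """Return the index of the path with the smallest bottleneck congestion."""
    remote = congestion_to_leaf[dest_leaf]
    # Bottleneck of a path = worst of its local and remote congestion.
    bottleneck = [max(l, r) for l, r in zip(local, remote)]
    return min(range(len(bottleneck)), key=bottleneck.__getitem__)


# Hypothetical congestion metrics at leaf L0 (one entry per path 0..3).
congestion_to_leaf = {"L1": [6, 2, 4, 7], "L2": [1, 5, 3, 2]}
local = [3, 3, 1, 5]

path_to_l1 = pick_path("L1", congestion_to_leaf, local)
path_to_l2 = pick_path("L2", congestion_to_leaf, local)
print(path_to_l1, path_to_l2)
```

The point of the sketch is the amount of state involved: a per-destination, per-path table that has to be kept fresh by feedback from other leaves.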
Existing Load Balancing Schemes
Congestion-aware decisions: complex
• Measure and feed back congestion in real time
• CONGA, Hedera, HULA, MPTCP, FlowBender,…
Congestion-oblivious decisions: simple
• Random, round robin, hashing decision process
• ECMP, WCMP, Packet-Spray, Presto,…
Is there a simple load balancing scheme (with congestion-oblivious decisions)
that can handle asymmetry?
LetFlow
Simple: Randomly assign Flowlets to available paths
Flowlets:
“Flowlets are bursts from same flow separated by at least Δ”
“the main origin of flowlets is the burstiness of TCP at RTT and sub-RTT scales.”
Kandula et al., “Dynamic load balancing without packet reordering”, (2007)
Figure: packet bursts of one flow separated by gaps of at least Δ
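To make the definition concrete, the sketch below splits one flow's packet arrival times into flowlets: a new flowlet starts whenever the gap since the previous packet is at least Δ. The timestamps (in microseconds), the value of Δ, and the function name are made up for illustration.

```python
def split_into_flowlets(arrival_times, delta):
    """Group sorted packet arrival times into flowlets separated by >= delta."""
    flowlets = []
    for t in arrival_times:
        if flowlets and t - flowlets[-1][-1] < delta:
            flowlets[-1].append(t)      # gap below delta: same burst
        else:
            flowlets.append([t])        # first packet, or gap >= delta
    return flowlets


arrivals_us = [0, 100, 200, 1000, 1100, 3000]   # hypothetical timestamps
flowlets = split_into_flowlets(arrivals_us, delta=500)
print(flowlets)
```

Because the gap between flowlets is at least Δ, sending consecutive flowlets on different paths does not reorder packets as long as Δ exceeds the delay difference between the paths.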
Simple Asymmetric Scenario
Figure: 32x10G server ports on each side, labels A and B, one link failure
Detect and randomly assign Flowlets to available paths
Extremely simple!
- No measurements
- No feedback
- No congestion state
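A toy model of the mechanism, assuming a per-flow table that stores only the last packet time and the chosen port. The class name, the dictionary-based table, and the rule of re-picking when the stored port is no longer available are simplifications introduced here, not details from the slides.

```python
import random


class LetFlowSwitch:
    """Toy flowlet switching with uniformly random path choice."""

    def __init__(self, ports, delta, rng=None):
        self.ports = list(ports)      # currently available paths
        self.delta = delta            # flowlet timeout
        self.rng = rng or random.Random()
        self.table = {}               # flow id -> (last packet time, port)

    def forward(self, flow_id, now):
        entry = self.table.get(flow_id)
        new_flowlet = (
            entry is None
            or now - entry[0] >= self.delta    # gap of at least delta
            or entry[1] not in self.ports      # assumed: stored path is gone
        )
        port = self.rng.choice(self.ports) if new_flowlet else entry[1]
        self.table[flow_id] = (now, port)
        return port


switch = LetFlowSwitch(ports=["A", "B"], delta=500, rng=random.Random(1))
first = switch.forward("flow-1", now=0)
same = switch.forward("flow-1", now=100)     # gap < delta: same flowlet
switch.ports = ["B"]                         # path A becomes unavailable
later = switch.forward("flow-1", now=5000)   # gap >= delta: new random pick
print(first, same, later)
```

Nothing in `forward` looks at queue lengths or remote state; the only inputs are the packet's flow, its arrival time, and the list of available ports.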
Figure: FCT (ms) vs. Traffic Load % (w/o link failure) for ECMP, CONGA, and LetFlow; lower FCT is better
What’s Going On?
Figure: 32x10G server ports on each side, Spine0 and Spine1, labels A and B, one link failure
Force % of choices per path
Flowlets are Robust
Performance is not sensitive to load balancing decisions
Figure: Norm. FCT (best) vs. % of Choices Assigned to Green path, for flowlets, flows, and packets (oracle); traffic load: 90%
Flowlet Length
Figure: avg flowlet length (packets) vs. % of Choices Assigned to Green path, for the full capacity path and the half capacity path, with the ideal split marked
Flowlets are Elastic
• Flowlets change size based on congestion on the path
- Uncongested path → larger flowlets
- Congested path → smaller flowlets
→ Flowlet sizes implicitly encode path congestion information
… this determines the amount of traffic on each path, not just load balancing decisions
LetFlow is congestion-aware, despite simple random decisions
Why Are Flowlets Elastic?
• Because of congestion control (e.g., TCP)
• A flowlet gap occurs on
- Window cuts (Loss/ECN)
- Latency spikes (ACK clocking)
Figure: window vs. time t, showing the window halved
• But there’s a more basic reason, applicable to any congestion control protocol …
LetFlow Analysis
Figure: n1 + n2 flows share two paths of capacity C1 and C2, with n1 flows on the first path and n2 flows on the second
• Assume flows transmit as Poisson processes
Avg. rate of each flow: λ1 = C1 / n1, λ2 = C2 / n2
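Under the Poisson assumption, the time between consecutive packets of a flow with rate λ is exponentially distributed, so the chance that a gap is at least Δ (and therefore ends the flowlet) is e^(−λΔ). The sketch below evaluates this for two paths; the capacities, flow counts, and Δ are hypothetical values chosen only to illustrate the effect.

```python
import math


def flow_rate(capacity, n_flows):
    """Average per-flow rate when n_flows share a path of given capacity."""
    return capacity / n_flows


def gap_probability(rate, delta):
    """P(inter-packet gap >= delta) for a Poisson process with this rate."""
    return math.exp(-rate * delta)


# Hypothetical numbers: capacities in packets per ms, delta in ms.
c1, c2 = 800.0, 400.0      # path 2 has half the capacity of path 1
n1, n2 = 100, 100
delta = 0.5

lam1 = flow_rate(c1, n1)   # 8 packets per ms
lam2 = flow_rate(c2, n2)   # 4 packets per ms
p_gap1 = gap_probability(lam1, delta)
p_gap2 = gap_probability(lam2, delta)
print(p_gap1, p_gap2)
```

With these numbers, flows on the slower path are several times more likely to see a gap of at least Δ, so their flowlets end sooner and they get re-assigned more often.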
State transition probability: $P^{j}_{n_1,n_2} \approx \frac{\lambda_j}{2(\lambda_1 + \lambda_2)} \, e^{-\lambda_j \Delta}, \quad j \in \{1, 2\}$
Takeaways
1. Flows move from low rate paths (small λ) to high rate paths (large λ)
2. The flowlet timeout (Δ) is important
- Shouldn’t be too small or large
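The takeaways can be checked numerically from the state transition probability P^j ≈ λj / (2(λ1 + λ2)) · e^(−λj Δ), j ∈ {1, 2}. The sketch below reads P^j as the chance that a flow moves off path j, which is an interpretation assumed here; the capacities, flow counts, and Δ values are hypothetical.

```python
import math


def transition_prob(j, n1, n2, c1, c2, delta):
    """P^j ~ lam_j / (2 (lam_1 + lam_2)) * exp(-lam_j * delta)."""
    lam = {1: c1 / n1, 2: c2 / n2}
    prefactor = lam[j] / (2 * (lam[1] + lam[2]))
    return prefactor * math.exp(-lam[j] * delta)


# Hypothetical setup: path 1 is the fast path (lam_1 = 8, lam_2 = 4).
setup = dict(n1=100, n2=100, c1=800.0, c2=400.0)

results = {}
for delta in (0.01, 0.5, 5.0):
    p1 = transition_prob(1, delta=delta, **setup)
    p2 = transition_prob(2, delta=delta, **setup)
    results[delta] = (p1, p2)
    print(f"delta={delta}: P1={p1:.3g}  P2={p2:.3g}")
```

With these numbers, a very small Δ makes flows leave the faster path more often, a moderate Δ makes them leave the slower path more often, and a very large Δ makes both probabilities negligible so flows rarely move at all.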
Experiments Summary
Different workloads: web search, data mining, enterprise
• Testbed experiments: ECMP, CONGA, LetFlow
• 2 leaves 2 spines, 64 servers: symmetric & asymmetric topologies
• Simulations: ECMP, WCMP, Presto, CONGA, LetFlow
• Large topology: 6 leaves 6 spines, 288 servers
• Complex asymmetric topologies: speed mismatch, combined workloads, multitier
• Different protocols: TCP, DCTCP, DCQCN
Large Scale Simulations
Figure: leaf-spine topology with leaves L0 to L5 and spines S0 to S5, connected by 10G and 2x10G links (topology from: WCMP, Zhou, Junlan, et al. 2014)
Figure: FCT (ms) vs. Load % for CONGA, LetFlow, WCMP, and Presto
LetFlow within 2X of CONGA; Both are much better than other schemes
Multi Destination Scenario
Figure: leaves L0, L1, L2 and spines spine0, spine1; links of 2x40G and 1x10G; labels b and c; ratio labels 1 : 1, 1 : 8, and 8 : 1
Traffic Load uniform to the 2 destinations
Figure: % of destination traffic (labels b and c) for CONGA, LetFlow 90%, and LetFlow 60%; same topology (L0, L1, L2, spine0, spine1, 2x40G and 1x10G links), traffic load uniform to the 2 destinations
FCT is similar between CONGA and LetFlow
Other Transport Protocols
Figure (two panels, DCTCP and DCQCN): FCT (ms) vs. Load % (w/o link failure) for ECMP, CONGA, and LetFlow
Conclusion
• Flowlet switching is a powerful technique for asymmetric load balancing
• LetFlow: a simple LB mechanism that handles asymmetry
- Random decisions but implicitly congestion-aware
- Suitable for standalone switches; does not need feedback
• LetFlow is stochastic and reactive in nature
- Cannot proactively prevent congestion / queue buildup like more sophisticated schemes