dynamic scheduling of network updates based on the slides by xin jin hongqiang harry liu, rohan...
TRANSCRIPT
Dynamic Scheduling ofNetwork Updates
Based on the slides by Xin JinHongqiang Harry Liu, Rohan Gandhi, Srikanth Kandula, Ratul Mahajan,
Ming Zhang, Jennifer Rexford, Roger Wattenhofer
Presented by Sergio Rivera
2
SDN: Paradigm Shift in Networking
• Many benefits– Traffic engineering [B4, SWAN]
– Flow scheduling [Hedera, DevoFlow]
– Access control [Ethane, vCRIB]
– Device power management [ElasticTree]
• Systems operate by updating the data plane state of the network (periodically or by triggers)
Controller• Direct and centralized updates of forwarding rules in switches
3
Motivation• Time to update switch rules varies and leads to slow
network updates: – Switch hardware capabilities– Control load on the switch– Nature of updates– RPC delays (not that much…)
• Controlled experiments explore the impact of four factors:– # rules to be updated– Priorities of the rules– Type of rules (insertion vs modification)– Control load
4
Observations
Single-Priority rules
3.3 ms per-rule update time
Random-priority rules
Reaches 18 ms per-rule update time when inserting 600 rules
5
Observations (cont’d).
Modifying rules (TCAM starts with 600 single-priority rules)
11 ms per-rule modification time (insert & delete).
Control activities were performed while updating 100 rules
Time to update highly varies. (e.g. 99th percentile vs. 50th percentile)
>10x
Even in controlled conditions, switch update time varies significantly!
6
Network Update is Challenging• Requirement 1: fast– Slow updates imply longer periods where congestion or
packet loss occur
• Requirement 2: consistent– No congestion, no blackhole, no loop, etc.
Network
Controller
7
What is Consistent Network Update
Current State
A
D
CB
EF1: 5 F4: 5
F2: 5 F3: 10
Target State
A
D
CB
E
F1: 5
F4: 5
F2: 5 F3: 10
8
What is Consistent Network Update
• Asynchronous updates can cause congestion• Need to carefully order update operations
Current State
A
D
CB
EF1: 5 F4: 5
F2: 5 F3: 10
Target State
A
D
CB
E
F1: 5
F4: 5
F2: 5 F3: 10
A
D
CB
EF4: 5F1: 5
F3: 10F2: 5
Transient State
9
Existing Solutions are Slow
• Existing solutions are static [ConsistentUpdate’12, SWAN’13, zUpdate’13]
– Pre-compute an order for update operations
F3
F4
F2
F1
StaticPlan A
Target StateCurrent State
A
D
CB
EF4: 5
F2: 5
F1: 5
F3: 10A
D
CB
E
F1: 5
F4: 5
F2: 5 F3: 10
10
Existing Solutions are Slow
• Existing solutions are static [ConsistentUpdate’12, SWAN’13, zUpdate’13]
– Pre-compute an order for update operations
• Downside: Do not adapt to runtime conditions– Slow in face of highly variable operation completion time
F3
F4
F2
F1
StaticPlan A
11
Static Schedules can be Slow
F3
F4
F2 F1StaticPlan B
F3
F4
F2
F1
StaticPlan A
F1F2F3F4
Time1 2 3 4 5
F1F2F3F4
Time1 2 3 4 5
F1F2F3F4
Time1 2 3 4 5
F1F2F3F4
Time1 2 3 4 5
Target StateCurrent State
A
D
CB
EF4: 5
F2: 5
F1: 5
F3: 10A
D
CB
E
F1: 5
F4: 5
F2: 5 F3: 10
No static schedule is a clear winner under all conditions!
12
Dynamic Schedules are Adaptive and Fast
F3
F4
F2 F1StaticPlan B
F3
F4
F2
F1
StaticPlan A
Target StateCurrent State
A
D
CB
EF4: 5
F2: 5
F1: 5
F3: 10A
D
CB
E
F1: 5
F4: 5
F2: 5 F3: 10
No static schedule is a clear winner under all conditions!
F3
F4
F2F1
DynamicPlan
Adapts to actual conditions!
13
Challenges of Dynamic Update Scheduling
• Exponential number of orderings• Cannot completely avoid planning
Current State
A
D
CB
EF3: 5
F2: 5
F1: 5F4: 5
F5: 10
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
14
Challenges of Dynamic Update Scheduling
• Exponential number of orderings• Cannot completely avoid planning
Current State
A
D
CB
EF3: 5F1: 5
F4: 5
F5: 10
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5F2: 5
F2
Deadlock
15
Challenges of Dynamic Update Scheduling
• Exponential number of orderings• Cannot completely avoid planning
Current State
A
D
CB
EF3: 5
F2: 5
F4: 5
F5: 10
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
F1
F1: 5
16
Challenges of Dynamic Update Scheduling
• Exponential number of orderings• Cannot completely avoid planning
Current State
A
D
CB
E
F2: 5
F4: 5
F5: 10
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
F1 F3
F1: 5
F3: 5
17
Challenges of Dynamic Update Scheduling
• Exponential number of orderings• Cannot completely avoid planning
Current State
A
D
CB
E F4: 5
F5: 10
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
F1 F3 F2
Consistent update plan
F2: 5
F1: 5
F3: 5
18
Dionysus Pipeline
Dependency Graph Generator
Update Scheduler
Network
CurrentState
TargetState
ConsistencyProperty
Encode valid orderings
Determine a fast order based on constraints of CP
• Balance planning and opportunism using a two-stage approach
Scheduler is independent of CP
19
Dependency Graph Generation
Dependency Graph Elements
Operation node
Resource node (link capacity, switch table size)Node
Edge Dependencies between nodes
Path node
Current State
A
D
CB
EF3: 5
F2: 5
F1: 5F4: 5
F5: 10
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
Group operations and link capacity resources
20
Dependency Graph Generation
Current State
A
D
CB
EF3: 5
F2: 5
F1: 5F4: 5
F5: 10
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
MoveF3
MoveF1
MoveF2
21
Dependency Graph Generation
Current State
A
D
CB
EF3: 5
F2: 5
F1: 5F4: 5
F5: 10
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
A-E: 5
MoveF3
MoveF1
MoveF2
22
Dependency Graph Generation
Current State
A
D
CB
EF1: 5
F4: 5
F5: 10
A-E: 5
5
MoveF3
MoveF1
MoveF2
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
F3: 5
F2: 5
23
Dependency Graph Generation
A-E: 55
5
MoveF3
MoveF1
MoveF2
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
Current State
A
D
CB
EF3: 5
F2: 5
F1: 5F4: 5
F5: 10
24
Dependency Graph Generation
A-E: 55
55
MoveF3
MoveF1
MoveF2
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
Current State
A
D
CB
EF3: 5
F2: 5
F1: 5F4: 5
F5: 10
25
Dependency Graph Generation
A-E: 55
D-E: 0
5
5 5
5
MoveF3
MoveF1
MoveF2
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
Current State
A
D
CB
EF3: 5
F2: 5
F1: 5F4: 5
F5: 10
26
Dependency Graph Generation
• DG captures dependencies but still leaves scheduling flexibility.
• Two challenges in dynamically scheduling updates:– Cycle resolution in the DG– At any given time, decide which subset of rules will be
issued first
27
Dependency Graph Generation
• Supported scenarios– Tunnel-based forwarding: WANs– WCMP forwarding: data center networks
• Supported consistency properties– Loop freedom– Blackhole freedom– Packet coherence– Congestion freedom
28
Tunnel-Based Forwarding• A flow is forwarded along one or more tunnels• Ingress switch tags packet with the tunnel identifier• Ingress switch splits incoming traffic accross the
tunnels based on configured weights.
• Offers loop freedom and packet coherence by design
• Blackhole freedom is guaranteed if:– A tunnel is fully established before the ingress switch puts
any traffic on it– All traffic is removed from the tunnel before the tunnel is
deleted
29
Tunnel-Based Dependency Graph
S4-S5: 10
5D
A
p3 p2
5C
S1-S4: 10
S5: 50
S1: 50
S4: 50
1
1
1
55
EFG
S2: 50
S1-S2: 0
S2-S5: 5
5
5
1
1
B1
30
WCMP Forwarding
• Switches at every hop match on packet headers and split flows over multiple next hops with configured weights
• Packet coherence is based on version numbers:– Ingress switch tags each packet with a version number– Downstream switches handle packets based on the embedded
version number– Tagging ensures packets never mix configurations
• Blackhole- and Loop-Freedom is possible: – Do not require versioning – Can be encoded in the dependency graph – (authors omit description of graph construction for such
conditions)
31
WCMP Dependency Graph
S4-S5: 10Yp3 p2
5
S1-S4: 10
S2: 50
5
5
S1-S2: 0
S2-S5: 5
5
5
1
1
@S1: [(S2, 1) , (S4,0)] -> [(S2, 0.5) , (S4, 0.5)]
@S2: [(S3, 0.5) , (S5, 0.5)] -> [(S3, 1) , (S5, 0)]
X
Z
@S4: Has only one next-hop. No change needed
32
Dionysus Pipeline
Dependency Graph Generator
Update Scheduler
Network
CurrentState
TargetState
ConsistencyProperty
Encode valid orderings
Determine a fast order
33
Dionysus Scheduling
• Scheduling as a resource allocation problem
A-E: 55
D-E: 0
5
5 5
5
MoveF3
MoveF1
MoveF2
• How to allocate available resources to operations to minimize the update time?
34
Dionysus Scheduling
• Scheduling as a resource allocation problem
A-E: 0
D-E: 0
5
5 5
5
MoveF3
MoveF1
MoveF2
Deadlock!
35
Dionysus Scheduling
• Scheduling as a resource allocation problem
A-E: 05
D-E: 05 5
5
MoveF3
MoveF1
MoveF2
36
Dionysus Scheduling
• Scheduling as a resource allocation problem
A-E: 05
D-E: 55
5
MoveF3
MoveF2
37
Dionysus Scheduling
• Scheduling as a resource allocation problem
A-E: 05
5
MoveF3
MoveF2
38
Dionysus Scheduling
• Scheduling as a resource allocation problem
A-E: 55 Move
F2
39
Dionysus Scheduling
• Scheduling as a resource allocation problem
MoveF2
Done!
40
Dionysus Scheduling
• Scheduling as a resource allocation problem
• NP-complete problems under link capacity and switch table size constraints
• Approach– DAG: always feasible, critical-path scheduling to solve ties– Convert to a virtual DAG (general case include cycles)– Rate limit flows to resolve deadlocks
41
Critical-Path Scheduling
• Calculate critical-path length (CPL) for each node
– Extension: assign larger weight to operation nodes if we know in advance the switch is slow
• Resource allocated to operation nodes with larger CPLs
A-B:5
CPL=3
C-D:0
5
CPL=1
5
5 5
5
CPL=1
CPL=2
CPL=2
CPL=1
MoveF4
MoveF3
MoveF2
MoveF1
42
Critical-Path Scheduling
• Calculate critical-path length (CPL) for each node
– Extension: assign larger weight to operation nodes if we know in advance the switch is slow
• Resource allocated to operation nodes with larger CPLs
A-B:0
CPL=3
C-D:0
5
CPL=1
5
5
5
CPL=1
CPL=2
CPL=2
CPL=1
MoveF4
MoveF3
MoveF2
MoveF1
43
Handling Cycles
• Convert to virtual DAG– Consider each strongly connected
component (SCC) as a virtual node
CPL=1CPL=3
weight = 1weight = 2
• Critical-path scheduling on virtual DAG– Weight wi of SCC: number of operation nodes
A-E: 55
D-E: 0
5
5 5
5
MoveF3
MoveF1
MoveF2
MoveF2
44
Handling Cycles
• Convert to virtual DAG– Consider each strongly connected
component (SCC) as a virtual node
CPL=1CPL=3
weight = 1weight = 2
• Critical-path scheduling on virtual DAG– Weight wi of SCC: number of operation nodes
A-E: 05
D-E: 05 5
5
MoveF3
MoveF1
MoveF2
MoveF2
45
Deadlocks may still happen• The algorithm does not find feasible solution among all
possible orderings (problem hardness)
• No feasible solution exists even if the target state is valid
• Resolve deadlocks reducing flow rates (informing rate limiters)
46
Resolving Deadlocks
• K* = Max number of flows rate limited each time (affects throughput).
• Centrality to decide the order of iteration.• Resource consumption by path-nodes or tunnel-add
operations must happen within the same SCC.
47
Resolving Deadlocks (cont’d)• K* = 1
R3:0 P5
8A P6 R1:0
8
R2:0
P1
C
P4
P2
P7
B P3
8
88 8
4 4
4
44
4
4
4
• By centrality, node A is selected.
• Reduces 4 units of traffic on both P6 and P7.
48
Resolving Deadlocks (cont’d)• K* = 1
R3:0 P5
8A P6 R1:0
R2:0
P1
C
P4
P2
P7
B P3
8
88 8
0 4
4
44
4
0
4
• Which releases 4 units of free capacity in R1 and R2
8
49
Resolving Deadlocks (cont’d)• K* = 1
R3:0 P5
8A R1:4
R2:4
P1
C
P4
P2B P3
8
88 8
4
44
4
• A has no children. It does not belong to the SCC any more
• C is scheduled and partially schedules B
8
50
Resolving Deadlocks (cont’d)• K* = 1
R3:0 P5
0/8A R1:0
0/8
R2:0 C
P4
P2B P3
4/8 4/8 4 4
• Moves 4 units from P3 to P4
• After C finishes, it schedules the remainded operation B
4/8
4/8
51
Resolving Deadlocks (cont’d)• K* = 1
R3:4 P5
0/8A R1:0
0/8
R2:4
P4
B P3
0/4 0/4
• For node A, increase its rate on P5 as long as R3 receives free capacity released by P4
0/4
0/4
52
Resolving Deadlocks (cont’d)• K* = 1
R3:0 P5
4/8A R1:0
4/8
R2:0
P4
B
• For node A, increase its rate on P5 as long as R3 receives free capacity released by P4
4/4
4/4
53
Resolving Deadlocks (cont’d)• K* = 1
R3:4 P5
4/8A R1:0
4/8
R2:0
• For node A, increase its rate on P5 as long as R3 receives free capacity released by P4
54
Resolving Deadlocks (cont’d)• K* = 1
R3:0 R1:0
R2:0
NO DEADLOCK!
Evaluation: Testbed – TE CaseStraggler switches (+500 ms latency)
Install Tunnel
Remove Tunnel
Evaluation: Testbed – Failure Recovery CaseStraggler switches (+500 ms latency)
Install Tunnel
Remove Tunnel
Evaluation: Large-Scale Simulations• Focus on congestion freedom as consistency property
• WAN: Dataset from a large WAN that interconnects O(50) sites.o Collect traffic logs on routers every 5-minute intervalso 288 traffic matrices collected. (each is a different state)o Tunnel-based routing is used
• DC: Dataset from a large data center network with several hundred switches. o Collect traffic traces by logging the socket events on all servers and
aggregate into ToR-to-ToR flows over 5-minutes intervals o 288 traffic matrices collected per day o 1500 largest flows are chosen (40-60% of all traffic)o 1500 as switch rule memory size
• Alternative approaches: OneShot (opportunistic) and SWAN (static)
58
Evaluation: Update Time
99th
90th
50th
Dionysus outperforms SWAN in both configurations. For instance, in WAN TE:• Normal: 57% - 49% - 52% faster • Straggler: 88% - 84% - 81% faster
59
Evaluation: Link Oversubscription (WAN Failure Recovery – Random Link)
• Least oversubscription among the three• Reduces oversubscription by 41% (N) - 42% (S) < SWAN• Update time: 45% (N) - 82% (S) faster in the 99th percentile
Evaluation: Deadlocks
• Planning-based approaches lead to no deadlocks• Opportunistic approach deadlocks 90% (WAN) and 70% (DC)
of the time
WAN Topology is less regular than DC topology (i.e., more complex dependencies)
61
Evaluation: Deadlocks (cont’d)
10% memory slack = Memory size as 1100 when switch is loaded with 1000 rules
K*=5 in RateLimit algorithm
62
Conclusion
• Dionysus provides fast, consistent network updates through dynamic scheduling– Dependency graph: compactly encodes orderings– Scheduling: dynamically schedules operations
Dionysus enables more agile SDN control loops
63
Thanks!