![Page 1: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/1.jpg)
Subhasree Mandal, July 9, 2015
Lessons Learned from B4, Google’s SDN WAN
![Page 2: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/2.jpg)
Google Innovations in Networking
[Figure: timeline from 2006 to 2014 showing B4, Google Global Cache, BwE, Jupiter, gRPC, Onix, Freedome, Watchtower, QUIC, Andromeda]
![Page 3: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/3.jpg)
More Than the Sum of Parts
Google Networking works together as an integrated whole
• B4: WAN interconnect
• GGC: edge presence
• Jupiter: building scale datacenter network
• Freedome: campus-level interconnect
• Andromeda: isolated, high-performance slices of the physical network
Publications in INFOCOM 2012, SIGCOMM 2013, SIGCOMM 2014, CoNEXT 2014, EuroSys 2014, SIGCOMM 2015
![Page 4: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/4.jpg)
Motivation for SDN B4
![Page 5: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/5.jpg)
Motivation for Backend Backbone
Data centers deployed across the world
● Serve content with geographic locality
● Replicate content for fault tolerance
Need a network to connect these data centers to one another
● Not on the public Internet
● Cost-effective network for high-volume traffic
● Application-specific, variable SLOs
● Bursty/bulk traffic (not smooth/diurnal)
WAN-intensive apps: YouTube, Web Search, Google+, Maps, AppEngine, Photos and Hangouts, Android/Chrome Updates
![Page 6: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/6.jpg)
Two Backbones
Two separate backbones:
● B2: Carries Internet facing traffic → Growing faster than the Internet
● B4: Inter-datacenter traffic → More traffic than B2, growing faster than B2
B4: 10x growth in last 3.5 years!
[Figure: B4 traffic over time; x-axis ticks from Jul 2012 to Jan 2015]
![Page 7: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/7.jpg)
Growth vs Cost
Does cost per bit/sec go down with additional scale?
● Consider analogies with compute or storage
Networking cost/bit doesn't naturally decrease with size
● Quadratic complexity in pairwise interactions and broadcast overhead of all-to-all communication requires more expensive equipment
● Manual management and configuration of individual elements
● Complexity of automated configuration to deal with non-standard vendor configuration APIs
![Page 8: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/8.jpg)
SDN to Solve It
● Faster innovation: separate smarts out of embedded devices
○ Leverage powerful compute in Google servers
○ Faster feature roll-outs on controllers
○ Less frequent switch firmware upgrade
○ Easier hardware upgrade/replacement
● Efficient network management
○ Manage fabric, rather than collection of devices
● Cost effective: opportunity for centralized Traffic Engineering (TE)
○ Higher overall throughput, via better utilization of deployed hardware
■ Need not overprovision
○ Leverage multi-objective multi-commodity flow optimization algorithms
■ More optimal throughput and faster convergence ….
![Page 9: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/9.jpg)
Topics for Today
● Background for Traffic Engineering (TE)
● B4-SDN/TE Architecture with OpenFlow protocol
● Benefits of B4-SDN/TE
● Lessons learnt on SDN in three key areas
○ Performance: fast producer/slow consumer, flow control to the rescue
○ Availability: robust control plane connectivity and stable mastership are critical
○ Scale: SDN is a natural fit for abstraction and hierarchy
![Page 10: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/10.jpg)
Background for Centralized Traffic Engineering
![Page 11: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/11.jpg)
Convergence After Failure
● Flows: R1->R6: 20; R2->R6: 20; R4->R6: 20
[Figure: routers R1 to R6 with link labels 20, 40, and 60; legend: shortest path, 2nd shortest path, 3rd shortest path, 4th shortest path]
![Page 12: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/12.jpg)
![Page 13: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/13.jpg)
Convergence After Failure
● Flows: R1->R6: 20; R2->R6: 20; R4->R6: 20
● R5-R6 link fails
○ R1, R2, R4 autonomously find next best path
[Figure: routers R1 to R6 with link labels 20, 40, and 60]
![Page 14: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/14.jpg)
Convergence After Failure
No Traffic Engineering
● Flows: R1->R6: 20; R2->R6: 20; R4->R6: 20
● R5-R6 link fails
○ R1, R2, R4 autonomously try for next best path
○ R1, R2, R4 push 20 altogether
[Figure: routers R1 to R6 with link labels 20, 40, and 60]
![Page 15: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/15.jpg)
Convergence After Failure
Distributed Traffic Engineering Protocols
● Flows: R1->R6: 20; R2->R6: 20; R4->R6: 20
● R5-R6 link fails
○ R1, R2, R4 autonomously try for next best path
○ R1 wins; R2, R4 retry for next best path
[Figure: routers R1 to R6 with link labels 20, 40, and 60]
![Page 16: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/16.jpg)
Convergence After Failure
Distributed Traffic Engineering Protocols
● Flows: R1->R6: 20; R2->R6: 20; R4->R6: 20
● R5-R6 link fails
○ R1, R2, R4 autonomously try for next best path
○ R1 wins; R2, R4 retry for next best path
○ R2 wins this round; R4 retries again
[Figure: routers R1 to R6 with link labels 20, 40, and 60]
![Page 17: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/17.jpg)
Convergence After Failure
Distributed Traffic Engineering Protocols
● Flows: R1->R6: 20; R2->R6: 20; R4->R6: 20
● R5-R6 link fails
○ R1, R2, R4 autonomously try for next best path
○ R1 wins; R2, R4 retry for next best path
○ R2 wins this round; R4 retries again
○ R4 finally gets third best path!
[Figure: routers R1 to R6 with link labels 20, 40, and 60]
![Page 18: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/18.jpg)
Centralized Traffic Engineering
Centralized Traffic Engineering Protocols
● Simple topology
● Flows:
○ R1->R6: 20; R2->R6: 20; R4->R6: 20
[Figure: routers R1 to R6 with link labels 20, 40, and 60, plus a Central TE server]
![Page 19: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/19.jpg)
Centralized Traffic Engineering
Centralized Traffic Engineering Protocols
● Simple topology
● Flows:
○ R1->R6: 20; R2->R6: 20; R4->R6: 20
● R5-R6 fails
○ R5 informs TE, which programs routers in one shot
[Figure: routers R1 to R6 with link labels 20 and 40, plus a Central TE server]
![Page 20: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/20.jpg)
Centralized Traffic Engineering
Centralized Traffic Engineering Protocols
● Simple topology
● Flows:
○ R1->R6: 20; R2->R6: 20; R4->R6: 20
● R5-R6 link fails
○ R5 informs TE, which programs routers in one shot
○ Leads to faster realization of target optimum
[Figure: routers R1 to R6 with link labels 20, 40, and 60]
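The contrast between the distributed retries and the one-shot central programming on the preceding slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the alternate paths `A`, `B`, `C` and their capacity of 20 each are hypothetical (the exact link capacities in the slide figure are not recoverable from this transcript), and the greedy placement is a stand-in for the real TE optimizer, which the slides do not describe at this level.

```python
from typing import Dict, List, Tuple

# Hypothetical post-failure inputs: three sources sending 20 units each, and
# three alternate paths (in preference order) that can each carry 20 units.
DEMANDS: Dict[str, int] = {"R1": 20, "R2": 20, "R4": 20}
PATHS: List[str] = ["A", "B", "C"]
CAPACITY: Dict[str, int] = {"A": 20, "B": 20, "C": 20}


def distributed_rounds(
    demands: Dict[str, int], paths: List[str], capacity: Dict[str, int]
) -> Tuple[Dict[str, str], int]:
    """Each round, every unplaced source tries its next-best path; losers retry."""
    residual = dict(capacity)
    placed: Dict[str, str] = {}
    attempt = {src: 0 for src in demands}
    rounds = 0
    while len(placed) < len(demands):
        rounds += 1
        for src in sorted(demands):
            if src in placed:
                continue
            if attempt[src] >= len(paths):
                raise ValueError(f"no feasible path left for {src}")
            path = paths[attempt[src]]
            if residual[path] >= demands[src]:
                residual[path] -= demands[src]  # this source wins the path
                placed[src] = path
            else:
                attempt[src] += 1  # lost the race; try the next path next round
    return placed, rounds


def centralized_one_shot(
    demands: Dict[str, int], paths: List[str], capacity: Dict[str, int]
) -> Tuple[Dict[str, str], int]:
    """A controller with the global view computes every placement, then programs once."""
    residual = dict(capacity)
    placed: Dict[str, str] = {}
    for src in sorted(demands):
        for path in paths:
            if residual[path] >= demands[src]:
                residual[path] -= demands[src]
                placed[src] = path
                break
    return placed, 1  # a single programming round, by construction of this sketch
```

With these assumed inputs both functions arrive at the same placement (R1 on `A`, R2 on `B`, R4 on `C`), but the distributed version needs three rounds of trial and retry while the centralized version needs one.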
![Page 21: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/21.jpg)
Advantages of Centralized TE
● Better network utilization with global picture
● Converges faster to target optimum on failure
● Allows more control and specifying intent
○ Deterministic behavior simplifies planning vs. overprovisioning for worst case variability
● Can mirror production event streams for testing
○ Supports innovation and robust SW development
● Controller uses modern server hardware
○ 50x (!) better performance
![Page 22: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/22.jpg)
B4 Architecture
![Page 23: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/23.jpg)
B4 Site: SDN Architecture
[Figure: B4 site diagram; labels: OF agent, silicon, protocol]
![Page 24: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/24.jpg)
B4 Site: SDN Architecture
[Figure: B4 site diagram; labels: OF agent, silicon, protocol]
![Page 25: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/25.jpg)
B4 Site: SDN Architecture
Traditional WAN integrated with SDN: still speaking ISIS/BGP
[Figure: B4 site diagram with numbered elements 1 to 6; labels: OF agent, silicon, protocol, Master SDN controller]
![Page 26: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/26.jpg)
B4 Site: SDN Architecture
Traditional WAN integrated with SDN: still speaking ISIS/BGP
[Figure: B4 site diagram with numbered elements 1 to 6; labels: OF agent, silicon, protocol, Master SDN controller, Standby SDN controller, heartbeat exchange]
![Page 27: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/27.jpg)
B4 Site: SDN Architecture
Traditional WAN integrated with SDN: still speaking ISIS/BGP
Unit of management is a site = fabric
[Figure: B4 site diagram for SITE-A with numbered elements 1 to 6; labels: OF agent, silicon, protocol, Master SDN controller, Standby SDN controller, heartbeat exchange]
![Page 28: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/28.jpg)
B4 Site: SDN Architecture
Traditional WAN integrated with SDN: still speaking ISIS/BGP
Unit of management is a site = fabric
[Figure: B4 site diagram for SITE-A, SITE-B, and SITE-C with numbered elements 1 to 6; labels: OF agent, silicon, protocol, Master SDN controller, Standby SDN controller, heartbeat exchange]
![Page 29: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/29.jpg)
Traffic Engineering Overlay
● Per QoS Traffic Engineering (TE)
○ Demand based use of longer paths
○ Max-min fair bandwidth allocation
○ Per app loss/latency/throughput consideration
● TE paths are overlaid on ISIS/BGP routes
○ Higher priority flow rules for TE
[Figure: sites A, B, and C with link labels 80 Gbps and 240 Gbps and the ISIS shortest path; OpenFlow 1.0 rules table with priority TE flows and BGP/ISIS flows]
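The "max-min fair bandwidth allocation" bullet above can be illustrated with the textbook water-filling procedure on a single shared link. This is a minimal sketch, not B4's allocator: the slides do not give the algorithm, the single-link setting is a simplification, and the function name `max_min_fair` and the example demands are hypothetical.

```python
from typing import Dict


def max_min_fair(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Water-filling: repeatedly split what is left equally among unsatisfied apps."""
    alloc = {app: 0.0 for app in demands}
    remaining = float(capacity)
    active = {app for app, need in demands.items() if need > 0}
    while active and remaining > 1e-9:
        share = remaining / len(active)
        # Apps whose outstanding demand fits within an equal share are fully served.
        satisfied = {app for app in active if demands[app] - alloc[app] <= share}
        if not satisfied:
            # Nobody can be fully served: everyone gets the equal share and we stop.
            for app in active:
                alloc[app] += share
            remaining = 0.0
            break
        for app in satisfied:
            remaining -= demands[app] - alloc[app]
            alloc[app] = float(demands[app])
        active -= satisfied
    return alloc
```

For example, with a capacity of 100 and demands of 20, 50, and 60, the smallest demand is met in full and the other two split the remaining 80 equally, so no app can gain bandwidth without taking it from an app that already has less or the same.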
![Page 30: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/30.jpg)
Control Plane Architecture
[Figure: TE server (Global Optimizer) above the Master and Standby SDN controllers and switches of SITE-A, SITE-B, and SITE-C; labels: demand, Topology, Prefixes, TE Pathing, Hosts, protocols, OF agent, silicon]
![Page 31: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/31.jpg)
Control Plane Architecture
[Figure: TE server (Global Optimizer) and SDN Gateway above the Master and Standby SDN controllers and switches of SITE-A, SITE-B, and SITE-C; Bandwidth Enforcer performs demand collection and admission control for Hosts; labels: demand, Topology, Prefixes, TE Pathing, protocols, OF agent, silicon]
![Page 32: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/32.jpg)
Control Plane Architecture
[Figure: TE server (Global Optimizer) and SDN Gateway above the Master and Standby SDN controllers, each with a TE App, and the switches of SITE-A, SITE-B, and SITE-C; Bandwidth Enforcer performs demand collection and admission control for Hosts; labels: demand, Topology, Prefixes, TE Pathing, protocols, OF agent, silicon]
![Page 33: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/33.jpg)
![Page 34: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/34.jpg)
Benefits of SDN B4 with Centralized Traffic Engineering
![Page 35: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/35.jpg)
Benefits of TE Over Shortest Path
● ~20% increase in throughput over SPF
● Larger benefits during capacity crunch
Lowers the requirement for bandwidth provisioning
[Figure: throughput improvement over SPF (%), y-axis 0 to 30, x-axis Jul 2014 to Jan 2015; annotations: 20%, "Helps more during capacity crunch"]
![Page 36: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/36.jpg)
Other Benefits
Software and hardware feature roll outs decoupled
● Software timescale feature roll out
○ Hitless SW upgrades and new features
■ No packet loss and no capacity degradation
■ Most feature releases do not touch the switch
● Slower HW upgrades
○ 3 generations of HW under same SDN architecture
![Page 37: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/37.jpg)
Lesson on Performance
![Page 38: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/38.jpg)
Controller to Switch Messaging
Initial simple-minded assumptions
● OpenFlow protocol:
○ Flow and control packet (ISIS/BGP/ARP/...) requests sent from controller to OF agent (OFA) sequentially
● OF agent (OFA) can process them in order
● System is always in consistent state
But ….
![Page 39: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/39.jpg)
Messages Backlogged and Delayed!
● Queue build-up on controller and switch due to slow switch CPU
[Figure: message queue "P FFFF P FF" from the SDN controller (fast server) to the OFA and the embedded switch stack]
![Page 40: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/40.jpg)
Messages Backlogged and Delayed!
● Queue build-up on controller and switch due to slow switch CPU
● Flow rules generated in bursts
[Figure: message queue "P FFFF P FF" from the SDN controller (fast server) to the OFA and the embedded switch stack]
![Page 41: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/41.jpg)
Messages Backlogged and Delayed!
● Queue build-up on controller and switch due to slow switch CPU
● Flow rules generated in bursts
● Flow programming in HW is slow
[Figure: message queue "P FFFF P FF" from the SDN controller (fast server) to the OFA and the embedded switch stack]
![Page 42: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/42.jpg)
Messages Backlogged and Delayed!
● Queue build-up on controller and switch due to slow switch CPU
● Flow rules generated in bursts
● Flow programming in HW is slow
● Single OpenFlow connection between controller and OFA
[Figure: message queue "P FFFF P FF" from the SDN controller (fast server) to the OFA and the embedded switch stack]
![Page 43: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/43.jpg)
Messages Backlogged and Delayed!
● Queue build-up on controller and switch due to slow switch CPU
● Flow rules generated in bursts
● Flow programming in HW is slow
● Single OpenFlow connection between controller and OFA
● Flow rules cause HOL blocking for packets
● Packets delayed
● Protocols time out; reconvergence produces more flow rules
[Figure: message queue "P FFFF P FF" from the SDN controller (fast server) to the OFA and the embedded switch stack]
![Page 44: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/44.jpg)
Messages Backlogged and Delayed!
Vicious Cycle of Protocol Instability!!!
● Queue build-up on controller and switch due to slow switch CPU
● Flow rules generated in bursts
● Flow programming in HW is slow
● Single OpenFlow connection between controller and OFA
● Flow rules cause HOL blocking for packets
● Packets delayed
● Protocols time out; reconvergence produces more flow rules
[Figure: message queue "P FFFF P FF" from the SDN controller (fast server) to the OFA and the embedded switch stack]
![Page 45: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/45.jpg)
Lesson: Mitigation with Flow Control
[Figure: SDN controller, OFA, and embedded switch stack]
![Page 46: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/46.jpg)
Lesson: Mitigation with Flow Control
● Separate queue for packet IO and flow request
● Strict priority for packet IO over flow programming
[Figure: SDN controller, OFA, and embedded switch stack; separate packet (PPP) and flow (FFFFF) queues feeding a strict priority scheduler]
![Page 47: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/47.jpg)
Lesson: Mitigation with Flow Control
● Separate queue for packet IO and flow request
● Strict priority for packet IO over flow programming
● Limit queue depth in OFA: token based flow control
[Figure: SDN controller, OFA, and embedded switch stack; separate packet (PPP) and flow (FFFFF) queues feeding a strict priority scheduler; OFA queue depth N with flow control back to the controller]
![Page 48: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/48.jpg)
Lesson: Mitigation with Flow Control
● Separate queue for packet IO and flow request
● Strict priority for packet IO over flow programming
● Limit queue depth in OFA: token based flow control
● Systematic queue drop discipline
[Figure: SDN controller, OFA, and embedded switch stack; separate packet (PP) and flow (FFFF) queues feeding a strict priority scheduler; OFA queue depth N with flow control back to the controller; dropped entries marked "superseded" and "aged out!!!"]
![Page 49: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/49.jpg)
Lesson: Mitigation with Flow Control
● Separate queue for packet IO and flow request
● Strict priority for packet IO over flow programming
● Limit queue depth in OFA: token based flow control
● Systematic queue drop discipline
● Asynchronous OFA
[Figure: SDN controller, OFA (Async), and embedded switch stack; separate packet (PP) and flow (FFFF) queues feeding a strict priority scheduler; OFA queue depth N with flow control back to the controller; dropped entries marked "superseded" and "aged out!!!"]
![Page 50: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/50.jpg)
Lesson: Mitigation with Flow Control
● Separate queue for packet IO and flow request
● Strict priority for packet IO over flow programming
● Limit queue depth in OFA: token based flow control
● Systematic queue drop discipline
● Asynchronous OFA
● Packet IO out of flow processing pipeline
[Figure: SDN controller, OFA (Async), and embedded switch stack; separate packet (PP) and flow (FFFF) queues feeding a strict priority scheduler; OFA queue depth N with flow control back to the controller; dropped entries marked "superseded" and "aged out!!!"; in the switch stack, DMA for packet I/O is separate from Flow Processing]
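The mitigations listed above (separate queues, strict priority for packet IO, token-based flow control, and a drop discipline) can be sketched as a small Python class. This is a hedged illustration, not the OFA implementation: the class name `OfaQueues`, its methods, and the choice to apply "superseded" to queued flow rules and "aged out" to queued packets are all assumptions, since the slides only name the policies.

```python
from collections import OrderedDict, deque
from typing import Deque, Optional, Tuple


class OfaQueues:
    """Hypothetical model of the queues between controller and switch stack."""

    def __init__(self, flow_tokens: int, packet_max_age: float) -> None:
        self.packets: Deque[Tuple[float, str]] = deque()   # (enqueue time, packet)
        self.flows: "OrderedDict[str, str]" = OrderedDict()  # flow key -> latest rule
        self.tokens = flow_tokens             # bounds outstanding flow requests (N)
        self.packet_max_age = packet_max_age  # assumed staleness limit for packets

    def offer_packet(self, packet: str, now: float) -> None:
        self.packets.append((now, packet))

    def offer_flow(self, key: str, rule: str) -> bool:
        """Return False when the controller must hold back (no tokens left)."""
        if key in self.flows:
            self.flows[key] = rule  # older queued rule is superseded in place
            return True
        if self.tokens == 0:
            return False
        self.tokens -= 1
        self.flows[key] = rule
        return True

    def next_work(self, now: float) -> Optional[Tuple[str, str]]:
        """Strict priority: drain fresh packets first, then one flow request."""
        while self.packets:
            enqueued, packet = self.packets.popleft()
            if now - enqueued <= self.packet_max_age:
                return ("packet", packet)
            # otherwise the packet has aged out and is dropped silently
        if self.flows:
            _, rule = self.flows.popitem(last=False)
            self.tokens += 1  # token goes back to the controller
            return ("flow", rule)
        return None
```

In this sketch a burst of flow rules can never sit in front of a protocol packet, and the controller learns from the `False` return value that it should stop sending, which is the producer/consumer back-pressure the lesson describes.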
![Page 51: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/51.jpg)
Lesson on Availability
![Page 52: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/52.jpg)
Outages!!!
[Figure: Postmortem Bugs by Category, 2012 to 2014; categories: Unstable mastership, Operational Procedure/Tools, Core Software Bugs, Unsupported Software; annotation: Worst Offender]
[Figure: Deployment Growth (Sites), 2012 to 2014]
![Page 53: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/53.jpg)
Control Plane Connectivity: Mastership
Initial naive design:
● Symmetry between buildings
● Each building can run independently, even if the other one is down
● N+1 controller redundancy sufficient for upgrades, failures etc.
[Figure: Master and Standby SDN controllers, each with a TE App, linked by heartbeat exchange; labels: protocols, OF agent, silicon]
![Page 54: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/54.jpg)
Control Network: Unstable Mastership
● Both controllers declare mastership:
○ Gateway and OFAs can observe mastership flapping frequently
○ Declared master has partial reachability to switches
● Reported topology changes, pathing changes, flow programming fails
Non-transitive reachability => Packets dropped!!
[Figure: TE server and Gateway above the Master and Standby SDN controllers, each with a TE App; unstable reachability to the OF agents on switch silicon]
![Page 55: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/55.jpg)
Lesson: Robust Control Reachability
● Multiple independent domains per site: connected only through dataplane
○ Each domain is unit for safe modular upgrade and maintenance
● Paxos: quorum-based robust master election within each domain
● Also removes single point of failure in each site
[Figure: Domain 1 to Domain 4, each with its own SDN controller, TE App, protocols, and Paxos instance]
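Why a quorum helps where heartbeats did not can be shown with a toy model. The sketch below is not Paxos and not B4's election code; it only illustrates the majority-quorum property that quorum-based election relies on. The replica names, the "vote for the lowest visible ID" rule, and both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def heartbeat_masters(replicas: List[str], reachable: Dict[str, Set[str]]) -> List[str]:
    """Naive rule: declare mastership if no lower-ID peer is heard from."""
    return [r for r in replicas if min(reachable[r] | {r}) == r]


def quorum_masters(replicas: List[str], reachable: Dict[str, Set[str]]) -> List[str]:
    """Each replica casts one vote; a master needs a strict majority of all replicas."""
    quorum = len(replicas) // 2 + 1
    votes: Counter = Counter()
    for r in replicas:
        votes[min(reachable[r] | {r})] += 1  # vote for the lowest ID this replica sees
    return [candidate for candidate, n in votes.items() if n >= quorum]


# Hypothetical partition: c1 is cut off, c2 and c3 still see each other.
REPLICAS = ["c1", "c2", "c3"]
PARTITION = {"c1": set(), "c2": {"c3"}, "c3": {"c2"}}
```

Under this assumed partition the heartbeat rule yields two self-declared masters (`c1` and `c2`), matching the "both controllers declare mastership" failure, while the quorum rule yields only `c2`. Because every replica votes once and a strict majority is required, two masters can never be elected at the same time; the price is that a badly partitioned domain may temporarily elect none.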
![Page 56: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/56.jpg)
Lessons on Scaling
![Page 57: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/57.jpg)
Flat Topology Scales Poorly
● As B4 grows: more sites deployed
● As compute per site grows:
○ More capacity required per site
Larger switches OR more switches
● Larger switches: loss of large capacity on switch failure
● More switches: more nodes and links to manage
○ ISIS and TE will hit scaling issues, converge too slowly...!!!
![Page 58: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/58.jpg)
Lesson: Hierarchical Topology
Best of both worlds with SDN
● Topology abstractions by domain controllers
○ Supernode: tightly connected nodes/switches
○ Supertrunks: links between supernodes
● Domain controllers compute
○ intra-domain routing
○ impairment due to internal failure
[Figure: physical topology (domain controller view); labels: domain X, domain Y, xN, x2N]
![Page 59: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/59.jpg)
Lesson: Hierarchical Topology
Best of both worlds with SDN
● Topology in terms of supertrunk capacity
● TE and ISIS/BGP work on supernodes
Reduces global controller-visible topology complexity by over 100x
[Figure: abstract topology (global controller view); labels: domain X, domain Y, supernode-1, supernode-2, supertrunk, xN, x2N]
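The abstraction step described above, where the global controller sees supernodes and supertrunk capacity instead of individual switches and links, can be sketched as a simple aggregation. This is an assumption-laden illustration: the slides do not say how supertrunk capacity is computed, so summing the member link capacities, the function name `abstract_topology`, and the switch names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Link = Tuple[str, str, float]  # (switch, switch, capacity)


def abstract_topology(
    links: List[Link], supernode_of: Dict[str, str]
) -> Dict[Tuple[str, str], float]:
    """Collapse switch-level links into supertrunks between supernodes."""
    trunks: Dict[Tuple[str, str], float] = defaultdict(float)
    for a, b, capacity in links:
        sa, sb = supernode_of[a], supernode_of[b]
        if sa == sb:
            continue  # intra-domain link: handled by the domain controller
        key = (sa, sb) if sa <= sb else (sb, sa)  # undirected supertrunk
        trunks[key] += capacity
    return dict(trunks)


# Hypothetical two-domain example.
SUPERNODE_OF = {"x1": "X", "x2": "X", "y1": "Y", "y2": "Y"}
LINKS: List[Link] = [
    ("x1", "x2", 100.0),  # internal to domain X, hidden from the global view
    ("x1", "y1", 40.0),
    ("x2", "y2", 40.0),
    ("y1", "x2", 40.0),
]
```

Here four switches and four links collapse to two supernodes and one supertrunk of capacity 120. An internal failure would show up to the global controller only as a reduced supertrunk capacity, which is one way to read the "impairment due to internal failure" bullet.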
![Page 60: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/60.jpg)
Conclusions
● SDN is beneficial in the real world
○ Centralized TE delivered up to 30% additional throughput!
○ Decoupled software and hardware rollout
● Lessons to work in practice
○ System performance: Flow control between components
○ Availability: Robust reachability for master election
○ Scale: Hierarchical topology abstraction
![Page 61: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/61.jpg)
References
● Upward Max Min Fairness: INFOCOM 2012
● B4: Experience with a Globally-Deployed Software Defined WAN: SIGCOMM 2013
● Bandwidth Enforcer: Flexible Hierarchical Bandwidth Allocation for WAN Distributed Computing: SIGCOMM 2015
![Page 62: B4, Google’s SDN WAN Lessons Learned frombadri/552dir/papers/scheduling/atc15... · Benefits of B4-SDN/TE Lessons learnt on SDN in three key areas Fast producer/slow ... Controller](https://reader035.vdocuments.mx/reader035/viewer/2022070920/5fb938abc88e5a495f1a0765/html5/thumbnails/62.jpg)
Google Platforms Networking
Software, Hardware, Test, Technology
Hiring
● Interns
● Full time engineers
Locations worldwide:
● Mountain View
● New York
● Sydney
Inspiration and creativity to build Google’s infrastructure:
● Scale that gives the edge
● Research turned into real life production solution
Thank You!!