15-744: Computer Networking
L-14 Data Center Networking II
Overview
• Data Center Topologies
• Data Center Scheduling
Current solutions for increasing data center network bandwidth (FatTree, BCube):
1. Hard to construct
2. Hard to expand
Background: Fat-Tree
• Inter-connect racks (of servers) using a fat-tree topology
• Fat-Tree: a special type of Clos network (after C. Clos)

K-ary fat tree: three-layer topology (edge, aggregation, and core)
• Each pod consists of (k/2)^2 servers and 2 layers of k/2 k-port switches
• Each edge switch connects to k/2 servers and k/2 aggregation switches
• Each aggregation switch connects to k/2 edge and k/2 core switches
• (k/2)^2 core switches: each connects to k pods

Fat-tree with K = 4
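The counts above follow directly from the port count k. A minimal sketch (the function name is illustrative, not from the slides):

```python
def fat_tree_params(k: int) -> dict:
    """Element counts for a k-ary fat tree (k even), per the definitions above."""
    half = k // 2
    return {
        "pods": k,                       # k pods
        "servers_per_pod": half ** 2,    # (k/2)^2 servers per pod
        "edge_per_pod": half,            # k/2 edge switches per pod
        "aggr_per_pod": half,            # k/2 aggregation switches per pod
        "core_switches": half ** 2,      # (k/2)^2 core switches
        "total_servers": k * half ** 2,  # = k^3/4
    }

# k = 4: 4 pods, 4 servers per pod, 16 servers total, 4 core switches
print(fat_tree_params(4))
```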
Why Fat-Tree?
• A fat tree has identical bandwidth at any bisection
• Each layer has the same aggregate bandwidth
• Can be built using cheap devices with uniform capacity
  • Each port supports the same speed as an end host
  • All devices can transmit at line speed if packets are distributed uniformly along available paths
• Great scalability: k-port switches support k^3/4 servers

Fat-tree network with K = 6 supporting 54 hosts
An alternative: hybrid packet/circuit switched data center network

Goal of this work:
– Feasibility: software design that enables efficient use of optical circuits
– Applicability: application performance over a hybrid network
Optical circuit switching vs. electrical packet switching

|                      | Electrical packet switching                | Optical circuit switching                           |
|----------------------|--------------------------------------------|-----------------------------------------------------|
| Switching technology | Store and forward                          | Circuit switching                                   |
| Switching capacity   | 16x40 Gbps at high end (e.g., Cisco CRS-1) | 320x100 Gbps on market (e.g., Calient FiberConnect) |
| Switching time       | Packet granularity                         | Less than 10 ms (e.g., MEMS optical switch)         |
Optical Circuit Switch
2010-09-02 SIGCOMM Nathan Farrington

[Diagram: a glass fiber bundle, lenses, a fixed mirror, and mirrors on motors; rotating the mirrors steers light from Input 1 to Output 1 or Output 2]
• Does not decode packets
• Takes time to reconfigure
Optical circuit switching vs. electrical packet switching

|                      | Electrical packet switching                | Optical circuit switching                           |
|----------------------|--------------------------------------------|-----------------------------------------------------|
| Switching technology | Store and forward                          | Circuit switching                                   |
| Switching capacity   | 16x40 Gbps at high end (e.g., Cisco CRS-1) | 320x100 Gbps on market (e.g., Calient FiberConnect) |
| Switching time       | Packet granularity                         | Less than 10 ms                                     |
| Switching traffic    | For bursty, uniform traffic                | For stable, pair-wise traffic                       |
Optical circuit switching is promising despite slow switching time
• [IMC09][HotNets09]: “Only a few ToRs are hot and most their traffic goes to a few other ToRs. …”
• [WREN09]: “…we find that traffic at the five edge switches exhibit an ON/OFF pattern… ”

Full bisection bandwidth at packet granularity may not be necessary
Hybrid packet/circuit switched network architecture
• Optical circuit-switched network for high capacity transfer
• Electrical packet-switched network for low latency delivery
• Optical paths are provisioned rack-to-rack:
  – A simple and cost-effective choice
  – Aggregate traffic on a per-rack basis to better utilize optical circuits
Design requirements
• Control plane:
  – Traffic demand estimation
  – Optical circuit configuration
• Data plane:
  – Dynamic traffic de-multiplexing
  – Optimizing circuit utilization (optional)
c-Through (a specific design)
• No modification to applications and switches
• Leverage end hosts for traffic management
• Centralized control for circuit configuration
c-Through - traffic demand estimation and traffic batching

Socket buffers on end hosts accomplish two requirements:
– Traffic demand estimation (a per-rack traffic demand vector)
– Pre-batching data to improve optical circuit utilization

1. Transparent to applications.
2. Packets are buffered per-flow to avoid HOL blocking.
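A sketch of how per-flow buffer occupancy could be aggregated into the per-rack demand vector (function, host, and rack names are illustrative, not from c-Through's implementation):

```python
from collections import defaultdict

def rack_demand_vector(buffered, rack_of):
    """Aggregate per-flow buffered bytes into a per-destination-rack demand
    vector, as the end hosts' socket-buffer occupancy suggests.
    `buffered` maps dst_host -> buffered bytes; `rack_of` maps host -> rack."""
    demand = defaultdict(int)
    for dst_host, nbytes in buffered.items():
        demand[rack_of[dst_host]] += nbytes
    return dict(demand)

# Two flows headed to hosts in rack "R2", one to rack "R3"
buffered = {"h4": 3_000_000, "h5": 1_000_000, "h9": 500_000}
rack_of = {"h4": "R2", "h5": "R2", "h9": "R3"}
print(rack_demand_vector(buffered, rack_of))  # {'R2': 4000000, 'R3': 500000}
```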
c-Through - optical circuit configuration
• Hosts send their traffic demand to a centralized controller
• The controller uses Edmonds’ algorithm to compute the optimal configuration
• The resulting circuit configuration is pushed back out to the optical switch and hosts
• Many ways to reduce the control traffic overhead
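The controller's objective is a maximum-weight matching over rack pairs (c-Through uses Edmonds' polynomial-time algorithm for this; the exhaustive sketch below only illustrates the objective on a toy input, with made-up rack names and demands):

```python
def best_circuit_config(racks, demand):
    """Brute-force maximum-weight perfect matching over rack pairs.
    `demand[(a, b)]` is the estimated traffic between racks a and b."""
    def weight(pairing):
        return sum(demand.get((a, b), 0) + demand.get((b, a), 0)
                   for a, b in pairing)

    def pairings(rs):
        # Enumerate all perfect matchings of the rack list.
        if not rs:
            yield []
            return
        a, rest = rs[0], rs[1:]
        for i, b in enumerate(rest):
            for tail in pairings(rest[:i] + rest[i + 1:]):
                yield [(a, b)] + tail

    return max(pairings(list(racks)), key=weight)

demand = {("A", "B"): 10, ("A", "C"): 3, ("B", "C"): 4, ("C", "D"): 8}
print(best_circuit_config("ABCD", demand))  # [('A', 'B'), ('C', 'D')]
```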
c-Through - traffic de-multiplexing

VLAN-based network isolation (VLAN #1 for the packet-switched network, VLAN #2 for the circuit-switched network):
– No need to modify switches
– Avoids the instability caused by circuit reconfiguration

Traffic control on hosts:
– The controller informs hosts about the circuit configuration
– End hosts tag packets accordingly via a traffic de-multiplexer
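The host-side tagging decision can be sketched as follows (rack names and the function are illustrative; the two-VLAN split is from the slide):

```python
PACKET_VLAN, CIRCUIT_VLAN = 1, 2  # VLAN #1: packet network, VLAN #2: optical

def choose_vlan(my_rack, dst_rack, circuits):
    """Tag traffic for the optical VLAN only when a circuit currently
    connects our rack to the destination rack; otherwise fall back to
    the packet-switched VLAN. `circuits` is the set of rack pairs the
    controller most recently configured."""
    if (my_rack, dst_rack) in circuits or (dst_rack, my_rack) in circuits:
        return CIRCUIT_VLAN
    return PACKET_VLAN

circuits = {("R1", "R4"), ("R2", "R3")}
print(choose_vlan("R1", "R4", circuits))  # 2 (optical circuit available)
print(choose_vlan("R1", "R2", circuits))  # 1 (packet-switched path)
```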
Overview
• Data Center Topologies
• Data Center Scheduling
Datacenters and OLDIs
• OLDI = OnLine Data Intensive applications
  • E.g., Web search, retail, advertisements
• An important class of datacenter applications
• Vital to many Internet companies

OLDIs are critical datacenter applications
OLDIs
• Partition-aggregate
  • Tree-like structure
  • Root node sends query
  • Leaf nodes respond with data
• Deadline budget split among nodes and network
  • E.g., total = 300 ms, parent-leaf RPC = 50 ms
• Missed deadlines → incomplete responses → affect user experience & revenue
Challenges Posed by OLDIs

Two important properties:
1) Deadline bound (e.g., 300 ms)
   • Missed deadlines affect revenue
2) Fan-in bursts
   • Large data, 1000s of servers
   • Tree-like structure (high fan-in) → fan-in bursts → long “tail latency”

Network shared with many apps (OLDI and non-OLDI)
→ Network must meet deadlines & handle fan-in bursts
Current Approaches
• TCP: deadline-agnostic, long tail latency
  • Congestion timeouts (slow), ECN (coarse)
• Datacenter TCP (DCTCP) [SIGCOMM '10]
  • First to comprehensively address tail latency
  • Finely varies sending rate based on the extent of congestion
  • Shortens tail latency, but is not deadline-aware
  • ~25% missed deadlines at high fan-in & tight deadlines

DCTCP handles fan-in bursts, but is not deadline-aware
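DCTCP's "finely varied" rate comes from a per-RTT rule: keep a moving average of the fraction of ECN-marked packets, then cut the window in proportion to it instead of halving. A minimal sketch (the g = 1/16 default follows the DCTCP paper):

```python
def dctcp_update(alpha, frac_marked, cwnd, g=1.0 / 16):
    """One DCTCP round: EWMA of the fraction of ECN-marked packets, then a
    window cut proportional to that estimate, not TCP's blind halving.
    alpha <- (1 - g) * alpha + g * F;  cwnd <- cwnd * (1 - alpha / 2)."""
    alpha = (1 - g) * alpha + g * frac_marked
    cwnd = cwnd * (1 - alpha / 2)
    return alpha, cwnd

# Light marking barely cuts the window; sustained full marking approaches halving.
print(dctcp_update(alpha=0.0, frac_marked=0.1, cwnd=100))
print(dctcp_update(alpha=1.0, frac_marked=1.0, cwnd=100))
```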
D2TCP
• Deadline-aware and handles fan-in bursts
• Key idea: vary sending rate based on both deadline and extent of congestion
• Built on top of DCTCP
• Distributed: uses per-flow state at end hosts
• Reactive: senders react to congestion with no knowledge of other flows
D2TCP’s Contributions
1) Deadline-aware and handles fan-in bursts
   • Elegant gamma-correction for congestion avoidance
     • Far-deadline flows back off more; near-deadline flows back off less
   • Reactive, decentralized, state only at end hosts
• Does not hinder long-lived (non-deadline) flows
• Coexists with TCP → incrementally deployable
• No change to switch hardware → deployable today

D2TCP achieves 75% and 50% fewer missed deadlines than DCTCP and D3, respectively
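The gamma-correction can be sketched as follows: the penalty is p = alpha^d, where alpha is DCTCP's congestion estimate and d is a deadline imminence factor (roughly, time needed to finish over time remaining). Variable names are illustrative; the formulas follow the D2TCP paper:

```python
def d2tcp_backoff(cwnd, alpha, time_needed, time_remaining):
    """D2TCP's gamma correction: penalty p = alpha ** d with deadline
    imminence d = time_needed / time_remaining. Near-deadline flows
    (d large) get p < alpha and back off less; far-deadline flows
    (d small) get p > alpha and back off more."""
    d = time_needed / time_remaining
    p = alpha ** d
    if p == 0:
        return cwnd + 1            # no congestion: additive increase, as in TCP
    return cwnd * (1 - p / 2)      # gamma-corrected multiplicative decrease

# Same congestion level (alpha = 0.5), different deadline pressure:
near = d2tcp_backoff(100, 0.5, time_needed=2, time_remaining=1)  # d = 2
far = d2tcp_backoff(100, 0.5, time_needed=1, time_remaining=4)   # d = 0.25
print(near, far)  # the near-deadline flow keeps the larger window
```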
Coflow Definition
A coflow is a collection of parallel flows between two groups of machines that share a common performance goal (e.g., the shuffle flows of one job completing together).
Data Center Summary
• Topology
  • Easy deployment/costs
  • High bisection bandwidth makes placement less critical
  • Augment on-demand to deal with hot-spots
• Scheduling
  • Delays are critical in the data center
  • Can try to handle this in congestion control
  • Can try to prioritize traffic in switches
  • May need to consider dependencies across flows to improve scheduling
Review
• Networking background
  • OSPF, RIP, TCP, etc.
• Design principles and architecture
  • E2E and Clark
• Routing/Topology
  • BGP, power laws, HOT topology
Review
• Resource allocation
  • Congestion control and TCP performance
  • FQ/CSFQ/XCP
• Network evolution
  • Overlays and architectures
  • OpenFlow and Click
  • SDN concepts
  • NFV and middleboxes
• Data centers
  • Routing
  • Topology
  • TCP
  • Scheduling