Congestion Management for Ethernet-based Lossless DataCenter Networks
Pedro Javier Garcia¹, Jesus Escudero-Sahuquillo¹, Francisco J. Quiles¹ and Jose Duato²
1: University of Castilla-La Mancha (UCLM)
2: Technical University València (UPV)
DCN: 1-19-0012-00-ICne NENDICA
Abstract
This paper describes congestion phenomena in lossless data center networks and their negative consequences. It explores proposed solutions, analyzing their pros and cons to determine which are suited to the requirements of modern data centers. The conclusions identify important issues that should be addressed in the future.
Agenda
• Introduction
• Congestion Dynamics in DCNs
• Reducing In-Network and Incast Congestion
• Combining Congestion Management Mechanisms
• Conclusions
Introduction: On-Line Data Intensive (OLDI) Services [Congdon18]
• Require immediate answers to requests that are coming in at a high rate.
• End-user experience is highly dependent upon the system responsiveness.
• The network becomes a significant component of overall DC latency when congestion occurs in the network.
[Figure: OLDI request tree in which a request enters a top-level aggregator that fans out to further aggregators and then to workers. Annotated deadlines: 250 ms (at the request level), 50 ms and 10 ms.]
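
As a rough illustration of why a single congested path matters for OLDI services, the sketch below models one aggregator that can only answer once every worker has replied. The function name, the 8 ms compute time and the network delays are hypothetical values chosen for illustration; only the 10 ms deadline comes from the figure.

```python
def aggregate_within_deadline(worker_latencies_ms, deadline_ms):
    """Return (replies_in_time, completion_time_ms) for one aggregator.

    The aggregator waits for every worker, so its completion time is the
    maximum worker latency, not the average.
    """
    completion = max(worker_latencies_ms)
    in_time = [lat for lat in worker_latencies_ms if lat <= deadline_ms]
    return len(in_time), completion


# Hypothetical numbers: 8 ms of compute per worker plus network delay.
uncongested = [8.0 + 0.5] * 4            # 0.5 ms network delay for every reply
congested = [8.5, 8.5, 8.5, 8.0 + 4.0]   # one reply delayed 4 ms by congestion

ok_count, ok_time = aggregate_within_deadline(uncongested, deadline_ms=10.0)
bad_count, bad_time = aggregate_within_deadline(congested, deadline_ms=10.0)
print(ok_count, ok_time)    # all four replies arrive before the deadline
print(bad_count, bad_time)  # one late reply pushes completion past the deadline
```

In this toy model a delay on one path out of four is enough to make the whole aggregation miss its deadline, which is why tail latency in the network dominates the end-user experience.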
Introduction: Data-Center Networks (DCNs)
• Today's DCNs require a flexible fabric that carries traffic from different types of applications, storage or control in a converged way.
• Latency is a concern: fabric design for DCNs must minimize or eliminate packet loss, provide high throughput and maintain low latency.
• These goals are crucial for applications such as OLDI, Deep Learning, NVMe over Fabrics and Cloudified Central Offices.
• However, congestion threatens these applications.
Introduction: Why is congestion isolation needed?
• HoL blocking dramatically degrades network performance (e.g. PFC does not have enough granularity and there is no congested-flow identification) [Garcia05].
• Classical e2e congestion control for lossless networks is difficult to tune, reacts slowly, and may introduce oscillations and instability [Escudero11].
[Figure: normalized network throughput (0 to 0.8) versus time (0 to 5e+06 nanoseconds) in a 64-node CLOS network with 4 hot-spots, comparing the 1Q, ITh and VOQnet schemes. Markers show where the hot-spot (HS) traffic starts and ends; HS = traffic injected to the hot-spot destination.]
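
To make the HoL-blocking effect concrete, here is a minimal sketch that compares a single FIFO queue with per-destination queues at one input port. It is a toy model under stated assumptions, not the simulator behind the plot: the "hot" output is assumed to accept one packet every `hot_ready_every` cycles, the "cold" output is always free, and at most one packet leaves the port per cycle. All names and parameters are hypothetical.

```python
from collections import deque


def delivered_fifo(packets, hot_ready_every, cycles):
    """Single FIFO: only the head packet may leave in each cycle."""
    queue = deque(packets)
    sent = {"hot": 0, "cold": 0}
    for t in range(cycles):
        if not queue:
            break
        head = queue[0]
        # A stalled "hot" head packet blocks every packet queued behind it.
        if head == "cold" or t % hot_ready_every == 0:
            sent[queue.popleft()] += 1
    return sent


def delivered_voq(packets, hot_ready_every, cycles):
    """One queue per destination: a stalled hot queue does not block cold."""
    queues = {"hot": deque(), "cold": deque()}
    for p in packets:
        queues[p].append(p)
    sent = {"hot": 0, "cold": 0}
    for t in range(cycles):
        # At most one packet leaves the input port per cycle.
        if queues["hot"] and t % hot_ready_every == 0:
            queues["hot"].popleft()
            sent["hot"] += 1
        elif queues["cold"]:
            queues["cold"].popleft()
            sent["cold"] += 1
    return sent


traffic = ["hot", "cold"] * 6  # 12 packets, alternating destinations
fifo = delivered_fifo(traffic, hot_ready_every=4, cycles=12)
voq = delivered_voq(traffic, hot_ready_every=4, cycles=12)
print(fifo, voq)
```

Both variants deliver the same number of hot packets, since that is bounded by the congested output, but the shared FIFO also throttles the cold flow even though its output is idle.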
Introduction: Why is congestion isolation needed?
[Figure: example network with sources A to E, switches 1 to 9 and destinations X, Y and Z. Congested flows (Dst. X) share links and queues with non-congested flows (Dst. Y and Dst. Z); links are annotated with utilizations of 33%, 66% and 100%. Two queues annotated "33% sending, 33% stopped, 33% sending" illustrate low-order HoL blocking and high-order HoL blocking.]
Introduction: Why is congestion isolation needed?
• We need a congestion isolation (CI) mechanism that reacts quickly when transient congestion situations appear, preventing the network performance degradation caused by HoL blocking.
• We want a CI mechanism that complements other technologies available in DCNs, so that CI improves their performance while the other technologies reduce the complexity of CI.
Congestion Dynamics in DCNs: Appearance of Congestion
[Figure: four switch diagrams in which the incoming flows inject at 100% of the link bandwidth (full rate). The panels are labeled Speedup = 1, Speedup = 2, Speedup = 2 and Speedup = 1.5, and mark where congestion appears, either as a single congestion point or at two instants, t0 and t0+T.]
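
One plausible reading of the panels is that the switch speedup determines where packets first accumulate. The sketch below is a fluid model under loud assumptions that are not stated on the slide: buffers are unbounded, all inputs target the same output, the crossbar moves up to `speedup` packets per cycle toward that output, and the output link drains one packet per cycle. The function name and parameters are hypothetical.

```python
def backlog_after(n_inputs, speedup, cycles):
    """Return (input_backlog, output_backlog) in packets after `cycles`."""
    input_q = 0.0
    output_q = 0.0
    for _ in range(cycles):
        input_q += n_inputs              # every input injects at full rate
        moved = min(speedup, input_q)    # crossbar transfer toward the hot output
        input_q -= moved
        output_q += moved
        output_q -= min(1.0, output_q)   # output link drains one packet per cycle
    return input_q, output_q


for s in (1.0, 2.0, 1.5):
    print(s, backlog_after(n_inputs=2, speedup=s, cycles=10))
```

Under these assumptions, speedup 1 leaves the backlog at the inputs, speedup 2 moves it to the output, and speedup 1.5 splits it between both. With finite lossless buffers, a full output buffer would in turn stall the crossbar and the backlog would spread to the inputs at a later instant.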
Congestion Dynamics in DCNs: Growth of Congestion Trees (from root to leaves)
[Figure: five switches (Switch 1 to Switch 5) with switch speedup = 1.5. Legend: packet flows, congestion point.]
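
Growth from the root toward the leaves can be sketched with a chain of lossless buffers. This is a simplified illustration, not the topology in the figure: it assumes a linear chain of switches, a fixed buffer capacity per switch, a source that injects faster than the last hop drains, and hop-by-hop flow control that never lets a buffer exceed its capacity. All names and values are hypothetical.

```python
def fill_order(n_switches, capacity, inject_per_cycle, drain_per_cycle, cycles):
    """Return switch indices in the order their buffers first become full.

    Index n_switches - 1 is the switch next to the hot destination (the root);
    index 0 is the switch next to the source (a leaf).
    """
    occupancy = [0] * n_switches
    order = []
    for _ in range(cycles):
        # The root drains toward the destination at the slow rate.
        occupancy[-1] -= min(drain_per_cycle, occupancy[-1])
        # Forward hop by hop, root side first; lossless, so never overflow.
        for i in range(n_switches - 2, -1, -1):
            moved = min(inject_per_cycle, occupancy[i], capacity - occupancy[i + 1])
            occupancy[i] -= moved
            occupancy[i + 1] += moved
        # The source injects into the first switch if there is room.
        occupancy[0] += min(inject_per_cycle, capacity - occupancy[0])
        for i, occ in enumerate(occupancy):
            if occ == capacity and i not in order:
                order.append(i)
    return order


order = fill_order(n_switches=3, capacity=4, inject_per_cycle=2,
                   drain_per_cycle=1, cycles=40)
print(order)
```

In this model the buffer nearest the destination fills first and backpressure then fills the upstream buffers one hop at a time, which is the root-to-leaves growth pattern named in the slide title.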
Congestion Dynamics in DCNs: Growth of Congestion Trees (from leaves to root)
[Figure: seven switches (Switch 1 to Switch 7) with switch speedup = 1.5. Legend: packet flows, congestion point.]
Congestion Dynamics in DCNs: Growth of Congestion Trees (Roots movement)
[Figure: two snapshots of three switches (Switch 1 to Switch 3) with switch speedup = 1.5. Legend: packet flows (start), packet flows (after), congestion point.]
Congestion Dynamics in DCNs: Growth of Congestion Trees (in-network roots)
[Figure: eight switches (Switch 1 to Switch 8) with switch speedup = 1.5 and destinations X and Y. Legend: packet flows addressed to X, packet flows addressed to Y, congestion point.]
Congestion Dynamics in DCNs: Growth of Congestion Trees (Overlapping)
[Figure: nine switches (Switch 1 to Switch 9) with switch speedup = 1.5 and destinations X and Y. Legend: packet flows addressed to X, packet flows addressed to Y, congestion point.]
Congestion Dynamics in DCNs: Growth of Congestion Trees (Vanishing)
[Figure: two snapshots of three switches (Switch 1 to Switch 3) with switch speedup = 1.5. Legend: permanent packet flows, packet flows disappearing first, congestion point first appeared in the switch.]
Reducing Congestion: Incast congestion reduction - ECMP
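
For readers unfamiliar with ECMP, the following is a minimal sketch of hash-based next-hop selection. It assumes the common approach of hashing the flow 5-tuple and taking the result modulo the number of equal-cost next hops; the CRC32 hash, the uplink names and the example flows are hypothetical choices, not taken from the slides or from any particular switch.

```python
import zlib
from collections import Counter


def ecmp_next_hop(flow, next_hops):
    """Pick one equal-cost next hop from a hash of the flow 5-tuple."""
    key = "|".join(str(field) for field in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


uplinks = ["spine-1", "spine-2", "spine-3", "spine-4"]  # hypothetical names

# 5-tuple: source IP, destination IP, protocol, source port, destination port.
flows = [("10.0.0.%d" % i, "10.0.1.9", 17, 49152 + i, 4791) for i in range(16)]

placement = Counter(ecmp_next_hop(f, uplinks) for f in flows)
print(placement)
```

Because the choice depends only on the flow identifier, all packets of a flow follow the same path and stay in order. The same property means the hash is unaware of load: several heavy flows can land on one uplink while another stays idle.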
Reducing Congestion: In-network congestion reduction - ECN
[Figure: nine switches (Switch 1 to Switch 9) with switch speedup = 1.5 and destinations X and Y. Legend: packet flows addressed to X, packet flows addressed to Y, victim flow, congestion point.]
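
The general idea behind ECN-based control can be sketched as two small functions: the switch marks packets when an egress queue is deep, and the sender lowers its rate when marks are echoed back. This is a generic threshold-marking and AIMD sketch under assumed parameters (threshold, decrease factor, increase step); it is not the exact algorithm of any specific protocol or of the authors' experiments.

```python
def mark_ecn(queue_depth_pkts, threshold_pkts):
    """Switch side: set the congestion mark when the egress queue is deep."""
    return queue_depth_pkts > threshold_pkts


def next_rate(rate_gbps, marked, line_rate_gbps, decrease=0.5, increase_gbps=1.0):
    """Sender side: multiplicative decrease on a mark, additive increase otherwise."""
    if marked:
        return rate_gbps * decrease
    return min(line_rate_gbps, rate_gbps + increase_gbps)


rate = 40.0
history = []
# Hypothetical queue depths observed by successive feedback rounds.
for depth in [10, 30, 80, 90, 20, 5]:
    rate = next_rate(rate, mark_ecn(depth, threshold_pkts=50), line_rate_gbps=40.0)
    history.append(rate)
print(history)
```

Each adjustment happens only after a mark has travelled to the receiver and back, so the sender reacts at least one round trip after the queue started to build.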
Reducing Congestion: Limitations of current technologies [Escudero19]
• These technologies may work together to eliminate loss in the cloud data center network.
• Load balancing and destination scheduling are end-to-end solutions that incur RTT delays when congestion appears.
• However, there is no time for loss in the network due to congestion, and congestion trees grow very quickly.
• Transient congestion may still produce HoL blocking that leads to increased latency, lower throughput and buffer overflow, significantly degrading performance.
• Even using these mechanisms, we still need something to deal with HoL blocking locally and fast.
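
A back-of-the-envelope calculation shows why an RTT of reaction delay matters. The sketch below computes how many bytes sources keep injecting before any end-to-end feedback can take effect; the 100 Gb/s link speed, 10 µs RTT and source counts are hypothetical example values, not measurements from the slides.

```python
def bytes_injected_before_reaction(link_gbps, rtt_us, n_sources=1):
    """Bytes sent at full rate during one RTT, summed over all sources."""
    # 1 Gb/s sustained for 1 microsecond is exactly 1000 bits.
    bits_per_source = link_gbps * rtt_us * 1000
    return n_sources * bits_per_source // 8


one_source = bytes_injected_before_reaction(link_gbps=100, rtt_us=10)
incast_of_8 = bytes_injected_before_reaction(link_gbps=100, rtt_us=10, n_sources=8)
print(one_source, incast_of_8)
```

With these example values a single source injects 125 kB before it can react, and an 8-to-1 incast places about 1 MB in the network, which is the kind of burst that lossless buffers must absorb or push back with PFC.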
Combining Congestion Management Mechanisms
• CI is needed to react locally and very fast to immediately eliminate HoL blocking.
• The previous technologies reduce the use of PFC and ECN, but delays still occur because of their closed- and open-loop approaches.
• Congestion trees appear suddenly, are difficult to predict (even more so when load balancing is applied) and grow quickly.
• New techniques can be applied in combination with the previous technologies, improving their behavior.
Combining Congestion Management Mechanisms: Dynamic Virtual Lanes (DVL)
[Figure: Switch A and Switch B, with ports labeled P1 to P4, each port holding a congested-flow queue (CFQ) and a non-congested-flow queue (nCFQ). Legend: output port requested by the packet at the head of the queue; congestion root; Congestion Isolation Packets (CIP); packets from congested flows; packets from non-congested flows.]
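
The queue roles named in the figure (CFQ, nCFQ, CIP) can be illustrated with a small sketch of one port. The detection rule used here is an assumption made for illustration only: when the nCFQ grows past a threshold, the destination with the most queued packets is declared congested, its packets are relocated to the CFQ, and a CIP naming that destination is recorded for the upstream switch. The class name, the threshold and the relocation of already queued packets are hypothetical simplifications, not the DVL specification.

```python
from collections import Counter, deque


class IsolatingPort:
    """One switch port with a non-congested queue (nCFQ) and a congested queue (CFQ)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.ncfq = deque()
        self.cfq = deque()
        self.congested = set()   # destinations currently treated as congested
        self.cips_sent = []      # CIPs that would be sent upstream

    def enqueue(self, dst):
        if dst in self.congested:
            self.cfq.append(dst)         # isolated flows bypass the nCFQ
            return
        self.ncfq.append(dst)
        if len(self.ncfq) > self.threshold:
            self._isolate()

    def _isolate(self):
        # Assumed rule: blame the destination with the most queued packets.
        hot, _ = Counter(self.ncfq).most_common(1)[0]
        self.congested.add(hot)
        self.cfq.extend(d for d in self.ncfq if d == hot)
        self.ncfq = deque(d for d in self.ncfq if d != hot)
        self.cips_sent.append(hot)       # tell the upstream switch to isolate too


port = IsolatingPort(threshold=4)
for dst in ["X", "Y", "X", "X", "Z", "X", "Y"]:
    port.enqueue(dst)
print(list(port.cfq), list(port.ncfq), port.cips_sent)
```

After isolation, packets for Y and Z no longer wait behind packets for X, and the recorded CIP lets the upstream switch apply the same separation before its own queue fills.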
References
[Duato03] J. Duato, S. Yalamanchili, and L. M. Ni, Interconnection Networks: An Engineering Approach. San Francisco, CA, USA: Morgan Kaufmann Publishers, 2003.
[Garcia05] P. J. Garcia, J. Flich, J. Duato, I. Johnson, F. J. Quiles, and F. Naven, “Dynamic Evolution of Congestion Trees: Analysis and Impact on Switch Architecture,” in High Performance Embedded Architectures and Compilers, ser. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, Nov. 2005, pp. 266–285.
[Congdon18] Paul Congdon, “IEEE 802 Nendica Report: The Lossless Network for Data Centers,” IEEE-SA Industry Connections White Paper, August 2018.
[Leiserson85] C. E. Leiserson, “Fat-trees: Universal networks for hardware-efficient supercomputing,” IEEE Transactions on Computers, vol. C-34, pp. 892–901, Oct 1985.
[Escudero11] Jesús Escudero-Sahuquillo, Ernst Gunnar Gran, Pedro Javier García, Jose Flich, Tor Skeie, Olav Lysne, Francisco J. Quiles, José Duato: Combining Congested-Flow Isolation and Injection Throttling in HPC Interconnection Networks. ICPP 2011: 662-672.
[Escudero19] Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Quiles, José Duato: P802.1Qcz interworking with other data center technologies. IEEE 802.1 Plenary Meeting, San Diego, CA, USA, July 8, 2018 (cz-escudero-sahuquillo-ci-internetworking-0718-v1.pdf).