![Page 1: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/1.jpg)
IrokoA Data Center Emulator
for Reinforcement Learning
Fabian Ruffy, Michael Przystupa, Ivan Beschastnikh
University of British Columbia
https://github.com/dcgym/iroko
![Page 2: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/2.jpg)
2
Reinforcement Learning and Networking
![Page 3: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/3.jpg)
3
Reinforcement Learning and Networking
![Page 4: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/4.jpg)
4
Reinforcement Learning and Networking
![Page 5: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/5.jpg)
5
Reinforcement Learning and Networking
![Page 6: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/6.jpg)
6
Reinforcement Learning and Networking
![Page 7: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/7.jpg)
7
Reinforcement Learning and Networking
![Page 8: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/8.jpg)
8
• DC challenges are optimization problems
• Traffic control
• Resource management
• Routing
• Operators have complete control
• Automation possible
• Lots of data can be collected
The Data Center: A perfect use case
Cho, Inho, Keon Jang, and Dongsu Han.
"Credit-scheduled delay-bounded congestion control for datacenters.“
SIGCOMM 2017
![Page 9: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/9.jpg)
9
• Typical reinforcement learning is not viable for data center operators!
• Fragile stability
• Questionable reproducibility
• Unknown generalizability
• Prototyping RL is complicated
• Cannot interfere with live production traffic
• Offline traces are limited in expressivity
• Deployment is tedious and slow
Two problems…
![Page 10: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/10.jpg)
10
• Iroko: open reinforcement learning gym for data center scenarios
• Inspired by the Pantheon* for WAN congestion control
• Deployable on a local Linux machine
• Can scale to topologies with many hosts
• Approximates real data center conditions
• Allows arbitrary definition of
• Reward
• State
• Actions
Our work: A platform for RL in Data Centers
*Yan, Francis Y., et al. "Pantheon: the training ground for Internet congestion-control research.“ ATC 2018
![Page 11: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/11.jpg)
11
Iroko in one slide
![Page 12: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/12.jpg)
12
Iroko in one slide
Topology
Fat-TreeRack Dumbbell
![Page 13: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/13.jpg)
13
Iroko in one slide
Topology
Fat-TreeRack Dumbbell
Traffic Pattern Action Model
![Page 14: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/14.jpg)
14
Iroko in one slide
Topology
Fat-TreeRack Dumbbell
Data Collectors
Traffic Pattern Action Model
![Page 15: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/15.jpg)
15
Iroko in one slide
Topology
Fat-TreeRack Dumbbell
Data Collectors
Reward Model State Model
Traffic Pattern Action Model
![Page 16: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/16.jpg)
16
Iroko in one slide
OpenAI Gym
Topology
Fat-TreeRack Dumbbell
Data Collectors
Reward Model State Model
Traffic Pattern Action Model
![Page 17: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/17.jpg)
17
Iroko in one slidePolicy
OpenAI Gym
Topology
Fat-TreeRack Dumbbell
Data Collectors
Reward Model State Model
Traffic Pattern Action Model
![Page 18: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/18.jpg)
18
• Ideal data center should have:
• Low latency, high utilization
• No packet loss or queuing delay
• Fairness
• CC variations draw from the reactive TCP
• Queueing latency dominates
• Frequent retransmits reduce goodput
• Data center performance may be unstable
Use Case: Congestion Control
![Page 19: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/19.jpg)
19
1010 10
10 10 10
Bandwidth
AllocationData CollectionFlow Pattern
Switch
Policy10
Predicting Networking Traffic
![Page 20: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/20.jpg)
20
1010 10
10 10 10
Bandwidth
AllocationData CollectionFlow Pattern
Switch
Policy10
Predicting Networking Traffic
![Page 21: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/21.jpg)
21
1010 10
10 10 10
Bandwidth
AllocationData CollectionFlow Pattern
Switch
Policy10
Predicting Networking Traffic
![Page 22: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/22.jpg)
22
3.3
3.3 3.4
1010 10
10 10 10
Bandwidth
AllocationData CollectionFlow Pattern
Switch
Policy10
Predicting Networking Traffic
![Page 23: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/23.jpg)
23
103.3 10
3.3 3.4 10
Bandwidth
AllocationData CollectionFlow Pattern
Switch
Policy10
Predicting Networking Traffic
![Page 24: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/24.jpg)
24
• Two environments:
• env_iroko: centralized rate limiting arbiter
• Agent can set the sending rate of hosts
• PPO, DDPG, REINFORCE
• env_tcp: raw TCP
• Contains implementations of TCP algorithms
• TCP Cubic, TCP New Vegas, DCTCP
• Goal: Avoid congestion
Can we learn to allocate traffic fairly?
![Page 25: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/25.jpg)
25
• 50000 timesteps
• Linux default UDP as base transport
• 5 runs (~7 hours per run)
• Bottleneck at central link
Experiment Setup
![Page 26: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/26.jpg)
Results – Dumbbell UDP
![Page 27: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/27.jpg)
27
• Challenging real-time environment
• Noisy observation
• Exhibits strong credit assignment problem
• RL algorithms show expected behavior for our gym
• Achieve better performance than TCP New Vegas
• More robust algorithms required to learn good policy
• DDPG and PPO achieve near optimum
• REINFORCE fails to learn good policy
Results - Takeaways
![Page 28: A Data Center Emulator for Reinforcement Learningbestchai/papers/iroko-nips18-workshop-slide… · "Credit-scheduled delay-bounded congestion control for datacenters.“ SIGCOMM 2017](https://reader033.vdocuments.mx/reader033/viewer/2022043016/5f38cd62e809df26d61880cc/html5/thumbnails/28.jpg)
28
• Data center reinforcement learning is gaining traction
…but it is difficult to prototype and evaluate
• Iroko is
• a platform to experiment with RL for data centers
• intended to train on live traffic
• early stage work
• but experiments are promising
• available on Github:
https://github.com/dcgym/iroko
Contributions