Lightning Talks
1. Christina Delimitrou: “Tarcil: Reconciling Scheduling Speed and Quality in Large Shared Clusters”
2. Nic McDonald: “Service-Oriented Rate Control”
3. Collin Lee: “Infrastructure and Modules for Building Scalable Control Planes”
4. Omid Mashayekhi: “Nimbus: Running Fast, Distributed Computations with Execution Templates”
5. Stephen Yang: “Gaining Visibility and Insight into Distributed Systems”
6. Henry Qin: “Core-Aware Scheduling: Balancing Application Concurrency with Core Availability”
7. Sean Choi: “Customizing Open vSwitch using P4”
8. Stephen Ibanez: “High Speed Networks Need Proactive Congestion Control”
9. Samuel Grossman: “Grazelle: Hardware-Optimized In-Memory Graph Processing”
Tarcil: Reconciling Scheduling Speed and Quality in Large Shared Clusters
Christina Delimitrou¹, Daniel Sanchez², Christos Kozyrakis¹
¹Stanford University, ²MIT
❏ Problem: Disparity in cloud scheduling designs
  ❏ Centralized schedulers → High quality, low speed
  ❏ Sampling-based schedulers → High speed, low quality
❏ Tarcil: Key scheduling techniques to bridge this gap
  ❏ Account for resource preferences → High decision quality
  ❏ Analytical framework for sampling → Predictable performance
  ❏ Admission control → High quality, fast decisions at high load
  ❏ Distributed design → High scheduling throughput
❏ 50 ms average scheduling latency, > 95% of tasks meet QoS
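The slide does not spell out the analytical framework for sampling. As a rough illustration of why sampling can give predictable quality, the sketch below uses a textbook order-statistics fact rather than anything taken from the talk: if resource quality is uniform on [0, 1], the best of R independent samples falls below q with probability q**R. The function name `samples_needed` and the uniform-quality assumption are hypothetical.

```python
import math

def samples_needed(quality, confidence):
    """Smallest R such that the best of R uniform samples reaches
    `quality` with probability at least `confidence`.

    Assumes resource quality is uniform on [0, 1] (an illustrative
    assumption, not a claim about Tarcil's model).
    """
    # P(best of R samples < quality) = quality ** R, so we need
    # quality ** R <= 1 - confidence.
    return math.ceil(math.log(1.0 - confidence) / math.log(quality))

# Landing in the top 10% of resources with 95% probability:
print(samples_needed(0.9, 0.95))  # 29
```

Under this assumption the sample size depends only on the quality target and the confidence level, not on the cluster size, which is one way a sampling scheduler can stay fast as the cluster grows.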
High-Performance Service-Oriented Computing
Nic McDonald & Bill Dally
[Figure: example services: View, Edit, ADs, MemCache, UserDB, PostDB]
Service Oriented Access Control
[Figure: hosts H1-H5 running services S1-S3, with processes S1:P1, S1:P2, S2:P1, S3:P1, S3:P2, S3:P3 and domains S1:D1, S1:D2, S1:D3, S2:D1, S3:D1, S3:D2; legend: Service, Process, Domain, Host]
[Figure: block diagram with "To LAN" and "To CPU" ports: Processor Interconnect Controller, Network Access Controller, Security Logic, Hash Map Controller, Dynamic Memory Allocator, Memory System]
Service Oriented Access Control
146 TB vs. 5.33 GB
1.12 GB vs. 20.8 MB
Service Oriented Rate Control
[Figure: services View, Edit, ADs, MemCache, UserDB, PostDB, with rate labels 5 Gbps and 15 Gbps]
Service Oriented Rate Control
[Figure: Service A and Service B, drawn as groups of processes (Proc), labeled 15 Gbps]
Service Oriented Rate Control
[Figure: Service A and Service B, drawn as groups of processes (Proc), labeled 15 Gbps; token-bucket labels: r, b, tokens, messages; rate labels: r/2, r/2]
Disadvantages:
● high latency
● bandwidth waste
Service Oriented Rate Control
[Figure: Service A and Service B, drawn as groups of processes (Proc), labeled 15 Gbps; each of three processes labeled "r/3 || theft+"; token-bucket labels: r, b, tokens, messages]
Disadvantages:
● overhead for transient non-uniformity
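The "r/3 || theft+" labels suggest that each sender gets an equal share of the service rate and can take unused tokens from its peers. The sketch below is one possible reading, not the mechanism from the talk: it treats r and b as the usual token-bucket fill rate and depth, and the `send` helper with its steal-from-peers policy is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TokenBucket:
    rate: float          # tokens added per second (read here as r/n)
    depth: float         # maximum tokens held (read here as b)
    tokens: float = 0.0
    last: float = 0.0    # time of the last refill, in seconds

    def refill(self, now):
        # Accumulate tokens for the elapsed time, capped at the depth.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now

def send(buckets, i, amount, now):
    """Spend `amount` tokens from bucket i, stealing spare tokens
    from peer buckets when the local share falls short."""
    for bucket in buckets:
        bucket.refill(now)
    local = buckets[i]
    shortfall = amount - local.tokens
    if shortfall > 0:
        spare = sum(b.tokens for j, b in enumerate(buckets) if j != i)
        if spare < shortfall:
            return False             # the aggregate rate limit is exhausted
        for j, peer in enumerate(buckets):
            if j == i:
                continue
            if shortfall <= 0:
                break
            stolen = min(peer.tokens, shortfall)
            peer.tokens -= stolen
            local.tokens += stolen
            shortfall -= stolen
    local.tokens -= amount
    return True

# Aggregate rate r = 3 tokens/s split evenly across three senders.
buckets = [TokenBucket(rate=1.0, depth=10.0) for _ in range(3)]
print(send(buckets, 0, 6.0, now=4.0))  # True: 4 local tokens plus 2 stolen
print(send(buckets, 0, 7.0, now=4.0))  # False: only 6 tokens remain in total
```

In a real system each theft is a message exchange between senders, which is presumably where the overhead for transient non-uniformity noted on the slide comes from.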
Service Oriented Rate Control
Bandwidth: 112.5% overhead
99.9%ile Latency: 325 cycles, 310 cycles
Service Oriented Rate Control
Bandwidth: 0.5% overhead
99.9%ile Latency: 55 cycles, 32 cycles
Infrastructure and Modules for Building Scalable Control Planes
Collin Lee and John Ousterhout
Exploring Easier Distributed Systems Development
● Distributed systems are necessary but hard to build
○ Most of the difficulty is concentrated in “control planes”
○ Needed for scalability, availability, fault-tolerance, or fundamentally distributed applications
● Difficulty makes distributed systems “cost prohibitive” for some applications
○ The “best” solution is to use a distributed system, but it is too hard or time-consuming to build
● Can building distributed systems be as easy as building single-node applications?
● Are there interfaces, abstractions, modules, or infrastructures to hide the complexity of developing distributed systems?
Collecting Ideas and Industry Feedback
Possible Applications?
● Storage System Coordinators?
● Cluster/Resource Managers?
● Software Defined Networks?
● Drone Air Traffic Control?
● Self-Driving Car Fleet Coordination?
● <your_application_here>?
Known Challenges?
● Code duplication
○ normal operation
○ durability
○ recovery
● Control node scalability
● <your_frustrations_here>
Have ideas? Please come chat!
Nimbus: Running Fast, Distributed Computations with Execution Templates
Omid Mashayekhi, Hang Qu, Chinmayee Shah, Philip Levis
February 2016
● In-memory data analytics has become CPU-bound.
○ Optimizing applications in a lower-level language speeds tasks up.
○ Shorter tasks mean a higher task rate, which results in excessive runtime overhead.
[Figure labels: "Runtime Overhead ~ 19-32%"; "Almost entirely Runtime Overhead"]
● Current scheduling architectures have a limited task rate.
● The key insight behind Nimbus is that long-running CPU-bound applications are iterative in nature (e.g., ML algorithms, scientific computing).
● The scheduler can memoize and reuse computations as patterns recur.
● Execution Templates provide an abstraction for memoizing and reusing the computations and suppressing the command exchange by the scheduler.
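The memoization idea can be illustrated with a toy controller that counts scheduler commands. This is a minimal sketch of the concept only, not the Nimbus API: the `Controller` class, `run_block`, and the one-command-per-task cost model are all hypothetical.

```python
class Controller:
    """Toy scheduler that memoizes the task list of a named block."""

    def __init__(self):
        self.templates = {}   # block name -> memoized list of tasks
        self.messages = 0     # commands sent from the controller to workers

    def run_block(self, name, build_tasks):
        if name not in self.templates:
            # First execution: schedule every task with its own command
            # and remember the resulting task list as a template.
            tasks = build_tasks()
            self.templates[name] = tasks
            self.messages += len(tasks)
        else:
            # Repeat execution: one command instantiates the whole template.
            self.messages += 1
        return self.templates[name]

controller = Controller()
for _ in range(10):  # ten iterations of the same 100-task loop body
    controller.run_block("gradient_step", lambda: [f"task{i}" for i in range(100)])
print(controller.messages)  # 109, versus 1000 without memoization
```

The saving grows with the number of iterations, which fits the observation above that long-running CPU-bound jobs are iterative.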
● Nimbus achieves task rates as high as half a million tasks per second!
● HPC applications run within cloud frameworks with negligible overhead (3-11%)
● 20X speedup for ML benchmarks
Gaining Visibility and Insight into Distributed Systems
Stephen Yang and John Ousterhout
● Problem: Developing/debugging distributed systems is hard due to limited visibility
○ Logging can be inefficient
● Proposal: Build an interactive visualization tool to support development of such systems
[Figure: a Developer and a Distributed System connected through Visualization, Data, and Interaction]
Core-Aware Scheduling: Balancing Application Concurrency with Core Availability
Henry Qin & John Ousterhout
● Questing for high throughput in low-latency services
● Can we efficiently multiplex machines between latency-sensitive foreground tasks and CPU-intensive background tasks?
● Allocating cores in the kernel, scheduling threads at user level
● More detailed talk in the afternoon.
● Industry representatives please stop by the poster; I have many questions for you!
Customizing Open vSwitch using P4
Muhammad Shahbaz, Sean Choi, Ben Pfaff, Changhoon Kim, Nick Feamster, Jennifer Rexford, and Nick McKeown
Problem Statement
● Can we expedite the process of implementing new network protocols or features?
● Can we do this without incurring a huge performance cost?
Approach
● We present a P4-to-OvS compiler
○ System administrators can easily describe the changes as a P4 program
○ The P4-OvS switch aims to provide similar performance with significantly reduced complexity
[Figure: P4-OvS Compiler Specification]
[Figure: Throughput Results]
High Speed Networks Need Proactive Congestion Control
Lavanya Jose¹, Lisa Yan¹, Stephen Ibanez¹, Isaac Keslassy², George Varghese³, Sachin Katti¹, Mohammad Alizadeh⁴, and Nick McKeown¹
¹Stanford, ²Technion, ³Microsoft, ⁴MIT
Context
● Network speed: 10 → 100 Gb/s
● 1 MB / 100 Gb/s = 80 μs
Typical Flow Completion Times
10 Gb/s     70-80 RTTs
40 Gb/s     17-20 RTTs
100 Gb/s    7-8 RTTs
Reactive Congestion Control
[Figure: loop between "Measure Congestion" and "Adjust Flow Rate"]
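The 80 μs figure and the RTT counts can be checked with a few lines of arithmetic. The sketch below assumes a 1 MB flow (10^6 bytes) and a 10 μs round-trip time. The RTT is not stated on the slide and is chosen here because it reproduces the upper end of each range, so treat it as one plausible reading of the table.

```python
RTT_US = 10  # assumed round-trip time in microseconds (not from the slide)

def transfer_time_us(size_bytes, link_gbps):
    """Time to push size_bytes onto a link_gbps link, in microseconds."""
    # A 1 Gb/s link moves 1000 bits per microsecond.
    return size_bytes * 8 / (link_gbps * 1000)

for gbps in (10, 40, 100):
    t = transfer_time_us(1_000_000, gbps)
    print(f"{gbps:>3} Gb/s: {t:.0f} us = {t / RTT_US:.0f} RTTs")
#  10 Gb/s: 800 us = 80 RTTs
#  40 Gb/s: 200 us = 20 RTTs
# 100 Gb/s: 80 us = 8 RTTs
```

At 100 Gb/s a flow of this size finishes within a handful of round trips, which leaves a reactive measure-then-adjust loop very few iterations in which to converge.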
Problem
● Can we use explicit information (link capacities, traffic matrix) to find the optimal flow rate allocation more quickly?
Approach
● Distributed proactive congestion control
● Message passing between flows and links
Convergence Time    RCP (Reactive)    PERC (Proactive)
Median              14 RTTs           4 RTTs
Tail (99th %)       71 RTTs           10 RTTs
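The slides do not say which allocation counts as optimal. Assuming it is the max-min fair allocation, the target can be computed centrally from link capacities and flow paths with the classic water-filling procedure sketched below. This is not the distributed message-passing scheme from the talk, and `max_min_rates` and the example topology are hypothetical.

```python
def max_min_rates(capacity, flows):
    """Max-min fair rates by water-filling.

    capacity: {link: capacity in Gb/s}
    flows:    {flow: list of links on its path}, each path non-empty
    """
    remaining = dict(capacity)
    active = {f: set(path) for f, path in flows.items()}
    rates = {}
    while active:
        # Fair share each link could still offer its unfrozen flows.
        share = {}
        for link, cap in remaining.items():
            n = sum(1 for path in active.values() if link in path)
            if n:
                share[link] = cap / n
        bottleneck = min(share, key=share.get)
        rate = share[bottleneck]
        # Freeze every flow crossing the bottleneck at that rate and
        # subtract its rate from all links on its path.
        for f in [f for f, path in active.items() if bottleneck in path]:
            rates[f] = rate
            for link in active.pop(f):
                remaining[link] -= rate
    return rates

rates = max_min_rates(
    {"A": 10, "B": 4},
    {"f1": ["A"], "f2": ["A", "B"], "f3": ["B"]},
)
print(rates)  # {'f2': 2.0, 'f3': 2.0, 'f1': 8.0}
```

Link B is the first bottleneck, so f2 and f3 are frozen at 2 Gb/s each, and f1 then takes the 8 Gb/s left on link A.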
Using Programmable Data Planes
● Useful platform for deployment of distributed proactive congestion control schemes
Grazelle: Hardware-Optimized In-Memory Graph Processing
Samuel Grossman, Heiner Litz, and Christos Kozyrakis
Stanford University
Hardware Features
Feature                        X-Stream    Polymer    Grazelle
Vector Units                   ✘           ✘          ✔
Sequential Accesses            ✔           ✔          ✔
Software Prefetching           ✘           ✘          ✔
NUMA Awareness                 ✘           ✔          ✔
Caching Overheads              ✔           ~          ✔
Simultaneous Multithreading    ✘           ✘          ✔