Akshay Narayan, Frank Cangialosi, Deepti Raghavan, Prateesh Goyal, Srinivas Narayana, Radhika Mittal, Mohammad Alizadeh, Hari Balakrishnan Restructuring Endpoint Congestion Control

Post on 23-Jun-2020


Page 1: Restructuring Endpoint Congestion Control › ~sn624 › talks › ccp-iitm.pdfH-TCP Cubic FAST EBCC Binomial LEDBAT Veno BBR Remy PCC Sprout DCTCP Illinois NV Hybla TIMELY ABC NewReno

Akshay Narayan, Frank Cangialosi, Deepti Raghavan, Prateesh Goyal, Srinivas Narayana, Radhika Mittal, Mohammad Alizadeh, Hari Balakrishnan

Restructuring Endpoint Congestion Control

Page 2: Congestion control in UG networking

• Congestion: when traffic sources send too much data into a network
• Congestion control algorithms at sources react to network congestion and vary sending rate accordingly
• A fundamental problem with over 30 years of research

Page 3: Traditional congestion control

[Diagram: Applications; Operating System with TCP (the “datapath”); Network interface; Data passed between the layers]

Page 4: Trend #1: New datapaths

[Diagram: Application; Operating System with TCP; Network interface; Data passed between the layers]

• Kernel-bypass (e.g., DPDK)
• Smart NICs (FPGAs & SoCs)
• Libraries & libOSs (e.g., QUIC)
• Offload engines (RDMA, TOE)

• End of Moore’s law
• Apps with strict perf reqs
• Apps with unique needs

Page 5: Trend #2: New algorithms

[Timeline from 1987 to 2018, with 1998, 2001 and 2010 marked: Vegas, Westwood, Compound, PRR, BIC, H-TCP, Cubic, FAST, EBCC, Binomial, LEDBAT, Veno, BBR, PCC, Remy, Sprout, DCTCP, Illinois, NV, Hybla, TIMELY, ABC, NewReno, Copa, RC3, XCP, RCP, DCQCN, Tahoe, Reno, Vivace, Indigo, Nimbus]

• Increasing bandwidth
• Improving deployed algorithms
• Changing application requirements
• Different deployment scenarios
• Using new algorithmic tools

Page 6: Algorithm complexity

• Sprout (NSDI 2013): Bayesian forecasting
• Remy (SIGCOMM 2013): Offline learning
• PCC / PCC Vivace (NSDI 2015 / NSDI 2018): Online learning
• Indigo (Usenix ATC 2018): Reinforcement learning
• Nimbus (2018--): FFT/Signal processing

• Floating point operations
• User-level libraries
• Extensive experimentation/tuning

Page 7: Need cross-product of implementations!

[Vegas, Westwood, Compound, PRR, BIC, H-TCP, Cubic, FAST, EBCC, Binomial, LEDBAT, Veno, BBR, PCC, Remy, Sprout, DCTCP, Illinois, NV, Hybla, TIMELY, ABC, NewReno, Copa, RC3, XCP, RCP, DCQCN, Tahoe, Reno, Vivace, Indigo, Nimbus]

X

[mTCP, DPDK, QUIC, DCCP, userspace, SmartNIC, FPGA, RDMA, OS TCP]

Page 8: Need cross-product of implementations!

Page 9: Trend #3: New datapath capabilities

[Diagrams placing CC between the application and the network interface: Independent CC vs. Aggregate CC; Software CC vs. Offloaded CC]

Page 10:

Congestion control today is too tightly coupled to datapath, hard to implement, and hard to evolve.

Page 11: Congestion Control Plane (CCP)

Page 12: Traditional datapath design

[Diagram: Application; Datapath (datapath state, congestion control, TX, RX); Network interface]

Page 13: Congestion Control Plane (CCP) design

[Diagram: Application; Datapath (datapath state, CCP datapath component, TX, RX); Network interface; CCP agent]

CCP agent: separate userspace process (for example)

Page 14: Congestion control plane (CCP) design

[Diagram: Application; Datapath (datapath state, CCP datapath component, TX, RX); Network interface; CCP agent]

Page 15: Congestion control plane (CCP) design

[Diagram: Application; Datapath (datapath state, CCP datapath component, TX, RX); Network interface; CCP agent]

(1) Address space separation (user / kernel)

Page 16: Congestion control plane (CCP) design

[Diagram: Application; Datapath (datapath state, CCP datapath component, TX, RX); Network interface; CCP agent]

(1) Address space separation

(2) Separate mechanism (datapath dependent) from policy (datapath independent)

(3) Make datapath mechanisms flexible through a DSL

Page 17: Congestion control plane (CCP) design

[Diagram: Application; Datapath (datapath state, CCP datapath component, TX, RX); Network interface; CCP agent]

• Write once & run anywhere
• Sophisticated algorithms
• New CC capabilities without datapath modifications

Page 18: A possible usage of CCP: TCP’s AIMD

On every ACK: cwnd := cwnd + (# ACK’d pkts)/cwnd

On every loss: cwnd := cwnd / 2

[Plot: congestion window (cwnd) vs. time]

[Diagram: CCP agent and CCP datapath component. The agent holds the two rules above; the arrows between them are labeled “ACK, loss” and “cwnd”.]
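The AIMD rules on this slide can be sketched in a few lines of Python. This is only an illustration, not CCP's API: the function names, the `min_cwnd` floor, and the toy `simulate` loop (one ACK per packet in the window, a loss every `loss_every` RTTs) are assumptions made for the example.

```python
def on_ack(cwnd: float, acked_pkts: int) -> float:
    """Additive increase: cwnd := cwnd + (# ACKed pkts) / cwnd."""
    return cwnd + acked_pkts / cwnd


def on_loss(cwnd: float, min_cwnd: float = 1.0) -> float:
    """Multiplicative decrease: cwnd := cwnd / 2 (the floor is an assumption)."""
    return max(cwnd / 2.0, min_cwnd)


def simulate(rtts: int, loss_every: int, cwnd: float = 1.0) -> list:
    """Toy trace of cwnd (in packets) at the end of each RTT.

    Assumes one ACK per packet in the window and a single loss
    every `loss_every` RTTs; neither comes from the slides.
    """
    trace = []
    for rtt in range(1, rtts + 1):
        if rtt % loss_every == 0:
            cwnd = on_loss(cwnd)
        else:
            for _ in range(int(cwnd)):
                cwnd = on_ack(cwnd, 1)
        trace.append(cwnd)
    return trace


if __name__ == "__main__":
    for rtt, w in enumerate(simulate(20, 10, cwnd=10.0), start=1):
        print(rtt, round(w, 2))
```

Printing the trace shows the sawtooth from the plot: the window grows by a little under one packet per RTT and halves on each loss.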

Page 19: But what about performance?

Page 20: Performance concerns

• Increased latency and decreased throughput
  • Should every packet cross address-space boundaries?
• Reacting to stale information
  • We don’t want to delay an algorithm reacting to congestion
• Loss of fidelity
  • Is behavior within CCP agent == behavior within the datapath?

Page 21: (1) Throughput and Latency

[Diagram: Application; Datapath (datapath state, CCP datapath component, TX, RX); Network interface; CCP agent]

Agent can’t process every packet as it comes.

Communicate info for batches of packets. Increases tput when agent receives packets.

Page 22: (2) Reacting to stale network state

[Diagram: Application; Datapath (datapath state, CCP datapath component, TX, RX); Network interface; CCP agent]

Agent will incur a delay before receiving info of last packet.

Split the CC implementation.

Page 23: (2) Reacting to stale network state

[Diagram: Application; Datapath (datapath state, sync measurement & control, TX, RX); Network interface; async CC logic]

Agent will incur a delay before receiving info of last packet.

Split the CC implementation.

Page 24: (2) Reacting to stale network state

[Diagram: Application; Datapath (datapath state, sync measurement & control, TX, RX); Network interface; async CC logic]

Slow logic in the async CCP agent:
On every ACK: cwnd := cwnd + (# ACK’d pkts)/cwnd

Fast logic in the sync datapath component:
On every loss: cwnd := cwnd / 2
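A minimal sketch of this split, assuming a toy in-process model rather than CCP's real interfaces: the hypothetical `DatapathComponent` halves the window immediately on loss and only counts ACKs, while the hypothetical `Agent` applies the additive increase later, once per report. Applying the increase to a whole batch at once is an approximation of the per-ACK rule.

```python
class DatapathComponent:
    """Synchronous side: runs on every packet event (hypothetical model)."""

    def __init__(self, cwnd: float):
        self.cwnd = cwnd
        self.acked_since_report = 0

    def on_ack(self, acked_pkts: int) -> None:
        # Measurement only: remember how much was ACKed for the next report.
        self.acked_since_report += acked_pkts

    def on_loss(self) -> None:
        # Fast logic: react to loss right away, without waiting for the agent.
        self.cwnd = max(self.cwnd / 2.0, 1.0)

    def make_report(self) -> int:
        # Hand the batched measurement to the agent and start a new batch.
        acked, self.acked_since_report = self.acked_since_report, 0
        return acked


class Agent:
    """Asynchronous side: runs once per report (hypothetical model)."""

    def on_report(self, datapath: DatapathComponent, acked_pkts: int) -> None:
        # Slow logic: additive increase applied to the whole batch at once.
        datapath.cwnd += acked_pkts / datapath.cwnd


if __name__ == "__main__":
    dp, agent = DatapathComponent(cwnd=10.0), Agent()
    dp.on_ack(5)
    dp.on_ack(5)
    dp.on_loss()                            # handled in the datapath
    agent.on_report(dp, dp.make_report())   # handled later, in the agent
    print(dp.cwnd)
```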

Page 25: (3) Fidelity relative to datapath-only CC

[Diagram: Application; Datapath (datapath state, CCP datapath component, TX, RX); Network interface; CCP agent]

Digital control theory: working with samples is “as good” as working with the regular signal if sampled at least once every loop delay.

Communicating once (or a few times) per RTT is enough.

Natural time scale for CC: RTT.
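To get a rough feel for what per-RTT communication saves, the sketch below compares events per second if every packet crossed to the agent against one report per RTT. It uses two link settings that appear on the performance slides; the 1500-byte packet size and the one-event-per-packet model are assumptions, not figures from the talk.

```python
def events_per_second(link_bps: float, rtt_s: float, pkt_bytes: int = 1500):
    """Return (per-packet events/s, per-RTT reports/s) for a saturated link.

    Assumes full-sized packets and one datapath event per packet.
    """
    per_packet = link_bps / (pkt_bytes * 8)
    per_rtt = 1.0 / rtt_s
    return per_packet, per_rtt


if __name__ == "__main__":
    links = [
        ("96 Mbit/s, 20 ms RTT", 96e6, 20e-3),
        ("10 Gbit/s, 100 us RTT", 10e9, 100e-6),
    ]
    for name, bps, rtt in links:
        pkts, reports = events_per_second(bps, rtt)
        print(f"{name}: {pkts:,.0f} packets/s vs {reports:,.0f} reports/s "
              f"(about {pkts / reports:,.0f}x fewer crossings)")
```

Under these assumptions the reduction is roughly 160x on the slower link and roughly 83x on the faster one.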

Page 26: What, when & how should CCP agent and datapath communicate?

Page 27: Split congestion control

[Diagram: Application; Datapath (datapath state, CCP datapath component, TX, RX); Network interface; CCP agent. Labels between agent and datapath: “cwnd/rate”, “statistics”, “Standardized datapath interface”]

CCP datapath exposes standard primitives to agent.

Control: set cwnd, rate

Measurement: measure & summarize over time using a DSL

Page 28: CCP datapath DSL

• Measurement timestamp
• In-order acked bytes
• Out-of-order acked bytes
• ECN-marked bytes
• Lost bytes
• Timeout occurred
• RTT sample
• Bytes in flight
• Outgoing rate
• Incoming rate

Summarize primitive measurements using fold functions over packets within the CCP datapath.

Write custom report triggers within the datapath to invoke CCP agent.

Implement callback functions within the CCP agent to respond to those triggers.
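The three pieces above (fold function, report trigger, agent callback) can be sketched in Python as follows. Every name here (`PktInfo`, `Report`, `fold`, `should_report`, `run_datapath`) is hypothetical, and the trigger condition (any loss, or a fixed batch of packets) is an assumption chosen for illustration; the real datapath programs are written in CCP's own DSL.

```python
from dataclasses import dataclass


@dataclass
class PktInfo:
    """Hypothetical per-packet primitive measurements."""
    bytes_acked: int
    lost_bytes: int
    rtt_us: int


@dataclass
class Report:
    """Summary state that the fold function maintains between reports."""
    acked: int = 0
    lost: int = 0
    min_rtt_us: int = 2**31 - 1


def fold(report: Report, pkt: PktInfo) -> Report:
    """Fold one packet's measurements into the running summary."""
    report.acked += pkt.bytes_acked
    report.lost += pkt.lost_bytes
    report.min_rtt_us = min(report.min_rtt_us, pkt.rtt_us)
    return report


def should_report(report: Report, pkts_seen: int, batch: int) -> bool:
    """Report trigger: fire at once on loss, otherwise after `batch` packets."""
    return report.lost > 0 or pkts_seen >= batch


def run_datapath(pkts, on_report, batch: int = 4) -> None:
    """Datapath loop: fold every packet, invoke the agent only on triggers."""
    report, seen = Report(), 0
    for pkt in pkts:
        report = fold(report, pkt)
        seen += 1
        if should_report(report, seen, batch):
            on_report(report)            # agent callback
            report, seen = Report(), 0   # start a fresh summary


if __name__ == "__main__":
    pkts = [PktInfo(1000, 0, 100)] * 4 + [PktInfo(0, 1000, 120), PktInfo(1000, 0, 90)]
    run_datapath(pkts, on_report=print)
```

In this toy run the agent is invoked twice for six packets: once for the batch of four ACKs and once immediately for the loss.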

Page 29: Under the hood…

[Diagram: Application; Datapath (datapath state, CCP datapath made up of a datapath shim layer and libccp, TX, RX); Network interface; CCP agent. Labels between agent and datapath: “cwnd/rate”, “stats”]

Shim layer exposes datapath variables. It is datapath-specific.

libccp implements a simple register-based machine and runs datapath CC programs on each packet. It is datapath-agnostic.

Page 30: CCP Performance

Page 31: TCP CUBIC window dynamics

[Plot. Link: 96 Mbit/s, 20 ms RTT]

Page 32: Write once, run anywhere

[Plots: Copa and Cubic on three datapaths (Kernel, QUIC, mTCP). Links: 24 Mbit/s, 20 ms RTT and 12 Mbit/s, 20 ms RTT]

Page 33: CPU utilization while saturating 10 Gbit/s

Link: 10 Gbit/s, 100 µs RTT

5% on one core

Page 34: Low-RTT: Incast w/ 50 senders (simulation)

Link: 10 Gbit/s, 10 µs

[Plot annotation: 1.15]

Page 35: Slow start: Partitioning

Link: 48 Mbit/s, 100 ms RTT

Page 36: TCP BBR: Partitioning

Asynchronous (per RTT)

CCP agent: asynchronous, on each report (one per 8 RTTs)
▸ Calculate new rate based on measurements
▸ Handle switching between modes

Synchronous (per packet)

Datapath program: synchronous, per ACK
▸ Pulse: rate = 1.25 x bottleneck rate
▸ After 1 RTT: rate = 0.75 x bottleneck rate
▸ After 2 RTT: rate = bottleneck rate
▸ After 8 RTT: repeat
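The per-ACK pulse schedule of the datapath program can be written as a small function of the time since the last pulse. This is a sketch of the schedule as listed on the slide only; the function name and the idea of passing elapsed time in units of RTTs are assumptions, and the agent-side rate estimation and mode switching are not modeled.

```python
def pulse_rate(bottleneck_rate: float, rtts_since_pulse: float) -> float:
    """Sending rate for the 8-RTT pulse cycle listed on the slide.

    RTT 0 to 1: 1.25 x bottleneck rate (pulse)
    RTT 1 to 2: 0.75 x bottleneck rate
    RTT 2 to 8: 1.00 x bottleneck rate, then the cycle repeats
    """
    phase = rtts_since_pulse % 8
    if phase < 1:
        gain = 1.25
    elif phase < 2:
        gain = 0.75
    else:
        gain = 1.0
    return gain * bottleneck_rate


if __name__ == "__main__":
    for rtt in range(10):
        print(rtt, pulse_rate(100.0, rtt))
```

Because the schedule depends only on elapsed time and the agent's last bottleneck estimate, the datapath can apply it on each ACK without contacting the agent, which only needs to act once per cycle.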

Page 37: Capabilities enabled

• Sophisticated algorithms
• Faster prototyping
• CC for flow aggregates
• Application-integrated CC
• Dynamic, path-specific CC

Page 38: Looking forward

Next steps
• More algorithms
• Hardware datapaths
• Other new capabilities based on the CCP platform

Current status
• Datapaths:
  • Linux TCP
  • QUIC
  • mTCP/DPDK
• CCP agent (Rust/Python)

https://[email protected]

[email protected]

Page 39:

Page 40:

eBPF

Front-End (Language)
▸ Event-driven semantics
▸ Explicit reporting model

Back-End (Datapath)
▸ Congestion control enforcement
▸ Direct access to socket state

(def (Report (acked 0)))
(when true
    (:= Report.acked (+ Report.acked Ack.bytes_acked))
    (:= Cwnd (+ Cwnd Report.acked))
    (fallthrough))
(when (> Flow.lost_pkts_sample 0)
    (report))
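One literal reading of the datapath program above, written as plain Python. The semantics are assumptions made for this sketch rather than a specification: `when` clauses are taken to run in order on every ACK, `fallthrough` to mean "keep evaluating later clauses", and `report` to send the Report state to the agent and reset it to its declared default. `FlowState` and `on_ack` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class FlowState:
    cwnd: int                  # Cwnd, in bytes
    report_acked: int = 0      # Report.acked, declared with default 0
    reports: list = field(default_factory=list)  # what the agent would receive


def on_ack(state: FlowState, bytes_acked: int, lost_pkts_sample: int) -> None:
    # (when true ...): this clause matches on every ACK.
    state.report_acked += bytes_acked     # (:= Report.acked (+ Report.acked Ack.bytes_acked))
    state.cwnd += state.report_acked      # (:= Cwnd (+ Cwnd Report.acked))
    # (fallthrough): continue to the next clause instead of stopping here.

    # (when (> Flow.lost_pkts_sample 0) (report))
    if lost_pkts_sample > 0:
        state.reports.append(state.report_acked)
        state.report_acked = 0            # assumed reset after a report


if __name__ == "__main__":
    s = FlowState(cwnd=10_000)
    on_ack(s, 1000, 0)
    on_ack(s, 1000, 0)
    on_ack(s, 500, 1)
    print(s.cwnd, s.reports)
```

Read this way, the window is adjusted on every ACK inside the datapath, and the agent is contacted only when a loss sample is seen.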

Page 41: How to use CCP to build CC algos?

[Diagram: Application; Datapath (datapath state, CCP datapath, TX, RX); Network interface; CCP agent. Labels between agent and datapath: “cwnd/rate”, “stats”]

CCP agent: Asynchronous event-driven CC program.

Callbacks roughly every RTT or on program-specified conditions

Page 42: Computer Science @

• csrankings.org
  • #9 in algorithms & complexity
  • #15 in AI
• Growing presence in systems and networking. Last 5 years:
  • Best paper awards at PLDI, SIGCOMM, FAST, ICSE
  • IEEE MICRO top picks papers
  • ACM SIGPLAN doctoral dissertation award
  • Incoming faculty: best papers @ PLDI, Usenix Security, NDSS
• Apply to Rutgers CS for graduate school and postdocs