Transcript
Page 1: Accelerating Networked Applications with Flexible Packet … · with Flexible Packet Processing Antoine Kaufmann, Naveen Kr. Sharma, Thomas Anderson, Arvind Krishnamurthy Timothy

1©2017 Open-NFP

Accelerating Networked Applications with Flexible Packet Processing

Antoine Kaufmann, Naveen Kr. Sharma, Thomas Anderson, Arvind Krishnamurthy

Timothy Stamler, Simon Peter

University of Washington; The University of Texas at Austin

Networks are becoming faster

[Figure: Ethernet bandwidth (bits/s; axis ticks 100M, 1G, 10G, 100G, 1T) by year of standard release, 1990 to 2020, for 100MbE, 1GbE, 10GbE, 40GbE, 100GbE, and 400GbE]

5 ns inter-arrival time for 64 B packets at 100 Gbps
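The 5 ns figure follows directly from frame size and link rate. A small Python helper reproduces it; the optional per-frame overhead parameter is our addition (Ethernet's preamble, start delimiter, and minimum inter-frame gap add 20 B on the wire), not something the slide accounts for.

```python
def inter_arrival_ns(frame_bytes: int, link_gbps: float, overhead_bytes: int = 0) -> float:
    """Time between back-to-back frames on a saturated link, in nanoseconds.

    1 Gb/s is exactly 1 bit per nanosecond, so bits / (Gb/s) yields ns.
    overhead_bytes models wire overhead outside the frame itself,
    e.g. 20 B of Ethernet preamble, start delimiter, and inter-frame gap.
    """
    return (frame_bytes + overhead_bytes) * 8 / link_gbps


print(inter_arrival_ns(64, 100))      # 5.12 (the slide's ~5 ns)
print(inter_arrival_ns(64, 100, 20))  # 6.72 (counting preamble + inter-frame gap)
```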

...but software packet processing is slow

Recv+send TCP stack processing time (2.2 GHz)
▪ Linux: 3.5 µs
▪ Kernel bypass: ~1 µs

Single-core performance has stalled. Parallelize? Assuming 1 µs of processing per packet at 100 Gb/s, and ignoring Amdahl's law:
▪ 64 B packets => 200 cores
▪ 1 KB packets => 14 cores
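The core counts come from dividing the per-packet CPU time by the packet inter-arrival time. A minimal sketch of that arithmetic, assuming 1 µs of processing per packet, perfect parallel scaling, and no framing overhead:

```python
def cores_needed(packet_bytes: int, link_gbps: float, per_packet_us: float) -> float:
    """Cores required to keep up with line rate, assuming perfect parallel scaling."""
    arrival_ns = packet_bytes * 8 / link_gbps    # one packet arrives every arrival_ns
    return per_packet_us * 1000 / arrival_ns     # CPU time per packet / arrival gap


for size in (64, 1024):
    print(size, round(cores_needed(size, 100, 1.0), 1))
```

This gives about 195 cores for 64 B packets, in line with the slide's ~200, and about 12 cores for 1 KB packets, somewhat below the slide's 14, so the slide's figure likely rests on slightly different assumptions.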

Many cloud apps are dominated by packet processing
▪ Key-value storage, real-time analytics, intrusion detection, file service, ...
▪ All rely on small messages: latency & throughput equally important

What are the alternatives?

RDMA
▪ Bypasses server software entirely
▪ Not well matched to client/server processing (security, two-sided for RPC)

Full application offload to NIC (FPGA, etc.)
▪ Application now evolves at slower hardware-development speed
▪ Difficult to change once deployed

Fixed-function offloads (segmentation, checksums, RSS)
▪ Good start!
▪ Too rigid for today's complex server & network architecture (next slide)

Flexible function offload to NIC (NFP, FlexNIC, etc.)
▪ Break down functions (e.g., RSS) and provide an API for software flexibility

Fixed-function offloads are not well integrated

Wasted CPU cycles
▪ Packet parsing and validation repeated in software
▪ Packet formatted for network, not software access
▪ Multiplexing, filtering repeated in software

Poor cache locality, extra synchronization
▪ NIC steers packets to cores by connection
▪ Application locality may not match connection

A more flexible NIC can help

With multi-core, NIC needs to pick destination core
▪ The "right" core is application specific

NIC is perfectly situated: it sees all traffic
▪ Can scalably preprocess packets according to software needs
▪ Can scalably forward packets among host CPUs and network

With kernel bypass, only the NIC can enforce OS policy
▪ Need flexible NIC mechanisms, or go back into kernel

Talk Outline

• Motivation
• FlexNIC model
• Experience with Agilio-CX as prototyping platform
• Accelerating packet-oriented networking (UDP, DCCP)
  ▪ Key-value store
  ▪ Real-time analytics
  ▪ Network Intrusion Detection
• WiP: Accelerating stream-oriented networking (TCP)

FLEXNIC MODEL

FlexNIC: A Model for Integrated NIC/SW Processing [ASPLOS’16]

• Implementable at Tbps line rate & low cost

Match+action pipeline:

[Figure: Packet → Parser → Extracted header fields → M+A Stage 1 → M+A Stage 2 → ... → Modified fields; each M+A stage consists of a match table and an action ALU]
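To make the match+action model concrete, here is a toy software model of the pipeline in Python (the prototype itself is written in P4, not this code). The packet layout, field names, and port 11211 are hypothetical; what the sketch shows is the structure: a parser extracts header fields once, then a fixed sequence of stages each looks up one field in a match table and runs a short, loop-free action on the fields.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Fields = Dict[str, int]
Action = Callable[[Fields], None]


@dataclass
class Stage:
    """One M+A stage: an exact-match table on a single header field."""
    match_field: str
    table: Dict[int, Action]  # match value -> action

    def apply(self, fields: Fields) -> None:
        action = self.table.get(fields.get(self.match_field, -1))
        if action is not None:
            action(fields)  # the "action ALU": a short, loop-free update of fields


def parse(packet: bytes) -> Fields:
    """Toy parser (assumption): bytes 0-1 = UDP destination port, bytes 2-5 = key."""
    return {
        "udp.dport": int.from_bytes(packet[0:2], "big"),
        "key": int.from_bytes(packet[2:6], "big"),
    }


def run_pipeline(packet: bytes, stages: List[Stage]) -> Fields:
    fields = parse(packet)   # extracted header fields
    for stage in stages:     # fixed number of stages, each visited once
        stage.apply(fields)
    return fields            # modified fields


KVS_PORT = 11211  # hypothetical port for the key-value service


def steer_to_core(f: Fields) -> None:
    f["core"] = f["key"] % 4  # 4 cores assumed


pipeline = [Stage("udp.dport", {KVS_PORT: steer_to_core})]
pkt = KVS_PORT.to_bytes(2, "big") + (42).to_bytes(4, "big")
print(run_pipeline(pkt, pipeline)["core"])  # 2
```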

Match+Action Programs

Example:
Match: IF udp.port == kvs
Action: core = HASH(kvs.key) % ncores
        DMA hash, kvs TO Cores[core]

Supports:
▪ Steer packet
▪ Calculate hash/checksum
▪ Initiate DMA operations
▪ Trigger reply packet
▪ Modify packets

Does not support:
▪ Loops
▪ Complex calculations
▪ Keeping large state
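The example program can be read as a single match+action rule. A minimal Python sketch, assuming a hypothetical `KVS_PORT` constant for `kvs`, CRC32 (`zlib.crc32`) as a stand-in for HASH, and a hypothetical `DmaDescriptor` record for the DMA action:

```python
import zlib
from dataclasses import dataclass
from typing import Optional

KVS_PORT = 11211  # hypothetical UDP port of the key-value service


@dataclass
class DmaDescriptor:
    core: int       # destination core's queue
    key_hash: int   # hash computed on the NIC, passed along so software need not recompute it
    payload: bytes  # the kvs request


def kvs_rule(udp_dport: int, key: bytes, payload: bytes, ncores: int) -> Optional[DmaDescriptor]:
    # Match: IF udp.port == kvs
    if udp_dport != KVS_PORT:
        return None  # no match: the packet takes the default path
    # Action: core = HASH(kvs.key) % ncores   (CRC32 stands in for HASH)
    h = zlib.crc32(key)
    # Action: DMA hash, kvs TO Cores[core]
    return DmaDescriptor(core=h % ncores, key_hash=h, payload=payload)
```

Because the core is a pure function of the key, every request for a given key reaches the same core no matter which client sent it, and the action uses only supported operations (a hash, a modulo, a DMA) with no loops or large state.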

FlexNIC: M+A for NICs

Efficient application-level processing in the NIC
▪ Improve locality by steering to cores based on app criteria
▪ Transform packets for efficient processing in SW
▪ DMA directly into and out of application data structures
▪ Send acknowledgements on NIC

[Figure: FlexNIC architecture with an ingress pipeline, a DMA pipeline, an egress pipeline, and queues]

Netronome Agilio-CX

We use Agilio-CX to prototype FlexNIC
• Implement M&A programs in P4
• Run on NIC

Our experience with Agilio-CX:
▪ Improve locality by steering to cores based on app criteria
▪ Transform packets for efficient processing in SW
▪ DMA directly into and out of application data structures
▪ Send acknowledgements on NIC

ACCELERATING PACKET-ORIENTED NETWORKING

Example: Key-Value Store

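[Figure: The NIC steers requests from Client 1 (K = 3, 4), Client 2 (K = 4, 7), and Client 3 (K = 7, 8) to Core 1 and Core 2, which share one hash table; keys 4 and 7 are accessed from both cores]

Receive-side scaling: core = hash(connection) % N

• Lock contention
• Poor cache utilization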
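Why connection-based steering causes sharing can be checked in a few lines of Python. The sketch assumes the connection-to-core assignment in `core_of_conn`, a hypothetical outcome of hash(connection) % 2 chosen to match the figure; it collects the keys each core touches and reports those touched by more than one core, which are the ones that need locking and move between caches.

```python
from typing import Dict, List, Set


def keys_per_core(requests: Dict[str, List[int]], core_of_conn: Dict[str, int]) -> Dict[int, Set[int]]:
    """Keys each core touches when the NIC steers whole connections."""
    per_core: Dict[int, Set[int]] = {}
    for conn, keys in requests.items():
        per_core.setdefault(core_of_conn[conn], set()).update(keys)
    return per_core


def shared_keys(per_core: Dict[int, Set[int]]) -> Set[int]:
    """Keys that appear on more than one core."""
    seen: Set[int] = set()
    shared: Set[int] = set()
    for keys in per_core.values():
        shared |= seen & keys
        seen |= keys
    return shared


requests = {"client1": [3, 4], "client2": [4, 7], "client3": [7, 8]}
core_of_conn = {"client1": 0, "client2": 1, "client3": 0}  # hypothetical RSS result
print(shared_keys(keys_per_core(requests, core_of_conn)))  # {4, 7}
```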

Key-based Steering

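[Figure: The NIC steers requests from Client 1 (K = 3, 4), Client 2 (K = 4, 7), and Client 3 (K = 7, 8) by key, so each of keys 3, 4, 7, 8 is handled by exactly one of Core 1 and Core 2]

Match: IF udp.port == kvs
Action: core = HASH(kvs.key) % N
        DMA hash, kvs TO Cores[core]

• No locks needed
• Higher cache utilization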
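With key-based steering the same requests are split by key instead of by connection. In this sketch an identity hash (`key % ncores`) stands in for HASH, so the concrete core numbers are illustrative only; the point is that the per-core key sets come out disjoint, so each core can own its part of the hash table without locks.

```python
from typing import Dict, List, Set


def steer_by_key(requests: Dict[str, List[int]], ncores: int) -> Dict[int, Set[int]]:
    """Keys each core handles when every request is steered by its key."""
    per_core: Dict[int, Set[int]] = {c: set() for c in range(ncores)}
    for keys in requests.values():
        for k in keys:                   # each request is steered on its own
            per_core[k % ncores].add(k)  # identity hash stands in for HASH()
    return per_core


requests = {"client1": [3, 4], "client2": [4, 7], "client3": [7, 8]}
per_core = steer_by_key(requests, 2)
print(per_core)                             # {0: {8, 4}, 1: {3, 7}} (set order may vary)
print(per_core[0].isdisjoint(per_core[1]))  # True: no key lives on two cores
```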

Custom DMA

DMA to application-level data structures
Requires packet validation and transformation

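[Figure: an item log holding Item 1 and Item 2, and an event queue containing GET (G) and SET (S) entries]

Event queue entries:
▪ GET, ClientID, Hash, Key
▪ SET, ClientID, ItemPointer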
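Custom DMA means the NIC writes compact, pre-validated entries that software can consume at fixed offsets. The byte layout below is entirely hypothetical (1-byte opcode, 4-byte client ID and hash, 32-byte key to match the evaluation's 32 B keys, 8-byte item pointer into the item log); it only illustrates what the GET and SET event-queue entries could look like, using Python's `struct` module.

```python
import struct

# Hypothetical fixed layouts: little-endian, no padding.
GET_FMT = struct.Struct("<BII32s")  # opcode, client_id, key_hash, key   (41 bytes)
SET_FMT = struct.Struct("<BIQ")     # opcode, client_id, item_ptr        (13 bytes)
OP_GET, OP_SET = 0, 1


def pack_get(client_id: int, key_hash: int, key: bytes) -> bytes:
    return GET_FMT.pack(OP_GET, client_id, key_hash, key)  # key is null-padded to 32 B


def pack_set(client_id: int, item_ptr: int) -> bytes:
    return SET_FMT.pack(OP_SET, client_id, item_ptr)       # item_ptr: offset into item log


def unpack_event(buf: bytes) -> tuple:
    """Software side: dispatch on the opcode byte, no header parsing needed."""
    if buf[0] == OP_GET:
        _, cid, h, key = GET_FMT.unpack(buf)
        return ("GET", cid, h, key)
    _, cid, ptr = SET_FMT.unpack(buf)
    return ("SET", cid, ptr)
```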

Evaluation of the Model

• Measure impact on application performance
• Key-based steering: use NIC
• Custom DMA: software emulation of M&A pipeline
• Workload: 100k 32 B keys, 64 B values, 90% GET
• 6-core Sandy Bridge Xeon 2.2 GHz, 2x10G links

Key-based steering

• Better scalability
  ▪ PCIe is the bottleneck for 4+ cores
• 45% higher throughput
• Processing time reduced to 310 ns

[Figure: Throughput (m op/s, 0 to 8) vs. number of CPU cores (1 to 5) for FlexKVS/RSS, FlexKVS/Key, FlexKVS/Linux, and Memcached]

Custom DMA reduces time to 200 ns

Real-time Analytics System

(De-)Multiplexing threads are the performance bottleneck
• 2 CPUs required for 10 Gb/s => 20 CPUs for 100 Gb/s

[Figure: NIC with Rx and Tx queues; in software, Demux, ACK generation, and Mux run alongside two Count and two Rank workers]

Real-time Analytics System

Offload (de)multiplexing and ACK generation to FlexNIC
• No CPUs needed => energy efficiency

[Figure: the same pipeline, with Demux, ACK generation, and Mux handled on the NIC and only the Count and Rank workers in software]
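A minimal sketch of the offloaded Demux and ACK steps, written as ordinary Python for readability. The `StreamTuple` fields (`src`, `seq`, `worker`) and the ACK tuple format are hypothetical, not FlexStorm's actual wire format; the sketch only shows that each incoming tuple is placed directly on its worker's queue and acknowledged without a CPU thread in between.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Tuple


@dataclass
class StreamTuple:
    src: int        # sending machine (hypothetical field)
    seq: int        # per-sender tuple sequence number (hypothetical field)
    worker: int     # destination worker, e.g. a Count or Rank instance
    payload: bytes


Ack = Tuple[str, int, int]


def nic_demux(rx: Deque[StreamTuple], workers: Dict[int, Deque[StreamTuple]], tx: Deque[Ack]) -> None:
    """Drain the Rx queue: put each tuple on its worker's queue and emit an ACK."""
    while rx:
        t = rx.popleft()
        workers[t.worker].append(t)       # demultiplex straight into the worker queue
        tx.append(("ACK", t.src, t.seq))  # ACK generated without a software thread
```

In this sketch an unknown worker ID raises `KeyError`; a real offload would have to drop or redirect such tuples.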

Performance Evaluation

[Figure: Throughput (m tuples/s, 0 to 6) for the Balanced and Grouped workloads, comparing Apache Storm, FlexStorm/Linux, FlexStorm/Bypass, and FlexStorm/FlexNIC; bar labels: .5x, 1x, 2x, .3x, 1x, 2.5x]

• Cluster of 3 machines
• Determine top-n Twitter posters (real trace)
• Measure attainable throughput

Network Intrusion Detection

Snort sniffs packets and analyzes them
• Parallelized by running multiple instances
• Status quo: receive-side scaling

FlexNIC:
• Analyze rules loaded into Snort
• Partition rules among cores to maximize caching
• Fine-grained steering to cores

Result: 1.6x higher throughput, 30% fewer cache misses
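The slides do not say how Snort's rules are partitioned. One plausible scheme, sketched here purely as an assumption: group rules by destination port, assign port groups to cores with a greedy least-loaded heuristic, and steer packets for known ports to the core holding their rules, falling back to RSS-style hashing for everything else.

```python
from collections import Counter
from typing import Dict, List


def partition_rules(rule_ports: List[int], ncores: int) -> Dict[int, int]:
    """Greedy assignment: biggest port groups first, each to the least-loaded core."""
    counts = Counter(rule_ports)  # number of rules per destination port
    load = [0] * ncores
    port_to_core: Dict[int, int] = {}
    for port, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        core = min(range(ncores), key=lambda c: (load[c], c))
        port_to_core[port] = core
        load[core] += n
    return port_to_core


def steer(dst_port: int, port_to_core: Dict[int, int], flow_hash: int, ncores: int) -> int:
    # Known ports go to the core whose cache holds their rules;
    # everything else falls back to hashing, as with plain RSS.
    return port_to_core.get(dst_port, flow_hash % ncores)


rules = [80] * 5 + [443] * 3 + [53] * 2 + [25]  # hypothetical rule set
mapping = partition_rules(rules, 2)
print(mapping)  # {80: 0, 443: 1, 53: 1, 25: 0}
```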

ACCELERATING STREAM-ORIENTED NETWORKING

Ongoing work: Stream protocols

Full TCP processing is too complex for M&A processing
▪ Significant connection state required
▪ Tricky edge cases: reordering, drops
▪ Complicated algorithms for congestion control

But the common case is simpler and can be offloaded
▪ Reduces the critical path in software

Opportunity: enforce the correct protocol on an untrusted app
▪ Focus: congestion control

FlexTCP ideas

Safety-critical & common processing on NIC
▪ Includes filtering, validating ACKs, enforcing rate limits

Handle all non-common cases in software
▪ E.g., packet drops, re-ordering, timeouts, …
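The split between NIC and software can be pictured as a per-segment check. This sketch is a simplification of our own: it ignores 32-bit sequence wraparound, SYN/FIN, and duplicate-ACK counting, and the `FlowState` fields are hypothetical. Segments that carry a valid ACK and in-order data stay on the fast path; anything else is handed to software.

```python
from dataclasses import dataclass


@dataclass
class FlowState:
    rx_next_seq: int  # next in-order sequence number we expect to receive
    tx_una: int       # oldest sequence number we sent that is not yet acknowledged
    tx_next: int      # next sequence number we will send


def classify_segment(st: FlowState, seq: int, ack: int, payload_len: int) -> str:
    """Return 'nic' for the common case, 'software' otherwise (hypothetical policy)."""
    # Validate the ACK: it may only acknowledge data we have actually sent.
    if not (st.tx_una <= ack <= st.tx_next):
        return "software"
    # Out-of-order data (a gap from drops or reordering) goes to software.
    if payload_len > 0 and seq != st.rx_next_seq:
        return "software"
    # Common case: update per-flow state on the NIC.
    st.tx_una = ack
    st.rx_next_seq += payload_len
    return "nic"
```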

Requires small per-flow state
▪ 64 bytes (SEQ/ACK, queues, rate-limit, …)
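To see that 64 bytes is plausible, here is one hypothetical packing of per-flow state with Python's `struct` module: connection identifiers, SEQ/ACK numbers, receive/transmit queue pointers, rate-limit state, and ECN counters. The field choice and widths are assumptions, not FlexTCP's actual layout.

```python
import struct
from collections import namedtuple

FlowRecord = namedtuple(
    "FlowRecord",
    "local_port remote_port remote_ip "            # connection identification
    "rx_next_seq tx_next_seq tx_una rx_window "    # SEQ/ACK tracking
    "rx_buf_base tx_buf_base rx_buf_head tx_buf_head "  # host queues
    "rate_limit_kbps tokens "                      # rate limiter
    "ecn_marked pkts_total",                       # congestion feedback counters
)

# Little-endian, no padding: 2+2+4 + 4*4 + 8*2 + 4*6 = 64 bytes.
FLOW_FMT = struct.Struct("<HHI" "IIII" "QQ" "II" "II" "II")


def pack_flow(rec: FlowRecord) -> bytes:
    return FLOW_FMT.pack(*rec)


def unpack_flow(buf: bytes) -> FlowRecord:
    return FlowRecord._make(FLOW_FMT.unpack(buf))


print(FLOW_FMT.size)  # 64
```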

FlexTCP overview

Flexible congestion control offload
NIC enforces per-flow rate limits set by trusted kernel
▪ Flexibility to choose congestion control

Example: DCTCP
Common-case processing on NIC
▪ Echo ECN marks in generated ACKs
▪ Track fraction of ECN-marked packets per flow

Kernel implements control policy (DCTCP)
▪ Use NIC-reported fraction of packets that are ECN-marked
▪ Adapt rate limit according to DCTCP protocol

Result: Indistinguishable from pure software implementations
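The division of labor for DCTCP can be sketched as two pieces: counters the NIC updates for every packet, and a control step the kernel runs periodically. The estimator alpha <- (1 - g) * alpha + g * F and the cut by a factor of (1 - alpha/2) come from DCTCP (RFC 8257, with g = 1/16), which applies them to a congestion window; applying them to a rate limit, the additive-increase step, and all names here are assumptions.

```python
from dataclasses import dataclass

G = 1 / 16  # DCTCP EWMA gain (RFC 8257's suggested default)


@dataclass
class NicFlowCounters:
    """Per-flow counters the NIC is assumed to keep on the common path."""
    pkts: int = 0
    ecn_marked: int = 0

    def on_packet(self, ce_marked: bool) -> None:
        self.pkts += 1
        self.ecn_marked += int(ce_marked)


@dataclass
class KernelDctcp:
    rate_mbps: float
    alpha: float = 0.0
    increase_mbps: float = 10.0  # hypothetical additive-increase step

    def control_interval(self, c: NicFlowCounters) -> float:
        """Run once per interval (e.g. per RTT): read NIC counters, set a new rate limit."""
        frac = c.ecn_marked / c.pkts if c.pkts else 0.0  # F: fraction of marked packets
        self.alpha = (1 - G) * self.alpha + G * frac
        if c.ecn_marked:
            self.rate_mbps *= 1 - self.alpha / 2  # decrease in proportion to congestion
        else:
            self.rate_mbps += self.increase_mbps  # no marks: probe for more bandwidth
        c.pkts = c.ecn_marked = 0                 # reset counters for the next interval
        return self.rate_mbps
```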

FlexTCP overhead evaluation

• We implemented FlexTCP in P4
• Run on Agilio-CX with null application
• Compare throughput to basic NIC (wiretest)

[Figure: Throughput (Gb/s, 0 to 40) vs. packet size (256, 512, 1024, 1500 bytes) for the Basic and Full configurations]

Summary

Networks are becoming faster, CPUs are not
▪ Server applications need to keep up
▪ Fast I/O requires an efficient I/O path to the application

Flexible offloads can eliminate inefficiencies
▪ Application control over where packets are processed
▪ Efficient steering, validation, transformation

Case studies: Key-value store, real-time analytics, IDS
▪ Up to 2.5x throughput & latency improvement vs. kernel bypass
▪ Vastly more energy-efficient (no CPUs for packet processing)

