p51: high performance networking

30
P51: High Performance Networking Lecture 6: Programmable network devices Dr Noa Zilberman [email protected] Lent 2017/18

Upload: others

Post on 16-Oct-2021

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: P51: High Performance Networking

P51: High Performance NetworkingLecture 6: Programmable network devices

Dr Noa [email protected] Lent 2017/18

Page 2: P51: High Performance Networking

High Throughput Interfaces

Page 3: P51: High Performance Networking

Performance Limitations

• So far we discussed performance limitations due to:

• Data path

• Network Interfaces

• Other common critical paths include:

• Memory interfaces

• Lookup tables, packet buffers

• Host interfaces

• PCIe, DMA engine

Page 4: P51: High Performance Networking

Memory Interfaces

• On chip memories

• Advantage: fast access time

• Disadvantage: limited size (10’s of MB)

• Off chip memory:

• Advantage: large size (up to many GB)

• Disadvantage: access time, cost, area, power

• New technologies

• Offer mid-way solutions

Page 5: P51: High Performance Networking

Example: QDR-IV SRAM

• Does 4 operations every clock: 2 READs, 2 WRITEs

• Constant latency

• Maximum random transaction rate: 2132 MT/s

• Maximum bandwidth: 153.3Gbps

• Maximum density: 144Mb

• Example applications: Statistics, head-tail cache, descriptors lists

Switch

QDR SRAM

Page 6: P51: High Performance Networking

Example: QDR-IV SRAM

• Does 4 operations every clock: 2 READs, 2 WRITEs• DDR4 DRAM: 2 operations every clock

• Constant latency• DDR4 DRAM: variable latency

• Maximum random transaction rate: 2132 MT/s• DDR4 DRAM: 20MT/s (worst case! tRC~50ns)

• DDR4 theoretical best case 3200MT/s• Maximum bandwidth: 153.3Gbps

• DDR4 DRAM maximum bandwidth: 102.4Gbps (for 32b (2x16) bus)• Maximum density: 144Mb

• DDR4 maximum density: 16Gb• Example applications: Statistics, head-tail cache, descriptors lists

• No longer applicable: packet buffer

Switch

QDR SRAM

Page 7: P51: High Performance Networking

Random Memory Access

• Random access is a “killer” when accessing DRAM based memories

• Due to strong timing constraints

• Examples: rules access, packet buffer access

• DRAMs perform well (better) when there is strong locality or when accessing large chunks of data

• E.g. large cache lines, files etc.

• Large enough to hide timing constraints

• E.g. for 3200MT/s, 64b bus: 50ns~ 1KB

Page 8: P51: High Performance Networking

Example: PCI Express Gen 3, x8

• The theoretical performance profile:

• PCIe Gen 3 – each lane runs at 8Gbps

• ~97% link utilization (128/130 coding, control overheads)

• Data overhead – 24B-28B(including headers and CRC)

• Configurable MTU(e.g., 128B, 256B, …)

1

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

60 560 1060 1560

Spee

d up

Packet Size [B]

Switch Datapath

PCIe

Page 9: P51: High Performance Networking

Example: PCI Express Gen 3, x8

• Actual throughput on VC709, using Xilinx reference project:(same FPGA as NetFPGA SUME)

• This is so far from the performance profile…

• Why?

0

5

10

15

20

25

30

35

40

64 65 128

129

256

257

512

513

1024

1025

2048

2049

4096

4097

8192

8193

1638

3

BW [G

bps]

Packet Size [B]

PCIe Throughput - Network to CPU

E5-2690 v4

E5-2667 v4

E5-2643 v4

Note: the graph is for illustration purposes only.There were slight differences between the evaluated systems.

Page 10: P51: High Performance Networking

Software Defined Networks

We will not discuss SDN…

Page 11: P51: High Performance Networking

Software Defined Networking (SDN)

Key Idea: Separation of Data and Control Planes

Page 12: P51: High Performance Networking

Software Defined Networking (SDN)

• SDN is about control and manageability

• Attending to challenges in:

• Controlling large scale networks

• Different underlying hardware

• Device complexity

• …

• The data plane is simple, the “smartness” is in the control plane

• Focused on the packet processing

Page 13: P51: High Performance Networking

Programmable Network Devices

Page 14: P51: High Performance Networking

A bit of history…

• The role of a switch is to connect multiple LAN segments

• Operates on Layer 2

• Supports a single operation: Forwarding

• If you want to do more:

• Layer 3 is handled by the software

• Protocol processing is handled by another device (NPU / PPU)

• Valid until mid-2000’s

• E.g. 2002’s “state of the art” Broadcom Strata XGS, 8x10GE

Page 15: P51: High Performance Networking

Basic components of an IP router (originally)

Control Plane

Data Planeper-packet processing

SwitchingForwardingTable

RoutingTable

Routing Protocols

Management& CLI

Software

Hardw

are

Queuing

Page 16: P51: High Performance Networking

A bit of recent history…

• Mid-2000’s to start-2010’s:

• Fixed function switches

• Integration of functions: same trend as with CPUs

• Why use NPU + Switch if you can use just a switch?

• For limited applications

NPU Switch SwitchPP

Page 17: P51: High Performance Networking

A bit of recent history…

• Mid-2000’s to start-2010’s:

• Fixed function switches

• Supporting multiple (pre-defined) protocols

• E.g. Layer 3 switching

• Fixed pipeline (example only):

Fixed header parser

MAC lookup

Exact match

IPv4 lookupLongest

prefix match

Header editing

Page 18: P51: High Performance Networking

A bit of recent history…

• Start-2010’s – Recent years:

• Partly / fully flexible switches

• Support *many* protocols

• Flexibility in selecting the protocols, memories used, header size,…

Flexibleheader parser

(e.g.)Exact match

(e.g.)Longest

prefix match

Header editing

Page 19: P51: High Performance Networking

Programmable network devices

• Partly / fully programmable

• Mostly focused on the header processing

• But starting to attend also to queueing / switching / TM / …

• Support ANY protocol

• Pipeline is “programmable”

• But within given resource limitations

Page 20: P51: High Performance Networking

Programmable network devices

Advantages:

• New Features – Add new protocols

• Reduce device complexity – e.g., Implement only required protocols.

• Flexible use of resources

• SW style development – better innovation, fix data-plane bugs in the field

Page 21: P51: High Performance Networking

Reconfigurable Match-Action Model

• RMT – Reconfigurable Match-Action Model

• Bosshart, Pat, et al. "Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN." SIGCOMM 2013.

Ingress Match+Action Egress Match+ActionQueuesParser

Page 22: P51: High Performance Networking

Programmable network devices

• How do you programme a network device?

• Requires:

• Programming language

• Compilers

• Architecture

• Underlying hardware support

• We will discuss one popular option, but there are more

Page 23: P51: High Performance Networking

P4

• www.p4.org

• A declarative language

• Telling forwarding-plane devices (switches, NICs, …) how to process packets.

Page 24: P51: High Performance Networking

Example: P4 on NetFPGA (P4-NetFPGA)

P4 Program

Xilinx P416 Compiler

Xilinx SDNet ToolsSSS

SimpleSumeSwitch Architecture

NetFPGA Reference Switch

Page 25: P51: High Performance Networking

SimpleSumeSwitch Architecture Model for SUME Target

P4 used to describe parser, match-action pipeline, and deparser

tdata

AXILite

tdata

AXILite

tusertuser

Page 26: P51: High Performance Networking

P4 PSA: Portable Switch Architecture

• Composability• Example: Multiple functions in a single pipeline

• Portability • Example: Apply a function consistently across a network

• Comparability• Example: Compare functions implementation, A vs. B

Parser Checksum validate

Ingress match-action

Packet Buffer and Replication

Egress match-action

Checksum update Deparser

Buffer Queueing

Engine

Checksum Action SelectorRegister RandomAction

ProfileCounter Meter

Externs

Pipeline

Page 27: P51: High Performance Networking

P4 – Examples Use Cases

• Network telemetry (INT)

• New protocols (e.g., NDP)

• Layer 4 load balancing

• In Network Caching (NetCache) – ×10 throughput

• Consensus Protocols (NetPaxos) – ×10,000 throughput

• Tic-Tac-Toe

Page 28: P51: High Performance Networking

In Network Computing

• Idea: move services and applications from the host to the network

• Somewhat similar terms:

• Network as a Service (NaaS)

• Hardware acceleration (but network specific)

• Implementations:

• Smart NICs

• Programmable Switches

• Different platforms support different languages

Page 29: P51: High Performance Networking

In Network Computing - Examples

• Machine learning

• Graph processing

• Key-value store

• Security (e.g., DDoS detection)

• Big data analytics

• Stream processing

• But nothing is for free (cost, power, space, …)

Page 30: P51: High Performance Networking

P51 Summary

• Architecture of high-performance network devices

• High throughput devices

• Low latency devices

• Programmable network devices

• This is just a little glimpse into high performance networking…

• Reminder: 6/3, same place and time – Special talk:Steve Pope (CTO, Solarflare) – High performance end-host networking