eBPF & Switch Abstractions Nick Viljoen <[email protected]> Networking Track Vancouver, 14 November 2018


Page 1 (vger.kernel.org/lpc_net2018_talks/eBPF_For_Switches_slides.pdf)

© 2018 Netronome Systems, Inc.

eBPF & Switch Abstractions

Nick Viljoen <[email protected]>

Networking Track, Vancouver, 14 November 2018

Page 2

Contents

○ Background
  ○ The Multi-Host NIC Abstraction
  ○ The Switchdev-Based Multi-Host NIC Abstraction
○ Currently upstream (as of 2 weeks ago)
  ○ Boot
  ○ Setting Switchdev Mode
  ○ Loading Qdiscs
○ Next Steps (currently being upstreamed)
  ○ Generalising Qdisc Offload
  ○ Adding clsact Qdiscs (u32, cls_bpf)
○ Future Work

Page 3

Background: HW, BPF JIT, NIC as a Switch

Page 4

Background: HW

○ Many-core, fully programmable network processor
  ○ 48-96 preprocessing cores
  ○ 54-120 programmable cores, 8 threads per core, MIMT
○ Up to 4 PCIe
○ Up to 40 ports supported
○ ~17MB of on-chip memory
○ 2-24GB of DRAM
○ Distributed & transactional memory architecture
○ Low power
  ○ ~10-35W (dependent on chip + frequency)

Page 5

Background: HW Datapath

Figure: hardware datapath. Labels: Phy Ports, PCIe, Preprocessing, Reorder, Accelerators, IMU/EMU, CLS, CTM, Processing Islands; legend: Flow Processing Core (FPC).

Page 6

Background: eBPF Offload Architecture

○ The program is written in the standard manner
○ It is compiled by LLVM as normal
○ The NFP's JIT is called like any other architecture's JIT
  ○ This converts the BPF bytecode to NFP machine code
○ Translation reuses the verifier infrastructure in the kernel
○ Defining the FPC datapath code using BPF

Figure: bpf_prog.c is compiled by LLVM and checked by the Verifier, then JIT-compiled either by the Host JIT (Host CPU) or by the NFP JIT (NFP).
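The flow above can be modelled as a dispatcher that hands the same verified bytecode to either the host JIT or a device JIT. This is a hypothetical sketch. struct demo_prog, demo_jit_select() and both backend functions are illustrative stand-ins, not kernel APIs. The only idea taken from the slide is that offload reuses the normal LLVM and verifier path and differs only in which JIT runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a BPF program after compilation. */
struct demo_prog {
	const char *name;
	int offload_ifindex;	/* 0 = run on host CPU, otherwise target device */
	int verified;		/* set once the shared verifier accepted it */
};

typedef const char *(*demo_jit_fn)(const struct demo_prog *prog);

static const char *demo_host_jit(const struct demo_prog *prog)
{
	(void)prog;
	return "host machine code";
}

static const char *demo_nfp_jit(const struct demo_prog *prog)
{
	(void)prog;
	return "NFP machine code";
}

/* Both paths require the same verifier pass; only the backend differs. */
static demo_jit_fn demo_jit_select(const struct demo_prog *prog)
{
	assert(prog);
	if (!prog->verified)
		return NULL;
	return prog->offload_ifindex ? demo_nfp_jit : demo_host_jit;
}
```

In the kernel, the device is named when the program is loaded (the prog_ifindex attribute of the BPF load command). The model reduces that choice to a single field.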

Page 7

Background: NIC as a Switch

Figure: two configurations built from PHY/MAC ports: Multi-Host (several hosts, H, sharing the NIC) and "Multi-Homed" (one Host attached through several PHY/MAC ports).

Page 8

Multi-Host NIC Architecture

Page 9

Multi-Host NIC Abstraction: Today

Figure: Hosts 0-3 connected through an L2 switch to the PHY/MAC, annotated "Pressure here".

Many great things, but a few holes:
○ No Linux-controlled QoS
○ No visibility for other hosts
○ No concept of where offloads occur in the datapath, e.g. XDP offload

Page 10

Multi-Host NIC Abstraction: Switchdev Based

Figure: Hosts 0-3 connected through an L2 switch to the PHY/MAC, annotated:
○ Attach point for QoS
○ Clear concept of all the logical ports present
○ Potentially visible stats
○ Configurable from host 0

Page 11

Current Work: Switchdev and Simple Qdisc Offload

Page 12

Multi-Host NIC Abstraction: Incorporating Qdiscs

Figure: Hosts 0-3 connected through an L2 switch to the PHY/MAC.

Using offloaded qdiscs for QoS allows improved throughput and latency.

Page 13

3 Key Steps

○ Initialisation (.probe)
  ○ App initialisation
  ○ vNIC allocation
○ Entering switchdev mode (.devlink_eswitch_set)
  ○ Spawning representors
○ Qdisc setup (.ndo_setup_tc)
  ○ Attaching the qdisc to struct nfp_abm_link
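The three steps have to run in order. Per the following slides, qdisc setup reaches the driver through ndo_setup_tc on representors, and representors are only spawned on entering switchdev mode. Below is a hypothetical state-machine sketch of that dependency. The demo_* names, states and error codes are illustrative, not the driver's.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the three bring-up steps. */
enum demo_abm_state {
	DEMO_ABM_UNPROBED,
	DEMO_ABM_PROBED,	/* step 1: app initialised, vNICs allocated */
	DEMO_ABM_SWITCHDEV,	/* step 2: representors spawned */
};

struct demo_abm {
	enum demo_abm_state state;
	int num_links;		/* one link per vNIC */
	int num_reprs;		/* one representor per link in switchdev mode */
	int qdiscs_attached;	/* step 3 */
};

/* Step 1 (.probe): allocate one link per vNIC. */
static int demo_probe(struct demo_abm *abm, int num_vnics)
{
	assert(abm);
	if (abm->state != DEMO_ABM_UNPROBED || num_vnics <= 0)
		return -EINVAL;
	abm->num_links = num_vnics;
	abm->state = DEMO_ABM_PROBED;
	return 0;
}

/* Step 2: entering switchdev mode spawns one representor per link. */
static int demo_eswitch_set_switchdev(struct demo_abm *abm)
{
	if (abm->state == DEMO_ABM_UNPROBED)
		return -ENODEV;
	abm->num_reprs = abm->num_links;
	abm->state = DEMO_ABM_SWITCHDEV;
	return 0;
}

/* Step 3: qdisc setup is only meaningful once representors exist. */
static int demo_setup_tc(struct demo_abm *abm, int link)
{
	if (abm->state != DEMO_ABM_SWITCHDEV)
		return -EOPNOTSUPP;
	if (link < 0 || link >= abm->num_reprs)
		return -EINVAL;
	abm->qdiscs_attached++;
	return 0;
}
```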

Page 14

Initialisation

Figure: call graph across the Kernel, Driver and App Abstraction layers. Labels, grouped by file:
○ pci_epf_core.c: probe
○ nfp_net_main.c: nfp_net_pci_probe(), nfp_net_pf_app_init, nfp_net_pf_alloc_vnics
○ nfp_app.c: app->type->init, app->type->vnic_init
○ main.c: nfp_abm_init, nfp_abm_vnic_alloc (struct nfp_abm, struct nfp_abm_link)
○ ctrl.c: nfp_abm_ctrl_qm_disable, nfp_abm_ctrl_read_params
○ nfp_mbox_cmd()

Page 15

Entering Switchdev Mode

Figure: call graph across the User Space, Kernel, Driver and App Abstraction layers. Labels:
○ User space (devlink): devlink dev eswitch show, devlink dev eswitch set, devlink sb pool show, devlink sb pool set
○ devlink.c: eswitch_mode_get, eswitch_mode_set, sb_pool_get, sb_pool_set
○ nfp_devlink.c: nfp_app_eswitch_mode_get(), nfp_app_eswitch_mode_set()
○ nfp_shared_buf.c: nfp_shared_buf_pool_get(), nfp_shared_buf_pool_set()
○ nfp_app.c: app->type->eswitch_mode_get(), app->type->eswitch_mode_set()
○ main.c: nfp_abm_eswitch_mode_get, nfp_abm_spawn_repr (struct nfp_abm, struct nfp_abm_link, struct nfp_repr)
○ ctrl.c: nfp_abm_ctrl_qm_enable
○ nfp_mbox_cmd()
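The hop from nfp_app_eswitch_mode_set() to app->type->eswitch_mode_set() follows a common ops-table pattern. Generic wrappers return an error when an app type leaves a callback unset. Below is a minimal sketch of that pattern, assuming simplified signatures. Every demo_* name here is hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, trimmed-down model of an app-ops table (not the driver's structs). */
enum demo_eswitch_mode { DEMO_ESWITCH_LEGACY, DEMO_ESWITCH_SWITCHDEV };

struct demo_app;

struct demo_app_type {
	const char *name;
	int (*eswitch_mode_get)(struct demo_app *app);
	int (*eswitch_mode_set)(struct demo_app *app, enum demo_eswitch_mode mode);
};

struct demo_app {
	const struct demo_app_type *type;
	enum demo_eswitch_mode mode;
	int num_reprs;
	int num_links;
};

/* Generic wrappers: callers never dereference an optional callback directly. */
static int demo_app_eswitch_mode_get(struct demo_app *app)
{
	assert(app && app->type);
	if (!app->type->eswitch_mode_get)
		return -EOPNOTSUPP;
	return app->type->eswitch_mode_get(app);
}

static int demo_app_eswitch_mode_set(struct demo_app *app,
				     enum demo_eswitch_mode mode)
{
	assert(app && app->type);
	if (!app->type->eswitch_mode_set)
		return -EOPNOTSUPP;
	return app->type->eswitch_mode_set(app, mode);
}

/* ABM-like app: switchdev mode means one representor per link. */
static int demo_abm_mode_get(struct demo_app *app)
{
	return (int)app->mode;
}

static int demo_abm_mode_set(struct demo_app *app, enum demo_eswitch_mode mode)
{
	if (mode == app->mode)
		return 0;	/* nothing to do */
	app->num_reprs = (mode == DEMO_ESWITCH_SWITCHDEV) ? app->num_links : 0;
	app->mode = mode;
	return 0;
}

static const struct demo_app_type demo_abm_type = {
	.name = "abm",
	.eswitch_mode_get = demo_abm_mode_get,
	.eswitch_mode_set = demo_abm_mode_set,
};

/* An app type without eswitch support simply leaves the callbacks NULL. */
static const struct demo_app_type demo_basic_type = {
	.name = "basic",
};
```

From user space the switch is typically requested with devlink, e.g. devlink dev eswitch set pci/0000:01:00.0 mode switchdev. The PCI address is a placeholder.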

Page 16

Qdisc Offload

Figure: call graph across the User Space, Kernel, Driver and App Abstraction layers. Labels:
○ User space (tc): tc qdisc add (mq/red), tc qdisc replace, tc qdisc del, tc qdisc show, tc -s qdisc show
○ Kernel: sch_red.c / sch_mq.c, ndo_setup_tc
○ nfp_app.c: app->type->setup_tc
○ main.c: nfp_abm_setup_tc_red, nfp_abm_setup_tc_mq (struct nfp_abm, struct nfp_abm_link, struct nfp_repr, struct nfp_qdisc_red)
○ ctrl.c: nfp_mbox_cmd()
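Offloading RED moves the per-packet mark/drop decision, driven by queue occupancy, from the host to the device. As a reference for what the RED parameters mean, below is a minimal sketch of the classic RED calculation (Floyd and Jacobson). It is illustrative only. It does not claim to match what the NFP firmware computes, and the in-kernel implementation uses fixed-point arithmetic rather than doubles. The demo_* names are hypothetical.

```c
#include <assert.h>

/* Classic RED parameters; units (bytes or packets) just need to be consistent. */
struct demo_red_params {
	double min_th;	/* below this average backlog: never mark/drop */
	double max_th;	/* at or above this: always mark/drop */
	double max_p;	/* probability reached just below max_th */
	double w_q;	/* EWMA weight for the average queue length */
};

/* Exponentially weighted moving average of the instantaneous backlog. */
static double demo_red_update_avg(const struct demo_red_params *p,
				  double avg, double backlog)
{
	return (1.0 - p->w_q) * avg + p->w_q * backlog;
}

/* Base mark/drop probability as a function of the average backlog. */
static double demo_red_prob(const struct demo_red_params *p, double avg)
{
	assert(p->max_th > p->min_th);
	if (avg < p->min_th)
		return 0.0;
	if (avg >= p->max_th)
		return 1.0;
	return p->max_p * (avg - p->min_th) / (p->max_th - p->min_th);
}
```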

Page 17

Next Steps: Extending the Egress Representor Architecture

Page 18

Items Covered

○ Generalising Qdisc Offload
  ○ Structure changes
    ○ nfp_abm_link
    ○ nfp_qdisc
○ The clsact Qdisc
  ○ Motivation
  ○ Architecture

Page 19

Generalising Qdisc Offload: Structure Changes

Before

Page 20

Generalising Qdisc Offload: Structure Changes

After
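The Before and After diagrams did not survive extraction, and the slide only names the structures involved (nfp_abm_link and nfp_qdisc). As a purely hypothetical illustration, a generalised, tree-shaped qdisc representation could use one node type for every offloaded qdisc kind, with children linked under a parent such as mq. The field layout and demo_* names are assumptions, not the driver's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic qdisc node shared by every offloaded qdisc kind. */
enum demo_qdisc_type { DEMO_QDISC_MQ, DEMO_QDISC_RED, DEMO_QDISC_GRED };

#define DEMO_MAX_CHILDREN 8

struct demo_qdisc {
	enum demo_qdisc_type type;
	unsigned int handle;
	size_t num_children;
	struct demo_qdisc *children[DEMO_MAX_CHILDREN];
};

/* Link a child under its parent; fails when the parent is full. */
static int demo_qdisc_add_child(struct demo_qdisc *parent, struct demo_qdisc *child)
{
	assert(parent && child);
	if (parent->num_children >= DEMO_MAX_CHILDREN)
		return -1;
	parent->children[parent->num_children++] = child;
	return 0;
}

/* Count qdiscs of a given type anywhere in the tree rooted at q. */
static size_t demo_qdisc_count(const struct demo_qdisc *q, enum demo_qdisc_type type)
{
	size_t n = (q->type == type) ? 1 : 0;

	for (size_t i = 0; i < q->num_children; i++)
		n += demo_qdisc_count(q->children[i], type);
	return n;
}
```

The benefit of one node type is that code walking the hierarchy (stats, teardown) no longer needs a separate path per qdisc kind.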

Page 21

clsact Qdisc: Motivation

Figure: Hosts 0-3 connected through an L2 switch to the PHY/MAC, with classifiers (cls) on the ports.

○ clsact ensures the ability to use GRED
○ GRED allows more granular QoS
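GRED differs from RED in that one qdisc holds several virtual queues, each with its own thresholds. A classifier decides which virtual queue a packet uses; in Linux, sch_gred selects it from the packet's tc_index. That is why a classifier attach point such as clsact matters here. Below is a hypothetical sketch of the per-band decision. For brevity it uses the instantaneous backlog rather than RED's averaged queue length. The thresholds and demo_* names are illustrative.

```c
#include <assert.h>

/* Hypothetical GRED-style queue: one threshold set per virtual queue (band). */
#define DEMO_GRED_BANDS 4

struct demo_gred_band {
	unsigned int min_th;	/* bytes */
	unsigned int max_th;	/* bytes */
};

struct demo_gred {
	struct demo_gred_band band[DEMO_GRED_BANDS];
};

enum demo_verdict { DEMO_PASS, DEMO_MAYBE_MARK, DEMO_MARK };

/* The band chosen by the classifier decides which thresholds apply;
 * plain RED would use a single threshold set for all traffic. */
static enum demo_verdict demo_gred_verdict(const struct demo_gred *g,
					   unsigned int band, unsigned int backlog)
{
	assert(band < DEMO_GRED_BANDS);
	const struct demo_gred_band *b = &g->band[band];

	if (backlog < b->min_th)
		return DEMO_PASS;
	if (backlog >= b->max_th)
		return DEMO_MARK;
	return DEMO_MAYBE_MARK;	/* probabilistic region, as in RED */
}
```

With the same backlog, a low-priority band can already be marking while a high-priority band still passes traffic. This is the extra granularity the slide refers to.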

Page 22

clsact Qdisc: Architecture

Figure: call graph across the User Space, Kernel and Driver layers. Labels:
○ User space (tc): tc qdisc add (mq/red), tc qdisc replace, tc qdisc del, tc qdisc show, tc -s qdisc show; tc filter add (u32), tc filter replace, tc filter del, tc filter show
○ Kernel: sch_ingress.c / sch_red.c / sch_mq.c, cls_u32.c, ndo_setup_tc
○ nfp_app.c: app->type->setup_tc
○ main.c: nfp_abm_u32_knode_replace (struct nfp_abm, struct nfp_abm_link, struct nfp_repr)
○ struct list_head dscp_map
○ ctrl.c: nfp_mbox_cmd(), cmsg
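The slide shows offloaded u32 filters (nfp_abm_u32_knode_replace) feeding a struct list_head dscp_map. Assume those filters match on the DSCP bits of the IPv4 TOS byte and map each match to a band. A hypothetical user-space model of the resulting lookup could then be written as below. The entry type, the first-match rule and the default band are illustrative choices, and a plain singly linked list stands in for the kernel's list_head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DSCP map entry: packets whose DSCP matches go to a band. */
struct demo_dscp_map_entry {
	uint8_t dscp;		/* 6-bit DSCP value */
	uint8_t mask;		/* 6-bit mask */
	unsigned int band;	/* GRED band / virtual queue */
	struct demo_dscp_map_entry *next;
};

/* DSCP occupies the upper six bits of the IPv4 TOS (DS) byte. */
static uint8_t demo_tos_to_dscp(uint8_t tos)
{
	return tos >> 2;
}

/* First matching entry wins; unmatched traffic goes to default_band. */
static unsigned int demo_dscp_map_lookup(const struct demo_dscp_map_entry *head,
					 uint8_t tos, unsigned int default_band)
{
	uint8_t dscp = demo_tos_to_dscp(tos);

	for (const struct demo_dscp_map_entry *e = head; e; e = e->next) {
		assert((e->dscp & ~e->mask) == 0);	/* value bits must lie inside mask */
		if ((dscp & e->mask) == e->dscp)
			return e->band;
	}
	return default_band;
}
```

Shifting out the two low bits also discards the ECN field, so ECN marking by the RED stage does not change which band a packet is classified into.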

Page 23

Future Work: Multi-Host BPF Offload

Page 24

Items Covered

○ Firmware and BPF JIT
  ○ Flow Processing Core datapath
○ cls_bpf and switchdev
  ○ Architecture
○ XDP for multi-host systems
  ○ Problems
  ○ Proposed abstraction

Page 25

Firmware and JIT Changes

Figure: Preclassifier and Reorder stages around two Flow Processing Cores. Labels: CLS (Mem), Port Progs, eBPF Progs, Dynamically Loaded Helpers, Tail calls.

○ Preclassifiers are used to isolate flow processing cores per host
○ Lookup in a jump table based on the entry port
  ○ Returns the number of programs to jump to and their locations
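The slide describes the preclassifier as a jump-table lookup keyed by entry port. The lookup returns the number of programs to run and their locations. Below is a hypothetical user-space model of that dispatch. Function pointers stand in for program locations, and a return code stands in for a tail call. The real firmware data layout is not described in the talk, and all demo_* names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-port jump table. */
#define DEMO_MAX_PORTS 8
#define DEMO_MAX_PROGS 4

enum demo_act { DEMO_ACT_NEXT, DEMO_ACT_PASS, DEMO_ACT_DROP };

typedef enum demo_act (*demo_prog_fn)(unsigned int pkt_len);

struct demo_jump_entry {
	size_t num_progs;			/* how many programs to run */
	demo_prog_fn progs[DEMO_MAX_PROGS];	/* "locations" of the programs */
};

struct demo_jump_table {
	struct demo_jump_entry port[DEMO_MAX_PORTS];
};

/* Run the chain for the entry port. DEMO_ACT_NEXT models a tail call into
 * the next program; a port with no programs (or an exhausted chain) passes. */
static enum demo_act demo_preclassify(const struct demo_jump_table *jt,
				      unsigned int entry_port, unsigned int pkt_len)
{
	assert(entry_port < DEMO_MAX_PORTS);
	const struct demo_jump_entry *e = &jt->port[entry_port];

	for (size_t i = 0; i < e->num_progs; i++) {
		enum demo_act act = e->progs[i](pkt_len);

		if (act != DEMO_ACT_NEXT)
			return act;
	}
	return DEMO_ACT_PASS;
}

/* Two toy programs for illustration. */
static enum demo_act demo_drop_jumbo(unsigned int pkt_len)
{
	return pkt_len > 1500 ? DEMO_ACT_DROP : DEMO_ACT_NEXT;
}

static enum demo_act demo_accept(unsigned int pkt_len)
{
	(void)pkt_len;
	return DEMO_ACT_PASS;
}
```

Keying the table on the entry port is what lets each host's traffic reach only the programs installed for that host.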

Page 26

cls_bpf on Egress

Figure: call graph for offloading cls_bpf on egress. Labels:
○ User space (tc): tc qdisc add (mq/red), tc qdisc replace, tc qdisc del, tc qdisc show, tc -s qdisc show; tc filter add (u32/bpf), tc filter replace, tc filter del, tc filter show; prog.o
○ Kernel: sch_ingress.c / sch_red.c / sch_mq.c, cls_bpf.c, ndo_setup_tc, offload.c, verifier.c
○ nfp_app.c: app->type->setup_tc
○ main.c: nfp_abm_setup_cls_block (struct nfp_abm, struct nfp_abm_link, struct nfp_repr, struct nfp_qdisc_clsact)
○ offload.c: nfp_ndo_bpf, nfp_bpf_verifier_prep, nfp_bpf_translate, nfp_bpf_map_alloc
○ jit.c: nfp_bpf_jit_prepare, nfp_bpf_jit
○ cmsg.c: nfp_bpf_ctrl_alloc_map, cmsg
○ ctrl.c: nfp_mbox_cmd()
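On this slide, map creation for an offloaded program ends in nfp_bpf_ctrl_alloc_map, which asks the firmware to allocate the map via a control message (cmsg). The real NFP cmsg layout is not given in the talk. The sketch below is a hypothetical illustration of the general pattern only: a fixed header (type, version, sequence number) followed by big-endian fields describing the map. Every demo_* name and the wire format itself are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "allocate map" request; not the NFP's real message layout. */
enum { DEMO_CMSG_MAP_ALLOC = 1, DEMO_CMSG_VERSION = 1 };

struct demo_map_alloc_req {
	uint16_t seq;
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;
};

#define DEMO_MAP_ALLOC_LEN 16	/* 4-byte header + 3 x 4-byte fields */

static void demo_put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

static uint32_t demo_get_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | p[3];
}

/* Serialise into buf (big-endian); returns bytes written, 0 if buf is too small. */
static size_t demo_map_alloc_encode(const struct demo_map_alloc_req *req,
				    uint8_t *buf, size_t len)
{
	assert(req && buf);
	if (len < DEMO_MAP_ALLOC_LEN)
		return 0;
	buf[0] = DEMO_CMSG_MAP_ALLOC;
	buf[1] = DEMO_CMSG_VERSION;
	buf[2] = (uint8_t)(req->seq >> 8);
	buf[3] = (uint8_t)req->seq;
	demo_put_be32(buf + 4, req->key_size);
	demo_put_be32(buf + 8, req->value_size);
	demo_put_be32(buf + 12, req->max_entries);
	return DEMO_MAP_ALLOC_LEN;
}

/* Parse and validate a request; returns 0 on success, -1 on a bad message. */
static int demo_map_alloc_decode(const uint8_t *buf, size_t len,
				 struct demo_map_alloc_req *req)
{
	if (len < DEMO_MAP_ALLOC_LEN || buf[0] != DEMO_CMSG_MAP_ALLOC ||
	    buf[1] != DEMO_CMSG_VERSION)
		return -1;
	req->seq = (uint16_t)(buf[2] << 8 | buf[3]);
	req->key_size = demo_get_be32(buf + 4);
	req->value_size = demo_get_be32(buf + 8);
	req->max_entries = demo_get_be32(buf + 12);
	return 0;
}
```

A sequence number lets the driver match the firmware's reply to the request it sent.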

Page 27

XDP on Multi-Host Systems

○ Challenges that have to be solved
  ○ XDP is an RX-exclusive hook
  ○ Heterogeneous architecture support is nascent
  ○ Security
○ However, more and more of the potential problems are falling away
  ○ e.g. David Ahern's recent work on exposing the FIB table

Page 28

XDP on Multi-Host Systems: Proposed Abstraction

Figure: Hosts 0-3 and the PHY/MAC, with classifiers (cls) on the ports and XDP attached to the ingress of each port.

○ XDP attached to ingress at each port
○ redirect() + FIB table access allows conventional + unconventional switching
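The proposal pairs a redirect with FIB table access. Below is a hypothetical model of the conventional-switching case only. The decision reduces to a longest-prefix match that either redirects to an egress port or passes the packet up. demo_xdp_forward() and the route table are illustrative; this is not an XDP program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical XDP-style forwarding decision over a small IPv4 FIB. */
enum demo_xdp_action { DEMO_XDP_PASS, DEMO_XDP_REDIRECT };

struct demo_route {
	uint32_t prefix;	/* host byte order */
	uint8_t plen;		/* prefix length, 0..32 */
	unsigned int port;	/* egress port: PHY/MAC or host */
};

static uint32_t demo_mask(uint8_t plen)
{
	return plen == 0 ? 0 : UINT32_MAX << (32 - plen);
}

/* Longest-prefix match; on a hit, store the egress port and redirect. */
static enum demo_xdp_action demo_xdp_forward(const struct demo_route *fib, size_t n,
					     uint32_t daddr, unsigned int *egress)
{
	const struct demo_route *best = NULL;

	for (size_t i = 0; i < n; i++) {
		assert(fib[i].plen <= 32);
		if ((daddr & demo_mask(fib[i].plen)) != fib[i].prefix)
			continue;
		if (!best || fib[i].plen > best->plen)
			best = &fib[i];
	}
	if (!best)
		return DEMO_XDP_PASS;	/* no route: let the host stack decide */
	*egress = best->port;
	return DEMO_XDP_REDIRECT;
}
```

In an actual XDP program, the lookup would use the bpf_fib_lookup() helper, and the forwarding step would return XDP_REDIRECT via bpf_redirect(). "Unconventional" switching would replace the prefix match with whatever other policy the program chooses.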

Page 29

Summary

○ Proposing a fully flexible datapath for a multi-host NIC
○ Achieved through a combination of switchdev, qdisc offload, cls_bpf and XDP
○ Work in progress
  ○ Switchdev architecture and qdisc offload have been upstreamed
  ○ Next is simple clsact support
  ○ Followed by cls_bpf & XDP
○ Provides potential for BPF-defined pipelines in heterogeneous architectures
○ See Jakub's talk at the microconference for more!

Page 30

Credit (Team)

○ Jakub Kicinski
○ Jiong Wang
○ Quentin Monnet
○ David Beckett
○ Edwin Peer
○ Johan Moraal
○ Mary Pham

Page 31

Thank you!

Discussion / Questions / Comments