© 2018 Netronome Systems, Inc.
eBPF & Switch Abstractions
Nick Viljoen
Networking Track, Vancouver, 14 November 2018
Contents
○ Background
○ The Multi-Host NIC Abstraction
○ The Switchdev-Based Multi-Host NIC Abstraction
○ Currently upstream (as of 2 weeks ago)
  ○ Boot
  ○ Setting Switchdev Mode
  ○ Loading Qdiscs
○ Next Steps (currently being upstreamed)
  ○ Generalising Qdisc Offload
  ○ Adding clsact Qdiscs (u32, cls_bpf)
○ Future Work
Background: HW, BPF JIT, NIC as a Switch
Background: HW
○ Many-core, fully programmable network processor
  ○ 48-96 preprocessing cores
  ○ 54-120 programmable cores, 8 threads per core, MIMT
  ○ Up to 4 PCIe
  ○ Up to 40 ports supported
  ○ ~17 MB of on-chip memory
  ○ 2-24 GB of DRAM
  ○ Distributed & transactional memory architecture
○ Low power
  ○ ~10-35 W (dependent on chip + frequency)
Background: HW Datapath
[Diagram: packets from PCIe and Phy ports pass through preprocessing, then processing islands of Flow Processing Cores (FPCs) with CLS/CTM memory, accelerators and IMU/EMU memory, then a reorder stage back out to PCIe and Phy ports]
Background:"eBPF"Offload"Architecture
○ Program"is"written"in"standard"manner○ LLVM"compiled"as"normal○ The"nfp’s"jit"is"called"like"any"other"architectures"jit"
○ This"converts"the"BPF"bytecode"to"NFP"machine"code
○ Translation"reuses"the"verifier"infrastructure"in"kernel
○ Defining the FPC datapath code usingBPF
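[Diagram: bpf_prog.c is compiled by LLVM, checked by the kernel verifier, then translated either by the host JIT (runs on the host CPU) or by the NFP JIT (runs on the NFP)]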
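The key property of this flow is that one verifier pass sits in front of every JIT, so offload only changes which back end runs afterwards. The toy userspace model below sketches that split. The instruction set, names and the two checks are simplifying assumptions for illustration, not the kernel verifier or the NFP JIT; the 2018-era verifier did reject loops (back-edges), but it performs many more checks than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy opcodes; NOT the real BPF instruction encoding. */
enum toy_op { TOY_MOV, TOY_ADD, TOY_JA, TOY_EXIT };

struct toy_insn {
    enum toy_op op;
    int off; /* jump offset relative to the next instruction, as in BPF */
};

enum jit_target { TARGET_HOST, TARGET_NFP };
enum load_result { LOAD_REJECTED, LOAD_HOST_JIT, LOAD_NFP_JIT };

/* Hypothetical verifier with only two checks: the program must end in
 * EXIT, and every jump must go forward and stay in bounds (no loops). */
static bool toy_verify(const struct toy_insn *prog, size_t len)
{
    if (len == 0 || prog[len - 1].op != TOY_EXIT)
        return false;
    for (size_t i = 0; i < len; i++) {
        if (prog[i].op != TOY_JA)
            continue;
        if (prog[i].off < 0)
            return false; /* backward jump: a potential loop */
        if (i + 1 + (size_t)prog[i].off >= len)
            return false; /* jump target past the end */
    }
    return true;
}

/* One verifier, two back ends: the offload target only selects
 * which JIT runs after verification succeeds. */
static enum load_result toy_load(const struct toy_insn *prog, size_t len,
                                 enum jit_target target)
{
    assert(prog != NULL);
    if (!toy_verify(prog, len))
        return LOAD_REJECTED;
    return target == TARGET_NFP ? LOAD_NFP_JIT : LOAD_HOST_JIT;
}
```

The same bytecode array is accepted for either target, and a program rejected for one target is rejected for both.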
Background: NIC as a Switch
[Diagram: the NIC switching between hosts (H) and PHY/MAC ports, shown for "Multi-Host" and "Multi-Homed" configurations]
Multi-Host NIC Architecture
Multi-Host NIC Abstraction: Today
[Diagram: Hosts 0-3 connected through an L2 switch on the NIC to the PHY/MAC, annotated "Pressure here"]
Many great things, but a few holes:
○ No Linux-controlled QoS
○ No visibility for other hosts
○ No concept of where offloads occur in the datapath, e.g. XDP offload
Multi-Host NIC Abstraction: Switchdev-Based
[Diagram: Hosts 0-3 connected through an L2 switch on the NIC to the PHY/MAC]
○ Attach point for QoS
○ Clear concept of all the logical ports present
○ Potentially visible stats
○ Configurable from Host 0
Current Work: Switchdev and Simple Qdisc Offload
Multi-Host NIC Abstraction: Incorporating Qdiscs
[Diagram: Hosts 0-3 connected through an L2 switch on the NIC to the PHY/MAC]
Using offloaded qdiscs for QoS allows improved throughput and latency.
3 Key Steps
○ Initialisation (.probe)
  ○ App initialisation
  ○ vNIC allocation
○ Entering switchdev mode (.devlink_eswitch_set)
  ○ Spawning representors
○ Qdisc setup (.ndo_setup_tc)
  ○ Attaching the qdisc to struct nfp_abm_link
Initialisation
○ Kernel: probe (pci_epf_core.c)
○ Driver (nfp_net_main.c): nfp_net_pci_probe(), nfp_net_pf_app_init, nfp_net_pf_alloc_vnics
○ App abstraction (nfp_app.c): app->type->init, app->type->vnic_init
○ Driver, ABM app (main.c): nfp_abm_init (struct nfp_abm), nfp_abm_vnic_alloc (struct nfp_abm_link)
○ Driver, ABM app (ctrl.c): nfp_abm_ctrl_qm_disable, nfp_abm_ctrl_read_params, nfp_mbox_cmd()
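The app abstraction lets the core driver call app-specific code through a table of callbacks (app->type->init, app->type->vnic_init) without knowing which app is loaded. A minimal sketch of that ops-table pattern follows; the structs and functions are hypothetical simplifications, not the real struct nfp_app_type, and the "start with queue management disabled" behaviour is an assumption standing in for the nfp_abm_ctrl_qm_disable step.

```c
#include <assert.h>
#include <stdlib.h>

struct app;

/* Hypothetical stand-in for an app "type": callbacks the core
 * driver invokes without knowing which app it is talking to. */
struct app_type {
    const char *name;
    int (*init)(struct app *app);
    int (*vnic_init)(struct app *app, unsigned int id);
};

struct link {                 /* plays the role of struct nfp_abm_link */
    unsigned int id;
};

struct app {                  /* plays the role of nfp_app + nfp_abm */
    const struct app_type *type;
    unsigned int num_vnics;
    struct link *links;
    int qm_enabled;           /* queue-manager state */
};

static int abm_init(struct app *app)
{
    /* Assumption: start in legacy mode with queue management off. */
    app->qm_enabled = 0;
    app->links = calloc(app->num_vnics, sizeof(*app->links));
    return app->links ? 0 : -1;
}

static int abm_vnic_init(struct app *app, unsigned int id)
{
    assert(id < app->num_vnics);
    app->links[id].id = id;   /* one link structure per vNIC */
    return 0;
}

static const struct app_type abm_app_type = {
    .name = "abm",
    .init = abm_init,
    .vnic_init = abm_vnic_init,
};

/* Core-driver side of probe: app init first, then per-vNIC setup,
 * always dispatched through the ops table. */
static int toy_probe(struct app *app, const struct app_type *type,
                     unsigned int num_vnics)
{
    app->type = type;
    app->num_vnics = num_vnics;
    if (app->type->init && app->type->init(app))
        return -1;
    for (unsigned int i = 0; i < num_vnics; i++)
        if (app->type->vnic_init && app->type->vnic_init(app, i))
            return -1;
    return 0;
}

static void toy_remove(struct app *app)
{
    free(app->links);
    app->links = NULL;
}
```

Swapping in a different app only means passing a different `struct app_type`; `toy_probe` itself does not change.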
Entering Switchdev Mode
○ User space (devlink): devlink dev eswitch show, devlink dev eswitch set, devlink sb pool show, devlink sb pool set
○ Kernel (devlink.c): eswitch_mode_get, eswitch_mode_set, sb_pool_get, sb_pool_set
○ Driver (nfp_devlink.c, nfp_shared_buf.c): nfp_app_eswitch_mode_get(), nfp_app_eswitch_mode_set(), nfp_shared_buf_pool_get(), nfp_shared_buf_pool_set(), nfp_mbox_cmd()
○ App abstraction (nfp_app.c): app->type->eswitch_mode_get(), app->type->eswitch_mode_set()
○ Driver, ABM app (main.c): nfp_abm_eswitch_mode_get, nfp_abm_spawn_repr; struct nfp_abm, struct nfp_abm_link, struct nfp_repr
○ Driver, ABM app (ctrl.c): nfp_abm_ctrl_qm_enable, nfp_mbox_cmd()
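Going by the call flow above, entering switchdev mode turns on the queue manager and spawns representors. The sketch below models that transition with hypothetical types; it assumes one representor per link and that leaving switchdev undoes both steps in reverse order, which is an illustration rather than the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LINKS 8

enum eswitch_mode { MODE_LEGACY, MODE_SWITCHDEV };

struct repr {
    bool present;
    unsigned int link_id;
};

struct abm {
    enum eswitch_mode mode;
    bool qm_enabled;
    unsigned int num_links;
    struct repr reprs[MAX_LINKS];
};

/* One representor per link, so every logical port is visible on Host 0. */
static void spawn_reprs(struct abm *abm)
{
    for (unsigned int i = 0; i < abm->num_links; i++) {
        abm->reprs[i].present = true;
        abm->reprs[i].link_id = i;
    }
}

static void kill_reprs(struct abm *abm)
{
    for (unsigned int i = 0; i < abm->num_links; i++)
        abm->reprs[i].present = false;
}

static unsigned int count_reprs(const struct abm *abm)
{
    unsigned int n = 0;
    for (unsigned int i = 0; i < abm->num_links; i++)
        n += abm->reprs[i].present;
    return n;
}

/* Hypothetical eswitch_mode_set: switchdev enables the queue manager
 * (the nfp_abm_ctrl_qm_enable step) and spawns representors; legacy
 * tears them down and disables the queue manager again. */
static int eswitch_mode_set(struct abm *abm, enum eswitch_mode mode)
{
    assert(abm->num_links <= MAX_LINKS);
    if (abm->mode == mode)
        return 0;                 /* already in the requested mode */
    if (mode == MODE_SWITCHDEV) {
        abm->qm_enabled = true;
        spawn_reprs(abm);
    } else {
        kill_reprs(abm);
        abm->qm_enabled = false;
    }
    abm->mode = mode;
    return 0;
}
```

Making the call idempotent (returning early when the mode is unchanged) means repeating a mode request never spawns duplicate representors.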
Qdisc Offload
○ User space: tc qdisc add (mq/red), tc qdisc replace, tc qdisc del, tc qdisc show, tc -s qdisc show
○ Kernel: sch_red.c / sch_mq.c, reaching the driver through ndo_setup_tc
○ App abstraction (nfp_app.c): app->type->setup_tc
○ Driver, ABM app (main.c): nfp_abm_setup_tc_red, nfp_abm_setup_tc_mq; struct nfp_abm, struct nfp_repr, struct nfp_abm_link, struct nfp_qdisc_red
○ Driver, ABM app (ctrl.c): nfp_mbox_cmd()
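The qdisc being offloaded here is RED. For reference, the sketch below computes the classic (textbook) RED base marking/drop probability from an averaged queue length, which is what the min, max and probability parameters describe. It omits the count-based correction between drops and the kernel's fixed-point arithmetic, and it is not a description of how the NFP firmware implements RED.

```c
#include <assert.h>

struct red_params {
    double min_th; /* bytes: below this average, never mark/drop */
    double max_th; /* bytes: at or above this average, always mark/drop */
    double max_p;  /* probability reached as the average nears max_th */
    double w_q;    /* EWMA weight for the average queue length */
};

/* Exponentially weighted moving average of the instantaneous queue length. */
static double red_update_avg(const struct red_params *p, double avg, double qlen)
{
    return (1.0 - p->w_q) * avg + p->w_q * qlen;
}

/* Base probability: 0 below min_th, linear up to max_p, 1 from max_th. */
static double red_prob(const struct red_params *p, double avg)
{
    assert(p->max_th > p->min_th);
    if (avg < p->min_th)
        return 0.0;
    if (avg >= p->max_th)
        return 1.0;
    return p->max_p * (avg - p->min_th) / (p->max_th - p->min_th);
}
```

Averaging before computing the probability is what lets RED absorb short bursts while still reacting to a persistently long queue.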
Next Steps: Extending the Egress Representor Architecture
Items Covered
○ Generalising qdisc offload
  ○ Structure changes
    ○ nfp_abm_link
    ○ nfp_qdisc
○ The clsact qdisc
  ○ Motivation
  ○ Architecture
Generalising Qdisc Offload: Structure Changes
[Diagrams: structure layout before and after the change]
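One way to picture "generalising" qdisc offload is replacing one struct per qdisc kind with a single generic node type arranged as a tree and looked up by tc handle. The sketch below shows that idea; the layout, names and fixed-size pool are hypothetical and are not the upstream struct nfp_qdisc or struct nfp_abm_link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 4
#define MAX_QDISCS 16
/* tc handles keep the major number in the upper 16 bits. */
#define HANDLE(maj, min) (((uint32_t)(maj) << 16) | (uint32_t)(min))

enum qdisc_type { QDISC_MQ, QDISC_RED, QDISC_GRED };

/* Hypothetical generic node: one struct for every offloaded qdisc kind. */
struct qdisc {
    enum qdisc_type type;
    uint32_t handle;
    unsigned int num_children;
    struct qdisc *children[MAX_CHILDREN];
};

/* Hypothetical per-link pool of offloaded qdiscs. */
struct link_qdiscs {
    unsigned int count;
    struct qdisc nodes[MAX_QDISCS];
};

static struct qdisc *qdisc_add(struct link_qdiscs *l, enum qdisc_type type,
                               uint32_t handle)
{
    if (l->count == MAX_QDISCS)
        return NULL;
    struct qdisc *q = &l->nodes[l->count++];
    q->type = type;
    q->handle = handle;
    q->num_children = 0;
    return q;
}

static int qdisc_graft(struct qdisc *parent, struct qdisc *child)
{
    assert(parent != child);
    if (parent->num_children == MAX_CHILDREN)
        return -1;
    parent->children[parent->num_children++] = child;
    return 0;
}

static struct qdisc *qdisc_find(struct link_qdiscs *l, uint32_t handle)
{
    for (unsigned int i = 0; i < l->count; i++)
        if (l->nodes[i].handle == handle)
            return &l->nodes[i];
    return NULL;
}

/* Walk the tree rooted at q and count nodes of the given type. */
static unsigned int qdisc_count_type(const struct qdisc *q, enum qdisc_type type)
{
    unsigned int n = (q->type == type);
    for (unsigned int i = 0; i < q->num_children; i++)
        n += qdisc_count_type(q->children[i], type);
    return n;
}
```

With a generic node, adding a new offloadable qdisc kind (such as GRED) needs a new enum value rather than a new set of per-type structures.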
clsact Qdisc: Motivation
[Diagram: Hosts 0-3 and the PHY/MAC connected through the NIC's L2 switch, with a classifier (cls) attached per host]
○ clsact ensures the ability to use GRED
○ GRED allows more granular QoS
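GRED depends on classification because, in the kernel's sch_gred, each packet is assigned to one of up to 16 virtual queues (VQs) using the low bits of skb->tc_index, which a classifier sets; each VQ then runs its own RED instance. The sketch below models only the VQ selection and fallback to a default VQ. A hard per-VQ byte threshold stands in for the full RED calculation, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GRED_MAX_VQS 16

struct vq {
    bool configured;
    uint32_t limit;   /* simplified per-VQ threshold, in bytes */
    uint32_t backlog; /* bytes currently queued in this VQ */
};

struct gred {
    unsigned int num_vqs;
    unsigned int def_vq;  /* fallback VQ for unknown or unconfigured indices */
    struct vq vqs[GRED_MAX_VQS];
};

/* Select a VQ from the classifier result in tc_index, with fallback. */
static struct vq *gred_select(struct gred *g, uint16_t tc_index)
{
    unsigned int idx = tc_index & (GRED_MAX_VQS - 1);
    if (idx >= g->num_vqs || !g->vqs[idx].configured)
        idx = g->def_vq;
    assert(idx < GRED_MAX_VQS);
    return &g->vqs[idx];
}

/* Simplified enqueue: a hard threshold replaces per-VQ RED. */
static bool gred_enqueue(struct gred *g, uint16_t tc_index, uint32_t len)
{
    struct vq *q = gred_select(g, tc_index);
    if (!q->configured)
        return true;  /* simplification: no usable VQ, pass through */
    if (q->backlog + len > q->limit)
        return false;
    q->backlog += len;
    return true;
}
```

Because each VQ has its own threshold, a burst in one traffic class fills only its own VQ and does not cause drops in another.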
clsact Qdisc: Architecture
○ User space: tc qdisc add (mq/red), tc qdisc replace, tc qdisc del, tc qdisc show, tc -s qdisc show; tc filter add (u32), tc filter replace, tc filter del, tc filter show
○ Kernel: sch_ingress.c / sch_red.c / sch_mq.c and cls_u32.c, reaching the driver through ndo_setup_tc
○ Driver (nfp_app.c): app->type->setup_tc
○ Driver, ABM app (main.c): nfp_abm_u32_knode_replace; struct list_head dscp_map; struct nfp_abm, struct nfp_abm_link, struct nfp_repr
○ Driver, ABM app (ctrl.c): nfp_mbox_cmd(), cmsg
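A u32 filter matches a 32-bit word at an offset under a mask: the key hits when `(word & mask) == value`. For DSCP-based classification, the relevant word is the first one of the IPv4 header, where the TOS byte is byte 1 and DSCP is its top 6 bits. The sketch below builds such keys and walks a list of them, in the spirit of the dscp_map list above; the entry type, the band field and the helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One u32-style key: match ((32-bit word at off) & mask) == val.
 * Words are read big-endian (network order) for clarity. */
struct u32_key {
    uint32_t val, mask;
    unsigned int off;
};

/* Hypothetical DSCP-map entry: a key and the band (e.g. a GRED VQ) it picks. */
struct dscp_map_entry {
    struct u32_key key;
    unsigned int band;
};

static uint32_t load_be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Key for a DSCP value: TOS sits in bits 23..16 of the first IPv4 word,
 * and the mask keeps the 6 DSCP bits while ignoring the 2 ECN bits. */
static struct u32_key dscp_key(uint8_t dscp)
{
    assert(dscp < 64);
    struct u32_key k = { .val = (uint32_t)(dscp << 2) << 16,
                         .mask = 0x00fc0000u, .off = 0 };
    return k;
}

/* First matching entry wins; returns -1 if nothing matches. */
static int classify(const struct dscp_map_entry *map, size_t n,
                    const uint8_t *iph, size_t len, unsigned int *band)
{
    for (size_t i = 0; i < n; i++) {
        const struct u32_key *k = &map[i].key;
        if (k->off + 4 > len)
            continue;     /* key would read past the header */
        if ((load_be32(iph + k->off) & k->mask) == k->val) {
            *band = map[i].band;
            return 0;
        }
    }
    return -1;
}
```

For example, EF traffic (DSCP 46, TOS 0xb8) maps to the same band whether or not an ECN bit is set, because the mask excludes the two ECN bits.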
Future Work: Multi-Host BPF Offload
Items Covered
○ Firmware and BPF JIT
  ○ Flow Processing Core datapath
○ cls_bpf and switchdev
  ○ Architecture
○ XDP for multi-host systems
  ○ Problems
  ○ Proposed abstraction
Firmware and JIT Changes
[Diagram: preclassifiers feed the Flow Processing Cores, which run port programs, dynamically loaded helpers and eBPF programs linked by tail calls, with CLS memory and a reorder stage]
○ Preclassifiers are used to isolate flow processing cores per host
○ Lookup in a jump table based on the entry port
  ○ Returns the number of programs to jump to and their locations
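The jump-table lookup described above can be sketched as a per-port table entry holding a program count and program locations, which the datapath then runs in order. In the model below, function pointers stand in for tail calls, and a program can end the chain early with a verdict. All names, the verdict set and the example programs are hypothetical illustrations, not NFP firmware structures.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PORTS 4
#define MAX_CHAIN 4

enum verdict { V_CONTINUE, V_PASS, V_DROP };

struct pkt {
    unsigned int port;  /* entry port */
    unsigned int len;
    unsigned int marks;
};

typedef enum verdict (*prog_fn)(struct pkt *);

/* Hypothetical jump-table entry: how many programs to run for this
 * entry port and where they live (indices into prog_table). */
struct jt_entry {
    unsigned int count;
    unsigned int loc[MAX_CHAIN];
};

static enum verdict prog_mark(struct pkt *p) { p->marks++; return V_CONTINUE; }
static enum verdict prog_drop_small(struct pkt *p)
{
    return p->len < 64 ? V_DROP : V_CONTINUE;
}

static const prog_fn prog_table[] = { prog_mark, prog_drop_small };
#define NUM_PROGS (sizeof(prog_table) / sizeof(prog_table[0]))

/* Look up the entry port, then run its chain until a final verdict. */
static enum verdict run_port(const struct jt_entry jt[NUM_PORTS], struct pkt *p)
{
    assert(p->port < NUM_PORTS);
    const struct jt_entry *e = &jt[p->port];
    for (unsigned int i = 0; i < e->count && i < MAX_CHAIN; i++) {
        assert(e->loc[i] < NUM_PROGS);
        enum verdict v = prog_table[e->loc[i]](p);
        if (v != V_CONTINUE)
            return v;
    }
    return V_PASS;  /* chain exhausted */
}
```

Keying the table on the entry port is what keeps each host's programs separate: a packet from one host never reaches another host's chain.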
cls_bpf on Egress
○ User space: tc qdisc add (mq/red), tc qdisc replace, tc qdisc del, tc qdisc show, tc -s qdisc show; tc filter add (u32/bpf), tc filter replace, tc filter del, tc filter show; prog.o
○ Kernel: sch_ingress.c / sch_red.c / sch_mq.c, cls_bpf.c and verifier.c, reaching the driver through ndo_setup_tc
○ Driver (nfp_app.c): app->type->setup_tc
○ Driver, ABM app (main.c): nfp_abm_setup_cls_block; struct nfp_qdisc_clsact; struct nfp_abm, struct nfp_abm_link, struct nfp_repr
○ Driver, BPF offload (main.c, offload.c): nfp_ndo_bpf, nfp_bpf_verifier_prep, nfp_bpf_translate, nfp_bpf_map_alloc
○ Driver, JIT (jit.c): nfp_bpf_jit_prepare, nfp_bpf_jit
○ Driver, control messages (cmsg.c, ctrl.c): nfp_bpf_ctrl_alloc_map, cmsg, nfp_mbox_cmd()
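Map allocation (nfp_bpf_map_alloc, nfp_bpf_ctrl_alloc_map) goes to the firmware as a control message, and the driver has to match each reply to its request. The sketch below shows that request/reply pattern with a tag for correlation. The wire layout, field sizes, message type and function names are all hypothetical and are not the real NFP control-message format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real NFP format.
 * Header: type (1 byte), version (1 byte), tag (2 bytes, big-endian). */
enum {
    CMSG_TYPE_MAP_ALLOC = 1,
    CMSG_VERSION = 1,
    CMSG_HDR_LEN = 4,
    CMSG_ALLOC_REQ_LEN = CMSG_HDR_LEN + 12,  /* key, value, max_entries */
    CMSG_ALLOC_REPLY_LEN = CMSG_HDR_LEN + 8, /* rc, table id */
};

struct map_alloc_req {
    uint32_t key_size, value_size, max_entries;
};

static void put_be16(uint8_t *p, uint16_t v) { p[0] = v >> 8; p[1] = v & 0xff; }
static void put_be32(uint8_t *p, uint32_t v)
{
    put_be16(p, v >> 16);
    put_be16(p + 2, v & 0xffff);
}
static uint16_t get_be16(const uint8_t *p) { return (uint16_t)(p[0] << 8 | p[1]); }
static uint32_t get_be32(const uint8_t *p)
{
    return (uint32_t)get_be16(p) << 16 | get_be16(p + 2);
}

/* Build a map-allocation request. Returns bytes written, 0 on error. */
static size_t cmsg_map_alloc_encode(uint8_t *buf, size_t len, uint16_t tag,
                                    const struct map_alloc_req *req)
{
    assert(buf != NULL && req != NULL);
    if (len < CMSG_ALLOC_REQ_LEN)
        return 0;
    buf[0] = CMSG_TYPE_MAP_ALLOC;
    buf[1] = CMSG_VERSION;
    put_be16(buf + 2, tag);
    put_be32(buf + 4, req->key_size);
    put_be32(buf + 8, req->value_size);
    put_be32(buf + 12, req->max_entries);
    return CMSG_ALLOC_REQ_LEN;
}

/* Validate a reply against the request tag; on success, return the
 * firmware-assigned table id through *tid. */
static int cmsg_map_alloc_reply(const uint8_t *buf, size_t len, uint16_t tag,
                                uint32_t *tid)
{
    if (len < CMSG_ALLOC_REPLY_LEN || buf[0] != CMSG_TYPE_MAP_ALLOC ||
        buf[1] != CMSG_VERSION)
        return -1;
    if (get_be16(buf + 2) != tag)
        return -1;            /* reply belongs to a different request */
    if (get_be32(buf + 4) != 0)
        return -1;            /* firmware reported an error */
    *tid = get_be32(buf + 8);
    return 0;
}
```

Tags let several requests be outstanding at once, because each reply carries enough information to find its waiting request.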
XDP on Multi-Host Systems
○ Challenges that have to be solved
  ○ XDP is an RX-exclusive hook
  ○ Heterogeneous architecture support is nascent
  ○ Security
○ However, more and more of the potential problems are falling away
  ○ e.g. David Ahern's recent work on exposing the FIB table
XDP on Multi-Host Systems: Proposed Abstraction
[Diagram: Hosts 0-3 and the PHY/MAC connected through the NIC, with a classifier (cls) per host and an XDP program on every port]
○ XDP attached to ingress at each port
○ redirect() + FIB table access allows conventional + unconventional switching
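The proposed per-port XDP programs would combine a FIB lookup with a redirect to do switching. In real BPF code those would be the bpf_fib_lookup() helper (from the FIB-exposure work mentioned above) and bpf_redirect(); the userspace model below replaces them with a linear longest-prefix match and a returned egress port. The action names, table layout, and the policies (pass to the host stack on a FIB miss, drop instead of hairpinning) are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum action { ACT_DROP, ACT_PASS, ACT_REDIRECT };

/* One FIB entry: IPv4 prefix (host byte order), prefix length, egress port. */
struct fib_entry {
    uint32_t prefix;
    unsigned int plen;
    unsigned int oif;
};

static uint32_t prefix_mask(unsigned int plen)
{
    assert(plen <= 32);
    return plen == 0 ? 0 : 0xffffffffu << (32 - plen); /* avoid shift by 32 */
}

/* Longest-prefix match over a small table; NULL when nothing matches. */
static const struct fib_entry *fib_lookup(const struct fib_entry *fib, size_t n,
                                          uint32_t daddr)
{
    const struct fib_entry *best = NULL;
    for (size_t i = 0; i < n; i++) {
        uint32_t m = prefix_mask(fib[i].plen);
        if ((daddr & m) == (fib[i].prefix & m) &&
            (!best || fib[i].plen > best->plen))
            best = &fib[i];
    }
    return best;
}

/* Model of one port's ingress program: redirect on a FIB hit, hand
 * misses to the host stack, and refuse to hairpin back out the same port. */
static enum action port_prog(const struct fib_entry *fib, size_t n,
                             unsigned int in_port, uint32_t daddr,
                             unsigned int *out_port)
{
    const struct fib_entry *e = fib_lookup(fib, n, daddr);
    if (!e)
        return ACT_PASS;
    if (e->oif == in_port)
        return ACT_DROP;
    *out_port = e->oif;
    return ACT_REDIRECT;
}
```

Conventional L3 switching is just this table; "unconventional" switching would replace or augment the lookup with arbitrary program logic before choosing the egress port.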
Summary
○ Proposing a fully flexible datapath for a multi-host NIC
  ○ Achieved through a combination of switchdev, qdisc offload, cls_bpf and XDP
○ Work in progress
  ○ Switchdev architecture and qdisc offload have been upstreamed
  ○ Next is simple clsact support
  ○ Followed by cls_bpf & XDP
○ Provides potential for BPF-defined pipelines in heterogeneous architectures
○ See Jakub's talk at the microconference for more!
Credit (Team)
○ Jakub Kicinski
○ Jiong Wang
○ Quentin Monnet
○ David Beckett
○ Edwin Peer
○ Johan Moraal
○ Mary Pham
Thank"you!
DiscussionQuestions/Comments