direct universal access: making data center resources available … · 2019-03-12 · direct...


Direct Universal Access: Making Data Center Resources Available to FPGA

Ran Shu (1), Peng Cheng (1), Guo Chen (2), Zhiyuan Guo (1, 3), Lei Qu (1), Yongqiang Xiong (1), Derek Chiou (4), Thomas Moscibroda (4)

(1) Microsoft Research, (2) Hunan University, (3) Beihang University, (4) Microsoft Azure

FPGA Deployment in Data Centers

• Wide deployment
  – Major cloud service providers
    • Microsoft, Amazon, Facebook, Alibaba, Tencent, Baidu, IBM, etc.

• Accelerated applications
  – Computation
    • Web search ranking
    • Deep neural networks
    • Big data analytics
  – Networking
    • Network processing
  – Database/Storage
    • SQL
    • Key-value store

Image from A. Caulfield et al., MICRO 2016

Image from D. Firestone et al., NSDI 2018

Resource Access Requirements

• Heterogeneous resources

– CPU

– Memory

– Other FPGAs

– GPU

– SSD

[Figure: servers attached over Ethernet to the data center network fabric. Inside each server, a PCIe fabric connects the host CPU, DRAM, SSD, GPU, NIC, and FPGA boards, and each FPGA board carries its own onboard DRAM.]

Image from A. Putnam et al., ISCA 2014

FPGA Board in Data Center

[Figure: photo of an FPGA board with labeled components: FPGA chip, QSFP, DDR4, PCIe Gen 3]

Image from https://www.cnet.com/news/microsoft-project-brainwave-speeds-ai-with-fpga-chips-on-azure-build-conference/

Current FPGA Communication Architecture

[Figure, two builds: the FPGA board provides QSFP, DDR4, and PCIe Gen 3. On the FPGA chip, the QSFP IP, DDR4 IP, and PCIe Gen 3 IP sit at the bottom; the DDR stack, LTL (which includes DCQCN), NVMe, and host DMA sit above them; the applications sit on top. The first build labels the levels as physical layer, data link layer, transport layer, and application layer. The second build labels the physical layer and MAC layer and marks one layer "???".]

Image from L. Zhang et al., CCR 2014

Problem #1 – Programming Interface

[Figure: Application 1 and Application 2 on the FPGA chip, above the DDR stack, LTL, NVMe stack, and host DMA, which use the DDR IP, Ethernet IP, and PCIe IP]

Lines of code to use each interface:

• Host DRAM: 294
• Host CPU program: 205
• Onboard DRAM: 517
• Remote FPGA: 1356

Problem #2 – Accessibility

[Figure: applications on the FPGA chip above the DDR stack, LTL, NVMe stack, and host DMA (on the DDR IP, Ethernet IP, and PCIe IP), with resources split across three naming spaces]

• Data center naming space
• Server-area naming space
• On-board naming space

Problem #3 – Multiplexing

[Figure, three builds: Application 1 and Application 2 on the FPGA chip share the DDR stack, LTL, NVMe stack, and host DMA (on the DDR IP, Ethernet IP, and PCIe IP). Each build adds a separate multiplexer: a host DMA mux/demux, then an LTL mux/demux, then a PCIe mux/demux.]

Problem #4 – Security

[Figure, two builds on the same FPGA chip (DDR stack, LTL, NVMe stack, host DMA over the DDR IP, Ethernet IP, and PCIe IP) attached to the PCIe fabric:
  • A malicious application on the FPGA accesses an unauthorized address in host memory.
  • An attack application and a malicious CPU program target a victim application.]

Existing Problems

• Complex programming interface

• Separate naming space

• No general multiplexing

• Security issue


Direct Universal Access

[Figure: FPGA 1, FPGA 2, and FPGA 3 in servers joined by the datacenter networking fabric (QSFP) and the intra-server networking fabric (PCIe Gen3, shared with the CPU). Each FPGA runs a DUA layer above its LTL, DDR access, FPGA Connect, and host DMA stacks. App 1 and App 2 reach resources through DUA, with numbered steps marking an access path.]

DUA Overview

[Figure, repeated across five builds: servers joined by the datacenter networking fabric (QSFP); inside a server, FPGAs, DDR, and the CPU are joined by the intra-server networking fabric (PCIe Gen3). On each FPGA, apps sit on a DUA layer above the existing stacks: LTL, DDR access, FPGA Connect, and host DMA.]

• DUA is an “IP layer”
  – An abstract overlay network
  – Leverages all existing h/w stacks
  – Hierarchical addressing & routing
• Efficient routing
  – Direct resource access by the FPGA, totally bypassing the CPU
• Compatible BSD-socket interface
  – For both applications and communication stacks
• General (unified) multiplexing
  – For both applications and communication stacks
• Security
  – Protects against both inside and outside attacks

System Architecture

[Figure, two builds: servers joined by the datacenter networking fabric, each with a CPU, a NIC, and FPGAs on the intra-server networking fabric. On each FPGA, apps sit above the DUA overlay, the stacks (FPGA host stack, NVMe stack, LTL, FPGA Connect, host DMA, DDR controller), and the DUA underlay, which attaches to QSFP, PCIe Gen3, and DDR.]

• DUA data plane: DUA overlay, stacks, and DUA underlay on the FPGA
• DUA control plane: CPU control agent and FPGA control agent

DUA Control Plane

• Challenge: large-scale resource and routing information dissemination
  – Limited h/w resources

• DUA solution
  – Hierarchical addressing
  – Hierarchical routing
  – Leverage existing infrastructure

• Fully distributed and lightweight
  – Needs no global synchronization

Resource table

UID (serverID:deviceID)   Address/port          Resource description
192.168.0.2:1             0x00000001CFFFF000    1st block of host DRAM
192.168.0.2:1             0x000000019FFFF000    2nd block of host DRAM
192.168.0.2:2             0x80000000            1st block of FPGA onboard
192.168.0.2:3             8000                  1st application on FPGA
192.168.0.2:3             8001                  2nd application on FPGA

Interconnection table

Src resource (UID)        Dst resource (UID) / Stack
FPGA 1 (192.168.0.2:1)    FPGA 2 (192.168.0.2:2) / FPGA Connect
                          Host DRAM (192.168.0.2:3) / DMA
                          Onboard DRAM (192.168.0.2:4) / DDR
FPGA 2 (192.168.0.2:2)    FPGA 1 (192.168.0.2:1) / FPGA Connect
                          Host DRAM (192.168.0.2:3) / DMA
                          Resources on other servers (*:*) / LTL
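The interconnection table on the control-plane slide can be read as a first-match lookup from a destination UID to a communication stack. The following is a minimal sketch of that reading, not the DUA implementation: the table rows are copied from the slide, while the names `INTERCONNECTION`, `split_uid`, `matches`, and `select_stack` and the first-match rule are assumptions made for illustration.

```python
# Interconnection table as shown on the slide:
# source UID -> ordered list of (destination UID pattern, stack).
INTERCONNECTION = {
    "192.168.0.2:1": [  # FPGA 1
        ("192.168.0.2:2", "FPGA Connect"),
        ("192.168.0.2:3", "DMA"),
        ("192.168.0.2:4", "DDR"),
    ],
    "192.168.0.2:2": [  # FPGA 2
        ("192.168.0.2:1", "FPGA Connect"),
        ("192.168.0.2:3", "DMA"),
        ("*:*", "LTL"),  # resources on other servers
    ],
}


def split_uid(uid):
    """Split 'serverID:deviceID' at the last colon."""
    server_id, device_id = uid.rsplit(":", 1)
    return server_id, device_id


def matches(pattern, uid):
    """A pattern field matches if it is '*' or equal to the UID field."""
    p_server, p_device = split_uid(pattern)
    server, device = split_uid(uid)
    return p_server in ("*", server) and p_device in ("*", device)


def select_stack(src_uid, dst_uid):
    """Return the stack of the first matching row (assumed rule), or None."""
    for pattern, stack in INTERCONNECTION.get(src_uid, []):
        if matches(pattern, dst_uid):
            return stack
    return None


local = select_stack("192.168.0.2:2", "192.168.0.2:1")   # same server
remote = select_stack("192.168.0.2:2", "192.168.0.9:1")  # another server
```

Because the wildcard row comes last, specific same-server entries win over the catch-all LTL route. FPGA 1 has no wildcard row on the slide, so in this sketch a lookup for a remote destination from FPGA 1 returns `None`.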

DUA Data Plane

• Overlay
  – Unified interface
  – Routing

• Stacks
  – Leverage all the existing (or adopt future) stacks

• Underlay
  – Efficient multiplexing
  – Security

[Figure: on the FPGA, apps sit on the DUA data plane, which consists of the DUA overlay, the stacks (FPGA host stack, NVMe stack, LTL, FPGA Connect, host DMA, DDR access), and the DUA underlay attached to QSFP, PCIe Gen3, and DDR]
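The slides assign multiplexing and security to the underlay but do not describe the mechanism. The sketch below shows one way such a layer could look, under loudly stated assumptions: the class `Underlay`, the per-stack address ranges, and the round-robin order are all hypothetical and chosen only to illustrate sharing one physical interface among several stacks while rejecting out-of-range accesses.

```python
from collections import deque


class Underlay:
    """Hypothetical underlay: filters requests, then shares one interface."""

    def __init__(self, allowed_ranges):
        # allowed_ranges: stack name -> list of (start, end) ranges, end exclusive.
        self.allowed_ranges = allowed_ranges
        self.queues = {stack: deque() for stack in allowed_ranges}

    def is_allowed(self, stack, addr, length):
        """True if [addr, addr + length) lies inside one range of this stack."""
        return any(
            start <= addr and addr + length <= end
            for start, end in self.allowed_ranges.get(stack, [])
        )

    def submit(self, stack, addr, length):
        """Queue a request, or reject it if it touches an unauthorized address."""
        if not self.is_allowed(stack, addr, length):
            return False
        self.queues[stack].append((addr, length))
        return True

    def drain_round_robin(self):
        """Take one pending request per stack in turn (assumed policy)."""
        order = []
        while any(self.queues.values()):
            for stack, queue in self.queues.items():
                if queue:
                    order.append((stack, *queue.popleft()))
        return order


underlay = Underlay({
    "Host DMA": [(0x1000, 0x2000)],
    "NVMe Stack": [(0x8000, 0x9000)],
})
accepted = underlay.submit("Host DMA", 0x1000, 0x100)
rejected = underlay.submit("Host DMA", 0x1F80, 0x100)  # crosses the range end
underlay.submit("Host DMA", 0x1100, 0x10)
underlay.submit("NVMe Stack", 0x8000, 0x10)
schedule = underlay.drain_round_robin()
```

Round-robin is used here only because it is the simplest fair policy; any other arbitration scheme would fit the same structure.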

DUA Data Plane – Overlay

• Efficient & extensible design
  – Switch fabric
    • High-capacity crossbar switch
  – Connector
    • All cached routing tables
  – Translator
    • Protocol translation

• High-performance data path
  – Line-rate
  – Near-zero delay

[Figure: inside the DUA overlay, a switch fabric sits in the middle. Connectors attach the apps and the FPGA CA on one side; on the other side, connectors and translators (LTL translator, DDR translator, host DMA translator) attach the LTL, DDR, FPGA Connect, and host DMA stacks.]
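The three overlay components named on the slide can be sketched in software to show how they relate. This is an illustrative model only, not the hardware design: the classes `Message`, `Translator`, `SwitchFabric`, and `Connector`, the port names, and the request format are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Message:
    dst_uid: str    # destination resource, e.g. "192.168.0.2:4"
    payload: bytes


class Translator:
    """Turns a DUA message into a stack-specific request (format is made up)."""

    def __init__(self, stack_name):
        self.stack_name = stack_name

    def translate(self, msg):
        return {"stack": self.stack_name, "dst": msg.dst_uid, "data": msg.payload}


class SwitchFabric:
    """Crossbar model: any connector can reach any attached output port."""

    def __init__(self):
        self.ports = {}

    def attach(self, port, translator):
        self.ports[port] = translator

    def forward(self, port, msg):
        return self.ports[port].translate(msg)


class Connector:
    """Entry point that resolves the output port from a cached routing table."""

    def __init__(self, fabric, cached_routes):
        self.fabric = fabric
        self.cached_routes = dict(cached_routes)  # dst UID -> output port

    def send(self, msg):
        port = self.cached_routes.get(msg.dst_uid)
        if port is None:
            raise LookupError(f"no cached route for {msg.dst_uid}")
        return self.fabric.forward(port, msg)


fabric = SwitchFabric()
fabric.attach("ddr", Translator("DDR"))
fabric.attach("host_dma", Translator("Host DMA"))
app_connector = Connector(
    fabric, {"192.168.0.2:4": "ddr", "192.168.0.2:3": "host_dma"}
)
request = app_connector.send(Message("192.168.0.2:4", b"\x01\x02"))
```

Keeping the routing table in the connector means the lookup happens once at the edge, and the fabric only moves data between ports, which is consistent with the slide's "all cached routing tables" note.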

Evaluation – efficiency

Extremely low latency (< 50 ns per forward)

[Figure: test setup with FPGA 1 through FPGA 4; the apps sit on DUA overlays, and the FPGAs are linked through FPGA Connect and LTL]

[Chart: latency (µs) versus packet size (B); legend: DUA, LTL, FPGA Connect]

Data labels on the chart, one row per series:

Packet size (B)    64       128      256
                   0.153    0.176    0.196
                   1.266    1.266    1.316
                   2.444    2.479    2.545

The round trip passes through FPGA Connect and DUA four times each and through LTL twice.
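The "< 50 ns per forward" claim can be checked against the chart labels with a short calculation. The sketch assumes that the smallest row of labels (0.153, 0.176, 0.196 µs) is the DUA series, which the transcript does not state explicitly, and uses the slide's note that the round trip crosses DUA four times.

```python
# Assumption: the smallest data-label row on the chart is the DUA series.
PACKET_SIZES_B = [64, 128, 256]
DUA_TOTAL_NS = [153, 176, 196]   # 0.153, 0.176, 0.196 µs expressed in ns
DUA_TRAVERSALS = 4               # the round trip passes through DUA four times

# Per-traversal latency for each packet size.
per_forward_ns = {
    size: total / DUA_TRAVERSALS
    for size, total in zip(PACKET_SIZES_B, DUA_TOTAL_NS)
}

for size, latency in per_forward_ns.items():
    print(f"{size:>3} B: {latency:.2f} ns per forward")
```

Under that assumption every packet size stays below 50 ns per traversal, which is consistent with the headline figure on the slide.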

Evaluation – Logic Overhead

DUA overlay

• 2 ports: 4.24%
• 4 ports: 9.29%
• 8 ports: 19.86%

DUA underlay

• 4 stacks and 3 PHY interfaces: 0.25%
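Dividing the overlay figures by the port count gives a rough sense of how the logic cost scales. This is simple arithmetic on the numbers from the slide; the variable names are illustrative, and the per-port reading is an interpretation rather than a result reported in the deck.

```python
# Overlay logic overhead from the slide: port count -> percent of FPGA logic.
overlay_overhead_pct = {2: 4.24, 4: 9.29, 8: 19.86}

# Overhead per port, rounded to two decimals.
per_port_pct = {
    ports: round(pct / ports, 2)
    for ports, pct in overlay_overhead_pct.items()
}

for ports, pct in per_port_pct.items():
    print(f"{ports} ports: {pct:.2f}% per port")
```

The per-port cost rises only slightly as ports are added, so over this range the overlay overhead grows a little faster than linearly with the number of ports.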

Evaluation – Deep Crossing

45.28% Latency Reduction

[Figure: Deep Crossing model built from SPMV and DMV stages, with matrix sizes 640 × 64, 64 × 640, 640 × 128, and 128 × 640]

Single FPGA board: parallelism = 32. Two FPGA boards: parallelism = 64.

Evaluation – Regex Matching

• Up to 10^5~10^7 times higher throughput than CPU and up to 10^5 times lower latency than CPU
• Up to 3 times the throughput and up to 55% latency reduction compared to using the CPU to move data between FPGAs

[Chart: throughput (GB/s) versus input string length (bytes, 0 to 18000); series: through FPGA Connect, through CPU, pure CPU]

[Chart: latency (µs, logarithmic axis from 1.00E+00 to 1.68E+07) versus input string length (bytes, 64 to 16384); series: through FPGA Connect, through CPU, pure CPU]

Conclusion

• Current FPGA communication architecture
  – No universal access

• DUA: builds the “IP” layer for FPGAs in the data center
  – Leverages the existing data center network
  – Efficient routing
  – Compatible BSD-socket interface
  – Unified multiplexing
  – Security

• Open source soon

Thank you! Questions?