TRANSCRIPT
Inference Architectures @Xilinx
Graham Schelle, PhD, Principal Engineer, Xilinx Research Labs
Xilinx Headlines
Twitch Chooses Xilinx to Enable its Broadcast-quality Livestream of eSports
Agenda
˃ Xilinx Adaptive Architectures
˃ Inference Architectures
˃ Open Source
Xilinx Adaptive Architectures
Traditionally, FPGAs targeted massively data-parallel applications.
In 2011, Zynq was introduced (ZU+ in 2015): ARM CPUs added for embedded applications.
Xilinx Adaptive Architectures – Alveo & Versal
In 2018, Alveo was introduced: accelerator cards for data center workloads.
Coming in 2019, the Versal platform: an adaptive compute acceleration platform (ACAP) with AI Engines, a Network on Chip, and Cortex-A72 CPUs.
Inference Architectures
Inference Architectures – Evolving Frameworks
[Slide: Andrej Karpathy on Twitter]
˃ Increasing, Evolving Workloads
  New acceleration needs & algorithms
  ML “infused” in many applications
  Adaptable HW is a key benefit
˃ Move to Lower Precision
  ML inference is moving to INT8 & lower
  Better Perf/W with similar accuracy
  Xilinx devices natively support variable precision (see the sketch after this list)
˃ Compressed Networks
  Higher performance with reduced compute / memory needs
  Pruning & load balancing to match network requirements
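To make the lower-precision point concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain NumPy. It is illustrative only, not Xilinx's actual quantization flow, and the function names are ours:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: map float32 onto [-127, 127]."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Quantizing random "weights" loses little precision per value, while
# shrinking storage and multiplier width by roughly 4x versus float32.
w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
print("mean abs quantization error:", np.abs(w - dequantize(q, s)).mean())
```

That 4x reduction in memory traffic and arithmetic width is where the Perf/W gain at similar accuracy comes from.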
Inference Architectures – Evolving Workloads
1. Inference is hard
2. Huge variation in compute and memory requirements
3. Models typically don’t fit into cache
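Since models typically don't fit into cache, compression matters. Below is a minimal sketch of magnitude pruning, the technique named in the previous slide's "Compressed Networks" bullet (and studied in the Han et al. papers cited on the DeePhi slide later in this deck): zero out the smallest weights so less of the model must be stored and moved. Illustrative only, not DeePhi's or Xilinx's actual tooling:

```python
import numpy as np

def prune_by_magnitude(w, sparsity=0.9):
    """Zero the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(w.size * sparsity)
    threshold = np.partition(np.abs(w).ravel(), k)[k]
    return np.where(np.abs(w) < threshold, 0.0, w)

w = np.random.randn(256, 256).astype(np.float32)
wp = prune_by_magnitude(w)
print(f"weights kept: {np.count_nonzero(wp) / w.size:.1%}")  # ~10%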
Inference Architectures – Precision vs Power
[Figure: LSTM test error [%] (y-axis, 8-20) vs. estimated power consumption [W] (x-axis, 0.500-2.000) for variable weight/activation bit widths (W/A) 2/3, 3/3, 2/4, 3/4, 4/4, 2/8, 3/8, 4/8, 8/8, with the Pareto-optimal configurations highlighted. Target device ZU7EV; ambient temperature 25 °C; 12.5% toggle rate; 0.5 static probability; power reported for the PL accelerated block only. A companion chart compares FPGA vs. ASIC.]
Source: Bill Dally (Stanford), Cadence Embedded Neural Network Summit, February 1, 2017
Rybalkin, V., Pappalardo, A., Ghaffar, M.M., Gambardella, G., Wehn, N. and Blott, M., “FINN-L: Library Extensions and Design Trade-off Analysis for Variable Precision LSTM Networks on FPGAs.”
Michaela Blott, Hot Chips 2018 Tutorial, “Overview of Deep Learning and Computer Architectures for Accelerating DNNs”
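The "Pareto Optimal" label in the figure marks bit-width configurations that no other configuration beats on both power and accuracy at once. A small sketch of that selection rule, using made-up (power, error) numbers rather than the FINN-L data:

```python
# Hypothetical (power [W], test error [%]) per weight/activation bit width.
configs = {
    "2/3": (0.55, 19.0),
    "3/3": (0.70, 14.5),
    "2/4": (0.80, 15.0),   # dominated by 3/3: more power AND more error
    "4/4": (0.95, 11.0),
    "8/8": (1.90, 10.2),
}

def pareto_front(points):
    """Keep a point only if no other point is at least as good on both axes."""
    return {
        name: (p, e)
        for name, (p, e) in points.items()
        if not any(
            p2 <= p and e2 <= e and (p2, e2) != (p, e)
            for p2, e2 in points.values()
        )
    }

print(sorted(pareto_front(configs)))  # ['2/3', '3/3', '4/4', '8/8']
```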
Xilinx Cloud Inference - ML Suite Overlays with xDNN
https://github.com/Xilinx/ml-suite
On-prem and cloud boards
• Built in Programmable Logic
• High utilization; throughput or latency variants
• CPU offload for new layer exploration
xDNN w/ xfDNN Compiler
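The "CPU offload for new layer exploration" bullet describes a split execution model: layers the xDNN overlay implements run in programmable logic, and anything else falls back to the host CPU. A conceptual sketch of that dispatch, with entirely hypothetical names (this is not the ml-suite API):

```python
# Hypothetical dispatch: FPGA overlay for supported ops, CPU fallback otherwise.
SUPPORTED_ON_FPGA = {"conv", "relu", "pool"}

def run_network(layers, fpga_exec, cpu_exec, x):
    for layer in layers:
        backend = fpga_exec if layer["type"] in SUPPORTED_ON_FPGA else cpu_exec
        x = backend(layer, x)  # new/experimental layer types still run, on CPU
    return x
```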
Xilinx Edge Inference - DeePhi
“Learning both Weights and Connections for Efficient Neural Networks”, NeurIPS 2015
“EIE: Efficient Inference Engine on Compressed Deep Neural Network”, ISCA 2016
“ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA”, FPGA 2017
Cloud & Edge Integration
Xilinx and Open Source
PYNQ
Xilinx Runtime for PCIe Attached FPGAs
Quantized Neural Networks
…More on www.github.com/Xilinx
Python is increasingly the Language of Choice
Top Programming Languages, IEEE Spectrum, July ’17: Python is the fastest growing language, driven by data science, AI, ML and academia
Top Programming Languages, IEEE Spectrum, July ’18: Python is listed as an embedded language for the first time
https://spectrum.ieee.org/at-work/innovation/the-2018-top-programming-languages
https://stackoverflow.blog/2017/09/06/incredible-growth-python/
PYNQ: Python Productivity for Zynq
˃ Jupyter notebooks, browser-based interface
˃ PYNQ enables JupyterLab on Zynq and ZU+
˃ FPGA designs delivered as Python packages
˃ Delivered as SD Card image
[Stack diagram: Jupyter web server and IPython kernel on Ubuntu-based Linux, running on the ARM A9 / A53 CPUs, with overlays/designs loaded into the ZU+ fabric]
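A minimal sketch of the flow this stack enables, as typically typed into a Jupyter notebook running on the board; "base.bit" is the base overlay shipped with the PYNQ image, and the exact IP names vary by board:

```python
from pynq import Overlay

# Loading an overlay reprograms the FPGA fabric from Python.
overlay = Overlay("base.bit")

# Each IP block in the design is exposed to Python through the overlay object.
print(list(overlay.ip_dict.keys()))
```

Because overlays are delivered as ordinary Python packages, a new FPGA design installs like any other library.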
PYNQ Community – ML, Non-ML & Academic Partners
Xilinx open source engagements related to today’s TVM meeting
MicroPython
University of Washington
UC Berkeley
UC San Diego
Xilinx Research
Finally, Xilinx is building new open source communities…
pynq.io/community
DAC2019 Design Contest
OpenHW Design Contest
Cloud Free Trials
Summary
˃ Xilinx: Great for exploring and deploying inference
˃ Xilinx Open Source: We’re actively engaging with TVM and other communities
Email: [email protected]
Visit: Boulder, Colorado
Adaptable. Intelligent.
Edge to Cloud Inference – Automotive
Inference Acceleration - Edge
˃ At The Edge
[Diagram: ADAS/AD central module fed by vehicle sensors - surround-view cameras (front, back, left, right), a forward-looking camera, a drive monitor camera, multiple short-range radars, and a long-range lidar]
Edge to Cloud Inference – Xilinx Platforms
PYNQ-Z1
ZCU102
ZCU104
Ultra96
Edge Devices: Custom I/O, ARM CPUs
Cloud Platforms: Power Efficient, PCIe, Networking
Edge to Cloud Inference – IIoT Latency/Data Example
Example IIoT Control Rates
Distance NYC to LA: 2,800 miles
Speed of light: 186,000 miles/s
Round trip: 2 × 2,800 / 186,000 ≈ 30 ms
Required control rate: 10 ms
The speed-of-light round trip alone already misses the 10 ms control budget, so inference must run at the edge.
E.g. Power Plant @ 8 TB/month
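Worked out, the slide's arithmetic is below; even an idealized speed-of-light link misses the control loop before any compute time is added:

```python
distance_miles = 2_800        # NYC to LA
c_miles_per_s = 186_000       # speed of light
round_trip_ms = 2 * distance_miles / c_miles_per_s * 1_000
print(f"{round_trip_ms:.0f} ms round trip vs 10 ms control budget")  # ~30 ms
```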