
Inference Architectures @Xilinx

Graham Schelle, PhD, Principal Engineer, Xilinx Research Labs

Xilinx Headlines

Twitch Chooses Xilinx to Enable its Broadcast-quality Livestream of eSports


Agenda

˃ Xilinx Adaptive Architectures

˃ Inference Architectures

˃ Open Source

Xilinx Adaptive Architectures

Traditionally, FPGAs targeted massively data-parallel applications

In 2011, Zynq was introduced (Zynq UltraScale+ in 2015): ARM CPUs added for embedded applications

Xilinx Adaptive Architectures – Alveo & Versal

In 2018, Alveo was introduced: accelerator cards for data center workloads

Coming in 2019, Versal: an adaptive compute acceleration platform (ACAP) with AI Engines, a Network on Chip, and Cortex-A72s


Inference Architectures

Inference Architectures – Evolving Frameworks

Andrej Karpathy on Twitter

˃ Increasing, Evolving Workloads
  New acceleration needs & algorithms
  ML “infused” in many applications
  Adaptable HW a key benefit

˃ Move to Lower Precision (see the sketch after this list)
  ML inference moving to INT8 & lower
  Better Perf/W with similar accuracy
  Xilinx devices natively support variable precision

˃ Compressed Networks
  Higher performance with reduced compute / memory needs
  Pruning & load balancing to match network requirements
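To make the lower-precision bullet concrete, here is a minimal sketch of symmetric per-tensor INT8 post-training quantization in plain NumPy. It is illustrative only and is not the Xilinx ML Suite or FINN quantization flow; the function name and the toy weight matrix are invented for this example.

import numpy as np

def quantize_int8(weights):
    # Symmetric per-tensor quantization: w ~= scale * q, with q an int8 in [-127, 127].
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(64, 128).astype(np.float32)   # toy FP32 weight matrix
q, scale = quantize_int8(w)
reconstructed = q.astype(np.float32) * scale
print("worst-case absolute error:", np.max(np.abs(w - reconstructed)))  # bounded by scale / 2

Dropping from 32-bit floats to 8 bits or fewer shrinks weight storage by 4x or more and lets fixed-point multipliers replace floating-point ones, which is where the Perf/W gain in programmable logic comes from.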

Inference Architectures – Evolving Workloads

1. Inference is hard
2. Huge variation in compute and memory requirements
3. Models typically don’t fit into cache
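Point 3 above is what the compressed-network bullet targets: pruning removes most weights so less has to be stored and moved. Below is a minimal, generic magnitude-pruning sketch in NumPy; it is not the DeePhi/Xilinx pruning tool, and the names and the 80% sparsity target are purely illustrative.

import numpy as np

def magnitude_prune(weights, sparsity):
    # Zero out the smallest-magnitude weights until `sparsity` fraction of them are zero.
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

w = np.random.randn(256, 256).astype(np.float32)
pruned, mask = magnitude_prune(w, sparsity=0.8)
print("fraction of weights kept:", mask.mean())    # ~0.2
print("dense FP32 size:", w.nbytes // 1024, "KB")  # 256 KB
# Stored sparsely as (index, value) pairs of ~8 bytes each, the ~13k surviving
# weights need roughly 100 KB versus 256 KB dense, before any retraining or
# further quantization.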

Inference Architectures – Precision vs Power

[Chart: LSTM test error (%) vs. estimated power consumption (W) for FPGA and ASIC implementations at different weight/activation bit widths (W/A = 2/3, 2/4, 2/8, 3/3, 3/4, 3/8, 4/4, 4/8, 8/8), with the Pareto-optimal points marked. Target device ZU7EV; ambient temperature 25 °C; 12.5% toggle rate; 0.5 static probability; power reported for the PL accelerated block only.]

Source: Bill Dally (Stanford), Cadence Embedded Neural Network Summit, February 1, 2017

Rybalkin, V., Pappalardo, A., Ghaffar, M.M., Gambardella, G., Wehn, N. and Blott, M., “FINN-L: Library Extensions and Design Trade-off Analysis for Variable Precision LSTM Networks on FPGAs.”

Michaela Blott, Hot Chips 2018 Tutorial, “Overview of Deep Learning and Computer Architectures for Accelerating DNNs”

Xilinx Cloud Inference – ML Suite Overlays with xDNN

https://github.com/Xilinx/ml-suite
On-prem and cloud boards

• Built in Programmable Logic
• High Utilization, Throughput or Latency Variants
• CPU offload for new layer exploration

xDNN with the xfDNN Compiler

Xilinx Edge Inference – DeePhi

“Learning both Weights and Connections for Efficient Neural Networks”, NeurIPS 2015

“EIE: Efficient Inference Engine on Compressed Deep Neural Network”, ISCA 2016

“ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA”, FPGA 2017

Cloud & Edge Integration


Xilinx and Open Source

PYNQ
Xilinx Runtime for PCIe Attached FPGAs
Quantized Neural Networks

…More on www.github.com/Xilinx

Python is increasingly the Language of Choice

Top Programming Languages, IEEE Spectrum, July ’17 vs. July ’18
https://spectrum.ieee.org/at-work/innovation/the-2018-top-programming-languages
In the 2018 ranking, Python is listed as an embedded language for the first time

Python is the fastest growing language, driven by data science, AI, ML and academia
https://stackoverflow.blog/2017/09/06/incredible-growth-python/

PYNQ: Python Productivity for Zynq

Jupyter notebooks, browser-based interface
PYNQ enables JupyterLab on Zynq and ZU+

The PYNQ stack:
Ubuntu-based Linux
Jupyter web server
IPython kernel
ARM A9 / A53
Overlays/designs
ZU+ Fabric

Delivered as an SD card image
FPGA designs delivered as Python packages (see the sketch below)
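As a concrete example of the last point, a hardware design packaged with PYNQ can be loaded and driven from a notebook in a few lines. This is a minimal sketch assuming a PYNQ-Z1/Z2-style board image that ships the stock base overlay; the printout is just a way to see which IP blocks the overlay exposes.

from pynq.overlays.base import BaseOverlay

base = BaseOverlay("base.bit")   # downloads the bitstream into the programmable logic
base.leds[0].toggle()            # drive a board LED through the overlay's GPIO driver
print(list(base.ip_dict.keys())) # IP blocks in the design, exposed as Python attributes

Each overlay is distributed as an ordinary Python package (bitstream plus drivers), so community overlays are typically installed with pip and used straight from the notebook.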

PYNQ Community – ML, Non-ML & Academic Partners

Xilinx open source engagements related to today’s TVM meeting

MicroPython

University of Washington
UC Berkeley
UC San Diego
Xilinx Research

Finally, Xilinx is building new open source communities…

pynq.io/community
DAC2019 Design Contest
OpenHW Design Contest
Cloud Free Trials

Summary

Xilinx: Great for exploring and deploying inference

Xilinx Open Source: We’re actively engaging with TVM and other communities

Email: [email protected]
Visit: Boulder, Colorado

Adaptable. Intelligent.

Edge to Cloud Inference – Automotive

Inference Acceleration – Edge

˃ At The Edge
  ADAS/AD Central Module

[Figure: vehicle sensor layout: surround-view cameras (front, back, left, right), forward-looking camera, drive monitor camera, three short-range radars, and a long-range lidar, all feeding the ADAS/AD central module]

Edge to Cloud Inference – Xilinx Platforms

Edge Devices: Custom I/O, ARM CPUs
PYNQ-Z1
ZCU102
ZCU104
Ultra96

Cloud Platforms: Power Efficient, PCIe, Networking

Edge to Cloud Inference – IIoT Latency/Data Example

Example IIoT Control Rates

Distance NYC to LA: 2,800 miles
Speed of light: 186,000 miles/s
Round trip: 2 × 2,800 / 186,000 s ≈ 30 ms
Required control rate: 10 ms

E.g. Power Plant @ 8 TB/month
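The arithmetic above, plus the data rate implied by the 8 TB/month figure, checked in a few lines of Python. The values come from the slide; the fiber caveat is an added note, since real links are slower than light in a vacuum.

distance_miles = 2800            # NYC to LA, from the slide
speed_of_light_mps = 186000      # miles per second, in a vacuum

round_trip_s = 2 * distance_miles / speed_of_light_mps
print(f"ideal round trip: {round_trip_s * 1e3:.0f} ms")   # ~30 ms; fiber and routing add more

control_period_ms = 10
print("meets 10 ms control rate:", round_trip_s * 1e3 <= control_period_ms)  # False

bytes_per_month = 8e12                                    # 8 TB/month
avg_mbit_per_s = bytes_per_month * 8 / (30 * 24 * 3600) / 1e6
print(f"average stream: {avg_mbit_per_s:.0f} Mbit/s")     # ~25 Mbit/s sustained

Even with ideal physics, the cloud round trip alone is roughly three times the 10 ms control budget, hence the case for running the inference loop at the edge and sending only summarized data to the cloud.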