TRANSCRIPT
Inference Architectures @Xilinx
Graham Schelle, PhD, Principal Engineer, Xilinx Research Labs
Xilinx Headlines
Twitch Chooses Xilinx to Enable its Broadcast-quality Livestream of eSports
Agenda
˃ Xilinx Adaptive Architectures
˃ Inference Architectures
˃ Open Source
Xilinx Adaptive Architectures
Traditionally, FPGAs targeted massively data-parallel applications.
In 2011, Zynq was introduced (ZU+ in 2015): ARM CPUs added for embedded applications.
Xilinx Adaptive Architectures – Alveo & Versal
In 2018, Alveo was introduced: accelerator cards for data center workloads.
Coming in 2019, the Versal platform: an adaptive compute acceleration platform (ACAP) with AI Engines, a Network on Chip, and Cortex-A72 CPUs.
Inference Architectures
Inference Architectures – Evolving Frameworks
[Slide: Andrej Karpathy on Twitter]
˃ Increasing, Evolving Workloads
  New acceleration needs & algorithms
  ML “infused” in many applications
  Adaptable HW is a key benefit
˃ Move to Lower Precision
  ML inference is moving to INT8 & lower
  Better Perf/W with similar accuracy
  Xilinx devices natively support variable precision (see the sketch after this list)
˃ Compressed Networks
  Higher performance with reduced compute / memory needs
  Pruning & load balancing to match network requirements
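To make the lower-precision point concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain NumPy. It is illustrative only, not Xilinx's actual quantization flow, and the function names are ours:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: map float32 onto [-127, 127]."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Quantizing random "weights" loses little precision per value, while
# shrinking storage and multiplier width by roughly 4x versus float32.
w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
print("mean abs quantization error:", np.abs(w - dequantize(q, s)).mean())
```

That 4x reduction in memory traffic and arithmetic width is where the Perf/W gain at similar accuracy comes from.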
Inference Architectures – Evolving Workloads
1. Inference is hard
2. Huge variation in compute and memory requirements
3. Models typically don’t fit into cache
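Since models typically don't fit into cache, compression matters. Below is a minimal sketch of magnitude pruning, the technique named in the previous slide's "Compressed Networks" bullet (and studied in the Han et al. papers cited on the DeePhi slide later in this deck): zero out the smallest weights so less of the model must be stored and moved. Illustrative only, not DeePhi's or Xilinx's actual tooling:

```python
import numpy as np

def prune_by_magnitude(w, sparsity=0.9):
    """Zero the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(w.size * sparsity)
    threshold = np.partition(np.abs(w).ravel(), k)[k]
    return np.where(np.abs(w) < threshold, 0.0, w)

w = np.random.randn(256, 256).astype(np.float32)
wp = prune_by_magnitude(w)
print(f"weights kept: {np.count_nonzero(wp) / w.size:.1%}")  # ~10%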
Inference Architectures – Precision vs Power
[Figure: LSTM test error [%] (y-axis, 8-20) vs. estimated power consumption [W] (x-axis, 0.500-2.000) for variable weight/activation bit widths (W/A) 2/3, 3/3, 2/4, 3/4, 4/4, 2/8, 3/8, 4/8, 8/8, with the Pareto-optimal configurations highlighted. Target device ZU7EV; ambient temperature 25 °C; 12.5% toggle rate; 0.5 static probability; power reported for the PL accelerated block only. A companion chart compares FPGA vs. ASIC.]
Source: Bill Dally (Stanford), Cadence Embedded Neural Network Summit, February 1, 2017
Rybalkin, V., Pappalardo, A., Ghaffar, M.M., Gambardella, G., Wehn, N. and Blott, M., “FINN-L: Library Extensions and Design Trade-off Analysis for Variable Precision LSTM Networks on FPGAs.”
Michaela Blott, Hot Chips 2018 Tutorial, “Overview of Deep Learning and Computer Architectures for Accelerating DNNs”
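The "Pareto Optimal" label in the figure marks bit-width configurations that no other configuration beats on both power and accuracy at once. A small sketch of that selection rule, using made-up (power, error) numbers rather than the FINN-L data:

```python
# Hypothetical (power [W], test error [%]) per weight/activation bit width.
configs = {
    "2/3": (0.55, 19.0),
    "3/3": (0.70, 14.5),
    "2/4": (0.80, 15.0),   # dominated by 3/3: more power AND more error
    "4/4": (0.95, 11.0),
    "8/8": (1.90, 10.2),
}

def pareto_front(points):
    """Keep a point only if no other point is at least as good on both axes."""
    return {
        name: (p, e)
        for name, (p, e) in points.items()
        if not any(
            p2 <= p and e2 <= e and (p2, e2) != (p, e)
            for p2, e2 in points.values()
        )
    }

print(sorted(pareto_front(configs)))  # ['2/3', '3/3', '4/4', '8/8']
```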
Xilinx Cloud Inference - ML Suite Overlays with xDNN
https://github.com/Xilinx/ml-suite
On-prem and cloud boards
• Built in Programmable Logic
• High utilization; throughput or latency variants
• CPU offload for new layer exploration
xDNN w/ xfDNN Compiler
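The "CPU offload for new layer exploration" bullet describes a split execution model: layers the xDNN overlay implements run in programmable logic, and anything else falls back to the host CPU. A conceptual sketch of that dispatch, with entirely hypothetical names (this is not the ml-suite API):

```python
# Hypothetical dispatch: FPGA overlay for supported ops, CPU fallback otherwise.
SUPPORTED_ON_FPGA = {"conv", "relu", "pool"}

def run_network(layers, fpga_exec, cpu_exec, x):
    for layer in layers:
        backend = fpga_exec if layer["type"] in SUPPORTED_ON_FPGA else cpu_exec
        x = backend(layer, x)  # new/experimental layer types still run, on CPU
    return x
```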
Xilinx Edge Inference - DeePhi
“Learning both Weights and Connections for Efficient Neural Networks”, NeurIPS 2015
“EIE: Efficient Inference Engine on Compressed Deep Neural Network”, ISCA 2016
“ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA”, FPGA 2017
Cloud & Edge Integration
Xilinx and Open Source
PYNQ
Xilinx Runtime for PCIe Attached FPGAs
Quantized Neural Networks
…More on www.github.com/Xilinx
Python is increasingly the Language of Choice
Top Programming Languages, IEEE Spectrum, July ’17: Python is the fastest growing language, driven by data science, AI, ML and academia
Top Programming Languages, IEEE Spectrum, July ’18: Python is listed as an embedded language for the first time
https://spectrum.ieee.org/at-work/innovation/the-2018-top-programming-languages
https://stackoverflow.blog/2017/09/06/incredible-growth-python/
PYNQ: Python Productivity for Zynq
˃ Jupyter notebooks, browser-based interface
˃ PYNQ enables JupyterLab on Zynq and ZU+
˃ FPGA designs delivered as Python packages
˃ Delivered as SD Card image
[Stack diagram: Jupyter web server and IPython kernel on Ubuntu-based Linux, running on the ARM A9 / A53 CPUs, with overlays/designs loaded into the ZU+ fabric]
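A minimal sketch of the flow this stack enables, as typically typed into a Jupyter notebook running on the board; "base.bit" is the base overlay shipped with the PYNQ image, and the exact IP names vary by board:

```python
from pynq import Overlay

# Loading an overlay reprograms the FPGA fabric from Python.
overlay = Overlay("base.bit")

# Each IP block in the design is exposed to Python through the overlay object.
print(list(overlay.ip_dict.keys()))
```

Because overlays are delivered as ordinary Python packages, a new FPGA design installs like any other library.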
PYNQ Community – ML, Non-ML & Academic Partners
Xilinx open source engagements related to today’s TVM meeting
MicroPython
University of Washington
UC Berkeley
UC San Diego
Xilinx Research
Finally, Xilinx is building new open source communities…
pynq.io/community
DAC2019 Design Contest
OpenHW Design Contest
Cloud Free Trials
Summary
˃ Xilinx: Great for exploring and deploying inference
˃ Xilinx Open Source: We’re actively engaging with TVM and other communities
Email: [email protected]
Visit: Boulder, Colorado
Adaptable. Intelligent.
Edge to Cloud Inference – Automotive
Inference Acceleration - Edge
˃ At The Edge
[Diagram: ADAS/AD central module fed by vehicle sensors - surround-view cameras (front, back, left, right), a forward-looking camera, a drive monitor camera, multiple short-range radars, and a long-range lidar]
Edge to Cloud Inference – Xilinx Platforms
PYNQ-Z1
ZCU102
ZCU104
Ultra96
Edge Devices: Custom I/O, ARM CPUs
Cloud Platforms: Power Efficient, PCIe, Networking
Edge to Cloud Inference – IIoT Latency/Data Example
Example IIoT Control Rates
Distance NYC to LA: 2,800 miles
Speed of light: 186,000 miles/s
Round trip: 2 × 2,800 / 186,000 ≈ 30 ms
Required control rate: 10 ms
The speed-of-light round trip alone already misses the 10 ms control budget, so inference must run at the edge.
E.g. Power Plant @ 8 TB/month
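Worked out, the slide's arithmetic is below; even an idealized speed-of-light link misses the control loop before any compute time is added:

```python
distance_miles = 2_800        # NYC to LA
c_miles_per_s = 186_000       # speed of light
round_trip_ms = 2 * distance_miles / c_miles_per_s * 1_000
print(f"{round_trip_ms:.0f} ms round trip vs 10 ms control budget")  # ~30 ms
```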