taking configurability and programmability to the next ... · taking configurability and...

25
Dr. Yankin Tanurhan Vice President of Engineering, SG Synopsys Inc. July 2016 Taking Configurability and Programmability to the Next Level for Embedded Systems Processing

Upload: doanminh

Post on 24-May-2018

226 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

Dr. Yankin TanurhanVice President of Engineering, SGSynopsys Inc.July 2016

Taking Configurability and Programmability to the Next Level for Embedded Systems Processing

Page 2: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

2© 2016 Synopsys, Inc.

Agenda

Machine vision Scalable EV Processors Convolution Neural Network Engine (CNN)Vision programming toolsHost IntegrationConclusions

Page 3: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

© 2016 Synopsys, Inc. 3

Embedded Vision is Coming Fast

• Embedded Vision is the use of computer vision in embedded systems to interpret meaning from images or video

• In cars to improve safety

• Surveillance for detection and tracking

• In industrial automation to improve quality and control

• Estimated $300B+ market in 2020, 35% CAGR

0

50

100

150

200

250

300

350

2013 2014 2015 2016 2017 2018 2019 2020

Bill

ions

of D

olla

rs

Vision Systems Shipments

Sources: ABI Research, Insight Media, Transparency Market Research, Markets And Markets, Synopsys

Page 4: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

4© 2016 Synopsys, Inc.

Video Surveillance Market

• Global IP Video Surveillance Market expected to grow at CAGR of 37.3% from 2012-20

• Demand driven by– Growing installations of IP cameras – Need for surveillance cameras

with better video quality

http://www.alliedmarketresearch.com/IP-video-surveillance-VSaaS-market

3X Growth Forecast 2013 - 2019

Home Surveillance, Retail, Healthcare, Security (Airports, Govt, Banks, Casinos)

Page 5: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

5© 2016 Synopsys, Inc.

ADAS Vision Market

Source: IHS 2015

Page 6: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

6© 2016 Synopsys, Inc.

Scalable Embedded Vision Processors

Page 7: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

© 2016 Synopsys, Inc. 7

Scalable Embedded Vision Processors

• Highly integrated and configurable– Configurable scalar, vector DSP and

convolutional neural network (CNN) architecture

– Supports 1080p - 4K vision streams

• User scalable for optimum performance – 1 to 4 Vision CPU cores– Programmable CNN engine

• State-of-the-art performance-efficiency

• High productivity toolset– OpenCV, OpenVX, OpenCL C

OpenCV, OpenVX™ libraries and API

MetaWare, OpenCL C Development Tools

Vision CPU (1 to 4 cores)

AXI Interconnect

Core 4

Core 3

CNN Engine

Core 2

Core 1

32-bit scalar

512-bit vector DSP

Embedded Vision Processor

Convolution

Classification32-bit scalar

512-bit vector DSP

Shared MemoryShared MemorySync & DebugSync & Debug Streaming Transfer UnitStreaming Transfer Unit

Page 8: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

8© 2016 Synopsys, Inc.

High-Performance Vision CPU

• Implemented with 1 to 4 Vision CPU cores to run full range of vision algorithms up to 4K resolutions

• 32-bit scalar for easy system integration

• Configurable vector DSP for high-end vision processing

• Cross-bar implements scatter/gather delivering higher performance with increased efficiency

• OpenCL C optimized instruction set for high programming productivity with reduced cycle count and lower power

• Parallel operation with the CNN Engine increasing efficiency and throughput

Fast, Real-time Vision Processing

Vision CPU

32-bit Scalar32-bit Scalar

D$I$

512-bit Vector DSP512-bit Vector DSP

LN1

LN1

LN2

LN2

LN3

LN3

LNn

LNn

VCCM

Cross-BARCross-BAR

Page 9: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

9© 2016 Synopsys, Inc.

Multi-Core Scalability to Increase Performance

External Bus Interface (AXI)External Bus Interface (AXI)

Vision CPU 4AUXREG

VCCM

ICCMDCCM

ICACHE

Vision CPU 4Vision CPU 4AUXREG

AUXREG

VCCMVCCM

ICCMICCMDCCMDCCM

ICACHEICACHE

Vision CPU 2AUXREG

VCCM

DCCM

D$

DCCM

Vision CPU 2Vision CPU 2AUXREG

AUXREG

VCCMVCCM

DCCMDCCM

D$D$

DCCMDCCM

Interrupt I/F to Host

Inter-core interrupt

unit

Inter-core interrupt

unit Inter-core

semaphore unit

Inter-core semaphore

unit

Inter-core message

passing unit

Inter-core message

passing unit

64-bit global RTC

64-bit global RTC

ARC

AU

X in

terfa

ceARC

AUX interface

Inter-core debug

assist unit

Inter-core debug

assist unit

I$I$

Vision CPU1 AUXREG

VCCM

D$

Vision CPU1Vision CPU1 AUXREG

AUXREG

VCCMVCCM

D$D$

I$I$

Vision CPU 3AUXREG

VCCM

D$

I$

Vision CPU 3Vision CPU 3AUXREG

AUXREG

VCCMVCCM

D$D$

I$I$

L1 Scalar D$ Coherency Unit

STU(Streaming

Transfer Unit)

STU(Streaming

Transfer Unit)

VDSP

1VD

SP2

VDSP

3VD

SP4

CSM Cluster

Shared Mem.- Multi-bank- Arbitration

Page 10: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

© 2016 Synopsys, Inc. 10

Efficient Frame Data Access

• Streaming Transfer Unit (DMA)• Tightly integrated with VDSP vector memory

(VCCM)– Capable of addressing the memory as

Private Banks or Local Memory– Low overhead:

– Transfers can be setup in VCCM memory with vector instructions

– Event-based low-overhead synchronization (no ITs required)– Physical channels (up to 4) can be shared between VDSP cores to

serve multiple private banks in parallel• Support for 1D and 2D (image) transfers for vision applications• I/O coherency: Configurable I/O-coherent regions

External Bus Interface (AXI)External Bus Interface (AXI)

STUSTU

HS VDSP 2HS VDSP 2

VCCMVCCM

D$D$

I$I$

HS VDSP 4HS VDSP 4

VCCMVCCM

D$D$

I$I$

HS VDSP 1HS VDSP 1

VCCMVCCM

D$D$

I$I$

HS VDSP 3HS VDSP 3

VCCMVCCM

D$D$

I$I$

ClusterShared Memory

Page 11: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

11© 2016 Synopsys, Inc.

Convolution Neural Network Engine

Page 12: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

© 2016 Synopsys, Inc. 12

CNN is Accurate and Efficient

• Convolutional Neural Networks (CNN)– Deep learning approach outperforms other vision algorithms– CNNs attempt to replicate how the brain sees– Efficiently recognizes objects directly from pixel images with

minimal pre-processing

• CNN usage for Vision– Image classification, search similar images– Object detection, classification & localization

– Any type of object(s), depending on training phase

– Face recognition, visual attention, facial expression recognition– Gesture recognition / hand tracking– Scene recognition and labelling, semantic segmentation

CNN

Page 13: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

© 2016 Synopsys, Inc. 13

User Benefits of a Dedicated CNN Engine• Scalable high performance

– Over 800 MAC/cycle

• PPA competitive with H/W-based CNNs– Similar area density (GMAC/s/mm2) as H/W– Over 1000 GOP/s/W

• Low external memory bandwidth– Close to image pixel bandwidth <800 MB/s

• Highest productivity and flexibility– Automatic CNN graph mapping

• Closely coupled with leading edge EV6x processor– Unique combination of scalar + vector processing + CNN– Integrated into a single, standard programming model

800 MAC/cycle

Page 14: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

14© 2016 Synopsys, Inc.

High-Performance CNN Engine

• Dedicated CNN Engine delivers higher performance than competitive solutions

• Programmable to support full range of fixed point CNN graphs

• State-of-the-art power-efficiency >1000 GOPS/W

• Supports resolutions to 4K

• Real-time, high quality image classification, object recognition, semantic segmentation

• Parallel operation with Vision CPU increasing efficiency and throughput

AXI Interconnect

Vision CPU32 bit RISC

512-bit Vector DSP

ClusterSharedMemory

DMA

AR

Connect

CNN Engine

Convolution

Classification

Page 15: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

© 2016 Synopsys, Inc. 15

CNN Scene segmentation benchmark• Application is to perform semantic labeling

of full image– Define sky, forest, buildings– Define border of road (w/o lane markers)– Used typically in path finding applications for autonomous driving– Unlike object detection, there is no Region-of-Interest pre-calculation

– Need to analyze entire image

• CNN graph characteristics– Video at 1080p, 30 fps– 5 channels (3 channels for colors, 2 application-specific channels)– Image bandwidth requirements:

311 MB/s (for 8 bit pixels), or 622 MB/s (for 16 bit pixels)– Compute requirements: Over 750 GMAC/s– Weights: Over 100K values

Preliminary – Subject to Change

carcar

skybuilding

road

building

Page 16: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

16© 2016 Synopsys, Inc.

Vision Programming Tools

Page 17: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

17© 2016 Synopsys, Inc.

High Productivity Tools Increase Flexibility• OpenVX eases vision graph

development

• OpenCV open source library of 2500 vision algorithms helps build vision applications

• MetaWare C/C++ compiler delivers optimize program coding

• OpenCL C instructions with whole function vectorization simplifies DSP programming

• CNN graph mapping tools automate CNN programming

K1 Kn…

Kernel LibUi

UjUk

Kn

Cm1

UkUm

Runtime

Cm2

Cm3

Cm3

User kernels

Ui

Uk

C/C++

OpenCL C

Cmi

CNN Primitives

CNN Graph Mapping Tools

CNN Graph Training

C/C++ compiler with OpenCL C whole function

vectorizationLib

Embedded Vision Processor

Vision Graph

Synopsys User Open Source Third Party Partner

Page 18: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

18© 2016 Synopsys, Inc.

FPGA-Based Development System

EV6x ProcessorVision CPU Shared

Data Mem

CNN Engine

DMADMA

AXI Subsystem InterconnectAXI Subsystem Interconnect

HS VDSP 4

MEMHS VDSP 1

AXI InterconnectAXI Interconnect

DDR3

Vision CPU- Scalar & vector processing- 1 – 4 HS VDSP Cores- Supports HD to 4K streams- Host offload or standalone

operation

AXIUMRBus

HAPS-80 (S26) Prototyping System

HAPS-80EV6x runs at up

to 100 MHz

32-bit RISC CPU

512-bit Vision DSP

Convolution

Classifier, SVM PCIe

Software Development- High productivity tool set- OpenVX runtime & kernels- OpenCL C vectorizing compiler- MetaWare C/C++ compiler

CNN Engine- Object detection,

classification, localization- Fully programmable- >1000 GMAC/s/W

Page 19: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

19© 2016 Synopsys, Inc.

EV Processor SoC Virtual Development Kit

• Host integration demonstrator–VDK with ARC HS38 host running Linux

O/S

–Linked to EV processor model

–Low-level mechanisms provided to support customer-specific host O/S driver development

–Simple host-side demonstrator application: e.g., send frame for object detection processing, receive detection points

SoC InterconnectSoC Interconnect

SoCMem Host

Customer’s SoC

DRAMVDK

Linux

I/Operiph

Existing ARC HS VDK

DMA

VDK-wrappedEV processor model

nSIM ISS model CNN Model

I/F

Page 20: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

20© 2016 Synopsys, Inc.

Host Integration

Page 21: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

21© 2016 Synopsys, Inc.

Designed for SoC Integration

• Must be designed for easy integration with a host processor– Most designs will include host

HAPS-70 S12

Virtual Platform

Reference Designs

Sensor Subsystem

EV Processor

Host

AXI

IntelVision CPU (1 to 4 cores)

AXI Interconnect

Core 4Core 3

CNN Engine

Core 2Core 1

32-bit RISC

512-bit vector DSP

Embedded Vision Processor

Convolution

Classification

Cluster Shared MemorySync & DebugSync & Debug Streaming Transfer UnitStreaming Transfer Unit

32-bit RISC

512-bit vector DSP

Page 22: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

22© 2016 Synopsys, Inc.

Host to EV6 CommunicationAs implemented in the EV VDK

CNN-based Application

AXI Interconnect

User Application

Custom host/EV system I/F to invoke CNN processing

Memory

Interrupts

Data transfer

Data transfer

runtimeon EV61 Processor

Sensor Subsystem

Host

Page 23: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

23© 2016 Synopsys, Inc.

Conclusions

Page 24: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

24© 2016 Synopsys, Inc.

Conclusions

• Strong growth in vision applications – Surveillance, ADAS, IoT, Industrial, gesture recognition, object detection– Embedded vision algorithm innovation, highly dynamic– Requirements for configurability and progammability

• Vision Processors integrate HW & SW for optimized vision apps– Flexible, integrated solution (scalar + vector DSP + CNN)– Dedicated CNN engine offers higher performance and best efficiency– Low-power, high-performance, flexible programming– Fast time-to-market with high-level OpenVX and C programming tools

• High productivity tool set– OpenVX, OpenCV, OpenCL C, Optimized C/C++ MetaWare compiler

Page 25: Taking Configurability and Programmability to the Next ... · Taking Configurability and Programmability to the ... •Surveillance for detection and tracking ... –OpenCV, OpenVX,

Thank YouThank You