"An Update on OpenVX and Other Vision-Related Standards," a Presentation from Khronos


Upload: embedded-vision-alliance

Post on 18-Aug-2015


TRANSCRIPT

© Copyright, 2014 1

NVIDIA TEGRA K1 Mar 2, 2014

NVIDIA Confidential December 3, 2014

Standard for Vision Acceleration

Elif Albuz

Mobile Vision Software

NVIDIA


Khronos Standards

Visual Computing - 3D Graphics - Heterogeneous Parallel Computing

3D Asset Handling - 3D authoring asset interchange

- 3D asset transmission format with compression

Acceleration in HTML5 - 3D in browser – no Plug-in

- Heterogeneous computing for JavaScript

Sensor Processing - Vision Acceleration - Camera Control - Sensor Fusion


Mobile/Embedded Vision Acceleration

Enables new experiences

Augmented Reality

Face, Body and Gesture Tracking

Computational Photography and Videography

3D Scene/Object Reconstruction


Challenges for Mobile & Embedded Vision

Control, coordinate and synchronize a diverse array of mobile sensors

Maintainable code for a heterogeneous mix of CPUs, GPUs, DSPs and dedicated hardware

Performance & Power efficiency

Creating fluid 60Hz experiences on battery-powered mobile devices

Code that is deployable across multiple devices, platforms and OSes


OpenVX – Power Efficient Vision Acceleration

Defines a C API for a subset of computer vision primitives and their data containers

Defines a framework to assemble and execute primitives, with the goal of enabling various optimization opportunities for vision pipelines

Extensible

[Figure: diagram of Application and Vision Accelerator blocks]


OpenCV & OpenVX

Governance

OpenCV: Community-driven open source with no formal specification

OpenVX: Formal specification defined and implemented by hardware vendors

Conformance

OpenCV: No conformance tests for consistency, and every vendor implements a different subset

OpenVX: Full conformance test suite / process creates a reliable acceleration platform

Portability

OpenCV: APIs can vary depending on processor

OpenVX: Hardware abstracted for portability

Scope

OpenCV: Very wide; 1000s of imaging and vision functions; multiple camera APIs/interfaces

OpenVX: Tight focus on hardware accelerated functions for mobile vision; use external camera API

Efficiency

OpenCV: Memory-based architecture; each operation reads and writes memory

OpenVX: Graph-based execution; optimizable computation, data transfer

Use Case

OpenCV: Rapid experimentation

OpenVX: Production development & deployment


OpenVX History

Started early 2012

Version 1.0 released in Oct 2014

Conformance Test Suite, OpenVX Trademark

Contributors


OpenVX

VXU Library for synchronous access to single nodes

Directed graphs for power and performance efficiency

[Figure: Example OpenVX Graph, showing four OpenVX nodes and downstream application processing]


OpenVX 1.0 VXU Function Overview

Core data structures: Images and Image Pyramids; Processing Graphs, Kernels, Parameters

Image Processing: Arithmetic, Logical, and statistical operations; Multichannel Color and BitDepth Extraction and Conversion; 2D Filtering and Morphological operations; resize & warp

Core Computer Vision: Pyramid & Integral Image computation

Feature Extraction and Tracking: Histogram Computation and Equalization; Canny Edge Detection; Harris and FAST Corner detection; Sparse Optical Flow

OpenVX 1.0 defines a framework for creating, managing and executing graphs

Focused set of widely used functions that are readily accelerated

Implementers can add functions as extensions

Widely used extensions adopted into future versions of the core

OpenVX Specification Is Extensible

Khronos maintains extension registry


Some optimizations VX Graphs can Enable

Memory management: Reuse memory for different intermediate data. Benefit: less allocation overhead, more memory for other applications

Kernel Merge: Replace a sub-graph by a single faster node. Benefit: better memory locality, less kernel launch overhead

Graph Scheduling: Split the graph execution across the whole system: CPU / GPU / DSP / dedicated HW. Benefit: faster execution or lower power consumption

Data Tiling: Execute a sub-graph at tile granularity instead of image granularity. Benefit: better use of data cache and local memory
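
As a rough illustration of the kernel-merge idea on this slide, the sketch below compares two separate per-pixel passes, which materialize an intermediate image in memory, with a single fused pass. This is plain C++, not OpenVX code: the `Image` alias and the functions `halve`, `threshold`, `two_pass` and `fused` are hypothetical names chosen for the example, and the two operations are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Image = std::vector<std::uint8_t>;

// First toy "node": reads the whole input and writes a whole output image.
Image halve(const Image& in) {
    Image out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = static_cast<std::uint8_t>(in[i] / 2);
    return out;
}

// Second toy "node": binary threshold, again a full read and a full write.
Image threshold(const Image& in, std::uint8_t t) {
    Image out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = static_cast<std::uint8_t>(in[i] > t ? 255 : 0);
    return out;
}

// Unmerged sub-graph: the intermediate image tmp goes through memory.
Image two_pass(const Image& in, std::uint8_t t) {
    Image tmp = halve(in);
    return threshold(tmp, t);
}

// Merged node: one pass over the data, the intermediate value stays local.
Image fused(const Image& in, std::uint8_t t) {
    Image out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const std::uint8_t half = static_cast<std::uint8_t>(in[i] / 2);
        out[i] = static_cast<std::uint8_t>(half > t ? 255 : 0);
    }
    return out;
}
```

Both versions produce the same output; the fused one avoids allocating and traversing the intermediate image, which is the kind of saving a graph manager could look for when it sees the whole pipeline ahead of time.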


Example: Feature tracking Graph

[Figure: OpenVX graph for feature tracking. Nodes: Color Conversion, Channel Extract, Image Pyramid, Optical Flow, Harris Corners. Data: frameRGB, frameYUV, frameGray, pyr -1, pyr 0, pts, arrays of keypoints. Input: camera/image/video data. Output: rendering/output.]


NVIDIA VisionWorks™ – Integrating OpenVX

VisionWorks library contains diverse vision and imaging primitives

Will leverage OpenVX for optimized primitive execution

Can extend VisionWorks nodes through GPU-accelerated primitives

Provided with sample library of fully accelerated pipelines

[Figure: VisionWorks software stack. Blocks: Application Code, Sample Pipelines (Feature Tracker, Hough Circle & Line, Object Tracker, Optical Flow, Denoising), VisionWorks APIs (Classifier, Corner Detection, Feature Tracking, Hough Detection), VisionWorks Framework, CUDA, Tegra/Kepler dGPU]


Summary

Khronos is building interoperating APIs for portable / power-efficient vision and sensor processing

OpenVX 1.0 specification is now finalized and released

Full conformance tests and Adopters program immediately available

Khronos open source sample implementation by end of 2014

First commercial implementations already close to shipping

Companies are encouraged to join Khronos to influence the direction of mobile and embedded vision processing!


Questions?


Primitive assembly and execution: 2 flavors

Immediate execution

Direct ‘synchronous’ call to the primitive

Classical blocking API call

OpenCV-like

Useful for fast prototyping and as an intermediate migration step from OpenCV

Graph based

Relevant for video stream processing

More optimization opportunities

Discussed in more detail later in the presentation


Some more features

Possibility to create user-defined primitives

Currently targeted mainly at the host CPU

Delay object to keep track of past data when needed

OpenVX can give the valid region for images (deduced from the processing)
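
To make the delay object a little more concrete, here is a minimal toy model of a delay line in plain C++. It is not the OpenVX implementation: `ToyDelay`, `at` and `age` are hypothetical names. The indexing convention (0 for the current item, -1 for the previous one) follows the `vxGetReferenceFromDelay` calls in the code example later in this deck, and `age` plays the role that `vxAgeDelay` plays in OpenVX.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy delay line: slot 0 is the current item, slot -1 the previous one, etc.
template <typename T>
class ToyDelay {
public:
    explicit ToyDelay(std::size_t slots) : data_(slots), head_(0) {
        assert(slots > 0);
    }

    // index must lie in the range [-(slots - 1), 0].
    T& at(int index) {
        assert(index <= 0 && static_cast<std::size_t>(-index) < data_.size());
        return data_[(head_ + static_cast<std::size_t>(-index)) % data_.size()];
    }

    // Shift every item one step into the past. No data is copied: the
    // oldest slot is simply recycled as the new slot 0.
    void age() {
        head_ = (head_ + data_.size() - 1) % data_.size();
    }

private:
    std::vector<T> data_;
    std::size_t head_;  // position of slot 0 inside data_
};
```

In a tracking loop, the application would fill slot 0 for the current frame, run the processing that reads slots 0 and -1, then call `age()` before the next frame.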


2 – Key programming model aspects


2/a – Memory model


Memory model : Opaque containers (1)

Ownership of the ‘bytes’ is well defined

Object Read/write

Object bytes remain the property of the OpenVX world

Object Access/commit (equivalent to map/unmap)

Access: the host gets access to the bytes

Commit: the host releases its access to the bytes

A primitive needs all its parameters committed before execution

Useful for complex memory hierarchies: OpenVX controls where the data bytes are physically stored
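
The access/commit contract can be sketched with a small toy model, shown below under stated assumptions. It is plain C++ rather than the OpenVX API: `ToyImage`, `access`, `commit` and `toy_invert` are hypothetical names, and the only rule modeled is that a primitive must not run while the host still holds access to the bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy opaque container: the bytes are private and reachable only via access().
class ToyImage {
public:
    explicit ToyImage(std::size_t size) : bytes_(size, 0), host_accesses_(0) {}

    // Access: the host obtains a pointer to the bytes (comparable to "map").
    std::uint8_t* access() {
        ++host_accesses_;
        return bytes_.data();
    }

    // Commit: the host gives the bytes back (comparable to "unmap").
    void commit() {
        assert(host_accesses_ > 0);
        --host_accesses_;
    }

    bool committed() const { return host_accesses_ == 0; }

private:
    friend bool toy_invert(ToyImage& img);
    std::vector<std::uint8_t> bytes_;
    int host_accesses_;
};

// Toy primitive: refuses to run while the host still holds an access.
bool toy_invert(ToyImage& img) {
    if (!img.committed()) return false;
    for (auto& b : img.bytes_) b = static_cast<std::uint8_t>(255 - b);
    return true;
}
```

Because the host only ever sees the bytes between `access()` and `commit()`, a real implementation would be free to keep them wherever is most convenient the rest of the time; the toy model keeps them in a single vector for simplicity.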


Memory model : Opaque containers (2)

Physical layout under control of OpenVX

Access/commit

OpenVX returns a pointer + memory layout (addressing structure)

The application needs to use this layout

Import of existing application images

The host application provides its memory layout

After commit, OpenVX can create a shadow copy if the original layout is not convenient for best acceleration

Useful for acceleration and performance portability


Data Objects life cycle

1. Create the object

The application receives a reference to the object

2. Use the object reference for processing

For access/commit

To create graph nodes with the data object as a parameter

3. Release the object when the application does not need to use this object anymore

Release != destruction: the object stays alive as long as it is referenced by other objects (ex: a graph)
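
The "release is not destruction" rule behaves like reference counting. The sketch below models it with `std::shared_ptr` in plain C++; it is an analogy only, not how any OpenVX implementation is known to work, and `ToyImage`, `ToyGraph`, `add_node_param` and `release` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct ToyImage {
    int width;
    int height;
};

// A toy graph keeps its own references to the data objects its nodes use.
struct ToyGraph {
    std::vector<std::shared_ptr<ToyImage>> params;

    void add_node_param(const std::shared_ptr<ToyImage>& img) {
        params.push_back(img);
    }
};

// "Release" drops only the application's reference. The object is destroyed
// later, once no other holder (such as a graph) references it.
void release(std::shared_ptr<ToyImage>& ref) {
    ref.reset();
}
```

After `release`, the application's handle is empty, but the image stays alive until the graph that references it is itself destroyed.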


2/b – Execution model (graph)


Graph

Dataflow graph defined by interconnected nodes

Node = instance of a primitive with well-defined parameters

Dataflow edge granularity: a data object (ex: image)

No control flow in the graph

With the exception of the possibility to abort/restart the graph (node callbacks)

Graph connectivity

Fully defined by node inputs/outputs: edges are implicitly determined from nodes, not explicitly created by the application

A single writer per data object within a graph

Semantics independent of node creation ordering

Particular case: bidirectional parameter (for accumulation primitive only)
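
The connectivity rules above can be sketched as a small scheduler in plain C++. This is an illustrative model, not OpenVX code: `ToyNode` and `schedule` are hypothetical names, data objects are identified by strings, and a node that both reads and writes the same object is treated as the bidirectional case and adds no edge.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct ToyNode {
    std::string name;
    std::vector<std::string> inputs;   // data objects the node reads
    std::vector<std::string> outputs;  // data objects the node writes
};

// Returns node names in a valid execution order, or an empty vector if a
// data object has two writers or the graph contains a cycle.
std::vector<std::string> schedule(const std::vector<ToyNode>& nodes) {
    // Single-writer rule: map each data object to the one node that writes it.
    std::map<std::string, std::size_t> writer;
    for (std::size_t i = 0; i < nodes.size(); ++i)
        for (const auto& out : nodes[i].outputs)
            if (!writer.emplace(out, i).second) return {};

    // Edges are implied: writer(data) -> every other node that reads data.
    std::vector<std::vector<std::size_t>> consumers(nodes.size());
    std::vector<std::size_t> pending(nodes.size(), 0);
    for (std::size_t i = 0; i < nodes.size(); ++i)
        for (const auto& in : nodes[i].inputs) {
            auto it = writer.find(in);
            // Skip graph inputs (no producer) and bidirectional parameters.
            if (it == writer.end() || it->second == i) continue;
            consumers[it->second].push_back(i);
            ++pending[i];
        }

    // Kahn's algorithm: a node may run once all of its producers have run.
    std::vector<std::string> order;
    std::vector<std::size_t> ready;
    for (std::size_t i = 0; i < nodes.size(); ++i)
        if (pending[i] == 0) ready.push_back(i);
    while (!ready.empty()) {
        const std::size_t n = ready.back();
        ready.pop_back();
        order.push_back(nodes[n].name);
        for (std::size_t c : consumers[n])
            if (--pending[c] == 0) ready.push_back(c);
    }
    if (order.size() != nodes.size()) order.clear();  // cycle detected
    return order;
}
```

Since the order is derived only from which node writes and which nodes read each data object, creating the nodes in a different sequence does not change the dependencies the scheduler sees.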


Graph life cycle

Ahead-of-time (set-up time)

Graph creation

Graph ‘Verification’

Verify for correctness

Optimizations can happen here

Runtime

Graph execution

Can be called multiple times

– Without re-verification if the graph connectivity and the ‘immutable’ node parameters have not been modified

– Otherwise, re-verification is needed

More optimizations can happen here (need to be ‘cheap’)
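
The verify-once, execute-many life cycle can be modeled as a small state machine. The sketch below is plain C++ with hypothetical names (`ToyGraph`, `add_node`, `verify`, `process`); it only tracks whether the graph is currently verified and is not meant to describe what an actual OpenVX implementation does internally.

```cpp
#include <cassert>

class ToyGraph {
public:
    // Changing connectivity (or an 'immutable' parameter) invalidates the
    // previous verification.
    void add_node() {
        ++node_count_;
        verified_ = false;
    }

    // Set-up time: correctness checks and expensive optimizations go here.
    bool verify() {
        ++verify_calls_;
        verified_ = node_count_ > 0;  // toy rule: an empty graph is invalid
        return verified_;
    }

    // Runtime: in this toy model, only a verified graph may execute.
    bool process() {
        if (!verified_) return false;
        ++runs_;
        return true;
    }

    int verify_calls() const { return verify_calls_; }
    int runs() const { return runs_; }

private:
    int node_count_ = 0;
    int verify_calls_ = 0;
    int runs_ = 0;
    bool verified_ = false;
};
```

The point of the split is that the cost of `verify()` is paid once at set-up time, while `process()` stays cheap and can be called once per frame.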


Graph execution: 2 modes

Synchronous

‘Blocking’ graph execution

Asynchronous

Note: still a limited feature in 1.0


Virtual objects : specific to graphs

‘Virtual’ objects describe temporary data objects

2 Usages

More generic graph:

The user does not specify some of the object properties (example: image dimensions); the graph manager deduces them from the node that generates the object

Less work in case of image dimension changes

More memory optimizations:

‘Virtual’ is a contract between the application and OpenVX that states: “the application will never access the bytes of the virtual object”

The graph manager can reuse the same physical buffer across multiple virtual objects

A virtual object never needs to be visible from the host
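
One way a graph manager could exploit the virtual-object contract is to share physical buffers between objects whose lifetimes do not overlap. The following sketch shows a simple greedy assignment in plain C++. It is an assumption about how such reuse might be done, not a description of any OpenVX implementation; `Lifetime` and `assign_buffers` are hypothetical names, and lifetimes are expressed as node execution steps.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Lifetime {
    int first_use;  // execution step at which the object is written
    int last_use;   // last execution step at which the object is read
};

// Assigns a physical buffer index to each virtual object. Objects must be
// sorted by first_use. A buffer is reused once the last_use of its current
// owner is strictly before the first_use of the new object.
std::vector<int> assign_buffers(const std::vector<Lifetime>& objs) {
    std::vector<int> assignment;
    std::vector<int> busy_until;  // busy_until[b]: last_use of buffer b's owner
    for (const Lifetime& o : objs) {
        int chosen = -1;
        for (std::size_t b = 0; b < busy_until.size(); ++b) {
            if (busy_until[b] < o.first_use) {
                chosen = static_cast<int>(b);
                break;
            }
        }
        if (chosen < 0) {
            // No free buffer: allocate a new one.
            chosen = static_cast<int>(busy_until.size());
            busy_until.push_back(o.last_use);
        } else {
            busy_until[static_cast<std::size_t>(chosen)] = o.last_use;
        }
        assignment.push_back(chosen);
    }
    return assignment;
}
```

For a chain of three intermediate images, where each is written at one step and read at the next, only two physical buffers are needed: the third image can take over the buffer of the first. This reuse is safe only because the application has promised never to look at the bytes of a virtual object.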


3 – Code example (graph)


OpenVX Graph

[Figure: graph to be built. Nodes: color convert, channel extract, pyramid. Data: frameRGB, frameYUV, frameGray, pyr-1, pyr0]


Feature tracking: Graph Creation (1)

void createTrackerGraph(vx_image frameRGB, tracker_t &trk) {
    trk.graph = vxCreateGraph(trk.context);

    // Create color convert node. frameYUV is a virtual image: its width and
    // height are passed as 0 so that the graph deduces them.
    vx_image frameYUV = vxCreateVirtualImage(trk.graph, 0, 0, VX_DF_IMAGE_IYUV);
    trk.cvt_color_node = vxColorConvertNode(trk.graph, frameRGB, frameYUV);

    // Create channel extract node (keeps only the Y plane)
    vx_image frameGray = vxCreateVirtualImage(trk.graph, 0, 0, VX_DF_IMAGE_U8);
    trk.ch_extract_node = vxChannelExtractNode(trk.graph, frameYUV, VX_CHANNEL_Y, frameGray);

    // Create pyramid node. The delay holds 2 pyramids: current (0) and previous (-1).
    vx_pyramid pyr_sample = vxCreatePyramid(trk.context, trk.pyr_levels, VX_SCALE_PYRAMID_HALF,
                                            trk.width, trk.height, VX_DF_IMAGE_U8);
    trk.pyr_delay = vxCreateDelay(trk.context, (vx_reference)pyr_sample, 2);
    trk.pyr_node = vxGaussianPyramidNode(trk.graph, frameGray,
                                         (vx_pyramid)vxGetReferenceFromDelay(trk.pyr_delay, 0));

    // (function body continues under "Graph Creation (2)")

[Figure: graph built so far. Nodes: color convert, channel extract, pyramid. Data: frameRGB, frameYUV, frameGray, pyr-1, pyr0]


Feature tracking: Graph Creation (2)

    // Keypoint storage: a 2-slot delay for the current (0) and previous (-1) points
    vx_array pts_sample = vxCreateArray(trk.context, VX_TYPE_KEYPOINT, 1000);
    trk.pts_delay = vxCreateDelay(trk.context, (vx_reference)pts_sample, 2);
    trk.curr_features = vxCreateArray(trk.context, VX_TYPE_KEYPOINT, 1000);

    // Lucas-Kanade parameters wrapped in scalars (UINT_MAX comes from <limits.h>)
    vx_uint32 lk_epsilon = UINT_MAX;
    vx_scalar s_lk_epsilon = vxCreateScalar(trk.context, VX_TYPE_UINT32, &lk_epsilon);
    vx_scalar s_lk_num_iters = vxCreateScalar(trk.context, VX_TYPE_UINT32, &trk.lk_num_iters);
    vx_bool lk_use_init_est = vx_false_e;
    vx_scalar s_lk_use_init_est = vxCreateScalar(trk.context, VX_TYPE_BOOL, &lk_use_init_est);

    // Create optical flow node: tracks the previous points from the previous
    // pyramid (-1) into the current pyramid (0)
    trk.opt_flow_node = vxOpticalFlowPyrLKNode(trk.graph,
        (vx_pyramid)vxGetReferenceFromDelay(trk.pyr_delay, -1),
        (vx_pyramid)vxGetReferenceFromDelay(trk.pyr_delay, 0),
        (vx_array)vxGetReferenceFromDelay(trk.pts_delay, -1),
        (vx_array)vxGetReferenceFromDelay(trk.pts_delay, -1),
        trk.curr_features, VX_TERM_CRITERIA_ITERATIONS, s_lk_epsilon,
        s_lk_num_iters, s_lk_use_init_est, trk.lk_win_size);

[Figure: graph built so far. Nodes: color convert, channel extract, pyramid, optical flow pyrLK. Data: frameRGB, frameYUV, frameGray, pyr-1, pyr0, pts0, pts-1, curr_features]


Feature tracking: Graph Creation (3)

    // Create HarrisTrack node
    trk.feature_track_node = nvxHarrisTrackNode(trk.graph, frameGray,
        (vx_array)vxGetReferenceFromDelay(trk.pts_delay, 0),
        0, trk.curr_features, trk.harris_k, trk.harris_thresh);

    // Verify the graph is legal, and optimize it
    vxVerifyGraph(trk.graph);
}

[Figure: complete graph. Nodes: color convert, channel extract, pyramid, optical flow pyrLK, Harris track. Data: frameRGB, frameYUV, frameGray, pyr-1, pyr0, pts0, pts-1, curr_features]