OpenCV for Embedded: Lessons Learned

Copyright © 2015 Itseez 1 Yury Gorbachev 12-May-2015 OpenCV for Embedded: Lessons Learned

Upload: yury-gorbachev

Post on 18-Aug-2015



TRANSCRIPT

Page 1: OpenCV for Embedded: Lessons Learned

Copyright © 2015 Itseez 1

Yury Gorbachev

12-May-2015

OpenCV for Embedded: Lessons Learned

Page 2: OpenCV for Embedded: Lessons Learned


• Open-source computer vision library (>2500 algorithms)
• De facto standard in CV, BSD license
• Written in C++; the C interface is now deprecated
• Supports multiple platforms (Linux, Windows, OSX, Android, iOS, QNX)
• Used by Google, nVidia, Microsoft, Intel, Stanford, etc.
• Funding/contributions from Willow Garage, nVidia, GSoC, AMD, Intel
• Maintained by Itseez

Page 3: OpenCV for Embedded: Lessons Learned


• OpenCV provides extensive means to create an entire application
  • Camera interface (for example, V4L2 interface on Linux)
  • Video reading interface (using ffmpeg)
  • UI primitives (windows, keyboard/mouse input, etc.)

• Decent performance out of the box
  • Scalar performance is already good enough
  • Some algorithms are capable of working at ~100 FPS on average desktops
  • Extra optimization is not required in most cases

• Good and pretty stable acceleration possibilities
  • Intel® TBB is sufficient for multi-core
  • AVX, IPP, OpenCL, CUDA

Page 4: OpenCV for Embedded: Lessons Learned


• Mostly ARM platforms

• Exotic execution environments
  • C++ is not the default language (e.g. on Android)
  • Different interfaces (Camera, UI, Log)
  • Hard to troubleshoot

• Insufficient and unpredictable performance
  • Mobile and Embedded are still behind Desktop
  • Thermal protection, power saving and other tricky issues

• Zoo of acceleration possibilities
  • SIMD, DSP, GPU offload, FPGA
  • Multi-core systems, heterogeneous systems

Page 5: OpenCV for Embedded: Lessons Learned


OpenCV

Platform Agnostic Modules: core, imgproc, calib3d, video, ml, objdetect, features2d, photo, …

Platform Dependent Modules: gpu, highgui, androidcamera, python and java bindings

Dependencies: JPEG, PNG, Jasper, multimedia, OpenNI

Dependencies: CMake

Accelerations: TBB/GCD/Concurrency, IPP, Eigen

• Algorithm modules are easy to migrate to a new environment
  • C++ and CMake are the only requirements!

• OpenCV accuracy tests
  • Easily verify correctness of OpenCV on a new platform
  • Some vendors use them for regression tests during environment updates
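The idea of checking a ported build against known-good output can be sketched in a few lines. This is only an illustration of a tolerance-based comparison, not OpenCV's own test framework; the names `maxAbsDiff` and `matchesReference` and the tolerance parameter are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical helper: largest absolute per-pixel difference between two
// 8-bit buffers of equal size.
int maxAbsDiff(const std::vector<std::uint8_t>& a,
               const std::vector<std::uint8_t>& b) {
    assert(a.size() == b.size());
    int worst = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int d = std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
        if (d > worst) worst = d;
    }
    return worst;
}

// Hypothetical check: the output produced on the new platform passes when it
// has the same size as the reference and stays within `tolerance` grey levels.
bool matchesReference(const std::vector<std::uint8_t>& ported,
                      const std::vector<std::uint8_t>& reference,
                      int tolerance) {
    return ported.size() == reference.size() &&
           maxAbsDiff(ported, reference) <= tolerance;
}
```

A small non-zero tolerance is a common choice when the reference was produced on x86 and the port uses different rounding in optimized code paths.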

Page 6: OpenCV for Embedded: Lessons Learned


Prototyping (x86) → Porting → Profiling → Bottleneck optimization → Fine Tuning → Productization

Regression Tests, Performance Tests

• Video input, more debug possibilities, simple UI, higher speed
• Focus on algorithm, not environment!
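For the profiling step, a minimal sketch of timing one pipeline stage is shown here. It assumes wall-clock time from `std::chrono::steady_clock` is a good enough first measure; `averageMs` is a hypothetical helper name, and a real embedded target may need a platform profiler instead.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: average wall-clock milliseconds per call of `stage`,
// measured over `runs` consecutive calls.
double averageMs(const std::function<void()>& stage, int runs) {
    assert(runs > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < runs; ++i) {
        stage();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / runs;
}
```

Timing every stage the same way on both the x86 prototype and the target device gives a rough picture of which stage becomes the bottleneck after porting.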

Page 7: OpenCV for Embedded: Lessons Learned


• HW performance is always an issue for vision systems
  • Heavy image processing requires significant memory bandwidth
    • Usual bottleneck; multiple cores do not help
  • Collocation of multiple algorithms on a single system (e.g. ADAS)
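A back-of-the-envelope estimate shows why memory bandwidth becomes the bottleneck. The sketch assumes each processing pass reads and writes the whole frame once; the function name `bytesPerSecond` and the example figures (1920x1080, 3 bytes per pixel, 5 passes, 30 FPS) are illustrative assumptions, not measurements from the talk.

```cpp
#include <cassert>
#include <cstdint>

// Bytes moved per second when every pass reads the full frame once and
// writes it once (a simplifying assumption that ignores caches).
std::uint64_t bytesPerSecond(std::uint64_t width, std::uint64_t height,
                             std::uint64_t bytesPerPixel,
                             std::uint64_t passes, std::uint64_t fps) {
    assert(passes > 0 && fps > 0);
    const std::uint64_t frameBytes = width * height * bytesPerPixel;
    return frameBytes * 2 /* read + write */ * passes * fps;
}
```

With the example figures, `bytesPerSecond(1920, 1080, 3, 5, 30)` gives 1,866,240,000 bytes, or about 1.9 GB per second, for a single algorithm. Adding cores does not change this number, and several algorithms sharing one system multiply it.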

• Mobile platforms are even more complicated
  • Thermal protection and power saving are hard to control and influence
  • Hard to predict when/if we are consuming too much
  • Unstable FPS impacts algorithm complexity (e.g. object tracking)
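One way an unstable frame rate complicates tracking: a tracker can no longer assume a fixed interval between frames. The sketch below is an assumed coping strategy, not the approach described in the talk. It predicts position from the measured elapsed time and widens the search window when frames arrive late; `Track`, `predictX` and `searchRadius` are hypothetical names.

```cpp
#include <cassert>

struct Track {
    double x;   // last known position, pixels
    double vx;  // estimated velocity, pixels per second
};

// Constant-velocity prediction using the measured time since the last frame
// instead of an assumed fixed frame period.
double predictX(const Track& t, double dtSeconds) {
    assert(dtSeconds >= 0.0);
    return t.x + t.vx * dtSeconds;
}

// Scale the search window with the elapsed time, never shrinking it below
// the radius used at the nominal frame period.
int searchRadius(int baseRadiusPx, double dtSeconds, double nominalDtSeconds) {
    assert(nominalDtSeconds > 0.0);
    const double scale = dtSeconds / nominalDtSeconds;
    const int r = static_cast<int>(baseRadiusPx * scale + 0.5);
    return r < baseRadiusPx ? baseRadiusPx : r;
}
```

The larger search window after a late frame is exactly the extra work that arrives when the device is already throttled, which is why unstable FPS feeds back into algorithm cost.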

• Hardware selection is not easy
  • Very hard to predict final application performance beforehand
  • No valid benchmarks to emulate computer vision patterns

Page 8: OpenCV for Embedded: Lessons Learned


• OpenCV was initially optimized for desktop, where it works fast
• ARM optimizations are far behind
• Scalar code does not perform as well on ARM as on x86
• Optimization might help to some extent

[Chart: Number of optimized functions within OpenCV. Bars: SSE, IPP, NEON (OpenCV 3), NEON. Value labels: 150, 100, 50, 5]

Page 9: OpenCV for Embedded: Lessons Learned


• Optimize the algorithm first, and only then the hotspots
  • Reduce search and track areas, use grayscale, reduce resolution
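Two of these reductions, grayscale conversion and halving the resolution, can be sketched with the standard library alone. This is an illustration under assumptions, not OpenCV code: the `Gray` struct and the functions `toGray` and `halve` are hypothetical names, the input is assumed to be interleaved 8-bit RGB, and the grayscale weights are an integer approximation of the usual BT.601 luma coefficients.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Gray {
    int width;
    int height;
    std::vector<std::uint8_t> data;  // row-major, one byte per pixel
};

// Interleaved RGB -> grayscale using integer weights (77, 150, 29) / 256.
Gray toGray(const std::vector<std::uint8_t>& rgb, int width, int height) {
    const std::size_t pixels = static_cast<std::size_t>(width) * height;
    assert(rgb.size() == pixels * 3);
    Gray out{width, height, std::vector<std::uint8_t>(pixels)};
    for (std::size_t i = 0; i < pixels; ++i) {
        const int r = rgb[3 * i], g = rgb[3 * i + 1], b = rgb[3 * i + 2];
        out.data[i] = static_cast<std::uint8_t>((77 * r + 150 * g + 29 * b + 128) >> 8);
    }
    return out;
}

// Halve the resolution by averaging each 2x2 block; an odd trailing row or
// column is dropped.
Gray halve(const Gray& in) {
    Gray out{in.width / 2, in.height / 2, {}};
    out.data.resize(static_cast<std::size_t>(out.width) * out.height);
    for (int y = 0; y < out.height; ++y) {
        for (int x = 0; x < out.width; ++x) {
            const std::size_t p = static_cast<std::size_t>(2 * y) * in.width + 2 * x;
            const int sum = in.data[p] + in.data[p + 1] +
                            in.data[p + in.width] + in.data[p + in.width + 1];
            out.data[static_cast<std::size_t>(y) * out.width + x] =
                static_cast<std::uint8_t>((sum + 2) / 4);
        }
    }
    return out;
}
```

Going from 3-channel full resolution to 1-channel half resolution cuts the data every later stage must touch by a factor of 12, before any low-level optimization is attempted.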

• Select proper HW if possible
  • Compare development kit performance at least
  • Try ARMv8, it is better in scalar performance

• Use OpenCV packages from HW vendors (NVIDIA, TI)
  • Vendor-specific packages yield out-of-the-box improvements on specific HW, very easy to try
  • Not a cross-platform solution

• Optimize functions yourself
  • NEON, DSP and other HW-specific options
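As a small illustration of what hand optimization with NEON looks like, here is a saturating add of two 8-bit buffers. It is a sketch, not code from OpenCV or Itseez: the function name `addSaturate` is hypothetical, the NEON path is compiled only when the compiler defines `__ARM_NEON`, and a scalar loop handles other platforms and the tail of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Saturating per-pixel addition: dst[i] = min(a[i] + b[i], 255).
void addSaturate(const std::uint8_t* a, const std::uint8_t* b,
                 std::uint8_t* dst, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // NEON path: 16 pixels per iteration with a saturating vector add.
    for (; i + 16 <= n; i += 16) {
        const uint8x16_t va = vld1q_u8(a + i);
        const uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(dst + i, vqaddq_u8(va, vb));
    }
#endif
    // Scalar path: whole buffer without NEON, or the remaining tail with it.
    for (; i < n; ++i) {
        const int s = a[i] + b[i];
        dst[i] = static_cast<std::uint8_t>(s > 255 ? 255 : s);
    }
}

// Convenience overload for vectors of equal size.
std::vector<std::uint8_t> addSaturate(const std::vector<std::uint8_t>& a,
                                      const std::vector<std::uint8_t>& b) {
    assert(a.size() == b.size());
    std::vector<std::uint8_t> dst(a.size());
    addSaturate(a.data(), b.data(), dst.data(), a.size());
    return dst;
}
```

Keeping the scalar loop as a fallback lets the same source build on the x86 prototype and on the ARM target, and it doubles as the reference when checking the NEON path for correctness.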

Page 10: OpenCV for Embedded: Lessons Learned


[Chart: Processing on ARM v7A, OpenCV vs. Itseez. Categories: Filter 2D, Adaptive Threshold, Blur, FAST. Value labels: 18.9, 138, 163.6, 32.4, 2.3, 3.1, 3.1, 7.9]

[Chart: Processing on ARM v8, OpenCV vs. Itseez. Categories: Filter 2D, Adaptive Threshold, Blur, FAST. Value labels: 30.8, 30.1, 27.1, 23.2, 2.5, 1.4, 0.65]

• Note scalar difference ARM v7A vs. v8

Page 11: OpenCV for Embedded: Lessons Learned


• Itseez ADAS solution
  • Traffic Sign Recognition
  • Front Collision Warning
  • Lane Departure Warning
  • Pedestrian Detection

• All algorithms run in real time on an off-the-shelf ARM device
  • Designed and tested using OpenCV
  • Product implements an intelligent pipeline layer to reduce load
  • Uses custom accelerated functions

Page 12: OpenCV for Embedded: Lessons Learned


• Intelligent pipeline
  • Shares computation results between algorithms
  • Complicated processing is performed only once, used by all
  • Multiple frame sizes used where appropriate
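The "computed once, used by all" idea can be sketched as a per-frame cache of named intermediate results. The talk does not describe how the Itseez pipeline is built, so everything here is an assumption for illustration: the class `FrameCache`, its methods, and the use of string keys are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

using Image = std::vector<std::uint8_t>;

// Hypothetical per-frame cache: each named intermediate result (for example
// a grayscale or downscaled frame) is computed at most once per frame and
// then shared by every algorithm that asks for it.
class FrameCache {
public:
    const Image& get(const std::string& key, const std::function<Image()>& compute) {
        auto it = results_.find(key);
        if (it == results_.end()) {
            it = results_.emplace(key, compute()).first;
            ++computations_;
        }
        return it->second;
    }

    // Drop all results when a new frame arrives.
    void nextFrame() { results_.clear(); }

    // Total number of times a result actually had to be computed.
    int computations() const { return computations_; }

private:
    std::map<std::string, Image> results_;
    int computations_ = 0;
};
```

With such a cache, a sign detector and a pedestrian detector that both request the same key within one frame trigger a single computation; separate keys per resolution would cover the "multiple frame sizes" case.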

• Custom NEON optimizations
  • Heavily optimized using only NEON, no GPU, DSP
  • Multiple processing functions are joined to reduce memory access
  • E.g. demosaicing with conversion to grayscale & RGBA
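A heavily simplified sketch of joining functions to save memory traffic: one pass over the raw sensor data produces both an RGBA image and a grayscale image. This is not the Itseez implementation. It assumes an RGGB Bayer layout, collapses every 2x2 cell into one output pixel (so the output is half resolution, a crude stand-in for real demosaicing), and uses plain scalar code; `FusedOutput` and `demosaicHalfRes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct FusedOutput {
    int width;
    int height;
    std::vector<std::uint8_t> rgba;  // 4 bytes per pixel
    std::vector<std::uint8_t> gray;  // 1 byte per pixel
};

// Single pass over an RGGB mosaic: each 2x2 cell (R G / G B) becomes one
// output pixel, written as RGBA and as grayscale at the same time, so the
// raw data is read from memory only once.
FusedOutput demosaicHalfRes(const std::vector<std::uint8_t>& bayer,
                            int width, int height) {
    assert(width % 2 == 0 && height % 2 == 0);
    assert(bayer.size() == static_cast<std::size_t>(width) * height);
    FusedOutput out{width / 2, height / 2, {}, {}};
    const std::size_t pixels = static_cast<std::size_t>(out.width) * out.height;
    out.rgba.resize(pixels * 4);
    out.gray.resize(pixels);
    for (int y = 0; y < out.height; ++y) {
        const std::uint8_t* top = &bayer[static_cast<std::size_t>(2 * y) * width];
        const std::uint8_t* bottom = top + width;
        for (int x = 0; x < out.width; ++x) {
            const int r = top[2 * x];
            const int g = (top[2 * x + 1] + bottom[2 * x] + 1) / 2;
            const int b = bottom[2 * x + 1];
            const std::size_t i = static_cast<std::size_t>(y) * out.width + x;
            out.rgba[4 * i + 0] = static_cast<std::uint8_t>(r);
            out.rgba[4 * i + 1] = static_cast<std::uint8_t>(g);
            out.rgba[4 * i + 2] = static_cast<std::uint8_t>(b);
            out.rgba[4 * i + 3] = 255;  // opaque alpha
            out.gray[i] = static_cast<std::uint8_t>((77 * r + 150 * g + 29 * b + 128) >> 8);
        }
    }
    return out;
}
```

Running demosaicing, RGBA packing and grayscale conversion as three separate functions would read or write full-size intermediate images between each step; the fused loop keeps those values in registers instead.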

• Some interesting statistics
  • Algorithm optimizations accelerate by a factor of 2-3
  • NEON accelerations give another 3-4x
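If the two gains compound multiplicatively, which the word "another" suggests but the slide does not state outright, the combined speedup range works out as follows:

```latex
S_{\text{total}} = S_{\text{algorithm}} \times S_{\text{NEON}}, \qquad
2 \times 3 = 6 \;\le\; S_{\text{total}} \;\le\; 3 \times 4 = 12
```

So under that assumption the overall acceleration is roughly 6x to 12x.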

Page 13: OpenCV for Embedded: Lessons Learned


• OpenVX standard by Khronos
  • Hardware accelerated vision: easier life for everyone
  • Currently being implemented by a number of vendors

• OpenCV HAL (a part of OpenCV 3.x)
  • Low-level API beneath the standard OpenCV
  • Open-source, but can potentially use proprietary components

• Generic multi-core scheduler (Planned feature)
  • Make multi-core scheduler more intelligent on mobile architectures
  • pthread-based backend in addition to existing options

• Vision benchmarks for hardware (Desired feature)
  • Some performance tests are present in OpenCV already
  • Not possible to use for benchmarking directly, some work is needed
  • OpenCV Manager for Android could also contain benchmarking

Page 14: OpenCV for Embedded: Lessons Learned


• Itseez Web: www.itseez.com

• OpenCV home: www.opencv.org
• OpenCV documentation: docs.opencv.org
• GitHub: https://github.com/Itseez/opencv
• OpenCV resources on Embedded Vision Alliance (plenty of info): http://www.embedded-vision.com/opencv-resources

• OpenCV on TI: http://www.ti.com/lit/wp/spry175/spry175.pdf
• OpenCV on NVIDIA: https://developer.nvidia.com/opencv

• E-mail me: [email protected]

Page 15: OpenCV for Embedded: Lessons Learned


Q & A