"OpenCV for Embedded: Lessons Learned," a Presentation from Itseez


Upload: embedded-vision-alliance

Post on 24-Jan-2017


TRANSCRIPT

Page 1: "OpenCV for Embedded: Lessons Learned," a Presentation from Itseez

Copyright © 2015 Itseez

Yury Gorbachev

12 May 2015

OpenCV for Embedded: Lessons Learned

Page 2: What is OpenCV

• Open-source computer vision library (>2500 algorithms)
• De facto standard in CV, BSD license
• Written in C++; the C interface is now deprecated
• Supports multiple platforms (Linux, Windows, OS X, Android, iOS, QNX)
• Used by Google, NVIDIA, Microsoft, Intel, Stanford, etc.
• Funding/contributions from Willow Garage, NVIDIA, GSoC, AMD, Intel
• Maintained by Itseez

Page 3: Desktops Are Good And Fast

• OpenCV provides extensive means to create an entire application
  • Camera interface (for example, V4L2 interface on Linux)
  • Video reading interface (using ffmpeg)
  • UI primitives (windows, keyboard/mouse input, etc.)
• Decent performance out of the box
  • Scalar performance is already good enough
  • Some algorithms are capable of working at ~100 FPS on average desktops
  • Extra optimization is not required in most cases
• Good and pretty stable acceleration possibilities
  • Intel® TBB is sufficient for multi-core
  • AVX, IPP, OpenCL, CUDA

Page 4: Embedded Changes A Lot

• Mostly ARM platforms
• Exotic execution environments
  • C++ is not the default language (e.g. on Android)
  • Different interfaces (camera, UI, log)
  • Hard to troubleshoot
• Insufficient and unpredictable performance
  • Mobile and embedded are still behind desktop
  • Thermal protection, power saving and other tricky issues
• Zoo of acceleration possibilities
  • SIMD, DSP, GPU offload, FPGA
  • Multi-core systems, heterogeneous systems

Page 5: OpenCV Based Algorithms Are Highly Portable

[Diagram: OpenCV]
• Platform-agnostic modules: core, imgproc, calib3d, video, ml, objdetect, features2d, photo, …
• Platform-dependent modules: gpu, highgui, androidcamera, python and java bindings
• Dependencies: JPEG, PNG, Jasper, multimedia, OpenNI
• Dependencies: CMake
• Accelerations: TBB/GCD/Concurrency, IPP, Eigen

• Algorithm modules are easy to migrate to a new environment
  • C++ and CMake are the only requirements!
• OpenCV accuracy tests
  • Easily verify correctness of OpenCV on a new platform
  • Some vendors use them for regression tests during environment updates
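As a rough illustration of the accuracy-test idea on this slide, the sketch below compares an output computed on a new platform against a stored reference within a tolerance. It is plain Python, not OpenCV's own test code; the function name `matches_reference`, the tolerance, and the sample pixel rows are all assumptions made for illustration.

```python
from typing import Sequence


def matches_reference(output: Sequence[float],
                      reference: Sequence[float],
                      max_abs_error: float = 1.0) -> bool:
    """Return True if every output value is within max_abs_error of the reference."""
    if len(output) != len(reference):
        return False
    return all(abs(o - r) <= max_abs_error for o, r in zip(output, reference))


# Hypothetical data: one row of pixel values from a desktop reference run
# and the same row as computed on the target platform.
reference_row = [10, 20, 30, 40]
target_row_ok = [10, 21, 30, 39]    # differs by at most 1 (e.g. rounding)
target_row_bad = [10, 25, 30, 40]   # differs by 5 at index 1

print(matches_reference(target_row_ok, reference_row))
print(matches_reference(target_row_bad, reference_row))
```

A small per-value tolerance is used here because rounding can legitimately differ between platforms; the right threshold depends on the function under test.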

Page 6: Use Desktop For Algorithm Development

[Diagram stages: Prototyping (x86), Porting, Profiling, Bottleneck optimization, Fine Tuning, Productization; also shown: Regression Tests, Performance Tests]

• Video input, more debug possibilities, simple UI, higher speed
• Focus on algorithm, not environment!

Page 7: Consider Embedded Performance Issues

• Hardware performance is always an issue for vision systems
  • Heavy image processing requires significant memory bandwidth
  • Usual bottleneck; multiple cores do not help
  • Collocation of multiple algorithms on a single system (e.g. ADAS)
• Mobile platforms are even more complicated
  • Thermal protection, power saving are hard to control and influence
  • Hard to predict when/if we are consuming too much
  • Unstable FPS impacts algorithm complexity (e.g. object tracking)
• Hardware selection is not easy
  • Very hard to predict final application performance beforehand
  • No valid benchmarks to emulate computer vision patterns
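To make the memory-bandwidth point above more concrete, here is a back-of-the-envelope sketch. The function name, the "one read plus one write per pass" model, and the example workload are assumptions for illustration, not figures from the presentation.

```python
def required_bandwidth_mb_s(width: int, height: int, bytes_per_pixel: int,
                            fps: float, passes: int) -> float:
    """Estimate memory traffic in MB/s (1 MB = 10**6 bytes).

    Assumes each pass reads the whole frame once and writes it once,
    with no cache reuse between passes.
    """
    bytes_per_frame = width * height * bytes_per_pixel
    traffic_per_frame = bytes_per_frame * 2 * passes  # one read + one write per pass
    return traffic_per_frame * fps / 1e6


# Hypothetical workload: 1280x720 RGB video at 30 FPS, 10 full-frame passes.
print(required_bandwidth_mb_s(1280, 720, 3, 30, 10))
```

Under this simple model the traffic does not depend on the number of cores, which is one way to read the remark that multiple cores do not help once bandwidth is the bottleneck.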

Page 8: It Is Normal If It’s Slow Without Optimizations

• OpenCV was initially optimized for desktop, where it works fast
• ARM optimizations are far behind
• Scalar code does not perform as well on ARM as on x86
• Optimization might help to some extent

[Chart: Number of optimized functions within OpenCV; bars: SSE, IPP, NEON (OpenCV 3), NEON; extracted values: 150, 100, 50, 5]

Page 9: A Few Optimization Hints

• Algorithm optimization first, and only then hotspots
  • Reduce search and track areas, use grayscale, reduce resolution
• Select proper hardware if possible
  • Compare development kit performance at least
  • Try ARMv8, it is better in scalar performance
• Use OpenCV packages from hardware vendors (NVIDIA, TI)
  • Vendor-specific packages yield out-of-the-box improvements on specific hardware, very easy to try
  • Not a cross-platform solution
• Optimize functions yourself
  • NEON, DSP and other hardware-specific options
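The first hint (smaller search areas, grayscale, lower resolution) can be quantified with a small estimate of how much per-frame data remains. This is a sketch under stated assumptions: the function name and the example settings are hypothetical, and a reduction in data does not necessarily translate into the same reduction in run time.

```python
def relative_pixel_work(scale: float = 1.0, channels_before: int = 3,
                        channels_after: int = 1, roi_fraction: float = 1.0) -> float:
    """Fraction of the original per-frame data left after the reductions.

    scale         linear resize factor applied to both width and height
    roi_fraction  share of the (resized) frame area that is actually processed
    """
    return (scale * scale) * (channels_after / channels_before) * roi_fraction


# Hypothetical setup: half resolution, 3-channel color -> grayscale,
# and only the lower half of the frame is searched.
fraction = relative_pixel_work(scale=0.5, channels_before=3,
                               channels_after=1, roi_fraction=0.5)
print(f"{fraction:.4f} of the original data, about {1 / fraction:.0f}x less")
```

Because the resize factor enters squared, halving the resolution alone already cuts the data to a quarter before any hotspot is touched.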

Page 10: Itseez Achievements

[Chart: Processing on ARM v7A; series: OpenCV, Itseez; functions: Filter 2D, AdaptiveThreshold, Blur, FAST; extracted values: 18.9, 138, 163.6, 32.4 and 2.3, 3.1, 3.1, 7.9]

• Note scalar difference ARM v7A vs. v8

[Chart: Processing on ARM v8; series: OpenCV, Itseez; functions: Filter 2D, AdaptiveThreshold, Blur, FAST; extracted values: 30.8, 30.1, 27.1, 23.2 and 2.5, 1.4, 0.6, 5]

Page 11: Actual Product Example

• Itseez ADAS solution
  • Traffic Sign Recognition
  • Front Collision Warning
  • Lane Departure Warning
  • Pedestrian Detection
• All algorithms run in real time on an off-the-shelf ARM device
• Designed and tested using OpenCV
• Product implements an intelligent pipeline layer to reduce load
• Uses custom accelerated functions

Page 12: Itseez ADAS: Some More Details

• Intelligent pipeline
  • Shares computation results between algorithms
  • Complicated processing is performed only once, used by all
  • Multiple frame sizes used where appropriate
• Custom NEON optimizations
  • Heavily optimized using only NEON, no GPU or DSP
  • Multiple processing functions are joined to reduce memory access
  • E.g. demosaicing with conversion to grayscale & RGBA
• Some interesting statistics
  • Algorithm optimizations accelerate by a factor of 2-3
  • NEON accelerations give another 3-4x
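The two ideas on this slide, sharing results between algorithms and joining several processing steps into one pass over memory, can be sketched in a few lines. This is not the Itseez implementation: the class `FramePipeline`, the function `fused_gray_and_rgba`, and the two consumer functions are hypothetical, the input is plain RGB rather than raw Bayer data (so no real demosaicing is done), and pure Python stands in for NEON code.

```python
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def fused_gray_and_rgba(rgb: List[Pixel]):
    """Produce grayscale and RGBA outputs in a single pass over the input."""
    gray, rgba = [], []
    for r, g, b in rgb:
        # Integer approximation of the BT.601 luma weights (0.299, 0.587, 0.114);
        # the three weights sum to 256, hence the shift by 8.
        gray.append((77 * r + 150 * g + 29 * b) >> 8)
        rgba.append((r, g, b, 255))
    return gray, rgba


class FramePipeline:
    """Per-frame cache: each registered stage runs at most once per frame."""

    def __init__(self, frame: List[Pixel]):
        self.frame = frame
        self._stages: Dict[str, Callable] = {}
        self._results: Dict[str, object] = {}
        self.compute_counts: Dict[str, int] = {}

    def register(self, name: str, fn: Callable) -> None:
        self._stages[name] = fn

    def get(self, name: str):
        if name not in self._results:  # compute lazily, then reuse
            self._results[name] = self._stages[name](self.frame)
            self.compute_counts[name] = self.compute_counts.get(name, 0) + 1
        return self._results[name]


# Two hypothetical consumers that both need the grayscale image.
def sign_detector(p: FramePipeline) -> int:
    gray, _ = p.get("gray_rgba")
    return max(gray)


def lane_detector(p: FramePipeline) -> int:
    gray, _ = p.get("gray_rgba")
    return min(gray)


pipeline = FramePipeline([(255, 255, 255), (0, 0, 0), (255, 0, 0)])
pipeline.register("gray_rgba", fused_gray_and_rgba)
print(sign_detector(pipeline), lane_detector(pipeline))
print(pipeline.compute_counts)  # the shared stage was computed once
```

The cache is keyed per frame, so a new `FramePipeline` would be created for each incoming frame; the fused function reads every input pixel once even though it produces two outputs.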

Page 13: What Is Missing? What Is Planned?

• OpenVX standard by Khronos
  • Hardware-accelerated vision: easier life for everyone
  • Currently being implemented by a number of vendors
• OpenCV HAL (a part of OpenCV 3.x)
  • Low-level API beneath the standard OpenCV
  • Open-source, but potentially can use proprietary components
• Generic multi-core scheduler (planned feature)
  • Make multi-core scheduler more intelligent on mobile architectures
  • pthread-based backend in addition to existing options
• Vision benchmarks for hardware (desired feature)
  • Some performance tests are present in OpenCV already
  • Not possible to use for benchmarking directly, some work is needed
  • OpenCV Manager for Android could also contain benchmarking
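As a minimal sketch of what a hardware vision benchmark might measure, the harness below times a function over several runs and reports median and worst-case latency. It is not OpenCV's performance-test framework; `benchmark`, `fake_kernel`, the run counts, and the injectable `clock` parameter are assumptions for illustration.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], runs: int = 20, warmup: int = 3,
              clock: Callable[[], float] = time.perf_counter) -> Dict[str, float]:
    """Time fn over several runs; report median and worst latency and derived FPS."""
    for _ in range(warmup):
        fn()  # untimed runs so caches and clock frequencies can settle
    samples = []
    for _ in range(runs):
        start = clock()
        fn()
        samples.append(clock() - start)
    median_s = statistics.median(samples)
    return {
        "median_ms": median_s * 1e3,
        "worst_ms": max(samples) * 1e3,
        "fps": 1.0 / median_s if median_s > 0 else float("inf"),
    }


def fake_kernel() -> int:
    # Stand-in for a vision function; a real benchmark would call e.g. a blur.
    return sum(i * i for i in range(10_000))


result = benchmark(fake_kernel)
print(f"median {result['median_ms']:.3f} ms, worst {result['worst_ms']:.3f} ms")
```

Reporting the worst case next to the median is a deliberate choice here: on devices with thermal protection and power saving, the spread between the two can matter as much as the typical value.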

Page 14: Resources

• Itseez Web: www.itseez.com
• OpenCV home: www.opencv.org
• OpenCV documentation: docs.opencv.org
• GitHub: https://github.com/Itseez/opencv
• OpenCV resources on Embedded Vision Alliance (plenty of info): http://www.embedded-vision.com/opencv-resources
• OpenCV on TI: http://www.ti.com/lit/wp/spry175/spry175.pdf
• OpenCV on NVIDIA: https://developer.nvidia.com/opencv
• E-mail me: [email protected]

Page 15: Q & A