T1.1 - Analysis of acceleration opportunities and virtualization requirements in industrial applications
Bologna, April 2012
UNIBO
Android and accelerators?
• Android is the most widely used operating system for mobile devices
– Linux-based
– Open source
• Which applications running on Android-based devices could benefit from HW acceleration (GPPA, HWPU)?
– Smartphones have a camera and increasingly powerful image processing capabilities
– Innovative and attractive apps leverage their portability and ubiquity
Computer Vision
• Computer Vision is a branch of computer science that includes many techniques to extract, characterize, and interpret information in visual images
• Scientific and industrial communities are showing a growing interest in developing Computer Vision (CV) algorithms on embedded systems
Augmented Reality
• Augmented reality (AR) is a live view of a real-world environment with virtual objects superimposed upon (or composited with) the current scene
– Semantic context
– Real-time constraints
• Layar is an augmented reality browser for Android and iOS
– It uses sensor data (camera, compass, GPS, and accelerometer) to identify the user's location and field of view
– It shows geo-located points of interest (POI) organized in layers
– As of September 2011, Layar had 2993 layers
AR Algorithms
• A primary issue in augmented reality applications is image registration, that is, the process of deriving real-world coordinates from images
• A first step in image registration is the detection of feature points using suitable algorithms
• OpenCV is a C/C++ library that includes many CV algorithms, including feature detectors
– An Android build is available
Feature extraction kernels
Android – OpenCV API Reference (http://opencv.itseez.com/)
• features2d – Feature detection and description
• SIFT – Scale-Invariant Feature Transform [Yuan09]
• SURF – Speeded Up Robust Features [Bay06]
• FAST – Detects corners using the FAST algorithm [Rosten10]
[Yuan09] Y. Yuan, C. Shi, "Object tracking using SIFT features and mean shift", Computer Vision and Image Understanding, 2009
[Bay06] H. Bay, T. Tuytelaars, L. Van Gool, "SURF: Speeded Up Robust Features", 9th European Conference on Computer Vision, 2006
[Rosten10] E. Rosten, R. Porter, T. Drummond, "Faster and Better: A Machine Learning Approach to Corner Detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 1, pp. 105-119, Jan. 2010
Example: FAST Algorithm
• For each image point p, FAST examines the 16 pixels on a circle of radius 3 centered on p
• A feature is detected iff at least n contiguous pixels on the circle are all brighter than the intensity of p plus a threshold t, or all darker than the intensity of p minus t
• Most feature detector algorithms are inherently parallel, as they verify some properties for each point in the current image
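The segment test described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the OpenCV implementation: the image is assumed to be a list of rows of grayscale intensities, the function names are hypothetical, and the default n = 9 is an assumption (the slide leaves n unspecified). The caller is expected to keep (x, y) at least 3 pixels away from the image border.

```python
# Offsets (dx, dy) of the 16 pixels on a circle of radius 3, listed in
# ring order starting from the pixel directly above the center.
CIRCLE = [
    (0, -3), (1, -3), (2, -2), (3, -1),
    (3, 0), (3, 1), (2, 2), (1, 3),
    (0, 3), (-1, 3), (-2, 2), (-3, 1),
    (-3, 0), (-3, -1), (-2, -2), (-1, -3),
]


def longest_circular_run(flags):
    """Length of the longest run of True values, treating the list as a ring."""
    if all(flags):
        return len(flags)
    best = run = 0
    # Walking the list twice lets a run wrap around from the end to the start.
    for flag in flags + flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def is_fast_corner(image, x, y, t, n=9):
    """Segment test at (x, y): True if at least n contiguous ring pixels are
    all brighter than p + t or all darker than p - t.
    n=9 is an assumed default, not a value taken from the slides."""
    p = image[y][x]
    ring = [image[y + dy][x + dx] for dx, dy in CIRCLE]
    brighter = [v > p + t for v in ring]
    darker = [v < p - t for v in ring]
    return (longest_circular_run(brighter) >= n
            or longest_circular_run(darker) >= n)
```

Because the test at each pixel reads only the input image and writes nothing shared, every pixel can be evaluated independently, which is what makes the detector a natural candidate for data-level parallelization.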
Embedded Platforms for Benchmarking
                 LG Optimus 2x          Pandaboard             DragonBoard
Type             Consumer smartphone    Low-cost dev board     Advanced dev board
CPU              Dual-Core Cortex-A9    Dual-Core Cortex-A9    Dual-Core Scorpion
Frequency        1 GHz per core         1 GHz per core         1.2 GHz per core
L1 Cache (I/D)   32KB / 32KB per core   32KB / 32KB per core   32KB / 32KB per core
L2 Cache         1 MB shared            1 MB shared            512KB shared
Main Memory      1GB LPDDR2-667         1GB LPDDR2-400         1GB LPDDR2-333 ISM
Feature Detection on Embedded Platforms
• This figure shows the speed-up for a scalable version of FAST on three different platforms
– Fine-grained data-level parallelization: the main computation loop divides the image into multiple horizontal bands, giving a regular memory access pattern
– The measured speed-up is very limited
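The horizontal-band decomposition can be sketched as follows. This is a hypothetical illustration, not the benchmarked code: the function names, the `is_feature` callback, and the 3-pixel border (matching the radius of the FAST circle) are all assumptions. Note that CPython threads share a global interpreter lock, so this sketch only shows how the work is divided and would not itself reproduce a speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_bands(height, num_bands, border=3):
    """Split rows [border, height - border) into contiguous (start, end)
    ranges of nearly equal size, one per band."""
    first, last = border, height - border
    rows = max(0, last - first)
    num_bands = max(1, min(num_bands, rows)) if rows else 1
    base, extra = divmod(rows, num_bands)
    bands, start = [], first
    for i in range(num_bands):
        size = base + (1 if i < extra else 0)  # spread the remainder
        bands.append((start, start + size))
        start += size
    return bands


def detect_in_band(image, band, is_feature, border=3):
    """Scan one band row by row, so memory is accessed in a regular pattern."""
    y0, y1 = band
    width = len(image[0])
    return [(x, y)
            for y in range(y0, y1)
            for x in range(border, width - border)
            if is_feature(image, x, y)]


def detect_parallel(image, is_feature, num_threads, border=3):
    """Run the per-pixel test on each band in its own worker thread."""
    bands = split_into_bands(len(image), num_threads, border)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        parts = list(pool.map(
            lambda band: detect_in_band(image, band, is_feature, border),
            bands))
    # Bands are disjoint, so results can simply be concatenated in order.
    return [point for part in parts for point in part]
```

Each band reads the shared input image but produces its own result list, so no synchronization is needed until the final concatenation.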
Fine-Grained Data-Level Parallelization
• We tested the same version of FAST using a multi-core virtual platform
– The experimental speed-up is closer to the ideal one when the number of threads becomes comparable with the number of cores
• The number of cores is limited (max 4 in current generation)
– A viable solution to exploit scalability is the use of accelerators
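For reference, the comparison between measured and ideal speed-up can be written down as a small helper. This is a sketch under stated assumptions: the ideal speed-up is taken to be the smaller of the thread count and the core count, which is a simplification introduced here, and the function names are hypothetical. No measured timings from the slides are reproduced.

```python
def speedup(t_serial, t_parallel):
    """Measured speed-up: single-thread run time divided by multi-thread run time."""
    return t_serial / t_parallel


def ideal_speedup(num_threads, num_cores):
    """Assumed ideal: extra threads beyond the number of cores cannot help."""
    return min(num_threads, num_cores)


def efficiency(t_serial, t_parallel, num_threads, num_cores):
    """Fraction of the ideal speed-up actually achieved (1.0 means ideal)."""
    return speedup(t_serial, t_parallel) / ideal_speedup(num_threads, num_cores)
```

With a fixed core count the ideal value stops growing once the number of threads exceeds the number of cores, which is why further scaling has to come from additional processing elements such as accelerators.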