real-time attention system gpu-vocus for exploration · 2008-04-01 · macs_y3_visual_attention.ppt...
Real-time attention system GPU-VOCUS for exploration
Stefan May, Adaptive Reflective Teams Department
Sankt Augustin, February 15th, 2008
MACS_Y3_Visual_Attention.ppt 2FP6-004381-MACS
I. Outline
The Role of Visual Attention
Visual Attention for Robot Control
Visual Attention Simulator VOCUS
Visual Attention GPU-accelerated VOCUS
Implementation
The task of Exploration
Conclusion
1. The Role of Visual Attention
Selection of relevant stimuli and processes in interaction with the environment by means of simple features
No previous knowledge about scene or objects!
Focusing on parts of the optical array reduces the computational effort, but visual attention is still a time-consuming task if parallelism is not used
Pop-Out effect: Attraction or warning
2. Visual Attention for Robot Control
Tasks of an Attention System in MACS
Extraction of “interesting” regions, especially in the exploration phase
Monitoring cues during interaction needs a high frame rate (processing time < 33 ms!)
Further: Distance estimation
3. Visual Attention Simulator VOCUS
6 image pyramids
  -> Center-surround / Gabor filter
48 scale maps (12 intensity, 12 orientation, 24 color)
  -> Rescaling / summing up
10 feature maps (2 intensity, 4 orientation, 4 color)
  -> Weighting
3 conspicuity maps (1 intensity, 1 orientation, 1 color)
  -> Fusion
1 saliency map
VOCUS processes lots of independent maps
Overview
VOCUS uses lots of local operators
Application of parallel processing?
Ref.: Frintrop [8] (Btw, parallelism is biologically plausible!)
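The point about local operators can be illustrated with a minimal sketch. It assumes a simple on-center center-surround difference over a square neighbourhood; the function names, the radius and the test image are hypothetical and are not taken from VOCUS. Each output pixel depends only on its own neighbourhood, which is why such operators map well onto parallel hardware.

```python
# Hypothetical sketch of an on-center center-surround operator.
# The image is a list of rows of grayscale values; this is not the VOCUS code.

def surround_mean(img, y, x, radius):
    """Mean of the neighbours within `radius` of (y, x), excluding the center."""
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for yy in range(max(0, y - radius), min(h, y + radius + 1)):
        for xx in range(max(0, x - radius), min(w, x + radius + 1)):
            if (yy, xx) != (y, x):
                total += img[yy][xx]
                count += 1
    return total / count if count else 0.0

def on_center(img, radius=1):
    """On-center response max(0, center - surround mean), computed per pixel."""
    return [[max(0.0, img[y][x] - surround_mean(img, y, x, radius))
             for x in range(len(img[0]))]
            for y in range(len(img))]

# A single bright pixel on a uniform background "pops out".
image = [
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 90, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
]
response = on_center(image)
```

Because no pixel reads the result of another pixel, all responses could be computed at the same time, one per GPU fragment.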
4. Visual Attention: GPU-accelerated VOCUS
Ref.: Shih-hsuan Hsu, Graphics group, CMLab, CSIE, NTU
278.6
NV GF 7800 GT
5189.1 GFlops (max.)
NV GF 8800 GTX
Intel P4 630, 3 GHz
CPU: Intel Pentium 4
GPU: NV GF 8800 GTX
Arithmetic performance of the GPU is the result of a highly specialized architecture (SIMD)
5. Requirements for attending to the real world
Tasks of an Attention System in MACS
Extraction of “interesting” regions without any knowledge about the environment, especially in the exploration phase
Monitoring cues during interaction needs a high frame rate (processing time < 33 ms!)
Further: Distance estimation via triangulation
5. Implementations
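Runtime Comparison (VGA resolution)
Mean runtime / ms                        Feature orientation
                                         no        yes
GPU-VOCUS (NV GF 8800 GTX / 32-bit)      9.6       21.8
GPU-VOCUS (NV GF 7800 GT / 16-bit)       25.0      57.7
GPU-VOCUS (NV GF 7800 GT / 32-bit)       34.3      77.5
VOCUS (integral)                         89.1      129.2
VOCUS                                    969.7     1407.6
Speedup of GPU-VOCUS (NV GF 8800 GTX) over VOCUS (integral): ~6 / ~9 (with / without feature orientation)
Speedup of GPU-VOCUS (NV GF 8800 GTX) over VOCUS: ~65 / ~101 (with / without feature orientation)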
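The speedup figures can be checked against the mean runtimes with a few lines of arithmetic. The sketch below is only a sanity check: it assumes the first number of each runtime pair is the run without feature orientation and the second the run with it, and it uses the 33 ms limit from the requirements slide as a frame budget. The dictionary layout and function names are hypothetical.

```python
# Mean runtimes in ms at VGA resolution, taken from the comparison above.
# Assumed column order: (without feature orientation, with feature orientation).
runtimes_ms = {
    "VOCUS": (969.7, 1407.6),
    "VOCUS (integral)": (89.1, 129.2),
    "GPU-VOCUS (NV GF 7800 GT / 32-bit)": (34.3, 77.5),
    "GPU-VOCUS (NV GF 7800 GT / 16-bit)": (25.0, 57.7),
    "GPU-VOCUS (NV GF 8800 GTX / 32-bit)": (9.6, 21.8),
}

FRAME_BUDGET_MS = 33.0  # processing-time limit named in the requirements

def speedup(baseline, candidate):
    """Runtime ratio baseline / candidate for each of the two columns."""
    return tuple(b / c for b, c in zip(runtimes_ms[baseline], runtimes_ms[candidate]))

def meets_budget(name):
    """Per column: does the mean runtime fit into the frame budget?"""
    return tuple(t < FRAME_BUDGET_MS for t in runtimes_ms[name])

fastest = "GPU-VOCUS (NV GF 8800 GTX / 32-bit)"
for baseline in ("VOCUS", "VOCUS (integral)"):
    without_ori, with_ori = speedup(baseline, fastest)
    print(f"{baseline}: x{without_ori:.0f} without, x{with_ori:.0f} with orientation")
```

Under these assumptions only the NV GF 8800 GTX configuration stays below 33 ms in both columns; the 16-bit NV GF 7800 GT configuration does so only without feature orientation.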
5. Implementations
VGA-Resolution
Online extraction with full camera frame rate (15 Hz)
CPU resources are free for further processing tasks
Feature Extraction with GPU-VOCUS (no IOR)
5. Implementations
Find similar features
Combine them to regions
Works like a “low-level tracking” if the cue is unambiguous
Using Top-Down Mode
5. Visual Attention for Exploration
Exploration using visual attention (“curiosity”)
Basic skill
Attention system VOCUS
Bottom-up in left and Top-down in right camera images
S. May et al.: IROS 2007
CPU version shown at the Y2 review (~3 Hz)
6. The task of Exploration
VOCUS vs. GPU-VOCUS (2 x VGA-Resolution)
VOCUS (integral): ~3 Hz
GPU-VOCUS (monitoring a physical process): ~15 Hz (30 Hz possible)
6. The task of Exploration – Triangulation
Exploration using visual attention (“curiosity”)
Compute approximate position using triangulation (+ Mean shift)
Approach blue entity
Approach yellow entity
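The triangulation step can be sketched as follows. This is a minimal example, assuming a rectified stereo pair with parallel optical axes, where depth follows from the disparity as Z = f * B / d. The slides do not state the camera model, so the focal length, baseline, principal point and pixel coordinates below are hypothetical, and the mean-shift step is left out.

```python
# Hypothetical sketch: position of a salient point from a rectified stereo pair.
# Assumes parallel cameras; f is in pixels, the baseline in metres.

def triangulate(x_left, x_right, y, focal_px, baseline_m, cx, cy):
    """Return (X, Y, Z) in metres in the left-camera frame."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("non-positive disparity: point at infinity or mismatched")
    z = focal_px * baseline_m / disparity   # depth from disparity
    x = (x_left - cx) * z / focal_px        # lateral offset
    y_m = (y - cy) * z / focal_px           # vertical offset
    return x, y_m, z

# Example values (all assumed): the same salient region is found at
# column 340 in the left image and column 300 in the right image.
position = triangulate(x_left=340, x_right=300, y=240,
                       focal_px=500.0, baseline_m=0.2, cx=320.0, cy=240.0)
```

With these assumed values the disparity is 40 pixels, which places the entity roughly 2.5 m in front of the left camera and about 0.1 m to its right.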
6. Results
GPU-VOCUS provides cues of “interesting regions”
Classification of simple features
Learning of relation between “Cue – Behavior – Outcome”
Saliency-based Exploration
7. Conclusion
Online calculation of relevant stimuli with visual attention
No previous knowledge, no model database
Usage of standard hardware on a mobile robot
Speedup improvements enable monitoring of physical processes
Future work “Curiosity drive”: Saliency is not sufficient for “curiosity”!
Curiosity involves novelty detection, experience, and learning
References
[1] J. J. Gibson, The Ecological Approach to Visual Perception. Lawrence Erlbaum Associates, 1979.
[2] G. Fritz, L. Paletta, R. Breithaupt, E. Rome, and G. Dorffner, Learning Predictive Features in Affordance-based Robotic Perception Systems, in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 2006.
[3] A. P. Duchon, W. H. Warren, and L. P. Kaelbling, Ecological robotics, Adaptive Behavior, Special Issue on Biologically Inspired Models of Spatial Navigation, vol. 6, no. 3-4, pp. 473–507, 1998.
[4] W. Warren and S. Whang, Visual guidance of walking through apertures: Body scaled information for affordances, 1987, vol. 13, pp. 371–383.
[5] L. Mark, Eyeheight-scaled information about affordances: Learning and projecting a sensori-motor mapping, 1987, vol. 13, pp. 361–370.
[6] K. MacDorman, Grounding symbols through sensorimotor integration, 1999.
[7] Fraunhofer IAIS. (2007) EU Project MACS. [Online]. Available: http://www.macs-eu.org
[8] S. Frintrop, VOCUS: A Visual Attention System for Object Detection and Goal-directed Search, ser. Lecture Notes in Artificial Intelligence (LNAI). Springer Berlin/Heidelberg, 2006, vol. 3899.
Real-time attention system GPU-VOCUS for exploration
Thank you for your attention
4. Visual Attention: GPU-accelerated VOCUS
GPGPU – some Properties
Overhead through data transmission (CPU <-> GPU)
Per-pixel execution model (parallel execution!)
Programs running on the GPU are called shaders
Multipass rendering / micropass rendering (many small shaders vs. large shaders; bottleneck, reusability?)
Shaders are applied when “something” is drawn (render-to-texture, multiple copy operations)
More difficult to debug than processing on the CPU (debugger as well as I/O are not comfortable)
Different high-level shading languages available (GLSL, HLSL, Cg, CUDA)
Global operations need multiple passes (e.g. calculation of maxima)
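Why a global maximum needs several passes can be sketched on the CPU. The example below simulates one common scheme, assumed here for illustration and not confirmed by the slides: each pass writes a texture of half the size in which every pixel holds the maximum of a 2x2 block of the previous texture, until a single pixel remains. The function names are hypothetical.

```python
# Hypothetical CPU simulation of a multi-pass maximum reduction.
# On a GPU each output pixel of a pass would be computed by one shader invocation.

def reduce_pass(tex):
    """One simulated render pass: output pixel = max of a 2x2 input block."""
    h, w = len(tex), len(tex[0])
    out = []
    for y in range((h + 1) // 2):
        row = []
        for x in range((w + 1) // 2):
            block = [tex[yy][xx]
                     for yy in (2 * y, 2 * y + 1) if yy < h
                     for xx in (2 * x, 2 * x + 1) if xx < w]
            row.append(max(block))
        out.append(row)
    return out

def global_max(tex):
    """Repeat reduction passes until one pixel is left; return (max, passes)."""
    passes = 0
    while len(tex) > 1 or len(tex[0]) > 1:
        tex = reduce_pass(tex)
        passes += 1
    return tex[0][0], passes

# 8x8 test texture with values 0..63: three passes (8 -> 4 -> 2 -> 1).
texture = [[y * 8 + x for x in range(8)] for y in range(8)]
maximum, passes = global_max(texture)
```

The number of passes grows with the logarithm of the larger image dimension, so a 640x480 map would need ten passes under this scheme.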