![Page 1: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/1.jpg)
Computational Vision and Active Perception
Active Object Recognition under Gaze Control
Jan-Olof Eklundh, Mårten Björkman
CVAP, KTH, Stockholm
![Page 2: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/2.jpg)
The KTH Head of 1991
• Independent eye and neck movements
• Eye rotations around the optical center
• Eccentric neck
• Drive towards symmetry constrained redundancy
• Monocular stabilization and pursuit, binocular stereopsis and accommodation: independent but integrated
• Binocular fixation at lateral speeds up to 115°/s and 5 m/s in depth
• Saccades up to 360°/s
![Page 3: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/3.jpg)
![Page 4: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/4.jpg)
![Page 5: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/5.jpg)
What did this system “see”?
• High performance through tightly integrated hardware, but no resources left over
• Information about ego-motion, independent object motion and depth was available but could not be utilized
• The same held for the appearance of 3D objects and, to some extent, their pose
• Goal of current work: to exploit this information in visual search and hand-eye coordination tasks
![Page 6: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/6.jpg)
State-of-the-art object classification
![Page 7: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/7.jpg)
A robot looking at a table at 1.5 m
Objects subtend only a fraction of the scene and are not centered (unless an attentional step brings them there).
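To make the point concrete, here is a small worked calculation under a pinhole camera model. Only the 1.5 m distance comes from the slide; the 10 cm object size, 60° horizontal field of view and 640-pixel image width are illustrative assumptions.

```python
import math

def angular_size_deg(size_m: float, distance_m: float) -> float:
    """Full angle subtended by an object of the given size at the given distance."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))

def image_size_px(size_m: float, distance_m: float, fov_deg: float, width_px: int) -> float:
    """Projected size in pixels for a pinhole camera with the given horizontal field of view."""
    focal_px = (width_px / 2) / math.tan(math.radians(fov_deg) / 2)
    return focal_px * size_m / distance_m

# Assumed example: a 10 cm object seen at 1.5 m by a 640-px-wide camera with a 60 degree FOV.
angle = angular_size_deg(0.10, 1.5)
pixels = image_size_px(0.10, 1.5, fov_deg=60.0, width_px=640)
print(f"{angle:.2f} deg, {pixels:.0f} px, {pixels / 640:.1%} of the image width")
# prints: 3.82 deg, 37 px, 5.8% of the image width
```

Under these assumptions the object covers well under a tenth of the image width, which is why a wide-field view alone gives recognition little to work with.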
![Page 8: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/8.jpg)
Desirable system structure
[Diagram: “where” and “what” processes linking attention, segmentation and recognition]
The processes run concurrently. Motion is powerful for bootstrapping, but static objects are often just as important.
![Page 9: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/9.jpg)
With stereo and motion
![Page 10: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/10.jpg)
What the system “sees”
![Page 11: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/11.jpg)
Figure-ground segmentation by integration of multiple cues from motion and appearance
[Video panels: Original, Foreground mask]
![Page 12: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/12.jpg)
The cues
[Figure panels: Motion, Colour prediction, Texture (contrast), Combined]
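The slides do not say how the cues are combined. One common option, shown here purely as an assumption, treats each cue map as an independent per-pixel foreground probability and sums the log-odds (naive-Bayes fusion), then thresholds the result to get a foreground mask. The cue values below are made up.

```python
import numpy as np

def fuse_cues(cue_probs, prior=0.5, eps=1e-6):
    """Naive-Bayes fusion of per-pixel foreground probabilities from several cues.

    cue_probs: list of arrays in [0, 1], one per cue (e.g. motion, colour, texture).
    Returns the fused foreground probability map.
    """
    prior_logit = np.log(prior / (1 - prior))
    logit = np.full(np.shape(cue_probs[0]), prior_logit, dtype=float)
    for p in cue_probs:
        p = np.clip(p, eps, 1 - eps)
        # Each cue adds its evidence relative to the prior (assumes independent cues).
        logit += np.log(p / (1 - p)) - prior_logit
    return 1 / (1 + np.exp(-logit))

# Hypothetical 2x2 cue maps.
motion = np.array([[0.9, 0.5], [0.2, 0.5]])
colour = np.array([[0.8, 0.7], [0.3, 0.5]])
texture = np.array([[0.6, 0.5], [0.5, 0.1]])
fused = fuse_cues([motion, colour, texture])
mask = fused > 0.5  # foreground mask
```

A cue at 0.5 contributes nothing, so a pixel where motion is uninformative (a static object) can still be labelled foreground by colour alone, as in the top-right pixel.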
![Page 13: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/13.jpg)
Figure-ground segmentation by integration of multiple cues from motion and appearance
[Video panels: Original, Foreground mask]
![Page 14: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/14.jpg)
Typical static scenes
![Page 15: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/15.jpg)
A system for searching for static objects
• A wide field of view for attention
• Recognition in a foveated view
• Steps:
1. Divide the scene into depth layers
2. Select candidate objects through attention
3. Fixate and track objects of potential interest
4. Recognize/classify objects in the foveal view, possibly after a second, binocularly based segmentation
• Technically: two pairs of stereo cameras
• Problem: transfer of views between the two pairs
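The four steps can be read as a control loop. The sketch below is hypothetical: the `Candidate` type, the `recognize` callback and the ranking by a single saliency value stand in for the real attention, fixation and recognition modules, which the slides do not specify at code level. The budget of three fixations matches the talk's recognition experiments.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Candidate:
    """Hypothetical attention candidate: a location in a depth layer with a saliency score."""
    layer: int
    position: tuple
    saliency: float

def search(candidates: List[Candidate],
           recognize: Callable[[Candidate], Optional[str]],
           target: str,
           max_fixations: int = 3) -> Optional[Candidate]:
    """Fixate candidates in order of saliency until the target is recognized."""
    ranked = sorted(candidates, key=lambda c: c.saliency, reverse=True)
    for cand in ranked[:max_fixations]:
        # In the real system: saccade, track, segment in the foveal view, then classify.
        if recognize(cand) == target:
            return cand
    return None  # target not found within the fixation budget

# Toy scene: the recognizer is a lookup table standing in for foveal recognition.
scene = [Candidate(0, (40, 60), 0.9), Candidate(1, (120, 30), 0.7), Candidate(1, (200, 90), 0.4)]
labels = {(40, 60): "cup", (120, 30): "book", (200, 90): "box"}
found = search(scene, lambda c: labels[c.position], target="book")
```

Here the book is found on the second fixation; asking for the box with a budget of two fixations would return `None`.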
![Page 16: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/16.jpg)
Flow of information
[Diagram: wide-field and foveal stereo pairs (left/right); modules for calibration, registration, fixation, segmentation, global hue, local hue, SIFT features, attention, gaze direction and recognition]
![Page 17: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/17.jpg)
Stereo computations
Relative orientations have to be known in order to:
• relate disparities to depths
• simplify the estimation of disparities
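$$
\begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix}
=
\begin{pmatrix} (1 + x^2)\,\alpha - y\,r_z \\ x y\,\alpha + r_y + x\,r_z \end{pmatrix}
+ \frac{1}{Z}
\begin{pmatrix} 1 - x\,t_z \\ -y\,t_z \end{pmatrix}
$$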
Estimated from corner features using an optical flow model
• The estimation is unstable => use robust methods
• First assume r_z and r_y to be zero
• On-line calibration allows the use of expected retinal size
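A sketch of a robust first estimation step, with r_y = r_z = 0 as the slide suggests. As we read the model, this leaves Δx = (1 + x²)α + (1 − x t_z)/Z and Δy = xyα − y t_z/Z. Eliminating the unknown depth 1/Z (our own derivation) gives Δy = α·xy + t_z(xΔy − yΔx) + (α t_z)·y, which is linear in (α, t_z, α t_z). RANSAC over three-point samples is our choice of robust method here, not necessarily the one used in the system, and the synthetic data are made up.

```python
import numpy as np

def first_stage_disparity(x, y, Z, alpha, t_z):
    """Disparities predicted with r_y = r_z = 0 (as we read the slide's model)."""
    dx = (1 + x**2) * alpha + (1 - x * t_z) / Z
    dy = x * y * alpha - y * t_z / Z
    return dx, dy

def design_matrix(x, y, dx, dy):
    # After eliminating 1/Z: dy = alpha*(x*y) + t_z*(x*dy - y*dx) + (alpha*t_z)*y
    return np.column_stack([x * y, x * dy - y * dx, y])

def ransac_alpha_tz(x, y, dx, dy, iters=200, thresh=1e-4, seed=0):
    """Robustly estimate (alpha, t_z) from corner disparities with 3-point RANSAC."""
    rng = np.random.default_rng(seed)
    A = design_matrix(x, y, dx, dy)
    best = np.zeros(len(x), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(x), size=3, replace=False)
        p, *_ = np.linalg.lstsq(A[sample], dy[sample], rcond=None)
        inliers = np.abs(A @ p - dy) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    # Refit on all inliers; p[2] should come out close to p[0] * p[1].
    p, *_ = np.linalg.lstsq(A[best], dy[best], rcond=None)
    return p[0], p[1], best

# Synthetic check with made-up values: alpha = 0.02, t_z = 0.1, about 20% outliers.
rng = np.random.default_rng(1)
n = 200
x, y = rng.uniform(-0.3, 0.3, n), rng.uniform(-0.3, 0.3, n)
Z = rng.uniform(1.0, 3.0, n)
dx, dy = first_stage_disparity(x, y, Z, alpha=0.02, t_z=0.1)
bad = rng.random(n) < 0.2
dy[bad] = rng.uniform(-0.05, 0.05, bad.sum())  # corrupt some matches
alpha, t_z, inliers = ransac_alpha_tz(x, y, dx, dy)
```

With noisy real correspondences, `thresh` would have to be set from the expected disparity noise rather than the near-zero value used for this noise-free example.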
![Page 18: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/18.jpg)
Figure-ground segmentation
The disparity map is sliced into layers. Layer widths are set according to the objects searched for.
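A minimal sketch of the slicing step, assuming equal-width layers in disparity starting at the smallest valid disparity, with invalid pixels marked as NaN. How `layer_width` would be derived from the size of the objects searched for is left to the caller; the parameter name is hypothetical.

```python
import numpy as np

def slice_disparity(disparity, layer_width):
    """Split a disparity map into binary masks, one per disparity layer.

    Pixels that are NaN (no disparity) belong to no layer.
    Returns a list of (low, high, mask) tuples with low <= d < high.
    """
    d = np.asarray(disparity, dtype=float)
    valid = np.isfinite(d)
    d_filled = np.where(valid, d, -np.inf)  # invalid pixels never fall in a layer
    lo, hi = d[valid].min(), d[valid].max()
    layers = []
    start = lo
    while start <= hi:
        stop = start + layer_width
        layers.append((start, stop, (d_filled >= start) & (d_filled < stop)))
        start = stop
    return layers

# Toy disparity map with one missing value.
disp = np.array([[1.0, 1.5, 4.2],
                 [np.nan, 2.9, 4.0]])
layers = slice_disparity(disp, layer_width=1.0)
counts = [int(mask.sum()) for _, _, mask in layers]  # pixels per layer
```

Each mask can then be handed to blob detection separately, so objects at different depths do not merge even when they overlap in the image.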
![Page 19: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/19.jpg)
Figure-ground segmentation
[Figure panels: BinoCues, BinoAttn]
![Page 20: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/20.jpg)
Appearance-based attention
Local hue histograms are correlated with the hue histogram of the requested object.
Fast implementation using rotating sums.
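A sketch of this kind of appearance-based attention under our own assumptions: hue is quantized into a few bins, local histograms are taken over square windows, and "correlated" is implemented as the cosine similarity between each local histogram and the object's histogram. Window sums use cumulative-sum integral images, which we take to be in the same spirit as the slide's rotating sums: each window costs O(1) per bin regardless of its size.

```python
import numpy as np

def hue_attention(hue, model_hist, window=15, bins=16):
    """Per-pixel similarity between local hue histograms and an object's hue histogram.

    hue: 2-D array of hue values in [0, 1).
    model_hist: length-`bins` hue histogram of the requested object.
    Windows are clipped at the image border.
    """
    h, w = hue.shape
    idx = np.minimum((hue * bins).astype(int), bins - 1)
    onehot = np.zeros((h, w, bins))
    onehot[np.arange(h)[:, None], np.arange(w)[None, :], idx] = 1.0
    # Integral image with a zero border: S[i, j] = sum of onehot[:i, :j].
    S = np.zeros((h + 1, w + 1, bins))
    S[1:, 1:] = onehot.cumsum(axis=0).cumsum(axis=1)
    r = window // 2
    i0, i1 = np.clip(np.arange(h) - r, 0, h), np.clip(np.arange(h) + r + 1, 0, h)
    j0, j1 = np.clip(np.arange(w) - r, 0, w), np.clip(np.arange(w) + r + 1, 0, w)
    # Local histogram of every window from four lookups per bin.
    local = S[i1][:, j1] - S[i0][:, j1] - S[i1][:, j0] + S[i0][:, j0]
    m = np.asarray(model_hist, dtype=float)
    den = np.linalg.norm(local, axis=2) * np.linalg.norm(m) + 1e-12
    return (local @ m) / den

# Toy image: uniform background with a 15x15 patch of a different hue.
hue = np.full((60, 80), 0.1)
hue[20:35, 50:65] = 0.55
model = np.zeros(16)
model[8] = 1.0  # the requested object's hue falls in bin 8
sal = hue_attention(hue, model)
peak = np.unravel_index(sal.argmax(), sal.shape)
```

The map peaks where the window coincides with the patch, at row 27, column 57.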
![Page 21: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/21.jpg)
Saliency peaks
• Peaks from blob detection in depth slices, based on Differences of Gaussians
• Hue saliency map used for weighting
• Random value added before selection
• Inhibition of return
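The selection steps can be sketched as follows. The kernel widths, noise amplitude and inhibition radius are assumptions, and a NumPy-only separable blur is used to keep the example self-contained. Each call picks the highest weighted Difference-of-Gaussians response in one depth slice, adds a small random value, and then inhibits a disk around the chosen peak.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur using NumPy only (zero padding at the borders)."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2 * sigma**2))
    k /= k.sum()
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def next_fixation(layer_mask, hue_saliency, inhibited, rng,
                  sigma=2.0, noise=1e-3, ior_radius=10):
    """Pick the next saliency peak in one depth slice and apply inhibition of return.

    `inhibited` is a boolean array that is updated in place.
    """
    m = layer_mask.astype(float)
    dog = gaussian_blur(m, sigma) - gaussian_blur(m, 2 * sigma)  # blob response
    score = dog * hue_saliency + noise * rng.random(m.shape)     # hue weighting + random value
    score[inhibited] = -np.inf
    y, x = np.unravel_index(np.argmax(score), score.shape)
    yy, xx = np.mgrid[:m.shape[0], :m.shape[1]]
    inhibited |= (yy - y) ** 2 + (xx - x) ** 2 <= ior_radius ** 2
    return int(y), int(x)

# Toy depth slice with two blobs; the hue map favours the blob at (55, 55).
yy, xx = np.mgrid[:80, :80]
layer = ((yy - 20)**2 + (xx - 20)**2 <= 25) | ((yy - 55)**2 + (xx - 55)**2 <= 25)
hue_sal = np.ones((80, 80))
hue_sal[45:66, 45:66] = 2.0
inhibited = np.zeros((80, 80), dtype=bool)
rng = np.random.default_rng(0)
first = next_fixation(layer, hue_sal, inhibited, rng)
second = next_fixation(layer, hue_sal, inhibited, rng)
```

The hue-weighted blob is chosen first; inhibition of return then moves the second fixation to the other blob.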
![Page 22: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/22.jpg)
Fixation
$$
F = \begin{pmatrix} 0 & 0 & c \\ 0 & 0 & d \\ a & b & e \end{pmatrix}
$$
The foveal system continuously tries to fixate, using:
• corner features
• an affine essential matrix
Zero disparity filters won’t work
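With F of this form, and assuming the convention x'ᵀ F x = 0 for a left point (x, y) and right point (x', y'), the epipolar constraint reduces to a x + b y + c x' + d y' + e = 0, one linear equation per corner correspondence. A minimal sketch of the usual linear estimate takes the right singular vector belonging to the smallest singular value. Coordinate centring, usually recommended for affine cameras, is omitted for brevity, and the test data are synthetic.

```python
import numpy as np

def estimate_affine_F(left, right):
    """Linear estimate of an affine fundamental matrix from point correspondences.

    left, right: (N, 2) arrays of corresponding image points, N >= 4.
    Returns F = [[0, 0, c], [0, 0, d], [a, b, e]] with unit-norm (a, b, c, d, e).
    """
    x, y = left[:, 0], left[:, 1]
    xp, yp = right[:, 0], right[:, 1]
    # Each correspondence gives a*x + b*y + c*x' + d*y' + e = 0.
    M = np.column_stack([x, y, xp, yp, np.ones_like(x)])
    _, _, vt = np.linalg.svd(M)
    a, b, c, d, e = vt[-1]  # least-squares null vector
    return np.array([[0.0, 0.0, c], [0.0, 0.0, d], [a, b, e]])

def epipolar_residuals(F, left, right):
    """Algebraic residuals x'^T F x for each correspondence."""
    xl = np.column_stack([left, np.ones(len(left))])
    xr = np.column_stack([right, np.ones(len(right))])
    return np.einsum("ij,jk,ik->i", xr, F, xl)

# Synthetic correspondences that satisfy a made-up affine epipolar constraint.
rng = np.random.default_rng(0)
true = np.array([0.3, -0.2, -0.25, 0.9, 0.05])  # hypothetical (a, b, c, d, e)
left = rng.uniform(-1, 1, (50, 2))
xp = left[:, 0] + rng.uniform(-0.1, 0.1, 50)
yp = -(true[0] * left[:, 0] + true[1] * left[:, 1] + true[2] * xp + true[4]) / true[3]
right = np.column_stack([xp, yp])
F = estimate_affine_F(left, right)
est = np.array([F[2, 0], F[2, 1], F[0, 2], F[1, 2], F[2, 2]])
```

In a fixation loop, F would be re-estimated from the tracked corners on every frame; the residuals give a cheap check of how well the current estimate fits.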
![Page 23: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/23.jpg)
Foveated segmentation
To boost the ensuing recognition/classification
• Foveal segmentation based on disparities
• Rectification using the affine fundamental matrix
• Search only for disparities around zero => large number of false positives
• Points clustered in 3D using mean shift
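The slide names mean shift for clustering the foveal 3D points. A minimal flat-kernel version is sketched below; the kernel, bandwidth and mode-merging rule are our assumptions, and in practice the 3D points would come from triangulating the foveal disparities. The two clusters here are synthetic.

```python
import numpy as np

def mean_shift(points, bandwidth, iters=50, tol=1e-6):
    """Flat-kernel mean shift; returns (modes, labels) for an (N, 3) point array."""
    pts = np.asarray(points, dtype=float)
    shifted = pts.copy()
    for _ in range(iters):
        # Move every point to the mean of the original points within `bandwidth`.
        d = np.linalg.norm(shifted[:, None, :] - pts[None, :, :], axis=2)
        w = (d <= bandwidth).astype(float)
        new = (w @ pts) / w.sum(axis=1, keepdims=True)
        done = np.abs(new - shifted).max() < tol
        shifted = new
        if done:
            break
    # Merge converged points whose modes lie within half a bandwidth of each other.
    modes, labels = [], np.empty(len(pts), dtype=int)
    for i, p in enumerate(shifted):
        for k, m in enumerate(modes):
            if np.linalg.norm(p - m) < bandwidth / 2:
                labels[i] = k
                break
        else:
            modes.append(p)
            labels[i] = len(modes) - 1
    return np.array(modes), labels

# Two synthetic clusters, e.g. an object at 1.0 m and another at 1.4 m.
rng = np.random.default_rng(0)
a = rng.normal([0.0, 0.0, 1.0], 0.01, (30, 3))
b = rng.normal([0.3, 0.0, 1.4], 0.01, (20, 3))
modes, labels = mean_shift(np.vstack([a, b]), bandwidth=0.1)
```

Because mean shift does not need the number of clusters in advance, isolated false matches tend to end up as small separate clusters that can be discarded.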
![Page 24: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/24.jpg)
Foveated segmentation
![Page 25: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/25.jpg)
Foveated segmentation
![Page 26: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/26.jpg)
Small example object database used in real-time experiments (24 objects in total)
Objects are modelled here by SIFT features and hue histograms. Texture descriptors are now also included.
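To illustrate how such two-feature models might be scored, the sketch below matches SIFT-like descriptors by nearest-neighbour search with Lowe's distance-ratio test, and compares hue histograms by histogram intersection. Both techniques, the 0.8 ratio and the equal weighting of the two scores are assumptions for illustration; the slides do not say how features are matched or combined. Descriptors are plain random NumPy arrays rather than output from a particular SIFT library.

```python
import numpy as np

def ratio_test_matches(query, model, ratio=0.8):
    """Count query descriptors whose nearest model descriptor passes Lowe's ratio test."""
    d = np.linalg.norm(query[:, None, :] - model[None, :, :], axis=2)
    nearest = np.sort(d, axis=1)[:, :2]  # requires at least two model descriptors
    return int(np.sum(nearest[:, 0] < ratio * nearest[:, 1]))

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two histograms after normalization."""
    return float(np.minimum(h1 / h1.sum(), h2 / h2.sum()).sum())

def score_objects(query_desc, query_hist, database, ratio=0.8):
    """Rank database objects by an (assumed) equal-weight mix of both cues."""
    scores = {}
    for name, (desc, hist) in database.items():
        sift_score = ratio_test_matches(query_desc, desc, ratio) / len(query_desc)
        scores[name] = 0.5 * sift_score + 0.5 * histogram_intersection(query_hist, hist)
    return max(scores, key=scores.get), scores

# Hypothetical two-object database with 128-D descriptors and 4-bin hue histograms.
rng = np.random.default_rng(0)
cup_desc, book_desc = rng.random((40, 128)), rng.random((40, 128))
database = {"cup": (cup_desc, np.array([8.0, 1.0, 1.0, 0.0])),
            "book": (book_desc, np.array([0.0, 1.0, 2.0, 7.0]))}
query_desc = cup_desc[:20] + rng.normal(0, 0.01, (20, 128))  # noisy view of the cup
best, scores = score_objects(query_desc, np.array([7.0, 2.0, 1.0, 0.0]), database)
```

The two cues are complementary: SIFT matches need enough texture and resolution in the foveal view, while the hue histogram still says something about weakly textured objects.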
![Page 27: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/27.jpg)
Visual scene search
![Page 28: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/28.jpg)
Segmentation robustness
![Page 29: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/29.jpg)
Effect of occlusions
![Page 30: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/30.jpg)
Effect of rotations
![Page 31: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/31.jpg)
Recognition experiments
• 24 objects, learned over a range of views and represented by two features
• Arranged in 24 “scenes”
• Task: “Is X in the scene?”
• 3 fixations allowed
![Page 32: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/32.jpg)
SIFT features in wide field, no disparity-based segmentation
![Page 33: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/33.jpg)
Colour cue, wide field vs wide + central field disparity segmentation
![Page 34: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/34.jpg)
SIFT features, wide field vs wide + central field disparity segmentation
![Page 35: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/35.jpg)
Wide field + central disparity-based segmentation, combined features
![Page 36: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/36.jpg)
Conclusions
• Gaze control is essential. In fact, many current methods assume foveation or something with a similar effect
• 3D cues are powerful for figure-ground segmentation (they inform about the scene)
• 3D cues thereby also support recognition and categorization
• Integration of multiple cues is essential
![Page 37: Active Object Recognition under Gaze Control](https://reader031.vdocuments.mx/reader031/viewer/2022013001/61ca3e7b7311130fa314c310/html5/thumbnails/37.jpg)
Comments and future work
• We have a running system that normally finds objects within three saccades
• Experiments are tedious (learning, scene setups)
• More cues are being added, especially texture
• Focus on classification and eventually categorization
• Applications to hand-eye coordination and manipulation
• Potential for computing both local and global shape properties