A System for Observing and Recognizing Objects in the Real World
J.O. Eklundh, M. Björkman, E. Hayman
Computational Vision and Active Perception Lab
Royal Institute of Technology (KTH)
Vision in the Real World: Attending, Foveating and Recognizing Objects
Motivation
An autonomous agent moving about in a dynamic indoor environment, performing tasks such as finding, picking up and delivering known objects or classes of objects.
What should the vision system of such an agent be capable of?
Capabilities
“Where”: attention, segmentation
“What”: recognition
Should be dealt with jointly. Bootstrapping?
Recognition and Categorization
Fei-Fei et al ’03
A robot looking at a table at 1.5 m
Objects subtend only a fraction of the scene and are not centered (no attentional step)
Approach - themes
• 3D cues relevant in the scene
• Motion and stereo used for bootstrapping
• Integration of multiple cues
• A system interacting with the environment
• Fast processes and anytime algorithms desirable
The figure-ground segmentation (f-g-s) problem
• Segmenting 3D objects from the background
• Computing motion, depth and ego-motion
• Acquiring appearance models
Issues:
• Combining cues
• Demonstrate simple algorithms that suffice
Monocular as well as binocular cases
Cue integration for f-g-s
First, the dynamic monocular case
• Problem: classifying pixels as being foreground or background (or into layers)
• Cues: motion, colour, contrast + prediction (temporal continuity)
Inference problem: observations from different spaces to be combined
Integration
• Two approaches:
– probabilistic: likelihood of observing data, given a model of each layer
– voting: each cue decides independently, form weighted combination
Algorithm
• Online initialization of colour + texture models
– Use segmentation from motion to train distributions
• Suppress unreliable cues
• Sequential algorithm
Related work
• Khan & Shah CVPR’01
• Triesch & von der Malsburg ICAFGR’00
• Spengler & Schiele ICVS’01
• Toyama & Horvitz ACCV’00
• Kragic & Christensen ICRA’99
• Belongie et al ICCV’98
• …
Similarity to tracking
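Voting
The likelihood of observation $Z_{i,k}$ from cue $k$ at pixel $i$ given model $M_{j,k}$ of layer $j$:
$f_{i,k}(Z_{i,k} \mid M_{j,k})$
Posterior probability of layer $j$:
$p_{i,k}(j \mid Z_{i,k}) = \dfrac{f_{i,k}(Z_{i,k} \mid M_{j,k})\, p(j)}{\sum_l f_{i,k}(Z_{i,k} \mid M_{l,k})\, p(l)}$
We set
$\mathrm{score}(j) = \sum_k w_{i,k}\, p_{i,k}(j \mid Z_{i,k})$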
Probabilistic fusion
• Assume independent observations
• The total likelihood of the observations given the combined model $M_j = \{M_{j,1}, \ldots, M_{j,K}\}$:
$p_i(Z_i \mid M_j) = \prod_k f_{i,k}(Z_{i,k} \mid M_{j,k})$
• The posterior estimate of layer membership:
$p_i(j \mid Z_i) = \dfrac{p(j) \prod_k f_{i,k}(Z_{i,k} \mid M_{j,k})}{\sum_l p(l) \prod_k f_{i,k}(Z_{i,k} \mid M_{l,k})}$
An independent opinion pool
Illustrations
• Assume a distribution for each layer for each cue
Figure: f.g. and b.g. model densities evaluated at the observation Z, giving f(Z | f.g.) = 8 and f(Z | b.g.) = 0.4
Cue combination: Weighted voting
• Each cue makes independent decision
• Combine using weighted sum
• Assuming equal weights:
          f.g.   b.g.
Colour    8.0    0.4
Motion    0.2    0.3
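The voting rule can be worked through on the colour and motion likelihoods listed above. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, and uniform layer priors are assumed in addition to the equal weights stated on the slide.

```python
def cue_posteriors(likelihoods, priors):
    """Posterior over layers for a single cue at one pixel (Bayes' rule)."""
    evidence = sum(f * p for f, p in zip(likelihoods, priors))
    return [f * p / evidence for f, p in zip(likelihoods, priors)]


def voting_scores(cue_likelihoods, priors, weights):
    """Weighted sum over cues of the per-cue layer posteriors."""
    scores = [0.0] * len(priors)
    for likelihoods, weight in zip(cue_likelihoods, weights):
        posterior = cue_posteriors(likelihoods, priors)
        for j, p in enumerate(posterior):
            scores[j] += weight * p
    return scores


# Likelihoods are ordered [f.g., b.g.].
colour = [8.0, 0.4]
motion = [0.2, 0.3]
scores = voting_scores([colour, motion],
                       priors=[0.5, 0.5],    # assumption: uniform priors
                       weights=[0.5, 0.5])   # equal weights
print([round(s, 3) for s in scores])  # [0.676, 0.324]
```

Under these assumptions the foreground wins, but with a graded score: the motion cue, which mildly favours the background, pulls the colour vote down.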
Cue combination: Probabilistic
• Compute total likelihood of observations for each layer
• Classify using Bayes’ Rule
• Assuming uniform priors:
          f.g.   b.g.
Colour    8.0    0.4
Motion    0.2    0.3
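The same colour and motion likelihoods can be pushed through the probabilistic rule. A minimal sketch with a hypothetical function name, assuming independent cues and the uniform priors stated on the slide:

```python
def fused_posterior(cue_likelihoods, priors):
    """Layer posterior from the product of per-cue likelihoods (Bayes' rule)."""
    joint = list(priors)
    for likelihoods in cue_likelihoods:
        for j, f in enumerate(likelihoods):
            joint[j] *= f          # independence: total likelihood is a product
    total = sum(joint)
    return [v / total for v in joint]


# Likelihoods are ordered [f.g., b.g.].
posterior = fused_posterior([[8.0, 0.4], [0.2, 0.3]], priors=[0.5, 0.5])
print([round(p, 3) for p in posterior])  # [0.93, 0.07]
```

The total likelihoods are 8.0 × 0.2 = 1.6 for the foreground and 0.4 × 0.3 = 0.12 for the background, so the strongly peaked colour cue dominates and the output is close to binary.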
Pros and cons
Voting:
• Simple!
• Easy to combine observations from very different spaces
• Not obvious how cues interact
• Graded output
Probabilistic:
• Mathematically well-founded
• One cue can easily dominate over others
• Almost binary output
Training and adaptation
• Start with segmentation just from motion
• Use this to train colour and contrast distributions
– Use EM algorithm to train Gaussian mixture models
• Subsequently adapted online (Raja et al ECCV’98)
– Recompute models from current data
– Update model as weighted sum over time window
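The online adaptation step (recompute a model from the current frame, then blend over a time window) can be sketched as follows. To keep the example short it swaps the Gaussian mixture models for plain normalized histograms; the class name, the window length and the geometric `decay` weighting are all assumptions, not taken from the slides.

```python
from collections import deque


class WindowedHistogramModel:
    """Keeps the last `window` per-frame histograms and blends them."""

    def __init__(self, window=5, decay=0.5):
        self.frames = deque(maxlen=window)  # oldest frames drop out automatically
        self.decay = decay                  # hypothetical down-weighting of older frames

    def update(self, counts):
        """Recompute the model from the current frame's data and store it."""
        total = sum(counts)
        if total == 0:
            raise ValueError("empty histogram")
        self.frames.append([c / total for c in counts])

    def current(self):
        """Weighted sum of the stored models, newest frame weighted most."""
        if not self.frames:
            raise ValueError("no frames yet")
        n = len(self.frames)
        weights = [self.decay ** (n - 1 - t) for t in range(n)]
        norm = sum(weights)
        bins = len(self.frames[0])
        return [sum(w * f[b] for w, f in zip(weights, self.frames)) / norm
                for b in range(bins)]


model = WindowedHistogramModel(window=2, decay=0.5)
model.update([4, 0])   # first frame: all mass in bin 0
model.update([0, 2])   # second frame: all mass in bin 1
blended = model.current()
```

With a window of two frames and a decay of 0.5, the newer frame contributes twice as much as the older one.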
Suppressing unreliable cues
• Cues unreliable during training
• No independent motion => poor motion segmentation
• Unreliable in the past => probably unreliable now! (Triesch and von der Malsburg ICAFGR’00)
• Mechanisms:
– Voting: weights
– Probabilistic: hyper-priors
The effect of hyper-priors
Panels: Original pdfs, After marginalization
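The slides do not spell out the hyper-prior itself. As a simplified stand-in, the sketch below marginalizes over a binary "cue is reliable" variable, which mixes each likelihood with a flat density. The function name, the `reliability` value and the flat density of 1.0 are assumptions made for illustration only.

```python
def marginalize_reliability(likelihoods, reliability, flat_density):
    """Mix each layer likelihood with a flat density.

    f'(Z | layer) = reliability * f(Z | layer) + (1 - reliability) * flat_density
    """
    return [reliability * f + (1.0 - reliability) * flat_density
            for f in likelihoods]


colour = [8.0, 0.4]                       # [f.g., b.g.] likelihoods
softened = marginalize_reliability(colour, reliability=0.5, flat_density=1.0)
ratio_before = colour[0] / colour[1]      # 20.0
ratio_after = softened[0] / softened[1]   # 4.5 / 0.7
```

The likelihood ratio drops from 20 to about 6.4, so a cue that is believed to be unreliable has less power to dominate the product of likelihoods.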
Results
Original Probabilistic Voting
Degree of membership to foreground
Cue combination
Panels: Motion, Colour, Prediction, Texture (contrast), Combined (probabilistic), Combined (voting)
Results
Original Foreground
Results: Probabilistic cue integration
Original Foreground mask
The cues
Panels: Motion, Colour, Prediction, Texture (contrast), Combined
Process overview
Diagram labels: Image features, Epipolar geometry, Disparity map, Ego-motion, Independent motion, Regions of interest, Fixation point, Non-retinal info., Additional retinal info., Top-down control
Calibration
Relative orientations have to be known to
• relate disparities to depths
• simplify estimation of disparities
Using corner features and optical flow model
(Equation: optical flow model in the image coordinates x and y, the rotations r and the translation t)
Unstable process => be careful
We first assume $r_z$ and $r_y$ to be zero.
More examples
Final output
3D Cues: stereo and motion
• Many objects of interest static. Harder!
• Motion included in full system; now only stereo
• The cues in these examples
– Stereo data - exist along contours
– Color data/appearance between contours
Proposed system structure
Technically by a combination of wide field and foveal cameras
A wide field for attention, recognition in foveated view
Problem: transfer from wide field to foveal view
Steps:
• Divide scene into 3D objects
• Select objects through attention (e.g. hue and expected size)
• Fixate (and track) object of interest
• Recognize objects in foveal view
Processes
Diagram labels: Recognition, Attention, Segmentation, Hypotheses, Knowledge, Adaptation, Shape and size, Region of Interest, Where, What
Flow of information
Diagram labels: Wide field, Foveal, Left, Right, Calibration, Fixation, Registration, Segmentation, Global hue, SIFT features, Local hue, Attention, Gaze direction, Recognition
Figure-ground segmentation
Disparity map is sliced into layers.
Widths are set to that of requested object.
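Slicing a disparity map into layers can be sketched as below. This is an illustration under assumptions: the function name is hypothetical, the map is a list of equal-length rows with `None` marking pixels without a disparity, and `slice_width` is taken to be already expressed in disparity units for the requested object.

```python
def slice_disparities(disparity_map, slice_width):
    """Return {layer index: binary mask} for a disparity map.

    A pixel with disparity d falls into layer floor(d / slice_width).
    """
    height = len(disparity_map)
    width = len(disparity_map[0])
    layers = {}
    for r, row in enumerate(disparity_map):
        for c, d in enumerate(row):
            if d is None:          # no stereo match at this pixel
                continue
            index = int(d // slice_width)
            if index not in layers:
                layers[index] = [[0] * width for _ in range(height)]
            layers[index][r][c] = 1
    return layers


disparities = [[1.0, 1.5, 6.2],
               [None, 5.9, 6.0]]
layers = slice_disparities(disparities, slice_width=2.0)
```

Each mask can then be passed on as a candidate figure-ground segmentation for one depth range.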
Figure-ground segmentation
• Disparities using SAD correlations.
• Segmentation based on slicing the 3D world.
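Disparity estimation by SAD (sum of absolute differences) can be illustrated on a single pair of rectified scanlines. This is a minimal sketch with hypothetical function names, assuming that left pixel x corresponds to right pixel x - d and using a small 1D window instead of a 2D patch.

```python
def sad(left, right, x, d, half):
    """Sum of absolute differences between windows centred at x and x - d."""
    return sum(abs(left[x + k] - right[x + k - d])
               for k in range(-half, half + 1))


def scanline_disparities(left, right, max_disparity, half=1):
    """Best disparity per pixel of the left scanline (None near the borders)."""
    n = len(left)
    result = [None] * n
    for x in range(half, n - half):
        best_cost, best_d = None, None
        for d in range(max_disparity + 1):
            if x - d - half < 0:   # window would leave the right image
                break
            cost = sad(left, right, x, d, half)
            if best_cost is None or cost < best_cost:
                best_cost, best_d = cost, d
        result[x] = best_d
    return result


right = [0, 0, 9, 5, 7, 0, 0, 0]
left = [0, 0, 0, 0, 9, 5, 7, 0]    # same pattern shifted by two pixels
disparity = scanline_disparities(left, right, max_disparity=3)
```

On the textured part of the scanline the cost is zero at a shift of two pixels, so those pixels receive disparity 2.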
Hue based attention
Local hue histograms correlated with that of requested object.
Fast implementation using rotating sums.
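One way to read "rotating sums" is as running sums: when the window moves by one pixel, the histogram is updated by adding the entering pixel and removing the leaving one instead of being rebuilt. The sketch below follows that reading on a single image row; the function names, the pre-quantized hue bins and the use of cosine similarity as the correlation measure are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two histograms (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hue_saliency_row(hue_bins, target_hist, window):
    """Similarity of each sliding-window hue histogram to the target."""
    hist = [0] * len(target_hist)
    for h in hue_bins[:window]:            # histogram of the first window
        hist[h] += 1
    scores = [cosine(hist, target_hist)]
    for x in range(window, len(hue_bins)):
        hist[hue_bins[x]] += 1             # pixel entering the window
        hist[hue_bins[x - window]] -= 1    # pixel leaving the window
        scores.append(cosine(hist, target_hist))
    return scores


row = [0, 0, 1, 1, 2, 2]                   # hue bin index per pixel
scores = hue_saliency_row(row, target_hist=[0, 2, 0], window=2)
```

Each step costs two histogram updates regardless of the window size, which is what makes this kind of scheme fast.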
Saliency peaks
Peaks from blob detection of depth slices.
Based on Differences of Gaussians.
Hue saliency map used for weighting.
Random value added before selection.
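The selection step can be sketched as follows, starting from peaks that have already been detected. The dictionary keys, the multiplicative hue weighting and the uniform noise are assumptions made for illustration; the slides only state that hue saliency weights the peaks and that a random value is added before selection.

```python
import random


def select_peak(peaks, noise_scale=0.1, rng=None):
    """Pick the peak with the highest hue-weighted, noise-perturbed response."""
    rng = rng or random.Random()
    best, best_score = None, None
    for peak in peaks:
        # Blob response weighted by hue saliency, plus a small random value.
        score = peak["blob"] * peak["hue"] + noise_scale * rng.random()
        if best_score is None or score > best_score:
            best, best_score = peak, score
    return best


peaks = [
    {"pos": (10, 20), "blob": 0.9, "hue": 0.2},   # strong blob, wrong hue
    {"pos": (40, 5), "blob": 0.6, "hue": 0.8},    # weaker blob, matching hue
]
chosen = select_peak(peaks, noise_scale=0.0)
```

With the noise switched off the hue-matching peak is chosen; with noise enabled, peaks of similar strength are occasionally visited in a different order.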
Fixation
The foveal system continuously tries to fixate
• done using corner features
• and affine essential matrix
$F = \begin{pmatrix} 0 & 0 & c \\ 0 & 0 & d \\ a & b & e \end{pmatrix}$
Zero disparity filters won’t work
Foveated segmentation
To boost recognition
• Foveal segmentation based on disparities
• Rectification using affine fundamental matrix
• Only search for disparities around zero => Large number of false positives
• Points clustered in 3D using mean shift
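Mean shift clustering of 3D points can be sketched as below. This is a minimal flat-kernel version with hypothetical function names and an arbitrary bandwidth, not the system's implementation.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_shift(points, bandwidth, iterations=20):
    """Move a copy of every point to the mean of its neighbours, repeatedly."""
    modes = [tuple(p) for p in points]
    for _ in range(iterations):
        shifted = []
        for m in modes:
            near = [p for p in points if dist2(p, m) <= bandwidth ** 2]
            if not near:               # guard against rounding at the boundary
                shifted.append(m)
                continue
            shifted.append(tuple(sum(c) / len(near) for c in zip(*near)))
        modes = shifted
    return modes


def cluster(points, bandwidth):
    """Label each point by the mode it converged to."""
    centers, labels = [], []
    for m in mean_shift(points, bandwidth):
        for index, c in enumerate(centers):
            if dist2(m, c) <= (bandwidth / 2) ** 2:
                labels.append(index)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


points = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.0, 0.1, 1.1),
          (2.0, 2.0, 3.0), (2.1, 2.0, 3.0)]
labels, centers = cluster(points, bandwidth=0.5)
```

Isolated false positives end up in small clusters of their own, which makes them easy to discard by cluster size.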
Small object database in real-time experiments
Models of SIFT features and hue histograms
Visual scene search
Segmentation robustness
Effect of occlusions
Effect of rotations
Recognition in wide field of view
Recognition after foveation
Conclusions
We have a running system.
Objects normally found within three saccades
Concern: dependency on corner features
Current work:
• Focus on recognition and categorization
• More robust foveal segmentation
• Additional cues e.g. texture
• Learning and adaptation on all levels