Page 1

A System for Observing and Recognizing Objects in the Real World

J.O. Eklundh, M. Björkman, E. Hayman

Computational Vision and Active Perception Lab

Royal Institute of Technology (KTH)

Page 2

Vision in the Real World: Attending, Foveating and Recognizing Objects

Motivation

An autonomous agent moving about in a dynamic indoor environment, performing tasks such as finding, picking up and delivering known objects or classes of objects.

What should the vision system of such an agent be capable of?

Page 3

Capabilities

“Where”: attention, segmentation

“What”: recognition

Should be dealt with jointly. Bootstrapping?

Page 4

Recognition and Categorization

Fei-Fei et al. ’03

Page 5

A robot looking at a table at 1.5 m

Objects subtend only a fraction of the scene and are not centered (no attentional step)

Page 6

Approach - themes

• 3D cues relevant in the scene
• Motion and stereo used for bootstrapping
• Integration of multiple cues
• A system interacting with the environment
• Fast processes and anytime algorithms desirable

Page 7

The f-g-s (figure-ground segmentation) problem

• Segmenting 3D objects from the background
• Computing motion, depth and ego-motion
• Acquiring appearance models

Issues:
• Combining cues
• Demonstrate simple algorithms that suffice

Monocular as well as binocular cases

Page 8

Cue integration for f-g-s

First, the dynamic monocular case

• Problem: classifying pixels as being foreground or background (or into layers)

• Cues: motion, colour, contrast + prediction (temporal continuity)

Inference problem: observations from different spaces to be combined

Page 9

Integration
• Two approaches:
- probabilistic: likelihood of observing data, given a model of each layer
- voting: each cue decides independently, form weighted combination

Algorithm
• Online initialization of colour + texture models
– Use segmentation from motion to train distributions
• Suppress unreliable cues
• Sequential algorithm

Page 10

Related work
• Khan & Shah CVPR’01
• Triesch & von der Malsburg ICAFGR’00
• Spengler & Schiele ICVS’01
• Toyama & Horvitz ACCV’00
• Kragic & Christensen ICRA’99
• Belongie et al ICCV’98
• …

Similarity to tracking

Page 11

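Voting

The likelihood of observation $Z_{i,k}$ from cue $k$ at pixel $i$ given model $M_{j,k}$ of layer $j$:

$$f_{i,k}(Z_{i,k} \mid M_{j,k})$$

Posterior probability of layer $j$:

$$p_{i,k}(j \mid Z_{i,k}) = \frac{f_{i,k}(Z_{i,k} \mid M_{j,k})\, p(j)}{\sum_{l} f_{i,k}(Z_{i,k} \mid M_{l,k})\, p(l)}$$

We set

$$\mathrm{score}(j) = \sum_{k} w_{i,k}\, p_{i,k}(j \mid Z_{i,k})$$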

Page 12

Probabilistic fusion
• Assume independent observations
• The total likelihood of the observations given the combined model $M_j = \{M_{j,1}, \ldots, M_{j,k}\}$:

$$p_i(Z_i \mid M_j) = \prod_{k} f_{i,k}(Z_{i,k} \mid M_{j,k})$$

• The posterior estimate of layer membership:

$$p_i(j \mid Z_i) = \frac{\prod_{k} f_{i,k}(Z_{i,k} \mid M_{j,k})\, p(j)}{\sum_{l} \prod_{k} f_{i,k}(Z_{i,k} \mid M_{l,k})\, p(l)}$$

An independent opinion pool

Page 13

Illustrations

• Assume a distribution for each layer for each cue

[Figure: f.g. model and b.g. model densities evaluated at observation Z, with f(Z | f.g.) = 8 and f(Z | b.g.) = 0.4]

Page 14

Cue combination: Weighted voting

• Each cue makes independent decision
• Combine using weighted sum
• Assuming equal weights:

          f.g.   b.g.
Colour    8.0    0.4
Motion    0.2    0.3
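The weighted vote for the values listed above can be worked through in a few lines. This is only a sketch: the function names are made up for illustration, and uniform priors over the two layers are an assumption (the slide states only that the weights are equal).

```python
def cue_posteriors(likelihoods, priors):
    """Per-cue posterior p(j | Z) from likelihoods f(Z | M_j) and priors p(j)."""
    joint = [f * p for f, p in zip(likelihoods, priors)]
    total = sum(joint)
    return [v / total for v in joint]


def voting_scores(cues, weights, priors):
    """Weighted sum of the per-cue posteriors, one score per layer."""
    scores = [0.0] * len(priors)
    for name, likelihoods in cues.items():
        post = cue_posteriors(likelihoods, priors)
        for j, p in enumerate(post):
            scores[j] += weights[name] * p
    return scores


# Likelihoods from the slide, ordered (f.g., b.g.).
cues = {"colour": [8.0, 0.4], "motion": [0.2, 0.3]}
weights = {"colour": 0.5, "motion": 0.5}  # equal weights
priors = [0.5, 0.5]                       # assumption: uniform priors

scores = voting_scores(cues, weights, priors)
print([round(s, 3) for s in scores])
```

Under these assumptions the foreground score comes out at about 0.68 and the background score at about 0.32, so the pixel is assigned to the foreground but with a graded margin.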

Page 15

Cue combination: Probabilistic

• Compute total likelihood of observations for each layer
• Classify using Bayes’ Rule
• Assuming uniform priors:

          f.g.   b.g.
Colour    8.0    0.4
Motion    0.2    0.3
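The same numbers can be pushed through the probabilistic rule as a quick check. The sketch below multiplies the cue likelihoods per layer and normalizes with Bayes' rule; the function name is hypothetical and the priors are uniform, as the slide assumes.

```python
from math import prod


def fused_posterior(cues, priors):
    """Posterior p(j | Z) assuming the cues are independent given the layer."""
    joint = [
        priors[j] * prod(likelihoods[j] for likelihoods in cues.values())
        for j in range(len(priors))
    ]
    total = sum(joint)
    return [v / total for v in joint]


# Likelihoods from the slide, ordered (f.g., b.g.).
cues = {"colour": [8.0, 0.4], "motion": [0.2, 0.3]}
posterior = fused_posterior(cues, [0.5, 0.5])  # uniform priors
print([round(p, 3) for p in posterior])
```

The total likelihoods are 8.0 × 0.2 = 1.6 for the foreground and 0.4 × 0.3 = 0.12 for the background, giving a foreground posterior of about 0.93. The colour cue outweighs the motion cue's mild preference for the background.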

Page 16

Pros and cons

Voting:
• Simple!
• Easy to combine observations from very different spaces
• Not obvious how cues interact
• Graded output

Probabilistic:
• Mathematically well-founded
• One cue can easily dominate over others
• Almost binary output

Page 17

Training and adaptation

• Start with segmentation just from motion

• Use this to train colour and contrast distributions
– Use EM algorithm to train Gaussian mixture models
• Subsequently adapted online
– Recompute models from current data
– Update model as weighted sum over time window

(Raja et al ECCV’98)
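One way to picture the "weighted sum over time window" update is the sketch below. It is not the authors' implementation: the class name, the geometric decay of the weights and the use of a plain parameter vector (here a two-bin histogram rather than Gaussian mixture parameters) are all assumptions made for illustration.

```python
from collections import deque


class WindowedModel:
    """Blend the most recent per-frame model estimates with decaying weights."""

    def __init__(self, window=5, decay=0.5):
        self.history = deque(maxlen=window)  # oldest first, newest last
        self.decay = decay                   # assumed geometric weighting

    def update(self, estimate):
        """Store the estimate recomputed from current data, return the blend."""
        self.history.append(list(estimate))
        return self.current()

    def current(self):
        n = len(self.history)
        # Newest estimate gets weight 1; each step back is scaled by `decay`.
        weights = [self.decay ** (n - 1 - t) for t in range(n)]
        total = sum(weights)
        dim = len(self.history[0])
        return [
            sum(w * h[d] for w, h in zip(weights, self.history)) / total
            for d in range(dim)
        ]


model = WindowedModel(window=3, decay=0.5)
model.update([1.0, 0.0])
model.update([0.0, 1.0])
blended = model.update([0.0, 1.0])
print(blended)
```

Estimates older than the window drop out entirely, so the model can follow gradual appearance changes without being tied to the initial motion-based training.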

Page 18

Suppressing unreliable cues

• Cues unreliable during training

• No independent motion => poor motion segmentation
• Unreliable in the past => probably unreliable now! (Triesch and von der Malsburg ICAFGR’00)
• Mechanisms:
– Voting: weights
– Probabilistic: hyper-priors

Page 19

The effect of hyper-priors

[Figure panels: Original pdfs; After marginalization]

Page 20

Results

[Video panels: Original; Probabilistic; Voting. Degree of membership to foreground.]

Page 21

Cue combination

[Figure panels: Motion; Colour; Prediction; Texture (contrast); Combined, probabilistic; Combined, voting]

Page 22

Results

[Video panels: Original; Foreground]

Page 23

Results: Probabilistic cue integration

[Video panels: Original; Foreground mask]

Page 24

The cues

[Figure panels: Motion; Colour; Prediction; Texture (contrast); Combined]

Page 25

Process overview

[Diagram labels: Image features; Epipolar geometry; Disparity map; Ego-motion; Independent motion; Non-retinal info.; Additional retinal info.; Regions of interest; Fixation point; Top-down control]

Page 26

Calibration

Relative orientations have to be known to
• relate disparities to depths
• simplify estimation of disparities

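[Equation garbled in extraction: an optical flow model in image coordinates x and y, with rotation terms r (including r_y and r_z) and translation terms t]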

Using corner features and optical flow model

Unstable process => be careful

We first assume $r_z$ and $r_y$ to be zero.

Page 27

More examples

Page 28

Final output

Page 29

3D Cues: stereo and motion

• Many objects of interest static. Harder!
• Motion included in full system; now only stereo
• The cues in these examples
– Stereo data - exist along contours
– Color data/appearance between contours

Page 30

Proposed system structure

Technically by a combination of wide field and foveal cameras

A wide field for attention, recognition in foveated view

Problem: transfer from wide field to foveal view

Steps:
• Divide scene into 3D objects
• Select objects through attention (e.g. hue and expected size)
• Fixate (and track) object of interest
• Recognize objects in foveal view

Page 31

Processes

[Diagram labels: Where (Attention, Segmentation); What (Recognition); Hypotheses; Knowledge; Adaptation; Shape and size; Region of Interest]

Page 32

Flow of information

[Diagram labels: Wide field (Left, Right); Foveal (Left, Right); Calibration; Fixation; Segmentation; Global hue; SIFT features; Local hue; Attention; Gaze direction; Registration; Recognition]

Page 33

Figure-ground segmentation

Disparity map is sliced into layers.

Widths are set to that of requested object.

Page 34

Figure-ground segmentation

• Disparities using SAD correlations.
• Segmentation based on slicing the 3D world.

BinoCues BinoAttn
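The two steps named on this slide can be illustrated on a single scanline. The sketch below is a toy version under stated assumptions: `sad_disparity` and `slice_disparities` are hypothetical names, matching is done per pixel along one row, and the layer width is given in disparity units rather than derived from the requested object's size.

```python
def sad_disparity(left, right, x, half_win, max_disp):
    """Disparity at column x of the left row that minimizes the sum of
    absolute differences (SAD) against the right row."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if x - d - half_win < 0:
            break  # window would leave the right image
        cost = sum(
            abs(left[x + k] - right[x - d + k])
            for k in range(-half_win, half_win + 1)
        )
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def slice_disparities(disparities, layer_width):
    """Assign each disparity to a layer index of the given width."""
    return [d // layer_width for d in disparities]


# A small pattern (9, 5, 7) shifted by three pixels between the two rows.
left = [0, 0, 0, 0, 9, 5, 7, 0, 0, 0]
right = [0, 9, 5, 7, 0, 0, 0, 0, 0, 0]
d = sad_disparity(left, right, x=5, half_win=1, max_disp=4)
layers = slice_disparities([0, 1, 3, 4, 7], layer_width=2)
print(d, layers)
```

Pixels that fall in the same layer are candidates for belonging to the same object, which is what makes the slicing usable as a figure-ground segmentation.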

Page 35

Hue based attention

Local hue histograms correlated with that of requested object.

Fast implementation using rotating sums.
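A rough sketch of the idea follows, reading "rotating sums" as incremental sliding-window updates, which is an interpretation rather than something the slide spells out. The function names, the one-dimensional scanline and the normalized-correlation measure are likewise assumptions chosen to keep the example small.

```python
from math import sqrt


def sliding_hue_histograms(hue_bins, n_bins, win):
    """Yield the histogram of every length-`win` window along a row of
    hue bin indices, updating it incrementally instead of recounting."""
    hist = [0] * n_bins
    for h in hue_bins[:win]:
        hist[h] += 1
    yield list(hist)
    for i in range(win, len(hue_bins)):
        hist[hue_bins[i]] += 1        # pixel entering the window
        hist[hue_bins[i - win]] -= 1  # pixel leaving the window
        yield list(hist)


def normalized_correlation(hist, target):
    """Cosine-style correlation between a local and a target histogram."""
    num = sum(a * b for a, b in zip(hist, target))
    den = sqrt(sum(a * a for a in hist) * sum(b * b for b in target))
    return num / den if den else 0.0


row = [0, 0, 1, 2, 2, 2, 1, 0]  # hue bin index per pixel
target = [0, 1, 2]              # hypothetical histogram of the requested object
saliency = [
    normalized_correlation(h, target)
    for h in sliding_hue_histograms(row, n_bins=3, win=3)
]
best = max(range(len(saliency)), key=saliency.__getitem__)
print(best, [round(s, 2) for s in saliency])
```

Each window update costs two histogram increments regardless of window size, which is the property that makes this kind of scheme fast.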

Page 36

Saliency peaks

Peaks from blob detection of depth slices. Based on Differences of Gaussians.

Hue saliency map used for weighting.

Random value added before selection.
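The selection step might look like the sketch below. It assumes the Difference-of-Gaussians blob responses have already been computed, and the multiplicative weighting, the uniform noise and all names are illustrative guesses; the slide gives only the three ingredients.

```python
import random


def select_peak(peaks, noise=0.1, rng=None):
    """Pick the peak with the highest hue-weighted blob response after
    adding a small random value to each score."""
    rng = rng or random.Random()
    best, best_score = None, float("-inf")
    for peak in peaks:
        score = peak["blob"] * peak["hue"] + rng.uniform(0.0, noise)
        if score > best_score:
            best, best_score = peak, score
    return best


# Hypothetical peaks: image position, blob response and hue saliency.
peaks = [
    {"pos": (40, 60), "blob": 0.9, "hue": 0.2},
    {"pos": (120, 80), "blob": 0.7, "hue": 0.9},
    {"pos": (200, 30), "blob": 0.3, "hue": 0.5},
]
chosen = select_peak(peaks, noise=0.05, rng=random.Random(0))
print(chosen["pos"])
```

One plausible reading of the random term is that it lets nearly equal peaks take turns, so the system does not keep returning to the same location.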

Page 37

Fixation

$$F = \begin{pmatrix} 0 & 0 & c \\ 0 & 0 & d \\ a & b & e \end{pmatrix}$$

The foveal system continuously tries to fixate
• done using corner features
• and affine essential matrix

Zero disparity filters won’t work
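With a matrix of this form the epipolar constraint is linear in the image coordinates, which the following sketch makes concrete. The helper names and the convention that the left point multiplies F from the right are assumptions; the slide gives only the matrix.

```python
def affine_F(a, b, c, d, e):
    """Matrix with the sparsity pattern shown on the slide."""
    return [[0.0, 0.0, c], [0.0, 0.0, d], [a, b, e]]


def epipolar_residual(F, p_left, p_right):
    """Return x_r^T F x_l for image points extended to (x, y, 1)."""
    xl = (p_left[0], p_left[1], 1.0)
    xr = (p_right[0], p_right[1], 1.0)
    Fx = [sum(F[r][c] * xl[c] for c in range(3)) for r in range(3)]
    return sum(xr[r] * Fx[r] for r in range(3))


# Example parameters for a rectified pair, where matches share the same row:
# the residual reduces to y_left - y_right.
F = affine_F(a=0.0, b=1.0, c=0.0, d=-1.0, e=0.0)
r_ok = epipolar_residual(F, (10.0, 4.0), (7.0, 4.0))
r_bad = epipolar_residual(F, (10.0, 4.0), (7.0, 6.0))
print(r_ok, r_bad)
```

Expanding the product gives a·x_l + b·y_l + c·x_r + d·y_r + e, so each corner correspondence contributes one linear equation in the five parameters.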

Page 38

Foveated segmentation

To boost recognition

• Foveal segmentation based on disparities
• Rectification using affine fundamental matrix
• Only search for disparities around zero => Large number of false positives
• Points clustered in 3D using mean shift
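The clustering step can be sketched with a plain flat-kernel mean shift. This is a generic textbook variant, not the authors' implementation; the bandwidth, the iteration count, the mode-merging tolerance and the sample points are all made up for illustration.

```python
def mean_shift(points, bandwidth, iters=20):
    """Move one mode per point to the mean of the data points within
    `bandwidth` of it, repeated for a fixed number of iterations."""
    modes = [tuple(p) for p in points]
    for _ in range(iters):
        shifted = []
        for m in modes:
            near = [
                p for p in points
                if sum((a - b) ** 2 for a, b in zip(p, m)) <= bandwidth ** 2
            ]
            if not near:  # guard against rounding; keep the mode in place
                shifted.append(m)
                continue
            shifted.append(tuple(sum(c) / len(near) for c in zip(*near)))
        modes = shifted
    return modes


def label_modes(modes, tol):
    """Give points the same label when their modes lie within `tol`."""
    centres, labels = [], []
    for m in modes:
        for idx, c in enumerate(centres):
            if sum((a - b) ** 2 for a, b in zip(m, c)) <= tol ** 2:
                labels.append(idx)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return labels, centres


# Two small groups of 3D points at different depths.
points = [
    (0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.0, 0.1, 1.1),
    (2.0, 2.0, 3.0), (2.1, 2.0, 3.0), (2.0, 2.1, 3.1),
]
labels, centres = label_modes(mean_shift(points, bandwidth=0.5), tol=0.25)
print(labels, len(centres))
```

Mean shift needs no preset number of clusters, which suits a setting where false-positive matches may form their own small groups away from the fixated object.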

Page 39

Foveated segmentation

Page 40

Foveated segmentation

Page 41

Small object database in real-time experiments

Models of SIFT features and hue histograms

Page 42

Visual scene search


Page 43

Segmentation robustness

Page 44

Segmentation robustness

Page 45

Effect of occlusions

Page 46

Effect of rotations

Page 47

Recognition in wide field of view (w-f-o-v)

Page 48

Recognition after foveation

Page 49

Conclusions

We have a running system.

Objects normally found within three saccades.

Concern: dependency on corner features

Current work:
• Focus on recognition and categorization
• More robust foveal segmentation
• Additional cues e.g. texture
• Learning and adaptation on all levels