active perception

8/3/2019 Active Perception

1/35

Active Perception

"We not only see but we look, we not only touch we feel." (J. J. Gibson)


2/35

Active Perception vs. Active Sensing

WHAT IS ACTIVE SENSING?

In the robotics and computer vision literature, the term active sensor generally refers to a sensor that transmits energy (generally electromagnetic radiation such as radar, microwaves and collimated light, or acoustic energy such as sonar and ultrasound) into the environment and receives and measures the reflected signals.

We believe that the use of active sensors is not a necessary condition for active sensing, and that sensing can be performed with passive sensors (that only receive, and do not emit, information), employed actively.


3/35

Active Sensing

Hence the problem of Active Sensing can be stated as a problem of controlling strategies applied to the data acquisition process, which will depend on the current state of the data interpretation and the goal or the task of the process.

The question may be asked: Is Active Sensing only an application of Control Theory? Our answer is: No, at least not in its simple version. Here is why:


4/35

Active Perception

1) The feedback is performed not only on sensory data but on complex processed sensory data, i.e., various extracted features, including relational features.

2) The feedback is dependent on a priori knowledge and models that are a mixture of numeric/parametric and symbolic information.


5/35

Active Perception turned into an engineering agenda

The implications of the active sensing/perception approach are the following:

1) The necessity of models of sensors. This means, first, a model of the physics of the sensors as well as of their noise. Second, a model of the signal processing and data reduction mechanisms that are applied to the measured data. These processes produce parameters with a definite range of expected values plus some measure of uncertainty. These models shall be called Local Models.
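As a minimal sketch of what a Local Model might record for one module, the hypothetical class below stores an expected value range and a noise estimate, and checks whether a measurement is plausible. The names `LocalModel`, `lower`, `upper` and `sigma`, the three-deviation tolerance, and the example numbers are illustrative assumptions, not part of the system described in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalModel:
    """Hypothetical local model: expected value range plus an uncertainty measure."""
    name: str
    lower: float   # smallest expected value of the parameter
    upper: float   # largest expected value of the parameter
    sigma: float   # assumed standard deviation of the measurement noise

    def is_plausible(self, value: float, k: float = 3.0) -> bool:
        """Accept values inside the expected range, widened by k noise deviations."""
        return self.lower - k * self.sigma <= value <= self.upper + k * self.sigma


# Illustrative numbers only: a contrast measure expected to lie in [0, 1].
contrast_model = LocalModel(name="contrast", lower=0.0, upper=1.0, sigma=0.02)
```

A module could use such a check to flag outputs that fall outside what its own model predicts before passing them to the next level.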


6/35

Engineering agenda, cont.

2) The system (which mirrors the theory) is modular, as dictated by good computer science practice, and interactive, that is, it acquires data as needed. In order to be able to make predictions about the overall outcome, we need, in addition to models of each module (as described in 1) above), models of the whole process, including feedback. We shall refer to these as Global Models.

3) Explicit specification of the initial and final state/goal.

If the Active Vision theory is a theory, what is its predictive power? There are two components to our theory, each with certain predictions:


7/35

Active Vision theory

1) Local models. At each processing level, local models are characterized by certain internal parameters. One example of a local model is a region growing algorithm whose internal parameters are the local similarity and the size of the local neighborhood. Another example is an edge detection algorithm whose parameter is the width of the band-pass filter with which the edge is detected. These parameters predict a) the definite range of plausible values, and b) the noise and uncertainty, which determine the expected resolution, sensitivity, and robustness of the output of each module.
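To make the idea of an internal parameter concrete, here is a small sketch of a one-dimensional edge detector whose only parameter is a filter width. A simple difference of box means stands in for the band-pass filter mentioned above; the function names and the choice of filter are assumptions made for illustration.

```python
def edge_response(signal, width):
    """Difference between the mean of `width` samples after and before each position."""
    n = len(signal)
    response = [0.0] * n
    for i in range(width, n - width + 1):
        before = sum(signal[i - width:i]) / width
        after = sum(signal[i:i + width]) / width
        response[i] = after - before
    return response


def strongest_edge(signal, width):
    """Index at which the magnitude of the edge response is largest."""
    response = edge_response(signal, width)
    return max(range(len(signal)), key=lambda i: abs(response[i]))


# An ideal step between index 9 and index 10.
step = [0.0] * 10 + [1.0] * 10
edge_index = strongest_edge(step, width=3)
```

A wider filter averages over more samples, which suppresses noise but blurs the position of closely spaced edges; this is the kind of tradeoff a local model is meant to capture.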


8/35

Active Vision, cont.

2) Global models characterize the overall performance and make predictions about how the individual modules will interact, which in turn determines how intermediate results are combined. The global models also embody the global external parameters and the initial and final global state of the system. The basic assumption of the Active Vision approach is the inclusion of feedback in the system and the gathering of data as needed. The global model represents all the explicit feedback connections, the parameters, and the optimization criteria that guide the process.
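The notion of gathering data as needed under an explicit stopping criterion can be sketched as a small loop. In this hypothetical example the criterion is the standard error of the mean of repeated measurements; `acquire_until_confident`, `sense` and `tolerance` are names invented here, and the criterion itself is an assumption rather than the one used in the original system.

```python
import math
import statistics
from typing import Callable, Tuple


def acquire_until_confident(sense: Callable[[], float],
                            tolerance: float,
                            max_samples: int = 1000) -> Tuple[float, int]:
    """Gather measurements until the standard error of the mean is within tolerance."""
    samples = [sense(), sense()]          # two samples are needed to estimate a spread
    while len(samples) < max_samples:
        std_error = statistics.stdev(samples) / math.sqrt(len(samples))
        if std_error <= tolerance:        # stopping condition supplied by the global model
            break
        samples.append(sense())           # feedback: request more data
    return statistics.fmean(samples), len(samples)
```

The loop acquires no more data than the criterion demands, and `max_samples` bounds the effort when the criterion cannot be met.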


9/35

Control Strategies

There are three distinct control stages proceeding in sequence:

initialization,

processing in midterm,

completion of the task.

Strategies are divided with respect to the tradeoff between how much data measurement the system acquires (data driven, bottom-up) and how much a priori or acquired knowledge the system uses at a given stage (knowledge driven, top-down). Of course, there is also the strategy which combines the two.


10/35

Bottom-up and Top-down process

To eliminate possible ambiguities with the terms bottom-up and top-down, we define them here. Bottom-up (data driven), in this discussion, is defined as a control strategy where no concrete semantic, context-dependent model is available, as opposed to the top-down strategy, where such knowledge is available.


11/35

GOALS/TASKS

Different tasks will determine the design of the system, i.e. the architecture.

Consider the following tasks:

Manipulation

Mobility

Communication and interaction of machine to machine, or people to people via digital media, or people to machine.


12/35

Goal/Task

Geographically distributed communication and interaction using multimedia (primarily vision) over the Internet.

We are concerned primarily with unspoken communication: gestures and body motion.

Examples are: coordinated movement such as dance, physical exercises, training of manual skills, remote guidance of physical activities.


13/35

Note

Recognition and learning will play a role in all the tasks.


14/35

Environments/context

Serves as a constraint in the design.

We shall consider only the constraints relevant to the visual task that serves to accomplish the physical activity.

For example: in the manipulation task, the size of the object will determine not only the data acquisition strategy but also the design of the vision system (choice of field of view, focal length, illumination, and spatial resolution). Think of moving furniture vs. picking up a coin.
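The furniture versus coin contrast can be put into numbers with the pinhole camera approximation, in which scene width divided by distance equals sensor width divided by focal length. The sketch below is only an illustration: the sensor width, distances, scene widths and pixel count are assumed values, and the approximation holds only when the object is much farther away than the focal length.

```python
def focal_length_mm(sensor_width_mm, distance_mm, scene_width_mm):
    """Pinhole approximation: focal length that maps scene_width onto the full sensor."""
    return sensor_width_mm * distance_mm / scene_width_mm


def resolution_mm_per_pixel(scene_width_mm, pixels_across):
    """Width of scene covered by one pixel."""
    return scene_width_mm / pixels_across


# Assumed example: a 6.4 mm wide sensor with 640 pixels across.
furniture_f = focal_length_mm(6.4, distance_mm=2000.0, scene_width_mm=2000.0)
coin_f = focal_length_mm(6.4, distance_mm=500.0, scene_width_mm=50.0)
coin_res = resolution_mm_per_pixel(50.0, 640)
```

Under these assumed numbers the coin task needs a focal length ten times longer than the furniture task, which is one reason a motorized zoom is attractive for a system that must handle both.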


15/35

Environment/context

Another example: Mobility

It makes a difference whether the mobility is on the ground or in the air, looking down or up.

The position and orientation of the observer will determine the interpretation of the signal.

Furthermore, there is a difference between outdoor and indoor environments.

Varied visibility conditions will influence the design and the architecture.


16/35

Environment/context

For distributed communication and interaction.

The environment will depend on the application. It could be a digitized environment of the place where the participants are, or it could be a virtual environment; for example, one can put people into a historical environment (Rome, Pompeii, etc.).


17/35

Active Vision System for 3D object recognition

Table 1 below outlines the multilayered system of an Active Vision system, with the final goal of 3-D object/shape recognition. The layers are enumerated from 0, 1, 2, ... with respect to the goal (intermediate results) and feedback parameters. Note that the first three levels correspond to monocular processing only. Naturally, the menu of extracted features from monocular images is far from exhaustive. Levels 3 to 5 are based on binocular images. It is only the last level that is concerned with semantic interpretation.


18/35

Table 1

Level  Control of                  Feedback parameters                   Goal / stopping conditions
-----  --------------------------  ------------------------------------  --------------------------
0      the physical device         directly measured: current lighting   grossly focused scene,
                                   system, open/close aperture           camera adjusted aperture

1      the physical device         directly measured: focus, zoom;       focused on one object
                                   computed: contrast, distance from
                                   focus

2      low-level vision modules    computed only: threshold of the       2D segmentation,
                                   width of filters                      max. # of edges/regions


19/35

Table cont.

Level  Control of                  Feedback parameters                   Goal / stopping conditions
-----  --------------------------  ------------------------------------  --------------------------
3      binocular system            directly measured: vergence angle;    depth map
       (hardware, software)        computed: range of admissible
                                   depth values

4      intermediate geometric      computed only: threshold of           segmentation
       vision module               similarity between surfaces

5      several views               compute the position, rotation of     3D object description
       integration process         different views

6      semantic interpretation                                           recognition of 3D
                                                                         objects/scene
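Level 1 of the table lists computed contrast as a feedback parameter and a focused object as the goal. One common way to close such a loop is hill climbing on the contrast measure, sketched below. This is an assumption about how the feedback could be realized, not the method of the original system; `measure_contrast` is a hypothetical callable that would move the lens to a setting and return the contrast observed there, and the search assumes a single contrast peak.

```python
def hill_climb_focus(measure_contrast, start, step, lower, upper):
    """Step the focus setting toward higher contrast until neither neighbour improves."""
    position = start
    best = measure_contrast(position)
    while True:
        neighbours = [p for p in (position - step, position + step)
                      if lower <= p <= upper]
        scored = [(measure_contrast(p), p) for p in neighbours]
        if not scored:
            return position
        top_score, top_position = max(scored)
        if top_score <= best:
            return position               # stopping condition: contrast no longer improves
        position, best = top_position, top_score


# Synthetic contrast curve with its peak at focus setting 37 (illustrative only).
best_focus = hill_climb_focus(lambda p: -(p - 37) ** 2, start=10, step=1,
                              lower=0, upper=100)
```

The stopping condition in the code plays the role of the Goal column in the table: the loop ends when the contrast no longer improves in either direction.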


20/35

Comments:

Several comments are in order:

1) Although we have presented the levels in a sequential order, we do not believe that this is the only way information flows through the system. The only significance of the order of the levels is that the lower levels are somewhat more basic and necessary for the higher levels to function.

2) In fact, the level at which one accesses the system depends very much on the given task and/or the goal.


21/35

Active Visual Observer

Several groups around the world have built binocular active vision systems that can attend to and fixate a moving target.

We will review two such systems, one built at the GRASP Laboratory at UPENN and the other at KTH (Royal Institute of Technology) in Stockholm, Sweden.


22/35

The UPENN System


23/35

PennEyes

A Binocular Active Vision System


24/35

PennEyes

PennEyes is a head-in-hand system with a binocular camera platform mounted on a 6 DOF robotic arm. Although physically limited to the reach of the arm, the functionality of the head is extended through the use of the motorized optics (10x zoom). The architecture is configured to rely minimally on external systems.


25/35

Design considerations

Mechanical: The precision positioning was afforded by the PUMA arm. However, the binocular camera platform needed to weigh in the range of 2.5 kg.

Optics: The use of motorized lenses (zoom, focus and aperture) offered increased functionality.

Electronics: This was the most critical element in the design. A MIMD DSP organization was chosen as the best tradeoff between performance, extensibility and ease of integration.


26/35

Puma Polka


27/35

Tracking Performance

The two robots afforded objective measures of tracking performance with a precision target.

A three-dimensional path with known precision can be repeatedly generated, allowing the comparison of different visual servoing algorithms.


28/35

BiSight Head


29/35

BiSight head

Has independent pan axes with a peak tracking performance of 1000 deg/s and 12,000 deg/s^2. The concern here is how well the calibration can be maintained after repeated exposure to acceleration and vibration.

Another problem occurred with zoom adjustment: the focal length also changed.

The binocular camera platform has 4 optical (zoom and focus) and 2 mechanical (pan) degrees of freedom.
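With two independent pan axes, the angle between the two optical axes (the vergence angle) determines how far away the fixated point lies. The sketch below assumes symmetric fixation, with the target on the perpendicular bisector of the baseline; the function name and the example baseline are illustrative assumptions, not specifications of the BiSight head.

```python
import math


def fixation_distance(baseline_m, vergence_deg):
    """Distance from the baseline midpoint to a symmetrically fixated point."""
    half_angle = math.radians(vergence_deg) / 2.0
    return (baseline_m / 2.0) / math.tan(half_angle)


# Assumed 0.2 m baseline: a smaller vergence angle means a more distant target.
near = fixation_distance(0.2, 90.0)
far = fixation_distance(0.2, 10.0)
```

Because the distance grows quickly as the vergence angle shrinks, small angular calibration errors translate into large depth errors for distant targets, which is why the calibration concern noted above matters.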


30/35

C40 Architecture

Beyond the basic computing power of the individual C40s, the performance of the network is enhanced by the ability to interconnect the modules with a fair degree of flexibility as well as the ability to store an appreciable amount of information. The former is made possible by up to six comports on each module and the latter by several Mbytes of local storage.


31/35

C40 Architecture


32/35

Critical Issues

The performance of any modularly structured active vision system depends critically on a few recurring issues. They involve the coordination of processes running on different subsystems, the management of large data streams, processing and transmission delays, and the control of systems operating at different rates.


33/35

Synchronization

The three major components of this modular active vision system are independent entities that work at their own pace. The lack of a common time base makes synchronizing the components a difficult task.

In some cases, an external signal can be used to synchronize independent hardware components. In this system, the C40 network, the digitizers and the graphics module are slaved to the vertical sync of the genlocked cameras.
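A software analogue of slaving independent components to one sync signal is to let each component block until a shared frame counter advances. The sketch below uses Python threads purely for illustration; the component names, the frame period and the function `run_synchronized` are assumptions, and real hardware synchronization works at the signal level rather than through a software lock.

```python
import threading
import time


def run_synchronized(component_names, total_frames, frame_period_s=0.001):
    """Drive several independent components from one shared frame counter."""
    frame = 0                                    # common time base, advanced only by the sync source
    tick = threading.Condition()
    logs = {name: [] for name in component_names}

    def component(name):
        seen = 0
        while seen < total_frames:
            with tick:
                tick.wait_for(lambda: frame > seen)   # block until a new sync pulse arrives
                seen = frame
            logs[name].append(seen)              # stand-in for per-frame processing

    def sync_source():
        nonlocal frame
        for _ in range(total_frames):
            time.sleep(frame_period_s)           # stand-in for the vertical sync period
            with tick:
                frame += 1
                tick.notify_all()

    threads = [threading.Thread(target=component, args=(n,)) for n in component_names]
    threads.append(threading.Thread(target=sync_source))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return logs
```

Each component records the frame numbers it acted on. A component that is too slow skips frames rather than falling behind, so all components always agree on which frame is current.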


34/35

Other considerations

Bandwidth: large data streams.

System integration. If data throughput becomes the bottleneck, then some new data compression algorithms must be invoked.

Latency. Delays between the acquisition of a frame and the motor response to it are an inevitable problem of active vision systems. Delays make the control more difficult because they can cause instabilities.

Multi-rate control. Active vision systems suggest by their very nature a hierarchical approach to control.
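The latency point can be illustrated with a toy model. Assume the tracking error is corrected by a proportional controller that only sees the error measured `delay` frames ago, so that e[k+1] = e[k] - gain * e[k - delay]. The model, the gain and the delay values are assumptions chosen for illustration and do not describe the PennEyes controller.

```python
def simulate_tracking_error(gain, delay, steps, initial_error=1.0):
    """Iterate e[k+1] = e[k] - gain * e[k - delay] and return e[0], e[1], ..."""
    errors = [initial_error] * (delay + 1)    # error assumed constant before control starts
    for _ in range(steps):
        delayed = errors[-1 - delay]          # the measurement the controller actually sees
        errors.append(errors[-1] - gain * delayed)
    return errors[delay:]                     # drop the padding used for the delayed lookups


no_delay = simulate_tracking_error(gain=0.8, delay=0, steps=40)
three_frames_late = simulate_tracking_error(gain=0.8, delay=3, steps=40)
```

With no delay the error shrinks by a factor of five at every step. With the same gain and a three-frame delay the error oscillates with growing amplitude, so the gain would have to be lowered to keep the loop stable.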


35/35

Control

If the visual and mechanical control rates

are one or more orders of magnitude

apart, the mechanical control loops are

essentially independent of the visual

control loop.
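A small simulation can make this rate separation concrete. In the sketch below a slow visual loop supplies a new setpoint once every `visual_period` ticks, while a fast servo loop moves toward the current setpoint on every tick. The first-order servo model, the gain and the ratio of 33 (roughly a 1 kHz servo against 30 Hz video) are assumed values for illustration.

```python
def run_multirate(target, ticks, visual_period=33, servo_gain=0.2):
    """Simulate a slow visual loop setting the goal of a fast mechanical servo loop."""
    position = 0.0
    setpoint = 0.0
    history = []
    for tick in range(ticks):
        if tick % visual_period == 0:
            setpoint = target(tick)                       # visual loop: one update per frame
        position += servo_gain * (setpoint - position)    # servo loop: runs on every tick
        history.append(position)
    return history


# A stationary target at 10.0 (arbitrary units), observed for two visual frames.
history = run_multirate(lambda tick: 10.0, ticks=66)
```

Within a single visual frame the servo has essentially settled on its setpoint, so from the point of view of the visual loop the mechanics behave like an ideal positioning device and the two loops can be designed separately.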
