robotic models of active perception
TRANSCRIPT
Robotic Models of Active Perception
Dimitri Ognibene, PhD, Laboratory for Morphological Computation and Learning
(www.thrish.org)
Substituting for humans in dangerous jobs is one of the main goals of robotics.
The actions in these pictures are already possible for today's robots.
However.....
Perceiving in these environments is very complex:
• Unstructured
• Changing
• Many different objects of different scales and shapes
• Occlusions
• Other agents to perceive and coordinate with
Currently only humans are able to cope with such a level of perceptual complexity...
And humans perceive actively...
Active Perception
Ognibene & Demiris 2013
• Robotics
• Neuroscience
• Automatic Diagnosis
• Smart Devices & Environments
• Data mining
Foveal Vision (What does it mean to perceive actively?)
Try to grasp an apple with foveal vision... Seeing becomes like sampling and remembering.
Active Perception (AP) Issues*
• Where to look?
• What to remember?
• When to stop looking and start acting?
  – Enough information?
  – Enough time?
  – Is the acquired information still valid?
*See also The Frame Problem
Where to look? Use only image statistics?
Itti & Baldi 2010
The main limits of basic saliency models are:
• No task information
• They do not consider the limited field of view
Where to look? Information on Demand
Yarbus 1967
Where to look? Context and task information used to drive perception to the target
Vogel & de Freitas 2008
Unknown Task or Goal
• Task/Goal depending on other agents' presence/goals
• Multiple affordances required for the task
Ognibene & Demiris IJCAI 2013
Active Perception and Mirror Neurons
• Encode the action goal
• Abstract the trajectory
• Need perceptions
Can the Motor Control System predict others' actions?
Human Robot Interaction as a Distributed Dynamic Event
Ognibene & Demiris 2013
Predictive Action Recognition
Field of view
Ognibene & Demiris 2013
Effective Perception-Environment Coupling is necessary for timely Recognition and Survival
Field of view
Different hypotheses of the target position: equally probable, not yet seen
Ognibene & Demiris IJCAI 2013
See also "Perceptions as hypotheses: saccades as experiments", Friston et al. 2012
Perceive to reduce uncertainty
Field of view
Hand movement changes the distribution
Ognibene & Demiris IJCAI 2013
Perceive to reduce uncertainty
Field of view
Saccade to target hypothesis
Ognibene & Demiris IJCAI 2013
Perceive to reduce uncertainty
Field of view
No target at the observed position
Ognibene & Demiris IJCAI 2013
Perceive to reduce uncertainty
Field of view
Update the distribution
Ognibene & Demiris 2013
Perceive to reduce uncertainty
2. Active Event Recognition

In this section the AER problem is defined and a solution based on a mixture of Kalman filters (KF) using Information Gain (AERIG) is described.

Problem definition. The graphical model in figure 2 displays the formulation of the problem. The discrete hidden stochastic variable V represents the class of the event which is taking place, characterised by a different dynamic of the environment that the agent must predict and recognise. The environment is composed of a fixed set of elements E = {e_1, e_2, ..., e_N} and thus its state X^t at time t is composed of the states X_i^t of the different elements. For each value of V the evolution of X^t is determined by a different dynamic system with different independence conditions between the elements. At each time step the agent receives for each element i an observation o_i^t which depends on the current configuration of the sensors \theta^t. The states and observations are continuous variables.

At every time step the goal of the system is to select the configuration \theta^t that will minimise the expected uncertainty over V (quantified by the entropy H):

\theta^t = \arg\min_{\theta^t} \int_O p(o^t \mid o^{0 \ldots t-1}, \theta^t)\, H(V \mid o^{0 \ldots t}, \theta^{0 \ldots t})\, do^t \quad (1)

Proposed solution. For the recognition of the event and for the selection of the sensor configuration it is necessary to compute the posterior P(v \mid o^t; \theta^t). Given a prior distribution P(v, x^t_{1:N}) = P(x^t_{1:N} \mid v) P(v) and the independence of the observed event from the sensor configuration, P(v \mid \theta) = P(v), the update expression of the posterior P(v \mid o^{t+1}, \theta^{t+1}) can be derived through Bayes' rule:

P(v \mid o^{t+1}, \theta^{t+1}) = \frac{P(o^{t+1} \mid v, \theta^{t+1})\, P(v)}{P(o^{t+1} \mid \theta^{t+1})} \quad (2)

The computation of eq. (1) and eq. (2) in the general case can pose severe computational complexities. The proposed solution is based on the assumption that, once v is fixed, the dynamics are linear and the probability distributions are normal. This enables the use of a mixture of KFs with a distinct KF for each value of v. Denoting with \bar{o}^{t+1}_{v,\theta^{t+1}} the mean expected observation and with S^{t+1}_{v,\theta^{t+1}} its covariance matrix, both of which are conditioned on v and \theta and computed during the KF update, the following can be derived:

\theta^{t+1} = \arg\min_{\theta^{t+1}} \sum_v P(v) \Big( \tfrac{1}{2} \ln |S^{t+1}_{v,\theta^{t+1}}| + \int_O \mathcal{N}(o;\, \bar{o}^{t+1}_{v,\theta^{t+1}},\, S^{t+1}_{v,\theta^{t+1}}) \ln P(o \mid \theta^{t+1})\, do \Big) \quad (3)

where |S| is the determinant of a matrix S. The first-order Taylor expansion of P(o \mid \theta) at the point \bar{o}^{t+1}_v results in:

\theta^{t+1} \approx \arg\min_{\theta} \sum_v P(v) \Big[ \tfrac{1}{2} \ln |S^{t+1}_{v,\theta^{t+1}}| + \ln \sum_{v'} P(v')\, \mathcal{N}(\bar{o}_{v,\theta};\, \bar{o}^{t+1}_{v',\theta^{t+1}},\, S^{t+1}_{v',\theta^{t+1}}) \Big] \quad (4)
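The posterior update of eq. (2) with one Kalman filter per event class can be sketched as follows. This is a minimal 1-D illustration: the dynamics coefficients, noise levels and the two hypothetical event classes ("approach" vs "retreat") are assumptions, not parameters from the paper.

```python
import numpy as np

# One 1-D Kalman filter per event class v (hypothetical linear-Gaussian models).
class KF1D:
    def __init__(self, a, q, r, x0=1.0, p0=0.1):
        self.a, self.q, self.r = a, q, r      # dynamics, process var, obs var
        self.x, self.p = x0, p0               # state mean and variance

    def step(self, o):
        """Predict, return the predictive likelihood of o, then correct."""
        x_pred = self.a * self.x
        p_pred = self.a * self.p * self.a + self.q
        s = p_pred + self.r                    # innovation variance
        lik = np.exp(-0.5 * (o - x_pred) ** 2 / s) / np.sqrt(2 * np.pi * s)
        k = p_pred / s                         # Kalman gain
        self.x = x_pred + k * (o - x_pred)
        self.p = (1 - k) * p_pred
        return lik

def update_event_posterior(prior, filters, o):
    """Bayes rule of eq. (2): P(v | o, theta) ∝ P(o | v, theta) P(v)."""
    post = prior * np.array([kf.step(o) for kf in filters])
    return post / post.sum()

# Two hypothetical event classes: "approach" (state grows) vs "retreat".
filters = [KF1D(a=1.5, q=0.01, r=0.05), KF1D(a=0.7, q=0.01, r=0.05)]
p_v = np.array([0.5, 0.5])
for o in [1.5, 2.25, 3.375]:                  # observations consistent with a=1.5
    p_v = update_event_posterior(p_v, filters, o)
print(p_v)                                    # posterior concentrates on "approach"
```

As the observations keep matching the predictions of one model, its predictive likelihood dominates and the posterior over event classes sharpens, which is exactly what makes the entropy term in eq. (1) shrink.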
Info Gain Perception Control for Intention Anticipation
Minimizing event uncertainty (conditional entropy H(V | ...))
Ognibene & Demiris IJCAI 2013
Info Gain Using Kalman Filters
Expected entropy for hypothesis v
Difference of predictions between the models
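A toy instance of the eq. (4) criterion: each candidate sensor configuration θ is scored by the expected-entropy surrogate, so gaze is sent where the per-class predicted observations disagree most. The two gaze targets ("hand", "goal") and their predicted (mean, variance) pairs are invented for illustration.

```python
import numpy as np

def gauss(x, mu, var):
    """1-D Gaussian density N(x; mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def score(theta_preds, p_v):
    """Eq. (4) surrogate for one sensor configuration.
    theta_preds: list of (mean, var) predicted observations, one per class v."""
    total = 0.0
    for v, (mu_v, s_v) in enumerate(theta_preds):
        # Mixture density of all models' predictions, evaluated at model v's mean.
        mix = sum(p_v[w] * gauss(mu_v, mu_w, s_w)
                  for w, (mu_w, s_w) in enumerate(theta_preds))
        total += p_v[v] * (0.5 * np.log(s_v) + np.log(mix))
    return total

p_v = np.array([0.5, 0.5])
# Hypothetical gaze targets: at "hand" the two event models predict the same
# observation; at "goal" they disagree, so looking there is more informative.
configs = {
    "hand": [(0.0, 0.1), (0.0, 0.1)],   # identical predictions
    "goal": [(0.0, 0.1), (2.0, 0.1)],   # models disagree by 2.0
}
best = min(configs, key=lambda th: score(configs[th], p_v))
print(best)
```

The "goal" configuration wins because the log-mixture term is smaller when the models' predicted observations are far apart: observing there is expected to discriminate between the hypotheses.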
Gaze target during event observation
[Figure: ratio of saccades on each element over time steps, for the conditions performer best, target best, target not best, performer not best]
Ognibene & Demiris IJCAI 2013
Modelling the temporal coupling of perception with external events
Results
Ognibene & Demiris IJCAI 2013
Multiple Complex Simultaneous Activities
Hierarchical Action Representation to Represent Temporal Structure
Probabilistic Grammars, Dynamic Bayes Networks
Lee, Ognibene, Chang, Kim, Demiris (Submitted)
STARE: Spatio-Temporal Attention Relocation for Multiple Structured Activities Detection
Active Perception (AP) Issues
• Where to look?
• What to remember?
• When to stop looking and start acting?
  – Enough information?
  – Enough time?
  – Is the acquired information still valid?
Active Perception Issues
• Why has evolution selected attention and the reduction of the perceptive space for many species?
• Why does a massively parallel system, like the brain, need to use a serial mechanism like attention?
Is AP just useful to cope with hidden information?
Active Perception Issues
• How are decision making and planning affected by AP? How is computation affected by AP?
• Is AP in the brain reflected by a peculiar kind of "active processing"?
• How is learning affected by AP?
• How are representations affected by AP?
• How can the brain self-organise to support AP?
• How would a dysfunction of AP manifest?
Perception control is strongly dependent on the task.
Learning a new task may require learning a new perception-control policy.
Active Perception and Learning
Ognibene & Baldassarre, IEEE TAMD, 2014
Foveal Vision and Saliency Maps May Speed Up the Learning of "Ecological Tasks"
Ognibene & Baldassarre, IEEE TAMD, 2014
Subjective and efficient representations
Ognibene & Baldassarre, IEEE TAMD, 2014
The agent has a fovea and can see colours only at the centre of its field of view. The agent is rewarded if it touches the red block. The red block is always on the left of the green blocks. Green blocks are very easy to find. Blue blocks are randomly positioned distractors. What will be the right action to take, and the right representation to learn, for the blue object?
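The task structure described above can be sketched as a toy world generator. The grid size, number of distractors and function names are assumptions for illustration, not the exact setup of Ognibene & Baldassarre 2014; only the stated constraints (red immediately left of a green block, blue blocks random) are taken from the text.

```python
import random

# Hypothetical reconstruction of the task layout: the rewarded red block sits
# immediately to the left of a green block; blue blocks are random distractors.
def sample_world(width=5, height=5, n_blue=3, rng=random):
    green = (rng.randrange(1, width), rng.randrange(height))  # column >= 1
    red = (green[0] - 1, green[1])                            # left of green
    free = [(x, y) for x in range(width) for y in range(height)
            if (x, y) not in (green, red)]
    blues = rng.sample(free, n_blue)
    return {"red": red, "green": green, "blue": blues}

world = sample_world()
print(world)
```

The invariant that matters for the result described in the next slides is that the red block is perfectly predictable from the green one, while the blue blocks carry no positional information at all.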
Subjective and efficient representations
Ognibene & Baldassarre, IEEE TAMD, 2014
What will be the right action to take, and the right representation to learn, for the blue object? While a random action was expected, due to the random position of the blue block, the agent learns a well organised representation: it moves from the blue block up, down in the same column, or right. The policy learnt by the agent for the green and red blocks biased the agent's perception of the blue object, making it a landmark to find the red object and making the agent's behaviour effective even without memory.
Subjective and efficient representations
Ognibene & Baldassarre, IEEE TAMD, 2014
The policy learnt by the agent for the blue and red blocks biased the agent's perception of the blue object while making its behaviour effective. The agent usually starts from the green object and moves to elements in the adjacent column on its left, expecting to find the red object. This leads it to ignore the blue blocks that are not in the columns to the left of the green blocks (those inside the orange circle). The next picture shows the resulting perceived structure of the world.
Subjective and efficient representations
Ognibene & Baldassarre, IEEE TAMD, 2014
Perceived World biased by Active Perception: the policy learnt by the agent for the green and red blocks biased the agent's perception of the blue object, making it a landmark to find the red object and making the agent's behaviour effective even without memory.
Sequence of observations and their frequency (grey) after learning
Subjective and efficient representations
Ognibene & Baldassarre, IEEE TAMD, 2014
Representations Evolution
[Figure: evolution of the G, R, B representation maps]
Ognibene et al., SAB 2008
Representations are not formed in a uniform way. The system shows a sequential formation of different areas of activity. This may be due to the selective aspect of active perception, which enables perception, and change, only on a subset of stimuli.
Representations Evolution
[Figure: evolution of the G, R, B representation maps]
Ognibene et al., SAB 2008
As representations are not formed in a uniform way, the same holds for the behaviours acquired by the agent. The sequential formation of different areas of activity may not only be reflected in the sequentially acquired behaviours, but may also be caused by the increasing capability of the agent as it acquires other behaviours, giving rise to "scaffolding" supported by AP.
Active Perception Issues
• How are decision making and planning affected by AP? How is computation affected by AP?
• Is AP in the brain reflected by a peculiar kind of "active processing"?
• How are representations affected by AP?
• How is learning affected by AP?
• How can the brain self-organise to support AP?
• How would a dysfunction of AP manifest?
Intention-Aware Resource Allocation in 3D Tracking for Precision Manipulation
Initial Improvements
Introduction of constraints for spatio-temporal consistency, and optimisation to exploit GPUs and multicore CPUs
.... but STILL TOO SLOW
Intention-Aware Resource Allocation in 3D Tracking for Precision Manipulation
Humans are capable of fast adaptive reactions to unforeseen events...
which require fast (maybe imprecise) perception
Intention-Aware Resource Allocation in 3D Tracking for Precision Manipulation
[Figure: DARWIN Attention architecture. Camera and depth images feed a 2D Object Detector, a Mask Builder (using other object masks and occlusion information) and a Tracker with external motion info. A 3D Pose Estimator and an appearance-based fast tracker output, for each object representation, a 3D posture, confidence, class ID and rendered image to the DARWIN Cognitive Architecture; intention predictions drive context-sensitive resource allocation.]
A complex visual perception system running on parallel hardware, with direct and indirect dependencies between the components.
Active Perception and Computation to reduce uncertainty
• Intrinsic scene saliency: maximise the expected overall predictability (e.g. a moving object will also make salient the nearby objects that may occlude or deviate it)
• Agent intention -> raises saliency by changing predictions
1. Humans apply certain strategies to detect hard abnormalities in soft tissues
2. Optimally chosen speed and load of tactile probing will lead to improved tumour detection and better clinical outcomes
3. Embodied perception of the environment should be considered to define optimal probing behaviour
Jelizaveta Konstan:nova
Laboratory for Morphological Computa$on and Learning
(Thrish.org KCL)
Nantachai Sornkarn
Thrishantha Nanayakkara (PI)
Embodied Percep:on and Tac:le Explora:on
Embodied Percep:on
[1] J. Konstantinova, M. Li, M. Gautam, P. Dasgupta, K. Althoefer and T. Nanayakkara, "Behavioral Characteristics of Manual Palpation to Localize Hard Nodules in Soft Tissues", in press, IEEE Transactions on Biomedical Engineering, 2014.
[2] N. Sornkarn, T. Nanayakkara and M. Howard, "Internal Impedance Control Helps Information Gain in Embodied Perception", IEEE International Conference on Robotics and Automation (ICRA), 2014.
Human-Robot Haptic Guidance
Anuradha Ranasinghe
Thrishantha Nanayakkara (PI)
The guiding agent can be modelled as a 3rd-order predictive model using a simple linear auto-regressive (ARX) model. The human follower can be modelled as a 2nd-order reactive control policy.
The guider can modulate the pulling force in response to the confidence level of the follower.
The confidence of the follower correlates with the model's virtual damping and can be actively measured.
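The 3rd-order ARX structure mentioned above can be sketched as a least-squares identification problem. Everything here is synthetic and illustrative (coefficients, input signal, noise-free simulation); it only shows how such a predictive model would be fitted, not the haptic data of the study.

```python
import numpy as np

# Identify a hypothetical 3rd-order ARX model y_t = a1*y_{t-1} + a2*y_{t-2}
# + a3*y_{t-3} + b*u_t from simulated data, via ordinary least squares.
rng = np.random.default_rng(0)
true_a = np.array([0.5, 0.3, 0.1])            # autoregressive coefficients
b = 0.2                                       # exogenous-input coefficient
u = rng.normal(size=200)                      # e.g. a follower-related signal
y = np.zeros(200)
for t in range(3, 200):
    y[t] = true_a @ y[t-3:t][::-1] + b * u[t]

# Regressors [y_{t-1}, y_{t-2}, y_{t-3}, u_t] -> target y_t.
X = np.column_stack([y[2:-1], y[1:-2], y[:-3], u[3:]])
coef, *_ = np.linalg.lstsq(X, y[3:], rcond=None)
print(coef)                                   # ≈ [0.5, 0.3, 0.1, 0.2]
```

With real haptic data the same regression would be run on measured force and motion signals, and the residual variance would indicate how well the chosen model order captures the guider's behaviour.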
Active Perception Issues
• How are decision making and planning affected by AP? How is computation affected by AP?
• Is AP in the brain reflected by a peculiar kind of "active processing"?
• How are representations affected by AP?
• How is learning affected by AP?
• How can the brain self-organise to support AP?
• How would a dysfunction of AP manifest?
Predictive Coding
Mumford 1992; Rao and Ballard 1999; Friston 2005; Spratling 2008; Hinton 2007; Clark 2013
A Hierarchical Bayesian Predictive (Generative) Model: predictions flow backward and prediction errors forward (fast reaction). The accumulation of sensory evidence reduces the Prediction Error (or Surprisal) and realises both perceptual inference and learning in a unified framework. Attention can be understood as inferring the level of [un]certainty (c.f. the Kalman gain).
Figure from Feldman & Friston 2010
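A minimal sketch of the prediction-error dynamics just described, in the Rao & Ballard spirit: the latent estimate is driven by the backward-propagated error until the input is explained. The sizes, the fixed generative weights and the step size are arbitrary illustrations, and learning of the weights is omitted.

```python
import numpy as np

# One predictive-coding layer: predictions W @ r flow "down", the prediction
# error e flows "up" and drives the latent estimate r.
rng = np.random.default_rng(1)
W = rng.normal(size=(8, 3))        # generative weights (kept fixed here)
r_true = np.array([1.0, -0.5, 0.3])
x = W @ r_true                     # noiseless sensory input

r = np.zeros(3)                    # initial latent estimate
errors = []
for _ in range(800):
    e = x - W @ r                  # prediction error
    r += 0.02 * W.T @ e            # inference step: reduce the error
    errors.append(np.linalg.norm(e))

print(errors[0], errors[-1])       # the error shrinks as inference proceeds
```

The same error signal that drives inference here is what, in the full scheme, also drives learning of W, which is why perceptual inference and learning fall under one framework.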
Active Inference
Friston 2003, 2010; BBS review by Clark 2013
Active Inference is a generalisation of Predictive Coding to Action, completing the Sensorimotor Loop.
Actions reduce prediction error by realising predictions: e.g. a predicted proprioceptive state results in a prediction error which produces a reaction (e.g., reflexes).
Innate priors and interaction with the environment determine behaviour: no need for normative quantities like reward.
Variational Free Energy allows one to consider, in a tractable (approximate) analytical form, predictions and prediction errors under uncertainty.
Action, Perception, Learning and Planning are unified under the same computational principle.
Active Inference and Active Perception

P(\tilde{u} \mid \tilde{o}, \gamma) = \sigma(\gamma \cdot Q(\pi))

Q_\tau(\pi) = \underbrace{E_{Q(o_\tau \mid \pi)}[\ln P(o_\tau \mid m)]}_{\text{Extrinsic value}} + \underbrace{E_{Q(o_\tau \mid \pi)}\big[ D[\, Q(s_\tau \mid o_\tau, \pi) \,\|\, Q(s_\tau \mid \pi) \,] \big]}_{\text{Epistemic value}}

Friston, Rigoli, Ognibene et al. (submitted)
The agent's priors on behaviour π now contain an epistemic/explorative part: the agent will tend to execute actions that reduce its uncertainty about the states of the world (c.f. maximising information gain). The epistemic value corresponds to the Bayesian Surprise. Empirically, people tend to direct their gaze towards salient visual features with high Bayesian surprise (Itti & Baldi 2009).
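The epistemic term above, the expected KL divergence between the posterior and prior over states, can be computed directly in a discrete toy example. The two actions and their observation likelihoods below are invented: one action yields observations that discriminate the hidden state, the other yields uninformative ones.

```python
import numpy as np

def epistemic_value(prior_s, lik):
    """Expected information gain E_Q(o)[ KL(Q(s|o) || Q(s)) ] for one action.
    lik[o, s] = P(o | s) under that action."""
    p_o = lik @ prior_s                   # predictive distribution over o
    value = 0.0
    for o, po in enumerate(p_o):
        post = lik[o] * prior_s / po      # Bayes: Q(s | o)
        value += po * np.sum(post * np.log(post / prior_s))
    return value

prior = np.array([0.5, 0.5])
# Hypothetical saccades: one to an informative location, one whose
# observations say nothing about the hidden state.
look   = np.array([[0.9, 0.1],
                   [0.1, 0.9]])           # rows: o; columns: s
ignore = np.array([[0.5, 0.5],
                   [0.5, 0.5]])
print(epistemic_value(prior, look), epistemic_value(prior, ignore))
```

An agent scoring policies by this quantity prefers the informative saccade, which is the Bayesian-surprise-seeking behaviour described above; the uninformative action has exactly zero epistemic value.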
Minimising Prediction Error in a trivial way may lead an agent to get stuck in non-adaptive states, precluding Explorative Behaviour.
Collaborators
Karl Friston (UCL)
Hector Geffner (UPF)
Thrish Nanayakkara (KCL)
Kris De Meyer (KCL)
Giovanni Pezzulo (CNR)
Giuseppe Giglia (Uni Pa)
Yiannis Demiris (Imperial)
Gianluca Baldassarre (CNR)
Vito Trianni (CNR)
Kyuhwa Lee (EPFL)