outline im-clever: intrinsically motivated cumulative learning versatile robots

1/30

IM-CLeVeR: Intrinsically Motivated Cumulative Learning Versatile Robots

Gianluca Baldassarre, Marco Mirolli, Francesco Mannella, Vincenzo Fiore, Stefano Zappacosta, Daniele Caligiore, Fabian Chersi, Vieri Santucci, Simona Bosco

2/30

Outline

- The figures of the project
- The project vision
- The 3 pillars of the project idea + 4 S/T objectives
- WP3: Experiments
- WP4: Abstraction
- WP5: Intrinsic motivations
- WP6: Hierarchical architectures
- WP7: Integration and demonstrators
- Conclusions

3/30

The figures of the project

- Integrated project
- Call: Cognitive Systems, Interactions and Robotics
- EU funds: 5.9 million euros
- 7 (8) partners
- Start: May 2009
- End: April 2013

4/30

Vision: the problem

How can we create “truly intelligent” robots?

- Versatile: have many goals; re-use skills
- Robust: function in different conditions, with noise
- Autonomous: learning is paramount

Weng, McClelland, Pentland, Sporns, Stockman, Sur, Thelen (Science, 2001):

- …knowledge-based systems (e.g. production systems)…
- …learning systems focussed on single tasks (e.g. RL)…
- …evolutionary systems…

Important results, but limited autonomy and scalability. On the contrary, organisms do scale, are flexible, and are robust!

5/30

Vision: the idea

Why are organisms so special? Looking at children…

6/30

Vision: the idea

Ingredients:

- Powerful abstractions: “elephant on table leg”, “it slides down”
- Explore
- Record interesting states
- Intrinsic motivations (interesting states, learning rates): motivate to reproduce states (goals); guide learning of skills
- Skills are re-used and composed: to explore; to produce new skills

Science: which brain and behavioural mechanisms are behind these processes?

Technology: can we reverse engineer them? can we design algorithms with a similar power?

7/30

Vision: 2 promises

- Science: we can understand organisms
- Technology: we can develop a new methodology for designing robots…

… in particular …

- Learn actions cumulatively…
- …on the basis of intrinsic motivations…
- …re-use them to build other actions…
- …and achieve externally assigned goals with them.

8/30

Vision: how we will do it: 3 pillars + 4 S/T objectives

3 pillars:

- WP4: Abstraction and attention
- WP5: Intrinsic motivations
- WP6: Hierarchical architectures to support cumulative learning

4 S/T objectives:

1. Empirical investigations: monkeys, children, adults, Parkinson patients
2. Computational bio-constrained models: mechanisms underlying brain and behaviour
3. Machine-learning models: powerful algorithms and architectures
4. Two robotic demonstrators: CLEVER-B, CLEVER-K

[Diagram labels: Suitable representations; Focussing learning; Science; From Science to Technology; Technology; From Technology to Science]

9/30

The project WPs

[Diagram: the pillars-and-objectives scheme labelled with WP3, WP4, WP5, WP6 and WP7]

10/30

WP3: Experiments and mechatronic board

[Diagram: project scheme with the WP3 label]

11/30

WP3: “Joystick experiment” background

USFD (Peter Redgrave & Kevin Gurney)

Actions → novel outcomes → dopamine → BG learning

Redgrave & Gurney, 2006, Nature Rev. Neuroscience

12/30

WP3: Empirical Experiments: “Joystick experiment”

Method:

- Adult humans and Parkinsonian patients
- Joystick manoeuvring (gesture, location, timing) of a cursor on a screen to obtain reinforcement or salient event
- For studying: actions → novel outcomes → dopamine → BG learning

13/30

WP3: Empirical Experiments: “Board experiment”

- UCBM-LBRB (Eugenio Guglielmelli): mechatronic board, intelligent sensors
- UCBM-LDN (Flavio Keller): children
- CNR-ISTC-UCP (Elisabetta Visalberghi): monkeys

Goals: (a) investigating properties of stimuli causing intrinsic motivations; (b) acquisition of skills based on intrinsic motivations

[Figure labels: inertial/magnetic unit + battery + wireless; tactile sensors]

Sabbatini, Stammati, Tavares, Visalberghi, 2007, Amer. J. Primatology
Campolo, Taffoni, Schiavone, Formica, Guglielmelli, Keller, 2009, Int. J. Social Robotics

14/30

WP4: Abstraction

[Diagram: project scheme with the WP4 label]

15/30

WP4 Abstraction: motor, perception, attention, vergence

Abstraction is a key ingredient for intrinsic motivations and hierarchical actions:

- Motor: key in hierarchies
- Perceptual: key in intrinsic motivations: e.g., retina images would be always novel without abstraction
- Attention/vergence: two key forms of abstraction

16/30

WP4 Intrinsic motivations for developing vergence and perceptual abstraction

FIAS (Jochen Triesch)

- E.g.: reward when target fixated with both eyes drives development of vergence
- Similar mechanisms to develop perceptual abstraction

Weber & Triesch, 2009, IJCNN
Franz & Triesch, 2007, ICDL
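
As a rough illustration of the reward idea above (not the FIAS model itself), the toy sketch below learns which vergence angle to choose for each target distance, giving a reward only when both eyes fixate the target. The discretisation, the rule that angle index d is the correct one for distance index d, and the name `learn_vergence` are all assumptions made for this example.

```python
def learn_vergence(n_distances=4, n_angles=4, sweeps=5):
    """Learn a distance -> vergence-angle mapping from a fixation reward."""
    # q[d][a]: estimated reward for choosing angle a when the target is at distance d.
    # Optimistic initial values (1.0) make the greedy learner try every untested angle.
    q = [[1.0] * n_angles for _ in range(n_distances)]
    for _ in range(sweeps):
        for d in range(n_distances):
            # Greedy choice; max() returns the lowest index among ties.
            a = max(range(n_angles), key=lambda k: q[d][k])
            # Assumed toy rule: both eyes fixate the target only when a == d.
            reward = 1.0 if a == d else 0.0
            q[d][a] = reward  # learning rate 1: store the observed reward
    # Return the learned angle for each distance.
    return [max(range(n_angles), key=lambda k: q[d][k]) for d in range(n_distances)]


learned_mapping = learn_vergence()
```

With enough sweeps every wrong angle has been tried and rejected, so the learner settles on the rewarded angle for each distance.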

17/30

WP5: Novelty detection

[Diagram: project scheme with the WP5 label]

18/30

WP5 Intrinsic (extrinsic) motivations

Extrinsic motivations (e.g. food, sex, money):

- Psychology (Berlyne, White, Deci & Ryan): motivate actions to achieve specific goals
- Drive actions whose effects directly increase fitness
- Return whenever the homeostatic needs they are associated with recur

Intrinsic motivations (skill/knowledge acquisition):

- Psychology: motivate actions for their own sake
- Drive actions whose effects are an increase in: (a) knowledge or prediction ability; (b) competence to do
- Cease to drive actions once the knowledge/competence is acquired
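
The contrast above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a model from the project: the intrinsic reward is taken to be the absolute prediction error of a simple running predictor, and the extrinsic reward is tied to a hypothetical homeostatic need that regrows after each consumption. The function names and parameters are invented for the example.

```python
def intrinsic_rewards(outcomes, learning_rate=0.3):
    """Knowledge-based intrinsic reward: absolute prediction error at each step."""
    prediction = 0.0
    rewards = []
    for outcome in outcomes:
        error = outcome - prediction
        rewards.append(abs(error))
        prediction += learning_rate * error  # the predictor improves
    return rewards


def extrinsic_rewards(steps, need_growth=0.25, eat_every=4):
    """Extrinsic reward: proportional to a need that regrows after being satisfied."""
    need = 0.0
    rewards = []
    for t in range(1, steps + 1):
        need = min(1.0, need + need_growth)
        if t % eat_every == 0:
            rewards.append(need)  # consuming is rewarding in proportion to the need
            need = 0.0            # the need is satisfied, then starts to regrow
        else:
            rewards.append(0.0)
    return rewards


intrinsic = intrinsic_rewards([1.0] * 30)  # the same outcome repeated 30 times
extrinsic = extrinsic_rewards(12)
```

Here the intrinsic reward shrinks towards zero as the outcome becomes predictable, while the extrinsic reward reappears at full strength every time the need has built up again.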

19/30

WP5 Intrinsic motivations

CNR-LOCEN (Gianluca Baldassarre, Marco Mirolli)

- Young robot: low level of hierarchy develops skills based on evolved ‘reinforcers’ (knowledge-based intrinsic motivations)
- Young robot: high level of hierarchy selects skills which produce the highest surprise (competence-based intrinsic motivations)
- Adult robot: high level of hierarchy performs skill composition to achieve salient goals (external rewards fitness measure)

[Figures: child robot task; adult robot tasks; young robot results before and after learning; adult robot results]

Schembri, Mirolli, Baldassarre, 2007, ICDL, ECAL, EPIROB

20/30

WP5 Novelty detection with habituable neural networks

UU (Ulrich Nehmzow)

- Task: find novel elements in world
- Image pre-processing (abstraction)
- Habituable neural network

[Figures: the task; figure from Marsland et al. 2005 (J. Rob. Aut. Sys.)]

Neto & Nehmzow, 2007, Rob. & Aut. Syst.
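
To make the habituation idea concrete, here is a deliberately simplified sketch. It is not the network used by UU or by Marsland et al.: the prototype store, the `match_radius` and `decay` parameters and the class name are assumptions for illustration. Each stored prototype has a habituation strength that starts at 1.0 and decays every time a matching input is seen, so repeated stimuli stop registering as novel.

```python
import math


class HabituatingNoveltyFilter:
    """Toy novelty filter: nearest-prototype matching plus habituation."""

    def __init__(self, match_radius=0.5, decay=0.5):
        self.match_radius = match_radius  # max distance for an input to match a prototype
        self.decay = decay                # factor applied to the strength on each match
        self.prototypes = []              # stored (pre-processed) feature vectors
        self.strengths = []               # habituation strength per prototype

    def novelty(self, features):
        """Return a novelty score in (0, 1] and habituate to the input."""
        best, best_dist = None, float("inf")
        for i, proto in enumerate(self.prototypes):
            d = math.dist(proto, features)
            if d < best_dist:
                best, best_dist = i, d
        if best is None or best_dist > self.match_radius:
            # Nothing similar seen before: store a new, unhabituated prototype.
            self.prototypes.append(tuple(features))
            self.strengths.append(1.0)
            best = len(self.prototypes) - 1
        score = self.strengths[best]
        self.strengths[best] *= self.decay  # habituate to the familiar stimulus
        return score


novelty_filter = HabituatingNoveltyFilter()
scores = [novelty_filter.novelty((0.0, 0.0)) for _ in range(3)]
new_score = novelty_filter.novelty((5.0, 5.0))
```

The repeated input yields falling scores, while the previously unseen input is again scored as fully novel.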

21/30

WP5 Intrinsic motivations based on information theory

IDSIA (Juergen Schmidhuber)

- Theoretical ML, robotics, information-theoretic intrinsic motivations
- ‘Data compression improvement’ = intrinsic motivation

Schmidhuber, 2009, Journal of SICE
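
One way to read ‘data compression improvement’ is as the number of bits saved on the observation history when the compressor is updated. The sketch below illustrates that reading with a very simple stand-in compressor, a Laplace-smoothed symbol-frequency model; the choice of model and the function names are assumptions for the example, not the IDSIA implementation.

```python
import math
from collections import Counter


def code_length(history, counts, alphabet_size):
    """Bits needed to encode the history under a Laplace-smoothed frequency model."""
    total = sum(counts.values())
    bits = 0.0
    for symbol in history:
        p = (counts[symbol] + 1) / (total + alphabet_size)
        bits -= math.log2(p)
    return bits


def compression_progress(stream, alphabet_size):
    """Intrinsic reward per step: bits saved on the history by updating the model."""
    counts = Counter()
    history = []
    rewards = []
    for symbol in stream:
        history.append(symbol)
        before = code_length(history, counts, alphabet_size)  # old model
        counts[symbol] += 1                                   # the model learns
        after = code_length(history, counts, alphabet_size)   # improved model
        rewards.append(before - after)
    return rewards


progress = compression_progress("a" * 50, alphabet_size=2)
```

On this perfectly regular stream the reward is largest at the start and fades as the regularity is absorbed by the model, which is the sense in which the motivation stops once there is nothing left to learn.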

22/30

WP6: Hierarchical architectures

[Diagram: project scheme with the WP6 label]

23/30

WP6 Hierarchical architectures

Cumulative learning needs hierarchical architectures:

- To avoid catastrophic forgetting
- To find solutions by ‘composing skills’: dirty but fast solutions, then refine
- Because brain is hierarchical
- Because brain has a (soft) modularity at all levels

From Fuster, 2001, Neuron; McGovern, Sutton, Fagg

24/30

WP6 Intrinsic motivations, hierarchical RL (options)

UMASS (Andrew Barto)

- Intrinsically Motivated Reinforcement Learning
- HRL: options theory

Simsek & Barto, 2006, ICML; Singh, Barto & Chentanez, 2004, NIPS

Sutton et al., options theory
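
In the options framework an option bundles three things: the states where it may be started, a policy followed while it runs, and a termination condition. The sketch below shows that structure on a hypothetical ten-state corridor; the corridor, the `go_to_door` option and the deterministic termination test are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Option:
    can_start: Callable[[int], bool]    # initiation set: where the option is available
    policy: Callable[[int], int]        # intra-option policy: state -> primitive action
    should_stop: Callable[[int], bool]  # termination condition (deterministic here)


def run_option(option, state, step, max_steps=100):
    """Execute an option until it terminates; return the final state and step count."""
    if not option.can_start(state):
        raise ValueError("option is not available in this state")
    steps = 0
    while not option.should_stop(state) and steps < max_steps:
        state = step(state, option.policy(state))
        steps += 1
    return state, steps


def corridor_step(state, action):
    """Hypothetical corridor with states 0..9; actions are -1 (left) or +1 (right)."""
    return max(0, min(9, state + action))


# A reusable skill: walk right until the 'door' at state 5 is reached.
go_to_door = Option(
    can_start=lambda s: s < 5,
    policy=lambda s: +1,
    should_stop=lambda s: s == 5,
)

final_state, n_steps = run_option(go_to_door, 2, corridor_step)
```

A higher-level learner can then choose among such options as if they were single actions, which is what makes composing previously acquired skills possible.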

25/30

WP6 Bio-inspired / bio-constrained hierarchical reinforcement learning

CNR-LOCEN (Gianluca Baldassarre & Marco Mirolli)

- Piaget theory: actions support learning of other actions
- Camera, dynamic arm, reaching tasks
- Continuous state/action reinforcement learning
- Hierarchical RL: segmentation, Piaget

Caligiore, Borghi, Parisi, Mirolli, Baldassarre, ongoing

26/30

WP6 Development of sensorimotor mappings in robots

AU (Mark Lee)

- Developmental psychology and robotics
- Staged development of sensorimotor behaviour
- LCAS: Lift Constraint, Act, and Saturate

Lee, Meng, Chao, 2007, Rob. & Auton. Sys.
Lee, Meng, Chao, 2007, Adaptive Behaviour

27/30

WP7: Integration

[Diagram: project scheme with the WP7 label]

28/30

WP7 CLEVER-K: Kitchen scenario

Main responsible: IDSIA, UU

Leave a robot alone for a month or so…

on the basis of intrinsic motivations…

…it will build up a repertoire of actions incrementally.

Come back and assign it a goal (e.g. by reward)…

…and it will learn to accomplish it very quickly.

…interacting with the environment:

3 iCub robots from IIT (Giorgio Metta)

29/30

WP7 CLEVER-B: Board scenario

Main responsible: AU, CNR-LOCEN

30/30

Conclusions: A timely project

- Timely research goals: intrinsic motivations, hierarchical architectures
- Within important trends: developmental robotics, computational system neuroscience, emotions/motivations
- In synergy with various events: EpiRob, ICDL, J. of Autonomous Mental Development
- In line with EU calls: “Cognitive Systems, Interactions and Robotics”
- First EU Integrated Project wholly focussed on these topics

www.im-clever.eu
