
Exploring Principles Toward a Developmental Theory of Embodied Artificial Intelligence

Dissertation

zur

Erlangung der naturwissenschaftlichen Doktorwürde

(Dr. sc. nat.)

vorgelegt der

Mathematisch-naturwissenschaftlichen Fakultät

der

Universität Zürich

von

Max Lungarella

aus

Italien

Begutachtet von

Prof. Dr. Rolf Pfeifer

Prof. Dr. Yasuo Kuniyoshi

Prof. Dr. Olaf Sporns

Zürich 2004

To my lady, Haruko, smiling from t=0.

Abstract

Embodied artificial intelligence is an increasingly popular research paradigm that studies intelligence and intelligence-like processes by placing a strong emphasis on the dynamical and reciprocal interaction, across multiple time scales, between an agent's body, its control structure, and its environment. Although a growing number of examples documents the power of this novel approach, the role of development has so far been marginalized or neglected altogether. However, is it possible to understand natural intelligence, or to create an artificial one, without taking development into account?

The work presented in this thesis is aimed at tackling this question. It is based on two core assumptions: (a) embedding the coupling of control, body, and environment in a developmental framework favors the emergence of stable behavioral patterns, and leads to adaptivity and robustness against changes of body and environment not attainable otherwise; (b) the study of the mechanisms underlying development yields the key to a deeper understanding of intelligent behavior. The methodology adopted is synthetic and two-pronged: on the one hand, robot technology is used to instantiate and investigate models originating from the developmental sciences, and ultimately to derive new hypotheses about the nature of intelligence; on the other hand, the aim is to construct better robotic systems by exploiting insights gained from studies on development.

This thesis documents the fruitful combination of embodied and developmental aspects of intelligent behavior through a series of robotic case-studies in which the synergetic interaction of control, body, and environment is explored, quantified, and purposively exploited. Moreover, it highlights the importance of exploratory activity from the perspective of dynamical systems in the case of motor skill acquisition, and from an information-theoretic and statistical point of view in the case of category learning. Various mechanisms related to exploration are examined: freezing and freeing of degrees of freedom, physical and neural entrainment, the integration of multiple time scales, value systems, and the self-structuring of information. As well as providing a wealth of experimental support for the methodology advocated by developmental robotics, this thesis also outlines a set of novel design principles for developmental systems.


Preface

Seven out of the ten chapters of this thesis are based on material that is either published or will appear soon. As far as possible, I have tried to weld the individual contributions into a single smooth structure. Chapter 1 introduces the philosophy of action of developmental robotics, and presents a set of partially novel design principles for developmental systems. These principles are then fleshed out with concrete examples in Chapters 2 to 9. Chapter 10, finally, concludes the thesis by summarizing its main contributions. Here, for what it is worth, are the prior sources for parts of the text:

Chapter 2

• Lungarella, M., Metta, G., Pfeifer, R. and Sandini, G. (2003). Developmental robotics: a survey. Connection Science (special issue on Epigenetic Robotics), L. Berthouze and T. Ziemke (eds.), vol. 15, no. 4, pp. 151-190.

Chapter 3

• Lungarella, M. and Berthouze, L. (2002). On the interplay between morphological, neural, and environmental dynamics: a robotic case-study. Adaptive Behavior (special issue on Plastic Mechanisms, Multiple Time Scales, and Lifetime Adaptation), E. Di Paolo (ed.), vol. 10, no. 3/4, pp. 223-241.

Chapter 4

• Berthouze, L. and Lungarella, M. (2004). Motor skill acquisition under environmental perturbations: on the necessity of alternate freezing and freeing of degrees of freedom. To appear in Adaptive Behavior, vol. 12, no. 1.

Chapter 5

• Lungarella, M. and Berthouze, L. (2004). Robot bouncing: on the synergy between neural and body-environment dynamics. To appear in Iida, F., Pfeifer, R., Steels, L. and Kuniyoshi, Y. (eds.), Embodied Artificial Intelligence. Berlin: Springer-Verlag.


Chapter 7

• Lungarella, M. and Pfeifer, R. (2001). Robots as cognitive tools: information-theoretic analysis of sensory-motor data. In Proc. of the 2nd IEEE-RAS Int. Conf. on Humanoid Robotics, pp. 245-252.

Chapter 8

• Te Boekhorst, R., Lungarella, M. and Pfeifer, R. (2003). Dimensionality reduction through sensory-motor coordination. In Proc. of the Joint Int. Conf. on Artificial Neural Networks and Neural Information Processing, pp. 496-503, Lecture Notes in Computer Science 2714.

Chapter 9

• Tarapore, D., Lungarella, M. and Gomez, G. (2004). Fingerprinting agent-environment interaction via information theory. In Proc. of the 8th Int. Conf. on Intelligent Autonomous Systems, pp. 512-520.

Other publications (in chronological order)

• Meyer, F., Spröwitz, A., Lungarella, M. and Berthouze, L. (2004). Simple and low-cost compliant leg-foot system. Submitted to the 17th Int. Conf. on Intelligent Robots and Systems.

• Tarapore, D., Lungarella, M. and Berthouze, L. (2004). Categorization of simple objects by embodied agents: a statistical approach. Submitted to the 17th Int. Conf. on Intelligent Robots and Systems.

• Gomez, G., Lungarella, M., Eggenberger-Hotz, P., Matsushita, K. and Pfeifer, R. (2004). Simulating development in a real robot: on the concurrent increase of sensory, motor, and neural complexity. To appear in Proc. of the 4th Int. Workshop on Epigenetic Robotics.

• Lungarella, M. and Berthouze, L. (2003). Learning to bounce: First lessons from a bouncing robot. In Proc. of the 2nd Int. Symp. on Adaptive Motion in Animals and Machines, ThP-II-4, electronic proceedings.

• Lungarella, M. and Metta, G. (2003). Beyond gazing, pointing, and reaching: A survey of developmental robotics. In Proc. of the 3rd Int. Workshop on Epigenetic Robotics, pp. 81-89.

• Hafner, V. V., Fend, M., Lungarella, M., Pfeifer, R., König, P. and Körding, K. P. (2003). Optimal coding for naturally occurring whisker deflections. In Proc. of the Joint Int. Conf. on Artificial Neural Networks and Neural Information Processing, pp. 805-812, Lecture Notes in Computer Science 2714. Berlin: Springer-Verlag.


• Lungarella, M. and Berthouze, L. (2002). Adaptivity via alternate freeing and freezing of degrees of freedom. In Proc. of the 9th Int. Conf. on Neural Information Processing, pp. 492-497.

• Lungarella, M. and Berthouze, L. (2002). Adaptivity through physical immaturity. In Proc. of the 2nd Int. Workshop on Epigenetic Robotics, pp. 79-86.

• Lungarella, M., Hafner, V. V., Pfeifer, R. and Yokoi, H. (2002). An artificial whisker sensor for robotics. In Proc. of the 15th Int. Conf. on Intelligent Robots and Systems, pp. 2931-2936.

• Lungarella, M., Hafner, V. V., Pfeifer, R. and Yokoi, H. (2002). Whisking: an unexplored sensory modality. In Proc. of the 7th Int. Conf. on the Simulation of Adaptive Behavior, pp. 58-59.

Contents

1 Introduction
1.1 Historical perspective and paradigm shift
1.2 Embodiment and its implications
1.3 The importance of development
1.4 Developmental robotics: the short version
1.5 Design principles of developmental robotics
1.5.1 The principle of cheap design
1.5.2 The principle of ecological balance
1.5.3 The value principle
1.5.4 The principle of design for emergence
1.5.5 The time scales integration principle
1.5.6 The starting simple principle
1.5.7 The principle of information self-structuring
1.5.8 The principle of exploratory activity
1.5.9 The principle of social interaction
1.5.10 Discussion
1.6 Contributions of the thesis

2 Developmental Robotics: The Long Version¹
2.1 Synopsis
2.2 Introduction
2.3 In the beginning there was the body
2.4 Facets of development
2.4.1 Development is an incremental process
2.4.2 Development as a set of constraints

¹ Appeared as Lungarella, M., Metta, G., Pfeifer, R. and Sandini, G. (2003). Developmental robotics: a survey. Connection Science, 15(4), pp. 151-190.


2.4.3 Development as a self-organizing process
2.4.4 Degrees of freedom and motor activity
2.4.5 Self-exploratory activity
2.4.6 Spontaneous activity
2.4.7 Anticipatory movements and early abilities
2.4.8 Categorization and sensory-motor coordination
2.4.9 Neuromodulation, value and neural plasticity
2.4.10 Social interaction
2.4.11 Intermediate discussion
2.5 Research landscape
2.5.1 Socially oriented interaction
2.5.2 Non-social interaction
2.5.3 Agent-related control
2.5.4 Mechanisms and processes
2.5.5 Intermediate discussion
2.6 Developmental robotics: existing theoretical frameworks
2.7 Discussion
2.8 Future prospects and conclusion

3 Freezing and Freeing Degrees of Freedom²
3.1 Synopsis
3.2 Introduction
3.3 Learning to swing
3.4 Experimental framework
3.4.1 Neural oscillators and joint synergy
3.4.2 Joint control
3.5 Experimental results and discussion
3.5.1 Protocol
3.5.2 Exploratory process
3.5.3 Experimental observations
3.6 Conclusion

4 Alternate Freezing and Freeing of Degrees of Freedom³
4.1 Synopsis

² Appeared as Lungarella, M. and Berthouze, L. (2002). On the interplay between morphological, neural, and environmental dynamics: a robotic case-study. Adaptive Behavior, 10(3-4), pp. 223-241.

³ To appear as Berthouze, L. and Lungarella, M. (2004). Motor skill acquisition under environmental perturbations: on the necessity of alternate freezing and freeing of degrees of freedom. Adaptive Behavior, 12(1).


4.2 Introduction
4.3 Pendulation study and release of the peripheral degrees of freedom
4.4 Adding nonlinear perturbations
4.5 Results and discussion
4.5.1 Protocol
4.5.2 Experimental observations
4.6 Conclusion and future directions

5 On the Synergy Between Neural and Body-Environment Dynamics⁴
5.1 Synopsis
5.2 Introduction
5.3 Hypotheses on infant bouncing learning
5.4 Experimental setup
5.4.1 Neural rhythm generator
5.4.2 Selection of the neural control parameters
5.5 Experiments and discussion
5.5.1 Scenario 1 – Free oscillations
5.5.2 Scenario 2 – Forced oscillations without ground contact
5.5.3 Scenario 3 – Forced oscillations with ground contact (ω_p = 0)
5.5.4 Scenario 4 – Forced oscillations with ground contact (ω_p > 0)
5.6 Discussion and conclusion

6 Value-based Stochastic Exploration
6.1 Synopsis
6.2 Introduction
6.3 Developmental inspiration and related work
6.4 Enter simulated annealing
6.5 Parameter exploration
6.6 Control problem
6.7 Simulation
6.8 Real world setup
6.9 Discussion

6.10 Conclusion

⁴ To appear as Lungarella, M. and Berthouze, L. (2004). Robot bouncing: on the synergy between neural and body-environment dynamics. In Iida, F., Pfeifer, R., Steels, L. and Kuniyoshi, Y. (eds.), Embodied Artificial Intelligence. Berlin: Springer-Verlag.


7 Information-theoretic Analysis of Sensory Data⁵
7.1 Synopsis
7.2 Introduction
7.3 Sensory-motor coordination
7.4 Experiments
7.5 Analysis methods
7.6 Results
7.7 Discussion
7.8 Conclusion and future work
7.9 Information theoretic appendix

8 Dimensionality Reduction through Sensory-Motor Interaction⁶
8.1 Synopsis
8.2 Introduction
8.3 Real-world instantiation and environmental setup
8.4 Statistical analysis
8.5 Experiments and Discussion
8.6 Conclusion and future work

9 Fingerprinting Agent-Environment Interaction⁷
9.1 Synopsis
9.2 Introduction
9.3 Experimental Setup
9.4 Experiments
9.5 Methods
9.6 Data Analysis and Results
9.6.1 Correlation
9.6.2 Entropy and mutual information
9.6.3 Cumulated sensor activation
9.6.4 Pre-processed image entropy

9.7 Further Discussion and Conclusion

⁵ Appeared as Lungarella, M. and Pfeifer, R. (2001). Robots as cognitive tools: information-theoretic analysis of sensory-motor data. In Proc. of the 2nd IEEE-RAS Int. Conf. on Humanoid Robotics, pp. 245-252.

⁶ Appeared as Te Boekhorst, R., Lungarella, M. and Pfeifer, R. (2003). Dimensionality reduction through sensory-motor coordination. In Proc. of the Joint Int. Conf. on Artificial Neural Networks and Neural Information Processing, Lecture Notes in Computer Science 2714, pp. 496-503.

⁷ Appeared as Tarapore, D., Lungarella, M. and Gomez, G. (2004). Fingerprinting agent-environment interaction via information theory. In Proc. of the 8th Int. Conf. on Intelligent Autonomous Systems, pp. 512-520.


10 Summary and Conclusion
10.1 Summary
10.2 Conclusion

List of Figures

1.1 Coupling between body, control structure, and environment embedded in a developmental framework. Shown is the information flow: e.g., the neural system affects the musculo-skeletal apparatus via motor signals, and conversely proprioceptive sensory information indicating the current state of the musculo-skeletal system is fed back to the neural system. Similarly, information flows back and forth between body and environment, and from the environment to the control structure. It follows that these three factors cannot be considered in isolation.

1.2 Interaction between developmental sciences, embodied artificial intelligence, and robotics.

1.3 Experimental variety: seven chapters, seven case-studies. The labels denote one or two design principle(s) the case-study is mainly intended to address. The numbers indicate the chapter in which the case-study is presented.

2.1 Examples of robots used in developmental robotics. From left to right, top to bottom: BabyBot (LiraLab), BabyBouncer (AIST), Infanoid (CRL), COG (MIT).

3.1 Humanoid robot used in our experiments.

3.2 Schematics of the experimental system and the control architecture. Proprioceptive feedback consists of the visual position of the hip marker in the frame of reference centered on the hip position when the robot is in its resting position, i.e., vertical position. Joint synergy was only activated in experiments involving coordinated 2-DOF control.


3.3 Comparison between the output of the pulse generator (thick impulse) and the output of the oscillator (solid line) for three different configurations of τ_u and τ_v, given the same proprioceptive feedback (dotted line). The control settings were as follows: τ_u = 0.02, τ_v = 0.25 (top); τ_u = 0.06, τ_v = 0.25 (middle); τ_u = 0.06, τ_v = 0.75 (bottom). Note that while the ratio τ_u/τ_v is unchanged between the top and the bottom graph, both the frequency of the output and the number of impulses per period (i.e., the shape of the output) are changed. The vertical axis denotes the amplitude of each signal. The horizontal axis denotes time steps (one time step is 33 ms).

3.4 Value-dependent exploration. The upper graph depicts the time series of the oscillatory movement of the robot's hip (top) and the associated value v in the value system (bottom). Rectangular areas point to decreases of value caused by habituation. The lower graph depicts the corresponding trajectories in parameter space. Oval areas point at dense regions of high-yield parameter settings, i.e., the large oscillations observed in the time series.

3.5 Value landscapes (left: hip parameter space; right: knee space) uncovered by a single exploratory run in an independent 2-DOF configuration (ω_s = 0). The size of a dot (a control setting visited by the exploratory process) is proportional to the value v obtained for that particular control setting. Initial conditions were similar for both joints, namely, τ_u^{h,k} ∈ [0.02, 0.04] and τ_v^{h,k} ∈ [0.2, 0.4]. The exploratory run took roughly 10 minutes.

3.6 Probability distribution functions of value landscapes obtained in three different scenarios: independent 2-DOF exploration (top), 1-DOF exploration (middle), and bootstrapped 2-DOF (bottom). The corresponding value landscapes are found in figures 3.5, 3.11 (right), and 3.13 respectively. In each graph, the value space [0.0, 0.6] was discretized into 50 bins. Simply stated, each graph indicates the probability (vertical axis) that a value v (horizontal axis) occurs during the exploratory run considered. In the three scenarios, the same initial conditions were used.

3.7 Value landscape obtained during a systematic exploration of the knee parameter space with an arbitrarily chosen hip parameter setting (τ_u^h = 0.045, τ_v^h = 0.65). The parameter space was discretized in a 15x15 sampling and the figure is a linear approximation of the resulting values v. Brighter colors denote higher-yield settings. The experiment took about 150 minutes.


3.8 Effect of a small change in the hip control parameters on the ankle-hip phase plots in the independent 2-DOF configuration: left, oscillatory behavior without a true stationary regime (τ_u^h = 0.060, τ_v^h = 0.60, τ_u^k = 0.03, τ_v^k = 0.3); right, no oscillatory behavior (τ_u^h = 0.065, τ_v^h = 0.65, τ_u^k = 0.03, τ_v^k = 0.3). In both graphs, the axes denote the horizontal coordinates of the hip and ankle markers' visual positions.

3.9 Evidence of preferred stable states and phase transitions in the independent 2-DOF configuration: successive pseudo-stationary regimes obtained with τ_u^h = 0.055, τ_v^h = 0.55, τ_u^k = 0.03, τ_v^k = 0.3. Each graph shows the corresponding ankle-hip phase plot. In all graphs, the axes denote the horizontal coordinates of the hip and ankle markers' visual positions.

3.10 Large amplitude smooth performance after a long transient: left, the ankle-hip phase plot with τ_u^h = 0.055, τ_v^h = 0.65, τ_u^k = 0.025, and τ_v^k = 0.35; right, the corresponding time-series for hip and ankle visual positions and motor commands.

3.11 Value landscape (hip space) uncovered by a single exploratory run in a 1-DOF configuration, i.e., the second DOF (knee) is frozen. The size of a dot (a control setting visited by the exploratory process) is proportional to the value v obtained for that particular control setting. Initially, τ_u^h and τ_v^h were randomly selected in the intervals [0.02, 0.04] and [0.2, 0.4] respectively. The exploratory run took roughly 10 minutes.

3.12 Value landscape obtained during a systematic exploration of the hip parameter space in a 1-DOF configuration, i.e., the second DOF (knee) was frozen. The parameter space was discretized in a 15x15 sampling and the figure is a linear approximation of the resulting values. Brighter colors denote higher-yield settings. The experiment took about 150 minutes.

3.13 Effect of the freeing of the hip DOF on the exploration of the 2-DOF configuration. Left: value landscape uncovered by a single exploratory run in a 1-DOF configuration, i.e., the second DOF (knee) was frozen. When the system reached a stable oscillatory state, here denoted by a white triangle (roughly [0.7, 0.04]), the second DOF was released. The right graph shows the value landscape uncovered by the exploratory process in the resulting 2-DOF configuration, with an initial condition represented by the white rectangle (roughly [0.3, 0.03]). In both graphs, the size of a dot (a control setting visited by the exploratory process) is proportional to the value v obtained for that particular control setting. Initially, τ_u^{h,k} and τ_v^{h,k} were randomly selected in the intervals [0.02, 0.04] and [0.2, 0.4] respectively. The overall experiment took roughly 20 minutes.


3.14 Effect of the freeing of the hip on the exploration of the 2-DOF configuration. Value landscape obtained during a systematic exploration of the knee parameter space after its release when the system was in a stable oscillatory state in a 1-DOF configuration. The hip oscillator was initialized with τ_u^h = 0.054, τ_v^h = 0.65, which corresponds to a high-yield 1-DOF configuration. The parameter space was discretized in a 15x15 sampling and the figure is a linear approximation of the resulting values. Brighter colors denote higher-yield settings. The experiment took about 150 minutes.

3.15 Large amplitude oscillations with a strong intersegmental coupling (ω_p = 1.0) in the independent 2-DOF configuration when τ_u^h = 0.055, τ_v^h = 0.65, τ_u^k = 0.025, τ_v^k = 0.35: phase plots of the hip (left) and ankle (right) motions in the stationary regime. In both graphs, the axes denote the horizontal coordinates of the hip (respectively ankle) marker's visual positions.

3.16 Toward a flexible 1-DOF system: effect of an intermediate coupling (ω_s = 0.50) between hip and knee on the value landscapes (left: hip parameter space; right: knee space) uncovered by a single exploratory run in a 2-DOF configuration. In both graphs, the size of a dot (a control setting visited by the exploratory process) is proportional to the value v obtained for that particular control setting. Initially, τ_u and τ_v were randomly selected in the intervals [0.02, 0.04] and [0.2, 0.4] respectively. The exploratory run took roughly 10 minutes.

4.1 Humanoid robot used in our experiments.

4.2 Resonant oscillations for (τ_u = 0.065, τ_v = 0.6) without perturbations (top). Resulting behavior under perturbations (bottom). In each graph, the time-series denote motor impulses (bottom), ankle position (middle), and hip position (top). In this figure, as well as all other similar figures in this chapter, the vertical axis is unlabelled, because it depicts time-series of different scales and units, i.e., visual positions in pixels, motor commands in radians. The horizontal line in the lower graph corresponds to the visual position of the location after which the rubber band is extended. The horizontal axis denotes time in milliseconds.

4.3 Schematics of the experimental system and neural control architecture. Joint synergy is only activated in experiments involving coordinated 2-DOF control.

4.4 Flow of the proposed experimental discussion with respect to both 1-DOF and 2-DOF exploration (cf. Table 4.1).


4.5 Time-series of hip position (top) and ankle-hip phase plots (bottom) for ω_p^h = 0.25 (left) and ω_p^h = 4.0 (right). The oscillator time-constants are τ_u = 0.035, τ_v = 0.65 in both cases. In the upper row of plots, the vertical axis denotes the visual positions of the ankle (left) and the hip (right). The horizontal axis denotes time in milliseconds. In the lower row of plots, both vertical and horizontal axes correspond to the visual positions of the hip (left plot) and ankle (right plot) in pixels.

4.6 From top to bottom, time-series of hip and ankle positions, hip and knee motor commands with the following parameters: τ_u^h = 0.06, τ_v^h = 0.65, τ_u^k = 0.02, τ_v^k = 0.8 and ω_p^h = 2.0. The horizontal axis denotes time in milliseconds. The system was manually perturbed after about 37.5 s.

4.7 From top to bottom, time-series of hip and ankle positions, hip and knee motor com-

mands with the following parameters: τhu = 0.06, τhv = 0.65, τku = 0.025, τkv = 0.35

and ωhp = 0.25. The horizontal axis denotes time in ms. The system was manually

perturbed at time 37s, 75s, 108s and 147s (vertical lines). . . . . . . . . . . . . . . . . 97

4.8 Co-existing regimes for ωs = 0.0 and τhu = 0.06, τhv = 0.65, τku = 0.035, τkv = 0.4 (top).

Unique in-phase oscillatory regime with ωs = 1.0 (bottom). In each graph, the time-

series denote hip and ankle positions, hip and knee motor commands (from top to

bottom). Right-hand windows are close-ups on the time-series. The horizontal axis

denotes time in msec. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

4.9 Results of the release of an additional degree of freedom after stabilization in a 1-

DOF configuration. Left: (τhu = 0.045, τhv = 0.65) and (τku = 0.025, τkv = 0.45). Right:

(τhu = 0.06, τhv = 0.65) and (τku = 0.025, τkv = 0.35). From top to bottom, the time-series

denote hip and ankle positions, hip and knee motor commands. The horizontal axis

denotes time in msec. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

4.10 Oscillatory behavior obtained during alternate freezing and freeing phases. Neural

parameters are unchanged and set to τhu = 0.06, τhv = 0.65, τku = 0.03, τkv = 0.325,

ωhp = 0.5 and ωs = 0.5. From top to bottom, time-series denote hip and ankle positions, hip

and knee motor commands. The horizontal axis denotes time in milliseconds. . . . . . 100

4.11 Effect of alternate freeing and freezing of the knee. Neural parameters are unchanged

and set to τhu = 0.035, τhv = 0.65, τku = 0.055, τkv = 0.45, ωhp = 0.5 and ωs = 0.5. From top

to bottom, time-series denote hip and ankle positions, hip and knee motor commands.

Right-hand graphs are close-ups on the two different regimes. The horizontal axis

denotes time in milliseconds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

5.1 Infant strapped in a Jolly Jumper. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108


5.2 Left: Humanoid robot used in our experiments. Right: Schematic representation of the

robotic setup. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

5.3 Left: Basic structure of the neuro-musculo-skeletal system. The arrows in the model

show the information flow. Right: Neural rhythm generator composed of six neural os-

cillators. The solid circles represent inhibitory connections, and the half-circles excitatory

connections. Abbreviations: he=hip extensor, hf=hip flexor, ke=knee extensor, kf=knee

flexor, ae=ankle extensor, af=ankle flexor. Not shown are proprioceptive feedback

connections and tonic excitations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

5.4 Forced harmonic oscillations with ground contact (bouncing) in the absence of sensory

feedback (ωp = 0). Top: τu = 0.108,τv = 0.216 and τu = 0.140,τv = 0.280, bottom:

τu = 0.114,τv = 0.228 (phase plot on the right). In all graphs, the three curves represent

the vertical displacement of the ankle, knee and hip marker in cm. . . . . . . . . . . . 114

5.5 Forced harmonic oscillations with ground contact (bouncing) in the presence of sensory

feedback (ωp > 0). Top row: ωp = 0.5,τu = 0.114,τv = 0.228, bottom row: ωp =

0.75,τu = 0.140,τv = 0.280. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

6.1 Metropolis-like exponential probability distribution. This figure exemplifies the effect

of the temperature on the probability of making a downhill move. See text for details. . 123

6.2 Pseudo-code of the exploration process. For explanations see text. . . . . . . . . . . . 124

6.3 Control scheme. The arrows in the model depict the information flow. . . . . . . . . . 126

6.4 Normalized value vs. time (min=0.0, max=1.0). Shown are the results for V = 1/(1 +

k Mp Tr). As can be seen from the graphs, in both cases, after 1000 “simulated” seconds

the value is already very high. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

6.5 Systematic exploration of the parameter space and resulting value landscape for Kd =

4.0. The white dots are the parameters explored during a value-based exploration. . . . 129

6.6 High-performance robot head used in our experiments. . . . . . . . . . . . . . . . . . 130

6.7 Qualitative comparison between dynamics of eye pan and neck pan degrees of freedom. 130

6.8 Time series of transient performance evaluation for eye pan degree of freedom. . . . . 131

6.9 Time series of right eye pan and neck pan degrees of freedom for V = k Mp (see text).

The desired position (square wave of period 6sec) and the effective (measured) posi-

tion are superposed. The length of the series is T = 273sec (corresponding to 45

exploratory iterations). First column, first row: Complete time series of right eye pan.

Second column, first row: Complete time series of neck pan. Second and third row are

close-ups of beginning and end of the stochastic exploration of eye and neck, respectively. 134


6.10 Time series of right eye pan and neck pan degrees of freedom for V = k Mp Tr (see

text). The desired position (square wave of period 6sec) and the position measured

via encoder are superposed. The length of the series is T = 439sec (corresponding to

73 exploratory iterations). First column, first row: Complete time series of right eye

pan. Second column, first row: Complete time series of neck pan. Second and third

row are close-ups of beginning and end of the stochastic exploration of eye and neck,

respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

7.1 Left: Basic manipulator geometry. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

7.2 Shannon Entropy for different sensory channels measured in bits. Left: No sensory-

motor coordination. Right: Sensory-motor coordination (foveation on red objects). . . 142

7.3 Mutual information between receptors of the same sensory modality. Random actua-

tion on the left. Sensory-motor coordination on the right. . . . . . . . . . . . . . . . . 143

7.4 Cumulated stimulation of the R, G, and B-receptors. The sensory-motor coordinated

case is on the right. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

8.1 Environmental setup. Objects of different shapes can be seen in the background. In a

typical experiment the robot started in one corner of the arena and, depending on its in-

built reflexes, it tried to avoid obstacles, circled around them, or just tracked a moving

obstacle (the small cylinder in the front). Note that the omnidirectional camera on the

robot was not used for the experiments discussed here. . . . . . . . . . . . . . . . . . 149

8.2 Use of dimension reduction techniques, exemplified by the image data. (a) How the

robot perceives an object when approaching it (experiment 1, no sensory-motor coor-

dination). Moving forward, the image of a static object shifts to the periphery of the

visual field. (b) A contour plot of the image data displayed as a time series of the pixel

intensities. Vertical axis: pixel locations. Horizontal axis: time steps. The peripheral

shift shows up as an upward curving trace. (c) A 3D plot of (b) with pixel intensity

plotted along the vertical axis. Here the trace is visible as a trough cutting through a

landscape with a ridge on the right side. (d) A reconstruction of (c) based on the first

5 PCs, which explain 95% of the variance. (e) The same as (d) but based on average

factors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

8.3 Results of experiments 1-3 (no sensory-motor coordination). Left: experiment 1. Cen-

ter: experiment 2. Right: experiment 3. From top to bottom (and for all columns)

the vertical axes are H(λ), λmax, and Npc. In all graphs the horizontal axis denotes

time. The curves are the means from up to 15 experimental runs and the bars are the

associated 95% confidence limits around those means. For details refer to text. . . . . 153


8.4 Results of experiments 4 and 5 (sensory-motor coordination). Left: experiment 4.

Right: experiment 5. From top to bottom (and for all columns) the vertical axes are

H(λ), λmax, and Npc. The horizontal axis denotes time. For details refer to text. . . . . 154

9.1 (a) Bird’s eye view of the robot and its ecological niche. The trace depicts the path of

the robot during a typical experiment. (b) Schematic representation of the simulated

agent. The sensors have a position-dependent range: if rl is the length of the robot, the

range of d0, d1, d9, and d10 is 1.8rl, that of d2 and d3 is 1.2rl, and that of d4,

d5, d6, d7, and d8 is 0.6rl. (c) Extended Braitenberg Control Architecture: As shown,

four processes govern the agent’s behavior. . . . . . . . . . . . . . . . . . . . . . . . 159

9.2 Correlation matrix obtained from the pair-wise correlation of the distance sensors for

one particular experimental run during the behavioral state: (a) “exploring”, (b) “track-

ing”, (c) “circling.” The higher the correlation, the larger the size of the square. From

left to right the average correlation is: 0.011±0.004, 0.097±0.012, and 0.083±0.041,

where ± indicates the standard deviation. . . . . . . . . . . . . . . . . . . . . . . . . 162

9.3 Correlation matrix obtained from the pair-wise correlation of the red channels for one

particular experimental run during the behavioral state: (a) “exploring”, (b) “tracking”,

(c) “circling.” The higher the correlation, the larger the size of the square. From left

to right the average correlation is: 0.053± 0.023, 0.309± 0.042, and 0.166± 0.031,

where ± indicates the standard deviation. . . . . . . . . . . . . . . . . . . . . . . . . 162

9.4 Mutual information matrix obtained by estimating the mutual information between

pairs of proximity sensors in one particular experimental run during the behavioral

state: (a) “exploring”, (b) “tracking”, (c) “circling”. The higher the mutual informa-

tion, the larger the size of the square. . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

9.5 Mutual information matrix obtained by estimating the mutual information between

pairs of red channels in one particular experimental run during the behavioral state:

(a) “exploring”, (b) “tracking”, (c) “circling”. The higher the mutual information, the

larger the size of the square. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

9.6 (a) Plot of activation levels for the proximity sensors (1 to 12) for the three behavioral

states. (b) Plot of activation levels for the image sensors (1 to 24) for the three behav-

ioral states. The plots display the average computed over 16 experimental runs. The

bars denote the standard deviation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

9.7 Entropy of the effective red color averaged over all vertical slices. P1: exploring; P2:

tracking; P3: circling. The plot displays the average computed over 16 experimental

runs. The bars denote the standard deviation. . . . . . . . . . . . . . . . . . . . . . . 166


10.1 Seven chapters, seven case-studies. The labels denote one or two design principle(s)

the case-study intends to address. The numbers indicate the chapter. The picture is the

same as in Chapter 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

List of Tables

1.1 Overview of design principles for developmental systems. . . . . . . . . . . . . . . . 17

2.1 Facets of development at a glance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.2 Explicitly invoked developmental facet(s). NA = Not Available. . . . . . . . . . . . . 40

2.3 Representative examples of developmentally inspired robotics research. AVH = Active

Vision Head, UTH = Upper-Torso Humanoid, MR = Mobile Robot, HD = Humanoid,

HGS = Humanoid grasping system, UTH+MR = Upper-Torso Humanoid on Mobile

Platform, MR+AG = Mobile Robot equipped with Arm and Gripper, RS = Robotic

System. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

4.1 Synopsis of the control parameter settings used in Figure 4.4. . . . . . . . . . . . . . . 93

6.1 EPR = Eye-pan right; Mp0: initial normalized overshoot; Mpf: final normalized over-

shoot; Tr0: initial rise-time (in sec.); Trf: final rise-time (in sec.). . . . . . . . . . . . . 129

6.2 EPR = Eye-pan right; Mp0: initial normalized overshoot; Mpf: final normalized over-

shoot; Tr0: initial rise-time (in sec.); Trf: final rise-time (in sec.). . . . . . . . . . . . . 131


Chapter 1

Introduction

The quest for artificial intelligence is the quest for human nature. Anonymous

And He breathed into his nostrils the breath of life; and man became a living soul. Genesis II,7

I propose to consider the question, “Can machines think?” This should begin with definitions of

the meaning of the terms “machine” and “think”. Turing (1950)

Despite being crude [the tortoises] conveyed the impression of having goals, independence, and

spontaneity. Walter (1950)

Any aspect of learning and any other characteristic of intelligence may – in principle – be described

so precisely as to be simulated through a machine. McCarthy (1956)

Can machines think? Can they autonomously acquire novel skills? And then, what is the role played

by development? How does intelligent behavior emerge from the interaction between a developing or-

ganism and its environment? Can an artificial being, through self-directed exploratory activity, discover

interesting and unexpected strategies to exploit the interaction of body, control, and environmental

structure? It is undeniable that these are truly difficult questions.

The core speculation of this thesis is that the recent convergence of developmental sciences, em-

bodied artificial intelligence, and robotics not only gives rise to a prolific approach to seek novel an-

swers to such old issues, but also constitutes a cornerstone of a developmental theory for designing

and constructing intelligent adaptive systems. By uniting psychologists, neuroscientists, engineers,

and computer scientists in the quest for understanding intelligence, and synthesizing intelligent behav-

ior, developmental robotics (as the methodology will be referred to in this thesis) together with other

similar approaches, also paves the way to a novel and interdisciplinary style of conducting research in

which robots are perceived as means to achieve an end (understanding principles underlying intelligent

behavior), and not merely as an end unto themselves (as is typically done in robotics). In other

words, as the case-studies presented in this thesis document, development can inspire the construc-


tion of robots, and – conversely – robots can be used as tools to model aspects of development (see

Fig. 1.2). Concerning the latter point, it is important to note that in contrast to living beings, robots

have the methodological advantage that their internal states are accessible and can be recorded for sub-

sequent analysis. Moreover, it is possible to simply build various assumptions into the system (e.g.,

lesions), and perform tests without having to worry about ethical issues.

Unlike previous approaches to the synthesis of intelligent behavior (see following section), devel-

opmental robotics – borrowing directly from one of the basic tenets of embodied artificial intelligence

– holds that a system’s control structure cannot be decoupled from the body, and from the system’s

interaction with the local environment. Yet, developmental robotics, as asserted in this thesis, goes one

step further. The main intuition resides in the realization that embedding the reciprocal and dynamic

coupling of the three aforementioned factors in a developmental framework favors the emergence

of stable behavioral patterns, and provides the system with adaptivity and robustness against changes

of body and environment. The developmental component emphasizes the importance of the in-

teraction between experience and maturation in shaping the emergence and development of cognitive

structure, motor skills, and behavior. Whereas experience typically pertains to the permanent effects

of environmental conditions, task requirements, and learning, maturation refers to physical changes

of control and body morphology. It follows that the couplings between all these factors need to be

adequately taken into account and integrated into the design process (Fig. 1.1).
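This triadic coupling can be caricatured as a discrete-time update loop. The sketch below is purely illustrative (all state variables and update rules are invented placeholders, not the models used in later chapters), but it makes the structural point that none of the three components can be updated in isolation:

```python
# Minimal sketch of the body-control-environment coupling of Fig. 1.1.
# All update rules are illustrative placeholders chosen only to show that
# neural system, body, and environment each depend on the others' state.

def control_step(neural_state, proprioception, exteroception):
    """Neural system: integrates sensory feedback, emits a motor signal."""
    neural_state = 0.9 * neural_state + 0.1 * (proprioception + exteroception)
    return neural_state, -neural_state    # motor signal: a crude stabilizing reflex

def body_step(joint_angle, motor_signal, external_force):
    """Musculo-skeletal system: driven by motor signals AND by mechanical
    feedback from the environment (here a simple spring-like term)."""
    joint_angle += 0.1 * (motor_signal - 0.5 * joint_angle + external_force)
    return joint_angle, joint_angle       # new angle, proprioceptive feedback

def environment_step(joint_angle):
    """Environment: reacts to the body's action with a force, and provides
    exteroceptive feedback (e.g., the seen position of a limb marker)."""
    return -0.2 * joint_angle, joint_angle ** 2

neural_state, joint_angle = 0.0, 1.0
external_force, exteroception, proprioception = 0.0, 0.0, joint_angle
for t in range(100):
    neural_state, motor = control_step(neural_state, proprioception, exteroception)
    joint_angle, proprioception = body_step(joint_angle, motor, external_force)
    external_force, exteroception = environment_step(joint_angle)
```

Removing or altering any one of the three update functions changes the trajectory of the other two; control structure, body, and environment form a single dynamical system.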

This thesis documents a series of developmental robotics case-studies in which the synergetic in-

teraction of control, body, and environment is explored, quantified, and purposively exploited. Based

on those case-studies, a set of novel, computational, and integrative design principles is abstracted.

It is our strong conviction that the experimental validation and quantification of the proposed design

principles may represent a founding block of a developmental theory of artificial systems, which could

have a big impact not only on developmental robotics, but also on other related fields of research.

1.1 Historical perspective and paradigm shift

As the epigraphs at the beginning of this chapter document, for a long time, people have been romanced

by the idea of constructing intelligent machines and replicating intelligent behavior displayed by hu-

mans and animals. Traditional Jewish mysticism includes tales of the Golem, a thinking automaton

made from the sticky clay of the bank of the river Moldau. In the 17th century, philosopher Gottfried

Wilhelm von Leibnitz outlined plans for a thinking machine by conceiving an artificial universal lan-

guage composed of symbols, which could stand for objects or concepts, and logical rules for their

manipulation. Alan Turing devised a much-discussed imitation game used as a yardstick or assessing

if a machine is intelligent or not that since then has been known as the Turing Test for artificial intel-

ligence (Turing, 1950). Gray Walter constructed the tortoises Elmer and Elsie that displayed tropism


[Figure 1.1 diagram: control structure (brain / neural system), body (mechanical / musculo-skeletal
system), and environment, linked by motor signals, actions, proprioceptive and exteroceptive sensory
feedback, and mechanical feedback (elasticity, compliance, ...), all embedded in development
(experience, maturation).]

Figure 1.1: Coupling between body, control structure, and environment embedded in a developmental
framework. Shown is the information flow, e.g., the neural system affects the musculo-skeletal
apparatus via motor signals, and conversely proprioceptive sensory information indicating the current
state of the musculo-skeletal system is fed back to the neural system. Similarly, information flows
back and forth between body and environment, and from the environment to the control structure. It
follows that these three factors cannot be considered in isolation.

and reactive behaviors (Walter, 1950, 1951).

While the advent of the computer in the fifties of last century did not change the dreams and ambi-

tions of people, it made artificial intelligence a reasonable possibility. Thereafter, numerous research

groups around the world have been engaged in the construction of artificial systems with the professed

goal of emulating, equaling, or even surpassing all of our mental and physical abilities. In particu-

lar, classical “Good Old Fashioned Artificial Intelligence” (GOFAI) research has attempted (in vain)

to synthesize intelligence or higher cognition, by formalizing knowledge and crystallizing cognitive

principles mostly obtained from the study of adult human beings. It was hoped that a powerful “logic

system” combined with a massive database of common knowledge could be constructed for general

problem solving (the essence of intelligence). One of the most unfortunate consequences of this attempt to

construct artificial intelligence has been the tacit acceptance of a strong and explicit separation between

cognitive language-like data structures (symbols and representations), the mechanisms that operate on

these structures (algorithms, search procedures), and the machine used to implement that program

(hardware).


This research effort has undergone a very significant paradigm shift triggered almost twenty years

ago, when some researchers realized that the shortcomings of the good old fashioned approach had

nothing to do with the relative paucity of the knowledge the systems explicitly encoded. Rather, they

thought that these shortcomings could be attributed to the lack of a fluid coupling between the system

and a real-world environment posing real-world problems of sensing and acting. Concepts, such as

situatedness (that is, the fact that embodied beings sense and act in a real physical environment) and

embodiment came to the forefront and spawned some of the most exciting and groundbreaking work

in the contemporary study of natural and artificial intelligence.

1.2 Embodiment and its implications

Embodied artificial intelligence is an increasingly popular research paradigm that studies intelligence

and intelligence-like processes by putting a strong emphasis on the dynamical and reciprocal interac-

tion across multiple time scales between brain and body of an agent, and its environment. Its method-

ology is synthetic and does not represent conventional science, but rather a fine blend of science and

engineering. That is, the aim is to understand the nature of adaptive intelligence by “building” robust

artificial systems. The adoption of such a “synthetic methodology” leads, surprisingly quickly, to a

radical rethinking of many of the old and comfortable ideas about the nature of intelligence.

Embodied artificial intelligence incorporates explicitly aspects of body morphology, motor activ-

ity, and interaction with the local environment in its theoretical framework. Embodiment has proven

to be an essential characteristic of adaptive systems, whose importance can hardly be overemphasized.

The coupling between body, brain, and environment implies that an embodied agent is continuously

exposed to a stream of sensory stimulation, to physical forces (e.g., gravity), to energy dissipation,

to wear and tear, and to damage. Long and short-term influences of the environment on the agent’s

brain and body constitute a physical implication of embodiment. It is important to understand that em-

bodiment has not only a physical implication, but an information-theoretic one as well. An embodied

agent does not passively absorb information from its surrounding environment, but due to its particular

morphological setup, and through its actions on the environment, it actively structures, selects, and

exploits such information. That is, an embodied system, by being naturally coupled to the environment

through sensory-motor interaction can shape its own sensory experience, and the quality of the sensory

data relayed to its control architecture (e.g., brain).
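The information-theoretic point can be made concrete with a toy calculation. The one-dimensional "retina" and the saccading policy below are invented for illustration and are unrelated to the experimental setups of later chapters; the direction of the effect, however, is the one quantified there with real sensor data (cf. the entropy and mutual-information measures of Chapters 7-9). An agent that actively centers a stimulus compresses the distribution of its sensor readings, which shows up as a drop in Shannon entropy:

```python
import random
from collections import Counter
from math import log2

def entropy(samples):
    """Shannon entropy (in bits) of a discrete sample sequence."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())

random.seed(0)

# Toy retina: the sensor reads the (discretized) position of a stimulus
# relative to the agent's gaze, clipped to the field of view [-8, 8].
def read_sensor(stimulus, gaze):
    return max(-8, min(8, stimulus - gaze))

stimuli = [random.randint(-8, 8) for _ in range(5000)]

# Passive agent: gaze never moves -> the sensor inherits the raw spread.
passive = [read_sensor(s, 0) for s in stimuli]

# Coordinated agent: gaze saccades onto the stimulus ("foveation"),
# up to some motor noise, so readings cluster around the fovea.
coordinated = []
for s in stimuli:
    gaze = s + random.choice([-1, 0, 1])
    coordinated.append(read_sensor(s, gaze))

print(entropy(passive), entropy(coordinated))  # coordinated stream has lower entropy
```

In this caricature the passive stream is near-uniform over the visual field (about log2 of 17 bits), while the coordinated stream is confined to three values around the fovea: the agent's own motor activity has structured its sensory input.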

1.3 The importance of development

The cognitivistic research paradigm also neglected ontogenetic development, marginalizing it and

relegating it to the role of an in principle necessary, but all in all dispensable


transient. However, is it possible to create artificial cognition without resorting to developmental mech-

anisms? Do intelligent autonomous systems really need to undergo an initial phase of development?

And, how central is the role played by developmental processes in the emergence of cognition?

These and other questions led an increasing number of researchers in AI, robotics, and autonomous

systems design to diverge from this non-developmental approach by rejecting its nativist flavor. Indeed,

making a fully equipped intelligent and complete adult robot might either involve too much work, or

be beyond our intellectual and technological capabilities. It could turn out that any adaptive artificial

creature needs to be, if not born, at least the beneficiary of a longish period of infancy. It is therefore not

surprising that development is turning into one of the core issues in the ongoing endeavor of creating

intelligent systems. The “developmental” control architectures of such systems are built starting from real

neuroscientific and developmental data. The emphasis is on getting such systems to learn and develop

by themselves, or – pushing the designer commitments even further back – on mimicking genetic mod-

ifications and evolving generations of progressively more refined artificial systems that, once evolved,

develop and learn through interaction with the environment.

It is reasonable to assume that it might be vastly easier to engineer or “artificially evolve” an ini-

tially primitive and simplistic infant robot that could then be allowed to mature and develop, more or less the

way we all do. Further, the mere observation that in contrast to artificial systems, almost all biolog-

ical systems – to different extents at least – mature and develop, bears the compelling message that

development may be one of the main reasons why the robustness, adaptivity, and versatility of biological

organisms still transcend those of artificial systems. In humans, for instance, adult skills do not

spring up fully formed at birth, but emerge over a prolonged period of time by learning, and by ex-

periencing the rough-and-tumble environment of the real world in which each individual acquires its

own history (Thelen, 1999). Further, the state of immaturity of sensory, motor, and cognitive systems,

a salient characteristic of development that at first sight appears to be an inadequacy and of which

artificial systems are deliberately devoid, might be an advantage rather than a problem. There

is evidence showing that early morphological and cognitive limitations effectively decrease the “in-

formation overload” (at a perceptual, motor, and cognitive level) that would otherwise most certainly

overwhelm infants, and may lead – according to a theoretical position pioneered by Turkewitz and

Kenny (1982) – to an increase in the adaptivity of the organism.

It follows that the study of the mechanisms underlying development might yield the key to a deeper

understanding of intelligent systems. There are a number of studies attempting to elaborate such mech-

anisms using connectionist models, such as the one described in the seminal book by Elman et al.

(1996), the study by Dominguez and Jacobs (2003), or the one by Sirois and Mareschal (2002). These

models are “disembodied” as they do not take into account any form of interaction between brain,

body, and real world. It has become increasingly clear, however, that in order to understand (percep-

tual, motor, cognitive) development and the emergence of cognition, it is not possible to entirely bypass


embodiment, that is, the continuous and mutual interaction of brain, body, and environment across mul-

tiple time scales (Eliot, 2001; Goldfield, 1995; Piaget, 1953; Thelen and Smith, 1994). Developmental

robotics strives to fill in this gap.

1.4 Developmental robotics: the short version

Developmental robotics is clearly an intellectual offshoot of embodied artificial intelligence, and as

such incorporates ideas from an equally wide range of disciplines: robotics, artificial intelligence,

developmental psychology, developmental neuroscience, cognitive science, and biology. Probably, a

good way to understand an interdisciplinary science is through its central aims, and its core assump-

tions. However, how are we to go about it in the case of developmental robotics? What are its central

aims? The following section will suggest an answer to the first question. Here, we give two possible

answers to the second one (Fig. 1.2):

• Developmental robotics aims at understanding the development of cognitive processes in natural

and artificial systems, and how such processes emerge and develop through the fluid interplay

of brain, body, and local environment (Fig.1.1). Robots are used as research tools to instantiate

or validate developmental models of cognition and action. By taking into account the embodied

nature of intelligence new hypotheses about natural phenomena are put forward, and predictions

made.

• Developmental robotics aims at conceiving a coherent set of principles to facilitate the design

and construction of intelligent systems. Such principles will eventually lead to a general theory

for developmental systems (Table 1.1).

First, it is important to note that these two goals (one analytic, and one synthetic) are coupled by

a mutually causal relationship. In fact, an understanding of cognition may be tightly linked to the

ability to engineer autonomous intelligent machines. In a sense, this is the essence of the synthetic

methodology (“understanding by building”). Further, it is important to note that developmental robotics

does not aim at mimicking or imitating nature, but only at taking inspiration from it, and at promoting

intuition. As already pointed out, development provides us with a strategy to tackle old issues in novel

ways. No organic lineage, for instance, has been able to avail itself of the possibility of passing

acquired characteristics on to its offspring – an evolutionary hypothesis known as Lamarckian evolution.

From the absence of examples of Lamarckian evolution in nature it is not possible to deduce, however,

that it cannot be employed for constructing robots and other artificial creatures. Rather, the opposite

may be the case. Engineering artificial creatures by means of a developmental approach may indeed

involve a series of iteration-production cycles conceptually similar to Lamarckian evolution in which


[Figure 1.2 diagram: developmental sciences, embodied artificial intelligence, and robotics feed
developmental robotics with inspirations, intuitions and modelling, tools, and robot technology;
developmental robotics in turn yields the synthesis of intelligent systems, new hypotheses about
natural phenomena, and design principles for intelligent systems.]

Figure 1.2: Interaction between developmental sciences, embodied artificial intelligence, and
robotics.

newborn agents are initialized with knowledge and control structure acquired by individuals of previous

generations (see Dennett, 1998, for a similar point).
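Such an iteration-production cycle can be caricatured in a few lines of code. The sketch is purely illustrative: the scalar "skill", the hill-climbing "lifetime", and the target value are hypothetical stand-ins for acquired knowledge, development, and task demands, and only unpack the conceptual possibility raised above:

```python
import random

random.seed(1)
TARGET = 10.0   # hypothetical skill level demanded by the task

def lifetime(skill, steps=20):
    """Developmental phase: stochastic hill-climbing toward the target,
    keeping whatever improvements are acquired during 'life'."""
    for _ in range(steps):
        candidate = skill + random.uniform(-1.0, 1.0)
        if abs(TARGET - candidate) < abs(TARGET - skill):
            skill = candidate
    return skill

# Lamarckian-style cycle: each newborn is initialized with the skill its
# predecessor ACQUIRED, so learning accumulates across generations.
skill, lamarckian = 0.0, []
for generation in range(5):
    skill = lifetime(skill)
    lamarckian.append(skill)

# Darwinian-style control: every generation restarts from the innate value,
# and only its own lifetime of learning counts.
darwinian = [lifetime(0.0) for _ in range(5)]

print(lamarckian)
print(darwinian)
```

Under this toy setup the inherited skill typically closes in on the target over successive generations, while each Darwinian individual has to re-cover the same ground within a single lifetime.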

1.5 Design principles of developmental robotics

Is there a theory of developmental robotics? To date there is no definitive answer to this question.

However, en route to such a theory, it is possible to point out a set of principles (or guidelines) aimed at

capturing design ideas and heuristics in a concise and pertinent way, and which could be employed

for actually designing and constructing intelligent autonomous systems. Indeed, courtesy of their

constructive nature, such design principles represent tangible examples – the essence, one might argue

– of the synthetic methodology. A further advantage of such a principled approach stems from the fact

that a set of principles is a flexible entity amenable to extensions, patches, and changes. The idea is

to carefully observe complex systems (natural and artificial) and to seek generic principles of adaptive

behavior based on the assumption that some of those principles might be at work in other systems or at

other levels as well. By devising experimental scenarios to quantify and possibly validate the proposed

principles one is forced to think about the interaction between them, and hence about the interaction

between various aspects of intelligent behavior.

In the field of embodied artificial intelligence, a coherent set of design principles for intelligent


systems has already been proposed by Pfeifer (1996), and was thoroughly discussed in (Pfeifer and

Scheier, 1999). Although all these principles are significant in one way or another for the developmentally inspired design and construction of robots, to design developmental agents a number of additional,

more specific issues need to be addressed. In this sense, the set of design principles for developmen-

tal systems subsequently brought forward does not represent a mere subset, but an extension of the

previously proposed design principles for intelligent systems.

An overview of the proposed principles is given in Table 1.1. In some respects, the table formalizes, in an extremely compact form, a significant part of the insights of the very rich literature of various

fields that is relevant for the design of intelligent developmental systems. It is important to note that

the principles have been deliberately stated in a general way, to help us keep the grand scheme in mind

and not get bogged down in details. Each of the principles can of course be spelled out more explicitly,

and this, in fact, is done in each chapter of the thesis.

1.5.1 The principle of cheap design

This principle asserts that the design of a developmental agent must be parsimonious, and must exploit

the physics of the system-environment interaction, as well as the constraints of the agent’s ecological

niche.

Parsimony (or simplicity) is a general modeling principle (also known as Occam’s razor 1) which

admonishes the designer to choose from a set of otherwise equivalent explanations or models of a

given phenomenon the one that makes the fewest assumptions. The logical implication is that there is less chance of introducing inconsistencies, ambiguities, and redundancies into the design. In this sense, design

of autonomous agents should rely more on exploiting the idiosyncrasies of the system-environment

interaction, on the proper choice of materials and morphology (spatial arrangement and properties of

sensors and effectors), as well as on emergence, but less on computation. For an in-depth discussion of

the “principle of cheap design” and many examples, see (Pfeifer and Scheier, 1999, ch.13).

Chapters 3, 4, and 5 provide good illustration of this principle. These chapters document the

properties of physical entrainment (mutual and rhythmic regulation of the intrinsic dynamics of the

body, and the environment) and of neural entrainment (body-mediated regulation of neural and environmental dynamics). Entrainment is a particular form of emergent system-environment coupling that, if adequately exploited, can simplify control and improve the stability of a system.

1.5.2 The principle of ecological balance

This principle states that the agent’s complexity (in this case: its behavioral diversity) has to match the

complexity of the environment as measured by the agent’s sensory apparatus; further, given a certain

1“One should not increase, beyond what is necessary, the number of entities required to explain anything.”


task environment, a balance is required between the complexity of the sensor, motor, and control system.

Here, the word complexity is used in its intuitive connotation, that is, as the number of components

that can be independently varied in an agent’s sensory, motor, and control system. Such components

are also referred to as degrees of freedom associated with a particular system. For example, a humanoid

robot with 40 mechanical degrees of freedom is mechanically more complex than a two-wheeled mo-

bile robot. For a set of less intuitive descriptions of complexity see (Gell-Mann, 1995), for instance.

The principle also asserts that given a particular task environment, there is a sort of natural point of

equilibrium or balance between the agent’s control structure, the material properties of the agent’s

body, and its morphology (i.e., the agent’s structural characteristics, its sensory-motor setup – accu-

racy, distribution, resolution of actuators and sensors, and so forth). Again, for a thorough discussion

and many instantiations of this principle, refer to (Pfeifer and Scheier, 1999, ch.13).

One of the main difficulties with this principle is its qualitative nature (see also Ishiguro et al.,

2003). First steps in the direction of quantifying the complexity of the agent-environment interaction

(such as perceived through the agent’s sensors) are exemplified in chapters 7, 8, and 9.

1.5.3 The value principle

This principle states that for a developmental process to take place and for an autonomous agent to

behave adaptively in the real world, a set of mechanisms for self-supervised learning, and a repertoire

of basic motivations and values must be provided that shape the development of the agent’s control and

bodily structure.

Value systems clearly satisfy this requirement. They not only modulate learning (via neural or

hormonal signals, for instance), but they also mediate neural plasticity in a self-supervised and self-

organized manner. Their output informs the agent whether an action was good or bad, and depending

on the result, the probability of that action being repeated in the future is either increased or decreased.

Thus, value systems play a pivotal role in adaptive behavior. For more details on value systems and

their relevance for natural and artificial systems, please refer to Chapter 2, and to (Pfeifer and Scheier,

1999, ch.14).
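The core mechanism just described — the probability of repeating an action rises or falls with the value signal it produced — can be sketched in a few lines. The following is a minimal illustration only; the action names, the softmax selection rule, and the learning rate are invented for the example and are not taken from the value systems discussed in Chapter 2:

```python
import math
import random

class ValueModulatedSelector:
    """Stochastic action selection in which a scalar value signal
    (positive = 'good', negative = 'bad') raises or lowers the
    preference, and hence the future selection probability, of the
    action just taken."""

    def __init__(self, actions, learning_rate=0.2):
        self.prefs = {a: 0.0 for a in actions}
        self.lr = learning_rate

    def probabilities(self):
        # softmax over the current preferences
        exps = {a: math.exp(p) for a, p in self.prefs.items()}
        z = sum(exps.values())
        return {a: e / z for a, e in exps.items()}

    def choose(self, rng=random):
        # sample an action according to its current probability
        r, acc = rng.random(), 0.0
        for a, p in self.probabilities().items():
            acc += p
            if r < acc:
                return a
        return a  # guard against floating-point round-off

    def feedback(self, action, value):
        # value > 0 makes `action` more probable in the future,
        # value < 0 makes it less probable
        self.prefs[action] += self.lr * value

sel = ValueModulatedSelector(["reach", "withdraw"])
p0 = sel.probabilities()["reach"]        # 0.5 before any feedback
for _ in range(20):
    sel.feedback("reach", +1.0)          # value system: 'good'
p1 = sel.probabilities()["reach"]        # now close to 1
```

After repeated positive value signals, the probability of "reach" grows from 0.5 toward 1; negative signals would shrink it symmetrically.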

This principle is also about motivation, that is, why behavior happens in the first place. Indeed,

motivation can be thought of as the major driving force of behavior. It seems that to date no generally accepted answer to this question has been put forward. Research on motivation and emotion is

highly relevant in this context, because emotions – like values – play a primary causal role in per-

ception and action, and in shaping experience (Breazeal, 2002; Manzotti, 2000; Pfeifer, 2000; Picard,

1997). In human infants, for instance, emotions have been hypothesized to protect the integrity of the

body, to guide perception, activity, and learning, and to regulate social interaction with other agents or

people (Trevarthen, 1993).


Further examples of this principle are given in chapters 3, 4, and 6. In these chapters the exploration

of the parameter space associated with the neural system is driven by a value system. This principle is

strongly tied to the “principle of exploratory activity.”

1.5.4 The principle of design for emergence

This principle asserts that the agent should not be completely designed, but rather should be endowed

with the ability to self-direct the exploration of its own sensory-motor capabilities, and with means to

escape its limited in-built behavioral repertoire, and to acquire its own history.

One of the basic tenets of developmental robotics is that the designer should not try to “code intel-

ligence” directly into the artificial system – in general an extremely hard problem. Instead, the system

should be equipped with an appropriate set of basic mechanisms or features to autonomously develop,

learn, and behave in a way that appears intelligent to an external observer. Agent-related features (pa-

rameters) in this case can be anatomical (e.g., body, materials, characteristics of the sensors) as well as

related to control (e.g., neural). Clearly, it is not trivial to decide which features and mechanisms have

to be innately fixed at the outset, and which ones should be learned or trained up by the interaction of

the system with its local environment. This principle asserts that by relying more on emergence, the

choice of the ensemble of basic skills and mechanisms is not as important as generally thought.

On the contrary, it is more important “not” to completely specify at the outset every aspect of the

agent’s design, but rather to endow the agent (a) with a minimal set of mechanisms to self-direct the

exploration of its own sensory-motor capabilities, and (b) with means to escape its limited built-in

behavioral repertoire, and to acquire its own personal history. In other words, the designer should

design for emergence. This means (by definition) that it will not be possible to predict the system’s

behavior through analysis at any level simpler than the system as a whole. One of the main advantages

of systems designed for emergence – in contrast to systems in which emergence is not possible – is

that they tend to be more adaptive and robust against environmental perturbations and changes (such

as growth, or task modifications). It is important to note that here “emergence of behavior” has a

pragmatic connotation, that is, in the sense of not being pre-programmed. Thus, the final (emergent)

structure is the result of the history of the interaction of the agent with the – simulated or real world –

environment.

The emergence of structured patterns or global order from local interactions between the components of a system, without the need for explicit instructions, is a characteristic feature of self-organization. The process of self-organization can lead either to permanent changes in the system

(self-organization with structural changes), or induce reversible formation of patterns (self-organization

without structural changes). The latter form of self-organization is frequently found in collective phe-

nomena (Pfeifer and Scheier, 1999). Finally, we note that emergence is always the result of a system-


environment interaction, and therefore a matter of degree. This means that behaviors are typically

neither completely emergent nor completely preprogrammed. The further removed from the actual behavior the designer’s commitments are made, the more the resulting behavior can be called “emergent.”

This principle is related to the “principle of self-organization” discussed in (Pfeifer and Scheier, 1999,

ch.14).

As in the case of the “principle of cheap design”, chapters 3, 4, and 5 provide good illustrations of

the principle discussed in this subsection. As exemplified by those chapters, entrainment can, in certain instances, cause coordinated movements to emerge from the interaction of control structure, body structure, and the surrounding local environment (e.g., Kelso, 1995; Taga, 1991). Chapter 3 also gives concrete evidence for abrupt phase transitions from one stable pattern to another. It suffices to note here

that such phase transitions are a typical property of emergent design and are often observed in natural

systems.

1.5.5 The time scales integration principle

This principle states that when designing a developmental agent, a number of different time scales

exist that have to be taken into account; developmental and learning mechanisms must be conceived

to achieve a smooth integration of those time scales.

Neural dynamics; body dynamics; learning through trial-and-error, reinforcement, and observa-

tion; development of brain and body; evolutionary adaptations, and other dynamic processes and com-

ponents contributing to behavior are all characterized by different time scales. Neural dynamics, for

instance, is based on neural activity and transient (short-term) synaptic changes necessary for perceiv-

ing and acting, and on permanent (long-term) changes resulting from learning. Behavior, however,

seems to be flexibly self-organized at all these time scales (e.g., Goldfield, 1995; Kelso, 1995; Thelen

and Smith, 1994). Clearly, in a sufficiently complex system there is no characteristic “global” time

scale, but all processes and sub-processes proceed on a wide range of time scales, and all time scales

are integrated and continuously meshed together. It is therefore important not to gloss over the link-

ages between the various time scales when designing or constructing a behaving system. For example,

we can define the oscillation frequency of a neural rhythm generator, but as soon as the generator is

coupled to the mechanical system through afferent sensory feedback, the oscillation frequency changes

– refer to (Grillner, 1985), or to Chapter 3 for concrete examples of this phenomenon. And conversely,

we can fix the mass of a limb and theoretically predict its pendular oscillation frequency; however, as soon as we couple it to a neural system, we change its natural (intrinsic) oscillation frequency. From

this example, it is also possible to imply that some time scales are under conscious decision control

whereas others are not.
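The frequency shift described above can be illustrated with two abstract phase oscillators, one standing in for the neural rhythm generator and one for the limb. This is a deliberately minimal sketch (a symmetric Kuramoto-style coupling, not the models of Chapter 3): uncoupled, each oscillator keeps its intrinsic frequency; coupled, both are drawn toward a common, entrained frequency.

```python
import math

def simulate(w_neural, w_body, coupling, steps=20000, dt=0.001):
    """Integrate two phase oscillators with intrinsic angular
    frequencies w_neural and w_body, symmetrically coupled through
    a sine term (a stand-in for afferent sensory feedback).
    Returns the observed mean frequency of each oscillator."""
    th_n = th_b = 0.0
    for _ in range(steps):
        dn = w_neural + coupling * math.sin(th_b - th_n)
        db = w_body + coupling * math.sin(th_n - th_b)
        th_n += dn * dt
        th_b += db * dt
    duration = steps * dt
    return th_n / duration, th_b / duration

# uncoupled: each oscillator keeps its intrinsic frequency
f_n0, f_b0 = simulate(2.0, 1.0, coupling=0.0)
# coupled: both frequencies shift toward a common entrained value
f_n1, f_b1 = simulate(2.0, 1.0, coupling=2.0)
```

With sufficiently strong coupling the two oscillators phase-lock, so neither the "neural" nor the "body" frequency can be set in isolation — precisely the interdependence of time scales the principle points at.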

Sometimes, time scales have to be chosen carefully. Take motor skill acquisition, for instance. It


has been demonstrated that initially, while learning a new skill or movement, the peripheral degrees of

freedom (the ones further from the trunk, such as wrist and ankle) are reduced to a minimum through

tight joint coupling (freezing of degrees of freedom). Subsequently, the restrictions at the periphery are

gradually weakened so that more complex movement patterns can be explored (freeing of degrees of

freedom). The designer of a developmental system is forced to address issues such as at what point in

time the system should freeze, and when it should unfreeze in order to simplify motor skill acquisition.

Morphological changes (here: freezing and unfreezing of degrees of freedom) are a form of plastic

mechanism, and as for any mechanism of plasticity they have their own characteristic time scale. But

then, how should the choice of various time constants be made?

This principle is addressed in chapters 3, 4, and 5, which discuss the interaction between various

types of dynamics having different time scales.

1.5.6 The starting simple principle

This principle asserts that a gradual and well-balanced increase of both the agent’s internal complexity

(perceptual and motor) and its external complexity (regulated by the task environment or an instructor)

speeds up the learning of tasks and the acquisition of new skills, compared to an agent that is complex

from the outset. Also, the mechanisms by which the agent’s complexity can be successively increased

and integrated with its neural and morphological dynamics need to be specified.

The rationale of this principle is that the co-development of an agent’s sensory, motor, and control

structures, and of its environmental setup, while not necessarily leading to an optimal task performance,

guarantees a more efficient exploration of the agent’s sensory and motor space. For example, the

initial immaturity and wide spacing of photoreceptors in the infant retina, as well as limitations on the

accommodative system, significantly limit what the infant sees. The specific effect is to filter out high

spatial frequency information, and to make objects that are close to the infant most salient. It is not

unreasonable to assume, however, that such limitations may facilitate, for instance, the learning about

size constancy (see Turkewitz and Kenny, 1982). More generally, early morphological constraints and

cognitive limitations can lead to an increase of the adaptivity of a developing system. That is, the

immaturity of sensory, motor, and cognitive systems, which at first sight appears to be an inadequacy, is in fact an advantage, because it effectively decreases or eliminates the “information overload”

that otherwise would most certainly overwhelm the infant. Following similar lines of argumentation,

several other researchers have suggested that the processing limitations of young learners, originating

from the immaturity of the neural system, can actually be beneficial for the acquisition of new skills,

and the learning of tasks (Bjorklund and Green, 1992; Dominguez and Jacobs, 2003; Newport, 1990;

Elman, 1993; Westermann, 2000).

The primary difficulty with this principle is the lack of flexibility it entails. This deficiency gives


rise to the constraint-flexibility dilemma: The more constrained a system is, the less flexible it be-

comes. Indeed, on the one side, the introduction of constraints reduces the number of parameters of

a learning problem, or the space of possible limb configurations, thus speeding up learning; on the

other side, such constraints may preclude the system from exploring potentially interesting parameter sets,

or movement patterns.

This design principle is related to the “principle of ecological balance”, and to the “principle of

cheap design.” It pertains to the principle of ecological balance in the sense that if the system starts

simple (i.e., its internal and external complexity are kept low), it is probably easier to make sure that

the system is ecologically balanced at all instants in time during its developmental history. It relates to

the principle of cheap design because it strives for simplicity and parsimony. However, it is important

to note that this principle does not propose a minimalist approach. Starting simple by no means implies that the system needs to be trivial. The system needs to be as simple as possible, but not simpler.

Chapters 3 and 4 explicitly address the “starting simple principle.” More specifically, the two chap-

ters document a comparative analysis between the outright use of four out of four degrees of freedom,

and the progressive involvement of all degrees of freedom by using a developmental mechanism of

freezing and freeing (two degrees of freedom are initially blocked, and subsequently released), such

as hypothesized by Bernstein (1967). The results show that in the case of reduced task complexity, such a mechanism might indeed simplify the emergence of stable movement patterns.
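The freezing-and-freeing mechanism can be sketched as a two-phase search over joint parameters. The toy objective and hill-climbing rule below are invented for illustration and are far simpler than the experiments of chapters 3 and 4; the point is only the staged release of degrees of freedom:

```python
import random

def freeze_free_search(objective, n_dof=4, frozen=(2, 3),
                       phase1_steps=200, phase2_steps=200,
                       step_size=0.1, rng=random):
    """Two-phase stochastic hill-climbing over joint parameters:
    phase 1 keeps the DOFs listed in `frozen` clamped at 0
    ('freezing'); phase 2 releases all DOFs ('freeing'). Only
    improving moves are accepted, so the error never increases."""
    params = [0.0] * n_dof
    best = objective(params)

    def climb(steps, movable):
        nonlocal best
        for _ in range(steps):
            i = rng.choice(movable)
            trial = list(params)
            trial[i] += rng.uniform(-step_size, step_size)
            err = objective(trial)
            if err < best:
                params[i], best = trial[i], err

    climb(phase1_steps, [i for i in range(n_dof) if i not in frozen])
    err_frozen = best            # best error reachable with 2 DOFs
    climb(phase2_steps, list(range(n_dof)))
    return err_frozen, best      # releasing DOFs can only refine

# toy task: match a target pattern of four joint amplitudes
target = [0.6, -0.3, 0.2, 0.5]
obj = lambda p: sum((a - b) ** 2 for a, b in zip(p, target))
e_frozen, e_freed = freeze_free_search(obj)
```

In phase 1 the error cannot drop below the contribution of the clamped joints; releasing them in phase 2 lets the search refine the remaining mismatch.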

1.5.7 The principle of information self-structuring

This principle asserts that an embodied agent is not passively exposed to information from its surround-

ing environment, but due to its particular morphology, and through its actions on the environment, it

actively structures, selects, and exploits such information; it is crucial to take this characteristic into

consideration at design time. By information self-structuring we mean that the agent has an active role

in shaping its own sensory input through self-produced movements.

The first important, information-theoretic, implication of this principle is that embodiment (i.e.,

sensory morphology, embodied interaction, and so on) directly affects the information-processing ca-

pacities of the agent’s control system, in the sense that it allows the agent to generate constraints in

its sensory input. Indeed, appropriate morphological constraints and self-produced coordinated move-

ments can induce spatio-temporal patterns in the raw (unprocessed) sensory input, and generate “good”

data that are easier to process, simplifying the control problem faced by the agent. The choice of a par-

ticular sensory morphology, for instance, has been shown to improve the learning performance and the

adaptability of a neural network controller embedded in a robot (Lichtensteiger and Pfeifer, 2002). Fur-

ther, it has been hypothesized that specific ways of interacting with the environment induce constraints

in the agent’s sensory input that can be exploited for learning (Scheier et al., 1998; Lungarella and


Pfeifer, 2001). In humans, experiments by Harman et al. (1999) have shown that the active manipula-

tion of objects by adult subjects can promote perceptual learning and object recognition. Mapped onto a

neural context this means that it is easier for neural circuits to exploit sensory data having informational

regularities, and to stabilize neural connections that incorporate recurrent statistical features (Sporns

and Pegors, 2004). It is therefore plausible to assume that to simplify neural computations, natural

systems are optimized, at evolutionary, developmental and behavioral time scales, to structure their

sensory input through self-produced coordinated motor activity. This characteristic of neural systems

has to be taken into account when designing artificial systems.

There is a second, equally important implication of this principle: The embodiment of an agent

generates, over time, correlations and redundancies across multiple sensory modalities, which may not

only lead to a disambiguation of the sensory input and to a reduction of the effective dimensionality of

the sensory space, but could also be exploited to bootstrap concept formation, categorization, and other

high-level cognitive processes (Thelen and Smith, 1994). From a design point of view it is crucial to

note that along with the interaction, the location of the agent’s sensors also imposes constraints on the

sensory input, and that sensors should be positioned so as to provide redundant information about the

world.

This principle is addressed in chapters 7, 8, and 9, and in (Lungarella and Sporns, 2004), where it

is fleshed out with concrete examples. Methods for quantifying the informational structure of sensory

and motor data are also presented.
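As a hint of what such quantification can look like, the sketch below estimates Shannon entropy and mutual information from discretized sensor channels. It is a generic illustration (the toy "sensor" sequences are invented), not the specific measures developed in chapters 7 to 9: coordinated channels share far more information than statistically independent ones with the same marginal statistics.

```python
import math
import random
from collections import Counter

def entropy(seq):
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(seq).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits, for paired
    discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

# 'coordinated' channels: one sensor lags the other by a step
coord_a = [i % 4 for i in range(400)]
coord_b = [(i + 1) % 4 for i in range(400)]
# 'uncoordinated' channels: same marginals, no shared structure
rng = random.Random(0)
rand_a = [rng.randrange(4) for _ in range(400)]
rand_b = [rng.randrange(4) for _ in range(400)]

mi_coord = mutual_information(coord_a, coord_b)  # 2.0 bits
mi_rand = mutual_information(rand_a, rand_b)     # close to 0
```

The coordinated pair attains the full 2 bits (each channel has four equiprobable states and one channel determines the other), whereas the independent pair shares only a small finite-sample residue.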

1.5.8 The principle of exploratory activity

This principle states that exploratory activity is a fundamental active process by which an embodied

agent collects information for learning about its own body, and for mastering the interaction with its

surrounding environment. It is thus necessary to equip the agent with a set of mechanisms to perform exploratory actions, and to evaluate their results.

In the context of this principle, we make the distinction between two types of exploratory activities,

or exploration strategies: (a) external exploration, that is, a set of active processes by which an agent

gathers information for learning about the surrounding world; and (b) internal exploration (or self-

exploration), that is, a set of active processes by which an agent gathers information for learning about

its own body dynamics (see also Chapter 2). The first type of exploratory activity is oriented toward

the maximization of task-relevant sensory information. For example, hand-eye coordination begins

to develop between two and four months, inaugurating a period of trial-and-error practice at sighting

objects and grabbing them. At four months most infants can grasp an object that is within reach,

looking only at the object and not at their hands. Such a result is truly impressive, and – as asserted by the principle of exploratory activity – the result of unceasing exploratory actions performed by the


infant. The second type of exploration strategy is oriented toward maximizing body-relevant information

and movement variety. For example, by exploring force and timing combinations, and by integrating

various kinds of environmental information, a multitude of patterns of inter-segmental coordination can

be learned. Newborn infants, for instance, have been observed to spend up to 20% of their waking hours

contacting their face with their hands (Korner and Kraemer, 1972). In analogy to vocal babbling, this

experiential process has been called circular reaction (Piaget, 1953), “motor babbling” (Bullock et al.,

1993), or “body babbling” (Meltzoff and Moore, 1997). Essentially, it represents an exploratory process

of perception-action coupling during which body-related (proprioceptive) information generated by

perceiving and acting on objects and environment becomes correlated. The result is a unique mapping

between limb movements and limb configurations. Other examples of exploratory processes are the

rhythmical stereotypies displayed by infants (kicking, waving, banging, bouncing) and spontaneous

movement activity (Angulo-Kinzler, 2001; Thelen and Smith, 1994).

It is important to note that such exploratory processes include varying levels of attention (see also

Adolph et al., 2000). Thus, an agent should be endowed with adequate mechanisms that, depending on the state of its attentional system, would allow it to select different kinds of exploratory behaviors.

On one extreme of inattentiveness are spontaneous wiggles, thrashes, and movement stereotypies which

produce sensory information related to body and posture, that is, position of limbs and body relative to

gravity and the supporting surface. In a sense, almost all sensory-motor coordinated movements belong

to this category. On the other extreme of focused attention are concerted, directed movements whose

purpose is to generate and garner additional information about possibilities for action, or properties of

objects. Situated somewhere in the middle of this continuum are casual exploratory scans (e.g., visual

exploration while walking) and information gathering movements which are the byproduct of another

ongoing action (the walking movements themselves generate visual flow, vestibular, and kinesthetic

information about the status of the body relative to the environment, for instance).

From a design perspective it is useful to identify the spectrum of possible types of exploratory

activity. That is, given a certain task environment, what is the best way for the agent to explore its

parameter space? Plausibly, pure random exploration is inefficient, and in the long term exhaustive.

Conversely, pure systematic exploration requires too much time, and the system may end up exploring

uninteresting areas of its parameter space. The “value principle” comes to the rescue. Random exploration

coupled to a value system evaluating the outcome of a particular action may indeed provide a satisfying

path.
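Such a combination of random exploration with value-based evaluation can be sketched as follows. The value function, the parameter ranges, and the acceptance rule are hypothetical, chosen only to illustrate the scheme (they are not the value-based exploration of Chapter 6):

```python
import random

def value_guided_exploration(value, dim=3, steps=500,
                             explore_prob=0.3, sigma=0.2,
                             rng=random):
    """Stochastic exploration steered by a value system: with
    probability `explore_prob` a candidate parameter vector is
    drawn at random (exploration), otherwise the best vector so
    far is perturbed (exploitation). The value system scores each
    candidate, and only improvements are kept."""
    best = [rng.uniform(-1, 1) for _ in range(dim)]
    best_v = value(best)
    for _ in range(steps):
        if rng.random() < explore_prob:
            # exploration: jump anywhere in parameter space
            cand = [rng.uniform(-1, 1) for _ in range(dim)]
        else:
            # exploitation: refine the current best
            cand = [x + rng.gauss(0, sigma) for x in best]
        v = value(cand)
        if v > best_v:               # value system: 'good' outcome
            best, best_v = cand, v
    return best, best_v

# toy value system: prefers parameters near a 'sweet spot'
sweet = [0.5, -0.2, 0.1]
val = lambda p: -sum((a - b) ** 2 for a, b in zip(p, sweet))
params, score = value_guided_exploration(val)
```

The random jumps keep the search from being purely systematic, while the value-gated acceptance keeps it from being purely random — a small embodiment of the trade-off discussed here.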

The primary difficulty with this principle is the exploration-exploitation dilemma which every

learning system has to cope with. That is, the dilemma between exploring the space of possible pa-

rameters (e.g., weights of an artificial neural network, time constants of a neural oscillator, and so on),

while simultaneously exploiting the good parameter configurations that exploration has already uncov-

ered. This dilemma is also known as the stability-plasticity dilemma or diversity-compliance trade-off.


It refers to a trade-off between a conservative aspect that exploits (complies with) the givens (compliance

with rules), and one that is responsible for generating the diversity required to remain adaptive. In other

words, there is always a trade-off between generating new solutions, being flexible and innovative, and

complying with the existing rules, exploiting what is already known.

The thesis highlights the importance of exploratory activity from the perspective of dynamical

systems (chapters 3, 4, and 5), and from an information-theoretic and statistical point of view (chap-

ters 7, 8, and 9). An in-depth view of this principle is provided in Chapter 2 and in Chapter 6, where

we first further motivate its necessity for intelligent adaptive behavior, and then present a value-based

stochastic exploration scheme.

1.5.9 The principle of social interaction

This principle states that when designing a developmental agent, it is important to think about potential

social interactions of the agent, and about mechanisms that can be exploited to implement socially

mediated learning; these mechanisms have to be taken into account at design time.

It is well known that human infants are endowed from a very early age with the necessary means

to engage in simple, but nevertheless crucial social interactions, e.g., they show preferences for hu-

man smell, human faces and speech (Johnson, 1997; Nadel and Butterworth, 1999), and they imitate

protruding tongues, smiles, and other facial expressions (Meltzoff and Moore, 1977). Indeed, social

interaction bears many potential advantages: (a) It helps structure the agent’s environment, simplifying

and speeding up the learning of tasks and the acquisition of new skills; (b) it shapes its developmental

trajectory and epigenetic landscape, and increases the agent’s behavioral diversity. Scaffolding by a

more capable agent or caregiver, or imitation of a peer, for instance, can reduce distractions and bias

explorative behaviors toward important environmental stimuli. The caregiver can also increase or de-

crease the complexity of the task. A particularly important type of interaction is scaffolding. Typically,

it is employed to shape and guide the development of infants. Scaffolding is a supportive framework,

usually provided by a more capable agent (e.g., an adult), that enables a less capable (infant) agent to

perform activities of which it may not be capable on its own until somewhat later. For example, infants

demonstrate the ability to walk if they are supported in the right way long before their leg muscles have

developed sufficient strength to hold them up (Thelen, 1981). The scaffolding continuously pushes the

infant agent a little beyond its current capabilities, and pushes it in the direction in which its “caregiver”

wishes it to go.

In essence, this principle copes with the question of how to prepare (engineer) an agent’s local environment so that the agent can acquire new and progressively more complex skills over time. Although

we acknowledge the crucial importance of social interaction for the emergence and development of

cognitive structure in man and machines (see Chapter 2 for detailed discussion of the issue), in the


context of this thesis, we deliberately chose to avoid touching upon the socio-historical aspects of development.

1.5.10 Discussion

Although it is undeniably true that the proposed set of principles does capture some of the essential

aspects of adaptive, developmental systems, it is also most likely the case that this set is neither closed

nor complete. Further, it is important to emphasize that despite being bolstered by empirical evidence,

these principles are still to be regarded as working hypotheses on the nature of developmental systems.

In fact, it remains to be seen whether they can stand up against further (possibly harder) empirical

testing. While being subjected to such tests, the proposed principles of design should also enable the

generation of testable hypotheses (see Fig. 1.2). In other words, they should not only be effective as

design heuristics, but they should also help us pose interesting questions. Moreover, because one of

the purposes of these principles is to appropriately characterize intelligent developmental systems, they

should allow us to make predictions, and suggest new experiments to perform with natural systems

(e.g., animals and humans), as well as with artificial ones.

It is essential that the design principles we have formulated not be looked at in isolation. Their

real power comes from their interdependencies. All the principles, in fact, are connected because they

all pertain to developmental agents embedded in their task environments. Some principles are more

closely related than others, however. In what follows a few examples of dependencies are given. Let

us start with the value principle which states, among other things, that for a developmental process

to take place, and for an agent to behave adaptively, a repertoire of basic values and motivations of

sorts must be provided by the designer. The idea of providing value is to increase, at a later point

in time, the probability that the organism will behave in a certain way if a similar situation occurs.

Whereas motivation drives behavior, values serve as implicit or explicit evaluators of behavior. Such

values clearly represent a necessary condition for social interaction, and can be exploited for socially

mediated learning. An agent, for instance, could be endowed with the means to discern a smile from

a non-smile, and thus be able to visually acquire positive or negative feedback from a caretaker or a

tutor. Basic motivations and values also constitute the engine of an agent’s exploratory activity (as long

as such activity is not completely random or systematic), and of its behavioral diversity. Self-directed

activity, albeit not necessarily oriented toward a functional goal, is also likely to induce spatial and

temporal structure across different sensory modalities. By adequately exploiting the self-structuring of

information and recurrent patterns in the system-environment interaction, it might be possible to start

simpler, and to reduce the complexity of the control structure. Besides the starting simple principle,

this is also clearly related to the principles of cheap design, and ecological balance.
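The role of value as an evaluator of behavior can be illustrated with a minimal, purely hypothetical sketch: a scalar value signal (standing in, say, for the detected smile of a caretaker) gates a Hebbian update of the sensor-to-motor weights, so that sensory-motor correlations that occur in valuable situations become more likely to recur. The network, the toy value function, and all parameter settings below are illustrative assumptions, not part of any system described in this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sensors, n_motors = 4, 2
W = rng.normal(0.0, 0.1, (n_motors, n_sensors))  # sensor-to-motor weights

def value_gated_hebbian(W, s, m, value, lr=0.05):
    """Hypothetical value-gated Hebbian rule: the correlation between
    sensor state s and motor state m is imprinted on W in proportion
    to the (signed) value signal."""
    return W + lr * value * np.outer(m, s)

for step in range(1000):
    s = rng.uniform(-1, 1, n_sensors)   # sensory input
    m = np.tanh(W @ s)                  # motor output
    # toy value system: situations in which sensor 0 and motor 0 agree
    # in sign are "good" (e.g., they coincide with a caretaker's smile)
    value = 1.0 if s[0] * m[0] > 0 else -0.2
    W = value_gated_hebbian(W, s, m, value)

# after learning, motor 0 tends to agree in sign with sensor 0
s_test = np.array([0.8, 0.0, 0.0, 0.0])
m_test = np.tanh(W @ s_test)
print(m_test[0] > 0)  # True: the gated rule has made the valuable behavior more probable
```

Running the loop biases motor 0 toward agreement with sensor 0; behavior that was active around "valuable" situations has become more probable at a later point in time, which mirrors the function the value principle assigns to value systems.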

As can be inferred from this short discussion, every principle is affected simultaneously by other ones. Such couplings make their investigation challenging. The aim of this thesis is to substantiate the proposed principles and their mutual interdependencies, and to flesh them out by means of a series of experimental case-studies (see Fig. 10.1).

Principle: Synopsis

Cheap design: The design of a developmental agent must be parsimonious, and must exploit the physics of the system-environment interaction, as well as the constraints of the agent’s ecological niche.

Ecological balance: The agent’s complexity must match the complexity of the environment as measured by the agent’s sensors; further, a balance is required between the complexity of the motor, sensory, and control systems.

Value: For a developmental process to take place and for an agent to behave adaptively in the real world, a set of mechanisms for self-supervised learning, and a repertoire of innate values and motivations must be provided that direct the development of the agent’s control and bodily structure.

Design for emergence: The agent should not be completely designed, but rather should be endowed with the ability to self-direct the exploration of its own sensory-motor capabilities, and with means to escape its limited built-in behavioral repertoire, and to acquire its own history.

Time scale integration: When designing a developmental agent, a number of different time scales exist that have to be taken into account; developmental and learning mechanisms must be conceived to achieve a smooth integration of those time scales.

Starting simple: A gradual and well-balanced increase of the agent’s internal complexity and of its external complexity speeds up learning of tasks and acquisition of new skills; the mechanisms by which the agent’s internal and external complexity can be successively increased and integrated with its neural and morphological dynamics need to be specified.

Information self-structuring: An embodied agent does not passively absorb information from its surrounding environment; due to its particular morphology, and through its actions on the environment, it is able to actively structure, select, and exploit such information. This characteristic has to be taken into account at design time.

Exploratory activity: Exploratory activity is a fundamental process by which an agent collects information for learning about its own body and control structure, and for mastering the interaction with its surrounding environment. Thus, it is necessary to equip the agent with a set of mechanisms to perform exploratory actions, and for evaluating and exploiting the results at its disposal.

Social interaction: When designing a developmental agent, it is important to think about potential social interactions of the agent, and about mechanisms that can be exploited to implement socially mediated learning; these mechanisms have to be taken into account at design time.

Table 1.1: Overview of design principles for developmental systems.

Figure 1.3: Experimental variety: seven chapters, seven case-studies. The labels denote one or two design principle(s) the case-study is mainly intended to address. The numbers indicate the chapter in which the case-study is presented.

1.6 Contributions of the thesis

Adopting a developmental approach leads to a novel perspective on the design and construction of

robots, and to a promising methodology for modeling many of the processes and mechanisms under-

lying development. This thesis advances the nascent field of developmental robotics in several ways:

• it proposes a set of design principles for developmental systems which may constitute an ad-

equate “take-off platform” for a developmental theory of embodied artificial intelligence (this


chapter);

• it gives a state-of-the-art survey of the emergent field of developmental robotics (Fig. 1.2), and

points out ten aspects (“facets”) of biological development that should be addressed to advance

the field (Chapter 2);

• it shows how robots can successfully be employed to confirm hypotheses or observations made

in developmental or movement science (chapters 3, 4, and 5);

• it shows how initially freezing and subsequently freeing degrees of freedom can increase (a) the

likelihood of physical entrainment, (b) the range of parameters that lead to stable behavior, (c)

the robustness of the system against external perturbations, and (d) the speed and efficiency of

the exploration of the sensory-motor space (Chapter 3);

• it shows – confirming recent observations made in movement science – that if the complexity

of the task is increased, a single phase of freezing and freeing is not sufficient, and alternating

freezing and freeing is necessary (Chapter 4);

• it quantitatively investigates the role of the coupling (a) between joints, and (b) between the sen-

sory apparatus and the neural structure for the acquisition of rhythmic motor skills (chapters 3, 4,

and 5);

• it proposes two value-based exploration schemes, and shows how they can be put to work to

explore the parameter space of a robot’s control system (chapters 3 and 6);

• it provides quantitative support for the assertion that embodied systems are not passively exposed

to sensory information, but courtesy of their morphology can self-structure such information

(chapters 7, 8, and 9);

• it presents initial analyses demonstrating how simple sensory-motor functions like gaze direction

and foveation can generate informational structure (in this case, mutual information) in the visual

channel (Chapter 7);

• it shows how information theoretic and statistical measures can be used to quantify (a) the

amount of informational structure induced by sensory-motor coordination (chapters 7 and 9),

and (b) the agent-environment interaction (chapters 8 and 9).
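The last two contributions rely on information-theoretic measures. As a minimal illustration of how such a measure can quantify informational structure, the sketch below estimates the mutual information between two sensor channels from a joint histogram, and shows that two channels driven by a common "movement" share more information than two independent ones. The toy signals, bin counts, and function names are illustrative choices, not the actual measures or data used in chapters 7-9.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of the mutual information (in bits)
    between two scalar time series x and y."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y
    mask = pxy > 0
    return float((pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])).sum())

rng = np.random.default_rng(1)
t = np.linspace(0, 20 * np.pi, 2000)

# "coordinated" condition: two sensor channels driven by the same
# underlying movement, plus independent noise
common = np.sin(t)
s1 = common + 0.1 * rng.normal(size=t.size)
s2 = common + 0.1 * rng.normal(size=t.size)

# "uncoordinated" condition: two independent channels
u1 = rng.normal(size=t.size)
u2 = rng.normal(size=t.size)

print(mutual_information(s1, s2) > mutual_information(u1, u2))  # True: coordination induces structure
```

A histogram estimator of this kind is biased for small samples, so in practice the coordinated and uncoordinated conditions would be compared against a shuffled-data baseline rather than against zero.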

Chapter 2

Developmental Robotics: The Long

Version1

Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce

one which simulates the child’s? Bit by bit one would be able to allow the machine to make more

and more “choices” or “decisions.” One would eventually find it possible to program it so as to

make its behaviour the result of a comparatively small number of general principles. When these

became sufficiently general, interference would no longer be necessary, and the machine would

have “grown up.” (Turing, 1948)

2.1 Synopsis

Developmental robotics is an emerging field located at the intersection of robotics, embodied artificial

intelligence, and developmental science. This chapter elucidates the main reasons and key motivations

behind the convergence of fields with seemingly disparate interests, and shows why developmental

robotics might prove to be beneficial for all fields involved. The advocated methodology is synthetic

and two-pronged: on the one hand it employs robots to instantiate models originating from develop-

mental sciences; on the other hand, by exploiting insights gained from studies on ontogenetic develop-

ment, it aims at developing better robotic systems. This chapter gives a survey of the relevant research

issues, and points to some future research directions.

1Appeared as Lungarella, M., Metta, G., Pfeifer, R. and Sandini, G. Developmental robotics: a survey. Connection Science, 15(4), pp. 151-190, 2004.



2.2 Introduction

Developmental robotics is an emergent area of research at the intersection of robotics and developmen-

tal sciences – in particular developmental psychology and developmental neuroscience. It constitutes

an interdisciplinary and two-pronged approach to robotics, which on one side employs robots to instan-

tiate and investigate models originating from developmental sciences, and on the other side seeks to

design better robotic systems by applying insights gained from studies on ontogenetic development.2

Judging from the number of recent and forthcoming conferences, symposia, and journal special

issues, it is evident that there is growing interest in developmental robotics: Workshop on Emergence

and Development of Embodied Cognition (Berthouze et al., 1999; Pfeifer and Lungarella, 2001), work-

shop on Epigenetic Robotics (Balkenius et al., 2001; Prince et al., 2002, 2003; Berthouze and Metta,

2004), workshop on Developmental Embodied Cognition (Westermann et al., 2001), workshop and

conferences on Development and Learning (Weng, 2000; Elman et al., 2002; Triesch and Jebara, 2004),

special issue of Adaptive Behavior on “plastic mechanisms, multiple timescales and lifetime adapta-

tion” (Di Paolo, 2002), the one on “Epigenetic Robotics” (Prince and Demiris, 2003), and a special

issue of Connection Science (Berthouze and Ziemke, 2003). There are at least two distinct driving

forces behind the growth of the alliance between developmental psychology and robotics:

• Engineers are seeking novel methodologies oriented toward the advancement of robotics, and the

construction of better, that is, more autonomous, adaptable and sociable robotic systems. In that

sense, studies of cognitive development can be used as a valuable source of inspiration (Brooks

et al., 1998; Metta, 2000; Asada et al., 2001).

• Robots can be employed as research tools for the investigation of embodied models of develop-

ment. Neuroscientists, developmental psychologists, and also engineers, may gain considerable

insights from trying to embed a particular model into robots. This approach is also known as

synthetic neural modeling, or synthetic methodology (Pfeifer and Scheier, 1999; Pfeifer, 2002;

Reeke et al., 1990; Sandini, 1997; Sporns, 2003).

The research methodology advocated by developmental robotics is very similar to the one sup-

ported by epigenetic robotics. The two research endeavors not only share problems and challenges

but are also driven by a common vision. From a methodological point of view both partake of a

biomimetic approach to robotics known as biorobotics, which resides at the interface of robotics and

biology. Biorobotics addresses biological questions by building physical models of animals, and strives

to advance engineering by integrating aspects of animal sensory systems, biomechanics and motor con-

trol into the construction of robotic systems (Beer et al., 1998; Lambrinos et al., 1997; Sharkey, 2003;

2Ontogenetic development designates a process during which an organism develops from a single cell into its adult form.


Webb, 2001). There is, however, at least one important difference of emphasis between epigenetic

robotics and developmental robotics: While the former focuses primarily on cognitive and social de-

velopment (Zlatev and Balkenius, 2001), as well as on sensory-motor environmental interaction (Prince

and Demiris, 2003), the latter encompasses a broader spectrum of issues, by investigating also the ac-

quisition of motor skills, and the role played by morphological development. In the context of this

review, the difference will not be stressed any further.

The primary goal of this chapter is to present an overview of the state of the art of developmental

robotics (and hence of epigenetic robotics), and to motivate the usage of robots as “cognitive” or

“synthetic” tools, that is, as research tools to study and model the emergence and development of

cognition and action. From a methodological point of view, this review is not intended to be critical.

Developmental robotics is still in its infancy, and an assessment of the pros and cons of specific pieces of

research may be premature. We hope, however, that the review will offer new perspectives on certain

issues and point out areas in need of further research. The secondary goal is to uncover the driving

forces behind the growth of developmental robotics as a research area, and to expose its hopefully

far-reaching implications for the design and construction of robotic systems. We advocate the idea that

ontogenetic development should not only be a source of inspiration but also a design alternative for

roboticists, as well as a new and powerful tool for cognitive scientists.

In the following section, we make an attempt to trace back the origins of developmental robotics,

which we believe are to be found in the rejection of the cognitivistic paradigm by many scholars of

artificial intelligence. Next, we present our working definition of ontogenetic development, and sum-

marize some of its key aspects. In the following sections, we give an overview of the various current

and past research directions (including motivations and goals), show who is doing or has been doing

what and to what purpose, and discuss the implications of the developmental approach for robotics

research. In the final section, we point to future research directions and conclude.

2.3 In the beginning there was the body

In an ever-growing number of fields there is an ongoing and intense debate about the usefulness of

taking into account ideas of embodiment, i.e., the claim that having a body, which mediates perception

and affects behavior, plays an integral role in the emergence of human cognition. Scholars of arti-

ficial intelligence, artificial life, robotics, developmental psychology, neuroscience, philosophy, and

other disciplines, seem to agree on the fact that brain, body and environment are reciprocally coupled,

and that cognitive processes arise from having a body with specific perceptual and motor capabilities,

interacting with and moving in the real world (Beer et al., 1998; Brooks, 1991; Clark, 1997; Hendriks-

Jensen, 1996; Lakoff and Johnson, 1999; Pfeifer and Scheier, 1999; Sporns, 2003; Thelen and Smith,

1994; Varela et al., 1991). This paradigm stands in stark contrast to the mind-as-computer metaphor


Figure 2.1: Examples of robots used in developmental robotics. From left to right, top to bottom: BabyBot (LiraLab), BabyBouncer (AIST), Infanoid (CRL), COG (MIT).

advocated by traditional cognitive science, according to which the body is seen as an output device that

merely executes commands generated by a rule-based manipulation of symbols, which are associated

with an internal representation of the world (Fodor, 1981; Newell and Simon, 1976). Perception is

largely seen as a means of creating an internal representation of the world rich enough to allow rea-

soning and cognizing to be conceptualized as a process of symbol manipulation (computer program),

which can take place entirely in the mind. One of the most unfortunate consequences of the mind-as-

computer metaphor for cognitive science and artificial intelligence in general, and for developmental

psychology and robotic research in particular, has been the tacit acceptance of a strong separation be-


tween cognitive structure (i.e., symbols and representations), the software operating on that structure

(i.e., mechanisms of attention, decision making and reasoning), and the hardware on which to im-

plement the software (Bates and Elman, 2002; Brooks, 1991; Pfeifer and Scheier, 1999; Thelen and

Smith, 1994). Another assumption of the cognitivistic research paradigm was a denial of the impor-

tance of ontogenetic development by rationalists-nativists (Chomsky, 1986; Keil, 1981). In the field of

language acquisition, for instance, Chomsky theorized that all languages derive from a universal gram-

mar, somehow encoded in our genome. The purpose of development and learning was to merely fine

tune some parameters to a specific language. The same cognitivistic approach also hypothesized accu-

rate, symbol-based representations of the real world (Newell, 1990), as well as task-specific models of

information processing and reasoning (Pylyshyn, 1984).

Out of dissatisfaction with the direction in which (cognitive) psychology was heading, and to overcome the limitations inherent in the rather artificial division of developmental phenomena into domain-specific competencies and modules (Fodor, 1983), Masao Toda proposed the study of “Fungus Eaters”, i.e., simple but nevertheless complete and autonomous creatures endowed with everything needed to

behave in the real world (Toda, 1982). Around the same time Braitenberg (1984) defined the “law

of uphill analysis and downhill synthesis” 3 and argued for the introduction of a novel methodology

in psychology, which he called “synthetic psychology.” Two similar approaches followed: “synthetic

neural modeling” (Reeke et al., 1990), which attempts to correlate neural and behavioral events taking

place at multiple levels of organization, and the synthetic methodology (Pfeifer and Scheier, 1999), a

wider term that embraces the whole family of synthetic approaches. The shared common goal of syn-

thetic approaches is to seek an understanding of cognitive phenomena by building physical models of

the system under study. Typically they are applied in a bottom-up way: Initially, a simple system (e.g.,

with a small number of sensors) is built and explored, then its complexity is successively increased

(e.g., by adding sensors) if required to achieve a desired behavior.
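To make the bottom-up flavor of the synthetic methodology concrete, consider a Braitenberg-style vehicle: two light sensors with crossed excitatory connections to two motors suffice to produce seemingly goal-directed "approach" behavior. The simulation below is a deliberately minimal sketch; the geometry, gains, and light model are arbitrary assumptions, not a model taken from Braitenberg (1984).

```python
import math

def light(x, y):
    """Light intensity from a source at the origin (smooth falloff with distance)."""
    return 1.0 / (1.0 + x * x + y * y)

def simulate_vehicle(steps=3000, dt=0.05, gain=2.0, turn_gain=8.0):
    """Minimal Braitenberg-style vehicle: two light sensors with crossed
    excitatory connections to two motors, which makes the vehicle turn
    toward, and approach, the light source."""
    x, y, heading = 4.0, 3.0, 0.0
    for _ in range(steps):
        # sensors mounted 0.2 units from the body center, angled left/right
        lx = x + 0.2 * math.cos(heading + 0.5)
        ly = y + 0.2 * math.sin(heading + 0.5)
        rx = x + 0.2 * math.cos(heading - 0.5)
        ry = y + 0.2 * math.sin(heading - 0.5)
        left, right = light(lx, ly), light(rx, ry)
        # crossed wiring: the left sensor drives the right motor and vice versa
        motor_left, motor_right = gain * right, gain * left
        heading += dt * turn_gain * (motor_right - motor_left) / (motor_left + motor_right)
        speed = 0.5 * (motor_left + motor_right)
        x += dt * speed * math.cos(heading)
        y += dt * speed * math.sin(heading)
    return math.hypot(x, y)  # final distance from the light source

print(simulate_vehicle() < 2.5)  # the vehicle starts 5 units away and ends up much closer
```

Although the vehicle contains no representation of "light" or "approach", the behavior reliably emerges from the sensor-motor coupling alone, illustrating why the synthesis of such behaviors can be easier than their analysis.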

The extension of the synthetic methodology to include development is a conceptually small step.

First, development is a process during which changes in all domains of function and behavior occur

from simple to complex (see Section 2.4.1). Therefore it is reasonable to assume that its key aspects

can be captured by means of a bottom-up synthetic approach. Second, cognitive development cannot

be isolated from the body in which it is instantiated, and from the real world in which it is embedded,

and with which the body physically interacts. As a matter of fact, the traditional approach (based

on the computer metaphor) has ultimately failed to address the intimate linkage between brain, body

and environment, and to study behavioral and neural changes typical of ontogenetic development that

are important for the emergence of cognition. The construction of an artificial system through the

3Also known as the law of uphill analysis and downhill invention. This law suggests that the synthesis (construction) of something new is easier than the analysis of something that already exists. We contend, however, that the definition of a comprehensive set of quantitative design principles or – even better – of a theory of synthesis for behaving systems is a much harder problem.


application of a “developmental synthetic methodology”, however, is not straightforward. An adequate

research methodology as well as a good set of design principles supporting such a methodology are

still open research issues. One possible reason is the difficulty of disentangling the complex notion

of development itself, which is – as we will show in the following section – multifaceted, non-linear,

complex, and yet to be fully understood.

The central tenet of embodied cognition is that cognitive and behavioral processes emerge from

the reciprocal and dynamic coupling between brain, body and environment. Since its inception, this

view has spawned paradigm changes in several fields, which in turn have influenced the way we think

about the role of embodiment for the emergence of cognition. Ballard (1991), for instance, introduced

the concept of animate or active vision, which states – roughly speaking – that visual processes can be

simplified if visual sensing is appropriately intertwined with acting and moving in the world (see also

Churchland et al., 1994). By employing active vision, problems such as figure/ground segmentation

or estimation of shape from shading become well-conditioned. The paradigm change expresses how

action and motor control contribute to the improvement of perceptual abilities. Biological systems are

not passively exposed to sensory input but instead interact actively with their surrounding environment.

Accordingly, the “Holy Grail of Artificial Intelligence”, that is, a computerized general vision

system, has to be viewed as strictly dependent on the availability of a controllable body coupled to a

less controllable world. In a similar vein, Brooks (1991) showed that behavior does not necessarily

have to rely on accurate models of the environment, but rather might be the result of the interaction of

a simple system with a complex world. In other words, there is no need to build enduring, full-scale

internal models of the world, because the environment can be probed and reprobed as needed. More

recently, Pfeifer and Scheier (1994) argued that a better global understanding of the perception-action

cycle might be required – contrary to our intuition.4 The authors proposed an alternative view that

breaking up perception, computation, and action into different subsystems might be too strong of a

commitment. In other words, the minimal unit of processing should be a complete perception-action

cycle. Neurophysiology too contributed to the paradigm change. Emblematic was the discovery of vi-

sually responsive motor neurons supporting the hypothesis of an intimate coupling between vision and

action in the definition of higher cognitive abilities, such as object and action recognition (Di Pellegrino

et al., 1992; Gallese et al., 1996). Fascinating, along the same line of research, is also the link between

action and language proposed by Rizzolatti and Arbib (1998), who argued that the visuo-motor neu-

rons found in the area F5 of monkeys are most probably the natural homologue of the Broca’s area of

humans.

4This point was also very strongly made by Dewey over 100 years ago (Dewey, 1896).


2.4 Facets of development

Ontogenetic development is commonly seen as a process of change whereby appropriate biological

structure and skills emerge anew in an organism through a complex, variable, and constructive interplay

between endogenous and environmental factors (Johnson, 1997). Unlike development and maturation,

which involve species-typical growth, and changes at the level of cell, tissue and body, learning is

experience-dependent, and is often characterized by a relatively permanent change of behavior result-

ing from exercise and practice (e.g., Chec and Martin, 2002). The debate nowadays gravitates around

the precise nature of the interaction between learning and development. There are at least three leading

views. The first is closest to that of Piaget, and sees learning as capitalizing on the achievements

of development (Piaget, 1953). The interaction is unidirectional, and learning cannot occur unless a

certain level of development has been achieved. The second view is bidirectional and states that learn-

ing and development are mutually coupled, in the sense that the developmental process enables, limits,

or even triggers learning, and learning in turn advances development (Kuhl, 2000). The third, more

radical view, which accommodates continuity and change under the theoretical umbrella of dynamic

systems theory (Thelen and Smith, 1994), suggests erasing the boundaries between development and

learning altogether, while considering dynamics at all levels of organization (molecular, cellular, struc-

tural, functional, and so on). We do not take any position in this or other debates. We are convinced,

however, that using robots as tools for scientific investigation might provide a route to disentangle open

issues – such as the nature of the interaction between development and learning. An additional advan-

tage of the proposed methodology is that we can simply build various assumptions into the system and

perform tests as we like – no ethical issues involved. The latter point is perhaps a less obvious, but

equally important justification for this area of research. It is relatively straightforward, for instance,

to build pathological conditions into a robot’s sensory, motor and neural systems (e.g., by lesioning

or augmenting its sensory-motor apparatus). Thus, robotic models can not only help elucidate principles underlying normal (healthy) development, but may also provide insight into disease and

dysfunction of brain, body, and behavioral processes.

In the remainder of this section, we review several important facets (components) of ontogenetic

development, and give pointers to some of the pertinent literature. The reader should bear in mind that

we do not intend to give an exhaustive account of biological development. Our choice of what to include and what to discard is therefore limited, and biased by our beliefs about what is important

and what is not. However, we do intend to convey the message that these seemingly disparate facets of

development are closely intertwined, and that – if taken into account during the design and construction

of artificial systems – they can represent a valuable source of inspiration. We also point out that many of

these aspects can and should be conceptualized as principles for the design of intelligent developmental

systems. A set of generalized principles for agent design can be found in (Pfeifer, 1996) and in (Pfeifer


and Scheier, 1999). For quick reference, we summarize the list of facets in Table 2.1.

Facet: Synopsis

Incremental process: prior structures and functions are necessary to bootstrap later structures and functions (Piaget, 1953; Thelen and Smith, 1994).

Importance of constraints: early constraints can lead to an increase of the adaptivity of a developing organism (Bushnell and Boudreau, 1993; Elman, 1993; Hendriks-Jensen, 1996; Turkewitz and Kenny, 1982).

Self-organizing process: development and learning are not determined by innate mechanisms alone (Goldfield, 1995; Kelso, 1995; Thelen and Smith, 1994).

Degrees of freedom: constraining the movement space may be beneficial for the emergence of well-coordinated and precise movements (Bernstein, 1967; Goldfield, 1995; Sporns and Edelman, 1993).

Self-exploration: self-acquired control of body dynamics (Angulo-Kinzler, 2001; Goldfield, 1995; Thelen and Smith, 1994).

Spontaneous activity: spontaneous exploratory movements are important precursors of motor control in early infancy (Piek, 2002; Prechtl, 1997).

Prospective control, early abilities: predictive control is a basic early competency on top of which human cognition is built (Meltzoff and Moore, 1997; Spelke, 2000; Von Hofsten et al., 1998).

Categorization, sensorimotor coordination: categorization is a fundamental ability, and can be conceptualized as a sensorimotor interaction with the environment (Edelman, 1987; Thelen and Smith, 1994).

Value systems: value systems mediate environmental saliency and modulate learning in a self-supervised and self-organized manner (Sporns and Alexander, 2002).

Social interaction: interaction with adults and peers is very important for cognitive development (Baron-Cohen, 1995; Meltzoff and Moore, 1977; Vygotsky, 1962).

Table 2.1: Facets of development at a glance.

2.4.1 Development is an incremental process

By assuming a certain level of abstraction, development in virtually any domain (e.g., nervous sys-

tem, motor apparatus, cognition) can be described as a sequence of stages through which the infant

advances. Indeed, the idea that development may be an incremental process is not novel; it had already been proposed by Jean Piaget in his theory of stages of cognitive development more than 50 years ago (e.g., Piaget, 1953), as well as by Eleanor Gibson, who suggested decomposing infant exploration

into three distinct phases (Gibson, 1988). The apparent stage-like nature of development, however,

by no means implies stable underlying processes characterized by a well-ordered, discontinuous, and incremental unfolding of clearly defined stages (as suggested by Piaget). Thelen and Smith (1994) give evidence for the opposite: depending on the level of observation, development is messy and fluid, full of instabilities and non-linearities, and may even include regressions. There may be rapid

spurts, such as the onset of babbling, as well as more protracted, and gradual changes, such as the

acquisition of postural control. Various systems (for example, the perceptual and the motor system) do

not even change at the same rate. This list of properties is a clear indication of how challenging the

study of developmental changes is. An additional difficulty arises from the fact that those changes are

both qualitative (e.g., transition from crawling to walking), and quantitative (e.g., increase of muscle-

fat ratio). Another important characteristic of the developmental progression is that later structures

build upon less complete and less efficient prior structures and their behavioral expression. In other

words, the former structures provide a background of subskills and knowledge that can be re-used

by the latter. The mastery of reaching, for instance, requires an adequate gaze and head control, and

a stable trunk support; the latter is even more important for fine manipulation (Bertenthal and

Von Hofsten, 1998). Finally, we point out the absence of a central executive behind this developmental

progression. In other words, development is largely decentralized, and exhibits the properties of a

self-organizing system (see Sec. 2.4.3).

2.4.2 Development as a set of constraints

The notion of initial constraints, or of a “brake” on development, is often invoked to explain developmental trajectories (Bushnell and Boudreau, 1993; Harris, 1983). Examples of constraints present

at birth in many vertebrate species (e.g., rats, cats, humans) are the limitations of the organism’s ner-

vous system (such as neural connectivity and number of neuronal cells), and of its sensory and motor

apparatuses (such as reduced visual acuity and low muscle strength). Because each developmental step

somehow establishes the boundary conditions for the next one, a particular ability cannot emerge if any

of the capacities it entails is lacking. Thus, particular constraints can act (metaphorically speaking) as a

brake on development. These rate-limiting factors (as they are sometimes called) are not necessarily a bad thing. Turkewitz and Kenny (1982) pioneered the theoretical position that early morphological limitations and constraints can lead to an increase in the adaptivity of a developing organism (see also Bjorklund and Green, 1992). That is, the immaturity of the sensory and the motor system, which at first sight appears to be an inadequacy, is actually advantageous, because it effectively reduces or eliminates the

“information overload” that otherwise would most certainly overwhelm the infant. According to this

2.4. Facets of development 30

hypothesis, the limited acuity of vision, contrast sensitivity, and color perception of neonates (Slater

and Johnson, 1997, p. 126) may actually improve their perceptual efficiency by reducing the com-

plexity of the environmental information impinging on their visual system (for additional examples,

see Hendriks-Jensen, 1996). Following similar lines of argumentation, several other researchers have

suggested that processing limitations of young learners, originating from the immaturity of the neural

system, can actually be beneficial for learning itself (Dominguez and Jacobs, 2003; Elman, 1993; New-

port, 1990; Westermann, 2000). In other words, constraints can be interpreted as particular instances

of “ontogenetic adaptations”, that is, unique adaptations to the environment throughout development,

which effectively simplify the world, and hence facilitate learning (Bjorklund and Green, 1992).
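The information-reduction argument can be made concrete with a small sketch (a toy illustration under invented parameters, not a model from the developmental literature): coarse-graining a high-resolution stimulus, as low neonatal acuity effectively does, shrinks the set of distinct patterns a learner must discriminate.

```python
import random

def downsample(stimulus, factor):
    """Average-pool a one-dimensional 'retina' to mimic low visual acuity."""
    return tuple(
        sum(stimulus[i:i + factor]) / factor
        for i in range(0, len(stimulus), factor)
    )

rng = random.Random(0)
# 100 random binary stimuli on an 8-element high-acuity retina.
stimuli = [tuple(rng.randint(0, 1) for _ in range(8)) for _ in range(100)]

full_acuity = len(set(stimuli))                        # distinct patterns seen
low_acuity = len({downsample(s, 4) for s in stimuli})  # after coarse-graining
```

With full acuity the learner faces dozens of distinct 8-element patterns; after pooling, only a handful of coarse patterns remain, which is the sense in which immature vision reduces "information overload".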

2.4.3 Development as a self-organizing process

A fundamental characteristic of self-organization is that structured patterns or global order can emerge

from local interactions between the components constituting a system, without the need for explicit

instructions or a central program (see also Sec. 2.4.1). In this sense, development largely unfolds in a

self-organized fashion. The earliest actions of human infants, for instance, are spontaneous and exhibit

the typical properties of a self-organizing system (Goldfield, 1995; Sporns and Edelman, 1993; Thelen

and Smith, 1994). A growing body of evidence has shown that the control of movements of particu-

lar (exploratory) actions is not determined by innate mechanisms alone, but rather, emerges from the

dynamics of a sufficiently complex action system interacting with its surrounding environment (Bern-

stein, 1967; Goldfield, 1995; Kelso and Kay, 1987; Taga, 1991, 1995). In other words, the dynamics

of the interaction of infants and their surroundings modulates the ever-changing landscape of their ex-

ploratory activities. The intrinsic tendency to coordination or pattern formation between brain, body,

and environment, is often referred to as entrainment, or intrinsic dynamics (Kelso, 1995). Gentaro

Taga, for instance, was able to show that rhythmic movements (in his case: walking) can emerge from

what he called a “global entrainment” among the activity of the neural system, the musculo-skeletal

system, and the surrounding environment (Taga, 1991). Another vivid illustration of a dynamically

self-organized activity was provided by Thelen (1981). She found that the trajectory and the cyclic

rhythmicity of kicks displayed by human infants and the intrinsic timing of the movement phases were

the “result of cooperative (and local) interactions of the neuro-skeletal muscular system within partic-

ular energetic and environmental constraints” (Thelen and Smith, 1994, p. 79).
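Global entrainment admits a minimal mathematical caricature (two coupled phase oscillators; a generic synchronization sketch with invented frequencies, not Taga's actual neuro-musculo-skeletal model): two rhythmic processes with different natural frequencies lock into a common rhythm once they are mutually coupled.

```python
import math

def simulate(coupling, steps=20000, dt=0.001):
    """Integrate two mutually coupled phase oscillators (Euler method).

    Returns the final phase difference. With sufficient coupling the
    oscillators entrain (phase-lock); with none, their phases drift.
    """
    w1 = 2 * math.pi * 1.0   # natural frequency, 'neural' oscillator (rad/s)
    w2 = 2 * math.pi * 1.3   # natural frequency, 'mechanical' oscillator
    th1, th2 = 0.0, 0.5
    for _ in range(steps):
        d1 = w1 + coupling * math.sin(th2 - th1)
        d2 = w2 + coupling * math.sin(th1 - th2)
        th1 += d1 * dt
        th2 += d2 * dt
    return (th2 - th1) % (2 * math.pi)

# Coupled case: the phase difference settles at a constant small lag,
# asin((w2 - w1) / (2 * coupling)) -- the two rhythms have entrained.
lag = simulate(coupling=5.0)
```

The point of the sketch is qualitative: neither oscillator is a "central executive"; the common rhythm is a property of the coupled system as a whole.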

Processes of self-organization and pattern formation are not confined to the learning and the de-

velopment of movements but are an essential feature of biological systems at any level of organiza-

tion (Kelso, 1995). Iverson and Thelen (1999), for instance, invoked entrainment, and other principles

of dynamic coordination typical of self-organized behavior, to explain the developmental origins of

gestures that accompany the expression of language in speech; Edelman (1987) hypothesized that


perceptual categorization – one of the primitives of mental life – arises autonomously through self-

organization; and finally, even the amazing complexity of the brain has been proposed to be the result

of a process of self-organized ontogenesis (Von der Malsburg, 2003).

2.4.4 Degrees of freedom and motor activity

Perhaps not surprisingly, movements of infants lack control and coordination compared with those of

adults. The coordination of movements (in particular in humans) is very poor at birth and undergoes a

gradual maturation over an extended period of postnatal life. Examples of this developmental progres-

sion are crawling (Adolph et al., 1998), walking with support (Haehl et al., 2000), walking (Thelen and

Smith, 1994, p. 71), and reaching and grasping (Streri, 1993). Despite the fact that the musculo-skeletal

apparatus is a highly non-linear system, with a large number of biomechanical and muscular degrees

of freedom,5 and in spite of the potential redundancy of those degrees of freedom in many move-

ment tasks (i.e., the activation of different muscle groups can lead to the same movement trajectory),

well-coordinated and precisely controlled movements emerge. The “degrees of freedom problem”,

first pointed out by Bernstein (1967), has recently attracted a lot of attention (Goldfield, 1995; Sporns

and Edelman, 1993; Vereijken et al., 1992; Zernicke and Schneider, 1993). A possible solution to the

control issues raised by the degrees of freedom problem, that is how – despite the complexity of the

neuro-musculo-skeletal system – stable and well-coordinated movements are produced, was suggested

by Bernstein himself. His proposal is characterized by three stages of change in the number of degrees

of freedom that takes place during motor skill acquisition: Initially, in learning a new skill or move-

ment, the peripheral degrees of freedom (the ones further from the trunk, such as wrist and ankle) are

reduced to a minimum through tight joint coupling (freezing of degrees of freedom). Subsequently,

the restrictions at the periphery are gradually weakened so that more complex movement patterns can

be explored (freeing of degrees of freedom). Eventually preferred patterns emerge that exploit reactive

phenomena (such as gravity and passive dynamics) so as to enhance efficiency of the movement. The

strong joint coupling of the first phase has been observed in spontaneous kicking in the first few months

of life (Thelen and Fischer, 1983), and is thought to allow infants to learn without the interference of

complex, uncoordinated motor patterns.
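Bernstein's freezing-and-freeing schedule can be sketched as a toy search problem (a schematic illustration under invented parameters, not a model of infant motor learning): a planar three-joint arm learns to reach a target by random hill-climbing, with the distal joints first clamped and only later released.

```python
import math, random

def endpoint(angles, lengths=(1.0, 0.8, 0.5)):
    """Forward kinematics of a planar three-joint arm."""
    x = y = th = 0.0
    for a, l in zip(angles, lengths):
        th += a
        x += l * math.cos(th)
        y += l * math.sin(th)
    return x, y

def error(angles, target=(0.5, 1.5)):
    x, y = endpoint(angles)
    return math.hypot(x - target[0], y - target[1])

def learn(free_joints, angles, trials=2000, step=0.1, seed=0):
    """Random hill-climbing that perturbs only the unfrozen joints."""
    rng = random.Random(seed)
    for _ in range(trials):
        cand = list(angles)
        for j in free_joints:
            cand[j] += rng.uniform(-step, step)
        if error(cand) < error(angles):
            angles = cand
    return angles

angles = [0.0, 0.0, 0.0]
# Stage 1: distal joints frozen -- search a low-dimensional space first.
angles = learn(free_joints=[0], angles=angles)
# Stage 2: freeing of degrees of freedom refines the coarse solution.
angles = learn(free_joints=[0, 1, 2], angles=angles)
```

The frozen stage can only bring the straight arm as close to the target as its fixed geometry allows; releasing the remaining joints then reduces the residual error, mirroring the coarse-to-fine progression described above.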

Recently, the straightforward, but rather narrow and unidirectional view of the nature of change

in the number of controlled degrees of freedom proposed by Bernstein has been contested – in adult

studies (Spencer and Thelen, 1999; Newell and Vaillancourt, 2001), as well as in infant studies (Haehl

et al., 2000). These recent observations seem to indicate that while according to Bernstein’s framework

biomechanical degrees of freedom only increase (as a consequence of practice and exercise), there can

5 The space of possible motor activations is very large: “consider the 600 or so muscles in the human body as being, for extreme simplicity, either contracted or relaxed. This leads to 2^600 possible motor activation patterns, more than the number of atoms in the known universe” (Wolpert et al., 2003).


be – depending on the task – an increase or decrease of the number of degrees of freedom. Despite

such counter evidence, Bernstein’s proposal bears at least two important messages, which fit very

nicely into the above discussion: (a) the presence of initial constraints that are gradually lifted, and

(b) the emergence of coordinated movements from a dynamic interaction (via external feedback and

forces) between the maturing organism and the environment.

2.4.5 Self-exploratory activity

Scaffolding by parents and caretakers (see Sec. 2.4.10), as well as active exploration of objects and

events, have been acknowledged to be of crucial importance for the developing infant (Bushnell and

Boudreau, 1993; Gibson, 1988; Piaget, 1953; Rochat, 1989). Little attention, however, has been paid to

the understanding of what sort of information is available to infants as a result of their self-exploratory

acts. Self-exploration plays an important role in infancy, in that infants’ “sense of bodily self” to some

extent emerges from a systematic exploration of the perceptual consequences of their self-produced

actions (Rochat and Striano, 2000). The exploration of the infants’ own capacities is one of the primary

driving forces of development and change in behavior, and infants explore, discover, and select –

among all possible solutions – those that seem more adaptive and efficient (Angulo-Kinzler, 2001,

p. 363). Exploratory actions, traditionally thought to be actions focused on the external world, may

just as well be focused on the infant’s own action system (Von Hofsten, 1993). Infants exploring their own

action system or their immediate surroundings have been observed to perform movements over and

over again (Piaget, 1953). Newborn infants, for instance, have been observed to spend up to 20% of

their waking hours contacting their face with their hands (Korner and Kraemer, 1972). In analogy to

vocal babbling, this experiential process has also been called “body babbling” (Meltzoff and Moore,

1997). By means of self-exploratory activities infants learn to control and exploit the dynamics of their

bodies (Goldfield, 1995; Smitsman and Schellingerhout, 2000; Thelen and Smith, 1994). The nature of

these dynamics differs from infant to infant (each infant has a unique set of abilities, muscle physiology,

fat distribution, and so on), and depends on the dynamics of the interaction with the environment,

which in turn varies from task to task. Self-exploration can also be conceptualized as a process of

soft-assembly, 6 i.e., a process of self-organization (see Sec. 2.4.3) during which new movements are

generated, and more effective ways of harnessing environmental forces are explored, discovered, and

selected (Goldfield, 1995; Schneider et al., 1990; Schneider and Zernicke, 1992).

6 Soft-assembly refers to a self-organizing ability of biological systems to “freely” recruit the components (such as neurons, groups of neurons, and mechanical degrees of freedom) that are part of the system, yielding flexibility, variability and robustness against external perturbations (Clark, 1997; Goldfield, 1995; Thelen and Smith, 1994).


2.4.6 Spontaneous activity

Spontaneous movements have been recognized as important precursors to the development of motor

control in early infancy (Forssberg, 1999; Piek, 2002; Taga et al., 1999; Thelen, 1995). One of their

main functions is the exploration of various musculo-skeletal organizations, in the context of multiple

constraints, such as environment, task, architecture of the nervous system, muscle strength, masses of

the limbs and so forth (see Sec. 2.4.2 and Sec. 2.4.5). Well-coordinated movement patterns emerge

from spontaneous neural and motoric activity as infants learn to exploit the physical properties of

their bodies and of the environment. In fact, fetuses (as early as 8 to 10 weeks after conception), and

newborn infants display a large variety of transient and spontaneous motoric activity, such as general

movements 7 and rhythmical sucking movements (Prechtl, 1997), spontaneous arm movements (Piek

and Carman, 1994), stepping and kicking (Thelen and Smith, 1994). An interesting property of sponta-

neous movements is that although they are not linked to a specific, identifiable goal, they are not mere

random movements. Instead, they are organized, right from the early days of postnatal life, into rec-

ognizable forms. Spontaneous kicks in the first few months of life, for instance, are well-coordinated

movements characterized by a tight joint coupling between the hip, knee, and ankle joints (Thelen and

Fischer, 1983; Thelen and Smith, 1994), and by short phase lags between the joints (Piek, 2001, p. 724).

As hypothesized by Sporns and Edelman (1993), spontaneous exploratory activity may also induce cor-

relations between certain populations of sensory and motor neurons, which are eventually selected as a

task is consistently accomplished or a goal attained. The same authors also proposed three concurrent

steps of how the development of sensory-motor coordination may proceed: (a) Spontaneous gener-

ation of a variety of movement patterns; (b) development of the ability to sense the consequences of

the self-produced movements; and (c) actual selection of a few movements. We note that the ultimate

“availability” of good sensory-motor patterns is connected to the degrees of freedom problem: such patterns can only emerge if the range of in-principle possible movements is constrained by initially reducing the number of available degrees of freedom (see Sec. 2.4.4).
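The three concurrent steps proposed by Sporns and Edelman can be caricatured in a few lines (a deliberately minimal sketch with an invented one-dimensional motor-to-sensory mapping, not their actual selectionist model): movement patterns are generated spontaneously, their sensory consequences are observed, and the few whose consequences approach a goal are retained.

```python
import random

def consequence(command):
    """Toy sensory consequence of a motor command (hypothetical mapping)."""
    return 2.0 * command - 1.0

def babble_and_select(goal, n_patterns=200, keep=5, seed=1):
    rng = random.Random(seed)
    # (a) spontaneous generation of a variety of movement patterns
    commands = [rng.uniform(-1.0, 1.0) for _ in range(n_patterns)]
    # (b) sensing the consequences of the self-produced movements
    outcomes = [(abs(consequence(c) - goal), c) for c in commands]
    # (c) selection of the few movements whose outcomes approach the goal
    outcomes.sort()
    return [c for _, c in outcomes[:keep]]

selected = babble_and_select(goal=0.4)
```

Nothing in the loop is goal-directed at generation time; selection after the fact is what turns undirected variability into a small repertoire of useful movements.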

2.4.7 Anticipatory movements and early abilities

Throughout development infants acquire and refine the ability to predict the sensory consequences

of their actions and the behavior of external objects and events (e.g., the “when” and “where” of a

forthcoming manual interception of an object passing by). Optimally, this ability allows movements

to be adjusted prospectively rather than reactively in response to an unexpected perturbation (Adolph

et al., 2000; Von Hofsten, 1993). Two types of control strategies are employed to control anticipatory

7 General movements represent one of the most important types of spontaneous movements which have been identified. They last from a few seconds to several minutes, are caused endogenously by the nervous system, and in normal infants involve the whole body.


movements: predictive and prospective control (e.g., Peper et al., 1994; Regan, 1997). In predictive

control the current perceptual information is used to predict the sensory activation at a future point

in time. Prospective control, on the other hand, relies on the sensory (or perceptual) information

associated with a particular action as the action unfolds over time, and is thus based on a close coupling

between information and movement.

Predictive and prospective control are in place already early in development. Infants as young as

one month, for instance, are able to compensate for head movements with zero lag between head and

eye movements (Bertenthal and Von Hofsten, 1998; Von Hofsten et al., 1998). Predictive control is

important because of the intrinsic time-delays of the sensory-motor system (visual feedback can take

up to 150 msec to be processed by the cortex, for instance). An example where infants make use of

prediction is gaze following. During gaze control there are at least two situations during which predic-

tive control is important: For the prediction of the motion of visual targets, and for the prediction of

the consequences of relative movements between body parts (e.g., movement of the head with respect

to the eyes).

Prediction clearly supports the idea that the brain forms so-called “internal forward models” –

instances of internal models, which have been hypothesized to exist in the cerebellum (Miall et al.,

1993), and whose biological and behavioral relevance has been confirmed by recent experiments (e.g.,

Mussa-Ivaldi, 1999; Wolpert et al., 2001). Forward models are ‘neural simulators’ of the musculo-

skeletal system and the environment (Clark and Grush, 1999; Grush, 2004; Wolpert et al., 2003), and

thus allow predicting the future state of the system given the present state and a certain input (a state

specifies a particular body configuration).
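In computational terms, a forward model is a learned map from the current state and a motor command to a predicted next state. The sketch below (toy linear dynamics with invented constants, not a model of the cerebellum) fits such a map from "experienced" transitions gathered by motor babbling, and then predicts the consequence of a command before it is executed.

```python
import random

# True (unknown to the learner) toy dynamics: x' = a*x + b*u + noise
A_TRUE, B_TRUE = 0.9, 0.5

def step(x, u, rng):
    return A_TRUE * x + B_TRUE * u + rng.gauss(0.0, 0.01)

def fit_forward_model(transitions):
    """Least-squares fit of x' ~ a*x + b*u from experienced transitions."""
    sxx = sxu = suu = sxy = suy = 0.0
    for x, u, x_next in transitions:
        sxx += x * x; sxu += x * u; suu += u * u
        sxy += x * x_next; suy += u * x_next
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b

rng = random.Random(0)
data, x = [], 0.0
for _ in range(500):               # motor babbling generates training data
    u = rng.uniform(-1.0, 1.0)
    x_next = step(x, u, rng)
    data.append((x, u, x_next))
    x = x_next

a, b = fit_forward_model(data)
predicted = a * x + b * 0.3        # predict the consequence of command u=0.3
```

Once fitted, the model stands in for the delayed sensory feedback: the predicted state is available immediately, which is exactly the advantage predictive control offers over purely reactive control.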

The ability to make predictions is part of what Spelke (2000) refers to as “core initial knowledge”,

that is, a set of basic competencies on top of which human cognition is built. High-level cognitive

functions, such as planning and shared attention, for instance, can be interpreted with respect to their

capability of predicting the consequences of chains of events. The large number of behavioral predis-

positions that have been discovered, and which are part of the core knowledge, show that infants are

not mere blank slates waiting to be written on (Iverson and Thelen, 1999; Johnson, 1997; Meltzoff and

Moore, 1997; Spelke, 2000; Thelen, 1981).

2.4.8 Categorization and sensory-motor coordination

Categorization is the ability to make distinctions in the real world, i.e., to discriminate and identify

sensory stimulations, events, motor acts, emotions, and so on. This ability is of such fundamental im-

portance for cognition and intelligent behavior that a natural organism incapable of forming categories

does not have much of a chance of survival (unless the categories are innate, but then they are not

flexible). For example, the organism will not be able to discern food from non-food, or peer from non-peer,


and so forth. Categorization is an efficient and adaptive initial step in perceiving and cognizing, as

well as a base for most of our conceptual abilities. Our daily interactions with the physical world,

and our social and intellectual lives heavily rely on our capacity to form categories (Lakoff, 1987),

and so does cognitive development (Thelen and Smith, 1994). Most organisms are therefore endowed

with the capacity to perceptually categorize and behaviorally discriminate an extraordinary range of

environmental stimuli (Edelman, 1987).

Evidence from developmental psychology supports the idea that perceptual categorization and con-

cept formation are the result of active exploration and manipulation of the environment (e.g., Bushnell

and Boudreau, 1993; Gibson, 1988; Piaget, 1953; Streri, 1993). That is, while sensation, and perhaps

certain aspects of perception can proceed without a contribution of the motor apparatus, perceptual

categorization depends upon the interplay between sensory and motor system. In other words, catego-

rization is an active process, which can be conceptualized as a process of sensory-motor coordinated

interaction of the organism with its surrounding environment (e.g., discrimination of textures and size

of objects by exploratory hand movements). It is through such interaction that the raw sensory data

impinging on the sensors may be appropriately structured, and the subsequent neural processing sim-

plified. The structure induced in the sensory data is important – perhaps critical – in establishing

dynamic categories, and may be a consequence of the correlation of movements and of time-locked

external (potentially multimodal) sensory stimulation (Thelen and Smith, 1994, p. 194). We conclude

that the absence of self-produced movements can affect the development of cognitive abilities and

skills. Children with severe physical disabilities, for instance, have limited opportunities to explore

their surroundings, and this lack of experience affects their cognitive and social development.

2.4.9 Neuromodulation, value and neural plasticity

Neuromodulatory systems are small and compact groups of neurons that reach large portions of the

brain. They include the noradrenergic, serotonergic, cholinergic, dopaminergic, and histaminergic

cell nuclei (Edelman, 1987; Edelman and Tononi, 2001). In mammals, the importance of these modu-

latory neurotransmitter systems vastly outweighs the proportion of brain space they occupy, their axons

projecting widely throughout the cerebral cortex, hippocampus, basal ganglia, cerebellum, and spinal

cord (Dickinson, 2003; Hasselmo et al., 2003). One of the primary roles of neuromodulatory systems

is the configuration and tuning of neural network dynamics at different developmental stages (Marder

and Thirumalai, 2002).

Another important role of these systems in brain function is to serve as “value systems” that either

gate the current behavioral state of the organism (e.g., waking, sleep, exploration, arousal), or that

act as internal mediators of value and environmental saliency. That is, they signal the occurrence

of relevant stimuli or events (e.g., novel stimuli, painful stimuli, rewards) by modulating the neural


activity and plasticity of a large number of neurons and synapses (Friston et al., 1994). Value systems

have several properties: (a) Their action is probabilistic, i.e., they influence large populations of neurons;

(b) their activation is temporally specific, that is, their effects are transient and short-lasting; and (c) their action is spatially uniform, i.e., they affect widespread regions of the brain, while acting as a single global

signal (Sporns and Alexander, 2002). Other implementations of value systems, e.g., in other species

are also possible (Dickinson, 2003).

Value systems play a pivotal role in adaptive behavior, because they mediate neural plasticity and

modulate learning in a self-supervised and self-organized manner. In doing so, they allow organisms

to autonomously learn via self-generated (possibly spontaneous) activity. In a sense, value systems

introduce biases into the perceptual system, and therefore create the necessary conditions for learning

and the self-organization of dynamic categories. The action of value systems can be either genetically

predetermined, such as in behaviors that satisfy homeostatic and appetitive needs, or it can incorporate

activity and experience-dependent processes (Sporns, 2004). The two flavors of value are also known

as innate and acquired value (Friston et al., 1994).
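The gating role of value can be written as a three-factor learning rule (a generic textbook form, assumed here for illustration rather than taken from a specific model cited above): the Hebbian correlation of pre- and postsynaptic activity changes a synaptic weight only when a global value signal is present.

```python
def value_modulated_hebb(w, pre, post, value, lr=0.1):
    """Three-factor rule: dw = lr * value * pre * post.

    With value = 0 the correlation leaves no trace; a positive value
    signal (a salient or rewarding event) consolidates it.
    """
    return w + lr * value * pre * post

w = 0.0
# Correlated activity without a value signal: no plasticity.
for _ in range(10):
    w = value_modulated_hebb(w, pre=1.0, post=1.0, value=0.0)
no_value_w = w
# The same activity paired with a global value signal strengthens the synapse.
for _ in range(10):
    w = value_modulated_hebb(w, pre=1.0, post=1.0, value=1.0)
```

The rule makes the bias explicit: of all the correlations an organism experiences, only those coinciding with a value signal are written into the network, which is how value systems can steer learning without supervising it.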

2.4.10 Social interaction

Interactions with adults and peers (scaffolding, tutelage, and other forms of social support) as well as

mimetic processes such as mimicry, imitation, and emulation, are hypothesized to play a central role

in the development of early social cognition and social intelligence (Meltzoff and Prinz, 2002; Whiten,

2000). The presence of a caregiver to nurture children as they grow is essential, because human infants

are extremely dependent on their caregivers, relying upon them not only for their most basic needs

but also as a guide for their cognitive development (Lindblom and Ziemke, 2003; Vygotsky, 1962). It

is important to note that in terms of development interaction with objects and interaction with peers

bear two completely different valences (Nadel, 2003). Through interaction with inanimate objects

infants acquire information “statically” and maybe learn the “simple” physics that governs the objects’

behavior. During peer-to-peer or infant-adult interaction, however, infants are engaged in a complex

communicative act, involving the interaction of two complex dynamical systems mutually influencing

(and modifying) each other’s behavior.

A fundamental type of interaction between infants and adults is scaffolding. The concept of scaf-

folding, whose roots can be found in the work of Vygotsky (1962), was introduced by Wood et al.

(1976) and refers to the support provided by adults to help children bootstrap cognitive, social, and

motor skills. As the child’s confidence increases, the level of assistance is gradually reduced. In other

words, scaffolding helps structure the environment in order to facilitate interaction and learning.

Scaffolding by a more capable caregiver or imitation of a peer can reduce distractions and bias explo-

rative behaviors toward important environmental stimuli. The caregiver can also increase or decrease


the complexity of the task. This issue is akin to the concept of “sensitive periods” (Bornstein, 1989;

Gottlieb, 1991), that is, particular intervals of time during which infants are especially responsive to

the input from their caregivers, and hence more apt to acquire skills.

From a very early age, infants are endowed with the necessary means to engage in simple, but

nevertheless crucial social interactions (e.g., they show preferences for human smell, human faces, and

speech (Johnson, 1997; Nadel and Butterworth, 1999) – see also Sec. 2.4.7), which can be used by the

caregiver to regulate and shape the infant’s behavior. Joint or shared attention, that is, the ability to

attend to an object of mutual interest in the context of a social exchange, is already observed in six-month-old infants (Butterworth and Jarrett, 1991). Meltzoff and Moore (1977) reported on the early

ability of very young infants to imitate both facial and manual gestures. Indeed, early and non-verbal

imitation is a powerful means for bootstrapping the development of communication and language.

Developmental psycholinguists such as Fernald (1985) provided compelling evidence for what sort of

cues preverbal infants exploit in order to recognize affective communicative intent in infant-directed

speech (motherese).

Basic social competencies are the background on which more complex social skills develop, and

they represent yet another way to facilitate learning (see Sec. 2.4.1). The reliance on social contact is

so integrated into our species that it is hard to imagine a completely asocial human. Severe develop-

mental disorders that are characterized by impaired social and communicative development, such as

autism (Baron-Cohen, 1995), can give us a glimpse of the importance of social contact (Scassellati,

2001, p. 30).

2.4.11 Intermediate discussion

In summary, despite being some sort of unfinished version of a fully developed adult, infants are well-

adapted to their specific ecological niche. As suggested in the discussion above, development is a

process during which the maturation of the neural system is tied to a concurrent and gradual lifting of

the initial limitations on sensory and motor systems. The state of immaturity that at first sight appears

to be an inadequacy plays in fact an integral role during ontogeny, and results in increased flexibility,

and faster acquisition of skills and subskills. Innate abilities, such as prospective control or prewired

motor patterns (Thelen, 1981), can also speed up skill acquisition by providing a “good” background

for the learning of novel skills. The difficulty of learning particular tasks can be further reduced by

shaping development via appropriate social exchanges and scaffolding by adults.

The various aspects of development exposed in this section are obviously highly interdependent

and cannot be considered in isolation. Spatio-temporally coordinated movement patterns (Sec. 2.4.4),

for instance, arise spontaneously and in a self-organized fashion from the interaction among brain,

body, and environment, and are – at least in part – the result of an entrainment between these three


components (Sec. 2.4.6 and Sec. 2.4.3). In general, autonomous and self-organized formation of spatio-

temporal patterns is a distinguishing trait of “open nonequilibrium systems”, that is, of systems in

which “energy” flows (a) from one region of the system to another (the system is not at equilibrium);

and (b) in and out of the system (the system is open) (e.g., Haken, 1983; Kelso, 1995).

Category learning (Sec. 2.4.8) represents another example of the interdependency between the

proposed developmental aspects, because it lends itself well to an interpretation as a dynamic process

during which, through interaction with the local environment, patterns of behavior useful for category

formation self-organize (Sec. 2.4.3). Moreover, in analogy to the development of patterns of motor

coordination in motor learning (Sec. 2.4.4), it is possible to conceptualize the emergence of perceptual

categories as a modification of degrees of freedom: mechanical degrees of freedom (i.e., number of

joints and muscles) in the case of motor learning, and sensory-motor or perceptual degrees of freedom

(i.e., categories) in the case of category formation. The self-organization of categories is directed

by neural and bodily constraints (Sec. 2.4.2), as well as by value systems (Sec. 2.4.9), which not

only introduce the necessary biases for learning to take place, but also modulate it, by evaluating the

consequences of particular actions. Hence they constitute the engine of exploration, and represent a

conditio sine qua non for category learning, for social interactions (Sec. 2.4.10), and for directing self-

exploratory processes (Sec. 2.4.5). Self-exploration and self-learning, in turn, are strongly dependent

on spontaneous movement activity (Sec. 2.4.6). This sort of activity, albeit not oriented toward any

functional goal (such as reaching for an object, or turning the head in a particular direction), leads

to the generation of sensory information across different sensory modalities correlated in time, which

gives infants the possibility to learn to sense and predict the consequences of their own actions through

self-exploration. For example, take an infant spontaneously waving her hand in front of her eyes, and

touching her face. Over time, this sort of activity generates associations of the sensory information

that originates from outside the body (called exteroception, e.g., vision, audition or touch) with the one

coming from inside the body (or proprioception, e.g., vestibular apparatus or muscle spindles), and a

sense of bodily self can emerge (Rochat and Striano, 2000).

As can be seen from these few examples, every aspect of development is affected simultaneously

by other ones. This coupling makes their investigation challenging, and modeling a difficult enterprise.

We contend that embodied models and robotic systems represent appropriate scientific tools to tackle

the interaction and integration of the various aspects of development. The construction of a physical

system forces us to consider (a) the interaction of the proposed model with the real world, and (b) the

interaction and the integration of the various subcomponents of the model with each other. This way

of thinking has spurred, since its inception, a growing number of research endeavors.

2.5. Research landscape 39

2.5 Research landscape

In this section, we present a survey of a variety of research projects that deal with or are inspired by

developmental issues. Table 2.2 gives a representative sample of investigations and is not intended as

a fully comprehensive account of research related to developmental robotics. For the inclusion of a

study in Table 2.2, we adopted the following two criteria:

The study had to provide clear evidence of robotic experiments. That is, we did not include

computer-based models of real systems, avatars, or other simulators. This choice is not aimed

at discrediting simulations, which indeed are very valuable tools of research. In fact, we ac-

knowledge that physical instantiation is not always an absolute requirement, and that simula-

tions have distinct advantages over real world experiments, such as the possibility for extensive

and systematic experimentation (Sporns, 2004; Ziemke, 2003). If the goal, however, is to model

and understand development and how it is influenced by interaction with the environment, then

robots may represent the only viable solution. Whereas a simulation cannot possibly capture

all the complexities and oddities of the physical world (Brooks, 1991; Steels, 1994; Pfeifer and

Scheier, 1999), robots – by being “naturally” situated in the real world – are the only way to

guarantee a continuous and real time coupling of body, control and environment.

The study had to show a clear intent to address hypotheses put forward in either developmental

psychology or developmental neuroscience. The use of connectionist models, reinforcement or

incremental learning applied to robot control alone – without any link to developmental theories,

for instance – did not fulfill this requirement.

Despite the admittedly rather restrictive nature of these two requirements, we were able to identify

a significant number of research papers satisfying them. In order to introduce some structure in this

rather heterogeneous collection of papers, we organized the selected articles of Table 2.2 according to

four primary areas of interest (see Table 2.3):

(1) Socially oriented interaction: This category includes robotic systems in which social interaction

plays a key role. These robots either learn particular skills via interaction with humans or with

other robots, or learn to communicate with other robots or humans. Examples are language

acquisition, imitation, social regulation.

(2) Non-social interaction: This category comprises studies that are characterized by a direct and

strong coupling between sensory and motor processes, and the surrounding local environment,

which do not involve any interaction with other robots or humans. Examples are learning to

grasp, visually-guided manipulation, perceptual categorization, and navigation.

2.5. Research landscape 40

(3) Agent-related sensory-motor control: This category organizes studies that investigate the ex-

ploration of bodily capabilities, changes of morphology (e.g., perceptual acuity, or strength of

the effectors) and their effects on motor skill acquisition, and self-supervised learning schemes

not specifically linked to a functional goal. Examples: self-exploration, categorization of motor

patterns, learning to swing or bounce.

(4) Mechanisms and processes: This category contains investigations that address mechanisms or

processes thought to increase the adaptivity of a behaving system. Examples are developmental

plasticity, value systems, neurotrophic factors, Hebbian learning, freezing and unfreezing of

mechanical degrees of freedom, increase or decrease of sensory resolution and motor accuracy,

and so on.

The borders of the proposed categories may not be as clearly defined as this classification suggests, and

instances may exist that fall in two or more of those categories. Or even worse, these categories may

appear arbitrary and ad hoc. We believe, however, that a grouping into four primary interest areas is

meaningful for the following reasons: First, the individual categories refer to different contextual sit-

uations; that is, while interactions in a social context typically involve one or more persons or robots,

non-social interactions and agent-related control do not. Second, movements performed during
a socially oriented interaction have a communicative purpose, e.g., language or gestures. Non-social

sensory-motor interactions as well as agent-related control, however, do not bear any communicative

value (unless an object is used as a means of communication). As stated previously, evidence from

developmental psychology suggests that interaction with peers and interaction with objects bear com-

pletely different valences. Third, unlike non-social sensory-motor interactions whose primary purpose

is the active exploration or manipulation of the surrounding environment, agent-related sensory-motor

control is mainly concerned with the exploration of the agent’s own bodily capabilities. Examples

from studies into human development (mainly concerned with motor development) are: rhythmical

stereotypies (Thelen, 1981), general movements (Prechtl, 1997), crawling (Adolph et al., 1998), and

postural control (Bertenthal and Von Hofsten, 1998; Hadders-Algra et al., 1996).

Finally, we note that the last category (mechanisms) groups mechanisms and processes that are

valid for any content domain – be it social or non-social interaction, or agent-related

sensory-motor control. Most of the studies surveyed in this chapter employed a number of mechanisms

and processes either explicitly or implicitly. Hebbian learning and neurotrophic factors, for instance,

are general mechanisms of plasticity. Similarly, value systems can modulate different types of learning,

and guide the self-organization of early movements. We believe that these mechanisms might form a

good basis on which to build a general theory of developmental robotics.


2.5.1 Socially oriented interaction

Studies in social interaction and acquisition of social behaviors in robotic systems have examined a

wide range of learning situations and techniques. Prominent research areas include shared (or joint)

attention, low-level imitation (that is, reproduction of simple and basic movements), language devel-

opment, and social regulation (for an overview and a taxonomy of socially interactive robots, see Fong

et al., 2003). Adopting a developmental stance within this context may indeed be a good idea.

Brian Scassellati, for instance, advocated the application of a developmental methodology as a

means of providing a structured decomposition of complex tasks, which ultimately could facilitate

(social) learning (Scassellati, 1998). In (Scassellati, 2001), he described the early stages of the imple-

mentation in a robot of a hybrid model of shared attention, which in turn was based on a model of the

development of a “theory of mind” 8 proposed by Baron-Cohen (1995). Despite the simplicity of the

robot’s behavioral responses, and the need for more complex social learning mechanisms, this study

represents a first step toward the construction of an artificial system capable of exploiting social cues

to learn to interact with other robots or humans. Another model of joint attention was implemented

by Nagai et al. (2002). The model involved the development of the sensing capabilities of a robot from

an immature to a mature state (achieved by means of a gradual increase of the sharpness of a Gaussian

spatial filter responsible for preprocessing the visual input), and a change of the caregiver’s task eval-

uation criteria, through a decrease of the task error leading to a positive reward for the robot. Along a

similar line of research, Kozima and Yano (2001) studied a “rudimentary” or early type of joint visual

attention displayed by infants. In this case, the robot was able to roughly identify the attentional target

in the direction of the caregivers’s head only when it could simultaneously see both, the caregiver and

the target.
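The gradual increase in filter sharpness used by Nagai et al. can be illustrated with a minimal sketch (a toy parameterization of our own – the sigma schedule, image size, and function names are illustrative assumptions, not the original implementation):

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """1-D Gaussian kernel, normalized to sum to 1."""
    if radius is None:
        radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(image, sigma):
    """Separable Gaussian blur of a 2-D image ('same'-size output)."""
    k = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, tmp)

def developmental_input(image, stage, n_stages, sigma_max=4.0, sigma_min=0.5):
    """Visual input at a given developmental stage: early stages receive a
    heavily blurred image; the blur is gradually relaxed as 'development'
    proceeds (stage = 0 .. n_stages-1)."""
    t = stage / max(1, n_stages - 1)
    sigma = sigma_max + t * (sigma_min - sigma_max)  # linear schedule
    return blur(image, sigma)

# A sharp test pattern: the blurred (immature) percept has much lower
# spatial contrast than the mature one.
img = np.zeros((32, 32)); img[12:20, 12:20] = 1.0
early = developmental_input(img, stage=0, n_stages=10)
late = developmental_input(img, stage=9, n_stages=10)
print(early.std() < late.std())  # True: acuity increases over the stages
```

Here the blur is relaxed on a fixed linear schedule; the original model additionally changed the caregiver's task evaluation criteria over time.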

Joint attention is but one of the factors on which social interaction relies. An architecture of mutually
regulatory human-robot interaction that strives to integrate various factors involved in social exchanges

was described in (Breazeal and Scassellati, 2000). The aim of the suggested framework was to include

perception, attention, motivations, and expressive displays, so as to create an appropriate learning

context for a social infant-like robot capable of regulating on its own the intensity of the interaction.

Although the implementation did not parallel infant development exactly, the authors claimed that

the design of the system was heavily inspired by the role motivations and facial expressions play in

maintaining an appropriate level of stimulation during social interaction of infants with adults (Breazeal

and Scassellati, 2000, p. 51). Human-robot interaction was also the focus in (Dautenhahn and Billard,

1999), where the authors described an example of emergence of global interaction patterns through

exploitation of movement dynamics. The performed experiments were based on an influential theory

8 Theory of mind defines a set of socially-mediated skills relating the individual's behavior in a social context, e.g., detection of eye contact.


of cognitive development advocated by Vygotsky (1962), which proposes that social interactions are

essential for the development of individual intelligence. For a recent review of Vygotsky’s theory of

cognitive development and its relation to socially situated Artificial Intelligence see (Lindblom and

Ziemke, 2003).

Socially situated learning can also be guided by robot-directed speech. In such a case, the robot’s

affective state – and as a consequence its behavior – could be influenced by verbal communication

with a human caregiver. It is perhaps less obvious, but equally important to note that there is no

need to associate a meaning to what is said. Breazeal and Aryananda (2002), for instance, explored

recognition of affective communicative intent through the sole extraction of particular acoustic cues

typical of infant-directed speech (Fernald, 1985). This represents an instance of nonverbal interaction

in which emotional expressions and gestures used by human caretakers shape how and what preverbal

infants learn during social exchanges. Varshavskaya (2002) applied a behavior-based approach to the

problem of early concept and vocal label acquisition in a sociable anthropomorphic robot. The goal

of the system was to generate the kind of vocal output that a pre-linguistic, ten- to twelve-month-old
infant may produce, namely emotive grunts, canonical babbling, which includes the syllables required

for meaningful speech, and a formulaic proto-language (some sort of pre-verbal and pre-grammatical

form of the future language). In the author’s own words, most inspirational for the design of the

proto-language acquisition system was the seminal work by Halliday (1975). Dautenhahn and Billard

(1999) also investigated the synthesis of a robotic proto-language through interaction of a robot with

either a human or a robotic teacher. They were able to show how language can be grounded via a

simple movement imitation strategy. Work at an even more preverbal level was done by Yoshikawa et al. (2003),

who constructed a system – consisting of a microphone, a simplified mechanical model of the human

vocal tract, and a neural network – that had to learn to articulate vowels. Inspired by evidence that

shows how maternal vocal imitation leads to the highest rates of infant vocalization (Pelaez-Nogueras

et al., 1996), the artificial system was trained by having the human teacher imitate the robotic system.

Recently, developmentally inspired approaches to robot imitation have received considerable at-

tention (Andry et al., 2002; Dautenhahn and Nehaniv, 2002; Demiris, 1999; Kuniyoshi et al., 2003).

Typically, in robot imitation studies the robot imitates the human teacher or another robot. This rela-

tionship was turned upside down by Stoica (2001), who showed that imitation of the movements of

a robotic arm by a human teacher could naturally lead to eye-arm coordination as well as to an ade-

quate control of the arm – see also Yoshikawa et al.’s work on speech generation (Yoshikawa et al.,

2003). Many authors have suggested a relatively straightforward two-stage procedure: First, the ar-

tificial system learns to associate proprioceptive or other motor-related sensory information to visual

sensory information and then, while imitating, it exploits the acquired associations by querying for

the motor commands that correspond to the previously perceived sensory information. An example

of a different approach was reported by Demiris and Hayes (2002), who developed a computational


architecture of early imitation used for the control of an active vision head, which was based on the

Active Intermodal Matching hypothesis 9 for early infant imitation proposed by Meltzoff and Moore

(1997). The authors also give an overview of previous work in the field of robotic imitation (for similar

surveys, see Breazeal and Scassellati, 2002; Schaal, 1999).
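The two-stage procedure described above can be made concrete with a small sketch, assuming a nearest-neighbor associative memory and a toy two-link "body" (the forward model, sample counts, and all names are our illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_model(motor):
    """Toy 'body': maps a 2-D motor command (two joint angles of a planar
    two-link arm with unit link lengths) to the 2-D visual position of the
    hand. Stands in for the robot's real physics."""
    return np.array([np.cos(motor[0]) + np.cos(motor[0] + motor[1]),
                     np.sin(motor[0]) + np.sin(motor[0] + motor[1])])

# Stage 1: self-exploration ("babbling") -- associate motor commands with
# the visual consequences they produce.
motor_mem, visual_mem = [], []
for _ in range(2000):
    m = rng.uniform(0, np.pi, size=2)
    motor_mem.append(m)
    visual_mem.append(forward_model(m))
motor_mem, visual_mem = np.array(motor_mem), np.array(visual_mem)

# Stage 2: imitation -- given a demonstrated visual target, query the
# stored associations for the motor command whose outcome was closest.
def imitate(visual_target):
    d = np.linalg.norm(visual_mem - visual_target, axis=1)
    return motor_mem[np.argmin(d)]

demo = forward_model(np.array([0.7, 1.2]))   # teacher's demonstration
m_hat = imitate(demo)                        # retrieved motor command
err = np.linalg.norm(forward_model(m_hat) - demo)
print(err < 0.2)  # True: reproduced hand position is close to the demo
```

The nearest-neighbor table is the simplest possible associative memory; the studies cited above typically use neural networks for the same association, but the two-stage logic is unchanged.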

Learning by imitation offers many benefits (Demiris, 1999; Demiris and Hayes, 2002; Schaal,

1999). A human demonstrator, for instance, can teach a robot to perform certain types of movements by

simply performing them in front of the robot. This strategy drastically reduces the amount of
trial-and-error for the task that the robot is trying to accomplish and consequently speeds up learning (Schaal,

1999). Furthermore, it is possible to teach new tasks to robots by interacting naturally with them. This

possibility is appealing, because it might lead to open-ended learning not constrained by any particular

task or environment.

All studies reviewed thus far presuppose in one way or another a set of basic sensory-motor skills

(such as gazing, pointing or reaching) deemed important for social exchanges of any kind. Stated dif-

ferently, for embodied systems to behave and interact – socially and non-socially – in the real world,

an appropriate coordination of perception and action is necessary. It is becoming commonly accepted

that action and perception are tightly intertwined, and that the refinement of this coupling is the out-

come of a gradual developmental process (e.g., Thelen and Smith, 1994). The following subsection

will review studies that attempt to deepen our understanding of the link between perception and action

in a non-social context.

2.5.2 Non-social interaction

Sensing and acting are tied to each other. Accurate motor control would not be possible without per-

ception, and vice versa, purposive vision would not be feasible without adequate control of actions.

In the last decade or so, neurophysiologists have discovered a number of multi-sensory and

sensory-motor areas. Building models of the processing performed by those areas might be a challenging
research endeavor, but more importantly, it should cast serious doubt on the way the problem of

perception has been traditionally understood by the Artificial Intelligence (AI) community, that is, as a

process of mapping sensory stimulation onto internal symbolic representations (particularly as young

children presumably do not have “symbols” well developed 10). We have already given some hints

that this has changed. More work is certainly required in order to get a better grasp on the mechanisms

of perception and how they are linked to action.

The coordination of action and perception is of particular importance for category learning. Tradi-

tionally, the problem of categorization has been investigated by employing disembodied categorization

9 The hypothesis suggests that infants try to match visual information against appropriately transformed proprioceptive information.

10 Thanks to an anonymous reviewer for pointing this out.


models. A growing body of evidence supports, however, a more interactive, dynamic, and embod-

ied view of how categories are formed (Lakoff and Johnson, 1999; Nolfi and Floreano, 2000; Pfeifer

and Scheier, 1999). In essence, as suggested by Dewey (1896) more than one century ago, catego-

rization can be conceptualized as a process of sensory-motor coordinated bodily interaction with the

real world. Embodied models of categorization are not passively exposed to sensory data, but through

movements and interaction with the environment, they are able to generate “good” sensory data, for

example by inducing time-locked spatio-temporal correlations within one sensory modality or across

various sensory modalities (Lungarella and Pfeifer, 2001; Lungarella and Sporns, 2004; Pfeifer and

Scheier, 1997; Sporns and Pegors, 2004; Te Boekhorst et al., 2003) (cf. “principle of information

self-structuring” previously introduced).

Categorization of objects via real-time correlation of temporally contingent information impinging

on the haptic and the visual sensors of a mobile robot was achieved by Scheier and Lambrinos (1996),

for instance. The suggested control architecture employed sensory-motor coordination at various func-

tional levels – for saccading on interesting regions in the environment, for attentional sensory-motor

loops, and for category learning. Sensory-motor activity was also critical in work performed by Krich-

mar and Edelman (2002), who studied the role played by sensory experience for the development of

perceptual categories. In particular, the authors showed that the overall frequency and temporal order

of the perceptual stimuli encountered had a definite influence on the number of neural units devoted

to a specific object class. This result is confirmed by research on experience-dependent neural plastic-

ity (see Stiles, 2000, for a recent view).

A few other examples of the application of a developmental approach to the acquisition of visuo-

motor coordinations exist. Marjanovic et al. (1996), for instance, were able to show how acquired

oculomotor control (saccadic movements) could be reused for learning to reach or point toward a visu-

ally identified target. A similar model of developmental control of reaching was investigated by Metta

et al. (1999). The authors concluded that early motor synergies might speed up learning and consid-

erably simplify the problem of the exploration of the workspace (see also Pfeifer and Scheier, 1997).

They also pointed out that control and learning should proceed concurrently rather than separately –

as it is the case in more traditional engineering approaches. These studies complement those on the

development of joint attention, discussed in the previous section. Berthouze and colleagues employed

the tracking of a pendulum to teach an active vision head simple visual skills such as gaze control, and

saccading eye movements (Berthouze et al., 1997; Berthouze and Kuniyoshi, 1998). Remarkably, the

robot even discovered its “own vestibulo-ocular reflex.” The approach capitalized on the exploitation

of the robot-environment interaction for the emergence of coordinated behavior. Non-social, object-

related sensory-motor interaction was also central in the study performed by Metta and Fitzpatrick

(2003). Starting from a reduced set of hypotheses, their humanoid system learned – by actively poking

and prodding objects (e.g., a toy car or a bottle) – to associate particular actions with particular object


behaviors (e.g., a toy car rolls along if pushed appropriately, while a bottle tends to roll sideways).

Their results were in accordance with the theory of affordances by Gibson (1977).

A different research direction was taken by Coehlo et al. (2001). They proposed a system archi-

tecture that employed haptic categories and the integration of tactile and visual information in order to

learn to predict the best type of grasp for an observed object. Relevant in this case is the autonomous

development of complex visual features starting from simple behavioral primitives.

Weng et al. (2000) reported on a developmental algorithm tested on a robot, which had to learn

to navigate on its own in an unknown indoor environment. The robot was trained interactively, that

is, on-line and in real time, via direct touch of one of the 28 touch sensors located on the robot's body.

By receiving some help and guidance from a human teacher, the algorithm was able to automatically

develop touch-guided motor behaviors and, according to the authors, some kind of low-level vision.

2.5.3 Agent-related control

As discussed in Section 2.4.5, self-exploration plays a salient role in infancy. The emergence and

tuning of sensory-motor control is hypothesized to be the result of the exploration of the perceptual

consequences of infants’ self-produced actions (Rochat and Striano, 2000). Similarly, an agent may

attain sensory-motor control of its bodily capabilities by autonomous exploration of its sensory-motor

space. A few instances of acquisition of agent-related control in robots exist.

Inspired by findings from developmental psychology, Berthouze et al. (1998) realized a system that

employed a set of basic visuo-motor (explorative) behaviors to generate sensory-motor patterns, which

were subsequently categorized by a neural architecture capable of temporal information processing.

Following a similar line of research, Kuniyoshi et al. (2003) developed a visuo-motor learning system

whose goal was the acquisition of neonatal imitation capabilities through a self-exploratory process of

“body babbling” (Meltzoff and Moore, 1997). As in (Berthouze et al., 1998), the proposed neural archi-

tecture was also capable of temporal information processing. An agent-related (not object-related) type

of categorization is also reported in (Berthouze and Kuniyoshi, 1998). The authors used self-organizing

Kohonen maps to perform an unsupervised categorization of sensory-motor patterns, which emerged

from embodied interaction of an active vision system with its environment. The self-organization pro-

cess led to four sensory-motor categories consisting of horizontal, vertical, and “in-depth” motions,

and a not clearly defined intermediate category.
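As an illustration of this kind of unsupervised categorization, the following sketch trains a small one-dimensional Kohonen map on noisy synthetic "motion" vectors (the three prototype motions, the map size, and the decay schedules are toy assumptions of ours, not the setup of the original study):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic sensory-motor patterns: noisy 3-D vectors standing in for
# horizontal, vertical, and in-depth motions.
prototypes = np.array([[1.0, 0.0, 0.0],   # horizontal
                       [0.0, 1.0, 0.0],   # vertical
                       [0.0, 0.0, 1.0]])  # in-depth
def sample():
    return prototypes[rng.integers(3)] + 0.1 * rng.normal(size=3)

# A small 1-D Kohonen map: units laid out on a line, each with a weight
# vector in input space.
n_units = 8
W = rng.normal(scale=0.1, size=(n_units, 3))
for t in range(3000):
    x = sample()
    winner = np.argmin(np.linalg.norm(W - x, axis=1))
    # Learning rate and neighborhood radius both decay over "development".
    lr = 0.5 * (1 - t / 3000)
    radius = 3.0 * (1 - t / 3000) + 0.5
    h = np.exp(-((np.arange(n_units) - winner) ** 2) / (2 * radius ** 2))
    W += lr * h[:, None] * (x - W)   # pull the neighborhood toward the input

# After training, each motion class recruits its own region of the map.
winners = [int(np.argmin(np.linalg.norm(W - p, axis=1))) for p in prototypes]
print(winners)  # indices of the units recruited by the three motion classes
```

The decaying neighborhood radius plays the same role as the developmental narrowing of plasticity discussed throughout this section: broad, undifferentiated responses early on, followed by progressive specialization.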

Morphological changes (e.g., body growth, changes of visual acuity and visual resolution) rep-

resent one of the most salient characteristics of an ongoing developmental process. Lungarella and

Berthouze (2002a) investigated the role played by such changes for the acquisition of motor skills by

using a small-sized humanoid robot that had to learn to pendulate, i.e., to swing like a pendulum (cf.

Chapter 3 and Chapter 4). The authors attempted to understand whether physical limitations and con-


straints inherent to body development could be beneficial for the exploration and selection of stable

sensory-motor configurations (see also Turkewitz and Kenny, 1982; Bjorklund and Green, 1992). In

order to validate the hypothesis, Lungarella and Berthouze (2002a,b) performed a comparative analysis

between the use of all bodily degrees of freedom from the very start, and the progressive involvement

of all degrees of freedom by employing a mechanism of developmental freezing and unfreezing of

degrees of freedom (Bernstein, 1967). In a follow-up case-study (Lungarella and Berthouze, 2002c),

the same authors investigated the hypothesis that inherent adaptivity of morphological changes leads to

behavioral characteristics not obtainable by mere value-based regulation of neural parameters (Chapter

4). The authors were able to provide evidence for the claim that in learning a motor task, a reduction

of the number of available biomechanical degrees of freedom helps stabilize the interplay between
environmental and neural dynamics (the way patterns of activity in the neural system change with time).

They showed that the use of all available degrees of freedom from the start reduced the likelihood of

the occurrence of physical entrainment, i.e., mutual regulation of body and environmental dynamics.

In turn, lack of entrainment led to a reduced robustness of the system against environmental perturba-

tions. Conversely, by initially freezing some of the available degrees of freedom, physical entrainment

and thus robust oscillatory behavior could occur.
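The freezing/unfreezing mechanism can be sketched as a simple scheduling layer over motor exploration (the class name, release order, and stage length below are our illustrative assumptions, not the authors' implementation):

```python
import numpy as np

class DOFSchedule:
    """Staged freezing/unfreezing of mechanical degrees of freedom (in the
    spirit of Bernstein, 1967): exploration starts with most joints clamped
    to a resting posture and releases them in stages."""
    def __init__(self, rest_posture, release_order, steps_per_stage):
        self.rest = np.asarray(rest_posture, dtype=float)
        self.order = release_order          # joint indices, first-released first
        self.steps = steps_per_stage

    def mask(self, t):
        """Boolean mask of the currently free joints at exploration step t."""
        n_released = min(1 + t // self.steps, len(self.order))
        free = np.zeros(len(self.rest), dtype=bool)
        free[self.order[:n_released]] = True
        return free

    def clamp(self, command, t):
        """Project a motor command: frozen joints are held at rest."""
        return np.where(self.mask(t), command, self.rest)

# Usage: a 4-DOF arm; proximal joints (0, 1) are released before distal
# ones (2, 3), and frozen joints are clamped during random exploration.
sched = DOFSchedule(rest_posture=[0.0, 0.0, 0.0, 0.0],
                    release_order=[0, 1, 2, 3], steps_per_stage=100)
rng = np.random.default_rng(2)
cmd = rng.uniform(-1, 1, size=4)
print(sched.clamp(cmd, t=0))    # only joint 0 is free; joints 1-3 held at 0
print(sched.mask(t=350).sum())  # 4: all joints released by the fourth stage
```

In the comparative analysis described above, the "all degrees of freedom from the start" condition corresponds to an always-true mask, while the developmental condition corresponds to a schedule of this kind.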

Another instance of agent-related sensory-motor control was reported by Lungarella and Berthouze

(2003) (Chapter 5). Inspired by a study of how infants strapped in a Jolly Jumper learn to bounce (Gold-

field et al., 1993), the authors performed a series of experiments with a bouncing humanoid robot (see

Fig. 2.2), aimed at understanding the mechanisms and computational principles that underlie the
emergence of movement patterns via self-exploration of the sensory-motor space (such as entrainment). The

study showed that a suitable choice of the coupling constant between limb segments, as well as of the

gain of the sensory feedback induced a reduction of movement variability, an increase in bouncing

amplitude, and led to movement stability. The authors attributed the result to the entrainment of body

and environmental dynamics. Taga (1995) reported a similar finding in the case of biped walking.

2.5.4 Mechanisms and processes

A few mechanisms, such as freezing and unfreezing of degrees of freedom, or physical entrainment,

have already been discussed in the previous section. Other developmentally relevant mechanisms exist.

Some of them are related to changes in morphological parameters, such as sensor resolution, and motor

accuracy, some of them affect neural parameters, such as the number of neurons constituting the neural

system. Dominguez and Jacobs (2003) and Nagai et al. (2002), for instance, describe systems that

start with an impoverished visual input whose quality gradually improves as development (or learning)

progresses. In this section, we discuss two additional mechanisms.


Value system

Learning is modulated by value systems. A learning technique in which the output of the value sys-

tem modulates learning itself is called value-based or value-dependent learning. Unlike reinforcement

learning techniques (which provide an interesting set of computational principles), value-based learn-

ing schemes typically specify the neural mechanisms by which stimuli can modulate learning, and by

which organisms sense the consequences of their actions (Sporns, 2003, 2004) (see also Pfeifer and

Scheier, 1999, ch.14). Another difference between the two learning paradigms is that typically – in

reinforcement learning – learning is regulated by a (reinforcement) signal given by the environment,

whereas in value-based learning, the (value) signal is provided by the agent itself (self-teaching). A

number of value systems have been realized in robotic systems. In those implementations the value

system either plays the role of an internal mediator of salient environmental stimuli and events (Al-

massy et al., 1998; Krichmar and Edelman, 2002; Scheier and Lambrinos, 1996; Sporns et al., 2000;

Sporns and Alexander, 2002), or is used to guide some sort of exploratory process (Lungarella and

Berthouze, 2002c; Steels, 2003) (cf. chapter 3, 4, and 6 of this thesis).
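The distinction between externally supplied reinforcement and a self-generated value signal can be illustrated with a minimal value-modulated Hebbian rule (the saliency criterion, network size, and learning rate are toy assumptions of ours, not any of the cited architectures):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy sensory-motor network: weights from 4 sensory units to 2 motor units.
W = np.zeros((2, 4))
eta = 0.1

def hebbian_step(W, s, m, value):
    """Value-modulated Hebbian update: the correlational term s*m is gated
    by a scalar value signal generated by the agent itself (self-teaching),
    rather than by an external reinforcement signal."""
    return W + eta * value * np.outer(m, s)

# The value system fires (value = 1) only for "salient" episodes -- here,
# arbitrarily, whenever sensory unit 0 is strongly active.
for _ in range(500):
    s = rng.random(4)
    m = rng.random(2)
    value = 1.0 if s[0] > 0.8 else 0.0
    W = hebbian_step(W, s, m, value)

# Connections from the value-gated sensory channel grow the most.
print(int(np.argmax(W.sum(axis=0))))  # 0
```

The gating implements the "learn only while exploring objects" behavior of Scheier and Lambrinos' robot in caricature: plasticity is switched on only in episodes the value system deems salient.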

Almassy et al. (1998) constructed a simulated neural model embedded in an autonomous real-

world device, one of whose four components was a “diffuse and ascending” value system. The value

signals were used to modify the strength of the connections from the neurons of the visual area to the

ones of the motor area. One of the results of these value-dependent modifications was that without any

supervision, appropriate behavioral actions could be linked to particular responses of the visual system.

A similar model system was described by Krichmar and Edelman (2002). Compared to previous work,

the modeled value signal had two additional features: a prolonged effect on synaptic plasticity, and

the presence of time-delays (Krichmar and Edelman, 2002, p. 829). Another instantiation of a value

system is described in (Scheier and Lambrinos, 1996; Pfeifer and Scheier, 1997). In this case, the

output of the value system was used to modulate Hebbian learning – yet another crucial mechanism.

Essentially, the robot was allowed to learn only while it was exploring objects. Sporns and Alexander

(2002) tested a computational model of a neuromodulatory system 11 – structurally and functionally

similar to the mammalian dopamine and noradrenaline system – in an autonomous robot. The model

comprised two neuromodulatory components mediating the effect of rewards and of aversive stimuli.

According to the authors, value signals played a dual role in synaptic plasticity, in that they not only

had to modulate the strength of the connection between sensory and motor units, but they were also

responsible for the change of the response properties of the value system itself.

In contrast to the previous cases, where the value system was used to modulate learning, in (Lun-

garella and Berthouze, 2002c) the value system was employed to guide the exploration of the parameter

11 Neuromodulatory systems are instantiations of value systems that find justification in neurobiology. Examples include the dopaminergic and the noradrenergic systems.


space associated with the neural system of a robot that had to learn to pendulate (cf. chapters 3, 4, and

6 of this thesis).

Developmental plasticity

Plasticity is an important ontogenetic mechanism that contributes to the adaptivity of brain, body and

behavior in response to internal and external variations. The developing brain, for instance, is continu-

ously changing (in terms of number of neurons, amount of interconnections, wiring patterns, synaptic

plasticity, and so on) and these changes are in part experience-dependent. Such neural plasticity gives

our neural circuitry the potential of acquiring (given appropriate training) nearly any function (O’Leary

et al., 1994). A similar characteristic holds for plasticity of body and behavior.

The study of a neural model incorporating mechanisms of neural plasticity was conducted by Al-

massy et al. (1998) (for more examples, see Sporns, 2004). In particular, the authors analyzed how

environmental interactions of a simulated neural model embedded in a robot may influence the initial

formation, the development, and the dynamic adjustment of complex neural responses during sensory

experience. They observed that the robot’s self-generated movements were crucial for the emergence,

and development of selective and translation-invariant visual cortical responses, because they induce

correlations in various sensory modalities. Another result was the development of a foveal prefer-

ence, that is, the system showed “stronger visual responses to objects, presented closer to the visual

fovea” (Almassy et al., 1998, p.358).

A further example of synthetic neural modeling is illustrated in (Elliott and Shadbolt, 2001). The

authors studied the application of a neural model, featuring “anatomical, activity-dependent, develop-

mental synaptic plasticity” (p. 167), to the growth of sensory-motor maps in a robot whose task was

to avoid obstacles. They showed that the deprivation of one or two (infrared-light) receptors could

be compensated for by a mechanism of developmental plasticity, which according to the authors would

allow the nervous system to adapt to the body as well as to the external environment in which the body

resides.

2.5.5 Intermediate discussion

We can make a number of observations. Almost 40% of the studies reviewed (11 out of 29) fell in the

category labeled “social interaction” (see Table 2.3). Apparently, this category constitutes a primary

direction of research in developmental robotics. This result is confirmed by the fact that lately a lot of

attention has been directed toward designing socially interactive robots. In a recent and broad overview

of the field, Fong et al. (2003) tried to understand the reasons behind the growing interest in socially

interactive robotics. They concluded that social interaction is desirable in cases where robots mediate
human-human (peer-to-peer) interactions (robot as persuasive machine) or where robots function


as a representation of, or representative for, the human (robot as avatar 12). It is plausible to assume

that in order to acquire more refined and advanced social competencies – e.g., deferred imitation 13 –

a robot should undergo a process of progressive development of its social skills analogous to that of

humans. Fong and his colleagues share the same opinion.

It is further interesting to note that many of the studies considered here examine to some extent the

sensory-motor competence in interacting with the local environment – in particular, basic visuo-motor

competencies such as saccading, gaze fixation, joint attention, hand-eye coordination, and visually-

guided reaching. Brooks (2003) stressed the “crucial” importance of basic social competencies (e.g.,

gaze-direction or determination of gaze-direction) for peer-to-peer interactions. Early motor compe-

tencies are a natural prerequisite for the development of basic social competencies. We were able,

however, to single out only a few studies that have attempted to go beyond pointing, reaching, or

gazing, i.e., early motor competencies. This issue is closely related to the notoriously hard problem

of learning to coordinate the many degrees of freedom of a potentially redundant nonlinear physical

system, and indeed, imitation learning may represent a suitable route to its solution (Schaal, 1999).

Another way out of the impasse may be to exploit processes of self-exploration of the sensory-motor

system and its intrinsic dynamics. The usage of self-exploration is explicitly advocated in four of the

surveyed studies (Andry et al., 2002; Berthouze et al., 1998; Kuniyoshi et al., 2003; Lungarella and

Berthouze, 2002c), and presumably has also been employed implicitly in other studies.

From a developmental perspective, learning multi-joint coordinations or acquiring complex mo-

tor skills may benefit from the introduction of initial morphological (sensor, motor, and neural) con-

straints, which over time are gradually released (Scassellati, 2001; Lungarella and Berthouze, 2002b;

Nagai et al., 2002). In the same context, mechanisms of physical and neural entrainment, that is, mu-

tual regulation between environment and the robot’s neural and body dynamics, as well as value-based

self-exploration of body and neural parameters, deserve further investigation. A pioneering attempt

to capitalize on the coupling between the body, neural, and environmental dynamics was promoted

by Taga (1991). In his model of biped walking, he showed how movements could emerge from a

global entrainment 14 among the activity of the musculo-skeletal system and the surrounding envi-

ronment. The study was performed, however, only in simulation. Williamson (1998) used two real

robot arms to investigate a similar issue. He claimed that his approach would make it possible to achieve general oscillatory motion and more complex rhythmic tasks by exploiting the coupled dynamics of an oscillator system and the arm dynamics (p. 1393). Two obvious shortcomings of his investigation

were the absence of learning and of a developmental framework. Lungarella and Berthouze (2002c,

12 Remote-presence robots may indeed be one of the killer applications of robotics in the near future (Brooks, 2003, p. 135).
13 Imitation that takes place a certain amount of time after the demonstration by the teacher.
14 “Since the entrainment has a global characteristic of being spontaneously established through interaction with the environment, we call it global entrainment” (Taga, 1991, p. 148).


2003), building on previous research, attempted to capitalize on the interplay between neural plasticity,

morphological changes, and entrainment to the dynamics of body and task (chapters 3 and 4).
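The kind of neural-body coupling exploited by Taga and Williamson can be illustrated with a minimal sketch. This is our own toy illustration, not a reproduction of either model: a Matsuoka-style oscillator of two mutually inhibiting neurons drives a damped pendulum, while the pendulum angle feeds back into the oscillator. All parameter values and names are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of neural-body entrainment (illustrative only):
# two mutually inhibiting neurons with self-adaptation drive a damped
# pendulum through antagonistic "muscles"; proprioceptive feedback
# (the pendulum angle) enters the neural dynamics and entrains them
# to the body.

def step(state, dt=0.001, tau=0.05, tau_a=0.6, beta=2.5, w=2.0,
         tonic=1.0, gain=1.5, k_fb=1.0):
    u1, u2, v1, v2, theta, omega = state
    y1, y2 = max(0.0, u1), max(0.0, u2)    # rectified firing rates
    fb = k_fb * theta                       # proprioceptive feedback
    du1 = (-u1 - beta * v1 - w * y2 + tonic - fb) / tau
    du2 = (-u2 - beta * v2 - w * y1 + tonic + fb) / tau
    dv1 = (-v1 + y1) / tau_a                # slow self-adaptation
    dv2 = (-v2 + y2) / tau_a
    torque = gain * (y1 - y2)               # antagonistic actuation
    domega = -9.81 * np.sin(theta) - 0.5 * omega + torque
    return np.array([u1 + dt * du1, u2 + dt * du2, v1 + dt * dv1,
                     v2 + dt * dv2, theta + dt * omega,
                     omega + dt * domega])

state = np.array([0.1, 0.0, 0.0, 0.0, 0.2, 0.0])
for _ in range(20000):                      # 20 s of simulated time
    state = step(state)
print("final pendulum angle: %.3f rad" % state[4])
```

Because the feedback term couples the body state into the neural dynamics, the resulting rhythm is a compromise between the oscillator's intrinsic frequency and the pendulum's natural frequency; setting k_fb to zero decouples the two systems.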

Autonomy, a thorny concept without a generally accepted definition (e.g., Pfeifer and Scheier,

1999), is another research theme in need of further investigation. Loosely speaking, an autonomous

system must be self-contained and independent from external control. Thus, in such a system, the

mechanisms and processes that mold local structure to yield global function must reside entirely within

the system itself (Sporns, 2003). Autonomy is no easy feat. An autonomous robot should also be endowed with an initial set of values and drives, i.e., motivations or needs to act and interact with the

environment. The role of the value system and of the motivational system is to mediate learning, pro-

mote parameter exploration, drive action selection, and regulate social interactions (Blumberg, 1996;

Breazeal and Scassellati, 2000). Concerning the value system, an important issue will have to be ad-

dressed in future work, that is, how specific or general the system of values and motivations needs to

be in order to bootstrap adaptive behavior. In current implementations, values and motivations are relatively simple: light is better than no light, or seek face-like blobs while avoiding non-face-like blobs. In

essence, the issue boils down to the choice of the initial set of values and drives. But, how much has to

be predefined, and how much should be acquired?
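To make the gating role of such a simple value system concrete, consider the following toy sketch. The update rule, names, and parameters are our own assumptions and are not taken from any of the surveyed systems.

```python
import random

# Toy sketch of value-gated learning (illustrative only): a scalar
# value signal -- here a hard-wired drive rewarding co-active sensor
# and motor events, in the spirit of "light is better than no light"
# -- gates a Hebbian update of a single sensorimotor weight.

def value_gated_update(w, pre, post, value, lr=0.1, decay=0.01):
    """Hebbian update modulated (gated) by the value signal."""
    return w + lr * value * pre * post - decay * w

random.seed(0)
w = 0.0
for _ in range(200):
    pre, post = random.random(), random.random()  # sensor/motor activity
    value = 1.0 if (pre > 0.5 and post > 0.5) else 0.0
    w = value_gated_update(w, pre, post, value)
print("learned weight: %.3f" % w)
```

Note that the value function itself is the predefined part; the open question raised above is precisely how such valences could themselves be acquired and refined rather than hand-coded.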

Finally, we note that while the spectrum of outstanding research issues, as well as the complexity of

the available robots, have considerably increased over the past few years, not many “developmentally

inspired” reconnaissance tours into unexplored research directions have been started yet. There is, for

instance, only a single study on navigation that tried to employ developmental mechanisms (Weng

et al., 2000), and there are no studies at all on robot locomotion!

2.6 Developmental robotics: existing theoretical frameworks

Early theorization of developmental robotics can be traced back to work on behavior-based robotics,

physical embodiment, situatedness, and sensory-motor coupling with the environment (Brooks, 1991;

Brooks and Stein, 1994; Rutkowska, 1995). En route to understanding human intelligence by building

robots, Sandini et al. (1997) were among the first to recognize the importance of taking into account

development. They called their approach “Developmental Engineering.” As in traditional engineering,

the approach is directed toward the definition of a theory for the construction of complex systems. The

main objective is to show that the adoption of a framework of biological development can be success-

fully employed for constructing artificial systems. Metta (2000) pointed out that this methodology can

be envisaged as some sort of new tool for exploring developmental cognitive sciences. Such a new tool

could have a similar role to the one that system and control theory had for the analysis of human move-

ments. The authors investigated some of the aspects of visuo-motor coordination in a humanoid robot

called Babybot (see Fig. 2.2). Issues, such as the autonomous acquisition of skills, the progressive


increase of the task complexity (by increasing the visual resolution of the system), and the integration

of various sensory modalities, were also explored (Panerai et al., 2002; Natale et al., 2002). Recently,

the same group also produced a manifesto of developmental robotics outlining various aspects relevant

to the construction of complex autonomous systems (Metta et al., 2001). The article maintained that

the ability to recognize progressively longer chains of cause-effect relationships could be one possible way of characterizing learning in an “ecological context”, because in a natural setting no teacher

can possibly provide a detailed learning signal and enough training data (e.g., in motor learning the

correct activation of all muscles, proper torque values, and so on). For another recent manifesto of

developmental robotics, see (Elliott and Shadbolt, 2003).

Around the same time as Sandini, Ferrell and Kemp (1996) as well as Brooks (1997) argued that

development could lead to new insights into the issues of cognitive and behavioral scaling. In an ar-

ticle titled “Alternative Essences of Intelligence”, Brooks et al. (1998) explored four “intertwined key

attributes” of human-like intelligent systems, that is, development, embodiment, social interaction, and

multisensory integration. They made the following assumptions (implicitly negating three central be-

liefs of classical AI): (a) human intelligence is not as general purpose as usually thought; (b) it does

not require a monolithic control system (for the existence of which there is no evidence); and (c) in-

telligent behavior does not require a centrally stored model of the real world. The authors, drawing

inspiration from developmental neuroscience and psychology, performed a series of experiments in

which their humanoid robot(s) had to learn some fundamental sensory-motor and social behaviors (see

also Sec. 2.5). The same group also tried to capitalize on the concept of bootstrapping of skills from

previously acquired skills, i.e., the layering of new skills on top of existing ones. The gradual increase

in complexity of task-environment, sensory input (through the simulation of maturational processes),

as well as motor control, was also explored in tasks such as learning to saccade and to reach toward

a visually identified target (Marjanovic et al., 1996). Scassellati (1998, 2001) proposed that a devel-

opmental approach, in humans and in robots, might provide a useful structured decomposition when

learning complex tasks – or in his own words, “building systems developmentally facilitates learning

both by providing a structured decomposition of skills and by gradually increasing the complexity of

the task to match the competency of the system” (Scassellati, 2001, p. 29).

Another example of the novel and developmentally inspired approach to robotics was given

by Asada et al. (2001). The authors proposed a theory for the design and construction of humanoid

systems called “Cognitive Developmental Robotics.” One of the key aspects of cognitive developmental robotics is to avoid implementing the robot's control structure “according to the designer's understanding of the robot's physics,” and to have the robot instead acquire its own understanding “through interaction with the environment” (Asada et al., 2001, p. 185). This methodology departs from traditional control engineering, where the designer of the system imposes the structure of the controller.

In cognitive developmental robotics in particular, and in developmental robotics in general, the robot


has to get to grips with the structure of the environment and behavior, rather than being endowed a

priori with an externally designed structure. Cognitive developmental robotics also points at how to

“prepare” the robot’s environment to progressively teach the robot new and more complex tasks with-

out overwhelming its artificial cognitive structure. This technique is called scaffolding, and parents or

caretakers often employ it to support, shape, and guide the development of infants (Sec. 2.4.10).

A last example of existing theories in developmental robotics is “Autonomous Mental Develop-

ment” (Weng et al., 2001). Autonomous mental development differs from the traditional engineering

paradigm of designing and constructing robots in which the task is “understood by the engineer”, be-

cause the machine has to develop its own understanding of the task. According to this paradigm, robots

should be designed to go through a long period of autonomous mental development, from infancy to

adulthood. Autonomous mental development relegates the human to the role of teaching and supporting the robot through reinforcement signals. The requirements for truly mental development include being non-task-specific, because the task is generally unknown at design time. For the same reason, the artificial brain has to develop a representation of the task, which could not possibly be embedded

in advance into the robot by the designer.

2.7 Discussion

One of the big outstanding research issues on the agenda of researchers of AI and robotics is how to

address the design of artificial systems with skills that go beyond “single-task” sensory-motor learning.

The very search for flexible, autonomous, and open-ended multi-task learning systems is, in essence, a

particular re-instantiation of the long-standing search for general-purpose (human-like) artificial intel-

ligence. In this respect, developmental robotics does not differ from other approaches, and embraces

a variation on the same theme. Yet – as some other scholars of the field – we speculate that the

rapprochement of robotics and developmental psychology may represent both a crucial element of a

general constructive theory for building intelligent systems, and a prolific route to gain new insights

into the nature of intelligence.

The modern view on AI notwithstanding (e.g., Pfeifer and Scheier, 1999), hand-designing autonomous intelligent systems remains an extremely difficult enterprise. It is so challenging that the AI community is starting to resign itself to the possibility that, with the current models of intelligence, it may even be

impossible in principle. In fact, many to date believe that all proposed frameworks may have multi-

ple shortcomings. It is probably false to assume, for instance, that by merely simulating enough of

the right kind of brain, intelligence will ‘automagically’ emerge. In other words, enough quantitative

change may not necessarily lead to a qualitative change (e.g., De Garis et al., 1998). It is likely that

some fundamental principles still remain to be understood. Brooks (1997, 2003), for instance, has

hypothesized that our current scientific understanding of living things may be lacking some yet-to-be-discovered fundamental mathematical description – Brooks calls it provocatively the “juice” – that

is preventing us from grasping what is going on in living systems. We believe that a developmental

approach may provide a way to gracefully tackle the problem of finding Brooks’s juice. The mere

observation that almost all biological systems – to different extents – mature and develop, bears the

compelling message that development is the main reason why the adaptivity and flexibility of living organisms transcend those of artificial systems. In this sense, the study of the mechanisms underlying postnatal development might provide the key to a deeper understanding of biological systems in

general and of intelligent systems in particular. In other words, although it might be interesting from

an engineering perspective, we have not yet succeeded in designing intelligent systems that are able to

cope with the contingencies of the real world – the reason being that we do not understand many of

the mechanisms underlying intelligent behavior yet. Thus, we are basically trying to learn from nature, which in millions of years of evolution has come up with ontogenetic development. In a possible next

step, the designer's commitments could be pushed even further back (evolutionarily speaking), by designing only the mechanisms of genetic regulatory networks and artificial evolution, and letting everything

evolve (Nolfi and Floreano, 2000).

But what can a developmental approach do? Can it help us construct intelligent machines? The ra-

tionale is that having a complex process (development) gradually unfold in a complex artificial system

(e.g., humanoid robot) can inform our understanding of an even more complex biological system (e.g.,

human brain). Development is a historical process, in the course of which – through mutual coupling

and through interaction with the environment – new and increasingly complex levels of organization

appear and disappear. That is, adult skills do not spring up fully formed but emerge over time (see

Sec. 2.4.1). Thus, at least in principle, it should be possible to decompose the developmental progression into a sequence of increasingly complex activity patterns that facilitate learning from the point

of view of the artificial system, and analysis and understanding on the side of the designer. Moreover,

development provides constraints and behavioral predispositions that, combined with a general state of “bodily immaturity”, seem to be a source of flexibility and adaptivity (see Sec. 2.4.2 and Sec. 2.4.7).

Newborn infants, for instance, despite being restricted in many ways, are tailored to the idiosyncrasies

of their ecological niche – even to the point of displaying a rich set of adaptive biases toward social

interaction. Another contribution to the adaptivity of the developing system comes from its morpho-

logical plasticity, i.e., changes over time of sensory resolution, motor accuracy, mass of muscles and

limbs, and so on.

The message conveyed is one of the basic tenets of a developmental synthetic methodology: The

designer should not try to engineer “intelligence” into the artificial system (in general an extremely

hard problem); instead he or she should try to endow the system with an appropriate set of basic mech-

anisms for the system to develop, learn, and behave in a way that appears intelligent to an external

observer. As many others before us, we advocate the reliance on the principles of emergent functionality (Rutkowska, 1994) and self-organization (see Sec. 2.4.3), which are essential features of biological

systems at any level of organization.

According to Rosen (1991), the formulation of a theory about the functioning of ‘something’ (e.g.,

living cells, artificial neural networks, and so forth) entails at least two problems. The first one, called

the “physiology problem”, relates to the mechanisms that underlie the functioning of this “something.”

The second one, the “construction problem”, addresses the identification of the basic building blocks

of the system. This identification is extremely difficult, because in general it is not obvious which

of the many possible decompositions is the correct one for describing the system as a whole. Here

development comes to the rescue. During ontogenesis, the different factors (the building blocks) are integrated into a functioning whole (the system). By studying how a system is actually assembled, we have

automatically (by default) a suitable decomposition. The understanding acquired from comprehending

development can be applied to both situations, that is, it can help us solve both the physiology and the construction problem. A real understanding of “life itself” (borrowing from Rosen) might come

only through the formulation of a constructive theory.

As is evident from the survey given above, two important aspects of living systems that developmental robotics has to date not sufficiently addressed are morphology and materials. In order to

understand cognition, however, we cannot confine our investigations to the mere implementation of

control architectures and the “simulation” of morphological changes (see Pfeifer, 2000). If robots are

to be employed as ‘synthetic tools’ to model biological systems, we need to consider also physical

growth, change of shape and body composition, as well as material properties of sensors and actuators.

In this respect, despite not being explicitly inspired by developmental issues, the field of modu-

lar reconfigurable robotics is of some relevance for developmental robotics (e.g., Rus and Chirikjian,

2001). Murata et al. (2001), for instance, provided a taxonomy of reconfigurable, redundant and re-

generative systems, and maintained that this kind of machine represents the ultimate form of reliable

systems. Ideally, these systems should be able to produce any element in the system by themselves.

Up to now, there are no working examples of such systems. It is interesting to note that the description

given by Murata et al. bears some resemblance to the definition of “autopoietic” systems given by Mat-

urana and Varela (1998): “An autopoietic system is organized as a network of processes of production

(synthesis and destruction) of components such that these components (a) continuously regenerate and

realize the network that produces them, and (b) constitute the system as a distinguishable unity in the

domain in which they exist” (see also Beer, 2004; Luisi, 2003). An example of an autopoietic system is the cell, which consists of a membrane and of the machinery for protein synthesis. From the point

of view of applications, the relevance of robots that have self-repair capabilities, or that can adapt their

body shape to the task at hand is evident; and indeed, the robotics community has recently started to

address these issues (Hara and Pfeifer, 2003; Teuscher et al., 2003). From a theoretical point of view,

however, it will be important to develop computational paradigms capable of describing and managing


the complexity of a robot body that changes over time.

As far as material properties are concerned, current technology is lacking many of the characteris-

tics that biology has, that is, durable, efficient, and powerful actuators (e.g., in terms of power-volume

and weight-torque ratios), redundant, and adaptive sensory systems (e.g., variable density of touch

receptors), as well as mechanical compliance and elasticity. Thus, the search for novel materials for

actuators and sensors will play a pivotal role. A few of these issues are being investigated for the

current generation of humanoid robots (for a review, see Dario et al., 1997), and will become more

compelling as robots start moving ‘out of the research labs.’ Take haptic perception (i.e., the

ability to use touch to identify objects), for instance. Due to the technological difficulties involved in

the construction of artificial skin sensors, most researchers do without this ability, or de-emphasize its

importance in relation to vision, audition, or proprioception. In many respects, however, haptic per-

ception – even more than vision – is directed toward the coupling of perception and action. Moreover,

the integration of haptic and visual stimulation is absolutely essential for the development of cognition

(e.g., visuo-haptic transfer, that is, the ability to coordinate information about the shape of objects from

hand to eyes, seems to be already present in newborns (Streri and Gentaz, 2003)).

2.8 Future prospects and conclusion

A list of future research directions that are worth pursuing needs to include autonomous learning –

where autonomous is intended in its strongest connotation, that is, as learning without a direct inter-

vention from a human designer (of course, this does not exclude interaction with a human teacher).

A key aspect of autonomous learning is the study of value systems that gate learning, and drive ex-

ploration of body dynamics and environment. We postulate that robots should acquire solutions to

contingent problems through autonomous exploration and interaction with the real world: generating

movements in various situations, while experiencing the consequences of those movements. Those so-

lutions could be due to a process of self-assembly, and thus would be constrained by the robot’s current

intrinsic dynamics. Common (not necessarily object-related) repetitive actions displayed by human in-

fants (poking, squishing, banging, bouncing, cruising) could give the developing artificial creature a

large amount of multimodal correlated sensory information, which could be used to bootstrap cogni-

tive processes, such as category formation, deferred imitation, or even a primitive sense of self. In

a plausible (but oversimplified) ‘developmental scenario’ the human designer could endow the robot

with simple biases, i.e., simple low-level ‘valences’ for movement, or for sound in the range of human

voices. A critical issue will be to have the robot develop new higher-level valences so as to bias ex-

ploration and learning for longer periods of time that transcend the time frame of usual sensory-motor

coordination tasks. Another possible route could be grounded in recent neurophysiological findings,

which seem to suggest that cognition evolved on top of pre-existing layers of motor control. In this


case, manipulation (a sensorimotor act) could play a fundamental role by allowing ‘baby-robots’ (or

infants) to acquire the concept of ‘object’ in the first place, and to evolve it into language (Rizzolatti

and Arbib, 1998). This aspect, although partially neglected so far, might prove to be an important next

step en route to the construction of human-like robots.

In conclusion, the generation of robots populating the years to come will be characterized by many

human-like features that were not thought to be part of intelligence in the past, but are considered crucial aspects of human intelligence nowadays. The success of the infant field of developmental robotics and

of the research methodology it advocates, will ultimately depend on whether truly autonomous ‘baby

robots’ will be constructed. It will also depend on whether by instantiating models of cognition in

developmental robots, predictions will be made that will find empirical validation.


Developmental facets | Link to development (representative publication) | Reference
Value systems, neural plasticity | postnatal cortical plasticity (Kato et al., 1991) | Almassy et al. (1998)
Social interaction, importance of constraints | early imitation (Nadel and Butterworth, 1999) | Andry et al. (2002)
Sensorimotor coordination | reflexive behavior (Piaget, 1953) | Berthouze et al. (1997)
Self-exploration, sensorimotor categorization | reflexive behavior (Piaget, 1953) | Berthouze and Kuniyoshi (1998)
Self-exploration | self-exploration | Berthouze et al. (1998)
Social interaction | infant-caretaker interactions (Bullowa, 1979) | Breazeal and Scassellati (2000)
Social interaction | prosodic pitch contours (Fernald, 1985) | Breazeal and Aryananda (2002)
Prospective control, sensorimotor coordination | visuo-haptic exploration, control of reaching (Berthier et al., 1996) | Coehlo et al. (2001)
Social interaction | proto-language development (Vygotsky, 1962) | Dautenhahn and Billard (1999)
Social interaction, early abilities | active intermodal matching (Meltzoff and Moore, 1997) | Demiris (1999)
Neural plasticity | neurotrophic factors (Purves, 1994) | Elliott and Shadbolt (2001)
Social interaction | joint attention (Butterworth and Jarrett, 1991) | Kozima and Yano (2001)
Categorization, neural plasticity | homeostatic plasticity mechanism (Turrigiano and Nelson, 2000) | Krichmar and Edelman (2002)
Social interaction, self-exploration | early imitation, body babbling (Meltzoff and Moore, 1997) | Kuniyoshi et al. (2003)
Degrees of freedom, value systems | freezing/unfreezing of degrees of freedom (Bernstein, 1967) | Lungarella and Berthouze (2002c)
Self-organization, self-exploration | bouncing, entrainment (Goldfield et al., 1993) | Lungarella and Berthouze (2003)
Stage-like process, value | infant reaching behavior (Diamond, 1990) | Marjanovic et al. (1996)
Stage-like process, value | infant reaching behavior (Konczak et al., 1995) | Metta et al. (1999)
Social interaction | mirror systems (Gallese et al., 1996) | Metta and Fitzpatrick (2003)
Social interaction, importance of constraints | joint visual attention | Nagai et al. (2002)
Sensory-motor coordination, self-organization | category learning (Thelen and Smith, 1994) | Pfeifer and Scheier (1997)
Social interaction, stage-like process | model of joint attention (Baron-Cohen, 1995) | Scassellati (1998)
Categorization, sensorimotor coordination | explorative behaviors (Rochat, 1989) | Scheier and Lambrinos (1996)
Categorization, value systems, neural plasticity | perceptual categorization (Edelman, 1987) | Sporns et al. (2000)
Value systems, neural plasticity | neuromodulatory system (Schultz, 1998) | Sporns and Alexander (2002)
Social interaction | eye-arm coordination | Stoica (2001)
Social interaction, early abilities | proto-linguistic functions (Halliday, 1975) | Varshavskaya (2002)
Value system | NA | Weng et al. (2000)
Social interaction, self-organization | contingent maternal vocalization (Pelaez-Nogueras et al., 1996) | Yoshikawa et al. (2003)

Table 2.2: Explicitly invoked developmental facet(s). NA = Not Available.


Subject area: Goal/focus | Robot | Reference

Socially oriented interaction:
early imitation | MR+AG | Andry et al. (2002)
social regulation | AVH | Breazeal and Scassellati (2000)
regulation of affective communication | AVH | Breazeal and Aryananda (2002)
proto-language development | MR | Dautenhahn and Billard (1999)
early imitation | AVH | Demiris (1999)
joint visual attention | UTH | Kozima et al. (2002)
joint visual attention | UTH+MR | Nagai et al. (2002)
joint visual attention | UTH | Scassellati (1998)
eye-arm coordination, imitation | RA | Stoica (2001)
early language development | AVH | Varshavskaya (2002)
vocal imitation | RS | Yoshikawa et al. (2003)

Nonsocial sensorimotor interaction:
saccading, gaze fixation | AVH | Berthouze and Kuniyoshi (1998)
visuo-haptic exploration | HGS | Coehlo et al. (2001)
visually-guided pointing | UTH | Marjanovic et al. (1996)
visually-guided reaching | UTH | Metta et al. (1999)
visually-guided manipulation | UTH | Metta and Fitzpatrick (2003)
indoor navigation | MR+AG | Weng et al. (2000)

Agent-related sensorimotor control:
self-exploration, early abilities, categorization | AVH | Berthouze et al. (1998)
self-exploration, early imitation | UTH+MR | Kuniyoshi et al. (2003)
pendulation, morphological changes | HD | Lungarella and Berthouze (2002c)
bouncing, entrainment | HD | Lungarella and Berthouze (2003)

Mechanisms:
behavioral interaction, neural plasticity | MR+AG | Almassy et al. (1998)
sensorimotor categorization, self-organization | AVH | Berthouze and Kuniyoshi (1998)
sensory deprivation, neural plasticity | MR | Elliott and Shadbolt (2001)
invariant object recognition, conditioning | MR+AG | Krichmar and Edelman (2002)
categorization, value | MR+AG | Pfeifer and Scheier (1997)
categorization, cross-modal associations, exploration | MR+AG | Scheier and Lambrinos (1996)
categorization, conditioning, value | MR+AG | Sporns et al. (2000)
neuromodulation, value | MR+AG | Sporns and Alexander (2002)

Table 2.3: Representative examples of developmentally inspired robotics research. AVH = Active Vision Head, UTH = Upper-Torso Humanoid, MR = Mobile Robot, HD = Humanoid, HGS = Humanoid Grasping System, UTH+MR = Upper-Torso Humanoid on Mobile Platform, MR+AG = Mobile Robot equipped with Arm and Gripper, RS = Robotic System.

Chapter 3

Freezing and Freeing Degrees of

Freedom 1

A skilled response is [..] highly organized, both spatially and temporally. The central problem for

skill learning is how such organization or patterning comes about. (Fitts, 1964)

3.1 Synopsis

The robust and adaptive behavior exhibited by natural organisms is the result of a complex interaction

between various plastic mechanisms acting at different time scales. So far, researchers have concen-

trated on one or another of these mechanisms, but little has been done toward integrating them into a

unified framework and studying the result of their interplay in a real world environment. In this chapter,

we present experiments with a small-sized humanoid robot that learns to swing. They illustrate that

the exploitation of neural plasticity, entrainment to physical dynamics, and body growth (where each

mechanism has a specific time scale) leads to a more efficient exploration of the sensorimotor space and

eventually to a more adaptive behavior. Such a result is consistent with observations in developmental

psychology.

3.2 Introduction

The ontogeny of any biological organism is a complex process. The different parts composing the

developing system are mutually interdependent, and are uneven in their rate of growth. Development

is especially susceptible to environmental influences, and its temporal unfolding makes it particularly

1 Appeared as: Lungarella, M. and Berthouze, L. (2002). On the interplay between morphological, neural, and environmental dynamics: a robotic case-study. Adaptive Behavior, 10(3-4), pp. 223-241.



hard to establish the precise time of onset of specific skills during infancy or childhood, which in turn

makes it very difficult to order the onset of different abilities with respect to one another. Traditionally,

both the capabilities and the limitations of newborns have been attributed to maturational processes in

the central nervous system (McGraw, 1945; Gesell, 1946). The disappearance of certain patterns of

behavior, or the emergence of others over time have been viewed as a derivative of processes or events

occurring at some higher level, or to paraphrase Bushnell and Boudreau (1993), as changes in the mind

that would effect changes in the ability to deploy the body. This view attracted considerable atten-

tion and resulted in various models of, for example, the role of myelinization in the central nervous

system or the cortical inhibition of infantile reflexes during development (McGraw, 1940; Dekaban,

1959). However, a growing body of evidence has shown that the development of body morphology

(physical growth) also plays a major role in the emergence and disappearance of certain behavioral

patterns and of some aspects of perceptual and cognitive development (Thelen et al., 1984; Bushnell

and Boudreau, 1993; Thelen and Smith, 1994; Goldfield, 1995). Limitations at the morphological level

(e.g., changes in the mass of the eyeball) induce constraints at the cognitive level (e.g., disruption of

the development of binocular depth perception) (Aslin, 1988). Bushnell and Boudreau, for instance,

consider motor development to function as a “rate-limiting factor” in the development of perceptual

capabilities (haptic and depth perception). Naturally, these constraints – the so-called “developmen-

tal brake” (Harris, 1983) – have implications on the adaptivity of the organism. Many developmental

psychologists hypothesize that constraints in the sensory system and biases in the motor system early

in life, may have an important adaptive role in ontogeny (Turkewitz and Kenny, 1982; Bjorklund and

Green, 1992). Limitations in the sensory and motor apparata result in a reduction of the complexity

of he sensory information that impinges on the learning system during its interaction with the environ-

ment, and therefore facilitate adaptivity. Later, those initial constraints or biases are lifted, inducing

changes at the neural level, which in turn result in new patterns of environmental interaction. Bushnell

and Boudreau talk of “motor development in the mind” to refer to the co-development of the sensory

and motor system and report that specified motor abilities must be executed for the corresponding

perceptual abilities to emerge. Exploration and spontaneous movements play a critical role in this re-

gard (Von Hofsten, 1993). Although they do not yet know the variety of ways in which their limbs may be used, infants are capable of spontaneously moving them from the fetal period onward (Smotherman and Robinson, 1988; Robinson and Smotherman, 1992; Prechtl, 1997). Piaget (1953) emphasized

that when infants perform movements over and over again they are in fact exploring their own action

system. Properties of the body are actively explored while performing these spontaneous movements

so that the organism can sustain certain motions and create new forms out of them. While learning a

task, the infant may try out different musculo-skeletal organizations and explore its parameter space

guided by the dynamics of the task. In other words, these movements may be seen as actions focused

on the exploration of the external world, and on the infant’s own sensorimotor parameter space (Prechtl,


1997). In fact, Goldfield (1995) hypothesized that the goal of exploration by an infant actor may be to

discover how to harness the energy being generated by the ongoing activity, so that the actual muscular

contribution to the act can be minimized. In this respect, it is worth noting that spontaneous move-

ments emerge during fetal life and disappear during later development, when voluntary motor activity

appears.

3.3 Learning to swing

In this chapter, we address the case of a small-sized humanoid robot that learns to pendulate, that is, to

swing as a pendulum. While various models have been proposed to control the behavior of a swinging

object (e.g., (Inaba et al., 1996; Miyakoshi et al., 1994; Saito et al., 1994; Schaal and Sternad, 1998;

Williamson, 2001)), we are not aware of any attempt to place it in a developmental context. Yet, there

is good reason to believe that such an approach would be justified. First of all, swinging can be seen

as a form of tertiary “circular reaction”, an essential component of the sensorimotor stages of Piaget’s

developmental schedule (Piaget, 1945). Circular reaction refers to the repetition of an activity in which

the body starts in one configuration, goes through a series of intermediate stages, and then returns to

the configuration from which it started. Rhythmic activity is highly characteristic of emerging skills

during the first year of life, and Thelen and Smith (1994) suggested that oscillations are the product

of a motor system under emergent control – when infants attain some degree of intentional control of

their limbs or body postures, but when their movements are not fully goal-corrected. Second, swinging

movements feature a complex interplay between environmental dynamics, body dynamics and neural

dynamics, which may benefit from an exploratory approach, i.e., not from a rigid selection of both

morphological and control parameters, but from a staged exploration of the various mechanisms.

Some instances of a developmental approach to complex control issues have been re-

ported. Berthouze and Kuniyoshi (1998) described experiments with a nonlinear redundant four de-

grees of freedom robotic vision system, where, to reduce the risk of being trapped in “stable but

inconsistent minima”, the introduction of two of the four available degrees of freedom was delayed

in time. This developmental strategy reduced the complexity of learning for each joint and led to a

faster stabilization of the controllers’ adaptive parameters. In a similar vein, Metta (2000) described a

robotic system called Babybot that used a staged release of the various mechanical degrees of freedom

to acquire the correct information for building sensory-motor and motor-motor transformations. In

both instances, development consisted of a delayed introduction of resources (the mechanical degrees

of freedom), which reduced the learning complexity of a particular task, e.g., the tracking of a pendu-

lum as in (Berthouze and Kuniyoshi, 1998). The issue was thus cast in an information-theoretic light,

and the focus was on how the introduction of bodily constraints benefits learning, rather than changes

in behavior. In that sense, the approach described by Berthouze and Kuniyoshi (1998) and by Metta


(2000) is similar to existing connectionist learning techniques known as constrained or incremental

learning (Newport, 1990; Elman, 1993; Elman et al., 1996; Westermann, 2000), in which neural net-

works are able to learn a task only if initially handicapped by severe limitations, e.g., the reduction of

the memory size or of the number of nodes in the hidden layer.

The focus of this chapter is not on the information-theoretic implications of the developmental ap-

proach, but rather on the effects of bodily changes on behavioral performance during learning. We will

show that even though we employ a value-based regulation of neural plasticity to generate adaptive

behavior, exploiting the inherent adaptivity of motor development leads to behavioral characteristics

not obtainable by simply manipulating neural parameters. Furthermore, we will present evidence to

support the hypothesis that a developmental use of the degrees of freedom (a slow mechanism) can

help the skill acquisition process by stabilizing the interaction between environmental and neural dy-

namics (both fast mechanisms if we restrict ourselves to the transient synaptic changes characteristic

of perception-action).

Only Taga’s studies (Taga, 1997, 2000) on the development of bipedal locomotion in human infants

seem to share a similar focus. Taga proposed a computational model showing that, via a process of

freezing and freeing of the degrees of freedom of the neuro-musculo-skeletal system, the “u-shaped” 2

changes in performance typical of the development of stepping could be reproduced. In (Taga, 1997),

he concluded that it remains to be shown how the developing neural system drives the freeing and

freezing degrees of freedom and that future studies could be aimed at elucidating how the mechanisms

of freeing and freezing can be applied to the development of other types of movements.

From that viewpoint, our study is novel in that it deals with a different class of motor control

problems than those discussed by the researchers cited above. In our experimental system, pendulation

is not achieved by actuation of the pendulum but is induced by the reaction of the actuated parts (legs)

on the body. Because the body is coupled to the environment through a pendular mechanism (a non-

actuated or passive degree of freedom), body motion (and thus swinging) is possible. It is important

to note that the mechanical system is underactuated, i.e., there are fewer actuators than degrees of

freedom, and proprioceptive feedback will refer to body motion and not to motion of the actuated

parts (leg joints). In that sense, the complexity of its control can be compared to that of an extended

version of the simple inverted pendulum, or of the double inverted pendulum, depending on whether

one or two mechanical degrees of freedom are considered. Although this particular control problem

has been extensively studied (Anderson, 1989; Spong, 1995), our developmental approach is novel. We

expect that starting with fewer degrees of freedom will result in multiple directions of stability which,

2 U-shaped in this particular context refers to the fact that newborns' stepping movements show a recognizable structure in time and space. While stepping movements stop when infants are about 2 months old, they reappear at around 8-10 months. This puzzling phenomenon was traditionally ascribed to maturation of the nervous system. However, Thelen et al. (1984) provided clear evidence for a biomechanical explanation, namely that of a changing balance between leg weight and muscle strength.


while not necessarily yielding optimal task performance, will nonetheless guide the coordination of

additional degrees of freedom. These additional degrees of freedom may then allow for optimal task

performance as well as for more tolerance and adaptation to environmental perturbations.

3.4 Experimental framework

The experimental platform consisted of a small-sized humanoid robot with 12 degrees of freedom

(sometimes DOF hereafter). Through two thin metal bars fixed to its shoulders, the robot was attached

by a passive joint to a supportive metallic frame, in which it could freely oscillate in the vertical

(sagittal) plane (see Fig. 3.1). Each leg of the robot had five joints, but only two of them – hip and knee

– were used in our experiments. Each joint was actuated by a high torque RC-servo motor. Figure 3.2


Figure 3.1: Humanoid robot used in our experiments.

depicts the distributed architecture used to control the humanoid robot. Each limb was controlled

by a separate neural oscillator. Neural oscillators are particular neural structures that can produce

rhythmic activity without rhythmic input, and that are hypothesized to be responsible for producing

rhythmic movements, during activities such as swimming, walking, and running, in animals ranging from invertebrates to higher vertebrates (Ijspeert, 2003). The usage of oscillators in a robotic system is not novel, but our


Figure 3.2: Schematics of the experimental system and the control architecture. Proprioceptive feedback consists of the visual position of the hip marker in the frame of reference centered on the hip position when the robot is in its resting position, i.e., vertical position. Joint synergy was only activated in experiments involving coordinated 2-DOF control.

focus is not on the control structure per se. Instead, we are interested in the capability of oscillators

to entrain to the frequency of an input – be it an external signal or the output of another oscillator

unit – over a wide range of frequencies. Indeed, in our framework, couplings are more relevant than

individual systems, a view also advocated by Hatsopoulos (1996). In this regard, oscillators are suitable

structures to implement a distributed control architecture and to consider developmental mechanisms

such as the freezing and freeing of the different degrees of freedom in particular.


3.4.1 Neural oscillators and joint synergy

Each neural oscillator was modelled after Matsuoka's (1985) differential equations:

τu u̇f = −uf − βvf − ωc[ue]+ − ωp[Feed]+ + te   (3.1)

τu u̇e = −ue − βve − ωc[uf]+ − ωp[Feed]− + te   (3.2)

τv v̇f = −vf + [uf]+   (3.3)

τv v̇e = −ve + [ue]+   (3.4)

where ue and uf are the inner states of neurons e (extensor) and f (flexor), ve and vf are variables representing the degree of adaptation or self-inhibition of the extensor and flexor neurons, and te is an external tonic excitation signal. β is an adaptation constant, ωc is a coupling constant that controls the mutual inhibition of neurons e and f, and ωp is a parameter weighting the proprioceptive feedback Feed. Both τu and τv are time constants of the neurons' inner states and determine the strength of the adaptation effect. The operators [x]+ and [x]− return the positive and negative parts of x, respectively.
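The dynamics above can be sketched as a simple Euler integration. The step size dt, the initial state, and the convention that [x]− returns the magnitude of the negative part are assumptions; β, ωc, ωp, and te follow the values reported later in this section, while τu and τv are illustrative choices.

```python
def matsuoka_step(state, feed, dt=0.002, tau_u=0.06, tau_v=0.6,
                  beta=2.5, w_c=2.0, w_p=0.5, t_e=20.0):
    """One Euler step of a two-neuron (extensor/flexor) Matsuoka oscillator.

    state = (u_f, u_e, v_f, v_e); feed is the proprioceptive feedback signal.
    """
    u_f, u_e, v_f, v_e = state
    pos = lambda x: max(x, 0.0)   # [x]^+, the positive part of x
    neg = lambda x: max(-x, 0.0)  # [x]^-, here the magnitude of the negative part
    du_f = (-u_f - beta * v_f - w_c * pos(u_e) - w_p * pos(feed) + t_e) / tau_u
    du_e = (-u_e - beta * v_e - w_c * pos(u_f) - w_p * neg(feed) + t_e) / tau_u
    dv_f = (-v_f + pos(u_f)) / tau_v
    dv_e = (-v_e + pos(u_e)) / tau_v
    state = (u_f + dt * du_f, u_e + dt * du_e,
             v_f + dt * dv_f, v_e + dt * dv_e)
    return state, state[0] - state[1]  # new state and output y = u_f - u_e
```

With the mutual inhibition ωc between 1 + τu/τv and 1 + β, the two units alternate and the output y oscillates even with zero feedback.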

Because the servo motors used to actuate the robot did not provide any form of sensory feedback,

we used an external camera to track colored markers placed on the robot’s limbs. In all experiments,

proprioceptive feedback (Feed in Equation 3.1) refers to the visual position of the hip in a frame of

reference centered on the hip position when the robot is in its resting position (see Figure 3.2 for

a graphic description). It is important to note that, unlike most models in the literature, we have

not exploited any kinematic information on the robot itself (such as its anatomical angles) but only

kinematic information on the position of the robot with respect to the fixation point of the pendulum.

This was a natural step because our focus was on the swinging behavior. However, we will also

show that this choice affected the strong entrainment property usually found in neural oscillator-based

systems.

Joint synergy, which occurs in the human motor system, was implemented by feeding the flexor unit of the knee oscillator with the combined outputs of the extensor and flexor units of the hip controller. A factor −ωs([ue^h]+ + [uf^h]+) was added to the term τu u̇f in the flexor unit of the knee oscillator (Equation 3.1), with ue^h and uf^h the inner states of the extensor and flexor units in the hip oscillator. ωs is the intersegmental coupling parameter and determines the strength of the coupling.
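The coupling term alone can be written as a small helper; the function and argument names are hypothetical, introduced only for illustration.

```python
def knee_flexor_synergy(u_e_hip, u_f_hip, w_s):
    """Intersegmental coupling term -w_s([u_e^h]^+ + [u_f^h]^+) that is added
    to the right-hand side of the knee flexor equation (Equation 3.1)."""
    pos = lambda x: max(x, 0.0)  # [x]^+
    return -w_s * (pos(u_e_hip) + pos(u_f_hip))
```

The term is always non-positive: the hip units can only inhibit the knee flexor, which is how the knee oscillation is entrained to the hip rhythm.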

Unless specified otherwise, the following control parameters were kept constant throughout the

study: β = 2.5, ωc = 2.0, ωp = 0.5, teh = 20 (hip tonic excitation) and tek = 15 (knee tonic excita-

tion). These experimentally determined values were selected because they offered the best compro-

mise between stability of the controllers and plasticity to environmental perturbations (Lungarella and

Berthouze, 2002a). Other parameters were set as discussed in the text.


3.4.2 Joint control

Similarly to Taga (1991), we used neural oscillators as rhythm generators, with an output activity y

given by the difference y = uf − ue between the activities of the flexor and extensor units. In most

robotic studies we are aware of, the oscillator’s output y is used as a motor command to control each

motor, either in position, force, or torque. In systems with high-torque DC motors or pneumatic actuators, and in systems with high-bandwidth sensory feedback (> 1 kHz) for example, this solution is

viable because the frequency of the control cycle is high enough. However, because our motor control

frequency was very low (around 15Hz) and the motors did not provide a sufficiently large torque, little

or no output torque could be expected on the pendulum when the amplitude of the pattern generator

output was either too low or changed too quickly. Thus, a high amplitude motor command was neces-

sary. Consequently, the output y of the rhythm generator was fed to a pulse generator whose output pg

was given by:

pgt = te (sgn(yt) − sgn(yt−δt))   (3.5)

where sgn(x) is the sign function, te is the tonic excitation of the neural oscillator (fixed throughout the

study), and δt is a very small time interval. In effect, this function detects sign changes in the output y

of the neural oscillator and generates a pulse of amplitude te and of sign sgn(yt ). The output pgt was

used as the actual motor command (control in position). Though very primitive (a variant of on-off

control), this controller is a suitable approximation of the output y. Indeed, it preserves frequency,

maximal amplitude as well as timing of the sign inversions within one period. Figure 3.3 illustrates

how changes in τu and τv are suitably reflected by the output of the pulse generator. In fact, the only

drawback of this control scheme is a phase shift which is easily compensated for by entrainment.

Finally, this controller is also interesting in that it implements a ballistic form of control 3, which is

consistent with the emerging control of movements in young infants (Von Hofsten, 1984).
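A minimal sketch of Equation 3.5, with sgn(0) taken as 0 (an assumption; the text does not specify the behavior at exactly zero):

```python
def pulse(y_curr, y_prev, t_e=20.0):
    """Pulse generator pg_t = t_e * (sgn(y_t) - sgn(y_{t-dt})) of Equation 3.5:
    zero while the oscillator output keeps its sign, a pulse whose sign
    follows sgn(y_t) at every sign inversion."""
    sgn = lambda x: (x > 0) - (x < 0)
    return t_e * (sgn(y_curr) - sgn(y_prev))
```

Feeding successive oscillator outputs through this function yields the on-off position commands sent to the servo motors.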

3.5 Experimental results and discussion

3.5.1 Protocol

With the aim of a comparative analysis between an outright use of all degrees of freedom and a progres-

sive freeing of the degrees of freedom, we realized two sets of experiments. In the first one, 2-DOF

exploratory control was considered. Each pair of joints (hip, knee) was controlled by a separate oscillator unit. Other joints were kept stiff, in their reset position. Two cases were considered: in the

3 Ballistic motor control is open-loop control and refers to the absence of feedback during movement performance. Examples of ballistic movements include saccadic eye movements and rapid aiming movements.


Figure 3.3: Comparison between the output of the pulse generator (thick impulse) and the output of the oscillator (solid line) for three different configurations of τu and τv, given the same proprioceptive feedback (dotted line). The control settings were as follows: τu = 0.02, τv = 0.25 (top); τu = 0.06, τv = 0.25 (middle); τu = 0.06, τv = 0.75 (bottom). Note that while the ratio τu/τv is unchanged between the top and the bottom graph, both the frequency of the output and the number of impulses per period (i.e., the shape of the output) are changed. The vertical axis denotes the amplitude of each signal. The horizontal axis denotes time steps (one time step is 33 ms).

first case, oscillator units were perfectly independent and their respective parameter space was inde-

pendently explored; in the second case, oscillator units were coupled via an intersegmental coupling

parameter ωs, with the assumption that it may lead to neural entrainment between oscillatory units.

From a control point of view, the former case is merely a particular instance of the latter with the inter-

segmental coupling parameter set to ωs = 0. In the second set of experiments, a bootstrapping 1-DOF

exploratory phase was considered during which only the hip joint was controlled, while other joints

were kept stiff, in their reset position. When (if) a stationary regime was obtained, the second degree

of freedom – knee – was released and controlled by its own oscillator unit. Again, the two cases above

were considered, with either independent control or synergetic control.

The humanoid robot’s movements were analyzed via the recording of hip, knee, and ankle posi-


tions. The same initial conditions were used in all experiments, with the humanoid robot starting from

its resting position. Unless specified otherwise, all parameter configurations were assumed to yield

motion without external intervention.

3.5.2 Exploratory process

In line with our interpretation of the swinging behavior as a “circular reaction”, we constructed a simple

value system to regulate the exploratory process. Value systems are usually defined as general biases

that are supposed to be heritage of natural selection, and which modulate learning. A number of robotic

systems have used such systems (e.g. Pfeifer and Scheier, 1999; Sporns et al., 2000). In our study,

the value system was implemented as a function of the maximum amplitude of the oscillation within a

given time window. The value v at time t was given by:

vt = max{vt−1(1 − ε), |At|}   (3.6)

where |At | denotes the absolute value of the instantaneous amplitude of the oscillation, estimated by

measuring the visual position of the hip marker in the sagittal (vertical) plane. The term (1 − ε), with

0 < ε << 1, implements an exponential decay of the value when the oscillations remain consistently

lower than the previously achieved maximal amplitude. With an appropriate selection of ε, the decay is

not rapid enough for the value to decrease within a single period of a stable oscillation whose frequency

is in the range of the control frequencies considered in this study, that is, in the range [0.8,1.2]Hz.
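Equation 3.6 and its decay can be sketched as follows; ε = 0.001 is an illustrative choice, since the text only requires 0 < ε << 1.

```python
def update_value(v_prev, amplitude, eps=0.001):
    """Value update v_t = max{v_{t-1}(1 - eps), |A_t|} (Equation 3.6): the
    value tracks the largest oscillation amplitude seen so far and decays
    slowly whenever the current amplitude stays below it."""
    return max(v_prev * (1.0 - eps), abs(amplitude))
```

Called once per time step, the value only drops when the oscillation remains consistently below the previously achieved maximum, exactly as described above.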

Assuming continuity in a small neighborhood of parameter configuration, the following explo-

ration principle was adopted: when a parameter setting yields good performance (a high value v in

the value system), slow down the changes of parameters. Conversely, trigger a rapid and large change

of parameters when the setting results in low-amplitude oscillations. This is classically referred to as

the “exploration-exploitation dilemma.” On the one hand, the system should explore the parameter

space, while on the other, it should exploit the good parameter configurations its exploration has

uncovered.

We implemented a mechanism inspired by a process of “Boltzmann exploration” and “Simulated

Annealing” (Kirkpatrick et al., 1983). Exploration is regulated by a parameter called “temperature” –

here 1/v, where v is the value determined by the value system – so that when the temperature decreases,

exploitation of the parameter setting takes over and, vice versa, exploration is favored when the temperature increases. Exploration of the parameters takes the form of additive noise, whose

amount is a function of the temperature. The process is formally defined by the following equations:

τu^{t+1} = τu^t + f(v)(τu^max − τu^min) Dx   (3.7)

τv^{t+1} = τv^t + f(v)(τv^max − τv^min) Dy   (3.8)


where τu^max, τu^min, τv^max, and τv^min define the range of exploration for parameters τu and τv of the extensor and flexor neurons. Dx and Dy are stochastic variables with a discrete and uniform probability distribution P(−1, 0, +1) = 1/3 and define the direction of change in the two-dimensional (τu, τv) parameter space. f(v) = c(e^{10/(1+v)} − 1), with c an experimentally determined multiplicative constant (c = 0.1 for τu and c = 1.0 for τv), determines the amount of change between old and new parameter configurations. For values in the range v ∈ [0.0, 160.0] (the range of visual amplitudes), this function was found to yield the best results in terms of the trade-off between exploration and exploitation of the parameter space. In effect, the parameter change from time step t to time step t + 1 can be interpreted as a random walk in the parameter space, with a value-dependent step size.
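One step of this value-dependent random walk can be sketched as below. The clipping of τu and τv to their exploration ranges is an assumption (Equations 3.7 and 3.8 only specify the step law), and the range bounds are taken from the parameter-space plots discussed later in the chapter.

```python
import math
import random

def explore_step(tau_u, tau_v, v,
                 u_range=(0.02, 0.08), v_range=(0.2, 0.8),
                 c_u=0.1, c_v=1.0):
    """One step of the value-dependent random walk (Equations 3.7 and 3.8).
    f(v) = c(exp(10/(1+v)) - 1) shrinks the step size as the value v grows;
    D_x and D_y are drawn uniformly from {-1, 0, +1}."""
    f = lambda c: c * (math.exp(10.0 / (1.0 + v)) - 1.0)
    tau_u += f(c_u) * (u_range[1] - u_range[0]) * random.choice((-1, 0, 1))
    tau_v += f(c_v) * (v_range[1] - v_range[0]) * random.choice((-1, 0, 1))
    # keep the parameters inside their exploration range (assumption)
    tau_u = min(max(tau_u, u_range[0]), u_range[1])
    tau_v = min(max(tau_v, v_range[0]), v_range[1])
    return tau_u, tau_v
```

At v = 0 the factor e^10 − 1 makes the walk jump across the whole range, while at v = 160 (the largest visual amplitude) the step shrinks to a small fraction of the range.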

The unfolding of the resulting exploration process is illustrated in Figure 3.4. Initially, the low

amplitude oscillations of the system yield a low value v, that is, a high temperature 1/v, which results

in a large step size. The exploratory process traverses the parameter space very rapidly. When a

parameter configuration yields a higher value v, the step size decreases until the exploration process

effectively converges onto one narrow region of the parameter space. At this stage, habituation occurs.

Habituation is one of the most elementary and ubiquitous forms of plasticity and can be defined as a decrease in the strength of a behavioral response that occurs when an initially novel stimulus is presented repeatedly (Wang, 1995). In our study, it was simply implemented as an exponential decay of the value v when the system remained in a 10-second stationary regime (sustained oscillations). With

the resulting decay in value, the step size increases again and new areas of the parameter space are

explored.

3.5.3 Experimental observations

A number of experiments were performed, involving explorative runs of roughly 10 minutes, with initial conditions in the range τu ∈ [0.02, 0.04] and τv ∈ [0.2, 0.4] for both hip and knee controllers. This

range was selected because it corresponds to a low-yield region of the parameter space (experimental

determination), and therefore guarantees that exploration will be necessary to reach a high-yield region

of the parameter space.

Within each scenario – 2-DOF exploration, 1-DOF exploration and bootstrapped 2-DOF explo-

ration – all runs were found to yield qualitatively similar results in terms of the characteristics of the

value landscape obtained, with variations accounted for by differences in initial conditions. For prac-

tical reasons (excessive strain on the physical structure of the robot as well as on the servo motors and

duration of a single experimental run), it was not possible to carry out enough runs to produce a sta-

tistically meaningful sample and therefore no statistical measurements (e.g., variances between runs)

were calculated.


Figure 3.4: Value-dependent exploration. The upper graph depicts the time series of the oscillatory movement of the robot's hip (top) and the associated value v in the value system (bottom). Rectangular areas point to decreases of value caused by habituation. The lower graph depicts the corresponding trajectories in parameter space. Oval areas point at dense regions of high-yield parameter settings, i.e., the large oscillations observed in the time series.


Ruggedness of the value landscape in a 2-DOF independent control configuration

Figure 3.5 depicts the value landscape uncovered by a single explorative run in the 2-DOF configura-

tion with no joint synergy (ωs = 0). Each dot represents a parameter setting visited by the exploratory

process and its size is proportional to the value yielded by the setting. The plot shows that the ex-

ploratory process covered a large part of the parameter space in both hip and knee spaces and that

high-value regions are sparse and small. The latter is confirmed by the probability distribution function

of the value landscape (Figure 3.6, top). The distribution is clearly skewed towards the low values.

Under systematic exploration, the value landscape shows similar properties, as shown by Figure 3.7.

These observations indicate the presence of a “rugged” value landscape, where small changes

in parameters can be expected to yield different oscillatory behaviors. To confirm this hypothesis,

we performed a systematic analysis of the oscillatory behaviors found in a neighborhood of control

parameters. A systematic exploration of a limited region of the hip and knee parameter spaces –

namely, τu^h ∈ [0.055, 0.065], τv^h ∈ [0.55, 0.65] and τu^k ∈ [0.025, 0.035], τv^k ∈ [0.25, 0.35] – was realized

with seven experiments, the results of four of which we discuss below. In each experiment, the resulting

behavior was evaluated in terms of the presence or absence of a stationary regime, the amplitude of

this regime, its smoothness (qualitatively), the relative configuration of hip and knee motor commands

as observed in a hip-ankle phase plot and the robustness to external perturbations (such as a manual

push). Each experiment started with the same initial conditions, that is, with the robot in its resting

position.

Though the parameter space now considered was very narrow, a small change of parameters yielded

very different behaviors. Qualitatively, the following states were observed. With τu^h = 0.060, τv^h = 0.60 and τu^k = 0.030, τv^k = 0.30, our reference configuration for this experiment, a smooth stationary

regime of the hip oscillation was observed, with an amplitude of 80 units. While in phase with the hip

oscillations, the ankles did not reach a true stationary regime, which resulted in the ankle-hip phase

plot of Figure 3.8(left). This phenomenon can be attributed to a dampening effect stemming from this

particular morphological structure. The system was found to return to its stationary regime even in the

case of external perturbations.

Slightly changing the hip control parameters (τu^h = 0.065, τv^h = 0.65) but leaving the knee param-

eters unchanged, resulted in a qualitatively very different behavior. While the ankle position quickly

reached a smooth stationary regime, an overall oscillatory behavior was not found (overall amplitude

of less than 20 units), as illustrated by the phase plot on Figure 3.8(right).

With the knee parameters unchanged, yet another behavior was obtained if the hip control parame-

ters were set to τu^h = 0.055 and τv^h = 0.55. In this case, the overall oscillatory behavior was smooth and

reached a stationary regime. Interestingly, the ankle behavior exhibited several transitions to different

stationary regimes, the succession of which is depicted in Figure 3.9. Transitions between stationary


Figure 3.5: Value landscapes (left: hip parameter space; right: knee space) uncovered by a single exploratory run in an independent 2-DOF configuration (ωs = 0). The size of a dot (a control setting visited by the exploratory process) is proportional to the value v obtained for that particular control setting. Initial conditions were similar for both joints, namely, τu ∈ [0.02, 0.04] and τv ∈ [0.2, 0.4]. The exploratory run took roughly 10 minutes.


Figure 3.6: Probability distribution functions of value landscapes obtained in three different scenarios: independent 2-DOF exploration (top), 1-DOF exploration (middle) and bootstrapped 2-DOF (bottom). The corresponding value landscapes are found in Figures 3.5, 3.11 (right) and 3.13, respectively. In each graph, the value space [0.0, 0.6] was discretized into 50 bins. Simply stated, each graph indicates the probability (vertical axis) that a value v (horizontal axis) occurs during the exploratory run considered. In the three scenarios, the same initial conditions were used.


Figure 3.7: Value landscape obtained during a systematic exploration of the knee parameters with an arbitrarily chosen hip parameter setting (τu^h = 0.045, τv^h = 0.65). The parameter space was discretized in a 15x15 sampling and the figure is a linear approximation of the resulting values v. Brighter colors denote higher-yield settings. The experiment lasted about 150 minutes.

Figure 3.8: Effect of a small change in the hip control parameters on the ankle-hip phase plots in the independent 2-DOF configuration: left, oscillatory behavior without a true stationary regime (τ_u^h = 0.060, τ_v^h = 0.60, τ_u^k = 0.03, τ_v^k = 0.3); right, no oscillatory behavior (τ_u^h = 0.065, τ_v^h = 0.65, τ_u^k = 0.03, τ_v^k = 0.3). In both graphs, the axes denote the horizontal coordinates of the hip and ankle markers' visual positions.


Figure 3.9: Evidence of preferred stable states and phase transitions in the independent 2-DOF configuration: successive pseudo-stationary regimes obtained with τ_u^h = 0.055, τ_v^h = 0.55, τ_u^k = 0.03, τ_v^k = 0.3. Each graph shows the corresponding ankle-hip phase plot. In all graphs, the axes denote the horizontal coordinates of the hip and ankle markers' visual positions.


regimes were very rapid. Interestingly, Goldfield (1995) reported that it is a characteristic of spontaneous activity in infants that it enters preferred stable states and exhibits abrupt phase transitions. After

perturbation, the hip returned to its former stationary regime. The pseudo-stationary regimes in the

motion of the ankle only partially overlapped with those observed earlier.

Figure 3.10: Large amplitude smooth performance after a long transient: left, the ankle-hip phase plot with τ_u^h = 0.055, τ_v^h = 0.65, τ_u^k = 0.025, and τ_v^k = 0.35; right, the corresponding time series for hip and ankle visual positions and motor commands (time in ms).

Finally, with τ_u^h = 0.055, τ_v^h = 0.65 and τ_u^k = 0.025, τ_v^k = 0.35, seemingly optimal performance was

observed. An amplitude of 120 units was achieved and sustained. In-phase smooth oscillatory behavior

was obtained both at the hip and ankle level. The hip-ankle phase plot is given in Figure 3.10(left).

The time series provided in Figure 3.10(right) shows that this stationary regime was achieved only

after a smooth transient of about 50s. This regime was found to show good robustness against external

perturbations.
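Throughout this section, an exploratory process samples oscillator settings (τ_u, τ_v) and scores each with the value v. The actual value system and exploration scheme are those described earlier in the chapter; purely to fix ideas, the sketch below runs a greedy random walk over a made-up Gaussian value landscape whose peak echoes the high-yield region reported here. The landscape, step size, and acceptance rule are our assumptions, not the thesis's mechanism.

```python
import numpy as np

def explore(value_fn, start, bounds, steps=300, sigma=0.05, seed=0):
    """Greedy random-walk exploration: perturb the current setting and
    keep the move only if the value improves. Returns visited settings."""
    rng = np.random.default_rng(seed)
    (lo_u, hi_u), (lo_v, hi_v) = bounds
    tau = np.array(start, dtype=float)
    best = float(value_fn(tau))
    visited = [(tuple(tau), best)]
    for _ in range(steps):
        cand = tau + rng.normal(0.0, sigma, size=2)
        cand[0] = np.clip(cand[0], lo_u, hi_u)
        cand[1] = np.clip(cand[1], lo_v, hi_v)
        v = float(value_fn(cand))
        if v > best:            # keep only improving moves
            tau, best = cand, v
        visited.append((tuple(tau), best))
    return visited, best

# Synthetic landscape peaked at (0.05, 0.65), echoing the high-yield region.
peak = np.array([0.05, 0.65])
value_fn = lambda p: float(np.exp(-np.sum(((p - peak) / 0.1) ** 2)))
visited, best = explore(value_fn, start=(0.03, 0.3),
                        bounds=[(0.02, 0.08), (0.2, 0.8)])
print(len(visited), round(best, 3))
```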

1-DOF exploration and physical entrainment

Freezing the lower degree of freedom yielded a very different value landscape. Figure 3.11 depicts the

value landscape uncovered by a single explorative run. As shown by the large number of configurations

visited and the size of the dots (the value), the system settled briefly in a number of oscillatory behaviors

of moderate value v. A quantitative measure of these states is provided by the probability distribution

function shown by Figure 3.6 (middle). It can also be noted that all higher-yield configurations were located in a compact region of the parameter space – roughly, τ_u^h ∈ [0.02, 0.08] and τ_v^h ∈ [0.5, 0.8] – an observation

confirmed when a systematic exploration of the parameter space was performed (Figure 3.12). The cor-

responding configurations were found to exhibit good robustness against environmental perturbations,

such as a manual push (Lungarella and Berthouze, 2002a).

Figure 3.11: Value landscape (hip parameter space) uncovered by a single exploratory run in a 1-DOF configuration, i.e., the second DOF (knee) is frozen. The size of a dot (a control setting visited by the exploratory process) is proportional to the value v obtained for that particular control setting. Initially, τ_u^h and τ_v^h were randomly selected in the intervals [0.02, 0.04] and [0.2, 0.4], respectively. The exploratory run took roughly 10 minutes.

Figure 3.12: Value landscape obtained during a systematic exploration of the hip parameter space in a 1-DOF configuration, i.e., the second DOF (knee) was frozen. The parameter space was discretized in a 15x15 sampling and the figure is a linear approximation of the resulting values. Brighter colors denote higher-yield settings. The experiment took about 150 minutes.

We suggest that the compact region of the parameter space found to yield consistent values v corresponds to a range of frequencies where "physical entrainment" – entrainment to body dynamics – can

take place. Evidence for that can be found by comparing the frequency of the oscillating system with

both its natural frequency and its control frequency. A difference from either indicates that both body dynamics (that is, reaction forces of the actuated body parts on the body, and inertia) and environmental forces contribute to shifting the system's frequency away from the frequency it would otherwise show in

a disembodied setup. The exploitation of such dynamics has been shown to yield robust behavior in

various tasks (Williamson, 2001; Miyakoshi et al., 1994).

The natural frequency of the system was measured by manually pushing the robot and letting

it swing freely, while tracking the position of the hip marker. The frequency was experimentally

found to be 0.905Hz (period of 1105ms) and this value was confirmed by spectral analysis of the

hip position’s time series (with a sampling frequency of 33Hz). We then considered two parameter

settings located in the high-yield compact area identified in Figure 3.12, namely, τ_u^h = 0.040, τ_v^h = 0.65 and τ_u^h = 0.070, τ_v^h = 0.65. In a disembodied system, that is, in simulation, these settings are shown –

by spectral analysis of the oscillator’s output – to produce a control pattern with a frequency of 0.71Hz

and 0.89Hz respectively. Experimentally however, the actual frequencies were found to be 0.77Hz

and 0.96Hz, respectively, which could be explained either by the inaccuracy inherent to servo-motor

control or by friction forces. After the system reached a stationary regime, the frequencies (1.075Hz

and 1.15Hz, respectively) were observed to be significantly different from either the natural frequency

or the control frequency, thus providing us with evidence that physical entrainment did indeed take

place. Frequency measurements made on other oscillator settings of the high-yield compact area were

found to range from 0.93Hz to 1.22Hz. This range of frequency explains the location of the basin of

attraction of Figure 3.12. Indeed, phase locking only takes place if the control inputs are in a range of

frequencies that is not too far apart from the natural frequency of the system. At first sight, this result

is at odds with existing studies showing that entrainment is a robust property and occurs with any

parameter setting such that τ_u/τ_v ∈ [0.1, 0.5]. However, it is important to stress again that in these earlier

studies, entrainment is observed between the control frequency of the actuated joint and the feedback

frequency of the actuated system under environmental perturbations, for example, the frequency of the

robot arm sawing a wooden piece, or the arm juggling with a slinky toy (Williamson, 2001). In our

work, however, we are considering the swinging frequency of a system that is not directly actuated.

Therefore, we are discussing entrainment between the induced effects of the controlled parts on the

global system – pendulum + robot – and environmental dynamics: here, gravity, the physical structure supporting the actuated system, and friction forces.
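The frequency comparisons in this section rest on spectral analysis of marker time series sampled at 33 Hz. Below is a sketch of such a dominant-frequency estimate, applied to a synthetic 0.9 Hz trace; the helper name and the signal are ours, not code from the thesis.

```python
import numpy as np

def dominant_frequency(signal, fs):
    """Return the frequency (Hz) of the largest non-DC peak in the
    amplitude spectrum of a real-valued, uniformly sampled signal."""
    spectrum = np.abs(np.fft.rfft(signal - np.mean(signal)))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]

# Synthetic hip-marker trace: a 0.9 Hz sine sampled at 33 Hz for 60 s.
fs = 33.0
t = np.arange(0, 60, 1 / fs)
trace = 100 + 50 * np.sin(2 * np.pi * 0.9 * t)
print(round(dominant_frequency(trace, fs), 2))  # → 0.9
```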


Figure 3.13: Effect of the freeing of the knee DOF on the exploration of the 2-DOF configuration. Left: value landscape uncovered by a single exploratory run in a 1-DOF configuration, i.e., the second DOF (knee) was frozen. When the system reached a stable oscillatory state, here denoted by a white triangle (roughly [0.7, 0.04]), the second DOF was released. The right graph shows the value landscape uncovered by the exploratory process in the resulting 2-DOF configuration, with an initial condition represented by the white rectangle (roughly [0.3, 0.03]). In both graphs, the size of a dot (a control setting visited by the exploratory process) is proportional to the value v obtained for that particular control setting. Initially, τ_u^{h,k} and τ_v^{h,k} were randomly selected in the intervals [0.02, 0.04] and [0.2, 0.4], respectively. The overall experiment took roughly 20 minutes.


Figure 3.14: Effect of the freeing of the knee DOF on the exploration of the 2-DOF configuration. Value landscape obtained during a systematic exploration of the knee parameter space after its release when the system was in a stable oscillatory state in a 1-DOF configuration. The hip oscillator was initialized with τ_u^h = 0.054, τ_v^h = 0.65, which corresponds to a high-yield 1-DOF configuration. The parameter space was discretized in a 15x15 sampling and the figure is a linear approximation of the resulting values. Brighter colors denote higher-yield settings. The experiment took about 150 minutes.

2-DOF bootstrapped control

When the second degree of freedom was released, that is, after the system was stabilized in its 1-DOF

stationary regime, the resulting value space was characterized by a dense distribution of high-yield

parameter settings. In Figure 3.13, we show the results of a single explorative run. The graph on

the left shows the initial part of the experiment, namely, the 1-DOF exploration of the hip parameter

(starting from the same initial conditions as in all other experiments). This value landscape naturally

has similar properties to those observed in Figure 3.11. The triangle denotes the hip parameter setting

after which the knee joint is released (or freed). The graph on the right depicts the value landscape

uncovered by the exploratory run in the knee parameter space, after release. The initial knee setting

is denoted by the white rectangle, that is, the same setting used in all experiments. The exploratory

run only covered a compact high-yield region of the parameter space, an observation quantitatively

confirmed by the probability distribution function shown by Figure 3.6 (bottom).

At first sight, this result could appear trivial. Indeed, the freeing of the second degree of freedom

took place when the 1-DOF regime was already yielding a high value. Thus, taking into account the morphology of the system as well as the ratio r < 1.0 between knee and hip tonic excitations (r = te_k/te_h = 0.75), the high value yielded when the knee parameter space was explored could be attributed to both the inertia of the already oscillating system and the morphology itself. However,


when a systematic exploration of the knee parameter space was realized, using the same hip parameter

as initial condition, we observed the value landscape depicted by Figure 3.14. The figure shows that

the system’s performance was not only accounted for by the inertia generated by the 1-DOF stationary

regime but also by the selection of an appropriate knee control setting. Indeed, the standard deviation of

the probability distribution function in the bootstrapped 2-DOF systematic exploration – SD = 0.0573

– is greater than the standard deviation obtained in the independent 2-DOF systematic exploration –

SD = 0.0386.
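The means and standard deviations quoted here are moments of binned distributions like those in Figure 3.6. A minimal sketch of that computation follows; the uniform toy distribution and helper name are ours.

```python
import numpy as np

def binned_moments(probs, edges):
    """Mean and standard deviation of a distribution given bin
    probabilities and bin edges (as produced by np.histogram)."""
    centers = 0.5 * (edges[:-1] + edges[1:])
    mean = float(np.sum(probs * centers))
    var = float(np.sum(probs * (centers - mean) ** 2))
    return mean, float(np.sqrt(var))

# Toy distribution over the value range [0.0, 0.6], 50 bins.
edges = np.linspace(0.0, 0.6, 51)
probs = np.ones(50) / 50  # uniform, purely for illustration
mean, sd = binned_moments(probs, edges)
print(round(mean, 3), round(sd, 3))  # → 0.3 0.173
```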

Two additional observations are noteworthy. First, the mean of the probability distribution function

obtained in the systematic exploration (mean = 0.474) is higher than the mean value (mean = 0.403) of

the probability distribution function obtained for the exploratory run discussed in this section, thus

indicating that the result depicted by Figure 3.13 (right) was not marginal. Second, this mean value is

also higher than the mean value obtained during the systematic exploration of the 1-DOF configuration

(mean = 0.158), even when considering only the compact area of high value (mean = 0.206 with a

maximal value of 0.540 for τu ∈ [0.02,0.08] and τv ∈ [0.5,0.8]). This indicates that the high value

obtained during the 1-DOF stationary regime could not account for the high value obtained after release

of the second degree of freedom, and in addition, most of the configurations explored yielded a higher

value than possibly obtained in the 1-DOF configuration. This observation validates our hypothesis that

the freezing and subsequent freeing of the second degree of freedom results in higher performance, and,

in effect, reduces the sensitivity of the system to the selection of a particular hip-knee configuration

(when compared to the independent exploration).

A reviewer questioned the fact that the value landscape obtained during 2-DOF control could differ

from the value landscape obtained during bootstrapped 2-DOF control given that no parameters other

than τ_u^{h,k}, τ_v^{h,k} were varied. Suggesting that differences could only be accounted for by different regions

of the parameter space being explored due to distinct histories, the reviewer questioned how we could

possibly explain the different values obtained in the systematic exploration.

First and foremost, this suggestion does not consider the delayed introduction of the second degree

of freedom. Because the second degree of freedom was introduced after a stationary regime was

obtained in the 1-DOF configuration, the initial conditions for a given hip-knee setting were changed.

Second, in a disembodied system, it could be argued that after a suitable transition period, the

bootstrapped system would eventually return to the state obtained in the independent case. However, it

did not occur in this study – and further experiments by the authors confirmed it even in the presence of

stronger environmental interaction (Lungarella and Berthouze, 2002a) – because physical entrainment

took place. As discussed earlier, the frequency obtained in the 1-DOF case was not equal to the

control frequency. Because both oscillators are fed with the same proprioceptive feedback, namely,

the visual position of the hip marker, when the second degree of freedom is released, its controller is

stimulated by proprioceptive feedback on which the hip oscillator has already entrained. Given the


ability of oscillators to entrain on an input signal, entrainment between the two joints effectively takes

place. Note, however, that differently from the neural entrainment that we will discuss in the next

section, here entrainment was mediated by the body and not by explicit connections between the two

controllers. A similar result, but in a different context, was reported by Taga (1991) who qualified

such entrainment as “global entrainment” 4, and by Williamson (2001). In the case of independent

control, however, this property cannot be expected because the proprioceptive feedback only reflects

the output activity generated by the particular hip-knee control configuration and thus the resulting

value landscape is very sensitive to the choice of parameters.

Control synergy and neural entrainment

In both 2-DOF independent control and bootstrapped control, the addition of joint synergy resulted in

more or less strongly correlated knee and hip control patterns. Such behavior is characteristic of “neural

entrainment”, whereby the control frequency of the lower limb locks onto the control frequency of the

upper limb. This sort of result has been extensively commented on in the literature (e.g. Taga, 1991;

Williamson, 1998).

Figure 3.15: Large amplitude oscillations with a strong intersegmental coupling (ω_s = 1.0) in the independent 2-DOF configuration when τ_u^h = 0.055, τ_v^h = 0.65, τ_u^k = 0.025, τ_v^k = 0.35: phase plots of the hip (left) and ankle (right) motions in the stationary regime. In both graphs, the axes denote the horizontal coordinates of the hip (respectively ankle) marker's visual positions.

In a series of experiments, we studied the role played by the intersegmental coupling gain ωs. With

too low a value, the coordination between hip and knee oscillators was very loose and the resulting

behavior was qualitatively similar to the results obtained in a 2-DOF independent configuration. With

a high value (here, 1.0), a strong coupling occurred and the lower limb was essentially driven by the

4In his own terms, “since the entrainment has a global characteristic of being spontaneously established through inter-action with the environment, we call it global entrainment”.


Figure 3.16: Toward a flexible 1-DOF system: effect of an intermediate coupling (ω_s = 0.50) between hip and knee on the value landscapes (left: hip parameter space; right: knee parameter space) uncovered by a single exploratory run in a 2-DOF configuration. In both graphs, the size of a dot (a control setting visited by the exploratory process) is proportional to the value v obtained for that particular control setting. Initially, τ_u and τ_v were randomly selected in the intervals [0.02, 0.04] and [0.2, 0.4], respectively. The exploratory run took roughly 10 minutes.


upper limb control unit. From a qualitative point of view, such strong coupling led to the most natural

looking swinging pattern and amplitudes were shown to reach their maximum value. In effect, the

2-DOF system became a “flexible 1-DOF system.” Figure 3.15 shows the resulting phase plots for hip

and ankle motions. Ankle and hip are in-phase and the ankle motion follows a sinusoid of very large

amplitude (160 units). From the point of view of the value system, a strong coupling results in the lower

limb’s control parameters becoming a nonfactor. This is confirmed by the value landscapes uncovered

by an exploratory run. As shown in Figure 3.16, a strong correlation appears between the region of

space covered by the hip exploratory process (left) and the knee exploratory process (right). When

the hip was controlled by a high-yield setting (and note that in this particular run, almost all settings

were in the high-yield region discussed earlier), the value of the 2-DOF system was high because the

lower-limb rapidly phase-locked on the hip (by neural entrainment) and thus physical entrainment (as

observed in the 1-DOF configuration) could occur.

When intermediate coupling values were considered, that is, between 0.25 and 0.50, two important

observations could be made: (a) Transients were shorter (the duration of the transient was reduced

by a factor of 2 in the configuration previously discussed); and (b) abrupt phase transitions that were

observed otherwise disappeared. This result is not surprising. With an appropriately chosen coupling

gain, neural entrainment is achieved between control units, and the two units with their own distinct

time constants (or frequencies in this case) pull each other toward a new common time constant (here,

a new frequency). Because of this smooth convergence towards a stable configuration, the ongoing

physical entrainment is also stabilized, by entrainment effect. Thus, abrupt phase transitions, which

demonstrate a global instability of the control, do not occur and the transients are shortened.
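The frequency-pulling at work in neural entrainment can be illustrated with a much simpler stand-in than the oscillator units used here: two Kuramoto-style phase oscillators with distinct natural frequencies. With the coupling gain at zero each keeps its own frequency; with a sufficiently strong gain they converge on a common intermediate frequency, mirroring the "pull each other toward a new common time constant" described above. The model and all numbers below are illustrative, not the controllers used in the experiments.

```python
import math

def mean_frequencies(w1, w2, k, dt=0.001, steps=200_000):
    """Integrate two coupled phase oscillators (Euler) and return their
    average instantaneous frequencies over the second half of the run."""
    th1 = th2 = 0.0
    acc1 = acc2 = 0.0
    n = 0
    for i in range(steps):
        d1 = w1 + k * math.sin(th2 - th1)
        d2 = w2 + k * math.sin(th1 - th2)
        th1 += d1 * dt
        th2 += d2 * dt
        if i >= steps // 2:  # discard the transient, average the rest
            acc1 += d1
            acc2 += d2
            n += 1
    return acc1 / n, acc2 / n

# Uncoupled: each oscillator keeps its own natural frequency (rad/s).
f1, f2 = mean_frequencies(5.0, 6.0, k=0.0)
# Strongly coupled: both lock onto a common intermediate frequency (5.5).
g1, g2 = mean_frequencies(5.0, 6.0, k=1.0)
print(round(f2 - f1, 2), round(abs(g2 - g1), 3))  # → 1.0 0.0
```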

Summary

In summary, the above experiments have shown the following: The outright use of both degrees of

freedom resulted in a very rugged value landscape with sparse, high amplitude but not necessarily

robust, oscillatory behaviors. Freezing the lower degree of freedom enlarged the high-yield area

because physical entrainment could occur. While lowering the average amplitude of the oscillations,

it supported multiple directions of stability, which stabilized the system when the second degree of

freedom was released. Optimal performance was obtained when joint synergy was considered and

neural entrainment between control units occurred.

3.6 Conclusion

With this case-study, we provided evidence to substantiate our claim that in learning a new motor task

(here, swinging), a reduction of the number of available biomechanical degrees of freedom helps stabilize the interplay between environmental and neural dynamics. Among the various types of adaptive

mechanisms that take part in this interaction, we focussed on entrainment, both neural and physical,

and morphological development. Our study represents an attempt to disentangle the complex interplay

between morphological, neural and environmental dynamics.

With our experimental results, we stressed the importance of morphological dynamics and its ef-

fects on environmental interaction. An outright use of all degrees of freedom was shown to reduce

the likelihood that physical entrainment takes place, which in turn resulted in a reduced robustness of

the system against environmental perturbations. Instead, by freezing some of the available degrees of

freedom, physical entrainment could occur and a large high-yield area of the parameter space was ob-

tained, producing robust oscillatory behaviors. This robustness eventually stabilized the system when

the frozen degrees of freedom were released.

Interestingly, our thesis is supported by descriptive evidence in both developmental psychology

and biomechanical studies of motor skill acquisition. Thelen and Smith (1994) reported that infants

first learning to stand, typically solve the problem of how to coordinate their degrees of freedom by

freezing their body segments into an inverted pendulum-type postural coordination. Similarly, studies

by Jensen et al. (1995) on the development of infant leg kicking between 2 weeks and 7 months of

age, showed a progression from proximal control (at the hip) to more distal control (inclusion of knee

and ankle joints). Further support comes from Bernstein’s seminal work on motor skill acquisition in

which he showed that the freezing of a number of degrees of freedom is followed, as a “consequence of

experiment and exercise,” by the preliminary lifting of all restrictions, and the subsequent incorporation

of all possible degrees of freedom (Bernstein, 1967). In doing so, differentiated patterns of movement

and synergies can be explored, and eventually the most efficient or economical movement patterns can

be selected.

These three examples reflect quite accurately what we observed in our experiments: Morpholog-

ical changes (here, freezing and freeing of biomechanical degrees of freedom) are a form of plastic

mechanism and contribute to the life-time adaptivity of a system, that is, they are beneficial during

development and after. As for any other plastic mechanism, they have their own dynamics and time

scale. As such, their interplay with mechanisms operating at other time scales is likely to contribute

to the emergence of robust behavior. This hypothesis is actually supported by a recent study by Ro-

jdestvenski et al. (1999) on the robustness of biological systems with respect to changes of microscopic

parameters as a consequence of time scale hierarchy. The authors illustrate how time scale hierarchies

can lead to a decoupling of regulatory mechanisms and the emergence of robustness against parameter

variations.

In the future, we will aim at corroborating this hypothesis through the study of tasks involving a greater number of degrees of freedom as well as more environmental interaction. This will undoubtedly raise

the issue of scalability of our current framework. In the presence of an increased number of available degrees of freedom, which joints should be frozen and in what order? Will a simple reduction

of the number of available degrees of freedom be sufficient to yield robust adaptivity? As a matter of

fact, our on-going studies (Lungarella and Berthouze, 2002b) show that, consistent with observations

made in developmental psychology, alternate freezing and freeing of degrees of freedom may be nec-

essary when the inability to control excessive degrees of freedom pushes the system outside the limits

of postural stability. From this perspective, morphological changes truly have their own dynamics,

and understanding the key features of this dynamics will be an interesting challenge. Even more so

will be the study of the link between morphological dynamics and yet another form of dynamics, the

spontaneous dynamics of Goldfield (1995).

Chapter 4

Alternate Freezing and Freeing of

Degrees of Freedom1

4.1 Synopsis

In the previous chapter, we provided experimental evidence that starting with fewer degrees of freedom

enables a more efficient exploration of the sensorimotor space during the acquisition of a task. The

study came as support for the well-established framework of Bernstein (1967), namely that of an initial

freezing of the distal degrees of freedom, followed by their progressive release and the exploitation

of environmental and body dynamics. In this chapter, we revisit our study by introducing a nonlinear

coupling between environment and system. Under otherwise unchanged experimental conditions, we

show that a single phase of freezing and subsequent freeing of degrees of freedom is not sufficient

to achieve optimal performance, and instead, alternate freezing and freeing of degrees of freedom is

required. The interest of this result is two-fold: (a) it confirms the recent observation by Newell and

Vaillancourt (2001) that Bernstein’s framework may be too narrow to account for real data, and (b)

it suggests that perturbations that push the system outside its postural stability or increase the task

complexity may be the mechanism that triggers alternate freezing and freeing of degrees of freedom.

4.2 Introduction

Body-related morphological changes during the early stages of infancy, either slow and irreversible

modifications (such as physical growth), or relatively rapid, task-related re-organizations of the

musculo-skeletal system (such as the transition from crawling to standing), are a salient characteristic of the ongoing developmental process. In this chapter, we focus on the effect on behavior of one

1 To appear as Berthouze, L. and Lungarella, M. Motor skill acquisition under environmental perturbations: on the necessity of alternate freezing and freeing of degrees of freedom. Adaptive Behavior, 12(1), 2004.

particular form of morphological change: the release of constraints in the motor system. A few telling

examples of constraints in the sensory, motor, and neural systems of vertebrate species such as rats, cats

and humans are the immaturity of the accommodative system (Turkewitz and Kenny, 1982), the low

acuity of vision and absence of binocularity (Hainline, 1998), the low ratio of leg muscle to leg fat, and

the poor postural control of head, trunk, arms, and legs (Bertenthal and Von Hofsten, 1998; Thelen and

Smith, 1994). Studies in developmental psychology have shown that constraints in the sensory system

and biases in the motor system and their subsequent release, may play a pivotal role in the ontogeny

of motor skills, and in shaping the infant’s exploratory behavior (Bushnell and Boudreau, 1993; Gold-

field, 1995; Harris, 1983; Piek, 2002; Thelen et al., 1984; Turkewitz and Kenny, 1982). In this chapter,

we consider the morphological limitations in the motor apparatus of a developing system as particular

instances of ontogenetic adaptations, that is, neurobehavioral traits of an immature organism with a

specific adaptive role at a particular stage of development (Bjorklund and Green, 1992). We start from the

premise that appropriate initial constraints on morphological resources are not only beneficial to the

emergence of stable sensorimotor patterns with an increased tolerance to environmental perturbations,

but also help to bootstrap later stages of learning and development.

The study of morphological changes is an important area of research. Yet, such changes have been largely

neglected by biologically motivated robotics research, presumably because of (a) the difficulties in-

volved with the actual implementation of the suggested morphological changes in real-world systems

(as opposed to simulated systems in which morphological changes can be achieved relatively easily);

and (b) the lack of proper means for quantifying their effects on neural dynamics and behavior. Re-

cently, the robotics community has started to address the former issue. The ultimate goal is to create

machines that by changing their morphology (shape) are able to perform various tasks in various envi-

ronments. Examples are the self-reconfigurable modular robots built by Murata et al. (2001), and the

morpho-functional machine initiative promoted by Hara and Pfeifer (2003). In both instances, change

of shape is concerned with the functionality of the machine, and not with learning mechanisms. The

quantification of movements has also been investigated, and a few methods have been proposed. Di-

mensional analysis, for instance, gives an index of the number of independent degrees of freedom

required to produce the time series of a particular movement (Kay, 1988; Mitra et al., 1998). The

spatio-temporal organization of the joint-space data associated with a movement can also be captured

by principal component analysis. Haken (1996) showed that early in the learning of a “pedalo task” (a

skating locomotion task in which both skates are connected by a rigid link that constrains their relative

motion to a cycloidal trajectory in the vertical plane), several principal components were necessary to

explain most of the variance of the data, and that after practice, this number of significant principal

components collapsed to one. Although useful for a descriptive characterization of the system, neither type of analysis provides any information on the mechanisms underlying the described learning


process.
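Haken's observation (several significant principal components early in learning, collapsing to one after practice) can be checked mechanically on joint-space data. The sketch below counts the components needed to explain 95% of the variance; the synthetic joint-angle data and the 95% threshold are our choices, not values from the cited studies.

```python
import numpy as np

def n_components(data, threshold=0.95):
    """Number of principal components needed to explain `threshold`
    of the variance of `data` (rows: samples, columns: joint angles)."""
    centered = data - data.mean(axis=0)
    # Squared singular values are proportional to component variances.
    s = np.linalg.svd(centered, compute_uv=False)
    ratios = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(ratios, threshold) + 1)

rng = np.random.default_rng(1)
t = np.linspace(0, 10, 500)
# "After practice": four joints all driven by one underlying oscillation.
coordinated = np.outer(np.sin(2 * np.pi * t), [1.0, 0.8, 0.5, 0.3])
coordinated += 0.01 * rng.standard_normal(coordinated.shape)
# "Early learning": four independently moving joints.
uncoordinated = rng.standard_normal((500, 4))
print(n_components(coordinated), n_components(uncoordinated))  # → 1 4
```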

More central to the theme of this chapter is the “degrees of freedom problem”, first pointed out

by Bernstein (1967) (see also Newell and Vaillancourt, 2001; Sporns and Edelman, 1993; Vereijken

et al., 1992; Zernicke and Schneider, 1993): Although the human musculo-skeletal apparatus is a

highly complex and non-linear system, with a large number of potentially redundant degrees of free-

dom (e.g., more than one motor signal can lead to the same trajectory), well-coordinated and precisely

controlled movements emerge. In reality, the redundancy increases at the level of the muscles (there

are many more muscles than joints), and explodes at the neural level. While it guarantees flexibility

and adaptability (think of the hand’s astounding manipulative abilities, for instance), it also challenges

the control of body movements, largely because of the enormous number of components involved in

the generation and coordination of a movement. A possible solution to the control issues raised by the

excess number of degrees of freedom, was suggested by Bernstein himself. His proposal is character-

ized by three stages of change in the number of degrees of freedom that accompany motor learning

and development. Initially, in learning a new skill or movement, the peripheral degrees of freedom

(the ones farther from the trunk, such as wrist, and ankle) are reduced to a minimum (freezing). Sub-

sequently, as a consequence of experiment and exercise, restrictions at the periphery are gradually

lifted (freeing), till “all” degrees of freedom are incorporated. Eventually, reactive phenomena (such

as gravity and passive dynamics) are exploited, and the most efficient movements are selected. Several

studies have provided evidence for “particular features” of Bernstein’s three-stage model. Vereijken

et al. (1992), for example, conducted an empirical test of the related issues of freezing and freeing

degrees of freedom in adults learning a ski simulator task. The kinematic analysis of the limb and torso

motions showed that at the outset of learning, subjects froze many of the joint segments of the whole

body. With subsequent practice, subjects introduced active motion at the ankle, knee, and hip joints in a

fashion consistent with the freeing of (release of the ban on) degrees of freedom. Other investigations

included the learning by adults of a handwriting signature with the non-dominant limb (Newell and

van Emmerik, 1989), a dart throwing task (McDonald et al., 1989), pistol shooting (Arutyunyan et al.,

1969), and the development of infant leg kicking between two weeks and seven months of age (Jensen

et al., 1995).

In this study, we approach the “degrees of freedom problem” by employing a robot-based synthetic modeling approach that exploits findings from developmental psychology. Some instances of a developmental

approach to the issue have already been reported (Berthouze and Kuniyoshi, 1998; Lungarella and

Berthouze, 2002c; Metta, 2000) (see Lungarella and Berthouze, 2002c, for a review). Those studies,

however, framed the role of the freezing of degrees of freedom, and their subsequent freeing, in an

information-processing context – similar to existing connectionist learning techniques, such as con-

strained or incremental learning (e.g., Elman, 1993). More in line with our ideas, Taga (1997) reported

computer simulations of the development of bipedal locomotion in human infants. By freezing and


freeing the degrees of freedom of the neuro-musculo-skeletal system, he was able to reproduce (in

simulation) the “U-shaped” developmental trajectory of infants’ stepping movements during which re-

flexive movement patterns first appear, then disappear, and months later reappear in altered form. His

result was in agreement with Bernstein’s three-stage model of skill acquisition, and thus, he hypoth-

esized that a developmental mechanism of freezing and freeing may be important for learning stable

and complex movements. In this chapter, however, we will challenge this model by showing that in the

presence of strong couplings between system and environment during task learning, a rigid sequence

of morphological changes (freezing → freeing → selection) may not be sufficient. Instead, a more

complex dynamics of changes should be considered.

4.3 Pendulation study and release of the peripheral degrees of freedom

In Lungarella and Berthouze (2002c), we reported our investigation on the exploration of pendulation

(or swinging) in a small-sized humanoid robot. We chose swinging as a case-study, because it is a

repetitive activity, and thus, characteristic of emerging skills during the first year of life – see, for

instance, the notions of circular reaction (Piaget, 1953) and body babbling (Meltzoff and Moore, 1997).

Thelen and Smith (1994) suggested that oscillations are the product of a motor system under emergent

control; that is, when infants attain some degree of intentional control of limbs or body postures, but

when their movements are not fully “goal-corrected.”

Assuming a neural control structure suitable for the task at hand, we proposed a comparative anal-

ysis between outright use of the full body for exploration, and progressive exploration characterized by

a developmental freeing of the degrees of freedom, such as the one hypothesized by Bernstein (1967).

The study produced a number of insights, which we summarize here:

• The outright use of all degrees of freedom (hip and knee) reduced the likelihood of “physical en-

trainment”, that is, the mutual regulation of body and environmental dynamics. We observed that

small changes in the control parameters yielded different oscillatory behaviors. Moreover, even

within one control parameter setting, the system displayed several rapid and abrupt transitions

between different stationary regimes. This feature is characteristic of spontaneous movement

activity in infants (Goldfield, 1995).

• By freezing the peripheral degree of freedom (knee), we observed an increase of the range of

control parameter settings that led to stable oscillatory behaviors, as well as of the range of

oscillation frequencies for which physical entrainment could effectively occur. Miyakoshi et al.

(1994) and Williamson (1998) have shown that the exploitation of entrainment can indeed yield

robust behavior in various tasks.


• Bootstrapped control of all degrees of freedom in which the peripheral degree of freedom was

released after the system had already stabilized in a single degree of freedom (1-DOF) stationary

regime, resulted in a dense distribution of parameter settings yielding stable oscillatory behaviors

with a large amplitude. Statistical analysis showed that these large oscillations could not be

accounted for solely by the oscillations achieved in the 1-DOF (frozen) configuration. Instead,

the freezing and freeing of the degrees of freedom reduced the sensitivity of the system to the

selection of particular hip-knee parameter configurations.

• The study showed that joint synergies², which are characteristic of human motor control (e.g.,

Spencer and Thelen, 1999), played a complementary role to physical entrainment during the

release of the peripheral degree of freedom. A strong coupling resulted in “neural entrainment”,

whereby the control frequency of the lower limb locked onto the control frequency of the upper

limb. The phase locking between both limbs stabilized the oscillatory behavior, and thus by

entrainment effect, also the ongoing physical entrainment. Abrupt phase transitions did not

occur and transients were shortened, which is typical for task execution at the later stage of

motor skill learning (Goldfield, 1995).

4.4 Adding nonlinear perturbations

In this chapter, we revisit our previous study by adding a nonlinear coupling between environment

and system. Our focus is on whether a progressive release of the peripheral degrees of freedom can

provide adaptivity and robustness against perturbations and constraints such as the rubber band. Both

experimental setup and control architecture are identical to those used in our previous study.

The experimental setup consisted of a small-sized humanoid robot with 12 degrees of freedom.

Through two thin metal bars fixed to its shoulders, the robot was attached by a passive joint to a

supportive metallic frame in which it could freely oscillate in the vertical (sagittal) plane (see Fig. 4.1).

Each leg of the robot had five joints, but only two of them (hip and knee) were used in our experiments.

High torque servo motors actuated each joint. Because these motors do not provide any form of sensor

feedback, we used an external camera to track colored markers placed on the robot’s limbs. Throughout

this study, we refer to feedback as the visual position of the hip in a frame of reference centered on the

hip position, when the robot is in its resting position.

To study the effect of environmental interaction during learning, we introduced an asymmetric

nonlinear coupling between system and environment in the form of a thread attached to the humanoid

robot at hip-level, and connected to the supportive frame via a rubber band. This flexible link was

² During task-dependent movements, the joints are not controlled individually, but are coupled in such a way that they change relative to each other. This coupling is called a joint synergy.


Figure 4.1: Humanoid robot used in our experiments.

designed so that the rubber band would extend only when the robot was tilted backwards by at least

10 degrees. This setting was kept constant throughout the study. The strong dampening properties of

this coupling are illustrated in Figure 4.2, which shows the visual positions of the hip and ankle during

oscillations with control parameters known to yield resonant behavior in unperturbed situations.
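As a rough illustration of this one-sided coupling (our own toy model, not the thesis's; the stiffness value and the sign convention are assumptions), the rubber band can be modeled as a restoring torque that is zero until the backward tilt exceeds the 10-degree threshold:

```python
def rubber_band_torque(back_tilt_deg, k=0.5, threshold_deg=10.0):
    """One-sided elastic coupling: no force while the band is slack,
    a linear restoring torque once the robot is tilted backwards by
    more than `threshold_deg` degrees. `k` is an illustrative stiffness."""
    stretch = back_tilt_deg - threshold_deg   # how far past the threshold
    return -k * stretch if stretch > 0.0 else 0.0
```

Forward swings and small backward tilts are unaffected; only large backward excursions are opposed, which is what makes the perturbation asymmetric and strongly nonlinear.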

Figure 4.3 depicts the distributed architecture used to control the humanoid robot. Each limb was

controlled by a separate neural oscillator. The four neural oscillators controlling the knees and hips

were modelled by the following set of nonlinear differential equations, derived from Matsuoka (1985):

\tau_u \dot{u}_f = -u_f - \beta v_f - \omega_c [u_e]^+ - \omega_p [\mathrm{Feed}]^+ + t_e

\tau_u \dot{u}_e = -u_e - \beta v_e - \omega_c [u_f]^+ - \omega_p [\mathrm{Feed}]^- + t_e

\tau_v \dot{v}_f = -v_f + [u_f]^+

\tau_v \dot{v}_e = -v_e + [u_e]^+

where u_e and u_f are the inner states of the extensor neuron (e) and the flexor neuron (f), v_e and v_f are variables representing the degree of adaptation or self-inhibition of the extensor and flexor neurons, and t_e is

Figure 4.2: Resonant oscillations for (τ_u = 0.065, τ_v = 0.6) without perturbations (top). Resulting behavior under perturbations (bottom). In each graph, the time-series denote motor impulses (bottom), ankle position (middle) and hip position (top). In this figure, as well as in all other similar figures in this chapter, the vertical axis is unlabelled because it depicts time-series of different scales and units, i.e., visual positions in pixels and motor commands in radians. The horizontal line in the lower graph corresponds to the visual position of the location after which the rubber band is extended. The horizontal axis denotes time in milliseconds.

an external tonic excitation signal that determines the amplitude of the oscillation. β is an adaptation constant, ω_c is a coupling constant that controls the mutual inhibition of neurons e and f, and ω_p is a variable weighting the proprioceptive feedback Feed. This proprioceptive feedback is obtained through the visual position of the hip in a frame of reference centered on the hip position when the robot is in its resting position. τ_u and τ_v are time constants of the neurons' inner states and determine the strength of the adaptation effect. The operators [x]^+ and [x]^− return the positive and negative parts of x, respectively.
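A minimal numerical sketch of the oscillator defined by these equations (our own forward-Euler integration, not the thesis code; zero proprioceptive feedback, the initial conditions, and the reading of [x]^− as the magnitude of the negative part are assumptions, while β = 2.5, ω_c = 2.0 and t_e = 20 follow the chapter):

```python
def pos(x):
    """[x]+ : positive part of x."""
    return max(x, 0.0)

def neg(x):
    """[x]- : magnitude of the negative part of x (one common reading)."""
    return max(-x, 0.0)

def simulate(tau_u=0.06, tau_v=0.65, beta=2.5, omega_c=2.0,
             omega_p=0.0, t_e=20.0, feed=lambda t: 0.0,
             dt=0.001, t_end=8.0):
    """Forward-Euler integration of the flexor/extensor pair; returns
    the output series y = u_f - u_e."""
    u_f, u_e, v_f, v_e = 0.1, 0.0, 0.0, 0.0  # small asymmetry starts the cycle
    ys, t = [], 0.0
    while t < t_end:
        du_f = (-u_f - beta * v_f - omega_c * pos(u_e)
                - omega_p * pos(feed(t)) + t_e) / tau_u
        du_e = (-u_e - beta * v_e - omega_c * pos(u_f)
                - omega_p * neg(feed(t)) + t_e) / tau_u
        dv_f = (-v_f + pos(u_f)) / tau_v
        dv_e = (-v_e + pos(u_e)) / tau_v
        u_f += dt * du_f; u_e += dt * du_e
        v_f += dt * dv_f; v_e += dt * dv_e
        ys.append(u_f - u_e)
        t += dt
    return ys
```

With these settings the mutual inhibition and adaptation produce a sustained alternation: y changes sign periodically, and those sign changes are what drive the motor pulses.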

Joint synergy between hip and knee, i.e., the appropriate phase relationship between the corresponding neural oscillators, was implemented by feeding the flexor unit of the knee oscillator with the combined outputs of the flexor and extensor units of the hip controller. A factor −ω_s([u_f^h]^+ + [u_e^h]^+) was added to the term τ_u \dot{u}_f in the flexor unit of the knee oscillator (Equation 4.1), with u_f^h and u_e^h the inner states of the flexor and extensor units in the hip oscillator, and ω_s the intersegmental coupling parameter determining the strength of the coupling.
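This coupling can be sketched as follows (illustrative names and values; only the synergy term itself is from the text, and the proprioceptive term is omitted for brevity): the knee flexor's drift gains an inhibitory contribution proportional to the positive activities of the hip units.

```python
def knee_flexor_drift(u_f, v_f, u_e, u_f_hip, u_e_hip,
                      tau_u=0.035, beta=2.5, omega_c=2.0,
                      t_e=15.0, omega_s=1.0):
    """Right-hand side of the knee flexor equation with the added
    intersegmental term -omega_s * ([u_f_hip]+ + [u_e_hip]+)."""
    pos = lambda x: max(x, 0.0)
    synergy = -omega_s * (pos(u_f_hip) + pos(u_e_hip))
    return (-u_f - beta * v_f - omega_c * pos(u_e) + t_e + synergy) / tau_u
```

With ω_s = 0 the knee unit runs free of the hip; larger ω_s lets hip activity inhibit, and thereby entrain, the knee flexor.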

Figure 4.3: Schematics of the experimental system and neural control architecture. Joint synergy is only activated in experiments involving coordinated 2-DOF control.

As in Taga (1991), we used each neural oscillator as a rhythm generator, with its output y given by

the difference y = u_f − u_e between the activities of the flexor and extensor units. This value was then

fed to a pulse generator which detects sign changes in the output y of the neural oscillator and generates

a pulse of constant amplitude and of sign sgn(y). The angular position of the motor results from the

integration in time of each pulse. Though very primitive (a variant of on-off control), this controller is

a suitable approximation of the output y. Indeed, it preserves the frequency and maximal amplitude of


the signal, as well as the timing of sign inversions within one period.

As in the original study (unless specified otherwise), we did not change the following parameters

throughout this study: β = 2.5, ωc = 2.0, te = 20 for the hip (te = 15 for the knee). Other parameters

were set as discussed in the text.
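The pulse-generator stage can be sketched as follows (our reading of the description above: the motor command is a constant-amplitude signal of sign sgn(y), integrated over time into an angular position; the pulse amplitude is an arbitrary illustrative value):

```python
def motor_angle(ys, dt=0.001, pulse_amp=1.0):
    """Turn the oscillator output series `ys` into a motor angle by
    integrating constant-amplitude pulses of sign sgn(y)."""
    angle, angles = 0.0, []
    for y in ys:
        s = (y > 0) - (y < 0)             # sgn(y)
        angle += pulse_amp * s * dt       # integrate the on-off command
        angles.append(angle)
    return angles
```

Because only the sign of y matters, the frequency and the timing of sign inversions of the oscillator output are preserved, as noted above.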

4.5 Results and discussion

4.5.1 Protocol

With the aim of a comparative analysis between the outright use of all degrees of freedom and a pro-

gressive release of the degrees of freedom, we realized two sets of experiments. In the first set, we

considered “2-DOF exploratory control”, with each pair of hip and knee joints controlled by a separate

oscillator unit and the other joints kept stiff in their reset position. We treated two cases: In the first

case, the oscillator units were independent and their respective parameter spaces were independently

explored. In the second case, the oscillator units were coupled via an intersegmental coupling param-

eter ωs, with the goal of realizing neural entrainment between oscillatory units. In the second set of

experiments, we considered a “bootstrapping 1-DOF exploratory phase” during which only the hip

joint was controlled, while other joints were kept stiff in their reset position. When a stationary regime

was obtained, the peripheral degree of freedom (knee) was released and controlled by its own oscillator

unit. The robot’s movements were analyzed via the recording of the hip, knee, and ankle positions.

The same initial conditions were used in all experiments, with the humanoid robot starting from its

resting position. We only considered parameter configurations which yielded motion without external

intervention.

4.5.2 Experimental observations

Unless specified otherwise, all experiments within each scenario were found to yield qualitatively

similar results in terms of the characteristics of the oscillatory behavior, with variations accounted for

by differences in initial conditions. For practical reasons (excessive strain on the physical structure of

the robot as well as on the servo motors), we did not conduct enough runs to establish statistically meaningful comparisons between scenarios.

Selection of suitable hip control parameters in 1-DOF exploratory control

Because an exhaustive exploration of the parameter space for two independent neural controllers was

not feasible, we performed a preliminary exploration of the hip oscillator’s parameter space in a 1-

DOF configuration (the reader should refer to Figure 4.4 and Table 4.1 for an overview of the different configurations discussed in the following paragraphs). We conducted the exploration using the

value-based exploration algorithm presented in the first study (Chapter 3). This exploration essentially

confirmed our previous findings. Adaptivity to external perturbations and optimal task performance,

i.e., oscillations with large amplitude, required fine tuning of the parameters. Although the hip-ankle

phase plots were not necessarily stationary, all configurations led to a stationary regime of hip oscilla-

tions. Two settings were of particular interest, and were used to carry out the experiments described

here: (τ_u = 0.035, τ_v = 0.65) and (τ_u = 0.06, τ_v = 0.65), with ω_p^h ∈ [0.0, 7.0] for the first setting, and ω_p^h ∈ [0.0, 20.0] for the second setting.

Figure 4.4: Flow of the proposed experimental discussion with respect to both 1-DOF and 2-DOF exploration (cf. Table 4.1).

The first setting (τu = 0.035,τv = 0.65) was characterized by low-amplitude (23 units) antiphase

oscillations of the legs with respect to body motion. Antiphase oscillations are indicated by a phase difference between the vertical components of the hip and ankle positions equal to π radians. A transversal analysis along the proprioceptive gain ω_p^h ∈ [0.0, 7.0] showed that all experiments yielded a stationary

regime, robust to external perturbations such as a manual push. With a very weak proprioceptive gain,


Label  τ_u^h, τ_v^h          τ_u^k, τ_v^k                    ω_p^h        ω_p^k   ω_s

P0     0.035, 0.65                                           0.0
P1     0.060, 0.65                                           0.0
P2     0.060, 0.65                                           [3.0, 7.0]
P3     0.060, 0.65                                           [2.0, 3.0]
P4     0.060, 0.65                                           [0.0, 2.0]
P5     0.035, 0.65                                           [0.0, 7.0]
P6     0.060, 0.65           [0.020, 0.090], [0.35, 0.80]    2.0
P7     [0.025, 0.075], 0.65                                  2.0
P8     0.060, 0.65           0.035, 0.40                     2.0                  0.0
P9     0.060, 0.65           0.035, 0.40                     2.0                  [0.25, 0.75]
P10    0.060, 0.65           0.035, 0.40                     2.0                  1.0

Table 4.1: Synopsis of the control parameter settings used in Figure 4.4.

i.e., ω_p^h ∈ [0.0, 1.0], we observed smooth, low-amplitude (50 units at hip-level) in-phase (no phase difference) oscillations. With a larger gain, the hip oscillations were limited to an amplitude corresponding to the rubber-band extension point and the ankle behavior was not smooth. We summarized these results in Figure 4.5.
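Phase relationships of this kind can be estimated from the recorded position series; a stdlib-only sketch (our own helper, not the thesis's analysis code) finds the circular lag that best aligns the two series and converts it to radians:

```python
import math

def phase_difference(hip, ankle, period):
    """Phase of `ankle` relative to `hip`, in radians, via the
    best-matching circular shift within one (sample) period.
    Assumes equal-length series whose length is a multiple of `period`."""
    n = len(hip)
    best_lag, best_corr = 0, float("-inf")
    for lag in range(period):
        corr = sum(hip[i] * ankle[(i + lag) % n] for i in range(n))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return 2 * math.pi * best_lag / period

period, n = 100, 200
hip = [math.sin(2 * math.pi * i / period) for i in range(n)]
in_phase = phase_difference(hip, hip, period)                   # ~0 radians
anti_phase = phase_difference(hip, [-h for h in hip], period)   # ~pi radians
```

On clean periodic signals the helper returns 0 for in-phase and π for antiphase motion, matching the classification used in the text.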

With the second setting (τ_u = 0.06, τ_v = 0.65), a transversal analysis along the proprioceptive gain showed a variety of behaviors. For extreme values of ω_p^h (ω_p^h = 0 and ω_p^h > 6.0), we did not observe any sustained oscillations, and amplitudes did not exceed the rubber-band extension point. Furthermore, manual pushes did not enable the system to stray away from this “trivial” attractor. This result was predictable. With ω_p^h = 0.0, variations in the inertial angles resulting from the perturbation were not fed to the controller, and physical entrainment could not occur because the time-constants of the feedback loop and the control units were not compatible. On the other hand, with too high a gain (ω_p^h > 6.0), the system was essentially driven by noise, leading to a pseudo-chaotic oscillatory behavior.

For intermediate values of ω_p^h, we observed multiple co-existing regimes. The value ω_p^h = 2.0 was particularly noticeable, with three distinct regimes. From the resting position, a first quasi-stationary

regime was obtained in which in-phase oscillations were sustained, albeit with very low amplitude

(rubber-band extension point) and with a continuous shift between the phases of the hip and ankle

oscillations. After a manual push, a second stationary regime was reached in which larger hip oscil-

lations occurred, but with an aperiodic hip-ankle phase plot. With yet another push, large in-phase

smooth oscillations (amplitude 75 units) were obtained, similar to those obtained with ω_p^h = 0.5. This

regime was not robust against external perturbations and the system would subsequently settle in any

of the three regimes. We found that this switching behavior was repeatable over various experiments.

From the point of view of the trade-off between stability and plasticity, i.e., stability to perturbation

is desirable but not at the cost of learning plasticity, a systematic occurrence of this switching behavior

across the entire control parameter space would be highly desirable as an intrinsic mechanism to strive


Figure 4.5: Time-series of hip position (top) and ankle-hip phase plots (bottom) for ω_p^h = 0.25 (left) and ω_p^h = 4.0 (right). The oscillator time-constants are τ_u = 0.035, τ_v = 0.65 in both cases. In the upper row of plots, the vertical axis denotes the visual positions of the ankle (left) and the hip (right). The horizontal axis denotes time in milliseconds. In the lower row of plots, both vertical and horizontal axes correspond to the visual positions of the hip (left plot) and ankle (right plot) in pixels.

around attractor states. Consequently, we carried out a set of experiments in which we fixed the proprioceptive gain to the critical value ω_p^h = 2.0. The parameter space for the hip controller was explored

with τu in the “usable” range [0.025,0.075]. The switching behavior could not be reproduced, however.

Instead, all configurations produced a single stationary regime, robust to external perturbations, with

low-amplitude hip oscillations and generally non-periodic hip-ankle phase plots.

Instability of 2-DOF exploratory control

Using the hip parameters identified above (τ_u^h = 0.06, τ_v^h = 0.65), we realized a sparse exploration of the knee neural oscillator parameters, with τ_u^k ∈ [0.02, 0.09] and τ_v^k ∈ [0.35, 0.8]. Proprioception was fed to the hip unit only, with a gain ω_p^h = 2.0. All experiments yielded the same qualitative behavior:

stationary low-amplitude (30 units) hip oscillations and non-stationary ankle movements.

This result was predictable. Because of its lack of proprioceptive feedback, the knee unit could not

entrain with the hip oscillations. Meanwhile, the hip unit entrained to the oscillations resulting from the


simultaneous motor commands of both hip and knee, thus inducing a continuous phase shift between

hip and knee motor commands (see Fig. 4.6). Because of the morphology of the system and the 3 : 2

ratio between hip and knee tonic excitations, hip oscillations were sustained, but both environmental

perturbations and out-of-phase knee oscillations reduced the amplitude of the oscillation to a nominal

level.

Figure 4.6: From top to bottom, time-series of hip and ankle positions, hip and knee motor commands with the following parameters: τ_u^h = 0.06, τ_v^h = 0.65, τ_u^k = 0.02, τ_v^k = 0.8 and ω_p^h = 2.0. The horizontal axis denotes time in milliseconds. The system was manually perturbed after about 37.5 s.

This interpretation was confirmed with experiments carried out with a small proprioceptive gain on

the hip (ω_p^h = 0.25). With a lower gain, the hip motor commands were not entrained as much to the overall oscillations, and physical entrainment between knee and hip motor commands could occur because the phase shift was slower. Figure 4.7 illustrates the co-existence of two regimes when ω_p^h = 0.25 and τ_u^k = 0.025, τ_v^k = 0.35. The first regime is qualitatively similar to the behavior observed in the previous

instance (although in this case, the hip oscillations also exhibit a “wave-like” stationary regime). The

second regime consists of large (55 units) in-phase oscillations.

To further confirm the hypothesis, we carried out a last batch of experiments in which the knee

Figure 4.7: From top to bottom, time-series of hip and ankle positions, hip and knee motor commands with the following parameters: τ_u^h = 0.06, τ_v^h = 0.65, τ_u^k = 0.025, τ_v^k = 0.35 and ω_p^h = 0.25. The horizontal axis denotes time in milliseconds. The system was manually perturbed at times 37 s, 75 s, 108 s and 147 s (vertical lines).

control unit was also fed with proprioceptive feedback. After fixing the knee unit parameters to τ_u^k = 0.06, τ_v^k = 0.65, we varied the knee proprioceptive feedback gain ω_p^k in the interval [0.0, 8.0]. We found

oscillatory behaviors qualitatively similar to those obtained without proprioception to the knee, namely,

low-amplitude hip oscillations and a stationary regime robust to external perturbations. Higher gains led to

a reduction of the phase difference between hip and ankle oscillations, and to a smoother oscillatory

behavior. With different knee parameters (τ_u^k = 0.02, τ_v^k = 0.35), however, we observed a wide range of

behaviors, from non-stationary and non-smooth ankle behaviors to in-phase and stationary oscillations.

With an increase in the knee proprioceptive gain, the phase shifts became stronger and the stationary

regimes were not sustained.

As in our initial study, the parameter ωs, which determines the strength of the intersegmental

coupling, played a crucial role. With too low a value, the coordination between hip and knee oscillators

was very loose and we observed results qualitatively similar to the independent case. With a high

value (here, 1.0), a strong coupling occurred and because the lower limb was mainly driven by the

hip control unit, the system essentially became a “flexible 1-DOF system” (Lungarella and Berthouze,


2002a). To illustrate this point, we carried out the following experiments. The hip unit parameters

were initialized to (τ_u^h = 0.06, τ_v^h = 0.65), and the knee control parameters (τ_u^k and τ_v^k) were set so that with an intersegmental coupling of ω_s = 0.0 multiple oscillatory regimes could co-exist. We used the following values: τ_u^k = 0.035, τ_v^k = 0.4. The proprioceptive feedback gain to the hip was set to ω_p^h = 2.0, i.e., its critical value as determined experimentally. With ω_s = 1.0, the system stabilized into

a stable regime in which hip and knee oscillated in phase (see motor commands in the close-ups of

Fig. 4.8). Interestingly, knee kicking motion occurred only shortly before the robot reached the point

after which the rubber band would have extended. From an intuitive point of view, this behavior could

be optimal task performance.

Figure 4.8: Co-existing regimes for ω_s = 0.0 and τ_u^h = 0.06, τ_v^h = 0.65, τ_u^k = 0.035, τ_v^k = 0.4 (top). Unique in-phase oscillatory regime with ω_s = 1.0 (bottom). In each graph, the time-series denote hip and ankle positions, hip and knee motor commands (from top to bottom). Right-hand windows are close-ups on the time-series. The horizontal axis denotes time in milliseconds.

With intermediate values, i.e., ω_s ∈ [0.25, 0.75], the intersegmental coupling was not sufficient to

overcome the difference in time-constants between the hip and the knee control units, and its effects

were negligible. This outcome was in sharp contrast with our previous findings that intersegmental

coupling (without proprioceptive feedback) could account for a reduction of transients and for the suppression of abrupt phase transitions. We had attributed that result to the effect of neural entrainment,

whereby the outputs of the control units tend to smoothly converge towards a stable configuration (Lungarella and Berthouze, 2002c). In the case of physical constraints (the rubber band), however, a stable

configuration cannot be systematically found.

Bootstrapped 2-DOF exploratory control

As in the original study, we experimented with a controlled release of the second degree of freedom

after the system had reached a stationary regime in a 1-DOF configuration. We selected 1-DOF parameter configurations such as those discussed earlier, but not necessarily close to the resonant solution. The

reaching of the stationary regime was visually evaluated by the experimenter and the second degree of

freedom was then released. Although this visual appraisal may appear an ad-hoc solution, it actually

helps validate our observations by introducing variance in the time after which the degree of freedom

is released.

In contrast to the initial study in which all configurations led to a stable, in-phase stationary regime

with large amplitude, the introduction of the second degree of freedom induced different behaviors

that showed a relatively high sensitivity to the values of the knee control parameters. We observed two

typical situations: (a) The introduction of the second degree of freedom induced a phase shift which

resulted in dampened oscillations, as shown in Figure 4.9 (left). This phenomenon was repeatable

and robust to external perturbations. (b) When the 1-DOF regime was close to resonant control, the

oscillatory behavior was left unchanged by the addition of a second degree of freedom, as shown in

Figure 4.9 (right). Again, this is a natural result of the morphology of the system and the 3 : 2 ratio

between hip and knee tonic excitations.

In further contrast with the initial study, we did not observe any instance where the introduction

of the second degree of freedom led to better task performance. Instead, it often induced a collapse

of the hip oscillations. We used these occurrences as a triggering signal for a new freezing/freeing

phase of the peripheral degree of freedom. After freezing, the system always returned to an oscillatory

behavior typical of its 1-DOF configuration. Subsequent releases led either to a new collapse of the

hip oscillations, and thus, a new cycle of freezing/freeing, or to sustained oscillatory behavior (see

Fig. 4.10).
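The triggering logic just described can be summarized as a small state machine (a hypothetical sketch; the thresholds and the amplitude measure are our own illustrative choices, not values from the experiments):

```python
def freeze_free_step(mode, hip_amplitude,
                     stable_thresh=40.0, collapse_thresh=10.0):
    """Next mode of the peripheral DOF. Release the knee once the
    1-DOF hip oscillation is established; re-freeze it whenever the
    hip oscillations collapse."""
    if mode == "frozen" and hip_amplitude > stable_thresh:
        return "free"      # stationary 1-DOF regime reached: release knee
    if mode == "free" and hip_amplitude < collapse_thresh:
        return "frozen"    # collapse detected: start a new freezing phase
    return mode
```

Iterating this rule over the measured hip amplitude reproduces the alternating phases: each collapse triggers re-freezing, each recovery a new release, until a release is followed by sustained oscillations.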

This result raises the question of whether freeing and freezing are just another form of perturbation.

At this stage, we are not in a position to provide a definite theoretical reply. We are also not aware

of any existing theoretical characterization of the effect of freezing/freeing on the motion patterns

of human subjects engaged in tasks typically observed by developmental psychologists. We are not

arguing against the fact that a carefully designed perturbation, or a set of artificial constraints, could

trigger the same type of motor changes as those induced by freezing and unfreezing. However, it does

Figure 4.9: Results of the release of an additional degree of freedom after stabilization in a 1-DOF configuration. Left: (τ_u^h = 0.045, τ_v^h = 0.65) and (τ_u^k = 0.025, τ_v^k = 0.45). Right: (τ_u^h = 0.06, τ_v^h = 0.65) and (τ_u^k = 0.025, τ_v^k = 0.35). From top to bottom, the time-series denote hip and ankle positions, hip and knee motor commands. The horizontal axis denotes time in milliseconds.

not appear plausible that infants rely on the likelihood of encountering such a particular perturbation

to generate the appropriate chain of changes required for them to acquire their various skills. Indeed,

developmental psychologists observe such sequences of change without having to introduce external biases. Thus, it seems reasonable to attribute these pathways of change to an intrinsic mechanism like

freezing and freeing (which could be seen as an intermediate stage en route to the self-organization of

motor activities). Our experimental results show that unlike external perturbations such as a manual

push, this mechanism can consistently and reliably lead the system to stray away from the sensorimotor

area explored at the time of the “perturbation.”

This could be interpreted in terms of the three stages of human motor skill acquisition proposed

by Goldfield (1995): (1) inability to control excessive degrees of freedom pushing infants outside the

limits of their postural stability; (2) reduction of the number of degrees of freedom to simplify the

control, either introducing synergies or by freezing degrees of freedom; and (3) controlled release of

the frozen degrees of freedom following recovery. Figure 4.11 shows empirical evidence for the effect


Figure 4.10: Oscillatory behavior obtained during alternate freezing and freeing phases. Neural parameters are unchanged and set to τu^h = 0.06, τv^h = 0.65, τu^k = 0.03, τv^k = 0.325, ωp^h = 0.5 and ωs = 0.5. From top to bottom, time-series denote hip and ankle positions, and hip and knee motor commands. The horizontal axis denotes time in milliseconds.

of alternate freezing and freeing of the degrees of freedom. The close-ups on the right-hand side show

that although the control parameters did not change, the kicking pattern of the knee differed between subsequent releases.

4.6 Conclusion and future directions

In this study, we set out to assess whether an initial phase of freezing followed by a subsequent phase

of freeing of degrees of freedom, such as proposed by Bernstein’s model, would be sufficient to over-

come the increase in task complexity induced by a strong nonlinear coupling between the pendulating

robot and its environment. By comparing use of the full body with progressive exploration through a developmental cycle of freezing and freeing of the degrees of freedom, we showed that a single stage of

freezing/freeing was not sufficient to develop stable oscillatory behaviors. In contrast to our previous

study (Lungarella and Berthouze, 2002c), alternate freezing and freeing was required. The interest of

this result is two-fold:


Figure 4.11: Effect of alternate freeing and freezing of the knee. Neural parameters are unchanged and set to τu^h = 0.035, τv^h = 0.65, τu^k = 0.055, τv^k = 0.45, ωp^h = 0.5 and ωs = 0.5. From top to bottom, time-series denote hip and ankle positions, and hip and knee motor commands. Right-hand graphs are close-ups on the two different regimes. The horizontal axis denotes time in milliseconds.

1. It confirms the recent observations by Newell and Vaillancourt (2001) that Bernstein’s frame-

work may be too narrow to account for coordination changes observed in motor learning (in

adults as well as in children) (see also Haehl et al., 2000; Ko et al., 2003). According to Ko et al.

(2003, p.48), “there is growing evidence that there may not be, as suggested by Bernstein, a

single pathway of change in the evolving patterns of coordination as a function of learning.” In-

stead, depending on the task, there can be either an increase or a decrease in (a) the number of

involved mechanical degrees of freedom, and (b) the dimension of the attractor dynamics of the

motor output (number of dynamical degrees of freedom). Newell and van Emmerik (1989), for

example, found no evidence of the freeing of the distal arm segments in the learning of signature

writing, even though McDonald et al. (1989) found evidence of a release of the most distal wrist

segment in learning a dart-throwing task with the non-dominant arm but only after several days

of practice. Newell and Vaillancourt (2001) also report that while open chain linkages, such as

arms and legs, are more prone to exhibit a proximal to distal direction to the recruiting of the


biomechanical degrees of freedom, this pathway of change is only due to particular task con-

straints and may not be a general learning strategy. This interpretation is supported by Haehl

et al. (2000)’s study on infants learning to cruise (walking with support). This study showed that

infants displayed an initial poorly controlled exploratory phase – a "wobbling" phase – characterized by a large number of movement reversals (i.e., dynamical degrees of freedom).

2. It provides empirical evidence suggesting that perturbations which push the system outside the

limits of its postural stability, or which increase the complexity of the task may be the triggering

mechanism for alternate freezing and freeing of degrees of freedom. As with Newell and Vail-

lancourt (2001), this study does not allow us to further speculate on (a) the factors responsible

for the multiple pathways of change observed in the learning of motor coordination (besides

task-dependence, and confluence of constraints in action), and (b) how those factors combine

with the neural dynamics to implement those changes. However, we believe that it provides

opportunities to further investigate the issue of increased task complexity and task constraints.

In Chapter 5, for instance, we will report on a study investigating robot-bouncing, which took

inspiration from a longitudinal study by Goldfield et al. (1993) on infants’ bouncing in a Jolly

Jumper, i.e., a harness hung from the ceiling by springs or rubber bands. Despite the preliminary

nature of the results, the claims made here seem to be substantiated (Lungarella and Berthouze,

2003).

This study points at two challenges to be addressed in the future: The first one relates to the proper

characterization or description of the multiple pathways of change observed during the learning of mo-

tor patterns in a given task. Taking a biomechanical stance, we could quantify the motor activity in

terms of “biomechanical” degrees of freedom, i.e., the change over time of the number of joints or mus-

cles responsible for the particular coordination strategy employed to accomplish the task. A dynamical

systems perspective, on the other hand, would refer to the dynamical or “active degrees of freedom”

that correspond to the geometric layout of the attractor dynamics. In the case of simple patterns of

coordination, such as the one in our initial study, it may be justified to attribute to a single variable,

e.g., the relative phase between limbs, the role of “order parameter”, or “collective variable” (Kelso,

1995). Even then, however, the motion of a single joint can yield a dimension greater than one. As for

whole body action, we have little understanding of the number or the nature of dimensions that capture

the collective organization of the system (Newell and Vaillancourt, 2001). Thus the matching of those

two dimensions (biomechanical and dynamical) is a major challenge.

The second point is closely related to the first one and concerns the tight interaction between neural

dynamics and bodily activity. In our two studies, we intentionally focussed on the role of physical

(morphological) changes for a fixed control parameter setting (i.e., a given neural organization). Al-

though this step was useful – it helped us demonstrate experimentally that such changes represent an


adaptive mechanism in their own right – it lacked biological plausibility, causing the relatively poor performance obtained in the face of strong perturbations. In reality, neural dynamics entrains to physical

dynamics (as shown by control synergy, for example) and control re-organization occurs as a result of

learning. In this respect, the choice of Matsuoka oscillators is arguable. This type of oscillator has been

shown to have poor characteristics when feedback-induced delay increases above a certain value (see

Taga, 1994, for instance). We hypothesize that, in this study, the nonlinear coupling may have intro-

duced a significant feedback-delay, which in turn resulted in the failure to entrain. Asymptotically-

stable limit-cycle oscillators with physiologically plausible characteristics, e.g., the Bonhoeffer-Van

der Pol model (Fitzhugh, 1961), are possible alternatives to Matsuoka’s model, because they exhibit

“flexible phase-locking”, i.e., they show greater flexibility in changing their relative phase to respond

to incoming entraining actions, even in the presence of strong delays (Ohgane et al., 2004).

Finally, we would like to briefly comment on an important issue, namely that of the difference

between exploration and learning. Is this case-study about learning, or is it simply about the exploration

of the sensorimotor space during pendulation? In what way does it relate to development? In our

framework, exploration is a key component of task acquisition. Exploration produces the diversity of

sensorimotor trajectories (instances of task executions) which higher brain systems can subsequently

select, and exploit to realize learning, for example, in the form of consolidation of a parameter in

motor memory, or to train forward models (e.g. Wolpert et al., 2003). With a few exceptions, most

motor tasks require practice before optimal performance is achieved, and in young infants – at a stage

when they have not acquired many primitive motor behaviors on top of which to build more complex

skills – the role of exploration is critical. The use of value-based learning algorithms, such as the

ones discussed in this chapter, but also in Chapter 3 and in Chapter 6, implement a first step toward

learning, as exploration is driven by value, i.e., task performance. In the original study, we showed how

such value-driven exploration led to a quick convergence to a stable motor behavior. Thus, exploration

should be seen as an adaptive (plastic) mechanism in its own right, although it acts on a different

ontogenetic time-scale than that of learning or development.

Chapter 5

On the Synergy Between Neural and

Body-Environment Dynamics1

An outstanding property of the nervous system is that it is self-organizing, i.e., in contact with a

new environment the nervous system tends to develop that internal organization which leads to

behavior adapted to that environment. (Ashby, 1947)

5.1 Synopsis

The study of how infants strapped in a Jolly Jumper learn to bounce can help clarify how they explore

different ways of exploiting the dynamics of their movements. In this paper, we describe and discuss a

set of preliminary experiments performed with a bouncing humanoid robot and aimed at instantiating

a few computational principles thought to underlie the development of motor skills. Our experiments

show that a suitable choice of the coupling constants between hip, knee, and ankle joints, as well as

of the strength of the sensory feedback, induces a reduction of movement variability, and leads to an

increase in bouncing amplitude and movement stability. This result is attributed to the synergy between

neural and body-environment dynamics.

5.2 Introduction

Despite the availability of many descriptive accounts of infant development, modeling how motor

abilities unfold over time has proven to be a hard problem (Goldfield, 1995; Sporns and Edelman, 1993;

1 To appear as Lungarella, M. and Berthouze, L. (2004). Robot bouncing: on the synergy between neural and body-environment dynamics. In Iida, F., Pfeifer, R., Steels, L. and Kuniyoshi, Y. (eds.) Embodied Artificial Intelligence. Berlin: Springer-Verlag.



Thelen and Smith, 1994). Existing models are based on general principles and specific mechanisms

which are assumed to underlie the changes in early motor development.

One such mechanism is self-exploration through spontaneous activity. An important precursor

of later motor control (Forssberg, 1999; Piek, 2001; Thelen and Fischer, 1983), its main role seems

to be the exploration of various musculo-skeletal organizations in the context of multiple constraints

such as environment, task, architecture of nervous system, muscle strength, masses of the limbs, and

so on. A growing number of developmental psychologists has started to advocate the view that self-

exploration through spontaneous movements helps infants bootstrap new forms of motor activity, as

well as discover more effective ways of exploiting the dynamics generated by their bodily activities

(Angulo-Kinzler, 2001; Goldfield, 1995; Schneider et al., 1990; Sporns and Edelman, 1993; Von Hof-

sten, 1993). It has been suggested that through movements that garner information specific to stable

regions in the high-dimensional space of possible motor activations, self-exploration can lead to a

state of awareness about body and environment (Goldfield, 1995). In fact, fetuses (as early as 8 to 10

weeks after conception) as well as newborn infants display a large variety of transient and spontaneous

movement patterns such as infant stepping and kicking (Thelen and Smith, 1994), spontaneous arm

movements (Piek and Carman, 1994), general movements and sucking movements (Prechtl, 1997). In-

fants probably learn about their body by performing movements over and over again, and by exploiting

the continuous flow of sensory information from multiple sensory modalities. In doing so, they ex-

plore, discover, and eventually select – among the myriad of available solutions – those that are more

adaptive and effective (Angulo-Kinzler, 2001).

The control of exploratory movements has been traditionally attributed to neural mechanisms

alone. Prechtl (1997), for instance, linked the production and regulation of spontaneous motility in in-

fancy “exclusively” to endogenous neural mechanisms, such as central pattern generators. This claim

is somewhat substantiated by the fact that in many vertebrate species, central pattern generators appear

to generate the rhythm and form of the bursts of motoneurons (Grillner, 1985), or to govern innate

movement behaviors altogether (Forssberg, 1999).

In the last two decades, however, new evidence has pushed forward an alternative and multi-causal

explanation theoretically grounded in dynamic systems theory (Thelen and Smith, 1994). Accord-

ing to this view, coordinated motor behavior is also the result of a tight coupling between the neural

and biomechanical aspects of movement, and the environmental context in which the movement oc-

curs (Goldfield, 1995; Kelso, 1995; Taga, 1995; Thelen and Smith, 1994). Spontaneous movements

are not mere random movements, but are organized (or better, they self-organize), right from the very

start, into recognizable patterns involving various parts of the body, such as head, trunk, arms, and

legs. Spontaneous kicks in the first few months of life, for instance, appear to be particularly well-

coordinated movements characterized by a tight coupling (Thelen and Fischer, 1983; Thelen and Smith,

1994), and by short phase lags between the hip, knee and ankle joints (Piek, 2001). Rigid phase-locked


movements can be interpreted as a “freezing” of a number of degrees of freedom that must be controlled

by the nervous system, thus resulting in a reduction of the movement variability and complexity, and

in a faster learning process (Bernstein, 1967; Turvey and Fitzpatrick, 1993). During development, the

strong synchrony is weakened, and the degrees of freedom are gradually “released” (Piek, 2001; Thelen

and Fischer, 1983; Vaal et al., 2001). The ability to change the patterns of coordination between vari-

ous joints to accomplish a task is an important aspect of infants’ motor development (Angulo-Kinzler,

2001). It has been shown that tight interjoint coupling persisting beyond the first few months of life

may lead to poor motor development, or may even be associated with abnormal development (Vaal

et al., 2001).

In a previous paper, we examined the effects of “freezing and freeing of degrees of freedom” (Lun-

garella and Berthouze, 2002c) in a swinging biped robot. The study showed that by freezing (that is,

rigidly coupling) and by subsequently freeing the mechanical degrees of freedom, the sensorimotor

space was more efficiently explored, and the likelihood of a mutual regulation of body-environment

and neural dynamics (that is, entrainment) was increased. The aim of this chapter is to further our

understanding of the role played by the coupling (a) between joints, and (b) between the sensory ap-

paratus and the neural structure for the acquisition of motor skills. To achieve this goal, we embedded

a pattern generating neural structure in a biped robot, and by manually altering various coupling con-

stants, we systematically studied their interaction with the body-environment dynamics in the context

of a real task (bouncing).

5.3 Hypotheses on infant bouncing learning

Goldfield et al. (1993) performed a longitudinal study in which eight six-month-old infants strapped

in a “Jolly Jumper” (i.e., a harness attached to a spring, see Fig. 5.1) were observed once a week, for a

period of several weeks, while learning to bounce. They concluded that in the course of learning, the

infants’ motor activity could be decomposed into an initial “assembly phase”, during which kicking

was irregular and variable in period, followed by a “tuning phase” characterized by bursts of more

periodic kicking and long bouts of sustained bouncing, during which infants seemed to refine and

adapt the movement to the particular conditions of the task. A third phase was initiated by a sudden

doubling of the bout length, and was characterized by oscillations of the mass-spring system at its

resonant frequency, a noticeable rise in amplitude, and a decrease in the variability of the period of the

oscillations.

From this study, we derive a few principles. First, there is no need to postulate a set of prepro-

grammed instructions or predefined motor behaviors. It is by means of a process of self-organization

and self-discovery, and through various spontaneous (seemingly random) movements that infants ex-

plored their action space and eventually discovered that kicks against the floor had “interesting” con-


Figure 5.1: Infant strapped in a Jolly Jumper.

sequences (Goldfield et al., 1993). After an initial exploratory phase (assembly), the infants selected

particular behaviors and began to exploit the physical characteristics of the mass-spring system. Gold-

field and collaborators advanced the hypothesis that, in general, infants learning a task may try out

different musculo-skeletal organizations by exploring the corresponding parameter space, driven by

the dynamics of the task as well as by the existing repertoire of skills and reflexes.

Second, to achieve effective and continuous bouncing, i.e., bouncing characterized by simultaneous

leg extensions, the infants had to learn patterns of intersegmental coordination. Thus, the infants had

to explore different force and timing combinations for the control of their movements, and to integrate

the environmental information impinging on various sensory modalities, i.e., visual, vestibular, and

cutaneous. Unfortunately, the study performed by Goldfield et al. did not provide any kinematic or

kinetic analysis of the development of the infants’ movement patterns. In line with the findings reported

in (Lungarella and Berthouze, 2002c; Thelen and Fischer, 1983; Vaal et al., 2001), we hypothesize


that in order to reduce movement complexity, the initial movements had to be performed under tight

intersegmental coupling. As development and learning progressed, the couplings were weakened, and

more complex movement patterns could be explored. Thelen and colleagues put forward evidence

showing that in infants the loosening of the tight joint coupling may not necessarily be a consequence

of learning alone (see Thelen and Smith, 1994, for instance).

Third, the rhythmic nature of the task (bouncing) can be interpreted as a particular instance of

Piagetian circular reaction2 . Rhythmic (not necessarily task-oriented) activity is highly characteristic

of emerging skills during the first year of life. Thelen and Smith suggested that oscillatory movements

are the by-product of a motor system under emergent control, that is, when infants are in the process of

attaining some degree of intentional control of their limbs or body postures, but when their movements

are not fully goal-corrected (Thelen and Smith, 1994).

Finally, this study highlighted the necessity of a value system to evaluate the consequences of the

movements performed, and to drive the exploratory process. Value systems are known to mediate

plasticity and to modulate learning in an unsupervised and self-organized manner, allowing organisms

to be adaptive, and to learn on their own via self-generated and spontaneous activity. They also

create the necessary conditions for the self-organization of dynamic sensory-motor categories, that is,

movement patterns.

5.4 Experimental setup

To test our computational hypotheses, we decided to replicate Goldfield et al.’s experiments using a

small-sized humanoid robot with 12 mechanical degrees of freedom (Fig. 5.2). The robot was sus-

pended in a leather harness attached to two springs. Each leg of the robot had three segments (thigh,

shank, and foot) and five joints, but only three of the latter (i.e., hip, knee and ankle) were used. Each

joint was actuated by a high-torque RC-servo module. These modules are high-gain positional open-

loop control devices and do not provide any feedback on the position of the corresponding joint. In fact,

there was no need to measure the anatomical angles of hip, knee and ankle, since these values were

available as the set positions of the RC-servo modules. Exteroceptive and proprioceptive information

were also taken into account. Ground reaction forces were measured by means of force sensitive re-

sistors placed under the feet of the robot (two per foot). To reduce impact forces in the joints of the

robot and to add some passive compliance, the soles of the robot’s feet were covered with soft rubber.

Torsional movements around the z-axis were measured with a single-axis solid-state gyroscope. Linear

accelerations in the sagittal plane were estimated by a dual-axis accelerometer (Fig. 5.2 right).

2 Circular reactions represent an essential sensorimotor stage of Piaget's developmental schedule (Piaget, 1953); they refer to the repetition of an activity in which the body starts in one configuration, goes through a series of intermediate stages, and eventually returns to the initial configuration.


Figure 5.2: Left: Humanoid robot used in our experiments. Right: Schematic representation of the robotic setup (springs with constants k1, k2 and damping coefficients b1, b2; 2-axis accelerometer measuring ax and az; 1-axis solid-state gyro; force sensitive resistors under the feet).

5.4.1 Neural rhythm generator

Figure 5.3 (right) depicts a schematic representation of the neuro-musculo-skeletal system inspired

by Taga (1995). The neural rhythm generator or central pattern generator (Grillner, 1985) was con-

structed by using six neural oscillators, each of which was responsible for a single joint (Fig. 5.3 right).

We modeled the individual neural oscillators according to the following set of nonlinear differential

equations (Matsuoka, 1985):

τu (duf/dt) = −uf − β vf − ωc g(ue) − ωp g(Feed) + te
τu (due/dt) = −ue − β ve − ωc g(uf) − ωp g(−Feed) + te
τv (dvf/dt) = −vf + g(uf)
τv (dve/dt) = −ve + g(ue)
yout = uf − ue

where ue and uf are the inner states of neurons e (extensor) and f (flexor), and ve and vf are variables representing the degree of adaptation or self-inhibition of the extensor and flexor neurons. The external tonic excitation signal te determines the amplitude of the oscillation. β is an adaptation constant, ωc is a coupling constant controlling the mutual inhibition of neurons e and f, and τu and τv are time constants,

and determine the strength of the adaptation effect. The operator g(x) = max(0,x) returns the positive part of x.

Figure 5.3: Left: Basic structure of the neuro-musculo-skeletal system. The arrows in the model show the information flow. Right: Neural rhythm generator composed of six neural oscillators. The solid circles represent inhibitory connections, and the half-circles excitatory connections. Abbreviations: he = hip extensor, hf = hip flexor, ke = knee extensor, kf = knee flexor, ae = ankle extensor, af = ankle flexor. Not shown are proprioceptive feedback connections and tonic excitations.

The difference of the output of the extensor and the flexor neuron of each unit oscillator was

fed to a pulse generator. Its output yout was the angle of the RC-servo associated with the correspond-

ing unit oscillator. Sensory feedback to the pattern generator, Feed, occurred through the four pressure

sensors located under the robot’s feet. The value of the afferent feedback was computed as the sum of

the sensed ground reaction forces, weighted by the variable ωp. Appropriate joint synergies among ip-

silateral joints, i.e., appropriate phase relationships between the corresponding neural oscillators, were

produced by feeding the flexor unit of one oscillator with a combination of the output of the extensor

and flexor units of the other oscillator. As shown by Fig. 5.3, reciprocal inhibitory connections between

corresponding flexor and extensor neurons of the left and right hip joint were also implemented.

5.4.2 Selection of the neural control parameters

The adaptation constant β and the degree of mutual inhibition between extensor and flexor neuron of

a single neural oscillator were fixed throughout the whole study to β = 2.5 and ωc = 1.0. The tonic

excitation was fixed to te = 1.0, and the intersegmental coupling constant to ωs = 0.75. The high value

of the latter constant induced kicking patterns with a tight joint coupling. According to Williamson

(1998), the time constants τu and τv determine the shape and the speed of the oscillator output. In


order to guarantee stable oscillations, the ratio r = τu/τv should be kept in the interval [0.1,0.5]. In all

experiments, we fixed the ratio r to 0.5. The sensory feedback coefficient ωp was variable, and was set

as specified in each sub-section.
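The oscillator model and parameter choices above can be made concrete with a short numerical sketch (ours, not the robot's actual controller code): a single decoupled Matsuoka unit integrated with forward Euler. One caveat: in quick tests of this sketch, an isolated unit with the robot's mutual inhibition ωc = 1.0 relaxes to a fixed point (on the robot, inter-oscillator coupling and sensory feedback also shape the dynamics), so the demo raises ωc to 2.0 purely so that a single unit self-oscillates; β = 2.5, te = 1.0 and the ratio r = τu/τv = 0.5 follow the text.

```python
# Forward-Euler sketch of one Matsuoka unit oscillator (flexor/extensor pair).
# omega_c = 2.0 is an assumption for this standalone demo (the robot used 1.0
# with coupled oscillators); beta, te and the tau ratio follow Section 5.4.2.

def g(x):
    # Threshold operator: positive part of x.
    return max(0.0, x)

def matsuoka_step(state, dt, tau_u, tau_v,
                  beta=2.5, omega_c=2.0, omega_p=0.0, feed=0.0, te=1.0):
    uf, ue, vf, ve = state
    duf = (-uf - beta * vf - omega_c * g(ue) - omega_p * g(feed) + te) / tau_u
    due = (-ue - beta * ve - omega_c * g(uf) - omega_p * g(-feed) + te) / tau_u
    dvf = (-vf + g(uf)) / tau_v
    dve = (-ve + g(ue)) / tau_v
    return (uf + dt * duf, ue + dt * due, vf + dt * dvf, ve + dt * dve)

state = (0.1, 0.0, 0.0, 0.0)      # small initial asymmetry to break symmetry
outputs = []
for _ in range(20000):            # 20 s at dt = 1 ms
    state = matsuoka_step(state, dt=0.001, tau_u=0.108, tau_v=0.216)
    outputs.append(state[0] - state[1])   # y_out = u_f - u_e

# After the transient, y_out alternates in sign (flexor/extensor switching).
print(min(outputs[5000:]) < 0.0 < max(outputs[5000:]))
```

In the robot, yout would be fed to the pulse generator driving the RC-servo, and feed would carry the weighted sum of ground reaction forces.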

5.5 Experiments and discussion

To model and analyze our experimental results, we assumed an ideal mass-spring-damper system. This

model represents a first attempt to identify a relationship between oscillation frequency, amplitude of

the oscillation, and other parameters. The differential equation governing the free oscillation of the

mass-spring-damper system is m ẍ(t) + b ẋ(t) + k x(t) = 0. In our case, m is the mass of the robot, b is the damping coefficient of the spring and k its spring constant. The equation has solutions of the form x(t) = A e^(−bt/2m) cos(ωd t + φ), where A (amplitude of the oscillation) and φ (phase) are determined by the initial displacement and velocity of the robot. ωn = √(k/m) is defined as the undamped natural frequency of the mass-spring-damper system, and ωd = √(ωn² − (b/2m)²) < ωn is its damped natural

frequency. The mass of the robot (fixed throughout all experiments) was m = 1.33kg. The estimated

spring constant was k1 = k2 = 25.5N/m, and the damping coefficient was b = 0.065kg/sec for both

springs (Fig. 5.2 left). For the computation of b, we assumed a viscous frictional force, proportional to

the velocity of the oscillation.

In all experiments, we recorded the system’s movements by tracking the position (relative to an

earth-fixed frame of reference) of colored markers placed on the robot’s hip, knee and ankle. The ex-

periments were organized according to the complexity of their environmental interaction (with/without

ground contact, with/without sensory feedback).

5.5.1 Scenario 1 – Free oscillations

This scenario served to assess the basic properties of the real system and of the corresponding mass-

spring-damper model needed to qualify oscillatory behaviors (and materialize the presence of entrain-

ment). The robot’s joints were not actuated, and the robot was set so that its feet could not touch the

ground no matter the amplitude of the vertical oscillations. At the onset of the experiment, the robot

was lifted by an arbitrarily chosen height, and then let oscillate freely. The resulting motion was har-

monic and underdamped, with an exponentially decreasing amplitude of the form e^(−αt) sin(2πt/T), a decay coefficient α = 0.124/sec, and a period T = 1.01 sec. Hence, the resonance frequency of the system could be estimated to be fR = 1/T = 0.99 Hz ≈ ωd/2π. The effective spring constant of the system was Keff = 50.5 N/m, which is almost twice the spring constant of each spring. From our measurements, we estimated the effective damping coefficient to be approximately Beff = 0.33 N sec/m. Note that Beff is not twice the damping coefficient of a single linear spring, as might be inferred by the


value of Keff. This clearly shows that the system is not a close-to-ideal mass-spring system, and that

a more rigorous approach would have to consider a better model for the damping force. For instance,

viscous frictional forces proportional to the square of the velocity of the mass should be taken into

account.
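These estimates can be cross-checked against the ideal model relations from the beginning of this section; the short computation below (ours, using only values given in the text) recovers the measured decay coefficient and a damped natural frequency close to the measured fR = 0.99 Hz.

```python
# Cross-check of the Scenario 1 estimates with the ideal mass-spring-damper
# relations: alpha = B/2m, omega_n = sqrt(K/m), omega_d = sqrt(omega_n^2 - alpha^2).
import math

m = 1.33          # robot mass [kg]
K_eff = 50.5      # effective spring constant [N/m]
B_eff = 0.33      # effective damping coefficient [N sec/m]

alpha = B_eff / (2.0 * m)                   # decay coefficient [1/s]
omega_n = math.sqrt(K_eff / m)              # undamped natural frequency [rad/s]
omega_d = math.sqrt(omega_n**2 - alpha**2)  # damped natural frequency [rad/s]
f_d = omega_d / (2.0 * math.pi)             # [Hz]

print(round(alpha, 3))   # matches the measured decay coefficient 0.124/sec
print(round(f_d, 2))     # ~0.98 Hz, close to the measured f_R = 0.99 Hz
```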

5.5.2 Scenario 2 – Forced oscillations without ground contact

In this experiment, the robot’s joints were actuated such that the equation describing the motion of

the robot was m ẍ(t) + b ẋ(t) + k x(t) = F(t), where the driving force F(t) is a function of the parameter

settings of the neural oscillators and of the amplitude of the robot’s limb movements – as suggested

by Goldfield et al. (1993). In other words, the movement of the robot can be modeled as a forced

mass-spring system, with the robot’s kicking movements representing the driving force. As in scenario

1, the robot could not reach the ground with its feet. After an initial transient, the system converged

to a steady state, a forced harmonic oscillation. Vertical resonance was achieved for the parameter

setting (τu = 0.108,τv = 0.216), and resulted in an average vertical displacement from the rest position

of 10.6cm, and a peak displacements exceeding 17cm. The dominant frequency of the oscillation,

estimated via a spectral analysis of the vertical component of the hip marker position, was fHip =

1.01Hz, which was very close to the previously estimated resonant frequency of the system fR =

0.99Hz. Interestingly, the system displayed at least three oscillatory modes. This behavior is akin to

spontaneous activity in infants, who enter preferred stable states and exhibit abrupt phase transitions

between states (Goldfield, 1995). Parameter settings close to (τu,τv) = (0.066,0.132) led to a strong

horizontal oscillatory motion, whereas for τu > 0.150 and τv > 0.300, there was an evident torsional

movement. For τu < 0.06, vertical oscillations were essentially nonexistent.
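The resonance just described is consistent with the steady-state amplitude of the forced mass-spring-damper model, A(ω) = F0/√((K − mω²)² + (Bω)²). A quick check (ours) with the effective constants from Scenario 1 shows the amplitude peaking near the measured resonant frequency; the drive amplitude F0 = 1 N is an arbitrary choice for illustration.

```python
# Steady-state amplitude of m*x'' + B*x' + K*x = F0*sin(omega*t), evaluated
# with the effective constants estimated in Scenario 1. F0 is an arbitrary
# drive amplitude chosen for illustration.
import math

m, B, K = 1.33, 0.33, 50.5
F0 = 1.0

def amplitude(f_hz):
    w = 2.0 * math.pi * f_hz
    return F0 / math.sqrt((K - m * w * w) ** 2 + (B * w) ** 2)

for f in (0.50, 0.98, 1.50):
    # The response is largest near the ~0.98 Hz natural frequency.
    print(f, round(amplitude(f), 4))
```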

5.5.3 Scenario 3 – Forced oscillations with ground contact (ωp = 0)

The goal of this set of experiments was to assess the effect of ground contact on the oscillatory move-

ment observed in scenario 2, in the absence of afferent feedback from the touch sensors (i.e., ωp = 0).

At the onset of each experimental run, we made sure that the robot’s feet could touch the ground. To

correct for the lack of compliance in the robot’s joints, the ground was covered with soft material.

The introduction of this additional nonlinear perturbation led (given appropriate neural control param-

eters) to the emergence of a new behavior: bouncing. Figure 5.4 shows the result of three different

parameter configurations. A suitable model of the movement of the robot’s center of mass needs also

to take into account the nonlinear interaction with the ground, and the stiffness and damping charac-

teristics of the floor and the feet. We propose the following linear model (see also Goldfield et al.,

1993): m ẍ(t) + Beff ẋ(t) + Keff x(t) = F(t), where F(t) = 0 when the feet are off the ground and F(t) = F0 − F0 sin(2π f t), F0 > 0, when the feet are on the ground, with Keff (effective spring constant)


and Beff (effective damping coefficient) incorporating the effects of the springs, feet, and floor.
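This piecewise-forced model can be integrated numerically in a few lines. The sketch below uses explicit Euler integration; all numerical values (mass, effective stiffness and damping, forcing amplitude, contact threshold) are illustrative assumptions, not identified constants of the robot:

```python
import math

# Sketch of the piecewise-forced mass-spring model of the robot's centre
# of mass: m x'' + Beff x' + Keff x = F(t), with F(t) = F0 - F0 sin(2 pi f t)
# while the feet touch the ground and F(t) = 0 otherwise.
m, B_eff, K_eff = 1.0, 0.8, 40.0   # mass, effective damping, effective stiffness
F0, f = 5.0, 1.0                   # forcing amplitude and kicking frequency [Hz]
contact = -0.05                    # displacement below which the feet are grounded

def simulate(t_end=20.0, dt=1e-3, x0=-0.1):
    x, v = x0, 0.0                 # displacement from rest position, velocity
    xs = []
    for k in range(int(t_end / dt)):
        t = k * dt
        # forcing acts only while the feet are on the ground
        F = F0 - F0 * math.sin(2 * math.pi * f * t) if x < contact else 0.0
        a = (F - B_eff * v - K_eff * x) / m   # Newton's second law
        v += a * dt                           # explicit Euler step
        x += v * dt
        xs.append(x)
    return xs

traj = simulate()
print(min(traj), max(traj))
```

With K_eff = 40 and m = 1 the model's natural frequency is √40/2π ≈ 1.0 Hz, so the assumed 1 Hz kicking drives it near resonance, mirroring the parameter regime discussed in the text.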

Figure 5.4: Forced harmonic oscillations with ground contact (bouncing) in the absence of sensory feedback (ωp = 0). Top: τu = 0.108, τv = 0.216 and τu = 0.140, τv = 0.280; bottom: τu = 0.114, τv = 0.228 (phase plot on the right). In all graphs, the three curves represent the vertical displacement of the ankle, knee, and hip markers in cm.

5.5.4 Scenario 4 – Forced oscillations with ground contact (ωp > 0)

Afferent sensory feedback and contact with the ground induced a “haptic closure” of the sensory-

motor loop, which turned the linear and externally driven mass-spring system of experiments 2 and

3 into an autonomous limit-cycle system with the intrinsic timing determined by the moment of foot

contact with the ground and by the gain of the feedback connection ωp. In other words, the kicking

frequency (implicitly timed by the neural oscillators) and its phase relationship with the bouncing

were regulated by haptic information, resulting in entrainment between the time of ground contact and

period of the neural oscillators. A positive ωp had at least two advantages: (a) it led to a stabilized

and sustained bouncing, and (b) to an increase of its amplitude (measured as the difference between

successive maxima and minima of the vertical displacement). These effects are visualized in Figure 5.5


top-left, in which the parameters were (τu,τv) = (0.114,0.228) and ωp = 0.5. The phase plot of the

same time series is depicted in Figure 5.5 (top-right). The phase plots in figures 5.4 and 5.5 clearly

Figure 5.5: Forced harmonic oscillations with ground contact (bouncing) in the presence of sensory feedback (ωp > 0). Top row: ωp = 0.5, τu = 0.114, τv = 0.228; bottom row: ωp = 0.75, τu = 0.140, τv = 0.280.

demonstrate the stabilizing effects of sensory feedback. In Fig. 5.5 (top-right), the parameters were

(τu = 0.140,τv = 0.280) and ωp = 0.75, and the bouncing was stable and sustained. For ωp = 0,

however, the bouncing suddenly collapsed and exhibited more variability (Fig. 5.4 top-right).

The influence of sensory feedback on the bouncing amplitude is evident by comparing Fig. 5.4

(bottom) with Fig. 5.5 (top). In the latter case, the maximum vertical displacement of the hip relative

to the initial position of the ankle marker was 27.3cm, and its maximum vertical displacement relative

to the initial position of the hip marker was 4.4cm. The dominant frequency of the vertical oscillation

(determined via a spectral analysis of the hip marker) was fHip = 0.93Hz, whereas fHip = 0.95Hz

for the same parameter configuration but with ωp = 0. Thus, the sensor feedback also affected the

frequency of the oscillation. After a short initial transient, the robot settled into a stable oscillatory

movement but did not bounce.


In this scenario, the model is more complicated and has to take into account the change of phase

and timing due to the sensory feedback. This is realized by introducing a new variable φ such that

m ẍ(t) + Beff ẋ(t) + Keff x(t) = F(t, φ).

5.6 Discussion and conclusion

The question of how sensory feedback interacts with the central pattern generator is still open (Taga,

1995). As a demonstration that sensory feedback is not necessary for the generation and coordination

of rhythmic activity, experiments in completely isolated spinal cords and in deafferented animals (i.e.,

without sensory feedback) have shown that the patterns generated by these types of structures are very

similar to those recorded in intact animals (Ijspeert, 2003). What emerged from our study is that a

suitable choice of the intersegmental coupling constant, as well as of the gain of the sensory feedback

reduces movement variability, increases bouncing amplitude, and leads to stability. We attribute this

result to the entrainment of neural and body-environment interaction dynamics. In other words, the

neural system of our model is designed to produce a basic pattern of muscle activation established

not only by the connections between the neural oscillators, but also by the input of sensory signals

representing body movements and the coupling with the environment. Through a recurrent interaction

in the sensorimotor loop, the variability and instability of the movements are stabilized into a limit

cycle. In the sense that such a coupling produces an effect greater than the sum of the individual

components, it is a synergistic coupling. A similar finding, in the case of biped walking, was reported

by Taga (1995).

It has been suggested that the developmental transformation of spontaneous motor activity into

task-specific movements consists of two phases, which are called assembly and tuning phase (Gold-

field et al., 1993). While assembly refers to the self-organization of relationships between the com-

ponents of the system, tuning is concerned with the adaptation of the system parameters to particular

conditions. In this paper, we have primarily focused on the tuning phase by making the premise that

the assembly phase results in a positive intersegmental coupling between hip, knee and ankle. It is in-

teresting to consider the issue of the mechanisms underlying the assembly phase. Although bouncing

is intrinsically a rhythmic activity for which central pattern generators represent suitable neural struc-

tures, there is no evidence that newborn infants move their limbs in a manner consistent with the output

of central pattern generators, and indeed, sporadic kicking movements are more plausible candidates.

Given that neural oscillators are usually modeled as a set of mutually inhibitory neurons, the assembly

phase could be a process during which the topology of a vanilla-type cell assembly changes, driven by

feedback from the environment, and by a value system (based on the amplitude of the oscillations, for

instance).

With respect to the tuning phase, there is still much to do. In some sense, tuning refers to the non-


stationary regime which occurs before stabilization of movement patterns. In other words, it is the by-

product of the entrainment between neural control structure and environment – when sensory feedback

turns the system into an autonomous limit-cycle system. At a lower level of control, tuning could

also be implemented as changes in gain or time-constants of the neural oscillators. An autonomous

implementation of such parameter tuning could be realized via a mechanism of Boltzmann exploration

driven by a value system (Fig. 5.2 right). The author has successfully used this combination in a

pendulating humanoid robot (Lungarella and Berthouze, 2002c).

Yet, all this may not be sufficient to hypothesize a valid model of child motor development as there

is evidence that kicking behaviors display spatio-temporal patterns. In particular, Taga et al. (1999) re-

cently discussed the chaotic dynamics of spontaneous movements in human infants. Thus, formulating

the development of those skills in a dynamical systems framework would be highly desirable so that

an appropriate set of adaptive mechanisms could be implemented and tested against human data.

Chapter 6

Value-based stochastic exploration

6.1 Synopsis

This chapter is about the principle of exploratory activity, which asserts that “exploratory activity is

a fundamental process by which an agent collects information for learning about its own body, and

control structure, and for mastering the interaction with its surrounding environment.” Because of its

relevance for all chapters of this thesis, we provide an in-depth view of this principle and motivate its

necessity from a developmental point of view (see also Chapter 2). The chapter’s main emphasis is on a

particular instantiation of the principle of exploratory activity: a value-dependent stochastic exploration

scheme that can be used to produce exploratory activity. In particular, the scheme is applied to the

online calibration of a set of PID (proportional-integral-derivative) controllers of a high-performance

robotic head.

6.2 Introduction

Every learning system faces the dilemma of exploring its parameter spaces – e.g., the weights of an

artificial neural network, time constants of a set of neural oscillators, the strength of the coupling be-

tween various limbs, or between body and environment – while simultaneously exploiting the good

parameter configurations that exploration has already uncovered. This trade-off is also known as

exploration-exploitation dilemma. To solve it, learning systems typically resort to some

kind of ad hoc heuristics that varies across learning task and environment. A possible strategy, for

instance, could combine random and unbiased exploration, such as that of the Metropolis algorithm (Metropolis et al., 1953), with a dynamic and gradual trade-off between exploration and

exploitation known as Simulated Annealing (Cerny, 1985).

The study presented in this chapter is situated in the context of value-based learning, i.e., a form of


learning in which an agent endowed with a value system is responsible for producing its own reinforcement. Generally speaking, the purpose of a value system is either to signal the agent's current behavioral state (e.g., arousal, sleep, waking), or to mediate environmental saliency, that is, the signaling of

the occurrence of relevant stimuli or events to the agent’s neural system (e.g., novelty, pain, reward) by

modulating its activity and plasticity. To date, a number of explicit realizations of value-based learn-

ing schemes in robotic systems exist. In all those implementations value systems play either the role

of internal mediators of salient environmental stimuli or events (Almassy et al., 1998; Krichmar and

Edelman, 2002; Scheier and Lambrinos, 1996; Sporns and Alexander, 2002) or, what is more relevant

for this chapter, are used to guide some sort of exploratory process (Lungarella and Berthouze, 2002c).

In previous chapters (Chapter 3 and 4), a simple value system was described that was employed

to regulate the exploration of the parameter space associated with the control system of a robot whose

task was to learn to pendulate, i.e., to swing like a pendulum. In that particular case study, the robot’s

control system consisted of a set of neural oscillators, and the space explored was the space of the

corresponding neural parameters. The value system employed in that study was a function of the

maximum amplitude of the oscillation (evaluated within a given time window through markers placed

on the robot’s body). The value at time t was given by

Vt = max{Vt−1 (1 − ε), |At|} ,

where |At| denotes the absolute value of the instantaneous amplitude of the oscillation. The term (1 − ε), with 0 < ε ≪ 1, realized an exponential decay of the value signal when the oscillation was smaller

than the previously achieved maximum amplitude. The following exploration principle was adopted:

when a parameter setting yielded good performance, i.e., a high value Vt , the change of parameters

was slowed down, and hence nearby sets of parameters exploited. Conversely, when the settings led to

low-amplitude oscillations, a rapid and large change of parameters was triggered.
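The value update just described can be sketched in a few lines. The amplitude readings and the decay rate eps below are illustrative placeholders, not values from the pendulation experiments:

```python
def update_value(v_prev, amplitude, eps=0.01):
    """Value update of the pendulation study: track the largest recently
    achieved oscillation amplitude, decaying at rate eps (assumed small)
    whenever the current amplitude is lower."""
    return max(v_prev * (1.0 - eps), abs(amplitude))

v = 0.0
for a in [0.2, 0.5, 0.4, 0.1]:   # illustrative amplitude readings
    v = update_value(v, a)
print(v)  # ≈ 0.49: the 0.5 peak, decayed twice by (1 - eps)
```

A high value V then slows the parameter change (exploitation), while a low value triggers large parameter jumps (exploration), as described above.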

In this chapter, we address a similar issue by generalizing the exploration process. We propose a

self-supervised value-based exploration scheme, which can be employed for searching the parameter

space associated with a large class of control structures. In particular, we show (both in simulation,

and with a real robot) how our scheme can be used to automatize the calibration of a set of linear

proportional-integral-derivative (PID) controllers. The size of the parameter space grows exponen-

tially with the cardinality of the set of controllers, and can be very large. Hence exhaustive exploration

as well as a random search strategy are inappropriate.

In Section 6.3, we highlight some studies that motivated and inspired the stochastic exploration

algorithm and that are highly relevant for the principle of exploratory activity. We describe the back-

ground of our scheme, and flesh out its details in sections 6.4 and 6.5. In Section 6.7, we briefly

report on the results of simulations. This is followed by a section on experiments performed with a


high-performance robot head (Section 6.8). The results of the experiments are discussed in Section 6.9.

Finally, in Section 6.10, we conclude and point to some future directions.

6.3 Developmental inspiration and related work

Many studies on motor learning indicate that the acquisition of new motor skills (in healthy infants and

in adults) is preceded by a seemingly random, exploratory phase during which possible movements are

explored, selected, and tuned, and the ability to predict the sensory consequences of those movements

is learned (e.g., Angulo-Kinzler, 2001; Goldfield et al., 1993; Haehl et al., 2000; Meltzoff and Moore,

1997; Piek and Carman, 1994; Prechtl, 1997; Thelen and Smith, 1994).

Fetuses (as early as 8 to 10 weeks after conception) as well as newborn infants display a large va-

riety of transient and spontaneous movement patterns, such as general movements, rhythmical sucking

movements, spontaneous arm movements, and stepping and kicking movements (Piek and Carman,

1994; Prechtl, 1997; Thelen and Smith, 1994). General movements, which are argued to be the most

frequently employed and most complex movement patterns, involve a large number of body segments

(e.g., limbs, head, and trunk), and are characterized by a series of unrefined movements of variable

speed, intensity, amplitude, and by the lack of distinctive timing and coordination (Prechtl, 1997).1

It has been suggested that in healthy children during the early stages of motor development, general

movements “presumably” produce all motor possibilities within the neurobiological and anthropomet-

ric constraints of the organism through self-generated motor activity and sensory information (Hadders-Algra, 2002). The origin of the variability underlying spontaneous movements has been hypothesized

to reside in the endogenous activity of the nervous system, and in the interaction with the external

world (e.g., reactivity to external stimuli) as well as the interplay between these two forms of activ-

ity (Van Heijst et al., 1999). It is important to note that the true origin of such activity is far from being

understood.

Spontaneous movement patterns are not a hallmark of the prenatal period only. The classifica-

tion of rhythmical stereotypies 2 by Thelen (1979), for instance, provides evidence that for infants

aged from 4 weeks to 12 months, movement variability (quantified by the number of different move-

ment stereotypies) increases, giving rise to an enlarged range of movement combinations, and hence

exploration. Exploratory behavior was also observed by Goldfield et al. (1993), who performed a longitudinal study in which eight 6-month-old infants learned to bounce while being supported by a

Jolly Jumper (i.e., a harness attached to a spring of known stiffness and damping). Goldfield et al.

report that the infants, driven by a process of self-organization, by performing various spontaneous

1 As with other types of spontaneous movements, general movements have been recognized as important precursors to the development of movement control.

2 Stereotypies can be defined as involuntary, coordinated, repetitive, rhythmic, seemingly purposeless movements.


(seemingly random) movements, appeared to explore their action space, before discovering that kicks

against the floor had “interesting” consequences. After an initial exploratory assembly phase, the in-

fants selected particular behaviors, and began – during the subsequent tuning phase – to exploit the

physical characteristics of the mass-spring system. Additional evidence for exploratory learning was

gathered by Haehl et al. (2000), whose findings seem to suggest that while learning to cruise (that is,

to walk with support), infants display an initial poorly controlled and unstable exploratory “wobble

phase”, characterized by a large number of movement changes. We hypothesize that – as in The-

len’s and in Goldfield’s study – the wobble phase is a result of the exploration of the parameter space

associated with the musculo-skeletal apparatus. Along a similar line, but in a different developmental

context, Meltzoff and Moore (1997) introduced the notion of body babbling, which they defined as

an experiential process during which infants, in order to acquire a mapping between movements and

the organ-relation end states that are attained, move their limbs and facial organs in repetitive body

play similar to vocal babbling. It is interesting to note that such babbling is closely related to Piaget’s

circular reaction learning paradigm (Piaget, 1953), and to Thorndike’s classical law of effect, which

states that a given behavior acquired by trial-and-error is more likely to (re-)occur if its consequences

are satisfying (Thorndike, 1911).

From the literature reviewed above, it is evident that random motor activity, albeit not linked to a

specific functional goal (such as reaching for an object, or turning the head in a particular direction),

can generate correlated sensory information across different sensory modalities (see principle of in-

formation self-structuring). The importance of the information generated cannot be overemphasized.

Indeed, it gives the infant the opportunity to acquire and refine the ability to predict the sensory con-

sequences of its own actions. Surprisingly, despite their obvious relevance as a precursor to motor

control, the neural mechanisms underlying such “stochastic” exploratory activity as well as the be-

havioral implications remain largely unknown. Not many modeling attempts have been made so far.

Van Heijst and Vos (1997) proposed a model of spontaneous activity present in the developing spinal

cord inspired by work of Bullock et al. (1993). The source of stochasticity was a set of randomly

chosen, but fixed, connections between a group of neurons (called spontaneous activity cluster) and

a sinusoidal rhythm generator. Harris (1998) proposed a neural mechanism of stochastic gradient de-

scent through which behavior may be optimized and related it to cerebellar physiology. Harris’ model

included a “noise generator”, that is, an exploratory random process determined by the spontaneous

activity of the neurons in the inferior olive. In that particular model, the optimal behavior could be

“discovered” by finding how the value (performance index) changed with the control parameters. This

in turn required “some” exploration of the parameter space.

A real world application of a stochastic trial-and-error process for selection of actions in an on-line

environment is the work by Howell and Best (2000), who used a set of “continuous action reinforce-

ment learning automata” to tune a linear controller of an engine. A similar strategy was adopted in the


study reported here.

6.4 Enter simulated annealing

Simulated Annealing (SA) is a well-known stochastic technique employed for the optimization of com-

plex continuous or discrete systems (Kirkpatrick et al., 1983; Cerny, 1985; Kirkpatrick and Gregory,

1995). At the core of this method is an analogy with the way metal alloys are manufactured: first the alloy is heated, sometimes up to its melting point, and then slowly cooled down to give its molecules the chance to settle into an arrangement of lowest energy. More specifically, a ther-

modynamical system (the molecules composing the metal, for instance), being offered a succession of

options, changes its configuration from energy Ej to energy Ej+1 with a probability p = 1 (certainty) in the case of Ej+1 < Ej (“downhill move toward an energetically more stable configuration”) and with

a probability

p = e^(−Ej+1/kT) / (e^(−Ej+1/kT) + e^(−Ej/kT)) = 1 / (1 + e^((Ej+1−Ej)/kT)) ≈ e^(−(Ej+1−Ej)/kT)

otherwise (“uphill move toward a less stable configuration”). The parameter T > 0 denotes the tem-

perature or amount of thermal noise of the system, and k is the Boltzmann constant. The temperature

parameter, if sufficiently large, allows the system to make state transitions that would be improbable at

lower temperatures, and which can temporarily lead to an increase of energy. The appropriate choice of

the annealing or cooling schedule Tt , i.e., the sequence of temperatures and the amount of time spent at

each, plays a key role for determining success or failure of the annealing process (Hajek, 1988). On the

one hand, a fast annealing schedule allows for a rapid convergence to a local minimum of energy; on

the other hand, a slow cooling may lead to a better exploration of the space of possible configurations,

and eventually to an energetically more stable local minimum. Essentially, this is an instantiation of

the exploration-exploitation dilemma.
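The acceptance rule above condenses into a few lines of code. In the sketch below the Boltzmann constant k is absorbed into the temperature scale, and the energies and temperatures are assumed values chosen purely for illustration:

```python
import math, random

def metropolis_accept(e_old, e_new, T):
    """Metropolis rule: always accept a move to lower energy; accept an
    uphill move with probability exp(-(e_new - e_old)/T) (the Boltzmann
    constant k is absorbed into the temperature scale)."""
    if e_new < e_old:
        return True
    return random.random() < math.exp(-(e_new - e_old) / T)

random.seed(0)
# An uphill move of size 1 is almost always accepted when T is large,
# and almost never when T is small.
hot = sum(metropolis_accept(0.0, 1.0, T=10.0) for _ in range(10_000))
cold = sum(metropolis_accept(0.0, 1.0, T=0.1) for _ in range(10_000))
print(hot, cold)
```

The two counts illustrate the role of the annealing schedule: at high temperature the system wanders freely across configurations, while cooling progressively confines it to downhill moves.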

Due to its simplicity, Simulated Annealing has been used in a wide range of applications with a

large parameter space, such as the traveling salesman problem (Cerny, 1985), and multi-objective opti-

mization of complex analog and digital integrated circuits (Kirkpatrick and Gregory, 1995). Due to the

“generality” of its framework, SA can also be employed in combination with other methods (e.g., arti-

ficial neural networks). Here, we discuss the combination of Simulated Annealing with the Metropolis

algorithm (Metropolis et al., 1953) – a particular type of Monte Carlo method that solves problems

by randomly generating number, parameter, or state configurations, and by observing the fraction

of configurations obeying some properties (Whitman and Kalos, 1982). In the case of the Metropo-

lis algorithm the distribution of the parameter configurations converges to the Boltzmann distribution

(a function commonly used in statistical mechanics to determine the speeds of molecules).


6.5 Parameter exploration

In this section, we present and describe in detail a flexible parameter exploration scheme that can be

easily adapted to a large class of problems. The results of simulations and of experiments performed

with a high-performance robotic head, as well as possible alternative applications of the algorithm, are

discussed in subsequent sections.

Figure 6.1: Metropolis-like exponential probability distribution f = exp(δV/T), plotted against −δV for temperatures T = 0.5, 1, 1.5, 2, 2.5. This figure exemplifies the effect of the temperature on the probability of making a downhill move. See text for details.

Let Vt denote the value of the performed action at time t, and δV = Vt −Vt−1 the difference of

two subsequent values. As a slight notational difference to traditional SA, we exchanged Et , which

traditionally represents the energy of the system at time t, with Vt , and exchanged uphill moves in the

space of possible state configurations with downhill moves in the space of possible parameters. The reason is that “good” parameter configurations lead to a high value – corresponding to energetically more

stable state configurations. The opposite holds for “bad” parameter configurations. The algorithmic

gist, however, remains unperturbed. In order to conform to the original Metropolis algorithm δE =

−δV = Vt−1 − Vt. If δE < 0, that is, δV > 0, we say that the previously performed parameter change

had a positive value and hence gave rise to an uphill move.

The exploration process works as follows: A particular action is performed, and its value Vt com-

puted. Vt ∈ [0,c], where c > 0, and hence −c < δV < c. Please note that δV has to be somehow

matched to the possible temperatures Tt of the annealing process (cf. Eqn. 6.2 and Fig. 6.1).


 1: V0 ← 0
 2: T0 ← Tmax
 3: c ← 1
 4: γ ← number close to 1
 5: ε ← small number
 6: t ← 1
 7: x0 ← initial values ∈ [xlo, xhi]
 8: u ← (xhi − xlo)/N
 9: repeat
10:   test parameter settings
11:   get Vt
12:   δV ← Vt − Vt−1
13:   if δV > 0 then
14:     Tt ← γ Tt−1
15:     repeat
16:       extract nt from N(0,1) limited to [−1,+1]
17:       a ← nt (c − Vt)
18:     until (xt−1 + a Tt u) < xhi and (xt−1 + a Tt u) > xlo
19:     xt ← xt−1 + a Tt u
20:   else
21:     Tt ← Tt−1
22:     p ← e^(δV/Tt)
23:     extract r from uniform distribution [0,1]
24:     if r < p then
25:       repeat
26:         extract nt from N(0,1) limited to [−1,+1]
27:         a ← nt (c − Vt)
28:       until (xt−1 + a Tt u) < xhi and (xt−1 + a Tt u) > xlo
29:       xt ← xt−1 + a Tt u
30:     else
31:       repeat
32:         extract nt from N(0,1) limited to [−1,+1]
33:         a ← nt (c − Vt)
34:       until (xt−2 + a Tt u) < xhi and (xt−2 + a Tt u) > xlo
35:       xt ← xt−2 + a Tt u
36:     end if
37:   end if
38:   t ← t + 1
39: until terminating criterion is satisfied

Figure 6.2: Pseudo-code of the exploration process. For explanations see text.


If δV > 0, we perform the following exploration step:

xt = xt−1 + nt Tt (c−Vt)u , (6.1)

where xt is the N-dimensional vector of parameters at time t, that is, xt = (x¹t , x²t , . . . , xᴺt ). nt is the

realization of a Gaussian random process with zero mean and unitary variance – also called the gener-

ating function – which is limited, via an upper and lower saturation threshold, to the interval [−1,1]. A

generating function with a Cauchy distribution would have been another candidate. This thresholding

is necessary in order to keep the size of the exploration steps under control (Gaussian distributions have

rather long tails). The vector u is a parameter-dependent normalization factor. A possible candidate is

u = (xhi − xlo)/N, with N ≫ 1, and xhi and xlo being vectors whose elements are the upper and lower

limits of the elements of xt. Another candidate is u = (xhi − xlo)/T0, where T0 is the initial temperature. The term nt Tt (c−Vt) u can be pictured as a sort of probability cloud centered at xt−1 or xt−2, respectively. The radius of the probability cloud shrinks with increasing value Vt and decreasing

temperature Tt .

If δV < 0, we either attempt a downhill move according to Eqn. 6.1 (accepting the parameter

configuration xt ) with a probability:

p = e^(δV/Tt) ,

or we step back to the former position xt−2 (discarding the previously tested parameter configuration

xt−1), and starting from there, we sample another area of the parameter space. With the previously

introduced formalism this gives the following parameter update:

xt = xt−2 + nt Tt (c−Vt)u .

Now, remember that the temperature parameter Tt > 0 is subject to an annealing schedule. Figure 6.1

shows the exponentially decaying trace for various temperature settings. As can be seen, decreasing

temperatures reduce the probability of uphill moves. The choice of a suitable Tt is important. We opted

for an exponential annealing schedule (for other annealing schedules see Press et al., 1995, p. 452):

Tt = γTt−1 (6.2)

where γ is close to 1. As can be seen, the temperature parameter is limited to the interval [0,T0]. We

opted, after some testing with our setup, for T0 = 20. A correct choice of the initial temperature leads to

a rapid initial exploration of many alternative paths in the parameter space. According to this scheme,

as the exploration proceeds, the control parameter Tt, as well as the step-size factor (c − Vt), are gradually reduced. In other

words, the system goes uphill as well as downhill, but the lower the temperature the less likely is any


significant downhill excursion, so as to allow the system to settle in a promising area of the parameter

space. SA can be described as a sequence of Markov chains, each corresponding to a temperature value

Tt . Every computational step of a chain starts only after the previous step has been completed, thus the

operation of SA is strictly sequential.
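Putting the pieces of this section together, the scheme can be sketched as follows. This is not the implementation used on the robot: the function and variable names are ours, the toy value landscape is invented, and, as one simplification of the pseudocode of Figure 6.2, out-of-range proposals are clipped to the bounds rather than resampled:

```python
import math, random

def explore(evaluate, x_lo, x_hi, T0=20.0, gamma=0.99, c=1.0,
            steps=300, seed=0):
    """Value-based stochastic exploration (sketch). `evaluate` maps a
    parameter vector to a value in [0, c]."""
    rng = random.Random(seed)
    n = len(x_lo)
    u = [(hi - lo) / n for lo, hi in zip(x_lo, x_hi)]   # normalization vector

    def propose(base, v, T):
        # Gaussian step, clipped to [-1, 1], scaled by temperature T,
        # residual value (c - v), and the normalization vector u.
        out = []
        for b, lo, hi, ui in zip(base, x_lo, x_hi, u):
            nt = max(-1.0, min(1.0, rng.gauss(0.0, 1.0)))
            out.append(min(hi, max(lo, b + nt * T * (c - v) * ui)))
        return out

    x_prev = x_curr = [rng.uniform(lo, hi) for lo, hi in zip(x_lo, x_hi)]
    v_prev, T = 0.0, T0
    best_x, best_v = x_curr, 0.0
    for _ in range(steps):
        v = evaluate(x_curr)
        if v > best_v:
            best_x, best_v = x_curr, v
        dV = v - v_prev
        if dV > 0:
            T *= gamma                       # cool after an uphill move
            base = x_curr
        elif rng.random() < math.exp(dV / T):
            base = x_curr                    # accept the downhill move
        else:
            base = x_prev                    # discard x_{t-1}, step back
        x_prev, x_curr, v_prev = base, propose(base, v, T), v
    return best_x, best_v

# Toy value landscape peaking (value 1) at (0.3, 0.7).
target = (0.3, 0.7)
value = lambda x: 1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(x, target)))
x_best, v_best = explore(value, [0.0, 0.0], [1.0, 1.0])
print(x_best, v_best)
```

As in the text, high values and low temperatures shrink the probability cloud around the current point, turning broad early exploration into local exploitation.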

6.6 Control problem

Despite advances in the field of control systems and the availability of many sophisticated control

schemes, PID control, due to its versatility and ease of use, remains a very common control algorithm.

The input of the controller at time t is the deviation e(t) of the measured feedback feed(t) from a specified reference signal ref(t) (see Fig. 6.3): e(t) = ref(t) − feed(t). A standard form of the controller is given by the following equation:

y(t) = Kp e(t) + Ki ∫₀ᵀ e(t) dt + Kd de(t)/dt .

The output y(t) is fed to the input of the motor (labeled controlled system in Fig. 6.3).
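A discrete-time reading of this control law, with finite differences standing in for the integral and derivative terms; the gains and the sampling period dt are placeholder values, not those used on the robot head:

```python
class PID:
    """Discrete-time sketch of the PID law above."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0   # running approximation of the error integral
        self.e_prev = 0.0     # previous error, for the derivative term

    def step(self, ref, feed):
        e = ref - feed                        # e(t) = ref(t) - feed(t)
        self.integral += e * self.dt          # rectangle-rule integration
        deriv = (e - self.e_prev) / self.dt   # finite-difference derivative
        self.e_prev = e
        return self.kp * e + self.ki * self.integral + self.kd * deriv

pid = PID(kp=1.2, ki=0.5, kd=0.05, dt=0.01)
u = pid.step(ref=1.0, feed=0.0)   # response to a unit step in ref(t)
print(u)
```

Note the large derivative contribution on the first step after a set-point change (the “derivative kick”), which is one reason overshoot and rise time are sensitive to the gain triple (Kp, Ki, Kd).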

Figure 6.3: Control scheme, comprising the controller, the controlled system, the feedback path (signals ref and feed), and the performance evaluation and stochastic exploration blocks. The arrows in the model depict the information flow.

Optimal system performance requires the parameters Kp, Ki, and Kd to be appropriately chosen

and tuned, so as to meet prescribed time-domain performance criteria given a particular combination

of motor dynamics and load inertia. These criteria are usually specified in terms of rise and settling


time, overshoot and steady-state error of one of the state variables (or a derived variable thereof, e.g.,

position), following a step change in the reference signal (set-point tracking). For the experiments

reported in this chapter, we made use of two of these criteria: the overshoot Mp, which is defined as

the amount of overcorrection in an under-damped control system (in an over-damped control system

the overshoot is zero), and the rise-time Tr, which is defined as the time required for the system’s step

response to rise from 0% of its final value to 100% of its final value. As reference signal we employed

a square wave of fixed frequency, but variable amplitude.

Typically, the parameters Kp, Ki, and Kd of the PID controller (or PID compensator) are determined

by means of some heuristics or some other criteria (Spong and Vidyasagar, 1989) which require a

model of the to-be-controlled system. Here, we show how a value-based stochastic exploration of the

parameter space can be used to achieve the same result without the need for such a system model.

6.7 Simulation

The proposed stochastic exploration process was first tested on a discretized version of the following

lumped-parameter model of a DC-motor:

J dω/dt = −Kf ω + Km i

L di/dt = −R i − Kb ω + u

where ω and i are the state variables of the system (rotating speed and current, respectively). J is

the sum of motor and load inertia, Kf the electromotive force constant; Km is the armature constant,

and Kb the damping ratio of the mechanical system. R is the electrical resistance in Ohm, L is the

electrical inductance (Henry), and u is the source voltage (input). Unless specified otherwise, the

following parameters were kept constant throughout the study: R = 2.0Ω, L = 0.5H , Kb = 0.2Nmsec,

K f = Km = 0.15Nm/A, J = 0.025kgm2/s2.
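A minimal sketch of the discretized model (a forward-Euler step; the step size and the constant input voltage are illustrative assumptions, since the text does not specify the discretization):

```python
# Lumped-parameter DC-motor model from the text, discretized with a
# forward-Euler step. The step size dt and the 1 V input are assumptions.
R, L = 2.0, 0.5          # resistance [Ohm], inductance [H]
Kb = 0.2                 # damping of the mechanical system
Kf = Km = 0.15           # electromotive force / armature constants
J = 0.025                # motor plus load inertia

def step_motor(omega, i, u, dt=1e-3):
    """Advance the state (rotating speed omega, current i) by one Euler step."""
    domega = (-Kf * omega + Km * i) / J
    di = (-R * i - Kb * omega + u) / L
    return omega + dt * domega, i + dt * di

# Open-loop response to a constant 1 V input for ~3 "simulated" seconds
omega, i = 0.0, 0.0
for _ in range(3000):
    omega, i = step_motor(omega, i, 1.0)
```

As a sanity check of the discretization: since Kf = Km here, the speed settles at u/(R + Kb) ≈ 0.45 at steady state.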

At the outset of the exploration run, the controller’s PID parameters were randomly initialized in the

interval [0.0,1.0]. One iteration of the exploration process consisted in measuring the response of the

controlled system (DC-motor) to a step input, and evaluating its performance in terms of overshoot and

rise time. Each iteration step lasted around three “simulated” seconds. The input signal was then reset,

and based on the result of the “performance evaluation” a new set of PID parameters was chosen (see

Fig. 6.3). The change of value over “simulated” time for two independent exploratory runs is shown in

Figure 6.4. In both cases the value was V = 1/(1 + k Mp Tr), where Mp is the overshoot, Tr the rise

time, and k > 0 a suitably chosen constant. Figure 6.5 displays a “cut” through the 3-dimensional

value landscape determined by a systematic exploration of the parameter space. Overlapped are the


parameters explored by the value-based exploration scheme.
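One run of the exploration loop may be easier to follow as a sketch. This is a hedged illustration, not the thesis implementation: the Gaussian proposal width, initial temperature, geometric cooling factor, and the choice k = 10.0 are assumptions, consistent with the Metropolis/Simulated-Annealing reading given later in the discussion; `evaluate_response` is a hypothetical stand-in for running the motor for ~3 seconds and measuring overshoot Mp and rise time Tr.

```python
import math
import random

def explore(evaluate_response, n_iter=500, k=10.0, T0=1.0, cooling=0.99):
    """Value-based stochastic exploration of the PID parameter space.

    Proposes random parameters near the current ones, evaluates the value
    V = 1/(1 + k*Mp*Tr) from the measured step response, and accepts or
    rejects Metropolis-style under a slowly decreasing "temperature".
    """
    params = [random.uniform(0.0, 1.0) for _ in range(3)]   # Kp, Ki, Kd
    Mp, Tr = evaluate_response(params)
    value = 1.0 / (1.0 + k * Mp * Tr)
    T = T0
    for _ in range(n_iter):
        cand = [max(0.0, p + random.gauss(0.0, 0.1)) for p in params]
        Mp, Tr = evaluate_response(cand)
        v = 1.0 / (1.0 + k * Mp * Tr)
        # Metropolis rule: always accept improvements; accept worse
        # candidates with a probability that decays as T is annealed.
        if v > value or random.random() < math.exp((v - value) / T):
            params, value = cand, v
        T *= cooling
    return params, value
```

Improvements are always accepted, while worse settings are accepted with shrinking probability, which produces the "extended search" around local optima noted in the discussion.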


Figure 6.4: Normalized value vs. time (min=0.0, max=1.0). Shown are the results for V = 1/(1 + k Mp Tr). As can be seen from the graphs, in both cases, after 1000 “simulated” seconds the value is already very high.

6.8 Real world setup

The stochastic exploration process was also tested in a real world setup, that is, the robot head depicted

in Fig. 6.6. Each of the robot’s eyes had two independent degrees of freedom: pan and tilt. The neck

also had two degrees of freedom. Eye and neck motors differed in size and in terms of load inertia,

and hence had dissimilar dynamics. This is visible in Figure 6.7, which displays the position over time

of right eye pan and of the neck pan – given the same triple (Kp,Ki,Kd). Due to the larger inertia of

the neck pan motor (compared to the eye pan motor), the overshoot for the neck pan is larger. The

control/sampling rate was fixed to 50Hz.

We used the stochastic exploration procedure to simultaneously search the parameter space of the

PID controllers of the left and right eye pan, and the neck pan degree of freedom (nine parameters in total).

In response to a step input, the three transient performance objectives to be minimized were: (a) V1 =

k Mp (overshoot), (b) V2 = k Mp Tr (overshoot multiplied by rise time), and (c) V3 = k (Mp + 0.1)Tr

(results not shown here), with k a small constant. Mp and Tr were both normalized to lie in the interval

[0,1] by dividing them with the maximum of the reference signal, and with the duration of the step

function applied to the motor, respectively.
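A sketch of how the two normalized criteria could be computed from a sampled step response (hypothetical helper; the rise time follows the 0%-to-100% definition given earlier, and both quantities are normalized as described in the text):

```python
def transient_criteria(response, reference, dt, duration):
    """Return (Mp, Tr), both normalized.

    response:  sampled positions after a step to `reference`
    dt:        sampling interval in seconds
    duration:  duration of the step function, used to normalize Tr
    """
    peak = max(response)
    Mp = max(0.0, peak - reference) / reference   # overshoot / reference max
    Tr = duration                                 # default: never reached 100%
    for n, y in enumerate(response):
        if y >= reference:                        # 0% -> 100% rise definition
            Tr = n * dt
            break
    return Mp, Tr / duration
```

The three objectives then follow directly, e.g. V2 = k * Mp * Tr for the normalized quantities.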



Figure 6.5: Systematic exploration of the parameter space and resulting value landscape for Kd = 4.0. The white dots are the parameters explored during a value-based exploration.

The PID parameters were initialized with a set of parameters leading to oscillations: (Kp,Ki,Kd) =

(1.0,1.0,0.0). For each iteration of the exploration procedure, new PID parameters for the controller

of each motor were set, and subsequently tested for 3sec. The input signal was then reset, and both

eyes and the neck were allowed to go back to their initial (zero) position. The results for right eye pan

(EPR) and neck pan in two representative experimental runs are displayed in Fig. 6.9 and Fig. 6.10.

The main results for those cases are summarized in Table 6.1 and Table 6.2. As can be seen, the initial parameters lead to large initial overshoots (Mp0 > 0.7). In all four cases, the exploration successfully identifies parameter settings that yield very low overshoot. The number of visualized exploratory iterations is 45 (Fig. 6.9) and 73 (Fig. 6.10).

V = k Mp    Mp0    Mpf    Tr0    Trf    Kp0   Kpf    Ki0   Kif     Kd0   Kdf
EPR         0.88   ≈0     0.04   0.54   1.0   2.07   1.0   0.001   0.0   1.41
Neck Pan    0.72   0.05   0.10   0.30   1.0   2.78   1.0   0.029   0.0   0.053

Table 6.1: EPR = Eye-pan right; Mp0: initial normalized overshoot; Mpf: final normalized overshoot; Tr0: initial rise-time (in sec.); Trf: final rise-time (in sec.).


Figure 6.6: High-performance robot head used in our experiments.


Figure 6.7: Qualitative comparison between dynamics of eye pan and neck pan degrees of freedom.


V = k Mp Tr   Mp0    Mpf     Mp0 Tr0   Mpf Trf   Kp0   Kpf    Ki0   Kif     Kd0   Kdf
EPR           0.87   ≈0      0.035     ≈0        1.0   1.99   1.0   0.001   0.0   0.188
Neck Pan      0.71   0.006   0.071     0.004     1.0   2.35   1.0   0.003   0.0   1.77

Table 6.2: EPR = Eye-pan right; Mp0: initial normalized overshoot; Mpf: final normalized overshoot; Tr0: initial rise-time (in sec.); Trf: final rise-time (in sec.).


Figure 6.8: Time series of transient performance evaluation for eye pan degree of freedom.

6.9 Discussion

We have proposed a value-based stochastic parameter exploration scheme, and have shown (in simulation and in a real robotic system) how it can be used to automate the calibration of a set of PID controllers. Our experimental results demonstrate that the number of iterations (and thus, the time) required to attain convergence was acceptably low (see Fig. 6.8). This result is remarkable because the exploration, despite starting with no a priori knowledge and no model of the system, was able to converge to a sub-optimal (but satisfying) solution in a short time. It is important to note that the prompt

availability of afferent sensory feedback was crucial for the exploration to work. An intrinsic feature

of the proposed scheme is that the parameter space is explored by gathering information to identify the

direction of parameter change that leads to an improvement of behavioral performance.

The suggested exploration process has a few distinct advantages over systematic search, or standard

gradient-based exploration. First off, the method is not greedy, i.e., the tendency to get stuck in local

minima is reduced. Commonly used strategies to jump out of local minima are either the introduction of

a small amount of noise, or “kicks” when a local minimum is reached. However, this may not always be

acceptable in a real world setup. In many ways, Simulated Annealing prescribes a controlled way of


introducing noise for a more robust iterative search. Second, the Metropolis part of the exploration

scheme quite naturally results in an “extended search” in the neighborhoods of local optima (as shown

by Figure 6.5).

Mechanisms of stochastic parameter exploration deserve attention for at least four reasons: (a) they

are simple; (b) they ensure that any possible parameter setting will be tested eventually (statistically

speaking); (c) they do not introduce any strong biases in the parameter selection; and (d) due to their

dynamic nature they are adaptive and robust against external disturbances (exploration never stops).

There are, however, also a few disadvantages: (1) In various instances random walk exploration

can require an exponential time, and often does not take full advantage of the opportunity to select the

most informative action/query (Thrun, 1995). It has been shown, for instance, that directed exploration

techniques – i.e., techniques that utilize some exploration-specific heuristics for guiding exploration

search (Thrun, 1992) – can reduce the complexity of active learning from exponential learning time

(random exploration) to polynomial training time. This heuristic for optimizing knowledge gain,

however, introduces a designer bias. (2) The development of a proper annealing schedule requires

some experimentation. Indeed, the development of an annealing schedule that works robustly for a

whole class of similar problems is still an open question (Kirkpatrick and Gregory, 1995, p. 877).

(3) A similar problem affects the choice of an appropriate generating function. (4) In an open-ended

scenario, in which exploration does not stop, another problem that would need to be considered is

“re-annealing”, that is, the resetting of the annealing schedule.

On a final note, although this chapter did not stress the biological relevance of the proposed scheme,

recent work by Kadar et al. (2002) seems to suggest that many patterns observed during exploratory

motor learning may indeed be explained by some sort of biased random walk/diffusion control model.

Interestingly, our scheme can be easily conceptualized as a directed random walk (Metropolis part) in

which the degree of randomness (diffusion) is controlled by an annealing schedule (Simulated Annealing part). It follows that further refinements of the scheme proposed here may prove adequate

for modeling exploratory motor learning in infants and adults.

6.10 Conclusion

This chapter presented a particular instantiation of the principle of exploratory activity: a value-based stochastic exploration scheme. The algorithm was first tested in simulation, and then applied to the adaptive exploration of the 9-dimensional parameter space associated with the controllers of a robotic active vision system. The system quickly converged to a “good” parameter configuration. We conclude that the algorithm can be considered an appropriate candidate in all those situations in which a large parameter space must be rapidly explored.

Before concluding, it is important to address an issue that has already been raised in the conclusion of Chapter 4, that is: how does exploration relate to learning and to development? To stress the

point, we repeat here what has already been partially said in previous chapters. In our framework,

exploration is seen as a fundamental aspect of task acquisition that goes beyond learning. Exploration

produces the diversity of sensory-motor trajectories which higher brain systems can select and exploit

to realize learning. In this sense, exploration represents not only a crucial aspect of learning, but also a

mechanism of plasticity and adaptivity in its own right. The use of a stochastic value-based parameter

exploration scheme such as the one discussed here is only a first step toward learning. Clearly, explo-

ration has to include more than the mere exploration of the parameter space associated with the control

system. As shown in chapters 3, 4, and 5, exploration of intrinsic dynamics, of the interaction with the

environment, as well as the coupling between body, control, and environment also play a critical role.



Figure 6.9: Time series of right eye pan and neck pan degrees of freedom for V = k Mp (see text). The desired position (square wave of period 6 sec) and the effective (measured) position are superposed. The length of the series is T = 273 sec (corresponding to 45 exploratory iterations). First column, first row: complete time series of right eye pan. Second column, first row: complete time series of neck pan. Second and third rows are close-ups of beginning and end of the stochastic exploration of eye and neck, respectively.



Figure 6.10: Time series of right eye pan and neck pan degrees of freedom for V = k Mp Tr

(see text). The desired position (square wave of period 6 sec) and the position measured via encoder are superposed. The length of the series is T = 439 sec (corresponding to 73 exploratory iterations). First column, first row: complete time series of right eye pan. Second column, first row: complete time series of neck pan. Second and third rows are close-ups of beginning and end of the stochastic exploration of eye and neck, respectively.

Chapter 7

Information-theoretic Analysis of Sensory

Data1

7.1 Synopsis

This chapter represents the first attempt to explore the possibility of quantitatively analysing sensory data, and the interaction between the signals in different sensory channels of an embodied agent interacting with its local environment (a second and third attempt will be described in two subsequent

chapters). Here, the goal is to get a better intuition of these data, which constitute, in a sense, the “raw”

(unprocessed) material that the neural system has to cope with. As an example of analysis, we em-

ploy information theoretic measures such as the Shannon entropy and mutual information. The hope

is that the results emerging from this research will eventually lead to a more formal and quantitative

description of some of the design principles of autonomous agents.

7.2 Introduction

As discussed extensively in Chapter 1 (Introduction) and Chapter 2 (Research Landscape), over the

last decade-or-so, a number of researchers have adopted a developmental perspective on artificial in-

telligence and robotics. The ultimate shared goal among them seems to be the idea of bootstrapping

high-level cognition through a process in which an agent interacts with a real physical environment

over extended periods of time. Some of the research performed has focused on the construction of robots,

some of it examined internal mechanisms by employing embodied models of cognition, such as robots,

and some of it was metaphorical.

1Appeared as Lungarella, M. and Pfeifer, R. Robots as cognitive tools: information-theoretic analysis of sensory-motor data. In Proc. of the 2nd IEEE-RAS Int. Conf. on Humanoid Robotics, pp. 245-252, 2001.



This chapter (as well as chapters 8 and 9) represents an attempt at a more quantitative approach to

the development of embodied cognition. The goal is to acquire an understanding of the sensory data

generated in the different sensory channels as a result of an agent’s interaction with the real world,

as this is – so to speak – the “raw material” the neural substrate has to process. The way such data

is produced depends, in turn, on the particular embodiment of the agent, i.e., its morphology (type,

position, and characteristics of the individual sensors and actuators), as well as on the materials the

agent’s body is made of.

There has been some previous work trying to tackle similar issues. Most of it has been focused on

categorization, that is, the ability to make distinctions in the real world (Pfeifer and Scheier, 1997).

Such an ability is one of the most fundamental cognitive abilities. Indeed, a system that cannot make distinctions will neither have much of a chance of survival (in case it is a natural organism), nor will it be of much use (in case it is an artificial system such as a robot). In this research, categorization was

implemented as a process of sensory-motor coordination as suggested early on by Dewey (1896), later

by Edelman (1987), and Thelen and Smith (1994). This approach was chosen to overcome the prob-

lems of classical – disembodied – categorization models like ALCOVE (Kruschke, 1992), which view

categorization as a process of mapping an input vector onto category nodes. Such classical models start

from the assumption that there is an input vector consisting of “psychologically relevant dimensions”,

such as size of an object, its color, its weight, and so on. An agent interacting with the real world, on

the other hand, is exposed to a continuously varying stream of sensory stimulation. This represents a

completely different problem. It has been shown for simple cases that through sensory-motor coordina-

tion temporarily stable patterns of sensory stimulation can be induced, and a dimensionality reduction

of the high-dimensional sensory space can be achieved (Pfeifer and Scheier, 1997; Te Boekhorst et al.,

2003) (see Chapter 8). In ALCOVE, for instance, there are typically three or four nodes in the input

layer, which constitutes only a low-dimensional space. If done properly, sensory-motor coordination

can lead to the generation of “good” sensory data, that is, data which can result in a simplification of

category learning and in a stable categorization behavior.

The present chapter explores the possibility of a quantitative analysis of sensory data. The lead-

ing questions are: “Is it possible to quantify the informational structure produced by sensory-motor

interaction? And how does it compare to the case in which there is no sensory-motor interaction?”

As an example method we employ information theoretic measures to more quantitatively describe the

sensory data and the interrelation between the different sensory channels as the agent interacts with

the real world. We start with a short discussion of some basic aspects of sensorimotor coordination.

Then, we describe a number of experiments performed with a robotic system and analyze the results.

Eventually, we discuss what we can learn from this type of analysis, and point to some future work.


7.3 Sensory-motor coordination

By definition sensory-motor coordination involves both the sensory and the motor systems. In other

words, it involves the agent’s body. As mentioned above, the sensory stimulation that the neural system

has to process depends on the agent’s morphology and on its behavior: through its movements, in par-

ticular through sensory-motor coordinated movements, an agent can induce stable sensory patterns in

different sensory channels that can be exploited to form cross-modal associations (Pfeifer and Scheier,

1997). These cross-modal associations seem to be a basic prerequisite for concept formation (Thelen and Smith, 1994), which in turn is of fundamental importance for the emergence of what might

be called high-level cognition. Cross-modal associations, which also depend on the agent’s morphol-

ogy, nicely demonstrate how embodiment does not only have physical implications, but information

theoretic ones as well. In other words, sensory stimulation is influenced by at least two factors –

morphology and sensory-motor coordination – which are closely related.

Exploration strategies are particular instances of sensory-motor coordinations that are used to ex-

tract different kinds of “information” from the surrounding environment. Tactile or haptic informa-

tion picked up in combination with systematic exploratory movements of the hand (or mouth), yields

richer sensory stimulation and thus potentially more and better information than passive contact; particular hand movements, for example, can be identified as being critical to the ability to recognize objects.

Results from studies on human subjects indicate that people explore objects consistently using dif-

ferent exploratory hand movements, depending on the knowledge (information) they are instructed to

obtain (Lederman and Klatzky, 1990). Particular exploration procedures are used to extract hardness,

pressure, texture, or compliance, because they provide the sensory input which is “optimal and sometimes necessary” for extracting the desired information. The same holds for vision. Eye movements,

for instance, have a task-dependent character, and depend very much on what perceptual judgements the human subject is asked to make (Yarbus, 1967). Differences between tasks influence the way we pick up information, which may or may not maximize the information intake. In other words,

eye movements influence the statistics of the effective visual input. Lee and Yu (1999) sketch an active

perception framework based on information maximization to reason about the organization of saccadic

eye movements.

7.4 Experimental setup and experiments

Previous work on categorization led to the hypothesis of dimensionality reduction through sensory-

motor coordination. This idea is suggestive, but has been derived from very simple cases. To further

corroborate this hypothesis, we increased the complexity of the agent. According to the “principle

of ecological balance” (Pfeifer, 1996; Pfeifer and Scheier, 1999), to achieve more interesting kinds


of sensory-motor coordination, there has to be a balance of the complexity among the agent’s task-

environment, and its sensory, motor and neural system. This also makes sense from a developmental

perspective. Bushnell and Boudreau (1993) talk about “motor development in the mind”, i.e., in human

infants there is a co-development of the sensory and motor system.

Figure 7.1: Left: Basic manipulator geometry.

To be able to simulate this co-development, our experiments were performed using a five-degree-of-freedom industrial robot manipulator, equipped with a color CCD camera mounted on the

robot’s end-effector. This setup is often referred to as an “eye-in-hand configuration” (see Fig. 7.1).

The camera was the only exteroceptive sensor used in this set of experiments. Video frames were

recorded at a rate of 10Hz, and the resolution was reduced (downsampled) to 192x192 pixels per

frame. The sensory data were stored into a time-series file. The control of the robot was image-based,

that is, the desired end-effector position was achieved by processing the downsampled camera image

in an adequate way.

The experiments were performed in an unstructured and static real world environment, cluttered

with a variable number of objects, of different color, form, and size.

The robot’s task was to foveate on relatively small-sized red-colored objects of different shape.

The control architecture was hard-wired. The sensory part consisted of two one-dimensional arrays

of color-sensitive cells (subsequently referred to as 1-D retinas) arranged as a cross. The 1-D retinas

consisted of a certain number of color sensitive rectangles (which might be interpreted as receptors

or sensory channels) – there were M receptors for the horizontal retina, and N for the vertical one.

The output of an individual receptor of the retina was the average over a rectangular patch of pixels


of the original camera image (see Figure 2). The receptor density was variable and depended on the

horizontal or vertical position, x or y, respectively (see Fig.2). Three receptors were considered: red (r),

green (g), and blue (b). An additional receptor, sensitive to intensity, was obtained as I = (r + g + b)/3.

An attenuation of the changing lighting conditions was achieved through a color-space transformation

described in Itti et al. (1998). Three “broadly” color-tuned channels were created: R = r − (g + b)/2 for red, G = g − (r + b)/2 for green, and B = b − (r + g)/2 for blue. The negative values were set to

zero. Each channel yielded maximal response for the pure, fully saturated color to which it was tuned,

and yielded zero response for black (r = g = b = 0) and white (r = g = b = 255) inputs.
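As a minimal sketch, the color-space transformation and clipping described above can be transcribed directly (the function name is an illustrative choice):

```python
def color_channels(r, g, b):
    """Broadly color-tuned channels from 0-255 receptor averages.

    Negative responses are clipped to zero, so each channel responds
    maximally to its pure saturated color and not at all to black/white.
    """
    R = max(0.0, r - (g + b) / 2.0)   # red-tuned
    G = max(0.0, g - (r + b) / 2.0)   # green-tuned
    B = max(0.0, b - (r + g) / 2.0)   # blue-tuned
    I = (r + g + b) / 3.0             # intensity
    return R, G, B, I
```

For pure red input (255, 0, 0) the red channel responds maximally, while white and black inputs yield zero on all three color-tuned channels, as stated above.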

On the motor side (refer to Fig. 7.1), the moto-neuron controlling joint J0 (shoulder-pan), responsi-

ble for the rotation around the vertical axis, was fully connected to the color-space transformed outputs

of the horizontal 1-D retina. The moto-neurons of joint J1 (shoulder-tilt), and joint J4 (wrist), which

were responsible for the up and down movement of the camera, and which were “kinematically” cou-

pled (see eqn.1), in order to keep the camera always horizontal, were fully connected to the receptors

of the vertical retina. Furthermore, once the robot had successfully foveated on a red object, joint

J2 (elbow) was actuated in an oscillatory manner – inducing a forward (moving close) and backward

(moving away) movement of the end-effector. Its purpose was a slightly more complex sensory-motor

coordinated interaction with the object itself. The weights of the various neural connections were

hard-wired, and can be thought of as some sort of basic motor reflexes. The control architecture was

a “Braitenberg-style” reactive architecture, with direct sensory-motor couplings. The equations em-

ployed for updating the joint angles Ji[n] were:

J0[n+1] = J0[n] + f0[n] ∑_{i=0}^{M} w0,i RH,i[n]

J1[n+1] = J1[n] + f1[n] ∑_{j=0}^{N} w1,j RV,j[n]

J2[n+1] = J2[n] + f2[n] (w0,N/2 + w0,N/2−1)/2

J4[n+1] = J4[n] + 90° − (J1[n] + J2[n])

where f0[n] = c0 (which was a constant), if J0[n] < J0,min, f0[n] = −c0, if J0[n] > J0,max, and f0[n] =

f0[n−1] otherwise. Identical equations hold for f1[n], f2[n], J1[n], and J2[n], respectively. w0,i and w1, j

represent the weights that connect the output of the horizontal and vertical retinas to the moto-neurons.

An additional feature of the control architecture was a habituation (or boredom) coefficient h. Its

purpose was to avoid situations in which the robot kept focusing on one and the same object. Its

effect was to move the robot into random joint angle configurations, whenever it had been in a certain

configuration for a certain period of time.
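The scalar core of the update equations above can be sketched as follows; the helper name, weight values, joint limits, and the constant c are illustrative assumptions:

```python
def update_joint(J, f, weights, retina, J_min, J_max, c=0.05):
    """One step of the reactive update J[n+1] = J[n] + f[n] * sum(w_i * R_i[n]).

    The direction factor f is set to +c below the lower joint limit, to -c
    above the upper limit, and otherwise keeps its previous value.
    """
    if J < J_min:
        f = c
    elif J > J_max:
        f = -c
    return J + f * sum(w * r for w, r in zip(weights, retina)), f
```

Calling this once per control cycle with the color-space-transformed retina outputs reproduces the “Braitenberg-style” direct sensory-motor coupling described in the text.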

The sensory-motor coordinated behavior was compared with one that was not sensory-motor coor-


dinated, and where the joints were actuated randomly.

7.5 Analysis methods

As mentioned above, we are interested in the quantitative analysis of the sensory data with the aim

of getting a better intuition of the “raw material” the neural system has to process. More specifically,

we want to investigate the use of information theoretic measures (Cover and Thomas, 1991; Papoulis,

1991; Shannon, 1948), such as the Shannon entropy H , the Shannon or mutual information MI, and

its normalized companion (the redundancy), if applied to the sensory channels of a situated autonomous agent. The

use of information theory to analyze the nature of sensory data was inspired by a series of articles

by Tononi et al. (1994, 1996) (see also Pfeifer and Scheier, 1999).

The Shannon entropy H(X) accounts for the potential diversity or variability that is displayed by

the random variable X (see appendix of this chapter for a definition), where X represents the signal

from the sensory channel. In principle, the Shannon entropy is equivalent to our intuitive notion of

information – the more unpredictable a signal or event, the more information its occurrence contributes.

MI(X ,Y ) is a measure that takes into account linear as well as nonlinear dependencies between two

time series (observations of a stochastic process), where, in our case, X and Y represent sensory signals

from different channels. This is in contrast to the better known correlation function CORR(X ,Y) (see

appendix), which measures just linear dependencies.

A straightforward method of computing entropy (or self-information, as it is sometimes called) and mutual information is to estimate the first and second order probability density functions, p(x) and p(x,y), by normalizing the 1-D and the 2-D histograms of the time series. For a discrete variable X, which can be in N possible states x1, x2, ..., xN, the entropy is given by H(X) = −∑_{i=1}^{N} p(xi) log2 p(xi). The mutual information is defined as MI(X;Y) = H(X) + H(Y) − H(X,Y), where H(X,Y) = −∑_{i=1}^{N} ∑_{j=1}^{M} p(xi,yj) log2 p(xi,yj) is the joint entropy; the discrete random variables X and Y have N and M different states, respectively. Furthermore, we define the redundancy as the “capacity-normalized” mutual information MI(X;Y)/C, with C = max_{p(x)} MI(X;Y), where the maximum is taken over all possible densities p(x). In our case, C = max H(X) = 8 bits for all sensory channels, i.e., the sensory signals assume discrete values between 0 and 255. Usually, the channel capacity C is measured in bits/sec, and the rate of transmitted information equals the entropy rate H(X) bits/sec; we normalized everything by fs, the rate at which the sensors are read out.

These measures were applied to the four previously introduced sensory sub-modalities: red (R), green

(G), blue (B), and intensity I = (R + G + B)/3.
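The histogram-based estimators just described can be sketched in a few lines (a minimal transcription; the function names are illustrative, and the capacity C = 8 bits follows from the 0-255 sensor range):

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy (bits) from the normalized 1-D histogram of xs."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """MI(X;Y) = H(X) + H(Y) - H(X,Y), joint entropy from the 2-D histogram."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    h_xy = -sum((c / n) * math.log2(c / n) for c in joint.values())
    return entropy(xs) + entropy(ys) - h_xy

def redundancy(xs, ys, capacity=8.0):
    """Capacity-normalized mutual information: 1 for identical channels at
    full capacity, 0 for independent channels."""
    return mutual_information(xs, ys) / capacity
```

For a channel compared with itself, the mutual information reduces to the entropy, which (normalized by C) explains the diagonal ridge discussed in the results.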


7.6 Results

We performed four different kinds of analyses. The Shannon entropy (displayed on the vertical axis) of

the red, green, and blue sensory channels of the horizontal 1-D retina in the case of a random movement

can be seen on the left side of Figure 7.2. The retina is composed of 18 color-tuned receptors (R,

G, and B-channels). The graph on the right shows the case of sensory-motor coordinated behavior

(foveation on red-colored objects). In the central part of the retina (from receptor 8 to receptor 12)

the information inflow through the R-channels is increased, whereas in the peripheral part it is

decreased, if compared to the not sensory-motor coordinated case. The behavior of the G-channel and

the B-channel is reversed, i.e., the flow through the central channels is decreased. The same holds for

the vertical linear resolution retina, where the effect is less pronounced, but still visible (not shown).

The graphs are averages over six experimental runs.

Figure 7.2: Shannon entropy for different sensory channels, measured in bits. Left: No sensory-motor coordination. Right: Sensory-motor coordination (foveation on red objects).

The intra-sensory information overlap between two R-receptors is shown in Figure 7.3. Basically,

we set X = Ri and Y = R j, where Ri and R j are the ith and jth R-receptor, respectively, and compute

MI(X ;Y ). For perfectly dependent channels MI(X ;Y) = 1, while for completely independent ones

MI(X ;Y ) = 0. For not sensory-motor coordinated interaction, in our case the result of a random ac-

tuation of the joints, the information overlap is minimal, and there is no redundancy. Remember that

MI(X ;X) = H(X)/C, i.e., the redundancy is equivalent to the normalized self-information, which ex-

plains the diagonal ridge. The case of sensory-motor coordinated interaction is shown on the right side

of Figure 7.3. The information overlap is evident from the “bump” in the center of the 3-D plot. In

other words, the amount of information shared by the R-channels in the foveal part of the 1-D retina,

is larger than the one of the R-channels in the peripheral part. This redundancy is a consequence of the

interaction itself.

Figure 6 shows the information overlap between two sub-modalities of the same sensory modality –


Figure 7.3: Mutual information between receptors of the same sensory modality. Random actuation on the left. Sensory-motor coordination on the right.

color (R) and intensity (I). The color channel and the intensity channel share the same spatial location,

but are tuned to different stimulus dimensions, color and intensity, respectively. Again, as for the

Shannon entropy, if compared to the not sensory-motor coordinated case, the result of the sensory-

motor coordination is an increase of the mutual information in the central part of the retina, and a

decrease in the peripheral part.

The amount of variability (information) must not be confused with the cumulated amount of activation (total stimulation) of a particular sensor. Figure 7.4 illustrates this point. The not sensory-motor coordinated case is shown on the left side, the sensory-motor coordinated case on the right side. Since the robot’s task is to foveate on red objects, it is obvious that there will be a peak in the cumulated stimulation of the R-receptor, since the agent spends a lot of time foveating on red objects. Less obvious is the fact that the maximum of the entropy of the B-receptor for the same dataset is slightly larger than the maximum for the R-receptor (see Figure 7.2). More striking is the difference for the not sensory-motor coordinated case, where the relationship is even reversed! The entropies of the G-channel and B-channel have an average of 5.7 bits, that of the R-channel 4.7 bits. The cumulated stimulation is higher for the R-channel, though (as can be seen in Figure 7.4, left).

7.7 Discussion

The following points are of interest. An upper bound on the information flowing through a particular sensory channel is given by the channel capacity, that is, H = C bits. Statistical information is maximized when all possible sensory states of the discretized signals have equal probability of occurrence, which means that the probability density function is uniform. A lower bound is given by H = 0 bits,


Figure 7.4: Cumulated stimulation of the R-, G-, and B-receptors. The sensory-motor coordinated case is on the right.

that is, there is no variability whatsoever (constant sensory stimulation). In other words, the probability density function has a single peak. Exploration strategies that lead to a balance between “predictability” (low entropy) and “variability” (high entropy) of the sensory signal need further investigation. In communication theory the goal is not quite the same as here: given a certain amount of noise, the information transfer through the communication channel has to be maximized; the more information can be pushed through the channel without loss, the better.

The second point of interest is that sensory-motor coordinated interaction leads to redundancy in the sensory channels of the same and of different modalities, i.e., to a higher mutual predictability between them. If the two signals are totally uncorrelated, MI(X;Y) = 0, and the joint entropy equals the sum of the individual entropies. The same measure reaches its maximum if the entropies of the individual sensory channels are high, and there is a high correlation among them (low joint entropy); one sensory channel can then be used as a predictor for the other one. This redundancy is clearly not present in the case of not sensory-motor coordinated interaction, where the mutual predictability is much lower (cf. Fig. 5).

Human infants exhibit a wide range of exploration strategies: mouthing, banging, fingering, scratching, squeezing, waving, and listening (Kellman and Arterberry, 1998). When objects are placed in the mouth, infants are able to detect surface properties (Meltzoff and Borton, 1979), and object characteristics such as rigidity (Rochat, 1987). In other words, there are different actions related to the exploration of different object properties, in the sense that they provide the sensory input which is “optimal and sometimes necessary” (Kellman and Arterberry, 1998) for extracting the desired information. As shown in this simple case study, the information flow through the various sensory channels very much depends on the action itself, i.e., an appropriate choice of action seems to be of importance for


the simplification of the subsequent neural processing.

Since entropy is an information theoretic measure that captures the variability of the sensory and motor signals, it tells us something about the complexity of the interaction itself. In an agent context, the more diverse the agent’s behavior, the more variation in the sensory channels, and the higher the entropy in the sensory system. Since we are interested not only in complexity due to sensory stimulation, but also in complexity due to self-generated sensory stimulation, we need to take into account aspects of the motor system’s variability. A good measure to start with is the ratio between the total entropy of the motor signals x = (x1, x2, ..., xS) and the total entropy of the sensory signals y = (y1, y2, ..., yT). We define it as B = Hmotors(X)/Hsensors(Y). There should be a match between the variability of the sensory channels and that of the motor outputs; in other words, B measures how well balanced the motor and sensory signals are.
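As a rough numerical sketch of this ratio (the function names are ours, and taking the "total" entropy as the sum of per-channel entropies is our assumption; the text does not fix an estimator):

```python
import numpy as np

def entropy_bits(x, bins=16):
    """Shannon entropy (bits) of a discretized signal."""
    p = np.bincount(x, minlength=bins) / len(x)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def balance(motor_channels, sensor_channels):
    """B = H_motors / H_sensors; total entropy taken here as the sum of
    the per-channel entropies (an assumption, not fixed in the text)."""
    h_m = sum(entropy_bits(m) for m in motor_channels)
    h_s = sum(entropy_bits(s) for s in sensor_channels)
    return h_m / h_s

rng = np.random.default_rng(1)
motors = [rng.integers(0, 16, 3000) for _ in range(2)]    # S = 2 motor signals
sensors = [rng.integers(0, 16, 3000) for _ in range(4)]   # T = 4 sensory signals
b = balance(motors, sensors)  # roughly 0.5 here: half as many motor channels
```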

The application of information theory is not devoid of problems. One of the biggest is the huge data requirement. In order to avoid it, (strong) assumptions about the signals involved and/or the noise have to be made, such as Gaussianity of the underlying stochastic process. These assumptions are often unfounded and difficult to test. Nevertheless, assuming true independence of two random processes, or normality of the signals, can significantly reduce the number of measurements required for the analysis. In our case, no such assumptions about the sensory and motor data are necessary, because a sufficiently large number of data points has been collected. The number of samples per experiment was around 3000, sufficient for the computation of measures like entropy and mutual information.

In this chapter the usage of information theory has been explored. Other methods, such as statistical analysis, dynamical systems analysis, or neural networks, could also be used.

7.8 Conclusion and future work

The ideas exposed in this chapter are an attempt at a more formal description of some of the design principles of autonomous agents (Pfeifer and Scheier, 1999), e.g., the principle of ecological balance mentioned earlier.

We also tried to take a step forward in describing the requirements for adaptive agents: the individual components (dimensions in sensory and motor space) must have a lot of variability, but they must also be able to couple to others for specific tasks. This is precisely what complex systems are about.

The importance of an appropriate sensory-motor coordinated interaction cannot be overestimated since, as the results show, it can lead to a structuring of the raw sensory stimulation, which in turn is thought to speed up and simplify learning. We hypothesize that the self-information (entropy) of the central and peripheral receptors largely depends on the type of interaction, but is less dependent on the environment. In a similar way, eye movements and certain particular hand movements have


task-dependent characteristics.

Many additional experiments need to be performed to confirm our hypotheses, though. Other sensory modalities (touch, audition) have to be taken into account. They would shed some light on how infants discover intermodal relationships, and how the existence of multiple sensory channels allows them to learn more about, and function within, the real world. Different sensor morphologies and other task-environments have to be tested, and their effect on the information inflow needs to be explored. The next step will then be to exploit the data for learning. Using similar kinds of analyses, we hope to be able to derive the conditions under which agents can rapidly learn new categorization behaviors, while maintaining the stability of existing ones.

7.9 Information theoretic appendix

Some useful definitions:

• Random variable (RV): a variable that assumes a numerical value for each random outcome of an event or experiment.

• Probability density function p(x) of a RV X: normalized 1-D histogram of the RV.

• Joint probability density function p(x,y) of RVs X and Y: normalized 2-D histogram of the two RVs.

• Shannon entropy H(X): measures the randomness of a RV. The more random a RV, the more entropy it has. Intuitively it is a measure of (the logarithm of) the number of states the RV could be in.

• Joint entropy H(X;Y): measures the uncertainty about both X and Y.

• Mutual information MI(X;Y) = H(X) + H(Y) − H(X;Y): measures the portion of entropy shared by X and Y. It is high if both X and Y have high entropy (high variance) and share a large fraction of it (high co-variance). It is zero if X and Y are statistically independent. In other words, MI is a measure of the deviation from statistical independence.

• Channel capacity C = max H(X): in our case the maximal amount of statistical information that can be transferred through a sensory channel at a certain instant. The maximum is computed over all possible sensory signal distributions p(x).

• Redundancy MI(X;Y)/C: the capacity-normalized mutual information.

Since the logarithm in base 2 is used, H and MI are measured in bits. Both entropy and mutual information are used in a statistical connotation; they can be thought of as multivariate generalizations of variance and co-variance (univariate statistics) that are sensitive to both linear and nonlinear interactions.
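The definitions above can be collected into a small reference implementation (a sketch using histogram-based plug-in estimators and base-2 logarithms; for the capacity we assume a channel that can reach the uniform distribution, so C = log2(number of bins)):

```python
import numpy as np

def entropy(x, bins):
    """H(X) in bits, from the normalized 1-D histogram."""
    p = np.bincount(x, minlength=bins) / len(x)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def joint_entropy(x, y, bins):
    """H(X;Y) in bits, from the normalized 2-D histogram."""
    p = np.histogram2d(x, y, bins=bins)[0].ravel() / len(x)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(x, y, bins):
    """MI(X;Y) = H(X) + H(Y) - H(X;Y)."""
    return entropy(x, bins) + entropy(y, bins) - joint_entropy(x, y, bins)

def redundancy(x, y, bins):
    """Capacity-normalized mutual information, MI(X;Y)/C with C = log2(bins)."""
    return mutual_information(x, y, bins) / np.log2(bins)

# Sanity checks mirroring the definitions in the text:
rng = np.random.default_rng(2)
bins = 16
x = rng.integers(0, bins, 4000)
y = rng.integers(0, bins, 4000)            # independent of x
mi_indep = mutual_information(x, y, bins)  # close to 0 (small positive bias)
mi_self = mutual_information(x, x, bins)   # equals H(X)
```

Note that the plug-in estimator of MI is slightly positively biased for independent signals at finite sample sizes, which is one face of the "huge data requirement" discussed above.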

Chapter 8

Dimensionality Reduction through

Sensory-Motor Interaction1

8.1 Synopsis

Traditionally, the problem of category learning has been investigated by employing disembodied cate-

gorization models. One of the basic tenets of embodied cognitive science states that categorization can

be interpreted as a process of sensory-motor coordination, in which an embodied agent, while interact-

ing with its environment, can structure its own input space for the purpose of learning about categories.

Many researchers, including John Dewey and Jean Piaget, have argued that sensory-motor coordina-

tion is crucial for perception and for development. In this chapter we give a quantitative account of

why sensory-motor coordination is important for perception and category learning.

8.2 Introduction

The categorization and discrimination of sensory stimuli, and the generation of new perceptual categories, are among the most fundamental cognitive abilities (Edelman, 1987; Pfeifer and Scheier, 1999).

Perceptual categorization (Edelman, 1987) is of such importance that a natural organism incapable of

making perceptual discriminations does not have much of a chance of survival, and an artificial device,

such as a robot, lacking this capability, is only of limited use. Traditionally, the problem of categorization has been investigated by adopting a disembodied perspective. Categorization models like ALCOVE (Attention Learning COVEring Map) (Kruschke, 1992) or SUSTAIN (Supervised and Unsu-

1 Te Boekhorst, R., Lungarella, M. and Pfeifer, R. Dimensionality reduction through sensory-motor coordination. In Proc. of the Joint Int. Conf. on Artificial Neural Networks and Neural Information Processing, Lecture Notes in Computer Science 2714, pp. 805-812, 2003.


pervised STratified Adaptive Incremental Network) (Love and Medin, 1998) implement categorization

as a process of mapping an input vector consisting of “psychologically relevant dimensions” onto a set

of output (category) nodes. The problem with these traditional approaches is, roughly speaking, that

they do not work in the real world, e.g., when used on real robots, because they neglect the fact that in

the real world there are no input vectors that have been preselected by the designer, but continuously

changing sensory stimulation. Moreover, these models do not properly take into account that the prox-

imal stimulation originating from objects varies greatly depending on distance and orientation, and on

other factors that we do not discuss here.

Recently, Clark and Thornton (1997) introduced the concept of type-2 problems to denote datasets for which the mapping from input nodes to output nodes cannot be extracted by means of the learning algorithms and statistical procedures used in classic categorization models. In contrast, whenever the aforementioned mapping can be learned from the data alone, the data are said to correspond to a type-1 problem. According to Clark and Thornton, the main difficulty in category learning is the generation of appropriate (type-1) data. As far as we know, there are two main strategies to achieve this. The first was suggested by Clark and Thornton (1997) themselves, and relies on an improvement of the internal processing, which

could be based on already learned things, for instance. The second approach is derived from the basic

tenets of embodied cognitive science and consists of exploiting processes of sensory-motor coordina-

tion (Pfeifer and Scheier, 1999; Scheier and Pfeifer, 1997). As suggested more than one century ago

by John Dewey (1896), categorization can be conceptualized as a process of sensory-motor coordinated

interaction – see also (Edelman, 1987; Pfeifer and Scheier, 1999; Thelen and Smith, 1994). Sensory-

motor coordination involves object-related actions, which can be used to structure the agent’s input

(sensory) space for the purpose of learning about object categories. The structuring of the sensory

space can be thought of as a mapping from a high dimensional input space to a sensory space with a

smaller number of dimensions. The important point to note is that the dimensionality reduction does

not necessarily have to be the result of internal processing only, but may be the result of an appropriate

embodied interaction. From the account given above, we derive two working hypotheses:

• Dimensionality reduction in the sensory space is the result of a sensory-motor coordinated inter-

action of the system with the surrounding environment. This leads to the emergence of correla-

tions among the input variables (sensors) and between these and the motor outputs.

• The particular temporal pattern of these correlations can be used to characterize the robot-

environment interaction, i.e., it can be considered to be a “fingerprint” of this interaction.

More specifically, we expect clearer correlations in the case of a robot that is driven by “sensory-

motor dynamics”, rather than in a robot that moves on the basis of a set of fixed and preprogrammed

instructions. Here, sensory-motor dynamics is defined as the dynamics of a system that is characterized

by continuous feedback between sensory input and motor output.


In what follows, we describe how these two hypotheses were experimentally tested in a real robot.

In Section 8.3, we give an overview of the experimental setup we employed and of the five experiments

we performed. Then in Section 8.4, we describe and motivate the statistical methodology, which we

used to analyze the time-series collected during the robot experiments. Finally, in the last two sections,

we discuss what we have learned and point to some future work.

8.3 Real-world instantiation and environmental setup

All experiments described in this chapter were carried out with a circular wheeled mobile robot called

Samurai™. This mobile device is equipped with a ring of 12 ambient-light (AL) and 12 infrared (IR)

sensors, and a standard off-the-shelf color CCD camera for vision (Fig. 8.1). The two wheels allow for

independent translational and rotational movements. For the purpose of this study the original 128x128

Figure 8.1: Environmental setup. Objects of different shape can be seen in the background. In a typical experiment the robot started in one corner of the arena and, depending on its in-built reflexes, tried to avoid obstacles, circled around them, or just tracked a moving obstacle (the small cylinder in the front). Note that the omnidirectional camera on the robot was not used for the experiments discussed here.

pixel array was compressed into a 100-dimensional vector, whose elements were calculated by taking the spatial average of the pixel intensity1 over adjacent vertical rectangular image patches. Video frames were recorded at a rate of 7 Hz. In the course of each experimental session, the input data coming from three (exteroceptive) sensory modalities (AL, IR, and vision), and the difference between the left and right motor speed (angular velocity), were transferred to a computer and stored in a time-

1 The statistical analysis described in this chapter is based on the intensity map of the image, which is obtained by computing the spatial average of the red, green, and blue color maps.


series file, yielding a 125-dimensional data vector per time step. The following five experiments were carried out in a simple environment (a square arena) consisting of either stationary or moving objects (Fig. 8.1). Each experiment was replicated 15 times. The resulting time series of each run consists of N = 100 time steps.

• Experiment 1 – Control setup: The robot moved forward in a straight line with a constant speed.

A static red object was placed in its field of view, in the top left corner at the end of the arena.

The behavior of the robot displayed no sensory-motor coordination.

• Experiment 2 – Moving object: The complexity of the control setup was slightly increased by

letting the same object move with a constant speed. As in experiment 1, the behavior was not

sensory-motor coordinated.

• Experiment 3 – Wiggling: The robot was programmed to move forward in an oscillatory manner.

As in experiments 1 and 2, there was no sensory-motor coordination.

• Experiment 4 – Tracking 1: Simple sensory-motor coordination was implemented by letting the

robot move in such a way that it kept, as long as possible, a static object in the center of its field of

view, while moving forward towards the object. This behavior was sensory-motor coordinated.

• Experiment 5 – Tracking 2: As in Experiment 4, but now the robot had to keep a moving ob-

ject in the center of its field of view, while moving towards the object – simple sensory-motor

coordination.

The control architectures for the five experiments were designed so as to be as simple as possible for the task at hand: the output of the control architecture of experiments 1 to 3 consisted of a preprogrammed sequence of motor commands, whereas in the case of experiments 4 and 5, a feedback signal proportional to the tracking error (the off-center displacement of the tracked object in the visual field) was used to compute the new motor activations.

8.4 Statistical analysis

The most straightforward statistical approach would be to correlate the time series of all variables

(motor speed difference, AL, IR, and preprocessed camera image) of the 125-dimensional data vector with each

other. However, by doing so we would run into the Bonferroni problem (Snedecor and Cochran,

1980): 5% of that very large number of correlations would be significant by chance alone (accepting

a significance level of α = 0.05). Moreover, the result of this operation would be strongly biased

due to the preponderance of the image data. Additional difficulties would arise from the fact that the


computed correlation coefficients would have to be combined into a single and meaningful number,

and due to the variance of the input data, this number would change over time.

Figure 8.2: Use of dimension reduction techniques, exemplified by the image data. (a) How the robot perceives an object when approaching it (experiment 1, no sensory-motor coordination). Moving forward, the image of a static object shifts to the periphery of the visual field. (b) A contour plot of the image data displayed as a time series of the pixel intensities. Vertical axis: pixel locations. Horizontal axis: time steps. The peripheral shift shows up as an upward curving trace. (c) A 3D plot of (b) with pixel intensity plotted along the vertical axis. Here the trace is visible as a trough cutting through a landscape with a ridge on the right side. (d) A reconstruction of (c) based on the first 5 PCs, which explain 95% of the variance. (e) The same as (d) but based on average factors.

In order to avoid the Bonferroni problem, we reduced, as a first step, the number of variables by performing a principal component analysis (PCA) on each of the three sensory modalities separately. The main idea of PCA is to compress the maximum amount of information of a multivariate data set into a limited (usually small) number of principal components (PCs). These principal components are linear combinations of the original variables, and in this way the high-dimensional data are projected onto a space of reduced dimensionality. The axes are chosen so as to maximize the variance of the projected data. The usefulness of this method is exemplified by a PCA performed on the camera image data (see Fig. 8.2). In the case of natural images, a PCA would result in a principal component to which especially those pixel locations contribute whose intensity values correlate strongly in time, and which thus probably originate from one and the same object.

The image reconstructed from the PCA is, however, far from perfect. This is probably due to the fact that the PCs are mere linear combinations of “all” variables considered and, in addition, do not take into account the sign of each variable’s contribution to a given PC; they therefore also include variables that correlate only weakly, or strongly negatively, with a given PC. As an alternative, we constructed so-called average factors (AFs), which are the mean values calculated (for each time step) over only those variables that load significantly high on a PC and are of the same sign. The comparison of a reconstruction based on 5 PCs with one based on 5 AFs is shown in Fig. 8.2d and 8.2e. Also for the other experiments we found that the image data could be adequately described by about 5 to 10 AFs. The AL data and the IR readings could each be combined into up to 4 AFs.
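The PCA-then-average-factors pipeline can be sketched on synthetic data as follows (Python with NumPy; the 0.5 loading threshold and the use of the strongest loading's sign as the reference sign are our assumptions, since the text only specifies averaging the variables that load significantly high on a PC with the same sign):

```python
import numpy as np

def pca(data, n_pc):
    """PCA of a (time steps x variables) array via the covariance eigendecomposition."""
    xc = data - data.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(xc, rowvar=False))
    order = np.argsort(evals)[::-1]                # eigh returns ascending order
    evals, evecs = evals[order], evecs[:, order]
    scores = xc @ evecs[:, :n_pc]
    return scores, evecs[:, :n_pc], evals / evals.sum()

def average_factors(data, loadings, threshold=0.5):
    """One AF per PC: the per-time-step mean over only those variables that
    load high on the PC with the same sign (the 0.5 threshold is an assumption)."""
    afs = []
    for k in range(loadings.shape[1]):
        col = loadings[:, k]
        ref = np.sign(col[np.abs(col).argmax()])   # sign of the strongest loading
        keep = (np.abs(col) >= threshold * np.abs(col).max()) & (np.sign(col) == ref)
        afs.append(data[:, keep].mean(axis=1))
    return np.column_stack(afs)

# Toy "image column" time series: 200 time steps, 100 pixels, one latent source.
rng = np.random.default_rng(3)
t = np.linspace(0, 4 * np.pi, 200)
data = np.outer(np.sin(t), rng.random(100)) + 0.1 * rng.standard_normal((200, 100))
scores, loadings, var_frac = pca(data, n_pc=5)
afs = average_factors(data, loadings)       # reduced description, shape (200, 5)
```

Because the single latent source drives most pixels, the first PC captures the bulk of the variance, which is the dimensionality reduction the chapter is after.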

Next, the correlations between the AFs representing the reduced sensory space and the angular velocity data (from the wheel encoders) were computed and brought together into a correlation matrix R. One way of summarizing the information in this matrix is to estimate 1−|R|, |R| being the determinant of the correlation matrix. The measure 1−|R| has been put forward as a general measure of the variance explained by the correlation among more than two variables, and has actually been proposed to quantify the dynamics of order and organization in developing biological systems (Banerjee et al., 1990). The dynamics of this measure could be captured by calculating it for a window of W subsequent time steps, and recomputing this quantity after the window has been shifted ahead one step in time. An obvious shortcoming of this technique is that the end of the data sequence is not covered, i.e., the resulting time series is truncated to N−W+1 data points, where N is the length of the data sequence. This represents a clear loss of information, since events occurring in the last W−1 steps are missed.
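The sliding-window version of the measure can be sketched as follows (hypothetical data standing in for the AFs; window length W = 20 is our choice):

```python
import numpy as np

def one_minus_det_r(series, w):
    """1 - |R| over a window of w steps, slid ahead one step at a time.
    series: (N x F) array of average factors (plus the motor channel)."""
    n = series.shape[0]
    out = []
    for start in range(n - w + 1):
        r = np.corrcoef(series[start:start + w], rowvar=False)
        out.append(1.0 - abs(np.linalg.det(r)))
    return np.array(out)   # the last w - 1 time steps carry no value

rng = np.random.default_rng(4)
n, f = 100, 5
uncorrelated = rng.standard_normal((n, f))
shared = rng.standard_normal((n, 1))                       # one common source
correlated = 0.9 * shared + 0.1 * rng.standard_normal((n, f))

low = one_minus_det_r(uncorrelated, 20)
high = one_minus_det_r(correlated, 20)   # near 1: |R| collapses toward 0
```

Strongly coupled channels drive the determinant toward zero and the measure toward one, which is exactly what makes it a candidate summary of "order" in the sensory-motor data.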

As an alternative, we computed the correlations for increasingly larger windows of 4, 5, ..., up to N time steps, but with a decreasing influence of the past, i.e., by giving less weight to data points further back in time. This was achieved by weighting the contributions to the correlation coefficient at time t = T (the current point in time) by a decay function $w_{t,\alpha}$ (where $\alpha$ is a parameter controlling the steepness of decay), leading to the calculation of the weighted correlation coefficient:

$$
r^{*}_{T} = \frac{\sum_{t=0}^{T} w_{t,\alpha}\,x_t y_t \;-\; \frac{1}{N}\sum_{t=0}^{T} w_{t,\alpha}\,x_t \sum_{t=0}^{T} w_{t,\alpha}\,y_t}{\sqrt{\left[\sum_{t=0}^{T} w_{t,\alpha}\,x_t^{2} - \frac{1}{N}\Bigl(\sum_{t=0}^{T} w_{t,\alpha}\,x_t\Bigr)^{2}\right]\left[\sum_{t=0}^{T} w_{t,\alpha}\,y_t^{2} - \frac{1}{N}\Bigl(\sum_{t=0}^{T} w_{t,\alpha}\,y_t\Bigr)^{2}\right]}} \qquad (8.1)
$$

As a decay function, we chose a half-Gaussian $w_{t,\alpha} = e^{\ln(\alpha)\,(t-T)^{2}}\,u_{-1}(t)$, where $u_{-1}(t)$ is the Heaviside function, which is 1 for t > 0 and 0 for t ≤ 0. This yields a matrix of weighted correlation


coefficients R∗(t) for each sampled point at time t. Unfortunately, the determinant of a correlation

matrix is highly sensitive to outliers. In other words, 1− |R∗| could not be used as a measure of

the dynamics of the correlation among the input and output variables. Another way of characterizing

Figure 8.3: Results of experiments 1-3 (no sensory-motor coordination). Left: experiment 1. Center: experiment 2. Right: experiment 3. From top to bottom (and for all columns) the vertical axes are H(λ), λmax, and Npc. In all graphs the horizontal axis denotes time. The curves are the means from up to 15 experimental runs and the bars are the associated 95% confidence limits around those means. For details refer to text.

a weighted correlation matrix is by the set λ of its eigenvalues λi (i = 1, 2, ..., F), where F is the number of AFs. The ith eigenvalue equals the proportion of variance accounted for by the ith PC and hence contains information about the correlation structure of the data set. In fact, this boils down to yet another PCA, this time on the average factors. We propose three indices to capture the statistical properties of the robot’s sensory data; they combine the eigenvalues λi into a single quantity (which, like R∗, has to be calculated for each time step t). The first one is the Shannon entropy H(λ) = −∑i p(λi) log p(λi) (Shannon, 1948). This index attains its maximum for p(λi) = 1/F (i = 1, 2, ..., F), i.e., when the variance is evenly accounted for by all PCs. A high value of H(λ) therefore represents a lack of correlational structure among the variables. When H(λ) is low, the total variance of the data


matrix is concentrated in one or only a few PCs and hence points to strong correlations. Another way to quantify the same effect is the so-called Berger-Parker Index (BPI), which measures “dominance” as D = λmax/∑i λi. Since the eigenvalues are arranged in decreasing order and are normalized here so that they sum to unity, this results in D = λmax = λ1. The third measure is the number of PCs (eigenvalues) that together explain 95% of the total variance. We will refer to it as Npc.
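The three indices can be sketched as follows (computed here from a plain correlation matrix of synthetic data for brevity; in the chapter they are computed at each time step from the weighted matrix R*(t)):

```python
import numpy as np

def eigen_indices(corr):
    """H(lambda), Berger-Parker dominance D, and Npc from a correlation matrix."""
    evals = np.linalg.eigvalsh(corr)[::-1]              # descending order
    p = evals / evals.sum()                             # variance proportion per PC
    nz = p[p > 0]
    h = -np.sum(nz * np.log2(nz))                       # Shannon entropy of spectrum
    d = p[0]                                            # D = lambda_max / sum(lambda)
    npc = int(np.searchsorted(np.cumsum(p), 0.95) + 1)  # PCs explaining 95%
    return h, d, npc

rng = np.random.default_rng(5)
shared = rng.standard_normal((200, 1))
# Strongly coupled AFs (one common source) vs. independent AFs:
strong = np.corrcoef(0.95 * shared + 0.05 * rng.standard_normal((200, 6)), rowvar=False)
weak = np.corrcoef(rng.standard_normal((200, 6)), rowvar=False)
h_s, d_s, npc_s = eigen_indices(strong)   # low entropy, high dominance, Npc = 1
h_w, d_w, npc_w = eigen_indices(weak)     # high entropy, low dominance, large Npc
```

Strong coupling concentrates the variance in the first eigenvalue, which is the signature of dimensionality reduction that Figures 8.3 and 8.4 track over time.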

Figure 8.4: Results of experiments 4 and 5 (sensory-motor coordination). Left: experiment 4. Right: experiment 5. From top to bottom (and for all columns) the vertical axes are H(λ), λmax, and Npc. The horizontal axis denotes time. For details refer to text.

8.5 Experiments and Discussion

The outcome of the statistical analyses described in the previous section is summarized in Fig. 8.3 and Fig. 8.4. What do these results tell us with respect to the impact of sensory-motor coordination? The most conspicuous difference between architectures with and without sensory-motor coordination appears to be in the variance of the introduced indices. The curves of the experiments with sensory-motor coordination (experiments 4 and 5) display a very large variance (represented by the error bars). Furthermore, in these experiments the curves for H(λ) and Npc decrease more or less monotonically


(whereas λmax rises), implying a steady increase in correlation among the AFs. The large variance is due to the fact that in some experiments these changes set in much earlier than in others (in some instances the decrease set in so late that the runs resembled the outcomes of experiments 1 and 2). But this does not imply that no reduction of dimensionality occurs without sensory-motor coordination. In experiments 1 and 2 there is a reduction; however, it takes place only at the end of the runs. Experiment 3 is odd (see Fig. 8.3, third column): although the experiment is not sensory-motor coordinated, the calculated indices show the strongest reduction in dimensionality of all experiments! Note that after an initial increase, the correlations seem to decrease again (see λmax, for instance). One possible explanation is that the oscillatory movement forces a large part of the sensory input into the same phase, leading to strong correlations when the robot is distant from objects (at the beginning of the experiment) and to weaker correlations otherwise.

8.6 Conclusion and future work

To summarize, the curves do indeed give a “fingerprint” of the robot-environment interaction (note how the oscillations of the robot are also manifest in the λmax curve of experiment 3), and sensory-motor coordination does lead to a reduction of dimensionality in the sensory input. However, despite this very definite impact on the correlations of the sensory data, the results are not entirely straightforward. Further investigation, in particular with more complex sensory-motor setups, is required.

Chapter 9

Fingerprinting Agent-Environment

Interaction 1

9.1 Synopsis

This chapter investigates by means of statistical and information-theoretic measures, to what extent

sensory-motor coordinated activity can generate and structure information in the sensory channels of a

simulated agent interacting with its surrounding environment. We show how correlation, entropy, and mutual information can be employed (a) to segment an observed behavior into distinct

behavioral states, (b) to analyze the informational relationship between the different components of the

sensory-motor apparatus, and (c) to quantify (“fingerprint”) the interaction between the agent and its lo-

cal environment. We hypothesize that a deeper understanding of the information-theoretic implications

of sensory-motor coordination can help us endow robots not only with better sensory morphologies,

but also with better strategies to explore their local environments.

9.2 Introduction

Manual haptic perception is the ability to gather information about objects by using the hands. Haptic

exploration is a task-dependent activity, and when people seek information about a particular object

property, such as size, temperature, hardness, or texture, they perform stereotyped exploratory hand

movements. In fact, spontaneously executed hand movements are the best ones to use, in the sense that

they maximize the availability of relevant sensory information gained by haptic exploration (Lederman

and Klatzky, 1990). The same holds for visual exploration. Eye movements, for instance, depend on

1 Tarapore, D., Lungarella, M. and Gomez, G. Fingerprinting agent-environment interaction via information theory. In Proc. of the 8th Int. Conf. on Intelligent Autonomous Systems, pp. 512-520, 2004.


the perceptual judgement that people are asked to make, and the eyes are typically directed toward

areas of a visual scene or an image that deliver useful and essential perceptual information (Yarbus,

1967). To reason about the organization of saccadic eye movements, Lee and Yu (1999) proposed a

theoretical framework based on information maximization. The basic assumption of their theory is that

due to the small size of our foveas (the high-resolution part of the eye), our eyes have to continuously move

to maximize the information intake from the world. Differences between tasks obviously influence

the statistics of visual and tactile inputs, as well as the way people acquire information for object

discrimination, recognition, and categorization.

Clearly, the common denominator underlying our perceptual abilities seems to be a process of

sensory-motor coordination that couples action and perception. It follows that coordinated movements

must be considered part of the perceptual system (Thelen and Smith, 1994), and whether the sensory

stimulation is visual, tactile, or auditory, perception always includes associated movements of eyes,

hands, arms, head and neck (Ballard, 1991; Gibson, 1988). Sensory-motor coordination is important,

because (a) it induces correlations between various sensory modalities (such as vision and haptics)

that can be exploited to form cross-modal associations, and (b) it generates structure in the sensory

data that facilitates the subsequent processing of those data (Lungarella and Pfeifer, 2001; Lungarella

and Sporns, 2004; Sporns, 2003). Exploratory activity of hands and eyes is a particular instance of

coordinated motor activity that extracts different kinds of information through interaction with the en-

vironment. In other words, robots and other agents are not passively exposed to sensory information,

but they can actively shape that information. Our long-term goal is to quantitatively understand what

sort of coordinated motor activities lead to what sort of information. We also aim at identifying “fin-

gerprints” (or patterns) characterizing the agent-environment interaction. Our approach builds on top

of previous work on category learning (Pfeifer and Scheier, 1997; Scheier and Pfeifer, 1997), as well as

on information-theoretic and statistical analysis of sensory-motor data (Lungarella and Pfeifer, 2001;

Sporns, 2003; Te Boekhorst et al., 2003) (compare with Chapter 7 and Chapter 8).

In this chapter, we simulated a robotic agent whose task was to search its surrounding environment

for red objects, approach them, and explore them for a while. The analysis of the recorded sensory-

motor data showed that different types of sensory-motor activities displayed distinct fingerprints re-

producible across many experimental runs. In the two following sections, we give an overview of our

experimental setup, and describe the actual experiments. Then, in Section 9.5, we present our methods of analysis. In Section 9.6, we present our results and discuss them. Finally, in Section 9.7, we

conclude and point to some future research directions.


9.3 Experimental Setup

We conducted our study in simulation. The experimental setup consisted of a two-wheeled robot and

of a closed environment cluttered with randomly distributed, colored cylindrical objects. A bird’s

eye view on the robot and its ecological niche is shown in Fig. 9.1 a. The robot was equipped with

eleven proximity sensors (d0−10) to measure the distance to the objects and a pan-controlled camera

unit (image sensor) – see Fig. 9.1 b. The proximity sensors had a position-dependent range, that is, the

sensors in the front and the one in the back had a short range, whereas the ones on the sides had a longer

range (see caption of Fig. 9.1). The output of each sensor was affected by additive white noise, and was

discretized into 32 states, leading to sensory signals with a 5-bit resolution. To

reduce the dimensionality of the input data, we divided the camera image into 24 vertical rectangular

slices with widths decreasing toward the center. We computed the amount of the “effective” red color

in each slice as R = r − (b + g)/2, where r, g, and b are the red, green, and blue components of the

color associated with each pixel of the slice. Negative values of R were set to zero. This operation

guaranteed that the red channel gave maximum response for fully saturated red color, that is, for r=31,

g=b=0. The red color slices will also be referred to as red channels or red receptors.
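This per-slice preprocessing can be sketched as follows. The sketch is a reconstruction, not the original code: the array shapes, the function names, and the equal slice widths are our own simplifications (in the actual setup the slice widths decrease toward the image center).

```python
import numpy as np

def effective_red(image):
    """Per-pixel 'effective red' R = r - (b + g) / 2, clipped at zero.

    `image` is an (H, W, 3) array of r, g, b values in [0, 31].
    """
    r = image[..., 0].astype(float)
    g = image[..., 1].astype(float)
    b = image[..., 2].astype(float)
    return np.maximum(r - (g + b) / 2.0, 0.0)

def slice_responses(image, n_slices=24):
    """Average effective red in each of `n_slices` vertical slices.

    For simplicity the slices here are equally wide; in the thesis their
    width decreases toward the center of the image.
    """
    red = effective_red(image)
    slices = np.array_split(red, n_slices, axis=1)
    return np.array([s.mean() for s in slices])

# A fully saturated red image (r = 31, g = b = 0) gives the maximum response.
img = np.zeros((48, 96, 3), dtype=int)
img[..., 0] = 31
print(slice_responses(img))  # 24 values, all equal to 31.0
```

As in the text, the clipping at zero guarantees that only red-dominated pixels contribute to a slice's response.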

For the control of the robot, we opted for the Extended Braitenberg Architecture (Pfeifer and

Scheier, 1999). In this architecture, each of the robot’s sensors is connected to a number of processes

which run in parallel and continuously influence the agent’s internal state, and govern its behavior.

Because our goal is to illustrate how standard statistical and information-theoretic measures can be

employed to quantify (and fingerprint) the agent-environment interaction, we started by decomposing

the robot’s behavior into three distinct behavioral states: (a) “explore the environment” and “find red

objects”, (b) “track red objects”, and (c) “circle around red objects.” It is important to note that the

three behavioral states display coordinated motor activity, and are characterized by a tight coupling

between sensing and acting. We suggest here that the segmentation of the observed behavior into

distinct behavioral states is an important (maybe even necessary) step for fingerprinting the agent-

environment interaction and identifying stable patterns of interaction (such as stereotyped exploratory

hand movements).

9.4 Experiments

A top view of a typical experiment is shown in Fig. 9.1 a. We conducted 16 experiments. Each

experiment consisted of approximately 1400 data samples, which were stored into a time series file

for subsequent analysis. At the outset of each experimental run, the robot’s initial position was set

to the final position of the previous experiment (except for the first experiment where the robot was

placed in the origin of the x-y plane), and the behavioral state was reset to “exploring.” In this particular


Figure 9.1: (a) Bird's eye view on the robot and its ecological niche. The trace depicts the path of the robot during a typical experiment. (b) Schematic representation of the simulated agent. The sensors have a position-dependent range: if rl is the length of the robot, the range of d0, d1, d9, and d10 is 1.8rl, the one of d2 and d3 is 1.2rl, and the one of d4, d5, d6, d7, and d8 is 0.6rl. (c) Extended Braitenberg Control Architecture: as shown, four processes govern the agent's behavior.


state the robot randomly explored its environment while avoiding obstacles. Concurrently, the robot’s

camera panned from side to side (by 60 degrees on each side). If the maximum of the effective red

color (summed over the entire image) passed a given (fixed) threshold, it was assumed that the robot

had successfully identified a red object. The behavioral state was set to “tracking”, the camera stopped

rotating from side to side, and the robot started moving in the direction pointed at by the camera,

trying to keep the object in the camera’s center of view. Once close to the red object, the robot started

circling around it (while still keeping it in its center of view by adjusting the camera’s pan-angle).

At the same time, a “boredom” signal started increasing. The robot kept circling around the object,

until the boredom signal crossed an upper threshold. In that instant, the robot stopped circling, and

started backing away from the red object, while avoiding other objects. Concurrently, the boredom

signal began to decrease. When the boredom signal finally dropped below a lower threshold, the robot

resumed the exploration of the surrounding environment.
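The transitions just described can be summarized as a simple state machine. The sketch below is illustrative only: the threshold values and the boredom dynamics are hypothetical placeholders, since the thesis does not report the numerical values actually used.

```python
# Hypothetical values; the thesis does not report the actual thresholds.
RED_THRESHOLD = 200.0   # summed effective red that signals "red object found"
BOREDOM_UPPER = 1.0
BOREDOM_LOWER = 0.1
BOREDOM_RATE = 0.01

def step_state(state, total_red, near_object, boredom):
    """One update of the behavioral state; returns (new_state, new_boredom)."""
    if state == "exploring" and total_red > RED_THRESHOLD:
        state = "tracking"                 # red object identified
    elif state == "tracking" and near_object:
        state = "circling"                 # close enough: start circling
    elif state == "circling":
        boredom += BOREDOM_RATE            # boredom builds up while circling
        if boredom > BOREDOM_UPPER:
            state = "retreating"           # back away from the object
    elif state == "retreating":
        boredom -= BOREDOM_RATE            # boredom decays while backing away
        if boredom < BOREDOM_LOWER:
            state = "exploring"            # resume exploration
    return state, boredom

state, boredom = "exploring", 0.0
state, boredom = step_state(state, total_red=250.0, near_object=False, boredom=boredom)
print(state)  # tracking
```

The "retreating" phase is modeled here as a fourth internal state, even though the analysis in the text groups the behavior into the three states listed above.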

9.5 Methods

First, we introduce some notation. Correlation quantifies the amount of linear dependency between two random variables X and Y, and is given by Corr(X,Y) = (∑_{x∈X} ∑_{y∈Y} p(x,y)(x − m_X)(y − m_Y))/(σ_X σ_Y), where p(x,y) is the second order (or joint) probability density function, m_X and m_Y are the means, and σ_X and σ_Y are the standard deviations (std) of X and Y. The entropy of a random variable X is a measure of its uncertainty, and is defined as H(X) = −∑_{x∈X} p(x) log p(x), where p(x) is the first order probability density function associated with X; in a sense, entropy provides a measure for the sharpness of p(x). The joint entropy of the variables X and Y is defined analogously as H(X,Y) = −∑_{x∈X} ∑_{y∈Y} p(x,y) log p(x,y). For entropy as well as for mutual information, we assumed the binary logarithm. Mutual information measures the statistical dependence of two random variables X and Y, and vanishes if and only if the two variables are independent (Cover and Thomas, 1991; Shannon, 1948). Using the joint entropy H(X,Y), we can define the mutual information between X and Y as MI(X,Y) = H(X) + H(Y) − H(X,Y). In comparison with correlation, mutual information provides a better and more general criterion to investigate statistical dependencies between random variables (Steuer et al., 2002). Correlation, entropy, and joint entropy were computed by first approximating p(x) and p(x,y). The most straightforward approach is to use a histogram-based technique, described, for instance, in (Steuer et al., 2002). Because the sensors had a resolution of 5 bits, we estimated the histograms by setting the number of bins to 32 (which leads to a bin size of one). Having a unitary bin size allowed us to map the discretized value of the sensory stimulus directly onto the corresponding bin for the approximation of the joint probability density function, thus speeding up the computation. As noted previously, the distance sensors are identified by d_i, with i ∈ {0, …, 10}, whereas the effective red color sensors are indexed with the numbers 1 to 24.
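A minimal histogram-based estimator following these definitions can be sketched as below. This is an illustrative reconstruction, not the code used in the experiments: the function names are ours, and `numpy.histogram2d` stands in for the direct bin-indexing scheme made possible by the unitary bin size.

```python
import numpy as np

def entropy(x, n_bins=32):
    """Shannon entropy (base 2) of an integer-valued signal in [0, n_bins)."""
    p = np.bincount(np.asarray(x), minlength=n_bins) / len(x)
    p = p[p > 0]                      # 0 log 0 is taken to be 0
    return -np.sum(p * np.log2(p))

def mutual_information(x, y, n_bins=32):
    """MI(X,Y) = H(X) + H(Y) - H(X,Y), via a 2-D histogram with unitary bins."""
    joint, _, _ = np.histogram2d(x, y, bins=n_bins,
                                 range=[[0, n_bins], [0, n_bins]])
    p = joint.ravel() / len(x)
    p = p[p > 0]
    h_xy = -np.sum(p * np.log2(p))    # joint entropy H(X,Y)
    return entropy(x, n_bins) + entropy(y, n_bins) - h_xy

# Sanity checks: a constant sensor carries no information, and MI(X,X) = H(X).
x = np.random.default_rng(0).integers(0, 32, size=1400)  # one 5-bit stream
print(entropy(np.zeros(1400, dtype=int)) == 0.0)             # True
print(abs(mutual_information(x, x) - entropy(x)) < 1e-9)     # True
```

With 5-bit signals the entropy estimate is bounded by log2(32) = 5 bits, the maximum entropy bound referred to later in the chapter.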


9.6 Data Analysis and Results

We analyzed the collected datasets by means of three measures: correlation, mutual information, and

entropy (which is a particular instance of mutual information). In this section we describe, and in part

discuss, the results of our analyses.

9.6.1 Correlation

In the first behavioral state (“exploring”), the robot moved around avoiding obstacles and “searching”

for red objects. In all performed experiments, we observed either no or only weak correlations between

the proximity sensors, that is, the correlations were small and their absolute value close to zero. In

Figure 9.2 a, for instance, the average correlation is 0.011. The intrinsic noise of the sensors, as well

as the unpredictability of the sensory activations while the robot was exploring its ecological niche,

made the identification of statistical dependencies between the sensory activations by means of linear

correlation difficult. Similarly, the output of the red channels did not lead to a “stable” correlation

matrix, that is, the pair-wise correlations between the sensory channels varied significantly across the

different experimental runs. The average correlation in the case of Figure 9.3 a is 0.053 (again a low

value), and the standard deviation is 0.023. The reason is that in this state, the oscillatory movement

of the robot’s camera induced a rapidly changing stream of sensory data, which in turn led to small

correlations between the red channels.

In the second behavioral state (“tracking”), the robot moved toward the previously identified red

object. In this case, the correlations between the activity of the red receptors in and close to the center of

the image are high (see Fig. 9.3 b). A possible explanation is that the robot kept correcting the direction

of its movements so that the tracked object remained in the center of its visual field. Moreover, because

this state was characterized by a goal-directed movement of the robot toward the red object, the number

of red pixels present in the image increased, leading to an increase of the stimulation of the red receptors

located in the center (note that the activation of the red receptors is an average computed over a vertical

slice), and to a corresponding increase of the correlation between those receptors.

In the third behavioral state (“circling”), we observed negative correlations (−0.442) between the

pairs of proximity sensors located on the ipsilateral (same) side of the robot, such as (d2, d9) or (d3, d10) (see Fig. 9.2 c). Due to the non-linearities of the data and the noise-induced correlations,

however, these correlations are not immediately evident from the plot. In this state, we observed in

all performed experimental runs, strong correlations between the output of the red channels located in

(and close to) the central image area (see Fig. 9.3 c). The correlation was 0.920 for receptors in the

center, with an overall average of 0.166. The standard deviation of the correlation computed over all

experiments was 0.041. While circling around the object, the robot kept foveating on it. Due to the

limitations of the camera angle, however, the object appeared on the side and not in the center of the


visual field.

Figure 9.2: Correlation matrix obtained from the pair-wise correlation of the distance sensors for one particular experimental run during the behavioral state: (a) "exploring", (b) "tracking", (c) "circling." The higher the correlation, the larger the size of the square. From left to right the average correlation is: 0.011 ± 0.004, 0.097 ± 0.012, and 0.083 ± 0.041, where ± indicates the standard deviation.

Figure 9.3: Correlation matrix obtained from the pair-wise correlation of the red channels for one particular experimental run during the behavioral state: (a) "exploring", (b) "tracking", (c) "circling." The higher the correlation, the larger the size of the square. From left to right the average correlation is: 0.053 ± 0.023, 0.309 ± 0.042, and 0.166 ± 0.031, where ± indicates the standard deviation.


9.6.2 Entropy and mutual information

The pair-wise mutual information between the eleven proximity sensors is shown in Figure 9.4. The

diagonal of the same plot gives the entropy of the sensory stimulation – courtesy of the expression

H(X) = MI(X, X). Because the individual sensors are affected by uniform white noise, even sensors that are never active can be characterized by a potentially large entropy (see graph of cumulated

activation in Fig. 9.6).

In the first and second behavioral states, the results of the analysis of the data gathered in a partic-

ular experiment cannot be generalized to all experiments. The reason is that in experiments in which

the robot avoids obstacles, the average mutual information between sensors, as well as the entropy of

the individual sensors, is larger compared to experimental runs in which the robot does not encounter

any object. In the third behavioral state “circling”, the entropy of the activation of the sensors on both

sides of the robot is large: H(d3) = 2.83 bits and H(d10) = 2.75 bits (see Fig. 9.4 c). In the same Figure,

the mutual information between these sensors is also high: MI(d2,d9) = 0.62 bits. Figure 9.5 shows

the mutual information matrices obtained from the estimation of the mutual information for pairs of

red channels. In the behavioral state “exploring”, the average mutual information computed over all

experiments is 0.123 bits, and the standard deviation is 0.020 bits (Fig. 9.5 a shows the result for one

particular experiment). The reason for the low values of mutual information is that the camera oscil-

lates from side to side, thus leading to a rapidly changing camera image, and hence to a drop of the

statistical dependence between red channels. In the second behavioral state ”tracking”, the entropy for

the red receptors in and around the center is high in comparison with the one of the first behavioral

state (mean: 2.674 bits, standard deviation: 0.362 bits). The same holds for the mutual information be-

tween the red receptors (mean: 0.604 bits, standard deviation: 0.160 bits) (see Fig. 9.5 b). In the third

behavioral state, the entropy of the red channels at the periphery, as well as the mutual information

between them, is large (see Fig. 9.5 c). Across all experiments, for both sides of the image sensor, the

standard deviation of the mutual information assumes high values (e.g., the std of the receptor on the

far left of the image sensor is 0.461 bits). In contrast, the standard deviation for the red channels close

to the center is low (e.g., 0.244 bits), and largely independent from the direction in which the robot

is moving around the object. The standard deviation in the mutual information between red receptors

across all the experiments was low (0.102 bits). We conclude that mutual information may provide a

good and stable measure for identifying and characterizing agent-environment interaction.
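The stability claim, i.e., a low standard deviation of the measure across the 16 runs, can be checked with a sketch like the following. The matrices and the noise level below are synthetic illustrations, not data from the experiments.

```python
import numpy as np

def fingerprint_stability(matrices):
    """Element-wise mean and std of a measure matrix (correlation or
    mutual information) computed over repeated experimental runs.

    `matrices` is a list of equally shaped (n, n) arrays, one per run."""
    stack = np.stack(matrices)
    return stack.mean(axis=0), stack.std(axis=0)

# Synthetic example: 16 runs of a hypothetical 3x3 fingerprint with small
# run-to-run noise. A stable fingerprint shows a low std everywhere.
rng = np.random.default_rng(1)
runs = [np.full((3, 3), 0.6) + 0.01 * rng.standard_normal((3, 3))
        for _ in range(16)]
mean, std = fingerprint_stability(runs)
print(std.max() < 0.05)  # True for this noise level
```

A fingerprint is "stable" in the sense of this chapter when the std matrix stays small relative to the mean matrix across all experimental runs.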

9.6.3 Cumulated sensor activation

The amount of variability (information) should not be confused with the cumulated amount of sen-

sory activation (total stimulation) of a particular sensor. The total sensory stimulation for both sensory

modalities was computed by integrating – separately for each behavioral state – the activation of the in-


Figure 9.4: Mutual information matrix obtained by estimating the mutual information between pairs of proximity sensors in one particular experimental run during the behavioral state: (a) "exploring", (b) "tracking", (c) "circling." The higher the mutual information, the larger the size of the square.

Figure 9.5: Mutual information matrix obtained by estimating the mutual information between pairs of red channels in one particular experimental run during the behavioral state: (a) "exploring", (b) "tracking", (c) "circling." The higher the mutual information, the larger the size of the square.

dividual sensors during an experiment. We then normalized the activation as a percentage (see Fig. 9.6).

In the “exploring” and “tracking” behavioral states the cumulated sensor activation does not show any

stable patterns across multiple experiments, in the sense that the positions of the peaks change from

experiment to experiment and depend on the number of objects encountered. In the third behavioral


state, however, the activation levels of the sensors d2 and d3 are high and stable across all experimental

runs (see Fig. 9.6 a). These sensors are used when the robot moves toward the red object. The same

graph shows that the activation levels of the sensors d9 or d10 are characterized by large values. These

particular sensors are used to prevent the robot from colliding with the object (while circling around

it). As for the distance sensors, we also computed the activation levels of the 24 red receptors (see

Fig. 9.6 b). The total stimulation of the red channels in the first behavioral state does not display sta-

bility across all experiments. In the second behavioral state the activation levels for the red receptors

close to the center are high. The activation levels, however, gradually decrease toward the periphery.

The decrease is a result of the continuous adjustments of the camera pan-angle in order to keep the

red object in the center of its visual field. Thus, the peripheral red receptors are not stimulated. The

behavioral state “circling” shows high activation levels for the image sensors on both sides of the robot.
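The integration-and-normalization step can be sketched as follows. Note that normalizing to the most active sensor is one plausible reading of the percentage scale; the thesis does not state the reference used.

```python
import numpy as np

def activation_levels(data):
    """Cumulated activation per sensor, expressed as a percentage.

    `data` is an (n_samples, n_sensors) array of non-negative sensor values
    recorded during one behavioral state. The percentage is taken relative
    to the most active sensor (an assumption on our part).
    """
    total = data.sum(axis=0)          # integrate activation over the state
    return 100.0 * total / total.max()

# A sensor stimulated twice as much as another ends up at 100% vs. 50%.
data = np.array([[1.0, 2.0],
                 [1.0, 2.0],
                 [1.0, 2.0]])
print(activation_levels(data))  # [ 50. 100.]
```

Computed separately for each behavioral state, these profiles are the quantities plotted in Fig. 9.6.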

Figure 9.6: (a) Plot of activation levels for the proximity sensors (1 to 12) for the three behavioral states. (b) Plot of activation levels for the image sensors (1 to 24) for the three behavioral states. The plots display the average computed over 16 experimental runs. The bars denote the standard deviation.

9.6.4 Pre-processed image entropy

The change over time of the total image entropy (computed as the average of the entropies of the

individual vertical slices) is displayed in Fig. 9.7. While the robot is exploring its ecological niche,

the image entropy is low and constant (phase P1), that is, there is not much variability in the sensory


Figure 9.7: Entropy of the effective red color averaged over all vertical slices. P1: exploring; P2: tracking; P3: circling. The plot displays the average computed over 16 experimental runs. The bars denote the standard deviation.

channel. When the robot starts approaching the red object (second behavioral state), the image entropy

begins to increase (phase P2). The image entropy reaches its maximum in the third behavioral state,

and stays high as long as the robot keeps circling around the red object (phase P3).
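The entropy time course described above can be approximated with a trailing-window estimate of the mean slice entropy. The window length below is a free parameter of this sketch, not a value taken from the thesis.

```python
import numpy as np

def image_entropy_over_time(slice_values, n_bins=32, window=50):
    """Mean slice entropy at each time step, over a trailing window.

    `slice_values` is an (n_samples, n_slices) integer array in [0, n_bins)."""
    n, k = slice_values.shape
    out = np.zeros(n - window + 1)
    for t in range(len(out)):
        win = slice_values[t:t + window]
        h = 0.0
        for j in range(k):                # entropy of each vertical slice
            p = np.bincount(win[:, j], minlength=n_bins) / window
            p = p[p > 0]
            h -= np.sum(p * np.log2(p))
        out[t] = h / k                    # average over the slices
    return out

# A constant stream (cf. phase P1) yields zero entropy; a rapidly varying
# stream approaches the 5-bit maximum entropy bound.
rng = np.random.default_rng(2)
flat = np.zeros((200, 24), dtype=int)
busy = rng.integers(0, 32, size=(200, 24))
print(image_entropy_over_time(flat).max() == 0.0)  # True
print(image_entropy_over_time(busy).min() > 3.0)   # True
```

The qualitative pattern of Fig. 9.7, low and constant entropy while exploring, rising while tracking, and peaking while circling, follows from how variable the slice signals are in each phase.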

9.7 Further Discussion and Conclusion

To summarize, coordinated motor activity leads to correlations in the sensory data that can be used

to characterize the robot-environment interaction. Statistical measures, such as correlation and mu-

tual information, can be employed to extract fingerprints of the robot-environment interaction. In the

“circling” behavioral state, for instance, the average correlation (evaluated over 16 experimental runs)

divided by the number of distance sensors (11) or red receptors (24) is 0.083± 0.041 for the distance

sensors and 0.166± 0.031 for the receptors (where ± indicates the standard deviation). Mean and stan-

dard deviation clearly show that the fingerprint (extracted by means of correlation analysis) is stable

across multiple experimental runs. Similarly, in the “tracking” behavioral state, the average correlation

is 0.097± 0.012 (for the distance sensors) and 0.309± 0.042 (for the red receptors). These results hold

also for the mutual information.

Although correlation and mutual information provide appropriate statistical measures for finger-

printing interaction, they differ in at least one important aspect. Correlation can be used to identify

fingerprints of robot-environment interaction only if the sensory activations between different sensors

happen to be temporally contiguous. We hypothesize that temporal contiguity and stability in the raw


sensory data is the result of coordinated motor activity (exploration strategy). In contrast to correlation,

mutual information reveals dependencies between the sensory stimulations that correlation cannot capture. Our analyses demonstrate that even if the sensory channels are affected by additive white noise, a properly coordinated sensory-motor interaction can indeed lead to stable fingerprints. Sensory-motor coordination generates sensory data with "high information content": the entropy of the sensory data lies between the minimum (0 bits) and the maximum entropy bound (5 bits). High entropy may be the consequence of "complex" robot behavior, since high entropy values correspond to more uncertainty, and therefore to more interesting behaviors. This would explain the high entropy values of the image sensors in the behavioral states "tracking" and "circling" as compared to the behavioral state "exploring." The mutual information gives the amount of information shared by different sensors; sensors that are coordinated with the motors in a particular behavioral state exhibit high mutual information. We conclude that the information shared by sensors and motors provides a fingerprint for the corresponding behavior.

Chapter 10

Summary and Conclusion

This chapter wraps up the entire story by (a) giving a quick rundown of the individual chapters, (b) pointing out their scientific contributions, and (c) showing how each chapter relates to a particular design principle.

10.1 Summary

This thesis explored a novel area of research residing at the interface of robotics, embodied artificial

intelligence, and developmental sciences called developmental robotics. It exemplified the general

philosophy of action of developmental robotics by means of a series of case-studies in which the re-

ciprocal and dynamic interaction of control structure, body, and environment was explored, quantified,

and purposively exploited. The core idea was the realization that embedding such coupling in a devel-

opmental framework could favor the emergence of stable behavioral patterns, and provide the system

with adaptivity and robustness against changes of body and environment. Finally, based on those

case-studies and on previous work by Pfeifer (1996) – see also (Pfeifer and Scheier, 1999; Pfeifer and

Glatzeder, 2004) – a set of computational, and integrative design principles for developmental systems

was abstracted:

• The principle of cheap design

• The principle of ecological balance

• The value principle

• The principle of design for emergence

• The time scale integration principle


• The starting simple principle

• The principle of exploratory activity

• The principle of information self-structuring

• The principle of social interaction

Figure 10.1: Seven chapters, seven case-studies. The labels denote one or two design principle(s) the case-study intends to address. The numbers indicate the chapter. The picture is the same as in Chapter 1.

Every chapter of this thesis gravitates (in a way or another) around one or more of those principles

(see Fig. 10.1). Additional dependencies are given in the introductory chapter.

• Chapter 2 exposed the main reasons and key motivations behind the convergence of robotics,

artificial intelligence, and developmental sciences. Developmental robotics was defined as a

synthetic and two-pronged methodology that on one side instantiates and investigates models


originating from developmental psychology and developmental neuroscience, and on the other

exploits insights gained from studies on ontogenetic development to design and construct better

robotic systems (examples of such a methodology were given in the subsequent chapters). By pre-

senting some aspects (facets) of developmental sciences that are of interest for developmental

robotics, and giving a general overview of the field, the chapter also attempted to show how

insights from all involved areas can be combined en route to a better understanding of adaptive

systems. A further goal of this chapter was to offer a new perspective on issues dear to develop-

mental robotics, and to point out areas on which research could be focused. All design principles

were implicitly addressed and discussed.

• The basic assumption of Chapter 3 was that the robust and adaptive behavior exhibited by natu-

ral organisms is often the product of a complex interaction between various plastic mechanisms

acting at multiple time scales. The chapter reported on experiments conducted with a small-sized

humanoid robot that learned to swing like a pendulum, and whose joints were controlled by a

set of Matsuoka neural oscillators. The study illustrated how the exploration of neural plastic-

ity, body growth, and entrainment to physical dynamics – where each mechanism has a specific

time scale – led to a more efficient exploration of the sensory and motor space, and eventually

to a more adaptive behavior. Thus, it clearly addressed the time scale integration principle.

The study also showed how an initial reduction of the number of mechanical degrees of freedom

guarantees – in the absence of strong environmental perturbations – a more efficient value-driven

exploration of the sensory-motor space. This is an instantiation of the principle of exploratory

activity, and of the value principle. In addition, the chapter reported on a comparative analysis

between the outright use of all degrees of freedom (left and right hip and knee), and the pro-

gressive involvement of all degrees of freedom by using a developmental mechanism of initial

freezing and freeing, such as hypothesized by Bernstein (1967). We observed that a freezing of

the peripheral degree of freedom (knee) led to an increase of the range of neural control param-

eters associated with a stable oscillatory behavior. This result can be seen as positive evidence

for what is asserted by the starting simple principle.

• Chapter 4 revisited the study presented in Chapter 3 by introducing a coupling (a nonlinear

spring) between environment and system. Under otherwise unchanged experimental conditions

(same robot, same task), it brought forward evidence that a single phase of freezing and subse-

quent freeing of degrees of freedom is not sufficient to achieve optimal performance, and instead,

an alternate freezing and freeing of degrees of freedom is required. The interest of this result was

two-fold: (a) it confirmed the recent observation by Newell and Vaillancourt (2001) that Bern-

stein’s framework may be too narrow to account for real data, and (b) it suggested that perturba-

tions which push the system outside its postural stability, or an increase of the task complexity


might be the mechanisms that trigger alternate freezing and freeing of degrees of freedom. By

addressing similar issues to Chapter 3, this chapter relates to the very same principles: the time

scale integration principle, value principle, the principle of exploratory activity, and the

starting simple principle.

• Chapter 5 documented a study that was inspired by a longitudinal experiment performed

by Goldfield et al. (1993), in which six 8-months old infants strapped in a Jolly Jumper (a har-

ness attached to a spring) were examined while they learned to bounce. Goldfield and colleagues

advanced the hypothesis that the infants’ spontaneous motor activity could be decomposed in an

assembly and in a tuning phase. Assembly is a process of self-organization, which establishes a

coupling among the components of the neural and the musculo-skeletal system. Its outcome is

a task-specific movement whose parameters are subsequently explored, and tuned to particular

task conditions. In this chapter, we described and discussed a set of preliminary experiments,

which were performed with a bouncing humanoid robot, and which were aimed at instantiating a

few computational principles hypothesized to underlie the development of motor skills. Our ex-

periments showed that a suitable choice of the coupling constants between hip, knee, and ankle

joints, as well as of the strength of the sensory feedback, induces a reduction of movement vari-

ability, and leads to an increase in bouncing amplitude and movement stability. This result was

attributed to the synergy between neural and body-environment dynamics, and to their mutual

entrainment. It follows that this chapter substantiates the principle of design for emergence.

Moreover, although the parameter exploration was performed manually, the chapter also relates

to the principle of exploratory activity.

• Chapter 6 documented a value-based stochastic exploration scheme used to explore the param-

eter space associated with a neuro-musculoskeletal system. The scheme combined random and

unbiased Monte Carlo exploration, and a gradual, value-driven trade-off between exploration and

exploitation (controlled by Simulated Annealing). Despite its simplicity, the method’s likelihood

to get stuck in local minima of the parameter space was low, and its convergence was sufficiently

rapid. The scheme was first tested in simulation, and then applied to the online self-calibration of

a set of linear proportional-integral-derivative (PID) controllers of a robot head, and to the

exploration of the neural parameters associated with the control architectures of an oscillating and

a bouncing robot (not discussed). The chapter also provided additional support for the principle

of exploratory activity and the value principle, and motivated, from a developmental point of

view, the need to endow robots with exploratory skills. Indeed, many studies of motor learning

indicate that the acquisition of new motor skills (in healthy infants and in adults) is preceded

by a seemingly random, exploratory phase during which possible movements are explored, se-

lected, and tuned, and the ability to predict the sensory consequences of those movements is


acquired. The stochastic exploration scheme may prove to be an adequate first step for modeling

such exploratory activity.
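
The exploration-exploitation logic of the scheme can be sketched as follows, under the assumption of a one-dimensional toy value landscape (the actual experiments explored PID and neural parameters): random, unbiased proposals are accepted with a Metropolis rule, and a geometrically decaying temperature gradually shifts the search from exploration toward exploitation.

```python
import math
import random

def anneal(value, start, steps=5000, t0=2.0, cooling=0.999, step=0.5, seed=0):
    """Minimize `value` by random exploration with a Metropolis acceptance
    rule; the temperature decay shifts exploration toward exploitation."""
    rng = random.Random(seed)
    x, fx = start, value(start)
    best_x, best_f = x, fx
    t = t0
    for _ in range(steps):
        cand = x + rng.uniform(-step, step)      # random, unbiased proposal
        fc = value(cand)
        # accept improvements; accept worsenings with probability exp(-df/t)
        if fc < fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling                             # annealing schedule
    return best_x, best_f

# multimodal toy "value" landscape; global minimum lies near x = -0.52
best_x, best_f = anneal(lambda x: math.sin(3 * x) + 0.1 * x * x, start=3.0)
```

Starting from a local basin at x = 3, the hot phase lets the search hop over ridges of the landscape, while the cold phase refines whichever basin remains, so the scheme rarely ends trapped in a shallow minimum.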

• Chapter 7 presented initial quantitative analyses of sensory data showing how simple sensory-

motor functions like gaze direction and foveation can generate informational structure (e.g., high

mutual information) in the visual channel of a robot. The main objective was to get a better in-

tuition of the sensory data which constitute, in a sense, the “raw” (unprocessed) material that the

neural system has to cope with. As an example of analysis, information theoretic measures such

as the Shannon entropy and mutual information were employed. The results showed that embod-

ied action/interaction can indeed induce statistical dependencies and informational structure in

and among sensory channels. Such evidence clearly confirms what is asserted by the principle of

information self-structuring. A plausible assumption derived from this chapter – further cor-

roborated in chapters 8 and 9 – is that the principle of information self-structuring may emerge

as a key element toward understanding learning and development in robots and organisms (see

also Sporns, 2004, for a similar conclusion).
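
The measures themselves are standard; a minimal sketch of their plug-in estimation from discretized sensor streams is given below. The two synthetic channels are hypothetical stand-ins for, e.g., two pixels of a foveating camera: one channel tracks the other under "coordinated" behavior and is independent of it otherwise.

```python
import math
import random
from collections import Counter

def entropy(symbols):
    """Shannon entropy (bits) of a discrete sequence, plug-in estimate."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def mutual_information(xs, ys):
    """MI(X;Y) = H(X) + H(Y) - H(X,Y): statistical dependency in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

rng = random.Random(1)
n = 5000
x = [rng.randrange(4) for _ in range(n)]
# "coordinated" channel: mostly tracks x (e.g., foveation aligns sensors)
y_coord = [xi if rng.random() < 0.9 else rng.randrange(4) for xi in x]
# "uncoordinated" channel: generated independently of x
y_rand = [rng.randrange(4) for _ in range(n)]

mi_coord = mutual_information(x, y_coord)  # high: interaction adds structure
mi_rand = mutual_information(x, y_rand)    # near zero: no shared structure
```

The coordinated channel yields a mutual information well above one bit, while the independent channel yields a value near zero, which is the signature of information self-structuring the chapter measures.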

• Chapter 8 provided additional supporting evidence for the principle of information self-

structuring. Traditionally, categorization has been investigated by employing disembodied cat-

egorization models, in which input patterns are mapped onto category nodes. In contrast to

this mainstream view, a few researchers – including John Dewey and Jean Piaget – have argued

for a more interactive view of categorization, which hypothesizes that perceptual categorization

cannot be decoupled from coordinated motor activity. Chapter 8 embraced the second view by

assuming that by interacting with the environment an agent can structure its sensory input for

the purpose of learning about categories. The core idea is that by employing embodied agents,

it would be possible to compensate for the lack of quantitative evidence to support the hypothe-

sis that sensory-motor coordination is of crucial importance for category learning. The chapter

advanced our understanding by putting forward quantitative evidence confirming the hypothesis

that sensory-motor interaction represents one viable strategy to reduce the dimensionality of the

space of all possible configurations of states that the sensory and motor system can assume. It

is through embodied interaction and a process of exploration that the information generated by

both perceiving and acting becomes correlated. It is thus clear that the results presented in this

chapter support what is asserted by the principle of information self-structuring, and the principle

of exploratory activity.
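
The dimensionality-reduction claim can be made concrete with a deliberately simple count of visited sensorimotor configurations. The tracking toy problem below is an illustrative assumption, not the categorization setup of Chapter 8: under a coordinated policy the motor command is a function of the sensory state, so the joint sensorimotor space collapses onto a small subset.

```python
import random

def visited_configurations(policy, n=3000, seed=2):
    """Count distinct (sensor, motor) configurations visited by a policy."""
    rng = random.Random(seed)
    visited = set()
    for _ in range(n):
        sensor = rng.randrange(-2, 3)           # discretized target offset
        motor = policy(sensor, rng)
        visited.add((sensor, motor))
    return len(visited)

def coordinated(sensor, rng):
    """Move toward the target: motor command is determined by the sensor."""
    return (sensor > 0) - (sensor < 0)

def uncoordinated(sensor, rng):
    """Motor command chosen independently of the sensor state."""
    return rng.choice([-1, 0, 1])

n_coord = visited_configurations(coordinated)    # 5 joint configurations
n_rand = visited_configurations(uncoordinated)   # 15 joint configurations
```

The coordinated agent occupies one third of the joint configurations of the uncoordinated one: perceiving and acting have become correlated, which is exactly the reduction of the sensorimotor space argued for above.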

• Similarly to Chapters 7 and 8, Chapter 9 examined, by means of statistical and information-

theoretic measures, to what extent sensory-motor coordinated activity can generate and struc-

ture information in the sensory channels of a simulated agent interacting with its local envi-


ronment. In this sense, it also exemplifies the importance of the principle of information

self-structuring. The novel contribution of the chapter was that it showed how correlation,

entropy, and mutual information can be employed (a) to segment an observed behavior into

distinct behavioral states, (b) to quantify (“fingerprint”) the agent-environment interaction,

and (c) to analyze the informational relationship between the different components

of the sensory-motor apparatus. The chapter further discussed the hypothesis that a deeper un-

derstanding of the information-theoretic implications of sensory-motor coordination can help us

endow robots with better sensory morphologies, and with better strategies for exploring their

surrounding environment.
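
As a hypothetical illustration of point (a), a windowed entropy profile over a discretized sensor stream can mark the transition between two behavioral regimes. The synthetic stream below (a stereotyped phase followed by an exploratory one) is assumed for demonstration and is not data from the simulated agent of Chapter 9.

```python
import math
import random
from collections import Counter

def window_entropy(seq, width):
    """Shannon entropy (bits) of consecutive non-overlapping windows."""
    out = []
    for start in range(0, len(seq) - width + 1, width):
        counts = Counter(seq[start:start + width])
        out.append(-sum(c / width * math.log2(c / width)
                        for c in counts.values()))
    return out

rng = random.Random(3)
# stereotyped phase: the sensor almost always reads the same few symbols
stereotyped = [rng.choice([0, 0, 0, 1]) for _ in range(500)]
# exploratory phase: the sensor ranges over many symbols
exploratory = [rng.randrange(8) for _ in range(500)]
stream = stereotyped + exploratory

width = 50
profile = window_entropy(stream, width)
# segment: the first window whose entropy exceeds a threshold marks the
# switch between the two behavioral states
boundary = next(i * width for i, h in enumerate(profile) if h > 1.5)
```

The entropy profile stays below one bit during the stereotyped phase and jumps close to three bits during the exploratory phase, so a single threshold recovers the behavioral boundary at sample 500.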

The only principle not directly addressed in this thesis is the principle of social interaction. Indeed,

almost all experimental results presented and discussed in this thesis belong to the third and fourth

primary areas of interest introduced in Chapter 2: (a) “agent-related sensory-motor control”, that is, the

study of the agent’s bodily capabilities, changes of morphology, their effects on motor skill acquisition,

and so on; and (b) “developmental mechanisms and processes.” Although we acknowledge the crucial

importance of social interaction for the emergence and development of cognitive structure in man and

machines (see Chapters 1 and 2), in the context of this thesis we deliberately chose to avoid touching on

socio-historical aspects of development. We justify neglecting such a fundamental aspect by noting that

it helped us keep the number of dependent variables low, and the

dimensionality of the problem under control, so to speak.

10.2 Conclusion

Has this thesis provided definitive answers to the questions posed at the outset of the introductory chap-

ter? Probably not. By employing robots as research vehicles, it has explored, however, some novel

paths which in the long term could lead to such answers. Throughout the thesis, the core speculation

has been that the convergence between developmental sciences, embodied artificial intelligence,

and robotics may represent a prolific route toward understanding the emergence and development of

cognitive, behavioral, and sensory-motor structure in natural and artificial systems. It may help us

not only construct more adaptive machines, but also understand the nature of man better. Indeed, the

results presented in this thesis demonstrate that taking the developmental approach seriously can

indeed cast new light on old themes.

The success of the infant field of developmental robotics, and of the research methodology it

advocates, will ultimately depend on whether it will be possible to crystallize its central assumptions into

a theory. Such a developmental theory of embodied artificial intelligence may be a key step toward

furthering our understanding of intelligence and toward the synthesis of adaptive machines, and truly


autonomous developmental “baby robots.” As the principles for developmental systems abstracted from

the case studies and documented in this thesis show, a theory is on the horizon. Slowly but surely, the

pieces of this complex puzzle are coming together, and a complete picture is beginning to emerge.

Exciting times are ahead of us.

Bibliography

Adolph, K. E., Eppler, M. A., Marin, L., Weise, I. B., and Clearfield, M. W. (2000). Exploration in the service of prospective control. Infant Behavior and Development, 23:441–460.

Adolph, K. E., Vereijken, B., and Denny, M. A. (1998). Learning to crawl. Child Development, 69:1299–1312.

Almassy, N., Edelman, G. M., and Sporns, O. (1998). Behavioral constraints in the development of neuronal properties: A cortical model embedded in a real world device. Cerebral Cortex, 8:346–361.

Anderson, W. (1989). Learning to control an inverted pendulum using neural networks. IEEE Control Systems Magazine, pages 31–36.

Andry, P., Gaussier, P., and Nadel, J. (2002). From visuo-motor development to low-level imitation. In Proc. of the 2nd International Workshop on Epigenetic Robotics, pages 7–15.

Angulo-Kinzler, R. M. (2001). Exploration and selection of intralimb coordination patterns in 3-month-old infants. Journal of Motor Behavior, 33(4):363–376.

Arutyunyan, G. H., Gurfinkel, V. S., and Mirskii, M. L. (1969). Organization of movements on execution by man of an exact postural task. Biophysics, 14:1162–1167.

Asada, M., MacDorman, K. F., Ishiguro, H., and Kuniyoshi, Y. (2001). Cognitive developmental robotics as a new paradigm for the design of humanoid robots. Robotics and Autonomous Systems, 37:185–193.

Ashby, W. R. (1947). Principles of the self-organizing dynamic system. Journal of General Psychology, 37:125.

Aslin, R. N. (1988). Anatomical constraints on oculomotor development: Implications for infant perception. In Perceptual Development in Infancy: The Minnesota Symposia on Child Psychology, volume 20, pages 67–104. Hillsdale, NJ: Erlbaum.

Balkenius, C., Zlatev, J., Kozima, H., Dautenhahn, K., and Breazeal, C., editors (2001). Proc. of First Intl. Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems. Lund University Cognitive Studies, 85.

Ballard, D. (1991). Animate vision. Artificial Intelligence, 48(1):57–86.



Banerjee, P. R., Sibbald, S., and Maze, J. (1990). Quantifying the dynamics of order and organization in biological systems. Journal of Theoretical Biology, 143:91–112.

Baron-Cohen, S. (1995). Mindblindness. Cambridge, MA: MIT Press.

Bates, E. A. and Elman, J. L. (2002). Connectionism and the study of change. In Johnson, M., editor, Brain Development and Cognition: A Reader. Oxford: Blackwell Publishers.

Beer, R. D. (2004). Autopoiesis and cognition in the game of life. Artificial Life, 10(3):309–326.

Beer, R. D., Chiel, H. J., Quinn, R. D., and Ritzmann, R. E. (1998). Biorobotics approaches to the study of motor systems. Current Opinion in Neurobiology, 8:777–782.

Bernstein, N. (1967). The Co-ordination and Regulation of Movements. London: Pergamon.

Bertenthal, B. and Von Hofsten, C. (1998). Eye and trunk control: The foundation for manual development. Neuroscience and Biobehavioral Reviews, 22(4):515–520.

Berthier, N. E., Clifton, R. K., Gullapalli, V., and McCall, D. J. (1996). Visual information and the control of reaching. Journal of Motor Behavior, 28:187–197.

Berthouze, L., Bakker, P., and Kuniyoshi, Y. (1997). Learning of oculo-motor control: a prelude to robotic imitation. In Proc. of IEEE/RSJ Intl. Conf. on Robotics and Intelligent Systems, pages 376–381.

Berthouze, L. and Kuniyoshi, Y. (1998). Emergence and categorization of coordinated visual behavior through embodied interaction. Machine Learning, 31(1-3):187–200.

Berthouze, L., Kuniyoshi, Y., and Pfeifer, R., editors (1999). Proc. of First Intl. Workshop on Emergence and Development of Embodied Cognition. Workshop held in Tsukuba, Japan, unpublished.

Berthouze, L. and Lungarella, M. (2004). Motor skill acquisition under environmental perturbations: on the necessity of alternate freezing and freeing. Adaptive Behavior, 12(1). In press.

Berthouze, L. and Metta, G., editors (2004). Third Intl. Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems. Workshop will take place at the University of Genova, Italy.

Berthouze, L., Shigematsu, Y., and Kuniyoshi, Y. (1998). Dynamic categorization of explorative behaviors for emergence of stable sensorimotor configurations. In Proc. of Fifth Intl. Conf. on Simulation of Adaptive Behavior, pages 67–72.

Berthouze, L. and Ziemke, T., editors (2003). Epigenetic Robotics: Modelling Cognitive Development in Robotic Systems, volume 15 (4).

Bjorklund, E. M. and Green, B. L. (1992). The adaptive nature of cognitive immaturity. American Psychologist, 47:46–54.

Blumberg, B. M. (1996). Old Tricks, New Dogs: Ethology and Interactive Creatures. PhD thesis, Cambridge, MA: The MIT Media Laboratory.


Bornstein, M. H. (1989). Sensitive periods in development: structural characteristics and causal interpretations. Psychological Bulletin, 105(2):179–197.

Braitenberg, V. (1984). Vehicles: Experiments in Synthetic Psychology. Cambridge, MA: MIT Press.

Breazeal, C. L. (2002). Designing Social Robots. Cambridge, MA: MIT Press.

Breazeal, C. L. and Aryananda, L. (2002). Recognition of affective communicative intent in robot-directed speech. Autonomous Robots, 12:83–104.

Breazeal, C. L. and Scassellati, B. (2000). Infant-like social interactions between a robot and a human caretaker. Adaptive Behavior, 8(1):49–74.

Breazeal, C. L. and Scassellati, B. (2002). Robots that imitate humans. Trends in Cognitive Science, 6:481–487.

Brooks, R. A. (1991). Intelligence without representation. Artificial Intelligence, 47:139–160.

Brooks, R. A. (1997). From earwigs to humans. Robotics and Autonomous Systems, 20(2-4):291–304.

Brooks, R. A. (2003). Robot: The Future of Flesh and Machines. London: Penguin Books.

Brooks, R. A., Breazeal, C., Irie, R., Kemp, C. C., Marjanovic, M., Scassellati, B., and Williamson, M. M. (1998). Alternative essences of intelligence. In Proc. of the 15th Natl. Conf. on Artificial Intelligence, pages 961–978. Madison, WI.

Brooks, R. A. and Stein, L. A. (1994). Building brains for bodies. Autonomous Robots, 1(1):7–25.

Bullock, D., Grossberg, S., and Guenther, F. H. (1993). A self-organizing neural model of motor equivalent reaching and tool use by a multijoint arm. Journal of Cognitive Neuroscience, 5(4):408–435.

Bullowa, M. (1979). Before Speech: The Beginning of Interpersonal Communication. Cambridge, London: Cambridge University Press.

Bushnell, E. M. and Boudreau, J. P. (1993). Motor development in the mind: The potential role of motor abilities as a determinant of aspects of perceptual development. Child Development, 64:1005–1021.

Butterworth, G. and Jarrett, B. (1991). What minds have in common in space: spatial mechanisms serving joint visual attention in infancy. British Journal of Developmental Psychology, 9:55–72.

Cerny, V. (1985). Thermodynamic approach to the traveling salesman problem. Journal of Optimization Theory and Applications, 45:41–51.

Cech, D. J. and Martin, S. (2002). Functional Movement Development Across the Life Span. W. B. Saunders Company.

Chomsky, N. (1986). Knowledge of Language: Its Nature, Origin, and Use. New York: Praeger.


Churchland, P. S., Ramachandran, V. S., and Sejnowski, T. J. (1994). A critique of pure vision. Cambridge, MA: MIT Press.

Clark, A. (1997). Being There: Putting Brain, Body and World Together Again. Cambridge, MA: MIT Press.

Clark, A. and Grush, R. (1999). Towards a cognitive robotics. Adaptive Behavior, 7(1):5–16.

Coelho, J., Piater, J., and Grupen, R. (2001). Developing haptic and visual perceptual categories for reaching and grasping with a humanoid robot. Robotics and Autonomous Systems, 37:195–218.

Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory. New York: Wiley.

Dario, P., Laschi, C., and Guglielmelli, E. (1997). Sensors and actuators for "humanoid" robots. Advanced Robotics, 11(6):567–584.

Dautenhahn, K. and Billard, A. (1999). Studying robot social cognition within a developmental psychology framework. In Proc. of Third Intl. Workshop on Advanced Mobile Robots.

Dautenhahn, K. and Nehaniv, C., editors (2002). Imitation in Animals and Artifacts. Cambridge, MA: MIT Press.

De Garis, H., Gers, F., Korkin, M., Agah, A., and Nawa, N. E. (1998). CAM-Brain: ATR's billion neuron artificial brain project: a three year progress report. Journal of Artificial Life and Robotics, 2:56–61.

Dekaban, A. (1959). Neurology of Infancy. Baltimore: Williams and Wilkins.

Demiris, Y. (1999). Robot Imitation Mechanisms in Robots and Humans. PhD thesis, Division of Informatics, University of Edinburgh. Unpublished.

Demiris, Y. and Hayes, G. (2002). Imitation as a dual-route process featuring predictive and learning components: a biologically plausible computational model. In Dautenhahn, K. and Nehaniv, C., editors, Imitation in Animals and Artifacts. Cambridge, MA: MIT Press.

Dennett, D. C. (1998). Brainchildren: A Collection of Essays. Cambridge, MA: MIT Press.

Dewey, J. (1896). The reflex arc concept in psychology. Psychological Review, 3:357–370. Original work published in 1896.

Di Paolo, A. E., editor (2002). Adaptive Behavior: Special issue on "Plastic mechanisms, multiple timescales, and lifetime adaptation", volume 10 (3-4).

Di Pellegrino, G., Fadiga, L., Fogassi, L., and Rizzolatti, G. (1992). Understanding motor events: a neurophysiological study. Experimental Brain Research, 91:176–180.

Diamond, A. (1990). Developmental time course in human infants and infant monkeys, and the neural bases of inhibitory control in reaching. In The Development and Neural Bases of Higher Cognitive Functions, volume 608, pages 637–676. New York Academy of Sciences.


Dickinson, P. S. (2003). Neuromodulation in invertebrate nervous systems. In Arbib, M., editor, MIT Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press.

Dominguez, M. and Jacobs, R. A. (2003). Developmental constraints aid the acquisition of binocular disparity sensitivities. Neural Computation, 15(1):161–182.

Edelman, G. M. (1987). Neural Darwinism: The Theory of Neuronal Group Selection. New York: Basic Books.

Edelman, G. M. and Tononi, G. (2001). Consciousness: How Matter Becomes Imagination. London: Penguin Books.

Eliot, L. (2001). Early Intelligence. London: Penguin Books.

Elliott, T. and Shadbolt, N. R. (2001). Growth and repair: instantiating a biologically inspired model of neural development on the Khepera robot. Robotics and Autonomous Systems, 36:149–169.

Elliott, T. and Shadbolt, N. R. (2003). Developmental robotics: manifesto and application. Philosophical Transactions: Mathematical, Physical and Engineering Sciences, 361:2187–2206.

Elman, J., Sur, M., and Weng, J. J., editors (2002). Proc. of Second Intl. Conf. on Development and Learning. Workshop held at Michigan State University, USA.

Elman, J. L. (1993). Learning and development in neural networks: the importance of starting small. Cognition, 48:71–99.

Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff-Smith, A., Parisi, D., and Plunkett, K. (1996). Rethinking Innateness: A Connectionist Perspective on Development. Cambridge, MA: MIT Press. A Bradford Book.

Fernald, A. (1985). Four-month-old infants prefer to listen to motherese. Infant Behavior and Development, 8:181–195.

Ferrell, C. B. and Kemp, C. C. (1996). An ontogenetic perspective on scaling sensorimotor intelligence. In Embodied Cognition and Action: Papers from the 1996 AAAI Fall Symposium.

Fitts, P. M. (1964). Perceptual-motor skill learning. In Melton, A., editor, Categories of Human Learning, pages 243–285. New York: Academic Press.

FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical Journal, 1:445–466.

Fodor, J. A. (1981). Representations. Brighton, Sussex: Harvester Press.

Fodor, J. A. (1983). The Modularity of Mind. Cambridge, MA: MIT Press.

Fong, T., Nourbakhsh, I., and Dautenhahn, K. (2003). A survey of socially interactive robots. Robotics and Autonomous Systems, 42:143–166.


Forssberg, H. (1999). Neural control of human motor development. Current Opinion in Neurobiology, 9:676–682.

Friston, K. J., Tononi, G., Reeke, G. N., Sporns, O., and Edelman, G. M. (1994). Value-dependent selection in the brain: simulation in a synthetic neural model. Neuroscience, 59(2):229–243.

Gallese, V., Fadiga, L., Fogassi, L., and Rizzolatti, G. (1996). Action recognition in the premotor cortex. Brain, 119:593–609.

Gell-Mann, M. (1995). What is complexity? Complexity, 1(1):16–19.

Gesell, A. (1946). The ontogenesis of infant behavior. In Carmichael, L., editor, Manual of Child Psychology, pages 295–331.

Gibson, E. J. (1988). Exploratory behavior in the development of perceiving, acting, and the acquiring of knowledge. Annual Review of Psychology, 39:1–41.

Gibson, J. J. (1977). The theory of affordances. In Shaw, R. and Brandsford, J., editors, Perceiving, Acting, and Knowing: Toward an Ecological Psychology, pages 62–82.

Goldfield, E. C. (1995). Emergent Forms: Origins and Early Development of Human Action and Perception. New York: Oxford University Press.

Goldfield, E. C., Kay, B. A., and Warren, W. H. (1993). Infant bouncing: the assembly and tuning of an action system. Child Development, 64:1128–1142.

Gomez, G., Lungarella, M., Eggenberger-Hotz, P., Matsushita, K., and Pfeifer, R. (2004). Simulating development in a real robot: on the concurrent increase of sensory, motor, and neural complexity. In Proc. of the Fourth Intl. Workshop on Epigenetic Robotics. To appear.

Gottlieb, G. (1991). Experiential canalization of behavioral development: Theory. Developmental Psychology, 27:4–13.

Grillner, S. (1985). Neurobiological bases of rhythmic motor acts in vertebrates. Science, 228:143–149.

Grush, R. (2004). The emulation theory of representation: motor control, imagery, and perception. Behavioral and Brain Sciences. To appear.

Hadders-Algra, M. (2002). Variability in infant motor behavior: A hallmark of the healthy nervous system. Infant Behavior and Development, 25:433–451.

Hadders-Algra, M., Brogren, E., and Forssberg, H. (1996). Ontogeny of postural adjustments during sitting in infancy: Variation, selection and modulation. Journal of Physiology, 493:273–288.

Haehl, V., Vardaxis, V., and Ulrich, B. (2000). Learning to cruise: Bernstein's theory applied to skill acquisition during infancy. Human Movement Science, 19:685–715.


Hafner, V. V., Fend, M., Lungarella, M., Pfeifer, R., Konig, P., and Kording, K. P. (2003). Optimal coding for naturally occurring whisker deflections. In Proc. of the Joint Intl. Conf. on Neural Networks and Neural Information Processing, pages 805–812. Berlin: Springer-Verlag. LNCS 2714.

Hainline, L. (1998). How the visual system develops: normal and abnormal development. In Slater, A., editor, Perceptual Development: Visual, Auditory, and Speech Perception in Infancy, pages 5–50. Hove: Psychology Press, Ltd.

Hajek, B. (1988). Cooling schedules for optimal annealing. Mathematics of Operations Research, 13:311–329.

Haken, H. (1983). Synergetics: An Introduction. Berlin: Springer-Verlag.

Haken, H. (1996). Principles of Brain Functioning: A Synergetic Approach to Brain Activity, Behavior, and Cognition. Berlin: Springer-Verlag.

Halliday, M. (1975). Learning How To Mean: Explorations in the Development of Language. Cambridge, MA: MIT Press.

Hara, F. and Pfeifer, R., editors (2003). Morpho-functional Machines: The New Species (Designing Embodied Intelligence). Berlin: Springer-Verlag.

Harman, K. L., Humphrey, G. K., and Goodale, M. A. (1999). Active manual control of object views facilitates visual recognition. Current Biology, 9:1315–1318.

Harris, C. (1998). On the optimal control of behavior: a stochastic perspective. Journal of Neuroscience Methods, 83:73–88.

Harris, P. L. (1983). Infant cognition. In Haith, M. and Campos, J., editors, Handbook of Child Psychology, Vol. 2: Infancy and Developmental Psychobiology, pages 689–782. New York: Wiley.

Hasselmo, M., Wyble, B., and Fransen, E. (2003). Neuromodulation in mammalian nervous systems. In Arbib, M., editor, MIT Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press.

Hatsopoulos, N. G. (1996). Coupling the neural and physical dynamics in rhythmic movements. Neural Computation, 8:567–581.

Hendriks-Jensen, H. (1996). Catching Ourselves in the Act. Cambridge, MA: MIT Press. A Bradford Book.

Howell, M. N. and Best, M. C. (2000). On-line PID tuning for engine idle-speed control using continuous action reinforcement learning automata. Control Engineering Practice, 8:147–154.

Ijspeert, A. (2003). Vertebrate locomotion. In Arbib, M., editor, MIT Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press.

Inaba, M., Nagasaka, K., and Kanehiro, F. (1996). Real-time vision-based control of swing motion by a human-form robot using the remote-brained approach. In Proc. of the 1996 IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, pages 15–22.


Ishiguro, A., Ishimaru, K., Hayakawa, K., and Kawakatsu, T. (2003). Toward a "well-balanced" design: a robotic case study. In Proc. of the Second Intl. Symp. on Adaptive Motion in Animals and Machines, number ThP-I-3. Electronic proceedings.

Itti, L., Koch, C., and Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1254–1259.

Iverson, J. M. and Thelen, E. (1999). Hand, mouth and brain. Journal of Consciousness Studies, 6(11-12):19–40.

Jensen, J. L., Thelen, E., Ulrich, B. D., and Zernicke, R. F. (1995). Adaptive dynamics of the leg movement patterns in human infants: Age-related differences in limb control. Journal of Motor Behavior, 27:366–374.

Johnson, M. H. (1997). Developmental Cognitive Neuroscience. Oxford, UK: Blackwell Publishers Ltd.

Kadar, E. E., Maxwell, J. P., Stins, J., and Costall, A. (2002). Drifting towards a diffuse control model of exploratory motor learning: a comparison of global and within-trial performance measures. Biological Cybernetics, 87:1–9.

Kato, N., Artola, A., and Singer, W. (1991). Developmental changes in the susceptibility to long-term potentiation of neurons in rat visual cortex slices. Developmental Brain Research, 60:53–60.

Kay, B. A. (1988). The dimensionality of movement trajectories and the degrees of freedom problem: A tutorial. Human Movement Science, 7:343–364.

Keil, F. C. (1981). Constraints on knowledge and cognitive development. Psychological Review, 88:197–227.

Kellman, P. J. and Arterberry, M. E. (1998). The Cradle of Knowledge. Cambridge, MA: MIT Press. A Bradford Book.

Kelso, S. J. (1995). Dynamic Patterns. Cambridge, MA: MIT Press. A Bradford Book.

Kelso, S. J. and Kay, B. A. (1987). Information and control: a macroscopic analysis of perception-action coupling. In Heuer, H. and Sanders, A., editors, Perspectives on Perception and Action, pages 3–32.

Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220:671–680.

Kirkpatrick, S. and Gregory, B. S. (1995). Simulated annealing. In Arbib, M., editor, The Handbook of Brain Theory and Neural Networks, pages 876–879.

Ko, Y., Challis, J., and Newell, K. (2003). Learning to coordinate redundant degrees of freedom in a dynamic balance task. Human Movement Science, 22:47–66.


Konczak, J., Borutta, M., and Dichgans, J. (1995). Development of goal-directed reaching in infants: hand trajectory formation and joint force control. Experimental Brain Research, 106:156–168.

Korner, A. F. and Kraemer, H. C. (1972). Individual differences in spontaneous oral behavior in neonates. In Bosma, J., editor, Proc. of the Third Symp. on Oral Sensation and Perception, pages 335–346.

Kozima, H., Nakagawa, C., and Yano, H. (2002). Emergence of imitation mediated by objects. In Proc. of the Second Intl. Workshop on Epigenetic Robotics, pages 59–61.

Kozima, H. and Yano, H. (2001). A robot that learns to communicate with human caregivers. In Proc. of the First Intl. Workshop on Epigenetic Robotics.

Krichmar, J. L. and Edelman, G. M. (2002). Machine psychology: autonomous behavior, perceptual categorization and conditioning in a brain-based device. Cerebral Cortex, 12:818–830.

Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99:22–44.

Kuhl, P. K. (2000). Language, mind, and brain: experience alters perception. In Gazzaniga, M., editor, The New Cognitive Neurosciences, pages 99–115.

Kuniyoshi, Y., Yorozu, Y., Inaba, M., and Inoue, H. (2003). From visuo-motor self learning to early imitation – a neural architecture for humanoid learning. In Proc. of the 2003 Intl. Conf. on Robotics and Automation, pages 3132–3139.

Lakoff, G. (1987). Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. Chicago, Illinois: University of Chicago Press.

Lakoff, G. and Johnson, M. (1999). Philosophy in the Flesh: The Embodied Mind and Its Challenge to Western Thought. New York: Basic Books.

Lambrinos, D., Maris, M., Kobayashi, H., Labhart, T., Pfeifer, R., and Wehner, R. (1997). An autonomous agent navigating with a polarized light compass. Adaptive Behavior, 6:175–206.

Lederman, S. J. and Klatzky, R. L. (1990). Haptic exploration and object representation. In Goodale, M., editor, Vision and Action: The Control of Grasping, pages 98–109. New Jersey: Ablex.

Lee, T. S. and Yu, S. X. (1999). An information-theoretic framework for understanding saccadic behaviors. In Solla, S. and Leen, T., editors, Proc. of the First Intl. Conf. on Neural Information Processing. Cambridge, MA: MIT Press.

Lichtensteiger, L. and Pfeifer, R. (2002). An optimal sensor morphology improves adaptability of neural network controllers. In Dorronsoro, J., editor, Proc. of the Sixth Intl. Conf. on Neural Networks, pages 850–855. Berlin, Heidelberg: Springer-Verlag.

Lindblom, J. and Ziemke, T. (2003). Social situatedness of natural and artificial intelligence: Vygotsky and beyond. Adaptive Behavior, 11(2):79–96.


Love, B. C. and Medin, D. L. (1998). SUSTAIN: A model of human category learning. In Proc. of the 15th Natl. Conf. on Artificial Intelligence (AAAI'98), pages 671–676.

Luisi, P. L. (2003). Autopoiesis: A review and a reappraisal. Naturwissenschaften, 90(2):49–59.

Lungarella, M. and Berthouze, L. (2002a). Adaptivity through physical immaturity. In Proc. of the Second Intl. Workshop on Epigenetic Robotics, pages 79–86.

Lungarella, M. and Berthouze, L. (2002b). Adaptivity via alternate freeing and freezing of degrees of freedom. In Proc. of the 9th Intl. Conf. on Neural Information Processing, pages 492–497.

Lungarella, M. and Berthouze, L. (2002c). On the interplay between morphological, neural and environmental dynamics: a robotic case-study. Adaptive Behavior, 10(3-4):223–241.

Lungarella, M. and Berthouze, L. (2003). Learning to bounce: first lessons from a bouncing robot. In Proc. of the Second Intl. Symp. on Adaptive Motion in Animals and Machines, number ThP-II-4. Electronic proceedings.

Lungarella, M. and Berthouze, L. (2004). Robot bouncing: on the interaction between body and environmental dynamics. In Iida, F., Pfeifer, R., Steels, L., and Kuniyoshi, Y., editors, Embodied Artificial Intelligence. Berlin: Springer-Verlag. LNCS.

Lungarella, M., Hafner, V. V., Pfeifer, R., and Yokoi, H. (2002a). An artificial whisker sensor for robotics. In Proc. of the 15th Intl. Conf. on Intelligent Robots and Systems, pages 2931–2936.

Lungarella, M., Hafner, V. V., Pfeifer, R., and Yokoi, H. (2002b). Whisking: an unexplored sensory modality. In Proc. of the 7th Intl. Conf. on the Simulation of Adaptive Behavior, pages 58–59.

Lungarella, M. and Metta, G. (2003). Beyond gazing, pointing, and reaching: a survey of developmental robotics. In Proc. of the Third Intl. Workshop on Epigenetic Robotics, pages 81–89.

Lungarella, M., Metta, G., Pfeifer, R., and Sandini, G. (2003). Developmental robotics: a survey. Connection Science, 15(4):151–190.

Lungarella, M. and Pfeifer, R. (2001). Robots as cognitive tools: An information-theoretic analysis of sensory-motor data. In Proc. of the Second IEEE-RAS Intl. Conf. on Humanoid Robotics, pages 245–252.

Lungarella, M. and Sporns, O. (2004). Methods for quantifying the informational structure of sensory and motor data. Neuroinformatics. In preparation.

Manzotti, R. (2000). Intentional Robots: The Design of a Goal-seeking Environment-driven Agent. PhD thesis, University of Genova, Genova, Italy.

Marder, E. and Thirumalai, V. (2002). Cellular, synaptic and network effects of neuromodulation. Neural Networks, 15(4-6):479–493.

Marjanovic, M., Scassellati, B., and Williamson, M. (1996). Self-taught visually-guided pointing for a humanoid robot. In From Animals to Animats 4: Proc. of the 4th Intl. Conf. on Simulation of Adaptive Behavior, pages 35–44. Cambridge, MA: MIT Press.

BIBLIOGRAPHY 189

Matsuoka, K. (1985). Sustained oscillations generated by mutually inhibiting neurons with adaptation. Biological Cybernetics, 52:367–376.

Maturana, H. R. and Varela, F. J. (1998). The tree of knowledge: the biological roots of human understanding. Boston, London: Shambhala Publications Inc.

McDonald, P. V., Emmerik, R. E., and Newell, K. M. (1989). The effects of practice on limb kinematics in a throwing task. Journal of Motor Behavior, 21:245–264.

McGraw, M. B. (1940). Neuromuscular development of the human infant as exemplified by the achievement of erect locomotion. Journal of Pediatrics, 17:747–771.

McGraw, M. B. (1945). Neuromuscular maturation of the human infant. New York: Hafner.

Meltzoff, A. and Prinz, W. (2002). The Imitative Mind: Development, Evolution and Brain Bases. Cambridge, MA: MIT Press.

Meltzoff, A. N. and Borton, R. W. (1979). Intermodal matching in human neonates. Nature, 282:403–404.

Meltzoff, A. N. and Moore, M. K. (1977). Imitation of facial and manual gestures by human neonates.Science, 198:74–78.

Meltzoff, A. N. and Moore, M. K. (1997). Explaining facial imitation: A theoretical model. Early Development and Parenting, 6:179–192.

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21:1087–1092.

Metta, G. (2000). Babybot: A Study on Sensorimotor Development. Unpublished PhD Thesis, University of Genova, Genova, Italy.

Metta, G. and Fitzpatrick, P. (2003). Early integration of vision and manipulation. Adaptive Behavior, 11(2):109–128.

Metta, G., Sandini, G., and Konczak, J. (1999). A developmental approach to visually-guided reaching in artificial systems. Neural Networks, 12:1413–1427.

Metta, G., Sandini, G., Natale, L., and Panerai, F. (2001). Development and robotics. In Proc. of the First IEEE-RAS Intl. Conf. on Humanoid Robots, pages 33–42.

Miall, R. C., Weir, D. J., Wolpert, D. M., and Stein, J. F. (1993). Is the cerebellum a Smith predictor? Journal of Motor Behavior, 25(3):203–216.

Mitra, S., Amazeen, P. G., and Turvey, M. T. (1998). Intermediate motor learning as decreasing active (dynamical) degrees of freedom. Human Movement Science, 17:17–65.

Miyakoshi, S., Yamakita, M., and Furuta, K. (1994). Juggling control using neural oscillators. In Proc. of the 1994 IEEE/RSJ Intl. Conf. on Robots and Systems, volume 2, pages 1186–1193.

Murata, S., Yoshida, E., Kurokawa, H., Tomita, K., and Kokaji, S. (2001). Self-repairing mechanical systems. Autonomous Robots, 10:7–21.

Mussa-Ivaldi, F. (1999). Modular features of motor control and learning. Current Opinion in Neurobiology, 9:713–717.

Nadel, J. (2003). Early Social Cognition. Intellectica. in press.

Nadel, J. and Butterworth, G., editors (1999). Imitation in infancy. Cambridge, MA: Cambridge University Press.

Nagai, Y., Asada, M., and Hosoda, K. (2002). Developmental learning model for joint attention. In Proc. of 15th Intl. Conf. on Intelligent Robots and Systems (IROS 2002), pages 932–937.

Natale, L., Metta, G., and Sandini, G. (2002). Development of auditory-evoked reflexes: visuo-acoustic cues integration in a binocular head. Robotics and Autonomous Systems, 39(2):87–106.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Newell, A. and Simon, H. (1976). Computer science as empirical study: Symbols and search. Communications of the ACM, 19:113–126.

Newell, K. and Vaillancourt, D. E. (2001). Dimensional change in motor learning. Human Movement Science, 20:695–715.

Newell, K. M. and van Emmerik, R. E. (1989). The acquisition of coordination: Preliminary analysis of learning to write. Human Movement Science, 8:17–32.

Newport, E. L. (1990). Maturational constraints on language learning. Cognitive Science, 14:11–28.

Nolfi, S. and Floreano, D. (2000). Evolutionary Robotics: The Biology, Intelligence, and Technology of Self-organizing Machines. Cambridge, MA: MIT Press.

Ohgane, K., Ei, S., Kazutoshi, S., and Ohtsuki, T. (2004). Emergence of adaptability to time delay in bipedal locomotion. Biological Cybernetics, 90(2):125–132.

O’Leary, D. D., Schlagger, B. L., and Tuttle, R. (1994). Specification of neocortical areas and thalamocortical connections. Annual Review of Neuroscience, 17:419–439.

Panerai, F., Metta, G., and Sandini, G. (2002). Learning visual stabilization reflexes in robots with moving eyes. Neurocomputing, 48(1-4):323–337.

Papoulis, A. (1991). Probability, Random Variables, and Stochastic Processes. New York: McGraw-Hill.

Pelaez-Nogueras, M., Gewirtz, J., and Markham, M. (1996). Infant vocalizations are conditioned both by maternal imitation and motherese speech. Infant Behavior and Development, 19:670.

Peper, L., Bootsma, R., Mestre, D., and Bakker, F. (1994). Catching balls: How to get the hand to the right place at the right time. Journal of Experimental Psychology: Human Perception and Performance, 20:591–612.

Pfeifer, R. (1996). Building ‘fungus eaters’: Design principles of autonomous agents. In Maes, P., Mataric, M., Meyer, J.-A., Pollack, J., and Wilson, S., editors, From animals to animats 4: Proc. of the Fourth Intl. Conf. on Simulation of Adaptive Behavior, pages 3–12. Cambridge, MA: MIT Press. A Bradford Book.

Pfeifer, R. (2000). On the role of morphology and materials in adaptive behavior. In From Animals to Animats 6: Proc. of the Sixth Intl. Conf. on Simulation of Adaptive Behavior, pages 23–32.

Pfeifer, R. (2002). Robots as cognitive tools. Intl. Journal of Cognition and Technology, 1(1):125–143.

Pfeifer, R. and Glatzeder, B. (2004). How the Body Shapes the Way We Think. Cambridge, MA: MIT Press. forthcoming.

Pfeifer, R. and Lungarella, M., editors (2001). Proc. of Second Intl. Workshop on Emergence and Development of Embodied Cognition. Workshop held in Beijing, PRC, unpublished.

Pfeifer, R. and Scheier, C. (1994). From perception to action: the right direction? In Gaussier, P. and Nicoud, J.-D., editors, From Perception to Action, pages 1–11. IEEE Computer Society Press.

Pfeifer, R. and Scheier, C. (1997). Sensory-motor coordination: The metaphor and beyond. Robotics and Autonomous Systems, 20:157–178.

Pfeifer, R. and Scheier, C. (1999). Understanding Intelligence. Cambridge, MA: MIT Press.

Piaget, J. (1945). La formation du symbole chez l’enfant. Genève: Delachaux et Niestlé Éditions.

Piaget, J. (1953). The Origins of Intelligence. New York: Routledge.

Picard, R. (1997). Affective Computing. Cambridge, MA: MIT Press.

Piek, J. P. (2001). Is a quantitative approach useful in the comparison of spontaneous movements in full-term and preterm infants? Human Movement Science, 20:717–736.

Piek, J. P. (2002). The role of variability in early development. Infant Behavior and Development, 156:1–14.

Piek, J. P. and Carman, R. (1994). Developmental profiles of spontaneous movements in infants. Early Human Development, 39:109–126.

Prechtl, H. F. (1997). The importance of fetal movements. In Connolly, K. and Forssberg, H., editors, Neurophysiology and Neuropsychology of Motor Development, pages 42–53. Mac Keith Press.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1995). Simulated annealing methods. In Numerical Recipes in C, pages 444–455. Cambridge, MA: Cambridge University Press, 3rd edition.

Prince, C. G., Berthouze, L., Kozima, H., Bullock, D., Stojanov, G., and Balkenius, C., editors (2003). Proc. of Third Intl. Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems. Lund University Cognitive Studies, 101.

Prince, C. G. and Demiris, Y., editors (2003). Adaptive Behavior: Special issue on ‘Epigenetic Robotics’, volume 11 (2).

Prince, C. G., Demiris, Y., Marom, Y., Kozima, H., and Balkenius, C., editors (2002). Proc. of Second Intl. Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems. Lund University Cognitive Studies, 94.

Purves, D. (1994). Neural Activity and the Growth of the Brain. Cambridge, MA: Cambridge University Press.

Pylyshyn, Z. W. (1984). Computation and Cognition: Toward a Foundation for Cognitive Science. Cambridge, MA: MIT Press.

Reeke, G. N., Sporns, O., and Edelman, G. M. (1990). Synthetic neural modeling: the ‘Darwin’ series of recognition automata. Proc. IEEE, 78:1498–1530.

Regan, D. (1997). Visual factors in hitting and catching. Journal of Sports Sciences, 15:533–558.

Rizzolatti, G. and Arbib, M. (1998). Language within our grasp. Trends in Neurosciences, 21(5):188–194.

Robinson, S. R. and Smotherman, W. P. (1992). Fundamental motor patterns of the mammalian fetus. Journal of Neurobiology, 23:1574–1600.

Rochat, P. (1987). Mouthing and grasping in neonates: Evidence for the early detection of what hard and soft substances afford for action. Infant Behavior and Development, 25:871–884.

Rochat, P. (1989). Object manipulation and exploration in 2 to 5-month-old infants. Developmental Psychology, 25:871–884.

Rochat, P. and Striano, T. (2000). Perceived self in infancy. Infant Behavior and Development, 23:513–530.

Rojdestvenski, I., Cottam, M., Park, Y., and Oquist, G. (1999). Robustness and time-scale hierarchy in biological systems. BioSystems, 50:71–82.

Rosen, R. (1991). Life Itself: A Comprehensive Inquiry into the Nature, Origin, and Fabrication of Life. New York: Columbia University Press.

Rus, D. and Chirikjian, G., editors (2001). Autonomous Robots: Special issue on ‘Self-Reconfigurable Robots’, volume 10 (1).

Rutkowska, J. C. (1994). Scaling up sensorimotor systems: constraints from human infancy. Adaptive Behavior, 2(4):349–373.

Rutkowska, J. C. (1995). Can development be designed? What we may learn from the Cog project. In Advances in Artificial Life: Proc. of the Third European Conf. on Artificial Life, pages 383–395. Berlin: Springer-Verlag.

Saito, F., Fukuda, T., and Arai, F. (1994). Swing and locomotion control of a two-link brachiation robot. IEEE Control Systems, 14:5–12.

Sandini, G. (1997). Artificial systems and neuroscience. In Proceedings of the Otto and Martha Fischbeck Seminare on Active Vision. Berlin, Germany: Wissenschaftskolleg zu Berlin.

Sandini, G., Metta, G., and Konczak, J. (1997). Human sensorimotor development and artificial systems. In Proc. of the Intl. Symp. on Artificial Intelligence, Robotics, and Intellectual Human Activity Support for Nuclear Applications, pages 303–314.

Scassellati, B. (1998). Building behaviors developmentally: a new formalism. In Proc. of the 1998 AAAI Spring Symp. on Integrating Robotics Research.

Scassellati, B. (2001). Foundations for a theory of mind for a humanoid robot. PhD thesis, MIT Department of Electrical Engineering and Computer Science. Unpublished.

Schaal, S. (1999). Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences, 3(6):233–242.

Schaal, S. and Sternad, D. (1998). Programmable pattern generators. In Intl. Conf. on Computational Intelligence in Neuroscience, pages 48–51.

Scheier, C. and Lambrinos, D. (1996). Categorization in a real-world agent using haptic exploration and active perception. In Proc. of the Fourth Intl. Conf. on Simulation of Adaptive Behavior, pages 65–75. Cambridge, MA: MIT Press.

Scheier, C. and Pfeifer, R. (1997). Information theoretic implications of embodiment for neural network learning. In Intl. Conf. on Artificial Neural Networks, pages 691–696.

Scheier, C., Pfeifer, R., and Kuniyoshi, Y. (1998). Embedded neural networks: exploiting constraints.Neural Networks, 11(7-8):1551–1569.

Schneider, K. and Zernicke, R. (1992). Mass, center of mass, and moment of inertia estimates for infant limb segments. Journal of Biomechanics, 25:145–148.

Schneider, K., Zernicke, R., Ulrich, B., Jensen, J., and Thelen, E. (1990). Understanding movement control in infants through the analysis of limb intersegmental dynamics. Journal of Motor Behavior, 22:493–520.

Schultz, W. (1998). Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80:1–27.

Shannon, C. (1948). A mathematical theory of communication. Bell Syst. Tech. Journal, 27:379–423.

Sharkey, N. E. (2003). Biologically inspired robotics. In Arbib, M., editor, MIT Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press.

Sirois, S. and Mareschal, D. (2002). Models of habituation in infancy. Trends in Cognitive Sciences, 6(7):293–298.

Slater, A. and Johnson, S. P. (1997). Visual sensory and perceptual abilities of the newborn: beyond the blooming, buzzing confusion. In Simion, F. and Butterworth, G., editors, The Development of Sensory, Motor and Cognitive Capacities in Early Infancy: From Sensation to Cognition, pages 121–141. Hove: Psychology Press.

Smitsman, A. W. and Schellingerhout, R. (2000). Exploratory behavior in blind infants: How to improve touch? Infant Behavior and Development, 23:485–511.

Smotherman, W. P. and Robinson, S. R. (1988). Behavior of the fetus. Caldwell, NJ: Telford.

Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods. Ames, Iowa: Iowa State University Press.

Spelke, E. S. (2000). Core knowledge. American Psychologist, 55:1233–1243.

Spencer, J. P. and Thelen, E. (1999). A multiscale state analysis of adult motor learning. Experimental Brain Research, 128:505–516.

Spong, M. W. (1995). Swing up control of the acrobot. IEEE Control Systems Magazine, pages 49–55.

Spong, M. W. and Vidyasagar, M. (1989). Robot Dynamics and Control. New York: John Wiley and Sons.

Sporns, O. (2003). Embodied cognition. In Arbib, M., editor, MIT Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press.

Sporns, O. (2004). Developing neuro-robotic models. In Mareschal, D., Sirois, S., and Westermann, G., editors, Constructing Cognition. Oxford University Press. to appear.

Sporns, O. and Alexander, W. (2002). Neuromodulation and plasticity in an autonomous robot. NeuralNetworks, 15:761–774.

Sporns, O., Almassy, N., and Edelman, G. (2000). Plasticity in value systems and its role in adaptive behavior. Adaptive Behavior, 8(2):129–148.

Sporns, O. and Edelman, G. M. (1993). Solving Bernstein’s problem: a proposal for the development of coordinated movement by selection. Child Development, 64:960–981.

Sporns, O. and Pegors, T. (2004). Information-theoretical aspects of embodied artificial intelligence. In Iida, F., Pfeifer, R., Steels, L., and Kuniyoshi, Y., editors, Embodied Artificial Intelligence. Berlin: Springer-Verlag. LNCS.

Steels, L. (1994). The artificial life roots of artificial intelligence. Artificial Life, 1:75–110.

Steels, L. (2003). Personal communication.

Steuer, R., Kurths, J., Daub, C., Weise, J., and Selbig, J. (2002). The mutual information: detecting and evaluating dependencies between variables. Bioinformatics, 18:231–240. Suppl. 2.

Stiles, J. (2000). Neural plasticity and cognitive development. Developmental Neuropsychology, 18(2):237–272.

Stoica, A. (2001). Robot fostering techniques for sensory-motor development of humanoid robots. Robotics and Autonomous Systems, 37:127–143.

Streri, A. (1993). Seeing, Reaching, Touching: The Relations between Vision and Touch in Infancy. Cambridge, MA: MIT Press.

Streri, A. and Gentaz, E. (2003). Cross-modal recognition of shapes from hand to eyes in newborns. Somatosensory and Motor Research, 20:11–16.

Taga, G. (1991). Self-organized control of bipedal locomotion by neural oscillators in unpredictable environment. Biological Cybernetics, 65:147–159.

Taga, G. (1994). Emergence of bipedal locomotion through entrainment among the neuro-musculo-skeletal system and environment. Physica D, 75:190–208.

Taga, G. (1995). A model of the neuro-musculo-skeletal system for human locomotion. Biological Cybernetics, 73:113–121.

Taga, G. (1997). Freezing and freeing degrees of freedom in a model neuro-musculo-skeletal system for the development of locomotion. In Proceedings of 16th International Society of Biomechanics Congress, page 47.

Taga, G. (2000). Nonlinear dynamics of the human motor control. In Proc. of the First Intl. Symp. onAdaptive Motion of Animals and Machines.

Taga, G., Takaya, R., and Konishi, Y. (1999). Analysis of general movements of infants towards understanding of developmental principle for motor control. In Proc. of 1999 IEEE Intl. Conf. on Systems, Man, and Cybernetics, pages 678–683.

Tarapore, D., Lungarella, M., and Gomez, G. (2004). Fingerprinting agent-environment interaction via information theory. In Proc. of the 8th Intl. Conf. on Intelligent Autonomous Systems, pages 512–520.

Te Boekhorst, R., Lungarella, M., and Pfeifer, R. (2003). Dimensionality reduction through sensory-motor coordination. In Kaynak, O., Alpaydin, E., Oja, E., and Xu, L., editors, Proc. of the Joint Int. Conf. ICANN/ICONIP, pages 496–503. LNCS 2714.

Teuscher, C., Mange, D., Stauffer, A., and Tempesti, G. (2003). Bio-inspired computing tissues: towards machines that evolve, grow, and learn. Biosystems, 68:235–244.

Thelen, E. (1979). Rhythmical stereotypies in normal human infants. Animal Behaviour, 27:699–715.

Thelen, E. (1981). Kicking, rocking and waving: Contextual analysis of rhythmical stereotypies in normal human infants. Animal Behaviour, 29:3–11.

Thelen, E. (1995). Time-scale dynamics and the development of an embodied cognition. In Port, R. and van Gelder, T., editors, Mind as Motion: Explorations in the Dynamics of Cognition, pages 69–100. Cambridge, MA: MIT Press.

Thelen, E. (1999). Dynamic mechanisms of change in early perceptuo-motor development. In McClelland, J. and Siegler, S., editors, 29th Carnegie Symposium on Cognition: Mechanisms of Cognitive Development: Behavioral and Neural Perspectives. October. Pittsburgh.

Thelen, E. and Fischer, D. (1983). The organization of spontaneous leg movements in newborn infants.Journal of Motor Behavior, 15:353–377.

Thelen, E., Fisher, D., and Ridley-Johnson, R. (1984). The relationship between physical growth and a newborn reflex. Infant Behavior and Development, 7:479–493.

Thelen, E. and Smith, L. (1994). A Dynamic Systems Approach to the Development of Cognition andAction. Cambridge, MA: MIT Press. A Bradford Book.

Thorndike, E. L. (1911). Animal Intelligence. New York: Macmillan.

Thrun, S. (1992). The role of exploration in learning control. In White, D. and Sofge, D., editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, pages 527–559. New York: Van Nostrand Reinhold.

Thrun, S. (1995). Exploration in active learning, pages 381–384.

Toda, M. (1982). Man, Robot, and Society. The Hague, The Netherlands: Nijhoff.

Tononi, G., Sporns, O., and Edelman, G. (1994). A measure for brain complexity: Relating functional segregation and integration in the nervous system. Proc. of the Natl. Academy of Science (USA), 91:5033–5037.

Tononi, G., Sporns, O., and Edelman, G. (1996). A complexity measure for selective matching of signals by the brain. Proc. of the Natl. Academy of Science (USA), 93:3422–3427.

Trevarthen, C. (1993). The function of emotions in early infant communication and development, pages 48–81.

Triesch, J. and Jebara, T., editors (2004). Proc. of Third Intl. Conf. on Development and Learning: Developing Social Brains. Conference will take place at the Salk Institute for Biological Studies, La Jolla, California.

Turing, A. M. (1948). Intelligent Machinery, volume 5.

Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236):433–460.

Turkewitz, G. and Kenny, P. A. (1982). Limitation on input as a basis for neural organization and perceptual development: A preliminary theoretical statement. Developmental Psychology, 15:357–368.

Turrigiano, G. G. and Nelson, S. B. (2000). Hebb and homeostasis in neural plasticity. Current Opinion in Neurobiology, 10:358–364.

Turvey, M. T. and Fitzpatrick, P. (1993). Commentary: Development of perception-action systems and general principles of pattern formation. Child Development, (64):1175–1190.

Vaal, J., van Soest, A. J., Hopkins, B., Sie, L. T., and van der Knaap, M. S. (2001). Development of spontaneous leg movements in infants with and without periventricular leukomalacia. Experimental Brain Research, 135:94–105.

Van Heijst, J. J., Touwen, B. C., and Vos, J. E. (1999). Implications of a neural network model of early sensorimotor development for the field of developmental neurology. Early Human Development, 55:77–95.

Van Heijst, J. J. and Vos, J. E. (1997). Self-organizing effects of spontaneous neural activity on the development of spinal locomotor circuits in vertebrates. Biological Cybernetics, 77:185–195.

Varela, F., Thompson, E., and Rosch, E. (1991). The Embodied Mind. Cambridge, MA: MIT Press.

Varshavskaya, P. (2002). Behavior-based early language development on a humanoid robot. In Proc. of the Second Intl. Conf. on Epigenetic Robotics, pages 149–158.

Vereijken, B., van Emmerik, R. E., Whiting, H. T., and Newell, K. M. (1992). Free(z)ing degrees of freedom in skill acquisition. Journal of Motor Behavior, 24:133–142.

Von der Malsburg, C. (2003). Self-organization and the brain. In Arbib, M., editor, MIT Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press.

Von Hofsten, C. (1984). Developmental changes in the organization of prereaching movements. Developmental Psychology, 20:378–388.

Von Hofsten, C. (1993). Prospective control: A basic aspect of action development. Human Development, 36:253–270.

Von Hofsten, C., Vishton, P., Spelke, E., Feng, G., and Rosander, K. (1998). Predictive action in infancy: Head tracking and reaching for moving objects. Cognition, 67(3):255–285.

Vygotsky, L. (1962). Thought and Language. Cambridge, MA: MIT Press. Original work published in 1934.

Walter, G. W. (1950). An imitation of life. Scientific American, 182(5):42–45.

Walter, G. W. (1951). A machine that learns. Scientific American, 185(2):60–63.

Wang, D. (1995). Habituation. In Arbib, M., editor, The Handbook of Brain Theory and Neural Networks, pages 441–444.

Webb, B. (2001). Can robots make good models of biological behaviour? Behavioral and Brain Sciences, 24:1033–1050.

Weng, J., Hwang, W., Zhang, Y., Yang, C., and Smith, R. (2000). Developmental humanoids: Humanoids that develop skills automatically. In Proc. of the 1st IEEE-RAS Conf. on Humanoid Robots.

Weng, J., McClelland, J., Pentland, A., Sporns, O., Stockman, I., Sur, M., and Thelen, E. (2001). Autonomous mental development by robots and animals. Science, 291(5504):599–600.

Weng, J. J., editor (2000). NSF/DARPA Workshop on Development and Learning. Workshop held in Cambridge, USA.

Westermann, G. (2000). Constructivist Neural Network Models of Cognitive Development. Unpublished PhD Thesis, Division of Informatics, University of Edinburgh.

Westermann, G., Lungarella, M., and Pfeifer, R., editors (2001). Proc. of First Intl. Workshop on Developmental Embodied Cognition. Workshop held in Edinburgh, Scotland, unpublished.

Whiten, A. (2000). Primate culture and social learning. Cognitive Science, 24(3):477–508.

Whitman, P. and Kalos, M. (1982). Monte Carlo Methods. New York: Springer-Verlag.

Williamson, M. (1998). Neural control of rhythmic arm movements. Neural Networks, 11(7-8):1379–1394.

Williamson, M. (2001). Robot arm control exploiting natural dynamics. PhD thesis, MIT Department of Electrical Engineering and Computer Science. Unpublished.

Wolpert, D., Doya, K., and Kawato, M. (2003). A unifying computational framework for motor control and social interaction. Philosophical Transactions of the Royal Society London B, 358:593–602.

Wolpert, D., Ghahramani, Z., and Flanagan, R. (2001). Perspectives and problems in motor learning. Trends in Cognitive Sciences, 5(11):487–494.

Wood, D., Bruner, J., and Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, (17):181–191.

Yarbus, A. (1967). Eye Movements and Vision. New York: Plenum Press.

Yoshikawa, Y., Koga, J., Asada, M., and Hosoda, K. (2003). A constructive model of mother-infant interaction: toward infant’s vowel articulation. Connection Science, 15(4):211–229.

Zernicke, R. F. and Schneider, K. (1993). Biomechanics and developmental neuromotor control. Child Development, 64:982–1004.

Ziemke, T. (2003). On the role of robot simulations in embodied cognitive science. Artificial Intelligence and the Simulation of Behavior Journal, 1(4).

Zlatev, J. and Balkenius, C. (2001). Introduction: Why ”epigenetic robotics”? In Proc. of the FirstIntl. Workshop on Epigenetic Robotics, pages 1–4.