Exploring Principles Toward a Developmental Theory of Embodied Artificial Intelligence
Dissertation
submitted to the
Faculty of Mathematics and Natural Sciences
of the
University of Zurich
for the degree of
Doctor of Natural Sciences (Dr. sc. nat.)
by
Max Lungarella
from
Italy
Reviewed by
Prof. Dr. Rolf Pfeifer
Prof. Dr. Yasuo Kuniyoshi
Prof. Dr. Olaf Sporns
Zurich 2004
Abstract
Embodied artificial intelligence is an increasingly popular research paradigm that studies intelligence
and intelligence-like processes by placing a strong emphasis on the dynamical and reciprocal interaction,
across multiple time scales, between an agent's body, its control structure, and its environment.
Although a growing number of examples document the power of this novel approach, the role of
development has so far been marginalized, or neglected altogether. Is it possible, however, to understand
natural intelligence, or to create an artificial one, without taking development into account?
The work presented in this thesis is aimed at tackling this question. It is based on two core as-
sumptions: (a) embedding the coupling of control, body, and environment in a developmental frame-
work favors the emergence of stable behavioral patterns, and leads to adaptivity and robustness against
changes of body and environment not attainable otherwise; (b) the study of the mechanisms underlying
development yields the key to a deeper understanding of intelligent behavior. The methodology adopted
is synthetic and two-pronged: on the one hand, robot technology is used to instantiate and investigate
models originating from the developmental sciences, and eventually to arrive at new hypotheses about
the nature of intelligence; on the other hand, the aim is to construct better robotic systems by exploiting
insights gained from studies of development.
This thesis documents the fruitful combination of the embodied and developmental aspects of intelligent
behavior through a series of robotic case studies in which the synergistic interaction of control,
body, and environment is explored, quantified, and purposively exploited. Moreover, it highlights the
importance of exploratory activity, from the perspective of dynamical systems in the case of motor
skill acquisition, and from an information-theoretic and statistical point of view in the case of category
learning. Various mechanisms related to exploration are examined: the freezing and freeing of degrees
of freedom, physical and neural entrainment, the integration of multiple time scales, value systems,
and the self-structuring of information. As well as providing a wealth of experimental support for
the methodology advocated by developmental robotics, this thesis also outlines a set of novel design
principles for developmental systems.
Preface
Seven of the ten chapters of this thesis are based on material that has either been published or will
appear soon. As far as possible, I have tried to weld the individual contributions into a single smooth
structure. Chapter 1 introduces the philosophy of action of developmental robotics, and presents a set
of partially novel design principles for developmental systems. These principles are then fleshed out
with concrete examples in chapters 2 to 9. Chapter 10 concludes the thesis by summarizing its main
contributions. Here, for what it is worth, are the prior sources for parts of the text:
Chapter 2
• Lungarella, M., Metta, G., Pfeifer, R. and Sandini, G. (2003). Developmental robotics: a survey.
Connection Science (special issue on Epigenetic Robotics), L. Berthouze and T. Ziemke (eds.),
vol. 15, no. 4, pp. 151-190.
Chapter 3
• Lungarella, M. and Berthouze, L. (2002). On the interplay between morphological, neural, and
environmental dynamics: a robotic case-study. Adaptive Behavior (special issue on Plastic
Mechanisms, Multiple Time Scales, and Lifetime Adaptation), E. Di Paolo (ed.), vol. 10, no. 3/4,
pp. 223-241.
Chapter 4
• Berthouze, L. and Lungarella, M. (2004). Motor skill acquisition under environmental perturbations:
on the necessity of alternate freezing and freeing of degrees of freedom. To appear in
Adaptive Behavior, vol. 12, no. 1.
Chapter 5
• Lungarella, M. and Berthouze, L. (2004). Robot bouncing: on the synergy between neural and
body-environment dynamics. To appear in Iida, F., Pfeifer, R., Steels, L. and Kuniyoshi, Y. (eds.)
Embodied Artificial Intelligence. Berlin: Springer-Verlag.
Chapter 7
• Lungarella, M. and Pfeifer, R. (2001). Robots as cognitive tools: information-theoretic analysis of
sensory-motor data. In Proc. of the 2nd IEEE-RAS Int. Conf. on Humanoid Robotics, pp. 245-252.
Chapter 8
• Te Boekhorst, R., Lungarella, M. and Pfeifer, R. (2003). Dimensionality reduction through
sensory-motor coordination. In Proc. of the Joint Int. Conf. on Artificial Neural Networks
and Neural Information Processing, pp. 496-503. Lecture Notes in Computer Science 2714.
Chapter 9
• Tarapore, D., Lungarella, M. and Gomez, G. (2004). Fingerprinting agent-environment interaction
via information theory. In Proc. of the 8th Int. Conf. on Intelligent Autonomous Systems,
pp. 512-520.
Other publications (in chronological order)
• Meyer, F., Spröwitz, A., Lungarella, M. and Berthouze, L. (2004). Simple and low-cost compliant
leg-foot system. Submitted to the 17th Int. Conf. on Intelligent Robots and Systems.
• Tarapore, D., Lungarella, M. and Berthouze, L. (2004). Categorization of simple objects by
embodied agents: a statistical approach. Submitted to the 17th Int. Conf. on Intelligent Robots and
Systems.
• Gomez, G., Lungarella, M., Eggenberger-Hotz, P., Matsushita, K. and Pfeifer, R. (2004). Simulating
development in a real robot: on the concurrent increase of sensory, motor, and neural
complexity. To appear in Proc. of the 4th Int. Workshop on Epigenetic Robotics.
• Lungarella, M. and Berthouze, L. (2003). Learning to bounce: first lessons from a bouncing
robot. In Proc. of the 2nd Int. Symp. on Adaptive Motion in Animals and Machines. ThP-II-4,
electronic proceedings.
• Lungarella, M. and Metta, G. (2003). Beyond gazing, pointing, and reaching: a survey of
developmental robotics. In Proc. of the 3rd Int. Workshop on Epigenetic Robotics, pp. 81-89.
• Hafner, V.V., Fend, M., Lungarella, M., Pfeifer, R., König, P. and Körding, K.P. (2003). Optimal
coding for naturally occurring whisker deflections. In Proc. of the Joint Int. Conf. on Neural
Networks and Neural Information Processing, pp. 805-812. Berlin: Springer-Verlag. Lecture
Notes in Computer Science 2714.
• Lungarella, M. and Berthouze, L. (2002). Adaptivity via alternate freeing and freezing of degrees
of freedom. In Proc. of the 9th Int. Conf. on Neural Information Processing, pp. 492-497.
• Lungarella, M. and Berthouze, L. (2002). Adaptivity through physical immaturity. In Proc. of the
2nd Int. Workshop on Epigenetic Robotics, pp. 79-86.
• Lungarella, M., Hafner, V.V., Pfeifer, R. and Yokoi, H. (2002). An artificial whisker sensor for
robotics. In Proc. of the 15th Int. Conf. on Intelligent Robots and Systems, pp. 2931-2936.
• Lungarella, M., Hafner, V.V., Pfeifer, R. and Yokoi, H. (2002). Whisking: an unexplored sensory
modality. In Proc. of the 7th Int. Conf. on the Simulation of Adaptive Behavior, pp. 58-59.
Contents
1 Introduction 1
1.1 Historical perspective and paradigm shift . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Embodiment and its implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 The importance of development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Developmental robotics: the short version . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Design principles of developmental robotics . . . . . . . . . . . . . . . . . . . . . . . 7
1.5.1 The principle of cheap design . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5.2 The principle of ecological balance . . . . . . . . . . . . . . . . . . . . . . . 8
1.5.3 The value principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5.4 The principle of design for emergence . . . . . . . . . . . . . . . . . . . . . . 10
1.5.5 The time scales integration principle . . . . . . . . . . . . . . . . . . . . . . . 11
1.5.6 The starting simple principle . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5.7 The principle of information self-structuring . . . . . . . . . . . . . . . . . . 13
1.5.8 The principle of exploratory activity . . . . . . . . . . . . . . . . . . . . . . . 14
1.5.9 The principle of social interaction . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5.10 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.6 Contributions of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2 Developmental Robotics: The Long Version¹ 21
2.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3 In the beginning there was the body . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4 Facets of development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4.1 Development is an incremental process . . . . . . . . . . . . . . . . . . . . . 27
2.4.2 Development as a set of constraints . . . . . . . . . . . . . . . . . . . . . . . 29
¹ Appeared as Lungarella, M., Metta, G., Pfeifer, R. and Sandini, G. Developmental robotics: a survey. Connection Science, 15(4), pp. 151-190, 2003.
2.4.3 Development as a self-organizing process . . . . . . . . . . . . . . . . . . . . 30
2.4.4 Degrees of freedom and motor activity . . . . . . . . . . . . . . . . . . . . . . 30
2.4.5 Self-exploratory activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.4.6 Spontaneous activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4.7 Anticipatory movements and early abilities . . . . . . . . . . . . . . . . . . . 33
2.4.8 Categorization and sensory-motor coordination . . . . . . . . . . . . . . . . . 34
2.4.9 Neuromodulation, value and neural plasticity . . . . . . . . . . . . . . . . . . 34
2.4.10 Social interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.4.11 Intermediate discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.5 Research landscape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.5.1 Socially oriented interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.5.2 Non-social interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.5.3 Agent-related control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.5.4 Mechanisms and processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.5.5 Intermediate discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.6 Developmental robotics: existing theoretical frameworks . . . . . . . . . . . . . . . . 50
2.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.8 Future prospects and conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3 Freezing and Freeing Degrees of Freedom² 57
3.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.3 Learning to swing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.4 Experimental framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.4.1 Neural oscillators and joint synergy . . . . . . . . . . . . . . . . . . . . . . . 63
3.4.2 Joint control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.5 Experimental results and discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.5.1 Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.5.2 Exploratory process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.5.3 Experimental observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4 Alternate Freezing and Freeing of Degrees of Freedom³ 84
4.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
² Appeared as Lungarella, M. and Berthouze, L. On the interplay between morphological, neural, and environmental dynamics: a robotic case-study. Adaptive Behavior, 10(3-4), pp. 223-241, 2002.
³ To appear as Berthouze, L. and Lungarella, M. Motor skill acquisition under environmental perturbations: on the necessity of alternate freezing and freeing of degrees of freedom. Adaptive Behavior, 12(1), 2004.
4.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.3 Pendulation study and release of the peripheral degrees of freedom . . . . . . . . . . . 87
4.4 Adding nonlinear perturbations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.5 Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.5.1 Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.5.2 Experimental observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.6 Conclusion and future directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5 On the Synergy Between Neural and Body-Environment Dynamics⁴ 105
5.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.3 Hypotheses on infant bouncing learning . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.4 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.4.1 Neural rhythm generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.4.2 Selection of the neural control parameters . . . . . . . . . . . . . . . . . . . . 111
5.5 Experiments and discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.5.1 Scenario 1 – Free oscillations . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.5.2 Scenario 2 – Forced oscillations without ground contact . . . . . . . . . . . . 113
5.5.3 Scenario 3 – Forced oscillations with ground contact (ωp = 0) . . . . . . . . . 113
5.5.4 Scenario 4 – Forced oscillations with ground contact (ωp > 0) . . . . . . . . . 114
5.6 Discussion and conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6 Value-based Stochastic Exploration 118
6.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.3 Developmental inspiration and related work . . . . . . . . . . . . . . . . . . . . . . . 120
6.4 Enter simulated annealing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.5 Parameter exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
6.6 Control problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
6.7 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.8 Real world setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
6.9 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
6.10 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
⁴ To appear as Lungarella, M. and Berthouze, L. Robot bouncing: on the synergy between neural and body-environment dynamics. In Iida, F., Pfeifer, R., Steels, L. and Kuniyoshi, Y. (eds.) Embodied Artificial Intelligence. Berlin: Springer-Verlag, 2004.
7 Information-theoretic Analysis of Sensory Data⁵ 136
7.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
7.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
7.3 Sensory-motor coordination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
7.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
7.5 Analysis methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
7.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
7.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
7.8 Conclusion and future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
7.9 Information theoretic appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
8 Dimensionality Reduction through Sensory-Motor Interaction⁶ 147
8.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
8.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
8.3 Real-world instantiation and environmental setup . . . . . . . . . . . . . . . . . . . . 149
8.4 Statistical analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
8.5 Experiments and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
8.6 Conclusion and future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
9 Fingerprinting Agent-Environment Interaction⁷ 156
9.1 Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
9.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
9.3 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
9.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
9.5 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
9.6 Data Analysis and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
9.6.1 Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
9.6.2 Entropy and mutual information . . . . . . . . . . . . . . . . . . . . . . . . . 161
9.6.3 Cumulated sensor activation . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
9.6.4 Pre-processed image entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
9.7 Further Discussion and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 166
⁵ Appeared as Lungarella, M. and Pfeifer, R. Robots as cognitive tools: information-theoretic analysis of sensory-motor data. In Proc. of the 2nd IEEE-RAS Int. Conf. on Humanoid Robotics, pp. 245-252, 2001.
⁶ Appeared as Te Boekhorst, R., Lungarella, M. and Pfeifer, R. Dimensionality reduction through sensory-motor coordination. In Proc. of the Joint Int. Conf. on Artificial Neural Networks and Neural Information Processing, Lecture Notes in Computer Science 2714, pp. 496-503, 2003.
⁷ Appeared as Tarapore, D., Lungarella, M. and Gomez, G. Fingerprinting agent-environment interaction via information theory. In Proc. of the 8th Int. Conf. on Intelligent Autonomous Systems, pp. 512-520, 2004.
10 Summary and Conclusion 168
10.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
10.2 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
List of Figures
1.1 Coupling between body, control structure, and environment embedded in a develop-
mental framework. Shown is the information flow, e.g., the neural system affects the
musculo-skeletal apparatus via motor signals, and conversely proprioceptive sensory
information indicating the current state of the musculo-skeletal system is fed back to
the neural system. Similarly, information flows back and forth between body and envi-
ronment, and from the environment to the control structure. It follows that these three
factors cannot be considered in isolation. . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Interaction between developmental sciences, embodied artificial intelligence, and
robotics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Experimental variety: seven chapters, seven case-studies. The labels denote one or two
design principle(s) the case-study is mainly intended to address. The numbers indicate
the chapter in which the case-study is presented. . . . . . . . . . . . . . . . . . . . . 19
2.1 Examples of robots used in developmental robotics. From left to right, top to bottom:
BabyBot (LiraLab), BabyBouncer (AIST), Infanoid (CRL), COG (MIT). . . . . . . . . 24
3.1 Humanoid robot used in our experiments. . . . . . . . . . . . . . . . . . . . . . . . . 61
3.2 Schematics of the experimental system and the control architecture. Proprioceptive
feedback consists of the visual position of the hip marker in the frame of reference
centered on the hip position when the robot is in its resting position, i.e., vertical po-
sition. Joint synergy was only activated in experiments involving coordinated 2-DOF
control. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.3 Comparison between the output of the pulse generator (thick impulse) and the output
of the oscillator (solid line) for three different configurations of τu and τv, given the
same proprioceptive feedback (dotted line). The control settings were as follows:
τu = 0.02, τv = 0.25 (top); τu = 0.06, τv = 0.25 (middle); τu = 0.06, τv = 0.75 (bottom).
Note that while the ratio τu/τv is unchanged between the top and the bottom graph, both
the frequency of the output and the number of impulses per period (i.e., the shape of
the output) are changed. The vertical axis denotes the amplitude of each signal. The
horizontal axis denotes time steps (one time step is 33ms). . . . . . . . . . . . . . . . 65
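The caption of figure 3.3 describes an oscillator whose output waveform and frequency depend on the time constants τu and τv. The front matter does not give the oscillator equations, so the sketch below is an assumption: a minimal two-neuron Matsuoka-style half-center oscillator, a common choice for this kind of rhythmic controller, with all parameters other than τu and τv chosen purely for illustration.

```python
import numpy as np

def matsuoka_step(state, tau_u, tau_v, beta=2.5, w=2.5, s=1.0, dt=0.001):
    """One Euler step of a two-neuron Matsuoka-style half-center oscillator.
    tau_u sets the rise time, tau_v the adaptation time; beta, w, s are
    illustrative adaptation gain, mutual inhibition, and tonic input."""
    u1, v1, u2, v2 = state
    y1, y2 = max(u1, 0.0), max(u2, 0.0)          # rectified firing rates
    du1 = (-u1 - beta * v1 - w * y2 + s) / tau_u
    dv1 = (-v1 + y1) / tau_v
    du2 = (-u2 - beta * v2 - w * y1 + s) / tau_u
    dv2 = (-v2 + y2) / tau_v
    return (u1 + dt * du1, v1 + dt * dv1, u2 + dt * du2, v2 + dt * dv2)

def simulate(tau_u, tau_v, steps=20000, dt=0.001):
    """Simulate the oscillator and return its output y1 - y2 over time."""
    state = (0.1, 0.0, 0.0, 0.0)                 # small asymmetry starts it
    out = []
    for _ in range(steps):
        state = matsuoka_step(state, tau_u, tau_v, dt=dt)
        out.append(max(state[0], 0.0) - max(state[2], 0.0))
    return np.array(out)

# Same tau_u/tau_v ratio as the top and bottom panels of figure 3.3:
# the oscillation becomes slower when both constants are scaled up.
fast = simulate(0.02, 0.25)
slow = simulate(0.06, 0.75)
```

In this isolated sketch, scaling both time constants by the same factor rescales time exactly, so `slow` simply oscillates at a third of the frequency of `fast`; in the thesis setup, the pulse generator and fixed-rate proprioceptive feedback break that symmetry, which is why the caption reports a change in waveform shape as well.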
3.4 Value-dependent exploration. The upper graph depicts the time series of the oscillatory
movement of the robot’s hip (top) and the associated value v in the value system (bot-
tom). Rectangular areas point to decreases of value caused by habituation. The lower
graph depicts the corresponding trajectories in parameter space. Oval areas point at
dense regions of high-yield parameter settings, i.e., the large oscillations observed in
the time series. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.5 Value landscapes (left: hip parameter space; right: knee space) uncovered by a single
exploratory run in an independent 2-DOF configuration (ωs = 0). The size of a dot
(a control setting visited by the exploratory process) is proportional to the value v
obtained for that particular control setting. Initial conditions were similar for both
joints, namely τ_u^{h,k} ∈ [0.02, 0.04] and τ_v^{h,k} ∈ [0.2, 0.4]. The exploratory run took roughly
10 minutes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.6 Probability distribution functions of value landscapes obtained in three different sce-
narios: independent 2-DOF exploration (top), 1-DOF exploration (middle) and boot-
strapped 2-DOF (bottom). The corresponding value landscapes are found in fig-
ure 3.5, 3.11(right) and 3.13 respectively. In each graph, the value space [0.0,0.6]
was discretized into 50 bins. Simply stated, each graph indicates the probability (verti-
cal axis) that a value v (horizontal axis) occurs during the exploratory run considered.
In the three scenarios, the same initial conditions were used. . . . . . . . . . . . . . . 70
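Each graph in figure 3.6 is, in effect, a normalized histogram of the values v visited during a run. A minimal sketch of that computation, using the 50-bin discretization of [0.0, 0.6] stated in the caption; the sample values here are synthetic stand-ins, not data from the thesis.

```python
import numpy as np

def value_pdf(values, lo=0.0, hi=0.6, n_bins=50):
    """Discretize [lo, hi] into n_bins equal-width bins and return the
    probability that a value v falls into each bin, plus the bin edges."""
    counts, edges = np.histogram(values, bins=n_bins, range=(lo, hi))
    probs = counts / counts.sum()                # normalize to sum to one
    return probs, edges

# Synthetic stand-in for the values visited by one exploratory run.
rng = np.random.default_rng(0)
values = np.clip(rng.normal(0.25, 0.1, size=500), 0.0, 0.6)
probs, edges = value_pdf(values)
```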
3.7 Value landscape obtained during a systematic exploration of the knee parameter with an
arbitrarily chosen hip parameter setting (τ_u^h = 0.045, τ_v^h = 0.65). The parameter space
was discretized in a 15x15 sampling and the figure is a linear approximation of the
resulting values v. Brighter colors denote higher-yield settings. The experiment lasted
about 150 minutes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.8 Effect of a small change in the hip control parameters on the ankle-hip phase plots in
the independent 2-DOF configuration: left, oscillatory behavior without a true stationary
regime (τ_u^h = 0.060, τ_v^h = 0.60, τ_u^k = 0.03, τ_v^k = 0.3); right, no oscillatory behavior
(τ_u^h = 0.065, τ_v^h = 0.65, τ_u^k = 0.03, τ_v^k = 0.3). In both graphs, the axes denote the
horizontal coordinates of the hip and ankle markers’ visual positions. . . . . . . . . . . 72
3.9 Evidence of preferred stable states and phase transitions in the independent 2-DOF
configuration: successive pseudo-stationary regimes obtained with τ_u^h = 0.055, τ_v^h = 0.55,
τ_u^k = 0.03, τ_v^k = 0.3. Each graph shows the corresponding ankle-hip phase plot.
In all graphs, the axes denote the horizontal coordinate of the hip and ankle markers’
visual positions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.10 Large amplitude smooth performance after a long transient: left, the ankle-hip phase
plot with τ_u^h = 0.055, τ_v^h = 0.65, τ_u^k = 0.025, and τ_v^k = 0.35; right, the corresponding
time-series for hip and ankle visual positions and motor commands. . . . . . . . . . . 73
3.11 Value landscape (hip space) uncovered by a single exploratory run in a 1-DOF configu-
ration, i.e., the second DOF (knee) is frozen. The size of a dot (a control setting visited
by the exploratory process) is proportional to the value v obtained for that particular
control setting. Initially, τ_u^h and τ_v^h were randomly selected in the intervals [0.02, 0.04]
and [0.2, 0.4] respectively. The exploratory run took roughly 10 minutes. . . . . . . . . 74
3.12 Value landscape obtained during a systematic exploration of the hip parameter space in
a 1-DOF configuration, i.e., the second DOF (knee) was frozen. The parameter space
was discretized in a 15x15 sampling and the figure is a linear approximation of the
resulting values. Brighter colors denote higher-yield settings. The experiment took
about 150 minutes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.13 Effect of the freeing of the hip DOF on the exploration of the 2-DOF configuration.
Left, value landscape uncovered by a single exploratory run in a 1-DOF configuration,
i.e., the second DOF (knee) was frozen. When the system reached a stable oscilla-
tory state, here denoted by a white triangle (roughly [0.7,0.04]), the second DOF was
released. The right graph shows the value landscape uncovered by the exploratory pro-
cess in the resulting 2-DOF configuration, with an initial condition represented by the
white rectangle (roughly [0.3,0.03]). In both graphs, the size of a dot (a control set-
ting visited by the exploratory process) is proportional to the value v obtained for that
particular control setting. Initially, τ_u^{h,k} and τ_v^{h,k} were randomly selected in the intervals
[0.02,0.04] and [0.2,0.4] respectively. The overall experiment took roughly 20 minutes. 77
3.14 Effect of the freeing of the hip on the exploration of the 2-DOF configuration. Value
landscape obtained during a systematic exploration of the knee parameter space after
its release when the system was in a stable oscillatory state in a 1-DOF configuration.
The hip oscillator was initialized with τ_u^h = 0.054, τ_v^h = 0.65, which corresponds to
a high-yield 1-DOF configuration. The parameter space was discretized in a 15x15
sampling and the figure is a linear approximation of the resulting values. Brighter
colors denote higher-yield settings. The experiment took about 150 minutes. . . . . . . 78
3.15 Large amplitude oscillations with a strong intersegmental coupling (ωp = 1.0) in the
independent 2-DOF configuration when τ_u^h = 0.055, τ_v^h = 0.65, τ_u^k = 0.025, τ_v^k = 0.35:
v = 0.35:
phase plots of the hip (left) and ankle (right) motions in the stationary regime. In
both graphs, the axes denote the horizontal coordinates of the hip (respectively ankle)
marker’s visual positions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.16 Toward a flexible 1-DOF system: Effect of an intermediate coupling (ωs = 0.50) be-
tween hip and knee on the value landscapes (left: hip parameter space; right: knee
space) uncovered by a single exploratory run in a 2-DOF configuration. In both graphs,
the size of a dot (a control setting visited by the exploratory process) is proportional
to the value v obtained for that particular control setting. Initially, τu and τv were ran-
domly selected in the interval [0.02,0.04] and [0.2,0.4] respectively. The exploratory
run took roughly 10 minutes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.1 Humanoid robot used in our experiments. . . . . . . . . . . . . . . . . . . . . . . . . 88
4.2 Resonant oscillations for (τu = 0.065,τv = 0.6) without perturbations (top). Resulting
behavior under perturbations (bottom). In each graph, the time-series denote motor
impulses (bottom), ankle position (middle) and hip position (top). In this figure, as
well as all other similar figures in this chapter, the vertical axis is unlabelled, because
it depicts time-series of different scales and units, i.e., visual positions in pixels, motor
commands in radians. The horizontal line in the lower graph corresponds to the visual
position of the location after which the rubber band is extended. The horizontal axis
denotes time in milliseconds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.3 Schematics of the experimental system and neural control architecture. Joint synergy
is only activated in experiments involving coordinated 2-DOF control. . . . . . . . . . 90
4.4 Flow of the proposed experimental discussion with respect to both 1-DOF and 2-DOF
exploration (cf. Table 4.1). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.5 Time-series of hip position (top) and ankle-hip phase plots (bottom) for ω_p^h = 0.25
(left) and ω_p^h = 4.0 (right). The oscillator time-constants are τu = 0.035, τv = 0.65 in
both cases. In the upper row of plots, the vertical axis denotes the visual positions of
the ankle (left) and the hip (right). The horizontal axis denotes time in milliseconds.
In the lower row of plots, both vertical and horizontal axes correspond to the visual
positions of the hip (left plot) and ankle (right plot) in pixels. . . . . . . . . . . . . . . 94
4.6 From top to bottom, time-series of hip and ankle positions, hip and knee motor com-
mands with the following parameters: τ_u^h = 0.06, τ_v^h = 0.65, τ_u^k = 0.02, τ_v^k = 0.8 and
ω_p^h = 2.0. The horizontal axis denotes time in milliseconds. The system was manually
perturbed after about 37.5s. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.7 From top to bottom, time-series of hip and ankle positions, hip and knee motor com-
mands with the following parameters: τ_u^h = 0.06, τ_v^h = 0.65, τ_u^k = 0.025, τ_v^k = 0.35
and ω_p^h = 0.25. The horizontal axis denotes time in milliseconds. The system was manually
perturbed at time 37s, 75s, 108s and 147s (vertical lines). . . . . . . . . . . . . . . . . 97
4.8 Co-existing regimes for ωs = 0.0 and τ_u^h = 0.06, τ_v^h = 0.65, τ_u^k = 0.035, τ_v^k = 0.4 (top).
Unique in-phase oscillatory regime with ωs = 1.0 (bottom). In each graph, the time-
series denote hip and ankle positions, hip and knee motor commands (from top to
bottom). Right-hand windows are close-ups on the time-series. The horizontal axis
denotes time in milliseconds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.9 Results of the release of an additional degree of freedom after stabilization in a 1-
DOF configuration. Left: (τ_u^h = 0.045, τ_v^h = 0.65) and (τ_u^k = 0.025, τ_v^k = 0.45). Right:
(τ_u^h = 0.06, τ_v^h = 0.65) and (τ_u^k = 0.025, τ_v^k = 0.35). From top to bottom, the time-series
denote hip and ankle positions, hip and knee motor commands. The horizontal axis
denotes time in milliseconds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.10 Oscillatory behavior obtained during alternate freezing and freeing phases. Neural
parameters are unchanged and set to τ_u^h = 0.06, τ_v^h = 0.65, τ_u^k = 0.03, τ_v^k = 0.325,
ωhp = 0.5, and ωs = 0.5. From top to bottom, time-series denote hip and ankle positions,
hip and knee motor commands. The horizontal axis denotes time in milliseconds. . . . . 100
4.11 Effect of alternate freeing and freezing of the knee. Neural parameters are unchanged
and set to τ_u^h = 0.035, τ_v^h = 0.65, τ_u^k = 0.055, τ_v^k = 0.45, ωhp = 0.5, and ωs = 0.5. From top
to bottom, time-series denote hip and ankle positions, hip and knee motor commands.
Right-hand graphs are close-ups on the two different regimes. The horizontal axis
denotes time in milliseconds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.1 Infant strapped in a Jolly Jumper. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.2 Left: Humanoid robot used in our experiments. Right: Schematic representation of the
robotic setup. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.3 Left: Basic structure of the neuro-musculo-skeletal system. The arrows in the model
show the information flow. Right: Neural rhythm generator composed of six neural os-
cillators. The solid circles represent inhibitory, and the half-circles are excitatory con-
nections. Abbreviations: he=hip extensor, hf=hip flexor, ke=knee extensor, kf=knee
flexor, ae=ankle extensor, af=ankle flexor. Not shown are proprioceptive feedback
connections and tonic excitations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.4 Forced harmonic oscillations with ground contact (bouncing) in the absence of sensory
feedback (ωp = 0). Top: τu = 0.108,τv = 0.216 and τu = 0.140,τv = 0.280, bottom:
τu = 0.114,τv = 0.228 (phase plot on the right). In all graphs, the three curves represent
the vertical displacement of the ankle, knee and hip marker in cm. . . . . . . . . . . . 114
5.5 Forced harmonic oscillations with ground contact (bouncing) in presence of sensory
feedback (ωp > 0). Top row: ωp = 0.5,τu = 0.114,τv = 0.228, bottom row: ωp =
0.75,τu = 0.140,τv = 0.280. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.1 Metropolis-like exponential probability distribution. This figure exemplifies the effect
of the temperature on the probability of making a downhill move. See text for details. . 123
6.2 Pseudo-code of the exploration process. For explanations see text. . . . . . . . . . . . 124
6.3 Control scheme. The arrows in the model depict the information flow. . . . . . . . . . 126
6.4 Normalized value vs. time (min=0.0, max=1.0). Shown are the results for V = 1/(1 +
k Mp Tr). As can be seen from the graphs, in both cases, after 1000 “simulated” seconds
the value is already very high. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
6.5 Systematic exploration of the parameter space and resulting value landscape for Kd =
4.0. The white dots are the parameters explored during a value-based exploration. . . . 129
6.6 High-performance robot head used in our experiments. . . . . . . . . . . . . . . . . . 130
6.7 Qualitative comparison between dynamics of eye pan and neck pan degrees of freedom. 130
6.8 Time series of transient performance evaluation for eye pan degree of freedom. . . . . 131
6.9 Time series of right eye pan and neck pan degrees of freedom for V = k Mp (see text).
The desired position (square wave of period 6 sec) and the effective (measured) posi-
tion are superposed. The length of the series is T = 273 sec (corresponding to 45
exploratory iterations). First column, first row: Complete time series of right eye pan.
Second column, first row: Complete time series of neck pan. Second and third row are
close-ups of beginning and end of the stochastic exploration of eye and neck, respectively. 134
6.10 Time series of right eye pan and neck pan degrees of freedom for V = k Mp Tr (see
text). The desired position (square wave of period 6 sec) and the position measured
via encoder are superposed. The length of the series is T = 439 sec (corresponding to
73 exploratory iterations). First column, first row: Complete time series of right eye
pan. Second column, first row: Complete time series of neck pan. Second and third
row are close-ups of beginning and end of the stochastic exploration of eye and neck,
respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.1 Left: Basic manipulator geometry. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
7.2 Shannon entropy for the different sensory channels, measured in bits. Left: No sensory-
motor coordination. Right: Sensory-motor coordination (foveation on red objects). . . 142
7.3 Mutual information between receptors of the same sensory modality. Random actua-
tion on the left. Sensory-motor coordination on the right. . . . . . . . . . . . . . . . . 143
7.4 Cumulated stimulation of the R, G, and B-receptors. The sensory-motor coordinated
case is on the right. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
8.1 Environmental setup. Objects of different shapes can be seen in the background. In a
typical experiment the robot started in one corner of the arena, and dependent on its in-
built reflexes, it tried to avoid obstacles, circled around them, or just tracked a moving
obstacle (the small cylinder in the front). Note that the omnidirectional camera on the
robot was not used for the experiments discussed here. . . . . . . . . . . . . . . . . . 149
8.2 Use of dimension reduction techniques, exemplified by the image data. (a) How the
robot perceives an object when approaching it (experiment 1, no sensory-motor coor-
dination). Moving forward, the image of a static object shifts to the periphery of the
visual field. (b) A contour plot of the image data displayed as a time series of the pixel
intensities. Vertical axis: pixel locations. Horizontal axis: time steps. The peripheral
shift shows up as an upward curving trace. (c) A 3D plot of (b) with pixel intensity
plotted along the vertical axis. Here the trace is visible as a trough cutting through a
landscape with a ridge on the right side. (d) A reconstruction of (c) based on the first
5 PCs, which explain 95% of the variance. (e) The same as (d) but based on average
factors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
8.3 Results of experiments 1-3 (no sensory-motor coordination). Left: experiment 1. Cen-
ter: experiment 2. Right: experiment 3. From top to bottom (and for all columns)
the vertical axes are H(λ), λmax, and Npc. In all graphs the horizontal axis denotes
time. The curves are the means from up to 15 experimental runs and the bars are the
associated 95% confidence limits around those means. For details refer to text. . . . . 153
8.4 Results of experiments 4 and 5 (sensory-motor coordination). Left: experiment 4.
Right: experiment 5. From top to bottom (and for all columns) the vertical axes are
H(λ), λmax, and Npc. The horizontal axis denotes time. For details refer to text. . . . . 154
9.1 (a) Bird’s eye view on the robot and its ecological niche. The trace depicts the path of
the robot during a typical experiment. (b) Schematic representation of the simulated
agent. The sensors have a position-dependent range: if rl is the length of the robot, the
range of d0, d1, d9, and d10 is 1.8rl, the one of d2 and d3 is 1.2rl, and the one of d4,
d5, d6, d7, and d8 is 0.6rl. (c) Extended Braitenberg Control Architecture: As shown,
four processes govern the agent’s behavior. . . . . . . . . . . . . . . . . . . . . . . . 159
9.2 Correlation matrix obtained from the pair-wise correlation of the distance sensors for
one particular experimental run during the behavioral state: (a) “exploring”, (b) “track-
ing”, (c)“circling.” The higher the correlation, the larger the size of the square. From
left to right the average correlation is: 0.011±0.004, 0.097±0.012, and 0.083±0.041,
where ± indicates the standard deviation. . . . . . . . . . . . . . . . . . . . . . . . . 162
9.3 Correlation matrix obtained from the pair-wise correlation of the red channels for one
particular experimental run during the behavioral state: (a) “exploring”, (b) “tracking”,
(c) “circling.” The higher the correlation, the larger the size of the square. From left
to right the average correlation is: 0.053± 0.023, 0.309± 0.042, and 0.166± 0.031,
where ± indicates the standard deviation. . . . . . . . . . . . . . . . . . . . . . . . . 162
9.4 Mutual information matrix obtained by estimating the mutual information between
pairs of proximity sensors in one particular experimental run during the behavioral
state: (a) ”exploring”, (b) ”tracking”, (c) ”circling”. The higher the mutual informa-
tion, the larger the size of the square. . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
9.5 Mutual information matrix obtained by estimating the mutual information between
pairs of red channels in one particular experimental run during the behavioral state:
(a) ”exploring”, (b) ”tracking”, (c) ”circling”. The higher the mutual information, the
larger the size of the square. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
9.6 (a) Plot of activation levels for the proximity sensors (1 to 12) for the three behavioral
states. (b) Plot of activation levels for the image sensors (1 to 24) for the three behav-
ioral states. The plots display the average computed over 16 experimental runs. The
bars denote the standard deviation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
9.7 Entropy of the effective red color averaged over all vertical slices. P1: exploring; P2:
tracking; P3: circling. The plot displays the average computed over 16 experimental
runs. The bars denote the standard deviation. . . . . . . . . . . . . . . . . . . . . . . 166
10.1 Seven chapters, seven case-studies. The labels denote one or two design principle(s)
the case-study intends to address. The numbers indicate the chapter. The picture is the
same as in Chapter 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
List of Tables
1.1 Overview of design principles for developmental systems. . . . . . . . . . . . . . . . 17
2.1 Facets of development at a glance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.2 Explicitly invoked developmental facet(s). NA = Not Available. . . . . . . . . . . . . 40
2.3 Representative examples of developmentally inspired robotics research. AVH = Active
Vision Head, UTH = Upper-Torso Humanoid, MR = Mobile Robot, HD = Humanoid,
HGS = Humanoid grasping system, UTH+MR = Upper-Torso Humanoid on Mobile
Platform, MR+AG = Mobile Robot equipped with Arm and Gripper, RS = Robotic
System. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.1 Synopsis of the control parameter settings used in Figure 4.4. . . . . . . . . . . . . . . 93
6.1 EPR = Eye-pan right; Mp0: initial normalized overshoot; Mpf: final normalized over-
shoot; Tr0: initial rise-time (in sec.); Trf: final rise-time (in sec.). . . . . . . . . . . . . 129
6.2 EPR = Eye-pan right; Mp0: initial normalized overshoot; Mpf: final normalized over-
shoot; Tr0: initial rise-time (in sec.); Trf: final rise-time (in sec.). . . . . . . . . . . . . 131
Chapter 1
Introduction
The quest for artificial intelligence is the quest for human nature. Anonymous
And He breathed into his nostrils the breath of life; and man became a living soul. Genesis II,7
I propose to consider the question, “Can machines think?” This should begin with definitions of
the meaning of the terms “machine” and “think”. Turing (1950)
Despite being crude (the tortoises) conveyed the impression of having goals, independence, and
spontaneity. Walter (1950)
Any aspect of learning and any other characteristic of intelligence may - in principle - be described
so precisely as to be simulated through a machine. McCarthy (1956)
Can machines think? Can they autonomously acquire novel skills? And then, what is the role played
by development? How does intelligent behavior emerge from the interaction between a developing or-
ganism and its environment? Can an artificial being, through self-directed exploratory activity, discover
interesting and unexpected strategies to exploit the interaction of body, control, and environmental
structure? It is undeniable that these are truly difficult questions.
The core speculation of this thesis is that the recent convergence of developmental sciences, em-
bodied artificial intelligence, and robotics not only gives rise to a prolific approach to seek novel an-
swers to such old issues, but also constitutes a cornerstone of a developmental theory for designing
and constructing intelligent adaptive systems. By uniting psychologists, neuroscientists, engineers,
and computer scientists in the quest for understanding intelligence, and synthesizing intelligent behav-
ior, developmental robotics (as the methodology will be referred to in this thesis) together with other
similar approaches, also paves the way to a novel and interdisciplinary style of conducting research in
which robots are perceived as means to achieve an end (understanding principles underlying intelligent
behavior), and not merely as an end unto themselves (as is typically the case in robotics). In other
words, as the case-studies presented in this thesis document, development can inspire the construction
of robots, and – conversely – robots can be used as tools to model aspects of development (see
Fig. 1.2). Concerning the latter point, it is important to note that in contrast to living beings, robots
have the methodological advantage that their internal states are accessible and can be recorded for sub-
sequent analysis. Moreover, it is possible to simply build various assumptions into the system (e.g.,
lesions), and perform tests without having to worry about ethical issues.
Unlike previous approaches to the synthesis of intelligent behavior (see following section), devel-
opmental robotics – borrowing directly from one of the basic tenets of embodied artificial intelligence
– holds that a system’s control structure cannot be decoupled from the body, and from the system’s
interaction with the local environment. Yet, developmental robotics, as asserted in this thesis, goes one
step further. The main intuition resides in the realization that embedding the reciprocal and dynamic
coupling of the three aforementioned factors in a developmental framework simplifies the emergence
of stable behavioral patterns, and provides the system with adaptivity and robustness against changes
of body and environment. The developmental part purports to emphasize the importance of the in-
teraction between experience and maturation in shaping the emergence and development of cognitive
structure, motor skills, and behavior. Whereas experience typically pertains to the permanent effects
of environmental conditions, task requirements, and learning, maturation refers to physical changes
of control and body morphology. It follows that the couplings between all these factors need to be
adequately taken into account and integrated into the design process (Fig. 1.1).
This thesis documents a series of developmental robotics case-studies in which the synergetic in-
teraction of control, body, and environment is explored, quantified, and purposively exploited. Based
on those case-studies, a set of novel, computational, and integrative design principles is abstracted.
It is our strong conviction that the experimental validation and quantification of the proposed design
principles may represent a founding block of a developmental theory of artificial systems, which could
have a big impact not only on developmental robotics, but also on other related fields of research.
1.1 Historical perspective and paradigm shift
As the epigraphs at the beginning of this chapter document, for a long time, people have been romanced
by the idea of constructing intelligent machines and replicating intelligent behavior displayed by hu-
mans and animals. Traditional Jewish mysticism includes tales of the Golem, a thinking automaton
made from the sticky clay of the bank of the river Moldau. In the 17th century, philosopher Gottfried
Wilhelm Leibniz outlined plans for a thinking machine by conceiving an artificial universal lan-
guage composed of symbols, which could stand for objects or concepts, and logical rules for their
manipulation. Alan Turing devised a much-discussed imitation game used as a yardstick for assessing
whether a machine is intelligent, a test that has since been known as the Turing Test for artificial intel-
ligence (Turing, 1950). W. Grey Walter constructed the tortoises Elmer and Elsie, which displayed tropisms
[Figure: schematic of control structure (neural system/brain), body (musculo-skeletal system), and environment, linked by motor signals, proprioceptive and exteroceptive sensory feedback, actions, and mechanical feedback (elasticity, compliance); experience and maturation connect all three to development.]
Figure 1.1: Coupling between body, control structure, and environment embedded in a developmental framework. Shown is the information flow, e.g., the neural system affects the musculo-skeletal apparatus via motor signals, and conversely proprioceptive sensory information indicating the current state of the musculo-skeletal system is fed back to the neural system. Similarly, information flows back and forth between body and environment, and from the environment to the control structure. It follows that these three factors cannot be considered in isolation.
and reactive behaviors (Walter, 1950, 1951).
While the advent of the computer in the 1950s did not change the dreams and ambi-
tions of people, it made artificial intelligence a reasonable possibility. Thereafter, numerous research
groups around the world have been engaged in the construction of artificial systems with the professed
goal of emulating, equaling, or even surpassing all of our mental and physical abilities. In particu-
lar, classical “Good Old Fashioned Artificial Intelligence” (GOFAI) research has attempted (in vain)
to synthesize intelligence or higher cognition, by formalizing knowledge and crystallizing cognitive
principles mostly obtained from the study of adult human beings. It was hoped that a powerful “logic
system” combined with a massive database of common knowledge could be constructed for general prob-
lem solving (the essence of intelligence). One of the most unfortunate consequences of this attempt to
construct artificial intelligence has been the tacit acceptance of a strong and explicit separation between
cognitive language-like data structures (symbols and representations), the mechanisms that operate on
these structures (algorithms, search procedures), and the machine used to implement that program
(hardware).
This research effort has undergone a very significant paradigm shift triggered almost twenty years
ago, when some researchers realized that the shortcomings of the good old fashioned approach had
nothing to do with the relative paucity of the knowledge the systems explicitly encoded. Rather, they
thought that these shortcomings could be attributed to the lack of a fluid coupling between the system
and a real-world environment posing real-world problems of sensing and acting. Concepts, such as
situatedness (that is, the fact that embodied beings sense and act in a real physical environment) and
embodiment came to the forefront and spawned some of the most exciting and groundbreaking work
in the contemporary study of natural and artificial intelligence.
1.2 Embodiment and its implications
Embodied artificial intelligence is an increasingly popular research paradigm that studies intelligence
and intelligence-like processes by putting a strong emphasis on the dynamical and reciprocal interac-
tion across multiple time scales between brain and body of an agent, and its environment. Its method-
ology is synthetic and does not represent conventional science, but rather a fine blend of science and
engineering. That is, the aim is to understand the nature of adaptive intelligence by “building” robust
artificial systems. The adoption of such a “synthetic methodology” leads, surprisingly quickly, to a
radical rethinking of many of the old and comfortable ideas about the nature of intelligence.
Embodied artificial intelligence incorporates explicitly aspects of body morphology, motor activ-
ity, and interaction with the local environment in its theoretical framework. Embodiment has proven
to be an essential characteristic of adaptive systems whose importance can hardly be overemphasized.
The coupling between body, brain, and environment implies that an embodied agent is continuously
exposed to a stream of sensory stimulation, to physical forces (e.g., gravity), to energy dissipation,
to wear and tear, and to damage. Long- and short-term influences of the environment on the agent’s
brain and body constitute a physical implication of embodiment. It is important to understand that em-
bodiment has not only a physical implication, but an information-theoretic one as well. An embodied
agent does not passively absorb information from its surrounding environment, but due to its particular
morphological setup, and through its actions on the environment, it actively structures, selects, and
exploits such information. That is, an embodied system, by being naturally coupled to the environment
through sensory-motor interaction can shape its own sensory experience, and the quality of the sensory
data relayed to its control architecture (e.g., brain).
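This information-theoretic point can be made concrete with a toy sketch (hypothetical code, not from the thesis; the sensor streams and state counts are invented for illustration). Sensory-motor coordination, such as foveating on an object, concentrates stimulation on a few sensor states and thereby lowers the Shannon entropy of the sensory stream, an effect quantified in later chapters:

```python
# Toy illustration (invented data, not thesis code): sensory-motor
# coordination concentrates sensor readings on few states, lowering entropy.
import math
import random
from collections import Counter

def shannon_entropy(samples):
    """Shannon entropy of a discrete sample sequence, in bits."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

random.seed(0)
# Uncoordinated case: readings spread uniformly over 8 discrete states.
uncoordinated = [random.randrange(8) for _ in range(10_000)]
# Coordinated case (e.g., foveation): stimulation concentrated on few states.
coordinated = [random.choice([2, 3, 3, 3, 4, 4]) for _ in range(10_000)]

h_u = shannon_entropy(uncoordinated)  # close to log2(8) = 3 bits
h_c = shannon_entropy(coordinated)    # markedly lower
print(f"uncoordinated: {h_u:.2f} bits, coordinated: {h_c:.2f} bits")
```

The point is not the particular numbers but the direction of the effect: by acting, the agent reshapes the statistics of its own sensory input.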
1.3 The importance of development
Another tenet of the cognitivistic research paradigm was the neglect of ontogenetic development,
which was marginalized and relegated to the role of an in-principle necessary, but all in all dispensable
transient. However, is it possible to create artificial cognition without resorting to developmental mech-
anisms? Do intelligent autonomous systems really need to undergo an initial phase of development?
And, how central is the role played by developmental processes in the emergence of cognition?
These and other questions, led an increasing number of researchers of AI, robotics, and autonomous
systems design to diverge from this non-developmental approach by rejecting its nativist flavor. Indeed,
making a fully equipped intelligent and complete adult robot might either involve too much work, or
be beyond our intellectual and technological capabilities. It could turn out that any adaptive artificial
creature needs to be, if not born, at least the beneficiary of a longish period of infancy. It is therefore not
surprising that development is turning into one of the core issues in the ongoing endeavor of creating
intelligent systems. Their “developmental” control architectures are built starting from real
neuroscientific and developmental data. The emphasis lies more on getting such systems to learn and develop
by themselves, or – pushing the designer commitments even further back – on mimicking genetic mod-
ifications and evolving generations of progressively more refined artificial systems that, once evolved,
develop and learn through interaction with the environment.
It is reasonable to assume that it might be vastly easier to engineer or “artificially evolve” an ini-
tially primitive and simplistic infant robot that could then be left to mature and develop, more or less the
way we all do. Further, the mere observation that in contrast to artificial systems, almost all biolog-
ical systems – to different extents at least – mature and develop, bears the compelling message that
development may be one of the main reasons why the robustness, adaptivity, and versatility of organisms
still transcend those of artificial systems. In humans, for instance, adult skills do not
spring up fully formed at birth, but emerge over a prolonged period of time by learning, and by ex-
periencing the rough-and-tumble environment of the real world in which each individual acquires its
own history (Thelen, 1999). Further, the state of immaturity of sensory, motor, and cognitive systems,
a salient characteristic of development that at first sight appears to be an inadequacy and of which
artificial systems are deliberately devoid, rather than being a problem might be of advantage. There
is evidence showing that early morphological and cognitive limitations effectively decrease the “in-
formation overload” (at a perceptual, motor, and cognitive level) that would otherwise most certainly
overwhelm infants, and may lead – according to a theoretical position pioneered by Turkewitz and
Kenny (1982) – to an increase of the adaptivity of the organism.
It follows that the study of the mechanisms underlying development might yield the key to a deeper
understanding of intelligent systems. There are a number of studies attempting to elaborate such mech-
anisms using connectionist models, such as the one described in the seminal book by Elman et al.
(1996), the study by Dominguez and Jacobs (2003), or the one by Sirois and Mareschal (2002). These
models are “disembodied” as they do not take into account any form of interaction between brain,
body, and real world. It has become increasingly clear, however, that in order to understand (percep-
tual, motor, cognitive) development and the emergence of cognition, it is not possible to entirely bypass
embodiment, that is, the continuous and mutual interaction of brain, body, and environment across mul-
tiple time scales (Eliot, 2001; Goldfield, 1995; Piaget, 1953; Thelen and Smith, 1994). Developmental
robotics strives to fill in this gap.
1.4 Developmental robotics: the short version
Developmental robotics is clearly an intellectual offshoot of embodied artificial intelligence, and as
such incorporates ideas from an equally wide range of disciplines: robotics, artificial intelligence,
developmental psychology, developmental neuroscience, cognitive science, and biology. Probably, a
good way to understand an interdisciplinary science is through its central aims, and its core assump-
tions. However, how are we to go about it in the case of developmental robotics? What are its central
aims? The following section will suggest an answer to the first question. Here, we give two possible
answers to the second one (Fig. 1.2):
• Developmental robotics aims at understanding the development of cognitive processes in natural
and artificial systems, and how such processes emerge and develop through the fluid interplay
of brain, body, and local environment (Fig.1.1). Robots are used as research tools to instantiate
or validate developmental models of cognition and action. By taking into account the embodied
nature of intelligence new hypotheses about natural phenomena are put forward, and predictions
made.
• Developmental robotics aims at conceiving a coherent set of principles to facilitate the design
and construction of intelligent systems. Such principles will eventually lead to a general theory
for developmental systems (Table 1.1).
First, it is important to note that these two goals (one analytic, and one synthetic) are coupled by
a mutually causal relationship. In fact, an understanding of cognition may be tightly linked to the
ability to engineer autonomous intelligent machines. In a sense, this is the essence of the synthetic
methodology (“understanding by building”). Further, it is important to note that developmental robotics
does not aim at mimicking or imitating nature, but only at taking inspiration from it, and at promoting
intuition. As already pointed out, development provides us with a strategy to tackle old issues in novel
ways. No organic lineage, for instance, has been able to avail itself of the possibility of passing
acquired characteristics on to its offspring – an evolutionary hypothesis known as Lamarckian evolution.
From the absence of examples of Lamarckian evolution in nature it is not possible to deduce, however,
that it cannot be employed for constructing robots and other artificial creatures. Rather, the opposite
may be the case. Engineering artificial creatures by means of a developmental approach may indeed
involve a series of iteration-production cycles conceptually similar to Lamarckian evolution in which
[Figure: developmental sciences supply inspirations, intuitions, and modelling to developmental robotics; embodied artificial intelligence and robotics supply robot technology and tools for the synthesis of intelligent systems; the loop yields new hypotheses about natural phenomena and design principles for intelligent systems.]
Figure 1.2: Interaction between developmental sciences, embodied artificial intelligence, and robotics.
newborn agents are initialized with knowledge and control structure acquired by individuals of previous
generations (see Dennett, 1998, for a similar point).
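The flavor of such a Lamarckian iteration-production cycle can be sketched in a few lines (a hypothetical toy example; the task, numbers, and learning rule are invented here, not taken from the thesis): each individual "learns" during its lifetime, and offspring are initialized with the learned, not the inherited, parameters.

```python
# Hypothetical sketch of a Lamarckian iteration-production cycle (task,
# numbers, and learning rule are invented for illustration): offspring are
# initialized with the parameters their parents *acquired* during life.
import random

random.seed(1)
TARGET = 0.75  # invented task optimum a genome should approach

def lifetime_learning(genome, steps=20, step_size=0.1):
    """Hill-climbing 'learning' during one individual's lifetime."""
    x = genome
    for _ in range(steps):
        candidate = x + random.uniform(-step_size, step_size)
        if abs(candidate - TARGET) < abs(x - TARGET):
            x = candidate  # keep only changes that improve performance
    return x

def lamarckian_generation(population):
    """One cycle: learn, select the better half, inherit learned genomes."""
    learned = [lifetime_learning(g) for g in population]
    learned.sort(key=lambda x: abs(x - TARGET))
    best = learned[: len(learned) // 2]
    # offspring start from their parents' acquired characteristics
    offspring = [g + random.uniform(-0.05, 0.05) for g in best]
    return best + offspring

pop = [random.uniform(-2.0, 2.0) for _ in range(8)]
for _ in range(5):
    pop = lamarckian_generation(pop)
print(min(abs(g - TARGET) for g in pop))  # best individual ends near TARGET
```

Because what is passed on is the post-learning state, improvements accumulate across generations far faster than under purely Darwinian inheritance of the natal genome.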
1.5 Design principles of developmental robotics
Is there a theory of developmental robotics? To date there is no definitive answer to this question.
However, en route to such a theory, it is possible to point out a set of principles (or guidelines) aimed at
capturing design ideas and heuristics in a concise and pertinent way, and which could be employed
for actually designing and constructing intelligent autonomous systems. Indeed, courtesy of their
constructive nature, such design principles represent tangible examples – the essence, one might argue
– of the synthetic methodology. A further advantage of such a principled approach stems from the fact
that a set of principles is a flexible entity amenable to extensions, patches, and changes. The idea is
to carefully observe complex systems (natural and artificial) and to seek generic principles of adaptive
behavior based on the assumption that some of those principles might be at work in other systems or at
other levels as well. By devising experimental scenarios to quantify and possibly validate the proposed
principles one is forced to think about the interaction between them, and hence about the interaction
between various aspects of intelligent behavior.
In the field of embodied artificial intelligence, a coherent set of design principles for intelligent
systems has already been proposed by Pfeifer (1996), and was thoroughly discussed in (Pfeifer and
Scheier, 1999). Although all these principles are significant in one way or another for the developmen-
tally inspired design and construction of robots, to design developmental agents a number of additional,
more specific issues need to be addressed. In this sense, the set of design principles for developmen-
tal systems subsequently brought forward does not represent a mere subset, but an extension of the
previously proposed design principles for intelligent systems.
An overview of the proposed principles is given in Table 1.1. In some respects, the table formalizes
in an extremely compact form, a significant part of the insights of the very rich literature of various
fields that is relevant for the design of intelligent developmental systems. It is important to note that
the principles have been deliberately stated in a general way, to help us keep the grand scheme in mind
and not get bogged down in details. Each of the principles can of course be spelled out more explicitly,
and this, in fact, is done in each chapter of the thesis.
1.5.1 The principle of cheap design
This principle asserts that the design of a developmental agent must be parsimonious, and must exploit
the physics of the system-environment interaction, as well as the constraints of the agent’s ecological
niche.
Parsimony (or simplicity) is a general modeling principle (known also as Occam’s razor 1) which
admonishes the designer to choose from a set of otherwise equivalent explanations or models of a
given phenomenon the one that makes the fewest assumptions. The logical implication is that there is
less chance of introducing inconsistencies, ambiguities, and redundancies into the design. In this sense, the design
of autonomous agents should rely more on exploiting the idiosyncrasies of the system-environment
interaction, on the proper choice of materials and morphology (spatial arrangement and properties of
sensors and effectors), as well as on emergence, and less on computation. For an in-depth discussion of
the “principle of cheap design” and many examples, see (Pfeifer and Scheier, 1999, ch.13).
Chapters 3, 4, and 5 provide good illustrations of this principle. These chapters document the
properties of physical entrainment (mutual and rhythmic regulation of the intrinsic dynamics of the
body, and the environment) and of neural entrainment (body-mediated regulation of neural and envi-
ronmental dynamics). Entrainment is a particular form of emergent system-environment coupling that,
if adequately exploited, can simplify control and improve the stability of a system.
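The kind of entrainment alluded to here can be sketched with two Kuramoto-style coupled phase oscillators: uncoupled, each runs at its own intrinsic frequency; coupled strongly enough, they lock onto a common frequency without any explicit controller. The frequencies, coupling strength, and integration scheme below are illustrative assumptions, not the models used in the case studies.

```python
import math

def simulate(coupling, steps=20000, dt=0.001):
    """Two phase oscillators with different intrinsic frequencies,
    symmetrically coupled (Kuramoto-style)."""
    w1 = 2.0 * math.pi * 1.0   # intrinsic frequency of oscillator 1 (rad/s)
    w2 = 2.0 * math.pi * 1.3   # intrinsic frequency of oscillator 2 (rad/s)
    th1, th2 = 0.0, 0.0
    for _ in range(steps):
        d1 = w1 + coupling * math.sin(th2 - th1)
        d2 = w2 + coupling * math.sin(th1 - th2)
        th1 += d1 * dt
        th2 += d2 * dt
    # mean frequency of each oscillator over the whole run (rad/s)
    return th1 / (steps * dt), th2 / (steps * dt)

f1_free, f2_free = simulate(coupling=0.0)   # uncoupled: frequencies stay apart
f1_lock, f2_lock = simulate(coupling=2.0)   # coupled strongly enough: entrained
print(abs(f1_free - f2_free), abs(f1_lock - f2_lock))
```

The point of the sketch is that frequency locking is achieved by the coupling itself, not by any computation dedicated to synchronization, which is exactly the sense in which entrainment can make a design "cheap".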
1.5.2 The principle of ecological balance
This principle states that the agent’s complexity (in this case: its behavioral diversity) has to match the
complexity of the environment as measured by the agent’s sensory apparatus; further, given a certain
task environment, a balance is required between the complexity of the sensor, motor, and control system.
1 “One should not increase, beyond what is necessary, the number of entities required to explain anything.”
Here, the word complexity is used in its intuitive connotation, that is, as the number of components
that can be independently varied in an agent’s sensory, motor, and control system. Such components
are also referred to as degrees of freedom associated with a particular system. For example, a humanoid
robot with 40 mechanical degrees of freedom is mechanically more complex than a two-wheeled mo-
bile robot. For a set of less intuitive descriptions of complexity see (Gell-Mann, 1995), for instance.
The principle also asserts that given a particular task environment, there is a sort of natural point of
equilibrium or balance between the agent’s control structure, the material properties of the agent’s
body, and its morphology (i.e., the agent’s structural characteristics, its sensory-motor setup – accu-
racy, distribution, resolution of actuators and sensors, and so forth). Again, for a thorough discussion
and many instantiations of this principle, refer to (Pfeifer and Scheier, 1999, ch.13).
One of the main difficulties with this principle is its qualitative nature (see also Ishiguro et al.,
2003). First steps in the direction of quantifying the complexity of the agent-environment interaction
(such as perceived through the agent’s sensors) are exemplified in chapters 7, 8, and 9.
1.5.3 The value principle
This principle states that for a developmental process to take place and for an autonomous agent to
behave adaptively in the real world, a set of mechanisms for self-supervised learning, and a repertoire
of basic motivations and values must be provided that shape the development of the agent’s control and
bodily structure.
Value systems clearly satisfy this requirement. They not only modulate learning (via neural or
hormonal signals, for instance), but they also mediate neural plasticity in a self-supervised and self-
organized manner. Their output informs the agent whether an action was good or bad, and depending
on the result, the probability of that action being repeated in the future is either increased or decreased.
Thus, value systems play a pivotal role in adaptive behavior. For more details on value systems and
their relevance for natural and artificial systems, please refer to Chapter 2, and to (Pfeifer and Scheier,
1999, ch.14).
This principle is also about motivation, that is, why behavior happens in the first place. Indeed,
motivation can be thought of as the major driving force of behavior. To date, no generally accepted
answer to this question seems to have been put forward. Research on motivation and emotion is
highly relevant in this context, because emotions – like values – play a primary causal role in per-
ception and action, and in shaping experience (Breazeal, 2002; Manzotti, 2000; Pfeifer, 2000; Picard,
1997). In human infants, for instance, emotions have been hypothesized to protect the integrity of the
body, to guide perception, activity, and learning, and to regulate social interaction with other agents or
people (Trevarthen, 1993).
Further examples of this principle are given in chapters 3, 4, and 6. In these chapters the exploration
of the parameter space associated with the neural system is driven by a value system. This principle is
strongly tied to the “principle of exploratory activity.”
1.5.4 The principle of design for emergence
This principle asserts that the agent should not be completely designed, but rather should be endowed
with the ability to self-direct the exploration of its own sensory-motor capabilities, and with means to
escape its limited in-built behavioral repertoire, and to acquire its own history.
One of the basic tenets of developmental robotics is that the designer should not try to “code intel-
ligence” directly into the artificial system – in general an extremely hard problem. Instead, the system
should be equipped with an appropriate set of basic mechanisms or features to autonomously develop,
learn, and behave in a way that appears intelligent to an external observer. Agent-related features (pa-
rameters) in this case can be anatomical (e.g., body, materials, characteristics of the sensors) as well as
related to control (e.g., neural). Clearly, it is not trivial to decide which features and mechanisms have
to be innately fixed at the outset, and which ones should be learned or trained up by the interaction of
the system with its local environment. This principle asserts that by relying more on emergence, the
choice of the ensemble of basic skills and mechanisms is not as important as generally thought.
On the contrary, it is more important not to completely specify at the outset every aspect of the
agent’s design, but rather to endow the agent (a) with a minimal set of mechanisms to self-direct the
exploration of its own sensory-motor capabilities, and (b) with means to escape its limited built-in
behavioral repertoire, and to acquire its own personal history. In other words, the designer should
design for emergence. This means (by definition) that it will not be possible to predict the system’s
behavior through analysis at any level simpler than the system as a whole. One of the main advantages
of systems designed for emergence – in contrast to systems in which emergence is not possible – is
that they tend to be more adaptive and robust against environmental perturbations and changes (such
as growth, or task modifications). It is important to note that here “emergence of behavior” has a
pragmatic connotation, that is, in the sense of not being pre-programmed. Thus, the final (emergent)
structure is the result of the history of the interaction of the agent with its environment, be it simulated
or real.
The emergence of structured patterns or global order from local interactions between the components
of a system, without the need for explicit instructions, is a characteristic feature of self-organization.
The process of self-organization can lead either to permanent changes in the system (self-organization
with structural changes), or to the reversible formation of patterns (self-organization without structural
changes). The latter form of self-organization is frequently found in collective phenomena (Pfeifer and
Scheier, 1999). Finally, we note that emergence is always the result of a system-
environment interaction, and therefore a matter of degree. This means that behaviors are typically
neither completely emergent nor completely preprogrammed. The further removed from the actual
behavior the designer's commitments are made, the more the resulting behavior is called “emergent.”
This principle is related to the “principle of self-organization” discussed in (Pfeifer and Scheier, 1999,
ch.14).
As in the case of the “principle of cheap design”, chapters 3, 4, and 5 provide good illustrations of
the principle discussed in this subsection. As exemplified by those chapters, entrainment can, in certain
instances, cause coordinated movements to emerge from the interaction of control structure, body struc-
ture, and the surrounding local environment (e.g., Kelso, 1995; Taga, 1991). Chapter 3 also gives concrete
evidence for abrupt phase transitions from one stable pattern to another. It suffices to note here
that such phase transitions are a typical property of emergent design and are often observed in natural
systems.
1.5.5 The time scales integration principle
This principle states that when designing a developmental agent, a number of different time scales
exist that have to be taken into account; developmental and learning mechanisms must be conceived
to achieve a smooth integration of those time scales.
Neural dynamics; body dynamics; learning through trial-and-error, reinforcement, and observa-
tion; development of brain and body; evolutionary adaptations, and other dynamic processes and com-
ponents contributing to behavior are all characterized by different time scales. Neural dynamics, for
instance, is based on neural activity and transient (short-term) synaptic changes necessary for perceiv-
ing and acting, and on permanent (long-term) changes resulting from learning. Behavior, however,
seems to be flexibly self-organized at all these time scales (e.g., Goldfield, 1995; Kelso, 1995; Thelen
and Smith, 1994). Clearly, in a sufficiently complex system there is no characteristic “global” time
scale; rather, all processes and sub-processes proceed on a wide range of time scales, and all time scales
are integrated and continuously meshed together. It is therefore important not to gloss over the link-
ages between the various time scales when designing or constructing a behaving system. For example,
we can define the oscillation frequency of a neural rhythm generator, but as soon as the generator is
coupled to the mechanical system through afferent sensory feedback, the oscillation frequency changes
– refer to (Grillner, 1985), or to Chapter 3 for concrete examples of this phenomenon. And conversely,
we can fix the mass of a limb and theoretically predict its pendular oscillation frequency; however, as
soon as we couple it to a neural system, we change its natural (intrinsic) oscillation frequency. This
example also suggests that some time scales can be set by deliberate design decisions, whereas others
cannot.
Sometimes, time scales have to be chosen carefully. Take motor skill acquisition, for instance. It
has been demonstrated that initially, while learning a new skill or movement, the peripheral degrees of
freedom (the ones further from the trunk, such as wrist and ankle) are reduced to a minimum through
tight joint coupling (freezing of degrees of freedom). Subsequently, the restrictions at the periphery are
gradually weakened so that more complex movement patterns can be explored (freeing of degrees of
freedom). The designer of a developmental system is forced to address issues such as at what point in
time the system should freeze, and when it should unfreeze in order to simplify motor skill acquisition.
Morphological changes (here: freezing and unfreezing of degrees of freedom) are a form of plastic
mechanism, and as with any mechanism of plasticity they have their own characteristic time scale. But
then, how should the various time constants be chosen?
This principle is addressed in chapters 3, 4, and 5, which discuss the interaction between various
types of dynamics having different time scales.
1.5.6 The starting simple principle
This principle asserts that a gradual and well-balanced increase of both the agent’s internal complexity
(perceptual and motor) and its external complexity (regulated by the task environment or an instructor)
speeds up the learning of tasks and the acquisition of new skills, compared to an agent that is complex
from the outset. Also, the mechanisms by which the agent's complexity can be successively increased
and integrated with its neural and morphological dynamics need to be specified.
The rationale of this principle is that the co-development of an agent’s sensory, motor, and control
structures, and of its environmental setup, while not necessarily leading to an optimal task performance,
guarantees a more efficient exploration of the agent’s sensory and motor space. For example, the
initial immaturity and wide spacing of photoreceptors in the infant retina, as well as limitations on the
accommodative system, significantly limit what the infant sees. The specific effect is to filter out high
spatial frequency information, and to make objects that are close to the infant most salient. It is not
unreasonable to assume, however, that such limitations may facilitate, for instance, the learning about
size constancy (see Turkewitz and Kenny, 1982). More generally, early morphological constraints and
cognitive limitations can increase the adaptivity of a developing system. That is, the
immaturity of the sensory, motor, and cognitive systems, which at first sight appears to be an inadequacy,
is in fact an advantage, because it effectively decreases or eliminates the “information overload”
that would otherwise most certainly overwhelm the infant. Following similar lines of argumentation,
several other researchers have suggested that the processing limitations of young learners, originating
from the immaturity of the neural system, can actually be beneficial for the acquisition of new skills
and the learning of tasks (Bjorklund and Green, 1992; Dominguez and Jacobs, 2003; Newport, 1990;
Elman, 1993; Westermann, 2000).
The primary difficulty with this principle is the lack of flexibility it affords. This deficiency gives
rise to the constraint-flexibility dilemma: The more constrained a system is, the less flexible it be-
comes. Indeed, on the one hand, the introduction of constraints reduces the number of parameters of
a learning problem, or the space of possible limb configurations, thus speeding up learning; on the
other hand, such constraints may preclude the system from exploring potentially interesting parameter
sets or movement patterns.
This design principle is related to the “principle of ecological balance”, and to the “principle of
cheap design.” It pertains to the principle of ecological balance in the sense that if the system starts
simple (i.e., its internal and external complexity are kept low), it is probably easier to make sure that
the system is ecologically balanced at all instants in time during its developmental history. It relates to
the principle of cheap design because it strives for simplicity and parsimony. However, it is important
to note that this principle does not propose a minimalist approach. Starting simple by no means
implies that the system needs to be trivial. The system needs to be as simple as possible, but not simpler.
Chapters 3 and 4 explicitly address the “starting simple principle.” More specifically, the two chap-
ters document a comparative analysis between the outright use of four out of four degrees of freedom,
and the progressive involvement of all degrees of freedom by using a developmental mechanism of
freezing and freeing (two degrees of freedom are initially blocked, and subsequently released), such
as hypothesized by Bernstein (1967). The results show that, in the case of reduced task complexity, such
a mechanism might indeed simplify the emergence of stable movement patterns.
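A toy version of such a freezing/freeing schedule can be sketched as a staged stochastic search: two "distal" parameters are clamped during an initial phase and released later. The objective function, the schedule, and the hill-climbing rule are illustrative assumptions, not the controllers studied in chapters 3 and 4.

```python
import random

def staged_search(objective, n_dof=4, frozen=(2, 3), steps=400, release_at=200):
    """Hill climbing over joint parameters with Bernstein-style staging:
    the 'distal' parameters in `frozen` are clamped until `release_at`."""
    params = [0.0] * n_dof
    best = objective(params)
    for t in range(steps):
        if t < release_at:                         # freezing phase
            active = [i for i in range(n_dof) if i not in frozen]
        else:                                      # freeing phase
            active = list(range(n_dof))
        candidate = params[:]
        i = random.choice(active)
        candidate[i] += random.gauss(0.0, 0.1)     # small random adjustment
        score = objective(candidate)
        if score > best:                           # keep improvements only
            params, best = candidate, score
    return params, best

# illustrative task: reach a hypothetical optimum at (1, 1, 0.5, 0.5)
target = [1.0, 1.0, 0.5, 0.5]
def objective(p):
    return -sum((a - b) ** 2 for a, b in zip(p, target))

random.seed(1)
params, best = staged_search(objective)
print(best)   # close to the maximum of 0
```

The design choice mirrors the constraint-flexibility dilemma discussed above: during the frozen phase the search space is smaller and progress on the proximal parameters is faster, at the cost of temporarily excluding configurations that involve the distal ones.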
1.5.7 The principle of information self-structuring
This principle asserts that an embodied agent is not passively exposed to information from its surround-
ing environment, but due to its particular morphology, and through its actions on the environment, it
actively structures, selects, and exploits such information; it is crucial to take this characteristic into
consideration at design time. By information self-structuring we mean that the agent has an active role
in shaping its own sensory input through self-produced movements.
The first important, information-theoretic, implication of this principle is that embodiment (i.e.,
sensory morphology, embodied interaction, and so on) directly affects the information-processing ca-
pacities of the agent’s control system, in the sense that it allows the agent to generate constraints in
its sensory input. Indeed, appropriate morphological constraints and self-produced coordinated move-
ments can induce spatio-temporal patterns in the raw (unprocessed) sensory input, and generate “good”
data that are easier to process, simplifying the control problem faced by the agent. The choice of a par-
ticular sensory morphology, for instance, has been shown to improve the learning performance and the
adaptability of a neural network controller embedded in a robot (Lichtensteiger and Pfeifer, 2002). Fur-
ther, it has been hypothesized that specific ways of interacting with the environment induce constraints
in the agent’s sensory input that can be exploited for learning (Scheier et al., 1998; Lungarella and
Pfeifer, 2001). In humans, experiments by Harman et al. (1999) have shown that the active manipula-
tion of objects by adult subjects can promote perceptual learning and object recognition. Mapped onto a
neural context this means that it is easier for neural circuits to exploit sensory data having informational
regularities, and to stabilize neural connections that incorporate recurrent statistical features (Sporns
and Pegors, 2004). It is therefore plausible to assume that to simplify neural computations, natural
systems are optimized, at evolutionary, developmental and behavioral time scales, to structure their
sensory input through self-produced coordinated motor activity. This characteristic of neural systems
has to be taken into account when designing artificial systems.
There is a second, equally important implication of this principle: The embodiment of an agent
generates, over time, correlations and redundancies across multiple sensory modalities, which may not
only lead to a disambiguation of the sensory input and to a reduction of the effective dimensionality of
the sensory space, but could also be exploited to bootstrap concept formation, categorization, and other
high-level cognitive processes (Thelen and Smith, 1994). From a design point of view it is crucial to
note that along with the interaction, the location of the agent’s sensors also imposes constraints on the
sensory input, and that sensors should be positioned so as to provide redundant information about the
world.
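One simple way to quantify the informational structure discussed above is the mutual information between successive sensor readings: smooth, self-produced scanning of a scene leaves far more temporal structure in the input than uncoordinated jumping. The one-dimensional "scene" and the two movement strategies below are illustrative assumptions, not the measures or experiments of chapters 7 to 9.

```python
import math
import random
from collections import Counter

def mutual_info(seq):
    """Empirical mutual information (bits) between consecutive symbols."""
    pairs = list(zip(seq, seq[1:]))
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in pxy.items():
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

random.seed(2)
# a smooth, hypothetical one-dimensional "visual scene"
env = [int(4 + 3 * math.sin(2 * math.pi * i / 50)) for i in range(200)]

# coordinated scanning: the sensor sweeps smoothly across the scene
scan = [env[t % len(env)] for t in range(5000)]
# uncoordinated sampling: the sensor jumps to random positions
jumps = [env[random.randrange(len(env))] for _ in range(5000)]

print(mutual_info(scan), mutual_info(jumps))
# coordinated movement leaves far more temporal structure in the input
```

The scene and the sensor are identical in both conditions; only the self-produced movement differs, which is precisely the sense in which the agent structures its own sensory input.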
This principle is addressed in chapters 7, 8, and 9, and in (Lungarella and Sporns, 2004), where it
is fleshed out with concrete examples. Methods for quantifying the informational structure of sensory
and motor data are also presented.
1.5.8 The principle of exploratory activity
This principle states that exploratory activity is a fundamental active process by which an embodied
agent collects information for learning about its own body, and for mastering the interaction with its
surrounding environment. It is thus necessary to equip the agent with a set of mechanisms for performing
exploratory actions and for evaluating their results.
In the context of this principle, we make the distinction between two types of exploratory activities,
or exploration strategies: (a) external exploration, that is, a set of active processes by which an agent
gathers information for learning about the surrounding world; and (b) internal exploration (or self-
exploration), that is, a set of active processes by which an agent gathers information for learning about
its own body dynamics (see also Chapter 2). The first type of exploratory activity is oriented toward
the maximization of task-relevant sensory information. For example, hand-eye coordination begins
to develop between two and four months, inaugurating a period of trial-and-error practice at sighting
objects and grabbing them. At four months most infants can grasp an object that is within reach,
looking only at the object and not at their hands. Such a result is truly impressive and, as asserted by
the principle of exploratory activity, the outcome of unceasing exploratory actions performed by the
infant. The second type of exploration strategy is oriented toward maximizing body-relevant information
and movement variety. For example, by exploring force and timing combinations, and by integrating
various kinds of environmental information, a multitude of patterns of inter-segmental coordination can
be learned. Newborn infants, for instance, have been observed to spend up to 20% of their waking hours
contacting their face with their hands (Korner and Kraemer, 1972). In analogy to vocal babbling, this
experiential process has been called circular reaction (Piaget, 1953), “motor babbling” (Bullock et al.,
1993), or “body babbling” (Meltzoff and Moore, 1997). Essentially, it represents an exploratory process
of perception-action coupling during which body-related (proprioceptive) information generated by
perceiving and acting on objects and environment becomes correlated. The result is a unique mapping
between limb movements and limb configurations. Other examples of exploratory processes are the
rhythmical stereotypies displayed by infants (kicking, waving, banging, bouncing) and spontaneous
movement activity (Angulo-Kinzler, 2001; Thelen and Smith, 1994).
It is important to note that such exploratory processes include varying levels of attention (see also
Adolph et al., 2000). Thus, an agent should be endowed with adequate mechanisms that depending
on the state of its attentional system would allow it to select different kinds of exploratory behaviors.
On one extreme of inattentiveness are spontaneous wiggles, thrashes, and movement stereotypies which
produce sensory information related to body and posture, that is, position of limbs and body relative to
gravity and the supporting surface. In a sense, almost all sensory-motor coordinated movements belong
to this category. On the other extreme of focused attention are concerted, directed movements whose
purpose is to generate and garner additional information about possibilities for action, or properties of
objects. Situated somewhere in the middle of this continuum are casual exploratory scans (e.g., visual
exploration while walking) and information gathering movements which are the byproduct of another
ongoing action (the walking movements themselves generate visual flow, vestibular, and kinesthetic
information about the status of the body relative to the environment, for instance).
From a design perspective it is useful to identify the spectrum of possible types of exploratory
activity. That is, given a certain task environment, what is the best way for the agent to explore its
parameter space? Plausibly, pure random exploration is inefficient, even if exhaustive in the long run.
Conversely, pure systematic exploration requires too much time, and the system may end up exploring
uninteresting areas of its parameter space. The “value principle” comes to the rescue: random exploration
coupled to a value system evaluating the outcome of a particular action may indeed provide a satisfactory
path.
The primary difficulty with this principle is the exploration-exploitation dilemma which every
learning system has to cope with. That is, the dilemma between exploring the space of possible pa-
rameters (e.g., weights of an artificial neural network, time constants of a neural oscillator, and so on),
while simultaneously exploiting the good parameter configurations that exploration has already uncov-
ered. This dilemma is also known as the stability-plasticity dilemma or diversity-compliance trade-off.
It refers to a trade-off between a conservative aspect that exploits (complies with) the givens (compliance
with rules), and one that is responsible for generating the diversity required to remain adaptive. In other
words, there is always a trade-off between generating new solutions, being flexible and innovative, and
complying with the existing rules, exploiting what is already known.
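The trade-off can be sketched with the classic epsilon-greedy strategy on a multi-armed bandit, one standard (and here purely illustrative) resolution of the dilemma: with probability epsilon the agent explores a random action, otherwise it exploits its current best estimate. The payoff probabilities and parameter values below are assumptions for illustration.

```python
import random

def run_bandit(epsilon, pulls=2000, seed=3):
    """Epsilon-greedy on a 3-armed bandit: with probability epsilon pick a
    random arm (explore), otherwise pick the best estimate so far (exploit)."""
    rng = random.Random(seed)
    means = [0.2, 0.5, 0.8]      # hidden payoff probabilities (assumed)
    est = [0.0, 0.0, 0.0]        # running reward estimates
    n = [0, 0, 0]
    total = 0
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(3)                       # explore
        else:
            arm = max(range(3), key=lambda a: est[a])    # exploit
        reward = 1 if rng.random() < means[arm] else 0
        n[arm] += 1
        est[arm] += (reward - est[arm]) / n[arm]         # incremental mean
        total += reward
    return total / pulls

greedy = run_bandit(epsilon=0.0)    # pure exploitation: locks onto the first arm
mixed = run_bandit(epsilon=0.1)     # a little diversity finds the best arm
print(greedy, mixed)
```

The purely exploitative agent complies with whatever it learned first and never revises it; a small amount of sustained diversity is enough to discover the better option, which is the essence of the diversity-compliance trade-off.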
The thesis highlights the importance of exploratory activity from the perspective of dynamical
systems (chapters 3, 4, and 5), and from an information-theoretic and statistical point of view (chap-
ters 7, 8, and 9). An in-depth view of this principle is provided in Chapter 2 and in Chapter 6, where
we first further motivate its necessity for intelligent adaptive behavior, and then present a value-based
stochastic exploration scheme.
1.5.9 The principle of social interaction
This principle states that when designing a developmental agent, it is important to think about potential
social interactions of the agent, and about mechanisms that can be exploited to implement socially
mediated learning; these mechanisms have to be taken into account at design time.
It is well known that human infants are endowed from a very early age with the necessary means
to engage in simple, but nevertheless crucial social interactions, e.g., they show preferences for hu-
man smell, human faces and speech (Johnson, 1997; Nadel and Butterworth, 1999), and they imitate
protruding tongues, smiles, and other facial expressions (Meltzoff and Moore, 1977). Indeed, social
interaction bears many potential advantages: (a) it helps structure the agent's environment, simplifying
and speeding up the learning of tasks and the acquisition of new skills; (b) it shapes the agent's develop-
mental trajectory and epigenetic landscape, and increases its behavioral diversity. Scaffolding by a
more capable agent or caregiver, or imitation of a peer, for instance, can reduce distractions and bias
explorative behaviors toward important environmental stimuli. The caregiver can also increase or de-
crease the complexity of the task. A particularly important type of interaction is scaffolding. Typically,
it is employed to shape and guide the development of infants. Scaffolding is a supportive framework,
usually provided by a more capable agent (e.g., an adult), that enables a less capable (infant) agent to
perform activities of which it may not be capable on its own until somewhat later. For example, infants
demonstrate the ability to walk if they are supported in the right way long before their leg muscles have
developed sufficient strength to hold them up (Thelen, 1981). The scaffolding continuously pushes the
infant agent a little beyond its current capabilities, and pushes it in the direction in which its “caregiver”
wishes it to go.
In essence, this principle addresses the question of how to prepare (engineer) an agent's local envi-
ronment so that the agent can acquire new and progressively more complex skills over time. Although
we acknowledge the crucial importance of social interaction for the emergence and development of
cognitive structure in man and machines (see Chapter 2 for a detailed discussion of the issue), in the
context of this thesis we deliberately chose to avoid touching on socio-historical aspects of develop-
ment.
1.5.10 Discussion
Although it is undeniably true that the proposed set of principles does capture some of the essential
aspects of adaptive, developmental systems, it is also most likely the case that this set is neither closed
nor complete. Further, it is important to emphasize that despite being bolstered by empirical evidence,
these principles are still to be regarded as working hypotheses on the nature of developmental systems.
In fact, it remains to be seen whether they can stand up against further (possibly harder) empirical
testing. While being subjected to such tests, the proposed principles of design should also enable the
generation of testable hypotheses (see Fig. 1.2). In other words, they should not only be effective as
design heuristics, but they should also help us pose interesting questions. Moreover, because one of
the purposes of these principles is to appropriately characterize intelligent developmental systems, they
should allow us to make predictions, and suggest new experiments to perform with natural systems
(e.g., animals and humans), as well as with artificial ones.
It is essential that the design principles we have formulated not be looked at in isolation. Their
real power comes from their interdependencies. All the principles, in fact, are connected because they
all pertain to developmental agents embedded in their task environments. Some principles are more
closely related than others, however. In what follows a few examples of dependencies are given. Let
us start with the value principle which states, among other things, that for a developmental process
to take place, and for an agent to behave adaptively, a repertoire of basic values and motivations of
sorts must be provided by the designer. The idea of providing value is to increase, at a later point
in time, the probability that the organism will behave in a certain way if a similar situation occurs.
Whereas motivation drives behavior, values serve as implicit or explicit evaluators of behavior. Such
values clearly represent a necessary condition for social interaction, and can be exploited for socially
mediated learning. An agent, for instance, could be endowed with the means to discern a smile from
a non-smile, and thus be able to visually acquire positive or negative feedback from a caretaker or a
tutor. Basic motivations and values constitute also the engine of an agent’s exploratory activity (as long
as such activity is not completely random or systematic), and of its behavioral diversity. Self-directed
activity, albeit not necessarily oriented toward a functional goal, is also likely to induce spatial and
temporal structure across different sensory modalities. By adequately exploiting the self-structuring of
information and recurrent patterns in the system-environment interaction, it might be possible to start
simpler, and to reduce the complexity of the control structure. Besides the starting simple principle,
this is also clearly related to the principles of cheap design and ecological balance.
As can be inferred from this short discussion, every principle is affected simultaneously by other
Principle: Synopsis

Cheap design: The design of a developmental agent must be parsimonious, and must exploit the physics of the system-environment interaction, as well as the constraints of the agent’s ecological niche.

Ecological balance: The agent’s complexity must match the complexity of the environment as measured by the agent’s sensors; further, a balance is required between the complexity of motor, sensory, and control system.

Value: For a developmental process to take place and for an agent to behave adaptively in the real world, a set of mechanisms for self-supervised learning, and a repertoire of innate values and motivations must be provided that direct the development of the agent’s control and bodily structure.

Design for emergence: The agent should not be completely designed, but rather should be endowed with the ability to self-direct the exploration of its own sensory-motor capabilities, and with means to escape its limited built-in behavioral repertoire, and to acquire its own history.

Time scale integration: When designing a developmental agent, a number of different time scales exist that have to be taken into account; developmental and learning mechanisms must be conceived to achieve a smooth integration of those time scales.

Starting simple: A gradual and well-balanced increase of the agent’s internal complexity and of its external complexity speeds up learning of tasks and acquisition of new skills; the mechanisms by which the agent’s internal and external complexity can be successively increased and integrated with its neural and morphological dynamics need to be specified.

Information self-structuring: An embodied agent does not passively absorb information from its surrounding environment, but due to its particular morphology, and through its actions on the environment, it is able to actively structure, select, and exploit such information; this characteristic has to be taken into account at design time.

Exploratory activity: Exploratory activity is a fundamental process by which an agent collects information for learning about its own body and control structure, and for mastering the interaction with its surrounding environment. Thus, it is necessary to equip the agent with a set of mechanisms to perform exploratory actions, and for evaluating and exploiting the results at its disposal.

Social interaction: When designing a developmental agent, it is important to think about potential social interactions of the agent, and about mechanisms that can be exploited to implement socially mediated learning; these mechanisms have to be taken into account at design time.
Table 1.1: Overview of design principles for developmental systems.
ones. Such couplings make their investigation challenging. The aim of this thesis is to substantiate
the proposed principles and their mutual interdependencies, and to flesh them out by means of a series
of experimental case-studies (see Fig. 1.3).
Figure 1.3: Experimental variety: seven chapters, seven case-studies. The labels denote one or two design principle(s) the case-study is mainly intended to address. The numbers indicate the chapter in which the case-study is presented.
1.6 Contributions of the thesis
Adopting a developmental approach leads to a novel perspective on the design and construction of
robots, and to a promising methodology for modeling many of the processes and mechanisms under-
lying development. This thesis advances the nascent field of developmental robotics in several ways:
• it proposes a set of design principles for developmental systems which may constitute an ad-
equate “take-off platform” for a developmental theory of embodied artificial intelligence (this
chapter);
• it gives a state-of-the-art survey of the emergent field of developmental robotics (Fig. 1.2), and
points out ten aspects (“facets”) of biological development that should be addressed to advance
the field (Chapter 2);
• it shows how robots can successfully be employed to confirm hypotheses or observations made
in developmental or movement science (chapters 3, 4, and 5);
• it shows how initially freezing and subsequently freeing degrees of freedom can increase (a) the
likelihood of physical entrainment, (b) the range of parameters that lead to stable behavior, (c)
the robustness of the system against external perturbations, and (d) the speed and efficiency of
the exploration of the sensory-motor space (Chapter 3);
• it shows – confirming recent observations made in movement science – that if the complexity
of the task is increased, a single phase of freezing and freeing is not sufficient, and alternating
freezing and freeing is necessary (Chapter 4);
• it quantitatively investigates the role of the coupling (a) between joints, and (b) between the sen-
sory apparatus and the neural structure for the acquisition of rhythmic motor skills (chapters 3, 4,
and 5);
• it proposes two value-based exploration schemes, and shows how they can be put to work to
explore the parameter space of a robot’s control system (chapters 3 and 6);
• it provides quantitative support for the assertion that embodied systems are not passively exposed
to sensory information, but courtesy of their morphology can self-structure such information
(chapters 7, 8, and 9);
• it presents initial analyses demonstrating how simple sensory-motor functions like gaze direction
and foveation can generate informational structure (in this case, mutual information) in the visual
channel (Chapter 7);
• it shows how information theoretic and statistical measures can be used to quantify (a) the
amount of informational structure induced by sensory-motor coordination (chapters 7 and 9),
and (b) the agent-environment interaction (chapters 8 and 9).
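As a hint of how such information-theoretic quantification works in general, mutual information between two discretized sensor channels can be estimated from their joint histogram. The sketch below is generic (it is not the analysis code used in the case-studies): two perfectly coupled binary channels carry one full bit of mutual information, while independent channels carry none.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Estimate mutual information (in bits) between two discrete sequences."""
    assert len(xs) == len(ys)
    n = len(xs)
    pxy = Counter(zip(xs, ys))   # joint histogram
    px = Counter(xs)             # marginal of channel x
    py = Counter(ys)             # marginal of channel y
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        p_indep = (px[x] / n) * (py[y] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi

coupled = mutual_information([0, 1, 0, 1], [0, 1, 0, 1])      # one full bit
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])  # zero bits
```

Sensory-motor coordination that induces statistical dependencies between channels shows up as an increase of exactly this kind of measure.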
Chapter 2
Developmental Robotics: The Long
Version1
Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce
one which simulates the child’s? Bit by bit one would be able to allow the machine to make more
and more “choices” or “decisions.” One would eventually find it possible to program it so as to
make its behaviour the result of a comparatively small number of general principles. When these
became sufficiently general, interference would no longer be necessary, and the machine would
have “grown up.” (Turing, 1948)
2.1 Synopsis
Developmental robotics is an emerging field located at the intersection of robotics, embodied artificial
intelligence, and developmental science. This chapter elucidates the main reasons and key motivations
behind the convergence of fields with seemingly disparate interests, and shows why developmental
robotics might prove to be beneficial for all fields involved. The advocated methodology is synthetic
and two-pronged: on the one hand it employs robots to instantiate models originating from develop-
mental sciences; on the other hand, by exploiting insights gained from studies on ontogenetic develop-
ment, it aims at developing better robotic systems. This chapter gives a survey of the relevant research
issues, and points to some future research directions.
1Appeared as Lungarella, M., Metta, G., Pfeifer, R. and Sandini, G. Developmental robotics: a survey. Connection Science, 15(4), pp. 151-190, 2004.
2.2 Introduction
Developmental robotics is an emergent area of research at the intersection of robotics and developmen-
tal sciences – in particular developmental psychology and developmental neuroscience. It constitutes
an interdisciplinary and two-pronged approach to robotics, which on one side employs robots to instan-
tiate and investigate models originating from developmental sciences, and on the other side seeks to
design better robotic systems by applying insights gained from studies on ontogenetic development.2
Judging from the number of recent and forthcoming conferences, symposia, and journal special
issues, it is evident that there is growing interest in developmental robotics: Workshop on Emergence
and Development of Embodied Cognition (Berthouze et al., 1999; Pfeifer and Lungarella, 2001), work-
shop on Epigenetic Robotics (Balkenius et al., 2001; Prince et al., 2002, 2003; Berthouze and Metta,
2004), workshop on Developmental Embodied Cognition (Westermann et al., 2001), workshop and
conferences on Development and Learning (Weng, 2000; Elman et al., 2002; Triesch and Jebara, 2004),
special issue of Adaptive Behavior on “plastic mechanisms, multiple timescales and lifetime adapta-
tion” (Di Paolo, 2002), the one on “Epigenetic Robotics” (Prince and Demiris, 2003), and a special
issue of Connection Science (Berthouze and Ziemke, 2003). There are at least two distinct driving
forces behind the growth of the alliance between developmental psychology and robotics:
• Engineers are seeking novel methodologies oriented toward the advancement of robotics, and the
construction of better, that is, more autonomous, adaptable and sociable robotic systems. In that
sense, studies of cognitive development can be used as a valuable source of inspiration (Brooks
et al., 1998; Metta, 2000; Asada et al., 2001).
• Robots can be employed as research tools for the investigation of embodied models of develop-
ment. Neuroscientists, developmental psychologists, and also engineers, may gain considerable
insights from trying to embed a particular model into robots. This approach is also known as
synthetic neural modeling, or synthetic methodology (Pfeifer and Scheier, 1999; Pfeifer, 2002;
Reeke et al., 1990; Sandini, 1997; Sporns, 2003).
The research methodology advocated by developmental robotics is very similar to the one sup-
ported by epigenetic robotics. The two research endeavors not only share problems and challenges
but are also driven by a common vision. From a methodological point of view both partake of a
biomimetic approach to robotics known as biorobotics, which resides at the interface of robotics and
biology. Biorobotics addresses biological questions by building physical models of animals, and strives
to advance engineering by integrating aspects of animal sensory systems, biomechanics and motor con-
trol into the construction of robotic systems (Beer et al., 1998; Lambrinos et al., 1997; Sharkey, 2003;
2Ontogenetic development designates a process during which an organism develops from a single cell into its adult form.
Webb, 2001). There is, however, at least one important difference of emphasis between epigenetic
robotics and developmental robotics: While the former focuses primarily on cognitive and social de-
velopment (Zlatev and Balkenius, 2001), as well as on sensory-motor environmental interaction (Prince
and Demiris, 2003), the latter encompasses a broader spectrum of issues, by investigating also the ac-
quisition of motor skills, and the role played by morphological development. In the context of this
review, the difference will not be stressed any further.
The primary goal of this chapter is to present an overview of the state of the art of developmental
robotics (and hence of epigenetic robotics), and to motivate the usage of robots as “cognitive” or
“synthetic” tools, that is, as research tools to study and model the emergence and development of
cognition and action. From a methodological point of view, this review is not intended to be critical.
Developmental robotics is still in its infancy, and an indication of the pros and cons of specific pieces of
research may be premature. We hope, however, that the review will offer new perspectives on certain
issues and point out areas in need of further research. The secondary goal is to uncover the driving
forces behind the growth of developmental robotics as a research area, and to expose its hopefully
far-reaching implications for the design and construction of robotic systems. We advocate the idea that
ontogenetic development should not only be a source of inspiration but also a design alternative for
roboticists, as well as a new and powerful tool for cognitive scientists.
In the following section, we make an attempt to trace back the origins of developmental robotics,
which we believe are to be found in the rejection of the cognitivistic paradigm by many scholars of
artificial intelligence. Next, we present our working definition of ontogenetic development, and sum-
marize some of its key aspects. In the following sections, we give an overview of the various current
and past research directions (including motivations and goals), show who is doing or has been doing
what and to what purpose, and discuss the implications of the developmental approach for robotics
research. In the final section, we point to future research directions and conclude.
2.3 In the beginning there was the body
In an ever-growing number of fields there is an ongoing and intense debate about the usefulness of
taking into account ideas of embodiment, i.e., the claim that having a body, which mediates perception
and affects behavior, plays an integral role in the emergence of human cognition. Scholars of arti-
ficial intelligence, artificial life, robotics, developmental psychology, neuroscience, philosophy, and
other disciplines, seem to agree on the fact that brain, body and environment are reciprocally coupled,
and that cognitive processes arise from having a body with specific perceptual and motor capabilities,
interacting with and moving in the real world (Beer et al., 1998; Brooks, 1991; Clark, 1997; Hendriks-
Jensen, 1996; Lakoff and Johnson, 1999; Pfeifer and Scheier, 1999; Sporns, 2003; Thelen and Smith,
1994; Varela et al., 1991). This paradigm stands in stark contrast to the mind-as-computer metaphor
Figure 2.1: Examples of robots used in developmental robotics. From left to right, top to bottom: BabyBot (LiraLab), BabyBouncer (AIST), Infanoid (CRL), COG (MIT).
advocated by traditional cognitive science, according to which the body is seen as an output device that
merely executes commands generated by a rule-based manipulation of symbols, which are associated
with an internal representation of the world (Fodor, 1981; Newell and Simon, 1976). Perception is
largely seen as a means of creating an internal representation of the world rich enough to allow rea-
soning and cognizing to be conceptualized as a process of symbol manipulation (computer program),
which can take place entirely in the mind. One of the most unfortunate consequences of the mind-as-
computer metaphor for cognitive science and artificial intelligence in general, and for developmental
psychology and robotic research in particular, has been the tacit acceptance of a strong separation be-
tween cognitive structure (i.e., symbols and representations), the software operating on that structure
(i.e., mechanisms of attention, decision making and reasoning), and the hardware on which to im-
plement the software (Bates and Elman, 2002; Brooks, 1991; Pfeifer and Scheier, 1999; Thelen and
Smith, 1994). Another assumption of the cognitivistic research paradigm was a denial of the impor-
tance of ontogenetic development by rationalists-nativists (Chomsky, 1986; Keil, 1981). In the field of
language acquisition, for instance, Chomsky theorized that all languages derive from a universal gram-
mar, somehow encoded in our genome. The purpose of development and learning was to merely fine
tune some parameters to a specific language. The same cognitivistic approach also hypothesized accu-
rate, symbol-based representations of the real world (Newell, 1990), as well as task-specific models of
information processing and reasoning (Pylyshyn, 1984).
Out of dissatisfaction with the direction (cognitive) psychology was heading, and to overcome
the limitations inherent to the rather artificial distinction of the developmental phenomena into domain-
specific competencies and modules (Fodor, 1983), Masao Toda proposed the study of “Fungus Eaters”,
i.e., simple but nevertheless complete and autonomous creatures endowed with everything needed to
behave in the real world (Toda, 1982). Around the same time Braitenberg (1984) defined the “law
of uphill analysis and downhill synthesis” 3 and argued for the introduction of a novel methodology
in psychology, which he called “synthetic psychology.” Two similar approaches followed: “synthetic
neural modeling” (Reeke et al., 1990), which attempts to correlate neural and behavioral events taking
place at multiple levels of organization, and the synthetic methodology (Pfeifer and Scheier, 1999), a
wider term that embraces the whole family of synthetic approaches. The shared common goal of syn-
thetic approaches is to seek an understanding of cognitive phenomena by building physical models of
the system under study. Typically they are applied in a bottom-up way: Initially, a simple system (e.g.,
with a small number of sensors) is built and explored, then its complexity is successively increased
(e.g., by adding sensors) if required to achieve a desired behavior.
The extension of the synthetic methodology to include development is a conceptually small step.
First, development is a process during which changes in all domains of function and behavior occur
from simple to complex (see Section 2.4.1). Therefore it is reasonable to assume that its key aspects
can be captured by means of a bottom-up synthetic approach. Second, cognitive development cannot
be isolated from the body in which it is instantiated, and from the real world in which it is embedded,
and with which the body physically interacts. As a matter of fact, the traditional approach (based
on the computer metaphor) has ultimately failed to address the intimate linkage between brain, body
and environment, and to study behavioral and neural changes typical of ontogenetic development that
are important for the emergence of cognition. The construction of an artificial system through the
3Also known as the law of uphill analysis and downhill invention. This law suggests that the synthesis (construction) of something new is easier than the analysis of something that already exists. We contend, however, that the definition of a comprehensive set of quantitative design principles or – even better – of a theory of synthesis for behaving systems is a much harder problem.
application of a “developmental synthetic methodology”, however, is not straightforward. An adequate
research methodology as well as a good set of design principles supporting such a methodology are
still open research issues. One possible reason being the difficulty in disentangling the complex notion
of development itself, which is – as we will show in the following section – multifaceted, non-linear,
complex, and yet to be fully understood.
The central tenet of embodied cognition is that cognitive and behavioral processes emerge from
the reciprocal and dynamic coupling between brain, body and environment. Since its inception, this
view has spawned paradigm changes in several fields, which in turn have influenced the way we think
about the role of embodiment for the emergence of cognition. Ballard (1991), for instance, introduced
the concept of animate or active vision, which states – roughly speaking – that visual processes can be
simplified if visual sensing is appropriately intertwined with acting and moving in the world (see also
Churchland et al., 1994). By employing active vision, problems such as figure/ground segmentation
or estimation of shape from shading, become well-conditioned. The paradigm change expresses how
action and motor control contribute to the improvement of perceptual abilities. Biological systems are
not passively exposed to sensory input but instead interact actively with their surrounding environment.
Accordingly, the “Holy Grail of Artificial Intelligence”, that is, a computerized general vision
system, has to be viewed as strictly dependent on the availability of a controllable body coupled to a
less controllable world. In a similar vein, Brooks (1991) showed that behavior does not necessarily
have to rely on accurate models of the environment, but rather might be the result of the interaction of
a simple system with a complex world. In other words, there is no need to build enduring, full-scale
internal models of the world, because the environment can be probed and reprobed as needed. More
recently, Pfeifer and Scheier (1994) argued that a better global understanding of the perception-action
cycle might be required – contrary to our intuition.4 The authors proposed an alternative view that
breaking up perception, computation, and action into different subsystems might be too strong of a
commitment. In other words, the minimal unit of processing should be a complete perception-action
cycle. Neurophysiology too contributed to the paradigm change. Emblematic was the discovery of vi-
sually responsive motor neurons supporting the hypothesis of an intimate coupling between vision and
action in the definition of higher cognitive abilities, such as object and action recognition (Di Pellegrino
et al., 1992; Gallese et al., 1996). Fascinating, along the same line of research, is also the link between
action and language proposed by Rizzolatti and Arbib (1998), who argued that the visuo-motor neu-
rons found in the area F5 of monkeys are most probably the natural homologue of the Broca’s area of
humans.
4This point was also very strongly made by Dewey over 100 years ago (Dewey, 1896).
2.4 Facets of development
Ontogenetic development is commonly seen as a process of change whereby appropriate biological
structure and skills emerge anew in an organism through a complex, variable, and constructive interplay
between endogenous and environmental factors (Johnson, 1997). Unlike development and maturation,
which involve species-typical growth, and changes at the level of cell, tissue and body, learning is
experience-dependent, and is often characterized by a relatively permanent change of behavior result-
ing from exercise and practice (e.g., Chec and Martin, 2002). The debate nowadays gravitates around
the precise nature of the interaction between learning and development. There are at least three leading
views. The first one is closest to the one of Piaget, and sees learning capitalizing on the achievements
of development (Piaget, 1953). The interaction is unidirectional, and learning cannot occur unless a
certain level of development has been achieved. The second view is bidirectional and states that learn-
ing and development are mutually coupled, in the sense that the developmental process enables, limits,
or even triggers learning, and learning in turn advances development (Kuhl, 2000). The third, more
radical view, which accommodates continuity and change under the theoretical umbrella of dynamic
systems theory (Thelen and Smith, 1994), suggests to erase the boundaries between development and
learning altogether, while considering dynamics at all levels of organization (molecular, cellular, struc-
tural, functional, and so on). We do not take any position in this or other debates. We are convinced,
however, that using robots as tools for scientific investigation might provide a route to disentangle open
issues – such as the nature of the interaction between development and learning. An additional advan-
tage of the proposed methodology is that we can simply build various assumptions into the system and
perform tests as we like – no ethical issues involved. The latter point is perhaps a less obvious, but
equally important justification for this area of research. It is relatively straightforward, for instance,
to build pathological conditions into a robot’s sensory, motor and neural systems (e.g., by lesioning
or augmenting its sensory-motor apparatus). Thus, robotic models can not only help elucidate princi-
ples underlying normal (healthy) development, but they may as well provide insight into disease and
dysfunction of brain, body, and behavioral processes.
In the remainder of this section, we review several important facets (components) of ontogenetic
development, and give pointers to some of the pertinent literature. The reader should bear in mind that
we do not intend to give an exhaustive account of biological development. Our choice of what to
include and what to discard is therefore limited and biased by our beliefs of what is deemed important
and what is not. However, we do intend to convey the message that these seemingly disparate facets of
development are closely intertwined, and that – if taken into account during the design and construction
of artificial systems – they can represent a valuable source of inspiration. We also point out that many of
these aspects can and should be conceptualized as principles for the design of intelligent developmental
systems. A set of generalized principles for agent design can be found in (Pfeifer, 1996) and in (Pfeifer
and Scheier, 1999). For quick reference, we summarized the list of facets in Table 2.1.
Facet: Synopsis (References)

Incremental process: prior structures and functions are necessary to bootstrap later structures and functions (Piaget, 1953; Thelen and Smith, 1994).

Importance of constraints: early constraints can lead to an increase of the adaptivity of a developing organism (Bushnell and Boudreau, 1993; Elman, 1993; Hendriks-Jensen, 1996; Turkewitz and Kenny, 1982).

Self-organizing process: development and learning are not determined by innate mechanisms alone (Goldfield, 1995; Kelso, 1995; Thelen and Smith, 1994).

Degrees of freedom: constraining the movement space may be beneficial for the emergence of well-coordinated and precise movements (Bernstein, 1967; Goldfield, 1995; Sporns and Edelman, 1993).

Self-exploration: self-acquired control of body dynamics (Angulo-Kinzler, 2001; Goldfield, 1995; Thelen and Smith, 1994).

Spontaneous activity: spontaneous exploratory movements are important precursors of motor control in early infancy (Piek, 2002; Prechtl, 1997).

Prospective control, early abilities: predictive control is a basic early competency on top of which human cognition is built (Meltzoff and Moore, 1997; Spelke, 2000; Von Hofsten et al., 1998).

Categorization, sensorimotor coordination: categorization is a fundamental ability, and can be conceptualized as a sensorimotor interaction with the environment (Edelman, 1987; Thelen and Smith, 1994).

Value systems: value systems mediate environmental saliency and modulate learning in a self-supervised and self-organized manner (Sporns and Alexander, 2002).

Social interaction: interaction with adults and peers is very important for cognitive development (Baron-Cohen, 1995; Meltzoff and Moore, 1977; Vygotsky, 1962).
Table 2.1: Facets of development at a glance.
2.4.1 Development is an incremental process
By assuming a certain level of abstraction, development in virtually any domain (e.g., nervous sys-
tem, motor apparatus, cognition) can be described as a sequence of stages through which the infant
advances. Indeed, the idea that development may be an incremental process is not novel, and has
already been proposed by Jean Piaget in his theory of stages of cognitive development more than 50 years
ago (e.g., Piaget, 1953), as well as by Eleanor Gibson, who suggested decomposing infant exploration
into three distinct phases (Gibson, 1988). The apparent stage-like nature of development, however,
does by no means imply stable underlying processes, characterized by a well-ordered, discontinuous,
and incremental unfolding of clearly defined stages (as suggested by Piaget). Thelen and Smith (1994)
give evidence for the opposite: depending on the level of observation, development is messy, and fluid,
full of instabilities, and non-linearities, and may even occur with regressions. There may be rapid
spurts, such as the onset of babbling, as well as more protracted, and gradual changes, such as the
acquisition of postural control. Various systems (for example, the perceptual and the motor system) do
not even change at the same rate. This list of properties is a clear indication of how challenging the
study of developmental changes is. An additional difficulty arises from the fact that those changes are
both qualitative (e.g., transition from crawling to walking), and quantitative (e.g., increase of muscle-
fat ratio). Another important characteristic of the developmental progression is that later structures
build up on less complete and less efficient prior structures and their behavioral expression. In other
words, the former structures provide a background of subskills and knowledge that can be re-used
by the latter. The mastery of reaching, for instance, requires adequate gaze and head control, and
stable trunk support, the latter being even more important for fine manipulation (Bertenthal and
Von Hofsten, 1998). Finally, we point out the absence of a central executive behind this developmental
progression. In other words, development is largely decentralized, and exhibits the properties of a
self-organizing system (see Sec. 2.4.3).
2.4.2 Development as a set of constraints
The notion of initial constraints or of “brake” on development is often invoked in order to explain de-
velopmental trajectories (Bushnell and Boudreau, 1993; Harris, 1983). Examples of constraints present
at birth in many vertebrate species (e.g., rats, cats, humans) are the limitations of the organism’s ner-
vous system (such as neural connectivity and number of neuronal cells), and of its sensory and motor
apparata (such as reduced visual acuity, and low muscle strength). Because each developmental step
somehow establishes the boundary conditions for the next one, a particular ability cannot emerge if any
of the capacities it entails is lacking. Thus, particular constraints can act (metaphorically speaking) as a
brake on development. These rate-limiting factors (as they are also called sometimes) are not necessar-
ily a bad thing. Turkewitz and Kenny (1982) pioneered the theoretical position that early morphological
limitations and constraints can lead to an increase of the adaptivity of a developing organism (see also
Bjorklund and Green, 1992). That is, the immaturity of the sensory and the motor system, which at first
sight appears to be an inadequacy, is of advantage, because it effectively decreases or eliminates the
“information overload” that otherwise would most certainly overwhelm the infant. According to this
hypothesis, the limited acuity of vision, contrast sensitivity, and color perception of neonates (Slater
and Johnson, 1997, p. 126) may actually improve their perceptual efficiency by reducing the com-
plexity of the environmental information impinging on their visual system (for additional examples,
see Hendriks-Jensen, 1996). Following similar lines of argumentation, several other researchers have
suggested that processing limitations of young learners, originating from the immaturity of the neural
system, can actually be beneficial for learning itself (Dominguez and Jacobs, 2003; Elman, 1993; New-
port, 1990; Westermann, 2000). In other words, constraints can be interpreted as particular instances
of “ontogenetic adaptations”, that is, unique adaptations to the environment throughout development,
which effectively simplify the world, and hence facilitate learning (Bjorklund and Green, 1992).
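The "information overload" argument can be made concrete with a toy calculation (a purely hypothetical sketch; `entropy_bits` and `coarsen` are illustrative helpers, not models drawn from the studies cited above): quantizing a sensory signal into coarser bins, mimicking low acuity, directly reduces the entropy of the input a developing system must cope with.

```python
import math
from collections import Counter

def entropy_bits(samples):
    """Shannon entropy (in bits) of a discrete sample sequence."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def coarsen(samples, factor):
    """Mimic low sensory acuity by merging intensity levels into coarser bins."""
    return [s // factor for s in samples]

signal = list(range(16))        # 16 equiprobable intensity levels: 4 bits per sample
coarse = coarsen(signal, 4)     # 4 coarse levels: only 2 bits per sample
```

On this view, an immature visual system faces a 2-bit world rather than a 4-bit one, which is exactly the kind of simplification the ontogenetic-adaptation hypothesis appeals to.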
2.4.3 Development as a self-organizing process
A fundamental characteristic of self-organization is that structured patterns or global order can emerge
from local interactions between the components constituting a system, without the need for explicit
instructions or a central program (see also Sec. 2.4.1). In this sense, development largely unfolds in a
self-organized fashion. The earliest actions of human infants, for instance, are spontaneous and exhibit
the typical properties of a self-organizing system (Goldfield, 1995; Sporns and Edelman, 1993; Thelen
and Smith, 1994). A growing body of evidence has shown that the control of movements of particu-
lar (exploratory) actions is not determined by innate mechanisms alone, but rather, emerges from the
dynamics of a sufficiently complex action system interacting with its surrounding environment (Bern-
stein, 1967; Goldfield, 1995; Kelso and Kay, 1987; Taga, 1991, 1995). In other words, the dynamics
of the interaction of infants and their surroundings modulates the ever-changing landscape of their ex-
ploratory activities. The intrinsic tendency to coordination or pattern formation between brain, body,
and environment is often referred to as entrainment, or intrinsic dynamics (Kelso, 1995). Gentaro
Taga, for instance, was able to show that rhythmic movements (in his case: walking) can emerge from
what he called a “global entrainment” among the activity of the neural system, the musculo-skeletal
system, and the surrounding environment (Taga, 1991). Another vivid illustration of a dynamically
self-organized activity was provided by Thelen (1981). She found that the trajectory and the cyclic
rhythmicity of kicks displayed by human infants and the intrinsic timing of the movement phases were
the “result of cooperative (and local) interactions of the neuro-skeletal muscular system within partic-
ular energetic and environmental constraints” (Thelen and Smith, 1994, p. 79).
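Taga’s and Thelen’s findings both concern mutual entrainment between rhythmic subsystems. The bare phenomenon can be sketched with a toy pair of coupled phase oscillators, a deliberately minimal stand-in for the neural and musculo-skeletal rhythms; the frequencies and coupling strengths below are chosen arbitrarily for illustration and are not taken from Taga’s model:

```python
import math

def entrain(w1, w2, k, dt=0.001, steps=100_000):
    """Two phase oscillators with natural frequencies w1, w2 (rad/s) and
    mutual coupling strength k; returns the final phase difference
    (simple Euler integration)."""
    th1, th2 = 0.0, 0.5
    for _ in range(steps):
        d = math.sin(th2 - th1)
        th1 += (w1 + k * d) * dt
        th2 += (w2 - k * d) * dt
    return th2 - th1

# Uncoupled, the phase difference drifts at (w2 - w1) rad/s; with coupling
# strong enough (2k > |w2 - w1|), the two lock onto a common rhythm.
print(abs(entrain(10.0, 10.6, k=0.0)))   # large drift
print(abs(entrain(10.0, 10.6, k=1.0)))   # small, bounded difference
```

With the coupling switched off the phase difference grows without bound; with sufficiently strong coupling the two oscillators phase-lock, which is the essence of entrainment.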
Processes of self-organization and pattern formation are not confined to the learning and the de-
velopment of movements but are an essential feature of biological systems at any level of organiza-
tion (Kelso, 1995). Iverson and Thelen (1999), for instance, invoked entrainment, and other principles
of dynamic coordination typical of self-organized behavior, to explain the developmental origins of
gestures that accompany the expression of language in speech; Edelman (1987) hypothesized that
perceptual categorization – one of the primitives of mental life – arises autonomously through self-
organization; and finally, even the amazing complexity of the brain has been proposed to be the result
of a process of self-organized ontogenesis (Von der Malsburg, 2003).
2.4.4 Degrees of freedom and motor activity
Perhaps not surprisingly, movements of infants lack control and coordination compared with those of
adults. The coordination of movements (in particular in humans) is very poor at birth and undergoes a
gradual maturation over an extended period of postnatal life. Examples of this developmental progres-
sion are crawling (Adolph et al., 1998), walking with support (Haehl et al., 2000), walking (Thelen and
Smith, 1994, p. 71), and reaching and grasping (Streri, 1993). Despite the fact that the musculo-skeletal
apparatus is a highly non-linear system, with a large number of biomechanical and muscular degrees
of freedom,5 and in spite of the potential redundancy of those degrees of freedom in many move-
ment tasks (i.e., the activation of different muscle groups can lead to the same movement trajectory),
well-coordinated and precisely controlled movements emerge. The “degrees of freedom problem”,
first pointed out by Bernstein (1967), has recently attracted a lot of attention (Goldfield, 1995; Sporns
and Edelman, 1993; Vereijken et al., 1992; Zernicke and Schneider, 1993). A possible solution to the
control issues raised by the degrees of freedom problem, that is, how – despite the complexity of the
neuro-musculo-skeletal system – stable and well-coordinated movements are produced, was suggested
by Bernstein himself. His proposal is characterized by three stages of change in the number of degrees
of freedom that takes place during motor skill acquisition: Initially, in learning a new skill or move-
ment, the peripheral degrees of freedom (the ones further from the trunk, such as wrist and ankle) are
reduced to a minimum through tight joint coupling (freezing of degrees of freedom). Subsequently,
the restrictions at the periphery are gradually weakened so that more complex movement patterns can
be explored (freeing of degrees of freedom). Eventually, preferred patterns emerge that exploit reactive
phenomena (such as gravity and passive dynamics) so as to enhance efficiency of the movement. The
strong joint coupling of the first phase has been observed in spontaneous kicking in the first few months
of life (Thelen and Fischer, 1983), and is thought to allow infants to learn without the interference of
complex, uncoordinated motor patterns.
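A minimal computational reading of Bernstein’s first two stages might look as follows, under entirely hypothetical assumptions (a planar four-joint arm with equal link lengths, and random hill climbing as the learner): exploration first proceeds over the proximal joints while the distal ones are clamped, and the frozen joints are released only in a second stage.

```python
import math
import random

def reach_error(angles, target=1.2):
    """Toy planar arm with four equal links of length 0.5; error between
    the end-point x-coordinate and a target (all numbers hypothetical)."""
    x = a = 0.0
    for ang in angles:
        a += ang
        x += 0.5 * math.cos(a)
    return abs(x - target)

def search(free_dofs, start, iters=2000, step=0.1):
    """Random hill climbing that perturbs only the unfrozen joints."""
    best = list(start)
    for _ in range(iters):
        cand = list(best)
        for i in free_dofs:                      # frozen joints stay clamped
            cand[i] += random.gauss(0, step)
        if reach_error(cand) < reach_error(best):
            best = cand
    return best

random.seed(0)
s1 = search(free_dofs=[0, 1], start=[0.0] * 4)   # stage 1: distal joints frozen
s2 = search(free_dofs=[0, 1, 2, 3], start=s1)    # stage 2: all joints freed
print(reach_error(s1), reach_error(s2))
```

The point of the sketch is only the staging: the first search explores a low-dimensional, easily learnable space, and the second refines the solution once the remaining degrees of freedom are released.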
Recently, the straightforward, but rather narrow and unidirectional view of the nature of change
in the number of controlled degrees of freedom proposed by Bernstein has been contested – in adult
studies (Spencer and Thelen, 1999; Newell and Vaillancourt, 2001), as well as in infant studies (Haehl
et al., 2000). These recent observations seem to indicate that while according to Bernstein’s framework
biomechanical degrees of freedom only increase (as a consequence of practice and exercise), there can
5The space of possible motor activations is very large: “consider the 600 or so muscles in the human body as being, for extreme simplicity, either contracted or relaxed. This leads to 2^600 possible motor activation patterns, more than the number of atoms in the known universe” (Wolpert et al., 2003).
be – depending on the task – an increase or decrease of the number of degrees of freedom. Despite
such counter evidence, Bernstein’s proposal bears at least two important messages, which fit very
nicely into the above discussion: (a) the presence of initial constraints that are gradually lifted, and
(b) the emergence of coordinated movements from a dynamic interaction (via external feedback and
forces) between the maturing organism and the environment.
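The combinatorial estimate quoted in footnote 5 is easy to verify with arbitrary-precision integer arithmetic; the figure of roughly 10^80 atoms in the observable universe is the commonly cited estimate:

```python
patterns = 2 ** 600          # 600 muscles, each either contracted or relaxed
atoms = 10 ** 80             # commonly cited estimate for the observable universe
print(patterns > atoms)      # True
print(len(str(patterns)))    # 2^600 has 181 decimal digits
```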
2.4.5 Self-exploratory activity
Scaffolding by parents and caretakers (see Sec. 2.4.10), as well as active exploration of objects and
events, have been acknowledged to be of crucial importance for the developing infant (Bushnell and
Boudreau, 1993; Gibson, 1988; Piaget, 1953; Rochat, 1989). Little attention, however, has been paid to
the understanding of what sort of information is available to infants as a result of their self-exploratory
acts. Self-exploration plays an important role in infancy, in that infants’ “sense of bodily self” to some
extent emerges from a systematic exploration of the perceptual consequences of their self-produced
actions (Rochat and Striano, 2000). The exploration of the infants’ own capacities is one of the primary
driving forces of development and change in behavior, and infants explore, discover, and select –
among all possible solutions – those that seem more adaptive and efficient (Angulo-Kinzler, 2001,
p. 363). Exploratory actions, traditionally thought to be actions focused on the external world, may
as well be focused on the infants’ own action system (Von Hofsten, 1993). Infants exploring their own
action system or their immediate surroundings have been observed to perform movements over and
over again (Piaget, 1953). Newborn infants, for instance, have been observed to spend up to 20% of
their waking hours contacting their face with their hands (Korner and Kraemer, 1972). In analogy to
vocal babbling, this experiential process has also been called “body babbling” (Meltzoff and Moore,
1997). By means of self-exploratory activities infants learn to control and exploit the dynamics of their
bodies (Goldfield, 1995; Smitsman and Schellingerhout, 2000; Thelen and Smith, 1994). The nature of
these dynamics differs from infant to infant (each infant has a unique set of abilities, muscle physiology,
fat distribution, and so on), and depends on the dynamics of the interaction with the environment,
which in turn varies from task to task. Self-exploration can also be conceptualized as a process of
soft-assembly,6 i.e., a process of self-organization (see Sec. 2.4.3) during which new movements are
generated, and more effective ways of harnessing environmental forces are explored, discovered, and
selected (Goldfield, 1995; Schneider et al., 1990; Schneider and Zernicke, 1992).
6Soft-assembly refers to a self-organizing ability of biological systems to “freely” recruit the components (such as neurons, groups of neurons, and mechanical degrees of freedom) that are part of the system, yielding flexibility, variability and robustness against external perturbations (Clark, 1997; Goldfield, 1995; Thelen and Smith, 1994).
2.4.6 Spontaneous activity
Spontaneous movements have been recognized as important precursors to the development of motor
control in early infancy (Forssberg, 1999; Piek, 2002; Taga et al., 1999; Thelen, 1995). One of their
main functions is the exploration of various musculo-skeletal organizations, in the context of multiple
constraints, such as environment, task, architecture of the nervous system, muscle strength, masses of
the limbs and so forth (see Sec. 2.4.2 and Sec. 2.4.5). Well-coordinated movement patterns emerge
from spontaneous neural and motoric activity as infants learn to exploit the physical properties of
their bodies and of the environment. In fact, fetuses (as early as 8 to 10 weeks after conception), and
newborn infants display a large variety of transient and spontaneous motoric activity, such as general
movements7 and rhythmical sucking movements (Prechtl, 1997), spontaneous arm movements (Piek
and Carman, 1994), stepping and kicking (Thelen and Smith, 1994). An interesting property of sponta-
neous movements is that although they are not linked to a specific, identifiable goal, they are not mere
random movements. Instead, they are organized, right from the early days of postnatal life, into rec-
ognizable forms. Spontaneous kicks in the first few months of life, for instance, are well-coordinated
movements characterized by a tight joint coupling between the hip, knee, and ankle joints (Thelen and
Fischer, 1983; Thelen and Smith, 1994), and by short phase lags between the joints (Piek, 2001, p. 724).
As hypothesized by Sporns and Edelman (1993), spontaneous exploratory activity may also induce cor-
relations between certain populations of sensory and motor neurons, which are eventually selected as a
task is consistently accomplished or a goal attained. The same authors also proposed three concurrent
steps by which the development of sensory-motor coordination may proceed: (a) Spontaneous generation of a variety of movement patterns; (b) development of the ability to sense the consequences of
the self-produced movements; and (c) actual selection of a few movements. We note that the ultimate
“availability” of good sensory-motor patterns is connected to the degrees of freedom problem: such availability can only be achieved if the range of movements that are possible in principle is constrained by initially
reducing the number of available degrees of freedom (see Sec. 2.4.4).
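The three concurrent steps proposed by Sporns and Edelman can be caricatured as a generate-evaluate-select loop. The sketch below is not their model; the goal posture, repertoire size, and mutation noise are invented purely for illustration:

```python
import random

random.seed(3)

def consequence(pattern):
    """Sensed consequence of a movement: toy squared distance to a
    hypothetical goal posture."""
    goal = [0.3, -0.2, 0.5]
    return sum((p - g) ** 2 for p, g in zip(pattern, goal))

# (a) spontaneously generate a repertoire of movement variants,
# (b) sense the consequence of each, (c) select and vary the best few.
repertoire = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
for generation in range(30):
    survivors = sorted(repertoire, key=consequence)[:5]            # step (c)
    repertoire = [[p + random.gauss(0, 0.05) for p in random.choice(survivors)]
                  for _ in range(20)]                              # step (a) again
best = min(repertoire, key=consequence)
print(round(consequence(best), 4))
```

Over repeated cycles the repertoire converges on movement variants whose sensed consequences are close to the goal, without any explicit instruction about how to move.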
2.4.7 Anticipatory movements and early abilities
Throughout development infants acquire and refine the ability to predict the sensory consequences
of their actions and the behavior of external objects and events (e.g., the “when” and “where” of a
forthcoming manual interception of an object passing by). Optimally, this ability allows movements
to be adjusted prospectively rather than reactively in response to an unexpected perturbation (Adolph
et al., 2000; Von Hofsten, 1993). Two types of control strategies are employed to control anticipatory
7General movements represent one of the most important types of spontaneous movement that have been identified. They last from a few seconds to several minutes, are caused endogenously by the nervous system, and in normal infants involve the whole body.
movements: predictive and prospective control (e.g., Peper et al., 1994; Regan, 1997). In predictive
control the current perceptual information is used to predict the sensory activation at a future point
in time. Prospective control, on the other hand, relies on the sensory (or perceptual) information
associated with a particular action as the action unfolds over time, and is thus based on a close coupling
between information and movement.
Predictive and prospective control are in place already early in development. Infants as young as
one month, for instance, are able to compensate for head movements with zero lag between head and
eye movements (Bertenthal and Von Hofsten, 1998; Von Hofsten et al., 1998). Predictive control is
important because of the intrinsic time-delays of the sensory-motor system (visual feedback can take
up to 150 msec to be processed by the cortex, for instance). An example where infants make use of
prediction is gaze following. During gaze control there are at least two situations in which predictive control is important: for the prediction of the motion of visual targets, and for the prediction of
the consequences of relative movements between body parts (e.g., movement of the head with respect
to the eyes).
Prediction clearly supports the idea that the brain forms so-called “internal forward models” –
instances of internal models, which have been hypothesized to exist in the cerebellum (Miall et al.,
1993), and whose biological and behavioral relevance has been confirmed by recent experiments (e.g.,
Mussa-Ivaldi, 1999; Wolpert et al., 2001). Forward models are ‘neural simulators’ of the musculo-
skeletal system and the environment (Clark and Grush, 1999; Grush, 2004; Wolpert et al., 2003), and
thus allow predicting the future state of the system given the present state and a certain input (a state
specifies a particular body configuration).
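As an illustration of the forward-model idea, the following sketch learns to predict the next state of a hypothetical damped point mass from the current state and motor command, using a simple delta-rule (LMS) update; the plant, learning rate, and training regime are all invented for the example and make no claim about the cerebellar implementation:

```python
import random

# Hypothetical plant: a damped point mass whose true next state is linear
# in (x, v, u), so a linear forward model can capture it exactly.
def plant(x, v, u, dt=0.05, damping=0.3):
    return x + v * dt, v + (u - damping * v) * dt

W = [[0.0] * 3 for _ in range(2)]   # forward-model weights: 2 outputs x 3 inputs

def predict(x, v, u):
    """Forward model: predicted next (x, v) given current state and command."""
    return [W[i][0] * x + W[i][1] * v + W[i][2] * u for i in range(2)]

random.seed(1)
lr, errors = 0.1, []
for _ in range(3000):
    # "Motor babbling": sample a state and command, observe the real outcome.
    x, v, u = (random.uniform(-1, 1) for _ in range(3))
    pred, actual = predict(x, v, u), plant(x, v, u)
    err = [actual[i] - pred[i] for i in range(2)]
    errors.append(abs(err[0]) + abs(err[1]))
    for i in range(2):                  # delta-rule (LMS) weight update
        for j, s in enumerate((x, v, u)):
            W[i][j] += lr * err[i] * s
print(sum(errors[:100]) / 100, sum(errors[-100:]) / 100)
```

Once the prediction error has shrunk, the model can anticipate sensory consequences of commands before the (delayed) feedback arrives, which is precisely the role predictive control requires.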
The ability to make predictions is part of what Spelke (2000) refers to as “core initial knowledge”,
that is, a set of basic competencies on top of which human cognition is built. High-level cognitive
functions, such as planning and shared attention, for instance, can be interpreted with respect to their
capability of predicting the consequences of chains of events. The large number of behavioral predis-
positions that have been discovered, and which are part of the core knowledge, show that infants are
not mere blank slates waiting to be written on (Iverson and Thelen, 1999; Johnson, 1997; Meltzoff and
Moore, 1997; Spelke, 2000; Thelen, 1981).
2.4.8 Categorization and sensory-motor coordination
Categorization is the ability to make distinctions in the real world, i.e., to discriminate and identify
sensory stimulations, events, motor acts, emotions, and so on. This ability is of such fundamental im-
portance for cognition and intelligent behavior that a natural organism incapable of forming categories
does not have much of a chance of survival (unless the categories are innate, but then they are not
flexible). For example, the organism will not be able to discern food and non-food, peer and non-peer,
and so forth. Categorization is an efficient and adaptive initial step in perceiving and cognizing, as
well as a base for most of our conceptual abilities. Our daily interactions with the physical world,
and our social and intellectual lives heavily rely on our capacity to form categories (Lakoff, 1987),
and so does cognitive development (Thelen and Smith, 1994). Most organisms are therefore endowed
with the capacity to perceptually categorize and behaviorally discriminate an extraordinary range of
environmental stimuli (Edelman, 1987).
Evidence from developmental psychology supports the idea that perceptual categorization and con-
cept formation are the result of active exploration and manipulation of the environment (e.g., Bushnell
and Boudreau, 1993; Gibson, 1988; Piaget, 1953; Streri, 1993). That is, while sensation, and perhaps
certain aspects of perception can proceed without a contribution of the motor apparatus, perceptual
categorization depends upon the interplay between the sensory and motor systems. In other words, categorization is an active process, which can be conceptualized as a process of sensory-motor coordinated
interaction of the organism with its surrounding environment (e.g., discrimination of textures and size
of objects by exploratory hand movements). It is through such interaction that the raw sensory data
impinging on the sensors may be appropriately structured, and the subsequent neural processing sim-
plified. The structure induced in the sensory data is important – perhaps critical – in establishing
dynamic categories, and may be a consequence of the correlation of movements and of time-locked
external (potentially multimodal) sensory stimulation (Thelen and Smith, 1994, p. 194). We conclude
that the absence of self-produced movements can affect the development of cognitive abilities and
skills. Children with severe physical disabilities, for instance, have limited opportunities to explore
their surroundings, and this lack of experience affects their cognitive and social development.
2.4.9 Neuromodulation, value and neural plasticity
Neuromodulatory systems are small and compact groups of neurons that reach large portions of the
brain. They include the noradrenergic, serotonergic, cholinergic, dopaminergic, and histaminergic
cell nuclei (Edelman, 1987; Edelman and Tononi, 2001). In mammals, the importance of these modu-
latory neurotransmitter systems vastly outweighs the proportion of brain space they occupy, their axons
projecting widely throughout the cerebral cortex, hippocampus, basal ganglia, cerebellum, and spinal
cord (Dickinson, 2003; Hasselmo et al., 2003). One of the primary roles of neuromodulatory systems
is the configuration and tuning of neural network dynamics at different developmental stages (Marder
and Thirumalai, 2002).
Another important role of these systems in brain function is to serve as “value systems” that either
gate the current behavioral state of the organism (e.g., waking, sleep, exploration, arousal), or that
act as internal mediators of value and environmental saliency. That is, they signal the occurrence
of relevant stimuli or events (e.g., novel stimuli, painful stimuli, rewards) by modulating the neural
activity and plasticity of a large number of neurons and synapses (Friston et al., 1994). Value systems
have several properties: (a) their action is probabilistic, i.e., they influence large populations of neurons; (b) their activation is temporally specific, that is, their effects are transient and short-lasting; and (c) their action is spatially uniform, i.e., they affect widespread regions of the brain, while acting as a single global
signal (Sporns and Alexander, 2002). Other implementations of value systems, e.g., in other species
are also possible (Dickinson, 2003).
Value systems play a pivotal role in adaptive behavior, because they mediate neural plasticity and
modulate learning in a self-supervised and self-organized manner. In doing so, they allow organisms
to autonomously learn via self-generated (possibly spontaneous) activity. In a sense, value systems
introduce biases into the perceptual system, and therefore create the necessary conditions for learning
and the self-organization of dynamic categories. The action of value systems can be either genetically
predetermined, such as in behaviors that satisfy homeostatic and appetitive needs, or it can incorporate
activity- and experience-dependent processes (Sporns, 2004). The two flavors of value are also known
as innate and acquired value (Friston et al., 1994).
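A toy sketch of value-gated plasticity, with entirely hypothetical numbers: a plain Hebbian coactivity term is multiplied by a scalar, temporally specific value signal, so that synapses strengthen only on salient trials.

```python
import random

def run(value_on, trials=400, n=4, lr=0.1, decay=0.02):
    """Hebbian learning gated by a scalar value signal. The postsynaptic
    unit is driven by input 0; the value signal fires on those same
    (salient) trials. All parameters are invented for illustration."""
    random.seed(2)
    w = [0.0] * n
    for _ in range(trials):
        pre = [random.random() < 0.5 for _ in range(n)]
        post = pre[0]
        value = 1.0 if (value_on and post) else 0.0
        for j in range(n):
            hebb = 1.0 if (pre[j] and post) else 0.0
            w[j] += lr * value * hebb - decay * w[j]   # value gates plasticity
    return w

print([round(x, 2) for x in run(False)])   # no value signal: no potentiation
print([round(x, 2) for x in run(True)])    # gated: input 0 strengthened most
```

Without the value signal nothing is learned at all; with it, the synapse whose activity reliably coincides with the salient outcome is selectively strengthened, which is the self-supervised modulation of learning described above.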
2.4.10 Social interaction
Interactions with adults and peers (scaffolding, tutelage, and other forms of social support) as well as
mimetic processes such as mimicry, imitation, and emulation, are hypothesized to play a central role
in the development of early social cognition and social intelligence (Meltzoff and Prinz, 2002; Whiten,
2000). The presence of a caregiver to nurture children as they grow is essential, because human infants
are extremely dependent on their caregivers, relying upon them not only for their most basic needs
but also as a guide for their cognitive development (Lindblom and Ziemke, 2003; Vygotsky, 1962). It
is important to note that in terms of development interaction with objects and interaction with peers
bear two completely different valences (Nadel, 2003). Through interaction with inanimate objects
infants acquire information “statically” and may learn the “simple” physics that governs the objects’ behavior. During peer-to-peer or infant-adult interaction, however, infants are engaged in a complex
communicative act, involving the interaction of two complex dynamical systems mutually influencing
(and modifying) each other’s behavior.
A fundamental type of interaction between infants and adults is scaffolding. The concept of scaf-
folding, whose roots can be found in the work of Vygotsky (1962), was introduced by Wood et al.
(1976) and refers to the support provided by adults to help children bootstrap cognitive, social, and
motor skills. As the child’s confidence increases, the level of assistance is gradually reduced. In other
words, scaffolding helps structure the environment in order to facilitate interaction and learning.
Scaffolding by a more capable caregiver or imitation of a peer can reduce distractions and bias explo-
rative behaviors toward important environmental stimuli. The caregiver can also increase or decrease
the complexity of the task. This issue is akin to the concept of “sensitive periods” (Bornstein, 1989;
Gottlieb, 1991), that is, particular intervals of time during which infants are especially responsive to
the input from their caregivers, and hence more apt to acquire skills.
From a very early age, infants are endowed with the necessary means to engage in simple, but
nevertheless crucial social interactions (e.g., they show preferences for human smell, human faces, and
speech (Johnson, 1997; Nadel and Butterworth, 1999) – see also Sec. 2.4.7), which can be used by the
caregiver to regulate and shape the infant’s behavior. Joint or shared attention, that is, the ability to
attend to an object of mutual interest in the context of a social exchange, is already observed in six-month-old infants (Butterworth and Jarrett, 1991). Meltzoff and Moore (1977) reported on the early
ability of very young infants to imitate both facial and manual gestures. Indeed, early and non-verbal
imitation is a powerful means for bootstrapping the development of communication and language.
Developmental psycholinguists such as Fernald (1985) provided compelling evidence for what sort of
cues preverbal infants exploit in order to recognize affective communicative intent in infant-directed
speech (motherese).
Basic social competencies are the background on which more complex social skills develop, and
they represent yet another way to facilitate learning (see Sec. 2.4.1). The reliance on social contact is
so integrated into our species that it is hard to imagine a completely asocial human. Severe develop-
mental disorders that are characterized by impaired social and communicative development, such as
autism (Baron-Cohen, 1995), can give us a glimpse of the importance of social contact (Scassellati,
2001, p. 30).
2.4.11 Intermediate discussion
In summary, despite being some sort of unfinished version of a fully developed adult, infants are well-
adapted to their specific ecological niche. As suggested in the discussion above, development is a
process during which the maturation of the neural system is tied to a concurrent and gradual lifting of
the initial limitations on sensory and motor systems. The state of immaturity that at first sight appears
to be an inadequacy in fact plays an integral role during ontogeny, and results in increased flexibility
and faster acquisition of skills and subskills. Innate abilities, such as prospective control or prewired
motor patterns (Thelen, 1981), can also speed up skill acquisition by providing a “good” background
for the learning of novel skills. The difficulty of learning particular tasks can be further reduced by
shaping development via appropriate social exchanges and scaffolding by adults.
The various aspects of development discussed in this section are obviously highly interdependent
and cannot be considered in isolation. Spatio-temporally coordinated movement patterns (Sec. 2.4.4),
for instance, arise spontaneously and in a self-organized fashion from the interaction among brain,
body, and environment, and are – at least in part – the result of an entrainment between these three
components (Sec. 2.4.6 and Sec. 2.4.3). In general, autonomous and self-organized formation of spatio-
temporal patterns is a distinguishing trait of “open nonequilibrium systems”, that is, of systems in
which “energy” flows (a) from one region of the system to another (the system is not at equilibrium);
and (b) in and out of the system (the system is open) (e.g., Haken, 1983; Kelso, 1995).
Category learning (Sec. 2.4.8) represents another example of the interdependency between the
proposed developmental aspects, because it lends itself well to an interpretation as a dynamic process
during which, through interaction with the local environment, patterns of behavior useful for category
formation self-organize (Sec. 2.4.3). Moreover, in analogy to the development of patterns of motor
coordination in motor learning (Sec. 2.4.4), it is possible to conceptualize the emergence of perceptual
categories as a modification of degrees of freedom: mechanical degrees of freedom (i.e., number of
joints and muscles) in the case of motor learning, and sensory-motor or perceptual degrees of freedom
(i.e., categories) in the case of category formation. The self-organization of categories is directed
by neural and bodily constraints (Sec. 2.4.2), as well as by value systems (Sec. 2.4.9), which not
only introduce the necessary biases for learning to take place, but also modulate it, by evaluating the
consequences of particular actions. Hence they constitute the engine of exploration, and represent a
conditio sine qua non for category learning, for social interactions (Sec. 2.4.10), and for directing self-
exploratory processes (Sec. 2.4.5). Self-exploration and self-learning, in turn, are strongly dependent
on spontaneous movement activity (Sec. 2.4.6). This sort of activity, albeit not oriented toward any
functional goal (such as reaching for an object, or turning the head in a particular direction), leads
to the generation of sensory information across different sensory modalities correlated in time, which
gives infants the opportunity to learn to sense and predict the consequences of their own actions through
self-exploration. For example, take an infant spontaneously waving her hand in front of her eyes, and
touching her face. Over time, this sort of activity generates associations of the sensory information
that originates from outside the body (called exteroception, e.g., vision, audition or touch) with the one
coming from inside the body (or proprioception, e.g., vestibular apparatus or muscle spindles), and a
sense of bodily self can emerge (Rochat and Striano, 2000).
As can be seen from these few examples, every aspect of development is affected simultaneously
by the others. This coupling makes their investigation challenging, and modeling a difficult enterprise.
We contend that embodied models and robotic systems represent appropriate scientific tools to tackle
the interaction and integration of the various aspects of development. The construction of a physical
system forces us to consider (a) the interaction of the proposed model with the real world, and (b) the
interaction and the integration of the various subcomponents of the model with each other. This way
of thinking has spurred, since its inception, a growing number of research endeavors.
2.5 Research landscape
In this section, we present a survey of a variety of research projects that deal with or are inspired by
developmental issues. Table 2.2 gives a representative sample of investigations and is not intended as
a fully comprehensive account of research related to developmental robotics. For the inclusion of a
study in Table 2.2, we adopted the following two criteria:
The study had to provide clear evidence of robotic experiments. That is, we did not include
computer-based models of real systems, avatars, or other simulators. This choice is not aimed
at discrediting simulations, which indeed are very valuable tools of research. In fact, we ac-
knowledge that physical instantiation is not always an absolute requirement, and that simula-
tions have distinct advantages over real world experiments, such as the possibility for extensive
and systematic experimentation (Sporns, 2004; Ziemke, 2003). If the goal, however, is to model
and understand development and how it is influenced by interaction with the environment, then
robots may represent the only viable solution. Whereas a simulation cannot possibly capture
all the complexities and oddities of the physical world (Brooks, 1991; Steels, 1994; Pfeifer and
Scheier, 1999), robots – by being “naturally” situated in the real world – are the only way to
guarantee a continuous and real time coupling of body, control and environment.
The study had to show a clear intent to address hypotheses put forward in either developmental
psychology or developmental neuroscience. The use of connectionist models, reinforcement or
incremental learning applied to robot control alone – without any link to developmental theories,
for instance – did not fulfill this requirement.
Despite the admittedly rather restrictive nature of these two requirements, we were able to identify
a significant number of research papers satisfying them. In order to introduce some structure in this
rather heterogeneous collection of papers, we organized the selected articles of Table 2.2 according to
four primary areas of interest (see Table 2.3):
(1) Socially oriented interaction: This category includes robotic systems in which social interaction
plays a key role. These robots either learn particular skills via interaction with humans or with
other robots, or learn to communicate with other robots or humans. Examples are language
acquisition, imitation, and social regulation.
(2) Non-social interaction: This category comprises studies that are characterized by a direct and
strong coupling between sensory and motor processes, and the surrounding local environment,
which do not involve any interaction with other robots or humans. Examples are learning to
grasp, visually-guided manipulation, perceptual categorization, and navigation.
(3) Agent-related sensory-motor control: This category organizes studies that investigate the ex-
ploration of bodily capabilities, changes of morphology (e.g., perceptual acuity, or strength of
the effectors) and their effects on motor skill acquisition, and self-supervised learning schemes
not specifically linked to a functional goal. Examples are self-exploration, categorization of motor patterns, and learning to swing or bounce.
(4) Mechanisms and processes: This category contains investigations that address mechanisms or
processes thought to increase the adaptivity of a behaving system. Examples are developmental
plasticity, value systems, neurotrophic factors, Hebbian learning, freezing and unfreezing of
mechanical degrees of freedom, increase or decrease of sensory resolution and motor accuracy,
and so on.
The borders of the proposed categories may not be as clearly defined as this classification suggests, and
instances may exist that fall in two or more of those categories. Or even worse, these categories may
appear arbitrary and ad hoc. We believe, however, that a grouping into four primary interest areas is
meaningful for the following reasons: First, the individual categories refer to different contextual sit-
uations; that is, while interactions in a social context typically involve one or more persons or robots,
non-social interactions and agent-related control do not. Second, movements performed during a socially oriented interaction have a communicative purpose, e.g., language or gestures. Non-social
sensory-motor interactions as well as agent-related control, however, do not bear any communicative
value (unless an object is used as a means of communication). As stated previously, evidence from
developmental psychology suggests that interaction with peers and interaction with objects bear com-
pletely different valences. Third, unlike non-social sensory-motor interactions whose primary purpose
is the active exploration or manipulation of the surrounding environment, agent-related sensory-motor
control is mainly concerned with the exploration of the agent’s own bodily capabilities. Examples
from studies into human development (mainly concerned with motor development) are: rhythmical
stereotypies (Thelen, 1981), general movements (Prechtl, 1997), crawling (Adolph et al., 1998), and
postural control (Bertenthal and Von Hofsten, 1998; Hadders-Algra et al., 1996).
Finally, we note that the last category (mechanisms) groups mechanisms and processes that apply
to any content domain – be it social interaction, non-social interaction, or agent-related
sensory-motor control. Most of the studies surveyed in this chapter employed a number of mechanisms
and processes either explicitly or implicitly. Hebbian learning and neurotrophic factors, for instance,
are general mechanisms of plasticity. Similarly, value systems can modulate different types of learning,
and guide the self-organization of early movements. We believe that these mechanisms might form a
good basis on which to build a general theory of developmental robotics.
2.5.1 Socially oriented interaction
Studies in social interaction and acquisition of social behaviors in robotic systems have examined a
wide range of learning situations and techniques. Prominent research areas include shared (or joint)
attention, low-level imitation (that is, reproduction of simple and basic movements), language devel-
opment, and social regulation (for an overview and a taxonomy of socially interactive robots, see Fong
et al., 2003). Adopting a developmental stance appears particularly promising in this context.
Brian Scassellati, for instance, advocated the application of a developmental methodology as a
means of providing a structured decomposition of complex tasks, which ultimately could facilitate
(social) learning (Scassellati, 1998). In (Scassellati, 2001), he described the early stages of the imple-
mentation in a robot of a hybrid model of shared attention, which in turn was based on a model of the
development of a “theory of mind” 8 proposed by Baron-Cohen (1995). Despite the simplicity of the
robot’s behavioral responses, and the need for more complex social learning mechanisms, this study
represents a first step toward the construction of an artificial system capable of exploiting social cues
to learn to interact with other robots or humans. Another model of joint attention was implemented
by Nagai et al. (2002). Their model combined the maturation of the robot's sensing capabilities from
an immature to a mature state (achieved by gradually increasing the sharpness of a Gaussian spatial
filter that preprocessed the visual input) with a change of the caregiver's task evaluation criteria,
whereby a decreasing task error yielded a positive reward for the robot. Along a
similar line of research, Kozima and Yano (2001) studied a “rudimentary” or early type of joint visual
attention displayed by infants. In this case, the robot was able to roughly identify the attentional target
in the direction of the caregiver's head only when it could simultaneously see both the caregiver and
the target.
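The gradual sharpening of visual input described above can be illustrated with a minimal sketch. All names, kernel sizes, and the linear schedule below are our own illustrative choices, not taken from the cited implementation: a 1-D Gaussian filter whose standard deviation shrinks as a developmental clock advances, so that early "infant" vision is heavily blurred and mature vision is sharp.

```python
import math

def gaussian_kernel(sigma, radius=3):
    """Discrete 1-D Gaussian kernel, normalized to sum to one."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def developmental_sigma(step, sigma_start=3.0, sigma_end=0.5, n_steps=100):
    """Linearly sharpen the filter as 'development' progresses:
    large sigma (heavy blur) at birth, small sigma (sharp vision) at maturity."""
    t = min(step, n_steps) / n_steps
    return sigma_start + t * (sigma_end - sigma_start)

def filter_signal(signal, sigma):
    """Blur a 1-D 'retinal' signal with the current developmental kernel."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - r, 0), len(signal) - 1)  # clamp at the borders
            acc += w * signal[idx]
        out.append(acc)
    return out
```

An impulse filtered early in development is smeared broadly; the same impulse filtered at maturity retains a sharp peak, which is the sense in which the robot's sensing "matures."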
Joint attention is but one factor on which social interaction relies. An architecture of mutually
regulatory human-robot interaction striving to integrate various factors involved in social exchanges
was described in (Breazeal and Scassellati, 2000). The aim of the suggested framework was to include
perception, attention, motivations, and expressive displays, so as to create an appropriate learning
context for a social infant-like robot capable of regulating on its own the intensity of the interaction.
Although the implementation did not parallel infant development exactly, the authors claimed that
the design of the system was heavily inspired by the role motivations and facial expressions play in
maintaining an appropriate level of stimulation during social interaction of infants with adults (Breazeal
and Scassellati, 2000, p. 51). Human-robot interaction was also the focus in (Dautenhahn and Billard,
1999), where the authors described an example of emergence of global interaction patterns through
exploitation of movement dynamics. The performed experiments were based on an influential theory
8Theory of mind defines a set of socially mediated skills relating to the individual's behavior in a social context, e.g., the detection of eye contact.
of cognitive development advocated by Vygotsky (1962), which proposes that social interactions are
essential for the development of individual intelligence. For a recent review of Vygotsky’s theory of
cognitive development and its relation to socially situated Artificial Intelligence see (Lindblom and
Ziemke, 2003).
Socially situated learning can also be guided by robot-directed speech. In such a case, the robot’s
affective state – and as a consequence its behavior – could be influenced by verbal communication
with a human caregiver. It is perhaps less obvious, but equally important, that there is no
need to associate a meaning with what is said. Breazeal and Aryananda (2002), for instance, explored
recognition of affective communicative intent through the sole extraction of particular acoustic cues
typical of infant-directed speech (Fernald, 1985). This represents an instance of nonverbal interaction
in which emotional expressions and gestures used by human caretakers shape how and what preverbal
infants learn during social exchanges. Varshavskaya (2002) applied a behavior-based approach to the
problem of early concept and vocal label acquisition in a sociable anthropomorphic robot. The goal
of the system was to generate the kind of vocal output that a pre-linguistic, ten- to twelve-month-old
infant may produce, namely emotive grunts, canonical babbling (which includes the syllables required
for meaningful speech), and a formulaic proto-language (a pre-verbal and pre-grammatical
form of the future language). In the author’s own words, most inspirational for the design of the
proto-language acquisition system was the seminal work by Halliday (1975). Dautenhahn and Billard
(1999) also investigated the synthesis of a robotic proto-language through interaction of a robot with
either a human or a robotic teacher. They were able to show how language can be grounded via a
simple movement imitation strategy. Work at an even more preverbal level was done by Yoshikawa et al. (2003),
who constructed a system – consisting of a microphone, a simplified mechanical model of the human
vocal tract, and a neural network – that had to learn to articulate vowels. Inspired by evidence that
shows how maternal vocal imitation leads to the highest rates of infant vocalization (Pelaez-Nogueras
et al., 1996), the artificial system was trained by having the human teacher imitate the robotic system.
Recently, developmentally inspired approaches to robot imitation have received considerable at-
tention (Andry et al., 2002; Dautenhahn and Nehaniv, 2002; Demiris, 1999; Kuniyoshi et al., 2003).
Typically, in robot imitation studies the robot imitates the human teacher or another robot. This rela-
tionship was turned upside down by Stoica (2001), who showed that imitation of the movements of
a robotic arm by a human teacher could naturally lead to eye-arm coordination as well as to an
adequate control of the arm – see also Yoshikawa et al.'s work on speech generation (Yoshikawa et al.,
2003). Many authors have suggested a relatively straightforward two-stage procedure: First, the ar-
tificial system learns to associate proprioceptive or other motor-related sensory information to visual
sensory information and then, while imitating, it exploits the acquired associations by querying for
the motor commands that correspond to the previously perceived sensory information. An example
of a different approach was reported by Demiris and Hayes (2002), who developed a computational
architecture of early imitation used for the control of an active vision head, which was based on the
Active Intermodal Matching hypothesis 9 for early infant imitation proposed by Meltzoff and Moore
(1997). The authors also give an overview of previous work in the field of robotic imitation (for similar
surveys, see Breazeal and Scassellati, 2002; Schaal, 1999).
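The two-stage associate-then-query procedure mentioned above can be sketched as follows. This is a toy nearest-neighbor version with hypothetical names and a made-up forward mapping; the studies cited typically use learned forward or inverse models rather than a raw lookup table.

```python
import random

class TwoStageImitator:
    """Minimal sketch of the two-stage procedure: associate during motor
    babbling, then query the stored associations while imitating."""

    def __init__(self):
        self.memory = []  # (visual_observation, motor_command) pairs

    def babble(self, motor_to_visual, n_trials=200, seed=0):
        # Stage 1: execute random motor commands and store the visual
        # consequence of each together with the command that produced it.
        rng = random.Random(seed)
        for _ in range(n_trials):
            command = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
            self.memory.append((motor_to_visual(command), command))

    def imitate(self, observed):
        # Stage 2: find the stored visual pattern closest to the one just
        # demonstrated and replay the motor command associated with it.
        def sq_dist(obs):
            return sum((a - b) ** 2 for a, b in zip(obs, observed))
        _, command = min(self.memory, key=lambda pair: sq_dist(pair[0]))
        return command
```

Given a demonstration, the robot never needs an explicit inverse model: the babbling phase has already paired visual outcomes with the commands that caused them.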
Learning by imitation offers many benefits (Demiris, 1999; Demiris and Hayes, 2002; Schaal,
1999). A human demonstrator, for instance, can teach a robot to perform certain types of movements by
simply performing them in front of the robot. This strategy drastically reduces the amount of trial
and error required for the task that the robot is trying to accomplish, and consequently speeds up learning (Schaal,
1999). Furthermore, it is possible to teach new tasks to robots by interacting naturally with them. This
possibility is appealing, because it might lead to open-ended learning not constrained by any particular
task or environment.
All studies reviewed thus far presuppose, in one way or another, a set of basic sensory-motor skills
(such as gazing, pointing or reaching) deemed important for social exchanges of any kind. Stated dif-
ferently, for embodied systems to behave and interact – socially and non-socially – in the real world,
an appropriate coordination of perception and action is necessary. It is becoming commonly accepted
that action and perception are tightly intertwined, and that the refinement of this coupling is the out-
come of a gradual developmental process (e.g., Thelen and Smith, 1994). The following subsection
will review studies that attempt to deepen our understanding of the link between perception and action
in a non-social context.
2.5.2 Non-social interaction
Sensing and acting are tied to each other. Accurate motor control would not be possible without per-
ception, and vice versa, purposive vision would not be feasible without adequate control of actions.
In the last decade or so, neurophysiologists have discovered a number of multi-sensory and
sensory-motor areas. Building models of the processing performed by those areas is a challenging
research endeavor; more importantly, it casts serious doubt on the way the problem of
perception has traditionally been understood by the Artificial Intelligence (AI) community, that is, as a
process of mapping sensory stimulation onto internal symbolic representations (particularly as young
children presumably do not have well-developed “symbols” 10). We have already given some hints
that this has changed. More work is certainly required in order to get a better grasp on the mechanisms
of perception and how they are linked to action.
The coordination of action and perception is of particular importance for category learning. Tradi-
tionally, the problem of categorization has been investigated by employing disembodied categorization
9The hypothesis suggests that infants try to match visual information against appropriately transformed proprioceptive information.
10Thanks to an anonymous reviewer for pointing this out.
models. A growing body of evidence supports, however, a more interactive, dynamic, and embod-
ied view of how categories are formed (Lakoff and Johnson, 1999; Nolfi and Floreano, 2000; Pfeifer
and Scheier, 1999). In essence, as suggested by Dewey (1896) more than one century ago, catego-
rization can be conceptualized as a process of sensory-motor coordinated bodily interaction with the
real world. Embodied models of categorization are not passively exposed to sensory data, but through
movements and interaction with the environment, they are able to generate “good” sensory data, for
example by inducing time-locked spatio-temporal correlations within one sensory modality or across
various sensory modalities (Lungarella and Pfeifer, 2001; Lungarella and Sporns, 2004; Pfeifer and
Scheier, 1997; Sporns and Pegors, 2004; Te Boekhorst et al., 2003) (cf. “principle of information
self-structuring” previously introduced).
Categorization of objects via real-time correlation of temporally contingent information impinging
on the haptic and the visual sensors of a mobile robot was achieved by Scheier and Lambrinos (1996),
for instance. The suggested control architecture employed sensory-motor coordination at various func-
tional levels – for saccading on interesting regions in the environment, for attentional sensory-motor
loops, and for category learning. Sensory-motor activity was also critical in work performed by Krich-
mar and Edelman (2002), who studied the role played by sensory experience for the development of
perceptual categories. In particular, the authors showed that the overall frequency and temporal order
of the perceptual stimuli encountered had a definite influence on the number of neural units devoted
to a specific object class. This result is confirmed by research on experience-dependent neural plastic-
ity (see Stiles, 2000, for a recent view).
A few other examples exist of the application of a developmental approach to the acquisition of
visuo-motor coordination. Marjanovic et al. (1996), for instance, were able to show how acquired
oculomotor control (saccadic movements) could be reused for learning to reach or point toward a visu-
ally identified target. A similar model of developmental control of reaching was investigated by Metta
et al. (1999). The authors concluded that early motor synergies might speed up learning and consid-
erably simplify the problem of the exploration of the workspace (see also Pfeifer and Scheier, 1997).
They also pointed out that control and learning should proceed concurrently rather than separately –
as is the case in more traditional engineering approaches. These studies complement those on the
development of joint attention, discussed in the previous section. Berthouze and colleagues employed
the tracking of a pendulum to teach an active vision head simple visual skills such as gaze control, and
saccading eye movements (Berthouze et al., 1997; Berthouze and Kuniyoshi, 1998). Remarkably, the
robot even discovered its “own vestibulo-ocular reflex.” The approach capitalized on the exploitation
of the robot-environment interaction for the emergence of coordinated behavior. Non-social, object-
related sensory-motor interaction was also central in the study performed by Metta and Fitzpatrick
(2003). Starting from a reduced set of hypotheses, their humanoid system learned – by actively poking
and prodding objects (e.g., a toy car or a bottle) – to associate particular actions with particular object
behaviors (e.g., a toy car rolls along if pushed appropriately, while a bottle tends to roll sideways).
Their results were in accordance with the theory of affordances by Gibson (1977).
A different research direction was taken by Coehlo et al. (2001). They proposed a system archi-
tecture that employed haptic categories and the integration of tactile and visual information in order to
learn to predict the best type of grasp for an observed object. Relevant in this case is the autonomous
development of complex visual features starting from simple behavioral primitives.
Weng et al. (2000) reported on a developmental algorithm tested on a robot, which had to learn
to navigate on its own in an unknown indoor environment. The robot was trained interactively, that
is, on-line and in real time, via direct touch of one of the 28 touch sensors located on the robot's body.
By receiving some help and guidance from a human teacher, the algorithm was able to automatically
develop touch-guided motor behaviors and, according to the authors, some kind of low-level vision.
2.5.3 Agent-related control
As discussed in Section 2.4.5, self-exploration plays a salient role in infancy. The emergence and
tuning of sensory-motor control is hypothesized to be the result of the exploration of the perceptual
consequences of infants’ self-produced actions (Rochat and Striano, 2000). Similarly, an agent may
attain sensory-motor control of its bodily capabilities by autonomous exploration of its sensory-motor
space. A few instances of acquisition of agent-related control in robots exist.
Inspired by findings from developmental psychology, Berthouze et al. (1998) realized a system that
employed a set of basic visuo-motor (explorative) behaviors to generate sensory-motor patterns, which
were subsequently categorized by a neural architecture capable of temporal information processing.
Following a similar line of research, Kuniyoshi et al. (2003) developed a visuo-motor learning system
whose goal was the acquisition of neonatal imitation capabilities through a self-exploratory process of
“body babbling” (Meltzoff and Moore, 1997). As in (Berthouze et al., 1998), the proposed neural archi-
tecture was also capable of temporal information processing. An agent-related (not object-related) type
of categorization is also reported in (Berthouze and Kuniyoshi, 1998). The authors used self-organizing
Kohonen maps to perform an unsupervised categorization of sensory-motor patterns, which emerged
from embodied interaction of an active vision system with its environment. The self-organization pro-
cess led to four sensory-motor categories consisting of horizontal, vertical, and “in-depth” motions,
and a not clearly defined intermediate category.
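The kind of unsupervised categorization of sensory-motor patterns performed with Kohonen maps can be sketched as follows. This is a deliberately minimal 1-D map with illustrative parameters and names of our own choosing, not the architecture of the cited study: each input pulls its best-matching unit, and more weakly that unit's neighbors, toward itself, so that categories emerge without any supervision.

```python
import math
import random

def train_som(data, n_units=4, n_epochs=50, seed=0):
    """Minimal 1-D Kohonen map trained on a list of equal-length vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.uniform(0.0, 1.0) for _ in range(dim)] for _ in range(n_units)]
    order = list(range(len(data)))
    for epoch in range(n_epochs):
        lr = 0.5 * (1.0 - epoch / n_epochs)                    # decaying learning rate
        radius = max(1.0, (n_units / 2.0) * (1.0 - epoch / n_epochs))
        rng.shuffle(order)                                     # randomize presentation
        for idx in order:
            x = data[idx]
            bmu = categorize(units, x)                         # best-matching unit
            for i in range(n_units):
                # Neighborhood function: units near the BMU on the 1-D map
                # are pulled toward the input more strongly than distant ones.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
                units[i] = [u + lr * h * (v - u) for u, v in zip(units[i], x)]
    return units

def categorize(units, x):
    """Index of the best-matching unit: the emergent 'category' of input x."""
    return min(range(len(units)),
               key=lambda i: sum((u - v) ** 2 for u, v in zip(units[i], x)))
```

After training on sensory-motor vectors drawn from distinct movement regimes, inputs from different regimes map to different units, which is the sense in which the map "discovers" motion categories.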
Morphological changes (e.g., body growth, changes of visual acuity and visual resolution) rep-
resent one of the most salient characteristics of an ongoing developmental process. Lungarella and
Berthouze (2002a) investigated the role played by such changes for the acquisition of motor skills by
using a small-sized humanoid robot that had to learn to pendulate, i.e., to swing like a pendulum (cf.
Chapter 3 and Chapter 4). The authors attempted to understand whether physical limitations and con-
straints inherent to body development could be beneficial for the exploration and selection of stable
sensory-motor configurations (see also Turkewitz and Kenny, 1982; Bjorklund and Green, 1992). In
order to validate the hypothesis, Lungarella and Berthouze (2002a,b) performed a comparative analysis
between the use of all bodily degrees of freedom from the very start, and the progressive involvement
of all degrees of freedom by employing a mechanism of developmental freezing and unfreezing of
degrees of freedom (Bernstein, 1967). In a follow-up case-study (Lungarella and Berthouze, 2002c),
the same authors investigated the hypothesis that inherent adaptivity of morphological changes leads to
behavioral characteristics not obtainable by mere value-based regulation of neural parameters (Chapter
4). The authors were able to provide evidence for the claim that in learning a motor task, a reduction
of the number of available biomechanical degrees of freedom helps stabilize the interplay between
environmental and neural dynamics (the way patterns of activity in the neural system change with time).
They showed that the use of all available degrees of freedom from the start reduced the likelihood of
physical entrainment, i.e., mutual regulation of body and environmental dynamics.
In turn, lack of entrainment led to a reduced robustness of the system against environmental perturba-
tions. Conversely, by initially freezing some of the available degrees of freedom, physical entrainment
and thus robust oscillatory behavior could occur.
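The freezing/unfreezing mechanism can be sketched in a few lines (an illustrative schedule with hypothetical names, not the controller used in the cited studies): frozen joints are clamped to a resting pose, so the learner initially explores a low-dimensional command space that grows as development proceeds.

```python
def free_joints(stage, n_joints):
    """Developmental schedule: at stage s, joints 0..s are released; the
    remaining joints stay frozen (Bernstein-style freezing/unfreezing).
    The learner's search space thus has stage+1 dimensions, not n_joints."""
    return [i <= stage for i in range(n_joints)]

def apply_freezing(commands, mask, frozen_pose):
    """Frozen joints are clamped to a resting pose; free joints pass the
    learner's motor commands through unchanged."""
    return [cmd if free else rest
            for cmd, free, rest in zip(commands, mask, frozen_pose)]
```

At stage 0 only the first joint responds to the learner; advancing the stage progressively "unfreezes" the rest, mirroring the comparative setup described above.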
Another instance of agent-related sensory-motor control was reported by Lungarella and Berthouze
(2003) (Chapter 5). Inspired by a study of how infants strapped in a Jolly Jumper learn to bounce (Gold-
field et al., 1993), the authors performed a series of experiments with a bouncing humanoid robot (see
Fig. 2.2), aimed at understanding the mechanisms and computational principles that underlie the
emergence of movement patterns via self-exploration of the sensory-motor space (such as entrainment). The
study showed that a suitable choice of the coupling constant between limb segments, as well as of the
gain of the sensory feedback, induced a reduction of the movement variability, an increase in bouncing
amplitude, and led to movement stability. The authors attributed the result to the entrainment of body
and environmental dynamics. Taga (1995) reported a similar finding in the case of biped walking.
2.5.4 Mechanisms and processes
A few mechanisms, such as freezing and unfreezing of degrees of freedom, or physical entrainment,
have already been discussed in the previous section. Other developmentally relevant mechanisms exist.
Some of them are related to changes in morphological parameters, such as sensor resolution, and motor
accuracy, some of them affect neural parameters, such as the number of neurons constituting the neural
system. Dominguez and Jacobs (2003) and Nagai et al. (2002), for instance, describe systems that
start with an impoverished visual input whose quality gradually improves as development (or learning)
progresses. In this section, we discuss two additional mechanisms.
Value system
Learning is modulated by value systems. A learning technique in which the output of the value sys-
tem modulates learning itself is called value-based or value-dependent learning. Unlike reinforcement
learning techniques (which provide an interesting set of computational principles), value-based learn-
ing schemes typically specify the neural mechanisms by which stimuli can modulate learning, and by
which organisms sense the consequences of their actions (Sporns, 2003, 2004) (see also Pfeifer and
Scheier, 1999, ch.14). Another difference between the two learning paradigms is that typically – in
reinforcement learning – learning is regulated by a (reinforcement) signal given by the environment,
whereas in value-based learning, the (value) signal is provided by the agent itself (self-teaching). A
number of value systems have been realized in robotic systems. In those implementations the value
system either plays the role of an internal mediator of salient environmental stimuli and events (Al-
massy et al., 1998; Krichmar and Edelman, 2002; Scheier and Lambrinos, 1996; Sporns et al., 2000;
Sporns and Alexander, 2002), or is used to guide some sort of exploratory process (Lungarella and
Berthouze, 2002c; Steels, 2003) (cf. chapter 3, 4, and 6 of this thesis).
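The gating of learning by a value signal can be sketched as follows (a toy rate-based rule with illustrative names; the cited neural models are considerably richer): the usual Hebbian correlation term is multiplied by a scalar value signal, so that synaptic change occurs only when something salient happens.

```python
def hebbian_update(weights, pre, post, value, eta=0.1):
    """Value-modulated Hebbian rule for a weight matrix mapping presynaptic
    activities 'pre' to postsynaptic activities 'post'. The correlation term
    post[i] * pre[j] is gated by the scalar value signal: with value = 0
    (no salient event) no learning takes place, in contrast to classical
    Hebbian learning, which would update on every coactivation."""
    return [[w_ij + eta * value * post[i] * pre[j]
             for j, w_ij in enumerate(row)]
            for i, row in enumerate(weights)]
```

The self-teaching character of value-based learning noted above lies in where the `value` signal comes from: it is produced by the agent's own value system rather than handed down by the environment.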
Almassy et al. (1998) constructed a simulated neural model embedded in an autonomous real-
world device, one of whose four components was a “diffuse and ascending” value system. The value
signals were used to modify the strength of the connections from the neurons of the visual area to the
ones of the motor area. One of the results of these value-dependent modifications was that without any
supervision, appropriate behavioral actions could be linked to particular responses of the visual system.
A similar model system was described by Krichmar and Edelman (2002). Compared to previous work,
the modeled value signal had two additional features: a prolonged effect on synaptic plasticity, and
the presence of time-delays (Krichmar and Edelman, 2002, p. 829). Another instantiation of a value
system is described in (Scheier and Lambrinos, 1996; Pfeifer and Scheier, 1997). In this case, the
output of the value system was used to modulate Hebbian learning – yet another crucial mechanism.
Essentially, the robot was allowed to learn only while it was exploring objects. Sporns and Alexander
(2002) tested a computational model of a neuromodulatory system 11 – structurally and functionally
similar to the mammalian dopamine and noradrenaline system – in an autonomous robot. The model
comprised two neuromodulatory components mediating the effect of rewards and of aversive stimuli.
According to the authors, value signals played a dual role in synaptic plasticity, in that they not only
had to modulate the strength of the connection between sensory and motor units, but they were also
responsible for the change of the response properties of the value system itself.
In contrast to the previous cases, where the value system was used to modulate learning, in (Lun-
garella and Berthouze, 2002c) the value system was employed to guide the exploration of the parameter
11Neuromodulatory systems are instantiations of value systems that find justification in neurobiology. Examples include the dopaminergic and the noradrenergic systems.
space associated with the neural system of a robot that had to learn to pendulate (cf. chapters 3, 4, and
6 of this thesis).
Developmental plasticity
Plasticity is an important ontogenetic mechanism that contributes to the adaptivity of brain, body and
behavior in response to internal and external variations. The developing brain, for instance, is continu-
ously changing (in terms of number of neurons, amount of interconnections, wiring patterns, synaptic
plasticity, and so on) and these changes are in part experience-dependent. Such neural plasticity gives
our neural circuitry the potential to acquire (given appropriate training) nearly any function (O’Leary
et al., 1994). A similar characteristic holds for plasticity of body and behavior.
The study of a neural model incorporating mechanisms of neural plasticity was conducted by Al-
massy et al. (1998) (for more examples, see Sporns, 2004). In particular, the authors analyzed how
environmental interactions of a simulated neural model embedded in a robot may influence the initial
formation, the development, and the dynamic adjustment of complex neural responses during sensory
experience. They observed that the robot’s self-generated movements were crucial for the emergence
and development of selective and translation-invariant visual cortical responses, because they induced
correlations in various sensory modalities. Another result was the development of a foveal
preference, that is, the system showed “stronger visual responses to objects, presented closer to the visual
fovea” (Almassy et al., 1998, p. 358).
A further example of synthetic neural modeling is illustrated in (Elliott and Shadbolt, 2001). The
authors studied the application of a neural model, featuring “anatomical, activity-dependent, develop-
mental synaptic plasticity” (p. 167), to the growth of sensory-motor maps in a robot whose task was
to avoid obstacles. They showed that the deprivation of one or two (infrared-light) receptors could
be compensated for by a mechanism of developmental plasticity, which according to the authors would
allow the nervous system to adapt to the body as well as to the external environment in which the body
resides.
2.5.5 Intermediate discussion
We can make a number of observations. Almost 40% of the studies reviewed (11 out of 29) fell in the
category labeled “social interaction” (see Table 2.3). Apparently, this category constitutes a primary
direction of research in developmental robotics. This result is confirmed by the fact that lately a lot of
attention has been directed toward designing socially interactive robots. In a recent and broad overview
of the field, Fong et al. (2003) tried to understand the reasons behind the growing interest in socially
interactive robotics. They concluded that social interaction is desirable where robots mediate
human-human (peer-to-peer) interactions (robot as persuasive machine) or where robots function
as a representation of, or representative for, the human (robot as avatar 12). It is plausible to assume
that in order to acquire more refined and advanced social competencies – e.g., deferred imitation 13 –
a robot should undergo a process of progressive development of its social skills analogous to that of
humans. Fong and his colleagues share this opinion.
It is further interesting to note that many of the studies considered here examine to some extent the
sensory-motor competence in interacting with the local environment – in particular, basic visuo-motor
competencies such as saccading, gaze fixation, joint attention, hand-eye coordination, and visually-
guided reaching. Brooks (2003) stressed the “crucial” importance of basic social competencies (e.g.,
the determination of gaze direction) for peer-to-peer interactions. Early motor
competencies are a natural prerequisite for the development of basic social competencies. We were able,
however, to single out only a few studies that have attempted to go beyond pointing, reaching, or
gazing, i.e., early motor competencies. This issue is closely related to the notoriously hard problem
of learning to coordinate the many degrees of freedom of a potentially redundant nonlinear physical
system, and indeed, imitation learning may represent a suitable route to its solution (Schaal, 1999).
Another way out of the impasse may be to exploit processes of self-exploration of the sensory-motor
system and its intrinsic dynamics. The usage of self-exploration is explicitly advocated in four of the
surveyed studies (Andry et al., 2002; Berthouze et al., 1998; Kuniyoshi et al., 2003; Lungarella and
Berthouze, 2002c), and has presumably been employed implicitly in others as well.
From a developmental perspective, learning multi-joint coordinations or acquiring complex mo-
tor skills may benefit from the introduction of initial morphological (sensor, motor, and neural) con-
straints, which over time are gradually released (Scassellati, 2001; Lungarella and Berthouze, 2002b;
Nagai et al., 2002). In the same context, mechanisms of physical and neural entrainment, that is, mu-
tual regulation between environment and the robot’s neural and body dynamics, as well as value-based
self-exploration of body and neural parameters, deserve further investigation. A pioneering attempt
to capitalize on the coupling between the body, neural, and environmental dynamics was promoted
by Taga (1991). In his model of biped walking, he showed how movements could emerge from a
global entrainment 14 between the activity of the musculo-skeletal system and the surrounding
environment. The study was performed, however, only in simulation. Williamson (1998) used two real
robot arms to investigate a similar issue. He claimed that his approach would make it possible to
achieve general oscillatory motion and more complex rhythmic tasks by exploiting the coupled dynamics of
an oscillator system and the arm (p. 1393). Two obvious shortcomings of his investigation
were the absence of learning and of a developmental framework. Lungarella and Berthouze (2002c,
2003), building on previous research, attempted to capitalize on the interplay between neural plasticity,
morphological changes, and entrainment to the dynamics of body and task (chapters 3 and 4).
12Remote-presence robots may indeed be one of the killer applications of robotics in the near future (Brooks, 2003, p. 135).
13Imitation that takes place a certain amount of time after the demonstration by the teacher.
14“Since the entrainment has a global characteristic of being spontaneously established through interaction with the environment, we call it global entrainment” (Taga, 1991, p. 148).
Autonomy, a thorny concept without a generally accepted definition (e.g., Pfeifer and Scheier,
1999), is another research theme in need of further investigation. Loosely speaking, an autonomous
system must be self-contained and independent from external control. Thus, in such a system, the
mechanisms and processes that mold local structure to yield global function must reside entirely within
the system itself (Sporns, 2003). Autonomy is no easy feat. An autonomous robot should also be
endowed with an initial set of values and drives, i.e., motivations or needs to act and interact with the
environment. The role of the value system and of the motivational system is to mediate learning, pro-
mote parameter exploration, drive action selection, and regulate social interactions (Blumberg, 1996;
Breazeal and Scassellati, 2000). Concerning the value system, an important issue will have to be ad-
dressed in future work, that is, how specific or general the system of values and motivations needs to
be in order to bootstrap adaptive behavior. In current implementations, values and motivations are
relatively simple: light is better than no light, or seek face-like blobs while avoiding non-face-like blobs. In
essence, the issue boils down to the choice of the initial set of values and drives. But, how much has to
be predefined, and how much should be acquired?
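As a toy illustration of how a value signal can gate learning (a hypothetical minimal agent, not any of the systems cited above), consider an agent with an innate "light is better than no light" bias: only actions that are followed by positive value have their propensities reinforced.

```python
import random

def value_gated_learning(trials=500, lr=0.2, seed=1):
    """A scalar value signal gates plasticity: the propensity of an action
    is updated only when the value system fires after that action."""
    rng = random.Random(seed)
    w = [0.0, 0.0]  # propensities: action 0 = move toward light, 1 = toward dark
    for _ in range(trials):
        # noisy winner-take-all action selection keeps some exploration going
        a = 0 if w[0] + rng.gauss(0, 0.5) > w[1] + rng.gauss(0, 0.5) else 1
        value = 1.0 if a == 0 else 0.0   # innate bias: "light is better than no light"
        w[a] += lr * (value - w[a])      # value-gated update of the chosen action
    return w

if __name__ == "__main__":
    w = value_gated_learning()
    print("propensities (light, dark):", [round(x, 2) for x in w])
```

Over time the agent comes to seek light almost exclusively; replacing the hard-wired value signal with one that is itself learned is precisely the open issue raised above.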
Finally, we note that while the spectrum of outstanding research issues, as well as the complexity of
the available robots, have considerably increased over the past few years, not many “developmentally
inspired” reconnaissance tours into unexplored research directions have been started yet. There is, for
instance, only one single study on navigation that tried to employ developmental mechanisms (Weng
et al., 2000), and there are no studies at all on robot locomotion!
2.6 Developmental robotics: existing theoretical frameworks
Early theorization of developmental robotics can be traced back to work on behavior-based robotics,
physical embodiment, situatedness, and sensory-motor coupling with the environment (Brooks, 1991;
Brooks and Stein, 1994; Rutkowska, 1995). En route to understanding human intelligence by building
robots, Sandini et al. (1997) were among the first to recognize the importance of taking into account
development. They called their approach “Developmental Engineering.” As in traditional engineering,
the approach is directed toward the definition of a theory for the construction of complex systems. The
main objective is to show that the adoption of a framework of biological development can be success-
fully employed for constructing artificial systems. Metta (2000) pointed out that this methodology can
be envisaged as some sort of new tool for exploring developmental cognitive sciences. Such a new tool
could have a similar role to the one that system and control theory had for the analysis of human move-
ments. The authors investigated some of the aspects of visuo-motor coordination in a humanoid robot
called Babybot (see Fig. 2.2). Issues, such as the autonomous acquisition of skills, the progressive
increase of the task complexity (by increasing the visual resolution of the system), and the integration
of various sensory modalities, were also explored (Panerai et al., 2002; Natale et al., 2002). Recently,
the same group also produced a manifesto of developmental robotics outlining various aspects relevant
to the construction of complex autonomous systems (Metta et al., 2001). The article maintained that
the ability to recognize progressively longer chains of cause-effect relationships could be one possi-
ble way of characterizing learning in an “ecological context”, because in a natural setting no teacher
can possibly provide a detailed learning signal and enough training data (e.g., in motor learning the
correct activation of all muscles, proper torque values, and so on). For another recent manifesto of
developmental robotics, see (Elliott and Shadbolt, 2003).
Around the same time as Sandini, Ferrell and Kemp (1996) as well as Brooks (1997) argued that
development could lead to new insights into the issues of cognitive and behavioral scaling. In an ar-
ticle titled “Alternative Essences of Intelligence”, Brooks et al. (1998) explored four “intertwined key
attributes” of human-like intelligent systems, that is, development, embodiment, social interaction, and
multisensory integration. They made the following assumptions (implicitly negating three central be-
liefs of classical AI): (a) human intelligence is not as general purpose as usually thought; (b) it does
not require a monolithic control system (for the existence of which there is no evidence); and (c) in-
telligent behavior does not require a centrally stored model of the real world. The authors, drawing
inspiration from developmental neuroscience and psychology, performed a series of experiments in
which their humanoid robot(s) had to learn some fundamental sensory-motor and social behaviors (see
also Sec. 2.5). The same group also tried to capitalize on the concept of bootstrapping of skills from
previously acquired skills, i.e., the layering of new skills on top of existing ones. The gradual increase
in complexity of task-environment, sensory input (through the simulation of maturational processes),
as well as motor control, was also explored in tasks such as learning to saccade and to reach toward
a visually identified target (Marjanovic et al., 1996). Scassellati (1998, 2001) proposed that a devel-
opmental approach, in humans and in robots, might provide a useful structured decomposition when
learning complex tasks – or in his own words, “building systems developmentally facilitates learning
both by providing a structured decomposition of skills and by gradually increasing the complexity of
the task to match the competency of the system” (Scassellati, 2001, p. 29).
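The gradual lifting of sensory constraints mentioned above can be sketched as follows (a hypothetical minimal model, not the actual controller of Marjanovic et al.): a saccade gain is learned by a delta rule while the retinal error signal is quantized at increasingly fine resolution, crudely simulating the maturation of visual acuity.

```python
import random

def learn_saccade_gain(stages=(4, 16, 64), steps=300, lr=0.1, seed=0):
    """Delta-rule learning of a saccade gain under staged sensory
    'maturation': the retinal error is quantized at increasingly fine
    resolution, so early learning only sees a coarse, simplified signal."""
    rng = random.Random(seed)
    gain = 0.0
    for levels in stages:                            # coarse-to-fine retina
        for _ in range(steps):
            target = rng.uniform(-1.0, 1.0)          # target in retinal coordinates
            sensed = round(target * levels) / levels # quantized retinal error
            eye = gain * sensed                      # one open-loop saccade
            residual = target - eye                  # post-saccadic visual feedback
            gain += lr * residual * sensed           # delta rule on the gain
    return gain

if __name__ == "__main__":
    print(f"learned gain: {learn_saccade_gain():.3f} (ideal: 1.0)")
```

The coarse early stages restrict the complexity of the input while the gain is still far from correct; the fine final stage then polishes the estimate.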
Another example of the novel and developmentally inspired approach to robotics was given
by Asada et al. (2001). The authors proposed a theory for the design and construction of humanoid
systems called “Cognitive Developmental Robotics.” One of the key aspects of cognitive developmental
robotics is to avoid implementing the robot’s control structure “according to the designer’s
understanding of the robot’s physics”, and instead to have the robot acquire its own understanding “through
interaction with the environment” (Asada et al., 2001, p. 185). This methodology departs from tradi-
tional control engineering, where the designer of the system imposes the structure of the controller.
In cognitive developmental robotics in particular, and in developmental robotics in general, the robot
has to get to grips with the structure of the environment and behavior, rather than being endowed a
priori with an externally designed structure. Cognitive developmental robotics also points at how to
“prepare” the robot’s environment to progressively teach the robot new and more complex tasks with-
out overwhelming its artificial cognitive structure. This technique is called scaffolding, and parents or
caretakers often employ it to support, shape, and guide the development of infants (Sec. 2.4.10).
A last example of existing theories in developmental robotics is “Autonomous Mental Develop-
ment” (Weng et al., 2001). Autonomous mental development differs from the traditional engineering
paradigm of designing and constructing robots in which the task is “understood by the engineer”, be-
cause the machine has to develop its own understanding of the task. According to this paradigm, robots
should be designed to go through a long period of autonomous mental development, from infancy to
adulthood. Autonomous mental development relegates the human to the role of teaching and support-
ing the robot through reinforcement signals. The requirements for a truly mental development include
being non-task specific, because the task is generally unknown at design time. For the same reason,
the artificial brain has to develop a representation of the task, which could not possibly be embedded
in advance into the robot by the designer.
2.7 Discussion
One of the big outstanding research issues on the agenda of researchers of AI and robotics is how to
address the design of artificial systems with skills that go beyond “single-task” sensory-motor learning.
The very search for flexible, autonomous, and open-ended multi-task learning systems is, in essence, a
particular re-instantiation of the long-standing search for general-purpose (human-like) artificial intel-
ligence. In this respect, developmental robotics does not differ from other approaches, and embraces
a variation on the same theme. Yet – like some other scholars in the field – we speculate that the
rapprochement of robotics and developmental psychology may represent both a crucial element of a
general constructive theory for building intelligent systems, and a prolific route to gain new insights
into the nature of intelligence.
The modern view on AI notwithstanding (e.g., Pfeifer and Scheier, 1999), hand-designing au-
tonomous intelligent systems remains an extremely difficult enterprise. It is so challenging that the AI
community is beginning to resign itself to the possibility that, with the current models of intelligence, it may even be
impossible in principle. In fact, many now believe that all proposed frameworks may have multi-
ple shortcomings. It is probably false to assume, for instance, that by merely simulating enough of
the right kind of brain, intelligence will ‘automagically’ emerge. In other words, enough quantitative
change may not necessarily lead to a qualitative change (e.g., De Garis et al., 1998). It is likely that
some fundamental principles still remain to be understood. Brooks (1997, 2003), for instance, has
hypothesized that our current scientific understanding of living things may be lacking some yet-to-
be-discovered fundamental mathematical description – Brooks calls it provocatively the “juice” – that
is preventing us from grasping what is going on in living systems. We believe that a developmental
approach may provide a way to gracefully tackle the problem of finding Brooks’s juice. The mere
observation that almost all biological systems – to different extents – mature and develop bears the
compelling message that development is the main reason why the adaptivity and flexibility of living
organisms transcend those of artificial systems. In this sense, the study of the mechanisms under-
lying postnatal development might provide the key to a deeper understanding of biological systems in
general and of intelligent systems in particular. In other words, however interesting it might be from
an engineering perspective, we have not yet succeeded in designing intelligent systems that are able to
cope with the contingencies of the real world – the reason being that we do not yet understand many of
the mechanisms underlying intelligent behavior. Thus, we are essentially trying to learn from nature,
which over millions of years of evolution has come up with ontogenetic development. In a possible next
step, the designer’s commitments could be pushed even further back (evolutionarily speaking), by design-
ing only the mechanisms of genetic regulatory networks and artificial evolution, and letting everything
evolve (Nolfi and Floreano, 2000).
But what can a developmental approach do? Can it help us construct intelligent machines? The ra-
tionale is that having a complex process (development) gradually unfold in a complex artificial system
(e.g., humanoid robot) can inform our understanding of an even more complex biological system (e.g.,
human brain). Development is a historical process, in the course of which – through mutual coupling
and through interaction with the environment – new and increasingly complex levels of organization
appear and disappear. That is, adult skills do not spring up fully formed but emerge over time (see
Sec. 2.4.1). Thus, at least in principle, it should be possible to decompose the developmental progres-
sion into a sequence of increasingly complex activity patterns, that facilitate learning from the point
of view of the artificial system, and analysis and understanding on the side of the designer. Moreover,
development provides constraints and behavioral predispositions that combined with a general state of
“bodily immaturity” seem to be a source of flexibility and adaptivity (see Sec. 2.4.2 and Sec. 2.4.7).
Newborn infants, for instance, despite being restricted in many ways, are tailored to the idiosyncrasies
of their ecological niche – even to the point of displaying a rich set of adaptive biases toward social
interaction. Another contribution to the adaptivity of the developing system comes from its morpho-
logical plasticity, i.e., changes over time of sensory resolution, motor accuracy, mass of muscles and
limbs, and so on.
The message conveyed is one of the basic tenets of a developmental synthetic methodology: The
designer should not try to engineer “intelligence” into the artificial system (in general an extremely
hard problem); instead he or she should try to endow the system with an appropriate set of basic mech-
anisms for the system to develop, learn, and behave in a way that appears intelligent to an external
observer. As many others before us, we advocate the reliance on the principles of emergent functional-
ity (Rutkowska, 1994) and self-organization (see Sec. 2.4.3), which are essential features of biological
systems at any level of organization.
According to Rosen (1991), the formulation of a theory about the functioning of ‘something’ (e.g.,
living cells, artificial neural networks, and so forth) entails at least two problems. The first one, called
the “physiology problem”, relates to the mechanisms that underlie the functioning of this “something.”
The second one, the “construction problem”, addresses the identification of the basic building blocks
of the system. This identification is extremely difficult, because in general it is not obvious which
of the many possible decompositions is the correct one for describing the system as a whole. Here
development comes to the rescue. During ontogenesis the different factors (the building blocks) are inte-
grated into a functioning whole (the system). By studying how a system is actually assembled, we have
automatically (by default) a suitable decomposition. The understanding acquired from comprehending
development can be applied to both situations, that is, it can help us solve both the physiology and
the construction problem. A real understanding of “life itself” (borrowing from Rosen) might come
only through the formulation of a constructive theory.
As is evident from the survey given above, two important aspects of living systems that devel-
opmental robotics has to date not sufficiently addressed are morphology and materials. In order to
understand cognition, however, we cannot confine our investigations to the mere implementation of
control architectures and the “simulation” of morphological changes (see Pfeifer, 2000). If robots are
to be employed as ‘synthetic tools’ to model biological systems, we need to consider also physical
growth, change of shape and body composition, as well as material properties of sensors and actuators.
In this respect, despite not being explicitly inspired by developmental issues, the field of modu-
lar reconfigurable robotics is of some relevance for developmental robotics (e.g., Rus and Chirikjian,
2001). Murata et al. (2001), for instance, provided a taxonomy of reconfigurable, redundant and re-
generative systems, and maintained that this kind of machine represents the ultimate form of reliable
systems. Ideally, these systems should be able to produce any element in the system by themselves.
Up to now, there are no working examples of such systems. It is interesting to note that the description
given by Murata et al. bears some resemblance to the definition of “autopoietic” systems given by Mat-
urana and Varela (1998): “An autopoietic system is organized as a network of processes of production
(synthesis and destruction) of components such that these components (a) continuously regenerate and
realize the network that produces them, and (b) constitute the system as a distinguishable unity in the
domain in which they exist” (see also Beer, 2004; Luisi, 2003). An example of an autopoietic system is
the cell, which is constituted of a membrane and of the machinery for protein synthesis. From the point
of view of applications, the relevance of robots that have self-repair capabilities, or that can adapt their
body shape to the task at hand is evident; and indeed, the robotics community has recently started to
address these issues (Hara and Pfeifer, 2003; Teuscher et al., 2003). From a theoretical point of view,
however, it will be important to develop computational paradigms capable of describing and managing
the complexity of a robot body that changes over time.
As far as material properties are concerned, current technology lacks many of the characteris-
tics that biology has, that is, durable, efficient, and powerful actuators (e.g., in terms of power-volume
and weight-torque ratios), redundant and adaptive sensory systems (e.g., variable density of touch
receptors), as well as mechanical compliance and elasticity. Thus, the search for novel materials for
actuators and sensors will play a pivotal role. A few of these issues are being investigated for the
current generation of humanoid robots (for a review, see Dario et al., 1997), and will become more
compelling as robots start moving ‘out of the research labs.’ Take haptic perception (i.e., the
ability to use touch to identify objects), for instance. Due to the technological difficulties involved in
the construction of artificial skin sensors, most researchers do without this ability, or de-emphasize its
importance in relation to vision, audition, or proprioception. In many respects, however, haptic per-
ception – even more than vision – is directed toward the coupling of perception and action. Moreover,
the integration of haptic and visual stimulation is absolutely essential for the development of cognition
(e.g., visuo-haptic transfer, that is, the ability to coordinate information about the shape of objects from
hand to eyes, seems to be already present in newborns (Streri and Gentaz, 2003)).
2.8 Future prospects and conclusion
A list of future research directions that are worth pursuing needs to include autonomous learning –
where autonomous is intended in its strongest connotation, that is, as learning without a direct inter-
vention from a human designer (of course, this does not exclude interaction with a human teacher).
A key aspect of autonomous learning is the study of value systems that gate learning, and drive ex-
ploration of body dynamics and environment. We postulate that robots should acquire solutions to
contingent problems through autonomous exploration and interaction with the real world: generating
movements in various situations, while experiencing the consequences of those movements. Those so-
lutions could be due to a process of self-assembly, and thus would be constrained by the robot’s current
intrinsic dynamics. Common (not necessarily object-related) repetitive actions displayed by human in-
fants (poking, squishing, banging, bouncing, cruising) could give the developing artificial creature a
large amount of multimodal correlated sensory information, which could be used to bootstrap cogni-
tive processes, such as category formation, deferred imitation, or even a primitive sense of self. In
a plausible (but oversimplified) ‘developmental scenario’ the human designer could endow the robot
with simple biases, i.e., simple low-level ‘valences’ for movement, or for sound in the range of human
voices. A critical issue will be to have the robot develop new higher-level valences so as to bias ex-
ploration and learning for longer periods of time that transcend the time frame of usual sensory-motor
coordination tasks. Another possible route could be grounded in recent neurophysiological findings,
which seem to suggest that cognition evolved on top of pre-existing layers of motor control. In this
case, manipulation (a sensorimotor act) could play a fundamental role by allowing ‘baby-robots’ (or
infants) to acquire the concept of ‘object’ in the first place, and to evolve it into language (Rizzolatti
and Arbib, 1998). This aspect, although partially neglected so far, might prove to be an important next
step en route to the construction of human-like robots.
In conclusion, the generation of robots populating the years to come will be characterized by many
human-like features, not thought to be part of intelligence in the past but considered to be crucial
aspects of human intelligence nowadays. The success of the infant field of developmental robotics and
of the research methodology it advocates, will ultimately depend on whether truly autonomous ‘baby
robots’ will be constructed. It will also depend on whether by instantiating models of cognition in
developmental robots, predictions will be made that will find empirical validation.
Developmental facet(s) | Link to development, representative publication | References
Value systems, neural plasticity | postnatal cortical plasticity (Kato et al., 1991) | Almassy et al. (1998)
Social interaction, importance of constraints | early imitation (Nadel and Butterworth, 1999) | Andry et al. (2002)
Sensorimotor coordination | reflexive behavior (Piaget, 1953) | Berthouze et al. (1997)
Self-exploration, sensorimotor categorization | reflexive behavior (Piaget, 1953) | Berthouze and Kuniyoshi (1998)
Self-exploration | self-exploration | Berthouze et al. (1998)
Social interaction | infant-caretaker interactions (Bullowa, 1979) | Breazeal and Scassellati (2000)
Social interaction | prosodic pitch contours (Fernald, 1985) | Breazeal and Aryananda (2002)
Prospective control, sensorimotor coordination | visuo-haptic exploration, control of reaching (Berthier et al., 1996) | Coehlo et al. (2001)
Social interaction | proto-language development (Vygotsky, 1962) | Dautenhahn and Billard (1999)
Social interaction, early abilities | active intermodal matching (Meltzoff and Moore, 1997) | Demiris (1999)
Neural plasticity | neurotrophic factors (Purves, 1994) | Elliott and Shadbolt (2001)
Social interaction | joint attention (Butterworth and Jarrett, 1991) | Kozima and Yano (2001)
Categorization, neural plasticity | homeostatic plasticity mechanism (Turrigiano and Nelson, 2000) | Krichmar and Edelman (2002)
Social interaction, self-exploration | early imitation, body babbling (Meltzoff and Moore, 1997) | Kuniyoshi et al. (2003)
Degrees of freedom, value systems | freezing/unfreezing of degrees of freedom (Bernstein, 1967) | Lungarella and Berthouze (2002c)
Self-organization, self-exploration | bouncing, entrainment (Goldfield et al., 1993) | Lungarella and Berthouze (2003)
Stage-like process, value | infant reaching behavior (Diamond, 1990) | Marjanovic et al. (1996)
Stage-like process, value | infant reaching behavior (Konczak et al., 1995) | Metta et al. (1999)
Social interaction | mirror systems (Gallese et al., 1996) | Metta and Fitzpatrick (2003)
Social interaction, importance of constraints | joint visual attention | Nagai et al. (2002)
Sensory-motor coordination, self-organization | category learning (Thelen and Smith, 1994) | Pfeifer and Scheier (1997)
Social interaction, stage-like process | model of joint attention (Baron-Cohen, 1995) | Scassellati (1998)
Categorization, sensorimotor coordination | explorative behaviors (Rochat, 1989) | Scheier and Lambrinos (1996)
Categorization, value systems, neural plasticity | perceptual categorization (Edelman, 1987) | Sporns et al. (2000)
Value systems, neural plasticity | neuromodulatory system (Schultz, 1998) | Sporns and Alexander (2002)
Social interaction | eye-arm coordination | Stoica (2001)
Social interaction, early abilities | proto-linguistic functions (Halliday, 1975) | Varshavskaya (2002)
Value system | NA | Weng et al. (2000)
Social interaction, self-organization | contingent maternal vocalization (Pelaez-Nogueras et al., 1996) | Yoshikawa et al. (2003)

Table 2.2: Explicitly invoked developmental facet(s). NA = Not Available.
Subject area / Goal/focus | Robot | References

Socially oriented interaction:
early imitation | MR+AG | Andry et al. (2002)
social regulation | AVH | Breazeal and Scassellati (2000)
regulation of affective communication | AVH | Breazeal and Aryananda (2002)
proto-language development | MR | Dautenhahn and Billard (1999)
early imitation | AVH | Demiris (1999)
joint visual attention | UTH | Kozima et al. (2002)
joint visual attention | UTH+MR | Nagai et al. (2002)
joint visual attention | UTH | Scassellati (1998)
eye-arm coordination, imitation | RA | Stoica (2001)
early language development | AVH | Varshavskaya (2002)
vocal imitation | RS | Yoshikawa et al. (2003)

Nonsocial sensorimotor interaction:
saccading, gaze fixation | AVH | Berthouze and Kuniyoshi (1998)
visuo-haptic exploration | HGS | Coehlo et al. (2001)
visually-guided pointing | UTH | Marjanovic et al. (1996)
visually-guided reaching | UTH | Metta et al. (1999)
visually-guided manipulation | UTH | Metta and Fitzpatrick (2003)
indoor navigation | MR+AG | Weng et al. (2000)

Agent-related sensorimotor control:
self-exploration, early abilities, categorization | AVH | Berthouze et al. (1998)
self-exploration, early imitation | UTH+MR | Kuniyoshi et al. (2003)
pendulation, morphological changes | HD | Lungarella and Berthouze (2002c)
bouncing, entrainment | HD | Lungarella and Berthouze (2003)

Mechanisms:
behavioral interaction, neural plasticity | MR+AG | Almassy et al. (1998)
sensorimotor categorization, self-organization | AVH | Berthouze and Kuniyoshi (1998)
sensory deprivation, neural plasticity | MR | Elliott and Shadbolt (2001)
invariant object recognition, conditioning | MR+AG | Krichmar and Edelman (2002)
categorization, value | MR+AG | Pfeifer and Scheier (1997)
categorization, cross-modal associations, exploration | MR+AG | Scheier and Lambrinos (1996)
categorization, conditioning, value | MR+AG | Sporns et al. (2000)
neuromodulation, value | MR+AG | Sporns and Alexander (2002)

Table 2.3: Representative examples of developmentally inspired robotics research. AVH = Active Vision Head, UTH = Upper-Torso Humanoid, MR = Mobile Robot, HD = Humanoid, HGS = Humanoid grasping system, UTH+MR = Upper-Torso Humanoid on Mobile Platform, MR+AG = Mobile Robot equipped with Arm and Gripper, RS = Robotic System.
Chapter 3
Freezing and Freeing Degrees of
Freedom1
A skilled response is [..] highly organized, both spatially and temporally. The central problem for
skill learning is how such organization or patterning comes about. (Fitts, 1964)
3.1 Synopsis
The robust and adaptive behavior exhibited by natural organisms is the result of a complex interaction
between various plastic mechanisms acting at different time scales. So far, researchers have concen-
trated on one or another of these mechanisms, but little has been done toward integrating them into a
unified framework and studying the result of their interplay in a real world environment. In this chapter,
we present experiments with a small-sized humanoid robot that learns to swing. They illustrate that
the exploitation of neural plasticity, entrainment to physical dynamics, and body growth (where each
mechanism has a specific time scale) leads to a more efficient exploration of the sensorimotor space and
eventually to a more adaptive behavior. Such a result is consistent with observations in developmental
psychology.
3.2 Introduction
The ontogeny of any biological organism is a complex process. The different parts composing the
developing system are mutually interdependent, and are uneven in their rate of growth. Development
is especially susceptible to environmental influences, and its temporal unfolding makes it particularly
1 Appeared as Lungarella, M. and Berthouze, L. (2002). On the interplay between morphological, neural, and environmental dynamics: a robotic case-study. Adaptive Behavior, 10(3-4), 223-241.
hard to establish the precise time of onset of specific skills during infancy or childhood, which in turn
makes it very difficult to order the onset of different abilities with respect to one another. Traditionally,
both the capabilities and the limitations of newborns have been attributed to maturational processes in
the central nervous system (McGraw, 1945; Gesell, 1946). The disappearance of certain patterns of
behavior, or the emergence of others over time have been viewed as a derivative of processes or events
occurring at some higher level, or to paraphrase Bushnell and Boudreau (1993), as changes in the mind
that would effect changes in the ability to deploy the body. This view attracted considerable atten-
tion and resulted in various models of, for example, the role of myelinization in the central nervous
system or the cortical inhibition of infantile reflexes during development (McGraw, 1940; Dekaban,
1959). However, a growing body of evidence has shown that the development of body morphology
(physical growth) also plays a major role in the emergence and disappearance of certain behavioral
patterns and of some aspects of perceptual and cognitive development (Thelen et al., 1984; Bushnell
and Boudreau, 1993; Thelen and Smith, 1994; Goldfield, 1995). Limitations at the morphological level
(e.g., changes in the mass of the eyeball) induce constraints at the cognitive level (e.g., disruption of
the development of binocular depth perception) (Aslin, 1988). Bushnell and Boudreau, for instance,
consider motor development to function as a “rate-limiting factor” in the development of perceptual
capabilities (haptic and depth perception). Naturally, these constraints – the so-called “developmen-
tal brake” (Harris, 1983) – have implications on the adaptivity of the organism. Many developmental
psychologists hypothesize that constraints in the sensory system and biases in the motor system early
in life, may have an important adaptive role in ontogeny (Turkewitz and Kenny, 1982; Bjorklund and
Green, 1992). Limitations in the sensory and motor apparatus result in a reduction of the complexity
of the sensory information that impinges on the learning system during its interaction with the environ-
ment, and therefore facilitate adaptivity. Later, those initial constraints or biases are lifted, inducing
changes at the neural level, which in turn result in new patterns of environmental interaction. Bushnell
and Boudreau talk of “motor development in the mind” to refer to the co-development of the sensory
and motor system and report that specified motor abilities must be executed for the corresponding
perceptual abilities to emerge. Exploration and spontaneous movements play a critical role in this re-
gard (Von Hofsten, 1993). Although infants do not know the variety of ways in which their limbs may
be used, they are capable of spontaneously moving them from the fetal period onward (Smother-
man and Robinson, 1988; Robinson and Smotherman, 1992; Prechtl, 1997). Piaget (1953) emphasized
that when infants perform movements over and over again they are in fact exploring their own action
system. Properties of the body are actively explored while performing these spontaneous movements
so that the organism can sustain certain motions and create new forms out of them. While learning a
task, the infant may try out different musculo-skeletal organizations and explore its parameter space
guided by the dynamics of the task. In other words, these movements may be seen as actions focused
on the exploration of the external world, and on the infant’s own sensorimotor parameter space (Prechtl,
1997). In fact, Goldfield (1995) hypothesized that the goal of exploration by an infant actor may be to
discover how to harness the energy being generated by the ongoing activity, so that the actual muscular
contribution to the act can be minimized. In this respect, it is worth noting that spontaneous move-
ments emerge during fetal life and disappear during later development, when voluntary motor activity
appears.
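Goldfield's hypothesis has a simple physical reading: for a pendulum-like body, exploring the parameters of a fixed-strength rhythmic drive singles out the resonant frequency, where the same 'muscular' torque yields the largest movement. A sketch (our illustration; the pendulum parameters are arbitrary):

```python
import math

def response_amplitude(drive_freq, omega_n=3.0, damping=0.3,
                       amp=0.2, dt=0.001, T=40.0):
    """Steady-state swing amplitude of a damped pendulum driven by a
    small sinusoidal 'muscular' torque of fixed strength."""
    theta, vel = 0.0, 0.0
    peak, t = 0.0, 0.0
    while t < T:
        acc = (-omega_n ** 2 * math.sin(theta) - damping * vel
               + amp * math.sin(drive_freq * t))
        vel += acc * dt            # semi-implicit Euler keeps the scheme stable
        theta += vel * dt
        if t > T / 2:              # measure only after transients have decayed
            peak = max(peak, abs(theta))
        t += dt
    return peak

if __name__ == "__main__":
    freqs = [1.0, 2.0, 3.0, 4.0, 5.0]
    for f in freqs:
        print(f"drive {f:.1f} rad/s -> amplitude {response_amplitude(f):.3f} rad")
    best = max(freqs, key=response_amplitude)
    print("largest movement per unit effort at", best, "rad/s (natural frequency 3.0)")
```

Sweeping the drive frequency, as a crude stand-in for infant-like parameter exploration, identifies the natural frequency as the setting at which the actual muscular contribution to the act can be minimized.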
3.3 Learning to swing
In this chapter, we address the case of a small-sized humanoid robot that learns to pendulate, that is, to
swing as a pendulum. While various models have been proposed to control the behavior of a swinging
object (e.g., (Inaba et al., 1996; Miyakoshi et al., 1994; Saito et al., 1994; Schaal and Sternad, 1998;
Williamson, 2001)), we are not aware of any attempt to place it in a developmental context. Yet, there
is good reason to believe that such an approach would be justified. First of all, swinging can be seen
as a form of tertiary “circular reaction”, an essential component of the sensorimotor stages of Piaget’s
developmental schedule (Piaget, 1945). Circular reaction refers to the repetition of an activity in which
the body starts in one configuration, goes through a series of intermediate stages, and then returns to
the configuration from which it started. Rhythmic activity is highly characteristic of emerging skills
during the first year of life, and Thelen and Smith (1994) suggested that oscillations are the product
of a motor system under emergent control – when infants attain some degree of intentional control of
their limbs or body postures, but when their movements are not fully goal-corrected. Second, swinging
movements feature a complex interplay between environmental dynamics, body dynamics and neural
dynamics, which may benefit from an exploratory approach, i.e., not from a rigid selection of both
morphological and control parameters, but from a staged exploration of the various mechanisms.
Some instances of a developmental approach to complex control issues have been re-
ported. Berthouze and Kuniyoshi (1998) described experiments with a nonlinear redundant four de-
grees of freedom robotic vision system, where, to reduce the risk of being trapped in “stable but
inconsistent minima”, the introduction of two of the four available degrees of freedom was delayed
in time. This developmental strategy reduced the complexity of learning for each joint and led to a
faster stabilization of the controllers’ adaptive parameters. In a similar vein, Metta (2000) described a
robotic system called Babybot that used a staged release of the various mechanical degrees of freedom
to acquire the correct information for building sensory-motor and motor-motor transformations. In
both instances, development consisted of a delayed introduction of resources (the mechanical degrees
of freedom), which reduced the learning complexity of a particular task, e.g., the tracking of a pendu-
lum as in (Berthouze and Kuniyoshi, 1998). The issue was thus cast in an information-theoretic light,
and the focus was on how the introduction of bodily constraints benefits learning, rather than changes
in behavior. In that sense, the approach described by Berthouze and Kuniyoshi (1998) and by Metta
(2000) is similar to existing connectionist learning techniques known as constrained or incremental
learning (Newport, 1990; Elman, 1993; Elman et al., 1996; Westermann, 2000), in which neural net-
works are able to learn a task only if initially handicapped by severe limitations, e.g., the reduction of
the memory size or of the number of nodes in the hidden layer.
The focus of this chapter is not on the information-theoretic implications of the developmental ap-
proach, but rather on the effects of bodily changes on behavioral performance during learning. We will
show that even though we employ a value-based regulation of neural plasticity to generate adaptive
behavior, exploiting the inherent adaptivity of motor development leads to behavioral characteristics
not obtainable by simply manipulating neural parameters. Furthermore, we will present evidence to
support the hypothesis that a developmental use of the degrees of freedom (a slow mechanism) can
help the skill acquisition process by stabilizing the interaction between environmental and neural dy-
namics (both fast mechanisms if we restrict ourselves to the transient synaptic changes characteristic
of perception-action).
Only Taga’s studies (Taga, 1997, 2000) on the development of bipedal locomotion in human infants
seem to share a similar focus. Taga proposed a computational model showing that, via a process of
freezing and freeing of the degrees of freedom of the neuro-musculo-skeletal system, the “U-shaped”²
changes in performance typical of the development of stepping could be reproduced. In (Taga, 1997),
he concluded that it remains to be shown how the developing neural system drives the freeing and
freezing degrees of freedom and that future studies could be aimed at elucidating how the mechanisms
of freeing and freezing can be applied to the development of other types of movements.
From that viewpoint, our study is novel in that it deals with a different class of motor control
problems than those discussed by the researchers cited above. In our experimental system, pendulation
is not achieved by actuation of the pendulum but is induced by the reaction of the actuated parts (legs)
on the body. Because the body is coupled to the environment through a pendular mechanism (a non-
actuated or passive degree of freedom), body motion (and thus swinging) is possible. It is important
to note that the mechanical system is underactuated, i.e., there are fewer actuators than degrees of
freedom and proprioceptive feedback will refer to body motion and not to motion of the actuated
parts (leg joints). In that sense, the complexity of its control can be compared to that of an extended
version of the simple inverted pendulum, or of the double inverted pendulum, depending on whether
one or two mechanical degrees of freedom are considered. Although this particular control problem
has been extensively studied (Anderson, 1989; Spong, 1995), our developmental approach is novel. We
expect that starting with fewer degrees of freedom will result in multiple directions of stability which,
² U-shaped in this particular context refers to the fact that newborns’ stepping movements show a recognizable structure in time and space. While stepping movements stop when infants are about 2 months old, they reappear at around 8-10 months. This puzzling phenomenon was traditionally ascribed to maturation of the nervous system. However, Thelen et al. (1984) provided clear evidence for a biomechanical explanation, namely that of a changing balance between leg weight and muscle strength.
while not necessarily yielding optimal task performance, will nonetheless guide the coordination of
additional degrees of freedom. These additional degrees of freedom may then allow for optimal task
performance as well as for more tolerance and adaptation to environmental perturbations.
3.4 Experimental framework
The experimental platform consisted of a small-sized humanoid robot with 12 degrees of freedom
(sometimes DOF hereafter). Through two thin metal bars fixed to its shoulders, the robot was attached
by a passive joint to a supportive metallic frame, in which it could freely oscillate in the vertical
(sagittal) plane (see Fig. 3.1). Each leg of the robot had five joints, but only two of them – hip and knee
– were used in our experiments. Each joint was actuated by a high-torque RC-servo motor.

Figure 3.1: Humanoid robot used in our experiments.

Figure 3.2 depicts the distributed architecture used to control the humanoid robot. Each limb was controlled
by a separate neural oscillator. Neural oscillators are particular neural structures that can produce
rhythmic activity without rhythmic input, and that are hypothesized to be responsible for producing
rhythmic movements during activities such as swimming, walking and running, in animals ranging from invertebrates to higher vertebrates (Ijspeert, 2003). The usage of oscillators in a robotic system is not novel, but our
Figure 3.2: Schematics of the experimental system and the control architecture (pattern generators receiving a tonic impulse and proprioception and issuing motor commands to the hip and knee joints, joint synergy between the two oscillators, the free joint, and the frame of reference). Proprioceptive feedback consists of the visual position of the hip marker in the frame of reference centered on the hip position when the robot is in its resting position, i.e., vertical position. Joint synergy was only activated in experiments involving coordinated 2-DOF control.
focus is not on the control structure per se. Instead, we are interested in the capability of oscillators
to entrain to the frequency of an input – be it an external signal or the output of another oscillator
unit – over a wide range of frequencies. Indeed, in our framework, couplings are more relevant than
individual systems, a view also advocated by Hatsopoulos (1996). In this regard, oscillators are suitable
structures to implement a distributed control architecture and to consider developmental mechanisms
such as the freezing and freeing of the different degrees of freedom in particular.
3.4.1 Neural oscillators and joint synergy
Each neural oscillator was modelled after Matsuoka’s (1985) differential equations:
τ_u u̇_f = −u_f − β v_f − ω_c [u_e]^+ − ω_p [Feed]^+ + t_e    (3.1)
τ_u u̇_e = −u_e − β v_e − ω_c [u_f]^+ − ω_p [Feed]^− + t_e    (3.2)
τ_v v̇_f = −v_f + [u_f]^+    (3.3)
τ_v v̇_e = −v_e + [u_e]^+    (3.4)
where u_e and u_f are the inner states of neurons e (extensor) and f (flexor), v_e and v_f are variables representing the degree of adaptation or self-inhibition of the extensor and flexor neurons, and t_e is an external tonic excitation signal. β is an adaptation constant, ω_c is a coupling constant that controls the mutual inhibition of neurons e and f, and ω_p is a parameter weighting the proprioceptive feedback Feed. Both τ_u and τ_v are time constants of the neurons’ inner states and determine the strength of the adaptation effect. The operators [x]^+ and [x]^− return the positive and negative parts of x, respectively.
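As an illustration, Equations 3.1-3.4 can be integrated numerically with a simple forward-Euler scheme. The sketch below is a minimal reconstruction, not the controller actually used on the robot: the integration step, the τ_u and τ_v values, and the zero feedback signal are illustrative assumptions (β, ω_c, ω_p and t_e follow the hip settings reported later in this section).

```python
def matsuoka_step(state, feed, dt=0.001, tau_u=0.06, tau_v=0.6,
                  beta=2.5, w_c=2.0, w_p=0.5, t_e=20.0):
    """One forward-Euler step of the Matsuoka oscillator (Eqs. 3.1-3.4).

    state = (u_f, u_e, v_f, v_e); feed is the proprioceptive signal Feed.
    [x]^+ = max(x, 0) and [x]^- = max(-x, 0).
    """
    u_f, u_e, v_f, v_e = state
    pos = lambda x: max(x, 0.0)
    neg = lambda x: max(-x, 0.0)
    du_f = (-u_f - beta * v_f - w_c * pos(u_e) - w_p * pos(feed) + t_e) / tau_u
    du_e = (-u_e - beta * v_e - w_c * pos(u_f) - w_p * neg(feed) + t_e) / tau_u
    dv_f = (-v_f + pos(u_f)) / tau_v
    dv_e = (-v_e + pos(u_e)) / tau_v
    return (u_f + dt * du_f, u_e + dt * du_e,
            v_f + dt * dv_f, v_e + dt * dv_e)

# Without rhythmic input (feed = 0) the unit settles into a self-sustained
# oscillation; the output used for control is y = u_f - u_e (Section 3.4.2).
state = (0.1, 0.0, 0.0, 0.0)
ys = []
for _ in range(20000):
    state = matsuoka_step(state, feed=0.0)
    ys.append(state[0] - state[1])
```

With the mutual inhibition ω_c between 1 + τ_u/τ_v and 1 + β, the flexor-extensor pair cannot settle into a fixed point and produces alternating bursts, which is what makes entrainment to a rhythmic input possible.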
Because the servo motors used to actuate the robot did not provide any form of sensory feedback,
we used an external camera to track colored markers placed on the robot’s limbs. In all experiments,
proprioceptive feedback (Feed in Equation 3.1) refers to the visual position of the hip in a frame of
reference centered on the hip position when the robot is in its resting position (see Figure 3.2 for
a graphic description). It is important to note that, unlike most models in the literature, we have
not exploited any kinematic information on the robot itself (such as its anatomical angles) but only
kinematic information on the position of the robot with respect to the fixation point of the pendulum.
This was a natural step because our focus was on the swinging behavior. However, we will also
show that this choice affected the strong entrainment property usually found in neural oscillator-based
systems.
Joint synergy, which occurs in the human motor system, was implemented by feeding the flexor unit of the knee oscillator with the combined outputs of the extensor and flexor units of the hip controller. A factor −ω_s([u_e^h]^+ + [u_f^h]^+) was added to the term τ_u u̇_f in the flexor unit of the knee oscillator (Equation 3.1), with u_e^h and u_f^h the inner states of the extensor and flexor units in the hip oscillator. ω_s is the intersegmental coupling parameter and determines the strength of the coupling.
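As a sketch, the coupling factor can be computed separately and added to the right-hand side of the knee flexor equation; the state values in the example are purely illustrative.

```python
def synergy_term(u_h_e, u_h_f, w_s):
    """Intersegmental coupling -w_s([u^h_e]^+ + [u^h_f]^+): the rectified
    inner states of the hip extensor and flexor units inhibit the knee
    flexor unit, with strength w_s."""
    pos = lambda x: max(x, 0.0)
    return -w_s * (pos(u_h_e) + pos(u_h_f))

# Only the positive parts of the hip states contribute:
term = synergy_term(2.0, -1.0, w_s=0.5)  # -> -1.0
```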
Unless specified otherwise, the following control parameters were kept constant throughout the study: β = 2.5, ω_c = 2.0, ω_p = 0.5, t_e^h = 20 (hip tonic excitation) and t_e^k = 15 (knee tonic excitation). These experimentally determined values were selected because they offered the best compromise between stability of the controllers and plasticity to environmental perturbations (Lungarella and Berthouze, 2002a). Other parameters were set as discussed in the text.
3.4.2 Joint control
Similarly to Taga (1991), we used neural oscillators as rhythm generators, with an output activity y
given by the difference y = u_f − u_e between the activities of the flexor and extensor units. In most
robotic studies we are aware of, the oscillator’s output y is used as a motor command to control each
motor, either in position, force or torque. In systems with high-torque DC motors or pneumatic actuators and in systems with high-bandwidth sensory feedback (> 1 kHz), for example, this solution is
viable because the frequency of the control cycle is high enough. However, because our motor control
frequency was very low (around 15Hz) and the motors did not provide a sufficiently large torque, little
or no output torque could be expected on the pendulum when the amplitude of the pattern generator
output was either too low or changed too quickly. Thus, a high amplitude motor command was neces-
sary. Consequently, the output y of the rhythm generator was fed to a pulse generator whose output pg
was given by:
pg_t = t_e (sgn(y_t) − sgn(y_{t−δt}))    (3.5)
where sgn(x) is the sign function, t_e is the tonic excitation of the neural oscillator (fixed throughout the study), and δt is a very small time interval. In effect, this function detects sign changes in the output y of the neural oscillator and generates a pulse of amplitude t_e and of sign sgn(y_t). The output pg_t was
used as the actual motor command (control in position). Though very primitive (a variant of on-off
control), this controller is a suitable approximation of the output y. Indeed, it preserves frequency,
maximal amplitude as well as timing of the sign inversions within one period. Figure 3.3 illustrates
how changes in τu and τv are suitably reflected by the output of the pulse generator. In fact, the only
drawback of this control scheme is a phase shift which is easily compensated for by entrainment.
Finally, this controller is also interesting in that it implements a ballistic form of control³, which is
consistent with the emerging control of movements in young infants (Von Hofsten, 1984).
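The sign-change detection of Equation 3.5 can be sketched as follows, assuming a discrete control loop in which y is sampled once per cycle (the sampling interval plays the role of δt):

```python
import math

def sgn(x: float) -> int:
    """Sign function: -1, 0, or +1."""
    return (x > 0) - (x < 0)

def pulse_generator(ys, t_e):
    """Pulse generator of Eq. 3.5: pg_t = t_e * (sgn(y_t) - sgn(y_{t-dt})).

    The output is zero while the sign of the oscillator output y is
    unchanged, and a pulse whenever the sign flips.
    """
    pgs = []
    prev = sgn(ys[0])
    for y in ys:
        cur = sgn(y)
        pgs.append(t_e * (cur - prev))
        prev = cur
    return pgs

# A sinusoidal y produces pulses only at its zero crossings.
ys = [math.sin(2 * math.pi * t / 100.0) for t in range(1, 400)]
pgs = pulse_generator(ys, t_e=20.0)
```

Note that the pulses occur at the zero crossings of y and carry its sign, so the frequency and phase of the rhythm generator are preserved even though the waveform is reduced to an on-off command.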
3.5 Experimental results and discussion
3.5.1 Protocol
With the aim of a comparative analysis between an outright use of all degrees of freedom and a progressive freeing of the degrees of freedom, we realized two sets of experiments. In the first one, 2-DOF exploratory control was considered. Each joint of the pair (hip, knee) was controlled by a separate oscillator unit. Other joints were kept stiff, in their reset position. Two cases were considered: in the
³ Ballistic motor control is open-loop control and refers to the absence of feedback during movement performance. Examples of ballistic movements include saccadic eye movements and rapid aiming movements.
Figure 3.3: Comparison between the output of the pulse generator (thick impulse) and the output of the oscillator (solid line) for three different configurations of τ_u and τ_v, given the same proprioceptive feedback (dotted line). The control settings were as follows: τ_u = 0.02, τ_v = 0.25 (top); τ_u = 0.06, τ_v = 0.25 (middle); τ_u = 0.06, τ_v = 0.75 (bottom). Note that while the ratio τ_u/τ_v is unchanged between the top and the bottom graph, both the frequency of the output and the number of impulses per period (i.e., the shape of the output) are changed. The vertical axis denotes the amplitude of each signal. The horizontal axis denotes time steps (one time step is 33 ms).
first case, oscillator units were perfectly independent and their respective parameter space was inde-
pendently explored; in the second case, oscillator units were coupled via an intersegmental coupling
parameter ωs, with the assumption that it may lead to neural entrainment between oscillatory units.
From a control point of view, the former case is merely a particular instance of the latter with the inter-
segmental coupling parameter set to ωs = 0. In the second set of experiments, a bootstrapping 1-DOF
exploratory phase was considered during which only the hip joint was controlled, while other joints
were kept stiff, in their reset position. When (if) a stationary regime was obtained, the second degree
of freedom – knee – was released and controlled by its own oscillator unit. Again, the two cases above
were considered, with either independent control or synergetic control.
The humanoid robot’s movements were analyzed via the recording of hip, knee, and ankle posi-
tions. The same initial conditions were used in all experiments, with the humanoid robot starting from
its resting position. Unless specified otherwise, all parameter configurations were assumed to yield
motion without external intervention.
3.5.2 Exploratory process
In line with our interpretation of the swinging behavior as a “circular reaction”, we constructed a simple
value system to regulate the exploratory process. Value systems are usually defined as general biases
that are supposed to be the heritage of natural selection, and which modulate learning. A number of robotic
systems have used such systems (e.g. Pfeifer and Scheier, 1999; Sporns et al., 2000). In our study,
the value system was implemented as a function of the maximum amplitude of the oscillation within a
given time window. The value v at time t was given by:
v_t = max(v_{t−1} (1 − ε), |A_t|)    (3.6)
where |A_t| denotes the absolute value of the instantaneous amplitude of the oscillation, estimated by measuring the visual position of the hip marker in the sagittal (vertical) plane. The term (1 − ε), with 0 < ε ≪ 1, implements an exponential decay of the value when the oscillations remain consistently lower than the previously achieved maximal amplitude. With an appropriate selection of ε, the decay is not rapid enough for the value to decrease within a single period of a stable oscillation whose frequency is in the range of the control frequencies considered in this study, that is, in the range [0.8, 1.2] Hz.
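The value update of Equation 3.6 can be sketched in a few lines; the ε value and the amplitude trace in the example are illustrative assumptions, not measured data.

```python
def update_value(v_prev, amplitude, eps=0.01):
    """Value update of Eq. 3.6: keep the decayed previous value unless
    the current oscillation amplitude |A_t| exceeds it."""
    return max(v_prev * (1.0 - eps), abs(amplitude))

# The value tracks the largest amplitude seen so far and decays slowly
# (by a factor 1 - eps per step) while the oscillation stays below it.
v = 0.0
for a in [5.0, 20.0, 80.0, 30.0, 30.0, 30.0]:
    v = update_value(v, a)
```

After the peak amplitude of 80 units, three sub-maximal steps leave the value at 80(1 − ε)³, illustrating the slow decay relative to a single oscillation period.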
Assuming continuity in a small neighborhood of parameter configuration, the following explo-
ration principle was adopted: when a parameter setting yields good performance (a high value v in
the value system), slow down the changes of parameters. Conversely, trigger a rapid and large change
of parameters when the setting results in low-amplitude oscillations. This is classically referred to as
the “exploration-exploitation dilemma.” On the one hand, the system should explore the parameter
space, whereas on the other hand, it should exploit good parameter configurations its exploration has
uncovered.
We implemented a mechanism inspired by a process of “Boltzmann exploration” and “simulated annealing” (Kirkpatrick et al., 1983). Exploration is regulated by a parameter called “temperature” – here 1/v, where v is the value determined by the value system – so that when the temperature decreases, exploitation of the parameter setting takes over and, vice versa, exploration is favored when the temperature increases. Exploration of the parameters takes the form of additive noise, whose amount is a function of the temperature. The process is formally defined by the following equations:
τ_u^{t+1} = τ_u^t + f(v) (τ_u^max − τ_u^min) D_x    (3.7)
τ_v^{t+1} = τ_v^t + f(v) (τ_v^max − τ_v^min) D_y    (3.8)
where τ_{u,v}^max and τ_{u,v}^min define the range of exploration for parameters τ_u and τ_v of the extensor and flexor neurons. D_x and D_y are stochastic variables with a discrete and uniform probability distribution P(−1, 0, +1) = 1/3 and define the direction of change in the two-dimensional (τ_u, τ_v) parameter space. f(v) = c(e^{10/(1+v)} − 1), with c an experimentally determined multiplicative constant (c = 0.1 for τ_u and c = 1.0 for τ_v), determines the amount of change between old and new parameter configurations. For values in the range v ∈ [0.0, 160.0] (the range of visual amplitudes), this function was found to yield the best results in terms of the trade-off between exploration and exploitation of the parameter space. In effect, the parameter change from time step t to time step t + 1 can be interpreted as a random walk in the parameter space, with a value-dependent step size.
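One step of this value-dependent random walk (Equations 3.7-3.8) can be sketched as follows. The clipping of the parameters to their exploration range is an assumption added for the sketch, and the range bounds are taken from the parameter regions discussed in this chapter.

```python
import math
import random

def f(v, c):
    """Value-dependent step-size factor f(v) = c * (exp(10/(1+v)) - 1)."""
    return c * (math.exp(10.0 / (1.0 + v)) - 1.0)

def explore_step(tau_u, tau_v, v,
                 tau_u_range=(0.02, 0.08), tau_v_range=(0.2, 0.8),
                 c_u=0.1, c_v=1.0, rng=random):
    """One random-walk step in the (tau_u, tau_v) parameter space
    (Eqs. 3.7-3.8): D_x, D_y are drawn uniformly from {-1, 0, +1}, and
    the step size shrinks as the value v grows (exploitation) and grows
    as v drops (exploration)."""
    d_x = rng.choice((-1, 0, 1))
    d_y = rng.choice((-1, 0, 1))
    tau_u += f(v, c_u) * (tau_u_range[1] - tau_u_range[0]) * d_x
    tau_v += f(v, c_v) * (tau_v_range[1] - tau_v_range[0]) * d_y
    # Keep the parameters inside their exploration range (assumed behavior).
    tau_u = min(max(tau_u, tau_u_range[0]), tau_u_range[1])
    tau_v = min(max(tau_v, tau_v_range[0]), tau_v_range[1])
    return tau_u, tau_v

# High value (v = 150): tiny steps; low value (v = 1): large jumps.
tau_u, tau_v = explore_step(0.05, 0.5, v=150.0)
```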
The unfolding of the resulting exploration process is illustrated in Figure 3.4. Initially, the low-amplitude oscillations of the system yield a low value v, that is, a high temperature 1/v, which results in a large step size. The exploratory process traverses the parameter space very rapidly. When a parameter configuration yields a higher value v, the step size decreases until the exploration process effectively converges onto one narrow region of the parameter space. At this stage, habituation occurs. Habituation is one of the most elementary and ubiquitous forms of plasticity and can be defined as a decrease in the strength of a behavioral response that occurs when an initially novel stimulus is presented repeatedly (Wang, 1995). In our study, it was simply implemented as an exponential decay of the value v when the system remained in a 10-second stationary regime (sustained oscillations). With the resulting decay in value, the step size increases again and new areas of the parameter space are explored.
3.5.3 Experimental observations
A number of experiments were performed, involving explorative runs of roughly 10 minutes, with initial conditions in the range τ_u^{h,k} ∈ [0.02, 0.04] and τ_v^{h,k} ∈ [0.2, 0.4] for both hip and knee controllers. This range was selected because it corresponds to a low-yield region of the parameter space (experimental determination), and therefore guarantees that exploration will be necessary to reach a high-yield region of the parameter space.
Within each scenario – 2-DOF exploration, 1-DOF exploration and bootstrapped 2-DOF explo-
ration – all runs were found to yield qualitatively similar results in terms of the characteristics of the
value landscape obtained, with variations accounted for by differences in initial conditions. For prac-
tical reasons (excessive strain on the physical structure of the robot as well as on the servo motors and
duration of a single experimental run), it was not possible to carry out enough runs to produce a sta-
tistically meaningful sample and therefore no statistical measurements (e.g., variances between runs)
were calculated.
Figure 3.4: Value-dependent exploration. The upper graph depicts the time series of the oscillatory movement of the robot’s hip (top) and the associated value v in the value system (bottom). Rectangular areas point to decreases of value caused by habituation. The lower graph depicts the corresponding trajectories in the (τ_u, τ_v) parameter space. Oval areas point at dense regions of high-yield parameter settings, i.e., the large oscillations observed in the time series.
Ruggedness of the value landscape in a 2-DOF independent control configuration
Figure 3.5 depicts the value landscape uncovered by a single explorative run in the 2-DOF configura-
tion with no joint synergy (ωs = 0). Each dot represents a parameter setting visited by the exploratory
process and its size is proportional to the value yielded by the setting. The plot shows that the ex-
ploratory process covered a large part of the parameter space in both hip and knee spaces and that
high-value regions are sparse and small. The latter is confirmed by the probability distribution function
of the value landscape (Figure 3.6, top). The distribution is clearly skewed towards the low values.
Under systematic exploration, the value landscape shows similar properties, as shown by Figure 3.7.
These observations indicate the presence of a “rugged” value landscape, where small changes
in parameters can be expected to yield different oscillatory behaviors. To confirm this hypothesis,
we performed a systematic analysis of the oscillatory behaviors found in a neighborhood of control
parameters. A systematic exploration of a limited region of the hip and knee parameter spaces –
namely, τ_u^h ∈ [0.055, 0.065], τ_v^h ∈ [0.55, 0.65] and τ_u^k ∈ [0.025, 0.035], τ_v^k ∈ [0.25, 0.35] – was realized
with seven experiments, the results of four of which we discuss below. In each experiment, the resulting
behavior was evaluated in terms of the presence or absence of a stationary regime, the amplitude of
this regime, its smoothness (qualitatively), the relative configuration of hip and knee motor commands
as observed in a hip-ankle phase plot and the robustness to external perturbations (such as a manual
push). Each experiment started with the same initial conditions, that is, with the robot in its resting
position.
Though the parameter space now considered was very narrow, a small change of parameters yielded
very different behaviors. Qualitatively, the following states were observed. With τ_u^h = 0.060, τ_v^h = 0.60 and τ_u^k = 0.030, τ_v^k = 0.30, our reference configuration for this experiment, a smooth stationary
regime of the hip oscillation was observed, with an amplitude of 80 units. While in phase with the hip
oscillations, the ankles did not reach a true stationary regime, which resulted in the ankle-hip phase
plot of Figure 3.8(left). This phenomenon can be attributed to a dampening effect stemming from this
particular morphological structure. The system was found to return to its stationary regime even in the
case of external perturbations.
Slightly changing the hip control parameters (τ_u^h = 0.065, τ_v^h = 0.65) but leaving the knee parameters unchanged resulted in a qualitatively very different behavior. While the ankle position quickly
eters unchanged, resulted in a qualitatively very different behavior. While the ankle position quickly
reached a smooth stationary regime, an overall oscillatory behavior was not found (overall amplitude
of less than 20 units), as illustrated by the phase plot on Figure 3.8(right).
With the knee parameters unchanged, yet another behavior was obtained if the hip control parame-
ters were set to τ_u^h = 0.055 and τ_v^h = 0.55. In this case, the overall oscillatory behavior was smooth and
reached a stationary regime. Interestingly, the ankle behavior exhibited several transitions to different
stationary regimes, the succession of which is depicted in Figure 3.9. Transitions between stationary
Figure 3.5: Value landscapes (left: hip parameter space; right: knee space) uncovered by a single exploratory run in an independent 2-DOF configuration (ω_s = 0). The size of a dot (a control setting visited by the exploratory process) is proportional to the value v obtained for that particular control setting. Initial conditions were similar for both joints, namely, τ_u^{h,k} ∈ [0.02, 0.04] and τ_v^{h,k} ∈ [0.2, 0.4]. The exploratory run took roughly 10 minutes.
Figure 3.6: Probability distribution functions of value landscapes obtained in three different scenarios: independent 2-DOF exploration (top), 1-DOF exploration (middle) and bootstrapped 2-DOF (bottom). The corresponding value landscapes are found in Figures 3.5, 3.11 (right) and 3.13, respectively. In each graph, the value space [0.0, 0.6] was discretized into 50 bins. Simply stated, each graph indicates the probability (vertical axis) that a value v (horizontal axis) occurs during the exploratory run considered. In the three scenarios, the same initial conditions were used.
Figure 3.7: Value landscape obtained during a systematic exploration of the knee parameter space with an arbitrarily chosen hip parameter setting (τ_u^h = 0.045, τ_v^h = 0.65). The parameter space was discretized in a 15x15 sampling and the figure is a linear approximation of the resulting values v. Brighter colors denote higher-yield settings. The experiment lasted about 150 minutes.
Figure 3.8: Effect of a small change in the hip control parameters on the ankle-hip phase plots in the independent 2-DOF configuration: left, oscillatory behavior without a true stationary regime (τ_u^h = 0.060, τ_v^h = 0.60, τ_u^k = 0.03, τ_v^k = 0.3); right, no oscillatory behavior (τ_u^h = 0.065, τ_v^h = 0.65, τ_u^k = 0.03, τ_v^k = 0.3). In both graphs, the axes denote the horizontal coordinates of the hip and ankle markers’ visual positions.
Figure 3.9: Evidence of preferred stable states and phase transitions in the independent 2-DOF configuration: successive pseudo-stationary regimes obtained with τ_u^h = 0.055, τ_v^h = 0.55, τ_u^k = 0.03, τ_v^k = 0.3. Each graph shows the corresponding ankle-hip phase plot. In all graphs, the axes denote the horizontal coordinate of the hip and ankle markers’ visual positions.
regimes were very rapid. Interestingly, Goldfield (1995) reported that it is characteristic of spontaneous activity in infants that it enters preferred stable states and exhibits abrupt phase transitions. After
perturbation, the hip returned to its former stationary regime. The pseudo-stationary regimes in the
motion of the ankle only partially overlapped with those observed earlier.
Figure 3.10: Large-amplitude smooth performance after a long transient: left, the ankle-hip phase plot with τ_u^h = 0.055, τ_v^h = 0.65, τ_u^k = 0.025, and τ_v^k = 0.35; right, the corresponding time series for hip and ankle visual positions and motor commands.
Finally, with τ_u^h = 0.055, τ_v^h = 0.65 and τ_u^k = 0.025, τ_v^k = 0.35, seemingly optimal performance was
observed. An amplitude of 120 units was achieved and sustained. In-phase smooth oscillatory behavior
was obtained both at the hip and ankle level. The hip-ankle phase plot is given in Figure 3.10(left).
The time series provided in Figure 3.10(right) shows that this stationary regime was achieved only
after a smooth transient of about 50s. This regime was found to show good robustness against external
perturbations.
1-DOF exploration and physical entrainment
Freezing the lower degree of freedom yielded a very different value landscape. Figure 3.11 depicts the
value landscape uncovered by a single explorative run. As shown by the large number of configurations
visited and the size of the dots (the value), the system settled briefly in a number of oscillatory behaviors
of moderate value v. A quantitative measure of these states is provided by the probability distribution
function shown by Figure 3.6(middle). It can also be noted that all higher-yield configurations were
located in a compact region of the parameter space – roughly, τ_u^h ∈ [0.02, 0.08] and τ_v^h ∈ [0.5, 0.8] – an observation
confirmed when a systematic exploration of the parameter space was performed (Figure 3.12). The cor-
responding configurations were found to exhibit good robustness against environmental perturbations,
such as a manual push (Lungarella and Berthouze, 2002a).
Figure 3.11: Value landscape (hip space) uncovered by a single exploratory run in a 1-DOF configuration, i.e., the second DOF (knee) is frozen. The size of a dot (a control setting visited by the exploratory process) is proportional to the value v obtained for that particular control setting. Initially, τ_u^h and τ_v^h were randomly selected in the intervals [0.02,0.04] and [0.2,0.4] respectively. The exploratory run took roughly 10 minutes.

Figure 3.12: Value landscape obtained during a systematic exploration of the hip parameter space in a 1-DOF configuration, i.e., the second DOF (knee) was frozen. The parameter space was discretized into a 15x15 sampling and the figure is a linear approximation of the resulting values. Brighter colors denote higher-yield settings. The experiment took about 150 minutes.

We suggest that the compact region of the parameter space found to yield consistent values v corresponds to a range of frequencies where "physical entrainment" – entrainment to body dynamics – can
take place. Evidence for that can be found by comparing the frequency of the oscillating system with
both its natural frequency and its control frequency. A difference from either indicates that both body dynamics (that is, reaction forces of the actuated body parts on the body, and inertia) and environmental forces contribute to shifting the system's frequency away from the frequency it would otherwise show in
a disembodied setup. The exploitation of such dynamics has been shown to yield robust behavior in
various tasks (Williamson, 2001; Miyakoshi et al., 1994).
The natural frequency of the system was measured by manually pushing the robot and letting
it swing freely, while tracking the position of the hip marker. The frequency was experimentally
found to be 0.905Hz (period of 1105ms) and this value was confirmed by spectral analysis of the
hip position’s time series (with a sampling frequency of 33Hz). We then considered two parameter
settings located in the high-yield compact area identified in Figure 3.12, namely, τ_u^h = 0.040, τ_v^h = 0.65 and τ_u^h = 0.070, τ_v^h = 0.65. In a disembodied system, that is, in simulation, these settings are shown –
by spectral analysis of the oscillator’s output – to produce a control pattern with a frequency of 0.71Hz
and 0.89Hz respectively. Experimentally however, the actual frequencies were found to be 0.77Hz
and 0.96Hz, respectively, which could be explained either by the inaccuracy inherent to servo-motor
control or by friction forces. After the system reached a stationary regime, the frequencies (1.075Hz
and 1.15Hz, respectively) were observed to be significantly different from either the natural frequency
or the control frequency, thus providing us with evidence that physical entrainment did indeed take
place. Frequency measurements made on other oscillator settings of the high-yield compact area were
found to range from 0.93Hz to 1.22Hz. This range of frequency explains the location of the basin of
attraction of Figure 3.12. Indeed, phase locking only takes place if the control inputs are in a range of
frequencies that is not too far apart from the natural frequency of the system. At first sight, this result
is at odds with existing studies showing that entrainment is a robust property and occurs with any
parameter setting such that τ_u/τ_v ∈ [0.1,0.5]. However, it is important to stress again that in these earlier
studies, entrainment is observed between the control frequency of the actuated joint and the feedback
frequency of the actuated system under environmental perturbations, for example, the frequency of the
robot arm sawing a wooden piece, or the arm juggling with a slinky toy (Williamson, 2001). In our
work, however, we are considering the swinging frequency of a system that is not directly actuated.
Therefore, we are discussing entrainment between the induced effects of the controlled parts on the
global system – pendulum + robot – and environmental dynamics, here gravity, physical structure
supporting the actuated system and friction forces.
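The frequency measurements underlying this entrainment argument can be reproduced in outline: estimate the dominant swing frequency from the marker time series and compare it with both the natural and the control frequency. The sketch below uses a simple zero-crossing count as a cheap stand-in for the spectral analysis mentioned above; the signal is synthetic, not the experimental data.

```python
import math

def swing_frequency(samples, fs):
    """Estimate the dominant oscillation frequency of a marker time series.

    Counts positive-going zero crossings of the mean-centered signal; a
    cheap stand-in for the spectral analysis used in the text (which
    sampled the hip marker at 33 Hz).
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a < 0 <= b)
    return crossings / (n / fs)

# Synthetic swing at 1.075 Hz, the entrained frequency reported for the
# setting tau_u^h = 0.040, tau_v^h = 0.65.
fs = 33.0
signal = [50.0 * math.sin(2 * math.pi * 1.075 * t / fs) for t in range(1024)]
f_swing = swing_frequency(signal, fs)

# Physical entrainment is diagnosed when the measured swing frequency
# differs from both the natural frequency (0.905 Hz) and the control
# frequency (0.71 Hz for this setting).
entrained = abs(f_swing - 0.905) > 0.05 and abs(f_swing - 0.71) > 0.05
```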
Figure 3.13: Effect of the freeing of the knee DOF on the exploration of the 2-DOF configuration. Left, value landscape uncovered by a single exploratory run in a 1-DOF configuration, i.e., the second DOF (knee) was frozen. When the system reached a stable oscillatory state, here denoted by a white triangle (roughly [0.7,0.04]), the second DOF was released. The right graph shows the value landscape uncovered by the exploratory process in the resulting 2-DOF configuration, with an initial condition represented by the white rectangle (roughly [0.3,0.03]). In both graphs, the size of a dot (a control setting visited by the exploratory process) is proportional to the value v obtained for that particular control setting. Initially, τ_u^{h,k} and τ_v^{h,k} were randomly selected in the intervals [0.02,0.04] and [0.2,0.4] respectively. The overall experiment took roughly 20 minutes.
Figure 3.14: Effect of the freeing of the knee DOF on the exploration of the 2-DOF configuration. Value landscape obtained during a systematic exploration of the knee parameter space after its release when the system was in a stable oscillatory state in a 1-DOF configuration. The hip oscillator was initialized with τ_u^h = 0.054, τ_v^h = 0.65, which corresponds to a high-yield 1-DOF configuration. The parameter space was discretized into a 15x15 sampling and the figure is a linear approximation of the resulting values. Brighter colors denote higher-yield settings. The experiment took about 150 minutes.
2-DOF bootstrapped control
When the second degree of freedom was released, that is, after the system was stabilized in its 1-DOF
stationary regime, the resulting value space was characterized by a dense distribution of high-yield
parameter settings. In Figure 3.13, we show the results of a single explorative run. The graph on
the left shows the initial part of the experiment, namely, the 1-DOF exploration of the hip parameter
(starting from the same initial conditions as in all other experiments). This value landscape naturally
has similar properties to those observed in Figure 3.11. The triangle denotes the hip parameter setting
after which the knee joint is released (or freed). The graph on the right depicts the value landscape
uncovered by the exploratory run in the knee parameter space, after release. The initial knee setting
is denoted by the white rectangle, that is, the same setting used in all experiments. The exploratory
run only covered a compact high-yield region of the parameter space, an observation quantitatively
confirmed by the probability distribution function shown by Figure 3.6 (bottom).
At first sight, this result could appear trivial. Indeed, the freeing of the second degree of freedom
took place when the 1-DOF regime was already yielding a high value. Thus, taking into account the morphology of the system as well as the ratio r < 1.0 between knee and hip tonic excitations (r = te_k/te_h = 0.75), the high value yielded when the knee parameter space was explored could be attributed to both the inertia of the already oscillating system and the morphology of the system. However,
when a systematic exploration of the knee parameter space was realized, using the same hip parameter
as initial condition, we observed the value landscape depicted by Figure 3.14. The figure shows that
the system’s performance was not only accounted for by the inertia generated by the 1-DOF stationary
regime but also by the selection of an appropriate knee control setting. Indeed, the standard deviation of
the probability distribution function in the bootstrapped 2-DOF systematic exploration – SD = 0.0573
– is greater than the standard deviation obtained in the independent 2-DOF systematic exploration –
SD = 0.0386.
Two additional observations are noteworthy. First, the mean of the probability distribution function
obtained in the systematic exploration (mean = 0.474) is higher than the mean value (mean = 0.403) of
the probability distribution function obtained for the exploratory run discussed in this section, thus
indicating that the result depicted by Figure 3.13 (right) was not marginal. Second, this mean value is
also higher than the mean value obtained during the systematic exploration of the 1-DOF configuration
(mean = 0.158), even when considering only the compact area of high value (mean = 0.206 with a
maximal value of 0.540 for τu ∈ [0.02,0.08] and τv ∈ [0.5,0.8]). This indicates that the high value
obtained during the 1-DOF stationary regime could not account for the high value obtained after release
of the second degree of freedom, and in addition, most of the configurations explored yielded a higher
value than possibly obtained in the 1-DOF configuration. This observation validates our hypothesis that
the freezing and subsequent freeing of the second degree of freedom results in higher performance, and,
in effect, reduces the sensitivity of the system to the selection of a particular hip-knee configuration
(when compared to the independent exploration).
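The comparison of probability distribution functions above rests on simple summary statistics. A minimal helper of the kind assumed here (inputs below are toy numbers, not the experimental samples):

```python
import statistics

def summarize_values(values):
    """Mean and (population) standard deviation of the values v collected
    during an exploration, as used for the comparisons in the text."""
    return statistics.mean(values), statistics.pstdev(values)

# Toy samples only -- NOT the experimental distributions.
bootstrapped = [0.40, 0.45, 0.50, 0.55]
independent = [0.15, 0.16, 0.17, 0.18]

mean_b, sd_b = summarize_values(bootstrapped)
mean_i, sd_i = summarize_values(independent)
# A larger SD indicates that the choice of setting still matters; a larger
# mean indicates overall higher-yield behavior.
```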
A reviewer questioned the fact that the value landscape obtained during 2-DOF control could differ
from the value landscape obtained during bootstrapped 2-DOF control given that no parameters other
than τ_u^{h,k} and τ_v^{h,k} were varied. Suggesting that differences could only be accounted for by different regions
of the parameter space being explored due to distinct histories, the reviewer questioned how we could
possibly explain the different values obtained in the systematic exploration.
First and foremost, this suggestion does not consider the delayed introduction of the second degree
of freedom. Because the second degree of freedom was introduced after a stationary regime was
obtained in the 1-DOF configuration, the initial conditions for a given hip-knee setting were changed.
Second, in a disembodied system, it could be argued that after a suitable transition period, the
bootstrapped system would eventually return to the state obtained in the independent case. However, this did not occur in this study – and further experiments by the authors confirmed this even in the presence of
stronger environmental interaction (Lungarella and Berthouze, 2002a) – because physical entrainment
took place. As discussed earlier, the frequency obtained in the 1-DOF case was not equal to the
control frequency. Because both oscillators are fed with the same proprioceptive feedback, namely,
the visual position of the hip marker, when the second degree of freedom is released, its controller is
stimulated by proprioceptive feedback on which the hip oscillator has already entrained. Given the
ability of oscillators to entrain on an input signal, entrainment between the two joints effectively takes
place. Note, however, that differently from the neural entrainment that we will discuss in the next
section, here entrainment was mediated by the body and not by explicit connections between the two
controllers. A similar result, but in a different context, was reported by Taga (1991) who qualified
such entrainment as “global entrainment” 4, and by Williamson (2001). In the case of independent
control, however, this property cannot be expected because the proprioceptive feedback only reflects
the output activity generated by the particular hip-knee control configuration and thus the resulting
value landscape is very sensitive to the choice of parameters.
Control synergy and neural entrainment
In both 2-DOF independent control and bootstrapped control, the addition of joint synergy resulted in
more or less strongly correlated knee and hip control patterns. Such behavior is characteristic of “neural
entrainment”, whereby the control frequency of the lower limb locks onto the control frequency of the
upper limb. This sort of result has been extensively commented on in the literature (e.g. Taga, 1991;
Williamson, 1998).
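Although the controller equations are not restated in this section, the time constants τ_u, τ_v, the tonic excitations, and the intersegmental coupling gain are consistent with a Matsuoka-style neural oscillator. The following sketch assumes such a textbook formulation (the parameters beta and w are illustrative choices, not taken from the study) and couples a knee unit to a hip unit one way, as in the strong-coupling condition described in the text.

```python
def step_unit(state, tau_u, tau_v, te, drive, dt, beta=2.5, w=2.0):
    """One Euler step of a two-neuron mutually inhibiting oscillator
    (a textbook Matsuoka-style formulation, assumed for illustration).

    state = (x1, v1, x2, v2): membrane and adaptation variables of the
    flexor/extensor pair. `drive` is an external input, e.g. the other
    joint's output scaled by the intersegmental gain omega_s.
    """
    x1, v1, x2, v2 = state
    y1, y2 = max(0.0, x1), max(0.0, x2)
    x1 += dt / tau_u * (-x1 - beta * v1 - w * y2 + te + drive)
    v1 += dt / tau_v * (-v1 + y1)
    x2 += dt / tau_u * (-x2 - beta * v2 - w * y1 + te - drive)
    v2 += dt / tau_v * (-v2 + y2)
    return (x1, v1, x2, v2), y1 - y2  # output that drives the joint

def simulate(omega_s, steps=20000, dt=0.001):
    """Hip unit drives the knee unit one way; time constants and the tonic
    excitation ratio (0.75) follow the values reported in the text."""
    hip = (0.1, 0.0, 0.0, 0.0)   # small asymmetry starts the oscillation
    knee = (0.0, 0.0, 0.1, 0.0)
    out_h = out_k = 0.0
    hip_out, knee_out = [], []
    for _ in range(steps):
        hip, out_h = step_unit(hip, 0.055, 0.65, 1.0, 0.0, dt)
        knee, out_k = step_unit(knee, 0.025, 0.35, 0.75, omega_s * out_h, dt)
        hip_out.append(out_h)
        knee_out.append(out_k)
    return hip_out, knee_out

hip_out, knee_out = simulate(omega_s=1.0)
```

With a strong gain (omega_s = 1.0) the knee unit's rhythm is dominated by the hip input, which is the "flexible 1-DOF system" behavior described below.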
Figure 3.15: Large amplitude oscillations with a strong intersegmental coupling (ω_s = 1.0) in the independent 2-DOF configuration when τ_u^h = 0.055, τ_v^h = 0.65, τ_u^k = 0.025, τ_v^k = 0.35: phase plots of the hip (left) and ankle (right) motions in the stationary regime. In both graphs, the axes denote the horizontal coordinates of the hip (respectively ankle) marker's visual positions.
In a series of experiments, we studied the role played by the intersegmental coupling gain ωs. With
too low a value, the coordination between hip and knee oscillators was very loose and the resulting
behavior was qualitatively similar to the results obtained in a 2-DOF independent configuration. With
a high value (here, 1.0), a strong coupling occurred and the lower limb was essentially driven by the
4 In his own terms, "since the entrainment has a global characteristic of being spontaneously established through interaction with the environment, we call it global entrainment".
Figure 3.16: Toward a flexible 1-DOF system: effect of an intermediate coupling (ω_s = 0.50) between hip and knee on the value landscapes (left: hip parameter space; right: knee parameter space) uncovered by a single exploratory run in a 2-DOF configuration. In both graphs, the size of a dot (a control setting visited by the exploratory process) is proportional to the value v obtained for that particular control setting. Initially, τ_u and τ_v were randomly selected in the intervals [0.02,0.04] and [0.2,0.4] respectively. The exploratory run took roughly 10 minutes.
upper limb control unit. From a qualitative point of view, such strong coupling led to the most natural
looking swinging pattern and amplitudes were shown to reach their maximum value. In effect, the
2-DOF system became a “flexible 1-DOF system.” Figure 3.15 shows the resulting phase plots for hip
and ankle motions. Ankle and hip are in-phase and the ankle motion follows a sinusoid of very large
amplitude (160 units). From the point of view of the value system, a strong coupling results in the lower
limb’s control parameters becoming a nonfactor. This is confirmed by the value landscapes uncovered
by an exploratory run. As shown in Figure 3.16, a strong correlation appears between the region of
space covered by the hip exploratory process (left) and the knee exploratory process (right). When
the hip was controlled by a high-yield setting (and note that in this particular run, almost all settings
were in the high-yield region discussed earlier), the value of the 2-DOF system was high because the
lower-limb rapidly phase-locked on the hip (by neural entrainment) and thus physical entrainment (as
observed in the 1-DOF configuration) could occur.
When intermediate coupling values were considered, that is, between 0.25 and 0.50, two important
observations could be made: (a) Transients were shorter (the duration of the transient was reduced
by a factor 2 in the configuration previously discussed); and (b) abrupt phase transitions that were
observed otherwise disappeared. This result is not surprising. With an appropriately chosen coupling
gain, neural entrainment is achieved between control units, and the two units with their own distinct
time constants (or frequencies in this case) pull each other toward a new common time constant (here,
a new frequency). Because of this smooth convergence towards a stable configuration, the ongoing
physical entrainment is also stabilized, by entrainment effect. Thus, abrupt phase transitions, which
demonstrate a global instability of the control, do not occur and the transients are shortened.
Summary
In summary, the above experiments have shown the following: The outright use of both degrees of
freedom resulted in a very rugged value landscape with sparse, high amplitude but not necessarily
robust, oscillatory behaviors. Freezing the lower degree of freedom enlarged the area of high-yield
because physical entrainment could occur. While lowering the average amplitude of the oscillations,
it supported multiple directions of stability, which stabilized the system when the second degree of
freedom was released. Optimal performance was obtained when joint synergy was considered and
neural entrainment between control units occurred.
3.6 Conclusion
With this case-study, we provided evidence to substantiate our claim that in learning a new motor task
(here, swinging), a reduction of the number of available biomechanical degrees of freedom helps stabilize the interplay between environmental and neural dynamics. Among the various types of adaptive
mechanisms that take part in this interaction, we focussed on entrainment, both neural and physical,
and morphological development. Our study represents an attempt to disentangle the complex interplay
between morphological, neural and environmental dynamics.
With our experimental results, we stressed the importance of morphological dynamics and its ef-
fects on environmental interaction. An outright use of all degrees of freedom was shown to reduce
the likelihood that physical entrainment takes place, which in turn resulted in a reduced robustness of
the system against environmental perturbations. Instead, by freezing some of the available degrees of
freedom, physical entrainment could occur and a large high-yield area of the parameter space was ob-
tained, producing robust oscillatory behaviors. This robustness eventually stabilized the system when
the frozen degrees of freedom were released.
Interestingly, our thesis is supported by descriptive evidence in both developmental psychology
and biomechanical studies of motor skill acquisition. Thelen and Smith (1994) reported that infants
first learning to stand typically solve the problem of how to coordinate their degrees of freedom by
freezing their body segments into an inverted pendulum-type postural coordination. Similarly, studies
by Jensen et al. (1995) on the development of infant leg kicking between 2 weeks and 7 months of
age, showed a progression from proximal control (at the hip) to more distal control (inclusion of knee
and ankle joints). Further support comes from Bernstein’s seminal work on motor skill acquisition in
which he showed that the freezing of a number of degrees of freedom is followed, as a “consequence of
experiment and exercise,” by the preliminary lifting of all restrictions, and the subsequent incorporation
of all possible degrees of freedom (Bernstein, 1967). In doing so, differentiated patterns of movement
and synergies can be explored, and eventually the most efficient or economical movement patterns can
be selected.
These three examples reflect quite accurately what we observed in our experiments: Morpholog-
ical changes (here, freezing and freeing of biomechanical degrees of freedom) are a form of plastic
mechanism and contribute to the life-time adaptivity of a system, that is, they are beneficial during
development and after. As for any other plastic mechanism, they have their own dynamics and time
scale. As such, their interplay with mechanisms operating at other time scales is likely to contribute
to the emergence of robust behavior. This hypothesis is actually supported by a recent study by Ro-
jdestvenski et al. (1999) on the robustness of biological systems with respect to changes of microscopic
parameters as a consequence of time scale hierarchy. The authors illustrate how time scale hierarchies
can lead to a decoupling of regulatory mechanisms and the emergence of robustness against parameter
variations.
In future, we will aim at corroborating this hypothesis through the study of tasks involving a greater
number of degrees of freedom as well as more environmental interaction. This will undoubtedly raise
the issue of scalability of our current framework. In the presence of an increased number of available degrees of freedom, which joints should be frozen and in what order? Will a simple reduction
of the number of available degrees of freedom be sufficient to yield robust adaptivity? As a matter of
fact, our on-going studies (Lungarella and Berthouze, 2002b) show that, consistent with observations
made in developmental psychology, alternate freezing and freeing of degrees of freedom may be nec-
essary when the inability to control excessive degrees of freedom pushes the system outside the limits
of postural stability. From this perspective, morphological changes truly have their own dynamics,
and understanding the key features of this dynamics will be an interesting challenge. Even more so
will be the study of the link between morphological dynamics and yet another form of dynamics, the
spontaneous dynamics of Goldfield (1995).
Chapter 4
Alternate Freezing and Freeing of
Degrees of Freedom1
4.1 Synopsis
In the previous chapter, we provided experimental evidence that starting with fewer degrees of freedom
enables a more efficient exploration of the sensorimotor space during the acquisition of a task. The
study came as support for the well-established framework of Bernstein (1967), namely that of an initial
freezing of the distal degrees of freedom, followed by their progressive release and the exploitation
of environmental and body dynamics. In this chapter, we revisit our study by introducing a nonlinear
coupling between environment and system. Under otherwise unchanged experimental conditions, we
show that a single phase of freezing and subsequent freeing of degrees of freedom is not sufficient
to achieve optimal performance, and instead, alternate freezing and freeing of degrees of freedom is
required. The interest of this result is two-fold: (a) it confirms the recent observation by Newell and
Vaillancourt (2001) that Bernstein’s framework may be too narrow to account for real data, and (b)
it suggests that perturbations that push the system outside its postural stability or increase the task
complexity may be the mechanism that triggers alternate freezing and freeing of degrees of freedom.
4.2 Introduction
Body-related morphological changes during the early stages of infancy, either slow and irreversible
modifications (such as physical growth), or relatively rapid, task-related re-organizations of the
musculo-skeletal system (such as the transition from crawling to standing), are a salient characteristic of the ongoing developmental process. In this chapter, we focus on the effect on behavior of one particular form of morphological change: the release of constraints in the motor system.
1 To appear as Berthouze, L. and Lungarella, M. Motor skill acquisition under environmental perturbations: on the necessity of alternate freezing and freeing of degrees of freedom. Adaptive Behavior, 12(1), 2004.
A few telling
examples of constraints in the sensory, motor, and neural systems of vertebrate species such as rats, cats
and humans are the immaturity of the accommodative system (Turkewitz and Kenny, 1982), the low
acuity of vision and absence of binocularity (Hainline, 1998), the low ratio of leg muscle to leg fat, and
the poor postural control of head, trunk, arms, and legs (Bertenthal and Von Hofsten, 1998; Thelen and
Smith, 1994). Studies in developmental psychology have shown that constraints in the sensory system
and biases in the motor system and their subsequent release, may play a pivotal role in the ontogeny
of motor skills, and in shaping the infant’s exploratory behavior (Bushnell and Boudreau, 1993; Gold-
field, 1995; Harris, 1983; Piek, 2002; Thelen et al., 1984; Turkewitz and Kenny, 1982). In this chapter,
we consider the morphological limitations in the motor apparatus of a developing system as particular
instances of ontogenetic adaptations, that is, neurobehavioral traits of an immature organism with a
specific adaptive role at a particular stage of development (Bjorklund and Green, 1992). We make the
premise that appropriate initial constraints on morphological resources are not only beneficial to the
emergence of stable sensorimotor patterns with an increased tolerance to environmental perturbations,
but also help to bootstrap later stages of learning and development.
The study of morphological changes is an important area of research. Yet, such changes have been largely neglected by biologically motivated robotics research, presumably because of (a) the difficulties involved with the actual implementation of the suggested morphological changes in real-world systems
(as opposed to simulated systems in which morphological changes can be achieved relatively easily);
and (b) the lack of proper means for quantifying their effects on neural dynamics and behavior. Re-
cently, the robotics community has started to address the former issue. The ultimate goal is to create
machines that by changing their morphology (shape) are able to perform various tasks in various envi-
ronments. Examples are the self-reconfigurable modular robots built by Murata et al. (2001), and the
morpho-functional machine initiative promoted by Hara and Pfeifer (2003). In both instances, change
of shape is concerned with the functionality of the machine, and not with learning mechanisms. The
quantification of movements has also been investigated, and a few methods have been proposed. Di-
mensional analysis, for instance, gives an index of the number of independent degrees of freedom
required to produce the time series of a particular movement (Kay, 1988; Mitra et al., 1998). The
spatio-temporal organization of the joint-space data associated with a movement can also be captured
by principal component analysis. Haken (1996) showed that early in the learning of a “pedalo task” (a
skating locomotion task in which both skates are connected by a rigid link that constrains their relative
motion to a cycloidal trajectory in the vertical plane), several principal components were necessary to
explain most of the variance of the data, and that after practice, this number of significant principal
components collapsed to one. Although useful for a descriptive characterization of the system, neither type of analysis provides any information on the mechanisms underlying the described learning process.
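The principal component analysis mentioned above can be illustrated with two joint-angle time series: counting "significant" covariance eigenvalues indexes how many independent degrees of freedom the movement effectively uses. The data below are toy signals, not Haken's pedalo data.

```python
import math

def pca_2d_eigenvalues(xs, ys):
    """Eigenvalues of the 2x2 covariance matrix of two joint-angle series.

    A minimal stand-in for the principal component analysis mentioned in
    the text; the closed-form 2x2 eigenvalue formula is used.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) ** 2 for x in xs) / n
    c = sum((y - my) ** 2 for y in ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    mid = (a + c) / 2.0
    half = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return sorted([mid + half, mid - half], reverse=True)

def significant_components(eigvals, threshold=0.05):
    """Number of eigenvalues explaining more than `threshold` of variance."""
    total = sum(eigvals)
    return sum(1 for ev in eigvals if ev / total > threshold)

# Toy joint-angle data: uncoordinated movement (independent phases) uses
# two components; a learned synergy (knee locked to hip) collapses to one.
t = [i * 0.01 for i in range(1000)]
hip = [math.sin(2 * math.pi * 1.0 * ti) for ti in t]
knee_early = [math.sin(2 * math.pi * 1.7 * ti + 0.5) for ti in t]
knee_late = [0.8 * h for h in hip]

n_early = significant_components(pca_2d_eigenvalues(hip, knee_early))
n_late = significant_components(pca_2d_eigenvalues(hip, knee_late))
```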
More central to the theme of this chapter is the “degrees of freedom problem”, first pointed out
by Bernstein (1967) (see also Newell and Vaillancourt, 2001; Sporns and Edelman, 1993; Vereijken
et al., 1992; Zernicke and Schneider, 1993): Although the human musculo-skeletal apparatus is a
highly complex and non-linear system, with a large number of potentially redundant degrees of free-
dom (e.g., more than one motor signal can lead to the same trajectory), well-coordinated and precisely
controlled movements emerge. In reality, the redundancy increases at the level of the muscles (there
are many more muscles than joints), and explodes at the neural level. While it guarantees flexibility
and adaptability (think of the hand’s astounding manipulative abilities, for instance), it also challenges
the control of body movements, largely because of the enormous number of components involved in
the generation and coordination of a movement. A possible solution to the control issues raised by the
excess number of degrees of freedom, was suggested by Bernstein himself. His proposal is character-
ized by three stages of change in the number of degrees of freedom that accompany motor learning
and development. Initially, in learning a new skill or movement, the peripheral degrees of freedom
(the ones farther from the trunk, such as wrist, and ankle) are reduced to a minimum (freezing). Sub-
sequently, as a consequence of experiment and exercise, restrictions at the periphery are gradually
lifted (freeing), till “all” degrees of freedom are incorporated. Eventually, reactive phenomena (such
as gravity and passive dynamics) are exploited, and the most efficient movements are selected. Several
studies have provided evidence for “particular features” of Bernstein’s three-stage model. Vereijken
et al. (1992), for example, conducted an empirical test of the related issues of freezing and freeing
degrees of freedom in adults learning a ski simulator task. The kinematic analysis of the limb and torso
motions showed that at the outset of learning, subjects froze many of the joint segments of the whole
body. With subsequent practice, subjects introduced active motion at the ankle, knee, and hip joints in a
fashion consistent with the freeing of (release of the ban on) degrees of freedom. Other investigations
included the learning by adults of a handwriting signature with the non-dominant limb (Newell and
van Emmerik, 1989), a dart throwing task (McDonald et al., 1989), pistol shooting (Arutyunyan et al.,
1969), and the development of infant leg kicking between two weeks and seven months of age (Jensen
et al., 1995).
In this study, we approach the “degrees of freedom problem” by employing a robot-based synthetic
modeling that exploits findings from developmental psychology. Some instances of a developmental
approach to the issue have already been reported (Berthouze and Kuniyoshi, 1998; Lungarella and
Berthouze, 2002c; Metta, 2000) (see Lungarella and Berthouze, 2002c, for a review). Those studies,
however, framed the role of the freezing of degrees of freedom, and their subsequent freeing, in an
information-processing context – similar to existing connectionist learning techniques, such as con-
strained or incremental learning (e.g., Elman, 1993). More in line with our ideas, Taga (1997) reported
computer simulations of the development of bipedal locomotion in human infants. By freezing and
freeing the degrees of freedom of the neuro-musculo-skeletal system, he was able to reproduce (in
simulation) the “U-shaped” developmental trajectory of infants’ stepping movements during which re-
flexive movement patterns first appear, then disappear, and months later reappear in altered form. His
result was in agreement with Bernstein’s three-stage model of skill acquisition, and thus, he hypoth-
esized that a developmental mechanism of freezing and freeing may be important for learning stable
and complex movements. In this chapter, however, we will challenge this model by showing that in the
presence of strong couplings between system and environment during task learning, a rigid sequence
of morphological changes (freezing → freeing → selection) may not be sufficient. Instead, a more
complex dynamics of changes should be considered.
4.3 Pendulation study and release of the peripheral degrees of freedom
In Lungarella and Berthouze (2002c), we reported our investigation on the exploration of pendulation
(or swinging) in a small-sized humanoid robot. We chose swinging as a case-study, because it is a
repetitive activity, and thus, characteristic of emerging skills during the first year of life – see, for
instance, the notions of circular reaction (Piaget, 1953) and body babbling (Meltzoff and Moore, 1997).
Thelen and Smith (1994) suggested that oscillations are the product of a motor system under emergent
control; that is, when infants attain some degree of intentional control of limbs or body postures, but
when their movements are not fully “goal-corrected.”
Assuming a neural control structure suitable for the task at hand, we proposed a comparative anal-
ysis between outright use of the full body for exploration, and progressive exploration characterized by
a developmental freeing of the degrees of freedom, such as the one hypothesized by Bernstein (1967).
The study produced a number of insights, which we summarize here:
• The outright use of all degrees of freedom (hip and knee) reduced the likelihood of “physical en-
trainment”, that is, the mutual regulation of body and environmental dynamics. We observed that
small changes in the control parameters yielded different oscillatory behaviors. Moreover, even
within one control parameter setting, the system displayed several rapid and abrupt transitions
between different stationary regimes. This feature is characteristic of spontaneous movement
activity in infants (Goldfield, 1995).
• By freezing the peripheral degree of freedom (knee), we observed an increase of the range of
control parameter settings that led to stable oscillatory behaviors, as well as of the range of
oscillation frequencies for which physical entrainment could effectively occur. Miyakoshi et al.
(1994) and Williamson (1998) have shown that the exploitation of entrainment can indeed yield
robust behavior in various tasks.
• Bootstrapped control of all degrees of freedom in which the peripheral degree of freedom was
released after the system had already stabilized in a single degree of freedom (1-DOF) stationary
regime, resulted in a dense distribution of parameter settings yielding stable oscillatory behaviors
with a large amplitude. Statistical analysis showed that these large oscillations could not be
accounted for solely by the oscillations achieved in the 1-DOF (frozen) configuration. Instead,
the freezing and freeing of the degrees of freedom reduced the sensitivity of the system to the
selection of particular hip-knee parameter configurations.
• The study showed that joint synergies², which are characteristic of human motor control (e.g.,
Spencer and Thelen, 1999), played a complementary role to physical entrainment during the
release of the peripheral degree of freedom. A strong coupling resulted in “neural entrainment”,
whereby the control frequency of the lower limb locked onto the control frequency of the upper
limb. The phase locking between both limbs stabilized the oscillatory behavior, and thus by
entrainment effect, also the ongoing physical entrainment. Abrupt phase transitions did not
occur and transients were shortened, which is typical for task execution at the later stage of
motor skill learning (Goldfield, 1995).
4.4 Adding nonlinear perturbations
In this chapter, we revisit our previous study by adding a nonlinear coupling between environment
and system. Our focus is on whether a progressive release of the peripheral degrees of freedom can
provide adaptivity and robustness against perturbations and constraints such as the rubber-band coupling described below. Both
experimental setup and control architecture are identical to those used in our previous study.
The experimental setup consisted of a small-sized humanoid robot with 12 degrees of freedom.
Through two thin metal bars fixed to its shoulders, the robot was attached by a passive joint to a
supportive metallic frame in which it could freely oscillate in the vertical (sagittal) plane (see Fig. 4.1).
Each leg of the robot had five joints, but only two of them (hip and knee) were used in our experiments.
High torque servo motors actuated each joint. Because these motors do not provide any form of sensor
feedback, we used an external camera to track colored markers placed on the robot’s limbs. Throughout
this study, we refer to feedback as the visual position of the hip in a frame of reference centered on the
hip position, when the robot is in its resting position.
To study the effect of environmental interaction during learning, we introduced an asymmetric
nonlinear coupling between system and environment in the form of a thread attached to the humanoid
robot at hip-level, and connected to the supportive frame via a rubber band. This flexible link was
² During task-dependent movements, the joints are not controlled individually, but are coupled in such a way that they change relative to each other. This coupling is called a joint synergy.
Figure 4.1: Humanoid robot used in our experiments.
designed so that the rubber band would extend only when the robot was tilted backwards by at least
10 degrees. This setting was kept constant throughout the study. The strong dampening properties of
this coupling are illustrated in Figure 4.2, which shows the visual positions of the hip and ankle during
oscillations with control parameters known to yield resonant behavior in unperturbed situations.
Figure 4.3 depicts the distributed architecture used to control the humanoid robot. Each limb was
controlled by a separate neural oscillator. The four neural oscillators controlling the knees and hips
were modelled by the following set of nonlinear differential equations, derived from Matsuoka (1985):
τ_u u̇_f = −u_f − β v_f − ω_c [u_e]+ − ω_p [Feed]+ + t_e
τ_u u̇_e = −u_e − β v_e − ω_c [u_f]+ − ω_p [Feed]− + t_e
τ_v v̇_f = −v_f + [u_f]+
τ_v v̇_e = −v_e + [u_e]+
where u_e and u_f are the inner states of the extensor (e) and flexor (f) neurons, v_e and v_f are variables representing the degree of adaptation, or self-inhibition, of the extensor and flexor neurons, and t_e is an external tonic excitation signal that determines the amplitude of the oscillation. β is an adaptation constant, ω_c is a coupling constant that controls the mutual inhibition of neurons e and f, and ω_p is a variable weighting the proprioceptive feedback Feed. This proprioceptive feedback is obtained through the visual position of the hip in a frame of reference centered on the hip position when the robot is in its resting position. τ_u and τ_v are time constants of the neurons' inner states and determine the strength of the adaptation effect. The operators [x]+ and [x]− return the positive and negative parts of x, respectively.

Figure 4.2: Resonant oscillations for (τ_u = 0.065, τ_v = 0.6) without perturbations (top). Resulting behavior under perturbations (bottom). In each graph, the time-series denote motor impulses (bottom), ankle position (middle), and hip position (top). In this figure, as in all other similar figures in this chapter, the vertical axis is unlabelled because it depicts time-series of different scales and units, i.e., visual positions in pixels and motor commands in radians. The horizontal line in the lower graph corresponds to the visual position of the location after which the rubber band is extended. The horizontal axis denotes time in milliseconds.
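As an illustration, the oscillator dynamics above can be integrated numerically. The following sketch is hypothetical code (not the thesis' implementation), using simple Euler integration with the parameter values quoted in the text (β = 2.5, ω_c = 2.0, t_e = 20) and a representative setting τ_u = 0.06, τ_v = 0.65; proprioceptive feedback is switched off (ω_p = 0), so the unit runs open-loop:

```python
# Hypothetical sketch: Euler integration of one Matsuoka oscillator unit.
# Parameter values follow the text (beta = 2.5, omega_c = 2.0, t_e = 20),
# with tau_u = 0.06, tau_v = 0.65 and proprioceptive feedback disabled.

def pos(x):
    """[x]^+ : positive part of x."""
    return max(x, 0.0)

def neg(x):
    """[x]^- : negative part of x."""
    return max(-x, 0.0)

def matsuoka_step(state, dt, tau_u=0.06, tau_v=0.65, beta=2.5,
                  w_c=2.0, w_p=0.0, feed=0.0, t_e=20.0):
    """One Euler step of the mutually inhibiting flexor/extensor pair."""
    uf, ue, vf, ve = state
    duf = (-uf - beta * vf - w_c * pos(ue) - w_p * pos(feed) + t_e) / tau_u
    due = (-ue - beta * ve - w_c * pos(uf) - w_p * neg(feed) + t_e) / tau_u
    dvf = (-vf + pos(uf)) / tau_v
    dve = (-ve + pos(ue)) / tau_v
    return (uf + dt * duf, ue + dt * due, vf + dt * dvf, ve + dt * dve)

def simulate(steps=20000, dt=0.001):
    """Return the oscillator output y = u_f - u_e over time."""
    # Small initial asymmetry moves the system off the (unstable)
    # symmetric fixed point, after which a limit cycle develops.
    state = (0.1, 0.0, 0.0, 0.0)
    ys = []
    for _ in range(steps):
        state = matsuoka_step(state, dt)
        ys.append(state[0] - state[1])
    return ys
```

With these values, neither unit can permanently suppress the other (the adaptation variables v grow on the slow time scale τ_v until dominance switches), so the output y settles into a sustained alternation of flexor and extensor activity.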
Joint synergy between hip and knee, i.e., the appropriate phase relationship between the corresponding neural oscillators, was implemented by feeding the flexor unit of the knee oscillator with the combined outputs of the flexor and extensor units of the hip controller. A factor −ω_s([u_f^h]+ + [u_e^h]+) was added to the term τ_u u̇_f of the knee oscillator's flexor equation (Equation 4.1), with u_e^h and u_f^h the inner states of the extensor and flexor units in the hip oscillator, and ω_s the intersegmental coupling parameter determining the strength of the coupling.
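Written out explicitly (our restatement, on the assumption that the synergy factor simply enters the right-hand side of the knee flexor equation, and with ω_p^k = 0 in the experiments where proprioception is fed to the hip only), the coupled flexor unit of the knee oscillator reads:

τ_u u̇_f^k = −u_f^k − β v_f^k − ω_c [u_e^k]+ − ω_p^k [Feed]+ + t_e − ω_s([u_f^h]+ + [u_e^h]+)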
[Diagram: two pattern generators, each a pair of mutually inhibiting extensor (e) and flexor (f) units coupled with weight ω_c, driven by the tonic impulse t_e; their outputs are converted into motor commands for the hip and knee joints. Proprioceptive feedback (Feed) enters the units with gain ω_p; the joint synergy link with weight ω_s couples the hip oscillator to the knee oscillator. The robot hangs from a free joint and is coupled to the frame via the spring (rubber band).]

Figure 4.3: Schematics of the experimental system and neural control architecture. Joint synergy is only activated in experiments involving coordinated 2-DOF control.
As in Taga (1991), we used each neural oscillator as a rhythm generator, with its output y given by
the difference y = u f −ue between the activities of the flexor and extensor units. This value was then
fed to a pulse generator which detects sign changes in the output y of the neural oscillator and generates
a pulse of constant amplitude and of sign sgn(y). The angular position of the motor results from the
integration in time of each pulse. Though very primitive (a variant of on-off control), this controller is
a suitable approximation of the output y. Indeed, it preserves the frequency and maximal amplitude of
the signal, as well as the timing of sign inversions within one period.
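By way of illustration, the pulse generator described above can be sketched as follows (hypothetical code; the function names and the fixed pulse amplitude are our own choices, not taken from the thesis):

```python
# Sketch of the on-off pulse generator: it watches the oscillator output
# y = u_f - u_e and emits a command of constant amplitude whose sign is
# sgn(y); the motor's angular position is the time-integral of the
# pulses. This preserves the frequency of y and the timing of its sign
# inversions, while discarding its detailed waveform.

def pulse_train(ys, amplitude=1.0):
    """Return the constant-amplitude pulse signal driving the motor."""
    pulses = []
    for y in ys:
        sign = 1.0 if y > 0 else (-1.0 if y < 0 else 0.0)
        pulses.append(amplitude * sign)
    return pulses

def motor_angle(pulses, dt):
    """Integrate the pulses over time to obtain the angular position."""
    angle, trajectory = 0.0, []
    for p in pulses:
        angle += p * dt
        trajectory.append(angle)
    return trajectory
```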
As in the original study (unless specified otherwise), we did not change the following parameters
throughout this study: β = 2.5, ωc = 2.0, te = 20 for the hip (te = 15 for the knee). Other parameters
were set as discussed in the text.
4.5 Results and discussion
4.5.1 Protocol
With the aim of a comparative analysis between the outright use of all degrees of freedom and a pro-
gressive release of the degrees of freedom, we realized two sets of experiments. In the first set, we
considered “2-DOF exploratory control”, with each pair of hip and knee joints controlled by a separate
oscillator unit and the other joints kept stiff in their reset position. We treated two cases: In the first
case, the oscillator units were independent and their respective parameter spaces were independently
explored. In the second case, the oscillator units were coupled via an intersegmental coupling param-
eter ωs, with the goal of realizing neural entrainment between oscillatory units. In the second set of
experiments, we considered a “bootstrapping 1-DOF exploratory phase” during which only the hip
joint was controlled, while other joints were kept stiff in their reset position. When a stationary regime
was obtained, the peripheral degree of freedom (knee) was released and controlled by its own oscillator
unit. The robot’s movements were analyzed via the recording of the hip, knee, and ankle positions.
The same initial conditions were used in all experiments, with the humanoid robot starting from its
resting position. We only considered parameter configurations which yielded motion without external
intervention.
4.5.2 Experimental observations
Unless specified otherwise, all experiments within each scenario were found to yield qualitatively
similar results in terms of the characteristics of the oscillatory behavior, with variations accounted for
by differences in initial conditions. For practical reasons (excessive strain on the physical structure of
the robot as well as on the servo motors), we did not conduct sufficient runs to establish statistically
meaningful results between scenarios.
Selection of suitable hip control parameters in 1-DOF exploratory control
Because an exhaustive exploration of the parameter space for two independent neural controllers was
not feasible, we performed a preliminary exploration of the hip oscillator’s parameter space in a 1-
DOF configuration (the reader should refer to Figure 4.4 and Table 4.1 for an overview of the dif-
ferent configurations discussed in the following paragraphs). We conducted the exploration using the
value-based exploration algorithm presented in the first study (Chapter 3). This exploration essentially
confirmed our previous findings. Adaptivity to external perturbations and optimal task performance,
i.e., oscillations with large amplitude, required fine tuning of the parameters. Although the hip-ankle
phase plots were not necessarily stationary, all configurations led to a stationary regime of hip oscilla-
tions. Two settings were of particular interest, and were used to carry out the experiments described
here: (τ_u = 0.035, τ_v = 0.65) and (τ_u = 0.06, τ_v = 0.65), with ω_p^h ∈ [0.0,7.0] for the first setting, and ω_p^h ∈ [0.0,20.0] for the second setting.
[Figure 4.4 is a flow diagram organizing the configurations of Table 4.1: identification of suitable 1-DOF configurations (P0: low-amplitude anti-phase oscillations; P1: stationary hip regime); the study of proprioceptive coupling in hip-space along ω_p^h (P2: noise driven; P3: multiple co-existing regimes; P4: noise independent; P5; P7: stability test of the co-existing regimes); knee-space exploration and the role of proprioceptive coupling on physical entrainment with ω_p^h = 2.0 fixed (P6: stationary low-amplitude hip, non-stationary ankle); and the study of intersegmental coupling ω_s in the 2-DOF configuration (P8: co-existing regimes without synergy; P9: failure to reduce transients and abrupt phase transitions; P10: flexible 1-DOF).]

Figure 4.4: Flow of the proposed experimental discussion with respect to both 1-DOF and 2-DOF exploration (cf. Table 4.1).
The first setting (τ_u = 0.035, τ_v = 0.65) was characterized by low-amplitude (23 units) antiphase oscillations of the legs with respect to body motion. Antiphase oscillations are indicated by a phase difference of π radians between the vertical components of the hip and ankle positions. A transversal analysis along the proprioceptive gain ω_p^h ∈ [0.0,7.0] showed that all experiments yielded a stationary regime, robust to external perturbations such as a manual push. With a very weak proprioceptive gain, i.e., ω_p^h ∈ [0.0,1.0], we observed smooth, low-amplitude (50 units at hip-level), in-phase (no phase difference) oscillations. With a larger gain, the hip oscillations were limited to an amplitude corresponding to the rubber-band extension point, and the ankle behavior was not smooth. These results are summarized in Figure 4.5.

Label  τ_u^h, τ_v^h         τ_u^k, τ_v^k                ω_p^h        ω_p^k  ω_s
P0     0.035, 0.65          –                           0.0          –      –
P1     0.060, 0.65          –                           0.0          –      –
P2     0.060, 0.65          –                           [3.0,7.0]    –      –
P3     0.060, 0.65          –                           [2.0,3.0]    –      –
P4     0.060, 0.65          –                           [0.0,2.0]    –      –
P5     0.035, 0.65          –                           [0.0,7.0]    –      –
P6     0.060, 0.65          [0.020,0.090], [0.35,0.80]  2.0          –      –
P7     [0.025,0.075], 0.65  –                           2.0          –      –
P8     0.060, 0.65          0.035, 0.40                 2.0          –      0.0
P9     0.060, 0.65          0.035, 0.40                 2.0          –      [0.25,0.75]
P10    0.060, 0.65          0.035, 0.40                 2.0          –      1.0

Table 4.1: Synopsis of the control parameter settings used in Figure 4.4.
With the second setting (τ_u = 0.06, τ_v = 0.65), a transversal analysis along the proprioceptive gain showed a variety of behaviors. For extreme values of ω_p^h (ω_p^h = 0 and ω_p^h > 6.0), we did not observe any sustained oscillations, and amplitudes did not exceed the rubber-band extension point. Furthermore, manual pushes did not enable the system to stray away from this “trivial” attractor. This result was predictable. With ω_p^h = 0.0, variations in the inertial angles resulting from the perturbation were not fed to the controller, and physical entrainment could not occur because the time-constants of the feedback loop and the control units were not compatible. On the other hand, with too high a gain (ω_p^h > 6.0), the system was essentially driven by noise, leading to a pseudo-chaotic oscillatory behavior.
For intermediate values of ω_p^h, we observed multiple co-existing regimes. The value ω_p^h = 2.0 was particularly noticeable, with three distinct regimes. From the resting position, a first quasi-stationary regime was obtained in which in-phase oscillations were sustained, albeit with very low amplitude (up to the rubber-band extension point) and with a continuous shift between the phases of the hip and ankle oscillations. After a manual push, a second stationary regime was reached in which larger hip oscillations occurred, but with an aperiodic hip-ankle phase plot. With yet another push, large, smooth, in-phase oscillations (amplitude 75 units) were obtained, similar to those obtained with ω_p^h = 0.5. This regime was not robust against external perturbations, and the system would subsequently settle in any of the three regimes. We found that this switching behavior was repeatable over various experiments.
Figure 4.5: Time-series of hip position (top) and ankle-hip phase plots (bottom) for ω_p^h = 0.25 (left) and ω_p^h = 4.0 (right). The oscillator time-constants are τ_u = 0.035, τ_v = 0.65 in both cases. In the upper row of plots, the vertical axis denotes the visual positions of the ankle (left) and the hip (right). The horizontal axis denotes time in milliseconds. In the lower row of plots, both vertical and horizontal axes correspond to the visual positions of the hip (left plot) and ankle (right plot) in pixels.

From the point of view of the trade-off between stability and plasticity (stability against perturbations is desirable, but not at the cost of learning plasticity), a systematic occurrence of this switching behavior across the entire control parameter space would be highly desirable as an intrinsic mechanism for straying between attractor states. Consequently, we carried out a set of experiments in which we fixed the proprioceptive gain to its critical value ω_p^h = 2.0. The parameter space for the hip controller was explored with τ_u in the “usable” range [0.025,0.075]. The switching behavior could not be reproduced, however.
Instead, all configurations produced a single stationary regime, robust to external perturbations, with
low-amplitude hip oscillations and generally non-periodic hip-ankle phase plots.
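Throughout this section, regimes are classified by the phase relation between hip and ankle oscillations. As a rough illustration of how such a phase difference could be estimated from the tracked marker time-series, the following sketch (hypothetical; the zero-crossing method and all names are our own, not the thesis' analysis code) compares the timing of upward mean-crossings of the two signals:

```python
# Hypothetical sketch: estimate the ankle-hip phase difference from two
# oscillatory time-series by comparing the timing of their upward
# crossings of the series mean. 0 rad indicates in-phase motion, pi rad
# antiphase motion.
import math

def upward_crossings(xs):
    """Indices where the series crosses its mean going upward."""
    m = sum(xs) / len(xs)
    return [i for i in range(1, len(xs)) if xs[i - 1] < m <= xs[i]]

def phase_difference(hip, ankle):
    """Phase of the ankle relative to the hip, in radians in [0, 2*pi)."""
    h, a = upward_crossings(hip), upward_crossings(ankle)
    period = (h[-1] - h[0]) / (len(h) - 1)  # mean hip period, in samples
    # Lag from each hip crossing to the next ankle crossing.
    lags = [min(c - t for c in a if c >= t)
            for t in h if any(c >= t for c in a)]
    mean_lag = sum(lags) / len(lags)
    return (2 * math.pi * mean_lag / period) % (2 * math.pi)
```

A crude method such as this only works for reasonably clean, roughly periodic signals; for the aperiodic phase plots reported below, no single phase value exists.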
Instability of 2-DOF exploratory control
Using the hip parameters identified above (τ_u^h = 0.06, τ_v^h = 0.65), we realized a sparse exploration of the knee neural oscillator parameters, with τ_u^k ∈ [0.02,0.09] and τ_v^k ∈ [0.35,0.8]. Proprioception was fed to the hip unit only, with a gain ω_p^h = 2.0. All experiments yielded the same qualitative behavior: stationary low-amplitude (30 units) hip oscillations and non-stationary ankle movements.
This result was predictable. Because of its lack of proprioceptive feedback, the knee unit could not
entrain with the hip oscillations. Meanwhile, the hip unit entrained to the oscillations resulting from the
simultaneous motor commands of both hip and knee, thus inducing a continuous phase shift between
hip and knee motor commands (see Fig. 4.6). Because of the morphology of the system and the 3 : 2
ratio between hip and knee tonic excitations, hip oscillations were sustained, but both environmental
perturbations and out-of-phase knee oscillations reduced the amplitude of the oscillation to a nominal
level.
Figure 4.6: From top to bottom, time-series of hip and ankle positions, and hip and knee motor commands, with the following parameters: τ_u^h = 0.06, τ_v^h = 0.65, τ_u^k = 0.02, τ_v^k = 0.8, and ω_p^h = 2.0. The horizontal axis denotes time in milliseconds. The system was manually perturbed after about 37.5 s.
This interpretation was confirmed by experiments carried out with a small proprioceptive gain on the hip (ω_p^h = 0.25). With a lower gain, the hip motor commands were not entrained as much to the overall oscillations, and physical entrainment between knee and hip motor commands could occur because the phase shift was slower. Figure 4.7 illustrates the co-existence of two regimes when ω_p^h = 0.25 and τ_u^k = 0.025, τ_v^k = 0.35. The first regime is qualitatively similar to the behavior observed in the previous instance (although in this case, the hip oscillations also exhibit a “wave-like” stationary regime). The second regime consists of large (55 units) in-phase oscillations.
Figure 4.7: From top to bottom, time-series of hip and ankle positions, and hip and knee motor commands, with the following parameters: τ_u^h = 0.06, τ_v^h = 0.65, τ_u^k = 0.025, τ_v^k = 0.35, and ω_p^h = 0.25. The horizontal axis denotes time in milliseconds. The system was manually perturbed at times 37 s, 75 s, 108 s, and 147 s (vertical lines).

To further confirm the hypothesis, we carried out a last batch of experiments in which the knee control unit was also fed with proprioceptive feedback. After fixing the knee unit parameters to τ_u^k = 0.06 and τ_v^k = 0.65, we varied the knee proprioceptive feedback gain ω_p^k in the interval [0.0,8.0]. We found
oscillatory behaviors qualitatively similar to those obtained without proprioception to the knee, namely, low-amplitude hip oscillations in a stationary regime robust to external perturbations. Higher gains led to a reduction of the phase difference between hip and ankle oscillations, and to a smoother oscillatory behavior. With different knee parameters (τ_u^k = 0.02, τ_v^k = 0.35), however, we observed a wide range of behaviors, from non-stationary and non-smooth ankle behaviors to in-phase and stationary oscillations. With an increase in the knee proprioceptive gain, the phase shifts became stronger and the stationary regimes were not sustained.
As in our initial study, the parameter ω_s, which determines the strength of the intersegmental coupling, played a crucial role. With too low a value, the coordination between hip and knee oscillators was very loose, and we observed results qualitatively similar to the independent case. With a high value (here, 1.0), a strong coupling occurred, and because the lower limb was mainly driven by the hip control unit, the system essentially became a “flexible 1-DOF system” (Lungarella and Berthouze, 2002a). To illustrate this point, we carried out the following experiments. The hip unit parameters were initialized to (τ_u^h = 0.06, τ_v^h = 0.65), and the knee control parameters (τ_u^k and τ_v^k) were set so that with an intersegmental coupling of ω_s = 0.0 multiple oscillatory regimes could co-exist. We used the following values: τ_u^k = 0.035, τ_v^k = 0.4. The proprioceptive feedback gain to the hip was set to ω_p^h = 2.0, i.e., its critical value as determined experimentally. With ω_s = 1.0, the system stabilized into a stable regime in which hip and knee oscillated in phase (see motor commands in the close-ups of Fig. 4.8). Interestingly, knee kicking motion occurred only shortly before the robot reached the point after which the rubber band would have extended. From an intuitive point of view, this behavior could correspond to optimal task performance.
Figure 4.8: Co-existing regimes for ω_s = 0.0 and τ_u^h = 0.06, τ_v^h = 0.65, τ_u^k = 0.035, τ_v^k = 0.4 (top). Unique in-phase oscillatory regime with ω_s = 1.0 (bottom). In each graph, the time-series denote hip and ankle positions, and hip and knee motor commands (from top to bottom). Right-hand windows are close-ups on the time-series. The horizontal axis denotes time in milliseconds.
With intermediate values, i.e., ω_s ∈ [0.25,0.75], the intersegmental coupling was not sufficient to overcome the difference in time-constants between the hip and the knee control units, and its effects were negligible. This outcome was in sharp contrast with our previous finding that intersegmental coupling (without proprioceptive feedback) could account for a reduction of transients and for the suppression of abrupt phase transitions. We had attributed that result to the effect of neural entrainment, whereby the outputs of the control units tend to smoothly converge towards a stable configuration (Lungarella and Berthouze, 2002c). In the case of physical constraints (the rubber band), however, a stable configuration cannot be systematically found.
Bootstrapped 2-DOF exploratory control
As in the original study, we experimented with a controlled release of the second degree of freedom after the system had reached a stationary regime in a 1-DOF configuration. We selected 1-DOF parameter configurations as discussed earlier, but not necessarily close to the resonant solution. The attainment of the stationary regime was visually evaluated by the experimenter, and the second degree of freedom was then released. Although this visual appraisal may appear an ad-hoc solution, it actually helps validate our observations by introducing variance in the time after which the degree of freedom is released.
In contrast to the initial study in which all configurations led to a stable, in-phase stationary regime
with large amplitude, the introduction of the second degree of freedom induced different behaviors
that showed a relatively high sensitivity to the values of the knee control parameters. We observed two
typical situations: (a) The introduction of the second degree of freedom induced a phase shift which
resulted in dampened oscillations, as shown in Figure 4.9 (left). This phenomenon was repeatable
and robust to external perturbations. (b) When the 1-DOF regime was close to resonant control, the
oscillatory behavior was left unchanged by the addition of a second degree of freedom, as shown in
Figure 4.9 (right). Again, this is a natural result of the morphology of the system and the 3 : 2 ratio
between hip and knee tonic excitations.
In further contrast with the initial study, we did not observe any instance where the introduction
of the second degree of freedom led to better task performance. Instead, it often induced a collapse
of the hip oscillations. We used these occurrences as a triggering signal for a new freezing/freeing
phase of the peripheral degree of freedom. After freezing, the system always returned to an oscillatory
behavior typical of its 1-DOF configuration. Subsequent releases led either to a new collapse of the
hip oscillations, and thus, a new cycle of freezing/freeing, or to sustained oscillatory behavior (see
Fig. 4.10).
This result raises the question of whether freeing and freezing are just another form of perturbation.
At this stage, we are not in a position to provide a definite theoretical reply. We are also not aware
of any existing theoretical characterization of the effect of freezing/freeing on the motion patterns
of human subjects engaged in tasks typically observed by developmental psychologists. We are not
arguing against the fact that a carefully designed perturbation, or a set of artificial constraints, could
trigger the same type of motor changes as those induced by freezing and unfreezing. However, it does not appear plausible that infants rely on the likelihood of encountering such a particular perturbation to generate the appropriate chain of changes required for them to acquire their various skills. Indeed, developmental psychologists observe such sequences of change without having to introduce external biases. Thus, it seems reasonable to attribute these pathways of change to an intrinsic mechanism like freezing and freeing (which could be seen as an intermediate stage en route to the self-organization of motor activities). Our experimental results show that, unlike external perturbations such as a manual push, this mechanism can consistently and reliably lead the system to stray away from the sensorimotor area explored at the time of the “perturbation.”

Figure 4.9: Results of the release of an additional degree of freedom after stabilization in a 1-DOF configuration. Left: (τ_u^h = 0.045, τ_v^h = 0.65) and (τ_u^k = 0.025, τ_v^k = 0.45). Right: (τ_u^h = 0.06, τ_v^h = 0.65) and (τ_u^k = 0.025, τ_v^k = 0.35). From top to bottom, the time-series denote hip and ankle positions, and hip and knee motor commands. The horizontal axis denotes time in milliseconds.
This could be interpreted in terms of the three stages of human motor skill acquisition proposed
by Goldfield (1995): (1) inability to control excessive degrees of freedom pushing infants outside the
limits of their postural stability; (2) reduction of the number of degrees of freedom to simplify the
control, either by introducing synergies or by freezing degrees of freedom; and (3) controlled release of
the frozen degrees of freedom following recovery.

Figure 4.10: Oscillatory behavior obtained during alternate freezing and freeing phases. Neural parameters are unchanged and set to τ_u^h = 0.06, τ_v^h = 0.65, τ_u^k = 0.03, τ_v^k = 0.325, ω_p^h = 0.5, and ω_s = 0.5. From top to bottom, time-series denote hip and ankle positions, and hip and knee motor commands. The horizontal axis denotes time in milliseconds.

Figure 4.11 shows empirical evidence for the effect of alternate freezing and freeing of the degrees of freedom. The close-ups on the right-hand side show that although the control parameters did not change, the kicking pattern of the knee differed between subsequent releases.
4.6 Conclusion and future directions
In this study, we set out to assess whether an initial phase of freezing followed by a subsequent phase
of freeing of degrees of freedom, such as proposed by Bernstein’s model, would be sufficient to over-
come the increase in task complexity induced by a strong nonlinear coupling between the pendulating
robot and its environment. By comparing outright use of the full body with progressive exploration through a developmental cycle of freezing and freeing of the degrees of freedom, we showed that a single stage of
freezing/freeing was not sufficient to develop stable oscillatory behaviors. In contrast to our previous
study (Lungarella and Berthouze, 2002c), alternate freezing and freeing was required. The interest of
this result is two-fold:
Figure 4.11: Effect of alternate freeing and freezing of the knee. Neural parameters are unchanged and set to τ_u^h = 0.035, τ_v^h = 0.65, τ_u^k = 0.055, τ_v^k = 0.45, ω_p^h = 0.5, and ω_s = 0.5. From top to bottom, time-series denote hip and ankle positions, and hip and knee motor commands. Right-hand graphs are close-ups on the two different regimes. The horizontal axis denotes time in milliseconds.
1. It confirms the recent observations by Newell and Vaillancourt (2001) that Bernstein’s frame-
work may be too narrow to account for coordination changes observed in motor learning (in
adults as well as in children) (see also Haehl et al., 2000; Ko et al., 2003). According to Ko et al.
(2003, p.48), “there is growing evidence that there may not be, as suggested by Bernstein, a
single pathway of change in the evolving patterns of coordination as a function of learning.” In-
stead, depending on the task, there can be either an increase or a decrease in (a) the number of
involved mechanical degrees of freedom, and (b) the dimension of the attractor dynamics of the
motor output (number of dynamical degrees of freedom). Newell and van Emmerik (1989), for
example, found no evidence of the freeing of the distal arm segments in the learning of signature
writing, even though McDonald et al. (1989) found evidence of a release of the most distal wrist
segment in learning a dart-throwing task with the non-dominant arm but only after several days
of practice. Newell and Vaillancourt (2001) also report that while open chain linkages, such as
arms and legs, are more prone to exhibit a proximal to distal direction to the recruiting of the
biomechanical degrees of freedom, this pathway of change is only due to particular task con-
straints and may not be a general learning strategy. This interpretation is supported by Haehl
et al. (2000)’s study on infants learning to cruise (walking with support). This study showed that
infants displayed an initial poorly controlled exploratory phase – “wobbling” phase – character-
ized by a large number of movement reversals (i.e., dynamical degrees of freedom).
2. It provides empirical evidence suggesting that perturbations which push the system outside the
limits of its postural stability, or which increase the complexity of the task may be the triggering
mechanism for alternate freezing and freeing of degrees of freedom. As with Newell and Vail-
lancourt (2001), this study does not allow us to further speculate on (a) the factors responsible
for the multiple pathways of change observed in the learning of motor coordination (besides
task-dependence, and confluence of constraints in action), and (b) how those factors combine
with the neural dynamics to implement those changes. However, we believe that it provides
opportunities to further investigate the issue of increased task complexity and task constraints.
In Chapter 5, for instance, we will report on a study investigating robot-bouncing, which took
inspiration from a longitudinal study by Goldfield et al. (1993) on infants’ bouncing in a Jolly
Jumper, i.e., a harness hung from the ceiling by springs or rubber bands. Despite the preliminary
nature of the results, the claims made here seem to be substantiated (Lungarella and Berthouze,
2003).
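The notion of movement reversals invoked above can be made concrete with a short computation. The following Python sketch counts direction reversals (sign changes of the finite-difference velocity) in a one-dimensional position trace; it is only an illustrative proxy for the measure used by Haehl et al. (2000), and real motion-capture data would additionally require filtering and a velocity threshold.

```python
import numpy as np

def count_movement_reversals(position, dt=0.01):
    """Count direction reversals (sign changes of the finite-difference
    velocity) in a 1-D position trace -- a crude proxy for the reversal
    counts used to characterize the 'wobbling' phase."""
    velocity = np.diff(position) / dt
    signs = np.sign(velocity)
    signs = signs[signs != 0]              # drop exactly-zero samples
    return int(np.sum(signs[1:] != signs[:-1]))

# A smooth 1 Hz sine over 2 s reverses direction at every peak and trough.
t = np.linspace(0, 2, 400)
smooth = np.sin(2 * np.pi * 1.0 * t)
rng = np.random.default_rng(0)
noisy = smooth + 0.05 * rng.standard_normal(t.shape)   # jittery 'wobbling' trace

smooth_reversals = count_movement_reversals(smooth)    # 4 (two per cycle)
noisy_reversals = count_movement_reversals(noisy)      # many more
```

On a smooth trajectory the count tracks the number of turning points; superimposed jitter inflates it sharply, which is the intuition behind using reversals as a variability measure.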
This study points to two challenges to be addressed in the future: The first one relates to the proper
characterization or description of the multiple pathways of change observed during the learning of mo-
tor patterns in a given task. Taking a biomechanical stance, we could quantify the motor activity in
terms of “biomechanical” degrees of freedom, i.e., the change over time of the number of joints or mus-
cles responsible for the particular coordination strategy employed to accomplish the task. A dynamical
systems perspective, on the other hand, would refer to the dynamical or “active degrees of freedom”
that correspond to the geometric layout of the attractor dynamics. In the case of simple patterns of
coordination, such as the one in our initial study, it may be justified to attribute to a single variable,
e.g., the relative phase between limbs, the role of “order parameter”, or “collective variable” (Kelso,
1995). Even then, however, the motion of a single joint can yield a dimension greater than one. As for
whole body action, we have little understanding of the number or the nature of dimensions that capture
the collective organization of the system (Newell and Vaillancourt, 2001). Thus the matching of those
two dimensions (biomechanical and dynamical) is a major challenge.
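For the simple coordination patterns mentioned above, the candidate collective variable – the continuous relative phase between two joints – can be estimated from the analytic signal. The sketch below is a hypothetical illustration rather than an analysis from our studies: it constructs the analytic signal with an FFT-based discrete Hilbert transform (the same construction used by scipy.signal.hilbert) and recovers an imposed 90-degree lag between two synthetic joint trajectories.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based discrete Hilbert transform (analytic signal) of a real series."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)

def relative_phase(x, y):
    """Pointwise relative phase (radians, in (-pi, pi]) between two signals."""
    return np.angle(analytic_signal(x) * np.conj(analytic_signal(y)))

# Two synthetic joint trajectories: 'knee' lags 'hip' by 90 degrees.
t = np.linspace(0, 10, 2000, endpoint=False)       # 10 s sampled at 200 Hz
hip = np.sin(2 * np.pi * 1.0 * t)
knee = np.sin(2 * np.pi * 1.0 * t - np.pi / 2)
lag_deg = np.degrees(np.mean(relative_phase(hip, knee)))   # ~90 degrees
```

A relative phase that stays locked at a constant value corresponds to the rigid coupling discussed in the text; drift or switching of this variable is one operational signature of freeing degrees of freedom.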
The second point is closely related to the first one and concerns the tight interaction between neural
dynamics and bodily activity. In our two studies, we intentionally focused on the role of physical
(morphological) changes for a fixed control parameter setting (i.e., a given neural organization). Al-
though this step was useful – it helped us demonstrate experimentally that such changes represent an
adaptive mechanism in their own right – it lacked biological plausibility, causing the relatively poor
performance obtained in the face of strong perturbations. In reality, neural dynamics entrains to physical
dynamics (as shown by control synergy, for example) and control re-organization occurs as a result of
learning. In this respect, the choice of Matsuoka oscillators is arguable. This type of oscillator has been
shown to have poor characteristics when feedback-induced delay increases above a certain value (see
Taga, 1994, for instance). We hypothesize that, in this study, the nonlinear coupling may have intro-
duced a significant feedback-delay, which in turn resulted in the failure to entrain. Asymptotically-
stable limit-cycle oscillators with physiologically plausible characteristics, e.g., the Bonhoeffer-Van
der Pol model (Fitzhugh, 1961), are possible alternatives to Matsuoka’s model, because they exhibit
“flexible phase-locking”, i.e., they show greater flexibility in changing their relative phase to respond
to incoming entraining actions, even in the presence of strong delays (Ohgane et al., 2004).
Eventually, we would like to briefly comment on an important issue, namely that of the difference
between exploration and learning. Is this case-study about learning, or is it simply about the exploration
of the sensorimotor space during pendulation? In what way does it relate to development? In our
framework, exploration is a key component of task acquisition. Exploration produces the diversity of
sensorimotor trajectories (instances of task executions) which higher brain systems can subsequently
select, and exploit to realize learning, for example, in the form of consolidation of a parameter in
motor memory, or to train forward models (e.g. Wolpert et al., 2003). With a few exceptions, most
motor tasks require practice before optimal performance is achieved, and in young infants – at a stage
when they have not acquired many primitive motor behaviors on top of which to build more complex
skills – the role of exploration is critical. The use of value-based learning algorithms, such as the
ones discussed in this chapter, but also in Chapter 3 and in Chapter 6, implements a first step toward
learning, as exploration is driven by value, i.e., task performance. In the original study, we showed how
such value-driven exploration led to a quick convergence to a stable motor behavior. Thus, exploration
should be seen as an adaptive (plastic) mechanism in its own right, although it acts on a different
ontogenetic time-scale than that of learning or development.
Chapter 5
On the Synergy Between Neural and
Body-Environment Dynamics1
An outstanding property of the nervous system is that it is self-organizing, i.e., in contact with a
new environment the nervous system tends to develop that internal organization which leads to
behavior adapted to that environment. (Ashby, 1947)
5.1 Synopsis
The study of how infants strapped in a Jolly Jumper learn to bounce can help clarify how they explore
different ways of exploiting the dynamics of their movements. In this chapter, we describe and discuss a
set of preliminary experiments performed with a bouncing humanoid robot and aimed at instantiating
a few computational principles thought to underlie the development of motor skills. Our experiments
show that a suitable choice of the coupling constants between hip, knee, and ankle joints, as well as
of the strength of the sensory feedback, induces a reduction of movement variability, and leads to an
increase in bouncing amplitude and movement stability. This result is attributed to the synergy between
neural and body-environment dynamics.
5.2 Introduction
Despite the availability of many descriptive accounts of infant development, modeling how motor
abilities unfold over time has proven to be a hard problem (Goldfield, 1995; Sporns and Edelman, 1993;
1To appear as: Lungarella, M. and Berthouze, L. (2004). Robot bouncing: on the synergy between neural and body-environment dynamics. In Iida, F., Pfeifer, R., Steels, L. and Kuniyoshi, Y. (eds.), Embodied Artificial Intelligence. Berlin: Springer-Verlag.
Thelen and Smith, 1994). Existing models are based on general principles and specific mechanisms
which are assumed to underlie the changes in early motor development.
One such mechanism is self-exploration through spontaneous activity. An important precursor
of later motor control (Forssberg, 1999; Piek, 2001; Thelen and Fischer, 1983), its main role seems
to be the exploration of various musculo-skeletal organizations in the context of multiple constraints
such as environment, task, architecture of nervous system, muscle strength, masses of the limbs, and
so on. A growing number of developmental psychologists has started to advocate the view that self-
exploration through spontaneous movements helps infants bootstrap new forms of motor activity, as
well as discover more effective ways of exploiting the dynamics generated by their bodily activities
(Angulo-Kinzler, 2001; Goldfield, 1995; Schneider et al., 1990; Sporns and Edelman, 1993; Von Hof-
sten, 1993). It has been suggested that through movements that garner information specific to stable
regions in the high-dimensional space of possible motor activations, self-exploration can lead to a
state of awareness about body and environment (Goldfield, 1995). In fact, fetuses (as early as 8 to 10
weeks after conception) as well as newborn infants display a large variety of transient and spontaneous
movement patterns such as infant stepping and kicking (Thelen and Smith, 1994), spontaneous arm
movements (Piek and Carman, 1994), general movements and sucking movements (Prechtl, 1997). In-
fants probably learn about their body by performing movements over and over again, and by exploiting
the continuous flow of sensory information from multiple sensory modalities. In doing so, they ex-
plore, discover, and eventually select – among the myriad of available solutions – those that are more
adaptive and effective (Angulo-Kinzler, 2001).
The control of exploratory movements has been traditionally attributed to neural mechanisms
alone. Prechtl (1997), for instance, linked the production and regulation of spontaneous motility in in-
fancy “exclusively” to endogenous neural mechanisms, such as central pattern generators. This claim
is somewhat substantiated by the fact that in many vertebrate species, central pattern generators appear
to generate the rhythm and form of the bursts of motoneurons (Grillner, 1985), or to govern innate
movement behaviors altogether (Forssberg, 1999).
In the last two decades, however, new evidence has pushed forward an alternative and multi-causal
explanation theoretically grounded in dynamic systems theory (Thelen and Smith, 1994). Accord-
ing to this view, coordinated motor behavior is also the result of a tight coupling between the neural
and biomechanical aspects of movement, and the environmental context in which the movement oc-
curs (Goldfield, 1995; Kelso, 1995; Taga, 1995; Thelen and Smith, 1994). Spontaneous movements
are not mere random movements, but are organized (or better, they self-organize), right from the very
start, into recognizable patterns involving various parts of the body, such as head, trunk, arms, and
legs. Spontaneous kicks in the first few months of life, for instance, appear to be particularly well-
coordinated movements characterized by a tight coupling (Thelen and Fischer, 1983; Thelen and Smith,
1994), and by short phase lags between the hip, knee and ankle joints (Piek, 2001). Rigid phase-locked
movements can be interpreted as a “freezing” of a number of degrees of freedom that must be controlled
by the nervous system, thus resulting in a reduction of the movement variability and complexity, and
in a faster learning process (Bernstein, 1967; Turvey and Fitzpatrick, 1993). During development, the
strong synchrony is weakened, and the degrees of freedom are gradually “released” (Piek, 2001; Thelen
and Fischer, 1983; Vaal et al., 2001). The ability to change the patterns of coordination between vari-
ous joints to accomplish a task is an important aspect of infants’ motor development (Angulo-Kinzler,
2001). It has been shown that tight interjoint coupling persisting beyond the first few months of life
may lead to poor motor development, or may even be associated with abnormal development (Vaal
et al., 2001).
In a previous paper, we examined the effects of “freezing and freeing of degrees of freedom” (Lun-
garella and Berthouze, 2002c) in a swinging biped robot. The study showed that by freezing (that is,
rigidly coupling) and by subsequently freeing the mechanical degrees of freedom, the sensorimotor
space was more efficiently explored, and the likelihood of a mutual regulation of body-environment
and neural dynamics (that is, entrainment) was increased. The aim of this chapter is to further our
understanding of the role played by the coupling (a) between joints, and (b) between the sensory ap-
paratus and the neural structure for the acquisition of motor skills. To achieve this goal, we embedded
a pattern generating neural structure in a biped robot, and by manually altering various coupling con-
stants, we systematically studied their interaction with the body-environment dynamics in the context
of a real task (bouncing).
5.3 Hypotheses on infant bouncing learning
Goldfield et al. (1993) performed a longitudinal study in which eight six-month-old infants strapped
in a “Jolly Jumper” (i.e., a harness attached to a spring, see Fig. 5.1) were observed once a week, for a
period of several weeks, while learning to bounce. They concluded that in the course of learning, the
infants’ motor activity could be decomposed into an initial “assembly phase”, during which kicking
was irregular and variable in period, followed by a “tuning phase” characterized by bursts of more
periodic kicking and long bouts of sustained bouncing, during which infants seemed to refine and
adapt the movement to the particular conditions of the task. A third phase was initiated by a sudden
doubling of the bout length, and was characterized by oscillations of the mass-spring system at its
resonant frequency, a marked rise in amplitude, and a decrease of the variability of the period of the
oscillations.
From this study, we derive a few principles. First, there is no need to postulate a set of prepro-
grammed instructions or predefined motor behaviors. It is by means of a process of self-organization
and self-discovery, and through various spontaneous (seemingly random) movements that infants ex-
plored their action space and eventually discovered that kicks against the floor had “interesting” con-
Figure 5.1: Infant strapped in a Jolly Jumper.
sequences (Goldfield et al., 1993). After an initial exploratory phase (assembly), the infants selected
particular behaviors and began to exploit the physical characteristics of the mass-spring system. Gold-
field and collaborators advanced the hypothesis that, in general, infants learning a task may try out
different musculo-skeletal organizations by exploring the corresponding parameter space, driven by
the dynamics of the task as well as by the existing repertoire of skills and reflexes.
Second, to achieve effective and continuous bouncing, i.e., bouncing characterized by simultaneous
leg extensions, the infants had to learn patterns of intersegmental coordination. Thus, the infants had
to explore different force and timing combinations for the control of their movements, and to integrate
the environmental information impinging on various sensory modalities, i.e., visual, vestibular, and
cutaneous. Unfortunately, the study performed by Goldfield et al. did not provide any kinematic or
kinetic analysis of the development of the infants’ movement patterns. In line with the findings reported
in (Lungarella and Berthouze, 2002c; Thelen and Fischer, 1983; Vaal et al., 2001), we hypothesize
5.4. Experimental setup 112
that in order to reduce movement complexity, the initial movements had to be performed under tight
intersegmental coupling. As development and learning progressed, the couplings were weakened, and
more complex movement patterns could be explored. Thelen and colleagues put forward evidence
showing that in infants the loosening of the tight joint coupling may not necessarily be a consequence
of learning alone (see Thelen and Smith, 1994, for instance).
Third, the rhythmic nature of the task (bouncing) can be interpreted as a particular instance of
the Piagetian circular reaction2. Rhythmic (not necessarily task-oriented) activity is highly characteristic
of emerging skills during the first year of life. Thelen and Smith suggested that oscillatory movements
are the by-product of a motor system under emergent control, that is, when infants are in the process of
attaining some degree of intentional control of their limbs or body postures, but when their movements
are not fully goal-corrected (Thelen and Smith, 1994).
Finally, this study highlighted the necessity of a value system to evaluate the consequences of the
movements performed, and to drive the exploratory process. Value systems are known to mediate
plasticity and to modulate learning in an unsupervised and self-organized manner, allowing organisms
to be adaptive, and to learn on their own via self-generated and spontaneous activity. They also
create the necessary conditions for the self-organization of dynamic sensory-motor categories, that is,
movement patterns.
5.4 Experimental setup
To test our computational hypotheses, we decided to replicate Goldfield et al.’s experiments using a
small-sized humanoid robot with 12 mechanical degrees of freedom (Fig. 5.2). The robot was sus-
pended in a leather harness attached to two springs. Each leg of the robot had three segments (thigh,
shank, and foot) and five joints, but only three of the latter (i.e., hip, knee and ankle) were used. Each
joint was actuated by a high-torque RC-servo module. These modules are high-gain positional open-
loop control devices and do not provide any feedback on the position of the corresponding joint. In fact,
there was no need to measure the anatomical angles of hip, knee and ankle, since these values were
available as the set positions of the RC-servo modules. Exteroceptive and proprioceptive information
were also taken into account. Ground reaction forces were measured by means of force sensitive re-
sistors placed under the feet of the robot (two per foot). To reduce impact forces in the joints of the
robot and to add some passive compliance, the soles of the robot’s feet were covered with soft rubber.
Torsional movements around the z-axis were measured with a single-axis solid-state gyroscope. Linear
accelerations in the sagittal plane were estimated by a dual-axis accelerometer (Fig. 5.2 right).
2Circular reactions represent an essential sensorimotor stage of Piaget's developmental schedule (Piaget, 1953); they refer to the repetition of an activity in which the body starts in one configuration, goes through a series of intermediate stages, and eventually returns to the initial configuration.
Figure 5.2: Left: Humanoid robot used in our experiments. Right: Schematic representation of the robotic setup: two springs (spring constants k1, k2, damping coefficients b1, b2), a 2-axis accelerometer (measuring ax and az), a 1-axis solid-state gyro, and force sensitive resistors under the feet; X, Y, Z denote the body-fixed axes.
5.4.1 Neural rhythm generator
Figure 5.3 (left) depicts a schematic representation of the neuro-musculo-skeletal system inspired
by Taga (1995). The neural rhythm generator or central pattern generator (Grillner, 1985) was con-
structed by using six neural oscillators, each of which was responsible for a single joint (Fig. 5.3 right).
We modeled the individual neural oscillators according to the following set of nonlinear differential
equations (Matsuoka, 1985):

τu u̇f = −uf − β vf − ωc g(ue) − ωp g(Feed) + te
τu u̇e = −ue − β ve − ωc g(uf) − ωp g(−Feed) + te
τv v̇f = −vf + g(uf)
τv v̇e = −ve + g(ue)
yout = uf − ue
where ue and uf are the inner states of neurons e (extensor) and f (flexor), and ve and vf are variables
representing the degree of adaptation or self-inhibition of the extensor and flexor neurons. The external
tonic excitation signal te determines the amplitude of the oscillation. β is an adaptation constant, ωc is
a coupling constant controlling the mutual inhibition of neurons e and f, and τu and τv are time constants
that determine the strength of the adaptation effect. The operator g(x) = max(0, x) returns the positive
Figure 5.3: Left: Basic structure of the neuro-musculo-skeletal system. The arrows in the model show the information flow (tonic input, motor output, sensory input; a value system modulates parameter exploration). Right: Neural rhythm generator composed of six neural oscillators, one per joint (hip, knee, and ankle on each side). The solid circles represent inhibitory connections, and the half-circles excitatory connections. Abbreviations: he = hip extensor, hf = hip flexor, ke = knee extensor, kf = knee flexor, ae = ankle extensor, af = ankle flexor. Not shown are proprioceptive feedback connections and tonic excitations.
part of x. The difference between the outputs of the extensor and flexor neurons of each unit oscillator was
fed to a pulse generator. Its output yout was the angle of the RC-servo associated with the correspond-
ing unit oscillator. Sensory feedback to the pattern generator, Feed, occurred through the four pressure
sensors located under the robot's feet. The value of the afferent feedback was computed as the sum of
the sensed ground reaction forces, weighted by the variable ωp. Appropriate joint synergies among ip-
silateral joints, i.e., appropriate phase relationships between the corresponding neural oscillators, were
produced by feeding the flexor unit of one oscillator with a combination of the output of the extensor
and flexor units of the other oscillator. As shown by Fig. 5.3, reciprocal inhibitory connections between
corresponding flexor and extensor neurons of the left and right hip joint were also implemented.
5.4.2 Selection of the neural control parameters
The adaptation constant β and the degree of mutual inhibition between extensor and flexor neuron of
a single neural oscillator were fixed throughout the whole study to β = 2.5 and ωc = 1.0. The tonic
excitation was fixed to te = 1.0, and the intersegmental coupling constant to ωs = 0.75. The high value
of the latter constant induced kicking patterns with a tight joint coupling. According to Williamson
(1998), the time constants τu and τv determine the shape and the speed of the oscillator output. In
order to guarantee stable oscillations, the ratio r = τu/τv should be kept in the interval [0.1, 0.5]. In all
experiments, we fixed the ratio r to 0.5. The sensory feedback coefficient ωp was variable, and was set
as specified in each sub-section.
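The dynamics of a single unit oscillator under the above equations and parameter values can be sketched with simple Euler integration. One caveat: under a standard linearization, an isolated flexor–extensor pair does not oscillate unless the mutual inhibition exceeds roughly 1 + τu/τv, so a lone unit with ωc = 1.0 settles to equilibrium; in the robot, oscillation is sustained through intersegmental coupling and sensory feedback. For a self-contained illustration the sketch therefore sets ωc = 2.0 (between 1 + r and 1 + β) – an assumption of this demo, not a parameter of the study – and zeroes the feedback term.

```python
import numpy as np

def g(x):
    return max(0.0, x)

def matsuoka_unit(tau_u=0.108, tau_v=0.216, beta=2.5, w_c=2.0,
                  t_e=1.0, w_p=0.0, feed=0.0, dt=0.001, duration=10.0):
    """Euler integration of one flexor/extensor Matsuoka unit oscillator.
    w_c = 2.0 here (not the robot's 1.0) so that the isolated unit oscillates."""
    u_f, u_e, v_f, v_e = 0.1, 0.0, 0.0, 0.0   # small asymmetry breaks symmetry
    y = np.empty(int(duration / dt))
    for i in range(len(y)):
        du_f = (-u_f - beta * v_f - w_c * g(u_e) - w_p * g(feed) + t_e) / tau_u
        du_e = (-u_e - beta * v_e - w_c * g(u_f) - w_p * g(-feed) + t_e) / tau_u
        dv_f = (-v_f + g(u_f)) / tau_v
        dv_e = (-v_e + g(u_e)) / tau_v
        u_f += dt * du_f
        u_e += dt * du_e
        v_f += dt * dv_f
        v_e += dt * dv_e
        y[i] = u_f - u_e              # y_out: drives the servo angle
    return y

y = matsuoka_unit()
tail = y[len(y) // 2:]                # discard the initial transient
peak_to_peak = float(tail.max() - tail.min())
zero_crossings = int(np.sum(np.sign(tail[1:]) != np.sign(tail[:-1])))
```

After the transient, the output settles onto a limit cycle that alternates between flexor- and extensor-dominated phases; the adaptation variables v bound its amplitude.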
5.5 Experiments and discussion
To model and analyze our experimental results, we assumed an ideal mass-spring-damper system. This
model represents a first attempt to identify a relationship between oscillation frequency, amplitude of
the oscillation, and other parameters. The differential equation governing the free oscillation of the
mass-spring-damper system is mx(t) + b x(t) + k x(t) = 0. In our case, m is the mass of the robot, b is
the damping coefficient of the spring and k its spring constant. The equation has solutions of the form:
x(t) = Ae−bt/2m cos(ωd t + φ), where A (amplitude of the oscillation) and φ (phase) are determined by
the initial displacement and velocity of the robot. ωn =√
k/m is defined as the undamped natural
frequency of the mass-spring-damper system and ωd =√
ω2n− (b/2m)2 < ωn is its damped natural
frequency. The mass of the robot (fixed throughout all experiments) was m = 1.33 kg. The estimated
spring constant was k1 = k2 = 25.5 N/m, and the damping coefficient was b = 0.065 kg/s for both
springs (Fig. 5.2 left). For the computation of b, we assumed a viscous frictional force, proportional to
the velocity of the oscillation.
In all experiments, we recorded the system’s movements by tracking the position (relative to an
earth-fixed frame of reference) of colored markers placed on the robot’s hip, knee and ankle. The ex-
periments were organized according to the complexity of their environmental interaction (with/without
ground contact, with/without sensory feedback).
5.5.1 Scenario 1 – Free oscillations
This scenario served to assess the basic properties of the real system and of the corresponding mass-
spring-damper model needed to characterize oscillatory behaviors (and to establish the presence of entrain-
ment). The robot’s joints were not actuated, and the robot was set so that its feet could not touch the
ground no matter the amplitude of the vertical oscillations. At the onset of the experiment, the robot
was lifted by an arbitrarily chosen height, and then let oscillate freely. The resulting motion was har-
monic and underdamped, with an exponentially decreasing amplitude of the form e^(−αt) sin(2πt/T), a
decay coefficient α = 0.124/s, and a period T = 1.01 s. Hence, the resonance frequency of the
system could be estimated to be fR = 1/T = 0.99 Hz ≈ ωd/2π. The effective spring constant of the
system was Keff = 50.5 N/m, which is almost twice the spring constant of each spring. From our mea-
surements, we estimated the effective damping coefficient to be approximately Beff = 0.33 N·s/m.
Note that Beff is not twice the damping coefficient of a single linear spring, as might be inferred from the
value of Keff. This clearly shows that the system is not a close-to-ideal mass-spring system, and that
a more rigorous approach would have to consider a better model for the damping force. For instance,
viscous frictional forces proportional to the square of the velocity of the mass should be taken into
account.
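These estimates can be cross-checked against the underdamped model: the fitted decay coefficient gives Beff = 2mα, and the damped frequency gives Keff = m(ωd² + α²). The sketch below performs this check with the reported measurements; the small residual between the computed Keff (≈ 51.5 N/m) and the reported 50.5 N/m is within the fitting error one would expect.

```python
import math

m = 1.33            # robot mass [kg]
alpha = 0.124       # fitted decay coefficient [1/s]
T = 1.01            # measured period of the free oscillation [s]

# Underdamped solution x(t) = A exp(-alpha t) cos(omega_d t + phi), with
# alpha = B_eff / (2 m) and omega_d**2 = K_eff / m - alpha**2.
omega_d = 2.0 * math.pi / T
B_eff = 2.0 * m * alpha                   # ~0.33 N s/m, as reported
K_eff = m * (omega_d ** 2 + alpha ** 2)   # ~51.5 N/m, close to the reported 50.5 N/m
f_R = 1.0 / T                             # ~0.99 Hz resonance estimate
```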
5.5.2 Scenario 2 – Forced oscillations without ground contact
In this experiment, the robot's joints were actuated such that the equation describing the motion of
the robot was m ẍ(t) + b ẋ(t) + k x(t) = F(t), where the driving force F(t) is a function of the parameter
settings of the neural oscillators and of the amplitude of the robot's limb movements – as suggested
by Goldfield et al. (1993). In other words, the movement of the robot can be modeled as a forced
mass-spring system, with the robot’s kicking movements representing the driving force. As in scenario
1, the robot could not reach the ground with its feet. After an initial transient, the system converged
to a steady state, a forced harmonic oscillation. Vertical resonance was achieved for the parameter
setting (τu = 0.108,τv = 0.216), and resulted in an average vertical displacement from the rest position
of 10.6 cm, and peak displacements exceeding 17 cm. The dominant frequency of the oscillation,
estimated via a spectral analysis of the vertical component of the hip marker position, was fHip =
1.01 Hz, which was very close to the previously estimated resonant frequency of the system, fR =
0.99 Hz. Interestingly, the system displayed at least three oscillatory modes. This behavior is akin to
spontaneous activity in infants, who enter preferred stable states and exhibit abrupt phase transitions
between states (Goldfield, 1995). Parameter settings close to (τu,τv) = (0.066,0.132) led to a strong
horizontal oscillatory motion, whereas for τu > 0.150 and τv > 0.300, there was an evident torsional
movement. For τu < 0.06, vertical oscillations were essentially nonexistent.
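The dominant-frequency estimates quoted in this section can be reproduced with a straightforward magnitude-spectrum peak search. In the sketch below, the sampling rate and the synthetic trace standing in for the hip marker data are assumptions for illustration only.

```python
import numpy as np

def dominant_frequency(x, fs):
    """Frequency [Hz] of the largest peak in the magnitude spectrum of x
    (mean removed before the FFT)."""
    x = np.asarray(x) - np.mean(x)
    magnitude = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return float(freqs[np.argmax(magnitude)])

# Synthetic stand-in for the hip marker's vertical displacement: a 1.01 Hz
# fundamental plus a weaker second harmonic (sampling rate is assumed).
fs = 50.0                                   # [Hz]
t = np.arange(0.0, 40.0, 1.0 / fs)          # 40 s record -> 0.025 Hz resolution
hip = 10.6 * np.sin(2 * np.pi * 1.01 * t) + 1.5 * np.sin(2 * np.pi * 2.02 * t)
f_hip = dominant_frequency(hip, fs)         # nearest FFT bin to 1.01 Hz
```

The resolution of the estimate is the reciprocal of the record length, so distinguishing, say, 0.99 Hz from 1.01 Hz requires recordings of a minute or more.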
5.5.3 Scenario 3 – Forced oscillations with ground contact (ωp = 0)
The goal of this set of experiments was to assess the effect of ground contact on the oscillatory move-
ment observed in scenario 2, in the absence of afferent feedback from the touch sensors (i.e., ωp = 0).
At the onset of each experimental run, we made sure that the robot’s feet could touch the ground. To
correct for the lack of compliance in the robot’s joints, the ground was covered with soft material.
The introduction of this additional nonlinear perturbation led (given appropriate neural control param-
eters) to the emergence of a new behavior: bouncing. Figure 5.4 shows the result of three different
parameter configurations. A suitable model of the movement of the robot’s center of mass needs also
to take into account the nonlinear interaction with the ground, and the stiffness and damping charac-
teristics of the floor and the feet. We propose the following linear model (see also Goldfield et al.,
1993): m ẍ(t) + Beff ẋ(t) + Keff x(t) = F(t), where F(t) = 0 when the feet are off the ground and
F(t) = F0 − F0 sin(2π f t), F0 > 0, when the feet are on the ground, with Keff (effective spring constant)
and Beff (effective damping coefficient) incorporating the effect of springs, feet, and floor.
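A minimal Euler integration of this piecewise-forced model is sketched below. The kick strength F0, the kick frequency f, the contact threshold, and the initial displacement are assumptions (they are not specified in the text); whether the simulated bouncing is sustained or collapses depends on how the kick phase relates to the contact timing – precisely the entrainment question taken up in the next scenario.

```python
import math

# Piecewise-forced model: m x'' + B_eff x' + K_eff x = F(t), with
# F(t) = F0 - F0 sin(2 pi f t) while the feet touch the ground, else 0.
m, B_eff, K_eff = 1.33, 0.33, 50.5      # values estimated in scenario 1
F0, f = 4.0, 1.0                        # assumed kick strength [N] and rate [Hz]
x_ground = -0.02                        # assumed contact threshold [m]
dt, duration = 0.001, 10.0

x, v = -0.05, 0.0                       # start pushed down, in ground contact
peaks = []                              # heights of successive local maxima
prev_v = v
for i in range(int(duration / dt)):
    t = i * dt
    F = F0 - F0 * math.sin(2.0 * math.pi * f * t) if x < x_ground else 0.0
    a = (F - B_eff * v - K_eff * x) / m   # Newton's second law
    v += dt * a                           # semi-implicit Euler step
    x += dt * v
    if prev_v > 0.0 >= v:                 # top of a bounce
        peaks.append(x)
    prev_v = v
```

Tracking the sequence of peak heights is a convenient way to classify a run as sustained, growing, or collapsing bouncing.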
Figure 5.4: Forced harmonic oscillations with ground contact (bouncing) in the absence of sensory feedback (ωp = 0). Top: τu = 0.108, τv = 0.216 and τu = 0.140, τv = 0.280; bottom: τu = 0.114, τv = 0.228 (phase plot of ankle vs. hip displacement on the right). In all graphs, the three curves represent the vertical displacement of the ankle, knee, and hip markers in cm.
5.5.4 Scenario 4 – Forced oscillations with ground contact (ωp > 0)
Afferent sensory feedback and contact with the ground induced a “haptic closure” of the sensory-
motor loop, which turned the linear and externally driven mass-spring system of experiments 2 and
3 into an autonomous limit-cycle system with the intrinsic timing determined by the moment of foot
contact with the ground and by the gain of the feedback connection ωp. In other words, the kicking
frequency (implicitly timed by the neural oscillators) and its phase relationship with the bouncing
was regulated by haptic information, and resulted in entrainment between time of ground contact and
period of the neural oscillators. A positive ωp had at least two advantages: (a) it led to a stabilized
and sustained bouncing, and (b) to an increase of its amplitude (measured as the difference between
successive maxima and minima of the vertical displacement). These effects are visualized in Figure 5.5
top-left, in which the parameters were (τu,τv) = (0.114,0.228) and ωp = 0.5. The phase plot of the
same time series is depicted in Figure 5.5 (top-right). The phase plots in figures 5.4 and 5.5 clearly
Figure 5.5: Forced harmonic oscillations with ground contact (bouncing) in the presence of sensory feedback (ωp > 0). Top row: ωp = 0.5, τu = 0.114, τv = 0.228; bottom row: ωp = 0.75, τu = 0.140, τv = 0.280 (phase plots of ankle vs. hip displacement on the right).
demonstrate the stabilizing effects of sensory feedback. In Fig. 5.5 (bottom-right), the parameters were
(τu = 0.140, τv = 0.280) and ωp = 0.75, and the bouncing was stable and sustained. For ωp = 0,
however, the bouncing suddenly collapsed and exhibited more variability (Fig. 5.4 top-right).
The influence of sensory feedback on the bouncing amplitude is evident by comparing Fig. 5.4
(bottom) with Fig. 5.5 (top). In the latter case, the maximum vertical displacement of the hip relative
to the initial position of the ankle marker was 27.3 cm, and its maximum vertical displacement relative
to the initial position of the hip marker was 4.4 cm. The dominant frequency of the vertical oscillation
(determined via a spectral analysis of the hip marker) was fHip = 0.93 Hz, whereas fHip = 0.95 Hz
for the same parameter configuration but with ωp = 0. Thus, the sensory feedback also affected the
frequency of the oscillation. After a short initial transient, the robot settled into a stable oscillatory
movement but did not bounce.
In this scenario, the model is more complicated and has to take into account the change of phase
and timing due to the sensory feedback. This is realized by introducing a new variable φ such that
m ẍ(t) + Beff ẋ(t) + Keff x(t) = F(t, φ).
5.6 Discussion and conclusion
The question of how sensory feedback interacts with the central pattern generator is still open (Taga,
1995). As a demonstration that sensory feedback is not necessary for the generation and coordination
of rhythmic activity, experiments in completely isolated spinal cords and in deafferented animals (i.e.,
without sensory feedback) have shown that the patterns generated by these types of structures are very
similar to those recorded in intact animals (Ijspeert, 2003). What emerged from our study is that a
suitable choice of the intersegmental coupling constant, as well as of the gain of the sensory feedback
reduces movement variability, increases bouncing amplitude, and leads to stability. We attribute this
result to the entrainment of neural and body-environment interaction dynamics. In other words, the
neural system of our model is designed to produce a basic pattern of muscle activation established
not only by the connections between the neural oscillators, but also by the input of sensory signals
representing body movements and the coupling with the environment. Through a recurrent interaction
in the sensorimotor loop, the variability and instability of the movements are stabilized into a limit
cycle. In the sense that such a coupling produces an effect greater than the sum of the individual
components, it is a synergistic coupling. A similar finding, in the case of biped walking, was reported
by Taga (1995).
It has been suggested that the developmental transformation of spontaneous motor activity into
task-specific movements consists of two phases, called the assembly and the tuning phase (Gold-
field et al., 1993). While assembly refers to the self-organization of relationships between the com-
ponents of the system, tuning is concerned with the adaptation of the system parameters to particular
conditions. In this chapter, we have primarily focused on the tuning phase by making the premise that
the assembly phase results in a positive intersegmental coupling between hip, knee and ankle. It is in-
teresting to consider the issue of the mechanisms underlying the assembly phase. Although bouncing
is intrinsically a rhythmic activity for which central pattern generators represent suitable neural struc-
tures, there is no evidence that newborn infants move their limbs in a manner consistent with the output
of central pattern generators, and indeed, sporadic kicking movements are more plausible candidates.
Given that neural oscillators are usually modeled as a set of mutually inhibitory neurons, the assembly
phase could be a process during which the topology of a vanilla-type cell assembly changes, driven by
feedback from the environment, and by a value system (based on the amplitude of the oscillations, for
instance).
With respect to the tuning phase, there is still much to do. In some sense, tuning refers to the non-stationary regime that occurs before the stabilization of movement patterns. In other words, it is the by-
product of the entrainment between neural control structure and environment – when sensory feedback
turns the system into an autonomous limit-cycle system. At a lower level of control, tuning could
also be implemented as changes in gain or time-constants of the neural oscillators. An autonomous
implementation of such parameter tuning could be realized via a mechanism of Boltzmann exploration
driven by a value system (Fig. 5.2 right). The author has successfully used this combination in a
pendulating humanoid robot (Lungarella and Berthouze, 2002c).
Yet, all this may not be sufficient to hypothesize a valid model of child motor development, as there
is evidence that kicking behaviors display spatio-temporal patterns. In particular, Taga et al. (1999) re-
cently discussed the chaotic dynamics of spontaneous movements in human infants. Thus, formulating
the development of those skills in a dynamical systems framework would be highly desirable so that
an appropriate set of adaptive mechanisms could be implemented and tested against human data.
Chapter 6
Value-based stochastic exploration
6.1 Synopsis
This chapter is about the principle of exploratory activity, which asserts that “exploratory activity is
a fundamental process by which an agent collects information for learning about its own body and
control structure, and for mastering the interaction with its surrounding environment.” Because of its
relevance for all chapters of this thesis, we provide an in-depth view of this principle and motivate its
necessity from a developmental point of view (see also Chapter 2). The chapter’s main emphasis is on a
particular instantiation of the principle of exploratory activity: a value-dependent stochastic exploration
scheme that can be used to produce exploratory activity. In particular, the scheme is applied to the
online calibration of a set of PID (proportional-integral-derivative) controllers of a high-performance
robotic head.
6.2 Introduction
Every learning system faces a dilemma: it must explore its parameter spaces – e.g., the weights of an
artificial neural network, the time constants of a set of neural oscillators, or the strength of the coupling
between various limbs, or between body and environment – while simultaneously exploiting the good
parameter configurations that exploration has already uncovered. This trade-off is also known as the
exploration-exploitation dilemma. To solve it, learning systems typically resort to some kind of ad hoc
heuristic that varies across learning tasks and environments. A possible strategy, for instance, could
combine random and unbiased exploration, such as that of the Metropolis algorithm (Metropolis et al.,
1953), with a dynamic and gradual trade-off between exploration and exploitation known as Simulated
Annealing (Cerny, 1985).
The study presented in this chapter is situated in the context of value-based learning, i.e., a form of
learning in which an agent endowed with a value system produces its own reinforcement. Generally
speaking, the purpose of a value system is either to signal the agent's current behavioral state (e.g.,
arousal, sleep, waking), or to mediate environmental saliency, that is, to signal the occurrence of
relevant stimuli or events to the agent's neural system (e.g., novelty, pain, reward) by modulating its
activity and plasticity. To date, a number of explicit realizations of value-based learn-
ing schemes in robotic systems exist. In all those implementations value systems play either the role
of internal mediators of salient environmental stimuli or events (Almassy et al., 1998; Krichmar and
Edelman, 2002; Scheier and Lambrinos, 1996; Sporns and Alexander, 2002) or, what is more relevant
for this chapter, are used to guide some sort of exploratory process (Lungarella and Berthouze, 2002c).
In previous chapters (Chapters 3 and 4), a simple value system was described that was employed
to regulate the exploration of the parameter space associated with the control system of a robot whose
task was to learn to pendulate, i.e., to swing like a pendulum. In that particular case study, the robot’s
control system consisted of a set of neural oscillators, and the space explored was the space of the
corresponding neural parameters. The value system employed in that study was a function of the
maximum amplitude of the oscillation (evaluated within a given time window through markers placed
on the robot's body). The value at time t was given by

Vt = max{Vt−1 (1 − ε), |At|},

where |At| denotes the absolute value of the instantaneous amplitude of the oscillation. The term (1 − ε),
with 0 < ε ≪ 1, realized an exponential decay of the value signal when the oscillation amplitude was
smaller than the previously achieved maximum amplitude. The following exploration principle was adopted:
when a parameter setting yielded good performance, i.e., a high value Vt , the change of parameters
was slowed down, and hence nearby sets of parameters exploited. Conversely, when the settings led to
low-amplitude oscillations, a rapid and large change of parameters was triggered.
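This value update and its effect on the size of the exploration step can be sketched in Python (a minimal sketch; the function names and the assumption that value and amplitude are normalized to [0, 1] are ours, not taken from the original implementation):

```python
def update_value(v_prev, amplitude, eps=0.01):
    """Value update V_t = max{V_{t-1}(1 - eps), |A_t|}: the value decays
    exponentially unless the current oscillation amplitude exceeds it."""
    return max(v_prev * (1.0 - eps), abs(amplitude))

def step_size(value, scale=1.0):
    """Exploration principle: a high value slows down parameter change
    (exploitation); a low value triggers large, rapid changes (exploration)."""
    return scale * (1.0 - value)
```

With eps = 0.01, a run of low-amplitude oscillations lets the value decay by one percent per evaluation, progressively re-enabling large exploratory steps.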
In this chapter, we address a similar issue by generalizing the exploration process. We propose a
self-supervised value-based exploration scheme, which can be employed for searching the parameter
space associated with a large class of control structures. In particular, we show (both in simulation,
and with a real robot) how our scheme can be used to automate the calibration of a set of linear
proportional-integral-derivative (PID) controllers. The size of the parameter space grows exponen-
tially with the cardinality of the set of controllers, and can be very large. Hence exhaustive exploration
as well as a random search strategy are inappropriate.
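To see why exhaustive search quickly becomes infeasible, consider a uniform grid over the gains; the number of grid points grows exponentially with the number of controllers (a hypothetical helper with an illustrative grid resolution):

```python
def grid_points(num_controllers, params_per_controller=3, steps_per_axis=10):
    """Cost of an exhaustive grid search: steps^(controllers * params).
    With 3 PID gains per controller, even a coarse 10-step grid explodes."""
    return steps_per_axis ** (num_controllers * params_per_controller)
```

A single controller already yields 10^3 configurations on this coarse grid; three controllers (nine parameters) yield 10^9.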
In Section 6.3, we highlight some studies that motivated and inspired the stochastic exploration
algorithm and that are highly relevant for the principle of exploratory activity. We describe the back-
ground of our scheme, and flesh out its details in Sections 6.4 and 6.5. In Section 6.7, we briefly
report on the results of simulations. This is followed by a section on experiments performed with a
high-performance robot head (Section 6.8). The results of the experiments are discussed in Section 6.9.
Finally, in Section 6.10, we conclude and point to some future directions.
6.3 Developmental inspiration and related work
Many studies on motor learning indicate that the acquisition of new motor skills (in healthy infants and
in adults) is preceded by a seemingly random, exploratory phase during which possible movements are
explored, selected, and tuned, and the ability to predict the sensory consequences of those movements
is learned (e.g., Angulo-Kinzler, 2001; Goldfield et al., 1993; Haehl et al., 2000; Meltzoff and Moore,
1997; Piek and Carman, 1994; Prechtl, 1997; Thelen and Smith, 1994).
Fetuses (as early as 8 to 10 weeks after conception) as well as newborn infants display a large va-
riety of transient and spontaneous movement patterns, such as general movements, rhythmical sucking
movements, spontaneous arm movements, and stepping and kicking movements (Piek and Carman,
1994; Prechtl, 1997; Thelen and Smith, 1994). General movements, which are argued to be the most
frequently employed and most complex movement patterns, involve a large number of body segments
(e.g., limbs, head, and trunk), and are characterized by a series of unrefined movements of variable
speed, intensity, amplitude, and by the lack of distinctive timing and coordination (Prechtl, 1997).1
It has been suggested that in healthy children during the early stages of motor development, general
movements “presumably” produce all motor possibilities within the neurobiological and anthropomet-
ric constraints of the organism through self-generated motor activity and sensory information (Hadders-Algra, 2002). The origin of the variability underlying spontaneous movements has been hypothesized
to reside in the endogenous activity of the nervous system, and in the interaction with the external
world (e.g., reactivity to external stimuli) as well as the interplay between these two forms of activ-
ity (Van Heijst et al., 1999). It is important to note that the true origin of such activity is far from being
understood.
Spontaneous movement patterns are not a hallmark of the prenatal period only. The classifica-
tion of rhythmical stereotypies 2 by Thelen (1979), for instance, provides evidence that for infants
aged from 4 weeks to 12 months, movement variability (quantified by the number of different movement stereotypies) increases, giving rise to an enlarged range of movement combinations, and hence
exploration. Exploratory behavior was also observed by Goldfield et al. (1993), who performed a
longitudinal study in which eight 6-month-old infants learned to bounce while being supported by a
Jolly Jumper (i.e., a harness attached to a spring of known stiffness and damping). Goldfield et al.
report that the infants, driven by a process of self-organization, by performing various spontaneous
1 Like other types of spontaneous movements, general movements have been recognized as important precursors to the development of movement control.
2 Stereotypies can be defined as involuntary, coordinated, repetitive, rhythmic, seemingly purposeless movements.
(seemingly random) movements, appeared to explore their action space, before discovering that kicks
against the floor had “interesting” consequences. After an initial exploratory assembly phase, the in-
fants selected particular behaviors, and began – during the subsequent tuning phase – to exploit the
physical characteristics of the mass-spring system. Additional evidence for exploratory learning was
gathered by Haehl et al. (2000), whose findings seem to suggest that while learning to cruise (that is,
to walk with support), infants display an initial poorly controlled and unstable exploratory “wobble
phase”, characterized by a large number of movement changes. We hypothesize that – as in The-
len’s and in Goldfield’s study – the wobble phase is a result of the exploration of the parameter space
associated with the musculo-skeletal apparatus. Along a similar line, but in a different developmental
context, Meltzoff and Moore (1997) introduced the notion of body babbling, which they defined as
an experiential process during which infants, in order to acquire a mapping between movements and
the organ-relation end states that are attained, move their limbs and facial organs in repetitive body
play similar to vocal babbling. It is interesting to note that such babbling is closely related to Piaget’s
circular reaction learning paradigm (Piaget, 1953), and to Thorndike’s classical law of effect, which
states that a given behavior acquired by trial-and-error is more likely to (re-)occur if its consequences
are satisfying (Thorndike, 1911).
From the literature reviewed above it is evident that random motor activity, albeit not linked to a
specific functional goal (such as reaching for an object, or turning the head in a particular direction),
can generate correlated sensory information across different sensory modalities (see principle of in-
formation self-structuring). The importance of the information generated cannot be overemphasized.
Indeed, it gives the infant the opportunity to acquire and refine the ability to predict the sensory con-
sequences of its own actions. Surprisingly, despite their obvious relevance as a precursor to motor
control, the neural mechanisms underlying such “stochastic” exploratory activity as well as the be-
havioral implications remain largely unknown. Not many modeling attempts have been made so far.
Van Heijst and Vos (1997) proposed a model of spontaneous activity present in the developing spinal
cord inspired by work of Bullock et al. (1993). The source of stochasticity was a set of randomly
chosen, but fixed, connections between a group of neurons (called spontaneous activity cluster) and
a sinusoidal rhythm generator. Harris (1998) proposed a neural mechanism of stochastic gradient de-
scent through which behavior may be optimized and related it to cerebellar physiology. Harris’ model
included a “noise generator”, that is, an exploratory random process determined by the spontaneous
activity of the neurons in the inferior olive. In that particular model, the optimal behavior could be
“discovered” by finding how the value (performance index) changed with the control parameters. This
in turn required "some" exploration of the parameter space.
A real world application of a stochastic trial-and-error process for selection of actions in an on-line
environment is the work by Howell and Best (2000), who used a set of “continuous action reinforce-
ment learning automata” to tune a linear controller of an engine. A similar strategy was adopted in the
study reported here.
6.4 Enter simulated annealing
Simulated Annealing (SA) is a well-known stochastic technique employed for the optimization of com-
plex continuous or discrete systems (Kirkpatrick et al., 1983; Cerny, 1985; Kirkpatrick and Gregory,
1995). At the core of this method is an analogy with the way metal alloys are manufactured: First
they are heated, sometimes up to their melting point, then the metal alloy is slowly cooled down to
give its molecules the possibility to find an arrangement of lowest energy. More specifically, a thermodynamical system (the molecules composing the metal, for instance), being offered a succession of
options, changes its configuration from energy Ej to energy Ej+1 with probability p = 1 (certainty)
in the case of Ej+1 < Ej ("downhill move toward an energetically more stable configuration"), and with
probability

p = exp(−Ej+1/kT) / [exp(−Ej+1/kT) + exp(−Ej/kT)] = 1 / [1 + exp((Ej+1 − Ej)/kT)] ≈ exp(−(Ej+1 − Ej)/kT)

otherwise ("uphill move toward a less stable configuration"). The parameter T > 0 denotes the tem-
perature or amount of thermal noise of the system, and k is the Boltzmann constant. The temperature
parameter, if sufficiently large, allows the system to make state transitions that would be improbable at
lower temperatures, and which can temporarily lead to an increase of energy. The appropriate choice of
the annealing or cooling schedule Tt , i.e., the sequence of temperatures and the amount of time spent at
each, plays a key role for determining success or failure of the annealing process (Hajek, 1988). On the
one hand, a fast annealing schedule allows for a rapid convergence to a local minimum of energy; on
the other hand, a slow cooling may lead to a better exploration of the space of possible configurations,
and eventually to an energetically more stable local minimum. Essentially, this is an instantiation of
the exploration-exploitation dilemma.
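The acceptance rule above can be sketched as follows (a minimal sketch; the function name is ours, and the Boltzmann constant k is absorbed into the temperature):

```python
import math
import random

def accept_move(e_old, e_new, temperature):
    """Metropolis criterion: always accept a downhill move (lower energy);
    accept an uphill move with probability exp(-(E_new - E_old)/T)."""
    if e_new < e_old:
        return True
    return random.random() < math.exp(-(e_new - e_old) / temperature)
```

At high temperature almost any uphill move is accepted; as the schedule lowers T, uphill moves become exponentially less likely.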
Due to its simplicity, Simulated Annealing has been used in a wide range of applications with a
large parameter space, such as the traveling salesman problem (Cerny, 1985), and the multi-objective
optimization of complex analog and digital integrated circuits (Kirkpatrick and Gregory, 1995). Due to
the "generality" of its framework, SA can also be employed in combination with other methods (e.g.,
artificial neural networks). Here, we discuss the combination of Simulated Annealing with the Metropolis
algorithm (Metropolis et al., 1953) – a particular type of Monte Carlo method that solves problems
by randomly generating numbers, parameters, or state configurations, and by observing the fraction
of configurations obeying certain properties (Whitman and Kalos, 1982). In the case of the Metropolis
algorithm, the distribution of the parameter configurations converges to the Boltzmann distribution
(a function commonly used in statistical mechanics to determine the distribution of molecular speeds).
6.5 Parameter exploration
In this section, we present and describe in detail a flexible parameter exploration scheme that can be
easily adapted to a large class of problems. The results of simulations and of experiments performed
with a high-performance robotic head, as well as possible alternative applications of the algorithm, are
discussed in subsequent sections.
[Plot of f(x) = exp(δV/T) versus −δV (0 to 4), for T = 0.5, 1, 1.5, 2, 2.5.]
Figure 6.1: Metropolis-like exponential probability distribution. This figure exemplifies the effect of the temperature on the probability of making a downhill move. See text for details.
Let Vt denote the value of the performed action at time t, and δV = Vt −Vt−1 the difference of
two subsequent values. As a slight notational difference to traditional SA, we exchanged Et, which
traditionally represents the energy of the system at time t, with Vt, and exchanged uphill moves in the
space of possible state configurations with downhill moves in the space of possible parameters. The
reason is that "good" parameter configurations lead to a high value – corresponding to energetically
more stable state configurations. The opposite holds for "bad" parameter configurations. The algorithmic
gist, however, remains unchanged. In order to conform to the original Metropolis algorithm, δE =
−δV = Vt−1 − Vt. If δE < 0, that is δV > 0, we say that the previously performed parameter change
had a positive value and hence gave rise to an uphill move.
The exploration process works as follows: A particular action is performed, and its value Vt com-
puted. Vt ∈ [0,c], where c > 0, and hence −c < δV < c. Please note that δV has to be somehow
matched to the possible temperatures Tt of the annealing process (cf. Eqn. 6.2 and Fig. 6.1).
 1: V0 ← 0
 2: T0 ← Tmax
 3: c ← 1
 4: γ ← number close to 1
 5: ε ← small number
 6: t ← 1
 7: x0 ← initial values ∈ [xlo, xhi]
 8: u ← (xhi − xlo)/N
 9: repeat
10:   test parameter settings
11:   get Vt
12:   δV ← Vt − Vt−1
13:   if δV > 0 then
14:     Tt ← γ Tt−1
15:     repeat
16:       extract nt from N(0,1) limited to [−1,+1]
17:       a ← nt (c − Vt)
18:     until (xt−1 + a Tt u) < xhi and (xt−1 + a Tt u) > xlo
19:     xt ← xt−1 + a Tt u
20:   else
21:     Tt ← Tt−1
22:     p ← exp(δV/Tt)
23:     extract r from uniform distribution [0,1]
24:     if r < p then
25:       repeat
26:         extract nt from N(0,1) limited to [−1,+1]
27:         a ← nt (c − Vt)
28:       until (xt−1 + a Tt u) < xhi and (xt−1 + a Tt u) > xlo
29:       xt ← xt−1 + a Tt u
30:     else
31:       repeat
32:         extract nt from N(0,1) limited to [−1,+1]
33:         a ← nt (c − Vt)
34:       until (xt−2 + a Tt u) < xhi and (xt−2 + a Tt u) > xlo
35:       xt ← xt−2 + a Tt u
36:     end if
37:   end if
38:   t ← t + 1
39: until terminating criterion is satisfied
Figure 6.2: Pseudo-code of the exploration process. For explanations see text.
If δV > 0, we perform the following exploration step:
xt = xt−1 + nt Tt (c−Vt)u , (6.1)
where xt is the N-dimensional vector of parameters at time t, that is, xt = (x1t, x2t, ..., xNt); nt is the
realization of a Gaussian random process with zero mean and unit variance – also called the generating
function – which is limited, via an upper and lower saturation threshold, to the interval [−1,1]. A
generating function with a Cauchy distribution would have been another candidate. This thresholding
is necessary in order to keep the size of the exploration steps under control (Gaussian distributions have
rather long tails). The vector u is a parameter-dependent normalization factor. A possible candidate is
u = (xhi − xlo)/N, with N ≫ 1, and xhi and xlo being vectors whose elements are the upper and lower
limits of the elements of xt. Another candidate is u = (xhi − xlo)/T0, where T0 is the initial temperature.
The term nt Tt (c − Vt) u can be pictured as a sort of probability cloud centered in xt−1, or
xt−2 respectively. The radius of the probability cloud shrinks for increasing values Vt and decreasing
temperatures Tt.
If δV < 0, we either attempt a downhill move according to Eqn. 6.1 (accepting the parameter
configuration xt) with probability

p = exp(δV/Tt),

or we step back to the former position xt−2 (discarding the previously tested parameter configuration
xt−1), and, starting from there, we sample another area of the parameter space. With the previously
introduced formalism this gives the following parameter update:

xt = xt−2 + nt Tt (c − Vt) u .
Now, remember that the temperature parameter Tt > 0 is subject to an annealing schedule. Figure 6.1
shows the exponentially decaying trace for various temperature settings. As can be seen, decreasing
temperatures reduce the probability of uphill moves. The choice of a suitable Tt is important. We opted
for an exponential annealing schedule (for other annealing schedules see Press et al., 1995, p. 452):
Tt = γTt−1 (6.2)
where γ is close to 1. As can be seen, the temperature parameter is limited to the interval [0,T0]. We
opted, after some testing with our setup, for T0 = 20. A correct choice of the initial temperature leads to
a rapid initial exploration of many alternative paths in the parameter space. According to this scheme,
as the exploration proceeds, the control parameter Tt , as well as Vt are gradually reduced. In other
words, the system goes uphill as well as downhill, but the lower the temperature the less likely is any
significant downhill excursion, so as to allow the system to settle in a promising area of the parameter
space. SA can be described as a sequence of Markov chains, each corresponding to a temperature value
Tt . Every computational step of a chain starts only after the previous step has been completed, thus the
operation of SA is strictly sequential.
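The complete loop of Fig. 6.2 can be sketched in Python (a simplified sketch under our own naming; the bounded candidate generation and the Metropolis-like acceptance follow the pseudo-code, while the value function used below is a toy stand-in):

```python
import math
import random

def perturb(base, v, temp, c, u, x_lo, x_hi):
    """Draw a bounded Gaussian step around `base`; the step radius shrinks
    with increasing value v and decreasing temperature temp (cf. Eqn. 6.1)."""
    while True:
        cand = [b + max(-1.0, min(1.0, random.gauss(0.0, 1.0))) * temp * (c - v) * ui
                for b, ui in zip(base, u)]
        if all(lo < ci < hi for ci, lo, hi in zip(cand, x_lo, x_hi)):
            return cand

def explore(value_fn, x_lo, x_hi, t0=2.0, gamma=0.99, c=1.0, steps=100):
    """Value-based stochastic exploration in the spirit of Fig. 6.2: cool
    down on improvements; otherwise accept a downhill move with probability
    exp(dV/T) or step back to the previous parameter configuration."""
    n = len(x_lo)
    u = [(hi - lo) / n for lo, hi in zip(x_lo, x_hi)]
    x_prev = x = [random.uniform(lo, hi) for lo, hi in zip(x_lo, x_hi)]
    v_prev, temp = 0.0, t0
    for _ in range(steps):
        v = value_fn(x)                          # test current parameters
        dv = v - v_prev
        if dv > 0:
            temp *= gamma                        # exponential annealing
            base = x
        elif random.random() < math.exp(dv / temp):
            base = x                             # accept the downhill move
        else:
            base = x_prev                        # discard x, restart from x_{t-2}
        x_prev, x = x, perturb(base, v, temp, c, u, x_lo, x_hi)
        v_prev = v
    return x_prev, v_prev
```

The toy value function peaks in the middle of a one-dimensional parameter interval; in the experiments the evaluation instead involves testing the parameters on the robot.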
6.6 Control problem
Despite advances in the field of control systems and the availability of many sophisticated control
schemes, PID control, due to its versatility and ease of use, remains a very common control algorithm.
The input of the controller at time t is the deviation e(t) of the measured feedback feed(t) from
a specified reference signal ref(t) (see Fig. 6.3): e(t) = ref(t) − feed(t). A standard form of the
controller is given by the following equation:

y(t) = Kp e(t) + Ki ∫_0^t e(τ) dτ + Kd de(t)/dt .

The output y(t) is fed to the input of the motor (labeled controlled system in Fig. 6.3).
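A discrete-time version of this controller can be sketched as follows (a minimal sketch; the class name and the rectangular approximation of the integral term are our choices):

```python
class PID:
    """Discrete PID controller: y = Kp*e + Ki*integral(e) + Kd*de/dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0   # accumulates e(t)*dt
        self.e_prev = 0.0     # previous error, for the derivative term

    def step(self, ref, feed):
        e = ref - feed                        # e(t) = ref(t) - feed(t)
        self.integral += e * self.dt
        deriv = (e - self.e_prev) / self.dt
        self.e_prev = e
        return self.kp * e + self.ki * self.integral + self.kd * deriv
```

The triple (kp, ki, kd) per motor is exactly what the stochastic exploration scheme searches over.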
[Block diagram with signals ref and feed and blocks: controller, controlled system, feedback (−1), performance evaluation, stochastic exploration.]
Figure 6.3: Control scheme. The arrows in the model depict the information flow.
Optimal system performance requires the parameters Kp, Ki, and Kd to be appropriately chosen
and tuned, so as to meet prescribed time-domain performance criteria given a particular combination
of motor dynamics and load inertia. These criteria are usually specified in terms of rise and settling
time, overshoot and steady-state error of one of the state variables (or a derived variable thereof, e.g.,
position), following a step change in the reference signal (set-point tracking). For the experiments
reported in this chapter we made use of two of these criteria: the overshoot Mp, which is defined as
the amount of overcorrection in an under-damped control system (in an over-damped control system
the overshoot is zero), and the rise-time Tr, which is defined as the time required for the system’s step
response to rise from 0% of its final value to 100% of its final value. As reference signal we employed
a square wave of fixed frequency, but variable amplitude.
Typically, the parameters Kp, Ki, and Kd of the PID controller (or PID compensator) are determined
by means of some heuristics or some other criteria (Spong and Vidyasagar, 1989) which require a
model of the to-be-controlled system. Here, we show how a value-based stochastic exploration of the
parameter space can be used to achieve the same result without the need for such a system model.
6.7 Simulation
The proposed stochastic exploration process was first tested on a discretized version of the following
lumped-parameter model of a DC-motor:

J dω/dt = −Kf ω + Km i
L di/dt = −R i − Kb ω + u

where ω and i are the state variables of the system (rotation speed and current, respectively). J is
the sum of motor and load inertia, Kf the electromotive force constant, Km the armature constant,
and Kb the damping ratio of the mechanical system. R is the electrical resistance (in Ohm), L the
electrical inductance (in Henry), and u the source voltage (input). Unless specified otherwise, the
following parameters were kept constant throughout the study: R = 2.0 Ω, L = 0.5 H, Kb = 0.2 Nm·sec,
Kf = Km = 0.15 Nm/A, J = 0.025 kg·m²/s².
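A forward-Euler discretization of this model can be sketched as follows (the discretization and step size are our own illustrative choices; the parameter values are those given in the text):

```python
def dc_motor_step(omega, i, u, dt=0.001,
                  R=2.0, L=0.5, Kb=0.2, Kf=0.15, Km=0.15, J=0.025):
    """One forward-Euler step of the lumped-parameter DC-motor model:
       J*domega/dt = -Kf*omega + Km*i
       L*di/dt     = -R*i - Kb*omega + u
    Returns the updated (rotation speed, current)."""
    domega = (-Kf * omega + Km * i) / J
    di = (-R * i - Kb * omega + u) / L
    return omega + dt * domega, i + dt * di
```

Iterating this step for a few simulated seconds with a step input on u produces the kind of transient whose overshoot and rise time the value function evaluates.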
At the outset of the exploration run, the controller’s PID parameters were randomly initialized in the
interval [0.0,1.0]. One iteration of the exploration process consisted in measuring the response of the
controlled system (DC-motor) to step input, and evaluating its performance in terms of overshoot and
rise time. Each iteration step lasted around three “simulated” seconds. The input signal was then reset,
and based on the result of the "performance evaluation" a new set of PID parameters was chosen (see
Fig. 6.3). The change of value over “simulated” time for two independent exploratory runs is shown in
Figure 6.4. In both cases the value was V = 1/(1 + k Mp Tr), where Mp is the overshoot, Tr the rise
time, and k > 0 a suitably chosen constant. Figure 6.5 displays a "cut" through the 3-dimensional
value landscape determined by a systematic exploration of the parameter space; overlaid are the
parameters explored by the value-based exploration scheme.
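Extracting the two performance criteria from a recorded step response and folding them into V = 1/(1 + k Mp Tr) might look as follows (a sketch; the peak-based overshoot estimate and the threshold-based rise-time estimate are simplifying assumptions of ours):

```python
def step_response_value(response, times, target, k=10.0):
    """Compute V = 1/(1 + k*Mp*Tr) from sampled step-response data.
    Mp: overshoot above the target, normalized by the target;
    Tr: time at which the response first reaches the target."""
    peak = max(response)
    mp = max(0.0, (peak - target) / target)
    tr = next((t for t, y in zip(times, response) if y >= target), times[-1])
    return 1.0 / (1.0 + k * mp * tr)
```

Note that under this value a response with zero overshoot scores V = 1 regardless of rise time; this degeneracy motivates the alternative objective V3 used in the real-world experiments.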
[Plot of normalized value (0.3 to 1.0) versus time (0 to 1800 s) for two exploratory runs.]
Figure 6.4: Normalized value vs. time (min = 0.0, max = 1.0). Shown are the results for V = 1/(1 + k Mp Tr). As can be seen from the graphs, in both cases, after 1000 "simulated" seconds the value is already very high.
6.8 Real world setup
The stochastic exploration process was also tested in a real world setup, that is, the robot head depicted
in Fig. 6.6. Each of the robot's eyes had two independent degrees of freedom: pan and tilt. The neck
also had two degrees of freedom. Eye and neck motors differed in size and in terms of load inertia,
and hence had dissimilar dynamics. This is visible in Figure 6.7, which displays the position over time
of right eye pan and of the neck pan – given the same triple (Kp,Ki,Kd). Due to the larger inertia of
the neck pan motor (compared to the eye pan motor), the overshoot for the neck pan is larger. The
control/sampling rate was fixed to 50Hz.
We used the stochastic exploration procedure to simultaneously search the parameter space of the
PID controllers of the left and right eye pan, and of the neck pan degree of freedom (nine parameters in total).
In response to a step input, the three transient performance objectives to be minimized were: (a) V1 =
k Mp (overshoot), (b) V2 = k Mp Tr (overshoot multiplied by rise time), and (c) V3 = k (Mp + 0.1)Tr
(results not shown here), with k a small constant. Mp and Tr were both normalized to lie in the interval
[0,1] by dividing them with the maximum of the reference signal, and with the duration of the step
function applied to the motor, respectively.
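Under the normalization just described, the three objectives can be sketched as follows (a hypothetical helper; the argument names are ours):

```python
def objectives(mp_raw, tr_raw, ref_max, step_duration, k=0.1):
    """Normalize the overshoot by the reference maximum and the rise time
    by the step duration, then form the transient objectives V1, V2, V3."""
    mp = mp_raw / ref_max
    tr = tr_raw / step_duration
    return k * mp, k * mp * tr, k * (mp + 0.1) * tr
```

The 0.1 offset in V3 keeps the rise time relevant even when the overshoot is near zero.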
[Surface plot of the value over the gains Kp and Ki.]
Figure 6.5: Systematic exploration of the parameter space and resulting value landscape for Kd = 4.0. The white dots are the parameters explored during a value-based exploration.
The PID parameters were initialized with a set of parameters leading to oscillations: (Kp,Ki,Kd) =
(1.0,1.0,0.0). For each iteration of the exploration procedure, new PID parameters for the controller
of each motor were set, and subsequently tested for 3sec. The input signal was then reset, and both
eyes and the neck were allowed to go back to their initial (zero) position. The results for right eye pan
(EPR) and neck pan in two representative experimental runs are displayed in Fig. 6.9 and Fig. 6.10.
The main results for those cases are summarized in Table 6.1 and Table 6.2. As can be seen, the initial
parameters lead to large initial overshoots (Mp0 > 0.7). In all four cases, the exploration successfully
identifies parameter settings that yield very low overshoots. The number of visualized exploratory
iterations is 45 (Fig. 6.9) and 73 (Fig. 6.10).
V = k Mp      Mp0   Mpf   Tr0   Trf   Kp0   Kpf   Ki0   Kif    Kd0   Kdf
EPR           0.88  ≈0    0.04  0.54  1.0   2.07  1.0   0.001  0.0   1.41
Neck Pan      0.72  0.05  0.10  0.30  1.0   2.78  1.0   0.029  0.0   0.053

Table 6.1: EPR = eye-pan right; Mp0: initial normalized overshoot; Mpf: final normalized overshoot; Tr0: initial rise-time (in sec.); Trf: final rise-time (in sec.).
Figure 6.6: High-performance robot head used in our experiments.
[Two plots of position versus time (245 to 270 s): eye pan (left) and neck pan (right).]
Figure 6.7: Qualitative comparison between the dynamics of the eye pan and neck pan degrees of freedom.
V = k Mp Tr   Mp0   Mpf    Mp0 Tr0  Mpf Trf  Kp0   Kpf   Ki0   Kif    Kd0   Kdf
EPR           0.87  ≈0     0.035    ≈0       1.0   1.99  1.0   0.001  0.0   0.188
Neck Pan      0.71  0.006  0.071    0.004    1.0   2.35  1.0   0.003  0.0   1.77

Table 6.2: EPR = eye-pan right; Mp0: initial normalized overshoot; Mpf: final normalized overshoot; Tr0: initial rise-time (in sec.); Trf: final rise-time (in sec.).
[Plot of overshoot × rise time (0 to 0.04) versus time (0 to 450 s).]
Figure 6.8: Time series of the transient performance evaluation for the eye pan degree of freedom.
6.9 Discussion
We have proposed a value-based stochastic parameter exploration scheme, and have shown (in simu-
lation and in a real robotic system) how it can be used to automatize the calibration of a set of PID
controllers. Our experimental results demonstrate that the number of iterations (and thus, the time)
required to attain convergence were acceptably low (see Fig. 6.8). This result is remarkable because the
exploration, despite starting with no a priori knowledge and no model of the system, was able to converge to a sub-optimal (but satisfying) solution in a short time. It is important to note that the prompt
availability of afferent sensory feedback was crucial for the exploration to work. An intrinsic feature
of the proposed scheme is that the parameter space is explored by gathering information to identify the
direction of parameter change that leads to an improvement of behavioral performance.
The suggested exploration process has a few distinct advantages over systematic search, or standard
gradient-based exploration. First, the method is not greedy, i.e., the tendency to get stuck in local
minima is reduced. Commonly used strategies to jump out of local minima are either the introduction of
a small amount of noise, or “kicks” when a local minimum is reached. However, this may not always be
acceptable in a real world setup. In many ways, Simulated Annealing prescribes a controlled way of
introducing noise for a more robust iterative search. Second, the Metropolis part of the exploration
scheme quite naturally results in an “extended search” in the neighborhoods of local optima (as shown
by Figure 6.5).
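To make the interplay of the generating function, the Metropolis acceptance rule, and the annealing schedule concrete, here is a minimal sketch in Python. It is an illustration only, not the implementation used in the experiments; the function names, the Gaussian generating function, and the geometric cooling schedule are assumptions of this sketch.

```python
import math
import random

def explore(evaluate, params, t0=1.0, alpha=0.95, sigma=0.1, iters=100):
    """Value-based stochastic exploration: perturb the parameters with a
    Gaussian generating function, accept/reject via the Metropolis rule,
    and anneal the temperature geometrically. `evaluate` maps a parameter
    vector to a cost (e.g., V = k * Mp); lower is better."""
    current = list(params)
    best, v_best = list(params), evaluate(params)
    v_curr = v_best
    t = t0
    for _ in range(iters):
        # Generating function: Gaussian perturbation of every parameter.
        candidate = [p + random.gauss(0.0, sigma) for p in current]
        v_cand = evaluate(candidate)
        dv = v_cand - v_curr
        # Metropolis rule: always accept improvements; accept worsenings
        # with probability exp(-dv/t), which shrinks as t cools. This is
        # what produces the "extended search" around local optima.
        if dv < 0 or random.random() < math.exp(-dv / t):
            current, v_curr = candidate, v_cand
            if v_curr < v_best:
                best, v_best = list(current), v_curr
        t *= alpha  # geometric annealing schedule
    return best, v_best
```

On a toy cost function such as a quadratic bowl, the loop quickly drifts toward the minimum while still occasionally accepting worse settings early on, when the temperature is high.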
Mechanisms of stochastic parameter exploration deserve attention for at least four reasons: (a) they
are simple; (b) they ensure that any possible parameter setting will be tested eventually (statistically
speaking); (c) they do not introduce any strong biases in the parameter selection; and (d) due to their
dynamic nature they are adaptive and robust against external disturbances (exploration never stops).
There are, however, also a few disadvantages: (1) In various instances random walk exploration
can require an exponential time, and often does not take full advantage of the opportunity to select the
most informative action/query (Thrun, 1995). It has been shown, for instance, that directed exploration
techniques – i.e., techniques that utilize some exploration-specific heuristics for guiding exploration
search (Thrun, 1992) – can reduce the complexity of active learning from exponential learning time
(random exploration) to polynomial training time. Such heuristics for optimizing knowledge gain,
however, introduce a designer bias. (2) The development of a proper annealing schedule requires
some experimentation. Indeed, the development of an annealing schedule that works robustly for a
whole class of similar problems is still an open question (Kirkpatrick and Gregory, 1995, p. 877).
(3) A similar problem affects the choice of an appropriate generating function. (4) In an open-ended
scenario, in which exploration does not stop, another problem that would need to be considered is
“re-annealing”, that is, the resetting of the annealing schedule.
On a final note, although this chapter did not stress the biological relevance of the proposed scheme,
recent work by Kadar et al. (2002) seems to suggest that many patterns observed during exploratory
motor learning may indeed be explained by some sort of biased random walk/diffusion control model.
Interestingly, our scheme can be easily conceptualized as a directed random walk (Metropolis part) in
which the degree of randomness (diffusion) is controlled by an annealing schedule (Simulated Anneal-
ing part). It follows that further refinements of the scheme proposed here may prove to be adequate
for modeling exploratory motor learning in infants and adults.
6.10 Conclusion
This chapter presented a particular instantiation of the principle of exploratory activity: a value-based
stochastic exploration scheme. The algorithm was first tested in simulation, and then applied to
the adaptive exploration of a 9-dimensional parameter space associated with the controller of a robotic active
vision system. The system quickly converged to a “good” parameter configuration. We conclude
that the algorithm is an appropriate candidate in all those situations in which a large
parameter space has to be rapidly explored.
Before concluding, it is important to address an issue that has already been raised in the conclusion
of Chapter 4, namely: how does exploration relate to learning and to development? To stress the
point, we repeat here what has already partially been said in previous chapters. In our framework,
exploration is seen as a fundamental aspect of task acquisition that goes beyond learning. Exploration
produces the diversity of sensory-motor trajectories which higher brain systems can select and exploit
to realize learning. In this sense, exploration represents not only a crucial aspect of learning, but also a
mechanism of plasticity and adaptivity in its own right. The use of a stochastic value-based parameter
exploration scheme such as the one discussed here is only a first step toward learning. Clearly, explo-
ration has to include more than the mere exploration of the parameter space associated with the control
system. As shown in chapters 3, 4, and 5, exploration of the intrinsic dynamics, of the interaction with the
environment, as well as of the coupling between body, control, and environment also plays a critical role.
[Figure: six position time series panels (Pos vs. Time [s]).]
Figure 6.9: Time series of right eye pan and neck pan degrees of freedom for V = k Mp (see text). The desired position (a square wave of period 6 sec) and the effective (measured) position are superposed. The length of the series is T = 273 sec (corresponding to 45 exploratory iterations). First column, first row: complete time series of right eye pan. Second column, first row: complete time series of neck pan. Second and third rows are close-ups of the beginning and end of the stochastic exploration of eye and neck, respectively.
[Figure: six position time series panels (Pos vs. Time [s]).]
Figure 6.10: Time series of right eye pan and neck pan degrees of freedom for V = k Mp Tr (see text). The desired position (a square wave of period 6 sec) and the position measured via encoder are superposed. The length of the series is T = 439 sec (corresponding to 73 exploratory iterations). First column, first row: complete time series of right eye pan. Second column, first row: complete time series of neck pan. Second and third rows are close-ups of the beginning and end of the stochastic exploration of eye and neck, respectively.
Chapter 7
Information-theoretic Analysis of Sensory Data¹
7.1 Synopsis
This chapter represents the first attempt to explore the possibility of quantitatively analysing sensory
data, and the interaction between the signals in different sensory channels of an embodied agent
interacting with its local environment (a second and a third attempt will be described in two subsequent
chapters). Here, the goal is to get a better intuition of these data which constitute, in a sense, the “raw”
(unprocessed) material that the neural system has to cope with. As an example of analysis, we em-
ploy information theoretic measures such as the Shannon entropy and mutual information. The hope
is that the results emerging from this research will eventually lead to a more formal and quantitative
description of some of the design principles of autonomous agents.
7.2 Introduction
As discussed extensively in Chapter 1 (Introduction) and Chapter 2 (Research Landscape), over the
last decade-or-so, a number of researchers have adopted a developmental perspective on artificial in-
telligence and robotics. The ultimate shared goal among them seems to be the idea of bootstrapping
high-level cognition through a process in which an agent interacts with a real physical environment
over extended periods of time. Some of the research performed has focused on the construction of robots,
some has examined internal mechanisms by employing embodied models of cognition, such as robots,
and some has been metaphorical.
¹Appeared as: Lungarella, M. and Pfeifer, R. Robots as cognitive tools: information-theoretic analysis of sensory-motor data. In Proc. of the 2nd IEEE-RAS Int. Conf. on Humanoid Robots, pp. 245-252, 2001.
This chapter (as well as chapters 8 and 9) represents an attempt at a more quantitative approach to
the development of embodied cognition. The goal is to acquire an understanding of the sensory data
generated in the different sensory channels as a result of an agent’s interaction with the real world,
as this is – so to speak – the “raw material” the neural substrate has to process. The way such data
is produced depends, in turn, on the particular embodiment of the agent, i.e., its morphology (type,
position, and characteristics of the individual sensors and actuators), as well as on the materials the
agent’s body is made of.
There has been some previous work trying to tackle similar issues. Most of it has focused on
categorization, that is, the ability of making distinctions in the real world (Pfeifer and Scheier, 1997).
Such an ability is one of the most fundamental cognitive abilities. Indeed, a system that cannot make
distinctions will neither have much of a chance of survival (in case it is a natural organism), nor will it
be of much use (in case it is an artificial system such as a robot). In this research, categorization was
implemented as a process of sensory-motor coordination as suggested early on by Dewey (1896), later
by Edelman (1987), and Thelen and Smith (1994). This approach was chosen to overcome the prob-
lems of classical – disembodied – categorization models like ALCOVE (Kruschke, 1992), which view
categorization as a process of mapping an input vector onto category nodes. Such classical models start
from the assumption that there is an input vector consisting of “psychologically relevant dimensions”,
such as size of an object, its color, its weight, and so on. An agent interacting with the real world, on
the other hand, is exposed to a continuously varying stream of sensory stimulation. This represents a
completely different problem. It has been shown for simple cases that through sensory-motor coordina-
tion temporarily stable patterns of sensory stimulation can be induced, and a dimensionality reduction
of the high-dimensional sensory space can be achieved (Pfeifer and Scheier, 1997; Te Boekhorst et al.,
2003) (see Chapter 8). In ALCOVE, for instance, there are typically three or four nodes in the input
layer, which constitutes only a low-dimensional space. If done properly, sensory-motor coordination
can lead to the generation of “good” sensory data, that is, data which can result in a simplification of
category learning and in a stable categorization behavior.
The present chapter explores the possibility of a quantitative analysis of sensory data. The lead-
ing questions are: “Is it possible to quantify the informational structure produced by sensory-motor
interaction? And how does it compare to the case in which there is no sensory-motor interaction?”
As an example method we employ information theoretic measures to describe more quantitatively the
sensory data and the interrelation between the different sensory channels as the agent interacts with
the real world. We start with a short discussion of some basic aspects of sensory-motor coordination.
Then, we describe a number of experiments performed with a robotic system and analyze the results.
Finally, we discuss what we can learn from this type of analysis, and point to some future work.
7.3 Sensory-motor coordination
By definition sensory-motor coordination involves both the sensory and the motor systems. In other
words, it involves the agent’s body. As mentioned above, the sensory stimulation that the neural system
has to process depends on the agent’s morphology and on its behavior: through its movements, in par-
ticular through sensory-motor coordinated movements, an agent can induce stable sensory patterns in
different sensory channels that can be exploited to form cross-modal associations (Pfeifer and Scheier,
1997). These cross-modal associations seem to be a basic prerequisite for concept formation (Thelen
and Smith, 1994), which in turn is of fundamental importance for the emergence of what might
be called high-level cognition. Cross-modal associations, which also depend on the agent’s morphol-
ogy, nicely demonstrate how embodiment does not only have physical implications, but information
theoretic ones as well. In other words, sensory stimulation is influenced by at least two factors –
morphology and sensory-motor coordination – which are closely related.
Exploration strategies are particular instances of sensory-motor coordination that are used to ex-
tract different kinds of “information” from the surrounding environment. Tactile or haptic informa-
tion picked up in combination with systematic exploratory movements of the hand (or mouth) yields
richer sensory stimulation, and thus potentially more and better information, than passive contact; for
instance, particular hand movements can be identified as being critical to the ability to recognize objects.
Results from studies on human subjects indicate that people explore objects consistently using dif-
ferent exploratory hand movements, depending on the knowledge (information) they are instructed to
obtain (Lederman and Klatzky, 1990). Particular exploration procedures are used to extract hardness,
pressure, texture, or compliance, because they provide the sensory input which is “optimal and some-
times necessary” for extracting the desired information. The same holds for vision. Eye movements,
for instance, have a task-dependent character, and depend very much on what perceptual judge-
ments the human subject is asked to make (Yarbus, 1967). Differences between tasks influence the
way we pick up information, which may or may not maximize the information intake. In other words,
eye movements influence the statistics of the effective visual input. Lee and Yu (1999) sketch an active
perception framework based on information maximization to reason about the organization of saccadic
eye movements.
7.4 Experimental setup and experiments
Previous work on categorization led to the hypothesis of dimensionality reduction through sensory-
motor coordination. This idea is suggestive, but has been derived from very simple cases. To further
corroborate this hypothesis, we increased the complexity of the agent. According to the “principle
of ecological balance” (Pfeifer, 1996; Pfeifer and Scheier, 1999), to achieve more interesting kinds
of sensory-motor coordination, there has to be a balance of complexity among the agent’s task-
environment and its sensory, motor, and neural systems. This also makes sense from a developmental
perspective. Bushnell and Boudreau (1993) talk about “motor development and the mind”, i.e., in human
infants there is a co-development of the sensory and the motor system.
Figure 7.1: Left: Basic manipulator geometry.
To be able to simulate this co-development, our experiments were performed using a five-degrees-
of-freedom industrial robot manipulator, equipped with a color CCD camera mounted on the
robot’s end-effector. This setup is often referred to as an “eye-in-hand configuration” (see Fig. 7.1).
The camera was the only exteroceptive sensor used in this set of experiments. Video frames were
recorded at a rate of 10Hz, and the resolution was reduced (downsampled) to 192x192 pixels per
frame. The sensory data were stored into a time-series file. The control of the robot was image-based,
that is, the desired end-effector position was achieved by processing the downsampled camera image
in an adequate way.
The experiments were performed in an unstructured and static real world environment, cluttered
with a variable number of objects of different color, form, and size.
The robot’s task was to foveate on relatively small-sized red-colored objects of different shape.
The control architecture was hard-wired. The sensory part consisted of two one-dimensional arrays
of color-sensitive cells (subsequently referred to as 1-D retinas) arranged as a cross. The 1-D retinas
consisted of a certain number of color sensitive rectangles (which might be interpreted as receptors
or sensory channels) – there were M receptors for the horizontal retina, and N for the vertical one.
The output of an individual receptor of the retina was the average over a rectangular patch of pixels
of the original camera image (see Figure 2). The receptor density was variable and depended on the
horizontal or vertical position, x or y, respectively (see Fig. 2). Three receptor types were considered: red (r),
green (g), and blue (b). An additional receptor, sensitive to intensity, was obtained as I = (r + g + b)/3.
An attenuation of the changing lighting conditions was achieved through a color-space transformation
described in (Itti et al., 1998). Three “broadly” color-tuned channels were created: R = r− (g + b)/2
for red, G = g− (r + b)/2 for green, and B = b− (r + g)/2 for blue. The negative values were set to
zero. Each channel yielded maximal response for the pure, fully saturated color to which it was tuned,
and yielded zero response for black (r = g = b = 0) and white (r = g = b = 255) inputs.
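The transformation just described can be captured in a few lines. The sketch below (function name invented for illustration) maps one receptor’s raw (r, g, b) output, in the range 0-255, to the three broadly tuned channels and the intensity receptor:

```python
def broadly_tuned_channels(r, g, b):
    """Broadly color-tuned channels after the transform described in the
    text (following Itti et al., 1998): each channel responds maximally to
    its fully saturated color, gives zero response to black and white
    inputs, and negative values are clipped to zero."""
    R = max(0.0, r - (g + b) / 2)
    G = max(0.0, g - (r + b) / 2)
    B = max(0.0, b - (r + g) / 2)
    I = (r + g + b) / 3  # intensity receptor
    return R, G, B, I
```

For a pure red input (255, 0, 0) the R channel saturates while G and B stay at zero; for black or white inputs all three color channels vanish, leaving only the intensity response.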
On the motor side (refer to Fig. 7.1), the moto-neuron controlling joint J0 (shoulder-pan), responsi-
ble for the rotation around the vertical axis, was fully connected to the color-space transformed outputs
of the horizontal 1-D retina. The moto-neurons of joint J1 (shoulder-tilt) and joint J4 (wrist), which
were responsible for the up and down movement of the camera, and which were “kinematically” cou-
pled in order to always keep the camera horizontal (see the update equations below), were fully connected
to the receptors of the vertical retina. Furthermore, once the robot had successfully foveated on a red object,
joint J2 (elbow) was actuated in an oscillatory manner, inducing a forward (moving close) and backward
(moving away) movement of the end-effector. Its purpose was a slightly more complex sensory-motor
coordinated interaction with the object itself. The weights of the various neural connections were
hard-wired, and can be thought of as some sort of basic motor reflexes. The control architecture was
a “Braitenberg-style” reactive architecture, with direct sensory-motor couplings. The equations em-
ployed for updating the joint angles Ji[n] were:

J0[n+1] = J0[n] + f0[n] Σ_{i=0..M} w0,i RH,i[n]

J1[n+1] = J1[n] + f1[n] Σ_{j=0..N} w1,j RV,j[n]

J2[n+1] = J2[n] + f2[n] (w0,N/2 + w0,N/2-1)/2

J4[n+1] = J4[n] + 90 - (J1[n] + J2[n])
where f0[n] = c0 (a constant) if J0[n] < J0,min, f0[n] = −c0 if J0[n] > J0,max, and f0[n] =
f0[n−1] otherwise. Identical equations hold for f1[n] and f2[n] (with J1[n] and J2[n], respectively). w0,i and w1,j
represent the weights that connect the outputs of the horizontal and vertical retinas to the moto-neurons.
An additional feature of the control architecture was a habituation (or boredom) coefficient h. Its
purpose was to avoid situations in which the robot kept focusing on one and the same object. Its
effect was to move the robot into random joint angle configurations, whenever it had been in a certain
configuration for a certain period of time.
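The control loop above can be summarized in code. The following Python sketch is one possible reading of the update equations (in particular, the grouping of the elbow term is an interpretation of the source, and the names are invented for illustration):

```python
def update_joints(J, RH, RV, w0, w1, f):
    """One reactive control step. J = [J0, J1, J2, J4] are the joint
    angles; RH, RV are the color-transformed outputs of the horizontal
    and vertical 1-D retinas; w0, w1 the hard-wired reflex weights; and
    f = [f0, f1, f2] the direction coefficients toggled at joint limits."""
    # Shoulder-pan: driven by the horizontal retina.
    J0 = J[0] + f[0] * sum(w * r for w, r in zip(w0, RH))
    # Shoulder-tilt: driven by the vertical retina.
    J1 = J[1] + f[1] * sum(w * r for w, r in zip(w1, RV))
    # Elbow: oscillatory actuation driven by the central receptor weights
    # (the grouping (w0[N/2] + w0[N/2-1]) / 2 is one reading of the text).
    n = len(w0)
    J2 = J[2] + f[2] * (w0[n // 2] + w0[n // 2 - 1]) / 2
    # Wrist: kinematically coupled to keep the camera horizontal.
    J4 = J[3] + 90.0 - (J[1] + J[2])
    return [J0, J1, J2, J4]
```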
The sensory-motor coordinated behavior was compared with one that was not sensory-motor coordinated, and where the joints were actuated randomly.
7.5 Analysis methods
As mentioned above, we are interested in the quantitative analysis of the sensory data with the aim
of getting a better intuition of the “raw material” the neural system has to process. More specifically,
we want to investigate the use of information theoretic measures (Cover and Thomas, 1991; Papoulis,
1991; Shannon, 1948), such as the Shannon entropy H, the Shannon or mutual information MI, and
its capacity-normalized companion (the redundancy), when applied to the sensory channels of a situated autonomous agent. The
use of information theory to analyze the nature of sensory data was inspired by a series of articles
by Tononi et al. (1994, 1996) (see also Pfeifer and Scheier, 1999).
The Shannon entropy H(X) accounts for the potential diversity or variability that is displayed by
the random variable X (see appendix of this chapter for a definition), where X represents the signal
from the sensory channel. In principle, the Shannon entropy is equivalent to our intuitive notion of
information – the more unpredictable a signal or event, the more information its occurrence contributes.
MI(X ,Y ) is a measure that takes into account linear as well as nonlinear dependencies between two
time series (observations of a stochastic process), where, in our case, X and Y represent sensory signals
from different channels. This is in contrast to the better known correlation function CORR(X ,Y) (see
appendix), which measures just linear dependencies.
A straightforward method of computing entropy (or self-information, as it is sometimes called)
and mutual information is to estimate the first- and second-order probability density functions, p(x)
and p(x,y), by normalizing the 1-D and 2-D histograms of the time series. The entropy is
given by H(X) = −∑_{i=1}^{N} p(x_i) log2 p(x_i), for a discrete variable x_i which can be in N possible states
x_1, x_2, ..., x_N. The mutual information is defined as MI(X;Y) = H(X) + H(Y) − H(X,Y), where
H(X,Y) = −∑_{i=1}^{N} ∑_{j=1}^{M} p(x_i,y_j) log2 p(x_i,y_j) is the joint entropy; the discrete random variables X and Y
have N and M different states, respectively. Furthermore, we define the redundancy as the “capacity-
normalized” mutual information MI(X;Y)/C, with C = max_{p(x)} MI(X;Y), where the max-
imum is taken over all possible densities p(x). In our case, C = max H(X) = 8 bit for all sensory
channels, i.e., the sensory signals assume discrete values between 0 and 255. Usually, the channel
capacity C is measured in bits/sec, and the rate of transmitted information equals the entropy rate
H(X) bits/sec. We normalized everything by fs, which is the rate at which the sensors are read out.
These measures were applied to the four previously introduced sensory sub-modalities: red (R), green
(G), blue (B), and intensity I = (r + g + b)/3.
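The histogram-based estimators just described can be sketched as follows; this is an illustrative NumPy implementation under the stated assumptions (256 discrete states, capacity C = 8 bit), not the original analysis code:

```python
import numpy as np

def entropy(x, bins=256, value_range=(0, 256)):
    """Shannon entropy H(X) in bits, from the normalized 1-D histogram."""
    counts, _ = np.histogram(x, bins=bins, range=value_range)
    p = counts / counts.sum()
    p = p[p > 0]  # 0 * log2(0) is taken as 0
    return -np.sum(p * np.log2(p))

def mutual_information(x, y, bins=256, value_range=(0, 256)):
    """MI(X;Y) = H(X) + H(Y) - H(X,Y), with the joint entropy H(X,Y)
    estimated from the normalized 2-D histogram of the two time series."""
    counts, _, _ = np.histogram2d(x, y, bins=bins,
                                  range=[value_range, value_range])
    pxy = counts / counts.sum()
    pxy = pxy[pxy > 0]
    h_xy = -np.sum(pxy * np.log2(pxy))
    return entropy(x, bins, value_range) + entropy(y, bins, value_range) - h_xy

def redundancy(x, y, capacity=8.0):
    """Capacity-normalized mutual information; C = 8 bit for 8-bit signals."""
    return mutual_information(x, y) / capacity
```

For a signal that visits all 256 states uniformly, H = 8 bit and the redundancy of the signal with itself is 1; for two independent signals the mutual information vanishes.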
7.6 Results
We performed four different kinds of analyses. The Shannon entropy (displayed on the vertical axis) of
the red, green, and blue sensory channels of the horizontal 1-D retina in the case of a random movement
can be seen on the left side of Figure 7.2. The retina is composed of 18 color-tuned receptors (R,
G, and B channels). The graph on the right shows the case of sensory-motor coordinated behavior
(foveation on red-colored objects). In the central part of the retina (from receptor 8 to receptor 12)
the information inflow through the R-channels is increased, whereas in the peripheral part it is
decreased, compared to the not sensory-motor coordinated case. The behavior of the G-channel and
the B-channel is reversed, i.e., the flow through the central channels is decreased. The same holds for
the vertical linear resolution retina, where the effect is less pronounced, but still visible (not shown).
The graphs are averages over six experimental runs.
Figure 7.2: Shannon entropy for different sensory channels, measured in bits. Left: no sensory-motor coordination. Right: sensory-motor coordination (foveation on red objects).
The intra-sensory information overlap between two R-receptors is shown in Figure 7.3. Basically,
we set X = Ri and Y = Rj, where Ri and Rj are the ith and jth R-receptor, respectively, and compute
MI(X;Y). For perfectly dependent channels MI(X;Y) = 1, while for completely independent ones
MI(X;Y) = 0. For not sensory-motor coordinated interaction, in our case the result of a random ac-
tuation of the joints, the information overlap is minimal, and there is no redundancy. Remember that
MI(X;X) = H(X)/C, i.e., the redundancy is equivalent to the normalized self-information, which ex-
plains the diagonal ridge. The case of sensory-motor coordinated interaction is shown on the right side
of Figure 7.3. The information overlap is evident from the “bump” in the center of the 3-D plot. In
other words, the amount of information shared by the R-channels in the foveal part of the 1-D retina
is larger than that of the R-channels in the peripheral part. This redundancy is a consequence of the
interaction itself.
Figure 6 shows the information overlap between two sub-modalities of the same sensory modality –
Figure 7.3: Mutual information between receptors of the same sensory modality. Random actuation on the left; sensory-motor coordination on the right.
color (R) and intensity (I). The color channel and the intensity channel share the same spatial location,
but are tuned to different stimulus dimensions, color and intensity, respectively. Again, as for the
Shannon entropy, compared to the not sensory-motor coordinated case, the result of the sensory-
motor coordination is an increase of the mutual information in the central part of the retina, and a
decrease in the peripheral part.
The amount of variability (information) must not be confused with the cumulated amount of acti-
vation (total stimulation) of a particular sensor. Figure 7.4 illustrates this point. The not sensory-motor
coordinated case can be seen on the left side, whereas the sensory-motor coordinated case on the right
side. Since the robot’s task is to foveate on red objects, it is obvious that there will be a peak in the
cumulated stimulation of the R-receptor, given that the agent spends a lot of time foveating on red objects.
Less obvious is the fact that the maximum of the entropy of the B-receptor for the same dataset is
slightly larger than the maximum for the R-receptor (see Figure 7.2). More striking is the difference
for the not sensory-motor coordinated case, where the relationship is even reversed! The entropies of the
G-channel and the B-channel average 5.7 bit, that of the R-channel 4.7 bit. The cumulated
stimulation is nevertheless higher for the R-channel (as can be seen in Figure 7.4, left).
7.7 Discussion
The following points are of interest. An upper bound on the information flowing through a particular
sensory channel is given by the channel capacity, that is, H = C bit. Statistical information is maxi-
mized when all possible sensory states of the discretized signals have equal probability of occurrence,
which means that the probability density function is uniform. A lower bound is given by H = 0bit,
Figure 7.4: Cumulated stimulation of the R, G, and B receptors. The sensory-motor coordinated case is on the right.
that is, there is no variability whatsoever (constant sensory stimulation). In other words, the probability
density function has a single peak. Exploration strategies that lead to a balance between “predictabil-
ity” (low entropy) and “variability” (high entropy) of the sensory signal need further investigation. In
communication theory the goal is not quite the same as here: given a certain amount of noise, the
information transfer through the communication channel has to be maximized; the more information
can be pushed through the channel without loss, the better.
The second point of interest is that sensory-motor coordinated interaction leads to redundancy in
the sensory channels of the same and of different modalities, i.e., to a higher mutual predictability
between them. If the two signals are totally uncorrelated, MI(X ;Y ) = 0, and the joint entropy equals
the sum of the individual entropies. The same measure reaches its maximum if the entropies of the
individual sensory channels are high, and there is a high correlation among them (low joint entropy) –
one sensory channel can be used as a predictor for the other one. This redundancy is clearly not present
in the case of not sensory-motor coordinated interaction. The mutual predictability in this case is much
lower (cf. Fig. 5).
Human infants exhibit a wide range of exploration strategies: mouthing, banging, fingering,
scratching, squeezing, waving, and listening (Kellman and Arterberry, 1998). When objects are placed
in the mouth, infants are able to detect surface properties (Meltzoff and Borton, 1979), and object char-
acteristics such as rigidity (Rochat, 1987). In other words, there are different actions related to the
exploration of different object properties, in the sense that they provide the sensory input which is
“optimal and sometimes necessary” (Kellman and Arterberry, 1998) for extracting the desired infor-
mation. As shown in this simple case study, the information flow through the various sensory channels
very much depends on the action itself, i.e., an appropriate choice of action seems to be of importance for
the simplification of the subsequent neural processing.
Since entropy is an information theoretic measure that captures the variability of the sensory and
motor signals, it tells us something about the complexity of the interaction itself. In an agent context,
the more diverse the agent’s behavior, the more variation in the sensory channels, and the higher the en-
tropy in the sensory system. Since we are interested not only in complexity due to sensory stimulation,
but also in complexity due to self-generated sensory stimulation, we need to take into account aspects
of the motor system’s variability. A good measure to start with is the ratio between the total entropy
of the motor signals x = (x1, x2, ..., xS) and the total entropy of the sensory signals y = (y1, y2, ..., yT).
We define it as B = Hmotors(X)/Hsensors(Y). There should be a match in the variability of the sensory
channels and of the motor outputs. In other words, B measures how well-balanced the motor and
sensory signals are.
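As a sketch, the balance measure could be computed from the same histogram-based entropy estimate used for the sensory channels (illustrative code only; the function names are invented):

```python
import numpy as np

def entropy_bits(x, bins=256, value_range=(0, 256)):
    """Shannon entropy in bits from the normalized histogram of a signal."""
    counts, _ = np.histogram(x, bins=bins, range=value_range)
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def balance(motor_signals, sensor_signals):
    """B = H_motors / H_sensors: total entropy of the motor signals over
    the total entropy of the sensory signals. Values near 1 indicate
    matched variability between the motor and sensory sides."""
    h_motors = sum(entropy_bits(x) for x in motor_signals)
    h_sensors = sum(entropy_bits(y) for y in sensor_signals)
    return h_motors / h_sensors
```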
The application of information theory is not devoid of problems. One of the biggest is the
huge data requirement. To avoid it, (strong) assumptions about the signals involved and/or the
noise have to be made, such as Gaussianity of the underlying stochastic process. These assumptions
are often unfounded and difficult to test. Nevertheless, assuming true independence of two random
processes, or normality of the signals, can significantly reduce the number of measurements required
for the analysis. In our case, no such assumptions about the sensory and motor data are necessary,
because a sufficiently large amount of data has been collected. The number of samples per experiment
was around 3000, sufficient for the computation of measures like entropy and mutual information.
In this chapter the usage of information theory has been explored. Other methods, such as statistical
analysis, dynamical systems analysis, or neural networks, could also be used.
7.8 Conclusion and future work
The ideas exposed in this chapter are an attempt at a more formal description of some of the design
principles of autonomous agents (Pfeifer and Scheier, 1999), e.g., the principle of ecological balance
mentioned earlier.
We also tried to take a step forward in describing the requirements for adaptive agents: the
individual components (dimensions in sensory and motor space) must have a lot of variability, but they
must also be able to couple to others for specific tasks. This is precisely what complex systems are
about.
The importance of an appropriate sensory-motor coordinated interaction cannot be overstated,
since, as the results show, it can lead to a structuring of the raw sensory stimulation, which in turn
is thought to speed up and simplify learning. We hypothesize that the self-information (entropy) of
the central and peripheral receptors largely depends on the type of interaction, but is less dependent
on the environment. In a similar way, eye movements and certain particular hand movements have
task-dependent characteristics.
Many additional experiments need to be performed to confirm our hypotheses, though. Other
sensory modalities (touch, audition) have to be taken into account. They would shed some light on how
infants discover intermodal relationships, and how the existence of multiple sensory channels makes it
possible to learn more about, and function within, the real world. Different sensor morphologies and
other task-environments also have to be tested, and their effect on the information inflow needs to be
explored. The next step will then be to exploit the data for learning. Using similar kinds of analyses,
we hope to derive the conditions under which agents can rapidly learn new categorization behaviors
while maintaining the stability of existing ones.
7.9 Information theoretic appendix
Some useful definitions:
• Random variable RV : a variable that assumes a numerical value for each random outcome of an
event or experiment.
• Probability density function p(x) of a RV X: normalized 1-D histogram of the RV.
• Joint probability density function p(x,y) of RVs X and Y: normalized 2-D histogram of the 2 RVs.
• Shannon entropy H(X): measures the randomness of a RV. The more random a RV, the higher its
entropy. Intuitively it is a measure of (the logarithm of) the number of states the RV could be in.
• Joint entropy H(X,Y): measures the uncertainty about both X and Y.
• Mutual information MI(X;Y) = H(X) + H(Y) − H(X,Y): measures the portion of entropy
shared by X and Y. It is high if both X and Y have high entropy (high variance), and share a large
fraction of it (high co-variance). It is zero if X and Y are statistically independent. In other
words, MI is a measure of the deviation from statistical independence.
• Channel capacity C = max H(X): in our case the maximal amount of statistical information that
can be transferred through a sensory channel in a certain instant. It is computed over all possible
sensory signal distributions p(x).
• Redundancy R(X;Y) = MI(X;Y)/C: the capacity-normalized mutual information.
Since the logarithm in base 2 is used, H and MI are measured in bits. Both entropy and mutual infor-
mation are used in a statistical connotation; they can be thought of as multivariate generalizations of
variance and co-variance (univariate statistics) that are sensitive to both linear and nonlinear interac-
tions.
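The histogram-based estimation of these quantities can be sketched as follows. This is a minimal illustration, not the thesis code; the bin count and the synthetic data are assumptions.

```python
import numpy as np

def entropies(x, y, bins=32):
    """H(X), H(Y), and MI(X;Y) in bits, estimated from a 2-D histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                          # normalized 2-D histogram -> joint pdf
    px, py = pxy.sum(axis=1), pxy.sum(axis=0) # marginal pdfs

    def H(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    hx, hy, hxy = H(px), H(py), H(pxy.ravel())
    return hx, hy, hx + hy - hxy              # MI(X;Y) = H(X) + H(Y) - H(X,Y)

rng = np.random.default_rng(1)
x = rng.normal(size=3000)
hx, _, mi_self = entropies(x, x)              # a RV shares all its entropy with itself
_, _, mi_ind = entropies(x, rng.normal(size=3000))  # independent RVs: MI near zero
```

Note that for finite samples the plug-in estimate of MI between independent signals is small but not exactly zero, a bias that shrinks as the number of samples grows relative to the number of bins.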
Chapter 8
Dimensionality Reduction through
Sensory-Motor Interaction1
8.1 Synopsis
Traditionally, the problem of category learning has been investigated by employing disembodied cate-
gorization models. One of the basic tenets of embodied cognitive science states that categorization can
be interpreted as a process of sensory-motor coordination, in which an embodied agent, while interact-
ing with its environment, can structure its own input space for the purpose of learning about categories.
Many researchers, including John Dewey and Jean Piaget, have argued that sensory-motor coordina-
tion is crucial for perception and for development. In this chapter we give a quantitative account of
why sensory-motor coordination is important for perception and category learning.
8.2 Introduction
The categorization and discrimination of sensory stimuli, and the generation of new perceptual cate-
gories is one of the most fundamental cognitive abilities (Edelman, 1987; Pfeifer and Scheier, 1999).
Perceptual categorization (Edelman, 1987) is of such importance that a natural organism incapable of
making perceptual discriminations does not have much of a chance of survival, and an artificial device,
such as a robot, lacking this capability is only of limited use. Traditionally, the problem of
categorization has been investigated by adopting a disembodied perspective. Categorization models like
ALCOVE (Attention Learning COVEring Map) (Kruschke, 1992) or SUSTAIN (Supervised and Unsu-
pervised STratified Adaptive Incremental Network) (Love and Medin, 1998) implement categorization
as a process of mapping an input vector consisting of “psychologically relevant dimensions” onto a set
of output (category) nodes.
1Te Boekhorst, R., Lungarella, M. and Pfeifer, R. Dimensionality reduction through sensory-motor coordination. In Proc. of the Joint Int. Conf. on Artificial Neural Networks and Neural Information Processing, Lecture Notes in Computer Science 2714, pp. 805-812, 2003.
The problem with these traditional approaches is, roughly speaking, that
they do not work in the real world, e.g., when used on real robots, because they neglect the fact that in
the real world there are no input vectors that have been preselected by the designer, but continuously
changing sensory stimulation. Moreover, these models do not properly take into account that the prox-
imal stimulation originating from objects varies greatly depending on distance and orientation, and on
other factors that we do not discuss here.
Recently, Clark and Thornton (1997) introduced the concept of type-2 problems to denote datasets for which
the mapping from input nodes to output nodes cannot be extracted by means of learning algorithms
and statistical procedures used in classic categorization models. In contrast, whenever the aforemen-
tioned mapping can be learned from data alone, the data are said to correspond to a type-1 problem.
According to Clark and Thornton, the main difficulty in category learning is the generation of appro-
priate (type-1) data. As far as we know, there are two main strategies to achieve this. The first was
suggested by Clark and Thornton (1997) themselves, and relies on an improvement of the internal processing, which
could be based on already learned things, for instance. The second approach is derived from the basic
tenets of embodied cognitive science and consists of exploiting processes of sensory-motor coordina-
tion (Pfeifer and Scheier, 1999; Scheier and Pfeifer, 1997). As suggested more than a century ago
by John Dewey (1896), categorization can be conceptualized as a process of sensory-motor coordinated
interaction – see also (Edelman, 1987; Pfeifer and Scheier, 1999; Thelen and Smith, 1994). Sensory-
motor coordination involves object-related actions, which can be used to structure the agent’s input
(sensory) space for the purpose of learning about object categories. The structuring of the sensory
space can be thought of as a mapping from a high dimensional input space to a sensory space with a
smaller number of dimensions. The important point to note is that the dimensionality reduction does
not necessarily have to be the result of internal processing only, but may be the result of an appropriate
embodied interaction. From the account given above, we derive two working hypotheses:
• Dimensionality reduction in the sensory space is the result of a sensory-motor coordinated inter-
action of the system with the surrounding environment. This leads to the emergence of correla-
tions among the input variables (sensors) and between these and the motor outputs.
• The particular temporal pattern of these correlations can be used to characterize the robot-
environment interaction, i.e., it can be considered to be a “fingerprint” of this interaction.
More specifically, we expect clearer correlations in the case of a robot that is driven by “sensory-
motor dynamics”, rather than in a robot that moves on the basis of a set of fixed and preprogrammed
instructions. Here, sensory-motor dynamics is defined as the dynamics of a system that is characterized
by continuous feedback between sensory input and motor output.
In what follows, we describe how these two hypotheses were experimentally tested in a real robot.
In Section 8.3, we give an overview of the experimental setup we employed and of the five experiments
we performed. Then in Section 8.4, we describe and motivate the statistical methodology, which we
used to analyze the time-series collected during the robot experiments. Finally, in the last two sections,
we discuss what we have learned and point to some future work.
8.3 Real-world instantiation and environmental setup
All experiments described in this chapter were carried out with a circular wheeled mobile robot called
Samurai™. This mobile device is equipped with a ring of 12 ambient-light (AL) and 12 infrared (IR)
sensors, and a standard off-the-shelf color CCD camera for vision (Fig. 8.1). The two wheels allow for
independent translational and rotational movements. For the purpose of this study the original 128x128
Figure 8.1: Environmental setup. Objects of different shapes can be seen in the background. In a typical experiment the robot started in one corner of the arena, and depending on its in-built reflexes, it tried to avoid obstacles, circled around them, or just tracked a moving obstacle (the small cylinder in the front). Note that the omnidirectional camera on the robot was not used for the experiments discussed here.
pixel array was compressed into a 100-dimensional vector, whose elements were calculated by taking
the spatial average of the pixel intensity1 over adjacent vertical rectangular image patches. Video
frames were recorded at a rate of 7Hz. In the course of each experimental session, the input data
coming from three (exteroceptive) sensory modalities (AL, IR, and vision), and the difference between
the left and right motor speed (angular velocity), were transferred to a computer and stored in a time-
1The statistical analysis described in this chapter is based on the intensity map of the image, which is obtained by computing the spatial average of the red, green and blue color map.
series file, yielding a 125 dimensional data vector per time step. The following five experiments were
carried out in a simple environment (a square arena) consisting either of stationary or moving objects
(Fig. 8.1). Each experiment was replicated 15 times. The resulting time series of each run consists of
N = 100 time steps.
• Experiment 1 – Control setup: The robot moved forward in a straight line with a constant speed.
A static red object was placed in its field of view, in the top left corner at the end of the arena.
The behavior of the robot displayed no sensory-motor coordination.
• Experiment 2 – Moving object: The complexity of the control setup was slightly increased by
letting the same object move with a constant speed. As in experiment 1, the behavior was not
sensory-motor coordinated.
• Experiment 3 – Wiggling: The robot was programmed to move forward in an oscillatory manner.
As in experiments 1 and 2, there was no sensory-motor coordination.
• Experiment 4 – Tracking 1: Simple sensory-motor coordination was implemented by letting the
robot move in such a way that it kept, as long as possible, a static object in the center of its field of
view, while moving forward towards the object. This behavior was sensory-motor coordinated.
• Experiment 5 – Tracking 2: As in Experiment 4, but now the robot had to keep a moving ob-
ject in the center of its field of view, while moving towards the object – simple sensory-motor
coordination.
The control architectures for the five experiments were designed so as to be as simple as possible for
the task at hand, i.e., the output of the control architecture of experiments 1 to 3 consisted of a pre-
programmed sequence of motor commands, whereas in the case of experiments 4 and 5, a feedback
signal proportional to the tracking error, i.e., the offset of the tracked object from the center of the field of view, was used to compute the
new motor activations.
8.4 Statistical analysis
The most straightforward statistical approach would be to correlate the time series of all variables
(motor speed difference, AL, IR, and preprocessed camera image) of the 125 dimensional data vector with each
other. However, by doing so we would run into the Bonferroni problem (Snedecor and Cochran,
1980): 5% of that very large number of correlations would be significant by chance alone (accepting
a significance level of α = 0.05). Moreover, the result of this operation would be strongly biased
due to the preponderance of the image data. Additional difficulties would arise from the fact that the
computed correlation coefficients would have to be combined into a single and meaningful number,
and due to the variance of the input data, this number would change over time.
Figure 8.2: Use of dimension reduction techniques, exemplified by the image data. (a) How the robot perceives an object when approaching it (experiment 1, no sensory-motor coordination). Moving forward, the image of a static object shifts to the periphery of the visual field. (b) A contour plot of the image data displayed as a time series of the pixel intensities. Vertical axis: pixel locations. Horizontal axis: time steps. The peripheral shift shows up as an upward curving trace. (c) A 3D plot of (b) with pixel intensity plotted along the vertical axis. Here the trace is visible as a trough cutting through a landscape with a ridge on the right side. (d) A reconstruction of (c) based on the first 5 PCs, which explain 95% of the variance. (e) The same as (d) but based on average factors.
To avoid running into the Bonferroni problem, we reduced – as a first step – the number
of variables by performing a principal component analysis (PCA) on each of the three sensory modal-
ities separately. The main idea of Principal Component Analysis (PCA) is to compress the maximum
amount of information of a multivariate data set into a limited (usually small) number of principal
components (PCs). These principal components are linear combinations of the original variables and
in this way the high-dimensional data are projected onto a space of reduced dimensionality. The axes
are chosen in order to maximize the variance of the projected data. The usefulness of this method is
exemplified by a PCA performed on the camera image data (see Fig. 8.2). In the case of natural im-
ages, a PCA would yield principal components dominated by those pixel locations whose intensity
values correlate strongly over time and thus probably originate from one and the same
object.
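The PCA step can be sketched as follows. This is a generic SVD-based implementation, not the analysis code used in the thesis; the synthetic "image" time series is invented for illustration, generated from a few latent sources so that a handful of PCs suffice.

```python
import numpy as np

def pca(data, n_components=5):
    """PCA via SVD. data: (time_steps, variables). Returns the projected data
    (scores), the component loadings, and the fraction of variance explained
    by the retained components."""
    centered = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2 / np.sum(s ** 2)             # proportion of variance per PC
    scores = centered @ vt[:n_components].T   # data projected onto the PCs
    return scores, vt[:n_components], var[:n_components].sum()

# Hypothetical image time series: 100 time steps x 100 pixel averages
rng = np.random.default_rng(2)
latent = rng.normal(size=(100, 3))            # 3 latent sources
mixing = rng.normal(size=(3, 100))
frames = latent @ mixing + 0.05 * rng.normal(size=(100, 100))
scores, loadings, explained = pca(frames, n_components=5)   # explained close to 1.0
```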
The image reconstructed from the PCA is, however, far from perfect. This is probably due to the
fact that the PCs are mere linear combinations of “all” the variables considered, regardless of the sign
of each variable’s contribution to a given PC. They therefore also include
variables that correlate only weakly or strongly negatively with a given PC. As an alternative, we
constructed so-called average factors (AF), which are the mean values calculated (for each time step)
over only those variables that load significantly high on a PC and are of the same sign. The comparison
of a reconstruction based on 5 PCs with one based on 5 AFs is shown in Fig. 8.2d and 8.2e. Also for
the other experiments we found that the image data could be adequately described by about 5 to 10
AFs. The AL data and the IR readings could be combined into an average of up to 4 AFs.
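One possible reading of the average-factor construction is sketched below. The loading threshold is an assumption on our part; the text above only says "significantly high", so the cutoff value is hypothetical.

```python
import numpy as np

def average_factors(data, loadings, threshold=0.4):
    """Average factors (AFs): for each PC, average (per time step) only those
    variables whose loading on that PC is high and of the dominant sign.
    The threshold (fraction of the largest absolute loading) is an assumption."""
    centered = data - data.mean(axis=0)
    afs = []
    for pc_loadings in loadings:
        sign = np.sign(pc_loadings[np.argmax(np.abs(pc_loadings))])
        mask = sign * pc_loadings > threshold * np.abs(pc_loadings).max()
        afs.append(centered[:, mask].mean(axis=1))
    return np.column_stack(afs)

# Hypothetical data: 100 time steps, 20 sensor variables; loadings from an SVD
rng = np.random.default_rng(3)
data = rng.normal(size=(100, 20))
u, s, vt = np.linalg.svd(data - data.mean(axis=0), full_matrices=False)
afs = average_factors(data, vt[:4])           # one AF per retained PC
```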
Next, the correlations between the AFs representing the reduced sensory space and the angular
data (from the wheel encoders) were computed and brought together into a correlation matrix R. One
way of summarizing the information in this matrix is to estimate 1−|R|, |R| being the determinant of
the correlation matrix. The measure 1−|R| has been put forward as a general measure of the variance
explained by the correlation among more than 2 variables and has actually been proposed to quantify
the dynamics of order and organization in developing biological systems (Banerjee et al., 1990). The
dynamics of this measure could be captured by calculating it for a window of W subsequent time steps,
and by recomputing this quantity after the window has been shifted ahead one step in time. An obvious
shortcoming of this technique is that the end of the data sequence is tapered to zero, i.e., the time series
is truncated at N−W − 1 data points, where N is the length of the data sequence. This represents a
clear loss of information, since events occurring in the tapered region are missed.
As an alternative, we computed the correlations for increasingly larger windows of 4, 5, . . ., N time
steps, but with a decreasing influence of the past, i.e., by giving less weight to data points further back
in time. This was achieved by weighting the contribution of the correlation coefficient at time t = T
(the current point in time) by a decay function wt,α (where α is a parameter controlling the steepness
of decay), leading to the calculation of the weighted correlation coefficient:

r*_T = [Σ_{t=0}^{T} w_{t,α} x_t y_t − (Σ_{t=0}^{T} w_{t,α} x_t)(Σ_{t=0}^{T} w_{t,α} y_t)/N] / sqrt([Σ_{t=0}^{T} w_{t,α} x_t^2 − (Σ_{t=0}^{T} w_{t,α} x_t)^2/N][Σ_{t=0}^{T} w_{t,α} y_t^2 − (Σ_{t=0}^{T} w_{t,α} y_t)^2/N]).   (8.1)

As a decay function, we chose a half-gaussian function w_{t,α} = e^{ln(α)(t−T)^2} u_{−1}(t), where u_{−1}(t) is the
Heaviside function, which is 1 for t > 0 and 0 for t ≤ 0. This yields a matrix of weighted correlation
coefficients R∗(t) for each sampled point at time t. Unfortunately, the determinant of a correlation
matrix is highly sensitive to outliers. As a consequence, 1−|R∗| could not be used as a measure of
the dynamics of the correlation among the input and output variables. Another way of characterizing
Figure 8.3: Results of experiments 1-3 (no sensory-motor coordination). Left: experiment 1. Center: experiment 2. Right: experiment 3. From top to bottom (and for all columns) the vertical axes are H(λ), λmax, and Npc. In all graphs the horizontal axis denotes time. The curves are the means from up to 15 experimental runs and the bars are the associated 95% confidence limits around those means. For details refer to text.
a weighted correlation matrix is by the set λ of its eigenvalues, λi (i = 1,2, . . .,F), where F is the
number of AFs. The ith eigenvalue equals the proportion of variance accounted for by the ith PC and
hence contains information about the correlation structure of the data set. In fact, this boils down to
yet another PCA, this time on the average factors. We propose 3 indices to capture the statistical
properties of the robot’s sensory data; they combine the eigenvalues λi into a single quantity (and
that, like R∗, has to be calculated for each time step t). The first one is the Shannon entropy H(λ) =
−∑Fi=1 p(λi) log p(λi) (Shannon, 1948). This index attains its maximum for p(λi) = 1/F (i = 1,2, . . .,F),
i.e., when the variance is evenly accounted for by all PCs. A high value of H(λ) therefore represents
a lack of correlational structure among the variables. When H(λ) is low, the total variance of the data
matrix is concentrated in one or only a few PCs and hence points to strong correlations. Another way
to quantify the same effect is the so-called Berger-Parker Index (BPI), which measures “dominance” as
D = λmax/∑Fi=1 λi. Since the eigenvalues of a correlation matrix are arranged in decreasing order and
sum up to unity, this results in D = λmax = λ1. The third measure is the number of PCs (eigenvalues)
that together explain 95% of the total variance. We will refer to it as Npc.
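The weighted correlation matrix and the three indices H(λ), D, and Npc can be sketched as follows. The decay constant and the data are invented; the weighted covariance used here is equivalent in spirit to Eq. (8.1), applied to all variable pairs at once.

```python
import numpy as np

def weighted_corr_matrix(data, T, alpha=0.9):
    """Correlation matrix at time T with half-gaussian decay weights
    w_t = alpha**((t - T)**2), giving less weight to data further in the past."""
    t = np.arange(T + 1)
    w = alpha ** ((t - T) ** 2.0)
    w /= w.sum()
    x = data[: T + 1]
    mean = w @ x
    cov = (w[:, None] * (x - mean)).T @ (x - mean)   # weighted covariance
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)

def indices(R):
    """H(lambda), Berger-Parker dominance D = lambda_max, and Npc (95% variance),
    computed from the normalized eigenvalues of a correlation matrix."""
    lam = np.sort(np.linalg.eigvalsh(R))[::-1]
    lam = lam / lam.sum()                            # proportions of variance
    pos = lam[lam > 0]
    h = -np.sum(pos * np.log2(pos))                  # Shannon entropy H(lambda)
    d = lam[0]                                       # D = lambda_max
    npc = np.searchsorted(np.cumsum(lam), 0.95) + 1  # PCs explaining 95% variance
    return h, d, npc

# Hypothetical AF time series: 100 steps, 5 AFs, two of them strongly correlated
rng = np.random.default_rng(4)
afs = rng.normal(size=(100, 5))
afs[:, 1] = afs[:, 0] + 0.1 * rng.normal(size=100)
R = weighted_corr_matrix(afs, T=99)
h, d, npc = indices(R)
```

A drop in H(λ) and Npc together with a rise in D then signals the growing correlational structure discussed in the results below.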
Figure 8.4: Results of experiments 4 and 5 (sensory-motor coordination). Left: experiment 4. Right: experiment 5. From top to bottom (and for all columns) the vertical axes are H(λ), λmax, and Npc. The horizontal axis denotes time. For details refer to text.
8.5 Experiments and Discussion
The outcome of the statistical analyses described in the previous section is summarized in Fig. 8.3
and Fig. 8.4. What do these results tell us with respect to the impact of sensory-motor coordination?
The most conspicuous difference between architectures with and without sensory-motor coordination
appears to be in the variance of the introduced indices. The curves of the experiments with sensory-
motor coordination (experiments 4 and 5) display a very large variance (represented by the error bars).
Furthermore, in these experiments the curves for H(λ) and Npc decrease more or less monotonically
(whereas λmax rises), implying a steady increase in correlation among the AFs. The large variance
is due to the fact that in some experiments these changes set in much earlier than in others (in some
instances the decrease set in so late that the outcomes resembled those of experiments 1 and 2). But
this does not imply that without sensory-motor coordination no reduction of dimensionality
occurs: in experiments 1 and 2 there is a reduction; however, it takes place only at the end of the
runs. Experiment 3 is odd – see Fig. 8.3, third column. Although the experiment is not sensory-motor
coordinated, the calculated indices show the strongest reduction in dimensionality of all experiments!
Note that after an initial increase, the correlations seem to decrease again (see λmax, for instance). One
possible explanation is that the oscillatory movement forces a large part of the sensory input into the
same phase, leading to strong correlations when the robot is distant from objects (beginning
of the experiment) and to weaker correlations otherwise.
8.6 Conclusion and future work
To summarize, the curves do indeed give a “fingerprint” of the robot-environment interaction (note how
the oscillations of the robot are also manifest in the λmax curve of experiment 3), and sensory-motor
coordination does lead to a reduction of dimensionality in the sensory input. However, despite this
very definite impact on the correlations of the sensory data, the results are not entirely straightforward.
Further investigation, in particular with more complex sensory-motor setups, is required.
Chapter 9
Fingerprinting Agent-Environment
Interaction 1
9.1 Synopsis
This chapter investigates, by means of statistical and information-theoretic measures, to what extent
sensory-motor coordinated activity can generate and structure information in the sensory channels of a
simulated agent interacting with its surrounding environment. We show how correlation,
entropy, and mutual information can be employed (a) to segment an observed behavior into distinct
behavioral states, (b) to analyze the informational relationship between the different components of the
sensory-motor apparatus, and (c) to quantify (“fingerprint”) the interaction between the agent and its lo-
cal environment. We hypothesize that a deeper understanding of the information-theoretic implications
of sensory-motor coordination can help us endow robots not only with better sensory morphologies,
but also with better strategies to explore their local environments.
9.2 Introduction
Manual haptic perception is the ability to gather information about objects by using the hands. Haptic
exploration is a task-dependent activity, and when people seek information about a particular object
property, such as size, temperature, hardness, or texture, they perform stereotyped exploratory hand
movements. In fact, spontaneously executed hand movements are the best ones to use, in the sense that
they maximize the availability of relevant sensory information gained by haptic exploration (Lederman
and Klatzky, 1990). The same holds for visual exploration. Eye movements, for instance, depend on
1Tarapore, D., Lungarella, M. and Gomez, G. Fingerprinting agent-environment interaction via information theory. In Proc. of the 8th Int. Conf. on Intelligent Autonomous Systems, pp. 512-520, 2004.
the perceptual judgement that people are asked to make, and the eyes are typically directed toward
areas of a visual scene or an image that deliver useful and essential perceptual information (Yarbus,
1967). To reason about the organization of saccadic eye movements, Lee and Yu (1999) proposed a
theoretical framework based on information maximization. The basic assumption of their theory is that
due to the small size of our foveas (the high-resolution part of the retina), our eyes have to move continuously
to maximize the information intake from the world. Differences between tasks obviously influence
the statistics of visual and tactile inputs, as well as the way people acquire information for object
discrimination, recognition, and categorization.
Clearly, the common denominator underlying our perceptual abilities seems to be a process of
sensory-motor coordination that couples action and perception. It follows that coordinated movements
must be considered part of the perceptual system (Thelen and Smith, 1994), and whether the sensory
stimulation is visual, tactile, or auditory, perception always includes associated movements of eyes,
hands, arms, head and neck (Ballard, 1991; Gibson, 1988). Sensory-motor coordination is important,
because (a) it induces correlations between various sensory modalities (such as vision and haptics)
that can be exploited to form cross-modal associations, and (b) it generates structure in the sensory
data that facilitates the subsequent processing of those data (Lungarella and Pfeifer, 2001; Lungarella
and Sporns, 2004; Sporns, 2003). Exploratory activity of hands and eyes is a particular instance of
coordinated motor activity that extracts different kinds of information through interaction with the en-
vironment. In other words, robots and other agents are not passively exposed to sensory information,
but they can actively shape that information. Our long-term goal is to quantitatively understand what
sort of coordinated motor activities lead to what sort of information. We also aim at identifying “fin-
gerprints” (or patterns) characterizing the agent-environment interaction. Our approach builds on top
of previous work on category learning (Pfeifer and Scheier, 1997; Scheier and Pfeifer, 1997), as well as
on information-theoretic and statistical analysis of sensory-motor data (Lungarella and Pfeifer, 2001;
Sporns, 2003; Te Boekhorst et al., 2003) (compare with Chapter 7 and Chapter 8).
In this chapter, we simulated a robotic agent whose task was to search its surrounding environment
for red objects, approach them, and explore them for a while. The analysis of the recorded sensory-
motor data showed that different types of sensory-motor activities displayed distinct fingerprints re-
producible across many experimental runs. In the two following sections, we give an overview of our
experimental setup, and describe the actual experiments. Then, in Section 9.5, we present our methods
of analysis. In Section 9.6, we present and discuss our results. Finally, in Section 9.7, we
conclude and point to some future research directions.
9.3 Experimental Setup
We conducted our study in simulation. The experimental setup consisted of a two-wheeled robot and
of a closed environment cluttered with randomly distributed, colored cylindrical objects. A bird’s
eye view on the robot and its ecological niche is shown in Fig. 9.1 a. The robot was equipped with
eleven proximity sensors (d0−10) to measure the distance to the objects and a pan-controlled camera
unit (image sensor) – see Fig. 9.1 b. The proximity sensors had a position-dependent range, that is, the
sensors in the front and the one in the back had a short range, whereas the ones on the sides had a longer
range (see caption of Fig. 9.1). The output of each sensor was affected by additive white noise, and was
partitioned into 32 discrete states, leading to sensory signals with a 5-bit resolution. To
reduce the dimensionality of the input data, we divided the camera image into 24 vertical rectangular
slices with widths decreasing toward the center. We computed the amount of the “effective” red color
in each slice as R = r− (b + g)/2, where r, g, and b are the red, green, and blue components of the
color associated with each pixel of the slice. Negative values of R were set to zero. This operation
guaranteed that the red channel gave maximum response for fully saturated red color, that is, for r=31,
g=b=0. The red color slices will also be referred to as red channels or red receptors.
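The effective-red preprocessing can be sketched as follows. For simplicity this sketch uses equal-width slices; in the setup described above the slice widths decrease toward the center of the image.

```python
import numpy as np

def effective_red(image, n_slices=24):
    """Effective red per vertical slice: R = r - (b + g)/2, negatives set to zero.
    image: (height, width, 3) array of r, g, b values on a 5-bit (0..31) scale."""
    h, w, _ = image.shape
    edges = np.linspace(0, w, n_slices + 1).astype(int)
    channels = []
    for i in range(n_slices):
        patch = image[:, edges[i]:edges[i + 1], :].astype(float)
        r, g, b = patch[..., 0], patch[..., 1], patch[..., 2]
        R = np.maximum(r - (b + g) / 2.0, 0.0)   # clip negative responses to zero
        channels.append(R.mean())                # spatial average per slice
    return np.array(channels)

# Fully saturated red (r=31, g=b=0) gives the maximum response in every slice
img = np.zeros((32, 48, 3))
img[..., 0] = 31
red_channels = effective_red(img)
```

A pure green or blue patch yields R = 0 after clipping, so only reddish regions drive the red receptors.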
For the control of the robot, we opted for the Extended Braitenberg Architecture (Pfeifer and
Scheier, 1999). In this architecture, each of the robot’s sensors is connected to a number of processes
which run in parallel and continuously influence the agent’s internal state, and govern its behavior.
Because our goal is to illustrate how standard statistical and information-theoretic measures can be
employed to quantify (and fingerprint) the agent-environment interaction, we started by decomposing
the robot’s behavior into three distinct behavioral states: (a) “explore the environment” and “find red
objects”, (b) “track red objects”, and (c) “circle around red objects.” It is important to note that the
three behavioral states display coordinated motor activity, and are characterized by a tight coupling
between sensing and acting. We advance here that the segmentation of the observed behavior into
distinct behavioral states is an important (maybe even necessary) step for fingerprinting the agent-
environment interaction and identifying stable patterns of interaction (such as stereotyped exploratory
hand movements).
9.4 Experiments
A top view of a typical experiment is shown in Fig. 9.1 a. We conducted 16 experiments. Each
experiment consisted of approximately 1400 data samples, which were stored into a time series file
for subsequent analysis. At the outset of each experimental run, the robot’s initial position was set
to the final position of the previous experiment (except for the first experiment where the robot was
placed at the origin of the x-y plane), and the behavioral state was reset to “exploring.” In this particular
Figure 9.1: (a) Bird’s eye view on the robot and its ecological niche. The trace depicts the path of the robot during a typical experiment. (b) Schematic representation of the simulated agent. The sensors have a position-dependent range: if rl is the length of the robot, the range of d0, d1, d9, and d10 is 1.8rl, the one of d2 and d3 is 1.2rl, and the one of d4, d5, d6, d7, and d8 is 0.6rl. (c) Extended Braitenberg Control Architecture: As shown, four processes govern the agent’s behavior.
state the robot randomly explored its environment while avoiding obstacles. Concurrently, the robot’s
camera panned from side to side (by 60 degrees on each side). If the maximum of the effective red
color (summed over the entire image) passed a given (fixed) threshold, it was assumed that the robot
had successfully identified a red object. The behavioral state was set to “tracking”, the camera stopped
rotating from side to side, and the robot started moving in the direction pointed at by the camera,
trying to keep the object in the camera’s center of view. Once close to the red object, the robot started
circling around it (while still keeping it in its center of view by adjusting the camera’s pan-angle).
At the same time, a “boredom” signal started increasing. The robot kept circling around the object,
until the boredom signal crossed an upper threshold. In that instant, the robot stopped circling, and
started backing away from the red object, while avoiding other objects. Concurrently, the boredom
signal began to decrease. When the boredom signal finally dropped below a lower threshold, the robot
resumed the exploration of the surrounding environment.
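The behavioral cycle just described can be summarized as a simple state machine. The following sketch is illustrative only: the threshold values, the boredom rates, and the explicit “retreating” state are assumptions for exposition, not taken from the actual controller (which was implemented as an Extended Braitenberg Control Architecture; see Fig. 9.1 c).

```python
# Illustrative sketch of the behavioral state machine described above.
# All numeric constants are assumptions, not values from the thesis.

RED_THRESHOLD = 100.0   # summed effective red color signaling "object found"
BOREDOM_UPPER = 1.0     # circling stops above this value
BOREDOM_LOWER = 0.2     # exploring resumes below this value
BOREDOM_RATE = 0.05     # growth/decay of the boredom signal per step

def update_state(state, boredom, red_sum, near_object):
    """One step of the state machine; returns (new_state, new_boredom)."""
    if state == "exploring":
        if red_sum > RED_THRESHOLD:     # red object identified
            state = "tracking"
    elif state == "tracking":
        if near_object:                 # close enough: start circling
            state = "circling"
    elif state == "circling":
        boredom += BOREDOM_RATE         # boredom grows while circling
        if boredom > BOREDOM_UPPER:
            state = "retreating"        # back away from the object
    elif state == "retreating":
        boredom -= BOREDOM_RATE         # boredom decays while backing away
        if boredom < BOREDOM_LOWER:
            state = "exploring"
    return state, boredom
```

The actual controller ran such transitions concurrently with obstacle avoidance and camera panning.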
9.5 Methods
First, we introduce some notation. Correlation quantifies the amount of linear dependency between two
random variables X and Y , and is given by Corr(X ,Y ) = (∑x∈X ∑y∈Y p(x,y)(x−mX )(y−mY ))/σX σY ,
where p(x,y) is the second-order (or joint) probability density function, mX and mY are the means, and
σX and σY are the standard deviations of X and Y . The entropy of a random variable X is a measure
of its uncertainty, and is defined as H(X) = −∑x∈X p(x) log p(x), where p(x) is the first-order probability
density function associated with X ; in a sense, entropy provides a measure for the sharpness of p(x).
The joint entropy of two variables X and Y is defined analogously as
H(X ,Y ) = −∑x∈X ∑y∈Y p(x,y) log p(x,y). For entropy as well as for mutual information, we used
the binary logarithm, so all quantities are expressed in bits. Mutual information measures the statistical
dependence between two random variables X and Y (Cover and Thomas, 1991; Shannon, 1948). Using
the joint entropy H(X ,Y ), we can define the mutual information between X and Y as
MI(X ,Y ) = H(X) + H(Y ) − H(X ,Y ). In comparison with correlation, mutual information provides
a better and more general criterion for investigating statistical dependencies between random variables
(Steuer et al., 2002). Correlation, entropy, and joint entropy were computed by first approximating
p(x) and p(x,y). The most straightforward approach is a histogram-based technique, described, for
instance, in Steuer et al. (2002). Because the sensors had a resolution of 5 bits, we estimated the
histograms by setting the number of bins to 32 (which leads to a bin size of one). Having a unitary bin
size allowed us to map the discretized value of the sensory stimulus directly onto the corresponding bin
for the approximation of the joint probability density function, thus speeding up the computation. As
noted previously, the distance sensors are identified by di, i ∈ [0,10], whereas the effective red color
sensors are indexed with the numbers 1 to 24.
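The estimation procedure described above can be sketched as follows. With 5-bit sensor values and unit-size bins, each discretized value maps directly onto its bin, so a simple frequency count suffices; the snippet below is a minimal stand-alone sketch of the histogram-based estimators (all quantities in bits).

```python
# Histogram-based estimates of entropy, joint entropy, and mutual
# information for discretized sensor values (unit-size bins).
from collections import Counter
from math import log2

def entropy(xs):
    """H(X) in bits, estimated from relative frequencies."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def joint_entropy(xs, ys):
    """H(X,Y) in bits, estimated from the joint frequency table."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(zip(xs, ys)).values())

def mutual_information(xs, ys):
    """MI(X,Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - joint_entropy(xs, ys)
```

Note that MI(X,X) = H(X), which is why the diagonal of a mutual information matrix gives the entropies of the individual sensory channels (cf. Section 9.6.2).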
9.6 Data Analysis and Results
We analyzed the collected datasets by means of three measures: correlation, mutual information, and
entropy (which is a particular instance of mutual information). In this section we describe, and in part
discuss, the results of our analyses.
9.6.1 Correlation
In the first behavioral state (“exploring”), the robot moved around avoiding obstacles and “searching”
for red objects. In all performed experiments, we observed either no or only weak correlations between
the proximity sensors, that is, their absolute values were close to zero. In
Figure 9.2 a, for instance, the average correlation is 0.011. The intrinsic noise of the sensors, as well
as the unpredictability of the sensory activations while the robot was exploring its ecological niche,
made the identification of statistical dependencies between the sensory activations by means of linear
correlation difficult. Similarly, the output of the red channels did not lead to a “stable” correlation
matrix, that is, the pair-wise correlations between the sensory channels varied significantly across the
different experimental runs. The average correlation in the case of Figure 9.3 a is 0.053 (again a low
value), and the standard deviation is 0.023. The reason is that in this state, the oscillatory movement
of the robot’s camera induced a rapidly changing stream of sensory data, which in turn led to small
correlations between the red channels.
In the second behavioral state (“tracking”), the robot moved toward the previously identified red
object. In this case, the correlations between the activity of the red receptors in and close to the center of
the image are high (see Fig. 9.3 b). A possible explanation is that the robot kept correcting the direction
of its movements so that the tracked object remained in the center of its visual field. Moreover, because
this state was characterized by a goal-directed movement of the robot toward the red object, the number
of red pixels present in the image increased, leading to an increase of the stimulation of the red receptors
located in the center (note that the activation of the red receptors is an average computed over a vertical
slice), and to a corresponding increase of the correlation between those receptors.
In the third behavioral state (“circling”), we observed negative correlations (−0.442) between the
pairs of proximity sensors located on the ipsilateral (same) side of the robot, such as (d2,d9) or
(d3,d10) (see Fig. 9.2 c). Due to the non-linearities of the data and the noise-induced correlations,
however, these correlations are not immediately evident from the plot. In this state, we observed,
in all experimental runs, strong correlations between the output of the red channels located in
(and close to) the central image area (see Fig. 9.3 c). The correlation was 0.920 for receptors in the
center, with an overall average of 0.166. The standard deviation of the correlation computed over all
experiments was 0.041. While circling around the object, the robot kept foveating on it. Due to the
limitations of the camera angle, however, the object appeared on the side and not in the center of the
visual field.
Figure 9.2: Correlation matrix obtained from the pair-wise correlation of the distance sensors for one particular experimental run during the behavioral states (a) “exploring,” (b) “tracking,” and (c) “circling.” The higher the correlation, the larger the size of the square. From left to right the average correlation is 0.011 ± 0.004, 0.097 ± 0.012, and 0.083 ± 0.041, where ± indicates the standard deviation.
Figure 9.3: Correlation matrix obtained from the pair-wise correlation of the red channels for one particular experimental run during the behavioral states (a) “exploring,” (b) “tracking,” and (c) “circling.” The higher the correlation, the larger the size of the square. From left to right the average correlation is 0.053 ± 0.023, 0.309 ± 0.042, and 0.166 ± 0.031, where ± indicates the standard deviation.
9.6.2 Entropy and mutual information
The pair-wise mutual information between the eleven proximity sensors is shown in Figure 9.4. The
diagonal of the same plot gives the entropy of the sensory stimulation, by virtue of the identity
H(X) = MI(X ,X). Because the individual sensors are affected by uniform white noise, even sensors
that are never active can be characterized by a potentially large entropy (see the graph of cumulated
activation in Fig. 9.6).
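The effect of such noise on the entropy estimate can be illustrated with a small numerical sketch; the noise amplitude (3 bits) and the run length (1400 samples, matching the approximate length of one experiment) are illustrative assumptions.

```python
# Why an inactive sensor can still show large entropy: with additive
# uniform noise, the readings spread over many bins even when the true
# signal is constant. Noise amplitude is an illustrative assumption.
import random
from collections import Counter
from math import log2

def entropy_bits(xs):
    """H(X) in bits from relative frequencies."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

random.seed(0)
quiet = [0] * 1400                                  # sensor never active
noisy = [random.randrange(8) for _ in range(1400)]  # + 3 bits of uniform noise

print(entropy_bits(quiet))  # 0 bits: constant signal
print(entropy_bits(noisy))  # close to the 3-bit bound of the noise
```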
In the first and second behavioral states, the results of the analysis of the data gathered in a partic-
ular experiment cannot be generalized to all experiments. The reason is that in experiments in which
the robot avoids obstacles, the average mutual information between sensors, as well as the entropy of
the individual sensors, is larger compared to experimental runs in which the robot does not encounter
any object. In the third behavioral state “circling”, the entropy of the activation of the sensors on both
sides of the robot is large: H(d3) = 2.83 bits and H(d10) = 2.75 bits (see Fig. 9.4 c). In the same Figure,
the mutual information between these sensors is also high: MI(d2,d9) = 0.62 bits. Figure 9.5 shows
the mutual information matrices obtained from the estimation of the mutual information for pairs of
red channels. In the behavioral state “exploring”, the average mutual information computed over all
experiments is 0.123 bits, and the standard deviation is 0.020 bits (Fig. 9.5 a shows the result for one
particular experiment). The reason for the low values of mutual information is that the camera oscil-
lates from side to side, thus leading to a rapidly changing camera image, and hence to a drop of the
statistical dependence between red channels. In the second behavioral state ”tracking”, the entropy for
the red receptors in and around the center is high in comparison with the one of the first behavioral
state (mean: 2.674 bits, standard deviation: 0.362 bits). The same holds for the mutual information be-
tween the red receptors (mean: 0.604 bits, standard deviation: 0.160 bits) (see Fig. 9.5 b). In the third
behavioral state, the entropy of the red channels at the periphery, as well as the mutual information
between them, is large (see Fig. 9.5 c). Across all experiments, for both sides of the image sensor, the
standard deviation of the mutual information assumes high values (e.g., the std of the receptor on the
far left of the image sensor is 0.461 bits). In contrast, the standard deviation for the red channels close
to the center is low (e.g., 0.244 bits), and largely independent from the direction in which the robot
is moving around the object. The standard deviation in the mutual information between red receptors
across all the experiments was low (0.102 bits). We conclude that mutual information may provide a
good and stable measure for identifying and characterizing agent-environment interaction.
9.6.3 Cumulated sensor activation
The amount of variability (information) should not be confused with the cumulated amount of sen-
sory activation (total stimulation) of a particular sensor. The total sensory stimulation for both sensory
modalities was computed by integrating – separately for each behavioral state – the activation of the in-
Figure 9.4: Mutual information matrix obtained by estimating the mutual information between pairs of proximity sensors in one particular experimental run during the behavioral states (a) “exploring,” (b) “tracking,” and (c) “circling.” The higher the mutual information, the larger the size of the square.
Figure 9.5: Mutual information matrix obtained by estimating the mutual information between pairs of red channels in one particular experimental run during the behavioral states (a) “exploring,” (b) “tracking,” and (c) “circling.” The higher the mutual information, the larger the size of the square.
dividual sensors during an experiment. We then normalized the activation as a percentage (see Fig. 9.6).
In the “exploring” and “tracking” behavioral states the cumulated sensor activation does not show any
stable patterns across multiple experiments, in the sense that the positions of the peaks change from
experiment to experiment and depend on the number of objects encountered. In the third behavioral
state, however, the activation levels of the sensors d2 and d3 are high and stable across all experimental
runs (see Fig. 9.6 a). These sensors are used when the robot moves toward the red object. The same
graph shows that the activation levels of the sensors d9 and d10 are also characterized by large values. These
particular sensors are used to prevent the robot from colliding with the object (while circling around
it). As for the distance sensors, we also computed the activation levels of the 24 red receptors (see
Fig. 9.6 b). The total stimulation of the red channels in the first behavioral state does not display sta-
bility across all experiments. In the second behavioral state the activation levels for the red receptors
close to the center are high. The activation levels, however, gradually decrease toward the periphery.
The decrease is a result of the continuous adjustments of the camera pan-angle in order to keep the
red object in the center of its visual field. Thus, the peripheral red receptors are not stimulated. The
behavioral state “circling” shows high activation levels for the image sensors on both sides of the robot.
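The cumulated-activation measure can be sketched as follows: per behavioral state, each sensor's activation is summed over the run and then rescaled to a percentage. The normalization convention (relative to the most active sensor) is an assumption for illustration; the thesis does not spell out the exact scaling.

```python
# Sketch of the cumulated sensor activation: sum each sensor's activation
# over all timesteps of one behavioral state, then express the totals as
# percentages of the most active sensor (assumed normalization).

def cumulated_activation(samples):
    """samples: list of per-timestep sensor vectors for one behavioral state."""
    totals = [sum(col) for col in zip(*samples)]   # per-sensor sums
    peak = max(totals) or 1                        # avoid division by zero
    return [100.0 * t / peak for t in totals]

# e.g. three timesteps, four sensors:
print(cumulated_activation([[0, 2, 1, 0],
                            [0, 3, 1, 0],
                            [0, 5, 2, 0]]))  # [0.0, 100.0, 40.0, 0.0]
```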
Figure 9.6: (a) Plot of the activation levels (%) of the proximity sensors (1 to 12) for the three behavioral states. (b) Plot of the activation levels (%) of the image sensors (1 to 24) for the three behavioral states. The plots display the average computed over 16 experimental runs. The bars denote the standard deviation.
9.6.4 Pre-processed image entropy
The change over time of the total image entropy (computed as the average of the entropies of the
individual vertical slices) is displayed in Fig. 9.7. While the robot is exploring its ecological niche,
the image entropy is low and constant (phase P1), that is, there is not much variability in the sensory
Figure 9.7: Entropy of the effective red color averaged over all vertical slices, as a function of time (simulation steps). P1: exploring; P2: tracking; P3: circling. The plot displays the average computed over 16 experimental runs. The bars denote the standard deviation.
channel. When the robot starts approaching the red object (second behavioral state), the image entropy
begins to increase (phase P2). The image entropy reaches its maximum in the third behavioral state,
and stays high as long as the robot keeps circling around the red object (phase P3).
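The per-frame image entropy can be sketched as follows. The slice width and the discretization of the pixel values are assumptions; the point is only that each vertical slice contributes one entropy estimate, and the frame value is their average.

```python
# Sketch of the per-frame image entropy: estimate the entropy of the
# (discretized) pixel values in each vertical slice, then average the
# slice entropies to obtain one value per frame.
from collections import Counter
from math import log2

def slice_entropy(values):
    """H in bits of a list of discretized pixel values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def frame_entropy(frame, n_slices=24):
    """frame: 2-D list (rows x columns) of discretized pixel values."""
    width = len(frame[0]) // n_slices
    ents = []
    for s in range(n_slices):
        vals = [row[c] for row in frame
                for c in range(s * width, (s + 1) * width)]
        ents.append(slice_entropy(vals))
    return sum(ents) / n_slices

# A uniform frame has zero entropy; mixed pixel values raise it.
print(frame_entropy([[0] * 24 for _ in range(4)]))  # 0.0
```

Tracking such a value over time yields a curve of the kind shown in Fig. 9.7.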
9.7 Further Discussion and Conclusion
To summarize, coordinated motor activity leads to correlations in the sensory data that can be used
to characterize the robot-environment interaction. Statistical measures, such as correlation and mu-
tual information, can be employed to extract fingerprints of the robot-environment interaction. In the
“circling” behavioral state, for instance, the average correlation (evaluated over 16 experimental runs)
divided by the number of distance sensors (11) or red receptors (24) is 0.083± 0.041 for the distance
sensors and 0.166± 0.031 for the receptors (where ± indicates the standard deviation). Mean and stan-
dard deviation clearly show that the fingerprint (extracted by means of correlation analysis) is stable
across multiple experimental runs. Similarly, in the “tracking” behavioral state, the average correlation
is 0.097± 0.012 (for the distance sensors) and 0.309± 0.042 (for the red receptors). These results hold
also for the mutual information.
Although correlation and mutual information provide appropriate statistical measures for finger-
printing interaction, they differ in at least one important aspect. Correlation can be used to identify
fingerprints of robot-environment interaction only if the sensory activations between different sensors
happen to be temporally contiguous. We hypothesize that temporal contiguity and stability in the raw
sensory data are the result of coordinated motor activity (exploration strategy). In contrast to correlation,
mutual information reveals dependencies between the sensory stimulations that correlation cannot cap-
ture. Our analyses demonstrate that even if the sensory channels are affected by additive white noise,
a proper sensory-motor coordinated interaction can indeed lead to stable fingerprints. Sensory-motor
coordination generates sensory data with “high information content”: the entropy of the sensory data
lies between the minimum (0 bits) and the maximum entropy bound (5 bits). High entropy could be the
consequence of “complex” robot behavior, since high entropy values correspond to more uncertainty and
therefore to more interesting behaviors. This would explain the high entropy values of the image sensors
in the behavioral states “tracking” and “circling” as compared to the behavioral state “exploring.” Mutual
information gives the amount of information shared by different sensors. Sensors that coordinate
with the motors in a particular behavioral state exhibit high mutual information. We conclude that the
information shared by sensors and motors provides a fingerprint for the corresponding behavior.
Chapter 10
Summary and Conclusion
This chapter wraps up the entire story by (a) giving a quick rundown of the individual chapters, (b)
pointing out their scientific contributions, and (c) showing how each chapter relates to a particular design
principle.
10.1 Summary
This thesis explored a novel area of research residing at the interface of robotics, embodied artificial
intelligence, and developmental sciences called developmental robotics. It exemplified the general
philosophy of action of developmental robotics by means of a series of case-studies in which the re-
ciprocal and dynamic interaction of control structure, body, and environment was explored, quantified,
and purposively exploited. The core idea was the realization that embedding such coupling in a devel-
opmental framework could favor the emergence of stable behavioral patterns, and provide the system
with adaptivity and robustness against changes of body and environment. Eventually, based on those
case-studies and on previous work by Pfeifer (1996) – see also (Pfeifer and Scheier, 1999; Pfeifer and
Glatzeder, 2004) – a set of computational and integrative design principles for developmental systems
was abstracted:
• The principle of cheap design
• The principle of ecological balance
• The value principle
• The principle of design for emergence
• The time scale integration principle
• The starting simple principle
• The principle of exploratory activity
• The principle of information self-structuring
• The principle of social interaction
Figure 10.1: Seven chapters, seven case-studies. The labels denote the one or two design principle(s) each case-study intends to address. The numbers indicate the chapters. The picture is the same as in Chapter 1.
Every chapter of this thesis gravitates (in one way or another) around one or more of those principles
(see Fig. 10.1). Additional dependencies are given in the introductory chapter.
• Chapter 2 exposed the main reasons and key motivations behind the convergence of robotics,
artificial intelligence, and developmental sciences. Developmental robotics was defined as a
synthetic and two-pronged methodology that on one side instantiates and investigates models
originating from developmental psychology and developmental neuroscience, and on the other
exploits insights gained from studies on ontogenetic development to design and construct better
robotic systems (examples of such a methodology were given in the subsequent chapters). By presenting
some aspects (facets) of developmental sciences that are of interest for developmental
robotics, and giving a general overview of the field, the chapter also attempted to show how
insights from all involved areas can be combined en route to a better understanding of adaptive
systems. A further goal of this chapter was to offer a new perspective on issues dear to develop-
mental robotics, and to point out areas on which research could be focused. All design principles
were implicitly addressed and discussed.
• The basic assumption of Chapter 3 was that the robust and adaptive behavior exhibited by natu-
ral organisms is often the product of a complex interaction between various plastic mechanisms
acting at multiple time scales. The chapter reported on experiments conducted with a small-sized
humanoid robot that learned to swing like a pendulum, and whose joints were controlled by a
set of Matsuoka neural oscillators. The study illustrated how the exploration of neural plastic-
ity, body growth, and entrainment to physical dynamics – where each mechanism has a specific
time scale – led to a more efficient exploration of the sensory and motor space, and eventually
to a more adaptive behavior. Thus, it clearly addressed the time scale integration principle.
The study also showed how an initial reduction of the number of mechanical degrees of freedom
guarantees – in the absence of strong environmental perturbations – a more efficient value-driven
exploration of the sensory-motor space. This is an instantiation of the principle of exploratory
activity, and of the value principle. In addition, the chapter reported on a comparative analysis
between the outright use of all degrees of freedom (left and right hip and knee), and the pro-
gressive involvement of all degrees of freedom by using a developmental mechanism of initial
freezing and freeing, as hypothesized by Bernstein (1967). We observed that freezing
the peripheral degree of freedom (knee) led to an increase of the range of neural control
parameters associated with a stable oscillatory behavior. This result can be seen as positive evidence
for what is asserted by the starting simple principle.
• Chapter 4 revisited the study presented in Chapter 3 by introducing a coupling (a nonlinear
spring) between environment and system. Under otherwise unchanged experimental conditions
(same robot, same task), it brought forward evidence that a single phase of freezing and subse-
quent freeing of degrees of freedom is not sufficient to achieve optimal performance, and instead,
an alternate freezing and freeing of degrees of freedom is required. The interest of this result was
two-fold: (a) it confirmed the recent observation by Newell and Vaillancourt (2001) that Bern-
stein’s framework may be too narrow to account for real data, and (b) it suggested that perturba-
tions which push the system outside its postural stability, or an increase of the task complexity
might be the mechanisms that trigger alternate freezing and freeing of degrees of freedom. By
addressing similar issues to Chapter 3, this chapter relates to the very same principles: the time
scale integration principle, value principle, the principle of exploratory activity, and the
starting simple principle.
• Chapter 5 documented a study that was inspired by a longitudinal experiment performed
by Goldfield et al. (1993), in which six 8-month-old infants strapped into a Jolly Jumper (a harness
attached to a spring) were examined while they learned to bounce. Goldfield and colleagues
advanced the hypothesis that the infants’ spontaneous motor activity could be decomposed into an
assembly and a tuning phase. Assembly is a process of self-organization, which establishes a
coupling among the components of the neural and the musculo-skeletal system. Its outcome is
a task-specific movement whose parameters are subsequently explored, and tuned to particular
task conditions. In this chapter, we described and discussed a set of preliminary experiments,
which were performed with a bouncing humanoid robot, and which were aimed at instantiating a
few computational principles hypothesized to underlie the development of motor skills. Our ex-
periments showed that a suitable choice of the coupling constants between hip, knee, and ankle
joints, as well as of the strength of the sensory feedback, induces a reduction of movement vari-
ability, and leads to an increase in bouncing amplitude and movement stability. This result was
attributed to the synergy between neural and body-environment dynamics, and to their mutual
entrainment. It follows that this chapter substantiates the principle of design for emergence.
Moreover, although the parameter exploration was performed manually, the chapter also relates
to the principle of exploratory activity.
• Chapter 6 documented a value-based stochastic exploration scheme used to explore the param-
eter space associated with a neuro-musculoskeletal system. The scheme combined random and
unbiased Monte Carlo exploration with a gradual, value-driven trade-off between exploration and
exploitation (controlled by simulated annealing). Despite its simplicity, the method’s likelihood
of getting stuck in local minima of the parameter space was low, and its convergence was sufficiently
rapid. The scheme was first tested in simulation, and then applied to the online self-calibration of
a set of linear proportional-integral-derivative (PID) controllers of a robot head, and to the
exploration of the neural parameters associated with the control architectures of an oscillating and
a bouncing robot (not discussed). The chapter also provided additional support for the principle
of exploratory activity and the value principle, and motivated, from a developmental point of
view, the need to endow robots with exploratory skills. Indeed, many studies of motor learning
indicate that the acquisition of new motor skills (in healthy infants and in adults) is preceded
by a seemingly random, exploratory phase during which possible movements are explored, se-
lected, and tuned, and the ability to predict the sensory consequences of those movements is
acquired. The stochastic exploration scheme may prove to be an adequate first step for modeling
such exploratory activity.
• Chapter 7 presented initial quantitative analyses of sensory data showing how simple sensory-
motor functions like gaze direction and foveation can generate informational structure (e.g., high
mutual information) in the visual channel of a robot. The main objective was to get a better in-
tuition of the sensory data which constitute, in a sense, the “raw” (unprocessed) material that the
neural system has to cope with. As an example of analysis, information theoretic measures such
as the Shannon entropy and mutual information were employed. The results showed that embod-
ied action/interaction can indeed induce statistical dependencies and informational structure in
and among sensory channels. Such evidence clearly confirms what is asserted by the principle of
information self-structuring. A plausible assumption derived from this chapter – further corroborated
in Chapters 8 and 9 – is that the principle of information self-structuring may emerge
as a key element toward understanding learning and development in robots and organisms (see
also Sporns, 2004, for a similar conclusion).
• Chapter 8 provided additional supporting evidence for the principle of information self-
structuring. Traditionally, categorization has been investigated by employing disembodied cat-
egorization models, in which input patterns are mapped onto category nodes. In contrast to
this mainstream view, a few researchers – including John Dewey and Jean Piaget – have argued
for a more interactive view of categorization, which hypothesizes that perceptual categorization
cannot be decoupled from coordinated motor activity. Chapter 8 embraced the second view by
assuming that by interacting with the environment an agent can structure its sensory input for
the purpose of learning about categories. The core idea is that by employing embodied agents,
it would be possible to compensate for the lack of quantitative evidence to support the hypothe-
sis that sensory-motor coordination is of crucial importance for category learning. The chapter
advanced our understanding by putting forward quantitative evidence confirming the hypothesis
that sensory-motor interaction represents one viable strategy to reduce the dimensionality of the
space of all possible configurations of states that the sensory and motor system can assume. It
is through embodied interaction and a process of exploration that the information generated by
both perceiving and acting becomes correlated. It is thus clear that the results presented in this
chapter support what asserted by the principle of information self-structuring, and the principle
of exploratory activity.
• Similarly to Chapters 7 and 8, Chapter 9 examined, by means of statistical and information-theoretic
measures, to what extent sensory-motor coordinated activity can generate and structure
information in the sensory channels of a simulated agent interacting with its local environment.
In this sense, it also exemplified the importance of the principle of information
self-structuring. The novel contribution of the chapter was that it showed how correlation,
entropy, and mutual information can be employed (a) to segment an observed behavior
into distinct behavioral states, (b) to quantify (“fingerprint”) the agent-environment
interaction, and (c) to analyze the informational relationship between the different components
of the sensory-motor apparatus. The chapter further discussed the hypothesis that a deeper un-
derstanding of the information-theoretic implications of sensory-motor coordination can help us
endow robots with better sensory morphologies, and with better strategies for exploring their
surrounding environment.
The only principle not directly addressed in this thesis is the principle of social interaction. Indeed,
almost all experimental results exposed and discussed in this thesis belong to the third and fourth
primary areas of interest introduced in Chapter 2: (a) “agent-related sensory-motor control,” that is, the
study of the agent’s bodily capabilities, changes of morphology, their effects on motor skill acquisition,
and so on; and (b) “developmental mechanisms and processes.” Although we acknowledge the crucial
importance of social interaction for the emergence and development of cognitive structure in man and
machines (see Chapters 1 and 2), in the context of this thesis we deliberately chose to avoid touching
socio-historical aspects of development. We justify this neglect of such a fundamental aspect by
noting that it helped us keep the number of dependent variables low, and the dimensionality of the
problem under control, so to speak.
10.2 Conclusion
Has this thesis provided definitive answers to the questions posed at the outset of the introductory chap-
ter? Probably not. By employing robots as research vehicles, it has explored, however, some novel
paths which in the long-term could lead to such answers. Throughout the whole thesis the core specu-
lation has been that the convergence between developmental sciences, embodied artificial intelligence,
and robotics may represent a prolific route toward understanding the emergence and development of
cognitive, behavioral, and sensory-motor structure in natural and artificial systems. It may help us
not only construct more adaptive machines, but also better understand the nature of man. Indeed, the
results presented in this thesis demonstrate that taking the developmental approach seriously can
indeed cast a new type of light on old themes.
The success of the infant field of developmental robotics, and of the research methodology it advocates,
will ultimately depend on whether it will be possible to crystallize its central assumptions into
a theory. Such a developmental theory of embodied artificial intelligence may be a key step toward
furthering our understanding of intelligence and toward the synthesis of adaptive machines and truly
autonomous developmental “baby robots.” As the principles for developmental systems abstracted from
the case-studies, and documented in this thesis, show, a theory is on the horizon. Slowly but surely, the
pieces of this complex puzzle are coming together, and a complete picture is beginning to emerge.
Exciting times are ahead of us.
Bibliography
Adolph, K. E., Eppler, M. A., Marin, L., Weise, I. B., and Clearfield, M. W. (2000). Exploration in the service of prospective control. Infant Behavior and Development, 23:441–460.
Adolph, K. E., Vereijken, B., and Denny, M. A. (1998). Learning to crawl. Child Development, 69:1299–1312.
Almassy, N., Edelman, G. M., and Sporns, O. (1998). Behavioral constraints in the development of neuronal properties: A cortical model embedded in a real world device. Cerebral Cortex, 8:346–361.
Anderson, W. (1989). Learning to control an inverted pendulum using neural networks. IEEE Control Systems Magazine, pages 31–36.
Andry, P., Gaussier, P., and Nadel, J. (2002). From visuo-motor development to low-level imitation. In Proc. of the Second Intl. Workshop on Epigenetic Robotics, pages 7–15.
Angulo-Kinzler, R. M. (2001). Exploration and selection of intralimb coordination patterns in 3-month-old infants. Journal of Motor Behavior, 33(4):363–376.
Arutyunyan, G. H., Gurfinkel, V. S., and Mirskii, M. L. (1969). Organization of movements on execution by man of an exact postural task. Biophysics, 14:1162–1167.
Asada, M., MacDorman, K. F., Ishiguro, H., and Kuniyoshi, Y. (2001). Cognitive developmental robotics as a new paradigm for the design of humanoid robots. Robotics and Autonomous Systems, 37:185–193.
Ashby, W. R. (1947). Principles of the self-organizing dynamic system. Journal of General Psychology, 37:125.
Aslin, R. N. (1988). Anatomical constraints on oculomotor development: Implications for infant perception. In Perceptual Development in Infancy: The Minnesota Symposia on Child Psychology, volume 20, pages 67–104. Hillsdale, NJ: Erlbaum.
Balkenius, C., Zlatev, J., Kozima, H., Dautenhahn, K., and Breazeal, C., editors (2001). Proc. of First Intl. Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems. Lund University Cognitive Studies, 85.
Ballard, D. (1991). Animate vision. Artificial Intelligence, 48(1):57–86.
Banerjee, P. R., Sibbald, S., and Maze, J. (1990). Quantifying the dynamics of order and organization in biological systems. Journal of Theoretical Biology, 143:91–112.
Baron-Cohen, S. (1995). Mindblindness. Cambridge, MA: MIT Press.
Bates, E. A. and Elman, J. L. (2002). Connectionism and the study of change. In Johnson, M., editor, Brain Development and Cognition: A Reader. Oxford: Blackwell Publishers.
Beer, R. D. (2004). Autopoiesis and cognition in the game of life. Artificial Life, 10(3):309–326.
Beer, R. D., Chiel, H. J., Quinn, R. D., and Ritzmann, R. E. (1998). Biorobotic approaches to the study of motor systems. Current Opinion in Neurobiology, 8:777–782.
Bernstein, N. (1967). The Co-ordination and Regulation of Movements. London: Pergamon.
Bertenthal, B. and Von Hofsten, C. (1998). Eye and trunk control: The foundation for manual development. Neuroscience and Biobehavioral Reviews, 22(4):515–520.
Berthier, N. E., Clifton, R. K., Gullapalli, V., and McCall, D. J. (1996). Visual information and the control of reaching. Journal of Motor Behavior, 28:187–197.
Berthouze, L., Bakker, P., and Kuniyoshi, Y. (1997). Learning of oculo-motor control: a prelude to robotic imitation. In Proc. of IEEE/RSJ Intl. Conf. on Robotics and Intelligent Systems, pages 376–381.
Berthouze, L. and Kuniyoshi, Y. (1998). Emergence and categorization of coordinated visual behavior through embodied interaction. Machine Learning, 31(1-3):187–200.
Berthouze, L., Kuniyoshi, Y., and Pfeifer, R., editors (1999). Proc. of First Intl. Workshop on Emergence and Development of Embodied Cognition. Workshop held in Tsukuba, Japan, unpublished.
Berthouze, L. and Lungarella, M. (2004). Motor skill acquisition under environmental perturbations: on the necessity of alternate freezing and freeing. Adaptive Behavior, 12(1). in press.
Berthouze, L. and Metta, G., editors (2004). Fourth Intl. Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems. Workshop will take place at the University of Genova, Italy.
Berthouze, L., Shigematsu, Y., and Kuniyoshi, Y. (1998). Dynamic categorization of explorative behaviors for emergence of stable sensorimotor configurations. In Proc. of Fifth Intl. Conf. on Simulation of Adaptive Behavior, pages 67–72.
Berthouze, L. and Ziemke, T., editors (2003). Epigenetic Robotics: Modelling Cognitive Development in Robotic Systems, volume 15 (4).
Bjorklund, D. F. and Green, B. L. (1992). The adaptive nature of cognitive immaturity. American Psychologist, 47:46–54.
Blumberg, B. M. (1996). Old Tricks, New Dogs: Ethology and Interactive Creatures. PhD thesis, Cambridge, MA: The MIT Media Laboratory.
Bornstein, M. H. (1989). Sensitive periods in development: structural characteristics and causal interpretations. Psychological Bulletin, 105(2):179–197.
Braitenberg, V. (1984). Vehicles: Experiments in Synthetic Psychology. Cambridge, MA: MIT Press.
Breazeal, C. L. (2002). Designing Social Robots. Cambridge, MA: MIT Press.
Breazeal, C. L. and Aryananda, L. (2002). Recognition of affective communicative intent in robot-directed speech. Autonomous Robots, 12:83–104.
Breazeal, C. L. and Scassellati, B. (2000). Infant-like social interactions between a robot and a human caretaker. Adaptive Behavior, 8(1):49–74.
Breazeal, C. L. and Scassellati, B. (2002). Robots that imitate humans. Trends in Cognitive Science, 6:481–487.
Brooks, R. A. (1991). Intelligence without representation. Artificial Intelligence, 47:139–160.
Brooks, R. A. (1997). From earwigs to humans. Robotics and Autonomous Systems, 20(2-4):291–304.
Brooks, R. A. (2003). Robot: The Future of Flesh and Machines. London: Penguin Books.
Brooks, R. A., Breazeal, C., Irie, R., Kemp, C. C., Marjanovic, M., Scassellati, B., and Williamson, M. M. (1998). Alternative essences of intelligence. In Proc. of the 15th Natl. Conf. on Artificial Intelligence, pages 961–978. Madison, WI.
Brooks, R. A. and Stein, L. A. (1994). Building brains for bodies. Autonomous Robots, 1(1):7–25.
Bullock, D., Grossberg, S., and Guenther, F. H. (1993). A self-organizing neural model of motor equivalent reaching and tool use by a multijoint arm. Journal of Cognitive Neuroscience, 5(4):408–435.
Bullowa, M. (1979). Before Speech: The Beginning of Interpersonal Communication. Cambridge, London: Cambridge University Press.
Bushnell, E. M. and Boudreau, J. P. (1993). Motor development in the mind: The potential role of motor abilities as a determinant of aspects of perceptual development. Child Development, 64:1005–1021.
Butterworth, G. and Jarrett, N. (1991). What minds have in common is space: spatial mechanisms serving joint visual attention in infancy. British Journal of Developmental Psychology, 9:55–72.
Cerny, V. (1985). Thermodynamic approach to the traveling salesman problem. Journal of Optimization Theory and Applications, 45:41–51.
Cech, D. J. and Martin, S. (2002). Functional Movement Development Across the Life Span. W. B. Saunders Company.
Chomsky, N. (1986). Knowledge of Language: Its Nature, Origin, and Use. New York: Praeger.
Churchland, P. S., Ramachandran, V. S., and Sejnowski, T. J. (1994). A critique of pure vision. Cambridge, MA: MIT Press.
Clark, A. (1997). Being There: Putting Brain, Body and World Together Again. Cambridge, MA: MIT Press.
Clark, A. and Grush, R. (1999). Towards a cognitive robotics. Adaptive Behavior, 7(1):5–16.
Coelho, J., Piater, J., and Grupen, R. (2001). Developing haptic and visual perceptual categories for reaching and grasping with a humanoid robot. Robotics and Autonomous Systems, 37:195–218.
Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory. New York: Wiley.
Dario, P., Laschi, C., and Guglielmelli, E. (1997). Sensors and actuators for "humanoid" robots. Advanced Robotics, 11(6):567–584.
Dautenhahn, K. and Billard, A. (1999). Studying robot social cognition within a developmental psychology framework. In Proc. of Third Intl. Workshop on Advanced Mobile Robots.
Dautenhahn, K. and Nehaniv, C., editors (2002). Imitation in Animals and Artifacts. Cambridge, MA: MIT Press.
De Garis, H., Gers, F., Korkin, M., Agah, A., and Nawa, N. E. (1998). CAM-Brain: ATR's billion neuron artificial brain project: a three year progress report. Journal of Artificial Life and Robotics, 2:56–61.
Dekaban, A. (1959). Neurology of Infancy. Baltimore: Williams and Wilkins.
Demiris, Y. (1999). Movement Imitation Mechanisms in Robots and Humans. PhD thesis, Division of Informatics, University of Edinburgh. unpublished.
Demiris, Y. and Hayes, G. (2002). Imitation as a dual-route process featuring predictive and learning components: a biologically plausible computational model. In Dautenhahn, K. and Nehaniv, C., editors, Imitation in Animals and Artifacts. Cambridge, MA: MIT Press.
Dennett, D. C. (1998). Brainchildren: A Collection of Essays. Cambridge, MA: MIT Press.
Dewey, J. (1896). The reflex arc concept in psychology. Psychological Review, 3:357–370. Original work published in 1896.
Di Paolo, E. A., editor (2002). Adaptive Behavior: Special issue on "Plastic mechanisms, multiple timescales, and lifetime adaptation", volume 10 (3-4).
Di Pellegrino, G., Fadiga, L., Fogassi, L., and Rizzolatti, G. (1992). Understanding motor events: a neurophysiological study. Experimental Brain Research, 91:176–180.
Diamond, A. (1990). Developmental time course in human infants and infant monkeys, and the neural bases of inhibitory control in reaching. In The Development and Neural Bases of Higher Cognitive Functions, volume 608, pages 637–676. New York Academy of Sciences.
Dickinson, P. S. (2003). Neuromodulation in invertebrate nervous systems. In Arbib, M., editor, The Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press.
Dominguez, M. and Jacobs, R. A. (2003). Developmental constraints aid the acquisition of binocular disparity sensitivities. Neural Computation, 15(1):161–182.
Edelman, G. M. (1987). Neural Darwinism: The Theory of Neural Group Selection. New York: Basic Books.
Edelman, G. M. and Tononi, G. (2001). Consciousness: How Matter Becomes Imagination. London: Penguin Books.
Eliot, L. (2001). Early Intelligence. London: Penguin Books.
Elliott, T. and Shadbolt, N. R. (2001). Growth and repair: instantiating a biologically inspired model of neural development on the Khepera robot. Robotics and Autonomous Systems, 36:149–169.
Elliott, T. and Shadbolt, N. R. (2003). Developmental robotics: manifesto and application. Philosophical Transactions: Mathematical, Physical and Engineering Sciences, 361:2187–2206.
Elman, J., Sur, M., and Weng, J. J., editors (2002). Proc. of Second Intl. Conf. on Development and Learning. Workshop held at the Michigan State University, USA.
Elman, J. L. (1993). Learning and development in neural networks: the importance of starting small. Cognition, 48:71–99.
Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff-Smith, A., Parisi, D., and Plunkett, K. (1996). Rethinking Innateness: A Connectionist Perspective on Development. Cambridge, MA: MIT Press. A Bradford Book.
Fernald, A. (1985). Four-month-old infants prefer to listen to motherese. Infant Behavior and Development, 8:181–195.
Ferrell, C. B. and Kemp, C. C. (1996). An ontogenetic perspective on scaling sensorimotor intelligence. In Embodied Cognition and Action: Papers from the 1996 AAAI Fall Symposium.
Fitts, P. M. (1964). Perceptual-motor skill learning. In Melton, A., editor, Categories of Human Learning, pages 243–285. New York: Academic Press.
Fitzhugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical Journal, 1:445–466.
Fodor, J. A. (1981). Representations. Brighton, Sussex: Harvester Press.
Fodor, J. A. (1983). The Modularity of Mind. Cambridge, MA: MIT Press.
Fong, T., Nourbakhsh, I., and Dautenhahn, K. (2003). A survey of socially interactive robots. Robotics and Autonomous Systems, 42:143–166.
Forssberg, H. (1999). Neural control of human motor development. Current Opinion in Neurobiology, 9:676–682.
Friston, K. J., Tononi, G., Reeke, G. N., Sporns, O., and Edelman, G. M. (1994). Value-dependent selection in the brain: simulation in a synthetic neural model. Neuroscience, 59(2):229–243.
Gallese, V., Fadiga, L., Fogassi, L., and Rizzolatti, G. (1996). Action recognition in the premotor cortex. Brain, 119:593–609.
Gell-Mann, M. (1995). What is complexity? Complexity, 1(1):16–19.
Gesell, A. (1946). The ontogenesis of infant behavior. In Carmichael, L., editor, Manual of Child Psychology, pages 295–331.
Gibson, E. J. (1988). Exploratory behavior in the development of perceiving, acting, and the acquiring of knowledge. Annual Review of Psychology, 39:1–41.
Gibson, J. J. (1977). The theory of affordances. In Shaw, R. and Brandsford, J., editors, Perceiving, Acting, and Knowing: Toward an Ecological Psychology, pages 62–82.
Goldfield, E. C. (1995). Emergent Forms: Origins and Early Development of Human Action and Perception. New York: Oxford University Press.
Goldfield, E. C., Kay, B. A., and Warren, W. H. (1993). Infant bouncing: the assembly and tuning of an action system. Child Development, 64:1128–1142.
Gomez, G., Lungarella, M., Eggenberger-Hotz, P., Matsushita, K., and Pfeifer, R. (2004). Simulating development in a real robot: on the concurrent increase of sensory, motor, and neural complexity. In Proc. of the Fourth Intl. Workshop on Epigenetic Robotics. to appear.
Gottlieb, G. (1991). Experiential canalization of behavioral development: Theory. Developmental Psychology, 27:4–13.
Grillner, S. (1985). Neurobiological bases of rhythmic motor acts in vertebrates. Science, 228:143–149.
Grush, R. (2004). The emulation theory of representation: motor control, imagery, and perception. Behavioral and Brain Sciences. to appear.
Hadders-Algra, M. (2002). Variability in infant motor behavior: A hallmark of the healthy nervous system. Infant Behavior and Development, 25:433–451.
Hadders-Algra, M., Brogren, E., and Forssberg, H. (1996). Ontogeny of postural adjustments during sitting in infancy: Variation, selection and modulation. Journal of Physiology, 493:273–288.
Haehl, V., Vardaxis, V., and Ulrich, B. (2000). Learning to cruise: Bernstein's theory applied to skill acquisition during infancy. Human Movement Science, 19:685–715.
Hafner, V. V., Fend, M., Lungarella, M., Pfeifer, R., Konig, P., and Kording, K. P. (2003). Optimal coding for naturally occurring whisker deflections. In Proc. of the Joint Intl. Conf. on Neural Networks and Neural Information Processing, pages 805–812. Berlin: Springer-Verlag. LNCS 2714.
Hainline, L. (1998). How the visual system develops: normal and abnormal development. In Slater, A., editor, Perceptual Development: Visual, Auditory, and Speech Perception in Infancy, pages 5–50. Hove: Psychology Press, Ltd.
Hajek, B. (1988). Cooling schedules for optimal annealing. Mathematics of Operations Research, 13:311–329.
Haken, H. (1983). Synergetics: An Introduction. Berlin: Springer-Verlag.
Haken, H. (1996). Principles of Brain Functioning: A Synergetic Approach to Brain Activity, Behavior, and Cognition. Berlin: Springer-Verlag.
Halliday, M. (1975). Learning How To Mean: Explorations in the Development of Language. Cambridge, MA: MIT Press.
Hara, F. and Pfeifer, R., editors (2003). Morpho-functional Machines: The New Species (Designing Embodied Intelligence). Berlin: Springer-Verlag.
Harman, K. L., Humphrey, G. K., and Goodale, M. A. (1999). Active manual control of object views facilitates visual recognition. Current Biology, 9:1315–1318.
Harris, C. (1998). On the optimal control of behavior: a stochastic perspective. Journal of Neuroscience Methods, 83:73–88.
Harris, P. L. (1983). Infant cognition. In Haith, M. and Campos, J., editors, Handbook of Child Psychology, Vol. 2: Infancy and Developmental Psychobiology, pages 689–782. New York: Wiley.
Hasselmo, M., Wyble, B., and Fransen, E. (2003). Neuromodulation in mammalian nervous systems. In Arbib, M., editor, The Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press.
Hatsopoulos, N. G. (1996). Coupling the neural and physical dynamics in rhythmic movements. Neural Computation, 8:567–581.
Hendriks-Jansen, H. (1996). Catching Ourselves in the Act. Cambridge, MA: MIT Press. A Bradford Book.
Howell, M. N. and Best, M. C. (2000). On-line PID tuning for engine idle-speed control using continuous action reinforcement learning automata. Control Engineering Practice, 8:147–154.
Ijspeert, A. (2003). Vertebrate locomotion. In Arbib, M., editor, The Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press.
Inaba, M., Nagasaka, K., and Kanehiro, F. (1996). Real-time vision-based control of swing motion by a human-form robot using the remote-brained approach. In Proc. of the 1996 IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, pages 15–22.
Ishiguro, A., Ishimaru, K., Hayakawa, K., and Kawakatsu, T. (2003). Toward a "well-balanced" design: a robotic case study. In Proc. of the Second Intl. Symp. on Adaptive Motion in Animals and Machines, number ThP-I-3. electronic proceedings.
Itti, L., Koch, C., and Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1254–1259.
Iverson, J. M. and Thelen, E. (1999). Hand, mouth and brain. Journal of Consciousness Studies, 6(11-12):19–40.
Jensen, J. L., Thelen, E., Ulrich, B. D., and Zernicke, R. F. (1995). Adaptive dynamics of the leg movement patterns in human infants: Age-related differences in limb control. Journal of Motor Behavior, 27:366–374.
Johnson, M. H. (1997). Developmental Cognitive Neuroscience. Oxford, UK: Blackwell Publishers Ltd.
Kadar, E. E., Maxwell, J. P., Stins, J., and Costall, A. (2002). Drifting towards a diffuse control model of exploratory motor learning: a comparison of global and within-trial performance measures. Biological Cybernetics, 87:1–9.
Kato, N., Artola, A., and Singer, W. (1991). Developmental changes in the susceptibility to long-term potentiation of neurons in rat visual cortex slices. Developmental Brain Research, 60:53–60.
Kay, B. A. (1988). The dimensionality of movement trajectories and the degrees of freedom problem: A tutorial. Human Movement Science, 7:343–364.
Keil, F. C. (1981). Constraints on knowledge and cognitive development. Psychological Review, 88:197–227.
Kellman, P. J. and Arterberry, M. E. (1998). The Cradle of Knowledge. Cambridge, MA: MIT Press. A Bradford Book.
Kelso, S. J. (1995). Dynamic Patterns. Cambridge, MA: MIT Press. A Bradford Book.
Kelso, S. J. and Kay, B. A. (1987). Information and control: a macroscopic analysis of perception-action coupling. In Heuer, H. and Sanders, A., editors, Perspectives on Perception and Action, pages 3–32.
Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220:671–680.
Kirkpatrick, S. and Gregory, B. S. (1995). Simulated annealing. In Arbib, M., editor, The Handbook of Brain Theory and Neural Networks, pages 876–879.
Ko, Y., Challis, J., and Newell, K. (2003). Learning to coordinate redundant degrees of freedom in a dynamic balance task. Human Movement Science, 22:47–66.
Konczak, J., Borutta, M., and Dichgans, J. (1995). Development of goal-directed reaching in infants: hand trajectory formation and joint force control. Experimental Brain Research, 106:156–168.
Korner, A. F. and Kraemer, H. C. (1972). Individual differences in spontaneous oral behavior in neonates. In Bosma, J., editor, Proc. of the Third Symp. on Oral Sensation and Perception, pages 335–346.
Kozima, H., Nakagawa, C., and Yano, H. (2002). Emergence of imitation mediated by objects. In Proc. of the Second Intl. Workshop on Epigenetic Robotics, pages 59–61.
Kozima, H. and Yano, H. (2001). A robot that learns to communicate with human caregivers. In Proc. of the First Intl. Workshop on Epigenetic Robotics.
Krichmar, J. L. and Edelman, G. M. (2002). Machine psychology: autonomous behavior, perceptual categorization and conditioning in a brain-based device. Cerebral Cortex, 12:818–830.
Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99:22–44.
Kuhl, P. K. (2000). Language, mind, and brain: experience alters perception. In Gazzaniga, M., editor, The New Cognitive Neurosciences, pages 99–115.
Kuniyoshi, Y., Yorozu, Y., Inaba, M., and Inoue, H. (2003). From visuo-motor self learning to early imitation – a neural architecture for humanoid learning. In Proc. of the 2003 Intl. Conf. on Robotics and Automation, pages 3132–3139.
Lakoff, G. (1987). Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. Chicago, Illinois: University of Chicago Press.
Lakoff, G. and Johnson, M. (1999). Philosophy in the Flesh: The Embodied Mind and Its Challenge to Western Thought. New York: Basic Books.
Lambrinos, D., Maris, M., Kobayashi, H., Labhart, T., Pfeifer, R., and Wehner, R. (1997). An autonomous agent navigating with a polarized light compass. Adaptive Behavior, 6:175–206.
Lederman, S. J. and Klatzky, R. L. (1990). Haptic exploration and object representation. In Goodale, M., editor, Vision and Action: The Control of Grasping, pages 98–109. New Jersey: Ablex.
Lee, T. S. and Yu, S. X. (1999). An information-theoretic framework for understanding saccadic behaviors. In Solla, S. and Leen, T., editors, Proc. of the First Intl. Conf. on Neural Information Processing. Cambridge, MA: MIT Press.
Lichtensteiger, L. and Pfeifer, R. (2002). An optimal sensor morphology improves adaptability of neural network controllers. In Dorronsoro, J., editor, Proc. of the Sixth Intl. Conf. on Neural Networks, pages 850–855. Berlin, Heidelberg: Springer-Verlag.
Lindblom, J. and Ziemke, T. (2003). Social situatedness of natural and artificial intelligence: Vygotsky and beyond. Adaptive Behavior, 11(2):79–96.
Love, B. C. and Medin, D. L. (1998). SUSTAIN: A model of human category learning. In Proc. of the 15th Natl. Conf. on Artificial Intelligence (AAAI'98), pages 671–676.
Luisi, P. L. (2003). Autopoiesis: A review and a reappraisal. Naturwissenschaften, 90(2):49–59.
Lungarella, M. and Berthouze, L. (2002a). Adaptivity through physical immaturity. In Proc. of the Second Intl. Workshop on Epigenetic Robotics, pages 79–86.
Lungarella, M. and Berthouze, L. (2002b). Adaptivity via alternate freeing and freezing of degrees of freedom. In Proc. of the 9th Intl. Conf. on Neural Information Processing, pages 492–497.
Lungarella, M. and Berthouze, L. (2002c). On the interplay between morphological, neural and environmental dynamics: a robotic case-study. Adaptive Behavior, 10(3-4):223–241.
Lungarella, M. and Berthouze, L. (2003). Learning to bounce: first lessons from a bouncing robot. In Proc. of the Second Intl. Symp. on Adaptive Motion in Animals and Machines, number ThP-II-4. electronic proceedings.
Lungarella, M. and Berthouze, L. (2004). Robot bouncing: on the interaction between body and environmental dynamics. In Iida, F., Pfeifer, R., Steels, L., and Kuniyoshi, Y., editors, Embodied Artificial Intelligence. Berlin: Springer-Verlag. LNCS.
Lungarella, M., Hafner, V. V., Pfeifer, R., and Yokoi, H. (2002a). An artificial whisker sensor for robotics. In Proc. of the 15th Intl. Conf. on Intelligent Robots and Systems, pages 2931–2936.
Lungarella, M., Hafner, V. V., Pfeifer, R., and Yokoi, H. (2002b). Whisking: an unexplored sensory modality. In Proc. of the 7th Intl. Conf. on the Simulation of Adaptive Behavior, pages 58–59.
Lungarella, M. and Metta, G. (2003). Beyond gazing, pointing, and reaching: a survey of developmental robotics. In Proc. of the Third Intl. Workshop on Epigenetic Robotics, pages 81–89.
Lungarella, M., Metta, G., Pfeifer, R., and Sandini, G. (2003). Developmental robotics: a survey. Connection Science, 15(4):151–190.
Lungarella, M. and Pfeifer, R. (2001). Robots as cognitive tools: An information-theoretic analysis of sensory-motor data. In Proc. of the Second IEEE-RAS Intl. Conf. on Humanoid Robotics, pages 245–252.
Lungarella, M. and Sporns, O. (2004). Methods for quantifying the informational structure of sensory and motor data. Neuroinformatics. in preparation.
Manzotti, R. (2000). Intentional Robots: The Design of a Goal-seeking Environment-driven Agent. PhD thesis, University of Genova, Genova, Italy.
Marder, E. and Thirumalai, V. (2002). Cellular, synaptic and network effects of neuromodulation. Neural Networks, 15(4-6):479–493.
Marjanovic, M., Scassellati, B., and Williamson, M. (1996). Self-taught visually-guided pointing for a humanoid robot. In From Animals to Animats 4: Proc. of the 4th Intl. Conf. on Simulation of Adaptive Behavior, pages 35–44. Cambridge, MA: MIT Press.
Matsuoka, K. (1985). Sustained oscillations generated by mutually inhibiting neurons with adaptation. Biological Cybernetics, 52:367–376.
Maturana, H. R. and Varela, F. J. (1998). The Tree of Knowledge: The Biological Roots of Human Understanding. Boston, London: Shambhala Publications Inc.
McDonald, P. V., Emmerik, R. E., and Newell, K. M. (1989). The effects of practice on limb kinematics in a throwing task. Journal of Motor Behavior, 21:245–264.
McGraw, M. B. (1940). Neuromuscular development of the human infant as exemplified by the achievement of erect locomotion. Journal of Pediatrics, 17:747–771.
McGraw, M. B. (1945). Neuromuscular Maturation of the Human Infant. New York: Hafner.
Meltzoff, A. and Prinz, W. (2002). The Imitative Mind: Development, Evolution and Brain Bases. Cambridge, MA: MIT Press.
Meltzoff, A. N. and Borton, R. W. (1979). Intermodal matching in human neonates. Nature, 282:403–404.
Meltzoff, A. N. and Moore, M. K. (1977). Imitation of facial and manual gestures by human neonates. Science, 198:74–78.
Meltzoff, A. N. and Moore, M. K. (1997). Explaining facial imitation: A theoretical model. Early Development and Parenting, 6:179–192.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21:1087–1092.
Metta, G. (2000). Babybot: A Study on Sensorimotor Development. Unpublished PhD Thesis, University of Genova, Genova, Italy.
Metta, G. and Fitzpatrick, P. (2003). Early integration of vision and manipulation. Adaptive Behavior, 11(2):109–128.
Metta, G., Sandini, G., and Konczak, J. (1999). A developmental approach to visually-guided reaching in artificial systems. Neural Networks, 12:1413–1427.
Metta, G., Sandini, G., Natale, L., and Panerai, F. (2001). Development and robotics. In Proc. of the First IEEE-RAS Intl. Conf. on Humanoid Robots, pages 33–42.
Miall, R. C., Weir, D. J., Wolpert, D. M., and Stein, J. F. (1993). Is the cerebellum a Smith predictor? Journal of Motor Behavior, 25(3):203–216.
Mitra, S., Amazeen, P. G., and Turvey, M. T. (1998). Intermediate motor learning as decreasing active (dynamical) degrees of freedom. Human Movement Science, 17:17–65.
Miyakoshi, S., Yamakita, M., and Furuta, K. (1994). Juggling control using neural oscillators. In Proc. of the 1994 IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, volume 2, pages 1186–1193.
Murata, S., Yoshida, E., Kurokawa, H., Tomita, K., and Kokaji, S. (2001). Self-repairing mechanical systems. Autonomous Robots, 10:7–21.
Mussa-Ivaldi, F. (1999). Modular features of motor control and learning. Current Opinion in Neurobiology, 9:713–717.
Nadel, J. (2003). Early social cognition. Intellectica. in press.
Nadel, J. and Butterworth, G., editors (1999). Imitation in Infancy. Cambridge, UK: Cambridge University Press.
Nagai, Y., Asada, M., and Hosoda, K. (2002). Developmental learning model for joint attention. In Proc. of 15th Intl. Conf. on Intelligent Robots and Systems (IROS 2002), pages 932–937.
Natale, L., Metta, G., and Sandini, G. (2002). Development of auditory-evoked reflexes: visuo-acoustic cues integration in a binocular head. Robotics and Autonomous Systems, 39(2):87–106.
Newell, A. (1990). Unified Theories of Cognition. Cambridge, MA: Harvard University Press.
Newell, A. and Simon, H. (1976). Computer science as empirical inquiry: Symbols and search. Communications of the ACM, 19:113–126.
Newell, K. and Vaillancourt, D. E. (2001). Dimensional change in motor learning. Human Movement Science, 20:695–715.
Newell, K. M. and van Emmerik, R. E. (1989). The acquisition of coordination: Preliminary analysis of learning to write. Human Movement Science, 8:17–32.
Newport, E. L. (1990). Maturational constraints on language learning. Cognitive Science, 14:11–28.
Nolfi, S. and Floreano, D. (2000). Evolutionary Robotics: The Biology, Intelligence, and Technology of Self-organizing Machines. Cambridge, MA: MIT Press.
Ohgane, K., Ei, S., Kazutoshi, S., and Ohtuski, T. (2004). Emergence of adaptability to time delay in bipedal locomotion. Biological Cybernetics, 90(2):125–132.
O'Leary, D. D., Schlaggar, B. L., and Tuttle, R. (1994). Specification of neocortical areas and thalamocortical connections. Annual Review of Neuroscience, 17:419–439.
Panerai, F., Metta, G., and Sandini, G. (2002). Learning visual stabilization reflexes in robots with moving eyes. Neurocomputing, 48(1-4):323–337.
Papoulis, A. (1991). Probability, Random Variables, and Stochastic Processes. New York: McGraw-Hill.
Pelaez-Nogueras, M., Gewirtz, J., and Markham, M. (1996). Infant vocalizations are conditioned both by maternal imitation and motherese speech. Infant Behavior and Development, 19:670.
Peper, L., Bootsma, R., Mestre, D., and Bakker, F. (1994). Catching balls: How to get the hand to the right place at the right time. Journal of Experimental Psychology: Human Perception and Performance, 20:591–612.
Pfeifer, R. (1996). Building "fungus eaters": Design principles of autonomous agents. In Maes, P., Mataric, M., Meyer, J.-A., Pollack, J., and Wilson, S., editors, From Animals to Animats 4: Proc. of the Fourth Intl. Conf. on Simulation of Adaptive Behavior, pages 3–12. Cambridge, MA: MIT Press. A Bradford Book.
Pfeifer, R. (2000). On the role of morphology and materials in adaptive behavior. In From Animals to Animats 6: Proc. of the Sixth Intl. Conf. on Simulation of Adaptive Behavior, pages 23–32.
Pfeifer, R. (2002). Robots as cognitive tools. Intl. Journal of Cognition and Technology, 1(1):125–143.
Pfeifer, R. and Glatzeder, B. (2004). How the Body Shapes the Way We Think. Cambridge, MA: MIT Press. forthcoming.
Pfeifer, R. and Lungarella, M., editors (2001). Proc. of Second Intl. Workshop on Emergence and Development of Embodied Cognition. Workshop held in Beijing, PRC, unpublished.
Pfeifer, R. and Scheier, C. (1994). From perception to action: the right direction? In Gaussier, P. and Nicoud, J.-D., editors, From Perception to Action, pages 1–11. IEEE Computer Society Press.
Pfeifer, R. and Scheier, C. (1997). Sensory-motor coordination: The metaphor and beyond. Robotics and Autonomous Systems, 20:157–178.
Pfeifer, R. and Scheier, C. (1999). Understanding Intelligence. Cambridge, MA: MIT Press.
Piaget, J. (1945). La formation du symbole chez l'enfant. Geneve: Delachaux et Niestle Editions.
Piaget, J. (1953). The Origins of Intelligence. New York: Routledge.
Picard, R. (1997). Affective Computing. Cambridge, MA: MIT Press.
Piek, J. P. (2001). Is a quantitative approach useful in the comparison of spontaneous movements in fullterm and preterm infants? Human Movement Science, 20:717–736.
Piek, J. P. (2002). The role of variability in early development. Infant Behavior and Development, 156:1–14.
Piek, J. P. and Carman, R. (1994). Developmental profiles of spontaneous movements in infants. Early Human Development, 39:109–126.
Prechtl, H. F. (1997). The importance of fetal movements. In Connolly, K. and Forssberg, H., editors, Neurophysiology and Neuropsychology of Motor Development, pages 42–53. Mac Keith Press.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1995). Simulated annealing methods. In Numerical Recipes in C, pages 444–455. Cambridge, UK: Cambridge University Press, 3rd edition.
Prince, C. G., Berthouze, L., Kozima, H., Bullock, D., Stojanov, G., and Balkenius, C., editors (2003). Proc. of Third Intl. Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems. Lund University Cognitive Studies, 101.
Prince, C. G. and Demiris, Y., editors (2003). Adaptive Behavior: Special issue on ‘Epigenetic Robotics’, volume 11 (2).
Prince, C. G., Demiris, Y., Marom, Y., Kozima, H., and Balkenius, C., editors (2002). Proc. of Second Intl. Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems. Lund University Cognitive Studies, 94.
Purves, D. (1994). Neural Activity and the Growth of the Brain. Cambridge: Cambridge University Press.
Pylyshyn, Z. W. (1984). Computation and Cognition: Toward a Foundation for Cognitive Science. Cambridge, MA: MIT Press.
Reeke, G. N., Sporns, O., and Edelman, G. M. (1990). Synthetic neural modeling: the ‘Darwin’ series of recognition automata. Proc. IEEE, 78:1498–1530.
Regan, D. (1997). Visual factors in hitting and catching. Journal of Sports Sciences, 15:533–558.
Rizzolatti, G. and Arbib, M. (1998). Language within our grasp. Trends in Neurosciences, 21(5):188–194.
Robinson, S. R. and Smotherman, W. P. (1992). Fundamental motor patterns of the mammalian fetus. Journal of Neurobiology, 23:1574–1600.
Rochat, P. (1987). Mouthing and grasping in neonates: Evidence for the early detection of what hard and soft substances afford for action. Infant Behavior and Development, 25:871–884.
Rochat, P. (1989). Object manipulation and exploration in 2 to 5-month-old infants. Developmental Psychology, 25:871–884.
Rochat, P. and Striano, T. (2000). Perceived self in infancy. Infant Behavior and Development, 23:513–530.
Rojdestvenski, I., Cottam, M., Park, Y., and Oquist, G. (1999). Robustness and time-scale hierarchy in biological systems. BioSystems, 50:71–82.
Rosen, R. (1991). Life Itself: A Comprehensive Inquiry into the Nature, Origin, and Fabrication of Life. New York: Columbia University Press.
Rus, D. and Chirikjian, G., editors (2001). Autonomous Robots: Special issue on ‘Self-Reconfigurable Robots’, volume 10 (1).
Rutkowska, J. C. (1994). Scaling up sensorimotor systems: constraints from human infancy. Adaptive Behavior, 2(4):349–373.
Rutkowska, J. C. (1995). Can development be designed? What we may learn from the Cog project. In Advances in Artificial Life: Proc. of the Third European Conf. on Artificial Life, pages 383–395. Berlin: Springer-Verlag.
Saito, F., Fukuda, T., and Arai, F. (1994). Swing and locomotion control of a two-link brachiation robot. IEEE Control Systems, 14:5–12.
Sandini, G. (1997). Artificial systems and neuroscience. In Proceedings of the Otto and Martha Fischbeck Seminar on Active Vision. Berlin, Germany: Wissenschaftskolleg zu Berlin.
Sandini, G., Metta, G., and Konczak, J. (1997). Human sensorimotor development and artificial systems. In Proc. of the Intl. Symp. on Artificial Intelligence, Robotics, and Intellectual Human Activity Support for Nuclear Applications, pages 303–314.
Scassellati, B. (1998). Building behaviors developmentally: a new formalism. In Proc. of the 1998 AAAI Spring Symp. on Integrating Robotics Research.
Scassellati, B. (2001). Foundations for a theory of mind for a humanoid robot. PhD thesis, MIT Department of Electrical Engineering and Computer Science. Unpublished.
Schaal, S. (1999). Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences, 3(6):233–242.
Schaal, S. and Sternad, D. (1998). Programmable pattern generators. In Intl. Conf. on ComputationalIntelligence in Neuroscience, pages 48–51.
Scheier, C. and Lambrinos, D. (1996). Categorization in a real-world agent using haptic exploration and active perception. In Proc. of the Fourth Intl. Conf. on Simulation of Adaptive Behavior, pages 65–75. Cambridge, MA: MIT Press.
Scheier, C. and Pfeifer, R. (1997). Information theoretic implications of embodiment for neural network learning. In Intl. Conf. on Artificial Neural Networks, pages 691–696.
Scheier, C., Pfeifer, R., and Kuniyoshi, Y. (1998). Embedded neural networks: exploiting constraints.Neural Networks, 11(7-8):1551–1569.
Schneider, K. and Zernicke, R. (1992). Mass, center of mass, and moment of inertia estimates for infant limb segments. Journal of Biomechanics, 25:145–148.
Schneider, K., Zernicke, R., Ulrich, B., Jensen, J., and Thelen, E. (1990). Understanding movement control in infants through the analysis of limb intersegmental dynamics. Journal of Motor Behavior, 22:493–520.
Schultz, W. (1998). Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80:1–27.
Shannon, C. (1948). A mathematical theory of communication. Bell Syst. Tech. Journal, 27:379–423.
Sharkey, N. E. (2003). Biologically inspired robotics. In Arbib, M., editor, MIT Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press.
Sirois, S. and Mareschal, D. (2002). Models of habituation in infancy. Trends in Cognitive Sciences, 6(7):293–298.
Slater, A. and Johnson, S. P. (1997). Visual sensory and perceptual abilities of the newborn: beyond the blooming, buzzing confusion. In Simion, F. and Butterworth, G., editors, The Development of Sensory, Motor and Cognitive Capacities in Early Infancy: From Sensation to Cognition, pages 121–141. Hove: Psychology Press.
Smitsman, A. W. and Schellingerhout, R. (2000). Exploratory behavior in blind infants: How to improve touch? Infant Behavior and Development, 23:485–511.
Smotherman, W. P. and Robinson, S. R. (1988). Behavior of the fetus. Caldwell, NJ: Telford.
Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods. Ames, Iowa: Iowa State University Press.
Spelke, E. S. (2000). Core knowledge. American Psychologist, 55:1233–1243.
Spencer, J. P. and Thelen, E. (1999). A multiscale state analysis of adult motor learning. Experimental Brain Research, 128:505–516.
Spong, M. W. (1995). Swing up control of the acrobot. IEEE Control Systems Magazine, pages 49–55.
Spong, M. W. and Vidyasagar, M. (1989). Robot Dynamics and Control. New York: John Wiley and Sons.
Sporns, O. (2003). Embodied cognition. In Arbib, M., editor, MIT Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press.
Sporns, O. (2004). Developing neuro-robotic models. In Mareschal, D., Sirois, S., and Westermann, G., editors, Constructing Cognition. Oxford University Press. To appear.
Sporns, O. and Alexander, W. (2002). Neuromodulation and plasticity in an autonomous robot. Neural Networks, 15:761–774.
Sporns, O., Almassy, N., and Edelman, G. (2000). Plasticity in value systems and its role in adaptivebehavior. Adaptive Behavior, 8(2):129–148.
Sporns, O. and Edelman, G. M. (1993). Solving Bernstein’s problem: a proposal for the development of coordinated movement by selection. Child Development, 64:960–981.
Sporns, O. and Pegors, T. (2004). Information-theoretical aspects of embodied artificial intelligence. In Iida, F., Pfeifer, R., Steels, L., and Kuniyoshi, Y., editors, Embodied Artificial Intelligence. Berlin: Springer-Verlag. LNCS.
Steels, L. (1994). The artificial life roots of artificial intelligence. Artificial Life, 1:75–110.
Steels, L. (2003). Personal communication.
Steuer, R., Kurths, J., Daub, C., Weise, J., and Selbig, J. (2002). The mutual information: detecting and evaluating dependencies between variables. Bioinformatics, 18:231–240. Suppl. 2.
Stiles, J. (2000). Neural plasticity and cognitive development. Developmental Neuropsychology, 18(2):237–272.
Stoica, A. (2001). Robot fostering techniques for sensory-motor development of humanoid robots. Robotics and Autonomous Systems, 37:127–143.
Streri, A. (1993). Seeing, Reaching, Touching: The Relations between Vision and Touch in Infancy. Cambridge, MA: MIT Press.
Streri, A. and Gentaz, E. (2003). Cross-modal recognition of shapes from hand to eyes in newborns.Somatosensory and Motor Research, 20:11–16.
Taga, G. (1991). Self-organized control of bipedal locomotion by neural oscillators in unpredictable environment. Biological Cybernetics, 65:147–159.
Taga, G. (1994). Emergence of bipedal locomotion through entrainment among the neuro-musculo-skeletal system and environment. Physica D, 75:190–208.
Taga, G. (1995). A model of the neuro-musculo-skeletal system for human locomotion. Biological Cybernetics, 73:113–121.
Taga, G. (1997). Freezing and freeing degrees of freedom in a model neuro-musculo-skeletal system for the development of locomotion. In Proceedings of the 16th International Society of Biomechanics Congress, page 47.
Taga, G. (2000). Nonlinear dynamics of the human motor control. In Proc. of the First Intl. Symp. on Adaptive Motion of Animals and Machines.
Taga, G., Takaya, R., and Konishi, Y. (1999). Analysis of general movements of infants towards understanding of developmental principle for motor control. In Proc. of 1999 IEEE Intl. Conf. on Systems, Man, and Cybernetics, pages 678–683.
Tarapore, D., Lungarella, M., and Gomez, G. (2004). Fingerprinting agent-environment interaction via information theory. In Proc. of the 8th Intl. Conf. on Intelligent Autonomous Systems, pages 512–520.
Te Boekhorst, R., Lungarella, M., and Pfeifer, R. (2003). Dimensionality reduction through sensory-motor coordination. In Kaynak, O., Alpaydin, E., Oja, E., and Xu, L., editors, Proc. of the Joint Intl. Conf. ICANN/ICONIP, pages 496–503. LNCS 2714.
Teuscher, C., Mange, D., Stauffer, A., and Tempesti, G. (2003). Bio-inspired computing tissues: towards machines that evolve, grow, and learn. Biosystems, 68:235–244.
Thelen, E. (1979). Rhythmical stereotypies in normal human infants. Animal Behaviour, 27:699–715.
Thelen, E. (1981). Kicking, rocking and waving: Contextual analysis of rhythmical stereotypies innormal human infants. Animal Behaviour, 29:3–11.
Thelen, E. (1995). Time-scale dynamics and the development of an embodied cognition. In Port, R. and van Gelder, T., editors, Mind as Motion: Explorations in the Dynamics of Cognition, pages 69–100. Cambridge, MA: MIT Press.
Thelen, E. (1999). Dynamic mechanisms of change in early perceptuo-motor development. In McClelland, J. and Siegler, S., editors, 29th Carnegie Symposium on Cognition: Mechanisms of Cognitive Development: Behavioral and Neural Perspectives. October, Pittsburgh.
Thelen, E. and Fisher, D. (1983). The organization of spontaneous leg movements in newborn infants. Journal of Motor Behavior, 15:353–377.
Thelen, E., Fisher, D., and Ridley-Johnson, R. (1984). The relationship between physical growth and a newborn reflex. Infant Behavior and Development, 7:479–493.
Thelen, E. and Smith, L. (1994). A Dynamic Systems Approach to the Development of Cognition andAction. Cambridge, MA: MIT Press. A Bradford Book.
Thorndike, E. L. (1911). Animal Intelligence. New York: Macmillan.
Thrun, S. (1992). The role of exploration in learning control. In White, D. and Sofge, D., editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, pages 527–559. New York: Van Nostrand Reinhold.
Thrun, S. (1995). Exploration in active learning, pages 381–384.
Toda, M. (1982). Man, Robot, and Society. The Hague, The Netherlands: Nijhoff.
Tononi, G., Sporns, O., and Edelman, G. (1994). A measure for brain complexity: Relating functional segregation and integration in the nervous system. Proc. of the Natl. Academy of Sciences (USA), 91:5033–5037.
Tononi, G., Sporns, O., and Edelman, G. (1996). A complexity measure for selective matching of signals by the brain. Proc. of the Natl. Academy of Sciences (USA), 93:3422–3427.
Trevarthen, C. (1993). The function of emotions in early infant communication and development, pages 48–81.
Triesch, J. and Jebara, T., editors (2004). Proc. of Third Intl. Conf. on Development and Learning: Developing Social Brains. Conference will take place at the Salk Institute for Biological Studies, La Jolla, California.
Turing, A. M. (1948). Intelligent Machinery, volume 5.
Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236):433–460.
Turkewitz, G. and Kenny, P. A. (1982). Limitation on input as a basis for neural organization and perceptual development: A preliminary theoretical statement. Developmental Psychology, 15:357–368.
Turrigiano, G. G. and Nelson, S. B. (2000). Hebb and homeostasis in neural plasticity. Current Opinion in Neurobiology, 10:358–364.
Turvey, M. T. and Fitzpatrick, P. (1993). Commentary: Development of perception-action systems and general principles of pattern formation. Child Development, 64:1175–1190.
Vaal, J., van Soest, A. J., Hopkins, B., Sie, L. T., and van der Knaap, M. S. (2001). Development of spontaneous leg movements in infants with and without periventricular leukomalacia. Experimental Brain Research, 135:94–105.
Van Heijst, J. J., Touwen, B. C., and Vos, J. E. (1999). Implications of a neural network model of early sensorimotor development for the field of developmental neurology. Early Human Development, 55:77–95.
Van Heijst, J. J. and Vos, J. E. (1997). Self-organizing effects of spontaneous neural activity on the development of spinal locomotor circuits in vertebrates. Biological Cybernetics, 77:185–195.
Varela, F., Thompson, E., and Rosch, E. (1991). The Embodied Mind. Cambridge, MA: MIT Press.
Varshavskaya, P. (2002). Behavior-based early language development on a humanoid robot. In Proc. of the Second Intl. Conf. on Epigenetic Robotics, pages 149–158.
Vereijken, B., van Emmerik, R. E., Whiting, H. T., and Newell, K. M. (1992). Free(z)ing degrees offreedom in skill acquisition. Journal of Motor Behavior, 24:133–142.
Von der Malsburg, C. (2003). Self-organization and the brain. In Arbib, M., editor, MIT Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press.
Von Hofsten, C. (1984). Developmental changes in the organization of prereaching movements. Developmental Psychology, 20:378–388.
Von Hofsten, C. (1993). Prospective control: A basic aspect of action development. Human Development, 36:253–270.
Von Hofsten, C., Vishton, P., Spelke, E., Feng, G., and Rosander, K. (1998). Predictive action in infancy: Head tracking and reaching for moving objects. Cognition, 67(3):255–285.
Vygotsky, L. (1962). Thought and Language. Cambridge, MA: MIT Press. Original work published in 1934.
Walter, G. W. (1950). An imitation of life. Scientific American, 182(5):42–45.
Walter, G. W. (1951). A machine that learns. Scientific American, 185(2):60–63.
Wang, D. (1995). Habituation. In Arbib, M., editor, The Handbook of Brain Theory and Neural Networks, pages 441–444.
Webb, B. (2001). Can robots make good models of biological behaviour? Behavioral and Brain Sciences, 24:1033–1050.
Weng, J., Hwang, W., Zhang, Y., Yang, C., and Smith, R. (2000). Developmental humanoids: Humanoids that develop skills automatically. In Proc. of the 1st IEEE-RAS Conf. on Humanoid Robots.
Weng, J., McClelland, J., Pentland, A., Sporns, O., Stockman, I., Sur, M., and Thelen, E. (2001).Autonomous mental development by robots and animals. Science, 291(5504):599–600.
Weng, J. J., editor (2000). NSF/DARPA Workshop on Development and Learning. Workshop held in Cambridge, USA.
Westermann, G. (2000). Constructivist Neural Network Models of Cognitive Development. Unpublished PhD thesis, Division of Informatics, University of Edinburgh.
Westermann, G., Lungarella, M., and Pfeifer, R., editors (2001). Proc. of First Intl. Workshop on Developmental Embodied Cognition. Workshop held in Edinburgh, Scotland, unpublished.
Whiten, A. (2000). Primate culture and social learning. Cognitive Science, 24(3):477–508.
Whitman, P. and Kalos, M. (1982). Monte Carlo Methods. New York: Springer-Verlag.
Williamson, M. (1998). Neural control of rhythmic arm movements. Neural Networks, 11(7-8):1379–1394.
Williamson, M. (2001). Robot arm control exploiting natural dynamics. PhD thesis, MIT Department of Electrical Engineering and Computer Science. Unpublished.
Wolpert, D., Doya, K., and Kawato, M. (2003). A unifying computational framework for motor control and social interaction. Philosophical Transactions of the Royal Society London B, 358:593–602.
Wolpert, D., Ghahramani, Z., and Flanagan, R. (2001). Perspectives and problems in motor learning. Trends in Cognitive Sciences, 5(11):487–494.
Wood, D., Bruner, J., and Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, 17:181–191.
Yarbus, A. (1967). Eye Movements and Vision. New York: Plenum Press.
Yoshikawa, Y., Koga, J., Asada, M., and Hosoda, K. (2003). A constructive model of mother-infant interaction: toward infant’s vowel articulation. Connection Science, 15(4):211–229.
Zernicke, R. F. and Schneider, K. (1993). Biomechanics and developmental neuromotor control. Child Development, 64:982–1004.
Ziemke, T. (2003). On the role of robot simulations in embodied cognitive science. Artificial Intelligence and the Simulation of Behavior Journal, 1(4).
Zlatev, J. and Balkenius, C. (2001). Introduction: Why ‘epigenetic robotics’? In Proc. of the First Intl. Workshop on Epigenetic Robotics, pages 1–4.