
Robotics and Autonomous Systems 59 (2011) 1115–1129. doi:10.1016/j.robot.2011.07.006


On-line regression algorithms for learning mechanical models of robots: A survey

Olivier Sigaud*, Camille Salaün, Vincent Padois
Université Pierre et Marie Curie, Institut des Systèmes Intelligents et de Robotique - CNRS UMR 7222, Pyramide Tour 55 - Boîte Courrier 173, 4 Place Jussieu, 75252 Paris CEDEX 05, France

* Corresponding author. Tel.: +33 1 4427 8853; fax: +33 1 4427 5145. E-mail addresses: [email protected] (O. Sigaud), [email protected] (C. Salaün), [email protected] (V. Padois).

Article info

Article history: Received 6 April 2011; Received in revised form 13 July 2011; Accepted 18 July 2011; Available online 23 July 2011.

Keywords: Adaptive and learning systems; Adaptive control; Mechanical models

Abstract

With the emergence of more challenging contexts for robotics, the mechanical design of robots is becoming more and more complex. Moreover, their missions often involve unforeseen physical interactions with the environment. To deal with these difficulties, endowing the controllers of the robots with the capability to learn a model of their kinematics and dynamics under changing circumstances is becoming mandatory. This emergent necessity has given rise to a significant amount of research in the Machine Learning community, generating algorithms that address more and more sophisticated on-line modeling questions. In this paper, we provide a survey of the corresponding literature with a focus on the methods rather than on the results. In particular, we provide a unified view of all recent algorithms that outlines their distinctive features and provides a framework for their combination. Finally, we give a prospective account of the evolution of the domain towards more challenging questions.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

A major trend in robotics research is a shift from industrial applications, where everything can be planned in advance, to service, intervention, and exploration applications where robots must be able to address a large diversity of tasks while reacting to unforeseen situations [1], in particular when they have to interact with humans. Faced with these versatility requirements, the robots themselves get more and more complex. Due to the presence of humans, they must also be less dangerous, thus lighter, which results in stiffness limitations [2]. Taking all these constraints together, the accurate low-level control of these systems for achieving elementary tasks is an increasingly difficult matter, not to mention higher level decision making problems.

In this context, efficient control methods are generally model-based and call upon mechanical models of the robot [3]. For instance, when the robot is redundant with respect to a task, a forward model of its velocity kinematics is mandatory to take advantage of the redundancy. Besides, a model of the dynamics is required when the dynamics effects can generate disturbances that are larger than what local PID controllers with a limited stiffness can reject instantaneously. However, the accuracy of model-based control directly depends on the accuracy of these models and of the sensors used to close the control loop.

Up to a certain extent, mechanical engineering knowledge can help to provide such models. Indeed, the kinematic equation of


a rigid poly-articulated plant is easy to obtain given a precise knowledge of its geometry. Similarly, its dynamic equation can be written as the combination of well-modeled factors such as the inertia of the plant, Coriolis and centrifugal effects as well as external forces (including gravity). But, for instance, the geometry of the mechanical chain changes when the robot is using a tool or when its structure is altered accidentally. And, at the dynamics level, there are additional unmodeled effects such as joint and cable frictions or backlashes [4] that are often difficult to model accurately. Furthermore, some of these effects are inherently non-stationary.

In that context, on-line adaptation of models is a valuable strategy because it can capture non-stationarities in the mechanical properties of the plant.

Two approaches in that respect are the on-line parameter identification step of adaptive control [5], which tunes the parameters of a model given its mechanical structure [6], and model learning, which identifies mechanical models using supervised learning methods without any knowledge of the structure. Whereas classical identification methods generally assume a rigid body model and then incorporate some enhancements to take non-linear phenomena such as friction or flexibilities into account, the model learning approach makes no assumption on the structure and includes all phenomena in a general function built out of experimental data.

This paper provides a survey of state-of-the-art methods in the supervised learning approach. A related survey has been published recently in [7]. That previous publication covers many topics in that domain, from the different control methods that make use of the learned models to the presentation of specific


robotics applications that demonstrate the feasibility of the methods. Another related and recent survey paper is Hoffmann et al. [8]. It investigates the notion of body schema and helps discriminate between different kinds of representation that the controller of a robot can have of its body. It covers the vision processing questions related to the acquisition of such representations (e.g. [9]) and more general work on the acquisition of a body schema or a body image that directly relates visual features to mechanical properties of the plant [10–13]. It also investigates, somewhat superficially, the relationships to computational neuroscience issues related to model learning in animals and humans.

Here, in the terminology of Hoffmann et al. [8], we focus on mechanical models that are either forward or inverse models, without addressing all the other points listed above. More precisely, we present the local and incremental regression algorithms used for learning mechanical models so as to provide a unifying view through a study of their similarities and differences. In particular, we highlight the features that make these algorithms unique, and investigate whether they may be combined into a new generation of methods.

The paper is organized as follows. In Section 2, we give some background knowledge about the mechanical models of robots. In Section 3, we present machine learning methods that are used to learn these models. Each time we present a method, we give an overview of the applications to robot model learning it has given rise to. All these methods are discussed in Section 4. Section 5 is devoted to a prospective presentation of the evolution of the field. Finally, we summarize our main messages in Section 6.

2. Mechanical models of robots

Before presenting the methods used to learn mechanical models in Section 3, in this section we explain what these models are. The control methods based on such models are not covered in this paper; we refer the reader to [7,14] or to other papers cited in Section 3 to learn more about them.

2.1. Joint space and task space

Rigid poly-articulated systems are defined by a finite number of rigid bodies interconnected by ideal joints. An ideal joint is a kinematic relationship between two rigid bodies.

The n parameters chosen to describe the configuration of a robot along with the joint equations are sufficient to describe the pose of each body composing it. These parameters can be gathered to form the vector of configuration parameters q, also named generalized coordinates. The configuration of a robot is thus defined on an n-dimensional space called joint space or configuration space.

Though configuration parameters are often chosen so that they can be intuitively related to the motion of the actuators of the plant, it is very difficult to use them to describe most of the tasks of a robot. In order to facilitate the description of these tasks, an alternative space can be used, which is often called task space or operational space.

A unitary task consists in positioning the coordinate frame of a body, for instance an end-effector, relative to a reference coordinate frame. This task can be described with a vector of task space parameters ξ of size m ≤ 6, since six parameters are sufficient to locally span the space of positions and orientations.

2.2. Joint space to task space mappings

The joint space to task space mapping can be described at three different levels. At the geometric level, the forward kinematics model is a non-linear function f such that ξ = f(q).

If the robot is redundant with respect to the task, there is an infinite number of possible inverses for f. In that case, there is no simple method to span the set of possible solutions at the geometric level and the mapping is often described at the velocity level by the Jacobian matrix J(q) = ∂f(q)/∂q such that

ξ̇ = J(q) q̇. (1)

J(q) is an m × n matrix and thus can be inverted using linear algebra techniques. In the redundant and non-singular cases, i.e. rank(J(q)) = m and m < n, there exists an infinity of possible generalized inverses of J(q) (see [15]). Given the redundancy, a system can take advantage of internal motions that do not induce any perturbation on the task to achieve other tasks. The resulting trajectory of the plant depends on the specific inverse that has been chosen. Different possible redundancy resolution schemes can be applied, depending on the compatibility of the tasks or constraints which have to be solved. We refer the reader to [14] for a formal treatment of this topic. Conversely, in the case of parallel plants, the inverse model is unique whereas there can be an infinity of forward models.
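As an illustration of (1) and of redundancy resolution, the following sketch (not part of the original paper) inverts the velocity kinematics of a hypothetical planar three-link arm with the Moore–Penrose pseudoinverse, one of the generalized inverses mentioned above, and adds task-neutral internal motion through the associated nullspace projector; the link lengths and configuration are arbitrary.

```python
import numpy as np

def jacobian_planar(q, lengths):
    """Position Jacobian (2 x n) of a planar serial arm: an illustrative
    stand-in for J(q) = df(q)/dq of some forward kinematics f."""
    n = len(q)
    cum = np.cumsum(q)                      # theta_1, theta_1+theta_2, ...
    J = np.zeros((2, n))
    for i in range(n):
        # joint i affects every link k >= i
        J[0, i] = -np.sum(lengths[i:] * np.sin(cum[i:]))
        J[1, i] = np.sum(lengths[i:] * np.cos(cum[i:]))
    return J

q = np.array([0.3, 0.5, -0.2])
lengths = np.array([1.0, 1.0, 0.5])
J = jacobian_planar(q, lengths)             # m = 2, n = 3: redundant plant

xi_dot = np.array([0.1, 0.0])               # desired task space velocity
J_pinv = np.linalg.pinv(J)                  # one generalized inverse of J
q0_dot = np.array([0.0, 0.0, 0.1])          # secondary joint space motion
# Task-consistent velocity plus internal motion projected in the nullspace:
q_dot = J_pinv @ xi_dot + (np.eye(3) - J_pinv @ J) @ q0_dot
assert np.allclose(J @ q_dot, xi_dot)       # internal motion leaves the task intact
```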

2.3. Forward and inverse dynamics

The forward dynamics mapping relates forces applied on the plant, among which the applied n-dimensional control input vector Γ, to the acceleration q̈. The Dynamics (D) of a plant is described as

q̈ = A⁻¹(q) (Γ − Γext − b(q, q̇) − g(q) − ϵ(q, q̇)), (2)

where A(q), b(q, q̇), g(q), ϵ(q, q̇) and Γext are respectively the n × n inertia matrix of the plant, the vector of Coriolis and centrifugal effects, the vector of gravity effects, the vector of unmodeled effects and the torques resulting from external forces applied to the plant.

At the dynamics level, the state of the plant is s = [qT, q̇T]T, thus the induced dimensionality is larger than at the kinematics level, where it is q.

The Rigid Body Dynamics (rbd) of the plant corresponds to the dynamics of the system with rigid parts and ideal joints. It is given by (2) where the term corresponding to unmodeled effects is removed. More generally, (2) can be written

q̈ = D(q, q̇, Γ, Γext). (3)

Similarly, the inverse dynamics (ID) is described as

Γ = A(q) q̈ + b(q, q̇) + g(q) + ϵ(q, q̇) + Γext, (4)

or, more generally,

Γ = ID(q, q̇, q̈, Γext). (5)

In any given state s, the joint space to joint space mapping represented by (5) gives the torques to be applied by the actuators of the robot to obtain the desired acceleration q̈ at the next step.
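To make (4) concrete, here is a minimal numerical illustration (not from the paper) for a single-link pendulum, where A(q) reduces to a scalar inertia, b vanishes (no Coriolis or centrifugal effects with a single dof), and ϵ and Γext are taken as zero; the mass and length values are arbitrary.

```python
import numpy as np

# Single-link pendulum: A(q) = m*l^2 (point mass at the tip), b = 0,
# g(q) = m*g*l*sin(q); eps and Gamma_ext are assumed to be zero here.
m, l, g = 1.0, 0.5, 9.81

def inverse_dynamics(q, q_dot, q_ddot):
    """Eq. (4) reduced to one dof: Gamma = A(q) q_ddot + g(q)."""
    A = m * l ** 2
    gravity = m * g * l * np.sin(q)
    return A * q_ddot + gravity

# Torque needed to hold the pendulum horizontal (q = pi/2) at rest:
print(inverse_dynamics(np.pi / 2, 0.0, 0.0))   # = m*g*l = 4.905
```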

2.4. Inverse dynamics in the task space

In [16], Khatib formulated an alternative expression of the dynamics of the plant in the task space

F = Λ(q) ξ̈ + µ(q, q̇) + p(q) + ϵ(q, q̇) + Fext (6)

Γ = J(q)T F (7)

where Λ(q), µ(q, q̇), p(q) and ϵ(q, q̇) are the projections in the task space of the inertia matrix, the Coriolis and centrifugal forces, the gravity forces and the unmodeled effects. Fext is the vector of external forces applied to the plant at the end-effector level. These equations relate the task space acceleration vector ξ̈, whose dimension is less than or equal to 6, to the joint torque vector, whose dimension is equal to n.


Fig. 1. Historical view of several lines of research on supervised learning algorithms giving rise to incremental and local regression methods to infer mechanical models. (The methods shown include ann, mlp, som, rbfn, lwr, rfwr, lwpls, lwpr, gpr, lgp, moe, imlm, gmm, ilo-gmr, svm, losvr, irfrls, lcs, xcs and xcsf.)

In general, inverse dynamics in the task space is a mapping from R2n × Rm to Rn, whereas inverse dynamics in joint space is a mapping from R3n to Rn. Poly-articulated systems have a task space dimension which is usually smaller than or equal to the dimension of the joint space. Thus, the model of the inverse dynamics in the task space is generally smaller than the combination of the kinematics model with the standard inverse dynamics model.

3. Regression of mechanical models

In supervised learning, a class of machine learning techniques, a supervisor provides input/output pairs that can be used to learn a relationship between the corresponding training data. When the output is discrete, the classification problem consists in associating the presented input to an output value, its class. In contrast, when the output is continuous, the regression problem consists in finding a function that approximates as accurately as possible a latent function that describes the input/output relationship.

Formally, given a set of k training samples S = {(xi, yi), i = 1, …, k}, where xi ∈ X ⊆ Rl is a vector of dimension l and yi ∈ Y ⊆ R, the regression problem consists in approximating a latent function f : X → Y with a function f̂ taken in some predetermined family so as to minimize some measure of the approximation error.

For instance, the Least Squares (ls) regression method considers the family of linear functions f̂(x) = βTx, where β is a weight vector minimizing the quadratic error

min_β ‖y − Xβ‖², (8)

where y = [y1, …, yk]T and X = [x1, …, xk]T is a k × l matrix.

Learning a mechanical model of a plant is a regression problem

where training samples are obtained from the state and controls of the plant along time.
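As a minimal illustration of (8), the sketch below (synthetic data, not from the paper) solves the ls problem with numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
k, l = 200, 3
X = rng.normal(size=(k, l))                     # k training inputs of dimension l
beta_true = np.array([0.5, -1.0, 2.0])
y = X @ beta_true + 0.01 * rng.normal(size=k)   # noisy targets

# Minimizer of ||y - X beta||^2, i.e. the ls solution of (8):
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta                                # predictions f_hat(x) = beta^T x
```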

A global view of the regression methods presented in this paper is given in Fig. 1. The next section describes early regression methods used to learn such models, from Artificial Neural Networks (anns) to Locally Weighted Regression (lwr) methods. Then we compare several state-of-the-art methods that all incrementally split the input space into regions and perform regression on local models associated to these regions.

In order to insist on what distinguishes these methods from their competitors and before discussing the impact of these differences, we focus on the following questions for all algorithms. How are the regions defined? How does the algorithm perform local regression? How do the region boundaries evolve? How does the algorithm add (or delete) a region? How does it compute the global output?

3.1. Artificial Neural Networks

This section is dedicated to anns and their evolution towards lwr methods. anns are a wide class of non-linear function approximation tools. They come in different forms such as Multi-Layer Perceptrons (mlps) [17], Self-Organizing Maps (soms) [18] or Radial Basis Function networks (rbfns) [19]. Other forms of neural networks such as population coding are not covered in this paper because they are less relevant to the introduction of the subsequent sections, although they have been applied to robot model learning (see e.g. [20]). For a classical survey about learning mechanical models with anns, we refer the reader to [21].

In all the sections below, α is the learning rate, nn is the number of neurons of a network and nr is the number of regions.

3.1.1. Multi-Layer Perceptrons

An mlp approximates the latent function f through a directed graph of artificial neurons whose nodes convey a non-linear function and edges convey a weight that is tuned by the algorithm. The excitation level vi(x) of each neuron i is calculated as

vi(x) = Σ_{j=1..nn} βij xj

where xj is the j-th input of neuron i and βij is the associated weight. The activation function yi = f(vi) uses the excitation level to determine if neuron i fires or not, and is its output. Classical activation functions are yi = tanh(vi) or the sigmoid function yi = (1 + e^(−vi))⁻¹.

The learning process consists in computing the error of the global network output to update all weighted connections. The weights can be updated by back-propagating through the network an error δyi between the desired value and the real output of each neuron i:

δβij = −α ∂δyi/∂βij (9)

where βij is the weight between input xj and neuron i.
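The delta rule in (9) can be sketched with a small network. The following illustrative snippet (not from the paper; the architecture, target function and learning rate are arbitrary choices) trains a one-hidden-layer mlp with tanh activations by stochastic gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(8, 2))    # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(1, 8))    # hidden -> output weights
alpha = 0.05                               # learning rate

def train_step(x, y):
    """One stochastic gradient step on the squared output error."""
    global W1, W2
    h = np.tanh(W1 @ x)                    # hidden activations
    y_hat = W2 @ h                         # linear output neuron
    err = y_hat - y
    W2 -= alpha * np.outer(err, h)                          # output layer
    W1 -= alpha * np.outer((W2.T @ err) * (1 - h ** 2), x)  # back-propagated
    return float(err[0] ** 2)

# e.g. regressing y = sin(x1) * x2 from random samples:
for _ in range(5000):
    x = rng.uniform(-1, 1, size=2)
    train_step(x, np.sin(x[0]) * x[1])
```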

mlps have been used to learn the inverse kinematics (e.g. [22–24]), the Jacobian matrix [25] or the forward kinematics model of a robot [26,27]. More recently, Sang and Han [28] and Sadjadian and Taghirad [29] used ann methods to learn the forward models of parallel robots, which involve highly coupled non-linear equations that are difficult to model analytically.

3.1.2. Self-Organizing Maps

A som is another type of ann. It is trained using unsupervised learning to produce a low-dimensional discrete representation of the input space. An important feature of soms is that they can preserve the topological properties of the input space.

Barretto et al. [30] provide a good survey of the application of soms to robot identification problems. Most authors use these methods in the context of visuo-motor control, apart from [31,32], who respectively learn the inverse kinematics of a six dofs industrial robot (a Puma 562) and the forward kinematics model of a simple plant. In this context, the evaluation is based on the accuracy of the model itself rather than on its control capabilities.


Fig. 2. Approximation (dotted) of a sinus function (top in black) with an rbfn (bottom in cyan).

In the visuo-motor case, Ritter et al. [33] and Martinetz et al. [34] use soms to learn a mapping between the three-dimensional end-effector position measured by two cameras and joint positions on a three dofs robot. More recent work in the domain can be found in [35].

Furthermore, specific soms called Growing Neural Gas are capable of expanding in the relevant dimensions of a domain. They have been combined with the locally weighted learning methods presented hereafter [36]. The resulting algorithm is endowed with interesting properties for incremental function approximation in a large domain, but to our knowledge it has not been applied to the identification of mechanical models. Thus, we do not investigate the topic in more detail.

3.1.3. Radial Basis Function Networks

Radial Basis Function Networks (rbfns), illustrated in Fig. 2, can be seen as a simplification over mlps, retaining only one layer and using a Radial Basis Function (rbf) as activation function.

An rbf φ is a function whose output value depends only on the distance ri from its center ci to the input value x. A standard rbf is a Gaussian function defined as

φ(ri) = 1/(2πσi²) exp(−ri²/(2σi²)) (10)

where σi² is the variance of the Gaussian function i, which determines its size and shape. The values of all σi are often set to a unique value σ. To perform regression with rbfns, the outputs of all functions are weighted and summed together to get the global network output

y = Σ_{i=1..nn} βi φ(ri) (11)

where the βi are weights.

In rbfns, the weights are updated incrementally with different methods, such as gradient descent based on the error between the output ŷ of the network and the desired value y. As an rbfn can be considered as a one-layer ann, the back-propagation update can be written as

δβi = −α (ŷ − y) φ(ri). (12)
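A minimal sketch of (10)–(12), assuming fixed, evenly spaced centers and a shared width (the cited works may place and tune the basis functions differently, and the normalizing constant of (10) is folded into the weights):

```python
import numpy as np

centers = np.linspace(-5, 15, 20)          # fixed rbf centers c_i
sigma = 1.0                                # shared width sigma
beta = np.zeros_like(centers)              # output weights beta_i
alpha = 0.1

def phi(x):
    """Gaussian rbf activations, eq. (10) up to the normalizing constant."""
    r = x - centers
    return np.exp(-r ** 2 / (2 * sigma ** 2))

def predict(x):
    return beta @ phi(x)                   # eq. (11): weighted sum of rbfs

def update(x, y):
    """Gradient step on the prediction error, in the spirit of eq. (12)."""
    global beta
    beta -= alpha * (predict(x) - y) * phi(x)

rng = np.random.default_rng(0)
for _ in range(20000):
    x = rng.uniform(-5, 15)
    update(x, np.sin(x))                   # approximating a sinus, as in Fig. 2
```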

In [37,38], an rbfn is used to learn the forward kinematics of a robot. Each basis function is differentiated to obtain a Jacobian matrix, using the Resolved Motion Rate Control (rmrc) [39] framework to generate trajectories. In [40], an rbfn is used to learn an inverse dynamics model of a 4 dofs manipulator in three dimensions.

3.1.4. Locally Weighted Regression

Locally Weighted Regression (lwr) methods [41] use Gaussian features like rbfns, but the weights are approximated with the ls method (see Fig. 3).

Fig. 3. In lwr, x is the input of a latent function. A Gaussian, centered on x, weights stored points with respect to their proximity to x. A regression is then computed using the weighted points [42].

The main difference between rbfns and lwr methods lies in the introduction of a linear model associated to each rbf and used to perform ls regression. Each rbf defines a region of validity for the corresponding linear model.

More precisely, around each point x, the contribution of each receptive field to the global output is computed with a Gaussian function (10). Thus lwr globally approximates non-linear functions by combining local linear models.

Compared to anns or rbfns, lwr benefits from the power of linear estimation methods. But lwr needs to retain all training data to perform the ls approximation, which may be infeasible in large spaces given memory limitations. However, the linear models β can also be computed incrementally, using the Recursive Least Squares (rls) algorithm. This extension to lwr gave rise to rfwr [43], an incremental local regression method avoiding storage of data. In parallel, lwr was improved in another direction by lwpls [44], which uses the Partial Least Squares (pls) [45,46] projection mechanism to neglect irrelevant dimensions in the input space.
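Since rls is the incremental workhorse behind rfwr and several methods below, here is a minimal textbook sketch of it (not from the paper; the synthetic data and the initialization constant delta are illustrative):

```python
import numpy as np

class RLS:
    """Recursive Least Squares: the ls solution of (8) updated one sample
    at a time, without storing past data."""
    def __init__(self, dim, delta=100.0):
        self.beta = np.zeros(dim)
        self.P = delta * np.eye(dim)       # inverse input covariance estimate

    def update(self, x, y):
        Px = self.P @ x
        k = Px / (1.0 + x @ Px)            # gain vector
        self.beta += k * (y - self.beta @ x)
        self.P -= np.outer(k, Px)

rls = RLS(dim=3)
rng = np.random.default_rng(0)
for _ in range(500):
    x = rng.normal(size=3)
    rls.update(x, x @ np.array([0.5, -1.0, 2.0]))   # converges to the true beta
```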

3.2. Locally Weighted Projection Regression

Locally Weighted Projection Regression (lwpr) [47] is probably the most popular method for learning mechanical models of robots. As described in [48], it can be seen as combining the properties of lwpls and rfwr. Indeed, it simultaneously performs incremental regression of local linear models like rfwr and input projection like lwpls. Furthermore, the shape of the region corresponding to each linear model is also tuned incrementally. lwpr provides a fast, incremental and reasonably accurate approximation, thus it is a good reference for comparisons.

Definition of regions. In lwpls, rfwr and lwpr, a region is called a receptive field. Like its ancestors, lwpr uses Gaussian functions φi(x)

φi(x) = exp(−½ (x − ci)T Wi (x − ci)) (13)

where the positive distance metric Wi of the Gaussian delimits a region which is updated during learning to match the training data.

Local regression mechanism. As in lwr, each region has a corresponding local linear model βi which is used to predict a local scalar output ŷi relative to an input vector x

ŷi(x) = βi x. (14)

lwpr inherits from lwpls the capacity to reduce the input dimensionality using pls. Some projection is made on the input vector to model high dimensional functions only on their most relevant directions. Moreover, like rfwr, the linear model approximation is performed efficiently using an incremental version of pls called nipals [45]. Several other ways to perform dimensionality reduction in the context of local linear regression are discussed in [49].

Evolution of regions. The distance metric W is adapted when new data are received to improve the coverage of the input space. It is computed as W = MTM, where M is an upper-triangular matrix, to ensure that W is symmetric and positive definite. M is modified recursively and individually to regulate the size of the regions with the update law

M = M − α ∂C/∂M (15)

where C is the minimized cost function. This cost function is more complex than the weighted mean square error criterion in lwr. It is computed by a leave-one-out cross-validation method and includes a penalty term that prevents the population of models from collapsing. The reader can refer to [48] or [47] for a detailed presentation of the incremental version of the algorithm.

Note that, in contrast with the distance metrics, the centers ci of the regions are not adapted. Their position depends on the mechanism used to add a new region.

Adding new regions. The rule for adding a new region is the following. For a given input x, the Gaussian function φi(x) defining each region can be seen as a level of activation of the local model corresponding to this region. So, if, for the current x, the level of activation of no region is above a threshold wgen, a new region is created with x as center and an initial Dinit as distance metric. The value of wgen plays a major role in the number of regions created for a given approximation problem.

Global model approximation. The approximation performed by lwpr is the sum of local linear models weighted by their respective Gaussian weights

ŷ(x) = (1/Φ(x)) Σ_{i=1..nr} φi(x) ŷi(x) (16)

where Φ(x) = Σ_{i=1..nr} φi(x).

Furthermore, the gradient of the latent function, differentiated with respect to the input (and not with respect to time), and the confidence bounds on the predicted output are also provided through an incremental algorithm [50].
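The structure of the prediction (13)–(16) can be sketched as follows (illustrative only, with hand-set regions; the real algorithm adapts ci, Wi and βi incrementally and adds the pls projection described above):

```python
import numpy as np

# A population of regions: centers c_i, distance metrics W_i and linear
# models beta_i, here hand-set for a one-dimensional input.
centers = [np.array([0.0]), np.array([2.0]), np.array([4.0])]
metrics = [np.eye(1), np.eye(1), np.eye(1)]           # W_i
betas = [np.array([1.0]), np.array([0.2]), np.array([-0.5])]

def weight(x, c, W):
    """Receptive field activation, eq. (13)."""
    d = x - c
    return np.exp(-0.5 * d @ W @ d)

def predict(x):
    """Eq. (16): normalized weighted sum of local predictions (14)."""
    w = np.array([weight(x, c, W) for c, W in zip(centers, metrics)])
    y_loc = np.array([b @ x for b in betas])          # y_i(x) = beta_i x
    return (w @ y_loc) / w.sum()

print(predict(np.array([1.0])))
```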

Applications to robot model learning. lwpr is certainly the most used algorithm in the context of learning mechanical models of a plant. As a result, it is used as a basis for comparisons in most if not all the papers presented in the next sections. Hereafter, we list a few of the most impressive applications of lwpr to learning mechanical models of robots.

In [51], an inverse kinematics model is learned with lwpr. The plant is an atr humanoid robot [52] with 26 actuated joints, among which only 4 are relevant for the learning algorithm. An end-effector is controlled in Cartesian space. Even though 22 of these dimensions are irrelevant, the dimension of the input (ξ̇, q) is 29 and the dimension of the output q̇ is 26. The model is learned along a task space trajectory, which is much easier than learning in the whole space. The method solves redundancy by minimizing a cost function that depends on the configuration. Doing so, no inversion is involved and singularity problems are avoided.

In [53,47,48], the authors use lwpr to learn the inverse dynamics model of a seven dofs anthropomorphic Sarcos robotic arm, again along a task space trajectory. The trajectory is followed with a task space PD controller. Some noise is added, making the plant explore and learn an envelope of possible configurations around the trajectory. The input of the learning algorithm is (q, q̇, q̈), of dimension 21, and the output is the torque Γ, of dimension 7. The number of training data points is 5 × 10⁴ and the model contains 260 regions.

In [49], the same approach is used to learn the inverse dynamics model of the atr humanoid robot. The input dimension is 90 and the output dimension is 30. Here, 7.5 × 10⁶ samples and 2200 regions are required to obtain an accurate model. To our knowledge, this is the largest space in which a function has been learned with lwpr, but once again the function is learned along a trajectory in a small sub-part of the input space. Moreover, the number of joints used to control the arm is much smaller than the number of joints taken as input of the learning algorithm. Besides, since learning 30 parallel lwpr models would be computationally too expensive, the authors choose to learn one single model whose 30-dimensional output is projected, thanks to the pls algorithm, onto the input data. Indeed, as a consequence of using the pls regression, it is possible to learn one single model with a vector output instead of a scalar one.

More recently, lwpr was used in [54] to learn the inverse dynamics in the task space of an anthropomorphic Sarcos arm and a Mitsubishi PA-10 robot. For both plants, the input dimension is 17 and the output dimension is 7. As underlined in Section 2.4, those dimensions have to be compared to the sizes of two separate models which can be learned independently: an inverse velocity kinematics model, which links a 3-dimensional space to a 7-dimensional space, and an inverse dynamics model, which links a 21-dimensional space to a 7-dimensional space. The inverse dynamics in the task space is also learned with lwpr in [55], where the system deals with redundancy by choosing a solution that minimizes the torque output used to control the plant.

Finally, in [56], the inverse dynamics model of a two dofs planar arm actuated with three artificial agonist–antagonist pairs of muscles is learned with lwpr in the whole planar space. The input is (q, q̇, u), where u is the activation of the muscles, and the output is (q̇, q̈). The input dimension is dim(q, q̇, u) = 10 and the output dimension is dim(q̇, q̈) = 4. The authors use an optimal feedback control loop based on the ilqg algorithm [57] to find the muscular activations that minimize the muscular input while performing some reaching movements. After a learning period generating 1.2 × 10⁶ training points, the model is composed of 852 regions and reaches three different targets.

3.3. Extended classifier system for function approximation

The xcsf algorithm is a regression method that shares many similarities with lwpr but comes from Learning Classifier Systems (lcss), a family of adaptive rule-based systems first proposed by Holland [58] that combines reinforcement learning mechanisms [59] and genetic algorithms [60].

The immediate ancestor of xcsf is xcs [61], an efficient accuracy-based lcs designed to solve discrete classification problems and sequential decision problems. xcs finds accurate solutions to a wide range of problems with high probability in polynomial time [62]. xcsf [63,64] is an evolution of xcs towards function approximation.

As any lcs, xcsf manages a population P of rules, called classifiers. These classifiers contain a condition part and a prediction part. The condition part defines the region of validity of a local model whereas the prediction part contains the local model itself. xcsf is a generic framework that can use different kinds of prediction models (linear, quadratic, etc.) and can pave the input space with different families of regions (Gaussian, hyper-rectangular, etc.). In this paper, we only consider the case of linear prediction models and Gaussian regions. In this case, the structure of the function approximation built by xcsf is similar to the one built by lwpr.

Definition of regions. An important specificity of xcsf is that it distinguishes a prediction input space and a condition space. The prediction input space, noted X, corresponds to the input used to compute the linear models with regression. The condition space, noted Z, is paved with Gaussian classifiers, corresponding to the regions of lwpr. A classifier defines a domain φi(z) in the following way

φi(z) = exp(−½ (z − ci)T Wi (z − ci)) (17)


where ci is the center of the Gaussian i and Wi is a positive distance metric, the inverse of the squared variance of Gaussian i. The input of the condition space, z ∈ Z, defines the space where the Gaussian influences the global prediction.

The match set M is the subset of all reliable classifiers in the population P for which φi(z) is above a threshold φ0 (this threshold is named θm in [65]).

Local regression mechanism. Each classifier has a corresponding linear model βi which is used to predict a local output vector ŷi relative to an input vector x, provided an augmented state xaug = [xT 1]T to deal with the presence of an offset β0.

The linear model is updated using the rls algorithm, the incremental version of the ls method. If a classifier i makes an error δyi = y − ŷi, it is updated with δβi = α δyi xiT.

Evolution of regions. A specificity of xcsf is that it uses a genetic algorithm (ga) instead of a gradient-based method to update its population P. As a result, there is no evolution of a region per se; the ga rather uses a fitness function to add new regions similar to the old ones after applying mutation operators and deleting some regions. Here we explain how the fitness is obtained.

The prediction error of a classifier i approximates the mean absolute deviation of its prediction using the Widrow–Hoff delta rule [66]

δϵi = α (|δyi| − ϵi)

where ϵi is the mean absolute error of the classifier i at the previous time step. Then the classifier accuracy is determined by the scaled inverse of ϵi. Classifiers with an ϵi below a threshold ϵ0 are considered trustworthy. Finally, the fitness value is derived from the classifier accuracy relative to all others in the match set M (see [67,65] for algorithmic details).

Adding and deleting regions. In xcsf, a classifier is created when there is no classifier in the match set corresponding to an input. This mechanism is called covering: it creates a matching condition which corresponds to that input with a randomly generated condition space.

But classifiers can also be created by an evolutionary component based on a niched steady-state ga [64] improved with a set-size-relative tournament selection [68]. This favors the recombination of efficient classifiers by crossover and mutations. A ga is run if the time since the last ga invocation exceeds a threshold θGA, to balance non-uniform problem-space sampling.

The reproduction of classifiers is performed within the match set M, whereas the deletion of inaccurate classifiers is performed over the whole population P. This combination of mechanisms ensures a good generalization capability [61,67]. In particular, general classifiers are reproduced more often as they match the input data more often than the others.

In more detail, the ga selects in the current match set M two classifiers based on their relative predictive accuracy with respect to the other classifiers. The centers ci_new of their offspring are moved within an interval ‖ci − ci_new‖ ≤ √(−2 Wi ln φ0), where φ0 is the threshold used to generate the match set M. The shape of the ellipsoids, defined by σi, the square root of Wi, is also randomly increased or decreased within a limited interval.

Deletion takes place in the population when the number of classifiers in P reaches the maximum nP_max. Supernumerary classifiers are deleted from P with a probability proportional to an estimate of the size of the match sets that they occur in. If a classifier is sufficiently experienced and its fitness F is significantly lower than the average fitness of the classifiers in P, its deletion probability is further increased [69].


Finally, two additional processes that further optimize the population, namely subsumption and compaction, are described in [65].

Global model approximation. The global population P partitions the input space into a set of overlapping prediction models. xcsf generates the global approximation based on the current match set M. The learning output ŷ is given for an (x, z) pair as the weighted sum of the linear models of each classifier i in M

ŷ(x, z) = (1/Φ(z)) Σ_{i=1..nM} φi(z) ŷi(x) (18)

where Φ(z) = Σ_{i=1..nM} φi(z) and nM is the number of classifiers in the match set M.

In sum, the evolutionary mechanism is designed to evolve partitions in which linear approximations are maximally accurate. Indeed, xcsf strives to evolve a complete, maximally accurate, and maximally general population of local approximations [70]. A more complete description of xcsf can be found in [67] or in [65].

Applications to robot model learning. In [65,71,72], the authors use xcsf to learn the forward kinematics model of a four dofs simulated arm in order to model motor adaptation in arm reaching experiments. They invert the model with classical analytical methods in order to control an end-effector in three dimensions while controlling the redundancy. They choose three different secondary tasks in the joint space, which consist in minimizing angular velocities, avoiding joint limits or specifying a global posture.

3.4. Support Vector Regression

In Support Vector Regression (svr), the regression extension of Support Vector Machines (svms), it is assumed that the latent function f(x) can be parametrized as f(x) = φ(x)T w, where φ is a feature vector mapping the input x into some high dimensional space and w is the corresponding weight vector [73–75]. The weight vector w can be represented as a linear combination of the input vectors in the feature space with coefficients βi, i.e. w = Σ_{i=1..nf} βi φ(xi), where nf is the number of features.

Given this representation, the prediction ŷ for an input x is

ŷ = f̂(x) = Σ_{i=1..nf} βi ⟨φ(xi), φ(x)⟩ = Σ_{i=1..nf} βi k(xi, x) (19)

where ⟨·, ·⟩ denotes the dot product between two vectors and k(x, x′) is a kernel function. A standard kernel is the Gaussian function

k(x, x′) = σ² exp(−½ (x − x′)T W (x − x′)), (20)

where W denotes the kernel width.

By comparing (19) with (16) and (18), one can see that the global structure of the approximation in svms is different from the one used in lwpr and xcsf. Whereas the former is a linear combination of kernel functions, the latter is a kernel-weighted combination of linear models.
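The following sketch illustrates the structure of (19)–(20) only; the coefficients βi are hand-set here, whereas svr would obtain them by the quadratic programming step mentioned just below:

```python
import numpy as np

def gaussian_kernel(x1, x2, sigma2=1.0, W=None):
    """Eq. (20), with W = I by default."""
    d = x1 - x2
    W = np.eye(len(d)) if W is None else W
    return sigma2 * np.exp(-0.5 * d @ W @ d)

# Prediction as a linear combination of kernels anchored at training
# inputs, eq. (19); beta would come from the svr optimization (not shown).
X_train = np.array([[0.0], [1.0], [2.0]])
beta = np.array([0.3, -0.1, 0.7])

def predict(x):
    return sum(b * gaussian_kernel(xi, x) for b, xi in zip(beta, X_train))

print(predict(np.array([0.5])))
```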

The standard way to obtain the parameters βi of the models in svms is through a quadratic programming optimization process based on all training samples [74]. There also exists a ls formulation called ls-svm [76,77]. In [78], the authors are interested in obtaining an accurate estimation of internal forces in the arm of the humanoid robot James. The arm of the robot is equipped with a force/torque sensor that estimates a total force from which the


internal and external forces have to be determined. This in turn requires the estimation of a dynamics model, for which the authors compare ls-svm and a batch neural network approach to the analytical derivation of the model from cad parameters. The results show that ls-svm converges faster than the neural network approach, even though the final performance is similar.

An important parameter for computational complexity is the number of support vectors, hence algorithms like ν-svr use a specific meta-parameter to control this number [79]. Given its availability in the libsvm library [80], this method has been used as a baseline for performance comparisons of model learning methods [81,82].

3.4.1. LoSVR

The focus of this survey being incremental and local approaches, we must mention that an svm approach endowed with these properties, called losvr, is presented in [83]. losvr is built on a previous incremental svm algorithm presented in [84]. It is used to learn the inverse dynamics model of a simulated 2 dofs arm and then of a real robot. However, the addressed problem is rather small and the authors themselves expect poor generalization capabilities from their system, so we do not discuss it further in this paper.

3.4.2. Random features regularized least squares

An important concern in locally weighted regression methods is the model complexity, expressed in terms of the number of regions. A standard way to deal with this concern is called regularization. It consists in adding a term related to model complexity to the cost function that is optimized by the approximation method.

For instance, in Regularized Least Squares (rgls, not to be confused with Recursive Least Squares), the ls cost function (8) becomes

min_β (λ/2) ‖β‖² + ‖y − Xβ‖², (21)

where λ > 0 is a regularization parameter that balances the trade-off between complexity and accuracy.

The optimal β can be obtained by setting the gradient of (21) to 0, which gives

β = (λI + XTX)⁻¹ XTy. (22)

Kernel Regularized Least Squares (krgls, again not to be confused with krls) is the extension of rgls to non-linear functions using the same kernel trick as in svms [85]. Like gpr, the problem of krgls is its cubic cost in the number of samples.

In [86], the authors propose to approximate the kernel matrix with a set of random features, giving rise to Random Features Regularized Least Squares (rfrls), an algorithm that converges to krgls in O(1/√D), where D is the number of features on which the kernel is projected.

Finally, in [87], the authors present an incremental version of rfrls (named irfrls hereafter) that introduces the incremental estimation method of rls in rfrls. The complexity of this method is in O(D²), thus it is independent of the number of samples. Furthermore, the accuracy versus time complexity trade-off can be easily tuned by changing the number of features D. Experiments on several databases confirm that krgls is competitive with respect to gpr and that rfrls and irfrls can be set close to krgls with a small enough processing time.
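As a rough illustration of the rfrls idea, the sketch below (our own, not the authors' code) approximates a Gaussian kernel with D random Fourier features in the spirit of [86] and then solves the regularized problem (22) on those features; irfrls would instead maintain β incrementally with rls.

```python
import numpy as np

rng = np.random.default_rng(0)
D, l, lam = 100, 2, 0.1                    # number of features, input dim,
                                           # regularization parameter lambda
Omega = rng.normal(size=(D, l))            # random projections (Gaussian kernel)
b = rng.uniform(0, 2 * np.pi, size=D)      # random phases

def features(X):
    """Random Fourier features approximating a Gaussian kernel."""
    return np.sqrt(2.0 / D) * np.cos(X @ Omega.T + b)

X = rng.uniform(-3, 3, size=(500, l))
y = np.sin(X[:, 0]) * X[:, 1]

# Regularized ls on the features, eq. (22) with X replaced by the
# feature matrix; the complexity depends on D, not on the sample count.
Z = features(X)
beta = np.linalg.solve(lam * np.eye(D) + Z.T @ Z, Z.T @ y)
y_hat = features(X) @ beta
```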

3.5. Gaussian processes regression

In svr (Section 3.4), one is looking for a unique vector of weights β such that the approximation

f̂(x) = Σ_{i=1..nf} βi φi(x)

is as close as possible to the latent function f. Bayesian linear regression methods provide an alternative basis to this problem. Instead of just looking for a specific vector β, they look for the posterior distribution over β given all the approximation samples and some prior. Given the resulting distribution, the best model can be determined either by averaging over the distribution of models or by selecting the Maximum A Posteriori (MAP) element of the distribution. As a result, such methods provide a more accurate but also more expensive way of finding an appropriate model.

As clearly explained in [88], Gaussian Processes Regression (gpr) methods are mathematically equivalent to Bayesian linear regression. The main assumption in gpr is that the data presented to the learning process are corrupted by Gaussian noise with zero mean and variance σn².

An important property of Gaussian vectors is the following. If n ≥ 1 and X = [X1, …, Xn, Xn+1] is a Gaussian vector of mean µ = 0 whose covariance matrix can be decomposed in the following way:

Σ = [ K   k  ]
    [ kT  k* ],

where K is a covariance matrix, k is a vector and k* is a scalar, then E(f(xn+1) | Y1 = y1, …, Yn = yn) = kT K⁻¹ y and Var(f(xn+1) | Y1 = y1, …, Yn = yn) = k* − kT K⁻¹ k, where yT = (y1, …, yn), E(·) denotes the expectation and Var(·) denotes the variance.

If the latent function f is assumed to be a Gaussian process, a covariance function C : (x, x′) → E(f(x) f(x′)) must be chosen to estimate the covariance matrix K out of data. Once this function is chosen and K is estimated, the approximation is obtained by just generating a Gaussian random variable of mean kT K⁻¹ y and of variance k* − kT K⁻¹ k (see [88] for details).

is that one does not need to define a set of feature functions overthe input space. But, given the necessity to compute K−1, standardgpr has O(n3) computation and O(n2) storage cost, where n is thenumber of samples used for approximation. Thus, asWilliams [88]points out, gpr methods can be preferred only if n is not too bigwith respect to the number of feature functions that would berequired to perform Bayesian linear regression.

3.5.1. Practical approaches to GPR

Given the cubic cost in the number of samples, there are two practical approaches to gpr. The sparse approach consists in limiting n by adequately choosing the points to remember and forgetting the rest. A good overview of this approach can be found in [89]. For instance, it has been used recently in Sparse On-line Gaussian Processes (sogp), in the context of learning control policies from demonstrations [90]. The mixture of experts (moe) approach rather consists in splitting the global domain of the latent function into smaller regions. In moe methods, the regions are called experts. Indeed, if one splits the function domain into k regions, the standard gpr complexities become k·O(ni³) and k·O(ni²) respectively, where ni is the number of samples that fall in region i ∈ {1, …, k}.

This approach is also used to learn a policy from demonstration in [91], which describes an incremental version of Infinite Mixture of Gaussian Process Experts (imgpe) [92] and compares it to lwpr. By the analysis of the authors themselves, even if their algorithm benefits from the Bayesian linear regression properties, in contrast with lwpr, and provides more accurate results in some contexts, it is still too slow to be used in an on-line robotics set-up. To our knowledge, this system has not been applied to learning mechanical models of a robot. The system presented below can be seen as a better compromise following this line of research.


3.5.2. LGP

Local Gaussian Processes (lgp) is a system that combines the accuracy of gpr methods on one region with the good capability of lwpr to split the latent function domain into regions [93,81,94,95]. Many mechanisms being identical to those of lwpr, we only give a brief presentation, insisting on the points where both systems differ.

In lgp, regions are defined as in lwpr. Furthermore, the mechanism for adding a new region in lgp is identical to the one used in lwpr: a region is added if the distance of the current point to all activated regions is more than a threshold wgen.

By contrast, the mechanisms for the evolution of regions differ. In lwpr, the parameters W determining the shape of the regions evolve whereas the centers of all regions are fixed. In lgp, the parameters W are fixed whereas the centers move each time a region receives a new point, the center being the barycenter of all the points associated to the region.

The linear approximation provided in each region is based on a local gpr method. In that context, the covariance function used is the standard Gaussian kernel function (20).

In order to compute the global model output, instead of using the models of all regions as done in lwpr, lgp uses only the M regions whose centers are closest to the current point.

In order to prevent the k·O(ni³) computation cost from becoming too expensive, once a threshold is reached, the algorithm removes the points that bring the least information. In [75], lgp is augmented with an on-line sparsification algorithm inspired from what is used in Kernel Recursive Least Squares (krls) [96]. This additional process addresses the same issue as sogp, i.e. the incremental selection of the samples that are used to train the local Gaussian processes.

In terms of application to robot model learning, in [81], the authors compare the performance of lgp with lwpr, standard gpr and ν-svr (see Section 3.4). The comparison is performed on learning the inverse dynamics in the task space of a seven dofs Sarcos arm. It shows that lgp is competitive and provides a good trade-off between lwpr and gpr.

Finally, in [95], the authors also show that using lgp can give rise to better results than just using the parametric model provided by the manufacturer of a torque-controlled manipulator.

3.6. Gaussian mixture models

A mixture model is a probabilistic model for representing the presence of sub-populations within an overall population, without requiring that an observed data-set should identify the sub-population to which an individual observation belongs. In that context, the Expectation–Maximization (EM) algorithm [97] can be used to make statistical inferences about the properties of the sub-populations given only observations on the global population.

The EM algorithm is an iterative method for finding maximum likelihood or MAP estimates of parameters in statistical models where the model depends on unobserved latent variables. EM alternates between performing an expectation (E) step and a maximization (M) step. The E step computes the expectation of the log-likelihood evaluated using the current estimate of the latent variables. The M step computes parameters maximizing the expected log-likelihood found in the E step. These estimates are then used to determine the distribution of the latent variables in the next E step, up to convergence. In the mixture model context, the latent variable in the EM algorithm is the identity of the sub-population to which a specific data point should be attributed.
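As an illustration, here is a compact EM loop for a one-dimensional Gaussian mixture with two sub-populations (synthetic data; initialization and iteration count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# Observations drawn from two hidden sub-populations:
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

mu, var, pi = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(50):
    # E step: responsibility of each Gaussian for each point (the latent variable)
    lik = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    r = lik / lik.sum(axis=1, keepdims=True)
    # M step: re-estimate the parameters from the responsibility-weighted points
    Nk = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    pi = Nk / len(x)
```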

The specific computation of the E step and the M step depends on the underlying model. In Gaussian Mixture Models (gmms), the population is a predetermined collection of Gaussian regions, thus the function approximation is built as in rbfns [98].

3.6.1. ILO-GMR

In the context of machine learning methods for robotics, gmms are used mostly as an alternative to gpr methods for learning policies by demonstration (see [99] for a review). The corresponding methods are called Gaussian Mixture Regression (gmr); they are cubic in the number of samples, like gpr.

ilo-gmr, presented in [82], is one such method. However, one section of the paper is devoted to the comparison of ilo-gmr with lwpr, gpr, ν-svr, gmr, and lgp on the problem of approximating the inverse velocity kinematics model of a Sarcos robot arm, using the database provided by the authors of Nguyen-Tuong et al. [81]. The results show that ilo-gmr performs close to gpr and slightly better than lgp and standard gmr.

With respect to standard gmr, the main advantage of ilo-gmr lies in an efficient partitioning of the set of samples into regions based on an algorithm similar to kd-trees described in [100]. This allows one to break the cubic complexity of gmr and to quickly access the output for a given input.

In all other respects, ilo-gmr is identical to standard gmr, thus the model is defined as a fixed collection of Gaussian regions whose weights are trained with an EM algorithm.

3.6.2. Incremental mixture of linear modelsAll themethods presented so far are approximatingmechanical

models as a mapping between an input space and an output space.This mapping is a function that returns a unique output for agiven input. It can represent a forward model or an inverse modeldepending onwhich variables are used as input andwhich are usedas output. Moreover, two functions may represent the inverse ofeach other if part of the input and the output are exchanged in thelearning process.

In contrast with this standard approach, the method presentedin [101] approximates a manifold in the space spanned by allinput and output variables. This manifold can represent both aforward and an inverse model at the same time. It corresponds toa subspace that is compatible with the constraints generated bythe considered model. Formally, if we call ν a [xT yT

]Tpoint in

the global space, the manifold can be represented by a constraintH(ν) = 0 that imposes the restriction that the point νmust complywith the model.

For instance, let us consider the kinematicsmodel. The concate-nation of any q vector and the corresponding ξ vector of task spaceposition defines a point in a q×ξ space. All such points are lying onthe manifold that characterizes the kinematics of the correspond-ing plant.

In practice, the authors approximate this manifold using a gmm approach: they estimate the probability of a point ν belonging to the manifold, i.e.,

$p(H(\nu) = 0 \mid \nu_1, \nu_2, \ldots, \nu_N)$.

Definition of regions. More precisely, the manifold verifying H(ν) = 0 is approximated with a collection of regions that are defined as Gaussian functions with a center cm and a local covariance matrix Cm. The probability of a data point νi belonging to a region m is noted p(νi|m).

Evolution of regions. The evolution of the parameters cm and Cm of these regions is driven by an incremental implementation of the EM algorithm (see [101] for the equations realizing this incremental implementation).

Adding and deleting regions. The system starts with only one region and adds new regions when necessary. More precisely, if, for some point ν, p(ν|m) is below a threshold pgen for all regions, then a new region n is added, centered on this point ν and with a covariance initialized to maximize p(ν|n).
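The following sketch illustrates this region-creation rule under simplifying assumptions (an isotropic initial covariance and names of our own choosing); see [101] for the actual equations.

```python
import numpy as np

def maybe_add_region(nu, centers, covs, p_gen, sigma0=0.1):
    """Create a new Gaussian region on nu if no region explains it well."""
    def p_nu_given_m(nu, c, C):
        dev = nu - c
        e = np.exp(-0.5 * dev @ np.linalg.solve(C, dev))
        return e / np.sqrt(np.linalg.det(2 * np.pi * C))
    if all(p_nu_given_m(nu, c, C) < p_gen for c, C in zip(centers, covs)):
        centers.append(nu.copy())                       # new center on nu
        covs.append(sigma0**2 * np.eye(len(nu)))        # simplified init
    return centers, covs
```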




The system also computes an adjacency matrix between all pairs of regions, based on the idea that if, for a point ν, p(ν|m1) and p(ν|m2) are both above a threshold, then m1 and m2 are very likely to be adjacent.

Global model approximation. The global probability of a data point νi belonging to the manifold verifying H(ν) = 0 results from a combination of the probabilities of all local regions. The difficulty here is that the system must provide either a forward or an inverse model depending on the needs of the controller. Thus it is designed to answer a query: given a specification of some coordinates of a point, what are (if any) the points lying on the manifold that match this specification?

For a given query $\nu_q$, after simple mathematical calculations, the system can provide for each region m a corresponding answer $\nu_{a_m}$, with a degree of confidence given by $p(\nu|m)$, where $\nu = [\nu_q^T\ \nu_{a_m}^T]^T$.

But, in many cases, there will be either several discrete possible answer values or a continuous region corresponding to an infinity of such potential answers. Thus, a simple weighted average over the answers of all reliable regions does not provide a satisfactory solution.

Instead, the system uses the adjacency relationship described above to group together the answers of connected components, and it provides one averaged answer and the corresponding degree of confidence for each group. By doing so, a set of answers may sample a continuous space of solutions (a sketch of this grouping step is given at the end of this section).

Applications to robot model learning. To our knowledge, this system has not been applied so far to a robot. It was only compared with lwpr in simulated experiments. For the sake of easy reference, we call this system imlm (for Incremental Mixture of Linear Models) below.
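The sketch below illustrates the grouping step just described: reliable regions are grouped into connected components of the adjacency matrix, and one weighted-average answer is returned per group. All names are illustrative, not those of the actual implementation.

```python
import numpy as np

def group_answers(answers, confidences, adjacency, min_conf):
    """Group reliable regions into connected components and average them."""
    reliable = [i for i, c in enumerate(confidences) if c >= min_conf]
    seen, groups = set(), []
    for i in reliable:                      # depth-first connected components
        if i in seen:
            continue
        stack, comp = [i], []
        while stack:
            j = stack.pop()
            if j in seen:
                continue
            seen.add(j)
            comp.append(j)
            stack += [k for k in reliable if adjacency[j][k] and k not in seen]
        groups.append(comp)
    # one weighted-average answer and mean confidence per connected group
    return [(np.average([answers[j] for j in comp], axis=0,
                        weights=[confidences[j] for j in comp]),
             float(np.mean([confidences[j] for j in comp])))
            for comp in groups]
```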

4. Discussion

All the systems studied in Section 3 have been applied to learning the mechanical model of some plant. However, many other regression algorithms exist that could be applied to this problem. In particular, the regression methods used to learn a control policy from demonstration can generally be applied as such to model identification problems. In that category, we already mentioned sogp and imgpe, for instance. We did not present this broader class of systems because it is well covered in [102,99].

In this section, we focus on the incremental and local regression algorithms used for learning a mechanical model of robots presented in more detail above. We no longer discuss the anns approaches, which are not local and therefore do not fall into the unified view presented below. In practice, we focus on lwpr, xcsf, irfrls, lgp, imlm and ilo-gmr. The purpose of this section is to compare the systems, first by providing a unifying view, then by discussing their strengths and weaknesses based on their properties. This part of the discussion is summarized in Table 1. Finally, we investigate from the unifying view whether their specific features could be combined into a new generation of systems.

Table 1
Summary of the discussion of the systems.

| System | Based on | Main feature | Other assets | Drawbacks | Favorite context of use |
|---|---|---|---|---|---|
| LWPR | LWR | Dimensionality reduction | Available in C++ and Matlab with cookbook. Fast. Widely used | Hard to tune. Initialization stage required | Large systems, along a trajectory, with dynamics |
| XCSF | LCSs | Prediction vs. condition space distinction | Available in Java. Flexible | Complex. Sensitive to tuning | Learning on whole space |
| iRFRLS | SVR | Constant computational complexity | Simple. Easy tuning | No improvement of features | Learning on whole space |
| LGP | GPR + LWR | Sparsification | Less meta-parameters than LWPR | Slower than LWPR | Sparse data, requiring accuracy |
| IMLM | GMMs | Bidirectional mapping | No matrix inversion required | Less mature, not used on real robots | Redundant and parallel systems |
| ILO-GMR | GMR | Use of kd-trees for fast access | Simple | Fixed number of features | Learning on whole space |

4.1. Performance comparisons

Several papers present a performance comparison between some of the systems reviewed here [81,95,87,82]. These comparisons are made possible thanks to the availability of a Sarcos and a Barrett arm database first published in [81]. The papers generally compare the accuracy of the models, measured with a standard nmse calculation, and sometimes the prediction time as a function of the number of samples.
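For reference, the normalized mean squared error (nmse) underlying these comparisons is commonly defined as the mean squared prediction error normalized by the variance of the observed outputs:

$$\mathrm{nmse} = \frac{\frac{1}{N}\sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2}{\sigma_y^2},$$

where $\hat{y}_i$ denotes the prediction for sample $i$ and $\sigma_y^2$ the variance of the targets $y_i$.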

However, the corresponding results are to be taken with care. Indeed, most of the algorithms are sensitive to meta-parameter tuning and there is no standard way to accurately perform this tuning so that all algorithms are given a fair chance. As a consequence, the same algorithm is attributed different performances in different papers.

A striking outcome of a survey of these results is that, despite its popularity, lwpr is generally less accurate than most of its competitors. This paradox can be explained by the fact that lwpr obtained impressive results when learning along a trajectory for very large plants, whereas in the comparisons listed above it is used for learning over the whole space for smaller plants. We refer the reader to the list of papers above for other performance comparisons.

4.2. A unifying view

From the survey of the systems presented in Section 3, a unifying view emerges. We consider four aspects of these systems: the structure of the model, the way this structure evolves or not, the local regression method used and finally the meta-parameter tuning issue.

4.2.1. Structure of the models

If we consider Gaussian regions and linear models for xcsf, and Gaussian kernels for irfrls, then all the systems listed above include a Gaussian component and a linear component in their model. However, two broad classes of structures must be distinguished. In lwpr, lgp and xcsf, the model is made of a Gaussian-weighted combination of linear models, whereas in irfrls, imlm and ilo-gmr, it is made of a linear combination of Gaussian kernels. In that respect, the first class of systems is endowed with a more flexible structure, as already pointed out about lwr in Section 3.1.4.

4.2.2. Evolution of the structure

Table 2 summarizes in what respect the structure of the model can evolve in the different systems.


Table 2
Capabilities of the systems to evolve the structure of the model. A • indicates that the system is endowed with the corresponding capability.

| System | iRFRLS | ILO-GMR | LGP | LWPR | IMLM | XCSF |
|---|---|---|---|---|---|---|
| Addition of regions | | | • | • | • | • |
| Deletion of regions | | | | | | • |
| Moving centers | | • | • | | • | • |
| Moving shapes | | • | | • | • | • |

The systems are organized from the less flexible one, irfrls, where regions are generated randomly once and for all, to the most flexible one, xcsf, where all elements of the structure can evolve.

4.2.3. Regression method

All the algorithms listed above perform local regression based on the minimization of an ls criterion, but the way to perform this minimization differs. As clearly stated in [75], standard svr approaches rely on a batch quadratic programming optimization process to get the linear parameters, whereas gpr gets them through a matrix inversion (see Section 3.5). irfrls is an svr algorithm, but the batch optimization is replaced by an incremental version known as the QR algorithm in the field of adaptive filtering [103]. ilo-gmr relies on a standard EM algorithm. Finally, xcsf uses standard rls whereas lwpr relies on the more sophisticated nipals algorithm.
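As an illustration of the simplest of these incremental solvers, here is a minimal sketch of a generic rls update for a linear model; the class and names are ours, not those of xcsf.

```python
import numpy as np

class RecursiveLeastSquares:
    """Minimal recursive least squares (RLS) estimator for y = w^T x."""

    def __init__(self, dim, forgetting=1.0, delta=1e3):
        self.w = np.zeros(dim)          # linear weights
        self.P = np.eye(dim) * delta    # inverse input covariance estimate
        self.lam = forgetting           # forgetting factor (1.0 = none)

    def update(self, x, y):
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)    # gain vector
        self.w += k * (y - self.w @ x)  # correct the prediction error
        self.P = (self.P - np.outer(k, Px)) / self.lam

# usage: fit a noisy linear map on-line
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
rls = RecursiveLeastSquares(dim=3)
for _ in range(500):
    x = rng.normal(size=3)
    rls.update(x, true_w @ x + 0.01 * rng.normal())
print(rls.w)  # close to true_w
```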

4.2.4. Tuning meta-parameters

In order to approximate an arbitrarily complex function in a large space, one must deal with a trade-off between the accuracy of the approximate model and the number of regions used to perform this approximation. Dealing with this trade-off is a matter of meta-parameter tuning. Different algorithms use a different strategy in that respect.

• In irfrls, the only meta-parameter is the number of Gaussian kernels. This makes it very easy to tune and very fast, but this approach should meet its limitations when many kernels are required.

• In lwpr and lgp, the indirect parametrization is based on setting in advance the size of the regions through the wgen parameter. This is a poor compromise, since one must try a wgen value, monitor the number of regions and the accuracy, and try again as long as the number of regions is too large or the accuracy too low (this trial-and-error loop is sketched just after this list).

• imlm sets a threshold pgen on the accuracy of the local models. Using this threshold provides guarantees in terms of accuracy, at the expense of computational complexity if a segmentation into many regions is necessary to reach the threshold.

• xcsf rather sets a maximum number of regions to perform the approximation. Such a parametrization provides a good grip on the computational complexity of the process, at the expense of accuracy. But xcsf also uses a threshold on the global accuracy of prediction. Indeed, if the maximum population size is reached, the algorithm tries to improve individual classifiers in the population until the accuracy threshold is reached. However, if both constraints are not set with care, the process may diverge due to incompatible evolutionary pressures.
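A minimal sketch of the trial-and-error loop evoked in the second item above, assuming a hypothetical train_model function exposing n_regions and accuracy attributes (not the actual lwpr or lgp API), and assuming that a larger wgen makes the system create regions more readily:

```python
def tune_wgen(train_model, data, max_regions, min_accuracy,
              wgen=0.2, step=1.3, max_trials=20):
    """Retry wgen values until both constraints are met (or give up)."""
    for _ in range(max_trials):
        model = train_model(data, wgen=wgen)
        if model.n_regions > max_regions:
            wgen /= step        # fewer, larger regions
        elif model.accuracy < min_accuracy:
            wgen *= step        # more, smaller regions
        else:
            return model, wgen
    raise RuntimeError("no wgen value satisfies both constraints")
```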

4.3. Key features of systems

The unifying view presented above reveals some of the features on which the systems can readily be compared. We can now highlight the strengths and weaknesses of these systems and exhibit the features that provide them with a unique advantage over their competitors.

4.3.1. LWPR

There are several reasons for the popularity of lwpr. First, good documentation and an efficient and versatile code are available [104]. Second, lwpr has been convincingly applied to the identification of diverse models (kinematics, velocity kinematics and dynamics, forward and inverse) of large mechanical systems such as humanoid robots. This success was possible because lwpr is fast, reasonably accurate, and because it performs dimensionality reduction in the prediction space of each local model. This dimensionality reduction feature is unique to lwpr among the systems we presented. Despite these very appealing features, lwpr also suffers from several important limitations.

First, it comes with many meta-parameters and its performance is sensitive to their tuning. To temper this criticism, Klanke et al. [50] provide a useful ''cookbook'' that helps determine how the meta-parameters should be tuned.

A second problem with lwpr is initialization. Because the centers of the regions do not move during the learning process, lwpr is sensitive to the samples received in the first iterations. Thus, in order to get a good prediction accuracy with lwpr, it is necessary to start with an initialization stage that must ensure a good coverage of the latent function domain. However, this initialization stage can be difficult to realize in a concrete robotics set-up if the system is not endowed with the model it is supposed to learn.

Third, all the impressive results published on plants with many dofs have been obtained along some specific trajectory in the state space of the plant, which makes the problem easier but also less relevant. It is easier because learning along a trajectory dramatically reduces the domain that has to be split into regions. With lwpr, covering a domain requires a number of regions that is roughly exponential in the number of dimensions of the domain: the dimensionality reduction property provided by nipals is applied to the local model in each region but does not result in a significantly lower number of regions. Furthermore, experiments show that the performance of lwpr collapses as soon as the number of regions gets above a few tens of thousands. The problem is also less relevant because controlling a plant along an articular trajectory prevents making profit of redundant dofs. To face this difficulty, the learning mechanism has to be endowed with a good extrapolation capability, which is generally not the case with the local methods studied in this paper in general and with lwpr in particular.

4.3.2. XCSF

xcsf shares a lot of similarities with lwpr in the structure of the learned models. However, it differs in the algorithms used to update this structure and in the meta-parameters that constrain the algorithms. A specific comparison between both systems has shown that xcsf generally outperforms lwpr in standard function approximation problems, even if lwpr usually converges faster [105]. Among other things, xcsf can be tuned more intuitively and its evolutionary component can modify the centers of the approximation regions. Given that xcsf is available in Java [106], the main reason for its lesser use in the model learning community is probably its lesser maturity in the field.

But the unique feature of xcsf is the condition space versus prediction space distinction. This feature can result in important savings when learning mechanical models of a plant. We illustrate these savings on the forward velocity kinematics model and the inverse dynamics model.

According to (1), learning a forward velocity kinematics model with lwpr requires the positions and velocities $q$ and $\dot{q}$ as input and the task velocity $\dot{\xi}$ as output. But, taking a closer look at the equation $\dot{\xi} = J(q)\,\dot{q}$, one can see that it relates $\dot{\xi}$ linearly to $\dot{q}$ for different contexts based on $q$. Thus one can use the values of $q$ to define the condition space and the corresponding relation between $\dot{\xi}$ and $\dot{q}$ to learn the local models in the prediction space.

Instead of segmenting a $(q, \dot{q})$ space into regions to approximate the forward velocity kinematics function, one only needs to segment the $(q)$ space. As a result, the condition space is half as large with xcsf as with any other local regression method.
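The following toy sketch illustrates this split on a planar 2-dof arm: Gaussian regions live in the $(q)$ condition space only, while each region fits a local linear map (a local Jacobian) from $\dot{q}$ to $\dot{\xi}$ in the prediction space. The batch weighted least-squares fit and all names are ours, used only to illustrate the dimensionality argument.

```python
import numpy as np

rng = np.random.default_rng(1)

def fk_vel(q, qd, l1=1.0, l2=1.0):
    """True task velocity of the arm, used here only to generate data."""
    J = np.array([[-l1*np.sin(q[0]) - l2*np.sin(q[0]+q[1]), -l2*np.sin(q[0]+q[1])],
                  [ l1*np.cos(q[0]) + l2*np.cos(q[0]+q[1]),  l2*np.cos(q[0]+q[1])]])
    return J @ qd

centers = rng.uniform(-np.pi, np.pi, size=(25, 2))  # regions over q only
sigma = 0.8

Q = rng.uniform(-np.pi, np.pi, size=(2000, 2))
QD = rng.normal(size=(2000, 2))
XD = np.array([fk_vel(q, qd) for q, qd in zip(Q, QD)])

def activations(q):
    return np.exp(-((q - centers)**2).sum(axis=1) / (2 * sigma**2))

# one weighted least-squares Jacobian per region (prediction space)
jacobians = []
for c in centers:
    w = np.sqrt(np.exp(-((Q - c)**2).sum(axis=1) / (2 * sigma**2)))[:, None]
    sol, *_ = np.linalg.lstsq(QD * w, XD * w, rcond=None)
    jacobians.append(sol.T)

def predict(q, qd):
    w = activations(q)
    return sum(wi * (Ji @ qd) for wi, Ji in zip(w, jacobians)) / w.sum()

q, qd = np.array([0.3, -0.5]), np.array([0.2, 0.1])
print(predict(q, qd), fk_vel(q, qd))  # the two should be close
```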

The same is true when one considers learning an inverse dynamics model. Eq. (5) describing the dynamics of a poly-articulated system can be reformulated as

$$\Gamma = A(q)\,\ddot{q} + B(q, \dot{q}). \qquad (23)$$

This equation can be viewed as a linear relation between $\ddot{q}$ and $\Gamma$, with dependencies on $q$ and $\dot{q}$:

$$\Gamma = \begin{bmatrix} A(q) & B(q, \dot{q}) \end{bmatrix} \begin{bmatrix} \ddot{q} \\ 1 \end{bmatrix}. \qquad (24)$$

In this context, $B(q, \dot{q})$ can be seen as an offset. Thus, similarly to the velocity kinematics case, one can learn an inverse dynamics model by providing at each time step the state $(q, \dot{q})$ as condition parameters and the $(\ddot{q}, \Gamma)$ pair as input to the prediction space.

This approach is less expensive in terms of dimensionality than the standard one. Indeed, one only needs to cover $(q, \dot{q})$ instead of $(q, \dot{q}, \ddot{q})$ to learn the model. As a result, the space to be segmented into regions has one third fewer dimensions.

In Section 3.1.4, we noted that the major improvement in function approximation techniques brought by lwr has consisted in separating a linear model from the radial basis function that delimits the corresponding region of influence. In xcsf, the idea is put one step further: since those entities are separated, they do not need to be based on the same dimensions, thus the global input vector can be split into a condition part and a prediction part. The autonomous determination of which input should be used in the condition part, in the prediction part, or just ignored is a matter for future research.

4.3.3. IRFRLS

The main feature of irfrls is its constant computational complexity. This property results from the segmentation of the latent function domain into a set of random kernels, which is the simplest possible mechanism. The number of kernels is the only meta-parameter that needs to be tuned. On the data sets presented in [81] and other data sets extracted from the James and iCub humanoid robots, irfrls obtains a surprisingly good performance compared to its competitors. However, the random kernel approach obviously results in a suboptimal structure of the model, which may become critical when the model has to be very large. This system being very recent, more work is required to better determine its empirical properties, but given its simplicity, it is a good starting point for a new family of systems.

4.3.4. LGP

lgp was designed after lwpr and has been compared favorably to its ancestor in several contexts [81,95]. One difficulty with lwpr being meta-parameter tuning, a strong incentive for calling upon gpr methods is that they generally come with far fewer meta-parameters. Moreover, by splitting the domain of approximation into regions, lgp breaks down the computational complexity of standard gpr methods. It is still slower than lwpr, but it also inherits the greater accuracy of gpr. However, some meta-parameters inherited from lwpr for the segmentation into regions, such as wgen, are still difficult to tune.

Because it performs local regression based on a gpr process, lgp is strongly driven towards the use of sparse data for each region. Sparsification is the main feature brought into this survey by lgp.

Along this line of research, one may wonder whether using standard Bayesian linear regression instead of gpr in a mixture of experts approach would not result in a more efficient method, the features necessary for Bayesian linear regression being provided by the segmentation into regions.
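For reference, here is a minimal sketch of the Bayesian linear regression evoked above: with a Gaussian prior w ~ N(0, I/alpha) and noise precision beta, the posterior over the weights is Gaussian in closed form. Names and hyper-parameter values are illustrative.

```python
import numpy as np

def blr_posterior(X, y, alpha=1.0, beta=25.0):
    """Return posterior mean and covariance of the weights."""
    S_inv = alpha * np.eye(X.shape[1]) + beta * X.T @ X
    S = np.linalg.inv(S_inv)
    m = beta * S @ X.T @ y
    return m, S

def blr_predict(x, m, S, beta=25.0):
    """Predictive mean and variance at input x."""
    return m @ x, 1.0 / beta + x @ S @ x

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.2 * rng.normal(size=100)
m, S = blr_posterior(X, y)
print(blr_predict(np.array([0.5, 0.1, -0.3]), m, S))
```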

4.3.5. IMLM

As explained in Section 2.2, the forward velocity kinematics model of a redundant plant is unique whereas there exists an infinity of corresponding inverse models. Conversely, for a parallel plant, the inverse velocity kinematics model is unique whereas there is an infinity of forward models. A first appealing property of the approach of Lopes and Damas [101] is that it can store all such models from data without knowing in advance whether the model is unique, nor losing the information corresponding to the potential infinity of solutions.

A second appealing property is that no matrix inversion is necessary to retrieve either forward or inverse information from the stored data. Intuitively, in the case of kinematics, one may provide a vector q or a vector ξ and retrieve the complementary part by looking for all points in the manifold that share the specified coordinates.

Nevertheless, retrieving the required information from such a manifold is not straightforward. The manifold being approximated by a set of Gaussian regions, the system returns one value for each region whose probability of belonging to the manifold is above a threshold. As a consequence, when there is an infinity of correct answers to a query, the system only returns a set of discrete answers sampling the domain of correct answers. Furthermore, the accuracy of the sampling is sensitive to the size of the regions.

Finally, when there are several correct answers, one must take additional constraints into consideration to choose a specific answer among the possible ones. Despite a first solution proposed in [101], this point is still an open problem.

4.3.6. ILO-GMR

ilo-gmr is a straightforward extension of gmr systems designed to perform learning from demonstration. It is a simple approach, whose main drawback is that the number of kernels has to be determined in advance. However, an interesting feature of ilo-gmr is its use of kd-trees to organize the regions and the corresponding samples so as to quickly access them. An extension of gpr with kd-trees has also been proposed in [107].
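The fast-access idea can be sketched as follows: training inputs are stored in a kd-tree so that, at query time, only the samples nearest to the query are retrieved for local regression. This illustrates the data structure, not the actual ilo-gmr implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(10000, 4))   # training inputs
Y = np.sin(X).sum(axis=1, keepdims=True)  # toy targets

tree = cKDTree(X)                         # build the kd-tree once
xq = rng.uniform(-1, 1, size=4)
dist, idx = tree.query(xq, k=50)          # 50 nearest samples, O(log N)

# a simple locally weighted estimate from the retrieved neighborhood
w = np.exp(-dist**2 / (2 * 0.1**2))
print((w @ Y[idx]).item() / w.sum())
```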

4.4. Combining features from several systems

In the previous section, the key features of the studied systems have been highlighted. Now, we discuss whether these features could be combined in order to build a new generation of systems.

4.4.1. Dimensionality reduction

The incremental dimensionality reduction property provided to lwpr by the nipals algorithm is one of its key features. It is particularly adequate for learning models along trajectories in spaces with many dimensions, some of which are irrelevant along the given trajectory. xcsf learns the same kind of linear models as lwpr, using rls instead of nipals. It seems that nipals could replace rls in xcsf without major modifications of the rest of the algorithm. Whether a similar dimensionality reduction property could be introduced into gpr or gmm based methods such as lgp and imlm is a more difficult open issue.


4.4.2. Condition space versus prediction space distinction

As outlined above, lwpr, lgp and xcsf all build their model as a Gaussian-weighted combination of linear models. As a consequence, the distinction between prediction space and condition space used in xcsf could certainly be transferred to lwpr and lgp. In contrast, such a transfer would be much less immediate for all the other systems, which use a different structure for the approximation of the latent function.

4.4.3. Learning the input–output manifold

Learning the input × output manifold is unique to imlm. All the other methods are designed to approximate a function rather than a manifold. In any case, even if replacing the gmm foundation of that system by other regression methods such as the ones used in xcsf or lgp is probably an option, more work is required on top of imlm in order to efficiently use its unique feature.

4.4.4. Meta-parameter optimization

From Section 4.2.4, the meta-parameters used in imlm and xcsf look easier to tune than the ones used in lwpr and lgp, with a slight advantage to xcsf. Whether the approach used in xcsf can be transposed to the other systems is an open question.

Alternatively, one may think of dealing with the accuracy versus number of regions trade-off with a multi-objective optimization approach. Such an approach would generate a population of non-dominated solutions among which a very accurate solution, a very sparse one, or any compromise between both extrema could be chosen by the user.

4.4.5. Sparsification and fast access to samples

It seems that the sparsification algorithm proposed in lgp is not specific to this algorithm in any way. Thus, it could probably be included in most of the other systems, giving rise to immediate computational savings.

Similarly, the fast access to samples brought by the use of kd-trees in ilo-gmr could certainly be imported into most, if not all, of the systems listed in this survey.

A new system that incorporates these properties, as well as some of those listed in the previous sections, remains to be built, giving rise to a new generation of more efficient systems.

5. Towards more advanced systems

Most of the experimental work presented so far consists in learning mechanical models of a robot without any interaction with its environment. In this context, the resulting benefit is limited. Indeed, the mechanical structure of a robot is generally almost perfectly known through the cad model, thus good engineering and a possible additional parametric identification process can suffice to get an accurate model.

Obviously, the methods presented in this paper become more advantageous when they address the robotics contexts outlined in the introduction. Future robots will interact physically with unknown objects and will even do so with unknown tools given the fast development of dexterous manipulation capabilities [108–110]. Even more convincingly, they will interact with human users that are highly unpredictable. In all these cases, a model provided in advance cannot generally account for unspecified interactions. In this section, we first stress the challenges raised by these more difficult contexts of use, then we survey some new lines of research that start addressing them.

5.1. New challenges

To face the challenges of interactions with the environment, the next generation of systems will have to address problems with many more dimensions, they will have to adapt faster, and they will have to memorize several models corresponding to different contexts of use, together with the capability to switch between these models appropriately.

5.1.1. Challenge 1: plants with more dimensions

Among the algorithms surveyed here, lwpr is the only one that has been used on a plant as complex as a humanoid. But only one arm of the robot was moving and, above all, learning was performed along a specific trajectory. Learning the kinematics and dynamics models of a robot with more than 10 dofs over the whole space is still beyond the capabilities of all state-of-the-art methods. However, for a humanoid engaged in a complex everyday-life mission, having such a complete model is necessary. A key property for addressing these larger problems without facing hard computational complexity challenges is generalization. Future systems should definitely be able to extrapolate the models they are learning beyond the domain where they were trained, which is generally not the case for the local learning methods surveyed in this paper.

5.1.2. Challenge 2: faster adaptation

A barely addressed issue in all the papers surveyed here is the speed of adaptation. Indeed, when a robot interacts with different objects or users, one would like the model to adapt to a new context of interaction fast enough to deal immediately with this new context. For instance, if a robot is pouring water from a container, the weight is changing quickly. Or, if a robot is in physical interaction with a user, the force exerted by the user may vary at any moment.

It is hard to accurately present the state of the art on that topic in the absence of any systematic study, but the methods presented in this survey are not equivalent in that respect and are probably slower than what is required for the kind of non-stationary interactions just described above.

5.1.3. Challenge 3: switching between contexts

If a robot has to physically interact with several users or objects, possibly with some tools, the systems described in this paper must be enriched with two capabilities: the capability to learn different models corresponding to the different contexts of interaction and the capability to switch between these models appropriately. Otherwise, the system would spend a lot of time re-learning the same models back and forth. An additional requirement is that the system be able to recognize which model it should be using in a given context, or to determine that a new model is necessary.

5.2. New lines of research

Corresponding to the above challenges, two of the lines of research below consist in incremental improvements of the existing systems, whereas the third line consists in an extension of the addressed domain. However, each of the first two lines contributes to all challenges simultaneously.

5.2.1. Line 1: Combining knowledge and regression

A way to increase the speed of adaptation and to deal with larger plants is to rely more on expert knowledge about the plant. Indeed, the model learning approach does not benefit from prior knowledge of the rbd model of the plant, though this knowledge is generally available. Recently, the authors of Nguyen-Tuong and Peters [111] combined the strengths of rbd identification and machine learning methods, using gpr for learning the difference between the parametric mapping and the observed behavior.
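The semi-parametric idea can be sketched as follows: a (possibly inaccurate) rigid body dynamics model provides a first torque estimate, and a GP is trained on the residual between measured and predicted torques. Here rbd_torque is a placeholder for an analytical rbd model and the data are synthetic; this is not the implementation of [111].

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def rbd_torque(x):
    """Nominal rbd torque of a 1-dof toy plant (inertia and damping)."""
    q, qd, qdd = x[:, 0], x[:, 1], x[:, 2]
    return 2.0 * qdd + 0.5 * qd

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))                                # (q, qd, qdd)
tau = 2.2 * X[:, 2] + 0.5 * X[:, 1] + 0.3 * np.sin(X[:, 0])  # "measured"

gp = GaussianProcessRegressor().fit(X, tau - rbd_torque(X))  # residual model

x_new = rng.normal(size=(5, 3))
tau_hat = rbd_torque(x_new) + gp.predict(x_new)              # combined prediction
```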


Their method provides a higher model accuracy and better generalization for unknown trajectories compared to rbd identification and gpr-based methods. Furthermore, it shows a good learning performance even on sparse and noisy data.

Though their results are promising, it is still unclear how prior knowledge helps in learning robot dynamics, notably depending on the control law used to actuate the robot. Furthermore, their semi-parametric regression approach does not alleviate the methodological difficulty of performing identification, which, in most cases, is not straightforward.

A different approach to the same problem is presented in [112], where the authors incorporate prior knowledge of the rbd model of the plant into lwpr by initializing all linear models with a first order approximation of the known rbd. On a simple simulated system, this convincingly improves the learning and generalization performance.

Other methods based on visual processing, such as Mansard et al. [10], Sturm et al. [11], Hersch et al. [12], and Martinez-Cantin et al. [13], incorporate prior knowledge of the structure of the body of the robot. These methods are surveyed in [8]. The promising line of research corresponding to all these works should be pursued whatever the model learning method used.

With a very different perspective, the system presented in [113] learns the forward kinematics model of mechanical systems using Bezier curves. Their method is based on the insight that the forward kinematics model of a mechanical system with rotational joints mainly calls upon trigonometric functions. It spans efficiently the parametric space of such functions in order to get an accurate model from sparse data.

Though this method does not rely on a parametric model of the robot in the classical sense, it is based on a strong assumption about the structure of the model. On the one hand, this assumption makes the parameter tuning more efficient. On the other hand, it makes the method less general. In particular, it can only be efficient when learning the forward kinematics model of a plant whose dofs all result from rotational joints. The extension to prismatic joints and to dynamics must call upon additional assumptions.

5.2.2. Line 2: Active learning

Different ways to span the input/output space can result in a large difference in the time necessary to learn an accurate model over the whole space. Thus a good sampling strategy can increase both the speed of adaptation and the size of the problems that can be addressed. In Machine Learning, choosing the set of learning examples appropriately so as to increase the learning speed is called active learning.

Active learning of a mechanical model is a difficult topic because it generates a ''chicken and egg'' problem. Indeed, in order to choose the samples that feed the learning process, it is necessary to know how to drive the plant towards the corresponding states. But to do so, having an accurate model is generally required. A way to escape this difficulty consists in calling upon model-free control methods such as articular PID controllers to drive the system towards areas of interest. So far, there are few attempts to design specific active learning methods for learning mechanical models of robots, exceptions being Robbel [114], Saegusa et al. [115], Baranes and Oudeyer [116,117], and Martinez-Cantin et al. [13]. Given the growing interest in active learning in general, important progress can be expected in this domain in the near future.

5.2.3. Line 3: Switching models and sharing between models

In [118,119], the authors address the problem of switching between models raised in Section 5.1.3. They rely on lwpr combined with an EM algorithm to learn elementary models and use a latent variable to account for the different unknown contexts of interaction. So far, their work is just a proof of concept. It should be extended with perception processes to help the system recognize the different contexts based on richer information than just the articular state of the robot.

Along a different line of research, given the infinity of inverse velocity kinematics models of a redundant plant, a different model can be used to achieve each different task in a different way. Thus, addressing a complex mission composed of several tasks can also be seen as a problem of learning several models that must be memorized in parallel. Some authors are using this approach and learn mechanical models in the context of multiple tasks based on gpr. More interestingly, since the inverse dynamics model is unique but may be involved in the realization of diverse tasks at the kinematic level, one can see the problem of learning this inverse dynamics model through a first task and using it for other tasks as a transfer learning problem [120,121].

On top of those ideas, a next challenge will consist in generalizing what is learned in interaction with one object to interaction with a class of similar objects.

6. Conclusion

The on-line modeling of robotics systems with regression methods is a very active field. In this paper, we surveyed in particular the local and incremental algorithms currently used to solve this problem, providing a unifying view and comparing their features. Among other things, we have shown that, despite its popularity and efficiency for some problems such as learning along a trajectory in a large domain, lwpr suffers from several important drawbacks that justify the important effort dedicated to looking for a better algorithm.

Furthermore, we have shown that several systems come with interesting additional features, such as the condition space/prediction space distinction in xcsf or the capability to learn both the forward and inverse models in a unique manifold in imlm. We highlighted that there is still room for large performance improvements in this domain just by combining some of the features of these algorithms. Further ideas, such as the multi-objective optimization of the trade-off between the accuracy and the number of regions, might also give rise to new lines of research. Finally, we are convinced that the application of such combinations of algorithms should help address in the near future the ''new frontier'' of model learning, such as the physical interaction with unknown objects, tools and users.

Acknowledgment

This work is supported by the French ANR program (ANR 2010 BLAN 0216 01); more information at http://macsi.isir.upmc.fr.

References

[1] O. Sigaud, J. Peters, Introduction, in: O. Sigaud, J. Peters (Eds.), From Motor Learning to Interaction Learning in Robots, Springer, 2010, pp. 1–12 (Chapter 1).
[2] M. Zinn, O. Khatib, B. Roth, J.K. Salisbury, Playing it safe, IEEE Robotics and Automation Magazine (2004) 12–21.
[3] W. Chung, L.-C. Fu, S.-H. Hsu, Motion control, in: B. Siciliano, O. Khatib (Eds.), Handbook of Robotics, Springer, 2008, pp. 133–159 (Chapter 6).
[4] R. Featherstone, D.E. Orin, Dynamics, in: B. Siciliano, O. Khatib (Eds.), Handbook of Robotics, Springer, 2008, pp. 35–65 (Chapter 2).
[5] B. Siciliano, L. Sciavicco, L. Villani, G. Oriolo, Robotics: Modelling, Planning and Control, Springer Verlag, 2008.
[6] L. Ljung, System Identification: Theory for the User, Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1986.
[7] D. Nguyen-Tuong, J. Peters, Model learning for robot control: a survey, Cognitive Processing (2011) 1–22.
[8] M. Hoffmann, H. Marques, A. Arieta, H. Sumioka, M. Lungarella, R. Pfeifer, Body schema in robotics: a review, IEEE Transactions on Autonomous Mental Development 2 (2010) 304–324.
[9] M. Jagersand, Image-based predictive display for high dof uncalibrated tele-manipulation using affine and intensity subspace models, Advanced Robotics 14 (2001) 683–701.


[10] N. Mansard, M. Lopes, J. Santos-Victor, F. Chaumette, Jacobian learning methods for tasks sequencing in visual servoing, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, 2006, pp. 4284–4290.
[11] J. Sturm, C. Plagemann, W. Burgard, Unsupervised body scheme learning through self-perception, in: IEEE International Conference on Robotics and Automation, IEEE, 2008, pp. 3328–3333.
[12] M. Hersch, E. Sauser, A. Billard, Online learning of the body schema, International Journal of Humanoid Robotics 5 (2008) 161–181.
[13] R. Martinez-Cantin, M. Lopes, L. Montesano, Body schema acquisition through active learning, in: IEEE International Conference on Robotics and Automation, 2010, pp. 1860–1866.
[14] C. Salaün, V. Padois, O. Sigaud, Control of redundant robots using learned models: an operational space control approach, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2009, pp. 878–885.
[15] A. Ben Israel, T.N.E. Greville, Generalized Inverses: Theory and Applications, 2nd ed., Springer, 2003.
[16] O. Khatib, Dynamic control of manipulators in operational space, in: Sixth CISM-IFToMM Congress on Theory of Machines and Mechanisms, 1983, pp. 1128–1131.
[17] F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain, Psychological Review 65 (1958) 386–408.
[18] T. Kohonen, Self-Organizing Maps, Springer, Berlin, 2001.
[19] G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems, MCSS 2 (1989) 303–314.
[20] M. Butz, O. Herbort, J. Hoffmann, Exploiting redundancy for flexible behavior: unsupervised learning in a modular sensorimotor control architecture, Psychological Review 114 (2007) 1015–1046.
[21] M. Jordan, D. Rumelhart, Forward models: supervised learning with a distal teacher, Cognitive Science: A Multidisciplinary Journal 16 (1992) 307–354.
[22] F. Pourboghrat, Neural networks for learning inverse-kinematics of redundant manipulators, in: 32nd Midwest Symposium on Circuits and Systems, vol. 2, 1989, pp. 760–762.
[23] J. Barhen, S. Gulati, M. Zak, Neural learning of constrained nonlinear transformations, Computer 22 (1989) 67–76.
[24] Z. Ahmad, A. Guez, On the solution to the inverse kinematic problem, in: IEEE International Conference on Robotics and Automation, vol. 3, 1990, pp. 1692–1697.
[25] S. Lee, R. Kil, Robot kinematic control based on bidirectional mapping neural network, in: International Joint Conference on Neural Networks, vol. 3, 1990, pp. 327–335.
[26] M. Brüwer, H. Cruse, A network model for the control of the movement of a redundant manipulator, Biological Cybernetics 62 (1990) 549–555.
[27] L. Nguyen, R. Patel, K. Khorasani, Neural network architectures for the forward kinematics problem in robotics, in: International Joint Conference on Neural Networks, vol. 3, 1990, pp. 393–399.
[28] L.H. Sang, M.-C. Han, The estimation for forward kinematic solution of Stewart platform using the neural network, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 1, 1999, pp. 501–506.
[29] H. Sadjadian, H.D. Taghirad, Numerical methods for computing the forward kinematics of a redundant parallel manipulator, in: IEEE Conference on Mechatronics and Robotics, Aachen, Germany, 2004, pp. 557–562.
[30] V.R. Barretto, A.F.R. Araujo, H.J. Ritter, Self-organizing feature maps for modeling and control of robotic manipulators, Journal of Intelligent and Robotic Systems 36 (2003) 407–450.
[31] J. Walter, K. Schulten, Implementation of self-organizing neural networks for visuo-motor control of an industrial robot, IEEE Transactions on Neural Networks 4 (1993) 86–96.
[32] D. Wang, A. Zilouchian, Solutions of kinematics of robot manipulators using a Kohonen self-organizing neural network, in: IEEE International Symposium on Intelligent Control, 1997, pp. 251–255.
[33] H.J. Ritter, T.M. Martinetz, K.J. Schulten, Topology-conserving maps for learning visuo-motor-coordination, Neural Networks 2 (1989) 159–168.
[34] T. Martinetz, H. Ritter, K. Schulten, Three-dimensional neural net for learning visuomotor coordination of a robot arm, IEEE Transactions on Neural Networks 1 (1990) 131–136.
[35] S. Kumar, P. Premkumar, A. Dutta, L. Behera, Visual motor control of a 7dof redundant manipulator using redundancy preserving learning network, Robotica 28 (2010) 795–810.
[36] F. Flentge, Locally weighted interpolating growing neural gas, IEEE Transactions on Neural Networks 17 (2006) 1382–1392.
[37] G. Sun, B. Scassellati, Reaching through learned forward model, in: 4th IEEE/RAS International Conference on Humanoid Robots, Los Angeles, USA, vol. 1, 2004, pp. 93–112.
[38] G. Sun, B. Scassellati, A fast and efficient model for learning to reach, International Journal of Humanoid Robotics 2 (2005) 391–414.
[39] D.E. Whitney, Resolved motion rate control of manipulators and human prostheses, IEEE Transactions on Man–Machine Systems 10 (1969) 47–53.
[40] S. Lin, A.A. Goldenberg, Neural-network control of mobile manipulators, IEEE Transactions on Neural Networks 12 (2001) 1121–1133.
[41] C.G. Atkeson, Using locally weighted regression for robot learning, in: IEEE International Conference on Robotics and Automation, vol. 2, 1991, pp. 958–963.
[42] D. Cohn, Z. Ghahramani, M. Jordan, Active learning with statistical models, Journal of Artificial Intelligence Research 4 (1996) 129–145.
[43] S. Schaal, C.G. Atkeson, Receptive field weighted regression, Tech. Rep. TR-H-209, ATR Human Inf. Process. Lab., Kyoto, Japan, 1997.
[44] S. Schaal, S. Vijayakumar, C.G. Atkeson, Local dimensionality reduction, in: Advances in Neural Information Processing Systems, 1998, pp. 633–639.
[45] H. Wold, Soft modelling by latent variables: the non-linear iterative partial least squares (NIPALS) approach, in: Perspectives in Probability and Statistics, 1975, pp. 117–142.
[46] L. Elden, Partial least-squares vs. Lanczos bidiagonalization-I: analysis of a projection method for multiple regression, Computational Statistics and Data Analysis 46 (2004) 11–31.
[47] S. Vijayakumar, S. Schaal, Locally weighted projection regression: an O(n) algorithm for incremental real time learning in high dimensional space, in: International Conference on Machine Learning, 2000, pp. 1079–1086.
[48] S. Schaal, C.G. Atkeson, S. Vijayakumar, Scalable techniques from nonparametric statistics for real time robot learning, Applied Intelligence 17 (2002) 49–60.
[49] S. Vijayakumar, A. D'Souza, S. Schaal, LWPR: a scalable method for incremental online learning in high dimensions, Technical Report, Edinburgh University Press, 2005.
[50] S. Klanke, S. Vijayakumar, S. Schaal, A library for locally weighted projection regression, Journal of Machine Learning Research 9 (2008) 623–626.
[51] A. D'Souza, S. Vijayakumar, S. Schaal, Learning inverse kinematics, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 1, 2001, pp. 298–303.
[52] M. Kawato, Internal models for motor control and trajectory planning, Current Opinion in Neurobiology 9 (1999) 718–727.
[53] S. Vijayakumar, S. Schaal, Local dimensionality reduction for locally weighted learning, in: IEEE International Symposium on Computational Intelligence in Robotics and Automation, 1997, pp. 220–225.
[54] J. Peters, S. Schaal, Learning to control in operational space, International Journal of Robotics Research 27 (2008) 197–212.
[55] J. Sun de la Cruz, D. Kulic, W. Owen, Learning inverse dynamics for redundant manipulator control, in: International Conference on Autonomous and Intelligent Systems, Springer, 2010, pp. 1–6.
[56] D. Mitrovic, S. Klanke, S. Vijayakumar, Adaptive optimal control for redundantly actuated arms, in: From Animals to Animats 10: 10th International Conference on Simulation of Adaptive Behavior, SAB 2008, Osaka, Japan, July 7–12, 2008, Proceedings, Springer, 2008, p. 93.
[57] W. Li, Optimal Control for Biological Movement, Ph.D. Thesis, University of California, San Diego, 2006.
[58] J.H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, University of Michigan Press, Ann Arbor, MI, 1975.
[59] R. Sutton, A. Barto, Reinforcement Learning, MIT Press, 1998.
[60] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Boston, MA, USA, 1989.
[61] S.W. Wilson, Classifier fitness based on accuracy, Evolutionary Computation 3 (1995) 149–175.
[62] M.V. Butz, D. Goldberg, P. Lanzi, Computational complexity of the XCS classifier system, in: Foundations of Learning Classifier Systems, vol. 51, 2005, pp. 91–125.
[63] S.W. Wilson, Function approximation with a classifier system, in: Genetic and Evolutionary Computation Conference, Morgan Kaufmann, San Francisco, California, USA, 2001, pp. 974–981.
[64] S.W. Wilson, Classifiers that approximate functions, Natural Computing 1 (2002) 211–234.
[65] M.V. Butz, O. Herbort, Context-dependent predictions and cognitive arm control with XCSF, in: Conference on Genetic and Evolutionary Computation, ACM, 2008, pp. 1357–1364.
[66] B. Widrow, M. Hoff, Adaptive switching circuits, Western Electronic Show and Convention (1960) 96–104.
[67] M.V. Butz, T. Kovacs, P.L. Lanzi, S.W. Wilson, Toward a theory of generalization and learning in XCS, IEEE Transactions on Evolutionary Computation 8 (2004) 28–46.
[68] M.V. Butz, D.E. Goldberg, K. Tharakunnel, Analysis and improvement of fitness exploitation in XCS: bounding models, tournament selection, and bilateral accuracy, Evolutionary Computation 11 (2003) 239–277.
[69] T. Kovacs, Deletion schemes for classifier systems, in: Genetic and Evolutionary Computation Conference, Orlando, Florida, 1999, pp. 329–336.
[70] M.V. Butz, D. Goldberg, Bounding the population size in XCS to ensure reproductive opportunities, in: Genetic and Evolutionary Computation, vol. 2724, Springer-Verlag, 2003, pp. 1844–1856.
[71] M. Butz, G. Pedersen, P. Stalph, Learning sensorimotor control structures with XCSF: redundancy exploitation and dynamic control, in: Conference on Genetic and Evolutionary Computation, ACM, 2009, pp. 1171–1178.
[72] P. Stalph, M. Butz, G. Pedersen, Controlling a four degree of freedom arm in 3D using the XCSF learning classifier system, in: KI 2009: Advances in Artificial Intelligence, 2009, pp. 193–200.
[73] H. Drucker, C. Burges, L. Kaufman, A. Smola, V. Vapnik, Support vector regression machines, in: Advances in Neural Information Processing Systems, 1997, pp. 155–161.
[74] A.J. Smola, B. Schölkopf, A tutorial on support vector regression, Statistics and Computing 14 (2004) 199–222.
[75] D. Nguyen-Tuong, J. Peters, Incremental sparsification for real-time online model learning, in: International Conference on Artificial Intelligence and Statistics, vol. 9, 2010, pp. 557–564.


[76] J. Suykens, J. Vandewalle, Least squares support vector machine classifiers, Neural Processing Letters 9 (1999) 293–300.
[77] J. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, J. Vandewalle, Least Squares Support Vector Machines, 2002.
[78] M. Fumagalli, A. Gijsberts, S. Ivaldi, L. Jamone, G. Metta, L. Natale, F. Nori, G. Sandini, Learning to exploit proximal force sensing: a comparison approach, in: O. Sigaud, J. Peters (Eds.), From Motor to Interaction Learning in Robots, Springer, 2010, pp. 149–167.
[79] B. Schölkopf, A. Smola, R. Williamson, P. Bartlett, New support vector algorithms, Neural Computation 12 (2000) 1207–1245.
[80] C. Chang, C. Lin, LIBSVM: a library for support vector machines, 2001. Software available at: http://www.csie.ntu.edu.tw/cjlin/libsvm.
[81] D. Nguyen-Tuong, J. Peters, M. Seeger, B. Schölkopf, Learning inverse dynamics: a comparison, in: European Symposium on Artificial Neural Networks, 2008.
[82] T. Cederborg, M. Li, A. Baranes, P.-Y. Oudeyer, Incremental local online Gaussian mixture regression for imitation learning of multiple tasks, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010, pp. 267–274.
[83] Y. Choi, S.Y. Cheong, N. Schweighofer, Local online support vector regression for learning control, in: International Symposium on Computational Intelligence in Robotics and Automation, 2007, pp. 13–18.
[84] J. Ma, J. Theiler, S. Perkins, Accurate on-line support vector regression, Neural Computation 15 (2003) 2683–2703.
[85] R. Rifkin, G. Yeo, T. Poggio, Regularized least-squares classification, NATO Science Series Sub Series III Computer and Systems Sciences 190 (2003) 131–154.
[86] A. Rahimi, B. Recht, Random features for large-scale kernel machines, in: Advances in Neural Information Processing Systems, 2008, pp. 1177–1184.
[87] A. Gijsberts, G. Metta, Incremental learning of robot dynamics using random features, in: IEEE International Conference on Robotics and Automation, 2010, pp. 951–956.
[88] C.K.I. Williams, Prediction with Gaussian processes: from linear regression to linear prediction and beyond, Technical Report NCRG/97/012, Neural Computing Research Group, 1997.
[89] J. Quiñonero Candela, C.E. Rasmussen, A unifying view of sparse approximate Gaussian process regression, Journal of Machine Learning Research 6 (2005) 1939–1959.
[90] D. Grollman, O.C. Jenkins, Sparse incremental learning for interactive robot control policy estimation, in: IEEE International Conference on Robotics and Automation, 2008, pp. 3315–3320.
[91] F. Woods, D. Grollman, K. Heller, O.C. Jenkins, M. Black, Incremental Nonparametric Bayesian Regression, Technical Report, Brown University Department of Computer Science, 2008.
[92] E. Meeds, S. Osindero, An alternative infinite mixture of Gaussian process experts, in: Advances in Neural Information Processing Systems, MIT Press, 2006, pp. 883–890.
[93] D. Nguyen-Tuong, J. Peters, Local Gaussian processes regression for real-time model-based robot control, in: IEEE/RSJ International Conference on Intelligent Robot Systems, 2008, pp. 380–385.
[94] D. Nguyen-Tuong, M. Seeger, J. Peters, Model learning with local Gaussian process regression, Advanced Robotics 23 (2009) 2015–2034.
[95] D. Nguyen-Tuong, M. Seeger, J. Peters, Real-time local Gaussian processes model learning, in: O. Sigaud, J. Peters (Eds.), From Motor to Interaction Learning in Robots, Springer, 2010, pp. 193–207.
[96] Y. Engel, S. Mannor, R. Meir, The kernel recursive least-squares algorithm, IEEE Transactions on Signal Processing 52 (2004) 2275–2285.
[97] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B 39 (1977) 1–38.
[98] Z. Ghahramani, S.T. Roweis, Learning nonlinear dynamical systems using an EM algorithm, in: Advances in Neural Information Processing Systems, MIT Press, 1999, pp. 599–605.
[99] S. Calinon, Robot Programming by Demonstration, EPFL, CRC Press, 2009.
[100] A. Angeli, D. Filliat, S. Doncieux, J. Meyer, Fast and incremental method for loop-closure detection using bags of visual words, IEEE Transactions on Robotics 24 (2008) 1027–1037.
[101] M. Lopes, B. Damas, A learning framework for generic sensory-motor maps, in: IEEE/RSJ International Conference on Intelligent Robot Systems, 2007, pp. 1533–1540.
[102] B.D. Argall, S. Chernova, M. Veloso, B. Browning, A survey of robot learning from demonstration, Robotics and Autonomous Systems 57 (2009) 469–483.
[103] A.H. Sayed, Adaptive Filters, Wiley-IEEE Press, 2008.
[104] 2008. http://homepages.inf.ed.ac.uk/svijayak/software/LWPR/.
[105] P. Stalph, J. Rubinsztajn, O. Sigaud, M. Butz, A comparative study: Function approximation with LWPR and XCSF, in: 13th International Workshop on Advances in Learning Classifier Systems, 2010.
[106] 2009. http://www.coboslab.psychologie.uniwuerzburg.de/code/.
[107] Y. Shen, A. Ng, M. Seeger, Fast Gaussian process regression using kd-trees, in: Advances in Neural Information Processing Systems, 2006, p. 1225.
[108] R. Grupen, T. Henderson, I. McCammon, A survey of general-purpose manipulation, The International Journal of Robotics Research 8 (1989) 38.
[109] A. Bicchi, Hands for dexterous manipulation and robust grasping: a difficult road toward simplicity, IEEE Transactions on Robotics and Automation 16 (2000) 652–662.
[110] K. Kaneko, K. Harada, F. Kanehiro, Development of multi-fingered hand for life-size humanoid robots, in: IEEE International Conference on Robotics and Automation, 2008, pp. 913–920.
[111] D. Nguyen-Tuong, J. Peters, Using model knowledge for learning inverse dynamics, in: IEEE International Conference on Robotics and Automation, 2010.
[112] J. Sun de la Cruz, D. Kulić, W. Owen, Online incremental learning of inverse dynamics incorporating prior knowledge, in: International Conference on Autonomous and Intelligent Systems, Springer, 2011, pp. 167–176.
[113] S. Ulbrich, V. Ruiz de Angulo, T. Asfour, C. Torras, R. Dillmann, Rapid learning of humanoid body schemas with kinematic Bezier maps, in: IEEE-RAS International Conference on Humanoid Robots, 2009, pp. 431–438.
[114] P. Robbel, Active Learning in Motor Control, Master's Thesis, University of Edinburgh, UK, 2005.
[115] R. Saegusa, G. Metta, G. Sandini, Active learning for sensorimotor coordinations of autonomous robots, in: Human System Interaction, 2009, pp. 701–706.
[116] A. Baranes, P.-Y. Oudeyer, R-IAC: robust intrinsically motivated exploration and active learning, IEEE Transactions on Autonomous Mental Development 1 (2009) 155–169.
[117] A. Baranes, P.-Y. Oudeyer, Maturationally-constrained competence-based intrinsically motivated learning, in: IEEE International Conference on Development and Learning, 2010.
[118] G. Petkos, M. Toussaint, S. Vijayakumar, Learning multiple models of non-linear dynamics for control under varying contexts, in: International Conference on Artificial Neural Networks, ICANN, Springer, 2006, pp. 898–907.
[119] G. Petkos, S. Vijayakumar, Load estimation and control using learned dynamics models, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007, pp. 1527–1532.
[120] K.M.A. Chai, C.K.I. Williams, S. Klanke, S. Vijayakumar, Multi-task Gaussian process learning of robot inverse dynamics, in: Advances in Neural Information Processing Systems, 2009.
[121] D.-Y. Yeung, Y. Zhang, Learning inverse dynamics by Gaussian process regression under the multi-task learning framework, in: The Path to Autonomous Robots, 2009, pp. 131–142.

Olivier Sigaud received his Ph.D. in Computer Science in 1996 from the University of Paris 11, Orsay, France. He has been Research Engineer for the Dassault Aviation (Aerospace) company from 1995 to 2001.

Then, he joined the AnimatLab in the Computer Science Department of UPMC in Paris, as Assistant Professor. He became Professor there in 2005. In 2007, the AnimatLab joined the ISIR in the same University. Olivier Sigaud is the author of over 75 referenced technical publications. His research interests include Machine Learning, Human motor control, and Robotics.

Camille Salaün is teaching assistant in Mechanical Engineering at UPMC. He received his Master's degree in 2007 and his Ph.D. in 2010 from UPMC. His research activities mainly focus on robot model learning, with applications to the humanoid robot iCub.

Vincent Padois is Associate Professor of Robotics and Computer Science and a member of the Institut des Systèmes Intelligents et de Robotique (ISIR) at Université Pierre et Marie Curie (UPMC) in Paris, France. He received his Master's degree in 2001 and his Ph.D. in 2005, both in Automatic Control, from the Institut National Polytechnique de Toulouse, France. In 2006 and 2007, he was a post-doctoral fellow in the group of Professor O. Khatib at Stanford University. His research activities mainly focus on the modelling and control of redundant and complex systems such as wheeled mobile manipulators and humanoid robots. As a natural extension, his research activities involve human motion analysis and the combination of model-based control theory in Robotics with incremental model learning techniques. Finally, he is involved in the development of dynamic simulation software dedicated to Robotics.