
Incremental Online Sparsification for Model Learning in Real-time Robot Control

Duy Nguyen-Tuong and Jan Peters

Max Planck Institute for Biological Cybernetics, Spemannstraße 38, 72076 Tübingen

Abstract

For many applications such as compliant, accurate robot tracking control, dynamics models learned from data can help to achieve both compliant control performance as well as high tracking quality. Online learning of these dynamics models allows the robot controller to adapt itself to changes in the dynamics (e.g., due to time-variant nonlinearities or unforeseen loads). However, online learning in real-time applications – as required in control – cannot be realized by straightforward usage of off-the-shelf machine learning methods such as Gaussian process regression or support vector regression. In this paper, we propose a framework for online, incremental sparsification with a fixed budget designed for fast real-time model learning. The proposed approach employs a sparsification method based on an independence measure. In combination with an incremental learning approach such as incremental Gaussian process regression, we obtain a model approximation method which is applicable in real-time online learning. It exhibits competitive learning accuracy when compared with standard regression techniques. Implementation on a real Barrett WAM robot demonstrates the applicability of the approach in real-time online model learning for real world systems.

Keywords: Sparse Data, Machine Learning, Real-time Online Model Learning, Inverse Dynamics, Robot Control

1. Introduction

In recent years, model learning has become an important tool in a variety of robotics applications such as terrain modeling [5], sensor evaluation [11], model-based control [8, 13] and many others. The reason for this rising interest is that accurate analytical models are often hard to obtain due to the increasing complexity of modern robot systems. Model learning can be a useful alternative, as the model is estimated directly from measured data. Unknown nonlinearities are directly taken into account, while they are neglected by standard physics-based modeling techniques or only approximated by hand-crafted models. Nevertheless, the excessive computational complexity of the more advanced regression techniques still hinders their widespread application in robotics.

Models that have been learned offline can only approximate the model correctly in the area of the state space that is covered by the sampled data, and often do not generalize beyond that region. Thus, in order to cope with unknown state space regions, online model learning is essential. Furthermore, it also allows the adaptation of the model to changes in the robot dynamics, for example, due to unforeseen loads or time-variant nonlinearities such as backlash, complex friction and stiction.

However, real-time online model learning poses three major challenges: first, the learning and prediction processes need to be sufficiently fast; second, the learning system needs to deal with large amounts of data; third, the data arrives as a continuous stream and, thus, the model has to be continuously adapted to new training examples.



A few approaches for real-time model learning for robotics have been introduced in the machine learning literature, such as locally weighted projection regression [18] or local Gaussian process regression [10]. In these methods, the state space is partitioned into local regions for which local models are approximated and, thus, these methods do not make use of the global behavior of the embedded functions. As the proper allocation of relevant areas of the state space is essential, appropriate online clustering becomes a central problem for these approaches. For high-dimensional data, partitioning of the state space is well-known to be a difficult issue [18, 10]. To circumvent this online clustering, an alternative is to find a sparse representation of the known data [12, 15, 6]. For robot applications, however, this requires an incremental sparsification method applicable in real-time online learning – a major challenge tackled in this paper.

Inspired by the work in [15, 3], we propose a method for incremental, online sparsification which can be integrated into several existing online regression methods, making them applicable for model learning in real-time. The suggested sparsification is performed using a test of linear independence to select a sparse subset of the training data points, often called the dictionary. The resulting framework allows us to derive criteria for incremental insertion and deletion of dictionary data points, which are two essential operations in such an online learning scenario. For evaluation, we combine our sparsification framework with an incremental approach for Gaussian process regression (GPR) as described in [10]. The resulting algorithm is applied in online learning of the inverse dynamics model for robot control [17, 9].

The rest of the paper is organized as follows: first, we present our sparsification approach which enables real-time online model learning. In Section 3, the efficiency of the proposed approach in combination with an incremental GPR update is demonstrated by an offline comparison of learning inverse dynamics models with well-established regression methods, i.e., ν-support vector regression [16], standard Gaussian process regression [12], locally weighted projection regression [18] and local Gaussian process regression [10]. Finally, the capability of incremental GPR using online sparsification for real-time model learning is illustrated by online approximation of inverse dynamics models for real-time tracking control on a Barrett WAM. A conclusion is given in Section 4.

2. Incremental Sparsification for Real-time Online Model Learning

In this section, we introduce a sparsification method which – in combination with an incremental kernel regression – enables fast, real-time model learning. The proposed sparsification approach is formulated within the framework of kernel methods. Therefore, we first present the basic intuition behind kernel methods and motivate the need for online sparsification. Subsequently, we describe the proposed sparsification method in detail.

2.1. Model Learning with Kernel Methods

By learning a model, we want to approximate a mapping from the input set X to the target set Y. Given n training data points $\{x_i, y_i\}_{i=1}^n$, we intend to discover the latent function $f(x_i)$ which transforms the input vector $x_i$ into a target value $y_i$ given by the model $y_i = f(x_i) + \epsilon_i$, where $\epsilon_i$ represents a noise term. In general, it is assumed that $f(x)$ can be parametrized as $f(x) = \phi(x)^T w$, where $\phi$ is a feature vector mapping the input $x$ into some high-dimensional space and $w$ is the corresponding weight vector [15, 12]. The weight $w$ can be represented as a linear combination of the input vectors in the feature space, i.e., $w = \sum_{i=1}^n \alpha_i \phi(x_i)$ with $\alpha_i$ denoting the linear coefficients. Using these results, the prediction $y$ for a query point $x$ is given by

$$y = f(x) = \sum_{i=1}^n \alpha_i \langle \phi(x_i), \phi(x)\rangle = \sum_{i=1}^n \alpha_i\, k(x_i, x). \quad (1)$$

[Figure 1: Sparsification for online model learning.]

As indicated by Equation (1), the inner product of feature vectors $\langle \phi(x_i), \phi(x)\rangle$ can be represented as a kernel value $k(x_i, x)$ [15]. Thus, instead of finding a feature vector, only appropriate kernels need to be determined. An often used kernel is, for example, the Gaussian kernel

$$k(x_p, x_q) = \exp\left(-\tfrac{1}{2}(x_p - x_q)^T W (x_p - x_q)\right), \quad (2)$$

where $W$ denotes the kernel widths [15, 12]. For employing kernel methods in model learning, however, one needs to estimate the linear coefficients $\alpha_i$ using training examples. State-of-the-art kernel methods such as kernel ridge regression, support vector regression (SVR) or Gaussian process regression (GPR) differ in the approaches for estimating $\alpha_i$ [15, 12, 4]. While support vector regression estimates the linear coefficients by optimization using training data [15], kernel ridge regression and Gaussian process regression basically solve the problem by matrix inversion [4, 12] (see the Appendix for a short review of GPR). The complexity of model learning with kernel methods, i.e., the estimation of $\alpha_i$, depends largely on the number of training examples. In GPR, for example, the computational complexity is $\mathcal{O}(n^3)$ if the model is obtained in batch learning.
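To make Equations (1) and (2) concrete, the following minimal sketch (our own Python/NumPy code, not taken from the paper; function and variable names are ours) evaluates the Gaussian kernel and forms a prediction as a kernel-weighted sum, assuming the coefficients $\alpha_i$ have already been estimated by one of the methods above.

```python
import numpy as np

def gaussian_kernel(xp, xq, W):
    """Gaussian kernel of Equation (2); W is a matrix of kernel widths."""
    d = xp - xq
    return np.exp(-0.5 * d @ W @ d)

def kernel_prediction(x, X_train, alpha, W):
    """Prediction of Equation (1): y = sum_i alpha_i * k(x_i, x)."""
    return sum(a * gaussian_kernel(xi, x, W) for a, xi in zip(alpha, X_train))
```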

However, online model learning requires incremental updates, e.g., incremental estimation of $\alpha_i$, as the data arrives sequentially. There have been many attempts to develop incremental, online algorithms for kernel methods, such as incremental SVM [2], sequential SVR [19], recursive kernel learning with NARX form [6] or the kernel recursive least-squares algorithm [3]; for an overview see [15]. However, most incremental kernel methods do not scale to online learning in real-time, e.g., for online learning with model updates at 50 Hz or faster. The main reason is that they are either not sparse [2, 19], as they use the complete data set for model training, or they do not restrict the size of the sparse set [3]. To overcome these shortcomings, we propose the setup illustrated in Figure 1.

To ensure real-time constraints, we train the model using a dictionary with a fixed budget. The budget of the dictionary, i.e., the sparse set, needs to be determined from the intended learning speed and available computational power. To efficiently make use of the stream of continuously arriving data, we select only the most informative data points for the dictionary. If the budget limit is reached, dictionary points will need to be deleted. Finally, for the model training using dictionary data, most incremental kernel regression methods can be employed, e.g., incremental GPR as described in [10], sequential SVR [19] or incremental SVM [2].

Inspired by the work in [3, 14], we use a linear independence measure to select the most informative points given the current dictionary. Based on this measure, we derive criteria to remove data points from the dictionary if a given limit is reached. The following sections describe the proposed approach in detail.

2.2. Sparsification using Linear Independence Test

The main idea of our sparsification approach is to cover the relevant state space as well as possible, given a limited number of dictionary points. At any point in time, our algorithm maintains a dictionary $D = \{d_i\}_{i=1}^m$, where $m$ denotes the current number of dictionary points $d_i$. The choice of the dictionary element $d_i$ can be crucial for a particular application and will be discussed in Section 2.5. To test whether a new point $d_{m+1}$ should be inserted into the dictionary, we need to ensure that it cannot be approximated in the feature space spanned by the current dictionary set. This test can be performed using a measure $\delta$ defined as

$$\delta = \left\| \sum_{i=1}^{m} a_i \phi(d_i) - \phi(d_{m+1}) \right\|^2, \quad (3)$$

(see, e.g., [14, 15] for background information), where $a_i$ denote the coefficients of linear dependence. Equation (3) can be understood as the distance of the new point $d_{m+1}$ to the linear plane spanned by the dictionary set $D$ in the feature space, as illustrated in Figure 2. Thus, the value $\delta$ can be considered an independence measure indicating how well a new data point $d_{m+1}$ can be approximated in the feature space of a given data set: the larger the value of $\delta$, the more independent $d_{m+1}$ is from the dictionary set $D$, and the more informative $d_{m+1}$ is for the learning procedure.

[Figure 2: Geometrical interpretation of the independence measure $\delta$. Here, the dictionary consists of two data points $\{d_1, d_2\}$ which span a linear plane in the feature space. The independence measure $\delta$ for a new data point $d_{new}$ can be interpreted as the distance to this plane.]

The coefficients $a_i$ in Equation (3) can be determined by minimizing $\delta$. Formulated in matrix form, the minimization of $\delta$ can be written as

$$a = \arg\min_a \left[ a^T K a - 2 a^T \mathbf{k} + k \right], \quad (4)$$

where $K = k(D, D)$ represents the kernel matrix, $\mathbf{k} = k(D, d_{m+1})$ is the kernel vector and $k = k(d_{m+1}, d_{m+1})$ denotes a kernel value. Note that in Equation (4) we make use of the property that inner products of feature vectors can be represented as kernel values. Minimizing Equation (4) yields the optimal coefficient vector $a = K^{-1}\mathbf{k}$. The parameter $a$ from Equation (4) can be further regularized to take problems such as outliers into account. The regularization can be controlled by a regularization parameter chosen to alleviate the outliers' contribution to the sparsification process. Usually, this parameter can be obtained from training data by cross-validation. However, in this paper we omit this regularization, as we did not observe any serious problems due to outliers. After substituting the optimal value $a$ into Equation (3), $\delta$ becomes

$$\delta = k(d_{m+1}, d_{m+1}) - \mathbf{k}^T a. \quad (5)$$

Algorithm 1: Independence test with online updates of the dictionary.
  Input: new point $d_{m+1}$, threshold $\eta$, budget $N_{max}$.
  Compute $a = K^{-1}\mathbf{k}$ with $\mathbf{k} = k(D, d_{m+1})$.
  Compute $\delta = k(d_{m+1}, d_{m+1}) - \mathbf{k}^T a$.
  if $\delta > \eta$ then
    if number of dictionary points $< N_{max}$ then
      Insert $d_{m+1}$ into $D$ and update the dictionary using Algorithm 2.
    else
      Insert $d_{m+1}$ into $D$ and update the dictionary using Algorithm 3, replacing an old dictionary point from $D$.
    end if
    Incremental online model training using the new dictionary $D = D \cup \{d_{m+1}\}$.
  end if

Using $\delta$, we can decide whether to insert new data points into the dictionary. The sparsification procedure is summarized in Algorithm 1.

The independence measure $\delta$, as given in Equation (5), can be used as a criterion for selecting new data points by thresholding, see Algorithm 1. The threshold parameter $\eta$ implicitly controls the level of sparsity. However, for a given threshold value $\eta$, the number of dictionary points selected from the online data stream is not known beforehand and, thus, can be very large. Large dictionaries are prohibitively expensive in terms of computational complexity and, as a result, not real-time capable. To cope with this problem, we need to define an upper bound on the dictionary size and delete old dictionary points if this limit is reached.
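For illustration, the core of Algorithm 1 could be sketched as follows (our own Python/NumPy code with a hypothetical `kernel` function; for clarity it rebuilds the kernel matrix of the dictionary, whereas Sections 2.3 and 2.4 show how to maintain its inverse incrementally).

```python
import numpy as np

def independence_measure(D, d_new, kernel):
    """delta of Equation (5): k(d_new, d_new) - k^T a, with a = K^{-1} k from Eq. (4)."""
    K = np.array([[kernel(p, q) for q in D] for p in D])   # K = k(D, D)
    k = np.array([kernel(p, d_new) for p in D])            # k = k(D, d_new)
    a = np.linalg.solve(K, k)                              # optimal coefficients
    return kernel(d_new, d_new) - k @ a

def maybe_insert(D, d_new, kernel, eta):
    """Insertion test of Algorithm 1 (budget handling omitted here)."""
    if len(D) == 0 or independence_measure(D, d_new, kernel) > eta:
        D.append(d_new)
        return True
    return False
```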

For deleting old dictionary points, we consider the independence measure $\delta_i$ of every dictionary point $i$, as illustrated in Figure 3. The value $\delta_i$ indicates how well the dictionary point $i$ can be approximated by the remainder of the dictionary. The score $\delta_i$ has to be updated whenever a new data point is inserted into the dictionary or an old data point is removed from it. In Sections 2.3 and 2.4, we discuss an efficient, incremental way of updating the value $\delta_i$ that is applicable for real-time online computation.

[Figure 3: Geometrical interpretation of individual independence measures $\delta_i$, $i = 1..3$, for the dictionary $D = \{d_1, d_2, d_3\}$. The dictionary point with minimal $\delta_i$ is more likely to be deleted. After every dictionary operation (e.g., insertion and deletion of dictionary points), the individual independence measures $\delta_i$ have to be incrementally updated.]

2.3. Dictionary Update for Inserting New Points

For inserting a new data point $d_{m+1}$ into the dictionary, we have to incrementally update the independence measure $\delta_i$ and the corresponding coefficients for every dictionary point $i$, as changing the dictionary implies a change of $\delta_i$. This update for an existing dictionary point $d_i$ is achieved by adjusting the corresponding coefficient vector $a_i$. Updating $a_i$ implies an update of $(K^i)^{-1}$ and $\mathbf{k}^i$. Here, inserting a new point extends $K^i$ by a row/column and $\mathbf{k}^i$ by a value, respectively, such that

$$K^i_{new} = \begin{bmatrix} K^i_{old} & \mathbf{k}_{m+1} \\ \mathbf{k}_{m+1}^T & k_{m+1} \end{bmatrix}, \qquad \mathbf{k}^i_{new} = \begin{bmatrix} \mathbf{k}^i_{old} \\ k_{i,m+1} \end{bmatrix}, \quad (6)$$

where $k_{m+1} = k(d_{m+1}, d_{m+1})$, $k_{i,m+1} = k(d_i, d_{m+1})$ and $\mathbf{k}_{m+1} = k(D_i, d_{m+1})$ with $D_i = D \setminus \{d_i\}$. Using Equation (6), the incremental update of the inverse matrix $(K^i_{new})^{-1}$ is given by

$$\left(K^i_{new}\right)^{-1} = \frac{1}{\gamma_i}\begin{bmatrix} \gamma_i \left(K^i_{old}\right)^{-1} + \alpha_i \alpha_i^T & -\alpha_i \\ -\alpha_i^T & 1 \end{bmatrix}. \quad (7)$$

This result leads to the update rule for the linear independence value $\delta_i$ of the $i$-th dictionary point, given by

$$\delta_i = k(d_i, d_i) - \mathbf{k}^{iT}_{new} a^i_{new}, \quad \text{with} \quad a^i_{new} = \frac{1}{\gamma_i}\begin{bmatrix} \gamma_i a^i_{old} + \alpha_i \alpha_i^T \mathbf{k}^i_{old} - k_{i,m+1}\alpha_i \\ -\alpha_i^T \mathbf{k}^i_{old} + k_{i,m+1} \end{bmatrix}. \quad (8)$$

The variables $\gamma_i$ and $\alpha_i$ are determined by $\alpha_i = (K^i_{old})^{-1}\mathbf{k}_{m+1}$ and $\gamma_i = k_{m+1} - \mathbf{k}_{m+1}^T \alpha_i$. The procedure for the dictionary update after insertion of new data points is summarized in Algorithm 2.

Algorithm 2: Update the dictionary after inserting a new data point.
  Input: new dictionary point $d_{m+1}$.
  Update dictionary $D = \{d_i\}_{i=1}^{m+1}$.
  for $i = 1$ to $m$ do
    Compute $\mathbf{k}_{m+1} = k(D_i, d_{m+1})$, $k_{m+1} = k(d_{m+1}, d_{m+1})$, $k_{i,m+1} = k(d_{m+1}, d_i)$.
    Compute $\alpha_i = (K^i_{old})^{-1}\mathbf{k}_{m+1}$, $\gamma_i = k_{m+1} - \alpha_i^T \mathbf{k}_{m+1}$.
    Update $\delta_i$ as given in Equation (8).
    Update $(K^i_{new})^{-1}$ as given in Equation (7).
  end for
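A direct NumPy transcription of the inverse update in Equation (7) might look as follows (our own sketch; `K_old_inv` stores $(K^i_{old})^{-1}$, `k_vec` the vector $\mathbf{k}_{m+1}$ and `k_new` the scalar $k_{m+1}$).

```python
import numpy as np

def grow_inverse(K_old_inv, k_vec, k_new):
    """Extend (K_old)^{-1} to (K_new)^{-1} after appending one row/column, cf. Eq. (7)."""
    alpha = K_old_inv @ k_vec               # alpha_i = (K_old)^{-1} k_{m+1}
    gamma = k_new - k_vec @ alpha           # gamma_i = k_{m+1} - k_{m+1}^T alpha_i
    return np.block([
        [K_old_inv + np.outer(alpha, alpha) / gamma, -alpha[:, None] / gamma],
        [-alpha[None, :] / gamma,                    np.array([[1.0 / gamma]])],
    ])
```

The same quantities $\alpha_i$ and $\gamma_i$ then enter the update of $a^i_{new}$ and $\delta_i$ in Equation (8).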

2.4. Dictionary Update for Replacing Points

As the data arrives continuously in an online setting, it is necessary to limit the number of dictionary points so that the computational power of the system is not exceeded and the real-time constraints are not violated. The individual independence value $\delta_i$ – as illustrated in Figure 3 – gives rise to a straightforward deletion rule: the smaller the independence value $\delta_i$, the more likely the corresponding dictionary point is to be deleted. The idea is to delete points that are more dependent on other dictionary points, i.e., points whose independence value $\delta_i$ is small. For the deletion of dictionary points, we additionally consider a temporal allocation of each dictionary point by imposing a time-dependent forgetting rate $\lambda_i \in [0, 1]$. Thus, we take the independence value $\delta_i$ weighted by the forgetting value $\lambda_i$ as a deletion score. The role of $\lambda_i$ will be discussed in detail in Section 2.5.

Insertion and additional deletion of dictionary points also change the independence values of the other dictionary points, which therefore have to be updated subsequently. Insertion of a new point with an additional deletion of the $j$-th dictionary point implies a manipulation of the $j$-th row/column of $K^i$ and the $j$-th value of $\mathbf{k}^i$, given by

$$K^i_{new} = \begin{bmatrix} K^i_{old}(1\!:\!j) & \mathbf{k}_{m+1}(1\!:\!j) & K^i_{old}(j\!:\!m) \\ \mathbf{k}^T_{m+1}(1\!:\!j) & k_{m+1} & \mathbf{k}^T_{m+1}(j\!:\!m) \\ K^{iT}_{old}(j\!:\!m) & \mathbf{k}_{m+1}(j\!:\!m) & K^i_{old}(j\!:\!m) \end{bmatrix}, \qquad \mathbf{k}^i_{new} = \begin{bmatrix} \mathbf{k}^i_{old}(1\!:\!j) & k_{i,m+1} & \mathbf{k}^i_{old}(j\!:\!m) \end{bmatrix}^T. \quad (9)$$

The values $k_{m+1}$, $\mathbf{k}_{m+1}$ and $k_{i,m+1}$ are determined as shown in Section 2.3. The incremental update of the independence measure $\delta_i$ for every $i$-th dictionary point can be performed directly using an incremental matrix-inverse update. Hence,

$$\delta_i = k(d_i, d_i) - \mathbf{k}^{iT}_{new} a^i_{new} \quad \text{and} \quad a^i_{new} = A\,\mathbf{k}^i_{new}, \quad (10)$$

where $A$ is computed by the update rule

$$A = A^* - \frac{\mathrm{row}_j[A^*]^T r^T A^*}{1 + r^T \mathrm{row}_j[A^*]^T} \quad \text{with} \quad A^* = (K^i_{old})^{-1} - \frac{(K^i_{old})^{-1} r\, \mathrm{row}_j[(K^i_{old})^{-1}]}{1 + r^T \mathrm{row}_j[(K^i_{old})^{-1}]^T}.$$

Here, the vector $r$ is determined by $r = \mathbf{k}_{m+1} - \mathrm{row}_j[K^i_{old}]^T$, and $\mathrm{row}_j[M]$ denotes the $j$-th row of a given matrix $M$. The update of $(K^i_{new})^{-1}$ is then given by $(K^i_{new})^{-1} = A$. The complete procedure is summarized in Algorithm 3.

Algorithm 3: Replace the least important data point in the dictionary by a new one.
  Input: new dictionary point $d_{m+1}$.
  Compute $\lambda_i$ as given in Equation (12) and, subsequently, find the dictionary point with minimal $\delta_i$ weighted by the forgetting rate $\lambda_i$: $j = \arg\min_i(\lambda_i \delta_i)$.
  Update dictionary $D$ by overwriting point $j$: $d_j = d_{m+1}$.
  for $i = 1$ to $m$ do
    Compute $\mathbf{k}_{m+1} = k(D_i, d_{m+1})$, $k_{m+1} = k(d_{m+1}, d_{m+1})$, $k_{i,m+1} = k(d_{m+1}, d_i)$.
    Compute $r = \mathbf{k}_{m+1} - \mathrm{row}_j[K^i_{old}]^T$.
    Update $\delta_i$ as given in Equation (10).
    Update $(K^i_{new})^{-1} = A$ as given in Equation (10).
  end for

2.5. Characterization of the Dictionary Space and Temporal Allocation

In the previous sections, we described the procedures for incremental insertion and deletion of dictionary points appropriate for real-time computation. The basic idea is that we attempt to approximate the dictionary space (characterized by the dictionary vector $d$) as well as possible using a limited sparse data set $D$. However, the choice of the dictionary space is a crucial step which may depend on the particular application. As we are dealing with regression in this paper, it is reasonable to choose the dictionary vector as

$$d = \begin{bmatrix} x & y \end{bmatrix}^T, \quad (11)$$

where $x$ is the input vector and $y$ represents the target values. In so doing, our dictionary space includes the input as well as the target distribution. In practice, input and target distributions both incorporate relevant information for model approximation by regression. For other applications such as learning classification models, considering the input space only might be sufficient.

In addition to the spatial allocation of the dictionary space as achieved by employing the independence measure $\delta$, temporal allocation can be taken into account by introducing a time-variant forgetting factor for every dictionary point $i$. Here, we consider the forgetting rate

$$\lambda_i(t) = \exp\left(-\frac{(t - t_i)^2}{2h}\right), \quad (12)$$

where $t_i$ is the time when the dictionary point $i$ is included into the sparse set and $h$ controls the intensity of the forgetting rate. For deleting a point from the dictionary, the deletion score is the independence value weighted by the corresponding forgetting value $\lambda_i \in [0, 1]$, as shown in Algorithm 3. By imposing a temporal weight on the independence values, we make sure that temporal information encoded in the data is sufficiently taken into account for the model learning. The forgetting score $\lambda$ (controlled by the parameter $h$) represents a trade-off between temporal and spatial coverage. If the value of $h$ is very small, temporal effects gain more importance and the sparsification behaves similarly to a time window in common online algorithms.
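As a small sketch of how Equation (12) enters the replacement step of Algorithm 3 (our own Python code; `deltas` holds the current independence values $\delta_i$ and `t_insert` the insertion times $t_i$), the point to be overwritten is the one with the smallest forgetting-weighted score $\lambda_i \delta_i$:

```python
import numpy as np

def forgetting_rate(t_now, t_insert, h):
    """lambda_i(t) of Equation (12): Gaussian decay with the time since insertion."""
    return np.exp(-(t_now - np.asarray(t_insert)) ** 2 / (2.0 * h))

def select_replacement(deltas, t_insert, t_now, h):
    """Index j = argmin_i lambda_i * delta_i used in Algorithm 3."""
    scores = forgetting_rate(t_now, t_insert, h) * np.asarray(deltas)
    return int(np.argmin(scores))
```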

2.6. Comparison to Previous Work

Our work was inspired by the work in [3], where the authors also use a linear independence measure to select dictionary points. However, they do not remove dictionary points again but instead show that the dictionary size is bounded for a given threshold $\eta$ if the data space is assumed to be compact. In practice, for a given threshold value the actual dictionary size is data dependent and unknown beforehand; thus, the dictionary can be very large. In order to cope with the computational constraints in real-time applications, the dictionary size has to be limited. In contrast to the algorithm in [3], our approach allows us to formulate an efficient insertion and deletion procedure for a given budget. In [3], the spatial allocation is performed in the input space only, resulting in a uniform covering of the complete state space which might be suboptimal for model approximation by regression. As we additionally employ a temporal allocation, the resulting online learning algorithm is able to adapt the model to temporal effects, which is an important issue for online learning on real systems.

[Figure 4: An illustrative example of the prediction after a single sweep through a toy data set. The dots show the positions of the corresponding dictionary points in the input space. (a) Comparison with KRLS: the performance of SI-GPR is quite similar to KRLS, while the selection of the sparse data sets poses the main difference; SI-GPR selects 15 dictionary points and KRLS 21 data points, respectively. (b) SI-GPR prediction with different budgets: using a fixed budget, e.g., 5, 10 or 15 dictionary points, the most relevant regions in the input space are covered with data points.]

An Illustrative Toy Example

In the following, we compare the kernel recursive least-squares (KRLS) algorithm [3] with our approach. For the model training, we combine the sparsification with incremental GPR learning [10]; we call the combination sparse incremental GPR (SI-GPR). A quick review of GPR and its incremental updates can be found in the Appendix. It should be noted that other incremental model learning methods can also be used in combination with our incremental sparsification, such as sequential SVR or incremental SVM [19, 2].

As a toy example, we generate a noisy data set where the relation between the target $y$ and the input $x$ is given by $y_i = \sin(x_i^2) + \epsilon_i$. The data set consists of 315 points, where $\epsilon_i$ is white Gaussian noise with standard deviation 0.2 and $x_i$ ranges from 0 to $\pi$. Here, the data is incrementally fed to the algorithms and the prediction is computed after a single sweep through the data set. The results are shown in Figure 4, where $\eta = 0.3$ and a Gaussian kernel with width 0.3 is used. In Figure 4 (a), it can be seen that the prediction performance of SI-GPR using the dictionary is quite similar to KRLS. Here, the selection of the sparse data points presents the main difference. While KRLS uniformly fills up the complete input space (resulting in 21 dictionary points), the sparsification used by SI-GPR selects the most relevant data points as dictionary points, where the limit of the dictionary is set to 15. Figure 4 (b) shows the performance of SI-GPR for different dictionary sizes, where the prediction improves as expected with increasing dictionary size.
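Under the assumption that the dictionary vector is chosen as $d = [x\; y]^T$ (Equation (11)) and with the settings above ($\eta = 0.3$, kernel width 0.3, budget 15), a rough, self-contained reconstruction of this single sweep could look as follows (our own Python code; it recomputes the dictionary kernel matrix at every step and omits the replacement rule and the incremental inverse updates, so it only illustrates the selection behavior, not the real-time implementation).

```python
import numpy as np

# Toy data: y_i = sin(x_i^2) + eps_i, 315 points with x_i in [0, pi].
rng = np.random.default_rng(0)
x = np.linspace(0.0, np.pi, 315)
y = np.sin(x ** 2) + 0.2 * rng.standard_normal(x.size)

def kern(a, b, width=0.3):
    """Gaussian kernel on dictionary vectors d = [x y]^T, cf. Equations (2) and (11)."""
    return np.exp(-0.5 * np.sum((a - b) ** 2) / width ** 2)

eta, budget = 0.3, 15
D = []                                          # dictionary of points d = [x, y]
for d_new in np.column_stack([x, y]):           # single sweep through the data stream
    if not D:
        D.append(d_new)
        continue
    K = np.array([[kern(p, q) for q in D] for p in D])
    kv = np.array([kern(p, d_new) for p in D])
    delta = kern(d_new, d_new) - kv @ np.linalg.solve(K, kv)   # Equation (5)
    if delta > eta and len(D) < budget:
        D.append(d_new)

print(f"selected {len(D)} dictionary points")
```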

3. Evaluations

In this section, we evaluate our sparsification approach used in combination with incremental GPR (SI-GPR) in several different experimental settings, with a focus on inverse dynamics modeling for robot tracking control.

First, we give a short review of learning dynamics models for control. Subsequently, we evaluate the algorithm in the context of learning inverse dynamics. The learning accuracy of SI-GPR is compared with other regression methods, i.e., LWPR [18], GPR [12], ν-SVR [16] and LGP [10]. For this evaluation in inverse dynamics learning, we employ two data sets, one with synthetic data and one with real robot data generated from the 7 degrees-of-freedom (DoF) Barrett WAM shown in Figure 6 (a). In Section 3.3, SI-GPR is applied for real-time online learning of inverse dynamics models for robot computed torque control (state-of-the-art batch regression methods cannot be applied for online model learning). Finally, in Section 3.4, we demonstrate the capability of the approach in online learning of a dynamics which changes in the presence of different loads.

[Figure 5: Feedforward control with online model learning.]

3.1. Learning Dynamics Models for Control

Model-based tracking control laws [17] determine the joint torques $u$ that are required for the robot to follow a desired trajectory $q_d, \dot{q}_d, \ddot{q}_d$, where $q_d, \dot{q}_d, \ddot{q}_d$ denote the desired joint angles, velocities and accelerations. In feedforward control, the motor command $u$ consists of two parts: a feedforward term $u_{FF}$ to achieve the movement and a feedback term $u_{FB}$ to ensure stability of the tracking, as shown in Figure 5. The feedback term can be a linear control law such as $u_{FB} = K_p e + K_v \dot{e}$, where $e$ denotes the tracking error, with position gain $K_p$ and velocity gain $K_v$. The feedforward term $u_{FF}$ is determined using an inverse dynamics model; traditionally, the analytical rigid-body model is employed [17].

If a sufficiently precise inverse dynamics model can be approximated, the resulting control law $u = u_{FF}(q_d, \dot{q}_d, \ddot{q}_d) + u_{FB}$ will accurately drive the robot along the desired trajectory. Due to the high complexity of modern robot systems such as humanoids or service robots, traditional analytical rigid-body models often cannot provide a sufficiently accurate inverse dynamics model. The lack of model precision has to be compensated by increasing the tracking gains $K_p$ and $K_v$, making the robot stiff and less safe for its environment [9]. Thus, to fulfill both requirements of compliant control, i.e., low tracking gains and high tracking accuracy, more precise models are necessary. One possibility to obtain an accurate inverse dynamics model is to learn it directly from measured data. The resulting problem is a regression problem: approximating the inverse dynamics model $q, \dot{q}, \ddot{q} \to u$ from sampled data [1]. The resulting mapping can subsequently be applied for predicting the appropriate feedforward motor commands $u_{FF}$. As trajectories and corresponding joint torques are sampled directly from the real robot, learning the inverse dynamics will include all nonlinearities of the system encoded in the data.

[Figure 6: (a) 7-DoF Barrett WAM, used for evaluations. (b) Error (nMSE) on simulated data from a robot model for every DoF. (c) Error (nMSE) on real robot data for every DoF. As shown by the results, SI-GPR is competitive with standard batch regression methods such as ν-SVR and GPR, as well as with local learning methods such as LWPR and LGP. For model learning with SI-GPR, the dictionary size is limited to 2000 data points.]

If the dynamics model can be learned online (see Figure 5), the robot controller can adapt itself to changes in the robot dynamics, e.g., due to unforeseen loads or time-variant disturbances. Online learning of dynamics models using sampled data, realized in a setup as in Figure 5, can be considered a self-supervised learning problem.
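To make the scheme of Figure 5 concrete, one control cycle could be sketched as follows (our own Python pseudocode-style snippet; `model` stands for any incremental inverse dynamics learner such as SI-GPR, and the `predict`/`update` interface, the gain matrices and the variable names are our assumptions, not the paper's implementation).

```python
import numpy as np

def control_step(model, q, qd, qdd, q_des, qd_des, qdd_des, Kp, Kv):
    """One cycle of feedforward control with online model learning (cf. Figure 5)."""
    u_ff = model.predict(q_des, qd_des, qdd_des)    # feedforward from the learned model
    e, e_dot = q_des - q, qd_des - qd               # tracking errors
    u_fb = Kp @ e + Kv @ e_dot                      # low-gain PD feedback
    u = u_ff + u_fb                                 # motor command sent to the robot
    model.update(np.concatenate([q, qd, qdd]), u)   # self-supervised: torque as target
    return u
```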

3.2. Offline Comparison in Learning Inverse Dynamics

For the generation of two data sets, i.e., one with simulation data and one with real robot data, we sample joint space trajectories and corresponding torques from an analytical model of the Barrett WAM, as well as from the real robot. This results in two test scenarios, each having 12000 training points and 3000 test points. Given samples $x = [q, \dot{q}, \ddot{q}]$ as input and using the corresponding joint torques $y = u$ as targets, we have a regression problem with 21 input dimensions and 7 output dimensions (i.e., a single desired torque for each motor joint). The robot inverse dynamics model is estimated separately for each DoF employing LWPR, ν-SVR, GPR, LGP and SI-GPR.

Employing SI-GPR for learning the inverse dynamics model, we first sparsify the full data sets (i.e., 12000 data points) as described in Section 2, and subsequently apply incremental GPR for an offline approximation of the model. For all data sets, the dictionary size is limited to 2000 points and the parameter $\eta$ is set to 0.1. For the sparsification process and the incremental GPR, we employ the Gaussian kernel, whose hyperparameters are obtained by optimizing the marginal likelihood [12].

Figures 6 (b) and (c) show the offline approximation errors on the test sets evaluated using the normalized mean squared error (nMSE), defined as the mean squared error divided by the variance of the target. It can be observed from the results that SI-GPR using sparsification is competitive in learning accuracy despite the smaller number of training examples (i.e., dictionary points). In practice, the models are also easier to train using SI-GPR compared to local learning methods such as LWPR and LGP, as the latter require an appropriate clustering of the state space, which is not straightforward to perform for many data sets.
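For reference, the error measure used in Figures 6 (b) and (c) is simply (our own NumPy helper):

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized mean squared error: MSE divided by the variance of the target."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2) / np.var(y_true)
```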

[Figure 7: (a) Compliant tracking performance in joint space for all 7 DoFs after 60 sec, computed as RMSE. Computed torque control with the online learned model (SI-GPR) outperforms the common analytical rigid-body model. (b) Tracking performance in joint space for the 1st DoF, i.e., the shoulder flexion extension (other DoFs are similar), during the first 60 sec. (c) Feedforward torque $u_{FF}$ and joint torque $u$ of the 1st DoF during SI-GPR online learning. The predicted torque $u_{FF}$ gradually converges to the joint torque $u$ as the latter is sampled as target for the model learning.]

3.3. Model Online Learning in Computed Torque Control

In this section, we apply SI-GPR for real-time online learning of the inverse dynamics model for torque prediction in robot tracking control, as introduced in Section 3.1. The control task with online model learning is performed at a 500 Hz sample rate (i.e., we get a new test point every 2 ms) on the real Barrett WAM. In so doing, the trajectory $q, \dot{q}, \ddot{q}$ and the corresponding joint torques $u$ are sampled online, as shown in Figure 5. The inverse dynamics model, i.e., the mapping $q, \dot{q}, \ddot{q} \to u$, is learned in a self-supervised manner during the tracking task using the trajectory as input and the joint torque as target, starting with an empty dictionary. As the learned inverse dynamics model is subsequently applied to compute the feedforward torques $u_{FF}$, it can be observed that $u_{FF}$ gradually converges to $u$. In this experiment, we set the tracking gains (see Section 3.1) to very low values, satisfying the requirement of compliant control. With such low gains, however, inaccurate inverse dynamics models will result in large tracking errors.

In this online learning experiment, the maximal dictionary size is set to 300, $\eta = 0.01$, and a Gaussian kernel is used whose parameters are determined by optimizing the marginal likelihood. For the forgetting score $\lambda$, a forgetting rate $h = 0.4$ is chosen by cross-validation. The desired tracking trajectory is generated such that the robot's end-effector follows a figure-8 in task space. For comparison, we also apply an analytical rigid-body model for torque prediction [17]. The tracking results are shown in Figure 7, where the tracking control task is performed for 60 sec on the Barrett WAM. During this time, the dictionary is first incrementally filled up and subsequently updated about 700 times. The inverse dynamics model is incrementally learned online using the dictionary.

Figure 7 (a) compares the tracking performance of the online learned model and the analytical rigid-body model in joint space for all 7 DoFs, where the tracking error is computed as the root mean square error (RMSE). It can be observed that the tracking accuracy is significantly improved if the inverse dynamics model is approximated online. For several DoFs, such as the 4th DoF (i.e., the elbow flexion), the rigid-body model fails to describe the true dynamics, resulting in a large tracking error as shown in Figure 7 (a). This example demonstrates the difficulty of analytically modeling complex systems, which may be alleviated by directly learning the model.

[Figure 8: Tracking experiment with online model learning. The blue, thick lines illustrate the desired trajectory and the red, dotted lines show the robot trajectory in task space. (a) As the model starts to be learned online (beginning with an empty dictionary), the robot first shows a transient behavior. (b) With a successfully learned model, the robot is able to follow the trajectory well. (c) The original dynamics is changed by hanging a heavy water bottle on the arm. (d) Due to the modified dynamics, the robot fails to track the desired initial trajectory. Therefore, the robot starts to learn the modified dynamics online. (e) As the online model learning converges, the robot gradually moves back to the desired initial position. (f) The dynamics is modified again by removing the water bottle. (g) The learned model is no longer accurate, i.e., the predicted torques are now too large. The modified dynamics is subsequently adapted online. (h) As the dynamics model is successfully adapted, the robot returns to the initial desired position. A video showing the experiment can be seen at "http://www.robot-learning.de/".]

Figures 7 (b,c) show the tracking performance in joint space for the 1st DoF, i.e., the shoulder flexion extension, during the first 60 sec; other DoFs are similar. It can be seen that the predicted torque $u_{FF}$ (for the 1st DoF) consistently converges to the joint torque $u$ as the latter is sampled as target for the model learning.

3.4. Online Learning for Changing Dynamics

In this section, we demonstrate the capability of the algorithm for online model learning and self-adaptation to changes in the dynamics in a more complex task. Figures 8 (a)-(h) show the progress of the experiment. First, we learn the robot inverse dynamics online, starting with an empty dictionary. We apply SI-GPR with the same parameter setting as given in Section 3.3. Here, we also first incrementally fill up the dictionary and subsequently update it for new data points. Subsequently, we modify the robot dynamics by attaching a heavy water bottle to the arm (besides the changing load, the swinging bottle additionally introduces a time-variant nonlinearity). Due to the change in the dynamics, the robot cannot follow the given trajectory in joint space. As in Section 3.3, the desired joint space trajectory is defined such that the robot draws a figure-8 in task space. The end-effector velocity of the robot during the tracking in task space is about 0.6 m/sec, and the displacement ranges from 0.2 to 0.5 m. The modified dynamics can now be learned online, as shown in Figures 8 (a)-(h).

As the online model learning converges, the modified dynamics is taken into account by the predicted feedforward torques $u_{FF}$. As an example, Figure 9 shows the joint angle and corresponding torques of the 4th DoF (i.e., the robot's elbow flexion) during the experiment. With the attached water bottle, the predicted feedforward torque $u_{FF}$ gradually becomes more precise as the model converges. The decreasing model error results in accurate tracking performance.

[Figure 9: Elbow joint angle (4th DoF) and corresponding torques of the Barrett WAM during the experiment. (a) The robot starts with an unmodified dynamics. (b) As the water bottle is attached to the arm, it causes a jump in the feedback torque $u_{FB}$ and joint torque $u$. Due to the compliant tracking mode, the change in the feedback torque $u_{FB}$ is not sufficient to compensate the resulting tracking error. (c) As the dynamics model is learned online, the new dynamics can be incorporated in the prediction of the feedforward torque $u_{FF}$. As the online model learning converges, the resulting tracking error is gradually reduced, i.e., the robot returns to the desired position. Note how the feedback torque $u_{FB}$ decreases as the model, i.e., the torque prediction, becomes more accurate. (d) The online modification of the dynamics, i.e., removing the water bottle, leads to changes in the feedback and joint torques. (e) As the adaptation is successfully done and the feedforward torque $u_{FF}$ converges, the robot moves back to the desired trajectory and the feedback torque decreases again.]

By continuously updating the dictionary, i.e., by insertion and eventual deletion of dictionary points, the model can adapt to dynamical changes in the environment. This effect can further be demonstrated by removing the water bottle. As observed in Figure 9, the predicted torque $u_{FF}$ is subsequently reduced, since smaller torques are now required for proper tracking of the desired trajectory.

In this experiment, we employ very low tracking gains. As a result, model errors will seriously degrade the tracking accuracy unless a more precise inverse dynamics model is learned. Figure 9 shows that the feedback torques $u_{FB}$ are not able to compensate the model error due to the low tracking gains. As the inverse dynamics model becomes more accurate during the online learning, the feedback torques $u_{FB}$ decrease and, thus, the joint torques $u$ mainly depend on the feedforward term $u_{FF}$. As the torque generation relies more on the inverse dynamics model, we can achieve both compliant control (by using low tracking gains) and high tracking accuracy at the same time.


4. Conclusion and Future Work

Motivated by the need for fast online model learning in robotics, we have developed an incremental sparsification framework which can be used in combination with an online learning algorithm, enabling its application to real-time online model learning. The proposed approach provides a way to efficiently insert and delete dictionary points, taking into account the fast computation required during online model learning in real-time. The implementation and evaluation on a physical Barrett WAM robot emphasize the applicability in real-time online model learning for real world systems. Our future research will focus on further extensions, such as including a database enabling online learning for large data sets.

Acknowledgments

We would like to thank Bernhard Schölkopf from the Max Planck Institute for Biological Cybernetics for helpful discussions and for providing us with relevant machine learning literature on reduced set methods.

References

[1] Burdet, E., Codourey, A., 1998. Evaluation of parametric and nonparametric nonlinear adaptive controllers. Robotica 16 (1), 59–73.
[2] Cauwenberghs, G., Poggio, T., 2000. Incremental and decremental support vector machine learning. Advances in Neural Information Processing Systems.
[3] Engel, Y., Mannor, S., Meir, R., 2004. The kernel recursive least-squares algorithm. IEEE Transactions on Signal Processing 52.
[4] Hastie, T., Tibshirani, R., Friedman, J., 2001. The Elements of Statistical Learning. Springer, New York.
[5] Lang, T., Plagemann, C., Burgard, W., 2007. Adaptive non-stationary kernel regression for terrain modeling. Robotics: Science and Systems (RSS).
[6] Liu, Y., Wang, H., Yu, J., Lia, P., 2009. Selective recursive kernel learning for online identification of nonlinear systems with NARX form. Journal of Process Control, 181–194.
[7] Seeger, M., 2007. Low rank update for the Cholesky decomposition. Tech. rep., University of California at Berkeley.
[8] Nakanishi, J., Schaal, S., 2004. Feedback error learning and nonlinear adaptive control. Neural Networks.
[9] Nguyen-Tuong, D., Peters, J., Seeger, M., 2008. Computed torque control with nonparametric regression models. Proceedings of the 2008 American Control Conference (ACC 2008).
[10] Nguyen-Tuong, D., Seeger, M., Peters, J., 2008. Local Gaussian process regression for real time online model learning and control. Advances in Neural Information Processing Systems.
[11] Plagemann, C., Kersting, K., Pfaff, P., Burgard, W., 2007. Heteroscedastic Gaussian process regression for modeling range sensors in mobile robotics. Snowbird Learning Workshop.
[12] Rasmussen, C. E., Williams, C. K., 2006. Gaussian Processes for Machine Learning. MIT Press, Massachusetts Institute of Technology.
[13] Schaal, S., Atkeson, C. G., Vijayakumar, S., 2002. Scalable techniques from nonparametric statistics for real-time robot learning. Applied Intelligence, 49–60.
[14] Schölkopf, B., Mika, S., Burges, C. J. C., Knirsch, P., Müller, K.-R., Rätsch, G., Smola, A. J., 1999. Input space versus feature space in kernel-based methods. IEEE Transactions on Neural Networks 10 (5), 1000–1017.
[15] Schölkopf, B., Smola, A., 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, Cambridge, MA.
[16] Schölkopf, B., Smola, A., Williamson, R., Bartlett, P., 2000. New support vector algorithms. Neural Computation.
[17] Spong, M. W., Hutchinson, S., Vidyasagar, M., 2006. Robot Dynamics and Control. John Wiley and Sons, New York.
[18] Vijayakumar, S., D'Souza, A., Schaal, S., 2005. Incremental online learning in high dimensions. Neural Computation.
[19] Vijayakumar, S., Wu, S., 1999. Sequential support vector classifiers and regression. International Conference on Soft Computing.

Appendix

Gaussian Process Regression

A powerful probabilistic method for model learning is Gaussian process regression (GPR). Given a set of n training data points $\{x_i, y_i\}_{i=1}^n$, we intend to discover the latent function $f(x_i)$ which transforms the input vector $x_i$ into a target value $y_i$ given by the model $y_i = f(x_i) + \epsilon_i$, where $\epsilon_i$ is Gaussian noise with zero mean and variance $\sigma_n^2$ [12]. A Gaussian process model is determined by its covariance function and its mean function, which is often assumed to be zero. In practice, the covariance function can be chosen by the user, such as the Gaussian kernel given in Equation (2). The resulting prediction $f(x_*)$ and the corresponding variance $V(x_*)$ for a query input vector $x_*$ are given by

$$f(x_*) = \mathbf{k}_*^T \left(K + \sigma_n^2 I\right)^{-1} y = \mathbf{k}_*^T \alpha, \qquad V(x_*) = k(x_*, x_*) - \mathbf{k}_*^T \left(K + \sigma_n^2 I\right)^{-1} \mathbf{k}_*. \quad (13)$$

Here, $K = k(X, X)$ denotes the covariance matrix evaluated on the training input data $X$, $\mathbf{k}_* = k(X, x_*)$ is the covariance vector evaluated on $X$ and the query point $x_*$, and $\alpha = (K + \sigma_n^2 I)^{-1} y$ is the so-called prediction vector. The open parameters of GPR, i.e., the hyperparameters, can be optimized from training data. The usual practice is to maximize the log marginal likelihood using common optimization procedures, e.g., quasi-Newton methods [12].

The GPR model as introduced here can also be obtained incrementally for online applications [10]. Essentially, the matrix $(K + \sigma_n^2 I)^{-1}$ in Equation (13) has to be incrementally updated when including a new point [10]. This approach is equivalent to a rank-one update, which yields a complexity of $\mathcal{O}(m^2)$, where $m$ denotes the current number of training data points [7]. For real-time applications, it is necessary to define an upper bound on $m$ in order to cope with the limited computational power. Thus, it is essential to develop a method for incrementally selecting a limited number of informative data points, making the regression real-time capable.
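A minimal sketch of such an incremental GPR (our own Python/NumPy code, not the implementation used on the robot) grows the stored inverse of $(K + \sigma_n^2 I)$ with a block update analogous to Equation (7) and recomputes the prediction vector $\alpha$ lazily; a fixed budget would additionally require the deletion machinery of Section 2.4.

```python
import numpy as np

class IncrementalGPR:
    """Incremental GPR sketch: grows (K + sigma_n^2 I)^{-1} point by point, O(m^2) each."""

    def __init__(self, kernel, sigma2):
        self.kernel, self.sigma2 = kernel, sigma2
        self.X, self.y = [], []
        self.K_inv = np.zeros((0, 0))

    def insert(self, x_new, y_new):
        k_vec = np.array([self.kernel(xi, x_new) for xi in self.X])
        k_star = self.kernel(x_new, x_new) + self.sigma2
        b = self.K_inv @ k_vec                      # (K + sigma2 I)^{-1} k
        gamma = k_star - k_vec @ b                  # scalar Schur complement
        self.K_inv = np.block([
            [self.K_inv + np.outer(b, b) / gamma, -b[:, None] / gamma],
            [-b[None, :] / gamma,                 np.array([[1.0 / gamma]])],
        ])
        self.X.append(x_new)
        self.y.append(y_new)

    def predict(self, x_query):
        """Mean prediction of Equation (13): f(x*) = k*^T alpha."""
        k_star = np.array([self.kernel(xi, x_query) for xi in self.X])
        alpha = self.K_inv @ np.array(self.y)       # prediction vector
        return k_star @ alpha
```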

The Authors

Duy Nguyen-Tuong has been pursuing his Ph.D. since 2007 at the Max Planck Institute for Biological Cybernetics in the department of Bernhard Schölkopf, supervised by Jan Peters. Before doing so, he studied control and automation engineering at the University of Stuttgart and the National University of Singapore. His main research interest is the application of machine learning techniques in control and robotics.

Jan Peters is a senior research scientist and heads the Robot Learning Lab (RoLL) at the Max Planck Institute for Biological Cybernetics in Tübingen, Germany. He graduated from the University of Southern California (USC) with a Ph.D. in Computer Science. He holds two German M.S. degrees in Informatics and in Electrical Engineering (from Hagen University and Munich University of Technology) and two M.S. degrees in Computer Science and Mechanical Engineering from USC. Jan Peters has been a visiting researcher at the Department of Robotics at the German Aerospace Research Center (DLR) in Oberpfaffenhofen, Germany, at Siemens Advanced Engineering (SAE) in Singapore, at the National University of Singapore (NUS), and at the Department of Humanoid Robotics and Computational Neuroscience at the Advanced Telecommunication Research (ATR) Center in Kyoto, Japan. His research interests include robotics, nonlinear control, machine learning, reinforcement learning, and motor skill learning.
