
Incremental Motion Learning with Gaussian Process Modulated Dynamical Systems

Klas Kronander, Mohammad Khansari and Aude Billard
Learning Algorithms and Systems Laboratory

Ecole Polytechnique Federale de Lausanne, Switzerland
{klas.kronander,mohammad.khansari,aude.billard}@epfl.ch

Abstract—Dynamical Systems (DS) for robot motion modeling are well-suited for efficient robot learning and control. Our focus in this extended abstract is on autonomous dynamical systems, which represent a motion plan completely without dependency on time. We develop a method that makes it possible to locally reshape an existing, stable autonomous DS without risking the introduction of additional equilibrium points or unstable behavior. This is achieved by locally applying rotations and scalings to the original dynamics. Gaussian Processes are then used to incrementally learn reshaped dynamical systems. We briefly report on preliminary results from applying the proposed methodology to learning 2d hand-writing motions.

I. INTRODUCTION

A set of preprogrammed behaviors is insufficient for a truly versatile robot. Alternative solutions should hence be sought to endow robots with the capability to learn tasks, both supported by a teacher (Learning from Demonstration) and on their own (Reinforcement Learning). In both cases, Dynamical Systems (DS) have emerged as one of the most general and flexible ways of representing motion plans for robots.

In this work, we explore incremental learning in autonomous dynamical systems. Most currently existing DS representations are not ideally suited for this purpose, as they either ensure stability through a phase variable¹ [1] or impose stability constraints which can be difficult to satisfy in an incremental learning setting [2]. We propose a new DS representation based on locally applying rotations and scalings to a dynamical system with known stability properties. This approach allows representation of very complex trajectories without risking the introduction of spurious attractor points or unstable behavior. In order to learn from incremental demonstrations, we use Gaussian Processes to encode the variations of the parameter vector determining how the original dynamics should be rotated and scaled in different parts of the state space. We will refer to our framework as Gaussian Process Modulated Dynamical Systems (GP-MDS). Like any GP-based method, GP-MDS suffers computationally as the training set grows. To deal with this, we propose a novel trajectory-based heuristic for managing sparsity of the training data in GP-MDS.

This extended abstract consists of a description of the GP-MDS architecture, and briefly presents preliminary results from applying GP-MDS to learning 2d hand-writing motions.

¹The use of an external phase variable for driving the DS forward in practice means that the system is not autonomous.

II. APPROACH

Let x ∈ R^N represent an N-dimensional kinematic variable, e.g. a Cartesian position vector. Let a continuous function f : R^N → R^N represent a dynamical system:

ẋ = f(x)    (1)

In the remainder of this document, it will be assumed that f has a single attractor, which without loss of generality can be placed at the origin. We will refer to Eq. (1) as the original dynamics.

A. Locally Modulated Dynamical Systems

The goal of this work is to locally modify the original dynamics. This is achieved by introducing local modulation, resulting in a system termed reshaped dynamics:

ẋ = g(x) = M(x)f(x)    (2)

where M(x) ∈ R^{N×N} is a continuous matrix-valued function that modulates the original dynamics f(x) by rotation and speed-scaling:

M(x) = (1 + κ(x))R(x) (3)

Here, κ(x) is a continuous state-dependent scalar function strictly larger than −1 and R(x) ∈ R^{N×N} is a state-dependent rotation matrix. Note that since M(x) has full rank, the reshaped dynamics have the same equilibrium points as the original dynamics. Moreover, if M is chosen such that it is locally active², then the reshaped dynamics are bounded.
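As a concrete illustration, the following minimal sketch implements Eqs. (2)–(3) in 2d, assuming a simple linear original dynamics f(x) = −x and hypothetical state-dependent functions κ(x) and φ(x); none of these specific choices are prescribed by the paper.

```python
import numpy as np

def f(x):
    # Original dynamics: a linear system with a single attractor at the origin.
    return -x

def rotation_2d(phi):
    # 2d rotation matrix R(phi).
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

def reshaped_dynamics(x, kappa_fn, phi_fn):
    # Eq. (2)-(3): xdot = M(x) f(x), with M(x) = (1 + kappa(x)) R(phi(x)).
    # As long as kappa_fn(x) > -1, M(x) has full rank, so the reshaped
    # system keeps exactly the equilibria of f.
    M = (1.0 + kappa_fn(x)) * rotation_2d(phi_fn(x))
    return M @ f(x)

# Example: modulation functions that decay to zero away from x0, so that
# M(x) is close to the identity far from x0 (approximately locally active).
x0 = np.array([1.0, 0.5])
bump = lambda x: np.exp(-np.sum((x - x0) ** 2) / 0.1)
xdot = reshaped_dynamics(np.array([1.0, 0.4]),
                         kappa_fn=lambda x: 0.5 * bump(x),
                         phi_fn=lambda x: (np.pi / 4) * bump(x))
```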

In order to learn reshaped dynamics, we parameterize the modulation function M(x) = M(θ(x)), with θ(x) ∈ R^P being a vector of P = 2 parameters (rotation angle and speed scaling) in the 2d case, and P = 4 parameters in the 3d case (rotation represented as axis-angle in 3 parameters and one parameter for the speed scaling). For a trajectory data set {x_m, ẋ_m}_{m=1}^M, a corresponding set {x_m, θ_m}_{m=1}^M can easily be computed by comparing f(x_m) and ẋ_m for each m = 1...M. It is possible to similarly parameterize rotations in higher dimensions, e.g. for reshaping dynamics expressed in joint space.
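In 2d, recovering θ_m = (φ_m, κ_m) from a demonstration pair amounts to measuring the signed angle and the speed ratio between f(x_m) and ẋ_m. A minimal sketch of this computation follows; the function name extract_theta is hypothetical, and it assumes x_m lies away from the attractor so that f(x_m) ≠ 0.

```python
import numpy as np

def extract_theta(x, xdot, f):
    # Recover (phi, kappa) such that xdot = (1 + kappa) R(phi) f(x) in 2d.
    v = f(x)
    # Signed angle rotating f(x) onto the demonstrated velocity xdot.
    phi = np.arctan2(xdot[1], xdot[0]) - np.arctan2(v[1], v[0])
    phi = np.arctan2(np.sin(phi), np.cos(phi))  # wrap to (-pi, pi]
    # Speed scaling; kappa > -1 by construction since both norms are positive.
    kappa = np.linalg.norm(xdot) / np.linalg.norm(v) - 1.0
    return phi, kappa
```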

Note that as the norm of θ goes to zero, the modulation function Eq. (3) goes to the identity matrix. Hence, if a local regression technique is applied to θ, the reshaped dynamics are guaranteed to be bounded.

²M(x) is said to be locally active if there exists some closed subset χ of R^N such that M(x) = I_N for all x ∈ R^N \ χ.


[Figure 1 legend: training data; training points used by GP-MDS; influence region of the GP; example trajectories from reshaped dynamical system; example trajectories from original dynamical system.]

Fig. 1. Left: Example of reshaped dynamics using GP-MDS in a 3d system. The colored streamtapes represent example trajectories of the reshaped dynamics. The streamtapes colored in black represent trajectories that do not pass through the reshaped region of the state space, and hence retain the straight-line characteristics of the linear system that is used as original dynamics here. The green streamtube is artificially generated data representing an expanding spiral. Points in magenta represent the subset of this data that was selected as the training set. The gray surface illustrates the region in which the dynamics are significantly altered (corresponding to a level set of the predictive variance in the GP). Right: Same as left, but zoomed in, and with the influence surface sliced to improve visibility of the training points and the trajectories.

Fig. 2. Examples of GP-MDS in 2d systems. (a): Streamlines represent the direction of motion at each point. As seen, all trajectories converge to the single attractor of the linear original dynamics. Green circles indicate collected data points and magenta circles indicate points selected for use in the GP training set. The copper colormap illustrates the reshaped region, i.e. the region where the GP outputs a parameter vector different from zero. (b): Similar to (a), but with different collected data. (c): A 2d GP-MDS example with nonlinear original dynamics. (d): A 2d GP-MDS example with nonlinear original dynamics and several reshaped parts of the state space.

In the next section, we apply Gaussian Process Regression to encode θ(x) using a data set {x_m, θ_m}_{m=1}^M.

B. Learning Modulation with Gaussian Processes

In this section, we present how the reshaped dynamics can be learned by encoding the parameter vector θ using Gaussian Process Regression. Due to space restrictions, we omit a review of GPR here and refer to [3]. The predictive mean of the p-th entry of the parameter vector at a test point x* is:

θ_p(x*) = K_{x*X} [K_{XX} + σ_n² I]^{−1} Θ_p    (4)

where Θ_p = [θ_p^1, . . . , θ_p^M]^T and:

K_{Xx*} = [k(x_1, x*), . . . , k(x_M, x*)]^T,    K_{x*X} = K_{Xx*}^T

The element at row i, column j of the M × M matrix K_{XX} is given by:

[K_{XX}]_{ij} = k(x_i, x_j)

The behavior of GPR is determined by the choice of observation noise variance σ_n² and covariance function k(·, ·). In this work, we use the squared exponential covariance function, defined by:

k(x, x′) = σ_f² exp( −(x − x′)^T (x − x′) / (2l²) )

where l, σ_f > 0 are scalar hyper-parameters. In this work, these parameters and the observation noise variance were set to predetermined values. Alternatively, they could be optimized to maximize the likelihood of the training data [3].

If an identical GP prior is used for each dimension 1...P of θ, computing the multidimensional θ comes at little additional cost compared to predicting a single output, since the input-dependent vector a(x*) = K_{x*X} [K_{XX} + σ_n² I]^{−1} can be precomputed, and the prediction for each dimension is then simply a dot product, θ_p(x*) = a(x*)^T Θ_p.
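The following sketch shows this shared-precomputation structure for Eq. (4), assuming the squared exponential kernel above with placeholder hyper-parameter values (l, σ_f, σ_n are illustrative, not the values used in the paper). In an incremental setting one would of course cache a factorization of K_{XX} rather than re-solving at every query.

```python
import numpy as np

def sq_exp_kernel(A, B, l, sigma_f):
    # Squared exponential covariance between row-stacked inputs A (M,N) and B (K,N).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-d2 / (2 * l**2))

def gp_predict_theta(X, Theta, x_star, l=0.1, sigma_f=1.0, sigma_n=0.1):
    # Eq. (4) for all P output dimensions at once.
    # X: (M, N) training inputs, Theta: (M, P) training outputs.
    M = X.shape[0]
    K = sq_exp_kernel(X, X, l, sigma_f) + sigma_n**2 * np.eye(M)
    k_star = sq_exp_kernel(X, x_star[None, :], l, sigma_f)  # K_{Xx*}, shape (M, 1)
    a = np.linalg.solve(K, k_star).ravel()  # a(x*): one solve, shared by all outputs
    return Theta.T @ a                      # theta(x*), shape (P,)
```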

With fixed hyper-parameters, incremental learning can be achieved simply by incrementally expanding the training set with new data. However, since GPR involves inversion of a matrix that grows with the number of training data, it is important to represent the incoming data sparsely. This can be done by defining a selection criterion that determines whether new data should be included in the training set or not. While most previous such criteria are related to the predictive variance or information-theoretic measures which depend on the input patterns of the data [4], we propose a custom selection criterion for GP-MDS which depends on the outputs and hence relates directly to the resulting trajectory. To determine whether a new pair x_{M+1}, θ_{M+1} should be added to the training set, we compare it with the parameter vector predicted using the current training set, θ(x_{M+1}). The speed scalings κ_{M+1}, κ(x_{M+1}) and the rotation angles φ_{M+1}, φ(x_{M+1}) are extracted from θ_{M+1} and θ(x_{M+1}). Then, let J^1_{M+1} and J^2_{M+1} denote two positive scalar functions, defined as:

J^1_{M+1} = |κ_{M+1} − κ(x_{M+1})| / (1 + κ_{M+1})    (5a)

J^2_{M+1} = min_{k∈N} |φ_{M+1} − φ(x_{M+1}) + 2kπ|    (5b)

The first function, J^1_{M+1}, is a relative measure of the speed error, and the second function, J^2_{M+1}, is an absolute measure of the error in rotation angle. The decision whether to add a point is then made by comparing the values of these functions with predefined thresholds J̄^1, J̄^2. If the value of either function exceeds its threshold, the data point is added to the training set; otherwise it is not. The thresholds should be set so as to achieve a good trade-off between sparsity and accurate trajectory representation. In this work, we used J̄^1 = 0.3, corresponding to a speed error smaller than 30% being considered tolerable. For the rotation angle, a threshold of J̄^2 = 10° was used.
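A sketch of this decision rule follows, under the assumption that κ and φ have already been extracted from the observed and predicted parameter vectors; the function name is illustrative, and the default thresholds are the values quoted above.

```python
import numpy as np

def should_add_point(kappa_obs, phi_obs, kappa_pred, phi_pred,
                     J1_bar=0.3, J2_bar=np.deg2rad(10.0)):
    # Eq. (5a): relative speed error.
    J1 = abs(kappa_obs - kappa_pred) / (1.0 + kappa_obs)
    # Eq. (5b): the minimization over k is the wrapped angle difference,
    # computed here with the atan2 trick (result in [0, pi]).
    dphi = phi_obs - phi_pred
    J2 = abs(np.arctan2(np.sin(dphi), np.cos(dphi)))
    # Add the point only if either error exceeds its threshold.
    return (J1 > J1_bar) or (J2 > J2_bar)
```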

Fig. 1 shows an example of reshaping a linear 3d system with GP-MDS to locally incorporate an expanding spiral pattern. Fig. 2 shows a set of examples of reshaped dynamics in 2d.


III. LEARNING HANDWRITING MOTIONS

To illustrate an application of the proposed approach in a Learning from Demonstration setting, we use GP-MDS to learn a set of handwriting motions from the LASA handwriting data set [2]. The form of the original dynamics is not constrained: any first-order system can be used as f in Eq. (2). For example, an SEDS model (Gaussian Mixture Regression with stability constraints) can be used.

The first column of Fig. 3 shows training data for three letters from the LASA handwriting set, along with streamlines from SEDS models trained on this data. Note that these models already do a good job at producing smooth generalized dynamics from the data. The middle column of Fig. 3 shows GP-MDS being applied to refine the SEDS dynamics. For the letter N, trajectories that start to the left of the demonstrated starting location are problematic, as illustrated by the black example trajectory in Fig. 3a. In Fig. 3b, this is remedied with a very simple corrective demonstration. For the letters W and Z, one additional demonstration (different from the demonstrations used for the SEDS models) was given. The goal here is to sharpen the corners, which are overly smooth both in the original demonstrations and in the resulting SEDS models (Figures 3d and 3g). In order to favor detail over generalization, a fine lengthscale was selected, resulting in the sharpened letters in Figures 3e and 3h.

The right column of Fig. 3 shows streamlines from GP-MDS applied to a linear system in place of an SEDS model. In these cases, the original training data (the same that was used for training the SEDS models) was used for GP-MDS. A medium lengthscale was chosen to trade off generalization and detail. This exemplifies that GP-MDS can be used even without any task knowledge in the original dynamics, although a good original model can provide advantages such as better generalization, allowing GP-MDS to focus on refining detail rather than general behavior.

Note the sparse selection of training data in the middle column of Fig. 3. In areas of the state space where the original dynamics have the same direction as the corrective demonstration, it is not necessary to add training data³. The sparse data selection is also clearly visible near the end of the letters in the right column of Fig. 3, since the demonstrations there are roughly aligned with the trajectories of the linear system which is used as original dynamics in these cases.

IV. CONCLUSION

This extended abstract presented a novel representation for autonomous dynamical systems, based on locally rotating and scaling existing dynamics. An advantage of this representation is that, by construction, it is impossible to introduce spurious attractors or unstable behavior. Gaussian Processes were employed for learning reshaped dynamics, resulting in the GP-MDS framework, which uses a heuristic sparsity criterion that is tailored to the DS application. Preliminary results from applying GP-MDS to refine handwriting motions were presented.

³In these experiments, J̄^1 was set to a very high value, tolerating speed errors well above 30%, since speed was not considered important for this task. In practice, the selection criterion is hence based on the angle error only.

[Figure 3 legend: original training data; highlighted trajectories; streamlines of dynamics; collected corrective data; selected GP data.]

Fig. 3. Left column: Demonstrated trajectories (red dots) and resulting SEDS models for the letters N, Z and W. Example trajectories are highlighted in black. Middle column: GP-MDS is used to improve various aspects of the SEDS models. The copper colormap illustrates the reshaped region, highlighting that GP-MDS locally modifies the dynamics. Right column: The original training data is provided to GP-MDS, with a simple linear system replacing SEDS as original dynamics.


We plan to improve the data selection procedure by incorporating pruning of old training points, so that the dynamics can be continually reshaped over time with a maximum allowed number of points in the GP training set. The current sparsity criterion is inherently sensitive to outliers, a point which is usually not crucial when data comes from a continuous process such as trajectory demonstrations. The algorithm would however benefit from a method for discarding outliers, and this will be explored in future work.

ACKNOWLEDGMENT

This research was supported by the Swiss National Science Foundation through the National Center of Competence in Research Robotics.

REFERENCES

[1] A. Ijspeert, J. Nakanishi, and S. Schaal, "Movement imitation with nonlinear dynamical systems in humanoid robots," IEEE Intl. Conf. on Robotics and Automation, pp. 1398–1403, 2002.

[2] S. Khansari-Zadeh and A. Billard, "Learning stable non-linear dynamical systems with Gaussian Mixture Models," IEEE Transactions on Robotics, vol. 27, pp. 1–15, 2011.

[3] C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning. MIT Press, 2006.

[4] J. Quinonero Candela and C. Rasmussen, "A unifying view of sparse approximate Gaussian process regression," The Journal of Machine Learning Research, vol. 6, pp. 1939–1959, 2005.