
Multibody Syst Dyn (2015) 34:307–325, DOI 10.1007/s11044-014-9441-8

Motion optimization using Gaussian process dynamical models

Hyuk Kang · F.C. Park

Received: 13 September 2013 / Accepted: 21 November 2014 / Published online: 17 December 2014
© Springer Science+Business Media Dordrecht 2014

Abstract We propose an efficient method for generating suboptimal motions for multibody systems using Gaussian process dynamical models. Given a dynamical model for a multibody system, and a trial motion, a lower-dimensional Gaussian process dynamical model is fitted to the trial motion. New motions are then generated by performing a dynamic optimization in the lower-dimensional space. We introduce the notion of variance tubes as an intuitive and efficient means of restricting the optimization search space. The performance of our algorithm is evaluated through detailed case studies of raising motions for an arm and jumping motions for a humanoid.

Keywords Robot dynamics · Motion optimization · Machine learning · Gaussian process dynamical model

1 Introduction

The primary challenge behind dynamics-based motion optimization for multibody systems has always been, and continues to be, the high dimensionality of the system and the ensuing complexity of the dynamics equations. A typical humanoid robot, for example, can possess up to sixty or more degrees of freedom; existing numerical motion optimization algorithms encounter difficulties when the degrees of freedom exceed ten. As multibody structures become more complex and engage in more complex tasks, coping with the high dimensionality of the dynamic equations will require nontraditional approaches beyond simply relying on faster computational processing.

The focus in this paper will be on systems with a large number of degrees of freedom that can be modeled as a tree structure multibody system, for example, humanoid robots and digital characters. To address the problem of high dimensionality, dimension reduction methods based on principal component analysis (PCA) have been investigated in the literature. These methods typically assume that a set of training motion data, in the form of, e.g., human motion capture data, is available. In [1], for example, a set of training motions (fifty trials of a human lifting a dumbbell) are used to extract principal component basis functions for each joint; more general joint trajectories are then generated as a linear combination of these principal components. If a sufficient number of basis functions are available, then one can attempt to find the optimal linear combination that minimizes some performance criterion. Although the resulting trajectories will only be suboptimal, they will likely resemble the training motion data (which is desirable if the training motions are viewed as being close to optimal), and, moreover, the ensuing optimization is finite-dimensional.

H. Kang (B) · F.C. Park
School of Mechanical and Aerospace Engineering, Seoul National University, Seoul, Korea
e-mail: [email protected]

F.C. Park
e-mail: [email protected]

An obvious downside of the above approach is that sufficient training motion data must be available to extract enough principal components. These methods also encounter difficulties handling boundary constraints. In general, the number of principal components $d$ should exceed the number of boundary conditions, and to keep $d$ from becoming too large, such methods often resort to ad hoc techniques that only partially ensure boundary constraint satisfaction. Finally, because only linear combinations of the principal components are possible, the set of possible trajectories will necessarily be limited.

Recently, considerable progress has been made in statistical techniques for nonlinear dimension reduction. Broadly, these methods can be classified into embedding-based and mapping-based techniques. In the former, for example, the locally linear embedding method of [2] or the Isomap algorithm of [3], the low-dimensional structure of the data is modeled directly, without generating a mapping between the latent space and configuration space. Mapping-based techniques, on the other hand, determine a nonlinear mapping between the data and their latent space coordinates, typically via a combination of local linear models (e.g., [4, 5]), or through a single nonlinear function (e.g., [6]).

Among mapping methods, the Gaussian process latent variable model (GPLVM) [6] has been used in a number of motion generation contexts, from generating style-specific inverse kinematics [7] to mapping human motions to nonhumanoid characters in a natural way [8]. In our previous work [9], a dynamics-based motion optimization algorithm based on a GPLVM was developed. In contrast with the related approach of [8], our algorithm accommodates general integral functional objective functions that reflect physical criteria and, more importantly, formulates the optimization directly in the lower-dimensional latent space.

One of the shortcomings of the GPLVM-based approach is that the motions generated in [9] often tend to be choppy and discontinuous; this is because GPLVM-based optimization fails to take into account the temporal correlation of the data. To address these and other shortcomings of the GPLVM model, Wang et al. [10] have proposed the Gaussian process dynamical model (GPDM) as a natural generalization of GPLVM to time series data. In [10] the GPDM model is used with some success to represent, classify, and synthesize human motion trajectories considering only the kinematics.

In this paper we develop an efficient dynamics-based motion optimization algorithm based on the Gaussian process dynamical model. Given training motion data, we use the GPDM to construct a stochastic time series model of the system in the lower-dimensional latent space, together with a nonlinear mapping from the latent space to the system's configuration space. Trajectory optimization is then performed directly in the latent space. We introduce the notion of variance tubes as a natural way to represent the deviation of the resulting optimizing trajectories from training motion data and thus of specifying the optimization search space. Case studies of an arm performing raising motions and a humanoid performing jumping motions are used to demonstrate that even with just a single trial motion, our GPDM-based trajectory generation method can produce trajectories that are smoother and more physically realistic than those of GPLVM methods.

The remainder of this paper is organized as follows. The problem formulation is described in Sect. 2, together with a discussion of the dynamics model and its computation. Section 3 presents our GPDM-based dynamics optimization algorithm, whereas Sect. 4 presents results of our case studies. Concluding remarks and directions for future investigation are presented in Sect. 5.

2 Dynamics-based motion optimization

We shall consider a three-dimensional, tree structure, rigid multibody dynamic model, in which all joints are assumed actuated. We refer the reader, for example, to [11–13] for details and further references on our approach to multibody dynamic modeling and algorithms. Our purpose in this section is to describe the general structure of the dynamics equations and distinctive features of the dynamics computations for motions involving ground contact and free flight. Finally, we formulate the motion optimization problem as a numerical optimal control problem.

2.1 Multibody dynamic modeling

We consider a rigid multibody system with a tree structure topology. Identifying one link (e.g., the torso) as the base link and assuming the absence of external contact forces, the dynamic equations can be written in the form

$$ M(q)\ddot{q} + C(q, \dot{q})\dot{q} + V(q) = \tau, \tag{1} $$

where $q \in \mathbb{R}^{D}$ are the configuration coordinates, which include the joint position $q_{\text{joint}} \in \mathbb{R}^{D-6}$ and the position and orientation of the base link $q_{\text{baselink}} \in \mathbb{R}^{6}$, $M(q) \in \mathbb{R}^{D \times D}$ is the mass matrix, $C(q, \dot{q}) \in \mathbb{R}^{D \times D}$ is the Coriolis matrix, $V(q) \in \mathbb{R}^{D}$ is the gravity force, and $\tau \in \mathbb{R}^{D}$ is the vector of input torques and forces. The vector $\tau$ can be further partitioned into $\tau = (\tau_{\text{joint}}, \tau_{\text{baselink}})$, where $\tau_{\text{joint}}$ denotes the torque vector exerted at the joints, and $\tau_{\text{baselink}}$ is a six-dimensional moment–force pair (or spatial force) exerted on the base link.

In the event that there exist external contact forces applied at any of the links, the dynamic equations (1) are modified into the following form:

$$ M(q)\ddot{q} + C(q, \dot{q})\dot{q} + V(q) = \tau + \sum_{i} J_i^{T} F_{c_i}, \tag{2} $$

where $J_i^{T} F_{c_i}$ represents the external contact forces applied to the links (here $F_{c_i} = (m_i, f_i)$ represents the six-dimensional moment–force pair, or spatial force, exerted at the $i$th contact, and $J_i$ denotes the $i$th contact Jacobian defined by the contact conditions).

Assuming that the base link force is zero ($\tau_{\text{baselink}} = 0$) and no contact force is exerted on the system, the system may no longer be in contact with the ground. In such cases, given the motion of the joints, the motion of the base link is determined from momentum conservation laws. Let $q$ be partitioned into $q = (q_{\text{joint}}, q_{\text{baselink}})$. The equations of motion can then be rewritten as

$$ M(q) \begin{pmatrix} \ddot{q}_{\text{joint}} \\ \ddot{q}_{\text{baselink}} \end{pmatrix} + C(q, \dot{q})\dot{q} + V(q) = \begin{pmatrix} \tau_{\text{joint}} \\ \tau_{\text{baselink}} \end{pmatrix}. \tag{3} $$


The objective is to compute $(\ddot{q}_{\text{baselink}}, \tau_{\text{joint}})$ from given values for $(q, \dot{q}, \ddot{q}_{\text{joint}})$ and with $\tau_{\text{baselink}}$ set to zero. The dynamics can then be evaluated by a hybrid dynamics algorithm (see, e.g., [14] for the particular version used in our paper).
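For intuition, the free-flight computation can be written out with dense matrices. The following is only a sketch of the underlying linear algebra, not the recursive hybrid dynamics algorithm of [14]. The block names $M_{jj}$, $M_{jb}$, $M_{bj}$, $M_{bb}$ (joint and base link partitions of $M(q)$) and the bias term $h = C(q,\dot{q})\dot{q} + V(q)$ are notation introduced here for illustration; $M_{bb}$ is invertible because it is a principal submatrix of the positive definite mass matrix.

```latex
\begin{aligned}
&\begin{pmatrix} M_{jj} & M_{jb} \\ M_{bj} & M_{bb} \end{pmatrix}
 \begin{pmatrix} \ddot{q}_{\text{joint}} \\ \ddot{q}_{\text{baselink}} \end{pmatrix}
 + \begin{pmatrix} h_{j} \\ h_{b} \end{pmatrix}
 = \begin{pmatrix} \tau_{\text{joint}} \\ 0 \end{pmatrix}
 && \text{(Eq. (3) with } \tau_{\text{baselink}} = 0\text{)} \\
&\ddot{q}_{\text{baselink}} = -M_{bb}^{-1}\left( M_{bj}\,\ddot{q}_{\text{joint}} + h_{b} \right)
 && \text{(second block row)} \\
&\tau_{\text{joint}} = M_{jj}\,\ddot{q}_{\text{joint}} + M_{jb}\,\ddot{q}_{\text{baselink}} + h_{j}
 && \text{(first block row)}
\end{aligned}
```

The second row is the momentum balance of the base link; once $\ddot{q}_{\text{baselink}}$ is known, the first row gives the joint torques needed for the cost evaluation.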

When the system comes into contact with the environment, one needs to solve for the contact force $F_{c_i}$ in addition to $(\ddot{q}_{\text{baselink}}, \tau_{\text{joint}})$. By imposing a unilateral contact constraint with friction, the problem is then formulated as a mixed linear complementarity problem (MLCP). The hybrid dynamics algorithm with solution of the MLCP is described in detail in [14].

2.2 Optimal control formulation and spline-based solutions

Our interest is in minimizing cost functionals of the form

$$ J(\tau) = \int_{0}^{t_f} L(q, \dot{q}, \tau, t)\, dt \tag{4} $$

subject to (1) and boundary constraints. In the absence of constraints on kinematics and dynamics, an efficient way of finding a suboptimal solution is to parameterize the configuration coordinates $q(t)$ by B-splines [15] (note that in the case of motions involving free flight, $q(t)$ is replaced by $q_{\text{joint}}(t)$). The B-splines in turn depend on the choice of basis functions $B_i(t)$ and the control points $P = \{p_1, p_2, \ldots, p_m\}$, where $p_i \in \mathbb{R}^{D}$. The configuration trajectories are then of the form $q = q(t, P)$, where

$$ q(t, P) = \sum_{i=1}^{m} p_i B_i(t). \tag{5} $$

Finding the optimal control is thus reduced to a finite-dimensional parameter optimization problem. Naturally, the computational burden increases with the configuration space dimension $D$, and also with the number of control points $m$; assuming a robot with sixty degrees of freedom ($D = 60$) and five control points ($m = 5$), the resulting parameter optimization problem then is of dimension 300. Not only is the problem high-dimensional, but the cost function is highly nonlinear and involves numerical integration. The combination of high dimensionality and nonlinearity, accumulation of numerical errors, and sensitivity to initial solutions and boundary conditions makes obtaining a solution very difficult. These difficulties serve as motivation for our GPDM-based motion optimization algorithm.
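To make the parameterization (5) concrete, the following minimal Python sketch evaluates a B-spline trajectory from its control points using the Cox-de Boor recursion. The clamped uniform knot vector, the cubic degree, and the toy control points are assumptions made here for illustration; the paper does not specify them.

```python
def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: value of the i-th B-spline basis function of degree k at t."""
    if k == 0:
        # Half-open intervals, so t equal to the final knot is not covered.
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    value = 0.0
    left_span = knots[i + k] - knots[i]
    if left_span > 0.0:
        value += (t - knots[i]) / left_span * bspline_basis(i, k - 1, t, knots)
    right_span = knots[i + k + 1] - knots[i + 1]
    if right_span > 0.0:
        value += (knots[i + k + 1] - t) / right_span * bspline_basis(i + 1, k - 1, t, knots)
    return value


def clamped_knots(m, k, t_final):
    """Clamped uniform knot vector for m control points and degree k on [0, t_final]."""
    interior = m - k - 1
    inner = [t_final * j / (interior + 1) for j in range(1, interior + 1)]
    return [0.0] * (k + 1) + inner + [t_final] * (k + 1)


def trajectory(t, control_points, k, knots):
    """Evaluate q(t, P) = sum_i p_i B_i(t) as in Eq. (5); each p_i is a list of length D."""
    m = len(control_points)
    dim = len(control_points[0])
    weights = [bspline_basis(i, k, t, knots) for i in range(m)]
    return [sum(weights[i] * control_points[i][j] for i in range(m)) for j in range(dim)]


# Toy setup (assumed): m = 5 control points, D = 2 coordinates, cubic spline on [0, 1].
degree = 3
P = [[0.0, 0.0], [0.2, 0.5], [0.5, 1.0], [0.8, 0.5], [1.0, 0.0]]
knots = clamped_knots(len(P), degree, 1.0)
q_start = trajectory(0.0, P, degree, knots)  # a clamped spline starts at the first control point
q_mid = trajectory(0.5, P, degree, knots)

# Size of the parameter vector for the example in the text: D = 60, m = 5.
num_parameters = 60 * 5
```

Only the control points are decision variables, so the optimizer sees a vector of length $D \times m$, which is 300 for the sixty-degree-of-freedom example above.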

3 GPDM motion optimization

In this section we first review the Gaussian process dynamical model (GPDM) of Wang et al. [10] and then present our GPDM-based motion optimization algorithm.

3.1 Gaussian process dynamical model basics

Given data in the form of a time series, the standard approach to modeling such data is to assume a certain parametric time series form of the dynamics and to determine the values of the model parameters such that they are a best fit to the data in some appropriate sense. Such a parametric approach to model identification runs into difficulties when (i) only limited data is available, (ii) the model is very complicated, and (iii) the number of parameters is large. One of the advantages of GPDM is that by regarding the parameters as random variables and then marginalizing them out, the burden of identifying the model parameters is removed. GPDM consists of a function from the latent space to the data (or observation) space, together with a dynamical function in the latent space. These two functions can be obtained as closed-form probability densities by marginalizing out the parameters of the two mappings. To use GPDM, one must determine the so-called hyperparameters associated with the probability densities, together with the latent coordinates.

Given time series observation data $Q = \{q_t\}_{t=1}^{N}$, $q_t \in \mathbb{R}^{D}$, we assume that these data are generated in a lower-dimensional latent space $\mathbb{R}^{d}$ by a stochastic Markov dynamics process as follows:

$$ x_t = f(x_{t-1}) + n_{x,t}, \tag{6} $$

$$ q_t = g(x_t) + n_{q,t}, \tag{7} $$

where $f : \mathbb{R}^{d} \to \mathbb{R}^{d}$ and $g : \mathbb{R}^{d} \to \mathbb{R}^{D}$ are nonlinear, and $n_{x,t}$ and $n_{q,t}$ are zero-mean Gaussian white noise processes. The latent points $X = \{x_t\}_{t=1}^{N}$, $x_t \in \mathbb{R}^{d}$, corresponding to $Q$ are initially unknown. In the GPDM framework, $f$ and $g$ are formulated as linear combinations of some given scalar basis functions $\phi_i(x), \psi_i(x) \in \mathbb{R}$:

$$ f(x) = \sum_{i=1}^{m} a_i \phi_i(x), \tag{8} $$

$$ g(x) = \sum_{i=1}^{m} b_i \psi_i(x). \tag{9} $$

Defining $A = [a_1\ a_2\ \cdots\ a_m]^{T} \in \mathbb{R}^{m \times d}$ and $B = [b_1\ b_2\ \cdots\ b_m]^{T} \in \mathbb{R}^{m \times D}$, if we assume an isotropic Gaussian prior on each of the columns of $B$, the conditional density for the data $Q$ can be obtained by marginalizing over $g$:

$$ p(Q \mid X, \beta) = \frac{|W|^{N}}{\sqrt{(2\pi)^{ND}\, |K_Q|^{D}}} \exp\left( -\frac{1}{2} \operatorname{tr}\left( K_Q^{-1} Q W^{2} Q^{T} \right) \right), \tag{10} $$

where $\beta = \{\beta_1, \beta_2, \beta_3, W\}$ comprises the hyperparameters that specify (11), $W$ is a weighting matrix, and $K_Q$ is a kernel matrix whose elements are defined via the following radial basis function (RBF) kernel:

$$ k_Q(x, x') = \beta_1 \exp\left( -\frac{\beta_2}{2} \left\| x - x' \right\|^{2} \right) + \beta_3^{-1} \delta_{x,x'}, \tag{11} $$

where $\delta_{x,x'}$ is the Kronecker delta function.

The joint density over the latent coordinates $X$ can be obtained by marginalizing over the coefficients $A$:

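$$ p(X \mid \alpha) = \int p(X \mid A, \alpha)\, p(A \mid \alpha)\, dA, \tag{12} $$

where $\alpha$ is a set of kernel hyperparameters to be defined later. Incorporating the Markov property (6) gives

$$ p(X \mid \alpha) = p(x_1) \int \prod_{t=2}^{N} p(x_t \mid x_{t-1}, A, \alpha)\, p(A \mid \alpha)\, dA. \tag{13} $$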


Finally, with an isotropic Gaussian prior on the columns of $A$, the density (13) reduces to the following closed-form expression:

$$ p(X \mid \alpha) = \frac{p(x_1)}{\sqrt{(2\pi)^{(N-1)d}\, |K_X|^{d}}} \exp\left( -\frac{1}{2} \operatorname{tr}\left( K_X^{-1} X_{\text{out}} X_{\text{out}}^{T} \right) \right), \tag{14} $$

where $X_{\text{out}} = \{x_t\}_{t=2}^{N}$ is the set of all latent points (minus $x_1$), and $K_X$ is the $(N-1) \times (N-1)$ kernel matrix constructed from $X_{\text{in}} = \{x_t\}_{t=1}^{N-1}$. All elements of the dynamic kernel matrix $K_X$ are defined by a kernel function, for which we typically use an “RBF + linear” kernel, that is,

$$ k_X(x, x') = \alpha_1 \exp\left( -\frac{\alpha_2}{2} \left\| x - x' \right\|^{2} \right) + \alpha_3 x^{T} x' + \alpha_4^{-1} \delta_{x,x'}, \tag{15} $$

where $\alpha = \{\alpha_1, \alpha_2, \alpha_3, \alpha_4\}$ are the hyperparameters. The RBF term enables GPDM to realize nonlinear dynamics, whereas the linear term provides a natural preference for predictions close to the existing data.
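The two kernels (11) and (15) and the matrices built from them can be sketched in a few lines of Python. This is an illustrative sketch only: the latent points and hyperparameter values are made up, and the Kronecker delta is interpreted here as "same training index", which is an assumption about how the noise term is applied when assembling $K_Q$ and $K_X$.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance between two latent points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def k_Q(x, y, beta, same_index=False):
    """RBF kernel of Eq. (11); beta = (beta1, beta2, beta3)."""
    beta1, beta2, beta3 = beta
    value = beta1 * math.exp(-0.5 * beta2 * sq_dist(x, y))
    return value + (1.0 / beta3 if same_index else 0.0)


def k_X(x, y, alpha, same_index=False):
    """'RBF + linear' kernel of Eq. (15); alpha = (alpha1, alpha2, alpha3, alpha4)."""
    alpha1, alpha2, alpha3, alpha4 = alpha
    rbf = alpha1 * math.exp(-0.5 * alpha2 * sq_dist(x, y))
    linear = alpha3 * sum(a * b for a, b in zip(x, y))
    return rbf + linear + (1.0 / alpha4 if same_index else 0.0)


def kernel_matrix(points, kernel, params):
    """Symmetric kernel matrix; the delta term is added on the diagonal only."""
    n = len(points)
    return [[kernel(points[i], points[j], params, same_index=(i == j))
             for j in range(n)] for i in range(n)]


# Toy latent trajectory with N = 3 points in a d = 2 latent space (made-up values).
X = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
beta = (1.0, 2.0, 100.0)
alpha = (1.0, 1.0, 0.5, 100.0)

K_Q = kernel_matrix(X, k_Q, beta)          # N x N, built from all latent points
K_X = kernel_matrix(X[:-1], k_X, alpha)    # (N-1) x (N-1), built from X_in = {x_1, ..., x_{N-1}}
```

Note that $K_X$ uses only $X_{\text{in}}$, the latent points that have a successor, which is why it is one row and column smaller than $K_Q$.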

3.1.1 Learning

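GPDM learning involves finding the latent positions $X$ and model parameters $\alpha$, $\beta$ that specify the kernel functions $k_X(x, x')$, $k_Q(x, x')$. The GPDM parameters are learned from the training data $Q$ by maximizing the posterior $P(X, \alpha, \beta \mid Q)$. This is equivalent to minimizing the negative log-posterior, that is,

$$
\begin{aligned}
\min_{X, \alpha, \beta} L_{gp} &= -\log P(X, \alpha, \beta \mid Q) \\
&= \frac{d}{2} \ln |K_X| + \frac{1}{2} \operatorname{tr}\left( K_X^{-1} X_{\text{out}} X_{\text{out}}^{T} \right) - N \ln |W| + \frac{D}{2} \ln |K_Q| + \frac{1}{2} \operatorname{tr}\left( K_Q^{-1} Q W^{2} Q^{T} \right) \\
&\quad + \sum_{i} \ln \alpha_i + \sum_{i} \ln \beta_i.
\end{aligned}
\tag{16}
$$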

For our purposes, we use the algorithm proposed in [10] to solve this optimization problem.

3.1.2 New pose generation

Once the model parameters $\beta$ and latent positions $X$ have been learned, given a new latent position $x^{*}$, the density over the associated new pose $q^{*}$ is given by

$$ q^{*} \sim \mathcal{N}\left( \mu_Q(x^{*}),\ \sigma_Q^{2}(x^{*})\, I_D \right), \tag{17} $$

$$ \mu_Q(x) = Q^{T} K_Q^{-1} k_Q(x), \tag{18} $$

$$ \sigma^{2}(x) = k_Q(x, x) - k_Q(x)^{T} K_Q^{-1} k_Q(x), \tag{19} $$

where $k_Q(x) = [k_Q(x_1, x)\ \cdots\ k_Q(x_N, x)]^{T}$, and $I_D$ is the $D$-dimensional identity matrix (note that $\mathcal{N}$ represents the normal distribution). The function $\mu_Q(x)$ is the mean pose reconstructed from the latent position $x$. The variance $\sigma^{2}(x)$ (written $\sigma_Q^{2}$ in (17)) reflects the uncertainty of the reconstruction, which clearly impacts the optimization (details are provided in the following section). In what follows, we define $x_r(t)$ to be the continuous path connecting the latent positions $\{x_t\}_{t=1}^{N}$.
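The pose reconstruction (18) and its variance (19) amount to a standard Gaussian process prediction. The following minimal Python sketch shows the computation on a toy one-dimensional latent space; the training data, hyperparameters, and the small dense solver are all assumptions for illustration (a practical implementation would use a Cholesky factorization from a linear algebra library, and the poses are assumed to be mean-subtracted).

```python
import math


def rbf(x, y, beta1, beta2):
    """RBF part of the kernel in Eq. (11)."""
    return beta1 * math.exp(-0.5 * beta2 * sum((a - b) ** 2 for a, b in zip(x, y)))


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def predict_pose(x_new, X, Q, beta):
    """Return (mu_Q(x_new), sigma^2(x_new)) following Eqs. (18) and (19)."""
    beta1, beta2, beta3 = beta
    n = len(X)
    K = [[rbf(X[i], X[j], beta1, beta2) + (1.0 / beta3 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_vec = [rbf(X[i], x_new, beta1, beta2) for i in range(n)]   # k_Q(x)
    w = solve(K, k_vec)                                          # K_Q^{-1} k_Q(x)
    mean = [sum(w[i] * Q[i][j] for i in range(n)) for j in range(len(Q[0]))]
    variance = beta1 + 1.0 / beta3 - sum(k_vec[i] * w[i] for i in range(n))
    return mean, variance


# Toy data (made up): N = 3 latent points in d = 1, poses in D = 2.
X = [[0.0], [1.0], [2.0]]
Q = [[0.0, 1.0], [1.0, 0.0], [0.0, -1.0]]
beta = (1.0, 1.0, 1e4)

mean_near, var_near = predict_pose([1.0], X, Q, beta)   # at a training point
mean_far, var_far = predict_pose([10.0], X, Q, beta)    # far from all training points
```

Near the training data the variance is close to zero and the mean reproduces the training pose; far away the variance approaches its prior value $\beta_1 + \beta_3^{-1}$, which is the behavior the variance tubes of Sect. 3.3 rely on.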


3.2 Optimization in latent space

We now show how GPDM can be exploited for the purposes of motion optimization. We assume as given a rigid multibody dynamic model, and that its dynamics can be evaluated using the algorithms described in Sect. 2.1. A set of training motion data is assumed available in the form of a joint trajectory $Q = \{q_t\}_{t=1}^{N}$, where each $q_t \in \mathbb{R}^{D}$ is the pose at time step $t$ of one trial of a particular motion; this training data can be obtained, for example, from motion capture data.

The first step is to construct and learn a GPDM from the given training motion data. Once the GPDM is learned, the latent space corresponding to the training data is determined, and the motion optimization problem is reformulated and solved directly in this lower-dimensional latent space. For this purpose, we parameterize the latent space trajectory $x(t)$ using B-splines:

$$ x(t, P) = \sum_{i=1}^{m} p_i B_i(t), \tag{20} $$

where $\{B_i\}$ are basis functions, and $P = \{p_1, \ldots, p_m\}$ is the set of control points (note that each $p_i \in \mathbb{R}^{d}$). With this latent space formulation, the number of optimization parameters is considerably reduced from $D \times m$ to $d \times m$.

Although the trajectory $x(t)$ is parameterized in latent space, to evaluate the dynamics (2), one needs the joint trajectory $q(t)$ corresponding to $x(t)$. In the GPDM formulation, this joint trajectory is obtained in the form of a probability density. For a new latent point $x$, an associated full pose $q$ can be determined by

$$ q(x) = \mu_Q(x), \tag{21} $$

where $\mu_Q(x)$ is the mean function mapping from $x$ to $q$ as given in (18).

The optimization problem (4) can now be reformulated as follows:

$$ \min_{P} J(P) = \int_{0}^{t_f} L(P, t)\, dt, \tag{22} $$

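subject to the dynamic equations. Note that the input torques $\tau$ are parameterized as $\tau = \tau(P, t)$; $x$, $\dot{x}$, and $\ddot{x}$ are all given functions of $t$ and $P$ from (20) and its time derivatives; $q$, $\dot{q}$, and $\ddot{q}$ are all given functions of $\mu_Q(x)$ in (21) and its time derivatives. Specifically,

$$ x = \sum_{i=1}^{m} p_i B_i(t), \tag{23} $$

$$ \dot{x} = \sum_{i=1}^{m} p_i \dot{B}_i(t), \tag{24} $$

$$ \ddot{x} = \sum_{i=1}^{m} p_i \ddot{B}_i(t), \tag{25} $$

$$ q = Q^{T} K_Q^{-1} k_Q(x), \tag{26} $$

$$ \dot{q} = Q^{T} K_Q^{-1} \frac{\partial k_Q}{\partial x}(x)\, \dot{x}, \tag{27} $$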

$$ \ddot{q} = Q^{T} K_Q^{-1} \left( \sum_{i=1}^{d} \dot{x}_i \frac{\partial}{\partial x} \left( \frac{\partial k_Q}{\partial x_i} \right) \dot{x} + \frac{\partial k_Q}{\partial x}(x)\, \ddot{x} \right), \tag{28} $$

where $x_i \in \mathbb{R}$ denotes the $i$th element of $x$. Thus, the torque $\tau$ also becomes an explicit function of the spline parameters through (2). The latent trajectory corresponding to the training data ($x_r(t)$ defined in Sect. 3.1) is parameterized in terms of B-splines as an initial value for the optimization (22).

Fig. 1 Variance tubes in three-dimensional latent space: the yellow region represents a variance tube. All points are drawn from the low-dimensional training motion trajectory $x_r(t)$ (Color figure online)
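For the RBF kernel (11), the derivative terms appearing in (27) and (28) have simple closed forms. The following worked expressions are a sketch under the assumption that the query point $x$ is treated as distinct from the training points, so that the Kronecker delta term in (11) does not contribute; the shorthand $k_j(x)$ for the $j$th entry of $k_Q(x)$ is notation introduced here.

```latex
\begin{aligned}
k_j(x) &= \beta_1 \exp\left( -\frac{\beta_2}{2} \left\| x - x_j \right\|^{2} \right),
  \qquad j = 1, \ldots, N, \\
\frac{\partial k_j}{\partial x}(x) &= -\beta_2\, k_j(x)\, (x - x_j)^{T}
  \qquad \text{(row } j \text{ of } \partial k_Q / \partial x \in \mathbb{R}^{N \times d}\text{)}, \\
\frac{\partial^{2} k_j}{\partial x\, \partial x^{T}}(x) &= \beta_2\, k_j(x) \left( \beta_2\, (x - x_j)(x - x_j)^{T} - I_d \right).
\end{aligned}
```

With these, row $j$ of the bracketed term in (28) becomes $\dot{x}^{T} \frac{\partial^{2} k_j}{\partial x\, \partial x^{T}} \dot{x} + \frac{\partial k_j}{\partial x} \ddot{x}$, so $\dot{q}$ and $\ddot{q}$ can be evaluated analytically from the spline derivatives (24) and (25) without finite differencing.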

3.3 Optimization search space

Because our optimization is performed directly in the latent space, characterizing the feasible optimization search space is not as straightforward as in the original configuration space. For this purpose, we introduce the notion of variance tubes. Setting the latent space representation of the original trial motion data, denoted $x_r(t)$, as the initial trajectory for the iterative optimization procedure, the search space is then specified to be a neighborhood of $x_r(t)$ as follows:

$$ T_{\delta}(x) = \{ x \in \mathbb{R}^{d} \mid \sigma^{2}(x) \le \delta \}, \tag{29} $$

where $\sigma^{2}(x)$ is the variance as defined in (19), and $\delta$ is an arbitrarily chosen value supplied by the user. We call this neighborhood a variance tube because the shape of this space resembles a tube whose radius varies along its central axis according to the variance $\sigma^{2}(x)$. Examples of variance tubes are shown in Fig. 1.

The low-dimensional training trajectory $x_r(t)$ can be viewed as passing through the central axis of the tube $T_{\delta}(x)$. As $x(t)$ becomes more distant from $x_r(t)$, the value of $\sigma^{2}(x)$ increases. Thus, if $x(t)$ lies in the tube $T_{\delta}(x)$, the variance of $q$ is at most $\delta$, which limits the uncertainty of the full pose $q$. If $\delta$ is set to zero, then the class of admissible motions reduces to the original training motion. Thus, the motion reconstructed from a latent trajectory that lies inside a sufficiently small variance tube would closely resemble the training motion data.

The variance tube size may be chosen either manually or through some empirically derived heuristic. Alternatively, given multiple sequences of training motion data, it is usually not difficult to obtain some intuition on how to best set the tube size. For example, after GPDM learning has been performed for multiple sequences of training data, each training sequence has a latent trajectory in the shared latent space. Then, the smallest size of the tube that contains all latent trajectories can be defined as

$$ \delta_{\min} = \inf \{ \delta \mid x^{(i)}(t) \subset T_{\delta}(x),\ i = 1, \ldots, r \}, \tag{30} $$

where $x^{(i)}(t)$ is the $i$th latent trajectory corresponding to the $i$th training sequence, and $r$ is the number of training data sequences. If the tube size is smaller than $\delta_{\min}$, the search space does not take into account all training sequences. Further details on learning from multiple sequences are given in [10].
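The tube membership test (29) and the smallest enclosing tube size (30) can be sketched as follows. The sketch works on trajectories sampled at discrete latent points, which only approximates the continuous-time conditions, and `toy_variance` is a hypothetical stand-in for the GPDM variance $\sigma^{2}(x)$ of (19), chosen so that the reference trajectory runs along the first latent axis.

```python
def in_tube(points, variance_fn, delta):
    """Eq. (29) on samples: every latent point must satisfy sigma^2(x) <= delta."""
    return all(variance_fn(x) <= delta for x in points)


def smallest_tube_size(trajectories, variance_fn):
    """Eq. (30) on samples: smallest delta whose tube contains all trajectories."""
    return max(variance_fn(x) for traj in trajectories for x in traj)


def toy_variance(x):
    """Hypothetical variance: grows with squared distance from the axis x[1] = 0."""
    return 1e-4 * x[1] ** 2


# Two made-up latent trajectories (d = 2), sampled at three time instants each.
traj_a = [[0.0, 0.0], [0.5, 0.1], [1.0, 0.0]]
traj_b = [[0.0, 0.0], [0.5, -0.3], [1.0, 0.0]]

delta_min = smallest_tube_size([traj_a, traj_b], toy_variance)
a_inside = in_tube(traj_a, toy_variance, 4e-6)   # traj_a stays close to the axis
b_inside = in_tube(traj_b, toy_variance, 4e-6)   # traj_b deviates too far
```

On sampled data the infimum in (30) is simply the largest variance encountered along any training trajectory, so choosing any tube size below that value excludes at least one training sequence.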

Given a training motion, we seek a generalization of this motion that optimizes a trajectory between new start and goal poses. In our GPDM-based optimization framework, if the latent points corresponding to the new start and goal poses lie inside the variance tube, the previously learned latent space can then be straightforwardly used as the optimization space. If the discrepancy between the original and new poses is too great, the existing latent space cannot be used, and one must perform GPDM learning (and construct a new latent space) for a new trial motion consistent with the given start and goal poses.

3.4 Algorithm

A pseudo-code description of the overall optimization algorithm is given in Algorithm 1. Note that the choice of numerical optimization method used to update the control points $P$ is left to the user. One can use local gradient-based Newton-type optimization algorithms as described in [16]. The choice of an algorithm among Newton-type methods does not lead to a meaningful difference in optimization performance. For all our simulation experiments, L-BFGS [17] is used to update the control points $P$.

Algorithm 1 GPDM-based motion optimization algorithm
 1: x_r(t) ← GPDM Learning(Q_training)
 2: control points P ← parameterization(x_r)
 3: while stopping criteria are not satisfied do
 4:     P ← update(P) by optimization algorithm
 5:     if P ∈ T_δ(x) then
 6:         q, q̇, q̈ ← reconstruct(P)
 7:     else
 8:         goto 4
 9:     end if
10:     effort = dynamics(q, q̇, q̈)
11: end while
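The control flow of Algorithm 1 can be illustrated with a small runnable sketch. Several simplifications are assumed here and are not part of the paper: a random perturbation search replaces L-BFGS, the stopping criterion is a fixed iteration count, the first and last control points are held fixed to respect the boundary poses, and `toy_cost` and `toy_variance` are hypothetical stand-ins for the dynamics-based effort and the GPDM variance.

```python
import random


def optimize_in_tube(P0, cost_fn, variance_fn, delta, iterations=300, step=0.05, seed=0):
    """Tube-constrained search over control points, mirroring lines 3-11 of Algorithm 1."""
    rng = random.Random(seed)
    P = [p[:] for p in P0]
    best_cost = cost_fn(P)
    for _ in range(iterations):                 # line 3: stopping criterion (assumed)
        candidate = [p[:] for p in P]           # line 4: propose updated control points
        for p in candidate[1:-1]:               # end points stay fixed (assumption)
            for j in range(len(p)):
                p[j] += rng.uniform(-step, step)
        # Line 5: reject any update that leaves the variance tube (lines 7-8).
        if any(variance_fn(p) > delta for p in candidate):
            continue
        cost = cost_fn(candidate)               # lines 6 and 10: reconstruct and evaluate
        if cost < best_cost:
            P, best_cost = candidate, cost
    return P, best_cost


def toy_variance(x):
    """Hypothetical variance: squared distance from the reference axis x[1] = 0."""
    return x[1] ** 2


def toy_cost(P):
    """Hypothetical effort: lowest when control points sit at x[1] = 0.5, outside the tube."""
    return sum((p[1] - 0.5) ** 2 for p in P)


P0 = [[0.0, 0.0], [0.25, 0.0], [0.5, 0.0], [0.75, 0.0], [1.0, 0.0]]
delta = 0.01
P_opt, cost_opt = optimize_in_tube(P0, toy_cost, toy_variance, delta)
```

In this toy problem the unconstrained minimizer lies outside the tube, so the search improves the cost only until the control points reach the tube boundary, which is the role the tube size $\delta$ plays in the experiments below.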

4 Experiments

All our simulation experiments are run on a PC with a 2.8 GHz Intel(R) Core(TM) i7 quad-core processor and 4.0 GB RAM. All algorithms are implemented in C++.

4.1 Arm motions

To evaluate our algorithm, we compare the quality of the trajectories produced by our GPDM-based motion optimization algorithm with those obtained via a complete dynamics-based optimization in the configuration space. In our first example, we consider the seven-degree-of-freedom arm model shown in Fig. 2.

Fig. 2 Schematic of arm model

Fig. 3 Optimized arm raising motion for tube size 0.0001: The left figure shows the latent space, in which the solid line is the optimized path, the dashed line is the initial path, the shaded region represents the variance tube, and all balls represent B-spline control points. The eight frames show the corresponding pose during the motion sequence

The total mass of the arm model is 8.74 kg. The masses and inertias of each link in the arm model are set to values that closely resemble those of the human arm. The motion to be optimized is an arm raising movement that connects given start and end poses illustrated respectively by frames 1 and 8 of Fig. 3. An initial trial motion that satisfies the boundary conditions is arbitrarily generated by the user. For the objective function, we adopt the minimum torque criterion, that is, $L = \|\tau\|^{2}$ in the objective function (22). Choosing values for the model and optimization parameters as given in Table 1, we repeat the optimization for various tube sizes, so as to qualitatively assess the influence of variance tube sizes in GPDM-based optimization. The numerical results are summarized in Table 2; the resulting motions are shown in Fig. 3.

4.1.1 Comparison with full configuration space dynamic optimization

As seen from Table 2, performance gains of the GPDM-based optimization increase from 2.87 % to 35.68 % as the tube size increases from 0.000001 to 0.01. A larger tube size implies a larger search space and, as expected, leads to a reduced overall cost. For comparison purposes, performance gains for the full configuration space dynamic optimization in [15, 16] increase from 5.32 % to 46.2 % as the joint range limits (i.e., the maximum variation allowed in each joint angle) are increased from ±5° to ±90°. The performance gains respectively saturate at a tube size of 0.0005 for the GPDM-based optimization and at a joint limit value of ±62° for the full configuration space dynamic optimization.

Table 1 Parameters for arm model optimization

| Parameter | Value |
|---|---|
| Objective function | minimum torque |
| Mass | 8.74 kg |
| Motion duration | 2.9 s |
| DOF for learning | 7 |
| Latent dimension | 3 |
| Number of control points | 9 |
| Time step | 0.001 s |

Table 2 Arm raising motion optimization results. The four left columns are for the GPDM-based optimization (latent dimension = 3); the four right columns are for the full c-space optimization

| Tube size | Cost (N² m² s) | P. gain (%) | Computation time (s) | Joint limit | Cost (N² m² s) | P. gain (%) | Computation time (s) |
|---|---|---|---|---|---|---|---|
| 0.000001 | 341.89 | 2.87 | 7.55 | ±5° | 333.27 | 5.32 | 23.32 |
| 0.000004 | 304.62 | 13.46 | 7.27 | ±10° | 298.21 | 15.28 | 23.45 |
| 0.000006 | 294.27 | 16.4 | 7.57 | ±15° | 285.82 | 18.8 | 23.69 |
| 0.000008 | 286.88 | 18.5 | 7.92 | ±20° | 269.10 | 23.55 | 23.92 |
| 0.00001 | 282.48 | 19.75 | 7.91 | ±25° | 250.91 | 28.72 | 24.22 |
| 0.00002 | 265.44 | 24.59 | 8.36 | ±30° | 240.63 | 31.64 | 24.52 |
| 0.00004 | 253.12 | 28.09 | 8.48 | ±35° | 225.77 | 35.86 | 24.31 |
| 0.00006 | 249.14 | 29.22 | 8.55 | ±40° | 224.22 | 36.3 | 24.86 |
| 0.00008 | 246.36 | 30.01 | 8.32 | ±45° | 216.37 | 38.53 | 24.56 |
| 0.0001 | 241.82 | 31.3 | 8.57 | ±50° | 206.66 | 41.29 | 24.47 |
| 0.0002 | 237.32 | 32.58 | 8.63 | ±55° | 199.02 | 43.46 | 24.3 |
| 0.0004 | 230.45 | 34.53 | 8.66 | ±60° | 191.14 | 45.7 | 24.12 |
| 0.0005 | 226.41 | 35.68 | 8.5 | ±62° | 189.38 | 46.2 | 24.31 |
| 0.01 | 226.41 | 35.68 | 8.5 | ±90° | 189.38 | 46.2 | 24.31 |

P. gain represents performance gain. The initial cost is 352.

Given that the maximum performance gains for the GPDM-based optimization are similar to the performance gains obtained for the full configuration space dynamic optimization near ±35°, we expect that the latent space covers about ±35° of the joint limit range. Comparing the maximum performance gains of the two methods, the GPDM-based optimization reaches approximately 77 % of the maximum performance gain levels attained by full configuration space dynamic optimization. As the given training motion approaches the optimal motion, the differences in performance between the two methods should decrease, since both optimizations are performed in a similar search space. Table 2 compares the computational efficiency of the GPDM-based optimization with optimization in the full configuration space. Computation times for the GPDM-based optimization are on the order of three times faster than full configuration space dynamic optimization. For higher-dimensional systems, one can expect an even greater increase in computational efficiency for the GPDM-based optimization.

Table 3 Arm raising motion results for kinodynamic RRT*. Parameters and units are the same as in Table 2

| Iteration | Cost | P. gain | Computation time |
|---|---|---|---|
| 3830 | 355.38 | −0.96 | 5436 |
| 4265 | 315.73 | 10.30 | 8310 |
| 5390 | 275.06 | 21.86 | 15 529 |
| 7660 | 252.74 | 28.20 | 24 371 |
| 9135 | 225.59 | 35.91 | 46 532 |
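The headline figures quoted from Table 2 can be checked with a few lines of arithmetic. The sketch below assumes that the performance gain is the relative cost reduction with respect to the initial cost of 352; this definition is not stated explicitly in the table notes, but it is consistent with the tabulated Table 2 values used here.

```python
INITIAL_COST = 352.0  # "Initial cost is 352" from the notes to Table 2


def performance_gain(cost, initial_cost=INITIAL_COST):
    """Relative cost reduction in percent (assumed definition of 'P. gain')."""
    return 100.0 * (initial_cost - cost) / initial_cost


gpdm_first = performance_gain(341.89)   # smallest tube size in Table 2
gpdm_max = performance_gain(226.41)     # saturated GPDM-based result
full_max = performance_gain(189.38)     # saturated full c-space result

gain_ratio = gpdm_max / full_max        # fraction of the full c-space gain that is recovered
speedup = 24.31 / 8.5                   # ratio of the computation times at saturation
```

Under this definition the saturated GPDM-based gain is about 77 % of the full configuration space gain, at roughly one third of the computation time.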

4.1.2 Comparison with sampling-based dynamic optimization

In this section we compare our algorithm with the kinodynamic RRT* algorithm recently proposed in [18]. Kinodynamic RRT* is a sampling-based planning algorithm based on the asymptotically optimal RRT* algorithm of [19], in which the dynamic equations are treated as constraints. The numerical results obtained using kinodynamic RRT* for the arm raising motion are summarized in Table 3.

As can be seen from Table 3, the performance gain increases as the number of iterations increases. Although the performance gain at iteration 7660 is comparable to that obtained with GPDM-based optimization for tube size 0.00004 (see Table 2), the GPDM-based optimization is on the order of three thousand times faster than kinodynamic RRT*. Note that the performance gain at iteration 3830 is negative because the cost associated with the trajectory at iteration 3830 is greater than that for the initial trajectory used in the GPDM-based optimization. For the GPDM-based optimization, the additional computation time required to raise the performance gain from 2.87 % to 35.68 % is less than one second. In contrast, for kinodynamic RRT*, an additional 41 096 seconds of computation time are required to raise the performance gain from −0.96 % to 35.91 %.

For motion optimization purposes, our results clearly demonstrate the computational advantages of GPDM-based optimization over kinodynamic RRT*. In some sense, this is not a surprise since kinodynamic RRT* is primarily intended for linear dynamic systems; for nonlinear systems, the system equations must be locally linearized, which then has the unfortunate consequence of the final trajectory not satisfying the original nonlinear dynamics equation due to accumulated errors from the linearization. For highly nonlinear high-dimensional systems like robots, the excessive computational requirements and linearization errors of kinodynamic RRT* outweigh the asymptotic optimality features (i.e., the solution asymptotically reaching the optimal solution as the number of iterations increases to infinity).

4.1.3 Comparison with two-dimensional GPDM-based optimization

To examine how the choice of latent space dimension influences optimization performance, in this section we perform the same set of arm-raising motion optimizations, but in a two-dimensional rather than three-dimensional latent space. All optimization parameters are the same as for the three-dimensional GPDM-based optimization as shown in Table 1. The numerical results are summarized in Table 4. As seen from the table, performance gains increase as tube size is increased, saturating at a value of 29.02 % for a tube size of 0.0001. The GPDM-based optimization in the two-dimensional latent space is at approximately 81 % level of the maximum performance gains attained by the GPDM-based optimization in the three-dimensional latent space. Clearly, the performance of the optimization in the two-dimensional latent space is inferior to that performed in the three-dimensional latent space. Intuitively, this degradation is to be expected since any reduction in the dimension of the latent space implies a corresponding decrease in the corresponding search space volume. Yamane et al. [8] also report similar findings on the relationship between the volume of the original space and the dimension of the latent space. On the other hand, since computation times decrease as the latent space dimension is reduced, there exists the usual tradeoff between the amount of dimension reduction (less computation) and performance (lower trajectory costs).

Table 4 Arm raising motion results for GPDM-based optimization in a two-dimensional latent space (latent dimension = 2). Parameters and units are the same as in Table 2

| Tube size | Cost | P. gain | Computation time |
|---|---|---|---|
| 0.000001 | 339.87 | 3.44 | 5.23 |
| 0.000004 | 318.74 | 8.16 | 5.35 |
| 0.000006 | 309.04 | 11.94 | 5.54 |
| 0.000008 | 301.59 | 14.32 | 6.84 |
| 0.00001 | 293.01 | 16.76 | 6.74 |
| 0.00002 | 289.74 | 17.69 | 6.75 |
| 0.00004 | 268.25 | 23.79 | 6.5 |
| 0.00006 | 255.76 | 27.34 | 6.86 |
| 0.00008 | 251.57 | 28.53 | 6.68 |
| 0.0001 | 249.86 | 29.02 | 6.76 |
| 0.001 | 249.86 | 29.02 | 6.75 |

4.1.4 New start poses

When an arm raising optimization is performed for a new start arm pose, for best results, the latent point corresponding to the new start arm pose should lie inside the variance tube. To examine the possible range of start arm poses with respect to tube size as the starting latent point is moved away in the radial direction of the tube, the corresponding arm pose is reconstructed and illustrated in Fig. 4. To compare the numerical differences between the start and reconstructed arm poses, the distance metric between two arm poses is defined as follows:

$$\text{translational distance} = \left| p_0 - p_\delta \right|, \tag{31}$$

$$\text{pose distance} = \left| \log\left( R_0^T R_\delta \right) \right| + \left| p_0 - p_\delta \right|, \tag{32}$$

where $R_0 \in SO(3)$ and $p_0 \in \mathbb{R}^3$ denote the orientation and position of the end-effector at the start pose, whereas $R_\delta \in SO(3)$ and $p_\delta \in \mathbb{R}^3$ are the orientation and position of the end-effector at a point that is at a distance $\delta$ from the start point. The distances as a function of tube size are shown in Fig. 5. We observe that the translational distance (blue line) and pose distance (red line) increase in an approximately linear fashion, so that for this case, the tube size and distance obey a near linear relationship.
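As a rough illustration of the two metrics (31) and (32), the following sketch evaluates them with NumPy. It assumes that the norm of the matrix logarithm is taken as the norm of the equivalent rotation vector, that is, the rotation angle; the text does not state which norm is used. The function names and the test rotation are hypothetical.

```python
import numpy as np

def rotation_angle(R):
    """Rotation angle of R in [0, pi]; equals the rotation-vector norm of log(R).

    Assumption: |log(R)| in Eq. (32) is read as this angle.
    """
    cos_theta = (np.trace(R) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def translational_distance(p0, p_delta):
    """Eq. (31): Euclidean distance between end-effector positions."""
    return float(np.linalg.norm(np.asarray(p0) - np.asarray(p_delta)))

def pose_distance(R0, p0, R_delta, p_delta):
    """Eq. (32): rotation angle of R0^T R_delta plus the translational distance."""
    return rotation_angle(R0.T @ R_delta) + translational_distance(p0, p_delta)

def rot_z(theta):
    """Helper for the example: rotation by theta about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Example with made-up poses: 0.3 rad rotation and a 0.5 offset in position.
R0, p0 = np.eye(3), np.zeros(3)
R_delta, p_delta = rot_z(0.3), np.array([0.3, 0.4, 0.0])
d_trans = translational_distance(p0, p_delta)
d_pose = pose_distance(R0, p0, R_delta, p_delta)
```

Note that (32) adds an angle to a length, so the value of the pose distance depends on the length unit chosen for the positions.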


Fig. 4 End-effector poses for various tube sizes

Fig. 5 End-effector distances for various tube sizes (Color figure online)

4.2 Humanoid motions

The human model and data used in our humanoid motion optimization experiments are from the CMU motion capture dataset.¹ The human model consists of 31 links and has 62 degrees of freedom. The schematic of the humanoid model is shown in Fig. 6. Since only the kinematic model data are provided in the database (i.e., link lengths, link hierarchy, etc.), for the dynamics model, we specify appropriate values for the link masses and inertias based on available data for similarly sized humans.

¹ http://mocap.cs.cmu.edu


Fig. 6 Schematic of humanoid model

For the humanoid model, we consider the broad jump motion shown in Fig. 7(a). We exploit the symmetry of this motion and use only the degrees of freedom from the left half of the body during the learning phase; the left-half motion of the body is then replicated for the right half. The broad jump motion includes a flight phase, and we use our earlier hybrid dynamics algorithm with the six degrees of freedom corresponding to the base link excluded from the learning phase. Thus only 37 degrees of freedom are used in the learning phase: of the 62 degrees of freedom, 19 can be ignored from symmetry considerations and the six corresponding to the base link are excluded, leaving 62 − 19 − 6 = 37. After learning, when the motion is reconstructed from the latent space in the optimization phase, the 19 ignored joint values are duplicated from the corresponding symmetric joint values that are included in the learning phase.
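The degree-of-freedom bookkeeping described above can be sketched as follows. The index layout and the pairing between ignored and learned joints are hypothetical, since the text does not specify them; the sketch also copies joint values unchanged, as the text describes, whereas a particular skeleton convention might require axis-dependent sign changes that are not discussed here.

```python
import numpy as np

N_TOTAL, N_BASE, N_IGNORED = 62, 6, 19
N_LEARNED = N_TOTAL - N_IGNORED - N_BASE  # 62 - 19 - 6 = 37

# Hypothetical index layout (not given in the text):
# base link first, then the learned joints, then the ignored symmetric joints.
base_idx = np.arange(0, N_BASE)
learned_idx = np.arange(N_BASE, N_BASE + N_LEARNED)
ignored_idx = np.arange(N_BASE + N_LEARNED, N_TOTAL)

# Hypothetical pairing: each ignored joint copies one learned joint.
source_idx = learned_idx[:N_IGNORED]

def reconstruct_full_pose(q_learned, q_base):
    """Assemble a 62-dof pose from 37 learned values and 6 base-link values."""
    q = np.zeros(N_TOTAL)
    q[base_idx] = q_base
    q[learned_idx] = q_learned
    q[ignored_idx] = q[source_idx]  # duplicate the symmetric joint values
    return q

# Example with placeholder joint values.
q_full = reconstruct_full_pose(np.arange(37, dtype=float), -np.ones(6))
```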

For the GPDM learning phase, we use exactly one motion sequence for the jump motion; we emphasize again that this is one of the important advantages of our GPDM approach, which, unlike previous PCA-based approaches, does not require multiple sets of training motion data. The dimension of the latent space is set to three. For the motion optimization, the simulation time tf is chosen to be the same as the duration of the motion data; to satisfy the boundary conditions, a subset of the control points at the trajectory endpoints is excluded from the optimization variables. For the jumping motion, because the humanoid model can be in contact with the ground, unlike the case for arm motions, we need to specify friction and restitution coefficients for our dynamics simulation. The optimization parameters, including the friction and restitution coefficients, are shown in Table 5.

Fig. 7 Jump motion sequences for tube size 0.1

Table 5 Optimization parameters

Parameter                  Value
Height                     1.75 m
Mass                       63.86 kg
Motion duration            1.3 s
DOF for learning           37
Latent dimension           3
Number of control points   11
Time step                  0.001 s
Friction coefficient       0.4
Restitution coefficient    0.5

For the jumping motion optimization, the effort L in the objective function (22) is set to

$$L = -\left| P_{\text{proj}} V_{\text{feet}} \right|, \tag{33}$$

where $V_{\text{feet}}$ is the velocity of the feet, and $P_{\text{proj}}$ is a projection matrix that projects the velocity onto the ground. Integrating the projected speed $|P_{\text{proj}} V_{\text{feet}}|$ over time, we obtain the distance traversed by the feet; the negative sign in (33) means that a lower cost corresponds to a longer jump. Unlike the previous arm raising motion optimization, the broad jump motion optimization requires a torque limit constraint; clearly, the length of the jump depends on the torques generated by the actuators, so reasonable torque limits should be applied. The initial motion trajectory (i.e., the training data) shown in Fig. 7(a) is used to calculate the torque trajectory $\tau_{\text{joint}}(t)$, from which $\tau_{\max} = \max(\tau_{\text{joint}}(t))$ and $\tau_{\min} = \min(\tau_{\text{joint}}(t))$ are obtained. We therefore impose the following torque limit constraints:

$$\tau_{\min} \le \tau_{\text{joint}} \le \tau_{\max}. \tag{34}$$

Table 6 Jump distance. P. gain means performance gain. The unit of distance is m, the unit of performance gain is %, and the unit of computation time is seconds. Initial distance is 1.45 m.

Tube size   Distance   P. gain   Computation time
0.02        1.47       1.38      18.72
0.04        1.64       13.1      18.85
0.06        1.7        17.24     18.97
0.08        1.77       22.07     18.97
0.1         1.81       24.83     18.93
0.12        1.88       29.66     19.01
0.14        1.92       32.41     19.35
0.16        1.92       32.41     19.35
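The effort term (33) and the torque limits (34) can be illustrated with a short numerical sketch. It makes several assumptions that the text does not state: the vertical axis is z, so the ground projection simply drops the z component; the time integral is approximated by a rectangle rule; and the minimum and maximum torques are taken per joint over the training trajectory. All function names are hypothetical.

```python
import numpy as np

# Assumption: z is the vertical axis, so projecting onto the ground zeroes z.
P_PROJ = np.diag([1.0, 1.0, 0.0])

def effort(v_feet):
    """Eq. (33): L = -|P_proj V_feet| for a single velocity sample."""
    return -float(np.linalg.norm(P_PROJ @ np.asarray(v_feet)))

def jump_distance(v_feet_traj, dt):
    """Rectangle-rule integral of the projected speed, i.e. minus the integral of L."""
    return -sum(effort(v) for v in v_feet_traj) * dt

def torque_limits(tau_traj):
    """Per-joint minimum and maximum torque over a (time, joint) array."""
    tau_traj = np.asarray(tau_traj)
    return tau_traj.min(axis=0), tau_traj.max(axis=0)

def within_limits(tau, tau_min, tau_max):
    """Eq. (34): check tau_min <= tau <= tau_max for every joint."""
    tau = np.asarray(tau)
    return bool(np.all((tau_min <= tau) & (tau <= tau_max)))

# Made-up example: constant foot velocity for 1.3 s at a 0.001 s time step.
v_traj = [[1.0, 0.0, 2.0]] * 1300
distance = jump_distance(v_traj, dt=0.001)

# Made-up two-joint torque trajectory with two time samples.
tau_min, tau_max = torque_limits([[1.0, -2.0], [3.0, 0.0]])
ok = within_limits([2.0, -1.0], tau_min, tau_max)
too_large = within_limits([4.0, -1.0], tau_min, tau_max)
```

In this example only the horizontal component of the velocity contributes to the distance; the vertical component is removed by the projection.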

The optimization results are shown in Fig. 7. From the results of Fig. 7(b) it can be seen that the body falls over during landing. In principle, one could decrease the variance tube size δ so that the optimized motion more closely resembles the human training motion data, but instead we impose an appropriate balancing constraint at landing. The dynamic balancing constraints are expressed in terms of the zero-moment point (ZMP) and center of mass (COM). To maintain balance during landing, we require that the ZMP and projected COM lie in the interior of the support polygon S, which is expressed as a set of linear inequalities of the form [20]

$$A_S\, p_{\text{zmp}}(q, \dot{q}, \ddot{q}) + b_S \le 0, \tag{35}$$

$$A_S\, p_{\text{com}}(q) + b_S \le 0, \tag{36}$$

where the matrix $A_S$ and vector $b_S$ specify the boundary of the support polygon, and $p_{\text{zmp}}$ and $p_{\text{com}}$ represent the ZMP and projected COM positions. The torque limit and the balancing constraints are imposed as soft constraints in this motion optimization. Thus, if a constraint $C(P, t) \le 0$ is violated, a squared penalty term $C(P, t)^2$ is added to the objective function with a large weight.
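The soft-constraint treatment of inequalities such as (35) and (36) can be sketched as follows. The square support polygon and the weight value are made up for illustration; the text only says that the weight is large.

```python
import numpy as np

def polygon_constraint(A_S, b_S, p):
    """Values of A_S p + b_S; p lies in the support polygon when all are <= 0."""
    return A_S @ np.asarray(p) + b_S

def soft_penalty(c, weight=1e4):
    """Weighted squared penalty on violated entries only (those with c > 0).

    The default weight is a hypothetical placeholder for "a large weight".
    """
    violation = np.maximum(c, 0.0)
    return weight * float(np.sum(violation ** 2))

# Hypothetical support polygon: the square |x| <= 0.1, |y| <= 0.1.
A_S = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b_S = np.full(4, -0.1)

inside = soft_penalty(polygon_constraint(A_S, b_S, [0.0, 0.0]))
outside = soft_penalty(polygon_constraint(A_S, b_S, [0.2, 0.0]))
```

A point inside the polygon contributes no penalty, while a point 0.1 outside one edge contributes weight × 0.1² to the objective.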

The optimized motions are shown in Fig. 7(c), whereas the numerical results are summarized in Table 6. The table shows that performance gains improve from 1.38 % to 32.41 % as the tube size increases from 0.02 to 0.16. The optimized jump trajectory $q(t)$ differs from the original trajectory, and this difference changes the mass term $M(q(t))$ and the Coriolis term $C(q(t), \dot{q}(t))$ in the dynamics equations (3). Thus, even torques smaller than those of the original motion can generate a longer jump distance as a result of optimization.

Since the performance is saturated at a tube size of 0.14, one would expect that torque and/or balancing constraints will not be satisfied for larger tube sizes. The table also shows that the resulting increase in computation times is very slight; although the tube size is increased eight-fold, the computation time increases by only 0.63 seconds, from 18.72 to 19.35 seconds.
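The gain column of Table 6 is consistent with reading performance gain as the relative improvement over the initial distance of 1.45 m. This definition is inferred from the table values rather than quoted from the text; the short check below reproduces the column and the computation-time figures under that assumption.

```python
INITIAL_DISTANCE = 1.45  # m, from the notes to Table 6

# (tube size, distance in m, computation time in s), copied from Table 6.
TABLE_6 = [
    (0.02, 1.47, 18.72),
    (0.04, 1.64, 18.85),
    (0.06, 1.70, 18.97),
    (0.08, 1.77, 18.97),
    (0.10, 1.81, 18.93),
    (0.12, 1.88, 19.01),
    (0.14, 1.92, 19.35),
    (0.16, 1.92, 19.35),
]

def performance_gain(distance, initial=INITIAL_DISTANCE):
    """Relative improvement over the initial distance, in percent (assumed definition)."""
    return 100.0 * (distance - initial) / initial

gains = [round(performance_gain(d), 2) for _, d, _ in TABLE_6]

# Change in tube size and computation time between the first and last rows.
tube_ratio = round(TABLE_6[-1][0] / TABLE_6[0][0], 6)
time_increase = round(TABLE_6[-1][2] - TABLE_6[0][2], 2)
```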

5 Conclusions

In this paper we have examined the extent to which the Gaussian process dynamical model (GPDM) can be used as a method of dimension reduction in motion optimization problems. Compared to other more classical dimension reduction methods that have been used for motion optimization purposes, for example, principal component analysis, one of the main advantages of using GPDM is that only a small amount of training motion data is required during the learning phase (even a single trial is sufficient in most cases). We introduce the notion of variance tubes as an intuitive and efficient means of specifying the search space in the latent space. The variance tube can be adjusted through a single parameter δ, with smaller values of δ leading to optimized trajectories that more closely resemble the original training motion data.

To evaluate the performance of our algorithm, we consider a seven-dof arm and a 37-dof symmetric humanoid model. Because the optimization is performed directly in the lower-dimensional latent space, the problem dimension is greatly reduced; motions for a typical system with a large number of degrees of freedom can be reasonably modeled and optimized in a latent space of dimension as low as three. Our numerical studies with humanoid motion capture data demonstrate that considerable improvements can be achieved over humanoid movements that exactly replicate the human training motion data. GPDM-based optimization offers a highly promising, computationally efficient framework for generating optimal motions for high-dimensional multibody dynamic systems. Our current efforts are directed at extending the algorithm to allow for more complex task constraints.

References

1. Lim, B., Ra, S., Park, F.C.: Movement primitives, principal component analysis, and the efficient generation of natural motions. In: IEEE International Conference on Robotics and Automation, pp. 4641–4646 (2005)

2. Roweis, S., Saul, L.: Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000)

3. Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319–2323 (2000)

4. Roweis, S., Saul, L., Hinton, G.E.: Global coordination of local linear models. In: Advances in Neural Information Processing Systems, vol. 14, pp. 889–896 (2001)

5. Teh, W.Y., Roweis, S.: Automatic alignment of local representations. In: Advances in Neural Information Processing Systems, vol. 15, pp. 841–848. MIT Press, Cambridge (2003)

6. Lawrence, N.D.: Gaussian process latent variable models for visualization of high dimensional data. Adv. Neural Inf. Process. Syst. 16, 329–336 (2004)

7. Grochow, K., Martin, S.L., Hertzmann, A., Popovic, Z.: Style-based inverse kinematics. In: ACM Transactions on Graphics (TOG), vol. 23, pp. 522–531. ACM, New York (2004)

8. Yamane, K., Ariki, Y., Hodgins, J.: Animating non-humanoid characters with human motion data. In: ACM SIGGRAPH Symposium on Computer Animation, pp. 169–178. ACM, New York (2010)

9. Kang, H., Park, F.C.: Humanoid motion optimization via nonlinear dimension reduction. In: IEEE International Conference on Robotics and Automation, pp. 1444–1449 (2012)

10. Wang, J.M., Fleet, D.J., Hertzmann, A.: Gaussian process dynamical models for human motion. IEEE Trans. Pattern Anal. Mach. Intell. 30(2), 283–298 (2008)

11. Park, F.C., Bobrow, J.E., Ploen, S.R.: A Lie group formulation of robot dynamics. Int. J. Robot. Res. 14(6), 609–618 (1995)

12. Featherstone, R.: Robot Dynamics Algorithms. Kluwer, Boston (1987)

13. Murray, R.M., Li, Z., Sastry, S.S.: A Mathematical Introduction to Robotic Manipulation. CRC Press, Boca Raton (1994)

14. Babic, J., Lim, B., Omrcen, D., Lenarcic, J., Park, F.C.: A biarticulated robotic leg for jumping movements: theory and experiments. J. Mech. Robot. 1(1), 011013 (2009)

15. Bobrow, J.E., Martin, B., Sohl, G., Wang, E.C., Kim, J., Park, F.C.: Optimal robot motions for physical criteria. J. Robot. Syst. 18(12), 785–795 (2001)

16. Lee, S.H., Kim, J.G., Kim, M.S., Park, F.C., Bobrow, J.E.: Newton-type algorithms for dynamic-based robot motion optimization. IEEE Trans. Robot. 21(4), 657–667 (2005)

17. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)

18. Webb, D.J., Berg, J.V.D.: Kinodynamic RRT*: asymptotically optimal motion planning for robots with linear dynamics. In: IEEE International Conference on Robotics and Automation, pp. 5054–5061 (2013)

19. Karaman, S., Frazzoli, E.: Sampling-based algorithms for optimal motion planning. Int. J. Robot. Res. 30(7), 846–894 (2011)

20. Park, J., Han, J., Park, F.C.: Convex optimization algorithms for active balancing of humanoid robots. IEEE Trans. Robot. 23(4), 817–822 (2007)
