A mathematical model of the adaptive control of human arm motions



Abstract. This paper discusses similarities between models of adaptive motor control suggested by recent experiments with human and animal subjects, and the structure of a new control law derived mathematically from nonlinear stability theory. In both models, the control actions required to track a specified trajectory are adaptively assembled from a large collection of simple computational elements. By adaptively recombining these elements, the controllers develop complex internal models which are used to compensate for the effects of externally imposed forces or changes in the physical properties of the system. On a motor learning task involving planar, multi-joint arm motions, the simulated performance of the mathematical model is shown to be qualitatively similar to observed human performance, suggesting that the model captures some of the interesting features of the dynamics of low-level motor adaptation.

    1 Introduction

Cybernetics, as envisioned by Norbert Wiener [33], is the unified study of the information and control mechanisms governing biological and technological systems. In the spirit of this unified vision, we explore below possible bridges between recent models of the adaptive control of multijoint arm motions developed separately in robotics and neuroscience, comparing both the underlying structure as well as the observable performance of the different models.

In neuroscience, new experiments in motor learning [28] present convincing evidence that humans develop internal models of the structure of any external forces which alter the normal dynamic characteristics of their arm motions. These adaptive models are then used to generate compensating torques which allow the arm to follow an invariant reference trajectory to a specified target. As a possible mechanism for this adaptation, it has been conjectured that the internal models, and the compensating torques they generate, are `pieced together' from a collection of motor computational elements, representing abstractions of the actions of individual muscles and their neural control circuitry [18, 28]. Motor learning in this context can thus be viewed as a method for continuously adjusting the contribution of each computational element so as to offset the effects of new environmentally imposed forces.

On the other hand, working from first principles within the framework of nonlinear stability theory, a new class of robot control algorithms has been developed which closely mirrors this biological model [27]. Formalizing and extending previous biologically inspired manipulator control algorithms, e.g. [9, 14-16], these new controllers allow simultaneous learning and control of arbitrary multijoint motions with guaranteed stability and convergence properties.

This confluence of formal mathematics and observed neurobiology is quite interesting and merits further exploration. In this paper, we present an initial evaluation of the ability of the new robot control algorithm to also provide a model of the adaptation of human multijoint arm motions. Specifically, we compare the performance of the new algorithm on a simulation of one of the learning tasks used in [28] to the actual performance of human subjects on the same task. As shown below, the new model not only reproduces many of the measured end results of this motor learning task, but also captures a substantial component of the actual time evolution of the adaptation observed in human subjects. The qualitative correlation between the simulated and measured adaptive performance suggests that the proposed algorithm may provide a model for some of the interesting features of low-level motor adaptation.

Biol. Cybern. 80, 369-382 (1999)

A mathematical model of the adaptive control of human arm motions

Robert M. Sanner, Makiko Kosha

University of Maryland, Space Systems Laboratory, College Park, MD 20742, USA

Received: 20 September 1994 / Accepted in revised form: 18 November 1998

Correspondence to: R.M. Sanner (e-mail: [email protected])

2 Models of arm dynamics and control mechanisms

2.1 Arm dynamics and plausible controller structures

To a good approximation, human arm dynamics can be modeled as the motion of an open kinematic chain of


rigid links, attached together through revolute joints, with control torques applied about each joint [4, 30]. Within the limits of flexure of each joint, human arm motions can thus be modeled by the same equations used to model revolute robot manipulators, i.e.,

$$H(q)\ddot q + F(q,\dot q) + G(q) + E(q,\dot q) = \tau \qquad (1)$$

where $q \in \mathbb{R}^n$ is the vector of joint angles of the arm. The matrix $H(q) \in \mathbb{R}^{n\times n}$ is a symmetric, uniformly positive definite inertia matrix, the vector $F(q,\dot q) \in \mathbb{R}^n$ contains the Coriolis and centripetal torques, the vector $G(q) \in \mathbb{R}^n$ contains the gravitational torques (and hence is identically zero for motions externally constrained to the horizontal plane), and finally the vector $E(q,\dot q) \in \mathbb{R}^n$ represents any torques applied to the arm through interactions with its environment. The forcing input $\tau \in \mathbb{R}^n$ represents the control torques applied at each arm joint.

Given the recent progress in the development of adaptive, trajectory following, robotic control laws [23, 30, 31], it is natural to wonder whether in fact human arm control algorithms have a similar structure. Each adaptive robot control algorithm breaks down into two components. The first of these is a fixed one which, given perfect information about the dynamics of the robot and its environment, counteracts the natural dynamic tendency of the robot and ensures that the closed-loop arm motions are asymptotically attracted to a task-dependent desired trajectory. The second, adaptive component recursively develops an estimated model of the governing dynamics, which is then used in place of the (unknown) true model in the fixed component. The desired trajectory is present as a signal exogenous to the arm control loop, depending upon the nature of the task at hand; the controller uses this signal together with measurements of the position and velocity of each joint and the current estimated model to generate the necessary torques.

For such an algorithm to be plausible as a model of human arm motions, there must first be evidence in humans of a task-dependent desired trajectory which the arm attempts to follow. Experimental and theoretical evidence supporting this has been reported in [6, 11], which show that, in the absence of other constraints, for planar arm motions humans appear to make rest-to-rest, point-to-point motions in a manner which minimizes the derivative of hand acceleration. This is the so-called `minimum jerk' trajectory through the arm's workspace. The desired arm trajectory is thus a function only of the end points of the required motion, and the organization of motion is decoupled from the execution of motion, as with an adaptive robot controller.
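For rest-to-rest, point-to-point motions, the minimum jerk trajectory has a well-known closed form: each hand coordinate follows $x(t) = x_0 + (x_f - x_0)(10\sigma^3 - 15\sigma^4 + 6\sigma^5)$ with $\sigma = t/T$. A minimal sketch (function name and duration are our illustrative choices, not from the paper):

```python
def min_jerk(x0, xf, T, t):
    """Minimum-jerk position and velocity for a rest-to-rest motion of duration T."""
    sig = t / T
    s = 10*sig**3 - 15*sig**4 + 6*sig**5            # normalized position profile
    sd = (30*sig**2 - 60*sig**3 + 30*sig**4) / T    # normalized velocity profile
    return x0 + (xf - x0)*s, (xf - x0)*sd

# The velocity profile is bell-shaped: zero at both endpoints, peaking at mid-motion.
```

Position and velocity are zero-matched at both endpoints, which is what makes the profile a function only of the motion's end points.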

A second criterion for plausibility is evidence that humans adaptively form internal representations of their own and any environmental dynamics, and that these representations are used to ensure that arm motions follow the desired, i.e., minimum jerk, trajectory. Using a series of innovative experiments, the study [28] presented convincing evidence that both of these properties of human arm control are present, at least for two degrees of freedom, planar arm motions. By measuring how humans learned to compensate for an externally applied disturbance to their arm motions, this study concluded that the end-effect of human sensory-motor learning in a new dynamic environment is an internal representation, in joint coordinates, of the forces applied by the new environment. This representation is then used to generate compensating torques which allow the arm to follow the minimum jerk trajectory.

On the basis of these and prior experiments, [28] proposed a control law which may describe the structure of the motor control strategy employed in planar arm motions. The following section describes their control law mathematically and contrasts it with the structure of the control laws used for adaptive robot manipulators.

    2.2 Arm control laws: mathematical descriptions

To explain the observed performance of their human subjects, the following trajectory tracking control law was proposed in [28]:

$$\tau(q,\dot q,t) = \hat H(q)\ddot q_m + \hat F(q,\dot q) + \hat E(q,\dot q) - K_D\dot{\tilde q} - K_P\tilde q \qquad (2)$$

where $K_D$ and $K_P$ are constant, positive definite matrices, $\tilde q = q - q_m$, and $q_m(t)$ is the model (minimum jerk) trajectory the joint angles $q$ are required to follow. The terms $\hat H$, $\hat F$, and $\hat E$ represent learned estimates of the corresponding terms which appear in the equations of motion (1). This control law thus consists of constant linear feedback terms together with learned estimates of the nonlinear components of (1).

On the other hand, working directly from the mathematical structure of (1), an alternative, equally plausible controller structure can be identified which, as will be shown below, is directly amenable to stable, continuous, adaptive operation. To understand the structure of this algorithm, first define a new measure of the tracking error

$$s = \left(\frac{d}{dt} + \Lambda\right)\tilde q = \dot{\tilde q} + \Lambda\tilde q \qquad (3)$$

where $\Lambda$ is a constant, positive definite matrix. Note that this algebraic definition of the new error metric $s$ also has a dynamic interpretation: the actual tracking errors $\tilde q$ are the output of an exponentially stable linear filter driven by $s$. Thus, a controller capable of maintaining the condition $s = 0$ will produce exponential convergence of $\tilde q$ to zero, and hence exponential convergence of the actual joint trajectories to the desired trajectory $q_m$.
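The dynamic interpretation of (3) is easy to check numerically: if a controller holds $s = 0$, the error obeys $\dot{\tilde q} = -\Lambda\tilde q$ and decays exponentially. A minimal one-dimensional sketch (the numerical values are illustrative, not from the paper):

```python
import math

lam = 6.5            # illustrative scalar stand-in for Lambda (1/s)
dt, T = 1e-4, 1.0
q_tilde = 0.2        # initial tracking error (rad)
for _ in range(int(T / dt)):
    q_tilde += dt * (-lam * q_tilde)   # q~' = -Lambda q~  when s = 0

# After time T the error should match the analytic solution q~(0) exp(-lam T).
analytic = 0.2 * math.exp(-lam * T)
```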

Use of this metric allows the development of control laws for (1) which directly exploit the natural passivity (conservation of mechanical energy) property of these systems [31]. Consider the following control law


$$\tau(q,\dot q,t) = \hat H(q)\ddot q_r + \hat C(q,\dot q)\dot q_r + \hat E(q,\dot q) - K_D(\dot{\tilde q} + \Lambda\tilde q) \qquad (4)$$

where $\dot q_r = \dot q_m - \Lambda\tilde q$, and both $K_D$ and $\Lambda$ are positive definite matrices, with $\Lambda$ possibly time varying. For fixed feedback gains, this is very similar to (2), but substituting $\ddot q_r$ for $\ddot q_m$, and utilizing the known (but nonunique) factorization

$$F(q,\dot q) = C(q,\dot q)\,\dot q$$

Denoting by $\tau_o(q,\dot q,t)$ the control law obtained using the actual matrices $H$, $C$, and the actual external forces $E$, the resulting closed-loop equations of motion can be written, after some manipulation, as

$$H\dot s + K_D s + C s = \tilde\tau(q,\dot q,t) \qquad (5)$$

where $\tilde\tau = \tau - \tau_o$. With these dynamics, the uniformly positive definite energy function $V(s,t) = s^T H(q)s/2$ has a time derivative

$$\dot V(s,t) = -s^T K_D s + s^T\tilde\tau + s^T(\dot H - 2C)s/2$$

Conservation of energy for the mechanical system (1) identifies a specific $C$ for which the matrix $\dot H - 2C$ is skew symmetric, thus rendering the last term above identically zero. The energy function thus satisfies the dissipation inequality [31]

$$\dot V(s,t) \le -k_D\|s\|^2 + \|s\|\,\|\tilde\tau\|$$

where $k_D$ is a uniform lower bound on the eigenvalues of $K_D$. The closed-loop dynamics hence describe a passive input-output relation between $s$ and $\tilde\tau$, a fact which is instrumental in the development of stable, on-line adaptation mechanisms, such as those examined below. Moreover, if $\tilde\tau = 0$, that is, if `perfect' knowledge is incorporated into the controller, then the energy function is actually a Lyapunov function for the system, showing that $s$, and hence $\tilde q$, decay exponentially to zero from any initial conditions using $\tau = \tau_o$ [30].
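The skew symmetry of $\dot H - 2C$ can be verified directly for the standard two-link planar arm model with the Christoffel-symbol choice of $C$. The inertia parameters below are illustrative, not those of [28]:

```python
import numpy as np

a1, a2, a3 = 0.4, 0.12, 0.11     # illustrative lumped inertia parameters
q2, dq1, dq2 = 0.7, 1.3, -0.5    # arbitrary elbow angle and joint velocities
s2 = np.sin(q2)

# Inertia matrix of the two-link planar arm and its time derivative
H = np.array([[a1 + 2*a3*np.cos(q2), a2 + a3*np.cos(q2)],
              [a2 + a3*np.cos(q2),   a2]])
Hdot = np.array([[-2*a3*s2*dq2, -a3*s2*dq2],
                 [-a3*s2*dq2,    0.0]])

# Christoffel-symbol Coriolis matrix, so that F(q, dq) = C(q, dq) dq
C = a3*s2*np.array([[-dq2, -(dq1 + dq2)],
                    [ dq1,  0.0]])

S = Hdot - 2*C   # should be skew symmetric: S + S^T = 0
```

Any choice of velocities and configuration gives the same result, which is the energy-conservation property exploited in the dissipation inequality above.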

    3 Adaptive arm control and `neural' networks

How might the control law (4) be implemented biologically and, more importantly, how do the components of the controller evolve in response to changing environmental force patterns or changes in the physical properties of the arm? In the following section, this question is considered from both a mathematical and a biological viewpoint, and the two vantages are shown to suggest quite similar solutions.

3.1 Adaptive robot controllers and possible biological analogs

Adaptive robot applications exploit a factorization of the nonlinear components

$$\tau_{nl}(q,\dot q,t) = H(q)\ddot q_r + C(q,\dot q)\dot q_r + E(q,\dot q) = Y(q,\dot q,t)\,a \qquad (6)$$

where prior knowledge of the exact structure of the equations of motion is used to separate the (assumed known) nonlinear functions comprising the elements of $H$, $C$, and $E$ from the (unknown but constant) physical parameters in the vector $a$. Indeed, with this factorization, the control law

$$\tau(q,\dot q,t) = -K_D s + Y(q,\dot q,t)\,\hat a \qquad (7)$$

which uses estimates of the physical parameters in place of the true values, coupled with the continuous adaptation law

$$\dot{\hat a} = -\Gamma\, Y^T(q,\dot q,t)\, s \qquad (8)$$

where $\Gamma$ is a symmetric, positive definite matrix controlling the rate of adaptation, results in globally stable operation and asymptotically perfect tracking of any sufficiently smooth desired trajectory [30].
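For a single degree of freedom with unknown inertia $m$ and no Coriolis, gravity, or environmental terms, (6)-(8) reduce to $Y = \ddot q_r$ and a single adapted parameter $\hat m$. A minimal Euler-integration sketch of this special case (gains, trajectory, and step size are our illustrative choices):

```python
import math

m = 2.0                      # true inertia, unknown to the controller
m_hat = 0.0                  # parameter estimate, adapted online
KD, lam, gamma = 5.0, 5.0, 10.0
dt, T = 1e-3, 20.0

q = dq = 0.0
errs = []
for i in range(int(T / dt)):
    t = i * dt
    qm, dqm, ddqm = math.sin(t), math.cos(t), -math.sin(t)  # desired trajectory
    e, de = q - qm, dq - dqm
    s = de + lam * e                   # error metric (3)
    ddqr = ddqm - lam * de             # reference acceleration
    tau = -KD * s + m_hat * ddqr       # control law (7) with Y = ddqr
    m_hat += dt * (-gamma * ddqr * s)  # adaptation law (8)
    ddq = tau / m                      # plant: m q'' = tau
    q += dt * dq
    dq += dt * ddq
    errs.append(abs(e))

early = sum(errs[:2000]) / 2000   # mean |error| over the first 2 s
late = sum(errs[-2000:]) / 2000   # mean |error| over the last 2 s
```

The Lyapunov argument above guarantees boundedness and asymptotic tracking; in this simulation the tracking error shrinks substantially as the estimate converges.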

Although this solution is mathematically elegant, it seems unlikely that the human nervous system is specifically hardwired with a particular set of nonlinear functions to be used in motion control laws. Moreover, since generally the forces, $E$, imposed by the environment will be quite complex and variable in structure, it is not apparent how such a simple parameterization could adequately capture the entire possible range of environments which might be encountered.

From a biological perspective, the study [28] suggests, similar to [3, 18], that internal models of the nonlinear terms, and the compensating torques they generate, are `pieced together' from elementary structures collectively called motor computational elements. These structures represent abstractions of the low-level biological components of motor control, whose contributions at any time depend upon the instantaneous configuration $q$ and instantaneous velocity $\dot q$ of the arm. Moreover, experimental evidence suggests that these computational elements are additive, so that simultaneous stimulation of two motor control circuits results in a (time-varying) output torque which is the sum of the torques which would result from separate stimulation of each circuit [8, 20].

These observations suggest that the structure of the adaptive nonlinear component of human control of arm motions can be represented using a superposition of the form

$$\hat\tau_{nl,i}(q,\dot q,t) = \sum_{k=1}^{N} \hat a_{i,k}(t)\, u_k(q,\dot q,t) \qquad (9)$$

where $\hat\tau_{nl,i}(q,\dot q,t)$ is an (adaptive) estimate of the nonlinear torque required about the $i$th joint given the current state of the limb and the desired motion. Each $u_k$ represents a (configuration- and velocity-dependent) torque produced by a motor computational element, and the $\hat a_{i,k}(t)$ are weights which represent the relative strength of each elementary torque at time $t$. Motor learning in this context can thus be viewed as a method for re-weighting the elementary torques so as to offset the effects of new environmentally imposed forces or changes in arm physical properties.


A possible mechanism for learning these relative weightings is suggested by the Hebbian model of neuroplasticity [10], in which a synaptic strength is modified according to the temporal correlation of the firing rates of the neurons it joins. Since the end product of the motor learning tasks considered herein is achieved when the arm follows the desired trajectory, a natural first approximation to the dynamics of the learning process which incorporates the above ideas is:

$$\dot{\hat a}_{i,k} = -\gamma\, u_k(q,\dot q,t)\, s_i$$

where $\gamma$ is a constant which controls the rate of learning. In this scheme, each weight $\hat a_{i,k}$ evolves in time according to the correlation of the elementary torque output, $u_k$, and the tracking error measure $s_i$.

Significantly, the two previous equations precisely express the structure of the adaptive component of a class of recently developed robot control algorithms [27], which join the stable `neural' control algorithms in [26] with the adaptive robot algorithm in [30]. The following two sections make this connection formally, exploring in detail the structure of this algorithm and its relation to the motor computational element conjecture.

    3.2 Modeling the motor computational elements

To develop a firm mathematical basis for the ideas developed in the preceding section, consider the following alternative representation of the nonlinear component of the required control input:

$$\tau_{nl}(q,\dot q,t) = H(q)\ddot q_r + C(q,\dot q)\dot q_r + E(q,\dot q) = M(q,\dot q)\,\mu(q,\dot q,t)$$

or, in component form,

$$\tau_{nl,i}(q,\dot q,t) = \sum_{j=1}^{2n+1} w_{i,j}(q,\dot q)\,\mu_j(q,\dot q,t)$$

where $\mu_l = \ddot q_{r,l}$ and $\mu_{l+n} = \dot q_{r,l}$ for $l = 1,\ldots,n$, and $\mu_{2n+1} = 1$.

Unlike expansion (6), which decomposes $\tau_{nl}$ into a matrix of known functions, $Y$, multiplying a vector of unknown constants $a$, this expansion decomposes $\tau_{nl}$ into a matrix of (potentially) unknown functions $M$, multiplying a vector of known signals $\mu$. Without the prior information assumed above, an adaptive controller capable of producing the required control input must learn each of the unknown component functions, $w_{i,j}(q,\dot q)$, as opposed to the conventional model, which must learn only the unknown constants, $a$.

The motor computational element conjecture suggests that approximations to the necessary functions are `pieced together' from the simpler functions $u_k$. Significantly, abstract models of biological computation strategies have been shown to have precisely this function approximation property [5, 7, 12, 21], provided each $w_{i,j}(q,\dot q)$ is continuous in $q$ and $\dot q$. Indeed, if this is the case, for many different computational models there exists an expansion which satisfies, for any $(q,\dot q)$ contained in a prespecified compact set $A \subset \mathbb{R}^{2n}$,

$$\left| w_{i,j}(q,\dot q) - \sum_{k=1}^{N} c_{i,j,k}\, g_k(q,\dot q,\xi_k) \right| \le \epsilon_{i,j}$$

for any chosen accuracy $\epsilon_{i,j}$. This expansion approximates each component of the matrix $M$ using a single hidden layer `neural' network design with $(q,\dot q)$ as the network inputs; here $g_k$ is the model of the signal processing performed by a single `neural' element or node, $\xi_k$ are the `input weights' associated with node $k$, and $c_{i,j,k}$ is the output weight associated with that node.

This theoretical result has been further strengthened with the development of constructive algorithms allowing a precise specification of $N$ and the $\xi_k$ based upon estimates of the smoothness of the functions being approximated [26]. For example, in radial basis function models, i.e., models for which $g_k(\chi,\xi_k) = g(\|\chi - \xi_k\|/\sigma)$ for some positive scaling parameter $\sigma$, the parameters $\xi_k$ can be chosen to encode a uniform mesh over the set $A$ whose spacing is determined by bounds on the significant frequency content of the Fourier transform of the functions being approximated. This analysis thus leaves only the specific output weights, $c_{i,j,k}$, to be learned in order to accurately approximate the unknown functions $w_{i,j}$.
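This construction can be illustrated in one dimension: Gaussian nodes on a uniform mesh over $A = [-\pi,\pi]$, with output weights fit to a smooth target function. The target $w(x) = \sin x + 0.3\cos 3x$, the mesh spacing, and the node width are our illustrative choices:

```python
import numpy as np

def w(x):                      # an illustrative smooth function to approximate
    return np.sin(x) + 0.3*np.cos(3*x)

spacing = 0.4                                    # uniform mesh spacing over A
centers = np.arange(-np.pi, np.pi + spacing, spacing)
sigma = spacing                                  # node width tied to mesh spacing

def rbf_matrix(x):
    """Node activations g(|x - xi_k| / sigma) for all mesh centers xi_k."""
    return np.exp(-((x[:, None] - centers[None, :]) / sigma)**2)

# Fit the output weights by least squares on a dense sample of A
x_fit = np.linspace(-np.pi, np.pi, 400)
c, *_ = np.linalg.lstsq(rbf_matrix(x_fit), w(x_fit), rcond=None)

# Worst-case error on the interior of the mesh (away from edge effects)
x_test = np.linspace(-2.5, 2.5, 300)
err = np.max(np.abs(rbf_matrix(x_test) @ c - w(x_test)))
```

With mesh spacing well below the target's shortest significant wavelength, the interior approximation error is small, in line with the frequency-content argument cited above.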

Since the size of the required networks rises rapidly with the number of independent variables in the functions to be learned, a practical implementation will maximally exploit prior information to reduce the network size. To this end, in [27] it is noted that the term $C(q,\dot q)\dot q_r$ may be further decomposed as

$$C(q,\dot q)\,\dot q_r = C_1(q)\,(\dot q \otimes \dot q_r)$$

where $C_1(q) \in \mathbb{R}^{n\times n^2}$, and $(\dot q \otimes \dot q_r) \in \mathbb{R}^{n^2}$ contains all possible combinations $\dot q_i\dot q_{r,j}$, for $i,j = 1,\ldots,n$. If a similar decomposition of $E$ is assumed, for example $E(q,\dot q) = E_1(q)\,p(\dot q)$, where now $E_1(q) \in \mathbb{R}^{n\times n}$ and $p(\dot q) \in \mathbb{R}^n$ represents an assumed known $\dot q$ dependence, the number of independent variables in each unknown function is reduced by a factor of 2. Indeed, the nonlinear terms can under these conditions be decomposed as

$$\tau_{nl}(q,\dot q,t) = N(q)\,\nu(q,\dot q,t)$$

where $\nu \in \mathbb{R}^{n^2+2n}$ now contains the elements of $\ddot q_r$, $(\dot q \otimes \dot q_r)$, and $p(\dot q)$.
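For the two-link planar arm this decomposition can be written out explicitly: with $C$ in the standard Christoffel form (the lumped inertia parameter $a_3$ below is illustrative), $C(q,\dot q)\dot q_r$ equals $C_1(q)$ acting on the four velocity products $\dot q_i\dot q_{r,j}$:

```python
import numpy as np

a3 = 0.11                      # illustrative lumped inertia parameter
q2 = 0.7                       # elbow angle
dq = np.array([1.3, -0.5])     # joint velocities
dqr = np.array([0.8, 0.2])     # reference velocities
s2 = np.sin(q2)

# Standard Coriolis matrix for the two-link planar arm
C = a3*s2*np.array([[-dq[1], -(dq[0] + dq[1])],
                    [ dq[0],  0.0]])

# C1(q) in R^{2x4}: depends only on the configuration (through sin q2),
# and acts on the vector of products (dq_i * dqr_j)
C1 = a3*s2*np.array([[0.0, -1.0, -1.0, -1.0],
                     [1.0,  0.0,  0.0,  0.0]])
v = np.outer(dq, dqr).ravel()  # [dq1*dqr1, dq1*dqr2, dq2*dqr1, dq2*dqr2]

lhs = C @ dqr   # original form
rhs = C1 @ v    # decomposed form
```

The unknown-function part, $C_1(q)$, now depends on the joint angles alone, which is exactly the factor-of-2 reduction in independent variables noted in the text.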

Thus, assuming the functions required for each component of $\tau_{nl}$ are sufficiently smooth, a network approximation of the form

$$\tau_{N,i}(q,\dot q,t) = \sum_{j=1}^{n^2+2n} \sum_{k=1}^{N} c_{i,j,k}\, g_k(q,\xi_k)\, \nu_j(q,\dot q,t) \qquad (10)$$

can accurately approximate the required nonlinear control input for appropriate values of the network parameters $N$, $\xi_k$, and $c_{i,j,k}$. Indeed, defining $d_\tau = \tau_{nl} - \tau_N$, one has


$$|d_{\tau,i}(q,\dot q,t)| \le \sum_{j=1}^{n^2+2n} \epsilon_{i,j}\, |\nu_j(q,\dot q,t)|$$

for any inputs $(q,\dot q) \in A$, where each $\epsilon_{i,j}$ is now the worst case network approximation error to the components of $N$. Since $A$ is compact, and since the smooth, minimum jerk cartesian paths produce correspondingly smooth desired joint trajectories, $\|d_\tau\|$ is uniformly bounded provided $(q,\dot q)$ can be guaranteed to remain in $A$. Furthermore, this bound can be made arbitrarily small by increasing the size of the approximating network [27].

There remains to specify the set $A$, defining the region of the arm's state space on which the networks must have good approximating abilities. For planar arm motions, note that the joint variables can be mathematically constrained to lie in the compact set $[-\pi,\pi]^n$, and most are physically constrained to lie in a strict subset of this. Moreover, since there are physical limits on the torques which can be exerted by the muscles, and physical limits on the possible range of joint angles, there are corresponding limits on the range of joint velocities which can be produced. The mechanical properties of the arm thus naturally determine the `nominal operating range', $A$, in which the motor controller would need to develop an accurate model of the arm's dynamics.

The expansion (10) utilizes ``neural'' network theory, the motor computational element conjecture of [28], and established theories of adaptive robot control to develop a representation of each weighted computational element as

$$\hat a_{i,k}(t)\, u_k(q,\dot q,t) = g_k(q,\xi_k) \sum_{j=1}^{n^2+2n} \hat c_{i,j,k}(t)\, \nu_j(q,\dot q,t)$$

However, representation (10) allows a more finely grained approach to the control problem than expansion (9), and one more closely tied to the natural dynamics of the system, by allowing independent adjustment of each of the $\hat c_{i,j,k}$ to determine the required input. Moreover, the above construction is by no means unique: instead of (10), which uses a single network to approximate the components of $\tau_N$, one could easily imagine different networks approximating each of these terms. This strategy would produce a family of different $g_k$ defining each motor control element, corresponding to the different nodes used in each approximating network.

Note that, while the $N$ decomposition used above allows a significant reduction in the number of network nodes and weights required to achieve a specified tracking accuracy [27], there is no biological significance claimed for this simplification. It is done purely to minimize the calculations required in the simulations below, and a more general implementation could of course use the original $M$ parameterization and basis functions $g_k$ with both joint angles and joint velocities as inputs.

Additionally, the analysis in this section shows merely that it is possible to construct the necessary control inputs from a collection of very simple elements; it says nothing about the actual structure of the elements used in human motor control. Further experimentation is needed to determine a precise description of these elements in humans, and to thus permit development of a truly accurate model of human motion control. Significantly, however, the algorithm in [27] requires only that the collection $\{u_k\}$ be mathematically `dense' in the class of possible functions needed in the control law. Provided the elements employed can satisfy the above approximation conditions, the resulting controller, coupled with the adaptation mechanism developed in the following section, will asymptotically track any smooth desired trajectory. This relative insensitivity to the specific structure of the approximating elements suggests that high quality simulation models can be developed without precise knowledge of human physiology, an idea which will be explored more fully in Sect. 4.

    3.3 Adapting the motor computational elements

To understand how the representations developed above might be adaptively used, a comparison of the `neural' expansion (10) with the adaptive robot algorithm (7)-(8) suggests the control law

$$\tau(q,\dot q,t) = -K_D s + \hat\tau_N(q,\dot q,t) \qquad (11)$$

where

$$\hat\tau_{N,i}(q,\dot q,t) = \sum_{j=1}^{n^2+2n} \sum_{k=1}^{N} \hat c_{i,j,k}(t)\, g_k(q,\xi_k)\, \nu_j(q,\dot q,t) \qquad (12)$$

which again uses estimates of the required parameters in place of the (assumed unknown) actual values. Use of (11) with dynamics (1) produces the closed-loop dynamics

$$H\dot s + K_D s + C s = Y_N\tilde c - d_\tau$$

where the elements of the vector $\tilde c$ are of the form $\hat c_{i,j,k} - c_{i,j,k}$, and the elements of $Y_N \in \mathbb{R}^{n\times Nn(n^2+2n)}$ are the corresponding combinations $g_k(q,\xi_k)\nu_j(q,\dot q,t)$. These are precisely the closed-loop dynamics of the adaptive system obtained using (7), but for the presence of the disturbance term, $d_\tau$, describing the discrepancy between the required nonlinear torques and the best possible network approximation to them. As shown in [26, 27], proper treatment of this disturbance is fundamental to the development of a successful adaptation algorithm. The control and adaptation laws (7)-(8) depend upon the globally exact linear parameterization $\tau_{nl} = Ya$. The representation (10), however, provides only the locally approximate parameterization $\tau_{nl} = Y_N c + d_\tau$, requiring special care in the design of an adaptation mechanism.

As long as the joint angles and velocities remain within their nominal operating range $A$, the impact of the disturbance term can be accommodated by using robust adaptation methods, for example, by adding a


weight decay term to the adaptive algorithm. The adaptive law (8) becomes in this case

$$\dot{\hat c}_{i,j,k} = -\gamma\left(\sigma_w\,\hat c_{i,j,k} + s_i\,\nu_j(q,\dot q,t)\, g_k(q,\xi_k)\right) \qquad (13)$$

where $\sigma_w = 0$ if $\|\hat c\| < c_0$, and $\sigma_w = \sigma_0 > 0$ otherwise; here $c_0$ is an upper bound on the total magnitude of the parameters required to accurately approximate $\tau_{nl}$. The parameter $\gamma$ is a positive constant which controls the rate of learning, and can be different for each weight. An equally effective robust modification to the adaptive law, and one which may be more biological, is to allow each parameter value to saturate using a projection algorithm of the form

$$\dot{\hat c}_{i,j,k} = \mathcal{P}\left(-\gamma\, s_i\,\nu_j(q,\dot q,t)\, g_k(q,\xi_k),\ \hat c_{i,j,k},\ c_{\max}\right) \qquad (14)$$

where $\mathcal{P}(x,y,z) = x$ if $-z < y < z$, or if $y \le -z$ and $x > 0$, or if $y \ge z$ and $x < 0$; $\mathcal{P}(x,y,z) = 0$ otherwise.
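Both robust modifications are straightforward to implement. A sketch of the projection operator from (14) and the dead-zone weight decay from (13); the function names are ours, and the boundary conventions follow the definitions as stated above:

```python
def proj(x, y, z):
    """Projection P(x, y, z): pass the update rate x through only while the
    parameter y is inside [-z, z], or while x moves y back toward the interior."""
    if -z < y < z:
        return x
    if y >= z and x < 0:       # at the upper bound, only inward updates pass
        return x
    if y <= -z and x > 0:      # at the lower bound, only inward updates pass
        return x
    return 0.0

def decay_rate(c_norm, c0, sigma0):
    """Weight-decay coefficient sigma_w in (13): zero inside the nominal
    parameter ball of radius c0, sigma0 > 0 outside it."""
    return 0.0 if c_norm < c0 else sigma0
```

Either mechanism keeps the parameter estimates bounded despite the persistent disturbance $d_\tau$, which is what the stability proof requires.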

Above it has been argued that the architecture of the arm ensures that the joint state variables remain within an easily computable nominal range, $A$. In the more general nonlinear control case considered in [26, 27], where such physical arguments may not be immediately obvious, a modification to the control law (11) is required if the state variables ever leave their predefined nominal range. This is accomplished by adding a `supervisory' or robust component to (11) itself, whose action is mathematically determined to force the state back into the nominal range. Indeed, the mechanical constraints in a human arm, through the forces they exert as the arm approaches the feasible configuration boundaries, can be viewed as a realization of this `supervisory' control action. Similarly, it can be argued that painful stimulus from hyperextension of a joint would provoke an analogous `over-ride' of the nominal motor control strategy, forcing the limb to return to a more relaxed configuration. The formal details of the required modifications are provided in [27].

This combination of `neural' approximation, robust online adaptation, and robust `supervisory' action (if required) can be proven to result in a globally stable closed-loop system. In addition, the actual joint trajectories can be shown to asymptotically converge, in the mean, to a small neighborhood of the desired trajectories [26, 27]. Explicitly, the convergence can be expressed as

$$\lim_{T\to\infty} \frac{1}{T}\int_0^T \|\tilde q(t)\|^2\, dt \le \frac{\bar d^2}{k_D^2\,\lambda^2} \qquad (15)$$

where

$$\bar d = \sup_t\ \sup_{(q,\dot q)\in A} \left(\sum_{i=1}^n |d_{\tau,i}(q,\dot q,t)|^2\right)^{1/2}$$

and $\lambda$ is the smallest eigenvalue of $\Lambda$. Larger linear feedback gains and/or networks with better approximation capabilities will thus reduce the asymptotic tracking errors.

    4 Simulating human motor learning

While the gross features of the controller (11)-(14) seem to agree with experimental observations about the structure of human motor control mechanisms, it is not known to what extent this algorithm accurately models the actual biological mechanisms of adaptation. Instead, the algorithm constitutes a testable hypothesis about adaptive motor control, where mathematics and nonlinear control theory have been used to bridge the gaps in available neurobiological data. This section thus presents a preliminary evaluation of the properties of this model, qualitatively comparing its performance during a specific motor learning task with that of human subjects.

The selected task is a simulation of the experiment used to train the human subjects in [28]. This experiment consists of making short reaching motions constrained to the horizontal plane, while the arm is perturbed by an unknown, but deterministic, pattern of externally applied forces. When no perturbing forces are applied, the observed human arm motions are virtually straight lines from the starting point to the desired target, with a bell-shaped velocity profile in agreement with a minimum jerk trajectory. Under the influence of the perturbations, the motions are initially sharply deflected from the nominal straight line motion. With practice, however, these deflections are almost entirely eliminated, reflecting the adaptation of the motor control strategies utilized by the subjects.

As will be shown below, not only is this behavior reflected in the simulation using the proposed adaptive control law, but the actual time evolution of the controller performance closely resembles that recorded from the human subjects.

    4.1 Simulation construction

A simulation model of two degree of freedom arm motions (see Fig. 1) was created using the dynamics (1), with $n = 2$, and the representative human arm mass and length parameters reported in [28]. The two degrees of freedom here correspond to elbow and shoulder rotations in the horizontal plane; for the purposes of these experiments, the hand can be considered rigidly attached to the end of the arm, contributing no additional degrees of freedom.

    Fig. 1. Model of two degrees of freedom, planar arm motions


The desired trajectories driving the arm motions were computed from the experimental tasks used to train the human subjects in [28]. Each of these tasks consisted of a 10 cm reaching motion, possibly in the presence of an (initially) unknown pattern of environmental forces. The desired endpoint for each reaching motion moved in a pseudorandom fashion throughout a 15 by 15 cm workspace centered at (0.26 m, 0.42 m) relative to the subject's shoulder (the location of the $q_1$ joint in the model pictured in Fig. 1).

To generate the sequence of desired motions, the hand was initially placed in the center of the workspace. A direction was chosen randomly from the set $\{0°, 45°, \ldots, 315°\}$, measured clockwise with 0° corresponding to motion in the $+y$ direction. The desired endpoint for the reaching motion was then 10 cm along this direction. After the hand reached this target, a new target was chosen at a distance of 10 cm from the old target and along a new randomly selected direction. The selection process was modified to keep the targets within the 15 by 15 cm workspace.

To generate a desired trajectory corresponding to each reaching motion, a minimum jerk hand path of duration 0.65 s was assumed, as in [28]. In the simulation, the hand was allowed a total of 1.3 s to reach the desired target before a new target was selected. Thus, the desired trajectory for each reaching motion consisted of a 0.65 s minimum jerk path to the target, followed by a 0.65 s hold at the target.
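The target-selection procedure above can be sketched directly. The paper does not specify how the selection process was "modified to keep the targets within the workspace"; the rejection-resampling loop below is our assumption:

```python
import math, random

CENTER = (0.26, 0.42)          # workspace center relative to the shoulder (m)
HALF = 0.075                   # half-width of the 15 cm x 15 cm workspace (m)
STEP = 0.10                    # 10 cm reaching motions

def in_workspace(p):
    return abs(p[0] - CENTER[0]) <= HALF and abs(p[1] - CENTER[1]) <= HALF

def next_target(p, rng):
    """Pick a new target 10 cm from p along a random multiple of 45 degrees
    (0 deg = +y, measured clockwise), resampling until it stays in bounds."""
    while True:
        ang = math.radians(rng.choice(range(0, 360, 45)))
        cand = (p[0] + STEP*math.sin(ang), p[1] + STEP*math.cos(ang))
        if in_workspace(cand):
            return cand

rng = random.Random(0)
targets = [CENTER]
for _ in range(50):
    targets.append(next_target(targets[-1], rng))
```

Note that from the workspace center only the four diagonal directions are feasible (an axis-aligned 10 cm step would leave the 7.5 cm half-width box), so the resampling loop always terminates.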

The controller was initialized with perfect `self-knowledge', i.e., at $t = 0$, $\hat H = H$ and $\hat C = C$, but no knowledge of any external forces, i.e., $\hat E = 0$ at $t = 0$. This was accomplished by modifying the control law (11) slightly, so that

$$\tau(q,\dot q,t) = -K_D s + H(q)\ddot q_r + C_1(q)\,(\dot q \otimes \dot q_r) + \hat\tau_N(q,\dot q,t) \qquad (16)$$

with $\hat\tau_N$ still given by (12) and all $\hat c_{i,j,k}(0) = 0$. The adaptive network contributions in this case thus learn only the departures from the nominal model the controller has developed from an assumed prior `lifetime' of practice. This initialization is by no means necessary, and is done only to facilitate comparison with the experimental and simulation results reported in [28]. The robotic examples considered in [27] demonstrate the ability of the algorithm to track any desired trajectory quickly with no such prior information.

The torques applied about each joint were determined using the control laws (14), (12), and (16), together with the gain matrices

    KD 2X3 0X90X9 2X4

    !K

    6X5 0X0640X064 6X67

    !

which were computed using the representative joint stiffness and viscosity coefficients reported in [19, 28]. Note that measurements of human arm motion suggest that the effective stiffness component becomes smaller as the error q̃ increases [29]. While the proposed control law (11) has the flexibility to accommodate these time-varying feedback gains, in the absence of an analytic model for these variations in humans, and to facilitate comparisons with [28], this feature has not been exploited here. Similarly, since the arm motions required to perform the experiment are well within the workspace of the simulated arm, a simulation of the constraint forces imposed by the joint limits was not included.

    4.2 Network design

A radial basis function network was employed, with nodes g_k(q) = g(q − kΔ) for a fixed scale parameter Δ > 0 and a fixed range of translations k ∈ K ⊂ Z². Such a network is known to be capable of approximating continuous functions with an accuracy proportional to Δ^b, uniformly on the interior of a domain spanned by the translates kΔ, k ∈ K [22, 26]. The constant b > 0, quantifying the rate of convergence of the approximation, depends upon the specific basis function as well as the smoothness of the functions being approximated by the network.

For this study, a Gaussian basis function was chosen, with g(q) = exp(−‖q‖²/σ²), and the domain on which good approximation is required is the subset of [−π, π]² containing the range of joint angles required during the experiment, which here is A = [−0.5, 2] × [−0.5, 2.5]. The scale factor Δ in the network was chosen to ensure that the `width' of the Gaussians was broad enough to allow the possibility of generalization of network learning across the set A, and the translation range was correspondingly chosen as K = {−5, …, 8} × {−3, …, 9}, producing a total of 182 nodes.

A denser node spacing (smaller Δ) would allow a theoretically better approximation (since the bound in (15) is proportional to Δ^b), at the expense of a larger network size (since the range of translates kΔ, k ∈ K must still cover the same set A) and decreased generalization capability (since each Gaussian will be more `narrow', and thus contribute little to the approximation at points in A remote from its center kΔ). The analyses in [25, 26] provide a more precise mathematical discussion of these tradeoffs. The specific values chosen above attempt to balance the conflicting goals of high accuracy, small network size, and good generalization potential, but for this preliminary study no attempt was made to optimize this tradeoff.
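A minimal sketch of such a Gaussian lattice follows. The spacing and width values below are placeholders, not the paper's exact parameters; only the translation ranges and the resulting node count (14 × 13 = 182) come from the text.

```python
import numpy as np

DELTA = 0.3     # node spacing (rad) -- assumed value, for illustration only
SIGMA = 0.6     # Gaussian width (rad) -- assumed value
K1 = np.arange(-5, 9)     # k1 in {-5, ..., 8}
K2 = np.arange(-3, 10)    # k2 in {-3, ..., 9}
CENTERS = DELTA * np.array([(a, b) for a in K1 for b in K2])  # 14 x 13 = 182 nodes

def node_activations(q):
    """Activations g_k(q) = exp(-||q - k*DELTA||^2 / SIGMA^2) of all 182 nodes."""
    d2 = np.sum((CENTERS - np.asarray(q))**2, axis=1)
    return np.exp(-d2 / SIGMA**2)
```

With a width comparable to the spacing, each node responds appreciably to several of its neighbors' centers, which is what permits the broad tuning discussed below.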

The resulting network can be expressed as

\tau_{N_i}(q, \dot q, t) = \sum_{j=1}^{8} \sum_{k \in K} \alpha_{i,j,k}(t)\, g_k(q)\, \zeta_j(q, \dot q, t), \qquad i = 1, 2

where

\zeta(q, \dot q, t) = \big( \ddot q_{r1},\ \ddot q_{r2},\ \dot q_1 \dot q_{r1},\ \dot q_1 \dot q_{r2},\ \dot q_2 \dot q_{r1},\ \dot q_2 \dot q_{r2},\ \dot q_1,\ \dot q_2 \big)

and there are thus a total of 2912 adjustable output weights which must be learned. Each output weight was updated during the experiment using (14) with the conservative upper bound α_max = 75. The learning rates were chosen to vary with j; the specific values c_j = 0.01


for j = 1, …, 6 and c_j = 0.04 for j = 7, 8 were used in the simulation.
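Since the exact form of the adaptation law (14) is not reproduced in this section, the sketch below assumes a standard Slotine-Li style gradient update driven by the sliding error s; only the learning rates c_j and the array dimensions are taken from the text, and the sign convention and function name are our assumptions.

```python
import numpy as np

# learning rates from the text: c_j = 0.01 for j = 1..6, c_j = 0.04 for j = 7, 8
C = np.array([0.01] * 6 + [0.04] * 2)

def update_weights(alpha, gk, zeta, s, dt):
    """One Euler step of an assumed Slotine-Li style adaptation law:
        alpha[i, j, k] += -dt * c_j * s_i * zeta_j * g_k.
    Shapes: alpha (2, 8, 182), gk (182,), zeta (8,), s (2,)."""
    # elementwise product over the shared index j, outer product over i and k
    dalpha = -np.einsum('j,i,j,k->ijk', C, s, zeta, gk)
    return alpha + dt * dalpha
```

The rank-one structure of each update (an outer product of the node activations, the regressor ζ, and the error s) is what gives the scheme its Hebbian flavor.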

    4.3 Adaptive controller performance

    To display the evolving performance of the controller, a

set of 8 reaching motions originating at the center of the workspace and extending 10 cm along each of the directions in the above set was used. The resulting `star pattern', corresponding to the minimum jerk trajectories to these targets, is shown in Fig. 2. With no external forces acting on the system and no additional loads placed upon the arm, the control law as initialized should be able to track these trajectories perfectly, and indeed, Fig. 2 is exactly reproduced using the baseline controller.

In the presence of new environmental forces, however, substantial deviations from these trajectories are expected until the controller builds up a sufficient internal model of the new forces. Figure 3 shows the initial performance of the controller on the star pattern when the arm is subject to the field

E(q, \dot q) = J^T(q)\, B\, J(q)\, \dot q \qquad (17)

where

B = \begin{pmatrix} -10.1 & -11.2 \\ -11.2 & 11.1 \end{pmatrix}

and J(q) is the Jacobian of the mapping from joint to Cartesian coordinates. This is the field used with one group of experimental subjects in [28]; Fig. 3 is in fact quite similar to the measured human behavior on initial exposure to this force field, as well as to the output of Shadmehr and Mussa-Ivaldi's own simulation model. (Recall from Sect. 2 that, even without adaptation, the proposed control law differs in several ways from the law (2) proposed in [28].)

Having established the baseline controller performance, both without external forces and in the field described by (17), a series of 250 reaching motions was simulated in the presence of the field (17), using the pseudorandom procedure described above. The controller's performance on the star pattern was then evaluated, followed by another set of 250 reaching motions, and so on. In this fashion, a total of 1000 reaching motions were simulated, giving the controller an opportunity to build up a model of the forces, E, and four equally spaced `snapshots' were generated of the evolving controller performance on the canonical star pattern.

The results of this simulation are summarized in Fig. 4, which bears a noticeable resemblance to the comparable plots of human performance reported in [28] and shown for comparison in Fig. 5. In particular, the orientation, length, and rate of diminution of the `hooks' in the deviations from the desired trajectory agree well with the human data. It is evident from this figure that the controller is gradually learning to counteract the influence of the applied force field so as to regain the baseline tracking of the desired trajectories shown in Fig. 2.

4.4 `Aftereffects' of adaptation

As pointed out in [28], it is possible that the baseline performance is being recovered, not by developing an internal model of the new forces, but rather by making the linear parts of the controller more robust, for example by `stiffening' the joints, i.e., increasing K_D and K. Indeed, the bound (15) above displays this possibility explicitly. The network contribution to the control law

Fig. 2. Desired trajectories used to evaluate the performance of the proposed control law

Fig. 3. Tracking of the desired trajectory upon initial exposure to the environmental force pattern given by (17)


might thus simply augment the linear feedback terms. To resolve this issue, at the same time each of the above `snapshots' was taken, a second snapshot was generated, again evaluating the performance of the controller on the star pattern, but here with no environmental forces applied, i.e., with E = 0.

If the controller is simply learning to increase the linear feedback, its performance with the field `off' should exactly resemble the baseline performance of Fig. 2, since the magnitudes of the feedback gains will not affect the perfect tracking observed in this situation. If, on the other hand, the controller is developing an internal model of E, and using this model to modify the torques it commands, then when the environmental forces suddenly vanish, there should be substantial deviations from the baseline performance, since the controller will be generating torques to counteract a field which is no longer present. Indeed, these deviations should increase as a function of learning time, eventually resembling `mirror images' of the deviations seen in Fig. 4. These deviations away from the baseline performance under the nominal (no field) operating conditions have been termed the `aftereffects' of adaptation in [28].

Figure 6 shows that, indeed, the controller exhibits significant aftereffects, and that the magnitude of these deviations increases steadily, eventually resembling a `mirror image' of the trajectory deviations seen in Fig. 4. The adaptive networks are thus indeed being used to model and offset the force field. There is generally good qualitative agreement with the comparable plots of the aftereffects recorded in human subjects, although on some of the legs, notably those at 90°, 135°, and 315°, the deviations are less severe than observed in the human data. Otherwise, the orientation, magnitude, and rate of growth of the trajectory deviations agree with those observed in humans.

Fig. 4. Evolution of the performance of the adaptive control algorithm as a function of training time. After attempting to track 1000 pseudorandom motions throughout the workspace, perfect tracking of the desired trajectories of Fig. 2 is nearly completely recovered. Compare with the measured human performance on the same task shown in Fig. 5


    4.5 Generalization and persistency of excitation

It is important also to evaluate the extent to which the model learned by the network can generalize to novel regions of the state space. In the human experiments reported in [28], the test subjects clearly showed that learning in the original 15 by 15 cm workspace influenced the performance of identical reaching tasks conducted in a different workspace. This observation led Shadmehr and Mussa-Ivaldi to conclude that the motor computational elements were `broadly tuned' across the state space: that is, the observed adaptation was not an extremely localized, `look-up table' phenomenon, but rather utilized elements which contribute significantly over a large range of joint angles and velocities. This observation was incorporated into the construction of the simulation by appropriately selecting the variance of the Gaussians used in the adaptive networks. The relatively small Gaussian scaling parameter used in the network ensures that changes to the weights α_{i,j,k} will produce effects in the control law far outside the original workspace. Figure 7 illustrates this generalization by showing that aftereffects are observed on motions performed far outside the workspace used for training.

An interesting and well-known feature of direct adaptive control systems is that such devices will build only as complete an internal model as is sufficient to accurately track the commanded desired trajectories. In the experiments simulated above, there is thus no guarantee that the internal model which permits recovery of the baseline performance in a given workspace will coincide with the actual structure of the environmental forces. In fact, note that the actual field (17) does not deviate substantially during the experiment (and simulation) from the linearization

E_v(q, \dot q) = J^T(q_0)\, B\, J(q_0)\, \dot q \qquad (18)

where q_0 are the joint angles which place the hand in the center of the original workspace. This would suggest that even though the computational elements are broadly tuned, the controller will not correctly generalize its learning to a new workspace, but rather will use an internal model more similar to the linearized field seen in the original workspace.
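The distinction between (17) and (18) can be illustrated with a small self-contained comparison; the link lengths and the value of q_0 below are placeholders, not the paper's parameters.

```python
import numpy as np

B = np.array([[-10.1, -11.2],
              [-11.2,  11.1]])     # endpoint viscosity matrix of the field (17)
L1, L2 = 0.33, 0.34                # assumed link lengths (m), placeholders

def J(q):
    """Planar two-link Jacobian at joint angles q = (q1, q2)."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1*s1 - L2*s12, -L2*s12],
                     [ L1*c1 + L2*c12,  L2*c12]])

q0 = np.array([0.8, 1.2])          # placeholder joint angles at the workspace center

def E(q, qd):                      # actual field (17): Jacobian evaluated at q
    return J(q).T @ B @ J(q) @ qd

def Ev(q, qd):                     # linearization (18): Jacobian frozen at q0
    return J(q0).T @ B @ J(q0) @ qd

# near q0 the two fields nearly coincide; far from q0 they diverge
qd = np.array([0.5, -0.3])
near = np.linalg.norm(E(q0 + 0.05, qd) - Ev(q0 + 0.05, qd))
far = np.linalg.norm(E(q0 + 0.8, qd) - Ev(q0 + 0.8, qd))
```

Since the reaching tasks keep the hand near the workspace center, the training data alone cannot distinguish E from E_v, which is exactly the ambiguity exploited in the discussion that follows.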

Indeed, after performing the 1000 reaching motions described above, Fig. 8 shows the performance of the controller, operating in the same force field (17), but tracking a star pattern centered instead at (−0.114 m, 0.48 m) relative to the shoulder. Compare this with Fig. 9, which shows the performance of the same controller, again tracking the relocated star pattern, but operating instead in the linearized field (18). These plots reveal that, while the controller has clearly generalized its learning, it has not developed a complete model of the force field (17). While neither plot displays excellent tracking of the star pattern in the new workspace, the tracking in the linearized field, shown in Fig. 9, is qualitatively closer to the tracking ultimately achieved in the original workspace (the last `snapshot' in Fig. 5). This suggests that during its training, the controller developed only a locally accurate model of the actual field, more similar to E_v than to E. When the controller

Fig. 5. Human performance on the same simulated task, reprinted from [28]


attempts to use this model in the new workspace, where the true field is quite different from E_v, the poor tracking performance observed in Fig. 8 results. Figures 7 through 9 are again similar to the corresponding plots of human performance reported in [28], although on some of the legs, notably those at 0°, 45°, and 225°, the deviations observed in the human data are significantly worse than those in the simulation data.

Continued practice in the new workspace would cause convergence to the new desired trajectories, recovering again the baseline behavior. In general, however, if all the required motions can be accurately followed without a completely accurate model, there is no pressure for the system to improve its controller. By instead choosing appropriately `exciting' desired motions, so that perfect tracking would require a perfect model, the internal model of the environmental forces can be made to asymptotically converge to the actual force structure. The persistency of excitation conditions, which mathematically define the required trajectories, are reviewed in [30] in a robotics context, while [32] discusses the structure of these conditions for applications employing Gaussian networks.

    4.6 Additional observations

The results above were obtained with no special consideration given to the specific network elements used, save to ensure that they were broadly tuned and that the collection had good approximating power for a large class of functions. Since it is very unlikely that actual motor computational elements have a precisely Gaussian structure, the qualitative agreement of the simulation results with the observed human performance illustrates the relative insensitivity of the proposed model to the exact structure of the elementary functions employed. The specific values for the

Fig. 6. Evolution of the ``aftereffects'' of adaptation as a function of training time. After attempting to track 1000 pseudorandom motions throughout the workspace, the trajectory perturbations produced by the aftereffects resemble a mirror image of the perturbations seen in Fig. 4. Compare with the measured human performance on the same task reported in [28]


adaptation gains, however, were chosen by trial and error to yield incremental performance improvement qualitatively similar to the reported human performance. Recall that the learning rates are free parameters in the stable adaptation mechanisms (13), (14); any positive values will result in a stable, convergent algorithm. The specific choices which would match the simulated learning rate to that observed in humans, however, could not be predicted a priori.

Similarly, while the asymptotic recovery of the desired trajectory depends mathematically only on the approximation power of the aggregate collection {g_k}, the manner in which the network generalizes its learning to a new workspace is much more sensitive to the specific choice of basis function. A very large choice of the Gaussian scaling parameter in the network above, for example, might produce comparable results on the learning task, but exhibit virtually no generalization in the new workspace, since such a network exhibits quite local learning. Different choices of the `shape' of g (for example, sigmoidal as opposed to Gaussian) would similarly produce different generalization properties. No attempt was made in this study to tune the choice of basis functions to better match the generalization observed in humans; this will be a topic of future investigation.

Finally, note that while the simulated experiments described above required only learning the unknown E, the control and adaptation laws used are capable of accommodating much more complex changes in the dynamics. For example, if the arm were to suddenly grab a massive, oddly shaped object, such as a tennis racket or bowling ball, the matrices H and C would suddenly change, requiring comparable modifications to the nonlinear components of the control law to ensure continued tracking accuracy. Similar changes occur to these components of the dynamics over longer time periods, as skeletal structure and musculature change. The proposed algorithm naturally has the flexibility to adapt to these more general changes in the dynamics.

    5 Concluding remarks

In this paper, we have attempted to illustrate the strong similarity between models of adaptive motor control

Fig. 7. Tracking of a star pattern using a null field in a workspace centered at (−0.114 m, 0.48 m) relative to the shoulder, after training in the field (17) in the original workspace. The presence of aftereffects in the new workspace indicates that the algorithm has generalized its learning

Fig. 8. Tracking of a star pattern centered in the new workspace after training in the original workspace. The motion is still perturbed by the environmental force field (17)

Fig. 9. Tracking of a star pattern centered in a different workspace than that used for training. The perturbing field is now given by (18), even though (17) was used in the training


suggested by recent experiments with human and animal subjects, and the structure of new robotic control laws derived mathematically. In both models, the nonlinear component of the torques required to track a specified reference trajectory is assembled from a collection of very simple, elementary functions. By adaptively recombining these functions, the controllers can develop internal models of their own dynamics and of any externally applied forces, and use these adaptive models to compute the required torques. Biologically, the elementary functions represent abstractions of the actions of individual muscles and their neural control circuitry. Mathematically, however, the elementary functions can be any collection of basis elements which permits accurate reconstruction of continuous functions, such as those comprising current `neural' network models.

Instead of iterative training methods, we have proposed a continuously adaptive model which has a strong Hebbian flavor. By using the adaptive elements in a manner which fully exploits the underlying passive mechanical properties of arm motions, the resulting strategy of simultaneous learning and control can be guaranteed to produce stable, convergent operation. This continuous model not only reproduces many of the end-results of the particular motor learning task examined, but also captures a significant component of the actual time evolution of the adaptation observed in human subjects.

The insensitivity of the proposed algorithm to the specific choice of basis functions is quite encouraging. The actual structure of the computational elements underlying human motor control may not resemble any of the biological computation models currently under investigation, including those employed herein. Since the performance of the model does not depend upon a specific choice of computational element, only upon the properties of the aggregate, the adaptive control model described above may capture some of the interesting features of actual low-level motor adaptation. Indeed, the underlying idea of continuously patching together complex control strategies from a collection of simple elements is not only biologically plausible, it represents a sound engineering solution to the problem of learning in unstructured environments.

    References

1. Arimoto S, Kawamura S, Miyazaki F (1984) Bettering operation of robots by learning. J Rob Syst 1:123-140

2. Atkeson CG, Reinkensmeyer DJ (1990) Using associative content-addressable memories to control robots. In: Miller WT, Sutton RS, Werbos PJ (eds) Neural networks for control. MIT Press, Cambridge, Mass

3. Bizzi E, Mussa-Ivaldi F, Giszter S (1991) Computations underlying the execution of movement: a novel biological perspective. Science 253:287-291

4. Craig JJ (1986) Introduction to robotics: mechanics and control. Addison-Wesley, Reading, Mass

5. Cybenko G (1989) Approximations by superposition of a sigmoidal function. Math Cont Sig Sys 2:303-314

6. Flash T, Hogan N (1985) The coordination of arm movements: an experimentally confirmed mathematical model. J Neurosci 5:1688-1703

7. Girosi F, Poggio T (1990) Networks and the best approximation property. Biol Cybern 63:169-176

8. Giszter S, Mussa-Ivaldi F, Bizzi E (1993) Convergent force fields organized in the frog's spinal cord. J Neurosci 13:467-491

9. Gomi H, Kawato M (1993) Neural-network control for a closed-loop system using feedback-error-learning. Neural Networks 6:933-946

10. Hebb DO (1949) The organization of behavior. Wiley, New York

11. Hogan N (1984) An organizing principle for a class of voluntary movements. J Neurosci 4:2745-2754

12. Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Networks 2:359-366

13. Jordan MI (1990) Learning inverse mappings using forward models. In: Proc 6th Yale Workshop on Adaptive and Learning Systems, pp 146-151

14. Jordan MI, Rumelhart DE (1992) Forward models: supervised learning with a distal teacher. Cogn Sci 16:307-354

15. Kawato M (1989) Adaptation and learning in control of voluntary movement by the central nervous system. Adv Rob 3:229-249

16. Kawato M, Furukawa K, Suzuki F (1987) A hierarchical neural-network model for control and learning of voluntary movement. Biol Cybern 57:169-185

17. Miller WT, Glanz FH, Kraft LG (1987) Application of a general learning algorithm to the control of robotic manipulators. Int J Rob Res 6:84-98

18. Mussa-Ivaldi F, Giszter S (1992) Vector field approximation: a computational paradigm for motor control and learning. Biol Cybern 67:491-500

19. Mussa-Ivaldi F, Hogan N, Bizzi E (1985) Neural, mechanical, and geometric factors subserving arm posture in humans. J Neurosci 5:2732-2743

20. Mussa-Ivaldi F, Giszter S, Bizzi E (1994) Linear combinations of primitives in vertebrate motor control. Proc Natl Acad Sci USA 91:7534-7538

21. Poggio T, Girosi F (1990) Networks for approximation and learning. Proc IEEE 78:1481-1497

22. Powell MJD (1992) The theory of radial basis function approximation in 1990. In: Light WA (ed) Advances in numerical analysis, vol II: wavelets, subdivision algorithms, and radial basis functions. Oxford University Press, Oxford, pp 105-210

23. Sadegh N, Horowitz R (1990) Stability and robustness analysis of a class of adaptive controllers for robotic manipulators. Int J Rob Res 9:74-92

24. Sanger T (1994) Neural network learning control of robot manipulators using gradually increasing task difficulty. IEEE Trans Rob Autom 10:323-333

25. Sanner RM (1993) Stable adaptive control and recursive identification of nonlinear systems using radial Gaussian networks. PhD thesis, MIT Department of Aeronautics and Astronautics

26. Sanner RM, Slotine J-JE (1992) Gaussian networks for direct adaptive control. IEEE Trans Neural Netw 3:837-863

27. Sanner RM, Slotine J-JE (1995) Stable adaptive control of robot manipulators using ``neural'' networks. Neural Comput 7:753-788

28. Shadmehr R, Mussa-Ivaldi F (1994) Adaptive representation of dynamics during learning of a motor task. J Neurosci 14:3208-3224

29. Shadmehr R, Mussa-Ivaldi FA, Bizzi E (1993) Postural force fields of the human arm and their role in generating multijoint movements. J Neurosci 13:45-62

30. Slotine J-JE, Li W (1987) On the adaptive control of robot manipulators. Int J Rob Res 6(3):49-59


31. Slotine J-JE, Li W (1991) Applied nonlinear control. Prentice-Hall, Englewood Cliffs, NJ

32. Slotine J-JE, Sanner RM (1993) Neural networks for adaptive control and recursive identification: a theoretical framework. In: Trentelman HL, Willems JC (eds) Essays on control: perspectives in the theory and its applications. Birkhäuser, Boston

33. Wiener N (1961) Cybernetics: or control and communication in the animal and the machine, 2nd edn. MIT Press, Cambridge, Mass
