

University of California, Santa Barbara

Robust Control Strategies for Quadruped Bounding Locomotion

    by

    Virgile Paris

    Internship Report submitted to the Department of Electrical Engineering

    in partial fulfillment of the requirements for the

    Master of Science Degree in Control Theory &

    Engineering Diploma in Electrical Engineering

    at the

    BORDEAUX INSTITUTE OF TECHNOLOGY

    ENSEIRB-MATMECA

    September 2015

© Bordeaux Institute of Technology 2015. All rights reserved.

Abstract

This report presents a study of a quadruped bounding robot, modeled as a planar three-link system with an 8-dimensional state space, adopting bounding gaits over rough terrain. In this study, we strive toward accurately controlling this system with the prospect of optimizing stability over disparate terrain. A systematic, analytical process to establish control strategies for dynamic bounding motion is described.

A computationally tractable meshing algorithm mapping the 8-dimensional reachable state space was developed in the process. This meshing was used as a map in the development of a path finding algorithm (A*), which finds a combination of controllers to use from one step to another in order to overcome the obstacles of the terrain. The obstacles used in this particular report were gaps with specific sizes, separated by specific distances, both randomly computed.

Using the planar model described above in simulation, the robot was able to leap over gaps with a maximum width of approximately 20 cm using this method. These results provide a starting point for the establishment of control strategies using meshing algorithms to move on rough terrain involving slopes and obstacles with specific heights and widths, and could also be used for agility performances such as climbing stairs or acquiring gaits with desired features without having to wait for the steady state to be attained.

    Supervisor :

Katie Byl, Assistant Professor, Dept. of Electrical and Computer Engineering, University of California, Santa Barbara, USA

    Tutor :

Patrick Lanusse, Associate Professor in Automatic Control, ENSEIRB-MATMECA, Bordeaux Institute of Technology, France


Acknowledgments

I would like to express my gratitude to the following individuals for their support throughout the project, which greatly enhanced the quality of this report.

Dr. Katie Byl, Assistant Professor, Dept. of Electrical and Computer Engineering, University of California, Santa Barbara - for giving me the opportunity to be a part of the UCSB Robotics Laboratory and for guidance throughout the internship, which helped shape the report into its final state and achieve the results presented herein.

Pat Terry, Cenk Oguz Saglam, Chelsea Lau, Sebastian Sovero, Giulia Piovan, Tom Strizic, and all other local and international students, current and past members of the UCSB Robotics Lab - for their hospitality and companionship; my time spent in the laboratory was a cheerful and joyful experience thanks to all of them.

Patrick Lanusse, Associate Professor, Bordeaux Institute of Technology, ENSEIRB-MATMECA - for tutoring me and supervising administrative proceedings.

Thibault Barbié, Student at ENSICA, Toulouse, France - for his support throughout this internship and for contributing to my work by presenting ideas and suggestions.


Contents

Abstract
Acknowledgments

I Introduction
  1 Motivation
  2 The State-of-the-Art
  3 Mathematical Model and Control Strategy
  4 Thesis Contributions and Organisation

II Search for Controllers
  1 Definitions
    1.1 Return Map Analysis
    1.2 Limit Cycle
  2 Controller Search Algorithm
    2.1 The cost function C
    2.2 Newton-Raphson Method
    2.3 Newton-Raphson Algorithm
    2.4 Newton-Raphson Iterative Process
    2.5 Results

III Behavioral Policy and Discretization of the Reachable State Space
  1 Meshing algorithm
    1.1 Meshing
  2 Path Finding
    2.1 Introduction to the A* Algorithm
    2.2 Applying the A* Algorithm
    2.3 A* Algorithm Pseudo Code
    2.4 Results and Discussion

IV Conclusion and Future Work
  1 Gradient Optimization: Applications and Improvements
  2 Possible Use of the Meshing Algorithm


  3 Closing Remarks

Appendix
  A Mathematical Definitions and Properties
    A.1 Definitions
    A.2 Taylor series approximations
  B Gradient and Hessian Matrix Computation
    B.1 Gradient
    B.2 Hessian Matrix

Bibliography

List of Figures

1 Some state-of-the-art quadruped robots. SCOUT II (b), Spot (d) and Cheetah II (e) are human-sized robots. LittleDog (f), PAW (c) and the famous quadruped robot capable of trotting down smooth surfaces developed by Marc Raibert (a) are smaller robots.
2 Canid robot, composed of legs which will be modeled as springs and an actively-driven, compliant spine.
3 Quadruped model associated with the Canid robot
4 Bounding phases of the system
5 General shape of the body angle trajectory and reference
6 Illustration of the mapping process leading to a return map and the convergence of a state space trajectory toward a closed trajectory or fixed point.
7 Illustration of the Newton-Raphson method
8 Newton-Raphson iterative process
9 Mapping and structure of a node


Nomenclature

    l Vector featuring specific gaits

    u Input vector of a system

    x 8-dimensional state space vector

    θ1 Geometric rear leg angle, relative to ground

    θ2 Geometric back hip angle

    θ3 Geometric spine angle

    θ4 Geometric front hip angle

    A∗ A star

    C Cost function used in Newton-Raphson algorithm

    c(a,B) Closest element of the set B to the element a

    COP Center Of Pressure

    d(a,B) Euclidean distance from the element a to the set B

    DOF Degree Of Freedom

M 3-dimensional matrix called the state transition map, used in algorithm 3

Q Temporary set of 8-dimensional state space vectors used in algorithm 3

r Rear leg length

    U Set of 6-dimensional input vectors

    X Set of 8-dimensional state space vectors


    Part I

    Introduction

    1 Motivation

The wheel has hitherto been the prevailing means of locomotion in robotics, as it is the most practical and straightforward way of accurately controlling the trajectory, position, speed and acceleration of a robot moving on the ground. However, it certainly does not offer the most maneuverability. Although wheeled robotic locomotion allows a variety of movements in any direction on the ground and on every kind of smooth surface, it performs poorly on more abrupt terrain, such as real-life environments where more dynamic activities (e.g. climbing and jumping, among others) may be necessary in order to overcome obstacles. That is why the focus of this report is on legged robotics, and more precisely on bounding gaits.

Bounding gait control design has recently become an even more attractive area of interest since the outstanding performances demonstrated by cutting-edge robots capable of adopting such gaits. Some of those robots have accomplished agility performances never seen before, e.g. the MIT Cheetah II, capable of jumping over thirty-centimetre-high obstacles while adopting a bounding gait (see figure 1 (e)), or Boston Dynamics' Spot robot, capable of passing through steep terrain in real-life environments with little trouble (see figure 1 (d)), showing high recovery from large perturbations and overall demonstrating uncanny life-like motion.

Agility and adroitness have always been a challenge in robotics, and bounding gaits seem to offer a wide range of possibilities that are yet to be explored. The ease with which animals can move, swerve, climb, leap and turn in any kind of environment shows how much the field of quadruped locomotion has yet to be exploited to its full potential.

    2 The State-of-the-Art

The most common way to stabilise a walking robot is to use the polygon formed by the contact points of the robot with the ground. Intuitively, if the center of mass of the system remains


Figure 1: Some state-of-the-art quadruped robots. SCOUT II (b), Spot (d) and Cheetah II (e) are human-sized robots. LittleDog (f), PAW (c) and the famous quadruped robot capable of trotting down smooth surfaces developed by Marc Raibert (a) are smaller robots.

within this polygon, then the system is defined as statically stable and will not fall. Although proven to be efficient, this type of stability has shown serious drawbacks, such as requiring controllers to fight the natural behaviour of the system. Moreover, using this approach, the system must remain within the support polygon at every instant in time, which makes it very slow since the robot's motion is then only due to the actuators. The entire energy accumulated during the motion is therefore lost.

To cope with those issues, another approach was developed, consisting in treating the system as dynamically stable rather than statically stable. Instead of forcing the system to be stable at every moment in time, it is abandoned to its natural behaviour. Dynamic stability allows the system to be unstable as long as the overall behaviour is stable. A prime example is human gait, which is dynamically stable but not statically stable. Numerous works on dynamically stable quadruped robots [2, 10, 20, 21] have been done since the work of Marc H. Raibert [4], who was one of the first to focus on dynamic stability for quadruped robots.


However, different drawbacks are associated with that kind of gait as well. One of today's eminent challenges in dynamic walking is the ability to perform successfully on rough terrain. The focus of this report is on control strategies for those types of terrain.

    3 Mathematical Model and Control Strategy

The quadruped model adopted throughout this study was the Canid robot model (see figure 2), which is a planar three-link system (see figure 3).

Figure 2: Canid robot, composed of legs which will be modeled as springs and an actively-driven, compliant spine.

The dynamics of this system can be described by an 8-dimensional state vector

$$x = \begin{pmatrix} \theta_1 & \theta_2 & \theta_3 & l & \dot{\theta}_1 & \dot{\theta}_2 & \dot{\theta}_3 & \dot{l} \end{pmatrix}^T$$

when considering the rear leg stance phase. Canid has three actuators located at the internal joints: the back hip, spine and front hip actuators, applying respectively the torques τ1, τ2 and τ3. During the rear leg stance phase, the front leg is not of interest, so the system has four DOF (3 angles and 1 length) and only three actuators, one of which (the front hip actuator) is disregarded. The system is therefore underactuated by 2 DOF. The dynamics of the system in terms of Cartesian


coordinates is

$$x_b = l\cos(\theta_1) \qquad y_b = l\sin(\theta_1) \tag{3.1}$$

$$x_h = x_b + l\cos(\theta_1 + \theta_2) \qquad y_h = y_b + l\sin(\theta_1 + \theta_2) \tag{3.2}$$

$$x_f = x_h + l\cos(\theta_1 + \theta_2 + \theta_3) \qquad y_f = y_h + l\sin(\theta_1 + \theta_2 + \theta_3) \tag{3.3}$$

Figure 3: Quadruped model associated with the Canid robot, annotated with the angles θ1–θ4 and θb, the leg lengths l and l′, the joint positions (xb, yb), (xh, yh), (xf, yf), the masses and inertias (mb, Jb) and (mf, Jf), and the center of mass (x, y)CM.

First, the potential energy V and kinetic co-energy T are found:

$$T = \tfrac{1}{2}\left[m_b(\dot{x}_b^2 + \dot{y}_b^2) + m_f(\dot{x}_f^2 + \dot{y}_f^2) + J_b\dot{\theta}_{back}^2 + J_f\dot{\theta}_{front}^2\right] \tag{3.4}$$

$$V = m_b g y_b + m_f g y_f + \tfrac{1}{2}k_s\theta_3^2 + \tfrac{1}{2}k_h(l - l_0)^2 + \tfrac{1}{2}k_h(l' - l'_0)^2 \tag{3.5}$$

where ks and kh are respectively the spring constants of the spine spring and of the hip springs, l0 and l′0 are the relaxed lengths of the rear and front legs respectively, and θback = θ1 + θ2 and θfront = θ1 + θ2 + θ3. Using the Lagrangian method, the equations of motion can then be


derived from equations (3.1)–(3.5) in the canonical form

$$\dot{x} = M^{-1}(x)\,(C(x) + u) \tag{3.6}$$

Integrating the dynamics forward using equation (3.6), the entire dynamics of the system can be described as a function of an input vector u. Using this approach, the equations of motion can be determined in the four phases of the system depicted in figure 4.

Figure 4: Bounding phases of the system: (1) double leg stance phase, (2) rear leg stance phase, (3) flight phase, (4) front leg stance phase.

The body angle θb and the center of mass (x, y)CM can then be controlled using the following definitions:

$$x_{CM} = \frac{m_b x_b + m_f x_f}{m_b + m_f}, \qquad y_{CM} = \frac{m_b y_b + m_f y_f}{m_b + m_f} \tag{3.7}$$

$$\theta_b = \arctan\left(\frac{y_f - y_b}{x_f - x_b}\right) \tag{3.8}$$
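As a sanity check of the kinematic chain (3.1)–(3.3) and of definitions (3.7)–(3.8), a minimal Python sketch can be written (the function names are illustrative; the report's simulation itself was written in MATLAB, and, as in the equations above, the same length l is used for every segment):

```python
import math

def forward_kinematics(theta1, theta2, theta3, l):
    """Joint positions of the planar chain, equations (3.1)-(3.3)."""
    xb, yb = l * math.cos(theta1), l * math.sin(theta1)        # (3.1)
    xh = xb + l * math.cos(theta1 + theta2)                    # (3.2)
    yh = yb + l * math.sin(theta1 + theta2)
    xf = xh + l * math.cos(theta1 + theta2 + theta3)           # (3.3)
    yf = yh + l * math.sin(theta1 + theta2 + theta3)
    return (xb, yb), (xh, yh), (xf, yf)

def com_and_body_angle(mb, mf, back, front):
    """Center of mass (3.7) and body angle (3.8)."""
    (xb, yb), (xf, yf) = back, front
    x_cm = (mb * xb + mf * xf) / (mb + mf)
    y_cm = (mb * yb + mf * yf) / (mb + mf)
    theta_b = math.atan2(yf - yb, xf - xb)  # quadrant-safe arctan of (3.8)
    return x_cm, y_cm, theta_b
```

With θ2 = θ3 = 0 the chain is straight, so the body angle is zero and, for mb = mf, the center of mass lies midway between the back and front hips.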

In this study the focus was set on controlling the body angle using a simple PD controller on the back hip actuator applying a torque τ1, the front hip joint being passive. This controller will be used to drive the system to desired reference body angle trajectories, which will be tuned depending on the kind of gait desired (see part II):

$$\tau_1 = K_p\,\varepsilon + K_d\,\dot{\varepsilon} \quad \text{where} \quad \varepsilon = \theta_b - \theta_{ref} \tag{3.9}$$

with Kp and Kd determined empirically. The reference trajectory was chosen to be the following:

$$\theta_{ref}(t) = \begin{cases} k_1 t & \text{if } t \leqslant t_{switch} \\ k_2(t - t_{switch}) + k_1 t_{switch} - \Delta\theta & \text{if } t > t_{switch} \end{cases} \tag{3.10}$$
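Equations (3.9) and (3.10) can be sketched as follows (a minimal Python illustration; the function names and the analytic reference rate are my own, not from the report):

```python
def theta_ref(t, k1, k2, t_switch, delta_theta):
    """Piecewise-linear body-angle reference of equation (3.10)."""
    if t <= t_switch:
        return k1 * t
    return k2 * (t - t_switch) + k1 * t_switch - delta_theta

def pd_torque(theta_b, theta_b_dot, t, Kp, Kd, k1, k2, t_switch, delta_theta):
    """PD law of equation (3.9): tau1 = Kp*eps + Kd*eps_dot, with
    eps = theta_b - theta_ref; the reference rate used for eps_dot is
    the active slope (k1 before t_switch, k2 after)."""
    eps = theta_b - theta_ref(t, k1, k2, t_switch, delta_theta)
    eps_dot = theta_b_dot - (k1 if t <= t_switch else k2)
    return Kp * eps + Kd * eps_dot
```

Note that the reference is deliberately discontinuous at t_switch, dropping by Δθ, which is the offset visible in figure 5.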

Figure 5: General shape of the body angle trajectory θb(t) and the reference trajectory θref(t), showing the slopes k1 and k2, the drop Δθ at tswitch, and the lift-off and touch-down times delimiting the rear leg stance and flight phases.

As can be seen in figure 5, this controller performs poorly in terms of making the actual trajectory follow the reference accurately. However, let us recall that the ability of the controller to accurately drive the system is an irrelevant concern in this study, since we want the system to be stable dynamically rather than statically. The idea is to tune the parameters specific to the reference trajectory in the interest of finding a desired actual trajectory with specific features (step length, velocity, stability, maximum apex height, touch-down body angle, ...). The stated goal is to have a set of controllers associated with different bounding motions and to be able to switch


between them to avoid obstacles while maintaining dynamic stability. The controllers will be defined as vectors u = (u1 · · · u6)^T, where u1 = −θ2 and u2 = −θ4 are the opposites of the geometric back and front hip angles set during the flight phase¹ for the next touch-down configuration, and u3 = k1, u4 = k2, u5 = tswitch, u6 = Δθ.

    4 Thesis Contributions and Organisation

The next part is a detailed description of the different methods applied in an attempt to find controllers associated with a gait featuring different characteristics. Besides, this part presents some of the tools that will be used throughout this report, such as Poincaré maps and limit cycles; it also explains the gradient descent that was used to find specific controllers for the system and the way the controller search was cast as an optimization problem.

Part III uses a powerful meshing method based on the controllers found by the algorithm developed in part II. This meshing was then applied to mapping the terrain in real time, which led to the development of a path finding algorithm (A*) to find a combination of controllers to use at every step to overcome an upcoming obstacle.

Finally, part IV concludes this report and presents both the improvements that can be made on the work presented here and the possibilities that those strategies offer for legged locomotion.

¹The exact time when those angles are set during the flight phase is not of concern, since the inertia of the legs is not taken into account in this model.


    Part II

Search for Controllers

The different types of bounding gaits a robot is capable of adopting are crucial if it is destined to eventually interact with the real world. Without a variety of them, the robot will be doomed to fail after only a few steps. Those failures could be due to numerous hindrances, such as ditches or objects preventing the robot from pursuing its path.

Therefore, specific features such as the step length, the apex height, the speed or the stability on steep terrain could be of concern in the design of a controller associated with a specific bounding motion.

In this part, I will present a systematic process to establish a control strategy featuring specific characteristics of a desired bounding trajectory, as well as the different tools that will be used extensively.

    1 Definitions

    1.1 Return Map Analysis

Discretizing the dynamics of a system is essential to apprehend the concept of dynamic stability, since only the overall behaviour of the system is of interest, as discussed in part I. The Poincaré map, or return map, is one of the most used approaches in the legged robot literature. Let us consider an n-dimensional system ẋ = f(x) and an (n−1)-dimensional section S which is transverse to the dynamics, i.e. the n-dimensional state vector trajectory flows through it during each period. Integrating the dynamics forward from one intersection with S to the next represents the process of mapping and is depicted in figure 6. The Poincaré map P can then describe the system using the expression

$$x_{n+1} = P(x_n).$$


    1.2 Limit Cycle

A limit cycle can be defined as a nominally periodic sequence of steps that is stable as a whole but not locally stable at every instant in time [3]. A sequence of steps stable as a whole means that, for a state vector xn representing the state at the nth step,

$$\lim_{n\to+\infty} x_n = \lim_{n\to+\infty} P^n(x_0) = x^* \tag{1.1}$$

where x0 is the initial state vector and x* is a constant vector. Indeed, if the state vector xn becomes relatively constant after a sufficient number of steps, it intuitively means that the system will repeat the same step from that moment on; thus, if it has not fallen so far, it has no reason to fall if no disturbance occurs. A sequence of steps not locally stable at every instant in time means that the stability has to be studied step by step and not continuously through time. In other words, the state vector has to converge toward a closed trajectory in the state space; this idea is depicted in figure 6.

Figure 6: Illustration of the mapping process leading to a return map and the convergence of a state space trajectory toward a closed trajectory or fixed point.

The vector x* of equation (1.1) is called a fixed point. Finding a fixed point then comes down to solving

$$P(x^*) = x^*. \tag{1.2}$$
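As an illustration of equation (1.2) on a toy one-dimensional map (the report's map is 8-dimensional and only available through simulation), a fixed point of a stable return map can be found by plain iteration:

```python
def find_fixed_point(P, x0, tol=1e-10, max_steps=1000):
    """Iterate the return map x_{n+1} = P(x_n) until successive
    iterates coincide, i.e. until P(x*) ~= x* as in equation (1.2).
    Plain iteration converges only when the fixed point is stable,
    which matches the limit-cycle setting of section 1.2."""
    x = x0
    for _ in range(max_steps):
        x_next = P(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence: the fixed point may be unstable")

# Toy contracting map: P(x) = 0.5*x + 1 has the stable fixed point x* = 2.
x_star = find_fixed_point(lambda x: 0.5 * x + 1.0, x0=10.0)
```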


One of the reasons why return map analysis is popular in legged robotics is that it converts the search for a closed trajectory into a search for fixed points of a mapping, which considerably simplifies the problem and is in alignment with the idea of dynamic stability, since it allows the system to move freely as long as it returns to a specific state at each step. For more information about limit cycles, the reader can refer to [3], [12] and [22].

    2 Controller Search Algorithm

    2.1 The cost function C

In the interest of finding controllers associated with fixed points, we used the Newton-Raphson method, which finds the inputs locally minimizing a function. This function was defined as a cost function, which is usually a sum of weighted differences between the actual parameters, defined as the vector l, and the desired parameters, defined as l_d. The idea is to use the cost function to specify what needs to be found.

In our case, l is a function with two vectors as inputs: a controller u ∈ R⁶ and a state space vector x ∈ R⁸. The output l(u, x) is a vector composed of the state space vector after one step and a vector p ∈ Rⁿ, n being the number of pieces of information desired about the step taken; that information could be the step length, the maximum apex height, the body angle at lift-off, among others. For example, if the system has taken k steps and x^(k) is the corresponding state vector, then

$$l(u, x^{(k)}) = \begin{pmatrix} p & x^{(k+1)} \end{pmatrix}^T.$$

The function l_d represents the vector toward which the system is wanted to converge. It is composed of a vector p_d containing the desired parameters of one step (once those parameters are set, this vector is fixed). However, l_d also contains another vector, which is not fixed; this vector represents the stability of the system and is composed of the state space vector of the previous step. For example, if the system has taken k steps, then l_d will be noted l_d^(k−1) and will be defined as

$$l_d^{(k-1)} = \begin{pmatrix} p_d & x^{(k-1)} \end{pmatrix}^T.$$

The first part of the cost function could thus be defined as

$$C_1(u, x^{(k)}) = \sum_{i=1}^{8+n} w_i \left( l_i(u, x^{(k)}) - l_{d,i}^{(k)} \right)^2$$

where the w_i are the weights associated with each squared difference. Henceforth, for simplification purposes, matrix notation will be adopted and C1 will be defined as

$$C_1(u, x^{(k)}) = \left(l(u, x^{(k)}) - l_d^{(k)}\right)^T W \left(l(u, x^{(k)}) - l_d^{(k)}\right)$$

with W = diag(w_i) ∈ R^(8+n)×(8+n). The second part of the definition of the cost function consists in penalizing the cost when the controller found by the algorithm makes the system fall. The easiest way to do this was to associate a fall with a very high cost, defined as a constant C2 = 10⁵. Thus, the cost function was defined as

$$C(u, x^{(k)}) = \begin{cases} C_1(u, x^{(k)}) & \text{when } l(u, x^{(k)}) \neq \text{failure} \\ C_2 & \text{when } l(u, x^{(k)}) = \text{failure} \end{cases}$$

In our case, p = L and p_d = L_d, where L and L_d are the step length and desired step length respectively. In the case where p would contain several pieces of information, the weights would be different; the definition of those weights is however beyond the scope of this report. The weights associated with the squared differences have to be chosen very carefully, though. Indeed, ideally we would want the algorithm to focus on minimizing the error regarding the step length L, and only after this is done can the focus be on stabilizing the system while keeping the error L − L_d constant. That is why W is defined as follows:

$$W = \begin{pmatrix} W_p & 0_{n\times 8} \\ 0_{8\times n} & W_s \end{pmatrix}, \qquad W_s = w_s\,\frac{1}{|L - L_d|}\,I_8, \qquad W_p = w_p\,|L - L_d|$$

where L and L_d are in centimetres and w_s = 1, w_p = 5 were chosen empirically.
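A minimal sketch of this cost for the n = 1 case used here (step length only), with the weight structure above; the vector layout and names are illustrative, not the report's implementation:

```python
C2 = 1e5  # constant penalty for a fall

def cost(l_vec, l_d, w_s=1.0, w_p=5.0, fell=False):
    """Sketch of the cost of section 2.1 for n = 1, where the single
    parameter p is the step length L: l_vec = [L, state after the step]
    and l_d = [Ld, state of the previous step]. The weights follow
    Wp = w_p*|L - Ld| and Ws = (w_s/|L - Ld|)*I8."""
    if fell:
        return C2
    L, L_d = l_vec[0], l_d[0]
    err = max(abs(L - L_d), 1e-12)        # guard the division in Ws
    c = w_p * err * (L - L_d) ** 2        # step-length term (Wp)
    c += (w_s / err) * sum((a - b) ** 2 for a, b in zip(l_vec[1:], l_d[1:]))
    return c
```

As |L − L_d| shrinks, the stability terms are weighted more and more heavily, which realizes the "step length first, then stability" priority described above.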

    2.2 Newton-Raphson Method

Using a second order Taylor-Young expansion around the estimate of a controller associated with a fixed point (noted u*), we have

$$C(u + \Delta u) = C(u) + \nabla C(u)\,\Delta u + \tfrac{1}{2}\,\Delta u^T H_C(u)\,\Delta u$$


This approximation is valid since the search is constrained to a local minimum. Let us recall that H_C(u) and ∇C are the Hessian matrix and the gradient of C with respect to u around the estimate u^(k−1). Subsequently, we want:

$$\nabla C(u + \Delta u) = \left.\frac{\partial C(u + \Delta u)}{\partial \Delta u}\right|_{u = u^*} = 0$$

Using the product differentiation rule for matrices and noticing the symmetry of the matrix H_C(u):

$$\begin{aligned}
0 &= \left.\frac{\partial C(u)}{\partial \Delta u}\right|_{u=u^*} + \nabla C\left.\frac{\partial \Delta u}{\partial \Delta u}\right|_{u=u^*} + \tfrac{1}{2}\left.\frac{\partial}{\partial \Delta u}\right|_{u=u^*}\left(\Delta u^T H_C(u)\,\Delta u\right) \\
&= 0 + \nabla C + \tfrac{1}{2}\left((H_C(u)\,\Delta u)^T\left.\frac{\partial \Delta u}{\partial \Delta u}\right|_{u=u^*} + \Delta u^T\left.\frac{\partial}{\partial \Delta u}\right|_{u=u^*}(H_C(u)\,\Delta u)\right) \\
&= \nabla C + \tfrac{1}{2}\left(\Delta u^T H_C(u)^T + \Delta u^T H_C(u)\right) \\
&= \nabla C + \Delta u^T H_C(u)
\end{aligned}$$

Therefore, by taking the transpose of the previous expression and noticing that Δu is the difference between the controller estimate and the one we want to converge toward,

$$0 = (\nabla C)^T + H_C(u)\,\Delta u \iff -(\nabla C)^T = H_C(u)\left(u^{(k)} - u^{(k-1)}\right)$$

The iterative process of the Newton-Raphson method can then be computed using the following expression:

$$u^{(k)} = u^{(k-1)} - H_C(u)^{-1}(\nabla C)^T \tag{2.1}$$

However, let us point out that the Newton-Raphson algorithm can diverge away from the root if the estimate gets too close to an inflection point. That is why the value of C can increase at times, though it decreases overall. To counteract this problem, (2.1) was slightly modified:

$$u^{(k)} = u^{(k-1)} - \alpha\,H_C(u)^{-1}(\nabla C)^T \tag{2.2}$$

A step size α was introduced to prevent the algorithm from either taking too much time to converge or indefinitely leaping over the actual minimum of the cost function without ever


converging. Therefore, if sized suitably, α prevents this from happening. This idea is depicted in figure 7. In order to scale α properly, a line search algorithm was used.

Figure 7: Illustration of the Newton-Raphson method, showing the effect of an undersized, an oversized and a suitably chosen step size α starting from an initial estimate u^(0).
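A scalar sketch of the damped update (2.2) may make the role of α concrete (the report applies the update to u ∈ R⁶ with a numerically estimated gradient and Hessian; here both are analytic for a toy cost, and the names are illustrative):

```python
def damped_newton(grad, hess, u0, alpha=1.0, tol=1e-13, n_max=200):
    """Scalar analogue of update (2.2): u <- u - alpha * H^{-1} * gradient.
    alpha = 1 recovers the plain Newton-Raphson update (2.1); a smaller
    alpha damps the step when the full update overshoots."""
    u = u0
    for _ in range(n_max):
        step = alpha * grad(u) / hess(u)
        u -= step
        if abs(step) < tol:   # controller estimate stagnates
            break
    return u

# Minimize C(u) = (u - 3)**4, whose gradient is 4(u-3)^3 and
# Hessian 12(u-3)^2; the minimizer is u* = 3.
u_min = damped_newton(lambda u: 4 * (u - 3) ** 3,
                      lambda u: 12 * (u - 3) ** 2,
                      u0=0.0)
```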


    2.3 Newton-Raphson Algorithm

In order for the algorithm to stop even if the desired results are not attained, we defined Nmax as the maximum number of iterations permitted before the iterative process is halted. Fixing Nmax to 50 seemed to be a reasonable limit. Ideally, we want the algorithm to stop if the controller estimate stagnates or if the algorithm converges on a critical point (which is really rare). This is why the condition for the algorithm to stop is ||C(u^(k), x^(k−1))|| < ε1 or ||u^(k) − u^(k−1)|| < ε2. The results were obtained using ε1 = ε2 = 10^−13 (ε1 and ε2 were determined empirically).

Algorithm 1 Newton-Raphson Algorithm
1: Input: u^(1), x*
2: Output: u^(k), x^(k)
3: for k = 2 → Nmax do
4:   Determine α using the Armijo rule.
5:   u^(k) = u^(k−1) − αH_C(u)^(−1)(∇C)^T
6:   Compute l(u^(k), x*) and save the state space vector x^(k) corresponding to the step taken from x* with the controller u^(k).
7:   if C(u^(k), x*) < ε1 then
8:     print "Converged on critical point"
9:   end if
10:  if ||u^(k) − u^(k−1)|| < ε2 then
11:    print "Converged on a controller u"
12:    return (u^(k), x^(k))
13:  end if
14: end for

The Armijo rule is a line search ensuring that the cost function decreases from one step to another and that the decrease is significant enough. Basically, α is initialised to 1, which gives a first C(u^(k), x*) that is compared to the previous cost using the following test:

$$C(u^{(k-1)}, x^*) - C(u^{(k)}, x^*) < \sigma\,\beta^m s\,\nabla C\,H_C(u)^{-1}(\nabla C)^T \tag{2.3}$$


where β = 1/3, σ = 10^−3, m = 0 and s = 1 (see [7]). As long as (2.3) is not satisfied, m is incremented. Once the test is satisfied, the actual value of u^(k) is computed. The way the gradient and the Hessian matrix are computed is explained in the appendix.
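For concreteness, a backtracking search in the spirit of the Armijo rule, using the report's constants β = 1/3, σ = 10⁻³ and s = 1; the sketch below is written in the standard sufficient-decrease form of the cited reference [7] rather than transcribing test (2.3) literally, and the names are illustrative:

```python
def armijo_alpha(C, u, d, slope, s=1.0, beta=1/3, sigma=1e-3, m_max=30):
    """Backtracking line search: try alpha = beta^m * s for m = 0, 1, ...
    and accept the first alpha whose decrease C(u) - C(u - alpha*d)
    reaches the threshold sigma * alpha * slope, where d is the Newton
    direction H^{-1} * gradient and slope = gradient * d."""
    c0 = C(u)
    for m in range(m_max):
        alpha = (beta ** m) * s
        if c0 - C(u - alpha * d) >= sigma * alpha * slope:
            return alpha
    return (beta ** m_max) * s  # fall back to the smallest step tried

# For C(u) = u^2 at u = 1, the full Newton step (alpha = 1) already
# decreases the cost enough, so it is accepted immediately.
alpha = armijo_alpha(lambda u: u * u, u=1.0, d=1.0, slope=2.0)
```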

    2.4 Newton-Raphson Iterative Process

The reason why this iterative process is more intricate than it seems is due to the two inputs of the cost function. Indeed, the Newton-Raphson method is usually used with a cost function with only one input. Therefore, the order in which the different vectors (u, x, ...) are updated is of paramount concern.

Let us recall that the Newton-Raphson algorithm outputs a controller u minimizing the cost function C(u, x). The goal is to find a controller associated with a stable fixed point, i.e. to reduce the difference in the state space vector from one step to the next, while obtaining the specific features characterized in the desired parameters p_d. In other words, it is to minimize the difference between the vectors l = (p^(k) x^(k))^T and l_d = (p_d x^(k−1))^T. However, the Newton-Raphson algorithm can only optimize the cost function in terms of the vector u, thus only minimizing the difference l − l_d from one step to the next. If the objective is to find a controller associated with a fixed point, updating the current state from which the step is taken is crucial. With that in mind, we used the following approach.

First we choose an estimate of the controller u^(1) = u_est and the corresponding fixed point x^(1) = x_est. We can then define l_d as l_d = (p_d x^(1))^T. The Newton-Raphson algorithm then finds the controller u^(2) minimizing the difference between the current state space vector with the corresponding parameters p and that of the previous step with the desired fixed parameters p_d. Once u^(2) is obtained, the fixed point associated with that controller can be computed². The weights associated with the squared differences are updated at that time as well. The process is then reiterated with x^(2) and u^(2), or more generally x^(k) and u^(k), until the desired precision is attained, which is to say until C(u^(i+1), x^(i+1)) ≤ ε, where ε (10^−2 in our case) was chosen empirically. This process is depicted in figure 8.

²The search for a fixed point from a pair (u, x) was done using the function lsqnonlin.m of the software MATLAB.


Figure 8: Newton-Raphson iterative process: (1) run Newton-Raphson on (u^(i), x^(i)) to obtain u^(i+1); (2) find the fixed point x* of the new controller and set x^(i+1) ← x*, updating W ← W(u^(i+1), x^(i+1)); (3) save u^(i+1) and x^(i+1) as estimates for the next iteration, with i ← i + 1.

Algorithm 2 Newton-Raphson iterative process
1: Input: u(1), x*
2: Output: uoptimal, xoptimal
3: (u(2), x) ← Newton-Raphson(u(1), x*)
4: Find the fixed point corresponding to (u(2), x) and save it as x(2)
5: i ← 1
6: while C(u(i+1), x(i+1)) > ε do
7:     i ← i + 1
8:     (u(i+1), x) ← Newton-Raphson(u(i), x(i))
9:     Find the fixed point corresponding to (u(i+1), x) and save it as x(i+1)
10:    Update W in the definition of the cost function C
11: end while
12: (uoptimal, xoptimal) ← (u(i+1), x(i+1))
13: return (uoptimal, xoptimal)
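The iterative process above can be sketched in code. The helper functions `newton_raphson_step`, `find_fixed_point`, and `cost` are hypothetical stand-ins for the report's Newton-Raphson update, the lsqnonlin-based fixed-point search, and the cost C(u, x); only the structure of the loop follows Algorithm 2.

```python
def search_controller(u_est, x_est, newton_raphson_step, find_fixed_point, cost,
                      eps=1e-2, max_iters=50):
    """Sketch of Algorithm 2 (all three callables are hypothetical stand-ins).

    newton_raphson_step(u, x) -> u_new : one Newton-Raphson update of the controller
    find_fixed_point(u, x)    -> x_fix : fixed point of the return map for controller u
    cost(u, x)                -> float : the cost C(u, x)
    """
    u, x = u_est, find_fixed_point(u_est, x_est)
    for _ in range(max_iters):
        u = newton_raphson_step(u, x)   # minimize C w.r.t. u from the current fixed point
        x = find_fixed_point(u, x)      # re-anchor the target state on the new controller
        if cost(u, x) <= eps:           # stop once the desired precision is attained
            break
    return u, x
```

The `max_iters` cap is our own safeguard; the report's loop stops purely on the cost threshold ε.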

    2.5 Results

Using the algorithm, we were able to find controllers within 2 cm of the desired step length after several trials. The desired step lengths of the controllers we were able to build range from −20 cm to +20 cm, in increments of 5 cm. Half of the controllers have a negative step length because of the way the step length was calculated: let df be the distance of the front foot from the start before the step, and dr the distance of the rear foot from the start after the step. The step length is then defined as

L = dr − df.

All the controllers found this way were associated with stable fixed points, which was surprising, as nothing in the algorithm guarantees the stability of the fixed point. In total, 9 stable controllers were successfully built and used to mesh the reachable state space of the system, which was then applied to map the terrain and find a path using the A* algorithm, as the next part will explain in more detail.


Part III

Behavioral Policy and Discretization of the Reachable State Space

The ability to quickly switch from one particular bounding motion to another, and the configuration in which the system should find itself when the switching occurs, are open issues in legged robotics. Indeed, in most cases the configuration of the robot will depend on its configuration at the previous step x_{n−1} and on the controller that was used at that step u_{n−1}.

Subsequently, switching from one controller to another is a problem on several levels. Firstly, the state of the next step after switching controllers is most likely far from the limit cycle x* associated with the new controller. Secondly, even if the controllers ui and uj are associated with dynamically stable gaits, switching from one to the other will oftentimes make the robot fall. As a result, to enable the robot to switch controllers, it has to have information about the next step, provided that it already knows the new controller. This is where the idea of meshing the reachable state space comes from; it will be developed in the first section.

The second section will address the problem of using the information of the reachable state space and the ability to switch controllers in order to overcome obstacles. The problem can be reduced to a path finding problem and was solved using the A* algorithm.

    1 Meshing algorithm

    1.1 Meshing

The methodology of the meshing algorithm implemented was the following. The incentive of this approach was to gather information about the behaviour of the system in every possible configuration (every state x, every controller u). In order to apply Markov chain theory to the system, we need to build the deterministic state transition matrix M for each controller, such that

M_{i,j,k} = L_{i,j,k} if P(x_i, u_k) = x_j, and 0 otherwise,

where L_{i,j,k} is the step length of the step corresponding to the equation P(x_i, u_k) = x_j. M is a three-dimensional matrix called the state transition map; this matrix is deterministic.

The actual algorithm used (see algorithm 3) originates from [18], [15]. The idea is to initialize the algorithm by defining the set of state vectors X and the set of controllers U, and to simulate a single step for each state vector x ∈ X and each controller u ∈ U in order to know whether the step was a success or a failure (recording L_{i,j,k} or 0 in the state transition map), building the matrix M this way and adding the newly generated vectors to the initial set X.

In the process, the different states must be distinguished so as not to compute the same state vector twice. To achieve this, we used the weighted Euclidean distance d(a, B) defined as

d(a, B) := min_{b∈B} √( Σ_{i=1}^{length(a)} (a_i − b_i)² / σ_i² )    (1.1)

where σ_i is the standard deviation of the i-th component over the set B. The distance d(a, B) measures the remoteness of the vector a from the set of vectors B. For every new vector that is yet to be tested, its remoteness from the set of vectors already computed is calculated, and depending upon this result, it will or will not be added to the existing set of vectors. When the distance between the current vector and the set X is below the threshold d_thr, the current vector is approximated by the nearest vector in X so as to determine where the new vector is located in X. To do so, the following function is defined:

c(a, B) := argmin_{b∈B} √( Σ_{i=1}^{length(a)} (a_i − b_i)² / σ_i² )    (1.2)

This approximation allows us to identify the state vector onto which the current vector loops back and to store the information in the state transition map M.
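A minimal sketch of the distance (1.1) and nearest-vector (1.2) computations, assuming the set B is stored as the rows of a matrix and sigma holds the per-component standard deviations:

```python
import numpy as np

def weighted_dist(a, B, sigma):
    """Distances from the vector a to every row of B, component-wise
    normalized by the standard deviations sigma (inner term of Eq. 1.1)."""
    return np.sqrt((((a - B) / sigma) ** 2).sum(axis=1))

def d(a, B, sigma):
    """d(a, B): remoteness of a from the set of vectors B (Eq. 1.1)."""
    return weighted_dist(a, B, sigma).min()

def c(a, B, sigma):
    """c(a, B): index of the nearest vector in B (Eq. 1.2)."""
    return int(weighted_dist(a, B, sigma).argmin())
```

Returning an index from `c` (rather than the vector itself) is our choice; it makes the snap-to-nearest step of the meshing direct.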


In algorithm 3, the sets of vectors X and U are ordered such that x_i and u_k refer respectively to the i-th vector of X and the k-th vector of U.

Algorithm 3 Meshing Algorithm
1: Input: X, U, d_thr
2: Output: M
3: M is initialized with zeros in the 3 dimensions
4: Q_next ← X
5: while Q_next not empty do
6:     Q_current ← Q_next
7:     empty Q_next
8:     for each x_i ∈ Q_current do
9:         for each u_k ∈ U do
10:            Simulate one step and save the state vector x_j thus generated
11:            if the step was a success then
12:                if d(x_j, X) > d_thr then
13:                    j ← Card(X) + 1
14:                    Add x_j to X
15:                    Add x_j to Q_next
16:                    One row and one column of zeros are added to M along the third dimension
17:                else
18:                    x_n ← c(x_j, X)
19:                    j ← n
20:                end if
21:                M(i, j, k) ← L_{i,j,k}
22:            else
23:                M(i, 1, k) ← −1
24:            end if
25:        end for
26:    end for
27: end while
28: return M

In order to manage the new state vectors generated in the process, two temporary sets of vectors Q_current and Q_next are introduced, holding respectively the vectors to be tested during the current iteration and the vectors to be tested during the next iteration. If no vector needs to be tested during the next iteration, the meshing is complete and the state transition map M is returned.
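The meshing loop can be sketched as follows. `simulate_step` is a hypothetical stand-in for the one-step simulation of the bounding model, and M is stored as a sparse dictionary rather than a dense three-dimensional array (an implementation choice of this sketch, not the report's); the queue handling follows Algorithm 3.

```python
import numpy as np

def mesh(X, U, d_thr, sigma, simulate_step):
    """Sketch of Algorithm 3. X: list of state vectors; U: list of controllers.
    simulate_step(x, u) -> (success, x_next, step_length) is a stand-in for the
    one-step simulation of the robot. M is stored sparsely as {(i, j, k): L}."""
    X = [np.asarray(x) for x in X]
    M = {}
    q_next = list(range(len(X)))
    while q_next:
        q_current, q_next = q_next, []
        for i in q_current:
            for k, u in enumerate(U):
                ok, x_j, L = simulate_step(X[i], u)
                if not ok:
                    M[(i, 0, k)] = -1.0      # failure sentinel
                    continue
                dists = np.sqrt((((x_j - np.array(X)) / sigma) ** 2).sum(axis=1))
                if dists.min() > d_thr:      # genuinely new state: add it to the mesh
                    j = len(X)
                    X.append(x_j)
                    q_next.append(j)
                else:                        # snap to the nearest existing state
                    j = int(dists.argmin())
                M[(i, j, k)] = L
    return X, M
```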


    2 Path Finding

    2.1 Introduction to the A* Algorithm

The A* algorithm is one of the most important algorithms for shortest-path problems. It finds the shortest path through a given configuration by minimizing a cost function defined as the sum of a heuristic function and a movement cost function:

f = h + g    (2.1)

where f is the cost function, and h and g are respectively the heuristic function, commonly representing the distance from a node to the end goal, and the movement cost function, representing the cost necessary to move from one node to another.
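For illustration, a generic A* search minimizing f = g + h might look as follows; the graph is abstracted by a hypothetical `neighbors` function, and nothing here is specific to the bounding robot:

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Generic A* over a graph given by neighbors(n) -> [(next_node, move_cost), ...],
    minimizing f = g + h as in Eq. (2.1). Returns the list of nodes on the best path,
    or None if the goal is unreachable."""
    open_heap = [(h(start), 0.0, start, [start])]
    best_g = {start: 0.0}
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path
        for nxt, cost in neighbors(node):
            g_new = g + cost
            if g_new < best_g.get(nxt, float("inf")):
                best_g[nxt] = g_new
                heapq.heappush(open_heap, (g_new + h(nxt), g_new, nxt, path + [nxt]))
    return None
```

Carrying the path inside the heap entries keeps the sketch short; a parent-pointer table, as in the report's node structure, is the more memory-efficient choice.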

    2.2 Applying the A* Algorithm

The obstacles used in this particular study were gaps with specific sizes, separated by specific distances, both of which were randomly generated. The sizes and the distances are assumed to follow Gaussian distributions such that

P{D = d} = exp(−½((d − µ_d)/σ_d)²) / (σ_d √(2π)),

P{W = w} = exp(−½((w − µ_w)/σ_w)²) / (σ_w √(2π)),

where w is the width of a gap and d is the distance between the end of one gap and the beginning of the next one. µ_d, σ_d and µ_w, σ_w are respectively the mean and standard deviation of the distance between two gaps, and the mean and standard deviation of the width of a gap. Those values were empirically determined so that any gap can theoretically be overcome by the system if the conditions are ideal.
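A terrain of this kind can be drawn as follows; the parameter values in the usage line are placeholders, not the ones used in the study, and the clipping of negative samples is our own addition to keep the terrain physical:

```python
import numpy as np

def sample_terrain(n_gaps, mu_d, sigma_d, mu_w, sigma_w, rng=None):
    """Draw a random terrain: gap widths W ~ N(mu_w, sigma_w^2) and inter-gap
    distances D ~ N(mu_d, sigma_d^2). Returns a list of
    (gap_location, gap_width) pairs measured from the start."""
    if rng is None:
        rng = np.random.default_rng()
    gaps, position = [], 0.0
    for _ in range(n_gaps):
        d = max(rng.normal(mu_d, sigma_d), 0.0)  # clip: distances cannot be negative
        w = max(rng.normal(mu_w, sigma_w), 0.0)  # clip: widths cannot be negative
        position += d
        gaps.append((position, w))
        position += w
    return gaps

# Example with placeholder parameters (metres), seeded for reproducibility.
terrain = sample_terrain(5, mu_d=1.0, sigma_d=0.2, mu_w=0.15, sigma_w=0.03,
                         rng=np.random.default_rng(0))
```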

In this study, the terrain used is a terrain with gaps. In the prospect of reducing the computational cost of mapping the nodes, the A* algorithm is used only a few steps before the gap. The A* algorithm was used both to reduce the number of steps before leaping over a gap and to increase the minimum distance between the rear foot and the gap and between the front foot and the gap. Thus, let us define dr as the distance of the rear foot from the start, df as the distance of the front foot from the start and dg as the distance of the gap from the start. Subsequently, we can define

    drg = |dr − dg| (2.2)

    dfg = |df − dg| (2.3)

In other words, min(drg, dfg) has to be as great as possible when the robot is close to a gap. Thus, the heuristic and movement cost functions were chosen as

h(df) = 1/df    (2.4)

g(drg, dfg) = 1/min(drg, dfg)    (2.5)

Mapping the reachable state space in terms of nodes, using the state transition map, was also an important part of the A* algorithm. In the process of mapping, the structure of a node had to be chosen. The mapping was done as depicted in figure 9.

Figure 9: Mapping and structure of a node. (Each node stores: its number, dr, df, the state x, the controller u, ω, the precomputed heuristic value h(df), a pointer to its parent node, and an open list of its child nodes.)

It is worth noting that mapping this way has the advantage of offering the choice not to build irrelevant nodes. In this study, only the reachable nodes are built, i.e. the nodes where the robot did not fall. If switching from one controller to another leads to the fall of the robot, the corresponding node is discarded. Likewise, the mapping discards the nodes for which the system is in a gap, which means that either dr ∈ [Gap.location, Gap.location + Gap.width] or df ∈ [Gap.location, Gap.location + Gap.width]. In our case, the heuristic value associated with each node can be precomputed; the movement cost function, however, is computed within the A* algorithm (see algorithm 4).
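The node structure of figure 9 could be represented as follows; the field names are ours, and `in_gap` sketches the discarding rule described above:

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class Node:
    """One node of the A* map, mirroring the fields of figure 9."""
    number: int        # index of the node in the map
    dr: float          # rear-foot distance from the start
    df: float          # front-foot distance from the start
    x: np.ndarray      # state vector at this step
    u: np.ndarray      # controller used to reach this state
    h: float           # precomputed heuristic value 1/df
    pointer: int       # index of the parent node (for backtracking)
    open_list: List[int] = field(default_factory=list)  # reachable child nodes

def in_gap(node, gap_location, gap_width):
    """A node is discarded when either foot lands inside the gap."""
    return any(gap_location <= dist <= gap_location + gap_width
               for dist in (node.dr, node.df))
```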

    2.3 A* Algorithm Pseudo Code

Because of computational cost considerations, the A* algorithm starts when the robot is close to the gap and stops after the rear leg is on the other side of the gap. Indeed, the number of nodes grows exponentially with the number of steps the system takes before jumping.

Algorithm 4 A* Algorithm Pseudo Code
1: Input: Node(1), Gap
2: Output: path
3: Using the state transition map, the position of the Gap and Node(1), all the possible nodes are built
4: i ← 1
5: while Node(i).dr < Gap.location + Gap.width do
6:     if the open list of Node(i) is not empty then
7:         Compute the cost function f of each node in the open list of Node(i) and store the controller u corresponding to the node minimizing f into path
8:     else
9:         The controller u leading to this node is discarded from path
10:        i ← the number of the node from which Node(i) comes (its pointer)
11:        The node that was examined is discarded from the open list of Node(i)
12:    end if
13: end while

In other words, the algorithm goes forward every time a node has other nodes in its open list, and goes backward every time its open list is empty, discarding the node it came from. Therefore, if a path to the other side of the gap exists, it will be found. The index of a controller in the variable path then corresponds to the number of steps taken by the system since the A* algorithm was started; in other words, u_i = path(i), where i is that number of steps.

    2.4 Results and Discussion

After a large number of simulations, the system was shown to perform well on gaps approximately 20 centimetres long. There are a number of reasons why the algorithm is flawed and could be enhanced. First and foremost, the density of states of the meshing from which the mapping results is neither infinite nor uniform. Subsequently, the difference between the state the system refers to through the mapping and the actual state will be non-zero, and will fluctuate depending on the state and the controller.

Secondly, the heuristic and movement cost functions could be improved by taking other parameters into account and by studying the compromise between the number of steps taken and how far the robot is from a gap before and after the jump. Indeed, the fewer steps the robot takes, the greater the risk of falling into a gap.


Thirdly, the algorithm was used only one step before jumping in order to keep this particular simulation short. As a result, the system had little room for manoeuvre. If the algorithm is to be used in real time, however, the path has to be computed only a few steps before jumping. This issue could be counteracted in two ways. The computation of the nodes could be made faster by building only the nodes that lie just after the gap: this would consist of looking at every node at the very end of every branch that is still expanding, verifying that dr is greater than the end of the gap, and stopping the expansion if it is the case. Another way to use the algorithm in real time would be to look at the state the robot is going to be in n steps before the jump, but computing this state m steps before the jump (with m > n) using the state transition matrix. The drawback of this method is that the meshing error accumulates with how far the robot looks ahead, but it would allow more time for the computation and thus more room for manoeuvre during the last n steps. A possible future work would be to quantify that compromise in order to find the best combination of look-ahead steps m and room for manoeuvre n.


    Part IV

    Conclusion and Future Work

1 Gradient Optimization: Applications and Improvements

The Newton-Raphson-type gradient descent described in Part II would become an even more powerful tool if not just the step length but several features of the desired gait were included in the algorithm. This would consist of merely extending the vector p, which in our case was a scalar.

The Newton-Raphson gradient descent yielded satisfactory end results. In our case, several trials were needed before the algorithm converged toward a controller associated with a gait featuring a step length within 2 cm of the desired step length. Those results were obtained no matter how far the initial step length was from the desired step length (frequently the initial error was approximately 30 cm). The way the Newton-Raphson algorithm was modified allows the local search for controllers to be virtually extended to a global search, thus making it less dependent upon the initial conditions. We plan to study the benefits of using different functions to adjust the weights of the cost function throughout the optimization, in the prospect of making the algorithm more reliable and less sensitive to initial conditions.

It is worth mentioning that, to ensure the algorithm does not converge toward a controller making the system fall, the minimization problem could be converted into a maximization problem, associating a value of C2 = 0 with a system failure. This way, there is no smaller value that the objective can take, and the algorithm is guaranteed to avoid it. In our case (keeping the cost function as a function to be minimized), C2 was merely chosen to be 10⁵, because it was empirically determined that C(u, x) could not take a value greater than 10⁵ without the system failing.
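As a sketch, the penalized cost (with the empirical failure value C2 = 10⁵; the argument names here are ours, not the report's) might read:

```python
import numpy as np

# C2: empirically, C(u, x) never exceeded 1e5 on a successful step,
# so this value safely dominates every non-failing outcome.
C2 = 1e5

def penalized_cost(l, l_d, W, step_succeeded):
    """Weighted squared-error cost with the failure penalty C2 (illustrative)."""
    if not step_succeeded:
        return C2
    e = l - l_d
    return float(e @ W @ e)
```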


    2 Possible Use of the Meshing Algorithm

There is a wide range of problems in which the meshing algorithm could be used. In addition to using it as a way to overcome obstacles, it could be used to make a system attain a desired limit cycle in the fewest steps possible. It could also be used to calculate the expected number of steps before the robot falls, by considering the problem of dynamic stability as a Markov chain problem and adopting the MFPT (mean first-passage time) metastability metric, which intuitively captures and quantifies the overall stability and robustness of the system over a variety of terrains [13].

    3 Closing Remarks

The field of dynamic locomotion is becoming more and more concerned with the ability of robots to be agile, and thus with controlling the robot from step to step in a very fast manner. Meshing the reachable state space, in the prospect of predicting beforehand the state the robot is going to be in, turned out to be one of the most useful and effective tools for doing so.

Applying the meshing to a basic A* algorithm was an intuitive and simple application; however, the applications of that idea are much broader. The main advantage of the meshing is that it is tractable and can be adapted to any number of terrain specificities, provided that the robot is able to manage a large amount of data: different kinds of slopes, different stair steps, and virtually any terrain profile that could be thought of.


    Appendix A: Mathematical Definitions and Properties

    Appendix A.1: Definitions

Definition: Gradient
Let f be a scalar field, f : Rⁿ → R. The gradient is a vector operator which can be applied to a scalar field only; it extends the differential operator to a scalar field. It is defined as

∇f = ∂f/∂x = ( ∂f/∂x₁ ··· ∂f/∂xₙ )

Definition: Jacobian matrix
Let f be a vector field, f : Rⁿ → Rⁿ. The definition of the gradient of a function can be extended to a vector function. The result is a matrix called the Jacobian matrix of f with respect to x. It is defined as

J_f(x) = ∂f/∂x = ( ∂f/∂x₁ ··· ∂f/∂xₙ ) =
[ ∂f₁/∂x₁ ··· ∂f₁/∂xₙ
     ⋮     ⋱     ⋮
  ∂fₙ/∂x₁ ··· ∂fₙ/∂xₙ ]

By abuse of notation, the Jacobian matrix is often written as ∇f.

Definition: Hessian matrix
Let f be a scalar field, f : Rⁿ → R. The Hessian matrix is defined as

H_f(x) = J((∇f)ᵀ)(x) = ∂(∇f)ᵀ/∂x = ( ∂(∇f)ᵀ/∂x₁ ··· ∂(∇f)ᵀ/∂xₙ ) =
[ ∂²f/∂x₁²    ··· ∂²f/∂xₙ∂x₁
      ⋮        ⋱      ⋮
  ∂²f/∂x₁∂xₙ ··· ∂²f/∂xₙ² ]


Appendix A.2: Taylor series approximations

Taylor series approximation of a scalar function of a vector
Only the second-order approximation is considered here. Let δx be an infinitesimal variation of x and f a scalar field such that f : Rⁿ → R. Then

f(x + δx) ≈ f(x) + ⟨∇f, δx⟩ + ½ ⟨H_f(x) δx, δx⟩
          ≈ f(x) + ∇f δx + ½ δxᵀ H_f(x) δx

Taylor series approximation of a vector function of a vector
Only the first-order approximation is considered here. Let δx be an infinitesimal variation of x and f a vector field such that f : Rⁿ → Rⁿ. Then

f(x + δx) ≈ f(x) + J_f(x) δx


    Appendix B: Gradient and Hessian Matrix Computation

    Appendix B.1: Gradient

To keep the computational cost of the Newton-Raphson algorithm to a minimum, ∇C was computed by taking the analytical derivative of the cost function with respect to the vector u, using the product differentiation rule for matrices, which is proven in [1]:

∂(xᵀy)/∂u = yᵀ ∂x/∂u + xᵀ ∂y/∂u    (B.1)

Identifying x as l(u) − ld and y as W(l(u) − ld), and noticing that the matrix W is diagonal (thus symmetric), the rule can be applied directly to obtain the form used for the computation:

∇C = ∂C/∂u = ∂/∂u [ (l(u) − ld)ᵀ W (l(u) − ld) ]
   = (W(l(u) − ld))ᵀ ∂(l(u) − ld)/∂u + (l(u) − ld)ᵀ ∂(W(l(u) − ld))/∂u
   = (l(u) − ld)ᵀ Wᵀ ∂l(u)/∂u + (l(u) − ld)ᵀ W ∂l(u)/∂u
   = 2 (l(u) − ld)ᵀ W ∂l(u)/∂u
   = 2 (l(u) − ld)ᵀ W J_l(u)

where J_l(u) is the Jacobian matrix of l with respect to u.

Let us recall that computing l requires simulating a step of the system; the number of calls to this function is thus of interest. That is why the Jacobian matrix was calculated as follows:

J_l(u) = ( ∂l/∂u₁ ··· ∂l/∂u₆ )

where

∂l/∂uᵢ = [ l(u₁, ..., uᵢ + du, ..., u₆) − l(u₁, ..., uᵢ − du, ..., u₆) ] / (2 du)

The number of steps required for the computation of ∇C was then limited to 12.
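A sketch of this central-difference Jacobian, with `l` passed in as a black-box function (in the report, each evaluation of l simulates one step of the system, so the 2 × 6 = 12 evaluations are the dominant cost):

```python
import numpy as np

def jacobian_central(l, u, du=1e-4):
    """Central-difference Jacobian of l : R^n -> R^m with respect to u.
    Each column costs two evaluations of l, i.e. 2 * len(u) calls in total
    (12 for the report's 6-dimensional controller)."""
    u = np.asarray(u, dtype=float)
    cols = []
    for i in range(u.size):
        e = np.zeros_like(u)
        e[i] = du
        cols.append((l(u + e) - l(u - e)) / (2 * du))  # i-th column: dl/du_i
    return np.column_stack(cols)
```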


    Appendix B.2: Hessian Matrix

To compute the Hessian matrix, we used the second-order Taylor expansion:

C(u + ∆u) = C(u) + ∇C ∆u + ½ ∆uᵀ H_C(u) ∆u

For simplification purposes, we write ∆u = εeᵢ. Thus,

C(u + εeᵢ) = C(u) + ∇C εeᵢ + ½ eᵢᵀ H_C(u) eᵢ ε²
           = C(u) + (∂C/∂uᵢ) ε + ½ (∂²C/∂uᵢ²) ε²    (B.2)

and by replacing ∆u by εeⱼ we have

C(u + εeⱼ) = C(u) + (∂C/∂uⱼ) ε + ½ (∂²C/∂uⱼ²) ε²    (B.3)

Then, finally, by replacing ∆u by ε(eᵢ + eⱼ):

C(u + ε(eᵢ + eⱼ)) = C(u) + (∂C/∂uᵢ + ∂C/∂uⱼ) ε + ½ (eᵢ + eⱼ)ᵀ H_C(u) (eᵢ + eⱼ) ε²
                  = C(u) + (∂C/∂uᵢ + ∂C/∂uⱼ) ε + ½ (∂²C/∂uᵢ² + ∂²C/∂uⱼ² + 2 ∂²C/∂uᵢ∂uⱼ) ε²    (B.4)

By subtracting equations (B.2) and (B.3) from equation (B.4), the elements of the Hessian matrix can be computed:

∂²C/∂uᵢ∂uⱼ = [ C(u) + C(u + ε(eᵢ + eⱼ)) − C(u + εeᵢ) − C(u + εeⱼ) ] / ε²


Bibliography

[1] R. J. Barnes. Matrix differentiation. Department of Civil Engineering, University of Minnesota, 2014.

    [2] K. Byl. Metastable Legged-Robot Locomotion. PhD thesis, Massachusetts Institute of Tech-nology.

[3] D. G. E. Hobbelen and M. Wisse. In M. Hackel, editor, Humanoid Robots: Human-like Machines. 2007.

    [4] M. H. Raibert. Legged robots that balance. MIT Press, 1986.

    [5] R. L. Tedrake. Applied Optimal Control for Dynamically Stable Legged Locomotion. PhDthesis, Massachusetts Institute of Technology.

    [6] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, 2006.

    [7] D. P. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. Athena Scien-tific, Belmont, Massachusetts, 1996.

    [8] D. Papadopoulos and M. Buehler. Stable running in a quadruped robot with compliantlegs. In IEEE Int. Conf. Robotics and Automation, pages 444–449, 2000.

    [9] K. B. Petersen and M. S. Pedersen. The Matrix Cookbook. 2012.

[10] I. Poulakakis. On the passive dynamics of quadrupedal running. Master's thesis, McGill University, Montreal, Canada, July 2002.

    [11] J. R. Movellan. Matrix recipes, 2006.

[12] R. J. Full, T. Kubow, J. Schmitt, P. Holmes, and D. Koditschek. Quantifying dynamic stability and maneuverability in legged locomotion. In Integr. Comp. Biol., pages 149–157, 2002.

    [13] C. O. Saglam. Tractable Quantification of Metastability for Robust Bipedal Locomotion. PhDthesis, University of California, Santa Barbara.

    Page 37/38

[14] C. O. Saglam and K. Byl. Stability and gait transition of the five-link biped on stochastically rough terrain using a discrete set of sliding mode controllers. IEEE Int. Conf. Robotics and Automation (ICRA), Paper 437, 2013.

    [15] C. O. Saglam and K. Byl. Switching policies for metastable walking. In Proc. IEEE Confer-ence on Decision and Control (CDC), 2013.

    [16] C. O. Saglam and K. Byl. Metastable markov chains. In CDC, 2014.

    [17] C. O. Saglam and K. Byl. Quantifying the trade-offs between stability versus energy use forunderactuated biped walking. In Proc. IROS, 2014.

    [18] C. O. Saglam and K. Byl. Robust policies via meshing for metastable rough terrain walking.In Proc. Robotics: Science and Systems (RSS), 2014.

    [19] C. O. Saglam and K. Byl. Meshing hybrid zero dynamics for rough terrain walking. Ac-cepted for ICRA, 2015.

[20] J. E. Seipel. Analytic-holistic two-segment model of quadruped back-bending in the sagittal plane. Proceedings of the ASME 2011 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, pages 855–861, August 2011.

    [21] J. A. Smith. Galloping, bounding and wheeled-leg modes of locomotion on underactuatedquadrupedal robots. PhD thesis, McGill University.

[22] S. H. Strogatz. Nonlinear Dynamics and Chaos: with Applications to Physics, Biology, Chemistry, and Engineering. 1994.
