

ALGAMES: A Fast Solver for Constrained Dynamic Games

Simon Le Cleac'h, Department of Mechanical Engineering, Stanford University, [email protected]

Mac Schwager, Department of Aeronautics & Astronautics, Stanford University, [email protected]

Zachary Manchester, Department of Aeronautics & Astronautics, Stanford University, [email protected]

Abstract—Dynamic games are an effective paradigm for dealing with the control of multiple interacting actors. This paper introduces ALGAMES (Augmented Lagrangian GAME-theoretic Solver), a solver that handles trajectory optimization problems with multiple actors and general nonlinear state and input constraints. Its novelty resides in satisfying the first-order optimality conditions with a quasi-Newton root-finding algorithm and rigorously enforcing constraints using an augmented Lagrangian formulation. We evaluate our solver in the context of autonomous driving on scenarios with a strong level of interaction between the vehicles. We assess the robustness of the solver using Monte Carlo simulations. It is able to reliably solve complex problems like ramp merging with three vehicles three times faster than a state-of-the-art DDP-based approach. A model predictive control (MPC) implementation of the algorithm demonstrates real-time performance on complex autonomous driving scenarios with an update frequency higher than 60 Hz.

I. INTRODUCTION

Controlling a robot in an environment where it interacts with other agents is a complex task. Traditional approaches in the literature adopt a predict-then-plan architecture. First, predictions of other agents' trajectories are computed, then they are fed into a planner that considers them as immutable obstacles. This approach is limiting because the effect of the robot's trajectory on the other agents is ignored. Moreover, it can lead to the "frozen robot" problem that arises when the planner finds that all paths to the goal are unsafe [1]. It is therefore crucial for a robot to simultaneously predict the trajectories of other vehicles on the road while planning its own trajectory, in order to capture the reactive nature of all the agents in the scene. ALGAMES provides such a joint trajectory predictor and planner by considering all agents as players in a Nash-style dynamic game. We envision ALGAMES as being run online by a robot in a receding horizon loop, at each iteration planning a trajectory for the robot by explicitly accounting for the reactive nature of all agents in its vicinity.

Joint trajectory prediction and planning in scenarios with multiple interacting agents is well described by a dynamic game. Dealing with the game-theoretic aspect of multi-agent planning problems is a critical issue that has a broad range of applications. For instance, in autonomous driving, ramp merging, lane changing, intersection crossing, and overtaking maneuvers all comprise some degree of game-theoretic interactions [2, 3, 4, 5, 6, 7]. Other potential applications include mobile robots navigating in crowds, like package delivery robots, tour guides, or domestic robots; robots interacting with people in factories, such as mobile robots or fixed-base multi-link manipulators; and competitive settings like drone and car racing [8, 9].

Fig. 1. A ramp merging problem is solved using a receding horizon implementation of ALGAMES. Each time the receding-horizon plan is updated, we add a colored dot representing the position of each vehicle at the time of the update. In this simulation, ALGAMES has been called 151 times in 3 seconds to find receding horizon Nash equilibrium trajectories. ALGAMES generates seemingly natural trajectories that account for the individual objectives of all players while sharing the responsibility of avoiding collisions.

In this work, we seek solutions to constrained multi-player general-sum dynamic games. In dynamic games, the players' strategies are sequences of decisions. It is important to notice that, unlike traditional optimization problems, non-cooperative games have no "optimal" solution. Depending on the structure of the game, asymmetry between players, etc., different concepts of solutions are possible. In this work, we search for Nash equilibrium solutions. This type of equilibrium models symmetry between the players; all players are treated equally. At such equilibria, no player can reduce its cost by unilaterally changing its strategy. For extensive details about the game-theory concepts addressed in this paper, we refer readers to the work of Bressan [10] and Basar et al. [11].

Our solver is aimed at finding a Nash equilibrium for multi-player dynamic games, and can handle general nonlinear state and input constraints. This is particularly important for robotic applications, where the agents often interact through their desire to avoid collisions with one another or with the environment. Such interactions are most naturally represented as (typically nonlinear) state constraints. This is a crucial feature that sets game-theoretic methods for robotics apart from game-theoretic methods in other domains, such as economics, behavioral sciences, and robust control. In these domains, the agent interactions are traditionally represented in the objective functions themselves, and these games typically have no state or input constraints. In the mathematics literature, Nash equilibria with constraints are referred to as Generalized Nash Equilibria [12]. Hence, in this paper we present an augmented Lagrangian solver for finding Generalized Nash Equilibria specifically tailored to robotics applications.

Our solver assumes that players are rational agents acting to minimize their costs. This rational behavior is formulated using the first-order necessary conditions for Nash equilibria, analogous to the Karush-Kuhn-Tucker (KKT) conditions in optimization. By relying on an augmented Lagrangian approach to handle constraints, the solver is able to solve multi-player games with several agents and a high level of interaction at real-time speeds. Finding a Nash equilibrium for 3 autonomous cars in a freeway merging scenario takes 90 ms. Our primary contributions are:

1) A general solver for dynamic games aimed at identifying Generalized Nash Equilibrium strategies.

2) A real-time MPC implementation of the solver able to handle noise, disturbances, and collision constraints (Fig. 1).

3) A comparison with iLQGames [4] on speed. ALGAMES finds Nash equilibria 3 times faster than iLQGames for a fixed constraint satisfaction criterion.

II. RELATED WORK

A. Equilibrium Selection

Recent work focused on solving multi-player dynamic games can be categorized by the type of equilibrium it selects. Several works [2, 3, 9, 13] have opted to search for Stackelberg equilibria, which model an asymmetry of information between players. These approaches are usually formulated for games with two players, a leader and a follower. The leader chooses its strategy first, then the follower selects the best response to the leader's strategy. Alternatively, a Nash equilibrium does not introduce hierarchy between players; each player's strategy is the best response to the other players' strategies. As pointed out in [6], searching for open-loop Stackelberg equilibrium strategies can fail on simple examples. In the context of autonomous driving, for instance, when players' cost functions only depend on their own state and control trajectories, the solution becomes trivial. The leader ignores mutual collision constraints and the follower has to adapt to this strategy. This behavior can be overly aggressive for the leader (or overly passive for the follower) and does not capture the game-theoretic nature of the problem, see Figure 2.

Fig. 2. Superimposed sequence of images depicting the trajectories obtained by solving for open-loop Stackelberg equilibrium strategies. The slow blue vehicle is the leader and cuts in front of the fast red vehicle, which is the follower. This example illustrates the fact that the leader always has the "right of way" over the follower in any situation.

Nash equilibria have been investigated in [4, 5, 8, 14, 15, 16, 17]. We also take the approach of searching for Nash equilibria, as this type of equilibrium seems better suited to symmetric, multi-robot interaction scenarios. Indeed, we have observed more natural behavior emerging from Nash equilibria compared to Stackelberg equilibria when solving for open-loop strategies.

B. Game-Theoretic Trajectory Optimization

Most of the algorithms proposed in the robotics literature to solve for game-theoretic equilibria can be grouped into four types. First are algorithms aimed at finding Nash equilibria that rely on decomposition, such as Jacobi or Gauss-Seidel methods [8, 14, 18]. These algorithms are based on an iterative best response scheme in which players take turns improving their strategies while considering the other agents' strategies as immutable. This type of approach is easy to interpret and scales reasonably well with the number of players. However, convergence of these algorithms is not well understood [12], and special care is required to capture the game-theoretic nature of the problem [8]. Moreover, solving for a Nash equilibrium until convergence can require many iterations, each of which is a (possibly expensive) trajectory optimization problem. This can lead to prohibitively long solution times.

Second, there are a variety of algorithms based on dynamic programming. In [6], a Markovian Stackelberg strategy is computed via dynamic programming. This approach seems to capture the game-theoretic nature of autonomous driving. However, dynamic programming suffers from the curse of dimensionality, and therefore practical implementations rely on simplified dynamics models coupled with coarse discretization of the state and input spaces. To counterbalance these approximations, a lower-level planner informed by the state values under the Markovian Stackelberg strategy is run. This approach, which scales exponentially with the state dimension, has only been demonstrated in a two-player setting. Adding more players is likely to prevent real-time application of this algorithm. In contrast, our proposed approach scales polynomially with the number of players (see Section IV-E).

Third, algorithms akin to differential dynamic programming have been developed for robust control [19] and later applied to game-theoretic problems [4, 15]. This approach scales polynomially with the number of players and is fast enough to run in real time in a model-predictive control (MPC) fashion [4]. However, this type of approach does not natively handle constraints. Collision-avoidance constraints are typically handled using large penalties that can result in numerical ill-conditioning which, in turn, can impact the robustness or the convergence rate of the solver. Moreover, it leads to a trade-off between trajectory efficiency and avoiding collisions with other players.

Finally, algorithms that are analogous to direct methods in trajectory optimization have also been developed [16, 17]. An algorithm based on a first-order splitting method was proposed by Di [17] that is known to have a linear convergence rate. The experiments presented with this work show convergence of the algorithm after typically $10^3$ to $10^4$ iterations. A different approach based on Newton's method has been proposed [16], but it is restricted to unconstrained dynamic games. Our solver belongs to this family of approaches. It also relies on a second-order Newton-type method, but it is able to handle general state and control input constraints. In addition, we demonstrate convergence on relatively complex problems in typically fewer than $10^2$ iterations.

C. Generalized Nash Equilibrium Problems

As mentioned above, we focus on finding Nash equilibria for multi-player games in which players are coupled through shared state constraints (such as collision-avoidance constraints). Therefore, these problems are instances of Generalized Nash Equilibrium Problems (GNEPs). The operations research field has a rich literature on GNEPs [20, 21, 22, 23, 24]. Exact penalty methods have been proposed to solve GNEPs [21, 22]. Complex constraints such as those that couple players' strategies are handled using penalties, allowing solution of multi-player games jointly for all the players. However, these exact penalty methods require minimization of nonsmooth objective functions, which leads to slow convergence rates in practice.

In the same vein, a penalty approach relying on an augmented Lagrangian formulation of the problem has been advanced by Pang et al. [20]. This work, however, converts the augmented Lagrangian formulation to a set of KKT conditions, including complementarity constraints. The resulting constraint-satisfaction problem is solved with an off-the-shelf linear complementarity problem (LCP) solver that exploits the linearity of a specific problem. Our solver, in contrast, is not tailored to a specific example and can solve general GNEPs. It draws inspiration from the augmented Lagrangian formulation, which does not introduce nonsmooth terms in the objective function, enabling fast convergence. Moreover, this formulation avoids ill-conditioning, which improves the numerical robustness of our solver.

III. PROBLEM STATEMENT

In the discretized trajectory optimization setting with N time steps, we denote by n the state size, m the control input size, $x^\nu_k$ the state, and $u^\nu_k$ the control input of player ν at time step k. In formulating the game, we do not distinguish between the robot carrying out the computation and the other agents whose trajectories it is predicting. All agents are modeled equivalently, as is typical in the case of Nash-style games.

Following the formalism of Facchinei [12], we consider the GNEP with M players. Each player ν decides over its control input variables $U^\nu = [(u^\nu_1)^T \ldots (u^\nu_{N-1})^T]^T \in \mathbb{R}^{\bar{m}^\nu}$. This is player ν's strategy, where $m^\nu$ denotes the dimension of the control inputs controlled by player ν and $\bar{m}^\nu = m^\nu(N-1)$ is the dimension of the whole trajectory of player ν's control inputs. By $U^{-\nu}$, we denote the vector of all the players' strategies except that of player ν. Additionally, we define the trajectory of state variables $X = [(x_2)^T \ldots (x_N)^T]^T \in \mathbb{R}^{\bar{n}}$, where $\bar{n} = n(N-1)$, which results from applying all the control inputs decided by the players to a joint dynamical system,

$$x_{k+1} = f(x_k, u^1_k, \ldots, u^M_k) = f(x_k, u_k), \qquad (1)$$

with k denoting the time step index. The kinodynamic constraints over the whole trajectory can be expressed with $\bar{n}$ equality constraints,

$$D(X, U^1, \ldots, U^M) = D(X, U) = 0 \in \mathbb{R}^{\bar{n}}. \qquad (2)$$

The cost function of each player is noted $J^\nu(X, U^\nu) : \mathbb{R}^{\bar{n}+\bar{m}^\nu} \to \mathbb{R}$. It depends on player ν's control inputs $U^\nu$ as well as on the state trajectory X, which is shared with all the other players. The goal of player ν is to select a strategy $U^\nu$ and a state trajectory X that minimize the cost function $J^\nu$. Naturally, the choice of state trajectory X is constrained by the other players' strategies $U^{-\nu}$ and the dynamics of the system via Equation 2. In addition, the strategy $U^\nu$ must respect a set of constraints that depends on the state trajectory X as well as on the other players' strategies $U^{-\nu}$. We express this with a concatenated set of inequality constraints $C : \mathbb{R}^{\bar{n}+\bar{m}} \to \mathbb{R}^{n_c}$. Formally,

$$\min_{X, U^\nu} \; J^\nu(X, U^\nu), \quad \text{s.t.} \quad D(X, U) = 0, \; C(X, U) \le 0. \qquad (3)$$

Problem (3) is a GNEP because of the constraints that couple the strategies of all the players. A solution of this GNEP (a generalized Nash equilibrium) is a vector U such that, for all ν = 1, ..., M, $U^\nu$ is a solution to (3) with the other players' strategies fixed to $U^{-\nu}$. This means that at an equilibrium point U, no player can decrease their cost by unilaterally changing their strategy $U^\nu$ to any other feasible point.

When solving for a generalized Nash equilibrium of the game, U, we identify open-loop Nash equilibrium trajectories, in the sense that the whole trajectory $U^\nu$ is the best response to the other players' strategies $U^{-\nu}$ given the initial state of the system $x_0$. Thus the control signal is a function of time, not of the current state of the system $x_k$.¹ However, one can repeatedly re-solve the open-loop game as new information is obtained over time to obtain a policy that is closed-loop in the model-predictive control sense, as demonstrated in Section VII. This formulation is general enough to comprise multi-player general-sum dynamic games with nonlinear constraints on the states and control inputs. Practically, in the context of autonomous driving, the cost function $J^\nu$ encodes the objective of player ν, while the concatenated set of constraints C includes collision constraints coupled between players.

¹One might also explore solving for feedback Nash equilibria, where the strategies are functions of the state of all agents. This is an interesting direction for future work.

IV. AUGMENTED LAGRANGIAN FORMULATION

We propose an algorithm to solve the previously defined GNEP in the context of trajectory optimization. We express the condition that players are acting optimally to minimize their cost functions subject to constraints as an equality. To do so, we first derive the augmented Lagrangian associated with (3) solved by each player. Then, we use the fact that, at an optimal point, the gradient of the augmented Lagrangian is null [25]. Therefore, at a generalized Nash equilibrium point, the gradients of the augmented Lagrangians of all players must be null. Concatenating this set of M equality constraints with the dynamics equality constraints, we obtain a set of equations that we solve using a quasi-Newton root-finding algorithm.

A. Individual Optimality

First, without loss of generality, we suppose that the vector C is actually the concatenated set of inequality and equality constraints, i.e. $C = [C_i^T \; C_e^T]^T \in \mathbb{R}^{n_{ci}+n_{ce}}$, where $C_i \le 0$ is the vector of inequality constraints and $C_e = 0$ is the vector of equality constraints. To embed the notion that each player is acting optimally, we formulate the augmented Lagrangian associated with (3) for player ν. The dynamics constraints are handled with the Lagrange multiplier term $\mu^\nu \in \mathbb{R}^{\bar{n}}$, while the other constraints are dealt with using both a multiplier and a quadratic penalty term specific to the augmented Lagrangian formulation. We denote by $\lambda \in \mathbb{R}^{n_c}$ the Lagrange multipliers associated with the vector of constraints C; $\rho \in \mathbb{R}^{n_c}$ is a penalty weight vector;

$$L^\nu(X, U) = J^\nu + \mu^{\nu\,T} D + \lambda^T C + \frac{1}{2} C^T I_\rho C. \qquad (4)$$

$I_\rho$ is a diagonal matrix defined as,

$$I_{\rho,kk} = \begin{cases} 0 & \text{if } C_k(X, U) < 0 \;\wedge\; \lambda_k = 0, \; k \le n_{ci}, \\ \rho_k & \text{otherwise}, \end{cases} \qquad (5)$$

where $k = 1, \ldots, n_{ci}+n_{ce}$ indicates the k-th constraint. It is important to notice that the Lagrange multipliers $\mu^\nu$ associated with the dynamics constraints are specific to each player ν, but the Lagrange multipliers and penalties λ and ρ are common to all players. Given the appropriate Lagrange multipliers $\mu^\nu$ and λ, the gradient of the augmented Lagrangian with respect to the individual decision variables, $\nabla_{X,U^\nu} L^\nu = G^\nu$, is null at an optimal point of (3). The fact that player ν is acting optimally to minimize $J^\nu$ under the constraints D and C can therefore be expressed as follows,

$$\nabla_{X,U^\nu} L^\nu(X, U, \mu^\nu) = G^\nu(X, U, \mu^\nu) = 0. \qquad (6)$$

It is important to note that this equality constraint preserves coupling between players since the gradient $G^\nu$ depends on the other players' strategies $U^{-\nu}$.
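To make the active-set logic of Equations (4) and (5) concrete, the sketch below evaluates a player's augmented Lagrangian with NumPy. It is a minimal illustration, not the authors' Julia implementation: the function names and the convention that inequality constraints are stacked before equality constraints are assumptions made here for clarity.

```python
import numpy as np

def penalty_weights(c, lam, rho, n_ci):
    """Diagonal of I_rho from Eq. (5): inactive inequality constraints get zero weight."""
    w = rho.copy()
    inactive = (np.arange(c.size) < n_ci) & (c < 0.0) & (lam == 0.0)
    w[inactive] = 0.0
    return w

def augmented_lagrangian(J_nu, mu_nu, d, c, lam, rho, n_ci):
    """Player nu's augmented Lagrangian L^nu from Eq. (4).

    J_nu  : scalar cost J^nu(X, U^nu)
    mu_nu : multipliers for the dynamics residual d = D(X, U)
    c     : constraint values C(X, U), inequalities first, then equalities
    """
    w = penalty_weights(c, lam, rho, n_ci)
    return J_nu + mu_nu @ d + lam @ c + 0.5 * c @ (w * c)
```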

B. Root-Finding Problem

At a generalized Nash equilibrium, all players are acting optimally and the dynamics constraints are respected. Therefore, to find an equilibrium point, we have to solve the following root-finding problem,

$$\min_{X, U, \mu} \; 0, \quad \text{s.t.} \quad G^\nu(X, U, \mu^\nu) = 0, \;\; \forall \, \nu \in \{1, \ldots, M\}, \quad D(X, U) = 0. \qquad (7)$$

We use Newton's method to solve the root-finding problem. We denote by G the concatenation of the augmented Lagrangian gradients of all players and the dynamics constraints, $G(X, U, \mu) = [(G^1)^T, \ldots, (G^M)^T, D^T]^T$, where $\mu = [(\mu^1)^T, \ldots, (\mu^M)^T]^T \in \mathbb{R}^{\bar{n}M}$. We compute the first-order derivative of G with respect to the primal variables X, U and the dual variables µ, which we concatenate in a single vector $y = [X^T, U^T, \mu^T]^T$,

$$H = \nabla_{X,U,\mu} G = \nabla_y G. \qquad (8)$$

Newton's method allows us to identify a search direction δy in the primal-dual space,

$$\delta y = -H^{-1} G. \qquad (9)$$

We couple this search direction with the backtracking line search [26] given in Algorithm 1 to ensure local convergence to a solution using Newton's method, detailed in Algorithm 2.

Algorithm 1 Backtracking line-search
1: procedure LINESEARCH(y, G, δy)
2:   Parameters:
3:     α = 1,
4:     β ∈ (0, 1/2),
5:     τ ∈ (0, 1)
6:   Until ||G(y + α δy)||_1 < (1 − αβ) ||G(y)||_1 do
7:     α ← τα
8:   return α
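Algorithm 1 and the Newton step of Equation (9) can be prototyped in a few lines. The sketch below uses a dense linear solve and a small fixed regularization; the actual solver exploits the sparsity of H with a Riccati-like back-substitution (Section IV-E), so treat this as an illustrative reimplementation with assumed parameter values rather than the published code.

```python
import numpy as np

def line_search(y, G_fn, dy, beta=0.01, tau=0.5, max_iter=50):
    """Backtracking line search on ||G||_1 (Algorithm 1); beta and tau are assumed values."""
    g0 = np.linalg.norm(G_fn(y), 1)
    alpha = 1.0
    for _ in range(max_iter):
        if np.linalg.norm(G_fn(y + alpha * dy), 1) < (1.0 - alpha * beta) * g0:
            break
        alpha *= tau
    return alpha

def newton_root_find(y, G_fn, H_fn, tol=1e-6, reg=1e-8, max_iter=100):
    """Newton's method on the root-finding problem G(y) = 0 (Algorithm 2, Eq. 9).

    G_fn: residual [G^1; ...; G^M; D] as a function of y = [X; U; mu]
    H_fn: its Jacobian; reg keeps H invertible when equilibria are non-unique
    """
    for _ in range(max_iter):
        g = G_fn(y)
        if np.linalg.norm(g, 1) < tol:
            break
        H = H_fn(y)
        dy = np.linalg.solve(H + reg * np.eye(H.shape[0]), -g)
        y = y + line_search(y, G_fn, dy) * dy
    return y
```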

C. Augmented Lagrangian Updates

To obtain convergence of the Lagrange multipliers λ, we update them with a dual-ascent step. This update can be seen as shifting the value of the penalty terms into the Lagrange multiplier terms,

$$\lambda_k \leftarrow \begin{cases} \max(0, \; \lambda_k + \rho_k C_k(X, U)) & k \le n_{ci}, \\ \lambda_k + \rho_k C_k(X, U) & n_{ci} < k \le n_{ci} + n_{ce}. \end{cases} \qquad (10)$$


Algorithm 2 Newton's method for root-finding problem
1: procedure NEWTONSMETHOD(y)
2:   Until convergence do
3:     G ← [(∇_{X,U^1} L^1)^T, ..., (∇_{X,U^M} L^M)^T, D^T]^T
4:     H ← ∇_y G
5:     δy ← −H^{−1} G
6:     α ← LINESEARCH(y, G, δy)
7:     y ← y + α δy
8:   return y

Algorithm 3 ALGAMES solver
1: procedure ALGAMES(y^{(0)}, ρ^{(0)})
2:   Initialization:
3:     ρ ← ρ^{(0)},
4:     λ ← 0,
5:     µ^ν ← 0, ∀ν
6:     X, U ← X^{(0)}, U^{(0)}
7:   Until convergence do
8:     y ← NEWTONSMETHOD(y)
9:     λ ← DUALASCENT(y, λ, ρ)
10:    ρ ← INCREASINGSCHEDULE(ρ)
11:   return y

We also update the penalty weights according to an increasing schedule, with γ > 1:

$$\rho_k \leftarrow \gamma \, \rho_k, \quad \forall k \in \{1, \ldots, n_c\}. \qquad (11)$$
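For completeness, the two augmented Lagrangian updates of Equations (10) and (11) are sketched below, again under the assumption that inequality constraints precede equality constraints in the stacked constraint vector. The cap on ρ is an implementation detail assumed here to avoid numerical overflow; it is not part of the update stated above.

```python
import numpy as np

def dual_ascent(lam, rho, c, n_ci):
    """Multiplier update from Eq. (10); inequality multipliers are clamped at zero."""
    lam_new = lam + rho * c
    lam_new[:n_ci] = np.maximum(0.0, lam_new[:n_ci])
    return lam_new

def increasing_schedule(rho, gamma=10.0, rho_max=1e8):
    """Penalty schedule from Eq. (11) with gamma > 1 (the experiments use gamma = 10)."""
    return np.minimum(gamma * rho, rho_max)
```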

D. ALGAMES

By combining Newton's method for finding the point where the dynamics are respected and the gradients of the augmented Lagrangians are null with the Lagrange multiplier and penalty updates, we obtain our solver ALGAMES (Augmented Lagrangian GAME-theoretic Solver), presented in Algorithm 3. The algorithm, which iteratively solves the GNEP, requires as inputs an initial guess for the primal-dual variables $y^{(0)}$ and initial penalty weights $\rho^{(0)}$. The algorithm outputs the open-loop strategies of all players X, U and the Lagrange multipliers associated with the dynamics constraints µ.

E. Algorithm Complexity

Following a quasi-Newton approximation of the matrix H [26], we neglect some of the second-order derivative terms associated with the constraints. Therefore, the most expensive part of the algorithm is the Newton step defined by Equation 9. By exploiting the sparsity pattern of the matrix H, we can solve Equation 9 in $O(N(n+m)^3)$ operations. Indeed, the sparsity structure allows us to perform a back-substitution scheme akin to solving a Riccati equation, which has known complexity of $O(N(n+m)^3)$. The complexity is cubic in the number of states n and the number of control inputs m, which are typically linear in the number of players M. Therefore, the overall complexity of the algorithm is $O(NM^3)$.

F. Algorithm Discussion

Here we discuss the inherent difficulty in solving for Nash equilibria in large problems, and explain some of the limitations of our approach. First of all, finding a Nash equilibrium is a non-convex problem in general. Indeed, it is known that even for single-shot discrete games, solving for exact Nash equilibria is computationally intractable for a large number of players [27]. It is therefore not surprising that, in our more difficult setting of a dynamic game in continuous space, no guarantees can be provided about finding an exact Nash equilibrium. Furthermore, in complex interaction spaces, constraints can be highly nonlinear and nonconvex. This is the case in the autonomous driving context with collision avoidance constraints. In this setting, one cannot expect to find an algorithm working in polynomial time with guaranteed convergence to a Nash equilibrium respecting constraints. On the other hand, local convergence of Newton's method to open-loop Nash equilibria (that is, starting sufficiently close to the equilibrium, the algorithm will converge to it) has been established in the unconstrained case [16]. Our approach solves a sequence of unconstrained problems via the augmented Lagrangian formulation. Each of these problems, therefore, has guaranteed local convergence. However, the overall method has no guarantee of global convergence to a generalized Nash equilibrium, and this is expected given the known computational difficulty of the problem.

Second, our algorithm requires an initial guess for the state and control input trajectories X, U and the dynamics multipliers µ. Empirically, we observe that choosing µ = 0 and simply rolling out the dynamics starting from the initial state $x_0$ without any control is a sufficiently good initial guess to obtain convergence to a local optimum that respects both the constraints and the first-order optimality conditions. For a detailed empirical study of the convergence of ALGAMES and its failure cases, we refer to Sections VI-D and VI-E.

Finally, even for simple linear-quadratic games, the Nash equilibrium solution is not necessarily unique. In general, an entire subspace of equilibria exists. In this case, the matrix H in Equation 9 will be singular. In practice, we regularize this matrix so that large steps δy are penalized, resulting in an invertible matrix H and convergence to a Nash equilibrium that minimizes the norm of y.

V. SIMULATIONS: DESIGN AND SETUP

We choose to apply our algorithm in the autonomous driving context. Indeed, many maneuvers like lane changing, ramp merging, overtaking, and intersection crossing involve a high level of interaction between vehicles. We assume a single car is computing the trajectories for all cars in its neighborhood, so as to find its own trajectory to act safely among the group. We assume that this car has access to a relatively good estimate of the surrounding cars' objective functions. Such an estimate could, in principle, be obtained by applying inverse optimal control on observed trajectories of the surrounding cars.

In a real application, the car could be surrounded by other cars that might not necessarily follow a Nash equilibrium strategy. In this case, we demonstrate empirically that by repeating the computation as frequently as possible in an MPC fashion, we obtain safe and adaptive autonomous behaviors.

A. Autonomous Driving Problem

1) Constraints: Each vehicle in the scene is an agent of the game. Our objective is to find a generalized Nash equilibrium trajectory for all of the vehicles. These trajectories have to be dynamically feasible. The dynamics constraints at time step k are expressed as follows,

$$x_{k+1} = f(x_k, u^1_k, \ldots, u^M_k). \qquad (12)$$

We consider a nonlinear unicycle model for the dynamics of each vehicle. A vehicle state, $x^\nu_k$, is composed of a 2D position, a heading angle, and a scalar velocity. The control input $u^\nu_k$ is composed of an angular velocity and a scalar acceleration.
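As an illustration of this model, the sketch below discretizes the unicycle dynamics for one vehicle and stacks M vehicles into the joint system of Equation (12). The explicit Euler integration and the time step value are assumptions made here; the text above only specifies the state and input components.

```python
import numpy as np

def unicycle_step(x, u, dt=0.1):
    """One unicycle vehicle: x = [px, py, heading, speed], u = [yaw_rate, acceleration]."""
    px, py, theta, v = x
    omega, a = u
    return np.array([px + v * np.cos(theta) * dt,
                     py + v * np.sin(theta) * dt,
                     theta + omega * dt,
                     v + a * dt])

def joint_step(x, us, dt=0.1):
    """Joint dynamics x_{k+1} = f(x_k, u^1_k, ..., u^M_k) from Eq. (12), stacking all vehicles."""
    states = x.reshape(-1, 4)
    return np.concatenate([unicycle_step(xi, ui, dt) for xi, ui in zip(states, us)])
```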

In addition, it is critical that the trajectories respect collision-avoidance constraints. We model the collision zone of the vehicles as circles of radius r. The collision constraints between vehicles are then simply expressed in terms of the position $x^\nu_k$ of each vehicle,

$$r^2 - \|x^\nu_k - x^{\nu'}_k\|_2^2 \le 0, \quad \forall \; \nu, \nu' \in \{1, \ldots, M\}, \; \nu \ne \nu'. \qquad (13)$$

We also model boundaries of the road to force the vehicles to remain on the roadway. This means that the distance between the vehicle and the closest point, q, on each boundary, b, has to remain larger than the collision circle radius, r,

$$r^2 - \|x^\nu_k - q_b\|_2^2 \le 0, \quad \forall \; b, \; \forall \; \nu \in \{1, \ldots, M\}. \qquad (14)$$

In summary, based on reasonable simplifying assumptions, we have expressed the driving problem in terms of non-convex and non-linear coupled constraints.
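The coupled inequality constraints of Equations (13) and (14) are straightforward to stack into the constraint vector C. The sketch below does so for a single time step; the closest boundary points q are assumed to be precomputed from the road geometry, which is a simplification made here.

```python
import numpy as np

def collision_constraints(positions, r):
    """Pairwise collision constraints from Eq. (13): r^2 - ||p_i - p_j||^2 <= 0."""
    vals = []
    M = len(positions)
    for i in range(M):
        for j in range(i + 1, M):
            d = positions[i] - positions[j]
            vals.append(r**2 - d @ d)
    return np.array(vals)

def boundary_constraints(positions, closest_points, r):
    """Road-boundary constraints from Eq. (14), one per (boundary, vehicle) pair.

    closest_points[b][i] is the closest point q on boundary b to vehicle i.
    """
    vals = []
    for boundary in closest_points:
        for p, q in zip(positions, boundary):
            d = p - q
            vals.append(r**2 - d @ d)
    return np.array(vals)
```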

2) Cost Function: We use a quadratic cost function penalizing the use of control inputs and the distance between the current state and the desired final state $x_f$ of the trajectory,

$$J^\nu(X, U^\nu) = \sum_{k=1}^{N-1} \left[ \frac{1}{2}(x_k - x_f)^T Q \,(x_k - x_f) + \frac{1}{2}\,(u^\nu_k)^T R \, u^\nu_k \right] \qquad (15)$$

$$\qquad\qquad + \frac{1}{2}(x_N - x_f)^T Q_f \,(x_N - x_f). \qquad (16)$$

This cost function only depends on the decision variables of vehicle ν. Players' behaviors are coupled only through collision constraints. We could also add terms depending on other vehicles' strategies, such as a congestion penalty.
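A direct sketch of this quadratic cost, written for one player with NumPy, is shown below. Here X is taken to contain the states at which the stage cost is evaluated and the indexing is simplified relative to the notation above; the weight matrices Q, R, and Q_f are assumed to be provided.

```python
import numpy as np

def quadratic_cost(X, U_nu, x_f, Q, R, Qf):
    """Player nu's cost from Eqs. (15)-(16).

    X    : (N, n) state trajectory shared by all players
    U_nu : (N-1, m_nu) control trajectory of player nu
    x_f  : desired final state
    """
    J = 0.0
    for k in range(len(U_nu)):
        dx = X[k] - x_f
        J += 0.5 * dx @ Q @ dx + 0.5 * U_nu[k] @ R @ U_nu[k]
    dxN = X[-1] - x_f
    return J + 0.5 * dxN @ Qf @ dxN
```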

VI. COMPARISON TO ILQGAMES

A. Motivation

In order to evaluate the merits of ALGAMES, we compare it to iLQGames [4], which is a DDP-based algorithm for solving general dynamic games. Both algorithms solve the problem by iteratively solving linear-quadratic approximations that have an analytical solution [11]. For iLQGames, the augmented objective function $\tilde{J}^\nu$ differs from the objective function $J^\nu$ by a quadratic term penalizing constraint violations,

$$\tilde{J}^\nu(X, U) = J^\nu(X, U) + \frac{1}{2} C(X, U)^T I_\rho \, C(X, U), \qquad (17)$$

where $I_\rho$ is defined by,

$$I_{\rho,kk} = \begin{cases} 0 & \text{if } C_k(X, U) < 0, \; k \le n_{ci}, \\ \rho_k & \text{otherwise}. \end{cases} \qquad (18)$$

Here ρ is an optimization hyperparameter that we can tune to satisfy constraints. For ALGAMES, the augmented objective function, $L^\nu$, is actually an augmented Lagrangian, see Equation 4. The hyperparameters for ALGAMES are the initial value $\rho^{(0)}$ and its increase rate γ defined in Equation 11.

Fig. 3. On the left, the three cars at their nominal initial state. On the right, the three cars are faded and standing at the nominal desired final state. The yellow car has successfully merged in between the two other cars. The roadway boundaries are depicted in black.

Scenario       # Players   ALGAMES         iLQGames
Ramp merging   2           38 ± 10 ms      104 ± 23 ms
Ramp merging   3           89 ± 14 ms      197 ± 15 ms
Ramp merging   4           860 ± 251 ms    705 ± 209 ms
Intersection   2           50 ± 11 ms      752 ± 168 ms
Intersection   3           116 ± 22 ms     362 ± 93 ms
Intersection   4           509 ± 33 ms     1905 ± 498 ms

Fig. 4. For each scenario and each number of players, we run each solver 100 times to estimate the mean solve time and its standard deviation.

B. Timing Experiments

We evaluate the performance of both algorithms in two scenarios, see Figures 3 and 7, with the number of players varying from two to four. To compare the speed of both algorithms, we set the termination criterion as a threshold on constraint violations, $C \le 10^{-3}$. The timing results averaged over 100 samples are presented in Figure 4. First, we notice that both algorithms achieve real-time or near-real-time performance on complex autonomous driving scenarios (the horizon of the solvers is fixed to 5 s).

We observe that the speed performance of ALGAMES and iLQGames is comparable in the ramp merging scenario. For this scenario, we tuned the value of the penalty for iLQGames to $\rho = 10^2$. Notice that for all scenarios the dimensions of the problem are scaled so that the velocities and displacements are all of the same order of magnitude. For the intersection scenario, we observe that the two-player and four-player cases both have much higher solve times for iLQGames compared to the three-player case. Indeed, in those two cases, we had to increase the penalty to $\rho = 10^3$, otherwise iLQGames would plateau and never reach the constraint satisfaction criterion. This, in turn, slowed the algorithm down by decreasing the constraint violation convergence rate.


Fig. 5. Monte Carlo analysis with 1000 randomly sampled initial states of ALGAMES on the left and iLQGames on the right. The top and middle plots indicate the maximum constraint violation of the solution at the end of the solve and the solve time, respectively. The bottom left and right plots display the number of Newton steps and the number of Riccati backward passes executed during the solve of ALGAMES and iLQGames, respectively.

C. Discussion

The main takeaway from these experiments is that, for a given scenario, it is generally possible to find a suitable value of ρ that ensures the convergence of iLQGames to constraint satisfaction. With higher values of ρ, we can reach better constraint satisfaction at the expense of a slower convergence rate. In the context of a receding horizon implementation (MPC), finding a good choice of ρ that would suit the whole sequence of scenarios encountered by a vehicle could be difficult. In contrast, the same hyperparameters $\rho^{(0)} = 1$ and γ = 10 were used in ALGAMES for all the experiments across this paper. This supports the idea that, thanks to its adaptive penalty scheme, ALGAMES requires little tuning.

While performing the timing experiments, we also noticed several instances of oscillatory behavior for iLQGames: the solution would oscillate, preventing it from converging. This happened even after an adaptive regularization scheme was implemented to regularize iLQGames' Riccati backward passes. Oscillatory behavior was not seen with ALGAMES. We hypothesize that this is due to the dual ascent update coupled with the penalty logic detailed in Equations 10 and 5, which add hysteresis to the solver.

D. Monte Carlo Analysis

To evaluate the robustness of ALGAMES, we performed a Monte Carlo analysis of its performance on a ramp merging problem. First, we set up a roadway with hard boundaries as pictured in Fig. 3. We position two vehicles on the roadway and one on the ramp in a collision-free initial configuration.

Scenario       Freq. (Hz)   E[δt] (ms)   σ[δt] (ms)
Ramp Merging   69           14           72
Intersection   66           15           66

Fig. 6. Running the MPC implementation of ALGAMES 100 times on both scenarios, we obtain the mean update frequency of the MPC as well as the mean and standard deviation of δt, the time required to update the MPC plan.

Fig. 7. The blue car starts on the left and finishes on the right. The red car does the opposite. The pedestrian with the yellow collision avoidance cylinder crosses the road from the bottom to the top of the image.

We choose a desired final state where the incoming vehicle has merged into the traffic. Our objective is to generate generalized Nash equilibrium trajectories for the three vehicles. These trajectories are collision-free and cannot be improved unilaterally by any player. To introduce randomness in the solving process, we apply a random perturbation to the initial state of the problem. Specifically, we perturb $x_0$ by adding uniformly sampled noise. This would typically correspond to displacing the initial position of the vehicles by ±1 m, changing their initial velocity by ±3%, and their heading by ±2.5°.

We observe in Figure 5 that ALGAMES consistently finds a satisfactory solution to the problem using the same hyperparameters $\rho^{(0)} = 1$ and γ = 10. Out of the 1000 samples, 99.5% converged to constraint satisfaction $C \le 10^{-3}$ while respecting the optimality criterion $\|G\|_1 < 10^{-2}$. By definition, $\|G\|_1$ is a merit function for satisfying optimality and dynamics constraints. We also observe that the solver converges to a solution in less than 0.2 s for 96% of the samples. The solver requires fewer than 16 Newton steps to converge for 94% of the samples. These empirical data support the claim that ALGAMES is able to solve this class of ramp merging problems quickly and reliably.

For comparison, we present in Figure 5 the results obtained with iLQGames. We apply the same constraint satisfaction criterion $C \le 10^{-3}$. We fixed the value of the penalty hyperparameter ρ for all the samples, as it would not be a fair comparison to tune it for each sample. Only 3 samples did not converge with iLQGames; this is comparable to ALGAMES, for which 5 samples failed to converge. However, we observe that iLQGames is 3 times slower than ALGAMES, with an average solve time of 350 ms compared to 110 ms, and requires on average 4 times more iterations (41 against 9).

E. Solver Failure Cases

The Monte Carlo analysis allows us to identify the typical failure cases of our solver. We empirically identify the cases where the solver does not satisfy the constraints or the optimality criterion for the ramp merging problem. Typically, in such cases, the initial guess, which consists of rolling out the dynamics with no control, is far from a reasonable solution. Since the constraints are ignored during this initial rollout, the car at the back can overtake the car at the front by driving through it. This creates an initial guess where constraints are strongly violated. Moreover, we hypothesize that the tight roadway boundary constraints tend to strongly penalize solutions that would 'disentangle' the car trajectories, because doing so would require large boundary violations at first. Therefore, the solver gets stuck in a local optimum where cars overlap each other. Sampling several initial guesses with random initial control inputs and solving in parallel could reduce the occurrence of these failure cases. Being able to detect, reject, and re-sample initial guesses when the initial car trajectories are strongly entangled could also improve the robustness of the solver.

VII. MPC IMPLEMENTATION OF ALGAMES

In this section, we propose a model-predictive control (MPC) implementation of the algorithm that demonstrates real-time performance. The benefits of MPC are twofold: it provides a feedback policy instead of an open-loop strategy, and it can improve interactions with actors for which we do not have a good estimate of the objective function.

A. MPC Feedback Policy

First, the strategies identified by ALGAMES are open-loop Nash equilibrium strategies: they are sequences of control inputs. In contrast, DDP-based approaches like iLQGames solve for feedback Nash equilibrium strategies, which provide a sequence of control gains. In the MPC setting, we can obtain a feedback policy with ALGAMES by updating the strategy as fast as possible and only executing the beginning of the strategy. This assumes a fast update rate of the solution. To support the feasibility of the approach, we implemented an MPC on the ramp merging scenario described in Figure 3. There are 3 players, each constantly maintaining a 40-time-step strategy with a 3-second horizon. We simulate 3 seconds of operation of the MPC by constantly updating the strategies and propagating noisy unicycle dynamics for each vehicle. We compile the results from 100 MPC trajectories in Figure 6. We obtain a 69 Hz update frequency for the planner on average. We observe similar performance on the intersection problem defined in Figure 7, with an update frequency of 66 Hz.
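The receding-horizon loop described above can be summarized in a few lines. The sketch below is a schematic rendering, not the authors' implementation: solve_game and propagate are hypothetical callables standing in for a full game solve from the current state and for the simulated noisy dynamics, respectively.

```python
import numpy as np

def run_mpc(x0, solve_game, propagate, n_updates=151, noise_std=0.0, seed=0):
    """Receding-horizon use of an open-loop game solver (Section VII-A).

    solve_game(x)  : returns the open-loop control plan for all players from state x
    propagate(x, u): simulates the joint dynamics for one update interval
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    history = [x]
    for _ in range(n_updates):
        plan = solve_game(x)        # re-solve the dynamic game from the current state
        u0 = plan[0]                # execute only the first control of the plan
        x = propagate(x, u0) + noise_std * rng.standard_normal(x.shape)
        history.append(x)
    return np.array(history)
```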

B. Adaptive Behavior

The second benefit of MPC is that it mitigates a major assumption of this work. We assumed that the car we control has access to the exact objective functions of the surrounding cars. A more realistic setting would be that the car has access to a noisy estimate of the surrounding cars' objective functions. This estimate could be provided by an inverse optimal control algorithm, for instance. By re-planning at a high enough frequency, an MPC implementation of ALGAMES could safely control an agent that has an inaccurate estimate of surrounding agents' objective functions. We further assumed that the other players solve for Nash equilibrium strategies, which is not necessarily the case in the presence of selfish players.

Fig. 8. The top plot represents the initial strategy that the blue car computes for all the players, with one colored dot for each player at each step of the planned strategy. Notice that the blue car predicts that following a straight path at a relatively constant speed will avoid collision with the yellow pedestrian. The middle chart shows the actual path of the 3 players by plotting their positions at each MPC strategy update. Notice that the blue car has to nudge to avoid the yellow pedestrian. The bottom plot shows scaled velocity against scaled time for the initial plan and the MPC execution: the blue car also dramatically slows down compared to the initial plan, to avoid colliding with the yellow pedestrian.

To support this claim, we modified the intersection scenario described in Figure 7 so that the blue car has a poor estimate of the pedestrian's desired speed. Specifically, in its nominal strategy, the blue car crosses the intersection without slowing down, crossing the crosswalk right after the yellow pedestrian. For this nominal strategy, the blue car has solved for the GNE assuming that the pedestrian has a desired speed $v_d$. However, in reality, the pedestrian is crossing the road at a speed $v_0$ that is significantly lower than $v_d$. In addition, the pedestrian is not solving for any GNE and is just crossing the road in a straight line at a constant speed $v_0$. Therefore, the pedestrian objective function assumed by the blue car does not capture the real behavior of the pedestrian. However, when applying the MPC implementation of ALGAMES, we observe that the blue car gradually adapts its strategy to accommodate the pedestrian, see Figure 8. Indeed, as the blue car gets closer to the pedestrian, the car significantly slows down compared to the nominal strategy. Also, we observe that the car shifts to the right of the roadway to avoid the pedestrian. This nudging maneuver was not present in the nominal plan because the pedestrian was expected to have crossed the lane already. It is important to note that this adaptive behavior is observed even though the blue car kept a constant and wrong estimate of the pedestrian's objective function. Being able to refine the estimates of other players' objectives online could further improve the adaptive property of the algorithm.

VIII. CONCLUSIONS

We have introduced a new algorithm for finding constrained Nash equilibrium trajectories in multi-player dynamic games. We demonstrated the performance and robustness of the solver through a Monte Carlo analysis on complex autonomous driving scenarios, including nonlinear and non-convex constraints. We have shown real-time performance for up to 4 players and implemented ALGAMES in a receding-horizon framework to give a feedback policy. We empirically demonstrated the ability to safely interact with players that violate the Nash equilibrium assumptions when the strategies are updated fast enough online. The results we obtained from ALGAMES are promising, as they seem to let the vehicles share the responsibility for avoiding collisions, leading to natural-looking trajectories where players are able to negotiate complex, interactive traffic scenarios that are challenging for traditional, non-game-theoretic trajectory planners. For this reason, we believe that this solver could be a very efficient tool to generate trajectories in situations where the level of interaction between players is strong. Our implementation of ALGAMES is available at https://github.com/RoboticExplorationLab/ALGAMES.jl.

REFERENCES

[1] P. Trautman and A. Krause, "Unfreezing the robot: Navigation in dense, interacting crowds," in 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, (Taipei), pp. 797–803, IEEE, Oct. 2010.

[2] D. Sadigh, S. S. Sastry, S. A. Seshia, and A. Dragan, "Information gathering actions over human internal state," in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), (Daejeon, South Korea), pp. 66–73, IEEE, Oct. 2016.

[3] D. Sadigh, S. Sastry, S. A. Seshia, and A. D. Dragan, "Planning for Autonomous Cars that Leverage Effects on Human Actions," in Robotics: Science and Systems XII, Robotics: Science and Systems Foundation, 2016.

[4] D. Fridovich-Keil, E. Ratner, J. C. Shih, A. D. Dragan, and C. J. Tomlin, "Iterative Linear Quadratic Approximations for Nonlinear Multi-Player General-Sum Differential Games," arXiv preprint arXiv:1909.04694, 2019.

[5] A. Dreves and M. Gerdts, "A generalized Nash equilibrium approach for optimal control problems of autonomous cars," Optimal Control Applications and Methods, vol. 39, pp. 326–342, Jan. 2018.

[6] J. F. Fisac, E. Bronstein, E. Stefansson, D. Sadigh, S. S. Sastry, and A. D. Dragan, "Hierarchical Game-Theoretic Planning for Autonomous Vehicles," in 2019 International Conference on Robotics and Automation (ICRA), (Montreal, QC, Canada), pp. 9590–9596, IEEE, May 2019.

[7] E. Schmerling, K. Leung, W. Vollprecht, and M. Pavone, "Multimodal Probabilistic Model-Based Planning for Human-Robot Interaction," in 2018 IEEE International Conference on Robotics and Automation (ICRA), (Brisbane, QLD), pp. 1–9, IEEE, May 2018.

[8] R. Spica, D. Falanga, E. Cristofalo, E. Montijano, D. Scaramuzza, and M. Schwager, "A Real-Time Game Theoretic Planner for Autonomous Two-Player Drone Racing," arXiv:1801.02302 [cs], Jan. 2018.

[9] A. Liniger and J. Lygeros, "A Noncooperative Game Approach to Autonomous Racing," IEEE Transactions on Control Systems Technology, pp. 1–14, 2019.

[10] A. Bressan, "Noncooperative Differential Games. A Tutorial," Department of Mathematics, Penn State University, 2010.

[11] T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory, vol. 23. SIAM, 1999.

[12] F. Facchinei and C. Kanzow, "Generalized Nash equilibrium problems," 4OR, vol. 5, pp. 173–210, Sept. 2007.

[13] J. H. Yoo and R. Langari, "Stackelberg Game Based Model of Highway Driving," in ASME 2012 Dynamic Systems and Control Conference, Volume 1, (Fort Lauderdale, Florida, USA), pp. 499–508, ASME, Oct. 2012.

[14] A. Britzelmeier, A. Dreves, and M. Gerdts, "Numerical solution of potential games arising in the control of cooperative automatic vehicles," in 2019 Proceedings of the Conference on Control and Its Applications (W. S. Levine and R. Stockbridge, eds.), (Philadelphia, PA), pp. 38–45, Society for Industrial and Applied Mathematics, Jan. 2019.

[15] B. Di and A. Lamperski, "Differential Dynamic Programming for Nonlinear Dynamic Games," arXiv:1809.08302 [math], Sept. 2018.

[16] B. Di and A. Lamperski, "Newton's Method and Differential Dynamic Programming for Unconstrained Nonlinear Dynamic Games," arXiv:1906.09097 [cs, eess], Jan. 2020.

[17] B. Di and A. Lamperski, "First-Order Algorithms for Constrained Nonlinear Dynamic Games," arXiv:2001.01826 [cs, eess], Jan. 2020.

[18] M. Wang, Z. Wang, J. Talbot, J. Christian Gerdes, and M. Schwager, "Game Theoretic Planning for Self-Driving Cars in Competitive Scenarios," in Robotics: Science and Systems XV, Robotics: Science and Systems Foundation, June 2019.

[19] J. Morimoto and C. G. Atkeson, "Minimax Differential Dynamic Programming: An Application to Robust Biped Walking," in Advances in Neural Information Processing Systems, pp. 1563–1570, 2003.

[20] J.-S. Pang and M. Fukushima, "Quasi-variational inequalities, generalized Nash equilibria, and multi-leader-follower games," Computational Management Science, vol. 2, pp. 21–56, Jan. 2005.

[21] F. Facchinei and J.-S. Pang, "Exact penalty functions for generalized Nash problems," Large-Scale Nonlinear Optimization, pp. 115–126, 2006.

[22] F. Facchinei, A. Fischer, and V. Piccialli, "Generalized Nash equilibrium problems and Newton methods," Mathematical Programming, vol. 117, pp. 163–194, Mar. 2009.

[23] F. Facchinei and C. Kanzow, "Penalty Methods for the Solution of Generalized Nash Equilibrium Problems," SIAM Journal on Optimization, vol. 20, pp. 2228–2253, Jan. 2010.

[24] M. Fukushima, "Restricted generalized Nash equilibria and controlled penalty algorithm," Computational Management Science, vol. 8, pp. 201–218, Aug. 2011.

[25] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods. Academic Press, 2014.

[26] J. Nocedal and S. Wright, Numerical Optimization. Springer Science & Business Media, 2006.

[27] C. Daskalakis, P. W. Goldberg, and C. H. Papadimitriou, "The complexity of computing a Nash equilibrium," SIAM Journal on Computing, vol. 39, no. 1, pp. 195–259, 2009.