EFFICIENT METHOD FOR COMPUTING STRATEGIES FOR SUCCESSIVE PURSUIT DIFFERENTIAL GAMES
A Thesis Presented
by
Reed Jensen
to
The Department of Electrical and Computer Engineering
in partial fulfillment of the requirements for the degree of
Master of Science
in
Electrical Engineering
Northeastern University
Boston, Massachusetts
April 2014
© Copyright 2014 by Reed Jensen
All Rights Reserved
Efficient method for computing strategies for successive
pursuit differential games
Reed Jensen
April 2014
Abstract
In successive pursuit, a pursuer seeks to capture as many evaders as possible in succession in the shortest amount of time. At the same time, a coalition of evaders seeks to maximize capture time (or prevent capture entirely) with or without knowledge of the pursuer’s control law or preferred capture order. This study seeks to obtain a control strategy for both the pursuer and the coalition of evaders that is robust to uncertainty and variation in the pursuer or evader coalition strategy and that can be computed in a reasonable amount of time. A combination of techniques from differential game theory and discrete optimization is employed to compute such a strategy. In particular, a sub-optimal numerical approach using limited lookahead and a Monte Carlo tree search algorithm are used to obtain solutions in the presence of a high-dimensional action space. Examples are presented for both simple pursuit dynamics and the dynamics of the so-called Homicidal Chauffeur game.
This work is sponsored by the Department of the Air Force under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.
Acknowledgments
I want to thank Dr. Mykel Kochenderfer and Dr. Bahram Shafai for their counsel and support, and my wife and family for their love and dedication.
Contents
1 Introduction
1.1 Background
1.2 History
1.3 Outline

2 Problem formulation
2.1 Two-player differential pursuit game
2.2 Simple pursuit game formulation
2.3 Homicidal Chauffeur game formulation
2.4 Differential games with multiple players

3 Optimal and approximate solutions
3.1 Solution approach for the two-player differential pursuit game
3.2 Two-player simple pursuit example
3.3 Homicidal Chauffeur example
3.4 Value function when Isaacs condition not satisfied
3.5 Limited lookahead for multi-player games
3.6 Approximating cost-to-go for limited lookahead
3.7 Example solution for simple pursuit of several evaders

4 Simulation approach
4.1 Numerical solutions to the successive pursuit game
4.2 Simulation using the limited lookahead method
4.3 Combinatorial optimization using tree search
4.4 Computational resources

5 Results and analysis
5.1 Limited lookahead performance with one pursuer and two evaders
5.2 Tree search performance with many evaders
5.3 Lookahead performance with many evaders
5.4 Limited lookahead and the Homicidal Chauffeur game

6 Conclusion and Future Work

Bibliography
List of Figures

2.1 Reduced coordinates for the Homicidal Chauffeur game. The pursuer is located at the center with its heading aligned with the x2 axis. A turn by the pursuer causes the coordinate system to rotate about the point C.

3.1 The value map and singular surfaces of a Homicidal Chauffeur game with vp = 3, ve = 1, ω = 1/3 and ε = 1. Coordinates are centered on the pursuer, with the pursuer heading aligned with the vertical (x2) axis. The contours represent capture times for various initial conditions, sampled at 0.5 time units and increasing outward from the useable part (UP) of the target set. All distances and times are normalized by the pursuer speed.

3.2 Map of optimal capture times for successive pursuit of two evaders, normalized by the separation distance between the two evaders (reproduced from Breakwell [1] with kind permission from Springer Science and Business Media). Capture times are represented by solid contours, and sample optimal trajectories of the pursuer relative to the two-evader system are represented by dashed lines. Note that initial conditions from regions 3 and 6 yield optimal trajectories that contain curved motion in inertial space.

4.1 A sample MCTS minimizing search tree for a four-evader successive pursuit game. Each node represents a simulation run and each edge an evader in a capture sequence. The number on each node is the running expected capture time.

5.1 Sample engagement using limited lookahead as compared with the optimal result (denoted by ∗ and dashed lines) in the linear motion regime.

5.2 Sample engagement in Breakwell’s “curved motion” zone (capture sequence not fixed) using limited lookahead. The final capture time is 12.5 sec shorter than the fixed sequence capture time.

5.3 Side-by-side comparison of the full two-evader solution (adapted from Breakwell [1]) with the limited lookahead results for a variety of initial pursuer locations. Solid contours represent capture times (normalized by the initial evader separation distance and pursuer speed), while dashed lines represent sample trajectories relative to the two-evader system. The focal and dispersal lines appear along the bottom of the figure.

5.4 Average number of iterations for MCTS to achieve optimal and sub-optimal (within 1% error) results as compared to brute force (N! iterations). The error bars represent one standard deviation.

5.5 Scenario with three evaders starting in the linear motion regime. The optimal solution is represented by dashed lines.

5.6 Three-evader scenario, with two starting in the curved motion regime.

5.7 Scenario with four evaders.

5.8 Limited lookahead results in inertial and pursuer-centric coordinates for a two-player Homicidal Chauffeur game. In this scenario, the optimal play for the evader is to follow the pursuer for a brief period until the pursuer can turn around. In the right figure, the game trajectory in pursuer-centric coordinates reveals several singular surfaces. The game trajectory moves along a universal line, departs from a dispersal line, moves around a barrier, and returns again to a universal line before reaching the target set.

5.9 Limited lookahead results for a three-player Homicidal Chauffeur game.
Chapter 1
Introduction
Pursuit games provide a way of modeling conflict by representing competition as a pursuer
seeking to catch an evader and minimize some objective such as capture time, and an evader
seeking to maximize the same or to avoid capture entirely. The modeling of conflicts arises
in a large variety of domains including biology, economics, operations research, navigation
and collision avoidance, military applications, and control systems and engineering design.
Conflict models are often used to determine optimal participant strategies or controls that
maximize a player’s benefits or minimize worst-case cost.
In some models of conflict processes there can be many competing parties or players
seeking to optimize their own benefits. The analysis of optimal player decisions in dynamic
games involving multiple players can be difficult. To date, a general solution to multi-player
differential pursuit games – games with the state dynamics governed by differential equations
– is not yet available. Because of the many possible player pairings, multi-player games
may also suffer from the so-called “curse of dimensionality” where large state and action
spaces can make analytical and numerical solutions difficult. As a novel contribution to the
literature, it is the goal of this work to demonstrate at least an approximate solution to
zero-sum, multi-player differential games using a modern discrete optimization technique to
manage the high-dimensionality of the multi-player problem.
This work will focus on solutions to a successive pursuit differential game where pursuers
seek sequential capture of all the evaders, and each evader attempts to delay capture as long
as possible. The evaders work together as a coalition with perfect knowledge of all evader
states to maximize the game objective, while a pursuer or team of pursuers seeks to minimize
the same objective, which for the examples presented will be capture time. The goal is to
find player strategies that can be executed independent of the opposing players’ controls,
including pursuer capture order, that guarantee at least a minimum amount of performance.
A successive pursuit game with a single pursuer and multiple evaders can be seen as an
extension to the classical Traveling Salesman Problem (TSP) where a salesman seeks the
shortest path to visit every city exactly once. TSP is known to be NP-hard,
though several efficient approximation schemes have been developed [2]. In the sense of
visiting a combination of target points, Belousov et al [3] have labeled the successive pursuit
problem the Dynamic Traveling Salesman Problem (DTSP), where the target “cities” (ner-
vous consumers?) now actively evade the pursuing salesman. The intent of this work is to
use a modern, efficient tree search method – Monte Carlo Tree Search – to demonstrate a
practical solution to the combinatorial DTSP in the presence of evader dynamics governed
by differential equations.
The following section discusses the differential pursuit game and defines the concepts of
a game solution and optimal strategies. The subsequent section includes a brief background
and summary of the latest research in successive pursuit games and the history and applica-
tion of Monte Carlo Tree Search. The final section will then introduce the remainder of the
paper.
1.1 Background
Conflict processes or games are called dynamic games if the order of the decisions made
by the different parties is important [4]. Dynamic games where the benefit to one player
exactly matches the detriment to the other are called zero-sum, and many conflict processes
have this property. Players that are at odds seek an optimal strategy that yields the largest
benefit to their party.
In conflict processes there are many ways to define optimality. For zero-sum games, one
way to define optimality is by determining the Nash equilibrium of the game. Under this
condition, no unilateral decision by one player or coalition of players can reduce the benefit
of the other player or coalition. A player strategy that guarantees a Nash equilibrium
is called a guaranteeing strategy and will be considered the definition of optimality for the
subsequent sections. Guaranteeing strategies ensure at least a minimum benefit to the player
that executes them regardless of the moves or decisions by the other players. For two-player,
zero-sum games this minimum benefit is called the game value.
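The distinction between the two players' security levels can be illustrated with a finite zero-sum game. The sketch below (an illustrative toy example, not taken from this work) computes the minimizer's and maximizer's guaranteed levels for a small cost matrix; when the two coincide, the game has a saddle point and the common value is the game value:

```python
# Illustrative example: a finite zero-sum matrix game.  cost[i][j] is the
# capture time if the pursuer plays row i (minimizer) and the evader plays
# column j (maximizer).  The matrix entries are made up for the example.
cost = [
    [4.0, 5.0],
    [3.0, 2.0],
]

# Pursuer's guaranteeing (security) level: the smallest cost it can
# guarantee regardless of the evader's choice.
pursuer_value = min(max(row) for row in cost)

# Evader's guaranteeing level: the largest cost it can force regardless
# of the pursuer's choice.
evader_value = max(min(cost[i][j] for i in range(len(cost)))
                   for j in range(len(cost[0])))

# The two security levels coincide here, so the game has a saddle point
# and the common value is the game value.
print(pursuer_value, evader_value)  # 3.0 3.0
```

Here the saddle point is at (row 1, column 0): neither player can improve its guaranteed outcome by unilaterally deviating from that pair.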
Guaranteeing strategies can be useful in applications like robust control where the con-
troller seeks to maintain a minimum level of control performance in the presence of worst-case
noise or other uncertainties. In this case the roles of pursuer and evader, minimizer and max-
imizer, can be reversed depending on the application. Because guaranteeing strategies do
not necessarily require the knowledge of the other players’ controls, they can also be useful
in conflicts where information about the adversarial processes is limited. Of course, as more
knowledge about the opponent becomes available, it may be possible to form other optimal
strategies that yield a larger payoff.
Differential games are a type of dynamic game where the game state is described by a set
of differential equations and were introduced by Isaacs in the 1950s [5]. Some advantages of
using the differential game formulation are that it may provide a continuous game solution
in time and/or space, define entire sets of game trajectories that meet a specified condition
such as capture or escape, or reveal singularities in the game that may have profound effects
on the game outcome and optimal player controls. Differential games are defined by the
state differential equations, the game state space, admissible player information and control
sets, player preferences and objectives, and the target or termination sets for each player.
Solving a differential game often involves determining the game value and associated
player controls that solve a set of partial differential equations (PDEs) called the Hamilton-
Jacobi-Bellman-Isaacs (HJI) equations. Candidate trajectories from the HJI solution are
then checked to ensure that they terminate on the target set, fill the entire game space, and
meet boundary conditions at the boundaries of the game space and other singular surfaces
that may appear. Verification of candidate solutions and the discovery and characterization
of singular surfaces contribute to the difficulty of solving differential games analytically and
numerically.
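For reference, one common statement of the stationary HJI equation for the value function V, with dynamics ẋ = f(x, a, b), running cost G, and terminal cost Q on the target set Λ, is sketched below; sign and boundary conventions vary across the literature, and the min and max commute only when the Isaacs condition holds:

```latex
\min_{a \in A} \max_{b \in B}
  \left[ \nabla V(x) \cdot f(x, a, b) + G(x, a, b) \right] = 0,
\qquad V(x) = Q(x) \ \text{for } x \in \Lambda .
```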
The challenge of solving sets of partial differential equations with possible discontinuous
solutions has been partially addressed by the identification of viscosity solutions as weak
solutions to the HJI equations and the development of several numerical solution approaches
(see [6] for an overview). However, numerical solutions to PDEs can be time consuming,
which for practical applications has motivated the development of methods for approximating
multi-player differential game solutions. In his doctoral dissertation, Li [7] presents a method
for approximating solutions to zero-sum, multi-player differential pursuit games using a
limited lookahead technique akin to the limited lookahead of optimal control. Li proves that
after a finite number of iterations, the limited lookahead technique can approach the optimal
game value. It is his technique that this work will adopt to achieve efficient differential game
solutions.
The ultimate contribution of this work is to combine the limited lookahead approach of
Li with an efficient, modern tree search method – Monte Carlo Tree Search – to solve the
differential successive pursuit game in the presence of many evaders in a practical amount of
time. This then provides an automated way to derive robust control strategies for competitive
processes.
1.2 History
The study of differential games was introduced by Isaacs [5] in the 1950’s when he devised
several games relevant to military conflicts, including the “Homicidal Chauffeur” game, and
formulated their solutions. In his work he combined concepts from classical game theory
and control theory to derive optimal control strategies for several dynamical systems that
can be represented by differential equations. To do so, he used the dynamic programming
principle in conjunction with a set of partial differential equations that now include his
name – the Hamilton-Jacobi-Bellman-Isaacs (HJI) equations. Additionally he discovered
many singular phenomena that arise within differential games that have a profound impact
on game outcomes.
His work on differential games and singular surfaces was later continued by J. V. Breakwell, P. Bernhard, A. Merz, and J. Lewin (for just a few examples, see [1, 8, 9]). In the 1980’s,
work by Crandall and Lions [10] and independently by Subbotin and Krassovski [11, 12] led
to the notion of viscosity solutions, which are weak solutions to the HJI PDEs. These con-
cepts have been developed for several variations of the HJI equations and allow for both
smooth and non-smooth solutions. This has enabled many modern numerical approaches to
solving differential games such as level set methods (see [13, 14, 15, 6, 16, 17]).
Recently there has been much interest in the study of multi-player differential pursuit
games with a variety of dynamics and objectives. Zemskov et al [18] consider a single
pursuer and two evaders with the “game of two cars” dynamics. Shevchenko [19] considers a
similar problem with two terminal manifolds but in the context of search and identification.
Bhattacharya [20] addresses non-singular solutions to a spatial jamming problem as a zero-
sum multi-player differential game. Fuchs et al [21, 22] examine cooperation among multiple
evaders through a modified cost function to encourage pursuer retreat, assuming open-loop
pursuer intent. Yeung and Petrosyan [23] consider cooperative stochastic differential games
that include non-zero-sum solutions. While this list is far from exhaustive, it does suggest
that a solution method for multi-player games that addresses varying dynamics, singular
solutions, closed-loop decision feedback, and stochastic behaviors would be of interest.
Differential games involving successive pursuit of multiple evaders were initially studied
by Breakwell et al [1], who also identify some singular surfaces within the game (see Section
3.7). They were also studied by Petrosjan [24], who proved that an infinite set of Nash
equilibria exists in non-zero-sum, many-evader games. He also demonstrated that allowing
the pursuer to change its preferred capture order over time can improve its performance
[25]. In this vein, Shevchenko has considered open-loop, alternative capture sequences and
multiple terminal manifolds for successive capture [26, 27, 28]. Determining a closed-loop
optimal strategy for choosing between terminal sets in general multi-player pursuit games is
still an open problem.
For simple successive pursuit with a known capture order, Chikrii et al [29] derive the
optimal control for the pursuer and evaders and show that straight-line motion is optimal
for each party. Belousov et al [3] demonstrate a numerically efficient method to obtain the
Chikrii solution for a known capture order, claiming efficient computation for scenarios with
11 or 12 evaders. Berdyshev finds solutions for a pursuer with nonlinear motion constraints
[30, 31], also under the fixed capture sequence assumption. It should be noted that, because
these solutions require a fixed capture order, they omit some of the interesting curved motion
solutions and singular surfaces as described by Breakwell and Petrosjan that influence the
optimal capture time.
Liu et al [32] solve for the evader optimal open-loop control for the many-evaders suc-
cessive pursuit problem that does not assume a pursuer capture sequence. They compare
results with the solution from Belousov and the optimal HJI solution for the two-evader case,
showing also the linear and curved motion regions found by Breakwell et al. To improve
their solution for time-varying capture sequences, they also implement an iterative open-loop
approach. Computation times under one second are achieved for scenarios with five evaders
or fewer. The work does not, however, determine the optimal actions of the pursuer.
No general solution to multi-player differential games has been derived to date. Some of
the challenges to solving these games are the difficulty in defining appropriate terminal con-
ditions, solving complex partial differential equations, addressing capturability, and coping
with high-dimensional game and action spaces. Stipanovic et al [33] recently have looked
at Lyapunov methods as alternatives to solving PDEs and have also looked at capturability
[34]. Shevchenko [28] has examined the selection of alternative terminal manifolds for mul-
tiple evader problems. Studies of differential pursuit games with asymmetric information
[18, 35] and non-zero-sum objectives [25] are also of recent interest.
To avoid the analytical and computational difficulty of this problem, several approxima-
tions to the multi-player pursuit evasion differential game have recently been considered.
Jang et al [36] use direct differentiation of the game value function to solve a set of ordi-
nary differential equations rather than the HJI PDEs and obtain a non-cooperative set of
pursuer strategies. Bolonkin et al [37] use a geometric method to quickly approximate the single-pursuer, multiple-evader problem. Ge et al [38], Wang et al [39], and Wei et al
[40] independently use a hierarchical approach to solve the multi-player game, dividing it
into a collection of solvable subgames to reduce communication overhead and achieve real-
time performance in some instances. While these methods consider efficient solutions to the
multi-player problem, they do not necessarily claim optimality.
In his doctoral thesis, Li ([7], see also [41, 42, 43]) introduces a framework for approxi-
mating the solution to differential games with multiple pursuers and evaders. He extends the
concept of limited lookahead and rollout policies from optimal control [44] to multi-player,
zero-sum differential games and demonstrates the finite convergence of subsequent iterations
of the limited lookahead approach to the optimal solution. Furthermore, he shows that a
hierarchical approach similar to the studies above yields a valid estimate of the cost-to-go
for the limited lookahead method for certain successive pursuit games. Thus, Li’s approach
potentially realizes some of the computational efficiency benefits of the previous studies while
also achieving near-optimal results. It is this approach – limited lookahead with hierarchical
decomposition of the cost-to-go – that will be examined in this work.
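To make the limited lookahead idea concrete, here is a one-step sketch for the reduced simple-pursuit dynamics; the discretized heading sets, step size, and straight-line cost-to-go heuristic are all illustrative assumptions for this example, not Li's formulation:

```python
import math

# One-step limited lookahead for the reduced simple-pursuit game: at a
# given state, the pursuer minimizes and the evader maximizes the one-step
# cost plus an approximate cost-to-go.  The heuristic used here is the
# remaining straight-line capture time, (|x| - eps) / (1 - nu).
nu, eps, dt = 0.5, 0.1, 0.05
HEADINGS = [2.0 * math.pi * k / 16 for k in range(16)]  # discretized controls

def cost_to_go(x):
    return max(math.hypot(*x) - eps, 0.0) / (1.0 - nu)

def step(x, phi, psi):
    return (x[0] + (nu * math.sin(psi) - math.sin(phi)) * dt,
            x[1] + (nu * math.cos(psi) - math.cos(phi)) * dt)

def lookahead(x):
    """Return the minimax (phi, value) pair for a one-step horizon."""
    best_phi, best_val = None, math.inf
    for phi in HEADINGS:
        # For each pursuer control, find the evader's best (maximizing) reply.
        worst = max(dt + cost_to_go(step(x, phi, psi)) for psi in HEADINGS)
        if worst < best_val:
            best_phi, best_val = phi, worst
    return best_phi, best_val

phi, val = lookahead((3.0, 4.0))
```

At the relative state (3, 4) the chosen pursuer heading is the discretized direction closest to the line of sight, and the lookahead value is close to the true capture time of 9.8 for these parameters, as expected when the cost-to-go approximation is exact.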
In each of the works on successive pursuit by Li, Liu, and Belousov, among others, the
difficulty of the combinatorial nature of the problem is mentioned. This study seeks to extend
the results of these works by approximating the optimal controls of both players under a
variable capture sequence, as in Li, while also solving the combinatorial problem efficiently.
This will be accomplished by combining Li’s limited lookahead approach with the Monte
Carlo Tree Search method, a tool commonly used in discrete combinatorial games with high
branching factors.
A substantial review of Monte Carlo Tree Search (MCTS) and its variants can be found in
Browne et al [45]. MCTS has typically been used in the domain of two-player, discrete games
such as Go, where the method selects the best action sequences of each player, represented
by the branches of the tree, using random sampling, a tree search policy, and rollout-based
simulation. MCTS has also been used successfully in single-player games, decision theory
applications such as Markov decision processes, and optimization problems including the
traveling salesman problem and other NP-complete problems (see [46, 47, 48, 49]). Rimmel
et al [50] and Perez et al [47] have used MCTS with some success to solve TSP with time
windows and with dynamic constraints on the salesman motion.
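As an illustration of how MCTS applies to capture-order selection, the following sketch (a toy example with made-up static target positions, not the successive pursuit game itself) runs UCT over capture sequences, scoring each completed order by total path length:

```python
import math
import random

# Toy MCTS/UCT over capture orders for a single pursuer starting at the
# origin and four static targets; the pursuer seeks the order minimizing
# total path length.  Target coordinates are made up for the example.
targets = {0: (1.0, 0.0), 1: (4.0, 0.0), 2: (4.0, 3.0), 3: (0.0, 3.0)}

def order_cost(order):
    pos, total = (0.0, 0.0), 0.0
    for t in order:
        total += math.dist(pos, targets[t])
        pos = targets[t]
    return total

class Node:
    def __init__(self, order):
        self.order = order       # capture sequence so far
        self.children = {}       # next target -> Node
        self.visits = 0
        self.total = 0.0         # accumulated rollout costs

def select_child(node, c=1.0):
    # UCT for a minimizing player: a low average cost is good, so negate
    # the mean cost and add the usual exploration bonus.
    def uct(child):
        return (-child.total / child.visits
                + c * math.sqrt(math.log(node.visits) / child.visits))
    return max(node.children.values(), key=uct)

def mcts(iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(())
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: descend while the node is fully expanded.
        while node.children and len(node.children) == len(targets) - len(node.order):
            node = select_child(node)
            path.append(node)
        # Expansion: add one untried target, if any remain.
        untried = [t for t in targets if t not in node.order and t not in node.children]
        if untried:
            child = Node(node.order + (rng.choice(untried),))
            node.children[child.order[-1]] = child
            node = child
            path.append(node)
        # Rollout: complete the order randomly and score it.
        rest = [t for t in targets if t not in node.order]
        rng.shuffle(rest)
        cost = order_cost(node.order + tuple(rest))
        # Backpropagation.
        for n in path:
            n.visits += 1
            n.total += cost
    # Extract the preferred order greedily by visit count.
    order, node = (), root
    while node.children:
        t, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        order += (t,)
    return order, order_cost(order)

best_order, best_cost = mcts()
```

With 2,000 iterations the tree over the 24 possible orders is fully expanded and the visit counts concentrate on a near-optimal sequence; the same anytime structure is what makes MCTS attractive when the branching factor is too large to enumerate.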
Because MCTS is an anytime algorithm that returns a useful result even when termi-
nating prematurely, Perez et al [48] have used MCTS in conjunction with rolling horizon
evolutionary algorithms to find TSP solutions that also navigate obstacles in real time. The
anytime nature of the MCTS approach, its ability to quickly find valuable branches in com-
binatorial trees, and the compatibility of its rollout-based simulation approach with limited
lookahead suggest that MCTS is a prime candidate for addressing the Dynamic Traveling
Salesman problem covered in this work.
1.3 Outline
Before testing the ability of limited lookahead with MCTS to solve successive pursuit games,
it is necessary to introduce the theory and examples that will be used. The next chapter
presents the formulation of a differential game for two and several players and introduces
two differential game examples – simple pursuit and the Homicidal Chauffeur game.
The subsequent chapter formulates both optimal and sub-optimal solutions to these and
similar games that will be used in later analysis. The analytical solutions to the two-player
simple pursuit and Homicidal Chauffeur game are first reviewed, followed by a review of the
development of the limited lookahead method by Li for multi-player games. An overview
of existing solutions for the successive pursuit of many evaders and the proposed solution
approach for the above examples finish the chapter. Chapter 4 provides the assumptions
and implementation details for simulating limited lookahead and Monte Carlo Tree Search,
including the chosen numerical optimization and software packages.
Finally, Chapter 5 shows the results of the proposed technique for the successive pursuit
scenario. First the results are compared with known solutions for two-evader successive
pursuit with and without a fixed capture sequence. The performance of MCTS in selecting
the optimal capture sequence is examined in the next section. Results for limited lookahead
with MCTS for the many-evader scenario are then presented. The results conclude by testing
the technique in both the two-player and multi-player Homicidal Chauffeur game. Chapter
6 offers concluding remarks and suggests future work.
Chapter 2
Problem formulation
This paper considers zero-sum differential games with the competing players or processes
modeled as pursuers (minimizers) and evaders (maximizers). This chapter begins with the
formulation of a general zero-sum, two-player differential pursuit game. It then follows with
examples of the simple pursuit game and the Homicidal Chauffeur game, illustrating how
the dimension of different dynamical models can be reduced to a minimum set to ease game
analysis. Finally, the game formulation is extended to multiple players, which is the form
that will be used throughout the rest of the paper. The subsequent chapter will then address
the construction of game solutions.
2.1 Two-player differential pursuit game
A zero-sum pursuit-evasion (PE) differential game between a single pursuer and single evader
can be formulated as follows. Let the combined state variable of the pursuer and evader be
represented by x ∈ Rn, where the dimensionality n depends on the specific dynamics of the
game. The set of all possible states in the game is called the game set, denoted by S, and
can be a subset of the n-dimensional Euclidean state space. The dynamics of the game are
represented by f : Rn × Rnp × Rne → Rn,
ẋ(t) = f(x(t), a(t), b(t)), x(0) = x0, a ∈ A, b ∈ B (2.1)
where a ∈ Rnp and b ∈ Rne are the control vectors of the pursuer and evader, respectively,
and A and B are the admissible control sets of the game. For this paper it is assumed that
f is single-valued and convex in a and b, the range of f is bounded and Lipschitz continuous
and the control sets A and B are convex. It should be noted that, while (2.1) does not
show explicitly its dependence on time t, time-dependent formulations can be considered by
including t in the state vector x. For this reason, explicit notation for time t will be generally
suppressed for the subsequent development when the state vector x is present.
A differential game terminates when the state vector reaches the target set Λ, defined as
a closed subset of the boundaries ∂S of the game set S. For this paper it will be assumed
that Λ is piecewise smooth and that termination occurs when the state velocity vector f
penetrates the target set, or
f(x, a, b) · n(x) < 0
where n(x) is a unit vector normal to the boundary ∂S. Additionally, the boundary of the
target set Λ will be denoted by a continuous and continuously differentiable scalar function
ℓ(x) = 0.
The objective of the game for the pursuer (evader) is to minimize (maximize) a cost
function of the form
J(x, a, b) = ∫₀ᵀ G(x(t), a(t), b(t)) dt + Q(x(T)) (2.2)
where G : Rn × Rnp × Rne → R is the running cost and Q : Rn → R is the terminal cost. It
is assumed that G has the same properties as f and that Q and its derivatives have at most
a finite number of jump discontinuities. In the literature, games with only a running cost
term are called games of degree, while games with only a terminal cost are called games of
kind. For the pure pursuit game, G = 1 and Q = 0, and the game cost is the capture time:
J(x, a, b) = ∫₀ᵀ dt (2.3)
where T is the capture time,
T = inf{t ∈ R+ : x(t) ∈ Λ}. (2.4)
It should be noted that, for games where the evader is guaranteed to escape (i.e., the target
set is never reached), T can be infinite.
In a two-player zero-sum game, the preference of each player is in pure conflict with
the other. Each player makes a control decision based on the game information, which for
this paper will be the true, current state vector and its histories x(τ), 0 ≤ τ ≤ t unless
otherwise stated. A policy for a pursuer (evader) that assigns a control vector a (b) from
the admissible control set A (B) to a state x(t) is called the pursuer’s (evader’s) closed-loop
strategy and will be denoted by α(x(t)) = a(t) (β(x(t)) = b(t)). The set of all admissible
strategies for the pursuer and evader will be denoted by A and B, respectively. As will be seen
in the Homicidal Chauffeur game, there may be instances where optimal play requires the
knowledge of the opponent’s control, i.e., the strategy is not admissible. In those cases, the
player’s deterministic strategy will be replaced with a mixed strategy (randomized control
selection) so as not to violate game information constraints.
The sections that follow define two example games, simple pursuit and the Homicidal
Chauffeur, that will be used in subsequent sections to demonstrate the sub-optimal ap-
proaches of the paper.
2.2 Simple pursuit game formulation
Two-player simple pursuit consists of a pursuer and evader that travel at maximum speeds
vp = 1 and ve = ν, 0 < ν < 1, respectively, and can turn in any direction instantaneously.
The dynamics of the game are
ẋp = (vp sinφ, vp cosφ)T = (sinφ, cosφ)T, −π < φ ≤ π (2.5)
ẋe = (ve sinψ, ve cosψ)T = (ν sinψ, ν cosψ)T, −π < ψ ≤ π (2.6)
where xp,e represents the two-dimensional position of the pursuer and evader, respectively,
and φ and ψ are the respective controls. To simplify the analysis of the game, one can reduce
the dimensionality of the problem dynamics by defining a new state x = xe − xp relative to
the pursuer location. The dynamics then become
ẋ = f(x, φ, ψ) = (ν sinψ − sinφ, ν cosψ − cosφ)T. (2.7)
The game set is S = {x : x ∈ R2} and the target set is a small circle around the pursuer
with radius ε, Λ = {x : x ∈ R2, x1² + x2² ≤ ε²}. The boundary of the target set can be
characterized by the scalar function ℓ(x) = x1² + x2² − ε² = 0. The admissible controls at time
t are A = {φ : −π < φ ≤ π} and B = {ψ : −π < ψ ≤ π}. The pursuer (evader) seeks to
minimize (maximize) the capture time according to the cost function J in (2.3).
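The optimal play in this game is pure pursuit along the line of sight for both players, so the relative state shrinks radially at the closing speed 1 − ν. A minimal simulation sketch (Euler integration, assumed parameter values) confirms the closed-form capture time (|x0| − ε)/(1 − ν):

```python
import math

# Euler integration of the reduced simple-pursuit dynamics (2.7) with both
# players using the line-of-sight heading, which is optimal for this game.
# The parameter values below are assumed for the example.
nu, eps, dt = 0.5, 0.1, 1e-4
x = [3.0, 4.0]                          # evader position relative to pursuer

t = 0.0
while math.hypot(*x) > eps:
    # Both optimal headings point along the line of sight, so the relative
    # state shrinks radially at the closing speed 1 - nu.
    phi = psi = math.atan2(x[0], x[1])  # heading measured from the x2 axis
    x[0] += (nu * math.sin(psi) - math.sin(phi)) * dt
    x[1] += (nu * math.cos(psi) - math.cos(phi)) * dt
    t += dt

# Closed-form capture time for simple pursuit: (|x0| - eps) / (1 - nu).
t_exact = (5.0 - eps) / (1.0 - nu)
print(t, t_exact)
```

The simulated capture time agrees with the closed form to within the integration step, which is a useful sanity check before moving to games without analytical solutions.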
2.3 Homicidal Chauffeur game formulation
In the Homicidal Chauffeur game, the evader dynamics are equivalent to simple pursuit,
while the pursuer has an additional turn rate limit ω:
ẋp1 = sin xp3        ẋe1 = ν sinψ (2.8)
ẋp2 = cos xp3        ẋe2 = ν cosψ (2.9)
ẋp3 = ωφ (2.10)
The pursuer’s control is drawn from A = {φ : φ ∈ R, |φ| ≤ 1}, while the evader’s is taken
from the set B = {ψ : ψ ∈ R,−π < ψ ≤ π}. The game set, target set, information
constraints, and objective function are as in the simple pursuit game above.
The dimensionality of the problem can be reduced from n = 5 to n = 2 by transforming
the coordinates relative to the pursuer and folding in the turn rate,
ẋ1 = −ω x2 φ + ν sin ψ  (2.11)
ẋ2 = ω x1 φ − 1 + ν cos ψ.  (2.12)
In this reduced-space formulation, the pursuer heading is fixed along the x2-axis such that
a turn causes the coordinate system to rotate. Consequently the evader control ψ adopts a
different meaning from the inertial coordinates in (2.8) (see Figure 2.1). The dynamics in
the reduced game space may be less intuitive but will make the mathematical analysis more
tractable.
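A minimal numerical sketch of the reduced dynamics (2.11)-(2.12) follows, with ω = ν = 1/3 chosen to match the example parameters used later in Section 3.3; the function name `hc_step` is illustrative, not from the thesis.

```python
import math

# Euler step of the reduced Homicidal Chauffeur dynamics (2.11)-(2.12).
# OMEGA and NU match the Section 3.3 example parameters.
OMEGA, NU = 1.0 / 3.0, 1.0 / 3.0

def hc_step(x, phi, psi, dt=1e-3):
    x1, x2 = x
    return (x1 + (-OMEGA * x2 * phi + NU * math.sin(psi)) * dt,
            x2 + (OMEGA * x1 * phi - 1.0 + NU * math.cos(psi)) * dt)

# An evader straight ahead (x = (0, 2)) fleeing directly away (psi = 0)
# is still closed on at rate 1 - nu: the "-1" term is the pursuer's own
# unit-speed translation along the x2 axis of the rotating frame.
x_next = hc_step((0.0, 2.0), phi=0.0, psi=0.0)
```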
Figure 2.1: Reduced coordinates for the Homicidal Chauffeur game. The pursuer is located at the center with its heading aligned with the x2 axis. A turn by the pursuer causes the coordinate system to rotate about the point C.
2.4 Differential games with multiple players
To formulate a multi-player differential game, a few modifications need to be made to the
definitions earlier in the section. Here the formulation proceeds as in Li [7]. Assuming M
pursuers and N evaders, where each pursuer and evader is denoted by the index i and j
respectively, the dynamics of each pursuer and evader are
ẋp^i = fp^i(xp^i(t), ai(t)),  xp^i(0) = xp0^i,  i = 1, . . . , M
ẋe^j = fe^j(xe^j(t), bj(t)),  xe^j(0) = xe0^j,  j = 1, . . . , N

with respective controls ai ∈ Ai and bj ∈ Bj. The total state vector is then x ≜ (xp^T, xe^T)^T,
xp ≜ (xp1, . . . , xpM)^T, xe ≜ (xe1, . . . , xeN)^T, with the combined dynamics fp ≜ (fp^1, . . . , fp^M)^T
and fe ≜ (fe^1, . . . , fe^N)^T.
To define the termination of the multi-player pursuit game, additional definitions of a
terminal state are required. Let Pp,e(xp,e) : Rnp,ne → Rn be a projection operator that returns
the positional elements of dimension n from the respective state vector. Capture between
pursuer i and evader j occurs when ||Pp(xp^i) − Pe(xe^j)|| ≤ ε at some t ≥ 0. The capture time of
the j-th evader is then the first such time,

Tj = min{t ≥ 0 | ∃ i such that ||Pp(xp^i) − Pe(xe^j)|| ≤ ε}  (2.13)

and the game is terminated at the final capture time,

T = max_{j=1,...,N} Tj.  (2.14)
In the successive pursuit games in this paper, the game terminates only after the final
evader is captured. Following Li, define a discrete variable zj ∈ {0, 1} that assigns a value
of 0 to evader j when it is captured and 1 otherwise. The dynamics of each zj are governed
by the algebraic equations

gj(0, x) = 0
gj(1, x) = 0 if ||Pp(xp^i) − Pe(xe^j)|| ≤ ε for some i, and 1 otherwise
z(t) = z(t⁺) = g(z(t⁻), x(t))  (2.15)

where g ≜ (g1, . . . , gN)^T, z ≜ (z1, . . . , zN)^T, z ∈ Z = Π_{j=1}^N Zj, Zj = {0, 1}, and z(t⁺), z(t⁻)
denote the right and left limits at time t, respectively. Assuming the evader stops after
capture, the dynamics can be revised to their final form as
ẋ = f(x(t), z(t), a(t), b(t)),  x(0) = x0  (2.16)

where f ≜ (fp^1, . . . , fp^M, z1 · fe^1, . . . , zN · fe^N)^T and a ∈ Aa = Π_{i=1}^M Ai, b ∈ Ba = Π_{j=1}^N Bj.
The pursuer (evader) seeks to minimize (maximize) the objective
J(x, z, a, b) = ∫_{t0}^T G(x(t), z(t), a(t), b(t)) dt + Q(x(T))  (2.17)
subject to (2.15) and (2.16), with the same restrictions on G(·) and Q(·) as in the two-player
formulation above. For the pure pursuit of multiple evaders the objective becomes
J(x, z, a, b) = ∫_t^T [ Σ_{j=1}^N zj(t) ] dt  (2.18)
which is the sum of the capture times for each evader.
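This bookkeeping can be sketched for a single pursuer: the flags zj freeze a captured evader's dynamics as in (2.16), and the objective (2.18) integrates Σj zj(t). The greedy nearest-evader pursuer and radially fleeing evaders below are illustrative placeholder policies, not the optimal strategies developed later, and all names and parameter values are example choices.

```python
import math

# Sketch of the Section 2.4 bookkeeping for one pursuer and N evaders:
# z_j freezes a captured evader (2.16), a capture event flips z_j (2.15),
# and J accumulates sum_j z_j(t) dt, the objective (2.18). Policies here
# (greedy pursuer, radially fleeing evaders) are placeholders.
def successive_pursuit(xp, evaders, nu=0.3, eps=0.1, dt=1e-3, t_max=100.0):
    xp, evaders = list(xp), [list(e) for e in evaders]
    z = [1] * len(evaders)            # z_j = 1 while evader j is free
    t = J = 0.0
    while any(z) and t < t_max:
        # unit-speed pursuer heads toward the nearest uncaptured evader
        j = min((k for k in range(len(z)) if z[k]),
                key=lambda k: math.dist(xp, evaders[k]))
        d = math.dist(xp, evaders[j])
        xp[0] += (evaders[j][0] - xp[0]) / d * dt
        xp[1] += (evaders[j][1] - xp[1]) / d * dt
        for k, e in enumerate(evaders):
            if z[k]:                                  # captured evaders stop
                r = math.dist(xp, e)
                e[0] += (e[0] - xp[0]) / r * nu * dt  # flee radially, speed nu
                e[1] += (e[1] - xp[1]) / r * nu * dt
                if math.dist(xp, e) <= eps:
                    z[k] = 0                          # capture event (2.15)
        J += sum(z) * dt                              # objective (2.18)
        t += dt
    return J

cost_one = successive_pursuit((0.0, 0.0), [(1.0, 0.0)])
cost_two = successive_pursuit((0.0, 0.0), [(1.0, 0.0), (0.0, 2.0)])
```

For a single collinear evader the returned cost reduces to the simple-pursuit capture time (1 − ε)/(1 − ν); with two evaders the integrand counts both until the first capture, so J equals the sum of the two capture times.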
Chapter 3
Optimal and approximate solutions
The solution to a differential game produces the optimal player controls and game outcome
given these controls. One advantage of the differential game formulation, though, is that
one can also obtain information about entire sets of game trajectories. For example, one
can determine the set of all initial conditions where a certain threshold objective, such as
capture, is met. Additionally, the differential solution with its continuous formulation can
reveal the “topography” of the game – conditions where certain decisions yield larger or
smaller payoffs, where or when critical control decision points occur, or when to use mixed
or behavioral strategies, for example.
It will be seen, however, that some solutions to differential games can be extremely com-
plex, even when the player dynamics are simple. Furthermore, such solutions may require
heavy numerical computation. For these reasons it is desirable to have a reliable approxi-
mation to the game solution, especially if the approximation can be realized in real-time.
The sections in this chapter outline the solution to the zero-sum formulation of a dif-
ferential game by first demonstrating the approach for solving a two-player zero-sum game.
The analytical solution to the simple pursuit and Homicidal Chauffeur games will be cov-
ered briefly, including some comments on the game topography. Approximation of the game
outcome will then be addressed, including an adaptation of the limited lookahead method
from optimal control to multi-player games, work formulated previously by Li [7]. Additional
techniques for approximating the cost-to-go estimate of the limited lookahead method, also
derived by Li, will be outlined and adapted to the present problem. Finally, the chapter
gives an example solution for successive capture of several evaders for the case where the
pursuer capture sequence is either known or unknown to the evaders.
3.1 Solution approach for the two-player differential pursuit game
The solution to a two-player, zero-sum pursuit-evasion differential game consists of the fol-
lowing elements [51]:
• The capture set Sc ⊂ S where the capture of the evader is guaranteed, ∀x ∈ Sc
• The escape set Se ⊂ S (Se ∩ Sc = ∅) where capture is prevented indefinitely, ∀x ∈ Se
• The optimal pursuer strategy α∗ which guarantees game termination, ∀x ∈ Sc
• The optimal evader strategy β∗ which guarantees that a game does not terminate,
∀x ∈ Se
• The game value function V (x) = J(x, α∗, β∗), if it exists, representing the game out-
come
Optimal play or the optimal trajectory for a PE game is defined as the triplet (x, α∗, β∗)
for games where x ∈ Sc, and the value function V (x) = J(x, α∗, β∗) is the optimal outcome
at x.
A value function is said to exist if the following is satisfied [4]. First, define the upper
value function

V̄(x) = min_{α∈A} max_{β∈B} { ∫_t^T G(x(τ), α(x(τ)), β(x(τ))) dτ + Q(x(T)) }  (3.1)

and the lower value function as

V̲(x) = max_{β∈B} min_{α∈A} { ∫_t^T G(x(τ), α(x(τ)), β(x(τ))) dτ + Q(x(T)) }.  (3.2)
Assuming V̄(x) is differentiable in t and x, it satisfies the partial differential equation

−∂V̄/∂t = min_{a∈A} max_{b∈B} [ (∂V̄/∂x) f(x, a, b) + G(x, a, b) ]  (3.3)

and analogously for V̲(x),

−∂V̲/∂t = max_{b∈B} min_{a∈A} [ (∂V̲/∂x) f(x, a, b) + G(x, a, b) ].  (3.4)
If the upper and lower values are equal,

V̄(x) = V̲(x) = V(x),  (3.5)

then the so-called Isaacs condition is satisfied and one obtains a single game value V(x)
satisfying the Hamilton-Jacobi-Isaacs equation:

−∂V/∂t = min_{a∈A} max_{b∈B} [ (∂V/∂x) f(x, a, b) + G(x, a, b) ]  (3.6)
       = max_{b∈B} min_{a∈A} [ (∂V/∂x) f(x, a, b) + G(x, a, b) ].  (3.7)
From Basar and Olsder [4], the following theorem establishes the existence of the game
value function.
Theorem 1. If a continuously differentiable function V(x) exists that (i) satisfies the HJI
equation (3.6), (ii) satisfies V(x(T)) = Q(x(T)) on the boundary of the target set Λ, and (iii) either
α* or β* generates trajectories that terminate in finite time, then V(x) is the value function
and the pair (α*, β*) satisfies the saddle condition

J(x, α*, β) ≤ J(x, α*, β*) ≤ J(x, α, β*).  (3.8)
The saddle-point condition (3.8) of the above zero-sum game constitutes the Nash equi-
librium of the differential game. Under this condition, neither player can improve their
guaranteed result, V (x), by a unilateral deviation from their optimal strategy [51]. For the
remainder of the paper, an optimal strategy is one in the sense of (3.8) and will be called
a guaranteeing strategy, since each party in the differential game can guarantee at least the
game value V (x).
It should be noted that the Isaacs condition will hold for f and G that are separable, i.e.,
f(x, a, b) = f1(x, a) + f2(x, b),
G(x, a, b) = G1(x, a) +G2(x, b).
For cases where the Isaacs condition does not hold, such as when the value function or its
derivative is discontinuous, one may solve for the upper value V (x). The formulation for
the sub-optimal solution of a pursuit-evasion game in Section 3.4 addresses this condition in
more detail.
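The separability argument can be illustrated numerically by discretizing both control sets for the simple-pursuit Hamiltonian (derived in Section 3.2) and comparing min-max with max-min. The costate value p = (0.8, 0.6), speed ratio ν = 0.5, and grid resolution below are illustrative choices, not values from the thesis.

```python
import math

# Numerical illustration that min-max and max-min coincide for a separable
# Hamiltonian: here the simple-pursuit H = p1(nu sin psi - sin phi)
#                                        + p2(nu cos psi - cos phi) + 1,
# which splits into a phi-only term plus a psi-only term.
NU = 0.5

def H(p, phi, psi):
    p1, p2 = p
    return (p1 * (NU * math.sin(psi) - math.sin(phi))
            + p2 * (NU * math.cos(psi) - math.cos(phi)) + 1.0)

grid = [-math.pi + 2.0 * math.pi * k / 360 for k in range(1, 361)]
p = (0.8, 0.6)   # |p| = 1, so the exact saddle value is 1 - |p| + NU*|p| = 0.5

minmax = min(max(H(p, a, b) for b in grid) for a in grid)
maxmin = max(min(H(p, a, b) for a in grid) for b in grid)
```

Because H separates in (φ, ψ), the inner maximization is independent of the outer minimization and the two orderings agree, which is the Isaacs condition restricted to this grid.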
It remains to show how to solve for the saddle-point equilibrium V (x) for the differential
game formulation above. The following theorem is also from Basar and Olsder [4]:
Theorem 2. Suppose the pair of feedback strategies (α∗, β∗) provides a saddle-point solution
to the differential game (2.1) - (2.4), with x∗(t) denoting the corresponding state trajectory.
Furthermore, let its open-loop representation {a(t) = α(t, x∗(t)), b(t) = β(t, x∗(t))} also
provide a saddle-point solution. Then there exists a costate function p(·) : [0, T ]→ Rn such
that the following relations are satisfied:
ẋ*(t) = f(x*(t), a*(t), b*(t)),  x*(0) = x0  (3.9a)
H(x*, p, a*, b) ≤ H(x*, p, a*, b*) ≤ H(x*, p, a, b*),  ∀a ∈ A, ∀b ∈ B,  (3.9b)
ṗ^T(t) = −(∂/∂x) H(x*(t), p(t), a*(t), b*(t)),  (3.9c)
p^T(T) = (∂/∂x) Q(x*(T)) along ℓ(x(T)) = 0,  (3.9d)

where

H(x, p, a, b) ≜ G(x, a, b) + p^T f(x, a, b)  (3.10)
is the Hamiltonian and
H(x, p, a*, b*) = min_{a∈A} max_{b∈B} H(x, p, a, b)  (3.11)

is known as the first main equation of Isaacs [5]. In pursuit-evasion games, the costate
equation is the gradient of the value function,

p^T(t) = (∂/∂x) V(x(t)).
Note that in this case, the gradient of V(x(t)) is a function of time only.¹

¹ Also recall that the state vector x(t) may contain the variable t.

The equations in (3.9) can be used to solve for the (regular) optimal trajectories, control
strategies, and value function where it is continuous and differentiable. In the section that
follows, the solution methodology will be demonstrated for simple pursuit. Later a partial
solution for the Homicidal Chauffeur game will be shown.
There are many cases in differential games where V (x) is discontinuous in the derivative
or in the function itself, or when the optimal strategies α∗ or β∗ are not unique. These
situations give rise to singular surfaces which divide the game set into mutually disjoint
regions where V (x) is continuous. Within the continuous regions – the regular part of the
game space – the costate equations above can be solved to obtain regular trajectories. At the
discontinuous boundaries, however, additional techniques must be used to find the singular
surfaces. The Homicidal Chauffeur example in Section 3.3 provides a brief example of a
singular solution and identifies a few additional singular surfaces present in the game. For a
more detailed introduction to singular surfaces, see Lewin [51].
The determination of the capture and escape sets is another important element of the
differential PE game solution. Such sets can be constructed from the game set boundaries
or from singular surfaces. Since most of the example solutions in this paper are confined
to the capture set, a detailed discussion of determining capture and escape sets will not be
covered here. The example solutions in this chapter will address capturability briefly.
3.2 Two-player simple pursuit example
As an example of how to solve a differential game using the equations in (3.9), this section
examines the two-player, zero-sum simple pursuit game defined in Section 2.2. The goal is to
find a set of candidate optimal trajectories that begin in the capture region Sc and terminate
on the target set Λ. The simple pursuit example below will demonstrate the procedure for
finding solutions in the regular part of the game space. The procedure follows the approach
from Lewin [51].
To identify a candidate regular trajectory, one must begin by partitioning the target
set into a usable part where such trajectories may terminate, and a non-usable part where
optimal trajectories cannot terminate. The usable part of the target set ΛUP are the points
along the boundary ∂Λ that satisfy

Λ_UP ≜ {x ∈ ∂Λ | min_{a∈A} max_{b∈B} [f(x, a, b) · n(x)] < 0}  (3.12)

where n(x) ∈ R^n is a unit vector normal to the target set pointing into the game set. The
non-usable part Λ_NUP can be defined analogously with the inequality reversed, and the
boundary Λ_BUP is (3.12) where the condition is an equality.
For the simple pursuit game of Section 2.2, one can evaluate the condition in (3.12) by
first determining the controls ā = φ̄ ∈ {−π < φ ≤ π}, b̄ = ψ̄ ∈ {−π < ψ ≤ π}:

φ̄ = arg min_φ [f(x, a, b) · n(x)] = arg min_φ [(ν sin ψ − sin φ) x1/ε + (ν cos ψ − cos φ) x2/ε]
ψ̄ = arg max_ψ [f(x, a, b) · n(x)] = arg max_ψ [(ν sin ψ − sin φ) x1/ε + (ν cos ψ − cos φ) x2/ε]

where n(x) = (x1/ε, x2/ε)^T and ε² = x1² + x2² on the boundary of the target set. Evaluating
these optimization conditions yields

tan φ̄ = x1/x2 = tan ψ̄,

signifying that the controls at the boundary point away from the origin. Substituting sin φ̄ =
x1/ε, cos φ̄ = x2/ε into the condition in (3.12), one obtains

f(x, ā, b̄) · n(x) = (ν − 1)(x1² + x2²)/ε² < 0,  ∀x ∈ ∂Λ
since ν < 1 by definition. Since this is satisfied for all x, the entire target set boundary is
the usable part, Λ_UP = ∂Λ. This means that the regular optimal trajectories can terminate
anywhere on the circle ℓ(x) = x1² + x2² − ε² = 0.
One now seeks the candidate optimal control laws for each player. This is obtained using
the first main equation of Isaacs (3.11):

φ* = arg min_φ H(x, p, φ, ψ*) = arg min_φ [p1(ν sin ψ* − sin φ) + p2(ν cos ψ* − cos φ) + 1]
ψ* = arg max_ψ H(x, p, φ*, ψ) = arg max_ψ [p1(ν sin ψ − sin φ*) + p2(ν cos ψ − cos φ*) + 1]
where H(x, p, φ, ψ) is the Hamiltonian from (3.10) and G(x, a, b) = 1.
Proceeding in the same manner as with the boundary condition above, one obtains

tan φ* = p1/p2 = tan ψ*  (3.13)

suggesting that the player controls are parallel.² To fully determine the controls, the costate
variables p^T = (p1, p2) = ∂V/∂x need to be determined. Referring to (3.9c) and noting that H
is independent of x, one can deduce that p is constant and therefore the player controls yield
constant, straight-line motion.
The equation in (3.9c) is known as the adjoint equation or retro-path equation (RPE), as
it signifies an integration along a path from the terminal set in reverse time. Since ṗ = 0, it
is evident that an additional condition is needed to solve for p. One condition can be found
from the terminal condition in (3.9d). Before proceeding, however, it is useful to state three
lemmas from Lewin [51]:

² Anti-parallel would imply the optimal evader control is to always approach the pursuer!
Lemma 1. At points x in the useable part where optimal trajectories terminate:
V (x) = Q(x) (3.14)
Lemma 2. If the subset of the usable part where optimal trajectories terminate is of dimen-
sion m and if κ are m vectors that span the tangent to that subset at x, then the following
m relations between the directional derivatives of V (x) and Q(x) hold:
∇V (x) · κ = ∇Q(x) · κ (3.15)
Lemma 3. For points in the capture set Sc that belong to regular parts of optimal trajectories,
Equation (3.9b) and the following relation must hold:
H(x, p, a∗, b∗) = 0 (3.16)
The first lemma simply states that the value at the usable part of the terminal set is the
cost function itself. The second lemma gives a tangent boundary condition for the gradient
of the value function at the boundary (equivalent to the costate p^T) that can be used to solve
the RPE. Equation (3.9b) from the third lemma together with (3.11) constitutes Isaacs' first
main equation, and (3.16) is the second main equation. The second main equation suggests
that, for terminal cost games, the Hamiltonian for the regular part of the solution space can
be interpreted as a measure (in the informal sense) of how much the game trajectory points
perpendicular to the gradient of the game value.
To finish the simple pursuit solution, the tangent relation (3.15) suggests that, since
Q(x) = 0,

∇V(x) · κ = p^T κ = 0,

or that p is perpendicular to the terminal surface. Since φ* and ψ* are parallel to p, they
must also point normal to the surface. With κ = (−x2/ε, x1/ε)^T at the terminal boundary,
one obtains

p1/p2 = x1/x2 = x1(0)/x2(0).
Since the player controls point in the direction defined by p1/p2, the optimal pursuer and evader
controls consist of straight-line motion away from the pursuer along the initial line connecting
the two players. Since the optimal trajectories cover the entire game space and can terminate
anywhere on the target set, it can be concluded that the locally-derived optimal controls are
globally optimal. Furthermore, since ν < 1 the pursuer will always overtake the evader for
any initial condition, so the capture set Sc is the entire game set S.
To determine the game value V(x), more conditions on p are needed. Using the second
main equation (3.16), one obtains

H(x, p, φ*, ψ*) = p1(ν sin ψ* − sin φ*) + p2(ν cos ψ* − cos φ*) + 1 = 0.

Substituting p1 = p2 tan φ* = p2 tan ψ* and noting that cos φ* = cos ψ*, one obtains after some
algebra

cos ψ* = cos φ* = p2(1 − ν)
sin ψ* = sin φ* = p1(1 − ν)

which, when substituted back into (3.16), yields

p1² + p2² = (1/(1 − ν))².
Finally, substituting p1 = (x1/x2) p2 into the above gives

p1 = ± (x1/√(x1² + x2²)) (1/(1 − ν))
p2 = ± (x2/√(x1² + x2²)) (1/(1 − ν))

which, when integrated with respect to x, returns

V(x) = √(x1² + x2²)/(1 − ν) + C.
Using (3.14) from the first lemma resolves the constant of integration at the terminal set
to finally obtain

V(x) = (√(x1² + x2²) − ε)/(1 − ν),

which is the geometrically intuitive result. For example, for an evader speed of ν = 1/2, the
pursuer captures at a location twice the initial relative distance (minus the target radius)
along the initial bearing to the evader.
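The closed-form value can also be verified by forward simulation: integrating the reduced dynamics (2.7) under the derived straight-line controls and comparing the resulting capture time with V(x). The parameters and names below are example choices, not values from the thesis.

```python
import math

# Numerical check of V(x) = (sqrt(x1^2 + x2^2) - eps)/(1 - nu) by
# integrating the reduced dynamics (2.7) under the straight-line
# saddle-point controls: both players move along the initial line of sight.
def capture_time(x0, nu=0.5, eps=0.1, dt=1e-4):
    x1, x2 = x0
    r0 = math.hypot(x1, x2)
    s, c = x1 / r0, x2 / r0          # fixed headings: sin/cos of phi* = psi*
    t = 0.0
    while math.hypot(x1, x2) > eps:
        x1 += (nu * s - s) * dt      # (2.7) with sin(phi*) = sin(psi*) = s
        x2 += (nu * c - c) * dt
        t += dt
    return t

x0, nu, eps = (1.0, 1.0), 0.5, 0.1
v_closed = (math.hypot(*x0) - eps) / (1.0 - nu)
v_sim = capture_time(x0, nu, eps)
```

The simulated capture time agrees with the closed-form value up to the integration step size, since the relative motion is exactly collinear and the range shrinks at rate 1 − ν.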
Such a result can be determined by simpler means using geometrical arguments, as Isaacs
did [5], but the result here does illustrate the basic solution procedure. The example in the
following section is more challenging and requires many of the tools presented here.
3.3 Homicidal Chauffeur example
The Homicidal Chauffeur (HC) game consists of a pursuer (a car) with a turn rate limit who
chases a pedestrian who is slower but can turn instantaneously. The game was introduced
by Isaacs, who showed that the solution exhibits a variety of singular surfaces. Merz [8] in
his dissertation discovered twelve different singular phenomena in twenty different regions
of the game's parameter space. Singular surfaces have a profound effect on the formation
of optimal player strategies, often requiring player controls to consist of several different stages
with a variety of control laws. The Homicidal Chauffeur game’s nonlinear dynamics and rich
set of singular surfaces make it a good candidate for testing the viability of the lookahead
method and its utility in computing complex strategies in an automated way.
This section demonstrates a few of the singular phenomena present in the Homicidal Chauffeur
game and addresses briefly the solution for a single set of parameters in a limited region
of the game space. The exposition below follows the works of Isaacs [5] and Merz [8].
The solution to HC begins as with the previous example by finding the usable and non-
usable parts of the circular target set. Let x = (ε sin θ, ε cos θ)T be a point on the boundary
of the target set `(x) = 0, with θ defined clockwise from the x2-axis. The vector normal to
the circle is n(x) = (n1, n2)^T = (sin θ, cos θ)^T. Using condition (3.12) one obtains

min_φ max_ψ [n1 ẋ1 + n2 ẋ2] = min_φ max_ψ [sin θ(−ω(ε cos θ)φ + ν sin ψ) + cos θ(ω(ε sin θ)φ − 1 + ν cos ψ)]
                             = max_ψ [−cos θ + ν cos(ψ − θ)]   (the φ terms cancel identically)
                             = ν − cos θ < 0,

which yields the condition for the angle θB at the boundary of the usable part,

cos θB = ν,  0 ≤ θB ≤ π/2,  (3.17)

where θB is confined to the first quadrant. The usable part is then

|θ| < θB  (3.18)
with the boundary occurring at θB. The useable part (UP) is identified in the diagram in
Figure 3.1.
Figure 3.1: The value map and singular surfaces of a Homicidal Chauffeur game with vp = 3, ve = 1, ω = 1/3 and ε = 1. Coordinates are centered on the pursuer, with the pursuer heading aligned with the vertical (x2) axis. The contours represent capture times for various initial conditions, sampled at 0.5 time units and increasing outward from the usable part (UP) of the target set. The labeled features include the barrier, universal line, equivocal line, dispersal line, and regular trajectories. All distances and times are normalized by the pursuer speed.
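The boundary angle from (3.17) is easy to evaluate for the parameters of this section: θB = arccos ν ≈ 70.5° for ν = 1/3, so most of the front half of the target circle is usable. A small sketch (the helper name `boundary_speed` is illustrative):

```python
import math

# Boundary of the usable part from (3.17): cos(theta_B) = nu.
# With nu = 1/3 (the Section 3.3 value), theta_B ~ 1.231 rad ~ 70.5 deg.
NU = 1.0 / 3.0
theta_b = math.acos(NU)

def boundary_speed(theta):
    # max over psi of [-cos(theta) + nu*cos(psi - theta)], attained at psi = theta
    return -math.cos(theta) + NU

# Negative inside |theta| < theta_b (usable part), zero at the boundary
# (BUP), positive beyond it (non-usable part).
```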
For the range of parameters used in this game, the boundary of the usable part has an
interesting property – it is the origin of a singular surface called a barrier. A barrier is a line
or surface in the game set where the game value is discontinuous and is so called because
neither player can penetrate the surface if the other plays optimally. The condition for a
barrier is similar in form to Isaacs' main equations and the equation for the usable part:

min_{a∈A} max_{b∈B} [f(x, a, b) · n(x)] = 0  (3.19)
where n(x) ∈ Rn in this case is a vector normal to the barrier surface. The derivation of the
barrier in this example is illustrative and will be given briefly; for full details, see [5].
Substituting the dynamics (2.11) into the main equation (3.16),

min_φ max_ψ (−ω(x2 n1 − x1 n2)φ − n2 + ν(n1 sin ψ + n2 cos ψ)) = 0,

and solving, one obtains

φ̄ = sgn S = σ,  σ ∈ {−1, 1},

where S = x2 n1 − x1 n2 is the switch function that determines the direction of the pursuer
control. For the evader,

cos ψ̄ = n2/ρ,  sin ψ̄ = n1/ρ,  ρ = √(n1² + n2²),

and the main equation with φ̄, ψ̄ becomes

−σωS − n2 + νρ = 0.

The RPE equation (3.9c) and trajectory equations (2.11) then become

ẋ1 = −ωσ x2 + ν n1/ρ,  ẋ2 = ωσ x1 − 1 + ν n2/ρ
ṅ1 = −ωσ n2,  ṅ2 = ωσ n1
With the additional condition S = −n1 and, on ∂Λ, σ = sgn n1 = sgn θB, the above
equations can be solved to obtain the right barrier (σ = 1):

x1 = (ε − ντ) sin(θ + σωτ) + (1 − cos σωτ)/ω  (3.20)
x2 = (ε − ντ) cos(θ + σωτ) + (sin σωτ)/ω  (3.21)
The left barrier (σ = −1) is symmetric with the right. Figure 3.1 illustrates the barrier
paths.
Note that the barrier paths terminate before reaching the x2 axis. The switch function
determines this termination point. Using the relations above, the switch function can be
found to be
S = [cos θ − cos(θ + σωτ)] .
The barrier continues until the switch function is no longer positive, or
θ + σωτ = 2π − θ.
At this point the barrier terminates and optimal trajectories can be routed around it.
An evader starting behind the barrier requires, then, that the pursuer travel away from
the evader for some period before turning full circle and finally pursuing along a straight
line in the same manner as the simple pursuit game. The evader, on the other hand, chases
the pursuer directly along their connecting line until their trajectory passes the end of the
barrier. For the rest of the pursuit, the evader flees along a straight line tangential to the
pursuer’s turning circle until capture occurs.
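The barrier curve (3.20)-(3.21) and its termination condition can be sampled directly. The sketch below uses the parameters of this section with σ = +1 and launches the curve from θ = θB; all names are illustrative.

```python
import math

# Sampling the right barrier (3.20)-(3.21) for nu = 1/3, eps = 1,
# omega = 1/3, sigma = +1, launched from theta = theta_B. The barrier
# ends where the switch function S = cos(theta) - cos(theta + sigma*omega*tau)
# returns to zero, i.e. at theta + omega*tau = 2*pi - theta.
NU, EPS, OMEGA, SIGMA = 1.0 / 3.0, 1.0, 1.0 / 3.0, 1.0
THETA_B = math.acos(NU)

def barrier_point(tau):
    ang = THETA_B + SIGMA * OMEGA * tau
    x1 = (EPS - NU * tau) * math.sin(ang) + (1.0 - math.cos(SIGMA * OMEGA * tau)) / OMEGA
    x2 = (EPS - NU * tau) * math.cos(ang) + math.sin(SIGMA * OMEGA * tau) / OMEGA
    return x1, x2

tau_end = (2.0 * math.pi - 2.0 * THETA_B) / OMEGA   # switch-function zero
points = [barrier_point(tau_end * k / 50) for k in range(51)]
```

At τ = 0 the curve starts on the target circle at angle θB, and the switch function stays positive until τ = tau_end, reproducing the termination behavior described above.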
To address the game solution, one is interested in finding the optimal player controls and
trajectories. In this problem, the regular trajectories that emanate from the terminal set are
less significant than in simple pursuit, as they originate fairly close to the target set and
fill very little of the game set. The derivation of these trajectories is similar to the previous
examples and will not be reproduced here; more details can be found in [8] and [5]. The
regular trajectories can be found to be

x1* = (ε − ντ) sin(θB + ωτ) + (1 − cos ωτ)/ω
x2* = (ε − ντ) cos(θB + ωτ) + (sin ωτ)/ω
An example trajectory emanating from the target set can be seen in Figure 3.1. Note that
this trajectory begins on the barrier. For this case, all trajectories terminating on the usable
part except at x1 = 0 begin on the barrier at a point called the dispersal point and do not
fill the entire game set. This behavior leaves a void above the target set, and other methods
must be used to obtain candidate trajectories.
Much of the game set for this example is filled with optimal trajectories that are tribu-
taries to a singular line called a universal line by Isaacs. Optimal trajectories join this line
transversely from both sides and then travel along it. In this game the x2 axis constitutes the
universal line for x2 > ε as well as a portion below the target set (see Figure 3.1). Universal
lines act as tributaries for optimal trajectories such that, should a player act sub-optimally,
the optimal next move is to return to the line. Note that in some instances a non-admissible
strategy – one where one player must know the control of the other to act optimally – is
required to remain on the line. This can result in a chatter condition where the player con-
stantly oscillates to and from the surface. An example of this will be seen in the results of
Section 5.4.
Universal surfaces are often good candidates for finding optimal trajectories within voids.
On the universal line the switch function, its retrograde derivative, and the Hamiltonian are
all zero (see Lewin [51, p. 187]). Using these relations, one can derive the following optimal
trajectories that fill much of the void of the present game [5]:

x1 = (h − ντ) sin σωτ + (1 − cos σωτ)/ω
x2 = (h − ντ) cos σωτ + (sin σωτ)/ω

and the value function

V = (h − ε)/(1 − ν) + τ
where h is the distance along x2 from ε where the optimal trajectory contacts the universal
line. To compute V one can rearrange the trajectory equations and solve for τ and h assuming
the condition x1 = 0 when τ = 0:

cos σωτ = [−R(x1 − R) + x2 √(x1² + x2² − 2x1R)] / [(x1 − R)² + x2²]
sin σωτ = [x2R + (x1 − R) √(x1² + x2² − 2x1R)] / [(x1 − R)² + x2²]
where R = 1/ω is the turn radius. It is these trajectories and the contours of this expression
for V that fill much of the region shown in Figure 3.1.
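Given a point (x1, x2) in this region, τ, h, and hence V can be recovered by the inversion above. The sketch below uses the parameters of this section and assumes σ = +1 (x1 ≥ 0); it is valid only where the square root is real and cos ωτ is nonzero, and the function name is illustrative.

```python
import math

# Recovering tau, h, and V for a point filled by the universal-line
# tributaries, using nu = 1/3, eps = 1, omega = 1/3, sigma = +1.
NU, EPS, OMEGA = 1.0 / 3.0, 1.0, 1.0 / 3.0
R = 1.0 / OMEGA                       # pursuer turn radius

def void_value(x1, x2):
    disc = math.sqrt(x1 * x1 + x2 * x2 - 2.0 * x1 * R)
    denom = (x1 - R) ** 2 + x2 * x2
    c = (-R * (x1 - R) + x2 * disc) / denom    # cos(sigma*omega*tau)
    s = (x2 * R + (x1 - R) * disc) / denom     # sin(sigma*omega*tau)
    tau = math.atan2(s, c) / OMEGA
    # invert x2 = (h - nu*tau)*cos(omega*tau) + sin(omega*tau)/omega for h
    h = (x2 - math.sin(OMEGA * tau) / OMEGA) / math.cos(OMEGA * tau) + NU * tau
    return (h - EPS) / (1.0 - NU) + tau        # value V = (h - eps)/(1 - nu) + tau

# On the universal line (x1 = 0), tau = 0 and V reduces to (x2 - eps)/(1 - nu).
v_axis = void_value(0.0, 3.0)
```

A round trip through the forward trajectory equations (pick h and τ, generate (x1, x2), then invert) reproduces the same value, which is a convenient consistency check on the inversion.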
The optimal player controls for this region, aside from the area behind the barrier, are
as described previously, where the pursuer executes a hard turn in one direction and follows
with straight simple pursuit, while the evader flees along a straight line tangentially from
the pursuer turning circle. A detailed derivation of these control strategies can be found in
Isaacs [5] or Merz [8].
It is evident from the phenomena of the barrier and universal line in this example that
singular surfaces can have a significant effect on player strategies. The presence and type
of singular surfaces in a differential game can vary according to the game parameters. The
parameters for the present section follow the example from Patsko et al. [14] with an evader-
to-pursuer speed ratio ν = 1/3, a capture radius ε = 1 and turn rate ω = 1/3. These
parameters correspond to Region IIc of Merz, wherein a barrier, a universal line, a pursuer
dispersal line, an equivocal line, and safe contact may be encountered in the course of play
(see [8], also [52]). The barrier and universal line concepts have been discussed previously.
A dispersal line for a player indicates a set of points where the player, upon reaching
that point, must decide between two equally valid optimal strategies and make an immediate
change of course using the selected control. In this Homicidal Chauffeur example, a pursuer
dispersal line occurs along the negative x2 axis (see Figure 3.1) where the pursuer faces
directly away from the evader and must choose to turn sharply either right or left. Both the
gradient of V and the switch function are discontinuous across this line.
An equivocal line for a player indicates a set of points where the optimal strategy for
the player may be either to choose to remain on the equivocal line or to deviate. In this
game, an equivocal line for the evader extends from the end of the barrier to the negative
x2 axis, joining at the junction of the universal line and the dispersal line (again refer to
Figure 3.1). If the evader chooses to remain on the line it can travel to the end of the barrier,
along which it can travel to terminate the game tangentially along the terminal set. The
behavior of traveling alongside a boundary, barrier, or terminal set is called safe contact. If
for a particular game a mere grazing of the terminal set is a result preferred by the evader,
it may elect to follow this strategy. Otherwise the evader deviates from the equivocal line
and follows one of the optimal trajectories emanating therefrom.
Travel along the equivocal line requires that the evader follow a path in pure pursuit of
the pursuer. For the pursuer, travel along an equivocal line requires a mixed strategy, where
its optimal control must be selected from two control options according to some probability.
This control, unlike the hard-turn controls described previously, has time-varying curvature,
where the direction of the curvature is selected randomly at each instant. This behavior can
result in a chattering phenomenon similar to that of the universal line mentioned previously.
However, such a condition can be avoided, as noted by Isaacs, if the pursuer plays sub-
optimally for a brief period in order to draw the evader beyond the equivocal line and onto
a regular trajectory that requires only a simple sharp turn.
It should be noted that where the equivocal line, universal line, and dispersal line meet
there is a condition where both the pursuer and evader have different control options. In this
case both parties may have to execute mixed strategies. The presence of mixed strategies
suggests that an automated numerical solution to the Homicidal Chauffeur must address the
randomized selection of player controls (see Section 4.2).
While not present in this particular instantiation of the game, other singular surfaces can
occur in the Homicidal Chauffeur and other differential games, for example a switch envelope or a focal
line. For a good review of the topography of singular surfaces, see Lewin [51, ch. 8]. It
should be noted that a focal line – similar to a universal line, but where trajectories contact
tangentially – will be seen in the two-evader simple pursuit game in Section 3.7.
Because of the complexity of the singular surfaces within the game, Homicidal Chauffeur
has been used as a test case for several numerical solution schemes such as level set methods
[52]. Such schemes are able to generate value maps, such as the contours in Figure 3.1, as
lookup tables which can be used by online approximation schemes such as limited lookahead.
This effectively enables fast and automatic generation of optimal controls without relying
on the detailed analysis of this section. The goal of the remainder of this chapter and the
simulation results of Section 5.4 is to introduce limited lookahead and examine its viability
for approximating optimal controls for games with singular value functions.
3.4 Value function when the Isaacs condition is not satisfied
Before an approximate solution to a two-player differential game can be obtained, it is
first necessary to examine the case when the Isaacs condition (3.5) is not known to hold.
This can be necessary, say, when one has only an approximate upper (lower) bound of the
upper (lower) value of the game, as will be the case for the sub-optimal approaches of the
next section. In this situation it is often the case [4] that one of the players assumes an
instantaneous informational advantage over the other. Formally, the team of pursuers can
assume a strategy α : B(t) → A(t), α ∈ Γ(t) based on a strategy from the evader set B(t).
Analogously, the evaders assume a strategy β : A(t) → B(t), β ∈ ∆(t). Sets Γ and ∆
contain all possible nonanticipative strategies – strategies based only on the opponent's current
or previous states and controls – for the pursuers and evaders, respectively.
Given these informational constraints, the following value functions can be defined:

$$V^+(x(t), z(t)) = \inf_{a \in A(t)} \sup_{\beta \in \Delta(t)} J(x(t), z(t), a(t), \beta[a](t)) = \inf_{a \in A(t)} \sup_{\beta \in \Delta(t)} \left( \int_t^T G(x(t), z(t), a(t), \beta[a](t))\, dt + Q(x(T)) \right) \tag{3.22}$$
where the evader has the informational advantage, and
$$V^-(x(t), z(t)) = \sup_{b \in B(t)} \inf_{\alpha \in \Gamma(t)} J(x(t), z(t), \alpha[b](t), b(t)) = \sup_{b \in B(t)} \inf_{\alpha \in \Gamma(t)} \left( \int_t^T G(x(t), z(t), \alpha[b](t), b(t))\, dt + Q(x(T)) \right) \tag{3.23}$$
where the pursuer has the advantage. Note that V +(x(t), z(t)) ≥ V −(x(t), z(t)).
Given the regularity conditions on f, G, and Q from earlier sections, $V^+$ and $V^-$ are
solutions to (3.3) and (3.4) and are thus equal to the upper and lower values of the game,
respectively (see [4]). This formulation is particularly useful when solving (3.3) and (3.4)
using the viscosity formulation initially derived by Crandall and Lions [10]. The viscosity
framework allows for numerical solutions to the Hamilton-Jacobi-Isaacs equations, including
value functions with discontinuities and discontinuous derivatives that commonly occur in
differential games. The level set method (see Section 4.1) uses the viscosity formulation to
solve for the value map for a variety of differential games.
The sub-optimal solutions of the following section, in addition to the viscosity solutions to
the HJI equation, will assume the informational advantages and value functions presented
here.
3.5 Limited lookahead for multi-player games
To date, a general solution to multi-player differential games has not been found. One
difficulty lies in defining terminal sets and specifying how the dynamics and objective should
change as different players reach the terminal sets at different times. In the formulation of
multi-player games in Section 2.4, a discrete variable representing the capture of each evader
was introduced to account for asynchronous capture. However, a solution to the Hamilton-
Jacobi-Isaacs equation with mixed continuous and discrete variables is not yet available. In
place of a general optimal solution, Li in his dissertation [7] developed a general methodology
for multi-player differential games to approximate the upper or lower value of the game using
the limited lookahead method. His work is summarized in this section and will be used
elsewhere in the paper to approximate the solution to the successive capture problem.
In the limited lookahead scheme, the current game value and optimal trajectories for all
of the players are computed over a small time interval [t, t + ∆t] using an estimate of the
game value from t+ ∆t to the capture time T . This game value estimate is analogous to the
cost-to-go of limited lookahead in optimal control [44] and analogous to a rollout policy. If
the cost-to-go has the improving property, then successive iterations of the lookahead scheme
will result in an approximate game value that approaches the true (upper or lower) value of
the game as the number of iterations approaches infinity. Correspondingly, the minimax (or
maximin) strategies of the players will also approach the optimal (guaranteeing) strategies.
In his dissertation, Li proves that the cost-to-go of the game formulation above has the
improving property and finite convergence to the game value under certain conditions to be
described subsequently.
To facilitate the estimation of the cost-to-go, it is beneficial to define a structured, or
restricted, control set with time-consistent elements. Imposing a structure on the control set
can, for example, reduce the number of control types that must be examined to find the
optimal strategy and ease the evaluation of different strategy combinations. In this paper,
a control set under structure S is denoted by $A_S(t, x, z)$ for the pursuers and $B_S(t, x, z)$
for the evaders.
In order to establish the improving property of the approximate cost-to-go, it is necessary
to require that the control sets be set-time-consistent. If a control is selected from a set-
time-consistent set at some time t, the control is guaranteed to be available at any later time
τ, t ≤ τ ≤ T. In particular, if a control structure S is independent of the state x, z and time
t, i.e., $A_S(t, x, z) \equiv A_S$ for all x, z, and t, then it is set-time-consistent.
It is assumed that the differential game takes place in the capture region, x ∈ S_c; that is,
for any x ∈ S_c, z ∈ Z, and any time t ≥ 0, there exists a ∈ $A_S(t, x, z)$ such that T < ∞ for
all b ∈ B(t). This assumption reduces the problem to finding optimal strategies for the players
without having to address capturability.
Under these assumptions, an approximate upper value can be defined as

$$\hat V(x, z) = \inf_{a \in A_S(t,x,z)} \sup_{\beta \in \Delta(t)} \left( \int_t^T G(x(t), z(t), a(t), \beta[a](t))\, dt + Q(x(T)) \right). \tag{3.24}$$
The definition of the lower value is analogous. It should be noted that $\hat V(x, z) \ge V^+(x, z)$,
i.e., the approximate upper value is an upper bound for the actual upper value.
It is now possible to state the theorem for limited lookahead in multi-player differential
games as proven by Li:
Theorem 3. Under a set-time-consistent control structure S, for any x ∈ S_c, z ∈ Z, and
∆t : 0 ≤ ∆t ≤ T − t, the function $\hat V(x, z)$ in (3.24) satisfies

$$\hat V(x, z) = \inf_{a \in A(t)} \sup_{\beta \in \Delta(t)} \left( \int_t^{t+\Delta t} G(x(\tau), z(\tau), a(\tau), \beta[a](\tau))\, d\tau + \hat V\big(x_{t,a,\beta[a]}(t + \Delta t),\, z_t(t + \Delta t)\big) \right) \tag{3.25}$$

where $x_{t,a,b}(\tau)$ and $z_t(\tau)$ are the continuous and discrete state vectors at time τ ≥ t as
generated from the initial states x(t) and z(t), respectively, under the controls a and b.
With the formulation in (3.25), one can use an estimate of the cost-to-go at a short time
interval ∆t later – the $\hat V$ term on the right-hand side of the equation – to estimate the game
value and then obtain the corresponding approximately optimal strategies $a^*$ and $\beta^*$ for the
game using (3.25). If $\hat V$ has the improving property, then subsequent evaluations of (3.25)
will yield approximate game values that are successively closer to the true upper value. Li
has proven that the limited lookahead value $\hat V(x, z)$ has the improving property and,
moreover, converges to the true upper value in finitely many iterations; for details, see [7].
This establishes the ability of the limited lookahead method to approximate the upper value
of a differential game and hence to determine approximately optimal controls.
3.6 Approximating cost-to-go for limited lookahead
As detailed in the previous section, the limited lookahead method requires an estimate of
the cost-to-go – the estimated game value at a future state – to refine the estimate of the
game value at the current state and determine the appropriate controls for the current time
interval. In formulating the limited lookahead method for multi-player games, Li introduced
the concept of a structured or restricted control set to facilitate estimation of the cost-to-go.
This section provides an example from Li [7] of a control structure that can be used in
running-cost games like simple pursuit to obtain a valid estimate of the cost-to-go.
Because it can be difficult to define terminal states in a general multi-player differential
game, Li proposes a hierarchical solution to the game where the game is divided into two
“levels” of optimization – an upper level where the assignment of pursuers and evaders is
optimized, and a lower level where the game value for a particular assignment is solved. For
assignments where pursuers engage more than one evader, the games are solved sequentially
with the assumption that the evaders know the strategy of the pursuer for all of the previous
engagements. In this sense, when approximating the upper value using the hierarchical
method, the evaders are given an informational advantage at the lower level in a manner
similar to that of (3.24) to derive a “local optimization” against the pursuer. The pursuer
then finds the assignment at the upper level that yields the smallest game value.
Let $s_i$ be the assigned capture sequence for pursuer i, represented by an ordered set of
evader indices, $s_i = \{s_{i1}, \cdots, s_{iN_i}\}$, $s_{ik} \in \{1, \cdots, N\}$, where $N_i$ is the number of evaders
assigned to pursuer i. Let $S_i$ be the set of all possible capture sequences for pursuer i and
$S = \prod_{i=1}^{M} S_i$ be the set of all possible pursuer team assignments. The (upper) game value
estimate for an engagement assignment $s = \{s_1, \cdots, s_M\} \in S$ at the upper level of the
hierarchical method is then given by

$$\hat V_h(x, z) = \min_{s \in S} \hat V_s(x, z) \tag{3.26}$$

where $\hat V_s(x, z)$ is the game value assuming the pursuers follow an assigned capture sequence
s; it represents the lower-level optimization of the hierarchical approach.
Assigning the team of pursuers a capture sequence s effectively imposes a control structure
on the pursuers in the sense of the structure S in Section 3.5, and hence the value $\hat V_s(x, z)$
can be obtained using the optimization from (3.24). It should be noted that, because the
number of evaders remains constant throughout the entire engagement, the set of possible
engagements S is independent of both time and state and is thus set-time-consistent, imply-
ing that $\hat V_h(x, z)$ has the improving property and is a valid starting point for an iterative
approach like limited lookahead [7].
If one assumes a game with running cost, such as a pure pursuit game with the objective

$$J(x, z, a, b) = \int_t^T \left[ \sum_{j=1}^{N} z_j(t) \right] dt$$

representing the sum of the capture times of the evaders, one can approximate $\hat V_h$ further.
First, it is assumed that each evader can be captured by at least one pursuer, and that an
evader is captured by no more than one pursuer. Then, assuming each pairing of pursuer
i and evader j can be solved as a two-player game, an upper value for the pairing can be
obtained individually as

$$V_{ij} = \inf_{a_i \in A_i(t)} \sup_{\beta_j \in \Delta_i^j(t)} J(x_{ij}, a_i, \beta_j[a_i]) \tag{3.27}$$

$$V_{ij} = \inf_{a_i \in A_i(t)} \sup_{\beta_j \in \Delta_i^j(t)} \int_t^T dt \tag{3.28}$$
where $x_{ij}$ is the combined state of pursuer i and evader j, and $A_i(t)$ and $\Delta_i^j(t)$ are defined
as in Section 3.4. The game value for the lower-level optimization within the hierarchical
method can then be formed as a sum of the two-player game capture times,
$$\hat V_s(x, z) = \sum_{i=1}^{M} \sum_{j \in s_i} V_{ij} \tag{3.29}$$
and used in the upper-level equation (3.26) to form the hierarchical estimate of the game
value. It should be noted that, since each evader is assumed to be capturable, $V_{ij} < \infty$
and, as proven by Li, $\hat V_h$ is uniformly continuous and therefore finitely convergent under
iteration, so that the limited lookahead method is valid for the hierarchical approach [7].
The simulation results from Chapter 5 will demonstrate the validity of this hierarchical
approach for single-pursuer, multiple-evader scenarios.
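The hierarchical estimate (3.26)/(3.29) can be sketched directly for the single-pursuer case. The following is a minimal illustration for simple pursuit with unit pursuer speed, under the simplifying assumption (not made in the text) that each evader is held fixed until engaged and then flees straight away from the pursuer; `pairwise_capture_time`, `capture_point`, and `hierarchical_value` are hypothetical names.

```python
import math
from itertools import permutations

def pairwise_capture_time(pursuer, evader, nu):
    """Two-player simple-pursuit subgame value: the evader flees straight
    away from the pursuer, so with unit pursuer speed the closing rate
    is (1 - nu) and the capture time is d / (1 - nu)."""
    return math.dist(pursuer, evader) / (1.0 - nu)

def capture_point(pursuer, evader, nu):
    """End point of the straight-line chase, on the ray from the pursuer
    through the evader, one capture time away at unit pursuer speed."""
    d = math.dist(pursuer, evader)
    t = d / (1.0 - nu)
    return (pursuer[0] + (evader[0] - pursuer[0]) / d * t,
            pursuer[1] + (evader[1] - pursuer[1]) / d * t)

def hierarchical_value(pursuer, evaders, nu):
    """Upper-level minimization (3.26) over capture sequences of the
    summed subgame values (3.29), for the sum-of-capture-times objective.
    Simplification: later evaders are held fixed until engaged."""
    best_value, best_seq = math.inf, None
    for seq in permutations(range(len(evaders))):
        p, elapsed, total = pursuer, 0.0, 0.0
        for j in seq:
            elapsed += pairwise_capture_time(p, evaders[j], nu)
            total += elapsed              # evader j's capture time from t = 0
            p = capture_point(p, evaders[j], nu)
        if total < best_value:
            best_value, best_seq = total, seq
    return best_value, best_seq
```

For one pursuer this enumerates all N! capture sequences; the tree search of Section 4.3 replaces this brute-force enumeration.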
3.7 Example solution for simple pursuit of several evaders
This section illustrates some of the elements of the solution to the single-pursuer, multiple-
evader scenario with successive capture – the Dynamic Traveling Salesman problem. The
dynamics are assumed to be those of simple pursuit (2.7) with the objective of pure pursuit
given by (2.18). It will be assumed that the pursuer speed is greater than any single evader
speed (νj < 1 for j = 1, · · · , N) so that capture of each evader is guaranteed (x ∈ Sc). Also,
for the examples in this section, capture refers to point capture (ε→ 0).
One of the first solutions to a simple pursuit game with successive capture of two evaders
was given by Breakwell et al [1] using geometrical arguments and numerical integration.
Breakwell demonstrates that for many initial conditions the optimal strategy of both the
pursuer and evaders is straight-line motion. The direction of each evader path is determined
numerically, with the second evader heading directly away from the first evader’s capture
point and the pursuer heading linearly to each capture point in succession. Breakwell also
shows that for a set of initial conditions where the evaders become equidistant from the
pursuer, curved motion by all players is optimal. Depending on the time when this occurs,
the pursuer maintains equal distance to the evaders until a critical time when the pursuer
must choose one or the other.
Figure 3.2 shows Breakwell’s numerical solution to the simple pursuit game with two
evaders (reproduced from the original paper [1]) for a variety of initial conditions. The
capture time of the second evader, normalized by the initial distance between the two evaders
and the pursuer speed, is indicated by the contours in the figure. The axes indicate the
pursuer position relative to the center of the evader pair, and the y-axis is fixed along a
line between the two evaders. The initial pursuer locations where curved motion occurs in
inertial space are indicated on the figure as regions 3 and 6; the remaining regions require
straight-line motion only. Some sample trajectories are also shown, overlaid as dashed curves.
The results from Figure 3.2 also reveal two singular surfaces – a focal line drawn from P ∗
to PC and a dispersal line from PC to the origin. All trajectories beginning in regions 3 and
6 are drawn to the focal surface. If the pursuer reaches the focal line before the point PC ,
then the optimal control for the pursuer is to remain on the line until PC , after which the
pursuer reaches the dispersal line. Trajectories that arrive at or begin on the dispersal line
demand an immediate choice by the pursuer to commit to either evader in order to obtain an
optimal capture time. From this it is evident that in the curved motion region, the optimal
capture sequence is not necessarily fixed throughout the engagement.
For scenarios with N > 2 evaders and time-varying capture sequences it can be surmised
that the singular surfaces in the game value map become increasingly complicated. However,
if one fixes the capture sequence, it has been shown by Chikrii [29] that linear motion
("parallel pursuit") is optimal for all parties. Chikrii and Belousov et al [3] have derived
algorithms for the linear-motion regime (see Section 4.1) that provide solutions equivalent
to the $\hat V_s$ of the previous section. Thus, one could use these algorithms to compute
the lower-level optimization for the hierarchical method. Combined with a combinatorial
optimization scheme to solve the upper-level equation (3.26), one could obtain a cost-to-go
suitable for the limited lookahead scheme.

Figure 3.2: Map of optimal capture times for successive pursuit of two evaders, normalized by the
separation distance between the two evaders (reproduced from Breakwell [1] with kind permission
from Springer Science and Business Media). Capture times are represented by solid contours, and
sample optimal trajectories of the pursuer relative to the two-evader system are represented by
dashed lines. Note that initial conditions from regions 3 and 6 yield optimal trajectories that
contain curved motion in inertial space.
Recently Liu et al [32] have examined the successive capture problem from the evader
perspective, creating open-loop controls for the N evaders and iterating these controls over
time to approximate an optimal evader response independent of capture sequence. They solve
the two-evader problem numerically using the HJI formulation and demonstrate through
numerical simulation the existence of the curved motion region described by Breakwell.
While they do not address the optimal pursuer response, they do create a heuristic control
for the pursuer to approximate the full HJI solution.
It is noted in several references [15, 7, 32] that solving the HJI equations numerically
for more than two evaders can become computationally prohibitive and is likely unsuitable
for real-time implementation. It is surmised that the limited lookahead solution method
can be realized in a near-real-time fashion for N ≥ 2, particularly when an efficient tree
search method is used to solve the upper-level combinatorial optimization. This will be
tested in Chapter 5. As a prerequisite, it is necessary to first define how the approximate
solution is obtained for the single-pursuer, many-evader simple pursuit game.
The formulation of the limited lookahead method for simple pursuit is as follows. Assume
one seeks to approximate the upper value of the game. To estimate the game value V for
the current time t using (3.25), a short interval ∆t is selected and the optimal control for
the interval is estimated in the following manner.
For the short time interval it is assumed that the pursuer and evaders travel in straight
lines, with the understanding that in the limit ∆t → 0, curved motion can be approximated.
A pursuer control a is selected from $A = \{\phi : -\pi < \phi \le \pi\}$, and a corresponding evader
control β[a] is selected from $\Delta(t) = \{\prod_{j=1}^{N} B_j \mid a\}$, $B_j = \{\psi_j : -\pi < \psi_j \le \pi\}$. Under these
controls the pursuer and evaders are propagated to $x(t + \Delta t)|_{a,\beta[a]} = x$, $z(t + \Delta t) = z$. The
cost-to-go in equation (3.25) is estimated from the state x, z using the hierarchical approach.
Since the objective is the sum of evader capture times and the pursuer speed is greater than
any evader speed, the necessary conditions for the hierarchical approach are satisfied.
For the hierarchical approach, the lower-level two-player game values $V_{ij}$ are given by a
sequence of two-player games as formulated in Section 3.2, where straight-line motion directly
away from the pursuer location is the optimal control for the first engaged evader. The next
evader in the sequence moves directly away from the capture location of the previous one,
and so on, and the pursuer follows linearly until all evaders are captured. The values for each
subgame are computed for the fixed capture sequence and summed according to (3.29). Next,
the combinatorial minimization over all possible capture sequences is computed as in (3.26)
to obtain the hierarchical estimate of the cost-to-go, $\hat V_h(x, z)$.
Finally, the optimization over the pursuer controls A(t) and evader controls ∆(t) is con-
ducted using $\hat V_h(x, z)$ and $G = \sum_j z_j(\tau)$ in (3.25) to find the approximate optimal controls
for the interval, $a_{t,t+\Delta t} = \phi_{t,t+\Delta t}$ and $\beta_{t,t+\Delta t} = \psi_{t,t+\Delta t}$, and the approximate upper value
$\hat V(x, z)$. As an approximation, the pursuer and evader strategies for the entire engagement
are formed by adjoining the strategies for each interval,

$$a^* = [a_{t,t+\Delta t}, \cdots, a_{t+\Delta t,T}]$$
$$\beta^* = [\beta_{t,t+\Delta t}, \cdots, \beta_{t+\Delta t,T}]$$
as in Li et al [43]. As time progresses and Equation (3.25) is iterated, the estimate of the
game value and hence the approximate optimal controls approach their true values.
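The adjoining of per-interval controls described above can be sketched as a simple outer loop; `solve_interval` and `propagate` are hypothetical stand-ins for the minimax optimization of (3.25) and the straight-line state propagation, respectively.

```python
def limited_lookahead_run(x0, z0, T, dt, solve_interval, propagate):
    """Iterate the one-step lookahead (3.25) over [0, T], adjoining the
    per-interval controls into overall strategies a* and beta*.

    solve_interval(x, z, dt) -> (a, beta, V_hat) stands in for the minimax
    optimization over one interval; propagate(x, z, a, beta, dt) advances
    the joint continuous/discrete state under the chosen controls.
    """
    a_star, beta_star = [], []
    x, z, t = x0, z0, 0.0
    while t < T:
        a, beta, _ = solve_interval(x, z, dt)
        a_star.append(a)
        beta_star.append(beta)
        x, z = propagate(x, z, a, beta, dt)
        t += dt
    return a_star, beta_star
```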
It should be noted that a similar process can be followed for the dynamics of the Homicidal
Chauffeur game using the same objective, assuming capturability of all evaders is ensured.
A condition for capturability within a two-player subgame is given by Isaacs [5, p. 237] as

$$\omega\varepsilon > \sqrt{1 - \nu^2} + \sin^{-1}\nu - 1, \qquad \nu < 1 \tag{3.30}$$

for a circular capture region about the pursuer with radius ε, an evader-to-pursuer speed
ratio ν, and a pursuer turn-rate limit ω. With these conditions met, the hierarchical
approach can be used to approximate the optimal solution assuming the solutions to each
subgame $V_{ij}$ can be computed. The next chapter details the simulation approaches and
implementation of the approximate solution above, including an efficient method for computing
the subgames $V_{ij}$ for games with more complex dynamics and singular surfaces such as the
Homicidal Chauffeur.
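Isaacs' capturability condition (3.30) is cheap to verify before attempting the hierarchical decomposition. A minimal sketch, with `hc_capturable` a hypothetical helper name:

```python
import math

def hc_capturable(omega, eps, nu):
    """Isaacs' sufficient condition (3.30) for capture in the two-player
    Homicidal Chauffeur subgame: circular capture region of radius eps,
    evader-to-pursuer speed ratio nu < 1, pursuer turn-rate limit omega."""
    if not (0.0 <= nu < 1.0):
        return False                  # capture requires a slower evader
    return omega * eps > math.sqrt(1.0 - nu * nu) + math.asin(nu) - 1.0
```

Note that as ν → 0 the right-hand side vanishes, so any positive capture radius suffices against a stationary evader, while faster evaders require a larger ωε product.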
Chapter 4
Simulation approach
In the previous chapter it was shown that a solution to the multi-player differential game
can be approximated using the limited lookahead method. The motivation for this work is
to demonstrate that such a solution can be simulated efficiently for a single-pursuer, multi-
evader pursuit game with N > 2 evaders. The simulation efforts in this work serve both
to efficiently compute lookahead results and to validate the results against known solutions
which, due to the nature of the pursuit games explored in this work, must also be computed
numerically for comparison. The following sections describe the simulation methods used to
compute both the known solutions and the lookahead solutions. As the novel contribution of
this work, the application of Monte Carlo Tree Search and table lookups to compute
the cost-to-go of the lookahead method is detailed below.
4.1 Numerical solutions to the successive pursuit game
In order to demonstrate the utility of the limited lookahead simulation technique, one needs
a reference solution for comparison. For two-player games, the viscosity method has been
shown to provide adequate numerical solutions for many game variations [13, 14, 15], even
when the value function is discontinuous. It was proven by Barron et al [53] and Evans and
Souganidis [54] that the values V + and V − from (3.22) and (3.23) are the viscosity solutions
to the HJI equations (3.3) and (3.4). Level set methods for partial differential equations have
been used to solve these HJI equations for games such as the Homicidal Chauffeur [52]. For
the sake of generating reference results to compare with the approximate lookahead values,
this paper leverages qualitative results from Breakwell for the two-evader successive pursuit
game and the value map generated by level set methods by Patsko et al [14] for the Homicidal
Chauffeur game in place of solving the HJI equations explicitly.
For successive pursuit of many evaders – the Dynamic Traveling Salesman problem –
Belousov et al developed an efficient algorithm to solve for the optimal evader directions when
the player trajectories start in the linear motion region and follow a fixed capture sequence.
Belousov et al were able to transform the problem from an N -dimensional optimization
problem to a nonlinear, root-finding problem for a single variable. The algorithm requires
an initial guess for the first evader’s heading and solves for the roots of a nonlinear, iterative
function to obtain the complete vector of evader heading solutions.
It should be noted that finding an appropriate initial condition for the nonlinear
function is not completely straightforward. The root-finding solution can be sensitive to
the initial guess, and not all initial conditions yield a valid result. In implementing the
algorithm, it was discovered that different initial conditions can yield a variety of maxima,
with the number of peaks on the order of the number of evaders. In order to obtain the
global maximum capture time, multiple initial conditions must be supplied to the
root-finding routine until the best solution is found.
For this paper, the global maximum over the set of initial first-evader headings was found
using a basin-hopping algorithm (see Section 4.4). The basin-hopping algorithm randomly
selects an initial evader heading and executes a local numerical minimization algorithm that
uses the (negated) capture time returned by the Belousov equations. It then accepts or
rejects the capture time using the Metropolis criterion and repeats the process for another
initial heading until the maximum number of iterations is reached. Since the initial heading
selection is stochastic, there is a small probability of not finding the global maximum. To
improve the chances of finding the global value, it was determined empirically that
floor(c log(N)) iterations would be sufficient, where c = 5 yielded an error rate of less than
1% for N > 2 evaders.
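The basin-hopping procedure described above can be sketched with SciPy's `basinhopping`, which implements the random perturbation and Metropolis accept/reject steps. The capture-time function below is a multimodal toy surface standing in for the Belousov root-finding computation, `max_capture_time` is a hypothetical name, and the iteration count follows the floor(c log(N)) rule with c = 5.

```python
import math
from scipy.optimize import basinhopping

def negated_capture_time(psi1):
    """Stand-in for the Belousov capture-time computation seeded by the
    first evader's heading psi1; negated because basinhopping minimizes.
    A multimodal toy surface is used here purely for illustration."""
    return -(2.0 + math.sin(3.0 * psi1[0]) + 0.5 * math.cos(psi1[0]))

def max_capture_time(n_evaders, c=5):
    """Global maximization over the first evader's heading via basin
    hopping, using floor(c * log(N)) hops as chosen empirically."""
    niter = max(1, int(math.floor(c * math.log(n_evaders))))
    res = basinhopping(
        negated_capture_time,
        x0=[0.0],                     # in practice a random initial heading
        niter=niter,
        stepsize=math.pi / 2,         # heading perturbation per hop
        minimizer_kwargs={"method": "BFGS"},
    )
    return -res.fun                   # un-negate: best capture time found
```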
In Chapter 5 the limited lookahead results in linear scenarios will be validated using the
method of Belousov et al. Additionally, two-player games in the curved motion region will
be compared with the Breakwell value map from Figure 3.2 in Section 3.7.
4.2 Simulation using the limited lookahead method
In order to simulate optimal controls using limited lookahead it is necessary to estimate the
expected cost-to-go. It was shown in Section 3.6 that the hierarchical approach provides a
simple means for computing an estimate of the cost-to-go by dividing the estimate into two
optimizations – an “upper-level” combinatorial optimization and a “lower-level” combination
of two-player subgames. For this work, two simulation techniques were used to solve each
level of the hierarchical approach. At the upper level, the Monte Carlo tree search method
was used to quickly find the (approximately) optimal capture sequence and will be described
in the next section.
For simulation at the lower level, it is desirable to efficiently simulate the two-player
subgames. In a differential game with additive cost such as pure pursuit, it is possible
to chain a series of subgame values together to construct an overall game value, which is
capture time in the case of pure pursuit. Because evaders are assumed to have knowledge of
the pursuer strategy for previous subgames, the subgames are not independent and must be
solved sequentially. However, since only the value is required to estimate the cost-to-go in
(3.25) one can reduce computation by using a pre-computed table of two-player game values
to solve the lower-level subgames. In the Homicidal Chauffeur game, for example, a value
map table indexed by the relative pursuer-evader position xij, the pursuer-evader speed ratio
ν, and the pursuer turn rate ω is sufficient to generate V ij for each pursuer-evader pair (see
the contours in Figure 3.1). Using interpolated table lookups to solve complex games has the
potential to reduce on-line game computation significantly, deferring the solution of partial
differential equations to an offline simulation. Section 5.4 will use this method to solve the
multi-player Homicidal Chauffeur game and evaluate the potential of solving the game in
real-time.
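The table-lookup idea can be sketched with an interpolated grid. The value surface below is a placeholder (plain distance) standing in for an offline level-set solution of one (ν, ω) slice, and `subgame_value` is a hypothetical name.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical pre-computed value map for one (nu, omega) slice of the
# Homicidal Chauffeur game, indexed by the evader position (x, y) in the
# pursuer frame.  A real table would come from an offline level-set
# solve; the distance function below is only a placeholder surface.
xs = np.linspace(-5.0, 5.0, 101)
ys = np.linspace(-5.0, 5.0, 101)
X, Y = np.meshgrid(xs, ys, indexing="ij")
value_table = np.hypot(X, Y)          # placeholder for the solved V(x, y)

lookup = RegularGridInterpolator((xs, ys), value_table)

def subgame_value(rel_pos):
    """Interpolated V_ij for one pursuer-evader pair: a cheap online
    table lookup in place of solving the HJI equation at run time."""
    return float(lookup(rel_pos))
```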
Once the hierarchical cost-to-go $\hat V_h$ has been found using (3.26) and (3.29), the simulation
must implement the minimax optimization of (3.25) to form the value estimate $\hat V$. As
discussed in Section 3.7, in order to solve the optimization, each player is assumed to undergo
linear motion over the short interval ∆t, and the optimal strategies that are solved locally
for each interval are combined to form the overall player strategy.
To solve the minimax optimization in (3.25) for each player control it is again neces-
sary to use a global method, as multiple maxima and minima can form within each team’s
optimization space. Additionally, the method must support optimization in multiple dimen-
sions in order to support the maximization for the many evaders. Two global optimization
routines were considered initially: brute-force global optimization and basin hopping. Brute-
force optimization consists of forming a grid of points in the optimization space, evaluating
the objective function at each point, and executing a local optimization at the extreme
point. Brute-force search has the advantage that it is simple to implement, uses existing
minimization routines, and does not require random sampling. Brute-force search does,
however, require sampling a grid of points which for high-dimensional problems may use a
large amount of memory. For the current game, brute-force search will be considered for the
pursuer’s one-dimensional minimization step.
For the evaders' multi-dimensional maximization step, a basin-hopping algorithm similar to
that of the previous section was chosen. Basin hopping has a much lower memory footprint
than brute-force search, since it does not have to form a full grid, and is therefore suitable
for the many-evader problem. However, basin hopping does have a random sampling com-
ponent, which requires that enough iterations are run to keep the objective function seen by
the pursuer's outer minimization as smooth as possible. In this study, the number of itera-
tions was chosen in the same manner as described for the root-finding algorithm of Section 4.1.
For the local optimization, the BFGS algorithm [55] was selected experimentally for the
evader maximization step due to its speed. For the pursuer's minimization step, a modified
Powell's method [56] was used for its robustness in the presence of noisy objective functions.
Solver stability within the minimax optimization can be especially
important, as large errors in the inner optimization loop (maximization for minimax) can
yield unstable results in the outer loop (minimization) that may cause trajectories to diverge.
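The nested optimization can be sketched with SciPy: a 1-D brute-force grid for the pursuer's outer minimization and basin hopping with BFGS local steps for the evaders' inner maximization. The payoff below is a toy stand-in for the interval cost plus cost-to-go of (3.25), and all function names are hypothetical.

```python
import math
import numpy as np
from scipy.optimize import basinhopping, brute

def payoff(phi, psis):
    """Toy stand-in for the lookahead objective in (3.25): the pursuer
    heading phi minimizes while the evader headings psis maximize."""
    return sum(math.cos(phi - p) for p in psis)

def evader_best_response(phi, n_evaders, hops=8):
    """Inner maximization over the evader headings via basin hopping
    with BFGS local steps (maximize = minimize the negation)."""
    res = basinhopping(
        lambda psis: -payoff(phi, psis),
        x0=np.zeros(n_evaders),
        niter=hops,
        stepsize=math.pi / 2,
        minimizer_kwargs={"method": "BFGS"},
    )
    return -res.fun

def pursuer_minimax(n_evaders):
    """Outer 1-D minimization over the pursuer heading via a brute-force
    grid; a local Powell polish could replace finish=None."""
    phi = brute(
        lambda phi: evader_best_response(phi[0], n_evaders),
        ranges=[(-math.pi, math.pi)],
        Ns=16,
        finish=None,
    )
    return float(np.atleast_1d(phi)[0])
```

Because the inner basin hopping is stochastic, the outer objective is itself noisy, which is why the text prefers a noise-tolerant local method such as Powell's for the pursuer side.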
It should be noted that the random point selection of the basin hopping method can
be advantageous when dealing with certain singular game surfaces. For example, in the
case of the Homicidal Chauffeur game when a pursuer encounters the dispersal surface at
the bottom of Figure 3.1, the optimal play for the pursuer is to employ a mixed strategy
when choosing which direction to turn. Similar surfaces can also arise for the evader. The
stochastic nature of basin hopping and, as will be seen in the next section, Monte Carlo Tree
Search, provide a natural solution to decision surfaces that require mixed strategies.
4.3 Combinatorial optimization using tree search
Many games and optimization problems suffer from the so-called “curse of dimensionality”
where computing an optimal solution is either difficult or even impossible due to a high-
dimensional state or action space. For the pursuit-evasion problem, each of the works by Li,
Belousov et al, and Liu et al [7, 3, 32] note the combinatorial challenge of selecting an optimal
capture sequence – the upper level of the hierarchical method from Section 3.6. Indeed, for
the “static evader” case – a Traveling Salesman problem – the combinatorial problem has
been shown to be NP-complete. For a computational treatment of TSP, including both exact
and approximate methods, see [2]. While many of these methods have good performance
and may even be suitable for real-time applications, it was desired to find an efficient search
algorithm that would be simple to implement, flexible enough to accommodate a variety of
objectives and player dynamics, and also suitable for stochastic problems. The Monte Carlo
Tree Search method was selected as a candidate for solving the combinatorial step of the
approximate differential game solution because it meets these criteria.
Monte Carlo Tree Search has received much attention in recent years due to its success in
discrete combinatorial games¹ such as Go that have high branching factors. A good review of
MCTS and its variants can be found in [45]. MCTS works in the following manner. A search
tree is constructed by selecting nodes asymmetrically according to a tree policy that balances
exploration of new nodes with the exploitation of more promising nodes. A simulation is
run from the selected node using a default policy that reports a terminal expected value
or outcome to be used by the tree policy. Within the simulation, the default policy assigns
sequential actions randomly or, if domain knowledge is available, according to some heuristic.
The selected node and its ancestors are then updated with the results of the simulation and
the tree search is resumed using the updated node values until a maximum number of
iterations are reached.
MCTS has several salient features. The exploration and exploitation capability of the tree
policy expands promising nodes while still allowing for the discovery of even better branch
paths. MCTS is an "anytime algorithm," meaning that the process can be terminated at
any time and still return a promising path in the tree; this is especially useful when MCTS
is used for real-time applications. MCTS does not require the storage or manipulation of
intermediate states, meaning that the algorithm does not require domain knowledge and can
thus be applied to a variety of domains. The algorithm also allows for the simulation of
stochastic applications, as the tree and default policies are stochastic in nature. MCTS is
also simple to implement.

¹Combinatorial games have two players and are zero-sum, perfect-information, deterministic, discrete,
and sequential [45].
MCTS is a natural fit for computing the optimal capture sequence within the hierarchical
framework. For the single-pursuer, multiple evader pursuit game, the action states of MCTS
are modeled as the components s1, s2, . . . of the capture sequence s. A tree node represents
a partial capture sequence s = {s1, s2, . . . , sna} where na is the current number of evaders
assigned, or the current tree depth. Each node is updated with the latest expected game
value for that sequence as determined by simulation using the default policy. The tree policy
then balances exploitation of the best partial capture sequences found so far with unexplored
sequences to select the next evader sna+1 ∈ {1, · · · , N} \ s in the sequence.
A sample tree search result for a four-evader successive pursuit game is shown in Figure
4.1. Each node (aside from the root node) represents at least one simulation run using the
default policy, and the number on each node is the running expected reward. Each edge
represents an evader in the capture sequence. For the example shown, the solver sought to
minimize the capture time and found the minimum time result with a capture time of 555.3.
The minimum-time capture sequence in the example is {0, 1, 3, 2} (the node for Evader 2 is
not shown).
The tree policy chosen for the current implementation is the “plain” Upper Confidence
Bound for Trees (UCT) algorithm by Kocsis and Szepesvari [57], where each node in the tree
is treated as a multi-armed bandit problem. A child node s_k ∈ {1, . . . , N} \ s is selected to
Figure 4.1: A sample MCTS minimizing search tree for a four-evader successive pursuit game. Each node represents a simulation run and each edge an evader in a capture sequence. The number on each node is the running expected capture time.
maximize the quantity

    R_k + 2 C_p √( 2 ln n_I / n_Ik )                                    (4.1)

where R_k is the current expected reward of the child node, n_I is the number of times the
parent node has been visited, n_Ik is the number of times the child node has been visited,
and C_p > 0 is a constant. The expected reward term R is computed as the average value
of the simulation results from previous visits and is continually updated with values from
future simulations as child nodes are selected and new simulations are run. In plain UCT,
a larger expected reward for a given node encourages exploitation of the associated branch.
The second term in (4.1), however, encourages exploration and guarantees that all nodes will
be visited as n_I approaches infinity. A value of C_p = 1/√2, as used by Kocsis and Szepesvari, is
assumed for this study.
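As a concrete sketch, the selection rule (4.1) can be written as a small scoring function. This is illustrative only; the bandit bookkeeping (visit counts and running rewards) is assumed to be maintained elsewhere, and `Cp` defaults to the 1/√2 value above.

```python
import math

def uct_score(R_k, n_I, n_Ik, Cp=1 / math.sqrt(2)):
    """Plain UCT value of child k from (4.1): the child's expected reward
    plus an exploration bonus that shrinks as the child is revisited and
    grows as the parent accumulates visits."""
    return R_k + 2 * Cp * math.sqrt(2 * math.log(n_I) / n_Ik)
```

The tree policy then picks the child maximizing this score; for a minimization objective (e.g. capture time), one would negate the reward before scoring.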
The upper confidence bound in plain UCT is guaranteed to be within a constant factor of
the best possible bound on the growth of the regret – the difference between the true value
and estimated value after n_I iterations – which grows as O(log(n_I)). To meet this condition,
however, the reward R must have support on [0, 1], which is not generally the case for game
values that represent, say, the capture time of an evader. To work around this issue, the
current implementation normalizes the expected rewards for all nodes by the largest reward
encountered in the tree search. Experiments conducted with this workaround show that at
least the qualitative balance between exploration and exploitation is preserved.
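A minimal sketch of that normalization, assuming a running maximum is the only statistic kept (the bookkeeping in the actual implementation may differ):

```python
class RewardNormalizer:
    """Rescale raw game values (e.g. capture times) into [0, 1] by the
    largest magnitude encountered so far in the tree search, since the
    UCT regret bound assumes rewards supported on [0, 1]."""

    def __init__(self):
        self.r_max = 0.0

    def __call__(self, raw):
        self.r_max = max(self.r_max, abs(raw))
        return raw / self.r_max if self.r_max > 0 else 0.0
```

Note that values normalized before a new maximum appears become stale; as observed above, this perturbs the exploration–exploitation balance only qualitatively.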
The use of a default policy in MCTS is compatible with the rollout policy required
by limited lookahead. In the current implementation, the default policy for MCTS is the
computation of the expected game value V_s, where the remaining components of the partial
sequence s are selected randomly from a uniform distribution to form the full sequence s.
While uniform sampling of remaining evaders is simple, it can be inefficient for deterministic
problems – a sampled capture sequence s may be repeated unnecessarily. To prevent this
from occurring, an option was added to the current implementation to remember previously
visited nodes. Adding memory to MCTS has been done previously with Nested Monte Carlo
Tree Search [50], where a tree search is conducted at each level and the best branch at each
level is remembered. For most of the tests in this work, however, remembering already-
visited nodes did very little to improve search performance, as checking previously visited
nodes adds overhead. The ability to memorize visits to branches was kept as an option, but
for most of the results it is not used. Examining the benefits and drawbacks of memory in
MCTS for this problem is an area for future investigation.
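The uniform default policy for completing a partial capture sequence can be sketched as follows; the `game_value` callback is a hypothetical stand-in for the cost-to-go simulation described above.

```python
import random

def default_policy(partial, N, game_value, rng=random):
    """Uniform-random default policy: complete a partial capture sequence
    by shuffling the unassigned evaders, then score the full sequence with
    the problem-supplied `game_value` estimate (e.g. total capture time)."""
    remaining = [e for e in range(N) if e not in partial]
    rng.shuffle(remaining)
    full = list(partial) + remaining
    return full, game_value(full)
```

Because the completion is sampled uniformly, repeated calls with the same partial sequence may evaluate the same full sequence more than once, which is exactly the inefficiency the optional node memory discussed above is meant to address.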
4.4 Computational resources
For the algorithms and results in this paper, the following resources were used. All simula-
tions were run on an Intel Core i7-3720QM 2.6 GHz CPU with 8GB RAM. The simulation
routines have been written in CPython [58] and leverage the optimizations provided by the
Anaconda distribution [59], including Numba [60], a package that uses LLVM [61] to compile
Python code to native machine code on the fly for subsequent evaluations. Mathematical computations and
plots use the Numpy, Scipy, and Matplotlib packages [62, 63, 64]. The local optimization
and basin hopping methods [65] were provided by the Scipy optimize package. For local
minimization within basin hopping, the BFGS algorithm [55] was used for the maximization
step and Powell’s method [56] for the minimization step, as mentioned previously.
Some of the value map plots leverage the Python multiprocessing package for parallel
processing. In general, MCTS is suited for parallelization, as each default policy simulation is
evaluated independently. However, care must be taken when combining results from similar
branches; see [45] for a summary of MCTS parallelization methods. In this work, none of
the MCTS tree searches are parallelized.
Chapter 5
Results and analysis
The results in this section demonstrate both the utility of the limited lookahead method
when applied to differential games with singular surfaces and the efficiency of the Monte
Carlo Tree Search method. The section begins with limited lookahead results for the simple
pursuit of two evaders by a single pursuer, comparing simulation results with those of Break-
well [1] and the algorithm from Belousov et al [3]. Next, the MCTS method is benchmarked
against brute-force search to examine its potential in computing optimal engagements with
many evaders. Lookahead engagement results using MCTS for scenarios with more than
two evaders follow thereafter, along with a summary of typical computation times for com-
ponents of the lookahead method. Finally, the extension of the limited lookahead method
with multiple evaders to the more complex dynamics of the Homicidal Chauffeur game is
demonstrated.
5.1 Limited lookahead performance with one pursuer and two evaders
The work on approximate multi-player game solutions by Li from Chapter 4 suggests that ap-
proximations to both the upper and lower values of differential pursuit games can be obtained
iteratively using the limited lookahead method. This section demonstrates the closeness of
the approximation for the single-pursuer, two-evader simple pursuit game. Comparisons of
lookahead results with the geometrical solutions by Chikrii and Belousov (see Section 3.7)
for initial conditions in the linear motion region are given, along with examples of scenarios
that begin in Breakwell’s “curved motion” zones. Finally, the approximate upper values of
the game are compared qualitatively with the results from Breakwell (equivalent to the full
HJI solution) for a variety of initial conditions, revealing the closeness of the approximation
and highlighting several features inherent in the two-evader game.
Figure 5.1 shows the results of an engagement between a single pursuer and two evaders
(ν = 1/2) with an initial condition inside the linear solution set, which is the set of points
where linear motion by all parties is optimal and the optimal capture sequence is fixed for
all time. Empty circles represent the initial positions of the pursuer and evaders, while dots
represent positions at each time step (∆t = 0.1). Dashed lines represent the optimal linear
motion solution that one obtains using the method of Belousov et al, and filled circles rep-
resent the capture points, annotated with respective capture times. Capture times denoted
with an asterisk (∗) are the optimal linear capture times.
The cost-to-go for the limited lookahead algorithm was computed by solving the two-
player subgames sequentially. For the engagement in Figure 5.1, the second evader assumes
that the first evader seeks to maximize its own objective function independently, regardless
of the second evader’s decisions. Thus when computing the first subgame V 1, Evader 1 is
estimated to flee directly away from the pursuer. The path for the second subgame V 2 is
[Figure 5.1 data: optimal capture times T∗ = 19.03 (E0) and T∗ = 70.68 (E1); lookahead capture times T = 19.40 (E0) and T = 71.30 (E1).]
Figure 5.1: Sample engagement using limited lookahead as compared with the optimal result (denoted by ∗ and dashed lines) in the linear motion regime.
formed as Evader 2 flees directly from the estimated capture point of Evader 1. It is clear
from Figure 5.1 that this estimate is sub-optimal, as Evader 2’s initial heading does not
align with the optimal solution from Belousov et al. However, after roughly 25 iterations
(2.5 sec simulation time) the sub-optimal solution approaches the optimal one. The final
capture time is within a few time steps (1%) of the optimal result. The path in Figure 5.1
demonstrates qualitatively the ability of the limited lookahead method to approximate the
optimal solution, even when the subgames of the estimated time-to-go are sub-optimal.
The simulation of the two-player subgames above can be done very quickly, as the motion
is linear and the subgames are connected only by their initial conditions. This cost-to-go
estimate requires, however, that each evader assume a general strategy for the other evaders
in the coalition. Another way to estimate the cost-to-go is to assume that each evader
plays a strategy that is completely independent of the other evaders, or in other words, each
evader plays the game as if it is the only target, oblivious to the existence of other evaders.
This approach requires the simulation of an entire game for each evader, since each evader
executes the simple-pursuit control law throughout the entire engagement, fleeing along a
line directly away from the pursuer. As the pursuer cannot follow a straight line against
every evader at once, this results in nonlinear evader motion that in general must be solved
numerically.
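This independent-evader rollout can be sketched with forward-Euler integration; the speeds, time step, and capture radius below are illustrative choices, not the parameters used in this study.

```python
import math

def successive_pursuit_time(pursuer, evaders, seq, vp=1.0, ve=0.5,
                            dt=0.01, capture_radius=0.05, t_max=1e4):
    """Euler-integrated rollout of successive simple pursuit: the pursuer
    chases evaders in the order given by `seq`, while every surviving
    evader flees along the line directly away from the pursuer.  Returns
    the total time until the last capture (a cost-to-go estimate)."""
    p = list(pursuer)
    alive = {i: list(e) for i, e in enumerate(evaders)}
    t = 0.0
    for target in seq:
        while t < t_max:
            dx, dy = alive[target][0] - p[0], alive[target][1] - p[1]
            dist = math.hypot(dx, dy)
            if dist <= capture_radius:
                break
            # pursuer heads straight at its current target
            p[0] += vp * dt * dx / dist
            p[1] += vp * dt * dy / dist
            # every uncaptured evader flees radially away from the pursuer
            for e in alive.values():
                fx, fy = e[0] - p[0], e[1] - p[1]
                fd = math.hypot(fx, fy) or 1.0
                e[0] += ve * dt * fx / fd
                e[1] += ve * dt * fy / fd
            t += dt
        del alive[target]
    return t
```

With a single evader fleeing collinearly the gap closes at vp − ve, so the numerical capture time can be checked against the analytic value distance/(vp − ve).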
Estimating the cost-to-go in this fashion has the advantage of requiring the fewest as-
sumptions among the evaders in the coalition, but it has an added computational cost and
can result in cost-to-go estimates that are even lower than those of the linear subgames
above. However, experiments using this estimation approach yield nearly identical results
to that of Figure 5.1, suggesting that this cost-to-go estimation method is viable for lim-
ited lookahead. This fact will be exploited in the table lookup approach for the Homicidal
Chauffeur game in Section 5.4.
Figure 5.2 shows the engagement results for an example where evaders are in the “curved
motion” zone as described by Breakwell (see Section 3.7). In this region, the pursuer benefits
from delaying pursuit while it is equidistant from the evaders until a certain critical point,
after which the pursuer must pick one evader or the other. The motion of all players during
this delayed decision period is circular, as was described by Breakwell. Note that curved
motion does not occur in the optimal solution from Belousov et al when the capture sequence
is fixed for all time. The advantage given to the pursuer for delaying a capture decision sug-
gests the importance of considering time-varying capture sequences when computing optimal
strategies for multi-player games.
As stated in Section 3.7, the two-player simple pursuit game has at least two singular
surfaces, a focal line and a dispersal line. In this example, the pursuer entered the focal line
and traveled along that path for roughly one second until reaching the dispersal line, after
which it was optimal to commit to either evader. This behavior is consistent with Figure 3.2,
[Figure 5.2 data: capture times T = 4.84 (E0) and T = 13.25 (E1).]
Figure 5.2: Sample engagement in Breakwell’s “curved motion” zone (capture sequence not fixed) using limited lookahead. The final capture time is 12.5 sec shorter than the fixed-sequence capture time.
at least to the accuracy of the figure, and demonstrates that the limited lookahead method
can yield results even in the presence of singularities in the game value function.
As mentioned in Chapter 3.7, Breakwell et al [1] derived the optimal result for the two-
evader scenario, including the regions with curved motion. Figure 5.3 shows a side-by-side
comparison of the Breakwell solution with the results from limited lookahead simulations for
the same initial conditions. The axes in the figure represent the pursuer location relative to
the center of a two-evader system, normalized by the evader separation distance and pursuer
speed (refer to Figure 3.2 above). A few trajectories from the lookahead simulations are
plotted as dashed lines over the figure to reveal the focal and dispersal lines inherent in the
game. Though not exact, the value contours, trajectory paths, focal line, and dispersal line
are at least qualitatively similar and suggest that limited lookahead is able to approximate
the optimal solution in both the curved and linear motion regions.
[Figure 5.3 data: capture-time contours from 2.0 to 9.0; evader pairs E1, E2 shown at multiple initial conditions.]
Figure 5.3: Side-by-side comparison of the full two-evader solution (adapted from Breakwell [1]) with the limited lookahead results for a variety of initial pursuer locations. Solid contours represent capture times (normalized by the initial evader separation distance and pursuer speed), while dashed lines represent sample trajectories relative to the two-evader system. The focal and dispersal lines appear along the bottom of the figure.
5.2 Tree search performance with many evaders
Now that the efficacy of the lookahead method has been demonstrated for two evaders, it
remains to be shown whether games with many evaders can be solved efficiently.
As noted by previous authors [7, 3, 32], as the number of evaders increases the combinatorial
optimization of the capture sequence becomes the primary factor in limiting computational
efficiency. It is necessary, then, to examine the performance of the proposed solution for
searching the optimal capture sequence – Monte Carlo Tree Search.
To summarize the assumptions of Section 4.3, MCTS will be used to select the optimal
capture sequence – represented by a branch in a search tree – by simulating a complete
game from each node in the branch using a rollout policy – a default choice of the remaining
capture sequence. For this implementation, the default policy will be to select remaining
evaders to capture from a uniform distribution. The rollout policy then gives an estimate of
the total capture time – the game reward or value. This estimate is used in limited lookahead
as the cost-to-go estimate for a given choice for the player controls (see Equation (3.25)).
Nodes are updated with the reward estimate to inform the tree policy for the next branch
selection. In this implementation, the tree policy selects capture sequences using the plain
UCT algorithm.
To illustrate the benefit of MCTS and plain UCT as compared to a brute-force search for
several evaders, the conditions of the two-evader game in the previous section are extended to
N evaders. Figure 5.4 shows the average number of iterations of the rollout policy needed to
achieve optimal and near-optimal results for N = 1, . . . , 10. The average is needed because
MCTS uses randomized branch selection. To remove dependence on initial conditions, the
starting locations of the evaders are also randomized within a 20 x 20 grid surrounding the
pursuer. The number of iterations until the optimal solution is reached is recorded for each
trial; for the curves in Figure 5.4, 100 trials per scenario are used. The MCTS results are
compared with a brute force search, which requires N ! iterations.
For each trial, the number of iterations required to achieve a capture time less than one
percent from the optimal is also recorded; the statistics from these results are also shown in
Figure 5.4. For the simulations tested, the mean number of iterations tends to follow the
trend N log(N). Figure 5.4 also shows a single standard deviation around the mean for both
the optimal and approximate results, indicating that most of the approximate results are
within a constant factor of N log(N), which is a significant improvement over brute-force
search.
As noted in Section 4.3, the tree search logic used here incorporates no memory of
previously run simulations. This allows for the possibility of simulating a single pursuit order
multiple times. While not demonstrated here, preliminary experiments show that adding the
[Figure 5.4 axes: number of evaders N = 2–10 vs. number of iterations (log scale, 10⁻¹ to 10⁷); curves: Optimal, < 1% Error, N!, N log N.]
Figure 5.4: Average number of iterations for MCTS to achieve optimal and sub-optimal (within 1% error) results as compared to brute force (N! iterations). The error bars represent one standard deviation.
ability to remember previously visited nodes reduces the standard deviation of the number
of iterations needed.
It is apparent from Figure 5.4 that using MCTS with plain UCT can reduce the number
of rollout evaluations by an order of magnitude or more for simulations with more than
four evaders. Later in the section it will be shown how this improvement facilitates the fast
execution of the lookahead technique.
5.3 Lookahead performance with many evaders
To fully exercise the limited lookahead and MCTS methods, this section presents results for
scenarios with three or more evaders. As noted earlier in Section 3.7, an HJI solution for
more than two evaders is not available for comparison, but for regions with linear motion
(i.e., where a static capture sequence is optimal) the lookahead method can be compared
to the optimal results from Chikrii and Belousov. Figure 5.5 shows a scenario with three
evaders starting in the linear region, again showing lookahead results as dotted paths and
the optimal linear motion solution as solid lines. The time step used for this simulation was
∆t = 0.05 and the discrepancy from the optimal capture time is about ten time steps (about
1% error).
[Figure 5.5 data: optimal capture times T∗ = 1.69, 9.38, and 45.34; lookahead capture times T = 45.40 (E0), 1.61 (E1), and 9.33 (E2).]
Figure 5.5: Scenario with three evaders starting in the linear motion regime. The optimal solution is represented by dashed lines.
Figure 5.6 shows a three-evader example with the first two evaders starting in the curved-
motion region for a two-evader game. The third evader is placed at a location well within the
linear regime for either of the other evaders. Again the lookahead results capture qualitatively
the curved motion of the first two evaders, followed by linear pursuit of the third. This
resulted in a shorter capture time than if the pursuer had fixed the capture sequence from
the start of the engagement.
[Figure 5.6 data: evaders E0, E1, E2; capture times T = 7.93 (E1) and T = 26.47 (E2).]
Figure 5.6: Three-evader scenario, with two starting in the curved motion regime.
Finally, Figure 5.7 shows a four-evader scenario with the evaders placed arbitrarily around
the pursuer. Evaders 1 and 3 begin equidistant from the pursuer, possibly on a singular
surface. The equidistant condition occurs again after the capture of Evader 0, during the
pursuit of Evaders 1 and 2. After Evader 2 is captured, the players follow a linear strategy
for the remainder of the engagement. Similar results were found for five- and six-evader
engagements.
To realize the utility of the limited lookahead approximation one must also examine the
computation time of the algorithm. Table 5.1 shows the minimum¹ run time for

¹Only the minimum time is reported to avoid the inconsistencies of computer clock timing.
Figure 5.7: Scenario with four evaders.
several components of the lookahead algorithm, averaged over many randomly generated
scenarios. In the “lower level” of the hierarchical limited lookahead algorithm, an estimate of
the capture time (the single-sequence cost-to-go, or V_s) is needed for each possible capture
sequence s so that the minimum can be found using discrete optimization (Equation (3.26)).
The first row in Table 5.1 represents the single-sequence cost-to-go estimate using the linear
evader subgame strategy from Section 5.1, where each evader flees linearly from the estimated
capture location of the previous evader. Because the evader motion is assumed to be linear
the computation can be done quickly.
The next section in Table 5.1 represents the combined time to compute the evader max-
imization strategy in (3.25) and to search for the minimum-time capture sequence given the
evader strategy for a single time step. The evader maximization is computed first for a
single sequence as stated in Section 4.2, using the BFGS algorithm combined with the gradient of
the value function. Adding a gradient computation increased the single-sequence run time
70
Table 5.1: Average minimum run time vs. number of evaders for different lookahead algorithm components

Algorithm component                              N=2    N=3    N=4    N=5    N=6    Units
Single-sequence cost-to-go estimate              56     85     110    130    160    µs
. . . using table lookup (HC)                    55     81     110    140    170    µs
Minimum-sequence max evader strategy (Brute)     6.6    39     270    2200   1.9e4  ms
Minimum-sequence max evader strategy (MCTS)      10     53     150    360    690    ms
Limited lookahead for single time step (Brute)   0.23   1.4    9.8    91     1800   s
Limited lookahead for single time step (MCTS)    0.38   1.9    4.7    14     44     s
slightly but also reduced the number of maximizer function evaluations, resulting in a net
speed improvement.
Using the maximization result for each sequence, the tree search algorithm examines
different pursuit sequences and reports the minimum-time sequence for a given initial pursuer
heading, which the pursuer will use in its own minimization step. The minimum-sequence
rows of the table above compare the search performance (plus evader maximization) for both
brute-force search and plain UCT Monte Carlo Tree Search. Though initially MCTS has some
overhead due to the extra search logic, the benefits of MCTS are apparent for N > 3.
For the simulation runs above, the number of iterations used in the MCTS algorithm was
cN log(N), where c is a constant of approximately 3, chosen empirically to balance execution
speed with accuracy. If c is set too low, the MCTS algorithm finds the optimal time less often
and the capture time profile presented to the pursuer for minimization becomes too noisy.
Note that the chosen iteration number is consistent with the results of Figure 5.4, where the
approximate optimal value is reached on average within a constant factor of N log(N).
As was mentioned in Section 4.2, the cost-to-go function evaluated by the evader can have multiple
extrema, particularly when singular surfaces are encountered. To help the maximizer find
the global maximum, the basin-hopping technique of Section 4.2 was tried. Unfortunately, due to a
memory error associated with the basin-hopping routine, not all scenarios in Table 5.1 could
be computed. Furthermore, brute-force global optimization by sampling an entire grid of
points also became computationally prohibitive. Thus, a new approach was needed.
Instead of brute-force sampling an entire grid of points in the optimization space, MCTS
is used to sparsely sample the grid. To capture a grid as a tree, the space is first divided
evenly across each dimension, and each grid unit is represented by a node in a tree. Next, each
grid unit is itself sampled in an identical manner, and a set of nodes representing the newly
divided grid squares is assigned as children to the parent grid node. Division is continued
until the desired number of samples for each dimension is reached. MCTS then samples the
grid squares as tree nodes in the manner of Section 4.3, preferring grids with higher (lower)
rewards while also maintaining exploration of new maxima (minima) in other grids. This
MCTS grid-sampling method uses Latin Hypercube Sampling during the simulation step to
explore newly selected grid squares, ensuring a more uniform sampling of the space. Using
this sampling method, each global maximization step could be executed to completion.
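Latin Hypercube Sampling itself can be sketched as follows, independently of the tree logic; `bounds` is a list of per-dimension (lo, hi) intervals, and the implementation here is a generic illustration rather than the one used in this study.

```python
import random

def latin_hypercube(n, bounds, rng=random):
    """Draw n points in the box given by `bounds` so that each dimension's
    n equal-width strata each receive exactly one sample, giving a more
    uniform spread over the space than plain random sampling."""
    dims = len(bounds)
    pts = [[0.0] * dims for _ in range(n)]
    for d, (lo, hi) in enumerate(bounds):
        strata = list(range(n))
        rng.shuffle(strata)             # random pairing of strata across dims
        for i, s in enumerate(strata):
            u = (s + rng.random()) / n  # uniform draw inside stratum s
            pts[i][d] = lo + u * (hi - lo)
    return pts
```

Each grid square selected by the tree policy would then be explored with such a sample before its reward is backed up.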
The pursuer minimization routine is the final step of the limited lookahead algorithm.
The time to compute a one-second limited lookahead step can be found in the final rows of
Table 5.1, where the advantages of reducing the search space using sampling are apparent.
Iterations for the MCTS optimization step were roughly 3N log(N), which resulted in good
accuracy for the linear motion scenarios and fair accuracy for the curved motion scenarios. It
is apparent from the table results that for near-real-time applications, say, within a second,
the limited lookahead method as currently implemented would only be suitable for the two-
evader scenario. However, additional modification options exist that could bring run times
to within a second (see Chapter 6).
While more needs to be done to define accuracy and stability measures and to fine-
tune the algorithms for speed, the results of this section demonstrate, at least as a proof
of concept, the viability of limited lookahead with Monte Carlo Tree Search to efficiently
compute automated player controls for successive pursuit games.
5.4 Limited lookahead and the Homicidal Chauffeur game
The Homicidal Chauffeur game provides an interesting and valuable test case for the limited
lookahead method with multiple evaders. First, the game provides a rich set of singular
surfaces and simple nonlinear dynamics that can exercise Li’s theory in the presence of
a discontinuous game value or value gradient. This is especially important in multi-player
games where, as shown in Section 3.7 and the results of this chapter, singular surfaces readily
appear. Second, if successful, the limited lookahead method would provide an automatic and
efficient way to derive the optimal control strategies of each player in a complex game. As was
shown by Merz [8], the Homicidal Chauffeur game can require one to fifteen stages within a
player strategy depending on the initial conditions. The test scenarios in this section exhibit
several of the singular surfaces described in Section 3.3 in order to test the viability of limited
lookahead and MCTS.
In the two-player scenario in Figure 5.8, the evader is positioned to the left of the pursuer,
inside the pursuer’s turning circle but just outside the capture region. This corresponds to a
location in pursuer-centric coordinates just behind the barrier and to the left of the capture
region (see Section 3.3 and Figure 3.1 for a description of the relevant singular surfaces). The
player controls in this scenario consist of four stages. First, the evader heads tangentially
toward the pursuer’s turning circle whilst the pursuer turns at maximum rate away from
the evader. Then, once the game trajectory reaches the (bottom) universal line, the evader
follows the pursuer directly while the pursuer “evades” along the same line. Once the game
trajectory reaches the dispersal point, the pursuer chooses a hard turn to the right and the
evader flees along a tangent to the pursuer’s turning circle. These motions send the game
trajectory around the barrier and finally to the (top) universal line, where the game ends in
simple pursuit. The motion along and around singular surfaces can be seen in Figure 5.8b.
[Figure 5.8 panels: (a) Inertial coordinates, showing capture time T = 10.88 for E0; (b) Pursuer-centric coordinates and value map, with capture-time contours from 2.0 to 10.0.]
Figure 5.8: Limited lookahead results in inertial and pursuer-centric coordinates for a two-player Homicidal Chauffeur game. In this scenario, the optimal play for the evader is to follow the pursuer for a brief period until the pursuer can turn around. In the right figure, the game trajectory in pursuer-centric coordinates reveals several singular surfaces. The game trajectory moves along a universal line, departs from a dispersal line, moves around a barrier, and returns again to a universal line before reaching the target set.
Figure 5.8b also shows the game value (capture time) for the first scenario overlaid with
the game trajectory in pursuer-centric coordinates. For this example, the capture time of
10.88 obtained in the simulation matches closely with the capture time contour at the initial
relative location of the evader. These coordinates also reveal the singular game surfaces
encountered in this engagement. Directly behind the pursuer on the negative x2 axis is a
universal line to which the optimal trajectory is initially drawn. Note that the trajectory
reveals the chattering behavior that is characteristic of this surface. Likewise, the universal
line along the positive x2 axis also exhibits the chattering effect.
The pursuer dispersal line begins where the trajectory leaves the universal line and is
the point where the pursuer must commit to a sharp turn in one direction or the other. In
this case, the pursuer “choice” is made according to a mixed strategy which arises naturally
from the stochastic solver. The trajectory then proceeds around the barrier as the pursuer
turns and the evader flees tangent to the turning circle, drawing the trajectory around the
barrier and onto the universal line until capture. Using only a table of capture time values,
the limited lookahead method was able to generate a game trajectory with all of the major
game features for this scenario.
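A table lookup of this kind can be sketched with bilinear interpolation over a uniformly spaced grid in pursuer-centric coordinates. The grid layout and function names here are illustrative assumptions, not the format used in this study.

```python
def bilinear_lookup(table, x1, x2, x1_axis, x2_axis):
    """Interpolate a value map at (x1, x2).  `table[i][j]` holds the capture
    time at grid point (x1_axis[i], x2_axis[j]); the axes must be uniformly
    spaced and the query point must lie inside the grid."""
    dx1 = x1_axis[1] - x1_axis[0]
    dx2 = x2_axis[1] - x2_axis[0]
    # enclosing cell indices, clamped so (i+1, j+1) stays in range
    i = min(int((x1 - x1_axis[0]) / dx1), len(x1_axis) - 2)
    j = min(int((x2 - x2_axis[0]) / dx2), len(x2_axis) - 2)
    # fractional position within the cell
    t1 = (x1 - x1_axis[i]) / dx1
    t2 = (x2 - x2_axis[j]) / dx2
    return ((1 - t1) * (1 - t2) * table[i][j]
            + t1 * (1 - t2) * table[i + 1][j]
            + (1 - t1) * t2 * table[i][j + 1]
            + t1 * t2 * table[i + 1][j + 1])
```

Note that straight interpolation smooths across discontinuities such as the barrier, so a finer grid (or discontinuity-aware lookup) would be needed near singular surfaces.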
To illustrate performance for a multiple-evader Homicidal Chauffeur game, Figure 5.9
shows an example with three evaders, where two evaders begin in front of the barrier. Here
the optimal play of the first is to flee directly away while the second, anticipating the first
capture, eventually flees tangent to a turning circle approximately from that point. The
third evader follows the pursuer in simple pursuit, trying to remain behind the barrier as
long as possible before fleeing tangent to the pursuer turning circle.
In the current implementation of the cost-to-go estimate using independent table lookup,
the evaders assume no information is exchanged between them and thus act independently,
which is sub-optimal. However, even with a sub-optimal estimate the solution converges to
behavior that considers the motion and capture locations of the other evaders. So, while no
solution to the full HJI problem currently exists for this problem as a reference, the numer-
ical results here suggest that play using sub-optimal, independent subgames can approach
optimal play behavior that considers the moves of other evaders.
Regarding computational performance, the solutions to the Homicidal Chauffeur game scenarios
could be computed very quickly using the table-lookup method. The second row of
Table 5.1 shows that the run time for a single-sequence cost-to-go estimate for N evaders
is comparable to the estimate for the simple pursuit. Limited lookahead run times are thus
comparable to those in the last section of the table. These results suggest that fast solutions
to complex pursuit games like the Homicidal Chauffeur are possible using the techniques of
this paper.
[Figure 5.9 panels: (a) Inertial coordinates, showing capture times T = 1.01 (E0), T = 3.43 (E1), and T = 12.66 (E2); (b) Pursuer-centric coordinates and value map, with capture-time contours from 2.0 to 10.0.]
Figure 5.9: Limited lookahead results for a three-player Homicidal Chauffeur game.
Chapter 6
Conclusion and Future Work
The primary goal of this work is to establish whether the limited lookahead method combined
with Monte Carlo Tree Search is indeed a viable and efficient way to solve multi-player games,
including the simple successive pursuit game with several evaders (the Dynamic Traveling
Salesman problem) and a many-evader Homicidal Chauffeur game. The results of Chapter
5 demonstrate that one can obtain approximate game trajectories and capture times for the
two-evader simple pursuit game that agree well with the optimal solutions in both the linear
(fixed capture sequence) and curved motion regions of the game space. Furthermore, the
two-evader game can be solved within a second of real time.
For the many-evader simple pursuit game, the limited lookahead results are able to approximate
the optimal player paths within the fixed capture sequence region. Though no reference
solution is available for the multi-player game in the curved motion region for comparison,
the trajectories for the constrained scenarios in Section 5.3 at least appear reasonable.
Automated solutions to the multi-player differential pursuit game, like many other multi-
agent problems, suffer from the “curse of dimensionality.” For the single-pursuer, N -evader
pursuit game the number of possible capture combinations grows as N !. This study pro-
poses Monte Carlo Tree Search as a means to reduce the number of iterations needed to
achieve an optimal or near-optimal solution. To achieve capture times within one percent
of optimal, MCTS required O(N log(N)) iterations on average to converge. Optimal results
were also attained much more quickly than brute-force, though not as quickly as N log(N).
Furthermore, the limited lookahead results in Chapter 5 demonstrate that it is possible to
use an approximate MCTS result to generate approximate game trajectories.
Though MCTS significantly reduces the number of iterations needed to find an optimal
capture sequence, the execution time of the current limited lookahead implementation for
the many-evader scenarios still does not meet real-time requirements. This is due primarily
to the global minimization routines required for the minimax portion of limited lookahead.
Attempts were made to reduce the compute time, such as supplying the gradient of the cost
function and using Monte Carlo Tree Search as a grid-sampling mechanism, but these efforts
were not enough to bring run times within the one-second goal. Bringing multi-player
game solutions into the domain of real-time execution will require more efficient numerical
optimization schemes. Should such a technique be found, MCTS can still provide a
significant improvement for the combinatorial step.
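For reference, one minimax limited-lookahead step can be sketched on a coarse control grid. Here the one-step separation stands in for the cost-to-go table, and exhaustive grid evaluation replaces the global optimizer whose cost dominates the run time; the function name, grid size, and dynamics are assumptions for illustration.

```python
import numpy as np

def lookahead_step(p, e, vp, ve, dt=0.1, n_grid=24):
    """One minimax limited-lookahead step on a discretized control grid.

    The pursuer picks the heading minimizing, and the evader the heading
    maximizing, the one-step cost-to-go (here simply the resulting
    separation, standing in for the table-lookup game value).
    """
    p, e = np.asarray(p, float), np.asarray(e, float)
    angles = np.linspace(0.0, 2 * np.pi, n_grid, endpoint=False)
    headings = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    p_next = p + vp * dt * headings        # candidate pursuer moves
    e_next = e + ve * dt * headings        # candidate evader moves
    # separation matrix: rows = pursuer controls, cols = evader controls
    sep = np.linalg.norm(p_next[:, None, :] - e_next[None, :, :], axis=2)
    # minimax: evader maximizes for each pursuer choice, pursuer minimizes
    i = np.argmin(sep.max(axis=1))
    j = np.argmax(sep[i])
    return p_next[i], e_next[j]
```

The grid evaluation is O(n_grid²) per step and per evader pair, which makes plain the trade-off the thesis describes: a finer grid or a continuous optimizer is more accurate but is exactly what pushes run times past real time.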
Finally, it was shown in Section 5.4 that limited lookahead can produce an automated
solution to the Homicidal Chauffeur game. By computing a two-player subgame offline and
storing the game values as a lookup table, the limited lookahead method is able to produce
trajectories for a many-evader engagement, even in the presence of singular game surfaces
such as barriers, universal lines, and dispersal lines.
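The table-lookup step can be sketched as bilinear interpolation of precomputed game values on a pursuer-centric grid. The grid layout and the clamping of out-of-range queries are assumptions for illustration; the table contents here are arbitrary rather than actual Homicidal Chauffeur values.

```python
import numpy as np

def bilinear_lookup(table, x_grid, y_grid, x, y):
    """Bilinearly interpolate a precomputed value table at (x, y).

    In the thesis setting the table would hold two-player game values on
    a pursuer-centric grid, computed offline; queries outside the grid
    are clamped to its edges.
    """
    x = np.clip(x, x_grid[0], x_grid[-1])
    y = np.clip(y, y_grid[0], y_grid[-1])
    # index of the grid cell containing (x, y)
    i = np.clip(np.searchsorted(x_grid, x) - 1, 0, len(x_grid) - 2)
    j = np.clip(np.searchsorted(y_grid, y) - 1, 0, len(y_grid) - 2)
    tx = (x - x_grid[i]) / (x_grid[i + 1] - x_grid[i])
    ty = (y - y_grid[j]) / (y_grid[j + 1] - y_grid[j])
    return ((1 - tx) * (1 - ty) * table[i, j]
            + tx * (1 - ty) * table[i + 1, j]
            + (1 - tx) * ty * table[i, j + 1]
            + tx * ty * table[i + 1, j + 1])
```

One caveat the text hints at: near barriers and other singular surfaces the true value function is discontinuous, so smooth interpolation across a grid cell straddling such a surface is only an approximation.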
The results of this study suggest a number of avenues for further work. Finding a more
efficient continuous optimization scheme for the minimax operation has already been men-
tioned. The numerical stability of the minimax optimization in conjunction with the stochas-
tic sampling of MCTS should be studied in more detail. Of particular interest is the stability
near singular surfaces and in the presence of multiple minima as the number of players be-
comes large. The stability and convergence of the MCTS grid sampling method introduced
here are also of interest.
More could be done to fine-tune the MCTS implementation, including better memory
management, a compiled language, and parallelization techniques. Additional performance
gains could be had in several auxiliary routines, as not all of them were optimized with
the Numba / LLVM framework. As mentioned by previous authors [7], [31],
it is also worth exploring the many combinatorial techniques that are used in the Traveling
Salesman problem.
It would be valuable to extend the work of Li to games with non-zero-sum objectives,
asymmetric player information, or stochastic processes. Li initially addresses stochastic
conditions for the lookahead method in [7]. Since MCTS is well suited to stochastic
simulations, the work here could likely be adapted to such settings.
To date there is no general solution to the multi-player differential pursuit game. Some
of the challenges include defining game termination, solving complex PDEs to obtain the
value function, addressing capturability, and, as shown here, coping with high dimensionality.
Each of these challenge areas has open questions that warrant future study.
Bibliography
[1] J. Breakwell and P. Hagedorn, "Point capture of two evaders in succession," Journal of Optimization Theory and Applications 27(1) (1979).
[2] D. Applegate, R.E. Bixby, V. Chvatal, and W.J. Cook, The Traveling Salesman Problem: A Computational Study, Princeton University Press (2006).
[3] A. Belousov, Y.I. Berdyshev, A. Chentsov, and A. Chikrii, "Solving the dynamic traveling salesman problem," Cybernetics and Systems Analysis 46(5) (2010).
[4] T. Basar and G.J. Olsder, Dynamic Noncooperative Game Theory, 2nd ed., SIAM (1999).
[5] R. Isaacs, Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization, Dover (1965).
[6] M. Falcone, "Numerical methods for differential games based on partial differential equations," International Game Theory Review 8(2), 231–272 (2006).
[7] D. Li, Multi-player Pursuit-Evasion Differential Games, Dissertation, The Ohio State University (2006).
[8] A.W. Merz, The Homicidal Chauffeur–A Differential Game, Ph.D. thesis, Stanford University (1971).
[9] J. Lewin and G. Olsder, "Conic surveillance evasion," Journal of Optimization Theory and Applications (1979).
[10] M.G. Crandall and P.L. Lions, "Viscosity solutions of Hamilton-Jacobi equations," Transactions of the American Mathematical Society 277(1), 1–42 (1983).
[11] A. Subbotin, "Generalization of the main equation of differential game theory," Journal of Optimization Theory and Applications 43, 103–133 (1984).
[12] N. Krasovskii and A. Subbotin, Game Theoretical Control Problems, Springer (1984).
[13] M. Bardi, M. Falcone, and P. Soravia, "Numerical methods for pursuit-evasion games via viscosity solutions," in M. Bardi, T.E.S. Raghavan, and T. Parthasarathy (eds.), Stochastic and Differential Games: Theory and Numerical Methods, Annals of the International Society of Dynamic Games, vol. 4, pp. 289–303 (2000).
[14] V. Patsko, "Level sets of the value function in differential games with the homicidal chauffeur dynamics," International Game Theory Review 3(1), 67–112 (2001).
[15] I.M. Mitchell, A.M. Bayen, and C.J. Tomlin, "A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games," IEEE Transactions on Automatic Control (2005).
[16] S. Shankaran, D.M. Stipanovic, and C.J. Tomlin, "Collision avoidance strategies for a three-player game," in Advances in Dynamic Games, Annals of the International Society of Dynamic Games 11, Springer Science+Business Media (2011).
[17] N.D. Botkin, K.H. Hoffmann, and V.L. Turova, "Stable numerical schemes for solving Hamilton-Jacobi-Bellman-Isaacs equations," SIAM Journal on Scientific Computing 33(2), 992–1007 (2011).
[18] K. Zemskov and A. Pashkov, "Construction of optimal position strategies in a differential pursuit-evasion game with one pursuer and two evaders," Journal of Applied Mathematics and Mechanics 61(3), 391–399 (1997).
[19] I. Shevchenko, "Successive pursuit with a bounded detection domain," Journal of Optimization Theory and Applications 95(1), 25–48 (1997).
[20] S. Bhattacharya and T. Basar, "Differential game-theoretic approach to a spatial jamming problem," in P. Cardaliaguet and R. Cressman (eds.), Advances in Dynamic Games, Annals of the International Society of Dynamic Games 12, Springer Science+Business Media (2012).
[21] Z.E. Fuchs, P.P. Khargonekar, and J. Evers, "Cooperative defense within a single-pursuer, two-evader pursuit evasion differential game," in 49th IEEE Conference on Decision and Control (2010).
[22] Z.E. Fuchs and P.P. Khargonekar, "Encouraging attacker retreat through defender cooperation," in 2011 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC) (2011).
[23] D.W. Yeung and L.A. Petrosyan, Cooperative Stochastic Differential Games, Springer Science+Business Media (2006).
[24] L. Petrosjan and V. Shirjaev, Hierarchical Games, Saransk (1986).
[25] S.I. Tarashnina, "Nash equilibria in differential pursuit game with one pursuer and m evaders," in L.A. Petrosjan and V.V. Mazalov (eds.), Game Theory and Applications III, Nova Science Publishers, Inc. (1997).
[26] I. Shevchenko, "Minimizing the distance to one evader while chasing another," Computers and Mathematics with Applications 47 (2004).
[27] I. Shevchenko, "Approaching coalitions of evaders on the average," in Advances in Dynamic Game Theory, Birkhauser Boston (2007).
[28] I. Shevchenko, "Strategies for alternative pursuit games," in P. Bernhard et al. (eds.), Advances in Dynamic Games and Their Applications, Annals of the International Society of Dynamic Games 10, Birkhauser Boston (2009).
[29] A. Chikrii and S. Kalashnikova, "Pursuit of a group of evaders by a single controlled object," Kibernetika 4, 1–8 (1987).
[30] Y.I. Berdyshev, "On a nonlinear problem of a sequential control with a parameter," Journal of Computer and Systems Sciences International 47(3), 380–385 (2008).
[31] Y.I. Berdyshev, "Choosing the sequence of approach of a nonlinear object to a group of moving points," Journal of Computer and Systems Sciences International 50(1), 30–37 (2011).
[32] S.Y. Liu, Z. Zhou, C. Tomlin, and K. Hedrick, "Evasion as a team against a faster pursuer," in 2013 American Control Conference (ACC) (2013).
[33] D.M. Stipanovic, A. Melikyan, and N. Hovakimyan, "Guaranteed strategies for nonlinear multi-player pursuit-evasion games," International Game Theory Review 12(1) (2010).
[34] D.M. Stipanovic, A. Melikyan, and N. Hovakimyan, "Some sufficient conditions for multi-player pursuit-evasion games with continuous and discrete observations," in Advances in Dynamic Games and Their Applications, Annals of the International Society of Dynamic Games 10, Birkhauser Boston (2009).
[35] T. Abramyants, M. Ivanov, E. Maslov, and V. Yakhno, "A detection evasion problem," Automation and Remote Control 65(10), 1523–1530 (2004).
[36] J.S. Jang and C.J. Tomlin, "Control strategies in multi-player pursuit and evasion game," in AIAA Guidance, Navigation, and Control Conference and Exhibit (2005).
[37] A. Bolonkin and R. Murphey, "Geometry-based parametric modeling for single-pursuer/multiple-evader problems," Journal of Guidance, Control, and Dynamics 28(1) (2005).
[38] J. Ge, L. Tang, J. Reimann, and G. Vachtsevanos, "Hierarchical decomposition approach for pursuit-evasion differential game with multiple players," in 2006 IEEE Aerospace Conference (2006).
[39] X. Wang, J.B. Cruz, Jr., G. Chen, K. Pham, and E. Blasch, "Formation control in multi-player pursuit evasion game with superior evaders," in Defense and Security Symposium, International Society for Optics and Photonics (2007).
[40] M. Wei, G. Chen, J.B. Cruz, Jr., L.S. Haynes, K. Pham, and E. Blasch, "Multi-pursuer multi-evader pursuit-evasion games with jamming confrontation," Journal of Aerospace Computing, Information, and Communication 4 (2007).
[41] D. Li and J.B. Cruz, Jr., "A hierarchical approach to multi-player pursuit-evasion differential games," in Proceedings of the 44th IEEE Conference on Decision and Control (2005).
[42] D. Li and J. Cruz, "Better cooperative control with limited look-ahead," in American Control Conference, IEEE (2006).
[43] D. Li and J.B. Cruz, Jr., "Improvement with look-ahead on cooperative pursuit games," in Proceedings of the 45th IEEE Conference on Decision and Control (2006).
[44] D.P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific (2000).
[45] C.B. Browne, E. Powley, D. Whitehouse, S.M. Lucas, P.I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton, "A survey of Monte Carlo tree search methods," IEEE Transactions on Computational Intelligence and AI in Games 4(1) (2012).
[46] M.P. Schadd, M.H. Winands, H.J. van den Herik, and H. Aldewereld, "Addressing NP-complete puzzles with Monte-Carlo methods," in Proceedings of the AISB 2008 Symposium on Logic and the Simulation of Interaction and Reasoning (2008).
[47] D. Perez, P. Rohlfshagen, and S.M. Lucas, "Monte Carlo tree search for the physical travelling salesman problem," Applications of Evolutionary Computation (2012).
[48] D. Perez, S. Samothrakis, P. Rohlfshagen, and S.M. Lucas, "Rolling horizon evolution versus tree search for navigation in single-player real-time games," in Proceedings of the Fifteenth Annual Conference on Genetic and Evolutionary Computation (2013).
[49] E.J. Powley, D. Whitehouse, and P.I. Cowling, "Monte Carlo tree search with macro-actions and heuristic route planning for the physical travelling salesman problem," in 2012 IEEE Conference on Computational Intelligence and Games (CIG) (2012).
[50] A. Rimmel, F. Teytaud, and T. Cazenave, "Optimization of the nested Monte-Carlo algorithm on the traveling salesman problem with time windows," Applications of Evolutionary Computation (2011).
[51] J. Lewin, Differential Games: Theory and Methods for Solving Game Problems with Singular Surfaces, Springer-Verlag (1994).
[52] V.S. Patsko and V.L. Turova, "Homicidal chauffeur game: History and modern studies," in Advances in Dynamic Games, Annals of the International Society of Dynamic Games 11, Springer Science+Business Media (2011).
[53] E. Barron, L. Evans, and R. Jensen, "Viscosity solutions of Isaacs' equations and differential games with Lipschitz controls," Journal of Differential Equations 53, 213–233 (1984).
[54] L. Evans and P. Souganidis, "Differential games and representation formulas for solutions of Hamilton-Jacobi-Isaacs equations," Indiana University Mathematics Journal 33, 773–797 (1984).
[55] J. Nocedal and S. Wright, Numerical Optimization, Springer New York (2006).
[56] M. Powell, "An efficient method for finding the minimum of a function of several variables without calculating derivatives," Computer Journal 7(2), 155–162 (1964).
[57] L. Kocsis and C. Szepesvari, "Bandit based Monte-Carlo planning," in Proceedings of the European Conference on Machine Learning (2006).
[58] "CPython, version 2.7.5," http://www.python.org, [Online; accessed 12-March-2014].
[59] "Anaconda Python - Continuum Analytics," http://www.continuum.io, [Online; accessed 12-March-2014].
[60] "Numba - Continuum Analytics," http://numba.pydata.org, [Online; accessed 12-March-2014].
[61] "LLVM," http://www.llvm.org, [Online; accessed 12-March-2014].
[62] "NumPy," http://www.numpy.org, [Online; accessed 12-March-2014].
[63] "SciPy," http://www.scipy.org, [Online; accessed 12-March-2014].
[64] "Matplotlib," http://matplotlib.org, [Online; accessed 12-March-2014].
[65] D. Wales and J. Doye, "Global optimization by basin-hopping and the lowest energy structures of Lennard-Jones clusters containing up to 110 atoms," Journal of Physical Chemistry A 101, 5111 (1997).