journal title on infusing reachability-based safety...

18
On Infusing Reachability-Based Safety Assurance within Planning Frameworks for Human-Robot Vehicle Interactions Journal Title XX(X):118 c The Author(s) 2019 Reprints and permission: sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/ToBeAssigned www.sagepub.com/ SAGE Karen Leung 1 , Edward Schmerling 1 , Mengxuan Zhang 1 , Mo Chen 2 , John Talbot 1 , J. Christian Gerdes 1 , and Marco Pavone 1 Abstract Action anticipation, intent prediction, and proactive behavior are all desirable characteristics for autonomous driving policies in interactive scenarios. Paramount, however, is ensuring safety on the road—a key challenge in doing so is accounting for uncertainty in human driver actions without unduly impacting planner performance. This paper introduces a minimally-interventional safety controller operating within an autonomous vehicle control stack with the role of ensuring collision-free interaction with an externally controlled (e.g., human-driven) counterpart while respecting static obstacles such as a road boundary wall. We leverage reachability analysis to construct a real-time (100Hz) controller that serves the dual role of (i) tracking an input trajectory from a higher-level planning algorithm using model predictive control, and (ii) assuring safety by maintaining the availability of a collision-free escape maneuver as a persistent constraint regardless of whatever future actions the other car takes. A full-scale steer-by-wire platform is used to conduct traffic weaving experiments wherein two cars, initially side-by-side, must swap lanes in a limited amount of time and distance, emulating cars merging onto/off of a highway. We demonstrate that, with our control stack, the autonomous vehicle is able to avoid collision even when the other car defies the planner’s expectations and takes dangerous actions, either carelessly or with the intent to collide, and otherwise deviates minimally from the planned trajectory to the extent required to maintain safety. Keywords Probabilistic planning, safety-preserving controller, backward reachability analysis, vehicle model predictive control, human-robot interaction. 1 Introduction Decision-making and control for mobile robots is typically stratified into levels. A high-level planner, informed by representative yet simplified dynamics of a robot and its environment, might be responsible for selecting an optimal, yet coarse trajectory plan, which is then implemented through a low-level controller that respects more accurate models of the robot’s dynamics and control constraints. While additional components may be required to flesh out a robot’s full control stack from model to motor commands, selecting the right “division of responsibilities” is fundamental to system design. One consideration that defies clear classification, however, is how to ensure a mobile robot’s safety when operating in close proximity with a rapidly evolving and stochastic environment. Safety is a function of uncertainty in both the robot’s dynamics and those of its surroundings; high-level planners typically do not replan sufficiently rapidly to ensure split-second reactivity to threats, yet low-level controllers are typically too short-sighted to ensure safety beyond their local horizon. Human-robot interactions are an unavoidable aspect of many modern robotic applications and ensuring safety for these interactions is critical, especially in applications such as autonomous driving where collisions may lead to life- threatening injury. However, ensuring safety within the planning and control framework can be very challenging due to the uncertainty in how humans may behave. To quantify this uncertainty, robots often rely on generative models of human behavior in order to inform their planning algorithms (Sadigh et al. 2016; Schmerling et al. 2018), thereby enabling more efficient and communicative interactions. In general, under nominal operating conditions that reflect the modeling assumptions, these model-based probabilistic planners can offer high performance (e.g., minimizing time and control effort for the robot). However, these planners alone are typically insufficient for ensuring absolute safety because (i) they depend on probabilistic models of human behavior and thus safety is not enforced deterministically, (ii) dangerous but low-likelihood events may not be adequately captured in the human behavior prediction model, and (iii) reasoning about these probabilistic behavior models is typically too computationally expensive for the planners to react in real time when humans strongly defy expectations and/or diverge from modeling assumptions. In this work we implement a control stack for a full-scale autonomous car (the “robot”) engaging in close proximity interactions with a human-controlled vehicle (the “human”). 1 Stanford University, Stanford, CA 94305, USA 2 Simon Fraser University, Burnaby, BC V5A 1S6, Canada Corresponding author: Marco Pavone, 496 Lomita Mall, Rm. 261 Stanford, CA 94305 Email: [email protected] Prepared using sagej.cls [Version: 2017/01/17 v1.20]

Upload: others

Post on 05-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Journal Title On Infusing Reachability-Based Safety ...asl.stanford.edu/wp-content/papercite-data/pdf/Leung.Sch...human was to swerve (see right). The robot is in an unsafe state if

On Infusing Reachability-Based SafetyAssurance within Planning Frameworksfor Human-Robot Vehicle Interactions

Journal TitleXX(X):1–18c©The Author(s) 2019

Reprints and permission:sagepub.co.uk/journalsPermissions.navDOI: 10.1177/ToBeAssignedwww.sagepub.com/

SAGE

Karen Leung1, Edward Schmerling1, Mengxuan Zhang1, Mo Chen2, John Talbot1, J.Christian Gerdes1, and Marco Pavone1

AbstractAction anticipation, intent prediction, and proactive behavior are all desirable characteristics for autonomous drivingpolicies in interactive scenarios. Paramount, however, is ensuring safety on the road—a key challenge in doing sois accounting for uncertainty in human driver actions without unduly impacting planner performance. This paperintroduces a minimally-interventional safety controller operating within an autonomous vehicle control stack with therole of ensuring collision-free interaction with an externally controlled (e.g., human-driven) counterpart while respectingstatic obstacles such as a road boundary wall. We leverage reachability analysis to construct a real-time (100Hz)controller that serves the dual role of (i) tracking an input trajectory from a higher-level planning algorithm using modelpredictive control, and (ii) assuring safety by maintaining the availability of a collision-free escape maneuver as apersistent constraint regardless of whatever future actions the other car takes. A full-scale steer-by-wire platform isused to conduct traffic weaving experiments wherein two cars, initially side-by-side, must swap lanes in a limited amountof time and distance, emulating cars merging onto/off of a highway. We demonstrate that, with our control stack, theautonomous vehicle is able to avoid collision even when the other car defies the planner’s expectations and takesdangerous actions, either carelessly or with the intent to collide, and otherwise deviates minimally from the plannedtrajectory to the extent required to maintain safety.

KeywordsProbabilistic planning, safety-preserving controller, backward reachability analysis, vehicle model predictive control,human-robot interaction.

1 Introduction

Decision-making and control for mobile robots is typicallystratified into levels. A high-level planner, informed byrepresentative yet simplified dynamics of a robot and itsenvironment, might be responsible for selecting an optimal,yet coarse trajectory plan, which is then implementedthrough a low-level controller that respects more accuratemodels of the robot’s dynamics and control constraints.While additional components may be required to fleshout a robot’s full control stack from model to motorcommands, selecting the right “division of responsibilities”is fundamental to system design.

One consideration that defies clear classification, however,is how to ensure a mobile robot’s safety when operatingin close proximity with a rapidly evolving and stochasticenvironment. Safety is a function of uncertainty in both therobot’s dynamics and those of its surroundings; high-levelplanners typically do not replan sufficiently rapidly to ensuresplit-second reactivity to threats, yet low-level controllers aretypically too short-sighted to ensure safety beyond their localhorizon.

Human-robot interactions are an unavoidable aspect ofmany modern robotic applications and ensuring safety forthese interactions is critical, especially in applications suchas autonomous driving where collisions may lead to life-threatening injury. However, ensuring safety within theplanning and control framework can be very challenging due

to the uncertainty in how humans may behave. To quantifythis uncertainty, robots often rely on generative models ofhuman behavior in order to inform their planning algorithms(Sadigh et al. 2016; Schmerling et al. 2018), thereby enablingmore efficient and communicative interactions. In general,under nominal operating conditions that reflect the modelingassumptions, these model-based probabilistic planners canoffer high performance (e.g., minimizing time and controleffort for the robot). However, these planners alone aretypically insufficient for ensuring absolute safety because (i)they depend on probabilistic models of human behavior andthus safety is not enforced deterministically, (ii) dangerousbut low-likelihood events may not be adequately capturedin the human behavior prediction model, and (iii) reasoningabout these probabilistic behavior models is typically toocomputationally expensive for the planners to react in realtime when humans strongly defy expectations and/or divergefrom modeling assumptions.

In this work we implement a control stack for a full-scaleautonomous car (the “robot”) engaging in close proximityinteractions with a human-controlled vehicle (the “human”).

1Stanford University, Stanford, CA 94305, USA2Simon Fraser University, Burnaby, BC V5A 1S6, Canada

Corresponding author:Marco Pavone, 496 Lomita Mall, Rm. 261 Stanford, CA 94305Email: [email protected]

Prepared using sagej.cls [Version: 2017/01/17 v1.20]

Page 2: Journal Title On Infusing Reachability-Based Safety ...asl.stanford.edu/wp-content/papercite-data/pdf/Leung.Sch...human was to swerve (see right). The robot is in an unsafe state if

2 Journal Title XX(X)

Our control stack aims to stay true to planned trajectoriesfrom a high-level planner since freedom of motion isessential for the planner to carry out the driving task whileconveying future intent to the other vehicle. At the same time,we allow the robot to deviate from the desired trajectory tothe point that is necessary to maintain safety. Our primarytool for designing a controller that does not needlesslyimpinge upon the planner’s choices is Hamilton-Jacobi (HJ)backward reachability. We provide a brief overview ofthe reachability analysis literature relevant to our work inSection 3.

2 Related WorkFor high-level planners, safety is often incentivized, butnot strictly enforced as a hard constraint. For example,safety is often part of the objective function when selectingoptimal plans, or represented via artificial potential fields(Wolf and Burdick 2008; Sadigh et al. 2016; Schmerlinget al. 2018). Although these approaches are designed toaccount for interactive scenarios in which another sentientagent is a key environmental consideration, they containcompeting objectives (i.e., collision avoidance vs. goal-oriented performance), often do not plan at a sufficientlyhigh rate to account for rapidly-changing environments, andultimately provide no theoretical framework for ensuringsafety for both the human and the robot.

Another common approach to finding collision-free plansis to use forward reachability. The idea is to fix a timehorizon and compute the set of states where the otheragents could possibly be in the future and plan trajectoriesfor the robot that avoid this set (Althoff and Dolan 2011;Liu et al. 2017; Lorenzetti et al. 2018). Although thisgives a stronger sense of safety, this is only practicalfor short time horizons otherwise it will lead to overlyconservative robot behaviors or even planning probleminfeasibility as the set to avoid grows. In general, approachesthat use forward reachability results in very conservativeresults, especially for interactive scenarios where freedomof motion (e.g., nudging) is necessary to convey intent.These methods, as well as methods that enforce safety as anobjective, are often subjected to model simplification, suchas using a linear dynamics model instead of a nonlinearone, or making simplifying assumptions for computationaltractability. However, model simplifications can lead tooverly conservative or imprecise results which may impedeperformance.

Safety can also be introduced at the low-level whichoften obeys higher fidelity dynamics models. A commonapproach in ensuring safe low-level controls is to use reactivecollision avoidance techniques—the robot is normallyallowed to apply any control, but switches to an avoidancecontroller when near safety violation. Examples of thisgeneral approach include HJ backward reachability-basedcontrollers which have been applied in such a switchedfashion (Fisac et al. 2018; Chen and Tomlin 2018;Bajcsy et al. 2019) and have proven to be effectivefor avoiding other interacting agents and static obstacles.HJ reachability has been studied extensively and appliedsuccessfully in a variety of safety-critical interactive settings(Mitchell et al. 2005; Bokanowski et al. 2010; Gattami

et al. 2011; Margellos and Lygeros 2011; Chen et al.2017; Dhinakaran et al. 2017) due to its flexibility withrespect to system dynamics, and its optimal (i.e., non-overly-conservative) avoidance maneuvers stemming fromits equivalence to an exhaustive search over joint systemdynamics. Other approaches include using a precomputedemergency maneuver library (Arora et al. 2015). A keydrawback to switching controllers, however, is that theperformance goals considered by the high-level planner arecompletely ignored when the reactive controller steps in.For some cases, this may be acceptable and necessary, butin general, it is desirable to ensure safety without undulyimpacting planner performance where possible. Instead ofswitching to a different controller, Gray et al. (2013), Funkeet al. (2017), and Brown et al. (2017) adapt the low-level online optimal controller to incorporate constraintsthat avoid static obstacles. The approach taken in Grayet al. (2013) aims to be minimally interventional, however,they only consider cases with disturbance uncertainty ratherthan environmental uncertainty stemming from the systeminteracting with other human agents. These approachesare able to strike the optimal balance between trackingperformance and safety (i.e., optimizing performance subjectto safety as a hard constraint), but these approaches aspresented are effective for avoiding static obstacles only.

Inspired by the non-conservative nature and optimalityof HJ reachability, and the performance of online optimalcontrol for trajectory tracking, this work aims to infusereachability-based safety assurance into the low-levelcontroller such that when the robot is near safety violation,the robot is able to simultaneously maintain safety and followthe high-level plans without unduly impacting performance.To the best of our knowledge there has not been anywork that explicitly addresses the integration of reachability-based safety controllers as a component within a robot’scontrol stack, i.e., with safety as a constraint upon a primaryplanning objective.Statement of Contributions: The contributions of this paperare twofold. First, we propose a method for formallyincorporating reachability-based safety within an existingoptimization-based control framework. The main insightthat enables our approach is the recognition that, nearsafety violation, the set of safety-preserving controls oftencontains more than just the optimal avoidance control.Instead of directly applying this optimal avoidance controlwhen prompted by reachability considerations, as in aswitching control approach, we quantify the set of safety-preserving controls and pass it to the broader controlframework as a constraint. Our intent is to enable minimalintervention against the direction of a higher-level plannerwhen evasive action is required. Second, we evaluate thebenefits, performance, and trade-offs of this safe controlmethodology in the context of a probabilistic planningframework for the traffic weaving scenario studied at ahigh level in Schmerling et al. (2018), wherein two cars,initially side-by-side, must swap lanes in a limited amountof time and distance. Experiments with a full-scale steer-by-wire vehicle reveal that our combined control stack achievesbetter safety than applying a tracking controller alone to theplanner output, and smoother operation (with similar safety)compared with a switching control scheme; in our discussion

Prepared using sagej.cls

Page 3: Journal Title On Infusing Reachability-Based Safety ...asl.stanford.edu/wp-content/papercite-data/pdf/Leung.Sch...human was to swerve (see right). The robot is in an unsafe state if

Karen Leung, Edward Schmerling, et al. 3

Forward Reachable Set

(Open-loop safety)

Global coordinates

(a) The human’s dynamics are propagatedforward in time. The robot plans trajectories thatstay outside of the human’s FRS (shaded region).

Backward Reachable Set

(Closed-loop safety)

Relative coordinates Robot’s reactive policy

(b) Joint dynamics are propagated backwards in time. The BRS (shaded region) is notoverly-conservative even for long time horizons because the robot can react if thehuman was to swerve (see right). The robot is in an unsafe state if the human entersthe BRS, defined in the relative state space.

Figure 1. Illustration of forward and backward reachability, and how they are commonly used to ensure safe planning and controls.The robot and human are depicted in red and blue respectively.

we provide a roadmap towards improving the level of safetyassurance in the face of practical considerations such asunmodeled dynamics, as well as towards generalizations ofthe basic traffic weaving scenario.

A preliminary version of this work appeared at the2018 International Symposium on Experimental Robotics(ISER). In this revised and extended version, we providethe following additional contributions: (i) an extension to thecontrol framework to account for static obstacles, (ii) moreexposition on the vehicle dynamics models used and a deeperdiscussion on the underlying modeling assumptions, (iii)an empirical study of the safety-efficiency trade-off for ourapproach, and (iv) additional experimental results includingtrials with a static road boundary wall and a comparison witha baseline approach.

3 Background: Hamilton-JacobiReachability

In this section, we provide a brief overview of reachabilityanalysis as a tool for constructing safe trajectory plans, andintroduce HJ backward reachability concepts relevant to thiswork.

3.1 Overview: Reachability AnalysisGiven a dynamics model governing a robotic systemincorporating control and disturbance inputs, reachabilityanalysis is the study of the set of states that the systemcan reach from its initial conditions. It is often used forformal verification as it can give guarantees on whether ornot the evolution of the system will be safe, i.e., whether thereachable set includes undesirable outcomes. Reachabilityanalysis can be divided into two main paradigms: (i) forwardreachability and (ii) backward reachability.Forward Reachability Analysis: The forward reachableset (FRS) is the set of states that the system couldpotentially be in after some time horizon t. This is computedby propagating the dynamics combined with all feasiblecontrol sequences and disturbances forward in time. Whenconsidering the interaction between two agents, for examplea human and a robot, the forward reachable set is computedfor the human and the robot plans to avoid this set to ensure

collision-free trajectories (see Figure 1a). This open-loopmentality, however, leads to an overly-conservative robotoutlook. That is, while considering actions in the present,the robot does not incorporate the possibility that its futureobservations of where the human goes might influence howmuch it actually needs to take avoidance actions; in short therobot’s plan to avoid the FRS does not incorporate closed-loop feedback. To reduce the over-conservative nature offorward reachability, the time horizon t for which the FRSis computed over is typically kept small and is recomputedfrequently. This approach has been found to be effective infinding collision-free trajectories (Althoff and Dolan 2014),but it is difficult to extend to interactive scenarios where thereare more uncertainties in the rapidly changing environment.Aside from the overly-conservative nature of FRS, the keydrawback with using forward reachability is that safety(i.e., avoiding the FRS) hinges on the planner’s capabilitiesand operating frequency. Even if computing the FRS isinstantaneous, the planner may still be unable to react tosplit-second threats.

Backward Reachability Analysis: Let the target set representa set of undesirable states (e.g., collision states of thesystem). The backward reachable set (BRS) is the set ofstates that could result in the system being in the target set,assuming worst-case disturbances, after some time horizont. Specifically, the BRS represents the set of states fromwhich there does not exist a controller that can prevent therobot dynamics from being driven into the target set underworst-case disturbances within a time horizon t. As such,to rule out such an eventuality, the BRS is treated as the“avoid set”. Critical differences between the FRS and BRSare that (i) the BRS is computed backwards in time, and(ii) the BRS is computed assuming closed-loop reactionsto the disturbances. Illustrated in Figure 1b, for the caseof relative dynamics between human- and robot-controlledvehicles, computation of the BRS takes into account the factthat the robot is able to react to the human at any time andin any state configuration (if the human was to swerve intothe robot, the robot can swerve too to avoid a collision).This leads to the BRS being less overly-conservative thanthe FRS. In practice, the results of the BRS computationare cached via a look-up table, and at run-time, the optimalrobot policy can be computed via a near-instant lookup of the

Prepared using sagej.cls

Page 4: Journal Title On Infusing Reachability-Based Safety ...asl.stanford.edu/wp-content/papercite-data/pdf/Leung.Sch...human was to swerve (see right). The robot is in an unsafe state if

4 Journal Title XX(X)

Figure 2. Contour plots of slices of the HJI value function V computed from the relative dynamics (see Equation (8)) and choice ofsigned distance as the terminal value function. Slices show V as a function of relative x and y position (see Figure 3 for anillustration of the human-robot relative pose) with all other states held fixed. The zero-level set, outlined with a thick black line, is thebackwards reachable set from collision states (i.e., the avoid set).

Figure 3. The relative human-robot coordinate frame andrelative pose between the robot (red) car and human (blue) car.The frame is centered and rotated about the robot’s pose.

reachability cache. As such, we can always compute optimalactions for the robot at any state configuration and regardlessof the high-level planner used.

We elect to use backward reachability because (i) its non-overly conservative nature stemming from the closed-loopcomputation ensures that safe controls will be used onlywhen necessary and not unduly impact planner performance,(ii) safety is defined intrinsically in the BRS computationwhereas safety using FRS-based approaches depends onwhether the robot’s planned trajectory intersect the FRS,and (iii) it provides a computational handle on safe controlswhich can be evaluated at high operating frequencies to reactto split-second threats.

Moreover, we will compute the BRS using Hamilton-Jacobi (HJ) reachability analysis, a particular approachto computing reachable sets. There are many existingapproaches (Greenstreet and Mitchell 1998; Kurzhanskiand Varaiya 2000; Frehse et al. 2011; Althoff and Krogh2014; Majumdar et al. 2014), but there is always atrade-off between modeling assumptions, scalability, andrepresentation fidelity (i.e., whether the method computesover- or under-approximations of the reachable set).Compared to alternatives approaches, HJ reachability is themost computationally expensive, but it is able to computethe BRS exactly∗ for any general nonlinear dynamics withcontrol and disturbance inputs because it essentially usesa brute force computation via dynamic programming. Asa result, the BRS solution to the HJ reachability problemrepresents a constructive proof of the existence of a safety-preserving control policy (i.e., a safety certificate) fromstates outside the BRS. Despite the apparent computational

drawbacks, we note that the BRS may be computed offline,and requires only a near-instant lookup during runtime,allowing it to be used in controllers or planners that run ata very high operational frequency.

3.2 Hamilton-Jacobi Backward ReachabilityAnalysis

We briefly review relevant HJ backward reachabilitydefinitions for the remainder of this section; see Chen andTomlin (2018) for a more in-depth treatment. HJ reachabilitycasts the reachability problem as an optimal control problemand thus computing the reachable set is equivalent to solvingthe Hamilton-Jacobi-Isaacs (HJI) partial differential equation(PDE).

The general HJ reachability formulation is as follows.Let the system dynamics be given by x = f(x, u, d) wherex ∈ Rn is the state, u ∈ U ⊂ Rm is the control, and d ∈ D ⊂Rp is the disturbance. For example, x could be the stateof a robot, u be the robot’s controls, d be environmentaldisturbances such as wind, and f be the dynamics of therobot. The system dynamics f : Rn × U ×D → Rn areassumed to be uniformly continuous, bounded, and Lipschitzcontinuous in x for a fixed u and d. Let T ⊆ Rn be thetarget set that the system wants to avoid at the end of a timehorizon |t| (note that t < 0 when propagating backwards intime). For collision avoidance, T typically represents the setof states that are in collision with an obstacle. For brevity,the following description of HJ backward reachability willuse notation relevant to the rest of the paper, that is,we are interested in collision avoidance for human-robotinteractions. Despite the notation, the following theory isstill applicable to general systems x = f(x, u, d) outside thedomain of human-robot interactions.

In the context of human-robot interactions, f(·) describesthe relative dynamics between the human and the robot(denoted by f

rel(·) where the subscript rel indicates the

relative human-robot system), u corresponds to the robot’scontrols, and d corresponds to the human’s controls sincethe human actions are treated as disturbance inputs. Moreconcretely, let (xR , uR) represent the robot state and control,(xH , uH) represent the human state and control, and x

relbe

the relative state between the human and the robot. Thusthe relative dynamics of the robot and human are given by

∗With precision dependent on parameters of the numerical solver, e.g.,discretization choices in mesh size/time step.

Prepared using sagej.cls

Page 5: Journal Title On Infusing Reachability-Based Safety ...asl.stanford.edu/wp-content/papercite-data/pdf/Leung.Sch...human was to swerve (see right). The robot is in an unsafe state if

Karen Leung, Edward Schmerling, et al. 5

Reachability cache

100Hz

MPC tracking controller

Safety constraints

Relative stateControls

100Hz

Trajectory

Interaction planner

3-4Hz

Robot desired trajectory

Executed trajectory

Planned trajectory

Avoid Set

Figure 4. Decision-making and control stack for human-robot pairwise vehicle interactions. Our contribution in this work is theintegration of safety-ensuring control constraints, derived from a HJ backward reachable set computed and cached offline, into amodel predictive controller’s tracking optimization problem. A high-level interaction planner produces nominal trajectories for therobot car and the low-level safe tracking controller executes controls that minimally deviate from the planner’s choice if the vehiclesapproach the set of unsafe relative states.

xrel

= frel

(xrel, u

R, u

H), and T represents the set of relative

states corresponding to when the human and robot are incollision. The formal definition of the BRS, denoted byA(t),for the human-robot relative system is

A(t) := {xrel∈ Rn : ∃uH(·),∀uR(·),∃s ∈ [t, 0],

xrel

(t) = xrel∧ x

rel= f

rel(x

rel, uR , uH) ∧ x

rel(s) ∈ T }.

(1)

A(t) represents the set of “avoid states” at time t from whichif the human followed an adversarial policy u

H(·), any robot

policy uR

(·) would lead to the relative state trajectory xrel

(·)being inside T within a time horizon |t|. Assuming optimal(i.e., adversarial) human actions, A(t) can be computed bydefining a value function V (t, x

rel) which obeys the HJI

PDE (Mitchell et al. 2005; Fisac et al. 2015); the solutionV (t, x

rel) gives the BRS as its zero sublevel set:

A(t) = {xrel

: V (t, xrel

) ≤ 0}.

The HJI PDE is solved starting from the boundary conditionV (0, x

rel), the sign of which reflects set membership of x

rel

in T .† Thus computing the BRS is equivalent to solving theHJI PDE with boundary condition V (0, x

rel); we cache the

solution V (t, xrel

) to be used online as a look-up table.For the case of the vehicle-vehicle interactions investi-

gated in this work, when the control and maximum velocitycapabilities of the human car are no greater than those ofthe robot car, one can take the limit t→ −∞ and obtainthe infinite time horizon BRS A∞ with corresponding valuefunction V∞(x

rel).‡ Intuitively, this prescribed parity in

control authority ensures that if the human and robot startsufficiently far apart, then the human will never be able to“catch” the robot. This holds even if the human car may havetransient maneuverability advantages over the robot as weassume later in this work. That is, we expect that the BRSwill not encompass the entire state space as t→ −∞, andin practice we compute the BRS over a sufficiently large

finite time horizon to the point where it appears that theBRS has converged. We recognize that we make a strongassumption on the human and robot’s control authority inenabling this computation. In reality, the human and robotcar may have very different control authority (e.g., differentengines resulting in different acceleration and velocity capa-bilities) and the infinite BRS may not be bounded. In suchcases, practitioners may compute the BRS over a horizonsuitable for the interaction where guarantees afforded by HJreachability only hold over that time horizon. Illustrativeslices of the value function and the BRS for the vehicle-vehicle relative dynamics with equal control and velocitycapabilities considered in this work are shown in Figure 2while Figure 3 illustrates the human-robot relative frameused to define the relative dynamics involved in the BRScomputation (the mathematical details are given in Section5.3). In Figure 2 (left), the pear-shaped BRS stems from thefact the robot car can swerve its front more rapidly than itsrear to avoid collision. In Figure 2 (middle), if the robotcar is traveling faster, it is unsafe for the robot car to be inthe region behind the human car because a collision may beunavoidable if the human brakes abruptly. In Figure 2 (right),if the human car is traveling faster, in this case at an angle,it is unsafe for the robot to be in the region in front of thehuman car because the robot car may not be able to maneuverout of the way before the human catches up.

We can compute the optimal robot collision avoidancecontrol

u∗R

= arg maxuR

minuH

∇V (xrel

)T frel

(xrel, u

R, u

H) (2)

which offers the greatest increase in V (xrel

) assuming opti-mal (worst-case) actions by the human. For general nonlinear

†See Section 8 for discussion of specific choices of V (0, xrel ).‡For ease of notation going forward we will often write V := V∞.

Prepared using sagej.cls

Page 6: Journal Title On Infusing Reachability-Based Safety ...asl.stanford.edu/wp-content/papercite-data/pdf/Leung.Sch...human was to swerve (see right). The robot is in an unsafe state if

6 Journal Title XX(X)

systems, computing the optimal collision avoidance controlin Equation (2) may be nontrivial as there could be mul-tiple local maxima. For control/disturbance affine systems,however, the solutions to the optimal control/disturbance arebang-bang.

Recall from Equation (1) that when the system is outsideof the avoid setA(t) (i.e., V > 0), there exists a robot policy(e.g., Equation (2)) that keeps the robot safe over a timehorizon |t| regardless of any (including adversarial) policytaken by the human. Previous applications of HJI solutionsswitch to the optimal control (Equation (2)) when near theboundary of the BRS, i.e., when safety is nearly violated(Fisac et al. 2018; Bajcsy et al. 2019) This reflects the goalof HJ reachability-based safety which is to ensure that thesystem stays outside of the avoid set A(t), that is, the valuefunction should always stay positive.

In an interactive scenario where, for example, we maywant to let a robot planner convey intent by nudging towardsthe human car to the extent that is safe, we prefer a lessextreme control strategy. In the next section, we describein detail how to infuse reachability-based safety assurancewithin a multi-tiered control framework that consists ofdifferent planning objectives.

4 Control Stack ArchitectureIn this section, we propose using a safety-preservingHJI controller, rather than switching to the optimal HJIcontroller defined in Equation (2), and describe how toincorporate it within an existing control stack—a high-levelplanner feeding desired trajectories to a low-level trackingcontroller—to enable safe human-robot interactions thatminimally impinges on the high-level planning performanceobjective. The control stack architecture is illustrated inFigure 4. The proposed control stack is applicable togeneral human-robot interactions (e.g., an autonomouscar interacting with a human-driven car, an autonomousmanipulator arm working alongside a human, wheeledmobile robots navigating a crowded sidewalk). However,in this work, we focus on the traffic-weaving scenario,wherein two cars initially side-by-side must swap lanesin a limited amount of time and distance, because itis a representative interactive scenario that encapsulatesmany challenging characteristics inherent to human-robotinteraction. Successful and smooth traffic-weaves rely onaction anticipation, intent prediction, and proactive behaviorfrom each vehicle, and ensuring safety is critical becausecollisions may lead to life-threatening injury.

4.1 Interaction PlannerOur proposed control framework is agnostic to theinteraction planner used. The only assumption for theplanner is that it outputs desired trajectories (whichpresumably reflect high-performance goal-oriented nominalbehavior) for the robot car to track. In general, high-level planners often optimize objectives that weigh safetyconsiderations (e.g., distance between cars) relative to otherconcerns (e.g., control effort), and typical to human-robotinteractive scenarios, they may reason anticipatively withrespect to a probabilistic interaction dynamics model. Thatis, although the planner is encouraged to select safer plans,

safety is not enforced as a deterministic constraint at theplanning level.

In this work, we use the traffic-weaving interaction plannerfrom Schmerling et al. (2018). It uses a predictive model offuture human behavior to select desired trajectories for therobot car to follow, updated at ∼3Hz. We extend this workby using a hindsight optimization policy (Yoon et al. 2008)instead of the limited-lookahead action policy in order toencourage more information-seeking actions from the robot.

4.2 Tracking ControllerGiven a desired trajectory from the interaction planner, thetracking controller computes optimal controls to track thedesired trajectory. We assume that the outputs of the low-level tracking controller are directly usable by the robot’sactuators (e.g., steering and longitudinal force commands fora vehicle, torque commands for each joint on a manipulatorarm). As such, the tracking controller typically uses a moreaccurate dynamics model and operates at a much higherfrequency (∼ 100Hz) than the interaction planner, often atthe expense of environmental awareness. To ensure safetywith respect to a dynamic obstacle (i.e., a human-drivenvehicle) we incorporate additional constraints computedfrom HJ reachability analysis into the tracking optimizationproblem. These constraints are designed to ensure that ateach control step the robot car does not enter an unsafe set ofrelative states that may lead to collision.

In this work, we are concerned with vehicle trajectorytracking. We adapt the real-time model predictive control(MPC) tracking controller from Brown et al. (2017) bymodifying it to include an additional invariant set constraintdetailed in the next section. This MPC tracking controller,operating at 100Hz, computes optimal controls to track adesired trajectory by solving an optimization problem at eachiteration. The optimization problem is based on a singletrack vehicle model (also known as the bicycle model)and incorporates friction and stability control constraints (inaddition to control and state constraints) while minimizinga combination of tracking error and control derivatives. Amore in-depth treatment of this combined MPC and HJIcontroller is given in Section 6.

4.3 Safety-Preserving HJI ControlRather than switching to the optimal avoidance controllerdefined in Equation (2) when nearing safety violation, wepropose adding containment in the set of safety-preservingcontrols as an additional constraint to the low-level MPCtracking controller. The set of safety-preserving controls

UR

(xrel

) = {uR

: minuH

∇V (xrel

)T frel

(xrel, u

R, u

H) ≥ 0}

(3)represent the set of robot controls that ensure the valuefunction is nondecreasing. This constraint can be computedonline since the BRS is computed offline and the valuefunction V and its gradient ∇V are cached. Online weemploy a safety buffer ε > 0 so that when the conditionV (x

rel) ≤ ε holds, indicating that the robot is nearing

safety violation, we add the constraint uR ∈ UR(xrel

) tothe list of tracking controller constraints. By adding thisadditional safety-preserving constraint when near safety

Prepared using sagej.cls

Page 7: Journal Title On Infusing Reachability-Based Safety ...asl.stanford.edu/wp-content/papercite-data/pdf/Leung.Sch...human was to swerve (see right). The robot is in an unsafe state if

Karen Leung, Edward Schmerling, et al. 7

violation, the MPC tracking controller selects control actionsthat prevent the robot from further violating the safetythreshold while simultaneously optimizing for trackingperformance and obeying additional constraints. This resultsin a minimally interventional safety controller—the MPCtracking controller will only minimally deviate from thedesired trajectory to the extent necessary to maintain safetyfor the robot car.

For the traffic-weaving scenario investigated in this paper,the MPC problem for vehicle trajectory tracking (Brownet al. 2017) is formulated as a quadratic program (QP)(to enable fast solve time amenable to a 100Hz operatingfrequency) and hence requires the constraints to be linear. Assuch, we instead apply the constraint uR ∈ UR(x

rel) where

UR

(xrel

) = {uR

: MTHJIu

R+ b

HJI≥ 0} (4)

is a linearized approximation of UR(xrel

). Specifically, forthe current relative state x

rel, current robot control u

R, and

optimal, i.e., worst-case, human action defined analogouslyto Equation (2) u∗

H, the terms in the linearization are

MHJI

=∂

∂uR

(∇V T f

rel(x

rel, u

R, u

H))∣∣∣

(xrel,u∗

H,u

R)

bHJI = ∇V T frel

(xrel, u∗

H, uR)−MT

HJIuR .

In general, UR

(xrel

) may not be a half-space, leading to thelinear approximation UR(x

rel) including controls outside of

UR(xrel

). However, since we bound the change in the controlinputs across each time step, feasible controls remain closeto the linearization point where the approximation error issmall.

5 DynamicsIn this section, we detail the vehicle dynamics model used tomodel the robot and human car, the relative dynamics modelbetween the human and the robot necessary for computingthe BRS, and the tracking dynamics used for the MPCtracking controller. We use a six-state nonlinear single trackmodel to describe the robot car’s dynamics and assumethe human car obeys a four-state dynamically extendednonlinear unicycle model with longitudinal acceleration andyaw rate as control inputs. As a result, the relative dynamicsmodel has seven states. This represents a compromisebetween model fidelity and the number of state dimensionsin the relative dynamics; increasing the former reducesthe amount of model mismatch with the real systemwhile reducing the latter is essential since solving theHJI PDE suffers greatly from the curse of dimensionality.The computation becomes notoriously expensive past fiveor more state dimensions without compromising on griddiscretization or employing some decoupling strategy.

5.1 Robot Vehicle DynamicsThe robot car (denoted by subscript R) will be modeled usingthe single track vehicle model illustrated in Figure 5. Let(pxR

, pyR) be the position of the robot car’s center of massdefined in an inertial reference frame and ψ

Rbe the yaw

angle (heading) of the robot car relative to the horizontalaxis. UxR and UyR are the velocities in the robot car’s body

Figure 5. Schematic of the single track model (bicycle model)tracking a path. This is the dynamics model used to describethe robot vehicle.

frame, and rR is the yaw rate. The state for the singletrack model is xR =

[pxR

pyR ψR

UxRUyR r

R

]T.

The control input uR =[δ Fx

]Tconsists of the steering

command δ and longitudinal tire force Fx which isdistributed between the front and rear tires Fx = Fxf

+Fxr

via a fixed mapping. Assuming a quadratic modelof longitudinal drag force (Fxdrag

= −Cd0 − Cd1UxR−

Cd2U2xR

) and for vehicle mass (m) and moment of inertia(Izz), and the distances from the center of mass to the frontand rear axles (df , dr), the equations of motion for the robotcar are

xR =

UxR

cosψR− UyR sinψ

R

UxRsinψ

R+ UyR cosψ

R

r1m (Fxf

cos δ − Fyf sin δ + Fxr + Fxdrag) + rRUyR

1m (Fyf cos δ + Fyr + Fxf

sin δ)− rRUxR

1Izz

(dfFyf cos δ + d

fFxf

sin δ − drFyr)

(5)

The controls are assumed to be limited by the steeringsystem, friction limits, and power capacity of the vehicle.Using the brush coupled tire model by Pacejka (2002), thelateral tire force Fyf and Fyr at the front and rear tiresis a function of slip angle (αf , αr), tire cornering stiffness(Cαf

, Cαr), longitudinal tire forces (Fxf

, Fxr), coefficient of

friction (µ), and normal tire forces (Fxf, Fzr ). As such, the

lateral tire force for either the front or rear tires (denoted byi ∈ {f, r}) is

Fyi =

0 if Fxi > µFzi−Cαi tanαi

(1− γ + 1

3γ2)

if γ < 1

−Fymaxsign(tanαi) if γ ≥ 1

(6)where γ =

∣∣∣Cαi tanαi3Fymax

∣∣∣ and Fymax=√µ2F 2

zi − F 2xi . The

slip angles and normal forces (accounting for weight transferdue to Fxi ) for the front and rear tires can be computed usingthe following equations

αf = tan−1 UyR + dfrR

UxR

− δ, αr = tan−1 UyR − drrR

UxR

,

Fzf =mgd

r− hFxL

, Fzr =mgd

f+ hFxL

,

Prepared using sagej.cls

Page 8: Journal Title On Infusing Reachability-Based Safety ...asl.stanford.edu/wp-content/papercite-data/pdf/Leung.Sch...human was to swerve (see right). The robot is in an unsafe state if

8 Journal Title XX(X)

Figure 6. Schematic of the dynamically-extended unicyclemodel. This is the dynamics model used to describe the humanvehicle.

where h is the distance from the center of mass to theground, L = d

f+ dr , and Fx = Fxf

cos δ − Fyf sin δ + Fxr

is the total longitudinal force in the vehicle’s body frame.

5.2 Human Vehicle DynamicsThe human car (denoted by subscript H) will be modeledusing the dynamically extended unicycle model illustratedin Figure 6. Let (pxH , pyH) be the position of the center ofthe human car’s rear axle defined in an inertial referenceframe and ψ

Hbe the yaw angle (heading) of the human car

relative to the horizontal axis. The velocity of the human carin the vehicle frame is v

H. The state for the dynamically

extended unicycle model is xH

=[pxH

pyH ψH

vH

]T.

The control input uH

=[ω a

]Tconsists of the yaw rate

ω and longitudinal acceleration a. The control limits ofthe human car are chosen such that the robot and humancar share the same power, steering, and friction limits. Theequations of motion for the human car are

xH

=

vH cosψH

vH sinψH

ωa

. (7)

Due to its simpler dynamics representation, the humancar has a transient advantage in control authority over therobot car (it may change its path curvature discontinuously,while the robot may not), but by equating the steady-statecontrol limits we ensure that the infinite time horizon BRScomputation converges.

5.3 Relative DynamicsThe relative state (denoted by subscript rel) betweenthe robot car and human car is defined with respect toa coordinate system centered on and aligned with therobot car’s vehicle frame. We define the relative position(pxrel

, pyrel) as[pxrel

pyrel

]=

[cosψR sinψR

− sinψR cosψR

] [pxH − pxR

pyH − pyR

],

and the relative heading ψrel

as ψrel

= ψH− ψ

R.

Since the velocity states are defined with respectto the vehicle frame, we cannot define analogous rel-ative velocity states and must include the individualvelocity states of each vehicle. As such, the rela-tive state for the human-robot vehicle system is x

rel=[

pxrelpyrel ψ

relUxR

UyR vH

rR

]T. In the lan-

guage of HJ reachability, the disturbance input of the sys-tem is the human car’s control d = uH =

[ω a

]Tand the

control input is the robot car’s control u = uR

=[δ Fx

]T.

Combining Equations (5) and (7), the equations of motionfor the relative system are

xrel

=

vH

cosψrel− UxR

+ pyrelrRvH sinψ

rel− UyR − pxrel

rRω − rR

1m (Fxf

cos δ − Fyf sin δ + Fxr+ Fxdrag

) + rRUyR

1m (Fyf cos δ + Fyr + Fxf

sin δ)− rRUxR

a1Izz

(dfFyf cos δ + d

fFxf

sin δ − drFyr)

.

(8)

5.4 Tracking DynamicsThe tracking MPC controller relies on an error dynamicsmodel. Define a path (see Figure 5) through space wheres is the distance along the path and at any distance s,we know the path heading ψ(s), the curvature κ(s), and acoordinate system (s, e) tangential and normal to the path.Given the position and heading of the robot car, we canproject the car to the closest point on the path. Let thelateral error e be the lateral distance along direction e and∆ψ = ψR − ψpath be the robot car’s heading relative to thepath computed from this closest point. Using this projectionand the robot car’s dynamics from Equation (5), we cancompute the error dynamics relative to the desired path.Using the same notation defined previously in Equation (5),the state for the robot car tracking a desired path is x =[s UxR UyR rR ∆ψ e

]Tand the controls u = u

R=[

δ Fx]T

. The tracking dynamics are

˙x =

UxR cos ∆ψ − UyR sin ∆ψ

1m (Fxf

cos δ − Fyf sin δ + Fxr+ Fxd) + r

RUyR

1m (Fyf cos δ + Fyr + Fxf

sin δ)− rRUxR

1Izz

(dfFyf cos δ + d

fFxf

sin δ − drFyr)

r − (UxRcos ∆ψ − UyR sin ∆ψ)κ(s)

UxRsin ∆ψ + UyR cos ∆ψ

.(9)

6 The MPC+HJI Trajectory TrackingController

In this section, we detail the optimization problem used inthe MPC+HJI tracking controller, describe how this problemcan be adapted to accommodate collision avoidance forstatic obstacles, and provide numerical details about the BRScomputation used in this formulation. Central to an MPCcontroller is an optimization problem; at each time step, thecontroller solves an optimization problem to find an optimalcontrol sequence, passes the first control input to the actuator,and then repeats this process. To be amenable to real-timeapplications, the optimization problem requires a fast solvetime (∼ 0.01s). In this work, the MPC tracking problem isformulated as a convex optimization problem, namely a QP,enabling the use of efficient solvers (Mattingley and Boyd2012; Stellato et al. 2017) which are capable of solving theQP within the tight operating frequency.

Prepared using sagej.cls

Page 9: Journal Title On Infusing Reachability-Based Safety ...asl.stanford.edu/wp-content/papercite-data/pdf/Leung.Sch...human was to swerve (see right). The robot is in an unsafe state if

Karen Leung, Edward Schmerling, et al. 9

6.1 Optimization ProblemBoth the trajectory tracking objective and safety-preservingcontrol constraint rely on optimizing over the robot steeringand longitudinal force inputs simultaneously. Let qk =[∆sk UxR,k UyR,k r

R,k ∆ψk ek]T

be the state ofthe robot car with respect to a nominal trajectory at discretetime step k. ∆sk, ek and ∆ψk denote longitudinal, lateral,and heading error; UxR,k, UyR,k, and r

R,k are body-framelongitudinal and lateral velocity, and yaw rate respectively asdefined in Figure 5. Let uk =

[δk Fx,k

]Tbe the controls

at step k and let Akqk +B−k uk +B+k uk+1 + ck = qk+1

denote linearized first-order-hold dynamics of Equation (9).We adopt the varying time steps method (Nshort time stepsof size ∆tshort and Nlong time steps of size ∆tlong) andstable handling envelope constraint from Brown et al. (2017)(expressed asHk andGk in the problem formulation below).To ensure the existence of a feasible solution, we useslack variables σβ,k, σr,k, and σ

HJI,k on the stability andHJI constraints. The HJI reachability constraint M

HJIuk +

bHJI≥ −σ

HJIis activated only when V (x

rel) ≤ ε. Although

HJI theory suggests that applying this constraint on thenext action alone is sufficient, we apply it over the nextNHJI = 3 timesteps (30ms lookahead) to account for theapproximations inherent in our QP formulation. The MPCtracking problem is a quadratic program of the form

minimizeq,u,σ,σ

HJI,∆δ,∆Fx

T∑k=1

(∆sTkQ∆s∆sk + ∆ψTk Q∆ψ∆ψk+

eTkQeek +

(∆δk∆tk

)TR∆δ

(∆δk∆tk

)+(

∆Fx,k∆tk

)TR∆Fx

(∆Fx,k∆tk

)+

Wβσβ,k +Wrσr,k +WHJIσHJI,k

)∆tk

subject to q1 = qcurr, u1 = ucurr,

δk+1 − δk = ∆δk,

δmin∆tk ≤ ∆δk ≤ δmax∆tk,

δmin ≤ δk ≤ δmax,

Fx,k+1 − Fx,k = ∆Fx,k,

Fx,min ≤ Fx,k ≤ Fx,max,

Ux,min ≤ UxR,k ≤ Ux,max,

σβ,k ≥ 0, σr,k ≥ 0,

Akqk +B−k uk +B+k uk+1 + ck = qk+1,

Hk

[Uy,krk

]−Gk ≤

[σβ,kσr,k

],

for k = 1, ..., Nshort +Nlong

σHJI,j ≥ 0 (if V (xrel

) ≤ ε),M

HJIuj + b

HJI≥ −σ

HJI(if V (x

rel) ≤ ε),

for j = 1, ..., NHJI.

(10)

The objective strives to minimize a combination oftracking error (longitudinal, lateral and angular), controlrates (steering and longitudinal tire forces), and magnitudeof the slack variables. The constraints ensure (i) continuity

Figure 7. The BRS of the robot-wall system where zeroheading corresponds to the robot car being parallel to the wall.

with the current and next state and control, (ii) the change incontrols across each time step is bounded, (iii) the positivityof slack variables, (iv) the (linearized) dynamics are satisfied,(v) the vehicle stability constraints are satisfied, and (vi) theHJI safety-preserving half-plane constraint is satisfied whenV (x

rel) ≤ ε.

The time-discretization and linearizations (dynamics andconstraints) we apply amount to an approximate variantof sequential quadratic programming (SQP). In particular,however, we solve one QP at each MPC step rather thanthe usual iteration until convergence. Since the trackingproblems are so similar from one MPC step to the next, wefind that this approach yields sufficient performance for ourpurposes. We interpolate along each solution trajectory tocompute the linearization nodes for the QP at the next MPCstep.

We use the ForwardDiff.jl automatic differentiation(AD) package implemented in the Julia programminglanguage (Revels et al. 2016) to linearize the trajectorytracking dynamics as well as the HJI relative dynamicsfor the safety-preserving constraint. We call the OperatorSplitting Quadratic Program (OSQP) solver (Stellato et al.2017) through the Parametron.jl modeling framework(Koolen and contributors 2018); this combination of softwareenables us to solve the following MPC optimization problemat 100Hz. The MPC code, including optimization parametersand vehicle parameters (also included in Appendix Aand B), can be found here: https://github.com/StanfordASL/Pigeon.jl.

6.2 Adding Static ObstaclesThe current formulation does not prevent the robot fromdriving very far from its nominal trajectory (e.g., completelyoff the road) in order to avoid the human. In more realisticroad settings, there could be environmental constraints suchas a concrete road boundary. In the same way that collisionavoidance with a human-controlled vehicle is formulated asan additional constraint to the robot’s MPC problem, we canadd another safety constraint to account for a static wall to

Prepared using sagej.cls

Page 10: Journal Title On Infusing Reachability-Based Safety ...asl.stanford.edu/wp-content/papercite-data/pdf/Leung.Sch...human was to swerve (see right). The robot is in an unsafe state if

10 Journal Title XX(X)

prevent the robot from swerving completely off the road.We consider two approaches for deriving this constraint: thefirst based on an additional HJI computation and giving riseto a similar control constraint applied over the first N

HJI

timesteps, and the second treating the wall as a static obstaclewith associated state constraints that apply over the wholeduration of the MPC trajectory optimization.

For the first approach, we can compute a value functionVWALL(xR) describing the interaction between the robot andthe wall by using Equation (5) without the pxR state (weonly care about the distance from the wall and not howfar along the wall the robot is). Two slices of the robot-wall BRS projected on the pyR − ψR

plane are illustratedin Figure 7. Intuitively, if the robot is moving towards thewall, the faster it is moving, the further away it needsto be away from the wall in order for there to exist anoptimal collision avoidance control. Note that since the wallis static, there is no disturbance input. The robot-wall HJIcontrol constraint M

WALLu

R+ b

WALL≥ 0 becomes active

when VWALL(xR) ≤ ε. This means that it is possible for bothHJI-safety constraints to be active simultaneously. We notethat theoretically we could consider the robot, human, andwall simultaneously by computing the BRS for the jointsystem. However, naively increasing the state size withoutany decomposition would be computationally undesirableeven offline. Thus we treat the human-robot and robot-wallsystems separately, but will discuss later the impact of thisdesign choice.

Alternatively, we can account for a static wall by addinglateral error bound constraints into the MPC trackingproblem—this is the approach taken in Brown et al. (2017).This involves always adding a left and right lateral errorconstraint (emin,k ≤ ek ≤ emax,k) on each node point alongthe MPC trajectory such that the lateral deviation from thedesired trajectory does not exceed the lateral distance to thewall. Specifically, we add the following constraints to the QPin Equation (10):

ek − ek,min ≥ −σe, ek,max − ek ≥ −σe for k = 1, ..., N

as well as a slack penalty, Weσe to the cost function.This approach ensures no collision with the wall overthe MPC time horizon only (in contrast to HJI whichis over an infinite time horizon) and in general provideshigher tracking performance because the MPC controller canoptimize tracking states and controls over the entire trackingtrajectory. We investigate both these approaches and providemore discussion in Section 8.

6.3 BRS ComputationWe use the BEACLS toolkit (Tanabe and Chen 2018)implemented in C++ to compute the BRS. Since HJreachability suffers from the curse of dimensionality, thisis, to the best of our knowledge, the first attempt to use HJreachability to compute the BRS for a seven-state relativesystem, especially with such high modeling fidelity. We,however, do sacrifice on grid size and use a relatively coarsegrid compared to other literature standards. We linearlyinterpolate between grid points in order to evaluate thevalue function and its gradient at any given state (human-robot relative state or robot-wall state). We use a grid

Figure 8. X1: a steer-by-wire experimental vehicle platform.The 1/10-scale RC car has a mast that is visible to the LiDARs.

size of 13× 13× 9× 9× 9× 9× 9 for our 7D systemuniformly spaced over (pxrel

, pyrel , ψrel, UxR , UyR , vH , rR) ∈

[−15, 15]× [−5, 5]× [−π/2, π/2]× [1, 12]× [−2, 2]×[1, 12]× [−1, 1]; computing the BRS with this system anddiscretization takes approximately 70 hours on a 3.0GHzoctocore AMD Ryzen 1700 CPU.

Computing the BRS requires computing the optimalcontrol and disturbance defined in Equation (2). For therelative dynamics model, the optimal disturbance (i.e.,human actions) is a bang-bang solution since the disturbanceis affine. Due to the highly nonlinear nature of the dynamics,we use a uniform grid search across δ and Fx to calculateoptimal actions (i.e., robot actions).

For the robot-wall system, we compute the optimalcontrol actions in the same fashion. We use a gridsize of 21× 9× 9× 9× 9 uniformly spaced over(pyR , ψR , UxR , UyR , rR) ∈ [−3, 7]× [−π/2, π/2]×[1, 12]× [−2, 2]× [−1, 1]; computing the BRS withthis system and discretization takes approximately 40minutes.

7 Results

7.1 Experimental Vehicle PlatformX1 is a flexible steer-by-wire, drive-by-wire, and brake-by-wire experimental vehicle developed by the StanfordDynamic Design Lab (see Figure 8). It is equipped withthree LiDARs (one 32-beam and two 16-beam), a differentialGPS/INS which provides 100Hz pose estimates accurate towithin a few centimeters as well as high fidelity velocity,acceleration, and yaw rate estimates. To control X1, desiredsteering (δ) and longitudinal tire force (Fxf

, Fxr) commands

are sent to the dSpace MicroAutoBox (MAB) which handlesall sensor inputs except LiDAR (handled by the onboardPC) and implements all low level actuator controllers.Similarly, state information about the vehicle is obtainedfrom the MAB. We use the Robot Operating System (ROS)to communicate with the MAB which handles sensing andcontrol at the hardware level; the planning/control stackdescribed in this work is running onboard X1 on a consumerdesktop PC running Ubuntu 16.04 equipped with a quadcoreIntel Core i7-6700K CPU and an NVIDIA GeForce GTX1080 GPU. X1 parameters used in the equations of motion(Equation (5)) are listed in Appendix A.

We also perform experiments using a LiDAR-visible 1/10-scale RC car (Figure 8 right) as the human-driven car to

Prepared using sagej.cls

Page 11: Journal Title On Infusing Reachability-Based Safety ...asl.stanford.edu/wp-content/papercite-data/pdf/Leung.Sch...human was to swerve (see right). The robot is in an unsafe state if

Karen Leung, Edward Schmerling, et al. 11

(a) Controller comparison on experimental data where the robot car uses the MPC+HJI controller.

(b) Controller comparison on experimental data where the robot car uses the switching controller.

Figure 9. Controller comparison on planner trajectories from X1-virtual human-driven vehicle experiments. Left: Simulations ofusing the MPC, MPC+HJI, and switching controllers when V (xrel(t)) first drops below ε = 0.05 are shown, and are comparedagainst the executed trajectory (from experiments) and the desired trajectory (from the interaction planner). Right: Thecorresponding evolutions of V (xrel(t)). Bottom: Illustration of the traffic-weaving interaction. The human controlled car (blue car)carelessly drives into the path of the robot car (red car), causing the robot car to react by swerving. The transparency of therectangles corresponds to the speed of the car (higher transparency corresponds to higher speed).

investigate the robustness of our proposed control stack withperception uncertainty.

The code used to run the experiments can be foundhere: https://github.com/StanfordASL/safe_traffic_weaving.

7.2 ExperimentsTo evaluate our proposed control stack—a synthesisof a high-level probabilistic interaction planner withthe MPC+HJI tracking controller—we perform full-scalehuman-in-the-loop traffic-weaving trials with X1 taking onthe role of the robot car. To ramp up towards testing withtwo full-scale vehicles in the near future, we investigate twotypes of human car: (i) a virtual human-driven car and (ii)a 1/10-scale LiDAR-visible human-driven RC car. We scale

the highway traffic-weaving scenario (mean speed ∼28m/s)in Schmerling et al. (2018) down to a mean speed of ∼8m/sby shortening the track (reducing longitudinal velocity bya constant) and scaling time by a factor of 4/3 (with theeffect of scaling speeds by 3/4 and accelerations by 9/16).The value of the parameters in the MPC tracking problemare listed in Appendix B.

We investigate and evaluate the effectiveness of ourcontrol stack by allowing the human car to act carelessly(i.e., swerving blindly towards the robot car) during theexperiments. In Sections 7.2.1 and 7.2.2 we compare ourproposed controller (MPC+HJI) against a tracking-onlyMPC controller (MPC) and a controller that switches to theHJI optimal avoidance controller when near safety violation(switching). In Section 7.2.3 we compare the two static

Prepared using sagej.cls

Page 12: Journal Title On Infusing Reachability-Based Safety ...asl.stanford.edu/wp-content/papercite-data/pdf/Leung.Sch...human was to swerve (see right). The robot is in an unsafe state if

12 Journal Title XX(X)

(a) Trade-off between total safety and average efficiency.

(b) Trade-off between worst-case safety and worst-case efficiency.

Figure 10. Safety and efficiency trade-offs for the differentcontrol strategies.

obstacle avoidance approaches developed in Section 6.2to a scenario with a virtual wall constraining the robot’strajectory. We investigate applying an extension of the samestate-constraint-based static obstacle avoidance strategy tothe problem of collision with a dynamic obstacle (the human)in Section 7.2.4 and discuss its shortcomings. Finally, inSection 7.2.5 we present experiments with the human carphysically embodied by an RC car.

7.2.1 Virtual Human-Driven Vehicle To ensure a com-pletely safe experimental environment, our first tier of exper-imentation uses a joystick-controlled virtual vehicle for thehuman car and allows the robot control stack to have perfectobservation of the human car state. Experimental trials ofthe probabilistic planning framework using (i) our proposedapproach (MPC+HJI) and (ii) switching to the optimal HJIcontroller (switching) are shown in Figure 9, along witha simulated comparison between the two safety controllersand the tracking-only controller (MPC). For comparativepurposes, the controllers were simulated with the displayednominal trajectory held fixed, but in reality, the nominaltrajectory in these experiments was updated at ∼ 3Hz. Asexpected, we see that when safety violation occurs, theMPC+HJI controller represents a middle ground betweenthe tracking-only MPC which does not react to the humancar’s intrusion, and the switching controller which arguablyoverreacts with a large excursion outside the lane boundaries.Evidently, our proposed controller tries to be minimallyinterventional—the robot car swerves/brakes but only to anextent that is necessary.

Looking at the value function, we see that, as expected,the MPC+HJI controller aims to keep the value positive, butdoes not necessarily strive to increase it, while the switchingcontroller aims to increase the value as much as possible.The MPC (tracking only) controller fails to increase the valueat all when safety violation occurs. All controllers however,experience a period where the value is negative, even the twoHJI-based controllers which theoretically guarantee safety.We believe this is due to model mismatch; we will discussthis point in more depth in Section 8.

Additionally, in some trials when the robot car wastraveling faster, the activation of the HJI constraint resultedin the robot car performing a large but smooth swerve thattraversed completely outside the lane boundaries. We addressthis limitation by adding a wall constraint into the MPCformulation discussed in Section 7.2.3.

7.2.2 Safety and Efficiency Trade-off Our MPC+HJIcontroller considers the set of safety-preserving controlswhile optimizing its tracking performance when it is nearsafety violation. In contrast, the HJI switching controller usesthe optimal avoidance control policy and as a result neglectsto track the desired trajectory that was selected by the plannerfor interaction performance. As such, there is a trade offbetween safety, defined with respect to the value function,and efficiency.

For use as a comparison metric, we define the notion oftotal safety of an interaction of time length T as

Stotal :=

∫ T

0

1[V (xrel

(t)) ≤ 0]V (xrel

(t)) dt

which is the integral of the value function when it is negative.Total safety considers not only the magnitude of the safetyviolation, but also the duration of the violation. We can alsodefine the worst-case safety as

Sworst := mint∈[0,T ]

V (xrel

(t))

which does not consider the duration of safety violation,but rather the worst-case safety violation with respect to thevalue function over the interaction.

Efficiency of the interaction is more difficult to quantify. Inthis analysis, we define efficiency with respect to the g-forcesexperienced by the vehicle as this is a proxy for controleffort and passenger comfort. Alternative metrics includetime taken to complete the interaction, and friction availablein the tires. We define average efficiency as

Eavg := 1− 1

T

∫ T

0

1

g

√ax(t)2 + ay(t)2 dt

where ax(t) and ay(t) are the x and y acceleration of therobot car; larger Eavgvalues correspond to better efficiency.g-forces should not exceed one as this is beyond the physicallimits of a vehicle. We also define the worst-case efficiencyas

Eworst := maxt∈[0,T ]

(1− 1

g

√ax(t)2 + ay(t)2

)which considers the largest g-force experienced during theinteraction.

Given these quantities, we can compare the safety andefficiency trade-offs between the MPC, MPC+HJI, and

Prepared using sagej.cls

Page 13: Journal Title On Infusing Reachability-Based Safety ...asl.stanford.edu/wp-content/papercite-data/pdf/Leung.Sch...human was to swerve (see right). The robot is in an unsafe state if

Karen Leung, Edward Schmerling, et al. 13

switching controllers. Multiple experimental trials usingX1 were carried out using each controller, and Figure10a compares the average/total metrics while Figure 10bcompares the worst-case metrics. We see that in both cases,the MPC+HJI controller provides a good balance betweensafety and efficiency; the safety almost equaling that of theoptimal switching controller, and efficiency almost equalingthat of the MPC controller. The switching controller providesthe highest level of safety (with respect to the value function)but experiences lower efficiency since it often results inheavy braking and sharp swerving. The MPC controllerprovides lower safety scores but with larger variations asthe resulting safety score is scenario dependent rather thancontroller dependent.

(a) Comparison of steering angle. The plot begins when the robotcar is planning autonomously.

(b) Comparison of longitudinal force. The plot begins when therobot car is planning autonomously.

Figure 11. Comparison of control sequences from using MPClateral error state constraints, robot-wall HJI control constraint,and no constraints to avoid a static road boundary.

7.2.3 Static wall We investigate two methods, (i) persis-tently adding lateral error state constraints into the MPCproblem and (ii) adding an additional HJI-based controlconstraint for a robot-wall relative system into the MPCproblem when V

WALL(x

R) ≤ ε, for avoiding a static (virtual)

road boundary wall when the robot car is swerving off theroad to avoid a collision with the human car.

Figure 11 compares the control sequences and Figure12 compares the trajectory of both approaches. Here, thehuman car is commanded to stay in the lane and is initializedinside the BRS so that when the robot starts planningautonomously, the robot car will react immediately andswerve out of the way—we investigate the nature of theswerving. We do note that the human car is able to make therobot car swerve even though it is staying in its lane. Thisis due to the wide, pear-shape BRS when the two cars areparallel and at similar speeds (see Figure 1b left). We discussthis in more detail in Section 8.

Around 20 meters from the start of the roadway in Figure12, the robot car begins to plan autonomously. Before then,the cars are following a straight line in order to speed upfrom ≈ 1m/s to the desired interaction speed. To providethe cleanest comparison, all results in this subsection arederived from simulation. Starting in the right lane, we seein all cases that when safety is violated the robot carswerves to the right (δ < 0). When using the lateral errorconstraints (Figure 12, top), the robot car essentially hasa “look-ahead” capability because it is able to optimizeits steering commands over the MPC tracking horizon,essentially distributing the responsibility of avoiding the wallacross the entire tracking MPC horizon. As a result, the robotcar is able to successfully and smoothly steer back onto theroad and avoid the wall.

When using the robot-wall HJI control constraint (Figure12, center), the robot car does not preemptively swerveback onto the road and instead only reacts to the wallwhen VWALL(xR) ≤ ε as designed. The robot car performsa hard brake to the point of almost stopping§ and thenbegins to eventually command a maximum steering angle(18 deg). The sharp and abrupt behavior stems from the HJIformulation assuming the robot can and will take extremeactions, including cases when using the safety-preservingcontrol set. As a result, the responsibility for evasive action istriggered at the very latest possible instance and compressedinto a control constraint over a single (in practice 3) MPCtimestep. This is in contrast to the other approach of alwayshaving MPC lateral error state constraints over the entireMPC horizon. The MPC+HJI controller is effective inavoiding dynamic obstacles, but for static obstacles, HJI isnot suitable because it is unnecessary to reason about thedynamics of something that is static. With no wall constraints(Figure 12, bottom), the robot car swerves completely off theroad, and in order to quickly get to the left lane before theend of the road (inscribed as an objective in the high-levelplanner), it overshoots to the other side of the road and alsodrives beyond the road boundary on the other side.

7.2.4 Comparison to baseline dynamic obstacle avoid-ance Motivated by the success of applying state constraintsin the previous section, we compare our proposed frame-work (MPC+HJI for safe human-robot interaction) with abaseline approach of avoiding collision with the humancar using lateral error constraints. In order to define safetystate constraints over the duration of the robot trajectory,we must however make some assumption on the human’sfuture trajectory. For this baseline, at each MPC iteration,we assume that the human car will continue moving at itscurrent heading and with its current velocity.¶ The lateralerror constraint at each MPC time step is computed betweenthe robot’s desired trajectory and the human’s projectedtrajectory—this is analogous to applying the approach taken

§Since the dynamic bicycle model is ill-defined for low speeds (thusexplaining the oscillations in Fx), the experiment (when using the robot-wall HJI control constraint) is essentially over around t = 9. In general, theMPC problem can switch to the kinematic model at low speeds (Pattersonet al. 2018) which is well defined in that speed region.¶Alternatively, one could use a sample or a maximum a posteriori estimatefrom the interaction planner’s prediction model.

Prepared using sagej.cls

Page 14: Journal Title On Infusing Reachability-Based Safety ...asl.stanford.edu/wp-content/papercite-data/pdf/Leung.Sch...human was to swerve (see right). The robot is in an unsafe state if

14 Journal Title XX(X)

Figure 12. Trajectory comparison between using lateral error constraints (top), a robot-wall HJI control constraint (middle), andusing no wall constraints (bottom). The human car is in the left (top) lane and is constant across all three trials, and the robot car isin the right (bottom) lane. The transparency of the car corresponds to the speed of the car (higher transparency corresponds tohigher speed).

(a) Simulation trial of using MPC+HJI for human car collision avoidance(including lateral error constraints for wall collision avoidance).

(b) Simulation trial of using lateral error constraints for both human carand wall collision avoidance.

Figure 13. Comparison between the MPC+HJI controller (left) and a baseline approach not based on any reachability analysis(right) for the task of dynamic obstacle avoidance. The trajectory of the human car starting in the left (top) lane is visualized in gray;the trajectory of the robot car starting in the right (bottom) lane is visualized in color. In each close-up view, error barsaccompanying each snapshot of the robot illustrate the first (k = 1) lateral state constraint of the MPC problem at that step. To helpcorrespond the positions of the cars over the trials, each pair of matching-shape markers represents a common time instant.

in Brown et al. (2017) but with (deterministic) dynamicobstacles. A simulation of our proposed approach andthe baseline approach is shown in Figure 13. Notably, inthis simulation, the human car defies the baseline’s linear-extrapolation-based prediction model and curves into theother lane towards the robot car.

We see in Figure 13a that when using the MPC+HJIcontroller (with lateral error state constraints for wallavoidance), we are able to successfully avoid a collision withthe human car and the wall, and maintain relative statesoutside the BRS. The baseline approach demonstrated in

Figure 13b, on the other hand, collides with the humanbecause its consideration of only one possible future (i.e.,human trajectory) at each MPC step leads the robot into aregion of inevitable collision (defined against all, includingworst-case, human controls). In particular we see that as thehuman trajectory increases its curvature towards the middleof the lane change, the lateral error bounds (based on anassumption of constant velocity) are not stringent enough tomaintain robot safety. The utility of the MPC+HJI approachis that it distills safety considerations with respect to allpossible realizations of the human trajectory into a single

Prepared using sagej.cls

Page 15: Journal Title On Infusing Reachability-Based Safety ...asl.stanford.edu/wp-content/papercite-data/pdf/Leung.Sch...human was to swerve (see right). The robot is in an unsafe state if

Karen Leung, Edward Schmerling, et al. 15

control constraint. Moreover, this constraint is not overlyconservative (to improve the state-constraint-based approachone might imagine defining lateral error constraints to avoidthe entire forward reachable set of the human), because itincorporates the concept of closed-loop feedback into itsBRS computation.

7.2.5 1/10-Scale Human-Driven Vehicle To begin inves-tigating the effects of perception uncertainty on our safetyassurance framework, we use three LiDARs onboard X1to track a human-driven RC car, and implement a Kalmanfilter for human car state estimation (position, velocity, andacceleration). Even with imperfect observations, we showsome successful preliminary results (an example is shownin Figure 14) at mean speeds of 4m/s, close to the limits ofthe RC car + LiDAR-visible mast in crosswinds at the testtrack. We observe similar behavior as in the virtual humancar experiments, including the fact that the value functiondips briefly below zero before the MPC+HJI controller isable to arrest its fall; we discuss this behavior in the nextsection.

8 DiscussionBeyond the qualitative and quantitative confirmation of ourdesign goals, our experimental results reveal three maininsights.

Takeaway 1: The reachability cache is underly-conservativewith respect to robot car dynamics and overly-conservativewith respect to human car dynamics.

In all cases—hardware experiments as well as simulationresults—the HJI value function V dips below zero, indicatingthat neither the HJI+MPC nor even the optimal avoidanceswitching controller are capable of guaranteeing safety inthe strictest sense. The root of this apparent paradox isin the computation of the reachability cache used by bothcontrollers as the basis of their safety assurance. Thoughthe 7-state relative dynamics model (Equation (8)) subsumesa single-track vehicle model that has proven successfulin predicting the evolution of highly dynamic vehiclemaneuvers (Funke et al. 2017; Brown et al. 2017), the wayit is employed in computing the value function V omitsrelevant components of the dynamics. In particular, whencomputing the optimal avoidance control (Equation (2)) aspart of solving the HJI PDE, we assume total freedom overthe choice of robot steering angle δ and longitudinal forcecommand Fx, up to maximum control limits. This does notaccount for, e.g., limits on the steering slew rate (traversing[δmin, δmax] takes approximately 2 seconds), and thus thevalue function is computed under the assumption that therobot can brake/swerve far faster than it actually can.

We note that simply tuning the safety buffer ε isinsufficient to account for these unmodeled dynamics. InFigure 9a we see that V may drop from approximately 0.5(the value when the two cars traveling at 8m/s start side-by-side in lanes) to -0.3 in the span of a few tenths of asecond. Selecting ε > 0.5 might give enough time for thesteering to catch up, but such a selection would preventthe robot car from accomplishing the traffic weaving taskeven under nominal conditions, i.e., when the human caris equally concerned about collision avoidance. Even for

ε = 0.05, we see that the human car can cause the robotto swerve by just staying in its lane but with a small offsettowards the robot’s lane (see Figure 12). This is becausethe safety controller would push the robot car outside of itslane from the outset to maintain the buffer. This behaviorfollows from wide level sets associated with the transientcontrol authority asymmetry (recall that in the HJI relativedynamics the human car may adjust its trajectory curvaturediscontinuously), assumed as a conservative safety measureas well as a way to keep the relative state dimensionmanageable.

The simplest remedy for both of these issues is toincrease the fidelity of the relative dynamics model byincorporating additional integrator states δ for the robot andω for the human. Naively increasing the state dimensionto 8 or 9, however, might not be computationally feasible(even offline) without devising more efficient HJI solutiontechniques or choosing an extremely coarse discretizationover the additional states. By literature standards we alreadyuse a relatively coarse discretization grid for solving theHJI PDE; associated numerical inaccuracies may be anothersource of the observed safety mismatch and alternate gridchoices are a possible subject of future investigation. Webelieve that simulation, accounting for slew rates, could bea good tool to prototype such efforts, noting that as it standswe have relatively good agreement between simulation andthe experimental platform in our testing.

Takeaway 2: Interpretability of the value function V shouldbe a key consideration in future work.

In this work the terminal value function V (0, xrel

) isspecified as the separation/penetration distance between thebounding boxes of the two vehicles, a purely geometricquantity dependent only on pxrel

, pyrel , and ψrel

. Recallingthat V (:= V∞) represents the worst-case eventual outcomeof a differential game assuming optimal actions from bothrobot and human, we may interpret the above results throughthe lens of worst-case outcomes, i.e., a value of -0.3 maybe thought of as 30cm of collision penetration assumingoptimal collision seeking/avoidance from human/robot.When extending this work to cases with environmentalobstacles (e.g., concrete highway boundaries that precludelarge deviations from the lane), or multi-agent settings wherethe robot must account for the uncertainty in multiple otherparties’ actions, for many common scenarios it may be thecase that guaranteeing absolute safety is impossible. Insteadof avoiding a BRS of states that might lead to collision, weshould instead treat the value function inside the BRS asa cost. In particular we should specify more contextuallyrelevant values of V (0, x

rel) for states in collision, e.g.,

negative kinetic energy or another notion of collision severityas a function of the velocity states UxR

, UyR , vH

, and rR

inaddition to the relative pose. This would lead to a controllerthat prefers, in the worst case, collisions at lower speed, orperhaps “glancing blows” where the velocities of the two carsare similar in magnitude and direction.

Takeaway 3: Static obstacles should be accounted forusing MPC tracking state constraints and not HJI controlconstraints. It is natural to use HJI when reasoning aboutpotential dangers from a dynamic, unpredictable agent,but for static obstacles there is no uncertainty in how

Prepared using sagej.cls

Page 16: Journal Title On Infusing Reachability-Based Safety ...asl.stanford.edu/wp-content/papercite-data/pdf/Leung.Sch...human was to swerve (see right). The robot is in an unsafe state if

16 Journal Title XX(X)

Figure 14. Time-lapse of pairwise vehicle interaction: X1 with 1/10-scale human-driven RC car. The RC car (green trajectory)nudges into X1 (red trajectory), which swerves gently to avoid. Inset: The value function over time of the X1-human driven RC carexperiment

the obstacle may act. That is, when the other agent isdeterministic, there is no notion of optimizing robot safetypolicies over all worst-case actions by the other agent.Indeed, in this case it is possible to confidently extendthe safety constraint along the entire MPC horizon, asdemonstrated in Section 7.2.3. This is as opposed to derivinga safety constraint to be applied at the first step (on thepresent control action; the HJI+MPC approach), as must bedone with a nondeterministic agent with unknown futuretrajectory. Spreading the collision avoidance constraint overthe problem horizon allows for the MPC optimizationto more smoothly accomplish its objectives (i.e., controlsmoothness and tracking performance) while maintainingsafety. We note, however, that naive extensions of thisapproach to nondeterministic agents (recall Section 7.2.4) donot similarly provide strong safety assurances.

9 ConclusionsWe have investigated a control scheme for providingreal-time safety assurance to underpin the guidance ofa probabilistic planner for human-robot vehicle-vehicleinteractions. By essentially projecting the planner’s desiredtrajectory into the set of safety-preserving controls wheneversafety is threatened, we preserve more of the planner’s intentthan would be achieved by adopting the optimal control withrespect to separation distance. Our experiments show thatwith our proposed minimally interventional safety controller,we accomplish the high level objective (traffic weaving)despite the human car swerving directly onto the path of therobot car, and accomplish this relatively smoothly comparedto using a switching controller that results in the robot carswerving more violently off the road. Further, we investigatethe addition of a road boundary wall to our formulation toprevent the robot car from swerving completely out of thelane which could be dangerous in realistic road settings, andcompare against a baseline approach using state constraintsto avoid dynamic obstacles to show that our framework isbetter designed to keep the robot car safe even in the face ofworst-case human car behaviors.

We note that this work represents only a promising firststep towards the integration of reachability-based safetyguarantees into a probabilistic planning framework. Futurework includes investigating this framework for cases where

the human and robot have very different dynamics, suchas a pedestrian or cyclist interacting with a car, or ahuman interacting with a robotic manipulator, and forinteractions involving multiple (more than two) agents. Wemay also consider adapting our approach to ensure highplanning performance while guaranteeing satisfaction ofconstraints other than safety, e.g., task requirements specifiedby temporal logic constraints as considered in Chen et al.(2018). In the context of human-robot vehicle interactions,we have already discussed the concrete modifications tothis controller we believe are necessary to improve thepractical impact of our theoretical guarantees; further studyshould also consider better fitting of the planning objectiveat the controller level. That is, instead of performing a naiveprojection, i.e., the one that minimizes trajectory trackingerror, it is likely that a more nuanced selection informedby the planner’s prediction model would represent a better“backup choice” in the case that safety is threatened. Werecognize that ultimately, guaranteeing absolute safety on acrowded roadway may not be realistic, but we believe thatin such situations value functions derived from reachabilitymay provide a useful metric for near-instantly evaluating thefuture implications of a present action choice.

Acknowledgements

This work was supported by the Office of Naval Research(Grant N00014-17-1-2433), by Qualcomm, and by theToyota Research Institute (“TRI”). This article solely reflectsthe opinions and conclusions of its authors and not ONR,Qualcomm, TRI, or any other Toyota entity. The authorswould like to thank the X1 team, in particular Matt Brown,Larry Cathey, and Amine Elhafsi, Matteo Zallio, and theThunderhill Raceway Park for accommodating testing.

References

Althoff M and Dolan J (2014) Online verification of automatedroad vehicles using reachability analysis. IEEE Transactionson Robotics 30(4): 903–918.

Althoff M and Dolan JM (2011) Set-based computation of vehiclebehaviors for the online verification of autonomous vehicles.In: Proc. IEEE Int. Conf. on Intelligent Transportation Systems.

Prepared using sagej.cls

Page 17: Journal Title On Infusing Reachability-Based Safety ...asl.stanford.edu/wp-content/papercite-data/pdf/Leung.Sch...human was to swerve (see right). The robot is in an unsafe state if

Karen Leung, Edward Schmerling, et al. 17

Althoff M and Krogh BH (2014) Reachability analysis ofnonlinear differential-algebraic systems. IEEE Transactions onAutomatic Control 59(2): 371–383.

Arora S, Choudhury S, Althoff D and Scherer S (2015) Emergencymaneuver library – ensuring safe navigation in partiallyknown environments. In: Proc. IEEE Conf. on Robotics andAutomation.

Bajcsy A, Bansal S, Bronstein E, Tolani V and Tomlin CJ (2019)An efficient reachability-based framework for provably safeautonomous navigation in unknown environments. In: Proc.IEEE Conf. on Decision and Control.

Bokanowski O, Forcadel N and Zidani H (2010) Reachability andminimal times for state constrained nonlinear problems withoutany controllability assumption. SIAM Journal on Control andOptimization 48(7): 4292–4316.

Brown M, Funke J, Erlien S and Gerdes JC (2017) Safe drivingenvelopes for path tracking in autonomous vehicles. ControlEngineering Practice 61: 307–316.

Chen M, Hu Q, Fisac J, Akametalu K, Mackin C and TomlinC (2017) Reachability-based safety and goal satisfaction ofunmanned aerial platoons on air highways. AIAA Journal ofGuidance, Control, and Dynamics 40(6): 1360 – 1373.

Chen M, Tam Q, Livingston SC and Pavone M (2018)Signal temporal logic meets Hamilton-Jacobi reachability:Connections and applications. In: Workshop on AlgorithmicFoundations of Robotics. In Press.

Chen M and Tomlin CJ (2018) Hamilton–Jacobi reachability: Somerecent theoretical advances and applications in unmannedairspace management. Annual Review of Control, Robotics,and Autonomous Systems 1(1): 333–358. DOI:10.1146/annurev-control-060117-104941.

Dhinakaran A, Chen M, Chou G, Shih JC and Tomlin CJ (2017)A hybrid framework for multi-vehicle collision avoidance. In:Proc. IEEE Conf. on Decision and Control.

Fisac JF, Akametalu AK, Zeilinger MN, Kaynama S, Gillula J andTomlin CJ (2018) A general safety framework for learning-based control in uncertain robotic systems. IEEE Transactionson Automatic Control 64(7): 2737–2752.

Fisac JF, Chen M, Tomlin CJ and Sastry SS (2015) Reach-avoidproblems with time-varying dynamics, targets and constraints.In: Hybrid Systems: Computation and Control.

Frehse G, Le Guernic C, Donze A, Cotton S, Ray R, Lebeltel O,Ripado R, Girard A, Dang T and Maler O (2011) Spaceex:Scalable verification of hybrid systems. In: Proc. Int. Conf.Computer Aided Verification.

Funke J, Brown M, Erlien SM and Gerdes JC (2017) Collisionavoidance and stabilization for autonomous vehicles inemergency scenarios. IEEE Transactions on Control SystemsTechnology 25(4): 1204 – 1216.

Gattami A, Al Alam A, Johansson KH and Tomlin CJ (2011)Establishing safety for heavy duty vehicle platooning: A gametheoretical approach. IFAC World Congress 44(1): 3818–3823.

Gray A, Gao Y, Hedrick JK and Borrelli F (2013) Robust predictivecontrol for semi-autonomous vehicles with an uncertain drivermodel. In: IEEE Intelligent Vehicles Symposium.

Greenstreet MR and Mitchell I (1998) Integrating projections. In:Hybrid Systems: Computation and Control.

Koolen T and contributors (2018) Parametron.jl: Efficiently solvinginstances of a parameterized family of optimization problemsin Julia. Available at https://github.com/tkoolen/

Parametron.jl.Kurzhanski A and Varaiya P (2000) Ellipsoidal techniques for

reachability analysis: Internal approximation. System andControl Letters 41(3): 201–211.

Liu SB, Roehm H, Heinzemann C, Ltkebohle I, Oehlerking J andAlthoff M (2017) Provably safe motion of mobile robots inhuman environments. In: IEEE/RSJ Int. Conf. on IntelligentRobots & Systems.

Lorenzetti J, Chen M, Landry B and Pavone M (2018) Reach-avoidgames via mixed-integer second-order cone programming. In:Proc. IEEE Conf. on Decision and Control.

Majumdar A, Vasudevan R, Tobenkin MM and Tedrake R (2014)Convex optimization of nonlinear feedback controllers viaoccupation measures. Int. Journal of Robotics Research 33(9):1209–1230.

Margellos K and Lygeros J (2011) Hamilton-Jacobi formulationfor reach-avoid differential games. IEEE Transactions onAutomatic Control 56(8): 1849–1861. DOI:10.1109/TAC.2011.2105730.

Mattingley J and Boyd S (2012) CVXGEN: A code generator forembedded convex optimization. Optimization and Engineering12(1): 1–27.

Mitchell IM, Bayen AM and Tomlin CJ (2005) A time-dependentHamilton-Jacobi formulation of reachable sets for continuousdynamic games. IEEE Transactions on Automatic Control50(7): 947–957.

Pacejka H (2002) Tire and Vehicle Dynamics. Elsevier.Patterson V, Thornton S and Gerdes JC (2018) Tire modeling to

enable model predictive control of automated vehicles fromstandstill to the limits of handling. In: Int. Symp. on AdvancedVehicle Control.

Revels J, Lubin M and Papamarkou T (2016) Forward-modeautomatic differentiation in julia. Available at https://arxiv.org/abs/1607.07892.

Sadigh D, Sastry S, Seshia SA and Dragan AD (2016) Planning forautonomous cars that leverage effects on human actions. In:Robotics: Science and Systems.

Schmerling E, Leung K, Vollprecht W and Pavone M (2018)Multimodal probabilistic model-based planning for human-robot interaction. In: Proc. IEEE Conf. on Robotics andAutomation.

Stellato B, Banjac G, Goulart P, Bemporad A and Boyd S (2017)OSQP: An operator splitting solver for quadratic programsAvailable at https://arxiv.org/abs/1711.08013.

Tanabe K and Chen M (2018) BEACLS: Berkeley Efficient API inC++ for Level Set methods. Available at https://github.com/HJReachability/beacls.

Wolf MT and Burdick JW (2008) Artificial potential functions forhighway driving with collision avoidance. In: Proc. IEEE Conf.on Robotics and Automation.

Yoon S, Fern A, Givan R and Kambhampati S (2008) Probabilisticplanning via determinization in hindsight. In: Proc. AAAI Conf.on Artificial Intelligence.

Prepared using sagej.cls

Page 18: Journal Title On Infusing Reachability-Based Safety ...asl.stanford.edu/wp-content/papercite-data/pdf/Leung.Sch...human was to swerve (see right). The robot is in an unsafe state if

18 Journal Title XX(X)

A Vehicle Parameters

Table 1. X1 parameters relevant to defining the equations ofmotion in Equation (5).

Variable Description Valueg Standard Earth gravity 9.80665ms−2

m Total mass 1964kgIzz Yaw moment of inertia 2900 kgm2

h Vertical height of CG 0.47md

fDistance from CG to front axle 1.4978m

dr

Distance from CG to rear axle 1.3722mCαf Front cornering stiffness 150kN/radCαr Rear cornering stiffness 220kN/radCd0 Drag coefficient constant 241NCd1 Drag coefficient linear 25.1Nm−1sCd2 Drag coefficient quadratic 0.0Nm−2s2

FxfFront wheel drive fraction (Fx ≥ 0) 0

FxrRear wheel drive fraction (Fx ≥ 0) 1

FxfFront wheel brake fraction (Fx < 0) 0.6

Fxr Rear wheel brake fraction (Fx < 0) 0.4

B Trajectory Parameters

Table 2. Parameter values relevant for the MPC trackingoptimization in Equation (10).

Variable Description ValueQ∆s Quadratic cost on longitudinal error 1.0m−2s−1

Q∆ψ Quadratic cost on heading error 1.0s−1

Qe Quadratic cost on lateral error 1.0m−2s−1

R∆δ Quadratic cost on change in steeringangle

0.1s

R∆Fx Quadratic cost on change in longitu-dinal tire force

0.5N−2s

Wβ Linear cost on sideslip stability slackvariable

900π s−1

Wr Linear cost on yaw rate stability slackvariable

50.0

WHJI Linear cost on HJI slack variable 500.0m−1

We Linear cost on lateral bound slackvariable

500.0m−1s−1

δmin Minimum steering angle -18× π180

δmax Maximum steering angle 18× π180

δmin Minimum steering rate -0.344 s−1

δmax Maximum steering rate 0.344 s−1

Fx,min Minimum longitudinal tire force -16794NFx,max Maximum longitudinal tire force min(5600,

75000Ux

)NUx,min Minimum longitudinal velocity 1.0ms−1

Ux,max Maximum longitudinal velocity 15ms−1

Nlong Number of long MPC time steps 10Nshort Number of short MPC time steps 5N

HJINumber of time steps with HJIconstraint applied

3

∆tlong Length of long MPC time step 0.2s∆tshort Length of short MPC time step 0.01s

ε Size of HJI value function buffer 0.05m

Prepared using sagej.cls