baidu apollo em motion planner · 2018-07-24 · baidu apollo em motion planner haoyang fan1 ;y,...

Baidu Apollo EM Motion Planner

Haoyang Fan1,†, Fan Zhu2,†, Changchun Liu, Liangliang Zhang, Li Zhuang,Dong Li, Weicheng Zhu, Jiangtao Hu, Hongye Li, Qi Kong3,∗

Abstract— In this manuscript, we introduce a real-time motion planning system based on the BaiduApollo (open source) autonomous driving platform.The developed system aims to address the industriallevel-4 motion planning problem while consideringsafety, comfort and scalability. The system coversmultilane and single-lane autonomous driving in ahierarchical manner: (1) The top layer of the systemis a multilane strategy that handles lane-change sce-narios by comparing lane-level trajectories computedin parallel. (2) Inside the lane-level trajectory gener-ator, it iteratively solves path and speed optimizationbased on a Frenet frame. (3) For path and speedoptimization, a combination of dynamic programmingand spline-based quadratic programming is proposedto construct a scalable and easy-to-tune frameworkto handle traffic rules, obstacle decisions and smooth-ness simultaneously. The planner is scalable to bothhighway and lower-speed city driving scenarios. Wealso demonstrate the algorithm through scenarioillustrations and on-road test results.

The system described in this manuscript has beendeployed to dozens of Baidu Apollo autonomousdriving vehicles since Apollo v1.5 was announced inSeptember 2017. As of May 16th, 2018, the systemhas been tested under 3,380 hours and approximately68,000 kilometers (42,253 miles) of closed-loop au-tonomous driving under various urban scenarios.

The algorithm described in this manuscriptis available at https://github.com/ApolloAuto/apollo/tree/master/modules/planning.

I. INTRODUCTION

Autonomous driving research began in the 1980sand has significantly grown over the past ten years.Autonomous driving aims to reduce road fatalities,

† Authors who contributed equally in this manuscript∗ Corresponding author for this manuscript# www.apollo.auto1 Haoyang Fan is with Baidu USA LLC,

[email protected] Fan Zhu is with Baidu USA LLC, [email protected] Qi Kong is with Baidu USA LLC, [email protected].

increase traffic efficiency and provide convenienttravel. However, autonomous driving is a chal-lenging task that requires accurately sensing theenvironment, a deep understanding of vehicle inten-tions and safe driving under different scenarios. Toaddress these difficulties, we constructed an Apolloopen source autonomous driving platform. Theflexible modularized architecture of the developedplatform supports fully autonomous driving deploy-ment https://github.com/ApolloAuto/apollo.

In the figure, the HD map module provides ahigh-definition map that can be accessed by everyon-line module. Perception and localization mod-ules provide the necessary dynamic environmentinformation, which can be further used to predictfuture environment status in the prediction module.The motion planning module considers all infor-mation to generate a safe and smooth trajectory tofeed into the vehicle control module.

In motion planner, safety is always the top pri-ority. We consider autonomous driving safety in,but not limited to, the following aspects: trafficregulations, range coverage, cycle time efficiencyand emergency safety. All these aspects are critical.Traffic regulations are designed by governments forpublic transportation safety, and such regulationsalso apply to autonomous driving vehicles. Anautonomous driving vehicle should follow trafficregulations at all times. For range coverage, we aimto provide a trajectory with at least an eight secondor two hundred meter motion planning trajectory.The reason is to leave enough room to maintain safedriving within regular autonomous driving vehicledynamics. The execution time of the motion plan-ning algorithm is also important. In the case of anemergency, the system could react within 100 ms,compared with a 300 ms reaction time for a normal

arX

iv:1

807.

0804

8v1

[cs

.RO

] 2

0 Ju

l 201

8

https://github.com/ApolloAuto/apollo/tree/master/modules/planning



https://github.com/ApolloAuto/apollo

https://github.com/ApolloAuto/apollo

human driver. A safety emergency module is thelast shell for protecting the safety of riders. Fora level-4 motion planner, once upstream modulescan no longer function normally, the safety modulewithin the motion planner shall respond with animmediate emergency behavior and send warningsthat interrupt humans. Moreover, the autonomousdriving system shall have the ability to react toemergencies in a lower-level-like control module.Further safety design is beyond this manuscript’sscope.

Fig. 1 shows the architecture of the Apollo onlinemodules.

HD Map

Perception

Localization

Routing

Prediction

MotionPlanning

VehicleControl

Apollo Realtime Module Pipeline

Fig. 1: On board modules of the Apollo open sourceautonomous driving platform

In addition to safety, passengers’ ride experienceis also important. The measurement of ride experi-ence includes, but is not limited to, scenario cov-erage, traffic regulation and comfort. For scenariocoverage, the motion planner should not only beable to handle simple driving scenarios (e.g., stop,nudge, yield and overtake) but also handle multi-

lane driving, heavy traffic and other complicatedon-road driving scenarios. Planning within trafficregulations is also important for ride experience. Inaddition to being a safety requirement, followingthe traffic regulations will also minimize the risk ofaccidents and reduce emergency reactions for au-tonomous driving. The comfort during autonomousdriving is also important. In motion planning, com-fort is generally measured by the smoothness of theprovided autonomous driving trajectory.

This manuscript presents the Apollo EM planner,which is based on an EM-type iterative algorithm([1]). This planner targets safety and ride experi-ence with a multilane, path-speed iterative, trafficrule and decision combined design. The intuitionsfor this planner are introduced as follows:

A. Multilane Strategy

For level-4 on-road autonomous driving motionplanning, a lane-change strategy is necessary. Onecommon approach is to develop a search algorithmwith a cost functional on all possible lanes and thenselect a final trajectory from all the candidates withthe lowest cost [2], [3]. This approach has somedifficulties. First, the search space is expandedacross multiple lanes, which causes the algorithmto be computationally expensive. Second, trafficregulations (e.g., right of the road, traffic lights)are different across lanes, and it is not easy toapply traffic regulations under the same frame.Furthermore, trajectory stability that avoids suddenchanges between cycles should be taken into con-sideration. It is important to follow consistent on-road driving behavior to inform other drivers of theintention of the autonomous driving vehicle.

Typically, a multilane strategy should cover bothnonpassive and passive lane-change scenarios. InEM planner, a nonpassive lane-change is a requesttriggered by the routing module for the purposeof reaching the final destination. A passive lanechange is defined as an ego car maneuver whenthe default lane is blocked by the dynamic en-vironment. In both passive and nonpassive lanechanges, we aim to deliver a safe and smooth lane-change strategy with a high success rate. Thus,we propose a parallel framework to handle both

passive and nonpassive lane changes. For candidatelanes, all obstacles and environment information areprojected on lane-based Frenet frames. Then, thetraffic regulations are bound with the given lane-level strategy. Under this framework, each candi-date lane will generate a best-possible trajectorybased on the lane-level optimizer. Finally, a cross-lane trajectory decider will determine which lane tochoose based on both the cost functional and safetyrules.

B. Path-Speed Iterative Algorithm

In lane-level motion planning, optimality andtime consumption are both important. Thus, manyautonomous driving motion planning algorithmsare developed in Frenet frames with time (SLT)to reduce the planning dimension with the helpof a reference line. Finding the optimal trajectoryin a Frenet frame is essentially a 3D constrainedoptimization problem. There are typically two typesof approaches: direct 3D optimization methods andthe path-speed decoupled method. Direct methods(e.g., [4] and [5]) attempt to find the optimal tra-jectory within SLT using either trajectory samplingor lattice search. These approaches are limited bytheir search complexity, which increases as boththe spatial and temporal search resolutions increase.To qualify the time consumption requirement, onehas to compromise with increasing the search gridsize or sampling resolution. Thus, the generatedtrajectory is suboptimal. Conversely, the path-speeddecoupled approach optimizes path and speed sep-arately. Path optimization typically considers staticobstacles. Then, the speed profile is created basedon the generated path [6]. It is possible that thepath-speed approach is not optimal with the ap-pearance of dynamic obstacles. However, sincethe path and speed are decoupled, this approachachieves more flexibility in both path and speedoptimization.

EM planner optimizes path and speed iteratively.The speed profile from the last cycle is used toestimate interactions with oncoming and low-speeddynamic obstacles in the path optimizer. Then, thegenerated path is sent to the speed optimizer toevaluate an optimal speed profile. For high-speed

dynamic obstacles, EM planner prefers a lane-change maneuver rather than nudging for safetyreasons. Thus, the iterative strategy of EM plannercan help to address dynamic obstacles under thepath-speed decoupled framework.

C. Decisions and Traffic Regulations

In EM planner, decisions and traffic regulationsare two different constraints. Traffic regulations area non-negotiable hard constraint, whereas obstacleyield, overtake, and nudge decisions are negotiablebased on different scenarios. For the decision-making module, some planners directly apply nu-merical optimization [7], [5] to make decisionsand plans simultaneously. In Apollo EM planner,we make decisions prior to providing a smoothtrajectory. The decision process is designed to makeon-road intentions clear and reduce the search spacefor finding the optimal trajectory. Many decision-included planners attempt to generate vehicle statesas the ego car decision. These approaches can befurther divided into hand-tuning decisions ([8], [9],[3]) and model-based decisions ([10], [11]). Theadvantage of hand-tuning decision is its tunability.However, scalability is its limitation. In some cases,scenarios can go beyond the hand-tuning decisionrule’s description. Conversely, the model-based de-cision approaches generally discretize the ego carstatus into finite driving statuses and use data-drivenmethods to tune the model. In particular, somepapers, such as [12] and [13], propose a unifiedframework to handle decisions and obstacle pre-diction simultaneously. The consideration of multi-agent interactions will benefit both prediction anddecision-making processes.

Targeting level-4 autonomous driving, a decisionmodule shall include both scalability and feasibil-ity. Scalability is the scenario expression ability(i.e., the autonomous driving cases that can beexplained). When considering dozens of obstacles,the decision behavior is difficult to be accuratelydescribed by a finite set of ego car states. Forfeasibility, we mean that the generated decisionshall include a feasible region in which the egocar can maneuver within dynamic limitations. How-ever, both hand-tuning and model-based decisions

do not generate a collision-free trajectory to verifythe feasibility.

In EM planner’s decision step, we describe thebehavior differently. First, the ego car movingintention is described by a rough and feasibletrajectory. Then, the interactions between obstaclesare measured with this trajectory. This feasibletrajectory-based decision is scalable even whenscenarios become more complicated. Second, theplanner will also generate a convex feasible spacefor smoothing spline parameters based on the tra-jectory. A quadratic-programming-based smoothingspline solver could be used to generate smootherpath and speed profiles that follow the decision.This guarantees a feasible and smooth solution.

The remainder of this paper is organized asfollows. In section II, we introduce the multilaneframework inside EM planner. In section III, wefocus on lane-level optimization and discuss EMiteration step by step. In section IV, we providean example with oncoming traffic for a simpledemonstration. Section V discuss the performanceof EM planner. In section VI, we finalize thediscussion and present conclusions for EM planner.

II. EM PLANNER FRAMEWORK WITHMULTILANE STRATEGY

In this section, we first introduce the architectureof Apollo EM planner, and then we focus onthe structure of the lane-level optimizer. Fig. 2presents an overview of EM planner. On top ofthe planner, all sources of information are collectedand synced at the data center module. After datacollection, the reference line generator will producesome candidate lane-level reference lines alongwith information about traffic regulations and ob-stacles. This process is based on the high-definitionmap and navigation information from the routingmodule. During lane-level motion planning, wefirst construct a Frenet frame based on a specifiedreference line. The relation between the ego carand its surrounding environment is evaluated inthe Frenet frame constructed by the reference line,as well as traffic regulations. Furthermore, restruc-tured information passes to the lane-level optimizer.The lane-level optimizer module performs path

Reference Line Generator

Reference Line Frenet Frame

Optimizer

SL Projection (E-step)

Path Planning (M-step)

ST Projection (E-step)

Speed Planning (M-step)

Lane 1

Reference Line Frenet Frame

Optimizer

SL Projection (E-step)

Path Planning (M-step)

ST Projection (E-step)

Speed Planning (M-step)

Lane 2

Reference Line Trajectory Decider

Feasible Car Trajectory

EM Planner

Data Center

Fig. 2: EM Framework

optimization and speed optimization. During pathoptimization, information about the surroundings isprojected on the Frenet frame (E-step).Based on theinformation projected in the Frenet frame, a smoothpath is generated (M-step). Similarly, during speedoptimization, once a smooth path is generated bythe path optimizer, obstacles are projected on thestation-time graph (E-step). Then, the speed op-timizer will generate a smooth speed profile (M-step). Combining path and speed profiles, we willobtain a smooth trajectory for the specified lane. Inthe last step, all lane-level best trajectories are sentto the reference line trajectory decider. Based onthe current car status, regulations and the cost ofeach trajectory, the trajectory decider will decide abest trajectory for the ego car maneuver.

III. EM PLANNER AT LANE LEVEL

In this section, we discuss the lane-level op-timization problem. Fig. 3 shows the path-speedEM iteration inside lane-level planning. The it-eration includes two E-steps and two M-steps inone planning cycle. The trajectory information williterate between planning cycles. We explain thesubmodules as follows.

E-step SL Projection

DP Path Decision

QP Path Planning

M-step

Path Profile

Past Cycle Trajectory

DP Speed Decision

QP Speed Planning

M-step

E-step ST Projection

Trajectory Vehicle Control

Fig. 3: EM Iteration

In the first E-step, obstacles are projected on thelane Frenet frame. This projection includes bothstatic obstacle projection and dynamic obstacle pro-jection. Static obstacles will be projected directlybased on a Cartesian-Frenet frame transformation.In the Apollo framework, the intentions of dynamicobstacles are described with an obstacle movingtrajectory. Considering the previous cycle planningtrajectory, we can evaluate the estimated dynamicobstacle and ego car positions at each time point.Then, the overlap of dynamic obstacles and the egocar at each time point will be mapped in the Frenetframe. In addition, the appearance of dynamicobstacles during path optimization will eventuallylead to nudging. Thus, for safety considerations,the SL projection of dynamic obstacles will onlyconsider low-speed traffic and oncoming obstacles.For high-speed traffic, EM planner’s parallel lane-change strategy will cover the scenario. In thesecond E-step, all obstacles, including high-speed,low-speed and oncoming obstacles, are evaluatedon the station-time frame based on the generatedpath profile. If the obstacle trajectory has overlapwith the planned path, then a corresponding regionin the station-time frame will be generated.

In two M-steps, path and speed profiles are gen-erated by a combination of dynamic programmingand quadratic programming. Although we projectedobstacles on SL and ST frames, the optimal pathand speed solution still lies in a non-convex space.Thus, we use dynamic programming to first obtaina rough solution; meanwhile, this solution canprovide obstacle decisions such as nudge, yield and

overtake. We use the rough decision to determinea convex hull for the quadratic-programming-basedspline optimizer. Then, the optimizer can find so-lutions within the convex hull. We will cover themodules in the following.

A. SL and ST Mapping (E-step)

The SL projection is based on a G2 (contin-uous curvature derivative) smooth reference lineas in [3]. In Cartesian space, obstacles and theego car status are described with location andheading (x, y, θ), as well as curvature and thederivative of curvature (κ, dκ) for the ego car. Then,these are mapped to the Frenet frame coordinates(s, l, dl, ddl, dddl), which represent station, lateral,and lateral derivatives. Since the positions of staticobstacles are time invariant, the mapping is straight-forward. For dynamic obstacles, we mapped theobstacles with the help of the last cycle trajectoryof the ego car. The last cycle’s moving trajectoryis projected on the Frenet frame to extract thestation direction speed profile. This will provide anestimate of the ego car’s station coordinates givena specific time. The estimated ego car station coor-dinates will help to evaluate the dynamic obstacleinteractions. Once an ego car’s station coordinateshave interacted with an obstacle trajectory pointwith the same time, a shaded area on the SLmap will be marked as the estimated interactionwith the dynamic obstacle. Here, the interactionis defined as the ego car and obstacle boundingbox overlapping. For example, as shown in Fig. 4,an oncoming dynamic obstacle and correspondingtrajectory estimated from the prediction module aremarked in red. The ego car is marked in blue.The trajectory of the oncoming dynamic obstacle isfirst discretized into several trajectory points withtime, and then the points are projected to the Frenetframe. Once we find that the ego car’s stationcoordinates have an interaction with the projectedobstacle points, the overlap region (shown in purplein the figure) will be marked in the Frenet frame.

ST projection helps us evaluate the ego car’sspeed profile. After the path optimizer generatesa smooth path profile in the Frenet Frame, bothstatic obstacle and dynamic obstacle trajectories

s

l

t = 0 t = 0.5 t = 1 t = 1.5 t = 2 t = 2.5 t = 3 t = 3.5 t = 4 t = 4.5 t = 5 t = 5.5 t = 6 t = 6.5 t = 7 t = 7.5 t = 8

t = 0t = 0.5t = 1t = 1.5t = 2t = 2.5t = 3t = 3.5

s

l

Fig. 4: SL projection with oncoming traffic example

are projected on the path if there are any interac-tions.An interaction is also defined as the boundingboxes overlapping. In Fig. 5, one obstacle cut intothe ego driving path at 2 seconds and 40 metersahead, as marked in red, and one obstacle behindthe ego car is marked in green. The remainingregion is the speed profile feasible region. Thespeed optimization M-step will attempt to find afeasible smooth solution in this region.

0 2 4 6 8

020

4060

8010

0

time

s

cut−in obstacle

obstacle behind

Fig. 5: ST projection with cut-in obstacle andobstacle behind ego car

B. M-Step DP Path

The M-step path optimizer optimizes the pathprofile in the Frenet frame. This is represented asfinding an optimal function of lateral coordinatel = f(s) w.r.t. station coordinate in nonconvex SLspace (e.g., nudging from left and right might betwo local optima). Thus, the path optimizer includestwo steps: dynamic-programming-based path deci-sion and spline-based path planning. The dynamic

programming path step provides a rough path pro-file with feasible tunnels and obstacle nudge deci-sions. As shown in Fig. 6, the step includes a latticesampler, cost function and dynamic programmingsearch.

The lattice sampler is based on a Frenet frame.As shown in Fig. 7, multiple rows of points are firstsampled ahead of the ego vehicle. Points betweendifferent rows are smoothly connected by quinticpolynomial edges. The interval distance betweenrows of points depends on the speed, road structure,lane change and so forth. The framework allowscustomizing the sampling strategy based on appli-cation scenarios. For example, a lane change mightneed a longer sampling interval than current lanedriving. In addition, the lattice total station distancewill cover at least 8 seconds or 200 meters forsafety considerations.

After the lattice is constructed, each graph edgeis evaluated by the summation of cost functionals.We use information from the SL projection, trafficregulations and vehicle dynamics to construct thefunctional. The total edge cost functional is a linearcombination of smoothness, obstacle avoidance andlane cost functionals.

Ctotal(f(s)) = Csmooth(f)+Cobs(f)+Cguidance(f)

The smoothness functional for a given path ismeasured by:

Csmooth(f) = w1

∫(f ′(s))2ds+ w2

∫(f ′′(s))2ds

+ w3

∫(f ′′′(s))2ds.

In the smoothness cost functional, f ′(s) representsthe heading difference between the lane and egocar, f ′′(s) is related to the curvature of the path, andf ′′′(s) is related to the derivative of the curvature

of the ego car. With the form of polynomials, theabove cost can also be evaluated analytically.

The obstacle cost given an edge is evalu-ated at a sequence of fixed station coordinatess0, s1, ..., sn with all obstacles. The obstacle costfunctional is based on the bounding box distancebetween the obstacle and ego car. Denote the dis-tance as d. The form of individual cost is givenby:

Cobs(d) =

0, d > dn

Cnudge(d− dc), dc ≤ d ≤ dnCcollision d < dc

,

where Cnudge is defined as a monotonically de-creasing function. dc is set to leave a buffer forsafety considerations. The nudge range dn is ne-gotiable based on the scenario. Ccollision is thecollision cost, which has a large value that helpsto detect infeasible paths.

The lane cost includes two parts: guidance linecost and on-road cost. The guidance line is definedas an ideal driving path when there are no surround-ing obstacles. This line is generally extracted asthe centerline of the path. Define the guidance linefunction as g(s). Then, it is measured as

Cguidance(f) =

∫(f(s)− g(s))2ds

. The on-road cost is typically determined by theroad boundary. Path points that are outside the roadwill have a high penalty.

The final path cost is a combination of all thesesmooth, obstacle and lane costs. Then, edge costsare used to select a candidate path with the lowestcost through a dynamic programming search. Thecandidate path will also determine the obstacledecisions. For example, in Fig. 7, the obstacle ismarked as a nudge from the right side.

C. M-Step Spline QP Path

The spline QP path step is a refinement of thedynamic programming path step. In a dynamicprogramming path, a feasible tunnel is generatedbased on the selected path. Then, the spline-basedQP step will generate a smooth path within thisfeasible tunnel, as shown in Fig. 8.

Pointwise CostFunctional

Lattice SamplingStrategy

Path Lattice

Dynamic Programming Search

SL Map&Obs.Info.

Vehicle Dynamic

TrafficRegulations

DP Path FeasibleTunnel

NudgeDecisions

DP Result

Fig. 6: Dynamic programming structure

The spline QP path is generated by optimizingan objective function with a linearized constraintthrough the QP spline solver. Fig. 9 shows thepipeline of the QP path step.

The objective function of the QP path is a linearcombination of smoothness costs and guidance linecost. The guidance line in this step is the DPpath. The guidance line provides an estimate of theobstacle nudging distance. Mathematically, the QPpath step optimizes the following functional:

Cs(f) = w1

∫(f ′(s))2ds+ w2

∫(f ′′(s))2ds

+ w3

∫(f ′′′(s))2 + w4

∫(f(s)− g(s))2ds.

where g(s) is the DP path result. f ′(s), f ′′(s)and f ′′′(s) are related to the heading, curvatureand derivative of curvature. The objective functiondescribes the balance between nudging obstaclesand smoothness.

The constraints in the QP path include boundaryconstraints and dynamic feasibility. These con-straints are applied on f(s), f ′(s) and f ′′(s) at asequence of station coordinates s0, s1, ...., sn. Toextract boundary constraints, the feasible ranges atstation points are extracted. The feasible range ateach point is described as (llow,i,≤ lhigh,i). In EMplanner, the ego vehicle is considered under thebicycle model. Thus, simply providing a range forl = f(s) is not sufficient since the heading of theego car also matters.

As shown in Fig. 10, to keep the boundary con-

s

l

s

lright nudge

s

l

s

l

Fig. 7: Dynamic programming path optimizer sampling for default lane and change lane

s

l

s

lright nudge

left nudge

DP solutionQP solution

Fig. 8: Spline QP Path Example

SmoothFunctional DP Path

Cost Functional

FeasibleTunnel

Linearized Constraint

Spline QP Solver

VehicleDynamics

Smooth Path

Fig. 9: Spline QP Path Example

straint convex and linear, we add two half circleson the front and rear ends of the ego car. Denotethe front-to-rear wheel center distance as lf and thevehicle width as w. Then, the lateral position of theleft-front corner is given by

lleft front corner = f(s) + sin(θ)lf + w/2

, where θ is the heading difference between the egocar and road station direction. The constraint canbe further linearized using the following inequalityapproximation:

f(s) + sin(θ)lf + w/2 ≤ f(s) + f ′(s)lr + w/2

≤ lleft corner bound

Similarly, the linearization can be applied on theremaining three corners. The linearized constraints

are good enough since θ is generally small. Forθ < pi/12, the estimation will be less than 2 - 3cm conservative on the lateral direction comparedto the constraint without linearization.

s

l

s

lright nudge

left nudge

θ

w 2

Fig. 10: Spline QP Path Constraint Linearization

The range constraints of f ′′(s) and f ′′′(s) canalso be used as dynamic feasibility since they arerelated to curvature and the curvature derivative. Inaddition to the boundary constraint, the generatedpath shall match the ego car’s initial lateral positionand derivatives (f(s0), f ′(s0), f ′′(s0)). Since allconstraints are linear with respect to spline param-eters, a quadratic programming solver can be usedto solve the problem very fast.

The details of the smoothing spline and quadraticprogramming problem are covered in VI.

D. M-Step DP Speed Optimizer

The speed optimizer generates a speed profilein the ST graph, which is represented as a stationfunction with respect to time S(t). Similar to in thepath optimizer, finding a best speed profile on theST graph is a non-convex optimization problem. Weuse dynamic programming combined with splinequadratic programming to find a smooth speedprofile on the ST graph. In Fig. 12, the DP speedstep includes a cost functional, ST graph grids anddynamic programming search. The generated result

includes a piecewise linear speed profile, a feasibletunnel and obstacle speed decisions, as shown inFig. 11. The speed profile will be used in the splineQP speed step as a guidance line, and the feasibletunnel will be used to generate a convex region.

0 2 4 6 8

020

4060

8010

0

time

s

follow−yield

overtake

Fig. 11: DP Speed Optimizer

In detail, obstacle information is first discretizedinto grids on the ST graph. Denote (t0, t1, ..., tn)as equally spaced evaluated points on the time axiswith interval dt. A piecewise linear speed profilefunction is represented as S = (s0, s1, ..., sn) onthe grids. Furthermore, the derivatives are approx-imated by the finite difference method.

s′i = vi ≈ si−si−1

dt

s′′i = ai ≈ si−2si−1+si−2

(dt)2

s′′′i = ji ≈ si−3si−1−3si−2+si−3

(dt)3

The goal is to optimize a cost functional in the STgraph within the constraints. In detail, the cost forthe DP speed optimizer is represented as follows:

Ctotal(S) = w1

∫ tn

t0

g(S′ − Vref )dt

+ w2

∫ tn

t0

(S′′)2dt+ w3

∫ tn

t0

(S′′′)2dt

+ w4Cobs(S)

The first term is the velocity keeping cost. This termindicates that the vehicle shall follow the designatedspeed when there are no obstacles or traffic lightrestrictions present. Vref describes the referencespeed, which is determined by the road speedlimits, curvature and other traffic regulations. Theg function is designed to have different penaltiesfor values that are less or greater than Vref . Theacceleration and jerk square integral describes thesmoothness of the speed profile. The last term,Cobs, describes the total obstacle cost. The dis-tances of the ego car to all obstacles are evaluatedto determine the total obstacle costs.

The dynamic programming search space is alsowithin the vehicle dynamic constraints. The dy-namic constraints include acceleration, jerk limitsand a monotonicity constraint since we require thatthe generated trajectories do not perform backingmaneuvers when driving on the road. Backing canonly be performed under parking or other specifiedscenarios. The search algorithm is straightforward;some necessary pruning based on vehicle dynamicconstraints is also applied to accelerate the process.We will not discuss the search part in detail.

ST Map&Obs

TrafficRegulations

VehicleDynamics

CostFunctional

DiscretizedST Grid

Dynamic Programming Search

DPSpeed

SpeedDecisions

FeasibleTunnel

Fig. 12: DP Speed Optimizer

E. M-Step QP Speed Optimizer

Since the piecewise linear speed profile cannotsatisfy dynamic requirements, the spline QP stepis needed to fill this gap. In Fig. 13, the splineQP speed step includes three parts: cost functional,linearized constraint and spline QP solver.

The cost functional is described as follows:

Ctotal(S) = w1

∫ tn

t0

(S − Sref )2dt+ w2

∫ tn

t0

(S′′)2dt

+ w3

∫ tn

t0

(S′′′)2dt.

The first term measures the distance between theDP speed guidance profile Sref and generated pathS. The acceleration and jerk terms are measures ofthe speed profile smoothness. Thus, the objectivefunction is a balance between following the guid-ance line and smoothness.

SmoothFunctional DP Speed

Cost Functional

FeasibleTunnel

Linearized Constraint

Spline QP Solver

VehicleDynamics

SmoothSpeed

TrafficRegulations

Fig. 13: Spline QP speed optimizer

The spline optimization is within the linearizedconstraint. The constraints in QP speed optimiza-tion include the following boundary constraints:

S(ti) ≤ S(ti+1), i = 0, 1, 2, ..., n− 1,

Sl,ti ≤ S(ti) ≤ Su,ti ,

S′(ti) ≤ Vupper,

−Decmax ≤ S′′(ti) ≤ Accmax

−Jmax ≤ S′′′(ti) ≤ Jmax

as well as constraints for matching the initial veloc-ity and acceleration. The first constraint is mono-tonicity evaluated at designated points. The second,third, and fourth constraints are requirements fromtraffic regulations and vehicle dynamic constraints.After wrapping up the cost objective and con-straints, the spline solver will generate a smoothfeasible speed profile as in Fig. 14. Combined withthe path profile, EM planner will generate a smoothtrajectory for the control module.

0 2 4 6 8

020

4060

8010

0

time

s

follow−yield

overtake

feasible region

DP SpeedQP Speed

Fig. 14: Spline QP speed optimizer

F. Notes on Solving Quadratic Programming Prob-lems

For safety considerations, we evaluate the pathand speed at approximately one-hundred differentlocations or time points. The number of constraintsis greater than six hundred. For both the path andspeed optimizers, we find that piecewise quinticpolynomials are good enough. The spline generallycontains 3 to 5 polynomials with approximately30 parameters. Thus, the quadratic programmingproblem has a relatively small objective functionbut large number of constraints. Consequently,an active set QP solver is good for solving theproblem. In addition to accelerating the quadraticprogramming, we use the result calculated in thelast cycle as a hot start. The QP problem can besolved within 3 ms on average, which satisfies ourtime consumption requirement.

G. Notes on Non-convex Optimization With DP andQP

DP and QP alone both have their limitations inthe non-convex domain. A combination of DP andQP will take advantage of the two and reach an

ideal solution.• DP: As described earlier in this manuscript,

the DP algorithm depends on a sampling stepto generate candidate solutions. Because of therestriction of processing time, the number ofsampled candidates is limited by the samplinggrid. Optimization within a finite grid yieldsa rough DP solution. In other words, DP doesnot necessarily, and in almost all cases wouldnot, deliver the optimal solution. For example,DP could select a path that nudges the obstaclefrom the left but not nudge with the bestdistance.

• QP: Conversely, QP generates a solution basedon the convex domain. It is not availablewithout the help of the DP step. For example,if an obstacle is in front of the master vehicle,QP requires a decision, such as nudge from theleft, nudge from the right, follow, or overtake,to generate its constraint. A random or rule-based decision will make QP susceptible tofailure or fall into a local minimal.

• DP + QP: A DP plus QP algorithm willminimize the limitations of both: (1) The EMplanner first uses DP to search within a grid toreach a rough resolution. (2) The DP resultsare used to generate a convex domain andguide QP. (3) QP is used to search for theoptimal solution in the convex region that mostlikely contains global optima.

IV. CASE STUDY

As mentioned in the above sections, althoughmost state-of-the-art planning algorithms are basedon heavy decisions, EM planner is a light-decision-based planner. It is true that a heavy-decision-basedalgorithm, or heavily rule-based algorithm, is easilyunderstood and explained. The disadvantages arealso clear: it may be trapped in corner cases (whileits frequency is closely related to the complexityand magnitude of the number of rules) and notalways be optimal. In this section, we will illustratethe benefits of the light-decision-based planningalgorithm by presenting several case studies. Thesecases were exposed during the intense daily testroutines in Baidu’s heavy-decision planning mod-

ules and solved by the latest light-decision planningmodule.

Figure 15 is a hands-on example of how EMplanner iterates within and between planning cyclesto achieve the optimal trajectory. In this case study,we demonstrate how the trajectory is generatedwhile an obstacle enters our path. Assuming that themaster vehicle has a speed of 10 meters per secondand there is a dynamic obstacle that is movingtoward us in the opposite direction with a speed thatis also 10 meters per second, EM planner generatesthe path and speed profile iteratively with the stepsbelow.

1) Historical Planning (Figure 15a). In the his-torical planning profile, i.e., before the dy-namic obstacle enters, the master vehicleis moving forward straight with a constantspeed of 10 meters per second.

2) Path Profile Iteration 1 (Figure 15b). Duringthis step, the speed profile is cruising at 10m/s from the historical profile. Based on thiscursing speed, the master vehicle and thedynamic obstacle will meet each other atposition S = 40m. Consequently, the bestway to avoid this obstacle is to nudge it fromthe right side at S = 40m.

3) Speed Profile Iteration 1 (Figure 15c). Basedon the path profile, which is nudge from theright, from step 1, the master vehicle adjustsits speed according to its interaction with theobstacle. Thus, the master vehicle will slowto 5 m/s when passing an obstacle with aslower speed, as passengers may expect.

4) Path Profile Iteration 2 (Figure 15d). Underthe new speed profile, which is slower thanthe original one, the master vehicle no longerpasses the dynamic obstacle at S = 40m butrather a new position at S = 30m. Thus, thepath to nudge the obstacle should be updatedto a new one to maximize the nudge distanceat S = 30m.

5) Speed Profile Iteration 2 (Figure 15e). Underthe new path profile, where the nudge isperformed at S = 30m, the slow down atS = 40m is no longer necessary. The newspeed profile indicates that the master vehicle

s

s = 80

l

s = 40

s = 80

v = 10 m/s

(a) Stage A: Historical Planning

s

s = 80

l

s = 40

s = 80

v = 10 m/s

(b) Stage B: Path Planning Cycle 1

s

s = 80

l

s = 40

s = 80

v = 10 m/s

(c) Stage C: Speed Planning Cycle 1

s

s = 80

l

s = 40

s = 80

v = 10 m/s

(d) Stage D: Path Planning Cycle 2

s

s = 80

l

s = 40

s = 80

v = 10 m/s

(e) Stage E: Speed Planning Cycle 2

Fig. 15: Case Study - Nudge oncoming dynamic obstacleThis figure shows how the EM planner manages to iteratively solve the update path and speed profile.

can accelerate at S = 40m and still generatea smooth pass at S = 30m.

Thus, the final trajectory based on the four stepsis to slow to nudge the obstacle at S=30 m andthen accelerate after the master vehicle passes theobstacle, which is very likely how human driversperform under this scenario.

Note that it is not necessary to always takeexactly four steps to create the plan. It could takefewer or more steps depending on the scenario. Ingeneral, the more complicated the environment is,

the more steps that may be required.

V. COMPUTATIONAL PERFORMANCE

Because the three-dimensional station-lateral-speed problem has been split into two two-dimensional problems, i.e., station-lateral problemand station-speed problem, the computational com-plexity of EM planner has been significantly de-creased, and thus, this planner is very efficient.Assuming that we have n obstacles with M can-didate path profiles and N candidate speed profiles,

the computational complexity of this algorithm isO(n(M+N)).

On a PIC of Nuvo-6108GC-GTX1080-E3-1275,with DDR4-16GB-ECC and HDD1TB-72 [?], ittakes less than 100 ms on average.

VI. CONCLUSION

EM planner is a light-decision-based algorithm.Compared with other heavy-decision-based algo-rithms, the advantage of EM planner is its abilityto perform under complicated scenarios with multi-ple obstacles. When heavy-decision-based methodsattempt to predetermine how to act with eachobstacle, the difficulties are significant: (1) It isdifficult to understand and predict how obstaclesinteract with each other and the master vehicle;thus, their following movement is hard to describeand therefore hard to be considered by any rules.(2) With multiple obstacles blocking the road, theprobability of not finding a trajectory that meetsall predetermined decisions is dramatically reduced,leading to planning failure.

One critical issue in autonomous driving ve-hicles is the challenge of safety vs. passability.A strict rule increases the safety of the vehiclebut lowers the passability, and vice versa. Takethe lane-changing case as an example; one couldeasily pause the lane-changing process if there is avehicle behind with simple rules. This could grantsafety but considerably decreases the passability.EM planner described in this manuscript is alsodesigned to solve the inconsistency of potentialdecisions and planning, while it also improves thepassability of autonomous driving vehicles.

EM planner significantly reduces computationalcomplexity by transforming a three-dimensionalstation-lateral-speed problem into two two-dimensional station-lateral/station-speed problems.It could significantly reduce the processing timeand therefore increase the interaction ability of thewhole system.

As of May 16th, 2018, the effectivity of this sys-tem has been proven under 3,380 hours and approx-imately 68,000 kilometers (42,253 miles) of intenseclosed-loop testing in Baidu Apollo autonomousdriving vehicles. The algorithm has been evaluated

under different countries, traffic laws and condi-tions, including extremely crowded urban scenariossuch as Beijing, China, and Sunnyvale, CA, USA.The algorithm has also been evaluated and testedin more than one-hundred-thousand hours and amillion kilometers (0.621 million miles) simulationtest.

The algorithm described in this manuscriptis available at https://github.com/ApolloAuto/apollo-11/tree/master/modules/planning.

ACKNOWLEDGEMENT

We would like to thank Apollo Community fortheir useful suggestions and contributions.

https://github.com/ApolloAuto/apollo-11/tree/master/modules/planning



APPENDIX 1

Apollo is an open autonomous driving platform.It is a high-performance flexible architecture thatsupports fully autonomous driving capabilities. Forbusiness contact, please visit http://apollo.auto.

A list of contributors for Apollo includes,but is not limited to, https://github.com/ApolloAuto/apollo-11/graphs/contributors.

APPENDIX 2: CONSTRAINED SMOOTHING SPLINEAND QUADRATIC PROGRAMMING

The spline QP path and speed optimizer usesthe same constrained smoothing spline framework.In this section, we formalize the smoothing splineproblem within the quadratic programming frame-work and linearized constraints. Optimization witha quadratic convex objective and linearized con-straints can be rapidly and stably solved.

Under the quadratic programming setting, weoptimize a third-order smooth function f(x) givena quadratic objective functional and linear con-straints. A smoothing spline function f(x) on do-main (a, b), a = x0, b = xn is represented as alinear combination of piecewise polynomial bases.Denote x0 = a, x1, x2, ..., xn = b as the knots.Then, f(x) is given by the following:

f(x) =

f0(x− x0) x ∈ [x0, x1)

f1(x− x1) x ∈ [x1, x2)

...

fn−1(x− xn−1) x ∈ [xn−1, xn]

(1)

where fk(x) = pk0 + pk1x + ...pkmxm is a

polynomial function. The coefficient is defined aspk = (pk0, pk1, ..., pkm)T for fk. The smooth-ing spline parameter vector is defined as p =(pT

0 ,pT1 , ...,p

Tn−1)T . Mathematically, the spline

solver attempts to find a function f that optimizesthe linear combination of functionals on the repro-ducing kernel Hilbert space (RKHS) domain. Thedomain space is described as

Ω = f : [a, b]→ R|f, f (1), f (2), f (3)

is abs. conti. and∫ b

a

(f (m))2dx <∞,m = 0, 1, 2, 3

The spline QP problem is formed with an objec-tive functional, linearized constraints and quadraticprogramming solver. It is described as follows:

arg minf∈Ω

P(f) =

3∑i=0

wiPi(f) (2)

with L(f(x)) <= 0. The objective functional P(f)is a linear combination of four functionals, Pi, i =0, 1, 2, 3 on Ω. Specifically,

P0(f) =

∫ b

a

(f(x)− g(x))2dx,

P1(f) =

∫ b

a

(f (1)(x))2dx,

P2(f) =

∫ b

a

(f (2)(x))2dx,

P3(f) =

∫ b

a

(f (3)(x))2dx,

where g is a pre-specified guideline function in Ω.The first one represents the distance to the guidanceline g, whereas the remaining three describe thesmoothness of f . The constraint functional L(f(x))includes the following types based on the problemsetup:

1) Boundary constraint, i.e., L ≤ f (m)(x) ≤U,m = 0, 1, 2, 3 given evaluated location x.For example, m = 0, x = 2 represents thesmooth function value; then, f(2) shall bebounded between lower bound l and upperbound L. Specifically, boundary constraintwith heading has the form L ≤ f(x) +cf ′(x) ≤ U .

2) Smoothness constraint up to the m-th deriva-tive. Since f is piecewise polynomials, itrequires polynomials are joint at spline knotsxk, k = 1, 2, ..., n− 1 with up to m-th-orderderivative matching.

http://apollo.auto

http://apollo.auto

https://github.com/ApolloAuto/apollo-11/graphs/contributors



3) Monotonicity constraint. The constraint isspecifically designed for the speed smoother,which guarantees that the path is monotonicat a sequence of specified points.

Theorem 1: Functional P defined in 2 has aquadratic form, and functional L has a linear formwith respect to parameter p.

Proof: For simplicity, denote variable vec-tor x(i), i = 0, 1, 2, 3 as the i-th-order derivativecoefficient of f with respect to parameter p atx ∈ [xk, xk+1), k = 0, 1, ..., n− 1. That is,

x(0) =(1, x, x2, ..., xm)T ,

x(1) =(0, 1, 2x, ..., mxm−1)T ,

x(2) =(0, 0, 2, ...,m(m− 1)xm−2)T ,

x(3) =(0, 0, 0, ...,m(m− 1)(m− 2)xm−3)T ,

where x = x−xk, x ∈ [xk, xk+1), k = 0, 1, ..., n−1. In (1), each Pi, i = 1, 2, 3 can be represented asa summation of integrations:

Pi(f) =

n−1∑k=0

∫ xk+1

xk

(f (i))2dx

=

n−1∑k=0

∫ xk+1

xk

pTk x(i)x

T(i)pkdx

= pTX(i)p

where X(i) is defined as the block diagonalmatrix Diag(

∫ x1

x0x(i)x

T(i)dx, ...,

∫ xk+1

xkx(i)x

T(i)dx).

Similarly, for P0(f),

P0(f) =

n−1∑k=0

∫ xk+1

xk

(f(x)− g(x))2dx =

n−1∑k=0

∫ xk+1

xk

(pTk x(0)x

T(i)pk − 2pT

k x(0)g(x) + (g(x))2)dx

= pTX(0)p− 2pT c + const.

where X(0) is defined as the diagonal block matrixDiag(

∫ x1

x0x(0)x

T(0)dx, ...,

∫ xk+1

xkx(0)x

T(0)dx) and

c =∫ xk+1

xkx(0)g(x)dx.

Thus, the objective function P(f) has a quadraticconvex form with respect to spline parameter vectorp.

REFERENCES

[1] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximumlikelihood from incomplete data via the em algorithm,”J. Royal Stat. Soc. Ser. B (Methodol.), vol. 39, pp. 1–38,1977.

[2] Z. Ajanovic, B. Lacevic, B. Shyrokau, M. Stolz, andM. Horn, “Search-based optimal motion planning for au-tomated driving,” arXiv preprint arXiv:1803.04868, 2018.

[3] M. Werling, J. Ziegler, S. Kammel, and S. Thrun, “Optimaltrajectory generation for dynamic street scenarios in afrenet frame,” in 2010 IEEE International Conference onRobotics and Automation (ICRA). IEEE, 2010, pp. 987–993.

[4] M. McNaughton, C. Urmson, J. M. Dolan, and J.-W. Lee,“Motion planning for autonomous driving with a confor-mal spatiotemporal lattice,” in 2011 IEEE InternationalConference on Robotics and Automation (ICRA). IEEE,2011, pp. 4889–4895.

[5] J. Ziegler and C. Stiller, “Spatiotemporal state lattices forfast trajectory planning in dynamic on-road driving scenar-ios,” in IEEE/RSJ International Conference on IntelligentRobots and Systems, 2009 (IROS 2009). IEEE, 2009, pp.1879–1884.

[6] T. Gu, J. Atwood, C. Dong, J. M. Dolan, and J.-W. Lee,“Tunable and stable real-time trajectory planning for ur-ban autonomous driving,” in 2015 IEEE/RSJ InternationalConference on Intelligent Robots and Systems (IROS).IEEE, 2015, pp. 250–256.

[7] W. Xu, J. Wei, J. M. Dolan, H. Zhao, and H. Zha,“A real-time motion planner with trajectory optimizationfor autonomous vehicles,” in 2012 IEEE InternationalConference on Robotics and Automation (ICRA). IEEE,2012, pp. 2061–2067.

[8] M. Montemerlo, J. Becker, S. Bhat, H. Dahlkamp, D. Dol-gov, S. Ettinger, D. Haehnel, T. Hilden, G. Hoffmann,B. Huhnke et al., “Junior: The stanford entry in the urbanchallenge,” J. Field Robot., vol. 25, no. 9, pp. 569–597,2008.

[9] C. Urmson, J. Anhalt, D. Bagnell, C. Baker, R. Bittner,M. N. Clark, J. Dolan, D. Duggins, T. Galatali, C. Geyeret al., “Autonomous driving in urban environments: Bossand the urban challenge,” J. Field Robot., vol. 25, no. 8,pp. 425–466, 2008.

[10] H. Bai, D. Hsu, and W. S. Lee, “Integrated perception andplanning in the continuous space: A pomdp approach,” Int.J. Robot. Res., vol. 33, no. 9, pp. 1288–1302, 2014.

[11] S. Brechtel, T. Gindele, and R. Dillmann, “Probabilisticdecision-making under uncertainty for autonomous drivingusing continuous pomdps,” in 2014 IEEE 17th Interna-tional Conference on Intelligent Transportation Systems(ITSC). IEEE, 2014, pp. 392–399.

[12] A. G. Cunningham, E. Galceran, R. M. Eustice, andE. Olson, “Mpdm: Multipolicy decision-making in dy-namic, uncertain environments for autonomous driving,”in 2015 IEEE International Conference on Robotics andAutomation (ICRA). IEEE, 2015, pp. 1670–1677.

[13] E. Galceran, A. G. Cunningham, R. M. Eustice, and E. Ol-son, “Multipolicy decision-making for autonomous drivingvia changepoint-based behavior prediction.” in Robotics:Science and Systems, 2015, pp. 2290–2297.

baidu apollo em motion planner · 2018-07-24 · baidu apollo em motion planner haoyang fan1 ;y,...

Documents