Page 1:

Gradient Sampling for Improved Action Selection and Path Synthesis

Ian M. Mitchell

Department of Computer Science, The University of British Columbia

November 2016

[email protected]

http://www.cs.ubc.ca/~mitchell

Copyright 2016 by Ian M. Mitchell. This work is made available under the terms of the Creative Commons Attribution 4.0 International license

http://creativecommons.org/licenses/by/4.0/

Page 2:

Shortest Path via the Value Function

• Assume an isotropic holonomic vehicle with dynamics

$$\frac{d}{dt}x(t) = \dot{x}(t) = u(t), \qquad \|u(t)\| \le 1.$$

• Plan paths to a target set T that are optimal with respect to the cost metric

$$\psi(x_0) = \inf_{x(\cdot)} \int_{t_0}^{t_f} c(x(s))\, ds, \qquad t_f = \operatorname{argmin}\,\{\, s \mid x(s) \in \mathcal{T} \,\}.$$

• The value function ψ(x) satisfies the Eikonal equation (a fast marching sketch follows the figure below):

$$\|\nabla\psi(x)\| = c(x) \quad \text{for } x \in \Omega \setminus \mathcal{T}; \qquad \psi(x) = 0 \quad \text{for } x \in \mathcal{T}.$$

Figure: Top: goal location (blue circle) and obstacles (grey). Middle: contours of the value function. Bottom: gradients of the value function (subsampled grid).
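To make the setup concrete, here is a minimal sketch of computing ψ on a grid, assuming the scikit-fmm package (a fast marching solver for Eikonal equations); the grid, target, cost and obstacle choices below are illustrative, not those from the talk.

```python
# Minimal sketch: value function psi solving ||grad psi(x)|| = c(x) with
# psi = 0 on the target set T, via fast marching (assumes scikit-fmm).
# Grid, target, cost and obstacle choices are illustrative.
import numpy as np
import skfmm

n = 200
dx = 1.0 / n
x, y = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n), indexing="ij")

# Target set T: a small disc; phi < 0 inside T, phi > 0 outside.
phi = np.sqrt((x - 0.8) ** 2 + (y - 0.5) ** 2) - 0.05

# Cost c(x) >= 1, with a high-cost bump to steer paths away from a region.
c = 1.0 + 4.0 * np.exp(-((x - 0.4) ** 2 + (y - 0.5) ** 2) / 0.01)

# Obstacles (grey in the figure) are masked out of the computation entirely.
obstacle = (np.abs(x - 0.5) < 0.05) & (y < 0.7)
phi = np.ma.MaskedArray(phi, obstacle)

# travel_time solves ||grad psi|| * speed = 1, so pass speed = 1/c.
psi = skfmm.travel_time(phi, speed=1.0 / c, dx=dx)
```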

Page 3:

Path Extraction from Value Function

• Given the value function, the optimal state feedback action is steepest descent:

$$u^*(x) = -\frac{\nabla\psi(x)}{\|\nabla\psi(x)\|}.$$

• A typical robot makes decisions on a periodic cycle with period ∆t, so the path is given by forward Euler (sketched after the figure below):

$$t_{i+1} = t_i + \Delta t, \qquad x(t_{i+1}) = x(t_i) + \Delta t\, u^*(x(t_i)).$$

• Even variable stepsize integrators for ẋ(t) = u∗(x(t)) struggle.

Figure: Top: fixed stepsize explicit integration (forward Euler). Middle: adaptive stepsize implicit integration (ode15s). Bottom: sampled gradient algorithm.
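Here is a minimal sketch of this extraction loop, assuming a gridded ψ (for instance from the earlier Eikonal sketch) and SciPy interpolation; the helper names and tolerances are illustrative.

```python
# Minimal sketch: extract a path by forward Euler on u*(x) = -grad psi / ||grad psi||.
# Assumes a gridded value function psi over axes xs, ys; the interpolation
# scheme and stopping tolerance are illustrative choices.
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def make_gradient_field(psi, xs, ys):
    # Interpolate each component of grad psi so it can be queried off-grid.
    gx, gy = np.gradient(psi, xs, ys)
    ix = RegularGridInterpolator((xs, ys), gx)
    iy = RegularGridInterpolator((xs, ys), gy)
    return lambda p: np.array([ix(p)[0], iy(p)[0]])

def extract_path(grad, x0, dt=0.01, steps=2000, tol=1e-6):
    path = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        g = grad(path[-1])
        norm = np.linalg.norm(g)
        if norm < tol:                         # ||grad psi|| ~ 0: stationary point
            break
        path.append(path[-1] - dt * g / norm)  # x_{i+1} = x_i + dt * u*(x_i)
    return np.array(path)
```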

Page 4:

Not Just for Static HJ PDEs!

Time dependent HJ PDEs arise in:

• Finite horizon optimal control.

• Reachable sets and viability kernels for formal verification of continuous and hybrid state systems.

(See game of two identical vehicles / softwalls slide deck.)

Page 5:

Outline

1. Motivation: Value Functions and Action / Path Synthesis

2. Background: Gradient Sampling and Particle Filters

3. Gradient Sampling Particle Filter

4. Dealing with Stationary Points

5. Concluding Remarks

Page 7:

Gradient Sampling for Nonsmooth Optimization I

Gradient sampling algorithm [Burke, Lewis & Overton, SIOPT 2005]

• Evaluate the gradient at K random samples within an ε-ball of the current point x(ti):

$$x^{(k)}(t_i) = x(t_i) + \epsilon\, \delta x^{(k)}, \qquad p^{(k)}(t_i) = \nabla\psi\big(x^{(k)}(t_i)\big).$$

• Determine the consensus direction

$$p^*(t_i) = \operatorname*{argmin}_{p \in P(t_i)} \|p\|, \qquad P(t_i) = \operatorname{conv}\big\{ p^{(1)}(t_i), \ldots, p^{(K)}(t_i) \big\}.$$

P(ti) approximates the Clarke subdifferential at x(ti); a solver sketch for p∗ follows the figure below.

Figure: gradient samples (yellow) and consensus direction (red), plotted in state space and in gradient space; the convex hull (blue) is also shown.
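The consensus direction is the minimum-norm point of the convex hull, which can be posed as a small quadratic program over simplex weights λ with p∗ = Σₖ λₖ p⁽ᵏ⁾. A minimal sketch, assuming SciPy; the talk does not specify a solver, so SLSQP here is an illustrative choice, not the talk's implementation.

```python
# Minimal sketch: consensus direction p* = argmin_{p in conv{p_k}} ||p||,
# posed as a QP over simplex weights lam (p = lam @ P). Assumes SciPy.
import numpy as np
from scipy.optimize import minimize

def consensus_direction(P):
    """P: (K, d) array of gradient samples; returns the min-norm hull point."""
    K = P.shape[0]
    obj = lambda lam: 0.5 * np.sum((lam @ P) ** 2)   # (1/2)||sum_k lam_k p_k||^2
    jac = lambda lam: (lam @ P) @ P.T                # gradient w.r.t. lam
    cons = ({"type": "eq", "fun": lambda lam: np.sum(lam) - 1.0},)
    lam0 = np.full(K, 1.0 / K)
    res = minimize(obj, lam0, jac=jac, bounds=[(0.0, 1.0)] * K,
                   constraints=cons, method="SLSQP")
    return res.x @ P
```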

Page 8:

Gradient Sampling for Nonsmooth Optimization II

Gradient sampling algorithm [Burke, Lewis & Overton, SIOPT 2005]

If ‖p∗(ti)‖ = 0:

• There is a Clarke ε-stationary point inside the sampling ball.

• Shrink ε and resample.

If ‖p∗(ti)‖ ≠ 0:

• Choose a step length s by Armijo line search along p∗(ti).

• Set the new point (a sketch of the full step follows the figure below)

$$x(t_{i+1}) = x(t_i) - s\, \frac{p^*(t_i)}{\|p^*(t_i)\|}.$$

Figure: gradient samples (yellow) and consensus direction (red), plotted in state space and in gradient space; the convex hull (blue) is also shown.
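A minimal sketch of one full gradient sampling step under the two cases above, reusing consensus_direction from the previous sketch; ψ and its gradient are assumed callable, and the sampling and line search parameters are illustrative.

```python
# Minimal sketch: one gradient sampling step following the two cases above.
# psi and grad are callables; all parameters are illustrative defaults.
import numpy as np

def gs_step(psi, grad, x, eps, K=20, s0=1.0, beta=0.5, sigma=1e-4, tol=1e-8):
    d = x.size
    # Draw K points uniformly from the eps-ball around x.
    dirs = np.random.randn(K, d)
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    radii = eps * np.random.rand(K, 1) ** (1.0 / d)
    P = np.array([grad(x + e) for e in dirs * radii])
    p = consensus_direction(P)
    if np.linalg.norm(p) < tol:
        return x, eps / 2.0          # epsilon-stationary: shrink ball, resample
    u = p / np.linalg.norm(p)
    s = s0                           # Armijo backtracking along -u
    while s > 1e-12 and psi(x - s * u) > psi(x) - sigma * s * np.linalg.norm(p):
        s *= beta
    return x - s * u, eps
```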

Page 9:

Particle Filters

Monte Carlo localization (MCL) [Thrun, Burgard & Fox, Probabilistic Robotics, 2005] is often used to estimate the current state of mobile robots.

• The state estimate is a collection of weighted samples (w^(k)(t), x^(k)(t)).

• Predict: draw a new sample state x^(k)(ti+1) when action u(ti) is taken:

$$x^{(k)}(t_{i+1}) \sim p\big(x(t_{i+1}) \mid x^{(k)}(t_i), u(t_i)\big).$$

• Correct: update the weights w^(k)(ti+1) when a sensor reading arrives:

$$w^{(k)}(t_{i+1}) = p\big(\text{sensor reading} \mid x^{(k)}(t_{i+1})\big)\, w^{(k)}(t_i).$$

• Resample states and reset weights regularly.

We always work with the particle cloud after resampling, when all weights are unity; a sketch of the full cycle follows.
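A minimal sketch of the predict / correct / resample cycle for a planar robot, assuming Gaussian motion and measurement noise; the noise models and the measurement function h are illustrative assumptions, not the AMCL models.

```python
# Minimal sketch of the predict / correct / resample cycle above for a
# 2-D holonomic robot with Gaussian motion and measurement noise.
import numpy as np

rng = np.random.default_rng(0)

def predict(X, u, dt, motion_std=0.05):
    # X: (K, 2) particle states; sample from p(x' | x, u).
    return X + dt * u + rng.normal(0.0, motion_std, X.shape)

def correct(X, w, z, h, meas_std=0.1):
    # Reweight by the measurement likelihood p(z | x).
    r = np.linalg.norm(z - h(X), axis=1)
    w = w * np.exp(-0.5 * (r / meas_std) ** 2)
    return w / np.sum(w)

def resample(X, w):
    # Systematic resampling; afterwards all weights are unity (uniform).
    K = len(w)
    idx = np.searchsorted(np.cumsum(w), (rng.random() + np.arange(K)) / K)
    return X[idx], np.full(K, 1.0 / K)
```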

Page 10:

Outline

1. Motivation: Value Functions and Action / Path Synthesis

2. Background: Gradient Sampling and Particle Filters

3. Gradient Sampling Particle Filter

4. Dealing with Stationary Points

5. Concluding Remarks

Page 11:

The Gradient Sampling Particle Filter (GSPF)

• Sample the gradients at the particle locations.

• If ‖p∗(ti)‖ ≠ 0, then p∗(ti) is a consensus descent direction for the current state estimate (see the sketch below).

Figure: choosing the action by steepest descent on the expected state versus by the GSPF, in a simulated traversal of a narrow corridor; estimated (blue) and true (green) paths shown.
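A minimal sketch of GSPF action selection, reusing the gradient interpolator and consensus_direction helpers sketched earlier (illustrative names): the particle cloud itself supplies the gradient samples.

```python
# Minimal sketch of GSPF action selection: the resampled particle cloud
# itself provides the gradient sample locations.
import numpy as np

def gspf_action(X, grad, tol=1e-6):
    """X: (K, d) particle locations (after resampling, unit weights)."""
    P = np.array([grad(x) for x in X])   # sample gradients at the particles
    p = consensus_direction(P)           # min-norm point of conv{p_k}
    if np.linalg.norm(p) < tol:
        return None                      # no consensus: see stationary points
    return -p / np.linalg.norm(p)        # consensus descent action
```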

Page 12:

Narrow Corridor Simulation

Run in ROS (Robot Operating System) with simulated motion and sensor noise (ROS/Gazebo) and particle filter localization (ROS/AMCL).

An occupancy grid is constructed with laser rangefinder(s) and a SLAM package.

Figure: cost function c(x) (higher cost is darker blue); value function ψ(x) (higher value is red); vertical component of ∇ψ(x) (yellow is downward, brown is upward).

Page 13:

Narrow Corridor Simulation

Compare the paths taken when choosing the action by the AMCL expected state (roughly the mean of the particle locations) or by the GSPF.

• Chattering remains even as the step size is reduced.

Videos: choosing the action by steepest descent on the expected state versus choosing the action by the GSPF.

Visualizations shown in the videos:

• Main window: estimated state and path (blue), actual state and path (green), particle cloud (pink), action choices (red arrows), sensor readings (red dots), vertical component of ∇ψ(x) (background).

• Lower left window: gradient space visualization of the gradient samples (yellow) and the action choice (red).

Page 14:

Outline

1. Motivation: Value Functions and Action / Path Synthesis

2. Background: Gradient Sampling and Particle Filters

3. Gradient Sampling Particle Filter

4. Dealing with Stationary Points

5. Concluding Remarks

Page 15:

Finite Wall Scenario

If ‖p∗(ti)‖ = 0 there is no consensus direction.

The finite wall scenario displays the two typical types of stationary points:

• Minimum (left side): the path is complete(?).

• Saddle point (right side): seek a descent direction.

Figure: cost and value approximation.

Page 16:

Classify the Stationary Point

Quadratic ansatz for the value function in a neighborhood of the samples:

$$\psi(x) = \tfrac{1}{2}(x - x_c)^T A\, (x - x_c).$$

• Fit to the existing gradient samples:

$$\nabla\psi(x) = A(x - x_c).$$

• Solve by least squares

$$\min_{A,b} \sum_k \big\| p^{(k)}(t_i) - A x^{(k)}(t_i) + b \big\|^2$$

and set $x_c = A^{-1} b$ (so that the fitted gradient $A x - b$ vanishes at $x_c$).

• Examine the eigenvalues $\{\lambda_j\}_{j=1}^{d}$ of A:

  ◦ If all λj > 0: local minimum.
  ◦ If any λj < 0: the corresponding eigenvectors are descent directions.

A sketch of this fit and classification follows the figure below.

Figure: ψ(x) at a minimum and at a saddle.
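A minimal sketch of this fit and classification, assuming NumPy; the row-by-row least squares layout and the symmetrization step are illustrative choices.

```python
# Minimal sketch: fit grad psi(x) ~ A x - b to the gradient samples by least
# squares, then classify the stationary point x_c = A^{-1} b through the
# eigenvalues of the (symmetrized) Hessian estimate A.
import numpy as np

def classify_stationary_point(X, P):
    """X: (K, d) sample locations; P: (K, d) gradient samples."""
    K, d = X.shape
    M = np.hstack([X, -np.ones((K, 1))])          # row k encodes p_k = A x_k - b
    coef, *_ = np.linalg.lstsq(M, P, rcond=None)  # (d+1, d) stacked unknowns
    A = 0.5 * (coef[:d].T + coef[:d])             # symmetrize Hessian estimate
    b = coef[d]
    xc = np.linalg.solve(A, b)                    # stationary point: A xc = b
    lam, V = np.linalg.eigh(A)
    return xc, lam, V   # all lam > 0: minimum; for lam[j] < 0, column V[:, j]
                        # is a descent direction
```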

Page 17:

Classification Experiments: Minimum

Figure: state space view of the path; state space gradient samples (gold) and eigenvectors of the Hessian of ψ(x) (blue). Inward pointing eigenvector arrow pairs correspond to positive eigenvalues.

Page 18:

Classification Experiments: Saddle

Figure: state space view of the path; gradient space convex hull; state space eigenvectors of the Hessian of ψ(x).

Page 19:

Resolve the Stationary Point

Three potential responses to detecting a stationary point:

• Stop: if it is a minimum and localization is sufficiently accurate.

• Reduce sampling radius: collect additional sensor data to improve the localization estimate.

  ◦ The rate and/or quality of sensing can be reduced when a consensus direction is available.
  ◦ Localization should be improved by independent sensor data.

• Vote: if it is a saddle and improved localization is infeasible (a sketch follows the figure below).

  ◦ Let v be the eigenvector associated with a negative eigenvalue and $\alpha = \sum_k \operatorname{sign}\big({-v^T p^{(k)}}\big)$.
  ◦ Travel in direction sign(α) v.

Figure: action angle vs. particle covariance over time. Decreasing localization covariance (green) allows for the identification of a consensus direction (blue); in the grey region, no consensus direction was found.
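A minimal sketch of the voting rule, with v and the gradient samples P taken from the classification sketch earlier (illustrative names).

```python
# Minimal sketch of the voting rule: each gradient sample votes for +v or -v
# by the sign of its descent inner product.
import numpy as np

def vote_direction(v, P):
    """v: (d,) eigenvector with negative eigenvalue; P: (K, d) samples."""
    alpha = np.sum(np.sign(-P @ v))   # alpha = sum_k sign(-v^T p_k)
    return np.sign(alpha) * v         # travel in direction sign(alpha) v
```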

Page 20:

Finite Wall Simulation

Generate paths for the finite wall scenario using each of the two resolution procedures for the saddle point.

Videos: resolution by using an improved sensor versus resolution by voting.

Visualizations shown in the videos:

• Main window: estimated state and path (blue), actual state and path (green), particle cloud (pink), action choices (red arrows), gradient samples (yellow), sensor readings (red dots) and eigenvectors of the Hessian approximation (blue arrows).

• Lower left window: gradient space visualization of the gradient samples (yellow) and the action choice (red).

Page 21:

Outline

1. Motivation: Value Functions and Action / Path Synthesis

2. Background: Gradient Sampling and Particle Filters

3. Gradient Sampling Particle Filter

4. Dealing with Stationary Points

5. Concluding Remarks

Page 22:

Gradient Sampling is not just for Value Functions

Actions synthesized by nearest neighbor lookup on an RRT* tree (sketched below).
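One way to realize such a lookup is sketched below, assuming SciPy's k-d tree; storing one action per RRT* vertex is an illustrative design, since the talk does not detail the data structure.

```python
# Minimal sketch: nearest neighbor action lookup on an RRT* tree using a
# k-d tree over the stored vertices.
import numpy as np
from scipy.spatial import cKDTree

class TreePolicy:
    def __init__(self, nodes, actions):
        """nodes: (N, d) RRT* vertex locations; actions: (N, d) per-vertex actions."""
        self.tree = cKDTree(nodes)
        self.actions = np.asarray(actions)

    def action(self, x):
        _, i = self.tree.query(x)   # nearest stored vertex to the state estimate
        return self.actions[i]
```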

Page 23:

No Time For...

Monotone acceptance ordered upwind method [Alton & Mitchell, JSC 2012].

Page 24:

No Time For...

Mixed implicit-explicit formulation of reach sets to reduce computational dimension [Mitchell, HSCC 2011].

Page 25:

No Time For...

Figure: infusion pump and controller.

Parametric approximations of viability and discriminating kernels with applications to verification of automated anesthesia delivery [Maidens et al, Automatica 2013].

Page 26:

No Time For...

Shared control smart wheelchair for older adults with cognitive impairments [Viswanathan et al, Autonomous Robots 2016].

Figure: state trajectories x1 through x6 and inputs u1, u2 against t (seconds).

Ensuring safety for human-in-the-loop flight control of low sample rate indoor quadrotors [Mitchell et al, CDC 2016].

Page 27:

Conclusions

Gradient sampling particle filter (GSPF)

• Utilizes the natural uncertainty in the system state to reduce chattering due to a non-smooth value function and/or its numerical approximation.

• Easily implemented on top of existing planners and state estimators.

• To appear in [Traft & Mitchell, CDC 2016].

Future work

• Nonholonomic dynamics.

• Convergence proof.

• Scalability to more particles.

• Set-valued actions.

• Human-in-the-loop control and interface.
