wisemove a framework to investigate safe deep reinforcement … · 2019-11-19 · deep model for...

18
WISEMOVE: A Framework to Investigate Safe Deep Reinforcement Learning for Autonomous Driving Jaeyoung Lee, Aravind Balakrishnan, Ashish Gaurav, Krzysztof Czarnecki, Sean Sedwards University of Waterloo September, 14th, 2019

Upload: others

Post on 13-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region

WISEMOVE: A Framework to Investigate Safe Deep Reinforcement Learning for Autonomous Driving

Jaeyoung Lee, Aravind Balakrishnan, Ashish Gaurav, Krzysztof Czarnecki, Sean Sedwards

University of Waterloo September, 14th, 2019

Page 2: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region

WISEMOVE?

‣ A research platform that mimics our autonomous driving stack.

1

‣ Objective: investigate the safety and performance of motion planners

trained using deep reinforcement learning

‣ Features:

✓ Hierarchical Decision Making

✓ Runtime Verification

✓ Reinforcement Learning / Monte Carlo Tree Search (MCTS)

Page 3: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region

No learning component …Motion Planner

Local Planner

reference trajectories

measurements, perceptions, etc.

high-level decision

(abstracted)Behaviour Planner

Motion Planning Architecture in 100 km Public Drive (2018)2

Page 4: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region

(w/o MCTS)

Deep Model for Trajectory Generation

reference trajectories

measurements, perceptions, etc.

Deep Model for Decision Making

Option

Deep models are trained by deep reinforcement learning.

(high-level decision)

STO

P

STOP

stop region

stop

regi

on

intersection

ego

WISEMOVE Architecture3

Road Scenario

Motion Planner

Page 5: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region

Deep Model for Trajectory Generation

Deep Model for Decision Making

Option

‣Five Options:

KeepLane, Stop, Wait, Follow, ChangeLane

‣Components

✓ preconditions‣ Two “two-lane and one-way” roads

‣ All-ways stop implemented by the stop region

‣ 0~5 other vehicles

Road Scenario

STO

P

STOP

stop region

stop

regi

on

intersection

ego

WISEMOVE Architecture4

(w/o MCTS)Motion Planner

G((has_stopped_in_stop_region and in_stop_region) U highest_priority)

✓ time-out (e.g., 1 sec.)

✓ speed limit, target lane

, e.g., in an option ‘Wait’,

Page 6: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region

Deep Model for Trajectory Generation

Deep Model for Decision Making

Runtime Verifier

G((has_stopped_in_stop_region and in_stop_region) U highest_priority)

‣Checks LTL-like strings until violated.

G(in_stop_region => (in_stop_region U has_stopped_in_stop_region))

‣ An episode ends when:

✓ Ego reaches the right end on the road, ✓ a traffic rule is violated, or ✓ a collision happens.

WISEMOVE Architecture5

Road Scenario

STO

P

STOP

stop region

stop

regi

on

intersection

ego

(w/o MCTS)Motion Planner

✓preconditions, e.g., in an option ‘Wait’,

✓ traffic-rules, e.g., in a stop region,

Page 7: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region

Deep Model for Decision Making

Deep Model for Trajectory Generation

Input: a state representation Output: the learnt ‘best’ Option

Option(high-level decision)

Next Option?

‣Act upon the termination of the current Option.

WISEMOVE Architecture6

(w/o MCTS)Motion Planner

‣Choose the ‘best’ Option.

Page 8: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region

Deep Model for Trajectory Generation

Input: a state representation (simplified) Output: reference trajectories, given an Option

Option(high-level decision)

‣A deep model is stored for each Option.

WISEMOVE Architecture7

Deep Model for Decision Making

Next Option?

(w/o MCTS)Motion Planner

‣Trajectories generated with simplified vehicle model.

Page 9: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region

WISEMOVE Architecture8

Deep Model for Trajectory Generation

Deep Model for Decision Making

Next Option?time

Follow

StopWait

KeepLane

reference trajectory “____”

To the road scenario

(w/o MCTS)Motion Planner

Option “ ”

Page 10: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region

9

Deep Model for Trajectory Generation

reference trajectory “____”

Training & Testing Low-level Deep Models

✓ was trained by reinforcement learning (DDPG) with

✓ 20 sec. timeout

✓(additional) preconditions and, if necessary, traffic rules.

Option “ ”

‣Five Deep Models —one for each Option.

‣Each model

✓outputs continuous control commands generating the trajectories

Page 11: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region

After 100,000 steps training …

KeepLane

Stop

Follow

Wait

10

Page 12: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region

After 100,000 steps training …

KeepLane

Stop

Follow

Wait

mean (std) % success after 100,000 training

(averaged over 100 trials of 100 episodes)

11

Page 13: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region

After 1,000,000 steps training …

KeepLane

Stop

Follow

Wait

12

Page 14: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region

‣Each low-level deep model is trained a priori for 1,000,000 steps.

Training & Testing High-level Deep Model

Deep Model for Trajectory Generation

Deep Model for Decision Making

Next Option?timeFollow

StopWait

KeepLane

reference trajectory “____”

To the road scenario

(w/o MCTS)Motion Planner

Option “ ”

‣One deep model, trained by reinforcement learning (DQN), outputs an Option.

‣1 sec. time-out for each option; 20 sec. time-out for an entire episode.

13

Page 15: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region

Overall performance (after 200,000 steps training)

(averaged over 1000 episodes)

Training & Testing High-level Deep Model14

Page 16: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region

With MCTS over Options …

KeepLane

Follow WaitStopcurrent state

StopChangeLane

Stop

Follow

Wait

. . .

ChangeLane

Stop

KeepLane

Stop

Wait

. . .Traverse until the leaf node, with exploration & exploitation

WaitKeepLane

ChangeLane

KeepLane

⊥Simulate!

Backpropagate!

Overall performance

(averaged over 1000 episodes)

15

Page 17: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region

‣ The results are reproducible using the publicly available code at

‣ Future works

✓ Comparisons of RL and hand-coded motion planners.

✓ Different scenarios, realistic vehicle dynamics, etc.

✓ Simulation-to-Real

Concluding Remarks

git.uwaterloo.ca/wise-lab/wise-move/

‣ Features:

Options / Reinforcement Learning / Runtime Verification / Monte Carlo Tree Search (MCTS)

16

Page 18: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region

Thank you for attention!Q & A

Acknowledgment

This work is supported by the Japanese Science and Technology agency (JST) ERATO project JPMJER1603: HASUO

Metamathematics for Systems Design, and by the Natural Sciences and Engineering Research Council of Canada (NSERC)

Discovery Grant: Model-Based Synthesis and Safety Assurance of Intelligent Controllers for Autonomous Vehicles.