wisemove a framework to investigate safe deep reinforcement … · 2019-11-19 · deep model for...
TRANSCRIPT
![Page 1: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region](https://reader033.vdocuments.mx/reader033/viewer/2022050609/5fb073fbb924d81bfd0799dc/html5/thumbnails/1.jpg)
WISEMOVE: A Framework to Investigate Safe Deep Reinforcement Learning for Autonomous Driving
Jaeyoung Lee, Aravind Balakrishnan, Ashish Gaurav, Krzysztof Czarnecki, Sean Sedwards
University of Waterloo September, 14th, 2019
![Page 2: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region](https://reader033.vdocuments.mx/reader033/viewer/2022050609/5fb073fbb924d81bfd0799dc/html5/thumbnails/2.jpg)
WISEMOVE?
‣ A research platform that mimics our autonomous driving stack.
1
‣ Objective: investigate the safety and performance of motion planners
trained using deep reinforcement learning
‣ Features:
✓ Hierarchical Decision Making
✓ Runtime Verification
✓ Reinforcement Learning / Monte Carlo Tree Search (MCTS)
![Page 3: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region](https://reader033.vdocuments.mx/reader033/viewer/2022050609/5fb073fbb924d81bfd0799dc/html5/thumbnails/3.jpg)
No learning component …Motion Planner
Local Planner
reference trajectories
measurements, perceptions, etc.
high-level decision
(abstracted)Behaviour Planner
Motion Planning Architecture in 100 km Public Drive (2018)2
![Page 4: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region](https://reader033.vdocuments.mx/reader033/viewer/2022050609/5fb073fbb924d81bfd0799dc/html5/thumbnails/4.jpg)
(w/o MCTS)
Deep Model for Trajectory Generation
reference trajectories
measurements, perceptions, etc.
Deep Model for Decision Making
Option
Deep models are trained by deep reinforcement learning.
(high-level decision)
STO
P
STOP
stop region
stop
regi
on
intersection
ego
WISEMOVE Architecture3
Road Scenario
Motion Planner
![Page 5: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region](https://reader033.vdocuments.mx/reader033/viewer/2022050609/5fb073fbb924d81bfd0799dc/html5/thumbnails/5.jpg)
Deep Model for Trajectory Generation
Deep Model for Decision Making
Option
‣Five Options:
KeepLane, Stop, Wait, Follow, ChangeLane
‣Components
✓ preconditions‣ Two “two-lane and one-way” roads
‣ All-ways stop implemented by the stop region
‣ 0~5 other vehicles
Road Scenario
STO
P
STOP
stop region
stop
regi
on
intersection
ego
WISEMOVE Architecture4
(w/o MCTS)Motion Planner
G((has_stopped_in_stop_region and in_stop_region) U highest_priority)
✓ time-out (e.g., 1 sec.)
✓ speed limit, target lane
, e.g., in an option ‘Wait’,
![Page 6: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region](https://reader033.vdocuments.mx/reader033/viewer/2022050609/5fb073fbb924d81bfd0799dc/html5/thumbnails/6.jpg)
Deep Model for Trajectory Generation
Deep Model for Decision Making
Runtime Verifier
G((has_stopped_in_stop_region and in_stop_region) U highest_priority)
‣Checks LTL-like strings until violated.
G(in_stop_region => (in_stop_region U has_stopped_in_stop_region))
‣ An episode ends when:
✓ Ego reaches the right end on the road, ✓ a traffic rule is violated, or ✓ a collision happens.
WISEMOVE Architecture5
Road Scenario
STO
P
STOP
stop region
stop
regi
on
intersection
ego
(w/o MCTS)Motion Planner
✓preconditions, e.g., in an option ‘Wait’,
✓ traffic-rules, e.g., in a stop region,
![Page 7: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region](https://reader033.vdocuments.mx/reader033/viewer/2022050609/5fb073fbb924d81bfd0799dc/html5/thumbnails/7.jpg)
Deep Model for Decision Making
Deep Model for Trajectory Generation
Input: a state representation Output: the learnt ‘best’ Option
Option(high-level decision)
Next Option?
‣Act upon the termination of the current Option.
WISEMOVE Architecture6
(w/o MCTS)Motion Planner
‣Choose the ‘best’ Option.
![Page 8: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region](https://reader033.vdocuments.mx/reader033/viewer/2022050609/5fb073fbb924d81bfd0799dc/html5/thumbnails/8.jpg)
Deep Model for Trajectory Generation
Input: a state representation (simplified) Output: reference trajectories, given an Option
Option(high-level decision)
‣A deep model is stored for each Option.
WISEMOVE Architecture7
Deep Model for Decision Making
Next Option?
(w/o MCTS)Motion Planner
‣Trajectories generated with simplified vehicle model.
![Page 9: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region](https://reader033.vdocuments.mx/reader033/viewer/2022050609/5fb073fbb924d81bfd0799dc/html5/thumbnails/9.jpg)
WISEMOVE Architecture8
Deep Model for Trajectory Generation
Deep Model for Decision Making
Next Option?time
Follow
StopWait
KeepLane
reference trajectory “____”
To the road scenario
(w/o MCTS)Motion Planner
Option “ ”
![Page 10: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region](https://reader033.vdocuments.mx/reader033/viewer/2022050609/5fb073fbb924d81bfd0799dc/html5/thumbnails/10.jpg)
9
Deep Model for Trajectory Generation
reference trajectory “____”
Training & Testing Low-level Deep Models
✓ was trained by reinforcement learning (DDPG) with
✓ 20 sec. timeout
✓(additional) preconditions and, if necessary, traffic rules.
Option “ ”
‣Five Deep Models —one for each Option.
‣Each model
✓outputs continuous control commands generating the trajectories
![Page 11: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region](https://reader033.vdocuments.mx/reader033/viewer/2022050609/5fb073fbb924d81bfd0799dc/html5/thumbnails/11.jpg)
After 100,000 steps training …
KeepLane
Stop
Follow
Wait
10
![Page 12: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region](https://reader033.vdocuments.mx/reader033/viewer/2022050609/5fb073fbb924d81bfd0799dc/html5/thumbnails/12.jpg)
After 100,000 steps training …
KeepLane
Stop
Follow
Wait
mean (std) % success after 100,000 training
(averaged over 100 trials of 100 episodes)
11
![Page 13: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region](https://reader033.vdocuments.mx/reader033/viewer/2022050609/5fb073fbb924d81bfd0799dc/html5/thumbnails/13.jpg)
After 1,000,000 steps training …
KeepLane
Stop
Follow
Wait
12
![Page 14: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region](https://reader033.vdocuments.mx/reader033/viewer/2022050609/5fb073fbb924d81bfd0799dc/html5/thumbnails/14.jpg)
‣Each low-level deep model is trained a priori for 1,000,000 steps.
Training & Testing High-level Deep Model
Deep Model for Trajectory Generation
Deep Model for Decision Making
Next Option?timeFollow
StopWait
KeepLane
reference trajectory “____”
To the road scenario
(w/o MCTS)Motion Planner
Option “ ”
‣One deep model, trained by reinforcement learning (DQN), outputs an Option.
‣1 sec. time-out for each option; 20 sec. time-out for an entire episode.
13
![Page 15: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region](https://reader033.vdocuments.mx/reader033/viewer/2022050609/5fb073fbb924d81bfd0799dc/html5/thumbnails/15.jpg)
Overall performance (after 200,000 steps training)
(averaged over 1000 episodes)
Training & Testing High-level Deep Model14
![Page 16: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region](https://reader033.vdocuments.mx/reader033/viewer/2022050609/5fb073fbb924d81bfd0799dc/html5/thumbnails/16.jpg)
With MCTS over Options …
KeepLane
Follow WaitStopcurrent state
StopChangeLane
Stop
Follow
Wait
. . .
ChangeLane
Stop
KeepLane
Stop
Wait
. . .Traverse until the leaf node, with exploration & exploitation
WaitKeepLane
ChangeLane
KeepLane
⊥Simulate!
Backpropagate!
Overall performance
(averaged over 1000 episodes)
15
![Page 17: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region](https://reader033.vdocuments.mx/reader033/viewer/2022050609/5fb073fbb924d81bfd0799dc/html5/thumbnails/17.jpg)
‣ The results are reproducible using the publicly available code at
‣ Future works
✓ Comparisons of RL and hand-coded motion planners.
✓ Different scenarios, realistic vehicle dynamics, etc.
✓ Simulation-to-Real
Concluding Remarks
git.uwaterloo.ca/wise-lab/wise-move/
‣ Features:
Options / Reinforcement Learning / Runtime Verification / Monte Carlo Tree Search (MCTS)
16
![Page 18: WISEMOVE A Framework to Investigate Safe Deep Reinforcement … · 2019-11-19 · Deep Model for Trajectory Generation Deep Model for Decision Making Runtime Verifier G((has_stopped_in_stop_region](https://reader033.vdocuments.mx/reader033/viewer/2022050609/5fb073fbb924d81bfd0799dc/html5/thumbnails/18.jpg)
Thank you for attention!Q & A
Acknowledgment
This work is supported by the Japanese Science and Technology agency (JST) ERATO project JPMJER1603: HASUO
Metamathematics for Systems Design, and by the Natural Sciences and Engineering Research Council of Canada (NSERC)
Discovery Grant: Model-Based Synthesis and Safety Assurance of Intelligent Controllers for Autonomous Vehicles.