Shield Synthesis for AI - TUMkretinsk/LiVE_2019_Shields.pdf · 2019-05-03 · Bettina Könighofer...
![Page 1](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/1.jpg)
Bettina Könighofer Shield Synthesis for AI
www.iaik.tugraz.at
Shield Synthesis for AI
Bettina Könighofer
Roderick Bloem
Ufuk Topcu
Scott Niekum
Mohammed Alshiekh
Suda Bharadwaj
Rüdiger Ehlers
Nils Jansen
Sebastian Junges
Rayna Dimitrova
Thomas Henzinger
Guy Avni
Krishnendu Chatterjee
![Page 2](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/2.jpg)
LiVe @ ETAPS, Prague, April 6, 2019
Reinforcement Learning
Reactive Synthesis
Your Controller
Synthesis
Your Specification
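$\mathbf{G}(\neg \mathit{blocked} \rightarrow \mathbf{F}\, R) \wedge \mathbf{G}(\neg \mathit{blocked} \rightarrow \mathbf{F}\, S) \wedge \mathbf{G}(\mathit{blocked} \rightarrow \mathbf{X}(C \,\mathbf{U}\, \neg \mathit{blocked}))$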
Infinitely often, visit R and S. If S is blocked, go to C. Resume visiting R and S once S is unblocked.
via Game Solving
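For the safety fragment that shields enforce, synthesis via game solving reduces to a greatest-fixpoint computation: start from all safe states and repeatedly discard states from which the system cannot avoid leaving that set. The sketch below is a minimal illustration on an explicit game graph with hypothetical state names, assuming every state has at least one successor; it does not handle liveness goals such as the $\mathbf{F}$ parts of the specification above.

```python
from typing import Dict, Set

def solve_safety_game(sys_states: Set[str], env_states: Set[str],
                      edges: Dict[str, Set[str]], unsafe: Set[str]) -> Set[str]:
    # Greatest fixpoint: begin with all safe states and shrink until stable.
    # Assumption: every state has at least one successor (no dead ends).
    win = (sys_states | env_states) - unsafe
    changed = True
    while changed:
        changed = False
        for s in list(win):
            succ = edges[s]
            if s in sys_states:
                ok = any(t in win for t in succ)  # system needs one successor that stays winning
            else:
                ok = all(t in win for t in succ)  # environment may pick any successor
            if not ok:
                win.discard(s)
                changed = True
    return win

def safe_moves(s: str, edges: Dict[str, Set[str]], win: Set[str]) -> Set[str]:
    # A shield derived from the winning region permits exactly the moves that stay inside it.
    return {t for t in edges[s] if t in win}

# Hypothetical game: "a" and "x" belong to the system, "b" and "d" to the environment.
edges = {"a": {"b", "d"}, "b": {"a"}, "d": {"a", "x"}, "x": {"x"}}
win = solve_safety_game({"a", "x"}, {"b", "d"}, edges, unsafe={"x"})
print(sorted(win))                  # ['a', 'b']
print(safe_moves("a", edges, win))  # {'b'}
```

State `d` is lost because the environment can move to the unsafe `x` from there, so the shield forbids the system's move from `a` to `d`.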
![Page 3](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/3.jpg)
Reinforcement Learning
Reactive Synthesis
Your Controller
• Large
• Complicated
• Highly optimized
• Many sensors
• …
Your Specification
• Large
• Hard to write
• Greyscale
![Page 4](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/4.jpg)
Reinforcement Learning
Reinforcement Learning
Reactive Synthesis
[Diagram: the Learning Agent sends an action to the Environment and receives the next state and a reward]
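The state/action/reward loop can be sketched with tabular Q-learning on a toy problem. The environment (five positions on a line, reward 1 at the right end) and the learning rate 0.1 are assumptions for illustration; only the discount $\gamma = 0.95$ echoes the traffic-light controller later in the talk, which uses a deep Q-network rather than a table.

```python
import random
from collections import defaultdict

ACTIONS = (-1, 1)  # move left / move right

def step(state, action):
    # Hypothetical toy environment: positions 0..4; reaching 4 gives reward 1 and ends the episode.
    nxt = min(max(state + action, 0), 4)
    return nxt, (1.0 if nxt == 4 else 0.0), nxt == 4

def greedy(q, s, rng):
    best = max(q[(s, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(s, a)] == best])  # break ties randomly

def q_learning(episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated discounted return
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # The agent picks an action (epsilon-greedy) ...
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q, s, rng)
            # ... the environment answers with the next state and a reward ...
            s2, r, done = step(s, a)
            # ... and the agent updates its estimate with the Q-learning rule.
            target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q

q = q_learning()
print([max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(4)])  # greedy action per position
```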
![Page 5](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/5.jpg)
Reinforcement Learning
Reactive Synthesis
Correctness Guarantees
Optimality
How?
![Page 6](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/6.jpg)
Reinforcement Learning
Reactive Synthesis
Correctness Guarantees
Optimality
Your Controller
• Large
• Complicated
• Highly optimized
• Many sensors
• …
Your Specification
• Large
• Hard to write
• Greyscale
Shielding
![Page 7](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/7.jpg)
Reinforcement Learning
Reactive Synthesis
Your Controller
• Large
• Complicated
• Highly optimized
• Many sensors
• …
Critical Spec
• Critical aspects only
• Small & sweet
Correctness Guarantees
Optimality
Shielding
![Page 8](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/8.jpg)
Shielding: Preemptive
[Diagram: Environment, Learning Agent, and Shield]
Reinforcement Learning
Reactive Synthesis
Minimal Interference
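In preemptive shielding the shield acts before the agent: it removes unsafe actions, and the agent chooses among the rest, so the shield only interferes when an action is actually unsafe. A minimal sketch, assuming a hypothetical `is_safe(state, action)` predicate produced by shield synthesis and a tabular value estimate `q`:

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

def preemptive_choice(state: Hashable,
                      actions: Sequence[Hashable],
                      q: Dict[Tuple[Hashable, Hashable], float],
                      is_safe: Callable[[Hashable, Hashable], bool]) -> Hashable:
    # The shield removes unsafe actions first ...
    allowed = [a for a in actions if is_safe(state, a)]
    if not allowed:
        raise RuntimeError("no safe action; a correct shield never lets the agent get here")
    # ... and the agent picks its preferred action among the remaining ones.
    return max(allowed, key=lambda a: q.get((state, a), 0.0))

def is_safe(position: int, move: int) -> bool:
    # Hypothetical safety property: positions on a line, position 0 is a cliff.
    return position + move != 0

q = {(1, -1): 5.0, (1, 1): 1.0, (3, -1): 2.0, (3, 1): 0.5}
print(preemptive_choice(1, (-1, 1), q, is_safe))  # 1: the preferred move -1 is blocked
print(preemptive_choice(3, (-1, 1), q, is_safe))  # -1: no interference needed
```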
![Page 9](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/9.jpg)
Shielding: Post-Posed
[Diagram: Environment, Learning Agent, and Shield]
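Policy update: update the policy for $\mathit{safe\_action}$ using $\mathit{reward}$. For $\mathit{action}$, if $\mathit{action} \neq \mathit{safe\_action}$, either:
1. assign a punishment to $\mathit{action}$, or
2. assign $\mathit{reward}$ to $\mathit{action}$.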
Shield can be added in execution phase
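The post-posed shield sits after the agent: it passes a safe action through and otherwise substitutes a safe one, and the learning update follows the two options on the slide. A minimal sketch, assuming a tabular value table, a hypothetical `fallback` that returns a safe action, and a simple running-average update in place of a full Q-learning target:

```python
from typing import Callable, Dict, Hashable, Tuple

Q = Dict[Tuple[Hashable, Hashable], float]

def post_posed_shield(state, action, is_safe: Callable, fallback: Callable):
    # Pass the agent's action through if it is safe; otherwise override it.
    return action if is_safe(state, action) else fallback(state)

def shielded_update(q: Q, state, action, safe_action, reward: float,
                    alpha: float = 0.1, punishment: float = -1.0, mode: str = "punish") -> None:
    def update(a, r):
        old = q.get((state, a), 0.0)
        q[(state, a)] = old + alpha * (r - old)  # running-average update (simplification)
    update(safe_action, reward)  # the executed safe action learns from the observed reward
    if action != safe_action:
        if mode == "punish":
            update(action, punishment)  # option 1: punish the overridden action
        else:
            update(action, reward)      # option 2: assign the observed reward to it

q: Q = {}
executed = post_posed_shield(1, -1, lambda s, a: s + a != 0, lambda s: 1)
shielded_update(q, 1, -1, executed, reward=1.0, alpha=0.5)
print(executed, q)  # 1 {(1, 1): 0.5, (1, -1): -0.5}
```

Because the shield only filters the agent's output, it can also wrap an already-trained agent at execution time; in that case `shielded_update` is simply never called.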
![Page 10](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/10.jpg)
A Shield for PAC-MAN
M. Alshiekh, R. Bloem, R. Ehlers, B. Könighofer, S. Niekum, U. Topcu: Safe Reinforcement Learning via Shielding. AAAI 2018
![Page 11](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/11.jpg)
Outline
Safety Shields (AAAI-18)
Optimal Shields (submission)
Safety Shields for Multi-Agent Systems (ACC-19)
Probabilistic Safety Shields (arXiv)
![Page 12](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/12.jpg)
Optimal Shields
Problems of learned controllers (Safety problems)
1. Difficult to add new features
2. Poor performance on untrained behavior
3. No local fairness
Solution: Optimal Shield
![Page 13](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/13.jpg)
Shields for Traffic Light Controllers
Learned Controller: “minimize total waiting time”
1. Difficult to add new features: priority to public transport, changes due to an accident
2. Poor performance on untrained behavior: uniform traffic congestion meets rush-hour traffic
3. No local fairness: the farm road never gets green
![Page 14](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/14.jpg)
Optimal Shield Synthesis
Lightweight shields
$c_{BEH}$: cost for behavior
$c_{INT}$: cost for interference
[Diagram: Environment, Learned Controller, and opt-Shield]
$\lambda \cdot c_{BEH} + (1 - \lambda) \cdot c_{INT}$
Mean-payoff game with 2 objectives → mean-payoff game
$\lambda$: trade-off between the objective of the controller and that of the shield
Two cost functions
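Spelled out, the weighted objective can be read as a single mean-payoff objective over an infinite run, where $a_i$ is the controller's action and $a'_i$ the shield's output at step $i$. This is a sketch that assumes the standard lim-sup mean-payoff definition and that $c_{BEH}$ depends on the state and the shield's action; neither detail is stated on the slide.

$$
\mathit{MP}(\rho) = \limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \Big( \lambda \cdot c_{BEH}(s_i, a'_i) + (1 - \lambda) \cdot c_{INT}(a_i, a'_i) \Big), \qquad \lambda \in [0, 1]
$$

With $\lambda = 1$ only the behavior cost counts; with $\lambda = 0$ only interference is penalized.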
![Page 15](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/15.jpg)
Dealing with rush-hour traffic
Controller: Deep Convolutional Q-Network
• 16-dimensional input vector: number of approaching cars, waiting time
• 4 layers (16, 604, 604, 4 nodes)
• Q-learning: $\alpha = 0.001$, $\gamma = 0.95$
• Objective: “Minimize waiting time of two junctions”
Shield
• $c_{BEH}$: size of the maximal queue
• $c_{INT}$: 1 for interference, 0 otherwise
Abstract state: (1, 8, 1, 2)
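The two shield costs from the traffic-light case study are simple to state in code. A minimal sketch, assuming the abstract state lists queue lengths per lane and using hypothetical phase names `NS_green`/`EW_green`; the average over a finite trace only approximates the long-run mean payoff the synthesized shield optimizes.

```python
from typing import List, Sequence, Tuple

Step = Tuple[Sequence[int], str, str]  # (queue lengths, controller action, shield action)

def c_beh(queues: Sequence[int]) -> int:
    # Behavior cost: size of the maximal queue.
    return max(queues)

def c_int(controller_action: str, shield_action: str) -> int:
    # Interference cost: 1 for interference, 0 otherwise.
    return int(controller_action != shield_action)

def weighted_cost(queues: Sequence[int], controller_action: str,
                  shield_action: str, lam: float) -> float:
    return lam * c_beh(queues) + (1 - lam) * c_int(controller_action, shield_action)

def average_cost(trace: List[Step], lam: float) -> float:
    # Average over a finite prefix; the mean payoff takes the long-run limit of such averages.
    return sum(weighted_cost(q, a, b, lam) for q, a, b in trace) / len(trace)

# Hypothetical trace of queue lengths and the phases chosen by controller and shield.
trace = [((1, 8, 1, 2), "NS_green", "EW_green"),   # override: 0.5*8 + 0.5*1 = 4.5
         ((1, 4, 1, 2), "EW_green", "EW_green")]   # no override: 0.5*4 + 0.5*0 = 2.0
print(average_cost(trace, lam=0.5))  # 3.25
```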
![Page 16](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/16.jpg)
Dealing with rush-hour traffic
![Page 17](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/17.jpg)
Outline
Safety Shields (AAAI-18)
Optimal Shields (submission)
Safety Shields for Multi-Agent Systems (ACC-19)
Probabilistic Safety Shields (arXiv)
![Page 18](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/18.jpg)
Safety Shields for Multi-Agent Systems
Task: Enforce a global safety property
1. Quantitative interference costs
• Counting cost function
• Different costs for interferences with different agents
2. Fair shielding
• Do not repeatedly interfere with the same agent
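To make the two requirements concrete, here is a one-step greedy sketch: among the joint actions that satisfy the global safety property, pick the one with the lowest interference cost, where each agent has its own cost and agents that were already overridden cost more. This is a swapped-in heuristic for illustration, not the game-based synthesis of the ACC-19 paper; the fairness penalty, the robots, and the `enter`/`wait` actions are hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

def choose_joint_action(proposed: Tuple[str, ...],
                        options: Sequence[Sequence[str]],
                        is_safe: Callable[[Tuple[str, ...]], bool],
                        agent_cost: Sequence[float],
                        history: List[int],
                        fairness_weight: float = 1.0) -> Tuple[str, ...]:
    best, best_cost = None, float("inf")
    for joint in product(*options):  # enumerate all joint actions
        if not is_safe(joint):       # the global safety property is a hard constraint
            continue
        # Per-agent interference cost, raised for agents that were overridden before (fairness).
        cost = sum(agent_cost[i] + fairness_weight * history[i]
                   for i, (a, p) in enumerate(zip(joint, proposed)) if a != p)
        if cost < best_cost:
            best, best_cost = joint, cost
    if best is None:
        raise RuntimeError("no safe joint action exists")
    for i, (a, p) in enumerate(zip(best, proposed)):
        if a != p:
            history[i] += 1  # remember whom the shield interfered with
    return best

def one_at_a_time(joint: Tuple[str, ...]) -> bool:
    # Hypothetical property: two robots must not enter the shared cell together.
    return joint != ("enter", "enter")

history = [0, 0]
options = [("enter", "wait"), ("enter", "wait")]
first = choose_joint_action(("enter", "enter"), options, one_at_a_time, [1.0, 1.0], history)
second = choose_joint_action(("enter", "enter"), options, one_at_a_time, [1.0, 1.0], history)
print(first, second)  # ('enter', 'wait') ('wait', 'enter')
```

With equal per-agent costs, the fairness term makes the shield alternate which robot it stops instead of always delaying the same one.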
![Page 19](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/19.jpg)
Case Study: Warehouse
![Page 20](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/20.jpg)
Case Study: Warehouse
S. Bharadwaj, R. Bloem, R. Dimitrova, B. Könighofer, and U. Topcu: Synthesis of Minimum-Cost Shields for Multi-agent Systems. ACC-19
![Page 21](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/21.jpg)
Outline
Safety Shields (AAAI-18)
Optimal Shields (submission)
Safety Shields for Multi-Agent Systems (ACC-19)
Probabilistic Safety Shields (arXiv)
![Page 22](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/22.jpg)
Shielding the original Pacman?
State space is huge!
Not realizable!
![Page 23](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/23.jpg)
Learning the Adversary Model
• Each ghost has its individual behaviour
• Observe it, model the behaviour
• Data augmentation techniques: Is PAC-MAN north, south, east, or west?
• Results in an MDP of the environment
• Guaranteed safety w.r.t. a probabilistic temporal logic spec
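A minimal sketch of the adversary-model step, assuming observations are logged as (ghost position, PAC-MAN position, ghost move) triples and reading "Is PAC-MAN north, south, east, or west?" as conditioning each ghost move on PAC-MAN's coarse relative direction. This maximum-likelihood frequency count is an illustration, not the estimator used in the cited work.

```python
from collections import Counter, defaultdict

def relative_direction(ghost, pacman):
    # Coarse context: in which compass direction is PAC-MAN, seen from the ghost?
    # Assumptions: y grows to the north; ties go to the horizontal axis.
    dx, dy = pacman[0] - ghost[0], pacman[1] - ghost[1]
    if abs(dx) >= abs(dy):
        return "east" if dx > 0 else "west"
    return "north" if dy > 0 else "south"

def estimate_ghost_model(observations):
    # Grouping by relative direction pools observations from different maze positions.
    counts = defaultdict(Counter)
    for ghost, pacman, move in observations:
        counts[relative_direction(ghost, pacman)][move] += 1
    # Maximum-likelihood estimate: relative frequency of each move per context.
    return {ctx: {m: n / sum(c.values()) for m, n in c.items()}
            for ctx, c in counts.items()}

obs = [((0, 0), (3, 0), "east"), ((5, 5), (9, 6), "east"),
       ((2, 2), (4, 2), "north"), ((1, 1), (1, 4), "north")]
model = estimate_ghost_model(obs)
print(model["north"])  # {'north': 1.0}
```

The resulting conditional move distributions are the kind of ingredient from which the environment MDP can be assembled.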
![Page 24](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/24.jpg)
Scalability
MDP is huge!
• Finite horizon: safety for a finite number of steps; an infinite horizon may cause large errors anyway
• Piecewise construction: compute the shield for each state independently
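Both ideas can be sketched on an explicit MDP: backward induction gives the maximal probability of staying safe for a bounded number of steps, and the allowed actions of each state are then computed from that state's own successors only. The state names are hypothetical, and the rule "keep an action if its safety value is at least `delta` times the best action's" is an assumption of this sketch.

```python
from typing import Dict, List, Set, Tuple

State = str
Action = str
Trans = Dict[Tuple[State, Action], Dict[State, float]]

def finite_horizon_safety(states: List[State], actions: Dict[State, List[Action]],
                          trans: Trans, unsafe: Set[State], horizon: int) -> Dict[State, float]:
    # Backward induction: maximal probability of avoiding `unsafe` for `horizon` steps.
    v = {s: 0.0 if s in unsafe else 1.0 for s in states}
    for _ in range(horizon):
        v = {
            s: 0.0 if s in unsafe
            else max(sum(p * v[t] for t, p in trans[(s, a)].items()) for a in actions[s])
            for s in states
        }
    return v

def shield_for_state(s: State, actions: Dict[State, List[Action]], trans: Trans,
                     v_next: Dict[State, float], delta: float) -> Set[Action]:
    # Piecewise: one state's allowed actions depend only on its own successors' values.
    q = {a: sum(p * v_next[t] for t, p in trans[(s, a)].items()) for a in actions[s]}
    best = max(q.values())
    return {a for a, val in q.items() if val >= delta * best}

states = ["s0", "s1", "crash"]
actions = {"s0": ["risky", "careful"], "s1": ["stay"]}
trans = {("s0", "risky"): {"crash": 0.3, "s1": 0.7},
         ("s0", "careful"): {"s1": 1.0},
         ("s1", "stay"): {"s1": 1.0}}
v = finite_horizon_safety(states, actions, trans, {"crash"}, horizon=2)
print(shield_for_state("s0", actions, trans, v, delta=0.9))          # {'careful'}
print(sorted(shield_for_state("s0", actions, trans, v, delta=0.6)))  # ['careful', 'risky']
```

Passing values for horizon $H-1$ to `shield_for_state` yields an $H$-step lookahead for the action being chosen.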
![Page 25](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/25.jpg)
Scalability
MDP is huge!
• Independent agents: crash probabilities for different agents are stochastically independent; compute them individually and compose the shields
• Abstractions: adversaries may be far away; neglect adversary positions that are not relevant
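Under the independence assumption, the probability of avoiding every ghost is the product of the per-ghost probabilities, so each factor can be computed in a small per-ghost model and combined afterwards. A minimal sketch with hypothetical numbers; the distance cutoff `radius`, which treats far-away ghosts as irrelevant (probability 1.0), and the `delta` threshold rule are assumptions.

```python
import math
from typing import Dict, List, Set

def abstract_ghosts(safety: List[float], distances: List[int], radius: int) -> List[float]:
    # Abstraction: ghosts farther away than `radius` are neglected (safety probability 1.0).
    return [p if d <= radius else 1.0 for p, d in zip(safety, distances)]

def composed_shield(per_ghost: Dict[str, List[float]], delta: float) -> Set[str]:
    # Independence: P(avoid all ghosts) = product of the individual probabilities.
    # math.prod requires Python 3.8+.
    value = {a: math.prod(ps) for a, ps in per_ghost.items()}
    best = max(value.values())
    return {a for a, v in value.items() if v >= delta * best}

distances = [2, 10]  # hypothetical distances of two ghosts from PAC-MAN
per_ghost = {
    "north": abstract_ghosts([0.9, 0.6], distances, radius=5),  # -> [0.9, 1.0]
    "south": abstract_ghosts([0.5, 0.8], distances, radius=5),  # -> [0.5, 1.0]
}
print(composed_shield(per_ghost, delta=0.8))  # {'north'}
```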
![Page 26](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/26.jpg)
Probabilistic Safety Shield for Pacman
N. Jansen, B. Könighofer, S. Junges, and R. Bloem: Shielded Decision-Making in MDPs. arXiv
![Page 27](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/27.jpg)
Future Work
Safety Shields
Optimal Shields
Safety Shields for Multi-Agent Systems
Probabilistic Safety Shields
Performance in autonomous systems
Shields for CPS; deal with wrong models
Partially observable MDPs
Distributed Shield Synthesis
![Page 28](https://reader035.vdocuments.mx/reader035/viewer/2022070712/5ecc9e8c16d59e6c77630e60/html5/thumbnails/28.jpg)