deep reinforcement learning for green security games with ...reinforcement learning and the classic...
TRANSCRIPT
Deep Reinforcement Learning for Green Security Games with Real-time Information
Yufei Wang1, Ryan Shi2, Lantao Yu3, Yi Wu4,
Rohit Singh5, Lucas Joppa6, Fei Fang2
Peking University1, Carnegie Mellon University2
Stanford University3 , University of California, Berkeley4
World Wide Fund for Nature5, Microsoft Research6
AAAI 19
Green Security Challenges
4/25/2019 3
Motivation
122 rhinos in 2009 1215 rhinos in 2014
Green Security Games
44/25/2019
• Green Security Games model the strategic interaction between law enforcement agencies (defenders) and their opponents (attackers). [Fang, Stone, and Tambe 2015; Fang et al. 2016; Xu et al. 2017]
• However, previous work mainly focuses on computing patrol strategy offline.
Motivation
• Design patrol routes with a limited number of patrol resources.
The effect of Real-time Information
54/25/2019
Motivation
Our contribution
• We propose a game model GSG-I for green security games, that incorporates the vital real-time information.
• We design an efficient algorithm DeDOL that combines deep reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I.
• DeDOL is built upon the Policy Space Response Oracle framework, with two important domain-specific enhancements that improves the training efficiency.
4/25/2019 6
Contribution
Our proposed GSG-I Model • GSG-I: Green Security Games with Real-Time Information
Snares placed by the attacker
Attacker’s footprints
Defender’s footprints
Animal catching prob 𝑷𝒊𝒋.
74/25/2019
GSG-I Model
Attacker’s view
Defender’s view
Approximating Best Response against a fixed opponent with DQN
Up Down Left Right Still
84/25/2019
DQN Strategy
[Mnih et al. 2015]
Double Oracle & Policy Space Response Oracle
4/25/2019 9
DO & PSRO
Parameterized policy Approximate Best Response using Deep RL
PSRO
[McMahan, Gordon, and Blum 2003; Lanctot et al. 2017]
Initial heuristic strategy
• Attacking Tool Placement
104/25/2019
Enhancement on PSRO
Attacker: parameterized heuristic Defender: random sweeping
• Moving Direction
DQN strategy performance
4/25/2019 11
DQN Experiments
DQN strategy demo
DQN patroller against a heuristic poacher on 7x7 Gaussian grid.
DQN poacher against a random sweeping patroller on 7x7 random grid.
124/25/2019
DQN Experiments
Local modes: restrict attacker’s entry point
Ease training
Provide good building block strategy
Global mode with 4 entry points local mode with 1 top-left entry point
134/25/2019
Enhancement on PSRO
The DeDOL algorithm workflow
• Deep-Q Network based Double Oracle enhanced with Local modes
144/25/2019
DeDOL
Generate initial Strategies 𝐺0 ={𝑅𝑆,𝐻𝑒𝑢𝑟𝑖𝑠𝑡𝑖𝑐}
PSRO(Local, Entry Point 1, 𝐺0)
PSRO(Local, Entry Point 2, 𝐺0)
PSRO(Local, Entry Point 𝑛, 𝐺0)
…
Combine all subgames to 𝐺𝑇
𝐺𝑇1
𝐺𝑇2
𝐺𝑇𝑛
PSRO(Global, 𝐺𝑇)
Output optimal defender strategy 𝜎𝑇+𝑘
𝑑
DeDOL Performance
• Vanilla PSRO: no heuristic initial strategy, no local mode• CFR: counter-factual regret minimization• Metric: highest expected utility against a best-response poacher
across all iterations• DeDOL with local mode performs the best!
154/25/2019
Experiments
Summary
• We incorporates the real-time information in Green Security Games, and proposed a new game model.
• We design an efficient algorithm DeDOL to compute the optimal patrol strategy in GSG-I, which is built upon the Double Oracle / Policy Space Response Oracle framework with domain-specific enhancements.
164/25/2019
Motivation
Thank you!
Lantao Yu
Rohit [email protected]
Lucas [email protected]
Yufei Wang
Zheyuan Ryan Shi
174/25/2019The Azure resources are provided by Microsoft for Research AI for Earth award program.