deep reinforcement learning for green security games with ...reinforcement learning and the classic...

16
Deep Reinforcement Learning for Green Security Games with Real-time Information Yufei Wang 1 , Ryan Shi 2 , Lantao Yu 3 , Yi Wu 4 , Rohit Singh 5 , Lucas Joppa 6 , Fei Fang 2 Peking University 1 , Carnegie Mellon University 2 Stanford University 3 , University of California, Berkeley 4 World Wide Fund for Nature 5 , Microsoft Research 6 AAAI 19

Upload: others

Post on 29-Mar-2021

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I. •DeDOL is built

Deep Reinforcement Learning for Green Security Games with Real-time Information

Yufei Wang1, Ryan Shi2, Lantao Yu3, Yi Wu4,

Rohit Singh5, Lucas Joppa6, Fei Fang2

Peking University1, Carnegie Mellon University2

Stanford University3 , University of California, Berkeley4

World Wide Fund for Nature5, Microsoft Research6

AAAI 19

Page 2: Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I. •DeDOL is built

Green Security Challenges

4/25/2019 3

Motivation

122 rhinos in 2009 1215 rhinos in 2014

Page 3: Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I. •DeDOL is built

Green Security Games

44/25/2019

• Green Security Games model the strategic interaction between law enforcement agencies (defenders) and their opponents (attackers). [Fang, Stone, and Tambe 2015; Fang et al. 2016; Xu et al. 2017]

• However, previous work mainly focuses on computing patrol strategy offline.

Motivation

• Design patrol routes with a limited number of patrol resources.

Page 4: Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I. •DeDOL is built

The effect of Real-time Information

54/25/2019

Motivation

Page 5: Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I. •DeDOL is built

Our contribution

• We propose a game model GSG-I for green security games, that incorporates the vital real-time information.

• We design an efficient algorithm DeDOL that combines deep reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I.

• DeDOL is built upon the Policy Space Response Oracle framework, with two important domain-specific enhancements that improves the training efficiency.

4/25/2019 6

Contribution

Page 6: Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I. •DeDOL is built

Our proposed GSG-I Model • GSG-I: Green Security Games with Real-Time Information

Snares placed by the attacker

Attacker’s footprints

Defender’s footprints

Animal catching prob 𝑷𝒊𝒋.

74/25/2019

GSG-I Model

Attacker’s view

Defender’s view

Page 7: Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I. •DeDOL is built

Approximating Best Response against a fixed opponent with DQN

Up Down Left Right Still

84/25/2019

DQN Strategy

[Mnih et al. 2015]

Page 8: Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I. •DeDOL is built

Double Oracle & Policy Space Response Oracle

4/25/2019 9

DO & PSRO

Parameterized policy Approximate Best Response using Deep RL

PSRO

[McMahan, Gordon, and Blum 2003; Lanctot et al. 2017]

Page 9: Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I. •DeDOL is built

Initial heuristic strategy

• Attacking Tool Placement

104/25/2019

Enhancement on PSRO

Attacker: parameterized heuristic Defender: random sweeping

• Moving Direction

Page 10: Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I. •DeDOL is built

DQN strategy performance

4/25/2019 11

DQN Experiments

Page 11: Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I. •DeDOL is built

DQN strategy demo

DQN patroller against a heuristic poacher on 7x7 Gaussian grid.

DQN poacher against a random sweeping patroller on 7x7 random grid.

124/25/2019

DQN Experiments

Page 12: Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I. •DeDOL is built

Local modes: restrict attacker’s entry point

Ease training

Provide good building block strategy

Global mode with 4 entry points local mode with 1 top-left entry point

134/25/2019

Enhancement on PSRO

Page 13: Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I. •DeDOL is built

The DeDOL algorithm workflow

• Deep-Q Network based Double Oracle enhanced with Local modes

144/25/2019

DeDOL

Generate initial Strategies 𝐺0 ={𝑅𝑆,𝐻𝑒𝑢𝑟𝑖𝑠𝑡𝑖𝑐}

PSRO(Local, Entry Point 1, 𝐺0)

PSRO(Local, Entry Point 2, 𝐺0)

PSRO(Local, Entry Point 𝑛, 𝐺0)

Combine all subgames to 𝐺𝑇

𝐺𝑇1

𝐺𝑇2

𝐺𝑇𝑛

PSRO(Global, 𝐺𝑇)

Output optimal defender strategy 𝜎𝑇+𝑘

𝑑

Page 14: Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I. •DeDOL is built

DeDOL Performance

• Vanilla PSRO: no heuristic initial strategy, no local mode• CFR: counter-factual regret minimization• Metric: highest expected utility against a best-response poacher

across all iterations• DeDOL with local mode performs the best!

154/25/2019

Experiments

Page 15: Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I. •DeDOL is built

Summary

• We incorporates the real-time information in Green Security Games, and proposed a new game model.

• We design an efficient algorithm DeDOL to compute the optimal patrol strategy in GSG-I, which is built upon the Double Oracle / Policy Space Response Oracle framework with domain-specific enhancements.

164/25/2019

Motivation

Page 16: Deep Reinforcement Learning for Green Security Games with ...reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I. •DeDOL is built

Thank you!

Lantao Yu

[email protected]

Fei [email protected]

Yi [email protected]

Rohit [email protected]

Lucas [email protected]

Yufei Wang

[email protected]

Zheyuan Ryan Shi

[email protected]

174/25/2019The Azure resources are provided by Microsoft for Research AI for Earth award program.