deep reinforcement learning for green security games with ...reinforcement learning and the classic...

Deep Reinforcement Learning for Green Security Games with Real-time Information

Yufei Wang1, Ryan Shi2, Lantao Yu3, Yi Wu4,

Rohit Singh5, Lucas Joppa6, Fei Fang2

Peking University1, Carnegie Mellon University2

Stanford University3 , University of California, Berkeley4

World Wide Fund for Nature5, Microsoft Research6

AAAI 19

Green Security Challenges

4/25/2019 3

Motivation

122 rhinos in 2009 1215 rhinos in 2014

Green Security Games

44/25/2019

• Green Security Games model the strategic interaction between law enforcement agencies (defenders) and their opponents (attackers). [Fang, Stone, and Tambe 2015; Fang et al. 2016; Xu et al. 2017]

• However, previous work mainly focuses on computing patrol strategy offline.

Motivation

• Design patrol routes with a limited number of patrol resources.

The effect of Real-time Information

54/25/2019

Motivation

Our contribution

• We propose a game model GSG-I for green security games, that incorporates the vital real-time information.

• We design an efficient algorithm DeDOL that combines deep reinforcement learning and the classic Double Oracle framework in security games to solve zero-sum GSG-I.

• DeDOL is built upon the Policy Space Response Oracle framework, with two important domain-specific enhancements that improves the training efficiency.

4/25/2019 6

Contribution

Our proposed GSG-I Model • GSG-I: Green Security Games with Real-Time Information

Snares placed by the attacker

Attacker’s footprints

Defender’s footprints

Animal catching prob 𝑷𝒊𝒋.

74/25/2019

GSG-I Model

Attacker’s view

Defender’s view

Approximating Best Response against a fixed opponent with DQN

Up Down Left Right Still

84/25/2019

DQN Strategy

[Mnih et al. 2015]

Double Oracle & Policy Space Response Oracle

4/25/2019 9

DO & PSRO

Parameterized policy Approximate Best Response using Deep RL

PSRO

[McMahan, Gordon, and Blum 2003; Lanctot et al. 2017]

Initial heuristic strategy

• Attacking Tool Placement

104/25/2019

Enhancement on PSRO

Attacker: parameterized heuristic Defender: random sweeping

• Moving Direction

DQN strategy performance

4/25/2019 11

DQN Experiments

DQN strategy demo

DQN patroller against a heuristic poacher on 7x7 Gaussian grid.

DQN poacher against a random sweeping patroller on 7x7 random grid.

124/25/2019

DQN Experiments

Local modes: restrict attacker’s entry point

Ease training

Provide good building block strategy

Global mode with 4 entry points local mode with 1 top-left entry point

134/25/2019

Enhancement on PSRO

The DeDOL algorithm workflow

• Deep-Q Network based Double Oracle enhanced with Local modes

144/25/2019

DeDOL

Generate initial Strategies 𝐺0 ={𝑅𝑆,𝐻𝑒𝑢𝑟𝑖𝑠𝑡𝑖𝑐}

PSRO(Local, Entry Point 1, 𝐺0)

PSRO(Local, Entry Point 2, 𝐺0)

PSRO(Local, Entry Point 𝑛, 𝐺0)

…

Combine all subgames to 𝐺𝑇

𝐺𝑇1

𝐺𝑇2

𝐺𝑇𝑛

PSRO(Global, 𝐺𝑇)

Output optimal defender strategy 𝜎𝑇+𝑘

𝑑

DeDOL Performance

• Vanilla PSRO: no heuristic initial strategy, no local mode• CFR: counter-factual regret minimization• Metric: highest expected utility against a best-response poacher

across all iterations• DeDOL with local mode performs the best!

154/25/2019

Experiments

Summary

• We incorporates the real-time information in Green Security Games, and proposed a new game model.

• We design an efficient algorithm DeDOL to compute the optimal patrol strategy in GSG-I, which is built upon the Double Oracle / Policy Space Response Oracle framework with domain-specific enhancements.

164/25/2019

Motivation

Thank you!

Lantao Yu

[email protected]

Fei [email protected]

Yi [email protected]

Rohit [email protected]

Lucas [email protected]

Yufei Wang

[email protected]

Zheyuan Ryan Shi

[email protected]

174/25/2019The Azure resources are provided by Microsoft for Research AI for Earth award program.

deep reinforcement learning for green security games with ...reinforcement learning and the classic...

Documents