Generative Art – Made with Unity
The Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning
Arthur Juliani, Ahmed Khalifa*, Vincent-Pierre Berges, Jonathan Harper, Hunter Henry, Adam Crespi, Julian Togelius*, Danny Lange
Montezuma’s Revenge - The most notorious benchmark?
● Sparse rewards
● Intuitive conceptual entities
● Requires planning and fine motor skills
● Hard to make progress!
(Taken from article by The Verge)
Montezuma’s Revenge - Weaknesses
● Deterministic
● Single solution
● Low visual fidelity
● Hard to make progress!
Montezuma’s Revenge - Solved?
● Solved with Demonstrations
  ○ (Aytar et al. 2018)
● Solved with Curiosity
  ○ (Burda et al. 2018)
● Solved with Go-Explore
  ○ (Ecoffet et al. 2018)
● (Ecoffet et al 2018)
Obstacle Tower Environment
● Vision
  ○ High-fidelity 3D visuals
  ○ Realtime lighting/shadows
● Control
  ○ Platforming puzzles
● Planning
  ○ Complex floor layouts
● Generalization
  ○ Procedural Floors, Rooms, and Visuals
Obstacle Tower Environment
● Each episode a new tower
  ○ Each tower filled with 25 floors
    ■ Each floor filled with rooms
      ● Each room filled with obstacles and puzzles!
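The tower, floor, room nesting described above can be pictured as a small data structure. This is only a sketch: the class names, the field names, and the `rooms_per_floor` parameter are hypothetical, and the slides give no number of rooms per floor. Only the count of 25 floors comes from the slide.

```python
from dataclasses import dataclass, field
from typing import List

FLOORS_PER_TOWER = 25  # stated on the slide


@dataclass
class Room:
    # Hypothetical string labels; the slides do not enumerate obstacle types.
    obstacles: List[str] = field(default_factory=list)
    puzzles: List[str] = field(default_factory=list)


@dataclass
class Floor:
    index: int
    rooms: List[Room] = field(default_factory=list)


@dataclass
class Tower:
    seed: int  # assumption: one seed identifies one generated tower
    floors: List[Floor] = field(default_factory=list)


def empty_tower(seed: int, rooms_per_floor: int = 1) -> Tower:
    """Build the nesting only; no procedural generation happens here."""
    floors = [
        Floor(index=i, rooms=[Room() for _ in range(rooms_per_floor)])
        for i in range(FLOORS_PER_TOWER)
    ]
    return Tower(seed=seed, floors=floors)
```

Calling `empty_tower(seed=0, rooms_per_floor=3)` yields a tower with 25 floors of three empty rooms each.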
Procedural Generation
● Floors use generative grammar
● Rooms use probabilistic templates
  ○ Legend: P - Pit, G - Ground, K - Key, … etc
  ○ Example templates:

    Po PX PX Po
    PX GK GX PX
    PX GX GX PX
    Po PX PX Po

    GX GX GX
    GX GX GX
    GX GX GX

    To To To To
    To GB Go To
    To Go GT To
    To To To To

    PX GX PX
    GX hX GX
    PX GX PX

● Lighting and visuals vary
  ○ Visual Themes:
    ■ Ancient
    ■ Modern
    ■ Industrial
  ○ Lighting:
    ■ Intensity
    ■ Angle
    ■ Color
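As a rough illustration of how a room template like the ones above might be read, the sketch below parses one grid into two-character cells. It assumes each cell is a pair of single-letter codes (a base tile followed by a modifier), which is an interpretation of the slide and not something it states. The function names are hypothetical, only the legend entries given on the slide (P, G, K) are included, and the probabilistic part of the templates is not modeled.

```python
from collections import Counter
from typing import List, Tuple

# One of the template grids from the slide, one row per line.
TEMPLATE = """\
Po PX PX Po
PX GK GX PX
PX GX GX PX
Po PX PX Po
"""

# Only the codes the slide's legend spells out; the rest are left undefined.
LEGEND = {"P": "Pit", "G": "Ground", "K": "Key"}

Cell = Tuple[str, str]  # (first letter, second letter); assumed meaning: base, modifier


def parse_template(text: str) -> List[List[Cell]]:
    """Split a whitespace-separated template into a rectangular grid of cells."""
    grid: List[List[Cell]] = []
    for line in text.strip().splitlines():
        row = []
        for cell in line.split():
            if len(cell) != 2:
                raise ValueError(f"expected a two-character cell, got {cell!r}")
            row.append((cell[0], cell[1]))
        grid.append(row)
    if len({len(row) for row in grid}) != 1:
        raise ValueError("template rows have different lengths")
    return grid


def first_letter_counts(grid: List[List[Cell]]) -> Counter:
    """Count cells by their first letter (e.g. how many start with P or G)."""
    return Counter(first for row in grid for first, _ in row)
```

For the grid above, `first_letter_counts(parse_template(TEMPLATE))` reports 12 cells starting with `P` and 4 starting with `G`, and the cell at row 1, column 1 is `("G", "K")`.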
Obstacle Tower Environment
● Default / Retro
● Observations
  ○ 84x84 or 168x168 image
  ○ Keys collected and time left
● Actions
  ○ One of 54 actions
● Rewards
  ○ +1: Solve a floor
  ○ +0.1: Open a door
  ○ +0.1: Pick up a key
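The action count and the reward values above can be checked with a few lines of arithmetic. The slide only says "one of 54 actions"; the branch sizes below (3, 3, 2, 3) are an assumption based on the multi-discrete action space described in the accompanying paper (forward/backward movement, camera rotation, jump, left/right movement, each with a no-op), and the branch order is arbitrary here. The event names are hypothetical labels for the three rewards on the slide.

```python
from itertools import product
from typing import Iterable

# Assumed branch sizes; their product gives the flattened action count.
ACTION_BRANCHES = (3, 3, 2, 3)
FLAT_ACTIONS = list(product(*(range(n) for n in ACTION_BRANCHES)))

# Reward values from the slide; the keys are hypothetical event names.
REWARDS = {
    "solve_floor": 1.0,
    "open_door": 0.1,
    "pick_up_key": 0.1,
}


def episode_return(events: Iterable[str]) -> float:
    """Sum the rewards of a sequence of events (undiscounted)."""
    return sum(REWARDS[event] for event in events)
```

With these branch sizes `len(FLAT_ACTIONS)` is 54, and an episode consisting of one key pickup, one door, and one solved floor returns about 1.2.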
Obstacle Tower Evaluation
● Training: Fixed, Varied
● Testing: No Generalization, Weak Generalization, Strong Generalization
● x 100
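One way to read the three testing conditions above is as a choice of which tower seeds and visual themes the agent is tested on. The sketch below encodes that reading and is not taken from the slides: it assumes "No Generalization" reuses the training seeds and themes, "Weak" uses held-out seeds with the training themes, and "Strong" uses held-out seeds with a held-out theme. The names `EvalSetup` and `make_test_setup`, the seed ranges, and the choice of held-out theme are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class EvalSetup:
    seeds: Tuple[int, ...]
    themes: Tuple[str, ...]


def make_test_setup(
    condition: str,
    train_seeds: Iterable[int],
    train_themes: Iterable[str],
    held_out_seeds: Iterable[int],
    held_out_themes: Iterable[str],
) -> EvalSetup:
    """Pick test seeds and themes for one (assumed) generalization condition."""
    if condition == "no":
        return EvalSetup(tuple(train_seeds), tuple(train_themes))
    if condition == "weak":
        return EvalSetup(tuple(held_out_seeds), tuple(train_themes))
    if condition == "strong":
        return EvalSetup(tuple(held_out_seeds), tuple(held_out_themes))
    raise ValueError(f"unknown condition: {condition!r}")


# Hypothetical usage: 100 training towers, 5 held-out towers, one held-out theme.
weak = make_test_setup(
    "weak", range(100), ["Ancient", "Modern"], range(101, 106), ["Industrial"]
)
```

Under this reading, the Fixed and Varied training conditions would differ only in how many seeds are passed as `train_seeds`.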
Obstacle Tower Human Results
● 15 human playtesters
● 5 in each condition
● 5 mins “training”

Condition               Train        Test         Test (Max)
No Generalization       15.2 (2.9)   15.2 (2.9)   22
Weak Generalization     12.3 (2.9)   15.6 (3.5)   21
Strong Generalization   12 (6.8)     9.3 (3.1)    20

Average floor completion rates.
Obstacle Tower - Agent Training Results
● Algorithms
  ○ Rainbow (Dopamine)
  ○ PPO (OpenAI Baselines)
● Conditions
  ○ Fixed
  ○ Varied
Obstacle Tower - Agent Test Results

Condition               PPO (F)     PPO (V)     RNB (F)     RNB (V)
No Generalization       5.0 (0.0)   1.0 (0.0)   5.0 (0.0)   5.0 (0.0)
Weak Generalization     1.2 (0.4)   0.8 (0.4)   0.6 (0.8)   3.2 (1.1)
Strong Generalization   0.6 (0.8)   0.6 (0.5)   0.0 (0.0)   1.6 (0.5)

Average floor completion rates. RNB = Rainbow; (F) = Fixed, (V) = Varied training condition.
Obstacle Tower Research Avenues
● World-Model Learning
  ○ Ha & Schmidhuber (2018)
● Meta-Learning
  ○ Finn et al. (2017)
● Intrinsic Motivation
  ○ Pathak et al. (2017)
● Hierarchical Control
  ○ Merel et al. (2018)
Development in 2019
● Spring - Version 2.0
  ○ 100 floors per tower
  ○ More visual themes
  ○ More obstacles
  ○ Enemies!
● Summer - Version 3.0
  ○ Open Source release
  ○ Custom Rewards
  ○ Custom Observation Spaces
  ○ Custom Floors & Rooms