Raia Hadsell · raiahadsell.com/.../oxford_cslecture_raia_hadsell_2017.pdf · 2017-06-01
Learning in sequential environments
Raia Hadsell, Staff Research Scientist, DeepMind
raiahadsell.com
Scaling deep reinforcement learning towards the real world:
part 1: learning sequential tasks without forgetting
part 2: learning to navigate in complex worlds
Raia Hadsell 2017
Reinforcement Learning
[Diagram: loop between Environment and Agent, with arrows labelled OBSERVATIONS, ACTIONS, and REWARD?]
Value Iteration
○ Maximizing Qπ(s,a) over possible policies gives the optimal action-value function Q*(s,a), which satisfies the Bellman equation.
○ Basic idea:
■ Approximate the optimal action-value function
■ Apply the Bellman equation as an iterative update
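As a hedged illustration of the two steps above, here is a minimal tabular sketch in plain Python. It assumes a tiny made-up MDP (the `P` and `R` tables and the function name are hypothetical, not from the lecture) and repeatedly applies the Bellman backup Q(s,a) ← R(s,a) + γ Σ_s' P(s'|s,a) max_a' Q(s',a').

```python
def q_value_iteration(P, R, gamma=0.9, iters=200):
    """Repeat the Bellman backup on a small tabular MDP.

    P[s][a] is a dict {next_state: probability}; R[s][a] is the expected
    immediate reward. Both structures are illustrative assumptions.
    """
    n_states, n_actions = len(R), len(R[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        V = [max(row) for row in Q]  # V(s') = max over a' of Q(s', a')
        Q = [[R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
              for a in range(n_actions)]
             for s in range(n_states)]
    return Q

# Hypothetical 2-state, 2-action MDP: action 0 leads to state 0,
# action 1 leads to state 1; only (state 1, action 1) pays reward 1.
P = [[{0: 1.0}, {1: 1.0}], [{0: 1.0}, {1: 1.0}]]
R = [[0.0, 0.0], [0.0, 1.0]]
Q = q_value_iteration(P, R)
greedy_policy = [row.index(max(row)) for row in Q]  # best action per state
```

In this toy problem the greedy policy picks action 1 in both states, because reaching and staying in state 1 is the only source of reward.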
End-to-End Reinforcement Learning
○ Use a neural network for Q(s,a; θ)
○ Train end-to-end from raw pixels
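One common way to train such a network (an assumption here, since the slide does not spell out the loss) is to regress Q(s,a; θ) toward a one-step bootstrapped target. The sketch below shows only the target and the squared error for a single transition; the Q-values are plain numbers standing in for network outputs, and the function names are hypothetical.

```python
def td_target(reward, next_q_values, gamma=0.99, terminal=False):
    """One-step target r + gamma * max_a' Q(s', a'); no bootstrap at episode end."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)

def squared_td_error(q_sa, target):
    """Squared difference between the current estimate Q(s, a) and the target."""
    return (target - q_sa) ** 2

# Hypothetical transition: reward 1.0, two actions available in the next state.
target = td_target(1.0, [0.5, 2.0], gamma=0.5)
loss = squared_td_error(1.5, target)
```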
but.. a network for every task?
one network for all?
Catastrophic forgetting
● Well-known phenomenon
● Especially severe in deep RL
https://www.youtube.com/watch?v=Fh_zNpdc0Xs
https://www.youtube.com/watch?v=yk_sW4x6zb0
https://www.youtube.com/watch?v=V4oT1Ei-8_k
https://www.youtube.com/watch?v=LjFGy4BxOL8
An illustration
[Figure: Task A and Task B regions in parameter space, the Task A solution θ*_A, and the paths taken by SGD, EWC, and L2]
Elastic Weight Consolidation
Elastic Weight Consolidation (EWC): constrain important parameters to stay close to their old values.
Continual learning in the brain: synaptic consolidation reduces the plasticity of synapses that are vital to previous tasks.
Implement the constraint as a quadratic penalty that is applied while training on B, but not uniformly: it should be greater for the parameters that are important to Task A.
The posterior distribution contains exactly this information, but is intractable.
Estimate the posterior with a Gaussian.
Mean: parameter vector θ*_A
Diagonal precision given by an approximation of the Fisher information F.
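The penalty described above can be sketched as follows, using the form given in the PNAS paper cited later in these slides: L(θ) = L_B(θ) + Σ_i (λ/2) F_i (θ_i − θ*_A,i)². The weight `lam` (λ), the function names, and the toy numbers are assumptions for illustration. The Fisher diagonal is approximated here by the mean squared per-example gradient of the log-likelihood, and a uniform L2 penalty is included for comparison.

```python
def diagonal_fisher(per_example_grads):
    """Diagonal Fisher estimate: mean of squared log-likelihood gradients.

    per_example_grads: one gradient vector (list of floats) per example,
    assumed to be evaluated at the Task A solution.
    """
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]

def ewc_penalty(theta, theta_star_a, fisher, lam=1.0):
    """(lam / 2) * sum_i F_i * (theta_i - theta*_A,i)^2."""
    return 0.5 * lam * sum(f * (t - t0) ** 2
                           for f, t, t0 in zip(fisher, theta, theta_star_a))

def l2_penalty(theta, theta_star_a, lam=1.0):
    """Uniform quadratic penalty: every parameter is treated as equally important."""
    return 0.5 * lam * sum((t - t0) ** 2 for t, t0 in zip(theta, theta_star_a))

# Hypothetical numbers: parameter 0 matters for Task A, parameter 1 barely does.
grads = [[2.0, 0.1], [-2.0, -0.1]]
fisher = diagonal_fisher(grads)
theta_star_a = [1.0, 1.0]
move_important = ewc_penalty([2.0, 1.0], theta_star_a, fisher)
move_unimportant = ewc_penalty([1.0, 2.0], theta_star_a, fisher)
```

Moving the parameter with the large Fisher value costs far more than moving the other one by the same amount, whereas `l2_penalty` charges both moves equally.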
Experiment: Permuted MNIST
Random, fixed permutations of the MNIST dataset.
Train a multilayer, fully-connected network with ReLUs until convergence.
We compare SGD, L2 regularisation, and EWC.
[Figure: example permutations Perm A, Perm B, Perm C]
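A minimal sketch of how such tasks could be generated, assuming each task applies one fixed pixel ordering to every flattened 28x28 image. The seeds, function names, and the stand-in image are illustrative assumptions, not details from the lecture.

```python
import random

def make_permutation(n_pixels, seed):
    """Draw one random pixel ordering; the seed fixes it for the whole task."""
    order = list(range(n_pixels))
    random.Random(seed).shuffle(order)
    return order

def permute_image(flat_image, order):
    """Apply the task's fixed pixel ordering to a flattened image."""
    return [flat_image[i] for i in order]

N_PIXELS = 28 * 28  # MNIST images are 28x28
# Three tasks (Perm A, B, C); the seeds are arbitrary illustrative choices.
tasks = {name: make_permutation(N_PIXELS, seed)
         for name, seed in [("A", 0), ("B", 1), ("C", 2)]}

image = list(range(N_PIXELS))  # stand-in for one flattened image
image_a = permute_image(image, tasks["A"])
```

Because the same ordering is reused for every image in a task, each task is equally hard for a fully-connected network, but the input statistics differ between tasks.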
Let’s try something harder...
Sequential reinforcement learning tasks (10 Atari games)
Random ordering with extended game play on each task, multiple returns
Unknown task boundaries
Regular testing of all 10 games
Single network with fixed capacity
Experiment: Atari 10
Forget-Me-Not [1] allows labeling of data segments, used for
● EWC regularisation
● Task-specific replay buffers used for DDQN [2]
● Task-specific bias and gains at each network layer
The Fisher information is estimated at each task boundary and the EWC penalty is updated.
[1] The forget-me-not process, Milan et al., NIPS 2016
[2] Deep reinforcement learning with double Q-learning, van Hasselt et al., AAAI 2016
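To make the "task-specific bias and gains" idea concrete, here is a small sketch of one shared layer whose output is rescaled and shifted per task. Where exactly the gain and bias are applied, the ReLU, and the task names are all assumptions; the slide only states that each layer has task-specific bias and gains.

```python
def layer_with_task_gain_bias(x, W, gains, biases, task_id):
    """Shared weights W, then a per-task elementwise gain and bias, then ReLU."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W]  # shared W @ x
    g, b = gains[task_id], biases[task_id]
    return [max(0.0, gi * p + bi) for gi, p, bi in zip(g, pre, b)]

# Hypothetical 2-unit layer shared by two tasks.
W = [[1.0, 0.0], [0.0, 1.0]]
gains = {"task_0": [2.0, 1.0], "task_1": [1.0, -1.0]}
biases = {"task_0": [0.0, 0.0], "task_1": [0.0, 0.5]}
x = [1.0, -2.0]
out_0 = layer_with_task_gain_bias(x, W, gains, biases, "task_0")
out_1 = layer_with_task_gain_bias(x, W, gains, biases, "task_1")
```

The shared weights carry most of the capacity, while the small per-task vectors let the same features be used differently once the task label is known.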
https://www.youtube.com/watch?v=Ry2WRcnwsYU
James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, Raia Hadsell
Overcoming catastrophic forgetting in neural networks
PNAS 2017
arxiv.org/abs/1612.00796
Learning to navigate in complex mazes
Navigation mazes
Game episode:
1. Random start
2. Find the goal (+10)
3. Teleport randomly
4. Re-find the goal (+10)
5. Repeat (limited time)
Variants:
● Static maze, static goal
● Static maze, random goal
● Random maze
Observations: RGB, velocity
Actions: 8
[Figure: example mazes with reward markers +10 and +1, labelled 3600, 10800, and 3600 steps/episode]
Why is learning navigation via reinforcement learning hard?
Given: Sparse rewards
Wanted: Spatial knowledge
[Figure: two plots (axis scale 1e7), annotated "The vast and meaningless silence of an agent exploring..." and "I have been here before! I know where to go!"]
1. Accelerate reinforcement learning through auxiliary losses
➔ Stable gradients help learning, even if unrelated to reward
2. Drive spatial knowledge through choice of auxiliary tasks:
● Depth prediction
● Loop closure prediction
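For loop closure prediction, the agent needs a training label saying whether it has returned to a place it visited earlier. The slides do not define the label, so the sketch below is one plausible definition under stated assumptions: a step is labelled 1 if the agent is within `radius` of an earlier position that it has since moved away from. The function name, `radius`, and the example path are hypothetical.

```python
import math

def loop_closure_labels(positions, radius=1.0):
    """Label step t with 1 if the agent is back near an earlier position.

    An earlier position only counts once the agent has moved farther than
    `radius` away from it at some point in between, so that consecutive
    nearby steps are not counted as loops.
    """
    labels = []
    for t, p in enumerate(positions):
        label = 0
        for k in range(t):
            left = any(math.dist(positions[k], q) > radius
                       for q in positions[k + 1:t])
            if left and math.dist(positions[k], p) <= radius:
                label = 1
                break
        labels.append(label)
    return labels

# Hypothetical path: walk around a square and come back near the start.
path = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0.5)]
labels = loop_closure_labels(path)
```

Only the last step is labelled as a loop closure here, since it is the first time the agent re-enters the neighbourhood of a position it had left. The triple loop is deliberately simple and would be too slow for long episodes.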
Nav agent ingredients:
1. Convolutional encoder and RGB inputs
2. Stacked LSTM
3. Additional inputs (reward, action, and velocity)
4. RL: Asynchronous advantage actor critic (A3C), Mnih et al. (2016)
5. Aux task 1: Depth predictors
6. Aux task 2: Loop closure predictor
7. For analysis: Position decoder
[Diagram: encoder (enc) taking x_t, with r_{t-1} and {v_t, a_{t-1}} as additional inputs, and heads Depth (D1), Depth (D2), Loop (L), and Position]
More details (the equations on these slides were not recovered):
● policy gradient
● depth prediction from visual features
● depth prediction from LSTM features
● loop prediction from LSTM features
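The four terms listed above are typically combined into one training objective. The sketch below assumes a weighted sum of the RL loss and the three auxiliary losses, with depth treated as classification over discretised depth bins and loop closure as a yes/no prediction. The weights `beta_d1`, `beta_d2`, `beta_l`, the loss forms, and all numbers are assumptions, since the slide equations are missing.

```python
import math

def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target_index])

def binary_cross_entropy(p, label):
    """Loss for a yes/no prediction such as loop closure."""
    return -math.log(p) if label == 1 else -math.log(1.0 - p)

def total_loss(rl_loss, depth1_loss, depth2_loss, loop_loss,
               beta_d1=1.0, beta_d2=1.0, beta_l=1.0):
    """RL loss plus weighted auxiliary losses (the weights are assumed)."""
    return (rl_loss + beta_d1 * depth1_loss + beta_d2 * depth2_loss
            + beta_l * loop_loss)

# Hypothetical predictions for one time step.
d1 = cross_entropy([0.1, 0.7, 0.2], target_index=1)  # depth from visual features
d2 = cross_entropy([0.2, 0.5, 0.3], target_index=1)  # depth from LSTM features
lp = binary_cross_entropy(0.9, label=1)              # loop from LSTM features
loss = total_loss(rl_loss=0.5, depth1_loss=d1, depth2_loss=d2, loop_loss=lp)
```

Setting all three weights to zero recovers the plain RL objective, which is one way to compare agents with and without auxiliary tasks.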
Experiments
a. FF A3C
b. LSTM A3C
c. Nav A3C
d. Nav A3C +D1D2L
[Diagrams: (a) and (b) take x_t only; (c) and (d) also take r_{t-1} and {v_t, a_{t-1}}; (d) adds the Depth (D1), Depth (D2), and Loop (L) heads]
Results on large maze with static goal
[Figure: maze with reward markers +10 and +1]
https://www.youtube.com/watch?v=zHhbypmKaj0
Should depth be an input? Or a target?
[Diagrams: one agent takes rgbd_t, r_{t-1}, {v_t, a_{t-1}} as input; the other takes rgb_t, r_{t-1}, {v_t, a_{t-1}} and predicts Depth (D1) and Depth (D2)]
Answer: the dense, non-noisy gradients from depth as a target are more helpful
Results with Random Goal locations
Is the agent remembering the goal location?
● Mean time to first goal find of episode: 14.0 sec
● Mean time to subsequent goal finds: 7.2 sec
● Not as impressive for large mazes: 15.4 sec vs 15.0 sec.
[Figure panels: small, large]
Latency to goal (as the agent returns)
● Trajectories of the Nav A3C+D+L agent in the I-maze and random goal maze over the course of one episode
● Value function and goal finding (red lines) are shown
Position decoding
● Trajectories of the Nav A3C+D+L agent in the random goal maze
● Position likelihoods are overlaid (predicted from LSTM hiddens)
● Initial uncertainty gives way to accurate position estimation.
[Diagram: agent with inputs x_t, r_{t-1}, {v_t, a_{t-1}} and heads L, D1, D2, and Position]
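A position decoder of this kind can be sketched as a readout from the hidden-state vector to a likelihood over discretised maze cells. The slides only say that position is predicted from the LSTM hiddens, so the linear-plus-softmax form, the function names, and the toy weights below are assumptions.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def decode_position(hidden, weights, biases):
    """Linear readout from a hidden-state vector to a likelihood per maze cell.

    weights[c] is the weight vector for cell c. In practice these would be
    fitted on recorded (hidden state, true position) pairs.
    """
    scores = [sum(w * h for w, h in zip(weights[c], hidden)) + biases[c]
              for c in range(len(weights))]
    return softmax(scores)

# Hypothetical 3-dimensional hidden state and 3 maze cells.
hidden = [0.5, -1.0, 2.0]
weights = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
biases = [0.0, 0.0, 0.0]
likelihoods = decode_position(hidden, weights, biases)
best_cell = likelihoods.index(max(likelihoods))
```

A flat likelihood corresponds to the initial uncertainty mentioned above, and a peaked one to a confident position estimate.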
Results in random mazes
[Figure panels: small, large]
https://www.youtube.com/watch?v=EKXQAjoNdGM
Thank you!
raiahadsell.com
Piotr Mirowski, Razvan Pascanu
Fabio, Ross, Andy, Hubert, Laurent, Koray, Dharsh, Misha, Andrea
Learning to navigate in complex environments
ICLR 2017
arxiv.org/abs/1611.03673