Lab 6-1: Q Network - GitHub Pages · 2017-10-02 · Lab 6-1: Q Network Reinforcement Learning with...
TRANSCRIPT
![Page 2: Lab 6-1: Q Network - GitHub Pages · 2017-10-02 · Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim State(0~15) as input](https://reader034.vdocuments.mx/reader034/viewer/2022042021/5e77e305e5c7e55068628b56/html5/thumbnails/2.jpg)
State(0~15) as input
(1) s (state 7) → (2) Ws
![Page 3: Lab 6-1: Q Network - GitHub Pages · 2017-10-02 · Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim State(0~15) as input](https://reader034.vdocuments.mx/reader034/viewer/2022042021/5e77e305e5c7e55068628b56/html5/thumbnails/3.jpg)
State(0~15) as input
(1) s (one-hot state 7) → (2) Ws
![Page 4: Lab 6-1: Q Network - GitHub Pages · 2017-10-02 · Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim State(0~15) as input](https://reader034.vdocuments.mx/reader034/viewer/2022042021/5e77e305e5c7e55068628b56/html5/thumbnails/4.jpg)
np.identity
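A minimal sketch of the one-hot encoding, assuming the 16 states are numbered 0 to 15 and that NumPy's `np.identity` is the function the slide refers to. The helper name `one_hot` and the `num_states` parameter are illustrative, not taken from the slides.

```python
import numpy as np

def one_hot(state, num_states=16):
    """Return a 1 x num_states row vector with a 1 at index `state`."""
    # Slicing with state:state+1 (instead of indexing with state)
    # keeps the result two-dimensional, so it can be multiplied by W directly.
    return np.identity(num_states)[state:state + 1]

x = one_hot(7)
print(x.shape)     # (1, 16)
print(x.argmax())  # 7
```

Row 7 of the 16x16 identity matrix is exactly the one-hot vector for state 7, which is why no explicit loop or assignment is needed.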
![Page 5: Lab 6-1: Q Network - GitHub Pages · 2017-10-02 · Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim State(0~15) as input](https://reader034.vdocuments.mx/reader034/viewer/2022042021/5e77e305e5c7e55068628b56/html5/thumbnails/5.jpg)
State (0~15) as input
(1) s (one-hot state 7) → (2) Ws
![Page 6: Lab 6-1: Q Network - GitHub Pages · 2017-10-02 · Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim State(0~15) as input](https://reader034.vdocuments.mx/reader034/viewer/2022042021/5e77e305e5c7e55068628b56/html5/thumbnails/6.jpg)
Q-Network training (Network construction)
(1) s → (2) Ws
![Page 7: Lab 6-1: Q Network - GitHub Pages · 2017-10-02 · Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim State(0~15) as input](https://reader034.vdocuments.mx/reader034/viewer/2022042021/5e77e305e5c7e55068628b56/html5/thumbnails/7.jpg)
Q-Network training (linear regression)
(1) s → (2) Ws
$y = r + \gamma \max Q(s')$
$\mathrm{cost}(W) = (Ws - y)^2$
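The regression view above can be sketched in plain NumPy. This is an illustration under stated assumptions, not the lab's TensorFlow code: the discount factor, learning rate, weight initialization, and the sample transition are all hypothetical, and the target `y` is built by copying the current prediction and overwriting only the entry of the action taken.

```python
import numpy as np

num_states, num_actions = 16, 4
gamma = 0.99         # discount factor (assumed value)
learning_rate = 0.1  # gradient step size (assumed value)

rng = np.random.default_rng(0)
W = rng.uniform(0.0, 0.01, size=(num_states, num_actions))  # assumed init

def one_hot(state):
    return np.identity(num_states)[state:state + 1]  # shape (1, 16)

def q_values(state, W):
    return one_hot(state) @ W                         # Ws, shape (1, 4)

def cost(W, state, y):
    return float(np.sum((q_values(state, W) - y) ** 2))

# One hypothetical transition (s, a, r, s').
s, a, r, s_next = 7, 2, 1.0, 11

# Target: current prediction, with the taken action's entry replaced by
# y = r + gamma * max Q(s').
y = q_values(s, W).copy()
y[0, a] = r + gamma * np.max(q_values(s_next, W))

before = cost(W, s, y)
# Gradient of sum((sW - y)^2) with respect to W is 2 * s^T (sW - y).
grad = 2.0 * one_hot(s).T @ (q_values(s, W) - y)
W -= learning_rate * grad
after = cost(W, s, y)

print(after < before)  # True
```

Because `s` is one-hot, only row 7 of `W` changes in this step; the rows for all other states are left untouched.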
![Page 8: Lab 6-1: Q Network - GitHub Pages · 2017-10-02 · Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim State(0~15) as input](https://reader034.vdocuments.mx/reader034/viewer/2022042021/5e77e305e5c7e55068628b56/html5/thumbnails/8.jpg)
Algorithm
Playing Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al.
![Page 9: Lab 6-1: Q Network - GitHub Pages · 2017-10-02 · Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim State(0~15) as input](https://reader034.vdocuments.mx/reader034/viewer/2022042021/5e77e305e5c7e55068628b56/html5/thumbnails/9.jpg)
Algorithm
Playing Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al.
![Page 10: Lab 6-1: Q Network - GitHub Pages · 2017-10-02 · Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim State(0~15) as input](https://reader034.vdocuments.mx/reader034/viewer/2022042021/5e77e305e5c7e55068628b56/html5/thumbnails/10.jpg)
Y label and loss function
Playing Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al.
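As a sketch of the label used in the cited paper's algorithm: the target is the reward alone when the episode ends at the next state, and the reward plus the discounted best next-state value otherwise. The function names, the `gamma` default, and the sample Q-values below are illustrative assumptions.

```python
import numpy as np

def q_target(reward, next_q, done, gamma=0.99):
    """y = r for terminal transitions, else r + gamma * max_a' Q(s', a')."""
    if done:
        return float(reward)
    return float(reward + gamma * np.max(next_q))

def squared_loss(target, predicted_q):
    """Squared error between the label y and the predicted Q(s, a)."""
    return (target - predicted_q) ** 2

next_q = np.array([[0.1, 0.5, 0.2, 0.0]])  # hypothetical Q(s', .) values

y_terminal = q_target(1.0, next_q, done=True)   # reward only
y_running = q_target(0.0, next_q, done=False)   # 0.0 + 0.99 * 0.5
print(y_terminal, y_running)
print(squared_loss(y_terminal, 0.25))
```

Handling the terminal case separately matters because no further reward can follow a terminal state, so bootstrapping from `next_q` there would bias the label.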
![Page 11: Lab 6-1: Q Network - GitHub Pages · 2017-10-02 · Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim State(0~15) as input](https://reader034.vdocuments.mx/reader034/viewer/2022042021/5e77e305e5c7e55068628b56/html5/thumbnails/11.jpg)
Code: Network and setup
![Page 12: Lab 6-1: Q Network - GitHub Pages · 2017-10-02 · Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim State(0~15) as input](https://reader034.vdocuments.mx/reader034/viewer/2022042021/5e77e305e5c7e55068628b56/html5/thumbnails/12.jpg)
Code: Training
![Page 13: Lab 6-1: Q Network - GitHub Pages · 2017-10-02 · Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim State(0~15) as input](https://reader034.vdocuments.mx/reader034/viewer/2022042021/5e77e305e5c7e55068628b56/html5/thumbnails/13.jpg)
Code: results
Percent of successful episodes: 0.5195%
![Page 14: Lab 6-1: Q Network - GitHub Pages · 2017-10-02 · Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim State(0~15) as input](https://reader034.vdocuments.mx/reader034/viewer/2022042021/5e77e305e5c7e55068628b56/html5/thumbnails/14.jpg)
Q-Table vs. Network
Q-network: 0.5195%
Q-table: 0.653
![Page 15: Lab 6-1: Q Network - GitHub Pages · 2017-10-02 · Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim State(0~15) as input](https://reader034.vdocuments.mx/reader034/viewer/2022042021/5e77e305e5c7e55068628b56/html5/thumbnails/15.jpg)
Array shape
Input (1x16): [[0, 2, …]]
Weights (16x4): [ [0,1,2,3], [3,1,2,3], [0,5,2,3], … ]
Output (1x4): [[a1,a2,a3,a4]]
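The shapes can be checked with a short NumPy sketch. The weight values are placeholders chosen for illustration; only the shapes (1x16, 16x4, 1x4) come from the slide.

```python
import numpy as np

x = np.identity(16)[7:8]     # one-hot input for state 7, shape (1, 16)
W = np.zeros((16, 4))        # placeholder weights, shape (16, 4)
W[7] = [0.1, 0.2, 0.3, 0.4]  # hypothetical action values for state 7

q = x @ W                    # one Q-value per action, shape (1, 4)
print(x.shape, W.shape, q.shape)  # (1, 16) (16, 4) (1, 4)
```

With a one-hot input, the product simply selects row 7 of `W`, so the network output for a state is that state's row of action values, wrapped in an outer dimension of size 1.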
![Page 16: Lab 6-1: Q Network - GitHub Pages · 2017-10-02 · Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim State(0~15) as input](https://reader034.vdocuments.mx/reader034/viewer/2022042021/5e77e305e5c7e55068628b56/html5/thumbnails/16.jpg)
Array Shape
Output (1x4): [[a1,a2,a3,a4]]
![Page 17: Lab 6-1: Q Network - GitHub Pages · 2017-10-02 · Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim State(0~15) as input](https://reader034.vdocuments.mx/reader034/viewer/2022042021/5e77e305e5c7e55068628b56/html5/thumbnails/17.jpg)
Exercise
• Too slow: minibatch?
• A bit unstable?
![Page 18: Lab 6-1: Q Network - GitHub Pages · 2017-10-02 · Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim State(0~15) as input](https://reader034.vdocuments.mx/reader034/viewer/2022042021/5e77e305e5c7e55068628b56/html5/thumbnails/18.jpg)
Next
Lab: Q-network for cart pole