lab 6-1: q network - github pages · 2017-10-02 · lab 6-1: q network reinforcement learning with...

Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim <[email protected]>

Post on 17-Mar-2020

Lab 6-1: Q Network

Reinforcement Learning with TensorFlow & OpenAI Gym

Sung Kim <[email protected]>

State (0~15) as input

(1) s: one-hot state 7 (np.identity)

(2) Ws
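The one-hot input can be sketched with NumPy as below. This is a minimal illustration, not the slide's own code: the helper name `one_hot` and the constant `NUM_STATES` are assumptions introduced here, and only `np.identity` comes from the slide.

```python
import numpy as np

NUM_STATES = 16  # states 0~15, as on the slide (constant name is an assumption)


def one_hot(state):
    """Return the state as a 1x16 one-hot row vector (hypothetical helper)."""
    # Slicing with state:state + 1 keeps the result 2-D (1x16) instead of 1-D.
    return np.identity(NUM_STATES)[state:state + 1]


s = one_hot(7)
print(s.shape)       # (1, 16)
print(np.argmax(s))  # 7
```

Keeping the vector two-dimensional means it can be multiplied by a weight matrix directly, without reshaping.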

Q-Network training (Network construction)

(1) s

(2) Ws

Q-Network training (linear regression)

(1) s

(2) Ws

$y = r + \gamma \max Q(s')$

$cost(W) = (Ws - y)^2$
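The label y and the squared-error cost can be written out in NumPy roughly as follows. This is a sketch under stated assumptions: the function names, the discount value `GAMMA = 0.99`, and the choice to use the bare reward as the label on a terminal step are not given on this slide and are introduced here for illustration.

```python
import numpy as np

GAMMA = 0.99  # discount factor; the value is an assumption


def q_values(W, s):
    """Linear Q-network: a 1x16 one-hot state times a 16x4 weight matrix."""
    return s @ W


def target(W, s, action, reward, s_next, done):
    """Build y: the current prediction with the taken action's entry replaced."""
    y = q_values(W, s).copy()
    if done:
        y[0, action] = reward  # assumed: no future reward after a terminal step
    else:
        y[0, action] = reward + GAMMA * np.max(q_values(W, s_next))
    return y


def cost(W, s, y):
    """Squared error between the prediction Ws and the label y."""
    return float(np.sum((q_values(W, s) - y) ** 2))


# Tiny worked example with hand-picked weights.
W = np.zeros((16, 4))
W[1, 2] = 0.5                    # best action value in the next state
s = np.identity(16)[0:1]         # current state 0
s_next = np.identity(16)[1:2]    # next state 1
y = target(W, s, action=3, reward=1.0, s_next=s_next, done=False)
print(y)              # only the entry for action 3 differs from the prediction
print(cost(W, s, y))  # squared difference for that single entry
```

Only the entry for the action actually taken contributes to the cost, because every other entry of y is copied from the prediction.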

Algorithm

Playing Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al.

Y label and loss function

Playing Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al.

Code: Network and setup

Code: Training
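The slide's code did not survive extraction, so the following is only a sketch of what a training loop of this kind could look like, not the original listing. To stay self-contained it uses plain NumPy gradient steps instead of TensorFlow, and a hypothetical deterministic 4x4 grid (`step`, `HOLES`, `MOVES`) in place of the Gym environment. The hyperparameters, the exploration schedule, and the 100-step episode cap are all assumptions.

```python
import numpy as np

# Hypothetical stand-in for the Gym environment: a deterministic 4x4 grid.
# State 0 is the start, state 15 the goal (reward 1); HOLES end the episode.
HOLES = {5, 7, 11, 12}
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # left, down, right, up


def step(state, action):
    """Move on the grid; moves into a wall leave the state unchanged."""
    row, col = divmod(state, 4)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), 3)
    col = min(max(col + d_col, 0), 3)
    next_state = row * 4 + col
    done = next_state == 15 or next_state in HOLES
    reward = 1.0 if next_state == 15 else 0.0
    return next_state, reward, done


def one_hot(state):
    return np.identity(16)[state:state + 1]


def train(num_episodes=500, gamma=0.99, learning_rate=0.1, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.uniform(0, 0.01, size=(16, 4))  # small random initial weights
    rewards = []
    for i in range(num_episodes):
        state, total = 0, 0.0
        epsilon = 1.0 / (i / 50 + 10)  # assumed decaying exploration rate
        for _ in range(100):  # cap the episode length
            s = one_hot(state)
            q = s @ W
            if rng.random() < epsilon:
                action = int(rng.integers(4))   # explore
            else:
                action = int(np.argmax(q))      # exploit
            next_state, reward, done = step(state, action)
            # Label: prediction with the taken action's entry replaced.
            y = q.copy()
            if done:
                y[0, action] = reward
            else:
                y[0, action] = reward + gamma * np.max(one_hot(next_state) @ W)
            # Gradient of sum((sW - y)^2) with respect to W is 2 * s^T (sW - y).
            W -= learning_rate * 2 * (s.T @ (q - y))
            total += reward
            state = next_state
            if done:
                break
        rewards.append(total)
    return W, rewards


W, rewards = train()
print("Fraction of successful episodes:", sum(rewards) / len(rewards))
```

Because the input is one-hot, each gradient step changes a single row of W, so this linear network behaves like a Q-table updated with a learning rate.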

Code: results

Percent of successful episodes: 0.5195%

Q-Table vs. Network

Q-network: 0.5195%

Q-table: 0.653

Array shape

Input s: [[0, 2, …]], 1x16

Weights W: [ [0,1,2,3], [3,1,2,3], [0,5,2,3], … ], 16x4

Output Ws: [[a1,a2,a3,a4]], 1x4
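The shapes above can be checked with a short NumPy sketch. The weight values here are placeholders chosen so the result is easy to read, and the variable names are assumptions.

```python
import numpy as np

s = np.identity(16)[7:8]            # one-hot state 7, shape (1, 16)
W = np.arange(64.0).reshape(16, 4)  # placeholder weights, shape (16, 4)

Qs = s @ W                          # shape (1, 4): one value per action
print(s.shape, W.shape, Qs.shape)   # (1, 16) (16, 4) (1, 4)
print(Qs)                           # [[28. 29. 30. 31.]]
print(Qs[0, 2])                     # 30.0
```

Since the result is a 1x4 array rather than a flat vector, an individual action value is read with two indices, as in `Qs[0, 2]`.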

Exercise

• Too slow: minibatch?

• A bit unstable?

Lab: Q-network for cart pole
