deep reinforcement learning for robotics · deep reinforcement learning . robotics . percepts ....
TRANSCRIPT
![Page 1: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/1.jpg)
Deep Reinforcement Learning for Robotics Pieter Abbeel -- UC Berkeley EECS
![Page 2: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/2.jpg)
State-of-the-art object detection until 2012:
Deep Supervised Learning (Krizhevsky, Sutskever, Hinton 2012; also LeCun, Bengio, Ng, Darrell, …):
60 million learned parameters (since then, billions of parameters)
~1.2 million training images
Object Detection in Computer Vision
Input Image
Hand-engineered features (SIFT,
HOG, DAISY, …)
Support Vector
Machine (SVM)
“cat” “dog” “car” …
Input Image
8-layer neural network with 60 million parameters to learn
“cat” “dog” “car” …
![Page 3: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/3.jpg)
Performance
graph credit Matt Zeiler, Clarifai
![Page 4: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/4.jpg)
Performance
graph credit Matt Zeiler, Clarifai
![Page 5: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/5.jpg)
Performance
graph credit Matt Zeiler, Clarifai
AlexNet
![Page 6: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/6.jpg)
Performance
graph credit Matt Zeiler, Clarifai
AlexNet
![Page 7: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/7.jpg)
Performance
graph credit Matt Zeiler, Clarifai
AlexNet
![Page 8: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/8.jpg)
Speech Recognition
graph credit Matt Zeiler, Clarifai
![Page 9: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/9.jpg)
History
Is deep learning 3, 30, or 60 years old?
2000s Sparse, Probabilistic, and Energy models (Hinton, Bengio, LeCun, Ng)
Rosenblatt’s Perceptron
(Olshausen, 1996)
based on history by K. Cho
![Page 10: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/10.jpg)
Data
1.2M training examples
* 2048 (shifts)
* 90 (PCA re-coloring)
1.2M * 2k *90 ~ 0.216 trillion
Human eye: 1k frames/s
~6.84yrs
Compute power
Two NVIDIA GTX 580 GPUs
5-6 days of training time
What’s Changed Nonlinearity
Sigmoid
ReLU
Regularization
Drop-out
(Training data augmentation)
Exploration of model structure
Optimization know-how
![Page 11: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/11.jpg)
State-of-the-art object detection until 2012:
Deep Supervised Learning (Krizhevsky, Sutskever, Hinton 2012; also LeCun, Bengio, Ng, Darrell, …):
60 million learned parameters (since then, billions of parameters)
~1.2 million training images
Object Detection in Computer Vision
Input Image
Hand-engineered features (SIFT,
HOG, DAISY, …)
Support Vector
Machine (SVM)
“cat” “dog” “car” …
Input Image
8-layer neural network with 60 million parameters to learn
“cat” “dog” “car” …
![Page 12: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/12.jpg)
Current state-of-the-art robotics
Deep reinforcement learning
Robotics
Percepts Hand-
engineered state-
estimation
Many-layer neural network
with many parameters to learn
Hand-engineered
control policy class
Hand-tuned (or learned) 10’ish free parameters
Motor commands
Percepts Motor commands
![Page 13: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/13.jpg)
Reinforcement Learning (RL)
Robotics
Marketing / Advertising
Dialogue
Optimizing operations / logistics
Queue management
…
Robot + Environment
probability of taking action a in state s
![Page 14: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/14.jpg)
How About Deep RL?
Pong Enduro Beamrider Q*bert
![Page 15: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/15.jpg)
Deep Q-learning [Mnih et al, 2013]
Monte Carlo Tree Search [Xiao-Xiao et al, 2014]
Trust Region Policy Optimization [Schulman, Levine, Moritz, Jordan, A., 2014]
Deep Reinforcement Learning for Atari Games
Pong Enduro Beamrider Q*bert
![Page 16: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/16.jpg)
[Schulman, Levine, Moritz, Jordan, Abbeel, ICML 2015]
Experiments in Locomotion
![Page 17: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/17.jpg)
How About Real Robotic Visuo-Motor Skills?
![Page 18: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/18.jpg)
Architecture (92,000 parameters)
[Levine*, Finn*, Darrell, Abbeel, 2015, TR at: rll.berkeley.edu/deeplearningrobotics]
![Page 19: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/19.jpg)
Block Stacking – Learning the Controller for a Single Instance
![Page 20: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/20.jpg)
Learned Skills
![Page 21: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/21.jpg)
Architectures for shared learning / transfer learning
Multiple robots and sensors (including simulation)
Multiple tasks
Simulation – Real world
Frontiers / Limitations Exploration
Controllers that require memory / estimation
Temporal hierarchy
![Page 22: Deep Reinforcement Learning for Robotics · Deep reinforcement learning . Robotics . Percepts . Hand-engineered state-estimation . Many-layer neural network with many parameters to](https://reader030.vdocuments.mx/reader030/viewer/2022041021/5ed0af42b559fe126a336bdf/html5/thumbnails/22.jpg)
Thank you