© 2019 The MathWorks, Inc.
Reinforcement Learning with MATLAB & Simulink
Christoph Stockhammer MathWorks Application Engineering
Machine Learning, Deep Learning, and Reinforcement Learning

Machine Learning
– Unsupervised Learning [No Labeled Data]: Clustering
– Supervised Learning [Labeled Data]: Classification, Regression
– Reinforcement Learning [Interaction Data]: Decision Making, Control

Supervised learning typically involves manual feature extraction; deep learning typically does not involve feature extraction.

Reinforcement Learning vs. Machine Learning

Reinforcement learning: learning a behavior or accomplishing a task through trial & error [interaction]. Complex problems typically need deep models [Deep Reinforcement Learning].

Reinforcement Learning Toolbox
New in
Reinforcement Learning enables the use of Deep Learning for Controls and Decision Making Applications
– A.I. Gameplay
– Finance
– Robotics
– Autonomous driving
Why is reinforcement learning appealing?
Teach a robot to follow a straight line using camera data
Let’s try to solve this problem the traditional way
[Block diagram: camera data from sensors flows through feature extraction and state estimation into a controller (balance, leg & trunk trajectories, motor control), which issues motor commands; observations are fed back from the environment.]
What is the alternative approach?
[Diagram: the traditional pipeline (sensors → camera data → feature extraction → state estimation → controller → motor commands) is replaced by a single block mapping camera data from the sensors directly to motor commands. How do we design this?]
A Practical Example of Reinforcement Learning: Training a Robot to Walk
Robot’s computer learns how to walk… (agent)
using sensor readings from joints, torso,… (state)
that represent robot’s pose and orientation,… (environment)
by generating joint torque commands,… (action)
based on an internal state-to-action mapping… (policy)
that tries to optimize forward locomotion… (reward).

The policy is updated through repeated trial and error by a reinforcement learning algorithm.
[Diagram: the AGENT (a reinforcement learning algorithm updating a policy) sends an ACTION to the ENVIRONMENT and receives STATE and REWARD in return.]
Connections with Controls
[Diagram: control loop exchanging observations $O_t$ and actions $A_t$, with a quadratic stage cost as the reward:]

$$r_t = x(t)^\top R\, x(t) + u(t)^\top Q\, u(t)$$
Define Environment to Generate Data

Physical modeling of robot dynamics and contact forces using Simscape.

The environment provides 29 observations to the agent. The observations are: Y (lateral) and Z (vertical) translations of the torso center of mass; X (forward), Y (lateral), and Z (vertical) translation velocities; yaw, pitch, and roll angles of the torso; yaw, pitch, and roll angular velocities; angular position and velocity of 3 joints (ankle, knee, hip) on both legs; and previous actions from the agent. The translation in the Z direction is normalized to a similar range as the other observations.
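As a minimal sketch of how these channels could be declared for the toolbox (the model and block names are illustrative, not taken from the slides):

```matlab
% Observation specification: the 29 signals described above
obsInfo = rlNumericSpec([29 1]);
obsInfo.Name = 'observations';

% Action specification: torque commands for 3 joints on each of 2 legs,
% assumed here to be normalized to [-1, 1]
actInfo = rlNumericSpec([6 1], 'LowerLimit', -1, 'UpperLimit', 1);
actInfo.Name = 'torques';

% Create the environment from the Simulink/Simscape model
% ('rlWalkingBipedRobot' and the agent block path are assumed names)
env = rlSimulinkEnv('rlWalkingBipedRobot', ...
    'rlWalkingBipedRobot/RL Agent', obsInfo, actInfo);
```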
Define Environment to Generate Data

Reward defines the task to learn.
Define Environment to Generate Data
*Reward function inspired by: N. Heess et al., "Emergence of Locomotion Behaviours in Rich Environments," Technical Report, arXiv, 2017.
Define Policy and Learning Algorithm

Define the agent’s policy and learning algorithm.
Code for Configuring Agent and Training
Create Critic Network
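The code shown on this slide is not preserved in the transcript. A minimal R2019a-era sketch of a DDPG critic, with separate observation and action paths merged by an addition layer (layer sizes and names are illustrative):

```matlab
% Specifications as on the previous slides
obsInfo = rlNumericSpec([29 1]);
actInfo = rlNumericSpec([6 1], 'LowerLimit', -1, 'UpperLimit', 1);

% Critic: Q(s,a) approximator with an observation path and an action path
statePath = [
    imageInputLayer([29 1 1], 'Normalization', 'none', 'Name', 'observation')
    fullyConnectedLayer(400, 'Name', 'CriticStateFC1')
    reluLayer('Name', 'CriticRelu1')
    fullyConnectedLayer(300, 'Name', 'CriticStateFC2')];
actionPath = [
    imageInputLayer([6 1 1], 'Normalization', 'none', 'Name', 'action')
    fullyConnectedLayer(300, 'Name', 'CriticActionFC1')];
commonPath = [
    additionLayer(2, 'Name', 'add')
    reluLayer('Name', 'CriticCommonRelu')
    fullyConnectedLayer(1, 'Name', 'CriticOutput')];   % scalar Q-value

criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = addLayers(criticNetwork, commonPath);
criticNetwork = connectLayers(criticNetwork, 'CriticStateFC2', 'add/in1');
criticNetwork = connectLayers(criticNetwork, 'CriticActionFC1', 'add/in2');

criticOptions = rlRepresentationOptions('LearnRate', 1e-3);
critic = rlRepresentation(criticNetwork, obsInfo, actInfo, ...
    'Observation', {'observation'}, 'Action', {'action'}, criticOptions);
```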
Create Actor Network
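Again, the slide's code is lost; a matching sketch of the deterministic actor, whose tanh output keeps actions inside the assumed [-1, 1] limits (sizes illustrative):

```matlab
% Specifications as on the previous slides
obsInfo = rlNumericSpec([29 1]);
actInfo = rlNumericSpec([6 1], 'LowerLimit', -1, 'UpperLimit', 1);

% Actor: deterministic policy mapping observations to torque commands
actorNetwork = [
    imageInputLayer([29 1 1], 'Normalization', 'none', 'Name', 'observation')
    fullyConnectedLayer(400, 'Name', 'ActorFC1')
    reluLayer('Name', 'ActorRelu1')
    fullyConnectedLayer(300, 'Name', 'ActorFC2')
    reluLayer('Name', 'ActorRelu2')
    fullyConnectedLayer(6, 'Name', 'ActorFC3')
    tanhLayer('Name', 'ActorTanh')];   % bounds actions to [-1, 1]

actorOptions = rlRepresentationOptions('LearnRate', 1e-4);
actor = rlRepresentation(actorNetwork, obsInfo, actInfo, ...
    'Observation', {'observation'}, 'Action', {'ActorTanh'}, actorOptions);
```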
Create DDPG Agent
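A sketch of assembling the DDPG agent from the actor and critic created in the preceding steps (all hyperparameter values are illustrative, not the ones from the slide):

```matlab
% Assumes 'actor' and 'critic' exist in the workspace from the previous steps
agentOptions = rlDDPGAgentOptions( ...
    'SampleTime', 0.025, ...
    'TargetSmoothFactor', 1e-3, ...
    'ExperienceBufferLength', 1e6, ...
    'MiniBatchSize', 128, ...
    'DiscountFactor', 0.99);
% Ornstein-Uhlenbeck exploration noise
agentOptions.NoiseOptions.Variance = 0.1;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-5;

agent = rlDDPGAgent(actor, critic, agentOptions);
```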
Training the Agent
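The training call itself could look like the following sketch, which stops on an average-reward criterion and shows the progress plot referenced on the next slides (all values illustrative):

```matlab
% Assumes 'agent' and 'env' exist in the workspace from the previous steps
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 2000, ...
    'MaxStepsPerEpisode', 400, ...
    'ScoreAveragingWindowLength', 100, ...
    'Plots', 'training-progress', ...
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue', 100);

trainingStats = train(agent, env, trainOpts);
```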
Train Robot to Walk and Track Progress
[Training-progress plot: episode reward vs. episode number.]
Train Robot to Walk and Track Progress
Accelerate Training with Parallel Computing

[Training-progress plot: episode reward vs. episode number, with training accelerated across parallel workers.]
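Parallelizing the simulations only takes a couple of extra training options; a hedged sketch (requires Parallel Computing Toolbox, values illustrative):

```matlab
% Assumes 'agent' and 'env' exist in the workspace from the previous steps
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes', 2000, ...
    'MaxStepsPerEpisode', 400, ...
    'UseParallel', true);          % run episodes on parallel workers
trainOpts.ParallelizationOptions.Mode = 'async';

trainingStats = train(agent, env, trainOpts);
```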
Deploy Policy to Embedded Device
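One deployment route is to extract a standalone policy evaluation function from the trained agent and compile it with MATLAB Coder; a sketch (the code-generation target and arguments are illustrative):

```matlab
% Generate a standalone policy function from the trained agent;
% this creates evaluatePolicy.m plus a data file with the learned parameters
generatePolicyFunction(agent);

% Compile the policy for an embedded target (29-element observation assumed)
cfg = coder.config('lib');
codegen -config cfg evaluatePolicy -args {ones(29,1)}
```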
Everything is Great, Right?
Reward Function Design Matters
Simulation and Virtual Models are a Key Aspect of Reinforcement Learning

Reinforcement learning needs a lot of data (it is sample inefficient)
– Training on hardware can be prohibitively expensive and dangerous
Virtual models allow you to simulate conditions hard to emulate in the real world
– This can help develop a more robust solution
Many of you have already developed MATLAB and Simulink models that can be reused
Pros & Cons of Reinforcement Learning

| Pros | Cons |
| --- | --- |
| No need to label data before training | A lot of simulation trials required |
| Complex end-to-end solutions can be developed (e.g. camera input → car steering wheel) | Reward signal design, network layer structure & hyperparameter tuning can be challenging |
| Can be applied to uncertain, nonlinear environments | No performance guarantees; training may not converge |
| Virtual models allow simulations of varying conditions and training parallelization | Further training might be necessary after deployment on real hardware |
Reinforcement Learning Toolbox
New in
– Built-in and custom algorithms for reinforcement learning
– Environment modeling in MATLAB and Simulink
– Deep Learning Toolbox support for designing policies
– Training acceleration through GPUs and cloud resources
– Deployment to embedded devices and production systems
– Reference examples for getting started
Predefined Environments and Many Examples
MATLAB Environment
– 'BasicGridWorld'
– 'CartPole-Discrete'
– 'CartPole-Continuous'
– 'DoubleIntegrator-Discrete'
– 'DoubleIntegrator-Continuous'
– 'SimplePendulumWithImage-Discrete'
– 'SimplePendulumWithImage-Continuous'
– 'WaterFallGridWorld-Stochastic'
– 'WaterFallGridWorld-Deterministic'
Simulink Environment
– 'SimplePendulumModel-Discrete'
– 'SimplePendulumModel-Continuous'
– 'CartPoleSimscapeModel-Discrete'
– 'CartPoleSimscapeModel-Continuous'
Examples
– Grid World, MDP
– Classical Control Benchmarks
– Automotive
– Robotics
– Custom LQR Agent
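Any of the predefined environments can be created by name and exercised directly; a small sketch using the cart-pole environment (the force value is illustrative):

```matlab
% Create a predefined MATLAB environment and inspect its interface
env = rlPredefinedEnv('CartPole-Discrete');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% reset returns the initial observation; step applies an action
initialObs = reset(env);
[nextObs, reward, isDone, info] = step(env, 10);   % push cart with +10 N
```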
Extensible Environment Interface
env = rlFunctionEnv(obsInfo,actInfo,stepfcn,resetfcn)
– obsInfo: Observation Specification
– actInfo: Action Specification
– stepfcn: Function handle for stepping the environment
– resetfcn: Function handle for resetting the environment
Subclassing from rl.env.MATLABEnvironment
– Custom MATLAB Environments
– Interfacing with 3rd party simulators (e.g. OpenAI Gym)
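To make the rlFunctionEnv signature concrete, here is a hypothetical toy environment (a scalar integrator driven toward zero); the dynamics, reward, and termination rule are invented for illustration:

```matlab
% Toy custom environment: scalar state, discrete actions {-1, 0, 1}
obsInfo = rlNumericSpec([1 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

% ResetFcn returns [InitialObservation, LoggedSignals]
resetfcn = @() deal(5, struct('State', 5));
% StepFcn returns [Observation, Reward, IsDone, LoggedSignals]
stepfcn  = @(action, loggedSignals) myStep(action, loggedSignals);

env = rlFunctionEnv(obsInfo, actInfo, stepfcn, resetfcn);

function [nextObs, reward, isDone, loggedSignals] = myStep(action, loggedSignals)
    nextObs = loggedSignals.State + action;   % simple integrator dynamics
    loggedSignals.State = nextObs;
    reward = -abs(nextObs);                   % closer to zero is better
    isDone = abs(nextObs) > 10;               % terminate if state diverges
end
```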
Resources
– Reference examples for controls, robotics, and autonomous system applications
– Documentation written for engineers and domain experts
– Tech Talk video series on reinforcement learning concepts for engineers
Questions?