powerpoint presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8....

46
Optimal Control and Planning CS 285 Instructor: Sergey Levine UC Berkeley

Upload: others

Post on 22-Aug-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Optimal Control and Planning

CS 285

Instructor: Sergey LevineUC Berkeley

Page 2: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Today’s Lecture

1. Introduction to model-based reinforcement learning

2. What if we know the dynamics? How can we make decisions?

3. Stochastic optimization methods

4. Monte Carlo tree search (MCTS)

5. Trajectory optimization

• Goals:• Understand how we can perform planning with known dynamics models in

discrete and continuous spaces

Page 3: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Recap: the reinforcement learning objective

Page 4: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Recap: model-free reinforcement learning

assume this is unknown

don’t even attempt to learn it

Page 5: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

What if we knew the transition dynamics?

• Often we do know the dynamics1. Games (e.g., Atari games, chess, Go)

2. Easily modeled systems (e.g., navigating a car)

3. Simulated environments (e.g., simulated robots, video games)

• Often we can learn the dynamics1. System identification – fit unknown parameters of a known model

2. Learning – fit a general-purpose model to observed transition data

Does knowing the dynamics make things easier?

Often, yes!

Page 6: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

1. Model-based reinforcement learning: learn the transition dynamics, then figure out how to choose actions

2. Today: how can we make decisions if we know the dynamics?a. How can we choose actions under perfect knowledge of the system dynamics?

b. Optimal control, trajectory optimization, planning

3. Next week: how can we learn unknown dynamics?

4. How can we then also learn policies? (e.g. by imitating optimal control)

Model-based reinforcement learning

policy

system dynamics

Page 7: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

The objective

1. run away

2. ignore

3. pet

Page 8: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

The deterministic case

Page 9: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

The stochastic open-loop case

why is this suboptimal?

Page 10: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Aside: terminologywhat is this “loop”?

closed-loop open-loop

only sent at t = 1,then it’s one-way!

Page 11: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

The stochastic closed-loop case

(more on this later)

Page 12: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Open-Loop Planning

Page 13: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

But for now, open-loop planning

Page 14: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Stochastic optimization

simplest method: guess & check “random shooting method”

Page 15: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Cross-entropy method (CEM)

can we do better?

typically use Gaussian distribution

see also: CMA-ES (sort of like CEM with momentum)

Page 16: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

What’s the problem?

1. Very harsh dimensionality limit

2. Only open-loop planning

What’s the upside?

1. Very fast if parallelized

2. Extremely simple

Page 17: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Discrete case: Monte Carlo tree search (MCTS)

Page 18: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Discrete case: Monte Carlo tree search (MCTS)

e.g., random policy

Page 19: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Discrete case: Monte Carlo tree search (MCTS)

+10 +15

Page 20: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Discrete case: Monte Carlo tree search (MCTS)

Q = 10N = 1

Q = 12N = 1

Q = 16N = 1

Q = 22N = 2Q = 38N = 3

Q = 10N = 1

Q = 12N = 1

Q = 22N = 2Q = 30N = 3

Q = 8N = 1

Page 21: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Additional reading

1. Browne, Powley, Whitehouse, Lucas, Cowling, Rohlfshagen, Tavener, Perez, Samothrakis, Colton. (2012). A Survey of Monte Carlo Tree Search Methods.• Survey of MCTS methods and basic summary.

Page 22: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Trajectory Optimization with Derivatives

Page 23: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Can we use derivatives?

Page 24: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Shooting methods vs collocation

shooting method: optimize over actions only

Page 25: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Shooting methods vs collocation

collocation method: optimize over actions and states, with constraints

Page 26: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Linear case: LQR

linear quadratic

Page 27: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Linear case: LQR

Page 28: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Linear case: LQR

Page 29: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Linear case: LQR

linear linearquadratic

Page 30: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Linear case: LQR

linear linearquadratic

Page 31: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Linear case: LQR

Page 32: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Linear case: LQR

Page 33: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

LQR for Stochastic and Nonlinear Systems

Page 34: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Stochastic dynamics

Page 35: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

The stochastic closed-loop case

Page 36: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Nonlinear case: DDP/iterative LQR

Page 37: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Nonlinear case: DDP/iterative LQR

Page 38: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Nonlinear case: DDP/iterative LQR

Page 39: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Nonlinear case: DDP/iterative LQR

Page 40: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Nonlinear case: DDP/iterative LQR

Page 41: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Nonlinear case: DDP/iterative LQR

Page 42: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Case Study and Additional Readings

Page 43: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Case study: nonlinear model-predictive control

Page 44: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games
Page 45: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

Additional reading

1. Mayne, Jacobson. (1970). Differential dynamic programming. • Original differential dynamic programming algorithm.

2. Tassa, Erez, Todorov. (2012). Synthesis and Stabilization of Complex Behaviors through Online Trajectory Optimization.• Practical guide for implementing non-linear iterative LQR.

3. Levine, Abbeel. (2014). Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics.• Probabilistic formulation and trust region alternative to deterministic line search.

Page 46: PowerPoint Presentationrail.eecs.berkeley.edu/deeprlcourse-fa20/static/slides/... · 2021. 8. 3. · What if we knew the transition dynamics? •Often we do know the dynamics 1. Games

What’s wrong with known dynamics?

Next time: learning the dynamics model