
Page 1:

AA203 Optimal and Learning-based Control

Dynamic programming

Page 2:

Roadmap


(Roadmap diagram, flattened to its labels:)

• Optimal control
  • Open-loop
    • Indirect methods
    • Direct methods
  • Closed-loop
    • DP → HJB / HJI
    • MPC
• Adaptive optimal control / Model-based RL
• Linear methods
• Non-linear methods

Page 3:

Principle of optimality

The key concept behind the dynamic programming approach is the principle of optimality. Suppose the optimal path for a multi-stage decision-making problem from $a$ to $e$ is known:

• the first decision yields segment $a$–$b$ with cost $J_{ab}$
• the remaining decisions yield segments $b$–$e$ with cost $J_{be}$
• the optimal cost is then $J_{ae}^* = J_{ab} + J_{be}$

Page 4:

Principle of optimality

• Claim: If $a$–$b$–$e$ is the optimal path from $a$ to $e$, then $b$–$e$ is the optimal path from $b$ to $e$.
• Proof: Suppose instead that $b$–$c$–$e$ is the optimal path from $b$ to $e$. Then

$$J_{bce} < J_{be}$$

and

$$J_{ab} + J_{bce} < J_{ab} + J_{be} = J_{ae}^*$$

Contradiction!

Page 5:

Principle of optimality

Principle of optimality (for discrete-time systems): Let $\pi^* := \{\pi_0^*, \pi_1^*, \dots, \pi_{N-1}^*\}$ be an optimal policy. Assume state $x_i$ is reachable. Consider the subproblem whereby we are at $x_i$ at time $i$ and we wish to minimize the cost-to-go from time $i$ to time $N$. Then the truncated policy $\{\pi_i^*, \pi_{i+1}^*, \dots, \pi_{N-1}^*\}$ is optimal for the subproblem.

• tail policies are optimal for tail subproblems
• notation: $\pi_i^*(x_i) = \pi^*(x_i, i)$


Page 6:

Applying the principle of optimality


Principle of optimality: if $a$–$b$ is the initial segment of the optimal path from $a$ to $e$, then $b$–$e$ is the terminal segment of this path.

Hence, the optimal trajectory is found by comparing:

$$C_{abe} = J_{ab} + J_{be}^*$$
$$C_{ace} = J_{ac} + J_{ce}^*$$
$$C_{ade} = J_{ad} + J_{de}^*$$

Page 7:

Applying the principle of optimality

• need only compare the concatenations of immediate decisions and optimal tail decisions → significant decrease in computation / possibilities
• in practice: carry out this procedure backward in time (sketched in code below)
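To make the backward procedure concrete, here is a minimal Python sketch on a small, hypothetical stage graph; the node names and edge costs are invented for illustration and are not the graph from the lecture's figure.

```python
# Backward DP on a hypothetical staged graph: nodes are grouped in
# stages a -> {b, c} -> {d, e} -> f, and costs[(x, y)] is the cost of
# edge x-y. All values here are made up for illustration.
costs = {
    ("a", "b"): 2, ("a", "c"): 5,
    ("b", "d"): 4, ("b", "e"): 1,
    ("c", "d"): 2, ("c", "e"): 6,
    ("d", "f"): 3, ("e", "f"): 2,
}
successors = {}
for (x, y), c in costs.items():
    successors.setdefault(x, []).append(y)

J = {"f": 0.0}   # optimal cost-to-go to the goal f, seeded at the goal
best_next = {}   # argmin bookkeeping, used to recover the path

# Sweep backward in time: each node needs only the concatenation of an
# immediate decision with the already-optimal tail cost.
for node in ("e", "d", "c", "b", "a"):
    J[node], best_next[node] = min(
        (costs[(node, y)] + J[y], y) for y in successors[node]
    )

# Follow the stored decisions forward to read off the optimal path.
path = ["a"]
while path[-1] in best_next:
    path.append(best_next[path[-1]])
print(J["a"], path)  # -> 5.0 (= 2 + 1 + 2), ['a', 'b', 'e', 'f']
```

Note that each node's cost-to-go is computed exactly once, rather than enumerating every complete path from $a$ to $f$.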


Page 8:

Example


Optimal cost: 18. Optimal path: the six-node route from $a$ to $h$ identified in the (omitted) routing figure.

Page 9:

DP Algorithm

• Model: $x_{k+1} = f(x_k, u_k, k)$, with $u_k \in U(x_k)$
• Cost: $J(x_0) = h_N(x_N) + \sum_{k=0}^{N-1} g(x_k, \pi_k(x_k), k)$

DP Algorithm: For every initial state $x_0$, the optimal cost $J^*(x_0)$ is equal to $J_0(x_0)$, given by the last step of the following algorithm, which proceeds backward in time from stage $N-1$ to stage $0$:

$$J_N(x_N) = h_N(x_N)$$
$$J_k(x_k) = \min_{u_k \in U(x_k)} \Big[ g(x_k, u_k, k) + J_{k+1}\big(f(x_k, u_k, k)\big) \Big], \quad k = 0, \dots, N-1$$

Furthermore, if $u_k^* = \pi_k^*(x_k)$ minimizes the right-hand side of the above equation for each $x_k$ and $k$, the policy $\{\pi_0^*, \pi_1^*, \dots, \pi_{N-1}^*\}$ is optimal.
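When the state and control sets are finite, this recursion translates directly into a tabular sweep. Below is a minimal generic sketch (not code from the course): states and controls are encoded as integer indices, and the callables f, g, and hN stand in for the model, stage cost, and terminal cost defined above.

```python
import numpy as np

def dp_backward(N, n_x, n_u, f, g, hN):
    """Tabular DP algorithm: proceeds backward from stage N-1 to stage 0.

    f(x, u, k)  -> next-state index (the model x_{k+1} = f(x_k, u_k, k))
    g(x, u, k)  -> stage cost (return np.inf for u not in U(x))
    hN(x)       -> terminal cost
    Returns J (costs-to-go, shape (N+1, n_x)) and pi (policy, (N, n_x)).
    """
    J = np.empty((N + 1, n_x))
    pi = np.empty((N, n_x), dtype=int)
    J[N] = [hN(x) for x in range(n_x)]
    for k in range(N - 1, -1, -1):          # backward in time
        for x in range(n_x):
            # compare immediate cost plus optimal tail cost over controls
            q = [g(x, u, k) + J[k + 1, f(x, u, k)] for u in range(n_u)]
            pi[k, x] = int(np.argmin(q))
            J[k, x] = q[pi[k, x]]
    return J, pi
```

Per the statement above, J[0][x0] is the optimal cost $J^*(x_0)$ and pi[k] is the optimal policy $\pi_k^*$; encoding infeasible controls with an infinite stage cost enforces $u_k \in U(x_k)$.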

Page 10:

Comments

• discretization (from differential equations to difference equations)
• quantization (from continuous to discrete state variables / controls)
• interpolation, to evaluate the cost-to-go between grid points (see the sketch below)
• global minimum: DP compares all alternatives, so it returns a globally optimal solution
• constraints, in general, simplify the numerical procedure (they shrink the set of admissible controls to search over)
• optimal control is obtained in closed-loop form
• curse of dimensionality
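For continuous state spaces, quantization plus interpolation is the standard workaround: store the cost-to-go on a grid and interpolate whenever $f(x, u)$ lands between grid points. A minimal one-dimensional sketch follows; the dynamics, costs, and grid sizes are invented for illustration.

```python
import numpy as np

x_grid = np.linspace(-2.0, 2.0, 81)   # quantized state space
u_grid = np.linspace(-1.0, 1.0, 21)   # quantized control space
dt = 0.1                              # time discretization step

f = lambda x, u: x + dt * u           # difference equation for x_{k+1}
g = lambda x, u: dt * (x**2 + u**2)   # stage cost

# One backward DP step: given J_{k+1} on the grid, compute J_k.
# Successor states rarely fall on grid points, so J_{k+1} is evaluated
# by linear interpolation (np.interp clamps at the grid boundary).
J_next = x_grid**2                    # e.g. terminal cost h_N(x) = x^2
J_curr = np.array([
    np.min(g(x, u_grid) + np.interp(f(x, u_grid), x_grid, J_next))
    for x in x_grid
])
```

In $n$ dimensions the grid has exponentially many points, which is exactly the curse of dimensionality noted above.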


Page 11:

Example: discrete LQR

• In most cases, the DP algorithm needs to be performed numerically
• A few cases can be solved analytically

Discrete LQR: select control inputs to minimize

$$J(x_0) = \frac{1}{2} x_N^T Q_f x_N + \frac{1}{2} \sum_{k=0}^{N-1} \left[ x_k^T Q x_k + u_k^T R u_k \right]$$

subject to the dynamics

$$x_{k+1} = A_k x_k + B_k u_k$$

Assumption: $Q_f = Q_f^T \ge 0$, $Q = Q^T \ge 0$, $R = R^T > 0$.

Page 12:

Example: discrete LQR

First step:

$$J_N(x_N) = \frac{1}{2} x_N^T Q_f x_N \quad \Rightarrow \quad P_N = Q_f$$

Going backward:

$$J_{N-1}(x_{N-1}) = \min_{u_{N-1}} \frac{1}{2} \Big[ x_{N-1}^T Q x_{N-1} + u_{N-1}^T R u_{N-1} + (A_{N-1} x_{N-1} + B_{N-1} u_{N-1})^T P_N (A_{N-1} x_{N-1} + B_{N-1} u_{N-1}) \Big]$$

Page 13:

Example: discrete LQR

Taking the derivative with respect to $u_{N-1}$ and setting it to zero:

$$R u_{N-1} + B_{N-1}^T P_N (A_{N-1} x_{N-1} + B_{N-1} u_{N-1}) = 0$$

and the second derivative,

$$R + B_{N-1}^T P_N B_{N-1} > 0,$$

is positive definite, so the stationary point is the unique global minimizer.

Page 14:

DP for discrete LQR

Hence, the optimizer satisfies

$$(R + B_{N-1}^T P_N B_{N-1})\, u_{N-1}^* = -B_{N-1}^T P_N A_{N-1} x_{N-1},$$

so

$$u_{N-1}^* = K_{N-1} x_{N-1}, \quad \text{where } K_{N-1} = -(R + B_{N-1}^T P_N B_{N-1})^{-1} B_{N-1}^T P_N A_{N-1}$$

Page 15:

DP for discrete LQR

Plugging $u_{N-1}^* = K_{N-1} x_{N-1}$ back into the cost-to-go gives a quadratic function of the state:

$$J_{N-1}(x_{N-1}) = \frac{1}{2} x_{N-1}^T \Big[ Q + K_{N-1}^T R K_{N-1} + (A_{N-1} + B_{N-1} K_{N-1})^T P_N (A_{N-1} + B_{N-1} K_{N-1}) \Big] x_{N-1} =: \frac{1}{2} x_{N-1}^T P_{N-1} x_{N-1}$$

Page 16:

DP for discrete LQR

Proceeding by induction, the solution is given by

1. $J_N(x_N) = \frac{1}{2} x_N^T P_N x_N$, where $P_N = Q_f$
2. $u_k^* = K_k x_k$, where $K_k = -(R + B_k^T P_{k+1} B_k)^{-1} B_k^T P_{k+1} A_k$
3. $J_k(x_k) = \frac{1}{2} x_k^T P_k x_k$, where $P_k = Q + K_k^T R K_k + (A_k + B_k K_k)^T P_{k+1} (A_k + B_k K_k)$

At the end, $J_0(x_0) = \frac{1}{2} x_0^T P_0 x_0$.


Page 17:

Next time

• iLQR, DDP, and LQG

$$x_{k+1} = A_k x_k + B_k u_k + w_k$$
$$y_k = C_k x_k + v_k$$
