
Page 1:

AA203 Optimal and Learning-based Control

Dynamic programming

Page 2:

Roadmap


(Roadmap diagram, flattened to its labels:)

• Optimal control
  • Open-loop
    • Indirect methods
    • Direct methods
  • Closed-loop
    • DP → HJB / HJI
    • MPC
• Adaptive optimal control / Model-based RL
• Linear methods
• Non-linear methods

Page 3:

Principle of optimality

The key concept behind the dynamic programming approach is the principle of optimality. Suppose the optimal path for a multi-stage decision-making problem from $a$ to $e$ is known:

• the first decision yields segment $a$–$b$ with cost $J_{ab}$
• the remaining decisions yield segments $b$–$e$ with cost $J_{be}$
• the optimal cost is then $J_{ae}^* = J_{ab} + J_{be}$

Page 4:

Principle of optimality

• Claim: If $a$–$b$–$e$ is the optimal path from $a$ to $e$, then $b$–$e$ is the optimal path from $b$ to $e$.
• Proof: Suppose instead that $b$–$c$–$e$ is the optimal path from $b$ to $e$. Then

$$J_{bce} < J_{be}$$

and

$$J_{ab} + J_{bce} < J_{ab} + J_{be} = J_{ae}^*$$

Contradiction!

Page 5:

Principle of optimality

Principle of optimality (for discrete-time systems): Let $\pi^* := \{\pi_0^*, \pi_1^*, \dots, \pi_{N-1}^*\}$ be an optimal policy. Assume state $x_i$ is reachable. Consider the subproblem whereby we are at $x_i$ at time $i$ and we wish to minimize the cost-to-go from time $i$ to time $N$. Then the truncated policy $\{\pi_i^*, \pi_{i+1}^*, \dots, \pi_{N-1}^*\}$ is optimal for the subproblem.

• tail policies are optimal for tail subproblems
• notation: $\pi_i^*(x_i) = \pi^*(x_i, i)$


Page 6:

Applying the principle of optimality


Principle of optimality: if $a$–$b$ is the initial segment of the optimal path from $a$ to $e$, then $b$–$e$ is the terminal segment of this path.

Hence, the optimal trajectory is found by comparing:

$$C_{abe} = J_{ab} + J_{be}^*$$
$$C_{ace} = J_{ac} + J_{ce}^*$$
$$C_{ade} = J_{ad} + J_{de}^*$$

Page 7:

Applying the principle of optimality

• need only compare the concatenations of immediate decisions and optimal tail decisions → significant decrease in computation / possibilities
• in practice: carry out this procedure backward in time (sketched in code below)
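To make the backward procedure concrete, here is a minimal Python sketch on a small, hypothetical stage graph; the node names and edge costs are invented for illustration and are not the graph from the lecture's figure.

```python
# Backward DP on a hypothetical staged graph: nodes are grouped in
# stages a -> {b, c} -> {d, e} -> f, and costs[(x, y)] is the cost of
# edge x-y. All values here are made up for illustration.
costs = {
    ("a", "b"): 2, ("a", "c"): 5,
    ("b", "d"): 4, ("b", "e"): 1,
    ("c", "d"): 2, ("c", "e"): 6,
    ("d", "f"): 3, ("e", "f"): 2,
}
successors = {}
for (x, y), c in costs.items():
    successors.setdefault(x, []).append(y)

J = {"f": 0.0}   # optimal cost-to-go to the goal f, seeded at the goal
best_next = {}   # argmin bookkeeping, used to recover the path

# Sweep backward in time: each node needs only the concatenation of an
# immediate decision with the already-optimal tail cost.
for node in ("e", "d", "c", "b", "a"):
    J[node], best_next[node] = min(
        (costs[(node, y)] + J[y], y) for y in successors[node]
    )

# Follow the stored decisions forward to read off the optimal path.
path = ["a"]
while path[-1] in best_next:
    path.append(best_next[path[-1]])
print(J["a"], path)  # -> 5.0 (= 2 + 1 + 2), ['a', 'b', 'e', 'f']
```

Note that each node's cost-to-go is computed exactly once, rather than enumerating every complete path from $a$ to $f$.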


Page 8:

Example


Optimal cost: 18. Optimal path: the six-node route from $a$ to $h$ identified in the (omitted) routing figure.

Page 9:

DP Algorithm

• Model: $x_{k+1} = f(x_k, u_k, k)$, with $u_k \in U(x_k)$
• Cost: $J(x_0) = h_N(x_N) + \sum_{k=0}^{N-1} g(x_k, \pi_k(x_k), k)$

DP Algorithm: For every initial state $x_0$, the optimal cost $J^*(x_0)$ is equal to $J_0(x_0)$, given by the last step of the following algorithm, which proceeds backward in time from stage $N-1$ to stage $0$:

$$J_N(x_N) = h_N(x_N)$$
$$J_k(x_k) = \min_{u_k \in U(x_k)} \Big[ g(x_k, u_k, k) + J_{k+1}\big(f(x_k, u_k, k)\big) \Big], \quad k = 0, \dots, N-1$$

Furthermore, if $u_k^* = \pi_k^*(x_k)$ minimizes the right-hand side of the above equation for each $x_k$ and $k$, the policy $\{\pi_0^*, \pi_1^*, \dots, \pi_{N-1}^*\}$ is optimal.
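When the state and control sets are finite, this recursion translates directly into a tabular sweep. Below is a minimal generic sketch (not code from the course): states and controls are encoded as integer indices, and the callables f, g, and hN stand in for the model, stage cost, and terminal cost defined above.

```python
import numpy as np

def dp_backward(N, n_x, n_u, f, g, hN):
    """Tabular DP algorithm: proceeds backward from stage N-1 to stage 0.

    f(x, u, k)  -> next-state index (the model x_{k+1} = f(x_k, u_k, k))
    g(x, u, k)  -> stage cost (return np.inf for u not in U(x))
    hN(x)       -> terminal cost
    Returns J (costs-to-go, shape (N+1, n_x)) and pi (policy, (N, n_x)).
    """
    J = np.empty((N + 1, n_x))
    pi = np.empty((N, n_x), dtype=int)
    J[N] = [hN(x) for x in range(n_x)]
    for k in range(N - 1, -1, -1):          # backward in time
        for x in range(n_x):
            # compare immediate cost plus optimal tail cost over controls
            q = [g(x, u, k) + J[k + 1, f(x, u, k)] for u in range(n_u)]
            pi[k, x] = int(np.argmin(q))
            J[k, x] = q[pi[k, x]]
    return J, pi
```

Per the statement above, J[0][x0] is the optimal cost $J^*(x_0)$ and pi[k] is the optimal policy $\pi_k^*$; encoding infeasible controls with an infinite stage cost enforces $u_k \in U(x_k)$.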

Page 10:

Comments

• discretization (from differential equations to difference equations)
• quantization (from continuous to discrete state variables / controls)
• interpolation, to evaluate the cost-to-go between grid points (see the sketch below)
• global minimum: DP compares all alternatives, so it returns a globally optimal solution
• constraints, in general, simplify the numerical procedure (they shrink the set of admissible controls to search over)
• optimal control is obtained in closed-loop form
• curse of dimensionality
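For continuous state spaces, quantization plus interpolation is the standard workaround: store the cost-to-go on a grid and interpolate whenever $f(x, u)$ lands between grid points. A minimal one-dimensional sketch follows; the dynamics, costs, and grid sizes are invented for illustration.

```python
import numpy as np

x_grid = np.linspace(-2.0, 2.0, 81)   # quantized state space
u_grid = np.linspace(-1.0, 1.0, 21)   # quantized control space
dt = 0.1                              # time discretization step

f = lambda x, u: x + dt * u           # difference equation for x_{k+1}
g = lambda x, u: dt * (x**2 + u**2)   # stage cost

# One backward DP step: given J_{k+1} on the grid, compute J_k.
# Successor states rarely fall on grid points, so J_{k+1} is evaluated
# by linear interpolation (np.interp clamps at the grid boundary).
J_next = x_grid**2                    # e.g. terminal cost h_N(x) = x^2
J_curr = np.array([
    np.min(g(x, u_grid) + np.interp(f(x, u_grid), x_grid, J_next))
    for x in x_grid
])
```

In $n$ dimensions the grid has exponentially many points, which is exactly the curse of dimensionality noted above.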


Page 11:

Example: discrete LQR

• In most cases, the DP algorithm needs to be performed numerically
• A few cases can be solved analytically

Discrete LQR: select control inputs to minimize

$$J(x_0) = \frac{1}{2} x_N^T Q_f x_N + \frac{1}{2} \sum_{k=0}^{N-1} \left[ x_k^T Q x_k + u_k^T R u_k \right]$$

subject to the dynamics

$$x_{k+1} = A_k x_k + B_k u_k$$

Assumption: $Q_f = Q_f^T \ge 0$, $Q = Q^T \ge 0$, $R = R^T > 0$.

Page 12:

Example: discrete LQR

First step:

$$J_N(x_N) = \frac{1}{2} x_N^T Q_f x_N \quad \Rightarrow \quad P_N = Q_f$$

Going backward:

$$J_{N-1}(x_{N-1}) = \min_{u_{N-1}} \frac{1}{2} \Big[ x_{N-1}^T Q x_{N-1} + u_{N-1}^T R u_{N-1} + (A_{N-1} x_{N-1} + B_{N-1} u_{N-1})^T P_N (A_{N-1} x_{N-1} + B_{N-1} u_{N-1}) \Big]$$

Page 13:

Example: discrete LQR

Taking the derivative with respect to $u_{N-1}$ and setting it to zero:

$$R u_{N-1} + B_{N-1}^T P_N (A_{N-1} x_{N-1} + B_{N-1} u_{N-1}) = 0$$

and the second derivative,

$$R + B_{N-1}^T P_N B_{N-1} > 0,$$

is positive definite, so the stationary point is the unique global minimizer.

Page 14:

DP for discrete LQR

Hence, the optimizer satisfies

$$(R + B_{N-1}^T P_N B_{N-1})\, u_{N-1}^* = -B_{N-1}^T P_N A_{N-1} x_{N-1},$$

so

$$u_{N-1}^* = K_{N-1} x_{N-1}, \quad \text{where } K_{N-1} = -(R + B_{N-1}^T P_N B_{N-1})^{-1} B_{N-1}^T P_N A_{N-1}$$

Page 15:

DP for discrete LQR

Plugging $u_{N-1}^* = K_{N-1} x_{N-1}$ back into the cost-to-go gives a quadratic function of the state:

$$J_{N-1}(x_{N-1}) = \frac{1}{2} x_{N-1}^T \Big[ Q + K_{N-1}^T R K_{N-1} + (A_{N-1} + B_{N-1} K_{N-1})^T P_N (A_{N-1} + B_{N-1} K_{N-1}) \Big] x_{N-1} =: \frac{1}{2} x_{N-1}^T P_{N-1} x_{N-1}$$

Page 16:

DP for discrete LQR

Proceeding by induction, the solution is given by

1. $J_N(x_N) = \frac{1}{2} x_N^T P_N x_N$, where $P_N = Q_f$
2. $u_k^* = K_k x_k$, where $K_k = -(R + B_k^T P_{k+1} B_k)^{-1} B_k^T P_{k+1} A_k$
3. $J_k(x_k) = \frac{1}{2} x_k^T P_k x_k$, where $P_k = Q + K_k^T R K_k + (A_k + B_k K_k)^T P_{k+1} (A_k + B_k K_k)$

At the end, $J_0(x_0) = \frac{1}{2} x_0^T P_0 x_0$.


Page 17:

Next time

• iLQR, DDP, and LQG

$$x_{k+1} = A_k x_k + B_k u_k + w_k$$
$$y_k = C_k x_k + v_k$$
