AA203 Optimal and Learning-based Control
Dynamic programming
Roadmap
4/13/19 AA 203 | Lecture 3 2

Optimal control
• Open-loop
  – Indirect methods
  – Direct methods
• Closed-loop
  – DP → HJB / HJI
  – MPC
Adaptive optimal control / Model-based RL
• Linear methods
• Non-linear methods
Principle of optimality

The key concept behind the dynamic programming approach is the principle of optimality. Suppose the optimal path for a multi-stage decision-making problem passes through intermediate point b:
• the first decision yields segment a - b with cost J_ab
• the remaining decisions yield segment b - e with cost J_be
• the optimal cost is then J_ae^* = J_ab + J_be
Principle of optimality

• Claim: if a - b - e is the optimal path from a to e, then b - e is the optimal path from b to e
• Proof: suppose instead that b - c - e is the optimal path from b to e. Then

J_bce < J_be

and

J_ab + J_bce < J_ab + J_be = J_ae^*

Contradiction!
Principle of optimality
Principle of optimality (for discrete-time systems): Let π^* := {π_0^*, π_1^*, ..., π_{N-1}^*} be an optimal policy. Assume state x_i is reachable. Consider the subproblem whereby we are at x_i at time i and we wish to minimize the cost-to-go from time i to time N. Then the truncated policy {π_i^*, π_{i+1}^*, ..., π_{N-1}^*} is optimal for the subproblem.

• tail policies are optimal for tail subproblems
• notation: π_i^*(x_i) = π^*(x_i, i)
Applying the principle of optimality
Principle of optimality: if a - b is the initial segment of the optimal path from a to e, then b - e is the terminal segment of this path.

Hence, the optimal trajectory is found by comparing:

J_abe = J_ab + J_be^*
J_ace = J_ac + J_ce^*
J_ade = J_ad + J_de^*
Applying the principle of optimality
• we need only compare the concatenations of immediate decisions and optimal tail decisions → significant decrease in computation / number of possibilities
• in practice: carry out this procedure backward in time
Example
Optimal cost: 18
Optimal path: a → b → d → e → g → h
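A backward pass of this kind can be sketched in code; the graph below is a small hypothetical example (node names and arc costs are illustrative, not the lecture's figure):

```python
# Backward DP on a small DAG (hypothetical graph, not the lecture's figure).
# edges[n] lists (successor, arc cost); the goal node is "d".
edges = {
    "a": [("b", 2), ("c", 5)],
    "b": [("c", 1), ("d", 4)],
    "c": [("d", 1)],
    "d": [],
}

best = {"d": (0, None)}           # (cost-to-go, next node) at the goal
for n in ["c", "b", "a"]:         # nodes in reverse topological order
    best[n] = min((c + best[m][0], m) for m, c in edges[n])

# Reconstruct the optimal path from "a" by following the stored successors.
path, n = ["a"], "a"
while best[n][1] is not None:
    n = best[n][1]
    path.append(n)
print(best["a"][0], path)         # 4 ['a', 'b', 'c', 'd']
```

Each node compares only "immediate arc cost + optimal cost-to-go of the successor", exactly the concatenation argument above.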
DP Algorithm

• Model: x_{k+1} = f(x_k, u_k, k), u_k ∈ U(x_k)
• Cost: J(x_0) = h_N(x_N) + Σ_{k=0}^{N-1} g(x_k, π_k(x_k), k)

DP Algorithm: For every initial state x_0, the optimal cost J^*(x_0) is equal to J_0(x_0), given by the last step of the following algorithm, which proceeds backward in time from stage N - 1 to stage 0:

J_N(x_N) = h_N(x_N)
J_k(x_k) = min_{u_k ∈ U(x_k)} [ g(x_k, u_k, k) + J_{k+1}( f(x_k, u_k, k) ) ],  k = 0, ..., N - 1

Furthermore, if u_k^* = π_k^*(x_k) minimizes the right-hand side of the above equation for each x_k and k, the policy {π_0^*, π_1^*, ..., π_{N-1}^*} is optimal.
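The backward recursion above can be sketched for finite state and control sets; the model below (an integer state driven toward zero) is an illustrative assumption, not from the lecture:

```python
# Finite-horizon DP over small discrete state/control sets (illustrative).
# Model: x_{k+1} = f(x_k, u_k, k), stage cost g, terminal cost h.

def dp(states, controls, f, g, h, N):
    """Cost-to-go tables J[k][x] and policy pi[k][x] by backward recursion."""
    J = [dict() for _ in range(N + 1)]
    pi = [dict() for _ in range(N)]
    for x in states:
        J[N][x] = h(x)                      # J_N(x_N) = h_N(x_N)
    for k in reversed(range(N)):            # stages N-1 down to 0
        for x in states:
            best_u, best_cost = None, float("inf")
            for u in controls:
                c = g(x, u, k) + J[k + 1][f(x, u, k)]
                if c < best_cost:
                    best_u, best_cost = u, c
            J[k][x] = best_cost
            pi[k][x] = best_u
    return J, pi

# Illustrative model: integer state in [-3, 3], push it toward 0.
states = list(range(-3, 4))
controls = (-1, 0, 1)
f = lambda x, u, k: max(-3, min(3, x + u))  # saturated integrator
g = lambda x, u, k: x * x + abs(u)          # state + control penalty
h = lambda x: 10 * x * x                    # terminal penalty
J, pi = dp(states, controls, f, g, h, N=4)
print(J[0][3], pi[0][3])                    # 17 -1
```

The returned `pi` is a closed-loop policy: it stores the minimizing control for every state at every stage, not just along one trajectory.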
Comments
• discretization (from differential equations to difference equations)
• quantization (from continuous to discrete state variables / controls)
• interpolation
• global minimum
• constraints, in general, simplify the numerical procedure
• optimal control in closed-loop form
• curse of dimensionality
Example: discrete LQR
• In most cases, the DP algorithm must be carried out numerically
• A few cases can be solved analytically
Discrete LQR: select control inputs to minimize
! "# = 12 "'
( ) "' + 12+,-#
'./[",( 1 ", + 2,( 3 24]
subject to the dynamics",6/ = 7,", + 8, 2,
Assumption: ) = )( ≥ 0, 1 = 1( ≥ 0, 3 = 3( > 04/13/19 AA 203 | Lecture 3 11
Example: discrete LQR
First step:

J_N(x_N) = (1/2) x_N^T Q_f x_N,  i.e., P_N = Q_f

Going backward:

J_{N-1}(x_{N-1}) = min_{u_{N-1}} (1/2) [ x_{N-1}^T Q x_{N-1} + u_{N-1}^T R u_{N-1} ] + (1/2) (A_{N-1} x_{N-1} + B_{N-1} u_{N-1})^T P_N (A_{N-1} x_{N-1} + B_{N-1} u_{N-1})
Example: discrete LQR
Taking the derivative of the expression inside the min with respect to u_{N-1} and setting it to zero:

R u_{N-1} + B_{N-1}^T P_N (A_{N-1} x_{N-1} + B_{N-1} u_{N-1}) = 0

and the second derivative is

R + B_{N-1}^T P_N B_{N-1} > 0

so the stationary point is the global minimum.
DP for discrete LQR
Hence, the optimizer satisfies

(R + B_{N-1}^T P_N B_{N-1}) u_{N-1}^* = -B_{N-1}^T P_N A_{N-1} x_{N-1}

so

u_{N-1}^* = -(R + B_{N-1}^T P_N B_{N-1})^{-1} B_{N-1}^T P_N A_{N-1} x_{N-1} =: F_{N-1} x_{N-1}
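As a sanity check, this closed-form optimizer can be verified numerically against the one-step cost it is supposed to minimize; the matrices below are a random illustrative instance, not lecture data:

```python
import numpy as np

# One-step LQR subproblem: minimize over u
#   1/2 (x'Qx + u'Ru) + 1/2 (Ax + Bu)' P (Ax + Bu)
# with closed-form optimizer u* = -(R + B'PB)^{-1} B'P A x.
rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
Q, R, P = np.eye(n), np.eye(m), np.eye(n)   # illustrative weights
x = rng.standard_normal(n)

def cost(u):
    xn = A @ x + B @ u                      # next state A x + B u
    return 0.5 * (x @ Q @ x + u @ R @ u) + 0.5 * (xn @ P @ xn)

u_star = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A @ x)

# The gradient vanishes at u*, and random perturbations only increase the cost.
grad = R @ u_star + B.T @ P @ (A @ x + B @ u_star)
assert np.allclose(grad, 0)
assert all(cost(u_star) <= cost(u_star + rng.standard_normal(m))
           for _ in range(100))
print("one-step optimizer verified")
```

Since R + B^T P B > 0, the cost is strictly convex in u, so this stationary point is the unique global minimizer.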
DP for discrete LQR
Plugging u_{N-1}^* back in, the cost-to-go is again quadratic:

J_{N-1}(x_{N-1}) = (1/2) x_{N-1}^T P_{N-1} x_{N-1}, where

P_{N-1} = Q + F_{N-1}^T R F_{N-1} + (A_{N-1} + B_{N-1} F_{N-1})^T P_N (A_{N-1} + B_{N-1} F_{N-1})
DP for discrete LQR
Proceeding by induction, the solution is given by:

1. J_N(x_N) = (1/2) x_N^T P_N x_N, where P_N = Q_f
2. u_k^* = F_k x_k, where F_k = -(R + B_k^T P_{k+1} B_k)^{-1} B_k^T P_{k+1} A_k
3. J_k(x_k) = (1/2) x_k^T P_k x_k, where
   P_k = Q + F_k^T R F_k + (A_k + B_k F_k)^T P_{k+1} (A_k + B_k F_k)

At the end, J_0(x_0) = (1/2) x_0^T P_0 x_0.
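This backward recursion translates directly into code. A minimal sketch, assuming time-invariant A, B for brevity (the recursion above allows A_k, B_k), with an illustrative scalar sanity check:

```python
import numpy as np

def lqr_backward(A, B, Q, R, Qf, N):
    """Backward Riccati recursion for finite-horizon discrete LQR.

    Returns feedback gains F_k (u_k = F_k x_k) and cost-to-go matrices
    P_k, so that J_k(x) = 1/2 x' P_k x.
    """
    P = [None] * (N + 1)
    F = [None] * N
    P[N] = Qf
    for k in reversed(range(N)):
        F[k] = -np.linalg.solve(R + B.T @ P[k + 1] @ B,
                                B.T @ P[k + 1] @ A)
        Acl = A + B @ F[k]                  # closed-loop dynamics A + B F_k
        P[k] = Q + F[k].T @ R @ F[k] + Acl.T @ P[k + 1] @ Acl
    return F, P

# Scalar sanity check with A = B = Q = R = Qf = 1 (illustrative values):
# the recursion converges to the stationary Riccati solution
# p = (1 + sqrt(5)) / 2, the golden ratio.
A = B = Q = R = Qf = np.eye(1)
F, P = lqr_backward(A, B, Q, R, Qf, N=20)
print(float(P[0][0, 0]))                    # ~1.618
```

Because the cost-to-go stays quadratic at every stage, the whole closed-loop solution reduces to iterating two matrix formulas backward in time.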
Next time
• iLQR, DDP, and LQG
!"#$ = &"!" + (" )" + *"+" = ,"!" + -"
4/13/19 AA 203 | Lecture 3 17