Lecture 17: Maximum Principle



Optimal Control Problem

For the control system ẋ = f(x(t), u(t), t), x(0) = x0, solve

min_{u(t), t ∈ [0, tf]} J(u) = ∫_0^{tf} F(x, u, t) dt + g(x(tf))

subject to ẋ = f(x, u, t), ∀t ∈ [0, tf]; x(0) = x0

• Solve for optimal control u∗ and optimal cost J∗

• Running cost F (x , u, t) ≥ 0 and terminal cost g(·) ≥ 0

• Free terminal time problem: tf may also need to be optimized

• Possible additional constraints:
  • State constraint x(t) ∈ Ωx(t) and control constraint u ∈ Ωu(t)
  • Terminal state constraint x(tf) ∈ Sf


Examples

Example 1 (LQR)

• Dynamics ẋ = Ax + Bu, with x(0) = x0, x(tf) ∈ Sf

• Cost function ∫_0^{tf} (x^T Q x + u^T R u) dt

Example 2 (Brachistochrone Problem) Find the slide with the fastest descent time:

τ_descent = (1/√(2g)) ∫_0^{y0} √( (1 + (dx/dy)²) / y ) dy

• Dynamics dx/dt = u, with fixed x(0) = 0, x(y0) = x0 (the vertical drop y plays the role of the time variable t)

• Cost function (1/√(2g)) ∫_0^{y0} √( (1 + u²) / t ) dt
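As a quick numerical illustration (not part of the original slides), the sketch below evaluates the descent-time functional for two candidate curves between the same endpoints; the gravity value, endpoints, and candidate curves are assumed for illustration. A curve that drops steeply at first beats the straight ramp, consistent with the brachistochrone (cycloid) solution.

```python
# Minimal sketch (assumed data, illustrative only): evaluate
#   tau = (1/sqrt(2g)) * integral_0^{y0} sqrt((1 + (dx/dy)^2) / y) dy
# for two candidate slides from (0, 0) to (x0, y0).
import numpy as np
from scipy.integrate import quad

g, x0, y0 = 9.81, 1.0, 1.0   # assumed gravity and endpoint

def descent_time(dxdy):
    """Descent time of the curve whose slope dx/dy is given as a function of y."""
    integrand = lambda y: np.sqrt((1.0 + dxdy(y) ** 2) / y)
    val, _ = quad(integrand, 0.0, y0)   # integrable 1/sqrt(y) endpoint singularity
    return val / np.sqrt(2.0 * g)

print("straight ramp     :", descent_time(lambda y: x0 / y0))
print("steep-first curve :", descent_time(lambda y: 2.0 * x0 * y / y0 ** 2))  # x(y) = x0*(y/y0)^2
```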


Dynamic Programming Approach

The value function V(x, t) is the optimal cost over the time interval [t, tf] starting from x(t) = x:

V(x, t) := min_{u(τ), τ ∈ [t, tf]} ∫_t^{tf} F(x, u, τ) dτ + g(x(tf))

subject to ẋ = f(x, u, τ), ∀τ ∈ [t, tf]; x(t) = x

Optimality Principle: assuming u(·) ≡ v in [t, t + δ] for some small δ > 0,

V(x, t) = min_v { ∫_t^{t+δ} F(x, v, τ) dτ + V(x(t + δ), t + δ) } + o(δ)

• x(t + δ) = x(t) + f(x(t), v, t) δ + o(δ)

• Taylor series expansion of V(·, ·) at (x(t), t)


Hamilton-Jacobi-Bellman equation

V(x, t) satisfies the Hamilton-Jacobi-Bellman (HJB) equation:

min_u { F(x, u, t) + ∇_x V(x, t) · f(x, u, t) } = −∇_t V(x, t)

with boundary condition V(·, tf) = g(·)

• A partial differential equation, typically solved backward in time

• Optimal control u∗ is the one achieving minimum above

• V may not be differentiable everywhere (viscosity solution)
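Since the HJB equation is solved backward in time, a crude way to approximate V on a grid is to march the dynamic-programming recursion backward with a small time step. The sketch below does this for an assumed scalar system; everything in it (dynamics, costs, grid sizes) is an illustrative assumption, not part of the slides.

```python
# Minimal sketch: backward time-marching of the DP recursion
#   V(x, t) ≈ min_u [ F(x, u) dt + V(x + f(x, u) dt, t + dt) ],
# a finite-difference counterpart of the HJB equation (assumed scalar example).
import numpy as np

a, b = -1.0, 1.0                  # dynamics  xdot = a*x + b*u   (assumption)
q, r, qf = 1.0, 1.0, 0.0          # running cost q*x^2 + r*u^2, terminal qf*x^2
tf, dt = 1.0, 1e-3
xs = np.linspace(-2.0, 2.0, 201)  # state grid
us = np.linspace(-3.0, 3.0, 121)  # control grid

V = qf * xs**2                    # boundary condition V(x, tf) = g(x)
for _ in range(int(tf / dt)):     # march from tf back to 0
    xn = xs[:, None] + (a * xs[:, None] + b * us[None, :]) * dt   # next states
    stage = (q * xs[:, None]**2 + r * us[None, :]**2) * dt        # running cost
    Vn = np.interp(xn.ravel(), xs, V).reshape(xn.shape)           # V at next states
    V = np.min(stage + Vn, axis=1)                                # minimize over u

print("V(1, 0) ≈", np.interp(1.0, xs, V))
```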


Linear Quadratic Regulation

Continuous-time LQR problem:

min_{u(t), t ∈ [0, tf]} ∫_0^{tf} (x^T Q x + u^T R u) dt + x(tf)^T Qf x(tf)

subject to ẋ = Ax + Bu, x(0) = x0

• Value function is quadratic: V(x, t) = x^T P(t) x with P(tf) = Qf

• The HJB equation implies P(·) satisfies the Riccati differential equation:

min_u { x^T Q x + u^T R u + 2 x^T P (Ax + Bu) } = −x^T Ṗ x

⇒ min_u [x; u]^T [[Q + PA + A^T P,  PB], [B^T P,  R]] [x; u] = −x^T Ṗ x

⇒ −Ṗ = Q + PA + A^T P − P B R^{-1} B^T P
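In practice the Riccati differential equation is integrated backward from P(tf) = Qf, and the optimal feedback is then u*(t) = −R^{-1} B^T P(t) x(t). Below is a minimal numerical sketch for an assumed double-integrator example (all matrices and the horizon are illustrative assumptions, not from the slides).

```python
# Minimal sketch: integrate  -dP/dt = Q + P A + A^T P - P B R^{-1} B^T P
# backward from P(tf) = Qf with an off-the-shelf ODE solver.
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # assumed double integrator
B = np.array([[0.0], [1.0]])
Q, R, Qf = np.eye(2), np.array([[1.0]]), np.zeros((2, 2))
tf = 5.0

def riccati_rhs(t, p_flat):
    P = p_flat.reshape(2, 2)
    dP = -(Q + P @ A + A.T @ P - P @ B @ np.linalg.solve(R, B.T) @ P)
    return dP.ravel()

# Integrating over (tf, 0) runs the solver backward in time.
sol = solve_ivp(riccati_rhs, (tf, 0.0), Qf.ravel(), dense_output=True, rtol=1e-8)
P0 = sol.sol(0.0).reshape(2, 2)
K0 = np.linalg.solve(R, B.T @ P0)        # feedback gain at t = 0: u*(0) = -K0 x0
print("P(0) =\n", P0, "\nK(0) =", K0)
```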


Pontryagin Maximum Principle

Suppose u*(·), x*(·) are a solution to the optimal control problem

min_u ∫_0^{tf} F(x, u, t) dt + g(x(tf))

s.t. ẋ = f(x, u, t), t ∈ [0, tf]; x(0) = x0

Then there exists a co-state λ*(·) ∈ R^n such that

ẋ* = ∇_λ H(x*, u*, λ*, t)

λ̇* = −∇_x H(x*, u*, λ*, t) (adjoint equation)

where H(x, u, λ, t) := F(x, u, t) + λ^T f(x, u, t) is the Hamiltonian.

The optimal control u* satisfies ∇_u H(x*, u*, λ*, t) = 0, or more generally

H(x*, u*, λ*, t) = inf_u H(x*, u, λ*, t)

The Mathematical Theory of Optimal Processes, L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko, Interscience, 1962.


Connection with Dynamic Programming

• Via dynamic programming, the optimal control is decided from

u* = arg min_u [ F(x*, u, t) + ∇_x V(x*, t) · f(x*, u, t) ]

• Via the maximum principle, the optimal control is decided from

u* = arg min_u H(x*, u, λ*, t) = arg min_u [ F(x*, u, t) + λ* · f(x*, u, t) ]

• Indeed, the co-state is the gradient of the value function with respect to the state:

λ* = ∇_x V(x*, t), ∀t ∈ [0, tf]


Calculus of Variations

Perturb the control u(·) to u(·) + δu(·).

The state is perturbed from x(·) to x(·) + δx(·), with

δẋ = ∇_x f(x, u, t) δx + ∇_u f(x, u, t) δu + o(δ)

(the dynamics linearized around x(·))

Optimality condition (assume g ≡ 0): the constrained optimization problem

min_{δu, δx} δJ = ∫_0^{tf} [ ∇_x F(x, u, t) δx + ∇_u F(x, u, t) δu ] dt

s.t. δẋ = ∇_x f(x, u, t) δx + ∇_u f(x, u, t) δu, ∀t ∈ [0, tf]

achieves its minimum value of 0 at δu = 0 and δx = 0


Unconstrained Optimization

By introducing the Lagrange multiplier function λ(·), the constrained optimization problem is converted to an unconstrained one with Lagrangian

∫_0^{tf} [ (∇_x H(x, u, λ, t) + λ̇^T) δx + ∇_u H(x, u, λ, t) δu ] dt − λ^T(tf) δx(tf)

where H = F + λ^T f is the Hamiltonian defined before.

To achieve the minimum at δx = 0 and δu = 0, their coefficients should be zero, which implies the maximum principle.

Further, we need λ^T(tf) δx(tf) = 0:

1. If x(tf) is fixed, λ(tf) is unconstrained

2. If x(tf) is unconstrained, λ(tf) = 0

3. If x(tf) ∈ Sf, λ(tf) ⊥ tangent space of Sf at x(tf) (transversality condition)

Extension to the g ≢ 0 case is straightforward (replace λ(tf) with λ(tf) − ∇_x g(x(tf)))


Example

• Dynamics ẋ1 = u1, ẋ2 = u2, i.e., f(x, u, t) = [u1; u2]

• x(0) = x0, x(tf) ∈ Sf

• Cost ∫_0^{tf} (u1² + u2²) dt, i.e., F(x, u, t) = u1² + u2²

Solution via the maximum principle

• Co-state λ = [λ1  λ2]^T

• Hamiltonian H(x, u, λ, t) = F + λ^T f = u1² + u2² + λ1 u1 + λ2 u2

• Co-state dynamics λ̇* = −∇_x H(x*, u*, λ*, t) = 0; thus λ* is constant

• Optimal control: ∂H/∂ui = 2 ui* + λi* = 0; hence u* = −λ*/2 is constant

• Transversality condition: u* (hence the optimal path) ⊥ Sf (see the numeric sketch below)
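For a concrete instance, take Sf to be a line: the terminal point is then the orthogonal projection of x0 onto Sf, and the constant control points along the normal of Sf. A minimal numeric sketch with an assumed line and horizon (both are illustrative choices, not from the slides):

```python
# Minimal sketch: minimum-energy transfer of x1' = u1, x2' = u2 from x0
# to the line Sf = {x : a^T x = b}; the optimal control is constant and
# perpendicular to Sf, so x(tf) is the orthogonal projection of x0 onto Sf.
import numpy as np

x0 = np.array([2.0, 1.0])                 # assumed initial state
a, b = np.array([1.0, 1.0]), 3.0          # assumed Sf: x1 + x2 = 3
tf = 2.0

x_tf = x0 + (b - a @ x0) / (a @ a) * a    # orthogonal projection of x0 onto Sf
u_star = (x_tf - x0) / tf                 # constant optimal control (parallel to a)
J_star = tf * (u_star @ u_star)           # optimal cost = tf * |u*|^2

print("u* =", u_star, " x(tf) =", x_tf, " J* =", J_star)
```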


Dubins Path

Vehicle dynamics

ẋ = v0 cos θ
ẏ = v0 sin θ
θ̇ = u

with fixed initial state x(0) = x0 and terminal state x(tf) = xf

• Constant speed v0 and bounded turn rate u ∈ [−1, 1]

• Cost J(u) = tf = ∫_0^{tf} 1 dt, i.e., F ≡ 1 (free terminal time problem)

• Shortest curve with bounded curvature connecting x0 and xf

Solution via the maximum principle

• Hamiltonian H = 1 + v0(λ1 cos θ + λ2 sin θ) + λ3 u

• Adjoint equations:

λ̇1 = 0 (i.e., λ1 is constant)
λ̇2 = 0 (i.e., λ2 is constant)
λ̇3 = v0(λ1 sin θ − λ2 cos θ)

• Minimizing H over u ∈ [−1, 1] gives

u* = 1 if λ3 < 0,   u* = −1 if λ3 > 0,   u* undetermined (singular) if λ3 = 0


Fact: the optimal path is a combination of no more than three motion primitives

• S (straight): u ≡ 0

• L (left turn): u ≡ 1

• R (right turn): u ≡ −1

Further, the only possible combinations are LRL, RLR, LSL, LSR, RSL, RSR (a simulation sketch follows the reference below).

“On curves of minimal length with a constraint on average curvature, and with prescribed initial and terminal positions and tangents”, L. E. Dubins, American Journal of Mathematics, 79:497-516, 1957.
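To get a feel for how the primitives compose, the sketch below simply rolls the Dubins vehicle forward under a candidate primitive sequence; the speed, durations, and sequence are assumed for illustration and do not constitute an optimized Dubins path.

```python
# Minimal rollout sketch: integrate x' = v0 cos(theta), y' = v0 sin(theta),
# theta' = u under a fixed sequence of motion primitives (here L, S, R).
import numpy as np

def rollout(state, u, duration, v0=1.0, dt=1e-3):
    """Forward-Euler rollout of the Dubins vehicle under constant turn rate u."""
    x, y, th = state
    for _ in range(int(duration / dt)):
        x += v0 * np.cos(th) * dt
        y += v0 * np.sin(th) * dt
        th += u * dt
    return (x, y, th)

state = (0.0, 0.0, 0.0)                                # start at origin, heading east
for u, T in [(+1.0, 1.2), (0.0, 2.0), (-1.0, 0.8)]:    # assumed L, S, R durations
    state = rollout(state, u, T)
print("terminal (x, y, theta):", state)
```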


A Relook at the LQR Problem

Continuous-time LQR problem:

minimize J(x, u) = (1/2) ∫_0^{tf} (x^T Q x + u^T R u) dt

subject to ẋ = Ax + Bu, t ∈ [0, tf], x(0) = x0

Hamiltonian function H(x, u, λ) = (1/2)(x^T Q x + u^T R u) + λ^T (Ax + Bu)

By the maximum principle, the optimal u, x, λ satisfy

H_u = Ru + B^T λ = 0 ⇒ u* = −R^{-1} B^T λ*

ẋ* = H_λ = A x* − B R^{-1} B^T λ*
λ̇* = −H_x = −Q x* − A^T λ*

⇒ [ẋ*; λ̇*] = [[A, −B R^{-1} B^T], [−Q, −A^T]] [x*; λ*]

with two-point boundary conditions x(0) = x0, λ(tf) = 0

Connection to the value function V_t(x) = x^T P(t) x: λ(t) = P(t) x(t)
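The two-point boundary value problem can be handed to a general-purpose BVP solver; the sketch below does so for an assumed scalar system (all numbers are illustrative assumptions) and recovers the open-loop optimal control from the co-state.

```python
# Minimal sketch: solve  [x'; lam'] = [[A, -B R^{-1} B^T], [-Q, -A^T]] [x; lam]
# with x(0) = x0 and lam(tf) = 0, then u* = -R^{-1} B^T lam.
import numpy as np
from scipy.integrate import solve_bvp

A, B, Q, R = -1.0, 1.0, 1.0, 1.0     # assumed scalar LQR data
x0, tf = 1.0, 2.0

def hamiltonian_system(t, z):
    x, lam = z
    return np.vstack([A * x - B * (1.0 / R) * B * lam,     # x'
                      -Q * x - A * lam])                    # lam'

def boundary_conditions(za, zb):
    return np.array([za[0] - x0, zb[1]])   # x(0) = x0,  lam(tf) = 0

t_mesh = np.linspace(0.0, tf, 50)
sol = solve_bvp(hamiltonian_system, boundary_conditions, t_mesh,
                np.zeros((2, t_mesh.size)))
u_star = -(1.0 / R) * B * sol.sol(t_mesh)[1]   # open-loop optimal control
print("u*(0) =", u_star[0])
```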


Optimal Control of Hybrid System

• Two modes 1 and 2 with domains D1 = {x : x2 ≥ 0} and D2 = {x : x2 ≤ 0}

• Identical dynamics f1 = f2: ẋ1 = u1, ẋ2 = u2

• Switching surface (guard) D1 ∩ D2

• Trivial reset condition

Suppose the two modes have different running costs:

F1(x, u, t) = u1² + u2²,   F2(x, u, t) = 2u1² + 2u2²

Optimal control problem: among all solutions that start from x0 ∈ D1 at time 0, switch exactly once from mode 1 to mode 2, and end at xf ∈ D2 at a fixed terminal time tf, find the one with the least cost.

• With a switching time t1 ∈ (0, tf), the cost is

∫_0^{t1} F1(x, u, t) dt + ∫_{t1}^{tf} F2(x, u, t) dt


Variational Method

To see if u is optimal, perturb it to u + δu:

• The switching time t1 is perturbed to t1 + δt1

• The state trajectory is perturbed from x to x + δx (two segments)

• The cost J is perturbed to J + δJ

Optimality condition: for u, x to be optimal, the following problem should have the optimal solution δx = 0 and δu = 0:

min_{δu} δJ subject to the linearized ODE in (δx, δu), δx(0) = δx(tf) = 0

• Introduce a Lagrange multiplier (co-state) λ(t), t ∈ [0, tf], to convert the constrained problem above into an unconstrained one

• Integrate by parts and set the coefficients of δu and δx to zero


Optimality Condition

The (hybrid) Hamiltonian function H(x, u, λ, t) is defined as

H(x, u, λ, t) := F1(x, u, t) + λ^T f1(x, u, t) = u1² + u2² + λ1 u1 + λ2 u2,  if x ∈ D1

H(x, u, λ, t) := F2(x, u, t) + λ^T f2(x, u, t) = 2u1² + 2u2² + λ1 u1 + λ2 u2,  if x ∈ D2

Suppose u*, x* are an optimal solution with switching time t1* ∈ (0, tf). Then there exists a co-state λ*(t), t ∈ [0, tf], such that

ẋ* = H_λ(x*, u*, λ*, t), t ∈ [0, t1) ∪ (t1, tf]

λ̇* = −H_x(x*, u*, λ*, t), t ∈ [0, t1) ∪ (t1, tf]

Moreover, u* satisfies ∇_u H(x*, u*, λ*, t) = 0, and the transversality condition

(λ(t1+) − λ(t1−)) ⊥ T_{x*(t1)}(D1 ∩ D2)

holds at the switching time.


Solving Optimality Condition

• As H does not depend on x, λ̇* = −H_x = 0 implies

λ*(t) ≡ λ⁻ for t ∈ [0, t1),   λ*(t) ≡ λ⁺ for t ∈ (t1, tf]

• The transversality condition (λ(t1+) − λ(t1−)) ⊥ T_{x*(t1)}(D1 ∩ D2) implies λ1⁻ = λ1⁺ (the switching surface is the x1-axis, so only the second component of λ may jump)

• ∇_u H(x*, u*, λ*, t) = 0 implies u* is constant in each mode:

u* = −λ⁻/2 for t ∈ [0, t1),   u* = −λ⁺/4 for t ∈ (t1, tf]


Finding Optimal Solution

The optimal u*, x* are specified by the two vectors λ⁻, λ⁺ satisfying

λ1⁻ = λ1⁺

−(λ2⁻/2) · t1 + x0,2 = 0   (the mode-1 segment reaches the switching surface x2 = 0 at time t1, where x0,2 is the vertical coordinate of x0)

−(λ⁻/2) · t1 − (λ⁺/4) · (tf − t1) = xf − x0   (the total displacement carries the trajectory to xf)

• Four unknowns and four equations

• In general admits a unique solution

• The solution determines u*, x* (and t1); a direct numerical cross-check appears below
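As a sanity check on these conditions, the example can also be attacked directly: in each mode the optimal segment is a straight line traversed at constant velocity, so the problem reduces to choosing the crossing point on the switching surface and the switching time. The sketch below does this numerically for assumed data (x0, xf, tf are illustrative choices); it is a direct parameterization, not the co-state equations themselves.

```python
# Minimal cross-check sketch: minimize
#   |p - x0|^2 / t1  +  2 |xf - p|^2 / (tf - t1)
# over the crossing point p = (p1, 0) on the switching surface and the
# switching time t1 in (0, tf).
import numpy as np
from scipy.optimize import minimize

x0 = np.array([0.0, 1.0])    # assumed start in D1 (x2 >= 0)
xf = np.array([3.0, -1.0])   # assumed end in D2 (x2 <= 0)
tf = 2.0

def cost(z):
    p1, t1 = z
    p = np.array([p1, 0.0])
    return (np.sum((p - x0) ** 2) / t1                    # mode-1 energy cost
            + 2.0 * np.sum((xf - p) ** 2) / (tf - t1))    # mode-2 energy cost

res = minimize(cost, [1.0, 1.0], bounds=[(None, None), (1e-3, tf - 1e-3)])
p1, t1 = res.x
print("crossing point (%.3f, 0), switching time t1 = %.3f, J* = %.3f"
      % (p1, t1, res.fun))
```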


Snell’s Law

A light ray passing through a boundary between two isotropic media refracts according to

sin θ1 / sin θ2 = v1 / v2 = n2 / n1   (Snell's Law)

• v1, v2: velocities of light in the two media

• n1, n2: refractive indices

(Figure from wikipedia.org: refraction of a light ray at the interface between the two media.)

The optimal hybrid trajectory in the preceding example behaves analogously: it is a straight line in each mode and bends at the switching surface, because the running cost (the "medium") differs on the two sides.


Hybrid Maximum Principle

A general theory of maximum principle for hybrid systems:

• H. J. Sussmann, A maximum principle for hybrid optimal control problems, CDC, pp. 425-430, 1999.

Notable references on optimal control of hybrid systems:

• S. C. Bengea and R. A. DeCarlo, Optimal control of switching systems, Automatica, 2005.

• X. Xu and P. Antsaklis, Optimal control of switched systems based on parameterization of the switching instants, IEEE Transactions on Automatic Control, 2004.

• M. Egerstedt, Y. Wardi and H. Axelsson, Transition-time optimization for switched-mode dynamical systems, IEEE Transactions on Automatic Control, 2006.


Embedding Technique

Optimal control problem for a switched system with mode signal σ(t) ∈ {1, ..., m}:

min_{u(·), σ(·)} J = ∫_0^{tf} F_{σ(t)}(x(t), u(t), t) dt + g(x(tf), tf)

subject to ẋ(t) = f_{σ(t)}(x(t), u(t), t), ∀t ∈ [0, tf]; x(0) = x0        (1)

Idea: solve the optimal control problem for a non-switched (embedded) system:

min_{u(·), Δ(·)} J̃ = ∫_0^{tf} Σ_{i=1}^m Δi(t) · Fi(x(t), u(t), t) dt + g(x(tf), tf)

subject to ẋ(t) = Σ_{i=1}^m Δi(t) · fi(x(t), u(t), t)        (2)

• The non-switched system (2) has inputs u(·) and [Δ1(t) ··· Δm(t)], the latter taking values in the m-simplex: Δi(t) ≥ 0, Σ_{i=1}^m Δi(t) = 1

• Under some mild conditions, solutions of (1) are dense in the solutions of (2)


Embedding Technique

Hamiltonian of the embedded system:

H(x, u, Δ, λ, t) = Σ_{i=1}^m Δi · Hi(x, u, λ, t)

where Hi(x, u, λ, t) = Fi + λ^T fi is the Hamiltonian of the i-th subsystem

Optimal solution of the embedded system:

ẋ = H_λ = Σ_{i=1}^m Δi · fi(x, u, t),   λ̇ = −Σ_{i=1}^m Δi · ∂Hi(x, u, λ, t)/∂x

Optimal u and Δ: H(x, u*, Δ*, λ, t) = min_{i, u} Hi(x, u, λ, t)

• Generally, the optimal Δ* takes values at a vertex (corner) of the m-simplex, recovering a switched solution

• In some cases, Δ* takes interior values, and the embedded optimal cost J̃* differs from the switched optimal cost J*

“Optimal control of switching systems”, S. C. Bengea and R. A. DeCarlo,Automatica, 2005.
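Because the embedded Hamiltonian is linear in Δ on the simplex, the pointwise minimization over (u, Δ) reduces to picking the single mode whose minimized Hi is smallest. The sketch below illustrates this for an assumed two-mode system (quadratic control cost, with mode 2 carrying an extra constant drift); it is not the example from the slides.

```python
# Minimal sketch: pointwise minimization of H = sum_i Delta_i * H_i,
# H_i = F_i + lam^T f_i, for F_i = |u|^2, f_1(x, u) = u, f_2(x, u) = u + d.
# The minimum over (u, Delta) is attained at a vertex of the simplex.
import numpy as np

d = np.array([1.0, 0.0])    # assumed drift of mode 2

def min_H_i(i, lam):
    """Return (min_u H_i, argmin u) for mode i at co-state lam."""
    u = -lam / 2.0                        # minimizer of |u|^2 + lam^T u
    val = float(u @ u + lam @ u)          # = -|lam|^2 / 4
    if i == 2:
        val += float(lam @ d)             # extra term lam^T d from the drift
    return val, u

def embedded_pointwise_min(lam):
    vals = {i: min_H_i(i, lam) for i in (1, 2)}
    i_star = min(vals, key=lambda i: vals[i][0])
    return i_star, vals[i_star]           # Delta* = vertex e_{i_star}

print(embedded_pointwise_min(np.array([-2.0, 1.0])))   # drift helps: mode 2 wins
print(embedded_pointwise_min(np.array([+2.0, 1.0])))   # drift hurts: mode 1 wins
```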
