4SC000 Q2 2017-2018 Optimal Control and Dynamic Programming Duarte Antunes


Page 1

4SC000 Q2 2017-2018

Optimal Control and Dynamic Programming

Duarte Antunes

Page 2

Part II
Stage decision problems

Page 3

Recap

Optimal control formulation
• Dynamic model & cost function (transition diagram for discrete optimization problems).
• Computing an optimal policy vs computing an optimal path.

Dynamic programming algorithm
• Allows computing policies (to deal with uncertainty).
• Equivalent way to write it: the DP equation.
• Stochastic dynamic programming: computes a policy that minimizes an expected cost.

Alternative algorithms
• To compute optimal paths, alternative algorithms (e.g. Dijkstra's) may be more efficient.

Partial information
• When there is only partial information about the state, rely on the Bayes filter.

Page 4

Goals of part II

Introduce optimal control concepts for stage decision problems.

|                              | Discrete optimization problems           | Stage decision problems                  |
| Formulation                  | Transition diagram                       | Dynamic system & additive cost function  |
| DP algorithm & Stochastic DP | Graphical DP algorithm & DP equation     | DP equation                              |
| Alternative algorithms       | Dijkstra's algorithm                     | Static optimization                      |
| Partial information          | Bayes filter                             | Kalman filter                            |
| Application focus            | Operational research & Computer science  | Digital control                          |

Page 5

Outline

• Dynamic programming for stage decision problems

• Linear quadratic regulator

Page 6

Stage decision problems

Dynamic model: $x_{k+1} = f_k(x_k, u_k)$, $k \in \{0, \ldots, h-1\}$

Cost function: $\sum_{k=0}^{h-1} g_k(x_k, u_k) + g_h(x_h)$

• State and input live in arbitrary spaces, $x_k \in X_k$, $u_k \in U_k(x_k)$.
• If these spaces are discrete this is a discrete optimization problem.
• Typically $x_k \in \mathbb{R}^n$ and $u_k \in \mathbb{R}^m$ for every $k \in \{0, \ldots, h-1\}$.
• Goals: find an optimal path and find an optimal policy.

Page 7

Optimal path

• Given an initial condition $x_0$, a path is a set of decisions $\{u_0, u_1, \ldots, u_{h-1}\}$ (equivalently, pairs $\{(x_0, u_0), \ldots, (x_{h-1}, u_{h-1})\}$) such that $u_k \in U_k(x_k)$ and the states satisfy the equations of the dynamic model,
$x_1 = f_0(x_0, u_0)$, $x_2 = f_1(x_1, u_1)$, ..., $x_h = f_{h-1}(x_{h-1}, u_{h-1})$,
with stage costs $g_0(x_0, u_0), g_1(x_1, u_1), \ldots, g_{h-1}(x_{h-1}, u_{h-1})$ and terminal cost $g_h(x_h)$.

[Figure: transition diagram with state sets $X_0, X_1, \ldots, X_{h-1}, X_h$ at stages $0, 1, \ldots, h-1, h$]

• A path is said to be optimal if there does not exist another path with a strictly smaller cost.

Page 8

Optimal policy

Policy: A policy is a set of functions $\pi = \{\mu_0, \ldots, \mu_{h-1}\}$, $\mu_k : X_k \to U_k$.

Optimal policy: A policy is said to be optimal if, for every state $x_\ell$ at every stage $\ell \in \{0, \ldots, h-1\}$, $\mu_\ell(x_\ell)$ is the first action of an optimal path for the tail subproblem which considers only stages $\{\ell, \ell+1, \ldots, h\}$ with initial condition $x_\ell$ and cost
$\sum_{k=\ell}^{h-1} g_k(x_k, u_k) + g_h(x_h)$.

Page 9

Dynamic programming algorithm

Start with $J_h(x_h) = g_h(x_h)$ for every $x_h \in X_h$ and, for each decision stage, starting from the last and moving backwards, $k \in \{h-1, h-2, \ldots, 0\}$, compute $J_k$ and $\mu_k$ from

$J_k(x_k) = \min_{u_k \in U_k(x_k)} g_k(x_k, u_k) + J_{k+1}(f_k(x_k, u_k))$   (DP equation)

and

$J_k(x_k) = g_k(x_k, \mu_k(x_k)) + J_{k+1}(f_k(x_k, \mu_k(x_k)))$,

where $\mu_k(x_k) = u_k$ is the minimizer in the DP equation. Then $\{\mu_0, \ldots, \mu_{h-1}\}$ is an optimal policy.

Theorem: The policy obtained with the DP algorithm is an optimal policy (proof in the appendix).
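
When the state and input spaces are (or are made) finite, the recursion above can be carried out numerically. The sketch below is a minimal MATLAB illustration, not part of the slides: the grids, the linear interpolation of $J_{k+1}$ and the use of the scalar integrator of the next example are choices made here.

% Minimal sketch: backward DP on discretized state/input grids (assumptions:
% scalar integrator dynamics and quadratic costs, as in the next example).
X  = -3:0.1:3;            % state grid
U  = -2:0.1:2;            % input grid
h  = 2;                   % horizon
f  = @(x,u) x + u;        % dynamic model
g  = @(x,u) x.^2 + u.^2;  % stage cost
gh = @(x) x.^2;           % terminal cost
J  = gh(X);               % J_h evaluated on the grid
mu = zeros(h, numel(X));  % mu(k+1,i): decision at stage k for state X(i)
for k = h-1:-1:0
    Jnew = zeros(size(X));
    for i = 1:numel(X)
        xnext = f(X(i), U);                           % successor states
        Jnext = interp1(X, J, xnext, 'linear', inf);  % cost-to-go (inf off-grid)
        [Jnew(i), idx] = min(g(X(i), U) + Jnext);     % DP equation
        mu(k+1, i) = U(idx);
    end
    J = Jnew;
end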

Page 10

Simple integrator example

Consider the following simple example of a stage decision problem.

Dynamic model: $x_{k+1} = x_k + u_k$, $k \in \{0, 1\}$

Cost: $\sum_{k=0}^{1} \left( x_k^2 + u_k^2 \right) + g_2(x_2)$

Terminal cost:
• Quadratic: $g_2(x_2) = x_2^2$
• Non-quadratic: $g_2(x_2) = e^{x_2}$

Page 11

Quadratic terminal cost
Step 1

$J_1(x_1) = \min_{u_1} g_1(x_1, u_1) + g_2(x_2) = \min_{u_1} x_1^2 + u_1^2 + x_2^2$, with $x_2 = x_1 + u_1$
$\phantom{J_1(x_1)} = \min_{u_1} \underbrace{2(x_1^2 + u_1^2 + x_1 u_1)}_{Q_{x_1}(u_1),\ \text{a quadratic function of } u_1}$

[Figure: $Q_{x_1}(u_1)$ as a function of $u_1$]

How to compute the minimum? Differentiate and equate to zero to find the minimizer:

$\frac{d}{du_1} Q_{x_1}(u_1) = 0 \Leftrightarrow 2(2u_1 + x_1) = 0 \Leftrightarrow u_1 = -\tfrac{1}{2} x_1$

Replacing in $Q_{x_1}(u_1)$ we obtain the cost-to-go $J_1(x_1) = \tfrac{3}{2} x_1^2$.

Page 12

Quadratic terminal cost
Step 2

$J_0(x_0) = \min_{u_0} g_0(x_0, u_0) + J_1(x_1)$, with $x_1 = x_0 + u_0$
$\phantom{J_0(x_0)} = \min_{u_0} x_0^2 + u_0^2 + \tfrac{3}{2} x_1^2 = \min_{u_0} \tfrac{5}{2} x_0^2 + 3 u_0 x_0 + \tfrac{5}{2} u_0^2$

Differentiating and equating to zero we obtain a function belonging to the optimal policy,

$u_0 = -\tfrac{3}{5} x_0$,

which leads to the cost-to-go $J_0(x_0) = \tfrac{8}{5} x_0^2$.

Page 13

Optimal policy and optimal path

Optimal policy: $u_0 = -\tfrac{3}{5} x_0$, $u_1 = -\tfrac{1}{2} x_1$

Optimal path for $x_0 = 1$, computed by using $x_{k+1} = x_k + u_k$, $k \in \{0, 1\}$:

$u_0 = -\tfrac{3}{5}$, $u_1 = -\tfrac{1}{5}$, $x_0 = 1$, $x_1 = \tfrac{2}{5}$, $x_2 = \tfrac{1}{5}$

Optimal cost: $J_0(1) = \tfrac{8}{5}$
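
These numbers are easy to check numerically. The brute-force sketch below is an illustration added here (the grid of candidate decisions is an arbitrary choice); it should return $u_0 \approx -0.6$ and $J_0(1) \approx 1.6$.

% Sketch: brute-force check of the two-stage integrator solution.
u  = -1:1e-4:1;                                   % candidate decisions
J1 = @(x1) min(x1.^2 + u.^2 + (x1 + u).^2);       % cost-to-go at stage 1
[J0, i0] = min(1 + u.^2 + arrayfun(J1, 1 + u));   % stage 0 with x0 = 1
fprintf('u0 = %.3f, J0(1) = %.3f\n', u(i0), J0);  % expect -0.600 and 1.600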

Page 14

Non-quadratic terminal cost

Let us try to apply the dynamic programming algorithm considering a non-quadratic terminal cost $g_2(x_2) = e^{x_2}$.

Step 1:

$J_1(x_1) = \min_{u_1} g_1(x_1, u_1) + g_2(x_2) = \min_{u_1} x_1^2 + u_1^2 + e^{x_2}$, with $x_2 = x_1 + u_1$

Differentiating and equating to zero, we obtain

$2 u_1 + e^{x_1 + u_1} = 0$.

We get stuck:
• this equation implicitly determines $u_1$ from $x_1$, but there is no explicit form;
• this implies that it is not easy to determine $u_1 = \mu_1(x_1)$ and move to step 2.
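
The implicit equation can still be solved numerically for each value of $x_1$, which already hints at the approximation techniques discussed next. A minimal sketch added here (the use of fzero, the grid of $x_1$ values and the starting point are choices made for illustration); the left-hand side is strictly increasing in $u_1$, so the root is unique.

% Sketch: solve 2*u1 + exp(x1 + u1) = 0 numerically on a grid of x1 values.
x1grid = linspace(-2, 2, 41);
mu1 = zeros(size(x1grid));
for i = 1:numel(x1grid)
    x1 = x1grid(i);
    mu1(i) = fzero(@(u1) 2*u1 + exp(x1 + u1), 0);  % first-order condition
end
plot(x1grid, mu1)   % mu1 is no longer a linear function of x1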

Page 15

Discussion

Linear dynamic models, quadratic cost
• For these problems, we can explicitly obtain the optimal policy, as shown next.

Non-linear dynamic models and/or non-quadratic cost
• It is in general very hard to apply DP and hence to obtain optimal policies.
• This leads to approximation techniques such as discretization.
• Another class of approximation techniques will be addressed in the next lectures.

Page 16

Outline

• Dynamic programming for stage decision problems

• Linear quadratic regulator

Page 17

Linear quadratic regulator

Given

Dynamic model: $x_{k+1} = A_k x_k + B_k u_k$, $k \in \{0, \ldots, h-1\}$

Cost function: $\sum_{k=0}^{h-1} \begin{bmatrix} x_k^\top & u_k^\top \end{bmatrix} \begin{bmatrix} Q_k & S_k \\ S_k^\top & R_k \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix} + x_h^\top Q_h x_h$

Find

Optimal policy: $u_k = \mu_k(x_k)$, $k \in \{0, \ldots, h-1\}$

• This is the finite-horizon linear quadratic optimal control problem in discrete time.
• The solution when $h$ approaches infinity and the matrices in the dynamic model and cost function are time-invariant is the linear quadratic regulator.

Page 18

Remarks

• The linear quadratic regulator is one of the celebrated results in control theory and one of the main achievements of optimal control.
• Assumptions: $Q_h \succeq 0$, $\begin{bmatrix} Q_k & S_k \\ S_k^\top & R_k \end{bmatrix} \succeq 0$ and $R_k > 0$ are symmetric.
• Model and cost are often time-invariant, i.e., $x_{k+1} = A x_k + B u_k$ and $\sum_{k=0}^{h-1} \begin{bmatrix} x_k^\top & u_k^\top \end{bmatrix} \begin{bmatrix} Q & S \\ S^\top & R \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix} + x_h^\top Q_h x_h$.
• The cost function can result from a continuous-time problem.
• However, in general the cost is specified in discrete time and $Q_k$, $R_k$ are used as tuning knobs to obtain desired specifications (e.g. overshoot, etc.).
• We focus on the stabilization problem, i.e., driving the state to zero.

Page 19

Dynamic programming algorithm

Step 1: $x_h = A_{h-1} x_{h-1} + B_{h-1} u_{h-1}$

$J_{h-1}(x_{h-1}) = \min_{u_{h-1}} \begin{bmatrix} x_{h-1}^\top & u_{h-1}^\top \end{bmatrix} \begin{bmatrix} Q_{h-1} & S_{h-1} \\ S_{h-1}^\top & R_{h-1} \end{bmatrix} \begin{bmatrix} x_{h-1} \\ u_{h-1} \end{bmatrix} + \underbrace{J_h(x_h)}_{x_h^\top Q_h x_h}$

where the terminal cost

$J_h(A_{h-1} x_{h-1} + B_{h-1} u_{h-1}) = (A_{h-1} x_{h-1} + B_{h-1} u_{h-1})^\top Q_h (A_{h-1} x_{h-1} + B_{h-1} u_{h-1})$

is a quadratic function of $u_{h-1}$. Then

$J_{h-1}(x_{h-1}) = \min_{u_{h-1}} \; x_{h-1}^\top \left( A_{h-1}^\top Q_h A_{h-1} + Q_{h-1} \right) x_{h-1} + 2 u_{h-1}^\top \left( S_{h-1}^\top + B_{h-1}^\top Q_h A_{h-1} \right) x_{h-1} + u_{h-1}^\top \left( B_{h-1}^\top Q_h B_{h-1} + R_{h-1} \right) u_{h-1}$

Page 20

Minimizing a quadratic function in $\mathbb{R}^n$

$\min_{u \in \mathbb{R}^n} J(u)$, where $J(u) = u^\top X u + 2 u^\top y + z$ and $X > 0$.

Unique minimizer: $\nabla J(u) = 0 \Leftrightarrow 2 X u + 2 y = 0 \Leftrightarrow u = -X^{-1} y$

Minimum: $J(-X^{-1} y) = y^\top X^{-1} X X^{-1} y - 2 y^\top X^{-1} y + z = z - y^\top X^{-1} y$

[Figure: surface plot of a convex quadratic function of two variables]
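
Equivalently, the same minimizer and minimum follow by completing the square, an identity written out here for reference (not on the slide):

$J(u) = (u + X^{-1} y)^\top X (u + X^{-1} y) + z - y^\top X^{-1} y$

Since $X > 0$, the first term is nonnegative and vanishes exactly at $u = -X^{-1} y$, so this is the unique minimizer and the minimum is $z - y^\top X^{-1} y$.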

Page 21

Dynamic Programming

Step 1 (continued). In the expression

$J_{h-1}(x_{h-1}) = \min_{u_{h-1}} \; \underbrace{x_{h-1}^\top \left( A_{h-1}^\top Q_h A_{h-1} + Q_{h-1} \right) x_{h-1}}_{z} + 2 u_{h-1}^\top \underbrace{\left( S_{h-1}^\top + B_{h-1}^\top Q_h A_{h-1} \right) x_{h-1}}_{y} + u_{h-1}^\top \underbrace{\left( B_{h-1}^\top Q_h B_{h-1} + R_{h-1} \right)}_{X} u_{h-1}$

we can identify $X$, $y$ and $z$ and apply the previous result.

Policy:

$u_{h-1} = -X^{-1} y = -\left( B_{h-1}^\top Q_h B_{h-1} + R_{h-1} \right)^{-1} \left( S_{h-1}^\top + B_{h-1}^\top Q_h A_{h-1} \right) x_{h-1}$

Cost-to-go:

$J_{h-1}(x_{h-1}) = z - y^\top X^{-1} y = x_{h-1}^\top \left( A_{h-1}^\top Q_h A_{h-1} + Q_{h-1} \right) x_{h-1} - x_{h-1}^\top \left( S_{h-1} + A_{h-1}^\top Q_h B_{h-1} \right) \left( B_{h-1}^\top Q_h B_{h-1} + R_{h-1} \right)^{-1} \left( S_{h-1}^\top + B_{h-1}^\top Q_h A_{h-1} \right) x_{h-1}$

Page 22

Dynamic Programming

Step 2: $x_{h-1} = A_{h-2} x_{h-2} + B_{h-2} u_{h-2}$

$J_{h-2}(x_{h-2}) = \min_{u_{h-2}} \begin{bmatrix} x_{h-2}^\top & u_{h-2}^\top \end{bmatrix} \begin{bmatrix} Q_{h-2} & S_{h-2} \\ S_{h-2}^\top & R_{h-2} \end{bmatrix} \begin{bmatrix} x_{h-2} \\ u_{h-2} \end{bmatrix} + \underbrace{J_{h-1}(x_{h-1})}_{x_{h-1}^\top P_{h-1} x_{h-1}}$

Since the cost-to-go is quadratic (as the terminal cost), we can apply the same reasoning and obtain

$J_{h-2}(x_{h-2}) = x_{h-2}^\top P_{h-2} x_{h-2}$, $\quad u_{h-2} = K_{h-2} x_{h-2}$,

$P_{h-2} = A_{h-2}^\top P_{h-1} A_{h-2} + Q_{h-2} - \left( S_{h-2} + A_{h-2}^\top P_{h-1} B_{h-2} \right) \left( B_{h-2}^\top P_{h-1} B_{h-2} + R_{h-2} \right)^{-1} \left( S_{h-2}^\top + B_{h-2}^\top P_{h-1} A_{h-2} \right)$

$K_{h-2} = -\left( B_{h-2}^\top P_{h-1} B_{h-2} + R_{h-2} \right)^{-1} \left( S_{h-2}^\top + B_{h-2}^\top P_{h-1} A_{h-2} \right)$

Page 23

Dynamic Programming

Step $h-k$: $x_{k+1} = A_k x_k + B_k u_k$

$J_k(x_k) = \min_{u_k} \begin{bmatrix} x_k^\top & u_k^\top \end{bmatrix} \begin{bmatrix} Q_k & S_k \\ S_k^\top & R_k \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix} + \underbrace{J_{k+1}(x_{k+1})}_{x_{k+1}^\top P_{k+1} x_{k+1}}$

which yields $J_k(x_k) = x_k^\top P_k x_k$ and $u_k = K_k x_k$, where

Riccati equation:
$P_k = A_k^\top P_{k+1} A_k + Q_k - \left( S_k + A_k^\top P_{k+1} B_k \right) \left( B_k^\top P_{k+1} B_k + R_k \right)^{-1} \left( S_k^\top + B_k^\top P_{k+1} A_k \right)$

$K_k = -\left( B_k^\top P_{k+1} B_k + R_k \right)^{-1} \left( S_k^\top + B_k^\top P_{k+1} A_k \right)$

Thus, simply iterate these equations for $k \in \{h-1, \ldots, 1, 0\}$, starting with $P_h = Q_h$, to obtain the optimal policy $u_k = K_k x_k$.

The optimal cost for a given initial condition $x_0$ is $J_0(x_0) = x_0^\top P_0 x_0$.
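
This backward recursion is a few lines of MATLAB. The helper below is a minimal sketch added here (the function name, the time-invariant matrices and the use of a cell array for the gains are choices made for illustration, not part of the slides):

% Sketch: backward Riccati recursion for a finite horizon h (time-invariant data).
function [K, P0] = finite_horizon_lqr(A, B, Q, R, S, Qh, h)
    P = Qh;                                        % P_h = Q_h
    K = cell(h, 1);                                % K{k+1} stores K_k
    for k = h-1:-1:0
        X = B'*P*B + R;
        K{k+1} = -(X \ (S' + B'*P*A));             % gain K_k
        P = A'*P*A + Q - (S + A'*P*B) * (X \ (S' + B'*P*A));  % P_k
    end
    P0 = P;                                        % J_0(x_0) = x_0' * P0 * x_0
end

With [K, P0] = finite_horizon_lqr(A, B, Q, R, S, Qh, h), the decision at stage k for state x would be u = K{k+1} * x.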

Page 24

Example: double integrator

Consider a double integrator: a mass at position $y$ pushed by a force $F$, with acceleration $\ddot y(t) = \frac{F(t)}{m} =: u(t)$.

Continuous-time model, with state $x(t) = \begin{bmatrix} y(t) & v(t) \end{bmatrix}^\top$:

$\frac{d}{dt}\begin{bmatrix} y(t) \\ v(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} y(t) \\ v(t) \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u(t)$

Discretization with sampling period $\tau$:

$x_{k+1} = e^{\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \tau} x_k + \int_0^\tau e^{\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} r} dr \begin{bmatrix} 0 \\ 1 \end{bmatrix} u_k = \begin{bmatrix} 1 & \tau \\ 0 & 1 \end{bmatrix} x_k + \begin{bmatrix} \frac{\tau^2}{2} \\ \tau \end{bmatrix} u_k$
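
For reference, the same zero-order-hold discretization can be obtained numerically. A sketch added here using c2d from the Control System Toolbox (tau = 0.2 as on the next slide):

% Sketch: ZOH discretization of the double integrator.
Ac = [0 1; 0 0]; Bc = [0; 1];
tau = 0.2;
sysd = c2d(ss(Ac, Bc, eye(2), 0), tau);
A = sysd.a    % expected [1 tau; 0 1]
B = sysd.b    % expected [tau^2/2; tau]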

Page 25

Example: double integrator

Qualitative goal: drive the mass to position zero in a fast way but with reasonable actuation values.

Dynamic model: $x_{k+1} = \begin{bmatrix} 1 & \tau \\ 0 & 1 \end{bmatrix} x_k + \begin{bmatrix} \frac{\tau^2}{2} \\ \tau \end{bmatrix} u_k$, with $\tau = 0.2$

To achieve this goal let us start with this cost function,

$\sum_{k=0}^{h-1} \left( x_k^\top Q x_k + u_k^\top R u_k \right) + x_h^\top Q_h x_h$, with $Q = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $R = 1$, $Q_h = \begin{bmatrix} 10 & 0 \\ 0 & 10 \end{bmatrix}$, $h = 5$,

and then tune these parameters to improve the results.

Page 26

Dynamic programming

Iterate the following equations for $k \in \{4, 3, 2, 1, 0\}$ to obtain the optimal policy, starting with $P_5 = Q_5 = \begin{bmatrix} 10 & 0 \\ 0 & 10 \end{bmatrix}$:

$K_k = -\left( B_k^\top P_{k+1} B_k + R_k \right)^{-1} \left( S_k^\top + B_k^\top P_{k+1} A_k \right)$

$P_k = A_k^\top P_{k+1} A_k + Q_k - \left( S_k + A_k^\top P_{k+1} B_k \right) \left( B_k^\top P_{k+1} B_k + R_k \right)^{-1} \left( S_k^\top + B_k^\top P_{k+1} A_k \right)$

First iteration:

$K_4 = -\left( \begin{bmatrix} 0.02 & 0.2 \end{bmatrix} \begin{bmatrix} 10 & 0 \\ 0 & 10 \end{bmatrix} \begin{bmatrix} 0.02 \\ 0.2 \end{bmatrix} + 1 \right)^{-1} \begin{bmatrix} 0.02 & 0.2 \end{bmatrix} \begin{bmatrix} 10 & 0 \\ 0 & 10 \end{bmatrix} \begin{bmatrix} 1 & 0.2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -0.1425 & -1.4530 \end{bmatrix}$

$P_4 = \begin{bmatrix} 1 & 0.2 \\ 0 & 1 \end{bmatrix}^\top \begin{bmatrix} 10 & 0 \\ 0 & 10 \end{bmatrix} \begin{bmatrix} 1 & 0.2 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \begin{bmatrix} 1 & 0.2 \\ 0 & 1 \end{bmatrix}^\top \begin{bmatrix} 10 & 0 \\ 0 & 10 \end{bmatrix} \begin{bmatrix} 0.02 \\ 0.2 \end{bmatrix} (-K_4) = \begin{bmatrix} 10.9715 & 1.7094 \\ 1.7094 & 8.4359 \end{bmatrix}$

Page 27

Dynamic programming

Next iterations:

$P_3 = \begin{bmatrix} 11.739 & 3.144 \\ 3.144 & 8.0782 \end{bmatrix}$, $P_2 = \begin{bmatrix} 12.188 & 4.311 \\ 4.311 & 8.2725 \end{bmatrix}$, $P_1 = \begin{bmatrix} 12.295 & 5.165 \\ 5.165 & 8.675 \end{bmatrix}$, $P_0 = \begin{bmatrix} 12.121 & 5.702 \\ 5.702 & 9.085 \end{bmatrix}$

$K_3 = \begin{bmatrix} -0.414 & -1.353 \end{bmatrix}$, $K_2 = \begin{bmatrix} -0.638 & -1.368 \end{bmatrix}$, $K_1 = \begin{bmatrix} -0.807 & -1.432 \end{bmatrix}$, $K_0 = \begin{bmatrix} -0.918 & -1.503 \end{bmatrix}$

Optimal policy: $u_k = K_k x_k$, $k \in \{0, 1, \ldots, 4\}$

Optimal path for initial condition $x_0 = \begin{bmatrix} 1 & 0 \end{bmatrix}^\top$ (iterate $u_k = K_k x_k$, $x_{k+1} = A x_k + B u_k$):

$(x_0, u_0) = ([1\ \ 0]^\top, -0.918)$
$(x_1, u_1) = ([0.982\ \ {-0.184}]^\top, -0.529)$
$(x_2, u_2) = ([0.934\ \ {-0.289}]^\top, -0.200)$
$(x_3, u_3) = ([0.8724\ \ {-0.330}]^\top, 0.085)$
$(x_4, u_4) = ([0.8082\ \ {-0.313}]^\top, 0.339)$
$x_5 = [0.7525\ \ {-0.2448}]^\top$
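
A short script reproduces these gains and the simulated path (a sketch added here; variable names are arbitrary):

% Sketch: Riccati recursion and closed-loop simulation for the example above.
tau = 0.2; h = 5;
A = [1 tau; 0 1]; B = [tau^2/2; tau];
Q = eye(2); R = 1;
P = 10*eye(2);                            % P_5 = Q_h
K = zeros(h, 2);
for k = h-1:-1:0
    K(k+1,:) = -(B'*P*B + R) \ (B'*P*A);  % K_k (S = 0 here)
    P = A'*P*A + Q - (A'*P*B) * ((B'*P*B + R) \ (B'*P*A));
end
x = [1; 0];                               % optimal path from x_0 = [1 0]'
for k = 0:h-1
    u = K(k+1,:) * x;
    x = A*x + B*u;                        % ends near [0.7525 -0.2448]'
end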

Page 28

Plots and tuning

[Figure: closed-loop responses $y(t)$, $v(t)$ and $u(t)$ over $t \in [0, 1]$]

Transient responses are still far from the qualitative specifications.

Guidelines to tune the cost
• By increasing the terminal cost one expects that the response gets closer to the desired final position.
• The same is expected by penalizing the position error more relative to the velocity error.
• Decreasing the penalty on the control action will allow more control authority to reach the origin.

Page 29

Increasing terminal cost

[Figure: closed-loop responses $y(t)$, $v(t)$ and $u(t)$ for $Q_h = 100 I$ and for $Q_h = 1000 I$]

Final position error improved by increasing the terminal cost.

Page 30

Changing state cost

[Figure: closed-loop responses $y(t)$, $v(t)$ and $u(t)$ with $Q_h = 100 I$, for $Q = \begin{bmatrix} 10 & 0 \\ 0 & 1 \end{bmatrix}$ and for $Q = \begin{bmatrix} 100 & 0 \\ 0 & 1 \end{bmatrix}$]

Increasing the position cost leads to a smaller position error and a larger velocity.

Page 31

Changing control cost

[Figure: closed-loop responses $y(t)$, $v(t)$ and $u(t)$ with $Q_h = 100 I$, $Q = \begin{bmatrix} 10 & 0 \\ 0 & 1 \end{bmatrix}$, for $R = 0.1$ and for $R = 0.01$]

Decreasing the control penalty leads to fast responses, but large actuation!

Page 32

Cheap control

As $R \to 0$ we obtain deadbeat control: the zero state is achieved in 2 steps.

[Figure: closed-loop responses $y(t)$, $v(t)$ and $u(t)$ in the cheap-control case]

Can we then always drive a mass to zero in two sampling periods?
• No, because this typically requires very large, unfeasible actuations.
• Actuators have limitations which were not incorporated in our linear model.
• In this LQR framework the solution is to increasingly penalize the control input until actuation constraints are met. More on this point later.
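
The deadbeat limit can be illustrated numerically. A sketch added here (the value R = 1e-6, the weight Q = I and the use of dlqr are choices made for illustration): with a very small control penalty the closed-loop state should be close to zero after two steps, at the price of large inputs.

% Sketch: near-deadbeat behavior of the double integrator as R -> 0.
tau = 0.2; A = [1 tau; 0 1]; B = [tau^2/2; tau];
Kd = dlqr(A, B, eye(2), 1e-6);   % very small control penalty; u = -Kd*x
x = [1; 0];
for k = 1:2
    x = (A - B*Kd) * x;          % two closed-loop steps
end
x                                 % should be close to [0; 0]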

Page 33

Increasing the horizon

Let us increase the horizon considering $Q = Q_h = I$, $R = 1$, $x_0 = \begin{bmatrix} 1 & 0 \end{bmatrix}^\top$.

[Figure: closed-loop responses $y(t)$, $v(t)$ and $u(t)$ for $h = 10$ and for $h = 30$]

$h = 10$: cost $J_0(x_0) = x_0^\top P_0 x_0 = 8.347$
$h = 30$: cost $J_0(x_0) = x_0^\top P_0 x_0 = 9.1881$

Page 34

Increasing the horizon

[Figure: closed-loop responses $y(t)$, $v(t)$ and $u(t)$ for $h = 50$ and for $h = 100$]

$h = 50$: $J_0(x_0) = x_0^\top P_0 x_0 = 9.1890$
$h = 100$: $J_0(x_0) = x_0^\top P_0 x_0 = 9.1890$

The cost converges to a constant as the time horizon increases.

Page 35

Discussion

• Since the cost is positive definite, if the horizon is large the optimal input should drive the state to zero to stop paying cost:

$\sum_{k=0}^{h_0} g_k(x_k, K_k x_k) + \underbrace{\sum_{k=h_0+1}^{h-1} g_k(x_k, K_k x_k) + g_h(x_h)}_{\approx\, 0 \text{ since } x_k \approx 0 \text{ and } u_k = K_k x_k}$

• This explains why the cost converges as the horizon increases.
• This reasoning is valid for every initial condition. Thus if $x_0^\top P_0 x_0$ converges as $h \to \infty$ then $P_0$ converges, where $P_0$ results from the recursion, for $k \in \{h-1, h-2, \ldots, 0\}$,

$P_k = A^\top P_{k+1} A + Q - \left( S + A^\top P_{k+1} B \right) \left( B^\top P_{k+1} B + R \right)^{-1} \left( S^\top + B^\top P_{k+1} A \right)$

• Note that we are now considering time-invariant $A_k, B_k, Q_k, R_k, S_k$.

Page 36

Discussion

• For the double integrator example with $Q = Q_h = I$, $R = 1$:

[Figure: the entries $p_{1,k}, p_{2,k}, p_{3,k}$ of $P_k = \begin{bmatrix} p_{1,k} & p_{2,k} \\ p_{2,k} & p_{3,k} \end{bmatrix}$ and the entries $K_{1,k}, K_{2,k}$ of $K_k = \begin{bmatrix} K_{1,k} & K_{2,k} \end{bmatrix}$ plotted against $k$, both converging]

• Let $P$ denote the limit of the recursion

$P_k = A^\top P_{k+1} A + Q - \left( S + A^\top P_{k+1} B \right) \left( B^\top P_{k+1} B + R \right)^{-1} \left( S^\top + B^\top P_{k+1} A \right)$;

then

$P = A^\top P A + Q - \left( S + A^\top P B \right) \left( B^\top P B + R \right)^{-1} \left( S^\top + B^\top P A \right)$.

• Moreover, $K_k \to K = -\left( B^\top P B + R \right)^{-1} \left( S^\top + B^\top P A \right)$.
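
This limit is easy to observe numerically. A sketch added here for the double integrator (the tolerance and the iteration cap are arbitrary choices); the converged cost $x_0^\top P x_0$ for $x_0 = [1\ 0]^\top$ should match the value of about 9.189 reported above.

% Sketch: iterate the Riccati recursion until it (numerically) converges.
tau = 0.2; A = [1 tau; 0 1]; B = [tau^2/2; tau];
Q = eye(2); R = 1; S = zeros(2, 1);
P = Q;                                    % start from the terminal weight Q_h = I
for iter = 1:1000
    Pnew = A'*P*A + Q - (S + A'*P*B) * ((B'*P*B + R) \ (S' + B'*P*A));
    if norm(Pnew - P) < 1e-10, break, end
    P = Pnew;
end
K = -(B'*P*B + R) \ (S' + B'*P*A)         % limiting gain
cost0 = [1 0] * P * [1; 0]                % about 9.189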

Page 37

Infinite horizon LQR

Proposition (special case of [Bertsekas, Sec. 4, Proposition 4.4.1])

Suppose that $(A, B)$ is controllable and $\begin{bmatrix} Q & S \\ S^\top & R \end{bmatrix} > 0$. The optimal policy for the stage decision problem with an infinite number of stages, with dynamic model

$x_{k+1} = A x_k + B u_k$

and cost function

$\sum_{k=0}^{\infty} x_k^\top Q x_k + 2 x_k^\top S u_k + u_k^\top R u_k$,

is given by $u_k = K x_k$, where

$K = -\left( B^\top P B + R \right)^{-1} \left( S^\top + B^\top P A \right)$

and $P$ is the unique positive definite solution to the algebraic Riccati equation

$P = A^\top P A + Q - \left( S + A^\top P B \right) \left( B^\top P B + R \right)^{-1} \left( S^\top + B^\top P A \right)$.

Furthermore, the closed loop $x_{k+1} = (A + BK) x_k$ is exponentially stable.
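
In MATLAB the gain $K$ and the solution $P$ of the algebraic Riccati equation are returned by dlqr from the Control System Toolbox (also used in the inverted pendulum script later on). A standalone sketch for the double integrator, with the sign convention of these slides made explicit:

% Sketch: infinite-horizon LQR gain via dlqr (double integrator, tau = 0.2).
A = [1 0.2; 0 1]; B = [0.02; 0.2];
Q = eye(2); R = 1; S = zeros(2, 1);
[Kdlqr, P] = dlqr(A, B, Q, R, S);   % dlqr returns the gain of u = -Kdlqr*x
K = -Kdlqr;                          % gain in the convention u_k = K x_k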

Page 38

Discussion

• As mentioned in [Bertsekas, Sec. 4.1], we can relax the assumptions that $(A, B)$ is controllable and $Q$ is positive definite.
• For simplicity, throughout the discussion, we assume $S = 0$.
• In fact, if $(A, B)$ is controllable, $R$ is positive definite, $Q = N N^\top$ for a full rank $N \in \mathbb{R}^{n \times r}$, $r \le n$ (not necessarily positive definite if $r < n$), and $(A, N)$ is observable, then the previous theorem still holds.
• Moreover, if we further relax the assumptions to: $R$ is positive definite, $(A, B)$ is stabilizable, $Q = N N^\top$ with $N$ full rank, and $(A, N)$ is detectable, then the theorem still holds except that $P$ is not necessarily positive definite.
• Actually, according to 'Linear Optimal Control', B. D. O. Anderson, J. B. Moore, Sec. 14.1, we just need to ensure that $B^\top Q B + R$ is positive definite and $(A, N)$ is observable to guarantee stability of the closed loop.
• Therefore, we can for instance pick $R = 0$ and $Q$ positive definite and the closed loop is stable (this will not be the case for continuous-time optimal control problems).

Page 39

Inverted pendulum

Linearized model (see [1, p. 32]): a cart of mass $M$ at position $x$, subject to a force $u$ and friction coefficient $b$, carrying an inverted pendulum of mass $m$, inertia $I$ and length $\ell$, with $\theta$ the pendulum angle from the upright position.

Equations of motion (linearized about $\theta = 0$):

$(I + m \ell^2) \ddot{\theta} - m g \ell \theta = m \ell \ddot{x}$
$(M + m) \ddot{x} + b \dot{x} - m \ell \ddot{\theta} = u$

State space, with $q = (I + m \ell^2)(M + m) - m^2 \ell^2$:

$\frac{d}{dt} \begin{bmatrix} x \\ \dot{x} \\ \theta \\ \dot{\theta} \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & -\frac{(I + m\ell^2) b}{q} & \frac{m^2 g \ell^2}{q} & 0 \\ 0 & 0 & 0 & 1 \\ 0 & -\frac{m \ell b}{q} & \frac{m g \ell (M + m)}{q} & 0 \end{bmatrix} \begin{bmatrix} x \\ \dot{x} \\ \theta \\ \dot{\theta} \end{bmatrix} + \begin{bmatrix} 0 \\ \frac{I + m\ell^2}{q} \\ 0 \\ \frac{m \ell}{q} \end{bmatrix} u(t)$

[1] Feedback control of dynamic systems, Franklin, Powell, Emami-Naeini

Page 40

Matlab implementation

Model definition:

clear all, close all, clc
% definition of the continuous-time model
m = 0.2; M = 1; b = 0.05; I = 0.01; g = 9.8; l = 0.5;
p = (I+m*l^2)*(M+m) - m^2*l^2;
Ac = [0 1 0 0;
      0 -(I+m*l^2)*b/p (m^2*g*l^2)/p 0;
      0 0 0 1;
      0 -(m*l*b)/p m*g*l*(M+m)/p 0];
Bc = [0; (I+m*l^2)/p; 0; m*l/p];

% discretization
n = 4; tau = 0.1;
sysd = c2d(ss(Ac,Bc,zeros(1,n),0), tau);
A = sysd.a; B = sysd.b;

Controller synthesis:

% LQR control
Q = diag([1 1 1 1]); S = zeros(4,1); R = 1;
K = dlqr(A,B,Q,R,S); K = -K;

% simulation
kend = 10/tau;
x0 = [1 0 0 0]';
x(:,1) = x0;
for k = 1:kend
    u(:,k) = K*x(:,k);
    x(:,k+1) = A*x(:,k) + B*u(:,k);
end
plot((1:kend)*tau, u), figure,
plot((1:kend)*tau, x(3,1:end-1)), figure,
plot((1:kend)*tau, x(1,1:end-1))

Page 41

Time responses

$Q = I$, $S = 0$, $R = 1$, $\tau = 0.1$

[Figure: closed-loop responses of the input $u$, the pendulum angle $\theta$ and the cart position $x$ over $t \in [0, 10]$]

Page 42

Tuning the parameters

Want faster convergence? Reduce the penalty on the control input to increase control authority: $R = 0.01$.

Want to reduce the angle amplitude? Increase the penalty on the angle state: $Q = \mathrm{diag}([1\ 1\ 100\ 1])$.

[Figure: closed-loop responses of $x$, $\theta$ and $u$ over $t \in [0, 10]$ for $R = 0.01$ and for $Q = \mathrm{diag}([1\ 1\ 100\ 1])$]

Page 43

Concluding remarks

To summarise:
• Stage decision problems are extensions of discrete optimization problems in which the state and input spaces can be arbitrary.
• In practice it may be hard to obtain explicit expressions for the costs-to-go.
• When the cost is quadratic and the system is linear, we obtain a framework for state feedback control design for any linear plant.

After this lecture, you should be able to:
• Apply DP to stage-decision problems.
• Solve finite-horizon optimal control problems in discrete-time with a quadratic cost and a linear model by iteratively solving Riccati equations.
• Obtain the linear quadratic regulator using the algebraic Riccati equation for infinite horizon problems.

Page 44

Appendix A Proof of optimality of dynamic programming

Page 45

Proof of optimality

Theorem

The policy obtained with the DP algorithm is an optimal policy.

Proof

We shall prove using induction that $\pi_k := \{\mu_k, \ldots, \mu_{h-1}\}$ obtained by the DP algorithm is an optimal policy for the subproblem from stage $k$ to stage $h-1$ and that $J_k(x_k)$ is the cost of the optimal path starting at $x_k$.

• Step I: Prove this for $k = h-1$.
• Step II: Assume that the induction hypothesis holds for a given $k$ and prove it for $k-1$.

Page 46

Step I

• By construction $\pi_{h-1} = \{\mu_{h-1}\}$ is an optimal policy, as $\mu_{h-1}(x_{h-1})$ is the first decision of the optimal path from stage $h-1$ to stage $h$, since

$\min_{u_{h-1} \in U_{h-1}(x_{h-1})} g_{h-1}(x_{h-1}, u_{h-1}) + J_h(f_{h-1}(x_{h-1}, u_{h-1})) = g_{h-1}(x_{h-1}, \mu_{h-1}(x_{h-1})) + J_h(f_{h-1}(x_{h-1}, \mu_{h-1}(x_{h-1})))$.

• It is also clear that

$J_{h-1}(x_{h-1}) = \min_{u_{h-1} \in U_{h-1}(x_{h-1})} g_{h-1}(x_{h-1}, u_{h-1}) + J_h(f_{h-1}(x_{h-1}, u_{h-1}))$

is the optimal cost for the subproblem with initial condition $x_{h-1}$ at stage $h-1$.

Page 47

Step II

• Assume now that $\pi_{k+1} := \{\mu_{k+1}, \ldots, \mu_{h-1}\}$ is an optimal policy and $J_{k+1}(x_{k+1})$ is the cost of the optimal path which starts at initial state $x_{k+1}$. We shall prove using contradiction that $\pi_k := \{\mu_k, \ldots, \mu_{h-1}\}$ is an optimal policy and $J_k(x_k)$ is the cost of an optimal path which starts at initial state $x_k$.

• Argument using contradiction: if $\pi_k$ is not optimal then there must exist a state $x_k$ such that $\mu_k(x_k)$ is not the first action of the optimal path from stage $k$ to stage $h$, denoted by

$\gamma = \{(x_k, u_k), (x_{k+1}, u_{k+1}), \ldots, x_h\}$, with $u_k \neq \mu_k(x_k)$.

• Since we are assuming that $\pi_{k+1} := \{\mu_{k+1}, \ldots, \mu_{h-1}\}$ is an optimal policy, we must have

$u_{\ell+1} = \mu_{\ell+1}(x_{\ell+1})$ for every $\ell \in \{k, \ldots, h-2\}$.

Page 48

Step II

• The cost of such a path is

$J_\gamma = \sum_{\ell=k}^{h-1} g_\ell(x_\ell, u_\ell) + g_h(x_h) = g_k(x_k, u_k) + \sum_{\ell=k+1}^{h-1} g_\ell(x_\ell, \mu_\ell(x_\ell)) + g_h(x_h) = g_k(x_k, u_k) + J_{k+1}(f_k(x_k, u_k))$.

• However, the cost of the path which has $\mu_k(x_k)$ as the first decision is less than or equal, a contradiction:

$J_k(x_k) = \min_{u_k} g_k(x_k, u_k) + J_{k+1}(f_k(x_k, u_k)) = g_k(x_k, \mu_k(x_k)) + J_{k+1}(f_k(x_k, \mu_k(x_k))) \le g_k(x_k, u_k) + J_{k+1}(f_k(x_k, u_k)) = J_\gamma$.