
ECE4520/5520: Multivariable Control Systems I. 8–1

LINEAR QUADRATIC REGULATOR

8.1: Introduction to optimal control

■ The engineering tradeoff in control-system design is

    Fast response                          Slower response
    Large intermediate states    versus    Smaller intermediate states
    Large control effort                   Smaller control effort

EXAMPLE: Consider

$$\dot{x}(t) = x(t) - u(t)$$

with state feedback $u(t) = -kx(t)$, $k \in \mathbb{R}$:

$$\dot{x}(t) = (1 + k)x(t).$$

■ Eigenvalue at $1 + k$. Can make as negative (fast) as we want, with large negative $k$ and corresponding large input $u(t)$.

■ Suppose $x(0) = 1$, so $x(t) = e^{(1+k)t}$ and $u(t) = -ke^{(1+k)t}$.

■ As $k \to -\infty$ (i.e., large) $u(t)$ looks more and more like $\delta(t)$, the input we found earlier that (instantaneously) moves $x(t)$ to 0!

■ To see this, note that

$$\int_0^\infty u(t)\,dt = \frac{(-k)}{(-k) - 1}.$$

[Figure: Different control signals |u(t)| versus time.]

Lecture notes prepared by Dr. Gregory L. Plett. Copyright © 2015, 2011, 2009, 2007, 2005, 2003, 2001, 2000, Gregory L. Plett


■ For $-k$ large, $\int u(t)\,dt \approx 1$ and $u(t)$ is “bunching up” near $t = 0$.

■ In general, as we relocate our eigenvalues farther and farther to the

left, so that the closed-loop system is faster and faster, our plant input

begins to look like the impulsive inputs we considered earlier.

■ Once again, the tradeoff is speed versus gain/size of input.

Cost functions (switch to discrete time)

■ To avoid large inputs, we consider the cost function:

$$J = \sum_{k=0}^{\infty} \left( x[k]^T x[k] + \rho u[k]^2 \right).$$

■ We will find the $K$ such that $u[k] = -Kx[k]$ minimizes this cost.

    • We make $\rho$ large if we don’t want large inputs (“high cost of control”);

    • We make $\rho$ small if we want fast response and don’t mind large inputs (“cheap control”).

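EXAMPLE: Consider (where $x[k]$ is a scalar)

$$x[k+1] = x[k] + u[k]$$

with

$$u[k] = -Kx[k].$$

■ Thus $x[k] = (1 - K)^k x[0]$ so

$$J = \sum_{k=0}^{\infty} \left( x[k]^2 + \rho u[k]^2 \right) = \begin{cases} x[0]^2 \dfrac{1 + \rho K^2}{1 - (1 - K)^2}, & 0 < K < 2; \\ \infty, & \text{otherwise.} \end{cases}$$

■ Thus, $J = p\,x[0]^2$ where

$$p = \frac{1 + K^2\rho}{K(2 - K)}.$$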

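■ We can solve for the optimal $K$ for any given $\rho$ by

$$\begin{aligned} \frac{dp}{dK} = \frac{K(2 - K)(2K\rho) - (1 + K^2\rho)(2 - 2K)}{[K(2 - K)]^2} &= 0 \\ K^2\rho(2 - K) &= (1 + K^2\rho)(1 - K) \\ 2\rho K^2 - \rho K^3 &= 1 + \rho K^2 - K - \rho K^3 \\ \rho K^2 + K - 1 &= 0. \end{aligned}$$

■ So, (the other solution is a maximum, not a minimum)

$$K_{\text{opt}} = \frac{-1 + \sqrt{1 + 4\rho}}{2\rho}.$$

■ The optimal cost (per unit $x[0]^2$, that is, $p$ evaluated at $K_{\text{opt}}$) is

$$J = \frac{\rho\left(\sqrt{1 + 4\rho} - 1\right)}{2\rho - \sqrt{1 + 4\rho} + 1}.$$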

■ For low cost (“cheap”) control, let $\rho \to 0$. Then $K_{\text{opt}} \to 1$ since

$$\lim_{\rho \to 0} \frac{-1 + \sqrt{1 + 4\rho}}{2\rho} = \lim_{\rho \to 0} \frac{2(1 + 4\rho)^{-1/2}}{2} = 1,$$

which is deadbeat control; closed-loop eigenvalues at 0.

■ For high cost (“expensive”) control, let $\rho \to \infty$; then $K_{\text{opt}} \to \dfrac{1}{\sqrt{\rho}}$, which is a small (as expected) feedback which just barely stabilizes the system, but plant input is small. Closed-loop eigenvalue at $1 - \dfrac{1}{\sqrt{\rho}}$, which is $< 1$.
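As a quick numerical check of the scalar example, the following sketch evaluates $p(K)$ on a grid over the stabilizing range and compares the grid minimizer with the closed-form $K_{\text{opt}}$. The value rho = 2 and the grid spacing are arbitrary illustrative choices, not values taken from these notes.

```matlab
rho = 2;                               % control weighting (illustrative value)
K = linspace(0.01, 1.99, 19801);       % grid over the stabilizing range 0 < K < 2
p = (1 + rho*K.^2) ./ (K.*(2 - K));    % J = p*x[0]^2 for u[k] = -K*x[k]
[pmin, idx] = min(p);                  % brute-force minimum on the grid
Kgrid = K(idx);
Kopt = (-1 + sqrt(1 + 4*rho))/(2*rho); % closed-form optimum from the derivation
fprintf('grid K = %.4f, formula K = %.4f, p = %.4f\n', Kgrid, Kopt, pmin);
```

For this choice of rho the formula gives Kopt = 0.5, and the grid search should agree to within the grid spacing.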



8.2: Dynamic programming: Bellman’s principle of optimality

■ We will want to minimize a more general cost function J —

minimization is a topic in optimization theory.

■ We will use a tool called dynamic programming.

■ Consider the task of finding the lowest-cost route from point $x_o$ to $x_f$, where there are many possible ways to get there.

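[Figure: Graph of possible routes from node ➀ to node ➇, with edge costs $J_{12}$, $J_{15}$, $J_{23}$, $J_{24}$, $J_{36}$, $J_{38}$, $J_{46}$, $J_{47}$, $J_{58}$, $J_{68}$, $J_{78}$.]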

■ Then

$$J^*_{18} = \min \left\{ J_{15} + J_{58},\; J_{12} + J_{24} + J_{46} + J_{68},\; \ldots \right\}.$$

■ We need to make only one simple observation:

In general, if $x_i$ is an intermediate point between $x_o$ and $x_f$ and $x_i$ is on the optimal path, then

$$J^*_{of} = J_{oi} + J^*_{if}.$$

■ This is called Bellman’s principle of optimality.
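The principle can be sketched on the eight-node routing graph as a backward recursion for the optimal cost-to-go. The edge set follows the figure, but the numeric edge costs below are hypothetical (the notes give none), and the variable names `c`, `Jstar`, `succ` and `route` are illustrative.

```matlab
n = 8;
c = inf(n);                  % c(i,j) = cost of edge i -> j (inf means no edge)
% Hypothetical edge costs (NOT from the notes) for the edges in the figure
c(1,2)=2; c(1,5)=6; c(2,3)=2; c(2,4)=1; c(3,6)=2; c(3,8)=5;
c(4,6)=3; c(4,7)=1; c(5,8)=4; c(6,8)=2; c(7,8)=3;
Jstar = inf(n,1); Jstar(n) = 0;    % optimal cost-to-go; zero at the goal node
succ = zeros(n,1);                 % best successor of each node
for i = n-1:-1:1                   % work backward from the goal
    % Bellman: J*(i) = min over j of [ cost(i,j) + J*(j) ]
    [Jstar(i), succ(i)] = min(c(i,:)' + Jstar);
end
route = 1;                         % recover the optimal route from node 1
while route(end) ~= n
    route(end+1) = succ(route(end));
end
disp(Jstar(1)); disp(route);
```

The backward sweep works here because every edge in the figure runs from a lower-numbered node to a higher-numbered one, so each node's successors are already solved when it is visited.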

Quadratic forms

■ In the cost function

$$J = \sum_{m=0}^{\infty} \left( x[m]^T x[m] + \rho u[m]^2 \right),$$

all components of $x[m]$ are weighted evenly. Often, some components or some linear combination of components are more critical than others.

EXAMPLE: It is critical that $x_1[k]$ be brought to zero quickly; $x_2[k]$ doesn’t matter so much. We might take

$$J = \sum_{m=0}^{\infty} \left( x[m]^T \begin{bmatrix} 10 & 0 \\ 0 & 0.1 \end{bmatrix} x[m] + \rho u[m]^2 \right).$$

■ More generally, we use the quadratic form

$$x^T[k] Q x[k]$$

where $Q$ is an $n \times n$ weighting matrix.

PROPERTY I: We may assume that $Q = Q^T$. Why?

$$\underbrace{(x^T Q x)^T}_{\text{a scalar}} = x^T Q x.$$

■ Therefore $x^T Q^T x = x^T Q x$ and $x^T Q x = x^T Q_{\text{sym}} x$ where $Q_{\text{sym}} = \frac{1}{2}(Q + Q^T)$ is the symmetric part of $Q$.

PROPERTY II: $J$ should always be $\geq 0$. That is, we require

$$x^T Q x \geq 0 \quad \forall\, x \in \mathbb{R}^n;$$

then $Q$ is positive semi-definite (we write $Q \geq 0$, and $\lambda(Q) \geq 0$).

■ If $x^T Q x > 0$ for all $x \neq 0$ then $Q$ is positive definite (we write $Q > 0$, and $\lambda(Q) > 0$).
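The two properties can be illustrated numerically. In this small sketch the matrix `Q` and the vector `x` are arbitrary made-up values; it checks that only the symmetric part of `Q` contributes to the quadratic form, and uses the eigenvalues of the symmetric part to test definiteness.

```matlab
Q = [2 1; 3 4];            % hypothetical non-symmetric weighting matrix
Qsym = (Q + Q')/2;         % symmetric part of Q
x = [1; -2];               % arbitrary test vector
q1 = x'*Q*x;               % quadratic form using Q
q2 = x'*Qsym*x;            % same value using only the symmetric part
lam = eig(Qsym);           % eigenvalues of the symmetric part
isPD = all(lam > 0);       % positive definite if all eigenvalues are > 0
```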

Vector derivatives

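■ In the following discussion we will often need to take derivatives of vector/matrix quantities.

■ This small dictionary should help: $a(x)$ and $b(x)$ are $m \times 1$ vector functions with respect to the vector $x$, $y$ is some other vector and $A$ is some matrix.

1. $\dfrac{\partial}{\partial x} \left( a^T(x) b(x) \right) = \left( \dfrac{\partial a(x)}{\partial x} \right)^T b(x) + \left( \dfrac{\partial b(x)}{\partial x} \right)^T a(x)$,

2. $\dfrac{\partial}{\partial x} (x^T y) = y$,

3. $\dfrac{\partial}{\partial x} (x^T x) = 2x$,

4. $\dfrac{\partial}{\partial x} (x^T A y) = Ay$,

5. $\dfrac{\partial}{\partial x} (y^T A x) = A^T y$,

6. $\dfrac{\partial}{\partial x} (x^T A x) = (A + A^T)x$,

7. $\dfrac{\partial}{\partial x} \left( a^T(x) Q a(x) \right) = 2 \left( \dfrac{\partial a(x)}{\partial x} \right)^T Q a(x)$, where $Q$ is a symmetric matrix.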
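Any entry in the dictionary can be sanity-checked by finite differences. The sketch below does this for rule 6, with arbitrary illustrative values for `A` and `x`; the names `g_num` and `g_rule` are not from the notes.

```matlab
A = [1 2; 3 4];  x = [0.5; -1];        % arbitrary illustrative values
f = @(v) v'*A*v;                       % scalar function f(x) = x'Ax
h = 1e-6;  g_num = zeros(2,1);
for i = 1:2
    e = zeros(2,1); e(i) = h;          % perturb one component at a time
    g_num(i) = (f(x+e) - f(x-e))/(2*h);  % central-difference derivative
end
g_rule = (A + A')*x;                   % rule 6: d/dx (x'Ax) = (A + A')x
err = norm(g_num - g_rule);            % should be near roundoff level
```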

■ This brings us back to our problem. . .



8.3: The discrete-time linear quadratic regulator problem

■ Most generally, the discrete-time LQR problem is posed as minimizing

$$J_{i,N} = x^T[N] P x[N] + \sum_{k=i}^{N-1} \left( x^T[k] Q x[k] + u^T[k] R u[k] \right),$$

which may be interpreted as the total cost associated with the transition from state $x[i]$ to the goal state 0 at time $N$.

■ $x^T[N] P x[N]$ is the penalty for “missing” the desired final state.

■ $x^T[k] Q x[k]$ is the penalty on excessive state size.

■ $u^T[k] R u[k]$ is the penalty on excessive control effort. ($R = \rho$ if SISO).

■ We require $P \geq 0$, $Q \geq 0$ and $R > 0$.

■ To find optimum $u[k]$, we start at last step and work backwards.

$$J_{N-1,N} = x^T[N] P x[N] + x^T[N-1] Q x[N-1] + u^T[N-1] R u[N-1].$$

■ We express $x[N]$ as a function of $x[N-1]$ and $u[N-1]$ via the system dynamics

$$\begin{aligned} J_{N-1,N} &= (Ax[N-1] + Bu[N-1])^T P (Ax[N-1] + Bu[N-1]) \\ &\quad + x^T[N-1] Q x[N-1] + u^T[N-1] R u[N-1] \\ &= x^T[N-1] A^T P A x[N-1] + u^T[N-1] B^T P B u[N-1] \\ &\quad + x^T[N-1] A^T P B u[N-1] + u^T[N-1] B^T P A x[N-1] \\ &\quad + x^T[N-1] Q x[N-1] + u^T[N-1] R u[N-1]. \end{aligned}$$

■ We minimize over all possible inputs $u[N-1]$ by differentiation

$$\begin{aligned} 0 = \frac{\partial J_{N-1,N}}{\partial u[N-1]} &= 2B^T P B u[N-1] + 2B^T P A x[N-1] + 2R u[N-1] \\ &= 2\left( R + B^T P B \right) u[N-1] + 2B^T P A x[N-1]. \end{aligned}$$



■ Therefore,

$$u^*[N-1] = -\underbrace{\left( R + B^T P B \right)^{-1} B^T P A}_{\text{Constant!}}\; x[N-1].$$

■ The exciting point is that the optimal $u[N-1]$, with no constraints on its functional form, turns out to be a linear state feedback! To ease notation, define

$$K_{N-1} = \left( R + B^T P B \right)^{-1} B^T P A$$

such that

$$u^*[N-1] = -K_{N-1} x[N-1].$$

■ Now, we can express the value of $J^*_{N-1,N}$ as

$$\begin{aligned} J^*_{N-1,N} &= \left( Ax[N-1] - BK_{N-1}x[N-1] \right)^T P \left( Ax[N-1] - BK_{N-1}x[N-1] \right) \\ &\quad + x^T[N-1] Q x[N-1] + x^T[N-1] K_{N-1}^T R K_{N-1} x[N-1] \\ &= x^T[N-1] \left[ (A - BK_{N-1})^T P (A - BK_{N-1}) + Q + K_{N-1}^T R K_{N-1} \right] x[N-1]. \end{aligned}$$

■ Simplify notation once again by defining

$$P_{N-1} = (A - BK_{N-1})^T P (A - BK_{N-1}) + Q + K_{N-1}^T R K_{N-1},$$

so that

$$J^*_{N-1,N} = x^T[N-1] P_{N-1} x[N-1].$$

■ To see that this notation makes sense, notice that

$$J_{N,N} = J^*_{N,N} = x^T[N] P x[N] \triangleq x^T[N] P_N x[N].$$



■ Now, we take another step backwards and compute the cost $J_{N-2,N}$

$$J_{N-2,N} = J_{N-2,N-1} + J_{N-1,N}.$$

■ Therefore, the optimal policy (via dynamic programming) is

$$J^*_{N-2,N} = J_{N-2,N-1} + J^*_{N-1,N}.$$

■ To minimize this, we realize that $N-1$ is now the goal state and

$$\begin{aligned} J_{N-2,N-1} &= (Ax[N-2] + Bu[N-2])^T P_{N-1} (Ax[N-2] + Bu[N-2]) \\ &\quad + x^T[N-2] Q x[N-2] + u^T[N-2] R u[N-2]. \end{aligned}$$

■ We can find the best result just as before: $u^*[N-2] = -K_{N-2} x[N-2]$ where

$$K_{N-2} = \left( R + B^T P_{N-1} B \right)^{-1} B^T P_{N-1} A.$$

■ In general, $u^*[k] = -K_k x[k]$ where

$$K_k = \left( R + B^T P_{k+1} B \right)^{-1} B^T P_{k+1} A$$

and

$$P_k = (A - BK_k)^T P_{k+1} (A - BK_k) + Q + K_k^T R K_k.$$

■ This difference equation for Pk has a starting condition that occurs at

the final time, and is solved recursively backwards in time.

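EXAMPLE: Simulate a feedback controller for the system

$$x[k+1] = \begin{bmatrix} 2 & 1 \\ -1 & 1 \end{bmatrix} x[k] + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u[k], \qquad x[0] = \begin{bmatrix} 2 \\ -3 \end{bmatrix}$$

such that the cost criterion

$$J = x^T[10] \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix} x[10] + \sum_{k=1}^{9} \left( x^T[k] \begin{bmatrix} 2 & 0 \\ 0 & 0.1 \end{bmatrix} x[k] + 2u^2[k] \right)$$

is minimized.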



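■ From the problem, we gather that

$$P_{10} = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix}, \qquad Q = \begin{bmatrix} 2 & 0 \\ 0 & 0.1 \end{bmatrix}, \qquad R = [2].$$

■ Iteratively, solve for $K_9$, $P_9$, $K_8$, $P_8$ and so forth down to $K_1$ and $P_1$. Then, $u[k] = -K_k x[k]$.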

A=[2 1; -1 1]; B=[0; 1]; x0=[2; -3]; P=zeros(2,2,10); K=zeros(1,2,9);
x=zeros(2,1,11); x(:,:,1)=x0;
P(:,:,10)=[5 0; 0 5]; R=2; Q=[2 0; 0 0.1];
% Backward Riccati recursion for the gains K and cost matrices P
for i=9:-1:1,
  K(:,:,i)=inv(R+B'*P(:,:,i+1)*B)*B'*P(:,:,i+1)*A;
  P(:,:,i)=(A-B*K(:,:,i))'*P(:,:,i+1)*(A-B*K(:,:,i))+ ...
    Q+K(:,:,i)'*R*K(:,:,i);
end
% Forward simulation of the closed-loop system
for i=1:9,
  x(:,:,i+1)=A*x(:,:,i)-B*K(:,:,i)*x(:,:,i);
end
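One way to check a recursion like this is to compare the predicted optimal cost $x^T P_1 x$ at the first time index with the cost actually accumulated along the simulated closed-loop trajectory. The sketch below repeats the recursion with cell arrays so that it stands alone; the names `Jpred` and `Jsim` are illustrative, and the indexing follows the MATLAB code (index 1 is the first sample) rather than the summation limits in the problem statement.

```matlab
A=[2 1; -1 1]; B=[0; 1]; Q=[2 0; 0 0.1]; R=2;
N=10; P=cell(N,1); K=cell(N-1,1);
P{N}=[5 0; 0 5];                          % terminal penalty
for i=N-1:-1:1                            % backward Riccati recursion
    K{i}=(R+B'*P{i+1}*B)\(B'*P{i+1}*A);
    P{i}=(A-B*K{i})'*P{i+1}*(A-B*K{i})+Q+K{i}'*R*K{i};
end
x=[2; -3];
Jpred=x'*P{1}*x;                          % predicted optimal cost-to-go
Jsim=0;
for i=1:N-1                               % accumulate the stage costs
    u=-K{i}*x;
    Jsim=Jsim+x'*Q*x+u'*R*u;
    x=A*x+B*u;
end
Jsim=Jsim+x'*P{N}*x;                      % add the terminal cost
```

The two numbers should agree to roundoff, since $P_k$ is constructed so that $x^T P_k x$ equals the cost-to-go under the optimal gains.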

[Figure: State vector x[k]; value versus time sample, k.]

[Figure: Feedback Gains K[k] (k1, k2); value versus time sample, k.]

[Figure: Elements of the P matrix (P11, P12=P21, P22); value versus time sample, k.]



8.4: Infinite-horizon discrete-time LQR

■ If we let $N \to \infty$, then $P_k$ tends to a steady-state solution as $k \to 0$. Therefore, $K_k \to K$. This is clearly a much easier control design, and usually does just about as well.

■ To find the steady-state $P$ and $K$, we let $P_k = P_{k+1} = P_{ss}$ in the above equation.

$$P_{ss} = (A - BK)^T P_{ss} (A - BK) + Q + K^T R K$$

and

$$K = \left( R + B^T P_{ss} B \right)^{-1} B^T P_{ss} A$$

which may be combined to get

$$P_{ss} = A^T P_{ss} A - A^T P_{ss} B \left( R + B^T P_{ss} B \right)^{-1} B^T P_{ss} A + Q$$

which is called a discrete-time algebraic Riccati equation, and may be solved in MATLAB using dare.m

EXAMPLE: For the previous example (with a finite end time), the solution reached for $P_1$ was

$$P_1 = \begin{bmatrix} 49.5336 & 28.5208 \\ 28.5208 & 20.8434 \end{bmatrix}.$$

In MATLAB, dare(A,B,Q,R) for the same system gives

$$P_{ss} = \begin{bmatrix} 49.5352 & 28.5215 \\ 28.5215 & 20.8438 \end{bmatrix}.$$

So, we see that the system settles very quickly to steady-state

behavior.
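If dare.m is not available, one simple alternative (a sketch, not the method used in these notes) is to run the backward recursion until $P$ stops changing and then check the residual of the algebraic Riccati equation. The tolerance and iteration cap below are arbitrary choices.

```matlab
A=[2 1; -1 1]; B=[0; 1]; Q=[2 0; 0 0.1]; R=2;
P=[5 0; 0 5];                       % starting point (the terminal penalty)
for iter=1:1000                     % iterate the Riccati recursion
    K=(R+B'*P*B)\(B'*P*A);
    Pnew=(A-B*K)'*P*(A-B*K)+Q+K'*R*K;
    if norm(Pnew-P,'fro')<1e-12, P=Pnew; break; end
    P=Pnew;
end
K=(R+B'*P*B)\(B'*P*A);              % steady-state gain
% Residual of the discrete-time algebraic Riccati equation
resid=A'*P*A-A'*P*B*((R+B'*P*B)\(B'*P*A))+Q-P;
disp(P); disp(abs(eig(A-B*K)));     % P and closed-loop eigenvalue magnitudes
```

The displayed `P` should be close to the $P_{ss}$ quoted above, and the closed-loop eigenvalues should lie inside the unit circle.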

■ There are many ways to solve the D.A.R.E., but when $Q$ has the form $C^T C$, and the system is SISO, there is a simple method which yields the optimal closed-loop eigenvalues directly. (Note, when $Q = C^T C$ we are minimizing the output energy $|y[k]|^2$).

Chang–Letov method

■ The optimal eigenvalues are the roots of the equation

$$1 + \frac{1}{\rho} G^T(z^{-1}) G(z) = 0$$

which are inside the unit circle, where

$$G(z) = C(zI - A)^{-1} B + D.$$

(Proved later for the continuous-time version).

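EXAMPLE: Consider $G(z) = \dfrac{1}{z - 1}$ so

$$\begin{aligned} 1 + \frac{\rho^{-1}}{(z - 1)(z^{-1} - 1)} &= 0 \\ 2 + \rho^{-1} - z - z^{-1} &= 0 \\ z &= 1 + \frac{1}{2\rho} \pm \sqrt{\frac{1}{4\rho^2} + \frac{1}{\rho}}. \end{aligned}$$

■ The locus of optimal pole locations for all $\rho$ forms a reciprocal root locus.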
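This example is the same plant as the scalar example in Section 8.1 ($x[k+1] = x[k] + u[k]$), so the root inside the unit circle should equal the closed-loop eigenvalue $1 - K_{\text{opt}}$ found there. A minimal numerical check, with rho = 2 as an arbitrary illustrative value:

```matlab
rho = 2;                                 % illustrative control weighting
% Multiply 2 + 1/rho - z - 1/z = 0 by -z:  z^2 - (2 + 1/rho) z + 1 = 0
z = roots([1, -(2 + 1/rho), 1]);
zin = z(abs(z) < 1);                     % keep the root inside the unit circle
Kopt = (-1 + sqrt(1 + 4*rho))/(2*rho);   % optimal gain from Section 8.1
fprintf('Chang-Letov root = %.4f, 1 - Kopt = %.4f\n', zin, 1 - Kopt);
```

The two roots are reciprocals of each other (their product is 1), which is why the locus is called reciprocal.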

Reciprocal root locus in MATLAB (SISO)

■ We want to plot the root locus

$$1 + \frac{1}{\rho} G^T(z^{-1}) G(z) = 0,$$

where

$$G(z) = C(zI - A)^{-1} B + D.$$

■ We know how to plot a root locus of the form

$$1 + K G'(z) = 0$$

so we need to find a way to convert $G^T(z^{-1}) G(z)$ into $G'(z)$.

■ We know that

$$\begin{aligned} G^T(z^{-1}) &= B^T \left( z^{-1} I - A^T \right)^{-1} C^T + D^T \\ &= B^T z \left( zI - A^{-T} \right)^{-1} (-A^{-T} C^T) + D^T. \end{aligned}$$

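■ Combining $G(z)$ and $G^T(z^{-1})$ in block-diagram form:

[Figure: Block diagram of the cascade of $G(z)$ and $G^T(z^{-1})$, with input $u[k]$, output $y[k]$, states $x[k]$ and $\lambda[k]$, two $z^{-1}$ delays, and gain blocks $A$, $B$, $C$, $D$, $A^{-T}$, $B^T$, $-C^T$, $D^T$.]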

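■ The overall system has state

$$\begin{aligned} \begin{bmatrix} x[k+1] \\ \lambda[k+1] \end{bmatrix} &= \begin{bmatrix} A & 0 \\ -A^{-T} C^T C & A^{-T} \end{bmatrix} \begin{bmatrix} x[k] \\ \lambda[k] \end{bmatrix} + \begin{bmatrix} B \\ -A^{-T} C^T D \end{bmatrix} u[k] \\ y[k] &= \begin{bmatrix} -B^T A^{-T} C^T C + D^T C & B^T A^{-T} \end{bmatrix} \begin{bmatrix} x[k] \\ \lambda[k] \end{bmatrix} + \left( D^T D - B^T A^{-T} C^T D \right) u[k]. \end{aligned}$$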

function rrl(sys)
% RRL  Plot the reciprocal root locus of a discrete-time SISO system
[A,B,C,D]=ssdata(sys);
bigA=[A zeros(size(A)); -inv(A)'*C'*C inv(A)'];
bigB=[B; -inv(A)'*C'*D];
bigC=[-B'*inv(A)'*C'*C+D'*C B'*inv(A)'];
bigD=-B'*inv(A)'*C'*D+D'*D;
rrlsys=ss(bigA,bigB,bigC,bigD,-1); % -1: discrete time, unspecified sample period
rlocus(rrlsys);

EXAMPLE: Let

$$G(z) = \frac{(z + 0.25)(z^2 + z + 0.5)}{(z - 0.2)(z^2 - 2z + 2)}.$$

Note that $G(z)$ is unstable.

[Figure: Reciprocal Root Locus; Imag Axis versus Real Axis.]

OBSERVATIONS: For the “expensive cost of control” case, stable poles

remain where they are and unstable poles are mirrored into the unit

disc. (They are not moved to be just barely stable, as we might

expect!)

For the “cheap cost of control” case, poles migrate to the finite zeros

of the transfer function, and to the origin (deadbeat control).



8.5: The continuous-time linear quadratic regulator problem (a–c)

■ The continuous-time LQR problem is stated in a similar way, and there are corresponding results. We wish to minimize

$$J(x_o, u, t_o) = x^T(t_f) P_{t_f} x(t_f) + \int_{t_o}^{t_f} \left( x^T(t) Q x(t) + u^T(t) R u(t) \right) dt.$$

■ $P_{t_f}$, $Q$ and $R$ have same restrictions and interpretations as before.

RESULTS: The following are key results

■ The optimal control is a linear (time varying) state feedback

$$u(t) = -R^{-1} B^T P(t) x(t).$$

■ Symmetric psd matrix $P(t)$ satisfies (matrix) differential equation

$$\dot{P}(t) = P(t) B R^{-1} B^T P(t) - Q - P(t) A - A^T P(t),$$

with the boundary condition that $P(t_f) = P_{t_f}$. The differential equation runs backwards in time to find $P(t)$.

■ If $t_f \to \infty$, $P(t) \to P_{ss}$ as $t \to 0$. Then,

$$0 = P_{ss} B R^{-1} B^T P_{ss} - Q - P_{ss} A - A^T P_{ss}.$$

This is the continuous-time algebraic Riccati equation, and may be solved in MATLAB using care.m; then,

$$u(t) = -R^{-1} B^T P_{ss} x(t),$$

which is a linear state feedback.

■ There are many ways to solve the C.A.R.E., but when $Q$ has the form $C^T C$, and the system is SISO, a variant of the Chang–Letov method may be used:

    • The optimal eigenvalues are the roots of the equation

      $$1 + \frac{1}{\rho} G^T(-s) G(s) = 0$$

      which are in the left-half plane, where

      $$G(s) = C(sI - A)^{-1} B + D.$$

    • The locus of all possible values of closed-loop optimal roots forms the symmetric root locus.

Solving the continuous-time LQR problem

1. Define the cost function.

2. Use Bellman’s principle of optimality (dynamic programming).

3. Determine the Hamilton–Jacobi–Bellman equation.

4. Solve this equation (steps outlined later on).

Define the cost function

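■ We define the cost function we wish to minimize

$$J(x_o, u, t_o) = x^T(t_f) P_{t_f} x(t_f) + \int_{t_o}^{t_f} \left( x^T(t) Q x(t) + u^T(t) R u(t) \right) dt$$

where $Q \geq 0$, $P_{t_f} \geq 0$ and $R > 0$.

■ We define the optimal cost

$$V(x_o, t_o) = \min_{u(t)} J(x_o, u, t_o) \quad \text{subject to} \quad \dot{x}(t) = Ax(t) + Bu(t).$$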

Invoke Bellman’s principle of optimality

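■ We break the cost function into two pieces (where $\delta t$ is small)

$$\begin{aligned} J(x_o, u, t_o) &= x^T(t_f) P_{t_f} x(t_f) + \int_{t_o}^{t_o + \delta t} \left( x^T(t) Q x(t) + u^T(t) R u(t) \right) dt \\ &\quad + \int_{t_o + \delta t}^{t_f} \left( x^T(t) Q x(t) + u^T(t) R u(t) \right) dt. \end{aligned}$$

■ From the Bellman equation we know that the optimal cost

$$V(x_o, t_o) = \min_{u(t)} \left\{ \int_{t_o}^{t_o + \delta t} \left( x^T(t) Q x(t) + u^T(t) R u(t) \right) dt + V(x(t_o + \delta t), t_o + \delta t) \right\}.$$

■ The minimum cost is the cost to go from $x(t_o)$ to $x(t_o + \delta t)$ plus the optimal cost to go from $x(t_o + \delta t)$ to $x(t_f)$. The latter part includes the terminal cost.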

Determine the Hamilton–Jacobi–Bellman equation

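■ We evaluate $V(x(t_o + \delta t), t_o + \delta t)$ by computing its Taylor-series expansion around the point $(x_o, t_o)$.

$$\begin{aligned} V(x(t_o + \delta t), t_o + \delta t) &= V(x_o, t_o) + \left. \frac{\partial V(x, t)}{\partial t} \right|_{x_o, t_o} [(t_o + \delta t) - t_o] \\ &\quad + \left. \frac{\partial V(x, t)}{\partial x} \right|_{x_o, t_o} [x(t_o + \delta t) - x(t_o)] + \text{h.o.t.} \end{aligned}$$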

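■ So, if $\delta t$ is small

$$\begin{aligned} V(x_o, t_o) &= \min_{u(t)} \Bigg\{ \int_{t_o}^{t_o + \delta t} \left( x^T(t) Q x(t) + u^T(t) R u(t) \right) dt + \underbrace{V(x_o, t_o) + \left. \frac{\partial V(x, t)}{\partial t} \right|_{x_o, t_o} \delta t}_{\text{Not functions of } u(t)} \\ &\qquad\qquad + \left. \frac{\partial V(x, t)}{\partial x} \right|_{x_o, t_o} [x(t_o + \delta t) - x(t_o)] \Bigg\} \\ &= V(x_o, t_o) + \left. \frac{\partial V(x, t)}{\partial t} \right|_{x_o, t_o} \delta t \\ &\quad + \min_{u(t)} \Bigg\{ \underbrace{\int_{t_o}^{t_o + \delta t} \left( x^T(t) Q x(t) + u^T(t) R u(t) \right) dt}_{\approx [x^T(t_o) Q x(t_o) + u^T(t_o) R u(t_o)] \delta t} + \left. \frac{\partial V(x, t)}{\partial x} \right|_{x_o, t_o} \underbrace{[x(t_o + \delta t) - x(t_o)]}_{\approx [Ax(t_o) + Bu(t_o)] \delta t} \Bigg\}. \end{aligned}$$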

■ Subtracting like terms from both sides and dividing by $\delta t$

$$0 = \left. \frac{\partial V(x, t)}{\partial t} \right|_{x_o, t_o} + \min_{u(t)} \Bigg\{ \underbrace{[x^T(t_o) Q x(t_o) + u^T(t_o) R u(t_o)] + \left. \frac{\partial V(x, t)}{\partial x} \right|_{x_o, t_o} [Ax(t_o) + Bu(t_o)]}_{\text{Hamiltonian}} \Bigg\}.$$

■ This is called the Hamilton–Jacobi–Bellman equation.

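■ To minimize the Hamiltonian (with respect to $u(t_o)$), take derivatives with respect to $u(t_o)$ and set to zero.

$$\begin{aligned} 0 &= \frac{\partial^T}{\partial u(t_o)} \left[ [x^T(t_o) Q x(t_o) + u^T(t_o) R u(t_o)] + \left. \frac{\partial V(x, t)}{\partial x} \right|_{x_o, t_o} [Ax(t_o) + Bu(t_o)] \right] \\ &= 2Ru(t_o) + B^T \left. \frac{\partial V(x, t)^T}{\partial x} \right|_{x_o, t_o}. \end{aligned}$$

■ So,

$$u^*(t_o) = -\frac{1}{2} R^{-1} B^T \left. \frac{\partial V(x, t)^T}{\partial x} \right|_{x_o, t_o},$$

hence the need for $R$ to be positive definite.

■ We still need to determine $\left. \dfrac{\partial V(x, t)}{\partial x} \right|_{x_o, t_o}$.

1. Show $V(z, t_o) = z^T P(t_o) z$ where $P(t_o)$ is symmetric, p.s.d.

2. Use this result to compute the final desired term.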



8.6: The continuous-time linear quadratic regulator problem (d)

Show that $V(z, t_o) = z^T P(t_o) z$

■ The minimum cost-to-go starting in state $z$ is a quadratic form in $z$. Can be shown in a number of steps. The main steps are:

1. Show that the gradient operator on $V$ (that is, $\nabla V$) is linear.

2. Integrate the (linear) gradient to get a quadratic form.

We will develop a number of properties in order to prove these results.

PROPERTY I: For all scalars $\lambda$, $J(\lambda z, \lambda u, t_o) = \lambda^2 J(z, u, t_o)$ and therefore

$$V(\lambda z, t_o) = \lambda^2 V(z, t_o).$$

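■ Let $x(t)$ be the state that corresponds to an input $u(t)$ and an initial condition $z$. Then,

$$x(t) = e^{At} z + \int_0^t e^{A(t - \tau)} B u(\tau)\, d\tau.$$

■ Now, denote by $\tilde{x}(t)$ the state that corresponds to an input $\lambda u(t)$ and an initial condition $\lambda z$. Then,

$$\tilde{x}(t) = \lambda \left( e^{At} z + \int_0^t e^{A(t - \tau)} B u(\tau)\, d\tau \right) = \lambda x(t).$$

Thus,

$$\begin{aligned} J(\lambda z, \lambda u, t_o) &= \lambda^2 x^T(t_f) P_{t_f} x(t_f) + \lambda^2 \int_{t_o}^{t_f} x^T(t) Q x(t) + u^T(t) R u(t)\, dt \\ &= \lambda^2 J(z, u, t_o) \end{aligned}$$

and

$$V(\lambda z, t_o) = \lambda^2 V(z, t_o).$$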



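PROPERTY II: Let $u$ and $\tilde{u}$ be two input sequences, and let $z$ and $\tilde{z}$ be two initial states. We will show that

$$J(z + \tilde{z}, u + \tilde{u}, t_o) + J(z - \tilde{z}, u - \tilde{u}, t_o) = 2J(z, u, t_o) + 2J(\tilde{z}, \tilde{u}, t_o)$$

by “plugging in” and collecting terms.

■ Suppose

$$\begin{aligned} \dot{x}(t) &= Ax(t) + Bu(t), & x_o &= z; \\ \dot{\tilde{x}}(t) &= A\tilde{x}(t) + B\tilde{u}(t), & \tilde{x}_o &= \tilde{z}. \end{aligned}$$

■ Adding (or subtracting) the above equations we obtain

$$\dot{x}(t) \pm \dot{\tilde{x}}(t) = A(x(t) \pm \tilde{x}(t)) + B(u(t) \pm \tilde{u}(t)), \qquad x_o \pm \tilde{x}_o = z \pm \tilde{z}.$$

Therefore, $x(t) \pm \tilde{x}(t)$ is the state that corresponds to an input $u(t) \pm \tilde{u}(t)$ and initial condition $z \pm \tilde{z}$ (respectively).

■ Now, we plug in

$$\begin{aligned} J(z \pm \tilde{z}, u \pm \tilde{u}, t_o) &= (x \pm \tilde{x})^T(t_f) P_{t_f} (x \pm \tilde{x})(t_f) \\ &\quad + \int_{t_o}^{t_f} \left( (x \pm \tilde{x})^T(t) Q (x \pm \tilde{x})(t) + (u \pm \tilde{u})^T(t) R (u \pm \tilde{u})(t) \right) dt \\ &= x^T(t_f) P_{t_f} x(t_f) \pm x^T(t_f) P_{t_f} \tilde{x}(t_f) \pm \tilde{x}^T(t_f) P_{t_f} x(t_f) + \tilde{x}^T(t_f) P_{t_f} \tilde{x}(t_f) \\ &\quad + \int_{t_o}^{t_f} \Big[ x^T(t) Q x(t) \pm \tilde{x}^T(t) Q x(t) \pm x^T(t) Q \tilde{x}(t) + \tilde{x}^T(t) Q \tilde{x}(t) \\ &\qquad\quad + u^T(t) R u(t) \pm u^T(t) R \tilde{u}(t) \pm \tilde{u}^T(t) R u(t) + \tilde{u}^T(t) R \tilde{u}(t) \Big]\, dt. \end{aligned}$$

■ Therefore,

$$J(z + \tilde{z}, u + \tilde{u}, t_o) + J(z - \tilde{z}, u - \tilde{u}, t_o) = 2J(z, u, t_o) + 2J(\tilde{z}, \tilde{u}, t_o).$$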



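PROPERTY III: Next, minimize the RHS with respect to $u(t)$ and $\tilde{u}(t)$. Conclude

$$V(z + \tilde{z}, t_o) + V(z - \tilde{z}, t_o) \geq 2V(z, t_o) + 2V(\tilde{z}, t_o).$$

■ Minimizing

$$\min_{u, \tilde{u}} \left\{ J(z + \tilde{z}, u + \tilde{u}, t_o) + J(z - \tilde{z}, u - \tilde{u}, t_o) \right\} = \min_u \left\{ 2J(z, u, t_o) \right\} + \min_{\tilde{u}} \left\{ 2J(\tilde{z}, \tilde{u}, t_o) \right\}.$$

■ Now,

$$\text{RHS} = 2V(z, t_o) + 2V(\tilde{z}, t_o)$$

but

$$\text{LHS} \leq V(z + \tilde{z}, t_o) + V(z - \tilde{z}, t_o)$$

because $u + \tilde{u}$ and $u - \tilde{u}$ can be chosen independently, so each term on the left can be driven to its own minimum. Therefore,

$$V(z + \tilde{z}, t_o) + V(z - \tilde{z}, t_o) \geq 2V(z, t_o) + 2V(\tilde{z}, t_o).$$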

PROPERTY IV: Apply the above inequality with $(z + \tilde{z})/2$ substituted for $z$ and $(z - \tilde{z})/2$ substituted for $\tilde{z}$ to get:

$$V(z + \tilde{z}, t_o) + V(z - \tilde{z}, t_o) = 2V(z, t_o) + 2V(\tilde{z}, t_o).$$

■ Substitute as directed.

$$2V\left( \frac{z + \tilde{z}}{2}, t_o \right) + 2V\left( \frac{z - \tilde{z}}{2}, t_o \right) \leq V(z, t_o) + V(\tilde{z}, t_o).$$

■ By scalar multiplication principle,

$$\frac{2}{4} V(z + \tilde{z}, t_o) + \frac{2}{4} V(z - \tilde{z}, t_o) \leq V(z, t_o) + V(\tilde{z}, t_o).$$

■ Multiply both sides by 2

$$V(z + \tilde{z}, t_o) + V(z - \tilde{z}, t_o) \leq 2V(z, t_o) + 2V(\tilde{z}, t_o),$$

and, combined with results of property III,

$$V(z + \tilde{z}, t_o) + V(z - \tilde{z}, t_o) = 2V(z, t_o) + 2V(\tilde{z}, t_o).$$

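PROPERTY V: Now we are getting somewhere. Recall that linearity requires superposition and scaling properties be met. First, we prove superposition of the gradient operator. Take partial derivatives of this equation with respect to $z$ and $\tilde{z}$. Show that

$$\nabla V(z + \tilde{z}) = \nabla V(z) + \nabla V(\tilde{z}).$$

■ The gradient operator is defined as

$$\nabla f(x) = \frac{\partial f(x)^T}{\partial x}. \qquad \left( \text{Also, } \nabla f(ax) = \frac{\partial f(ax)^T}{\partial (ax)}. \right)$$

■ Take partial derivatives of the equation with respect to $z$:

$$\begin{aligned} \frac{\partial V(z + \tilde{z}, t_o)}{\partial z} + \frac{\partial V(z - \tilde{z}, t_o)}{\partial z} &= 2 \frac{\partial V(z, t_o)}{\partial z} + 2 \frac{\partial V(\tilde{z}, t_o)}{\partial z} \\ \frac{\partial V(z + \tilde{z}, t_o)}{\partial (z + \tilde{z})} \frac{\partial (z + \tilde{z})}{\partial z} + \frac{\partial V(z - \tilde{z}, t_o)}{\partial (z - \tilde{z})} \frac{\partial (z - \tilde{z})}{\partial z} &= 2 \nabla V(z, t_o)^T \\ \nabla V(z + \tilde{z}, t_o) + \nabla V(z - \tilde{z}, t_o) &= 2 \nabla V(z, t_o). \end{aligned}$$

■ Take partial derivatives of the equation with respect to $\tilde{z}$:

$$\begin{aligned} \frac{\partial V(z + \tilde{z}, t_o)}{\partial \tilde{z}} + \frac{\partial V(z - \tilde{z}, t_o)}{\partial \tilde{z}} &= 2 \frac{\partial V(z, t_o)}{\partial \tilde{z}} + 2 \frac{\partial V(\tilde{z}, t_o)}{\partial \tilde{z}} \\ \frac{\partial V(z + \tilde{z}, t_o)}{\partial (z + \tilde{z})} \frac{\partial (z + \tilde{z})}{\partial \tilde{z}} + \frac{\partial V(z - \tilde{z}, t_o)}{\partial (z - \tilde{z})} \frac{\partial (z - \tilde{z})}{\partial \tilde{z}} &= 2 \nabla V(\tilde{z}, t_o)^T \\ \nabla V(z + \tilde{z}, t_o) - \nabla V(z - \tilde{z}, t_o) &= 2 \nabla V(\tilde{z}, t_o). \end{aligned}$$

■ Add the two results and divide by two to get

$$\nabla V(z + \tilde{z}, t_o) = \nabla V(z, t_o) + \nabla V(\tilde{z}, t_o).$$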

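PROPERTY VI: To show linearity of the gradient the last step we must perform is to show

$$\nabla V(\lambda z, t_o) = \lambda \nabla V(z, t_o).$$

■ From the definition of the gradient,

$$\nabla V(\lambda z, t_o) = \frac{\partial V(\lambda z, t_o)^T}{\partial (\lambda z)} = \frac{\partial\, \lambda^2 V(z, t_o)^T}{\lambda\, \partial z} = \lambda \nabla V(z, t_o).$$

■ So, the gradient is linear. This means that $\nabla V(z, t_o)$, which is a vector, is linear in $z$ and hence has a matrix representation

$$\nabla V(z, t_o) = M(t_o) z$$

where $M(t_o) \in \mathbb{R}^{n \times n}$.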

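PROPERTY VII: We are nearly ready to integrate the gradient to show our desired result. First, we must show that

$$V(z, t_o) = V(0, t_o) + \int_0^1 \nabla V(\theta z, t_o)^T z\, d\theta.$$

■ Note that $V(z, t_o)$ is a scalar. Consider a scalar function $f(\theta)$. Then,

$$\int_0^1 \frac{\partial f(\theta)}{\partial \theta}\, d\theta = f(1) - f(0).$$

■ Let $f(\theta) = V(\theta z, t_o)$. Then,

$$\begin{aligned} V(z, t_o) - V(0, t_o) &= \int_0^1 \frac{\partial V(\theta z, t_o)}{\partial \theta}\, d\theta \\ V(z, t_o) &= V(0, t_o) + \int_0^1 \frac{\partial V(\theta z, t_o)}{\partial (\theta z)} \frac{\partial (\theta z)}{\partial \theta}\, d\theta \\ &= V(0, t_o) + \int_0^1 \nabla V(\theta z, t_o)^T z\, d\theta. \end{aligned}$$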

PROPERTY VIII: Now, integrate away to show the desired result. Note that $V(0, t_o) = 0$.

$$\begin{aligned} V(z, t_o) &= \int_0^1 (M(t_o)(\theta z))^T z\, d\theta = \int_0^1 z^T M^T(t_o) z\, \theta\, d\theta \\ &= z^T M^T(t_o) z \left. \frac{\theta^2}{2} \right|_0^1 = z^T M^T(t_o) z / 2. \end{aligned}$$

■ Since $V(z, t_o)$ is a scalar, $V(z, t_o)^T = V(z, t_o) = z^T M(t_o) z / 2$. Averaging our two (identical) results,

$$V(z, t_o) = z^T \left( \frac{M(t_o) + M^T(t_o)}{4} \right) z.$$

■ Therefore, $P(t_o) = \dfrac{M(t_o) + M^T(t_o)}{4}$. Also, $P(t_o) \geq 0$ since $J(z, u, t_o) \geq 0$ for all $u$, $z$. Thus

$$V(z, t_o) = \min_u J(z, u, t_o) = z^T P(t_o) z \geq 0 \quad \forall\, z,$$

and we have (finally) proven the desired result.

The optimal $u^*(t)$ and differential Riccati equation

■ Because $V(x, t) = x^T P(t) x$, $\left. \dfrac{\partial V(x, t)}{\partial x} \right|_{x_o, t_o} = 2x^T(t_o) P^T(t_o)$. We can now state

$$u^*(t) = -\underbrace{R^{-1} B^T P(t)}_{K(t)}\, x(t)$$

so we see that the optimum control, with no a priori constraints on the structure of the $u(t)$ signal, is a (time varying) linear state feedback.

■ We need to determine $P(t)$. Note that in the Hamilton–Jacobi–Bellman equation we have yet to determine

$$\left. \frac{\partial V(x, t)}{\partial t} \right|_{x_o, t_o} = \left. \frac{\partial}{\partial t} x^T P(t) x \right|_{x_o, t_o} = x_o^T \dot{P}(t_o) x_o.$$

■ Substitute all results, including optimum $u^*(t)$

$$\begin{aligned} 0 &= x^T(t_o) \dot{P}(t_o) x(t_o) + x^T(t_o) Q x(t_o) + x^T(t_o) P(t_o) B R^{-1} B^T P(t_o) x(t_o) \\ &\quad + 2x^T(t_o) P(t_o) A x(t_o) - 2x^T(t_o) P(t_o) B R^{-1} B^T P(t_o) x(t_o). \end{aligned}$$

■ This expression is valid for all $t_o$. Also note that we can write

$$2x^T(t_o) P(t_o) A x(t_o) = x^T(t_o) P(t_o) A x(t_o) + x^T(t_o) A^T P(t_o) x(t_o),$$

so

$$0 = x^T(t) \left( \dot{P}(t) + Q - P(t) B R^{-1} B^T P(t) + P(t) A + A^T P(t) \right) x(t)$$

which is true for any $x(t)$. Therefore,

$$\dot{P}(t) = P(t) B R^{-1} B^T P(t) - P(t) A - A^T P(t) - Q$$

which is called the differential (matrix) Riccati equation.

■ This is a nonlinear differential equation with boundary condition $P(t_f) = P_{t_f}$, solved backward in time.

Steady-state solution

■ As the differential equation for $P(t)$ is simulated backward in time from the terminal point, it tends toward steady-state values as $t \to 0$. It is much simpler to approximate the optimal control gains as a constant set of gains calculated using $P_{ss}$.

$$0 = P_{ss} B R^{-1} B^T P_{ss} - P_{ss} A - A^T P_{ss} - Q.$$

This is called the Algebraic Riccati Equation. In MATLAB, it may be solved using care.m



8.7: Solving the differential Riccati equation via simulation

■ The differential Riccati equation may be solved numerically by integrating the matrix differential equation

$$\dot{P}(t) = P(t) B R^{-1} B^T P(t) - P(t) A - A^T P(t) - Q$$

backward in time.

■ The problem we discover is that MATLAB’s integration routines (such as ode45.m) work only on vector differential equations, not matrix differential equations such as this.

■ The Kronecker product $\otimes$ comes to the rescue once again, along with the matrix stacking operator. We can write the above matrix differential equation as a vector differential equation:

$$\dot{P}_{st} = \left( A^T \otimes I + I \otimes A^T \right) P_{st} + Q_{st} - \left( P^T \otimes P \right) \left( B R^{-1} B^T \right)_{st}.$$

A “−” sign has been introduced in order for the forward-time ode45.m (for example) to work on the backward-time equation.

■ In MATLAB

% Right-hand side of the stacked (backward-time) Riccati equation
pdot=(kron(A',eye(size(A))) + kron(eye(size(A)),A')) ...
     *st(P) + st(Q) - kron(P',P)*st(B*inv(R)*B');
function col=st(m) % stack subfunction
col=reshape(m,prod(size(m)),1);
function mat=unst(v) % unstack subfunction
mat=reshape(v,sqrt(length(v)),sqrt(length(v)));

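EXAMPLE: Consider the continuous-time system

$$\dot{x}(t) = \begin{bmatrix} 1 & 0 \\ 2 & 0 \end{bmatrix} x(t) + \begin{bmatrix} 1 \\ 0 \end{bmatrix} u(t) \quad \text{and} \quad y(t) = \begin{bmatrix} 0 & 1 \end{bmatrix} x(t).$$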



■ Solve the differential matrix Riccati equation that results in the control signal that minimizes the cost function

$$J = x^T(5) \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} x(5) + \int_0^5 \left[ y^T(t) y(t) + u^T(t) u(t) \right] dt.$$

■ First, note that the open-loop system is unstable, with poles at 0 and

1. It is controllable and observable.

■ The cost function is written in terms of y.t/ but not x.t/. However,

since there is no feedthrough term, we can also write it as

$$J = x^T(5) \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} x(5) + \int_0^5 \left( x^T(t) C^T C x(t) + u^T(t) u(t) \right) dt.$$

This is a common “trick”.

■ Therefore, the penalty matrices are $Q = C^T C$ and $R = \rho = 1$.

■ We can simulate the finite horizon case to find $P(t)$.

[Figure: Simulink diagram. Blocks: an integrator (1/s) with “initial condition” st(Ptf); a constant st(Q); a MATLAB Function block computing (kron(A',eye(size(A)))+kron(eye(size(A)),A'))*u; a MATLAB Function block computing kron(unst(u)',unst(u))*st(B*inv(R)*B'); a Clock; outputs tvec and pvec; final time 5. To plot: plot(tvec.signals.values,pvec.signals.values)]

[Figure: Solving for P; P11, P12=P21 and P22 versus Time (sec).]
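The same finite-horizon solution can be sketched as a script instead of a block diagram. This is one possible arrangement, not the notes' own implementation: it integrates in the reversed time variable $s = t_f - t$, so ode45 runs forward from $s = 0$ (where $P = P_{t_f}$) to $s = 5$ (which corresponds to $t = 0$). The names `tfinal`, `M`, `f` and `opts`, and the solver tolerances, are illustrative choices.

```matlab
A=[1 0; 2 0]; B=[1; 0]; C=[0 1]; Q=C'*C; R=1;
Ptf=2*eye(2); tfinal=5;
st=@(m) m(:);                       % stack a matrix into a column vector
unst=@(v) reshape(v,2,2);           % unstack back into a 2x2 matrix
M=B*(R\B');                         % B*inv(R)*B'
% In reversed time s = tfinal - t:  dP/ds = A'P + PA + Q - P M P
f=@(s,p) st(A'*unst(p)+unst(p)*A+Q-unst(p)*M*unst(p));
opts=odeset('RelTol',1e-8,'AbsTol',1e-10);
[s,p]=ode45(f,[0 tfinal],st(Ptf),opts);
P0=unst(p(end,:)');                 % P at t = 0
resid=A'*P0+P0*A+Q-P0*M*P0;         % A.R.E. residual; small if P has settled
plot(tfinal-s,p);                   % elements of P(t) versus forward time t
xlabel('Time (sec)'); title('Solving for P');
```

With a five-second horizon, `P0` should already be close to the steady-state solution of the algebraic Riccati equation.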

■ We can also solve the infinite-horizon case (analytically, for this

example). Consider the A.R.E.

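$$\begin{aligned} 0 &= A^T P + PA + C^T C - P B R^{-1} B^T P \\ \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} &= \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} + \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 2 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \\ &\quad - \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} \\ &= \begin{bmatrix} p_{11} + 2p_{12} & p_{12} + 2p_{22} \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} p_{11} + 2p_{12} & 0 \\ p_{12} + 2p_{22} & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} - \begin{bmatrix} p_{11}^2 & p_{11} p_{12} \\ p_{11} p_{12} & p_{12}^2 \end{bmatrix}. \end{aligned}$$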

■ This matrix equality represents a set of three simultaneous equations

(because P is symmetric). They are:

$$\begin{aligned} 2p_{11} - p_{11}^2 + 4p_{12} &= 0 \\ p_{12} + 2p_{22} - p_{11} p_{12} &= 0 \\ 1 - p_{12}^2 &= 0. \end{aligned}$$

■ The final equation gives us $p_{12} = \pm 1$. If we select $p_{12} = -1$ then the first equation will have complex roots (bad). So, $p_{12} = 1$.

■ Then, $p_{11} = 1 \pm \sqrt{5}$. If $p_{11} = 1 - \sqrt{5}$ then $P$ cannot be positive definite. Therefore, $p_{11} = 1 + \sqrt{5} = 3.236$.

■ Finally, we get $p_{22} = \sqrt{5}/2 = 1.118$.

■ These are the same values as the steady-state solution found by

integrating the differential Riccati equation.

■ The static feedback control signal is

$$u(t) = -R^{-1} B^T P_{ss} x(t) = -\begin{bmatrix} 3.236 & 1 \end{bmatrix} x(t).$$

For this feedback, the closed-loop poles are at $-\dfrac{\sqrt{5}}{2} \pm \dfrac{\sqrt{3}}{2} j$ (stable).
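The analytic result is easy to confirm numerically without any toolbox: substitute the hand-derived $P$ into the algebraic Riccati equation and compute the closed-loop eigenvalues. The variable names `resid` and `poles` are illustrative.

```matlab
A=[1 0; 2 0]; B=[1; 0]; C=[0 1]; Q=C'*C; R=1;
P=[1+sqrt(5) 1; 1 sqrt(5)/2];       % analytic solution derived above
resid=A'*P+P*A+Q-P*B*(R\B')*P;      % A.R.E. residual, should be ~0
K=R\(B'*P);                         % static gain, u(t) = -K x(t)
poles=eig(A-B*K);                   % closed-loop poles
disp(K); disp(poles);
```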



8.8: Continuous-time systems and Chang–Letov (SISO only)

■ For a SISO system, we can easily plot the locus of closed-loop poles.

■ Tradeoff between control effort and output error is evident.

■ Consider the infinite-horizon LQR problem with $Q = C^T C$, $C \in \mathbb{R}^{1 \times n}$ and $R = \rho$, $\rho \in \mathbb{R}$.

■ The cost function is then $J = \displaystyle\int_0^\infty y^2(t) + \rho u^2(t)\, dt.$

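■ The algebraic Riccati equation becomes

$$A^T P + PA - P B \rho^{-1} B^T P + C^T C = 0$$

or

$$C^T C = P(sI - A) + (-sI - A^T) P + P B \rho^{-1} B^T P.$$

■ Multiply on left by $B^T (-sI - A^T)^{-1}$ and on right by $(sI - A)^{-1} B \rho^{-1}$.

$$\begin{aligned} \frac{1}{\rho} \left\{ B^T (-sI - A^T)^{-1} C^T \right\} \left\{ C (sI - A)^{-1} B \right\} &= B^T (-sI - A^T)^{-1} \underbrace{P B \rho^{-1}}_{K^T} + \underbrace{\rho^{-1} B^T P}_{K} (sI - A)^{-1} B \\ &\quad + B^T (-sI - A^T)^{-1} P B \rho^{-2} B^T P (sI - A)^{-1} B. \end{aligned}$$

■ The left-hand side is $\dfrac{1}{\rho} G^T(-s) G(s)$. Add 1 to both sides and collect

$$\left( 1 + K(sI - A)^{-1} B \right) \left( 1 + K(-sI - A)^{-1} B \right) = 1 + \frac{1}{\rho} G^T(-s) G(s).$$

■ Note that all terms are scalars.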

FACT: Consider the determinant of a block matrix (we will not prove this

fact here, but the result may be found in many linear algebra books):



$$\det \begin{bmatrix} \bar{A} & \bar{B} \\ \bar{C} & \bar{D} \end{bmatrix} = \det(\bar{A}) \det(\bar{D} - \bar{C} \bar{A}^{-1} \bar{B}).$$

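FACT: $1 + K(sI - A)^{-1} B = \dfrac{\det(sI - A + BK)}{\det(sI - A)} = \dfrac{\chi_{cl}(s)}{\chi_{ol}(s)}$.

PROOF: Consider the block matrix

$$M_1 = \begin{bmatrix} sI - A & B \\ -K & 1 \end{bmatrix}$$

$$\begin{aligned} \det(M_1) &= \det(sI - A) \det(1 + K(sI - A)^{-1} B) \\ &= \det(sI - A)(1 + K(sI - A)^{-1} B). \end{aligned}$$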

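■ Now, consider the product of matrices (where $r \neq 0$)

$$M_2 = \begin{bmatrix} sI - A & B \\ -K & 1 \end{bmatrix} \begin{bmatrix} I & p \\ K & r \end{bmatrix} = \begin{bmatrix} sI - A + BK & (sI - A)p + Br \\ 0 & -Kp + r \end{bmatrix}$$

$$\begin{aligned} \det(M_2) &= \det(sI - A + BK) \det(-Kp + r) \\ &= \left( \det(sI - A)(1 + K(sI - A)^{-1} B) \right) \det(-Kp + r), \end{aligned}$$

or

$$1 + K(sI - A)^{-1} B = \frac{\det(sI - A + BK)}{\det(sI - A)} = \frac{\chi_{cl}(s)}{\chi_{ol}(s)}.$$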
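The identity can be spot-checked numerically at any test point $s$ that is not an open-loop pole. The sketch below reuses the system and gain from the example in Section 8.7; the test point is an arbitrary choice.

```matlab
A=[1 0; 2 0]; B=[1; 0]; K=[3.236 1];  % system and gain from Section 8.7
s=0.7+1.3j;                            % arbitrary test point (not a pole of A)
lhs=1+K*((s*eye(2)-A)\B);              % 1 + K (sI - A)^{-1} B
rhs=det(s*eye(2)-A+B*K)/det(s*eye(2)-A);  % ratio of characteristic polynomials
err=abs(lhs-rhs);                      % should be at roundoff level
```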

■ So, from before, we have

$$\frac{\chi_{cl}(s) \chi_{cl}(-s)}{\chi_{ol}(s) \chi_{ol}(-s)} = 1 + \frac{1}{\rho} G^T(-s) G(s) \triangleq \Delta(s).$$



■ Therefore $\Delta(s) = 0$ requires $\chi_{cl}(s) = 0$ or $\chi_{cl}(-s) = 0$. LQR requires that $\chi_{cl}(s)$ be Hurwitz (stable), so we have the conclusion:

Closed-loop poles are the LHP zeros of $1 + \dfrac{1}{\rho} G^T(-s) G(s)$.

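As an illustration, this conclusion can be checked against the example of Section 8.7, where $A = \begin{bmatrix} 1 & 0 \\ 2 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $C = \begin{bmatrix} 0 & 1 \end{bmatrix}$ give $G(s) = 2/(s(s-1))$ and $\rho = 1$. The polynomial below is worked out by hand for that example; it is a sketch for this one system, not a general routine.

```matlab
rho=1;
% G(s) = 2/(s(s-1)), so G(-s)G(s) = 4/(s^2 (s^2 - 1)) and
% 1 + (1/rho) G(-s) G(s) = 0  becomes  s^4 - s^2 + 4/rho = 0
r=roots([1 0 -1 0 4/rho]);
cl=r(real(r)<0);                  % LHP roots: the optimal closed-loop poles
disp(cl);
```

The two left-half-plane roots should match the closed-loop poles $-\sqrt{5}/2 \pm (\sqrt{3}/2)j$ found from the Riccati solution, and the other two roots are their mirror images in the right-half plane.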
Symmetric root locus in MATLAB

■ We want to plot the root locus

$$1 + \frac{1}{\rho} G^T(-s) G(s) = 0.$$

■ We need to find a way to represent $G^T(-s) G(s)$ as a state-space system in MATLAB.

$$G(s) = C(sI - A)^{-1} B + D$$

and

$$\begin{aligned} G^T(-s) &= B^T \left( -sI - A^T \right)^{-1} C^T + D^T \\ &= B^T (sI - (-A^T))^{-1} (-C)^T + D^T. \end{aligned}$$

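■ This can be represented in block-diagram form as:

[Figure: Block diagram of the cascade of $G(s)$ and $G^T(-s)$, with input $u(t)$, output $y(t)$, states $x(t)$ and $\lambda(t)$, two integrators, and gain blocks $A$, $B$, $C$, $D$, $-A^T$, $B^T$, $-C^T$, $D^T$.]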

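■ The overall system has state

$$\begin{aligned} \begin{bmatrix} \dot{x}(t) \\ \dot{\lambda}(t) \end{bmatrix} &= \begin{bmatrix} A & 0 \\ -C^T C & -A^T \end{bmatrix} \begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix} + \begin{bmatrix} B \\ -C^T D \end{bmatrix} u(t) \\ y(t) &= \begin{bmatrix} D^T C & B^T \end{bmatrix} \begin{bmatrix} x(t) \\ \lambda(t) \end{bmatrix} + D^T D u(t). \end{aligned}$$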



function srl(sys)
% SRL  Plot the symmetric root locus of a continuous-time SISO system
[A,B,C,D]=ssdata(sys);
bigA=[A zeros(size(A)); -C'*C -A'];
bigB=[B; -C'*D];
bigC=[D'*C B'];
bigD=D'*D;
srlsys=ss(bigA,bigB,bigC,bigD);
rlocus(srlsys);

EXAMPLE: Let

$$G(s) = \frac{1}{(s - 1.5)(s^2 + 2s + 2)}.$$

Note that $G(s)$ is unstable.

[Figure: Symmetric Root Locus; Imag Axis versus Real Axis.]

EXAMPLE: Multivariable control via LQR. Place poles of the MIMO MagLev using LQR for

$$Q = \mathrm{diag}(1, 3, 1, 3), \qquad R = \mathrm{diag}(100, 100).$$

The poles end up at $-179.3358$, $-7.0858$, $-101.2163$, $-3.6269$. Place poles at these locations using the Lyapunov method (from section 6) as well, and compare $u(t)$ and $x(t)$.

[Figure: State value: LQR −−; LYAP −; amplitude versus time (sec).]

[Figure: Control effort: LQR −−; LYAP −; amplitude versus time (sec).]

