INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL
Int. J. Robust Nonlinear Control 2004; 14:379–393 (DOI: 10.1002/rnc.888)
On optimal control of constrained linear systems with imperfect state information and stochastic disturbances
Tristan Perez*,†, Hernan Haimovich and Graham C. Goodwin
Centre for Integrated Dynamics and Control, School of Electrical Engineering and Computer Science,
The University of Newcastle, University Drive, Callaghan, NSW 2308, Australia
SUMMARY
We address the problem of finite horizon optimal control of discrete-time linear systems with input constraints and uncertainty. The uncertainty for the problem analysed is related to incomplete state information (output feedback) and stochastic disturbances. We analyse the complexities associated with finding optimal solutions. We also consider two suboptimal strategies that could be employed for larger optimization horizons. Copyright © 2004 John Wiley & Sons, Ltd.
KEY WORDS: stochastic optimal control; constrained control; output feedback
1. INTRODUCTION
Control problems where variables need to satisfy constraints and where the control action also has to minimize a cost arise naturally in many different areas of Science, especially Control Engineering. One particular technique that addresses such problems is model predictive control (MPC). This technique offers a framework to deal with constrained control problems for which the classical off-line computation of control laws is rendered difficult or impossible [1]. Because of this, MPC has become a widely adopted technique to address advanced control problems in industrial applications [2].
When controlling physical systems, the constrained control problem often becomes more involved because of the presence of uncertainty. In this context, uncertainty can be classified according to three sources:
* Incomplete state information to implement the control (output feedback),
* Unmodelled dynamics,
* Disturbances.
In control problems for uncertain systems, there are predominantly two approaches to model uncertainty [3]: via a stochastic description and via a set-membership description. The stochastic approach describes uncertainty in terms of probability distributions associated with the uncertain quantities (disturbances, initial conditions, parameters, etc.). The alternative set-membership approach describes uncertainty only in terms of some information regarding the sets in which the uncertain quantities (disturbances, parameters, etc.) take values.

*Correspondence to: Tristan Perez, Centre for Integrated Dynamics and Control, School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan, NSW 2308, Australia
†E-mail: [email protected]
Within the framework of MPC, the set-membership description of uncertainty has been advocated in the literature [4, 5, 6]. For example, in Reference [5] a strategy based on set-valued observers is proposed. This strategy uses an observer that assumes unknown but bounded disturbances and generates a set of possible states based on past output and input information. Then, to each estimated state a set of control values that guarantees the satisfaction of the constraints is associated. The control is chosen so as to belong to the intersection of all the control value sets. In Reference [6], an extension of the dual-mode paradigm [7] is presented by using invariant sets of estimation errors for the case of unknown but bounded measurement noise and disturbances.
On the other hand, the problem of constrained control of uncertain systems with a stochastic uncertainty description is still not fully resolved within the framework of MPC [1]. In Reference [8], output feedback predictive control of nonlinear systems with uncertain parameters is addressed. In this work, there are no constraints associated with the control and, due to the difficulties of the optimization for a general nonlinear system, only suboptimal solutions are considered. In Reference [9], state feedback control of unconstrained linear systems with uncertain parameters is addressed. In Reference [10], the case of state feedback, input constraints and scalar disturbances is considered. A randomized algorithm (Monte Carlo sampling) is used to approximate the optimal solution. Examples for an optimization horizon of only one time instant are presented. In [11], the same authors analyse a similar case, but where state constraints are also present. They utilize a short optimization horizon (five time instants) and implement the solution over a longer horizon in a receding horizon manner.
When there is incomplete state information, an observer-based strategy that seems natural for control in the presence of stochastic disturbances is the one that uses the so-called Certainty Equivalence (CE) principle. Specifically, CE consists of estimating the state and then using these estimates as if they were the true state in the control law that results if the problem were formulated as a deterministic problem (no uncertainty). This strategy is motivated by the unconstrained problem with quadratic cost, for which the obtained strategy is indeed optimal [12, 13]. The use of CE in MPC leads to CE-MPC and, due to its simplicity, this strategy has been advocated in the literature [14] and reported in a number of applications [15, 16, 17]. Notwithstanding the widespread use of CE-MPC in applications, it must be stressed that CE-MPC generally results in a suboptimal control strategy. Two factors can be highlighted that render CE-MPC suboptimal: (1) the state estimate is assumed to be the true current state, and (2) the stochastic behaviour of the system is neglected over the prediction horizon.
Our principal aim in the current work is to analyse the extent to which truly optimal solutions can be found for single-input linear systems with input constraints and uncertainty related to output feedback and stochastic disturbances.
The rest of the paper is organized as follows. In Section 2, we state the optimal control problem for the case considered. In Section 3.1, we find the optimal solution for the case when the prediction horizon consists of only one time instant. In Section 3.2, we point to the complications that appear when the prediction horizon is increased to two time instants. Here, we also discuss the implications of practical implementation even for cases where the estimation part can be solved explicitly. This motivates the study of different suboptimal strategies in Section 4. An example is presented in Section 5 and conclusions are drawn in Section 6.
2. PROBLEM STATEMENT
We consider the following time-invariant discrete-time linear system with disturbances:
$$x_{k+1} = Ax_k + Bu_k + w_k,$$
$$y_k = Cx_k + v_k, \quad (1)$$
where $x_k, w_k \in \mathbb{R}^n$ and $u_k, y_k, v_k \in \mathbb{R}$. The control $u_k$ is constrained to take values in the following set:
$$\mathbb{U} = \{u : -\Delta \le u \le \Delta\} \subset \mathbb{R} \quad (2)$$
for a given constant value $\Delta > 0$. The disturbances $w_k$ and $v_k$ are sequences of independent, identically distributed random vectors, with probability density functions $p_w(\cdot)$ and $p_v(\cdot)$, respectively. The initial state, $x_0$, is characterized by a given probability density function (pdf) $p_{x_0}(\cdot)$. We assume that the pair $(A, B)$ is reachable and that the pair $(A, C)$ is observable.
We further assume that, at the time instant $k$, the value of the state $x_k$ is not available to the controller. Instead, the following sets of past inputs and outputs are available to the controller at the time instant $k$:
$$I^k = \begin{cases} \{y_0\} & \text{if } k = 0, \\ \{y_0, y_1, u_0\} & \text{if } k = 1, \\ \{y_0, y_1, y_2, u_0, u_1\} & \text{if } k = 2, \\ \quad\vdots & \\ \{y_0, y_1, \dots, y_{N-1}, u_0, u_1, \dots, u_{N-2}\} & \text{if } k = N-1. \end{cases} \quad (3)$$
Then, $I^k \in \mathbb{R}^{2k+1}$, and also $I^{k+1} = \{I^k, y_{k+1}, u_k\}$, where $I^k \subset I^{k+1}$. It is clear that $I^k$ contains all the information available to the controller at the time instant $k$, and hence it is called the information vector.
For the system defined above, together with the assumptions made, we seek to minimize the cost‡
$$E\left\{ g_T(x_N) + \sum_{k=0}^{N-1} g(x_k, u_k) \right\}, \quad (4)$$
where the terminal cost $g_T(\cdot)$ and the cost per stage $g(\cdot, \cdot)$ are given by
$$g_T(x_N) = x_N^T Sx_N,$$
$$g(x_k, u_k) = x_k^T Qx_k + Ru_k^2, \quad (5)$$
and N is called the optimization horizon.
‡Due to the stochastic nature of the system, the expression $g_T(x_N) + \sum_{k=0}^{N-1} g(x_k, u_k)$ becomes a random variable. Hence, it is only meaningful to formulate the minimization problem in terms of its statistics. From a practical viewpoint, it is of interest to minimize the expected value of this expression.
The result of this minimization will be a sequence of functions $\{\pi_0^{opt}(\cdot), \pi_1^{opt}(\cdot), \dots, \pi_{N-1}^{opt}(\cdot)\}$ that enable the controller to calculate the desired optimal control action depending on the information available to the controller at each time instant $k$, i.e. $u_k^{opt} = \pi_k^{opt}(I^k)$. These functions must also ensure that the constraints are always satisfied. This motivates the following definition.
Definition 1 (Admissible policies for incomplete state information)
A policy $\Pi_N$ is a finite sequence of functions $\pi_k(\cdot): \mathbb{R}^{2k+1} \to \mathbb{R}$ for $k = 0, 1, \dots, N-1$, i.e.
$$\Pi_N = \{\pi_0(\cdot), \pi_1(\cdot), \dots, \pi_{N-1}(\cdot)\}.$$
A policy $\Pi_N$ is called an "admissible control policy" if and only if
$$\pi_k(I^k) \in \mathbb{U} \quad \forall I^k \in \mathbb{R}^{2k+1}, \ \text{for } k = 0, \dots, N-1.$$
Further, the class of all admissible control policies will be denoted by
$$\bar{\Pi}_N = \{\Pi_N : \Pi_N \text{ admissible}\}. \qquad \square$$
The optimal control problem we address can then be stated as follows.
Definition 2 (Stochastic finite-horizon optimal control problem)
Given the pdfs $p_{x_0}(\cdot)$, $p_w(\cdot)$ and $p_v(\cdot)$ of the initial state $x_0$ and the disturbances $w_k$ and $v_k$, respectively, the problem considered is that of finding the control policy $\Pi_N^{opt}$, called the optimal control policy, belonging to the class of all admissible control policies $\bar{\Pi}_N$, that minimizes the cost
$$V_N(\Pi_N) = E_{\substack{x_0, w_k, v_k \\ k=0,1,\dots,N-1}}\left\{ g_T(x_N) + \sum_{k=0}^{N-1} g(x_k, \pi_k(I^k)) \right\} \quad (6)$$
subject to the constraints
$$x_{k+1} = Ax_k + B\pi_k(I^k) + w_k, \quad (7)$$
$$y_k = Cx_k + v_k, \quad (8)$$
$$I^{k+1} = \{I^k, y_{k+1}, u_k\}, \quad (9)$$
for $k = 0, \dots, N-1$, where the terminal cost $g_T(\cdot)$ and the cost per stage $g(\cdot, \cdot)$ are given by
$$g_T(x_N) = x_N^T Sx_N,$$
$$g(x_k, \pi_k(I^k)) = x_k^T Qx_k + R\pi_k^2(I^k), \quad (10)$$
for $S, R > 0$ and $Q \ge 0$. We can thus express the optimal control policy as
$$\Pi_N^{opt} = \arg\inf_{\Pi_N \in \bar{\Pi}_N} V_N(\Pi_N), \quad (11)$$
with the following resulting optimal cost:
$$V_N^{opt} = \inf_{\Pi_N \in \bar{\Pi}_N} V_N(\Pi_N). \quad (12) \qquad \square$$
Remark 2.1
We would like to emphasize that the optimization problem thus stated takes into account the fact that new information will be available to the controller at future time instants. This is called closed-loop optimization and can be distinguished from open-loop optimization, where the control values $\{u_0, u_1, \dots, u_{N-1}\}$ are selected all at once, at stage 0 [13]. For deterministic systems, in which there is no uncertainty, it is not necessary to make this distinction, and minimizing the cost over all sequences of controls or over all control policies yields the same result. $\square$
Throughout this work, the matrix $S$ will be adopted as a solution of the following algebraic Riccati equation [18]:
$$S = A^TSA + Q - K^T\bar{R}K, \quad (13)$$
where
$$K \triangleq \bar{R}^{-1}B^TSA, \qquad \bar{R} \triangleq R + B^TSB. \quad (14)$$
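For reference, a solution of (13)-(14) can be approximated numerically by simply iterating the Riccati recursion to a fixed point. The following sketch is our own illustrative code (the function name, initialization and iteration count are arbitrary choices, not from the paper):

```python
import numpy as np

def riccati_gain(A, B, Q, R, iters=1000):
    """Fixed-point iteration of the Riccati equation (13),
    S = A^T S A + Q - K^T R_bar K, with K and R_bar as in (14)."""
    S = np.eye(A.shape[0])
    for _ in range(iters):
        R_bar = R + B.T @ S @ B                  # (1, 1) for a single input
        K = np.linalg.solve(R_bar, B.T @ S @ A)  # K = R_bar^{-1} B^T S A
        S = A.T @ S @ A + Q - K.T @ R_bar @ K
    return S, K, R_bar
```

With the example system of (59) and illustrative weights $Q = I$, $R = 1$, the returned $S$ satisfies (13) to machine precision.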
3. OPTIMAL SOLUTIONS
A key aspect of the problem described in the previous section is that control actions cannot be calculated without taking into account the future consequences of the current decision, since the action taken at a particular stage affects all the subsequent stages. The formulated problem falls into the class of so-called sequential decision problems under uncertainty [3, 13], and the only general approach to tackle such problems is dynamic programming (DP).
DP is an algorithm based on Bellman's principle of optimality [19] that proceeds sequentially backwards in time by solving optimization subproblems. We next briefly show how this algorithm is used to solve the problem stated previously. For details, the reader is referred to [3, 13, 19].
We start by defining the functions
$$\tilde{g}_{N-1}(I^{N-1}, \pi_{N-1}(I^{N-1})) \triangleq E\{g_T(x_N) + g(x_{N-1}, \pi_{N-1}(I^{N-1})) \mid I^{N-1}, \pi_{N-1}(I^{N-1})\},$$
$$\tilde{g}_k(I^k, \pi_k(I^k)) \triangleq E\{g(x_k, \pi_k(I^k)) \mid I^k, \pi_k(I^k)\}. \quad (15)$$
The DP algorithm for the case of incomplete state information can be expressed via the following sequential optimization subproblems:
$$SOP_{N-1}: \quad J_{N-1}(I^{N-1}) = \inf_{u_{N-1}\in\mathbb{U}} \tilde{g}_{N-1}(I^{N-1}, u_{N-1})$$
$$\text{subject to } x_N = Ax_{N-1} + Bu_{N-1} + w_{N-1} \quad (16)$$
and
$$SOP_k: \quad J_k(I^k) = \inf_{u_k\in\mathbb{U}} \left[\tilde{g}_k(I^k, u_k) + E\{J_{k+1}(I^{k+1}) \mid I^k, u_k\}\right]$$
$$\text{subject to } x_{k+1} = Ax_k + Bu_k + w_k \ \text{ and } \ I^{k+1} = \{I^k, y_{k+1}, u_k\} \quad (17)$$
for $k = N-2, N-3, \dots, 0$. The optimization process starts at stage $N-1$ by solving $SOP_{N-1}$ for all possible values of $I^{N-1}$. In this way, the law $\pi_{N-1}^{opt}(\cdot)$ is found, in the sense that, given the value of $I^{N-1}$, the corresponding optimal control is the value $u_{N-1} = \pi_{N-1}^{opt}(I^{N-1})$, the minimizer of $SOP_{N-1}$. The procedure then continues in order to find $\pi_{N-2}^{opt}(\cdot), \dots, \pi_0^{opt}(\cdot)$ by solving $SOP_{N-2}, \dots, SOP_0$. After the last optimization subproblem is solved, the optimal control policy $\Pi_N^{opt}$ is obtained and the optimal cost (cf. (12)) is
$$V_N^{opt} = V_N(\Pi_N^{opt}) = E\{J_0(I^0)\} = E\{J_0(y_0)\}. \quad (18)$$
3.1. Optimal solution for N = 1

We next apply the DP algorithm to obtain the optimal solution for the case where the prediction horizon is equal to only one time instant.
Proposition 1
For $N = 1$, the solution of the optimal control problem stated in Definition 2 is of the form $\Pi_1^{opt} = \{\pi_0^{opt}(\cdot)\}$, with
$$u_0^{opt} = \pi_0^{opt}(I^0) = -\mathrm{sat}_\Delta(KE\{x_0 \mid I^0\}), \quad \forall I^0 \in \mathbb{R}, \quad (19)$$
where $K$ was defined in (14) and $\mathrm{sat}_\Delta: \mathbb{R} \to \mathbb{R}$ is defined as
$$\mathrm{sat}_\Delta(z) = \begin{cases} \Delta & \text{if } z > \Delta, \\ z & \text{if } |z| \le \Delta, \\ -\Delta & \text{if } z < -\Delta. \end{cases} \quad (20)$$
Moreover, the last step in the DP algorithm has the form
$$J_0(I^0) = E\{x_0^T Sx_0 \mid I^0\} + \bar{R}F_\Delta(KE\{x_0 \mid I^0\}) + \mathrm{tr}(K^TK\,\mathrm{cov}\{x_0 \mid I^0\}) + E\{w_0^T Sw_0\}, \quad (21)$$
where $F_\Delta: \mathbb{R} \to \mathbb{R}$ is given by
$$F_\Delta(z) = [z - \mathrm{sat}_\Delta(z)]^2. \quad (22)$$
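The saturation function (20), the function $F_\Delta$ of (22), and the $N = 1$ law (19) are straightforward to code. A minimal sketch (our own; the names and the constraint level $\Delta = 1$ are illustrative choices, and the conditional mean $E\{x_0 \mid I^0\}$ is assumed supplied by an estimator):

```python
import numpy as np

DELTA = 1.0  # constraint level Delta; illustrative value

def sat(z, delta=DELTA):
    # Saturation function of Eq. (20).
    return np.clip(z, -delta, delta)

def F(z, delta=DELTA):
    # F_Delta(z) = [z - sat_Delta(z)]^2, Eq. (22): the excess cost incurred
    # when the unconstrained minimizer lies outside the constraint set.
    return (z - sat(z, delta)) ** 2

def u0_opt(K, x0_mean, delta=DELTA):
    # Optimal N = 1 law, Eq. (19): saturated linear feedback on the
    # conditional mean E{x0 | I0}.
    return -sat((K @ x0_mean).item(), delta)
```

Note that $F_\Delta$ vanishes whenever the unconstrained minimizer already satisfies the constraint, which is why Certainty Equivalence holds for $N = 1$.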
Proof
The only optimization subproblem we have to solve is $SOP_0$ (cf. (16)) because $N = 1$:
$$J_0(I^0) = \inf_{u_0\in\mathbb{U}} E\{g_T(x_1) + g(x_0, u_0) \mid I^0, u_0\}$$
$$= \inf_{u_0\in\mathbb{U}} E\{(Ax_0 + Bu_0 + w_0)^T S(Ax_0 + Bu_0 + w_0) + x_0^T Qx_0 + Ru_0^2 \mid I^0, u_0\}. \quad (23)$$
By distributing and grouping terms, and also using the fact that $E\{w_0 \mid I^0, u_0\} = E\{w_0\} = 0$ and that $w_0$ is correlated with neither the state $x_0$ nor the control $u_0$, the latter can be expressed as
$$J_0(I^0) = \inf_{u_0\in\mathbb{U}} E\{x_0^T(A^TSA + Q)x_0 + 2u_0B^TSAx_0 + (B^TSB + R)u_0^2 \mid I^0, u_0\} + E\{w_0^T Sw_0\}. \quad (24)$$
Further, using expressions (13) and (14), we can write
$$J_0(I^0) = \inf_{u_0\in\mathbb{U}} \left[E\{x_0^T Sx_0 + \bar{R}(x_0^TK^TKx_0 + 2u_0Kx_0 + u_0^2) \mid I^0, u_0\}\right] + E\{w_0^T Sw_0\}$$
$$= \bar{R}\inf_{u_0\in\mathbb{U}} \left[E\{(u_0 + Kx_0)^2 \mid I^0, u_0\}\right] + E\{x_0^T Sx_0 \mid I^0\} + E\{w_0^T Sw_0\}, \quad (25)$$
where we have used the fact that the conditional pdf of $x_0$ given $I^0$ and $u_0$ is equal to the one where only $I^0$ is given. Finally, using properties of the expected value of quadratic forms (see, for example, [12]), the optimization to solve becomes
$$J_0(I^0) = \bar{R}\inf_{u_0\in\mathbb{U}} \left[(u_0 + KE\{x_0 \mid I^0\})^2\right] + \mathrm{tr}(K^TK\,\mathrm{cov}\{x_0 \mid I^0\}) + E\{x_0^T Sx_0 \mid I^0\} + E\{w_0^T Sw_0\}. \quad (26)$$
From (26), it follows that the unconstrained minimum is attained at $u_0^{uc} = -KE\{x_0 \mid I^0\}$. For the constrained case, the result of Equation (19) follows from the convexity of the quadratic function. The final step of DP is obtained by substituting (19) back into (26), i.e.
$$J_0(I^0) = \bar{R}F_\Delta(KE\{x_0 \mid I^0\}) + \mathrm{tr}(K^TK\,\mathrm{cov}\{x_0 \mid I^0\}) + E\{x_0^T Sx_0 \mid I^0\} + E\{w_0^T Sw_0\}, \quad (27)$$
with the function $F_\Delta(\cdot)$ defined in (22). $\square$
Remark 3.1
It is worth noting that when $N = 1$ the optimal control law $\pi_0^{opt}$ depends on the information $I^0$ only through the conditional expectation $E\{x_0 \mid I^0\}$. Therefore, this conditional expectation is a sufficient statistic for the problem considered, i.e. it provides all the necessary information to implement the control. $\square$
Remark 3.2
The control law given in (19) is the same law that would be optimal if we considered the case in which the state is measured (complete state information) and the disturbance $w_k$ is still acting on the system. Further, the same law would also be the optimal control law if, apart from measuring the state, $w_k$ were set equal to a fixed value or to its mean value (see Reference [20] for the solution in the deterministic and complete-information case). Therefore, Certainty Equivalence holds for the optimization horizon $N = 1$, i.e. the optimal control law is the same law that would result if the optimal control problem were considered with some or all uncertain quantities set to a fixed value. $\square$
3.2. Optimal solution for N = 2

We now consider the case where the optimization horizon is increased to two time instants, i.e. $N = 2$.
Proposition 2
For $N = 2$, the solution of the optimal control problem stated in Definition 2 is of the form $\Pi_2^{opt} = \{\pi_0^{opt}(\cdot), \pi_1^{opt}(\cdot)\}$, with
$$u_1^{opt} = \pi_1^{opt}(I^1) = -\mathrm{sat}_\Delta(KE\{x_1 \mid I^1\}), \quad \forall I^1 \in \mathbb{R}^3, \quad (28)$$
$$u_0^{opt} = \pi_0^{opt}(I^0) = \arg\inf_{u_0\in\mathbb{U}} \left[(u_0 + KE\{x_0 \mid I^0\})^2 + \bar{R}E\{F_\Delta(KE\{x_1 \mid I^1\}) \mid I^0, u_0\}\right], \quad \forall I^0 \in \mathbb{R}, \quad (29)$$
where $F_\Delta(\cdot)$ is given in (22).
Proof
The first step of the DP algorithm (cf. (16)) gives
$$J_1(I^1) = \inf_{u_1\in\mathbb{U}} E\{g_T(Ax_1 + Bu_1 + w_1) + g(x_1, u_1) \mid I^1, u_1\}. \quad (30)$$
We see that $J_1(I^1)$ is similar to $J_0(I^0)$ in (27). By comparison, we readily obtain
$$\pi_1^{opt}(I^1) = -\mathrm{sat}_\Delta(KE\{x_1 \mid I^1\}),$$
$$J_1(I^1) = \bar{R}F_\Delta(KE\{x_1 \mid I^1\}) + \mathrm{tr}(K^TK\,\mathrm{cov}\{x_1 \mid I^1\}) + E\{x_1^T Sx_1 \mid I^1\} + E\{w_1^T Sw_1\}. \quad (31)$$
The second step of the DP algorithm proceeds as follows:
$$J_0(I^0) = \inf_{u_0\in\mathbb{U}} \left[E\{g(x_0, u_0) \mid I^0, u_0\} + E\{J_1(I^1) \mid I^0, u_0\}\right]$$
$$\text{subject to } x_1 = Ax_0 + Bu_0 + w_0, \quad I^1 = \{I^0, y_1, u_0\}. \quad (32)$$
The latter can be written as
$$J_0(I^0) = \inf_{u_0\in\mathbb{U}} \big[E\{x_0^T Qx_0 + Ru_0^2 \mid I^0, u_0\} + E\{E\{x_1^T Sx_1 \mid I^1\} \mid I^0, u_0\}$$
$$+\ \bar{R}E\{F_\Delta(KE\{x_1 \mid I^1\}) \mid I^0, u_0\} + \mathrm{tr}(K^TK\,\mathrm{cov}\{x_1 \mid I^1\}) + E\{w_1^T Sw_1\}\big]. \quad (33)$$
Since $\{I^0, u_0\} \subset I^1$, using the properties of successive conditioning [21], we can express the second term of (33) as
$$E\{E\{x_1^T Sx_1 \mid I^1\} \mid I^0, u_0\} = E\{x_1^T Sx_1 \mid I^0, u_0\}$$
$$= E\{(Ax_0 + Bu_0 + w_0)^T S(Ax_0 + Bu_0 + w_0) \mid I^0, u_0\}. \quad (34)$$
Using this, expression (33) becomes
$$J_0(I^0) = \inf_{u_0\in\mathbb{U}} \big[E\{x_0^T Qx_0 + Ru_0^2 + (Ax_0 + Bu_0 + w_0)^T S(Ax_0 + Bu_0 + w_0) \mid I^0, u_0\}$$
$$+\ \bar{R}E\{F_\Delta(KE\{x_1 \mid I^1\}) \mid I^0, u_0\} + \mathrm{tr}(K^TK\,\mathrm{cov}\{x_1 \mid I^1\}) + E\{w_1^T Sw_1\}\big]. \quad (35)$$
Note that the first part of (35) is identical to (23). Therefore, using (26), expression (35) can be written as
$$J_0(I^0) = \inf_{u_0\in\mathbb{U}} \left[(u_0 + KE\{x_0 \mid I^0\})^2 + \bar{R}E\{F_\Delta(KE\{x_1 \mid I^1\}) \mid I^0, u_0\}\right]$$
$$+\ E\{x_0^T Sx_0 \mid I^0\} + \sum_{j=0}^{1} \left[\mathrm{tr}(K^TK\,\mathrm{cov}\{x_j \mid I^j\}) + E\{w_j^T Sw_j\}\right], \quad (36)$$
where $\mathrm{tr}(K^TK\,\mathrm{cov}\{x_1 \mid I^1\})$ has been left out of the minimization because it is not affected by $u_0$ due to the linearity of the system equations [3, 22]. By considering only the terms that are affected by $u_0$, we find the result given in Equation (29). $\square$
In order to proceed with the optimization and to obtain an explicit expression for $\pi_0^{opt}$, we need to know how to express $E\{x_1 \mid I^1\} = E\{x_1 \mid I^0, y_1, u_0\}$ explicitly as a function of $I^0$, $u_0$ and $y_1$. The optimal law $\pi_0^{opt}(\cdot)$ depends on $I^0$ not only through $E\{x_0 \mid I^0\}$, as was the case for $N = 1$. It can be shown that, even for the case of Gaussian disturbances, the optimal control law $\pi_0^{opt}(\cdot)$ depends on $\mathrm{cov}\{x_0 \mid I^0\}$ as well, when input constraints are present [23].
To calculate $E\{x_1 \mid I^1\}$, the conditional pdf $p_{x_1}(\cdot \mid I^1)$ must be found. At any time instant $k$, the conditional pdfs $p_{x_k}(\cdot \mid I^k)$ satisfy the following recursion.

Time update (Chapman–Kolmogorov equation):
$$p_{x_k}(x_k \mid I^{k-1}, u_{k-1}) = \int p_{x_k}(x_k \mid x_{k-1}, u_{k-1})\, p_{x_{k-1}}(x_{k-1} \mid I^{k-1}, u_{k-1})\, dx_{k-1}. \quad (37)$$

Observation update:
$$p_{x_k}(x_k \mid I^k) = p_{x_k}(x_k \mid I^{k-1}, y_k, u_{k-1}) = \frac{p_{y_k}(y_k \mid x_k)\, p_{x_k}(x_k \mid I^{k-1}, u_{k-1})}{p_{y_k}(y_k \mid I^{k-1}, u_{k-1})}, \quad (38)$$

where
$$p_{y_k}(y_k \mid I^{k-1}, u_{k-1}) = \int p_{y_k}(y_k \mid x_k)\, p_{x_k}(x_k \mid I^{k-1}, u_{k-1})\, dx_k. \quad (39)$$
Remark 3.3
Finding the conditional pdfs that satisfy the recursion given by Equations (37) and (38) in explicit form may be very difficult or even impossible, depending on the pdfs of the initial state and the disturbances. For the case in which they are all Gaussian, however, all the conditional densities that satisfy (37) and (38) are also Gaussian. In this particular case, (37) and (38) lead to a recursive algorithm in terms of the (conditional) expectation and covariance (which completely define any Gaussian pdf) known as the Kalman Filter algorithm [24, 25]. $\square$
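In the Gaussian case mentioned in the remark, one pass of the recursion (37)-(38) reduces to the familiar Kalman filter time/observation update for model (1). A minimal sketch under that Gaussian assumption (function and variable names are ours; `Qw` and `Rv` denote the process and measurement noise covariances):

```python
import numpy as np

def kf_step(m, P, u, y, A, B, C, Qw, Rv):
    """One Kalman filter step for model (1): the Gaussian special case
    of the recursion (37)-(38), propagating the conditional mean m and
    covariance P of the state."""
    # Time update (Chapman-Kolmogorov, Eq. (37)).
    m_pred = A @ m + B.flatten() * u
    P_pred = A @ P @ A.T + Qw
    # Observation update (Bayes' rule, Eq. (38)), scalar output assumed.
    Sy = (C @ P_pred @ C.T + Rv).item()   # innovation variance
    L = P_pred @ C.T / Sy                 # Kalman gain, shape (n, 1)
    m_new = m_pred + L.flatten() * (y - (C @ m_pred).item())
    P_new = P_pred - np.outer(L, C @ P_pred)
    return m_new, P_new
```

The updated covariance stays symmetric and positive definite, consistent with its role as $\mathrm{cov}\{x_k \mid I^k\}$.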
Due to the way the information enters the conditional pdfs, it is very difficult to obtain an explicit form for the optimal control. On the other hand, the computational implementation of such optimal control may also be rather involved and computationally demanding, even in the case where the recursion given by (37) and (38) can be found in explicit form. To demonstrate this point, let us illustrate a possible way of implementing the optimal controller in such a case.
We start by discretizing the set $\mathbb{U}$ of admissible control values such that the discretized set, $\mathbb{U}_d = \{u_{0i} \in \mathbb{U} : i = 1, 2, \dots, r\}$, consists of only a finite number $r$ of quantities. The discretization allows the optimal control to be approximated as
$$u_0^{opt} \approx \arg\inf_{u_0\in\mathbb{U}_d} \left[(u_0 + KE\{x_0 \mid I^0\})^2 + \bar{R}E\{F_\Delta(KE\{x_1 \mid I^1\}) \mid I^0, u_0\}\right]. \quad (40)$$
Then, in order to solve the minimization, only a finite number of values of $u_0$ have to be considered for a given $I^0 = y_0$. For every value $u_{0i}$ of $u_0$, the expression between square brackets in (40) must be evaluated. To accomplish this, the following steps are envisaged:
1. We need to evaluate the function $W_2(\cdot)$ given by
$$W_2(y_0) = E\{x_0 \mid y_0\} = \int x_0\, p_{x_0}(x_0 \mid y_0)\, dx_0, \quad (41)$$
for the given value of $y_0$.
2. Given $y_0$, $p_v(\cdot)$, $p_w(\cdot)$ and $p_{x_0}(\cdot)$, we need to obtain $p_{y_0}(\cdot \mid x_0)$ and $p_{x_1}(\cdot \mid y_0, y_1, u_0)$ in explicit form (as assumed) using recursions (37) and (38) together with the system and measurement equations.
3. Using $p_{x_1}(\cdot \mid y_0, y_1, u_0)$, the expectation $E\{x_1 \mid y_0, y_1, u_0\}$ can be expressed as
$$h(y_0, y_1, u_0) = E\{x_1 \mid y_0, y_1, u_0\} = \int x_1\, p_{x_1}(x_1 \mid y_0, y_1, u_0)\, dx_1, \quad (42)$$
which may not be expressible in explicit form even if (37) and (38) are. However, it may be evaluated numerically if $y_0$, $y_1$ and $u_0$ were given.
4. Using $p_{x_0}(\cdot \mid y_0)$ and $p_v(\cdot)$, together with the measurement equation, we can obtain $p_{y_1}(\cdot \mid y_0, u_0)$. We can now express the expectation $E\{F_\Delta(Kh(y_0, y_1, u_0)) \mid I^0, u_0\}$ as
$$W_1(y_0, u_0) = E\{F_\Delta(Kh(y_0, y_1, u_0)) \mid I^0, u_0\} = \int F_\Delta(Kh(y_0, y_1, u_0))\, p_{y_1}(y_1 \mid y_0, u_0)\, dy_1. \quad (43)$$
Note that, in order to calculate this integral numerically, the function $h(y_0, y_1, u_0)$ has to be calculated for different values of $y_1$, even if $u_0$ and $y_0$ were given.
To find the cost incurred for one of the $r$ values $u_{0i}$, expressions (41)-(43) may need further discretizations (for $x_0$, $x_1$ and $y_1$, respectively). Additionally, the fact that $x_0, x_1 \in \mathbb{R}^n$ may complicate the matter further if $n > 1$.
From the previous comments, it is evident that the approximation of the optimal solution can be very computationally demanding, depending on the discretizations performed. Note that in the above steps all the pdfs are assumed known in explicit form so that the integrals can be evaluated. As already mentioned in Remark 3.3, this may not always be possible.
However, an alternative approach to brute-force discretization is the use of Markov Chain Monte Carlo (MCMC) methods [26], which are based on approximating continuous pdfs by discrete ones, by drawing samples from the pdfs in question or from other approximations. These methods could be applied in this case, but it must be stressed that the exponential growth in the number of computations (as the optimization horizon is increased) seems to be unavoidable save for some very particular cases. The application of these methods to the recursion given by (37) and (38) gives rise to a special case of the so-called particle filters [27].
Due to the apparent impossibility of analytically proceeding with the optimization for horizons greater than two, and also due to the intricacies of the implementation of the optimal law even for $N = 2$, one often has to resort to the use of suboptimal solutions. In the next section, we analyse two alternative suboptimal strategies.
4. SUBOPTIMAL STRATEGIES
4.1. Certainty equivalent control (CEC)
As mentioned in the introduction, CEC is a control strategy in which, at each stage, the control applied is the optimal control of an associated deterministic problem derived from the original problem by removing all uncertainty. Specifically, the associated problem is derived by setting the disturbance $w_k$ to a fixed typical value, e.g. $\bar{w} = E\{w_k\}$, and by also assuming perfect state information. After finding the solution to the associated deterministic problem, the control to apply is implemented using some estimate of the state $\hat{x}(I^k)$ in place of the true state (which was assumed to be known for solving the associated problem). That is, we obtain the optimal policy for the deterministic problem (DET)
$$\Pi_N^{DET} = \{\pi_0^{DET}(\cdot), \dots, \pi_{N-1}^{DET}(\cdot)\}, \quad (44)$$
where $\pi_k^{DET}: \mathbb{R}^n \to \mathbb{R}$ for $k = 0, 1, \dots, N-1$. Then, the CEC evaluates the deterministic laws at the estimate of the state, i.e.
$$u_k^{CE} = \pi_k^{DET}(\hat{x}(I^k)). \quad (45)$$
The associated deterministic problem for linear systems with a quadratic cost is an example of a case where the control policy can be obtained explicitly for any finite optimization horizon [28]. The following example illustrates this for an optimization horizon $N = 2$; for a procedure to obtain the pre-computable laws for larger optimization horizons, see [28, 29].
Example 1 (Closed-loop CEC)
For $N = 2$, the deterministic policy $\Pi_2^{DET} = \{\pi_0^{DET}(\cdot), \pi_1^{DET}(\cdot)\}$ is given by [30]
$$\pi_1^{DET}(x) = -\mathrm{sat}_\Delta(Kx) \quad \forall x \in \mathbb{R}^n, \quad (46)$$
$$\pi_0^{DET}(x) = \begin{cases} -\mathrm{sat}_\Delta(Gx + H) & \text{if } x \in \mathcal{X}_1, \\ -\mathrm{sat}_\Delta(Kx) & \text{if } x \in \mathcal{X}_2, \\ -\mathrm{sat}_\Delta(Gx - H) & \text{if } x \in \mathcal{X}_3, \end{cases} \quad \forall x \in \mathbb{R}^n, \quad (47)$$
where $K$ is given by (14) and
$$G = \frac{K + KB\,KA}{1 + (KB)^2}, \qquad H = \frac{KB}{1 + (KB)^2}\,\Delta.$$
The sets $\mathcal{X}_1, \mathcal{X}_2, \mathcal{X}_3$ form a partition of $\mathbb{R}^n$ and are given by
$$\mathcal{X}_1 = \{x \mid KA_Kx < -\Delta\}, \quad \mathcal{X}_2 = \{x \mid |KA_Kx| \le \Delta\}, \quad \mathcal{X}_3 = \{x \mid KA_Kx > \Delta\},$$
with
$$A_K = A - BK.$$
Therefore, a closed-loop CEC consists in using the controls
$$u_0^{CE} = \pi_0^{DET}(\hat{x}(I^0)), \qquad u_1^{CE} = \pi_1^{DET}(\hat{x}(I^1)). \quad (48)$$
4.2. Partially stochastic CEC (PS-CEC)
This variant of CEC treats the problem as one of perfect state information but takes stochastic disturbances into account. To solve the optimization problem, it is assumed that the state is, and will be, known to the controller. In reality, and as with CEC, the value of the state is provided by an estimator that receives the available information as input and generates $\hat{x}_k(I^k)$. A PS-CEC admissible policy
$$\Lambda_N = \{\lambda_0(\cdot), \dots, \lambda_{N-1}(\cdot)\} \quad (49)$$
is a sequence of admissible control laws $\lambda_k(\cdot): \mathbb{R}^n \to \mathbb{U}$ that map the (estimates of the) states into admissible control actions. The PS-CEC solves the following perfect state information problem.
Definition 3 (PS-CEC optimal control problem)
Assuming the state $\hat{x}_k$ will be available to the controller at time instant $k$ to calculate the control, and given the pdf $p_w(\cdot)$ of the disturbances $w_k$, find the admissible control policy $\Lambda_N^{opt}$ that minimizes the cost
$$\hat{V}_N(\Lambda_N) = E_{\substack{w_k \\ k=0,1,\dots,N-1}}\left\{ g_T(\hat{x}_N) + \sum_{k=0}^{N-1} g(\hat{x}_k, \lambda_k(\hat{x}_k)) \right\} \quad (50)$$
subject to $\hat{x}_{k+1} = A\hat{x}_k + Bu_k + w_k$ for $k = 0, 1, \dots, N-1$.
The optimal control policy for perfect state information thus found will be used, as in CEC, to calculate the control action based on the estimate $\hat{x}_k$ provided by the estimator, i.e.
$$u_k = \lambda_k(\hat{x}_k). \quad (51) \qquad \square$$
Next, we apply this suboptimal strategy to the problem of interest.
4.2.1. PS-CEC for $N = 1$. Using the DP algorithm, we have
$$\hat{J}_0(\hat{x}_0) = \inf_{u_0\in\mathbb{U}} E\{\hat{x}_1^T S\hat{x}_1 + \hat{x}_0^T Q\hat{x}_0 + Ru_0^2 \mid \hat{x}_0, u_0\}. \quad (52)$$
As with the true optimal solution for $N = 1$, the PS-CEC optimal control has the form
$$\hat{u}_0^{opt} = \lambda_0^{opt}(\hat{x}_0) = -\mathrm{sat}_\Delta(K\hat{x}_0), \quad (53)$$
$$\hat{J}_0(\hat{x}_0) = \hat{x}_0^T S\hat{x}_0 + \bar{R}F_\Delta(K\hat{x}_0) + E\{w_0^T Sw_0\}. \quad (54)$$
We can see that if $\hat{x}_0 = E\{x_0 \mid I^0\}$ then the PS-CEC for $N = 1$ coincides with the optimal solution.
4.2.2. PS-CEC for $N = 2$. The first step of the DP algorithm yields
$$\hat{u}_1^{opt} = \lambda_1^{opt}(\hat{x}_1) = -\mathrm{sat}_\Delta(K\hat{x}_1), \quad (55)$$
$$\hat{J}_1(\hat{x}_1) = \hat{x}_1^T S\hat{x}_1 + \bar{R}F_\Delta(K\hat{x}_1) + E\{w_1^T Sw_1\}. \quad (56)$$
For the second step, we have, after some algebra, that
$$\hat{J}_0(\hat{x}_0) = \inf_{u_0\in\mathbb{U}} \left[E\{g(\hat{x}_0, u_0) + \hat{J}_1(\hat{x}_1) \mid \hat{x}_0, u_0\}\right] \quad \text{subject to } \hat{x}_1 = A\hat{x}_0 + Bu_0 + w_0, \quad (57)$$
$$\hat{u}_0^{opt} = \arg\inf_{u_0\in\mathbb{U}} \left[(u_0 + K\hat{x}_0)^2 + \bar{R}E\{F_\Delta[K(A\hat{x}_0 + Bu_0 + w_0)] \mid \hat{x}_0, u_0\}\right]. \quad (58)$$
Comparing $\hat{u}_0^{opt}$ to expression (29) for the optimal control, we can appreciate that, given $\hat{x}_0$, even if $E\{F_\Delta[K(A\hat{x}_0 + Bu_0 + w_0)] \mid \hat{x}_0, u_0\}$ cannot be found in explicit form as a function of $u_0$, numerically computing this suboptimal control action is much less computationally demanding than its optimal counterpart.
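Equation (58) suggests a simple numerical implementation: grid the constraint set $\mathbb{U}$ and replace the expectation over $w_0$ by a sample average, much as done in the simulations of the next section. A sketch (names and defaults are our choices; `w_samples` are draws from $p_w$ supplied by the caller):

```python
import numpy as np

def ps_cec_u0(x0_hat, A, B, K, R_bar, delta, w_samples, n_u=201):
    """PS-CEC first control of Eq. (58): grid U and approximate the
    expectation over w0 by averaging over w_samples, an (m, n) array."""
    sat = lambda z: np.clip(z, -delta, delta)
    best_u, best_cost = None, np.inf
    for u0 in np.linspace(-delta, delta, n_u):
        x1 = A @ x0_hat + B.flatten() * u0 + w_samples   # (m, n) by broadcasting
        Kx1 = x1 @ K.flatten()
        cost = (u0 + (K @ x0_hat).item()) ** 2 \
            + R_bar * np.mean((Kx1 - sat(Kx1)) ** 2)
        if cost < best_cost:
            best_u, best_cost = u0, cost
    return best_u
```

Unlike the optimal law of (29), no conditional-pdf recursion is needed here: only forward simulation of $\hat{x}_1$ through the known model, which is what makes PS-CEC cheap.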
5. SIMULATION EXAMPLES
In this section we compare CEC and PS-CEC by means of simulation examples. The examples are aimed at assessing how close the different strategies are in performance, as evidenced by the cost function.

In order to assess the merits (or otherwise) of these strategies, the corresponding costs have to be evaluated. As previously mentioned, any function of the state and the control is a random variable, and the cost is defined as the expected value of one particular such function (quadratic, in this case). Hence, a comparison between the costs incurred by different policies is only meaningful in terms of these expected values. To compute values of the cost numerically for a given control policy, different realizations of the initial state and of the process and measurement disturbances have to be generated, and a corresponding realization of the cost evaluated for each. The expected value can then be approximated by averaging over these realizations.
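The averaging procedure just described can be sketched as follows. This is an illustrative fragment only (not the authors' code); `simulate_realization` is a hypothetical callable standing for one closed-loop run that returns a cost realization:

```python
import random

def average_cost(simulate_realization, num_runs, seed=0):
    """Approximate the expected cost of a policy by averaging cost
    realizations, each generated from its own draw of the initial
    state and the process/measurement disturbances."""
    rng = random.Random(seed)  # one stream drives all realizations
    total = 0.0
    for _ in range(num_runs):
        total += simulate_realization(rng)
    return total / num_runs
```

Using the same random stream (and seed) for the policies being compared reduces the variance of the estimated cost difference.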
The examples are simulated for the system

$$A = \begin{bmatrix} 0.9713 & 0.2189 \\ -0.2189 & 0.7524 \end{bmatrix}, \quad B = \begin{bmatrix} 0.0287 \\ 0.2189 \end{bmatrix}, \quad C = \begin{bmatrix} 0.3700 & 0.0600 \end{bmatrix}. \quad (59)$$
The disturbances $w_k$ are adopted to have a uniform distribution with support on $[-0.5, 0.5] \times [-1, 1]$, and likewise for $v_k$ with support on $[-0.1, 0.1]$. The initial state $x_0$ is normally distributed with zero mean and covariance $\mathrm{diag}(300^{-1}, 300^{-1})$. A Kalman filter was implemented to provide the state estimates needed. Although this estimator is not the optimal one in this case, because the disturbances are not normally distributed, it yields the best linear unbiased estimate of the state. The parameters for the Kalman filter were adopted as the true mean and covariance of the corresponding variables in the system. The saturation limit of the control, $\Delta$, was adopted to be equal to 1. The optimization horizon is $N = 2$ in both cases.
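For concreteness, the "true covariances" handed to the Kalman filter can be computed from the uniform supports above, using the fact that a uniform distribution on $[-a, a]$ has zero mean and variance $a^2/3$. A small sketch (the variable names are our own, not the paper's):

```python
def uniform_var(a):
    """Variance of a uniform distribution on [-a, a]: a^2 / 3."""
    return a * a / 3.0

# Noise parameters for the Kalman filter, matching the supports
# [-0.5, 0.5] x [-1, 1] for w_k and [-0.1, 0.1] for v_k above.
Q_w = [[uniform_var(0.5), 0.0],
       [0.0, uniform_var(1.0)]]   # process-noise covariance (diagonal)
R_v = uniform_var(0.1)            # measurement-noise variance
```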
For PS-CEC, we discretize the set $\mathcal{U}$ so that only 500 values are considered, and the expected value in (58) is approximated by taking 300 samples of the pdf $p_w(\cdot)$ for every possible value of $u_0$ in the discretized set. For CEC, we implement the policy given in Example 1.
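The sampled minimization of (58) can be sketched as below. This is illustrative only: the helper names are assumptions, and the gain $K$, weight $\bar{R}$ and function $F_\Delta$ come from the paper's earlier derivations.

```python
import random

def ps_cec_u0(x_hat0, A, B, K, R_bar, F_Delta, sample_w, u_grid,
              num_samples, seed=0):
    """Approximate the minimizer of (58):
    (u0 + K x_hat0)^2 + R_bar * E{ F_Delta[K (A x_hat0 + B u0 + w0)] },
    with the expectation replaced by an average over num_samples draws
    of w0 and the infimum over the discretized set u_grid."""
    rng = random.Random(seed)
    # Draw the disturbance samples once and reuse them for every
    # candidate u0 (common random numbers keep the comparison fair).
    w_samples = [sample_w(rng) for _ in range(num_samples)]

    def Kdot(x):  # scalar product of row vector K with state x
        return sum(k * xi for k, xi in zip(K, x))

    Kx0 = Kdot(x_hat0)
    best_u, best_cost = None, float("inf")
    for u0 in u_grid:
        avg = 0.0
        for w in w_samples:
            # one-step prediction x1 = A x_hat0 + B u0 + w0
            x1 = [sum(a * xi for a, xi in zip(row, x_hat0)) + b * u0 + wi
                  for row, b, wi in zip(A, B, w)]
            avg += F_Delta(Kdot(x1))
        cost = (u0 + Kx0) ** 2 + R_bar * avg / num_samples
        if cost < best_cost:
            best_u, best_cost = u0, cost
    return best_u
```

With 500 grid points and 300 samples per point, one control evaluation costs 150,000 one-step predictions, which is tractable, whereas the optimal solution would require nested expectations over the information state.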
We simulated the closed-loop system over only two time instants, but repeated the simulation a large number of times (between 2000 and 8000). A different realization of the disturbances and the initial state was used for each simulation run, and a realization of the cost was calculated under each of the two control policies (PS-CEC and CEC). The sample average of the cost for each policy was then computed; the difference between the two averages was always found to be less than 0.1%.
Remark 4.1
The examples simulated are of illustrative value only. However, the comparison between the costs of the two control policies suggests that the trade-off between performance and computational complexity favours the CEC implementation over PS-CEC. □
6. CONCLUDING REMARKS
We have addressed the optimal control problem for discrete-time input-constrained linear time-invariant systems with output feedback and stochastic disturbances. We have extended some of our previous work in the area [31], where we only considered Gaussian disturbances. The framework presented in this paper allows for a general stochastic description of the disturbances. However, the intricacies and complexities of the optimal solution in such a case prevent the evaluation of explicit solutions even for optimization horizons as short as two time instants.
It would be of interest, from a practical standpoint, to extend the optimization horizon beyond $N = 2$. However, as we explained in a previous section and observed in the examples, this becomes very difficult for computational reasons. One is thus led to conclude that CEC may be, at this point, the only way forward.
Of course, the ultimate test for the suboptimal strategies would be to contrast them with the optimal one. The authors believe that, in this case, an appreciable difference in cost may be observed, since the optimal strategy takes the process and measurement disturbances into account in a unified manner, whereas the above-mentioned suboptimal strategies use the estimates provided by an estimator as if they were the true state.
ACKNOWLEDGEMENT
The authors would like to acknowledge comments and suggestions made by Prof. David Q. Mayne fromthe Department of Electrical and Electronic Engineering, Imperial College of Science, London, UK, thatmotivated the current work.
REFERENCES
1. Mayne DQ, Rawlings JB, Rao CV, Scokaert POM. Constrained model predictive control: stability and optimality. Automatica 2000; 36:789–814.
2. Maciejowski JM. Predictive Control with Constraints. Prentice-Hall: Englewood Cliffs, NJ, 2002.
3. Bertsekas D. Dynamic Programming and Optimal Control, Optimization and Computation Series, vols. 1, 2. Athena Scientific: Belmont, MA, 2000.
4. Lee JH, Yu Z. Worst-case formulations of model predictive control for systems with bounded parameters. Automatica 1997; 33(5):763–781.
5. Shamma JS, Tu KY. Output feedback for systems with constraints and saturations: scalar control case. Systems and Control Letters 1998; 35:265–293.
6. Lee YI, Kouvaritakis B. Receding horizon output feedback control for linear systems with input saturation. IEE Proceedings, Control Theory and Applications 2001; 148(2):109–115.
7. Mayne DQ, Michalska H. Robust receding horizon control of constrained nonlinear systems. IEEE Transactions on Automatic Control 1993; 38(11):1623–1633.
8. Filatov NM, Unbehauen H. Adaptive predictive control policy for nonlinear stochastic systems. IEEE Transactions on Automatic Control 1995; 40(11):1943–1949.
9. Lee JH, Cooley BL. Optimal feedback control strategies for state-space systems with stochastic parameters. IEEE Transactions on Automatic Control 1998; 43(10):1469–1475.
10. Batina I, Stoorvogel AA, Weiland S. Stochastic disturbance rejection in model predictive control by randomized algorithms. In Proceedings of the American Control Conference, Arlington, VA, U.S.A., 2001.
11. Batina I, Stoorvogel AA, Weiland S. Optimal control of linear, stochastic systems with state and input constraints. In Proceedings of the 41st IEEE Conference on Decision and Control, Las Vegas, NV, U.S.A., 2002.
12. Åström KJ. Introduction to Stochastic Control Theory, Mathematics in Science and Engineering, vol. 70. Academic Press: New York, 1970.
13. Bertsekas DP. Dynamic Programming and Stochastic Control, Mathematics in Science and Engineering, vol. 125. Academic Press: New York, 1976.
14. Muske KR, Rawlings JB. Model predictive control with linear models. AIChE Journal 1993; 39(2):262–287.
15. Angeli D, Mosca E, Casavola A. Output predictive control of linear plants with saturated and rate limited actuators. In Proceedings of the American Control Conference, ACC, Chicago, U.S.A., 2000; 272–276.
16. Marquis P, Broustail J. SMOC, a bridge between state space and model predictive controllers: application to the automation of a hydrotreating unit. In Proceedings of the IFAC Workshop on Model Based Process Control. Pergamon Press: Oxford, 1988; 37–43.
17. Pérez T, Goodwin GC, Tzeng CY. Model predictive rudder roll stabilization control for ships. In Fifth IFAC Conference on Manoeuvring and Control of Marine Craft (MCMC2000), Aalborg, Denmark, 2000.
18. Anderson BDO, Moore JB. Optimal Control: Linear Quadratic Methods. Prentice-Hall: Englewood Cliffs, NJ, 1971.
19. Bellman RE. Dynamic Programming. Princeton University Press: Princeton, NJ, 1957.
20. De Doná JA, Goodwin GC. Elucidation of state-space regions wherein model predictive control and anti-windup strategies achieve identical control policies. In Proceedings of the American Control Conference, Chicago, IL, U.S.A., 2000.
21. Ash RB, Doléans-Dade C. Probability and Measure Theory. Harcourt Academic Press: San Diego, 2000.
22. Bertsekas DP. Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall: Englewood Cliffs, NJ, 1987.
23. Haimovich H, Perez T, Goodwin GC. On optimality and certainty equivalence in output feedback control of constrained uncertain linear systems. In European Control Conference, University of Cambridge, U.K., 2003.
24. Anderson BDO, Moore JB. Optimal Filtering. Prentice-Hall: Englewood Cliffs, NJ, 1979.
25. Kalman RE. A new approach to linear filtering and prediction problems. Transactions of the ASME, Series D, Journal of Basic Engineering 1960; 82:35–45.
26. Robert CP, Casella G. Monte Carlo Statistical Methods. Springer: Berlin, 1999.
27. Doucet A, de Freitas N, Gordon N. Sequential Monte Carlo Methods in Practice. Springer: Berlin, 2001.
28. Bemporad A, Morari M, Dua V, Pistikopoulos E. The explicit linear quadratic regulator for constrained systems. Technical Report AUT99-16, Automatic Control Laboratory, ETH-Swiss Federal Institute of Technology, 1999.
29. Serón M, De Doná JA, Goodwin GC. Global analytical model predictive control with input constraints. In Proceedings of the 39th Conference on Decision and Control, Sydney, Australia, 2000.
30. De Doná JA. Input constrained linear control. Ph.D. Thesis, The University of Newcastle, NSW, Australia, 2000.
31. Pérez T, Goodwin GC. Stochastic output feedback model predictive control. In Proceedings of the American Control Conference, ACC, Arlington, VA, U.S.A., 2001.