
Automatica, Vol. 25, No. 5, pp. 679-694, 1989. Printed in Great Britain.

0005-1098/89 $3.00 + 0.00 Pergamon Press plc

© 1989 International Federation of Automatic Control

Simultaneous Design of Measurement and Control Strategies for Stochastic Systems with Feedback*

RAJESH BANSAL†‡ and TAMER BAŞAR†

For first-order Gaussian ARMA models, the joint optimum design of measurement and control strategies under a quadratic performance index leads to linear rules in both finite and infinite horizons, while for higher order systems the optimum designs are generally nonlinear.

Key Words--Stochastic control; team theory; information theory; optimal systems; non-classical information patterns.

Abstract--We consider stochastic dynamic decision problems where at each step two consecutive decisions must be taken, one being what information bearing signal(s) to transmit and the other what control action(s) to exert. For finite-horizon problems involving first-order ARMA models with Gaussian statistics and a quadratic cost criterion, we show that the optimal measurement strategy consists of transmitting the innovation linearly at each stage, which in turn leads to optimality of a linear control law. We then extend this result to infinite-horizon models with discounted costs, showing optimality of linear designs. Subsequently, we show that these appealing results do not necessarily carry over to higher order ARMA models, for which we first characterize the best designs within the affine class, and then derive instances of the problem for which there exist non-linear designs that outperform the optimal designs within the affine class. The paper also includes some illustrative numerical examples on the different classes of problems considered.

1. INTRODUCTION

1.1. Motivation

WITH ADVANCES in large scale decentralized systems, brought about by improved computing and data transmitting capabilities, problems requiring simultaneous measurement and control are becoming increasingly commonplace. Decentralized systems are replacing their centralized counterparts for reasons which range from cost effectiveness to sheer necessity due to large

* Received 25 August 1988; revised 5 December 1988; received in final form 16 December 1988. The original version of this paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor P. M. Guimarães Ferreira under the direction of Editor H. Kwakernaak.

† Decision and Control Laboratory, Coordinated Science Laboratory, University of Illinois, 1101 W. Springfield Avenue, Urbana, IL 61801, U.S.A.

‡ Present address: AT&T Bell Laboratories, 6200 E. Broad Street, Columbus, OH 43213, U.S.A.

systems being more complex and less manageable. In such decentralized systems decisions are often taken on the basis of information generated by other members of the same system and garbled by noisy communication channels. Quite often we may identify agents with one of two kinds of roles:

(a) agents who perform the communication task of generating information bearing signals;

(b) agents who perform the control functions of forming estimates, minimizing errors and reducing costs.

For example, we may think of a space probe and an earth station together constituting a decentralized system, with the probe making measurements, and then encoding and transmit- ting them, thereby performing the communica- tion task, and the earth station performing the requisite control functions.

In this paper we shall be concerned with such multi-agent stochastic decision problems (teams) requiring simultaneous design of measurement as well as control signals.

1.2. Problem formulation

As a prototype for the class of stochastic decision problems alluded to above, we first consider the stochastic system described by the following set of scalar equations:

$$x_{i+1} = p_i x_i + m_i - v_i \quad (1a)$$

$$y_i = u_i + w_i \quad (1b)$$

where, for some Borel measurable functions $h_i$ and $\gamma_i$,

$$u_i = h_i(x_i, y^{i-1}) \quad (1c)$$



and

$$v_i = \gamma_i(y^i) \quad (1d)$$

with $y^i := (y_0, \ldots, y_i)$. Here (1a) and (1b) are the state and measurement equations, respectively; $v_i$ is the control variable and $u_i$ the information variable that carries information on the current and past values of the state. The functions $h_i$ and $\gamma_i$ are the measurement and control strategies, respectively, and are chosen such that $u_i$ and $v_i$ are second-order random variables. The initial state $x_0$, the system noises $m_i$'s and the measurement noises $w_i$'s are assumed to be independent, Gaussian, with zero mean and variances $\sigma^2_{(\cdot)}$ (the subscript being the identifier).

The problem is to jointly design the control and measurement policies so as to obtain optimal system performance. This objective is formalized as the minimization of a quadratic cost functional

$$J(h^N, \gamma^N) = E\left[\sum_{i=0}^{N} \left(a_{i+1} x_{i+1}^2 + b_i v_i^2 + q_i u_i^2\right)\right] \quad (2)$$

where the weighting parameters $a_i$'s, $b_i$'s and $q_i$'s are all taken to be positive, for all $i$. In the above we have adopted the convention that $h^i = (h_0, \ldots, h_i)$, $\gamma^i = (\gamma_0, \ldots, \gamma_i)$, and have implicitly assumed the given relationships (1c) and (1d) between the policy variables and the action variables. We now have Problem P below, which we state for future reference.

Problem P.

$$\min_{h^N, \gamma^N} J(h^N, \gamma^N)$$

subject to (1a)-(1d), where $J(h^N, \gamma^N)$ is defined by (2).

It is worth noting that if the measurement policy $h_i$ is fixed to be linear in the current value of the state, then Problem P becomes equivalent to the so-called LQG problem of stochastic control, which is known to admit a unique optimal control policy that is linear in the best (minimum mean square error) estimate of the state, generated by the Kalman filter. For other (say nonlinear) choices of $h_i$, however, the optimal policy is not generally a certainty equivalence controller, and even a closed form solution may not exist. The joint design problem formulated above is an even "harder" problem, since as a two-agent stochastic team problem it features non-classical information (Başar and Cruz, 1982; Witsenhausen, 1971). This follows from the simple observation that the agent (say A) who takes action $v_i$ has to use the information $y^i$, which depends on the action variable $u^i$ of the other agent, which in turn depends on the state $x^i$, which is not directly observable to agent A.
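A minimal Python sketch of one sample path of the loop (1a)-(1d), assuming placeholder linear policies and a crude one-shot estimator in place of the optimal designs derived later; all gains and identifiers here are illustrative assumptions, not the paper's solution:

```python
import numpy as np

# One sample path of the system (1a)-(1d). The measurement gain lam, the
# control gain g, and the one-shot estimator below are placeholders, not
# the optimal designs of Sections 2 and 3.
rng = np.random.default_rng(0)
N, p, lam, g = 10, 0.9, 1.0, 0.5
sig_x0 = sig_m = sig_w = 1.0

x, cost = rng.normal(0.0, sig_x0), 0.0
for i in range(N):
    u = lam * x                              # information signal, cf. (1c)
    y = u + rng.normal(0.0, sig_w)           # noisy observation, cf. (1b)
    xhat = lam * y / (lam**2 + sig_w**2)     # crude estimate from y_i alone
    v = g * xhat                             # control action, cf. (1d)
    x = p * x + rng.normal(0.0, sig_m) - v   # state update, cf. (1a)
    cost += x**2 + v**2 + u**2               # summand of (2), unit weights
print("sample-path cost:", cost)
```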

In this paper we obtain the complete solution to Problem P above (in Section 3) by first solving a hard constraint version of the problem (in Section 2). We then study the infinite horizon version (Section 4), where $J$ in (2) is replaced by

$$J(h^\infty, \gamma^\infty) = E\left[\sum_{i=0}^{\infty} \beta^i \left(q u_i^2 + a x_{i+1}^2 + b v_i^2\right)\right] \quad (3)$$

where $\beta \in (0, 1)$ is the discount factor, and $a$, $b$, $q$ are three positive scalars. For this problem we establish the existence of optimal stationary policies, and provide a convergent algorithm for the numerical evaluation of these policies.

In Section 5 the more general problem is discussed, where (la) is replaced by a higher order ARMA model. The paper ends with the concluding remarks of Section 6.

1.3. Related results

Determination of optimal measurement strategies for linear stochastic systems has been the subject of considerable previous investigation. Athans (1972) has solved an off-line dynamic optimization problem to select alternate sensor measurement policies to optimize a combination of prediction accuracy and observation cost for a linear stochastic system. Lafortune (1985) has also considered the problem of optimally selecting among different costly observations for a linear Gaussian system with quadratic costs. Other related results are provided by Mehra (1976), Herring and Melsa (1974), and Meier et al. (1967).

One common feature of the above investigations is that the measurement model has been specified parametrically, thus reducing the problem to one of optimal selection of a set of parameter values. As a departure from this approach, Whittle and Rudge (1976) have considered the problem where both the measurement and control sequences are simultaneously optimized, and they have formulated it as a dynamic two-person cooperative game with imperfect information. In control theoretic terms, however, their solution is unrealizable because the actions taken at a given time depend also on observations lying in the future.

In this paper we consider simultaneous optimization of both the observation and the control sequences under the further requirement that all policies be causal. As noted in Section 1.2, in this formulation the policy $h_i$ affects the measurement to be used by the controller $\gamma_i$, but the controller cannot infer all the information available to the agent who chooses $h_i$. Such information patterns have been called non-classical in the stochastic team literature, as opposed to classical or quasi-classical information patterns, which include the static and partially nested structures (Ho, 1980; Başar and Cruz, 1982). Stochastic teams with non-classical information have remained quite intractable from the time they were first identified some 20 years back (Witsenhausen, 1968), where it was shown that some of the simplest LQG two-person teams with non-classical information do not admit optimal linear solutions. It is, however, not only the non-classical nature of the information pattern, but also the structure of the loss functional that contributes to the difficulty, since in the absence of the product term between the decision variables in the loss functional, the optimal solutions may readily be found (Bansal and Başar, 1987a), even for some multipath systems (Bansal and Başar, 1987b). As we will see in the sequel, both the finite and infinite horizon problems formulated in Section 1.2 also belong to the class of such tractable stochastic decision problems with non-classical information, even though from a computational complexity viewpoint they are NP-complete (Papadimitriou and Tsitsiklis, 1986).

2. THE HARD CONSTRAINT VERSION

In this section we formulate and solve a stochastic dynamic team problem which is a version of Problem P formulated in Section 1.2, with hard power constraints on the $u_i$'s. The solution to this problem will then be used to construct the solution to Problem P.

Consider Problem P1 below.

Problem P1.

$$\min_{h^N, \gamma^N} J'(h^N, \gamma^N)$$

subject to (1a)-(1d), where $J'(h^N, \gamma^N)$ is defined by

$$J'(h^N, \gamma^N) = E\left[\sum_{i=0}^{N} \left(a_{i+1} x_{i+1}^2 + b_i v_i^2\right)\right] \quad (4)$$

and the $u_i$'s further satisfy the power constraints

$$E[u_i^2] \le P_i^2. \quad (5)$$

2.1. Construction of an equivalent problem

The gist of this subsection is as follows. First an equivalent Problem P2 is constructed from P1, which differs only in the form of the cost function. The cost function for P2 is in the form of a sum of the squared differences between state and control variables. In the transformation from P1 to P2 the constraints represented by (1a)-(1d) are unaltered. In a follow-up step, Problem P3 is constructed from Problem P2 such that the structure of the cost function is left unaltered, while the state equations are redefined so as to facilitate subsequent analysis.

These two transformations are presented below as Claims 1 and 2, respectively.

Claim 1. Under the set of constraints represented by (1a)-(1d), the cost function for Problem P1, defined by (4), is identical to

$$J'(h^N, \gamma^N) = E\left[\sum_{i=0}^{N} a_i'(v_i - b_i' x_i)^2\right] + C_N \quad (6)$$

where

$$b_i' := k_{i+1} p_i / (b_i + k_{i+1}) \quad (7a)$$

$$a_i' := b_i + k_{i+1} \quad (7b)$$

$$C_N := k_0 \sigma_{x_0}^2 + \sum_{i=0}^{N} k_{i+1} \sigma_{m_i}^2 \quad (7c)$$

and $\{k_i\}$ is a sequence defined recursively by

$$k_i = a_i + k_{i+1} b_i p_i^2 / (b_i + k_{i+1}) \quad (8)$$

$$k_{N+1} = a_{N+1}$$

with the convention $a_0 := 0$, since $x_0^2$ does not appear in the cost (2).

Proof. This is a standard result in stochastic LQ control with perfect state information (see, e.g. Bertsekas (1987) or Kumar and Varaiya (1986)), known also as "completing the squares". We note that $C_N$ is a constant (independent of the control sequence $\{v_i\}$), and (8) is the so-called discrete-time Riccati equation for this scalar problem.
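A minimal Python sketch of the recursion (8) and the constants (7a)-(7c), checked against the values quoted in the Section 2.3 illustration; the function name and array layout are illustrative assumptions:

```python
import numpy as np

# Backward recursion (8) and constants (7a)-(7c). With all weights, gains
# and variances equal to one and N = 2 (the Section 2.3 illustration),
# this gives a' = (13/5, 5/2, 2), b' = (8/13, 3/5, 1/2), C_2 = 613/130.
def complete_squares(a, b, p, var_x0, var_m):
    # a[0] = 0 by the convention noted after (8); a[i] = a_i for i = 1..N+1;
    # b, p, var_m are indexed over the stages i = 0..N.
    N = len(b) - 1
    k = np.zeros(N + 2)
    k[N + 1] = a[N + 1]                                    # k_{N+1} = a_{N+1}
    for i in range(N, -1, -1):                             # recursion (8)
        k[i] = a[i] + k[i + 1] * b[i] * p[i] ** 2 / (b[i] + k[i + 1])
    a_p = b + k[1:]                                        # (7b)
    b_p = k[1:] * p / (b + k[1:])                          # (7a)
    C_N = k[0] * var_x0 + np.dot(k[1:], var_m)             # (7c)
    return a_p, b_p, C_N

a = np.array([0.0, 1.0, 1.0, 1.0])         # a_0 := 0, a_1 = a_2 = a_3 = 1
b = np.ones(3); p = np.ones(3); var_m = np.ones(3)
print(complete_squares(a, b, p, 1.0, var_m))
# -> (array([2.6, 2.5, 2. ]), array([0.6154, 0.6, 0.5]), 4.7154)
```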

Claim 2. The solution to Problem P2 may be obtained by solving the following equivalent Problem P3.

Problem P3.

$$\min_{h^N, \hat{\gamma}^N} J''(h^N, \hat{\gamma}^N)$$

where

$$\hat{x}_{i+1} = p_i \hat{x}_i + m_i, \quad \hat{x}_0 = x_0 \quad (9a)$$

$$\hat{v}_i = \hat{\gamma}_i(y^i) \quad (9b)$$

$$y_i = u_i + w_i \quad (9c)$$

$$u_i = h_i(\hat{x}_i, y^{i-1}) \quad (9d)$$

the $u_i$'s satisfy the power constraint (5), and

$$J''(h^N, \hat{\gamma}^N) := E\left[\sum_{i=0}^{N} a_i'(\hat{v}_i - b_i' \hat{x}_i)^2\right] + C_N. \quad (10)$$

Proof. The situation is depicted in Fig. 1.

FIG. 1. Diagrammatic representation of Problem P3.

Substituting for $x_1$, using (1a), we get

$$v_1 - b_1' x_1 = v_1 + b_1' v_0 - b_1'(p_0 x_0 + m_0). \quad (11)$$

Similarly

$$v_2 - b_2' x_2 = v_2 + b_2' v_1 + b_2' p_1 v_0 - b_2'(p_1(p_0 x_0 + m_0) + m_1) \quad (12)$$

and at the $i$th stage we have

$$v_i - b_i' x_i = v_i + b_i' v_{i-1} + b_i' p_{i-1} v_{i-2} + \cdots + b_i' p_{i-1} \cdots p_1 v_0 - b_i'(p_{i-1}(p_{i-2}(\cdots(p_0 x_0 + m_0)\cdots) + m_{i-2}) + m_{i-1}). \quad (13)$$

We now define the sequence $\{\hat{x}_i\}$ through

$$\hat{x}_0 = x_0, \quad \hat{x}_{i+1} = p_i \hat{x}_i + m_i, \quad i = 0, 1, \ldots \quad (14)$$

and let

$$\hat{v}_i := v_i + b_i' v_{i-1} + b_i' p_{i-1} v_{i-2} + \cdots + b_i' p_{i-1} \cdots p_1 v_0. \quad (15)$$

Using these new variables, the cost function (6) can be rewritten as

$$J''(h^N, \hat{\gamma}^N) = E\left[\sum_{i=0}^{N} a_i'(\hat{v}_i - b_i' \hat{x}_i)^2\right] + C_N \quad (16)$$

$$\hat{v}_i = \hat{\gamma}_i(y^i, v^{i-1})$$

where the evolution of the $\hat{x}_i$'s is determined by (14). Since this is a team problem, and for each fixed $h^N$ the resulting stochastic control problem has classical information, minimization of (16) over $(h^N, \hat{\gamma}^N)$ is equivalent to its minimization over $(h^N, \hat{\gamma}^N)$ where $\hat{\gamma}_i$ has only $y^i$ as its argument.

The relationship between the $\hat{v}_i$'s and the $v_i$'s is given by

$$\hat{v}^N = v^N B^T \quad (17)$$

where $B^T$ is the transpose of the following non-singular lower triangular matrix:

$$B = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ b_1' & 1 & 0 & \cdots & 0 & 0 \\ b_2' p_1 & b_2' & 1 & \cdots & 0 & 0 \\ b_3' p_2 p_1 & b_3' p_2 & b_3' & \cdots & 0 & 0 \\ \vdots & \vdots & & \ddots & & \vdots \\ b_N' p_{N-1} \cdots p_1 & b_N' p_{N-1} \cdots p_2 & \cdots & \cdots & b_N' & 1 \end{pmatrix}.$$

We may thus write

$$v^N = \hat{v}^N [B^T]^{-1}$$

and we thus see that in order to obtain the optimal solution to P1, we may equivalently solve the problem in terms of $\hat{v}_i$ and $\hat{x}_i$ (over $\hat{\gamma}^N$, $h^N$), which is precisely Problem P3. This then leads to the following equivalence between the solutions of Problems P1 and P3.

Lemma 1. (i) Problem P1 admits a solution if, and only if, Problem P3 does.

(ii) If $(h^N, \gamma^N)$ is a solution for P1, then $(h^N, \gamma^N B^T)$ solves P3; conversely, if $(h^N, \hat{\gamma}^N)$ solves P3, then $(h^N, \hat{\gamma}^N [B^T]^{-1})$ is a solution for P1. □
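A minimal Python sketch, assuming given sequences $\{b_i'\}$ and $\{p_i\}$, that assembles the matrix $B$ displayed above, applies (17), and confirms the inversion used in Lemma 1(ii); the helper name and test values are illustrative:

```python
import numpy as np

# Build the lower triangular matrix B of (17) from b'_i and p_i; row i
# carries the coefficients of v_0, ..., v_i in (15).
def build_B(b_p, p):
    n = len(b_p)
    B = np.eye(n)
    for i in range(1, n):
        prod = 1.0
        for j in range(i - 1, -1, -1):     # entries b'_i, b'_i p_{i-1}, ...
            B[i, j] = b_p[i] * prod
            prod *= p[j]
    return B

b_p = np.array([8/13, 3/5, 1/2]); p = np.ones(3)
B = build_B(b_p, p)
v = np.array([0.3, -0.1, 0.2])             # any realized control sequence
v_hat = v @ B.T                            # transformed actions, cf. (17)
print(np.allclose(v, v_hat @ np.linalg.inv(B.T)))   # Lemma 1(ii) inversion
```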

2.2. An auxiliary problem

In this subsection we formulate and solve an auxiliary problem which will play an important role in the solution to Problem P3.

Consider the situation depicted in Fig. 2, the problem being one of finding the signals $u_0, \ldots, u_N$ subject to the power constraints (5) so as to minimize the mean square error in the estimation of $z_N$ using $y^N$. It is given that $z_0$, $n_i$, $w_i$, $i = 0, 1, \ldots$, are mutually independent Gaussian random variables, each with mean zero and variance indicated by $\sigma^2_{(\cdot)}$, the subscript being the identifier. Following the convention of information theory, we let $I(z_N; y^N)$ denote the mutual information between $z_N$ and $y^N$, and call the supremum of this quantity the capacity of the corresponding system.

FIG. 2. Diagrammatic representation of the auxiliary problem.

FIG. 3. The auxiliary problem with N = 1.

We shall first solve this problem for the case N = 1 which is illustrated in Fig. 3. The proof for arbitrary finite N will then be shown to follow a similar line of reasoning.

It is a known result in information theory (see, for example, Shannon (1959), Berger (1971)) that the minimum possible distortion $D^*$ that can result from the use of a channel is related to its capacity by

$$R(D^*) = C.$$

First consider the lower branch of Fig. 3. We find $D^*$ for this system to be (Bansal, 1988)

$$D^* = \frac{\sigma_{z_1}^2 \sigma_{n_1}^2 (P_0^2 + \sigma_{w_0}^2) + (1+\alpha_1)^2 \sigma_{z_1}^4 \sigma_{w_0}^2}{((1+\alpha_1)^2 \sigma_{z_1}^2 + \sigma_{n_1}^2)(P_0^2 + \sigma_{w_0}^2)}. \quad (18)$$

Now using the expression

$$R(D) = \frac{1}{2} \log \frac{\sigma_{z_1}^2}{D} \quad (19)$$

we may compute $R(D^*)$, to arrive at the following result.

Lemma 2. For the system depicted in the lower branch of Fig. 3, the mutual information $I(z_1; y_0)$ is bounded above by

$$C_{eq} = \frac{1}{2} \log \left[\frac{(P_0^2 + \sigma_{w_0}^2)((1+\alpha_1)^2 \sigma_{z_1}^2 + \sigma_{n_1}^2)}{\sigma_{n_1}^2 (P_0^2 + \sigma_{w_0}^2) + (1+\alpha_1)^2 \sigma_{z_1}^2 \sigma_{w_0}^2}\right]. \quad (20) \; \square$$

Now, consider the general system depicted in Fig. 3. We first note the fundamental identity

$$I(z_1; y_0, y_1) = I(z_1; y_0) + I(z_1; y_1 \mid y_0). \quad (21)$$

Now

$$I(z_1; y_1 \mid y_0) = I(z_1, w; y_1 \mid y_0) - I(w; y_1 \mid z_1, y_0) \quad (22a)$$
$$\le I(z_1, w; y_1 \mid y_0) \quad (22b)$$
$$= I(w; y_1 \mid y_0) \quad (22c)$$
$$= H(y_1 \mid y_0) - H(y_1 \mid w, y_0) \quad (22d)$$
$$= H(y_1 \mid y_0) - H(y_1 \mid w) \quad (22e)$$
$$\le H(y_1) - H(y_1 \mid w) \quad (22f)$$
$$= I(y_1; w) \quad (22g)$$
$$\le \frac{1}{2} \log \left[\frac{P_1^2 + \sigma_{w_1}^2}{\sigma_{w_1}^2}\right] \quad (22h)$$

where $H(\cdot)$ is the entropy and $H(\cdot \mid \cdot)$ the conditional entropy. Here steps (a), (d) and (g) follow from the definition of mutual information, steps (c) and (e) are due to the Markov property, step (b) follows because mutual information is always nonnegative, step (f) is valid because conditioning cannot increase entropy, and the last step holds because, for a fixed variance, the Gaussian random variable has the maximum entropy (Kagan et al., 1973).

Using (20) and (22), along with (21), we arrive at the next result.

Lemma 3. For the system depicted in Fig. 3,

$$I(z_1; y_0, y_1) \le \frac{1}{2} \log \left[\frac{(P_0^2 + \sigma_{w_0}^2)((1+\alpha_1)^2 \sigma_{z_1}^2 + \sigma_{n_1}^2)}{\sigma_{n_1}^2 (P_0^2 + \sigma_{w_0}^2) + (1+\alpha_1)^2 \sigma_{z_1}^2 \sigma_{w_0}^2} \cdot \frac{P_1^2 + \sigma_{w_1}^2}{\sigma_{w_1}^2}\right]. \quad (23) \; \square$$

Using this upper bound on $I(z_1; y_0, y_1)$ we can find a lower bound on the minimum mean square error achievable when the problem is to estimate $z_1$ from the observations $y_0$ and $y_1$. We have

$$I(z_1; y_0, y_1) \ge I(z_1; v_1) \ge \frac{1}{2} \log \frac{\sigma_{z_1}^2}{E[(z_1 - v_1)^2]} \quad (24)$$

which implies (using (23))

$$\frac{1}{2} \log \left[\frac{(P_0^2 + \sigma_{w_0}^2)((1+\alpha_1)^2 \sigma_{z_1}^2 + \sigma_{n_1}^2)}{\sigma_{n_1}^2 (P_0^2 + \sigma_{w_0}^2) + (1+\alpha_1)^2 \sigma_{z_1}^2 \sigma_{w_0}^2} \cdot \frac{P_1^2 + \sigma_{w_1}^2}{\sigma_{w_1}^2}\right] \ge \frac{1}{2} \log \frac{\sigma_{z_1}^2}{E[(z_1 - v_1)^2]} \quad (25)$$

i.e.

$$E[(z_1 - v_1)^2] \ge \frac{\sigma_{z_1}^2 \sigma_{w_1}^2 \left(\sigma_{n_1}^2 (P_0^2 + \sigma_{w_0}^2) + (1+\alpha_1)^2 \sigma_{z_1}^2 \sigma_{w_0}^2\right)}{(P_0^2 + \sigma_{w_0}^2)((1+\alpha_1)^2 \sigma_{z_1}^2 + \sigma_{n_1}^2)(P_1^2 + \sigma_{w_1}^2)}. \quad (26)$$

We next note that if we use the policies

$$u_0 = \lambda_0 z_0 \quad (27)$$

$$u_1 = \lambda_1 [z_1 - E(z_1 \mid y_0)]$$

with $\lambda_0$ and $\lambda_1$ chosen so as to satisfy the power constraints, the minimum mean square error is indeed achieved. Hence, we have the following lemma.

Lemma 4. The policies given by (27) are the policies which minimize the mean square error in estimating $z_1$ from the pair $(y_0, y_1)$. □

We now consider the case with arbitrary N, depicted in Fig. 2. First, consider the problem in the absence of the most recent observation, i .e.


with the uppermost channel of Fig. 2 removed. Assuming that the version of the problem with $(N-1)$ channels has already been solved, the capacity $C_{N-1}$ (i.e. the maximum mutual information between input and output) for the portion of the system within the rectangular box CDEF is known. We can therefore find the minimum achievable distortion for a memoryless Gaussian source with variance $\sigma_{z_N}^2$ when only the portion within the box ABCD is in use, by computing the conditional estimate of $z_N$ given $z_{N-1}$, and transmitting this optimally. Using this minimum achievable distortion we can compute an upper bound for $I(z_N; y^{N-1})$ as follows (where $z_N^j$ denotes a block of $j$ independent copies of $z_N$):

$$D^* = \lim_{j \to \infty} \frac{1}{j} E\left[\|z_N^j - v_N^j\|^2\right] = \frac{\sigma_{z_N}^2 \sigma_{n_N}^2 + (1+\alpha_N)^2 \sigma_{z_N}^4 e^{-2C_{N-1}}}{(1+\alpha_N)^2 \sigma_{z_N}^2 + \sigma_{n_N}^2} \quad (28)$$

i.e.

$$I(z_N; y^{N-1}) \le \frac{1}{2} \log \left[\frac{(1+\alpha_N)^2 \sigma_{z_N}^2 + \sigma_{n_N}^2}{\sigma_{n_N}^2 + (1+\alpha_N)^2 \sigma_{z_N}^2 e^{-2C_{N-1}}}\right]. \quad (29)$$

Furthermore, we can use a series of inequalities as in (22) to show that

$$I(z_N; y_N \mid y^{N-1}) \le \frac{1}{2} \log \frac{P_N^2 + \sigma_{w_N}^2}{\sigma_{w_N}^2}. \quad (30)$$

Now since

$$I(z_N; y^N) = I(z_N; y^{N-1}) + I(z_N; y_N \mid y^{N-1}) \quad (31)$$

we get

$$I(z_N; y^N) \le \frac{1}{2} \log \left[\frac{(1+\alpha_N)^2 \sigma_{z_N}^2 + \sigma_{n_N}^2}{\sigma_{n_N}^2 + (1+\alpha_N)^2 \sigma_{z_N}^2 e^{-2C_{N-1}}} \cdot \frac{P_N^2 + \sigma_{w_N}^2}{\sigma_{w_N}^2}\right] := C_N \quad (32)$$

and we have the following lemma.

and we have the following lemma.

Lemma 5. The mutual information I(ZN;y N) is bounded above by CN, which is the last step of the recursion:

eo~ + o~,~ Co=½10g( " ~ ) (33a)

and for i = 1 . . . . . N

{ (1 + a:/)2o~, + o~., P~ + o2..,'~ C, = ½ log \o~,, +'(1 + ~ ~ c , - , O.wi72- ] "

(33b) []

Next, let $\Delta_i$ denote the minimum achievable mean square error when $z_i$ is estimated using $y^i$, i.e. (using (24))

$$\Delta_i := \sigma_{z_i}^2 e^{-2C_i}. \quad (34)$$

Then we have

$$\Delta_0 = \frac{\sigma_{z_0}^2 \sigma_{w_0}^2}{P_0^2 + \sigma_{w_0}^2} \quad (35a)$$

and for $i = 1, \ldots, N$

$$\Delta_i = \frac{\sigma_{w_i}^2}{P_i^2 + \sigma_{w_i}^2} \left((1+\alpha_i)^2 \frac{\sigma_{z_i}^4}{\sigma_{z_{i-1}}^4} \Delta_{i-1} + \sigma_{m_{i-1}}^2\right) \quad (35b)$$

where $\sigma_{m_{i-1}}^2$ is as defined in (40) below.

We shall next show that this lower bound is tight and may be achieved by using the policies

$$u_0 = \lambda_0 z_0 \quad (36a)$$

and for $i = 1, \ldots, N$

$$u_i = \lambda_i (z_i - E(z_i \mid y^{i-1})). \quad (36b)$$

Here the $\lambda_i$'s are chosen so as to meet the given power constraints with equality. Since

$$z_{i-1} = (1+\alpha_i) z_i + n_i \quad (37)$$

we may equivalently write

$$z_i = \rho_{i-1} z_{i-1} + m_{i-1} \quad (38)$$

where

$$\rho_{i-1} := (1+\alpha_i) \frac{\sigma_{z_i}^2}{\sigma_{z_{i-1}}^2} \quad (39)$$

and $m_{i-1}$ is a Gaussian random variable which is independent of $z_{i-1}$ and has variance

$$\sigma_{m_{i-1}}^2 = \sigma_{z_i}^2 - \rho_{i-1}^2 \sigma_{z_{i-1}}^2. \quad (40)$$

With the policies chosen as in (36), we have

$$y_0 = \lambda_0 z_0 + w_0 \quad (41a)$$

and for $i = 1, \ldots, N$

$$y_i = \lambda_i (z_i - E(z_i \mid y^{i-1})) + w_i. \quad (41b)$$

Let $\Sigma_i$ denote the mean square error in the estimation of $z_i$ from $y^i$ when the communication strategies are chosen as in (36), i.e. $\Sigma_i := E[(z_i - E(z_i \mid y^i))^2]$. We then have

$$\lambda_0^2 = \frac{P_0^2}{\sigma_{z_0}^2} \quad (42a)$$

and for $i = 1, \ldots, N$

$$\lambda_i^2 = \frac{P_i^2}{(1+\alpha_i)^2 \dfrac{\sigma_{z_i}^4}{\sigma_{z_{i-1}}^4} \Sigma_{i-1} + \sigma_{m_{i-1}}^2}. \quad (42b)$$

Further, by our specific choice of the policy, $(z_i - E(z_i \mid y^{i-1}))$ is a zero mean Gaussian random variable, and therefore

$$E\left[\left(z_i - E(z_i \mid y^{i-1}) - E\left(z_i - E(z_i \mid y^{i-1}) \mid y_i\right)\right)^2\right] = \frac{\sigma_{w_i}^2 s_i^2}{P_i^2 + \sigma_{w_i}^2} \quad (43)$$

where

$$s_i^2 := E[(z_i - E(z_i \mid y^{i-1}))^2] = (1+\alpha_i)^2 \frac{\sigma_{z_i}^4}{\sigma_{z_{i-1}}^4} \Sigma_{i-1} + \sigma_{m_{i-1}}^2. \quad (44)$$

Also, $E(E(z_i \mid y^{i-1}) \mid y_i) = 0$ by our choice of policy, since $y_i$ is independent of $y^{i-1}$. Therefore the expression on the left-hand side of (43) becomes

$$E[(z_i - E(z_i \mid y^{i-1}) - E(z_i \mid y_i))^2] = E[(z_i - E(z_i \mid y^i))^2] = \Sigma_i$$

and we obtain the recursion (for $i = 1, \ldots, N$)

$$\Sigma_i = \frac{\sigma_{w_i}^2}{P_i^2 + \sigma_{w_i}^2} \left((1+\alpha_i)^2 \frac{\sigma_{z_i}^4}{\sigma_{z_{i-1}}^4} \Sigma_{i-1} + \sigma_{m_{i-1}}^2\right) \quad (45a)$$

with the initial condition

$$\Sigma_0 = \frac{\sigma_{z_0}^2 \sigma_{w_0}^2}{P_0^2 + \sigma_{w_0}^2}. \quad (45b)$$

The recursion for the $\Sigma_i$'s is therefore identical to the recursion for the $\Delta_i$'s (given by (35)), which denote the minimum mean square error achievable. This shows that the lower bound on the error is indeed tight, which then leads to the following theorem.

Theorem 1. (a) The policies given by (36) minimize the mean square error incurred in estimating $z_N$ from $y^N$ for the system depicted in Fig. 2, where the $\lambda_i$'s are defined by (42) using the $\Sigma_i$'s defined by (45).

(b) The minimum mean square error is given by the last step of the recursion (45) or, equivalently, by the last step of the recursion (35). □
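A minimal Python sketch of the achievability side of Theorem 1: it runs (45) forward and computes the gains (42); by construction the terminal value coincides with the bound $\Delta_N$ of (35). Identifiers and the example values are illustrative:

```python
import numpy as np

# Error recursion (45) and gains (42) achieving the bound of Theorem 1.
def achievable(P2, var_w, var_z, alpha):
    Sigma = var_z[0] * var_w[0] / (P2[0] + var_w[0])             # (45b)
    lam2 = [P2[0] / var_z[0]]                                    # (42a)
    for i in range(1, len(P2)):
        rho2 = (1 + alpha[i])**2 * var_z[i]**2 / var_z[i - 1]**2 # (39)^2
        var_m = var_z[i] - rho2 * var_z[i - 1]                   # (40)
        s2 = rho2 * Sigma + var_m                                # prior error (44)
        lam2.append(P2[i] / s2)                                  # (42b)
        Sigma = var_w[i] * s2 / (P2[i] + var_w[i])               # (45a)
    return np.array(lam2), Sigma         # Sigma_N equals Delta_N of (35)

print(achievable(np.array([1.0, 1.0]), np.array([1.0, 1.0]),
                 np.array([2.0, 1.0]), np.array([0.0, 0.3])))
```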

2.3. Solutions to Problems P3 and P1

We now return to Problem P3 defined in Section 2.1, where the policies

$$u_i = h_i(\hat{x}_i, y^{i-1}) \quad \text{and} \quad \hat{v}_i = \hat{\gamma}_i(y^i)$$

are to be chosen in order to minimize

$$J'' = E\left[\sum_{i=0}^{N} a_i'(\hat{v}_i - b_i' \hat{x}_i)^2\right] + C_N \quad (46)$$

under the constraints depicted in Fig. 1.

We first consider the minimization of the $N$th term in the expression for $J''$, which is $E[a_N'(\hat{v}_N - b_N' \hat{x}_N)^2]$, the optimization problem being equivalent to minimizing

$$E[a_N' b_N'^2 (\bar{v}_N - \hat{x}_N)^2]$$

where

$$\bar{v}_N := \frac{\hat{v}_N}{b_N'}$$

i.e. the problem is one of forming the best estimate of $\hat{x}_N$ under the mean square distortion criterion. We now show that the situations depicted in Figs 1 and 2 are identical except for nomenclature. To show this equivalence we note that for $i = 1, \ldots, N$

$$z_{i-1} = (1+\alpha_i) z_i + n_i \quad (47)$$

which implies

$$z_i = (1+\alpha_i) \frac{\sigma_{z_i}^2}{\sigma_{z_{i-1}}^2} z_{i-1} + m_{i-1} \quad (48)$$

where the $m_i$'s are zero mean Gaussian random variables, each with variance

$$\sigma_{m_{i-1}}^2 = \sigma_{z_i}^2 - (1+\alpha_i)^2 \frac{\sigma_{z_i}^4}{\sigma_{z_{i-1}}^2}. \quad (49)$$

We therefore have

$$p_{i-1} = (1+\alpha_i) \frac{\sigma_{z_i}^2}{\sigma_{z_{i-1}}^2} \quad \text{for } i = 1, \ldots, N$$

and by defining $\sigma_{z_0}^2 = \sigma_{x_0}^2$ we can complete the correspondence between the variables $\hat{x}_i$'s and $z_i$'s for $i = 0, 1, \ldots, N$.

The solution of the problem of minimizing the mean square error in estimating $\hat{x}_N$ from $y^N$ may therefore be obtained as in Section 2.2.

(a) The minimum mean square error in estimating $\hat{x}_i$ using $y^i$ is given by $\Delta_i$, where the $\Delta_i$'s satisfy the recursion (for $i = 1, \ldots, N$)

$$\Delta_i = \frac{\sigma_{w_i}^2}{P_i^2 + \sigma_{w_i}^2} \left(p_{i-1}^2 \Delta_{i-1} + \sigma_{m_{i-1}}^2\right) \quad (50a)$$

with the initial condition

$$\Delta_0 = \frac{\sigma_{x_0}^2 \sigma_{w_0}^2}{P_0^2 + \sigma_{w_0}^2}. \quad (50b)$$

(b) The optimal communication strategies are

$$u_0 = h_0^*(\hat{x}_0) = \lambda_0 \hat{x}_0 \quad (51a)$$

and for $i = 1, \ldots, N$

$$u_i = h_i^*(\hat{x}_i, y^{i-1}) = \lambda_i (\hat{x}_i - E(\hat{x}_i \mid y^{i-1})) \quad (51b)$$

where the $\lambda_i$'s satisfy the recursion (for $i = 1, \ldots, N$)

$$\lambda_i^2 = \frac{P_i^2}{p_{i-1}^2 \Delta_{i-1} + \sigma_{m_{i-1}}^2} \quad (52a)$$

with the initial condition

$$\lambda_0^2 = \frac{P_0^2}{\sigma_{x_0}^2} \quad (52b)$$

(the $\Delta_i$'s being as defined by (50)).

(c) The optimal choice for the $\hat{v}_i$'s is

$$\hat{v}_i = \hat{\gamma}_i^*(y^i) = b_i' E(\hat{x}_i \mid y^i) \quad (53)$$

where the $E(\hat{x}_i \mid y^i)$'s satisfy the recursion (for $i = 1, \ldots, N$)

$$E(\hat{x}_i \mid y^i) = p_{i-1} E(\hat{x}_{i-1} \mid y^{i-1}) + \frac{P_i \left(p_{i-1}^2 \Delta_{i-1} + \sigma_{m_{i-1}}^2\right)^{1/2}}{P_i^2 + \sigma_{w_i}^2} y_i \quad (54a)$$

with the initial condition

$$E(\hat{x}_0 \mid y_0) = \frac{P_0 \sigma_{x_0}}{P_0^2 + \sigma_{w_0}^2} y_0 \quad (54b)$$

(the $\Delta_i$'s being as defined in (50)). We finally note that the policies which minimize the mean square error in the estimation of $\hat{x}_i$ given $y^i$, for $i = 0, \ldots, N-1$, are identical to the corresponding policies used in the estimation of $\hat{x}_N$ given $y^N$, and we therefore have the following theorem.

Theorem 2. (a) The optimal policies $h_i^*$ and $\hat{\gamma}_i^*$ for Problem P3 are given by (51) and (53), respectively, using the $\lambda_i$'s and $\Delta_i$'s as defined by (52) and (50), respectively.

(b) The minimum value of the cost function for Problem P3 is

$$J''^* = \sum_{i=0}^{N} a_i' b_i'^2 \Delta_i + C_N. \; \square$$

We now turn to the original Problem P1 formulated in Section 2.1. Taking the difference

$$\frac{\hat{v}_i}{b_i'} - p_{i-1} \frac{\hat{v}_{i-1}}{b_{i-1}'} \quad (55)$$

and using (53), (54) and (15), we find that

$$\frac{v_i}{b_i'} = (p_{i-1} - b_{i-1}') \frac{v_{i-1}}{b_{i-1}'} + \frac{P_i \left(p_{i-1}^2 \Delta_{i-1} + \sigma_{m_{i-1}}^2\right)^{1/2}}{P_i^2 + \sigma_{w_i}^2} y_i \quad (56)$$

which implies that the optimal control policies for the original problem are

$$v_i^* = \gamma_i^*(y^i) = b_i' E(x_i \mid y^i) \triangleq b_i' \bar{x}_i \quad (57)$$

where $\bar{x}_i := E(x_i \mid y^i)$ satisfies the recursion (for $i = 1, \ldots, N$)

$$\bar{x}_i = (p_{i-1} - b_{i-1}') \bar{x}_{i-1} + \frac{P_i \left(p_{i-1}^2 \Delta_{i-1} + \sigma_{m_{i-1}}^2\right)^{1/2}}{P_i^2 + \sigma_{w_i}^2} y_i \quad (58a)$$

with the initial condition

$$\bar{x}_0 = \frac{P_0 \sigma_{x_0}}{P_0^2 + \sigma_{w_0}^2} y_0 \quad (58b)$$

the $\Delta_i$'s being as defined in (50). We therefore have the following theorem.

Theorem 3. (a) The optimal policies $\{h_i^*\}$ and $\{\gamma_i^*\}$ for Problem P1 are given by (51) and (57), respectively, using the $\lambda_i$'s and $\Delta_i$'s as defined by (52) and (50), respectively.

(b) The minimum value of the cost function for Problem P1 is

$$J^* = \sum_{i=0}^{N} a_i' b_i'^2 \Delta_i + C_N. \; \square$$

An illustration. Consider the case with $N = 2$, $p_i = \sigma_{x_0}^2 = \sigma_{w_i}^2 = \sigma_{m_i}^2 = P_i^2 = 1.0$, and the objective being to minimize the cost functional

$$J(h^2, \gamma^2) = E\left[\sum_{i=0}^{2} \left(x_{i+1}^2 + v_i^2\right)\right].$$

Using Claim 1, we get the equivalent cost functional

$$E\left[\sum_{i=0}^{2} a_i'(v_i - b_i' x_i)^2\right] + c_2$$

where

$$a_0' = 13/5, \quad a_1' = 5/2, \quad a_2' = 2$$

$$b_0' = 8/13, \quad b_1' = 3/5, \quad b_2' = 1/2$$

and

$$c_2 = 613/130.$$

Using Theorem 3, we obtain the optimal designs for the problem to be

$$u_0^* = x_0; \quad u_1^* = \sqrt{2/3}\,\left(x_1 - E(x_1 \mid y^0)\right); \quad u_2^* = \sqrt{4/7}\,\left(x_2 - E(x_2 \mid y^1)\right)$$

and $v_i^* = b_i' \bar{x}_i$, where

$$\bar{x}_0 = \tfrac{1}{2} y_0; \quad \bar{x}_1 = \tfrac{5}{26} y_0 + \tfrac{\sqrt{6}}{4} y_1; \quad \bar{x}_2 = \tfrac{1}{13} y_0 + \tfrac{\sqrt{6}}{10} y_1 + \tfrac{\sqrt{7}}{4} y_2.$$

The minimum cost is 6.1852.
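A minimal Python sketch checking the gains of this illustration directly from (50) and (52); variable names are illustrative:

```python
# Gains of the illustration from (50) and (52): Delta = (1/2, 3/4, 7/8)
# and lambda^2 = (1, 2/3, 4/7), matching u_0*, u_1*, u_2* displayed above.
P2 = w = m = p = [1.0, 1.0, 1.0]; var_x0 = 1.0
Delta = var_x0 * w[0] / (P2[0] + w[0])              # (50b)
lam2 = [P2[0] / var_x0]                             # (52b)
for i in range(1, 3):
    prior = p[i - 1] ** 2 * Delta + m[i - 1]        # prior error variance
    lam2.append(P2[i] / prior)                      # (52a)
    Delta = w[i] * prior / (P2[i] + w[i])           # (50a)
print(lam2, Delta)   # [1.0, 0.6667, 0.5714], 0.875
```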


3. THE SOFT CONSTRAINT VERSION--FINITE HORIZON CASE

In this section we use the solution to the hard constraint version (obtained in Section 2) to construct the solution to the original Problem P. Let $J_P$ denote the infimum of $J$ under the hard power constraints:

$$J_P := \inf_{h^N, \gamma^N;\, E[u_i^2] = P_i^2,\, \forall i} J(h^N, \gamma^N). \quad (59)$$

We then have the following series of equalities and inequalities:

$$J_P = \sum_{i=0}^{N} q_i P_i^2 + \inf_{h^N, \gamma^N;\, E[u_i^2] = P_i^2,\, \forall i} E\left[\sum_{i=0}^{N} a_i'(v_i - b_i' x_i)^2\right]$$

$$\ge \sum_{i=0}^{N} q_i P_i^2 + \inf_{h^N, \gamma^N;\, E[u_i^2] \le P_i^2,\, \forall i} E\left[\sum_{i=0}^{N} a_i'(v_i - b_i' x_i)^2\right]$$

$$= \sum_{i=0}^{N} q_i P_i^2 + \sum_{i=0}^{N} a_i' b_i'^2 \Delta_i \quad (60)$$

$$\ge \min_{P_0^2, \ldots, P_N^2} \sum_{i=0}^{N} \left(q_i P_i^2 + a_i' b_i'^2 \Delta_i\right)$$

$$= \sum_{i=0}^{N} \left(q_i P_i^{*2} + a_i' b_i'^2 \Delta_i^*\right)$$

where the $\Delta_i$'s are defined in (50) and the $\Delta_i^*$'s are defined recursively likewise, with $P_i$ replaced by $P_i^*$. In order to find the optimal power levels (the $P_i^{*2}$'s), we can solve the following problem:

$$\min_{P_0^2, \ldots, P_N^2} \sum_{i=0}^{N} \left(q_i P_i^2 + a_i' b_i'^2 \Delta_i\right) \quad (61)$$

which is a non-linear optimal control problem, the solution to which is given by the following dynamic program (with $p_{-1} := 1$ and $\sigma_{m_{-1}}^2 := 0$). Here $W_i(\Delta)$ is the "optimum cost to go" given that the system is at state $\Delta$ at stage $i$:

$$W_{N+1} \equiv 0$$

$$W_i(\Delta) = \min_{P_i^2 \ge 0} \left\{q_i P_i^2 + \frac{a_i' b_i'^2 \sigma_{w_i}^2}{P_i^2 + \sigma_{w_i}^2} \left(p_{i-1}^2 \Delta + \sigma_{m_{i-1}}^2\right) + W_{i+1}\left(\frac{\sigma_{w_i}^2}{P_i^2 + \sigma_{w_i}^2} \left(p_{i-1}^2 \Delta + \sigma_{m_{i-1}}^2\right)\right)\right\}. \quad (62)$$

The optimal value of the cost is

$$\min_{P_0^2, \ldots, P_N^2} J(P_0^2, \ldots, P_N^2) = W_0(\sigma_{x_0}^2). \quad (63)$$

We next show that a solution to the above problem always exists. If we define

$$f(\Delta, P_i^2) := q_i P_i^2 + \frac{a_i' b_i'^2 \sigma_{w_i}^2}{P_i^2 + \sigma_{w_i}^2} \left(p_{i-1}^2 \Delta + \sigma_{m_{i-1}}^2\right) + W_{i+1}\left(\frac{\sigma_{w_i}^2}{P_i^2 + \sigma_{w_i}^2} \left(p_{i-1}^2 \Delta + \sigma_{m_{i-1}}^2\right)\right) \quad (64)$$

then

$$W_i(\Delta) = \min_{P_i^2 \ge 0} f(\Delta, P_i^2).$$

Note that $W_i$ is a continuous function of its argument if $W_{i+1}$ is, since the continuity of $W_{i+1}$ implies continuity of $f$. From the continuity of $W_{N+1}$ (which was defined to be zero), the continuity of $W_i$ follows for all $i$. We also note that as $P_i^2 \to \infty$, $f(\Delta, P_i^2) \to \infty$ also, and since $q_i > 0$, the search for $P_i^{*2}$ can be confined to a compact set, over which a continuous function always admits a minimum.

The dynamic program (62) can therefore be solved, yielding values for

$$P_0^{*2}, \ldots, P_N^{*2}$$

and we have the following theorem.

Theorem 4. Consider the problem of minimizing $J(h^N, \gamma^N)$ over $(h^N, \gamma^N)$, subject to (1a)-(1d), where $J(h^N, \gamma^N)$ is defined by (2).

(a) The optimal policies $\{h_i^*\}$ and $\{\gamma_i^*\}$ are given by (51) and (57), respectively, using the $\lambda_i$'s and $\Delta_i$'s as defined by (52) and (50), with the solution to the dynamic program (62) providing the optimum power levels, i.e.

$$P_i^2 = P_i^{*2} \quad \text{for } i = 0, \ldots, N.$$

(b) The optimal cost is given by

$$W_0(\sigma_{x_0}^2) = \sum_{i=0}^{N} \left(q_i P_i^{*2} + a_i' b_i'^2 \Delta_i^*\right). \; \square$$

An illustration. The optimal power levels depend critically on the power penalties (the $q_i$'s). If we assume $N = 1$ and the following parameter values:

$$\sigma_{x_0}^2 = 1.0, \quad \sigma_{m_0}^2 = 1.0, \quad \sigma_{w_0}^2 = 1.0, \quad \sigma_{w_1}^2 = 1.0$$

$$q_0 = 2.0, \quad q_1 = 4.0, \quad p_0 = 0.5$$

$$a_0' = 1.0, \quad b_0' = 1.0, \quad a_1' = 2.0, \quad b_1' = 1.0$$

then the optimal value of the cost is 3.5, which is attained by $P_0^2 = P_1^2 = 0.0$. If the power penalty $q_0$ is changed to 0.25, with all other parameters remaining the same, we can achieve an optimal cost of 2.9747, which is attained by using $P_0^2 = 1.4495$ and $P_1^2 = 0.0$. If the power penalty $q_1$ is also changed to 0.25, then the optimal cost is further reduced to 1.9968, which is attained by $P_0^2 = 1.1609$ and $P_1^2 = 1.9876$. It is notable that the optimal solution satisfies a threshold property, and the number of channels in use depends on the relative magnitudes of the weighting terms.
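A minimal Python sketch reproducing these three cases by brute-force evaluation of the objective (61) over a grid of power levels; the primed parameters of the illustration enter as defaults, and the grid resolution is an arbitrary choice:

```python
import numpy as np

# Brute-force minimization of (61) for N = 1 over a grid of power levels.
def cost(P02, P12, q0, q1, a0p=1.0, b0p=1.0, a1p=2.0, b1p=1.0,
         p0=0.5, var_x0=1.0, var_w=1.0, var_m=1.0):
    D0 = var_w * var_x0 / (P02 + var_w)                  # Delta_0, cf. (50b)
    D1 = var_w * (p0**2 * D0 + var_m) / (P12 + var_w)    # Delta_1, cf. (50a)
    return q0*P02 + q1*P12 + a0p*b0p**2*D0 + a1p*b1p**2*D1

grid = np.linspace(0.0, 5.0, 501)                        # 0.01 resolution
for q0, q1 in [(2.0, 4.0), (0.25, 4.0), (0.25, 0.25)]:
    best = min((cost(x, y, q0, q1), x, y) for x in grid for y in grid)
    print((q0, q1), best)
# minima near 3.5 at (0, 0); 2.9747 at (1.45, 0); 1.9968 at (1.16, 1.99)
```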


4. THE INFINITE HORIZON PROBLEM

We now turn to the analysis of the infinite horizon problem. For this case the cost may be rewritten (modulo an additive constant independent of the policies) as

$$J = E\left[\sum_{i=0}^{\infty} q \beta^i u_i^2 + \sum_{i=0}^{\infty} a_i'(v_i - b_i' x_i)^2\right]$$

where

$$a_i' := \beta^i (b + k\beta) =: a' \beta^i$$

$$b_i' := \frac{k \beta p}{b + k\beta} =: b'$$

and $k$ is determined as the positive root of the equation

$$(k - a)(k\beta + b) = k b \beta p^2 \quad (65)$$

i.e.

$$k = \frac{1}{2\beta} \left(\sqrt{(b - b\beta p^2 - a\beta)^2 + 4ab\beta} - (b - b\beta p^2 - a\beta)\right). \quad (66)$$

Thus, an infinite horizon version of the originally formulated problem, with discounted cost, may be solved by solving a problem of the form P∞, which is given next.

Problem P∞.

$$\min_{h^\infty, \gamma^\infty} J(h^\infty, \gamma^\infty) = E\left[\sum_{i=0}^{\infty} \left(q u_i^2 + a (v_i - x_i)^2\right) \beta^i\right]$$

where $a$, $q$ are given positive constants, $\beta$ is the given discount factor ($0 < \beta < 1$) and

$$u_i = h_i(x_i, y^{i-1}) \quad (67a)$$

$$v_i = \gamma_i(y^i). \quad (67b) \; \square$$

We treat the infinite horizon problem as a limit of the finite horizon case with horizon length $N$, as $N \to \infty$. Each such finite horizon problem is identical with the one considered in Section 3, with the only exception that now all parameter values are constants and there is an additional discount factor $\beta$. The counterpart of the dynamic programming recursion (62) is

$$W_{N+1}(\Delta) = 0 \quad (68a)$$

$$W_k(\Delta) = \inf_{P^2 \ge 0} \left[q P^2 + \frac{a \sigma_w^2}{P^2 + \sigma_w^2} \left(p^2 \Delta + \sigma_m^2\right) + \beta W_{k+1}\left(\frac{\sigma_w^2}{P^2 + \sigma_w^2} \left(p^2 \Delta + \sigma_m^2\right)\right)\right] \quad \text{for } k \le N. \quad (68b)$$

We now study, in a series of lemmata, properties of $W_k(\Delta)$, and in particular its asymptotic behavior.

Lemma 6. $W_k(\Delta)$ is strictly increasing for decreasing $k$, for all $\Delta > 0$, i.e. $W_k(\Delta) > W_{k+1}(\Delta)$ for all $k \le N$.

Proof. Clearly the lemma is true for $k = N$, since $W_{N+1} \equiv 0$ and $W_N(\Delta)$ is necessarily larger than zero for all $\Delta$. We now note the following sequence of equalities and inequalities:

$$W_k(\Delta) - W_{k+1}(\Delta) = \min_{P^2 \ge 0}\left[q P^2 + \frac{a \sigma_w^2}{P^2 + \sigma_w^2}\left(p^2\Delta + \sigma_m^2\right) + \beta W_{k+1}\left(\frac{\sigma_w^2 \left(p^2\Delta+\sigma_m^2\right)}{P^2+\sigma_w^2}\right)\right]$$

$$- \min_{P^2 \ge 0}\left[q P^2 + \frac{a \sigma_w^2}{P^2 + \sigma_w^2}\left(p^2\Delta + \sigma_m^2\right) + \beta W_{k+2}\left(\frac{\sigma_w^2 \left(p^2\Delta+\sigma_m^2\right)}{P^2+\sigma_w^2}\right)\right]$$

$$\ge q \bar{P}^2 + \frac{a \sigma_w^2}{\bar{P}^2 + \sigma_w^2}\left(p^2\Delta + \sigma_m^2\right) + \beta W_{k+1}\left(\frac{\sigma_w^2\left(p^2\Delta+\sigma_m^2\right)}{\bar{P}^2+\sigma_w^2}\right)$$

$$- q \bar{P}^2 - \frac{a \sigma_w^2}{\bar{P}^2 + \sigma_w^2}\left(p^2\Delta + \sigma_m^2\right) - \beta W_{k+2}\left(\frac{\sigma_w^2\left(p^2\Delta+\sigma_m^2\right)}{\bar{P}^2+\sigma_w^2}\right)$$

$$= \beta \left(W_{k+1} - W_{k+2}\right)\left(\frac{\sigma_w^2\left(p^2\Delta+\sigma_m^2\right)}{\bar{P}^2+\sigma_w^2}\right)$$

where $\bar{P}^2$ is chosen as the argument of the first minimization. (In case of non-unique solutions, any one of the minimizing solutions may be chosen.)

Thus if $W_{k+1}(\Delta)$ is larger than $W_{k+2}(\Delta)$, then $W_k(\Delta)$ is larger than $W_{k+1}(\Delta)$. Since $W_N(\Delta)$ is known to be larger than $W_{N+1}(\Delta)$, the proof is complete. □

Lemma 7. $W_k(\Delta)$ is an increasing function of $\Delta$ ($W_k(\Delta) \uparrow \Delta$) for all $k \le N$.

Proof. We prove this by induction. First consider the case with $k = N$. We have

$$W_N(\Delta) = \min_{P^2 \ge 0}\left[q P^2 + \frac{a \sigma_w^2}{P^2 + \sigma_w^2}\left(p^2\Delta + \sigma_m^2\right)\right].$$

There are only two possible cases: either $P^{2*} = 0$ or $P^{2*} > 0$. If $P^{2*} = 0$, then $W_N(\Delta) = a(p^2\Delta + \sigma_m^2)$. If $P^{2*} > 0$, this requires

$$q - \frac{a \sigma_w^2 \left(p^2\Delta + \sigma_m^2\right)}{\left(P^{2*} + \sigma_w^2\right)^2} = 0 \quad (69)$$

i.e.

$$P^{2*} = \left[\frac{a \sigma_w^2 \left(p^2\Delta + \sigma_m^2\right)}{q}\right]^{1/2} - \sigma_w^2 \quad (70)$$

and we get

$$W_N(\Delta) = 2 a^{1/2} q^{1/2} \sigma_w \left(p^2\Delta + \sigma_m^2\right)^{1/2} - q \sigma_w^2.$$

Thus the lemma is true for $k = N$. Now, if $W_{k+1}(\Delta) \uparrow \Delta$, then for each $P^2$

$$W_{k+1}\left(\frac{\sigma_w^2}{P^2 + \sigma_w^2}\left(p^2\Delta + \sigma_m^2\right)\right) \uparrow \Delta$$

since

$$\frac{\sigma_w^2}{P^2 + \sigma_w^2}\left(p^2\Delta + \sigma_m^2\right) \uparrow \Delta.$$

Also

$$q P^2 + \frac{a \sigma_w^2 \left(p^2\Delta + \sigma_m^2\right)}{P^2 + \sigma_w^2} \uparrow \Delta$$

and thus both terms in the expression to be minimized to obtain $W_k(\Delta)$ are increasing in $\Delta$ for all $P^2$. Therefore, $W_k(\Delta) \uparrow \Delta$. □

Lemma 8. For each $\Delta > 0$, $W_k(\Delta)$ is bounded above, for all $k$, by an affine function of $\Delta$, i.e.

$$0 < W_k(\Delta) \le \Omega_{1,k} \Delta + \Omega_{2,k}. \quad (71)$$

Proof. The proof is by induction, using the observation that since $W_k(\Delta)$ is given by the minimum over $P^2$, an upper bound is given by the value that the expression to be minimized attains when $P^2$ is fixed arbitrarily at zero.

Thus $W_N(\Delta) \le a(p^2\Delta + \sigma_m^2)$, and we may choose $\Omega_{1,N} = a p^2$, $\Omega_{2,N} = a \sigma_m^2$.

Now consider the following sequence of equalities and inequalities (using Lemma 7 in the second step):

$$W_k(\Delta) = \min_{P^2 \ge 0}\left[q P^2 + \frac{a\sigma_w^2}{P^2+\sigma_w^2}\left(p^2\Delta+\sigma_m^2\right) + \beta W_{k+1}\left(\frac{\sigma_w^2}{P^2+\sigma_w^2}\left(p^2\Delta+\sigma_m^2\right)\right)\right]$$

$$\le \min_{P^2 \ge 0}\left[q P^2 + \frac{a\sigma_w^2}{P^2+\sigma_w^2}\left(p^2\Delta+\sigma_m^2\right) + \beta W_{k+1}\left(p^2\Delta+\sigma_m^2\right)\right]$$

$$\le a\left(p^2\Delta+\sigma_m^2\right) + \beta W_{k+1}\left(p^2\Delta+\sigma_m^2\right)$$

$$\le a\left(p^2\Delta+\sigma_m^2\right) + \beta\left(\Omega_{1,k+1}\left(p^2\Delta+\sigma_m^2\right) + \Omega_{2,k+1}\right)$$

$$= \left(a p^2 + \beta \Omega_{1,k+1} p^2\right)\Delta + a\sigma_m^2 + \beta\Omega_{1,k+1}\sigma_m^2 + \beta\Omega_{2,k+1}$$

$$=: \Omega_{1,k}\Delta + \Omega_{2,k}.$$

Therefore, the lemma is proved with the sequences $\{\Omega_{1,k}\}$ and $\{\Omega_{2,k}\}$ defined recursively by

$$\Omega_{1,N} = a p^2, \quad \Omega_{2,N} = a \sigma_m^2$$

$$\Omega_{1,k} = a p^2 + \beta p^2 \Omega_{1,k+1}$$

$$\Omega_{2,k} = \left(a + \beta \Omega_{1,k+1}\right) \sigma_m^2 + \beta \Omega_{2,k+1}. \; \square$$

Note that since the optimal measurement policy for the finite horizon problem is linear, the stationary limiting policy is given by

$$h_n^*(x_n, y^{n-1}) = \lambda^*\left(x_n - E(x_n \mid y^{n-1})\right) \quad (72)$$

where

$$\lambda^{*2} = \frac{P^{2*}}{p^2 \Delta^* + \sigma_m^2} \quad (73)$$

with $P^{2*}$ and $\Delta^*$ being obtained through the stationary solution of the optimum control problem as $N \to \infty$.

For each $N$, denote the solution by $\{P_k^{2*}\}^N$, $k \le N$. We then expect that

$$P_k^{2*} = \lim_{N \to \infty} \{P_k^{2*}\}^N$$

for every finite $k$. To establish the existence of this limit, we recall that $W_k(\Delta)$ is strictly increasing for decreasing $k \le N$ (Lemma 6), and further that it is bounded above by an affine function (Lemma 8). This last property holds uniformly in $k$, since both $\Omega_{1,k}$ and $\Omega_{2,k}$ are bounded in retrograde time:

$$\Omega_{1,k} < \frac{a p^2}{1 - p^2 \beta} \quad \text{and} \quad \Omega_{2,k} < \frac{a \sigma_m^2}{(1 - \beta)(1 - p^2 \beta)}.$$

Hence

$$\lim_{N \to \infty} W_k(\Delta) = W(\Delta)$$

where the limiting function satisfies

$$W(\Delta) = \min_{P^2 \ge 0}\left[q P^2 + \frac{a \sigma_w^2 \left(p^2\Delta + \sigma_m^2\right)}{P^2 + \sigma_w^2} + \beta W\left(\frac{\sigma_w^2}{P^2 + \sigma_w^2}\left(p^2\Delta + \sigma_m^2\right)\right)\right]. \quad (74)$$

A~ -- 2 • 02", (p2A~,-1 + 02-) • (76b) Pk(AI,-1) +

Denote the minimizing solution here by P2(A). Let p2 = p2(A ) be a minimizing solution of the right-hand side of (68h) (which always exists, as shown in Section 3), and let • N {Ak}k=0 be the trajectory sequence defined recursively by

( 02.~ \ A~ = ~ / (76a) kp2(~) +

690 R. BANSAL and T. BA~AR

Finally, let

P02"= P20(o~) (77a) p~" : • = Pk(Ak-,). (77b)

Now note that since p2< 1, equation (76b) describes a stable system with p2 replaced by p2,

and hence A,--~ A* where A* solves

o~w A* - p2(A,) + 02w (p2A* + 02")" (78)

Let p2*:= p2(A,). (79)

Then we have the following solution to the infinite horizon problem.

Theorem 5. In the limit as $N \to \infty$, the joint design problem under consideration admits the optimal stationary measurement policies

$$u_n = h_n^*(x_n, y^{n-1}) = \lambda^*\left(x_n - E(x_n \mid y^{n-1})\right), \quad n = 0, 1, \ldots \quad (80)$$

where

$$\lambda^{*2} = \frac{P^{2*}}{p^2 \Delta^* + \sigma_m^2}$$

with $\Delta^*$ and $P^{2*}$ given by (78) and (79). The optimal stationary control policies are also linear, and are given by

$$v_n = \gamma^*(\bar{x}_n) = \frac{k \beta p}{b + k\beta} \bar{x}_n, \quad n = 0, 1, \ldots \quad (81)$$

where $k$ is defined by (66) and $\bar{x}_n := E[x_n \mid y^n]$ by (58), with all quantities replaced by their time-invariant counterparts, also in view of (78) and (79). □

To numerically compute the optimal stationary policies, we start with $\Delta_0 = 0$ and run the following algorithm.

Algorithm A.

(1) Compute

$$(P_k^2)^* = \arg\min_{P^2 \ge 0}\left[q P^2 + \frac{a \sigma_w^2 \left(p^2 \Delta_k + \sigma_m^2\right)}{P^2 + \sigma_w^2} + \beta\left(q P^2 + a \Delta_k\right)\right].$$

(2) Compute the new value, $\Delta_{k+1}$, by

$$\Delta_{k+1} = \frac{\sigma_w^2}{(P_k^2)^* + \sigma_w^2} \left(p^2 \Delta_k + \sigma_m^2\right).$$

(3) Go to step (1), and iterate.

The optimal cost is then given by

$$W(\Delta^*) = \left(q (P^2)^* + \frac{a \sigma_w^2 \left(p^2 \Delta^* + \sigma_m^2\right)}{(P^2)^* + \sigma_w^2}\right) \Big/ (1 - \beta).$$

Claim 3. Algorithm A always converges. □

Proof. First note that $(P_k^2)^*$, found from step (1) of the algorithm, satisfies

$$(P_k^2)^* = \max\left\{0, \left(\frac{a}{q(1+\beta)}\right)^{1/2} \sigma_w \left(p^2 \Delta_k + \sigma_m^2\right)^{1/2} - \sigma_w^2\right\}$$

so that the map taking $\Delta_k$ to $\Delta_{k+1}$ in step (2) is

$$\Delta_{k+1} = \frac{\sigma_w^2}{(P_k^2)^* + \sigma_w^2}\left(p^2\Delta_k + \sigma_m^2\right) = \begin{cases} p^2\Delta_k + \sigma_m^2, & (P_k^2)^* = 0 \\ \sigma_w\left(\dfrac{q(1+\beta)}{a}\right)^{1/2}\left(p^2\Delta_k + \sigma_m^2\right)^{1/2}, & (P_k^2)^* > 0 \end{cases}$$

which is continuous and increasing in $\Delta_k$. Hence, given $\Delta_k > \Delta_{k-1}$, we have $\Delta_{k+1} > \Delta_k$; since $\Delta_1 > \Delta_0 = 0$, it follows that the $\Delta_k$'s form a monotone increasing sequence. Further, to show that the $\Delta_k$'s are bounded above, we consider the sequence

$$\Gamma_0 = 0, \quad \Gamma_{k+1} = p^2 \Gamma_k + \sigma_m^2$$

and note that if $\Gamma_k \ge \Delta_k$, we have

$$\Gamma_{k+1} = p^2 \Gamma_k + \sigma_m^2 \ge p^2 \Delta_k + \sigma_m^2 \ge \frac{\sigma_w^2}{(P_k^2)^* + \sigma_w^2}\left(p^2\Delta_k + \sigma_m^2\right) = \Delta_{k+1}$$

i.e. if $\Gamma_k \ge \Delta_k$, then $\Gamma_{k+1} \ge \Delta_{k+1}$. But $\Gamma_0 = \Delta_0 (= 0)$, and the sequence $\Gamma_k$ is bounded above by

$$\frac{\sigma_m^2}{1 - p^2}.$$

Therefore, the monotone sequence $\Delta_k$ is also bounded above, and the convergence of the algorithm follows. □
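A minimal Python sketch of Algorithm A, using the closed form for step (1) derived in the proof above, together with the stationary control gain of (81) computed via (66); the parameter values in the example call are arbitrary:

```python
import numpy as np

# Algorithm A with the closed form for step (1) from the proof of Claim 3.
def algorithm_A(q, a, b, p, beta, var_w, var_m, iters=500):
    Delta, P2 = 0.0, 0.0
    for _ in range(iters):
        s = p**2 * Delta + var_m
        P2 = max(0.0, np.sqrt(a * var_w * s / (q * (1 + beta))) - var_w)  # (1)
        Delta = var_w * s / (P2 + var_w)                                  # (2)
    W = (q * P2 + a * var_w * (p**2 * Delta + var_m) / (P2 + var_w)) / (1 - beta)
    c = b - b * beta * p**2 - a * beta                   # k from (66)
    k = (np.sqrt(c**2 + 4 * a * b * beta) - c) / (2 * beta)
    return P2, Delta, W, k * beta * p / (b + k * beta)   # gain of (81)

print(algorithm_A(q=1.0, a=1.0, b=1.0, p=0.9, beta=0.9, var_w=1.0, var_m=1.0))
```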

5. HIGHER ORDER ARMA MODELS

In Section 3 we studied a stochastic dynamic system involving a first-order ARMA model, with the current state directly correlated only with the immediately preceding state. In case we allow this correlation to extend to $j$ previous stages, we obtain a $j$th order ARMA model. Accordingly, let us suppose that the stochastic system is specified by the following equation:

$$x_{i+1} = \sum_{k=0}^{j-1} p_{i+1,i-k} x_{i-k} + m_i - v_i \quad (82)$$

along with (1b)-(1d), where we have the same statistical description for the random quantities $x_0$, $m_i$ and $w_i$, and take $p_{k,j} = 0$ for $j < 0$.

As seen in Sections 2 and 3, we may formulate both hard and soft constraint versions of the above problem, by adopting appropriate cost criteria. The soft constraint version is given below as Problem PS₀.

Problem PS₀.

$$\min_{h^N, \gamma^N} E\left[\sum_{i=0}^{N} \left(q_i u_i^2 + a_{i+1} x_{i+1}^2 + b_i v_i^2\right)\right]$$

subject to (82), and (1b)-(1d).

Using completion of squares and redefinition of the $v_i$'s as $\hat{v}_i$'s (as in the case of the first-order ARMA model), we can obtain the equivalent Problem PS below.

Problem PS.

$$\min_{h^N, \gamma^N} E\left[\sum_{i=0}^{N} \left(q_i u_i^2 + a_i'\left(\hat{v}_i - \sum_{k=0}^{j-1} b_{i,i-k}' \hat{x}_{i-k}\right)^2\right)\right]$$

subject to (1b)-(1d), with $x_i$ replaced by $\hat{x}_i$, where the latter is generated by

$$\hat{x}_{i+1} = \sum_{k=0}^{j-1} p_{i+1,i-k} \hat{x}_{i-k} + m_i$$

and $v_i$ replaced by $\hat{v}_i$. The precise expressions for the $b_{i,i-k}'$'s and $\hat{v}_i$'s are given in Bansal (1988), where the details of the justification for this reformulation may also be found.

We now turn to analyzing these reformulated stochastic team problems.

5.1. Optimality over the affine class

In this subsection we show that if we confine the design to the affine class, then the optimal measurement strategies use a linear transformation on the innovation process.

Theorem 6. Consider the general formulation of Problem PS with $h_i$ restricted to the class

$$u_i = h_i(\hat{x}_i, y^{i-1}) = L_i(\hat{x}_i, y^{i-1})$$

where $L_i$ is a general affine mapping. Then one may, without loss of generality, confine attention to the class of measurement policies which satisfy the structural restriction

$$u_i = \lambda_i \left[\hat{x}_i - E(\hat{x}_i \mid y^{i-1})\right]. \quad (83)$$

Thus it is sufficient to optimize over the class of policies which use a linear transformation on the innovation in $\hat{x}_i$.

Proof. Note that over the affine class we may write

$$u_i = L_i(\hat{x}_i, y^{i-1}) = \tilde{u}_i + \rho_i$$

with

$$\tilde{u}_i = \lambda_i\left(\hat{x}_i - E(\hat{x}_i \mid y^{i-1})\right) \quad \text{and} \quad \rho_i = L_i'(y^{i-1})$$

where $L_i'$ is an arbitrary affine mapping, and $\tilde{u}_i$ and $\rho_i$ are uncorrelated. Thus we have

$$\min_{\gamma^N} J(h^N, \gamma^N) = E\left[\sum_{i=0}^{N}\left(q_i u_i^2 + a_i'\left(E\left(\sum_{k=0}^{j-1} c_{i,i-k} \hat{x}_{i-k} \,\Big|\, y^i\right) - \sum_{k=0}^{j-1} c_{i,i-k} \hat{x}_{i-k}\right)^2\right)\right]$$

$$= E\left[\sum_{i=0}^{N}\left(q_i \tilde{u}_i^2 + q_i \rho_i^2 + a_i'\left(E\left(\sum_{k=0}^{j-1} c_{i,i-k} \hat{x}_{i-k} \,\Big|\, y^i\right) - \sum_{k=0}^{j-1} c_{i,i-k} \hat{x}_{i-k}\right)^2\right)\right]$$

$$\ge E\left[\sum_{i=0}^{N}\left(q_i \tilde{u}_i^2 + a_i'\left(E\left(\sum_{k=0}^{j-1} c_{i,i-k} \hat{x}_{i-k} \,\Big|\, y^i\right) - \sum_{k=0}^{j-1} c_{i,i-k} \hat{x}_{i-k}\right)^2\right)\right].$$

We now note that the sigma field generated by $y^i$ is the same as the sigma field generated by $\tilde{y}^i$, where

$$\tilde{y}_i = \tilde{u}_i + w_i = y_i - \rho_i$$

since $\rho_i$ is $y^{i-1}$ measurable. Hence

$$\min_{\gamma^N} J(h^N, \gamma^N) \ge E\left[\sum_{i=0}^{N}\left(q_i \tilde{u}_i^2 + a_i'\left(E\left(\sum_{k=0}^{j-1} c_{i,i-k} \hat{x}_{i-k} \,\Big|\, \tilde{y}^i\right) - \sum_{k=0}^{j-1} c_{i,i-k} \hat{x}_{i-k}\right)^2\right)\right]$$

and therefore the cost functional may be optimized under the structural constraint (83). □

The optimal strategies within the affine class for problems involving higher order ARMA models may now be found as for the first-order problem, which involves the solution of a sequential optimization problem. Since the procedure is the same as in the first-order problem, we do not pursue it any further here.

FIG. 4. Schematics for Problem Pa.

FIG. 5. Schematics for Problem P′a.

5.2. Nonoptimality of linear laws

Here we show that for one of the simplest team problems of the type above, involving an ARMA model of order 2, the optimum linear solution may be outperformed by an appropriately chosen non-linear policy.

We first restrict our attention to the following stochastic team Problem Pa, a schematic representation of which is provided in Fig. 4.

Problem Pa"

Minimize E[(x - 'O) 2] h l , h 2 , y

where x l = x + n 1 (84)

u, = h , ( x ) (85)

Yl = u l + Wl (86)

u2 = h2(x,, y,) (87)

Y2 = u2 + WE (88)

V = y(y,, Y2) (89)

subject to the hard power constraints

E[u2l <_ p2 (90a)

E[u 21 <_ p2. (90b) []

Note that if the problem involved estimating $x_1$ at the decoder (instead of $x$), then we would have had the two-stage version of a problem involving a first-order ARMA model as studied in Section 2, for which the optimal solutions have been shown to be linear.

We now show that Problem Pa above does not, in general, admit an optimal linear solution. This is done by constructing an instance of the problem where the optimal strategies in the linear class are outperformed by appropriately chosen non-linear strategies.

In order to see why one might suspect nonoptimality of affine laws, consider the above problem with $\sigma_{w_2}^2 = 0$. We then have Problem P′a below, which is represented schematically in Fig. 5, and for which the hard power constraint on $u_2$ is immaterial since there is no noise to combat.

Problem P′a.

$$\text{Minimize} \; E[(x - v)^2]$$

subject to (84)-(87), and (89), along with the restriction

$$y_2 = u_2. \quad (91) \; \square$$

Note that since

$$v = \gamma(y_1, y_2) \quad \text{where} \quad y_2 = u_2 = h_2(x_1, y_1)$$

we may equivalently write

$$v = \gamma(y_1, h_2(x_1, y_1)) \quad (92a)$$

$$= \gamma'(y_1, x_1) \quad (92b)$$

and thus Problem P′a may equivalently be represented as in Fig. 6.

We thus obtain a problem of simultaneously designing encoding and decoding policies with side information at the decoder, for which non-linear strategies which outperform the optimal linear strategies do exist.

Since linear policies are not optimal for Problem Pa with $\sigma_{w_2}^2 = 0$, they may continue to be suboptimal for small enough values of $\sigma_{w_2}^2$. We show next that this is precisely the case. In particular, if we consider the optimal linear

FIG. 6. Equivalent representation for Problem P′a.

design for Problem Pa, using

$$u_1 = h_1(x) = \lambda_1 x$$

$$u_2 = h_2(x_1, y_1) = \lambda_2 \left[x_1 - E(x_1 \mid y_1)\right]$$

where $\lambda_1$ and $\lambda_2$ are chosen to meet the hard power constraints with equality (this being the optimal choice in the affine class, as shown in Section 5.1), we have

$$E(x_1 \mid y_1) = \frac{\lambda_1 \sigma_x^2}{P_1^2 + \sigma_{w_1}^2} y_1 \quad (93)$$

and

$$x_1 - E(x_1 \mid y_1) = n_1 + \frac{\sigma_{w_1}^2}{P_1^2 + \sigma_{w_1}^2} x - \frac{\lambda_1 \sigma_x^2}{P_1^2 + \sigma_{w_1}^2} w_1$$

which implies that

$$u_2 = \lambda_2 \left(n_1 + \frac{\sigma_{w_1}^2}{P_1^2 + \sigma_{w_1}^2} x - \frac{\lambda_1 \sigma_x^2}{P_1^2 + \sigma_{w_1}^2} w_1\right) \quad (94)$$

with

$$\lambda_2^2 = \frac{P_2^2 \left(P_1^2 + \sigma_{w_1}^2\right)}{\sigma_x^2 \sigma_{w_1}^2 + \sigma_{n_1}^2 \left(P_1^2 + \sigma_{w_1}^2\right)}.$$

The mean square error in estimating $x$ from the simultaneous observation of $y_1$ and $y_2$ then is

$$E\left[\left(x - E(x \mid y_1, y_2)\right)^2\right] = \frac{\sigma_x^2 \sigma_{w_1}^2}{\left(P_1^2 + \sigma_{w_1}^2\right)\left(P_2^2 + \sigma_{w_2}^2\right)} \left[\sigma_{w_2}^2 + \frac{P_2^2 \sigma_{n_1}^2 \left(P_1^2 + \sigma_{w_1}^2\right)}{\sigma_x^2 \sigma_{w_1}^2 + \sigma_{n_1}^2 \left(P_1^2 + \sigma_{w_1}^2\right)}\right]. \quad (95)$$

Considering the situation with $\sigma_x^2 = 100.0$, $\sigma_{n_1}^2 = 0.99$, $\sigma_{w_1}^2 = 1.0$, $\sigma_{w_2}^2 = 0.01$, $P_1^2 = 85.0423$ and $P_2^2 = 100.99$, we find that the optimal linear policy yields a cost of 0.53467.

We next consider the design

$$h_1(x) = x + \epsilon \,\text{sgn}\, x; \quad h_2(x_1) = x_1$$

and

$$\gamma(y_1, y_2) = \begin{cases} (y_1 + y_2 - \epsilon)/2 & \text{if } y_2 > 0 \\ (y_1 + y_2 + \epsilon)/2 & \text{if } y_2 < 0 \end{cases}$$

(letting $\epsilon = -1.0$ we obtain $E[u_1^2] = 85.0423$ and $E[u_2^2] = 100.99$). Now

$$y_2 = x + n_1 + w_2 =: x + w_3$$

where $w_3 \sim N(0, 1)$.

If we calculate the mean square error under the above policy, we find that the non-linear policy yields a cost of 0.53172, and hence is superior to the linear optimal policy.
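A minimal Python sketch replicating this comparison: (95) is evaluated for the linear design, and the non-linear design is simulated by Monte Carlo; the sample size and seed are arbitrary choices:

```python
import numpy as np

# Linear design cost via (95) versus Monte Carlo for the non-linear design.
vx, vn, vw1, vw2, P1, P2 = 100.0, 0.99, 1.0, 0.01, 85.0423, 100.99
inner = vx * vw1 + vn * (P1 + vw1)
lin = vx * vw1 / ((P1 + vw1) * (P2 + vw2)) \
      * (vw2 + P2 * vn * (P1 + vw1) / inner)             # formula (95)

rng = np.random.default_rng(1)
M, eps = 2_000_000, -1.0
x = rng.normal(0.0, np.sqrt(vx), M)
y1 = x + eps * np.sign(x) + rng.normal(0.0, np.sqrt(vw1), M)
y2 = x + rng.normal(0.0, np.sqrt(vn), M) + rng.normal(0.0, np.sqrt(vw2), M)
v = np.where(y2 > 0, (y1 + y2 - eps) / 2, (y1 + y2 + eps) / 2)
print(lin, np.mean((x - v) ** 2))   # approx 0.53467 and approx 0.53172
```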

We now return to the problem of showing nonoptimality of linear laws for at least some instances of higher order ARMA models. Consider the following second-order model with feedback, illustrated in Fig. 7.

FIG. 7. Schematics for a second-order model with feedback.

We have

$$x_2 = p_{21} x_1 + p_{20} x_0 + m_1$$

$$x_1 = p_{10} x_0 + m_0$$

where $x_0$, $m_0$ and $m_1$ are given independent, zero mean, Gaussian random variables.

The problem is to minimize $E[(v_2 - x_2)^2]$ under the schematics of Fig. 7, with

$$v_2 = \gamma(y_0, y_1, y_2).$$

Let us now suppose that $\sigma_{w_2}^2$ is arbitrarily large, essentially making the third channel redundant, and therefore

$$E(x_2 \mid y_0, y_1, y_2) = E(x_2 \mid y_0, y_1).$$

Further suppose that

$$p_{21} = 0, \quad p_{20} = 1, \quad \sigma_{m_1}^2 = 0, \quad p_{10} = 1$$

which imply

$$x_2 = x_0$$

$$x_1 = x_0 + m_0$$

and we obtain the problem depicted in Fig. 8. We thus obtain a problem of the type Pa discussed earlier in this section, for which there are instances when linear strategies are not optimal. Hence we see that there are instances of the general problem (described by second-order ARMA processes) for which designs that are optimal in the affine class are nonoptimal in the general class of policies.

FIG. 8. The second-order ARMA model under the given restrictions.


6. CONCLUSION

In this paper we have considered stochastic dynamic team problems where at each step two consecutive decisions were to be taken, one being what information-bearing signal to transmit, and the other being what control action to exert. Such problems arise in the simultaneous optimization of both the observation and the control sequences in stochastic systems. We solved the problem completely for first-order systems under quadratic cost criteria. This was done by first constructing an equivalent problem having a cost function consisting of a sum of squared differences, and then solving this equivalent problem using some bounds from information theory. For cases with hard power constraints, it was shown that the optimal measurement strategy is to linearly amplify the innovation at each stage to the maximum permissible power level. For cases with soft power constraints the structure of the solution was found to be similar, with the optimal power levels found by solving a non-linear optimal control problem, this in turn being done by using a dynamic program. The results were then extended to cases with an infinite time horizon and a discounted cost functional, and the existence of optimal stationary policies for these problems was established.

We then considered stochastic dynamic decision problems requiring simultaneous optimization of both the observation and the control sequences for second- and higher order systems under quadratic cost criteria. We considered optimality over the affine class for problems involving a general $j$th order model, and showed that within this class the optimal measurement strategy for the hard constraint version consists of transmitting the innovation linearly at each stage, leading to linear control laws. We then showed that for some of the simplest classes of such problems, involving second-order ARMA models, strategies which are optimal in the linear class may be outperformed by appropriately chosen non-linear strategies.

For related work in continuous time, we should mention that the hard constraint version formulated in Section 2 was studied earlier in Liptser and Shiryayev (1976), where the optimality of linear encoding strategies was established. It seems that, using that result, one may be able to show the optimality of innovation strategies for the continuous time counterpart of Problem P. However, extensions of this result to the infinite horizon case in continuous time, and to higher order dynamics, are not immediate and are currently under study.

Acknowledgement--This work was supported in part by the U.S. Air Force Office of Scientific Research under Grants AFOSR 84-0056 and AFOSR 88-0178, and in part by the Joint Services Electronics Program under Contract N00014-84-C-0149.

REFERENCES

Athans, M. (1972). On the determination of optimal costly measurement strategies for linear stochastic systems. Automatica, 8, 397-412.

Bansal, R. and T. Başar (1987a). Stochastic teams with nonclassical information revisited--when is an affine law optimal? IEEE Trans. Aut. Control, AC-32, 554-559.

Bansal, R. and T. Başar (1987b). Solutions to a class of linear-quadratic-Gaussian (LQG) stochastic team problems with nonclassical information. Syst. Control Lett., 9, 125-130.

Bansal, R. (1988). Simultaneous design of communication strategies and control policies in stochastic systems. Ph.D. Dissertation, University of Illinois at Urbana-Champaign, August.

Başar, T. and J. B. Cruz, Jr. (1982). Concepts and methods in multiperson coordination and control. In S. G. Tzafestas (Ed.), Optimization and Control of Dynamic Operational Research Models, Chap. 11, pp. 351-394. North Holland, Amsterdam.

Berger, T. (1971). Rate Distortion Theory. Prentice-Hall, Englewood Cliffs, New Jersey.

Bertsekas, D. (1987). Dynamic Programming. Prentice-Hall, Englewood Cliffs, New Jersey.

Herring, K. D. and J. L. Melsa (1974). Optimum measurements for estimation. IEEE Trans. Aut. Control, AC-19, 264-266.

Ho, Y. C. (1980). Team decision theory and information structures. Proc. IEEE, 68, 644-654.

Kagan, A. M., V. Y. Linnik and C. R. Rao (1973). Characterization Problems in Mathematical Statistics, pp. 408-410. Wiley, New York.

Kumar, P. R. and P. P. Varaiya (1986). Stochastic Systems: Estimation, Identification and Adaptive Control. Prentice-Hall, Englewood Cliffs, New Jersey.

Lafortune, S. (1985). On stochastic optimal control problems with selection among different costly observations. Electronics Research Laboratory, University of California, Berkeley, Memo No. UCB/ERL M85/99.

Liptser, R. S. and A. N. Shiryayev (1976). Statistics of Random Processes--II. Applications. Springer, New York.

Mehra, R. K. (1976). Optimization of measurement schedules and sensor designs for linear dynamic systems. IEEE Trans. Aut. Control, AC-21, 55-64.

Meier, L., J. Peschon and R. M. Dressler (1967). Optimum control of measurement subsystems. IEEE Trans. Aut. Control, AC-12, 528-536.

Papadimitriou, C. H. and J. N. Tsitsiklis (1986). Intractable problems in control theory. SIAM J. Control Optimiz., 24, 639-654.

Radner, R. (1962). Team decision problems. Ann. Math. Statist., 33, 857-881.

Sandell, N. and M. Athans (1974). Solutions of some nonclassical LQG stochastic decision problems. IEEE Trans. Aut. Control, AC-19, 108-116.

Shannon, C. E. (1959). Coding theorems for a discrete source with a fidelity criterion. IRE National Convention Record, Part 4, 142-163.

Whittle, P. and J. F. Rudge (1976). Simultaneous communication and control. Adv. Appl. Prob., 8, 365-384.

Witsenhausen, H. S. (1968). A counterexample in stochastic optimal control. SIAM J. Control, 6, 131-147.

Witsenhausen, H. S. (1971). Separation of estimation and control for discrete time systems. Proc. IEEE, 59, 1557-1566.