
Information Embedding meets Distributed Control

Pulkit Grover†, Aaron B. Wagner‡ and Anant Sahai†

Abstract— We consider the problem of information embedding where the encoder modifies a white Gaussian host signal in a power-constrained manner, and the decoder recovers both the embedded message and the modified host signal. This extends the recent work of Sumszyk and Steinberg to the continuous-alphabet Gaussian setting. We show that a dirty-paper-coding based strategy achieves the optimal rate for perfect recovery of the modified host and the message. We also provide bounds for the extension wherein the modified host signal is recovered only to within a specified distortion. Our results, specialized to the zero-rate case, provide the tightest known lower bounds on the asymptotic costs for the vector version of a famous open problem in distributed control — the Witsenhausen counterexample. Using this bound, we characterize the asymptotically optimal costs for the vector Witsenhausen problem to within a factor of 1.3 for all problem parameters, improving on the earlier best known bound of 2.

I. INTRODUCTION

[Fig. 1 block diagram: the encoder E adds the input U^m to the host S^m to form X^m; the channel adds noise Z^m to give Y^m; the decoder D outputs the estimates X̂^m and M̂, subject to the power constraint E[‖U^m‖²] ≤ mP.]

Fig. 1. The host signal S^m is first modified by the encoder using a power-constrained input U^m. The modified host signal X^m and the message M are then reconstructed at the decoder. The problem is to find the minimum distortion in reconstruction of X^m given P, the power constraint, and R, the rate of reliable message transmission.

The problem of interest in this paper (see Fig. 1) derives its motivation from an information-theoretic standpoint, as well as from a distributed-control perspective. Information-theoretically, the problem is an extension of an information embedding problem recently addressed by Sumszyk and Steinberg [1] — the encoder ensures that the decoder recovers the modified host signal X^m perfectly, along with the message. Philosophically, the work in [1] naturally follows the lead of Steinberg's earlier work in [2], where distributed source coding is explored with the added constraint of having the eventual reconstructions known at the encoder. There, this added constraint made the distributed lossy-coding problem solvable even though in its traditional form, the problem famously remains open except for the two-user Gaussian case [3].

In [1], the authors assume that the host signal S^m, the modified host signal (the channel input) X^m and the channel output Y^m are all finite-alphabet. In this paper, we consider

†Wireless Foundations, Department of EECS, University of California at Berkeley. Email: {pulkit, sahai}@eecs.berkeley.edu. ‡School of Electrical and Computer Engineering, Cornell University. Email: wagner@ece.cornell.edu.

the Gaussian (and hence infinite-alphabet) version of the problem and derive tight upper and lower bounds. The extension is non-trivial [4] because Fano's-inequality-based techniques do not work for the infinite-alphabet problem. Instead, we have to consider allowing the decoder to reconstruct X^m to within a specified average distortion. We derive upper and lower bounds on the rate for nonzero distortion, and show their tightness in the limit of zero distortion.

Within information theory, our problem is also closely related to those in [5]–[7] (these connections are described in detail in [8]). Relative to the state amplification problem of Kim, Sutivong and Cover [5], the crucial difference is that their decoder reconstructs the initial state S^m whereas ours must recover X^m. In [6], the encoder instead wants to reveal as little information about the initial state as possible. In [7], Kotagiri and Laneman consider two ("informed" and "uninformed") encoders that transmit over a Multiple Access Channel with the channel interference S^m known only to the informed encoder. The informed encoder helps the uninformed encoder communicate its message by modifying the interference S^m. Intuitively, an improved reconstruction of the modified interference could lead to an improved decoding of the uninformed user's message.

We ourselves have a different motivation for considering this problem — that of addressing a long-standing open problem in distributed control called the Witsenhausen counterexample [9]. It is well known that centralized Linear-Quadratic-Gaussian (LQG) stochastic control systems are easy to analyze — the optimal control law is linear in the observation [10]. For general distributed systems, however, the inherent difference in "who knows what" brings out a double role of the control actions — while minimizing the local cost terms, they also communicate implicitly to the other controllers to reduce the knowledge differential. That this double nature can make life difficult was demonstrated by Witsenhausen's 1968 counterexample — a deceptively simple two-stage distributed LQG system. Witsenhausen showed in the scalar case that a simple nonlinear two-point quantization strategy can outperform the best linear strategies, despite everything being LQG with no bandwidth mismatch. However, the problem of finding the optimal control law has proved to be quite challenging and remains open. The non-convexity of the problem makes the search for an optimal strategy hard [11], [12]. In fact, suitably relaxed discrete analogs of the problem are even NP-complete [13].

The connection between our problem and the Witsenhausen counterexample is that the special case of zero-rate communication in our formulation is precisely the asymptotic vector version of the counterexample with Witsenhausen's


two distributed controllers playing the roles of E and D in Fig. 1. This connection was made by Grover and Sahai in [8], where upper and lower bounds were developed on the minimum distortion. Using these bounds, they show that a combination of linear and DPC-based strategies attains within a factor 2 of optimal for any weighted sum of the power and distortion costs. These were the first results showing any sort of optimality for this type of problem. In a follow-up work, Grover, Sahai and Park [14] used generalized sphere-packing ideas to obtain related lower bounds for non-asymptotic vector lengths. Lattice-based quantization strategies were given that provably get to within a constant factor of the optimal cost for each vector length, including Witsenhausen's original scalar counterexample. Witsenhausen's counterexample is not an isolated case that can be addressed using these tools — a "generalized" Witsenhausen counterexample is investigated in [15] where quadratic costs are imposed on all states and inputs. It is shown that vector-quantization based strategies attain within a factor of 4.13 of the optimal cost for this new problem as well.

In this context, our contribution here is a better lower bound that improves the constant-factor approximation for the asymptotic optimal costs from 2 in [8] to about 1.3. Techniques similar to those developed in [14] can likely be used to obtain tighter nonasymptotic finite-length results. Further, with the new lower bound, the ratio of the upper and lower bound on the distortion (a formulation of more information-theoretic flavor) is bounded by 1.5 for all values of power P and all problem parameters. This is a vast improvement over [8], where this factor would have been infinity.

This constant-factor spirit is inspired by constant-gap approximation results for the capacity¹ of some notoriously hard multiuser wireless communication problems [16], [17]. A parallel hope thus arises in distributed control, since the issue of implicit communication between controllers is as ubiquitous there as interference is in wireless.

II. PROBLEM STATEMENT

The host signal S^m is distributed N(0, σ²I), and the message M is independent of S^m and distributed uniformly over {1, 2, . . . , 2^{mR}}. The encoder E_m maps (M, S^m) to X^m by distorting S^m using an input U^m of average power (for each message) at most P, i.e. E[‖S^m − X^m‖²] ≤ mP. Additive white Gaussian noise Z^m ∼ N(0, σ_z²I), where σ_z² = 1, is added to X^m by the channel. The decoder D_m maps the channel outputs Y^m to an estimate X̂^m of the modified host signal X^m, and an estimate M̂ of the message.

Define the error probability ε_m(E_m, D_m) = Pr(M ≠ M̂). For the encoder-decoder sequence {E_m, D_m}_{m=1}^∞, define the minimum asymptotic distortion MMSE(P, R) as

  inf_{{E_m, D_m}_{m=1}^∞ : ε_m(E_m, D_m) → 0}  limsup_{m→∞} (1/m) E[‖X^m − X̂^m‖²].

We are interested in the tradeoff between the rate R, the power P, and MMSE(P, R).

¹In this context, a factor-of-2 approximation in transmit power would be a slightly stronger result than a one-bit approximation in the capacity.

The conventional control-theoretic weighted cost formulation [9] defines the total cost to be

  J = (1/m) k² ‖U^m‖² + (1/m) ‖X^m − X̂^m‖².    (1)

The objective is to minimize the average cost E[J] at rate R. The average is taken over the realizations of the host signal, the channel noise, and the message. At R = 0, the problem is the vector Witsenhausen counterexample [8].
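To make the cost tradeoff in (1) concrete, the following Python sketch (our own illustration, not code from [20]) evaluates the asymptotic cost of the simple linear strategy U^m = −λS^m, for which the decoder's best linear estimate of X^m from Y^m has a closed form; the function name and parameter grid are illustrative choices.

```python
import numpy as np

def linear_strategy_cost(k, sigma, lam):
    """Asymptotic weighted cost (1) of the linear strategy U = -lam * S.

    With U = -lam * S, the modified host X = (1 - lam) * S has variance v,
    and the decoder's linear MMSE estimate of X from Y = X + Z
    (unit-variance noise) incurs distortion v / (v + 1).
    """
    power = (lam * sigma) ** 2        # per-symbol input power E[U^2]
    v = ((1.0 - lam) * sigma) ** 2    # variance of the modified host X
    mmse = v / (v + 1.0)              # linear MMSE of X given Y = X + Z
    return k ** 2 * power + mmse

# Sweep the attenuation lam in [0, 1] to find the best strategy of this family.
k, sigma = 0.3, 2.0
best_cost = min(linear_strategy_cost(k, sigma, lam)
                for lam in np.linspace(0.0, 1.0, 1001))
```

Here lam = 0 spends no power and pays only the estimation distortion σ²/(σ² + 1), while lam = 1 zeroes the host and pays only k²σ²; Witsenhausen's observation is that nonlinear (e.g. quantization-based) strategies can beat the best point on this linear curve.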

III. MAIN RESULTS

A. Lower bounds on MMSE for rate R and power P

Theorem 1: For the problem as stated in Section II, for communicating reliably at rate R with input power P, the asymptotic average mean-square error in recovering X^m is lower bounded as follows. For P ≥ 2^{2R} − 1,

  MMSE(P, R) ≥ inf_{σ_SU} sup_{γ>0} (1/γ²) ( ( √( σ²2^{2R} / (1 + σ² + P + 2σ_SU) ) − √( (1 − γ)²σ² + γ²P − 2γ(1 − γ)σ_SU ) )⁺ )²,

where max{ −σ√P, (2^{2R} − 1 − P − σ²)/2 } ≤ σ_SU ≤ σ√P. For P < 2^{2R} − 1, reliable communication is not possible.

Corollary 1: For the vector Witsenhausen problem (R = 0) with E[‖U^m‖²] ≤ mP, the following is a lower bound on the MMSE in the estimation of X^m:

  MMSE(P, 0) ≥ inf_{σ_SU} sup_{γ>0} (1/γ²) ( ( √( σ² / (1 + σ² + P + 2σ_SU) ) − √( (1 − γ)²σ² + γ²P − 2γ(1 − γ)σ_SU ) )⁺ )²,

where σ_SU ∈ [−σ√P, σ√P]. Further, this bound holds for all values of m (and not merely in the limit m → ∞).

For conceptual clarity and conciseness, we only consider the case R = 0 (Corollary 1). Tools developed herein extend directly to the case R > 0, and are detailed in [18].
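The bound of Corollary 1 is an inf-sup of an explicit expression, so it can be evaluated by a simple two-dimensional grid search. A Python sketch (our own; the grid ranges and resolutions are heuristic assumptions, not from the paper):

```python
import numpy as np

def mmse_lower_bound_zero_rate(P, sigma, n_s=201, n_g=400):
    """Grid-search evaluation of the Corollary 1 lower bound on MMSE(P, 0).

    Outer inf over sigma_SU in [-sigma*sqrt(P), sigma*sqrt(P)], inner sup
    over gamma > 0 (the gamma grid range is a heuristic truncation).
    """
    s_max = sigma * np.sqrt(P)
    best = np.inf
    for s_su in np.linspace(-s_max, s_max, n_s):
        a = np.sqrt(sigma ** 2 / (1.0 + sigma ** 2 + P + 2.0 * s_su))
        sup_val = 0.0
        for gamma in np.linspace(1e-3, 2.0, n_g):
            b_sq = ((1.0 - gamma) ** 2 * sigma ** 2 + gamma ** 2 * P
                    - 2.0 * gamma * (1.0 - gamma) * s_su)
            b = np.sqrt(max(b_sq, 0.0))           # a valid second moment
            val = (max(a - b, 0.0) / gamma) ** 2  # ((.)^+)^2 / gamma^2
            sup_val = max(sup_val, val)
        best = min(best, sup_val)
    return best
```

As a sanity check, for P ≥ σ² the encoder can afford U^m = −S^m, making X^m ≡ 0 trivially known, and the bound indeed evaluates to zero there; for small P it stays bounded away from zero.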

Proof: [Of Corollary 1] For any chosen encoding map E and decoding map D, there is a Markov chain S^m → X^m → Y^m → X̂^m. Using the data-processing inequality,

  I(S^m; X̂^m) ≤ I(X^m; Y^m).    (2)

The terms in the inequality can be bounded by single-letter expressions as follows. Define Q as a random variable uniformly distributed over {1, 2, . . . , m}. Define S = S_Q, U = U_Q, X = X_Q, Z = Z_Q, Y = Y_Q and X̂ = X̂_Q. Using the memorylessness of the channel (see [18] for details),

  I(X^m; Y^m) ≤ m I(X; Y),    (3)

and using the fact that the source is iid,

  m I(S; X̂) ≤ I(S^m; X̂^m).    (4)

Using (2), (3) and (4),

  I(S; X̂) ≤ I(X; Y).    (5)

Also observe that from the definitions of S, X, X̂ and Y, E[(1/m)‖S^m − X^m‖²] = E[(S − X)²], and E[(1/m)‖X^m − X̂^m‖²] = E[(X − X̂)²].

Using the Cauchy-Schwarz inequality, the correlation σ_SU = E[SU] must satisfy the following constraint,

  |σ_SU| = |E[SU]| ≤ √(E[S²]) √(E[U²]) = σ√P.    (6)

Also,

  E[X²] = E[(S + U)²] = σ² + P + 2σ_SU.    (7)

Since Z = Y − X ⊥⊥ X, and because a Gaussian input distribution maximizes the mutual information across an average-power-constrained AWGN channel, we can upper bound the RHS of (5) as follows:

  I(X; Y) ≤ (1/2) log₂( 1 + (σ² + P + 2σ_SU)/1 ).    (8)

Now, looking at the LHS in (5),

  I(S; X̂) = h(S) − h(S | X̂)
          = h(S) − h(S − γX̂ | X̂)  ∀γ
      (a)≥ h(S) − h(S − γX̂)
          = (1/2) log₂(2πeσ²) − h(S − γX̂),    (9)

where (a) follows from the fact that conditioning reduces entropy. Note here that the result holds for any γ, and in particular, γ can depend on σ_SU. Now,

  h(S − γX̂) = h( (1 − γ)S − γU − γ(X̂ − X) ).    (10)

Using the Cauchy-Schwarz inequality, we can bound the second moment of the random variable in (10),

  E[ ( (1 − γ)S − γU − γ(X̂ − X) )² ] ≤ ( √( (1 − γ)²σ² + γ²P − 2γ(1 − γ)σ_SU ) + γ √( E[(X̂ − X)²] ) )².    (11)

This is obtained by aligning² X − X̂ with (1 − γ)S − γU. Thus,

  I(S; X̂) ≥ (1/2) log₂(2πeσ²) − h(S − γX̂)
          ≥ (1/2) log₂(σ²) − log₂( √( (1 − γ)²σ² + γ²P − 2γ(1 − γ)σ_SU ) + γ √( E[(X̂ − X)²] ) ).    (12)

²In general, since X̂^m is a function of Y^m, this alignment is not actually possible unless the recovery of X^m is exact. The derived bound is therefore loose except in the zero-distortion case.

Using (5), (8) and (12), followed by algebraic manipulations, gives us the expression for the lower bound in Corollary 1, where the sup_{γ>0} is obtained because any choice of γ > 0 is allowed, and the inf_{σ_SU} is obtained by allowing for all encoders and decoders (the encoding and decoding strategy influence the bound only through σ_SU, and σ_SU has to satisfy (6)). The details are in [18] since space here is at a premium.

B. The tightness at MMSE(P, R) = 0

For an achievable strategy, we use the combination of linear and dirty-paper coding strategies of [8], except that here we communicate a message at rate R as well.

Corollary 2: Given power P, a combination of linear and DPC-based strategies achieves the maximum rate C(P) in the limit MMSE(P, R) = 0, where C(P) is given by

  C(P) = sup_{σ_SU} (1/2) log₂( ( (Pσ² − σ_SU²)(1 + σ² + P + 2σ_SU) ) / ( σ²(σ² + P + 2σ_SU) ) ),

where σ_SU ∈ [−σ√P, 0].

Proof: We only outline the proof; see [18] for details.

The achievability: We first show that the combination of linear and DPC strategies [8] achieves the expression in Corollary 2. A DPC strategy with the DPC parameter α = 1 (ensuring perfect recovery of X^m) achieves the following rate [19, Eq. (6)]:

  R = (1/2) log₂( P(P + σ² + 1) / (P + σ²) ).    (13)

Now, the combination strategy divides P into P_lin and P_dpc. The linear part scales S^m, modifying the initial state to variance σ̃² = (σ − √P_lin)². The now-achievable rate is

  R̃ = sup_{P_lin, P_dpc : P = P_lin + P_dpc} (1/2) log₂( P_dpc(P_dpc + σ̃² + 1) / (P_dpc + σ̃²) ).

Let σ_SU = −σ√P_lin (note that as P_lin varies from 0 to P, σ_SU varies from 0 to −σ√P). Then, P_dpc = P − σ_SU²/σ², and P_dpc + σ̃² = P_dpc + σ² + P_lin − 2σ√P_lin = P + σ² + 2σ_SU. The resulting expression matches the expression in Corollary 2.

The converse: Since we are free to choose γ, let γ = γ* = (σ² + σ_SU)/(σ² + P + 2σ_SU). Then, 1 − γ* = (P + σ_SU)/(σ² + P + 2σ_SU). Since MMSE = 0, it has to be the case that the term inside (·)⁺ in the lower bound is non-positive for some value of σ_SU. This immediately yields

  2^{2R} ≤ sup_{σ_SU} { (1/σ²) ( (1 − γ*)²σ² + γ*²P − 2γ*(1 − γ*)σ_SU ) × (1 + σ² + P + 2σ_SU) }
         = sup_{σ_SU} ( (Pσ² − σ_SU²)(1 + σ² + P + 2σ_SU) ) / ( σ²(σ² + P + 2σ_SU) ),

where σ_SU ∈ [−σ√P, σ√P]. Observe that the term (Pσ² − σ_SU²) is oblivious to the sign of σ_SU. Further,

  (1 + σ² + P + 2σ_SU) / (σ² + P + 2σ_SU) = 1 + 1/(σ² + P + 2σ_SU)    (14)

is clearly larger for σ_SU < 0 if we fix |σ_SU|. Thus the supremum in the upper bound is attained at some σ_SU ≤ 0, and we get that the expression in the Corollary is an upper bound on the rate as well.
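The change of variables σ_SU = −σ√P_lin above can be checked numerically. The following Python sketch (our own cross-check, with illustrative parameter choices) evaluates the expression inside C(P) both directly and through the combination-strategy power split, and the two parametrizations coincide:

```python
import numpy as np

def c_direct(P, sigma, s_su):
    """The expression inside the sup in C(P), at sigma_SU in (-sigma*sqrt(P), 0]."""
    num = (P * sigma ** 2 - s_su ** 2) * (1.0 + sigma ** 2 + P + 2.0 * s_su)
    den = sigma ** 2 * (sigma ** 2 + P + 2.0 * s_su)
    return 0.5 * np.log2(num / den)

def c_combination(P, sigma, p_lin):
    """Rate of the linear + DPC combination for the split P = p_lin + p_dpc."""
    p_dpc = P - p_lin
    var_tilde = (sigma - np.sqrt(p_lin)) ** 2   # host variance after scaling
    return 0.5 * np.log2(p_dpc * (p_dpc + var_tilde + 1.0) / (p_dpc + var_tilde))

P, sigma = 0.5, 1.0
p_lins = np.linspace(0.0, 0.99 * P, 200)   # avoid p_lin = P (zero DPC power)
s_sus = -sigma * np.sqrt(p_lins)           # the substitution sigma_SU = -sigma*sqrt(P_lin)
direct = np.array([c_direct(P, sigma, s) for s in s_sus])
combo = np.array([c_combination(P, sigma, p) for p in p_lins])
C_P = direct.max()                         # C(P), from either parametrization
```

For small P, as here, the maximized argument of the log can be below 1, so C(P) < 0; reading the converse step 2^{2R} ≤ sup(·) at R = 0, this is consistent with vanishing distortion being unsupportable at such powers even at zero rate.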

IV. NUMERICAL RESULTS

Witsenhausen's control-theoretic formulation seeks to minimize the sum of weighted costs k²P + MMSE. Fig. 2(b) numerically shows that asymptotically, the ratio of upper and new lower bounds on the weighted cost is bounded by 1.3, an improvement over the ratio of 2 in [8] shown in Fig. 2(a). A ridge of ratio 2 along σ² = (√5 − 1)/2, present in Fig. 2(a) (plotted using the bound of [8]), no longer exists with the new lower bound, since this small-k regime corresponds to target MMSEs close to zero, where the new lower bound is tight. This is illustrated in Fig. 3, which also explains why the ridge along k ≈ 1.67 still survives in the new ratio plot.

Fig. 4 shows the ratio of upper and lower bounds on MMSE(P, 0) versus P and σ. While the ratio with the bound of [8] was unbounded (Fig. 4, top), the new ratio is bounded by a factor of 1.5 (Fig. 4, bottom). Fig. 5 shows that the price of increasing rate is an increased average distortion in X^m estimation at high rates.

The MATLAB code for these figures can be found in [20].
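The qualitative message of Fig. 5, that higher rates force higher distortion, can also be reproduced directly from Theorem 1. The sketch below (our own grid-search illustration, independent of [20]; grids and the tested rate points are heuristic choices) evaluates the rate-R lower bound at the parameters of Fig. 5:

```python
import numpy as np

def mmse_lower_bound(P, sigma, R, n_s=161, n_g=200):
    """Grid-search evaluation of the Theorem 1 lower bound on MMSE(P, R).

    sigma_SU is restricted to [max(-sigma*sqrt(P), (2^{2R}-1-P-sigma^2)/2),
    sigma*sqrt(P)]; the gamma grid is a heuristic truncation of gamma > 0.
    """
    if P < 2.0 ** (2.0 * R) - 1.0:
        raise ValueError("reliable communication impossible at this (P, R)")
    s_lo = max(-sigma * np.sqrt(P), (2.0 ** (2.0 * R) - 1.0 - P - sigma ** 2) / 2.0)
    s_hi = sigma * np.sqrt(P)
    best = np.inf
    for s_su in np.linspace(s_lo, s_hi, n_s):
        a = np.sqrt(sigma ** 2 * 2.0 ** (2.0 * R)
                    / (1.0 + sigma ** 2 + P + 2.0 * s_su))
        sup_val = 0.0
        for gamma in np.linspace(1e-3, 2.0, n_g):
            b_sq = ((1.0 - gamma) ** 2 * sigma ** 2 + gamma ** 2 * P
                    - 2.0 * gamma * (1.0 - gamma) * s_su)
            b = np.sqrt(max(b_sq, 0.0))
            sup_val = max(sup_val, (max(a - b, 0.0) / gamma) ** 2)
        best = min(best, sup_val)
    return best

# Parameters of Fig. 5: P = 1, sigma = sqrt((sqrt(5) - 1) / 2).
P = 1.0
sigma = np.sqrt((np.sqrt(5.0) - 1.0) / 2.0)
bounds = [mmse_lower_bound(P, sigma, R) for R in (0.0, 0.4, 0.45)]
```

The computed bound is zero at R = 0 (here P ≥ σ², so the host can be zeroed) and grows with the rate, matching the qualitative behavior of the lower-bound curve in Fig. 5.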

ACKNOWLEDGMENTS

P. Grover and A. Sahai acknowledge the support of the National Science Foundation (CNS-403427, CNS-093240, CCF-0917212 and CCF-729122) and Sumitomo Electric. A. B. Wagner acknowledges the support of an NSF 06-42925 (CAREER) grant. We thank Hari Palaiyanur and Se Yong Park for helpful discussions.

REFERENCES

[1] O. Sumszyk and Y. Steinberg, "Information embedding with reversible stegotext," in Proceedings of the 2009 IEEE Symposium on Information Theory, Seoul, Korea, Jun. 2009.

[2] Y. Steinberg, "Simultaneous transmission of data and state with common knowledge," in Proceedings of the 2008 IEEE Symposium on Information Theory, Toronto, Canada, Jun. 2008, pp. 935–939.

[3] A. B. Wagner, S. Tavildar, and P. Viswanath, "Rate region of the quadratic Gaussian two-terminal source-coding problem," IEEE Trans. Inform. Theory, vol. 54, no. 5, pp. 1938–1961, May 2008.

[4] Y. Steinberg, personal communication, Jun. 2009.

[5] Y.-H. Kim, A. Sutivong, and T. M. Cover, "State amplification," IEEE Trans. Inform. Theory, vol. 54, no. 5, pp. 1850–1859, May 2008.

[6] N. Merhav and S. Shamai, "Information rates subject to state masking," IEEE Trans. Inform. Theory, vol. 53, no. 6, pp. 2254–2261, Jun. 2007.

[7] S. P. Kotagiri and J. N. Laneman, "Multiaccess channels with state known to some encoders and independent messages," EURASIP Journal on Wireless Communications and Networking, no. 450680, 2008.

[8] P. Grover and A. Sahai, "Vector Witsenhausen counterexample as assisted interference suppression," International Journal on Systems, Control and Communications (IJSCC), special issue on Information Processing and Decision Making in Distributed Control Systems, to appear, 2010. [Online]. Available: http://www.eecs.berkeley.edu/∼sahai/

[9] H. S. Witsenhausen, "A counterexample in stochastic optimum control," SIAM Journal on Control, vol. 6, no. 1, pp. 131–147, Jan. 1968.

[10] D. Bertsekas, Dynamic Programming. Belmont, MA: Athena Scientific, 1995.

[11] R. Bansal and T. Basar, "Stochastic teams with nonclassical information revisited: When is an affine control optimal?" IEEE Transactions on Automatic Control, vol. 32, pp. 554–559, Jun. 1987.

[Fig. 2: surface plots of the cost ratio versus log₁₀(k) and log₁₀(σ), for (a) the previous lower bound and (b) the new lower bound.]

Fig. 2. The ratio of upper and lower bounds on the total asymptotic cost for the vector Witsenhausen counterexample, with the lower bound taken from [8] in (a) and from Corollary 1 in (b). As compared to the previous best known ratio of 2 [8], the ratio here is smaller than 1.3. Further, an infinite ridge along σ² = (√5 − 1)/2 and small k that is present in the lower bounds of [8] is no longer present here. This is a consequence of the tightness of the lower bound at MMSE = 0, and hence for small k. A ridge remains along k ≈ 1.67 (log₁₀(k) ≈ 0.22) and large σ, and this can be understood by observing Fig. 3 for σ = 10.

[12] J. T. Lee, E. Lau, and Y.-C. L. Ho, "The Witsenhausen counterexample: A hierarchical search approach for nonconvex optimization problems," IEEE Trans. Automat. Contr., vol. 46, no. 3, 2001.

[13] C. H. Papadimitriou and J. N. Tsitsiklis, "Intractable problems in control theory," SIAM Journal on Control and Optimization, vol. 24, no. 4, pp. 639–654, 1986.

[14] P. Grover, A. Sahai, and S. Y. Park, "The finite-dimensional Witsenhausen counterexample," in Proceedings of the Workshop on Control over Communication Channels (ConCom), Jul. 2009.

[15] P. Grover, S. Y. Park, and A. Sahai, "On the generalized Witsenhausen counterexample," in Proceedings of the Allerton Conference on Communication, Control, and Computing, Monticello, IL, Oct. 2009.

[16] R. Etkin, D. Tse, and H. Wang, "Gaussian interference channel capacity to within one bit," IEEE Trans. Inform. Theory, pp. 5534–5562, Dec. 2008.

[17] A. Avestimehr, S. Diggavi, and D. Tse, "A deterministic approach to wireless relay networks," in Proc. of the Allerton Conference on Communications, Control and Computing, Oct. 2007.

[Fig. 3: plots of upper and lower bounds on P versus MMSE for two values of σ.]

Fig. 3. Upper and lower bounds on P vs asymptotic MMSE for σ = √((√5 − 1)/2) (square-root of the golden ratio; (a)) and σ = 10 (b) for zero rate (the vector Witsenhausen counterexample). Tangents are drawn to evaluate the total cost for k = √0.1 for σ = √((√5 − 1)/2), and for k = 1.67 for σ = 10 (slope = −k²). The intercept on the MMSE axis of the tangent provides the respective bound on the total cost. The tangents to the upper bound and the new lower bound almost coincide for small values of k. At k ≈ 1.67 and σ = 10, however, our bound is not significantly better than that in [8], and hence the ridge along k ≈ 1.67 remains in the new ratio plot in Fig. 2.

[18] P. Grover, A. B. Wagner, and A. Sahai, "Information embedding meets distributed control," submitted to IEEE Transactions on Information Theory, Nov. 2009. [Online]. Available: http://www.eecs.berkeley.edu/∼pulkit/InfoEmbedding.pdf

[19] M. H. M. Costa, "Writing on dirty paper," IEEE Trans. Inform. Theory, vol. 29, no. 3, pp. 439–441, May 1983.

[20] "Code for 'Information embedding meets distributed control'." [Online]. Available: http://www.eecs.berkeley.edu/∼pulkit/InformationEmbedding.htm

[Fig. 4: two surface plots of the MMSE ratio versus P (log scale) and log₁₀(σ).]

Fig. 4. Ratio of upper and lower bounds on MMSE vs P and σ at R = 0. Whereas the ratio diverges to infinity with the old lower bound of [8] (top), it is bounded by 1.5 for the new bound (bottom). This is a consequence of the improved tightness of the new bound at small MMSE.

[Fig. 5: plot of MMSE versus rate, showing the linear + DPC upper bound and the new lower bound.]

Fig. 5. Plot of upper and lower bounds on MMSE vs rate for fixed power P = 1 and σ = √((√5 − 1)/2). Higher rates require higher average distortion in the reconstruction of X^m.