MULTI-TERMINAL SECRECY AND SOURCE CODING
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF ELECTRICAL
ENGINEERING
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Yeow-Khiang Chia
December 2011
http://creativecommons.org/licenses/by-nc/3.0/us/
This dissertation is online at: http://purl.stanford.edu/jh787xk7499
© 2011 by Yeow Khiang Chia. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Abbas El-Gamal, Primary Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Thomas Cover
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Itschak Weissman
Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost Graduate Education
This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.
Preface
Network Information Theory is a branch of Information Theory that aims to characterize the fundamental limits of communication in settings with multiple users and multiple sources of information. While a complete theory is still lacking, tools and techniques developed to address research problems in Network Information Theory have started to make an impact on other fields.
In this thesis, we present results in two sub-areas of Network Information Theory. The first area, Information-Theoretic Secrecy, involves an application of Network Information Theory techniques to a secret communication setting, where a message is to be kept hidden from an eavesdropper. We consider two generalizations of the classical wiretap channel, in which a sender wishes to communicate to a receiver over a noisy broadcast channel in the presence of an eavesdropper. The first generalization considers the setting where we have more than one receiver or more than one eavesdropper. The second considers the case where the transmission occurs over a broadcast channel with state, a channel model that can serve as a base model for communicating over the fast fading channels found in wireless communications.
In the second part of this thesis, we turn our attention to the area of multi-terminal
source coding. In particular, we investigate the Cascade Source Coding problem
where a source node has a source sequence that it wishes to send to an intermediate
node over a rate limited link, and also to an end node that the intermediate node
can communicate with over a rate limited link. We investigate the optimum coding
schemes under different assumptions on the side information available at the nodes,
or operational restrictions on the reconstruction process.
Acknowledgements
I am grateful to my adviser Professor Abbas El Gamal, who taught me most of what
I know about Network Information Theory through our interactions, his EE476 class
and his upcoming book on this subject. In addition, he has also taught me most of
what I know about research in general. The breadth and depth of his research work are inspiring, and I hope that I will be able to emulate them someday.
I am also grateful to my co-adviser Professor Tsachy Weissman, who has also
taught me a lot about Information Theory. I had the pleasure of learning about
universal schemes in Information Theory from his EE477 class, and about the rela-
tionship between estimation, mutual information and divergence through our joint
work. I have also benefited much from his group meetings.
Some of the fun times I had in Stanford were in Professor Thomas Cover’s research
seminars, and I thank him for giving me the opportunity to attend his research
seminars, where I got to know many interesting puzzles, facets of Information Theory,
Mathematics, sports and life in general. The insights and ideas I obtained from his
research seminars are too broad to list, but to name one: I think I now know how to gamble in certain situations, such as when faced with the St. Petersburg Paradox.
I have benefited greatly from my interactions with the past and present students
at the Information Systems Laboratory and I thank them for it. In particular, I
would like to thank Idoia Ochoa Alvarez, Himanshu Asnani, Bernd Bandemer, Bob-
bie Glen Chern, Paul Cuff, Shirin Jalali, Young-Han Kim, Joseph Koo, Gowtham
Kumar, Alexandros Manolakos, Taesup Moon, Vinith Misra, Chandra Nair, Albert
No, Haim Permuter, Han-I Su, Kartik Venkat and Lei Zhao. I would also like to
thank Kelly, our group’s administrative assistant, for her efficiency and knowledge in
various administrative matters.
My experience at Stanford has been greatly enriched by several visiting students
and researchers. I would like to thank the visitors, especially Rajiv Soundararajan
and Majid Khormuji, for many interesting discussions. I was also fortunate to be able
to visit Professor Zhang Lin’s group at Tsinghua University for research exchange,
and I would like to thank Professor Zhang Lin and his students for hosting me.
Outside of research, I was fortunate to make some good friends at Stanford and,
in particular, I would like to thank Vijay Chandrasekhar, Issac Lim, Christopher
Philips, Lieven Verslegers, Karthik Vijayraghavan and Yang Wang for their friendship and the times we spent together.
Finally, and most importantly, I thank my wife, Hwee-Hoon, for her constant
encouragement, love and support, without which my journey at Stanford would not
have been possible.
Contents

Preface

Acknowledgements

1 Introduction
  1.1 Information-Theoretic Secrecy
  1.2 Cascade Source Coding

2 Three-Receiver Wiretap Channel
  2.1 Introduction to the 3-Receiver Wiretap Channel
  2.2 Definitions and Problem Setup
    2.2.1 2-Receivers, 1-Eavesdropper
    2.2.2 1-Receiver, 2-Eavesdroppers
  2.3 2-Receiver Wiretap Channel
  2.4 2-Receivers, 1-Eavesdropper Wiretap Channel
    2.4.1 Asymptotic Perfect Secrecy
    2.4.2 2-Receivers, 1-Eavesdropper with Common Message
  2.5 1-Receiver, 2-Eavesdroppers Wiretap Channel
  2.6 Summary of Chapter

3 Wiretap Channel with State
  3.1 Introduction to the Wiretap Channel with Causal State Information
  3.2 Problem Definition
  3.3 Summary of Main Results
  3.4 Proof of Theorem 3.1
  3.5 Proof of Theorem 3.2
  3.6 Summary of Chapter

4 Cascade and Triangular Source Coding
  4.1 Introduction to Cascade and Triangular Source Coding
  4.2 Problem Definition
    4.2.1 Cascade and Triangular Source Coding
    4.2.2 Two-Way Cascade and Triangular Source Coding
  4.3 Main Results
    4.3.1 Cascade Source Coding
    4.3.2 Triangular Source Coding
    4.3.3 Two-Way Cascade Source Coding
    4.3.4 Two-Way Triangular Source Coding
  4.4 Quadratic Gaussian Distortion Case
    4.4.1 Quadratic Gaussian Cascade Source Coding
    4.4.2 Quadratic Gaussian Triangular Source Coding
    4.4.3 Quadratic Gaussian Two-Way Source Coding
  4.5 Triangular Source Coding with a Helper
  4.6 Summary of Chapter

5 Causal Cascade and Triangular Source Coding
  5.1 Introduction to Cascade and Triangular Source Coding with Causality Constraints
  5.2 Problem Definition
    5.2.1 Causal Cascade and Triangular Source Coding with the Same Side Information at the First Two Nodes
    5.2.2 Lossless Causal Cascade Source Coding
  5.3 Causal Cascade and Triangular Source Coding with the Same Side Information at the First Two Nodes
    5.3.1 Causal Cascade Source Coding
    5.3.2 Causal Triangular Source Coding
  5.4 Lossless Causal Cascade Source Coding
  5.5 Summary of Chapter

6 Conclusions

A Proofs for Chapter 2
  A.1 Proof of Lemma 2.1
  A.2 Evaluation for Example
  A.3 Proof of Theorem 2.2
  A.4 Converse for Proposition 2.2
  A.5 Proof of Proposition 2.3
  A.6 Proof of Proposition 2.4

B Proofs for Chapter 3
  B.1 Proof of Proposition 3.1
  B.2 Proof of Proposition 3.2
  B.3 Proof of Proposition 3.3

C Proofs for Chapter 4
  C.1 Achievability Proof of Theorem 4.1
  C.2 Achievability Proof of Theorem 4.2
  C.3 Achievability Proof of Theorem 4.3
  C.4 Achievability Proof of Theorem 4.4
  C.5 Cardinality Bounds
  C.6 Alternative Characterizations of Rate-Distortion Regions in Corollaries 4.1 and 4.2
  C.7 Proof of Converse for Triangular Source Coding with Helper

Bibliography
List of Figures

1.1 Wiretap channel: the encoder wishes to send a message to the decoder so that the decoder receives the message with high probability, while keeping it secret from the eavesdropper. Specifically, we require that $\lim_{n\to\infty} I(M;Z^n)/n = 0$.
1.2 Special case of the setting considered in Chapter 2. Here, we wish to send a confidential message to both decoders 1 ($Y_1^n$) and 2 ($Y_2^n$) while keeping it secret from the eavesdropper.
1.3 Wiretap channel with state
1.4 Cascade source coding setting
2.1 Structure of random sequences in Lemma 2.1. $V^n(l)$ is generated according to $\prod_{i=1}^n p_{V|U}(v_i|u_i)$. Solid arrows represent the sequence pair $(U^n, V^n(L), Z^n)$, while the dotted arrows to $Z^n$ represent the other $V^n$ sequences jointly typical with the $(U^n, Z^n)$ pair. Lemma 2.1 gives an upper bound on the number of $V^n$ sequences that can be jointly typical with a $(U^n, Z^n)$ pair.
2.2 Multilevel broadcast channel
3.1 Wiretap channel with state
3.2 Example
3.3 Encoding in block $j$
4.1 Cascade source coding setting
4.2 Triangular source coding setting
4.3 Setup for two-way cascade source coding
4.4 Setup for two-way triangular source coding
4.5 Extended quadratic Gaussian two-way source coding
4.6 Setup for analysis of achievability of backward rates
4.7 Triangular source coding with a helper
5.1 Triangular source coding setting. If $R_3 = 0$, this setting reduces to the cascade setting.
5.2 Lossless cascade source coding setting
C.1 Cascade source coding setting for the optimization problem in Corollary 4.1. $\hat{X}_1$ and $\hat{X}_2$ are lossy reconstructions of $A + B$.
C.2 Cascade source coding setting for the optimization problem in Corollary 4.2. $\hat{X}_1$ and $\hat{X}_2$ are lossy reconstructions of $X$, and $Z$ is independent of $X$.
Chapter 1
Introduction
Network Information Theory studies the fundamental limits of communications over
a network of nodes. While we still do not have a complete theory, many signifi-
cant advances have been made, and coding techniques developed to address network
information theoretic problems have started to make an impact on other fields.
In this thesis, we present some results in Network Information Theory. As the
title of the thesis suggests, our focus is on two sub-fields in this area: Information-
Theoretic Secrecy and Multi-terminal source coding. In the next two sections, we
give an informal introduction to these two areas and the problems and results that
will be presented in the subsequent chapters.
1.1 Information-Theoretic Secrecy
The first topic, Information Theoretic Secrecy, spanning Chapters 2 and 3, involves an
interesting twist to the traditional Broadcast channel problem [13, Chapters 5 and 8]
considered in Network Information Theory. Instead of transmitting at as high a rate
as possible to both decoders, we consider the problem of transmitting a message to
one decoder while keeping it secret from the second decoder. This problem setting,
known as the wiretap channel and summarized in Figure 1.1, was first introduced
by Wyner [36]. He considered the problem of transmitting a message, M , at as
high a rate as possible to one decoder (the legitimate receiver) while keeping the
information leakage rate, $I(M;Z^n)/n$, to the second decoder, the eavesdropper, as small as possible. Specifically, the requirement is for the information leakage rate to approach zero as the number of channel uses $n$ goes to infinity.
Figure 1.1: Wiretap channel. The encoder maps the message $M$ to $X^n$, the channel $p(y,z|x)$ outputs $Y^n$ at the decoder and $Z^n$ at the eavesdropper, and the decoder recovers $M$ with high probability while the message is kept secret from the eavesdropper. Specifically, we require that $\lim_{n\to\infty} I(M;Z^n)/n = 0$.
In Wyner’s paper, he considered only the degraded broadcast channel, where the
eavesdropper receives a degraded version of the output at the legitimate receiver. His
work was later generalized by Csiszar and Korner, who gave the secrecy capacity
characterization for a general (not necessarily degraded) broadcast channel [7]. In that work, Csiszar and Korner also considered a more general version of the wiretap channel problem, where the aim is to send a common message, $M_0$, to both the legitimate receiver and the eavesdropper and a confidential message, $M_1$, to the legitimate receiver. They measured secrecy by the more general notion of the equivocation that the eavesdropper has about the message $M_1$ given its received signal $Z^n$, i.e., $H(M_1|Z^n)/n$.
In recent years, the area of information theoretic secrecy has gained a considerable amount of attention due to its potential applications in wireless communications, where the assumption of a noisy broadcast channel is natural. There has been a considerable amount of work extending the classical work of Wyner and Csiszar–Korner.
In this thesis, we present two generalizations of the wiretap channel problem.
In Chapter 2, we consider a generalization of the wiretap channel to more than
two receivers. Specifically, we consider a three receivers broadcast channel, where we
need to send one common message to all three receivers and a confidential message
to a subset of the receivers (see Figure 1.2 for an illustration of a special case of
the settings considered). We consider two setups and establish inner bounds for
both settings. The first setting is when the confidential message is to be sent to
two receivers and kept secret from the third receiver. The second setup investigated
is when the confidential message is to be sent to one receiver and kept secret from
the other two receivers. The inner bounds we give use a combination of classical
coding techniques for the wiretap channel as well as more recent coding techniques
for the three receivers broadcast channel problem such as indirect decoding [26].
We also introduce new ideas in the inner bounds, such as structured noise generation, where we generate secrecy using a publicly available superposition codebook. The inner bounds are shown to be strictly larger than straightforward extensions of the classical coding techniques of Wyner and Csiszar–Korner to the three-receiver wiretap channel setting, and are optimal for several classes of three-receiver wiretap channels.
Figure 1.2: Special case of the setting considered in Chapter 2. The encoder maps $M$ to $X^n$ for the channel $p(y_1, y_2, z|x)$; we wish to send the confidential message to both decoders 1 ($Y_1^n$) and 2 ($Y_2^n$) while keeping it secret from the eavesdropper ($Z^n$).
In Chapter 3, we generalize the wiretap channel to consider the model of a wiretap
channel with state (see Figure 1.3), where the state information is available at the
encoder and the decoder. This model of a wiretap channel can serve as a base model
for communicating over fast fading channels where the encoder and decoder have
some means of measuring the channel state information, but the eavesdropper does
not.
Figure 1.3: Wiretap channel with state. The i.i.d. state $S_i \sim p(s)$ governs the channel $p(y,z|x,s)$; the state is available at the encoder and the decoder, but not at the eavesdropper, which observes $Z^n$.
For this setup, we establish a lower bound on the secrecy capacity of the wiretap channel with state information available causally at both the encoder and the decoder. To be slightly more explicit, we assume that the encoder is restricted to output $X_i$ using only the state information up to time $i$ and the message $M$. Our lower
bound is shown to be strictly larger than that proposed by Liu and Chen [23] for the noncausal case. Achievability is proved using a combination of coding
techniques including block Markov coding, Shannon strategy, and key generation from
common state information. The state sequence available at the end of each block is
used to generate a key to enhance the transmission rate of the confidential message
in the following block. An upper bound on the secrecy capacity when the state is
available noncausally at the encoder and the decoder is established and is shown to
coincide with the aforementioned lower bound for several classes of wiretap channels
with state.
1.2 Cascade Source Coding
The next area in Network Information Theory that we will be discussing in this
thesis is in the area of multi-terminal source coding, spanning Chapters 4 and 5.
While this is a classical area of research in Network Information Theory, there are
still a number of interesting open problems, both of practical and theoretical interest.
One such problem is the problem of Cascade Source Coding with side information.
The general setup of the problem is shown in Figure 1.4. We have three random
variables X, Y and Z which are given by Nature according to some joint distribution
$p(x, y, z)$. We assume that we have sequences $X^n$, $Y^n$ and $Z^n$ consisting of $n$ i.i.d. copies of $(X, Y, Z)$, with the source sequence $X^n$ at Node 0, side information $Y^n$ at Node 1, and side information $Z^n$ at Node 2. Using the message $M_1$ from the source node and its own side information $Y^n$, Node 1 wishes to output a reconstruction of the source sequence, $\hat{X}_1^n$, to within a prescribed distortion level $D_1$. At the same time, Node 1 outputs the message $M_2$ for Node 2 so that Node 2 can use its own side information $Z^n$ and $M_2$ to output another reconstruction of the source sequence, $\hat{X}_2^n$, to within a prescribed distortion $D_2$.
Figure 1.4: Cascade source coding setting. Node 0 observes $X^n$ and sends a message at rate $R_1$ to Node 1, which has side information $Y^n$ and outputs $\hat{X}_1^n$; Node 1 sends a message at rate $R_2$ to Node 2, which has side information $Z^n$ and outputs $\hat{X}_2^n$.
Fundamentally, the main question in Cascade Source Coding is the following: how can the intermediate node (Node 1) efficiently summarize the received index $M_1$ and its own side information $Y^n$ for the end node (Node 2), and how should the source node help Node 1 in this regard? The cascade source coding setup is also a basic building
block for a source coding network; progress made in this problem may shed light on
how to code efficiently over a more complicated source coding network. Furthermore,
it also has potential practical applications, such as in peer to peer video compression
and transmission over a network, where each node may have side information, such
as previous video frames, about a video to be sent from the source.
When the side informations at Nodes 1 and 2 are absent, the optimum rate-
distortion region can be characterized [39]. When side information is present, this
problem is in general a difficult one. Nevertheless, several interesting partial results
have appeared in recent years characterizing the rate-distortion region under different
settings (see [33], [8], [27], [16], [15], [31]). In Chapter 4, we focus on cases where
the statistical structure on the source and side informations allows us to show the
optimality of some coding schemes. In particular, we consider the Cascade Source
Coding problem, and a generalized version of the Cascade Source Coding problem, the
Triangular Source Coding problem, where the same side information ($Y^n$) is available at both the source node and Node 1, and the side information available at Node 2 is a degraded version of the side information at the source node and Node 1. We characterize the rate-distortion region for these problems. For the Cascade setup, we show that, at Node 1, decoding and re-binning the codeword sent by the source node for Node 2 is optimal. We then extend our results to the Two Way Cascade
and Triangular setting, where the source node is interested in lossy reconstruction
of the side information at Node 2 via a rate limited link from Node 2 to the source
node. We characterize the rate distortion regions for these settings. Complete explicit
characterizations for all settings are also given in the Quadratic Gaussian case. We
then conclude the chapter with two further extensions: A triangular source coding
problem with a helper, and an extension of our Two Way Cascade setting in the
Quadratic Gaussian case.
In Chapter 5, we consider Cascade and Triangular Source Coding with side in-
formations and causal reconstruction at the end node (Node 2). Here, Node 2 is constrained to reconstruct the symbol $\hat{X}_{2i}$ using only the side information from time 1 up to time $i$. This setup of causal source coding (and reconstruction) was first introduced in [34], and motivation for considering this setting can be found in that paper.
Essentially, this problem is motivated by source coding systems that are constrained
to operate with limited or no delay. Source coding systems with encoder and decoder
delay constraints form a subclass of schemes in the setting proposed by [34], since
their setting only imposes delay restrictions on the decoder. Results and bounds in
that paper therefore serve as bounds on the limits of performance for source coding
systems with delay constraints. Our results in Chapter 5 serve as a partial gener-
alization of the results in that paper to the Cascade and Triangular Source Coding
setting. When the side information at the source and intermediate nodes are the
same, we characterize the rate distortion regions for both the cascade and triangu-
lar source coding problems. The difference between this setting and that found in
Chapter 4 is that the degraded side information assumption is no longer necessary for
characterization of the rate-distortion region. Instead, we impose a causality restric-
tion. We then move on to consider the general cascade setting with causal lossless
reconstruction at the end node. For this setting, we characterize the rate region when
the sources satisfy a positivity condition, or when a Markov chain holds.
Chapter 2
Three-receiver broadcast channels with common and confidential messages
2.1 Introduction to the 3-Receiver Wiretap Channel
As mentioned in Chapter 1, the wiretap channel was first introduced in the seminal
paper by Wyner [36]. He considered a 2-receiver broadcast channel where sender X
wishes to communicate a message to receiver Y while keeping it secret from the other
receiver (eavesdropper) Z. Wyner showed that the secrecy capacity when the channel
to the eavesdropper is a degraded version of the channel to the legitimate receiver is
$$C_s = \max_{p(x)} \bigl(I(X;Y) - I(X;Z)\bigr).$$
The main coding idea is to randomly generate $2^{nI(X;Y)}$ $x^n$ sequences and partition them into $2^{nR}$ message bins, where $R < I(X;Y) - I(X;Z)$. To send a message, a sequence from the message bin is randomly selected and transmitted. The legitimate receiver uniquely decodes the codeword, and hence the message, with high probability, while the message is kept asymptotically secret from the eavesdropper provided $R < C_s$.
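Wyner's formula can be checked numerically on a degraded binary symmetric wiretap channel; the parameters and function names below are illustrative, not from the text. With the legitimate channel a BSC($p_1$) and the eavesdropper's channel a further degraded BSC($p_2$), a uniform input is optimal by symmetry and $C_s = h(p_2) - h(p_1)$ for the binary entropy function $h$:

```python
import numpy as np

def h(p):
    # binary entropy in bits (0 log 0 taken as 0)
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def secrecy_capacity_bsc(p1, p2, grid=10001):
    # degraded binary symmetric wiretap channel: legitimate channel
    # BSC(p1), eavesdropper's channel a degraded BSC(p2), p1 < p2 < 1/2.
    # Sweep the input distribution P(X=1) = a and evaluate
    # C_s = max_a [I(X;Y) - I(X;Z)].
    a = np.linspace(0, 1, grid)
    conv = lambda a, p: a * (1 - p) + (1 - a) * p  # output crossover prob.
    rates = (h(conv(a, p1)) - h(p1)) - (h(conv(a, p2)) - h(p2))
    return rates.max()

p1, p2 = 0.1, 0.25
print(secrecy_capacity_bsc(p1, p2), h(p2) - h(p1))  # both ~ 0.342 bits
```

The grid search and the closed form $h(p_2) - h(p_1)$ agree, in line with the fact that noisier eavesdropper observations yield a larger secrecy capacity.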
This result was extended by Csiszar and Korner [7] to general (non-degraded)
2-receiver broadcast channels with common and confidential messages. They estab-
lished the secrecy capacity region, which is the optimal tradeoff between the common
and private message rates and the eavesdropper’s private message equivocation rate.
In the special case of no common message, their result yields the secrecy capacity for
the general wiretap channel,
$$C_s = \max_{p(v)p(x|v)} \bigl(I(V;Y) - I(V;Z)\bigr).$$
The achievability idea is to use Wyner's wiretap channel coding for the channel from $V$ to $Y$ by randomly selecting a $v^n$ codeword from the message bin and then sending a random sequence $X^n$ generated according to $\prod_{i=1}^n p_{X|V}(x_i|v_i)$.
The work in [7] has been extended in several directions by considering different
message demands and secrecy scenarios, e.g., see [22], [3]. However, with some notable
exceptions such as [18] and [10], extending the result of Csiszar and Korner to general
discrete memoryless broadcast channels with more than two receivers has remained
open, since even the capacity region without secrecy constraints for the 3-receiver
broadcast channel with degraded message sets is not known in general. The secrecy
setup for the 3-receiver broadcast channel also has close connections to the compound
wiretap channel model (see [21, Chapter 3] and references therein). Recently, Nair
and El Gamal [26] showed that the straightforward extension of the Korner–Marton
capacity region for the 2-receiver broadcast channel with degraded message sets to
more than two receivers is not optimal. They established an achievable rate region for
the general 3-receiver broadcast channel and showed that it can be strictly larger than
the straightforward extension of the Korner–Marton region.
In this chapter, we establish inner and outer bounds on the secrecy capacity region of the 3-receiver broadcast channel with common and confidential messages. We consider two setups.
• 2-receivers, 1-eavesdropper: here the confidential message is to be sent to two receivers and kept secret from the third receiver (eavesdropper).
• 1-receiver, 2-eavesdroppers: in this setup the confidential message is to be sent to one receiver and kept secret from the other two receivers.
To illustrate the main coding idea in our new inner bound for the 2-receiver, 1-eavesdropper setup, consider the special case where a message $M \in [1:2^{nR}]$ is to be sent reliably to receivers $Y_1$ and $Y_2$ and kept asymptotically secret from eavesdropper $Z$. A straightforward extension of the Csiszar–Korner [7] result for the 2-receiver wiretap channel yields the lower bound on the secrecy capacity
$$C_S \geq \max_{p(v)p(x|v)} \min\bigl\{I(V;Y_1) - I(V;Z),\; I(V;Y_2) - I(V;Z)\bigr\}. \tag{2.1}$$
Now suppose $Z$ is a degraded version of $Y_1$. Then, from Wyner's wiretap result, we know that $I(V;Y_1) - I(V;Z) \leq I(X;Y_1) - I(X;Z)$ for all $p(v,x)$. However, no such inequality holds in general for the second term under the minimum. As a special case of the inner bound in Theorem 2.1, we show that the rate obtained by replacing $V$ by $X$ only in the first term in (2.1) is achievable; that is, we establish the lower bound
$$C_S \geq \max_{p(v)p(x|v)} \min\bigl\{I(X;Y_1) - I(X;Z),\; I(V;Y_2) - I(V;Z)\bigr\}. \tag{2.2}$$
To prove achievability of (2.2), we again randomly generate $2^{n(I(V;Y_2)-\delta)}$ $v^n$ sequences and partition them into $2^{nR}$ bins, where $R = I(V;Y_2) - I(V;Z)$. For each $v^n$ sequence, we randomly and conditionally independently generate $2^{nI(X;Z|V)}$ $x^n$ sequences. The $v^n$ and $x^n$ sequences are revealed to all parties, including the eavesdropper. To send a message $m$, the encoder randomly chooses a $v^n$ sequence from bin $m$. It then randomly chooses an $x^n$ sequence from the codebook for the selected $v^n$ sequence (instead of randomly generating an $X^n$ sequence as in the Csiszar–Korner scheme) and transmits it. Receiver $Y_2$ decodes $v^n$ directly, while receiver $Y_1$ decodes $v^n$ indirectly through $x^n$ [26]. In Section 2.3, we show through an example that this new lower bound can be strictly larger than the extended Csiszar–Korner lower bound. We then show in Theorem 2.1 that this lower bound can be generalized further via Marton coding.
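The codebook structure of this scheme can be mocked up in a few lines. The following toy sketch (sizes, alphabet, and helper names are invented; real codebooks are exponentially large and decoded via joint typicality) shows only the public superposition codebook and the two-stage randomized selection at the encoder:

```python
import random

def build_codebook(n, num_v, num_bins, sub_size, alphabet=(0, 1)):
    # toy superposition codebook: v-codewords are partitioned into
    # message bins, and each v carries its own subcodebook of
    # x-codewords; everything is public, including to the eavesdropper
    rand_seq = lambda: tuple(random.choice(alphabet) for _ in range(n))
    vs = [rand_seq() for _ in range(num_v)]
    bins = [vs[i::num_bins] for i in range(num_bins)]  # partition the v's
    subcode = {v: [rand_seq() for _ in range(sub_size)] for v in vs}
    return bins, subcode

def encode(m, bins, subcode):
    # randomized encoding: pick a random v in bin m, then a random x
    # from v's subcodebook (codeword selection, not fresh generation)
    v = random.choice(bins[m])
    x = random.choice(subcode[v])
    return v, x

bins, subcode = build_codebook(n=8, num_v=32, num_bins=4, sub_size=4)
v, x = encode(2, bins, subcode)
assert v in bins[2] and x in subcode[v]
```

Secrecy in the actual scheme comes from the rate accounting (the bin and subcodebook sizes), not from hiding the codebook, which is why the sketch deliberately makes every codeword public.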
The rest of the chapter is organized as follows. In the next section we present
needed definitions. In Section 2.3, we provide an alternative proof of achievability for
the Csiszar–Korner 2-receiver wiretap channel that uses superposition coding and ran-
dom codeword selection instead of random generation of the transmitted codeword.
This technique is used in subsequent sections to establish the new inner bounds for
the 3-receiver setups. In Section 2.4, we present the inner bound for the 2-receiver, 1-eavesdropper case. We show that this inner bound is tight for the reversely degraded product broadcast channel and when the eavesdropper is less noisy than both legitimate receivers. In Section 2.5, we present inner and outer bounds for the 1-receiver, 2-eavesdropper setup for the 3-receiver multilevel broadcast channel [4]. We show that the bounds coincide in several special cases.
2.2 Definitions and Problem Setup
Consider a 3-receiver discrete memoryless broadcast channel with input alphabet X ,
output alphabets Y1, Y2, Y3, and conditional pmf p(y1, y2, y3|x). We investigate the
following two setups.
2.2.1 2-Receivers, 1-Eavesdropper

Here the confidential message is to be sent to receivers Y1 and Y2 and is to be kept secret from the eavesdropper Y3 (= Z). A (2^{nR0}, 2^{nR1}, n) code for this scenario consists of: (i) two messages (M0, M1) uniformly distributed over [1 : 2^{nR0}] × [1 : 2^{nR1}]; (ii) an encoder that randomly generates a codeword X^n(m0, m1) according to the conditional pmf p(x^n|m0, m1); and (iii) three decoders; the first decoder assigns to each received sequence y1^n an estimate (M̂01, M̂11) ∈ [1 : 2^{nR0}] × [1 : 2^{nR1}] or an error message, the second decoder assigns to each received sequence y2^n an estimate (M̂02, M̂12) ∈ [1 : 2^{nR0}] × [1 : 2^{nR1}] or an error message, and the third decoder assigns to each received sequence z^n an estimate M̂03 ∈ [1 : 2^{nR0}] or an error message. The probability of error for this scenario is defined as

P_{e1}^{(n)} = P{M̂0j ≠ M0 for j = 1, 2, 3, or M̂1j ≠ M1 for j = 1, 2}.

The equivocation rate at receiver Z, which measures the amount of uncertainty receiver Z has about message M1, is defined as H(M1|Z^n)/n.

A secrecy rate tuple (R0, R1, Re) is said to be achievable if

lim_{n→∞} P_{e1}^{(n)} = 0, and
liminf_{n→∞} (1/n) H(M1|Z^n) ≥ Re.

The secrecy capacity region is the closure of the set of achievable rate tuples (R0, R1, Re).

For this setup, we also consider the special case of asymptotic perfect secrecy, where no common message is to be sent to Z and a confidential message M ∈ [1 : 2^{nR}] is to be sent to Y1 and Y2 only. The probability of error is as defined above with R0 = 0 and R1 = R. A secrecy rate R is said to be achievable if there exists a sequence of (2^{nR}, n) codes such that

lim_{n→∞} P_{e1}^{(n)} = 0, and
liminf_{n→∞} (1/n) H(M|Z^n) ≥ R.

The secrecy capacity, C_S, is the supremum of all achievable rates.
2.2.2 1-Receiver, 2-Eavesdroppers

In this setup, the confidential message is to be sent to receiver Y1 and kept secret from eavesdroppers Y2 = Z2 and Y3 = Z3. A (2^{nR0}, 2^{nR1}, n) code consists of the same message sets and encoding function as in the 2-receiver, 1-eavesdropper case. The first decoder assigns to each received sequence y1^n an estimate (M̂01, M̂1) ∈ [1 : 2^{nR0}] × [1 : 2^{nR1}] or an error message, the second decoder assigns to each received sequence z2^n an estimate M̂02 ∈ [1 : 2^{nR0}] or an error message, and the third decoder assigns to each received sequence z3^n an estimate M̂03 ∈ [1 : 2^{nR0}] or an error message. The probability of error is

P_{e2}^{(n)} = P{M̂0j ≠ M0 for j = 1, 2, 3, or M̂1 ≠ M1}.

The equivocation rates at the two eavesdroppers are H(M1|Z2^n)/n and H(M1|Z3^n)/n, respectively.

A secrecy rate tuple (R0, R1, Re2, Re3) is said to be achievable if

lim_{n→∞} P_{e2}^{(n)} = 0, and
liminf_{n→∞} (1/n) H(M1|Zj^n) ≥ Rej, j = 2, 3.

The secrecy capacity region is the closure of the set of achievable rate tuples (R0, R1, Re2, Re3). For simplicity of presentation, we consider only the special class of multilevel broadcast channels [4].
2.3 2-receiver wiretap channel
We first revisit the 2-receiver wiretap channel, where a confidential message is to
be sent to the legitimate receiver Y and kept secret from the eavesdropper Z. The
secrecy capacity for this case is a special case of the secrecy capacity region for the
broadcast channel with common and confidential messages established in [7].
Proposition 2.1. The secrecy capacity of the 2-receiver wiretap channel is

C_S = max_{p(v,x)} (I(V; Y) − I(V; Z)).
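For a degraded channel, the maximization in Proposition 2.1 can be carried out with the choice V = X. The sketch below is an illustration (not part of the dissertation): it grid-searches over the input distribution for a binary symmetric wiretap channel with assumed crossover probabilities p1 = 0.1 to the legitimate receiver and p2 = 0.2 to the eavesdropper, where the known answer is h(p2) − h(p1), attained by the uniform input:

```python
from math import log2

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_secrecy_rate(q, p1, p2):
    """I(X;Y) - I(X;Z) for X ~ Bernoulli(q), Y = X + N1, Z = X + N2 (mod 2),
    with N1 ~ Bern(p1), N2 ~ Bern(p2); this is the choice V = X."""
    conv = lambda a, b: a * (1 - b) + (1 - a) * b   # binary convolution a * b
    return (h(conv(q, p1)) - h(p1)) - (h(conv(q, p2)) - h(p2))

p1, p2 = 0.1, 0.2
best = max(bsc_secrecy_rate(q / 1000, p1, p2) for q in range(1001))
# best is maximized near q = 1/2, where the rate equals h(p2) - h(p1) ≈ 0.253
```

The grid search is crude but adequate here because the objective is maximized at the uniform input for a degraded binary symmetric wiretap channel.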
In the following, we provide a new proof of achievability for this result in which
the second randomization step in the original proof is replaced by a random codeword
selection from a public superposition codebook. As we will see, this proof technique
allows us to use indirect decoding to establish new inner bounds for the 3-receiver
wiretap channels.
Proof of Achievability for Proposition 2.1:
Fix p(v, x). Randomly and independently generate sequences v^n(l0), l0 ∈ [1 : 2^{nR̃}], each according to ∏_{i=1}^n p_V(v_i). Partition the set [1 : 2^{nR̃}] into 2^{nR} bins B(m) = [(m − 1)2^{n(R̃−R)} + 1 : m 2^{n(R̃−R)}], m ∈ [1 : 2^{nR}]. For each l0 ∈ [1 : 2^{nR̃}], randomly and conditionally independently generate sequences x^n(l0, l1), l1 ∈ [1 : 2^{nR̃1}], each according to ∏_{i=1}^n p_{X|V}(x_i|v_i). The codebook {(v^n(l0), x^n(l0, l1))} is revealed to all parties. To send the message m, an index L0 ∈ B(m) is selected uniformly at random (as in Wyner’s original proof). The encoder then randomly and independently selects an index L1 and transmits x^n(L0, L1). Receiver Y decodes L0 by finding the unique index l0 such that (v^n(l0), y^n) ∈ T_ε^{(n)}. By the law of large numbers and the packing lemma [13, Chapter 3], the average probability of error approaches zero as n → ∞ if

R̃ < I(V; Y) − δ(ε).

We now show that I(M; Z^n|C) ≤ nδ(ε). Considering the mutual information between Z^n and M, averaged over the random codebook C, we have

I(M; Z^n|C) = I(M, L0, L1; Z^n|C) − I(L0, L1; Z^n|M, C)
(a) ≤ I(X^n; Z^n|C) − H(L0, L1|M, C) + H(L0, L1|M, Z^n, C)
≤ ∑_{i=1}^n I(X_i; Z_i|C) − n(R̃ − R) − nR̃1 + H(L0, L1|M, Z^n, C)
≤ nI(X; Z) − n(R̃ + R̃1 − R) + H(L0|M, Z^n, C) + H(L1|L0, Z^n, C).   (2.3)

Step (a) follows since (M, L0, L1, C) → X^n → Z^n form a Markov chain by the discrete memoryless property of the channel. The last step follows since H(Z_i|C) ≤ H(Z_i) = H(Z) and H(Z_i|X_i, C) = ∑_{c, x_i} p(c) p(x_i|c) H(Z|X_i = x_i) = H(Z|X).
It remains to upper bound H(L0|M, Z^n, C) and H(L1|L0, Z^n, C). By symmetry of codebook construction, we have

H(L0|M, Z^n, C) = 2^{−nR} ∑_{m=1}^{2^{nR}} H(L0|M = m, Z^n, C) = H(L0|Z^n, M = 1, C),

H(L1|L0, Z^n, C) = 2^{−nR̃} ∑_{l0} H(L1|L0 = l0, Z^n, C) = H(L1|L0 = 1, v^n(1), Z^n, C).
To further bound these terms, we use the following key lemma.

Lemma 2.1. Let (U, V, Z) ∼ p(u, v, z), S ≥ 0, and ε > 0. Let U^n be a random sequence distributed according to ∏_{i=1}^n p_U(u_i). Let V^n(l), l ∈ [1 : 2^{nS}], be a set of random sequences that are conditionally independent given U^n and each distributed according to ∏_{i=1}^n p_{V|U}(v_i|u_i), and let C = {U^n, V^n(1), V^n(2), . . . , V^n(2^{nS})}. Let L ∈ [1 : 2^{nS}] be a random index with an arbitrary probability mass function. Then, if P{(U^n, V^n(L), Z^n) ∈ T_ε^{(n)}} → 1 as n → ∞ and S > I(V; Z|U) + δ(ε), there exists a δ′(ε) > 0, where δ′(ε) → 0 as ε → 0, such that, for n sufficiently large,

H(L|Z^n, U^n, C) ≤ n(S − I(V; Z|U)) + nδ′(ε).
The proof of this lemma is given in Appendix A.1. An illustration of the random sequence structure is given in Figure 2.1.

Now, returning to (2.3), we note that P{(V^n(L0), X^n(L0, L1), Z^n) ∈ T_ε^{(n)}} → 1 as n → ∞ by the law of large numbers. Hence, we can apply Lemma 2.1 to obtain

H(L0|Z^n, M = 1, C) ≤ n(R̃ − R) − nI(V; Z) + nδ(ε),   (2.4)
H(L1|L0 = 1, V^n, Z^n, C) ≤ n(R̃1 − I(X; Z|V)) + nδ(ε),   (2.5)

if R̃ − R > I(V; Z) + δ(ε) and R̃1 > I(X; Z|V) + δ(ε). Substituting from inequalities (2.4) and (2.5) into (2.3) shows that I(M; Z^n|C) ≤ 2nδ(ε). We then recover the original asymptotic secrecy rate by noting that the constraint R̃1 > I(X; Z|V) + δ(ε) is not tight.

Figure 2.1: Structure of random sequences in Lemma 2.1. V^n(l) is generated according to ∏_{i=1}^n p_{V|U}(v_i|u_i). Solid arrows represent the sequence triple (U^n, V^n(L), Z^n), while the dotted arrows to Z^n represent the other V^n sequences jointly typical with the (U^n, Z^n) pair. Lemma 2.1 gives an upper bound on the number of V^n sequences that can be jointly typical with a (U^n, Z^n) pair.

This completes the proof of Proposition 2.1.
Remark 2.3.1. In the proof of Proposition 2.1 in [7], the encoder transmits a randomly generated codeword X^n ∼ ∏_{i=1}^n p_{X|V}(x_i|v_i). Although replacing random X^n generation by superposition coding and random codeword selection in our alternative proof does not increase the achievable secrecy rate for the 2-receiver wiretap channel, it can increase the rate when there is more than one legitimate receiver, as we show in the next sections.
2.4 2-receivers, 1-eavesdropper wiretap channel
We establish an inner bound on the secrecy capacity for the 3-receiver wiretap channel
with one common and one confidential message when the confidential message is to
be sent to receivers Y1 and Y2 and kept secret from receiver Z. In the following
subsection, we consider the case where M0 = ∅ and M1 = M ∈ [1 : 2^{nR}] is to be kept asymptotically secret from Z. This result is then extended in Subsection 2.4.2
to establish an inner bound on the secrecy capacity region.
2.4.1 Asymptotic perfect secrecy
We establish the following lower bound on secrecy capacity for the case where a
confidential message is to be sent to receivers Y1 and Y2 and kept secret from the
eavesdropper Z.
Theorem 2.1. The secrecy capacity of the 2-receiver, 1-eavesdropper setup with one confidential message and asymptotic secrecy is lower bounded as

C_S ≥ min {I(V0, V1; Y1|Q) − I(V0, V1; Z|Q), I(V0, V2; Y2|Q) − I(V0, V2; Z|Q)}

for some p(q, v0, v1, v2, x) = p(q, v0)p(v1, v2|v0)p(x|v1, v2, v0) such that I(V1, V2; Z|V0) ≤ I(V1; Z|V0) + I(V2; Z|V0) − I(V1; V2|V0).
In addition to superposition coding and the new coding idea discussed in the
previous section, Theorem 2.1 also uses Marton coding [24].
For clarity, we first establish the following Corollary.
Corollary 2.1. The secrecy capacity for the 2-receiver, 1-eavesdropper setup with one confidential message and asymptotic secrecy is lower bounded as follows:

C_S ≥ max_{p(q)p(v|q)p(x|v)} min {I(X; Y1|Q) − I(X; Z|Q), I(V; Y2|Q) − I(V; Z|Q)}.
Remark 2.4.1. Consider the case where X → Y1 → Z form a Markov chain. Then,
we can show that Theorem 2.1 reduces to Corollary 2.1, i.e., the achievable secrecy
rate is not increased by using Marton coding when X → Y1 → Z (or X → Y2 → Z by
symmetry) form a Markov chain. To see this, note that (I(X ; Y1|Q)− I(X ;Z|Q)) ≥
(I(V1, V0; Y1|Q) − I(V1, V0;Z|Q)) for all V1 if X → Y1 → Z. Next, note that we can
set V = (V0, V2) in Corollary 2.1 to obtain the rate in Theorem 2.1.
Proof of Corollary 2.1:
Codebook generation

Randomly and independently generate the time-sharing sequence q^n according to ∏_{i=1}^n p_Q(q_i). Next, randomly and conditionally independently generate sequences v^n(l0), l0 ∈ [1 : 2^{nR̃}], each according to ∏_{i=1}^n p_{V|Q}(v_i|q_i). Partition the set [1 : 2^{nR̃}] into 2^{nR} equal size bins B(m), m ∈ [1 : 2^{nR}]. For each l0, randomly and conditionally independently generate sequences x^n(l0, l1), l1 ∈ [1 : 2^{nR̃1}], each according to ∏_{i=1}^n p_{X|V}(x_i|v_i).

Encoding

To send a message m ∈ [1 : 2^{nR}], randomly and independently choose an index L0 ∈ B(m) and an index L1 ∈ [1 : 2^{nR̃1}], and send x^n(L0, L1).

Decoding

Assume without loss of generality that L0 = 1 and m = 1. Receiver Y2 finds L0, and hence m, via joint typicality decoding. By the law of large numbers and the packing lemma, the probability of error approaches zero as n → ∞ if

R̃ < I(V; Y2|Q) − δ(ε).
Receiver Y1 finds L0 (and hence m) via indirect decoding. That is, it declares that L0 is sent if it is the unique index such that (q^n, v^n(L0), x^n(L0, l1), Y1^n) ∈ T_ε^{(n)} for some l1 ∈ [1 : 2^{nR̃1}]. To analyze the average probability of error P(E), define the error events

E10 = {(Q^n, X^n(1, 1), Y1^n) ∉ T_ε^{(n)}},
E11 = {(Q^n, X^n(l0, l1), Y1^n) ∈ T_ε^{(n)} for some l0 ≠ 1}.

Then, by the union of events bound, the probability of error is upper bounded as

P(E) ≤ P{E10} + P{E11}.

Now, by the law of large numbers, P{E10} → 0 as n → ∞. Next, consider

P{E11} ≤ ∑_{l0≠1} ∑_{l1} P{(Q^n, V^n(l0), X^n(l0, l1), Y1^n) ∈ T_ε^{(n)}}
≤ ∑_{l0≠1} ∑_{l1} 2^{−n(I(V,X;Y1|Q)−δ(ε))}
≤ 2^{n(R̃+R̃1−I(V,X;Y1|Q)+δ(ε))}.

Hence, P{E11} → 0 as n → ∞ if

R̃ + R̃1 < I(X; Y1|Q) − δ(ε).
Analysis of equivocation rate

To bound the equivocation rate term H(M|Z^n, C), we proceed as before and show that I(M; Z^n|C) ≤ 2nδ(ε). Note that the only difference between this case and the analysis for the 2-receiver case in Section 2.3 is the addition of the time-sharing random variable Q. Since P{(Q^n, V^n(L0), X^n(L0, L1), Z^n) ∈ T_ε^{(n)}} → 1 as n → ∞, we can apply Lemma 2.1 (with the addition of the time-sharing random variable). Following the analysis in Section 2.3, it is easy to see that I(M; Z^n|C) ≤ 2nδ(ε) if

R̃ − R > I(V; Z|Q) + δ(ε),
R̃1 > I(X; Z|V) + δ(ε).

Finally, using Fourier–Motzkin elimination on the set of inequalities completes the proof of achievability.
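The Fourier–Motzkin step can be checked numerically. The sketch below is an illustration with made-up values for the mutual-information terms: it brute-forces the outer codebook rate (written Rt for R-tilde), picks the smallest admissible inner rate (Rt1), enforces the two decoding constraints and the two secrecy constraints above, and confirms that the largest feasible message rate R matches the closed form min{I(X;Y1|Q) − I(X;Z|Q), I(V;Y2|Q) − I(V;Z|Q)}, using the chain-rule identity I(X;Z|Q) = I(V;Z|Q) + I(X;Z|V) for the assumed p(q)p(v|q)p(x|v):

```python
# Hypothetical values (bits) for the mutual-information terms.
I_VY2, I_XY1, I_VZ, I_XZgV = 0.8, 1.1, 0.3, 0.4

eps = 1e-9
best_R = 0.0
for i in range(1201):                # grid over the codebook rate Rt in [0, 1.2]
    Rt = i * 0.001
    if Rt >= I_VY2:                  # decoding at Y2: Rt < I(V;Y2|Q)
        continue
    Rt1 = I_XZgV + eps               # smallest Rt1 meeting Rt1 > I(X;Z|V)
    if Rt + Rt1 >= I_XY1:            # decoding at Y1: Rt + Rt1 < I(X;Y1|Q)
        continue
    best_R = max(best_R, Rt - I_VZ)  # secrecy: R < Rt - I(V;Z|Q)

closed_form = min(I_XY1 - (I_VZ + I_XZgV), I_VY2 - I_VZ)
# best_R matches closed_form up to the 0.001 grid resolution
```

Taking Rt1 as small as possible is without loss here, since Rt1 appears only on the constraining side of the Y1 decoding inequality.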
Before proving Theorem 2.1, we show through an example that the lower bound in Corollary 2.1 can be strictly larger than the rate of the straightforward extension of the Csiszar–Korner scheme to the 2-receiver, 1-eavesdropper setting,

R_CK = max_{p(q)p(v|q)p(x|v)} min {I(V; Y1|Q) − I(V; Z|Q), I(V; Y2|Q) − I(V; Z|Q)}.   (2.6)

Note that Theorem 2.1 includes R_CK as a special case (by setting V0 = V1 = V2 = V in Theorem 2.1).
Example: Consider the multilevel product broadcast channel example [26] in Figure 2.2, where X1 = X2 = Y12 = Y21 = {0, 1} and Y11 = Z1 = Z2 = {0, E, 1}. The channel conditional probabilities are specified in Figure 2.2.

Figure 2.2: Multilevel broadcast channel

In Appendix A.2, we show that R_CK < 5/6. In contrast, using Corollary 2.1, we can achieve a rate of 5/6, which shows that the rate given in Theorem 2.1 can be strictly larger than that of the straightforward extension of the Csiszar–Korner scheme.
We now turn to the proof of Theorem 2.1, which utilizes Marton coding in addition
to the ideas already introduced.
Proof of Theorem 2.1:
Codebook generation
Randomly and independently generate a time-sharing sequence q^n according to ∏_{i=1}^n p_Q(q_i). Randomly and conditionally independently generate sequences v0^n(l0), l0 ∈ [1 : 2^{nS0}], each according to ∏_{i=1}^n p_{V0|Q}(v_{0i}|q_i). Partition the set [1 : 2^{nS0}] into 2^{nR} bins B(m), m ∈ [1 : 2^{nR}], as before. For each l0, randomly and conditionally independently generate sequences v1^n(l0, t1), t1 ∈ [1 : 2^{nT1}], each according to ∏_{i=1}^n p_{V1|V0}(v_{1i}|v_{0i}). Partition the set [1 : 2^{nT1}] into 2^{nR1} equal size bins B(l0, l1). Similarly, for each l0, generate sequences v2^n(l0, t2), t2 ∈ [1 : 2^{nT2}], each according to ∏_{i=1}^n p_{V2|V0}(v_{2i}|v_{0i}), and partition [1 : 2^{nT2}] into 2^{nR2} equal size bins B(l0, l2).
Encoding
To send message m, the encoder first randomly chooses an index L0 ∈ B(m). It then randomly chooses a pair of product bin indices (L1, L2) and selects a jointly typical sequence pair (v1^n(L0, t1(L0, L1)), v2^n(L0, t2(L0, L2))) in the product bin. If there is more than one such pair, it picks one of them uniformly at random. An error event occurs if no such pair is found, in which case the encoder picks the indices t1 ∈ B(L0, L1) and t2 ∈ B(L0, L2) uniformly at random. This encoding step succeeds with probability of error that approaches zero as n → ∞ if [14]

R1 + R2 < T1 + T2 − I(V1; V2|V0) − δ(ε).

Finally, the encoder generates a codeword X^n at random according to ∏_{i=1}^n p_{X|V0,V1,V2}(x_i|v_{0i}, v_{1i}, v_{2i}) and transmits it.
Decoding and analysis of the probability of error
Receiver Y1 decodes L0, and hence m, indirectly by finding the unique index l0 such that (v0^n(l0), v1^n(l0, t1), y1^n) ∈ T_ε^{(n)} for some t1 ∈ [1 : 2^{nT1}]. Similarly, receiver Y2 finds L0 (and hence m) indirectly by finding the unique index l0 such that (v0^n(l0), v2^n(l0, t2), y2^n) ∈ T_ε^{(n)} for some t2 ∈ [1 : 2^{nT2}]. Following the analysis given earlier, it is easy to see that these steps succeed with probability of error that approaches zero as n → ∞ if

S0 + T1 < I(V0, V1; Y1|Q) − δ(ε),
S0 + T2 < I(V0, V2; Y2|Q) − δ(ε).
Analysis of equivocation rate

A codebook C induces a joint pmf on (M, L0, L1, L2, V0^n, V1^n, V2^n, Z^n) of the form

p(m, l0, l1, l2, v0^n, v1^n, v2^n, z^n|c) = 2^{−n(R+R1+R2)} p(v0^n, v1^n, v2^n|l0, l1, l2, c) ∏_{i=1}^n p_{Z|V0,V1,V2}(z_i|v_{0i}, v_{1i}, v_{2i}).

We again analyze the mutual information between M and (Z^n, Q^n), averaged over codebooks:

I(M; Z^n, Q^n|C) = I(M; Z^n|Q^n, C)
= I(T1(L0, L1), T2(L0, L2), L0, M; Z^n|Q^n, C) − I(T1(L0, L1), T2(L0, L2), L0; Z^n|M, Q^n, C)
≤ I(V0^n, V1^n, V2^n; Z^n|Q^n, C) − I(L0; Z^n|M, Q^n, C) − I(T1(L0, L1), T2(L0, L2); Z^n|L0, Q^n, C)
≤ nI(V0, V1, V2; Z|Q) − H(L0|M, Q^n, C) + H(L0|M, Q^n, Z^n, C) − I(T1(L0, L1), T2(L0, L2); Z^n|L0, Q^n, C) + nδ(ε).   (2.7)
In the last step, we bound the term I(V0^n, V1^n, V2^n; Z^n|Q^n, C) by the following argument, which is an extension of a similar argument in [36]. For simplicity of notation, let V = (V0, V1, V2). We wish to show that I(V^n; Z^n|Q^n, C) ≤ nI(V; Z) + nδ(ε). Note that X is generated according to p(x_i|v_i). Define E = 1 if (q^n, v^n, z^n) are not jointly typical and E = 0 otherwise, and let N(v) = |{i : V_i = v}|. Then,

I(V^n; Z^n|C, Q^n) ≤ 1 + P(E = 0) I(V^n; Z^n|C, E = 0, Q^n) + P(E = 1) I(V^n; Z^n|C, E = 1, Q^n)
≤ 1 + P(E = 0) I(V^n; Z^n|C, E = 0, Q^n) + P(E = 1) n log |Z| − P(E = 1) H(Z^n|C, V^n, Q^n, E = 1)
≤ 1 + P(E = 0)(H(Z^n|C, E = 0, Q^n) − H(Z^n|C, V^n, Q^n, E = 0)) + P(E = 1) n log |Z|.
Note that H(Z^n|C, E = 0, Q^n) ≤ nH(Z|Q) + nδ(ε). For H(Z^n|C, V^n, Q^n, E = 0) = H(Z^n|C, V^n, E = 0), we have

H(Z^n|C, E = 0, V^n) ≥ ∑_{c, v^n ∈ T_ε^{(n)}} P(V^n = v^n, C = c) H(Z^n|C = c, V^n = v^n)
= ∑_{c, v^n ∈ T_ε^{(n)}} P(V^n = v^n, C = c) ( ∑_{i=1}^n H(Z_i|C = c, V^n = v^n, Z^{i−1}) )
(a)= ∑_{c, v^n ∈ T_ε^{(n)}} P(V^n = v^n, C = c) ∑_{i=1}^n H(Z_i|V_i = v_i)
= ∑_{c, v^n ∈ T_ε^{(n)}} P(V^n = v^n, C = c) ∑_{v ∈ V} N(v) H(Z|V = v)
(b)≥ ∑_{c, v^n ∈ T_ε^{(n)}} P(V^n = v^n, C = c) ( ∑_{v ∈ V} n(p(v) − δ(ε)) H(Z|V = v) )
≥ nH(Z|V) − nδ′(ε),

where (a) follows since, given V_i, X_i is generated randomly according to p(x_i|v_i) and, since the channel is memoryless, Z_i is independent of all other random variables, and (b) follows since v^n is typical, which implies that N(v) ≥ np(v) − nδ(ε). Finally, since the coding scheme satisfies the encoding constraints, the proof is completed by noting that P(E = 1) → 0 as n → ∞ by the law of large numbers and the mutual covering lemma [13, Chapter 9].
We now bound each of the remaining terms in inequality (2.7) separately. Note that

H(L0|M, Q^n, C) = n(S0 − R),   (2.8)
H(L0|M, Q^n, Z^n, C) ≤ n(S0 − R − I(V0; Z|Q)) + nδ(ε),   (2.9)

where (2.9) follows by similar steps to the proof of Corollary 2.1 and an application of Lemma 2.1, which holds if P{(Q^n, V0^n(L0), Z^n) ∈ T_ε^{(n)}} → 1 as n → ∞ and S0 − R ≥ I(V0; Z|Q) + δ(ε). The first condition follows since

P{(Q^n, V0^n(L0), V1^n(L0, T1(L0, L1)), V2^n(L0, T2(L0, L2)), Z^n) ∈ T_ε^{(n)}} → 1

as n → ∞. Next, consider
I(T1(L0, L1), T2(L0, L2); Z^n|L0, Q^n, C)
= H(T1(L0, L1), T2(L0, L2)|L0, Q^n, C) − H(T1(L0, L1), T2(L0, L2)|L0, Q^n, Z^n, C)
(a)= H(T1(L0, L1), T2(L0, L2), L1, L2|L0, Q^n, C) − H(T1(L0, L1), T2(L0, L2)|L0, Q^n, Z^n, C)
≥ H(L1, L2|L0, Q^n, C) − H(T1(L0, L1)|L0, Q^n, Z^n, C) − H(T2(L0, L2)|L0, Q^n, Z^n, C),   (2.10)

where (a) holds since, given the codebook C and L0, (L1, L2) is a deterministic function of (T1(L0, L1), T2(L0, L2)). Now,

H(L1, L2|L0, Q^n, C) = n(R1 + R2),   (2.11)
H(T1(L0, L1)|L0, Q^n, Z^n, C) ≤ n(T1 − I(V1; Z|V0) + δ(ε)),   (2.12)
H(T2(L0, L2)|L0, Q^n, Z^n, C) ≤ n(T2 − I(V2; Z|V0) + δ(ε)),   (2.13)

where (2.12) and (2.13) come from the following analysis. First, consider

H(T1(L0, L1)|L0, Q^n, Z^n, C) = H(T1(L0, L1)|v0^n(L0), Q^n, Z^n, L0, C) ≤ H(T1(L0, L1)|V0^n, Z^n, C).

We now upper bound the term H(T1(L0, L1)|V0^n, Z^n, C).
Since

P{(Q^n, V0^n(L0), V1^n(L0, T1(L0, L1)), V2^n(L0, T2(L0, L2)), Z^n) ∈ T_ε^{(n)}} → 1

as n → ∞, we have P{(V0^n(L0), V1^n(L0, T1(L0, L1)), Z^n) ∈ T_ε^{(n)}} → 1 as n → ∞. We can therefore apply Lemma 2.1 to obtain

H(T1(L0, L1)|L0, Q^n, Z^n, C) ≤ n(T1 − I(V1; Z|V0)) + nδ(ε)

if T1 > I(V1; Z|V0) + δ(ε). The term H(T2(L0, L2)|L0, Q^n, Z^n, C) can be bounded using the same steps to give

H(T2(L0, L2)|L0, Q^n, Z^n, C) ≤ n(T2 − I(V2; Z|V0)) + nδ(ε)

if T2 > I(V2; Z|V0) + δ(ε).
Substituting from (2.11), (2.12), and (2.13) into (2.10) yields

I(T1(L0, L1), T2(L0, L2); Z^n|L0, Q^n, C) ≥ n(R1 + R2) − n(T1 − I(V1; Z|V0) + δ(ε)) − n(T2 − I(V2; Z|V0) + δ(ε)).   (2.14)

Substituting inequality (2.14), together with (2.8) and (2.9), into (2.7) then yields

I(M; Z^n|Q^n, C) ≤ n(I(V1, V2; Z|V0) + T1 + T2 − R1 − R2) − n(I(V1; Z|V0) + I(V2; Z|V0)) + 3nδ(ε).

Hence, I(M; Z^n|Q^n, C) ≤ 6nδ(ε) if

I(V1, V2; Z|V0) + T1 + T2 − R1 − R2 − I(V1; Z|V0) − I(V2; Z|V0) ≤ 3δ(ε).

In summary, the rate constraints arising from the analysis of equivocation are

S0 − R > I(V0; Z|Q),
T1 > I(V1; Z|V0),
T2 > I(V2; Z|V0),
T1 + T2 − R1 − R2 ≤ I(V1; Z|V0) + I(V2; Z|V0) − I(V1, V2; Z|V0).

Applying Fourier–Motzkin elimination gives

R < I(V0, V1; Y1|Q) − I(V0, V1; Z|Q),
R < I(V0, V2; Y2|Q) − I(V0, V2; Z|Q),
2R < I(V0, V1; Y1|Q) + I(V0, V2; Y2|Q) − 2I(V0; Z|Q) − I(V1; V2|V0)

for some p(q, v0, v1, v2, x) = p(q, v0)p(v1, v2|v0)p(x|v1, v2, v0) such that I(V1, V2; Z|V0) ≤ I(V1; Z|V0) + I(V2; Z|V0) − I(V1; V2|V0).

The proof of Theorem 2.1 is then completed by observing that the third inequality is redundant. This is seen by summing the first two inequalities to yield

2R < I(V0, V1; Y1|Q) − I(V0, V1; Z|Q) + I(V0, V2; Y2|Q) − I(V0, V2; Z|Q)
= I(V0, V1; Y1|Q) + I(V0, V2; Y2|Q) − 2I(V0; Z|Q) − I(V1; Z|V0) − I(V2; Z|V0).

This inequality is at least as tight as the third inequality because of the constraint on the pmf. This completes the proof of Theorem 2.1.
Special Cases:

We consider several special cases in which the inner bound in Theorem 2.1 is tight.

Reversely Degraded Product Broadcast Channel

As an example of Theorem 2.1, consider the reversely degraded product broadcast channel with sender X = (X1, X2, . . . , Xk), receivers Yj = (Yj1, Yj2, . . . , Yjk) for j = 1, 2, 3, and conditional probability mass function p(y1, y2, z|x) = ∏_{l=1}^k p(y1l, y2l, zl|xl). In [18], the following lower bound on secrecy capacity is established:

C_S ≥ min_{j∈{1,2}} ∑_{l=1}^k [I(Ul; Yjl) − I(Ul; Zl)]+   (2.15)

for some p(u1, . . . , uk, x) = ∏_{l=1}^k p(ul)p(xl|ul). Furthermore, this lower bound is shown to be optimal when the channel is reversely degraded (with Ul = Xl), i.e., when each sub-channel is degraded, but not necessarily in the same order. We can show that this result is a special case of Theorem 2.1. Define the sets of l indexes C = {l : I(Ul; Y1l) − I(Ul; Zl) ≥ 0, I(Ul; Y2l) − I(Ul; Zl) ≥ 0}, A = {l : I(Ul; Y1l) − I(Ul; Zl) ≥ 0}, and B = {l : I(Ul; Y2l) − I(Ul; Zl) ≥ 0}. Now, setting V0 = {Ul : l ∈ C}, V1 = {Ul : l ∈ A}, and V2 = {Ul : l ∈ B} in the rate expression of Theorem 2.1 yields (2.15). Note that the constraint in Theorem 2.1 is satisfied for this choice of auxiliary random variables. The expanded equations are as follows:

I(V1, V2; Z|V0) = I(U_A, U_B; Z|U_C) = I(U_{A\C}, U_{B\C}; Z|U_C) = I(U_{A\C}; Z_{A\C}) + I(U_{B\C}; Z_{B\C}) = I(V1; Z|V0) + I(V2; Z|V0),
I(V0, V1; Y1) − I(V0, V1; Z) = I(U_A; Y_{1,A}) − I(U_A; Z_A),
I(V0, V2; Y2) − I(V0, V2; Z) = I(U_B; Y_{2,B}) − I(U_B; Z_B),
I(V1; V2|V0) = I(U_{A\C}; U_{B\C}) = 0.
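As a concrete, hypothetical instance of (2.15), the block below takes k = 2 erasure sub-channels with Ul = Xl uniform and assumed erasure probabilities: on sub-channel 1 the legitimate receivers see BEC(0.1) and BEC(0.3) while Z sees BEC(0.5), and on sub-channel 2 the order is reversed, with Z seeing BEC(0.1). With uniform binary input, I(X; output) = 1 − e for a BEC(e), so each bracket in (2.15) reduces to a clipped difference of erasure probabilities:

```python
# Assumed erasure probabilities e[j][l]: receiver j, sub-channel l in {0, 1}.
e = {
    "Y1": [0.1, 0.4],
    "Y2": [0.3, 0.2],
    "Z":  [0.5, 0.1],
}

def mi(erasure):
    """I(X; output) of a BEC with uniform binary input, in bits."""
    return 1.0 - erasure

def lower_bound(e):
    """Evaluate (2.15) with U_l = X_l: min over the legitimate receivers of
    the sum of per-sub-channel positive parts [I(X;Y_jl) - I(X;Z_l)]+."""
    sums = []
    for j in ("Y1", "Y2"):
        sums.append(sum(max(0.0, mi(e[j][l]) - mi(e["Z"][l])) for l in range(2)))
    return min(sums)

# Sub-channel 1 favors the legitimate receivers and sub-channel 2 favors Z,
# so only sub-channel 1 contributes: min{0.4, 0.2} = 0.2 bits.
```

The clipping at zero reflects that a sub-channel on which the eavesdropper is stronger contributes nothing, rather than a negative amount, to the achievable secrecy rate.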
Receivers Y1 and Y2 are less noisy than Z

Recall that in a 2-receiver broadcast channel, a receiver Y is said to be less noisy [20] than a receiver Z if I(U; Y) ≥ I(U; Z) for all p(u, x). In this case, we have

C_S = max_{p(x)} min {I(X; Y1) − I(X; Z), I(X; Y2) − I(X; Z)}.

To show achievability, we set Q = ∅ and V0 = V1 = V2 = X in Theorem 2.1. The converse follows similar steps to the converse for Proposition 2.2 in Subsection 2.4.2, given in Appendix A.4, and we omit it here.
2.4.2 2-Receivers, 1-Eavesdropper with Common Message
As a generalization of Theorem 2.1, consider the setting with both common and
confidential messages, where we are interested in achieving some equivocation rate
for the confidential message rather than asymptotic secrecy. For this setting we can
establish the following inner bound on the secrecy capacity region.
Theorem 2.2. An inner bound to the secrecy capacity region of the 2-receiver, 1-eavesdropper broadcast channel with one common and one confidential message is given by the set of rate tuples (R0, R1, Re) such that

R0 < I(U; Z),
R0 + R1 < I(U; Z) + min {I(V0, V1; Y1|U) − I(V1; Z|V0), I(V0, V2; Y2|U) − I(V2; Z|V0)},
R0 + R1 < min {I(V0, V1; Y1) − I(V1; Z|V0), I(V0, V2; Y2) − I(V2; Z|V0)},
Re ≤ R1,
Re < min {I(V0, V1; Y1|U) − I(V0, V1; Z|U), I(V0, V2; Y2|U) − I(V0, V2; Z|U)},
R0 + Re < min {I(V0, V1; Y1) − I(V1, V0; Z|U), I(V0, V2; Y2) − I(V2, V0; Z|U)},
R0 + 2Re < I(V0, V1; Y1) + I(V0, V2; Y2|U) − I(V1; V2|V0) − 2I(V0; Z|U),
R0 + 2Re < I(V0, V2; Y2) + I(V0, V1; Y1|U) − I(V1; V2|V0) − 2I(V0; Z|U),
R0 + R1 + 2Re < I(V0, V2; Y2|U) − I(V2; Z|V0) + I(V0, V1; Y1) + I(V0, V2; Y2|U) − I(V1; V2|V0) − 2I(V0; Z|U),
R0 + R1 + 2Re < I(V0, V1; Y1|U) − I(V1; Z|V0) + I(V0, V2; Y2) + I(V0, V1; Y1|U) − I(V1; V2|V0) − 2I(V0; Z|U)

for some p(u, v0, v1, v2, x) = p(u)p(v0|u)p(v1, v2|v0)p(x|v0, v1, v2) such that I(V1, V2; Z|V0) ≤ I(V1; Z|V0) + I(V2; Z|V0) − I(V1; V2|V0).
Note that if we discard the equivocation rate constraints and set V0 = V1 = V2 = X, this inner bound reduces to the straightforward extension of the Korner–Marton degraded message set capacity region to the 3-receiver case [26, Corollary 1].

If we take V0 = V1 = V2 = V and Y1 = Y2 = Y, then we obtain the region consisting of all rate tuples (R0, R1, Re) such that

R0 < I(U; Z),   (2.16)
R0 + R1 < I(U; Z) + I(V; Y|U),
R0 + R1 < I(V; Y),
Re ≤ R1,
Re < I(V; Y|U) − I(V; Z|U),
R0 + Re < I(V; Y) − I(V; Z|U)   (2.17)

for some p(u, v, x) = p(u)p(v|u)p(x|v).
This region provides an equivalent characterization of the secrecy capacity region
of the 2-receiver broadcast channel with confidential messages [7]. To see this, note
that if we tighten the first inequality to R0 ≤ min{I(U ;Z), I(U ; Y )}, the last in-
equality becomes redundant and the region reduces to the original characterization
in [7].
Proof of Theorem 2.2:
The proof of Theorem 2.2 involves rate splitting for R1 (= R′1 + R′′1). We first establish an inner bound without rate splitting. The proof with rate splitting is given in Appendix A.3.

Codebook generation

Fix p(u, v0, v1, v2, x) and let Rr ≥ 0 be such that R1 − Re + Rr ≥ I(V0; Z|U) + δ(ε). Randomly and independently generate sequences u^n(m0), m0 ∈ [1 : 2^{nR0}], each according to ∏_{i=1}^n p_U(u_i). For each m0, randomly and conditionally independently generate sequences v0^n(m0, m1, mr), (m1, mr) ∈ [1 : 2^{nR1}] × [1 : 2^{nRr}], each according to ∏_{i=1}^n p_{V0|U}(v_{0i}|u_i). For each (m0, m1, mr), generate sequences v1^n(m0, m1, mr, t1), t1 ∈ [1 : 2^{nT1}], each according to ∏_{i=1}^n p_{V1|V0}(v_{1i}|v_{0i}), and partition the set [1 : 2^{nT1}] into 2^{nR̃1} equal size bins B(m0, m1, mr, l1). Similarly, for each (m0, m1, mr), randomly generate sequences v2^n(m0, m1, mr, t2), t2 ∈ [1 : 2^{nT2}], each according to ∏_{i=1}^n p_{V2|V0}(v_{2i}|v_{0i}), and partition the set [1 : 2^{nT2}] into 2^{nR̃2} equal size bins B(m0, m1, mr, l2).
Encoding
To send a message pair (m0, m1), the encoder first chooses a random index mr ∈ [1 : 2^{nRr}] and then the sequence pair (u^n(m0), v0^n(m0, m1, mr)). It then randomly chooses a pair of product bin indices (L1, L2) and selects a jointly typical sequence pair (v1^n(m0, m1, mr, t1(L1)), v2^n(m0, m1, mr, t2(L2))) in it. If there is more than one such pair, it picks a pair uniformly at random from the set of jointly typical pairs. An error event occurs if no such pair is found, in which case the encoder picks the indices t1 ∈ B(m0, m1, mr, L1) and t2 ∈ B(m0, m1, mr, L2) uniformly at random. As with Theorem 2.1, the probability of error approaches zero as n → ∞ if

R̃1 + R̃2 < T1 + T2 − I(V1; V2|V0) − δ(ε).

Finally, it generates a codeword X^n at random according to ∏_{i=1}^n p_{X|V0,V1,V2}(x_i|v_{0i}, v_{1i}, v_{2i}).
Decoding and analysis of the probability of error
Receiver Y1 finds (m0, m1) indirectly by looking for the unique (m0, l0) such that
(un(m0), vn0 (m0, l0), v
n1 (m0, l0, l1)) ∈ T
(n)ǫ for some l1 ∈ [1 : 2nT1]. Similarly, receiver
Y2 finds (m0, m1) indirectly by looking for the unique (m0, l0) such that
(un(m0), vn0 (m0, l0), v
n1 (m0, l0, l2)) ∈ T
(n)ǫ for some l2 ∈ [1 : 2nT2]. Receiver Z finds m0
directly by decoding U . These steps succeed with probability of error approaching
zero as n → ∞ if
R0 +R1 + T1 +Rr < I(V0, V1; Y1)− δ(ǫ),
R1 + T1 +Rr < I(V0, V1; Y1|U)− δ(ǫ),
R0 +R1 + T2 +Rr < I(V0, V1; Y2)− δ(ǫ),
R1 + T2 +Rr < I(V0, V1; Y2|U)− δ(ǫ),
R0 < I(U ;Z)− δ(ǫ).
Analysis of equivocation rate
We consider the equivocation rate averaged over codes. We will show that a part of
the message M1p can be kept asymptotically secret from the eavesdropper as long as
rate constraints on Re and R1 are satisfied. Let R1 = R1p +R1c and Re = R1p.
H(M1|Zn, C) ≥ H(M1p|Z
n,M0, C)
= H(M1p)− I(M1p;Zn|M0, C) (2.18)
(a)
≥ H(M1p)− 3nδ(ǫ)
= n(R1 − I(V0;Z|U))− 3nδ(ǫ).
This implies that Re ≤ R1 − I(V0;Z|U)− 3δ(ǫ) is achievable.
To prove step (a), consider

I(M1p; Z^n|M0, C) = I(T1(L1), T2(L2), M1p, M1c, Mr; Z^n|M0, C) − I(T1(L1), T2(L2), M1c, Mr; Z^n|M1p, M0, C)
(b)≤ I(V0^n, V1^n, V2^n; Z^n|M0, C) − I(M1c, Mr; Z^n|M1p, M0, C) − I(T1(L1), T2(L2); Z^n|M1, M0, Mr, C)
(c)≤ I(V0^n, V1^n, V2^n; Z^n|U^n, C) − I(M1c, Mr; Z^n|M1p, M0, C) − I(T1(L1), T2(L2); Z^n|M1, M0, Mr, C)
≤ nI(V0, V1, V2; Z|U) + nδ(ε) − H(M1c, Mr|M1p, U^n, C) + H(M1c, Mr|M1p, M0, Z^n, C) − I(T1(L1), T2(L2); Z^n|M1, M0, Mr, C)
≤ nI(V0, V1, V2; Z|U) + nδ(ε) − n(R1 − Re + Rr) + H(M1c, Mr|M1p, M0, Z^n, C) − I(T1(L1), T2(L2); Z^n|M1, M0, Mr, C),

where (b) follows by the data processing inequality and (c) follows from the observation that U^n is a function of (C, M0) and (C, M0) → (C, U^n, V^n) → Z^n. Following the analysis of the equivocation rate terms in Theorem 2.1 and using Lemma 2.1, the remaining terms can be bounded as

H(M1c, Mr|M1p, M0, Z^n, C) ≤ H(M1c, Mr|M1p, U^n, Z^n, C)
≤ n(R1 − Re + Rr) − nI(V0; Z|U) + nδ(ε),

I(T1(L1), T2(L2); Z^n|M1, M0, Mr, C) = H(T1(L1), T2(L2)|M1, M0, Mr, C) − H(T1(L1), T2(L2)|M1, M0, Mr, C, Z^n)
(a)≥ n(R̃1 + R̃2) − H(T1(L1), T2(L2)|M1, M0, Mr, C, Z^n)
(b)= n(R̃1 + R̃2) − H(T1(L1), T2(L2)|Mr, M1, M0, V0^n, C, Z^n)
≥ n(R̃1 + R̃2) − H(T1(L1), T2(L2)|V0^n, Z^n, C)
≥ n(R̃1 + R̃2 − T1 − T2) + n(I(V1; Z|V0) + I(V2; Z|V0)) − 2nδ(ε),

if T1 > I(V1; Z|V0) + δ(ε) and T2 > I(V2; Z|V0) + δ(ε). Step (a) follows by the same observation as in the proof of Theorem 2.1, i.e., given (M1, M0, Mr, C), (L1, L2) is a deterministic function of (T1(L1), T2(L2)). Step (b) follows from the observation that V0^n is a function of (C, M0, M1, Mr).
Thus, we have

I(M1p; Z^n|M0, C) ≤ n(I(V1, V2; Z|V0) − I(V1; Z|V0) − I(V2; Z|V0) + T1 + T2 − R̃1 − R̃2) + 4nδ(ε).

Hence, I(M1p; Z^n|M0, C) ≤ 7nδ(ε) if

I(V1, V2; Z|V0) + T1 + T2 − R̃1 − R̃2 − I(V1; Z|V0) − I(V2; Z|V0) ≤ 3δ(ε).

Substituting back into (2.18) shows that

H(M1|Z^n, C) ≥ n(R1 − I(V0; Z|U)) − 5nδ(ε).

The rate constraints due to equivocation are

Re ≤ R1,
Rr ≥ 0,
R1 − Re + Rr > I(V0; Z|U),
T1 > I(V1; Z|V0),
T2 > I(V2; Z|V0),
T1 + T2 − R̃1 − R̃2 ≤ I(V1; Z|V0) + I(V2; Z|V0) − I(V1, V2; Z|V0).

Using Fourier–Motzkin elimination then gives an inner bound for the case without rate splitting. The proof with rate splitting on R1 is given in Appendix A.3.
Special Case:

We show that the inner bound in Theorem 2.2 is tight when both Y1 and Y2 are less noisy than Z.

Proposition 2.2. When both Y1 and Y2 are less noisy than Z, the 2-receiver, 1-eavesdropper secrecy capacity region is the set of rate tuples (R0, R1, Re) such that

R0 ≤ I(U; Z),
R1 ≤ min{I(X; Y1|U), I(X; Y2|U)},
Re ≤ [min {R1, I(X; Y1|U) − I(X; Z|U), I(X; Y2|U) − I(X; Z|U)}]+

for some p(u, x), where [x]+ = x if x ≥ 0 and 0 otherwise.

Achievability follows by setting V0 = V1 = V2 = X in Theorem 2.2 and using the fact that Y1 and Y2 are less noisy than Z, which allows us to assume without loss of generality that R0 ≤ min{I(U; Z), I(U; Y1), I(U; Y2)}. The set of inequalities then reduces to

R0 < I(U; Z),
R0 + R1 < I(U; Z) + min{I(X; Y1|U), I(X; Y2|U)},
Re ≤ R1,
Re < min {I(X; Y1|U) − I(X; Z|U), I(X; Y2|U) − I(X; Z|U)}.

Since the region in Proposition 2.2 is a subset of the above region, we have established the achievability part of the proof. Achievability in this case, however, is a straightforward extension of Csiszar and Korner and does not require Marton coding. For the converse, we use the identification Ui = (M0, Z^{i−1}). With this identification, the R0 inequality follows trivially. The R1 and Re inequalities follow from standard methods and a technique in [26, Proposition 11]. The details are given in Appendix A.4.
34
CHAPTER 2. 3 RECEIVERS WIRETAP CHANNEL 35
2.5 1-receiver, 2-eavesdroppers wiretap channel
We now consider the case where the confidential message M1 is to be sent only to
Y1 and kept hidden from the eavesdroppers Z2 and Z3. All three receivers Y1, Z2, Z3
require a common message M0. For simplicity, we only consider the special case of
multilevel broadcast channel [4], where p(y1, z2, z3|x) = p(y1, z3|x)p(z2|y1). In [26], it
was shown that the capacity region (without secrecy) is the set of rate pairs (R0, R1)
such that
R0 < min{I(U ;Z2), I(U3;Z3)},
R1 < I(X ; Y1|U),
R0 +R1 < I(U3;Z3) + I(X ; Y1|U3)
for some p(u)p(u3|u)p(x|u3). We extend this result to obtain inner and outer bounds
on the secrecy capacity region.
Proposition 2.3. An inner bound to the secrecy capacity region of the 1-receiver,
2-eavesdropper multilevel broadcast channel with common and confidential messages
is given by the set of rate tuples (R0, R1, Re2, Re3) such that
R0 < min{I(U ;Z2), I(U3;Z3)},
R1 < I(V ; Y1|U),
R0 +R1 < I(U3;Z3) + I(V ; Y1|U3),
Re2 < min{R1, I(V ; Y1|U)− I(V ;Z2|U)},
Re2 < [I(U3;Z3)−R0 − I(U3;Z2|U)]+ + I(V ; Y1|U3)− I(V ;Z2|U3),
Re3 < min{R1, [I(V ; Y1|U3)− I(V ;Z3|U3)]+},
Re2 +Re3 < R1 + I(V ; Y1|U3)− I(V ;Z2|U3),
for some p(u, u3, v, x) = p(u)p(u3|u)p(v|u3)p(x|v).
It can be shown that setting Y1 = Z2 = Y and Z3 = Z gives an alternative
characterization of the secrecy capacity of the broadcast channel with confidential
messages.
Proof of achievability : We break down the proof of Proposition 2.3 into four cases
and give the analysis of the first case in detail. The analyses for the rest of the cases
are similar, and we therefore only provide sketches in Appendix A.5. Furthermore,
in all cases, we assume that R1 ≥ min{I(V ; Y1|U3) − I(V ;Z2|U3), [I(V ; Y1|U3) −
I(V ;Z3|U3)]+}. It is easy to see from our proof that if this inequality does not
hold, then we achieve equivocation rates of Re2 = Re3 = R1 for any rate pair (R0, R1)
satisfying the inequalities in the proposition. The four cases are:
Case 1
I(U3;Z3)−R0−I(U3;Z2|U) ≥ 0, I(V ; Y1|U3)−I(V ;Z2|U3) ≤ I(V ; Y1|U3)−I(V ;Z3|U3)
and Re3 ≥ I(V ; Y1|U3)− I(V ;Z2|U3);
Case 2
I(U3;Z3)−R0−I(U3;Z2|U) ≥ 0, I(V ; Y1|U3)−I(V ;Z2|U3) ≤ I(V ; Y1|U3)−I(V ;Z3|U3)
and Re3 ≤ I(V ; Y1|U3)− I(V ;Z2|U3);
Case 3
I(U3;Z3)−R0−I(U3;Z2|U) ≥ 0, I(V ; Y1|U3)−I(V ;Z2|U3) ≥ I(V ; Y1|U3)−I(V ;Z3|U3).
In this case, since we consider only the case of R1 ≥ I(V ; Y1|U3) − I(V ;Z3|U3), we
will see that an equivocation rate of Re3 = I(V ; Y1|U3)−I(V ;Z3|U3) can be achieved;
Case 4
I(U3;Z3)− R0 − I(U3;Z2|U) ≤ 0.
Now, consider Case 1, where I(U3;Z3) − R0 − I(U3;Z2|U) ≥ 0, I(V ; Y1|U3) −
I(V ;Z2|U3) ≤ I(V ; Y1|U3)− I(V ;Z3|U3) and Re3 ≥ I(V ; Y1|U3)− I(V ;Z2|U3).
Codebook generation
Fix p(u, u3, v, x) = p(u)p(u3|u)p(v|u3)p(x|v). Let R1 = Ro10 + Rs10 + R′11 + R′′11 + Ro11.
Let Rr0 ≥ 0 and Rr1 ≥ 0 be the randomization rates introduced by the encoder. These are
not part of the message rate. Let R10 = Ro10 + Rs10 + Rr0 and R11 = R′11 + R′′11 + Ro11 + Rr1.
Randomly and independently generate sequences un(m0), m0 ∈ [1 : 2nR0], each
according to ∏_{i=1}^n pU(ui). For each m0, randomly and conditionally independently
generate sequences un3(m0, l0), l0 ∈ [1 : 2nR10], each according to ∏_{i=1}^n pU3|U(u3i|ui).
For each (m0, l0), randomly and conditionally independently generate sequences
vn(m0, l0, l1), l1 ∈ [1 : 2nR11], each according to ∏_{i=1}^n pV|U3(vi|u3i).
Encoding
To send a message (m0, m1), we split m1 into sub-messages with the corresponding
rates given in the codebook generation step and generate the randomization messages
(mr10, mr11) uniformly at random from the set [1 : 2nRr0]× [1 : 2nRr1]. We then select the
sequence vn(m0, l0, l1) corresponding to (m0, m1, mr10, mr11) and send Xn generated
according to ∏_{i=1}^n pX|V(xi|vi(m0, l0, l1)).
Decoding and analysis of the probability of error
Receiver Y1 finds (m0, m1) by decoding (U, U3, V ), Z2 finds m0 by decoding U , and Z3
finds m0 indirectly through (U, U3). The probability of error goes to zero as n → ∞
if
R0 < I(U ;Z2)− δ(ǫ),
R0 + Ro10 + Rr0 + Rs10 < I(U3;Z3)− δ(ǫ),
Rs10 + Ro10 + Rr0 < I(U3; Y1|U)− δ(ǫ),
R′11 + R′′11 + Ro11 + Rr1 < I(V ; Y1|U3)− δ(ǫ).
Analysis of equivocation rates
We show that the following equivocation rates are achievable.
Re2 = Rs10 + R′11 − δ(ǫ),
Re3 = R′11 + R′′11 − δ(ǫ).
It is straightforward to show that the stated equivocation rate Re3 is achievable if
Rr1 + Ro11 > I(V ;Z3|U3) + δ(ǫ).
The analysis of the H(M1|Zn2 , C) term is slightly more involved. Consider
I(Ms10,M′11;Zn2|M0, C) = I(L0, L1;Zn2|C,M0)− I(L0, L1;Zn2|C,M0,Ms10,M′11)
≤ I(V n;Zn2|C, Un)− I(L0;Zn2|C,M0,Ms10,M′11)
− I(L1;Zn2|C,M0, L0,M′11)
≤ nI(V ;Z2|U)− I(L0;Zn2|C,M0,Ms10,M′11)
− I(L1;Zn2|C,M0, L0,M′11).
Now consider the second and third terms. We have
I(L0;Zn2|C,M0,Ms10,M′11) = H(L0|C,M0,Ms10,M′11)−H(L0|C,M0,Ms10,M′11, Zn2, Un)
≥ n(R10 − Rs10)−H(L0|C,Ms10, Zn2, Un)
≥ n(I(U3;Z2|U)− δ(ǫ)).
The last step follows from Lemma 2.1, which holds if
R10 − Rs10 = Ro10 + Rr0 > I(U3;Z2|U) + δ(ǫ).
For the third term, we have
I(L1;Zn2|C,M0, L0,M′11) = H(L1|C,M0, L0,M′11)−H(L1|C,M0, L0,M′11, Zn2, Un)
≥ n(R11 − R′11)−H(L1|C,M′11, Zn2, Un)
≥ n(R11 − R′11)− n(R11 − R′11 − I(V ;Z2|U3) + δ(ǫ))
= n(I(V ;Z2|U3)− δ(ǫ)).
In the last step, we again apply Lemma 2.1, which holds if
R′′11 + Ro11 + Rr1 > I(V ;Z2|U3) + δ(ǫ).
In summary, the inequalities for Case 1 are as follows:
Decoding Constraints : (with R0 ≤ I(U ;Z2) omitted since this inequality appears
in the final rate-equivocation region and does not contain the auxiliary rates to be
eliminated.)
R0 + Ro10 + Rr0 + Rs10 < I(U3;Z3),
Rs10 + Ro10 + Rr0 < I(U3; Y1|U),
R′11 + R′′11 + Ro11 + Rr1 < I(V ; Y1|U3).
Equivocation rate constraints :
Ro10 + Rr0 > I(U3;Z2|U),
R′′11 + Rr1 + Ro11 > I(V ;Z2|U3),
Rr1 + Ro11 > I(V ;Z3|U3).
Greater than or equal to zero constraints :
Ro10, Rs10, R′11, R′′11, Ro11, Rr0, Rr1 ≥ 0.
Equality constraints :
R1 = Ro10 + Rs10 + R′11 + R′′11 + Ro11,
Re2 = Rs10 + R′11,
Re3 = R′11 + R′′11.
Applying Fourier-Motzkin elimination yields the rate-equivocation region for Case 1.
Sketches of achievability for the other cases are given in Appendix A.5.
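Fourier-Motzkin elimination over such linear rate constraints is mechanical: a variable is removed by pairing every inequality that upper-bounds it with every inequality that lower-bounds it. The following is a minimal sketch of the procedure on a toy system, not the dissertation's actual computation.

```python
def eliminate(ineqs, var):
    """Fourier-Motzkin: eliminate `var` from inequalities of the form
    sum(coef[v] * v) <= const, each given as (coef_dict, const)."""
    keep, lower, upper = [], [], []
    for coef, const in ineqs:
        c = coef.get(var, 0)
        if c == 0:
            keep.append((coef, const))
        elif c > 0:          # gives an upper bound on var
            upper.append((coef, const))
        else:                # gives a lower bound on var
            lower.append((coef, const))
    # Pair each lower bound with each upper bound so that `var` cancels.
    for lc, lb in lower:
        for uc, ub in upper:
            cl, cu = -lc[var], uc[var]
            coef = {v: cu * lc.get(v, 0) + cl * uc.get(v, 0)
                    for v in set(lc) | set(uc) if v != var}
            keep.append((coef, cu * lb + cl * ub))
    return keep

# Toy system: R1 <= 5, Re - R1 <= 0, -Re <= 0. Eliminating R1 leaves Re <= 5.
system = [({'R1': 1}, 5), ({'Re': 1, 'R1': -1}, 0), ({'Re': -1}, 0)]
projected = eliminate(system, 'R1')
print(projected)  # [({'Re': -1}, 0), ({'Re': 1}, 5)]
```

In practice redundant inequalities accumulate quickly, so a real elimination (as needed for the regions above) also prunes implied constraints after each step.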
We now establish an outer bound and use it to show that the inner bound in
Proposition 2.3 is tight in several special cases. In contrast to the case with no
secrecy constraint [26], the assumption of a stochastic encoder makes it difficult to
match our inner and outer bounds in general.
Proposition 2.4. An outer bound on the secrecy capacity region of the multilevel
3-receiver broadcast channel with one common and one confidential message is given by the set
of rate tuples (R0, R1, Re2, Re3) such that
R0 ≤ min{I(U ;Z2), I(U3;Z3)},
R1 ≤ I(V ; Y1|U),
R0 +R1 ≤ I(U3;Z3) + I(V ; Y1|U3),
Re2 ≤ I(X ; Y1|U)− I(X ;Z2|U),
Re2 ≤ [I(U3;Z3)−R0 − I(U3;Z2|U)]+ + I(X ; Y1|U3)− I(X ;Z2|U3),
Re3 ≤ [I(V ; Y1|U3)− I(V ;Z3|U3)]+
for some p(u, u3, v, x) = p(u)p(u3|u)p(v|u3)p(x|v).
The proof of this proposition uses a combination of standard converse techniques from
[12], [11], and [7] and is given in Appendix A.6.
Remark 2.5.1. As we can see in the inequalities governing Re2 in both the inner
and outer bounds, there is a tradeoff between the common message rate and the equiv-
ocation rate at receiver Z2. A higher common message rate limits the number of
codewords that can be generated to confuse the eavesdropper.
Special Cases
Using Propositions 2.3 and 2.4, we can establish the secrecy capacity region for the
following special cases.
Y1 more capable than Z3 and Z3 more capable than Z2: If Y1 is more capable [12]
than Z3 and Z3 is more capable than Z2, the capacity region is given by:
R0 ≤ min{I(U ;Z2), I(U3;Z3)},
R1 ≤ I(X ; Y1|U),
R0 +R1 ≤ I(U3;Z3) + I(X ; Y1|U3),
Re2 ≤ I(X ; Y1|U)− I(X ;Z2|U),
Re2 ≤ [I(U3;Z3)−R0 − I(U3;Z2|U)]+ + I(X ; Y1|U3)− I(X ;Z2|U3),
Re3 ≤ I(X ; Y1|U3)− I(X ;Z3|U3)
for some p(u, u3, x) = p(u)p(u3|u)p(x|u3).
Achievability follows directly from Proposition 2.3 by setting V = X and observing
that since Z3 is more capable than Z2, the inequality Re2+Re3 ≤ R1+ I(X ; Y1|U3)−
I(X ;Z2|U3) is redundant since I(X ; Y1|U3)−I(X ;Z2|U3) ≥ I(X ; Y1|U3)−I(X ;Z3|U3)
from the more capable condition. For the converse, from Proposition 2.4, observe that
since Y1 is more capable than Z3, we have
I(V ; Y1|U3)− I(V ;Z3|U3) = I(V,X ; Y1|U3)− I(V,X ;Z3|U3)
− I(X ; Y1|V ) + I(X ;Z3|V )
≤ I(X ; Y1|U3)− I(X ;Z3|U3).
One eavesdropper : Here, we consider the two scenarios where either Z2 or Z3
is an eavesdropper and the other receiver is neutral, i.e., there is no constraint on
its equivocation rate, but it still decodes a common message. The secrecy capacity
regions for these two scenarios are as follows. Proofs are omitted since these results
follow from straightforward applications of Propositions 2.3 and 2.4.
Z3 is neutral : The secrecy capacity region is the set of rate tuples (R0, R1, Re2) such
that
R0 ≤ min{I(U ;Z2), I(U3;Z3)},
R1 ≤ I(X ; Y1|U),
R0 +R1 ≤ I(U3;Z3) + I(X ; Y1|U3),
Re2 ≤ I(X ; Y1|U)− I(X ;Z2|U),
Re2 ≤ [I(U3;Z3)− R0 − I(U3;Z2|U)]+ + I(X ; Y1|U3)− I(X ;Z2|U3)
for some p(u, u3, x) = p(u)p(u3|u)p(x|u3).
Z2 is neutral : The secrecy capacity region is the set of rate tuples (R0, R1, Re3) such
that
R0 ≤ min{I(U ;Z2), I(U3;Z3)},
R1 ≤ I(V ; Y1|U),
R0 +R1 ≤ I(U3;Z3) + I(V ; Y1|U3),
Re3 ≤ [I(V ; Y1|U3)− I(V ;Z3|U3)]+
for some p(u, u3, v, x) = p(u)p(u3|u)p(v|u3)p(x|v).
2.6 Summary of Chapter
We presented inner and outer bounds on the secrecy capacity region of the 3-receiver
broadcast channel with common and confidential messages that are strictly larger than
straightforward extensions of the Csiszar–Korner 2-receiver region. We considered
the 2-receiver, 1-eavesdropper and the 1-receiver, 2-eavesdroppers cases. For the
first case, we showed that additional superposition encoding, whereby a codeword is
picked at random from a pre-generated codebook, can increase the achievable rate by
allowing the legitimate receiver to indirectly decode the message without sacrificing
secrecy. A general lower bound on the secrecy capacity is then obtained by combining
superposition encoding and indirect decoding with Marton coding. This lower bound
is shown to be tight for the reversely degraded product channel and when both Y1
and Y2 are less noisy than the eavesdropper. The lower bound was generalized in
Theorem 2.2 to obtain an inner bound on the secrecy capacity region for the 2-receiver,
1-eavesdropper case. For the case where both Y1 and Y2 are less noisy than
the eavesdropper, we again show that our inner bound gives the secrecy capacity
region.
We then established inner and outer bounds on the secrecy capacity region for
the 1-receiver, 2-eavesdroppers multilevel wiretap channel. The inner bound and
outer bounds are shown to be tight for several special cases. In the results for both
setups, we observe a tradeoff between the common message rate and the eavesdropper
equivocation rates. A higher common message rate limits the number of codewords
that can be generated to confuse the eavesdroppers about the confidential message. In
addition, in the second setup, a higher common message rate can potentially reduce
the equivocation rate of one eavesdropper while leaving the equivocation rate at the
other eavesdropper unchanged.
Chapter 3
Wiretap Channel with Causal
State Information
3.1 Introduction to Wiretap Channel with Causal
State Information
Consider the 2-receiver wiretap channel with state depicted in Figure 3.1. The sender
X wishes to communicate a message reliably to the legitimate receiver Y while keep-
ing it asymptotically secret from the eavesdropper Z. The secrecy capacity for this
channel can be defined under various scenarios of state information availability at the
encoder and the decoder. When the state information is not available at either party,
the problem reduces to the classical wiretap channel for the channel averaged over
the state and the secrecy capacity is known [36], [7]. When the state is available only
at the decoder, the problem reduces to the wiretap channel with augmented receiver
(Y, S). The interesting cases to consider therefore are when the state information
is available at the encoder and may or may not be available at the decoder. This
raises the question of how the encoder and the decoder can make use of the state
information to increase the secrecy rate. This model is a generalization of the wire-
tap channel with shared secret key in [40] and can be used also as a base model for
secret communication over fast fading channels in which the sender and the receiver
Figure 3.1: Wiretap channel with State. (Block diagram: the encoder maps M and the
state Si to Xi; the channel p(y, z|x, s), with state pmf p(s), outputs Yi to the decoder
and Zi to the eavesdropper.)
have some means for measuring the channel statistics but the eavesdropper does not.
In [5], Chen and Vinck established a lower bound on the secrecy capacity when the
state information is available noncausally only at the encoder. The lower bound is
established using a combination of Gelfand–Pinsker coding and Wyner wiretap cod-
ing. Subsequently, Liu and Chen [23] used the same techniques to establish a lower
bound on the secrecy capacity when the state information is available noncausally at
both the encoder and the decoder. In a related direction, Khisti, Diggavi, and Wor-
nell [19] considered the problem of secret key agreement first studied in [25] and [1]
for the wiretap channel with state and established the secret key capacity when the
state is available causally or noncausally at the encoder and the decoder. The key
is generated in two parts; the first using a wiretap channel code while treating the
state sequence as a time-sharing sequence, and the second is generated from the state
itself.
In this chapter, we consider the wiretap channel with state information available
causally at the encoder and the decoder. We establish a lower bound on the secrecy
capacity, which is strictly larger than the lower bound for the noncausal case in [23].
Our achievability scheme, however, is quite different from the scheme in [23]. We use
block Markov coding, Shannon strategy for channels with state [29], and secret key
agreement from state information, which builds on the work in [19]. However, un-
like [19], we are not directly interested in the size of the secret key, but rather in using
the secret key generated from the state sequence in one transmission block to increase
the secrecy rate in the following block. This block Markov scheme causes additional
information leakage through the correlation between the secret key generated in a
block and the received sequences at the eavesdropper in subsequent blocks. We show
that this leakage is asymptotically negligible. Although a similar block Markov cod-
ing scheme was used in [2] to establish the secrecy capacity of the degraded wiretap
channel with rate limited secure feedback, in their setup no information about the
key is leaked to the eavesdropper because the feedback link is assumed to be secure.
We also establish an upper bound on the secrecy capacity of the wiretap channel with
state information available noncausally at the encoder and decoder. We show that
the upper bound coincides with the aforementioned lower bound for several classes
of channels. Thus, the secrecy capacity for these classes does not depend on whether
the state information is known causally or noncausally at the encoder.
The rest of the chapter is organized as follows. In Section 3.2, we provide the
needed definitions. In Section 3.3, we summarize and discuss the main results in
this chapter. The proofs of the lower and upper bounds are detailed in Sections 3.4
and 3.5, respectively.
3.2 Problem Definition
Consider a discrete memoryless wiretap channel (DM-WTC) with discrete memoryless
state (X × S, p(y, z|x, s)p(s),Y,Z) that consists of a finite input alphabet X , finite output
alphabets Y , Z, a finite state alphabet S, a collection of conditional pmfs p(y, z|x, s)
on Y × Z, and a pmf p(s) on the state alphabet S. The sender X wishes to send a
confidential message M ∈ [1 : 2nR] to the receiver Y while keeping it secret from the
eavesdropper Z with either causal or noncausal state information available at both
the encoder and decoder.
A (2nR, n) code for the DM-WTC with causal state information at the encoder
46
CHAPTER 3. WIRETAP CHANNEL WITH STATE 47
and decoder consists of: (i) a message set [1 : 2nR]; (ii) an encoder that generates a
symbol Xi(m) according to a conditional pmf p(xi|m, si, xi−1) for i ∈ [1 : n]; and (iii) a
decoder that assigns an estimate M̂ or an error message to each received sequence pair
(yn, sn). We assume that the message M is uniformly distributed over the message
set. The probability of error is defined as P(n)e = P{M̂ ≠ M}. The information
leakage rate at the eavesdropper Z, which measures the amount of information about
M that leaks out to the eavesdropper, is defined as RL = (1/n)I(M ;Zn). A secrecy
rate R is said to be achievable if there exists a sequence of codes with P(n)e → 0
and RL → 0 as n → ∞. The secrecy capacity CS−CSI is the supremum of the set of
achievable rates.
We also consider the case when the state information is available noncausally at the
encoder. The only change in the above definitions is that the encoder now generates a
codeword Xn(m) according to a conditional pmf p(xn|m, sn), i.e., a random mapping
that depends on the entire state sequence instead of just the past and present state
sequence. The secrecy capacity for this scenario is denoted by CS−NCSI.
3.3 Summary of Main Results
We present the results in this chapter. The proofs of these results are given in the
following two sections and in the Appendix.
Lower Bound
The main result in this chapter is the following lower bound on the secrecy capacity
of the DM-WTC with state information available causally at both the encoder and
decoder.
Theorem 3.1. The secrecy capacity of the DM-WTC with state information available
causally at the encoder and decoder is lower bounded as
CS−CSI ≥ max{ max_{P1} min{I(U ; Y, S)− I(U ;Z, S) +H(S|Z), I(U ; Y, S)},
max_{P2} min{H(S|Z, V ), I(V ; Y |S)} }, (3.1)
where P1 is of the form p(u), v(u, s), p(x|v, s) and P2 is of the form p(v)p(x|v, s).
Note that if S = ∅, the above lower bound reduces to the secrecy capacity for
the wiretap channel. Clearly, this lower bound also holds when noncausal state in-
formation is available at the encoder, since we can always treat the noncausal state
information as causal state information. Define
RS−CSI−1 = max_{p(u),v(u,s),p(x|v,s)} min{I(U ; Y, S)− I(U ;Z, S) +H(S|Z), I(U ; Y, S)}, (3.2)
RS−CSI−2 = max_{p(v)p(x|v,s)} min{H(S|Z, V ), I(V ; Y |S)}.
Then, (3.1) can be expressed as
CS−CSI ≥ max{RS−CSI−1, RS−CSI−2}.
The proof of this theorem is detailed in Section 3.4.
Remark 3.3.1. Using the functional representation lemma [35], RS−CSI−1 can be
equivalently written as
RS−CSI−1 = max_{p(v|s)p(x|v,s)} min{I(V ; Y |S)− I(V ;Z|S) +H(S|Z), I(V ; Y |S)}. (3.3)
Unless otherwise stated, this equivalent characterization for RS−CSI−1 will be assumed
for the rest of this section to derive the other results. From Section 3.4 onwards,
we revert to the original characterization in (3.2).
In [23], the authors established the following lower bound for the noncausal case
CS−NCSI ≥ max_{p(u|s)p(x|u,s)} (I(U ; Y, S)−max{I(U ;Z), I(U ;S)}) (3.4)
= max_{p(u|s)p(x|u,s)} min{I(U ; Y |S)− I(U ;Z|S) + I(S;U |Z), I(U ; Y |S)}.
From (3.3), RS−CSI−1 is clearly at least as large as this lower bound. Hence, our lower
bound (3.1) is at least as large as this lower bound (3.4). We now show that the lower
bound (3.4) is as large as RS−CSI−1.
Fix V ∈ [0 : |V| − 1], p(v|s), and p(x|v, s) in RS−CSI−1. Let U ∈ [0 : |V||S| − 1] in
bound (3.4). Define the conditional probability mass functions: For u = v + s|V|, let
p(u|s) = p(v|s), p(x|u, s) = p(x|v, s), and let p(u|s) = p(x|u, s) = 0 otherwise. Under
this mapping, it is easy to see that H(S|Z, U) = 0 and the other terms in (3.4) reduce
to those in RS−CSI−1.
The following shows that our lower bound (3.1) can be strictly larger than (3.4).
Example: Consider the channel in Figure 3.2, where X ,Y ,Z,S ∈ {0, 1}, p(y, z|x, s) =
p(y, z|x), and the conditional pmf defined in the figure. The state S with entropy
H(S) = 1−H(0.1) is observed by both X and Y .
Figure 3.2: Example. (The eavesdropper observes Z = X, and Y is the output of a
BSC with crossover probability 0.1 and input X.)
By setting V = X independent of S and P{X = 1} = P{X = 0} = 0.5 in
RS−CSI−2, we obtain RS−CSI−2 ≥ 1−H(0.1).
We now show that RS−CSI−1 is strictly smaller than 1−H(0.1). First, note that
I(V ; Y |S) = H(Y |S)−H(Y |V, S)
≤ H(Y )−H(Y |X)
= I(X ; Y ) ≤ 1−H(0.1).
However, for RS−CSI−1 ≥ 1 −H(0.1), we must have I(V ; Y |S) ≥ 1 −H(0.1). Hence,
we must have I(V ; Y |S) = 1−H(0.1). Next, consider
I(V ; Y |S) = H(Y |S)−H(Y |V, S)
(a)≤ 1−H(Y |V, S)
(b)≤ 1−H(Y |V, S,X)
= 1−H(0.1).
Step (a) holds with equality iff p(y|s) = 0.5 for all y, s ∈ {0, 1}. From the structure
of the channel, this implies that p(x|s) = 0.5 for all x, s ∈ {0, 1}. Step (b) holds
with equality iff H(Y |X, V, S) = H(Y |V, S), or equivalently I(X ; Y |V, S) = 0. This
implies that, given (V, S), X and Y are independent, i.e., p(x, y|v, s) = p(x|v, s)p(y|v, s). But
since p(x, y|v, s) = p(x|v, s)p(y|x), either (i) p(x|v, s) = 0 or (ii) p(y|v, s) = p(y|x)
must hold. Now, consider the pair x = 1, y = 1. Then, we must have either (i)
p(x = 1|v, s) = 0 or (ii) p(y = 1|v, s) = p(y = 1|x = 1) = 0.9. In (i), X is a function
of V and S. In (ii), we have
p(y = 1|v, s) = p(x = 1|v, s)p(y = 1|x = 1) + (1− p(x = 1|v, s))p(y = 1|x = 0)
= 0.9p(x = 1|v, s) + 0.1− 0.1p(x = 1|v, s)
= 0.8p(x = 1|v, s) + 0.1.
Since p(y = 1|v, s) = 0.9, we have 0.8p(x = 1|v, s) + 0.1 = 0.9, which implies that
p(x = 1|v, s) = 1. Here again X is a function of V and S. Hence, in both cases
(i) and (ii), X must be a function of V and S, which implies that Z = X is also a
function of V and S. Using the fact that p(x|s) = p(z|s) = 0.5 for all x, s, we have
I(V ;Z|S) = H(Z|S)−H(Z|V, S)
= H(X|S)
= 1.
The first expression in RS−CSI−1 is then upper bounded by
I(V ; Y |S)− I(V ;Z|S) +H(S|Z) ≤ I(V ; Y |S)− I(V ;Z|S) +H(S)
= 1−H(0.1)− 1 + 1−H(0.1)
= 1− 2H(0.1) < 1−H(0.1).
This shows that RS−CSI−1 < RS−CSI−2.
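The separation established above can be checked numerically. The sketch below only evaluates the binary entropy expressions quoted in the example.

```python
from math import log2

def Hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

r2_lower = 1 - Hb(0.1)          # R_S-CSI-2 >= 1 - H(0.1)
r1_upper = 1 - 2 * Hb(0.1)      # R_S-CSI-1 <= 1 - 2H(0.1)
print(round(r2_lower, 3), round(r1_upper, 3))  # prints 0.531 0.062
assert r1_upper < r2_lower      # hence R_S-CSI-1 < R_S-CSI-2 here
```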
The proof of Theorem 3.1 is detailed in Section 3.4. To illustrate the main ideas,
we outline the proof for the characterization of RS−CSI−1 in (3.2) for the case when
I(U ; Y, S) − I(U ;Z, S) > 0. Our coding scheme involves the transmission of b − 1
independent messages over b n-transmission blocks. We split each message Mj , j ∈
[2 : b], into two independent messages Mj0 ∈ [1 : 2nR0] and Mj1 ∈ [1 : 2nR1] with
R0 +R1 = R. Codebook generation consists of two steps. The first is the generation
of the message codebook. We randomly generate sequences un(l), l ∈ [1 : 2nI(U ;Y,S)],
and partition the set of indices [1 : 2nI(U ;Y,S)] into 2nR0 equal-size bins. The indices in
each bin are further partitioned into 2nR1 equal-size sub-bins C(m0, m1). The second
step is to generate the key codebook. We randomly bin the set of state sequences
sn into 2nRK bins B(k). The key Kj−1 used in block j is the bin index of the state
sequence S(j − 1) in block j − 1.
To send message Mj , Mj1 is encrypted using the key Kj−1 to obtain M ′j1 = Mj1⊕
Kj−1. A codeword un(L) is selected uniformly at random from sub-bin C(Mj0,Mj1⊕
Kj−1) and transmitted using Shannon’s strategy as depicted in Figure 3.3. The
decoder uses joint typicality decoding together with its knowledge of the key to decode
message Mj at the end of block j. Finally, at the end of block j, the encoder and the
decoder declare the bin index Kj of the state sequence s(j) as the key to be used in
block j + 1. To show that the messages can be kept asymptotically secret from the
Figure 3.3: Encoding in block j. (Mj is split into (Mj0,Mj1); Mj1 is encrypted with
Kj−1 to give M′j1; (Mj0,M′j1) select Ui, and the encoder computes Vi = v(Ui, Si) and
Xi ∼ p(x|v, s), with Si ∼ p(s).)
eavesdropper, note that Mj0 is transmitted using Wyner wiretap coding. Hence, it
can be kept secret from the eavesdropper provided I(U ; Y, S)− I(U ;Z, S) > 0. The
key part of the proof is to show that the second part of the message Mj1, which is
encrypted with the key Kj−1, can be kept secret from the eavesdropper. This involves
showing that the eavesdropper has negligible information about Kj−1. However, the
fact that Kj−1 is generated from the state sequence in block j − 1 and used in block
j results in correlation between it and all received sequences at the eavesdropper
from subsequent blocks. We show that if RK < H(S|Z), then the eavesdropper has
negligible information about Kj−1 given all its received sequences.
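The key-generation and encryption steps of this outline can be sketched in code. Everything below is illustrative only: the hash-seeded bin assignment is a stand-in for the shared random binning B(k), the block length and rates are arbitrary, and bitwise XOR plays the role of the ⊕ operation.

```python
import random

def bin_key(state_seq, n_bins):
    """Stand-in for the shared random binning B(k): both encoder and decoder
    observe the state sequence S(j-1) and map it to a bin index K_{j-1}."""
    seed = hash(tuple(state_seq))
    return random.Random(seed).randrange(n_bins)

# Illustrative parameters: an 8-bit key/message part per block.
n_bins = 2**8
state_prev = [random.randrange(2) for _ in range(20)]   # S(j-1), seen by both sides

key = bin_key(state_prev, n_bins)                       # K_{j-1}
m_j1 = 0b10110001                                       # message part M_{j1}
m_enc = m_j1 ^ key                                      # M'_{j1} = M_{j1} (+) K_{j-1}

# The decoder recomputes the key from its own state observation and inverts the pad.
assert m_enc ^ bin_key(state_prev, n_bins) == m_j1
```

The secrecy argument in the text is about what this sketch cannot show: that the eavesdropper's observations across blocks reveal a negligible amount about the bin index when RK < H(S|Z).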
Upper Bound
We establish the following upper bound on the secrecy capacity of the wiretap channel
with noncausal state information available at both the encoder and decoder (which
holds also for the causal case).
Theorem 3.2. The following is an upper bound to the secrecy capacity of the DM-
WTC with state noncausally available at the encoder and decoder
CS−NCSI ≤ min {I(V1; Y |U, S)− I(V1;Z|U, S) +H(S|Z, U), I(V2; Y |S)}
for some U, V1 and V2 such that p(u, v1, v2, x|s) = p(u|s)p(v1|u, s)p(v2|v1, s)p(x|v2, s).
The cardinality of the auxiliary random variables can be upper bounded by |U| ≤
|S|(|X |+ 1), |V1| ≤ |U||S|(|X |+ 1) and |V2| ≤ |V1||U||S||X |.
The proof of this theorem is given in Section 3.5.
Secrecy Capacity Results
We show that the upper bound in Theorem 3.2 coincides with the lower bound in
Theorem 3.1 for the following cases.
Class of less noisy channels
We show that Theorems 3.1 and 3.2 are also tight when I(U ; Y |S) ≥ I(U ;Z|S) for
every U such that (U, S) → (X,S) → (Y, Z) form a Markov chain, i.e., when Y is
less noisy than Z for every state s ∈ S [20].
Theorem 3.3. The secrecy capacity for the DM-WTC with the state information
available causally or noncausally at the encoder and decoder when Y is less noisy
than Z is
CS−CSI = CS−NCSI
= max_{p(x|s)} min {I(X ; Y |S)− I(X ;Z|S) +H(S|Z), I(X ; Y |S)} .
Consider the special case when p(y, z|x, s) = p(y, z|x) and Z is a degraded version
of Y . Then Theorem 3.3 specializes to the secrecy capacity of the wiretap channel
with a key [40]:
CS−CSI = CS−NCSI = max_{p(x)} min {I(X ; Y )− I(X ;Z) +H(S), I(X ; Y )} .
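This expression is easy to evaluate numerically. The sketch below is purely illustrative and the channel parameters are not from the dissertation: Y is the output of a BSC(0.1), Z is a further BSC(0.2) degradation of Y, the key has entropy H(S) = 0.3 bits, and the maximization over p(x) is a simple grid search.

```python
from math import log2

def Hb(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_mi(p, a):
    """I(X;Y) for X ~ Bern(p) through a BSC with crossover probability a."""
    q = p * (1 - a) + (1 - p) * a
    return Hb(q) - Hb(a)

HS = 0.3                              # key (state) entropy, an assumed value
a_y = 0.1                             # Y = BSC(0.1) output
a_z = 0.1 * 0.8 + 0.9 * 0.2           # Z = Y through BSC(0.2): crossover 0.26

# Grid search for max_p min{ I(X;Y) - I(X;Z) + H(S), I(X;Y) }.
best = max(min(bsc_mi(p, a_y) - bsc_mi(p, a_z) + HS, bsc_mi(p, a_y))
           for p in (i / 1000 for i in range(1001)))
print(round(best, 3))  # 0.531, i.e. 1 - H(0.1): the key rate is not the bottleneck
```

For these parameters the first term exceeds I(X;Y) at the optimum, so the capacity equals the ordinary channel term 1 − H(0.1).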
Achievability for Theorem 3.3 follows directly from Theorem 3.1 by setting V =
X and observing RS−CSI−1 ≥ RS−CSI−2 since Y is less noisy than Z. Hence, the
achievability scheme for RS−CSI−1 is optimal for this class of channels. To establish
the converse, we use the less noisy assumption to strengthen the first inequality in
Theorem 3.2 as follows
I(V1; Y |U, S)− I(V1;Z|U, S) +H(S|Z, U) ≤ I(V1; Y |U, S)− I(V1;Z|U, S) +H(S|Z)
(a)≤ I(V1; Y |S)− I(V1;Z|S) +H(S|Z)
(b)≤ I(X ; Y |S)− I(X ;Z|S) +H(S|Z),
where (a), (b) follow from the less noisy assumption. The proof of the second inequal-
ity follows by the data processing inequality, which yields I(V2; Y |S) ≤ I(X ; Y |S).
Channel is independent of state and eavesdropper is less noisy than receiver
Next, consider the case where p(y, z|x, s) = p(y, z|x) and the eavesdropper Z is less
noisy than Y , that is, I(U ;Z) ≥ I(U ; Y ) for every U such that U → X → (Y, Z).
Then, the capacity of this special class of channels is
CS−CSI = CS−NCSI = max_{p(x)} min{H(S), I(X ; Y )}.
Achievability follows by setting V = X independent of S. We note that the scheme
here is basically a “one-time pad” scheme where the message that is transmitted is
scrambled with a key generated from the state sequence. Here, the secrecy capacity
is achieved by RS−CSI−2, and hence, RS−CSI−2 ≥ RS−CSI−1. The example we gave to
illustrate the fact that RS−CSI−2 can be larger than RS−CSI−1 is a special case of this
class of channels. The converse follows from Theorem 3.2 and the observation that
since Z is less noisy than Y and p(y, z|x, s) = p(y, z|x),
I(V1; Y |U, S)− I(V1;Z|U, S) +H(S|Z, U) ≤ H(S|Z, U)
≤ H(S),
and I(V2; Y |S) ≤ I(X ; Y ).
Specific mutual information constraints
Following the lines of [5], we can show that Theorems 3.1 and 3.2 are tight for the
following two special cases. The achievability scheme for RS−CSI−1 is optimal for
both special cases.
(i) If there exists a V ∗ such that max_{p(v|s)p(x|v,s)}(I(V ; Y |S)− I(V ;Z|S) +H(S|Z)) =
I(V ∗; Y |S)− I(V ∗;Z|S) +H(S|Z) and I(V ∗; Y |S)− I(V ∗;Z|S) +H(S|Z) ≤
I(V ∗; Y |S), then the secrecy capacity is CS−CSI = CS−NCSI = I(V ∗; Y |S)−
I(V ∗;Z|S) +H(S|Z).
(ii) If there exists a V ′ such that max_{p(v|s)p(x|v,s)} I(V ; Y |S) = I(V ′; Y |S) and
I(V ′; Y |S) ≤ I(V ′; Y |S)− I(V ′;Z|S) +H(S|Z), then the secrecy capacity is
CS−CSI = CS−NCSI = I(V ′; Y |S).
3.4 Proof of Theorem 3.1
We prove achievability of RS−CSI−1 and RS−CSI−2 separately. The proof for RS−CSI−1
is split into Cases 1 and 2, while RS−CSI−2 is proved in Case 3.
Case 1: RS−CSI−1 with I(U ; Y, S) > I(U ;Z, S)
Codebook generation
Split message Mj into two independent messages Mj0 ∈ [1 : 2nR0] and Mj1 ∈ [1 : 2nR1];
thus R = R0 + R1. Let R̃ ≥ R be the rate of the message codebook. The codebook
generation consists of two parts.
Message codeword generation: Randomly and independently generate sequences
un(l), l ∈ [1 : 2nR̃], each according to ∏_{i=1}^n pU(ui), and partition the set of indices
[1 : 2nR̃] into 2nR0 equal-size bins C(m0), m0 ∈ [1 : 2nR0]. Further partition the
indices within each bin C(m0) into 2nR1 equal-size sub-bins C(m0, m1), m1 ∈ [1 :
2nR1]. Hence, l ∈ C(m0, m1) if and only if
((m0 − 1)2nR1 +m1 − 1)2n(R̃−R0−R1) + 1 ≤ l ≤ ((m0 − 1)2nR1 +m1)2n(R̃−R0−R1).
Key codebook generation: Randomly and uniformly partition the set of sn se-
quences into 2nRK bins B(k), k ∈ [1 : 2nRK ].
Both codebooks are revealed to all parties.
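The nested binning of codeword indices can be made concrete. The parameters below are illustrative only; `nRt` stands for n times the total codeword-index rate, and the mapping mirrors the bin/sub-bin partition described above (0-based for convenience).

```python
# Nested binning sketch: indices l in [0, 2**nRt) are split into 2**nR0
# bins C(m0), each split into 2**nR1 sub-bins C(m0, m1).
nRt, nR0, nR1 = 12, 3, 4            # illustrative bit budgets
bin_size = 2**(nRt - nR0)           # indices per bin C(m0)
sub_size = 2**(nRt - nR0 - nR1)     # indices per sub-bin C(m0, m1)

def sub_bin(m0, m1):
    """Index range of sub-bin C(m0, m1), 0-based."""
    start = m0 * bin_size + m1 * sub_size
    return range(start, start + sub_size)

def locate(l):
    """Recover (m0, m1) from a codeword index l."""
    return l // bin_size, (l % bin_size) // sub_size

l = 1234
m0, m1 = locate(l)
assert l in sub_bin(m0, m1)
```

Encoding picks an index uniformly inside `sub_bin(m0, m1)`; decoding recovers the pair with `locate`.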
Encoding
We send b− 1 messages over b n-transmission blocks. In the first block, we randomly
select an index L ∈ C(m10, m′11). The encoder then computes vi = v(ui(L), si) and
transmits a randomly generated symbol Xi ∼ p(xi|si, vi) for i ∈ [1 : n]. At the end
of the first block, the encoder and the decoder declare k1 ∈ [1 : 2nRK ] such that
s(1) ∈ B(k1) as the key to be used in block 2.
Encoding in block j ∈ [2 : b] proceeds as follows. To send message mj = (mj0, mj1)
and given key kj−1, the encoder computes m′j1 = mj1 ⊕ kj−1. To ensure secrecy, we
must have R1 ≤ RK [28]. The encoder then randomly selects an index L such that
L ∈ C(mj0, m′j1). It then computes vi = v(ui(L), si) and transmits a randomly
generated symbol Xi ∼ p(xi|si, vi) for i ∈ [(j − 1)n+ 1 : jn].
Decoding and analysis of the probability of error
At the end of block j, the decoder declares that l is sent if it is the unique index
such that (un(l),Y(j),S(j)) ∈ T(n)ǫ; otherwise it declares an error. It then finds the
index pair (mj0, m′j1) such that l ∈ C(mj0, m′j1). Finally, it recovers mj1 by computing
mj1 = (m′j1 − kj−1) mod 2nRK.
To analyze the probability of error, let ǫ′′ > ǫ′ > ǫ > 0 and define the following
events for every j ∈ [2 : b]:
E(j) = {M̂j ≠ Mj},
E1(j) = {(Un(L),S(j)) /∈ T(n)ǫ′},
E2(j) = {(Un(L),S(j),Y(j)) /∈ T(n)ǫ′′},
E3(j) = {(Un(l),S(j),Y(j)) ∈ T(n)ǫ′′ for some l ≠ L}.
The probability of error is upper bounded as
P(E) = P{∪bj=2 E(j)} ≤ ∑bj=2 P(E(j)).
Each probability of error term can be upper bounded as
P(E(j)) ≤ P(E1(j)) + P(E2(j) ∩ Ec1(j)) + P(E3(j) ∩ Ec2(j)).
Now, P(E1(j)) → 0 as n → ∞ by the law of large numbers (LLN) since P{Un(L) ∈
T(n)ǫ} → 1 as n → ∞ and S(j) ∼ ∏_{i=1}^n pS(si) = ∏_{i=1}^n pS|U(si|ui) by independence.
The term P(E2(j) ∩ Ec1(j)) → 0 as n → ∞ by the LLN since (Un(L),S(j)) ∈ T(n)ǫ′
and Y(j) ∼ ∏_{i=1}^n pY|U,S(yi|ui, si). For the last term, consider
P(E3(j) ∩ Ec2(j)) = ∑_l p(l) P(E3(j) ∩ Ec2(j)|L = l).
Note that L is independent of the transmission codebook sequences (un(l), l ∈ [1 : 2nR̃])
and the current state sequence S(j). Therefore, by the packing lemma [13,
Lecture 3], P(E3(j) ∩ Ec2(j)|L = l) → 0 as n → ∞ if R̃ < I(U ; Y, S)− δ(ǫ′′). Hence,
P(E3(j) ∩ Ec2(j)) → 0 as n → ∞ if R̃ < I(U ; Y, S)− δ(ǫ′′).
Analysis of the information leakage rate
Let Z(j) denote the eavesdropper's received sequence in block j ∈ [1 : b] and let Z^j = (Z(1), . . . , Z(j)). We will need the following two results.
Proposition 3.1. If RK < H(S|Z) − 4δ(ǫ) and R ≥ I(U; Z, S) + δ(ǫ), then the following holds for every j ∈ [1 : b].
1. H(Kj|C) ≥ n(RK − δ(ǫ)).
2. I(Kj; Z(j)|C) ≤ n(δ′(ǫ) + δ′′(ǫ)).
3. I(Kj; Z^j|C) ≤ nδ′′′(ǫ),
where δ′(ǫ) → 0, δ′′(ǫ) → 0, and δ′′′(ǫ) → 0 as ǫ → 0.
The proof of this proposition is given in Appendix B.1.
The second result is Lemma 2.1 from Chapter 2, which we restate here for convenience.

Lemma 3.1. Let (U, V, Z) ∼ p(u, v, z), R ≥ 0, and ǫ > 0. Let U^n be a random sequence distributed according to ∏_{i=1}^n pU(ui). Let V^n(l), l ∈ [1 : 2^{nR}], be a set of random sequences that are conditionally independent given U^n and each distributed according to ∏_{i=1}^n pV|U(vi|ui). Define C = {U^n, V^n(l), l ∈ [1 : 2^{nR}]}. Let L ∈ [1 : 2^{nR}] be a random index distributed according to an arbitrary pmf. Then, if P{(U^n, V^n(L), Z^n) ∈ Tǫ^(n)} → 1 as n → ∞ and R > I(V; Z|U) + δ(ǫ), there exists a δ′(ǫ) > 0, with δ′(ǫ) → 0 as ǫ → 0, such that, for n sufficiently large, H(L|Z^n, U^n, C) ≤ n(R − I(V; Z|U)) + nδ′(ǫ).
We are now ready to upper bound the leakage rate averaged over codes. Consider

I(M2, M3, . . . , Mb; Z^b|C) = ∑_{j=2}^b I(Mj; Z^b|C, M^b_{j+1})
(a)≤ ∑_{j=2}^b I(Mj; Z^b|C, S(j), M^b_{j+1})
(b)= ∑_{j=2}^b I(Mj; Z^j|C, S(j)),

where (a) follows by the independence of Mj and (S(j), M^b_{j+1}), and (b) follows by the Markov chain (Z^b_{j+1}, M^b_{j+1}, C) → (Z^j, S(j), C) → (Mj, C). Hence, it suffices to upper bound each individual term I(Mj; Z^j|C, S(j)). Consider

I(Mj; Z^j|C, S(j)) = I(Mj0, Mj1; Z^j|C, S(j))
= I(Mj0, Mj1; Z^{j−1}|C, S(j)) + I(Mj0, Mj1; Z(j)|C, S(j), Z^{j−1}).
Note that the first term is equal to zero by the independence of Mj from the past transmissions, the codebook, and the state sequence. For the second term, we have

I(Mj0, Mj1; Z(j)|C, S(j), Z^{j−1}) = I(Mj0; Z(j)|C, S(j), Z^{j−1}) + I(Mj1; Z(j)|C, Mj0, S(j), Z^{j−1}).
We now bound each term separately. Consider the first term:

I(Mj0; Z(j)|C, S(j), Z^{j−1})
= I(Mj0, L; Z(j)|C, S(j), Z^{j−1}) − I(L; Z(j)|C, Mj0, S(j), Z^{j−1})
≤ I(U^n; Z(j)|C, S(j), Z^{j−1}) − H(L|C, Mj0, S(j), Z^{j−1}) + H(L|C, Z(j), Mj0, S(j))
≤ ∑_{i=1}^n (H(Zi(j)|C, Si(j)) − H(Zi(j)|C, Ui, Si(j))) − H(L|C, Mj0, S(j), Z^{j−1}) + H(L|C, Z(j), Mj0, S(j))
(a)≤ nI(U; Z|S) − H(L|C, Mj0, S(j), Z^{j−1}) + H(L|C, Z(j), Mj0, S(j))
(b)≤ nI(U; Z|S) − H(L|C, Mj0, S(j), Z^{j−1}) + n(R − R0 − I(U; Z, S) + δ′(ǫ))
(c)= n(R − R0) − H(L|C, Mj0, S(j), Z^{j−1}) + nδ′(ǫ)
= n(R − R0) − H(Mj1 ⊕ Kj−1|C, Mj0, S(j), Z^{j−1}) − H(L|C, Mj0, S(j), Z^{j−1}, Mj1 ⊕ Kj−1) + nδ′(ǫ)
≤ n(R − R0) − H(Mj1 ⊕ Kj−1|C, Mj0, S(j), Kj−1, Z^{j−1}) − n(R − R0 − RK) + nδ′(ǫ)
(d)= nRK − H(Mj1 ⊕ Kj−1|C, Mj0, S(j), Kj−1) + nδ′(ǫ)
= nRK − H(Mj1|C, Mj0, S(j), Kj−1) + nδ′(ǫ)
= nδ′(ǫ),

where (a) follows from the fact that H(Zi(j)|C, Si(j)) ≤ H(Zi(j)|Si(j)) = H(Z|S) and H(Zi(j)|C, Ui, Si(j)) = H(Z|U, S). Step (b) follows by Lemma 3.1, which requires that (i) P{(U^n(L), S(j), Z(j)) ∈ Tǫ^(n)} → 1 as n → ∞, and (ii) R − R0 > I(U; Z, S) + δ(ǫ); (i) can be established using the same steps as in the analysis of the probability of error. Step (c) follows by the independence of U and S. Step (d) follows by the Markov chain (Z^{j−1}, Mj0, S(j)) → (Kj−1, Mj0, S(j)) → (Mj1 ⊕ Kj−1, Mj0, S(j)). The last step follows by the fact that Mj1 is independent of (C, Mj0, S(j), Kj−1) and uniformly distributed over [1 : 2^{nRK}].
Next, consider the second term:

I(Mj1; Z(j)|C, Mj0, S(j), Z^{j−1})
≤ I(Mj1, L; Z(j)|C, Mj0, S(j), Z^{j−1}) − I(L; Z(j)|C, Mj0, Mj1, S(j), Z^{j−1})
≤ I(U^n; Z(j)|C, Mj0, S(j), Z^{j−1}) − H(L|C, Mj0, Mj1, S(j), Z^{j−1}) + H(L|C, Mj0, Mj1, S(j), Z^j)
(a)≤ nI(U; Z|S) − H(L|C, Mj0, Mj1, S(j), Z^{j−1}) + H(L|C, Mj0, Mj1, S(j), Z^j)
(b)≤ nI(U; Z|S) − H(L|C, Mj0, Mj1, S(j), Z^{j−1}) + n(R − R0) − nI(U; Z, S) + nδ′(ǫ)
= n(R − R0) − H(L|C, Mj0, Mj1, S(j), Z^{j−1}) + nδ′(ǫ),
where (a) follows from the same steps used in bounding I(Mj0; Z(j)|C, S(j), Z^{j−1}), and (b) follows from Lemma 3.1, which requires the same condition R − R0 > I(U; Z, S) + δ(ǫ). Next, consider

H(L|C, Mj0, Mj1, S(j), Z^{j−1})
= H(Mj1 ⊕ Kj−1|C, Mj0, Mj1, S(j), Z^{j−1}) + H(L|C, Mj0, Mj1, Mj1 ⊕ Kj−1, S(j), Z^{j−1})
= H(Kj−1|C, Mj0, Mj1, S(j), Z^{j−1}) + n(R − R0 − RK)
= H(Kj−1|C, Z^{j−1}) + n(R − R0 − RK).

From Proposition 3.1, H(Kj−1|C, Z^{j−1}) ≥ n(RK − δ(ǫ) − δ′′′(ǫ)), which implies that

I(Mj1; Z(j)|C, Mj0, S(j), Z^{j−1}) ≤ n(δ(ǫ) + δ′(ǫ) + δ′′′(ǫ)).
This completes the analysis of the information leakage rate.
Rate analysis
From the analysis of the probability of error and of the information leakage rate, we see that the rate constraints are

R < I(U; Y, S) − δ(ǫ),
R − R0 > I(U; Z, S) + δ(ǫ),
RK < H(S|Z) − 4δ(ǫ),
R0 + R1 ≤ R,
R1 ≤ RK,
R̃ = R0 + R1,
R̃ ≥ 0, R0 ≥ 0, R1 ≥ 0, RK ≥ 0,

where R̃ denotes the overall message rate. Using Fourier–Motzkin elimination (e.g., see Lecture 6 of [13]), we obtain

R̃ < max_{p(u), v(u,s), p(x|s,v)} min {I(U; Y, S) − I(U; Z, S) + H(S|Z), I(U; Y, S)}
(a)= max_{p(u), v(u,s), p(x|s,v)} min {I(V; Y|S) − I(V; Z|S) + H(S|Z), I(V; Y|S)},

where (a) follows by the independence of U and S and the fact that V is a function of U and S.
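The Fourier–Motzkin step can be checked numerically for fixed values of the information quantities: brute-force over the remaining rates and compare the best achievable message rate with the min{·, ·} expression. A sketch with arbitrary placeholder values (the numbers are illustrative, not derived from any channel; the message rate is taken as R0 + R1 with slack R0 + R1 ≤ R):

```python
import itertools

# Arbitrary placeholder values for I(U;Y,S), I(U;Z,S), H(S|Z);
# the delta(eps) slack terms are dropped.
I_UYS, I_UZS, H_SZ = 0.9, 0.5, 0.3

step = 0.01
grid = [i * step for i in range(100)]  # candidate rates in [0, 1)

best = 0.0
for R, R0, RK in itertools.product(grid, repeat=3):
    if R < I_UYS and R - R0 > I_UZS and RK < H_SZ:
        R1 = min(RK, R - R0)       # largest R1 with R1 <= RK, R0 + R1 <= R
        best = max(best, R0 + R1)  # message rate R0 + R1

target = min(I_UYS - I_UZS + H_SZ, I_UYS)
# best approaches target from below as the grid is refined.
```

The brute force never exceeds the closed-form value and approaches it up to grid resolution, which is exactly what the elimination step asserts.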
Case 2: RS−CSI−1 with I(U; Y, S) ≤ I(U; Z, S)

In this case, only part of the key generated from the previous block is used to encrypt the message transmitted in the current block. The other part of the key is used to create additional uncertainty at the eavesdropper about the transmitted message, so that a sufficiently large secret key rate can still be achieved in the current block. Note that we only need to consider the case where H(S|Z) − (I(U; Z, S) − I(U; Y, S)) > 0.
Codebook generation
Codebook generation again consists of two parts.
Message codebook generation: Let R ≥ Rd and R̃ ≤ R − Rd. Randomly and independently generate sequences u^n(l), l ∈ [1 : 2^{nR}], each according to ∏_{i=1}^n pU(ui), and partition the set of indices [1 : 2^{nR}] into 2^{nRd} equal-size bins C(md), md ∈ [1 : 2^{nRd}]. We further partition the set of indices in each bin C(md) into sub-bins C(md, m), m ∈ [1 : 2^{nR̃}].
Key codebook generation: We randomly bin the set of sequences s^n ∈ S^n into 2^{nRK} bins B(k), k ∈ [1 : 2^{nRK}].
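The key codebook is simply a random partition of the s^n sequences: any terminal that observes s^n can compute its bin index, so encoder and decoder derive the same key. A minimal sketch, with a deterministic hash standing in for the shared random partition B(k) (names illustrative):

```python
import hashlib

def bin_index(s_seq, n_bins):
    # Stand-in for the random binning: every terminal that observes
    # s_seq computes the same bin index, used as the shared key.
    digest = hashlib.sha256(bytes(s_seq)).digest()
    return int.from_bytes(digest[:8], "big") % n_bins

n, R_K = 12, 0.5
n_bins = 2 ** int(n * R_K)                 # 2^{nR_K} bins B(k)
s = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]   # observed state sequence s^n
k = bin_index(s, n_bins)                   # key for the next block
assert 0 <= k < n_bins
```

A true random binning assigns each sequence an independent uniform bin; the hash here is only a convenient deterministic surrogate that both terminals can evaluate.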
Encoding
We send b − 1 messages over b n-transmission blocks. In the first block, we randomly select an index L. The encoder then computes vi = v(ui(L), si), i ∈ [1 : n], and transmits a randomly generated sequence X^n according to ∏_{i=1}^n pX|S,V(xi|si, vi). At the end of the first block, the encoder and decoder declare k1 ∈ [1 : 2^{nRK}] such that s(1) ∈ B(k1) as the key to be used in block 2.

Encoding in block j ∈ [2 : b] is as follows. We split the key kj−1 into two independent parts, k(j−1)d and k(j−1)m, at rates Rd and R̃, respectively. To send message mj, the encoder computes m′ = mj ⊕ k(j−1)m. This requires that RK ≥ R̃ + Rd. The encoder then randomly selects an index L ∈ C(k(j−1)d, m′). At time i ∈ [(j − 1)n + 1 : jn], it computes vi = v(ui(L), si) and transmits a randomly generated sequence X^n according to ∏_{i=1}^n pX|S,V(xi|si, vi).
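The key split can be realized by partitioning the key's bit representation: the low-order bits act as k(j−1)d (selecting the bin) and the remaining bits as k(j−1)m (the one-time pad). A sketch with illustrative parameters:

```python
n, R_K, R_d = 10, 0.8, 0.3
n_d = int(n * R_d)        # bits forming k_{(j-1)d}, rate R_d
n_m = int(n * R_K) - n_d  # remaining bits forming k_{(j-1)m}

def split_key(k: int):
    # When k is uniform over [0 : 2^{nR_K} - 1], the two parts
    # are independent and uniform on their own ranges.
    k_d = k % (2 ** n_d)   # low-order n_d bits
    k_m = k >> n_d         # remaining n_m bits
    return k_d, k_m

k = 0b10110101
k_d, k_m = split_key(k)
assert k == (k_m << n_d) | k_d  # the split loses no key bits
```

Independence of the two parts is what lets one part encrypt the bin index while the other pads the message, as in the scheme above.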
Decoding and analysis of the probability of error
At the end of block j, the decoder declares that l is sent if it is the unique index such that (u^n(l), Y(j), S(j)) ∈ Tǫ^(n) and l ∈ C(k(j−1)d); otherwise, it declares an error. It then finds the index m′ such that u^n(l) ∈ C(k(j−1)d, m′). Finally, it recovers mj by computing mj = (m′ − k(j−1)m) mod 2^{nR̃}. Following similar steps to the analysis for Case 1, it can be shown that Pe → 0 as n → ∞ if R − Rd < I(U; Y, S) − δ(ǫ).
Analysis of the information leakage rate
Following the same steps as for Case 1, we can show that it suffices to upper bound the terms I(Mj; Z(j)|C, S(j), Z^{j−1}) for j ∈ [2 : b]. Define M′j = Mj ⊕ K(j−1)m and consider

I(Mj; Z(j)|C, S(j), Z^{j−1})
= H(Mj) − H(Mj|C, S(j), Z^j)
≤ H(Mj) − H(Mj|C, S(j), K(j−1)d, M′j, Z^j)
= H(Mj) − H(Mj|C, K(j−1)d, M′j, Z^{j−1})
= H(Mj) − H(Mj|C, Z^{j−1}, K(j−1)d) − H(M′j|C, Z^{j−1}, K(j−1)d, Mj) + H(M′j|C, Z^{j−1}, K(j−1)d)
= nR̃ − H(Mj) + H(M′j|C, Z^{j−1}, K(j−1)d) − H(M′j|C, Z^{j−1}, K(j−1)d, Mj)
≤ nR̃ − H(K(j−1)m|C, Z^{j−1}, K(j−1)d).

Next, we show that

I(K(j−1)m; Z^{j−1}|C, K(j−1)d) ≤ nδ′′′(ǫ), (3.5)
H(K(j−1)m|C, K(j−1)d) ≥ n(RK − Rd − δ(ǫ)), (3.6)

which imply

I(Mj; Z(j)|C, S(j), Z^{j−1}) ≤ nR̃ − n(RK − Rd) + n(δ′′′(ǫ) + δ(ǫ)).

Hence, the rate of information leakage tends to zero as n → ∞ if R̃ ≤ RK − Rd. To prove (3.5) and (3.6), we need the following.
Proposition 3.2. If R > I(U; Z, S) + δ(ǫ) and RK < H(S|Z) − 4δ(ǫ), then for all j ∈ [1 : b],
1. H(Kj|C) ≥ n(RK − δ(ǫ)).
2. I(Kj; Z(j)|C) ≤ n(δ(ǫ) + δ′(ǫ) + δ′′(ǫ)).
3. I(Kj; Z^j|C) ≤ nδ′′′(ǫ),
where δ′(ǫ) → 0, δ′′(ǫ) → 0, and δ′′′(ǫ) → 0 as ǫ → 0.
The proof of this proposition is given in Appendix B.2.
Part 3 of Proposition 3.2 implies (3.5) since

I(Kj−1; Z^{j−1}|C) = I(K(j−1)d, K(j−1)m; Z^{j−1}|C)
= I(K(j−1)d; Z^{j−1}|C) + I(K(j−1)m; Z^{j−1}|C, K(j−1)d).

Part 1 of Proposition 3.2 implies (3.6) since H(Kj−1|C) = H(K(j−1)m, K(j−1)d|C) ≥ n(RK − δ(ǫ)), which implies that H(K(j−1)m|C, K(j−1)d) ≥ n(RK − Rd − δ(ǫ)).
Rate analysis
The following rate constraints are necessary for Case 2:

R > I(U; Z, S) + δ(ǫ),
R − Rd < I(U; Y, S) − δ(ǫ),
R̃ ≤ R − Rd,
RK < H(S|Z) − 4δ(ǫ),
R̃ ≤ RK − Rd,
R ≥ 0, RK ≥ 0, Rd ≥ 0, R̃ ≥ 0.

Using Fourier–Motzkin elimination, we obtain

R̃ < max_{p(u), v(u,s), p(x|s,v)} min {I(U; Y, S) − I(U; Z, S) + H(S|Z), I(U; Y, S)}
= max_{p(u), v(u,s), p(x|s,v)} min {I(V; Y|S) − I(V; Z|S) + H(S|Z), I(V; Y|S)}.
Case 3: RS−CSI−2
Achievability of RS−CSI−2 uses the same techniques as Case 2 for RS−CSI−1. However,
here the key generated in a block is used only to encrypt the message in the follow-
ing block. The eavesdropper may be able to decode the message transmitted in a
block, which would reduce the key rate generated at the end of that block. This is
compensated for by the fact that the entire key is used for encryption. The codebook generation, encoding, and analyses of the probability of error and equivocation are therefore similar to those in Case 2.
As an outline, in each block we generate a key Kj of size 2^{nH(S|V,Z)}. In the next block, we encrypt the message Mj+1 using Kj to obtain M′j+1 = Mj+1 ⊕ Kj. A codeword V^n(M′j+1) is then selected from a codebook of size less than 2^{nI(V;Y,S)}. This codeword is then transmitted by generating Xi according to pX|V,S(xi|vi, si) for i ∈ [1 : n]. The decoder first decodes M′j+1 using (Y(j + 1), S(j + 1)) and then decrypts the message as Mj+1 = (M′j+1 − Kj) mod 2^{nRK}.
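The chaining of blocks in this outline can be simulated end to end. A toy sketch in which the bin index of each block's state sequence becomes the pad for the next block's message; channel decoding is assumed error-free and a hash stands in for the random binning (all names illustrative):

```python
import hashlib
import random

N = 2 ** 8  # key/message alphabet size, standing in for 2^{nR_K}

def key_from_state(state_seq):
    # Stand-in for the bin index of s^n in B(k).
    h = hashlib.sha256(bytes(state_seq)).digest()
    return int.from_bytes(h[:4], "big") % N

random.seed(0)
messages = [random.randrange(N) for _ in range(4)]
states = [[random.randrange(4) for _ in range(16)] for _ in range(4)]

# Encoder: the key of block j encrypts the message sent in block j+1.
keys = [key_from_state(s) for s in states]
encrypted = [(m + keys[j]) % N for j, m in enumerate(messages)]

# Decoder: after recovering M'_{j+1}, it recomputes the same key from
# its own copy of the state sequence and decrypts.
decoded = [(c - key_from_state(states[j])) % N
           for j, c in enumerate(encrypted)]
assert decoded == messages
```

The sketch only illustrates the chaining; the actual scheme draws the key from a random partition and must also control the key rate against what the eavesdropper learns.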
Remark 3.4.1. An important difference between the schemes for achieving RS−CSI−1
and RS−CSI−2 is that the transmitted codeword in the scheme for RS−CSI−1 can be
kept secret from the eavesdropper through a combination of Wyner wiretap coding and
encryption of the codebook using the secret key. This fact was made clear in Case 2 of
the proof, where part of the key was explicitly used to encrypt the codebook instead of
the message. On the other hand, no effort was made to keep the transmitted codeword
secret from the eavesdropper in the scheme for RS−CSI−2. Hence, one cannot assume
that the eavesdropper has no information about the codeword sent. As such, we did
not use the Shannon strategy as in Figure 3.1 in the encoding part for RS−CSI−2. If
we had used the Shannon strategy, we would have obtained the expression
R′S−CSI−2 = max
p(u),v(u,s),p(x|s,v){H(S|Z, U), I(U ; Y, S)}
= maxp(u),v(u,s),p(x|s,v)
{H(S|Z, U), I(V ; Y |S)}.
The first term in R′S−CSI−2, H(S|Z, U), is the rate of the secret key generated under the assumption that the eavesdropper knows the transmitted codeword. As a result of this term, R′S−CSI−2 is simply a special case of our original RS−CSI−2 expression, in which the rate is maximized over p(v) (V independent of S). This is in contrast to the equivalent characterization of RS−CSI−1 in (3.3), where we were able to perform the maximization of the rate over p(v|s).
We now turn to the proof of achievability of RS−CSI−2.
Codebook generation
Codebook generation again consists of two parts.
Message codebook generation: Randomly and independently generate sequences v^n(l), l ∈ [1 : 2^{nR}], each according to ∏_{i=1}^n pV(vi).
Key codebook generation: Set RK = R. Randomly bin the set of sequences s^n ∈ S^n into 2^{nRK} bins B(k), k ∈ [1 : 2^{nRK}].
Encoding
We send b − 1 messages over b n-transmission blocks. In the first block, we randomly select an index L ∈ [1 : 2^{nR}]. The encoder then selects v^n(L) and transmits a randomly generated sequence X^n according to ∏_{i=1}^n pX|S,V(xi|si, vi). At the end of the first block, the encoder and the decoder declare the index k1 ∈ [1 : 2^{nRK}] such that s(1) ∈ B(k1) as the key to be used in block 2.

Encoding in block j ∈ [2 : b] is as follows. To send message mj, the encoder computes the encrypted message m′ = mj ⊕ kj−1. It then selects the sequence v^n(m′). At time i ∈ [(j − 1)n + 1 : jn], it transmits a randomly generated sequence X^n according to ∏_{i=1}^n pX|S,V(xi|si, vi).
Decoding and analysis of the probability of error
At the end of block j, the decoder declares that m′ is sent if it is the unique index such that (v^n(m′), Y(j), S(j)) ∈ Tǫ^(n); otherwise, it declares an error. It then recovers mj by computing mj = (m′ − kj−1) mod 2^{nRK}. Following similar steps to the analysis for Case 1, it can be shown that the probability of error tends to zero as n → ∞ if R < I(V; Y, S) − δ(ǫ).
Analysis of the information leakage rate
Following the same steps as for Case 1, we can show that it suffices to upper bound the terms I(Mj; Z(j)|C, S(j), Z^{j−1}) for j ∈ [2 : b]. Consider

I(Mj; Z(j)|C, S(j), Z^{j−1})
= H(Mj) − H(Mj|C, S(j), Z^j)
≤ H(Mj) − H(Mj|C, S(j), Mj ⊕ Kj−1, Z^j)
= H(Mj) − H(Mj|C, Mj ⊕ Kj−1, Z^{j−1})
= H(Mj) − H(Mj ⊕ Kj−1, Mj|C, Z^{j−1}) + H(Mj ⊕ Kj−1|C, Z^{j−1})
≤ nR − H(Mj|C, Z^{j−1}) − H(Mj ⊕ Kj−1|C, Z^{j−1}, Mj) + nR
= nR − H(Kj−1|C, Z^{j−1}).

Next, we show that

I(Kj−1; Z^{j−1}|C) ≤ nδ′′(ǫ), (3.7)
H(Kj−1|C) ≥ n(RK − δ(ǫ)), (3.8)

which imply

I(Mj; Z(j)|C, S(j), Z^{j−1}) ≤ n(δ′′(ǫ) + δ(ǫ)).
To prove (3.7) and (3.8), we will use the following.
Proposition 3.3. If RK < H(S|Z, V) − 4δ(ǫ), then for all j ∈ [1 : b],
1. H(Kj|C) ≥ n(RK − δ(ǫ)).
2. I(Kj; Z(j)|C) ≤ nδ′(ǫ).
3. I(Kj; Z^j|C) ≤ nδ′′(ǫ),
where δ′(ǫ) → 0 and δ′′(ǫ) → 0 as ǫ → 0.
The proof of this proposition is given in Appendix B.3. It is clear that equa-
tions (3.7) and (3.8) are implied by Proposition 3.3, which completes the analysis of
information leakage rate.
Rate analysis
The following rate constraints are necessary for Case 3:

R = RK,
R < I(V; Y, S) − δ(ǫ),
RK < H(S|Z, V) − 4δ(ǫ),
RK ≥ 0, R ≥ 0.

Using Fourier–Motzkin elimination, these constraints imply the achievability of

R < max_{p(v), p(x|s,v)} min {H(S|Z, V), I(V; Y|S)}.
3.5 Proof of Theorem 3.2
For any sequence of codes with probability of error and leakage rate that approach
zero as n → ∞, consider
nR = H(M)
(a)≤ I(M; Y^n, S^n) + nǫn
(b)≤ I(M; Y^n, S^n) − I(M; Z^n) + 2nǫn
= ∑_{i=1}^n (I(M; Yi, Si|Y^n_{i+1}, S^n_{i+1}) − I(M; Zi|Z^{i−1})) + 2nǫn
(c)= ∑_{i=1}^n (I(M, Z^{i−1}; Yi, Si|Y^n_{i+1}, S^n_{i+1}) − I(M, Y^n_{i+1}, S^n_{i+1}; Zi|Z^{i−1})) + 2nǫn
(d)= ∑_{i=1}^n (I(M; Yi, Si|Y^n_{i+1}, S^n_{i+1}, Z^{i−1}) − I(M; Zi|Y^n_{i+1}, S^n_{i+1}, Z^{i−1})) + 2nǫn
(e)= ∑_{i=1}^n (I(V1i; Yi, Si|Ui) − I(V1i; Zi|Ui)) + 2nǫn
= ∑_{i=1}^n (I(V1i; Yi, Si|Ui) − I(V1i; Zi, Si|Ui) + I(V1i; Si|Zi, Ui)) + 2nǫn
≤ ∑_{i=1}^n (I(V1i; Yi, Si|Ui) − I(V1i; Zi, Si|Ui) + H(Si|Zi, Ui)) + 2nǫn
≤ ∑_{i=1}^n (I(V1i; Yi|Ui, Si) − I(V1i; Zi|Ui, Si) + H(Si|Zi, Ui)) + 2nǫn
(f)= n(I(V1; Y|U, S) − I(V1; Z|U, S)) + nH(S|Z, U) + 2nǫn,

where (a) follows by Fano's inequality; (b) follows by the secrecy condition; (c) and (d) follow by the Csiszár sum identity; (e) follows by the identifications Ui = (Y^n_{i+1}, S^n_{i+1}, Z^{i−1}) and V1i = (M, Y^n_{i+1}, S^n_{i+1}, Z^{i−1}); and (f) follows by using the time-sharing random variable Q and defining U = (UQ, Q), V1 = (V1Q, Q), S = SQ, Y = YQ, and Z = ZQ.
For the second upper bound, we have

nR ≤ I(M; Y^n, S^n) + nǫn
(a)= I(M; Y^n|S^n) + nǫn
= ∑_{i=1}^n I(M; Yi|S^n, Y^n_{i+1}) + nǫn
≤ ∑_{i=1}^n I(M, Y^n_{i+1}, Z^{i−1}, S^n_{i+1}, S^{i−1}; Yi|Si) + nǫn
(b)= ∑_{i=1}^n I(V2i; Yi|Si) + nǫn
= nI(V2Q; YQ|SQ, Q) + nǫn
(c)≤ nI(V2; Y|S) + nǫn,

where (a) follows by the independence of M and S^n; (b) follows by the identification V2i = (M, Y^n_{i+1}, Z^{i−1}, S^n_{i+1}, S^{i−1}); and (c) follows by defining V2 = (V2Q, Q). The bounds on the cardinality of the auxiliary random variables follow from standard techniques (e.g., see [13, Appendix C]).
3.6 Summary of Chapter
We established bounds on the secrecy capacity of the wiretap channel with state
information causally available at the encoder and the decoder. We showed that our
lower bound can be strictly larger than the best known lower bound for the noncausal
state information case. The upper bound holds when the state information is available
noncausally at the encoder and the decoder. We showed that the bounds are tight
for several classes of wiretap channels with state.
As we have seen, for several special classes of wiretap channels with state available at both the encoder and the decoder, the secrecy capacity does not depend on whether the state is available causally or noncausally. An interesting question to explore is whether this observation holds in general for our setup.
We used key generation from state information to improve the message transmis-
sion rate. It may be possible to extend this idea to the case when the state information
is available only at the encoder. This case, however, is not straightforward to analyze
since it would be necessary for the encoder to reveal some state information to the
decoder (and hence partially to the eavesdropper) in order to agree on a secret key,
which would reduce the wiretap coding part of the rate.
Chapter 4
Cascade and Triangular Source
Coding with Side Information
4.1 Introduction to Cascade and Triangular Source
Coding
In this chapter, we turn our attention to a multi-terminal source coding problem in network information theory: the Cascade and Triangular source coding problem. The problem of lossy source coding through a cascade was first considered, in an information theoretic setup, by Yamamoto [39], where a source node (Node 0) sends a message to Node 1, which then sends a message to Node 2. Nodes 1 and 2 wish to reconstruct the source sequence to within given distortions. Since Yamamoto's work, the cascade setting has been extended in recent years by incorporating side information at either Node 1 or Node 2. As mentioned in the first chapter, this model of Cascade source coding with side information has potential applications in peer-to-peer networking, such as video compression and transmission over a network, where each node may have side information, such as previous video frames, about the video to be sent from the source. In [33], the authors considered the Cascade problem with side information Y at Node 1 and Z at Node 2, with the Markov chain X − Z − Y.
authors provided inner and outer bounds for this setup and showed that the bounds
coincide for the Gaussian case. In [8], the authors considered the Cascade problem
where the side information is known only to the intermediate node and provided inner
and outer bounds for this setup.
Of most relevance to the results discussed in this chapter is the work in [27], where the authors considered the Cascade source coding problem with side information available at both Node 0 and Node 1 and established the rate distortion region for this setup. The Cascade setting was then extended to the Triangular source coding setting, where an additional rate limited link is available from the source node to Node 2.
Figure 4.1: Cascade source coding setting.
Figure 4.2: Triangular source coding setting.
Given the results in [27], a natural question is whether they can be extended to richer classes of Cascade source coding problems. A related question is the following. The achievability scheme for the Cascade result in [27] relies on Node 1 decoding and re-transmitting the codeword sent by Node 0 to Node 2. This is essentially a special case of the decode and re-bin scheme, where Node 1 decodes and re-bins the codeword sent by Node 0 to Node 2. When is this decode and re-bin scheme optimum, and what statistical structure of the sources and network topology is required for this scheme to be optimum? In this chapter, we extend the Cascade and Triangular source coding
setting in [27] to include additional side information Z at Node 2, with the constraint
that the Markov chain X − Y −Z holds. Under the Markov constraint, we establish
the rate distortion regions for both the Cascade and Triangular setting. The Cascade
and Triangular settings are shown in Figures 4.1 and 4.2, respectively. In the Cascade
case, we show that, at Node 1, the decode and re-bin scheme is optimum. To the best
of our knowledge, this is the first lossy source coding setting where the decode and
re-bin scheme at the Cascade is shown to be optimum. (In [16], the decode and re-bin
scheme was shown to be optimum for some classes of source statistics in a lossless
setting.) The decode and re-bin appears to rely quite heavily on the fact that the side
information at Node 2 is degraded: Since Node 1 can decode any codeword intended
for Node 2, there is no need for Node 0 to send additional information for Node 1
to relay to Node 2 on the R1 link. Node 0 can therefore tailor the transmission for
Node 1 and rely on Node 1 to decode and minimize the rate required on the R2 link.
We also extend our results to two way source coding through a cascade, where Node
0 wishes to obtain a lossy version of Z through a rate limited link from Node 2 to
Node 0. This setup generalizes the (two-round) two way source coding result found in [17].1 The Two Way Cascade Source Coding and Two Way Triangular Source Coding settings are shown in Figures 4.3 and 4.4, respectively.
The rest of this chapter is organized as follows. In section 4.2, we provide the formal definitions and problem setup. In section 4.3, we present and prove our results for the aforementioned settings. In section 4.4, we consider the Quadratic Gaussian case. We show that Gaussian auxiliary random variables suffice to exhaust the rate distortion regions and that their parameters may be found by solving a tractable low dimensional optimization problem. We also show that our Quadratic Gaussian settings may be transformed into equivalent settings in [27] for which explicit characterizations
1 [17] considered multiple rounds. In this chapter, we consider only two rounds, and when we mention the results in [17], we mean the two-round version of the results.
Figure 4.3: Setup for two way cascade source coding.
Figure 4.4: Setup for two way triangular source coding.
were given. In the Quadratic Gaussian case, we also extend our settings to solve a more general case of Two Way Cascade source coding. In section 4.5, we extend our triangular source coding setup to include a helper, which observes the side information Y and has a rate limited link to Node 2. Our Two Way Cascade Quadratic Gaussian extension is shown in Figure 4.5 (in section 4.4), while our helper extension is shown in Figure 4.7 (in section 4.5).
4.2 Problem Definition
In this section, we give formal definitions for the setups under consideration. The
source sequences under consideration, {Xi ∈ X , i = 1, 2, . . .}, {Yi ∈ Y , i = 1, 2, . . .}
and {Zi ∈ Z, i = 1, 2, . . .}, are drawn from finite alphabets X , Y and Z respectively.
For any i ≥ 1, the random variables (Xi, Yi, Zi) are independent and identically
distributed according to p(x, y, z) = p(x)p(y|x)p(z|y); i.e., X − Y − Z form a Markov chain. The distortion measure between sequences is defined in the usual way. Let d : X × X̂ → [0, ∞). Then,

d(x^n, x̂^n) := (1/n) ∑_{i=1}^n d(xi, x̂i).
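This per-letter distortion is just an empirical average. A minimal sketch, with Hamming distortion as an illustrative choice of d:

```python
def avg_distortion(x, x_hat, d):
    """d(x^n, xhat^n) = (1/n) * sum_i d(x_i, xhat_i)."""
    assert len(x) == len(x_hat)
    return sum(d(a, b) for a, b in zip(x, x_hat)) / len(x)

# Hamming distortion: 0 if the symbols agree, 1 otherwise.
hamming = lambda a, b: 0 if a == b else 1
assert avg_distortion([0, 1, 1, 0], [0, 1, 0, 0], hamming) == 0.25
```

Any nonnegative per-letter measure d fits this template; the achievability definitions below only constrain the expected value of this average.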
4.2.1 Cascade and Triangular Source coding
We give a formal definition for the Triangular source coding setting (Figure 4.2). The Cascade setting follows from specializing the definitions for the Triangular setting by setting R3 = 0. An (n, 2^{nR1}, 2^{nR2}, 2^{nR3}) code for the Triangular setting consists of three encoders,

f1 (at Node 0) : X^n × Y^n → M1 ∈ [1 : 2^{nR1}],
f2 (at Node 1) : Y^n × [1 : 2^{nR1}] → M2 ∈ [1 : 2^{nR2}],
f3 (at Node 0) : X^n × Y^n → M3 ∈ [1 : 2^{nR3}],

and two decoders,

g1 (at Node 1) : Y^n × [1 : 2^{nR1}] → X̂1^n,
g2 (at Node 2) : Z^n × [1 : 2^{nR2}] × [1 : 2^{nR3}] → X̂2^n.
Given (D1, D2), a rate-distortion tuple (R1, R2, R3, D1, D2) for the Triangular source coding setting is said to be achievable if, for any ǫ > 0 and n sufficiently large, there exists an (n, 2^{nR1}, 2^{nR2}, 2^{nR3}) code for the Triangular source coding setting such that

E[(1/n) ∑_{i=1}^n dj(Xi, X̂j,i)] ≤ Dj + ǫ, j = 1, 2,

where X̂1^n = g1(Y^n, f1(X^n, Y^n)) and X̂2^n = g2(Z^n, f2(Y^n, f1(X^n, Y^n)), f3(X^n, Y^n)).
The rate-distortion region, R(D1, D2), is defined as the closure of the set of all
achievable rate-distortion tuples.
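The nesting of the maps above fixes the information flow: Node 2 sees X^n and Y^n only through the forwarded indices. A schematic sketch with toy stand-ins for f1, f2, f3, g2 (the function bodies are placeholders; only the signatures and the composition mirror the definitions):

```python
# Toy stand-ins: each "index" is just an integer summary of its inputs.
def f1(x, y):          # Node 0 -> Node 1, rate R1
    return hash((tuple(x), tuple(y))) % 16

def f2(y, m1):         # Node 1 -> Node 2, rate R2
    return hash((tuple(y), m1)) % 16

def f3(x, y):          # Node 0 -> Node 2, rate R3
    return hash((tuple(x), tuple(y))) % 4

def g2(z, m2, m3):     # Node 2 reconstructs from Z^n and the two indices
    return [(zi + m2 + m3) % 2 for zi in z]

x, y, z = [1, 0, 1], [1, 1, 0], [0, 0, 1]
# Node 2 never sees x or y directly -- only m2 and m3.
x2_hat = g2(z, f2(y, f1(x, y)), f3(x, y))
assert len(x2_hat) == len(z)
```

Setting f3 to a constant (R3 = 0) recovers the Cascade composition, exactly as noted below.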
Cascade Source coding
The Cascade source coding setting corresponds to the case where R3 = 0.
4.2.2 Two way Cascade and Triangular Source Coding
We give formal definitions for the more general Two way Triangular source coding setting shown in Figure 4.4. An (n, 2^{nR1}, 2^{nR2}, 2^{nR3}, 2^{nR4}) code for the Two way Triangular setting consists of four encoders,

f1 (at Node 0) : X^n × Y^n → M1 ∈ [1 : 2^{nR1}],
f2 (at Node 1) : Y^n × [1 : 2^{nR1}] → M2 ∈ [1 : 2^{nR2}],
f3 (at Node 0) : X^n × Y^n → M3 ∈ [1 : 2^{nR3}],
f4 (at Node 2) : Z^n × [1 : 2^{nR2}] × [1 : 2^{nR3}] → M4 ∈ [1 : 2^{nR4}],

and three decoders,

g1 (at Node 1) : Y^n × [1 : 2^{nR1}] → X̂1^n,
g2 (at Node 2) : Z^n × [1 : 2^{nR2}] × [1 : 2^{nR3}] → X̂2^n,
g3 (at Node 0) : X^n × Y^n × [1 : 2^{nR4}] → Ẑ^n.
Given (D1, D2, D3), a rate-distortion tuple (R1, R2, R3, R4, D1, D2, D3) for the Two way Triangular source coding setting is said to be achievable if, for any ǫ > 0 and n sufficiently large, there exists an (n, 2^{nR1}, 2^{nR2}, 2^{nR3}, 2^{nR4}) code for the Two way Triangular source coding setting such that

E[(1/n) ∑_{i=1}^n dj(Xi, X̂j,i)] ≤ Dj + ǫ, j = 1, 2, and
E[(1/n) ∑_{i=1}^n d3(Zi, Ẑi)] ≤ D3 + ǫ,

where X̂1^n = g1(Y^n, f1(X^n, Y^n)), X̂2^n = g2(Z^n, f2(Y^n, f1(X^n, Y^n)), f3(X^n, Y^n)), and Ẑ^n = g3(X^n, Y^n, f4(Z^n, f2(Y^n, f1(X^n, Y^n)), f3(X^n, Y^n))).
The rate-distortion region, R(D1, D2, D3), is defined as the closure of the set of
all achievable rate-distortion tuples.
Two way Cascade Source coding
The Two way Cascade source coding setting corresponds to the case where R3 = 0.
In the special case of the Two way Cascade setting, we will use R3, rather than R4, to
denote the rate from Node 2 to Node 0.
4.3 Main results
In this section, we present our main results, which are single letter characterizations
of the rate-distortion regions for the four settings introduced in section 4.2. The
single letter characterizations for the Cascade source coding setting, Triangular source
coding setting, Two way Cascade source coding setting and Two way Triangular
source coding setting are given in Theorems 4.1, 4.2, 4.3 and 4.4, respectively. While
Theorems 4.1 to 4.3 can be derived as special cases of Theorem 4.4, for clarity and
to illustrate the development of the main ideas, we will present Theorems 4.1 to 4.4
separately. In each of the Theorems, we will present a sketch of the achievability proof
and proof of the converse. Details of the achievability proofs for Theorems 4.1-4.4
are given in Appendix C. Proofs of the cardinality bounds for the auxiliary random
variables appearing in the Theorems are given in Appendix C.5. In each of the Theorems presented, the achievability scheme does not require the Markov structure X − Y − Z; hence, it can be used even if the sources do not satisfy the Markov condition. The Markov condition is required only for the proof of the converse.
4.3.1 Cascade Source Coding
Theorem 4.1. Rate Distortion region for Cascade source coding
R(D1, D2) for the Cascade source coding setting defined in section 4.2 is given by the set of all rate tuples (R1, R2) satisfying

R2 ≥ I(U; X, Y|Z),
R1 ≥ I(X; X̂1, U|Y)

for some p(x, y, z, u, x̂1) = p(x)p(y|x)p(z|y)p(u|x, y)p(x̂1|x, y, u) and function g2 : U × Z → X̂2 such that

E dj(X, X̂j) ≤ Dj, j = 1, 2.

The cardinality of U is upper bounded by |U| ≤ |X||Y| + 3.
If Z = ∅, this region reduces to the Cascade source coding region given in [27]. If
Y = X , this setup reduces to the well-known Wyner-Ziv setup [38].
The coding scheme follows from a combination of techniques used in [27] and a
new idea of decoding and re-binning at the Cascade node (Node 1). Node 0 generates
a description Un intended for Nodes 1 and 2. Node 1 decodes Un and then re-bins it
to reduce the rate of communicating U^n to Node 2 based on its side information. In addition, Node 0 generates X̂1^n to satisfy the distortion requirement at Node 1. We now give a sketch of achievability and a proof of the converse.
Sketch of Achievability
We first generate 2^{n(I(X,Y;U)+ǫ)} sequences U^n according to ∏_{i=1}^n p(ui). For each pair of sequences (u^n, y^n), we generate 2^{n(I(X̂1;X|U,Y)+ǫ)} sequences X̂1^n according to ∏_{i=1}^n p(x̂1i|ui, yi). Partition the set of U^n sequences into 2^{n(I(U;X|Y)+2ǫ)} bins, B1(m10). Separately and independently, partition the set of U^n sequences into 2^{n(I(U;X,Y|Z)+2ǫ)} bins, B2(m2), m2 ∈ [1 : 2^{n(I(U;X,Y|Z)+2ǫ)}].

Given (x^n, y^n), Node 0 looks for a jointly typical codeword u^n; that is, (u^n, x^n, y^n) ∈ Tǫ^(n). If there is more than one, it selects a codeword uniformly at random from the set of jointly typical codewords. This operation succeeds with high probability since there are 2^{n(I(X,Y;U)+ǫ)} U^n sequences. Node 0 then looks for an x̂1^n that is jointly typical with (u^n, x^n, y^n). This operation succeeds with high probability since there are 2^{n(I(X̂1;X|U,Y)+ǫ)} x̂1^n sequences. Node 0 then sends the bin index m10 such that u^n ∈ B1(m10), together with the index corresponding to x̂1^n. This requires a total rate of R1 = I(U; X|Y) + I(X̂1; X|U, Y) + 3ǫ.

Node 1 recovers u^n by looking for the unique u^n sequence in B1(m10) such that (u^n, y^n) ∈ Tǫ^(n). Since there are only 2^{n(I(X,Y;U)−I(U;X|Y)−ǫ)} = 2^{n(I(U;Y)−ǫ)} sequences in the bin, this operation succeeds with high probability. Node 1 reconstructs x^n as x̂1^n. Node 1 then sends m2 such that u^n ∈ B2(m2). This requires a rate of R2 = I(U; X, Y|Z) + 2ǫ.

At Node 2, note that since U − (X, Y) − Z, the sequences (U^n, X^n, Y^n, Z^n) are jointly typical with high probability. Node 2 looks for the unique u^n in B2(m2) such that (u^n, z^n) ∈ Tǫ^(n). From the Markov chain U − (X, Y) − Z, I(U; X, Y) − I(U; X, Y|Z) = I(U; Z). Hence, this operation succeeds with high probability since there are only 2^{n(I(U;Z)−ǫ)} u^n sequences in the bin. It then reconstructs using x̂2i = g2(ui, zi) for i ∈ [1 : n].
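The repeated "jointly typical with high probability" arguments above rest on an empirical-frequency test. A minimal sketch of one common form of the ǫ-joint-typicality check (the toy pmf and sequences are illustrative, not derived from the setting):

```python
from collections import Counter

def jointly_typical(xn, yn, p_xy, eps):
    """Check |freq(a,b) - p(a,b)| <= eps * p(a,b) for every pair (a,b)
    in the support, where freq is the empirical joint frequency."""
    n = len(xn)
    freq = Counter(zip(xn, yn))
    return all(abs(freq[(a, b)] / n - p) <= eps * p
               for (a, b), p in p_xy.items())

# Toy joint pmf on {0,1}^2.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
xn = [0] * 4 + [1] * 4 + [0, 1]
yn = [0] * 4 + [1] * 4 + [1, 0]
# This pair has empirical frequencies matching p_xy exactly.
assert jointly_typical(xn, yn, p_xy, eps=0.1)
```

Bin decoding then amounts to running such a test against each candidate codeword in the bin and declaring the unique one that passes.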
Proof of Converse. Given an (n, 2^{nR1}, 2^{nR2}, D1, D2) code, define Ui = (X^{i−1}, Y^{i−1}, Z^{i−1}, Z^n_{i+1}, M2). We have the following.

nR2 ≥ H(M2)
≥ H(M2|Z^n)
= I(X^n, Y^n; M2|Z^n)
= ∑_{i=1}^n I(Xi, Yi; M2|Z^n, X^{i−1}, Y^{i−1})
= ∑_{i=1}^n (H(Xi, Yi|Z^n, X^{i−1}, Y^{i−1}) − H(Xi, Yi|Z^n, X^{i−1}, Y^{i−1}, M2))
= ∑_{i=1}^n (H(Xi, Yi|Zi) − H(Xi, Yi|Zi, Ui))
= ∑_{i=1}^n I(Xi, Yi; Ui|Zi).
Next,

nR1 ≥ H(M1)
≥ H(M1|Y^n, Z^n)
= H(M1, M2|Y^n, Z^n)
= I(X^n; M1, M2|Y^n, Z^n)
= ∑_{i=1}^n I(Xi; M1, M2|X^{i−1}, Y^n, Z^n)
= ∑_{i=1}^n (H(Xi|X^{i−1}, Y^n, Z^n) − H(Xi|X^{i−1}, Y^n, Z^n, M1, M2))
= ∑_{i=1}^n (H(Xi|Yi, Zi) − H(Xi|X^{i−1}, Y^n, Z^n, M1, M2))
(a)= ∑_{i=1}^n (H(Xi|Yi) − H(Xi|X^{i−1}, Y^n, X̂1i, Z^n, M1, M2))
≥ ∑_{i=1}^n (H(Xi|Yi) − H(Xi|X̂1i, Yi, Ui))
= ∑_{i=1}^n I(Xi; X̂1i, Ui|Yi).
Step (a) follows from the Markov assumption X − Y − Z and the fact that X̂1i is a function of (Y^n, M1). Next, let Q be a random variable uniformly distributed over [1 : n] and independent of (X^n, Y^n, Z^n). We note that XQ = X, YQ = Y, ZQ = Z,
and

R2 ≥ I(XQ, YQ; UQ|Q, ZQ)
= I(XQ, YQ; UQ, Q|ZQ)
= I(X, Y; UQ, Q|Z),
R1 ≥ I(XQ; X̂1Q, UQ|YQ, Q)
= I(X; X̂1Q, UQ, Q|Y).

Defining U = (UQ, Q) and X̂1 = X̂1Q then completes the proof. The existence of the reconstruction function g2 follows from the definition of U. The Markov chains U − (X, Y) − Z and Z − (U, X, Y) − X̂1, required to factor the probability distribution stated in the Theorem, also follow from the definitions of U and X̂1.
We now extend Theorem 4.1 to the Triangular Source coding setting.
4.3.2 Triangular Source Coding
Theorem 4.2. Rate Distortion Region for Triangular Source Coding
R(D1, D2) for the Triangular source coding setting defined in section 4.2 is given by
the set of all rate tuples (R1, R2, R3) satisfying
R1 ≥ I(X; X1, U|Y),
R2 ≥ I(X, Y; U|Z),
R3 ≥ I(X, Y; V|U, Z)

for some p(x, y, z, u, v, x1) = p(x)p(y|x)p(z|y)p(u|x, y)p(x1|x, y, u)p(v|x, y, u) and function g2 : U × V × Z → X2 such that E dj(X, Xj) ≤ Dj, j = 1, 2.
The cardinalities of the auxiliary random variables can be upper bounded by |U| ≤ |X||Y| + 4 and |V| ≤ (|X||Y| + 4)(|X||Y| + 1).
If Z = ∅, this region reduces to the Triangular source coding region given in [27].
The proof for the Triangular case follows that of the Cascade case, with the additional step of Node 0 generating an additional description V^n intended for Node 2. This description is then binned to reduce the rate, with U^n and Z^n as side information at Node 2. Node 2 first decodes U^n and then V^n.
Sketch of Achievability
The achievability proof is an extension of that of Theorem 4.1. The additional step is that, for each u^n sequence, we generate 2^{n(I(V;X,Y|U)+ǫ)} v^n sequences according to ∏_{i=1}^n p(vi|ui), and bin these sequences into 2^{n(I(V;X,Y|U,Z)+2ǫ)} bins B3(m3), m3 ∈ [1 : 2^{nR3}]. To send from Node 0 to Node 2, Node 0 first finds a v^n sequence that is jointly typical with (u^n, x^n, y^n). This operation succeeds with high probability since we have 2^{n(I(V;X,Y|U)+ǫ)} v^n sequences. We then send out m3, the bin number of v^n. At Node 2, from the probability distribution, we have the Markov chain (V, U) − (X, Y) − Z. Hence, the sequences are jointly typical with high probability. Node 2 decodes by looking for the unique v^n ∈ B3(m3) such that (u^n, v^n, z^n) are jointly typical. This operation succeeds with high probability since the number of sequences in each bin is 2^{n(I(V;Z|U)−ǫ)}. Node 2 then reconstructs using the function g2.
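As in the Cascade case, the bin-size count at Node 2 rests on I(V;X,Y|U) − I(V;X,Y|U,Z) = I(V;Z|U), valid under the Markov chain (V, U) − (X, Y) − Z. A numerical sanity check with a randomly drawn pmf of the stated factorization (alphabet sizes below are arbitrary illustrative choices):

```python
import itertools

import numpy as np

rng = np.random.default_rng(2)

# Random joint pmf p(u, v, x, y, z) with the factorization of Theorem 4.2:
# p(x) p(y|x) p(z|y) p(u|x,y) p(v|x,y,u), so that (V, U) - (X, Y) - Z holds.
nU = nV = 2
nX = nY = nZ = 2
px = rng.dirichlet(np.ones(nX))
py_x = rng.dirichlet(np.ones(nY), size=nX)
pz_y = rng.dirichlet(np.ones(nZ), size=nY)
pu_xy = rng.dirichlet(np.ones(nU), size=(nX, nY))
pv_xyu = rng.dirichlet(np.ones(nV), size=(nX, nY, nU))

p = np.zeros((nU, nV, nX, nY, nZ))
for u, v, x, y, z in itertools.product(*(range(k) for k in p.shape)):
    p[u, v, x, y, z] = (px[x] * py_x[x, y] * pz_y[y, z]
                        * pu_xy[x, y, u] * pv_xyu[x, y, u, v])

def H(pmf):
    """Entropy in bits of a pmf given as an ndarray."""
    q = np.atleast_1d(pmf)
    q = q[q > 0]
    return float(-(q * np.log2(q)).sum())

def I_cond(A, B, C=()):
    """I(A; B | C) in bits; A, B, C are tuples of axis indices of p."""
    def Hm(axes):
        drop = tuple(a for a in range(p.ndim) if a not in axes)
        return H(p.sum(axis=drop))
    return Hm(A + C) + Hm(B + C) - Hm(A + B + C) - Hm(C)

U, V, X, Y, Z = (0,), (1,), (2,), (3,), (4,)
lhs = I_cond(V, X + Y, U) - I_cond(V, X + Y, U + Z)  # per-bin exponent
rhs = I_cond(V, Z, U)                                # I(V; Z|U)
assert abs(lhs - rhs) < 1e-9
```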
Proof of Converse. The converse is proved in two parts. In the first part, we derive
the required inequalities and in the second part, we show that the joint probability
distribution can be restricted to the form stated in the Theorem.
Given a (n, 2^{nR1}, 2^{nR2}, 2^{nR3}, D1, D2) code, define Ui = (X^{i−1}, Y^{i−1}, Z^{i−1}, Z^n_{i+1}, M2) and Vi = (Ui, M3). We omit the proofs of the R1 and R2 inequalities since they follow the same steps as in Theorem 4.1. We have

nR1 ≥ ∑_{i=1}^n I(Xi; X1i, Ui|Yi),
nR2 ≥ ∑_{i=1}^n I(Xi, Yi; Ui|Zi).
For R3, we have

nR3 ≥ H(M3)
    ≥ H(M3|M2, Z^n)
    = I(X^n, Y^n; M3|M2, Z^n)
    = ∑_{i=1}^n (H(Xi, Yi|M2, Z^n, X^{i−1}, Y^{i−1}) − H(Xi, Yi|M2, M3, Z^n, X^{i−1}, Y^{i−1}))
    = ∑_{i=1}^n (H(Xi, Yi|Ui, Zi) − H(Xi, Yi|Ui, Vi, Zi))
    = ∑_{i=1}^n I(Xi, Yi; Vi|Ui, Zi).
Next, let Q be a random variable uniformly distributed over [1 : n] and independent of (X^n, Y^n, Z^n). Defining U = (UQ, Q), V = (VQ, Q) and X1 = X1Q then gives us the bounds stated in Theorem 4.2. The existence of the reconstruction function g2 follows from the definitions of U and V. Next, from the definitions of U, V and X1, we note the following Markov relation: (U, V, X1) − (X, Y) − Z. The joint probability distribution can then be factored as p(x, y, z, u, v, x1) = p(x, y, z)p(u|x, y)p(x1, v|x, y, u).
We now show that it suffices to restrict the joint probability distribution to the form p(x, y, z)p(u|x, y)p(x1|x, y, u)p(v|x, y, u), using a method from [27, Lemma 5]. The basic idea is that since the derived inequalities rely on p(x1, v|x, y, u) only through the marginals p(x1|x, y, u) and p(v|x, y, u), we obtain the same bounds even when the probability distribution is restricted to this form.
Fix a joint distribution p(x, y, z)p(u|x, y)p(x1, v|x, y, u) and let p(v|x, y, u) and p(x1|x, y, u) be the induced conditional distributions. Note that p(x, y, z)p(u|x, y)p(x1, v|x, y, u) and p(x, y, z)p(u|x, y)p(x1|x, y, u)p(v|x, y, u) have the same marginals p(x, y, z, u, v) and p(x, y, z, u, x1), and the Markov condition (U, V, X1) − (X, Y) − Z continues to hold under p(x, y, z)p(u|x, y)p(x1|x, y, u)p(v|x, y, u).
Finally, note that the rate and distortion constraints given in Theorem 4.2 depend on the joint distribution only through the marginals p(x, y, z, u, v) and p(x, y, z, u, x1). It therefore suffices to restrict the probability distributions to the form p(x, y, z)p(u|x, y)p(x1|x, y, u)p(v|x, y, u).
4.3.3 Two Way Cascade Source Coding
We now extend the source coding settings to the case where Node 0 requires a lossy version of Z. We first consider the Two Way Cascade Source coding setting defined in section 4.2 (we will use R3 to denote the rate on the link from Node 2 to Node 0). In the forward direction, the achievable scheme is that of the Cascade source coding case. Node 2 then sends back a description of Z^n to Node 0, with X^n, Y^n, U1^n as side information at Node 0. For the converse, we rely on the techniques introduced earlier and also on a technique for establishing Markovity of random variables found in [17].
Theorem 4.3. Rate Distortion Region for Two Way Cascade Source Coding
R(D1, D2, D3) for Two Way Cascade Source Coding is given by the set of all rate
tuples (R1, R2, R3) satisfying
R1 ≥ I(X; X1, U1|Y),
R2 ≥ I(U1; X, Y|Z),
R3 ≥ I(U2; Z|U1, X, Y),

for some p(x, y, z, u1, u2, x1) = p(x)p(y|x)p(z|y)p(u1|x, y)p(x1|u1, x, y)p(u2|z, u1) and functions g2 : U1 × Z → X2 and g3 : U1 × U2 × X × Y → Ẑ such that

E(dj(X, Xj)) ≤ Dj, j = 1, 2,
E(d3(Z, Ẑ)) ≤ D3.
The cardinalities for the auxiliary random variables can be upper bounded by |U1| ≤
|X ||Y|+ 5 and |U2| ≤ |U1|(|Z|+ 1).
If Y = X , this region reduces to the result for two way (two rounds only) source
coding found in [17].
Sketch of Achievability
The forward path (R1 and R2) follows from the Cascade source coding scheme in Theorem 4.1. For the reverse direction, we proceed as follows. For each u1^n, we generate 2^{n(I(U2;Z|U1)+ǫ)} u2^n sequences according to ∏_{i=1}^n p(u2i|u1i) and bin them into 2^{n(I(U2;Z|U1,X,Y)+2ǫ)} bins B3(m3), m3 ∈ [1 : 2^{nR3}]. Node 2 finds a u2^n sequence that is jointly typical with (u1^n, z^n). Since there are 2^{n(I(U2;Z|U1)+ǫ)} sequences, this operation succeeds with high probability. Node 2 then sends out the bin index m3 of the jointly typical u2^n sequence. Node 0 recovers u2^n by looking for the unique sequence in B3(m3) such that (u1^n, u2^n, x^n, y^n) are jointly typical. From the Markov condition U2 − (U1, Z) − (X, Y) and the Markov Lemma [32], the sequences are jointly typical with high probability. Next, since there are only 2^{n(I(U2;X,Y|U1)−ǫ)} sequences in the bin, the probability that we do not find the unique (correct) sequence goes to zero with n. Finally, Node 0 reconstructs using the function g3.
Proof of Converse. Given a (n, 2^{nR1}, 2^{nR2}, 2^{nR3}, D1, D2, D3) code, define U1i = (M2, X^{i−1}, Y^{i−1}, Z^n_{i+1}) and U2i = M3. Note that unlike Theorems 4.1 and 4.2, U1i does not contain Z^{i−1}. We have

nR1 ≥ H(M1)
    ≥ H(M1|Y^n, Z^n)
    = H(M1, M2|Y^n, Z^n)
    = I(X^n; M1, M2|Y^n, Z^n)
    = ∑_{i=1}^n I(Xi; M1, M2|X^{i−1}, Y^n, Z^n)
    = ∑_{i=1}^n (H(Xi|X^{i−1}, Y^n, Z^n) − H(Xi|X^{i−1}, Y^n, Z^n, M1, M2))
    = ∑_{i=1}^n (H(Xi|Yi, Zi) − H(Xi|X^{i−1}, Y^n, Z^n, M1, M2))
    (a)= ∑_{i=1}^n (H(Xi|Yi) − H(Xi|X^{i−1}, Y^n, Z^n, M1, M2))
    (b)= ∑_{i=1}^n (H(Xi|Yi) − H(Xi|X^{i−1}, X1i, Y^n, Z^n, M1, M2))
    ≥ ∑_{i=1}^n (H(Xi|Yi) − H(Xi|X1i, Yi, U1i))
    = ∑_{i=1}^n I(Xi; X1i, U1i|Yi),

where step (a) follows from the Markov assumption Xi − Yi − Zi and step (b) follows from X1i being a function of (Y^n, M1).
Consider now R2:

nR2 ≥ H(M2)
    ≥ H(M2|Z^n)
    = I(X^n, Y^n; M2|Z^n)
    = ∑_{i=1}^n (H(Xi, Yi|Z^n, X^{i−1}, Y^{i−1}) − H(Xi, Yi|Z^n, X^{i−1}, Y^{i−1}, M2))
    ≥ ∑_{i=1}^n I(Xi, Yi; U1i|Zi).
Next, consider R3:

nR3 ≥ H(M3)
    ≥ H(M3|X^n, Y^n)
    ≥ I(M3; Z^n|X^n, Y^n)
    = H(Z^n|X^n, Y^n) − H(Z^n|X^n, Y^n, M3)
    = H(Z^n|X^n, Y^n) − H(Z^n|X^n, Y^n, M2, M3)
    ≥ ∑_{i=1}^n (H(Zi|Xi, Yi) − H(Zi|Z^n_{i+1}, X^i, Y^i, M2, M3))
    = ∑_{i=1}^n I(Zi; U1i, U2i|Xi, Yi)
    = ∑_{i=1}^n I(Zi; U2i|Xi, Yi, U1i),

where the last step follows from the Markov relation Zi − (Xi, Yi) − U1i, which we will now prove together with the other Markov relations between the random variables. The first two Markov relations below are used for factoring the joint probability distribution, while Markov relations three and four are used for establishing the distortion constraints. We will use the following lemma from [17].
Lemma 4.1. Let A1, A2, B1, B2 be random variables with joint probability mass function p(a1, a2, b1, b2) = p(a1, b1)p(a2, b2). Let M1 be a function of (A1, A2) and M2 be a function of (B1, B2, M1). Then,

I(A2; B1|M1, M2, A1, B2) = 0, (4.1)
I(B1; M1|A1, B2) = 0, (4.2)
I(A2; M2|M1, A1, B2) = 0. (4.3)
Now, let us show the following Markov relations:

1. Zi − (Xi, Yi) − (U1i, X1i): To establish this relation, we show that I(Zi; U1i, X1i|Xi, Yi) = 0:

I(Zi; X1i, U1i|Xi, Yi) = I(Zi; X1i, M2, X^{i−1}, Y^{i−1}, Z^n_{i+1}|Xi, Yi)
    ≤ I(Zi; X1i, M2, X^{i−1}, Y^{i−1}, X^n_{i+1}, Y^n_{i+1}, Z^n_{i+1}|Xi, Yi)
    = I(Zi; X^{i−1}, Y^{i−1}, X^n_{i+1}, Y^n_{i+1}, Z^n_{i+1}|Xi, Yi)
    = 0.
2. U2i − (Zi, U1i) − (X1i, Xi, Yi): Note that U2i = M3. Consider

I(X1i, Xi, Yi; U2i|Zi, U1i) ≤ I(X1i, X^n_i, Y^n_i; M3|Z^n_i, X^{i−1}, Y^{i−1}, M2)
    = I(X^n_i, Y^n_i; M3|Z^n_i, X^{i−1}, Y^{i−1}, M2).

Now apply Lemma 4.1 with A1 = (X^{i−1}, Y^{i−1}), B1 = Z^{i−1}, A2 = (X^n_i, Y^n_i), B2 = Z^n_i, taking the Lemma's M1 to be M2 and the Lemma's M2 to be M3. The third expression of the Lemma then gives I(X^n_i, Y^n_i; M3|Z^n_i, X^{i−1}, Y^{i−1}, M2) = 0.
3. Z^{i−1} − (U1i, Zi) − (Xi, Yi): Consider

I(Xi, Yi; Z^{i−1}|U1i, Zi) ≤ I(X^n_i, Y^n_i; Z^{i−1}|X^{i−1}, Y^{i−1}, Z^n_i, M2)
    = H(Z^{i−1}|X^{i−1}, Y^{i−1}, Z^n_i, M2) − H(Z^{i−1}|X^n, Y^n, Z^n_i, M2)
    ≤ H(Z^{i−1}|X^{i−1}, Y^{i−1}, Z^n_i) − H(Z^{i−1}|X^n, Y^n, Z^n_i)
    = H(Z^{i−1}|X^{i−1}, Y^{i−1}) − H(Z^{i−1}|X^{i−1}, Y^{i−1})
    = 0.
4. (X^n_{i+1}, Y^n_{i+1}) − (U1i, U2i, Xi, Yi) − Zi: Consider

I(X^n_{i+1}, Y^n_{i+1}; Zi|U1i, U2i, Xi, Yi) ≤ I(X^n_{i+1}, Y^n_{i+1}; Z^i|M2, M3, Z^n_{i+1}, X^i, Y^i).

Applying the first expression in Lemma 4.1 with A2 = (X^n_{i+1}, Y^n_{i+1}), A1 = (X^i, Y^i), B1 = Z^i and B2 = Z^n_{i+1} gives I(X^n_{i+1}, Y^n_{i+1}; Zi|U1i, U2i, Xi, Yi) = 0.
Distortion constraints

We show that the auxiliary definitions satisfy the distortion constraints by showing the existence of functions x∗2i(U1i, Zi) and z∗i(U1i, U2i, Xi, Yi) such that

E(d2(Xi, x∗2i(U1i, Zi))) ≤ E(d2(Xi, x2i(M2, Z^n))), (4.4)
E(d3(Zi, z∗i(U1i, U2i, Xi, Yi))) ≤ E(d3(Zi, zi(M3, X^n, Y^n))), (4.5)

where x2i(M2, Z^n) and zi(M3, X^n, Y^n) are the original reconstruction functions.
To prove the first expression (4.4), we have

E(d2(Xi, x2i(M2, Z^n))) = ∑ p(xi, yi, z^n, m2) d2(xi, x2i(m2, z^n))
    (a)= ∑ p(u1i, z^i) p(xi, yi|u1i, z^i) d2(xi, x′2i(u1i, zi, z^{i−1}))
    = ∑ p(u1i, zi, z^{i−1}) p(xi, yi|u1i, zi) d2(xi, x′2i(u1i, zi, z^{i−1})),
where (a) follows from defining x′2i(u1i, zi, z^{i−1}) = x2i(m2, z^n) for all x^{i−1}, y^{i−1}, and the last step follows from the Markov relation Z^{i−1} − (U1i, Zi) − (Xi, Yi). Finally, defining

(z^{i−1})∗ = argmin_{z^{i−1}} ∑_{xi,yi} p(xi, yi|u1i, zi) d2(xi, x′2i(u1i, zi, z^{i−1}))

and x∗2i(u1i, zi) = x′2i(u1i, zi, (z^{i−1})∗) gives us

E(d2(Xi, x2i(M2, Z^n))) = ∑ p(u1i, zi, z^{i−1}) (∑_{xi,yi} p(xi, yi|u1i, zi) d2(xi, x′2i(u1i, zi, z^{i−1})))
    ≥ ∑ p(u1i, zi, z^{i−1}) ∑_{xi,yi} p(xi, yi|u1i, zi) d2(xi, x∗2i(u1i, zi))
    = E(d2(Xi, x∗2i(U1i, Zi))).
To prove the second expression (4.5), we follow similar steps. Considering the expected distortion, we have

E(d3(Zi, zi(M3, X^n, Y^n))) = ∑ p(z^n_i, x^n, y^n, m3) d3(zi, zi(m3, x^n, y^n))
    = ∑ p(u1i, u2i, xi, yi, x^n_{i+1}, y^n_{i+1}) p(zi|u1i, u2i, xi, yi, x^n_{i+1}, y^n_{i+1}) d3(zi, z′i(u1i, u2i, xi, yi, x^n_{i+1}, y^n_{i+1}))
    = ∑ p(u1i, u2i, xi, yi, x^n_{i+1}, y^n_{i+1}) p(zi|u1i, u2i, xi, yi) d3(zi, z′i(u1i, u2i, xi, yi, x^n_{i+1}, y^n_{i+1})),

where z′i(u1i, u2i, xi, yi, x^n_{i+1}, y^n_{i+1}) = zi(m3, x^n, y^n) and the last step uses Markov relation 4. The rest of the proof is omitted since it uses the same steps as the proof of the first distortion constraint.
Finally, using the standard time sharing random variable Q as before and defining U1 = (U1Q, Q), U2 = U2Q, X1 = X1Q, we obtain the required outer bound for the rate-distortion region. The bounds on the distortions follow from inequalities (4.4) and (4.5). We show the rest of the proof for D2 and omit the proof for D3 since it follows similar steps. Defining x∗2(u1, z) = x∗2Q(u1Q, z), we have

E(d2(X, x∗2(U1, Z))) = E_Q[E(d2(X, x∗2(U1, Z))|Q)]
    = (1/n) ∑_{i=1}^n E(d2(Xi, x∗2i(U1i, Zi)))
    ≤ (1/n) ∑_{i=1}^n E(d2(Xi, x2i(M2, Z^n)))
    ≤ D2.
We now turn to the final case of Two Way Triangular Source Coding.
4.3.4 Two Way Triangular Source Coding
Theorem 4.4. Rate Distortion Region for Two Way Triangular Source Coding
R(D1, D2, D3) for Two Way Triangular Source Coding is given by the set of all rate
tuples (R1, R2, R3, R4) satisfying
R1 ≥ I(X; X1, U1|Y), (4.6)
R2 ≥ I(X, Y; U1|Z), (4.7)
R3 ≥ I(X, Y; V|Z, U1), (4.8)
R4 ≥ I(U2; Z|U1, V, X, Y), (4.9)

for some p(x, y, z, u1, u2, v, x1) = p(x)p(y|x)p(z|y)p(u1|x, y)p(x1|x, y, u1)p(v|x, y, u1)p(u2|z, u1, v) and functions g2 : U1 × V × Z → X2 and g3 : U1 × U2 × V × X × Y → Ẑ such that

E(d1(X, X1)) ≤ D1, (4.10)
E(d2(X, X2)) ≤ D2, (4.11)
E(d3(Z, Ẑ)) ≤ D3. (4.12)

The cardinalities of the auxiliary random variables are upper bounded by |U1| ≤ |X||Y| + 6, |V| ≤ |U1|(|X||Y| + 3) and |U2| ≤ |U1||V|(|Z| + 1).
Sketch of Achievability
The forward direction (R1, R2, R3) for Two Way Triangular source coding follows the procedure in Theorem 4.2. The reverse direction (R4) follows Theorem 4.3, with (U1, V) replacing the role of U1 in Theorem 4.3.
Proof of Converse. Given a (n, 2^{nR1}, 2^{nR2}, 2^{nR3}, 2^{nR4}, D1, D2, D3) code, define U1i = (M2, X^{i−1}, Y^{i−1}, Z^n_{i+1}), U2i = M4 and Vi = (M3, U1i). The R1 and R2 bounds follow the same steps as in Theorem 4.3. For R3, we have

nR3 ≥ H(M3)
    ≥ H(M3|M2, Z^n)
    = I(X^n, Y^n; M3|M2, Z^n)
    = ∑_{i=1}^n (H(Xi, Yi|M2, Z^n, X^{i−1}, Y^{i−1}) − H(Xi, Yi|M2, M3, Z^n, X^{i−1}, Y^{i−1}))
    ≥ ∑_{i=1}^n (H(Xi, Yi|U1i, Zi) − H(Xi, Yi|U1i, Vi, Zi))
    = ∑_{i=1}^n I(Xi, Yi; Vi|U1i, Zi),

where the inequality holds since H(Xi, Yi|M2, Z^n, X^{i−1}, Y^{i−1}) = H(Xi, Yi|U1i, Zi) by the Markov relation Z^{i−1} − (U1i, Zi) − (Xi, Yi) (Markov relation 3 of Theorem 4.3), while dropping Z^{i−1} from the conditioning in the second term cannot decrease it.
Next, consider

nR4 ≥ H(M4)
    ≥ H(M4|X^n, Y^n)
    ≥ I(M4; Z^n|X^n, Y^n)
    = H(Z^n|X^n, Y^n) − H(Z^n|X^n, Y^n, M4)
    = H(Z^n|X^n, Y^n) − H(Z^n|X^n, Y^n, M2, M3, M4)
    ≥ ∑_{i=1}^n (H(Zi|Xi, Yi) − H(Zi|Z^n_{i+1}, X^i, Y^i, M2, M3, M4))
    = ∑_{i=1}^n I(Zi; U1i, Vi, U2i|Xi, Yi)
    = ∑_{i=1}^n I(Zi; U2i|Xi, Yi, Vi, U1i),

where the last step follows from the Markov relation Zi − (Xi, Yi) − (Vi, U1i), which we will now prove together with the other Markov relations between the random variables. The first two Markov relations are used for factoring the probability distribution, while Markov relations 3 and 4 are used for establishing the distortion constraints.
Markov Relations
1. Zi − (Xi, Yi) − (U1i, Vi, X1i): To establish this relation, we show that I(Zi; X1i, U1i, Vi|Xi, Yi) = 0:

I(Zi; X1i, U1i, Vi|Xi, Yi) = I(Zi; X1i, M3, M2, X^{i−1}, Y^{i−1}, Z^n_{i+1}|Xi, Yi)
    ≤ I(Zi; X1i, M3, M2, X^{i−1}, Y^{i−1}, X^n_{i+1}, Y^n_{i+1}, Z^n_{i+1}|Xi, Yi)
    = I(Zi; X^{i−1}, Y^{i−1}, X^n_{i+1}, Y^n_{i+1}, Z^n_{i+1}|Xi, Yi)
    = 0.
2. U2i − (Zi, U1i, Vi) − (X1i, Xi, Yi): Consider

I(X1i, Xi, Yi; U2i|Zi, U1i, Vi) ≤ I(X1i, X^n_i, Y^n_i; M4|Z^n_i, X^{i−1}, Y^{i−1}, M2, M3)
    = I(X^n_i, Y^n_i; M4|Z^n_i, X^{i−1}, Y^{i−1}, M2, M3).

Now apply Lemma 4.1 with A1 = (X^{i−1}, Y^{i−1}), B1 = Z^{i−1}, A2 = (X^n_i, Y^n_i), B2 = Z^n_i, taking the Lemma's M1 to be (M2, M3) and the Lemma's M2 to be M4. The third expression of the Lemma then gives I(X^n_i, Y^n_i; M4|Z^n_i, X^{i−1}, Y^{i−1}, M2, M3) = 0.
3. Z^{i−1} − (U1i, Vi, Zi) − (Xi, Yi): Consider

I(Xi, Yi; Z^{i−1}|U1i, Vi, Zi) ≤ I(X^n_i, Y^n_i; Z^{i−1}|X^{i−1}, Y^{i−1}, Z^n_i, M2, M3)
    = H(Z^{i−1}|X^{i−1}, Y^{i−1}, Z^n_i, M2, M3) − H(Z^{i−1}|X^n, Y^n, Z^n_i, M2, M3)
    ≤ H(Z^{i−1}|X^{i−1}, Y^{i−1}, Z^n_i) − H(Z^{i−1}|X^n, Y^n, Z^n_i)
    = H(Z^{i−1}|X^{i−1}, Y^{i−1}) − H(Z^{i−1}|X^{i−1}, Y^{i−1})
    = 0.
4. (X^n_{i+1}, Y^n_{i+1}) − (U1i, U2i, Vi, Xi, Yi) − Zi: Consider

I(X^n_{i+1}, Y^n_{i+1}; Zi|U1i, U2i, Vi, Xi, Yi) ≤ I(X^n_{i+1}, Y^n_{i+1}; Z^i|M2, M3, M4, Z^n_{i+1}, X^i, Y^i).

Applying the first expression in Lemma 4.1 with A2 = (X^n_{i+1}, Y^n_{i+1}), A1 = (X^i, Y^i), B1 = Z^i and B2 = Z^n_{i+1} gives I(X^n_{i+1}, Y^n_{i+1}; Zi|U1i, U2i, Vi, Xi, Yi) = 0.
Distortion Constraints
The proof of the distortion constraints is omitted since it follows similar steps to
the Two Way Cascade Source Coding case, with the new Markov relations 3 and 4,
and (U1i, Vi) replacing U1i in the proof.
Using the standard time sharing random variable Q as before and defining U1 = (U1Q, Q), U2 = U2Q, X1 = X1Q and V = VQ, we obtain an outer bound for the rate-distortion region for some probability distribution of the form p(x, y, z, u1, u2, v, x1) = p(x, y, z)p(u1|x, y)p(x1, v|x, y, u1)p(u2|z, u1, v). It remains to show that it suffices to consider probability distributions of the form p(x, y, z)p(u1|x, y)p(x1|x, y, u1)p(v|x, y, u1)p(u2|z, u1, v). This follows similar steps to the proof of Theorem 4.2. Let

p1 = p(x, y, z)p(u1|x, y)p(x1, v|x, y, u1)p(u2|z, u1, v),
p2 = p(x, y, z)p(u1|x, y)p(x1|x, y, u1)p(v|x, y, u1)p(u2|z, u1, v),

where p(x1|x, y, u1) and p(v|x, y, u1) are the marginals induced by p1. Next, note that R1, R2, R3, R4 and the distortion constraints depend on p1 only through the marginals p(x, y, z, u1, u2, v) and p(x, y, z, u1, x1). Since these marginals are the same for p1 and p2, the rate and distortion constraints are unchanged. Finally, note that Markov relations 1 and 2 implied by p1 continue to hold under p2. This completes the proof of the converse.
4.4 Quadratic Gaussian Distortion Case
In this section, we evaluate the rate-distortion regions when (X, Y, Z) are jointly
Gaussian and the distortion is measured in terms of the mean square error. We will
assume, without loss of generality, that X = A + B + Z, Y = B + Z and Z = Z, where A, B and Z are independent, zero-mean Gaussian random variables with variances σ²_A, σ²_B and σ²_Z respectively. While the results in section 4.3 were proven only for discrete memoryless sources, the extension to the Quadratic Gaussian case is standard; see for example [37] and [13, Lecture 3].
4.4.1 Quadratic Gaussian Cascade Source Coding
Corollary 4.1. Quadratic Gaussian Cascade Source Coding

First, we note that if R2 < (1/2) log((σ²_A + σ²_B)/D2), then the distortion constraint D2 cannot be met. Hence, given D1, D2 > 0 and R2 ≥ max{(1/2) log((σ²_A + σ²_B)/D2), 0}, the rate distortion region for Quadratic Gaussian Cascade Source Coding is characterized by the smallest rate R1 such that (D1, D2, R1, R2) is achievable, which is

R1 = max{(1/2) log(σ²_A/D1), (1/2) log(σ²_A/σ²_{A|U,B})},

where U = α∗A + β∗B + Z∗, Z∗ ∼ N(0, σ²_{Z∗}), with α∗, β∗ and σ²_{Z∗} achieving the maximum in the following optimization problem:

maximize σ²_{A|U,B}
subject to R2 ≥ (1/2) log(σ²_U/σ²_{Z∗}),
           D2 ≥ σ²_{A+B|U}.

The optimization problem given in the corollary can be solved following the analysis in [27]. In our proof of the corollary, we will show that the rate distortion region obtained is the same as in the case when the degraded side information Z is available at all nodes.
Converse. Consider the case when the side information Z is available to all nodes.
Without loss of generality, we can subtract the side information away from X and
Y to obtain a rate distortion problem involving only A + B and B at Node 0, B at
Node 1 and no side information at Node 2. Characterization of this class of Quadratic
Gaussian Cascade source coding problem has been carried out in [27] and following
the analysis therein, we can show that the rate distortion region is given by the region
in Corollary 4.1.
Achievability. We evaluate Theorem 4.1 using Gaussian auxiliary random variables. Let U′ = α∗X + (β∗ − α∗)Y + Z∗ = α∗A + β∗(B + Z) + Z∗, and let V be a Gaussian random variable that we will specify later in the proof. We rewrite R1 = I(X; U′, X1|Y) as R1 = I(X; U′, V|Y) with X1 = V + E(X|U′, Y), where V is independent of (U′, Y). Let g2(U′, Z) = E(X|U′, Z). Evaluating R1 and R2 using this choice of auxiliaries, we have

R1 = I(X; U′, V|Y)
   = h(A + B + Z|B + Z) − h(X|U′, V, Y)
   = (1/2) log(σ²_A/σ²_{X|U′,V,Y}),

R2 = I(X, Y; U′|Z)
   = h(U′|Z) − h(U′|X, Y, Z)
   = (1/2) log(σ²_{α∗A+β∗B+Z∗}/σ²_{Z∗})
   = (1/2) log(σ²_U/σ²_{Z∗}).

Next, we have

σ²_{X|U′,Y} = σ²_{A+B+Z | α∗A+β∗(B+Z)+Z∗, B+Z}
   = σ²_{A | α∗A+Z∗, B+Z}
   = σ²_{A | α∗A+Z∗}
   = σ²_{A|U,B}.

If σ²_{X|U′,Y} = σ²_{A|U,B} ≤ D1, we set V = 0 to obtain R1 = (1/2) log(σ²_A/σ²_{X|U′,Y}). If σ²_{X|U′,Y} > D1, we choose V = X − E(X|U′, Y) + Z2, where Z2 ∼ N(0, D1 σ²_{X|U′,Y}/(σ²_{X|U′,Y} − D1)), so that σ²_{X|U′,V,Y} = D1, and obtain R1 = (1/2) log(σ²_A/D1). Therefore,

R1 = max{(1/2) log(σ²_A/D1), (1/2) log(σ²_A/σ²_{A|U,B})}.

Finally, we show that this choice of random variables satisfies the distortion constraints. For D1, note that since E(X − X1)² = σ²_{X|U′,V,Y}, the distortion constraint D1 is always satisfied. For the second distortion constraint, we have

E(X − X2)² = σ²_{X|U′,Z}
   = σ²_{A+B | α∗A+β∗(B+Z)+Z∗, Z}
   = σ²_{A+B | α∗A+β∗B+Z∗, Z}
   = σ²_{A+B | α∗A+β∗B+Z∗}
   = σ²_{A+B|U}
   ≤ D2.

Hence, our choice of auxiliaries U′ and V satisfies the rate and distortion constraints given in the corollary, which completes the proof.
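The two conditional-variance identities used above, σ²_{X|U′,Y} = σ²_{A|U,B} and σ²_{X|U′,Z} = σ²_{A+B|U}, can also be verified by direct covariance algebra. A numerical sanity check (the variances and coefficients below are arbitrary illustrative values, not from the text):

```python
import numpy as np

# Illustrative parameter values; any positive variances and any alpha, beta work.
sA, sB, sZ, sZstar = 1.3, 0.7, 2.0, 0.5   # variances of A, B, Z, Z*
alpha, beta = 0.8, -0.4

# Covariance of the independent base vector (A, B, Z, Z*).
base = np.diag([sA, sB, sZ, sZstar])

# Each row expresses a variable as a linear combination of (A, B, Z, Z*).
T = np.array([
    [1.0, 1.0, 1.0, 0.0],       # X = A + B + Z
    [0.0, 1.0, 1.0, 0.0],       # Y = B + Z
    [0.0, 0.0, 1.0, 0.0],       # Z
    [alpha, beta, beta, 1.0],   # U' = alpha*A + beta*(B+Z) + Z*
    [alpha, beta, 0.0, 1.0],    # U  = alpha*A + beta*B + Z*
    [1.0, 0.0, 0.0, 0.0],       # A
    [0.0, 1.0, 0.0, 0.0],       # B
    [1.0, 1.0, 0.0, 0.0],       # A + B
])
K = T @ base @ T.T
X, Y, Z, Up, U, A, B, AB = range(8)

def cond_var(i, given):
    """sigma^2_{i|given} for a zero-mean Gaussian with covariance K."""
    S = list(given)
    Kss = K[np.ix_(S, S)]
    Kis = K[np.ix_([i], S)]
    return float(K[i, i] - (Kis @ np.linalg.solve(Kss, Kis.T)).item())

assert abs(cond_var(X, [Up, Y]) - cond_var(A, [U, B])) < 1e-9   # sigma^2_{X|U',Y} = sigma^2_{A|U,B}
assert abs(cond_var(X, [Up, Z]) - cond_var(AB, [U])) < 1e-9     # sigma^2_{X|U',Z} = sigma^2_{A+B|U}
```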
4.4.2 Quadratic Gaussian Triangular Source Coding
Corollary 4.2. Quadratic Gaussian Triangular Source Coding

Given D1, D2 > 0 and R2, R3 ≥ 0 with R2 + R3 ≥ (1/2) log((σ²_A + σ²_B)/D2), the rate distortion region for Quadratic Gaussian Triangular Source Coding is characterized by the smallest R1 for which (D1, D2, R1, R2, R3) is achievable, which is

R1 = max{(1/2) log(σ²_A/D1), (1/2) log(σ²_A/σ²_{A|U,B})},

where U = α∗A + β∗B + Z∗, Z∗ ∼ N(0, σ²_{Z∗}), with α∗, β∗ and σ²_{Z∗} achieving the maximum in the following optimization problem:

maximize σ²_{A|U,B}
subject to R2 ≥ (1/2) log(σ²_U/σ²_{Z∗}),
           2^{2R3} D2 ≥ σ²_{A+B|U}.

As with Corollary 4.1, the optimization problem given in this corollary can be solved following the analysis in [27].
Converse. The converse uses the same approach as Corollary 4.1. Consider the case
when the side information Z is available to all nodes. Without loss of generality, we
can subtract the side information away from X and Y to obtain a rate distortion
problem involving only A+B and B at Node 0, B at Node 1 and no side information
at Node 2. Characterization of this class of Quadratic Gaussian Triangular source
coding problem has been carried out in [27] and following the analysis therein, we
can show that the rate distortion region is given by the region in Corollary 4.2.
Achievability. We evaluate Theorem 4.2 using Gaussian auxiliary random variables. Let U′ = α∗X + (β∗ − α∗)Y + Z∗ = α∗A + β∗(B + Z) + Z∗ and V′ = X + ηU′ + Z3, where Z3 ∼ N(0, σ²_{Z3}). Following the analysis in Corollary 4.1, the inequalities for the rates are

R1 = max{(1/2) log(σ²_A/D1), (1/2) log(σ²_A/σ²_{A|U,B})},
R2 ≥ (1/2) log(σ²_U/σ²_{Z∗}),
R3 ≥ I(X, Y; V′|Z, U′) = I(X; V′|Z, U′) = (1/2) log(σ²_{X|Z,U′}/σ²_{X|Z,U′,V′}).

As with Corollary 4.1, the distortion constraint D1 is satisfied with an appropriate choice of X1. For the distortion constraint D2, we have

D2 ≥ σ²_{X|Z,U′,V′}.

Next, note that we can assume equality for R3, since we can adjust η and σ²_{Z3} so that the R3 inequality is met with equality. Since this operation only decreases σ²_{X|Z,U′,V′}, the distortion constraint D2 will still be met. Therefore, setting R3 = (1/2) log(σ²_{X|Z,U′}/σ²_{X|Z,U′,V′}), we have

D2 ≥ σ²_{X|Z,U′,V′} = σ²_{X|Z,U′}/2^{2R3}.

Since σ²_{X|Z,U′} = σ²_{A+B|U}, this completes the proof of achievability.
Remark: As alternative characterizations, we show in Appendix C.6 that the Cascade
and Triangular settings in Corollaries 4.1 and 4.2 can be transformed into equivalent
problems in [27] where explicit characterizations of the rate distortion regions were
given.
4.4.3 Quadratic Gaussian Two Way Source Coding
It is straightforward to extend Corollaries 4.1 and 4.2 to Quadratic Gaussian Two Way Cascade and Triangular Source Coding using the observation that, in the Quadratic Gaussian case, side information at the encoder does not reduce the required rate. Therefore, the backward rate from Node 2 to Node 0 is always lower bounded by (1/2) log(σ²_{Z|Y}/D3). This rate (and the distortion constraint D3) can be achieved by simply encoding Z. We therefore state the following corollary without proof.
Corollary 4.3. Quadratic Gaussian Two Way Triangular Source Coding

Given D1, D2, D3 > 0 and R2, R3 ≥ 0 with R2 + R3 ≥ (1/2) log((σ²_A + σ²_B)/D2) and R4 ≥ max{(1/2) log(σ²_{Z|Y}/D3), 0}, the rate distortion region for Quadratic Gaussian Two Way Triangular Source Coding is characterized by the smallest R1 for which (R1, R2, R3, R4, D1, D2, D3) is achievable, which is

R1 = max{(1/2) log(σ²_A/D1), (1/2) log(σ²_A/σ²_{A|U,B})},

where U = α∗A + β∗B + Z∗, Z∗ ∼ N(0, σ²_{Z∗}), with α∗, β∗ and σ²_{Z∗} achieving the maximum in the following optimization problem:

maximize σ²_{A|U,B}
subject to R2 ≥ (1/2) log(σ²_U/σ²_{Z∗}),
           2^{2R3} D2 ≥ σ²_{A+B|U}.
Remark: The special case of Two Way Cascade Quadratic Gaussian Source Coding can be obtained by setting R3 = 0.

Next, we present an extension of our settings for which we can characterize the rate-distortion region in the Quadratic Gaussian case. In this extended setting, we have a Cascade setting from Node 0 to Node 2 and a Triangular setting from Node 2 to Node 0, with the additional constraint that Node 1 also reconstructs a lossy version of Z. As the formal definitions are natural extensions of those presented in section 4.2, we omit them here. The setting is shown in Figure 4.5.
Figure 4.5: Extended Quadratic Gaussian Two Way source coding
Theorem 4.5. Extended Quadratic Gaussian Two Way Cascade Source Coding

Given D1, D2 > 0, 0 < D_{Z1}, D_{Z2} ≤ σ²_{Z|Y} and R2 ≥ max{(1/2) log((σ²_A + σ²_B)/D2), 0}, the rate distortion region for Extended Quadratic Gaussian Two Way Cascade Source Coding is given by the set of R1, R3, R4, R5 ≥ 0 satisfying the following equality and inequalities:

R1 = max{(1/2) log(σ²_A/D1), (1/2) log(σ²_A/σ²_{A|U,B})},

where U = α∗A + β∗B + Z∗, Z∗ ∼ N(0, σ²_{Z∗}), with α∗, β∗ and σ²_{Z∗} achieving the maximum in the optimization problem

maximize σ²_{A|U,B}
subject to R2 ≥ (1/2) log(σ²_U/σ²_{Z∗}),
           D2 ≥ σ²_{A+B|U},

and

R3 ≥ (1/2) log(σ²_{Z|Y}/D_{Z1}),
R3 + R5 ≥ (1/2) log(σ²_{Z|Y}/min{D_{Z1}, D_{Z2}}),
R4 + R5 ≥ (1/2) log(σ²_{Z|Y}/D_{Z2}).
Proof. Converse

For the forward direction (R1, R2), we note that Node 2 can only send a function of (M1, Y^n, Z^n) to Nodes 0 and 1 over the R4 and R5 links. Since M1 and Y^n are available at both Nodes 0 and 1, the forward rates are lower bounded by those of the setting where Z^n is available at all nodes. Further, in this setting, the distortion constraints D_{Z1} and D_{Z2} are automatically satisfied since Z is available at Nodes 0 and 1. Therefore, (R3, R4, R5) do not affect the achievable (R1, R2) rates in this modified (lower bound) setting. (R1, R2) are then obtained from the observation in Corollary 4.1 that the rate distortion region for our Quadratic Gaussian Cascade setting is the same as in the case where the side information Z is available at all nodes.

For the reverse direction, the lower bounds are derived by letting the side information (X, Y) be available at Node 2 and the side information X be available at Node 1. The D1 and D2 distortion constraints are then automatically satisfied since X is available at all nodes. We then observe that (R1, R2) do not affect the achievable (R3, R4, R5) rates in this modified (lower bound) setting. The stated inequalities for R3, R4, R5 are then obtained from standard cut-set bound arguments and the fact that X − Y − Z form a Markov chain.
Achievability
We analyze only the backward rates R3, R4 and R5 since the forward direction
follows from Corollary 4.1. For the backward rates, we now show that the rates are
achievable without the assumption of (X, Y ) being available at Node 2. We will rely
on results on successive refinement of Gaussian sources with common side information
given in [30]. A simplified figure of the setup for analyzing the backward rates is given
in Figure 4.6. We have three cases to consider.
Figure 4.6: Setup for analysis of achievability of backward rates
Case 1: D_{Z1} ≤ D_{Z2}

In this case, the inequalities in the lower bound reduce to

R3 ≥ (1/2) log(σ²_{Z|Y}/D_{Z1}),
R4 + R5 ≥ (1/2) log(σ²_{Z|Y}/D_{Z2}).

From the successive refinement results in [30], we can show that the following rates are achievable:

R3 = I(U1, U2, U3; Z|Y),
R4 = I(U2; Z|Y),
R5 = I(U3; Z|Y, U2)

for some conditional distribution F(u1, u2, u3|z) and reconstruction functions Z1(U1, U2, U3, Y) and Z2(U1, U2, Y) satisfying the distortion constraints. Now, for fixed R4 ≤ (1/2) log(σ²_{Z|Y}/D_{Z2}), choose D′ (≥ D_{Z2}) such that R4 = (1/2) log(σ²_{Z|Y}/D′). We choose the auxiliary random variables and reconstruction functions in the following manner. Define Q(x) := x σ²_{Z|Y}/(σ²_{Z|Y} − x), and let

U1 = Z + W1, where W1 ∼ N(0, Q(D_{Z1})),
U3 = U1 + W3, where W3 ∼ N(0, Q(D_{Z2}) − Q(D_{Z1})),
U2 = U3 + W2, where W2 ∼ N(0, Q(D′) − Q(D_{Z2})),
Z1 = E(Z|U1, Y),
Z2 = E(Z|U3, Y).
From this choice of auxiliary random variables, it is easy to verify the following:

R3 = I(U1, U2, U3; Z|Y) = I(U1; Z|Y) = (1/2) log(σ²_{Z|Y}/D_{Z1}),
R4 = I(U2; Z|Y) = (1/2) log(σ²_{Z|Y}/D′),
R4 + R5 = I(U2; Z|Y) + I(U3; Z|Y, U2) = I(U2, U3; Z|Y) = (1/2) log(σ²_{Z|Y}/D_{Z2}),
E(Z − Z1)² = D_{Z1},
E(Z − Z2)² = D_{Z2}.
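The claimed rates and distortions follow from elementary Gaussian conditioning: if var(Z|Y) = s and U = Z + W with W ∼ N(0, q) independent of (Z, Y), then var(Z|U, Y) = sq/(s + q) and I(U; Z|Y) = (1/2) log(s/var(Z|U, Y)); the Q(·) above is exactly the inverse of this map. A numerical sanity check with arbitrary illustrative values of s, D_{Z1}, D_{Z2} and D′ (assumptions, not from the text):

```python
import numpy as np

# Illustrative values with D_Z1 <= D_Z2 <= D' < s = sigma^2_{Z|Y}.
s = 2.0
DZ1, DZ2, Dp = 0.4, 0.9, 1.5

def Q(x):
    """Q(x) = x * s / (s - x), as defined in the text."""
    return x * s / (s - x)

# Noise variances of the degraded chain U1 = Z+W1, U3 = U1+W3, U2 = U3+W2.
q1 = Q(DZ1)
q3 = Q(DZ2) - Q(DZ1)
q2 = Q(Dp) - Q(DZ2)
assert q1 > 0 and q3 > 0 and q2 > 0   # the chain is well defined

def cond_var_Z(noise):
    """var(Z | Z + W, Y) when var(Z|Y) = s and W ~ N(0, noise) independent."""
    return s * noise / (s + noise)

# Distortions attained by Zhat1 = E(Z|U1, Y) and Zhat2 = E(Z|U3, Y).
assert abs(cond_var_Z(q1) - DZ1) < 1e-9
assert abs(cond_var_Z(q1 + q3) - DZ2) < 1e-9

# Rates: I(U; Z|Y) = 0.5 * log2(s / var(Z|U, Y)).
R3 = 0.5 * np.log2(s / cond_var_Z(q1))
R4 = 0.5 * np.log2(s / cond_var_Z(q1 + q3 + q2))
assert abs(R3 - 0.5 * np.log2(s / DZ1)) < 1e-9
assert abs(R4 - 0.5 * np.log2(s / Dp)) < 1e-9
```

The same two helper functions verify Cases 2 and 3 by swapping which distortion level sits at the bottom of the degraded noise chain.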
Case 2: D_{Z1} > D_{Z2}, R3 ≥ R4

In this case, the active inequalities are

R3 ≥ (1/2) log(σ²_{Z|Y}/D_{Z1}),
R4 + R5 ≥ (1/2) log(σ²_{Z|Y}/D_{Z2}).

From [30], the following rates are achievable:

R3 = I(U1, U2; Z|Y),
R4 = I(U2; Z|Y),
R5 = I(U3, U1; Z|Y, U2).

First, assume R3 ≤ (1/2) log(σ²_{Z|Y}/D_{Z2}). Choose D_{Z2} ≤ D′ ≤ D″ ≤ D_{Z1}. We choose the auxiliary random variables and reconstruction functions as follows:

U3 = Z + W3, where W3 ∼ N(0, Q(D_{Z2})),
U1 = U3 + W1, where W1 ∼ N(0, Q(D′) − Q(D_{Z2})),
U2 = U1 + W2, where W2 ∼ N(0, Q(D″) − Q(D′)),
Z1 = E(Z|U1, Y),
Z2 = E(Z|U3, Y).
From this choice of auxiliary random variables, it is easy to verify the following:

R3 = I(U1, U2; Z|Y) = I(U1; Z|Y) = (1/2) log(σ²_{Z|Y}/D′),
R4 = I(U2; Z|Y) = (1/2) log(σ²_{Z|Y}/D″),
R4 + R5 = I(U2; Z|Y) + I(U3, U1; Z|Y, U2) = I(U3, U1, U2; Z|Y) = I(U3; Z|Y) = (1/2) log(σ²_{Z|Y}/D_{Z2}),
E(Z − Z1)² = D′ ≤ D_{Z1},
E(Z − Z2)² = D_{Z2}.

Next, consider R3 > (1/2) log(σ²_{Z|Y}/D_{Z2}) and R4 > (1/2) log(σ²_{Z|Y}/D_{Z2}). Then, it is easy to see from our achievability scheme that we can obtain R4′ < R4, R3′ < R3 and R5 = 0 by setting D′ = D″ = D_{Z2}. Finally, consider the case where R3 > (1/2) log(σ²_{Z|Y}/D_{Z2}) and R4 ≤ (1/2) log(σ²_{Z|Y}/D_{Z2}). Then, we observe from our achievability scheme that we can achieve R3′ = (1/2) log(σ²_{Z|Y}/D_{Z2}) < R3 for any R4 and R5 satisfying the inequalities by setting D′ = D_{Z2}.
Case 3: D_{Z1} > D_{Z2}, R3 < R4

In this case, the active inequalities are

R3 ≥ (1/2) log(σ²_{Z|Y}/D_{Z1}),
R3 + R5 ≥ (1/2) log(σ²_{Z|Y}/D_{Z2}).

We first consider the case where R3 ≤ (1/2) log(σ²_{Z|Y}/D_{Z2}). We exhibit a scheme with R4′ = R3 (< R4) that still satisfies the constraints. This is done by letting U2 in Case 2 be equal to U1. For D_{Z2} ≤ D′ ≤ D_{Z1}, define the auxiliary random variables and reconstruction functions as follows:

U3 = Z + W3, where W3 ∼ N(0, Q(D_{Z2})),
U1 = U3 + W1, where W1 ∼ N(0, Q(D′) − Q(D_{Z2})),
Z1 = E(Z|U1, Y),
Z2 = E(Z|U3, Y).

Then, we have the following:

R3 = I(U1; Z|Y) = (1/2) log(σ²_{Z|Y}/D′),
R4′ = I(U1; Z|Y) = (1/2) log(σ²_{Z|Y}/D′),
R3 + R5 = I(U1; Z|Y) + I(U3; Z|Y, U1) = I(U3, U1; Z|Y) = I(U3; Z|Y) = (1/2) log(σ²_{Z|Y}/D_{Z2}),
E(Z − Z1)² = D′ ≤ D_{Z1},
E(Z − Z2)² = D_{Z2}.

Finally, we note that in the case where R3 > (1/2) log(σ²_{Z|Y}/D_{Z2}), we can always achieve R3′ = (1/2) log(σ²_{Z|Y}/D_{Z2}), R4′ = (1/2) log(σ²_{Z|Y}/D_{Z2}) and R5′ = 0 by letting D′ = D_{Z2}.
Remark 1: The Two Way Cascade source coding setup given in section 4.2 can be obtained as a special case by setting R3 = R4 = 0 and letting D_{Z1} → ∞.
Remark 2: The rate distortion region is the same regardless of whether Node
2 sends first, or Node 0 sends first. This observation follows from (i) our result in
Corollary 4.1 where we showed that the rate distortion region for the Cascade setup
is equivalent to the setup where all nodes have the degraded side information Z; and
(ii) our proof above where we showed that the backward rates are the same as in the
case where the side information (X, Y ) is available at all nodes.
Remark 3: For arbitrary sources and distortions, the problem is open in general. Even in the Gaussian case, the problem is open without the Markov chain X − Y − Z. One may also consider the setting with a triangular source coding setup on the forward path from Node 0 to Node 2. This setting is still open, since the trade-off between sending from Node 0 to Node 2 and then to Node 1 versus sending directly to Node 1 from Node 0 is not clear.
4.5 Triangular Source Coding with a helper
We present an extension of our Triangular source coding setup in which the side information $Y$ is also observed at Node 2 through a rate-limited link (or helper). The setup is shown in Figure 4.7. As the formal definitions are natural extensions of those given in section 4.2, we omit them here.
Figure 4.7: Triangular Source Coding with a helper
Theorem 4.6. The rate distortion region for Triangular source coding with a helper is given by the set of rate tuples satisfying
$$R_1 \ge I(X; \hat{X}_1, U_1|Y, U_h),$$
$$R_2 \ge I(U_1; X, Y|Z, U_h),$$
$$R_3 \ge I(X, Y; U_2|U_1, U_h, Z),$$
$$R_h \ge I(U_h; Y|Z)$$
for some $p(x, y, z, u_1, u_2, u_h, \hat{x}_1) = p(x)p(y|x)p(z|y)p(u_h|y)p(u_1|x, y, u_h)p(\hat{x}_1|x, y, u_1, u_h)p(u_2|x, y, u_1, u_h)$ and function $g_2 : \mathcal{U}_1 \times \mathcal{U}_2 \times \mathcal{U}_h \times \mathcal{Z} \to \hat{\mathcal{X}}_2$ such that $\mathrm{E}\, d_j(X, \hat{X}_j) \le D_j$, $j = 1, 2$.
We give a proof of the converse in Appendix C.7. As the achievability techniques
used form a straightforward extension of the techniques described in Appendix C, we
give only a sketch of achievability.
Sketch of Achievability. The achievability follows that of Triangular source coding, with an additional step of generating a lossy description of $Y^n$. The codebook generation consists of the following steps:

• Generate $2^{n(I(Y;U_h)+\epsilon)}$ $u_h^n$ sequences according to $\prod_{i=1}^n p(u_{hi})$. Partition the set of $u_h^n$ sequences into $2^{n(I(U_h;Y|Z)+2\epsilon)}$ bins $\mathcal{B}_h(m_h)$, $m_h \in [1 : 2^{n(I(U_h;Y|Z)+2\epsilon)}]$.

• Generate $2^{n(I(X,Y,U_h;U_1)+\epsilon)}$ $u_1^n$ sequences according to $\prod_{i=1}^n p(u_{1i})$. Partition the set of $u_1^n$ sequences into $2^{n(I(U_1;X|Y,U_h)+2\epsilon)}$ bins $\mathcal{B}_1(m_{10})$. Separately and independently, partition the set of $u_1^n$ sequences into $2^{n(I(U_1;X,Y|Z,U_h)+2\epsilon)}$ bins $\mathcal{B}_2(m_2)$, $m_2 \in [1 : 2^{n(I(U_1;X,Y|Z,U_h)+2\epsilon)}]$.

• For each $(u_1^n, u_h^n, y^n)$ sequence, generate $2^{n(I(\hat{X}_1;X|U_1,Y,U_h)+\epsilon)}$ $\hat{x}_1^n$ sequences according to $\prod_{i=1}^n p(\hat{x}_{1i}|u_{1i}, u_{hi}, y_i)$.

• Generate $2^{n(I(U_2;X,Y|U_h,U_1)+\epsilon)}$ $u_2^n$ sequences according to $\prod_{i=1}^n p(u_{2i}|u_{1i}, u_{hi})$ for each $(u_1^n, u_h^n)$ sequence, and partition these sequences into $2^{n(I(U_2;X,Y|U_1,U_h,Z)+2\epsilon)}$ bins $\mathcal{B}_3(m_3)$.
Encoding consists of the following steps:

• Helper node: The helper node (and Nodes 0 and 1) looks for a $u_h^n$ sequence such that $(u_h^n, y^n) \in \mathcal{T}_\epsilon^{(n)}$. This step succeeds with high probability since there are $2^{n(I(Y;U_h)+\epsilon)}$ $u_h^n$ sequences. The helper then sends out the bin index $m_h$ such that $u_h^n \in \mathcal{B}_h(m_h)$. The sequences $(u_h^n, x^n, y^n, z^n)$ are jointly typical with high probability due to the Markov chain $(X, Z) - Y - U_h$.

• Node 0: Given $(x^n, y^n, u_h^n) \in \mathcal{T}_\epsilon^{(n)}$, Node 0 looks for a jointly typical codeword $u_1^n$. This operation succeeds with high probability since there are $2^{n(I(X,Y,U_h;U_1)+\epsilon)}$ $u_1^n$ sequences. Node 0 then looks for an $\hat{x}_1^n$ that is jointly typical with $(u_1^n, x^n, y^n, u_h^n)$. This operation succeeds with high probability since there are $2^{n(I(\hat{X}_1;X|U_1,U_h,Y)+\epsilon)}$ $\hat{x}_1^n$ sequences.

• Node 0 also finds a $u_2^n$ sequence that is jointly typical with $(u_1^n, u_h^n, x^n, y^n)$. This operation succeeds with high probability since we have $2^{n(I(U_2;X,Y|U_1,U_h)+\epsilon)}$ $u_2^n$ sequences.

• Node 0 then sends out the bin index $m_{10}$ such that $u_1^n \in \mathcal{B}_1(m_{10})$ and the index corresponding to $\hat{x}_1^n$ to Node 1. This requires a total rate of $R_1 = I(U_1; X|Y, U_h) + I(\hat{X}_1; X|U_1, Y, U_h) + 3\epsilon$ to Node 1. Node 0 also sends out the bin index $m_3$ such that $u_2^n \in \mathcal{B}_3(m_3)$ to Node 2. This requires a rate of $I(U_2; X, Y|U_1, U_h, Z) + 2\epsilon$.

• Node 1 decodes the codeword $u_1^n$ and forwards the index $m_2$ such that $u_1^n \in \mathcal{B}_2(m_2)$ to Node 2. This requires a rate of $I(U_1; X, Y|Z, U_h) + 2\epsilon$.
Decoding consists of the following steps:

• Node 1: Node 1 reconstructs $u_1^n$ by looking for the unique $u_1^n$ sequence in $\mathcal{B}_1(m_{10})$ such that $(u_1^n, u_h^n, y^n) \in \mathcal{T}_\epsilon^{(n)}$. Since there are only $2^{n(I(X,Y,U_h;U_1)-I(U_1;X|Y,U_h)-\epsilon)} = 2^{n(I(U_1;U_h,Y)-\epsilon)}$ sequences in the bin, this operation succeeds with high probability. Node 1 then reconstructs $X^n$ as $\hat{X}_1^n(m_{10}, m_{11})$. Since the sequences $(\hat{X}_1^n, X^n)$ are jointly typical with high probability, the expected distortion constraint is satisfied.
• Node 2: We note that since $(U_1, U_2, U_h, X) - Y - Z$ forms a Markov chain, the sequences $(U_h^n, U_1^n, U_2^n, X^n, Y^n, Z^n)$ are jointly typical with high probability. Decoding at Node 2 consists of the following steps:

1. Node 2 first looks for $u_h^n$ in $\mathcal{B}_h(m_h)$ such that $(u_h^n, z^n) \in \mathcal{T}_\epsilon^{(n)}$. This operation succeeds with high probability since there are only $2^{n(I(U_h;Z)-\epsilon)}$ $u_h^n$ sequences in the bin.

2. It then looks for $u_1^n$ in $\mathcal{B}_2(m_2)$ such that $(u_h^n, u_1^n, z^n) \in \mathcal{T}_\epsilon^{(n)}$. Since $I(U_1; X, Y, U_h) - I(U_1; X, Y|Z, U_h) = I(U_1; Z, U_h)$ by the Markov chain $Z - (X, Y, U_h) - U_1$, this operation succeeds with high probability as there are only $2^{n(I(U_1;Z,U_h)-\epsilon)}$ $u_1^n$ sequences in the bin.

3. Finally, it looks for $u_2^n$ in $\mathcal{B}_3(m_3)$ such that $(u_h^n, u_1^n, u_2^n, z^n) \in \mathcal{T}_\epsilon^{(n)}$. Since $I(U_2; X, Y|U_h, U_1) - I(U_2; X, Y|Z, U_h, U_1) = I(U_2; Z|U_1, U_h)$ by the Markov chain $Z - (X, Y, U_h, U_1) - U_2$, this operation succeeds with high probability as there are only $2^{n(I(U_2;Z|U_1,U_h)-\epsilon)}$ $u_2^n$ sequences in the bin.

4. Node 2 then reconstructs using the function $\hat{x}_{2i} = g_2(u_{1i}, u_{2i}, u_{hi}, z_i)$ for $i \in [1 : n]$. Since the sequences $(X^n, Z^n, U_1^n, U_2^n, U_h^n)$ are jointly typical with high probability, the expected distortion constraint is satisfied.
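The bin-decoding steps above repeatedly use chain-rule identities of the form $I(U_1; X, Y, U_h) - I(U_1; X, Y|Z, U_h) = I(U_1; Z, U_h)$, valid under the Markov chain $Z - (X, Y, U_h) - U_1$. A small pure-Python check of this identity on a randomly generated distribution with the factorization $p(x)p(y|x)p(z|y)p(u_h|y)p(u_1|x, y, u_h)$ (all alphabets binary; a sketch for illustration, not the author's code):

```python
import itertools, math, random

random.seed(0)

def rand_cond(parents_card):
    # random conditional pmf p(w | parents) for a binary variable w
    table = {}
    for par in itertools.product(*[range(c) for c in parents_card]):
        q = random.uniform(0.05, 0.95)
        table[par] = (q, 1 - q)
    return table

px = (0.4, 0.6)
py_x = rand_cond((2,))
pz_y = rand_cond((2,))
puh_y = rand_cond((2,))
pu1_xyuh = rand_cond((2, 2, 2))

# joint p(x, y, z, uh, u1) with the Markov structure of Theorem 4.6
joint = {}
for x, y, z, uh, u1 in itertools.product(range(2), repeat=5):
    joint[(x, y, z, uh, u1)] = (px[x] * py_x[(x,)][y] * pz_y[(y,)][z]
                                * puh_y[(y,)][uh] * pu1_xyuh[(x, y, uh)][u1])

def H(idx):
    # entropy (bits) of the marginal over the coordinate positions in idx
    marg = {}
    for k, p in joint.items():
        key = tuple(k[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def I(a, b, c=()):
    # conditional mutual information I(A; B | C) from joint entropies
    a, b, c = tuple(a), tuple(b), tuple(c)
    return H(a + c) + H(b + c) - H(a + b + c) - H(c)

# coordinate positions: 0=X, 1=Y, 2=Z, 3=Uh, 4=U1
lhs = I((4,), (0, 1, 3)) - I((4,), (0, 1), (2, 3))
rhs = I((4,), (2, 3))
assert abs(lhs - rhs) < 1e-9
```

The identity holds because $I(U_1; Z | X, Y, U_h) = 0$ under the assumed factorization, so both sides equal $I(U_1; X, Y, Z, U_h) - I(U_1; X, Y | Z, U_h)$.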
4.6 Summary of Chapter
Rate distortion regions for the cascade, triangular, two-way cascade and two-way
triangular source coding settings were established. Decoding part of the description
intended for Node 2 and then re-binning it was shown to be optimum for our Cascade
and Triangular settings. We also extended our Triangular setting to the case where
there is an additional rate constrained helper, which observes Y , for Node 2. In the
Quadratic Gaussian case, we showed that the auxiliary random variables can be taken
to be jointly Gaussian and that the rate-distortion regions obtained for the Cascade
and Triangular setup were equivalent to the setting where the degraded side informa-
tion is available at all nodes. This observation allows us to transform our Cascade
and Triangular settings into equivalent settings for which explicit characterizations
are known. Characterizations of the rate distortion regions for the Quadratic Gaussian cases were also established in the form of tractable low-dimensional optimization programs. Our Two-way Cascade Quadratic Gaussian setting was extended to solve a more general two-way cascade scenario. The case of generally distributed X, Y, Z, without the degradedness assumption, remains open.
Chapter 5
Causal Cascade and Triangular
Source Coding
5.1 Introduction to Cascade and Triangular Source
Coding with causality constraints
This chapter is a continuation of Chapter 4 where we introduced the Cascade and
Triangular Source Coding problems. In that chapter, we characterized the rate-
distortion regions for a variety of settings when the side information at Node 2 is
degraded with respect to the side informations at Nodes 0 and 1. The intuition we
gained from the results and analysis is that the decode and re-bin scheme is usually
optimum when the side information at Node 2 is “poorer” in some sense. This idea
of poorer quality side information was made precise in the previous chapter when we
assumed a Markov condition on the side informations. In this chapter, we will see
another notion of “poorer” quality side information at Node 2 which would allow us
to establish further results for the Cascade and Triangular Source Coding problems.
Therefore, this chapter will focus on other conditions for the Cascade and Trian-
gular settings for which we can establish the rate distortion regions. In this chapter,
we look at the case of causal reconstruction at the end node (Node 2), where Node
2 is constrained to reconstruct the symbol X2i using only the side information from
time 1 up to i. This setup of causal source coding (and reconstruction) was first intro-
duced in [34], and motivation for considering this setting can be found in that paper.
Essentially, this problem is motivated by source coding systems that are constrained
to operate with limited or no delay. Source coding systems with encoder and decoder
delay constraints form a subclass of schemes in the setting proposed by [34], since
their setting only imposes delay restrictions on the decoder. Results and bounds in
that paper therefore serve as bounds on the limits of performance for source coding
systems with delay constraints. In [31], this setup was extended to the Gu-Effros
cascade source coding problem [15] when causal reconstruction is required at the end
node.
This chapter also considers the setting of causal reconstruction at the end node,
and we derive more general results for both the cascade and triangular source coding
cases. We begin by providing formal definitions and problem statements in section 5.2.
In section 5.3, we consider the problem of Causal Cascade and Triangular Source
Coding with the same side information Y available at the first 2 nodes (the Source
Node, Node 0, and the intermediate node, Node 1) and side information Z available
at Node 2. We characterize the rate distortion regions for these settings and recover
as a special case, the rate distortion region for the Gu-Effros network. In section 5.4,
we consider the problem of (near) lossless source coding through a cascade with side
informations Y and Z available at Nodes 1 and 2 respectively. We establish the
optimum rate region when the positivity condition [34] p(x, y, z) > 0 for all x, y, z
holds; or when the sources form a Markov chain X − Y − Z.
5.2 Problem Definition
Formal definitions of Causal Cascade and Triangular Source Coding are similar to those given in Chapter 4, but for completeness, and also because we are introducing new settings, we give the formal definitions for the setups under consideration in this chapter.
5.2.1 Causal Cascade and Triangular Source coding with same
side information at first 2 nodes
We give the formal definition for the Triangular source coding setting (Figure 5.1). The Cascade setting follows from specializing the definitions for the Triangular setting by setting $R_3 = 0$. A $(n, 2^{nR_1}, 2^{nR_2}, 2^{nR_3})$ code for the Triangular setting consists of 3 encoders
$$f_1 \text{ (at Node 0)} : \mathcal{X}^n \times \mathcal{Y}^n \to M_1 \in [1 : 2^{nR_1}],$$
$$f_2 \text{ (at Node 1)} : \mathcal{Y}^n \times [1 : 2^{nR_1}] \to M_2 \in [1 : 2^{nR_2}],$$
$$f_3 \text{ (at Node 0)} : \mathcal{X}^n \times \mathcal{Y}^n \to M_3 \in [1 : 2^{nR_3}],$$
and 2 decoders
$$g_1 \text{ (at Node 1)} : \mathcal{Y}^n \times [1 : 2^{nR_1}] \to \hat{\mathcal{X}}_1^n,$$
$$g_{2i} \text{ (at Node 2)} : \mathcal{Z}^i \times [1 : 2^{nR_2}] \times [1 : 2^{nR_3}] \to \hat{\mathcal{X}}_{2i}, \quad i \in [1 : n],$$
where $\hat{X}_1^n = g_1(Y^n, f_1(X^n, Y^n))$ and $\hat{X}_{2i} = g_{2i}(Z^i, f_2(Y^n, f_1(X^n, Y^n)), f_3(X^n, Y^n))$.

Given distortions $(D_1, D_2)$, a $(R_1, R_2, R_3, D_1, D_2)$ rate-distortion tuple for the triangular source coding setting is said to be achievable if, for any $\epsilon > 0$ and $n$ sufficiently large, there exists a $(n, 2^{nR_1}, 2^{nR_2}, 2^{nR_3})$ code for the Triangular source coding setting such that
$$\mathrm{E}\left[\frac{1}{n}\sum_{i=1}^n d_j(X_i, \hat{X}_{j,i})\right] \le D_j + \epsilon, \quad j = 1, 2.$$
The rate-distortion region, $\mathcal{R}(D_1, D_2)$, is defined as the closure of the set of all achievable rate-distortion tuples.
Causal Cascade Source coding with same side information at first 2 nodes : The
Cascade source coding setting corresponds to the case where R3 = 0.
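The distinction from Chapter 4 is the causality constraint: $g_{2i}$ sees only the side-information prefix $Z^i$. A toy illustration of this constraint (the per-symbol map below is arbitrary and hypothetical, not a coding scheme from the text): changing future side-information symbols cannot alter earlier outputs.

```python
# Toy causal decoder: at time i it sees (z[0..i], m2, m3) and nothing later.
def causal_decode(z, m2, m3, g2i):
    xhat2 = []
    for i in range(len(z)):
        # only the prefix z[: i + 1] is passed to the time-i function
        xhat2.append(g2i(z[: i + 1], m2, m3))
    return xhat2

# hypothetical per-symbol map: XOR the latest side-info symbol with a bit of m2
g2i = lambda z_prefix, m2, m3: z_prefix[-1] ^ (m2 & 1)

z = [0, 1, 1, 0, 1]
out_full = causal_decode(z, m2=1, m3=0, g2i=g2i)
# altering the last two side-information symbols...
z_changed = z[:3] + [1, 0]
out_changed = causal_decode(z_changed, m2=1, m3=0, g2i=g2i)
# ...must leave the first three outputs unchanged
assert out_full[:3] == out_changed[:3]
```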
Figure 5.1: Triangular source coding setting. If $R_3 = 0$, this setting reduces to the Cascade setting.
Figure 5.2: Lossless Cascade Source Coding Setting
5.2.2 Lossless Causal Cascade Source Coding
This setting is shown in Figure 5.2. A $(n, 2^{nR_1}, 2^{nR_2})$ code for the lossless causal cascade source coding setting consists of 2 encoders
$$f_1 \text{ (at Node 0)} : \mathcal{X}^n \to M_1 \in [1 : 2^{nR_1}],$$
$$f_2 \text{ (at Node 1)} : \mathcal{Y}^n \times [1 : 2^{nR_1}] \to M_2 \in [1 : 2^{nR_2}],$$
and 1 decoder with the following functions for $i \in [1 : n]$:
$$g_{2i} \text{ (at Node 2)} : \mathcal{Z}^i \times [1 : 2^{nR_2}] \to \hat{\mathcal{X}}_{2i}.$$
A $(R_1, R_2)$ rate tuple is said to be achievable if, for any $\epsilon > 0$ and $n$ sufficiently large, there exists a $(n, 2^{nR_1}, 2^{nR_2})$ code for the lossless causal cascade source coding setting such that
$$\mathrm{P}\{\hat{X}_2^n \neq X^n\} \le \epsilon.$$
The rate region is then defined as the closure of the set of all achievable rate tuples.
5.3 Causal Cascade and Triangular Source coding
with same side information at first 2 nodes
In this section, we present our results for the first setting described in section 5.2.1: single-letter characterizations of the rate-distortion regions for both the Triangular and Cascade Source coding settings. In Theorems 5.1 and 5.2, we present only the proofs of the converses. The achievability schemes are similar to the schemes presented in Chapter 4, but with slight modifications to account for the assumption of causal reconstruction. Due to the similarities, we omit the proof of achievability here and only mention the two differences that arise in the causal setting:

• In the Causal Cascade setting, Node 2 decodes its intended codeword $U^n$ without making use of its side information $Z^n$. Hence, the binning rate at Node 1 has to be higher, which is reflected in the rate distortion regions. Similarly, for the Causal Triangular Source Coding case, Node 2 decodes its intended codewords $V^n$ and $U^n$ without making use of the side information $Z^n$.

• At every time $i \in [1 : n]$, Node 2 outputs $\hat{X}_{2i} = g_2(U_i, Z_i)$ in the Causal Cascade case and $\hat{X}_{2i} = g_2(U_i, V_i, Z_i)$ in the Causal Triangular Source Coding case.
5.3.1 Causal Cascade Source Coding
Theorem 5.1. R(D1, D2) for the Cascade source coding setting defined in section 5.2.1
is given by the set of all rate tuples (R1, R2) satisfying
R1 ≥ I(X ; X1, U |Y ),
R2 ≥ I(U ;X, Y )
for some
p(x, y, z, u, x1) = p(x, y, z)p(u|x, y)p(x1|x, y, u) and function g2 : U × Z → X2 such
that
E dj(X, Xj) ≤ Dj , j=1,2.
The cardinality of U is bounded by |U| ≤ |X ||Y|+ 3.
Remark 1: If Y = X , this setup reduces to the case of causal Wyner-Ziv source
coding discussed in [34].
Remark 2: If Z = ∅, then the case of causal source coding at Node 2 is the same as
noncausal source coding at Node 2. Therefore, Theorem 5.1 reduces to the equivalent
result in [27] when Z = ∅.
Remark 3: Let $Y = \emptyset$, $X = (X, Y)$, $d_1((X, Y), \hat{X}_1) = d_1(X, \hat{X}_1)$ and $d_2((X, Y), \hat{X}_2) = d_2(Y, \hat{X}_2)$; then we recover the Gu-Effros network [15] with causal source coding at the end node, which shows that the result in [31] is a special case of Theorem 5.1.
Proof of Converse. Given a $(n, 2^{nR_1}, 2^{nR_2})$ code satisfying the distortion constraints $(D_1, D_2)$, define $U_i = (X^{i-1}, Y^{i-1}, Z^{i-1}, M_2)$. We have the following:
$$nR_2 \ge H(M_2) = I(X^n, Y^n; M_2)$$
$$= \sum_{i=1}^n I(X_i, Y_i; M_2|X^{i-1}, Y^{i-1})$$
$$\stackrel{(a)}{=} \sum_{i=1}^n \left( H(X_i, Y_i|Z^{i-1}, X^{i-1}, Y^{i-1}) - H(X_i, Y_i|Z^{i-1}, X^{i-1}, Y^{i-1}, M_2) \right)$$
$$= \sum_{i=1}^n \left( H(X_i, Y_i) - H(X_i, Y_i|U_i) \right)$$
$$= \sum_{i=1}^n I(X_i, Y_i; U_i).$$
Step (a) follows from the Markov chain $Z^{i-1} - (X^{i-1}, Y^{i-1}, M_2) - (X_i, Y_i)$, which can be shown by considering
$$I(X_i, Y_i; Z^{i-1}|X^{i-1}, Y^{i-1}, M_2) \le H(Z^{i-1}|X^{i-1}, Y^{i-1}) - H(Z^{i-1}|X^n, Y^n, M_2)$$
$$= H(Z^{i-1}|X^{i-1}, Y^{i-1}) - H(Z^{i-1}|X^{i-1}, Y^{i-1}) = 0.$$
Next,
$$nR_1 \ge H(M_1) \ge H(M_1|Y^n)$$
$$= H(M_1, M_2|Y^n) = I(X^n; M_1, M_2|Y^n)$$
$$= \sum_{i=1}^n I(X_i; M_1, M_2|X^{i-1}, Y^n)$$
$$= \sum_{i=1}^n \left( H(X_i|Y_i) - H(X_i|X^{i-1}, Y^n, M_1, M_2) \right)$$
$$\stackrel{(a)}{=} \sum_{i=1}^n \left( H(X_i|Y_i) - H(X_i|X^{i-1}, Y^n, Z^{i-1}, M_1, M_2) \right)$$
$$\stackrel{(b)}{=} \sum_{i=1}^n \left( H(X_i|Y_i) - H(X_i|X^{i-1}, Y^n, \hat{X}_{1i}, Z^{i-1}, M_1, M_2) \right)$$
$$\ge \sum_{i=1}^n \left( H(X_i|Y_i) - H(X_i|\hat{X}_{1i}, Y_i, U_i) \right)$$
$$= \sum_{i=1}^n I(X_i; \hat{X}_{1i}, U_i|Y_i).$$
(a) follows from the Markov chain $Z^{i-1} - (X^{i-1}, Y^n, M_1, M_2) - X_i$, which can be established using the same technique as that used in showing the Markov chain for the bound on $R_2$. (b) follows from the fact that $\hat{X}_{1i}$ is a function of $Y^n$ and $M_1$. Next, the proof is completed in the usual manner by letting $Q$ be a random variable uniformly distributed over $[1 : n]$ and independent of $(X^n, Y^n, Z^n)$. We note that $X_Q = X$, $Y_Q = Y$ and $Z_Q = Z$; defining $U = (U_Q, Q)$ and $\hat{X}_1 = \hat{X}_{1Q}$ then completes the proof. The existence of the reconstruction function $g_2$ follows from the definition of $U$ and the assumption of causal source coding at Node 2. The Markov chain $(U, \hat{X}_1) - (X, Y) - Z$ also follows from the definitions of $U$ and $\hat{X}_1$.
5.3.2 Causal Triangular Source Coding
Theorem 5.2. $\mathcal{R}(D_1, D_2)$ for the Triangular source coding setting defined in section 5.2.1 is given by the set of all rate tuples $(R_1, R_2, R_3)$ satisfying
$$R_1 \ge I(X; \hat{X}_1, U|Y),$$
$$R_2 \ge I(X, Y; U),$$
$$R_3 \ge I(X, Y; V|U)$$
for some $p(x, y, z, u, v, \hat{x}_1) = p(x, y, z)p(u|x, y)p(\hat{x}_1|x, y, u)p(v|x, y, u)$ and function $g_2 : \mathcal{U} \times \mathcal{V} \times \mathcal{Z} \to \hat{\mathcal{X}}_2$ such that $\mathrm{E}\, d_j(X, \hat{X}_j) \le D_j$, $j = 1, 2$. The cardinalities of the auxiliary random variables can be upper bounded by $|\mathcal{U}| \le |\mathcal{X}||\mathcal{Y}| + 4$ and $|\mathcal{V}| \le (|\mathcal{X}||\mathcal{Y}| + 4)(|\mathcal{X}||\mathcal{Y}| + 1)$.

Similar to the Cascade case, if $Z = \emptyset$, this region reduces to the Triangular source coding region given in [27].
Proof of Converse. The converse is proved in two parts: in the first part, the required inequalities are derived, and in the second part, we show that the joint probability distribution can be restricted to the form stated in the Theorem. We give the proof of the first part; the proof of the second part follows along similar lines to a technique in [27, Lemma 5], and we refer readers to it.

Given a $(n, 2^{nR_1}, 2^{nR_2}, 2^{nR_3})$ code satisfying the distortion constraints $(D_1, D_2)$, define $U_i = (X^{i-1}, Y^{i-1}, Z^{i-1}, M_2)$ and $V_i = M_3$. We omit the proofs of the $R_1$ and $R_2$ inequalities since they follow the same steps as in Theorem 5.1. For $R_3$, we have
$$nR_3 \ge H(M_3) \ge I(X^n, Y^n; M_3|M_2)$$
$$\stackrel{(a)}{=} \sum_{i=1}^n \left( H(X_i, Y_i|M_2, X^{i-1}, Y^{i-1}) - H(X_i, Y_i|M_2, M_3, X^{i-1}, Y^{i-1}, Z^{i-1}) \right)$$
$$\ge \sum_{i=1}^n \left( H(X_i, Y_i|U_i) - H(X_i, Y_i|U_i, V_i) \right)$$
$$= \sum_{i=1}^n I(X_i, Y_i; V_i|U_i).$$
(a) follows from the Markov chain $Z^{i-1} - (M_2, M_3, X^{i-1}, Y^{i-1}) - (X_i, Y_i)$. Now, let $Q$ be a random variable uniformly distributed over $[1 : n]$ and independent of $(X^n, Y^n, Z^n)$. We then obtain the bounds stated in Theorem 5.2 by defining $U = (U_Q, Q)$, $V = (V_Q, Q)$ and $\hat{X}_1 = \hat{X}_{1Q}$. The existence of the reconstruction functions $g_{2i}$ for $i \in [1 : n]$ follows from the definitions of $U$ and $V$ and the assumption of causal reconstruction. Next, from the definitions of $U$, $V$ and $\hat{X}_1$, we note the following Markov relation: $(U, V, \hat{X}_1) - (X, Y) - Z$. The joint probability distribution can then be factored as $p(x, y, z, u, v, \hat{x}_1) = p(x, y, z)p(u|x, y)p(\hat{x}_1, v|x, y, u)$. The proof of this second part is omitted here.
5.4 Lossless Causal Cascade Source Coding
In this section, we present results for the setting described in section 5.2.2. The case when a positivity condition is satisfied is presented in Theorem 5.3, while the case of $X - Y - Z$ forming a Markov chain is presented in Theorem 5.4.
Theorem 5.3. Let $p(x, y, z) > 0$ for all $x \in \mathcal{X}$, $y \in \mathcal{Y}$, $z \in \mathcal{Z}$. Then, the rate region for the lossless cascade source coding setting defined in section 5.2.2 is given by the set of $(R_1, R_2)$ satisfying
$$R_1 \ge H(X|Y), \qquad R_2 \ge H(X).$$
Remark: This result extends the result in [34]. It shows that the side information
at the end node does not help when the positivity condition holds. Xn has to be
communicated to both Node 1 and Node 2, without the use of side information Zn.
It is clear that we can generalize this theorem slightly by requiring lossy reconstruction at Node 1. The rate region remains unchanged, as $R_1 \ge H(X|Y)$ implies that we can satisfy the distortion constraint since $X^n$ is recovered (near) losslessly at Node 1.
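The collapse of $\min I(X;U|Y)$ to $H(X|Y)$ under positivity can be spot-checked by brute force over the restricted, illustrative class of deterministic descriptions $u = f(x)$, which automatically satisfy $U - X - (Y, Z)$. Under positivity, $H(X|U, Y, Z) = 0$ forces $f$ to be injective, so every admissible $f$ pays the full $I(X;U|Y) = H(X|Y)$. The sketch below runs this search on a random strictly positive pmf (it is not the author's code, and general $U$ may be stochastic):

```python
import itertools, math, random

random.seed(1)
X, Y, Z, U = range(3), range(2), range(2), range(3)

# strictly positive joint pmf p(x, y, z)
w = {k: random.uniform(0.2, 1.0) for k in itertools.product(X, Y, Z)}
s = sum(w.values())
p = {k: v / s for k, v in w.items()}

def cond_entropy(joint, left, given):
    # H(left | given) in bits; keys are tuples, coords picked by position
    pair, marg = {}, {}
    for k, pr in joint.items():
        kp = tuple(k[i] for i in left + given)
        marg[kp[len(left):]] = marg.get(kp[len(left):], 0.0) + pr
        pair[kp] = pair.get(kp, 0.0) + pr
    return sum(-pr * math.log2(pr / marg[kp[len(left):]])
               for kp, pr in pair.items() if pr > 0)

h_x_given_y = cond_entropy(p, (0,), (1,))          # H(X|Y)

best = float("inf")
for f in itertools.product(U, repeat=len(X)):      # every map u = f(x)
    joint = {k + (f[k[0]],): pr for k, pr in p.items()}  # coords (x,y,z,u)
    if cond_entropy(joint, (0,), (3, 1, 2)) < 1e-9:      # H(X|U,Y,Z) = 0
        # I(X; U | Y) = H(X|Y) - H(X|U,Y)
        best = min(best, h_x_given_y - cond_entropy(joint, (0,), (3, 1)))

# the minimum over admissible deterministic U equals H(X|Y)
assert abs(best - h_x_given_y) < 1e-9
```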
Proof: We give a proof of the converse, since the achievability is straightforward. The converse follows from cutset bound arguments. We first obtain a lower bound on $R_1$ by cutset arguments and by relaxing the lossless requirement to Hamming distortion with $D \ge 0$. Let $\hat{X}_{2i}$ be a function of $M_1$, $Y^n$, $Z^i$ and the distortion measure be Hamming distortion. Then, defining $U_i = (Y^{n \setminus i}, Z^{i-1}, M_1)$, we have
$$nR_1(D) \ge H(M_1) \ge I(M_1; X^n|Y^n)$$
$$= \sum_{i=1}^n \left( H(X_i|Y_i) - H(X_i|X^{i-1}, Y^n, M_1) \right)$$
$$\stackrel{(a)}{=} \sum_{i=1}^n \left( H(X_i|Y_i) - H(X_i|X^{i-1}, Y^n, Z^{i-1}, M_1) \right)$$
$$\ge \sum_{i=1}^n I(X_i; U_i|Y_i).$$
(a) follows from the Markov chain $Z^{i-1} - (X^{i-1}, Y^n, M_1) - X_i$. The lower bound is then obtained following standard procedures and the observation that $U - X - (Y, Z)$. We have
$$R_1(D) \ge \min_{U - X - (Y,Z) : \mathrm{E}\, d(X, \hat{X}_2(U, Y, Z)) \le D} I(X; U|Y).$$
Setting $D = 0$ and following the procedure in [34], we can show that the lower bound reduces to
$$R_1 \ge \min_{U - X - (Y,Z) : H(X|U, Y, Z) = 0} I(X; U|Y).$$
We now show that the above expression is lower bounded by $H(X|Y)$, thereby giving us the required lower bound for $R_1$. To show this, first take $(u, x)$ such that $p(u|x) > 0$ and $p(u) > 0$. We note that such a pair always exists, since $p(x) > 0$ by the positivity condition and $p(u) = \sum_x p(x)p(u|x) > 0$. Now, we show that $p(u|x') = 0$ for all $x' \neq x$.

First, note that since $p(u) > 0$, there exists $(y, z)$ such that $p(u, y, z) > 0$. But since $H(X|U, Y, Z) = 0$, we must have $H(X|U = u, Y = y, Z = z) = 0$. Hence, $p(x|u, y, z)$ is either 0 or 1. But since $p(x|u, y, z) \ge p(x, u, y, z) = p(x, y, z)p(u|x) > 0$ by the positivity condition, we have $p(x|u, y, z) = 1$ and hence $p(x'|u, y, z) = 0$ for all $x' \neq x$. Since $0 = p(x'|u, y, z) \ge p(x', u, y, z) = p(x', y, z)p(u|x')$ and $p(x', y, z) > 0$, we conclude that $p(u|x') = 0$ for all $x' \neq x$. Therefore, $p(x|u) = 1$ for every $u$ such that $p(u) > 0$ and some $x \in \mathcal{X}$. This result implies that
$$H(X|Y, U) \le H(X|U) = \sum_u p(u)H(X|U = u) = 0.$$
Hence, $R_1 \ge \min_{U - X - (Y,Z) : H(X|U, Y, Z) = 0} I(X; U|Y) \ge H(X|Y)$.
Lower Bound on $R_2$

We now turn to proving the required lower bound for $R_2$. From a cutset argument and following a procedure similar to that for $R_1$, we can show the following lower bound for $R_2$:
$$R_2 \ge \min_{U - (X,Y) - Z : H(X|Z, U) = 0} I(X, Y; U).$$
The auxiliary $U$ used in the lower bound can be taken as $U = (U_Q, Q)$, where $U_Q = (Z^{Q-1}, M_2)$ and $Q$ is a random variable uniformly distributed over $[1 : n]$, independent of all other random variables. We now show that this expression is lower bounded by $H(X)$, following arguments similar to those for $R_1$. Take $(u, x)$ such that $p(u|x) > 0$ and $p(u) > 0$ as before. Since $p(u) > 0$, there exists at least one $z$ such that $p(u, z) > 0$. Hence, we have $H(X|U = u, Z = z) = 0$ from the condition $H(X|U, Z) = 0$. Therefore, $p(x|u, z)$ is either 0 or 1. Next, we have $p(x|u, z) \ge p(x, u, z) = \sum_y p(x, y, z, u) = \sum_y p(x, y, z)p(u|x, y)$. Now, since $p(y|x) > 0$ for all $(x, y)$ and $p(u|x) = \sum_y p(y|x)p(u|x, y) > 0$, there must exist a $y$ such that $p(u|x, y) > 0$. Hence, $p(x|u, z) > 0$ from $p(u|x, y) > 0$ and $p(x, y, z) > 0$, which implies that $p(x|u, z) = 1$ and $p(x'|u, z) = 0$ for $x' \neq x$.

Since $0 = p(x'|u, z) \ge p(x', u, z) = \sum_y p(x', y, z)p(u|x', y)$ and $p(x', y, z) > 0$ for all $y$ by positivity, we have that $p(u|x', y) = 0$ for all $y$, which implies that $p(u|x') = \sum_y p(y|x')p(u|x', y) = 0$. We therefore have $p(u) = \sum_{x''} p(x'')p(u|x'') = p(u, x)$. Hence, $p(x|u) = 1$ for some $x \in \mathcal{X}$ and every $u$ such that $p(u) > 0$. Finally, the proof is completed by observing that
$$\min_{U - (X,Y) - Z : H(X|U, Z) = 0} I(X, Y; U) \ge \min_{U - (X,Y) - Z : H(X|U, Z) = 0} \left( H(X) - H(X|U) \right) = H(X).$$
Theorem 5.4 (Markov chain $X - Y - Z$). For the lossless cascade source coding setting defined in section 5.2.2, if $X - Y - Z$ form a Markov chain, then the rate region for lossless causal source coding is given by
$$R_1 \ge H(X|Y), \qquad R_2 \ge I(X, Y; U)$$
for some $p(u|x, y)$ such that $H(X|U, Z) = 0$.
Remark: Unlike the case when noncausal side information Zn is available at the
end node, the case of X − Z − Y forming a Markov Chain is, to the best of our
knowledge, still open for the case of causal reconstruction at the end node.
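Unlike Theorem 5.3's positivity regime, a Markov source lets the end node's side information do real work: any deterministic $u = f(x, y)$ with $H(X|U, Z) = 0$ is admissible in Theorem 5.4, and $I(X, Y; U) = H(U)$ for such $U$. The sketch below runs a brute-force search over these maps for a toy source where $Y = X$ and $Z = Y$ (so $X - Y - Z$ holds trivially and positivity fails; hypothetical numbers for illustration), showing $R_2 = 0$ suffices even though $H(X) = 1$ bit:

```python
import itertools, math

# toy Markov source X - Y - Z: X a uniform bit, Y = X, Z = Y
p = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

def cond_entropy(joint, left, given):
    # H(left | given) in bits; keys are tuples, coords picked by position
    pair, marg = {}, {}
    for k, pr in joint.items():
        kp = tuple(k[i] for i in left + given)
        marg[kp[len(left):]] = marg.get(kp[len(left):], 0.0) + pr
        pair[kp] = pair.get(kp, 0.0) + pr
    return sum(-pr * math.log2(pr / marg[kp[len(left):]])
               for kp, pr in pair.items() if pr > 0)

def entropy(joint, idx):
    return cond_entropy(joint, idx, ())

h_x = entropy(p, (0,))                          # H(X) = 1 bit
best = float("inf")
keys = sorted({(k[0], k[1]) for k in p})        # (x, y) pairs with positive mass
for f in itertools.product((0, 1), repeat=len(keys)):
    fmap = dict(zip(keys, f))
    joint = {k + (fmap[(k[0], k[1])],): pr for k, pr in p.items()}
    if cond_entropy(joint, (0,), (3, 2)) < 1e-9:          # H(X|U,Z) = 0
        # I(X, Y; U) = H(U) - H(U|X,Y) = H(U) since U is deterministic
        best = min(best, entropy(joint, (3,)))

assert h_x == 1.0 and best == 0.0   # side info at Node 2 drives R2 down to 0
```

Here $Z$ already determines $X$, so even the constant description $U$ is admissible; under the positivity condition of Theorem 5.3 this could never happen, since there $R_2 \ge H(X)$.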
Achievability sketch: Fix $p(u|x, y)$ such that $H(X|U, Z) = 0$. We first describe $X^n$ to Node 1, which can be done using a rate of $H(X|Y) + \delta_1(\epsilon)$. Node 1 then codes for the end node using a lossy source code with Hamming distortion measure and distortion $D = 0$. Node 1 carries out this step by covering $(X^n, Y^n)$ with $U^n$, which requires a rate of $I(U; X, Y) + \delta_2(\epsilon)$. Node 2 then reconstructs $X_i$ by $\hat{X}_{2i} = g_{2i}(U_i, Z_i)$. It remains to show that this procedure yields a code with vanishing block probability of error. First, consider $(x^n, z^n, u^n(m_2)) \in \mathcal{T}_\epsilon^{(n)}$. Then, $p_{X,Z,U}(x_i, z_i, u_i) > 0$ for all $i \in [1 : n]$. From the condition $H(X|U, Z) = 0$, we see that $\hat{x}_i(u_i, z_i) = x_i$ for $i \in [1 : n]$. Next, let $0 < \epsilon' < \epsilon$ and define the following error events:
$$\mathcal{E}_1 := \{(X^n, Y^n) \notin \mathcal{T}_{\epsilon'}^{(n)}\},$$
$$\mathcal{E}_2 := \{\hat{X}_1^n(Y^n) \neq X^n\} \text{ (at Node 1)},$$
$$\mathcal{E}_3 := \{(U^n(M_2), \hat{X}_1^n, Y^n, Z^n) \notin \mathcal{T}_\epsilon^{(n)}\}.$$
Then, the probability of “error” is upper bounded by
$$\mathrm{P}(\mathcal{E}) \le \mathrm{P}(\mathcal{E}_1) + \mathrm{P}(\mathcal{E}_2 \cap \mathcal{E}_1^c) + \mathrm{P}(\mathcal{E}_3 \cap (\mathcal{E}_2 \cup \mathcal{E}_1)^c).$$
$\mathrm{P}(\mathcal{E}_1) \to 0$ as $n \to \infty$ by the law of large numbers; $\mathrm{P}(\mathcal{E}_2 \cap \mathcal{E}_1^c) \to 0$ as $n \to \infty$ since $R_1 = H(X|Y) + \delta_1(\epsilon)$; and $\mathrm{P}(\mathcal{E}_3 \cap (\mathcal{E}_2 \cup \mathcal{E}_1)^c) \to 0$ as $n \to \infty$ because $R_2 = I(U; X, Y) + \delta_2(\epsilon)$ and by the conditional typicality lemma [13, Lecture 2], since $U - (X, Y) - Z$.

On $\mathcal{E}^c$, note that $\hat{X}_1^n = X^n$ and $(U^n(M_2), \hat{X}_1^n, Z^n) = (U^n(M_2), X^n, Z^n) \in \mathcal{T}_\epsilon^{(n)}$. Hence,
$$\mathrm{P}\{\hat{X}_2^n \neq X^n | \mathcal{E}^c\} = 0,$$
which implies that $\mathrm{P}\{\hat{X}_2^n \neq X^n\} \le \mathrm{P}(\mathcal{E})$. Since $\mathrm{P}(\mathcal{E}) \to 0$ as $n \to \infty$, this establishes the achievability of the rate region.
Converse Sketch: As the proof of the converse for this case is relatively straightforward and similar to that of Theorem 5.3, we give only a sketch. The lower bound for $R_1$ is established using straightforward cutset arguments and the Markov chain assumption $X - Y - Z$. Given a $(n, 2^{nR_1}, 2^{nR_2})$ code, we have
$$nR_1 \ge H(M_1) = I(M_1; X^n)$$
$$\ge I(X^n; M_1|Y^n, Z^n)$$
$$\stackrel{(a)}{=} H(X^n|Y^n, Z^n) - H(X^n|Y^n, M_1, M_2, Z^n)$$
$$\stackrel{(b)}{\ge} H(X^n|Y^n, Z^n) - n\epsilon_n$$
$$\stackrel{(c)}{=} nH(X|Y) - n\epsilon_n.$$
(a) follows from $M_2$ being a function of $(Y^n, M_1)$; (b) follows from Fano's inequality and the assumption of vanishing block error probability at Node 2; (c) follows from the i.i.d. nature of the source and the side information, and the Markov chain $X - Y - Z$. The lower bound for $R_2$ is established using procedures similar to those used in establishing the lower bounds for $R_1$ and $R_2$ in Theorem 5.3; we therefore omit the proof here.
5.5 Summary of Chapter
In this chapter, we considered cascade and triangular source coding with side information and causal reconstruction at the end node. When the side information at the source and intermediate nodes is the same, we characterized the rate distortion regions for both the cascade and triangular source coding problems. For the cascade setting with causal lossless reconstruction at the end node, we characterized the rate region when the sources satisfy a positivity condition, or when a Markov chain holds.
Chapter 6
Conclusions
In this thesis, we discussed two sub-areas in Network Information Theory and pre-
sented results for both these areas: Information-Theoretic Secrecy and Multi-terminal
Source Coding.
In the area of Information-Theoretic Secrecy, we looked at two generalizations of the Wiretap Channel. We first generalized the wiretap channel to more than one legitimate receiver and more than one eavesdropper. Inner bounds were given for the various settings considered and were shown to be optimum for several classes of three-receiver wiretap channels. An interesting idea developed in the inner bounds was that of structured noise generation, where we generate secrecy through a publicly available superposition codebook. We then looked at a generalization of the wiretap channel to a wiretap channel with state setup, where both the sender and the legitimate receiver have access to the channel state information causally. We developed a new inner bound on the secrecy capacity for this setting and showed that it can be strictly larger than a previously developed inner bound, which uses the more restrictive assumption of noncausal state information at the sender. The main idea (and difference) in our inner bound is to use the state information as a key in a block Markov setting to increase the secret message transmission rate.
The second part of the thesis deals with the Cascade and Triangular Source Coding problems, where we first characterized the rate-distortion regions for both the Cascade and Triangular Source Coding settings under an assumption on the statistical structure of the source and side informations. Specifically, we showed that when we have the same side information at the source node and the intermediate node, and a degraded version of the side information at the end node, an optimal coding scheme is the decode and re-bin coding scheme, where, at the intermediate node, we decode the codeword that was meant for the end node and re-bin it to reduce the transmission rate. We then extended our analysis to several other related settings. Next, motivated by the idea that the decode and re-bin scheme is usually optimum when the side information at Node 2 is poorer in some sense, we considered the Cascade and Triangular Source Coding problem with the reconstruction at the end node restricted to causal reconstruction. When the side information at the source and intermediate nodes is the same, we characterized the rate distortion regions for both the cascade and triangular source coding problems, without the degradedness assumption. For the cascade setting with causal lossless reconstruction at the end node, we characterized the rate region when the sources satisfy a positivity condition, or when a Markov chain holds.
Appendix A
Proofs for Chapter 2
A.1 Proof of Lemma 2.1
First, define N(Un, Zn) = |{k ∈ [1 : 2nS] : (Un, V n(k), Zn) ∈ T(n)ǫ }|. Next, we
define the following “error” events. Let E1(Un, Zn) = 1 if {N(Un, Zn) ≥ (1 +
δ1(ǫ))2n(S−I(V ;Z|U)+δ(ǫ))} and E1 = 0 otherwise. Let E = 0 if (Un, V n(L), Zn) ∈ T
(n)ǫ
and E1(Un, Zn, L) = 0, and E = 1 otherwise. We now show that if S ≥ I(V ;Z|U) +
δ(ǫ), then P{E = 1} → 0 as n → ∞. By the union of events bound,
P{E = 1} ≤ P{(Un, V n(L), Zn) /∈ T (n)ǫ }+ P{E1(U
n, Zn, L) = 1}.
The first term tends to zero as n → ∞ by assumption. The second term is bounded
as follows. Define A(un, zn) as the event (E1(un, Zn) = 1) ∩ (Zn = zn)
P{E1(Un, Zn) = 1} =
∑
un∈T(n)ǫ
p(un)P{(E1(Un, Zn) = 1)|Un = un}
=∑
un∈T(n)ǫ
∑
zn∈T(n)ǫ (Z|U)
p(un)P{An(un, zn)|Un = un}
≤∑
un∈T(n)ǫ
p(un).
(
∑
zn∈T(n)ǫ (Z|U)
P{(E1(un, zn) = 1)|Un = un}
)
.
Now, $\mathrm{P}\{E_1(u^n, z^n) = 1|U^n = u^n\} = \mathrm{P}\{N(u^n, z^n) \ge (1 + \delta_1(\epsilon))2^{n(S - I(V;Z|U) + \delta(\epsilon))}|U^n = u^n\}$. Define $X_k = 1$ if $(u^n, V^n(k), z^n) \in \mathcal{T}_\epsilon^{(n)}$ and $X_k = 0$ otherwise. We note that the $X_k$, $k \in [1 : 2^{nS}]$, are i.i.d. Bernoulli($p$) random variables, where $2^{-n(I(V;Z|U) + \delta(\epsilon))} \le p \le 2^{-n(I(V;Z|U) - \delta(\epsilon))}$. We have
$$\mathrm{P}\left\{N(u^n, z^n) \ge (1 + \delta_1(\epsilon))2^{n(S - I(V;Z|U) + \delta(\epsilon))} \,\Big|\, U^n = u^n\right\} \le \mathrm{P}\left\{\sum_{k=1}^{2^{nS}} X_k \ge (1 + \delta_1(\epsilon))2^{nS}p \,\Big|\, U^n = u^n\right\}.$$
Applying the Chernoff bound (e.g., see [13, Appendix B]), we have
$$\mathrm{P}\left\{\sum_{k=1}^{2^{nS}} X_k \ge (1 + \delta_1(\epsilon))2^{nS}p \,\Big|\, U^n = u^n\right\} \le \exp(-2^{nS}p\,\delta_1^2(\epsilon)/4) \le \exp(-2^{n(S - I(V;Z|U) - \delta(\epsilon))}\delta_1^2(\epsilon)/4).$$
Hence,
$$\mathrm{P}\{E_1(U^n, Z^n) = 1\} \le \sum_{u^n \in \mathcal{T}_\epsilon^{(n)}} p(u^n)\left( \sum_{z^n \in \mathcal{T}_\epsilon^{(n)}(Z|u^n)} \exp(-2^{n(S - I(V;Z|U) - \delta(\epsilon))}\delta_1^2(\epsilon)/4) \right)$$
$$\le 2^{n\log|\mathcal{Z}|}\exp(-2^{n(S - I(V;Z|U) - \delta(\epsilon))}\delta_1^2(\epsilon)/4),$$
which tends to zero as $n \to \infty$ if $S > I(V; Z|U) + \delta(\epsilon)$.

We are now ready to bound $H(L|\mathcal{C}, Z^n, U^n)$. Consider
$$H(L, E|\mathcal{C}, U^n, Z^n) \le 1 + \mathrm{P}\{E = 1\}H(L|\mathcal{C}, E = 1, U^n, Z^n) + \mathrm{P}\{E = 0\}H(L|\mathcal{C}, E = 0, U^n, Z^n)$$
$$\le 1 + \mathrm{P}\{E = 1\}nS + \log((1 + \delta_1(\epsilon))2^{n(S - I(V;Z|U) + \delta(\epsilon))})$$
$$\le n(S - I(V; Z|U) + \delta'(\epsilon)).$$
This completes the proof of the lemma.
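The multiplicative Chernoff step used above, $\mathrm{P}\{\sum_k X_k \ge (1+\delta_1)mp\} \le \exp(-mp\,\delta_1^2/4)$ for $0 < \delta_1 \le 1$, can be sanity-checked against the exact binomial upper tail for a small illustrative $m$ (the parameter values below are hypothetical; terms are summed in log space to avoid overflow of the binomial coefficients):

```python
import math

def binom_upper_tail(m, p, t):
    # P{Binomial(m, p) >= t}, with each pmf term computed in log space
    total = 0.0
    for k in range(t, m + 1):
        logterm = (math.lgamma(m + 1) - math.lgamma(k + 1)
                   - math.lgamma(m - k + 1)
                   + k * math.log(p) + (m - k) * math.log(1 - p))
        total += math.exp(logterm)
    return total

m, p, delta = 2000, 0.01, 0.5
t = math.ceil((1 + delta) * m * p)          # threshold (1 + delta) * m * p
exact = binom_upper_tail(m, p, t)
chernoff = math.exp(-m * p * delta ** 2 / 4)
assert exact <= chernoff                    # the exact tail sits below the bound
```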
A.2 Evaluation for example

We first give an upper bound for the extended Csiszár–Körner lower bound.

Fact: The extended Csiszár–Körner lower bound in (2.6) for the channel shown in Figure 2.2 is upper bounded by
$$R_{CK} \le \min\left\{I(X_1; Y_{11}) - I(X_1; Z_1) + I(V_2; Y_{12}|Q_2) - I(V_2; Z_2|Q_2),\; I(X_1; Y_{21}) - I(X_1; Z_1) - I(V_2; Z_2|Q_2)\right\}$$
for some $p(x_1)p(q_2, v_2)p(x_2|v_2)$.
Proof. From (2.6), we have
$$R \le \max_{p(q)p(v|q)p(x|v)} \min\left\{I(V; Y_1|Q) - I(V; Z|Q),\; I(V; Y_2|Q) - I(V; Z|Q)\right\}.$$
Consider the first bound for $R_{CK}$:
$$I(V; Y_1|Q) - I(V; Z|Q) = I(V; Y_{11}, Y_{12}|Q) - I(V; Z_1|Q) - I(V; Z_2|Q, Z_1)$$
$$\le I(V; Y_{11}, Y_{12}, Z_1|Q) - I(V; Z_1|Q) - I(V; Z_2|Q, Z_1)$$
$$= I(V; Y_{11}, Y_{12}|Q, Z_1) - I(V; Z_2|Q, Z_1)$$
$$= I(V; Y_{11}|Q, Z_1, Y_{12}) + I(V; Y_{12}|Q, Z_1) - I(V; Z_2|Q, Z_1)$$
$$= I(V; Y_{11}, Z_1|Q, Y_{12}) - I(V; Z_1|Q, Y_{12}) + I(V; Y_{12}|Q, Z_1) - I(V; Z_2|Q, Z_1)$$
$$\stackrel{(a)}{=} I(V; Y_{11}|Q, Y_{12}) - I(V; Z_1|Q, Y_{12}) + I(V; Y_{12}|Q, Z_1) - I(V; Z_2|Q, Z_1)$$
$$\le I(V'; Y_{11}|Q) - I(V'; Z_1|Q) + I(V; Y_{12}|Q, Z_1) - I(V; Z_2|Q, Z_1).$$
(a) follows from the structure of the channel, which gives the Markov condition $(Q, Y_{12}, V) - Y_{11} - Z_1$. The last step follows from defining $V' = (V, Y_{12})$ and the fact that $Z_1$ is a degraded version of $Y_{11}$.
Consider now the second bound:

R_CK ≤ I(V; Y_2|Q) − I(V; Z|Q)
  = I(V; Y_{21}|Q) − I(V; Z_1|Q) − I(V; Z_2|Q, Z_1)
  ≤ I(V′; Y_{21}|Q) − I(V′; Z_1|Q) − I(V; Z_2|Q, Z_1).
Combining the bounds, we have

R_CK ≤ min{ I(V′; Y_{11}|Q) − I(V′; Z_1|Q) + I(V; Y_{12}|Q, Z_1) − I(V; Z_2|Q, Z_1),
            I(V′; Y_{21}|Q) − I(V′; Z_1|Q) − I(V; Z_2|Q, Z_1) }        (A.1)

for some p(q)p(v|q)p(x_1, x_2|v). Now, we note that the terms I(V′; Y_{11}|Q) − I(V′; Z_1|Q) and I(V′; Y_{21}|Q) − I(V′; Z_1|Q) depend only on the marginal distribution p(q)p(v′|q)p(x_1|v′)p(y_{21}, y_{11}, z_1|x_1). Similarly, defining Q′ = (Q, Z_1), the terms I(V; Y_{12}|Q′) − I(V; Z_2|Q′) and I(V; Z_2|Q′) depend only on the marginal distribution p(q′, v, x_2)p(y_{12}, z_2|x_2). Therefore, we can further upper bound R_CK by

R_CK ≤ max min{ I(V_1; Y_{11}|Q_1) − I(V_1; Z_1|Q_1) + I(V_2; Y_{12}|Q_2) − I(V_2; Z_2|Q_2),
                I(V_1; Y_{21}|Q_1) − I(V_1; Z_1|Q_1) − I(V_2; Z_2|Q_2) },

where the maximum is over p(q_1)p(v_1|q_1)p(x_1|v_1) and p(q_2)p(v_2|q_2)p(x_2|v_2).¹ We now further simplify this bound as follows.
R_CK ≤ max min{ I(V_1; Y_{11}|Q_1) − I(V_1; Z_1|Q_1) + I(V_2; Y_{12}|Q_2) − I(V_2; Z_2|Q_2),
                I(V_1; Y_{21}|Q_1) − I(V_1; Z_1|Q_1) − I(V_2; Z_2|Q_2) }
     ≤ max min{ I(X_1; Y_{11}) − I(X_1; Z_1) + I(V_2; Y_{12}|Q_2) − I(V_2; Z_2|Q_2),
                I(X_1; Y_{21}) − I(X_1; Z_1) − I(V_2; Z_2|Q_2) },

where the maximum is now over distributions of the form p(x_1) and p(q_2)p(v_2|q_2)p(x_2|v_2).

¹To see that this bound is at least as large as the previous bound in (A.1), set V_1 = V′, Q_1 = Q, V_2 = (V, Q′), and Q_2 = Q′ in this bound to recover the previous bound.
The last step follows from the fact that Z1 is degraded with respect to both Y21 and
Y11.
Next, we evaluate this upper bound. We will make use of the entropy relationship [6]: H(ap, 1−p, (1−a)p) = H(p, 1−p) + pH(a, 1−a). First, consider the terms for the first channel component, I(X_1; Y_{11}) − I(X_1; Z_1) and I(X_1; Y_{21}) − I(X_1; Z_1). Letting P{X_1 = 0} = γ and evaluating the individual expressions, we obtain
I(X_1; Y_{21}) = H(γ, 1−γ),
I(X_1; Y_{11}) = H(γ/2, 1/2, (1−γ)/2) − 1 = (1/2) H(γ, 1−γ),
I(X_1; Z_1) = H(γ/6, 5/6, (1−γ)/6) − H(1/6, 5/6) = (1/6) H(γ, 1−γ).

This gives

I(X_1; Y_{21}) − I(X_1; Z_1) = (5/6) H(γ, 1−γ),
I(X_1; Y_{11}) − I(X_1; Z_1) = (1/3) H(γ, 1−γ).
Note that both expressions are maximized by setting γ = 1/2, which yields

R_CK ≤ min{ 1/3 + I(V_2; Y_{12}|Q_2) − I(V_2; Z_2|Q_2), 5/6 − I(V_2; Z_2|Q_2) }.        (A.2)
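These closed-form values are easy to verify numerically. The sketch below (plain Python, no external libraries) checks the grouping identity for the entropy and the evaluations at γ = 1/2:

```python
import math

def H(*p):
    """Shannon entropy in bits of a pmf given as its probability masses."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Grouping identity from [6]: H(ap, 1-p, (1-a)p) = H(p, 1-p) + p*H(a, 1-a)
a, p = 0.3, 0.6
assert abs(H(a*p, 1-p, (1-a)*p) - (H(p, 1-p) + p*H(a, 1-a))) < 1e-12

# Evaluations at gamma = 1/2
g = 0.5
I_Y21 = H(g, 1-g)                             # I(X1; Y21) = H(gamma)
I_Y11 = H(g/2, 0.5, (1-g)/2) - 1              # I(X1; Y11) = H(gamma)/2
I_Z1  = H(g/6, 5/6, (1-g)/6) - H(1/6, 5/6)    # I(X1; Z1)  = H(gamma)/6

print(I_Y11 - I_Z1)   # 1/3
print(I_Y21 - I_Z1)   # 5/6
```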
Next, we consider the terms for the second channel component. Let α_i = p(q_{2i}), β_{j,i} = p(v_{2j}|q_{2i}), μ_j = P{X_2 = 0 | V_2 = v_{2j}}, and ν_j = P{V_2 = v_{2j}}. Then

I(V_2; Z_2|Q_2) = Σ_i α_i H( (Σ_j β_{j,i} μ_j)/2, 1/2, (Σ_j β_{j,i}(1−μ_j))/2 ) − Σ_j ν_j H( μ_j/2, 1/2, (1−μ_j)/2 )
  = (1/2) Σ_i α_i H( Σ_j β_{j,i} μ_j, Σ_j β_{j,i}(1−μ_j) ) − (1/2) Σ_j ν_j H(μ_j, 1−μ_j),

I(V_2; Y_{12}|Q_2) = Σ_i α_i H( Σ_j β_{j,i} μ_j, Σ_j β_{j,i}(1−μ_j) ) − Σ_j ν_j H(μ_j, 1−μ_j).
This implies that

I(V_2; Y_{12}|Q_2) − I(V_2; Z_2|Q_2) = (1/2) Σ_i α_i H( Σ_j β_{j,i} μ_j, Σ_j β_{j,i}(1−μ_j) ) − (1/2) Σ_j ν_j H(μ_j, 1−μ_j).
Comparing the above expressions, we see that I(V_2; Y_{12}|Q_2) − I(V_2; Z_2|Q_2) = I(V_2; Z_2|Q_2); in particular, I(V_2; Z_2|Q_2) = 0 implies that I(V_2; Y_{12}|Q_2) − I(V_2; Z_2|Q_2) = 0. This, together with (A.2), implies that R_CK is strictly less than 5/6.
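Reading off the expressions above, Y_{12} acts as a clean binary channel for X_2 while Z_2 erases X_2 with probability 1/2 (an interpretation assumed here, not stated explicitly), which would give I(V_2; Y_{12}|Q_2) = 2 I(V_2; Z_2|Q_2) for every choice of p(q_2)p(v_2|q_2)p(x_2|v_2). A numerical sketch under that assumption, with randomly chosen binary pmfs:

```python
import math
import random

def cond_mi(pqab):
    """I(A;B|Q) in bits for a joint pmf pqab[(q, a, b)]."""
    pq, pqa, pqb = {}, {}, {}
    for (q, a, b), pr in pqab.items():
        pq[q] = pq.get(q, 0) + pr
        pqa[(q, a)] = pqa.get((q, a), 0) + pr
        pqb[(q, b)] = pqb.get((q, b), 0) + pr
    return sum(pr * math.log2(pr * pq[q] / (pqa[(q, a)] * pqb[(q, b)]))
               for (q, a, b), pr in pqab.items() if pr > 0)

rng = random.Random(3)
alpha = rng.random()                       # p(q2 = 0)
beta = {0: rng.random(), 1: rng.random()}  # p(v2 = 0 | q2)
mu = {0: rng.random(), 1: rng.random()}    # p(x2 = 0 | v2)
pq = {0: alpha, 1: 1 - alpha}
pvx, pvz = {}, {}
for q in (0, 1):
    for v in (0, 1):
        pv = beta[q] if v == 0 else 1 - beta[q]
        for x in (0, 1):
            px = mu[v] if x == 0 else 1 - mu[v]
            pr = pq[q] * pv * px
            pvx[(q, v, x)] = pvx.get((q, v, x), 0) + pr
            # Z2 erases X2 with probability 1/2 ('e' marks an erasure)
            pvz[(q, v, x)] = pvz.get((q, v, x), 0) + pr / 2
            pvz[(q, v, 'e')] = pvz.get((q, v, 'e'), 0) + pr / 2

I_VY = cond_mi(pvx)   # I(V2; Y12 | Q2), with Y12 = X2
I_VZ = cond_mi(pvz)   # I(V2; Z2 | Q2)
print(I_VY, 2 * I_VZ)
```

The erasure flag is independent of (Q_2, V_2, X_2), so the factor-of-two relation holds exactly, matching the halving seen in the displayed expressions.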
In comparison, consider the new lower bound in Corollary 2.1. Setting V = X_1, with X_1 and X_2 independent Bernoulli(1/2), we have

I(X_1, X_2; Y_{11}, Y_{12}) − I(X_1, X_2; Z_1, Z_2)
  = I(X_1; Y_{11}) − I(X_1; Z_1) + I(X_2; Y_{12}) − I(X_2; Z_2) = 1/3 + 1/2 = 5/6,
I(V; Y_2) − I(V; Z) = I(X_1; Y_{21}) − I(X_1; Z_1, Z_2) = I(X_1; Y_{21}) − I(X_1; Z_1) = 5/6.
Thus, R = 5/6 is achievable using the new scheme, which shows that our lower bound can be strictly larger than the extended Csiszár–Körner lower bound. In fact, R = 5/6 is the capacity for this example, since the channel is a special case of the reversely degraded broadcast channel considered in [18], and we can use the converse result therein to show that C_S ≤ 5/6.
A.3 Proof of Theorem 2.2
Using Fourier–Motzkin elimination on the rate constraints gives the following region.
R0 < I(U ;Z),
R1 < min {I(V0, V1; Y1|U)− I(V1;Z|V0), I(V0, V2; Y2|U)− I(V2;Z|V0)} ,
2R1 < I(V0, V1; Y1|U) + I(V0, V2; Y2|U)− I(V1;V2|V0), (A.3)
R0 +R1 < min {I(V0, V1; Y1)− I(V1;Z|V0), I(V0, V2; Y2)− I(V2;Z|V0)} ,
R0 + 2R1 < I(V0, V1; Y1) + I(V0, V2; Y2|U)− I(V1;V2|V0), (A.4)
R0 + 2R1 < I(V0, V2; Y2) + I(V0, V1; Y1|U)− I(V1;V2|V0), (A.5)
2R0 + 2R1 < I(V0, V1; Y1) + I(V0, V2; Y2)− I(V1;V2|V0), (A.6)
Re ≤ R1,
Re < min {I(V0, V1; Y1|U)− I(V0, V1;Z|U), I(V0, V2; Y2|U)− I(V0, V2;Z|U)} ,
2Re < I(V0, V1; Y1|U) + I(V0, V2; Y2|U)− I(V1;V2|V0)− 2I(V0;Z|U), (A.7)
R0 +Re < min {I(V0, V1; Y1)− I(V1, V0;Z|U), I(V0, V2; Y2)− I(V2, V0;Z|U)} ,
R0 + 2Re < I(V0, V1; Y1) + I(V0, V2; Y2|U)− I(V1;V2|V0)− 2I(V0;Z|U),
R0 + 2Re < I(V0, V2; Y2) + I(V0, V1; Y1|U)− I(V1;V2|V0)− 2I(V0;Z|U),
2R0 + 2Re < I(V0, V1; Y1) + I(V0, V2; Y2)− I(V1;V2|V0)− 2I(V0;Z|U), (A.8)
with the constraint of I(V1, V2;Z|V0) ≤ I(V1;Z|V0) + I(V2;Z|V0) − I(V1;V2|V0) on
the set of possible probability distributions. Due to this constraint, the numbered
inequalities in the above region are redundant.
We now complete the proof by using rate splitting. This is equivalent to letting R_1 = R″_1 and R_0 = R^n_0 + R′_1 in the above region, and letting the new rates be R^n_0 for the common message and R^n_1 = R′_1 + R″_1 for the private message. Using Fourier–Motzkin elimination to eliminate the auxiliary rates R′_1 and R″_1 then results in the following region.
R0 < I(U ;Z),
R0 +R1 < I(U ;Z) + min {I(V0, V1; Y1|U)− I(V1;Z|V0),
I(V0, V2; Y2|U)− I(V2;Z|V0)} ,
R0 +R1 < min {I(V0, V1; Y1)− I(V1;Z|V0), I(V0, V2; Y2)− I(V2;Z|V0)} ,
Re ≤ R1,
Re < min {I(V0, V1; Y1|U)− I(V0, V1;Z|U), I(V0, V2; Y2|U)− I(V0, V2;Z|U)} ,
R0 +Re < min {I(V0, V1; Y1)− I(V1, V0;Z|U), I(V0, V2; Y2)− I(V2, V0;Z|U)} ,
R0 + 2Re < I(V0, V1; Y1) + I(V0, V2; Y2|U)− I(V1;V2|V0)− 2I(V0;Z|U),
R0 + 2Re < I(V0, V2; Y2) + I(V0, V1; Y1|U)− I(V1;V2|V0)− 2I(V0;Z|U),
R0 +R1 +Re < min {I(V0, V1; Y1|U)− I(V1;Z|V0), I(V0, V2; Y2|U)− I(V2;Z|V0)}
+min {I(V0, V1; Y1)− I(V1, V0;Z|U), I(V0, V2; Y2)− I(V2, V0;Z|U)} ,
R0 +R1 + 2Re < min {I(V0, V1; Y1|U)− I(V1;Z|V0), I(V0, V2; Y2|U)− I(V2;Z|V0)}
+ I(V0, V1; Y1) + I(V0, V2; Y2|U)
− I(V1;V2|V0)− 2I(V0;Z|U),
R0 +R1 + 2Re < min {I(V0, V1; Y1|U)− I(V1;Z|V0), I(V0, V2; Y2|U)− I(V2;Z|V0)}
+ I(V0, V2; Y2) + I(V0, V1; Y1|U)− I(V1;V2|V0)− 2I(V0;Z|U).
Eliminating redundant inequalities then results in
R0 < I(U ;Z),
R0 +R1 < I(U ;Z) + min {I(V0, V1; Y1|U)− I(V1;Z|V0),
I(V0, V2; Y2|U)− I(V2;Z|V0)} ,
R0 +R1 < min {I(V0, V1; Y1)− I(V1;Z|V0), I(V0, V2; Y2)− I(V2;Z|V0)} ,
Re ≤ R1,
Re < min {I(V0, V1; Y1|U)− I(V0, V1;Z|U), I(V0, V2; Y2|U)− I(V0, V2;Z|U)} ,
R0 +Re < min {I(V0, V1; Y1)− I(V1, V0;Z|U), I(V0, V2; Y2)− I(V2, V0;Z|U)} ,
R0 + 2Re < I(V0, V1; Y1) + I(V0, V2; Y2|U)− I(V1;V2|V0)− 2I(V0;Z|U),
R0 + 2Re < I(V0, V2; Y2) + I(V0, V1; Y1|U)− I(V1;V2|V0)− 2I(V0;Z|U),
R0 +R1 + 2Re < I(V0, V2; Y2|U)− I(V2;Z|V0) + I(V0, V1; Y1) + I(V0, V2; Y2|U)
− I(V1;V2|V0)− 2I(V0;Z|U),
R0 +R1 + 2Re < I(V0, V1; Y1|U)− I(V1;Z|V0) + I(V0, V2; Y2) + I(V0, V1; Y1|U)
− I(V1;V2|V0)− 2I(V0;Z|U).
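The Fourier–Motzkin eliminations used throughout this appendix can be mechanized. The following is a generic, minimal eliminator for systems of linear inequalities Σ_i c_i x_i ≤ b (a sketch for illustration, applied here to a toy rate-splitting pair rather than the region above):

```python
from fractions import Fraction
from itertools import product

def fm_eliminate(ineqs, j):
    """Fourier-Motzkin: eliminate variable j from inequalities (c, b),
    each meaning sum_i c[i]*x[i] <= b; returns the projected system."""
    pos = [(c, b) for c, b in ineqs if c[j] > 0]
    neg = [(c, b) for c, b in ineqs if c[j] < 0]
    out = [(c, b) for c, b in ineqs if c[j] == 0]
    for (cp, bp), (cn, bn) in product(pos, neg):
        lam, mu = -cn[j], cp[j]          # positive multipliers cancelling x_j
        c = [lam * a + mu * d for a, d in zip(cp, cn)]
        out.append((c, lam * bp + mu * bn))
    return out

# Toy example in variables (R', R1): constraints R' <= 3, R1 - R' <= 2,
# R1 <= 4, R' >= 0, R' <= R1. Eliminating R' should leave 0 <= R1 <= 4.
F = Fraction
ineqs = [([F(1), F(0)], F(3)), ([F(-1), F(1)], F(2)), ([F(0), F(1)], F(4)),
         ([F(-1), F(0)], F(0)), ([F(1), F(-1)], F(0))]
projected = fm_eliminate(ineqs, 0)
upper = min(b / c[1] for c, b in projected if c[1] > 0)   # tightest bound on R1
print(upper)
```

Redundant rows (such as the trivial 0 ≤ 3 combination) appear in the output and must be pruned by hand or by a separate redundancy check, exactly as done in the text.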
A.4 Converse for Proposition 2.2
The R1 inequalities follow from a technique used in [26, Proposition 11]. We provide
the proof here for completeness.
nR_1 ≤ Σ_i I(M_1; Y_{1i} | M_0, Y_{1,i+1}^n) + nǫ_n
  ≤ Σ_i I(M_1; Y_{1i} | M_0, Y_{1,i+1}^n, Z^{i−1}) + Σ_i I(Z^{i−1}; Y_{1i} | M_0, Y_{1,i+1}^n) + nǫ_n
  (a)≤ Σ_i I(M_1, Y_{1,i+1}^n; Y_{1i} | M_0, Z^{i−1}) − Σ_i I(Y_{1,i+1}^n; Y_{1i} | M_0, Z^{i−1}) + Σ_i I(Y_{1,i+1}^n; Z_i | M_0, Z^{i−1}) + nǫ_n
  (b)≤ Σ_i I(X_i; Y_{1i} | M_0, Z^{i−1}) + nǫ_n
  = Σ_i I(X_i; Y_{1i} | U_i) + nǫ_n,

where (a) follows by the Csiszár sum lemma, and (b) follows by the assumption that Y_1 is less noisy than Z and the data processing inequality. The other inequality, involving Y_2 and Z, can be shown in a similar fashion.
We now turn to the R_e inequalities. The fact that R_e ≤ R_1 is trivial. We show the other two inequalities. We have
nR_e ≤ I(M_1; Y_1^n | M_0) − I(M_1; Z^n | M_0) + nǫ_n
  = Σ_{i=1}^n ( I(M_1; Y_{1i} | M_0, Y_{1,i+1}^n) − I(M_1; Z_i | M_0, Z^{i−1}) ) + nǫ_n
  (a)= Σ_{i=1}^n ( I(M_1, Z^{i−1}; Y_{1i} | M_0, Y_{1,i+1}^n) − I(M_1, Y_{1,i+1}^n; Z_i | M_0, Z^{i−1}) ) + nǫ_n
  (b)= Σ_{i=1}^n ( I(M_1; Y_{1i} | M_0, Y_{1,i+1}^n, Z^{i−1}) − I(M_1; Z_i | M_0, Z^{i−1}, Y_{1,i+1}^n) ) + nǫ_n
  (c)≤ Σ_{i=1}^n ( I(M_1, Y_{1,i+1}^n; Y_{1i} | M_0, Z^{i−1}) − I(M_1, Y_{1,i+1}^n; Z_i | M_0, Z^{i−1}) ) + nǫ_n
  (d)≤ Σ_{i=1}^n ( I(X_i; Y_{1i} | U_i) − I(X_i; Z_i | U_i) ) + nǫ_n,

where (a) and (b) follow by the Csiszár sum lemma; (c) follows by the less noisy assumption; and (d) follows by the less noisy assumption and the fact that, conditioned on (M_0, Z^{i−1}), (M_1, Y_{1,i+1}^n) → X_i → (Y_{1i}, Z_i) form a Markov chain. The second inequality, involving I(X; Y_2|U) − I(X; Z|U), can be proved in a similar manner. Finally, introducing the time-sharing random variable Q ∼ Unif[1 : n], i.e., uniformly distributed over [1 : n], and defining U = (U_Q, Q), X = X_Q, Y_1 = Y_{1Q}, Y_2 = Y_{2Q}, and Z = Z_Q then completes the proof.
A.5 Proof of Proposition 2.3
In cases two to four, the codebook generation, encoding and decoding procedures are
the same as Case 1, but with different rate definitions. We therefore do not repeat
these steps here.
Case 2: Assume that I(U_3; Z_3) − R_0 − I(U_3; Z_2|U) ≥ 0, I(V; Y_1|U_3) − I(V; Z_2|U_3) ≤ I(V; Y_1|U_3) − I(V; Z_3|U_3), and R_{e3} ≤ I(V; Y_1|U_3) − I(V; Z_2|U_3).

In this case, using the definitions of the split message and randomization rates as in Case 1, we see that we can achieve R_{e3} = I(V; Y_1|U_3) − I(V; Z_2|U_3) by defining R′_{11} = I(V; Y_1|U_3) − I(V; Z_2|U_3) and R″_{11} = 0. The equivocation rate constraints now are

R^o_{10} + R^r_0 > I(U_3; Z_2|U),
R^r_1 + R^o_{11} > I(V; Z_2|U_3).

Performing Fourier–Motzkin elimination as before then yields the rate-equivocation region given in Case 2.
Case 3: Assume that I(U_3; Z_3) − R_0 − I(U_3; Z_2|U) ≥ 0 and I(V; Y_1|U_3) − I(V; Z_2|U_3) ≥ I(V; Y_1|U_3) − I(V; Z_3|U_3).

In this case, since we consider only the case of R_1 ≥ I(V; Y_1|U_3) − I(V; Z_3|U_3), an equivocation rate of R_{e3} = I(V; Y_1|U_3) − I(V; Z_3|U_3) can be achieved by setting R′_{11} = I(V; Y_1|U_3) − I(V; Z_3|U_3). The constraints for this case are as follows.

Decoding constraints:

R_0 + R^o_{10} + R^r_0 + R^s_{10} < I(U_3; Z_3),
R^s_{10} + R^o_{10} + R^r_0 < I(U_3; Y_1|U),
R′_{11} + R″_{11} + R^o_{11} + R^r_1 < I(V; Y_1|U_3).

Equivocation rate constraints:

R^o_{10} + R^r_0 > I(U_3; Z_2|U),
R″_{11} + R^r_1 + R^o_{11} > I(V; Z_3|U_3),
R^r_1 + R^o_{11} > I(V; Z_2|U_3).

Nonnegativity constraints:

R^o_{10}, R^o_{11}, R′_{11}, R″_{11}, R^r_1, R^r_0 ≥ 0.

Equality constraints:

R_1 = R^o_{10} + R^s_{10} + R′_{11} + R″_{11} + R^o_{11},
R_{e2} = R^s_{10} + R′_{11} + R″_{11},
R_{e3} = R′_{11},
R′_{11} = I(V; Y_1|U_3) − I(V; Z_3|U_3).

Performing Fourier–Motzkin elimination then results in the rate-equivocation region for Case 3.
Case 4: Assume that I(U_3; Z_3) − R_0 − I(U_3; Z_2|U) ≤ 0. In this case, note that R_{e2} ≤ min{R_1, I(V; Y_1|U_3) − I(V; Z_2|U_3)} and can be achieved using only the V^n layer of codewords. We set R^s_{10} = 0 in this case. If I(V; Y_1|U_3) − I(V; Z_2|U_3) ≤ I(V; Y_1|U_3) − I(V; Z_3|U_3), then R_{e2} = I(V; Y_1|U_3) − I(V; Z_2|U_3) and R_{e3} = min{R_1, I(V; Y_1|U_3) − I(V; Z_3|U_3)} are achievable. If I(V; Y_1|U_3) − I(V; Z_2|U_3) ≥ I(V; Y_1|U_3) − I(V; Z_3|U_3), then R_{e3} = I(V; Y_1|U_3) − I(V; Z_3|U_3) and R_{e2} = min{R_1, I(V; Y_1|U_3) − I(V; Z_2|U_3)} are achievable.
A.6 Proof of Proposition 2.4
As in [26], we establish bounds for the channel from X to (Y1, Z2) and for the channel
from X to (Y1, Z3).
The X to (Y_1, Z_2) bound: We first prove bounds on R_0 and R_1. Define the auxiliary random variables U_i = (M_0, Y_1^{i−1}), U_{3i} = (M_0, Y_1^{i−1}, Z_{3,i+1}^n), and V_i = (M_1, M_0, Z_{3,i+1}^n, Y_1^{i−1}) for i = 1, 2, …, n. Then, following the steps of the converse proof in [11], it is straightforward to show that

R_0 ≤ (1/n) Σ_{i=1}^n I(U_i; Z_{2i}) + ǫ_n,
R_1 ≤ (1/n) Σ_{i=1}^n I(V_i; Y_{1i} | U_i) + ǫ_n,

where ǫ_n → 0 as n → ∞.
To bound R_{e2}, first consider

H(M_1 | Z_2^n)
  (a)≤ H(M_1 | Z_2^n, M_0) + nǫ_n
  (b)= H(M_1) − I(M_1; Z_2^n | M_0) + nǫ_n
  (c)≤ I(M_1; Y_1^n | M_0) − I(M_1; Z_2^n | M_0) + nǫ_n
  = Σ_{i=1}^n ( I(M_1; Y_{1i} | M_0, Y_1^{i−1}) − I(M_1; Z_{2i} | M_0, Z_2^{i−1}) ) + nǫ_n
  (d)≤ Σ_{i=1}^n ( I(X_i; Y_{1i} | M_0, Y_1^{i−1}) − I(X_i; Z_{2i} | M_0, Z_2^{i−1}) ) + nǫ_n
  = Σ_{i=1}^n ( I(X_i; Y_{1i} | U_i) − H(Z_{2i} | M_0, Z_2^{i−1}) + H(Z_{2i} | M_0, Z_2^{i−1}, X_i) ) + nǫ_n
  (e)≤ Σ_{i=1}^n ( I(X_i; Y_{1i} | U_i) − H(Z_{2i} | M_0, Z_2^{i−1}, Y_1^{i−1}) + H(Z_{2i} | M_0, Z_2^{i−1}, Y_1^{i−1}, X_i) ) + nǫ_n
  (f)= Σ_{i=1}^n ( I(X_i; Y_{1i} | U_i) − I(X_i; Z_{2i} | U_i) ) + nǫ_n,

where (a) and (c) follow by Fano's inequality, and (b) follows by the independence of M_1 and M_0. Steps (d), (e), and (f) follow by degradedness of the channel X → Y_1 → Z_2, which implies Z_2^{i−1} → Y_1^{i−1} → X_i → Y_{1i} → Z_{2i} by physical degradedness. For the next inequality, we use the fact that a stochastic encoder p(x^n|m_0, m_1) can be treated as a deterministic mapping of (M_0, M_1) and an independent randomization variable W onto X^n.
nR_0 + nR_{e2} = H(M_0) + H(M_1 | Z_2^n)
  (a)≤ H(M_0) + H(M_1 | Z_2^n, M_0) + nǫ_n
  ≤ I(M_0; Z_3^n) + H(M_1 | M_0) − H(M_1 | M_0) + H(M_1 | Z_2^n, M_0) + nǫ_n
  ≤ I(M_0; Z_3^n) + I(M_1; Y_1^n | M_0) − I(M_1; Z_2^n | M_0) + nǫ_n
  (b)≤ I(M_0; Z_3^n) + I(M_1, W; Y_1^n | M_0) − I(M_1, W; Z_2^n | M_0) + nǫ_n
  (c)≤ Σ_{i=1}^n ( I(U_{3i}; Z_{3i}) + I(X_i; Y_{1i} | U_{3i}) ) − I(M_1, W; Z_2^n | M_0) + nǫ_n
  (d)≤ Σ_{i=1}^n ( I(U_{3i}; Z_{3i}) + I(X_i; Y_{1i} | U_{3i}) ) − Σ_{i=1}^n H(Z_{2i} | M_0, Y_1^{i−1}) + Σ_{i=1}^n H(Z_{2i} | M_1, M_0, W, Z_2^{i−1}) + nǫ_n
  (e)= Σ_{i=1}^n ( I(U_{3i}; Z_{3i}) + I(X_i; Y_{1i} | U_{3i}) ) − Σ_{i=1}^n H(Z_{2i} | M_0, Y_1^{i−1}) + Σ_{i=1}^n H(Z_{2i} | M_1, M_0, W, Y_1^{i−1}) + nǫ_n
  = Σ_{i=1}^n ( I(U_{3i}; Z_{3i}) + I(X_i; Y_{1i} | U_{3i}) ) − Σ_{i=1}^n I(M_1, W; Z_{2i} | M_0, Y_1^{i−1}) + nǫ_n
  (f)= Σ_{i=1}^n ( I(U_{3i}; Z_{3i}) + I(X_i; Y_{1i} | U_{3i}) ) − Σ_{i=1}^n I(X_i; Z_{2i} | M_0, Y_1^{i−1}) + nǫ_n
  = Σ_{i=1}^n ( I(U_{3i}; Z_{3i}) + I(X_i; Y_{1i} | U_{3i}) ) − Σ_{i=1}^n I(X_i; Z_{2i} | U_i) + nǫ_n,

where (a) follows by Fano's inequality and H(M_0 | Z_2^n) ≤ nǫ_n; the next two steps follow by Fano's inequality applied to H(M_0 | Z_3^n) and H(M_1 | Y_1^n, M_0); (b) follows by degradedness of the channel X → (Y_1, Z_2); (c) follows by the Csiszár sum lemma applied to the first two terms (see, e.g., [7]); (d) follows by the fact that conditioning reduces entropy; (e) follows by the Markov relation Z_2^{i−1} → Y_1^{i−1} → (M_0, M_1, W) → Z_{2i}; and (f) follows by the fact that X_i is a function of (M_0, M_1, W).
This chain of inequalities implies that

R_{e2} ≤ (1/n) Σ_{i=1}^n ( I(U_{3i}; Z_{3i}) − I(U_{3i}; Z_{2i} | U_i) ) − R_0 + (1/n) Σ_{i=1}^n ( I(X_i; Y_{1i} | U_{3i}) − I(X_i; Z_{2i} | U_{3i}) ) + ǫ_n
  ≤ [ (1/n) Σ_{i=1}^n ( I(U_{3i}; Z_{3i}) − I(U_{3i}; Z_{2i} | U_i) ) − R_0 ]^+ + (1/n) Σ_{i=1}^n ( I(X_i; Y_{1i} | U_{3i}) − I(X_i; Z_{2i} | U_{3i}) ) + ǫ_n.
Finally, we arrive at single-letter expressions by introducing the time-sharing random variable Q ∼ Unif[1 : n], i.e., uniformly distributed over [1 : n], independent of (M_0, M_1, X^n, Y_1^n, Z_2^n, Z_3^n, W), and defining U_Q = (M_0, Y_1^{Q−1}), U = (U_Q, Q), V_Q = (M_1, U, Z_{2,Q+1}^n), Y_1 = Y_{1Q}, and Z_2 = Z_{2Q} to obtain the following bounds:

R_0 ≤ I(U; Z_2) + ǫ_n,
R_1 ≤ I(V; Y_1 | U) + ǫ_n,
R_{e2} ≤ I(X; Y_1 | U) − I(X; Z_2 | U) + ǫ_n,
R_{e2} ≤ [ I(U_3; Z_3) − R_0 − I(U_3; Z_2 | U) ]^+ + I(X; Y_1 | U_3) − I(X; Z_2 | U_3) + ǫ_n.
The X → (Y_1, Z_3) bound: The inequalities involving X → (Y_1, Z_3) follow standard converse techniques. First, applying the proof techniques from [12], we obtain the following bounds on the rates:

R_0 ≤ min{ (1/n) Σ_{i=1}^n I(U_{3i}; Z_{3i}), (1/n) Σ_{i=1}^n I(U_{3i}; Y_{1i}) } + ǫ_n,
R_0 + R_1 ≤ (1/n) Σ_{i=1}^n ( I(V_i; Y_{1i} | U_{3i}) + I(U_{3i}; Z_{3i}) ) + ǫ_n.
We now turn to the second secrecy bound:

H(M_1 | Z_3^n) ≤ H(M_1, M_0 | Z_3^n) = H(M_1 | Z_3^n, M_0) + H(M_0 | Z_3^n)
  (a)≤ H(M_1 | Z_3^n, M_0) + nǫ_n
  (b)≤ H(M_1 | Z_3^n, M_0) − H(M_1 | Y_1^n, M_0) + nǫ_n
  = I(M_1; Y_1^n | M_0) − I(M_1; Z_3^n | M_0) + nǫ_n,
where (a) and (b) follow by Fano's inequality. Using the Csiszár sum lemma, we can obtain the following:

H(M_1 | Z_3^n) ≤ Σ_{i=1}^n ( I(M_1; Y_{1i} | M_0, Y_1^{i−1}) − I(M_1; Z_{3i} | M_0, Z_{3,i+1}^n) ) + nǫ_n
  (a)= Σ_{i=1}^n ( I(M_1, Z_{3,i+1}^n; Y_{1i} | M_0, Y_1^{i−1}) − I(M_1, Y_1^{i−1}; Z_{3i} | M_0, Z_{3,i+1}^n) ) + nǫ_n
  (b)= Σ_{i=1}^n ( I(M_1; Y_{1i} | M_0, Y_1^{i−1}, Z_{3,i+1}^n) − I(M_1; Z_{3i} | M_0, Z_{3,i+1}^n, Y_1^{i−1}) ) + nǫ_n
  = Σ_{i=1}^n ( I(V_i; Y_{1i} | U_{3i}) − I(V_i; Z_{3i} | U_{3i}) ) + nǫ_n,

where both (a) and (b) are obtained using the Csiszár sum lemma. Introducing the time-sharing random variable Q ∼ Unif[1 : n], i.e., uniformly distributed over [1 : n], we obtain

R_0 ≤ min{ I(U_3; Z_3), I(U_3; Y_1) } + ǫ_n,
R_0 + R_1 ≤ I(U_3; Z_3) + I(V; Y_1 | U_3) + ǫ_n,
R_{e3} ≤ I(V; Y_1 | U_3) − I(V; Z_3 | U_3) + ǫ_n,

where U_{3Q} = (M_0, Y_1^{Q−1}, Z_{3,Q+1}^n), U_3 = (U_{3Q}, Q), Y_1 = Y_{1Q}, and Z_3 = Z_{3Q}. This completes the proof of the outer bound.
Appendix B
Proofs for Chapter 3
B.1 Proof of Proposition 3.1
1) The proof of this part follows largely from Lemma 2 in Lecture 22 of [13]. For completeness, we give the proof here. Consider

H(K_j | C) ≥ P{S(j) ∈ T_ǫ^{(n)}} H(K_j | C, S(j) ∈ T_ǫ^{(n)})
  ≥ (1 − ǫ′_n) H(K_j | C, S(j) ∈ T_ǫ^{(n)}).

Let P(k_j) be the random pmf of K_j given {S(j) ∈ T_ǫ^{(n)}}, where the randomness is induced by the random bin assignment (codebook) C. By symmetry, P(k_j), k_j ∈ [1 : 2^{nR_K}], are identically distributed. We express P(1) as a weighted sum of indicator functions:

P(1) = Σ_{s^n ∈ T_ǫ^{(n)}} ( p(s^n) / P{S^n ∈ T_ǫ^{(n)}} ) · 1{s^n ∈ B(1)}.
It can be easily shown that

E_C(P(1)) = 2^{−nR_K},
Var(P(1)) = 2^{−nR_K} (1 − 2^{−nR_K}) Σ_{s^n ∈ T_ǫ^{(n)}} ( p(s^n) / P{S(j) ∈ T_ǫ^{(n)}} )^2
  ≤ 2^{−nR_K} · 2^{n(H(S)+δ(ǫ))} · 2^{−2n(H(S)−δ(ǫ))} / (1 − ǫ′_n)^2
  ≤ 2^{−n(R_K + H(S) − 4δ(ǫ))}

for sufficiently large n.
By the Chebyshev inequality,

P{ |P(1) − E(P(1))| ≥ ǫ E(P(1)) } ≤ Var(P(1)) / (ǫ E(P(1)))^2 ≤ 2^{−n(H(S) − R_K − 4δ(ǫ))} / ǫ^2.

Note that if R_K < H(S) − 4δ(ǫ), this probability → 0 as n → ∞. Now, by symmetry,
H(K_1 | C, S(j) ∈ T_ǫ^{(n)}) = 2^{nR_K} E( P(1) log(1/P(1)) )
  ≥ 2^{nR_K} P{ |P(1) − E(P(1))| < ǫ 2^{−nR_K} } E( P(1) log(1/P(1)) | |P(1) − E(P(1))| < ǫ 2^{−nR_K} )
  ≥ ( 1 − 2^{−n(H(S) − R_K − 4δ(ǫ))} / ǫ^2 ) ( nR_K (1 − ǫ) − (1 − ǫ) log(1 + ǫ) )
  ≥ n(R_K − δ(ǫ))

for sufficiently large n and R_K < H(S) − 4δ(ǫ).
Thus, we have shown that if R_K < H(S) − 4δ(ǫ), then H(K_j | C) ≥ n(R_K − δ(ǫ)) for n sufficiently large. This completes the proof of part 1 of Proposition 3.1. Note now that since H(S|Z) ≤ H(S), the same result also holds if R_K ≤ H(S|Z) − 4δ(ǫ).
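As a toy numerical illustration of part 1 (a simulation sketch with hypothetical parameters n, p, R_K, not the block-Markov scheme of the proof): when R_K < H(S), keys obtained by randomly binning i.i.d. source sequences are nearly uniform, so their empirical entropy approaches nR_K.

```python
import math
import random
from collections import Counter

def empirical_key_entropy(n, p, RK, samples=20000, seed=1):
    """Hash i.i.d. Bernoulli(p) sequences s^n into 2^{n*RK} bins (one fixed
    random bin assignment) and return the empirical entropy of the bin index."""
    rng = random.Random(seed)
    nbins = 2 ** round(n * RK)
    salt = rng.getrandbits(64)          # fixes one random codebook C
    counts = Counter()
    for _ in range(samples):
        s = tuple(1 if rng.random() < p else 0 for _ in range(n))
        counts[hash((salt, s)) % nbins] += 1
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

n, p, RK = 16, 0.5, 0.5     # H(S) = 1 bit/symbol > RK
Hk = empirical_key_entropy(n, p, RK)
print(Hk, "vs", n * RK)     # empirical key entropy is close to n*RK = 8 bits
```

The small gap from nR_K is sampling bias from estimating a 2^{nR_K}-point pmf; the concentration argument via Chebyshev above makes this precise.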
2) We need to show that if RK < H(S|Z) − 3δ(ǫ), then I(Kj;Z(j)|C) ≤ 2nδ(ǫ)
for every j ∈ [1 : b]. We have
I(Kj;Z(j)|C) = I(S(j);Z(j)|C)− I(S(j);Z(j)|Kj, C).
We analyze the two terms separately. For the first term, we have

I(S(j); Z(j) | C) = I(S(j), L; Z(j) | C) − I(L; Z(j) | S(j), C)
  ≤ I(U^n, S(j); Z(j) | C) − H(L | S(j), C) + H(L | C, S(j), Z(j))
  ≤ n I(U, S; Z) − H(L | S(j), C) + H(L | C, S(j), Z(j))
  (a)≤ n I(U, S; Z) − H(L | S(j), C) + n(R − I(U; Z, S) + δ′(ǫ))
  = nR − H(M_{j0} | C) − H(M_{j1} ⊕ K_{j−1} | C, M_{j0}) − H(L | M_{j0}, M_{j1} ⊕ K_{j−1}, C) + n I(S; Z) + nδ′(ǫ)
  ≤ nR − nR_0 − H(M_{j1} ⊕ K_{j−1} | C, M_{j0}, K_{j−1}) − n(R − R_0 − R_K) + n I(S; Z) + nδ′(ǫ)
  = nR_K − H(M_{j1} | C, M_{j0}, K_{j−1}) + n I(S; Z) + nδ′(ǫ)
  = n(I(S; Z) + δ′(ǫ)),

where step (a) follows by applying Lemma 3.1, which holds since R − R_0 > I(U; Z, S) + δ(ǫ) and P{(U^n(L), S(j), Z(j)) ∈ T_ǫ^{(n)}} → 1 as n → ∞.
For the second term, we have

I(S(j); Z(j) | K_j, C) = H(S(j) | K_j, C) − H(S(j) | Z(j), K_j, C)
  = H(S(j), K_j | C) − H(K_j | C) − H(S(j) | Z(j), K_j, C)
  ≥ nH(S) − nR_K − H(S(j) | Z(j), K_j, C)
  ≥ n(H(S) − R_K) − H(S(j) | Z(j), K_j)
  (b)≥ n(H(S) − R_K) − n(H(S|Z) − R_K + δ″(ǫ))
  = n I(S; Z) − nδ″(ǫ),

where (b) follows by showing that H(S(j) | Z(j), K_j) ≤ n(H(S|Z) − R_K + δ″(ǫ)); this requires the condition R_K < H(S|Z) − 3δ(ǫ). Combining the bounds for the two expressions gives I(K_j; Z(j) | C) ≤ n(δ′(ǫ) + δ″(ǫ)).
Proof of step (b): Fix an arbitrary ordering on the set of all state sequences s^n, so that S(j) = s^n(T) for some T ∈ [1 : 2^{n log|S|}]. Hence, H(S(j) | Z(j), K_j) = H(T | K_j, Z(j)). From the coding scheme, we know that P{(s^n(T), Z(j)) ∈ T_ǫ^{(n)}} → 1 as n → ∞. Note here that T is random and corresponds to the realization of S^n.

Now, fix T = t, Z(j) = z^n, K_j = k, and define N(z^n, k, t) = |{ l ∈ [1 : |T_ǫ^{(n)}(S)|] : (s^n(l), z^n) ∈ T_ǫ^{(n)}, l ≠ t, s^n(l) ∈ B(k) }|. For z^n ∉ T_ǫ^{(n)}, N(z^n, k, t) = 0. For z^n ∈ T_ǫ^{(n)}, it is easy to show that

( |T_ǫ^{(n)}(S|z^n)| − 1 ) / 2^{nR_K} ≤ E(N(z^n, k, t)) ≤ |T_ǫ^{(n)}(S|z^n)| / 2^{nR_K},
Var(N(z^n, k, t)) ≤ |T_ǫ^{(n)}(S|z^n)| / 2^{nR_K}.
By the Chebyshev inequality,

P{ N(z^n, k, t) ≥ (1 + ǫ) E(N(z^n, k, t)) } ≤ Var(N(z^n, k, t)) / (ǫ E(N(z^n, k, t)))^2 ≤ 2^{−n(H(S|Z) − 3δ(ǫ) − R_K)} / ǫ^2.

Note that P{N(z^n, k, t) ≥ (1 + ǫ) E(N(z^n, k, t))} → 0 as n → ∞ if R_K < H(S|Z) − 3δ(ǫ).
Now, define the following events:

E_1 = {(S(j), Z(j)) ∉ T_ǫ^{(n)}},
E_2 = {N(Z(j), K_j, T) ≥ (1 + ǫ) E(N(Z(j), K_j, T))}.

Let E = 0 if E_1^c ∩ E_2^c occurs and E = 1 otherwise. We have

P(E = 1) ≤ P(E_1) + P(E_2)
  ≤ Σ_{(z^n, s^n(t)) ∈ T_ǫ^{(n)}, k} p(z^n, t, k) P{ N(z^n, k, t) ≥ (1 + ǫ) E(N(z^n, k, t)) } + P(E_1) + P{(s^n(T), Z(j)) ∉ T_ǫ^{(n)}}.

Here P{(s^n(T), Z(j)) ∉ T_ǫ^{(n)}} = P(E_1), and P(E_1) → 0 as n → ∞ by the coding scheme. For the second term, P{N(z^n, k, t) ≥ (1 + ǫ) E(N(z^n, k, t))} → 0 as n → ∞ if R_K < H(S|Z) − 3δ(ǫ). Hence, P(E = 1) → 0 as n → ∞ if R_K < H(S|Z) − 3δ(ǫ).
We can now bound H(T | K_j, Z(j)) by

H(T | K_j, Z(j)) ≤ 1 + P(E = 1) H(T | K_j, Z(j), E = 1) + H(T | K_j, Z(j), E = 0)
  ≤ n(H(S|Z) − R_K + δ(ǫ)).

This completes the proof of part 2.
3) To upper bound I(K_j; Z^j | C), we use an induction argument, assuming that I(K_{j−1}; Z^{j−1} | C) ≤ nδ_{j−1}(ǫ), where δ_{j−1}(ǫ) → 0 as ǫ → 0. Note that the base case j = 2 follows from part 2. Consider

I(K_j; Z^j | C) = I(K_j; Z(j) | C) + I(K_j; Z^{j−1} | C, Z(j))
  (a)≤ n(δ′(ǫ) + δ″(ǫ)) + I(K_j; Z^{j−1} | C, Z(j))
  = H(Z^{j−1} | C, Z(j)) − H(Z^{j−1} | C, Z(j), K_j) + n(δ′(ǫ) + δ″(ǫ))
  ≤ H(Z^{j−1} | C) − H(Z^{j−1} | C, K_{j−1}, Z(j), K_j) + n(δ′(ǫ) + δ″(ǫ))
  (b)= H(Z^{j−1} | C) − H(Z^{j−1} | C, K_{j−1}) + n(δ′(ǫ) + δ″(ǫ))
  = I(K_{j−1}; Z^{j−1} | C) + n(δ′(ǫ) + δ″(ǫ))
  (c)≤ nδ_{j−1}(ǫ) + n(δ′(ǫ) + δ″(ǫ)),

where (a) follows from part 2 of the proposition; (b) follows from the Markov chain Z^{j−1} → K_{j−1} → (Z(j), K_j); and (c) follows from the induction hypothesis. This completes the proof, since the last line implies that there exists a δ‴(ǫ), with δ‴(ǫ) → 0 as ǫ → 0, that upper bounds I(K_j; Z^j | C) for all j ∈ [1 : b].
B.2 Proof of Proposition 3.2
1) We first show that if RK < H(S) − 4δ(ǫ), then H(Kj|C) ≥ n(RK − δ(ǫ)). This
is done in the same manner as for part 1 of Proposition 3.1. The proof is therefore
omitted.
2) We need to show that if RK < H(S|Z) − 3δ(ǫ), then I(Kj;Z(j)|C) ≤ 2nδ(ǫ)
for every j ∈ [1 : b]. We have
I(K_j; Z(j) | C) = I(S(j); Z(j) | C) − I(S(j); Z(j) | K_j, C).

We analyze the two terms separately. For the first term, we have

I(S(j); Z(j) | C) = I(S(j), L; Z(j) | C) − I(L; Z(j) | S(j), C)
  ≤ I(U^n, S(j); Z(j) | C) − H(L | S(j), C) + H(L | C, S(j), Z(j))
  ≤ n I(U, S; Z) − H(L | S(j), C) + H(L | C, S(j), Z(j))
  (a)≤ n I(U, S; Z) − H(L | C) + n(R − I(U; Z, S) + δ′(ǫ))
  = nR − H(K_{(j−1)d} | C) − H(K_{(j−1)m} ⊕ M_j | C) − H(L | K_{(j−1)m} ⊕ M_j, K_{(j−1)d}) + n I(S; Z) + nδ′(ǫ)
  (b)≤ n(R − R_d − R_m − (R − R_d − R_m) + δ(ǫ) + δ′(ǫ)) + n I(S; Z)
  = n(I(S; Z) + δ(ǫ) + δ′(ǫ)),
where step (a) follows by applying Lemma 3.1, which holds by the condition R > I(U; Z, S) + δ(ǫ) and the fact that P{(U^n(L), S(j), Z(j)) ∈ T_ǫ^{(n)}} → 1 as n → ∞ from the encoding scheme. Step (b) follows from part 1 of Proposition 3.2: H(K_{j−1} | C) ≥ n(R_K − δ(ǫ)), which implies that H(K_{(j−1)d} | C) ≥ n(R_d − δ(ǫ)). Note that we implicitly assumed j ≥ 2. The case j = 1 is straightforward, since H(L | C) = nR by the fact that we transmit a codeword picked uniformly at random.

The proof that I(S(j); Z(j) | K_j, C) ≥ n I(S; Z) − nδ″(ǫ) follows the same steps as the proof of part 2 of Proposition 3.1 and requires the same condition, R_K < H(S|Z) − 3δ(ǫ).
3) This part is proved in the same manner as part 3 of Proposition 3.1.
B.3 Proof of Proposition 3.3
1) We first show that if RK < H(S)− 4δ(ǫ), then H(Kj|C) ≥ n(RK − δ(ǫ)). This is
done in the same manner as part 1 of Proposition 3.1. The proof is therefore omitted.
2) We need to show that if RK < H(S|Z, V )− 3δ(ǫ), then I(Kj;Z(j)|C) ≤ nδ(ǫ)
for every j ∈ [1 : b]. Consider
I(K_j; Z(j) | C) ≤ I(K_j; Z(j), V^n | C)
  = I(S(j); Z(j), V^n | C) − I(S(j); Z(j), V^n | K_j, C).
We analyze each term separately. For the first term, we have

I(S(j); Z(j), V^n | C) = I(S(j); Z(j) | V^n, C)
  = Σ_{i=1}^n ( H(Z_i(j) | C, V^n, Z^{i−1}(j)) − H(Z_i(j) | C, V^n, S(j), Z^{i−1}(j)) )
  ≤ Σ_{i=1}^n ( H(Z_i(j) | C, V_i) − H(Z_i(j) | C, V_i, S_i(j)) )
  ≤ n( H(Z|V) − H(Z|V, S) )
  = n I(Z; S | V) = n I(Z, V; S).
For the second term, we have

I(S(j); Z(j), V^n | K_j, C) = H(S(j) | K_j, C) − H(S(j) | Z(j), V^n, K_j, C)
  = H(S(j), K_j | C) − H(K_j | C) − H(S(j) | Z(j), V^n, K_j, C)
  ≥ nH(S) − nR_K − H(S(j) | Z(j), V^n, K_j, C)
  ≥ n(H(S) − R_K) − H(S(j) | Z(j), V^n, K_j)
  (b)≥ n(H(S) − R_K) − n(H(S|Z, V) − R_K + δ′(ǫ))
  = n I(S; Z, V) − nδ′(ǫ),

where the proof of step (b) follows the same steps as in the proof of part 2 of Proposition 3.1; we can show that step (b) holds if R_K < H(S|Z, V) − 3δ(ǫ).

Combining the two terms then gives the required upper bound, which completes the proof of part 2.
3) This part is proved in the same manner as part 3 of Proposition 3.1.
Appendix C
Proofs for Chapter 4
C.1 Achievability proof of Theorem 4.1
Codebook Generation
• Fix the joint distribution p(x, y, z, u, x_1) = p(x)p(y|x)p(z|y)p(u|x, y)p(x_1|x, y, u). Let R = R_{10} + R_{11}, R_l ≥ R_{10}, and R_2 ≥ R_{10}.

• Generate 2^{nR_l} u^n(l) sequences, l ∈ [1 : 2^{nR_l}], each according to Π_{i=1}^n p(u_i).

• Partition the set of u^n sequences into 2^{nR_{10}} bins, B_1(m_{10}), m_{10} ∈ [1 : 2^{nR_{10}}]. Separately and independently, partition the set of u^n sequences into 2^{nR_2} bins, B_2(m_2), m_2 ∈ [1 : 2^{nR_2}].

• For each u^n(l) and y^n sequence pair, generate 2^{nR_{11}} x_1^n(l, m_{11}) sequences, m_{11} ∈ [1 : 2^{nR_{11}}], each according to Π_{i=1}^n p(x_{1i} | u_i, y_i).
Encoding at the encoder
Given an (x^n, y^n) pair, the encoder first looks for an index l ∈ [1 : 2^{nR_l}] such that (u^n(l), x^n, y^n) ∈ T_ǫ^{(n)}, where T_ǫ^{(n)} stands for the set of jointly typical sequences. If there is more than one such l, it selects one uniformly at random from the set of admissible indices. If there is none, it sends an index chosen uniformly at random from [1 : 2^{nR_l}]¹. Next, it finds the index m_{11} such that (x_1^n(l, m_{11}), u^n(l), x^n, y^n) ∈ T_ǫ^{(n)}. As before, if there is more than one, it selects one uniformly at random from the set of admissible indices. If there is none, it sends an index chosen uniformly at random from [1 : 2^{nR_{11}}]. Finally, it sends out (m_{10}, m_{11}), where m_{10} is the bin index such that u^n(l) ∈ B_1(m_{10}). The total rate required is R.
Decoding and reconstruction at Node 1
Given (m_{10}, m_{11}), Node 1 looks for the unique l such that (u^n(l), y^n) ∈ T_ǫ^{(n)} and u^n(l) ∈ B_1(m_{10}). It reconstructs x^n as x_1^n(l, m_{11}). If it finds no such l, or more than one, it sets l = 1 and performs the reconstruction as before.
Encoding at Node 1
Node 1 sends an index m2 such that un(l) ∈ B2(m2). This requires a rate of R2.
Decoding and reconstruction at Node 2
Node 2 looks for the unique index l such that (u^n(l), z^n) ∈ T_ǫ^{(n)} and u^n(l) ∈ B_2(m_2). It then reconstructs x^n according to x̂_{2i} = g_2(u_i(l), z_i) for i ∈ [1 : n]. If there is no such index, it reconstructs using l = 1.
Analysis of expected distortion
Using the typical average lemma in [13, Lecture 2] and following the analysis in [13, Lecture 3], it suffices to analyze the probability of "error," i.e., the probability that the chosen sequences are not jointly typical with the source sequences. Let L and M_{11} be the indices chosen at the encoder. Note that these determine the bin indices M_{10} and M_2. Let M̂_2 be the index chosen at Node 1. Define the following error events:
1. E_0 := {(X^n, Y^n) ∉ T_ǫ^{(n)}}.

2. E_1 := {(U^n(l), X^n, Y^n) ∉ T_ǫ^{(n)} for all l ∈ [1 : 2^{nR_l}]}.

3. E_2 := {(U^n(L), X^n, Y^n, Z^n) ∉ T_ǫ^{(n)}}.

4. E_3 := {(U^n(L), X_1^n(L, m_{11}), X^n, Y^n) ∉ T_ǫ^{(n)} for all m_{11} ∈ [1 : 2^{nR_{11}}]}.
¹For simplicity, we assume randomized encoding, but it is easy to see that the randomized encoding employed in our proofs can be incorporated as part of the (random) codebook generation stage.
5. E_4 := {(U^n(l), Y^n) ∈ T_ǫ^{(n)} for some l ≠ L with U^n(l) ∈ B_1(M_{10})}.

6. E_5(M̂_2) := {(U^n(l), Z^n) ∈ T_ǫ^{(n)} for some l ≠ L with U^n(l) ∈ B_2(M̂_2)}.
We can then bound the probability of error as

P_e ≤ P{ ∪_{i=0}^5 E_i } = Σ_{i=0}^5 P{ E_i ∩ ( ∩_{j=0}^{i−1} E_j^c ) }.
• P{E_0} → 0 as n → ∞ by the law of large numbers (LLN).

• By the covering lemma in [13, Lecture 3], P{E_1 ∩ E_0^c} → 0 as n → ∞ if R_l > I(U; X, Y) + δ(ǫ).

• P{E_2 ∩ E_1^c ∩ E_0^c} → 0 as n → ∞ by the Markov relation U − (X, Y) − Z and the conditional joint typicality lemma [13, Lecture 2].

• By the covering lemma in [13, Lecture 3], P{E_3 ∩ (∩_{j=0}^2 E_j^c)} → 0 as n → ∞ if R_{11} > I(X_1; X | U, Y) + δ(ǫ).

• From the analysis of the Wyner–Ziv coding scheme (see [36] or [13, Lecture 12]), P{E_4 ∩ (∩_{j=0}^3 E_j^c)} → 0 as n → ∞ if R_l − R_{10} < I(U; Y) − δ(ǫ).
• For the last term, we have

P{ E_5(M̂_2) ∩ (∩_{j=0}^4 E_j^c) }
  = P{ E_5(M̂_2) ∩ (∩_{j=0}^4 E_j^c) ∩ {M̂_2 ≠ M_2} } + P{ E_5(M̂_2) ∩ (∩_{j=0}^4 E_j^c) ∩ {M̂_2 = M_2} }
  (a)= P{ E_5(M̂_2) ∩ (∩_{j=0}^4 E_j^c) ∩ {M̂_2 = M_2} }
  = P{ E_5(M_2) ∩ (∩_{j=0}^4 E_j^c) ∩ {M̂_2 = M_2} }
  ≤ P{ E_5(M_2) ∩ E_2^c }.

Step (a) follows from the observation that (∩_{j=0}^4 E_j^c) ∩ {M̂_2 ≠ M_2} = ∅. The analysis of the probability of error therefore reduces to the analysis of the equivalent Wyner–Ziv setup with Z as the side information at Node 2. Hence, P{ E_5(M̂_2) ∩ (∩_{j=0}^4 E_j^c) } → 0 as n → ∞ if R_l − R_2 < I(U; Z) − δ(ǫ).
Eliminating R_l from the above inequalities then gives the required rate region.
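As a toy numerical illustration of the covering step (a simulation sketch with hypothetical parameters, not the full analysis above): for the test channel U = X ⊕ N with N ∼ Bern(q), a codebook of 2^{nR} i.i.d. uniform sequences covers a source sequence with high probability once R exceeds I(U; X) = 1 − H(q).

```python
import random

def covering_success_rate(n, R, q, eps, trials=200, seed=7):
    """Fraction of source sequences x^n for which some codeword u^n among
    2^{nR} i.i.d. uniform sequences lands in the 'typical' Hamming-distance
    band [n(q - eps), n(q + eps)] around x^n."""
    rng = random.Random(seed)
    num_cw = 2 ** round(n * R)
    lo, hi = n * (q - eps), n * (q + eps)
    ok = 0
    for _ in range(trials):
        x = rng.getrandbits(n)
        if any(lo <= bin(x ^ rng.getrandbits(n)).count("1") <= hi
               for _ in range(num_cw)):
            ok += 1
    return ok / trials

# I(U;X) = 1 - H(0.2) ~ 0.28 bits; R = 0.5 exceeds it, so covering succeeds
rate = covering_success_rate(n=24, R=0.5, q=0.2, eps=0.05)
print(rate)
```

At this block length each random codeword is typical with the source with probability about 0.01, so 2^{12} codewords find a cover essentially always; taking R below 1 − H(q) makes the success rate collapse, mirroring the covering lemma threshold.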
C.2 Achievability proof of Theorem 4.2
As the achievability proof for the Triangular Source Coding case follows that of the Cascade Source Coding case closely, we include only the additional steps required for generating R_3 and the analysis of the probability of error at Node 2. The steps for generating R_1 and R_2, and for reconstruction at Node 1, are the same as in the Cascade setup.
Codebook Generation
• Fix p(x, y, z, u, v, x_1) = p(x)p(y|x)p(z|y)p(u|x, y)p(x_1|x, y, u)p(v|x, y, u).

• For each u^n(l), generate 2^{nR̃_3} v^n(l, l_3) sequences, l_3 ∈ [1 : 2^{nR̃_3}], each according to Π_{i=1}^n p(v_i|u_i). Partition the set of indices [1 : 2^{nR̃_3}] into 2^{nR_3} bins, B_3(m_3).
Encoding
• Given a pair (x^n, y^n) and the u^n(l) found through the steps of the Cascade Source Coding setup, the encoder looks for an index l_3 such that (u^n(l), v^n(l, l_3), x^n, y^n) ∈ T_ǫ^{(n)}. If it finds more than one, it selects one uniformly at random from the set of admissible indices. If it finds none, it outputs an index uniformly at random from [1 : 2^{nR̃_3}]. The encoder then sends out m_3 such that L_3 ∈ B_3(m_3).
Decoding
The additional decoding step is the decoding of L_3. Node 2 looks for the unique l_3 such that (u^n(l), v^n(l, l_3), z^n) ∈ T_ǫ^{(n)} and l_3 ∈ B_3(m_3). If there is none, or more than one, it sets l_3 = 1.
Analysis of Distortion
Let L, M_{11}, and L_3 be the indices chosen by the encoder. Note that these fix the bin indices M_{10}, M_2, and M_3. We follow a similar analysis as in the Cascade case, with the same definitions of the error events E_0 to E_5. We also require the following additional error events:

7) E_6 := {(U^n(L), V^n(L, L_3), X^n, Y^n) ∉ T_ǫ^{(n)}}.

8) E_7 := {(U^n(L), V^n(L, L_3), X^n, Y^n, Z^n) ∉ T_ǫ^{(n)}}.

9) E_8(L̂) := {(U^n(L̂), V^n(L̂, l_3), Z^n) ∈ T_ǫ^{(n)} for some l_3 ≠ L_3 with l_3 ∈ B_3(M_3)}.
To bound the probability of error, we have the following additional terms:

• By the covering lemma, P(E_6 ∩ E_2^c) → 0 as n → ∞ if R̃_3 > I(V; X, Y | U) + δ(ǫ).

• P(E_7 ∩ E_6^c) → 0 as n → ∞ by the Markov condition (V, U) − (X, Y) − Z and the conditional joint typicality lemma.

• It remains to bound P{ E_8(L̂) ∩ E_5^c(M̂_2) ∩ E_7^c ∩ (∩_{j=0}^4 E_j^c) }. We have

P{ E_8(L̂) ∩ E_5^c(M̂_2) ∩ E_7^c ∩ (∩_{j=0}^4 E_j^c) }
  = P{ E_8(L̂) ∩ E_5^c(M̂_2) ∩ E_7^c ∩ (∩_{j=0}^4 E_j^c) ∩ {M̂_2 = M_2} } + P{ E_8(L̂) ∩ E_5^c(M̂_2) ∩ E_7^c ∩ (∩_{j=0}^4 E_j^c) ∩ {M̂_2 ≠ M_2} }
  = P{ E_8(L̂) ∩ E_5^c(M̂_2) ∩ E_7^c ∩ (∩_{j=0}^4 E_j^c) ∩ {M̂_2 = M_2} }
  = P{ E_8(L̂) ∩ E_5^c(M_2) ∩ E_7^c ∩ (∩_{j=0}^4 E_j^c) ∩ {M̂_2 = M_2} }
  ≤ P{ E_8(L̂) ∩ E_5^c(M_2) ∩ E_7^c }
  (a)= P{ E_8(L̂) ∩ E_5^c(M_2) ∩ E_7^c ∩ {L̂ = L} }
  = P{ E_8(L) ∩ E_5^c(M_2) ∩ E_7^c ∩ {L̂ = L} }
  ≤ P{ E_8(L) ∩ E_7^c },

where (a) follows from the observation that E_5^c(M_2) ∩ E_7^c ∩ {L̂ ≠ L} = ∅. It remains to bound P{E_8(L) ∩ E_7^c}. Note that the analysis of this term is equivalent to analyzing the setup in which U^n is the side information at Node 0 and (U^n, Z^n) is the side information at Node 2. Hence, P{E_8(L) ∩ E_7^c} → 0 as n → ∞ if R̃_3 − R_3 < I(V; Z | U) − δ(ǫ).

We then obtain the rate region by eliminating R̃_3 and R_l.
C.3 Achievability proof of Theorem 4.3
As with the case for the Triangular setting, the proof for this case follows the Cascade
setting closely. We will therefore include only the additional steps. We have a change
of notation from the Cascade setting. We will use U1 instead of U .
Codebook Generation
• Fix p(x, y, z, u_1, u_2, x_1) = p(x, y, z)p(u_1|x, y)p(x_1|u_1, x, y)p(u_2|z, u_1).

• For each u_1^n(l), generate 2^{nR̃_3} u_2^n(l, l_3) sequences, l_3 ∈ [1 : 2^{nR̃_3}], each according to Π_{i=1}^n p(u_{2i}|u_{1i}). Partition the set of indices [1 : 2^{nR̃_3}] into 2^{nR_3} bins, B_3(m_3).
Encoding
The additional encoding step is at Node 2. Node 2 looks for an index L_3 such that (u_1^n(L̂), u_2^n(L̂, L_3), z^n) ∈ T_ǫ^{(n)}. As before, if it finds more than one, it selects an index uniformly at random from the set of admissible indices. If it finds none, it outputs an index uniformly at random from [1 : 2^{nR̃_3}]. It then outputs the bin index m_3 such that L_3 ∈ B_3(m_3).
Decoding
Additional decoding is required at Node 0. Node 0 looks for the index l_3 such that (u_1^n(l), u_2^n(l, l_3), x^n, y^n) ∈ T_ǫ^{(n)} and l_3 ∈ B_3(m_3).
Analysis of distortion
Let E_{Cascade} denote the event that an error occurs on the forward Cascade path. In addition, we define the following error events.

• E_{TW-1}(L) := {(U_1^n(L), U_2^n(L, l_3), Z^n) ∉ T_ε^{(n)} for all l_3 ∈ [1 : 2^{n\tilde{R}_3}]}.
• E_{TW-2}(L) := {(U_1^n(L), U_2^n(L, L_3), Z^n, X^n, Y^n) ∉ T_ε^{(n)}}.
• E_{TW-3}(L) := {(U_1^n(L), U_2^n(L, l_3), X^n, Y^n) ∈ T_ε^{(n)} for some l_3 ∈ B_3(M_3), l_3 ≠ L_3}.

These events are bounded as follows.

• P(E_{TW-1}(L) ∩ E_{Cascade}^c) → 0 as n → ∞ if \tilde{R}_3 > I(U_2; Z|U_1) + δ(ε).
• P(E_{TW-2}(L) ∩ E_{Cascade}^c) → 0 as n → ∞ by the strong Markov Lemma [32].
• P(E_{TW-3}(L) ∩ E_{Cascade}^c) → 0 as n → ∞ if \tilde{R}_3 − R_3 < I(U_2; X, Y|U_1) − δ(ε).
Finally, eliminating \tilde{R}_3 and R_l gives us the required rate region.
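The error analysis above rests on joint typicality tests. As a toy illustration (not part of the proof), a strong ε-typicality check of a pair of sequences against a joint pmf can be written as follows; the example pmf is an arbitrary choice.

```python
from collections import Counter

def jointly_typical(xs, ys, pmf, eps):
    """Strong eps-typicality: every pair frequency is within eps * p(x, y)
    of the joint pmf, and no pair outside the support of p appears."""
    n = len(xs)
    counts = Counter(zip(xs, ys))
    if any(pair not in pmf for pair in counts):
        return False
    return all(abs(counts[pair] / n - p) <= eps * p for pair, p in pmf.items())

# Toy doubly symmetric binary source.
pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
xs = [0] * 5 + [1] * 5
ys = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]   # empirical pmf matches p exactly
assert jointly_typical(xs, ys, pmf, eps=0.1)
assert not jointly_typical(xs, [1 - y for y in ys], pmf, eps=0.1)
```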
C.4 Achievability proof of Theorem 4.4
The achievability proof for Two-Way Triangular source coding combines the proofs for the Triangular and Two-Way Cascade settings. As it is largely similar to these proofs, we do not repeat it here. We only note that the codebook generation, encoding, decoding and analysis of distortion for the forward path from Node 0 to Node 2 follow those for the Triangular setting, while the corresponding steps for the reverse path from Node 2 to Node 0 follow those for the Two-Way Cascade setting, with (U_2, V) taking the role of U_2.
C.5 Cardinality Bounds
We provide cardinality bounds for Theorems 4.1-4.4 stated in Chapter 4. The main
tool we will use is the Fenchel-Eggleston-Caratheodory Theorem [9].
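As an illustrative aside (not from the text), the support-reduction argument behind the Fenchel-Eggleston-Caratheodory theorem can be carried out numerically: to preserve k linear functionals of p(u) together with total probability, repeatedly move along a null direction of the constraint matrix until a support point is driven to zero, stopping once at most k + 1 points remain. All names, alphabets, and tolerances below are arbitrary choices for this sketch.

```python
def null_vector(matrix):
    """Return a nonzero z with matrix @ z = 0, via plain Gauss-Jordan
    elimination (the matrix here always has fewer rows than columns)."""
    rows = [row[:] for row in matrix]
    m = len(rows[0])
    pivot_of = {}                                 # column -> pivot row
    r = 0
    for col in range(m):
        piv = next((i for i in range(r, len(rows))
                    if abs(rows[i][col]) > 1e-12), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        pv = rows[r][col]
        rows[r] = [v / pv for v in rows[r]]
        for i in range(len(rows)):
            if i != r and abs(rows[i][col]) > 1e-12:
                f = rows[i][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivot_of[col] = r
        r += 1
        if r == len(rows):
            break
    free = next(c for c in range(m) if c not in pivot_of)
    z = [0.0] * m
    z[free] = 1.0
    for col, row in pivot_of.items():
        z[col] = -rows[row][free]
    return z

def reduce_support(p, fs, tol=1e-10):
    """Shrink the support of pmf p to at most len(fs) + 1 points while
    preserving each moment sum_u p(u) f(u) and the total probability."""
    p = list(p)
    while True:
        support = [u for u in range(len(p)) if p[u] > tol]
        if len(support) <= len(fs) + 1:
            return p
        # Moment rows plus an all-ones row; more support points than rows
        # guarantees a null direction d, and the ones row forces sum(d) = 0,
        # so d has a strictly negative component.
        A = [[f[u] for u in support] for f in fs] + [[1.0] * len(support)]
        d = null_vector(A)
        t = min(p[u] / -d[i] for i, u in enumerate(support) if d[i] < -1e-12)
        for i, u in enumerate(support):
            p[u] = max(0.0, p[u] + t * d[i])

# Example: six mass points, two moments to preserve -> support of size <= 3.
fs = [[0, 1, 2, 3, 4, 5],          # f_1(u) = u
      [0, 1, 4, 9, 16, 25]]        # f_2(u) = u^2
p = [1 / 6] * 6
q = reduce_support(p, fs)
assert sum(1 for w in q if w > 1e-9) <= 3
assert abs(sum(q) - 1.0) < 1e-8
for f in fs:
    assert abs(sum(w * v for w, v in zip(q, f))
               - sum(w * v for w, v in zip(p, f))) < 1e-8
```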
Proof of cardinality bound for Theorem 4.1
For each (x, y), we have
\[
\sum_{u} p(u)\, p_{X,Y|U}(x, y|u) = p(x, y).
\]
We therefore have |X||Y| − 1 continuous functions of p(x, y|u). This set of equations preserves the distribution p(x, y) and hence, by Markovity, p(x, y, z). Next, observe that the following are similarly continuous functions of p(x, y|u):
\[
\begin{aligned}
I(U; X, Y|Z) &= H(X, Y|Z) - H(X, Y, Z|U) + H(Z|U), \\
I(X; \hat{X}_1, U|Y) &= H(X|Y) - H(X, \hat{X}_1, Y|U) + H(\hat{X}_1, Y|U), \\
\mathrm{E}\, d_1(X, \hat{X}_1) &= \sum_{x, \hat{x}_1} p(x, \hat{x}_1)\, d_1(x, \hat{x}_1), \\
\mathrm{E}\, d_2(X, \hat{X}_2) &= \sum_{x, z, u} p(x, z, u)\, d_2(x, g_2(u, z)).
\end{aligned}
\]
These give us 4 additional continuous functions. Hence, by the Fenchel-Eggleston-Caratheodory theorem, there exists a U′ with cardinality |U′| ≤ |X||Y| + 3 such that all the constraints are satisfied. Note that this construction does not preserve p(\hat{x}_1), but this does not change the rate-distortion region since the associated rate and distortion are preserved.
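The first information identity above, I(U; X,Y|Z) = H(X,Y|Z) − H(X,Y,Z|U) + H(Z|U), is a pure chain-rule consequence and can be checked numerically on an arbitrary joint pmf (the binary alphabets and random seed below are arbitrary choices for the sketch):

```python
import random
from math import log2
from itertools import product

def entropy(pmf):
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(joint, keep):
    """Marginalize a joint pmf over all coordinates not listed in `keep`."""
    out = {}
    for symbols, p in joint.items():
        key = tuple(symbols[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

# Random joint pmf over (U, X, Y, Z) with binary alphabets.
random.seed(0)
weights = [random.random() for _ in range(16)]
total = sum(weights)
joint = {s: w / total for s, w in zip(product(range(2), repeat=4), weights)}

U, X, Y, Z = 0, 1, 2, 3
H = lambda *coords: entropy(marginal(joint, coords))
# I(U; X,Y | Z) computed directly from marginal entropies ...
lhs = H(U, Z) + H(X, Y, Z) - H(U, X, Y, Z) - H(Z)
# ... equals H(X,Y|Z) - H(X,Y,Z|U) + H(Z|U), the form used in the proof.
rhs = (H(X, Y, Z) - H(Z)) - (H(U, X, Y, Z) - H(U)) + (H(U, Z) - H(U))
assert abs(lhs - rhs) < 1e-12
```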
Proof of cardinality bound for Theorem 4.2
We first give a bound on the cardinality of U. Consider the following continuous functions of p(x, y|u):
\[
\begin{aligned}
\sum_{u} p(u)\, p(x, y|u) &= p(x, y), \quad \forall\, (x, y), \\
I(U; X, Y|Z) &= H(X, Y|Z) - H(X, Y, Z|U) + H(Z|U), \\
I(X; \hat{X}_1, U|Y) &= H(X|Y) - H(X, \hat{X}_1, Y|U) + H(\hat{X}_1, Y|U), \\
I(X, Y; V|U, Z) &= H(X, Y, Z|U) - H(Z|U) - H(X, Y, V, Z|U) + H(V, Z|U), \\
\mathrm{E}\, d_1(X, \hat{X}_1) &= \sum_{x, \hat{x}_1} p(x, \hat{x}_1)\, d_1(x, \hat{x}_1), \\
\mathrm{E}\, d_2(X, \hat{X}_2) &= \sum_{x, z, u, v} p(x, z, u, v)\, d_2(x, g_2(u, v, z)).
\end{aligned}
\]
From these equations, there exists a U′ with |U′| ≤ |X||Y| + 4 such that the equations are satisfied. Note that the new U′ induces a new V′. For each U′ = u, consider the following continuous functions of p(x, y|u, v):
\[
\begin{aligned}
p(x, y|u) &= \sum_{v} p(v|u)\, p(x, y|u, v), \\
I(X, Y; V|U = u, Z) &= H(X, Y|U = u, Z) - H(X, Y|V, U = u, Z), \\
\mathrm{E}(d_2(X, \hat{X}_2)|U = u) &= \sum_{x, z, v} p(x, z, v|u)\, d_2(x, g_2(u, v, z)).
\end{aligned}
\]
From this set of equations, we see that for each U′ = u, it suffices to consider a V′ such that |V′| ≤ |X||Y| + 1. Hence, the overall cardinality bound on V is |V| ≤ (|X||Y| + 4)(|X||Y| + 1). The joint distribution p(x, y, z) is preserved due to the Markov chain (V, U) − (X, Y) − Z.
Proof of cardinality bound for Theorem 4.3
The cardinality bound on U_1 follows from a similar analysis to the Cascade source coding case; the proof is therefore omitted. For each U_1 = u_1, the following are continuous functions of p(z|u_1, u_2):
\[
\begin{aligned}
p(z|u_1) &= \sum_{u_2} p(u_2|u_1)\, p(z|u_1, u_2), \\
I(U_2; Z|U_1 = u_1, X, Y) &= H(Z|U_1 = u_1, X, Y) - H(Z|U_1 = u_1, U_2, X, Y), \\
\mathrm{E}(d_3(Z, \hat{Z})|U_1 = u_1) &= \sum_{x, y, z, u_2} p(x, y, z, u_2|u_1)\, d_3(z, g_3(x, y, u_1, u_2)).
\end{aligned}
\]
From this set of equations, we see that for each U_1 = u_1, it suffices to consider a U_2′ such that |U_2′| ≤ |Z| + 1. Hence, the overall cardinality bound on U_2 is |U_2| ≤ |U_1|(|Z| + 1). The joint distribution p(x, y, z) is preserved due to the Markov chains U_1 − (X, Y) − Z and U_2 − (Z, U_1) − (X, Y).
Proof of cardinality bound for Theorem 4.4
The cardinality bounds follow similar steps to those for the first three theorems. For the cardinality bound on U_2, we find a bound for each U_1 = u_1 and V = v. Details of the proof are omitted.
C.6 Alternative characterizations of rate distortion regions in Corollaries 4.1 and 4.2
In this appendix, we show that the rate distortion regions in Corollaries 4.1 and 4.2
can alternatively be characterized by transforming them into equivalent problems
found in [27], where explicit characterizations were given. We focus on the Cascade
case (Corollary 4.1), since the Triangular case follows by the same analysis.
Figure C.1 shows the Cascade source coding setting corresponding to the optimization problem in Corollary 4.1.
[Figure C.1 depicts a cascade: Node 0 → Node 1 at rate R_1, Node 1 → Node 2 at rate R_2; Node 0 observes the source A + B, with B available as side information; Node 1 outputs X̂_1 and Node 2 outputs X̂_2.]

Figure C.1: Cascade source coding setting for the optimization problem in Corollary 4.1. X̂_1 and X̂_2 are lossy reconstructions of A + B.
An explicit characterization of the Cascade source coding setting in Figure C.2 was given in [27].
[Figure C.2 depicts the same cascade: Node 0 → Node 1 at rate R_1, Node 1 → Node 2 at rate R_2; Node 0 observes X, with Y = X + Z available as side information; Node 1 outputs X̂_1 and Node 2 outputs X̂_2.]

Figure C.2: Cascade source coding setting from [27]. X̂_1 and X̂_2 are lossy reconstructions of X, and Z is independent of X.
We now show that the setting in Figure C.1 can be transformed into the setting in Figure C.2. First, we note that for the setting in Figure C.2, the rate-distortion region is the same regardless of whether the sources are (X, Y) or (X, αY) for any α ≠ 0, since the nodes can simply scale Y by an appropriate constant.

Next, for Gaussian sources, the two settings are equivalent if we can show that the covariance matrix of (X, αY) can be made equal to the covariance matrix of (A + B, B). Equating the entries of the two covariance matrices, we require
\[
\begin{aligned}
\sigma_X^2 &= \sigma_A^2 + \sigma_B^2, \\
\alpha \sigma_X^2 &= \sigma_B^2, \\
\alpha^2 (\sigma_X^2 + \sigma_Z^2) &= \sigma_B^2.
\end{aligned}
\]
Solving these equations, we obtain α = σ_B^2/(σ_A^2 + σ_B^2) and σ_Z^2 = (σ_B^2 − α^2 σ_X^2)/α^2. Since σ_B^2 − α^2 σ_X^2 ≥ 0, this choice of σ_Z^2 is valid, which completes the proof.
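As a quick numerical sanity check (the variances below are arbitrary illustrative values, not from the text), the derived α and σ_Z² indeed match the two covariance matrices:

```python
# Verify Cov(X, alpha*Y) = Cov(A + B, B) for the derived alpha and sigma_Z^2.
sigma_A2, sigma_B2 = 2.0, 3.0           # arbitrary positive source variances
sigma_X2 = sigma_A2 + sigma_B2          # Var(X) must equal Var(A + B)
alpha = sigma_B2 / (sigma_A2 + sigma_B2)
sigma_Z2 = (sigma_B2 - alpha**2 * sigma_X2) / alpha**2

assert sigma_Z2 >= 0                                # valid noise variance
assert abs(alpha * sigma_X2 - sigma_B2) < 1e-12     # Cov(X, aY) = Cov(A+B, B)
assert abs(alpha**2 * (sigma_X2 + sigma_Z2) - sigma_B2) < 1e-12  # Var(aY) = Var(B)
```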
C.7 Proof of Converse for Triangular source cod-
ing with helper
Given a (n, 2^{nR_1}, 2^{nR_2}, 2^{nR_3}, 2^{nR_h}, D_1, D_2) code, define U_{hi} = (Y^{i−1}, Z^{i−1}, Z_{i+1}^n, M_h), U_{1i} = (X^{i−1}, M_2) and U_{2i} = (U_{hi}, U_{1i}, M_3). Observe that we have the required Markov conditions (X_i, Z_i) − Y_i − U_{hi} and Z_i − (X_i, Y_i, U_{hi}) − (U_{1i}, U_{2i}). For the helper condition, we have
\[
\begin{aligned}
nR_h &\geq I(M_h; Y^n|Z^n) \\
&= \sum_{i=1}^{n} \big(H(Y_i|Z_i) - H(Y_i|Y^{i-1}, M_h, Z^n)\big) \\
&= \sum_{i=1}^{n} I(U_{hi}; Y_i|Z_i).
\end{aligned}
\]
For the other rates, we have
\[
\begin{aligned}
nR_1 &\geq H(M_1) \\
&\geq H(M_1|Y^n, Z^n) \\
&= H(M_1, M_2|Y^n, Z^n) \\
&= I(X^n; M_1, M_2|Y^n, Z^n) \\
&= \sum_{i=1}^{n} I(X_i; M_1, M_2|X^{i-1}, Y^n, Z^n) \\
&= \sum_{i=1}^{n} \big(H(X_i|X^{i-1}, Y^n, Z^n) - H(X_i, Y_i|X^{i-1}, Y^n, Z^n, M_1, M_2)\big) \\
&= \sum_{i=1}^{n} \big(H(X_i|Y_i, Z_i) - H(X_i, Y_i|X^{i-1}, Y^n, Z^n, M_1, M_2)\big) \\
&\overset{(a)}{=} \sum_{i=1}^{n} \big(H(X_i|Y_i) - H(X_i, Y_i|X^{i-1}, Y^n, \hat{X}_{1i}, Z^n, M_1, M_2, M_h)\big) \\
&\geq \sum_{i=1}^{n} \big(H(X_i|Y_i, U_{hi}) - H(X_i|\hat{X}_{1i}, Y_i, U_{1i}, U_{hi})\big) \\
&= \sum_{i=1}^{n} I(X_i; \hat{X}_{1i}, U_{1i}|Y_i, U_{hi}).
\end{aligned}
\]
Here, (a) follows from the Markov chain X_i − Y_i − Z_i, together with the fact that M_h and \hat{X}_{1i} are functions of (M_1, Y^n). Next,
\[
\begin{aligned}
nR_2 &\geq H(M_2|M_h) \\
&\geq H(M_2|Z^n, M_h) \\
&= I(X^n, Y^n; M_2|Z^n, M_h) \\
&= \sum_{i=1}^{n} I(X_i, Y_i; M_2|Z^n, X^{i-1}, Y^{i-1}, M_h) \\
&= \sum_{i=1}^{n} \big(H(X_i, Y_i|Z^n, X^{i-1}, Y^{i-1}, M_h) - H(X_i, Y_i|Z^n, X^{i-1}, Y^{i-1}, M_2, M_h)\big) \\
&= \sum_{i=1}^{n} \big(H(X_i, Y_i|Z_i, U_{hi}) - H(X_i, Y_i|Z_i, U_{1i}, U_{hi})\big) \\
&= \sum_{i=1}^{n} I(X_i, Y_i; U_{1i}|Z_i, U_{hi}).
\end{aligned}
\]
Next,
\[
\begin{aligned}
nR_3 &\geq H(M_3) \\
&\geq H(M_3|M_2, M_h, Z^n) \\
&= I(X^n, Y^n; M_3|M_2, M_h, Z^n) \\
&= \sum_{i=1}^{n} \big(H(X_i, Y_i|M_2, M_h, Z^n, X^{i-1}, Y^{i-1}) - H(X_i, Y_i|M_2, M_3, M_h, Z^n, X^{i-1}, Y^{i-1})\big) \\
&= \sum_{i=1}^{n} I(X_i, Y_i; U_{2i}|U_{1i}, U_{hi}, Z_i).
\end{aligned}
\]
Finally, it remains to show that the joint probability distribution induced by our choice of auxiliary random variables, p(x)\,p(y|x)\,p(z|y)\,p(u_h|y)\,p(u_1|x, y, u_h)\,p(\hat{x}_1, u_2|x, y, u_1, u_h), can be decomposed into the required form. This step closely follows the similar step in the preceding converse proofs and is therefore omitted.
Bibliography
[1] Rudolf Ahlswede and Imre Csiszár. Common randomness in information theory and cryptography, Part I: Secret sharing. IEEE Trans. Inf. Theory, 39(4), 1993.
[2] Ehsan Ardestanizadeh, Massimo Franceschetti, Tara Javidi, and Young-Han Kim. Wiretap channel with rate-limited feedback. IEEE Trans. Inf. Theory, 55(12):5353–5361, December 2009.
[3] G. Bagherikaram, A. S. Motahari, and A. K. Khandani. The secrecy rate region of
the broadcast channel. In 46th Annual Allerton Conference on Communication,
Control and Computing, Sept 2008.
[4] S. Borade, L. Zheng, and M. Trott. Multilevel broadcast networks. In Interna-
tional Symposium on Information Theory, 2007.
[5] Yanling Chen and A. J. Han Vinck. Wiretap channel with side information. IEEE Trans. Inf. Theory, 54(1):395–402, 2008.
[6] T. Cover and J. Thomas. Elements of Information Theory, 2nd edition. Wiley-Interscience, July 2006.
[7] I. Csiszár and J. Körner. Broadcast channels with confidential messages. IEEE Trans. Inf. Theory, IT-24:339–348, May 1978.
[8] Paul Cuff, Han-I Su, and Abbas El Gamal. Cascade multiterminal source cod-
ing. In ISIT’09: Proceedings of the 2009 IEEE international conference on Sym-
posium on Information Theory, pages 1199–1203, Piscataway, NJ, USA, 2009.
IEEE Press.
[9] H. G. Eggleston. Convexity. Cambridge: Cambridge University Press, 1958.
[10] E. Ekrem and S. Ulukus. Secrecy capacity of a class of broadcast channels with an
eavesdropper. EURASIP Journal on Wireless Communications and Networking,
Oct 2009.
[11] Abbas El Gamal. The feedback capacity of degraded broadcast channels (corresp.). IEEE Transactions on Information Theory, 24(3):379–381, May 1978.
[12] Abbas El Gamal. The capacity of a class of broadcast channels. IEEE Trans.
Info. Theory, 25(2):166–169, Mar 1979.
[13] Abbas El Gamal and Young-Han Kim. Network Information Theory. Cambridge University Press, 1st edition, 2011.
[14] Abbas El Gamal and E. C. van der Meulen. A proof of Marton's coding theorem for the discrete memoryless broadcast channel. IEEE Transactions on Information Theory, 27(1):120–121, 1981.
[15] W. H. Gu and M. Effros. Source coding for a simple multi-hop network. In IEEE International Symposium on Information Theory, ISIT, pages 2335–2339, June 28-July 3, 2005.
[16] P. Ishwar and S. S. Pradhan. A relay-assisted distributed source coding problem. In ITA Workshop, 2008.
[17] Amiram H. Kaspi. Two-way source coding with a fidelity criterion. IEEE Trans.
Inf. Theory, 31(6):735–740, 1985.
[18] A. Khisti, A. Tchamkerten, and G.W. Wornell. Secure broadcasting over fading
channels. IEEE Trans. Info. Theory, 54(6):2453–2469, June 2008.
[19] Ashish Khisti, Suhas N. Diggavi, and Gregory W. Wornell. Secret key agree-
ment using asymmetry in channel state knowledge. In Proc. IEEE International
Symposium on Information Theory, pages 2286–2290, Seoul, South Korea, July
2009.
[20] J. Körner and K. Marton. Comparison of two noisy channels. In Topics in Information Theory (Second Colloq., Keszthely, 1975), pages 411–423, 1977.
[21] Yingbin Liang, H. V. Poor, and Shlomo Shamai. Information theoretic secu-
rity. Foundations and Trends in Communications and Information Theory, 5(4–
5):355–580, 2008.
[22] Ruoheng Liu, I. Maric, P. Spasojevic, and R. D. Yates. Discrete memoryless
interference and broadcast channels with confidential messages: Secrecy rate
regions. IEEE Trans. Info. Theory, 54(6):2493–2507, June 2008.
[23] Wei Liu and Biao Chen. Wiretap channel with two-sided state information. In
Proc. 41st Asilomar Conf. Signals, Systems and Comp., pages 893–897, Pacific
Grove, CA, November 2007.
[24] Katalin Marton. A coding theorem for the discrete memoryless broadcast chan-
nel. IEEE Trans. Info. Theory, 25(3):306–311, May 1979.
[25] Ueli M. Maurer. Secret key agreement by public discussion from common infor-
mation. IEEE Trans. Inf. Theory, 39(3):733–742, 1993.
[26] Chandra Nair and Abbas El Gamal. The capacity region of a class of 3-receiver
broadcast channels with degraded message sets. IEEE Trans. Info. Theory, 2008.
Submitted. Available online at http://arxiv.org/abs/0712.3327.
[27] Haim Permuter and Tsachy Weissman. Cascade and triangular source coding with side information at the first two nodes. Available online at arXiv:1001.1679v.
[28] Claude E. Shannon. Communication theory of secrecy systems. Bell Systems
Tech. J., 28:656–715, 1949.
[29] Claude E. Shannon. Channels with side information at the transmitter. IBM J.
Res. Develop., 2:289–293, 1958.
[30] Yossef Steinberg and Neri Merhav. On successive refinement for the Wyner-Ziv problem. CCIT Rep. 419, EE Pub. 1358. Technical report, Technion Department of Electrical Engineering, 2003.
[31] R. Timo and B. N. Vellambi. Two lossy source coding problems with causal side-information. In IEEE International Symposium on Information Theory, ISIT, pages 1040–1044, June 28-July 3, 2009.
[32] S.-Y. Tung. Multiterminal source coding. PhD thesis, Cornell University, Ithaca, NY, 1978.
[33] D. Vasudevan, C. Tian, and S. N. Diggavi. Lossy source coding for a cascade
communication system with side informations. In Proceeding of Allerton confer-
ence on Communication, Control and Computing, 2006.
[34] Tsachy Weissman and Abbas El Gamal. Source coding with limited-look-ahead
side information at the decoder. IEEE Trans. Inf. Theory, 52(12):5218–5239,
2006.
[35] F. M. J. Willems and E. C. van der Meulen. The discrete memoryless multiple-
access channel with cribbing encoders. IEEE Trans. Inf. Theory, 31(3):313–327,
1985.
[36] A. D. Wyner. The wire-tap channel. Bell System Technical Journal, 54(8):1355–
1387, 1975.
[37] A. D. Wyner. The rate-distortion function for source coding with side information at the decoder-II: General sources. Information and Control, 38:60–80, 1978.
[38] A. D. Wyner and J. Ziv. The rate-distortion function for source coding with side information at the decoder. IEEE Trans. Inf. Theory, 22(1):1–10, 1976.
[39] H. Yamamoto. Source coding theory for cascade and branching communication
systems. IEEE Trans. Inf. Theory, 27:299–308, 1981.
[40] Hirosuke Yamamoto. Rate-distortion theory for the Shannon cipher system.
IEEE Trans. Inf. Theory, 43(3):827–835, 1997.