
UNIVERSITY OF CAMBRIDGE

NEAREST NEIGHBOUR DECODING

FOR FADING CHANNELS

presented by

A. Taufiq Asyhari

Trinity Hall

Cambridge University Engineering Department

January 2012

This dissertation is submitted

for the degree of Doctor of Philosophy


To Mom, Dad and Ana


Declaration

I hereby declare that this dissertation is the result of my own work and includes

nothing which is the outcome of work done in collaboration except where specifi-

cally indicated in the text. This dissertation is not substantially the same as any

that I have submitted for a degree or diploma or other qualification at any other

university.

I also declare that the length of this dissertation is less than 71,000 words and

that the number of figures is less than 150. Approval to extend the word limit

set by the Department of Engineering Degree Committee has been obtained.

Signature: . . . . . . . . . . . . . . . . . . . . . . . . . . .

A. Taufiq Asyhari

Cambridge, January 2012


Nearest Neighbour Decoding for Fading Channels

By: A. Taufiq Asyhari

Abstract

This dissertation addresses the effects of imperfect channel state information

(CSI) on the reliability of nearest neighbour decoding in fading channels.

In the first part, we investigate the reliability of nearest neighbour decoding

in block-fading channels, which are relevant for delay-constrained applications.

The block-fading channel is non-ergodic and the reliability under perfect CSI

is measured by the information outage probability. Using mismatched-decoding

techniques, we develop a framework to study nearest neighbour decoding with

imperfect CSI and propose the generalised outage probability as a new tool to

measure the reliability, which generalises the notion of information outage prob-

ability. Assuming a non-adaptive transmission scheme and imperfect CSI at the

receiver (CSIR), we first evaluate the generalised outage probability at high signal-to-noise ratio (SNR). We characterise the reliability as a

function of the system parameters and the quality of channel estimation. We

then consider two adaptive schemes: incremental-redundancy automatic-repeat

request (IR-ARQ) (based on receiver feedback) and power adaptation (based on

imperfect CSI at the transmitter (CSIT)). For IR-ARQ, the reliability is shown to

be a function of the system parameters and the qualities of feedback and channel

estimation. For power adaptation with imperfect CSIT, the reliability is shown

to be a function of the system parameters and the CSI quality at both terminals.

In the second part, we investigate the reliability of nearest neighbour de-

coding in ergodic fading channels. The ergodic channel is relevant for delay-

unconstrained transmission where reliable communication is possible at rates

below the channel capacity. In order to obtain accurate CSI, we propose a

pilot-aided channel-estimation scheme. We first consider point-to-point chan-

nels and demonstrate that our scheme achieves the high-SNR logarithmic growth

of the capacity of multiple-input single-output channels, and achieves the best lower bound known to date on the high-SNR logarithmic growth of the capacity

of multiple-input multiple-output channels. We then consider fading multiple-

access channels and propose a joint-transmission scheme. We characterise the

high-SNR performance of the joint-transmission scheme and compare it with

time-division multiple-access schemes. We then show the potential of the joint-

transmission scheme for uplink cellular communications.

The results in this dissertation are relevant to the design of codes under imperfect CSI, channel estimators, feedback signalling and power adaptation.


Acknowledgement

This dissertation is the ultimate destination of my PhD journey over the last 3.5 years. Alhamdulillah, all praise is due to Alloh SWT, the Lord of the universe, for giving me the ability to reach this final destination.

I would like to express my sincere gratitude to my supervisor Dr. Albert Guillen i Fabregas for giving me the opportunity to pursue a PhD at one of the best campuses in the world. Albert has been very helpful and supportive in

many circumstances. He gave me freedom and flexibility to explore my research

topics and encouraged me to work independently. Our working relationship for

the past years has shaped and strengthened my interest in information theory.

He is surely one of my role models of a good teacher and researcher. I hope that

we can continue to work together in the future.

I am deeply thankful to Dr. Tobias Koch. I have benefited a lot from many

discussions with him. Tobi has been very critical of the mathematics in our papers and this has influenced my way of doing research. Tobi has also offered many valuable suggestions for improving my writing and presentation skills. Some parts of this dissertation have been improved following his feedback. I am grateful for all of this and hope that we can still work together in the future.

I would like to extend my gratitude to Dr. Jossy Sayir for his care and concern, and for proofreading some chapters of this dissertation. His feedback

has improved the presentation of this dissertation.

I would like to thank all members of our research group. Thanks to Dr.

Alfonso Martinez for his insightful comments on some parts of my research.

Thanks to Li Peng for being my research buddy over the last 3.5 years. Having started our PhDs at the same time, we have shared many academic and non-academic discussions. I would also like to thank Dr. Adria Tauste, Jing Guo, Dr. Alex Alvarado and Jonathan Scarlett. I have learnt many new things from you guys.

I spent a month doing research at NCTU, Hsinchu, Taiwan. I would like to thank Prof. Stefan Moser for hosting me there and for all the help he provided. I have learnt from Stefan's idealism in doing research and I am grateful for that.


I would also like to thank those who have helped me in Taiwan: Hsuan-Yin,

Hui-Ting, Yu-Hsing, Sameer, Mas Moro and Mas Agus.

This PhD study would not have been possible without the generous support of the Yousef Jameel Scholarship. I would like to express my sincere gratitude to Mr. Jameel and everyone in the Yousef Jameel Foundation who selected me as a scholar in Cambridge.

I would like to thank the staff at Cambridge University, in particular Kathy White, Rachel Fogg and Janet Milne for providing administrative support and Roger Wareham for providing computing support.

Being in a foreign country, there were always concerns about practising my religion. Thanks to my wonderful friends in the Islamic Society: Khairul, Syed, Irufan, Tariq, Sheikh, Ubaid, Mohammed and many more who have always been supportive. I hope that we can keep in touch.

Being far away from my country could have been lonely, but that did not happen. Thanks to Mas Dono and family for having been very supportive since

the first time I arrived in Cambridge. Thanks to my compatriots: Astari, Yuke,

Tracey, Anin, Mbak Ina, Kevin, Antony and many others who have made our

Indonesian community in Cambridge so lively. Thanks to my friend Amika, who always cracked jokes during our online chats whenever I felt bored with my research.

I would like to gratefully acknowledge my “home” friends: Reyhan, Zuhdi, Marta, Bang Dian, Barra, Bowo and Digdaya for all their help during my application to Cambridge.

Finally, I would like to thank my family for their unconditional support and love. I am very grateful to my father Bapak Muh Kodri, my mother Ibu Siti Mariyam and my sister Yulia Dwi for everything they have given me in my life. I am deeply thankful to my lovely wife, Ana, who has accompanied me in every part of this journey. I understand that it has been very hard to have a long-distance relationship between Cambridge and Kuala Lumpur. I am so proud of you, honey, for being able to do this together. I am sorry for the time that I have missed and I promise to make up for it. I am also thankful to my father-in-law Bapak Muhaimin, S.H., M.Kn. and my mother-in-law Ibu Dra. Siti Asiatun for their constant support and motivation to both me and my wife.


Contents

Contents xi

Acronyms xv

List of Figures xvii

List of Tables xix

1 Preliminaries 1

1.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . 1

1.2 Dissertation Outline . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

I Non-Ergodic Block-Fading Channels 9

2 The Block-Fading Channel 11

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.3 The Block-Fading Channel with Perfect CSI . . . . . . . . . . . . 14

2.3.1 Mutual Information, Capacity and Outage Probability . . 14

2.3.2 Nearest Neighbour Decoding and Noise Distribution . . . . 17

2.4 The Block-Fading Channel with Imperfect CSI . . . . . . . . . . . 18

2.4.1 Generalised Gallager’s Bound, GMI and Generalised Outage 19

2.4.2 Section 2.4.1 Proofs . . . . . . . . . . . . . . . . . . . . . . 22

2.4.2.1 Concavity of EQ0 (s, ρ, Hb) . . . . . . . . . . . . . 22

2.4.2.2 Non-Negativity of Igmi(H) . . . . . . . . . . . . . 23

2.4.2.3 GMI Upper Bound . . . . . . . . . . . . . . . . . 25

2.5 Outage Bounds, Diversity and System Design . . . . . . . . . . . 25

3 MIMO Block-Fading Channels with Imperfect CSIR 29

3.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.2 Outage Diversity in Block-Fading Channels . . . . . . . . . . . . . 34

3.3 Generalised Outage Diversity . . . . . . . . . . . . . . . . . . . . 36

3.4 Random Coding Achievability . . . . . . . . . . . . . . . . . . . . 39


3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3.5.1 Insights for System Design . . . . . . . . . . . . . . . . . . 44

3.5.2 The DMT for Gaussian Codebooks . . . . . . . . . . . . . 47

3.5.3 Optical Wireless Scintillation Distributions . . . . . . . . . 48

3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4 IR-ARQ in MIMO Block-Fading Channels with Imperfect Feedback and CSIR 51

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.2.1 Channel Model . . . . . . . . . . . . . . . . . . . . . . . . 53

4.2.2 IR-ARQ Scheme . . . . . . . . . . . . . . . . . . . . . . . 55

4.2.3 Feedback Channel and ARQ . . . . . . . . . . . . . . . . 56

4.3 Performance Metrics with Imperfect CSIR . . . . . . . . . . . . . 57

4.3.1 Error Probability with Perfect Feedback . . . . . . . . . . 57

4.3.1.1 Threshold Decoder Ψ(·) . . . . . . . . . . . . . . 58

4.3.1.2 Decoding Error and Communication Outage . . . 62

4.3.2 Error Probability with Imperfect Feedback . . . . . . . . . 65

4.4 ARQ Outage Diversity . . . . . . . . . . . . . . . . . . . . . . . . 66

4.4.1 Uniform Power Allocation . . . . . . . . . . . . . . . . . . 68

4.4.2 Power-Controlled ARQ . . . . . . . . . . . . . . . . . . . . 69

4.4.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

5 Mismatched CSI Outage SNR-Exponents of MIMO Block-Fading Channels 77

5.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

5.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

5.3 Outage SNR-Exponents . . . . . . . . . . . . . . . . . . . . . . . 83

5.3.1 Full-CSIT Power Allocation . . . . . . . . . . . . . . . . . 83

5.3.2 Causal-CSIT Power Allocation . . . . . . . . . . . . . . . . 84

5.3.3 Predictive-CSIT Power Allocation . . . . . . . . . . . . . . 85

5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

5.4.1 Pilot-Assisted Channel Estimation . . . . . . . . . . . . . 86

5.4.2 Mean-Feedback CSIT Model . . . . . . . . . . . . . . . . . 89

5.4.3 Comments on Achievable Rates . . . . . . . . . . . . . . . 90

5.4.4 Comments on Continuous Input Distributions . . . . . . . 93

5.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95


II Stationary Ergodic Fading Channels 97

6 Stationary Fading Channels 99

6.1 MIMO Gaussian Flat-Fading Channels . . . . . . . . . . . . . . . 99

6.2 Capacity and The Pre-Log . . . . . . . . . . . . . . . . . . . . . . 100

6.2.1 Coherent Channels . . . . . . . . . . . . . . . . . . . . . . 101

6.2.2 Noncoherent Channels . . . . . . . . . . . . . . . . . . . . 103

7 Pilot-Aided Channel Estimation for Stationary Fading Channels 105

7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

7.2 System Model and Transmission Scheme . . . . . . . . . . . . . . 107

7.3 The Pre-Log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

7.4 Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

7.4.1 Proof of Theorem 7.1 . . . . . . . . . . . . . . . . . . . . . 115

7.4.1.1 Linear Interpolator . . . . . . . . . . . . . . . . . 115

7.4.1.2 Achievable Rates and Pre-Logs . . . . . . . . . . 123

7.4.2 A Note on Input Distribution . . . . . . . . . . . . . . . . 132

7.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

8 Pilot-Aided Channel Estimation for Fading Multiple-Access Channels 137

8.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

8.2 Transmission Scheme . . . . . . . . . . . . . . . . . . . . . . . . . 139

8.3 The MAC Pre-Log . . . . . . . . . . . . . . . . . . . . . . . . . . 142

8.4 Joint Transmission Versus TDMA . . . . . . . . . . . . . . . . . . 144

8.4.1 Receiver Employs Less Antennas Than Transmitters . . . 146

8.4.2 Receiver Employs More Antennas Than Transmitters . . . 146

8.4.3 A Case in Between . . . . . . . . . . . . . . . . . . . . . . 147

8.5 Proof of Theorem 8.1 . . . . . . . . . . . . . . . . . . . . . . . . . 149

8.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

9 Summary and Future Research 157

9.1 Main Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 157

9.1.1 Part I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

9.1.2 Part II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

9.2 Areas for Future Research . . . . . . . . . . . . . . . . . . . . . . 160

Appendix A 163

A.1 Proof of Lemma 3.1 (Discrete Inputs) . . . . . . . . . . . . . . . . 163

A.2 Proof of Theorem 3.1 (Discrete Inputs) . . . . . . . . . . . . . . . 164

A.2.1 SISO Case . . . . . . . . . . . . . . . . . . . . . . . . . . 165


A.2.2 MIMO Case . . . . . . . . . . . . . . . . . . . . . . . . . . 180

A.3 Proof of Theorem 3.1 (Gaussian Inputs) . . . . . . . . . . . . . . 184

A.3.1 GMI Lower Bound . . . . . . . . . . . . . . . . . . . . . . 185

A.3.2 GMI Upper Bound . . . . . . . . . . . . . . . . . . . . . . 192

A.4 Proof of Theorem 3.2 . . . . . . . . . . . . . . . . . . . . . . . . . 197

A.5 Proof of Inequality (A.220) . . . . . . . . . . . . . . . . . . . . . . 202

A.6 Proof of Proposition 3.1 . . . . . . . . . . . . . . . . . . . . . . . 205

A.7 Proof of Theorem 3.3 . . . . . . . . . . . . . . . . . . . . . . . . . 212

Appendix B 219

B.1 Proof of Lemma 4.2 . . . . . . . . . . . . . . . . . . . . . . . . . . 219

B.2 Proof of Theorem 4.1 . . . . . . . . . . . . . . . . . . . . . . . . . 220

B.2.1 Converse . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

B.2.2 Achievability . . . . . . . . . . . . . . . . . . . . . . . . . 223

B.3 Proof of Proposition 4.1 . . . . . . . . . . . . . . . . . . . . . . . 224

B.3.0.1 GMI Upper Bound . . . . . . . . . . . . . . . . . 225

B.3.0.2 GMI Lower Bound . . . . . . . . . . . . . . . . . 227

B.4 Proof of Proposition 4.2 . . . . . . . . . . . . . . . . . . . . . . . 228

B.4.1 GMI Upper Bound . . . . . . . . . . . . . . . . . . . . . . 231

B.4.2 GMI Lower Bound . . . . . . . . . . . . . . . . . . . . . . 234

Appendix C 241

C.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241

C.2 Power Allocation and Asymptotic Analysis . . . . . . . . . . . . . 242

C.2.1 Power Allocation . . . . . . . . . . . . . . . . . . . . . . . 242

C.2.2 Asymptotic Analysis . . . . . . . . . . . . . . . . . . . . . 244

C.2.3 GMI Upper and Lower Bounds . . . . . . . . . . . . . . . 245

C.3 Full-CSIT Power Allocation . . . . . . . . . . . . . . . . . . . . . 247

C.3.1 GMI Upper Bound . . . . . . . . . . . . . . . . . . . . . . 248

C.3.2 GMI Lower Bound . . . . . . . . . . . . . . . . . . . . . . 251

C.4 Causal-CSIT Power Allocation . . . . . . . . . . . . . . . . . . . . 253

C.4.1 GMI upper bound . . . . . . . . . . . . . . . . . . . . . . 253

C.4.2 GMI Lower Bound . . . . . . . . . . . . . . . . . . . . . . 256

C.5 Predictive-CSIT Power Allocation . . . . . . . . . . . . . . . . . . 260

C.5.1 GMI Upper Bound . . . . . . . . . . . . . . . . . . . . . . 260

C.5.2 GMI Lower Bound . . . . . . . . . . . . . . . . . . . . . . 264

C.6 LMMSE Channel Estimation . . . . . . . . . . . . . . . . . . . . 265

References 273


Acronyms

Herein we list the main acronyms used throughout the dissertation. The meaning of each acronym is usually also stated the first time it appears in the text.

ARQ Automatic-Repeat Request

AWGN Additive-White Gaussian Noise

BPSK Binary-Phase Shift Keying

CSI Channel State Information

CSIR Channel State Information at the Receiver

CSIT Channel State Information at the Transmitter

GMI Generalised Mutual Information

i.i.d. independent and identically distributed

IR-ARQ Incremental-Redundancy Automatic-Repeat Request

LMMSE Linear Minimum Mean-Squared Error

MAC Multiple-Access Channel

MIMO Multiple-Input Multiple-Output

MISO Multiple-Input Single-Output

MMSE Minimum Mean-Squared Error

ML Maximum-Likelihood

OFDM Orthogonal Frequency Division Multiplexing

pdf probability density function

psd power spectral density

QAM Quadrature-Amplitude Modulation

SISO Single-Input Single-Output

SNR Signal-to-Noise Ratio

TDMA Time-Division Multiple-Access

TDD Time-Division Duplex


List of Figures

2.1 A diagram for a MIMO block-fading channel. . . . . . . . . . . . 13

3.1 A MIMO block-fading channel with imperfect CSIR. . . . . . . . 31

3.2 Random coding SNR-exponent lower bound for discrete signal

codebooks as a function of target rate R (in bits per channel use),

B = 4, nt = 2, nr = 2, τ = 0, M = 4 and de = 0.5. . . . . . . . . . 43

3.3 Generalised outage SNR-exponent for discrete-input block-fading

channel, B = 4, τ = 0 (Rayleigh, Rician and Nakagami-q fading),

nt = 2 and nr = 2. . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.4 Generalised outage probability for Gaussian-input MIMO Rayleigh

block-fading channel with B = 2, R = 2, nt = 2 and nr = 1. . . . 46

3.5 Generalised outage probability for BPSK-input MIMO Rayleigh

block-fading channel with B = 2, R = 1, nt = 2 and nr = 1. . . . 47

4.1 System model for IR-ARQ transmission with binary feedback. . . 53

4.2 Density of the accumulated GMI at round ℓ = 2 for Gaussian-input

transmission over a SISO Rayleigh fading channel with B = 1,

Pℓ = 16 (unit power), ℓ = 1, 2. . . . . . . . . . . . . . . . . . . . 64

4.3 Simulation results of ARQ outage probability for Gaussian-input

transmission over a MIMO Rayleigh block-fading channel with

parameters: B = 2, L = 2, nt = 2, nr = 1, R = 2 bits per

channel use and the BSC feedback with parameter p0 = 0.5. . . . 72

4.4 ARQ outage diversity for 4-QAM inputs in a MIMO Rayleigh

block-fading channel with parameters: B = 2, L = 3, nt = 2

and nr = 1. UP (OP) indicates results with uniform (optimal)

power allocation. The data rate R is in bits per channel use. The

CSIR-error diversity de is such that (4.71) is satisfied. . . . . . . 73

5.1 System model for MIMO block-fading channels with imperfect CSI

at both terminals. . . . . . . . . . . . . . . . . . . . . . . . . . . . 78


5.2 Interplay among the CSIT- and CSIR-error diversities and the

outage SNR-exponent with full-CSIT power allocation. . . . . . . 83

5.3 Comparison of the densities of the GMI and the lower bound (5.51) with fading realisation H = 1, transmission power P = 1 (unit power) and CSIR-error variance σ²ₑ = 0.1. . . . . . . . . . . 92

6.1 A diagram for communication over a stationary MIMO fading

channel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

7.1 Structure of pilot and data transmission for nt = 2, L = 7 and

T = 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

8.1 The two-user MAC system model. . . . . . . . . . . . . . . . . . . 138

8.2 Structure of joint-transmission scheme, nt,1 = 2, nt,2 = 1, L = 7

and T = 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

8.3 Structure of TDMA scheme, nt,1 = 2, nt,2 = 1, L = 4 and T = 2. . 142

8.4 Pre-log regions for a fading MAC with nr = 2 and nt,1 = nt,2 = 1

for different values of L∗. Depicted are the pre-log region for the

joint-transmission scheme as given in Theorem 8.1 (dashed line),

the pre-log region of the TDMA scheme as given in Remark 8.2

(solid line), and the pre-log region of the coherent TDMA scheme

(8.21) (dotted line). . . . . . . . . . . . . . . . . . . . . . . . . . . 147


List of Tables

3.1 Pdf for Different Fading Distribution . . . . . . . . . . . . . . . . 32

8.1 Typical values of L∗ for various environments with fc ranging from

800 MHz to 5 GHz. The values of στ are taken from [1] for indoor

and urban environments and from [2] for hilly area environments. 148

C.1 Definition of magnitude-squared and phase variables. . . . . . . 241

C.2 Definition of normalised magnitude-squared variables. . . . . . . 242


Chapter 1

Preliminaries

1.1 Background and Motivation

Wireless communications are characterised by multipath propagation of signals over a free-space medium. The presence of scattering objects in the surrounding environment causes the transmitted signals to arrive at the receiver with different delays, which results in random attenuation of the original transmitted signals.

This phenomenon is referred to as fading [3], one of the challenges for reliable

transmission over wireless channels. The severity of fading depends on various

factors such as the geography and the topography of the scattering environment,

the mobile velocity, the carrier frequency and the transmitted signal bandwidth.

Reliable communication over fading channels has traditionally been studied

in information theory under the assumption of perfect knowledge of the fading.

Such perfect knowledge of the fading is commonly referred to as perfect channel

state information (CSI); the corresponding channel where perfect CSI is available

is commonly referred to as the coherent channel. This assumption has facilitated

many developments in modern communication technologies including the design

of good practical coding schemes, the design of adaptive transmission and many

more. Among the well-known results is the discovery of nearest neighbour de-

coding as a reliable decoding scheme. The nearest neighbour decoder is a simple

decoder that selects the codeword that is closest (in a Euclidean-distance sense)

to the channel output. For coherent channels with additive Gaussian noise, this

decoder is the maximum-likelihood decoder and is therefore optimal in the sense

that it minimises the error probability (see [4] and references therein). Due to

this optimality and simplicity, the nearest neighbour decoder has become a performance benchmark for practical decoders and has inspired the development of

many decoders such as the ones that are based on the widely-used Viterbi al-


gorithm [5] (see for example [6–8]) and the ones that are based on generalised

minimum-distance decoding [9] (see for example [10–12]).

Note that the optimality of nearest neighbour decoding can be guaranteed if

the decoder has noiseless access to every realisation of the fading. Such noiseless

access is typically substantiated by assuming that the fading variation is suffi-

ciently slow such that accurate fading estimates can be obtained by transmitting

known training symbols before data transmission. In practical scenarios, this

assumption is too optimistic and may lead to an unexpected behaviour of the

actual system. Due to hardware limitation and time-varying characteristics of

the channel, most channel estimators are not able to guarantee perfect fading

estimation and always incur channel estimation errors. The erroneous or noisy

fading estimates make nearest neighbour decoding inherently suboptimal. We re-

fer to such erroneous or noisy fading estimates as imperfect or mismatched CSI.

The noncoherent channel refers to a channel where perfect CSI is not available,

but the transmitter and receiver may estimate the fading using some channel

estimators.1.1

In this dissertation, we study nearest neighbour decoding that operates based

on erroneous fading estimates as a result of imperfect fading estimation. In par-

ticular, for a given level of accuracy of the fading estimation, we investigate the

reliability of the decoder using the framework of mismatched decoding [13]. We

quantify the performance loss incurred by using noisy fading estimates and iden-

tify the system parameters that contribute to the loss. Note that the reliability

measure of communication systems largely depends on the nature of the signal,

the targeted application and the fading dynamics. In particular, for applications

where large delays are tolerable, the transmission of a codeword spans over a

large number of fading realisations. In this case, the channel is considered er-

godic and long interleaved codes of rate not exceeding the channel capacity can

be used [14, 15]. For such a setup, the reliability measure is concerned with the

design of codes achieving the largest rate with a vanishing error probability, i.e.,

the largest reliable information rate corresponding to the channel capacity. On

the other hand, for applications with stringent delay constraints, long interleavers

cannot be assumed, and the transmission of a codeword only extends over a finite

number of fading realisations. In this case, the channel is considered non-ergodic.

The block-fading channel [14–16] is a useful channel model for such non-ergodic

channels undergoing a slowly-varying fading process. The important feature of

1.1Throughout the dissertation, the term noisy/erroneous fading estimates is used interchangeably with the term imperfect/mismatched CSI; the term noiseless fading estimates is used interchangeably with the term perfect/matched CSI.


this model is that the channel remains constant within a block (which consists of several coded symbols) and varies from block to block according to a certain probability distribution. The block-fading channel is not information stable [17] and therefore has zero capacity. This means that for a given positive data rate, the error probability cannot be made arbitrarily small. For such a setup,

the reliability measure is concerned with code design that minimises the error

probability.

1.2 Dissertation Outline

The dissertation is structured into two main parts as follows.

Part I

Part I studies nearest neighbour decoding for non-ergodic block-fading channels

and includes the following four chapters.

Chapter 2: The Block-Fading Channel

This chapter serves as an introductory point of Part I. We first describe the

multiple-input multiple-output (MIMO) block-fading channel. We then review

some information-theoretic material for the block-fading channel with perfect

CSI, which includes the notions of mutual information, capacity and outage prob-

ability. We continue by introducing our approach to studying the block-fading

channel with imperfect CSI from a mismatched-decoding perspective. In this sec-

tion, we revisit the generalised mutual information (GMI) as an achievable rate

with a fixed decoding rule, investigate some important properties of the GMI,

introduce the generalised outage probability—the probability that the GMI is

less than the data rate—as a reliability measure, show the generalised outage

probability as the fundamental limit for independent and identically distributed

(i.i.d) codebooks, and show the relationship between the outage probability and

the generalised outage probability. At the end of the chapter, we provide some

perspectives on the system design based on the outage probability and the gen-

eralised outage probability.

Chapter 3: MIMO Block-Fading Channels with Imperfect CSIR

This chapter studies nearest neighbour decoding in block-fading channels with

imperfect CSI at the receiver (CSIR) and no CSI at the transmitter (CSIT). We

use the mismatched-decoding approach introduced in Chapter 2 to study the


reliability of the nearest neighbour decoder. We analyse the generalised outage

probability of nearest neighbour decoding in the high signal-to-noise ratio (SNR)

regime for random codes with Gaussian and discrete signal constellations. In

particular, we characterise the diversity or the SNR-exponent, which is defined

as the high-SNR slope of the error probability curve on a logarithmic-logarithmic

scale. This characterisation, which comprises both converse and achievability of

random coding, provides meaningful and realistic channel estimation and code

design criteria when perfect fading estimation is not possible.

Chapter 4: IR-ARQ in MIMO Block-Fading Channels with Imperfect

Feedback and CSIR

Incremental-redundancy automatic-repeat request (IR-ARQ) can be used to im-

prove the reliability of the system with the help of feedback from the receiver and

retransmission of incorrectly decoded messages. This chapter studies the error

and outage performance of IR-ARQ in block-fading channels. In particular, we

focus on the high-SNR system performance. We derive the ARQ outage diversity

accounting for imperfect feedback and CSIR. We also demonstrate how power

control can be used to further improve the performance of IR-ARQ.

Chapter 5: Mismatched CSI Outage SNR-Exponents of MIMO Block-

Fading Channels

Another way to improve the reliability of the system is by allowing the transmitter

to have access to the CSI. This chapter studies the outage performance of the

MIMO block-fading channel where neither the transmitter nor the receiver knows the actual CSI, but both have access to a noisy version. In particular, we study the

interplay between the variances of the CSI noise at the transmitter and receiver

in the generalised outage diversity (outage SNR-exponent). We demonstrate

that obtaining a reliable channel estimate at the receiver is more important than

obtaining a reliable channel estimate at the transmitter in terms of outage SNR-

exponent. We provide some connections between mismatched CSI outage SNR-

exponents and the design of a good channel estimation scheme.

Part II

Part II studies nearest neighbour decoding for stationary ergodic fading channels

and includes the following three chapters.


Chapter 6: Stationary Fading Channels

This chapter serves as an opening to the second part of the dissertation. We

revisit existing results on the interplay between channel capacity and CSI in

stationary MIMO Gaussian flat-fading channels. We distinguish the notions of

coherent fading channels and noncoherent fading channels. We then review the

capacity behaviour of coherent and noncoherent fading channels. In particular,

we focus on the high-SNR regime and address the capacity pre-log, defined as

the limiting ratio of the capacity to the logarithm of the SNR as the SNR tends

to infinity.

Chapter 7: Pilot-Aided Channel Estimation for Stationary Fading

Channels

We study the reliable information rates of noncoherent, stationary, Gaussian,

MIMO flat-fading channels that are achievable with nearest neighbour decoding

and imperfect fading estimation. To obtain accurate fading estimates from a

time-varying stationary fading process, we introduce pilots (also known as train-

ing sequences), which are emitted at a regular interval by the transmitter. We

analyse the behaviour of the achievable information rates in the limit as the SNR

tends to infinity. We demonstrate that nearest neighbour decoding with pilot-

aided channel estimation achieves the capacity pre-log of noncoherent multiple-

input single-output (MISO) flat-fading channels, and it achieves the best lower bound known to date on the capacity pre-log of noncoherent MIMO flat-fading

channels.

Chapter 8: Pilot-Aided Channel Estimation for Fading Multiple-Access

Channels

This chapter extends the use of nearest neighbour decoding with pilot-aided

channel estimation presented in Chapter 7 to the fading multiple-access chan-

nel (MAC). We first introduce a two-user MIMO fading MAC model and a

joint-transmission scheme that jointly transmits codewords from both users and

separately estimates the channel from both users at the receiver. The reliable

information rate region that is achievable with nearest neighbour decoding and

pilot-aided channel estimation is analysed and the corresponding pre-log region,

defined as the limiting ratio of the rate region to the logarithm of the SNR as the

SNR tends to infinity, is determined. We compare the joint-transmission scheme

with time-division multiple-access (TDMA) and derive sufficient conditions when

the joint-transmission scheme is better than TDMA and when TDMA is better


than the joint-transmission scheme.

Chapter 9: Summary and Future Research

This chapter provides concluding remarks and identifies possible areas for future

research.

Note on Published and Submitted Works1.2

The materials in Part I have appeared in the following papers:

• A. T. Asyhari and A. Guillen i Fabregas, “Nearest neighbor decoding in

MIMO block-fading channels with imperfect CSIR”, IEEE Transactions on

Information Theory, vol. 58, no.3, pp. 1483–1517, March 2012.

• A. T. Asyhari and A. Guillen i Fabregas, “Mismatched CSI outage expo-

nents of MIMO block-fading channels”, to be submitted to IEEE Transac-

tions on Information Theory.

• A. T. Asyhari and A. Guillen i Fabregas,“MIMO ARQ block-fading chan-

nels with imperfect feedback and CSIR”, submitted to IEEE Transactions

on Wireless Communications, June 2010.

• A. T. Asyhari and A. Guillen i Fabregas, “Mismatched CSI outage expo-

nents of block-fading channels”, in Proceedings of the IEEE International

Symposium on Information Theory, Saint Petersburg, Russia, July–August

2011.

• A. T. Asyhari and A. Guillen i Fabregas,“MIMO block-fading channels

with mismatched CSIR”, in Proceedings of the International Symposium

on Information Theory and its Applications, Taichung, Taiwan, October

2010.

• A. T. Asyhari and A. Guillen i Fabregas, “Coding for the MIMO ARQ

block-fading channel with imperfect feedback and CSIR”, in Proceedings of

the IEEE Information Theory Workshop, Dublin, Ireland, August–Septem-

ber 2010.

• A. T. Asyhari and A. Guillen i Fabregas, “Nearest neighbour decoding in

block-fading channels with imperfect CSIR”, in Proceedings of the IEEE

Information Theory Workshop, Taormina, Italy, October 2009.

1.2For on-going (to be submitted) works, the titles of the articles have not been finalised and are subject to change.


The materials in Part II have appeared in the following papers:

• A. T. Asyhari, T. Koch and A. Guillen i Fabregas, “Nearest neighbour

decoding with pilot-aided channel estimation achieves the pre-log”, to be

submitted to IEEE Transactions on Information Theory.

• A. T. Asyhari, T. Koch and A. Guillen i Fabregas, “Nearest neighbour

decoding with pilot-assisted channel estimation for fading multiple-access

channels”, in Proceedings of the 49th Annual Allerton Conference on Con-

trol, Communication and Computing, Monticello, IL, September 2011.

• A. T. Asyhari, T. Koch and A. Guillen i Fabregas, “Nearest neighbour

decoding and pilot-aided channel estimation in stationary Gaussian flat-

fading channels”, in Proceedings of the IEEE International Symposium on

Information Theory, Saint Petersburg, Russia, July–August 2011.

1.3 Notation

In this section, we introduce notations that are generally used throughout the

dissertation. Notations which are specific to each chapter are introduced in the

chapter.

Sets or events are generally denoted by calligraphic fonts, e.g., $\mathcal{X}$, and the superscript c is used to denote their complement, e.g., $\mathcal{X}^{c}$. Exceptions to this set notation include the use of $\mathbb{R}$ for the set of real numbers, $\mathbb{C}$ for the set of complex numbers, $\mathbb{Z}$ for the set of integers, $\mathbb{Z}^{+}$ for the set of positive integers and $\mathbb{Z}^{+}_{0}$ for the set of non-negative integers.

Unless otherwise stated, we denote random scalars by uppercase letters, e.g.,

X , and their realisations by lowercase letters, e.g., x. Random vectors are denoted

by boldfaced uppercase letters, e.g., X, and their realisations are denoted by

boldfaced lowercase letters, e.g., x. Boldfaced letters 0 and 1 denote vectors with

all entries 0 and 1, respectively. To denote random matrices, we use uppercase

letters of a blackboard bold font, e.g., X. Deterministic matrices are denoted by

uppercase sans serif letters, e.g., X. The matrix In represents an identity matrix

of size n× n. Sans serif letters 0 and 1 denote matrices with all entries 0 and 1,

respectively.

We denote $\mathbb{E}[\cdot]$ for the expectation with respect to the random variables in its arguments.

For any random scalar X over an alphabet X, PX(·) denotes the probability

mass function [18] if X is discrete and the probability density function [19] if X


is continuous. Similar notations are also used for random vectors and matrices.

The univariate complex-Gaussian distribution with mean $\mu$ and variance $\sigma^{2}$ is denoted as $\mathcal{N}(\mu, \sigma^{2})$. The $n$-variate complex-Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance matrix $\Sigma$ is denoted as $\mathcal{N}_{n}(\boldsymbol{\mu}, \Sigma)$.

The operator | · | represents two things. It may mean the absolute value of a

scalar, or the cardinality of a set. The notation ‖ · ‖ denotes the Euclidean norm

of a vector

\[ \|x\| = \sqrt{\sum_{k} |x(k)|^{2}} \tag{1.1} \]

where x(k) is the k-th element of x.

We use det(·) and tr(·) to denote the determinant and the trace of a matrix.

Other matrix operators include (·)T and (·)†, which denote the transpose and the

conjugate (Hermitian) transpose of a matrix. The notation ‖·‖F is the Frobenius

norm of a matrix

\[ \|\mathsf{X}\|_{F} = \sqrt{\mathrm{tr}\!\left(\mathsf{X}^{\dagger}\mathsf{X}\right)}. \tag{1.2} \]

The indicator function of an event E is given by 1E. It is 1 if the event E

occurs and 0 otherwise.

The floor (ceiling) function ⌊x⌋ (⌈x⌉) denotes the largest (smallest) integer

smaller (greater) than or equal to x, while [x]+ = max(0, x).

We shall denote ı as the square root of negative one, i.e., ı =√−1.

Following the definition in [20], the exponential equality $f(x) \doteq x^{d}$ indicates that $\lim_{x\to\infty} \frac{\log f(x)}{\log x} = d$. The exponential inequalities $\dot{\leq}$ and $\dot{\geq}$ are similarly defined.
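As an illustrative use of this notation (the specific exponent here is only an example), $P_{\mathrm{out}}(\mathsf{SNR}) \doteq \mathsf{SNR}^{-d}$ states that the outage probability decays with high-SNR slope $d$ on a log-log scale; this slope is the SNR-exponent, or diversity, studied in Part I.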

The symbols $\succeq$, $\preceq$, $\succ$ and $\prec$ describe component-wise inequalities $\geq$, $\leq$, $>$ and $<$.

We use log(·) and logp(·) to denote the natural and the base-p logarithm

functions.

Limit, limit superior and limit inferior are denoted by lim, lim sup and lim inf,

respectively.


Part I

Non-Ergodic Block-Fading

Channels


Chapter 2

The Block-Fading Channel

2.1 Introduction

The block-fading channel is a relevant model to study the transmission of delay-

limited applications over a slowly-varying fading channel. Within a block-fading

period, the channel gain remains constant, varying from block to block accord-

ing to some underlying distribution. Assuming a slowly-varying fading condi-

tion, modern communication systems utilising frequency hopping schemes such

as Global System for Mobile Communications (GSM) and multi-carrier modula-

tion such as Orthogonal Frequency Division Multiplexing (OFDM) are practical

examples that are reasonably modelled by the block-fading channel.

Reliable communication over block-fading channels has traditionally been

studied under the assumption of perfect channel state information at the re-

ceiver (CSIR) [14], [16, 21–24]. Since only a finite number of fading realisations

are spanned by a single codeword, the block-fading channel is not information stable [17]. The input-output mutual information between the transmitted and the received codewords varies in a random manner and the channel is considered to be non-ergodic. For most fading distributions, the Shannon capacity

is strictly zero [23]. Based on error exponent considerations, Malkamaki and

Leib [22] showed that the outage probability, i.e., the probability that the mu-

tual information is smaller than the target transmission rate [14,16], is the natural

fundamental limit of the channel. This means that in the limit as the codeword

length tends to infinity, communication with arbitrarily few errors is not possible

as the smallest error probability cannot be smaller than the outage probability.

In order to reduce the outage probability, the transmitter can adapt its trans-

mission scheme based on the channel condition. This requires the availability of

the channel state information at the transmitter (CSIT). Assuming perfect CSIT,


some adaptive transmission schemes, which reduce the outage probability by

significant margins, have been proposed in the literature (see, e.g., [25–32]

and references therein). For example, based on feedback from the receiver, a

scheme that uses automatic-repeat request (ARQ) protocols improves the out-

age performance by adapting the transmission rate [25, 26, 29, 32]. When the

channel condition is good, the message can be transmitted at a high data rate;

when the channel condition degrades, the message can be transmitted at lower

rates. Another example is power adaptation based on the knowledge of the fad-

ing [25,27,28,30,31]. The idea of power adaptation is that in a very bad channel

realisation, power can be saved and used when channel conditions improve.

Channel state information (CSI) therefore plays a critical role in the block-

fading channel. As perfect CSI is difficult to guarantee in practice, designing

a communication system based on perfect CSI may lead to an unexpected be-

haviour of the actual system. In this chapter, we provide an overview of the

interplay of the CSI and the reliability of data transmission in the block-fading

channel. We first introduce the system model in Section 2.2. We then review

some existing results on the block-fading channel with perfect CSI in Section

2.3. We continue by proposing a study of the block-fading channel with imper-

fect CSI using the framework of mismatched decoding in Section 2.4. Using the

results in Sections 2.3 and 2.4, we discuss some guidelines for system design for

the block-fading channel in Section 2.5. We also give some perspectives on the

remaining chapters of Part I at the end of Section 2.5.

2.2 System Model

We consider a multiple-input multiple-output (MIMO) block-fading channel with

nt transmit antennas and nr receive antennas. The channel output at block b is

an nr × J-dimensional random matrix

\[ \mathbb{Y}_{b} = \sqrt{\frac{P_{b}}{n_{t}}}\, \mathbb{H}_{b} \mathbb{X}_{b} + \mathbb{Z}_{b}, \qquad b = 1, \dots, B \tag{2.1} \]

where

• B and J denote the number of fading blocks and the channel block length

(number of channel uses in one block), respectively;

• $\mathbb{X}_{b} \in \mathbb{C}^{n_{t}\times J}$ denotes the channel input matrix at block $b$;

• H b denotes the nr × nt-dimensional random fading matrix at block b;


• Zb denotes the nr × J-dimensional additive noise matrix at block b;

• and Pb denotes the transmission power allocated at block b.

Figure 2.1: A diagram for a MIMO block-fading channel. [Block diagram: message $m$ → Encoder → $\sqrt{P_{b}/n_{t}}\,\mathbb{X}_{b}$ → fading $\mathbb{H}_{b}$ → additive noise $\mathbb{Z}_{b}$ → $\mathbb{Y}_{b}$ → Decoder (with CSIR and CSIT) → $\hat{m}$.]

We assume that the nr × J entries of Zb, b = 1, . . . , B are independent and

identically distributed (i.i.d.) Gaussian random scalars. The random matrices

$\mathbb{H}_{b}$, $b = 1, \dots, B$ (whose values belong to $\mathbb{C}^{n_{r}\times n_{t}}$) are drawn i.i.d. according to a certain probability distribution. We assume that the average fading gain is normalised, i.e., $\mathbb{E}[\|\mathbb{H}_{b}\|_{F}^{2}] = n_{t} n_{r}$. We finally assume that for all $b = 1, \dots, B$,

the fading H b and the noise Zb are independent and that their joint law does not

depend on Xb.
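For concreteness, the following is a minimal numerical sketch of one block of (2.1), assuming i.i.d. Rayleigh (CN(0,1)) fading, unit-energy QPSK inputs and unit-variance circularly-symmetric complex Gaussian noise; all parameter values are illustrative.

    import numpy as np

    # One block of the MIMO block-fading channel (2.1):
    #   Y_b = sqrt(P_b / n_t) * H_b * X_b + Z_b
    rng = np.random.default_rng(0)
    nt, nr, J, Pb = 2, 2, 8, 4.0  # illustrative antenna numbers, block length and power

    # Fading matrix H_b, constant over the J channel uses of block b
    Hb = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)

    # Unit-energy QPSK inputs (one possible normalised signal constellation)
    qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
    Xb = rng.choice(qpsk, size=(nt, J))

    # Additive noise Z_b with i.i.d. CN(0,1) entries
    Zb = (rng.standard_normal((nr, J)) + 1j * rng.standard_normal((nr, J))) / np.sqrt(2)

    Yb = np.sqrt(Pb / nt) * Hb @ Xb + Zb  # channel output at block b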

A codeword X of length N = BJ is defined by a concatenation of B channel

input matrices, i.e., $\mathbb{X} \triangleq [\mathbb{X}_{1}, \dots, \mathbb{X}_{B}] \in \mathcal{X}^{n_{t}\times BJ}$, where $\mathcal{X} \subseteq \mathbb{C}$ denotes the signal constellation. We assume that the signal constellation is normalised in energy, i.e., $\mathbb{E}[|X|^{2}] = 1$. The collection of all possible transmitted codewords makes up

a codebook, which is a part of the encoder. We shall consider codebooks whose

entries are generated i.i.d. from a probability distribution PX(x) over Xnt . A

message $m$, $m \in \mathcal{M}$—where $\mathcal{M}$ is the set of all possible messages—is mapped into a codeword $\mathbb{X}(m)$ at rate

\[ R = \frac{1}{BJ} \log_{2} |\mathcal{M}|. \tag{2.2} \]

We shall assume throughout that the messages are equiprobable. Upon receiving

the corrupted codeword, the decoder outputs the message estimate $\hat{m}$ using a pre-defined decoding rule. We say that a rate $R$ is achievable if there exists a combination of encoder and decoder such that the error probability $\Pr\{\hat{m} \neq m\}$ vanishes as

the block length J tends to infinity.


Power allocation Pb, b = 1, . . . , B can be static or dynamic. Static allocation

refers to power allocation that does not change over time whereas dynamic allo-

cation refers to power allocation that possibly changes over time. An example of

static allocation is uniform power allocation that allocates power equally across

fading blocks independent of the fading realisations. Dynamic allocation is typ-

ically related to power adaptation based on the CSIT. This can be illustrated

as follows. Let fb(H) be the available CSIT at block b, which is a function of

the actual fading H. The power allocation algorithm—which minimises or max-

imises a certain objective—allocates Pb such that Pb = Pb(fb(H)). A constraint

is normally imposed on the transmission power, e.g.,

\[ \mathbb{E}\left[ \frac{1}{B} \sum_{b=1}^{B} P_{b}\left(f_{b}(\mathbb{H})\right) \right] \leq \mathsf{SNR}. \tag{2.3} \]

Here SNR is the average-power constraint. Dynamic power allocation is a possible

technique for adaptive transmission.
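For example, uniform power allocation, $P_{b} = \mathsf{SNR}$ for all $b = 1, \dots, B$, is a static scheme that satisfies (2.3) with equality.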

2.3 The Block-Fading Channel with Perfect CSI

In this section, we revisit some information-theoretic measures such as mutual

information, capacity and outage probability, which are fundamental for block-

fading channels with perfect CSI. We then discuss the interplay of nearest neigh-

bour decoding and noise distribution in block-fading channels.

To simplify the presentation, we shall assume that the power allocated at

block b, Pb, is static over time and known to the receiver. Hence, the channel

transition probability can be characterised by the conditional probability density

PY|X ,H (Y|X,H).

2.3.1 Mutual Information, Capacity and Outage Proba-

bility

It has been shown in [22,33] that the average error probability for the ensemble of

random codes of rate R and a fixed channel realisation H = H, input distribution

PX(x) over Xnt and the maximum-likelihood decoder with perfect CSIR is given

by

\[ P_{e,\mathrm{ave}}(\mathsf{H}) \leq 2^{-N E_{r}(R,\mathsf{H})} \tag{2.4} \]


where

\[ E_{r}(R,\mathsf{H}) = \sup_{0 \leq \rho \leq 1} \left\{ \frac{1}{B} \sum_{b=1}^{B} E_{0}(\rho, \mathsf{H}_{b}) - \rho R \right\} \tag{2.5} \]

is the error exponent for channel realisation H and

\[ E_{0}(\rho, \mathsf{H}_{b}) = -\log_{2} \mathbb{E}\left[ \left( \mathbb{E}\left[ \left( \frac{P_{Y|X,H}(Y|X',\mathbb{H}_{b})}{P_{Y|X,H}(Y|X,\mathbb{H}_{b})} \right)^{\frac{1}{1+\rho}} \,\middle|\, X, Y, \mathbb{H}_{b} \right] \right)^{\rho} \,\middle|\, \mathbb{H}_{b} = \mathsf{H}_{b} \right] \tag{2.6} \]

is the Gallager function for a given fading realisation Hb [33]. Here ρ is used as

the optimising variable in (2.5). Note that the inner expectation is taken over

X ′, while the outer expectation is taken over X,Y for a fixed fading realisation

Hb. Here PY |X,H (y|x,Hb) characterises the channel transition probability for one

channel use given that Hb is known.

(2.4) corresponds to the average over all i.i.d. codebooks. The ensemble-average

in (2.4) implies that there exists a code in the ensemble whose average error

probability is bounded as $P_{e,\mathrm{ave}}(\mathsf{H}) \leq 2^{-N E_{r}(R,\mathsf{H})}$ [33]. Then, the average error

probability for that code averaged over all fading states is

\[ P_{e,\mathrm{ave}} \leq \mathbb{E}\left[ 2^{-N E_{r}(R,\mathbb{H})} \right]. \tag{2.7} \]

Basic error exponent results show that Er(R,H) is positive only when

R ≤ I(H) − ε, where ε > 0 is a small number and I(H) is the input-output

mutual information

\[ I(\mathsf{H}) = \frac{1}{B} \sum_{b=1}^{B} I_{\mathrm{awgn}}\!\left( \sqrt{\tfrac{P_{b}}{n_{t}}}\, \mathsf{H}_{b} \right), \tag{2.8} \]

\[ I_{\mathrm{awgn}}\!\left( \sqrt{\tfrac{P_{b}}{n_{t}}}\, \mathsf{H}_{b} \right) = \mathbb{E}\left[ \log_{2} \frac{P_{Y|X,H}(Y|X,\mathbb{H}_{b})}{\mathbb{E}\left[ P_{Y|X,H}(Y|X',\mathbb{H}_{b}) \,\middle|\, Y, \mathbb{H}_{b} \right]} \,\middle|\, \mathbb{H}_{b} = \mathsf{H}_{b} \right]. \tag{2.9} \]

Otherwise, Er(R,H) is zero.

Using Arimoto’s converse [34] it is possible to show that the mutual informa-

tion with optimal input distribution over Xnt, I⋆(H), is the largest rate that can

be reliably transmitted.2.1 There is no rate larger than I⋆(H) having a vanishing

error probability. This converse is strong in the sense that for rates larger than

I⋆(H), the error probability tends to one for sufficiently large block length. This

2.1Herein optimal input distribution refers to an input distribution over Xnt that maximises I(H). The maximised mutual information I⋆(H) is commonly referred to as the capacity for a given alphabet X and channel H.


converse is also the “dual” to the Gallager theorem for rates above $I^{\star}(H)$, i.e., for a fixed channel realisation $H$ the average error probability of any coding scheme constructed from the alphabet $\mathcal{X}$ is lower-bounded by [34]

$$P_{e,\mathrm{ave}}(H) \geq 1 - 2^{-N E_r'(R,H)} \qquad (2.10)$$

where

$$E_r'(R,H) = \sup_{-1\le\rho\le 0}\;\inf_{P_X(x)}\left\{\frac{1}{B}\sum_{b=1}^{B}E_0(\rho,H_b) - \rho R\right\} \qquad (2.11)$$

is of the same form as (2.5) except for the range of $\rho$ in the supremum and for the infimum over the probability distribution $P_X(x)$ on $\mathcal{X}^{n_t}$. With the channel realisation being random, averaging over all fading coefficients, the average error probability becomes

$$P_{e,\mathrm{ave}} \geq 1 - \mathbb{E}\!\left[2^{-N E_r'(R,\boldsymbol{H})}\right]. \qquad (2.12)$$

It has been shown in [34] that $E_r'(R,H)$ is positive whenever $R > I^{\star}(H)$ and zero otherwise. Let $P_X^{\star}(x)$ be the input distribution over $\mathcal{X}^{n_t}$ that achieves $I^{\star}(H)$. Suppose that $P_X^{\star}(x)$ is used to evaluate the Gallager error exponent (2.5). Then, it follows from [22, 33, 34] that $I(H)$ is equal to $I^{\star}(H)$, and for sufficiently large $N$, (2.7) and (2.12) converge and we obtain that [22]

$$P_{e,\mathrm{ave}} \cong \inf_{\epsilon>0}\Pr\bigl\{I(\boldsymbol{H}) - \epsilon < R\bigr\} \qquad (2.13)$$
$$= \Pr\bigl\{I^{\star}(\boldsymbol{H}) < R\bigr\} \triangleq P_{\mathrm{out}}(R) \qquad (2.14)$$

which is the information outage probability [16]. The above results show the convergence of the random-coding achievability and the converse to the outage probability as the block length increases to infinity. These results also imply that the outage probability is the natural fundamental limit for block-fading channels.

One has to note that the convergence in (2.14) holds when the capacity-achieving distribution $P_X^{\star}(x)$ is used to construct codebooks. For a fixed input distribution $P_X(x)$, the probability in (2.13) only characterises a random-coding achievability bound on the average error probability; it does not imply anything about the converse bound for any given code. For continuous alphabets, it is well known that Gaussian inputs achieve the capacity of additive white Gaussian noise (AWGN) channels. On the other hand, for a discrete alphabet $\mathcal{X}$, the capacity-achieving distribution for AWGN channels depends on the operating SNR. At high SNR, the equiprobable distribution over $\mathcal{X}^{n_t}$ is optimal.


2.3.2 Nearest Neighbour Decoding and Noise Distribution

Gallager's bound (2.4) and Arimoto's bound (2.10) are derived based on the channel transition density conditioned on the fading being known perfectly, i.e., $P_{\mathbf{Y}|\mathbf{X},\mathbf{H}}(\mathbf{Y}|\mathbf{X},\mathbf{H})$. Given the channel output $\mathbf{Y} = [Y_1,\ldots,Y_B]$, the fading $\mathbf{H} = [H_1,\ldots,H_B]$ and the codebook, the message output $\hat{m}$ is obtained by maximising the likelihood metric

$$\hat{m} = \arg\max_{m\in\mathcal{M}} P_{\mathbf{Y}|\mathbf{X},\mathbf{H}}\bigl(\mathbf{Y}\,\big|\,\mathbf{X}(m),\mathbf{H}\bigr). \qquad (2.15)$$

If the messages $m = 1,\ldots,|\mathcal{M}|$ are equiprobable, then this decision rule is optimal in the sense that it minimises the error probability.

The likelihood metric $P_{\mathbf{Y}|\mathbf{X},\mathbf{H}}(\mathbf{Y}|\mathbf{X}(m),\mathbf{H})$ depends on the distribution of the additive noise $\mathbf{Z} = [Z_1,\ldots,Z_B]$. Note that using the assumptions in Section 2.2, we can express the likelihood metric as

$$P_{\mathbf{Y}|\mathbf{X},\mathbf{H}}\bigl(\mathbf{Y}\,\big|\,\mathbf{X}(m),\mathbf{H}\bigr) = \prod_{b=1}^{B}\prod_{\nu=1}^{J} P_{Y|X,H}\bigl(y_{b,\nu}\,\big|\,x_{b,\nu}(m),H_b\bigr). \qquad (2.16)$$

For AWGN, we have that

$$P_{Y|X,H}\bigl(y_{b,\nu}\,\big|\,x_{b,\nu}(m),H_b\bigr) = \frac{1}{\pi^{n_r}}\,e^{-\left\|y_{b,\nu} - \sqrt{\frac{P_b}{n_t}}\,H_b x_{b,\nu}(m)\right\|^2}, \qquad (2.17)$$

which, by taking $-\log P_{\mathbf{Y}|\mathbf{X},\mathbf{H}}(\mathbf{Y}|\mathbf{X}(m),\mathbf{H})$, yields a distance metric and turns (2.15) into nearest neighbour decoding

$$\hat{m} = \arg\min_{m\in\mathcal{M}} \sum_{b=1}^{B}\sum_{\nu=1}^{J}\left\|y_{b,\nu} - \sqrt{\frac{P_b}{n_t}}\,H_b x_{b,\nu}(m)\right\|^2. \qquad (2.18)$$

It follows that for AWGN, nearest neighbour decoding is optimal if $\mathbf{H}$ is known perfectly.
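To make the decision rule concrete, the following minimal sketch (not taken from the dissertation; the array shapes, toy codebook layout and all variable names are illustrative assumptions) evaluates the accumulated squared distance in (2.18) for every candidate codeword and returns the index of the closest one.

```python
import numpy as np

def nearest_neighbour_decode(Y, H, codebook, P, nt):
    """Return the message index m minimising the metric in (2.18):
    sum_b sum_nu || y_{b,nu} - sqrt(P_b/n_t) H_b x_{b,nu}(m) ||^2.

    Y        : (B, nr, J) received blocks
    H        : (B, nr, nt) fading matrices (perfect CSIR)
    codebook : (|M|, B, nt, J) candidate codewords
    P        : (B,) per-block powers
    """
    metrics = []
    for X in codebook:                          # candidate codeword X(m)
        dist = 0.0
        for b in range(Y.shape[0]):             # accumulate the distance over fading blocks
            scaled = np.sqrt(P[b] / nt) * H[b] @ X[b]
            dist += np.sum(np.abs(Y[b] - scaled) ** 2)
        metrics.append(dist)
    return int(np.argmin(metrics))
```

The same routine with noisy estimates passed in place of the true fading matrices would correspond to the mismatched nearest neighbour decoder discussed in Section 2.4.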

If the additive noise is not Gaussian, then nearest neighbour decoding may

no longer be optimal. Optimal decoding is derived from maximising the metric

PY|X ,H (Y|X,H), which has a similar form to the density of the noise distribution.

However, Lapidoth [4] has shown for Gaussian inputs that even though nearest

neighbour decoding is suboptimal for additive non-Gaussian noise, it achieves

the same level of reliability as that achieved for AWGN. It follows that nearest

neighbour decoding provides a robust design approach for any noise distribution


in the channel. However, one should note that since AWGN is the worst-case noise, in the sense that it minimises the mutual information for Gaussian inputs, a drawback of using nearest neighbour decoding is that any noise in the channel will appear to be as harsh as AWGN of equal power.

2.4 The Block-Fading Channel with Imperfect CSI

In this section, we introduce a mismatched-decoding approach to analyse block-fading channels with imperfect CSI. We shall assume that the receiver does not have perfect knowledge of $\boldsymbol{H}_b$ but rather has access to the noisy estimates $\hat{\boldsymbol{H}}_b$,

$$\hat{\boldsymbol{H}}_b = \boldsymbol{H}_b + \boldsymbol{E}_b, \qquad b = 1,\ldots,B \qquad (2.19)$$

where $\boldsymbol{E}_b$ is the CSIR noise matrix (or the estimation error matrix) at block $b$. The receiver then employs a decoding rule based on a positive metric $Q_{\mathbf{Y}|\mathbf{X},\hat{\mathbf{H}}}(\mathbf{Y}|\mathbf{X},\hat{\mathbf{H}})$. We further assume that this decoding metric is a product of the individual symbol metrics, i.e.,

$$Q_{\mathbf{Y}|\mathbf{X},\hat{\mathbf{H}}}(\mathbf{Y}|\mathbf{X},\hat{\mathbf{H}}) = \prod_{b=1}^{B}\prod_{\nu=1}^{J} Q_{Y|X,\hat{H}}\bigl(y_{b,\nu}\,\big|\,x_{b,\nu},\hat{H}_b\bigr) \qquad (2.20)$$

where each individual metric is positive and bounded by one, i.e.,

$$0 < Q_{Y|X,\hat{H}}\bigl(y_{b,\nu}\,\big|\,x_{b,\nu},\hat{H}_b\bigr) < 1. \qquad (2.21)$$

Note that the metric given by the channel transition probability $P_{\mathbf{Y}|\mathbf{X},\mathbf{H}}(\mathbf{Y}|\mathbf{X},\mathbf{H})$ has the same properties as (2.20) and (2.21). Similarly to Section 2.3, we assume static power allocation.

In the following, we study an upper bound to the error probability for a maximum-metric decoder based on the metric $Q_{\mathbf{Y}|\mathbf{X},\hat{\mathbf{H}}}(\mathbf{Y}|\mathbf{X},\hat{\mathbf{H}})$, which is an extension of (2.15), i.e., the output message $\hat{m}$ is selected as

$$\hat{m} = \arg\max_{m\in\mathcal{M}} Q_{\mathbf{Y}|\mathbf{X},\hat{\mathbf{H}}}\bigl(\mathbf{Y}\,\big|\,\mathbf{X}(m),\hat{\mathbf{H}}\bigr). \qquad (2.22)$$

We evaluate the upper bound to the error probability in the limit of large block

length and specifically introduce the notions of generalised mutual information

(GMI) [13, 35] and generalised outage probability, which provide basic tools for

the analysis in Chapters 3, 4 and 5.


2.4.1 Generalised Gallager's Bound, GMI and Generalised Outage

We say that mismatched decoding occurs if decoding based on $Q_{\mathbf{Y}|\mathbf{X},\hat{\mathbf{H}}}(\mathbf{Y}|\mathbf{X},\hat{\mathbf{H}})$ does not match that based on $P_{\mathbf{Y}|\mathbf{X},\mathbf{H}}(\mathbf{Y}|\mathbf{X},\mathbf{H})$ [13, 35–37]. This problem is generally encountered in a wide range of communication systems, where the only way to obtain CSIR is via a channel estimator, which yields accurate yet imperfect channel coefficients. Note that if $Q_{\mathbf{Y}|\mathbf{X},\hat{\mathbf{H}}}(\mathbf{Y}|\mathbf{X},\hat{\mathbf{H}})$ has the same form as $P_{\mathbf{Y}|\mathbf{X},\mathbf{H}}(\mathbf{Y}|\mathbf{X},\mathbf{H})$ but with $\mathbf{H}$ replaced by $\hat{\mathbf{H}}$, then $Q_{\mathbf{Y}|\mathbf{X},\hat{\mathbf{H}}}(\mathbf{Y}|\mathbf{X},\hat{\mathbf{H}})$ is a distance metric with a noisy channel estimate and (2.22) yields a nearest neighbour decoder. In this case, the nearest neighbour decoder treats the channel estimate $\hat{H}_b$ as if it were the true channel.

Following the same steps outlined in Section 2.3.1, we can upper-bound the error probability of the ensemble of random codes as [35, 37, 38]

$$P_{e,\mathrm{ave}}(H,\hat{H}) \leq 2^{-N E_r^{Q}(R,\hat{H})} \qquad (2.23)$$

where now the mismatched-decoding error exponent is

$$E_r^{Q}(R,\hat{H}) = \sup_{\substack{s>0 \\ 0\le\rho\le 1}}\left\{\frac{1}{B}\sum_{b=1}^{B}E_0^{Q}(s,\rho,\hat{H}_b) - \rho R\right\} \qquad (2.24)$$

and

$$E_0^{Q}(s,\rho,\hat{H}_b) = -\log_2 \mathbb{E}\!\left[\left(\mathbb{E}\!\left[\left(\frac{Q_{Y|X,\hat{H}}(Y|X',\hat{\boldsymbol{H}}_b)}{Q_{Y|X,\hat{H}}(Y|X,\hat{\boldsymbol{H}}_b)}\right)^{s}\,\middle|\,X,Y,\hat{\boldsymbol{H}}_b,\boldsymbol{E}_b\right]\right)^{\rho}\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right] \qquad (2.25)$$

is the generalised Gallager function for a given fading realisation $H_b$ and estimation error $E_b$ [35]. Here $s$ and $\rho$ are used as the optimising variables in (2.24).

Remark 2.1. The use of the decoding metric $Q_{\mathbf{Y}|\mathbf{X},\hat{\mathbf{H}}}(\mathbf{Y}|\mathbf{X},\hat{\mathbf{H}})$ here is not restricted to the distance metric with a noisy channel estimate. The random coding upper bound holds for any bounded positive decoding metric satisfying (2.21).

Proposition 2.1 (Concavity of $E_0^{Q}(s,\rho,\hat{H}_b)$). For a fixed input distribution, the generalised Gallager function $E_0^{Q}(s,\rho,\hat{H}_b)$ is a concave function of $s$ for $s > 0$ and of $\rho$ for $0 \leq \rho \leq 1$.


Proof. See Section 2.4.2.1.

Since $E_0^{Q}(s,\rho,\hat{H}_b)$ is concave in $\rho$ for $0\le\rho\le 1$, the maximum slope of $E_0^{Q}(s,\rho,\hat{H}_b)$ with respect to $\rho$ occurs at $\rho = 0$. The maximisation over $s$ results in a maximum slope equal to the GMI [13, 35], given by

$$I^{\mathrm{gmi}}(\hat{H}) = \sup_{s>0}\frac{1}{B}\sum_{b=1}^{B} I_b^{\mathrm{gmi}}(\mathrm{SNR},H_b,\hat{H}_b,s) \qquad (2.26)$$

where

$$I_b^{\mathrm{gmi}}(\mathrm{SNR},H_b,\hat{H}_b,s) = \mathbb{E}\!\left[\log_2\frac{Q_{Y|X,\hat{H}}^{s}(Y|X,\hat{\boldsymbol{H}}_b)}{\mathbb{E}\!\left[Q_{Y|X,\hat{H}}^{s}(Y|X',\hat{\boldsymbol{H}}_b)\,\middle|\,Y,\hat{\boldsymbol{H}}_b,\boldsymbol{E}_b\right]}\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right]. \qquad (2.27)$$

Note that if $\hat{H} = H$, then $I^{\mathrm{gmi}}(\hat{H})$ is equal to $I(H)$. The GMI $I^{\mathrm{gmi}}(\hat{H})$ has the following properties.

Proposition 2.2 (Non-Negativity of the GMI). For a decoding metric that can be expressed as (2.20) and satisfies (2.21), $I^{\mathrm{gmi}}(\hat{H})$ is always non-negative, i.e.,

$$I^{\mathrm{gmi}}(\hat{H}) \geq 0. \qquad (2.28)$$

Proof. See Section 2.4.2.2.

Proposition 2.3 (GMI Upper Bound). The GMI is upper-bounded as

$$I^{\mathrm{gmi}}(\hat{H}) \leq \frac{1}{B}\sum_{b=1}^{B}\sup_{s_b>0} I_b^{\mathrm{gmi}}(\mathrm{SNR},H_b,\hat{H}_b,s_b). \qquad (2.29)$$

Proof. See Section 2.4.2.3.

The maximum-slope analysis of the concave function $E_0^{Q}(s,\rho,\hat{H}_b)$ shows that the exponent $E_r^{Q}(R,\hat{H})$ is positive only when $R \leq I^{\mathrm{gmi}}(\hat{H}) - \epsilon$, and zero otherwise, proving the achievability of $I^{\mathrm{gmi}}(\hat{H})$. Then, following the same argument as that of [33], there exists a code in the ensemble whose average error probability, averaged over all fading states and their corresponding estimates, is bounded as

$$P_{e,\mathrm{ave}} \leq \mathbb{E}\!\left[2^{-N E_r^{Q}(R,\hat{\boldsymbol{H}})}\right], \qquad (2.30)$$

which, for large $J$ (noting that $N = BJ$), becomes

$$P_{e,\mathrm{ave}} \leq \inf_{\epsilon>0}\Pr\bigl\{I^{\mathrm{gmi}}(\hat{\boldsymbol{H}}) - \epsilon < R\bigr\} \qquad (2.31)$$
$$= \Pr\bigl\{I^{\mathrm{gmi}}(\hat{\boldsymbol{H}}) < R\bigr\} \triangleq P_{\mathrm{gout}}(R), \qquad (2.32)$$

the generalised outage probability.

The above analysis shows the achievability of $P_{\mathrm{gout}}(R)$, which indicates that for large $J$ one may find codes whose error probability approaches $P_{\mathrm{gout}}(R)$. Unfortunately, there are no generally tight converse results for mismatched decoding [13], which implies that one might be able to find codes whose error probability for large $J$ is lower than $P_{\mathrm{gout}}(R)$. However, as shown in [4, 36, 39], a converse exists for i.i.d. codebooks, i.e., no rate larger than the GMI can be transmitted with vanishing error probability.

Proposition 2.4 (Generalised Outage Converse). For i.i.d. codebooks with sufficiently large block length, we have that

$$P_{\mathrm{out}}(R) \leq P_{\mathrm{gout}}(R) \leq P_{e,\mathrm{ave}}. \qquad (2.33)$$

Proof. The inequality $P_{e,\mathrm{ave}} \geq P_{\mathrm{gout}}(R)$ follows from the GMI converse in [39] for i.i.d. codebooks. Furthermore, due to the data-processing inequality for error exponents, $E_r^{Q}(R,\hat{H}) \leq E_r(R,H)$ [35, 37], we obtain $I^{\mathrm{gmi}}(\hat{H}) \leq I(H) \leq I^{\star}(H)$ [13], and hence $P_{\mathrm{gout}}(R) \geq P_{\mathrm{out}}(R)$.

From the above proposition, we say that the generalised outage probability is the fundamental limit for i.i.d. codebooks. If one allows non-i.i.d. codebook constructions, then an error probability smaller than the generalised outage probability can potentially be achieved. From the works in [13, 36, 38], it can be shown that, conditioned on the fading and the estimation error, one can obtain achievable rates (e.g., rates below the LM bound for the mismatched capacity [13]) that are larger than the GMI, which implies that the error probability can be smaller than the generalised outage probability. The drawback of using the achievability in [13, 36, 38] is that the LM bound can be very difficult to compute because of the optimisation over the cost function.


2.4.2 Proofs for Section 2.4.1

2.4.2.1 Concavity of $E_0^{Q}(s,\rho,\hat{H}_b)$

Fix the input distribution $P_X(x)$. We first assume $0 < g_0 < g_1$, $0 < \delta < 1$ and $\rho = \delta g_0 + (1-\delta)g_1$. Define

$$f(s,x,y) \triangleq \mathbb{E}\!\left[\left(\frac{Q_{Y|X,\hat{H}}(Y|X',\hat{\boldsymbol{H}}_b)}{Q_{Y|X,\hat{H}}(Y|X,\hat{\boldsymbol{H}}_b)}\right)^{s}\,\middle|\,X = x,\,Y = y,\,\hat{\boldsymbol{H}}_b,\,\boldsymbol{E}_b\right] \qquad (2.34)$$

and

$$E_0^{Q}(s,\rho,\hat{H}_b) = -\log_2 \mathbb{E}\!\left[f(s,X,Y)^{\rho}\,\middle|\,\boldsymbol{H}_b = H_b,\,\boldsymbol{E}_b = E_b\right]. \qquad (2.35)$$

Then, we have that

$$\mathbb{E}\!\left[f(s,X,Y)^{\rho}\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right] = \mathbb{E}\!\left[f(s,X,Y)^{\delta g_0}\,f(s,X,Y)^{(1-\delta)g_1}\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right]. \qquad (2.36)$$

Using Hölder's inequality [40], we have that

$$\mathbb{E}\!\left[f(s,X,Y)^{\delta g_0}\,f(s,X,Y)^{(1-\delta)g_1}\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right] \leq \left(\mathbb{E}\!\left[f(s,X,Y)^{g_0}\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right]\right)^{\delta}\times\left(\mathbb{E}\!\left[f(s,X,Y)^{g_1}\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right]\right)^{1-\delta}. \qquad (2.37)$$

Taking the logarithm, which is a monotonically increasing function, on both sides yields

$$-E_0^{Q}(s,\rho,\hat{H}_b) \leq -\delta E_0^{Q}(s,g_0,\hat{H}_b) - (1-\delta)E_0^{Q}(s,g_1,\hat{H}_b), \qquad (2.38)$$

which shows the concavity of $E_0^{Q}(s,\rho,\hat{H}_b)$ in $\rho$ for $\rho \geq 0$.


Now, let $s = \delta g_0 + (1-\delta)g_1$. Then,

$$f(s,x,y) = \mathbb{E}\!\left[\left(\frac{Q_{Y|X,\hat{H}}(Y|X',\hat{\boldsymbol{H}}_b)}{Q_{Y|X,\hat{H}}(Y|X,\hat{\boldsymbol{H}}_b)}\right)^{\delta g_0 + (1-\delta)g_1}\,\middle|\,X = x,\,Y = y,\,\hat{\boldsymbol{H}}_b,\,\boldsymbol{E}_b\right] \qquad (2.39)$$
$$\leq \left(\mathbb{E}\!\left[\left(\frac{Q_{Y|X,\hat{H}}(Y|X',\hat{\boldsymbol{H}}_b)}{Q_{Y|X,\hat{H}}(Y|X,\hat{\boldsymbol{H}}_b)}\right)^{g_0}\,\middle|\,X = x,\,Y = y,\,\hat{\boldsymbol{H}}_b,\,\boldsymbol{E}_b\right]\right)^{\delta}\times\left(\mathbb{E}\!\left[\left(\frac{Q_{Y|X,\hat{H}}(Y|X',\hat{\boldsymbol{H}}_b)}{Q_{Y|X,\hat{H}}(Y|X,\hat{\boldsymbol{H}}_b)}\right)^{g_1}\,\middle|\,X = x,\,Y = y,\,\hat{\boldsymbol{H}}_b,\,\boldsymbol{E}_b\right]\right)^{1-\delta} \qquad (2.40)$$
$$= f(g_0,x,y)^{\delta}\times f(g_1,x,y)^{1-\delta} \qquad (2.41)$$

where the inequality follows from Hölder's inequality [40]. Evaluating (2.35), we have that

$$\mathbb{E}\!\left[f(s,X,Y)^{\rho}\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right] \leq \mathbb{E}\!\left[f(g_0,X,Y)^{\rho\delta}\,f(g_1,X,Y)^{\rho(1-\delta)}\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right] \qquad (2.42)$$
$$\leq \left(\mathbb{E}\!\left[f(g_0,X,Y)^{\rho}\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right]\right)^{\delta}\times\left(\mathbb{E}\!\left[f(g_1,X,Y)^{\rho}\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right]\right)^{1-\delta}. \qquad (2.43)$$

Taking the logarithm on both sides gives us

$$-E_0^{Q}(s,\rho,\hat{H}_b) \leq -\delta E_0^{Q}(g_0,\rho,\hat{H}_b) - (1-\delta)E_0^{Q}(g_1,\rho,\hat{H}_b), \qquad (2.44)$$

which proves the concavity of $E_0^{Q}(s,\rho,\hat{H}_b)$ in $s$ for $s \geq 0$.

2.4.2.2 Non-Negativity of $I^{\mathrm{gmi}}(\hat{H})$

Let $s^{*}$ be the value of $s$ achieving the supremum on the right-hand side (RHS) of (2.26). Then, by substituting a specific value of $s$ into the RHS of (2.26), namely $s \downarrow 0$, we have that

$$I^{\mathrm{gmi}}(\hat{H}) = \frac{1}{B}\sum_{b=1}^{B} I_b^{\mathrm{gmi}}(\mathrm{SNR},H_b,\hat{H}_b,s^{*}) \qquad (2.45)$$
$$\geq \lim_{s\downarrow 0}\frac{1}{B}\sum_{b=1}^{B} I_b^{\mathrm{gmi}}(\mathrm{SNR},H_b,\hat{H}_b,s) \qquad (2.46)$$


because $s^{*}$ always maximises the RHS of (2.26). Let $s = 1/s'$. Note that from (2.27) and (2.46)

$$\lim_{s\downarrow 0} I_b^{\mathrm{gmi}}(\mathrm{SNR},H_b,\hat{H}_b,s) = \lim_{s'\uparrow\infty}\left\{\frac{1}{s'}\,\mathbb{E}\!\left[\log_2 Q_{Y|X,\hat{H}}(Y|X,\hat{\boldsymbol{H}}_b)\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right] - \mathbb{E}\!\left[\log_2\mathbb{E}\!\left[Q_{Y|X,\hat{H}}^{\frac{1}{s'}}(Y|X',\hat{\boldsymbol{H}}_b)\,\middle|\,Y,\hat{\boldsymbol{H}}_b,\boldsymbol{E}_b\right]\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right]\right\} \qquad (2.47)$$
$$= \lim_{s'\uparrow\infty}\left\{-\mathbb{E}\!\left[\log_2\mathbb{E}\!\left[Q_{Y|X,\hat{H}}^{\frac{1}{s'}}(Y|X',\hat{\boldsymbol{H}}_b)\,\middle|\,Y,\hat{\boldsymbol{H}}_b,\boldsymbol{E}_b\right]\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right]\right\}. \qquad (2.48)$$

Since the function $Q_{Y|X,\hat{H}}^{\frac{1}{s'}}(y|x',\hat{H}_b)$ can be bounded using (2.21) as

$$0 < Q_{Y|X,\hat{H}}^{\frac{1}{s'}}(y|x',\hat{H}_b) < 1 \qquad (2.49)$$

for $s = 1/s' > 0$, we have that

$$\mathbb{E}\!\left[Q_{Y|X,\hat{H}}^{\frac{1}{s'}}(Y|X',\hat{\boldsymbol{H}}_b)\,\middle|\,Y,\hat{\boldsymbol{H}}_b,\boldsymbol{E}_b\right] < 1 \qquad (2.50)$$

and

$$-\log_2\mathbb{E}\!\left[Q_{Y|X,\hat{H}}^{\frac{1}{s'}}(Y|X',\hat{\boldsymbol{H}}_b)\,\middle|\,Y,\hat{\boldsymbol{H}}_b,\boldsymbol{E}_b\right] > 0. \qquad (2.51)$$

It follows from (2.48) that

$$\lim_{s'\uparrow\infty}\left\{-\mathbb{E}\!\left[\log_2\mathbb{E}\!\left[Q_{Y|X,\hat{H}}^{\frac{1}{s'}}(Y|X',\hat{\boldsymbol{H}}_b)\,\middle|\,Y,\hat{\boldsymbol{H}}_b,\boldsymbol{E}_b\right]\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right]\right\}$$
$$\geq \mathbb{E}\!\left[\lim_{s'\uparrow\infty}\left\{-\log_2\mathbb{E}\!\left[Q_{Y|X,\hat{H}}^{\frac{1}{s'}}(Y|X',\hat{\boldsymbol{H}}_b)\,\middle|\,Y,\hat{\boldsymbol{H}}_b,\boldsymbol{E}_b\right]\right\}\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right] \qquad (2.52)$$
$$= \mathbb{E}\!\left[-\log_2\mathbb{E}\!\left[\lim_{s'\uparrow\infty} Q_{Y|X,\hat{H}}^{\frac{1}{s'}}(Y|X',\hat{\boldsymbol{H}}_b)\,\middle|\,Y,\hat{\boldsymbol{H}}_b,\boldsymbol{E}_b\right]\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right] \qquad (2.53)$$
$$= 0 \qquad (2.54)$$

where inequality (2.52) is obtained by applying Fatou's lemma [41], and equality (2.54) is obtained by applying the dominated convergence theorem [19]. This proves Proposition 2.2.


2.4.2.3 GMI Upper Bound

We have that

$$\sup_{s>0}\frac{1}{B}\sum_{b=1}^{B} I_b^{\mathrm{gmi}}(\mathrm{SNR},H_b,\hat{H}_b,s) \leq \frac{1}{B}\sum_{b=1}^{B}\sup_{s_b>0} I_b^{\mathrm{gmi}}(\mathrm{SNR},H_b,\hat{H}_b,s_b). \qquad (2.55)$$

On the left-hand side (LHS), a single supremum over $s$ is taken jointly for all $B$ blocks. Thus, the optimising $s$ does not necessarily maximise the value of $I_b^{\mathrm{gmi}}(\mathrm{SNR},H_b,\hat{H}_b,s)$ for each $b$. In the upper bound, however, the supremum is taken separately for each $b$.

2.5 Outage Bounds, Diversity and System Design

In this section, we describe how the outage probability and the generalised outage probability can be used as guidelines for system design. We start by defining important reliability measures at high SNR.

Definition 2.1 (Code Diversity). A code is said to have diversity $d^{\star}$ if it holds that

$$d^{\star} = \lim_{\mathrm{SNR}\to\infty}\frac{-\log P_{e,\mathrm{ave}}(\mathrm{SNR})}{\log\mathrm{SNR}} \qquad (2.56)$$

where $P_{e,\mathrm{ave}}(\mathrm{SNR})$ is the average error probability of that code.

Diversity is commonly referred to as the SNR-exponent. In this case, code diversity measures the high-SNR slope of the error probability curve (plotted on a log-log scale).
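As an illustration only (the SNR points and error probabilities below are made-up numbers, not simulation results from this dissertation), the slope in (2.56) can be approximated at finite SNR from two points of an error-probability curve:

```python
import numpy as np

def diversity_slope(snr_db_lo, pe_lo, snr_db_hi, pe_hi):
    """Finite-SNR estimate of d* = -dlog(Pe)/dlog(SNR) from two points."""
    snr_lo = 10 ** (snr_db_lo / 10)
    snr_hi = 10 ** (snr_db_hi / 10)
    return -(np.log(pe_hi) - np.log(pe_lo)) / (np.log(snr_hi) - np.log(snr_lo))

# e.g. Pe dropping from 1e-2 at 20 dB to 1e-4 at 30 dB suggests a slope of about 2
print(diversity_slope(20, 1e-2, 30, 1e-4))
```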

Definition 2.2 (Multiplexing Gain). A coding scheme is said to have multiplexing gain $r_g$ if it holds that

$$r_g = \lim_{\mathrm{SNR}\to\infty}\frac{R(\mathrm{SNR})}{\log\mathrm{SNR}} \qquad (2.57)$$

where $R(\mathrm{SNR})$ is the coding rate.

The multiplexing gain was introduced in [20] and characterises the high-SNR linear growth of the coding rate with the logarithm of the SNR.

Under the perfect-CSIR assumption, we have shown that the outage probability serves as the fundamental limit of block-fading channels. In the following, we explain one particular aspect of code design for a fixed block length that utilises the outage result.


Definition 2.3 (Outage Diversity). The outage diversity $d_{\mathrm{csir}}$ is defined as the high-SNR slope of the outage probability curve, as a function of the SNR, on a log-log scale plot when the receiver has access to perfect CSIR,

$$d_{\mathrm{csir}} \triangleq \lim_{\mathrm{SNR}\to\infty}\frac{-\log P_{\mathrm{out}}(R)}{\log\mathrm{SNR}}. \qquad (2.58)$$

Remark that the above $P_{\mathrm{out}}(R)$ is defined as the probability that the mutual information maximised over all probability distributions on the alphabet $\mathcal{X}^{n_t}$, $I^{\star}(\boldsymbol{H})$, is less than the data rate $R$.

Suppose that dcsir is a finite real-valued quantity, which is the case for many

fading distributions. Then, we have the following lemma on a lower bound to

the error probability at high SNR [20].

Lemma 2.1 (Converse Outage Bound). For any coding scheme with any fixed block length, the average error probability at high SNR is lower-bounded as

$$P_{e,\mathrm{ave}}(\mathrm{SNR}) \;\dot{\geq}\; \mathrm{SNR}^{-d_{\mathrm{csir}}}. \qquad (2.59)$$

Proof. This lemma was proved using Fano's inequality in [20]. Here we provide an alternative proof based on Arimoto's converse. Recall that $E_r'(R,H)$ is positive if and only if $R > I^{\star}(H)$ and zero otherwise. When the error exponent $E_r'(R,H) = 0$, then $1 - 2^{-N E_r'(R,H)} = 0$. Therefore, the error probability can be lower-bounded as

$$P_{e,\mathrm{ave}} \geq \mathbb{E}\!\left[1 - 2^{-N E_r'(R,\boldsymbol{H})}\right] \qquad (2.60)$$
$$= \int_{I^{\star}(H)<R}\left(1 - 2^{-N E_r'(R,H)}\right)P_{\boldsymbol{H}}(H)\,\mathrm{d}H \qquad (2.61)$$
$$= \int_{I^{\star}(H)<R}P_{\boldsymbol{H}}(H)\,\mathrm{d}H - \int_{I^{\star}(H)<R}2^{-N E_r'(R,H)}P_{\boldsymbol{H}}(H)\,\mathrm{d}H \qquad (2.62)$$
$$= P_{\mathrm{out}}(R) - \int_{I^{\star}(H)<R}2^{-N E_r'(R,H)}P_{\boldsymbol{H}}(H)\,\mathrm{d}H \qquad (2.63)$$
$$\doteq P_{\mathrm{out}}(R). \qquad (2.64)$$

Note that for the set $\{I^{\star}(H) < R\}$, we have that

$$\int_{I^{\star}(H)<R}2^{-N E_r'(R,H)}P_{\boldsymbol{H}}(H)\,\mathrm{d}H < P_{\mathrm{out}}(R) \qquad (2.65)$$

with strict inequality due to $2^{-N E_r'(R,H)} < 1$ and the non-zero probability measure of the set $\{I^{\star}(H) < R\}$. Since the RHS of Arimoto's converse is bounded by zero,


as the SNR tends to infinity, the integral term involving $2^{-N E_r'(R,H)}$ decays, as a function of the SNR, at a rate faster than or equal to the decay rate of $P_{\mathrm{out}}(R)$.

Remark 2.2. The proof of Lemma 2.1 in [20] works for finite-length codes when the multiplexing gain $r_g$ is non-zero. For fixed-rate transmission (zero multiplexing gain), it no longer works unless we assume that the block length grows with $\log\mathrm{SNR}$. Herein we make no assumption on the multiplexing gain $r_g$ or the block length $J$ to prove Lemma 2.1. Hence, our proof based on Arimoto's converse is more general than the proof based on Fano's inequality in [20].

Lemma 2.1 provides a guideline for designing good codes. The lower bound implies that the code diversity of any coding scheme with a fixed block length $J$ can never be larger than the outage diversity $d_{\mathrm{csir}}$. This result can be used as a benchmark for code design, i.e., a good code should have a diversity that approaches or equals the outage diversity. In fact, this result has been used to construct practical codes from discrete alphabets. Following [20], it has been shown in [23] that for single-input single-output (SISO) Rayleigh block-fading channels with channel inputs drawn from a discrete alphabet $\mathcal{X}$ of size $|\mathcal{X}| = 2^M$, the outage diversity $d_{\mathrm{csir}}$ is given by the Singleton bound [42]

$$d_{\mathrm{SB}}(R) \triangleq 1 + \left\lfloor B\left(1 - \frac{R}{M}\right)\right\rfloor. \qquad (2.66)$$

This Singleton bound turns out to be one of the design criteria in [21–23], i.e., optimal codes for the block-fading channel should be maximum-distance separable (MDS) on a blockwise basis, that is, they should achieve the Singleton bound on the blockwise Hamming distance of the code with equality. Families of blockwise MDS codes based on convolutional codes [22, 23], turbo codes [43], low-density parity-check (LDPC) codes [44] and Reed-Solomon codes [21] have been proposed in the literature. Remark that from $d_{\mathrm{SB}}(R)$ we can see that there is a fundamental trade-off between the SNR-exponent (diversity) and the target rate.
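As a quick numerical illustration of this trade-off (the choices of $B$, $M$ and the rates below are assumptions made only for this sketch), the Singleton bound (2.66) can be evaluated directly:

```python
import math

def singleton_bound(R, B, M):
    """d_SB(R) = 1 + floor(B * (1 - R/M)) for a discrete alphabet of size 2^M."""
    return 1 + math.floor(B * (1 - R / M))

B, M = 4, 2                                  # 4 fading blocks, QPSK-sized alphabet
for R in (0.5, 1.0, 1.5, 2.0):
    print(R, singleton_bound(R, B, M))       # diversity decreases as the rate grows
```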

We next illustrate how we can use the result in Lemma 2.1 for system design

when perfect CSIR cannot be assumed. We first define a relevant outage measure

for imperfect CSIR at high SNR.

Definition 2.4 (Generalised Outage Diversity). The generalised outage diversity $d_{\mathrm{icsir}}$ is defined as the high-SNR slope of the generalised outage probability curve, on a log-log scale plot, when the receiver only knows the noisy CSIR,

$$d_{\mathrm{icsir}} \triangleq \lim_{\mathrm{SNR}\to\infty}\frac{-\log P_{\mathrm{gout}}(R)}{\log\mathrm{SNR}}. \qquad (2.67)$$


From the data-processing inequality (Proposition 2.4), we have $P_{\mathrm{gout}}(R) \geq P_{\mathrm{out}}(R)$, which implies that $d_{\mathrm{icsir}} \leq d_{\mathrm{csir}}$. Note that the lower bound in Lemma 2.1 also holds for coding schemes with mismatched decoding, since the maximum-likelihood decoder (for perfect CSIR) gives the smallest error probability. An interesting question to address is whether it is possible to design schemes that yield $d_{\mathrm{icsir}} = d_{\mathrm{csir}}$. If we can achieve $d_{\mathrm{icsir}} = d_{\mathrm{csir}}$, the random coding arguments in Section 2.4 imply that we can find a code with the highest possible diversity.

By assuming that the decoder is supplied with noisy fading estimates, we implicitly consider schemes that perform channel estimation and message decoding separately. It is thus intuitive that the accuracy of channel estimation affects the reliability of message decoding. Using nearest neighbour decoding, we prove rigorously in Chapter 3 that $d_{\mathrm{icsir}}$ can indeed be expressed as a function of both the accuracy of the channel estimation and the perfect-CSIR diversity $d_{\mathrm{csir}}$. We also demonstrate how random coding schemes can achieve the perfect-CSIR diversity for a given block length $J$.

If the channel condition is known prior to transmission, the transmitter can employ adaptive schemes to improve the generalised outage diversity. In Chapter 4, we study how to improve the generalised outage diversity $d_{\mathrm{icsir}}$ using an ARQ scheme. ARQ improves the outage probability by retransmitting any erroneously decoded codewords. ARQ with perfect CSIR and feedback has been studied in the literature (see, for example, [25, 26, 29]). We investigate such a scheme with imperfect CSIR and feedback. In Chapter 5, we characterise the improvement of the generalised outage diversity $d_{\mathrm{icsir}}$ when power adaptation based on noisy CSIT is used. Several works have addressed power adaptation in block-fading channels; see, for example, [27, 30, 31] for works with perfect CSI at both terminals, and [45, 46] for works with imperfect CSIT (assuming perfect CSIR). In this dissertation, we study imperfect CSIT and CSIR in a unified framework.


Chapter 3

MIMO Block-Fading Channels with Imperfect CSIR

We have shown in Chapter 2 that for perfect CSIR, nearest neighbour decoding yields the minimal error probability. However, perfect CSIR is difficult to obtain in practice because of hardware limitations and the time-varying characteristics of the channel.

In this chapter, we study the performance of nearest neighbour decoding for

transmission over MIMO block-fading channels when the receiver is unable to

obtain the perfect CSIR. We apply the mismatched-decoding approach intro-

duced in Section 2.4 to evaluate the reliability of nearest neighbour decoding.

More specifically, we utilise the generalised mutual information (GMI) and the

generalised outage probability. In summary, the GMI is an achievable rate when

a fixed decoding rule—which is not necessarily matched to the channel—is em-

ployed [13, 36]. In our case, it characterises the maximum communication rate

under fixed fading and fading estimate realisations, below which the average er-

ror probability is guaranteed to vanish as the codeword length tends to infinity.

Due to the time-varying fading and its corresponding estimate, the GMI is ran-

dom. The generalised outage probability, the probability that the GMI is less

than the target rate, serves as an achievable error probability performance for

MIMO block-fading channels. By achievability, we mean that random codes are

able to achieve this performance but it does not mean that there are no codes

that perform better than the generalised outage probability. However, as shown

in [4, 36, 39], i.i.d. codebooks have a GMI converse, i.e., no rate larger than

the GMI can be transmitted with vanishing error probability for i.i.d. code-

books. Thus, for i.i.d. codebooks, the GMI is the largest achievable rate and the

generalised outage probability becomes the fundamental limit for block-fading


channels with mismatched CSIR.

We are particularly interested in the reliability performance of nearest neighbour decoding in the high-SNR regime. More specifically, we are interested in the generalised outage diversity

$$d_{\mathrm{icsir}} \triangleq \lim_{\mathrm{SNR}\to\infty}\frac{-\log P_{\mathrm{gout}}(R)}{\log\mathrm{SNR}} \qquad (3.1)$$

defined in (2.67). This reliability measure is a natural extension of the outage diversity

$$d_{\mathrm{csir}} \triangleq \lim_{\mathrm{SNR}\to\infty}\frac{-\log P_{\mathrm{out}}(R)}{\log\mathrm{SNR}} \qquad (3.2)$$

defined in (2.58). Here $d_{\mathrm{csir}}$ characterises the perfect-CSIR diversity, which in turn provides a performance benchmark for practical codes, i.e., the highest diversity that a code can achieve is the outage diversity (see Section 2.5). In this chapter, we take a closer look at the relationship between the imperfect-CSIR generalised outage diversity $d_{\mathrm{icsir}}$ and the perfect-CSIR outage diversity $d_{\mathrm{csir}}$. Our main results serve as practical guidelines for system design that take into account the CSIR estimation error.

The rest of the chapter is outlined as follows. Based on the setup in Sec-

tion 2.2, Section 3.1 introduces specific models for the fading distribution, the

codebook generation and the imperfect CSIR. Using Gaussian and discrete-

constellation codebooks, Section 3.2 revisits some existing results on the charac-

terisation of the perfect-CSIR outage diversity. Section 3.3 establishes our main

theorem on the generalised outage diversity. Section 3.4 discusses our results on

the random coding achievability. Section 3.5 provides important remarks, shows

numerical evidence and discusses the generality of the results with respect to the

fading distribution. Finally, Section 3.6 summarises the important points of the

chapter.

3.1 System Model

We recall the channel model for a MIMO block-fading channel with $n_t$ transmit antennas, $n_r$ receive antennas and $B$ fading blocks from Section 2.2. Herein we assume that CSIT is not available; thus, uniform power allocation over all fading blocks and transmit antennas is used in (3.3), i.e., $P_b = \mathrm{SNR}$, $b = 1,\ldots,B$.


[Figure 3.1: A MIMO block-fading channel with imperfect CSIR. The encoded blocks $X_1,\ldots,X_B$ are scaled by $\sqrt{\mathrm{SNR}/n_t}$, multiplied by the fading matrices $H_1,\ldots,H_B$ and corrupted by the noise $Z_1,\ldots,Z_B$ to give $Y_1,\ldots,Y_B$; the decoder is supplied with the noisy CSIR $\hat{H}_b$, $b = 1,\ldots,B$.]

The channel output at block $b$ is an $n_r\times J$-dimensional random matrix

$$\boldsymbol{Y}_b = \sqrt{\frac{\mathrm{SNR}}{n_t}}\,\boldsymbol{H}_b\boldsymbol{X}_b + \boldsymbol{Z}_b, \qquad b = 1,\ldots,B \qquad (3.3)$$

where

• $B$ and $J$ denote the number of fading blocks and the channel block length, respectively;
• $\boldsymbol{X}_b \in \mathbb{C}^{n_t\times J}$ denotes the channel input matrix at block $b$;
• $\boldsymbol{H}_b$ denotes the $n_r\times n_t$-dimensional random fading matrix at block $b$;
• and $\boldsymbol{Z}_b$ denotes the $n_r\times J$-dimensional additive noise matrix at block $b$.

We assume that the $n_r\times J$ entries of $\boldsymbol{Z}_b$, $b = 1,\ldots,B$, are i.i.d. complex-Gaussian random variables with zero mean and unit variance.

The entries of $\boldsymbol{H}_b$ are i.i.d. random variables drawn according to a certain probability distribution. We use the general fading model of [47, 48]. Let $H_{b,r,t}$ be the channel coefficient for fading block $b$, receive antenna $r$ and transmit antenna $t$. Then, the probability density function (pdf) of $H_{b,r,t}$ is given by

$$P_{H_{b,r,t}}(h) = w_0\,|h|^{\tau}\,e^{-w_1|h - w_2|^{\phi}} \qquad (3.4)$$

where $w_0 > 0$, $\tau \in \mathbb{R}$, $w_1 > 0$, $w_2 \in \mathbb{C}$ and $\phi \geq 1$ are constants (finite and SNR-independent). This model subsumes a number of widely used fading distributions such as Rayleigh, Rician, Nakagami-m ($m \geq 1$), Nakagami-q ($0 < q \leq 1$) and Weibull ($\eta \geq 2$), as tabulated in Table 3.1.


Table 3.1: Pdf and model parameters for different fading distributions

Rayleigh:   pdf $= \frac{1}{\pi\Omega}e^{-|h|^2/\Omega}$;  $w_0 = \frac{1}{\pi\Omega}$, $\tau = 0$, $w_1 = \frac{1}{\Omega}$, $w_2 = 0$, $\phi = 2$.
Rician:     pdf $= \frac{1}{\pi\Omega}e^{-|h-\mu|^2/\Omega}$;  $w_0 = \frac{1}{\pi\Omega}$, $\tau = 0$, $w_1 = \frac{1}{\Omega}$, $w_2 = \mu$, $\phi = 2$.
Nakagami-m: pdf $= \frac{m^m|h|^{2m-2}}{\pi\Omega^m\Gamma(m)}e^{-m|h|^2/\Omega}$;  $w_0 = \frac{m^m}{\pi\Omega^m\Gamma(m)}$, $\tau = 2m-2$, $w_1 = \frac{m}{\Omega}$, $w_2 = 0$, $\phi = 2$.
Weibull:    pdf $= \frac{\eta\Omega^{-\eta}}{2\pi}|h|^{\eta-2}e^{-(|h|/\Omega)^{\eta}}$;  $w_0 = \frac{\eta\Omega^{-\eta}}{2\pi}$, $\tau = \eta-2$, $w_1 = \frac{1}{\Omega^{\eta}}$, $w_2 = 0$, $\phi = \eta$.
Nakagami-q: pdf $= \frac{1+q^2}{2\pi q\Omega}\,I_0\!\left(\frac{(1-q^4)|h|^2}{4q^2\Omega}\right)e^{-\frac{(1+q^2)^2}{4q^2\Omega}|h|^2}$;  parameters obtained via the bounds in footnote 3.1.

For Rayleigh and Rician fading channels, the above pdf is the pdf of a complex-Gaussian random variable with independent real and imaginary parts. For Nakagami-m, Weibull and Nakagami-q fading channels, the above pdf is derived assuming that the magnitude and phase are independently distributed, and that the phase is uniformly distributed over $[0, 2\pi)$. Furthermore, we assume that the average fading gain is normalised, i.e., $\mathbb{E}\bigl[\|\boldsymbol{H}_b\|_F^2\bigr] = n_t n_r$, $b = 1,\ldots,B$.

We finally assume that for all b = 1, . . . , B, the fading H b and the noise Zb

are independent and that their joint law does not depend on Xb.
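The following sketch (an illustrative assumption-laden example, not code from the dissertation; $\Omega = 1$ and the sample size are arbitrary choices) draws fading coefficients consistent with this model for the Nakagami-m case, using a Gamma-distributed squared magnitude and a uniform phase, and checks the unit-power normalisation; setting $m = 1$ recovers Rayleigh fading.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_nakagami_m(m, size, Omega=1.0):
    """|h|^2 ~ Gamma(m, Omega/m) so that E[|h|^2] = Omega; phase ~ Unif[0, 2*pi)."""
    power = rng.gamma(shape=m, scale=Omega / m, size=size)
    phase = rng.uniform(0.0, 2 * np.pi, size=size)
    return np.sqrt(power) * np.exp(1j * phase)

h = sample_nakagami_m(m=2, size=100_000)
print(np.mean(np.abs(h) ** 2))   # close to 1, matching the per-entry normalisation
```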

At the transmitter end, we consider coding schemes of fixed rate $R$ and length $N = BJ$, whose codewords are defined as $\mathbf{X} \triangleq [X_1,\cdots,X_B] \in \mathcal{X}^{n_t\times BJ}$, where $\mathcal{X}$ denotes the signal constellation; herein we focus on the Gaussian constellation and discrete constellations of size $|\mathcal{X}| = 2^M$. By fixed rate, we mean that the coding rate is a positive constant and independent of the SNR; thus the multiplexing gain (2.57) tends to zero. A non-zero multiplexing gain is only possible with an input constellation that has a continuous probability distribution (such as a Gaussian input) or an input constellation that has a discrete probability distribution but with alphabet size $|\mathcal{X}|$ increasing with the SNR. Since many practical systems employ coding schemes with a fixed code rate and a finite alphabet size, the assumption of zero multiplexing gain is highly relevant in practice. Furthermore, codewords are assumed to satisfy the average input power constraint $\frac{1}{BJ}\mathbb{E}\bigl[\|\mathbf{X}\|_F^2\bigr] \leq n_t$.

Footnote 3.1: We have the following bounds for the modified Bessel function of the first kind $I_0(\cdot)$ [47, 48]: $1 \leq I_0(x) = \sum_{i=0}^{\infty}\frac{(x/2)^{2i}}{i!\,i!} \leq \bigl(\sum_{i=0}^{\infty}\frac{(x/2)^{i}}{i!}\bigr)^2 = e^{x}$, $x \geq 0$. Using these bounds, we recover (3.4).

At the receiver side, when perfect CSIR is assumed, nearest neighbour decoding is optimal in minimising the word error probability. However, practical systems employ channel estimators that yield accurate yet imperfect channel


estimates. We model the channel estimate as

$$\hat{\boldsymbol{H}}_b = \boldsymbol{H}_b + \boldsymbol{E}_b, \qquad b = 1,\ldots,B \qquad (3.5)$$

where $\hat{\boldsymbol{H}}_b$ and $\boldsymbol{E}_b$ are the $n_r\times n_t$-dimensional noisy channel estimate and channel estimation error matrices, respectively. In particular, the entries of $\boldsymbol{E}_b$ are assumed to be independent of the entries of $\boldsymbol{H}_b$ and to have an i.i.d. complex-Gaussian distribution with zero mean and variance

$$\sigma_e^2 = \mathrm{SNR}^{-d_e}, \qquad d_e > 0. \qquad (3.6)$$

Thus, we have assumed a family of channel estimation schemes for which the

CSIR noise variance is a decreasing function of the SNR. This model is widely

used in pilot-based channel estimation for which the error variance is proportional

to the reciprocal of the pilot SNR [49, 50]. We generalise this reciprocal of the

pilot SNR with the parameter de which denotes the channel estimation error

diversity.
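A minimal sketch of this CSIR model, with illustrative dimensions and an assumed value of $d_e$ (not taken from the dissertation), is as follows:

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_csir(nr, nt, snr, d_e):
    """Draw Rayleigh fading H and return (H, Hhat) with Hhat = H + E, var(E) = SNR^{-d_e}."""
    H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
    sigma_e = np.sqrt(snr ** (-d_e))
    E = sigma_e * (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
    return H, H + E

H, Hhat = noisy_csir(nr=2, nt=2, snr=10 ** 3, d_e=0.5)   # 30 dB, d_e = 0.5 (illustrative)
```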

A nearest neighbour decoder is used to infer the transmitted message. Due to its optimality under perfect CSIR and its simplicity, this decoder is widely used in practice even when perfect CSIR cannot be guaranteed. With imperfect CSIR, the decoder treats the imperfect channel estimate as if it were perfect. Let $\hat{H}_b$ be the realisation of $\hat{\boldsymbol{H}}_b$. Assuming a memoryless channel, the decoder performs decoding by calculating the metric

$$Q_{Y|X,\hat{H}}(y_{b,\nu}|x,\hat{H}_b) = \frac{1}{\pi^{n_r}}\,e^{-\left\|y_{b,\nu} - \sqrt{\frac{\mathrm{SNR}}{n_t}}\,\hat{H}_b x\right\|^2} \qquad (3.7)$$

for each channel use, i.e., for $b = 1,\ldots,B$, $\nu = 1,\ldots,J$. The decision is made at the end of the $BJ$ channel uses for a single codeword. The message output corresponds to the codeword that maximises the product of the metric (3.7) over the $BJ$ channel uses.

Note that $h_{b,r,t}$, $\hat{h}_{b,r,t}$ and $e_{b,r,t}$ are the elements of $H_b$, $\hat{H}_b$ and $E_b$ at row $r$, $r = 1,\ldots,n_r$, and column $t$, $t = 1,\ldots,n_t$, respectively (where $H_b$, $\hat{H}_b$ and $E_b$ are the realisations of $\boldsymbol{H}_b$, $\hat{\boldsymbol{H}}_b$ and $\boldsymbol{E}_b$, respectively). We define $\alpha_{b,r,t} \triangleq -\frac{\log|h_{b,r,t}|^2}{\log\mathrm{SNR}}$, $\hat{\alpha}_{b,r,t} \triangleq -\frac{\log|\hat{h}_{b,r,t}|^2}{\log\mathrm{SNR}}$ and $\theta_{b,r,t} \triangleq -\frac{\log|e_{b,r,t}|^2}{\log\mathrm{SNR}}$. Then, $A_b$, $\hat{A}_b$ and $\Theta_b$ are $n_r\times n_t$ matrices whose elements at row $r$ and column $t$ are given by $\alpha_{b,r,t}$, $\hat{\alpha}_{b,r,t}$ and $\theta_{b,r,t}$, respectively, for all $r = 1,\ldots,n_r$ and $t = 1,\ldots,n_t$. We use this change of random variables to analyse the communication performance in the high-SNR regime.


3.2 Outage Diversity in Block-Fading Channels

We have defined the outage diversity $d_{\mathrm{csir}}$ in Section 2.5, i.e.,

$$d_{\mathrm{csir}} = \lim_{\mathrm{SNR}\to\infty}\frac{-\log P_{\mathrm{out}}(R)}{\log\mathrm{SNR}} \qquad (3.8)$$

where

$$P_{\mathrm{out}}(R) = \Pr\bigl\{I^{\star}(\boldsymbol{H}) < R\bigr\} \qquad (3.9)$$

and where $I^{\star}(H)$ is the mutual information maximised over all input distributions $P_X(x)$ on $\mathcal{X}^{n_t}$. We have also shown in Section 2.5 that for a given input alphabet $\mathcal{X}$, $d_{\mathrm{csir}}$ is the largest diversity that a code may have.

As mentioned in Section 3.1, we focus on Gaussian and discrete signal constellations. The motivation for using a Gaussian constellation is that for power-limited channels with additive Gaussian noise and perfect CSIR, the input distribution that maximises the mutual information (maximised over all possible $P_X(x)$ on $\mathbb{C}^{n_t}$) is Gaussian. However, the application of a Gaussian constellation to code design is limited in practice. The main reason is that Gaussian distributions have unbounded support. This is not desirable from a practical perspective, since the transmission power cannot be peak-limited and we operate with an infinite alphabet size. This motivates a study of systems with discrete alphabets, for which the alphabet size is fixed and each constellation point has finite energy.

Note that the evaluation of I⋆(H) requires an input distribution PX(x) over

Xnt that maximises the mutual information. It has been shown in the literature

that the optimal input distribution depends on the operating SNR. At high

SNR, the nt-variate complex-Gaussian distribution with zero mean and identity

covariance matrix Int is optimal in terms of diversity for continuous alphabets [20].

On the other hand, for an nt-variate discrete alphabet Xnt, the optimal input

distribution at high SNR is given by independent and equiprobable elements

X1, . . . , Xnt (where Xt is the t-th element ofX) on X since equiprobable elements

maximise the entropy [51]. At low SNR, for both Gaussian and discrete signal

constellations, the optimal input distribution is characterised by the nt elements

that are fully correlated to better combat the noise [51, 52].

In this dissertation, since we are particularly interested in the high-SNR

regime, we shall only consider input distributions that achieve the optimal di-

versity dcsir for a given alphabet, i.e. zero-mean identity-covariance-matrix nt-


variate complex-Gaussian distribution and independent and equiprobable ele-

ments X1, . . . , Xnt on X. With these input distributions, it suffices to consider

I(H) instead of I⋆(H) to find the outage diversity (3.8).

Recall the mutual information for the block-fading channel (2.8),

$$I(H) = \frac{1}{B}\sum_{b=1}^{B} I_{\mathrm{awgn}}\!\left(\sqrt{\tfrac{\mathrm{SNR}}{n_t}}\,H_b\right) \qquad (3.10)$$

where $I_{\mathrm{awgn}}(\Psi)$ is the mutual information of an AWGN MIMO channel for a given channel matrix $\Psi$. In particular, the mutual information is given by

$$I_{\mathrm{awgn}}\!\left(\sqrt{\tfrac{\mathrm{SNR}}{n_t}}\,H_b\right) = \log_2\det\!\left(I_{n_r} + \frac{\mathrm{SNR}}{n_t}\,H_b H_b^{\dagger}\right) \qquad (3.11)$$

for Gaussian inputs and

$$I_{\mathrm{awgn}}\!\left(\sqrt{\tfrac{\mathrm{SNR}}{n_t}}\,H_b\right) = M n_t - \mathbb{E}\!\left[\log_2\sum_{x'\in\mathcal{X}^{n_t}} e^{-\left\|\sqrt{\frac{\mathrm{SNR}}{n_t}}\,H_b(X - x') + Z\right\|^2 + \|Z\|^2}\right] \qquad (3.12)$$

for discrete inputs, where $M \triangleq \log_2|\mathcal{X}|$. To the best of our knowledge there is no closed-form expression for the expectation term in the mutual information for discrete inputs. However, this expectation can be computed efficiently using Gauss-Hermite quadratures [53] for systems of small size; for larger sizes the above expectation needs to be evaluated using Monte Carlo methods. It follows that for both Gaussian and discrete signal constellations, the outage diversity satisfies the dot equality

$$\Pr\bigl\{I(\boldsymbol{H}) < R\bigr\} \doteq \mathrm{SNR}^{-d_{\mathrm{csir}}}. \qquad (3.13)$$
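For Gaussian inputs, the outage probability in (3.9) with the mutual information (3.10)-(3.11) can be estimated by straightforward Monte Carlo simulation; the sketch below is illustrative only (Rayleigh fading, arbitrary parameter and trial choices) and is not the dissertation's simulation code.

```python
import numpy as np

rng = np.random.default_rng(2)

def outage_probability(R, snr, B, nt, nr, trials=20_000):
    """Estimate Pr{ (1/B) sum_b log2 det(I + (SNR/nt) H_b H_b^H) < R } for Rayleigh fading."""
    outages = 0
    for _ in range(trials):
        I = 0.0
        for _ in range(B):
            H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
            A = np.eye(nr) + (snr / nt) * H @ H.conj().T
            I += np.log2(np.linalg.det(A).real)
        if I / B < R:
            outages += 1
    return outages / trials

print(outage_probability(R=2.0, snr=10 ** 2, B=2, nt=2, nr=1))   # 20 dB, illustrative
```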

We revisit existing results on the optimal SNR-exponents for both Gaussian

and discrete inputs with perfect CSIR.

Lemma 3.1. Consider transmission over a MIMO block-fading channel at fixed rate $R$ with fading model parameters $w_0$, $w_1$, $w_2$, $\tau$ and $\phi$ as described in (3.4) and perfect CSIR, using a Gaussian constellation or a discrete signal constellation of size $2^M$. Then,

$$P_{\mathrm{out}}(R) \doteq \Pr\bigl\{I(\boldsymbol{H}) < R\bigr\} \doteq \mathrm{SNR}^{-d_{\mathrm{csir}}} \qquad (3.14)$$


where

$$d_{\mathrm{csir}} = \begin{cases}\left(1+\tfrac{\tau}{2}\right)Bn_tn_r & \text{for Gaussian inputs}\\ \left(1+\tfrac{\tau}{2}\right)d_{\mathrm{SB}}(R) & \text{for discrete inputs,}\end{cases} \qquad (3.15)$$

and

$$d_{\mathrm{SB}}(R) = n_r\left(1 + \left\lfloor B\left(n_t - \frac{R}{M}\right)\right\rfloor\right) \qquad (3.16)$$

is the Singleton bound for MIMO channels. The exponent $d_{\mathrm{csir}}$ is achieved by random codes for both Gaussian and equiprobable discrete inputs.

Proof. For Gaussian inputs, the proof is outlined in [47] by setting the multiplexing gain to zero. The proof for discrete inputs, extending the results of [32] to the general fading model in (3.4), is outlined in Appendix A.1.

The results in Lemma 3.1 show the interplay among the system and channel parameters in determining the optimal SNR-exponents. For any positive target rate, Gaussian inputs achieve the maximum diversity. On the other hand, the diversity achieved by discrete inputs has a trade-off with the target rate, given by the Singleton bound. Note that

$$\lim_{M\to\infty}\left\lfloor B\left(n_t - \frac{R}{M}\right)\right\rfloor = Bn_t - 1, \qquad (3.17)$$

which implies that sufficiently large constellations can always achieve the maximum diversity. The diversity characterisation for discrete inputs provides a benchmark for the error performance of practical codes. A good practical code must have an SNR-exponent $d^{\star}$ in (2.56) that achieves the scaled Singleton bound $\left(1+\tfrac{\tau}{2}\right)d_{\mathrm{SB}}(R)$.

3.3 Generalised Outage Diversity

In this section, we describe the behaviour of the generalised outage probability for codebooks that are generated from i.i.d. Gaussian and discrete inputs. In particular, we study the high-SNR regime characterised by the generalised outage diversity defined in (2.67),

$$d_{\mathrm{icsir}} = \lim_{\mathrm{SNR}\to\infty}\frac{-\log P_{\mathrm{gout}}(R)}{\log\mathrm{SNR}} \qquad (3.18)$$

where

$$P_{\mathrm{gout}}(R) = \Pr\bigl\{I^{\mathrm{gmi}}(\hat{\boldsymbol{H}}) < R\bigr\} \qquad (3.19)$$


and $\hat{\boldsymbol{H}} = \boldsymbol{H} + \boldsymbol{E}$. For a given $\boldsymbol{H} = H$ and $\boldsymbol{E} = E$, the GMI is given by

$$I^{\mathrm{gmi}}(\hat{H}) = \sup_{s>0}\frac{1}{B}\sum_{b=1}^{B} I_b^{\mathrm{gmi}}(\mathrm{SNR},H_b,\hat{H}_b,s) \qquad (3.20)$$

where

$$I_b^{\mathrm{gmi}}(\mathrm{SNR},H_b,\hat{H}_b,s) = \mathbb{E}\!\left[\log_2\frac{Q_{Y|X,\hat{H}}^{s}(Y|X,\hat{\boldsymbol{H}}_b)}{\mathbb{E}\!\left[Q_{Y|X,\hat{H}}^{s}(Y|X',\hat{\boldsymbol{H}}_b)\,\middle|\,Y,\hat{\boldsymbol{H}}_b,\boldsymbol{E}_b\right]}\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right]. \qquad (3.21)$$

Note that if $\hat{H} = H$, then $I^{\mathrm{gmi}}(\hat{H})$ is equal to $I(H)$. The evaluation of (3.21) gives

$$I_b^{\mathrm{gmi}}(\mathrm{SNR},H_b,\hat{H}_b,s) = \log_2\det\Sigma_y - \frac{s}{\log 2}\left(n_r + \frac{\mathrm{SNR}}{n_t}\,\|H_b - \hat{H}_b\|_F^2\right) + \frac{\mathbb{E}\!\left[s\,Y^{\dagger}\Sigma_y^{-1}Y\,\middle|\,\boldsymbol{H}_b = H_b,\boldsymbol{E}_b = E_b\right]}{\log 2} \qquad (3.22)$$

for Gaussian inputs, where

$$\Sigma_y \triangleq I_{n_r} + s\,\hat{H}_b\hat{H}_b^{\dagger}\,\frac{\mathrm{SNR}}{n_t}, \qquad (3.23)$$

and

$$I_b^{\mathrm{gmi}}(\mathrm{SNR},H_b,\hat{H}_b,s) = M n_t - \mathbb{E}\!\left[\log_2\sum_{x'\in\mathcal{X}^{n_t}} e^{-s\left\|\sqrt{\frac{\mathrm{SNR}}{n_t}}(H_b X - \hat{H}_b x') + Z\right\|^2 + s\left\|\sqrt{\frac{\mathrm{SNR}}{n_t}}(H_b - \hat{H}_b)X + Z\right\|^2}\right] \qquad (3.24)$$

for discrete inputs. It follows from Proposition 2.4 that the generalised outage diversity is always upper-bounded by the outage diversity, i.e., $d_{\mathrm{icsir}} \leq d_{\mathrm{csir}}$.
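As an illustration of how (3.24) and the supremum in (3.20) can be evaluated numerically, the following sketch estimates the per-block GMI for BPSK inputs by Monte Carlo and performs a coarse grid search over $s$; the channel values, SNR and trial counts are assumptions made only for this sketch, and a convex solver could replace the grid search since the objective is concave in $s$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)

def gmi_block_bpsk(H, Hhat, snr, s, trials=2_000):
    """Monte Carlo estimate of (3.24) for BPSK inputs (M = 1 bit per symbol)."""
    nr, nt = H.shape
    points = np.array(list(itertools.product([1.0, -1.0], repeat=nt)))  # all x' in X^nt
    a = np.sqrt(snr / nt)
    vals = np.empty(trials)
    for k in range(trials):
        x = points[rng.integers(len(points))]                 # equiprobable input X
        z = (rng.standard_normal(nr) + 1j * rng.standard_normal(nr)) / np.sqrt(2)
        ref = s * np.sum(np.abs(a * (H - Hhat) @ x + z) ** 2)
        d = np.array([np.sum(np.abs(a * (H @ x - Hhat @ xp) + z) ** 2) for xp in points])
        vals[k] = np.log2(np.sum(np.exp(-s * d + ref)))
    return nt - np.mean(vals)                                 # M * nt with M = 1

# coarse grid search over s > 0 for the supremum in (3.20), one block only
H = np.array([[0.8 + 0.2j, -0.5 + 0.1j]])
Hhat = H + 0.1 * (rng.standard_normal(H.shape) + 1j * rng.standard_normal(H.shape))
print(max(gmi_block_bpsk(H, Hhat, snr=10.0, s=s) for s in np.linspace(0.05, 2.0, 20)))
```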

In the following, we provide a precise fundamental relationship between dicsir

and dcsir. The converse results, which use the large block length results on the

generalised outage probability in Proposition 2.4, are summarised in the following

theorem.

Theorem 3.1 (Converse). Consider the MIMO block-fading channel, imperfect

CSIR and fading model described by (3.3), (3.5) and (3.4), respectively. Then,

for high SNR, the generalised outage probability using nearest neighbour decoding


based on (3.7) behaves as

$$P_{\mathrm{gout}}(R) = \Pr\bigl\{I^{\mathrm{gmi}}(\hat{\boldsymbol{H}}) < R\bigr\} \doteq \mathrm{SNR}^{-d_{\mathrm{icsir}}} \qquad (3.25)$$

where

$$d_{\mathrm{icsir}} = \min(1, d_e)\times d_{\mathrm{csir}} \qquad (3.26)$$

is the generalised outage SNR-exponent or the generalised outage diversity. This relationship holds for code constructions based on both i.i.d. Gaussian and discrete inputs.

Proof. We use bounding techniques to prove the result. The lower bound is

derived by evaluating the GMI structure for each type of input distribution. The

upper bound uses the upper bound in Proposition 2.3. See Appendices A.2 (for

discrete inputs) and A.3 (for Gaussian inputs).

Remark 3.1. Standard proof methodologies for deriving an upper bound on the SNR-exponent for MIMO channels with perfect CSIR employ a genie-aided receiver [32], which eliminates the interference among the $n_t$ transmit antennas. It follows that the mutual information of $n_t$ parallel single-input multiple-output (SIMO) channels, each with $n_r$ receive antennas, serves as an upper bound on the mutual information of the $n_r\times n_t$ MIMO channel. However, this approach may not work for mismatched CSIR. In general, mismatched decoding introduces additional interference during the decoding process, and this interference may not be reduced by having a genie-aided receiver that decomposes the MIMO channel into parallel SIMO channels.

Theorem 3.1 gives an upper bound on the decay rate of the average error probability as a function of the SNR for sufficiently long codes. The results also provide a precise fundamental relationship between the perfect- and imperfect-CSIR SNR-exponents. Suppose that a noisy channel estimator produces a Gaussian random estimation error with variance $\sigma_e^2 = \mathrm{SNR}^{-d_e}$. Then, Theorem 3.1 shows that the imperfect-CSIR SNR-exponent is a linear function of the perfect-CSIR SNR-exponent with a scaling factor of $\min(1,d_e)$.

The intuition behind the scaling factor $\min(1,d_e)$ is as follows. Channel estimation errors introduce supplementary outage events, adding to those due to deep fades [20, 23]. Therefore, the generalised outage set contains the perfect-CSIR outage set, and a generalised outage occurs when there is a deep fade or when the channel estimation error is high.
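A small numerical sketch of this relationship for discrete inputs, combining (3.26) with (3.15)-(3.16), is given below; the parameters mirror Figure 3.3, but the rate is an arbitrary choice made only for this illustration.

```python
import math

def d_csir_discrete(R, B, nt, nr, M, tau):
    """Perfect-CSIR exponent (3.15)-(3.16) for discrete inputs."""
    d_sb = nr * (1 + math.floor(B * (nt - R / M)))
    return (1 + tau / 2) * d_sb

def d_icsir(R, B, nt, nr, M, tau, d_e):
    """Imperfect-CSIR exponent (3.26): min(1, d_e) times the perfect-CSIR exponent."""
    return min(1.0, d_e) * d_csir_discrete(R, B, nt, nr, M, tau)

for d_e in (0.05, 0.5, 1.0):
    print(d_e, d_icsir(R=2.0, B=4, nt=2, nr=2, M=4, tau=0, d_e=d_e))
```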

The above analysis also shows that the phases of the fading and of the chan-

nel estimation error play no role in determining the SNR-exponents for both


Gaussian and discrete signal constellations. However, as shown in the proof (Ap-

pendix A.2), it seems that the phases affect high-SNR outage events for discrete

signal constellations; the exact effect depends on the configuration of the specific

signal constellation.

3.4 Random Coding Achievability

For large block length, as shown in (2.32) and (2.33), it suffices to study Pgout(R)

to characterise the error probability for i.i.d. codebooks. However, practical

wireless communications typically operate with a fixed and finite block length.

Herein, we present the results of achievable random coding SNR-exponents for a

given block length J . In this context, we use the generalised Gallager exponents

[35, 37] and evaluate a lower bound to the SNR-exponents for any fixed length.

We first state the SNR-exponent achieved by random codes with Gaussian

constellations for any τ in the fading model (3.4). We will then provide a tighter

block length threshold for τ = 0.

Theorem 3.2 (Achievability - Gaussian Inputs). Consider the MIMO block-fading channel (3.3) with fading distribution (3.4) and data rate growing with the logarithm of the SNR at multiplexing gain $r_g \geq 0$ as defined in (2.57). In the presence of mismatched CSIR (3.5), there exists a Gaussian random code whose average error probability is upper-bounded as

$$P_{e,\mathrm{ave}}(\mathrm{SNR}) \;\dot{\leq}\; \mathrm{SNR}^{-d_G^{\ell}(r_g)} \qquad (3.27)$$

where

$$d_G^{\ell}(r_g) = \inf_{\mathcal{A}_G^c\,\cap\,\{A\succeq 0,\;\Theta\succeq d_e\mathbf{1}\}}\left\{\left(1+\frac{\tau}{2}\right)\sum_{b=1}^{B}\sum_{r=1}^{n_r}\sum_{t=1}^{n_t}\alpha_{b,r,t} + \sum_{b=1}^{B}\sum_{r=1}^{n_r}\sum_{t=1}^{n_t}(\theta_{b,r,t} - d_e) + J\left(\sum_{b=1}^{B}\bigl[\min(1,\theta_{\min}) - \alpha_{b,\min}\bigr]^{+} - Br_g\right)\right\} \qquad (3.28)$$

and $\mathcal{A}_G^c$ is the complement of the set

$$\mathcal{A}_G \triangleq \left\{A,\Theta \in \mathbb{R}^{B\times n_r\times n_t} : \sum_{b=1}^{B}\bigl[\min(1,\theta_{\min}) - \alpha_{b,\min}\bigr]^{+} \leq Br_g\right\}, \qquad (3.29)$$


where

$$\theta_{\min} \triangleq \min\{\theta_{1,1,1},\ldots,\theta_{b,r,t},\ldots,\theta_{B,n_r,n_t}\}, \qquad (3.30)$$
$$\alpha_{b,\min} \triangleq \min\{\alpha_{b,1,1},\ldots,\alpha_{b,r,t},\ldots,\alpha_{b,n_r,n_t}\}. \qquad (3.31)$$

The function $d_G^{\ell}(r_g)$ can be computed explicitly for any block length $J \geq 1$.

Proof. See Appendix A.4.

Corollary 3.1. Following Theorem 3.2, the lower bound on the SNR-exponent for Gaussian inputs with multiplexing gain $r_g \downarrow 0$ is given by

$$d_G^{\ell} = \lim_{r_g\downarrow 0} d_G^{\ell}(r_g) = \begin{cases} d_{\mathrm{icsir}} & \text{for } J \geq \left\lceil\left(1+\tfrac{\tau}{2}\right)n_tn_r\right\rceil\\ BJ\min(1,d_e) & \text{otherwise.}\end{cases} \qquad (3.32)$$

Proof. The proof is obtained by solving the infimum in Theorem 3.2 for a given length $J$. The optimiser of $\Theta$ for $d_G^{\ell}(r_g)$ is given by $\Theta^{*} = d_e\times\mathbf{1}$, since an increase in $\theta_{\min}$ increases $\alpha_{b,\min}$ in the constraint set, and since by definition $\theta_{b,r,t} \geq \theta_{\min}$. To find the $\alpha_{b,r,t}^{*}$ that give the infimum in (3.28), let $\alpha_{b,r,t} \in [0,\min(1,d_e)]$, and let $\alpha_{b,\min}^{*}$ be the value of $\alpha_{b,\min}$ that is tight in $\mathcal{A}_G^c$. Since $\alpha_{b,r,t} \geq \alpha_{b,\min}$, the infimum solution for the remaining $\alpha_{b,r,t}$ is given by $\alpha_{b,r,t}^{*} = \alpha_{b,\min}^{*}$. Using these, it is straightforward to show (3.32).
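The closed form in (3.32) is simple to evaluate; the sketch below (illustrative parameter choices only) shows how the lower bound saturates at $d_{\mathrm{icsir}}$ once the block length reaches the threshold.

```python
import math

def d_lower_gaussian(J, B, nt, nr, tau, d_e):
    """Lower bound (3.32) on the Gaussian-input random-coding SNR-exponent at r_g -> 0."""
    d_icsir = min(1.0, d_e) * (1 + tau / 2) * B * nt * nr
    J_threshold = math.ceil((1 + tau / 2) * nt * nr)
    return d_icsir if J >= J_threshold else B * J * min(1.0, d_e)

for J in (1, 2, 4, 8):
    print(J, d_lower_gaussian(J, B=4, nt=2, nr=2, tau=0, d_e=1.0))   # saturates at J = 4
```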

For τ = 0, we have the following proposition that provides a tighter achiev-

ability bound than Theorem 3.2.

Proposition 3.1. Consider the MIMO block-fading channel (3.3) with fading model (3.4) for $\tau = 0$, imperfect CSIR (3.5), data rate growing with the logarithm of the SNR at multiplexing gain $r_g \geq 0$ as defined in (2.57), and $n^{*} = \min(n_t,n_r)$. Let $\lambda_b$ be a row vector consisting of the non-zero eigenvalues of $H_bH_b^{\dagger}$, with $0 < \lambda_{b,1} \leq \cdots \leq \lambda_{b,n^{*}}$, and let $\lambda \triangleq \{\lambda_1,\ldots,\lambda_b,\ldots,\lambda_B\}$. Define the variable $\upsilon_{b,i}$ as $\upsilon_{b,i} \triangleq -\frac{\log\lambda_{b,i}}{\log\mathrm{SNR}}$. There exists a Gaussian random code whose average error probability is upper-bounded as

$$P_{e,\mathrm{ave}}(\mathrm{SNR}) \;\dot{\leq}\; \mathrm{SNR}^{-d_G^{\ell}(r_g)} \qquad (3.33)$$


where

$$d_G^{\ell}(r_g) = \inf_{\mathcal{A}_G^c\,\cap\,\{\upsilon\succeq 0,\;\Theta\succeq d_e\mathbf{1}\}}\left\{\sum_{b=1}^{B}\sum_{i=1}^{n^{*}}\bigl(2i-1+|n_t-n_r|\bigr)\upsilon_{b,i} + \sum_{b=1}^{B}\sum_{r=1}^{n_r}\sum_{t=1}^{n_t}(\theta_{b,r,t} - d_e) + J\left(\sum_{b=1}^{B}\sum_{i=1}^{n^{*}}\bigl[\min(1,\theta_{\min}) - \upsilon_{b,i}\bigr]^{+} - Br_g\right)\right\} \qquad (3.34)$$

and $\mathcal{A}_G^c$ is the complement of the set

$$\mathcal{A}_G \triangleq \left\{\upsilon \in \mathbb{R}^{B\times n^{*}},\;\Theta \in \mathbb{R}^{B\times n_r\times n_t} : \sum_{b=1}^{B}\sum_{i=1}^{n^{*}}\bigl[\min(1,\theta_{\min}) - \upsilon_{b,i}\bigr]^{+} \leq Br_g\right\}, \qquad (3.35)$$

where

$$\theta_{\min} \triangleq \min\{\theta_{1,1,1},\ldots,\theta_{b,r,t},\ldots,\theta_{B,n_r,n_t}\}. \qquad (3.36)$$

The function $d_G^{\ell}(r_g)$ can be computed explicitly for any given block length $J \geq 1$.

Proof. See Appendix A.6.

The optimiser of $\Theta$ in (3.34) is given by $\Theta^{*} = d_e\times\mathbf{1}$ because an increase in $\theta_{\min}$ increases $\upsilon_{b,i}$ in the objective function and the constraint set. To find the $\upsilon_{b,i}^{*}$ that give the infimum in (3.34), let $\upsilon_{b,i} \in [0,\min(1,d_e)]$. Following [20], for $J \geq n_t + n_r - 1$ the infimum always occurs with $\sum_{b=1}^{B}\sum_{i=1}^{n^{*}}\bigl(\min(1,d_e) - \upsilon_{b,i}^{*}\bigr) = Br_g$. For $r_g \downarrow 0$ (fixed coding rate), a block length $J \geq n_t + n_r - 1$ leads to

$$d_G^{\ell} = \lim_{r_g\downarrow 0} d_G^{\ell}(r_g) = \min(1,d_e)\,Bn_tn_r. \qquad (3.37)$$

Using similar steps to those in [20], if $J < n_t + n_r - 1$, then $d_G^{\ell}$ is not tight to $d_{\mathrm{icsir}}$ and the random-coding achievable SNR-exponent is given by solving the infimum of $d_G^{\ell}(r_g)$ in (3.34) as $r_g \downarrow 0$, which is strictly smaller than $d_{\mathrm{icsir}}$.

Therefore, for random codes with Gaussian constellations, the lower bound on the SNR-exponent is $d_{\mathrm{icsir}}$ as long as the block length satisfies

$$J \geq \begin{cases} n_t + n_r - 1 & \text{for } \tau = 0\\ \left\lceil\left(1+\tfrac{\tau}{2}\right)n_tn_r\right\rceil & \text{otherwise.}\end{cases} \qquad (3.38)$$

The achievability of the random codes constructed over discrete alphabets of

size |X| = 2M for a given block length J is summarised as follows.


Theorem 3.3 (Achievability - Discrete Inputs). Let the block length $J$ grow as $\lim_{\mathrm{SNR}\to\infty}\frac{J(\mathrm{SNR})}{\log\mathrm{SNR}} = \omega$, $\omega \geq 0$. Then, there exists a random code constructed over a discrete input alphabet of size $|\mathcal{X}| = 2^M$ such that the average error probability with mismatched CSIR (3.5) is upper-bounded by

$$P_{e,\mathrm{ave}}(\mathrm{SNR}) \;\dot{\leq}\; \mathrm{SNR}^{-d_{\mathcal{X}}^{\ell}(R)} \qquad (3.39)$$

where

$$d_{\mathcal{X}}^{\ell}(R) = \omega BM\log 2\left(n_t - \frac{R}{M}\right) \qquad (3.40)$$

for $0 \leq \omega < \frac{\min(1,d_e)\left(1+\frac{\tau}{2}\right)n_r}{M\log 2}$,

$$d_{\mathcal{X}}^{\ell}(R) = \omega M\log 2\left(1 + \left\lfloor\frac{BR}{M}\right\rfloor - \frac{BR}{M}\right) + \min(1,d_e)\left(1+\frac{\tau}{2}\right)n_r\left(\left\lceil B\left(n_t - \frac{R}{M}\right)\right\rceil - 1\right) \qquad (3.41)$$

for $\frac{\min(1,d_e)\left(1+\frac{\tau}{2}\right)n_r}{M\log 2} \leq \omega < \frac{1}{M\log 2}\left(\frac{\min(1,d_e)\left(1+\frac{\tau}{2}\right)n_r}{1+\left\lfloor\frac{BR}{M}\right\rfloor-\frac{BR}{M}}\right)$, and

$$d_{\mathcal{X}}^{\ell}(R) = \min(1,d_e)\left(1+\frac{\tau}{2}\right)n_r\left\lceil B\left(n_t - \frac{R}{M}\right)\right\rceil \qquad (3.42)$$

for $\omega \geq \frac{1}{M\log 2}\left(\frac{\min(1,d_e)\left(1+\frac{\tau}{2}\right)n_r}{1+\left\lfloor\frac{BR}{M}\right\rfloor-\frac{BR}{M}}\right)$.

Proof. See Appendix A.7.

The assumption that the block length $J$ grows with the SNR as $J(\mathrm{SNR}) = \omega\log\mathrm{SNR}$, $\omega \geq 0$, is made to obtain a more precise characterisation of what can be achieved using random codes with a discrete constellation of size $2^M$. The results show the interplay among the growth rate $\omega$, the cardinality of the constellation $2^M$ and the achievable SNR-exponent with random coding.
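The piecewise expression (3.40)-(3.42) can be evaluated directly; the following sketch (parameters chosen to mirror Figure 3.2, but purely illustrative) computes $d_{\mathcal{X}}^{\ell}(R)$ for a given growth rate $\omega$.

```python
import math

def d_lower_discrete(R, omega, B, nt, nr, M, tau, d_e):
    """Achievable exponent of Theorem 3.3 as a function of the growth rate omega."""
    c = min(1.0, d_e) * (1 + tau / 2) * nr            # recurring factor min(1,d_e)(1+tau/2)n_r
    frac = 1 + math.floor(B * R / M) - B * R / M      # 1 + floor(BR/M) - BR/M
    w1 = c / (M * math.log(2))
    w2 = w1 / frac
    if omega < w1:                                    # regime (3.40)
        return omega * B * M * math.log(2) * (nt - R / M)
    if omega < w2:                                    # regime (3.41)
        return omega * M * math.log(2) * frac + c * (math.ceil(B * (nt - R / M)) - 1)
    return c * math.ceil(B * (nt - R / M))            # regime (3.42)

# omega such that omega*M*log(2) = 2, as in one of the curves of Figure 3.2
print(d_lower_discrete(R=2.0, omega=2 / (4 * math.log(2)), B=4, nt=2, nr=2, M=4, tau=0, d_e=0.5))
```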

Remark 3.2. By letting $\omega\to\infty$, we have that

$$d_{\mathcal{X}}^{\ell}(R) = \min(1,d_e)\left(1+\frac{\tau}{2}\right)n_r\left\lceil B\left(n_t - \frac{R}{M}\right)\right\rceil \qquad (3.43)$$
$$\leq \min(1,d_e)\left(1+\frac{\tau}{2}\right)n_r\left(1 + \left\lfloor B\left(n_t - \frac{R}{M}\right)\right\rfloor\right) \qquad (3.44)$$
$$= d_{\mathrm{icsir}}. \qquad (3.45)$$

This implies that $d_{\mathcal{X}}^{\ell}(R)$ is equal to $d_{\mathrm{icsir}}$ only at the continuous points of the


Singleton bound.

[Figure 3.2: Random coding SNR-exponent lower bound $d_{\mathcal{X}}^{\ell}(R)$ for discrete signal codebooks as a function of the target rate $R$ (in bits per channel use), for $B = 4$, $n_t = 2$, $n_r = 2$, $\tau = 0$, $M = 4$ and $d_e = 0.5$; curves for $\omega M\log 2 \to \infty$, $\omega M\log 2 = 2$ and $\omega M\log 2 = 1/2$.]

Hence, random codes based on discrete constellations are only

able to achieve $d_{\mathrm{icsir}}$ at the continuous points of the Singleton bound for all possible $R$ when the block length grows as $\log\mathrm{SNR}$ with a very large growth rate. Herein we have also shown analytically that the block length must grow with $\log\mathrm{SNR}$ at a certain growth rate $\omega > 0$ to obtain a non-zero random-coding SNR-exponent. This is illustrated in Figure 3.2. To achieve the perfect-CSIR outage diversity at the continuous points of the Singleton bound, the growth rate $\omega$ should approach infinity, and the channel estimator should provide reliable estimates ($d_e \geq 1$).

Figure 3.2 also explains the case of a finite-valued $\omega$. If $\omega$ is fixed and satisfies $\omega M\log 2 < \min(1,d_e)\left(1+\tfrac{\tau}{2}\right)n_r$, then $d_{\mathcal{X}}^{\ell}(R)$ is strictly smaller than $d_{\mathrm{icsir}}$ for any positive rate $R$. On the other hand, if $\omega$ is fixed and satisfies $\omega M\log 2 \geq \min(1,d_e)\left(1+\tfrac{\tau}{2}\right)n_r$, then $d_{\mathcal{X}}^{\ell}(R)$ is equal to $d_{\mathrm{icsir}}$ for some values of $R$ (as indicated by the dashed line in Figure 3.2); a larger $\omega$ implies a larger range of values of $R$ for which $d_{\mathcal{X}}^{\ell}(R)$ equals $d_{\mathrm{icsir}}$. As $\omega$ tends to infinity, random coding achieves $d_{\mathrm{icsir}}$ for all $R$ except at the discontinuity points of the Singleton bound (as shown by the solid line in Figure 3.2).


3.5 Discussion

3.5.1 Insights for System Design

The following observations can be obtained from Theorems 3.1, 3.2, 3.3 and

Proposition 3.1.

1. The optimal SNR-exponent for any coding scheme can be obtained when

de ≥ 1. The converse on the SNR-exponent is strong since Pgout(R) has the

same exponential decay in SNR as Pout(R). We need both a good channel

estimation and a good code design to achieve the optimal SNR-exponent.

2. The term min(1, de) appears naturally from the data-processing inequality.

It highlights the importance of having channel estimators that can achieve

the CSIR-error diversity de ≥ 1. We are able to achieve the perfect-CSIR

SNR-exponent provided that de ≥ 1. If de < 1, the resulting SNR-exponent

scales linearly with de and approaching zero for de ↓ 0. Figure 3.3 illustrates

this effect in a discrete-input block-fading channel with B = 4, τ = 0,

nt = 2 and nr = 2. In a block-fading setup, this result provides a more

precise characterisation on the accuracy of channel estimation at high SNR

than [54].

3. The role of the channel estimation error diversity $d_e$ is governed by the channel estimation model. With a maximum-likelihood (ML) estimator, it can be shown that $d_e$ is proportional to the pilot power [49]. Larger pilot power implies larger $d_e$. Hence, the price for obtaining high outage diversity is paid in pilot power, which does not carry any information data. The bounding condition $\min(1,d_e)$ implies that the perfect-CSIR outage diversity can be achieved with $d_e = 1$. Note that although a larger $d_e$ with $d_e > 1$ yields no further diversity improvement, it still leads to better outage performance. As $d_e$ tends to infinity, the outage performance converges to that with perfect CSIR.

4. The outage diversity in Theorem 3.1 is valid for the general fading model

described by (3.4). This fading model is used extensively in analysing the

performance of radio-frequency (RF) wireless communications.

5. For a given de ≥ 1, Gaussian random codes with finite block length can

achieve the perfect-CSIR outage diversity (1 + τ/2)Bntnr as long as the block

length is larger than a threshold. On the other hand, discrete-alphabet


Figure 3.3: Generalised outage SNR-exponent dicsir for the discrete-input block-fading channel as a function of R/M, with B = 4, τ = 0 (Rayleigh, Rician and Nakagami-q fading), nt = 2 and nr = 2. Curves are shown for de = 1, de = 0.5 and de = 0.05.

random codes with finite block length cannot achieve the perfect-CSIR

outage diversity (1 + τ/2)dSB(R). In order for these random codes to achieve (1 + τ/2)dSB(R) almost everywhere, the block length needs to grow as ω log SNR.

Figures 3.4 and 3.5 provide simulation results of the generalised outage proba-

bility Pgout(R) for Gaussian and binary phase-shift keying (BPSK) inputs, res-

pectively, over a MIMO Rayleigh block-fading channel with nt = 2 and nr = 1.

The following parameters are specified: B = 2 and R = 2 bits per channel use for Gaussian inputs, and B = 2 and R = 1 bit per channel use for BPSK inputs. The curves were generated as follows. Monte Carlo simulation was used

to compute the number of outage events. Firstly, each entry of Hb and Eb,

b = 1, . . . , B were independently generated from zero-mean complex-Gaussian distributions with variance one and σ²e = SNR^{−de}, respectively. The values of de = 0.5, 1, and 2 were used for comparison with the perfect-CSIR outage probability. Secondly, for a fixed s > 0 and given the channel Hb and the channel estimate Ĥb = Hb + Eb, the quantities Igmib(SNR, Hb, Ĥb, s), b = 1, . . . , B, were computed for Gaussian inputs (3.22) and BPSK inputs (3.24). Note that for Gaussian inputs, the generalised


Figure 3.4: Generalised outage probability Pgout(R) versus SNR (dB) for the Gaussian-input MIMO Rayleigh block-fading channel with B = 2, R = 2, nt = 2 and nr = 1. Curves are shown for perfect CSIR and for de = 0.5, 1 and 1.25.

outage diversity may not be derived directly from (3.22), particularly due to the term E[sY†Σy⁻¹Y | Hb = Hb, Eb = Eb]. However, this term can be evaluated numerically using the singular value decomposition [86] as in Appendix A.3.2. On

the other hand, for BPSK inputs, we compute the expectation in (3.24) using the

Gauss-Hermite quadratures [53]. Thirdly, for a fixed Hb and Hb, the supremum

over s on the RHS of (3.20) was solved using standard convex optimisation

algorithm since the function

1

B

B∑

b=1

Igmib (SNR,Hb, Hb, s) (3.46)

is concave in s for s > 0.3.2 Finally, an outage event was declared whenever

Igmi(H) was less than R. Then, Pgout(R) was given by the ratio of the number

of outage events and the number of total transmissions. From the figures, we

observe that dcsir is equal to 4 and 3 for Gaussian and BPSK inputs, respectively.

As predicted by Theorem 3.1, the slope becomes steeper with increasing de,

eventually becoming parallel to the perfect CSIR outage curve for de ≥ 1. For

3.2 The concavity of Igmib(SNR, Hb, Ĥb, s) and (1/B) ∑_{b=1}^{B} Igmib(SNR, Hb, Ĥb, s) in s, s > 0, can be shown using the same technique used to prove the concavity of EQ0(s, ρ, Ĥb) in Section 2.4.2.1.


Figure 3.5: Generalised outage probability Pgout(R) versus SNR (dB) for the BPSK-input MIMO Rayleigh block-fading channel with B = 2, R = 1, nt = 2 and nr = 1. Curves are shown for perfect CSIR and for de = 0.5, 1 and 1.25.

de > 1, the slope does not increase as de increases. However, we still observe an improvement in the outage performance; the curve for de = 1.25 achieves the same Pgout(R) as the one for de = 1 at a lower SNR.
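The simulation procedure described above can be summarised in the following sketch. The per-block GMI evaluation is abstracted as a user-supplied function gmi_block implementing (3.22) or (3.24), which are not reproduced here; the names and interface are illustrative only, so this is a sketch of the simulation flow rather than the exact code used to generate Figures 3.4 and 3.5.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def estimate_pgout(snr, R, gmi_block, B=2, nt=2, nr=1, de=1.0, trials=10000):
    """Monte Carlo estimate of the generalised outage probability Pgout(R).

    `gmi_block(snr, H, H_hat, s)` is assumed to return the per-block quantity
    Igmib(SNR, Hb, Hhat_b, s), e.g. an evaluation of (3.22) for Gaussian
    inputs or (3.24) for BPSK inputs (not reproduced here).
    """
    sigma_e = snr ** (-de / 2.0)       # per-entry std dev so that sigma_e^2 = SNR^{-de}
    outages = 0
    for _ in range(trials):
        # Rayleigh fading and estimation error, drawn independently per fading block
        H = (np.random.randn(B, nr, nt) + 1j * np.random.randn(B, nr, nt)) / np.sqrt(2)
        E = sigma_e * (np.random.randn(B, nr, nt) + 1j * np.random.randn(B, nr, nt)) / np.sqrt(2)
        H_hat = H + E                                    # noisy CSIR

        # sup over s > 0 of (1/B) sum_b Igmib; since (3.46) is concave in s,
        # a one-dimensional bounded search (here over log s) suffices
        def neg_avg_gmi(log_s):
            s = np.exp(log_s)
            return -np.mean([gmi_block(snr, H[b], H_hat[b], s) for b in range(B)])
        best = minimize_scalar(neg_avg_gmi, bounds=(-10.0, 10.0), method='bounded')

        if -best.fun < R:                                # generalised outage event
            outages += 1
    return outages / trials
```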

3.5.2 The DMT for Gaussian Codebooks

In this work, we focus on fixed-rate transmission such that the multiplexing gain

rg tends to zero. The analysis can be extended to any positive multiplexing gain

(rg > 0), which is relevant for continuous inputs such as Gaussian inputs or

discrete inputs with alphabet size increasing with the SNR.

The analysis of a general positive multiplexing gain is implicitly covered in

Theorem 3.2 and Proposition 3.1. In the limit as the block length tends to

infinity, both results provide lower bounds to the optimal diversity-multiplexing

trade-off (DMT). Note that from Theorem 3.2, one may obtain dℓG(rg) for general

fading parameter τ and for J → ∞ as

dℓG(rg) = (1 + τ/2) Bntnr (min(1, de) − rg),   0 ≤ rg ≤ min(1, de).   (3.47)

Note that this lower bound can be loose since the maximum multiplexing gain for


a positive diversity is min(1, de) as compared to min(nt, nr) for the case of perfect

CSIR [20]. From Proposition 3.1, one may obtain dℓG(rg) for fading parameter τ = 0 and for J → ∞ as the trade-off with the multiplexing gain. Indeed, as shown in Appendix A.6, the lower bound of the optimal DMT curve dℓG(rg) for J → ∞ is given by the piecewise-linear function connecting the points (rg, dℓG(rg)), where

rg = 0, min(1, de), 2 min(1, de), . . . , min(nt, nr) min(1, de),   (3.48)

dℓG(rg) = min(1, de) · B (nt − rg/min(1, de)) · (nr − rg/min(1, de)).   (3.49)

Note that we have dℓG,max = min(1, de)Bntnr and rg,max = min(nt, nr)min(1, de).

This lower bound is tight for B = 1 and de ≥ 1, which is the perfect-CSIR

DMT [20].
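For concreteness, the piecewise-linear lower bound defined by (3.48)–(3.49) can be evaluated by interpolating between its corner points, as in the following illustrative sketch (the function name is ours, not the author's):

```python
import numpy as np

def dmt_lower_bound(rg, B, nt, nr, de):
    """Piecewise-linear lower bound d_G^l(r_g) from (3.48)-(3.49), tau = 0, J -> infinity."""
    m = min(1.0, de)
    k = min(nt, nr)
    corners_r = np.array([i * m for i in range(k + 1)])                        # (3.48)
    corners_d = np.array([m * B * (nt - i) * (nr - i) for i in range(k + 1)])  # (3.49) at r_g = i*m
    return np.interp(rg, corners_r, corners_d)

# maximum diversity min(1,de)*B*nt*nr at r_g = 0, zero at r_g = min(nt,nr)*min(1,de)
print(dmt_lower_bound(0.0, B=2, nt=2, nr=2, de=0.5), dmt_lower_bound(1.0, B=2, nt=2, nr=2, de=0.5))
```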

There are several reasons why the above bounds may not be tight for mis-

matched CSIR. The first one is that we only evaluate the above bounds based on

Gallager’s lower bound to the error exponent (2.24) which for large J yields an

upper bound to the generalised outage probability as shown in Appendices A.4

and A.6. Hence, these bounds are not an exact characterisation of the generalised

outage probability. The second one is that for the bound in Theorem 3.2, the

trade-off is derived by using the joint pdf of the entries of Hb and Eb. Note that

this leads to a further lower bound as shown in Appendix A.6. A tighter bound

is obtained by considering the analysis using the joint pdf of the eigenvalues.

However, this last approach has some technical difficulties particularly for τ 6= 0

as shown in Appendix A.6.

From Appendix A.3.2, we have an upper bound to the optimal DMT as

dGicsir(rg) ≤ (1 + τ/2) Bntnr min(1 − rg/min(nt, nr), de),   0 ≤ rg ≤ min(nt, nr).   (3.50)

Note the upper bound is trivial for any de ≤ 1 − rg/min(nt, nr) because this is identical

to the result with rg ↓ 0, which always upper-bounds the optimal DMT. The

upper bound (3.50) and the lower bound (3.47) are tight only for rg ↓ 0.

3.5.3 Optical Wireless Scintillation Distributions

In optical wireless scintillation channels, we mainly deal with the received signal

intensities, and not complex input symbols or complex fading realisations; thus

the use of real amplitude modulation such as pulse-position modulation (PPM)

is common [55–57]. This means that the channel phase is not being considered


in the detection and only the real-part of the complex-Gaussian noise affects the

decision. However, the mutual information and the GMI expressions in (3.10)

and (3.20) are valid for real-valued signals and real-valued fading responses as

well. Notice that in our converse and achievability results, we have used the

joint probability of A and . The parameter that distinguishes the resulting

SNR-exponents for different fading conditions is the channel parameter τ in the

form of 1 + τ2. This form comes out naturally from the pdf after defining αb,r,t ,

− log |hb,r,t|2log SNR

. Thus, as long as after performing the change of random variables, we

can express the pdf of the normalised fading gain for each channel matrix entry

as

PAb,r,t(α)

.= exp

(

−(

1 +τ

2

)

α log SNR)

, α ≥ 0 (3.51)

then our main results are valid for those fading distributions as well. Conse-

quently, the results are valid for fading distributions used in optical wireless

scintillation channels such as the lognormal-Rice distribution, for which (1 + τ/2) = 1/2, and the gamma-gamma distribution, for which (1 + τ/2) = (1/2) min(a, b), where a, b are the parameters of the individual gamma distributions [55–57].

3.6 Conclusion

We have examined the outage behaviour of nearest neighbour decoding in MIMO

block-fading channels with imperfect CSIR. By treating the problem as a mismat-

ched-decoding problem, we apply the GMI and the generalised outage probabil-

ity, the probability that the GMI is less than the data rate. Due to the data-

processing inequality for error exponents and mismatched decoders, the gener-

alised outage probability is larger than the outage probability of the perfect CSIR

case.

We have analysed the generalised outage probability in the high-SNR regime

and we have derived the diversity (SNR-exponent) for both Gaussian and discrete

inputs. We have shown that in both cases, the SNR-exponents are given by the

perfect-CSIR SNR-exponents scaled by the minimum of the channel estimation

error diversity—which measures the exponential decay of the channel estimation

error with respect to the SNR—and one. Note that the perfect-CSIR outage

diversity is the highest diversity that a code may have. Therefore, in order to

achieve the highest possible diversity, the channel estimator should be designed in

such a way so as to make the estimation error diversity equal to or larger than one.

Furthermore, the optimal SNR-exponent for Gaussian inputs can be achieved

using Gaussian random codes with finite block length as long as the block length


is greater than a threshold; this threshold depends on the fading distribution and

the number of antennas. On the other hand, for discrete-constellation random

codes, the optimal SNR-exponent can be achieved using a block length that grows very fast with log SNR. All these results apply to many fading distributions.

Our main result suggests that the nearest neighbour decoder can be reliable—

in the sense that it achieves the perfect-CSIR diversity—if the receiver can pro-

vide sufficiently accurate channel estimates. The analysis finds applications in

pilot-based channel estimation for which achieving the perfect-CSIR diversity is

possible by allocating the same amount of power to both data and pilot. Note

that even though increasing the pilot power beyond the data power does not

enhance the diversity, it generally improves the generalised outage probability

and closes the gap with the perfect-CSIR outage probability.

Given that the channel estimate is sufficiently accurate, the design of good

codes with imperfect CSIR has the same performance benchmark (in terms of the

diversity) as that with perfect CSIR. Our random coding achievability analysis

suggests that such good codes do exist. If the channel estimate is not accurate

enough, then our imperfect-CSIR generalised outage diversity may not be the

highest diversity. One can possibly achieve a better performance by using a

more complicated coding scheme (such as the ones with non-i.i.d. inputs) and

decoding scheme (such as the ones that decode without having to perform explicit

channel estimation).


Chapter 4

IR-ARQ in MIMO Block-Fading

Channels with Imperfect

Feedback and CSIR

In Chapter 2, we have shown that the outage diversity (with perfect CSIR) is the

highest diversity that any good codes can achieve. In Chapter 3, with uniform

power allocation, we have derived the relationship between the outage diversity

and the generalised outage diversity (with imperfect CSIR). For most fading

distributions both the outage and the generalised outage diversities are finite,

which implies that an arbitrarily small error probability is not attainable.

In this chapter, we consider incremental-redundancy automatic repeat-request

(IR-ARQ) to improve the generalised outage diversity. We provide a brief intro-

duction to IR-ARQ in Section 4.1. We describe the system model in Section 4.2.

We then explain some useful performance metrics for IR-ARQ in Section 4.3.

We state our main results and discuss the findings in Section 4.4. Finally, we

summarise the important findings of the chapter in Section 4.5.

4.1 Introduction

ARQ is a widely-known technique that increases the reliability of transmission over inherently unreliable channels like slowly-varying wireless channels [58]. Traditionally, ARQ reduces decoding errors by means of a retransmission process. When erroneous packets are detected, the receiver requests retransmis-

sion of the same packet until the maximum number of rounds for a given message

is reached. An advanced form of ARQ is hybrid ARQ, where channel coding is

used to protect the packet from the impairments of the channel and provide


error-correction capability.

IR-ARQ is a hybrid-ARQ scheme that employs rate-compatible codes where

observations over different transmission rounds can be combined [59–61]. The

coding rate decreases with the number of transmission rounds to adapt to poor

channel conditions. References [60–62] have shown that IR-ARQ provides signif-

icant gains over conventional hybrid-ARQ schemes that only use channel coding

without combining the observations.

The code diversity performance of IR-ARQ over block-fading channels has

been studied in the literature. A notable result was the optimal rate-diversity-

delay trade-off, first studied in [26] for Gaussian inputs under quasi-static fad-

ing channels and later in [29] for both Gaussian and discrete inputs under block-

fading channels. In these works, it has been shown that the optimal diversity is an

increasing function of the maximum allowed ARQ delay, L. Thus, the reliability

improvement offered by ARQ comes into effect at the expense of increasing trans-

mission delay, which inherently reduces the throughput. The throughput loss,

however, is negligible at sufficiently high SNR. This demonstrates that IR-ARQ

is a good technique for reliability improvement at high SNR without sacrificing

throughput.

Common assumptions of the above works are perfect CSIR and perfect ARQ

feedback. Unfortunately, these assumptions are difficult to guarantee in practice.

A number of works have also addressed the effect of imperfect CSIR and

feedback in ARQ channels. For example, references [63–65] studied the effect

of imperfect feedback whereas references [66, 67] investigated the role of the

imperfect-CSIR accuracy on the performance of ARQ schemes. Most of these

works consider separate coding for error correction and detection. A message is first encoded using an error-detecting encoder and the resulting packet is subsequently encoded using an error-correcting encoder [63, 64, 68]. Thus, in general the

performance of the system depends closely on the type of error-correcting and

error-detecting codes.

In this chapter, we consider both imperfect CSIR and feedback and study

precisely their impact on the diversity performance of IR-ARQ coding systems.

In particular, we consider noisy CSIR and a simple binary symmetric channel

(BSC) model for the feedback link. Inspired by [58], we do not use separate coding

for error correction and detection. Instead, we use random coding schemes at

the transmitter and a threshold decoder at the receiver capable of detecting an

error. More specifically, we analyse the performance of random coding schemes

constructed over Gaussian and discrete signal constellations. The ARQ decoder


Figure 4.1: System model for IR-ARQ transmission with binary feedback. The message m is encoded into Xℓ(m) and transmitted with power Pℓ; the IR-ARQ decoder operates on the accumulated output Y1,ℓ and generates the ACK signal Fr(ℓ), which reaches the transmitter as Ft(ℓ) through the feedback channel.

consists of distance-metric decoders that behave as a threshold decoder for rounds

up to (L − 1) and as a nearest neighbour decoder at the L-th round.

We first study the corresponding error probability, which is characterised

using the GMI [35, 37] introduced in Chapter 2. We then derive the optimal

SNR-exponent for i.i.d. codebooks in the limit of large code length. In particu-

lar, assuming a general power allocation strategy, we show that the feedback-link

reliability must improve with the transmit SNR for the code to be able to exploit

the diversity offered by ARQ. We also identify the two extremes of imperfect feed-

back: the one that guarantees perfect-feedback diversity and the one that shows

the inability of ARQ to improve the diversity performance. We then consider

uniform power allocation and optimal long-term power control at the transmit-

ter. We derive the conditions for which power control may provide additional

diversity gains with respect to uniform power allocation.

4.2 System Model

The overview of the system model is illustrated in Figure 4.1. In the following,

we shall explain each entity in the system model.

4.2.1 Channel Model

We consider an IR-ARQ coding scheme over a MIMO block-fading channel with

nt transmit antennas, nr receive antennas, L rounds and B fading blocks per

round. The output of the channel at ARQ round ℓ is a Bnr × J-dimensional

random matrix

Yℓ = √(Pℓ/nt) HℓXℓ + Zℓ,   ℓ = 1, . . . , L   (4.1)


where Zℓ is the Bnr×J-dimensional noise matrix and Xℓ ∈ XBnt×J is the transmit-

ted signal matrix; J denotes the channel block length, X denotes signal constella-

tion and Pℓ denotes the allocated power at round ℓ satisfying the average-power

constraint

E[(1/L) ∑_{ℓ=1}^{L} Pℓ] ≤ P.   (4.2)

We assume that the entries of Zℓ, ℓ = 1, . . . , L are i.i.d. complex-Gaussian

random variables with zero mean and unit variance; P indicates the average

SNR at each received antenna. The fading matrix H ℓ is a Bnr × Bnt random

block-diagonal matrix defined by

Hℓ ≜ diag(Hℓ,1, . . . , Hℓ,B)   (4.3)

where Hℓ,b, b ∈ {1, . . . , B}, ℓ ∈ {1, . . . , L}, is the fading matrix for round ℓ and block b, and takes values in ℂ^{nr×nt}. We further assume that the fading process

H ℓ follows the short-term static model [26] for which H ℓ,b are i.i.d. from block to

block and from round to round. The distribution of H ℓ,b follows from the fading

model in Section 3.1. We write the channel outputs accumulated up to round ℓ

as

Y1,ℓ = H1,ℓX1,ℓ + Z1,ℓ (4.4)

where

Y1,ℓ = [Y1ᵀ, . . . , Yℓᵀ]ᵀ,   X1,ℓ = [X1ᵀ, . . . , Xℓᵀ]ᵀ,   (4.5)

Z1,ℓ = [Z1ᵀ, . . . , Zℓᵀ]ᵀ,   H1,ℓ = diag(√(P1/nt) H1, . . . , √(Pℓ/nt) Hℓ).   (4.6)

We further assume that imperfect channel estimates have the same form as (3.5)

Ĥℓ = Hℓ + Eℓ,   ℓ = 1, . . . , L   (4.7)

where Ĥℓ ≜ diag(Ĥℓ,1, . . . , Ĥℓ,B) and Eℓ ≜ diag(Eℓ,1, . . . , Eℓ,B) (with Ĥℓ,b and Eℓ,b taking values in ℂ^{nr×nt}) are the noisy channel estimate and the channel estimation

entries of H ℓ,b and i.i.d. complex-Gaussian random variables with zero mean and

variance

σ²e = P^{−de},   de > 0   (4.8)


where de is the channel estimation error diversity. As explained in Chapter 3,

this model is widely used in pilot-based channel estimation. Finally, we write

Ĥ1,ℓ and E1,ℓ similarly to H1,ℓ in (4.6).
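As an illustration of the model (4.1)–(4.8), the following sketch draws one ARQ round of the channel together with the corresponding noisy CSIR; Rayleigh fading is assumed for concreteness, and the function name and interface are hypothetical.

```python
import numpy as np
from scipy.linalg import block_diag

def arq_round(X, P_round, P, B, nt, nr, J, de):
    """Draw one ARQ round of the channel (4.1) with imperfect CSIR (4.7)-(4.8).

    X is the Bnt x J signal matrix for this round; Rayleigh fading is assumed
    for concreteness.  Returns the output Y together with the block-diagonal
    fading matrix H (4.3) and its noisy estimate H_hat.
    """
    Hb = [(np.random.randn(nr, nt) + 1j * np.random.randn(nr, nt)) / np.sqrt(2)
          for _ in range(B)]
    Eb = [np.sqrt(P ** (-de) / 2.0) * (np.random.randn(nr, nt) + 1j * np.random.randn(nr, nt))
          for _ in range(B)]                              # estimation error, variance P^{-de}
    H = block_diag(*Hb)                                   # Bnr x Bnt, cf. (4.3)
    H_hat = block_diag(*[h + e for h, e in zip(Hb, Eb)])  # noisy CSIR, cf. (4.7)
    Z = (np.random.randn(B * nr, J) + 1j * np.random.randn(B * nr, J)) / np.sqrt(2)
    Y = np.sqrt(P_round / nt) * (H @ X) + Z               # channel output, cf. (4.1)
    return Y, H, H_hat
```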

4.2.2 IR-ARQ Scheme

At the transmitter end, a mother code of rate R/L is constructed based on concate-

nating several coded blocks. A mother codeword used to encode the message m,

m ∈ {1, . . . , |M|}, is defined by

X1,L(m) ≜ [X1ᵀ(m), . . . , XLᵀ(m)]ᵀ   (4.9)

which is made of L matrices Xℓ(m) ∈ X^{Bnt×J}, ℓ = 1, . . . , L. Here M is the set of all possible messages (equiprobable) such that |M| = 2^{BJR}, where R is the

coding rate for each Xℓ(m). Note that Xℓ(m) may be different for different ℓ. At

round ℓ, the transmitter sends the matrix Xℓ, which consists of B coded blocks.

Each coded symbol is drawn i.i.d. from an input signal set X; here we focus on

Gaussian and discrete inputs. For Gaussian inputs, we assume a fixed data rate,

i.e., R > 0, independent of the SNR, whereas for discrete inputs, the data rate

is limited by the number of transmit antennas and the cardinality of the signal set |X| = 2^M, i.e., R ∈ (0, Mnt). The constellations are assumed to have unit average energy.
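As a minimal illustration of the codebook structure (4.9), the following sketch draws an i.i.d. complex-Gaussian mother codebook. For realistic B, J and R the number of messages 2^{BJR} is of course far too large to enumerate, so the optional argument caps |M| for simulation purposes; the interface is purely illustrative.

```python
import numpy as np

def gaussian_mother_codebook(R, B, nt, J, L, max_messages=None):
    """Draw an i.i.d. complex-Gaussian mother codebook X_{1,L}(m), cf. (4.9).

    Each round block is Bnt x J with unit average symbol energy; in principle
    |M| = 2^{BJR} messages, capped by `max_messages` for any practical simulation.
    """
    num_messages = max_messages if max_messages is not None else int(round(2 ** (B * J * R)))
    shape = (num_messages, L, B * nt, J)
    return (np.random.randn(*shape) + 1j * np.random.randn(*shape)) / np.sqrt(2)
```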

At the receiver end, for each transmission round ℓ = 1, . . . , L, the IR-ARQ

decoder considers all coded blocks up to the current round. At rounds ℓ < L,

conditioned on a fixed channel H1,ℓ = H1,ℓ and its corresponding estimate Ĥ1,ℓ = Ĥ1,ℓ, the receiver employs a threshold decoder based on the set

Tδ(ℓ) = { (X1,ℓ(m), Y1,ℓ, Ĥ1,ℓ) : Q^s_{Y|X,Ĥ}(Y1,ℓ | X1,ℓ(m), Ĥ1,ℓ) / E[Q^s_{Y|X,Ĥ}(Y1,ℓ | X′1,ℓ, Ĥ1,ℓ)] ≥ |M|/δ }   (4.10)

where s and δ are some positive numbers, and where the expectation is taken over

the probability distribution PX(X′1,ℓ). Herein Q_{Y|X,Ĥ}(·) is the Euclidean-distance metric given by

Q_{Y|X,Ĥ}(Y1,ℓ | X1,ℓ, Ĥ1,ℓ) ≜ ∏_{ℓ′=1}^{ℓ} ∏_{b=1}^{B} ∏_{ν=1}^{J} (1/π^{nt}) exp(−‖yℓ′,b,ν − √(Pℓ′/nt) Ĥℓ′,b xℓ′,b,ν‖²).   (4.11)

The threshold decoder Ψ(·) outputs:


• Ψ(Y1,ℓ) = m, m ∈ {1, . . . , |M|}, if X1,ℓ(m) is the unique sequence such that its normalised metric is greater than the threshold |M|/δ in Tδ(ℓ). A positive acknowledgement (ACK) is then generated.

• Ψ(Y1,ℓ) = 0 if no unique sequence is found, and an error is declared. A

negative ACK is then generated.

At round ℓ = L, error detection is not required. A maximum-metric decoder,

which is identical to the nearest neighbour decoder in Chapter 3, is used to output

the message m̂ such that

m̂ = arg max_{m=1,...,|M|} Q_{Y|X,Ĥ}(Y1,L | X1,L(m), Ĥ1,L).   (4.12)

A detailed analysis on the performance of the decoder is given in Section 4.3.1.
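The decision rules of the threshold decoder and of the round-L maximum-metric decoder (4.12) can be sketched as follows; the normalised metrics appearing in (4.10) are assumed to be precomputed and passed in, so the sketch only captures the decision logic, not the metric evaluation itself.

```python
import numpy as np

def threshold_decode(norm_metrics, num_messages, delta):
    """Decision rule of the threshold decoder based on the set (4.10).

    norm_metrics[m-1] is assumed to hold the normalised metric
    Q^s(Y_{1,l} | X_{1,l}(m), Hhat_{1,l}) / E[Q^s(Y_{1,l} | X', Hhat_{1,l})].
    Returns (message, ACK): (m, 1) if m is the unique message above the
    threshold |M|/delta, and (0, 0) otherwise (detected error, negative ACK).
    """
    threshold = num_messages / delta
    above = [m for m, q in enumerate(norm_metrics, start=1) if q >= threshold]
    return (above[0], 1) if len(above) == 1 else (0, 0)

def max_metric_decode(metrics):
    """Round-L nearest neighbour (maximum-metric) decoder, cf. (4.12)."""
    return int(np.argmax(metrics)) + 1
```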

At round ℓ, ℓ < L, the transmitter receives through a feedback channel a

positive ACK or negative ACK. If the positive ACK is received, the transmit-

ter understands that the message has been successfully delivered and starts the

transmission of the next message. Instead, if the negative ACK is received, the

transmitter sends the next ARQ round corresponding to the current message,

Xℓ+1(m). This process continues until a positive ACK is received or until the

maximum round L has been reached.

Synchronous transmission is assumed. Each transmission round is numbered

and the sequence is known perfectly at both terminals.

4.2.3 Feedback Channel and ARQ

We shall denote the ACK generated by the receiver at round ℓ as Fr(ℓ). Similarly,

the ACK received by the transmitter at round ℓ is denoted by Ft(ℓ). The numbers

0 and 1 denote negative and positive ACKs, respectively. For example, Fr(ℓ) = 1

(Fr(ℓ) = 0) is the positive (negative) ACK generated by the receiver at round

ℓ and Ft(ℓ) = 1 (Ft(ℓ) = 0) is the positive (negative) ACK received by the

transmitter at round ℓ.

We model the feedback channel as a BSC with crossover probability pfb. The

motivation is that the ACK sent by the receiver may be interpreted incorrectly

by the transmitter due to unreliable medium. We assume that the crossover

probability is such that

pfb = min{p0, p0 P^{−dfb}}   (4.13)


where 0 ≤ p0 ≤ 1/2 and dfb > 0 is the feedback diversity, which is defined as

dfb ≜ lim_{P→∞} −log pfb / log P.   (4.14)

This models a feedback channel whose quality increases with the forward link

SNR. The perfect feedback assumption [26, 29, 32] is a special case of this BSC

feedback with pfb = 0.

Upon receiving a positive ACK, the transmitter stops the current message

transmission and starts the next message transmission. Otherwise, the transmit-

ter continues the current message transmission in the next round. The transmit-

ter keeps track of the number of rounds that have already elapsed and terminates

the current transmission once the maximum delay limit L is reached.

Once a message is successfully decoded at a particular round, the receiver

issues a positive ACK. The receiver will disregard any extra transmission rounds

of the current message as a result of wrong negative ACKs at the transmitter;

for each extra transmission round, the receiver will continue to issue positive

ACK. If decoding is not successful (Fr(ℓ) = 0) but there is no retransmission of

the current message (since Ft(ℓ) = 1), the receiver declares an error; no ACK is

further issued for the current message as decoding will start for the next message.
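The feedback model (4.13) and the round progression described above can be summarised in the following sketch; the per-round decoding step is abstracted as a callable, and all names are illustrative.

```python
import numpy as np

def bsc_feedback(ack, P, p0=0.5, dfb=1.0):
    """Pass an ACK/NACK bit through the BSC feedback link (4.13)-(4.14)."""
    p_fb = min(p0, p0 * P ** (-dfb))
    return int(ack) ^ int(np.random.rand() < p_fb)

def run_ir_arq(decode_round, L, P, p0=0.5, dfb=1.0):
    """Round progression of Section 4.2.3; decode_round(l) returns (message, ACK)."""
    for l in range(1, L + 1):
        m_hat, ack_rx = decode_round(l)            # receiver decodes rounds 1..l jointly
        if l == L:
            return m_hat                            # last round: feedback is immaterial
        ack_tx = bsc_feedback(ack_rx, P, p0, dfb)   # ACK as observed by the transmitter
        if ack_tx == 1:
            # transmitter stops; if ack_rx == 0 this event contributes to the ARQ outage
            return m_hat
```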

4.3 Performance Metrics with Imperfect CSIR

In the following, we develop ARQ performance metrics accounting for imperfect

CSIR.

4.3.1 Error Probability with Perfect Feedback

To analyse the error probability of the underlying ARQ scheme, we first define

the following three decoder events.

• The joint event of error detection up to round ℓ

Dℓ ≜ {Ψ(Y1,1) = 0, . . . , Ψ(Y1,ℓ) = 0}.   (4.15)

• Assuming that the message m is transmitted, the undetected error event

at round ℓ (conditioned on Dℓ−1 occurring) consists of


– the event of valid decoding

Vℓ ≜ ⋃_{m̄≠0} {Ψ(Y1,ℓ) = m̄},   (4.16)

– the event of decoding error

Eℓ ≜ ⋃_{m̄≠m} {Ψ(Y1,ℓ) = m̄}.   (4.17)

Assuming perfect feedback, the error probability at round ℓ, Pe(ℓ), ℓ = 1, . . . ,

L− 1 can be written as

Pe(ℓ) = Pr{V1, E1} + ∑_{ℓ′=2}^{ℓ−1} Pr{Dℓ′−1, Vℓ′, Eℓ′} + Pr{Dℓ−1, Eℓ}.   (4.18)

The accumulated error probability at round L is given by

Pe(L) = Pr{V1, E1} + ∑_{ℓ′=2}^{L−1} Pr{Dℓ′−1, Vℓ′, Eℓ′} + Pr{DL−1, EL}.   (4.19)

Note that for any ℓ = 1, . . . , L − 1, if an undetected error occurs at round ℓ, it

will also be counted as an error at rounds ℓ + 1 up to L. This emphasises that ARQ

cannot correct undetected errors.

In order to characterise the above error probability, we first study the thresh-

old decoder Ψ(·) for a block-fading channel and its properties in terms of detected

and undetected error probability in Section 4.3.1.1. We then apply the results to

IR-ARQ in Section 4.3.1.2.

4.3.1.1 Threshold Decoder Ψ(·)

In this section, we consider a block-fading channel with B fading blocks and

characterise the error behaviour of the threshold decoder Ψ(·) based on the set

(4.10), i.e.,

Tδ = { (X(m), Y, Ĥ) : Q^s_{Y|X,Ĥ}(Y | X(m), Ĥ) / E[Q^s_{Y|X,Ĥ}(Y | X′, Ĥ)] ≥ |M|/δ }   (4.20)

for some s, δ > 0. Herein we have specifically omitted the argument ℓ in Tδ and

the subscript 1,ℓ in the arguments of the metric as we do not incorporate ARQ.


We first note that the set Tδ is the modified version of the set implicitly

used in the Csiszár-Körner upper bound to the error probability [69, Lemma

12.9]. There are two differences with [69, Lemma 12.9]. The first one is the

introduction of the parameter s > 0 (allowing a more general bound than using

s = 1 in [69, Lemma 12.9]). The second one is the normalising term

E[Q^s_{Y|X,Ĥ}(Y | X′, Ĥ)]   (4.21)

that lifts the restriction [69]

E[Q_{Y|X,Ĥ}(Y | X′, Ĥ)] = 1,   ∀Y ∈ ℂ^{Bnr×J}.   (4.22)

Thus, using the same steps to derive [69, Lemma 12.9], we have the following

lemma on the upper bound to the error probability of the threshold decoder.

Lemma 4.1. The error probability of the ensemble of random codes using the

threshold decoder—which produces the output m if X(m) is the unique sequence

such that its normalised metric is greater than the threshold in Tδ (4.20); other-

wise, an error is declared—can be upper-bounded as

Pe,ave(H) ≤ δ + Pr{ (1/(BJ)) log2 [ Q^s_{Y|X,Ĥ}(Y | X, Ĥ) / E[Q^s_{Y|X,Ĥ}(Y | X′, Ĥ) | Y, Ĥ] ] < R − (1/(BJ)) log2 δ ∣ H = H, E = E }   (4.23)

for any s, δ > 0.

Lemma 4.1 implies the following.

Corollary 4.1 (Detected Error Probability). Let

δ′ = −(1/(BJ)) log2 δ,   0 < δ < 1.   (4.24)

For a given alphabet X and its corresponding distribution, there exists a random

code whose detected error probability can be bounded as

Pr{D | H = H, E = E} ≤ 2^{−BJδ′} + Pr{ (1/(BJ)) log2 [ Q^s_{Y|X,Ĥ}(Y | X, Ĥ) / E[Q^s_{Y|X,Ĥ}(Y | X′, Ĥ) | Y, Ĥ] ] < R + δ′ ∣ H = H, E = E }   (4.25)


for any s > 0.

The upper bound in Lemma 4.1 holds for the undetected error probability

as well. However, we want the undetected error probability to be as small as

possible. The following lemma—which we prove in Appendix B.1—may provide

a tighter bound.

Lemma 4.2 (Undetected Error Probability). Recall that δ′ is given in (4.24).

For a given alphabet X and its corresponding distribution, there exists a random

code whose undetected error probability can be bounded as

Pr{V, E | H = H, E = E} ≤ 2^{−BJδ′}   (4.26)

for any δ′ > 0.

We shall note that the maximum-metric decoder based on Q_{Y|X,Ĥ}(Y | X, Ĥ), cf. (2.22), yields a smaller error probability than the above threshold decoder.

This is so because whenever the threshold decoder produces a message output,

this output is identical to the output of the maximum-metric decoder (2.22).

However, the threshold decoder has an advantage over the maximum-metric de-

coder. More specifically, the threshold decoder allows for error detection, i.e.,

when there is no unique codeword above the threshold in Tδ.

Note that with an i.i.d. codebook, we can write the expectation over X′ as

E[Q^s_{Y|X,Ĥ}(Y | X′, Ĥ)] = ∏_{b=1}^{B} ∏_{ν=1}^{J} E[Q^s_{Y|X,Ĥ}(yb,ν | X′b,ν, Ĥb)].   (4.27)

For a fixed fading Hb = Hb ∈ ℂ^{nr×nt} and its corresponding estimation error Eb = Eb ∈ ℂ^{nr×nt}, b = 1, . . . , B, in the limit as J → ∞, we have from the law of

large numbers [19] that

lim_{J→∞} (1/J) log2 [ Q^s_{Y|X,Ĥ}(Y | X, Ĥ) / E[Q^s_{Y|X,Ĥ}(Y | X′, Ĥ)] ] = ∑_{b=1}^{B} E[ log2 ( Q^s_{Y|X,Ĥ}(Y | X, Ĥb) / E[Q^s_{Y|X,Ĥ}(Y | X′, Ĥb) | Y, Ĥb] ) ∣ Hb = Hb, Eb = Eb ].   (4.28)

Fix δ′ > 0. Following from Corollary 4.1 and (4.28), as the block length J

tends to infinity, the detected error probability—conditioned on both fading and


its corresponding estimation error—is given by

Pr{D | H = H, E = E} ≤ 𝟙{ Igmi(H, s) ≤ R + δ′ ∣ H = H, E = E },   (4.29)

where

Igmi(H, s) = (1/B) ∑_{b=1}^{B} Igmib(P, Hb, Ĥb, s)   (4.30)

and

Igmib(P, Hb, Ĥb, s) = E[ log2 ( Q^s_{Y|X,Ĥ}(Y | X, Ĥb) / E[Q^s_{Y|X,Ĥ}(Y | X′, Ĥb) | Y, Ĥb] ) ∣ Hb = Hb, Eb = Eb ].   (4.31)

Note that for a given H = H and E = E, the bound (4.29) is valid for any s > 0.

Thus, by taking the expectation over the ensemble of H ,E, we can tighten the

bound (4.29) by optimising over s > 0 for which s depends on the realisations of

H and E, i.e.,

Pr{D} ≜ E[Pr{D | H = H, E = E}]   (4.32)
≤ Pr{ sup_{s>0} Igmi(H, s) ≤ R + δ′ }   (4.33)
= Pr{ Igmi(H) ≤ R + δ′ }   (4.34)

which is similar to the generalised outage probability for a given rate R+δ′ (2.32)

but with < being replaced by ≤. On the other hand, following from Lemma 4.2

Pr{V, E | H = H, E = E} ≤ 2^{−BJδ′},   (4.35)

we observe that for any δ′ > 0, the undetected error probability vanishes as J

tends to infinity. Remark that the upper bound (4.35) is independent of both

H and E.

The converse for i.i.d. codebooks shows that conditioned on fading and its

corresponding estimation error, the error probability with the maximum-metric

decoder (4.12) for sufficiently large J is lower-bounded by [4, 39]

Pr{E | H = H, E = E} ≥ 1 − exp( −e^{−BJ(Igmi(H)+ε−R)} + e^{−BJ(Igmi(H)+ε)} )   (4.36)

where ε > 0 is an arbitrarily small number. In the limit of large J , averaging the


RHS of the inequality over all fading and channel estimation error realisations

leads to the generalised outage probability at a given rate R (2.32). Recall that

the threshold decoder Ψ(·) cannot be better than the maximum-metric decoder.

Hence, the lower bound (4.36) holds for the threshold decoder Ψ(·) as well.

From Lemma 4.1 and the result (4.29), we have that for large block length,

whenever Igmi(H) > R + δ′, the message will be correctly decoded with high

probability. For i.i.d. codebooks, whenever Igmi(H) < R, a decoding error occurs

with high probability, which follows from (4.36). If a decoding error occurs, then

it can be detected with arbitrarily high probability, which follows from the result

(4.35) by letting J tend to infinity.

4.3.1.2 Decoding Error and Communication Outage

We can view IR-ARQ decoding at round ℓ, ℓ = 1, . . . , L as decoding for a block-

fading channel with total ℓB fading blocks. Due to the concatenation of the

matrices X1, . . . ,Xℓ in X1,ℓ, the effective coding rate is given by R/ℓ. Note that

following from Lemma 4.2, we have that the probability of undetected error

accumulated up to round ℓ < L can be upper-bounded as

Pr{V1, E1} + ∑_{ℓ′=2}^{ℓ} Pr{Dℓ′−1, Vℓ′, Eℓ′} ≤ ℓ 2^{−BJδ′}.   (4.37)

By fixing δ′ > 0 and letting J → ∞, the contribution of (4.37) to (4.18) can

be made arbitrarily small. The dominating contribution is given by the detected

error probability

Pr{Dℓ−1, Eℓ} − Pr{Dℓ−1, Vℓ, Eℓ} = Pr{Dℓ−1, Vℓᶜ, Eℓ}   (4.38)
= Pr{Dℓ}.   (4.39)

Note that by the definition (4.15), Dℓ characterises the joint detected error event

up to round ℓ. Following Corollary 4.1, in the limit as J → ∞, we have the

bound

Pr{Dℓ} ≤ Pr{ (H1,ℓ, E1,ℓ) ∈ ⋂_{ℓ′=1}^{ℓ} Q1,ℓ′(R + δ′) }   (4.40)

where

Q1,ℓ(R) ≜ { (H1,ℓ, E1,ℓ) ∈ ℂ^{ℓBnr×ℓBnt} : Igmi1,ℓ(H1,ℓ) ≤ R }.   (4.41)


For future reference, we shall define a similar set to Q1,ℓ(R) but with ≤ being

replaced by <, i.e.,

O1,ℓ(R) ≜ { (H1,ℓ, E1,ℓ) ∈ ℂ^{ℓBnr×ℓBnt} : Igmi1,ℓ(H1,ℓ) < R }.   (4.42)

In the above, we have defined Igmi1,ℓ(H1,ℓ) as the accumulated GMI, related to the GMI with ℓB fading blocks Igmi(H1,ℓ) as

Igmi1,ℓ(H1,ℓ) ≜ ℓ Igmi(H1,ℓ)   (4.43)

where

Igmi(H1,ℓ) = sup_{s>0} (1/(ℓB)) ∑_{l=1}^{ℓ} ∑_{b=1}^{B} Igmil,b(Pl, Hl,b, Ĥl,b, s)   (4.44)

and where Igmil,b(Pl, Hl,b, Ĥl,b, s) is given on the RHS of (4.31). Clearly, this accumulated GMI replaces the role of the accumulated mutual information [32]

I1,ℓ(H1,ℓ) ≜ (1/B) ∑_{l=1}^{ℓ} ∑_{b=1}^{B} Iawgn(√(Pl/nt) Hl,b)   (4.45)

(where Iawgn(·) is defined in Section 2.3.1) to describe communication outages

for IR-ARQ.

At round L, the maximum-metric decoder (4.12), which is better than the

threshold decoder Ψ(·), is used to produce the output message. Thus, at round

L, the error characterisation in Section 2.4.1 applies. Following the same steps

as for ℓ < L, we can see that the upper bound to Pe(L) in (4.19) is dominated by

Pr{DL−1, EL}. Following the steps in Section 2.4.1 and the derivation in Section

4.3.1.1, we can show as J → ∞ that

Pe(L) ≤ Pr{ (H1,L, E1,L) ∈ ⋂_{ℓ′=1}^{L} Q1,ℓ′(R + δ′) }.   (4.46)

Using Proposition 2.4 (see also the discussion in Section 4.3.1.1), we can show

that with i.i.d. codebooks and perfect error detection, the error probability at

round ℓ for sufficiently large block length can be lower-bounded as

Pr{Eℓ, Dℓ−1} ≥ Pr{ (H1,ℓ, E1,ℓ) ∈ ⋂_{ℓ′=1}^{ℓ} O1,ℓ′(R) }.   (4.47)

Notice that the upper bound (4.46) tends to the lower bound (4.47) (at round

L) with δ′ ↓ 0. For future reference, we shall define the RHS of (4.47) in the


Figure 4.2: Density of the accumulated GMI Igmi1,2(H1,2) at round ℓ = 2, conditioned on Igmi1,1(h1,1), for Gaussian-input transmission over a SISO Rayleigh fading channel with B = 1, Pℓ = 16 (unit power), ℓ = 1, 2. Curves are shown for de = 0.5, 1, 2 and for perfect CSIR.

following.

Definition 4.1 (Generalised Joint-Outage). The generalised joint-outage prob-

ability at ARQ round ℓ, ℓ = 1, . . . , L, is defined by the probability that the accu-

mulated GMIs from rounds 1 to ℓ are all less than R, i.e.,

Pℓgout(R) ≜ Pr{ (H1,ℓ, E1,ℓ) ∈ ⋂_{ℓ′=1}^{ℓ} O1,ℓ′(R) }.   (4.48)

One of the main reasons for employing IR-ARQ in practical systems is the fea-

ture of concatenation of all noisy received coded blocks Y1,ℓ. From an information-

theoretic perspective, under perfect CSIR (where Igmi1,ℓ(H1,ℓ) and I1,ℓ(H1,ℓ) are

identical), this allows for information accumulation (4.45). However, this may

no longer be true if the CSIR is imperfect. As we can see from (4.43) and (4.44)

increasing the number of coded blocks does not necessarily improve the accumu-

lated GMI due to the optimisation over s > 0. In contrast to the accumulated

mutual information, the accumulated GMI at round ℓ is not the sum of the GMI

at rounds ℓ′ = 1, . . . , ℓ. To illustrate this, we consider Gaussian-input transmis-

sion using a two-round IR-ARQ scheme over a SISO quasi-static Rayleigh-fading

channel and evaluate the accumulated GMI at round ℓ = 2 conditioned on the

accumulated GMI at round ℓ = 1, Igmi1,1(h1,1). Let h1,1 = ĥ1,1 = 1 − i. With Pℓ = 16, ℓ = 1, 2, this gives Igmi1,1(h1,1) = 5.044 bits per channel use. We plot the


density of the random variable Igmi1,2(H1,2) in Figure 4.2.

As we can see from Figure 4.2, for de = 0.5, 1, 2, there is a non-zero prob-

ability that the accumulated GMI at round ℓ = 2 is less than the accumulated

GMI at round ℓ = 1. This probability decreases with an increase in de. The

figure also suggests that if the decoder is mismatched, increasing the number

of fading blocks does not necessarily translate into improving the accumulated

maximum-achievable rate. This phenomenon does not occur if the CSIR is per-

fect, i.e., the accumulated mutual information at round ℓ = 2 always improves

as a consequence of the non-negativity of mutual information [18].
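The mechanism behind this observation is that a single parameter s must serve all accumulated blocks in (4.44). The following toy computation, which uses made-up concave surrogates in place of the actual per-round GMI terms, illustrates that the supremum of an average can indeed fall below the first-round supremum:

```python
import numpy as np

# Toy concave surrogates standing in for the per-round terms of (4.44); these
# are made-up functions, not the GMI expressions (3.22)/(3.24).
f1 = lambda s: 5.0 - (s - 2.0) ** 2     # "round 1": optimised at s = 2
f2 = lambda s: 1.0 - (s - 0.2) ** 2     # "round 2": poor block, optimised at s = 0.2

s = np.linspace(1e-3, 4.0, 4001)
acc_gmi_round1 = np.max(f1(s))                        # sup_s of the round-1 term alone
acc_gmi_round2 = 2 * np.max(0.5 * (f1(s) + f2(s)))    # cf. (4.43)-(4.44): one s for both rounds
print(acc_gmi_round1, acc_gmi_round2)                 # here the round-2 value is smaller
```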

4.3.2 Error Probability with Imperfect Feedback

With imperfect feedback, there are additional error events which come from

feedback errors, i.e., when the receiver detects an error hence sending a negative

ACK (Fr(ℓ) = 0), but the transmitter receives a positive ACK (Ft(ℓ) = 1). We

first define the joint event when the transmitter receives negative ACKs up to

round ℓ

Aℓ ≜ {Ft(1) = 0, . . . , Ft(ℓ) = 0}.   (4.49)

Using the feedback model in Section 4.2, we understand that the value of Ft(ℓ)

is known after the value of Fr(ℓ) is known.

With imperfect feedback, the error probability in (4.19) becomes

Pe(L) = Pr{V1, E1} + ∑_{ℓ′=2}^{L−1} Pr{Aℓ′−1, Dℓ′−1, Vℓ′, Eℓ′} + Pr{AL−1, DL−1, EL}
+ Pr{D1, Ft(1) = 1} + ∑_{ℓ′=2}^{L−1} Pr{Aℓ′−1, Dℓ′, Ft(ℓ′) = 1}.   (4.50)

The first three terms

Pr{V1, E1} + ∑_{ℓ′=2}^{L−1} Pr{Aℓ′−1, Dℓ′−1, Vℓ′, Eℓ′} + Pr{AL−1, DL−1, EL}   (4.51)

characterise decoding errors and undetected errors. The last two terms

Pr{D1, Ft(1) = 1} + ∑_{ℓ′=2}^{L−1} Pr{Aℓ′−1, Dℓ′, Ft(ℓ′) = 1}   (4.52)

are due to feedback errors. Note that only feedback errors up to round L− 1 are

considered as the imperfect feedback at round L is immaterial since there is no


further retransmission. We shall analyse the severity of the imperfect feedback

and CSIR and investigate the behaviour of Pe(L) at high SNR in the following

section.

4.4 ARQ Outage Diversity

In this section, we present our results on the high-SNR performance of the IR-

ARQ scheme. Note that the reliability of IR-ARQ is characterised by errors

accumulated at round L, i.e., Pe(L) in (4.50). Following the large block length

analysis in Section 4.3, using random coding schemes, the behaviour of Pe(L) can

be captured by the ARQ outage probability P arqgout(R), which is defined as follows.

Definition 4.2 (ARQ Outage Probability). The ARQ outage probability Parqgout(R) is the probability that the accumulated GMI at rounds ℓ = 1, . . . , L is less than the data rate R, but no further retransmission is performed, i.e.,

Parqgout(R) ≜ Pr{ (H1,1, E1,1) ∈ O1,1(R), Ft(1) = 1 }
+ ∑_{ℓ=2}^{L−1} Pr{ (H1,ℓ, E1,ℓ) ∈ ⋂_{ℓ′=1}^{ℓ} O1,ℓ′(R), Aℓ−1, Ft(ℓ) = 1 }
+ Pr{ (H1,L, E1,L) ∈ ⋂_{ℓ′=1}^{L} O1,ℓ′(R), AL−1 }.   (4.53)

In the following, we define two important quantities that capture the SNR-exponents of Pℓgout(R) in (4.48) and Parqgout(R), which will be used to state our main theorem.

Definition 4.3 (Generalised Outage Diversity at Round ℓ). The generalised out-

age diversity at ARQ round ℓ is defined as the high-SNR slope of the generalised

joint-outage probability at round ℓ (4.48) curve plotted in log-log scale, i.e.,

dicsir(ℓ) ≜ lim_{P→∞} −log Pℓgout(R) / log P.   (4.54)

Note that the perfect-CSIR outage diversity at round ℓ, dcsir(ℓ) can be obtained

from dicsir(ℓ) by letting de ↑ ∞.

Definition 4.4 (ARQ Outage Diversity). The ARQ outage diversity is defined

as the high-SNR slope of the ARQ outage probability curve plotted in log-log scale,

i.e.,

darq ≜ lim_{P→∞} −log Parqgout(R) / log P.   (4.55)


Our main result on darq is summarised in the following theorem.

Theorem 4.1. Consider an IR-ARQ coding scheme over the MIMO block-fading

channel (4.1) with imperfect CSIR (4.7) and imperfect feedback (4.13). For any

possible power allocation P1, . . . , PL satisfying the power constraint (4.2) and any

input constellations, the ARQ outage diversity is given by

darq = min (dicsir(L), dfb + dicsir(1)) (4.56)

where dicsir(ℓ), ℓ = 1, . . . , L is the generalised outage diversity at round ℓ (4.54).

The achievability of darq is shown using random coding.

Proof. See Appendix B.2.

Theorem 4.1 shows the interplay of the ARQ outage diversity darq, the feed-

back diversity dfb, the generalised outage diversity dicsir(ℓ) and the ARQ delay

limit L. Imperfect feedback introduces ACK errors at the transmitter and limits

the communication reliability by preventing compensation of outages occurring

at any rounds less than L. As observed in Appendix B.2, the outage events

occurring at rounds ℓ < L contribute to the ARQ outage probability due to

the crossover probability of the BSC feedback, which cannot be made arbitrarily

small. The slowest decaying exponent corresponding to these contributing terms

is given by dicsir(1)+ dfb. At round L, the imperfect feedback is immaterial since

there is no ACK signal issued and no further retransmission. At this round, the

exponent that contributes to the ARQ diversity is given by dicsir(L). It follows

that the ARQ diversity is given by the minimum of dicsir(1) + dfb and dicsir(L).

Remark 4.1 (Imperfect CSIR, Perfect Feedback). At high SNR, perfect feedback

is a special case of dfb with dfb ↑ ∞. In this case, the ARQ outage diversity

becomes

darq = dicsir(L). (4.57)

If the imperfection is due to the CSIR only, then the gain due to ARQ is still achievable, i.e., the diversity improves by a factor of L. In the perfect feedback

case, the outage events occurring for rounds ℓ < L are being compensated by

the retransmission process; thus, those outage events do not count for the ARQ

outage probability and only the outage events occurring at the maximum round

L are not compensated for [25, 26, 29]. It follows from the discussion in Section

4.3 that in this case, the ARQ outage probability is given by the generalised

joint-outage probability at round L, Parqgout(R) = PLgout(R).


With sufficiently reliable feedback, we can achieve the perfect-feedback di-

versity darq in Remark 4.1. In particular, it follows from Theorem 4.1 that the

condition for dfb to achieve darq in (4.57) is given by

dfb ≥ dicsir(L)− dicsir(1). (4.58)

Depending on how power is allocated, as L increases, the minimum dfb required

to achieve the perfect-feedback diversity is usually higher. In practice, however,

the feedback diversity dfb depends on the feedback signalling and on the available

resources such as bandwidth and power.

As the feedback diversity gets lower, i.e., dfb ↓ 0, the diversity improvement

with ARQ is limited. The ARQ diversity tends to the single-round diversity

dicsir(1).

In the following, we shall discuss how dicsir(ℓ) evolves with the round index ℓ,

ℓ = 1, . . . , L when uniform power allocation or optimal power control is employed.

This allows for an exact characterisation of Theorem 4.1 when a specific power

allocation scheme is used at the transmitter. For the sake of referencing, the

superscripts u and p will be used to refer results with uniform power allocation

and power control, respectively.

4.4.1 Uniform Power Allocation

Uniform power allocation refers to equal power allocation across rounds, i.e.,

Pℓ = P , ℓ = 1, . . . , L, which clearly satisfies the constraint (4.2). The following

proposition quantifies duicsir(ℓ), the generalised outage diversity at round ℓ with

uniform power allocation.

Proposition 4.1 (Uniform Power Allocation). Consider an IR-ARQ coding

scheme over the MIMO block-fading channel (4.1) with imperfect CSIR (4.7)

and imperfect feedback (4.13). With uniform power allocation, the generalised

outage diversity at round ℓ is given by

duicsir(ℓ) = min(1, de)× ducsir(ℓ), ℓ = 1, . . . , L (4.59)

where

ducsir(ℓ) = (1 + τ/2) ℓBntnr for Gaussian constellations, and ducsir(ℓ) = (1 + τ/2) dSB(ℓ, R) for discrete constellations of size 2^M,   (4.60)


is the perfect-CSIR outage diversity at round ℓ [25, 26,29], and where

dSB(ℓ, R) ≜ nr (1 + ⌊ℓB(nt − R/(ℓM))⌋).   (4.61)

Proof. See Appendix B.3.

The term min(1, de) in (4.59) is identical to the finding in Theorem 3.1. In

particular, we observe that the relationship of imperfect- and perfect-CSIR SNR-

exponent in Theorem 3.1 continues to hold in ARQ block-fading channels with

perfect feedback or with feedback satisfying (4.58), i.e.,

dfb ≥ duicsir(L)− duicsir(1). (4.62)

The perfect-CSIR ARQ outage diversity can be achieved with de ≥ 1. Remark

that with de, dfb ↑ ∞, we recover some results in [29, 32].
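The expressions (4.59)–(4.61), together with Theorem 4.1, are simple to evaluate numerically; the following sketch (with illustrative function names) computes the ARQ outage diversity under uniform power allocation for both Gaussian and discrete constellations.

```python
import math

def d_sb(l, R, B, nt, nr, M):
    """Singleton-bound diversity (4.61): nr * (1 + floor(l*B*(nt - R/(l*M))))."""
    return nr * (1 + math.floor(l * B * (nt - R / (l * M))))

def d_icsir_uniform(l, de, tau, B, nt, nr, R=None, M=None, gaussian=True):
    """Generalised outage diversity at round l under uniform power allocation, (4.59)-(4.60)."""
    d_csir = (1 + tau / 2) * (l * B * nt * nr if gaussian else d_sb(l, R, B, nt, nr, M))
    return min(1.0, de) * d_csir

def d_arq(L, de, dfb, tau, B, nt, nr, R=None, M=None, gaussian=True):
    """ARQ outage diversity (4.56): min(d_icsir(L), dfb + d_icsir(1))."""
    return min(d_icsir_uniform(L, de, tau, B, nt, nr, R, M, gaussian),
               dfb + d_icsir_uniform(1, de, tau, B, nt, nr, R, M, gaussian))

# illustrative check consistent with the discussion of Figure 4.3 (Gaussian inputs,
# B = 2, L = 2, nt = 2, nr = 1, Rayleigh fading): dfb = 4 recovers diversity 8
print(d_arq(L=2, de=1.0, dfb=4.0, tau=0.0, B=2, nt=2, nr=1))
```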

4.4.2 Power-Controlled ARQ

We now consider a long-term power control satisfying (4.2) that minimises the

ARQ outage probability. Improving the ARQ diversity using power control under

perfect feedback and CSIR has been considered in [26] for Gaussian inputs and

in [25, 32] for discrete inputs. The result in this chapter will generalise these

preceding results when imperfect feedback and CSIR are taken into account.

The transmitter can adapt its transmission power based on the received feed-

back. Power is spent for the current message whenever negative ACK is received

by the transmitter. We thus can write (4.2) as

P1/L + (1/L) ∑_{ℓ=2}^{L} Pr{Ft(ℓ−1) = 0} Pℓ ≤ P.   (4.63)

We summarise the improvement made by optimal power control to the gen-

eralised outage diversity at round ℓ in the following proposition.

Proposition 4.2. Consider an IR-ARQ coding scheme over the MIMO block-

fading channel (4.1) with imperfect CSIR (4.7) and imperfect feedback (4.13).

Using optimal power control satisfying (4.63), the generalised outage diversity at


ARQ round ℓ (4.54) is found recursively as

dpicsir(ℓ) = (1 + τ/2) Bntnr ∑_{ℓ′=1}^{ℓ} min( 1 + min( dpicsir(ℓ′−1), d(ℓ′−1) ), de )   (4.64)

for Gaussian constellations and

dpicsir(ℓ) = (1 + τ/2) min( 1 + min( dpicsir(ℓ−1), d(ℓ−1) ), de ) dSB(R)
+ (1 + τ/2) Bntnr ∑_{ℓ′=1}^{ℓ−1} min( 1 + min( dpicsir(ℓ′−1), d(ℓ′−1) ), de )   (4.65)

for discrete constellations of size 2^M, where

d(ℓ) ≜ min_{l=1,...,ℓ} { l dfb + dpicsir(ℓ−l) }   (4.66)
d(0) ≜ 0   (4.67)
dpicsir(0) ≜ 0   (4.68)
dSB(R) ≜ dSB(1, R).   (4.69)

Proof. See Appendix B.4.
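For Gaussian constellations, the recursion (4.64) with (4.66)–(4.68) can be transcribed directly, as in the following sketch of the recursion as stated above (the names are illustrative and the example parameters are arbitrary):

```python
def d_picsir_gaussian(L, de, dfb, tau, B, nt, nr):
    """Recursion (4.64) with (4.66)-(4.68): power-controlled diversity, Gaussian inputs."""
    dp = {0: 0.0}      # d^p_icsir(0) = 0, cf. (4.68)
    dbar = {0: 0.0}    # d(0) = 0, cf. (4.67)
    for l in range(1, L + 1):
        dp[l] = (1 + tau / 2) * B * nt * nr * sum(
            min(1.0 + min(dp[lp - 1], dbar[lp - 1]), de) for lp in range(1, l + 1))   # (4.64)
        dbar[l] = min(k * dfb + dp[l - k] for k in range(1, l + 1))                   # (4.66)
    return dp

# example: L = 3, generous de and dfb, so every recursion step is limited by power only
print(d_picsir_gaussian(L=3, de=100.0, dfb=1e9, tau=0.0, B=2, nt=2, nr=1))
```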

Note that Proposition 4.2 only shows the improvement of dicsir(ℓ) obtained

by optimal power control with respect to uniform power allocation. We have

from Propositions 4.1 and 4.2 that duicsir(1) = dpicsir(1). Further note that even

though we have dpicsir(ℓ) ≥ duicsir(ℓ), Theorem 4.1 suggests that the ARQ outage

diversity is strongly affected by the feedback diversity. Having a low feedback

diversity implies that the ARQ outage diversity is limited by the generalised

outage diversity at the first round plus some extra diversity coming from the

feedback. As the feedback diversity tends to zero, the ARQ scheme with power

control does not improve the diversity. There are two factors contributing to this

phenomenon. Firstly, the power used in the first ARQ round is always limited to

the constraint (4.63). Hence, we cannot transmit power at the first ARQ round

at the higher order than the constraint. Secondly, with a low feedback diversity,

the decoding of ACK signals is not highly reliable, hence it limits the overall

system reliability.

From Theorem 4.1 and Propositions 4.1 and 4.2, we observe that optimal

power control is superior to uniform power allocation if the feedback diversity dfb

is sufficiently high. The following corollary identifies the condition when optimal


power control yields some diversity gains.

Corollary 4.2. Optimal power control can improve ARQ outage diversity with

respect to uniform power allocation if

dfb > duicsir(L)− duicsir(1). (4.70)

Otherwise, optimal power control is as good as uniform power allocation.

The following corollary establishes the criteria for achieving the perfect-feedback

and perfect-CSIR ARQ diversity even if the feedback and CSIR are imperfect.

Corollary 4.3. To achieve the perfect-CSIR diversity, the CSIR-error diversity

de has to satisfy

de ≥ 1 + min( dpicsir(L−1), d(L−1) ).   (4.71)

To achieve the perfect-feedback diversity, the feedback diversity dfb has to be

such that d(ℓ) in (4.66), ℓ = 1, . . . , L− 1 satisfy the following conditions

d(ℓ) ≥ dpicsir(ℓ) (4.72)

and

dfb ≥ dp∗icsir(L)− dp∗icsir(1) (4.73)

where dp∗icsir(ℓ) is dpicsir(ℓ) with infinite dfb.

The criterion de ≥ 1 in Chapter 3 is not sufficient to achieve the perfect-

CSIR diversity when optimal power control is used. We require much larger

de to fully exploit the diversity offered by ARQ. This is because power control

improves the diversity recursively, and the CSIR quality has to improve as well

for every recursion step to adapt with the power level. Hence, the minimum

requirement for de to achieve the perfect-CSIR diversity is associated with the

power spent at the last round L. The high de requirement can be reduced if one

can design a scheme such that the CSIR-error diversity de can be adapted as

a function of the round index ℓ, i.e., de(ℓ), for which de(ℓ) increases with ℓ. In

particular, if de(ℓ) ≥ 1+min(dpicsir(ℓ−1), d(ℓ−1)), then the perfect-CSIR diversity

can be achieved. For example, in pilot-aided channel estimation, adaptive de(ℓ)

achieving perfect-CSIR diversity can be obtained by allowing the same power for

both pilot and data symbols for each round.4.1 Note, however, that having the

4.1 Note that, as explained in Chapter 3, the pilot power is inversely proportional to P^{−de(ℓ)}.


Figure 4.3: Simulation results of the ARQ outage probability Parqgout(R) versus P (dB) for Gaussian-input transmission over a MIMO Rayleigh block-fading channel with parameters: B = 2, L = 2, nt = 2, nr = 1, R = 2 bits per channel use and BSC feedback with parameter p0 = 0.5. Curves are shown for no ARQ with de = 1; dfb = 0.25, de = 1; dfb = 4, de = 1; perfect feedback with de = 1; and perfect feedback and CSIR.

same power on both data and pilot is not a necessary condition as we may have

a fixed de > 0 to obtain the perfect-CSIR diversity.

The two conditions of the feedback diversity for each round can be explained

as follows. The first condition (4.72) is to ensure that the exponent of the power

allocated at round ℓ is identical to the exponent of the perfect-feedback power

allocation at round ℓ. The second condition (4.73) is to ensure that when the

feedback diversity is added to the generalised outage diversity at the first round,

the result is greater than the perfect-feedback diversity at round L.

4.4.3 Discussion

We illustrate the preceding theoretical results using numerical examples as fol-

lows.

Figure 4.3 shows $P^{\mathrm{arq}}_{\mathrm{gout}}(R)$ for Gaussian-input transmission with uniform power allocation. In particular, we evaluate the outage curves for different feedback diversities. For $d_{\mathrm{fb}} = 0.25$, the diversity improvement over the no-ARQ case is marginal; the ARQ diversity is given by the sum of $d_{\mathrm{fb}}$ and $d^{\mathrm{u}}_{\mathrm{icsir}}(1)$. With $d_{\mathrm{fb}} = 4$, which satisfies (4.62), the perfect-feedback ARQ diversity is achieved.


[Figure 4.4: ARQ outage diversity $d_{\mathrm{arq}}$ versus data rate $R$. Panel (a), uniform power allocation: curves for perfect feedback, $d_{\mathrm{fb}} = 5$, $d_{\mathrm{fb}} = 2$ and $d_{\mathrm{fb}} = 0$. Panel (b), optimal power allocation: curves for OP with perfect feedback, OP with $d_{\mathrm{fb}} = 100$, OP with $d_{\mathrm{fb}} = 50$, OP with $d_{\mathrm{fb}} = 0$, and UP with perfect feedback.]

Figure 4.4: ARQ outage diversity for 4-QAM inputs in a MIMO Rayleigh block-fading channel with parameters: B = 2, L = 3, n_t = 2 and n_r = 1. UP (OP) indicates results with uniform (optimal) power allocation. The data rate R is in bits per channel use. The CSIR-error diversity d_e is such that (4.71) is satisfied.

Figure 4.4 illustrates the ARQ diversity for 4-quadrature amplitude modulation (QAM) inputs. Figure 4.4a shows the diversity results with uniform power allocation. With $d_{\mathrm{fb}} \downarrow 0$, the system is unable to exploit the benefits of ARQ in improving diversity. Although increasing $d_{\mathrm{fb}}$ improves the diversity performance, a high feedback diversity is required to fully utilise the ARQ scheme, especially when $L$ is large. In this figure, the perfect-feedback diversity can be achieved with $d_{\mathrm{fb}} = 8$,


which satisfies (4.62). Figure 4.4b illustrates the effect of optimal power control on the ARQ outage diversity. Uniform power allocation with perfect feedback establishes the baseline from which power control can improve the diversity. With a low $d_{\mathrm{fb}}$ (e.g., $d_{\mathrm{fb}} = 0$), no diversity improvement is obtained from power control. As $d_{\mathrm{fb}}$ increases, we observe diversity gains, starting from the high data-rate region. For a sufficiently high $d_{\mathrm{fb}}$, the diversity curve tends to coincide with the perfect-feedback ARQ diversity at high data rates (see, e.g., $R \geq 2$ for $d_{\mathrm{fb}} = 50$ and $R \geq 1$ for $d_{\mathrm{fb}} = 100$). The figure confirms that power-controlled IR-ARQ promises a significant diversity improvement but requires a highly reliable feedback channel.

From the above results, we learn that practical system design has to account for all possible imperfections in the system and the channel. We illustrate in the following that failure to account for those imperfections may result in a very inefficient communication system.

• Failure to account for imperfect feedback

One purpose of IR-ARQ is to improve the reliability of the system. In

particular, we expect that the high-SNR slope of the error probability will

increase as a function of the ARQ delay limit L. However, if feedback

errors, which induce a low feedback diversity, are not accounted for, then

we may only obtain a marginal improvement of the diversity, which is given

by the first-round diversity plus the feedback diversity. This can be much

smaller than the expected ARQ diversity that improves as a function of L.

• Assuming perfect feedback, failure to account for imperfect CSIR

If the CSIR is imperfect but the transmitter treats it as if it were perfect, then, assuming perfect feedback, the power allocation at round $\ell$ (which is optimal for perfect CSIR) satisfies [32]

$P_\ell \doteq P^{\,1 + d^{\mathrm{p}}_{\mathrm{csir}}(\ell-1)}$.    (4.74)

As proved in Appendix B.4 (with $d_{\mathrm{fb}} \uparrow \infty$), with imperfect CSIR we have

$\Pr\{F_t(\ell-1) = 0\} \doteq P^{-d^{\mathrm{p}}_{\mathrm{icsir}}(\ell-1)}$.    (4.75)

Thus, the average in (4.63) becomes

$\dfrac{P_1}{L} + \dfrac{1}{L} \sum_{\ell=2}^{L} \Pr\{F_t(\ell-1) = 0\}\, P_\ell \;\dot{\geq}\; P^1$    (4.76)


where the last inequality holds because $d^{\mathrm{p}}_{\mathrm{icsir}}(\ell-1) \leq d^{\mathrm{p}}_{\mathrm{csir}}(\ell-1)$ by Proposition 2.4. This implies that by ignoring that the CSIR is imperfect, we may violate the power constraint (4.63).

As also pointed out in Appendix B.4, failure to account for the CSIR-error diversity $d_e$ may lead to a lower achievable SNR-exponent. Consider a discrete alphabet of size $2^M$ and suppose that $d_e = 1$ and $d_{\mathrm{SB}}(R) \geq 1$. Following the steps used in Appendix B.4, with the power allocation (4.74), we have that

$d^{\mathrm{p}}_{\mathrm{icsir}}(L) = L\, d_{\mathrm{SB}}(R)$    (4.77)

which is strictly smaller than the ARQ diversity with uniform power allocation $d^{\mathrm{u}}_{\mathrm{icsir}} = d_{\mathrm{SB}}(L, R)$. So, in this case, employing the power adaptation algorithm [32] (which is optimal in terms of SNR-exponent for perfect CSIR) may yield a lower reliability than employing uniform power allocation. This shows that the optimal power control algorithm under perfect CSIR can be highly suboptimal in practical systems.

As a final note, we recall that under perfect CSIR the accumulated mutual information [26, 29, 32] always improves with the number of fading blocks,

implying that under perfect CSIR, IR-ARQ (full observation-combining) with

optimal power control is superior to other hybrid ARQ schemes (partial or no

observation-combining) with or without optimal power control. As we pointed

out in Section 4.3, under imperfect CSIR, the accumulated GMI does not ne-

cessarily improve and can decrease with the number of fading blocks. This sug-

gests that in practical scenarios, IR-ARQ may not necessarily be superior to

other schemes employing partial observation-combining (such as maximal-ratio

combining) or no observation-combining (such as ALOHA). With our decoding

model, for example, to improve the reliability, the general rule is that at round

ℓ, ℓ = 1, . . . , L, we should selectively combine fading blocks which yield a unique

message output m ∈ M from the threshold decoder Ψ(·) in Section 4.2.2. Selec-

tive observation-combining yields a better reliability when the combined coded

blocks improve the accumulated GMI from round to round.

4.5 Conclusion

We have analysed the performance of IR-ARQ over MIMO block-fading channels

with imperfect CSIR and BSC feedback. Specifically, we have characterised the

diversity penalty caused by imperfect CSIR and feedback. Our results suggest


that the feedback SNR must improve with the forward SNR in order for IR-

ARQ to be able to exploit the available diversity; otherwise the IR-ARQ scheme

is unable to improve the diversity. We have derived the conditions for which

the perfect-feedback and perfect-CSIR ARQ diversity may be exploited. We

have learnt that in order to achieve the perfect-feedback diversity, the required

feedback transmission must provide an additional diversity which is increasing

with the maximum number of ARQ rounds. We have identified how power control

can be used to further improve the system performance. We have highlighted the

importance of practical system design that accounts for all possible imperfections in the channel. Failure to account for these imperfections may result in overspending the transmit power and degrading the reliability of transmission. At the end of the chapter, we have pointed out that selective observation-combining ARQ may be competitive with IR-ARQ in practical systems where the CSIR and the feedback are imperfect.


Chapter 5

Mismatched CSI Outage SNR-Exponents of MIMO Block-Fading Channels

In Chapter 2, we discussed that for sufficiently large block lengths the error probability in block-fading channels cannot be made arbitrarily smaller than the information outage probability. One important element that affects

the outage performance is the availability of CSI. In Chapter 3 we have studied

the effect of imperfect CSI at the receiver (CSIR) in block-fading channels. In

this chapter, we extend the analysis in Chapter 3 to include imperfect CSI at

the transmitter (CSIT). We propose a unified framework to study mismatched

CSI at both terminals. In particular, we study the GMI and the generalised out-

age probability—which have been introduced in Chapter 2—of nearest neighbour

decoding when power adaptation is employed at the transmitter.

This chapter is outlined as follows. Section 5.1 specifically introduces our

system model. Section 5.2 provides some relevant backgrounds to study imper-

fect CSI in the block-fading channel. Section 5.3 presents our main results on

outage SNR-exponent. Section 5.4 provides discussion and further analysis on

our findings. Section 5.5 summarises the important points of the chapter.

5.1 System Model

The system model is depicted in Figure 5.1. We consider a MIMO block-fading

channel with nt transmit antennas, nr receive antennas and B fading blocks per

codeword. The output of the channel for block b is an nr×J-dimensional random


[Figure 5.1: block diagram. Message $m$ enters the encoder; for each block $b = 1, \ldots, B$ the transmit matrix $X_b$ is scaled by $\sqrt{P_b/n_t}$, multiplied by the fading $H_b$ and corrupted by the noise $Z_b$ to give the output $Y_b$; the decoder produces $\hat{m}$. CSIR: $\hat{H}_b$, $b = 1, \ldots, B$; CSIT: $\tilde{H}^{(n(b))}$, $b = 1, \ldots, B$.]

Figure 5.1: System model for MIMO block-fading channels with imperfect CSI at both terminals.

matrix

$Y_b = H_b \mathsf{P}_b^{1/2} X_b + Z_b, \quad b = 1, \ldots, B$    (5.1)

where $Z_b$ is the $n_r \times J$-dimensional random noise matrix and $X_b \in \mathcal{X}^{n_t \times J}$ is the transmitted signal matrix; $J$ and $\mathcal{X}$ denote the channel block length and the signal constellation, respectively. We assume that the entries of $Z_b$ are i.i.d. complex-Gaussian random variables with zero mean and unit variance. The $n_r \times n_t$ random matrix $H_b$ denotes the fading for block $b$ and is assumed to be i.i.d. from block to block. Furthermore, the entries of $H_b$ are i.i.d. zero-mean unit-variance complex-Gaussian random variables; the magnitude of each entry of $H_b$ is then Rayleigh distributed.

A codeword representing a message $m \in \{1, \ldots, 2^{BJR}\}$ to be transmitted is denoted by $X(m) = [X_1(m), \ldots, X_B(m)]$, where $R$ is the coding rate. The entries of $X$ are i.i.d., drawn from a probability distribution over $\mathcal{X}^{n_t}$, where $\mathcal{X} \subseteq \mathbb{C}$ is the input alphabet. Herein we focus on Gaussian inputs and discrete inputs. We further assume that the coding rate $R$ is a fixed positive constant; hence the multiplexing gain (2.57) tends to zero. Finally, we assume that a codeword is normalised such that

$\dfrac{1}{BJ}\, \mathbb{E}\big[\|X\|_F^2\big] = n_t$.    (5.2)

We study a case where imperfect CSI is the actual CSI plus AWGN. This

model of noisy CSI comes from exploiting channel reciprocity [70, 71] for which

the channel realisation is identical at both ends but the channel estimation noises


are independent, i.e.,

CSIT: $\tilde{H}_b = H_b + \tilde{E}_b$,    (5.3)
CSIR: $\hat{H}_b = H_b + E_b$    (5.4)

where $\tilde{E}_b$ and $E_b$ are the CSIT and CSIR noise random matrices, respectively. Note that $\tilde{E}_b$ is independent of $E_b$. The entries of $\tilde{E}_b$ and $E_b$ are assumed to be independent of $H_b$ and i.i.d. complex-Gaussian with zero mean and variances $\sigma_{\tilde{e}}^2 = P^{-\tilde{d}_e}$ and $\sigma_e^2 = P^{-d_e}$, respectively, where $P$ is the average SNR. For a fixed fading matrix $H_b$, the fading estimates at the two terminals are independent. The imperfect-CSIR model is widely used for pilot-based channel estimation at the receiver, for which the error variance is proportional to the reciprocal of the pilot SNR [49, 50]. The same estimation technique can also be performed at the transmitter, i.e., by transmitting pilot symbols on the reverse link of a time-division duplex (TDD) system. We further incorporate the parameters $\tilde{d}_e > 0$ and $d_e > 0$, denoting the CSIT-error and the CSIR-error diversities, respectively.

The power matrix $\mathsf{P}_b$ is an $n_t \times n_t$ diagonal matrix. In particular, we use a scaled identity power matrix as in [45, 46],

$\mathsf{P}_b\big(\tilde{H}^{(n(b))}\big) = \dfrac{P_b\big(\tilde{H}^{(n(b))}\big)}{n_t}\, I_{n_t}$    (5.5)

where the scalar $P_b(\cdot)$ is the total power allocated to block $b$ and $n(b)$ is the number of fading blocks used for power adaptation. Note that the power allocation at block $b$ in (5.5) is performed after observing the noisy CSIT vector

$\tilde{H}^{(n(b))} = \big[\tilde{H}_1, \ldots, \tilde{H}_{n(b)}\big]$.    (5.6)

Depending on n(b) we have the following cases.

• Full-CSIT power allocation if n(b) = B for all b = 1, . . . , B. Imperfect

fading estimates for the whole B blocks in a codeword are available at the

transmitter prior to transmission. This setup is practically relevant for

multi-carrier orthogonal transmission such as OFDM, where the channel is

estimated in the time domain and data transmission occurs in the frequency

domain.

• Causal-CSIT power allocation if n(b) = b− τd with a fixed delay τd > 0 for

any b = 1, . . . , B. This corresponds to CSIT being limited only to the past

imperfect fading estimates due to the delay τd. Causal CSIT is motivated


by block-fading channels with time-domain transmission for which only

past fading estimates may be available at the transmitter.

• Predictive-CSIT power allocation if n(b) = b+τf with a fixed τf ≥ 0 for any

b = 1, . . . , B. Here τf is a prediction parameter indicating the number of

predicted fading blocks. This corresponds to CSIT including past, current

and a number of predicted future fading estimates. This setup is relevant

for instantaneous parallel transmission such as in OFDM where (possibly)

not all fading blocks are used for power allocation. More specifically, for

each fading block b = 1, . . . , B, only (n(b) = b + τf) fading matrices are

used for allocating power at block b. This setup is essential to capture the

case in between causal and full CSIT cases.

For the above power allocation schemes, the corresponding long-term average power constraint is given by

$\mathbb{E}\bigg[\dfrac{1}{B} \sum_{b=1}^{B} \mathrm{tr}\Big(\mathsf{P}_b\big(\tilde{H}^{(n(b))}\big)\Big)\bigg] \leq P$.    (5.7)

The power matrix structure in (5.5) implies that for block $b$ we have equal power distribution across all transmit antennas.

Nearest neighbour decoding is used to infer the transmitted message. With imperfect CSIR, the decoder treats the imperfect channel estimate as if it were perfect. For a given output $Y = [Y_1, \ldots, Y_B]$, CSIR $\hat{H} = [\hat{H}_1, \ldots, \hat{H}_B]$ and power matrix $\mathsf{P} = \big[\mathsf{P}_1(\tilde{H}^{(n(1))}), \ldots, \mathsf{P}_B(\tilde{H}^{(n(B))})\big]$, it first computes the metric

$Q\big(Y, \hat{H}, \mathsf{P}, X(m)\big) \propto \exp\bigg(-\sum_{b=1}^{B} \big\|Y_b - \hat{H}_b \mathsf{P}_b^{1/2} X_b(m)\big\|_F^2\bigg)$    (5.8)

and then outputs

$\hat{m} = \arg\max_{m \in \{1, \ldots, 2^{BJR}\}} Q\big(Y, \hat{H}, \mathsf{P}, X(m)\big)$.    (5.9)
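To make the decoding rule concrete, the following is a minimal numerical sketch of (5.8)-(5.9), added here for illustration (it is not part of the original text); it assumes NumPy, and the function and variable names are ours. Maximising the exponential metric in (5.8) is equivalent to minimising the accumulated squared Euclidean distance.

import numpy as np

def nn_decode(Y, H_hat, P_sqrt, codebook):
    """Nearest neighbour decoding with mismatched CSIR, cf. (5.8)-(5.9).

    Y        : list of B received matrices, each n_r x J
    H_hat    : list of B CSIR estimates, each n_r x n_t
    P_sqrt   : list of B diagonal power matrices P_b^{1/2}, each n_t x n_t
    codebook : dict mapping message m to a list of B transmit matrices X_b(m)
    """
    best_m, best_dist = None, np.inf
    for m, X in codebook.items():
        # accumulated squared Euclidean distance over the B fading blocks
        dist = sum(np.linalg.norm(Y[b] - H_hat[b] @ P_sqrt[b] @ X[b], 'fro')**2
                   for b in range(len(Y)))
        if dist < best_dist:
            best_m, best_dist = m, dist
    return best_m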

5.2 Preliminaries

By incorporating power adaptation based on noisy CSIT, we have from the

analysis in Chapter 2 that in the large block length regime, the average error

probability—averaged over the ensemble of random codes with input distribution

PX(x) over Xnt—can be upper-bounded using the generalised outage probability


$P_{\mathrm{gout}}(R)$ defined in (2.32), i.e.,

$P_e \leq \Pr\big\{I^{\mathrm{gmi}}(H, \hat{H}, \mathsf{P}) < R\big\}$    (5.10)
$\phantom{P_e \leq{}} \triangleq P_{\mathrm{gout}}(R)$    (5.11)

where

$I^{\mathrm{gmi}}(H, \hat{H}, \mathsf{P}) = \sup_{s>0} \dfrac{1}{B} \sum_{b=1}^{B} I^{\mathrm{gmi}}_b\big(\mathsf{P}_b, H_b, \hat{H}_b, s\big)$    (5.12)

is the generalised mutual information (GMI) for fading $H$, receiver estimate $\hat{H}$ and power matrix $\mathsf{P}$, and where

$I^{\mathrm{gmi}}_b\big(\mathsf{P}_b, H_b, \hat{H}_b, s\big) = \mathbb{E}\left[\log_2 \dfrac{Q^s\big(Y, \hat{H}_b, \mathsf{P}_b, X\big)}{\mathbb{E}\big[Q^s\big(Y, \hat{H}_b, \mathsf{P}_b, X'\big) \,\big|\, Y, H_b, \hat{H}_b, \mathsf{P}_b\big]} \,\middle|\, H_b, \hat{H}_b, \mathsf{P}_b\right]$.    (5.13)

This shows the achievability of the generalised outage probability for block-fading

channels with imperfect CSI. The ensemble average implies that we can find codes

that achieve an error probability, that is as small as Pgout(R). This does not mean

that no codes can achieve a smaller error probability than Pgout(R). However,

for i.i.d. codebooks, it has been shown in [72] based on the results in [4, 36, 39]

that Pgout(R) is the smallest error probability for block-fading channels with

mismatched CSIR. Therefore, for i.i.d. codebooks with sufficiently large block

length, it suffices to study Pgout(R) in order to characterise the error events.

We are interested in characterising the behaviour of $P_{\mathrm{gout}}(R)$ at high SNR. One important figure of merit is the generalised outage diversity defined in (2.67),

$d \triangleq \lim_{P \to \infty} \dfrac{-\log P_{\mathrm{gout}}(R)}{\log P}$.    (5.14)

Throughout this chapter, the generalised outage diversity and the outage SNR-exponent mean the same thing and both refer to (5.14).

In Chapter 3, we have shown that with uniform power allocation the imperfect-CSIR outage SNR-exponent $d^{\mathrm{u}}_{\mathrm{icsir}}$ is a function of the perfect-CSIR outage SNR-exponent $d^{\mathrm{u}}_{\mathrm{csir}}$ and the CSIR-error diversity $d_e$, i.e.,

$d^{\mathrm{u}}_{\mathrm{icsir}} = \min(1, d_e) \times d^{\mathrm{u}}_{\mathrm{csir}}$.    (5.15)

Here the superscript u denotes the uniform power allocation. From [29, 32], we


have that

$d^{\mathrm{u}}_{\mathrm{csir}} = \begin{cases} B n_t n_r, & \text{for Gaussian inputs} \\ d_{\mathrm{SB}}(R) \triangleq n_r \big(1 + \big\lfloor B\big(n_t - \tfrac{R}{M}\big)\big\rfloor\big), & \text{for discrete inputs} \end{cases}$    (5.16)

where $d_{\mathrm{SB}}(R)$ is the Singleton bound and $M \triangleq \log_2 |\mathcal{X}|$. This result implies that if the variance of the CSIR error decays at least as fast as the inverse of the SNR (i.e., $d_e \geq 1$), the perfect-CSIR diversity is achievable. Otherwise, the imperfect-CSIR diversity is smaller than the perfect-CSIR diversity.
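As a quick numerical illustration of (5.15) and (5.16), the following sketch (added here, not from the original text; Python, names are ours) evaluates the uniform-power exponents:

import math

def d_csir_uniform(B, nt, nr, R=None, M=None, gaussian=True):
    """Perfect-CSIR outage SNR-exponent with uniform power allocation, cf. (5.16)."""
    if gaussian:
        return B * nt * nr
    # Discrete inputs: Singleton bound d_SB(R), with M = log2|X| bits per symbol.
    return nr * (1 + math.floor(B * (nt - R / M)))

def d_icsir_uniform(d_csir, d_e):
    """Imperfect-CSIR exponent with uniform power allocation, cf. (5.15)."""
    return min(1.0, d_e) * d_csir

# Example: B = 2, nt = 2, nr = 1, 4-QAM (M = 2), R = 2 bits per channel use.
d_csir = d_csir_uniform(B=2, nt=2, nr=1, R=2, M=2, gaussian=False)  # Singleton bound = 3
print(d_csir, d_icsir_uniform(d_csir, d_e=0.5))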

If CSIT is available, then the transmitter can adapt its transmission power to minimise the generalised outage probability. The idea is that in a very bad channel realisation power can be saved and used when channel conditions improve. References [27, 73] showed that if perfect CSI is available at both terminals, then zero outage is possible, implying that the delay-limited capacity [74] is positive. References [45, 46] extended the results to the perfect-CSIR, imperfect full-CSIT setup. In this case, the SNR-exponent is given by

$d^{\mathrm{f}}_{\mathrm{icsit}} = d^{\mathrm{u}}_{\mathrm{csir}} \big(1 + d^{\mathrm{u}}_{\mathrm{csir}} \tilde{d}_e\big)$    (5.17)

where the superscript f denotes full-CSIT power control. Assuming perfect CSIR, reference [75] considered cases where imperfect causal or predictive CSIT is available. In those cases, the SNR-exponent is given as a function of the CSIT-error diversity $\tilde{d}_e$ and the CSIT delay $\tau_d$ or the CSIT prediction parameter $\tau_f$.

In practical scenarios, both CSIR and CSIT will be imperfect. It is therefore of practical interest to study mismatched CSI at both ends under a unified framework. In this work, we find the SNR-exponents with imperfect CSI at both ends using nearest neighbour decoding and power allocation. In particular, the power allocation algorithm is given by the solution to the following optimisation problem:

minimise    $P_{\mathrm{gout}}(R)$
subject to  $\mathbb{E}\Big[\dfrac{1}{B} \sum_{b=1}^{B} \mathrm{tr}\big(\mathsf{P}_b\big(\tilde{H}^{(n(b))}\big)\big)\Big] \leq P$
            $\mathrm{diag}\big(\mathsf{P}_b\big(\tilde{H}^{(n(b))}\big)\big) \succeq 0, \quad b = 1, \ldots, B$.    (5.18)

Solving the above optimisation problem can be difficult in general. Given our

CSIT model, the minimum-outage power allocation is difficult to find since

Pgout(R) depends on both actual channel and channel estimate. Nevertheless,

we will see that despite this difficulty, studying the behaviour of the optimal


[Figure 5.2: regions in the plane of the CSIT- and CSIR-error diversities, separated by the line $d_e = 1 + d^{\mathrm{u}}_{\mathrm{csir}} \tilde{d}_e$; labels: "mismatched CSIR dominates, $d^{\mathrm{f}}_{\mathrm{icsi}} = d_e\, d^{\mathrm{u}}_{\mathrm{csir}}$"; "power control is not effective, $d^{\mathrm{f}}_{\mathrm{icsi}} = d^{\mathrm{u}}_{\mathrm{csir}}$"; "mismatched CSIT dominates, $d^{\mathrm{f}}_{\mathrm{icsi}} = (1 + d^{\mathrm{u}}_{\mathrm{csir}} \tilde{d}_e)\, d^{\mathrm{u}}_{\mathrm{csir}}$".]

Figure 5.2: Interplay among the CSIT- and CSIR-error diversities and the outage SNR-exponent with full-CSIT power allocation.

solution at high SNR is possible. We will use the technique in [46] to derive

the asymptotic power allocation that results in no loss in terms of outage SNR-

exponent.

5.3 Outage SNR-Exponents

The solution to the power allocation in (5.18) depends on whether the CSIT is

full, causal or predictive. Therefore, we will separately study the SNR-exponent

for each type of CSIT.

5.3.1 Full-CSIT Power Allocation

Theorem 5.1 (Full CSIT). For full CSIT (where $n(b) = B$ in (5.6)), the outage SNR-exponent $d^{\mathrm{f}}_{\mathrm{icsi}}$ of MIMO block-fading channels with $n_t$ transmit antennas, $n_r$ receive antennas, $B$ fading blocks, CSIT-error diversity $\tilde{d}_e$ and CSIR-error diversity $d_e$ for Gaussian and discrete constellations is

$d^{\mathrm{f}}_{\mathrm{icsi}} = \begin{cases} d^{\mathrm{u}}_{\mathrm{csir}}\, d_e, & \text{if } d_e \leq 1 + d^{\mathrm{u}}_{\mathrm{csir}} \tilde{d}_e \\ d^{\mathrm{u}}_{\mathrm{csir}} \big(1 + d^{\mathrm{u}}_{\mathrm{csir}} \tilde{d}_e\big), & \text{if } d_e > 1 + d^{\mathrm{u}}_{\mathrm{csir}} \tilde{d}_e \end{cases}$    (5.19)


where ducsir is given in (5.16).

Proof. See Appendix C.3.
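A direct numerical transcription of (5.19) reads as follows (an illustrative sketch added here, not from the original text; argument names are ours):

def d_full_csit(d_csir_u, d_e, d_e_t):
    """Outage SNR-exponent with full-CSIT power allocation, cf. Theorem 5.1, (5.19).

    d_csir_u : perfect-CSIR exponent with uniform power allocation, cf. (5.16)
    d_e      : CSIR-error diversity
    d_e_t    : CSIT-error diversity
    """
    if d_e <= 1 + d_csir_u * d_e_t:            # mismatched CSIR dominates
        return d_csir_u * d_e
    return d_csir_u * (1 + d_csir_u * d_e_t)   # mismatched CSIT dominates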

The results in Theorem 5.1 highlight the trade-off between the resources spent on estimating the channel at the two terminals and the effectiveness of power control given a noisy CSIR. We illustrate this in Figure 5.2. Power control is effective whenever the CSIR noise variance is much smaller than the CSIT noise variance, i.e., $d_e > 1 + d^{\mathrm{u}}_{\mathrm{csir}} \tilde{d}_e$. For example, with Gaussian inputs $d_e$ must be larger than $\tilde{d}_e$ approximately by a factor of $B n_t n_r$. The condition $d_e > 1$ highlights the potential improvement made by power control over uniform power allocation. Outage events are dominated by the mismatched CSIR if the CSIR noise is strong, i.e., for $d_e \leq 1 + d^{\mathrm{u}}_{\mathrm{csir}} \tilde{d}_e$. Otherwise, outage events are dominated by the mismatched CSIT.

Remark 5.1. CSIR has a higher impact on the generalised outage diversity than

CSIT. With high quality CSIR, poor CSIT has the same effect as having no power

control. On the other hand, with high quality CSIT, poor CSIR results in diversity

approaching zero.

The result in Theorem 5.1 is consistent with previous results. In particular, we recover the mismatched-CSIT perfect-CSIR outage SNR-exponent in [45, 46] by letting $d_e \uparrow \infty$ (perfect CSIR), and the no-CSIT mismatched-CSIR outage SNR-exponent of Chapter 3 by letting $\tilde{d}_e \downarrow 0$.

5.3.2 Causal-CSIT Power Allocation

Theorem 5.2 (Causal CSIT). Consider a MIMO block-fading channel with $n_t$ transmit antennas, $n_r$ receive antennas, $B$ fading blocks, CSIT-error diversity $\tilde{d}_e$ and CSIR-error diversity $d_e$. For causal CSIT with delay $\tau_d > 0$ (where $n(b) = b - \tau_d$ in (5.6)), the outage SNR-exponent $d^{\mathrm{c}}_{\mathrm{icsi}}$ for Gaussian inputs is given by

$d^{\mathrm{c}}_{\mathrm{icsi}} = n_t n_r \sum_{b=1}^{B} \upsilon_b$    (5.20)

where

$\upsilon_b \triangleq \begin{cases} \min(d_e, 1), & b = 1, \ldots, \tau_d \\ \min\big(d_e,\, 1 + n_t n_r \sum_{b'=1}^{b-\tau_d} \min(\upsilon_{b'}, \tilde{d}_e)\big), & b = \tau_d + 1, \ldots, B. \end{cases}$    (5.21)


On the other hand, the outage SNR-exponent $d^{\mathrm{c}}_{\mathrm{icsi}}$ for discrete inputs is given by

$d^{\mathrm{c}}_{\mathrm{icsi}} = n_t n_r \sum_{b=1}^{\bar{b}} \vartheta_b + n_r \big(d^{\ddagger} - \bar{b} n_t\big)\, \vartheta_{\bar{b}+1}$    (5.22)

where

$d^{\ddagger} \triangleq B n_t - \Big\lceil \dfrac{BR}{M} \Big\rceil + 1$    (5.23)

$\bar{b} \triangleq \max\{b : b n_t \leq d^{\ddagger}\}$    (5.24)

$\vartheta_b \triangleq \begin{cases} \min(d_e, 1), & b = 1, \ldots, \min(\tau_d, \bar{b}+1) \\ \min\big(d_e,\, 1 + n_t n_r \sum_{b'=1}^{b-\tau_d} \min(\vartheta_{b'}, \tilde{d}_e)\big), & b = \min(\tau_d, \bar{b}+1) + 1, \ldots, \bar{b}+1. \end{cases}$    (5.25)

Proof. See Appendix C.4.
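The recursion (5.21) is easy to evaluate numerically. The following sketch (added for illustration, not from the original text; Python, names are ours) computes the Gaussian-input exponent (5.20)-(5.21):

def d_causal_csit_gaussian(B, nt, nr, d_e, d_e_t, tau_d):
    """Outage SNR-exponent with causal-CSIT power allocation and Gaussian
    inputs, cf. Theorem 5.2, (5.20)-(5.21).

    d_e   : CSIR-error diversity
    d_e_t : CSIT-error diversity
    tau_d : CSIT delay (in fading blocks)
    """
    v = []
    for b in range(1, B + 1):
        if b <= tau_d:
            v.append(min(d_e, 1.0))   # no CSIT available yet: no power-control gain
        else:
            # power at block b adapts to the first b - tau_d noisy CSIT estimates
            gain = 1.0 + nt * nr * sum(min(vb, d_e_t) for vb in v[:b - tau_d])
            v.append(min(d_e, gain))
    return nt * nr * sum(v)

# Example: B = 4, nt = 2, nr = 1, tau_d = 1, reliable CSIR (d_e = 10), d_e_t = 1.
print(d_causal_csit_gaussian(4, 2, 1, d_e=10, d_e_t=1, tau_d=1))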

There are two cases for which causal-CSIT power allocation cannot increase the diversity. The first case is when the CSIR estimation is too unreliable, i.e., $d_e \leq 1$. The second case is when the delay in obtaining CSIT is long. For instance, power control cannot improve the diversity if the delay in obtaining CSIT is greater than $B$ (Gaussian inputs) or $\bar{b}$ (discrete inputs). Thus, the CSIT-delay requirement for discrete inputs is stricter than for Gaussian inputs. This second case is identical to the result in [75]. Indeed, the result in [75] is an instance of Theorem 5.2 with $d_e \uparrow \infty$.

Note that with perfect CSI at both transmitter and receiver ($d_e \uparrow \infty$ and $\tilde{d}_e \uparrow \infty$), the outage diversity with causal CSIT is always finite. This means that at high SNR the slope of the generalised outage probability with respect to $\log P$ is finite and zero outage is not possible at finite SNR. It thus follows that the delay-limited capacity [74] is zero.

5.3.3 Predictive-CSIT Power Allocation

Theorem 5.3 (Predictive CSIT). Consider a MIMO block-fading channel with $n_t$ transmit antennas, $n_r$ receive antennas, $B$ fading blocks, CSIT-error diversity $\tilde{d}_e$ and CSIR-error diversity $d_e$. For predictive CSIT (where $n(b) = b + \tau_f$ in (5.6), $\tau_f \geq 0$), the outage SNR-exponent $d^{\mathrm{p}}_{\mathrm{icsi}}$ for Gaussian inputs is given by

$d^{\mathrm{p}}_{\mathrm{icsi}} = n_t n_r \sum_{b=1}^{B} \min\big(d_e,\, 1 + n_t n_r \min(B,\, b + \tau_f)\, \tilde{d}_e\big)$.    (5.26)


On the other hand, the outage SNR-exponent $d^{\mathrm{p}}_{\mathrm{icsi}}$ for discrete inputs is given by

$d^{\mathrm{p}}_{\mathrm{icsi}} = n_t n_r \sum_{b=1}^{\bar{b}} \min(\eta_b, d_e) + n_r \big(d^{\ddagger} - \bar{b} n_t\big) \min\big(\eta_{\bar{b}+1}, d_e\big)$    (5.27)

where $d^{\ddagger}$ and $\bar{b}$ are defined in (5.23) and (5.24), respectively, and

$\eta_b \triangleq \begin{cases} 1 + n_t n_r (b + \tau_f)\, \tilde{d}_e, & b + \tau_f \leq \bar{b} \\ 1 + n_r d^{\ddagger} \tilde{d}_e, & b + \tau_f > \bar{b}. \end{cases}$    (5.28)

Proof. See Appendix C.5.
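For Gaussian inputs, (5.26) can be evaluated in a single pass (an illustrative sketch added here, not from the original text; names are ours):

def d_predictive_csit_gaussian(B, nt, nr, d_e, d_e_t, tau_f):
    """Outage SNR-exponent with predictive-CSIT power allocation and Gaussian
    inputs, cf. Theorem 5.3, (5.26)."""
    return nt * nr * sum(
        min(d_e, 1.0 + nt * nr * min(B, b + tau_f) * d_e_t)
        for b in range(1, B + 1)
    )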

In Theorem 5.3, we observe how predictive-CSIT power control improves the outage diversity via a recursion in the power adaptation. Note that with

$d_e \geq 1 + d^{\mathrm{u}}_{\mathrm{csir}} \tilde{d}_e$    (5.29)

we essentially obtain the same diversity as in the noiseless-CSIR case, which is given in [75]. If the prediction parameter $\tau_f$ satisfies

$\tau_f \geq \begin{cases} B - 1, & \text{for Gaussian inputs} \\ \bar{b}, & \text{for discrete inputs,} \end{cases}$    (5.30)

then the diversity obtained with predictive CSIT is the same as that with full CSIT.

For $d_e \leq 1$, any power control with full, causal or predictive CSIT cannot improve the outage diversity with respect to that achieved with uniform power allocation. This corresponds to the case where the CSIR is too unreliable.

5.4 Discussion

5.4.1 Pilot-Assisted Channel Estimation

The CSI models in (5.3) and (5.4) are an abstraction of pilot-based channel esti-

mation for which two-way pilot transmissions are used for estimating the fading

coefficients at both terminals. In particular, the estimations take the advantage

of the slow-fading process and the channel reciprocity in TDD systems. The

channel remains constant for block b and thus, the two-way pilot transmissions

for estimating Hb occur within the block b. For orthogonal pilot design [76, 77],


where orthogonal vectors are used to estimate the nr · nt entries of the fading

matrix H b for b = 1, . . . , B, these transmissions require (nt + nr) channel uses

and are done prior to transmitting the data for block b. Since the transmitter has

access to noisy fading coefficients up to block b, for block-fading channels with

time-domain data transmission, only causal CSIT power allocation with delay

τd > 0 or predictive CSIT with τf = 0 are realistic. On the other hand, for block

fading channels with time-domain channel estimation and frequency-domain data

transmission (such as multi-carrier transmission and OFDM), the full CSIT as-

sumption is of practical relevance.

Suppose that orthogonal pilots [76, 77] are employed and that for each training only one antenna is active at a time. This training requires $n_t$ time instants to transmit pilot symbols from the transmitter and $n_r$ time instants to transmit pilot symbols from the receiver. Furthermore, assume that the pilot power at the transmitter is $P^{d_e}$ and at the receiver is $P^{\tilde{d}_e}$. Then the received pilot symbols at both ends, when transmit antenna $t$ and receive antenna $r$ are active, are given by

Receiver: $Y_{b,r,t} = \sqrt{P^{d_e}}\, H_{b,r,t} + Z_{b,r,t}$,    (5.31)
Transmitter: $\tilde{Y}_{b,r,t} = \sqrt{P^{\tilde{d}_e}}\, H_{b,r,t} + \tilde{Z}_{b,r,t}$    (5.32)

where $Z_{b,r,t}$ and $\tilde{Z}_{b,r,t}$ are zero-mean unit-variance complex-Gaussian noise samples at the receiver and at the transmitter, respectively; $Z_{b,r,t}$ and $\tilde{Z}_{b,r,t}$ are independent. Dividing (5.31) by $\sqrt{P^{d_e}}$ and (5.32) by $\sqrt{P^{\tilde{d}_e}}$ leads to the CSIR model (5.4) and the CSIT model (5.3), respectively. This pilot-based estimation corresponds to maximum-likelihood (ML) channel estimation [49]. Note that in the limit $J \to \infty$ the pilot fraction $\frac{n_t + n_r}{J}$ vanishes; hence, the pilot insertion does not affect the SNR-exponents in Theorems 5.1, 5.2 and 5.3.
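The following sketch (ours, assuming NumPy; illustrative only) shows the per-entry ML pilot estimation described above: the received pilot (5.31) is divided by the pilot amplitude, which yields an estimate of the form (5.4) with error variance $P^{-d_e}$.

import numpy as np

def ml_pilot_estimate(H, P, d_e, rng=np.random.default_rng(0)):
    """Per-entry ML pilot-based channel estimation, cf. (5.31) and (5.4)."""
    noise = (rng.standard_normal(H.shape) + 1j * rng.standard_normal(H.shape)) / np.sqrt(2)
    Y_pilot = np.sqrt(P**d_e) * H + noise       # received pilot, cf. (5.31)
    return Y_pilot / np.sqrt(P**d_e)            # CSIR estimate, cf. (5.4)

# Example: estimate a 1x2 fading matrix at SNR P = 100 with d_e = 1.
H = (np.random.randn(1, 2) + 1j * np.random.randn(1, 2)) / np.sqrt(2)
print(np.abs(ml_pilot_estimate(H, P=100.0, d_e=1.0) - H))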

Another common channel estimation scheme is linear minimum mean-squared error (LMMSE) estimation. Consider the estimation at the receiver and let the CSIR be

$\hat{H}_{b,r,t} = a_{b,r,t}\, Y_{b,r,t}$    (5.33)

where $a_{b,r,t}$ is chosen such that

$\sigma_e^2 = \mathbb{E}\big[|E_{b,r,t}|^2\big] = \mathbb{E}\big[|\hat{H}_{b,r,t} - H_{b,r,t}|^2\big]$    (5.34)

is minimised. It is not difficult to show that the $a_{b,r,t}$ that minimises the above


expectation is given by

$a^*_{b,r,t} = \dfrac{\sqrt{P^{d_e}}}{P^{d_e} + 1}$.    (5.35)

Furthermore, from the orthogonality principle [78], we have

$H_{b,r,t} = \hat{H}_{b,r,t} + E_{b,r,t}$    (5.36)

with $\hat{H}_{b,r,t}$ and $E_{b,r,t}$ being independent of each other. Moreover, the variances of $\hat{H}_{b,r,t}$ and $E_{b,r,t}$ are given by

$\sigma_{\hat{h}}^2 = 1 - \dfrac{1}{P^{d_e} + 1}$,    (5.37)

$\sigma_e^2 = 1 - \sigma_{\hat{h}}^2 = \dfrac{1}{P^{d_e} + 1}$.    (5.38)

Note that for $d_e > 0$ we have the dot equality

$\sigma_e^2 = \dfrac{1}{P^{d_e} + 1} \doteq P^{-d_e}$.    (5.39)

The same explanation applies to the estimation at the transmitter, where we have

$H_{b,r,t} = \tilde{H}_{b,r,t} + \tilde{E}_{b,r,t}$    (5.40)

with $\tilde{H}_{b,r,t}$ and $\tilde{E}_{b,r,t}$ being independent of each other. Furthermore, the variances of $\tilde{H}_{b,r,t}$ and $\tilde{E}_{b,r,t}$ are given by

$\sigma_{\tilde{h}}^2 = 1 - \dfrac{1}{P^{\tilde{d}_e} + 1}$,    (5.41)

$\sigma_{\tilde{e}}^2 = 1 - \sigma_{\tilde{h}}^2 = \dfrac{1}{P^{\tilde{d}_e} + 1} \doteq P^{-\tilde{d}_e}$    (5.42)

where the dot equality is valid for $\tilde{d}_e > 0$. Note that for a given $H_{b,r,t} = h_{b,r,t}$, the pilot observations $Y_{b,r,t}$ and $\tilde{Y}_{b,r,t}$ are independent; it then follows that for a fixed fading realisation the channel estimates at the two ends are independent.
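An analogous sketch for the LMMSE estimator (5.33)-(5.39) (ours, assuming NumPy; illustrative only):

import numpy as np

def lmmse_pilot_estimate(H, P, d_e, rng=np.random.default_rng(1)):
    """Per-entry LMMSE pilot-based channel estimation, cf. (5.33)-(5.39).

    The LMMSE coefficient a* = sqrt(P**d_e) / (P**d_e + 1) scales the received
    pilot; the resulting error variance is 1 / (P**d_e + 1), which behaves as
    P**(-d_e) at high SNR, i.e., the same error diversity as ML estimation.
    """
    noise = (rng.standard_normal(H.shape) + 1j * rng.standard_normal(H.shape)) / np.sqrt(2)
    Y_pilot = np.sqrt(P**d_e) * H + noise
    a_star = np.sqrt(P**d_e) / (P**d_e + 1.0)   # cf. (5.35)
    return a_star * Y_pilot                      # cf. (5.33)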

The above LMMSE estimation is equivalent to ML estimation in the sense that the CSIR-error and CSIT-error diversities are given by $d_e$ and $\tilde{d}_e$, respectively. The difference lies in which random variables are independent of which. Consider the estimation at the receiver. For ML estimation, we see from (5.4) that $H_{b,r,t}$ and $E_{b,r,t}$ are independent. On the other hand, for LMMSE estimation in (5.36), the orthogonality principle [78] implies that $\hat{H}_{b,r,t}$ and $E_{b,r,t}$ are independent. Furthermore, the variances of $E_{b,r,t}$ for ML and LMMSE estimation are not exactly equal but only asymptotically equal, i.e.,


they have the same dot equality. Despite these differences, we prove in Appendix C.6 that LMMSE estimation yields the same SNR-exponent as ML estimation. Thus, it is not surprising that Theorem 5.1 generalises the results in [45] for transmission rates with zero multiplexing gain and the result in [46], as pointed out in Section 5.3. The only relevant parameters are the CSIR-error and CSIT-error diversities $d_e$ and $\tilde{d}_e$.

5.4.2 Mean-Feedback CSIT Model

The results in Theorems 5.1, 5.2 and 5.3 correspond to a case where the transmit-

ter establishes its own independent channel estimator. This is a typical model

for a two-way training for which both transmitter and receiver transmit pilot

symbols. From the received pilot symbols, each communicating party employs

its channel estimator to obtain accurate fading estimates.

One might also consider a CSIT model for which the transmitter obtains a noisy version of the CSIR via a dedicated feedback channel. This is commonly referred to as the mean-feedback model [79]. For this model, we have

CSIR: $\hat{H}_{b,r,t} = H_{b,r,t} + E_{b,r,t}$,    (5.43)
CSIT: $\tilde{H}_{b,r,t} = \hat{H}_{b,r,t} + \tilde{E}_{b,r,t}$.    (5.44)

The CSIT equation can be written as

$\tilde{H}_{b,r,t} = \hat{H}_{b,r,t} + \tilde{E}_{b,r,t} = H_{b,r,t} + E_{b,r,t} + \tilde{E}_{b,r,t}$.    (5.45)

We see from the last equation that the effective CSIT noise with respect to the actual channel $H_{b,r,t}$ is given by

$E_{b,r,t} + \tilde{E}_{b,r,t}$    (5.46)

which has zero mean and variance

$P^{-d_e} + P^{-\tilde{d}_e}$.    (5.47)

The effective CSIT-error diversity is then obtained from the exponent of $P^{-d_e} + P^{-\tilde{d}_e}$, which is given by $\min(d_e, \tilde{d}_e)$. Thus, for the mean-feedback model, the outage SNR-exponents can be obtained by replacing $\tilde{d}_e$ with $\min(d_e, \tilde{d}_e)$ in Theorems 5.1, 5.2 and 5.3.


5.4.3 Comments on Achievable Rates

The technique used to derive our main results is based on the GMI, which is

an achievable coding rate when a fixed decoding rule—which is not necessarily

matched to the channel—is employed [13]. Transmission rates below the GMI

have a vanishing error probability as the block length tends to infinity. Further-

more, the GMI is the largest reliable transmission rate when the encoder is forced

to use i.i.d. inputs [4, 13, 36, 39]. Therefore, the outage SNR-exponents derived

in Theorems 5.1, 5.2 and 5.3 are the optimal SNR-exponents when using i.i.d.

codebooks (Gaussian or discrete) and a nearest neighbour decoder.

Note, however, that one may obtain a larger achievable rate if the assumption of i.i.d. codebooks can be lifted. Indeed, i.i.d. inputs may not be optimal, and it has been shown in [13, 36, 38] that using inputs with a good cost constraint yields a lower bound to the mismatched capacity (the LM bound) that can be larger than the GMI. The main difficulty of using the LM bound is the optimisation over all possible cost functions, which in general cannot be solved analytically. For a given codebook, a larger achievable rate is also possible by optimising over all possible decoders.

There are several works that study a similar problem to ours, but use different

characterisations of the reliable transmission rate. In particular, we refer to the

works in [80–82] for comparison. For simplicity and for the sake of comparison,

we consider a SISO quasi-static channel (B = 1). References [80–82] assumed

Gaussian inputs and LMMSE channel estimation at the receiver; no assumption

on the decoder structure is made. Thus, from (5.1) and (5.36) we can write the input-output relationship as

$Y = \sqrt{P}\,\hat{H}\,x + \sqrt{P}\,E\,x + Z$    (5.48)

where $Y$ and $Z$ are the random received vector and the noise vector, respectively, taking values in $\mathbb{C}^J$; $\hat{H}$ and $E$ are the scalar fading estimate and fading estimation error, respectively; $x$ is the channel input vector; and $P$ is the transmission power. Since every realisation of $\hat{H}$ is known at the receiver, the argument in [80-82] is that one can treat the term $\sqrt{P}Ex$ as an additional noise term. Furthermore, it was argued in [80, 82] that by modelling the signal-dependent noise

$Z' = \sqrt{P}\,E\,x + Z$    (5.49)

as a zero-mean Gaussian noise with i.i.d. entries independent of $x$ and each


having variance

$1 + P|E|^2$,    (5.50)

one can obtain a lower bound to the instantaneous mutual information as [82]

$I(\hat{H}, E, P) = \log_2\left(1 + \dfrac{P|\hat{H}|^2}{1 + P|E|^2}\right)$.    (5.51)

Note that the above expression leads to an outage SNR-exponent obtained from

$\Pr\big\{I(\hat{H}, E, P) < R\big\} = \Pr\left\{\log_2\left(1 + \dfrac{P|\hat{H}|^2}{1 + P|E|^2}\right) < R\right\}$.    (5.52)

Interestingly, following the steps used in Appendix A.3 for $B = 1$, the GMI can be lower-bounded as

$I^{\mathrm{gmi}}(H, \hat{H}, P) \geq \log_2\left(1 + \dfrac{P|\hat{H}|^2}{1 + P|H - \hat{H}|^2}\right) - 1$.    (5.53)

In the high-SNR regime, the difference between (5.51) and (5.53) does not affect

the outage SNR-exponents. Thus, it is not surprising that for no-CSIT and

transmitter pilot power P de, our results are identical to the results in [80, 82].
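For a single channel realisation, the two quantities can be compared numerically as follows (an illustrative sketch added here, not from the original text; it assumes NumPy, uses the simplified estimation model $\hat{h} = h + e$, and follows the numerator convention of (5.51) and (5.53) as written above):

import numpy as np

rng = np.random.default_rng(2)

def I_lower_medard(h_hat, e, P):
    """Lower bound (5.51) used in [80-82] (bits per channel use)."""
    return np.log2(1 + P * abs(h_hat)**2 / (1 + P * abs(e)**2))

def I_gmi_lower(h, h_hat, P):
    """GMI lower bound (5.53) for nearest neighbour decoding (bits per channel use)."""
    return np.log2(1 + P * abs(h_hat)**2 / (1 + P * abs(h - h_hat)**2)) - 1

# Example with a fixed fading realisation and a random estimation error.
h, sigma_e, P = 1.0, np.sqrt(0.1), 10.0
e = sigma_e * (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
h_hat = h + e
print(I_lower_medard(h_hat, e, P), I_gmi_lower(h, h_hat, P))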

Even though evaluating the bound (5.51) seems to be much easier than eval-

uating the GMI, there are several drawbacks with the lower-bounding technique

in (5.51).

• To the best of our knowledge, there is no explicit proof of the achievability

of I(H, E, P ). The lower-bound reasoning comes from the work by Zheng

and Tse [83], where the LMMSE channel estimate model is used at the

receiver to derive a lower bound to the blockwise-ergodic capacity. In [83],

the block length J is a finite quantity in general, and the capacity expression

is obtained via coding over infinitely many blocks. It then follows that H

and E have uncorrelated statistics over infinitely many blocks. The lower-

bound proof then follows from [84, Sec. III], [85, App. I]. The lower bound

of the blockwise-ergodic capacity is obtained after averaging over all states

of fading and its corresponding estimate.

It is not yet clear whether the technique in [83] can be applied directly to non-ergodic fading channels. As opposed to coding over infinitely many blocks, in a quasi-static channel coding is performed over only one block and


[Figure 5.3: densities of $I^{\mathrm{gmi}}(H, \hat{H}, P)$, $I(H)$ and $I(\hat{H}, E, P)$ versus information rate (bits per channel use); the regions where $I(\hat{H}, E, P) > I(H)$ and $I(\hat{H}, E, P) < I(H)$ are marked.]

Figure 5.3: Comparison of the densities of the GMI and the lower bound (5.51) with fading realisation $H = 1$, transmission power $P = 1$ (unit power) and CSIR-error variance $\sigma_e^2 = 0.1$.

the block length $J$ is taken to infinity to recover the information outage probability [20, 23]. Note that during a single block both the fading and the fading estimate are constant; it follows that the estimation error is also constant for that block. Hence, (5.51) may not be an achievable lower bound to the instantaneous mutual information for the block of interest, since both $\hat{H}$ and $E$ are constant within a single block.

The above explanation means that there is no guarantee that transmitting a codeword at rate $R = I(\hat{H}, E, P) - \epsilon$, for any $\epsilon > 0$, has a vanishing error probability as the block length $J$ tends to infinity. This is in contrast with $I^{\mathrm{gmi}}(H, \hat{H}, P)$, for which the achievability has been proven in [35, 37, 38].

• For some combinations of $\hat{H}$ and $E$, we may find that $I(\hat{H}, E, P)$ is larger than the perfect-CSIR mutual information $I(H)$,

$I(H) = \log_2\big(1 + P|H|^2\big)$.    (5.54)

We illustrate this in Figure 5.3, where we fix $H = 1$ and $P = 1$, and we use the LMMSE estimation model such that

$H = \hat{H} + E$.    (5.55)


For fixed $H = 1$ and $P = 1$, the probability that $I(\hat{H}, E, P)$ is greater than $I(H)$ is non-zero, which implies that the lower bound (5.51) may violate the data-processing inequality. This result indirectly disproves the achievability of $I(\hat{H}, E, P)$. Furthermore, this behaviour is in contrast to $I^{\mathrm{gmi}}(H, \hat{H}, P)$, which is always smaller than $I(H)$, as shown in [35, 37, 38].

• It is not clear whether modelling $Z'$ in (5.49) as a signal-independent Gaussian noise would still yield the correct exponent for discrete inputs.

5.4.4 Comments on Continuous Input Distributions

We have considered Gaussian inputs as a possible continuous input distribution

for our SNR-exponent analysis. The main reason is that Gaussian input is the

optimal input distribution for the channel in (5.1) when perfect CSI is available

at the receiver. However, when only noisy CSIR is known, Gaussian inputs may

no longer be optimal.

Using expression (5.13), we can show that the SNR-exponent for Gaussian

inputs is a lower bound to the SNR-exponent for some other continuous distribu-

tions satisfying certain conditions. We first assume that the input vector is i.i.d. over all transmit antennas and all channel uses and is such that $\mathbb{E}\big[|X|^2\big] = 1$.

The expression in (5.13) (in natural-base logarithm) can be decomposed into two terms as follows:

$I^{\mathrm{gmi}}_b\big(\mathsf{P}_b, H_b, \hat{H}_b, s\big) = \mathbb{E}\Big[\log Q^s\big(Y, \hat{H}_b, \mathsf{P}_b, X\big) \,\Big|\, H_b, \hat{H}_b, \mathsf{P}_b\Big] - \mathbb{E}\Big[\log \mathbb{E}\big[Q^s\big(Y, \hat{H}_b, \mathsf{P}_b, X'\big) \,\big|\, Y, H_b, \hat{H}_b, \mathsf{P}_b\big] \,\Big|\, H_b, \hat{H}_b, \mathsf{P}_b\Big]$.    (5.56)

Evaluating the first term yields

$\mathbb{E}\Big[\log Q^s\big(Y, \hat{H}_b, \mathsf{P}_b, X\big) \,\Big|\, H_b, \hat{H}_b, \mathsf{P}_b\Big] = -s\, \mathbb{E}\Big[\big\|\big(H_b - \hat{H}_b\big)\mathsf{P}_b^{1/2} X + Z\big\|^2 \,\Big|\, H_b, \hat{H}_b, \mathsf{P}_b\Big]$    (5.57)

$= -s\Big(n_r + \mathbb{E}\Big[\big\|E_b \mathsf{P}_b^{1/2} X\big\|^2 \,\Big|\, H_b, \hat{H}_b, \mathsf{P}_b\Big]\Big)$    (5.58)

$\geq -s\Big(n_r + \mathbb{E}\Big[\big\|E_b \mathsf{P}_b^{1/2}\big\|_F^2\, \|X\|^2 \,\Big|\, H_b, \hat{H}_b, \mathsf{P}_b\Big]\Big)$    (5.59)

$= -s\Big(n_r + n_t \big\|E_b \mathsf{P}_b^{1/2}\big\|_F^2\Big)$    (5.60)


where the inequality is due to the Frobenius-norm property $\|A x\|^2 \leq \|A\|_F^2 \cdot \|x\|^2$ [86, Sec. 5.6]. The first expectation in the second term can be evaluated as follows:

$\mathbb{E}\big[Q^s\big(Y, \hat{H}_b, \mathsf{P}_b, X'\big) \,\big|\, Y, H_b, \hat{H}_b, \mathsf{P}_b\big] = \int_{x'} P_{X}(x')\, e^{-s\|Y - \hat{H}_b \mathsf{P}_b^{1/2} x'\|^2} \,\mathrm{d}x'$.    (5.61)

Then, if the input pdf can be bounded as

$P_{X}(x) \leq \dfrac{G}{\pi^{n_t}}\, e^{-\|x\|^2}, \quad x \in \mathbb{C}^{n_t}$    (5.62)

for some constant $G > 0$ independent of the SNR, then the above expectation can be bounded as

$\int_{x'} P_{X}(x')\, e^{-s\|Y - \hat{H}_b \mathsf{P}_b^{1/2} x'\|^2} \,\mathrm{d}x' \leq G \int_{x'} \dfrac{1}{\pi^{n_t}}\, e^{-\|x'\|^2}\, e^{-s\|Y - \hat{H}_b \mathsf{P}_b^{1/2} x'\|^2} \,\mathrm{d}x'$    (5.63)

$= \dfrac{G}{\det\big(I_{n_r} + s \hat{H}_b \mathsf{P}_b \hat{H}_b^{\dagger}\big)} \exp\Big(-s\, Y^{\dagger} \big(I_{n_r} + s \hat{H}_b \mathsf{P}_b \hat{H}_b^{\dagger}\big)^{-1} Y\Big)$.    (5.64)

It follows that for $s > 0$, $I^{\mathrm{gmi}}_b\big(\mathsf{P}_b, H_b, \hat{H}_b, s\big)$ can be lower-bounded as

$I^{\mathrm{gmi}}_b\big(\mathsf{P}_b, H_b, \hat{H}_b, s\big) \geq \log\det\big(I_{n_r} + s \hat{H}_b \mathsf{P}_b \hat{H}_b^{\dagger}\big) - \log G - s\Big(n_r + n_t \big\|E_b \mathsf{P}_b^{1/2}\big\|_F^2\Big) + \mathbb{E}\Big[s\, Y^{\dagger}\big(I_{n_r} + s \hat{H}_b \mathsf{P}_b \hat{H}_b^{\dagger}\big)^{-1} Y\Big]$    (5.65)

$\geq \log\det\big(I_{n_r} + s \hat{H}_b \mathsf{P}_b \hat{H}_b^{\dagger}\big) - \log G - s\Big(n_r + n_t \big\|E_b \mathsf{P}_b^{1/2}\big\|_F^2\Big)$    (5.66)

where the last inequality holds because for any $y \in \mathbb{C}^{n_r}$ the term

$y^{\dagger}\big(I_{n_r} + s \hat{H}_b \mathsf{P}_b \hat{H}_b^{\dagger}\big)^{-1} y$    (5.67)

is always non-negative, as shown in Appendix A.3. Furthermore, instead of taking the supremum over $s > 0$ of the sum of the right-hand side of (5.66) over all $B$ blocks, we use $s = \bar{s}$ with

$\bar{s} = \dfrac{B}{B n_r + n_t \sum_{b=1}^{B} \big\|E_b \mathsf{P}_b^{1/2}\big\|_F^2}$,    (5.68)


yields the GMI lower bound

$I^{\mathrm{gmi}}(H, \hat{H}, \mathsf{P}) \geq \dfrac{1}{B} \sum_{b=1}^{B} \Bigg[\log\det\Bigg(I_{n_r} + \dfrac{B\, \hat{H}_b \mathsf{P}_b \hat{H}_b^{\dagger}}{B n_r + n_t \sum_{b'=1}^{B} \big\|E_{b'} \mathsf{P}_{b'}^{1/2}\big\|_F^2}\Bigg) - 1 - \log G\Bigg]$.    (5.69)

Comparing the GMI lower bound in Appendix A.3 to the above lower bound when the transmit power matrix $\mathsf{P}_b$ is given by (5.5), it is clear that both yield the same large-SNR set that characterises the GMI lower bound, since $\log G$ is an SNR-independent constant. Since the SNR-exponents for Gaussian inputs derived using the GMI upper and lower bounds are identical, it follows that for any input distribution for which the condition in (5.62) holds, the SNR-exponent is lower-bounded by the SNR-exponent for Gaussian inputs. It is not yet clear whether this lower bound is tight, because solving the GMI upper bound for input distributions satisfying (5.62) remains a challenge. This also implies that there may be other continuous input distributions with larger SNR-exponents than Gaussian inputs.

5.5 Conclusion

We have studied the effects of imperfect CSI on the performance of data transmission over MIMO block-fading channels. In particular, we derived the outage SNR-exponent as a function of the CSIR and CSIT noise variances, $\sigma_e^2 = P^{-d_e}$ and $\sigma_{\tilde{e}}^2 = P^{-\tilde{d}_e}$, where $P$ is the average data transmission power. We showed that noisy CSIR has more detrimental effects on the SNR-exponent than noisy CSIT.

The results shed new light on the design of pilot-assisted channel estimation in block-fading channels. If pilot symbols from both ends are sent with power $P$ ($d_e = \tilde{d}_e = 1$), then the CSIT estimation and the power adaptation are not useful in terms of improving the SNR-exponent. On a positive note, if the pilot signalling can be done at a power level sufficiently higher than $P$, then one can reap significant benefits from power adaptation across blocks. Furthermore, since noisy CSIR has more detrimental effects than noisy CSIT, obtaining reliable CSIR is more important than obtaining reliable CSIT. For full CSIT, this can be achieved by having pilot symbols from the transmitter with a larger power exponent than pilot symbols from the receiver. More specifically, a reliable CSIR can be obtained if $d_e > 1 + d^{\mathrm{u}}_{\mathrm{csir}} \tilde{d}_e$, where $d^{\mathrm{u}}_{\mathrm{csir}}$ is the perfect-CSIR outage SNR-exponent with uniform power allocation. One common way


of guaranteeing such a reliable CSIR is to allocate to the pilot symbols at the transmitter the same power as to the data symbols [80-82]. Note that even though identical pilot and data power at the transmitter justifies a perfect-CSIR analysis of the outage SNR-exponent, the generalised outage probability can still improve if the pilot power is larger than the data power, as shown in Chapter 3. For causal and predictive CSIT, the outage SNR-exponent depends not only on the CSIT-error diversity $\tilde{d}_e$ and the CSIR-error diversity $d_e$, but also on the CSIT delay $\tau_d$ or the CSIT prediction parameter $\tau_f$.

The outage SNR-exponents derived in this chapter are the optimal SNR-exponents when using i.i.d. codebooks (Gaussian or discrete) and a nearest neighbour decoder. In order to obtain a potentially better SNR-exponent, one should consider non-i.i.d. codebooks or a different decoding strategy.


Part II

Stationary Ergodic Fading Channels


Chapter 6

Stationary Fading Channels

Recall that for delay-unconstrained transmission, a codeword spans over a large

number of fading realisations and the channel is assumed to be stationary and

ergodic. For most fading distributions, the channel capacity is positive. With a

good coding scheme, one can transmit reliably, i.e., with vanishing error proba-

bility, at rates below the channel capacity.

In this chapter, we revisit existing results concerning the interplay between

capacity and channel state information (CSI) in stationary Gaussian fading chan-

nels. Similarly to Part I, we refer to the knowledge of the fading as the CSI.

This chapter is structured as follows. Section 6.1 introduces a model for

multiple-input multiple-output (MIMO) stationary Gaussian flat-fading chan-

nels. Section 6.2 reviews the capacity of fading channels for both coherent and

noncoherent settings. We focus on the high signal-to-noise ratio (SNR) regime

and study the capacity pre-log, defined as the limiting ratio of the capacity to

the logarithm of the SNR as the SNR tends to infinity.

6.1 MIMO Gaussian Flat-Fading Channels

We consider a discrete-time MIMO flat-fading channel with $n_t$ transmit antennas and $n_r$ receive antennas. The channel output at time $k$, $k \in \mathbb{Z}$, is an $n_r$-dimensional random vector

$Y_k = \sqrt{\dfrac{\mathsf{SNR}}{n_t}}\, H_k x_k + Z_k$.    (6.1)

Here $x_k \in \mathbb{C}^{n_t}$ denotes the channel input vector at time $k$, $H_k$ denotes the $n_r \times n_t$-dimensional random fading matrix at time $k$ and $Z_k$ denotes the $n_r$-variate random additive noise vector at time $k$.


[Figure 6.1: block diagram. Message $m$ enters the encoder (ENC); the input $x_k$ is scaled by $\sqrt{\mathsf{SNR}/n_t}$, multiplied by the fading $H_k$ and corrupted by the noise $Z_k$ to give the output $Y_k$; the decoder (DEC) outputs $\hat{m}$.]

Figure 6.1: A diagram for communication over a stationary MIMO fading channel.

We shall assume throughout that the noise process $\{Z_k,\, k \in \mathbb{Z}\}$ is a sequence of independent and identically distributed (i.i.d.) complex-Gaussian random vectors with zero mean and identity covariance matrix. The average SNR for each receive antenna is thus $\mathsf{SNR}$. The fading process $\{H_k,\, k \in \mathbb{Z}\}$ is stationary, ergodic and complex-Gaussian. We assume that the $n_r \cdot n_t$ processes $\{H_k(r,t),\, k \in \mathbb{Z}\}$, $r = 1, \ldots, n_r$, $t = 1, \ldots, n_t$, are independent and have the same law, with each process having zero mean, unit variance and power spectral density $f_H(\lambda)$, $-\tfrac{1}{2} \leq \lambda \leq \tfrac{1}{2}$. Thus, $f_H(\cdot)$ is a non-negative function satisfying

$\mathbb{E}\big[H_{k+m}(r,t)\, H^*_k(r,t)\big] = \int_{-1/2}^{1/2} e^{\imath 2\pi m \lambda} f_H(\lambda)\, \mathrm{d}\lambda$    (6.2)

where $H^*_k(r,t)$ denotes the complex conjugate of $H_k(r,t)$. We finally assume that the fading process $\{H_k,\, k \in \mathbb{Z}\}$ and the noise process $\{Z_k,\, k \in \mathbb{Z}\}$ are independent and that their joint law does not depend on the inputs $\{x_k,\, k \in \mathbb{Z}\}$.

6.2 Capacity and The Pre-Log

The transmission of a message $m$, $m \in \mathcal{M} = \{1, \ldots, |\mathcal{M}|\}$, over the channel (6.1) is illustrated in Figure 6.1. The encoder (ENC) first maps the message $m$ into a codeword selected from the codebook $\mathcal{C}$. Each codeword forms a sequence of $n$ channel inputs, e.g., $x_1, \ldots, x_n$, which is transmitted over the channel. We say that the channel inputs satisfy an average-power constraint if

$\dfrac{1}{n} \sum_{k=1}^{n} \mathbb{E}\big[\|X_k\|^2\big] \leq n_t$.    (6.3)


On the other hand, we say that the channel inputs satisfy a peak-power constraint if, with probability one,

$\|X_k\|^2 \leq n_t, \quad k \in \mathbb{Z}$.    (6.4)

Upon receiving a sequence of $n$ channel outputs $y_1, \ldots, y_n$, the decoder (DEC) decides on an output message $\hat{m}$, $\hat{m} \in \mathcal{M} = \{1, \ldots, |\mathcal{M}|\}$, based on a certain decision rule. We say that a rate $R$ (in nats per channel use),

$R \triangleq \dfrac{\log |\mathcal{M}|}{n}$,    (6.5)

is achievable if there exists a combination of encoder and decoder such that the error probability $\Pr\{\hat{m} \neq m\}$ tends to zero as the codeword length $n$ tends to infinity.

Capacity is defined as the supremum of all achievable rates R maximised over

all possible encoders and decoders. We denote Cav(SNR) as the capacity under

the average-power constraint (6.3) and Cp(SNR) as the capacity under the peak-

power constraint (6.4). We shall remove the subscripts av and p if the analysis

applies to both constraints.

We shall focus on the high-SNR behaviour of the capacity. One important figure of merit is the capacity pre-log, defined in the following.

Definition 6.1 (Capacity Pre-Log). The capacity pre-log $\Pi_C$ is defined as the limiting ratio of the capacity to the logarithm of the SNR,

$\Pi_C \triangleq \limsup_{\mathsf{SNR} \to \infty} \dfrac{C(\mathsf{SNR})}{\log \mathsf{SNR}}$.    (6.6)

In the literature, the capacity pre-log is often referred to as the spatial multiplexing gain [20]. At high SNR, the capacity can be approximated as $C(\mathsf{SNR}) \approx \Pi_C \cdot \log \mathsf{SNR}$. Hence, the capacity pre-log measures the growth rate of the capacity in the high-SNR regime.

6.2.1 Coherent Channels

We refer to the coherent fading channel as a channel where the receiver has

access to the fading realisations. Since the fading realisations are available at

the receiver, we can treat the fading as part of the channel output. The channel


capacity under an average-power constraint is in this case given by (see, e.g., [87])

$C_{\mathrm{av}}(\mathsf{SNR}) = \lim_{n \to \infty} \dfrac{1}{n} \sup I(X_1, \ldots, X_n; Y_1, \ldots, Y_n, H_1, \ldots, H_n)$    (6.7)

where $I(\cdot\,;\cdot)$ is the mutual information, and where the maximisation is over the joint distributions of $X_1, \ldots, X_n$ satisfying

$\dfrac{1}{n} \sum_{k=1}^{n} \mathbb{E}\big[\|X_k\|^2\big] \leq n_t$.    (6.8)

The capacity under a peak-power constraint is obtained by replacing the con-

straint (6.8) with (6.4).

The capacity of multi-antenna coherent flat-fading channels has been studied

in the literature (see, e.g., [88–90]). Under the average-power constraint (6.8),

the capacity (6.7) (in nats per channel use) can be expressed as an expectation

over the random fading matrix [88–90]

$C_{\mathrm{av}}(\mathsf{SNR}) = \mathbb{E}\bigg[\log\det\bigg(I_{n_r} + \dfrac{\mathsf{SNR}}{n_t}\, H H^{\dagger}\bigg)\bigg]$.    (6.9)

This capacity can be achieved using nearest neighbour decoding—which in this case is maximum-likelihood decoding—and a Gaussian codebook whose entries are drawn i.i.d. from the $n_t$-variate Gaussian distribution $\mathcal{N}_{n_t}(\mathbf{0}, I_{n_t})$.

The following proposition provides the pre-log of $C_{\mathrm{av}}(\mathsf{SNR})$ in (6.9), which was derived in [89, 90].

Proposition 6.1 (Coherent MIMO Pre-Log). Assume that the fading satisfies

$0 < \mathbb{E}\big[\|H\|_F^2\big] < \infty$.    (6.10)

Then, the capacity pre-log of coherent Gaussian fading channels under the average-power constraint (6.8) is given by

$\Pi_{C_{\mathrm{av}}} = \min(n_t, n_r)$.    (6.11)

Thus, for coherent fading channels, the capacity increases as the logarithm of the SNR with a growth rate given by the minimum of the numbers of transmit and receive antennas. Therefore, at high SNR, higher information rates can be supported by increasing the number of antennas.
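Proposition 6.1 can be checked numerically by Monte Carlo evaluation of (6.9). The sketch below (ours, assuming NumPy and i.i.d. zero-mean unit-variance complex-Gaussian fading entries, as in Section 6.1) shows how the ratio $C_{\mathrm{av}}(\mathsf{SNR})/\log \mathsf{SNR}$ approaches $\min(n_t, n_r)$ as the SNR grows.

import numpy as np

def coherent_capacity_mc(nt, nr, snr_db, trials=20000, rng=np.random.default_rng(3)):
    """Monte Carlo estimate of the coherent capacity (6.9) in nats per channel use."""
    snr = 10 ** (snr_db / 10)
    cap = 0.0
    for _ in range(trials):
        H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
        cap += np.log(np.linalg.det(np.eye(nr) + (snr / nt) * H @ H.conj().T).real)
    return cap / trials

# The ratio C(SNR)/log(SNR) tends to min(nt, nr) = 2 for nt = nr = 2.
for snr_db in (20, 40, 60):
    c = coherent_capacity_mc(2, 2, snr_db)
    print(snr_db, c / np.log(10 ** (snr_db / 10)))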

Notice that Proposition 6.1 characterises the pre-log under the average-power

constraint (6.3). Since the peak-power constraint (6.4) also implies the average-


power constraint (6.3), it follows that $C_{\mathrm{p}}(\mathsf{SNR}) \leq C_{\mathrm{av}}(\mathsf{SNR})$ and hence $\Pi_{C_{\mathrm{p}}} \leq \Pi_{C_{\mathrm{av}}}$.

6.2.2 Noncoherent Channels

We refer to the noncoherent fading channel as a channel where neither the transmitter nor the receiver has access to the fading realisations, although both may be aware of the fading statistics. For this channel, the capacity expression in (6.7) changes to

$C_{\mathrm{av}}(\mathsf{SNR}) = \lim_{n \to \infty} \dfrac{1}{n} \sup I(X_1, \ldots, X_n; Y_1, \ldots, Y_n)$.    (6.12)

The capacity Cp(SNR) is defined accordingly. From the chain rule for mutual

information [18, Th. 2.5.2]

I(X1, . . . ,Xn;Y1, . . . ,Yn) = I(X1, . . . ,Xn;Y1, . . . ,Yn,H1, . . . ,Hn)

− I(X1, . . . ,Xn;H1, . . . ,Hn|Y1, . . . ,Yn) (6.13)

and the non-negativity of mutual information [18, Th. 2.6.3]

I(X1, . . . ,Xn;H1, . . . ,Hn|Y1, . . . ,Yn) ≥ 0, (6.14)

we can see that the capacity of the noncoherent fading channel (6.12) is upper-

bounded by the capacity of the coherent fading channel (6.7).

Lapidoth [85] derived the capacity pre-log of noncoherent single-input single-

output (SISO) fading channels under a peak-power constraint. The analysis

in [85] was later extended by Koch and Lapidoth [91] to multiple-input single-

output (MISO) fading channels. For the fading process defined in Section 6.1,

it follows that the capacity pre-log of MISO fading channels (nt ≥ 1, nr = 1) is

equal to the capacity pre-log of the SISO fading channel (nt = 1, nr = 1). The

results in [85, 91] are summarised in the following proposition.

Proposition 6.2 (Noncoherent MISO Pre-Log). Consider a MISO fading channel with $n_t \geq 1$ and $n_r = 1$ under the peak-power constraint (6.4). If the fading processes $\{H_k(1,t),\, k \in \mathbb{Z}\}$, $t = 1, \ldots, n_t$, are independent and have the same law, with each process having zero mean, unit variance and power spectral density $f_H(\lambda)$, $-\tfrac{1}{2} \leq \lambda \leq \tfrac{1}{2}$, then

$\Pi_{C_{\mathrm{p}}} = \mu\big(\{\lambda : f_H(\lambda) = 0\}\big)$    (6.15)


where µ(·) denotes the Lebesgue measure on the interval [−1/2, 1/2].
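As an illustration (an example added here, not taken from the original text): if the fading has a flat, strictly bandlimited spectrum, $f_H(\lambda) = \frac{1}{2\lambda_0}$ for $|\lambda| \leq \lambda_0$ and $f_H(\lambda) = 0$ otherwise, then $\mu\big(\{\lambda : f_H(\lambda) = 0\}\big) = 1 - 2\lambda_0$, and hence $\Pi_{C_{\mathrm{p}}} = 1 - 2\lambda_0$. The more predictable the fading (the smaller $\lambda_0$), the closer the noncoherent pre-log is to the coherent SISO pre-log of 1.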

To the best of our knowledge, the capacity pre-log of MIMO fading channels

is unknown. The best so far known lower bound is due to Etkin and Tse [1],

which characterises the pre-log under an average-power constraint. The lower

bound in [1] is given in the following proposition.

Proposition 6.3 (Noncoherent MIMO Pre-Log). Consider a MIMO fading channel under the average-power constraint (6.3). If the n_r · n_t fading processes {H_k(r, t), k ∈ ℤ}, t = 1, …, n_t, r = 1, …, n_r are independent and have the same law, with each process having zero mean, unit variance and power spectral density f_H(λ), −1/2 ≤ λ ≤ 1/2, then

    Π_{C_av} ≥ min(n_t, n_r) ( 1 − min(n_t, n_r) µ( {λ : f_H(λ) > 0} ) ).   (6.16)

Observe that (6.16) specialises to (6.15) for nr = 1. It should be noted that the

capacity pre-log for MISO and SISO fading channels was derived under a peak-

power constraint on the channel inputs, whereas the lower bound on the capacity

pre-log for MIMO fading channels was derived under an average-power constraint.

Clearly, the capacity pre-log corresponding to a peak-power constraint can never

be larger than that corresponding to an average-power constraint. It is believed

that the two pre-logs are in fact identical (see the conclusion in [85]).

Unlike the capacity pre-log of the coherent fading channel, the capacity pre-

log of the noncoherent channel depends on the time-correlation of the fading,

indicated by the power spectral density fH(·), cf. (6.2). As discussed in [85],

the function fH(·) characterises the predictability of the fading process. As the

Lebesgue measure µ(λ : fH(λ) = 0) increases, the capacity pre-log of the non-

coherent fading channel approaches that of the coherent channel.

The capacity pre-log characterises the largest information rates achievable at

high SNR. The above results, however, do not explain how to construct practical

schemes that achieve the capacity pre-log, particularly in the noncoherent chan-

nel, where CSI is not available. In Chapter 7, we propose a simple scheme using

nearest neighbour decoding and pilot-aided channel estimation that achieves the

MISO capacity pre-log (Proposition 6.2) and achieves the lower bound of the

MIMO capacity pre-log (Proposition 6.3). In Chapter 8, we extend the analysis

of the scheme proposed in Chapter 7 to the fading multiple-access channel.


Chapter 7

Pilot-Aided Channel Estimation

for Stationary Fading Channels

7.1 Introduction

The capacity of coherent multiple-input multiple-output (MIMO) channels in-

creases with the signal-to-noise ratio (SNR) as min(nt, nr) log SNR, where nt and

nr are the number of transmit and receive antennas, respectively [88, 89]. In

Chapter 6, we refer to the growth factor min(nt, nr) as the capacity pre-log.

This capacity growth can be achieved using a nearest neighbour decoder which

selects the codeword that is closest (in a Euclidean-distance sense) to the chan-

nel output. In fact, for coherent fading channels with additive Gaussian noise,

this decoder is the maximum-likelihood decoder and is therefore optimal in the

sense that it minimises the error probability (see [4] and references therein). The

coherent channel model assumes that there is a genie that provides the fading

coefficients to the decoder; this assumption is difficult to achieve in practice. In

this work, we replace the role of the genie by a scheme that estimates the fading

via pilot symbols. This can be viewed as a particular coding strategy over a

noncoherent fading channel, i.e., a channel where both communication ends do

not have access to fading coefficients but may be aware of the fading statistics.

Note that with imperfect fading estimates, the nearest neighbour decoder that

treats the fading estimate as if it were perfect is not necessarily optimal. Never-

theless, we show that, in some cases, nearest neighbour decoding and pilot-aided

channel estimation achieves the capacity pre-log of noncoherent fading channels.

The capacity of noncoherent fading channels has been studied in a number of

works. Building upon [77], Hassibi and Hochwald [76] studied the capacity of the

block-fading channel and used pilot symbols (also known as training symbols)


to obtain reasonably accurate fading estimates. Lozano and Jindal [92] provided

tools for a unified treatment of pilot-based channel estimation in both block

and stationary bandlimited fading channels. In these works, lower bounds on

the channel capacity were obtained. Lapidoth [85] studied a single-input single-

output (SISO) fading channel for more general stationary fading processes and

showed that, depending on the predictability of the fading process, the capacity

growth in SNR can be, inter alia, logarithmic or double-logarithmic. The

extension of [85] to multiple-input single-output (MISO) fading channels can be

found in [91]. A lower bound on the capacity of stationary MIMO fading channels

was derived by Etkin and Tse in [1].

Lapidoth and Shamai [54] and Weingarten et al. [39] studied noncoherent

stationary fading channels from a mismatched-decoding perspective. In partic-

ular, they studied achievable rates with Gaussian codebooks and nearest neigh-

bour decoding. In both works, it is assumed that there is a genie that provides

imperfect estimates of the fading coefficients.

In this chapter, we add the estimation of the fading coefficients to the anal-

ysis. In particular, we study a communication system where the transmitter

emits at regular intervals pilot symbols, and where the receiver performs channel

estimation and data detection, separately. Based on the channel outputs corre-

sponding to pilot transmissions, the channel estimator produces estimates for the

remaining time instants using a linear minimum mean-square error (LMMSE) in-

terpolator. Using these estimates, the data detector employs a nearest neighbour

decoder to decide what the transmitted message was. We study the achievable

rates of this communication scheme at high SNR. In particular, we study the

pre-log for fading processes of bandlimited power spectral densities. (The pre-

log is defined as the limiting ratio of the achievable rate to the logarithm of the

SNR as the SNR tends to infinity.)

For SISO fading channels, using some simplifying arguments, Lozano [93] and

Jindal and Lozano [92] showed that this scheme achieves the capacity pre-log. In

this chapter, we prove this result without any simplifying assumptions and ex-

tend it to MIMO fading channels. The rest of the chapter is organised as follows.

Section 7.2 describes the channel model and introduces our transmission scheme

along with the nearest neighbour decoder and pilots for channel estimation. Sec-

tion 7.3 defines the pre-log and presents our main results. Section 7.4 provides

the proof of our main results. Section 7.5 summarises the important points of

the chapter.


7.2 System Model and Transmission Scheme

We consider a discrete-time MIMO flat-fading channel with nt transmit antennas

and n_r receive antennas, whose channel output at time instant k ∈ ℤ is the complex-valued n_r-dimensional random vector given by

    Y_k = √(SNR/n_t) · H_k x_k + Z_k.   (7.1)

Here x_k ∈ ℂ^{n_t} denotes the time-k channel input vector, H_k denotes the n_r × n_t-dimensional random fading matrix at time k, and Z_k denotes the n_r-variate random additive noise vector at time k.

The noise process {Z_k, k ∈ ℤ} is a sequence of i.i.d. complex-Gaussian random vectors of zero mean and covariance matrix I_{n_r}. SNR denotes the average SNR at each receive antenna.

The fading process {H_k, k ∈ ℤ} is stationary, ergodic and complex-Gaussian. We assume that the n_r · n_t processes {H_k(r, t), k ∈ ℤ}, r = 1, …, n_r, t = 1, …, n_t are independent and have the same law, with each process having zero mean, unit variance and power spectral density (psd) f_H(λ), −1/2 ≤ λ ≤ 1/2. Thus, f_H(·) is a non-negative function satisfying

    E[ H_{k+m}(r, t) H*_k(r, t) ] = ∫_{−1/2}^{1/2} e^{ı2πmλ} f_H(λ) dλ   (7.2)

where H*_k(r, t) denotes the complex conjugate of H_k(r, t). We further assume that the psd f_H(·) has bandwidth λ_D < 1/2, i.e., f_H(λ) = 0 for |λ| > λ_D and f_H(λ) > 0 otherwise.

We finally assume that the fading process {H_k, k ∈ ℤ} and the noise process {Z_k, k ∈ ℤ} are independent and that their joint law does not depend on {x_k, k ∈ ℤ}.

The transmission involves both codewords and pilots. The former convey the

message to be transmitted, and the latter are used to facilitate the estimation of

the fading coefficients at the receiver. We denote a codeword conveying a message

m, m ∈ M (where M = {1, …, e^{nR}} is the set of possible messages), at rate R

by the length-n sequence of input vectors x1(m), . . . , xn(m). The codeword is

selected from the codebook C, which is drawn i.i.d. from an nt-variate complex-

Gaussian distribution with zero mean and identity covariance matrix such that


[Figure 7.1: Structure of pilot and data transmission for n_t = 2, L = 7 and T = 2. The figure shows, for each transmit antenna t = 1, 2, the alternation of pilot, data and no-transmission intervals, with guard periods of length L(T − 1) before the first and after the last data block.]

    (1/n) ∑_{k=1}^{n} E[ ‖X_k(m)‖² ] = n_t,   m ∈ M.   (7.3)

To estimate the fading matrix, we transmit orthogonal pilot vectors. The pilot vector p_t ∈ ℂ^{n_t} used to estimate the fading coefficients corresponding to the t-th transmit antenna is given by p_t(t) = 1 and p_t(t′) = 0 for t′ ≠ t. For example, the first pilot vector is p_1 = (1, 0, …, 0)^T. To estimate the whole fading matrix, we thus need to send the n_t pilot vectors p_1, …, p_{n_t}.

The transmission scheme is as follows. Every L time instants (for some L ∈ ℕ), we transmit the n_t pilot vectors p_1, …, p_{n_t}. Each codeword is then split up

into blocks of L − nt data vectors, which will be transmitted after the nt pilot

vectors. The process of transmitting L − nt data vectors and nt pilot vectors

continues until all n data vectors are completed. Herein we assume that n is an

integer multiple of L − n_t.^{7.1} Prior to transmitting the first data block, and after

transmitting the last data block, we introduce a guard period of L(T − 1) time

instants (for some T ∈ ℕ), where we transmit every L time instants the n_t pilot

vectors p1, . . . ,pnt, but we do not transmit data vectors in between. The guard

period ensures that, at every time instant, we can employ a channel estimator

that bases its estimation on the channel outputs corresponding to the T past

and the T future pilot transmissions. This facilitates the analysis and does not

incur any loss in terms of achievable rates. The above transmission scheme is

illustrated in Figure 7.1. The channel estimator is described in the following.

Note that the total block-length of the above transmission scheme (comprising data vectors, pilot vectors and guard period) is given by

    n′ = n_p + n + n_g   (7.4)

7.1 If n is not an integer multiple of L − n_t, then the last L − n_t instants are not fully used by data vectors and therefore contain time instants where we do not transmit anything. The thereby incurred loss in information rate vanishes as n tends to infinity.


where n_p denotes the number of channel uses reserved for pilot vectors, and where n_g denotes the number of channel uses during the silent guard period, i.e.,

    n_p = ( n/(L − n_t) + 1 + 2(T − 1) ) n_t,   (7.5)
    n_g = 2(L − n_t)(T − 1).   (7.6)
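To make the accounting in (7.4)–(7.6) concrete, here is a small sketch (my own construction following the verbal description above; the exact frame layout is an assumption consistent with that description, not code from the dissertation) that builds the pilot index set P and data index set D for one codeword and checks the pilot count against (7.5).

def pilot_data_schedule(n, nt, L, T):
    """Return (P, D), the pilot and data time indices, assuming n is a multiple of L - nt."""
    assert n % (L - nt) == 0
    num_blocks = n // (L - nt)
    # Frames of length L, each starting with nt pilot vectors.  The first T-1 and the last T
    # frames carry no data (guard periods plus a final pilot block); the middle ones each
    # carry L - nt data vectors.
    total_frames = (T - 1) + num_blocks + T
    P, D = [], []
    for frame in range(total_frames):
        start = frame * L
        P.extend(range(start, start + nt))
        if T - 1 <= frame < T - 1 + num_blocks:
            D.extend(range(start + nt, start + L))
    return P, D

n, nt, L, T = 10, 2, 7, 2
P, D = pilot_data_schedule(n, nt, L, T)
print(len(D) == n)                                            # all data vectors scheduled
print(len(P) == (n // (L - nt) + 1 + 2 * (T - 1)) * nt)       # matches n_p in (7.5)
print(2 * (L - nt) * (T - 1), "silent guard channel uses, cf. (7.6)")
print("asymptotic pilot fraction:", nt / L)                   # the limit in (7.31) below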

We now turn to the decoder. Let D denote the set of time indices where data

vectors of a codeword are transmitted, and let P denote the set of time indices

where pilots are transmitted. The decoder consists of two parts: a channel esti-

mator and a data detector. The channel estimator considers the channel output

vectors Yk′, k′ ∈ P corresponding to the past and future T pilot transmissions

and estimates H_k(r, t) using a linear interpolator, i.e., the estimate Ĥ^{(T)}_k(r, t) of the fading coefficient H_k(r, t) is given by

    Ĥ^{(T)}_k(r, t) = ∑_{k′ = k−TL : k′ ∈ P}^{k+TL} a_{k′}(r, t) Y_{k′}(r),   k ∈ D   (7.7)

where the coefficients a_{k′}(r, t) are chosen in order to minimise the mean-squared error.

Note that, since each pilot vector activates only one transmit antenna, the fading coefficients corresponding to all transmit and receive antenna pairs (r, t) can be observed. Further note that, since the fading processes {H_k(r, t), k ∈ ℤ}, r = 1, …, n_r, t = 1, …, n_t are independent, estimating H_k(r, t) based only on the outputs Y_{k′}(r), k′ ∈ P at receive antenna r rather than on the full output vectors Y_{k′}, k′ ∈ P incurs no loss in optimality.

Since the time-lags between {H_k, k ∈ D} and the observations {Y_{k′}, k′ ∈ P} depend on k, it follows that the interpolation error

    E^{(T)}_k(r, t) = H_k(r, t) − Ĥ^{(T)}_k(r, t)   (7.8)

is not stationary but cyclo-stationary with period L. It can be shown that, irrespective of r, the variance of the interpolation error

    ǫ²_{ℓ,T}(r, t) = E[ |H_k(r, t) − Ĥ^{(T)}_k(r, t)|² ]   (7.9)


tends to the following expressions as T tends to infinity [94]

    ǫ²_ℓ(t) ≜ lim_{T→∞} ǫ²_{ℓ,T}(r, t)   (7.10)
           = 1 − ∫_{−1/2}^{1/2} [ SNR |f_{H_L,ℓ−t+1}(λ)|² ] / [ SNR f_{H_L,0}(λ) + n_t ] dλ   (7.11)

where ℓ ≜ k mod L denotes the remainder of k/L. Here f_{H_L,ℓ}(·) is given by

    f_{H_L,ℓ}(λ) = (1/L) ∑_{ν=0}^{L−1} f̃_H( (λ − ν)/L ) e^{i2πℓ(λ−ν)/L},   ℓ = 0, …, L − 1   (7.12)

and f̃_H(·) is the periodic function of period [−1/2, 1/2) that coincides with f_H(λ) for −1/2 ≤ λ ≤ 1/2. If

    L ≤ 1/(2λ_D)   (7.13)

then |f_{H_L,ℓ}(·)| becomes

    |f_{H_L,ℓ}(λ)| = f_{H_L,0}(λ) = (1/L) f_H(λ/L),   −1/2 ≤ λ ≤ 1/2.   (7.14)

In this case, irrespective of ℓ and t, the variance of the interpolation error is given by

    ǫ²_ℓ(t) = ǫ² = 1 − ∫_{−1/2}^{1/2} [ SNR (f_H(λ))² ] / [ SNR f_H(λ) + L n_t ] dλ   (7.15)

which vanishes as SNR tends to infinity. Recall that λ_D denotes the bandwidth of f_H(·). Thus, (7.13) implies that no aliasing occurs as we undersample the fading process L times. While the variance in (7.11) may depend on the transmit antenna index t, t = 1, …, n_t, the variance in (7.15) is independent of the transmit antenna index. See Section 7.4.1 for a more detailed discussion.
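The behaviour of (7.15) can be checked numerically. The sketch below is my own assumption-laden example (it posits a flat psd, f_H(λ) = 1/(2λ_D) on [−λ_D, λ_D]); it evaluates (7.15) on a grid and compares it with the 2λ_D L n_t/SNR behaviour invoked later in (7.33).

import numpy as np

def interp_error_variance(snr, L, nt, lam_d, grid=20001):
    """epsilon^2 from (7.15) for a rectangular psd of bandwidth lam_d and unit variance."""
    lam = np.linspace(-0.5, 0.5, grid)
    f = np.where(np.abs(lam) <= lam_d, 1.0 / (2.0 * lam_d), 0.0)
    integrand = snr * f**2 / (snr * f + L * nt)
    return 1.0 - np.sum(integrand) * (lam[1] - lam[0])     # Riemann-sum approximation

lam_d, nt = 0.05, 2
L = int(1 / (2 * lam_d))              # largest L with no aliasing, here L* = 10
for snr_db in (10, 20, 30, 40):
    snr = 10.0 ** (snr_db / 10)
    eps2 = interp_error_variance(snr, L, nt, lam_d)
    print(f"SNR={snr_db} dB: eps^2={eps2:.3e}, 2*lam_d*L*nt/SNR={2*lam_d*L*nt/snr:.3e}")
# eps^2 decays as 1/SNR, the "nearly perfect side information" regime discussed in Section 7.3.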

The channel estimator feeds the sequence of fading estimates {Ĥ^{(T)}_k, k ∈ D} (which is composed of the matrix entries Ĥ^{(T)}_k(r, t), k ∈ D) to the data detector. We shall denote its realisation by {Ĥ^{(T)}_k, k ∈ D}. Based on the channel outputs {y_k, k ∈ D} and fading estimates {Ĥ^{(T)}_k, k ∈ D}, the data detector uses a nearest neighbour decoder to guess which message was transmitted. Thus, the decoder decides on the message m̂ that satisfies

    m̂ = arg min_{m ∈ M} D(m)   (7.16)


where

    D(m) ≜ ∑_{k ∈ D} ‖ y_k − √(SNR/n_t) Ĥ^{(T)}_k x_k(m) ‖².   (7.17)
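For concreteness, a toy implementation of the nearest neighbour rule (7.16)–(7.17) follows. It is a sketch under simplifying assumptions (a small dictionary-based codebook and, in the demo below, perfect fading estimates), not the construction used in the proofs.

import numpy as np

def nn_decode(y, h_est, codebook, snr, nt):
    """y, h_est: dicts over data indices k in D; codebook[m] is a dict k -> x_k(m)."""
    def metric(m):
        # D(m) of (7.17), computed with the fading estimates as if they were perfect.
        return sum(np.linalg.norm(y[k] - np.sqrt(snr / nt) * h_est[k] @ codebook[m][k]) ** 2
                   for k in y)
    return min(codebook, key=metric)   # m_hat = argmin_m D(m), cf. (7.16)

rng = np.random.default_rng(1)
nr, nt, snr, D = 2, 2, 100.0, range(4)
cn = lambda *s: (rng.standard_normal(s) + 1j * rng.standard_normal(s)) / np.sqrt(2)
h = {k: cn(nr, nt) for k in D}
codebook = {m: {k: cn(nt) for k in D} for m in range(8)}
true_m = 3
y = {k: np.sqrt(snr / nt) * h[k] @ codebook[true_m][k] + cn(nr) for k in D}
print(nn_decode(y, h, codebook, snr, nt) == true_m)   # recovers the message w.h.p.; with
                                                      # perfect CSI this metric is the ML metric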

7.3 The Pre-Log

We say that a rate

    R(SNR) ≜ (log |M|)/n   (7.18)

is achievable if the error probability tends to zero as the codeword length n tends to infinity. In this work, we study the maximum rate R*(SNR) that is achievable with nearest neighbour decoding and pilot-aided channel estimation. We focus on the achievable rates at high SNR. In particular, we are interested in the maximum achievable pre-log, defined as

    Π_{R*} ≜ lim sup_{SNR→∞} R*(SNR) / log SNR.   (7.19)

We recall some key points on the capacity pre-log of the noncoherent fading channel from Chapter 6. Proposition 6.2 summarises the capacity pre-log of the SISO fading channel under a peak-power constraint derived by Lapidoth [85] as

    Π_{C_p} = µ( {λ : f_H(λ) = 0} )   (7.20)

where µ(·) denotes the Lebesgue measure on the interval [−1/2, 1/2]. Koch and Lapidoth [91] later extended this result to MISO fading channels and showed that if the fading processes {H_k(t), k ∈ ℤ}, t = 1, …, n_t are independent and have the same law, then the capacity pre-log of MISO fading channels is equal to the capacity pre-log of the SISO fading channel with fading process {H_k(1), k ∈ ℤ}. Using (7.20), the capacity pre-log of MISO fading channels with bandlimited psd of bandwidth λ_D can be evaluated as

    Π_{C_p} = 1 − 2λ_D.   (7.21)

To the best of our knowledge, a complete characterisation of the capacity pre-log of MIMO fading channels is not available. Proposition 6.3 provides the best lower bound known so far, due to Etkin and Tse [1]: for independent fading processes {H_k(r, t), k ∈ ℤ}, t = 1, …, n_t, r = 1, …, n_r that have the same law, the capacity pre-log of the MIMO fading channel under an average-power


constraint can be lower-bounded as

    Π_{C_av} ≥ min(n_t, n_r) ( 1 − min(n_t, n_r) µ( {λ : f_H(λ) > 0} ) ).   (7.22)

For a psd that is bandlimited to λ_D, this becomes

    Π_{C_av} ≥ min(n_t, n_r) ( 1 − min(n_t, n_r) 2λ_D ).   (7.23)

Note that since R*(SNR) ≤ C(SNR), it follows that Π_{R*} ≤ Π_C.^{7.2}

In this work, we show that a communication scheme that employs nearest

neighbour decoding and pilot-aided channel estimation achieves the following

pre-log.

Theorem 7.1. Consider the above Gaussian MIMO flat-fading channel with n_t transmit antennas and n_r receive antennas. Then, the transmission and decoding scheme described in Section 7.2 achieves

    Π_{R*} ≥ min(n_t, n_r) ( 1 − min(n_t, n_r)/L* )   (7.24)

where L* = ⌊1/(2λ_D)⌋ is the largest integer satisfying L* ≤ 1/(2λ_D).

Proof. See Section 7.4.1.

Remark 7.1. We derive Theorem 7.1 for i.i.d. Gaussian codebooks satisfying the average-power constraint (7.3). Nevertheless, it can be shown that Theorem 7.1 continues to hold when the channel inputs satisfy a peak-power constraint. More specifically, we show in Section 7.4.2 that a sufficient condition on the input distribution with power constraint E[‖X‖²] ≤ n_t for achieving the pre-log is that its probability density function (pdf) P_X(x) satisfies

    P_X(x) ≤ (K/π^{n_t}) e^{−‖x‖²},   x ∈ ℂ^{n_t}   (7.25)

for some K satisfying

    lim_{SNR→∞} log K / log SNR = 0.   (7.26)

The condition (7.25) is satisfied, for example, by truncated Gaussian inputs, for

7.2 The channel capacity is the supremum of all achievable rates, maximised over all possible encoders and decoders.


which the n_t elements in X are independent and identically distributed and

    P_X(x) = 1/(K π^{n_t}) e^{−‖x‖²},   x ∈ { x ∈ ℂ^{n_t} : |x(t)| ≤ 1, 1 ≤ t ≤ n_t },   (7.27)

    K = ( ∫_{|x|≤1} (1/π) e^{−|x|²} dx )^{n_t}   (7.28)
      = (1 − e^{−1})^{n_t}.   (7.29)
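A quick numerical check of the truncated Gaussian example (7.27)–(7.29): the sketch below (my own; the sample size is an arbitrary choice) draws i.i.d. CN(0, 1) entries, keeps the vectors whose entries all have magnitude at most 1, and verifies that the acceptance probability matches K = (1 − e^{−1})^{n_t}.

import numpy as np

rng = np.random.default_rng(0)
nt, num = 3, 200_000
z = (rng.standard_normal((num, nt)) + 1j * rng.standard_normal((num, nt))) / np.sqrt(2)  # CN(0,1)
accept = np.all(np.abs(z) <= 1.0, axis=1)
print(accept.mean(), (1 - np.exp(-1)) ** nt)   # empirical acceptance rate vs. K in (7.29)
x = z[accept]                                   # samples from the truncated density (7.27)
print(x.shape, bool(np.max(np.abs(x)) <= 1.0))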

If 1/(2λ_D) is an integer, then (7.24) becomes

    Π_{R*} ≥ min(n_t, n_r) ( 1 − min(n_t, n_r) 2λ_D ).   (7.30)

Thus, in this case nearest neighbour decoding together with pilot-aided channel

estimation achieves the capacity pre-log of MISO fading channels (7.21) as well

as the lower bound on the capacity pre-log of MIMO fading channels (7.23).

Suppose that both the transmitter and the receiver use the same number of antennas, namely n_t′ = n_r′ ≜ min(n_t, n_r). Then, as the codeword length tends to infinity, we have from (7.4), (7.5) and (7.6) that the fraction of time consumed for the transmission of pilots is given by

    lim_{n→∞} n_p/n′
      = lim_{n→∞} [ ( n/(L − n_t′) + 1 + 2(T − 1) ) n_t′ ] / [ ( n/(L − n_t′) + 1 + 2(T − 1) ) n_t′ + n + 2(L − n_t′)(T − 1) ]
      = n_t′/L.   (7.31)

Consequently, from the achievable pre-log (7.24), namely

    Π_{R*} ≥ n_t′ ( 1 − n_t′/L ),   L ≤ 1/(2λ_D),   (7.32)

we observe that the loss compared to the capacity pre-log of the coherent fading

channel nt′ = min(nt, nr) is given by the fraction of time used for the transmission

of pilots. From this we infer that the nearest neighbour decoder in combination

with the channel estimator described in Section 7.2 is optimal at high SNR in the

sense that it achieves the capacity pre-log of the coherent fading channel. This

further implies that the achievable pre-log in Theorem 7.1 is the best pre-log that

can be achieved by any scheme employing nt′ pilot vectors.

To achieve the pre-log in Theorem 7.1, we assume that L ≤ 1/(2λ_D), in which case the variance of the interpolation error (7.15), namely

    ǫ² = 1 − ∫_{−1/2}^{1/2} [ SNR (f_H(λ))² ] / [ SNR f_H(λ) + L n_t ] dλ ≈ 2λ_D L n_t / SNR,   (7.33)

vanishes as the inverse of the SNR. The achievable pre-log is then maximised by maximising L ≤ 1/(2λ_D). Note that as a criterion of "perfect side information" for nearest neighbour decoding in fading channels, Lapidoth and Shamai [54] suggested that the variance of the fading estimation error should be negligible compared to the reciprocal of the SNR. Using the linear interpolator (7.7), we obtain an estimation error with variance decaying as the reciprocal of the SNR provided that L ≤ 1/(2λ_D). Thus, the condition L ≤ 1/(2λ_D) can be viewed as a sufficient condition for obtaining "nearly perfect side information" in the sense that the variance of the interpolation error is of the same order as the reciprocal of the SNR.

Of course, one could increase L beyond 1/(2λ_D). Indeed, by increasing L, we could reduce the rate loss due to the transmission of pilots as indicated in (7.32), at the cost of obtaining a larger fading estimation error, which may reduce the reliability of the nearest neighbour decoder. To understand this trade-off better, we shall analyse the contribution of the nearest neighbour decoder to the pre-log when L > 1/(2λ_D). Note that for L > 1/(2λ_D), the variance of the interpolation error follows from (7.11) as

    ǫ²_ℓ(t) = 1 − ∫_{−1/2}^{1/2} [ SNR |f_{H_L,ℓ−t+1}(λ)|² ] / [ SNR f_{H_L,0}(λ) + n_t ] dλ   (7.34)
            = ∫_{−1/2}^{1/2} [ n_t f_{H_L,0}(λ) ] / [ SNR f_{H_L,0}(λ) + n_t ] dλ
              + ∫_{−1/2}^{1/2} [ SNR ( (f_{H_L,0}(λ))² − |f_{H_L,ℓ−t+1}(λ)|² ) ] / [ SNR f_{H_L,0}(λ) + n_t ] dλ.   (7.35)

The former integral

    ∫_{−1/2}^{1/2} [ n_t f_{H_L,0}(λ) ] / [ SNR f_{H_L,0}(λ) + n_t ] dλ ≈ n_t / SNR   (7.36)

vanishes as the SNR tends to infinity. Furthermore, we prove in Section 7.4.1 that, as the SNR tends to infinity, the latter integral

    ∫_{−1/2}^{1/2} [ SNR ( (f_{H_L,0}(λ))² − |f_{H_L,ℓ−t+1}(λ)|² ) ] / [ SNR f_{H_L,0}(λ) + n_t ] dλ   (7.37)

is bounded away from zero. This implies that the interpolation error (7.35) does not vanish as the SNR tends to infinity, and the decoder therefore cannot achieve a positive pre-log. It thus follows that the condition L ≤ 1/(2λ_D) is necessary in order to achieve a positive pre-log.

Comparing (7.24) and (7.23) with the capacity pre-log min(n_t, n_r) for coherent fading channels [88, 89], we observe that, for a fading process of bandwidth λ_D, the penalty for not knowing the fading coefficients is roughly (min(n_t, n_r))² 2λ_D. Consequently, the lower bound (7.24) does not grow linearly with min(n_t, n_r); rather, it is a quadratic function of min(n_t, n_r) that achieves its maximum at

    min(n_t, n_r) = L*/2.   (7.38)

This gives rise to the lower bound

    Π_{R*} ≥ L*/4   (7.39)

which cannot be larger than 1/(8λ_D). The same holds for the lower bound (7.23).
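The quadratic behaviour behind (7.38)–(7.39) is easy to tabulate; the following few lines (illustrative only, with an assumed L* = 10) evaluate the bound m(1 − m/L*) over m = min(n_t, n_r) and locate its maximum.

L_star = 10                                    # e.g. lambda_D = 0.05
bounds = {m: m * (1 - m / L_star) for m in range(1, L_star + 1)}
best_m = max(bounds, key=bounds.get)
print(bounds)
print(best_m, bounds[best_m])                  # maximum at m = L*/2 = 5 with value L*/4 = 2.5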

7.4 Proofs

7.4.1 Proof of Theorem 7.1

Theorem 7.1 is proven as follows. We first characterise the estimation error from

the linear interpolator (7.7). We then compute the rates achievable with the

communication scheme described in Section 7.2. Finally, we analyse the pre-log

corresponding to these rates.

7.4.1.1 Linear Interpolator

By (7.7), the estimate of H_k(r, t) is given by

    Ĥ^{(T)}_k(r, t) = ∑_{k′ = k−TL : k′ ∈ P}^{k+TL} a_{k′}(r, t) Y_{k′}(r),   k ∈ D.   (7.40)

We further denote the interpolation error by E^{(T)}_k(r, t) = H_k(r, t) − Ĥ^{(T)}_k(r, t).

We have the following lemma.

Lemma 7.1. For any k ∈ ℤ, let k = jL + ℓ, ℓ = 0, …, L − 1. Without loss of generality, assume that for k ∈ P, ℓ = 0, …, n_t − 1 and for k ∈ D, ℓ = n_t, …, L − 1. Then, the linear interpolator (7.40) has the following properties.

1. For each t = 1, …, n_t and r = 1, …, n_r, the estimate Ĥ^{(T)}_k(r, t) and the corresponding estimation error E^{(T)}_k(r, t) are independent zero-mean complex-Gaussian random variables.

2. (a) For a given transmit antenna t and ℓ ∈ {n_t, …, L − 1}, the n_r processes {(Ĥ^{(T)}_{jL+ℓ}(r, t), E^{(T)}_{jL+ℓ}(r, t)), j ∈ ℤ}, r = 1, …, n_r are independent and have the same law.

   (b) For a given receive antenna r and ℓ ∈ {n_t, …, L − 1}, the n_t processes {(Ĥ^{(T)}_{jL+ℓ}(r, t), E^{(T)}_{jL+ℓ}(r, t)), j ∈ ℤ}, t = 1, …, n_t are independent but have different laws.

3. For each ℓ = n_t, …, L − 1, the joint process {(H_{jL+ℓ}, Ĥ^{(T)}_{jL+ℓ}, X_{jL+ℓ}, Z_{jL+ℓ}), j ∈ ℤ} is stationary ergodic.

4. It holds that for ℓ = n_t, …, L − 1

    E[ Z^†_ℓ Ĥ^{(T)}_ℓ X_ℓ ] = E[ X^†_ℓ Ĥ^{†(T)}_ℓ Z_ℓ ] = 0.   (7.41)

5. Irrespective of j and r, the variance of the interpolation error E^{(T)}_{jL+ℓ}(r, t), ℓ = n_t, …, L − 1 tends to

    ǫ²_ℓ(t) = 1 − ∫_{−1/2}^{1/2} [ SNR |f_{H_L,ℓ−t+1}(λ)|² ] / [ SNR f_{H_L,0}(λ) + n_t ] dλ   (7.42)

as T tends to infinity, where

    f_{H_L,ℓ}(λ) = (1/L) ∑_{ν=0}^{L−1} f̃_H( (λ − ν)/L ) e^{i2πℓ(λ−ν)/L},   −1/2 ≤ λ ≤ 1/2   (7.43)

and f̃_H(·) is the periodic function of period [−1/2, 1/2) that coincides with f_H(λ) for −1/2 ≤ λ ≤ 1/2. This implies the following results.

   (a) For L ≤ 1/(2λ_D), irrespective of ℓ and t, (7.42) becomes

    ǫ²_ℓ(t) = 1 − ∫_{−1/2}^{1/2} [ SNR (f_H(λ))² ] / [ SNR f_H(λ) + L n_t ] dλ   (7.44)

   which vanishes as SNR tends to infinity:

    lim inf_{SNR→∞} ǫ²_ℓ(t) = 0.   (7.45)

   (b) For L > 1/(2λ_D), we have for all ℓ = n_t, …, L − 1, t = 1, …, n_t

    lim inf_{SNR→∞} ǫ²_ℓ(t) > 0.   (7.46)
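Before turning to the proof, the finite-T interpolator (7.40) can be sanity-checked numerically. The sketch below is my own construction under a flat-psd assumption: it builds the Wiener (LMMSE) coefficients for a single (r, t) pair from pilot observations spaced L apart and reports the resulting error variance, a finite-T approximation of (7.42).

import numpy as np

def lmmse_coefficients(offsets, ell, snr_over_nt, lam_d):
    """Wiener coefficients for estimating H at lag `ell` from pilot observation times `offsets`."""
    r = lambda m: np.sinc(2 * lam_d * m)                  # autocorrelation of a flat psd
    offs = np.asarray(offsets, dtype=float)
    # Covariance of the pilot observations Y = sqrt(SNR/nt) H + Z ...
    c_yy = snr_over_nt * r(offs[:, None] - offs[None, :]) + np.eye(len(offs))
    # ... and cross-covariance with the fading value to be estimated.
    c_hy = np.sqrt(snr_over_nt) * r(ell - offs)
    return np.linalg.solve(c_yy, c_hy), c_hy

lam_d, L, T, nt, snr = 0.05, 10, 4, 1, 10.0 ** 3          # 30 dB, L = 1/(2*lam_d)
ell = 5                                                   # a data instant between pilot blocks
offsets = np.arange(-T, T) * L                            # the 2T nearest pilot instants
a, c_hy = lmmse_coefficients(offsets, ell, snr / nt, lam_d)
print("eps^2 =", 1.0 - a @ c_hy)                          # error variance by orthogonality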


Proof. 1. By the orthogonality principle [78], we have that Ĥ^{(T)}_k(r, t) and E^{(T)}_k(r, t) are uncorrelated. Noting that the pilot symbol is unity, we can write (7.40) as

    Ĥ^{(T)}_k(r, t) = ∑_{k′ = k−TL : k′ ∈ P}^{k+TL} a_{k′}(r, t) ( √(SNR/n_t) H_{k′}(r, t) + Z_{k′}(r) ),   k ∈ D.   (7.47)

Since the processes {H_k(r, t), k ∈ ℤ} and {Z_k(r), k ∈ ℤ} are zero-mean complex-Gaussian processes, we have from (7.47) and the orthogonality principle [78] that Ĥ^{(T)}_k(r, t) and E^{(T)}_k(r, t) are zero-mean independent complex-Gaussian random variables.

2. Let k = jL + ℓ and ℓ = k mod L. Without loss of generality, assume that for k ∈ D, we have ℓ = n_t, …, L − 1, and for k ∈ P, we have ℓ = 0, …, n_t − 1. Since the pilot vectors are transmitted sequentially from p_1 to p_{n_t}, we have for k ∈ P that

    p_t = p_{ℓ+1},   ℓ = 0, …, n_t − 1   (7.48)

namely the (ℓ + 1)-th pilot vector, ℓ = 0, …, n_t − 1, is used to estimate the fading coefficients from transmit antenna t. To estimate H_k(r, t), there is no loss in optimality by considering only the outputs Y_{k′}(r) for k′ ∈ P, k′ ∈ [k − TL, k + TL] satisfying

    k′ mod L = t − 1.   (7.49)

Indeed, the channel outputs Y_{k′}(r), k′ mod L ≠ t − 1 correspond to H_{k′}(r, t′), t′ ≠ t, which are independent from H_k(r, t). It follows from [94] that for the estimation at k = jL + ℓ, the optimal coefficients a_{k′}(r, t) (which minimise the mean-squared error) depend only on L and ℓ. The fading estimate (7.40) and its corresponding estimation error can then be expressed as

    Ĥ^{(T)}_{jL+ℓ}(r, t) = ∑_{τ=−T}^{T−1} a_{−τL,ℓ}(r, t) Y_{(j−τ)L+t−1}(r)   (7.50)
                        = ∑_{τ=−T}^{T−1} a_{−τL,ℓ}(r, t) ( √(SNR/n_t) H_{(j−τ)L+t−1}(r, t) + Z_{(j−τ)L+t−1}(r) ),   (7.51)

    E^{(T)}_{jL+ℓ}(r, t) = H_{jL+ℓ}(r, t) − Ĥ^{(T)}_{jL+ℓ}(r, t).   (7.52)


We note that the n_r · n_t processes {H_k(r, t), k ∈ ℤ} are independent from each other and have the same law. We have the following results.

(a) We observe in (7.51) that for a given t, the time differences between the index of interest (jL + ℓ) and the positions of the pilots ((j − τ)L + t − 1, τ = −T, …, T − 1) are the same for all r = 1, …, n_r. It thus follows from [94] that for a given t, the optimal coefficients a_{−τL,ℓ}(r, t) are identical for all r = 1, …, n_r. This implies that for a given t and ℓ, the n_r processes {(Ĥ^{(T)}_{jL+ℓ}(r, t), E^{(T)}_{jL+ℓ}(r, t)), j ∈ ℤ}, r = 1, …, n_r corresponding to the channel estimation at the n_r receive antennas are independent and have the same law.

(b) We also observe in (7.51) that for a given r, the time differences between the index of interest (jL + ℓ) and the positions of the pilots ((j − τ)L + t − 1, τ = −T, …, T − 1) are different for t = 1, …, n_t. It thus follows from [94] that for a given r, the optimal coefficients a_{−τL,ℓ}(r, t) are generally different for t = 1, …, n_t. This implies that for a given r and ℓ, the n_t processes {(Ĥ^{(T)}_{jL+ℓ}(r, t), E^{(T)}_{jL+ℓ}(r, t)), j ∈ ℤ}, t = 1, …, n_t are independent but have different laws.

3. We apply existing results on stationary processes, particularly on weak mixing and ergodicity. (The definitions of these notions can be found in [95].) Since {H_k, k ∈ ℤ} is an ergodic Gaussian process, it is also weakly mixing [96]. Since {Z_k, k ∈ ℤ} is an i.i.d. Gaussian process and independent from {H_k, k ∈ ℤ}, it follows from [97, Prop. 1.6] that the joint process {(H_k, Z_k), k ∈ ℤ} is ergodic.

To understand the behaviour of the process {(H_{jL+ℓ}, Ĥ^{(T)}_{jL+ℓ}, Z_{jL+ℓ}), j ∈ ℤ}, ℓ ∈ {n_t, …, L − 1}, we first consider another random matrix H̃_k, for which the entry at row r and column t is defined similarly to (7.47) but with the removal of the restriction k′ ∈ P and with the same set of a_{k′}(r, t), k′ = k − TL, …, k + TL for all k ∈ ℤ, i.e.,

    H̃_k(r, t) = ∑_{k′ = k−TL}^{k+TL} a_{k′}(r, t) ( √(SNR/n_t) H_{k′}(r, t) + Z_{k′}(r) ).   (7.53)

We can see that the joint process {(H_k, H̃_k, Z_k), k ∈ ℤ} is Gaussian. For any process {U_k, k ∈ ℤ}, denote by U_k^{k*}, k* > k the sequence U_k, U_{k+1}, …, U_{k*}. Then, we can express (H_k, H̃_k, Z_k) as the output of a time-invariant multivariate function of {(H_k, Z_k), k ∈ ℤ}, i.e.,

    (H_k, H̃_k, Z_k) = φ( H_{k−TL}^{k+TL}, Z_{k−TL}^{k+TL} ).   (7.54)

Since {(H_k, Z_k), k ∈ ℤ} is ergodic, and since {(H_k, H̃_k, Z_k), k ∈ ℤ} is Gaussian, it follows from [96] that the joint process {(H_k, H̃_k, Z_k), k ∈ ℤ} is weakly mixing. Furthermore, as a consequence of the vector process {(H_{jL}^{(j+1)L−1}, H̃_{jL}^{(j+1)L−1}, Z_{jL}^{(j+1)L−1}), j ∈ ℤ} being weakly mixing [96], the undersampled process {(H_{jL+ℓ}, H̃_{jL+ℓ}, Z_{jL+ℓ}), j ∈ ℤ} for each ℓ = 0, …, L − 1 is weakly mixing, which implies ergodicity [96].

From the proof of Part 2), we note for Ĥ^{(T)}_k(r, t) in (7.47) with k = jL + ℓ that the optimal coefficients a_{k′}(r, t) do not depend on j, and can then be expressed as a_{−τL,ℓ}. For each ℓ = n_t, …, L − 1, we can then view {(H_{jL+ℓ}, Ĥ^{(T)}_{jL+ℓ}, Z_{jL+ℓ}), j ∈ ℤ} as a special case of {(H_{jL+ℓ}, H̃_{jL+ℓ}, Z_{jL+ℓ}), j ∈ ℤ}, where the values of a_{k′}(r, t) in (7.53) are given by

    a_{k′}(r, t) = a_{−τL,ℓ}(r, t)  if k′ = (j − τ)L + t − 1, τ ∈ {−T, …, T − 1},  and 0 otherwise.   (7.55)

It thus follows that the undersampled process {(H_{jL+ℓ}, Ĥ^{(T)}_{jL+ℓ}, Z_{jL+ℓ}), j ∈ ℤ} for each ℓ = n_t, …, L − 1 is ergodic.

By applying [87, Lemma 2], since {X_{jL+ℓ}, j ∈ ℤ} is i.i.d. and independent from {(H_k, Z_k), k ∈ ℤ}, we have that the joint process {(H_{jL+ℓ}, Ĥ^{(T)}_{jL+ℓ}, X_{jL+ℓ}, Z_{jL+ℓ}), j ∈ ℤ} is ergodic.

4. Note that ℓ = n_t, …, L − 1 correspond to k ∈ D. We then have that

    E[ Z^†_ℓ Ĥ^{(T)}_ℓ X_ℓ ] = E[ E[ Z^†_ℓ Ĥ^{(T)}_ℓ X_ℓ | Ĥ^{(T)}_ℓ ] ].   (7.56)

The process {Ĥ^{(T)}_k, k ∈ D} is a function of {(H_k, Z_k), k ∈ P}. Since {Z_k, k ∈ D} has zero mean and is independent from {(H_k, Z_k), k ∈ P} and {X_k, k ∈ D}, it follows for any realisation Ĥ^{(T)}_ℓ ∈ ℂ^{n_r × n_t}, ℓ = n_t, …, L − 1, that the inner expectation on the RHS of (7.56) is zero, which implies that the outer expectation is also zero. The same reasoning can be used to prove E[ X^†_ℓ Ĥ^{†(T)}_ℓ Z_ℓ ] = 0.

5. As T tends to infinity, we have from (7.52) that, irrespective of j and r,

    ǫ²_ℓ(t) = lim_{T→∞} E[ |E^{(T)}_{jL+ℓ}(r, t)|² ]   (7.57)
           = lim_{T→∞} E[ |H_{jL+ℓ}(r, t) − Ĥ^{(T)}_{jL+ℓ}(r, t)|² ]   (7.58)
           = lim_{T→∞} E[ ( H_{jL+ℓ}(r, t) − Ĥ^{(T)}_{jL+ℓ}(r, t) ) H^†_{jL+ℓ}(r, t) ]   (7.59)
           = 1 − lim_{T→∞} E[ Ĥ^{(T)}_{jL+ℓ}(r, t) H^†_{jL+ℓ}(r, t) ]   (7.60)
           = 1 − ∑_{τ=−∞}^{∞} a_{−τL,ℓ}(r, t) √(SNR/n_t) E[ H_{(j−τ)L+t−1}(r, t) H^†_{jL+ℓ}(r, t) ].   (7.61)

Herein (7.59) follows from the orthogonality principle [78], and (7.61) follows from (7.51). Then, following the derivation in [94], we obtain

    ǫ²_ℓ(t) = 1 − ∑_{τ=−∞}^{∞} a_{−τL,ℓ}(r, t) √(SNR/n_t) E[ H_{(j−τ)L+t−1}(r, t) H^†_{jL+ℓ}(r, t) ]   (7.62)
           = 1 − ∫_{−1/2}^{1/2} [ SNR |f_{H_L,ℓ−t+1}(λ)|² ] / [ SNR f_{H_L,0}(λ) + n_t ] dλ   (7.63)

where

    f_{H_L,ℓ}(λ) = (1/L) ∑_{ν=0}^{L−1} f̃_H( (λ − ν)/L ) e^{i2πℓ(λ−ν)/L},   −1/2 ≤ λ ≤ 1/2   (7.64)

and f̃_H(·) is the periodic function of period [−1/2, 1/2) that coincides with f_H(λ) for −1/2 ≤ λ ≤ 1/2.

The inverse of twice the bandwidth, 1/(2λ_D), determines the behaviour of ǫ²_ℓ(t) as the SNR tends to infinity.

(a) If L ≤ 1/(2λ_D), then it follows that

    f̃_H( (λ − ν)/L ) = 0,   ν = 1, …, L − 1,   −1/2 ≤ λ ≤ 1/2,   (7.65)

in which case

    f_{H_L,ℓ}(λ) = (1/L) ∑_{ν=0}^{L−1} f̃_H( (λ − ν)/L ) e^{i2πℓ(λ−ν)/L} = (1/L) f_H(λ/L) e^{i2πℓλ/L},   −1/2 ≤ λ ≤ 1/2.   (7.66)

Combining (7.66) with (7.65), we obtain for the variance of the interpolation error

    ǫ²_ℓ(t) = 1 − ∫_{−1/2}^{1/2} [ SNR |f_{H_L,ℓ−t+1}(λ)|² ] / [ SNR f_{H_L,0}(λ) + n_t ] dλ   (7.67)
           = 1 − ∫_{−1/2}^{1/2} [ SNR (f_H(λ))² ] / [ SNR f_H(λ) + L n_t ] dλ,   L ≤ 1/(2λ_D)   (7.68)

irrespective of ℓ and t. Since Ĥ^{(T)}_{jL+ℓ}(r, t) and E^{(T)}_{jL+ℓ}(r, t) are independent, it follows from (7.68) and (7.52) that

    lim_{T→∞} E[ |Ĥ^{(T)}_{jL+ℓ}(r, t)|² ] = ∫_{−1/2}^{1/2} [ SNR (f_H(λ))² ] / [ SNR f_H(λ) + L n_t ] dλ.   (7.69)

(b) We next analyse the interpolation error for L > 1/(2λ_D). To this end, we express L as

    L = 1/(2λ_D) + ε   (7.70)

for some ε > 0. The variance of the interpolation error (7.42) can be decomposed into two integrals

    ǫ²_ℓ(t) = ∫_{−1/2}^{1/2} [ n_t f_{H_L,0}(λ) ] / [ SNR f_{H_L,0}(λ) + n_t ] dλ
            + ∫_{−1/2}^{1/2} [ SNR ( (f_{H_L,0}(λ))² − |f_{H_L,ℓ−t+1}(λ)|² ) ] / [ SNR f_{H_L,0}(λ) + n_t ] dλ.   (7.71)


Let ℓ′ ≜ ℓ − t + 1. We have that

    (f_{H_L,0}(λ))² − |f_{H_L,ℓ′}(λ)|²
      = (1/L²) ∑_{ν=0}^{L−1} ∑_{ν′=0, ν′≠ν}^{L−1} f̃_H( (λ−ν)/L ) f̃_H( (λ−ν′)/L ) ( 1 − e^{ı2πℓ′(λ−ν)/L} e^{−ı2πℓ′(λ−ν′)/L} )   (7.72)
      = (2/L²) ∑_{ν=0}^{L−1} ∑_{ν′>ν}^{L−1} f̃_H( (λ−ν)/L ) f̃_H( (λ−ν′)/L ) ( 1 − cos( 2πℓ′(ν′−ν)/L ) ).   (7.73)

Since the summands are non-negative, we obtain the following lower bound by considering only the summand corresponding to ν = 0 and ν′ = 1:

    (f_{H_L,0}(λ))² − |f_{H_L,ℓ′}(λ)|² ≥ (2/L²) f̃_H(λ/L) f̃_H( (λ−1)/L ) ( 1 − cos(2πℓ′/L) ).   (7.74)

We note that for every k ∈ D, we have ℓ = k mod L ≥ n_t. It therefore follows that 1 ≤ ℓ′ ≤ L − 1, which implies

    1 − cos(2πℓ′/L) > 0.   (7.75)

By the definition of f̃_H(·), it follows that for L = 1/(2λ_D) + ε, the functions f̃_H(λ/L) and f̃_H( (λ−1)/L ) overlap on an interval 𝓛 ⊆ [−1/2, 1/2]. Indeed, it can be shown that for L = 1/(2λ_D) + ε, the Lebesgue measure of 𝓛 on the interval −1/2 ≤ λ ≤ 1/2 is given by

    µ(𝓛) = min(1, 2λ_D ε).   (7.76)

The second integral in (7.71) can then be lower-bounded as

    ∫_{−1/2}^{1/2} [ SNR ( (f_{H_L,0}(λ))² − |f_{H_L,ℓ′}(λ)|² ) ] / [ SNR f_{H_L,0}(λ) + n_t ] dλ
      ≥ [ 2( 1 − cos(2πℓ′/L) ) / L² ] ∫_𝓛 [ SNR f̃_H(λ/L) f̃_H( (λ−1)/L ) ] / [ SNR f_{H_L,0}(λ) + n_t ] dλ.   (7.77)

Computing the limit as the SNR tends to infinity and applying Fatou's lemma [41] yield

    lim inf_{SNR→∞} [ 2( 1 − cos(2πℓ′/L) ) / L² ] ∫_𝓛 [ SNR f̃_H(λ/L) f̃_H( (λ−1)/L ) ] / [ SNR f_{H_L,0}(λ) + n_t ] dλ
      ≥ [ 2( 1 − cos(2πℓ′/L) ) / L² ] ∫_𝓛 lim inf_{SNR→∞} [ SNR f̃_H(λ/L) f̃_H( (λ−1)/L ) ] / [ SNR f_{H_L,0}(λ) + n_t ] dλ   (7.78)
      = [ 2( 1 − cos(2πℓ′/L) ) / L² ] ∫_𝓛 [ f̃_H(λ/L) f̃_H( (λ−1)/L ) ] / f_{H_L,0}(λ) dλ.   (7.79)

Since 𝓛 is of positive Lebesgue measure, and since the integrand on the RHS of (7.79) is strictly positive, it follows from [98] that

    ∫_𝓛 [ f̃_H(λ/L) f̃_H( (λ−1)/L ) ] / f_{H_L,0}(λ) dλ > 0.   (7.80)

Together with (7.75), this makes the second integral in (7.71) bounded away from zero as the SNR tends to infinity, which implies that the variance of the interpolation error ǫ²_ℓ(t) is also bounded away from zero whenever L > 1/(2λ_D).

7.4.1.2 Achievable Rates and Pre-Logs

We first note that it suffices to consider the case where nt = nr. If nt > nr,

then we employ only nr transmit antennas, and if nr > nt, then we ignore nr−nt

antennas at the receiver. This yields in both cases a lower bound on the maximum

achievable rate.

To prove Theorem 7.1, we analyse the generalized mutual information (GMI)

[36] for the channel and communication scheme in Section 7.2. The GMI, de-

noted by IgmiT (SNR), specifies the highest information rate for which the average

probability of error, averaged over the ensemble of i.i.d. Gaussian codebooks,

tends to zero as the codeword length n tends to infinity (see [4, 39, 54] and ref-

erences therein). The GMI for stationary Gaussian channels employing nearest

neighbour decoding has been evaluated in [39,54] where explicit assumptions on

the fading estimate process are specified. However, since the fading estimate pro-

duced by the linear interpolator (7.7) has different statistics to the ones in [39,54],

the results on the GMI presented in [39, 54] do not directly extend to our case.

Thus, for the sake of completeness, we shall re-derive IgmiT (SNR) using our fading

estimate specified in Lemma 7.1.

To prove Theorem 7.1, we evaluate IgmiT (SNR) in the following order.


1. We compute a lower bound on IgmiT (SNR) for a fixed window size T .

2. We analyse the behaviour of this lower bound as T tends to infinity.

3. We evaluate the limiting ratio of this lower bound to log SNR as SNR tends

to infinity.

IgmiT (SNR) for a fixed T

Note that the linear interpolator (7.40) is used to estimate the fading at time

k ∈ D. At time k ∈ P, the estimation is not required. However, for the sake of

completeness, we obtain the estimate at time k ∈ P using

    Ĥ^{(T)}_k(r, t) = √(n_t/SNR) · Y_k(r).   (7.81)

In order to evaluate the GMI for a fixed T , we need the following lemma.

Lemma 7.2. Consider the channel and the transmission model in Section 7.2. Let

    n̄ = n + ( n/(L − n_t) + 1 ) n_t − 1.   (7.82)

Without loss of generality, consider a codeword for which the first data vector is transmitted at time k = n_t. Define F(SNR) as

    F(SNR) ≜ n_r + SNR/((L − n_t) n_t) ∑_{ℓ=n_t}^{L−1} E[ ‖E^{(T)}_ℓ‖²_F ]   (7.83)

and a typical set

    T_δ ≜ { (x_k, y_k, Ĥ^{(T)}_k), k = 0, …, n̄ :  | (1/n) ∑_{k∈D} ‖ y_k − √(SNR/n_t) Ĥ^{(T)}_k x_k ‖² − F(SNR) | < δ }   (7.84)

for some δ > 0. For any process {U_k, k ∈ ℤ}, denote by U_k^{k*}, k* > k the sequence U_k, U_{k+1}, …, U_{k*}. Then, it holds that

    lim_{n→∞} Pr{ ( X_0^{n̄}, Y_0^{n̄}, Ĥ^{(T),n̄}_0 ) ∈ T_δ } = 1,   ∀δ > 0.   (7.85)


Proof. For k ∈ D, the channel input vector x_k corresponds to a data vector of the codeword. Then, as the codeword length n tends to infinity, we have the following limit

    lim_{n→∞} (1/n) ∑_{k∈D} ‖ √(SNR/n_t) ( H_k − Ĥ^{(T)}_k ) x_k + z_k ‖²
      = (1/(L−n_t)) ∑_{ℓ=n_t}^{L−1} lim_{n→∞} ((L−n_t)/n) ∑_{j=0}^{n/(L−n_t)−1} ‖ √(SNR/n_t) ( H_{jL+ℓ} − Ĥ^{(T)}_{jL+ℓ} ) x_{jL+ℓ} + z_{jL+ℓ} ‖²   (7.86)
      = (1/(L−n_t)) ∑_{ℓ=n_t}^{L−1} E[ ‖ √(SNR/n_t) ( H_ℓ − Ĥ^{(T)}_ℓ ) X_ℓ + Z_ℓ ‖² ],   almost surely   (7.87)
      = (1/(L−n_t)) ∑_{ℓ=n_t}^{L−1} ( n_r + (SNR/n_t) E[ ‖E^{(T)}_ℓ X_ℓ‖²_F ] )   (7.88)
      = n_r + SNR/((L−n_t) n_t) ∑_{ℓ=n_t}^{L−1} E[ ‖E^{(T)}_ℓ‖²_F ].   (7.89)

Herein equality (7.87) follows from the ergodicity condition in Part 3) of Lemma 7.1. Equality (7.88) follows from Part 4) of Lemma 7.1. Equality (7.89) follows since X_ℓ is independent from E^{(T)}_ℓ (as {E^{(T)}_k, k ∈ D} is a function of {(H_k, Z_k), k ∈ P}) and is N_{n_t}(0, I_{n_t})-distributed. This completes the proof of the lemma.

Let Pe be the ensemble-average error probability and Pe(m) be the ensemble-

average error probability corresponding to message m. Due to the symmetry of

the codebook construction, it suffices to consider the error behaviour, conditioned

on the event that message m = 1 is transmitted.

Let E(m′) denote the event that D(m′) ≤ D(1). The probability of error is given by

    P_e(1) = Pr{ ∪_{m′ ≠ 1} E(m′) }.   (7.90)

Using the typical set T_δ and its complement T_δ^c, the ensemble-average error probability can be upper-bounded as

    P_e(1) = Pr{ ∪_{m′≠1} E(m′) | (X_0^{n̄}(1), Y_0^{n̄}, Ĥ^{(T),n̄}_0) ∈ T_δ } · Pr{ (X_0^{n̄}(1), Y_0^{n̄}, Ĥ^{(T),n̄}_0) ∈ T_δ }
             + Pr{ ∪_{m′≠1} E(m′) | (X_0^{n̄}(1), Y_0^{n̄}, Ĥ^{(T),n̄}_0) ∈ T_δ^c } · Pr{ (X_0^{n̄}(1), Y_0^{n̄}, Ĥ^{(T),n̄}_0) ∈ T_δ^c }   (7.91)
          ≤ Pr{ ∪_{m′≠1} E(m′) | (X_0^{n̄}(1), Y_0^{n̄}, Ĥ^{(T),n̄}_0) ∈ T_δ } + Pr{ (X_0^{n̄}(1), Y_0^{n̄}, Ĥ^{(T),n̄}_0) ∈ T_δ^c }   (7.92)
          ≤ e^{nR} · Pr{ (1/n) D(m′) < F(SNR) + δ | (X_0^{n̄}(1), Y_0^{n̄}, Ĥ^{(T),n̄}_0) ∈ T_δ }
             + Pr{ (X_0^{n̄}(1), Y_0^{n̄}, Ĥ^{(T),n̄}_0) ∈ T_δ^c },   m′ ≠ 1   (7.93)

where in the last inequality we have used the union bound and that, for (X_0^{n̄}(1), Y_0^{n̄}, Ĥ^{(T),n̄}_0) ∈ T_δ,

    (1/n) D(1) < F(SNR) + δ.   (7.94)

It follows from Lemma 7.2 that the probability Pr{ (X_0^{n̄}(1), Y_0^{n̄}, Ĥ^{(T),n̄}_0) ∈ T_δ^c } can be made arbitrarily small by letting the codeword length n tend to infinity.

The GMI essentially characterises the rate of exponential decay of the expression

    Pr{ (1/n) D(m′) < F(SNR) + δ | (X_0^{n̄}(1), Y_0^{n̄}, Ĥ^{(T),n̄}_0) ∈ T_δ },   m′ ≠ 1   (7.95)

to zero as δ ↓ 0 [39]. The computation of the GMI requires the conditional log moment-generating function of the metric D(m′) associated with the wrong message m′ ≠ 1, conditioned on the channel outputs and on the fading estimates, i.e.,

    κ_n(θ, SNR) = log E[ exp( (θ/n) ∑_{k∈D} D_k(m′) ) | (y_k, Ĥ^{(T)}_k), k ∈ D ]   (7.96)

where

    D_k(m′) ≜ ‖ y_k − √(SNR/n_t) Ĥ^{(T)}_k x_k(m′) ‖².   (7.97)

Following along the lines of [39, 54], we can express the conditional log moment-generating function in (7.96) as the sum of conditional log moment-generating functions for the individual vector metrics D_k(m′), k ∈ D, i.e.,

    log E[ exp( (θ/n) ∑_{k∈D} D_k(m′) ) | (y_k, Ĥ^{(T)}_k), k ∈ D ] = ∑_{k∈D} log E[ exp( (θ/n) D_k(m′) ) | y_k, Ĥ^{(T)}_k ].   (7.98)

The expectation on the RHS of (7.98) can be evaluated as

    E[ exp( (θ/n) D_k(m′) ) | y_k, Ĥ^{(T)}_k ]
      = ∫ (1/π^{n_t}) exp( −‖x_k‖² + (θ/n) ‖ y_k − √(SNR/n_t) Ĥ^{(T)}_k x_k ‖² ) dx_k   (7.99)
      = [ 1 / det( I_{n_r} − (θ/n)(SNR/n_t) Ĥ^{(T)}_k Ĥ^{†(T)}_k ) ]
        · exp( (θ/n) y_k^† ( I_{n_r} − (θ/n)(SNR/n_t) Ĥ^{(T)}_k Ĥ^{†(T)}_k )^{−1} y_k )   (7.100)

where the integral can be evaluated in the same way as in [39, App. A]. This yields

    κ_n(θ, SNR) = ∑_{k∈D} ( (θ/n) y_k^† ( I_{n_r} − (θ/n)(SNR/n_t) Ĥ^{(T)}_k Ĥ^{†(T)}_k )^{−1} y_k
                            − log det( I_{n_r} − (θ/n)(SNR/n_t) Ĥ^{(T)}_k Ĥ^{†(T)}_k ) ).   (7.101)


We then have that for all θ < 0

    lim_{n→∞} (1/n) κ_n(nθ, SNR)
      = lim_{n→∞} (1/n) ∑_{k∈D} θ y_k^† ( I_{n_r} − θ (SNR/n_t) Ĥ^{(T)}_k Ĥ^{†(T)}_k )^{−1} y_k
        − lim_{n→∞} (1/n) ∑_{k∈D} log det( I_{n_r} − θ (SNR/n_t) Ĥ^{(T)}_k Ĥ^{†(T)}_k )   (7.102)
      = (1/(L−n_t)) ∑_{ℓ=n_t}^{L−1} lim_{n→∞} ((L−n_t)/n) ∑_{j=0}^{n/(L−n_t)−1} θ y_{jL+ℓ}^† ( I_{n_r} − θ (SNR/n_t) Ĥ^{(T)}_{jL+ℓ} Ĥ^{†(T)}_{jL+ℓ} )^{−1} y_{jL+ℓ}
        − (1/(L−n_t)) ∑_{ℓ=n_t}^{L−1} lim_{n→∞} ((L−n_t)/n) ∑_{j=0}^{n/(L−n_t)−1} log det( I_{n_r} − θ (SNR/n_t) Ĥ^{(T)}_{jL+ℓ} Ĥ^{†(T)}_{jL+ℓ} )   (7.103)
      = (1/(L−n_t)) ∑_{ℓ=n_t}^{L−1} E[ θ Y_ℓ^† ( I_{n_r} − θ (SNR/n_t) Ĥ^{(T)}_ℓ Ĥ^{†(T)}_ℓ )^{−1} Y_ℓ ]
        − (1/(L−n_t)) ∑_{ℓ=n_t}^{L−1} E[ log det( I_{n_r} − θ (SNR/n_t) Ĥ^{(T)}_ℓ Ĥ^{†(T)}_ℓ ) ],   almost surely   (7.104)

where the convergence in (7.104) is due to the ergodicity of {(Y_{jL+ℓ}, Ĥ^{(T)}_{jL+ℓ}), j ∈ ℤ}, ℓ = n_t, …, L − 1, which follows from Part 3) of Lemma 7.1.

Following the same steps as in [39, 54], we can show that for all δ′ > 0, the ensemble-average error probability can be bounded as

    P_e(1) ≤ exp(nR) exp( −n ( I^gmi_T(SNR) − δ′ ) ) + ε(δ′, n)   (7.105)

for some function ε(δ′, n) such that

    lim_{n→∞} ε(δ′, n) = 0.   (7.106)

Here I^gmi_T(SNR) denotes the GMI as a function of SNR for a fixed T, given by

    I^gmi_T(SNR) = ((L − n_t)/L) ( sup_{θ<0} ( θ F(SNR) − κ(θ, SNR) ) )   (7.107)


where κ(θ, SNR) is defined by the RHS of (7.104):

    κ(θ, SNR) ≜ (1/(L−n_t)) ∑_{ℓ=n_t}^{L−1} E[ θ Y_ℓ^† ( I_{n_r} − θ (SNR/n_t) Ĥ^{(T)}_ℓ Ĥ^{†(T)}_ℓ )^{−1} Y_ℓ ]
                − (1/(L−n_t)) ∑_{ℓ=n_t}^{L−1} E[ log det( I_{n_r} − θ (SNR/n_t) Ĥ^{(T)}_ℓ Ĥ^{†(T)}_ℓ ) ].   (7.108)

Herein the pre-factor (L − n_t)/L comes from the fraction of time used for data transmission. The bound (7.105) implies that for rates below I^gmi_T(SNR), the communication scheme described in Section 7.2 has vanishing error probability as n tends to infinity. Combining (7.83) and (7.104) with (7.107) yields

    I^gmi_T(SNR) = sup_{θ<0} (1/L) ∑_{ℓ=n_t}^{L−1} { θ ( n_r + (SNR/n_t) E[ ‖E^{(T)}_ℓ‖²_F ] )
                    + E[ log det( I_{n_r} − θ (SNR/n_t) Ĥ^{(T)}_ℓ Ĥ^{†(T)}_ℓ ) ]
                    − E[ θ Y_ℓ^† ( I_{n_r} − θ (SNR/n_t) Ĥ^{(T)}_ℓ Ĥ^{†(T)}_ℓ )^{−1} Y_ℓ ] }.   (7.109)

Following the steps used in Appendix A.3, it can be shown that for θ < 0

    −E[ θ Y_ℓ^† ( I_{n_r} − θ (SNR/n_t) Ĥ^{(T)}_ℓ Ĥ^{†(T)}_ℓ )^{−1} Y_ℓ ] ≥ 0.   (7.110)

As observed in Appendix A.3, a good lower bound on I^gmi_T(SNR) for high SNR follows by choosing

    θ = −1 / ( n_r + (SNR/n_t) n_t n_r ǫ²_{*,T} )   (7.111)

where

    ǫ²_{*,T} = max_{r=1,…,n_r; t=1,…,n_t; ℓ=n_t,…,L−1} ǫ²_{ℓ,T}(r, t).   (7.112)

Hence, substituting the choice of θ in (7.111) and applying (7.110) to the RHS of (7.109) yield

    I^gmi_T(SNR) ≥ (1/L) ∑_{ℓ=n_t}^{L−1} { E[ log det( I_{n_r} + SNR/( n_t n_r + n_t n_r SNR ǫ²_{*,T} ) Ĥ^{(T)}_ℓ Ĥ^{†(T)}_ℓ ) ] − 1 }.   (7.113)
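The lower bound just derived can be evaluated by simulation. The following sketch (my own; it adopts the T → ∞ statistics of (7.116) below, i.e. fading-estimate entries taken i.i.d. CN(0, 1 − ǫ²), and the high-SNR approximation (7.33) for ǫ²) computes the right-hand side of (7.113) in that limit, cf. (7.119), and its ratio to log SNR.

import numpy as np

def gmi_lower_bound(snr, nt, nr, L, eps2, samples=4000, seed=0):
    rng = np.random.default_rng(seed)
    scale = snr / (nt * nr + nt * nr * snr * eps2)
    acc = 0.0
    for _ in range(samples):
        # Fading-estimate matrix with i.i.d. CN(0, 1 - eps2) entries (the T -> infinity limit).
        h_est = np.sqrt(1 - eps2) * (rng.standard_normal((nr, nt))
                                     + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)
        acc += np.linalg.slogdet(np.eye(nr) + scale * h_est @ h_est.conj().T)[1]
    # The L - nt terms of the sum become identical in this limit, cf. (7.119).
    return (L - nt) / L * (acc / samples - 1.0)

nt = nr = 2
lam_d, L = 0.05, 10
for snr_db in (20, 30, 40):
    snr = 10.0 ** (snr_db / 10)
    eps2 = 2 * lam_d * L * nt / snr                    # high-SNR approximation (7.33)
    g = gmi_lower_bound(snr, nt, nr, L, eps2)
    print(f"SNR={snr_db} dB: bound={g:6.2f} nats, ratio to log SNR={g/np.log(snr):.2f}")
# As the SNR grows the ratio slowly approaches nt*(1 - nt/L) = 1.6, the pre-log of Theorem 7.1.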


Limit T → ∞

We next analyse the RHS of (7.113) in the limit as T tends to infinity. To this end, we note that, for L ≤ 1/(2λ_D), the interpolation error tends to (7.68), namely

    ǫ²_ℓ(t) = 1 − ∫_{−1/2}^{1/2} [ SNR (f_H(λ))² ] / [ SNR f_H(λ) + L n_t ] dλ   (7.114)

irrespective of ℓ and t. We shall therefore denote the variance of the interpolation error ǫ²_ℓ(t) by ǫ². Note that for a fixed T, the entries of

    ( 1 / √( n_t n_r + n_t n_r SNR ǫ²_{*,T} ) ) Ĥ^{(T)}_ℓ   (7.115)

are independent of each other but not i.i.d., which follows from Part 2) of Lemma 7.1. However, as T tends to infinity, their distributions become identical due to (7.68) and (7.69), and hence they converge in distribution:

    ( 1 / √( n_t n_r + n_t n_r SNR ǫ²_{*,T} ) ) Ĥ^{(T)}_ℓ   →(d)   ( 1 / √( n_t n_r + n_t n_r SNR ǫ² ) ) H̄   (7.116)

where the entries of H̄ are i.i.d. complex-Gaussian random variables with zero mean and variance (1 − ǫ²).

Note that

    log det( I_{n_r} + SNR/( n_t n_r + n_t n_r SNR ǫ²_{*,T} ) Ĥ^{(T)}_ℓ Ĥ^{†(T)}_ℓ ) ≥ 0   (7.117)

is a continuous function with respect to the entries of the matrix

    ( 1 / ( n_t n_r + n_t n_r SNR ǫ²_{*,T} ) ) Ĥ^{(T)}_ℓ Ĥ^{†(T)}_ℓ.   (7.118)

It follows from Portmanteau's lemma [99] that for T → ∞, the RHS of (7.113) can be lower-bounded by

    lim_{T→∞} (1/L) ∑_{ℓ=n_t}^{L−1} { E[ log det( I_{n_r} + SNR/( n_t n_r + n_t n_r SNR ǫ²_{*,T} ) Ĥ^{(T)}_ℓ Ĥ^{†(T)}_ℓ ) ] − 1 }
      ≥ ((L − n_t)/L) { E[ log det( I_{n_r} + SNR/( n_t n_r + n_t n_r SNR ǫ² ) H̄ H̄^† ) ] − 1 }.   (7.119)


Applying the lower bound log det(I + A) ≥ log det A, we further have that

    ((L − n_t)/L) { E[ log det( I_{n_r} + SNR/( n_t n_r + n_t n_r SNR ǫ² ) H̄ H̄^† ) ] − 1 }
      ≥ ((L − n_t)/L) ( E[ log det( SNR/( n_t n_r + n_t n_r SNR ǫ² ) H̄ H̄^† ) ] − 1 ).   (7.120)

Combining (7.120) with (7.119) and (7.113) yields

    I^gmi(SNR) = lim_{T→∞} I^gmi_T(SNR)   (7.121)
               ≥ ((L − n_t)/L) ( n_t log SNR − n_t log( n_t² + n_t² SNR ǫ² ) + E[ log det H̄ H̄^† ] − 1 ).   (7.122)

Limit SNR → ∞

In the following, we compute a lower bound on the pre-log by computing the limiting ratio of the RHS of (7.122) to log SNR as SNR tends to infinity. To this end, we first consider

    SNR ǫ² = SNR ( 1 − ∫_{−1/2}^{1/2} [ SNR (f_H(λ))² ] / [ SNR f_H(λ) + L n_t ] dλ )   (7.123)
            = ∫_{−1/2}^{1/2} [ SNR f_H(λ) L n_t ] / [ SNR f_H(λ) + L n_t ] dλ.   (7.124)

Since the integrand is bounded by

    0 ≤ [ SNR f_H(λ) L n_t ] / [ SNR f_H(λ) + L n_t ] ≤ L n_t   (7.125)

it follows that 0 ≤ SNR ǫ² ≤ L n_t, which implies that

    lim_{SNR→∞} log( n_t n_r + n_t n_r SNR ǫ² ) / log SNR = 0.   (7.126)

We next consider the term E[ log det H̄ H̄^† ] − 1. Note that by [90, Lemma A.2]

    E[ log det H̄ H̄^† ] − 1 = n_t log(1 − ǫ²) + ∑_{b=0}^{n_r−1} ψ(n_t − b) − 1   (7.127)

where ψ(·) is Euler's digamma function [53]. Furthermore, since

    0 ≤ [ SNR (f_H(λ))² ] / [ SNR f_H(λ) + L n_t ] ≤ f_H(λ)   (7.128)

we have by the dominated convergence theorem [19] that

    lim_{SNR→∞} ǫ² = lim_{SNR→∞} ( 1 − ∫_{−1/2}^{1/2} [ SNR (f_H(λ))² ] / [ SNR f_H(λ) + L n_t ] dλ ) = 0   (7.129)

so log(1 − ǫ²) vanishes as the SNR tends to infinity. Combining (7.129) with (7.127) yields

    lim_{SNR→∞} ( E[ log det H̄ H̄^† ] − 1 ) / log SNR = 0.   (7.130)

It thus follows from (7.126) and (7.130) that we obtain the lower bound

    Π_{R*} ≥ n_t ( 1 − n_t/L )   (7.131)
           = min(n_t, n_r) ( 1 − min(n_t, n_r)/L ),   L ≤ 1/(2λ_D)   (7.132)

where we have used that n_t = n_r = min(n_t, n_r). Note that the condition L ≤ 1/(2λ_D) is necessary since otherwise (7.114) would not hold. This proves Theorem 7.1.

7.4.2 A Note on Input Distribution

The pre-log in Theorem 7.1 is derived using codebooks whose entries are drawn i.i.d. from N_{n_t}(0, I_{n_t}). However, Gaussian inputs are not necessary to achieve the pre-log (7.24). In fact, (7.24) can be achieved by any i.i.d. inputs whose density satisfies (7.25) and (7.26). To show this, we consider (7.107) and evaluate an upper bound to F(SNR) and κ(θ, SNR) for an arbitrary continuous input distribution with the average-power constraint E[‖X‖²] ≤ n_t and density satisfying

    P_X(x) ≤ (K/π^{n_t}) e^{−‖x‖²},   x ∈ ℂ^{n_t}.   (7.133)

Remark that with the constraint (7.133), it is not possible to have E[‖X‖²] = 0. In order for Lemma 7.2 to hold, F(SNR) should be re-defined as

    F(SNR) ≜ n_r + SNR/((L − n_t) n_t) ∑_{ℓ=n_t}^{L−1} E[ ‖E^{(T)}_ℓ X_ℓ‖²_F ].   (7.134)

Using the upper bound on the Frobenius norm of the product of two matrices, ‖AB‖²_F ≤ ‖A‖²_F · ‖B‖²_F [86, Sec. 5.6], and the independence between E^{(T)}_ℓ and X_ℓ, we can bound F(SNR) by

    F(SNR) ≤ n_r + SNR/((L − n_t) n_t) ∑_{ℓ=n_t}^{L−1} E[ ‖E^{(T)}_ℓ‖²_F ] · E[ ‖X_ℓ‖² ].   (7.135)

To evaluate an upper bound to κ(θ, SNR), we upper-bound κ_n(θ, SNR), defined in (7.96), using the following upper bound

    E[ exp( (θ/n) D_k(m′) ) | y_k, Ĥ^{(T)}_k ]
      = ∫ P_X(x_k) exp( (θ/n) ‖ y_k − √(SNR/n_t) Ĥ^{(T)}_k x_k ‖² ) dx_k   (7.136)
      ≤ ∫ (K/π^{n_t}) exp( −‖x_k‖² + (θ/n) ‖ y_k − √(SNR/n_t) Ĥ^{(T)}_k x_k ‖² ) dx_k   (7.137)
      = [ K / det( I_{n_r} − (θ/n)(SNR/n_t) Ĥ^{(T)}_k Ĥ^{†(T)}_k ) ]
        · exp( (θ/n) y_k^† ( I_{n_r} − (θ/n)(SNR/n_t) Ĥ^{(T)}_k Ĥ^{†(T)}_k )^{−1} y_k ).   (7.138)

By following the steps used in Section 7.4.1.2, and by choosing

    θ = −1 / ( n_r + (SNR/n_t) n_t n_r ǫ²_{*,T} E[‖X‖²] )   (7.139)

where ǫ²_{*,T} is given in (7.112), we obtain from (7.135) and (7.138)

    I^gmi_T(SNR) ≥ (1/L) ∑_{ℓ=n_t}^{L−1} E[ log det( I_{n_r} + SNR/( n_t n_r + n_t n_r SNR ǫ²_{*,T} E[‖X‖²] ) Ĥ^{(T)}_ℓ Ĥ^{†(T)}_ℓ ) ]
                    − ((L − n_t)/L) ( 1 + log K ).   (7.140)

Taking the limit of T to infinity, and repeating the steps used in Section 7.4.1.2, yield

    I^gmi(SNR) = lim_{T→∞} I^gmi_T(SNR)   (7.141)
               ≥ ((L − n_t)/L) ( E[ log det( SNR/( n_t n_r + n_t n_r SNR ǫ² E[‖X‖²] ) H̄ H̄^† ) ] − 1 − log K )   (7.142)
               = ((L − n_t)/L) ( n_t log SNR − n_t log( n_t n_r + n_t n_r SNR ǫ² E[‖X‖²] )
                                 + E[ log det H̄ H̄^† ] − 1 − log K ).   (7.143)

We conclude by evaluating the limiting ratio of the RHS of (7.143) to log SNR as SNR tends to infinity. To this end, we can see from (7.125) and the average-power constraint E[‖X‖²] ≤ n_t that

    lim_{SNR→∞} log( n_t n_r + n_t n_r SNR ǫ² E[‖X‖²] ) / log SNR = 0.   (7.144)

It thus follows from (7.144) and (7.130) that the pre-log (7.24) can be achieved if

    lim_{SNR→∞} log K / log SNR = 0.   (7.145)

7.5 Conclusion

We have studied the information rate pre-log of noncoherent bandlimited MIMO

fading channels achievable with nearest neighbour decoding and pilot-aided chan-

nel estimation. We have shown that the achievable pre-log is given by the ca-

pacity pre-log of the coherent fading channel times the fraction of time used for

the transmission of data. Hence, the loss with respect to the coherent case is

solely due to the transmission of orthogonal pilots used to obtain accurate fad-

ing estimates. If the inverse of twice the bandwidth of the fading process is an

integer, then for MISO channels, the above scheme is optimal in the sense that it

achieves the capacity pre-log of the noncoherent fading channel derived by Koch

and Lapidoth [91]. For noncoherent MIMO channels, the above scheme achieves

the best so far known lower bound on the capacity pre-log obtained by Etkin and

Tse [1].


The pre-log derived here assumes that L (the smallest time interval for which

the same pilot is being transmitted) is limited by the inverse of twice the band-

width of the fading psd so that we can achieve a decaying variance of the fading

estimation error as the inverse of the SNR. This facilitates reliable fading estima-

tion and enables the nearest neighbour decoder to achieve the capacity pre-log of

the coherent fading channel. In order to improve the pre-log, one should reduce

the time spent for the transmission of pilots, yet still maintain the accuracy of

fading estimates. Note that the fraction of time used for the transmission of

pilots is directly proportional to the number of transmit antennas and inversely

proportional to L. Hence, to reduce the fraction of time for the transmission of

pilots, one could increase L beyond the inverse of twice the bandwidth of the

fading psd. However, we show that the pre-log cannot be improved using this

technique. The fault lies on the variance of the fading estimation error that

is bounded away from zero, which makes the fading estimation unreliable and

makes the nearest neighbour decoder perform poorly.


Chapter 8

Pilot-Aided Channel Estimation for Fading Multiple-Access Channels

In Chapter 7, we have studied the pre-log of point-to-point MIMO fading channels achievable with nearest neighbour decoding and pilot-aided channel estimation. It was demonstrated that the pre-log coincides with the capacity pre-log for MISO fading channels, derived by Koch and Lapidoth [91], and that the scheme achieves the best lower bound known so far on the capacity pre-log of MIMO fading channels, derived by Etkin and Tse [1].

In this chapter, we extend the analysis in Chapter 7 to the fading multiple-access channel (MAC). We propose a joint-transmission scheme in which the codewords of all users are transmitted simultaneously and decoded jointly.^{8.1} We are interested in the rate region achievable with nearest neighbour decoding and pilot-aided channel estimation. In particular, we study the pre-log region, defined as the limiting ratio of the achievable-rate region to the logarithm of the SNR as the SNR tends to infinity.

This chapter is organised as follows. We first introduce the MIMO fading MAC model in Section 8.1 and describe the transmission scheme in Section 8.2. We present our main results on the MAC pre-log in Section 8.3. We then compare the joint-transmission scheme with time-division multiple-access (TDMA) in Section 8.4. We give the proof of our main results in Section 8.5, and we summarise the important points of the chapter in Section 8.6.

8.1 By joint transmission, we mean that codewords from both users are simultaneously transmitted at the same time instants. It is assumed that there exists a central controller that synchronises the transmission from both users.


[Figure 8.1: The two-user MAC system model — two transmitters (s = 1 and s = 2) with messages m_1 and m_2 communicate with a common receiver, which produces the estimates (\hat{m}_1, \hat{m}_2).]

8.1 System Model

We consider a two-user MIMO fading MAC, where two terminals wish to communicate with a third one, and where the channels between the terminals are MIMO fading channels. The first user has n_{t,1} antennas, the second user has n_{t,2} antennas and the receiver has n_r antennas. The channel model is depicted in Figure 8.1. The channel output at time instant k \in \mathbb{Z} is a complex-valued n_r-dimensional random vector

  Y_k = \sqrt{SNR}\, \mathbb{H}_{1,k}\mathbf{x}_{1,k} + \sqrt{SNR}\, \mathbb{H}_{2,k}\mathbf{x}_{2,k} + Z_k.   (8.1)

Here \mathbf{x}_{s,k} \in \mathbb{C}^{n_{t,s}} denotes the time-k channel input vector of user s, s = 1, 2; \mathbb{H}_{s,k} denotes the n_r \times n_{t,s}-dimensional fading matrix at time k corresponding to user s, s = 1, 2; SNR denotes the average SNR for each transmit antenna; and Z_k denotes the n_r-variate additive noise vector at time k.

The noise process \{Z_k, k \in \mathbb{Z}\} is a sequence of i.i.d. complex-Gaussian random vectors with zero mean and covariance matrix \mathbf{I}_{n_r}.

The fading processes \{\mathbb{H}_{s,k}, k \in \mathbb{Z}\}, s = 1, 2 are stationary, ergodic and complex-Gaussian. We assume that the (n_{t,1} n_r + n_{t,2} n_r) processes \{H_{s,k}(r, t), k \in \mathbb{Z}\}, s = 1, 2, r = 1, \ldots, n_r, t = 1, \ldots, n_{t,s} are independent and have the same law, with each process having zero mean, unit variance and power spectral density (psd) f_H(\lambda), -1/2 \leq \lambda \leq 1/2. Thus, f_H(\cdot) is a nonnegative function satisfying

  \mathbb{E}\!\left[H_{s,k+m}(r, t)\, H^*_{s,k}(r, t)\right] = \int_{-1/2}^{1/2} e^{\imath 2\pi m\lambda} f_H(\lambda)\, d\lambda   (8.2)

where H^*_{s,k}(r, t) denotes the complex conjugate of H_{s,k}(r, t). We further assume that the psd f_H(\cdot) has bandwidth \lambda_D \in (0, 1/2], i.e., f_H(\lambda) = 0 for |\lambda| > \lambda_D and f_H(\lambda) > 0 otherwise.

We finally assume that the fading processes \{\mathbb{H}_{s,k}, k \in \mathbb{Z}\}, s = 1, 2 and the noise process \{Z_k, k \in \mathbb{Z}\} are independent and that their joint law does not depend on \{\mathbf{x}_{s,k}, k \in \mathbb{Z}\}, s = 1, 2. We consider a noncoherent channel model, where the transmitters and the receiver are aware of the statistics of \{\mathbb{H}_{s,k}, k \in \mathbb{Z}\}, s = 1, 2 but not of their realisations.

8.2 Transmission Scheme

Both users transmit codewords and pilot symbols over the channel (8.1). To transmit the message m_s \in \{1, \ldots, e^{nR_s}\}, s = 1, 2, each user's encoder selects a codeword of length n from a codebook \mathcal{C}_s, where the codewords in \mathcal{C}_s, s = 1, 2 are drawn i.i.d. from an n_{t,s}-variate, zero-mean, complex-Gaussian distribution with covariance matrix \mathbf{I}_{n_{t,s}}.^{8.2} To facilitate channel estimation at the receiver, orthogonal pilot vectors are used. The pilot vector \mathbf{p}_{s,t} \in \mathbb{C}^{n_{t,s}}, s = 1, 2, t = 1, \ldots, n_{t,s} used to estimate the fading coefficients from transmit antenna t of user s is given by p_{s,t}(t) = 1 and p_{s,t}(t') = 0 for t' \neq t. For example, the first pilot vector of user s is given by (1, 0, \ldots, 0)^{\mathsf{T}}, where (\cdot)^{\mathsf{T}} denotes the transpose. To estimate the fading matrices \mathbb{H}_{1,k} and \mathbb{H}_{2,k}, each training period requires the (n_{t,1} + n_{t,2}) pilot vectors \mathbf{p}_{1,1}, \ldots, \mathbf{p}_{1,n_{t,1}}, \mathbf{p}_{2,1}, \ldots, \mathbf{p}_{2,n_{t,2}}.

Assuming synchronous transmissions from both users, the transmission scheme extends the point-to-point setup in Chapter 7 to the two-user MAC setup as illustrated in Figure 8.2. Every L time instants (for some L \geq n_{t,1} + n_{t,2}, L \in \mathbb{N}), user 1 first transmits the n_{t,1} pilot vectors \mathbf{p}_{1,1}, \ldots, \mathbf{p}_{1,n_{t,1}}. Once the transmission of the n_{t,1} pilot vectors is finished, user 2 transmits its n_{t,2} pilot vectors \mathbf{p}_{2,1}, \ldots, \mathbf{p}_{2,n_{t,2}}. The codewords of both users are then split up into blocks of (L - n_{t,1} - n_{t,2}) data vectors, which are transmitted simultaneously after the (n_{t,1} + n_{t,2}) pilot vectors. The process of transmitting (L - n_{t,1} - n_{t,2}) data vectors and (n_{t,1} + n_{t,2}) pilot vectors continues until all n data symbols have been sent. Herein we assume that n is an integer multiple of (L - n_{t,1} - n_{t,2}).^{8.3} Prior to transmitting the first data block, and after transmitting the last data block, a guard period of L(T - 1) time instants (for some T \in \mathbb{N}) is introduced for the purpose of channel estimation, during which the (n_{t,1} + n_{t,2}) pilot vectors are transmitted every L time instants but no data vectors are transmitted in between.

8.2 With this assumption, the channel inputs satisfy an average-power constraint. Using the truncated Gaussian distribution satisfying the conditions in Remark 7.1, one can also impose a peak-power constraint.

8.3 As in the point-to-point setup, this assumption is not critical (cf. footnote 7.1).


[Figure 8.2: Structure of the joint-transmission scheme for n_{t,1} = 2, n_{t,2} = 1, L = 7 and T = 2 (legend: pilot, data, no transmission). In total, n + (n/(L - n_{t,1} - n_{t,2}) + 1)(n_{t,1} + n_{t,2}) symbols are sent between two guard periods of length L(T - 1).]

Here we can see that codewords from both users are jointly transmitted at the same time instants, whereas pilots from both users are transmitted separately at different time instants. Note that the total block-length of this transmission scheme (comprising data vectors, pilot vectors and guard period) is given by

  n' = n_p + n + n_g   (8.3)

where n_p and n_g are now given by

  n_p = \left(\frac{n}{L - n_{t,1} - n_{t,2}} + 1 + 2(T-1)\right)(n_{t,1} + n_{t,2}),   (8.4)

  n_g = 2(L - n_{t,1} - n_{t,2})(T - 1).   (8.5)
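For concreteness, a minimal Python sketch (the parameter values below are assumed example values, not values used in the analysis) that evaluates (8.3)-(8.5):

    # Minimal sketch of (8.3)-(8.5) for assumed example parameters.
    nt1, nt2 = 2, 1          # transmit antennas of users 1 and 2
    L, T = 7, 2              # training period and estimator observation window
    n = 3 * (L - nt1 - nt2)  # data symbols (a multiple of L - nt1 - nt2)

    n_p = (n // (L - nt1 - nt2) + 1 + 2 * (T - 1)) * (nt1 + nt2)   # (8.4)
    n_g = 2 * (L - nt1 - nt2) * (T - 1)                            # (8.5)
    n_total = n_p + n + n_g                                        # (8.3)
    print(n, n_p, n_g, n_total)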

Once the transmission is completed, the decoder guesses which messages have been transmitted. The decoder consists of two parts: a channel estimator and a data detector. The channel estimator observes the channel outputs \{Y_{k'}, k' \in \mathcal{P}\} corresponding to the past and future T pilot transmissions and estimates H_{s,k}(r, t) using a linear interpolator, i.e., the estimate \hat{H}^{(T)}_{s,k}(r, t) of the fading coefficient H_{s,k}(r, t) is given by

  \hat{H}^{(T)}_{s,k}(r, t) = \sum_{k' = k - TL,\, k' \in \mathcal{P}}^{k + TL} a_{s,k'}(r, t)\, Y_{k'}(r),   k \in \mathcal{D}   (8.6)

where the coefficients a_{s,k'}(r, t) are chosen so as to minimise the mean-squared error. Here \mathcal{P} denotes the set of time indices where pilot symbols are transmitted, and \mathcal{D} denotes the set of time indices where data vectors of a codeword are transmitted.
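To illustrate the interpolator (8.6), the following Python sketch works out the MMSE-optimal (Wiener) coefficients and the resulting error variance for a single fading process. It is a simplified SISO illustration only; the rectangular psd of bandwidth \lambda_D, the parameter values and the variable names are assumptions, not the setting analysed in this chapter.

    import numpy as np

    # Wiener interpolation of a unit-variance fading process observed through
    # noisy pilots Y_{k'} = sqrt(SNR)*H_{k'} + Z_{k'} sent every L instants.
    lam_D, L, T, snr = 0.01, 40, 3, 100.0     # assumed example parameters

    def R(m):
        # autocorrelation for an assumed rectangular psd of bandwidth lam_D
        return np.sinc(2.0 * lam_D * np.asarray(m, dtype=float))

    k = 7                                      # data index to be estimated
    pilots = np.arange(-T * L, T * L + 1, L)   # pilot time indices

    C = snr * R(pilots[:, None] - pilots[None, :]) + np.eye(pilots.size)  # Cov(Y_P)
    c = np.sqrt(snr) * R(k - pilots)           # Cov(H_k, Y_P)
    a = np.linalg.solve(C, c)                  # interpolation coefficients a_{k'}
    mse = 1.0 - c @ a                          # variance of the interpolation error
    print(mse)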


Note that, since the pilot symbols are transmitted from only one user and one antenna at a time, the fading coefficients corresponding to all transmit and receive antennas of both users can be observed. Further note that, since the fading processes \{H_{s,k}(r, t), k \in \mathbb{Z}\}, s = 1, 2, r = 1, \ldots, n_r, t = 1, \ldots, n_{t,s} are independent, estimating H_{s,k}(r, t) based only on \{Y_{k'}(r), k' \in \mathcal{P}\} rather than on \{Y_{k'}, k' \in \mathcal{P}\} incurs no loss in optimality.

We denote the interpolation error in estimating H_{s,k}(r, t) using the interpolator (8.6) by

  E^{(T)}_{s,k}(r, t) = H_{s,k}(r, t) - \hat{H}^{(T)}_{s,k}(r, t).   (8.7)

The error E^{(T)}_{s,k}(r, t) has zero mean and variance less than unity. A detailed analysis of the variance of E^{(T)}_{s,k}(r, t) follows closely the analysis of the variance of the interpolation error in Chapter 7 (see Sections 7.2 and 7.4.1).^{8.4}

From the received codeword \{\mathbf{y}_k, k \in \mathcal{D}\} and the channel-estimate matrices \{\hat{\mathbb{H}}^{(T)}_{s,k}, k \in \mathcal{D}\}, s = 1, 2 (which are composed of the entries \hat{h}^{(T)}_{s,k}(r, t), k \in \mathcal{D}, where \hat{h}^{(T)}_{s,k}(r, t) denotes the realisation of \hat{H}^{(T)}_{s,k}(r, t)), the decoder chooses the pair of messages (\hat{m}_1, \hat{m}_2) that minimises the distance metric

  (\hat{m}_1, \hat{m}_2) = \arg\min_{(m_1, m_2)} D(m_1, m_2)   (8.8)

where

  D(m_1, m_2) \triangleq \sum_{k \in \mathcal{D}} \left\| \mathbf{y}_k - \sqrt{SNR}\, \hat{\mathbb{H}}^{(T)}_{1,k}\mathbf{x}_{1,k}(m_1) - \sqrt{SNR}\, \hat{\mathbb{H}}^{(T)}_{2,k}\mathbf{x}_{2,k}(m_2) \right\|^2.   (8.9)

In the following, we will refer to the above scheme as the joint-transmission scheme.
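A toy Python sketch of the decision rule (8.8)-(8.9) may help make the metric concrete. It is purely illustrative: the tiny codebooks, the parameter values and the use of perfect channel estimates in place of \hat{\mathbb{H}}^{(T)}_{s,k} are all assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    snr, nr, nt1, nt2, nD = 100.0, 2, 2, 1, 8   # assumed example parameters
    M1, M2 = 4, 4                               # assumed codebook sizes

    def cgauss(shape):
        return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    C1 = cgauss((M1, nD, nt1))                  # i.i.d. complex-Gaussian codebooks
    C2 = cgauss((M2, nD, nt2))
    H1, H2 = cgauss((nD, nr, nt1)), cgauss((nD, nr, nt2))
    m1_true, m2_true = 1, 3
    Y = np.sqrt(snr) * (np.einsum('krt,kt->kr', H1, C1[m1_true]) +
                        np.einsum('krt,kt->kr', H2, C2[m2_true])) + cgauss((nD, nr))

    def metric(m1, m2):
        # distance metric D(m1, m2) of (8.9), here with perfect estimates
        diff = Y - np.sqrt(snr) * (np.einsum('krt,kt->kr', H1, C1[m1]) +
                                   np.einsum('krt,kt->kr', H2, C2[m2]))
        return np.sum(np.abs(diff) ** 2)

    D = np.array([[metric(m1, m2) for m2 in range(M2)] for m1 in range(M1)])
    print(np.unravel_index(np.argmin(D), D.shape))   # should recover (1, 3)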

We shall compare the joint-transmission scheme with a TDMA scheme, where each user transmits its message using the transmission scheme illustrated in Figure 8.3. In particular, during the first \beta n' channel uses (for some 0 \leq \beta \leq 1), user 1 transmits its codeword according to the transmission scheme given in Chapter 7 (see also Figure 8.3), while user 2 is silent. (Here n' is given in (8.3).) Then, during the next (1 - \beta)n' channel uses, user 2 transmits its codeword according to the same transmission scheme, while user 1 is silent. In both cases, the receiver guesses the corresponding message m_s, s = 1, 2 using a nearest neighbour decoder and pilot-aided channel estimation.

8.4 One could view the fading estimation for both users as the fading estimation for a MIMO channel with (n_{t,1} + n_{t,2}) transmit antennas and n_r receive antennas. The resulting transmit antenna index depends on the user index and the transmit antenna index of each user.


[Figure 8.3: Structure of the TDMA scheme for n_{t,1} = 2, n_{t,2} = 1, L = 4 and T = 2 (legend: pilot, data, no transmission); user 1 transmits during the first \beta n' channel uses and user 2 during the remaining (1 - \beta)n' channel uses.]

8.3 The MAC Pre-Log

Let R^*_1(SNR), R^*_2(SNR) and R^*_{1+2}(SNR) be the maximum achievable rate for user 1, the maximum achievable rate for user 2 and the maximum achievable sum-rate, respectively. The achievable-rate region is given by the closure of the convex hull of the set [18]

  \mathcal{R} = \left\{ (R_1(SNR), R_2(SNR)) : R_1(SNR) < R^*_1(SNR),\; R_2(SNR) < R^*_2(SNR),\; R_1(SNR) + R_2(SNR) < R^*_{1+2}(SNR) \right\}.   (8.10)

We are interested in the pre-logs of R_1(SNR) and R_2(SNR), defined as the limiting ratios of R_1(SNR) and R_2(SNR) to the logarithm of the SNR as the SNR tends to infinity. Thus, the pre-log region is given by the closure of the convex hull of the set

  \Pi_{\mathcal{R}} = \left\{ (\Pi_{R_1}, \Pi_{R_2}) : \Pi_{R_1} < \Pi_{R^*_1},\; \Pi_{R_2} < \Pi_{R^*_2},\; \Pi_{R_1} + \Pi_{R_2} < \Pi_{R^*_{1+2}} \right\}   (8.11)

where

  \Pi_{R^*_1} \triangleq \limsup_{SNR\to\infty} \frac{R^*_1(SNR)}{\log SNR},   (8.12)

  \Pi_{R^*_2} \triangleq \limsup_{SNR\to\infty} \frac{R^*_2(SNR)}{\log SNR},   (8.13)

  \Pi_{R^*_{1+2}} \triangleq \limsup_{SNR\to\infty} \frac{R^*_{1+2}(SNR)}{\log SNR}.   (8.14)


The capacity pre-logs \Pi_{C_1}, \Pi_{C_2} and \Pi_{C_{1+2}} are defined in the same way but with R^*_1(SNR), R^*_2(SNR) and R^*_{1+2}(SNR) replaced by the respective capacities C_1(SNR), C_2(SNR) and C_{1+2}(SNR).

In the following theorem, we present our result on the pre-log region of the two-user MIMO fading MAC achievable with the joint-transmission scheme.

Theorem 8.1. Consider the MIMO fading MAC model (8.1). Then, the pre-log region achievable with the joint-transmission scheme is the closure of the convex hull of the set

  \left\{ (\Pi_{R_1}, \Pi_{R_2}) : \Pi_{R_1} < \min(n_r, n_{t,1})\left(1 - \frac{n_{t,1} + n_{t,2}}{L^*}\right),\; \Pi_{R_2} < \min(n_r, n_{t,2})\left(1 - \frac{n_{t,1} + n_{t,2}}{L^*}\right),\; \Pi_{R_1} + \Pi_{R_2} < \min(n_r, n_{t,1} + n_{t,2})\left(1 - \frac{n_{t,1} + n_{t,2}}{L^*}\right) \right\}   (8.15)

where L^* = \lfloor 1/(2\lambda_D) \rfloor is the largest integer satisfying L^* \leq 1/(2\lambda_D).

Proof. See Section 8.5.
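For numerical experimentation, a small Python helper (a sketch only; the function name, variable names and example values are ours) that evaluates the three constraints of (8.15) for a given antenna configuration and Doppler bandwidth:

    import math

    def prelog_region_joint(nr, nt1, nt2, lam_D):
        # Constraints of the pre-log region (8.15) of the joint-transmission scheme.
        L_star = math.floor(1.0 / (2.0 * lam_D))   # largest integer <= 1/(2*lam_D)
        frac = 1.0 - (nt1 + nt2) / L_star          # fraction of time carrying data
        return {"Pi_R1 <": min(nr, nt1) * frac,
                "Pi_R2 <": min(nr, nt2) * frac,
                "Pi_R1 + Pi_R2 <": min(nr, nt1 + nt2) * frac,
                "L*": L_star}

    print(prelog_region_joint(nr=2, nt1=1, nt2=1, lam_D=0.01))   # example values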

Remark 8.1. The pre-log region given in Theorem 8.1 is the largest region achievable with any transmission scheme that uses a fraction (n_{t,1} + n_{t,2})/L^* of the time for transmitting pilot symbols. Indeed, even if the channel estimator were able to estimate the fading coefficients perfectly, and even if we could decode the data symbols using a maximum-likelihood decoder, the capacity pre-log region (without pilot transmission) would be given by the closure of the convex hull of the set [18, 88, 89]

  \left\{ (\Pi_{R_1}, \Pi_{R_2}) : \Pi_{R_1} < \min(n_r, n_{t,1}),\; \Pi_{R_2} < \min(n_r, n_{t,2}),\; \Pi_{R_1} + \Pi_{R_2} < \min(n_r, n_{t,1} + n_{t,2}) \right\}   (8.16)

which, after multiplying by 1 - (n_{t,1} + n_{t,2})/L^* in order to account for the pilot symbols, becomes (8.15). Thus, in order to improve upon (8.15), one would need to design a transmission scheme that employs less than (n_{t,1} + n_{t,2})/L^* pilot symbols per channel use.


Remark 8.2 (TDMA Pre-Log). Consider the MIMO fading MAC model (8.1). Then, the pre-log region achievable with the TDMA scheme employing nearest neighbour decoding and pilot-aided channel estimation is the closure of the convex hull of the set

  \left\{ (\Pi_{R_1}, \Pi_{R_2}) : \Pi_{R_1} < \beta \min(n_r, n_{t,1})\left(1 - \frac{n_{t,1}}{L^*}\right),\; \Pi_{R_2} < (1 - \beta)\min(n_r, n_{t,2})\left(1 - \frac{n_{t,2}}{L^*}\right),\; 0 \leq \beta \leq 1 \right\}   (8.17)

where L^* = \lfloor 1/(2\lambda_D) \rfloor is the largest integer satisfying L^* \leq 1/(2\lambda_D). This follows directly from the pre-log of the point-to-point MIMO fading channel (Theorem 7.1), where the numbers of transmit antennas of users 1 and 2 are n_{t,1} and n_{t,2}, respectively.

Note that the sum of the pre-logs \Pi_{R_1} + \Pi_{R_2} is upper-bounded by the capacity pre-log of the point-to-point MIMO fading channel with (n_{t,1} + n_{t,2}) transmit antennas and n_r receive antennas, since the point-to-point MIMO channel allows for cooperation between the transmitting terminals. While the capacity pre-log of point-to-point MIMO fading channels remains an open problem, the capacity pre-log of point-to-point MISO fading channels under a peak-power constraint is known, cf. (7.21). It thus follows from (7.21) that, for n_r = n_{t,1} = n_{t,2} = 1, we have

  \Pi_{R_1} + \Pi_{R_2} < 1 - 2\lambda_D   (8.18)

which together with the single-user constraints [85]

  \Pi_{R_1} < \Pi_{C_1} = 1 - 2\lambda_D   (8.19)

  \Pi_{R_2} < \Pi_{C_2} = 1 - 2\lambda_D   (8.20)

implies that TDMA achieves the capacity pre-log region of the SISO fading MAC under a peak-power constraint. The next section provides a more detailed comparison between the joint-transmission scheme and TDMA.

8.4 Joint Transmission Versus TDMA

In this section, we discuss how the joint-transmission scheme performs compared to TDMA. To this end, we compare the sum-rate pre-log \Pi_{R^*_{1+2}} of the joint-transmission scheme (Theorem 8.1) with the sum-rate pre-log of the TDMA scheme employing nearest neighbour decoding and pilot-aided channel estimation (Remark 8.2), as well as with the sum-rate pre-log of TDMA when the receiver has knowledge of the realisations of the fading processes \{\mathbb{H}_{s,k}, k \in \mathbb{Z}\}, s = 1, 2. In the latter case, the sum-rate pre-log is given by

  \Pi_{R^*_{1+2}} = \beta \min(n_r, n_{t,1}) + (1 - \beta)\min(n_r, n_{t,2}).   (8.21)

The following corollary presents a sufficient condition on L^* under which the sum-rate pre-log of the joint-transmission scheme is strictly larger than the sum-rate pre-log of the coherent TDMA scheme (8.21), as well as a sufficient condition on L^* under which it is strictly smaller than the sum-rate pre-log of the TDMA scheme given in Remark 8.2. Since (8.21) is an upper bound on the sum-rate pre-log of any TDMA scheme over the MIMO fading MAC (8.1), and since the sum-rate pre-log given in Remark 8.2 is a lower bound on the sum-rate pre-log of the best TDMA scheme, the sufficient conditions presented in Corollary 8.1 also hold for the best TDMA scheme.

Corollary 8.1. Consider the MIMO fading MAC model (8.1). The joint-transmission scheme achieves a larger sum-rate pre-log than any TDMA scheme if

  L^* > \frac{\min(n_r, n_{t,1} + n_{t,2})\,(n_{t,1} + n_{t,2})}{\min(n_r, n_{t,1} + n_{t,2}) - \min\!\left(n_r, \max(n_{t,1}, n_{t,2})\right)}   (8.22)

where we define a/0 \triangleq \infty for every a > 0. Conversely, the best TDMA scheme achieves a larger sum-rate pre-log than the joint-transmission scheme if

  L^* < \frac{\min(n_r, n_{t,1} + n_{t,2})\,(n_{t,1} + n_{t,2})}{\min(n_r, n_{t,1} + n_{t,2}) - \min(n_r, n_{t,1}, n_{t,2})} - \frac{\min(n_{t,1} n_r,\, n_{t,1}^2,\, n_{t,2} n_r,\, n_{t,2}^2)}{\min(n_r, n_{t,1} + n_{t,2}) - \min(n_r, n_{t,1}, n_{t,2})}.   (8.23)
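The two thresholds in Corollary 8.1 are easy to evaluate numerically. The following Python sketch (the function and variable names are ours) returns both right-hand sides, with a/0 interpreted as \infty as in the corollary:

    import math

    def corollary_8_1_thresholds(nr, nt1, nt2):
        # RHS of (8.22) and (8.23); a/0 is interpreted as +inf.
        s = nt1 + nt2
        d22 = min(nr, s) - min(nr, max(nt1, nt2))
        rhs_22 = math.inf if d22 == 0 else min(nr, s) * s / d22
        d23 = min(nr, s) - min(nr, nt1, nt2)
        if d23 == 0:
            rhs_23 = math.inf
        else:
            rhs_23 = (min(nr, s) * s - min(nt1 * nr, nt1**2, nt2 * nr, nt2**2)) / d23
        # joint beats TDMA if L* > rhs_22; TDMA beats joint if L* < rhs_23
        return rhs_22, rhs_23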

Recall that L^* is inversely proportional to the bandwidth of the power spectral density f_H(\cdot), which in turn is inversely proportional to the coherence time of the fading channel. We thus see from Corollary 8.1 that the joint-transmission scheme tends to be superior to TDMA when the coherence time of the channel is large. In contrast, TDMA is superior to the joint-transmission scheme when the coherence time of the channel is small.

Intuitively, this can be explained by observing that, compared to TDMA, the joint-transmission scheme uses the multiple antennas at the transmitters and at the receiver more efficiently, but requires more pilot symbols to estimate the fading coefficients. Thus, when the coherence time is large, the number of pilot symbols required to estimate the fading is small, so the gain in capacity obtained by using the antennas more efficiently dominates the loss incurred by requiring more pilot symbols. On the other hand, when the coherence time is small, the number of pilot symbols required to estimate the fading is large, and the loss in capacity incurred by requiring more pilot symbols dominates the gain obtained by using the antennas more efficiently.

We next evaluate (8.22) and (8.23) for some particular values of n_r, n_{t,1} and n_{t,2}.

8.4.1 Receiver Employs Fewer Antennas Than the Transmitters

Suppose that the number of receive antennas is smaller than the number of transmit antennas of each user, i.e., n_r \leq \min(n_{t,1}, n_{t,2}). Then, the RHSs of (8.22) and (8.23) become \infty, so every finite L^* satisfies (8.23). Thus, if the number of receive antennas is smaller than the number of transmit antennas, then, irrespective of L^*, TDMA is superior to the joint-transmission scheme.

8.4.2 Receiver Employs More Antennas Than the Transmitters

Suppose that the receiver employs more antennas than the transmitters combined, i.e., n_r \geq n_{t,1} + n_{t,2}, and suppose that n_{t,1} = n_{t,2} = n_t. Then, (8.22) and (8.23) become

  L^* > 4n_t   (8.24)

and

  L^* < 3n_t.   (8.25)

Thus, if L^* is greater than 4n_t, then the joint-transmission scheme is superior to TDMA. In contrast, if L^* is smaller than 3n_t, then TDMA is superior. This is illustrated in Figure 8.4 for the case where n_r = 2 and n_{t,1} = n_{t,2} = 1. Note that if L^* is between 3n_t and 4n_t, then the joint-transmission scheme is superior to the TDMA scheme presented in Remark 8.2, but it may be inferior to the best TDMA scheme.
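As a usage illustration (under the same assumptions as before), the corollary_8_1_thresholds helper sketched after Corollary 8.1 reproduces these values:

    # With nr >= nt1 + nt2 and nt1 = nt2 = nt, the thresholds reduce to 4*nt and 3*nt.
    for nt in (1, 2, 4):
        print(nt, corollary_8_1_thresholds(nr=2 * nt, nt1=nt, nt2=nt))  # expect (4*nt, 3*nt)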


[Figure 8.4: Pre-log regions (\Pi_{R_1} versus \Pi_{R_2}) for a fading MAC with n_r = 2 and n_{t,1} = n_{t,2} = 1 for different values of L^*: (a) L^* < 3, (b) L^* > 4. Depicted are the pre-log region of the joint-transmission scheme as given in Theorem 8.1 (dashed line), the pre-log region of the TDMA scheme as given in Remark 8.2 (solid line), and the pre-log region of the coherent TDMA scheme (8.21) (dotted line); the axes are marked at 1 - 1/L^*, 1 - 2/L^* and 1.]

8.4.3 A Case in Between

Suppose that n_r \leq n_{t,1} + n_{t,2} and n_{t,2} < n_r \leq n_{t,1}. Then, (8.22) becomes

  L^* > \infty   (8.26)

and (8.23) becomes

  L^* < n_{t,2} + \frac{n_r n_{t,1}}{n_r - n_{t,2}}.   (8.27)

Thus, in this case the joint-transmission scheme is always inferior to the coherent TDMA scheme (8.21), but it can be superior to the TDMA scheme in Remark 8.2.

Typical Values of L∗

We briefly discuss what values of L^* may occur in practical scenarios. To this end, we first recall that L^* is the largest integer satisfying L^* \leq 1/(2\lambda_D), where \lambda_D is the bandwidth of the power spectral density f_H(\cdot), which in turn can be associated with the Doppler spread of the channel as

  \lambda_D = \frac{f_m}{W_c}.   (8.28)

Here f_m is the maximum Doppler shift, given by

  f_m = \frac{v}{c} f_c   (8.29)

where v is the mobile speed, c = 3\cdot10^8 m/s is the speed of light and f_c is the carrier frequency; and W_c is the coherence bandwidth of the channel, approximated as [1, 100]

  W_c \approx \frac{1}{5\sigma_\tau}   (8.30)

where \sigma_\tau is the delay spread. Following the order-of-magnitude computations of Etkin and Tse [1], we determine typical values of \lambda_D for indoor, urban and hilly-area environments and for carrier frequencies ranging from 800 MHz to 5 GHz and tabulate the results in Table 8.1.

Table 8.1: Typical values of L^* for various environments with f_c ranging from 800 MHz to 5 GHz. The values of \sigma_\tau are taken from [1] for indoor and urban environments and from [2] for hilly-area environments.

  Environment   Delay spread \sigma_\tau   Mobile speed v   \lambda_D \approx 5\sigma_\tau (v/c) f_c   L^*
  Indoor        10 - 100 ns                5 km/h           2\cdot10^{-7} - 10^{-5}                    5\cdot10^4 - 2.5\cdot10^6
  Urban         1 - 2 \mu s                5 km/h           2\cdot10^{-5} - 2\cdot10^{-4}              2.5\cdot10^3 - 2.5\cdot10^4
  Urban         1 - 2 \mu s                75 km/h          2\cdot10^{-4} - 0.004                      125 - 2.5\cdot10^3
  Hilly area    3 - 10 \mu s               200 km/h         0.002 - 0.05                               10 - 250
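The order-of-magnitude computation behind Table 8.1 can be reproduced with a few lines of Python (a sketch; the environment parameters below are the assumed values from the table):

    import math

    def L_star(delay_spread_s, speed_kmh, carrier_hz):
        # L* = floor(1/(2*lam_D)) with lam_D = f_m / W_c, per (8.28)-(8.30)
        c = 3e8                                    # speed of light [m/s]
        f_m = (speed_kmh / 3.6) / c * carrier_hz   # maximum Doppler shift (8.29)
        W_c = 1.0 / (5.0 * delay_spread_s)         # coherence bandwidth (8.30)
        lam_D = f_m / W_c                          # Doppler bandwidth (8.28)
        return math.floor(1.0 / (2.0 * lam_D)), lam_D

    # urban environment, 75 km/h, 2 GHz carrier, 1 microsecond delay spread (assumed)
    print(L_star(1e-6, 75.0, 2e9))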

For indoor environments and mobile speeds of 5 km/h, we thus have that L^* is typically greater than 5\cdot10^4. For urban environments, L^* is typically greater than 2.5\cdot10^3 for mobile speeds of 5 km/h and greater than 125 for mobile speeds of 75 km/h. For hilly-area environments and mobile speeds of 200 km/h, L^* typically ranges from 10 to 250. Thus, for most practical scenarios, L^* is large. It therefore follows that, if n_r \geq n_{t,1} + n_{t,2}, condition (8.22) is satisfied unless n_{t,1} + n_{t,2} is very large. For example, if the receiver employs more antennas than the transmitters, and if n_{t,1} = n_{t,2} = n_t, then L^* > 4n_t is satisfied even for urban environments and mobile speeds of 75 km/h, as long as n_t < 30. Only for hilly-area environments and mobile speeds of 200 km/h may this condition fail for a practical number of transmit antennas. Thus, if the number of antennas at the receiver is sufficiently large, then the joint-transmission scheme is superior to TDMA in most practical scenarios. On the other hand, if n_r \leq \min(n_{t,1}, n_{t,2}), then TDMA is always superior to the joint-transmission scheme, irrespective of how large L^* is. This suggests that one should use more antennas at the receiver than at the transmitters.


8.5 Proof of Theorem 8.1

In contrast to the proof of Theorem 7.1, we note that for the fading MAC it does not suffice to consider only the case n_{t,1} = n_{t,2} = n_r, since the two transmitting terminals do not cooperate. For the proof of Theorem 8.1, we therefore consider a general setup of n_{t,1}, n_{t,2} and n_r.

In order to analyse the achievable pre-log region for the fading MAC, we first provide the following extension of Lemma 7.2 to the fading MAC.

Corollary 8.2. Let \mathbf{E}^{(T)}_{s,k}, s = 1, 2 be the estimation-error matrix in estimating \mathbb{H}_{s,k}, i.e.,

  \mathbf{E}^{(T)}_{s,k} = \mathbb{H}_{s,k} - \hat{\mathbb{H}}^{(T)}_{s,k}   (8.31)

with entries E^{(T)}_{s,k}(r, t), r = 1, \ldots, n_r, t = 1, \ldots, n_{t,s}. Define

  F(SNR) \triangleq n_r + \frac{SNR}{L - n_{t,1} - n_{t,2}} \sum_{\ell=n_{t,1}+n_{t,2}}^{L-1} \mathbb{E}\!\left[ \|\mathbf{E}^{(T)}_{1,\ell}\|^2_F + \|\mathbf{E}^{(T)}_{2,\ell}\|^2_F \right],   (8.32)

  \bar{n} \triangleq n + \left(\frac{n}{L - n_{t,1} - n_{t,2}} + 1\right)(n_{t,1} + n_{t,2}) - 1,   (8.33)

  \mathcal{T}_\delta \triangleq \left\{ (\mathbf{x}_{s,k}, \mathbf{y}_k, \hat{\mathbb{H}}^{(T)}_{s,k}),\; k = 0, \ldots, \bar{n},\; s = 1, 2 : \left| \frac{1}{n} \sum_{k \in \mathcal{D}} \left\| \mathbf{y}_k - \sqrt{SNR}\, \hat{\mathbb{H}}^{(T)}_{1,k}\mathbf{x}_{1,k} - \sqrt{SNR}\, \hat{\mathbb{H}}^{(T)}_{2,k}\mathbf{x}_{2,k} \right\|^2 - F(SNR) \right| < \delta \right\}   (8.34)

for some \delta > 0. It holds that

  \lim_{n\to\infty} \Pr\!\left\{ \left(X^{\bar{n}}_{s,0}, Y^{\bar{n}}_0, \hat{\mathbb{H}}^{(T),\bar{n}}_{s,0},\; s = 1, 2\right) \in \mathcal{T}_\delta \right\} = 1,\quad \forall \delta > 0.   (8.35)

Proof. The proof follows from the proof of Lemma 7.2 by treating the channel as a MIMO channel with channel matrix (\mathbb{H}_{1,k}, \mathbb{H}_{2,k}) and channel-estimate matrix (\hat{\mathbb{H}}^{(T)}_{1,k}, \hat{\mathbb{H}}^{(T)}_{2,k}), and by considering the transmission of codewords that are drawn i.i.d. from \mathcal{N}_{n_{t,1}+n_{t,2}}(\mathbf{0}, \mathbf{I}_{n_{t,1}+n_{t,2}}).

Let \bar{P}_e and \bar{P}_e(m_1, m_2) be the ensemble-average error probability and the ensemble-average error probability corresponding to the messages m_1 and m_2 being transmitted, respectively. Since the codebook construction is symmetric, it suffices to study the conditional probability of error, conditioned on the event that the messages (m_1, m_2) = (1, 1) were transmitted. Let \mathcal{E}(m'_1, m'_2) denote the event that D(m'_1, m'_2) \leq D(1, 1). The ensemble-average error probability can be upper-bounded as

  \bar{P}_e(1, 1) = \Pr\!\left\{ \bigcup_{(m'_1, m'_2) \neq (1,1)} \mathcal{E}(m'_1, m'_2) \right\}   (8.36)

  \leq \Pr\!\left\{ \bigcup_{m'_1 \neq 1} \mathcal{E}(m'_1, 1) \right\} + \Pr\!\left\{ \bigcup_{m'_2 \neq 1} \mathcal{E}(1, m'_2) \right\} + \Pr\!\left\{ \bigcup_{m'_1 \neq 1} \bigcup_{m'_2 \neq 1} \mathcal{E}(m'_1, m'_2) \right\}.   (8.37)

With i.i.d. codebooks, we then have three maximum achievable rates, I^{\mathrm{gmi}}_{1,T}(SNR), I^{\mathrm{gmi}}_{2,T}(SNR) and I^{\mathrm{gmi}}_{1+2,T}(SNR), corresponding to the error events (m'_1 \neq 1, m'_2 = 1), (m'_1 = 1, m'_2 \neq 1) and (m'_1 \neq 1, m'_2 \neq 1), respectively.

I^{\mathrm{gmi}}_{1,T}(SNR) – Error Event (m'_1 \neq 1, m'_2 = 1)

Following the same steps used to derive (7.93), we can upper-bound the ensemble-average error probability of the error event \bigcup_{m'_1 \neq 1} \mathcal{E}(m'_1, 1) using \mathcal{T}_\delta and its complement \mathcal{T}^c_\delta as

  \Pr\!\left\{ \bigcup_{m'_1 \neq 1} \mathcal{E}(m'_1, 1) \right\} \leq \Pr\!\left\{ \bigcup_{m'_1 \neq 1} \mathcal{E}(m'_1, 1) \,\middle|\, \left(X^{\bar{n}}_{s,0}(1), Y^{\bar{n}}_0, \hat{\mathbb{H}}^{(T),\bar{n}}_{s,0}, s = 1, 2\right) \in \mathcal{T}_\delta \right\} + \Pr\!\left\{ \left(X^{\bar{n}}_{s,0}(1), Y^{\bar{n}}_0, \hat{\mathbb{H}}^{(T),\bar{n}}_{s,0}, s = 1, 2\right) \in \mathcal{T}^c_\delta \right\}   (8.38)

  \leq e^{nR_1} \cdot \Pr\!\left\{ \frac{1}{n} D(m'_1, 1) < F(SNR) + \delta \,\middle|\, \left(X^{\bar{n}}_{s,0}(1), Y^{\bar{n}}_0, \hat{\mathbb{H}}^{(T),\bar{n}}_{s,0}, s = 1, 2\right) \in \mathcal{T}_\delta \right\} + \Pr\!\left\{ \left(X^{\bar{n}}_{s,0}(1), Y^{\bar{n}}_0, \hat{\mathbb{H}}^{(T),\bar{n}}_{s,0}, s = 1, 2\right) \in \mathcal{T}^c_\delta \right\},\quad m'_1 \neq 1   (8.39)

where in the last inequality we have used the union bound and the fact that

  \frac{1}{n} D(1, 1) < F(SNR) + \delta \quad \text{for } \left(X^{\bar{n}}_{s,0}(1), Y^{\bar{n}}_0, \hat{\mathbb{H}}^{(T),\bar{n}}_{s,0}, s = 1, 2\right) \in \mathcal{T}_\delta.   (8.40)


Note that, due to Corollary 8.2,

  \Pr\!\left\{ \left(X^{\bar{n}}_{s,0}(1), Y^{\bar{n}}_0, \hat{\mathbb{H}}^{(T),\bar{n}}_{s,0}, s = 1, 2\right) \in \mathcal{T}^c_\delta \right\}   (8.41)

can be made arbitrarily small by letting the codeword length n tend to infinity.

The computation of the GMI corresponding to the event \mathcal{E}(m'_1, 1), m'_1 \neq 1 requires the log moment-generating function of the metric D(m'_1, 1) associated with an incorrect message m'_1 \neq 1, conditioned on the channel outputs, on the message m_2 = 1 of user 2 and on the fading estimates, i.e.,

  \kappa_{1,n}(\theta, SNR) = \log \mathbb{E}\!\left[ \exp\!\left( \frac{\theta}{n} \sum_{k \in \mathcal{D}} D_k(m'_1, 1) \right) \,\middle|\, \left\{ (\mathbf{y}_k, \mathbf{x}_{2,k}(1), \hat{\mathbb{H}}^{(T)}_{1,k}, \hat{\mathbb{H}}^{(T)}_{2,k}),\; k \in \mathcal{D} \right\} \right]   (8.42)

where

  D_k(m'_1, 1) = \left\| \mathbf{y}_k - \sqrt{SNR}\, \hat{\mathbb{H}}^{(T)}_{1,k}\mathbf{x}_{1,k}(m'_1) - \sqrt{SNR}\, \hat{\mathbb{H}}^{(T)}_{2,k}\mathbf{x}_{2,k}(1) \right\|^2.   (8.43)

Following the steps used in Section 7.4.1, it is not difficult to show that

  \lim_{n\to\infty} \frac{1}{n} \kappa_{1,n}(n\theta, SNR) = \frac{1}{L - n_{t,1} - n_{t,2}} \sum_{\ell=n_{t,1}+n_{t,2}}^{L-1} \left( g_{1,\ell} - \mathbb{E}\!\left[ \log\det\!\left( \mathbf{I}_{n_r} - \theta\, SNR\, \hat{\mathbb{H}}^{(T)}_{1,\ell} \hat{\mathbb{H}}^{\dagger(T)}_{1,\ell} \right) \right] \right),\quad \text{almost surely}   (8.44)

where

  g_{1,\ell} \triangleq \mathbb{E}\!\left[ \theta \left( Y_\ell - \sqrt{SNR}\, \hat{\mathbb{H}}^{(T)}_{2,\ell} X_{2,\ell} \right)^{\dagger} \left( \mathbf{I}_{n_r} - \theta\, SNR\, \hat{\mathbb{H}}^{(T)}_{1,\ell} \hat{\mathbb{H}}^{\dagger(T)}_{1,\ell} \right)^{-1} \left( Y_\ell - \sqrt{SNR}\, \hat{\mathbb{H}}^{(T)}_{2,\ell} X_{2,\ell} \right) \right].   (8.45)

Furthermore, following the derivation in [39, 54], we can bound the ensemble-average error probability of the event \bigcup_{m'_1 \neq 1} \mathcal{E}(m'_1, 1) for any \delta' > 0 as

  \Pr\!\left\{ \bigcup_{m'_1 \neq 1} \mathcal{E}(m'_1, 1) \right\} \leq \exp(nR_1) \exp\!\left( -n\left( I^{\mathrm{gmi}}_{1,T}(SNR) - \delta' \right) \right) + \varepsilon_1(\delta', n)   (8.46)

for some function \varepsilon_1(\delta', n) satisfying

  \lim_{n\to\infty} \varepsilon_1(\delta', n) = 0.   (8.47)

Here I^{\mathrm{gmi}}_{1,T}(SNR) is the GMI corresponding to the event \mathcal{E}(m'_1, 1), m'_1 \neq 1, for a fixed T:

  I^{\mathrm{gmi}}_{1,T}(SNR) = \frac{L - n_{t,1} - n_{t,2}}{L} \left( \sup_{\theta < 0} \left( \theta F(SNR) - \kappa_1(\theta, SNR) \right) \right)   (8.48)

where \kappa_1(\theta, SNR) is given by the RHS of (8.44), i.e.,

  \kappa_1(\theta, SNR) = \frac{1}{L - n_{t,1} - n_{t,2}} \sum_{\ell=n_{t,1}+n_{t,2}}^{L-1} \left( g_{1,\ell} - \mathbb{E}\!\left[ \log\det\!\left( \mathbf{I}_{n_r} - \theta\, SNR\, \hat{\mathbb{H}}^{(T)}_{1,\ell} \hat{\mathbb{H}}^{\dagger(T)}_{1,\ell} \right) \right] \right).   (8.49)

By noting that g_{1,\ell} \leq 0 for \theta \leq 0 (which can be shown using the technique developed in Appendix A.3), combining (8.32) and (8.49) with (8.48), and substituting the choice^{8.5}

  \theta = \frac{-1}{n_r + n_r (n_{t,1} + n_{t,2})\, SNR\, \epsilon^2_{*,T}}   (8.50)

where

  \epsilon^2_{*,T} = \max_{\substack{s=1,2;\; r=1,\ldots,n_r;\; t=1,\ldots,n_{t,s};\\ \ell=n_{t,1}+n_{t,2},\ldots,L-1}} \mathbb{E}\!\left[ \left| E^{(T)}_{s,\ell}(r, t) \right|^2 \right],   (8.51)

we obtain the GMI lower bound

  I^{\mathrm{gmi}}_{1,T}(SNR) \geq \frac{1}{L} \sum_{\ell=n_{t,1}+n_{t,2}}^{L-1} \left( \mathbb{E}\!\left[ \log\det\!\left( \mathbf{I}_{n_r} + \frac{SNR\, \hat{\mathbb{H}}^{(T)}_{1,\ell} \hat{\mathbb{H}}^{\dagger(T)}_{1,\ell}}{n_r + n_r (n_{t,1} + n_{t,2})\, SNR\, \epsilon^2_{*,T}} \right) \right] - 1 \right).   (8.52)

We continue by analysing the RHS of (8.52) in the limit as the observation window T of the channel estimator tends to infinity. To this end, following the derivation in Section 7.4.1, we note that, for L \leq 1/(2\lambda_D), the variance of the interpolation error \mathbb{E}[|E^{(T)}_{s,\ell}(r, t)|^2] tends to (7.15) (with SNR in (7.15) replaced by n_t SNR):

  \lim_{T\to\infty} \mathbb{E}\!\left[ \left| E^{(T)}_{s,\ell}(r, t) \right|^2 \right] = \epsilon^2 = 1 - \int_{-1/2}^{1/2} \frac{SNR\, (f_H(\lambda))^2}{SNR\, f_H(\lambda) + L}\, d\lambda   (8.53)

8.5 As pointed out in Section 7.4.1, this choice of \theta yields a good lower bound at high SNR.


irrespective of s, \ell, r, t. Hence, irrespective of \ell, the estimate \hat{\mathbb{H}}^{(T)}_{1,\ell} tends to \hat{\mathbb{H}} in distribution as T tends to infinity, so

  \frac{\hat{\mathbb{H}}^{(T)}_{1,\ell} \hat{\mathbb{H}}^{\dagger(T)}_{1,\ell}}{n_r + n_r (n_{t,1} + n_{t,2})\, SNR\, \epsilon^2_{*,T}} \;\xrightarrow{d}\; \frac{\hat{\mathbb{H}} \hat{\mathbb{H}}^{\dagger}}{n_r + n_r (n_{t,1} + n_{t,2})\, SNR\, \epsilon^2}   (8.54)

where the entries of \hat{\mathbb{H}} are i.i.d., circularly-symmetric, complex-Gaussian random variables with zero mean and variance 1 - \epsilon^2. Since the function A \mapsto \det(\mathbf{I} + A) is continuous and bounded from below, we obtain from Portmanteau's lemma [99] that

  I^{\mathrm{gmi}}_{1}(SNR) = \lim_{T\to\infty} I^{\mathrm{gmi}}_{1,T}(SNR)   (8.55)

  \geq \frac{L - n_{t,1} - n_{t,2}}{L} \left( \mathbb{E}\!\left[ \log\det\!\left( \mathbf{I}_{n_r} + \frac{SNR\, \hat{\mathbb{H}} \hat{\mathbb{H}}^{\dagger}}{n_r + n_r (n_{t,1} + n_{t,2})\, SNR\, \epsilon^2} \right) \right] - 1 \right)   (8.56)

  \geq \frac{L - n_{t,1} - n_{t,2}}{L} \min(n_r, n_{t,1}) \left[ \log SNR - \log\!\left( n_r + n_r (n_{t,1} + n_{t,2})\, SNR\, \epsilon^2 \right) \right] + \frac{L - n_{t,1} - n_{t,2}}{L} \Psi   (8.57)

where

  \Psi \triangleq \begin{cases} \mathbb{E}\!\left[ \log\det \hat{\mathbb{H}} \hat{\mathbb{H}}^{\dagger} \right] - 1, & n_r \leq n_{t,1} \\ \mathbb{E}\!\left[ \log\det \hat{\mathbb{H}}^{\dagger} \hat{\mathbb{H}} \right] - 1, & n_r > n_{t,1}. \end{cases}   (8.58)

Here the last inequality follows by lower-bounding \log\det(\mathbf{I} + A) \geq \log\det A. Following the pre-log evaluation in Section 7.4.1, we obtain a lower bound on the maximum achievable pre-log:

  \Pi_{R^*_1} \triangleq \lim_{SNR\to\infty} \frac{I^{\mathrm{gmi}}_{1}(SNR)}{\log SNR} \geq \min(n_r, n_{t,1}) \left( 1 - \frac{n_{t,1} + n_{t,2}}{L} \right),\quad L \leq \frac{1}{2\lambda_D}.   (8.59)

The condition L \leq 1/(2\lambda_D) is necessary since otherwise (7.15) would not hold. This yields one boundary of the pre-log region presented in Theorem 8.1.
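As an illustration of the bound (8.56) and of the slope in (8.59) (not part of the proof), the following Monte Carlo sketch assumes an error variance that decays as c/SNR, in the spirit of (8.53) at high SNR, together with example antenna numbers, and estimates the normalised log-det term:

    import numpy as np

    rng = np.random.default_rng(1)
    nr, nt1, nt2, L, trials = 2, 1, 1, 40, 2000   # assumed example configuration

    def logdet_term(snr, eps2):
        # Monte Carlo estimate of E[log det(I + SNR*H*H^dag / (nr + nr*(nt1+nt2)*SNR*eps2))]
        acc = 0.0
        for _ in range(trials):
            H = (rng.standard_normal((nr, nt1)) + 1j * rng.standard_normal((nr, nt1)))
            H *= np.sqrt((1.0 - eps2) / 2.0)       # entries with variance 1 - eps2
            G = np.eye(nr) + snr * (H @ H.conj().T) / (nr + nr * (nt1 + nt2) * snr * eps2)
            acc += np.log(np.abs(np.linalg.det(G)))
        return acc / trials

    for snr_db in (20.0, 60.0, 100.0):
        snr = 10.0 ** (snr_db / 10.0)
        eps2 = 3.0 / snr                           # assumed: error variance ~ c/SNR
        val = (L - nt1 - nt2) / L * (logdet_term(snr, eps2) - 1.0)
        # the ratio slowly approaches min(nr, nt1)*(1 - (nt1+nt2)/L) as the SNR grows
        print(snr_db, val / np.log(snr))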


I^{\mathrm{gmi}}_{2}(SNR) – Error Event (m'_1 = 1, m'_2 \neq 1)

This case follows from the proof for the error event (m'_1 \neq 1, m'_2 = 1) by swapping user 1 with user 2. We thus have

  \Pi_{R^*_2} \geq \min(n_r, n_{t,2}) \left( 1 - \frac{n_{t,1} + n_{t,2}}{L} \right),\quad L \leq \frac{1}{2\lambda_D}   (8.60)

yielding the second boundary of the pre-log region presented in Theorem 8.1.

I^{\mathrm{gmi}}_{1+2}(SNR) – Error Event (m'_1 \neq 1, m'_2 \neq 1)

The computation of the GMI corresponding to the event \mathcal{E}(m'_1, m'_2), (m'_1 \neq 1, m'_2 \neq 1) requires the log moment-generating function of the metric D(m'_1, m'_2) associated with incorrect messages m'_1 \neq 1 and m'_2 \neq 1, conditioned on the channel outputs and the fading estimates, i.e.,

  \kappa_{1+2,n}(\theta, SNR) = \log \mathbb{E}\!\left[ \exp\!\left( \frac{\theta}{n} \sum_{k \in \mathcal{D}} D_k(m'_1, m'_2) \right) \,\middle|\, \left\{ (\mathbf{y}_k, \hat{\mathbb{H}}^{(T)}_{1,k}, \hat{\mathbb{H}}^{(T)}_{2,k}),\; k \in \mathcal{D} \right\} \right]   (8.61)

where

  D_k(m'_1, m'_2) = \left\| \mathbf{y}_k - \sqrt{SNR}\, \hat{\mathbb{H}}^{(T)}_{1,k}\mathbf{x}_{1,k}(m'_1) - \sqrt{SNR}\, \hat{\mathbb{H}}^{(T)}_{2,k}\mathbf{x}_{2,k}(m'_2) \right\|^2.   (8.62)

where

Dk(m′1, m

′2) =

∥yk −

√SNR H

(T )1,kx1,k(m

′1)−

√SNR H

(T )2,kx2,k(m

′2)∥

2

. (8.62)

As a consequence of the ergodicity condition in Part 3) of Lemma 7.1, we can

show for all θ < 0 that

limn→∞

1

n· κ1+2,n(nθ, SNR)

=1

L− nt,1 − nt,2

L−1∑

ℓ=nt,1+nt,2

[

θY †ℓ

(

Inr − θSNR(

H(T )1,ℓ H

†(T )1,ℓ + H

(T )2,ℓ H

†(T )2,ℓ

))−1

Yℓ

]

− 1

L− nt,1 − nt,2

L−1∑

ℓ=nt,1+nt,2

[

log det

(

Inr − θSNR

(

H(T )1,ℓ H

†(T )1,ℓ + H

(T )2,ℓ H

†(T )2,ℓ

)

)]

,

almost surely. (8.63)

As above, the GMI corresponding to the event \mathcal{E}(m'_1, m'_2), (m'_1 \neq 1, m'_2 \neq 1) can be evaluated as [39, 54]

  I^{\mathrm{gmi}}_{1+2,T}(SNR) = \frac{L - n_{t,1} - n_{t,2}}{L} \left( \sup_{\theta < 0} \left( \theta F(SNR) - \kappa_{1+2}(\theta, SNR) \right) \right)   (8.64)


where F(SNR) is given in (8.32), and where \kappa_{1+2}(\theta, SNR) is given by the RHS of (8.63), i.e.,

  \kappa_{1+2}(\theta, SNR) = \frac{1}{L - n_{t,1} - n_{t,2}} \sum_{\ell=n_{t,1}+n_{t,2}}^{L-1} \mathbb{E}\!\left[ \theta Y_\ell^{\dagger} \left( \mathbf{I}_{n_r} - \theta\, SNR \left( \hat{\mathbb{H}}^{(T)}_{1,\ell} \hat{\mathbb{H}}^{\dagger(T)}_{1,\ell} + \hat{\mathbb{H}}^{(T)}_{2,\ell} \hat{\mathbb{H}}^{\dagger(T)}_{2,\ell} \right) \right)^{-1} Y_\ell \right]
  \qquad - \frac{1}{L - n_{t,1} - n_{t,2}} \sum_{\ell=n_{t,1}+n_{t,2}}^{L-1} \mathbb{E}\!\left[ \log\det\!\left( \mathbf{I}_{n_r} - \theta\, SNR \left( \hat{\mathbb{H}}^{(T)}_{1,\ell} \hat{\mathbb{H}}^{\dagger(T)}_{1,\ell} + \hat{\mathbb{H}}^{(T)}_{2,\ell} \hat{\mathbb{H}}^{\dagger(T)}_{2,\ell} \right) \right) \right].   (8.65)

The sum-rate GMI I^{\mathrm{gmi}}_{1+2,T}(SNR) can thus be viewed as the GMI of an n_r \times (n_{t,1} + n_{t,2})-dimensional MIMO channel with channel matrix (\mathbb{H}_{1,k}, \mathbb{H}_{2,k}). Noting that the channel estimator produces the channel-estimate matrix (\hat{\mathbb{H}}^{(T)}_{1,k}, \hat{\mathbb{H}}^{(T)}_{2,k}), it thus follows from Section 7.4.1 that the pre-log

  \Pi_{R^*_{1+2}} \triangleq \lim_{SNR\to\infty} \frac{R^*_{1+2}(SNR)}{\log SNR}   (8.66)

is lower-bounded by

  \Pi_{R^*_{1+2}} \geq \min(n_r, n_{t,1} + n_{t,2}) \left( 1 - \frac{n_{t,1} + n_{t,2}}{L} \right)   (8.67)

for L \leq 1/(2\lambda_D). This yields the third boundary of the pre-log region presented in Theorem 8.1.

Combining (8.59), (8.60) and (8.67), and noting that the boundary is maximised by choosing L to be the largest integer satisfying L \leq 1/(2\lambda_D), proves Theorem 8.1.

8.6 Conclusion

We have considered a two-user MIMO fading MAC and proposed a joint-transmission scheme using nearest neighbour decoding and pilot-aided channel estimation. We have analysed the achievable-rate region and have derived the corresponding pre-log region. We have shown that the achievable pre-log region is the largest pre-log region for any scheme that transmits as many pilot vectors per training period as the total number of transmit antennas of both users.

We have compared the joint-transmission scheme with TDMA and have derived sufficient conditions under which the joint-transmission scheme is better than TDMA and under which TDMA is better than the joint-transmission scheme. We have shown that TDMA is optimal for the SISO fading MAC under a peak-power constraint in the sense that it achieves the capacity pre-log region of the SISO fading MAC. The joint-transmission scheme is typically better for channels with large coherence time when the receiver employs more antennas than the sum of the transmit antennas. Large coherence times occur when the mobiles move at small to moderate speeds (up to 200 km/h) and the delay spread is not very large (typical for indoor and urban environments). Moreover, employing many antennas at the receiver is feasible for uplink transmission, where the number of transmit antennas at each user is limited by the size of the device, whereas the base station has more space to employ additional antennas. This suggests the potential of the joint-transmission scheme for uplink transmission in indoor and urban environments. In other environments, one may consider TDMA for multiple-access transmission.


Chapter 9

Summary and Future Research

In this chapter, we summarise the main contributions of this dissertation and identify several potential areas for future research.

9.1 Main Contributions

This dissertation has focused on nearest neighbour decoding for fading channels when the receiver does not have access to perfect CSI.

9.1.1 Part I

Part I studied nearest neighbour decoding in block-fading channels. The block-fading channel is non-ergodic and its fundamental limit (assuming perfect CSIR) is given by the outage probability, the probability that the channel is unable to support the data rate. This fundamental limit can be achieved using nearest neighbour decoding (which is optimal for perfect CSIR) and a good code design.

In practice, the CSIR is imperfect. Using mismatched-decoding approaches, we have formulated a technique to study imperfect CSIR and its effect on nearest neighbour decoding. We have introduced the generalised outage probability as a novel and efficient tool for evaluating the reliability of transmission over a block-fading channel. Moreover, using error exponents for mismatched decoders, we have proved the achievability of the generalised outage probability. Using the GMI converse, we have further shown that the generalised outage probability is the fundamental limit for i.i.d. codebooks.

Studying the outage and the generalised outage diversities, which characterise the high-SNR behaviour of the outage and the generalised outage probabilities, respectively, reveals important system design criteria. We have shown that, for both Gaussian and discrete inputs, the generalised outage diversity is given by the perfect-CSIR outage diversity times the minimum of the channel estimation error diversity—which measures the decay of the channel estimation error variance as a function of the SNR—and one. Therefore, in order to achieve the highest possible diversity, the channel estimator should be designed so as to make the estimation error diversity equal to or larger than one. We have further determined a threshold on the required block length of random codes in order to achieve the generalised outage diversity. The results obtained apply to a general fading model subsuming Rayleigh, Rician, Nakagami-m, Nakagami-q and Weibull distributions as well as optical wireless channels with lognormal-Rice and gamma-gamma scintillations.

To improve the generalised outage diversity, we have considered IR-ARQ, an adaptive transmission scheme based on binary feedback. Considering both uniform and optimal power allocation, we have expressed the resulting ARQ diversity as a function of the quality of the feedback, the quality of the channel estimation and the maximum number of ARQ rounds. We have derived the condition under which IR-ARQ can provide a significant gain with respect to non-adaptive transmission. We have also determined the condition under which power-controlled ARQ is superior to uniform-power ARQ. In order to utilise the full benefits offered by power-controlled ARQ, we have shown that the quality of the channel estimation has to improve with the number of rounds. Our results highlight the importance of accounting for channel imperfections in system design and provide guidelines on designing a good channel estimator and feedback signalling for IR-ARQ schemes.

Power can be adapted if CSIT is available. Depending on the number of imperfect CSIT blocks available prior to transmission, we consider full-, causal- and predictive-CSIT power allocation. For full CSIT, we have characterised the generalised outage diversity and have expressed it as a function of the qualities of CSIR and CSIT and the perfect-CSIR outage diversity with uniform power allocation. For causal and predictive CSIT, the generalised outage diversity depends not only on the qualities of CSIR and CSIT but also on the CSIT delay or the CSIT prediction parameter. Our results suggest that imperfect CSIR has more detrimental effects on the generalised outage diversity than imperfect CSIT. Hence, obtaining reliable CSIR is more important than obtaining reliable CSIT. The diversity characterisation allows one to determine the condition under which power adaptation is not beneficial with respect to uniform power allocation. The results shed new light on the design of pilot-assisted channel estimation in block-fading channels.

9.1.2 Part II

Part II studied nearest neighbour decoding in stationary ergodic noncoherent fading channels. Reliable transmission over these channels is possible at rates below the channel capacity. Due to the absence of CSI, the capacity of the noncoherent fading channel is smaller than the capacity of the coherent fading channel.

We have proposed a scheme for point-to-point MIMO fading channels that estimates the fading with regular training via orthogonal pilots and feeds the fading estimates to the nearest neighbour decoder. Assuming a bandlimited psd of the fading process, we studied a set of information rates achievable with this scheme in the limit as the SNR tends to infinity. Our results reveal that, in order to obtain reliable fading estimates, the portion of time required for pilot transmission cannot be less than the number of transmit antennas times twice the bandwidth of the fading psd. Using reliable estimates of the fading, the nearest neighbour decoder can achieve a positive pre-log, which is given by the capacity pre-log of the coherent fading channel times the fraction of time used for the transmission of data. Hence, the loss with respect to the coherent case is solely due to the transmission of orthogonal pilots used to obtain accurate fading estimates. Furthermore, if the inverse of twice the bandwidth of the fading psd is an integer, then for MISO channels, our scheme achieves the capacity pre-log of the noncoherent fading channel derived by Koch and Lapidoth [91]. For noncoherent MIMO channels, our scheme achieves the best lower bound known to date on the capacity pre-log, obtained by Etkin and Tse [1].

We have extended our analysis of the point-to-point MIMO channel to the two-user fading MAC and have proposed a joint-transmission scheme for both transmitting terminals. We have analysed the rate region achievable with nearest neighbour decoding and pilot-aided channel estimation and have determined the corresponding pre-log region. We have compared the joint-transmission scheme with TDMA. We have shown that the joint-transmission scheme is typically better than TDMA if the number of receive antennas is larger than the sum of the transmit antennas and if the bandwidth of the fading psd is small. This shows the potential of the joint-transmission scheme in uplink transmission, where the base station can employ more antennas than the sum of all antennas of the mobile devices. If the number of receive antennas is smaller than the sum of all transmit antennas, then TDMA may be superior to joint transmission. Indeed, for the SISO fading MAC, TDMA is optimal in the sense that it achieves the capacity pre-log region.

9.2 Areas for Future Research

In Part I, we have shown that the generalised outage probability is the fundamental limit for i.i.d. codebooks. Lifting the restriction to i.i.d. codebooks may yield better error performance, as shown in [13, 36, 38]. Finding the fundamental limits for general codes involves the discovery of a general tight converse bound, which is still an open problem and can be considered for further research.

Our achievability results in Part I have been derived using random coding. Random coding is not of practical interest and only implies that, given reliable fading estimation, there exists a good code achieving the optimal code diversity. Using our criteria for reliable fading estimation, one can investigate the performance of existing structured codes under imperfect CSIR and compare the high-SNR slope of the error probability with the outage diversity.

We have highlighted in Part I that practical system design for block-fading channels has to account for all possible imperfections in the system and the channel. For example, power control with imperfect ARQ feedback and imperfect CSI depends on how noisy the feedback and the imperfect CSI are. In this dissertation, we have not considered the power control algorithm that achieves the optimal diversity; this is a problem that could be addressed in future research.

So far, we have only considered independent block-fading channels. This model can be too simple to capture practical scenarios. A more general correlated block-fading model can be considered for future research. This is highly relevant for predictive-CSIT power allocation, where fading realisations in the past can be used to predict future values. Based on the fading from the past up to the number of predicted blocks, the transmitter allocates the power for the current codeword transmission. This research area will involve theories from stochastic prediction.

In Part II, we have considered orthogonal pilots to estimate the fading and observed that the main loss in the pre-log is due to the portion of time required to transmit pilots. We note that orthogonal pilots require the number of pilot vectors to be equal to the number of transmit antennas. Thus, in order to reduce the portion of time spent on pilot transmission, one may consider designing non-orthogonal pilots. It is prima facie not clear whether this will still yield a fading estimation error whose variance vanishes with increasing SNR, which is critical to achieve a positive pre-log.

In Part II, we have only dealt with bandlimited fading processes and the rate pre-log. If the fading process is not bandlimited, then the rate pre-log is zero and may not be a useful metric. References [85, 101] have shown that for non-bandlimited fading processes the capacity may grow, inter alia, double-logarithmically with the SNR, and the fading number has been introduced as a performance metric to characterise the dominating term in the high-SNR regime. Future research may consider developing a scheme for non-bandlimited fading channels using the fading number as the performance metric.

In both Parts I and II, we have mostly resorted to asymptotic analyses in the limit of large block length. We have only considered finite block length in the random-coding achievability. A broader area for future research may encompass theoretical analysis at finite block length. In our opinion, this is the most exciting research area to be studied. The inherent difficulty of finite block-length analysis is that many convergence results in the literature, such as the law of large numbers, the ergodic theorem and large-deviation techniques, cannot be applied. One therefore needs to resort to new techniques such as information-density methods [102]. Recent progress in this area can be found in [103–105]. Results on noncoherent fading channels are still limited, and many topics, including mismatched decoding in the finite block-length regime, can be further explored.


Appendix A

A.1 Proof of Lemma 3.1 (Discrete Inputs)

We first bound the pdf of the fading (3.4) as^{1.1}

  w_0 |h|^{\tau} e^{-w_1(|h| + |w_2|)^{\varphi}} \leq w_0 |h|^{\tau} e^{-w_1 |h - w_2|^{\varphi}} \leq w_0 |h|^{\tau} e^{-w_1 \left| |h| - |w_2| \right|^{\varphi}}.   (A.1)

1.1 For any \varphi \geq 1, applying the triangle and reverse-triangle inequalities yields |h - w_2|^{\varphi} \leq (|h| + |w_2|)^{\varphi} and |h - w_2| \geq \left| |h| - |w_2| \right| \geq 0.

Let h = |h| e^{\imath \phi_h} and \alpha = -\log |h|^2 / \log SNR. The lower bound on the joint pdf of A_{b,r,t} and \Phi^H_{b,r,t} is given by

  P_{A_{b,r,t}, \Phi^H_{b,r,t}}(\alpha, \phi) \geq \frac{w_0}{2} \log SNR \cdot SNR^{-(1 + \frac{\tau}{2})\alpha} \cdot e^{-w_1 (SNR^{-\alpha/2} + |w_2|)^{\varphi}}.   (A.2)

For \alpha < 0, we can see from the exponential term that the above lower bound decays exponentially with the SNR; for \alpha \geq 0, the exponential term converges to a constant as SNR \uparrow \infty. As for the upper bound on the joint pdf, everything remains unchanged except for the exponential term, which we can write as

  e^{-w_1 \left| |h| - |w_2| \right|^{\varphi}} = e^{-w_1 \left| SNR^{-\alpha/2} - |w_2| \right|^{\varphi}}.   (A.3)

If \alpha < 0, then the above term decays exponentially with the SNR. On the other hand, if \alpha \geq 0, then the above term converges to a constant as SNR \uparrow \infty. We therefore have that both the upper and the lower bound behave similarly at high SNR. Let \mathcal{O}_X be the asymptotic perfect-CSIR outage set for the discrete constellation X, which has been characterised in [32]. We then have the outage probability for n_r \times n_t MIMO channels with B fading blocks

  P^X_{\mathrm{out}}(R) \doteq SNR^{-d^X_{\mathrm{csir}}}   (A.4)

  \doteq \int_{\mathcal{O}_X \cap \{A \succeq 0\}} SNR^{-(1 + \frac{\tau}{2}) \sum_{b=1}^{B} \sum_{r=1}^{n_r} \sum_{t=1}^{n_t} \alpha_{b,r,t}}\, dA\, d\Phi^H.   (A.5)

Applying Varadhan's lemma [106] yields

  d^X_{\mathrm{csir}} = \inf_{\mathcal{O}_X \cap \{A \succeq 0\}} \left( 1 + \frac{\tau}{2} \right) \sum_{b=1}^{B} \sum_{r=1}^{n_r} \sum_{t=1}^{n_t} \alpha_{b,r,t}   (A.6)

which is exactly the Rayleigh-fading result of [32] multiplied by (1 + \frac{\tau}{2}).

A.2 Proof of Theorem 3.1 (Discrete Inputs)

We first state the following lemma, which applies to both i.i.d. Gaussian and discrete inputs.

Lemma A.1. Consider the MIMO block-fading channel (3.3) with mismatched CSIR (3.5). Denote the high-SNR generalised outage set by \mathcal{O}, which is expressed in terms of the normalised fading matrix A, the fading phase matrix \Phi^H, the normalised error matrix \Theta and the error phase matrix \Phi^E. Then, the generalised outage probability satisfies

  P_{\mathrm{gout}}(R) \doteq SNR^{-d^{\mathrm{i}}_{\mathrm{csir}}}   (A.7)

  \doteq \int_{\mathcal{O}} P_{A, \Phi^H}(A, \Phi^H)\, P_{\Theta}(\Theta)\, P_{\Phi^E}(\Phi^E)\, dA\, d\Theta\, d\Phi^H\, d\Phi^E   (A.8)

  \doteq \int_{\mathcal{O}} P_{A, \Phi^H}(A, \Phi^H)\, P_{\Theta}(\Theta)\, dA\, d\Theta\, d\Phi^H\, d\Phi^E.   (A.9)

For the fading model (3.4), d^{\mathrm{i}}_{\mathrm{csir}} is given by the solution of the following infimum:

  d^{\mathrm{i}}_{\mathrm{csir}} = \inf_{\mathcal{O} \cap \{A \succeq 0,\, \Theta \succeq d_e \mathbf{1}\}} \left\{ \left( 1 + \frac{\tau}{2} \right) \sum_{b=1}^{B} \sum_{r=1}^{n_r} \sum_{t=1}^{n_t} \alpha_{b,r,t} + \sum_{b=1}^{B} \sum_{r=1}^{n_r} \sum_{t=1}^{n_t} (\theta_{b,r,t} - d_e) \right\}.   (A.10)

Proof. The joint probability of A, \Theta, \Phi^H and \Phi^E over \mathcal{O} can be written as (A.9) because the random matrices \Theta and \Phi^E are independent. Since each entry of \Phi^E is uniformly distributed over [0, 2\pi), the density P_{\Phi^E}(\Phi^E) does not affect the dot equality. The lemma is then obtained by applying Varadhan's lemma [106] to (A.9). The condition A \succeq 0 is the same as that for perfect CSIR in Appendix A.1. The condition \Theta \succeq d_e \mathbf{1}, on the other hand, is derived as follows. Consider the entry of \Theta at block b, receive antenna r and transmit antenna t, \Theta_{b,r,t}. The pdf of \Theta_{b,r,t} is given by

  P_{\Theta_{b,r,t}}(\theta) = \log SNR \cdot SNR^{d_e - \theta} \cdot \exp\!\left( -SNR^{d_e - \theta} \right).   (A.11)

We can see that the interval over which the pdf (A.11) does not decay exponentially with the SNR is given by \theta \geq d_e. The condition \Theta \succeq d_e \mathbf{1} follows by considering all entries of \Theta.

We start the proof of the discrete-input SNR-exponent with the proof for SISO channels. The proof for MIMO channels then follows as an extension of the SISO proof.

A.2.1 SISO Case

GMI Lower Bound

For the SISO channel, (3.24) becomes

  I^{\mathrm{gmi}}_b(SNR, h_b, \hat{h}_b, s) = M - \frac{1}{2^M} \sum_{x \in X} \mathbb{E}\!\left[ \log_2 \sum_{x' \in X} \left( e^{-s|\sqrt{SNR}(h_b x - \hat{h}_b x') + Z|^2 + s|\sqrt{SNR}(h_b - \hat{h}_b)x + Z|^2} \right) \right]   (A.12)

and the GMI is given by

  I^{\mathrm{gmi}}(h) = \sup_{s > 0} \frac{1}{B} \sum_{b=1}^{B} I^{\mathrm{gmi}}_b(SNR, h_b, \hat{h}_b, s).   (A.13)
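To make (A.12) concrete, the following Monte Carlo sketch (purely illustrative: QPSK, the specific h_b, \hat{h}_b and the value of s are all assumed) estimates the per-block GMI term for one fading block:

    import numpy as np

    rng = np.random.default_rng(2)
    # assumed QPSK constellation (M = 2 bits per symbol), unit average energy
    X = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
    M = int(np.log2(X.size))

    def gmi_block(snr, h, h_hat, s, trials=20000):
        # Monte Carlo estimate of I_b^gmi(SNR, h_b, h_hat_b, s) in (A.12)
        Z = (rng.standard_normal(trials) + 1j * rng.standard_normal(trials)) / np.sqrt(2)
        total = 0.0
        for x in X:
            num = s * np.abs(np.sqrt(snr) * (h - h_hat) * x + Z) ** 2
            den = np.exp(-s * np.abs(np.sqrt(snr) * (h * x - h_hat * X[:, None]) + Z) ** 2 + num)
            total += np.mean(np.log2(den.sum(axis=0)))
        return M - total / X.size

    h = 0.8 + 0.3j                                        # assumed fading realisation
    h_hat = h + 0.05 * (rng.standard_normal() + 1j * rng.standard_normal())  # imperfect CSIR
    print(gmi_block(snr=100.0, h=h, h_hat=h_hat, s=0.05))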


For a given s > 0 and any noise realisation z \in \mathbb{C}, we can bound the sum of the terms inside the expectation in (A.12) as

  0 \leq \sum_{b=1}^{B} \log_2 \sum_{x' \in X} \left( e^{-s|\sqrt{SNR}(h_b x - \hat{h}_b x') + z|^2 + s|\sqrt{SNR}(h_b - \hat{h}_b)x + z|^2} \right)   (A.14)

  \leq \sum_{b=1}^{B} \log_2 \left( |X|\, e^{s|\sqrt{SNR}(h_b - \hat{h}_b)x + z|^2} \right)   (A.15)

  = \sum_{b=1}^{B} \frac{\log |X| + s|z - \sqrt{SNR}\, e_b x|^2}{\log 2}.   (A.16)

Taking the expectation over Z yields

  \mathbb{E}\!\left[ \sum_{b=1}^{B} \frac{\log |X| + s|Z - \sqrt{SNR}\, e_b x|^2}{\log 2} \right] = \frac{B \log |X| + s\left(B + SNR \sum_{b=1}^{B} |e_b|^2 |x|^2\right)}{\log 2}.   (A.17)

Note that |X| and |x|^2, \forall x \in X, are assumed to be finite and independent of the SNR. Thus, to ensure that the RHS of (A.17) is finite, i.e.,

  \frac{B \log |X| + s\left(B + SNR \sum_{b=1}^{B} |e_b|^2 |x|^2\right)}{\log 2} < \infty,   (A.18)

we can pick s in the set \mathcal{S} \subseteq \mathbb{R},

  \mathcal{S} \triangleq \left\{ s \in \mathbb{R} : 0 < s \leq \frac{1}{B + SNR \sum_{b=1}^{B} |e_b|^2} \right\}.   (A.19)

Hence, for any s \in \mathcal{S}, we can apply the dominated convergence theorem [19], which gives

  \lim_{SNR\to\infty} \mathbb{E}\!\left[ \log_2 \sum_{x' \in X} \left( e^{-s|\sqrt{SNR}(h_b x - \hat{h}_b x') + Z|^2 + s|\sqrt{SNR}(h_b - \hat{h}_b)x + Z|^2} \right) \right]
  = \mathbb{E}\!\left[ \lim_{SNR\to\infty} \log_2 \sum_{x' \in X} \left( e^{-s|\sqrt{SNR}(h_b x - \hat{h}_b x') + Z|^2 + s|\sqrt{SNR}(h_b - \hat{h}_b)x + Z|^2} \right) \right].   (A.20)

Replacing the supremum over s > 0 in (A.13) with the supremum over s ∈ S

166

Page 187: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

A.2 Proof of Theorem 3.1 (Discrete Inputs)

results in a lower bound to the GMI. Substituting a specific value of s ∈ S further

lower-bounds the GMI. As we will show later, the following choice of s

s =1

B + SNR(1+ε)∑B

b=1 |eb|2, ε > 0 (A.21)

yields a tight GMI lower bound at high SNR. At high SNR, the range ε > 0

allows for s ↓ 0 (ε ↑ ∞) and for s→ 1

B+SNR∑B

b=1 |eb|2(ε ↓ 0).

Using the transformation of variables αb = − log |hb|2log SNR

and θb = − log |eb|2logSNR

, the

exponential term in (A.12) with s = s becomes

e−s|√SNR(hbx−hbx

′)+z|2+s|√SNR(hb−hb)x+z|2

= e−s

SNR1−αb

2 eıφhb (x−x′)+z−SNR

1−θb2 eıφ

ebx′

2

+s

z−SNR1−θb

2 eıφebx

2

(A.22)

where φhb and φe

b are the angles of hb and eb, respectively. We partition b ∈1, . . . , B into four disjoint subsets as follows

• b ∈ B1 if αb > 1 and αb < θb;

• b ∈ B2 if αb < 1 and αb < θb;

• b ∈ B3 if αb > 1 and αb > θb;

• b ∈ B4 if αb < 1 and αb > θb.

Note that the expression s̄ in (A.21) satisfies

    s̄ ≐ min( SNR^0, SNR^{θ_min−1−ε} ).        (A.23)

In order to obtain a good lower bound, we shall select ε such that

    0 < ε < θ_min − α_max        (A.24)

where

    α_max = max{ α_b : α_b < min(1, θ_min), b = 1, . . . , B }.        (A.25)

Then, the convergence of (A.22) as the SNR tends to infinity can be explained as follows.

1. For b ∈ B_1, we have that α_b > 1 and α_b < θ_b. Under this condition and after exchanging the limit and the expectation in (A.20), it can be observed that (A.22) tends to one for any s > 0. It implies that I_b^{gmi}(SNR, h_b, ĥ_b, s̄) → 0, ∀b ∈ B_1.

2. For b ∈ B_2, we have that α_b < 1 and α_b < θ_b. The dominating term in the exponent of (A.22) is given by −s̄ × SNR^{1−α_b}. Thus, for b ∈ B_2 and α_b < θ_min, exchanging the limit and the expectation in (A.20) yields the convergence of (A.22) to zero as the SNR tends to infinity. We then have that I_b^{gmi}(SNR, h_b, ĥ_b, s̄) → M. On the other hand, for b ∈ B_2 and α_b ≥ θ_min, we observe the convergence of (A.22) to one as the SNR tends to infinity. This implies that I_b^{gmi}(SNR, h_b, ĥ_b, s̄) → 0.

3. For b ∈ B_3 and θ_b < 1, we have the dot equality

    −s̄| SNR^{(1−α_b)/2} e^{ıφ^h_b}(x − x') + z − SNR^{(1−θ_b)/2} e^{ıφ^e_b} x' |² + s̄| z − SNR^{(1−θ_b)/2} e^{ıφ^e_b} x |² ≐ −s̄( |x'|² − |x|² ) SNR^{1−θ_b}        (A.26)

for |x| ≠ |x'| and

    −s̄| SNR^{(1−α_b)/2} e^{ıφ^h_b}(x − x') + z − SNR^{(1−θ_b)/2} e^{ıφ^e_b} x' |² + s̄| z − SNR^{(1−θ_b)/2} e^{ıφ^e_b} x |²
      ≐ s̄|z| SNR^{(1−θ_b)/2} cos(φ_z − φ^e_b)|x|( cos φ_{x'} − cos φ_x ) + s̄|z| SNR^{(1−θ_b)/2} sin(φ_z − φ^e_b)|x|( sin φ_{x'} − sin φ_x )        (A.27)

for |x| = |x'|, x ≠ x', where φ_z is the angle of z. In this case, we cannot use the dominated convergence theorem [19] in (A.20) since there is a dependency on z. Instead, since the logarithm is a concave function of its argument, we first apply Jensen's inequality [18, Th. 2.6.2] to the expectation in (A.12)

    E[ log₂ ∑_{x'∈X} exp( −s|√SNR (h_b x − ĥ_b x') + Z|² + s|√SNR (h_b − ĥ_b) x + Z|² ) ]
      ≤ log₂ E[ ∑_{x'∈X} exp( −s|√SNR (h_b x − ĥ_b x') + Z|² + s|√SNR (h_b − ĥ_b) x + Z|² ) ]        (A.28)
      = log₂ ∑_{x'∈X} E[ exp( −s|√SNR (h_b x − ĥ_b x') + Z|² + s|√SNR (h_b − ĥ_b) x + Z|² ) ].        (A.29)

For a given z ∈ ℂ, we have the bounds

    0 ≤ exp( −s|√SNR (h_b x − ĥ_b x') + z|² + s|√SNR (h_b − ĥ_b) x + z|² ) ≤ e^{s|z − √SNR e_b x|²}.        (A.30)

Averaging over Z yields

    E[ e^{s|Z − √SNR e_b x|²} ] = ( 1/(1 − s) ) exp( ( s²/(1 − s) + s ) SNR |e_b|²|x|² )        (A.31)

where we have assumed s < 1 so that the above expectation can be evaluated. Furthermore, using s = s̄ in (A.21), the RHS of (A.31) can be guaranteed to be finite. Thus, with s = s̄, we can apply the dominated convergence theorem [19] as

    lim_{SNR→∞} E[ exp( −s̄|√SNR (h_b x − ĥ_b x') + Z|² + s̄|√SNR (h_b − ĥ_b) x + Z|² ) ]
      = E[ lim_{SNR→∞} exp( −s̄|√SNR (h_b x − ĥ_b x') + Z|² + s̄|√SNR (h_b − ĥ_b) x + Z|² ) ].        (A.32)

For |x| ≠ |x'|, using the relationship in (A.32) and ε in (A.24), we observe that (A.26) tends to zero as the SNR tends to infinity and (A.22) tends to one. To evaluate (A.27), we first upper-bound

    s̄|z| SNR^{(1−θ_b)/2} cos(φ_z − φ^e_b)|x|( cos φ_{x'} − cos φ_x ) + s̄|z| SNR^{(1−θ_b)/2} sin(φ_z − φ^e_b)|x|( sin φ_{x'} − sin φ_x ) ≤ 4 s̄|z| SNR^{(1−θ_b)/2} |x|.        (A.33)

Let W = |Z|. Then, W has the Rayleigh pdf

    P_W(w) = 2w e^{−w²},   w ≥ 0.        (A.34)

Using the result in (A.29) and the upper bound (A.33) for |x| = |x'|, x ≠ x', we have that

    E[ e^{4 s̄ |Z| SNR^{(1−θ_b)/2} |x|} ] = E[ e^{4 s̄ W SNR^{(1−θ_b)/2} |x|} ]        (A.35)
      = ∫_0^∞ e^{4 s̄ w SNR^{(1−θ_b)/2} |x|} · 2w e^{−w²} dw        (A.36)
      = 1 + 2√π s̄ SNR^{(1−θ_b)/2} |x| e^{4 s̄² SNR^{1−θ_b} |x|²} ( 1 + erf( 2 s̄ SNR^{(1−θ_b)/2} |x| ) )        (A.37)
      ≤ 1 + 4√π s̄ SNR^{(1−θ_b)/2} |x| e^{4 s̄² SNR^{1−θ_b} |x|²}        (A.38)

where erf(·) is the error function [53]. Inequality (A.38) is due to the bound erf(a) ≤ 1. Note that for θ_b < 1, we have

    s̄ · SNR^{(1−θ_b)/2} ≐ SNR^{θ_min−1−ε} · SNR^{(1−θ_b)/2}        (A.39)
      ≤ SNR^{θ_min−1−ε} · SNR^{1−θ_b}        (A.40)
      ≐ SNR^{θ_min−θ_b−ε}.        (A.41)

As θ_min − ε is always less than θ_b, the last dot equality implies that as the SNR tends to infinity, the upper bound in (A.38) tends to one. This provides an upper bound to the expectation over Z in (A.29) when |x| = |x'|, x ≠ x', α_b > 1 and θ_b < 1. Complementing the result with the one for |x| ≠ |x'|, α_b > 1 and θ_b < 1, we have that I_b^{gmi}(SNR, h_b, ĥ_b, s̄) → 0 when θ_b < 1.

For b ∈ B_3 and θ_b > 1, it can be observed that (A.22) tends to one as the SNR tends to infinity for any s > 0. This implies that I_b^{gmi}(SNR, h_b, ĥ_b, s) → 0.

4. For b ∈ B_4, we always have θ_b < 1. For |x| ≠ |x'|, we have the dot equality

    −s̄| SNR^{(1−α_b)/2} e^{ıφ^h_b}(x − x') + z − SNR^{(1−θ_b)/2} e^{ıφ^e_b} x' |² + s̄| z − SNR^{(1−θ_b)/2} e^{ıφ^e_b} x |² ≐ −s̄( |x'|² − |x|² ) SNR^{1−θ_b}.        (A.42)

On the other hand, for |x| = |x'|, x ≠ x', we have the dot equality

    −s̄| SNR^{(1−α_b)/2} e^{ıφ^h_b}(x − x') + z − SNR^{(1−θ_b)/2} e^{ıφ^e_b} x' |² + s̄| z − SNR^{(1−θ_b)/2} e^{ıφ^e_b} x |² ≐ −s̄ · SNR^{1−(α_b+θ_b)/2} |x|² ( cos(φ^{eh}_b) − cos(φ^{eh}_b + φ_{x'x}) ).        (A.43)

Then, using ε in (A.24) and exchanging the limit and the expectation in (A.20), it can be observed for both |x| ≠ |x'| and |x| = |x'| (x ≠ x') that (A.22) tends to one as the SNR tends to infinity. Thus, we have that I_b^{gmi}(SNR, h_b, ĥ_b, s̄) → 0, ∀b ∈ B_4.

From the above analysis, the generalised outage probability can be upper-bounded as

    P_gout(R) ≐ SNR^{−d^X_icsir}        (A.44)
      ≤̇ Pr{ (1/B) ∑_{b=1}^{B} M · 1{ α_b ≤ 1−ǫ ∩ α_b ≤ θ_min−δ } < R }        (A.45)
      ≐ ∫_{O^{ǫ,δ}_X} P_{A,Φ_H}(α, φ^h) P_Θ(θ) P_{Φ_E}(φ^e) dα dθ dφ^h dφ^e        (A.46)

where we have defined

    O^{ǫ,δ}_X ≜ { α, θ ∈ ℝ^B : ∑_{b=1}^{B} 1{ α_b ≤ 1−ǫ ∩ α_b ≤ θ_min−δ } < BR/M }        (A.47)

for any ǫ, δ > 0. Applying Lemma A.1 yields

    d^X_icsir ≥ inf_{O^{ǫ,δ}_X ∩ {α ⪰ 0, θ ⪰ d_e·1}} { (1 + τ/2) ∑_{b=1}^{B} α_b + ∑_{b=1}^{B} (θ_b − d_e) }.        (A.48)

Following the steps used in [23], we can show that the values of α and θ achieving the infimum are given by

    θ*_b = d_e,   for b = 1, . . . , B        (A.49)
    α*_b = min(1−ǫ, θ*_b−δ),   for b = 1, . . . , B − b*        (A.50)
    α*_b = 0,   for b = B − b* + 1, . . . , B        (A.51)

where b* ∈ {0, . . . , B−1} is the unique integer satisfying b*/B < R/M ≤ (b*+1)/B. As this is valid for any ǫ > 0 and δ > 0, the lower bound for d^X_icsir is tight if we let ǫ, δ ↓ 0. This yields

    d^X_icsir ≥ min(1, d_e) × (1 + τ/2) d_SB(R)        (A.52)

where d_SB(R) is the Singleton bound

    d_SB(R) = 1 + ⌊ B( 1 − R/M ) ⌋.        (A.53)
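The lower bound (A.52) depends on the rate only through the Singleton bound; a small helper (hypothetical function names, illustrative parameter values) that evaluates (A.52)–(A.53) reads as follows.

    from math import floor

    def singleton_bound(B, R, M):
        """Singleton bound d_SB(R) = 1 + floor(B*(1 - R/M)) of (A.53)."""
        return 1 + floor(B * (1.0 - R / M))

    def discrete_exponent_lb(B, R, M, d_e, tau):
        """Lower bound (A.52): min(1, d_e) * (1 + tau/2) * d_SB(R)."""
        return min(1.0, d_e) * (1.0 + tau / 2.0) * singleton_bound(B, R, M)

    # illustrative numbers only (not taken from the dissertation)
    print(singleton_bound(B=4, R=1.0, M=2))                       # 3
    print(discrete_exponent_lb(B=4, R=1.0, M=2, d_e=1.0, tau=0))  # 3.0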

GMI Upper Bound

For each x, x' ∈ X, we define

    f_{x,x'}(s_b, SNR, h_b, e_b, z) ≜ exp( −s_b|√SNR h_b (x − x') + z − √SNR e_b x'|² + s_b|z − √SNR e_b x|² )        (A.54)

and for each x ∈ X, we define

    f_x(s_b, SNR, h_b, e_b, z) ≜ log₂ ∑_{x'∈X} f_{x,x'}(s_b, SNR, h_b, e_b, z).        (A.55)

Then, by Proposition 2.3, the GMI can be upper-bounded as

    I^{gmi}(ĥ) ≤ (1/B) ∑_{b=1}^{B} sup_{s_b>0} I_b^{gmi}(SNR, h_b, ĥ_b, s_b)        (A.56)

where

    I_b^{gmi}(SNR, h_b, ĥ_b, s_b) = M − (1/2^M) ∑_{x∈X} E[ f_x(s_b, SNR, h_b, e_b, Z) ]        (A.57)
      = M − E[ (1/2^M) ∑_{x∈X} f_x(s_b, SNR, h_b, e_b, Z) ].        (A.58)

In order to evaluate the GMI upper bound, we first partition X into subsets, each of which contains signal points of equal energy. Suppose that the constellation X has n energy levels. Denote by X_{n'}, n' = 1, . . . , n, the subset of X corresponding to the n'-th energy level. Then, we can partition X into n disjoint subsets X_{n'}, n' = 1, . . . , n, such that

    X = X_1 ∪ . . . ∪ X_n.        (A.59)

Note that for each n', n' = 1, . . . , n, all signal points in X_{n'} have the same energy. We shall use this partition in the following high-SNR analysis, which is based on Proposition 2.3 and Fatou's lemma [41]. To this end, we use the change of variables from |h_b|² and |e_b|² to α_b and θ_b so that we can write

    exp( −s_b|√SNR h_b (x − x') + z − √SNR e_b x'|² + s_b|z − √SNR e_b x|² )
      = exp( −s_b| SNR^{(1−α_b)/2} e^{ıφ^h_b}(x − x') + z − SNR^{(1−θ_b)/2} e^{ıφ^e_b} x' |² + s_b| z − SNR^{(1−θ_b)/2} e^{ıφ^e_b} x |² ).        (A.60)
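For concreteness, a short sketch of the partitioning in (A.59); 16-QAM is used purely as an assumed example constellation (it has n = 3 energy levels).

    import numpy as np

    # Group a discrete constellation into the equal-energy subsets X_1, ..., X_n of (A.59).
    def energy_partition(constellation, tol=1e-9):
        levels = {}
        for x in constellation:
            key = round(abs(x) ** 2 / tol) * tol      # quantise |x|^2 to merge equal energies
            levels.setdefault(key, []).append(x)
        return [np.array(points) for _, points in sorted(levels.items())]

    qam16 = np.array([a + 1j * b for a in (-3, -1, 1, 3) for b in (-3, -1, 1, 3)])
    for n_prime, subset in enumerate(energy_partition(qam16), start=1):
        print(f"X_{n_prime}: energy {abs(subset[0])**2:.0f}, {len(subset)} points")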

We expand the exponential term and consider the following cases.

1. Case 1: α_b > 1. Regardless of the value of θ_b, the supremum of I_b^{gmi}(SNR, h_b, ĥ_b, s_b) over s_b > 0 in (A.56) tends to zero as it is upper-bounded by the perfect-CSIR mutual information for block b [23].

2. Case 2: α_b < 1 and α_b < θ_b. The supremum on the RHS of (A.56) is equivalent to the following infimum

    inf_{s_b>0} (1/2^M) ∑_{x∈X} E[ f_x(s_b, SNR, h_b, e_b, Z) ],        (A.61)

which can be lower-bounded as

    inf_{s_b>0} (1/2^M) ∑_{x∈X} E[ f_x(s_b, SNR, h_b, e_b, Z) ] ≥ (1/2^M) ∑_{x∈X} E[ inf_{s_b>0} f_x(s_b, SNR, h_b, e_b, Z) ]        (A.62)

by exchanging the infimum over s_b and the expectation twice. Let ŝ_b be the value of s_b that gives the infimum on the RHS of (A.62). The choice of ŝ_b depends on the behaviour of the following term

    exp( −s_b|√SNR h_b (x − x') + z − √SNR e_b x'|² + s_b|z − √SNR e_b x|² ).        (A.63)

It follows that if

    |√SNR h_b (x − x') + z − √SNR e_b x'|² > |z − √SNR e_b x|²        (A.64)

the solution of ŝ_b is given by ŝ_b ↑ ∞. Otherwise, the solution of ŝ_b is given by ŝ_b ↓ 0. Note that since we have the dot equality

    −s_b| SNR^{(1−α_b)/2} e^{ıφ^h_b}(x − x') + z − SNR^{(1−θ_b)/2} e^{ıφ^e_b} x' |² + s_b| z − SNR^{(1−θ_b)/2} e^{ıφ^e_b} x |² ≐ −s_b SNR^{1−α_b}        (A.65)

for x ≠ x', α_b < 1 and α_b < θ_b, it follows that in this case ŝ_b ↑ ∞.

Since f_x(s_b, SNR, h_b, e_b, z) ≥ 0, we can apply Fatou's lemma [41] to the RHS of (A.62) as follows

    lim_{SNR→∞} (1/2^M) ∑_{x∈X} E[ f_x(ŝ_b, SNR, h_b, e_b, Z) ] ≥ (1/2^M) ∑_{x∈X} E[ lim_{SNR→∞} f_x(ŝ_b, SNR, h_b, e_b, Z) ].        (A.66)

This gives a further lower bound to the RHS of (A.62) and yields an upper bound to I^{gmi}(ĥ).

Using (A.65) and the limit in (A.66), we can show that (A.60) tends to zero for x ≠ x' as the SNR tends to infinity, and (A.60) is equal to one for x = x'. Thus, the supremum of I_b^{gmi}(SNR, h_b, ĥ_b, s_b) over s_b > 0 in (A.56) is upper-bounded by M for α_b < 1 and α_b < θ_b.

3. Case 3: α_b < 1 and α_b > θ_b. The supremum in (A.56) is equivalent to the following infimum

    inf_{s_b>0} E[ (1/2^M) ∑_{x∈X} f_x(s_b, SNR, h_b, e_b, Z) ]        (A.67)

which can be lower-bounded as

    inf_{s_b>0} E[ (1/2^M) ∑_{x∈X} f_x(s_b, SNR, h_b, e_b, Z) ] ≥ E[ inf_{s_b>0} (1/2^M) ∑_{x∈X} f_x(s_b, SNR, h_b, e_b, Z) ]        (A.68)

by exchanging the infimum over s_b and the expectation. The terms in the exponent of (A.60) can be shown to have the following dot equality

    −s_b| SNR^{(1−α_b)/2} e^{ıφ^h_b}(x − x') + z − SNR^{(1−θ_b)/2} e^{ıφ^e_b} x' |² + s_b| z − SNR^{(1−θ_b)/2} e^{ıφ^e_b} x |² ≐ −s_b( |x'|² − |x|² ) SNR^{1−θ_b}        (A.69)

for |x| ≠ |x'|. On the other hand, for |x| = |x'|, x ≠ x', we have that

    −s_b| SNR^{(1−α_b)/2} e^{ıφ^h_b}(x − x') + z − SNR^{(1−θ_b)/2} e^{ıφ^e_b} x' |² + s_b| z − SNR^{(1−θ_b)/2} e^{ıφ^e_b} x |² ≐ −s_b · SNR^{1−(α_b+θ_b)/2} |x|² ( cos(φ^{eh}_b) − cos(φ^{eh}_b + φ_{x'x}) )        (A.70)

with probability one, since the constellation X is discrete and both Φ^H_b and Φ^E_b are uniformly distributed over [0, 2π). Hence, for each x ∈ X and x ≠ x', we have that at high SNR

    f_{x,x'}(s_b, SNR, h_b, e_b, z) = exp( −s_b( |x'|² − |x|² ) SNR^{1−θ_b} )        (A.71)

for |x| ≠ |x'|, and

    f_{x,x'}(s_b, SNR, h_b, e_b, z) = exp( −s_b · SNR^{1−(α_b+θ_b)/2} |x|² ( cos(φ^{eh}_b) − cos(φ^{eh}_b + φ_{x'x}) ) )        (A.72)

for |x| = |x'|, x ≠ x'. It follows that

    f_x(s_b, SNR, h_b, e_b, z) = log₂ ∑_{x'∈X} f_{x,x'}(s_b, SNR, h_b, e_b, z).        (A.73)

Using the second-order derivative of the log-sum-exp function f_x(s_b, SNR, h_b, e_b, z), it can be shown that the function

    ∑_{x∈X} f_x(s_b, SNR, h_b, e_b, z)        (A.74)

is convex in s_b for s_b > 0. To check whether the extreme point, which gives the global minimum to (A.74), exists for s_b > 0, we can simply find the derivative of (A.74) at s_b = 0 [72]

    ∂/∂s_b [ ∑_{x∈X} f_x(s_b, SNR, h_b, e_b, z) ]|_{s_b=0} = −( SNR^{1−(α_b+θ_b)/2} / (|X| log 2) ) ∑_{x∈X} ∑_{x'∈X: x'≠x, |x'|=|x|} ( |x|² cos(φ^{eh}_b) − |x|² cos(φ^{eh}_b + φ_{x'x}) ).        (A.75)

We next apply the partitioning in (A.59). Consider a pair of signal points (x₁, x₂) such that x₁, x₂ ∈ X_{n'}, |X_{n'}| ≥ 2, for n' ∈ {1, . . . , n}. The contribution of the pair (x₁, x₂) to the summations in (A.75) is given by

    |x₁|² cos(φ^{eh}_b) − |x₁|² cos(φ^{eh}_b + φ_{x₂x₁}) + |x₂|² cos(φ^{eh}_b) − |x₂|² cos(φ^{eh}_b + φ_{x₁x₂})
      = |x₁|² ( 2 cos(φ^{eh}_b) − cos(φ^{eh}_b + φ_{x₂x₁}) − cos(φ^{eh}_b − φ_{x₂x₁}) )        (A.78)
      = cos(φ^{eh}_b) |x₁|² ( 2 − 2 cos(φ_{x₂x₁}) )        (A.79)
      = cos(φ^{eh}_b) |x₁|² ( 1 − cos(φ_{x₂x₁}) + 1 − cos(φ_{x₁x₂}) )        (A.80)
      = cos(φ^{eh}_b) |x₁|² ( 1 − cos(φ_{x₂x₁}) ) + cos(φ^{eh}_b) |x₂|² ( 1 − cos(φ_{x₁x₂}) )        (A.81)

where we have used φ_{x₁x₂} = −φ_{x₂x₁} by definition, the equality |x₁|² = |x₂|², and the trigonometric identities

    cos(a + b) + cos(a − b) = 2 cos(a) cos(b)        (A.82)

and cos(−a) = cos(a). Define

    Q ≜ { n' : |X_{n'}| ≥ 2, n' = 1, . . . , n }.        (A.83)

Using the result in (A.81), we can re-write the summations in (A.75) as

    ∑_{x∈X} ∑_{x'∈X: x'≠x, |x'|=|x|} ( |x|² cos(φ^{eh}_b) − |x|² cos(φ^{eh}_b + φ_{x'x}) ) = cos(φ^{eh}_b) ∑_{n'∈Q} ∑_{x∈X_{n'}} ∑_{x'∈X_{n'}, x'≠x} |x|² ( 1 − cos(φ_{x'x}) )        (A.84)

where we have incorporated all X_{n'}, n' = 1, . . . , n, that satisfy |X_{n'}| ≥ 2.

Let ŝ_b be the value of s_b that gives the infimum on the RHS of (A.68). Note that this ŝ_b is different from the one in case 2. We have from (A.84) that the condition 1 − cos(φ_{x'x}) ≥ 0 is always true. Then, if cos(φ^{eh}_b) ≤ 0, the derivative in (A.75) is always non-negative, which implies that the solution of s_b that leads to the infimum on the RHS of (A.68) is given by ŝ_b ↓ 0. By using ŝ_b ↓ 0 and applying Fatou's lemma [41] to the RHS of (A.68)

    lim_{SNR→∞} E[ (1/2^M) ∑_{x∈X} f_x(ŝ_b, SNR, h_b, e_b, Z) ] ≥ E[ lim_{SNR→∞} (1/2^M) ∑_{x∈X} f_x(ŝ_b, SNR, h_b, e_b, Z) ],        (A.85)

we have that the upper bound for the supremum of I_b^{gmi}(SNR, h_b, ĥ_b, s_b) over s_b > 0 in (A.56) tends to zero as the SNR tends to infinity.

On the other hand, if cos(φ^{eh}_b) > 0, then the derivative in (A.75) is always non-positive. Thus, there is a possibility that there exists a positive number ŝ_b in the interval s_b > 0 that leads to the infimum on the RHS of (A.68). This also implies that the upper bound for the supremum of I_b^{gmi}(SNR, h_b, ĥ_b, s_b) over s_b > 0 in (A.56) is in [0, M].

We can then derive a loose upper bound as follows. By explicitly writing φ^{eh}_b = φ^e_b − φ^h_b, we first define

    Ξ_b ≜ { φ^h_b, φ^e_b ∈ [0, 2π) : cos( φ^e_b − φ^h_b ) > 0 }.        (A.86)

A loose upper bound is then obtained by considering that when Ξ_b occurs, the upper bound for the supremum of I_b^{gmi}(SNR, h_b, ĥ_b, s_b) over s_b > 0 in (A.56) is given by M.

From the above cases, we can show that the generalised outage probability is lower-bounded as

    P_gout(R) ≐ SNR^{−d^X_icsir}        (A.87)
      ≥̇ Pr{ (1/B) ∑_{b=1}^{B} M · 1{ E_{ǫ,δ}(α_b, θ_b, Ξ_b) } < R }        (A.88)
      ≐ ∫_{Ō^{ǫ,δ}_X} P_{A,Φ_H}(α, φ^h) P_Θ(θ) P_{Φ_E}(φ^e) dα dθ dφ^h dφ^e        (A.89)

where we have defined

    E_{ǫ,δ}(α_b, θ_b, Ξ_b) ≜ { α_b ≤ 1+ǫ ∩ α_b ≤ θ_b+δ } ∪ { α_b ≤ 1+ǫ ∩ α_b > θ_b+δ ∩ Ξ_b }        (A.90)

for ǫ, δ > 0, and

    Ō^{ǫ,δ}_X ≜ { α, θ ∈ ℝ^B : ∑_{b=1}^{B} 1{ E_{ǫ,δ}(α_b, θ_b, Ξ_b) } < BR/M }.        (A.91)

Applying Lemma A.1 yields

    d^X_icsir ≤ inf_{Ō^{ǫ,δ}_X ∩ {α ⪰ 0, θ ⪰ d_e·1}} { (1 + τ/2) ∑_{b=1}^{B} α_b + ∑_{b=1}^{B} (θ_b − d_e) }.        (A.92)

Similarly to the GMI lower bound, it is not difficult to show that the values of θ_b, b = 1, . . . , B, achieving the infimum are given by d_e. To find the values of α_b, b = 1, . . . , B, that solve the infimum, we need to see whether there exist φ^h_b and φ^e_b that do not belong to Ξ_b. Note that the following condition

    π/2 ≤ φ^e_b − φ^h_b ≤ 3π/2        (A.93)

implies that cos(φ^e_b − φ^h_b) ≤ 0. Thus, from (A.86) and (A.93), we can always find (φ^h_b, φ^e_b) ∉ Ξ_b. It then follows from [23] that the values of α_b, b = 1, . . . , B, achieving the infimum are given by

    α*_b = min(1+ǫ, θ*_b+δ),   for b = 1, . . . , B − b*        (A.94)
    α*_b = 0,   for b = B − b* + 1, . . . , B        (A.95)

where b* ∈ {0, . . . , B−1} is the unique integer satisfying b*/B < R/M ≤ (b*+1)/B. Substituting the values of α_b and θ_b, b = 1, . . . , B, achieving the infimum (A.92), we obtain the upper bound of the SNR-exponent

    d^X_icsir ≤ min(1, d_e) × (1 + τ/2) d_SB(R)        (A.96)

where we have let ǫ, δ ↓ 0 to make the upper bound tight.


A.2.2 MIMO Case

Recall (3.24)

    I_b^{gmi}(SNR, H_b, Ĥ_b, s) = M n_t − E[ log₂ ∑_{x'∈X^{n_t}} exp( −s‖√(SNR/n_t) (H_b X − Ĥ_b x') + Z‖² + s‖√(SNR/n_t) (H_b − Ĥ_b) X + Z‖² ) ]        (A.97)

where the expectation is over (X, Z). The GMI is given by

    I^{gmi}(Ĥ) = sup_{s>0} (1/B) ∑_{b=1}^{B} I_b^{gmi}(SNR, H_b, Ĥ_b, s).        (A.98)

Mimicking the analysis done for the SISO case, we have the GMI lower and upper bounds as follows.

GMI Lower Bound

Using the suboptimal s̄ ∈ S to apply the dominated convergence theorem [19], we have that

    −s̄‖√(SNR/n_t) H_b(x − x') + z − √(SNR/n_t) E_b x'‖² + s̄‖z − √(SNR/n_t) E_b x‖²
      ≐ −s̄ ∑_{r=1}^{n_r} | ∑_{t=1}^{n_t} SNR^{(1−α_{b,r,t})/2} e^{ıφ^h_{b,r,t}}(x_t − x'_t) + z_r − ∑_{t=1}^{n_t} SNR^{(1−θ_{b,r,t})/2} e^{ıφ^e_{b,r,t}} x'_t |²
        + s̄ ∑_{r=1}^{n_r} | z_r − ∑_{t=1}^{n_t} SNR^{(1−θ_{b,r,t})/2} e^{ıφ^e_{b,r,t}} x_t |²        (A.99)

where now |e_b|² in (A.21) changes to ‖E_b‖²_F, and where φ^h_{b,r,t} and φ^e_{b,r,t} are the angles of h_{b,r,t} and e_{b,r,t}, respectively. Similarly to what is done in [32], define the following sets S^{(ǫ,δ)}_{b,r}, S^{(ǫ,δ)}_b, and κ_b for ǫ, δ > 0 as

    S^{(ǫ,δ)}_{b,r} ≜ { t : α_{b,r,t} ≤ 1−ǫ ∩ α_{b,r,t} ≤ θ_min−δ, t = 1, . . . , n_t },        (A.100)
    S^{(ǫ,δ)}_b ≜ ∪_{r=1}^{n_r} S^{(ǫ,δ)}_{b,r},        (A.101)
    κ_b ≜ | S^{(ǫ,δ)}_b |        (A.102)

where now θ_min ≜ min{ θ_{1,1,1}, . . . , θ_{b,r,t}, . . . , θ_{B,n_r,n_t} }. Note that s̄ satisfies

    s̄ ≐ SNR^{min(0, θ_min−1−ε)}        (A.103)

where ε is chosen such that

    0 < ε < θ_min − α_max,        (A.104)

and where

    α_max = max{ α_{b,r,t} : α_{b,r,t} < min(1, θ_min), b = 1, . . . , B, r = 1, . . . , n_r, t = 1, . . . , n_t }.        (A.105)

For r = 1, . . . , n_r and x_t ≠ x'_t, if there exists α_{b,r,t} satisfying the constraint set S^{(ǫ,δ)}_b, then with s = s̄, the exponential function inside the expectation on the RHS of (A.97) tends to zero as the SNR tends to infinity. Otherwise, the exponential function converges to one. Therefore, we can write the following dot equality for high SNR

    −s̄| ∑_{t=1}^{n_t} SNR^{(1−α_{b,r,t})/2} e^{ıφ^h_{b,r,t}}(x_t − x'_t) + z_r − ∑_{t=1}^{n_t} SNR^{(1−θ_{b,r,t})/2} e^{ıφ^e_{b,r,t}} x'_t |² + s̄| z_r − ∑_{t=1}^{n_t} SNR^{(1−θ_{b,r,t})/2} e^{ıφ^e_{b,r,t}} x_t |²
      ≐ −s̄| ∑_{t∈S^{(ǫ,δ)}_{b,r}: x_t≠x'_t} SNR^{(1−α_{b,r,t})/2} e^{ıφ^h_{b,r,t}}(x_t − x'_t) + z_r − ∑_{t=1}^{n_t} SNR^{(1−θ_{b,r,t})/2} e^{ıφ^e_{b,r,t}} x'_t |² + s̄| z_r − ∑_{t=1}^{n_t} SNR^{(1−θ_{b,r,t})/2} e^{ıφ^e_{b,r,t}} x_t |².        (A.106)

Let s* be the value of s > 0 that solves the supremum on the RHS of (A.98). Using the suboptimal s = s̄ given in (A.103), we have the upper bound for the expectation over Z at high SNR as follows

    lim_{SNR→∞} E[ log₂ ∑_{x'∈X^{n_t}} exp( −s*‖√(SNR/n_t) H_b(x − x') + Z − √(SNR/n_t) E_b x'‖² + s*‖Z − √(SNR/n_t) E_b x‖² ) ]
      ≤ E[ log₂ ∑_{x'∈X^{n_t}} 1{ x'_t = x_t, ∀ t ∈ S^{(ǫ,δ)}_b } ]        (A.107)
      = M ( n_t − κ_b )        (A.108)

for all x ∈ X^{n_t}. Thus,

    lim_{SNR→∞} I_b^{gmi}(SNR, H_b, Ĥ_b, s*) ≥ M κ_b        (A.109)

and P_gout(R) is upper-bounded as

    P_gout(R) ≤̇ Pr{ (1/B) ∑_{b=1}^{B} M κ_b < R }.        (A.110)

Define

    O_X ≜ { A, Θ ∈ ℝ^{B×n_r×n_t} : ∑_{b=1}^{B} κ_b < BR/M }.        (A.111)

Then, applying Lemma A.1 yields the lower bound for the SNR-exponent

    d^X_icsir ≥ inf_{O_X ∩ {A ⪰ 0, Θ ⪰ d_e·1}} { (1 + τ/2) ∑_{b=1}^{B} ∑_{r=1}^{n_r} ∑_{t=1}^{n_t} α_{b,r,t} + ∑_{b=1}^{B} ∑_{r=1}^{n_r} ∑_{t=1}^{n_t} (θ_{b,r,t} − d_e) }.        (A.112)

We can observe from (A.100) that the solution of θ_{b,r,t} for all b = 1, . . . , B, r = 1, . . . , n_r and t = 1, . . . , n_t to the above infimum is given by d_e. Following the analysis in [32], it can be proved that the solution of the above infimum is given by

    d^X_icsir ≥ min(1, d_e) · (1 + τ/2) n_r ( 1 + ⌊ B( n_t − R/M ) ⌋ ).        (A.113)
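A small helper (hypothetical names, illustrative numbers) that evaluates the MIMO discrete-input exponent bound (A.113); with n_t = n_r = 1 it reduces to the SISO expression in (A.52)–(A.53).

    from math import floor

    def mimo_discrete_exponent_lb(B, R, M, n_t, n_r, d_e, tau):
        """Lower bound (A.113): min(1, d_e)*(1 + tau/2)*n_r*(1 + floor(B*(n_t - R/M)))."""
        return min(1.0, d_e) * (1.0 + tau / 2.0) * n_r * (1 + floor(B * (n_t - R / M)))

    # illustrative numbers only
    print(mimo_discrete_exponent_lb(B=2, R=2.0, M=2, n_t=2, n_r=2, d_e=1.0, tau=0))  # 6.0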

GMI Upper Bound

Similarly to the SISO analysis, the GMI upper bound is evaluated using Proposition 2.3 and Fatou's lemma [41]. The only difference with the GMI lower bound is in the definition of the sets

    S̄^{(ǫ,δ)}_{b,r} ≜ { t : { α_{b,r,t} ≤ 1+ǫ ∩ α_{b,r,t} ≤ θ_{b,r,t}+δ } ∪ { α_{b,r,t} ≤ 1+ǫ ∩ α_{b,r,t} > θ_{b,r,t}+δ ∩ Ξ_{b,r,t} }, t = 1, . . . , n_t },        (A.114)
    S̄^{(ǫ,δ)}_b ≜ ∪_{r=1}^{n_r} S̄^{(ǫ,δ)}_{b,r},        (A.115)
    κ̄_b ≜ | S̄^{(ǫ,δ)}_b |        (A.116)

where

    Ξ_{b,r,t} ≜ { φ^h_{b,r,t}, φ^e_{b,r,t} ∈ [0, 2π) : cos( φ^e_{b,r,t} − φ^h_{b,r,t} ) > 0 }.        (A.117)

Following the same steps used in the SISO analysis, we can lower-bound the expectation over Z as follows

    lim_{SNR→∞} E[ log₂ ∑_{x'∈X^{n_t}} exp( −s*‖√(SNR/n_t) H_b(x − x') + Z − √(SNR/n_t) E_b x'‖² + s*‖Z − √(SNR/n_t) E_b x‖² ) ]
      ≥ E[ log₂ ∑_{x'∈X^{n_t}} 1{ x'_t = x_t, ∀ t ∈ S̄^{(ǫ,δ)}_b } ]        (A.118)
      = M ( n_t − κ̄_b ).        (A.119)

The generalised outage probability can then be lower-bounded as

    P_gout(R) ≥̇ Pr{ (1/B) ∑_{b=1}^{B} M κ̄_b < R }.        (A.120)

Define

    Ō_X ≜ { A, Θ ∈ ℝ^{B×n_r×n_t} : ∑_{b=1}^{B} κ̄_b < BR/M }.        (A.121)

Using Ō_X to apply the result in Lemma A.1 and following the technique used for the GMI lower bound, the SNR-exponent can be proved to be upper-bounded as

    d^X_icsir ≤ min(1, d_e) · (1 + τ/2) n_r ( 1 + ⌊ B( n_t − R/M ) ⌋ ).        (A.122)

This is because the infimum solution for θ_{b,r,t} in (A.114) is the same as that for θ_min in (A.100) (given by d_e), and because we can always find φ^h_{b,r,t} and φ^e_{b,r,t} that do not belong to Ξ_{b,r,t}. This completes the proof for discrete inputs.

A.3 Proof of Theorem 3.1 (Gaussian Inputs)

Recall the GMI (3.20) for i.i.d. Gaussian inputs (in nats per channel use)

    I^{gmi}(Ĥ) = sup_{s>0} (1/B) ∑_{b=1}^{B} I_b^{gmi}(SNR, H_b, Ĥ_b, s)        (A.123)

where

    I_b^{gmi}(SNR, H_b, Ĥ_b, s) = log det( I_{n_r} + s Ĥ_b Ĥ_b† SNR/n_t ) − s( n_r + (SNR/n_t) ‖H_b − Ĥ_b‖²_F ) + E[ s Y†Σ_y^{−1}Y | Ĥ_b = Ĥ_b, E_b = E_b ]        (A.124)

and where

    Σ_y ≜ I_{n_r} + s Ĥ_b Ĥ_b† SNR/n_t.        (A.125)

In the following, we derive lower and upper bounds to (A.123) to prove Theorem 3.1.

A.3.1 GMI Lower Bound

We first note from [72, App. D] that E[ s Y†Σ_y^{−1}Y | Ĥ_b = Ĥ_b, E_b = E_b ] is non-negative. Then, we have that

    I_b^{gmi}(SNR, Ĥ_b, E_b, s) ≥ log det( I_{n_r} + s Ĥ_b Ĥ_b† SNR/n_t ) − s( n_r + (SNR/n_t) ‖E_b‖²_F ).        (A.126)

Without loss of generality, assume that n_t ≥ n_r. (Footnote 1.2: if n_t < n_r, then it suffices to replace ( I_{n_r} + s Ĥ_b Ĥ_b† SNR/n_t ) with ( I_{n_t} + s Ĥ_b† Ĥ_b SNR/n_t ).) Let λ̂_{b,i}, i = 1, . . . , n_r, be the i-th eigenvalue of Ĥ_b Ĥ_b†. Then, the RHS of (A.126) can be written in terms of the eigenvalues

    I_b^{gmi}(SNR, λ̂_b, E_b, s) ≥ log ∏_{i=1}^{n_r} ( 1 + s λ̂_{b,i} SNR/n_t ) − s( n_r + (SNR/n_t) ‖E_b‖²_F ).        (A.127)

We can further lower-bound (A.127) using

    ∏_{i=1}^{n_r} ( 1 + s λ̂_{b,i} SNR/n_t ) = ( 1 + s λ̂_{b,1} SNR/n_t ) · · · ( 1 + s λ̂_{b,n_r} SNR/n_t )        (A.128)
      ≥ 1 + s λ̂_{b,1} SNR/n_t + s λ̂_{b,2} SNR/n_t + · · · + s λ̂_{b,n_r} SNR/n_t        (A.129)
      = 1 + s (SNR/n_t) ∑_{i=1}^{n_r} λ̂_{b,i}        (A.130)
      = 1 + s (SNR/n_t) ‖Ĥ_b‖²_F        (A.131)
      = 1 + s (SNR/n_t) ∑_{r=1}^{n_r} ∑_{t=1}^{n_t} |ĥ_{b,r,t}|²        (A.132)

where the inequality follows because Ĥ_b Ĥ_b† is a positive semidefinite matrix, so its eigenvalues are always zero or positive. It holds that [86]

    ‖Ĥ_b‖²_F = ∑_{i=1}^{n_r} λ̂_{b,i} = ∑_{r=1}^{n_r} ∑_{t=1}^{n_t} |ĥ_{b,r,t}|².        (A.133)
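A quick numerical check of the identity (A.133), with a randomly drawn matrix standing in for Ĥ_b:

    import numpy as np

    # Check (A.133): the eigenvalues of H H† sum to the squared Frobenius norm of H.
    rng = np.random.default_rng(1)
    n_r, n_t = 2, 3
    H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
    eigs = np.linalg.eigvalsh(H @ H.conj().T)
    print(np.allclose(eigs.sum(), np.linalg.norm(H, 'fro') ** 2))   # True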

We then have a lower bound to the GMI as

    I^{gmi}(Ĥ) = sup_{s>0} (1/B) ∑_{b=1}^{B} I_b^{gmi}(SNR, H_b, Ĥ_b, s)        (A.134)
      ≥ sup_{s>0} (1/B) ∑_{b=1}^{B} { log( 1 + s ‖Ĥ_b‖²_F SNR/n_t ) − s( n_r + (SNR/n_t) ‖E_b‖²_F ) }.        (A.135)

The optimiser for s is difficult to evaluate in closed form due to the sum over b involving the logarithm function. A suboptimal s can be obtained as follows. For any s > 0, we can lower-bound

    ∑_{b=1}^{B} { log( 1 + s ‖Ĥ_b‖²_F SNR/n_t ) − s( n_r + (SNR/n_t) ‖E_b‖²_F ) } ≥ ∑_{b=1}^{B} log( s ‖Ĥ_b‖²_F SNR/n_t ) − s( B n_r + (SNR/n_t) ∑_{b=1}^{B} ‖E_b‖²_F ).        (A.136)

We then take the first-order derivative of the RHS of (A.136) with respect to s and equate it to zero. From this step, we obtain a suboptimal s with respect to (A.135), which is given by

    s̄ = B / ( B n_r + (SNR/n_t) ∑_{b=1}^{B} ‖E_b‖²_F ).        (A.137)

Replacing s in (A.135) with s̄ and removing the supremum yield

    I^{gmi}(Ĥ) ≥ (1/B) log( ∏_{b=1}^{B} e^{−1} ( 1 + ( B ‖Ĥ_b‖²_F SNR/n_t ) / ( B n_r + (SNR/n_t) ∑_{b=1}^{B} ‖E_b‖²_F ) ) ).        (A.138)
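As a sanity check on (A.135)–(A.138) (purely illustrative, with assumed channel-estimate and error realisations), the sketch below compares the bound evaluated at the suboptimal s̄ of (A.137) with a grid search over s.

    import numpy as np

    # Sketch of the lower bound (A.135) and of the suboptimal s in (A.137).
    rng = np.random.default_rng(2)
    B, n_r, n_t, snr = 2, 2, 2, 100.0
    H_hat = [(rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
             for _ in range(B)]
    E = [0.05 * (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t)))
         for _ in range(B)]

    def lb(s):
        """RHS of (A.135) evaluated at a given s > 0."""
        return np.mean([np.log(1.0 + s * np.linalg.norm(Hb, 'fro')**2 * snr / n_t)
                        - s * (n_r + snr / n_t * np.linalg.norm(Eb, 'fro')**2)
                        for Hb, Eb in zip(H_hat, E)])

    s_bar = B / (B * n_r + snr / n_t * sum(np.linalg.norm(Eb, 'fro')**2 for Eb in E))  # (A.137)
    s_grid = np.linspace(1e-3, 1.0, 400)
    print(f"bound at s_bar : {lb(s_bar):.3f}")
    print(f"best over grid : {max(lb(s) for s in s_grid):.3f}")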

Note that from ĥ_{b,r,t} = h_{b,r,t} + e_{b,r,t}, we have

    |ĥ_{b,r,t}|² = |h_{b,r,t}|² + |e_{b,r,t}|² + 2|h_{b,r,t}||e_{b,r,t}| cos( φ^h_{b,r,t} − φ^e_{b,r,t} ).        (A.139)

Let α̂_{b,r,t} = −log|ĥ_{b,r,t}|²/log SNR, α_{b,r,t} = −log|h_{b,r,t}|²/log SNR and θ_{b,r,t} = −log|e_{b,r,t}|²/log SNR. Then, for any real positive number ς > 0, we have for α_{b,r,t} ≠ θ_{b,r,t} that

    SNR^ς |ĥ_{b,r,t}|² = SNR^{ς−α̂_{b,r,t}} ≐ SNR^{ς−min(α_{b,r,t}, θ_{b,r,t})}.        (A.140)

On the other hand, for α_{b,r,t} = θ_{b,r,t}, we have the following four cases.

• If h_{b,r,t} = e_{b,r,t} and h_{b,r,t} ≠ 0, we have that

    SNR^ς |ĥ_{b,r,t}|² = SNR^{ς−α̂_{b,r,t}} = 4 SNR^ς |h_{b,r,t}|²        (A.141)
      ≐ SNR^{ς−α_{b,r,t}} ≐ SNR^{ς−θ_{b,r,t}}.        (A.142)

• If h_{b,r,t} = e*_{b,r,t} and cos(φ^h_{b,r,t}) ≠ 0, where e*_{b,r,t} denotes the complex conjugate of e_{b,r,t}, we have that

    SNR^ς |ĥ_{b,r,t}|² = SNR^{ς−α̂_{b,r,t}} = 4 SNR^ς |h_{b,r,t}|² cos²(φ^h_{b,r,t})        (A.143)
      ≐ SNR^{ς−α_{b,r,t}} ≐ SNR^{ς−θ_{b,r,t}}.        (A.144)

• If −h_{b,r,t} = e*_{b,r,t} and sin(φ^h_{b,r,t}) ≠ 0, we have that

    SNR^ς |ĥ_{b,r,t}|² = SNR^{ς−α̂_{b,r,t}} = 4 SNR^ς |h_{b,r,t}|² sin²(φ^h_{b,r,t})        (A.145)
      ≐ SNR^{ς−α_{b,r,t}} ≐ SNR^{ς−θ_{b,r,t}}.        (A.146)

• If h_{b,r,t} = −e_{b,r,t}, we have that

    SNR^ς |ĥ_{b,r,t}|² = SNR^{ς−α̂_{b,r,t}} = 0.        (A.147)

Note that the condition h_{b,r,t} = −e_{b,r,t} also covers the condition h_{b,r,t} = e_{b,r,t} = 0, the condition h_{b,r,t} = e*_{b,r,t} with cos(φ^h_{b,r,t}) = 0, and the condition −h_{b,r,t} = e*_{b,r,t} with sin(φ^h_{b,r,t}) = 0.

From the preceding evaluation, we have that

    e^{−1} ( 1 + ( B ‖Ĥ_b‖²_F SNR/n_t ) / ( B n_r + (SNR/n_t) ∑_{b=1}^{B} ‖E_b‖²_F ) )
      = e^{−1} ( 1 + ( B (SNR/n_t) ∑_{r=1}^{n_r} ∑_{t=1}^{n_t} SNR^{−α̂_{b,r,t}} ) / ( B n_r + (SNR/n_t) ∑_{b=1}^{B} ∑_{r=1}^{n_r} ∑_{t=1}^{n_t} SNR^{−θ_{b,r,t}} ) )        (A.148)
      ≐ max( SNR^0, SNR^{1−α̂_{b,min}} / max( SNR^0, SNR^{1−θ_min} ) )        (A.149)
      ≐ max( SNR^0, min( SNR^0, SNR^{θ_min−1} ) × SNR^{1−α̂_{b,min}} )        (A.150)

where

    θ_{b,min} ≜ min{ θ_{b,1,1}, . . . , θ_{b,r,t}, . . . , θ_{b,n_r,n_t} },        (A.151)
    θ_min ≜ min{ θ_{1,min}, . . . , θ_{b,min}, . . . , θ_{B,min} },        (A.152)
    α̂_{b,min} ≜ min{ α̂_{b,1,1}, . . . , α̂_{b,r,t}, . . . , α̂_{b,n_r,n_t} }.        (A.153)

Define α_{b,min}, (r, t)_{α_{b,min}} and (r, t)_{θ_{b,min}} as

    α_{b,min} ≜ min{ α_{b,1,1}, . . . , α_{b,r,t}, . . . , α_{b,n_r,n_t} },        (A.154)
    (r, t)_{α_{b,min}} ≜ arg min_{r=1,...,n_r; t=1,...,n_t} α_{b,r,t},        (A.155)
    (r, t)_{θ_{b,min}} ≜ arg min_{r=1,...,n_r; t=1,...,n_t} θ_{b,r,t}.        (A.156)

We have the following cases.

1. Case 1: (r, t)_{α_{b,min}} ≠ (r, t)_{θ_{b,min}}. This refers to the case where the indices (r, t) for which the minimum occurs are different for α_{b,r,t} and θ_{b,r,t}. Clearly, we have that

    (SNR/n_t) ‖Ĥ_b‖²_F ≐ (SNR/n_t) SNR^{−α̂_{b,min}} ≐ SNR^{1−min(α_{b,min}, θ_{b,min})}.        (A.157)

It follows that

    e^{−1} ( 1 + ( B ‖Ĥ_b‖²_F SNR/n_t ) / ( B n_r + (SNR/n_t) ∑_{b=1}^{B} ‖E_b‖²_F ) )
      ≐ max( SNR^0, min( SNR^0, SNR^{θ_min−1} ) SNR^{1−α̂_{b,min}} )        (A.158)
      ≐ max( SNR^0, min( SNR^0, SNR^{θ_min−1} ) × SNR^{1−min(α_{b,min}, θ_{b,min})} )        (A.159)
      ≐ max( SNR^0, SNR^{min(1, θ_min)−min(α_{b,min}, θ_{b,min})} ).        (A.160)

2. Case 2: (r, t)_{α_{b,min}} = (r, t)_{θ_{b,min}}. This refers to the case where the indices (r, t) for which the minimum occurs are the same for both α_{b,r,t} and θ_{b,r,t}.

• Case 2.1: α_{b,min} < θ_{b,min}. We have that

    (SNR/n_t) ‖Ĥ_b‖²_F ≐ (SNR/n_t) SNR^{−α_{b,min}} ≐ SNR^{1−α_{b,min}}.        (A.161)

It follows that

    e^{−1} ( 1 + ( B ‖Ĥ_b‖²_F SNR/n_t ) / ( B n_r + (SNR/n_t) ∑_{b=1}^{B} ‖E_b‖²_F ) ) ≐ max( SNR^0, SNR^{min(1, θ_min)−α_{b,min}} ).        (A.162)

• Case 2.2: α_{b,min} > θ_{b,min}. We have that

    (SNR/n_t) ‖Ĥ_b‖²_F ≐ (SNR/n_t) SNR^{−α̂_{b,min}} ≐ SNR^{1−θ_{b,min}}.        (A.163)

If we have θ_min < 1, the dot equality can be evaluated as follows

    e^{−1} ( 1 + ( B ‖Ĥ_b‖²_F SNR/n_t ) / ( B n_r + (SNR/n_t) ∑_{b=1}^{B} ‖E_b‖²_F ) ) ≐ max( SNR^0, SNR^{θ_min−θ_{b,min}} )        (A.164)
      ≐ SNR^0        (A.165)

where the last dot equality follows from the condition θ_min ≤ θ_{b,min}. For θ_min ≥ 1, we have that

    e^{−1} ( 1 + ( B ‖Ĥ_b‖²_F SNR/n_t ) / ( B n_r + (SNR/n_t) ∑_{b=1}^{B} ‖E_b‖²_F ) ) ≐ max( SNR^0, SNR^{1−θ_{b,min}} )        (A.166)
      ≐ SNR^0        (A.167)

where the last dot equality is due to θ_{b,min} ≥ θ_min.

• Case 2.3: α_{b,min} = θ_{b,min}. From (A.142), (A.144) and (A.146), if h_{b,min} = e_{b,min}, h_{b,min} ≠ 0, or h_{b,min} = e*_{b,min}, cos(φ^h_{b,min}) ≠ 0, or −h_{b,min} = e*_{b,min}, sin(φ^h_{b,min}) ≠ 0, then we observe the same convergence results as in case 2.2. Otherwise, we have from (A.147) that SNR^{−α̂_{b,min}} = 0 and hence,

    e^{−1} ( 1 + ( B ‖Ĥ_b‖²_F SNR/n_t ) / ( B n_r + (SNR/n_t) ∑_{b=1}^{B} ‖E_b‖²_F ) ) ≐ SNR^0.        (A.168)

Note that the results in cases 2.2 and 2.3 are identical.

Summarising from the above cases, we have that

    e^{−1} ( 1 + ( B ‖Ĥ_b‖²_F SNR/n_t ) / ( B n_r + (SNR/n_t) ∑_{b=1}^{B} ‖E_b‖²_F ) ) ≐ SNR^{[min(1, θ_min)−α_{b,min}]⁺}.        (A.169)

Recall that with multiplexing gain r_g (cf. (2.57)), the data rate R(SNR) satisfies the dot equality e^{R(SNR)} ≐ SNR^{r_g}. Then, from (A.138) and (A.169), we can bound P_gout(R) as follows

    P_gout(R) = Pr{ I^{gmi}(Ĥ) < R(SNR) }        (A.170)
      ≐ SNR^{−d^G_icsir}        (A.171)
      ≤̇ Pr{ (1/B) ∑_{b=1}^{B} [ min(1, θ_min) − α_{b,min} ]⁺ < r_g }        (A.172)
      ≐ ∫_{O_G} P_{A,Φ_H}(A, Φ^H) P_Θ(Θ) P_{Φ_E}(Φ^E) dA dΘ dΦ^H dΦ^E        (A.173)

where we have defined

    O_G ≜ { A, Θ ∈ ℝ^{B×n_r×n_t} : ∑_{b=1}^{B} [ min(1, θ_min) − α_{b,min} ]⁺ < B r_g }.        (A.174)

Applying Lemma A.1 yields

    d^G_icsir ≥ inf_{O_G ∩ {A ⪰ 0, Θ ⪰ d_e·1}} { (1 + τ/2) ∑_{b=1}^{B} ∑_{r=1}^{n_r} ∑_{t=1}^{n_t} α_{b,r,t} + ∑_{b=1}^{B} ∑_{r=1}^{n_r} ∑_{t=1}^{n_t} (θ_{b,r,t} − d_e) }.        (A.175)

Since increasing θ_min increases both the infimum function and the LHS of the constraint, the optimiser of θ_min is θ*_min = d_e. Since θ_{b,r,t} ≥ θ_min, the optimisers of θ_{b,r,t} are θ*_{b,r,t} = d_e for all b, r, t. On the other hand, the infimum solution of A is given by the intersection of the region defined by ∑_{b=1}^{B} α_{b,min} > B( min(1, d_e) − r_g ) and the region defined by α_{b,min} ∈ [0, min(1, d_e)]. Since α_{b,r,t} ≥ α_{b,min} for all r = 1, . . . , n_r and t = 1, . . . , n_t, the solution to the above infimum is given by

    d^G_icsir ≥ (1 + τ/2) B n_t n_r × ( min(1, d_e) − r_g )        (A.176)

for r_g ∈ [0, min(1, d_e)] and zero otherwise. For a fixed coding rate independent of the SNR (r_g ↓ 0), we have that

    d^G_icsir ≥ min(1, d_e) × (1 + τ/2) B n_t n_r.        (A.177)

A.3.2 GMI Upper Bound

The expectation E[ s Y†Σ_y^{−1}Y | Ĥ_b = Ĥ_b, E_b = E_b ] can be evaluated as

    E[ s Y†Σ_y^{−1}Y | Ĥ_b = Ĥ_b, E_b = E_b ]
      = ∫_{x,y} s y†Σ_y^{−1}y P_X(x) P_{Y|X,H}(y|x, H_b) dx dy        (A.178)
      = ∫_{x,y} s y†Σ_y^{−1}y · (1/π^{n_r}) e^{−‖y − √(SNR/n_t) H_b x‖²} · (1/π^{n_t}) e^{−‖x‖²} dx dy        (A.179)
      = ∫_y s y†Σ_y^{−1}y · [ ∫_x (1/π^{n_r}) e^{−‖y − √(SNR/n_t) H_b x‖²} · (1/π^{n_t}) e^{−‖x‖²} dx ] dy        (A.180)
      = ∫_y ( s y†Σ_y^{−1}y / ( π^{n_r} det( (SNR/n_t) H_b H_b† + I_{n_r} ) ) ) e^{−y†Σ̄_y^{−1}y} dy        (A.181)

where

    Σ̄_y = I_{n_r} + (SNR/n_t) H_b H_b†        (A.182)

is a positive semi-definite matrix. Let ỹ = Q†y, where Q is a unitary matrix diagonalising Σ̄_y^{−1}. Then, Ỹ is a Gaussian random vector with zero mean and covariance matrix Q†Σ̄_y Q. We have that (Footnote 1.3: without loss of generality, herein we assume n_t ≥ n_r.)

    y†Σ̄_y^{−1}y = ỹ†Q†Σ̄_y^{−1}Q ỹ = ỹ†∆ỹ = ∑_{i=1}^{n_r} |ỹ_i|² / ( 1 + (SNR/n_t) λ_{b,i} )        (A.183)

where λ_{b,i} is the i-th eigenvalue of H_b H_b†, and ∆ is a diagonal matrix with diagonal elements given by ( 1 + (SNR/n_t) λ_{b,i} )^{−1}, i = 1, . . . , n_r. Since Σ_y^{−1} is a Hermitian matrix, we can apply the eigen-decomposition [86] such that

    Σ_y^{−1} = Q̂ ∆̂ Q̂†  ⇔  ∆̂ = Q̂†Σ_y^{−1}Q̂        (A.184)

where Q̂ is another unitary matrix and ∆̂ is another diagonal matrix obtained by diagonalising Σ_y^{−1}. Let λ̂_{b,i} be the i-th eigenvalue of Ĥ_b Ĥ_b†; then the diagonal entries of ∆̂ are given by ( 1 + s (SNR/n_t) λ̂_{b,i} )^{−1} for all i = 1, . . . , n_r. Applying this to y†Σ_y^{−1}y, we have that

    y†Σ_y^{−1}y = ỹ†Q†Σ_y^{−1}Q ỹ = ỹ†Q†Q̂ ∆̂ Q̂†Q ỹ = ỹ†V ∆̂ V†ỹ = ỹ†Σ̃_y^{−1}ỹ        (A.185)

where V = Q†Q̂ is also a unitary matrix and Σ̃_y^{−1} ≜ V ∆̂ V† is a Hermitian matrix. Then, we have that

    E[ s Y†Σ_y^{−1}Y | Ĥ_b = Ĥ_b, E_b = E_b ]
      = ( s / ( π^{n_r} det( (SNR/n_t) H_b H_b† + I_{n_r} ) ) ) ∫_y y†Σ_y^{−1}y e^{−y†Σ̄_y^{−1}y} dy        (A.186)
      = ( s / ( π^{n_r} det( (SNR/n_t) H_b H_b† + I_{n_r} ) ) ) · ∫_{ỹ} ( ỹ†V ∆̂ V†ỹ ) e^{−∑_{i=1}^{n_r} |ỹ_i|²/(1 + (SNR/n_t) λ_{b,i})} dỹ.        (A.187)

Let v_{i,j} and σ_{i,j} be the entries of V and Σ̃_y^{−1} at row i and column j, respectively. Then, the integral in (A.187) can be evaluated as

    ∫_{ỹ} ( ỹ†V ∆̂ V†ỹ ) e^{−∑_{i=1}^{n_r} |ỹ_i|²/(1 + (SNR/n_t) λ_{b,i})} dỹ
      = ∫_{ỹ} ( ∑_{i=1}^{n_r} σ_{i,i} |ỹ_i|² + 2 ∑_{i=1}^{n_r} ∑_{j>i} ℜ( σ_{i,j} ỹ*_i ỹ_j ) ) e^{−∑_{i=1}^{n_r} |ỹ_i|²/(1 + (SNR/n_t) λ_{b,i})} dỹ        (A.188)
      = ∑_{i=1}^{n_r} ∫_{ỹ} σ_{i,i} |ỹ_i|² e^{−∑_{i=1}^{n_r} |ỹ_i|²/(1 + (SNR/n_t) λ_{b,i})} dỹ        (A.189)
      = π^{n_r} × det( (SNR/n_t) H_b H_b† + I_{n_r} ) × ∑_{i=1}^{n_r} σ_{i,i} ( 1 + (SNR/n_t) λ_{b,i} ).        (A.190)

Here ỹ*_i denotes the complex conjugate of ỹ_i and ℜ{·} denotes the real part of a complex number. We have from (A.188) that

    σ_{i,i} = ∑_{j=1}^{n_r} |v_{i,j}|² / ( 1 + s (SNR/n_t) λ̂_{b,j} ) ≤ ∑_{j=1}^{n_r} |v_{i,j}|² = 1        (A.191)

where the inequality is because λ̂_{b,j} is non-negative; the last equality is because, for the unitary matrix V, the sum of |v_{i,j}|² over j = 1, . . . , n_r is equal to one. Finally, we have

    E[ s Y†Σ_y^{−1}Y | Ĥ_b = Ĥ_b, E_b = E_b ] = s ∑_{i=1}^{n_r} σ_{i,i} ( 1 + (SNR/n_t) λ_{b,i} ).        (A.192)
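A Monte Carlo sanity check of (A.192) (not part of the proof): Y is drawn with covariance Σ̄_y, and the empirical mean of sY†Σ_y^{−1}Y is compared against the closed form built from the σ_{i,i} of (A.191). The channel realisations and parameters below are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(3)
    n_r, n_t, snr, s = 2, 2, 10.0, 0.3
    H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
    H_hat = H + 0.1 * (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t)))

    Sigma_bar = np.eye(n_r) + snr / n_t * H @ H.conj().T               # (A.182)
    Sigma_y = np.eye(n_r) + s * snr / n_t * H_hat @ H_hat.conj().T     # (A.125)
    Sy_inv = np.linalg.inv(Sigma_y)

    # empirical average of s Y† Sigma_y^{-1} Y with Y ~ CN(0, Sigma_bar)
    n_mc = 200_000
    L = np.linalg.cholesky(Sigma_bar)
    Y = L @ ((rng.standard_normal((n_r, n_mc)) + 1j * rng.standard_normal((n_r, n_mc))) / np.sqrt(2))
    empirical = s * np.mean(np.einsum('in,ij,jn->n', Y.conj(), Sy_inv, Y).real)

    # closed form of (A.192) built from sigma_ii of (A.191)
    lam, Q = np.linalg.eigh(H @ H.conj().T)
    lam_hat, Q_hat = np.linalg.eigh(H_hat @ H_hat.conj().T)
    V = Q.conj().T @ Q_hat
    sigma_ii = (np.abs(V) ** 2) @ (1.0 / (1.0 + s * snr / n_t * lam_hat))
    closed_form = s * np.sum(sigma_ii * (1.0 + snr / n_t * lam))
    print(empirical, closed_form)   # the two values should agree closely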

Let s* be the optimising s that gives the supremum on the RHS of (A.123). Since s, σ_{i,i} and λ_{b,i} are all non-negative, we upper-bound I_b^{gmi}(SNR, H_b, Ĥ_b, s*) using Proposition 2.3 and (A.191) as follows

    I_b^{gmi}(SNR, H_b, Ĥ_b, s*)
      ≤ sup_{s_b>0} { log det( s_b (SNR/n_t) Ĥ_b Ĥ_b† + I_{n_r} ) − s_b( n_r + (SNR/n_t) ‖E_b‖²_F ) + s_b ∑_{i=1}^{n_r} ( 1 + (SNR/n_t) λ_{b,i} ) }        (A.193)
      = sup_{s_b>0} { log ∏_{i=1}^{n_r} ( 1 + s_b (SNR/n_t) λ̂_{b,i} ) − s_b( n_r + (SNR/n_t) ‖E_b‖²_F ) + s_b( n_r + (SNR/n_t) ‖H_b‖²_F ) }        (A.194)
      ≤ sup_{s_b>0} { n_r log( 1 + s_b (SNR/n_t) ‖Ĥ_b‖²_F ) + s_b( (SNR/n_t) ‖H_b‖²_F − (SNR/n_t) ‖E_b‖²_F ) }        (A.195)

where the last inequality is because ∑_i λ̂_{b,i} = ‖Ĥ_b‖²_F, and thus each λ̂_{b,i} is upper-bounded by ‖Ĥ_b‖²_F. If SNR‖H_b‖²_F is greater than or equal to SNR‖E_b‖²_F, then the supremum on the RHS of (A.195) is achieved with s_b ↑ ∞ because the RHS of (A.195) is a strictly increasing function of s_b. However, using the data-processing inequality (Proposition 2.4), we can always bound I_b^{gmi}(SNR, H_b, Ĥ_b, s*) with the perfect-CSIR bound

    I_b^{gmi}(SNR, H_b, Ĥ_b, s*) ≤ log det( (SNR/n_t) H_b H_b† + I_{n_r} )        (A.196)
      ≤ n_r log( 1 + (SNR/n_t) ‖H_b‖²_F ).        (A.197)

On the other hand, if SNR‖H_b‖²_F is less than SNR‖E_b‖²_F, the supremum is achieved with s*_b given by

    s*_b = [ n_r / ( (SNR/n_t) ‖E_b‖²_F − (SNR/n_t) ‖H_b‖²_F ) − 1 / ( (SNR/n_t) ‖Ĥ_b‖²_F ) ]⁺.        (A.198)

The above s*_b ≥ 0 is obtained by setting to zero the first-order derivative of

    n_r log( 1 + s_b (SNR/n_t) ‖Ĥ_b‖²_F ) + s_b( (SNR/n_t) ‖H_b‖²_F − (SNR/n_t) ‖E_b‖²_F )        (A.199)

with respect to s_b.

We continue the analysis by using the change of random variables as used in the GMI lower bound (Appendix A.3.1). The condition SNR‖H_b‖²_F ≥ SNR‖E_b‖²_F for the perfect-CSIR bound implies that at high SNR, we have the following dot inequality

    SNR^{1−α_{b,min}} ≥̇ SNR^{1−θ_{b,min}}.        (A.200)

We then have the following asymptotic characterisations.

1. Case 1: α_{b,min} ≤ θ_{b,min}. From the perfect-CSIR bound (A.197), we have that

    1 + (SNR/n_t) ‖H_b‖²_F ≐ max( SNR^0, SNR^{1−α_{b,min}} ).        (A.201)

2. Case 2: α_{b,min} > θ_{b,min}. If 1/( (SNR/n_t) ‖Ĥ_b‖²_F ) is greater than or equal to n_r/( (SNR/n_t) ‖E_b‖²_F − (SNR/n_t) ‖H_b‖²_F ), we have s*_b ↓ 0. From the RHS of (A.195), this yields

    exp( s*_b (SNR/n_t) ‖H_b‖²_F − s*_b (SNR/n_t) ‖E_b‖²_F ) × ( 1 + s*_b (SNR/n_t) ‖Ĥ_b‖²_F )^{n_r} ≐ SNR^0.        (A.202)

Otherwise, we have

    s*_b = n_r / ( (SNR/n_t) ‖E_b‖²_F − (SNR/n_t) ‖H_b‖²_F ) − 1 / ( (SNR/n_t) ‖Ĥ_b‖²_F )        (A.203)

and this also yields

    exp( s*_b (SNR/n_t) ‖H_b‖²_F − s*_b (SNR/n_t) ‖E_b‖²_F ) × ( 1 + s*_b (SNR/n_t) ‖Ĥ_b‖²_F )^{n_r} ≐ SNR^0.        (A.204)

Recall the rate and multiplexing gain relationship e^{R(SNR)} ≐ SNR^{r_g} (cf. (2.57)). From the above cases, we have the bound for P_gout(R) as follows

    P_gout(R) ≐ SNR^{−d^G_icsir}        (A.205)
      ≥̇ Pr{ (1/B) ∑_{b=1}^{B} n_r [ 1 − α_{b,min} ]⁺ · 1{ α_{b,min} ≤ θ_{b,min} } < r_g }        (A.206)
      ≐ ∫_{Ō_G} P_{A,Φ_H}(A, Φ^H) P_Θ(Θ) P_{Φ_E}(Φ^E) dA dΘ dΦ^H dΦ^E        (A.207)

where we have defined

    Ō_G ≜ { A, Θ ∈ ℝ^{B×n_r×n_t} : ∑_{b=1}^{B} [ 1 − α_{b,min} ]⁺ · 1{ α_{b,min} ≤ θ_{b,min} } < B r_g / n_r }.        (A.208)

Thus, applying Lemma A.1 to find the SNR-exponent and following the same steps used for the GMI lower bound, it is not difficult to prove that

    d^G_icsir ≤ (1 + τ/2) B n_t n_r min( 1 − r_g/n_r, d_e ).        (A.209)

For fixed rate independent of the SNR (r_g ↓ 0), we obtain

    d^G_icsir ≤ min(1, d_e) (1 + τ/2) B n_t n_r.        (A.210)

This proves Theorem 3.1 for Gaussian inputs.


A.4 Proof of Theorem 3.2

Recall that from (2.25), the generalised Gallager function for MIMO channels can be written as (in natural-base log)

    E^Q_0(s, ρ, Ĥ_b) = −log E[ ( ∫_{x'} P_X(x') ( Q_{Y|X,Ĥ}(Y|x', Ĥ_b) / Q_{Y|X,Ĥ}(Y|X, Ĥ_b) )^s dx' )^ρ | Ĥ_b = Ĥ_b, E_b = E_b ].        (A.211)

Evaluating the inner integral over x' for Y = y, X = x, Ĥ_b = Ĥ_b and E_b = E_b, we have that

    ∫_{x'} P_X(x') ( Q_{Y|X,Ĥ}(y|x', Ĥ_b) / Q_{Y|X,Ĥ}(y|x, Ĥ_b) )^s dx' = e^{s‖y − √(SNR/n_t) Ĥ_b x‖²} · e^{−s y†Σ_y^{−1}y} / det( s Ĥ_b Ĥ_b† SNR/n_t + I_{n_r} )        (A.212)

where

    Σ_y = s Ĥ_b Ĥ_b† SNR/n_t + I_{n_r}.        (A.213)

Then, the expectation over (X, Y) is given as

    E[ ( ∫_{x'} P_X(x') ( Q_{Y|X,Ĥ}(Y|x', Ĥ_b) / Q_{Y|X,Ĥ}(Y|X, Ĥ_b) )^s dx' )^ρ | Ĥ_b = Ĥ_b, E_b = E_b ]
      = E[ e^{ρs‖Y − √(SNR/n_t) Ĥ_b X‖²} × e^{−ρs Y†Σ_y^{−1}Y} | Ĥ_b = Ĥ_b, E_b = E_b ] / det( s Ĥ_b Ĥ_b† SNR/n_t + I_{n_r} )^ρ.        (A.214)

We can evaluate the expectation as follows. For a function f(x, y), the expectation over (X, Y) is given by

    E[ f(X, Y) ] = ∫_{x,y} f(x, y) P_X(x) P_{Y|X}(y|x) dy dx.        (A.215)

We first apply the integration over y

    ∫_y ( e^{ρs‖y − √(SNR/n_t) Ĥ_b x‖²} · e^{−ρs y†Σ_y^{−1}y} ) · (1/π^{n_r}) e^{−‖y − √(SNR/n_t) H_b x‖²} dy.        (A.216)

Using ỹ = Q̂†y, we have that

    y†Σ_y^{−1}y = ỹ†Q̂†Σ_y^{−1}Q̂ ỹ = ỹ†∆̂ỹ = ∑_{i=1}^{n_r} |ỹ_i|² / ( 1 + s λ̂_{b,i} SNR/n_t )        (A.217)

where Q̂ is the unitary matrix diagonalising Σ_y^{−1}, identical to that defined in Appendix A.3, and where, without loss of generality, we have assumed n_t ≥ n_r. Note that

    ‖y − √(SNR/n_t) Ĥ_b x‖² = ‖Q̂ỹ − √(SNR/n_t) Ĥ_b x‖² = ‖ỹ − √(SNR/n_t) Q̂†Ĥ_b x‖²        (A.218)

because multiplication with a unitary matrix does not affect the Euclidean norm of a vector. Therefore, we have that

    ∫_y e^{ρs‖y − √(SNR/n_t) Ĥ_b x‖²} · e^{−ρs y†Σ_y^{−1}y} · (1/π^{n_r}) e^{−‖y − √(SNR/n_t) H_b x‖²} dy
      = ∫_{ỹ} e^{ρs‖ỹ − √(SNR/n_t) Q̂†Ĥ_b x‖²} · e^{−ρs ỹ†∆̂ỹ} · (1/π^{n_r}) e^{−‖ỹ − √(SNR/n_t) Q̂†H_b x‖²} dỹ        (A.219)
      ≤ ( 1 / ( 1 − ( ρs/(1−ρs) ) (SNR/n_t) ‖E_b‖²_F )^{n_t} ) · ∏_{i=1}^{n_r} ( ( 1 + s λ̂_{b,i} SNR/n_t ) / ( 1 + s λ̂_{b,i} (SNR/n_t)(1−ρs) ) )        (A.220)
      = ( (1−ρs) / ( 1 − ρs − ρs (SNR/n_t) ‖E_b‖²_F ) )^{n_t} · ∏_{i=1}^{n_r} ( ( 1 + s λ̂_{b,i} SNR/n_t ) / ( 1 + s λ̂_{b,i} (SNR/n_t)(1−ρs) ) )        (A.221)

where inequality (A.220) is proved in Appendix A.5. Note that the result in (A.220) requires ρs < 1 and

    s ≤ u_1 / ( u_2 + (SNR/n_t) ‖E_b‖²_F )        (A.222)

—where u_1 and u_2 are some positive constants—so that the integral can be evaluated. We then have that

    E[ ( ∫_{x'} P_X(x') ( Q_{Y|X,Ĥ}(Y|x', Ĥ_b) / Q_{Y|X,Ĥ}(Y|X, Ĥ_b) )^s dx' )^ρ | Ĥ_b = Ĥ_b, E_b = E_b ]
      ≤ ( 1 / det( s Ĥ_b Ĥ_b† SNR/n_t + I_{n_r} )^ρ ) · ( (1−ρs) / ( 1 − ρs − ρs (SNR/n_t) ‖E_b‖²_F ) )^{n_t} · ∏_{i=1}^{n_r} ( ( 1 + s λ̂_{b,i} SNR/n_t ) / ( 1 + s λ̂_{b,i} (SNR/n_t)(1−ρs) ) )        (A.223)

and from (A.211)

    E^Q_0(s, ρ, Ĥ_b) ≥ ∑_{i=1}^{n_r} (ρ − 1) log( 1 + s λ̂_{b,i} SNR/n_t ) − n_t log(1 − ρs) + n_t log( 1 − ρs − ρs (SNR/n_t) ‖E_b‖²_F ) + ∑_{i=1}^{n_r} log( 1 + s λ̂_{b,i} (SNR/n_t)(1 − ρs) ).        (A.224)

Note that the random coding error exponent is given by

    E^Q_r(R, Ĥ) = sup_{s>0, 0≤ρ≤1} (1/B) ∑_{b=1}^{B} E^Q_0(s, ρ, Ĥ_b) − ρR.        (A.225)

A lower bound to E^Q_r(R, Ĥ) can be obtained by replacing E^Q_0(s, ρ, Ĥ_b) with the RHS of (A.224), and is given by

    E^Q_r(R, Ĥ) ≥ sup_{s>0, 0≤ρ≤1} (1/B) ∑_{b=1}^{B} { ∑_{i=1}^{n_r} (ρ − 1) log( 1 + s λ̂_{b,i} SNR/n_t ) − n_t log(1 − ρs) + n_t log( 1 − ρs − ρs (SNR/n_t) ‖E_b‖²_F ) + ∑_{i=1}^{n_r} log( 1 + s λ̂_{b,i} (SNR/n_t)(1 − ρs) ) } − ρR.        (A.227)

Note that from (A.227), we require ρs < 1 and ρs + ρs (SNR/n_t) ‖E_b‖²_F < 1 for all b = 1, . . . , B so that the logarithm functions are defined. Note that the following

choices of ρ = ρ̄ = 1 and

    s = s̄ = 1 / ( n_r ( 1 + (SNR/n_t) ∑_{b=1}^{B} ‖E_b‖²_F ) )        (A.228)

satisfy (A.222) and ensure that the logarithm functions in (A.227) are always defined. Since ρ̄s̄ and ρ̄s̄ + ρ̄s̄ (SNR/n_t) ‖E_b‖²_F are always bounded by real-valued constants in the interval [0, 1], we have that for SNR ≥ 0

    −n_t log(1 − ρ̄s̄) + n_t log( 1 − ρ̄s̄ − ρ̄s̄ (SNR/n_t) ‖E_b‖²_F ) ≥ n_t log( 1 − 1/n_r ) ≜ u_3.        (A.229)

Note that choosing specific values of ρ and s further lower-bounds (A.227). Since E^Q_r(R, Ĥ) can also be lower-bounded by 0 (i.e., ρ = 0), we have, by substituting ρ̄ and s̄ for ρ and s, the following lower bounds

    E^Q_r(R, Ĥ) ≥ [ (1/B) ∑_{b=1}^{B} ( u_3 + ∑_{i=1}^{n_r} (ρ̄ − 1) log( 1 + s̄ λ̂_{b,i} SNR/n_t ) + ∑_{i=1}^{n_r} log( 1 + s̄ λ̂_{b,i} (SNR/n_t)(1 − ρ̄s̄) ) ) − ρ̄R ]⁺        (A.230)
      ≥ [ (1/B) ∑_{b=1}^{B} log( e^{u_3} · ( 1 + s̄ ‖Ĥ_b‖²_F (SNR/n_t)(1 − s̄) ) ) − R ]⁺        (A.231)
      ≜ Ē^Q_r(R, Ĥ).        (A.232)
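A short numerical sketch of the lower bound Ē^Q_r(R, Ĥ) in (A.231)–(A.232), evaluated with ρ̄ = 1 and the s̄ of (A.228); all channel, error and rate values below are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(4)
    B, n_r, n_t, snr, R = 2, 2, 2, 100.0, 1.0       # R in nats per channel use
    H_hat = [(rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
             for _ in range(B)]
    E = [0.05 * (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t)))
         for _ in range(B)]

    s_bar = 1.0 / (n_r * (1.0 + snr / n_t * sum(np.linalg.norm(Eb, 'fro')**2 for Eb in E)))  # (A.228)
    u3 = n_t * np.log(1.0 - 1.0 / n_r)                                                       # (A.229)
    inner = np.mean([u3 + np.log(1.0 + s_bar * np.linalg.norm(Hb, 'fro')**2 * snr / n_t * (1.0 - s_bar))
                     for Hb in H_hat])
    E_r_bar = max(inner - R, 0.0)                   # the [.]^+ in (A.231)
    print(E_r_bar)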

Note that inequality (A.231) is due to the lower-bounding technique in (A.131).

Following the high-SNR analysis in Appendix A.3.1 (cases 1 and 2), we obtain

the dot equality

    e^{u_3} · ( 1 + s̄ ‖Ĥ_b‖²_F (SNR/n_t)(1 − s̄) ) ≐ SNR^{[min(1, θ_min) − α_{b,min}]⁺}.        (A.233)

Recall the rate and multiplexing gain relationship e^{R(SNR)} ≐ SNR^{r_g} (cf. (2.57)). It follows from Ē^Q_r(R, Ĥ) (the RHS of (A.231)), (A.233) and the dot equality e^{R(SNR)} ≐ SNR^{r_g} that, for high SNR, if the following event

    A_G = { A, Θ ∈ ℝ^{B×n_r×n_t} : ∑_{b=1}^{B} [ min(1, θ_min) − α_{b,min} ]⁺ ≤ B r_g }        (A.234)

occurs, then Ē^Q_r(R, Ĥ) = 0, and if the complementary event

    A^c_G = { A, Θ ∈ ℝ^{B×n_r×n_t} : ∑_{b=1}^{B} [ min(1, θ_min) − α_{b,min} ]⁺ > B r_g }        (A.235)

occurs, then Ē^Q_r(R, Ĥ) > 0. Therefore, we can upper-bound the average error probability of Gaussian random codes as follows

    P_{e,ave} ≤ E[ e^{−BJ Ē^Q_r(R, Ĥ)} ]        (A.236)
      ≤̇ ∫_{A_G ∩ {A ⪰ 0, Θ ⪰ d_e·1}} SNR^{−(1+τ/2) ∑_{b=1}^{B} ∑_{r=1}^{n_r} ∑_{t=1}^{n_t} α_{b,r,t}} × SNR^{∑_{b=1}^{B} ∑_{r=1}^{n_r} ∑_{t=1}^{n_t} (d_e − θ_{b,r,t})} dA dΘ dΦ^H dΦ^E
        + ∫_{A^c_G ∩ {A ⪰ 0, Θ ⪰ d_e·1}} SNR^{−(1+τ/2) ∑_{b=1}^{B} ∑_{r=1}^{n_r} ∑_{t=1}^{n_t} α_{b,r,t}} · SNR^{∑_{b=1}^{B} ∑_{r=1}^{n_r} ∑_{t=1}^{n_t} (d_e − θ_{b,r,t})} × SNR^{−J( ∑_{b=1}^{B} [min(1, θ_min) − α_{b,min}]⁺ − B r_g )} dA dΘ dΦ^H dΦ^E        (A.237)
      ≐ K_1 SNR^{−d_1(r_g)} + K_2 SNR^{−d_2(r_g)}        (A.238)
      ≐ SNR^{−d_2(r_g)}        (A.239)

where K_1, K_2 ≐ SNR^0, and where

    d_1(r_g) = (1 + τ/2) B n_t n_r × ( min(1, d_e) − r_g )        (A.240)

is a lower bound to the generalised outage SNR-exponent achieved with infinite block length (notice that O_G in Appendix A.3 is similar to A_G except for the inequality <, which becomes ≤ in A_G) and

    d_2(r_g) = inf_{A^c_G ∩ {A ⪰ 0, Θ ⪰ d_e·1}} { (1 + τ/2) ∑_{b=1}^{B} ∑_{r=1}^{n_r} ∑_{t=1}^{n_t} α_{b,r,t} + ∑_{b=1}^{B} ∑_{r=1}^{n_r} ∑_{t=1}^{n_t} (θ_{b,r,t} − d_e) + J( ∑_{b=1}^{B} [min(1, θ_min) − α_{b,min}]⁺ − B r_g ) }.        (A.241)

Since we need

    J( ∑_{b=1}^{B} [min(1, θ_min) − α_{b,min}]⁺ − B r_g ) > 0        (A.242)

in the set A^c_G for d_2(r_g), it is straightforward to deduce that d_2(r_g) ≤ d_1(r_g), which follows from [20, Lemma 6]. Therefore, the SNR-exponent lower bound for a given block length J is given by d^ℓ_G(r_g) = d_2(r_g) in Theorem 3.2.

A.5 Proof of Inequality (A.220)

Basically, we want to evaluate the following expectation over X

    E[ ∫_y e^{ρs‖y − √(SNR/n_t) Ĥ_b X‖²} · e^{−ρs y†Σ_y^{−1}y} · (1/π^{n_r}) e^{−‖y − √(SNR/n_t) H_b X‖²} dy | Ĥ_b = Ĥ_b, E_b = E_b ]
      = E[ ∫_{ỹ} e^{ρs‖ỹ − √(SNR/n_t) Q̂†Ĥ_b X‖²} · e^{−ρs ỹ†∆̂ỹ} · (1/π^{n_r}) e^{−‖ỹ − √(SNR/n_t) Q̂†H_b X‖²} dỹ | Ĥ_b = Ĥ_b, E_b = E_b ].        (A.243)

To simplify the presentation, we let g = √(SNR/n_t) Q̂†Ĥ_b x and c = √(SNR/n_t) Q̂†H_b x, with g, c ∈ ℂ^{n_r}. Then, expanding the argument in the exponential term for X = x,

we have that

    ρs‖ỹ − √(SNR/n_t) Q̂†Ĥ_b x‖² − ρs ỹ†∆̂ỹ − ‖ỹ − √(SNR/n_t) Q̂†H_b x‖²
      = ρs ∑_{i=1}^{n_r} ( |ỹ_i − g_i|² − |ỹ_i|² / ( 1 + s λ̂_{b,i} SNR/n_t ) ) − ∑_{i=1}^{n_r} |ỹ_i − c_i|².        (A.244)

By basic integration, we can easily obtain that

    ∫_{ỹ_i} (1/π) exp( −|ỹ_i − c_i|² − ρs |ỹ_i|² / ( 1 + s λ̂_{b,i} SNR/n_t ) + ρs |ỹ_i − g_i|² ) dỹ_i
      = ( ( 1 + s λ̂_{b,i} SNR/n_t ) / ( 1 + s λ̂_{b,i} (SNR/n_t)(1 − ρs) ) ) × exp( −|c_i|² + ρs|g_i|² + |ρs g_i − c_i|² · ( 1 + s λ̂_{b,i} SNR/n_t ) / ( 1 + s λ̂_{b,i} (SNR/n_t)(1 − ρs) ) ).        (A.245)
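A Monte Carlo check of the one-dimensional complex-Gaussian integral (A.245); writing A = sλ̂_{b,i}SNR/n_t and p = ρs, all numerical values below are arbitrary illustrative assumptions (with p < 1 so that the integral exists).

    import numpy as np

    rng = np.random.default_rng(5)
    A, p = 1.5, 0.3
    c_i, g_i = 0.7 - 0.2j, -0.3 + 0.5j

    # importance sampling with Y ~ CN(c_i, 1), whose density is the factor (1/pi) e^{-|y-c_i|^2}
    n_mc = 2_000_000
    Y = c_i + (rng.standard_normal(n_mc) + 1j * rng.standard_normal(n_mc)) / np.sqrt(2)
    mc = np.mean(np.exp(-p * np.abs(Y) ** 2 / (1.0 + A) + p * np.abs(Y - g_i) ** 2))

    # closed form on the right-hand side of (A.245)
    ratio = (1.0 + A) / (1.0 + A * (1.0 - p))
    closed = ratio * np.exp(-abs(c_i) ** 2 + p * abs(g_i) ** 2 + abs(p * g_i - c_i) ** 2 * ratio)
    print(mc, closed)   # should agree to within a few per cent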

Evaluating the integral for all ỹ_i, i = 1, . . . , n_r, yields

    ( ∏_{i=1}^{n_r} ( 1 + s λ̂_{b,i} SNR/n_t ) / ( 1 + s λ̂_{b,i} (SNR/n_t)(1 − ρs) ) ) × exp( −∑_{i=1}^{n_r} |c_i|² + ρs ∑_{i=1}^{n_r} |g_i|² + ∑_{i=1}^{n_r} ( |ρs g_i − c_i|² · ( 1 + s λ̂_{b,i} SNR/n_t ) / ( 1 + s λ̂_{b,i} (SNR/n_t)(1 − ρs) ) ) ).        (A.246)

Note that

    ∑_{i=1}^{n_r} |c_i|² = ‖√(SNR/n_t) Q̂†H_b x‖² = (SNR/n_t) ‖H_b x‖²        (A.247)
    ∑_{i=1}^{n_r} |g_i|² = ‖√(SNR/n_t) Q̂†Ĥ_b x‖² = (SNR/n_t) ‖Ĥ_b x‖²        (A.248)

because Q̂ is a unitary matrix that does not change the Euclidean norm of a vector. This removes the difficulty of obtaining the exact expression for Q̂. On the other hand, the last term in the exponential function in (A.246) is difficult to evaluate as the summation involves the variable λ̂_{b,i}. Herein we have to impose an additional condition such that the last term in (A.246) can be evaluated. Suppose that we restrict ρs < 1 with strict inequality for 0 ≤ ρ ≤ 1 and s > 0.

Then, for λ̂_{b,i} ≥ 0 and SNR ≥ 0, we have the bounds

    1 ≤ ( 1 + s λ̂_{b,i} SNR/n_t ) / ( 1 + s λ̂_{b,i} (SNR/n_t)(1 − ρs) ) ≤ 1/(1 − ρs).        (A.249)

Hence, we can upper-bound the last term in (A.246) as follows

    ∑_{i=1}^{n_r} |ρs g_i − c_i|² ( ( 1 + s λ̂_{b,i} SNR/n_t ) / ( 1 + s λ̂_{b,i} (SNR/n_t)(1 − ρs) ) ) ≤ ( 1/(1 − ρs) ) ∑_{i=1}^{n_r} |ρs g_i − c_i|²        (A.250)
      = ( 1/(1 − ρs) ) (SNR/n_t) ‖Q̂†( ρs Ĥ_b x − H_b x )‖²        (A.251)
      = ( 1/(1 − ρs) ) (SNR/n_t) ‖ρs Ĥ_b x − H_b x‖²        (A.252)

where the last equality is due to the unitary matrix Q̂†, which does not affect the Euclidean norm of a vector. This removes the dependency on Q̂†. By combining (A.247)–(A.252), we upper-bound the expectation over X of the exponential function in (A.246) with the following

    E[ exp( −(SNR/n_t) { ‖H_b X‖² − ρs ‖Ĥ_b X‖² − ( 1/(1 − ρs) ) ‖( ρs Ĥ_b − H_b ) X‖² } ) ]
      = ∫_x exp( −(SNR/n_t) { ‖H_b x‖² − ρs ‖Ĥ_b x‖² − ( 1/(1 − ρs) ) ‖( ρs Ĥ_b − H_b ) x‖² } ) × (1/π^{n_t}) e^{−‖x‖²} dx        (A.253)
      = (1/π^{n_t}) ∫_x exp( ( ρs/(1 − ρs) ) (SNR/n_t) ‖E_b x‖² − ‖x‖² ) dx        (A.254)
      ≤ (1/π^{n_t}) ∫_x exp( ( ρs/(1 − ρs) ) (SNR/n_t) ‖E_b‖²_F ‖x‖² − ‖x‖² ) dx        (A.255)
      = 1 / ( 1 − ( ρs/(1 − ρs) ) (SNR/n_t) ‖E_b‖²_F )^{n_t}        (A.256)

where the last inequality is due to ‖E_b x‖² ≤ ‖E_b‖²_F ‖x‖² [86, Sec. 5.6]. Note that the integrand in (A.255) is integrable if ( ρs/(1 − ρs) ) (SNR/n_t) ‖E_b‖²_F < 1. At high SNR, there exist positive constants u_1 and u_2 for which s ≤ u_1/( u_2 + (SNR/n_t) ‖E_b‖²_F ) guarantees that (A.255) is integrable.

A.6 Proof of Proposition 3.1

From (A.230) and (A.231) with ρ̄ = 1, we can rewrite the lower bounds of the mismatched decoding error exponent as follows

    E^Q_r(R, Ĥ) ≥ [ (1/B) ∑_{b=1}^{B} ( u_3 + ∑_{i=1}^{n_r} log( 1 + s̄ λ̂_{b,i} (SNR/n_t)(1 − s̄) ) ) − R ]⁺        (A.257)
      ≥ [ (1/B) ∑_{b=1}^{B} log( e^{u_3} · ( 1 + s̄ ‖Ĥ_b‖²_F (SNR/n_t)(1 − s̄) ) ) − R ]⁺        (A.258)

where u_3 < 0 and

    s̄ = 1 / ( n_r ( 1 + (SNR/n_t) ∑_{b=1}^{B} ‖E_b‖²_F ) ).        (A.259)

We have used the last inequality above to derive the block length threshold for Gaussian inputs in Theorem 3.2. The results are general for the fading model (3.4). However, the last inequality implies a looser achievability bound and the resulting block length threshold may not be tight.

A tighter bound can be obtained by using the inequality (A.257). This requires the joint density function of the random vector Λ̂_b and the entries of E_b. Note that conditioned on E_b = E_b, Ĥ_b has the same distribution and covariance as H_b but with the mean shifted by E_b. From (3.4), the conditional distribution of each channel estimate entry, Ĥ_{b,r,t}, is given by

    P_{Ĥ_{b,r,t}|E_{b,r,t}}(ĥ|e) = w_0 |ĥ − e|^τ e^{−w_1 |ĥ − e − w_2|^ϕ}.        (A.260)

The characterisation of the above pdf is difficult when τ ≠ 0. At high SNR, the near-zero behaviour determines the dominating term in the pdf [20, 47]. Note that for τ ≠ 0, the near-zero behaviour of the pdf is determined by the values of ĥ, e in |ĥ − e|^τ. This behaviour depends not only on |ĥ|^τ but also on |e|^τ and the angles of ĥ and e. These interplaying variables make the near-zero behaviour of the pdf intractable. On the other hand, when τ = 0, the variable e only affects the exponential term, which in many cases tends to decay exponentially or converges to a constant for high SNR (see also [47, 48]).

Considering τ = 0 and assuming n_t ≥ n_r, we perform a change of random variables from the matrix entries in Ĥ_b to its eigenvalues λ̂_{b,i} for all i = 1, . . . , n_r. Since the entries of the channel matrix are assumed to be i.i.d., the pdf of Ĥ_b for a given E_b is given by

    P_{Ĥ_b|E_b}( Ĥ_b | E_b ) = ∏_{r=1}^{n_r} ∏_{t=1}^{n_t} w_0 e^{−w_1 |ĥ_{b,r,t} − e_{b,r,t} − w_2|^ϕ}.        (A.261)

Using the singular value decomposition Ĥ_b = U D S and the eigen-decomposition in the form Ĥ_b Ĥ_b† = U Σ U† [86], random matrix results [107, 108] provide the joint distribution of the ordered eigenvalues in the following form [47, 48]

    P_{Λ̂_b|E_b}( λ̂_b | E_b ) = C_{n,m} ∏_{i=1}^{n_r} λ̂_{b,i}^{n_t−n_r} · ∏_{i<j} ( λ̂_{b,i} − λ̂_{b,j} )² · ∫_{V_{n_r,n_r}} ∫_{V_{n_r,n_t}} P_{Ĥ_b|E_b}( U D S | E_b ) dS dU        (A.262)

where C_{n,m} is the normalising constant and V_{n_r,n_r} and V_{n_r,n_t} are the complex Stiefel manifolds [107, 108]. Remark that Σ = diag[λ̂_{b,1}, . . . , λ̂_{b,n_r}] and D = diag[λ̂^{1/2}_{b,1}, . . . , λ̂^{1/2}_{b,n_r}] with λ̂_{b,1} ≤ · · · ≤ λ̂_{b,n_r}.

Let υ_{b,i} = −log λ̂_{b,i}/log SNR. Using this change of variable, (A.261) and (A.262), we can write the above pdf for τ = 0 as done in [48]

    P_{Υ_b|E_b}( υ_b | E_b ) = C_{n,m} · (log SNR)^{n_r} · ∏_{i=1}^{n_r} SNR^{−(n_t−n_r+1) υ_{b,i}} · ∏_{i<j} ( SNR^{−υ_{b,i}} − SNR^{−υ_{b,j}} )² · ( ∫_{V_{n_r,n_r}} ∫_{V_{n_r,n_t}} w_0^{n_r n_t} e^{−w_1 ( ‖Ĥ_b − E_b − W_2‖_ϕ )^ϕ} dS dU )        (A.263)

where W_2 is an n_r × n_t matrix with all elements equal to w_2 and ‖·‖_ϕ is the ϕ-norm [86]. As we deal with the achievability bound, it suffices to find a tight upper bound for the pdf. Note that since ℂ^{n_r×n_t} is a finite-dimensional complex space, all norms on ℂ^{n_r×n_t} are equivalent [86]. (Footnote 1.4: the equivalence of norms can be explained as follows. Given a finite-dimensional space ℂ^{m×n} and a matrix X ∈ ℂ^{m×n}, there exist positive real numbers P and S independent of X such that P‖X‖_{p'} ≤ ‖X‖_p ≤ S‖X‖_{p'} [86].) Thus, we can find a positive

number u_4 > 0 such that the term in the exponent can be lower-bounded as

    ‖Ĥ_b − E_b − W_2‖_ϕ ≥ u_4 ‖Ĥ_b − E_b − W_2‖_F.        (A.264)

Applying the backward triangle inequality for the matrix norm, we have that

    ‖Ĥ_b − E_b − W_2‖_F ≥ | ‖Ĥ_b‖_F − ‖E_b + W_2‖_F |        (A.265)
      = | √( ∑_{i=1}^{n_r} λ̂_{b,i} ) − √( ∑_{r=1}^{n_r} ∑_{t=1}^{n_t} |e_{b,r,t} + w_2|² ) |        (A.266)
      = | √( ∑_{i=1}^{n_r} SNR^{−υ_{b,i}} ) − √( ∑_{r=1}^{n_r} ∑_{t=1}^{n_t} | SNR^{−θ_{b,r,t}/2} e^{ıφ^e_{b,r,t}} + w_2 |² ) |.        (A.267)

Since ϕ ≥ 1 by the definition (3.4), we can lower-bound (A.263) using

    ( ‖Ĥ_b − E_b − W_2‖_ϕ )^ϕ ≥ u_4^ϕ ( ‖Ĥ_b − E_b − W_2‖_F )^ϕ        (A.268)
      ≥ u_4^ϕ | ‖Ĥ_b‖_F − ‖E_b + W_2‖_F |^ϕ        (A.269)

which follows from the monotonicity of the function f(u) = u^ϕ over the interval u > 0. Remark that the conditional pdf in (A.262) is conditioned on E_b. We can write the joint density function of Λ̂_b, E_b as follows

    P_{Λ̂_b,E_b}( λ̂_b, E_b ) = P_{Λ̂_b|E_b}( λ̂_b | E_b ) P_{E_b}(E_b).        (A.270)

The density P_{Λ̂_b|E_b}(λ̂_b|E_b) P_{E_b}(E_b) can be further expanded as

    P_{Λ̂_b|E_b}( λ̂_b | E_b ) P_{E_b}(E_b) = P_{Λ̂_b|{E_{b,r,t}}}( λ̂_b | {e_{b,r,t}} ) ∏_{r=1}^{n_r} ∏_{t=1}^{n_t} P_{E_{b,r,t}}(e_{b,r,t})        (A.271)

where {e_{b,r,t}} denotes the collection of e_{b,r,t} for all r, t. Equality (A.271) holds since the matrix E_b can be completely expressed in terms of its entries e_{b,r,t}, r = 1, . . . , n_r, t = 1, . . . , n_t. Note that the entries of the random matrix E_b are i.i.d. random variables and, for each entry, the phase Φ^E_{b,r,t} is independent of the magnitude |E_{b,r,t}| and uniformly distributed over [0, 2π). Hence, applying

the transformation of the variables λb,i and |eb,r,t|2 to υb,i = − log λb,i

log SNRand θb,r,t =

− log |eb,r,t|2logSNR

, we have the joint pdf of Υb, Θb,r,t and ΦEb,r,t, r = 1, . . . , nr, t = 1, . . . , nt

as follows

PΥb,Θb,r,t,ΦE

b,r,t(

υb, θb,r,t, φeb,r,t

)

= PΥb|Θb,r,t,ΦE

b,r,t(

υb| θb,r,t, φeb,r,t

)

·nr∏

r=1

nt∏

t=1

PΘb,r,t(θb,r,t)PΦE

b,r,t(φe

b,r,t).

(A.272)

We continue the analysis from (A.271) and (A.272). Note that the term

PΥb|Θb,r,t,ΦE

b,r,t(υb|θb,r,t, φe

b,r,t) (A.273)

can be further upper-bounded using (A.269). Using this bound, we then group

the exponential terms as follows

exp

− w1uϕ4

nr∑

i=1

SNR−υb,i−

nr∑

r=1

nt∑

t=1

SNR− θb,r,t

2 eıφeb,r,t + w2

2

ϕ

−nr∑

r=1

nt∑

t=1

SNR(de−θb,r,t)

. (A.274)

As the SNR increases, the behaviour of this exponential term is dominated by the

smallest values of υb,i, i = 1, . . . , nr and θb,r,t, r = 1, . . . , nr, t = . . . , nt. Since the

eigenvalues λb,1, . . . , λb,nt are ordered in a non-decreasing order, the dominating

terms are indicated by υb,nr and θb,min. We have the following observations.

1. For the variables inside the modulus operator, | · |ϕ, if υb,nr ≥ 0 and θb,min ≥0, the terms inside the modulus are converging to some constant w′

2 as the

SNR increases and the convergence of the exponential term is determined

by

−w1uϕ4 |w′

2|ϕ −nr∑

r=1

nt∑

t=1

SNR(de−θb,r,t) .= −SNR

0 − SNRde−θb,min. (A.275)

If θb,min < de, then SNR(de−θb,min) dominates and this makes the overall

pdf upper bound decay exponentially with the SNR. If θb,min ≥ de, then

208

Page 229: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

A.6 Proof of Proposition 3.1

the constant w1uϕ4 |w′

2|ϕ dominates and eventually the exponential function

converges to an SNR independent constant which can be neglected in the

pdf upper bound for the asymptotic analysis.

2. If either υb,nr < 0 or θb,min < 0, then the exponential convergence can be

explained in the following cases.

• If υb,nr < θb,min, then the following dominates the exponent

− w1uϕ4

nr∑

i=1

SNR−υb,i

ϕ

−nr∑

r=1

nt∑

t=1

SNR(de−θb,r,t)

.= −SNR

−ϕ2υb,nr − SNR

de−θb,min (A.276).= −SNR

max(−ϕ2υb,nr ,de−θb,min). (A.277)

Since υb,nr < 0, it can be seen that the exponential function always

makes the pdf upper bound decay exponentially with the SNR.

• If υb,nr > θb,min, the dominating exponent is given by

− w1uϕ4

nr∑

r=1

nt∑

t=1

SNR− θb,r,t

2 eıφeb,r,t + w2

2

ϕ

−nr∑

r=1

nt∑

t=1

SNR(de−θb,r,t)

.= −SNR

−ϕ2θb,min − SNR

de−θb,min (A.278).= −SNR

max(−ϕ2θb,min, de−θb,min). (A.279)

Since θb,min is less than zero, it can be seen that the exponential func-

tion always makes the pdf upper bound decay exponentially with the

SNR.

3. Note that we have υb,1 ≥ · · · ≥ υb,nr and for any r = 1, . . . , nr, t = 1, . . . , nt,

θb,r,t ≥ θb,min.

Hence, from the above observations, we require that υb 0 and Θb de × 1,

b = 1, . . . , B so that the pdf upper bound does not decay exponentially to zero

as the SNR tends to infinity.

We continue the analysis by evaluating the lower bound for EQr (R, H) in

209

Page 230: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

A.6 Proof of Proposition 3.1

(A.257)

EQr (R, H) ≥

1

B

B∑

b=1

u3 +

nr∑

i=1

log

(

1 + sλb,iSNR

nt(1− s)

)

−R

+

(A.280)

=

1

Blog

B∏

b=1

nr∏

i=1

eu3nr

(

1 + sλb,iSNR

nt

(1− s)

)

−R

+

(A.281)

, E ′Qr (R, H) (A.282)

where

s =1

nr

(

1 + SNR

nt

∑Bb=1

∑nr

r=1

∑nt

t=1 |eb,r,t|2) . (A.283)

Using the change of variables from λb,i and |eb,r,t|2 to υb,i and θb,r,t, we can show

the following dot equality

eu3nr

(

1 + sλb,iSNR

nt

(1− s)

)

.= SNR

[min(1,θmin)−υb,i]+ (A.284)

where

θmin , min

θ1,1,1, . . . , θb,r,t, . . . , θB,nr,nt

. (A.285)

It follows from E ′Qr (R, H) (the RHS of (A.281)), (A.284) and the rate and mul-

tiplexing gain relationship eR(SNR) .= SNRrg (cf. (2.57)) that at high SNR, if the

following event

AG =

υ ∈ Bnr ,Θ ∈ Bnr×nt :

B∑

b=1

nr∑

i=1

[min(1, θmin)− υb,i]+ ≤ Brg

(A.286)

occurs, then E ′Qr (R, Hb) = 0 and otherwise if

AcG =

υ ∈ Bnr ,Θ ∈ Bnr×nt :

B∑

b=1

nr∑

i=1

[min(1, θmin)− υb,i]+ > Brg

(A.287)

occurs, then E ′Qr (R, Hb) > 0. Therefore, for the fading model (3.4) with τ = 0,

210

Page 231: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

A.6 Proof of Proposition 3.1

we can upper-bound the average error probability of Gaussian random codes as

follows

Pe,ave ≤ [

e−BJEQr (R,H )

]

(A.288)

.≤∫

AG∩υ0,Θde×1

(

SNR−∑B

b=1

∑nri=1(2i−1+nt−nr)υb,i

× SNR∑B

b=1

∑nrr=1

∑ntt=1(de−θb,r,t)dυdΘdΦE

)

+

AcG∩υ0,Θde×1

(

SNR−∑B

b=1

∑nri=1(2i−1+nt−nr)υb,i

× SNR∑B

b=1

∑nrr=1

∑ntt=1(de−θb,r,t)

× SNR−J(

∑Bb=1

∑nri=1[min(1,θmin)−υb,i]+−Brg)dυdΘdΦE

)

(A.289)

.= G1SNR

−d1(rg) +G2SNR−d2(rg) (A.290)

.= SNR

−d2(rg) (A.291)

whereG1, G2.= SNR

0, and d1(rg) is the generalised outage SNR-exponent achieved

with infinite block length. Note that to find the solution of d1(rg), we follow the

same approach of finding the optimal DMT in [20]. The lower-bound of the op-

timal DMT curve d1(rg) is given by the piecewise-linear function connecting the

points (rg, d1(rg)), where

rg = 0,min(1, de), 2min(1, de), . . . , nrmin(1, de), (A.292)

d1(rg) = min(1, de) · B(

nt −rg

min(1, de)

)

·(

nr −rg

min(1, de)

)

. (A.293)

Note that we have d1,max = min(1, de)Bntnr and rg,max = min(1, de)nr. On the

other hand, d2(rg) is given as

d2(rg) = infAc

G∩υ0,Θde×1

B∑

b=1

nr∑

i=1

(2i− 1 + nt − nr)υb,i +

B∑

b=1

nr∑

r=1

nt∑

t=1

(θb,r,t − de)

+ J

(

B∑

b=1

nr∑

i=1

[min(1, θmin)− υb,i]+ − Brg

)

. (A.294)

211

Page 232: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

A.7 Proof of Theorem 3.3

Since we need

J

(

B∑

b=1

nr∑

i=1

[min(1, θmin)− υb,i]+ − Brg

)

> 0 (A.295)

for d2(rg), it is straightforward to deduce that d2(rg) ≤ d1(rg), which follows

from [20, Lemma 6]. Thus, d2(rg) leads to dℓG(rg) in the proposition. Note that

we just need to replace (Inr + sHbH†bSNR

nt) with (Int + sH†

bHbSNR

nt) in the analysis if

nt < nr.

A.7 Proof of Theorem 3.3

We use the generalised Gallager upper bound to derive the achievability by isolat-

ing the channel block length and the random coding exponent. Recall EQ0 (s, ρ, Hb)

in (2.25) written in different form here

EQ0 (s, ρ, Hb)

= − log2

[(

x′∈Xnt

PX(x′)

(

QY |X,H (Y |x′, H b)

QY |X,H (Y |X, H b)

)s)ρ ∣∣

H b = Hb,Eb = Eb

]

.

(A.296)

For a given Y = y, X = x, H b = Hb and Eb = Eb, inserting the decoding metric

(3.7) and evaluating the expectation over X ′, we have that

x′∈Xnt

PX(x′)

(

QY |X,H (y|x′, Hb)

QY |X,H (y|x, Hb)

)s

= 2−Mnt∑

x′∈Xnt

(

e−∥

SNR

ntHb(x−x′)+z−

SNR

ntEbx

′∥

2+∥

∥z−√

SNR

ntEbx

2)s

. (A.297)

212

Page 233: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

A.7 Proof of Theorem 3.3

Substituting (A.297) to the RHS of (A.296), we obtain

− log2

[(

x′∈Xnt

PX(x′)

(

QY |X,H (Y |x′, H b)

QY |X,H (Y |X, H b)

)s)ρ ∣∣

H b = Hb,Eb = Eb

]

= (1 + ρ)Mnt

− log2∑

x∈Xnt

[

x′∈Xnt

(

e−s

SNR

ntHb(x−x′)+Z−

SNR

ntEbx

′∥

2+s

∥Z−√

SNR

ntEbx

2)

ρ]

.

(A.298)

Note that

1 ≤

x′∈Xnt

(

e−s

SNR

ntHb(x−x′)+z−

SNR

ntEbx

′∥

2+s

∥z−√

SNR

ntEbx

2)

ρ

(A.299)

≤ |Xnt |ρeρs∥

∥z−√

SNR

ntEbx

2

. (A.300)

We have the expectation over Z

[

|Xnt|ρeρs∥

∥Z−√

SNR

ntEbx

2]

=|Xnt|ρ

(1− ρs)nre

(

ρ2s2

1−ρs+ρs

)

SNR

nt‖Ebx‖2

(A.301)

≤ |Xnt|ρ(1− ρs)nr

e

(

ρ2s2

1−ρs+ρs

)

SNR

nt‖Eb‖2F ‖x‖2

(A.302)

where we have assumed ρs < 1 so that the expectation can be evaluated, and

where we have used the Frobenius norm property ‖Ebx‖2 ≤ ‖Eb‖2F‖x‖2 [86, Sec.

5.6] in the last inequality. Since the signal energy ‖x‖2, x ∈ Xnt is finite, the

condition|Xnt|ρ

(1− ρs)nre

(

ρ2s2

1−ρs+ρs

)

SNR

nt‖Eb‖2F ‖x‖2

<∞ (A.303)

can be satisfied by choosing the optimal solution of s over

S =

s ∈ : 0 < s ≤ 1

B + SNR∑B

b=1 ‖Eb‖2F

. (A.304)

The choice of s ∈ S leads to a lower bound to the mismatched decoding

error exponent in (2.24). As (A.303) can be satisfied with s ∈ S, the dominated

convergence theorem [19] can be applied here. Let s∗ be the value of s that solves

213

Page 234: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

A.7 Proof of Theorem 3.3

the supremum on the RHS of (2.24). Then, using a similar argument to the one

used in the generalised outage evaluation (Appendix A.2.2), we can conclude the

following expectation over Z

limSNR→∞

x′∈Xnt

(

e−s∗

SNR

ntHb(x−x′)+Z−

SNR

ntEbx

′∥

2+s∗

∥Z−

SNR

ntEbx

2)

ρ

≤ [(

x′∈Xnt

1

xt 6= x′t, ∀t ∈ S(ǫ,δ)b

)ρ]

(A.305)

= 2ρM(nt−κb) (A.306)

where S(ǫ,δ)b and κb have the same definition as those in Appendix A.2.2. Conse-

quently, we have at high SNR

EQ0 (s

∗, ρ, Hb) ≥ ρMκb. (A.307)

The random coding error exponent EQr (R, Hb) can then be bounded as follows

EQr (R, Hb) = sup

s>00≤ρ≤1

1

B

B∑

b=1

EQ0 (s, ρ, Hb)− ρR (A.308)

≥ sup0≤ρ≤1

ρM

(

1

B

B∑

b=1

κb −R

M

)

. (A.309)

Define ζ and two mutually exclusive sets as follows

ζ ,B∑

b=1

κb −BR

M, (A.310)

AX ,

A, Θ ∈ Bnr×nt :

B∑

b=1

κb >BR

M

, (A.311)

AcX,

A, Θ ∈ Bnr×nt :B∑

b=1

κb ≤BR

M

. (A.312)

Note that the value of ρ that solves the supremum on the RHS of (A.309) is

given by ρ∗ = 1 if ζ > 0 and ρ∗ = 0 if ζ ≤ 0. Then, we can upper-bound the

214

Page 235: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

A.7 Proof of Theorem 3.3

average error probability of discrete-input random codes as follows

Pe,ave ≤ [

2−BJEQr (R,H )

]

(A.313)

.

≤∫

AX∩A0,Θde×1

(

SNR−(1+ τ

2 )∑B

b=1

∑nrr=1

∑ntt=1 αb,r,t

× SNR−

∑Bb=1

∑nrr=1

∑ntt=1(θb,r,t−de) × 2−JMζdAdΘdΦHdΦE

)

+

AcX∩A0,Θde×1

(

SNR−(1+ τ

2 )∑B

b=1

∑nrr=1

∑ntt=1 αb,r,t

× SNR−

∑Bb=1

∑nrr=1

∑ntt=1(θb,r,t−de)dAdΘdΦHdΦE

)

(A.314)

.= C1SNR

−d1 + C2SNR−d2 (A.315)

.= SNR

−min(d1,d2) (A.316)

where C1, C2.= SNR

0. It is straightforward to see that

d2 = min(1, de)×(

1 +τ

2

)

nr

B

(

nt −R

M

)⌉

(A.317)

is equivalent to dicsir in Theorem 3.1 up to the discontinuity points of the Singleton

bound. This is exactly the same as (A.113) when replacing < in the outage set

with ≤. On the other hand, following the same steps used in Appendix A.2.2,

we arrive to the following result for d1

d1 = infAX∩A0, Θde×1

JMζ log 2

log SNR+(

1 +τ

2

)

B∑

b=1

nr∑

r=1

nt∑

t=1

αb,r,t

+

B∑

b=1

nr∑

r=1

nt∑

t=1

(θb,r,t − de)

. (A.318)

If both M and J are not growing with log SNR, it is clearly seen that d1 = 0

as the SNR tends to infinity. Assume that M is fixed and J(SNR) = ω log SNR,

215

Page 236: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

A.7 Proof of Theorem 3.3

ω ≥ 0. Then, we can write d1 as

d1 = infAX∩A0, Θde×1

ωMζ log 2 +(

1 +τ

2

)

B∑

b=1

nr∑

r=1

nt∑

t=1

αb,r,t

+

B∑

b=1

nr∑

r=1

nt∑

t=1

(θb,r,t − de)

. (A.319)

By letting ǫ, δ ↓ 0 to achieve a tight SNR-exponent lower bound, the optimiser

of Θ is given by Θ∗ = de × 1 and for A is given by evaluating the intersection of

A 0 and AX. This yields the following solution of d1

d1 = inf1+⌊BR

M ⌋≤K≤Bnt

d1(K) (A.320)

where

d1(K) = ωM log 2

(

K − BR

M

)

+min(1, de)×(

1 +τ

2

)

nr(Bnt −K). (A.321)

Note that the derivative of d1(K) with respect to K is given by

∂d1(K)

∂K= ωM log 2−min(1, de)

(

1 +τ

2

)

nr. (A.322)

It follows that the value of K that solving the infimum (A.320) is given by

K∗ = Bnt (A.323)

if ωM log 2 < min(1, de)(1 +τ2)nr, and

K∗ = 1 +

BR

M

(A.324)

if ωM log 2 ≥ min(1, de)(1 +τ2)nr.

We are interested in the interval of ω for which d1 ≥ d2 as this is the point

where the SNR-exponent of discrete-input random codes is tight with dicsir up to

the discontinuity points of the Singleton bound. Note that from (A.317), (A.321),

(A.323) and (A.324), we deduce that d1 ≥ d2 is only possible with ωM log 2 ≥

216

Page 237: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

A.7 Proof of Theorem 3.3

min(1, de)(1 + τ2)nr. This implies that K∗ = 1 + ⌊BR

M⌋ and d1 = d1(K

∗). It

follows that by comparing d1(K∗) and d2, we obtain the following threshold on

ω for which d1(K∗) ≥ d2

ω ≥ 1

M log 2· min(1, de) ·

(

1 + τ2

)

nr

1 +⌊

BRM

− BRM

. (A.325)

Furthermore, using K∗ in (A.323) and (A.324), a complete characterisation on

the achievable SNR-exponent with discrete-input random codes can be obtained

and it is given in equations (3.40), (3.41) and (3.42).

217

Page 238: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A
Page 239: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

Appendix B

B.1 Proof of Lemma 4.2

We first note that due to the symmetry of the codebook construction, it suffices

to consider that the message m = 1 is transmitted. Recall that for a given Y,

the decoder outputs the message m if X(m) is the unique matrix such that its

normalised metric is greater than the threshold in Tδ. Otherwise, it declares

error. The undetected error event is characterised by the unique decoding of

Ψ(Y) when the decoded message is not the transmitted one. Let X(j) be the

codeword matrix corresponding to the j-th message, j ∈ 1, . . . , |M|. We can

write the undetected error event V,E as

V,E∣

(

X(1),Y, H)

∈ Tcδ

⊆⋃

j 6=1

(

X(j),Y, H)

∈ Tδ

(

X(1),Y, H)

∈ Tcδ

.

(B.1)

We evaluate the following probability for j 6= 1 conditioned on a fixed fading

H = H and its corresponding estimation error E = E (such that H = H+ E)

Pr(

X(j),Y, H)

∈ Tδ

(

X(1),Y, H)

∈ Tcδ

= [

Pr(

X(j),Y, H)

∈ Tδ

∣Y

(

X(1),Y, H)

∈ Tcδ

]

(B.2)

where the equality holds since conditioned on Y = Y for any Y ∈ Bnr×J (and

for a fixed H = H+ E), the following metric for message j

QsY|X ,H (Y|X(j), H)

[

QsY|X ,H (Y|X′, H)

] (B.3)

219

Page 240: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.2 Proof of Theorem 4.1

is independent of whether or not the event (X(1),Y, H) ∈ Tcδ has occurred (as

the codeword for message j 6= 1 is independently generated from the codeword

for message 1). We next evaluate Pr(X(j),Y, H) ∈ Tδ|Y for a specific channel

output Y = Y ∈ Bnr×J

Pr(

X(j),Y, H)

∈ Tδ

∣Y = Y

= Pr

QsY|X ,H (Y|X(j), H)

[

QsY|X ,H (Y|X′, H)

] ≥ |M|δ

(B.4)

[

QsY|X ,H (Y|X(j), H)

]

|M|δ

[

QsY|X ,H (Y|X′, H)

] (B.5)

|M| . (B.6)

Inequality (B.5) follows from Markov’s inequality. Equality (B.6) follows since

for a given transmit message m = 1, Y = Y, H = H and E = E, random coding

with i.i.d. codebooks implies that the expectation [QsY|X ,H (Y|X(j), H)] does not

depend on the message index j for j 6= 1. Combining (B.6) with (B.2) and

applying the union bound to the probability of the event (B.1) yields

PrV,E|H = H,E = E < δ. (B.7)

The proof of Lemma 4.2 is completed by letting δ′ = − log2 δBJ

.

B.2 Proof of Theorem 4.1

Using random coding schemes, we characterise Pe(L) in (4.50). The converse and

achievability bounds are given in the following.

220

Page 241: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.2 Proof of Theorem 4.1

B.2.1 Converse

To characterise the converse, we shall assume i.i.d. codebooks and perfect error

detection such that Pe(L) in (4.50) becomes

Pe(L)

= Pr D1, Ft(1) = 1+ Pr AL−1,DL−1,EL+L−1∑

ℓ′=2

Pr Aℓ′−1,Dℓ′, Ft(ℓ′) = 1 .

(B.8)

As J → ∞, we can lower-bound the first term as

Pr D1, Ft(1) = 1 = PrD1PrFt(1) = 1|Fr(1) = 0 (B.9)

≥ Pr

H1,1,E1,1 ∈ O1,1(R)

pfb (B.10).= P−dicsir(1)−dfb . (B.11)

Here (B.10) follows from the converse of i.i.d. codebooks (Proposition 2.4) and

(B.11) follows from the definition of dicsir(1) in (4.54).

We next consider PrAℓ−1,Dℓ, Ft(ℓ) = 1. We have that for rounds ℓ < L

PrAℓ−1,Dℓ, Ft(ℓ) = 1= Pr Ft(ℓ) = 1|Aℓ−1,DℓPr Aℓ−1,Dℓ (B.12)

= pfb Pr Aℓ−1,Dℓ (B.13)

= pfb Pr Dℓ|Aℓ−1,Dℓ−1Pr Aℓ−1|Aℓ−2,Dℓ−1Pr Aℓ−2,Dℓ−1 (B.14)

= pfb(1− pfb) Pr Dℓ|Aℓ−1,Dℓ−1Pr Aℓ−2,Dℓ−1 (B.15)

= pfb(1− pfb)ℓ−1 Pr D1

ℓ∏

ℓ′=2

Pr Dℓ′|Aℓ′−1,Dℓ′−1 . (B.16)

Note that

Pr D1ℓ∏

ℓ′=2

Pr Dℓ′|Aℓ′−1,Dℓ′−1 (B.17)

with imperfect feedback is identical to

Pr D1ℓ∏

ℓ′=2

Pr Dℓ′|Dℓ′−1 (B.18)

221

Page 242: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.2 Proof of Theorem 4.1

with perfect feedback as we properly condition the event Dℓ′ with Aℓ′−1 and

Dℓ′−1, i.e., the event Dℓ′ in (B.17) is only considered if detected error occurs at

all previous rounds and negative ACKs are obtained at the transmitter. The

effect of imperfect feedback is captured by pfb(1 − pfb)ℓ−1. For round ℓ = L, we

have that

Pr AL−1,DL−1,EL= Pr EL|AL−1,DL−1Pr AL−1,DL−1 (B.19)

= (1− pfb)L−1 Pr EL|AL−1,DL−1Pr D1

L−1∏

ℓ′=2

Pr Dℓ′|Aℓ′−1,Dℓ′−1 . (B.20)

Therefore, using the GMI converse for i.i.d. codebooks (4.47), we have as

J → ∞ that

pfb(1− pfb)ℓ−1 Pr D1

ℓ∏

ℓ′=2

Pr Dℓ′|Aℓ′−1,Dℓ′−1

≥ pfb(1− pfb)ℓ−1 Pr

H1,ℓ,E1,ℓ ∈

ℓ⋂

ℓ′=1

O1,ℓ′(R)

(B.21)

.= P−dicsir(ℓ)−dfb (B.22)

and

(1− pfb)L−1 Pr EL|AL−1,DL−1Pr D1

L−1∏

ℓ′=2

Pr Dℓ′|Aℓ′−1,Dℓ′−1

≥ (1− pfb)L−1 Pr

H1,L,E1,L ∈

L⋂

ℓ′=1

O1,ℓ′(R)

(B.23)

.= P−duicsir(L). (B.24)

Combining (B.11), (B.22) and (B.24) with (B.8) yields

Pe(L).≥ P−dicsir(L) +

L−1∑

ℓ=1

P−dicsir(ℓ)−dfb . (B.25)

222

Page 243: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.2 Proof of Theorem 4.1

B.2.2 Achievability

We now prove that the same SNR-exponent as that on the RHS of (B.25) can

be achieved using random coding schemes. Recall Pe(L) in (4.50)

Pe(L) =PrV1,E1+L−1∑

ℓ′=2

PrAℓ′−1,Dℓ′−1,Vℓ′,Eℓ′+ Pr AL−1,DL−1,EL

+ PrD1, Ft(1) = 1+L−1∑

ℓ′=2

PrAℓ′−1,Dℓ′, Ft(ℓ′) = 1. (B.26)

Applying Lemma 4.2, the first two terms corresponding to undetected errors can

be upper-bounded as

PrV1,E1+L−1∑

ℓ′=2

PrAℓ′−1,Dℓ′−1,Vℓ′ ,Eℓ′ ≤ (L− 1)2−BJδ′ (B.27)

which vanishes as J → ∞ for a fixed δ′ > 0. Following the analysis in Appendix

B.2.1, the last three terms can be written as

Pr AL−1,DL−1,EL = (1− pfb)L−1 Pr EL|AL−1,DL−1

· Pr D1L−1∏

ℓ′=2

Pr Dℓ′|Aℓ′−1,Dℓ′−1 , (B.28)

Pr D1, Ft(1) = 1 = PrD1pfb, (B.29)

Pr Aℓ′−1,Dℓ′, Ft(ℓ′) = 1 = pfb(1− pfb)

ℓ−1 Pr D1ℓ∏

ℓ′=2

Pr Dℓ′|Aℓ′−1,Dℓ′−1 .

(B.30)

As argued in Appendix B.2.1, the probability

Pr D1ℓ∏

ℓ′=2

Pr Dℓ′|Aℓ′−1,Dℓ′−1 (B.31)

with imperfect feedback is identical to the probability

Pr Dℓ = Pr D1ℓ∏

ℓ′=2

Pr Dℓ′ |Dℓ′−1 (B.32)

223

Page 244: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.3 Proof of Proposition 4.1

with perfect feedback. Thus, using random coding schemes with J → ∞, apply-

ing (4.40) yields upper bounds

PrD1pfb ≤ Pr

H1,1,E1,1 ∈ Q1,1(R + δ′)

pfb, (B.33)

pfb(1− pfb)ℓ−1Pr D1

ℓ∏

ℓ′=2

Pr Dℓ′ |Aℓ′−1,Dℓ′−1

≤ pfb(1− pfb)ℓ−1 Pr

H1,ℓ,E1,ℓ ∈

ℓ⋂

ℓ′=1

Q1,ℓ′(R + δ′)

(B.34)

and applying (4.46) yields an upper bound

(1− pfb)L−1 Pr EL|AL−1,DL−1Pr D1

L−1∏

ℓ′=1

Pr Dℓ′|Aℓ′−1,Dℓ′−1

≤ (1− pfb)L−1 Pr

H1,L,E1,L ∈

L⋂

ℓ′=1

Q1,ℓ′(R + δ′)

. (B.35)

By having δ′ ↓ 0 and following from the definition of the sets Q1,ℓ(R) and O1,ℓ(R),

we can see that (B.33), (B.34) and (B.35) tend to be similar with (B.10), (B.21)

and (B.23), respectively. This implies that the converse and the achievability are

tight for a sufficiently small δ′. It follows that the ARQ diversity (4.55) is given

by the slowest decaying exponent of the RHS of (B.25), i.e.,

darq = min

dicsir(1) + dfb, dicsir(2) + dfb, . . . , dicsir(L− 1) + dfb, dicsir(L)

(B.36)

= min(

dicsir(L), dicsir(1) + dfb

)

(B.37)

where the last equality is because dicsir(ℓ) is a non-decreasing function of ℓ. This

completes the proof.

B.3 Proof of Proposition 4.1

We evaluate duicsir(ℓ)—the generalised outage diversity at round ℓ with uniform

power allocation—using the same change of random variables as that in Ap-

pendix A, i.e., αℓ′,b,r,t , − log |hℓ′,b,r,t|2/ logP and θℓ′,b,r,t , − log |eℓ′,b,r,t|2/ logP .We denote A1,ℓ,Θ1,ℓ ∈ ℓBnr×nt as the matrices with entries αℓ′,b,r,t and θℓ′,b,r,t,

224

Page 245: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.3 Proof of Proposition 4.1

respectively. It follows from Lemma A.1 that

duicsir(ℓ) = infA1,ℓ,Θ1,ℓ∈

⋂ℓℓ′=1

O1,ℓ′

(R), A1,ℓ0,Θ1,ℓde×1

(

1 +τ

2

)

ℓ∑

ℓ′=1

B∑

b=1

nr∑

r=1

nt∑

t=1

αℓ′,b,r,t

+

ℓ∑

ℓ′=1

B∑

b=1

nr∑

r=1

nt∑

t=1

(θℓ′,b,r,t − de)

. (B.38)

For both Gaussian and discrete alphabets, an exact characterisation on O1,ℓ(R)

is difficult to obtain. We shall use bounding techniques developed in Appendix

A to characterise duicsir(ℓ).

We infer from Appendix A that it suffices to consider solving the SNR-

exponent for discrete inputs with alphabet size |X| = 2M . The proof for Gaussian

inputs with constant R, independent of the SNR (such that the multiplexing gain

tends to zero) follows along the same line as the proof for discrete inputs with a

sufficiently large alphabet size such that M ≥ BR. Thus, for the remaining part

of this appendix, we shall focus on O1,ℓ(R) for discrete inputs.

B.3.0.1 GMI Upper Bound

Using the GMI upper bound in Appendix A.2, we have an upper bound duicsir(ℓ) ≥duicsir(ℓ)

duicsir(ℓ) = infA1,ℓ,Θ1,ℓ∈

⋂ℓℓ′=1

O1,ℓ′

(R), A1,ℓ0,Θ1,ℓde×1

(

1 +τ

2

)

ℓ∑

ℓ′=1

B∑

b=1

nr∑

r=1

nt∑

t=1

αℓ′,b,r,t

+

ℓ∑

ℓ′=1

B∑

b=1

nr∑

r=1

nt∑

t=1

(θℓ′,b,r,t − de)

(B.39)

where O1,ℓ′(R) is similarly defined to (4.42) but with the accumulated GMI at

round ℓ′ replaced by its corresponding upper bound obtained using Proposition

2.3. Following the analysis in Appendix A.2, we have for discrete inputs with

alphabet size |X| = 2M that

O1,ℓ(R) =

A,Θ ∈ ℓBnr×nt :

ℓ∑

ℓ′=1

B∑

b=1

κℓ′,b <BR

M

, ℓ = 1, . . . , L (B.40)

225

Page 246: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.3 Proof of Proposition 4.1

where

κℓ′,b ,∣

∣S(ǫ,ǫ′)

ℓ′,b

∣, (B.41)

S(ǫ,ǫ′)

ℓ′,b ,

nr⋃

r=1

S(ǫ,ǫ′)

ℓ′,b,r, (B.42)

S(ǫ,ǫ′)

ℓ′,b,r ,

t :

αℓ′,b,r,t ≤ 1 + ǫ ∩ αℓ′,b,r,t ≤ θℓ′,b,r,t + ǫ′

αℓ′,b,r,t ≤ 1 + ǫ ∩ αℓ′,b,r,t > θℓ′,b,r,t + ǫ′ ∩ Ξℓ′,b,r,t

, t = 1, . . . , nt

(B.43)

for any ǫ, ǫ′ > 0, and where

Ξℓ′,b,r,t ,

φhℓ′,b,r,t, φ

eℓ′,b,r,t ∈ [0, 2π) : cos

(

φeℓ′,b,r,t − φh

ℓ′,b,r,t

)

> 0

. (B.44)

As argued in Appendix A, the values of θℓ′,b,r,t, ℓ′ = 1, . . . , ℓ, for all b, r, t achieving

the infimum (B.39) are given by de. Substituting θℓ′,b,r,t = de in (B.43) and

following the analysis in [32], it can be shown that the infimum (B.39) is achieved

with

αℓ′,b,r,t = min(1 + ǫ, de + ǫ′), for all b, r, t and ℓ′ = 1, . . . , ℓ− 1, (B.45)

and

αℓ,b,r,t =

min(1 + ǫ, de + ǫ′), bB + t > BRM

0, otherwise(B.46)

for all r. Thus, by letting ǫ, ǫ′ ↓ 0, we have that

duicsir(ℓ) = min(1, de)×(

1 +τ

2

)

nr

(

1 +

ℓB

(

nt −R

ℓM

)⌋)

(B.47)

= min(1, de)× ducsir(ℓ). (B.48)

226

Page 247: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.3 Proof of Proposition 4.1

B.3.0.2 GMI Lower Bound

Using the GMI lower bound in Appendix A.2, we have a lower bound duicsir(ℓ) ≤duicsir(ℓ)

duicsir(ℓ) = infA1,ℓ,Θ1,ℓ∈

⋂ℓℓ′=1 O1,ℓ′

(R), A1,ℓ0,Θ1,ℓde×1

(

1 +τ

2

)

ℓ∑

ℓ′=1

B∑

b=1

nr∑

r=1

nt∑

t=1

αℓ′,b,r,t

+ℓ∑

ℓ′=1

B∑

b=1

nr∑

r=1

nt∑

t=1

(θℓ′,b,r,t − de)

(B.49)

where O1,ℓ′(R) is similarly defined to (4.42) but with the accumulated GMI at

round ℓ′ replaced by its corresponding lower bound. We have for discrete inputs

with alphabet size |X| = 2M that

O1,ℓ(R) =

A,Θ ∈ ℓBnr×nt :

ℓ∑

ℓ′=1

B∑

b=1

κℓ′,b <BR

M

, ℓ = 1, . . . , L (B.50)

where

κℓ′,b ,∣

∣S(ǫ,ǫ′)ℓ′,b

∣, (B.51)

S(ǫ,ǫ′)ℓ′,b ,

nr⋃

r=1

S(ǫ,ǫ′)ℓ′,b,r, (B.52)

S(ǫ,ǫ′)ℓ′,b,r ,

t : αℓ′,b,r,t ≤ 1− ǫ ∩ αℓ′,b,r,t ≤ θℓ′,min − δ, t = 1, . . . , nt

, (B.53)

θℓ′,min , min θ1,1,1,1, . . . , θℓ′,B,nr,nt (B.54)

for any ǫ, ǫ′ > 0.

Using the same arguments in Appendix A, the solutions of θℓ′,min and θℓ′,b,r,t

achieving the infimum (B.49) are all given by de. Thus, following the analysis

in [32], the values of αℓ′,b,r,t achieving the infimum (B.49) are given by

αℓ′,b,r,t = min(1− ǫ, de − ǫ′), for all b, r, t and ℓ′ = 1, . . . , ℓ− 1 (B.55)

227

Page 248: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.4 Proof of Proposition 4.2

and

αℓ,b,r,t =

min(1− ǫ, de − ǫ′), bB + t > BRM

0, otherwise(B.56)

for all r. Thus, we have that by letting ǫ, ǫ′ ↓ 0

duicsir(ℓ) = min(1, de)×(

1 +τ

2

)

nr

(

1 +

ℓB

(

nt −R

ℓM

)⌋)

(B.57)

= min(1, de)× ducsir(ℓ) (B.58)

which is identical to the upper bound (B.48). It follows that

duicsir(ℓ) = duicsir(ℓ) = duicsir(ℓ) (B.59)

which completes the proof.

B.4 Proof of Proposition 4.2

We evaluate dpicsir(ℓ)—the generalised outage diversity at round ℓ with power

control—using the same change of random variables as that in Appendix B.3.

Recall the power constraint in (4.63)

P1

L+

1

L

L∑

ℓ=2

Pr Ft(ℓ− 1) = 0Pℓ ≤ P. (B.60)

We can see that

PrFt(ℓ− 1) = 0 = PrFt(1) = 0, . . . , Ft(ℓ− 1) = 0 (B.61)

as negative ACK at round ℓ (at the transmitter) is only possible if ACKs at all

previous rounds (at the transmitter) are also negative.

To derive PrFt(ℓ) = 0, we first consider PrFr(ℓ) = 0 and PrFr(ℓ) = 1.Note that an event Fr(ℓ) = 0 occurs if decoding error occurs at round ℓ. Thus,

228

Page 249: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.4 Proof of Proposition 4.2

we can write

PrFr(ℓ) = 0 = Pr E1, . . . ,Eℓ,Aℓ−1 .= P−dpicsir(ℓ) (B.62)

where the dot equality follows from the argument in (B.22) and by noting that

we have power control. An event Fr(ℓ) = 1 occurs if

• correct message is obtained at round ℓ.

• correct message is obtained at previous rounds ℓ′ = 1, . . . , ℓ − 1, but re-

transmission still occurs as a result of Ft(ℓ′) = 0.

We thus have

PrFr(ℓ) = 1

= Pr E1, . . . ,Eℓ−1,Ecℓ,Aℓ−1+ PrEc

1,Aℓ−1+ℓ−1∑

ℓ′=2

PrE1, . . . ,Eℓ′−1,Ecℓ′,Aℓ−1.

(B.63)

Under imperfect feedback and CSIR, we have no prior knowledge whether

power control always reduces the probability of decoding error from round to

round. Thus, to facilitate the analysis, we denote the set L as

L = ℓ : Pr Ecℓ|E1, . . . ,Eℓ−1,Aℓ−1 = 0, ℓ = 1, . . . , L, (B.64)

namely the set of round indices for which correct decoding cannot be obtained.

The first term in (B.63) can be evaluated as

Pr E1, . . . ,Eℓ−1,Ecℓ,Aℓ−1

= Pr Ecℓ|E1, . . . ,Eℓ−1,Aℓ−1PrE1, . . . ,Eℓ−1,Aℓ−1 (B.65)

= (1− Pr Eℓ|E1, . . . ,Eℓ−1,Aℓ−1) PrE1, . . . ,Eℓ−1,Aℓ−1. (B.66)

229

Page 250: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.4 Proof of Proposition 4.2

For ℓ ∈ L, this term vanishes. For ℓ ∈ Lc, this term can be evaluated as

(1− Pr Eℓ|E1, . . . ,Eℓ−1,Aℓ−1) PrE1, . . . ,Eℓ−1,Aℓ−1= PrE1, . . . ,Eℓ−1,Aℓ−1 − PrE1, . . . ,Eℓ,Aℓ−1 (B.67)

= (1− pfb) PrE1, . . . ,Eℓ−1,Aℓ−2 − PrE1, . . . ,Eℓ,Aℓ−1 (B.68).= P−dpicsir(ℓ−1) − P−dpicsir(ℓ) (B.69).= P−dpicsir(ℓ−1). (B.70)

Here the first dot equality follows from the same argument as (B.22) and the

second dot equality follows since ℓ ∈ Lc. The second term in (B.63) can be

evaluated as

PrEc1,Aℓ−1 = PrAℓ−1|Ec

1,A1PrA1|Ec1PrEc

1 (B.71)

= pℓ−1fb (1− Pr E1) (B.72)

.= P−(ℓ−1)dfb − P−(ℓ−1)dfb−dpicsir(1) (B.73).= P−(ℓ−1)dfb . (B.74)

Here the first dot equality follows from the same argument as (B.11) and the

second dot equality is because for the coding rate assumed in Section 4.2, the

optimal power control yields dpicsir(1) ≥ duicsir(1) > 0. For the third term, we have

that

PrE1, . . . ,Eℓ′−1,Ecℓ′,Aℓ−1

= PrAℓ−1|E1, . . . ,Eℓ′−1,Ecℓ′,Aℓ′PrAℓ′|E1, . . . ,Eℓ′−1,E

cℓ′,Aℓ′−1

· PrE1, . . . ,Eℓ′−1,Ecℓ′,Aℓ′−1 (B.75)

= pℓ−ℓ′

fb · PrE1, . . . ,Eℓ′−1,Ecℓ′,Aℓ′−1. (B.76)

Similarly to (B.66), if ℓ′ ∈ L, then PrE1, . . . ,Eℓ′−1,Ecℓ′,Aℓ′−1 = 0; if ℓ′ ∈ Lc, we

have that

PrE1, . . . ,Eℓ′−1,Ecℓ′,Aℓ′−1 .

= P−dpicsir(ℓ′−1) (B.77)

which implies that

PrE1, . . . ,Eℓ′−1,Ecℓ′,Aℓ−1 .

= P−(ℓ−ℓ′)dfb−dpicsir(ℓ′−1). (B.78)

230

Page 251: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.4 Proof of Proposition 4.2

We then evaluate Pr Ft(ℓ) = 0 as follows

Pr Ft(ℓ) = 0 = Pr Ft(ℓ) = 0|Fr(ℓ) = 0PrFr(ℓ) = 0+ Pr Ft(ℓ) = 0|Fr(ℓ) = 1PrFr(ℓ) = 1 (B.79)

= (1− pfb) PrFr(ℓ) = 0+ pfbPrFr(ℓ) = 1 (B.80)

.= P−min(dpicsir(ℓ), d(ℓ)) (B.81)

where

d(ℓ) = min

ℓdfb, minℓ′=1,...,ℓ−1

ℓ′∈Lc

ℓ′dfb + dpicsir(ℓ− ℓ′)

(B.82)

and where we have dpicsir(0) , 0 and d(0) , 0 by definition. As we shall see later

on that L is an empty set since dpicsir(ℓ) is an increasing function of ℓ.

In the following, we characterise dpicsir(ℓ) using bounding techniques developed

in Appendix A. As pointed out in Appendix B.3, it suffices to consider discrete

inputs only.

B.4.1 GMI Upper Bound

To evaluate an upper bound to dpicsir(ℓ), we shall apply the GMI upper bound in

Proposition 2.3. Let

Igmiℓ′,b

(

Pℓ′ ,Hℓ′,b, Hℓ′,b

)

≥ sups>0

Igmiℓ′,b

(

Pℓ′,Hℓ′,b, Hℓ′,b, s)

(B.83)

be the resulting upper bound obtained using the techniques in Appendix A.2 and

Igmi

1,ℓ(H1,ℓ) ,

1

B

ℓ∑

ℓ′=1

B∑

b=1

Igmiℓ′,b

(

Pℓ′ ,Hℓ′,b, Hℓ′,b

)

. (B.84)

We then have an upper bound dpicsir(ℓ) ≥ dpicsir(ℓ) satisfying

P−dpicsir(ℓ).= Pr

H1,ℓ,E1,ℓ ∈

ℓ⋂

ℓ′=1

O1,ℓ′(R)

(B.85)

where O1,ℓ′(R) is similarly defined to O1,ℓ′(R) (4.42) but with Igmi

1,ℓ′(H1,ℓ′) replaced

by Igmi

1,ℓ′(H1,ℓ′). We can see from Appendix A.2 that Igmi

ℓ′,b (Pℓ′,Hℓ′,b, Hℓ′,b) is a non-

231

Page 252: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.4 Proof of Proposition 4.2

decreasing function of the transmit power Pℓ′ in the high-SNR regime. Note

that the upper bound (B.84) holds for any possible allocation of transmit power

P1, . . . , Pℓ.

Let P ⋆1 , . . . , P

⋆ℓ be the optimal power allocation that minimises P ℓ

gout(R) in

(4.48) and Igmi⋆

1,ℓ(H1,ℓ) be the corresponding accumulated GMI with the optimal

power allocation P ⋆1 , . . . , P

⋆ℓ . The constraint (B.60) implies that P ⋆

ℓ , ℓ = 1, . . . , L,

have to satisfy

P ⋆ℓ ≤ PL

PrFt(ℓ− 1) = 0.= P 1+min(dpicsir(ℓ−1), d(ℓ−1)). (B.86)

Consider the upper bound Igmiℓ′,b (Pℓ′ ,Hℓ′,b, Hℓ′,b). Since Igmi

ℓ′,b (Pℓ′,Hℓ′,b, Hℓ′,b) is a

non-decreasing function of the transmit power Pℓ′, it follows that using

Pℓ′.= P 1+min(dpicsir(ℓ′−1), d′(ℓ′−1)), (B.87).

≥ P 1+min(dpicsir(ℓ′−1), d(ℓ′−1)), ℓ′ = 1, . . . , ℓ (B.88)

for Igmi

1,ℓ(H1,ℓ), we have at high SNR that

Igmi

1,ℓ(H1,ℓ) ≥ Igmi⋆

1,ℓ(H1,ℓ). (B.89)

Here d′(ℓ) is similarly defined to d(ℓ) (B.82) but with dpicsir(ℓ − ℓ′) replaced by

dpicsir(ℓ− ℓ′).

Using P1, . . . , Pℓ satisfying (B.87), the upper bound (B.85) is given by

dpicsir(ℓ) = infA1,ℓ,Θ1,ℓ∈

⋂ℓℓ′=1

O1,ℓ′

(R), A1,ℓ0,Θ1,ℓde×1

(

1 +τ

2

)

ℓ∑

ℓ′=1

B∑

b=1

nr∑

r=1

nt∑

t=1

αℓ′,b,r,t

+

ℓ∑

ℓ′=1

B∑

b=1

nr∑

r=1

nt∑

t=1

(θℓ′,b,r,t − de)

. (B.90)

Let

aℓ , min(

dpicsir(ℓ− 1), d′(ℓ− 1))

. (B.91)

232

Page 253: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.4 Proof of Proposition 4.2

We then have for discrete inputs with alphabet size |X| = 2M that

O1,ℓ(R) =

A,Θ ∈ ℓBnr×nt :

ℓ∑

ℓ′=1

B∑

b=1

κℓ′,b <BR

M

(B.92)

where

κℓ′,b ,∣

∣S(ǫ,ǫ′)

ℓ′,b

∣, (B.93)

S(ǫ,ǫ′)

ℓ′,b ,

nr⋃

r=1

S(ǫ,ǫ′)

ℓ′,b,r, (B.94)

S(ǫ,ǫ′)

ℓ′,b,r ,

t :

αℓ′,b,r,t ≤ 1 + aℓ′ + ǫ ∩ αℓ,b,r,t ≤ θℓ′,b,r,t + ǫ′

αℓ′,b,r,t ≤ 1 + aℓ′ + ǫ ∩ αℓ′,b,r,t > θℓ′,b,r,t + ǫ′ ∩ Ξℓ′,b,r,t

,

t = 1, . . . , nt

(B.95)

for some ǫ, ǫ′ > 0, and where Ξℓ′,b,r,t is similarly defined to (B.44). Similarly to

the uniform power allocation, the solution for θℓ′,b,r,t, ℓ′ = 1, . . . , ℓ, b = 1, . . . , B,

r = 1, . . . , nr, t = 1, . . . , nt achieving the infimum (B.90) is given by de. The

values of αℓ′,b,r,t achieving the infimum (B.90) are given by

αℓ′,b,r,t = min (1 + aℓ + ǫ, de + ǫ′) , (B.96)

for all ℓ′ = 1, . . . , ℓ− 1 and all b, r, t, and

αℓ,b,r,t =

min (1 + aℓ + ǫ, de + ǫ′) , bB + t > BRM

0, otherwise(B.97)

for all r. Thus, it follows from aℓ (B.91) and by letting ǫ, ǫ′ ↓ 0 that

dpicsir(ℓ) =(

1 +τ

2

)

min(

1 + min(

dpicsir(ℓ− 1), d′(ℓ− 1))

, de

)

dSB(R)

+(

1 +τ

2

)

Bntnr

ℓ−1∑

ℓ′=1

min(

1 + min(

dpicsir(ℓ′ − 1), d′(ℓ′ − 1)

)

, de

)

.

(B.98)

We can see from (B.98) that dpicsir(1) = duicsir(1).

233

Page 254: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.4 Proof of Proposition 4.2

B.4.2 GMI Lower Bound

Let

Igmi

1,ℓ(H1,ℓ) ≤

1

B

ℓ∑

ℓ′=1

B∑

b=1

Igmiℓ′,b (Pℓ′,Hℓ′,b,Eℓ′,b, s) (B.99)

be the resulting lower bound to Igmi

1,ℓ(H1,ℓ) obtained using the techniques in Ap-

pendix A.2. We shall denote dpicsir as the lower bound to dpicsir(ℓ) satisfying

P−dpicsir.= Pr

H1,ℓ,E1,ℓ ∈

ℓ⋂

ℓ′=1

O1,ℓ′

(B.100)

where O1,ℓ′(R) is similarly defined to O1,ℓ′(R) (4.42) but with Igmi

1,ℓ′(H1,ℓ′) replaced

by Igmi

1,ℓ′(H1,ℓ′).

Note that the power allocation (B.86) violates the constraint (B.60). However,

a suboptimal power allocation satisfying the same dot equality as (B.86) and the

constraint (B.60) can be constructed, i.e.,

Pℓ =P

PrFt(ℓ− 1) = 0.= P 1+min(dpicsir(ℓ−1), d(ℓ−1)). (B.101)

Since we deal with the lower bound dpicsir(ℓ), we shall consider another suboptimal

power allocation that has a similar form to (B.101), i.e.,

Pℓ.= P 1+min(dpicsir(ℓ−1), d′′(ℓ−1)) (B.102)

where d′′(ℓ) is similarly defined to (B.82) but with dpicsir(ℓ− ℓ′), ℓ′ = 1, . . . , ℓ− 1

replaced by dpicsir(ℓ − ℓ′). We have dpicsir(0) , 0, d′′(0) , 0 by definition. We

observe that both dpicsir(ℓ − ℓ′) and d′′(ℓ) are non-decreasing functions of ℓ. We

shall see in the following that this allocation does not necessarily yield the same

SNR-exponent as that obtained with the GMI upper bound.

234

Page 255: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.4 Proof of Proposition 4.2

Using P1, . . . , Pℓ satisfying (B.102), a lower bound to dpicsir(ℓ) is given by

dpicsir(ℓ) = infA1,ℓ,Θ1,ℓ∈

⋂ℓℓ′=1

O1,ℓ′

(R), A1,ℓ0,Θ1,ℓde×1

(

1 +τ

2

)

ℓ∑

ℓ′=1

B∑

b=1

nr∑

r=1

nt∑

t=1

αℓ′,b,r,t

+

ℓ∑

ℓ′=1

B∑

b=1

nr∑

r=1

nt∑

t=1

(θℓ′,b,r,t − de)

.

(B.103)

Let

aℓ = min(

dpicsir(ℓ− 1), d′′(ℓ− 1))

(B.104)

gℓ(ℓ′) = min

l=1,...,ℓθl,min − al + aℓ′ (B.105)

θl,min = θl,1,1,1, . . . , θl,B,nr,nt. (B.106)

It follows from (B.102) that Pℓ.= P 1+aℓ . Following the derivation of the GMI

lower bound in Appendix A, we have that for discrete inputs with alphabet size

2M

O1,ℓ(R) =

A,Θ ∈ ℓBnr×nt :ℓ∑

ℓ′=1

B∑

b=1

κℓ′,b <BR

M

. (B.107)

Here for a given ℓ = 1, . . . , L, we define the following for each ℓ′ = 1, . . . , ℓ

κℓ′,b ,∣

∣S(ǫ,ǫ′)ℓ′,b

∣, (B.108)

S(ǫ,ǫ′)ℓ′,b ,

nr⋃

r=1

S(ǫ,ǫ′)ℓ′,b,r, (B.109)

S(ǫ,ǫ′)ℓ′,b,r ,

t :

αℓ′,b,r,t ≤ 1 + aℓ′ − ǫ ∩ αℓ′,b,r,t ≤ gℓ(ℓ′)− ǫ′

, t = 1, . . . , nt

(B.110)

for some ǫ, ǫ′ > 0.

Following the same argument in Appendix A, the infimum solutions for θℓ′,b,r,t

for all ℓ′ = 1, . . . , ℓ and all b, r, t are given by de. This makes (B.105) become

gℓ(ℓ′) = min

l=1,...,ℓde − al + aℓ′. (B.111)

235

Page 256: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.4 Proof of Proposition 4.2

We next evaluate the values of αℓ′,b,r,t achieving the infimum (B.106). Con-

sider the constraint O1,ℓ(R). The values of αℓ′,b,r,t, ℓ′ ≤ ℓ that make the constraint

in O1,ℓ(R) tight are as follows.

• For ℓ′ = ℓ and all r = 1, . . . , nr,

αℓ,b,r,t > min(1 + aℓ − ǫ, gℓ(ℓ)− ǫ′), bB + t >BR

M(B.112)

and

0 ≤ αℓ′,b,r,t ≤ min(1 + aℓ − ǫ, gℓ(ℓ)− ǫ′), (B.113)

otherwise.

• For ℓ′ < ℓ,

αℓ′b,r,t > min(1 + aℓ′ − ǫ, gℓ(ℓ′)− ǫ′) (B.114)

for all b, r, t.

Note that from the monotonicity of aℓ, we have

min(1 + aℓ − ǫ, gℓ(ℓ)− ǫ′) = min(1 + aℓ − ǫ, de − ǫ′). (B.115)

It follows that for ℓ′ = ℓ, the infimum solutions for αℓ′,b,r,t are exactly

determined by (B.112) and (B.113) and given by

αℓ,b,r,t =

min(1 + aℓ − ǫ, de − ǫ′), bB + t > BRM

0, otherwise.(B.116)

For ℓ′ < ℓ, the infimum solutions for αℓ′,b,r,t are not only determined by

(B.114) but also determined by the intersection⋂ℓ

ℓ′ O1,ℓ′(R).

Consider ℓ′ = ℓ− 1. In order to satisfy the constraint O1,ℓ−1(R) ∩ O1,ℓ(R),

we need to have for all r = 1, . . . , nr

αℓ−1,b,r,t > min(1 + aℓ−1 − ǫ, gℓ−1(ℓ− 1)− ǫ′)

= min(1 + aℓ−1 − ǫ, de − ǫ′), bB + t >BR

M(B.117)

αℓ−1,b,r,t > min(1 + aℓ−1 − ǫ, gℓ(ℓ− 1)− ǫ′), bB + t ≤ BR

M(B.118)

αℓ′′,b,r,t > min(1 + aℓ′′ − ǫ, gℓ−1(ℓ′′)− ǫ′), ℓ′′ ≤ ℓ− 2, and all b, t.

(B.119)

236

Page 257: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.4 Proof of Proposition 4.2

Thus, for ℓ′ = ℓ − 1, the infimum solutions for αℓ′,b,r,t that meet both

O1,ℓ−1(R) and O1,ℓ(R) are given by

αℓ−1,b,r,t =

min(1 + aℓ−1 − ǫ, de − ǫ′), bB + t > BRM

min(1 + aℓ−1 − ǫ, gℓ(ℓ− 1)− ǫ′), otherwise(B.120)

for all r. For ℓ′ < ℓ− 1, we can follow the same procedure by considering

extra constraint in the intersection. It is not difficult to prove that the

values of αℓ′,b,r,t solving the infimum are given by (B.116) for ℓ′ = ℓ and

αℓ′,b,r,t =

min(1 + aℓ′ − ǫ, de − ǫ′), bB + t > BRM

min(1 + aℓ′ − ǫ, gℓ′+1(ℓ′)− ǫ′), otherwise

(B.121)

for ℓ′ < ℓ and all r. Notice that from (B.111), we have

gℓ′+1(ℓ′) = min

l=1,...,ℓ′+1de − al + aℓ′ = de − aℓ′+1 + aℓ′ (B.122)

which follows from the monotonicity of aℓ.

Inserting all values of αℓ′,b,r,t achieving the infimum (B.106), and by letting

ǫ, ǫ′ ↓ 0, we have a lower bound

dpicsir(ℓ) =(

1 +τ

2

)

dSB(R)

ℓ∑

ℓ′=1

min(1 + aℓ′, de)

+(

1 +τ

2

)

(Bntnr − dSB(R))ℓ−1∑

ℓ′=1

min(1 + aℓ′, gℓ′+1(ℓ′)).

(B.123)

Since a1 = a1 = 0, we can see that (B.98) and (B.123) are equal if de is

sufficiently large such that for ℓ′ = 1, . . . , L− 1

1 + aℓ′ ≤ gℓ′+1(ℓ′) (B.124)

which implies

de ≥ 1 + aℓ′+1. (B.125)

237

Page 258: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.4 Proof of Proposition 4.2

So, the bounds are not equal if there exists ℓ′ < ℓ such that

de < 1 + aℓ′+1. (B.126)

As 1 + aℓ′+1 is exactly the exponent of Pℓ′+1, the bounds are not tight due to a

suboptimal power allocation (B.102).

Since aℓ is a monotonically increasing function of ℓ, we let

ℓmin = min ℓ : de < 1 + aℓ, ℓ = 1, . . . , L. (B.127)

The suboptimality of the power adaptation occurs because we have power ex-

ponent larger than the CSIR-error diversity de. Hence, in order to prevent the

condition (B.126), we shall allocate for ℓ′ ≥ ℓmin, Pℓ′ such that

Pℓ′.= P de , (B.128)

which yields

Pℓ.= Pmin(1+aℓ, de), ℓ = 1, . . . , L. (B.129)

Replacing the power allocation (B.102) with (B.129) changes S(ǫ,ǫ′)ℓ′,b,r in (B.110) to

S(ǫ,ǫ′)ℓ′,b,r ,

t :

αℓ′,b,r,t ≤ 1 + min(aℓ′, de − 1)− ǫ ∩ αℓ′,b,r,t ≤ gℓ(ℓ′)− ǫ′

,

t = 1, . . . , nt

(B.130)

and gℓ(ℓ′) in (B.111) to

gℓ(ℓ′) = min

l=1,...,ℓde −min(al, de − 1) + min(aℓ′, de − 1). (B.131)

Noting these changes and following the same steps to prove the SNR-exponent

with power allocation (B.102), it is not difficult to show that with power alloca-

238

Page 259: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

B.4 Proof of Proposition 4.2

tion (B.129)

dpicsir(ℓ)

=(

1 +τ

2

)

min (1 + aℓ, de) dSB(R) +(

1 +τ

2

)

Bntnr

ℓ−1∑

ℓ′=1

min (1 + aℓ′ , de)

(B.132)

=(

1 +τ

2

)

min(

1 + min(

dpicsir(ℓ− 1), d′′(ℓ− 1))

, de

)

dSB(R)

+(

1 +τ

2

)

Bntnr

ℓ−1∑

ℓ′=1

min(

1 + min(

dpicsir(ℓ′ − 1), d′′(ℓ′ − 1)

)

, de

)

.

(B.133)

Here the second equality follows from (B.104). This lower bound can be solved via

recursion from dpicsir(1) and coincides to the upper bound (B.98) since dpicsir(0) =

d′′(0) = 0 (which are the same as dp

icsir(0) = d′(0) = 0 in the upper bound). This

completes the proof the proposition.

239

Page 260: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A
Page 261: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

Appendix C

Magnitude-Squared Notation Phase Notation

Matrix Entry (r, t) Matrix Entry (r, t)

Γb γb,r,t , |hb,r,t|2 ΦHb Φh

b,r,t , ∠hb,r,tΞb ξb,r,t , |eb,r,t|2 ΦE

b Φeb,r,t , ∠eb,r,t

Γb γb,r,t , |hb,r,t|2 ΦHb Φh

b,r,t , ∠hb,r,tΓb γb,r,t , |hb,r,t|2 ΦH

b Φhb,r,t , ∠hb,r,t

Table C.1: Definition of magnitute-squared and phase variables.

Throughout this appendix, to simplify the presentation, we shall change the

usual notation for the pdf in Chapter 1. For any continuous random variable W ,

we rewrite the pdf PW (w) as p(w).

C.1 Preliminaries

Let Hb,r,t, Hb,r,t and Eb,r,t be the entries of H b, H b and Eb at row r and column t

and let

Hb,r,t ,1

σeHb,r,t. (C.1)

It follows from (5.3) that conditioned on Hb,r,t = hb,r,t, Hb,r,t is complex-Gaussian

distributed with mean of 1σehb,r,t and unit variance.

We define magnitude-squared variables and phase variables in Table C.1.

Note that the random variables Γb,r,t and Ξb,r,t have the exponential pdfs

p(γb,r,t) = e−γb,r,t , (C.2)

p(

ξb,r,t

)

= P dee−P de ξb,r,t . (C.3)

241

Page 262: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

C.2 Power Allocation and Asymptotic Analysis

Magnitude-Squared Entry (r, t) Normalised Entry (r, t)Matrix Matrix

Γb γb,r,t , |hb,r,t|2 Ab αb,r,t , − log γb,r,tlogP

Ξb ξb,r,t , |eb,r,t|2 Θb θb,r,t , − log ξb,r,tlogP

Γb γb,r,t , |hb,r,t|2 Ab αb,r,t , − log γb,r,tlogP

Γb γb,r,t , |hb,r,t|2 Ab αb,r,t , − log γb,r,tlogP

Table C.2: Definition of normalised magnitute-squared variables.

Conditioned on Hb,r,t = hb,r,t, Γb,r,t has the non-central chi-square pdf

p(γb,r,t|ν) = e−γb,r,t−νI0(

2√

γb,r,tν)

(C.4)

where ν = 1σ2e|hb,r,t|2 = 1

σ2eγb,r,t is the non-centrality parameter and I0(·) is the

zeroth order modified Bessel function of the first kind.

For high-SNR analysis, we define transformed variables in Table C.2. It

follows from (C.2)–(C.4) that we have the following pdfs

p(αb,r,t) = log(P )P−αb,r,te−P−αb,r,t

, (C.5)

p(θb,r,t) = log(P )P de−θb,r,te−Pde−θb,r,t

, (C.6)

p(αb,r,t|αb,r,t) = log(P )P−αb,r,te−P−αb,r,t−P

de−αb,r,tI0

(

2Pde−αb,r,t−αb,r,t

2

)

. (C.7)

C.2 Power Allocation and Asymptotic Analysis

C.2.1 Power Allocation

We consider power allocation with a scaled identity matrix (5.5)

Pb

(

H(n(b)))

=Pb

(

H(n(b)))

nt

Int, b = 1, . . . , B. (C.8)

One can show that power allocation with constraint [Pb(H(n(b)))] ≤ BP for

all b = 1, . . . , B results in an upper bound to the outage SNR-exponent; note

that this violates the constraint (5.18). On the other hand, one can consider

a suboptimal power allocation such that [Pb(H(n(b)))] ≤ P to obtain a lower

242

Page 263: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

C.2 Power Allocation and Asymptotic Analysis

bound to the outage SNR-exponent. Let

Γ(n(b)) ,

[

Γ1, . . . , Γn(b)

]

, (C.9)

A(n(b)) ,

[

A1, . . . , An(b)

]

, (C.10)

ΦH(n(b))

,

[

ΦH1 , . . . ,Φ

Hn(b)

]

. (C.11)

Then, the optimal power allocation satisfies

Γ(n(b))∈n(b)·nr·nt+ ,

ΦH(n(b))∈[0,2π)n(b)·nr ·nt

Pb

(

H(n(b)))

p(

Γ(n(b)))

p(

ΦH(n(b)))

dΓ(n(b))dΦH(n(b)) .

≤ P.

(C.12)

Let Pb(H(n(b)))

.= Pb. Using the transformation in Table C.2, the above con-

straint becomes

A(n(b))∈n(b)·nr·nt+ ,

ΦH(n(b))∈[0,2π)n(b)·nr·nt

PbP−∑n(b)

b′=1

∑nrr=1

∑ntt=1 αb′,r,tdA(n(b))dΦH(n(b)) .

≤ P. (C.13)

Herein we have neglected the terms irrelevant to the SNR-exponent such as the

phase as p(ΦH(n(b))) is uniform over [0, 2π)n(b)·nr·nt and the interval of αb′,r,t < 0 as

its probability decays exponentially with the SNR. Applying Varadhan’s lemma

[106] to (C.13) yields

supA(n(b))∈n(b)·nr·nt

+ ,

ΦH(n(b))∈[0,2π)n(b)·nr·nt

b

(

A(n(b)),ΦH(n(b)))

−n(b)∑

b′=1

nr∑

r=1

nt∑

t=1

αb′,r,t

≤ 1. (C.14)

The optimal power exponent b minimises Pgout(R). In the following, we shall

consider b that depends on the magnitude but not the phase, i.e., b(A(n(b))).

We shall observe later in Appendices C.2.3 and C.3 – C.5 that this allocation

does not incur loss in the terms of SNR-exponent.

243

Page 264: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

C.2 Power Allocation and Asymptotic Analysis

C.2.2 Asymptotic Analysis

Let OX be the large-SNR outage set from an input alphabet X that can expressed

in terms of Γb, Γb, ΦHb , Θb and ΦE

b , b = 1, . . . , B. Then, it follows that

Pgout(R) = Pr

Igmi(H , H ,P) < R

(C.15)

=

OX

B∏

b=1

p(

Γb,Hb, Eb

)

dΓbdHbdEb (C.16)

=

OX

B∏

b=1

p(

Γb

∣Γb

)

p (Γb) p(

ΦHb

)

p(

Ξb

)

p(

ΦEb

)

· dΓbdΓbdΞbdΦHb dΦ

Eb .

(C.17)

By changing the variables from Γb, Γb and Ξb to Ab, Ab and Θb, we have that

Pgout(R)

.=

OX

B∏

b=1

nr∏

r=1

nt∏

t=1

p(αb,r,t|αb,r,t)p(αb,r,t)p(θb,r,t)dαb,r,tdαb,r,tdθb,r,tdφhb,r,tdφ

eb,r,t

(C.18)

where the pdfs have been expressed in terms of the entries of the matrices. Herein

the pdfs of the phases do not appear because ΦHb,r,t and ΦE

b,r,t are uniformly dis-

tributed over [0, 2π) and hence do not affect the dot equality.

Now, assume that we have perfect CSIR (de ↑ ∞). Following the same

derivation in [46], we have that

Pgout(R).=

OX

B∏

b=1

nr∏

r=1

nt∏

t=1

p(αb,r,t|αb,r,t)p(αb,r,t)dαb,r,tdαb,r,t (C.19)

.=

OX

(b,r,t):−de≤αb,r,t=αb,r,t−de<0

(

P−αb,r,tdαb,r,t

)

·∏

(b,r,t):αb,r,t≥0,αb,r,t≥de

(

P−(αb,r,t+αb,r,t)dαb,r,tdαb,r,t

)

. (C.20)

We compare (C.18) and (C.20), and observe that the extra term in (C.18) is

due to p(θb,r,t). Thus, evaluating (C.18) by using the joint pdf

244

Page 265: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

C.2 Power Allocation and Asymptotic Analysis∏

b,r,t p(αb,r,t|αb,r,t)p(αb,r,t) (C.20) and the pdf p(θb,r,t) (C.6) yields

Pgout(R).= P dicsi (C.21)

.=

OX

B∏

b=1

nr∏

r=1

nt∏

t=1

p(αb,r,t|αb,r,t)p(αb,r,t)p(θb,r,t)dαb,r,tdαb,r,tdθb,r,tdφhb,r,tdφ

eb,r,t

(C.22)

.=

OX

(b,r,t):−de≤αb,r,t=αb,r,t−de<0

(

logP · e−P−(θb,r,t−de) · P−αb,r,t−(θb,r,t−de)

· dαb,r,tdθb,r,tdφhb,r,tdφ

eb,r,t

)

×∏

(b,r,t):αb,r,t≥0,αb,r,t≥de

(

logP · e−P−(θb,r,t−de) · P−αb,r,t−αb,r,t−(θb,r,t−de)

· dαb,r,tdαb,r,tdθb,r,tdφhb,r,tdφ

eb,r,t

)

(C.23)

.=

OX

(b,r,t):−de≤αb,r,t=αb,r,t−de<0,

θb,r,t≥de

(

P−αb,r,t−(θb,r,t−de)dαb,r,tdθb,r,tdφhb,r,tdφ

eb,r,t

)

×∏

(b,r,t): αb,r,t≥0,αb,r,t≥de,

θb,r,t≥de

(

P−αb,r,t−αb,r,t−(θb,r,t−de)dαb,r,tdαb,r,tdθb,r,tdφhb,r,tdφ

eb,r,t

)

(C.24)

where the last dot equality follows from the proof of Lemma A.1. Applying

Varadhan’s lemma [106] to (C.24) yields

dicsi = infA,A,Θ∈OX

(b,r,t):−de≤αb,r,t=αb,r,t−de<0, θb,r,t≥de

αb,r,t +(

θb,r,t − de

)

(b,r,t): αb,r,t≥0,αb,r,t≥de, θb,r,t≥de

αb,r,t + αb,r,t +(

θb,r,t − de

)

. (C.25)

C.2.3 GMI Upper and Lower Bounds

We use bounding techniques developed in Appendix A to prove the results. A

GMI upper bound is obtained from Proposition 2.3. Let

Igmib (Pb,Hb, Hb) ≥ sup

s>0Igmib (Pb,Hb, Hb, s) (C.26)

245

Page 266: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

C.2 Power Allocation and Asymptotic Analysis

be the upper bound obtained by following the derivation in Appendices A.2 and

A.3. Denote the resulting GMI upper bound as

Igmi(H, H,P) ,

B∑

b=1

Igmib (Pb,Hb, Hb). (C.27)

We can see from Appendices A.2 and A.3 that Igmib (Pb,Hb, Hb) is a non-decreasing

function of the transmit power coefficient Pb(H(n(b))) at high SNR. It follows that

using the maximum power exponent in (C.14), i.e.,

b

(

A(n(b)))

= 1 +

n(b)∑

b′=1

nr∑

r=1

nt∑

t=1

αb′,r,t (C.28)

for the GMI upper bound yields an upper bound to the optimal outage SNR-

exponent for i.i.d. inputs. As for A(n(b)), the power exponent can be expressed

as

b

(

A(n(b)))

= 1 + n(b)nrntde +

n(b)∑

b′=1

nr∑

r=1

nt∑

t=1

αb′,r,t. (C.29)

We further denote an equivalent outage-set obtained with GMI upper bound as

OX ,

H, E,P : Igmi(H, H,P) < R

. (C.30)

A GMI lower bound is obtained by substituting a suboptimal s to Igmib (Pb,Hb, Hb, s),

i.e.,

s =B

Bnr +∑B

b=1

∑nr

r=1

∑nt

t=1

Pb(H(n(b)))nt

|eb,r,t|2. (C.31)

Let

Igmib (Pb,Hb, Hb, s) ≤ Igmi

b (Pb,Hb, Hb, s) (C.32)

be the lower bound derived using the techniques in Appendices A.2 and A.3 and

let

Igmi(H, H,P) ,1

B

B∑

b=1

Igmib (Pb,Hb, Hb, s) (C.33)

246

Page 267: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

C.3 Full-CSIT Power Allocation

be the resulting GMI lower bound. Then, we denote an equivalent-outage set

with GMI lower bound

OX ,

H, E,P : Igmi(H, H,P) < R

. (C.34)

Note that the choice s gives the largest outage SNR-exponent in Chapter 3 with

uniform power allocation. However, with power control, it is not yet clear what

effects that may occur by allocating different power to different fading blocks.

Nevertheless, we will first use the allocation (C.29) to evaluate a lower bound to

the outage SNR-exponent. We will show later that this allocation may not give

a tight upper and lower bounds.

C.3 Full-CSIT Power Allocation

For full CSIT, we have n(b) = B, b = 1, . . . , B. We shall show in the following

that upper and lower bounds to the optimal outage SNR-exponent derived using

b(A(n(b))) (i.e, (C.29) with n(b) = B),

b

(

A(n(b)))

= 1 +Bnrntde +B∑

b′=1

nr∑

r=1

nt∑

t=1

αb′,r,t (C.35)

are tight. In the following, the superscript f will be used to indicate full CSIT.

We infer from Appendix A that it suffices to consider solving the outage

SNR-exponent for discrete inputs with alphabet size |X| = 2M . The proof for

the Gaussian inputs with constant R independent of the SNR (such that the

multiplexing gain tends to zero) follows along the same line as the proof for

discrete inputs with a sufficiently large alphabet size such that M ≥ BR. Thus,

for the remaining part of this appendix, we shall focus on OX for discrete inputs.

247

Page 268: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

C.3 Full-CSIT Power Allocation

C.3.1 GMI Upper Bound

Substituting OX in (C.25) with OX (C.30) yields an upper bound dficsi ≥ dficsi

dficsi = infA,A,Θ∈OX

(b,r,t):−de≤αb,r,t=αb,r,t−de<0, θb,r,t≥de

αb,r,t +(

θb,r,t − de

)

(b,r,t): αb,r,t≥0,αb,r,t≥de, θb,r,t≥de

αb,r,t + αb,r,t +(

θb,r,t − de

)

. (C.36)

Using b(A(n(b))) in (C.35) and following the derivation in Appendix A.2, we can

express an equivalent outage-set with GMI upper bound (C.30) for the discrete

constellations of size |X| = 2M as follows

OX =

A, A, Θ ∈ Bnr×nt :

B∑

b=1

κb <BR

M

. (C.37)

Here we have defined the following

κb ,∣

∣S(ǫ,δ)

b

∣, (C.38)

S(ǫ,δ)

b ,

nr⋃

r=1

S(ǫ,δ)

b,r , (C.39)

S(ǫ,δ)

b,r ,

t :

αb,r,t ≤ b

(

A(n(b)))

+ ǫ ∩ αb,r,t ≤ θb,r,t + δ

αb,r,t ≤ b

(

A(n(b)))

+ ǫ ∩ αb,r,t > θb,r,t + δ ∩ Qb,r,t

,

t = 1, . . . , nt

, (C.40)

Qb,r,t ,

φhb,r,t, φ

eb,r,t ∈ [0, 2π) : cos

(

φeb,r,t − φh

b,r,t

)

> 0

(C.41)

for any ǫ, δ > 0. Note that increasing θb,r,t increases both the objective function

(C.36) and the threshold for αb,r,t in (C.40). Hence, the infimum solutions for

θb,r,t, b = 1, . . . , B, r = 1, . . . , nr, t = 1, . . . , nt are given by de.

From (C.36), (C.37) and θb,r,t = de, b = 1, . . . , B, r = 1, . . . , nr, t = 1, . . . , nt,

248

Page 269: NEAREST NEIGHBOUR DECODING FOR FADING CHANNELSitc.upf.edu/system/files/biblio-pdf/asyhari_dissertation.pdf · 2020-04-30 · Nearest Neighbour Decoding for Fading Channels By: A

C.3 Full-CSIT Power Allocation

assume that without loss of generality, for r = 1, . . . , nr

αb,r,t ≥ min(

b

(

A(n(b)))

+ ǫ, de + δ)

, bB + t >BR

M(C.42)

if Qb,r,t does not occur, and

αb,r,t ≥ b

(

A(n(b)))

+ ǫ, bB + t >BR

M(C.43)

if Qb,r,t occurs. Since the argument on the RHS of (C.36) is increasing with αb,r,t

and since withπ

2≤ φe

b,r,t − φhb,r,t ≤

2, (C.44)

the event Qb,r,t does not occur, it follows that the infimum is achieved with

αb,r,t =

min(

b

(

A(n(b)))

+ ǫ, de + δ)

, if bB + t > BRM

0, otherwise.(C.45)

Depending on the values of de and b(A(n(b))), we have the following cases.

• Case 1: $d_e + \delta \le p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) + \epsilon$ and $d_e + \delta < d_e$

The values of $\alpha_{b,r,t}$ for $bB + t > \frac{BR}{M}$ that achieve the infimum (C.36) are given by $d_e + \delta$, and they belong to $\{(b,r,t) : -d_e \le \hat\alpha_{b,r,t} = \alpha_{b,r,t} - d_e < 0\}$. On the other hand, for $bB + t \le \frac{BR}{M}$, the infimum (C.36) is achieved with $\alpha_{b,r,t} = 0$, which also belongs to $\{(b,r,t) : -d_e \le \hat\alpha_{b,r,t} = \alpha_{b,r,t} - d_e < 0\}$. Thus, the solutions for $\hat\alpha_{b,r,t}$ that achieve the infimum (C.36) are given by

\hat\alpha_{b,r,t} = \begin{cases} d_e + \delta - d_e, & \text{for } bB + t > \frac{BR}{M} \\ -d_e, & \text{otherwise.} \end{cases}   (C.46)

This yields

\bar d^{\,\mathrm{f}}_{\mathrm{icsi}} = (d_e + \delta)\, d_{\mathrm{SB}}(R).   (C.47)

Here we have that $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) = 1 + B n_r n_t d_e + \sum_{b,r,t} \hat\alpha_{b,r,t} = 1 + d_{\mathrm{SB}}(R)\, d_e \ge d_e + \delta - \epsilon$, which satisfies the condition for Case 1 for sufficiently small $\delta - \epsilon$.

• Case 2: $d_e + \delta \le p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) + \epsilon$ and $d_e + \delta \ge d_e$

The values of $\alpha_{b,r,t}$ for $bB + t > \frac{BR}{M}$ that achieve the infimum (C.36) are given by $d_e + \delta$, and they belong to $\{(b,r,t) : \hat\alpha_{b,r,t} \ge 0,\ \alpha_{b,r,t} \ge d_e\}$. On the other hand, for $bB + t \le \frac{BR}{M}$, the infimum (C.36) is achieved with $\alpha_{b,r,t} = 0$, which belongs to $\{(b,r,t) : -d_e \le \hat\alpha_{b,r,t} = \alpha_{b,r,t} - d_e < 0\}$. Thus, the solutions for $\hat\alpha_{b,r,t}$ that achieve the infimum (C.36) are given by

\hat\alpha_{b,r,t} = \begin{cases} 0, & \text{for } bB + t > \frac{BR}{M} \\ -d_e, & \text{otherwise.} \end{cases}   (C.48)

This yields

\bar d^{\,\mathrm{f}}_{\mathrm{icsi}} = (d_e + \delta)\, d_{\mathrm{SB}}(R).   (C.49)

Here we have that $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) = 1 + B n_r n_t d_e + \sum_{b,r,t} \hat\alpha_{b,r,t} = 1 + d_{\mathrm{SB}}(R)\, d_e$. Hence, the condition $d_e + \delta \le p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) + \epsilon$ is satisfied if $d_e + \delta \le 1 + d_e\, d_{\mathrm{SB}}(R) + \epsilon$ for some sufficiently small $\delta, \epsilon > 0$.

• Case 3: $d_e + \delta > p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) + \epsilon$

We first note that for $bB + t \le \frac{BR}{M}$, the values of $\alpha_{b,r,t}$ achieving the infimum are given by zero. This implies that the values of $\hat\alpha_{b,r,t}$, $bB + t \le \frac{BR}{M}$, achieving the infimum are given by $-d_e$.

For $(b', r', t')$ such that $b'B + t' > \frac{BR}{M}$, if $(b', r', t')$ belongs to $\{(b,r,t) : -d_e \le \hat\alpha_{b,r,t} = \alpha_{b,r,t} - d_e < 0\}$, we have that

p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) = 1 + (B n_r n_t - 1) d_e + \alpha_{b',r',t'} + \sum_{(b,r,t) \ne (b',r',t')} \hat\alpha_{b,r,t} \ge \alpha_{b',r',t'}.   (C.50)

This implies that the constraint (C.37) can never be met. As such, $(b', r', t')$ with $b'B + t' > \frac{BR}{M}$ must belong to $\{(b,r,t) : \hat\alpha_{b,r,t} \ge 0,\ \alpha_{b,r,t} \ge d_e\}$. In that case, since $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ increases with $\hat\alpha_{b,r,t}$, the values of $\hat\alpha_{b,r,t}$ that solve the infimum in (C.36) are given by

\hat\alpha_{b,r,t} = \begin{cases} 0, & \text{for } bB + t > \frac{BR}{M} \\ -d_e, & \text{otherwise,} \end{cases}   (C.51)

which results in

p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) = 1 + d_{\mathrm{SB}}(R)\, d_e.   (C.52)

It follows that

\bar d^{\,\mathrm{f}}_{\mathrm{icsi}} = d_{\mathrm{SB}}(R)\, \bigl( 1 + d_{\mathrm{SB}}(R)\, d_e + \epsilon \bigr).   (C.53)

From Cases 1 to 3, by letting $\epsilon, \delta \downarrow 0$, we have the upper bound

\bar d^{\,\mathrm{f}}_{\mathrm{icsi}} = \begin{cases} d_e\, d_{\mathrm{SB}}(R), & \text{if } d_e < 1 + d_{\mathrm{SB}}(R)\, d_e \\ d_{\mathrm{SB}}(R)\bigl( 1 + d_{\mathrm{SB}}(R)\, d_e \bigr), & \text{if } d_e \ge 1 + d_{\mathrm{SB}}(R)\, d_e. \end{cases}   (C.54)
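For convenience, (C.54) is a simple piecewise expression that can be evaluated directly. A minimal Python sketch (not part of the proof), taking the Singleton-bound exponent $d_{\mathrm{SB}}(R)$ and the error diversity $d_e$ as given numbers (their definitions are in the main text; names and the example values are ours):

```python
def full_csit_upper_bound(d_e: float, d_SB: float) -> float:
    """Outage SNR-exponent upper bound of (C.54) for full-CSIT power allocation."""
    if d_e < 1.0 + d_SB * d_e:
        return d_e * d_SB
    return d_SB * (1.0 + d_SB * d_e)

# example: d_SB(R) = 4 and error diversity d_e = 1
print(full_csit_upper_bound(d_e=1.0, d_SB=4.0))
```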

C.3.2 GMI Lower Bound

Replacing $\mathcal{O}_{\mathcal{X}}$ in (C.25) with $\underline{\mathcal{O}}_{\mathcal{X}}$ (C.34) yields a lower bound $\underline{d}^{\,\mathrm{f}}_{\mathrm{icsi}} \le d^{\,\mathrm{f}}_{\mathrm{icsi}}$,

\underline{d}^{\,\mathrm{f}}_{\mathrm{icsi}} = \inf_{\mathsf{A},\hat{\mathsf{A}},\Theta \in \underline{\mathcal{O}}_{\mathcal{X}}} \Biggl\{ \sum_{(b,r,t):\, -d_e \le \hat\alpha_{b,r,t} = \alpha_{b,r,t} - d_e < 0,\ \theta_{b,r,t} \ge d_e} \bigl[ \alpha_{b,r,t} + (\theta_{b,r,t} - d_e) \bigr] + \sum_{(b,r,t):\, \hat\alpha_{b,r,t} \ge 0,\ \alpha_{b,r,t} \ge d_e,\ \theta_{b,r,t} \ge d_e} \bigl[ \alpha_{b,r,t} + \hat\alpha_{b,r,t} + (\theta_{b,r,t} - d_e) \bigr] \Biggr\}.   (C.55)

In the following, we solve $\underline{d}^{\,\mathrm{f}}_{\mathrm{icsi}}$ using the same power exponent $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ used to derive the upper bound (cf. (C.35)),

p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) = 1 + B n_r n_t d_e + \sum_{b'=1}^{B} \sum_{r=1}^{n_r} \sum_{t=1}^{n_t} \hat{\alpha}_{b',r,t}, \quad b = 1,\ldots,B,   (C.56)

and show that this exponent yields $\underline{d}^{\,\mathrm{f}}_{\mathrm{icsi}} = \bar d^{\,\mathrm{f}}_{\mathrm{icsi}}$. Following the derivation in Appendix A.2, we first express an equivalent outage set with the GMI lower bound (C.34) for discrete constellations of size $|\mathcal{X}| = 2^M$ as follows:

\underline{\mathcal{O}}_{\mathcal{X}} = \Bigl\{ \mathsf{A},\hat{\mathsf{A}},\Theta \in \mathbb{R}^{B \times n_r \times n_t} : \sum_{b=1}^{B} \underline\kappa_b < \frac{BR}{M} \Bigr\}.   (C.57)

Here we have defined the following:

\underline\kappa_b \triangleq \bigl| \underline{\mathcal{S}}^{(\epsilon,\delta)}_{b} \bigr|,   (C.58)

\underline{\mathcal{S}}^{(\epsilon,\delta)}_{b} \triangleq \bigcup_{r=1}^{n_r} \underline{\mathcal{S}}^{(\epsilon,\delta)}_{b,r},   (C.59)

\underline{\mathcal{S}}^{(\epsilon,\delta)}_{b,r} \triangleq \bigl\{ t \in \{1,\ldots,n_t\} : \alpha_{b,r,t} < p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) - \epsilon \,\cap\, \alpha_{b,r,t} < \theta_{\min} - \delta \bigr\},   (C.60)

\theta_{\min} \triangleq \min\{\theta_{1,1,1}, \ldots, \theta_{B,n_r,n_t}\}   (C.61)

for any $\epsilon, \delta > 0$.

We observe from (C.55) and (C.60) that increasing $\theta_{\min}$ increases both the objective function (C.55) and the threshold for $\alpha_{b,r,t}$ in (C.60). Thus, the value of $\theta_{\min}$ that solves the infimum (C.55) is given by $d_e$. Since for any $b, r, t$ we have $\theta_{b,r,t} \ge \theta_{\min}$, the values of $\theta_{b,r,t}$ that solve the infimum are also given by $d_e$, as they do not appear in $\underline{\mathcal{O}}_{\mathcal{X}}$.

We next compare $\underline{\mathcal{S}}^{(\epsilon,\delta)}_{b,r}$ (C.60) with $\overline{\mathcal{S}}^{(\epsilon,\delta)}_{b,r}$ (C.40). There are two main differences between $\underline{\mathcal{S}}^{(\epsilon,\delta)}_{b,r}$ and $\overline{\mathcal{S}}^{(\epsilon,\delta)}_{b,r}$. Firstly, in the set $\underline{\mathcal{S}}^{(\epsilon,\delta)}_{b,r}$ we have $\theta_{\min}$ as the threshold for $\alpha_{b,r,t}$, instead of $\theta_{b,r,t}$ in $\overline{\mathcal{S}}^{(\epsilon,\delta)}_{b,r}$. However, since the value of $\theta_{\min}$ that solves the infimum (C.55) is also given by $d_e$ (the same as the value of $\theta_{b,r,t}$ that solves the infimum (C.36)), this difference does not change the resulting infima in (C.36) and (C.55). Secondly, we have an extra term in the set $\overline{\mathcal{S}}^{(\epsilon,\delta)}_{b,r}$ that depends on the phases $\phi^h_{b,r,t}$ and $\phi^e_{b,r,t}$,

\bigl\{ \alpha_{b,r,t} \le p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) + \epsilon \,\cap\, \alpha_{b,r,t} > \theta_{b,r,t} + \delta \,\cap\, \mathcal{Q}_{b,r,t} \bigr\},   (C.62)

where

\mathcal{Q}_{b,r,t} = \bigl\{ \phi^h_{b,r,t}, \phi^e_{b,r,t} \in [0, 2\pi) : \cos\bigl( \phi^e_{b,r,t} - \phi^h_{b,r,t} \bigr) > 0 \bigr\}.   (C.63)

The infimum solution in (C.36) is obtained when the event $\mathcal{Q}_{b,r,t}$ does not occur. It follows that, since the infimum solutions for both $\theta_{\min}$ and $\theta_{b,r,t}$ are identical and the set (C.62) is not active in solving the infimum (C.36), the result for the infimum (C.55) has a similar form to that for (C.36), i.e.,

\underline{d}^{\,\mathrm{f}}_{\mathrm{icsi}} = \begin{cases} (d_e - \delta)\, d_{\mathrm{SB}}(R), & \text{if } d_e - \epsilon < 1 + d_{\mathrm{SB}}(R)\, d_e - \delta \\ \bigl( 1 + d_{\mathrm{SB}}(R)\, d_e - \epsilon \bigr)\, d_{\mathrm{SB}}(R), & \text{if } d_e - \epsilon \ge 1 + d_{\mathrm{SB}}(R)\, d_e - \delta. \end{cases}   (C.64)


By letting $\epsilon, \delta \downarrow 0$, combining (C.54) with (C.64) completes the proof.

C.4 Causal-CSIT Power Allocation

For causal CSIT, we have that $n(b) = b - \tau_d$. The exponent of the optimal power allocation must satisfy (C.14). We shall first use the maximum power exponent satisfying the constraint, i.e.,

p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) = 1 + \sum_{b'=1}^{b-\tau_d} \sum_{r=1}^{n_r} \sum_{t=1}^{n_t} \bigl( \hat\alpha_{b',r,t} + d_e \bigr),   (C.65)

for both the GMI upper and lower bounds. For the GMI upper bound, this gives an upper bound to the optimal outage SNR-exponent, as argued in Appendix C.2.3. For the GMI lower bound, however, this may not yield a tight result for the optimal outage SNR-exponent. Nevertheless, it provides guidance about the structure of the power exponent that yields a tight lower bound.

Similarly to Appendix C.3, we note that it suffices to consider solving the outage SNR-exponent for discrete inputs with alphabet size $|\mathcal{X}| = 2^M$. In the following, the superscript c will be used to indicate causal CSIT.

C.4.1 GMI Upper Bound

An upper bound to the outage SNR-exponent with causal CSIT, $\bar d^{\,\mathrm{c}}_{\mathrm{icsi}}$, has an expression equivalent to the one with full CSIT (C.36), i.e.,

\bar d^{\,\mathrm{c}}_{\mathrm{icsi}} = \inf_{\mathsf{A},\hat{\mathsf{A}},\Theta \in \overline{\mathcal{O}}_{\mathcal{X}}} \Biggl\{ \sum_{(b,r,t):\, -d_e \le \hat\alpha_{b,r,t} = \alpha_{b,r,t} - d_e < 0,\ \theta_{b,r,t} \ge d_e} \bigl[ \alpha_{b,r,t} + (\theta_{b,r,t} - d_e) \bigr] + \sum_{(b,r,t):\, \hat\alpha_{b,r,t} \ge 0,\ \alpha_{b,r,t} \ge d_e,\ \theta_{b,r,t} \ge d_e} \bigl[ \alpha_{b,r,t} + \hat\alpha_{b,r,t} + (\theta_{b,r,t} - d_e) \bigr] \Biggr\}.   (C.66)

Here $\overline{\mathcal{O}}_{\mathcal{X}}$ has a similar form to that in the full-CSIT case except for $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$, which is now given in (C.65). We can write $\overline{\mathcal{O}}_{\mathcal{X}}$ as

\overline{\mathcal{O}}_{\mathcal{X}} = \Bigl\{ \mathsf{A},\hat{\mathsf{A}},\Theta \in \mathbb{R}^{B \times n_r \times n_t} : \sum_{b=1}^{B} \bar\kappa_b < \frac{BR}{M} \Bigr\},   (C.67)

where we have defined the following:

\bar\kappa_b \triangleq \bigl| \overline{\mathcal{S}}^{(\epsilon,\delta)}_{b} \bigr|,   (C.68)

\overline{\mathcal{S}}^{(\epsilon,\delta)}_{b} \triangleq \bigcup_{r=1}^{n_r} \overline{\mathcal{S}}^{(\epsilon,\delta)}_{b,r},   (C.69)

\overline{\mathcal{S}}^{(\epsilon,\delta)}_{b,r} \triangleq \Bigl\{ t \in \{1,\ldots,n_t\} : \bigl\{ \alpha_{b,r,t} \le p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) + \epsilon \,\cap\, \alpha_{b,r,t} \le \theta_{b,r,t} + \delta \bigr\} \cup \bigl\{ \alpha_{b,r,t} \le p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) + \epsilon \,\cap\, \alpha_{b,r,t} > \theta_{b,r,t} + \delta \,\cap\, \mathcal{Q}_{b,r,t} \bigr\} \Bigr\},   (C.70)

\mathcal{Q}_{b,r,t} \triangleq \bigl\{ \phi^h_{b,r,t}, \phi^e_{b,r,t} \in [0, 2\pi) : \cos\bigl( \phi^e_{b,r,t} - \phi^h_{b,r,t} \bigr) > 0 \bigr\}   (C.71)

for any $\epsilon, \delta > 0$.

We first define

d^\ddagger \triangleq B n_t - \Bigl\lceil \frac{BR}{M} \Bigr\rceil + 1.   (C.72)

Following the same argument as in Appendix C.3.1, the infimum solutions for $\theta_{b,r,t}$, for all $b, r, t$ in (C.66), are given by $d_e$. For each $r = 1,\ldots,n_r$, assume without loss of generality the following conditions, which make the constraint in $\overline{\mathcal{O}}_{\mathcal{X}}$ tight:

\alpha_{b,r,t} > \min\bigl( p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) + \epsilon,\; d_e + \delta \bigr), \quad \bigl( \phi^h_{b,r,t}, \phi^e_{b,r,t} \bigr) \notin \mathcal{Q}_{b,r,t}, \quad b n_t + t \le d^\ddagger,   (C.73)

\alpha_{b,r,t} \le \min\bigl( p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) + \epsilon,\; d_e + \delta \bigr), \quad b n_t + t > d^\ddagger.   (C.74)

Then, the infimum (C.66) is achieved with $\alpha_{b,r,t}$ equal to

\vartheta_{b,r,t} = \begin{cases} \min\bigl( p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) + \epsilon,\; d_e + \delta \bigr), & \text{for } b n_t + t \le d^\ddagger \\ 0, & \text{for } b n_t + t > d^\ddagger. \end{cases}   (C.75)

Note that for $b = 1,\ldots,\tau_d$, we have $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) = 1$.

The exponent $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ sets a threshold for $\alpha_{b,r,t}$ in (C.70) (the deep-fading threshold). Since increasing $\hat\alpha_{b',r,t}$, $b' = 1,\ldots,b-\tau_d$, increases both $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ and the objective function in (C.66), it follows that the solutions for $\hat\alpha_{b,r,t}$ that attain the infimum (C.66) are given by

\hat\alpha_{b,r,t} = \begin{cases} \vartheta_{b,r,t} - d_e, & \text{if } \vartheta_{b,r,t} < d_e \\ 0, & \text{if } \vartheta_{b,r,t} \ge d_e, \end{cases}   (C.76)

which can also be written as

\hat\alpha_{b,r,t} = \min\bigl( \vartheta_{b,r,t} - d_e,\; 0 \bigr).   (C.77)

Using this $\hat\alpha_{b,r,t}$, we have that for $b = \tau_d + 1,\ldots,B$,

p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) = 1 + \sum_{b'=1}^{b-\tau_d} \sum_{r=1}^{n_r} \sum_{t=1}^{n_t} \min\bigl( \vartheta_{b',r,t},\; d_e \bigr).   (C.78)

Let

\bar b = \max_{b :\, b n_t \le d^\ddagger} b.   (C.79)

It follows from (C.75), (C.76) and (C.78) that, by letting $\epsilon, \delta \downarrow 0$, the infimum (C.66) is given by

\bar d^{\,\mathrm{c}}_{\mathrm{icsi}} = n_t n_r \sum_{b=1}^{\bar b} \vartheta_b + n_r \bigl( d^\ddagger - \bar b\, n_t \bigr)\, \vartheta_{\bar b + 1},   (C.80)

where for $b = 1,\ldots,\min(\tau_d, \bar b + 1)$,

\vartheta_b = \min(d_e, 1),   (C.81)

and for $b = \min(\tau_d, \bar b + 1) + 1,\ldots,\bar b + 1$,

\vartheta_b = \min\Bigl( d_e,\; 1 + n_r n_t \sum_{b'=1}^{b-\tau_d} \min\bigl( \vartheta_{b'},\; d_e \bigr) \Bigr).   (C.82)
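The upper bound (C.80)–(C.82) can be computed with a short recursion. The following minimal Python sketch (illustrative only; the function and parameter names are ours, and a CSIT delay $\tau_d \ge 1$ is assumed) takes $B$, $n_t$, $n_r$, $M$, $R$, $\tau_d$ and $d_e$ and returns $\bar d^{\,\mathrm{c}}_{\mathrm{icsi}}$:

```python
import math

def causal_csit_upper_bound(B, n_t, n_r, M, R, tau_d, d_e):
    """Evaluate (C.80)-(C.82): outage SNR-exponent upper bound with causal CSIT."""
    d_dagger = B * n_t - math.ceil(B * R / M) + 1                               # (C.72)
    b_bar = max((b for b in range(1, B + 1) if b * n_t <= d_dagger), default=0)  # (C.79)
    theta = {}                                                                   # theta[b] = vartheta_b
    for b in range(1, b_bar + 2):                                                # b = 1, ..., b_bar + 1
        if b <= min(tau_d, b_bar + 1):
            theta[b] = min(d_e, 1.0)                                             # (C.81)
        else:
            acc = sum(min(theta[bp], d_e) for bp in range(1, b - tau_d + 1))
            theta[b] = min(d_e, 1.0 + n_r * n_t * acc)                           # (C.82)
    d_c = (n_t * n_r * sum(theta[b] for b in range(1, b_bar + 1))
           + n_r * (d_dagger - b_bar * n_t) * theta[b_bar + 1])                  # (C.80)
    return d_c

# example: B=4 blocks, 2x2 MIMO, M=4 bits/symbol, R=4, delay tau_d=1, d_e=1
print(causal_csit_upper_bound(B=4, n_t=2, n_r=2, M=4, R=4.0, tau_d=1, d_e=1.0))
```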


C.4.2 GMI Lower Bound

A lower bound to the outage SNR-exponent with causal CSIT, $\underline{d}^{\,\mathrm{c}}_{\mathrm{icsi}}$, has an expression equivalent to the one with full CSIT (C.55), i.e.,

\underline{d}^{\,\mathrm{c}}_{\mathrm{icsi}} = \inf_{\mathsf{A},\hat{\mathsf{A}},\Theta \in \underline{\mathcal{O}}_{\mathcal{X}}} \Biggl\{ \sum_{(b,r,t):\, -d_e \le \hat\alpha_{b,r,t} = \alpha_{b,r,t} - d_e < 0,\ \theta_{b,r,t} \ge d_e} \bigl[ \alpha_{b,r,t} + (\theta_{b,r,t} - d_e) \bigr] + \sum_{(b,r,t):\, \hat\alpha_{b,r,t} \ge 0,\ \alpha_{b,r,t} \ge d_e,\ \theta_{b,r,t} \ge d_e} \bigl[ \alpha_{b,r,t} + \hat\alpha_{b,r,t} + (\theta_{b,r,t} - d_e) \bigr] \Biggr\},   (C.83)

where now $\underline{\mathcal{O}}_{\mathcal{X}}$ is characterised by $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ satisfying the constraint (C.14). Note that $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ in (C.65) may no longer give a tight lower bound. We will show later how to improve $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ so that we obtain a tight lower bound. Following the derivation in Appendix A.2, we obtain an equivalent outage set with the GMI lower bound (C.34) for discrete inputs as follows:

\underline{\mathcal{O}}_{\mathcal{X}} = \Bigl\{ \mathsf{A},\hat{\mathsf{A}},\Theta \in \mathbb{R}^{B \times n_r \times n_t} : \sum_{b=1}^{B} \underline\kappa_b < \frac{BR}{M} \Bigr\}.   (C.84)

Here we have defined the following:

\underline\kappa_b \triangleq \bigl| \underline{\mathcal{S}}^{(\epsilon,\delta)}_{b} \bigr|,   (C.85)

\underline{\mathcal{S}}^{(\epsilon,\delta)}_{b} \triangleq \bigcup_{r=1}^{n_r} \underline{\mathcal{S}}^{(\epsilon,\delta)}_{b,r},   (C.86)

\underline{\mathcal{S}}^{(\epsilon,\delta)}_{b,r} \triangleq \bigl\{ t \in \{1,\ldots,n_t\} : \alpha_{b,r,t} < p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) - \epsilon \,\cap\, \alpha_{b,r,t} < g_b - \delta \bigr\},   (C.87)

where $g_b$ satisfies

P^{g_b} \doteq \frac{ P_b\bigl(\hat{\mathsf{H}}^{(n(b))}\bigr) }{ \sum_{b'=1}^{B} \sum_{r=1}^{n_r} \sum_{t=1}^{n_t} \frac{ P_{b'}\bigl(\hat{\mathsf{H}}^{(n(b'))}\bigr) }{ n_t }\, \bigl| \hat e_{b',r,t} \bigr|^2 }   (C.88)

and is given by

g_b = \min_{b'=1,\ldots,B} \Bigl\{ \theta_{b',\min} - p_{b'}\bigl(\hat{\mathsf{A}}^{(n(b'))}\bigr) \Bigr\} + p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr).   (C.89)

Here $\theta_{b,\min} \triangleq \min\{\theta_{b,1,1}, \ldots, \theta_{b,n_r,n_t}\}$. Since increasing $\theta_{b,\min}$ may increase the threshold $g_b$ in $\underline{\mathcal{O}}_{\mathcal{X}}$, the values of $\theta_{b,\min}$ achieving the infimum (C.83) are given by $d_e$. All other $\theta_{b,r,t}$ are also given by $d_e$. Since $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ is monotonically non-decreasing with $b$, it follows that for $b = 1,\ldots,\tau_d$,

g_b = d_e - p_B\bigl(\hat{\mathsf{A}}^{(n(B))}\bigr) + 1,   (C.90)

and for $b = \tau_d + 1,\ldots,B$,

g_b = d_e - p_B\bigl(\hat{\mathsf{A}}^{(n(B))}\bigr) + p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr).   (C.91)

For $r = 1,\ldots,n_r$, assume without loss of generality the following conditions, which make the constraint in $\underline{\mathcal{O}}_{\mathcal{X}}$ tight:

\alpha_{b,r,t} > \min\bigl( p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) - \epsilon,\; g_b - \delta \bigr), \quad b n_t + t \le d^\ddagger,   (C.92)

\alpha_{b,r,t} \le \min\bigl( p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) - \epsilon,\; g_b - \delta \bigr), \quad b n_t + t > d^\ddagger,   (C.93)

where $d^\ddagger$ is defined in (C.72). By letting $\epsilon, \delta \downarrow 0$, the infimum (C.83) is achieved with $\alpha_{b,r,t}$ equal to

\vartheta_{b,r,t} = \begin{cases} \min\bigl( p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr),\; g_b \bigr), & \text{for } b n_t + t \le d^\ddagger \\ 0, & \text{for } b n_t + t > d^\ddagger. \end{cases}   (C.94)

Since $g_b$ in (C.89) is less than or equal to $\theta_{b,r,t}$ in (C.70), a lower bound to the outage SNR-exponent derived with $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ is generally smaller than the upper bound. The loss is mainly due to transmitting with a power exponent larger than the CSIR-error diversity $d_e$. Hence, in this case, using the exponent (C.65), which is optimal in the perfect-CSIR case, may no longer be optimal in the mismatched-CSIR case.

However, as observed in the GMI upper bound (Appendix C.4.1), if $d_e$ is less than $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ in (C.65), using a power exponent beyond $d_e$ does not provide additional gains, as $d_e$ always limits the performance. In the following, we show that limiting the power exponent by the CSIR-error diversity $d_e$ yields the same outage SNR-exponent as that in Appendix C.4.1.

Consider a new power exponent $p'_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$, which is obtained by imposing a peak limit $d_e$ on $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$, i.e.,

p'_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) = \min\bigl( d_e,\; p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) \bigr).   (C.95)

With the power exponent $p'_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$, $g_b$ in (C.89) becomes

g'_b = \min_{b'=1,\ldots,B} \Bigl\{ \theta_{b',\min} - p'_{b'}\bigl(\hat{\mathsf{A}}^{(n(b'))}\bigr) \Bigr\} + p'_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr).   (C.96)

With $\theta_{b,\min} = d_e$, which leads to the infimum (C.83), we have that

g'_b = d_e - p'_B\bigl(\hat{\mathsf{A}}^{(n(B))}\bigr) + p'_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr).   (C.97)

Following from (C.94), the values of $\alpha_{b,r,t}$ achieving the infimum (C.83) become

\vartheta'_{b,r,t} = \begin{cases} \min\bigl( p'_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr),\; g'_b \bigr), & \text{for } b n_t + t \le d^\ddagger \\ 0, & \text{for } b n_t + t > d^\ddagger. \end{cases}   (C.98)

For $b n_t + t \le d^\ddagger$, we can evaluate $\vartheta'_{b,r,t}$ as follows:

\vartheta'_{b,r,t} = \min\bigl( p'_b,\; g'_b \bigr)   (C.99)
= \min\bigl( d_e,\; p_b,\; d_e - \min(d_e, p_B) + \min(d_e, p_b) \bigr),   (C.100)

where we have omitted the arguments $\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ and $\bigl(\hat{\mathsf{A}}^{(n(B))}\bigr)$ for ease of notation. Since $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ is non-decreasing with $b$, evaluating (C.100) yields

\vartheta'_{b,r,t} = \begin{cases} d_e, & \text{if } d_e \le p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) \\ p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr), & \text{if } p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) < d_e \le p_B\bigl(\hat{\mathsf{A}}^{(n(B))}\bigr) \\ p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr), & \text{if } d_e > p_B\bigl(\hat{\mathsf{A}}^{(n(B))}\bigr), \end{cases}   (C.101)

which can simply be written as

\vartheta'_{b,r,t} = \min\bigl( d_e,\; p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) \bigr).   (C.102)

Note that for $b = 1,\ldots,\tau_d$, we have $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) = 1$. Thus, for $b = 1,\ldots,\tau_d$, the values of $\vartheta'_{b,r,t}$ are given by $\min(d_e, 1)$.

We next consider $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$, $b = \tau_d + 1,\ldots,B$. The power exponent $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ sets a threshold for $\alpha_{b,r,t}$ in (C.87). Since increasing $\hat\alpha_{b',r,t}$, $b' = 1,\ldots,b-\tau_d$, increases both $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ and the objective function in (C.83), it follows that the solutions for $\hat\alpha_{b',r,t}$ attaining the infimum (C.83) are given by

\hat\alpha_{b',r,t} = \begin{cases} \vartheta'_{b',r,t} - d_e, & \text{if } \vartheta'_{b',r,t} < d_e \\ 0, & \text{if } \vartheta'_{b',r,t} \ge d_e, \end{cases}   (C.103)

which can also be written as

\hat\alpha_{b',r,t} = \min\bigl( \vartheta'_{b',r,t} - d_e,\; 0 \bigr).   (C.104)

Using this $\hat\alpha_{b',r,t}$, we have that for $b = \tau_d + 1,\ldots,B$,

p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) = 1 + \sum_{b'=1}^{b-\tau_d} \sum_{r=1}^{n_r} \sum_{t=1}^{n_t} \min\bigl( \vartheta'_{b',r,t},\; d_e \bigr).   (C.105)

Recall that $d^\ddagger$ and $\bar b$ are defined in (C.72) and (C.79), respectively. It follows from (C.102), (C.103) and (C.105) that the infimum (C.83) is given by

\underline{d}^{\,\mathrm{c}}_{\mathrm{icsi}} = n_t n_r \sum_{b=1}^{\bar b} \vartheta'_b + n_r \bigl( d^\ddagger - \bar b\, n_t \bigr)\, \vartheta'_{\bar b + 1},   (C.106)

where for $b = 1,\ldots,\min(\tau_d, \bar b + 1)$,

\vartheta'_b = \min(d_e, 1),   (C.107)

and for $b = \min(\tau_d, \bar b + 1) + 1,\ldots,\bar b + 1$,

\vartheta'_b = \min\Bigl( d_e,\; 1 + n_r n_t \sum_{b'=1}^{b-\tau_d} \min\bigl( \vartheta'_{b'},\; d_e \bigr) \Bigr).   (C.108)

We can see that this lower bound coincides with the upper bound (C.80), which completes the proof.


C.5 Predictive-CSIT Power Allocation

For predictive CSIT, we have that $n(b) = b + \tau_f$. The exponent of the optimal power allocation must satisfy (C.14),

\sup_{\hat{\mathsf{A}}^{(n(b))} \in \mathbb{R}_+^{n(b) n_r \times n_t}} \Biggl\{ p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) - \sum_{b'=1}^{\min(B,\, b+\tau_f)} \sum_{r=1}^{n_r} \sum_{t=1}^{n_t} \bigl( \hat\alpha_{b',r,t} + d_e \bigr) \Biggr\} \le 1.   (C.109)

Here we have incorporated $\min(B, b + \tau_f)$ to include only the fading estimates relevant to the current codeword, since fading matrices beyond the current codeword do not affect the current transmission.

Similarly to Appendix C.3, we note that it suffices to consider solving the outage SNR-exponent for discrete inputs with alphabet size $|\mathcal{X}| = 2^M$. In the following, the superscript p is used to indicate predictive CSIT.

C.5.1 GMI Upper Bound

Replacing $\mathcal{O}_{\mathcal{X}}$ in (C.25) with $\overline{\mathcal{O}}_{\mathcal{X}}$ yields an upper bound $\bar d^{\,\mathrm{p}}_{\mathrm{icsi}} \ge d^{\,\mathrm{p}}_{\mathrm{icsi}}$,

\bar d^{\,\mathrm{p}}_{\mathrm{icsi}} = \inf_{\mathsf{A},\hat{\mathsf{A}},\Theta \in \overline{\mathcal{O}}_{\mathcal{X}}} \Biggl\{ \sum_{(b,r,t):\, -d_e \le \hat\alpha_{b,r,t} = \alpha_{b,r,t} - d_e < 0,\ \theta_{b,r,t} \ge d_e} \bigl[ \alpha_{b,r,t} + (\theta_{b,r,t} - d_e) \bigr] + \sum_{(b,r,t):\, \hat\alpha_{b,r,t} \ge 0,\ \alpha_{b,r,t} \ge d_e,\ \theta_{b,r,t} \ge d_e} \bigl[ \alpha_{b,r,t} + \hat\alpha_{b,r,t} + (\theta_{b,r,t} - d_e) \bigr] \Biggr\}.   (C.110)

Here we use the maximum power exponent satisfying the constraint, i.e.,

p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) = 1 + \sum_{b'=1}^{\min(B,\, b+\tau_f)} \sum_{r=1}^{n_r} \sum_{t=1}^{n_t} \bigl( \hat\alpha_{b',r,t} + d_e \bigr),   (C.111)

as this gives an upper bound to the optimal outage SNR-exponent, as argued in Appendix C.2.3.

Similarly to the full-CSIT case, an equivalent outage set with the GMI upper bound (C.30) for discrete inputs is given by

\overline{\mathcal{O}}_{\mathcal{X}} = \Bigl\{ \mathsf{A},\hat{\mathsf{A}},\Theta \in \mathbb{R}^{B \times n_r \times n_t} : \sum_{b=1}^{B} \bar\kappa_b < \frac{BR}{M} \Bigr\},   (C.112)

where $\bar\kappa_b = \bigl| \overline{\mathcal{S}}^{(\epsilon,\delta)}_{b} \bigr|$ and $\overline{\mathcal{S}}^{(\epsilon,\delta)}_{b} = \bigcup_{r=1}^{n_r} \overline{\mathcal{S}}^{(\epsilon,\delta)}_{b,r}$ are defined as in (C.38) and (C.39), but with $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ given in (C.111).

Following the same argument as in Appendix C.3, the infimum solutions for $\theta_{b,r,t}$ are given by $d_e$. As $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ is non-decreasing with $b$, assume without loss of generality, for each $r = 1,\ldots,n_r$, the following conditions,

\alpha_{b,r,t} > \min\bigl( p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) + \epsilon,\; d_e + \delta \bigr), \quad \bigl( \phi^h_{b,r,t}, \phi^e_{b,r,t} \bigr) \notin \mathcal{Q}_{b,r,t}, \quad b n_t + t \le d^\ddagger,   (C.113)

\alpha_{b,r,t} \le \min\bigl( p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) + \epsilon,\; d_e + \delta \bigr), \quad b n_t + t > d^\ddagger,   (C.114)

which satisfy $\overline{\mathcal{O}}_{\mathcal{X}}$ with a tight inequality in the constraint. Then, the infimum (C.110) is achieved with

\alpha_{b,r,t} = \begin{cases} \min\bigl( p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) + \epsilon,\; d_e + \delta \bigr), & \text{for } b n_t + t \le d^\ddagger \\ 0, & \text{for } b n_t + t > d^\ddagger. \end{cases}   (C.115)

We let $\epsilon, \delta \downarrow 0$. For any $d_e > 0$, let

b^* \triangleq \max_{b :\, p_b(\hat{\mathsf{A}}^{(n(b))}) < d_e} b   (C.116)

if a $b \in \{1,\ldots,B\}$ such that $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) < d_e$ exists, and $b^* \triangleq 0$ otherwise. In the following, we solve for $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ when the infimum (C.110) is achieved. For the following two cases, we note that $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ is non-decreasing with $b$, and that $d^\ddagger$ and $\bar b$ are defined in (C.72) and (C.79), respectively.

1. Case $d_e < d_e$

For $b \ge \bar b + 1$ and $t$ such that $(b-1) n_t + t > d^\ddagger$, the values of $\alpha_{b,r,t}$ achieving the infimum are zero. It follows that $\hat\alpha_{b,r,t} = -d_e$. In the following, we solve for $\alpha_{b,r,t}$ and $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ for $b \le \bar b + 1$ and $t$ such that $(b-1) n_t + t \le d^\ddagger$.

• $b^* < \bar b$

For $b \in [b^* + 1, \bar b + 1]$ and $t$ such that $(b-1) n_t + t \le d^\ddagger$, the infimum (C.110) is achieved with

\alpha_{b,r,t} = d_e,   (C.117)
\hat\alpha_{b,r,t} = d_e - d_e   (C.118)

for all $r$. For $b \le b^*$, the infimum (C.110) is achieved with

\alpha_{b,r,t} = p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr),   (C.119)
\hat\alpha_{b,r,t} = 0.   (C.120)

With the above values of $\alpha_{b,r,t}$, we have

p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) = 1 + \min(b^*, b + \tau_f)\, n_t n_r d_e + \bigl[ \min\bigl( d^\ddagger, (b + \tau_f) n_t \bigr) - b^* n_t \bigr]_+ n_r d_e.   (C.121)

• $b^* > \bar b$

For $b \le \bar b + 1$ and $t$ such that $(b-1) n_t + t \le d^\ddagger$, the infimum (C.110) is achieved with

\alpha_{b,r,t} = p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr),   (C.122)
\hat\alpha_{b,r,t} = 0   (C.123)

for all $r$. Thus, we have that

p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) = 1 + \min\bigl( d^\ddagger, (b + \tau_f) n_t \bigr)\, n_r d_e.   (C.124)

• $b^* = \bar b$

For $b \le b^*$, the infimum (C.110) is achieved with

\alpha_{b,r,t} = p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr),   (C.125)
\hat\alpha_{b,r,t} = 0   (C.126)

for all $r, t$. For $b \ge b^* + 1$ and $t$ such that $(b-1) n_t + t \le d^\ddagger$, the infimum (C.110) is achieved with

\alpha_{b,r,t} = d_e,   (C.127)
\hat\alpha_{b,r,t} = d_e - d_e   (C.128)

for all $r$. Thus, we have that

p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) = 1 + \min(b^*, b + \tau_f)\, n_t n_r d_e   (C.129)

for $b + \tau_f \le b^*$, and

p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) = 1 + \min(b^*, b + \tau_f)\, n_t n_r d_e + \bigl( d^\ddagger - b^* n_t \bigr)\, n_r d_e   (C.130)

otherwise.

Since in this case $d_e < d_e$, we have that $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) < d_e$ is only possible for $b^* = 0$. This implies that for all $b = 1,\ldots,B$, we always have $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) \ge d_e$. Thus, it follows from (C.115), by letting $\epsilon, \delta \downarrow 0$, that

\alpha_{b,r,t} = \begin{cases} d_e, & \text{for } b n_t + t \le d^\ddagger \\ 0, & \text{for } b n_t + t > d^\ddagger. \end{cases}   (C.131)

Note that in this case, the sum of the preceding values of $\hat\alpha_{b,r,t}$ contributing to the infimum (C.110) is zero.

2. Case $d_e \ge d_e$

In this case, when $\alpha_{b,r,t} = 0$, we have $\hat\alpha_{b,r,t} = -d_e$; when $\alpha_{b,r,t} = p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ or $\alpha_{b,r,t} = d_e$, $d_e \ge d_e$, we have $\hat\alpha_{b,r,t} = 0$ to achieve the infimum (C.110). Thus, in this case, the infimum solutions for $\hat\alpha_{b,r,t}$ are given by

\hat\alpha_{b,r,t} = \begin{cases} 0, & \text{for } b n_t + t \le d^\ddagger \\ -d_e, & \text{for } b n_t + t > d^\ddagger. \end{cases}   (C.132)

The values of $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ when the $\alpha_{b,r,t}$ attain the infimum (C.110) are then given by

p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) = \eta_b \triangleq \begin{cases} 1 + n_t n_r (b + \tau_f)\, d_e, & b + \tau_f \le \bar b \\ 1 + n_r d^\ddagger d_e, & b + \tau_f > \bar b. \end{cases}   (C.133)

It follows from (C.115) (by letting $\epsilon, \delta \downarrow 0$) and (C.133) that

\alpha_{b,r,t} = \begin{cases} \min(\eta_b, d_e), & \text{for } b n_t + t \le d^\ddagger \\ 0, & \text{for } b n_t + t > d^\ddagger. \end{cases}   (C.134)

From (C.132) and (C.110), we observe that the sum of the $\hat\alpha_{b,r,t}$ contributing to the infimum (C.110) is also zero.

Combining the above cases yields the infimum (C.110),

\bar d^{\,\mathrm{p}}_{\mathrm{icsi}} = n_t n_r \sum_{b=1}^{\bar b} \min(\eta_b, d_e) + n_r \bigl( d^\ddagger - \bar b\, n_t \bigr) \min\bigl( \eta_{\bar b + 1}, d_e \bigr).   (C.135)

. (C.135)

C.5.2 GMI Lower Bound

Following the same reasoning as in the causal-CSIT case (Appendix C.4), the power exponent used to prove a tight lower bound to the outage SNR-exponent is peak-limited by $d_e$, i.e.,

p'_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) = \min\Bigl( d_e,\; 1 + \sum_{b'=1}^{\min(B,\, b+\tau_f)} \sum_{r=1}^{n_r} \sum_{t=1}^{n_t} \min\bigl( \hat\alpha_{b',r,t},\; d_e \bigr) \Bigr).   (C.136)

For discrete inputs, (C.113) implies that the extra term in $\overline{\mathcal{S}}^{(\epsilon,\delta)}_{b,r}$, i.e.,

\bigl\{ \alpha_{b,r,t} \le p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr) + \epsilon \,\cap\, \alpha_{b,r,t} > \theta_{b,r,t} + \delta \,\cap\, \mathcal{Q}_{b,r,t} \bigr\},   (C.137)

used for the GMI upper bound, does not affect the solution of the infimum. Hence, by ignoring this extra term and letting $\epsilon, \delta \downarrow 0$, it is not difficult to show that $\overline{\mathcal{O}}_{\mathcal{X}}$ derived using $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ in (C.111) and $\underline{\mathcal{O}}_{\mathcal{X}}$ derived using $p'_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ in (C.136) tend to be identical. With that, it can be shown that the resulting lower bound to the outage SNR-exponent coincides with the upper bound (C.135). The proof for the GMI lower bound is not reproduced here for the sake of compactness.


C.6 LMMSE Channel Estimation

Recall that for LMMSE estimation,

\text{CSIT: } H_{b,r,t} = \hat{H}_{b,r,t} + \hat{E}_{b,r,t},   (C.138)

\text{CSIR: } H_{b,r,t} = \check{H}_{b,r,t} + \check{E}_{b,r,t}.   (C.139)

We have the following pdfs:

p(h_{b,r,t}) = \frac{1}{\pi}\, e^{-|h_{b,r,t}|^2},   (C.140)

p(h_{b,r,t} \,|\, e_{b,r,t}) = \frac{1}{\pi (1-\sigma_e^2)} \exp\Bigl( -\frac{|h_{b,r,t} - e_{b,r,t}|^2}{1-\sigma_e^2} \Bigr),   (C.141)

p\bigl( \hat h_{b,r,t} \,\big|\, h_{b,r,t} \bigr) = \frac{1}{\pi \sigma_e^2 (1-\sigma_e^2)} \exp\Biggl( -\frac{\bigl| \hat h_{b,r,t} - (1-\sigma_e^2)\, h_{b,r,t} \bigr|^2}{\sigma_e^2 (1-\sigma_e^2)} \Biggr).   (C.142)

Define

\tilde{H}_{b,r,t} \triangleq \frac{\hat{H}_{b,r,t}}{\sqrt{\sigma_e^2 (1-\sigma_e^2)}}.   (C.143)

Conditioned on $H_{b,r,t} = h_{b,r,t}$, $\tilde{H}_{b,r,t}$ is a complex-Gaussian random scalar with mean $h_{b,r,t}\sqrt{1-\sigma_e^2}/\sigma_e$ and unit variance; a quick numerical check of this conditional law is sketched below.
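The following is a minimal Monte-Carlo sketch of that check (not part of the proof), drawing the estimate directly from the conditional pdf (C.142) and forming the normalised variable (C.143); the value of $\sigma_e^2$, the sample size and the variable names are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2_e, n = 0.1, 200_000
h = 0.7 + 0.4j                                            # condition on a fixed true channel H = h

# draw the estimate given H = h according to (C.142):
# complex Gaussian with mean (1 - sigma2_e) h and variance sigma2_e (1 - sigma2_e)
std = np.sqrt(sigma2_e * (1 - sigma2_e) / 2)              # per real dimension
h_est = (1 - sigma2_e) * h + std * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

# normalised estimate of (C.143)
h_tilde = h_est / np.sqrt(sigma2_e * (1 - sigma2_e))

# claimed conditional law: mean h * sqrt(1 - sigma2_e)/sigma_e, unit variance
print("empirical mean :", h_tilde.mean())
print("predicted mean :", h * np.sqrt(1 - sigma2_e) / np.sqrt(sigma2_e))
print("empirical var  :", h_tilde.var())                  # should be close to 1
```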

Then, applying the variable transformation in Table C.2, the outage SNR-exponent can be evaluated using

\mathrm{P}_{\mathrm{gout}}(R) \doteq \int_{\mathcal{O}_{\mathcal{X}}} \prod_{b,r,t} p\bigl( \tilde\gamma_{b,r,t} \,\big|\, \gamma_{b,r,t} \bigr)\, p\bigl( \gamma_{b,r,t}, \phi^h_{b,r,t} \,\big|\, \xi_{b,r,t}, \phi^e_{b,r,t} \bigr)\, p\bigl( \xi_{b,r,t} \bigr)\, p\bigl( \phi^e_{b,r,t} \bigr)\, \mathrm{d}\Gamma\, \mathrm{d}\tilde\Gamma\, \mathrm{d}\Xi\, \mathrm{d}\Phi^H\, \mathrm{d}\Phi^E   (C.144)

\doteq \int_{\mathcal{O}_{\mathcal{X}}} \prod_{b,r,t} p\bigl( \hat\alpha_{b,r,t} \,\big|\, \alpha_{b,r,t} \bigr)\, p\bigl( \alpha_{b,r,t}, \phi^h_{b,r,t} \,\big|\, \theta_{b,r,t}, \phi^e_{b,r,t} \bigr)\, p\bigl( \theta_{b,r,t} \bigr)\, p\bigl( \phi^e_{b,r,t} \bigr)\, \mathrm{d}\mathsf{A}\, \mathrm{d}\hat{\mathsf{A}}\, \mathrm{d}\Theta\, \mathrm{d}\Phi^H\, \mathrm{d}\Phi^E.   (C.145)

Here $p(\theta_{b,r,t})$ is given in (C.6) and $p(\phi^e_{b,r,t}) = (2\pi)^{-1}$. Following Appendix A.1, we have the bounds

p(\alpha_{b,r,t}, \phi^h_{b,r,t} \,|\, \theta_{b,r,t}, \phi^e_{b,r,t}) \ge \frac{1}{2\pi(1-\sigma_e^2)}\, \log P \cdot P^{-\alpha_{b,r,t}} \cdot \exp\Biggl( -\frac{ \bigl( P^{-\frac{\alpha_{b,r,t}}{2}} + P^{-\frac{\theta_{b,r,t}}{2}} \bigr)^2 }{ 1-\sigma_e^2 } \Biggr),   (C.146)

p(\alpha_{b,r,t}, \phi^h_{b,r,t} \,|\, \theta_{b,r,t}, \phi^e_{b,r,t}) \le \frac{1}{2\pi(1-\sigma_e^2)}\, \log P \cdot P^{-\alpha_{b,r,t}} \cdot \exp\Biggl( -\frac{ \bigl| P^{-\frac{\alpha_{b,r,t}}{2}} - P^{-\frac{\theta_{b,r,t}}{2}} \bigr|^2 }{ 1-\sigma_e^2 } \Biggr).   (C.147)

We have $p(\tilde\gamma_{b,r,t} | \gamma_{b,r,t})$ as a non-central chi-square pdf with two degrees of freedom,

p(\tilde\gamma_{b,r,t} | \gamma_{b,r,t}) = C_0 \cdot \exp\bigl( -\tilde\gamma_{b,r,t} - \Lambda^h_{b,r,t} \bigr) \cdot I_0\bigl( 2\sqrt{ \Lambda^h_{b,r,t}\, \tilde\gamma_{b,r,t} } \bigr),   (C.148)

where $C_0$ is the normalising constant and $\Lambda^h_{b,r,t}$ is the non-centrality parameter,

\Lambda^h_{b,r,t} = \frac{1-\sigma_e^2}{\sigma_e^2}\, |h_{b,r,t}|^2 \doteq P^{\,d_e - \alpha_{b,r,t}}.   (C.149)

Let $\underline{p}(\alpha_{b,r,t}, \phi^h_{b,r,t} | \theta_{b,r,t}, \phi^e_{b,r,t})$ be the RHS of (C.146). We then have a lower bound to $\mathrm{P}_{\mathrm{gout}}(R)$ as

\mathrm{P}_{\mathrm{gout}}(R) \mathrel{\dot{\ge}} \int_{\mathcal{O}_{\mathcal{X}}} \prod_{b,r,t} p(\hat\alpha_{b,r,t} | \alpha_{b,r,t})\, \underline{p}(\alpha_{b,r,t}, \phi^h_{b,r,t} | \theta_{b,r,t}, \phi^e_{b,r,t})\, p(\theta_{b,r,t})\, \mathrm{d}\mathsf{A}\, \mathrm{d}\hat{\mathsf{A}}\, \mathrm{d}\Theta\, \mathrm{d}\Phi^H\, \mathrm{d}\Phi^E   (C.150)

\doteq \int_{\mathcal{O}_{\mathcal{X}}} \prod_{b,r,t} e^{-P^{-\hat\alpha_{b,r,t}} - P^{-(\alpha_{b,r,t}-d_e)}}\, e^{-\bigl( P^{-\alpha_{b,r,t}/2} + P^{-\theta_{b,r,t}/2} \bigr)^2}\, e^{-P^{-(\theta_{b,r,t}-d_e)}} \times P^{-\alpha_{b,r,t} - \hat\alpha_{b,r,t} - (\theta_{b,r,t}-d_e)} \cdot I_0\Bigl( P^{\frac{d_e - \alpha_{b,r,t} - \hat\alpha_{b,r,t}}{2}} \Bigr)\, \mathrm{d}\mathsf{A}\, \mathrm{d}\hat{\mathsf{A}}\, \mathrm{d}\Theta\, \mathrm{d}\Phi^H\, \mathrm{d}\Phi^E.   (C.151)

The high-SNR behaviour of the joint pdf depends on $I_0\bigl( P^{\frac{d_e - \alpha_{b,r,t} - \hat\alpha_{b,r,t}}{2}} \bigr)$. For each $b, r, t$, we have the following cases.

• Case 1: $d_e - \alpha_{b,r,t} - \hat\alpha_{b,r,t} > 0$

From [45, 46], we have that

I_0\Bigl( P^{\frac{d_e - \alpha_{b,r,t} - \hat\alpha_{b,r,t}}{2}} \Bigr) \doteq P^{-\frac{d_e - \alpha_{b,r,t} - \hat\alpha_{b,r,t}}{4}}\, e^{P^{\frac{d_e - \alpha_{b,r,t} - \hat\alpha_{b,r,t}}{2}}}.   (C.152)

Grouping the exponential terms in the integrand of (C.151) yields

\exp\Bigl( -P^{-\hat\alpha_{b,r,t}} - P^{-(\alpha_{b,r,t}-d_e)} + P^{\frac{d_e - \alpha_{b,r,t} - \hat\alpha_{b,r,t}}{2}} - \bigl( P^{-\alpha_{b,r,t}/2} + P^{-\theta_{b,r,t}/2} \bigr)^2 - P^{-(\theta_{b,r,t}-d_e)} \Bigr).   (C.153)

Note that

\max\bigl( -\hat\alpha_{b,r,t},\; -(\alpha_{b,r,t} - d_e) \bigr) \ge \frac{d_e - \alpha_{b,r,t} - \hat\alpha_{b,r,t}}{2},   (C.154)

with equality if $\hat\alpha_{b,r,t} = \alpha_{b,r,t} - d_e$.

⋄ Case 1.1: $\hat\alpha_{b,r,t} \ne \alpha_{b,r,t} - d_e$

We have the dot equality

-P^{-\hat\alpha_{b,r,t}} - P^{-(\alpha_{b,r,t}-d_e)} + P^{\frac{d_e - \alpha_{b,r,t} - \hat\alpha_{b,r,t}}{2}} - \bigl( P^{-\alpha_{b,r,t}/2} + P^{-\theta_{b,r,t}/2} \bigr)^2 - P^{-(\theta_{b,r,t}-d_e)} \doteq -P^{\max\bigl( -\hat\alpha_{b,r,t},\, -(\alpha_{b,r,t}-d_e),\, -(\theta_{b,r,t}-d_e) \bigr)}.   (C.155)

Since

\max\bigl( -\hat\alpha_{b,r,t},\; -(\alpha_{b,r,t} - d_e) \bigr) > \frac{d_e - \alpha_{b,r,t} - \hat\alpha_{b,r,t}}{2} > 0,   (C.156)

we have that the joint pdf decays exponentially with the SNR.

⋄ Case 1.2: $\hat\alpha_{b,r,t} = \alpha_{b,r,t} - d_e$

The exponential behaviour depends on the following. If

-(\theta_{b,r,t} - d_e) > -(\alpha_{b,r,t} - d_e),   (C.157)

then the joint pdf decays exponentially with the SNR for $\theta_{b,r,t} < d_e$ and converges to a constant for $\theta_{b,r,t} \ge d_e$. However, since $d_e - \alpha_{b,r,t} - \hat\alpha_{b,r,t} > 0$ and $\hat\alpha_{b,r,t} = \alpha_{b,r,t} - d_e$, we know that $\alpha_{b,r,t} < d_e$. This implies that $\theta_{b,r,t} - d_e < 0$ and the joint pdf decays exponentially with the SNR.

If

-(\theta_{b,r,t} - d_e) \le -(\alpha_{b,r,t} - d_e),   (C.158)

we have that

-P^{-\hat\alpha_{b,r,t}} - P^{-(\alpha_{b,r,t}-d_e)} + P^{\frac{d_e - \alpha_{b,r,t} - \hat\alpha_{b,r,t}}{2}} - \bigl( P^{-\alpha_{b,r,t}/2} + P^{-\theta_{b,r,t}/2} \bigr)^2 - P^{-(\theta_{b,r,t}-d_e)} \doteq -P^{\max\bigl( -\hat\alpha_{b,r,t},\, -(\alpha_{b,r,t}-d_e) \bigr)} + P^{\frac{d_e - \alpha_{b,r,t} - \hat\alpha_{b,r,t}}{2}},   (C.159)

which is undetermined. But, as argued in [45, 46], we can replace $p(\hat\alpha_{b,r,t} | \alpha_{b,r,t})$ with the Kronecker delta function $\delta_f(\hat\alpha_{b,r,t} - \alpha_{b,r,t} + d_e)$.

Let

\mathcal{A}^{(1)}_{b,r,t} = \bigl\{ \alpha_{b,r,t}, \hat\alpha_{b,r,t}, \theta_{b,r,t} : d_e - \alpha_{b,r,t} - \hat\alpha_{b,r,t} > 0,\; \hat\alpha_{b,r,t} = \alpha_{b,r,t} - d_e \bigr\}   (C.160)

and

c_{b,r,t} = \prod_{(b',r',t') \ne (b,r,t)} p(\hat\alpha_{b',r',t'} | \alpha_{b',r',t'})\, p(\alpha_{b',r',t'}, \phi^h_{b',r',t'} | \theta_{b',r',t'}, \phi^e_{b',r',t'})\, p(\theta_{b',r',t'})\, \mathrm{d}\alpha_{b',r',t'}\, \mathrm{d}\hat\alpha_{b',r',t'}\, \mathrm{d}\theta_{b',r',t'}\, \mathrm{d}\phi^h_{b',r',t'}\, \mathrm{d}\phi^e_{b',r',t'}.   (C.161)

Then, we have that

\int_{\mathcal{O}_{\mathcal{X}} \cap \mathcal{A}^{(1)}_{b,r,t}} c_{b,r,t}\, p(\hat\alpha_{b,r,t} | \alpha_{b,r,t})\, p(\alpha_{b,r,t}, \phi^h_{b,r,t} | \theta_{b,r,t}, \phi^e_{b,r,t})\, p(\theta_{b,r,t})\, \mathrm{d}\alpha_{b,r,t}\, \mathrm{d}\hat\alpha_{b,r,t}\, \mathrm{d}\theta_{b,r,t}\, \mathrm{d}\phi^h_{b,r,t}\, \mathrm{d}\phi^e_{b,r,t}

\doteq \int_{\mathcal{O}_{\mathcal{X}} \cap \{\hat\alpha_{b,r,t} = \alpha_{b,r,t} - d_e,\; \theta_{b,r,t} - d_e \ge \alpha_{b,r,t} - d_e\}} c_{b,r,t}\, p(\alpha_{b,r,t}, \phi^h_{b,r,t} | \theta_{b,r,t}, \phi^e_{b,r,t})\, p(\theta_{b,r,t})\, \mathrm{d}\alpha_{b,r,t}\, \mathrm{d}\theta_{b,r,t}\, \mathrm{d}\phi^h_{b,r,t}\, \mathrm{d}\phi^e_{b,r,t}   (C.162)

\doteq \int_{\mathcal{O}_{\mathcal{X}} \cap \{\hat\alpha_{b,r,t} = \alpha_{b,r,t} - d_e,\; \theta_{b,r,t} - d_e \ge \alpha_{b,r,t} - d_e\}} c_{b,r,t}\, e^{-\bigl( P^{-\alpha_{b,r,t}/2} + P^{-\theta_{b,r,t}/2} \bigr)^2}\, e^{-P^{-(\theta_{b,r,t}-d_e)}}\, P^{-\alpha_{b,r,t} - (\theta_{b,r,t}-d_e)}\, \mathrm{d}\alpha_{b,r,t}\, \mathrm{d}\theta_{b,r,t}\, \mathrm{d}\phi^h_{b,r,t}\, \mathrm{d}\phi^e_{b,r,t}.   (C.163)

We have the following dot equality for the exponential terms in the integrand of (C.163):

-\bigl( P^{-\alpha_{b,r,t}/2} + P^{-\theta_{b,r,t}/2} \bigr)^2 - P^{-(\theta_{b,r,t}-d_e)} \doteq -P^{\max\bigl( -\alpha_{b,r,t},\, -(\theta_{b,r,t}-d_e) \bigr)}.   (C.164)

If $-\alpha_{b,r,t} \ge -(\theta_{b,r,t}-d_e)$, then we need the condition $\alpha_{b,r,t} \ge 0$ to make the exponential terms converge to a positive constant, since otherwise the RHS of (C.163) decays exponentially with the SNR; this implies that $\theta_{b,r,t} \ge d_e$. On the other hand, if $-\alpha_{b,r,t} \le -(\theta_{b,r,t}-d_e)$, then we need the condition $\theta_{b,r,t} \ge d_e$ to make the exponential terms converge to a positive constant, since otherwise the RHS of (C.163) decays exponentially with the SNR; this implies that $\alpha_{b,r,t} \ge 0$. From these conditions, we can express the RHS of (C.163) as follows:

\int_{\mathcal{O}_{\mathcal{X}} \cap \{\hat\alpha_{b,r,t} = \alpha_{b,r,t} - d_e,\; \theta_{b,r,t} - d_e \ge \alpha_{b,r,t} - d_e\}} c_{b,r,t}\, e^{-\bigl( P^{-\alpha_{b,r,t}/2} + P^{-\theta_{b,r,t}/2} \bigr)^2}\, e^{-P^{-(\theta_{b,r,t}-d_e)}}\, P^{-\alpha_{b,r,t} - (\theta_{b,r,t}-d_e)}\, \mathrm{d}\alpha_{b,r,t}\, \mathrm{d}\theta_{b,r,t}\, \mathrm{d}\phi^h_{b,r,t}\, \mathrm{d}\phi^e_{b,r,t}

\doteq \int_{\mathcal{O}_{\mathcal{X}} \cap \{-d_e \le \hat\alpha_{b,r,t} = \alpha_{b,r,t} - d_e < 0,\; \theta_{b,r,t} \ge d_e\}} c_{b,r,t}\, P^{-\alpha_{b,r,t} - (\theta_{b,r,t}-d_e)}\, \mathrm{d}\alpha_{b,r,t}\, \mathrm{d}\theta_{b,r,t}\, \mathrm{d}\phi^h_{b,r,t}\, \mathrm{d}\phi^e_{b,r,t}.   (C.165)

• Case 2: $d_e - \alpha_{b,r,t} - \hat\alpha_{b,r,t} \le 0$

From [45, 46], we have that

I_0\Bigl( P^{\frac{d_e - \alpha_{b,r,t} - \hat\alpha_{b,r,t}}{2}} \Bigr) \doteq P^0.   (C.166)

Grouping the exponential terms in the integrand of (C.151) yields

\exp\Bigl( -P^{-\hat\alpha_{b,r,t}} - P^{-(\alpha_{b,r,t}-d_e)} - \bigl( P^{-\alpha_{b,r,t}/2} + P^{-\theta_{b,r,t}/2} \bigr)^2 - P^{-(\theta_{b,r,t}-d_e)} \Bigr).   (C.167)

We have the dot equality

-P^{-\hat\alpha_{b,r,t}} - P^{-(\alpha_{b,r,t}-d_e)} - \bigl( P^{-\alpha_{b,r,t}/2} + P^{-\theta_{b,r,t}/2} \bigr)^2 - P^{-(\theta_{b,r,t}-d_e)} \doteq -P^{\max\bigl( -\hat\alpha_{b,r,t},\, -(\alpha_{b,r,t}-d_e),\, -(\theta_{b,r,t}-d_e) \bigr)}.   (C.168)

Let

\mathcal{A}^{(2)}_{b,r,t} = \bigl\{ \alpha_{b,r,t}, \hat\alpha_{b,r,t}, \theta_{b,r,t} : d_e - \alpha_{b,r,t} - \hat\alpha_{b,r,t} \le 0 \bigr\}.   (C.169)

We then have that

\int_{\mathcal{O}_{\mathcal{X}} \cap \mathcal{A}^{(2)}_{b,r,t}} c_{b,r,t}\, p(\hat\alpha_{b,r,t} | \alpha_{b,r,t})\, p(\alpha_{b,r,t}, \phi^h_{b,r,t} | \theta_{b,r,t}, \phi^e_{b,r,t})\, p(\theta_{b,r,t})\, \mathrm{d}\alpha_{b,r,t}\, \mathrm{d}\hat\alpha_{b,r,t}\, \mathrm{d}\theta_{b,r,t}\, \mathrm{d}\phi^h_{b,r,t}\, \mathrm{d}\phi^e_{b,r,t}

\doteq \int_{\mathcal{O}_{\mathcal{X}} \cap \{\hat\alpha_{b,r,t} \ge 0,\; \alpha_{b,r,t} \ge d_e,\; \theta_{b,r,t} \ge d_e\}} c_{b,r,t}\, P^{-\hat\alpha_{b,r,t}}\, P^{-\alpha_{b,r,t}}\, P^{-(\theta_{b,r,t}-d_e)}\, \mathrm{d}\alpha_{b,r,t}\, \mathrm{d}\hat\alpha_{b,r,t}\, \mathrm{d}\theta_{b,r,t}.   (C.170)

The conditions $\hat\alpha_{b,r,t} \ge 0$, $\alpha_{b,r,t} \ge d_e$ and $\theta_{b,r,t} \ge d_e$ are necessary, since otherwise the joint pdf decays exponentially with the SNR.

Combining (C.165) and (C.170) with (C.151) gives a lower bound to $\mathrm{P}_{\mathrm{gout}}(R)$. An upper bound can be obtained by applying (C.147), i.e., replacing $p(\alpha_{b,r,t}, \phi^h_{b,r,t} | \theta_{b,r,t}, \phi^e_{b,r,t})$ with the RHS of (C.147) in Cases 1 and 2 above. Following the same derivation as in the above cases, it is not difficult to show that using the upper bound (C.147) we obtain the same dot equalities as (C.165) and (C.170). Thus, $\mathrm{P}_{\mathrm{gout}}(R)$ satisfies the dot equality

\mathrm{P}_{\mathrm{gout}}(R) \doteq P^{-d_{\mathrm{icsi}}}   (C.171)

\doteq \int_{\mathcal{O}_{\mathcal{X}}} \prod_{\substack{(b,r,t):\, -d_e \le \hat\alpha_{b,r,t} = \alpha_{b,r,t} - d_e < 0,\\ \theta_{b,r,t} \ge d_e}} \Bigl( P^{-\alpha_{b,r,t} - (\theta_{b,r,t}-d_e)}\, \mathrm{d}\alpha_{b,r,t}\, \mathrm{d}\theta_{b,r,t}\, \mathrm{d}\phi^h_{b,r,t}\, \mathrm{d}\phi^e_{b,r,t} \Bigr) \times \prod_{\substack{(b,r,t):\, \hat\alpha_{b,r,t} \ge 0,\ \alpha_{b,r,t} \ge d_e,\\ \theta_{b,r,t} \ge d_e}} \Bigl( P^{-\alpha_{b,r,t} - \hat\alpha_{b,r,t} - (\theta_{b,r,t}-d_e)}\, \mathrm{d}\alpha_{b,r,t}\, \mathrm{d}\hat\alpha_{b,r,t}\, \mathrm{d}\theta_{b,r,t}\, \mathrm{d}\phi^h_{b,r,t}\, \mathrm{d}\phi^e_{b,r,t} \Bigr).   (C.172)

Applying Varadhan’s lemma [106] to the last dot equality yields

d_{\mathrm{icsi}} = \inf_{\mathsf{A},\hat{\mathsf{A}},\Theta \in \mathcal{O}_{\mathcal{X}}} \Biggl\{ \sum_{(b,r,t):\, -d_e \le \hat\alpha_{b,r,t} = \alpha_{b,r,t} - d_e < 0,\ \theta_{b,r,t} \ge d_e} \bigl[ \alpha_{b,r,t} + (\theta_{b,r,t} - d_e) \bigr] + \sum_{(b,r,t):\, \hat\alpha_{b,r,t} \ge 0,\ \alpha_{b,r,t} \ge d_e,\ \theta_{b,r,t} \ge d_e} \bigl[ \alpha_{b,r,t} + \hat\alpha_{b,r,t} + (\theta_{b,r,t} - d_e) \bigr] \Biggr\},   (C.173)

which is identical to (C.25).

To conclude the proof, we note that the power exponent $p_b\bigl(\hat{\mathsf{A}}^{(n(b))}\bigr)$ derived using LMMSE estimation [46] satisfies the same constraint as that derived using ML estimation (cf. (C.14)). Thus, the difference between LMMSE and ML estimation is immaterial for the large-SNR outage set. It follows from Appendices C.3–C.5 that the outage SNR-exponent characterisations in Theorems 5.1, 5.2 and 5.3 are valid for LMMSE estimation as well.


References

[1] R. H. Etkin and D. N. C. Tse, “Degrees of freedom in some underspread

MIMO fading channels,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1576–

1608, Apr. 2006.

[2] S. R. Saunders and A. Aragon Zavala, Antennas and Propagation for Wire-

less Communication Systems, 2nd ed. Chichester, UK: Wiley, 2007.

[3] J. G. Proakis and M. Salehi, Digital Communications, 5th ed. New York:

McGraw-Hill, 2008.

[4] A. Lapidoth, “Nearest neighbor decoding for additive non-Gaussian noise

channels,” IEEE Trans. Inf. Theory, vol. 42, no. 5, pp. 1520–1529, Sep.

1996.

[5] A. J. Viterbi, “Error bounds for convolutional codes and an asymptotically

optimum decoding algorithm,” IEEE Trans. Inf. Theory, vol. IT-13, no. 2,

pp. 260–269, Apr. 1967.

[6] T. May, H. Rohling, and V. Engels, “Performance analysis of Viterbi decod-

ing for 64-DAPSK and 64-QAM modulated OFDM signals,” IEEE Trans.

Commun., vol. 46, no. 2, pp. 182–190, Feb. 1998.

[7] E. Akay and E. Ayanoglu, “Achieving full frequency and space diversity in

wireless systems via BICM, OFDM, STBC, and Viterbi decoding,” IEEE

Trans. Commun., vol. 54, no. 12, pp. 2164–2172, Dec. 2006.

[8] J. Jin and C.-Y. Tsui, “Low-power limited-search parallel state Viterbi

decoder implementation based on scarce state transition,” IEEE Trans.

VLSI Syst., vol. 15, no. 10, pp. 1172–1176, Oct. 2007.

[9] G. D. Forney, Jr., “Generalized minimum distance decoding,” IEEE Trans.

Inf. Theory, vol. IT-12, no. 2, pp. 125–131, Apr. 1966.


[10] G. D. Forney, Jr. and A. Vardy, “Generalized minimum-distance decoding

of Euclidean-space codes and lattices,” IEEE Trans. Inf. Theory, vol. 42,

no. 6, pp. 1992–2026, Nov. 1996.

[11] R. Kotter, “Fast generalized minimum-distance decoding of algebraic-

geometry and Reed-Solomon codes,” IEEE Trans. Inf. Theory, vol. 42,

no. 3, pp. 721–737, May 1996.

[12] A. Clark and D. P. Taylor, “Lattice codes and generalized minimum-

distance decoding for OFDM systems,” IEEE Trans. Commun., vol. 55,

no. 3, pp. 417–426, Mar. 2007.

[13] N. Merhav, G. Kaplan, A. Lapidoth, and S. Shamai (Shitz), “On infor-

mation rates for mismatched decoders,” IEEE Trans. Inf. Theory, vol. 40,

no. 6, pp. 1953–1967, Nov. 1994.

[14] E. Biglieri, J. Proakis, and S. Shamai (Shitz), “Fading channels:

Information-theoretic and communications aspects,” IEEE Trans. Inf.

Theory, vol. 44, no. 6, pp. 2619–2692, Oct. 1998.

[15] D. Tse and P. Viswanath, Fundamentals of Wireless Communication.

Cambridge University Press, 2005.

[16] L. H. Ozarow, S. Shamai, and A. D. Wyner, “Information theoretic con-

siderations for cellular mobile radio,” IEEE Trans. Veh. Technol., vol. 43,

no. 2, pp. 359–378, May 1994.

[17] S. Verdu and T. S. Han, “A general formula for channel capacity,” IEEE

Trans. Inf. Theory, vol. 40, no. 4, pp. 1147–1157, Jul. 1994.

[18] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed.

Hoboken, NJ: Wiley, 2006.

[19] R. Durrett, Probability: Theory and Examples, 4th ed. Cambridge Uni-

versity Press, 2010.

[20] L. Zheng and D. N. C. Tse, “Diversity and multiplexing: A fundamental

tradeoff in multiple-antenna channels,” IEEE Trans. Inf. Theory, vol. 49,

no. 5, pp. 1073–1096, May 2003.

[21] R. Knopp and P. A. Humblet, “On coding for block fading channels,” IEEE

Trans. Inf. Theory, vol. 46, no. 1, pp. 189–205, Jan. 2000.


[22] E. Malkamaki and H. Leib, “Coded diversity on block-fading channels,”

IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 771–781, Mar. 1999.

[23] A. Guillen i Fabregas and G. Caire, “Coded modulation in the block-fading

channel: Coding theorems and code construction,” IEEE Trans. Inf. The-

ory, vol. 52, no. 1, pp. 91–114, Jan. 2006.

[24] K. D. Nguyen, A. Guillen i Fabregas, and L. K. Rasmussen, “A tight lower

bound to the outage probability of discrete-input block-fading channels,”

IEEE Trans. Inf. Theory, vol. 53, no. 11, pp. 4314–4322, Nov. 2007.

[25] K. D. Nguyen, “Adaptive transmission for block-fading channels,” Ph.D.

dissertation, University of South Australia, 2009.

[26] H. E. Gamal, G. Caire, and M. O. Damen, “The MIMO ARQ channel:

Diversity-multiplexing-delay tradeoff,” IEEE Trans. Inf. Theory, vol. 52,

no. 8, pp. 3601–3621, Aug. 2006.

[27] G. Caire, G. Taricco, and E. Biglieri, “Optimum power control over fading

channels,” IEEE Trans. Inf. Theory, vol. 45, no. 5, pp. 1468–1489, Jul.

1999.

[28] A. Lozano, A. M. Tulino, and S. Verdu, “Optimum power allocation for par-

allel Gaussian channels with arbitrary input distributions,” IEEE Trans.

Inf. Theory, vol. 52, no. 7, pp. 3033–3051, Jul. 2006.

[29] A. Chuang, A. Guillen i Fabregas, L. K. Rasmussen, and I. B. Collings,

“Optimal throughput-diversity-delay tradeoff in MIMO ARQ block-fading

channels,” IEEE Trans. Inf. Theory, vol. 54, no. 9, pp. 3968–3986, Sep.

2008.

[30] K. D. Nguyen, A. Guillen i Fabregas, and L. K. Rasmussen, “Power alloca-

tion for block-fading channels with arbitrary input constellations,” IEEE

Trans. Wireless Commun., vol. 8, no. 5, pp. 2514–2523, May 2009.

[31] ——, “Outage exponents of block-fading channels with power allocation,”

IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2373–2381, May 2010.

[32] K. D. Nguyen, L. K. Rasmussen, A. Guillen i Fabregas, and N. Letzepis,

“MIMO ARQ with multibit feedback: Outage analysis,” IEEE Trans. Inf.

Theory, vol. 58, no. 2, pp. 765–779, Feb. 2012.


[33] R. G. Gallager, Information Theory and Reliable Communication. New

York: Wiley, 1968.

[34] S. Arimoto, “On the converse to the coding theorem for discrete memoryless

channels,” IEEE Trans. Inf. Theory, vol. 19, no. 3, pp. 357–359, May 1973.

[35] G. Kaplan and S. Shamai (Shitz), “Information rates and error exponents

of compound channels with application to antipodal signaling in a fading

environment,” AEU Archiv fur Elektronik und Ubertragungstechnik, vol. 47,

no. 4, pp. 228–239, 1993.

[36] A. Ganti, A. Lapidoth, and I. E. Telatar, “Mismatched decoding revisited:

General alphabets, channels with memory, and the wide-band limit,” IEEE

Trans. Inf. Theory, vol. 46, no. 7, pp. 2315–2328, Nov. 2000.

[37] A. Guillen i Fabregas, A. Martinez, and G. Caire, “Bit-interleaved coded

modulation,” Foundations and Trends in Commun. and Inf. Theory, vol. 5,

no. 1-2, pp. 1–153, 2008.

[38] S. Shamai (Shitz) and I. Sason, “Variations on the Gallager bounds, con-

nections, and applications,” IEEE Trans. Inf. Theory, vol. 48, no. 12, pp.

3029–3051, Dec. 2002.

[39] H. Weingarten, Y. Steinberg, and S. Shamai (Shitz), “Gaussian codes and

weighted nearest neighbor decoding in fading multiple-antenna channels,”

IEEE Trans. Inf. Theory, vol. 50, no. 8, pp. 1665–1686, Aug. 2004.

[40] G. H. Hardy, J. E. Littlewood, and G. Polya, Inequalities, 2nd ed. Cam-

bridge University Press, 1952.

[41] H. L. Royden, Real Analysis, 2nd ed. New York: Macmillan, 1968.

[42] R. C. Singleton, “Maximum distance q-nary codes,” IEEE Trans. Inf. The-

ory, vol. 10, no. 2, pp. 116–118, Apr. 1964.

[43] J. J. Boutros, E. C. Strinati, and A. Guillen i Fabregas, “Turbo code design

for block fading channels,” in Proc. 42nd Annual Allerton Conference on

Communication, Control and Computing, Monticello, IL, Sep.–Oct. 2004.

[44] J. J. Boutros, A. Guillen i Fabregas, E. Biglieri, and G. Zemor, “Low-

density parity-check codes for nonergodic block-fading channels,” IEEE

Trans. Inf. Theory, vol. 56, no. 9, pp. 4286–4300, Sep. 2010.


[45] T. T. Kim and G. Caire, “Diversity gains of power control with noisy

CSIT in MIMO channels,” IEEE Trans. Inf. Theory, vol. 55, no. 4, pp.

1618–1626, Apr. 2009.

[46] T. T. Kim, K. D. Nguyen, and A. Guillen i Fabregas, “Coded modulation

with mismatched CSIT over MIMO block-fading channels,” IEEE Trans.

Inf. Theory, vol. 56, no. 11, pp. 5631–5640, Nov. 2010.

[47] L. Zhao, W. Mo, Y. Ma, and Z. Wang, “Diversity and multiplexing tradeoff

in general fading channels,” IEEE Trans. Inf. Theory, vol. 53, no. 4, pp.

1549–1557, Apr. 2007.

[48] ——, “Diversity and multiplexing tradeoff in general fading channels,” in

Proc. Conference on Information Sciences and Systems (CISS), Princeton,

NJ, Mar. 2006.

[49] G. Taricco and E. Biglieri, “Space-time decoding with imperfect channel

estimation,” IEEE Trans. Wireless Commun., vol. 4, no. 4, pp. 1874–1888,

Jul. 2005.

[50] E. Biglieri, Coding for Wireless Channels. New York: Springer

Science+Business Media, Inc., 2005.

[51] M. R. D. Rodrigues, F. Perez-Cruz, and S. Verdu, “Multiple-input multiple-

output Gaussian channels: Optimal covariance for non-Gaussian inputs,”

in Proc. IEEE Inf. Theory Workshop, Porto, Portugal, May 2008.

[52] F. Perez-Cruz, M. R. D. Rodrigues, and S. Verdu, “MIMO Gaussian chan-

nels with arbitrary inputs: Optimal precoding and power allocation,” IEEE

Trans. Inf. Theory, vol. 56, no. 3, pp. 1070–1084, Mar. 2010.

[53] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions

With Formulas, Graphs, and Mathematical Tables. New York: Dover,

1965.

[54] A. Lapidoth and S. Shamai (Shitz), “Fading channels: How perfect need

“perfect side information” be?” IEEE Trans. Inf. Theory, vol. 48, no. 5,

pp. 1118–1134, May 2002.

[55] N. Letzepis and A. Guillen i Fabregas, “Outage probability of the Gaussian

MIMO free space optical channel with PPM,” IEEE Trans. Commun.,

vol. 57, no. 12, pp. 3682–3690, Dec. 2009.


[56] ——, “Outage probability of the free space optical channel with doubly

stochastic scintillation,” IEEE Trans. Commun., vol. 57, no. 10, pp. 2899–

2902, Oct. 2009.

[57] N. Letzepis, K. D. Nguyen, A. Guillen i Fabregas, and W. G. Cowley, “Out-

age analysis of the hybrid free-space optical and radio-frequency channel,”

IEEE J. Sel. Areas Commun., vol. 27, no. 9, pp. 1709–1719, Dec. 2009.

[58] G. Caire and D. Tuninetti, “The throughput of hybrid-ARQ protocols for

the Gaussian collision channel,” IEEE Trans. Inf. Theory, vol. 47, no. 5,

pp. 1971–1988, Jul. 2001.

[59] D. M. Mandelbaum, “An adaptive-feedback coding scheme using incremen-

tal redundancy,” IEEE Trans. Inf. Theory, vol. 20, no. 3, pp. 388–389, May

1974.

[60] J. J. Metzner, “Improvements in block-retransmission schemes,” IEEE

Trans. Commun., vol. COM-27, no. 2, pp. 524–532, Feb. 1979.

[61] D. Chase, “Code combining—A maximum-likelihood decoding approach

for combining an arbitrary number of noisy packets,” IEEE Trans. Com-

mun., vol. COM-33, no. 5, pp. 385–393, May 1985.

[62] D. J. Costello, Jr., J. Hagenauer, H. Imai, and S. B. Wicker, “Applications

of error-control coding,” IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2531–

2560, Oct. 1998.

[63] E. Malkamaki and H. Leib, “Performance of truncated type-II hybrid ARQ

schemes with noisy feedback over block fading channels,” IEEE Trans.

Commun., vol. 48, no. 9, pp. 1477–1487, Sep. 2000.

[64] P. Wu and N. Jindal, “Coding versus ARQ in fading channels: How reliable

should the PHY be?” IEEE Trans. Commun., vol. 59, no. 12, pp. 3363–

3374, Dec. 2011.

[65] R. Cam and C. Leung, “Throughput analysis of some ARQ protocols in

the presence of feedback errors,” IEEE Trans. Commun., vol. 45, no. 1, pp.

35–44, Jan. 1997.

[66] L. Cao and P.-Y. Kam, “On the performance of packet ARQ schemes in

Rayleigh fading: The role of receiver channel state information and its

accuracy,” IEEE Trans. Veh. Technol., vol. 60, no. 2, pp. 704–709, Feb.

2011.


[67] H. A. Ngo and L. Hanzo, “Impact of imperfect channel state information

on RS coding aided hybrid-ARQ in Rayleigh fading channels,” in IEEE

Int. Conf. Commun., Cape Town, South Africa, May 2010.

[68] H. Zheng, A. Lozano, and M. Haleem, “Multiple ARQ processes for MIMO

systems,” EURASIP J. Appl. Signal Process., vol. 2004, no. 5, pp. 772–782,

2004.

[69] I. Csiszar and J. Korner, Information Theory: Coding Theorems for Dis-

crete Memoryless Systems, 2nd ed. Cambridge University Press, 2011.

[70] R. Knopp and G. Caire, “Power control and beamforming for systems with

multiple transmit and receive antennas,” IEEE Trans. Wireless Commun.,

vol. 1, no. 4, pp. 638–648, Oct. 2002.

[71] M. Guillaud, D. T. M. Slock, and R. Knopp, “A practical method for wire-

less channel reciprocity exploitation through relative calibration,” in Proc.

Eighth Int. Symp. Signal Process. and Its Applicat., Sydney, Australia, Aug.

2005.

[72] A. T. Asyhari and A. Guillen i Fabregas, “Nearest neighbor decoding in

MIMO block-fading channels with imperfect CSIR,” IEEE Trans. Inf. The-

ory, vol. 58, no. 3, pp. 1483–1517, Mar. 2012.

[73] E. Biglieri, G. Caire, and G. Taricco, “Limiting performance of block-fading

channels with multiple antennas,” IEEE Trans. Inf. Theory, vol. 47, no. 4,

pp. 1273–1289, May 2001.

[74] S. V. Hanly and D. N. C. Tse, “Multiaccess fading channels–Part II: Delay-

limited capacities,” IEEE Trans. Inform. Theory, vol. 44, no. 7, pp. 2816–

2831, Nov. 1998.

[75] K. D. Nguyen, N. Letzepis, A. Guillen i Fabregas, and L. K. Rasmussen,

“Causal/predictive imperfect channel state information in block-fading

channels,” submitted to IEEE Trans. Inf. Theory, Jul. 2010.

[76] B. Hassibi and B. M. Hochwald, “How much training is needed in multiple-

antenna wireless links?” IEEE Trans. Inf. Theory, vol. 49, no. 4, pp. 951–

963, Apr. 2003.

[77] T. L. Marzetta, “BLAST training: Estimating channel characteristics for

high-capacity space-time wireless,” in Proc. 37th Annual Allerton Conf. on

Communication, Control, and Computing, Monticello, IL, Sep. 1999.


[78] H. V. Poor, An Introduction to Signal Detection and Estimation, 2nd ed.

New York: Springer-Verlag (A Dowden & Culver book), 1994.

[79] E. Visotsky and U. Madhow, “Space-time transmit precoding with imper-

fect feedback,” IEEE Trans. Inf. Theory, vol. 47, no. 6, pp. 2632–2639,

Sep. 2001.

[80] V. Aggarwal and A. Sabharwal, “Bits about the channel: Multiround pro-

tocols for two-way fading channels,” IEEE Trans. Inf. Theory, vol. 57,

no. 6, pp. 3352–3370, Jun. 2011.

[81] ——, “Power-controlled feedback and training for two-way MIMO chan-

nels,” IEEE Trans. Inf. Theory, vol. 56, no. 7, pp. 3310–3331, Jul. 2010.

[82] X. J. Zhang, Y. Gong, and K. B. Letaief, “Power control and channel

training for MIMO channels: A DMT perspective,” IEEE Trans. Wireless

Commun., vol. 10, no. 7, pp. 2080–2089, Jul. 2011.

[83] L. Zheng and D. N. C. Tse, “Communication on the Grassmann manifold:

A geometric approach to the noncoherent multiple-antenna channel,” IEEE

Trans. Inf. Theory, vol. 48, no. 2, pp. 359–383, Feb. 2002.

[84] M. Medard, “The effect upon channel capacity in wireless communications

of perfect and imperfect knowledge of the channel,” IEEE Trans. Inf. The-

ory, vol. 46, no. 3, pp. 933–946, May 2000.

[85] A. Lapidoth, “On the asymptotic capacity of stationary Gaussian fading

channels,” IEEE Trans. Inf. Theory, vol. 51, no. 2, pp. 437–446, Feb. 2005.

[86] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University

Press, 1985.

[87] Y.-H. Kim, “A coding theorem for a class of stationary channels with feed-

back,” IEEE Trans. Inf. Theory, vol. 54, no. 4, pp. 1488–1499, Apr. 2008.

[88] G. J. Foschini, “Layered space-time architecture for wireless communica-

tion in a fading environment when using multi-element antennas,” Bell

Labs Tech. J., vol. 1, no. 2, pp. 41–59, 1996.

[89] E. Telatar, “Capacity of multi-antenna Gaussian channels,” European

Trans. Telecommun., vol. 10, no. 6, pp. 585–595, Nov.–Dec. 1999.


[90] A. Grant, “Rayleigh fading multi-antenna channels,” EURASIP J. Appl.

Signal Process., vol. 2002, no. 3, pp. 316–329, Mar. 2002.

[91] T. Koch and A. Lapidoth, “The fading number and degrees of freedom in

non-coherent MIMO fading channels: A peace pipe,” in Proc. IEEE Int.

Symp. Inf. Theory, Adelaide, Australia, Sep. 2005.

[92] N. Jindal and A. Lozano, “A unified treatment of optimum pilot overhead

in multipath fading channels,” IEEE Trans. Commun., vol. 58, no. 10, pp.

2939–2948, Oct. 2010.

[93] A. Lozano, “Interplay of spectral efficiency, power and Doppler spectrum

for reference-signal-assisted wireless communication,” IEEE Trans. Wire-

less Commun., vol. 7, no. 12, pp. 5020–5029, Dec. 2008.

[94] S. Ohno and G. B. Giannakis, “Average-rate optimal PSAM transmis-

sions over time-selective fading channels,” IEEE Trans. Wireless Commun.,

vol. 1, no. 4, pp. 712–720, Oct. 2002.

[95] K. Petersen, Ergodic Theory, ser. Cambridge studies in advanced mathe-

matics 2. Cambridge University Press, 1983.

[96] V. Sethuraman and B. Hajek, “Capacity per unit energy of fading channels

with a peak constraint,” IEEE Trans. Inf. Theory, vol. 51, no. 9, pp. 3102–

3120, Sep. 2005.

[97] J. R. Brown, Ergodic Theory and Topological Dynamics. New York: Aca-

demic Press, 1976.

[98] A. J. Weir, Lebesgue Integration and Measure. Cambridge University

Press, 1973.

[99] A. W. van der Vaart and J. A. Wellner, Weak Convergence and Empirical

Processes: With Applications to Statistics. New York: Springer-Verlag,

1996.

[100] T. S. Rappaport, Wireless Communications: Principles and Practice,

2nd ed. Upper Saddle River, NJ: Prentice Hall PTR, 2002.

[101] A. Lapidoth and S. M. Moser, “Capacity bounds via duality with appli-

cations to multiple-antenna systems on flat-fading channels,” IEEE Trans.

Inf. Theory, vol. 49, no. 10, pp. 2426–2467, Oct. 2003.


[102] T. S. Han, Information-Spectrum Methods in Information Theory. Berlin,

Germany: Springer-Verlag, 2003.

[103] Y. Polyanskiy, H. V. Poor, and S. Verdu, “Channel coding rate in the

finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp.

2307–2359, May 2010.

[104] Y. Polyanskiy, “Channel coding: Non-asymptotic fundamental limits,”

Ph.D. dissertation, Princeton University, 2010.

[105] Y. Polyanskiy and S. Verdu, “Scalar coherent fading channel: Dispersion

analysis,” in Proc. IEEE Int. Symp. Inf. Theory, Saint Petersburg, Russia,

Jul.–Aug. 2011.

[106] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications,

2nd ed. New York: Springer-Verlag, 1998.

[107] R. J. Muirhead, Aspects of Multivariate Statistical Theory. New York:

Wiley, 1982.

[108] A. Edelman, “Eigenvalues and condition numbers of random matrices,”

Ph.D. dissertation, MIT, 1989.
