366 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 61, NO. 1, JANUARY 2015

Backing Off From Infinity: Performance Bounds via Concentration of Spectral Measure for Random MIMO Channels

Yuxin Chen, Student Member, IEEE, Andrea J. Goldsmith, Fellow, IEEE, and Yonina C. Eldar, Fellow, IEEE

Abstract— The performance analysis of random vector channels, particularly multiple-input-multiple-output (MIMO) channels, has largely been established in the asymptotic regime of large channel dimensions, due to the analytical intractability of characterizing the exact distribution of the objective performance metrics. This paper exposes a new nonasymptotic framework that allows the characterization of many canonical MIMO system performance metrics to within a narrow interval under finite channel dimensionality, provided that these metrics can be expressed as a separable function of the singular values of the matrix. The effectiveness of our framework is illustrated through two canonical examples. In particular, we characterize the mutual information and power offset of random MIMO channels, as well as the minimum mean squared estimation error of MIMO channel inputs from the channel outputs. Our results lead to simple, informative, and reasonably accurate control of various performance metrics in the finite-dimensional regime, as corroborated by the numerical simulations. Our analysis framework is established via the concentration of spectral measure phenomenon for random matrices uncovered by Guionnet and Zeitouni, which arises in a variety of random matrix ensembles irrespective of the precise distributions of the matrix entries.

Index Terms— MIMO, massive MIMO, confidence interval, concentration of spectral measure, random matrix theory, non-asymptotic analysis, mutual information, MMSE.

I. INTRODUCTION

THE past decade has witnessed an explosion of developments in multi-dimensional vector channels [1], particularly multiple-input-multiple-output (MIMO) channels. The exploitation of multiple (possibly correlated) dimensions provides various benefits in wireless communication and signal processing systems, including channel capacity gain, improved energy efficiency, and enhanced robustness against noise and channel variation.

Manuscript received February 8, 2014; revised October 24, 2014; accepted October 25, 2014. Date of publication October 28, 2014; date of current version December 22, 2014. This work was supported in part by the NSF under Grant CCF-0939370 and Grant CIS-1320628, in part by the AFOSR within the MURI Programme under Grant FA9550-09-1-0643, and in part by the BSF Transformative Science under Grant 2010505.

Y. Chen is with the Department of Electrical Engineering and the Department of Statistics, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]).

A. J. Goldsmith is with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]).

Y. C. Eldar is with the Department of Electrical Engineering, Technion-Israel Institute of Technology, Haifa 32000, Israel (e-mail: [email protected]).

Communicated by A. Lozano, Associate Editor for Communications.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIT.2014.2365497

Although many fundamental MIMO system performance metrics can be evaluated via the precise spectral distributions of finite-dimensional MIMO channels (e.g. channel capacity [2], [3], minimum mean square error (MMSE) estimates of vector channel inputs from the channel outputs [4], power offset [5], sampled capacity loss [6]), this approach often results in prohibitive analytical and computational complexity in characterizing the probability distributions and confidence intervals of these MIMO system metrics. In order to obtain more informative analytical insights into the MIMO system performance metrics, a large number of works (see [5], [7]–[21]) present more explicit expressions for these performance metrics with the aid of random matrix theory. Interestingly, when the numbers of input and output dimensions grow, many of the MIMO system metrics taking the form of linear spectral statistics converge to deterministic limits, due to various limiting laws and universality properties of (asymptotically) large random matrices [22]. In fact, the spectrum (i.e. singular-value distribution) of a random channel matrix H tends to stabilize when the channel dimension grows, and the limiting distribution is often universal in the sense that it is independent of the precise distributions of the entries of H.

These asymptotic results are well suited for massive MIMO communication systems. However, the limiting regime falls short in providing a full picture of the phenomena arising in most practical systems which, in general, have moderate dimensionality. While the asymptotic convergence rates of many canonical MIMO system performance metrics have been investigated as well, how large the channel dimension must be largely depends on the realization of the growing matrix sequences. In this paper, we propose an alternative method via concentration of measure to evaluate many canonical MIMO system performance metrics for finite-dimension channels, assuming that the target performance metrics can be transformed into linear spectral statistics of the MIMO channel matrix. Moreover, we show that the metrics fall within narrow intervals with high (or even overwhelming) probability in the nonasymptotic regime.

A. Related Work

Random matrix theory is one of the central topics in probability theory with many connections to wireless communications and signal processing. Several random matrix ensembles, such as Gaussian unitary ensembles, admit exact characterization of their eigenvalue distributions [23] under any channel dimensionality. These spectral distributions associated with the finite-dimensional MIMO channels can then be used to compute analytically the distributions and confidence intervals of the MIMO system performance metrics (e.g. mutual information of MIMO fading channels [24]–[28]). However, the computational intractability of integrating a large-dimensional function over a finite-dimensional MIMO spectral distribution precludes concise and informative capacity expressions even in moderate-sized problems. For this reason, theoretical analysis based on precise eigenvalue characterization is generally limited to small-dimensional vector channels.

In comparison, one of the central topics in modern random matrix theory is to derive limiting distributions for the eigenvalues of random matrix ensembles of interest, which often turn out to be simple and informative. Several pertinent examples include the semi-circle law for symmetric Wigner matrices [29], the circular law for i.i.d. matrix ensembles [30], [31], and the Marchenko–Pastur law [32] for rectangular random matrices. One remarkable feature of these asymptotic laws is the universality phenomenon, whereby the limiting spectral distributions are often indifferent to the precise distribution of each matrix entry. This phenomenon allows theoretical analysis to accommodate a broad class of random matrix families beyond Gaussian ensembles. See [22] for a beautiful and self-contained exposition of these limiting results.

The simplicity and universality of these asymptotic laws have inspired a large body of work in characterizing the asymptotic performance limits of random vector channels. For instance, the limiting results for i.i.d. Gaussian ensembles have been applied in early work [2] to establish the linear increase of MIMO capacity with the number of antennas. This approach was then extended to accommodate separable correlation fading models [8], [12], [14]–[16], [33], [34] with general non-Gaussian distributions (particularly Rayleigh and Ricean fading [5], [13]). These models admit analytically-friendly solutions, and have become the cornerstone for many results for random MIMO channels. In addition, several works have characterized the second-order asymptotics (often in the form of central limit theorems) for mutual information [16], [35], [36], information density [37], second-order coding rates [38], [39], diversity-multiplexing tradeoff [40], etc., which uncover the asymptotic convergence rates of various MIMO system performance metrics under a large family of random matrix ensembles. The limiting tail of the distribution (or large deviation) of the mutual information has also been determined in the asymptotic regime [41]. In addition to information theoretic metrics, other signal processing and statistical metrics like MMSE [7], [42], [43], multiuser efficiency for CDMA systems [10], optical capacity [44], covariance matrices and principal components [45], canonical correlations [46], and likelihood ratio test statistics [46], have also been investigated via asymptotic random matrix theory. While these asymptotic laws have been primarily applied to performance metrics in the form of linear spectral statistics, more general performance metrics can be approximated using the delicate method of “deterministic equivalents” (see [47], [48]).

A recent trend in statistics is to move from asymptotic laws towards non-asymptotic analysis of random matrices [49], [50], which aims at revealing statistical effects of a moderate-to-large number of components, assuming a sufficient amount of independence among them. One prominent effect in this context is the concentration of spectral measure phenomenon [51], [52], which indicates that many separable functions of a matrix’s singular values (called linear spectral statistics [53]) can be shown to fall within a narrow interval with high probability even in the moderate-dimensional regime. This phenomenon has been investigated in various fields such as high-dimensional statistics [49], statistical learning [54], and compressed sensing [50].

Inspired by the success of the measure concentration methods in the statistics literature, our recent work [6] develops a non-asymptotic approach to quantify the capacity of multi-band channels under random sub-sampling strategies, which to our knowledge is the first to exploit the concentration of spectral measure phenomenon to analyze random MIMO channels. In general, the concentration of measure phenomena are much less widely recognized and used in the communication community than in the statistics and signal processing community. The recent emergence of massive MIMO technologies [55]–[58], which use a large number of antennas to obtain both capacity gain and improved radiated energy efficiency, provides a compelling application of these methods to characterize system performance. Other network/distributed MIMO systems (see [59], [60]) also require analyzing large-dimensional random vector channels. It is our aim here to develop a general framework that promises new insights into the performance of these emerging technologies under moderate-to-large channel dimensionality.

B. Contributions

We develop a general non-asymptotic framework for deriving performance bounds of random vector channels or any general MIMO system, based on the powerful concentration of spectral measure phenomenon of random matrices as revealed by Guionnet and Zeitouni [51]. Specifically, we introduce a general recipe that can be used to assess various MIMO system performance metrics to within vanishingly small confidence intervals, assuming that the objective metrics can be transformed into linear spectral statistics associated with the MIMO channel. Our framework and associated results can accommodate a large class of probability distributions for random channel matrices, including those with bounded support, a large class of sub-Gaussian measures, and heavy-tailed distributions. To broaden the range of metrics that we can accurately evaluate, we also develop a general concentration of spectral measure inequality for the cases where only the exponential mean (instead of the mean) of the objective metrics can be computed.

We demonstrate the effectiveness of our approach through two illustrative examples: (1) mutual information of random vector channels under equal power allocation; (2) MMSE in estimating signals transmitted over random MIMO channels. These examples allow concise and informative characterizations even in the presence of moderate-to-high SNR and moderate-to-large channel dimensionality. In contrast to a large body of prior works that focus on first-order limits or asymptotic convergence rates of the target performance metrics, we are able to derive a full characterization of these metrics in the non-asymptotic regime. Specifically, we obtain narrow confidence intervals with precise order and reasonably accurate pre-constants, which do not rely on careful choice of the growing matrix sequence. Our numerical simulations also corroborate that our theoretical predictions are reasonably accurate in the finite-dimensional regime.

C. Organization and Notation

The rest of the paper is organized as follows. Section II introduces several families of probability measures investigated in this paper. We present a general framework characterizing the concentration of spectral measure phenomenon in Section III. In Section IV, we illustrate the general framework using a small sample of canonical examples. Finally, Section V concludes the paper with a summary of our findings and a discussion of several future directions.

For convenience of presentation, we let ℂ denote the set of complex numbers. For any function g(x), the Lipschitz norm of g(·) is defined as

\[
  \|g\|_{\mathrm{L}} := \sup_{x \neq y} \left| \frac{g(x) - g(y)}{x - y} \right|. \tag{1}
\]

We let λ_i(A), λ_max(A), and λ_min(A) represent the i-th largest eigenvalue, the largest eigenvalue, and the smallest eigenvalue of a Hermitian matrix A, respectively, and use ‖·‖ to denote the operator norm (or spectral norm). The set {1, 2, . . . , n} is denoted as [n], and we write \binom{[n]}{k} for the set of all k-element subsets of [n]. For any set s, we use card(s) to represent the cardinality of s. Also, for any two index sets s and t, we denote by A_{s,t} the submatrix of A containing the rows at indices in s and the columns at indices in t. We use x ∈ a + [b, c] to indicate that x lies within the interval [a + b, a + c]. Finally, the standard notation f(n) = O(g(n)) means that there exists a constant c > 0 such that f(n) ≤ c·g(n); f(n) = Θ(g(n)) indicates that there are universal constants c_1, c_2 > 0 such that c_1 g(n) ≤ f(n) ≤ c_2 g(n); and f(n) = Ω(g(n)) indicates that there is a universal constant c > 0 such that f(n) ≥ c·g(n). Throughout this paper, we use log(·) to represent the natural logarithm. Our notation is summarized in Table I.

II. THREE FAMILIES OF PROBABILITY MEASURES

In this section, we define precisely three classes of probability measures with different properties of the tails, which lay the foundation of the analysis in this paper.

Definition 1 (Bounded Distribution): We say that a random variable X ∈ ℂ is bounded by D if

\[
  \mathbb{P}(|X| \le D) = 1. \tag{2}
\]

The class of probability distributions that have bounded support subsumes as special cases a broad class of distributions encountered in practice (e.g. the Bernoulli distribution and the uniform distribution). Another class of probability measures that possess light tails (i.e. sub-Gaussian tails) is defined as follows.

TABLE I: SUMMARY OF NOTATION AND PARAMETERS

Definition 2 (Logarithmic Sobolev Measure): We say that a random variable X ∈ ℂ with cumulative distribution function (CDF) F(·) satisfies the logarithmic Sobolev inequality (LSI) with uniform constant c_ls if, for any differentiable function h, one has

\[
  \int h^2(x)\,\log\!\left( \frac{h^2(x)}{\int h^2(z)\,dF(z)} \right) dF(x)
  \;\le\; 2 c_{\mathrm{ls}} \int \big| h'(x) \big|^2\,dF(x). \tag{3}
\]

Remark 1: One of the most popular techniques for demonstrating measure concentration is the “entropy method” (see [61], [62]), which hinges upon establishing inequalities of the form

\[
  \mathrm{KL}\big( P_X^{t} \,\|\, P_X \big) \le O\big( t^2 \big) \tag{4}
\]

for some measure P_X and its tilted measure P_X^t, where KL(·‖·) denotes the Kullback–Leibler divergence. See [62, Chapter 3] for detailed definitions and derivation. It turns out that for a probability measure satisfying the LSI, one can pick the function h(·) in (3) in some appropriate fashion to yield the entropy-type inequality (4). In fact, the LSI has been well recognized as one of the most fundamental criteria for demonstrating exponentially sharp concentration for various metrics.

Remark 2: While a measure satisfying the LSI necessarily exhibits sub-Gaussian tails (see [61], [62]), many concentration results under log-Sobolev measures cannot be extended to general sub-Gaussian distributions (e.g. for bounded measures the concentration results have only been shown for convex functions instead of general Lipschitz functions).

A number of measures satisfying the LSI have been discussed in the expository paper [63]. One of the most important examples is the standard Gaussian distribution, which satisfies the LSI with logarithmic Sobolev constant c_ls = 1. In many situations, the probability measures obeying the LSI exhibit a very sharp concentration of spectral measure phenomenon (typically sharper than general bounded measures).


While prior works on measure concentration focus primarily on measures with bounded support or measures with sub-Gaussian tails, it is also of interest to accommodate a more general class of distributions (e.g. heavy-tailed distributions). For this purpose, we introduce sub-exponential distributions and heavy-tailed distributions as follows.

Definition 3 (Sub-Exponential Distribution): A random variable X is said to be sub-exponentially distributed with parameter λ if it satisfies

\[
  \mathbb{P}(|X| > x) \le c_0\, e^{-\lambda x}, \quad \forall x > 0 \tag{5}
\]

for some absolute constant c_0 > 0.

Definition 4 (Heavy-Tailed Distribution): A random variable X is said to be heavy-tailed distributed if

\[
  \lim_{x\to\infty} e^{\lambda x}\, \mathbb{P}(|X| > x) = \infty, \quad \forall \lambda > 0. \tag{6}
\]

That said, sub-exponential (resp. heavy-tailed) distributions are those probability measures whose tails are lighter (resp. heavier) than exponential distributions. Commonly encountered heavy-tailed distributions include log-normal and power-law distributions.

To facilitate analysis, for both sub-exponential and heavy-tailed distributions, we define the following two quantities τ_c and σ_c with respect to X for some sequence c(n) such that

\[
  \mathbb{P}(|X| > \tau_c) \le \frac{1}{m\, n^{c(n)+1}}, \tag{7}
\]

and

\[
  \sigma_c = \sqrt{ \mathbb{E}\big[ |X|^2\, \mathbf{1}_{\{|X| < \tau_c\}} \big] }. \tag{8}
\]

In short, τ_c represents a truncation threshold such that the truncated X coincides with the true X with high probability, and σ_c denotes the standard deviation of the truncated X. The idea is to study X 1_{{|X| < τ_c}} instead of X in the analysis, since the truncated value is bounded and exhibits light tails. For instance, if X is exponentially distributed such that P(|X| > x) = e^{−x} and c(n) = log n, then this yields

\[
  \tau_c = \log^2 n + \log(mn).
\]

Typically, one would like to pick c(n) such that τ_c becomes a small function of m and n obeying σ_c² ≈ E[|X|²].
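As a numerical companion to (7) and (8), the following sketch (a minimal example; the exponential tail P(|X| > x) = e^{−x}, the choice m = n, and c(n) = log n are illustrative assumptions, not quantities fixed by the paper) computes τ_c and the truncated standard deviation σ_c, showing that σ_c² quickly approaches E[|X|²] = 2.

```python
import numpy as np

def truncation_parameters(n, m, c):
    """Sketch: tau_c and sigma_c from (7)-(8) when P(|X| > x) = exp(-x)."""
    # (7): choose tau_c so that P(|X| > tau_c) <= 1 / (m * n**(c + 1)),
    # i.e. tau_c = log(m * n**(c + 1)) = log m + (c + 1) * log n.
    tau_c = np.log(m) + (c + 1.0) * np.log(n)
    # (8): sigma_c^2 = E[|X|^2 1{|X| < tau_c}], via numerical integration of
    # x^2 * exp(-x) (the density of |X|) over [0, tau_c].
    x = np.linspace(0.0, tau_c, 100_000)
    sigma_c = np.sqrt(np.trapz(x**2 * np.exp(-x), x))
    return tau_c, sigma_c

for n in (8, 32, 128):
    tau_c, sigma_c = truncation_parameters(n, m=n, c=np.log(n))
    print(f"n={n:4d}  tau_c={tau_c:6.2f}  sigma_c^2={sigma_c**2:.4f}  (E[|X|^2]=2)")
```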

III. CONCENTRATION OF SPECTRAL MEASURE IN LARGE RANDOM VECTOR CHANNELS

We present a general mathematical framework that facilitates the analysis of random vector channels. The proposed approach is established upon the concentration of spectral measure phenomenon derived by Guionnet and Zeitouni [51], which is a consequence of Talagrand’s concentration inequalities [64]. While the adaptation of such general concentration results to our settings requires moderate mathematical effort, it leads to a very effective framework to assess the fluctuation of various MIMO system performance metrics. We will provide a few canonical examples in Section IV to illustrate the power of this framework.

Consider a random matrix M = [M_{ij}]_{1≤i≤n, 1≤j≤m}, where the M_{ij}'s are assumed to be independently distributed. We use M_{ij}^{R} and M_{ij}^{I} to represent respectively the real and imaginary parts of M_{ij}, which are also generated independently. Note, however, that the M_{ij}'s are not necessarily drawn from identical distributions. Set the numerical value

\[
  \kappa := \begin{cases} 1, & \text{in the real-valued case}, \\ 2, & \text{in the complex-valued case}. \end{cases} \tag{9}
\]

We further assume that all entries have matching first two moments such that for all i and j:

\[
  \mathbb{E}[M_{ij}] = 0, \qquad \mathrm{Var}\big( M_{ij}^{\mathrm{R}} \big) = \nu_{ij}^2, \tag{10}
\]
\[
  \mathrm{Var}\big( M_{ij}^{\mathrm{I}} \big) = \begin{cases} \nu_{ij}^2, & \text{in the complex-valued case}, \\ 0, & \text{in the real-valued case}, \end{cases} \tag{11}
\]

where the ν_{ij}² (ν_{ij} ≥ 0) are uniformly bounded by

\[
  \max_{i,j} |\nu_{ij}| \le \nu. \tag{12}
\]

In this paper, we focus on the class of MIMO system performance metrics that can be transformed into additively separable functions of the eigenvalues of the random channel matrix (called linear spectral statistics [53]), i.e. the type of metrics taking the form

\[
  \sum_{i=1}^{n} f\!\left( \lambda_i\!\Big( \frac{1}{n}\, M R R^* M^* \Big) \right) \tag{13}
\]

for some function f(·) and some deterministic matrix R ∈ ℂ^{m×m}, where λ_i(X) denotes the i-th eigenvalue of a matrix X. As can be seen, many vector channel performance metrics (e.g. MIMO mutual information, MMSE, sampled channel capacity loss) can all be transformed into certain forms of linear spectral statistics. We will characterize in Proposition 1 and Theorem 1 the measure concentration of (13), which quantifies the fluctuation of (13) in finite-dimensional random vector channels.
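To make the object in (13) concrete, the short sketch below (a minimal example; the specific f, the choice R = I, and the matrix dimensions are illustrative assumptions) evaluates a linear spectral statistic directly from the eigenvalues of (1/n) M R R* M*.

```python
import numpy as np

def linear_spectral_statistic(M, R, f):
    """Evaluate (13): sum_i f(lambda_i((1/n) M R R* M*))."""
    n = M.shape[0]
    W = (M @ R @ R.conj().T @ M.conj().T) / n        # n x n Gram-type matrix
    lam = np.linalg.eigvalsh((W + W.conj().T) / 2)   # symmetrize for numerical safety
    return np.sum(f(np.maximum(lam, 0.0)))           # clip tiny negative eigenvalues

rng = np.random.default_rng(0)
n, m = 16, 32
M = rng.standard_normal((n, m))            # illustrative i.i.d. real entries
R = np.eye(m)                              # illustrative choice R = I
f = lambda lam: np.log(1.0 + lam)          # e.g. the log-det / mutual-information kernel
print(linear_spectral_statistic(M, R, f))
```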

A. Concentration Inequalities for the Spectral Measure

Many linear spectral statistics of random matrices sharply concentrate within a narrow interval indifferent to the precise entry distributions. Somewhat surprisingly, such spectral measure concentration is shared by a very large class of random matrix ensembles. We provide a formal quantitative illustration of such concentration of spectral measure phenomena below, which follows from [51, Corollary 1.8]. This behavior, while not widely used in communication and signal processing, is a general result in probability theory derived from the Talagrand concentration inequality (see [64] for details).

Before proceeding to the concentration results, we define

\[
  f_0(M) := \frac{1}{n} \sum_{i=1}^{\min(m,n)} f\!\left( \lambda_i\!\Big( \frac{1}{n}\, M R R^* M^* \Big) \right) \tag{14}
\]

for the sake of notational simplicity.

Proposition 1: Consider a random matrix M = [M_{ij}]_{1≤i≤n, 1≤j≤m} satisfying (10) and (12). Suppose that R ∈ ℂ^{m×m} is any given matrix with ‖R‖ ≤ ρ. Consider a function f(x) such that g(x) := f(x²) (x ∈ ℝ₊) is a real-valued Lipschitz function with Lipschitz constant ‖g‖_L as defined in (1). Let κ and ν be defined in (9) and (12), respectively.

(a) Bounded Measure: If the M_{ij}'s are bounded by D and g(·) is convex, then for any β > 8√π,

\[
  \big| f_0(M) - \mathbb{E}[f_0(M)] \big| \le \frac{\beta D \rho \nu \|g\|_{\mathrm{L}}}{n} \tag{15}
\]

with probability exceeding 1 − 4 exp(−β²/(8κ)).

(b) Logarithmic Sobolev Measure: If the measure of M_{ij} satisfies the LSI with a uniformly bounded constant c_ls, then for any β > 0,

\[
  \big| f_0(M) - \mathbb{E}[f_0(M)] \big| \le \frac{\beta \sqrt{c_{\mathrm{ls}}}\, \rho \nu \|g\|_{\mathrm{L}}}{n} \tag{16}
\]

with probability exceeding 1 − 2 exp(−β²/κ).

(c) Sub-Exponential and Heavy-Tailed Measure: Suppose that the M_{ij}'s are independently drawn from either sub-exponential distributions or heavy-tailed distributions and that their distributions are symmetric about 0, with τ_c and σ_c defined respectively in (7) and (8) with respect to M_{ij} for some sequence c(n). If g(·) is convex, then

\[
  \big| f_0(M) - \mathbb{E}[f_0(\bar{M})] \big| \le \frac{2\kappa \sqrt{c(n)\log n}\;\, \tau_c \sigma_c \rho \|g\|_{\mathrm{L}}}{n} \tag{17}
\]

with probability exceeding 1 − 5 n^{−c(n)}, where \bar{M} is defined such that \bar{M}_{ij} := M_{ij} 1_{\{|M_{ij}| < \tau_c\}}.

Proof: See Appendix A. ∎

Remark 3: By setting β = √(log n) in Proposition 1(a) and (b), one can see that under either measures of bounded support or measures obeying the LSI,

\[
  \big| f_0(M) - \mathbb{E}[f_0(M)] \big| = O\!\left( \frac{\sqrt{\log n}}{n} \right) \tag{18}
\]

with high probability. In contrast, under heavy-tailed measures,

\[
  \big| f_0(M) - \mathbb{E}[f_0(\bar{M})] \big| = O\!\left( \frac{\sqrt{c(n)\, \tau_c^2 \sigma_c^2 \log n}}{n} \right) \tag{19}
\]

with high probability, where √(c(n)) τ_c σ_c is typically a growing function of n. As a result, the concentration under sub-Gaussian distributions is sharper than that under heavy-tailed measures by a factor of √(c(n)) τ_c σ_c.

Remark 4: The bounds derived in Proposition 1 scale linearly with ν, which is the maximum standard deviation of the entries of M. This allows us to assess the concentration phenomena for matrices whose entries have non-uniform variances.

Proposition 1(a) and 1(b) assert that for both measures of bounded support and a large class of sub-Gaussian distributions, many separable functions of the spectra of random matrices exhibit sharp concentration, assuming a sufficient amount of independence between the entries. More remarkably, the tails behave at worst like a Gaussian random variable with well-controlled variance. Note that for a bounded measure, we require the objective metric of the form (13) to satisfy certain convexity conditions in order to guarantee concentration. In contrast, the fluctuation of general Lipschitz functions can be well controlled for logarithmic Sobolev measures. This agrees with the prevailing wisdom that the standard Gaussian measure (which satisfies the LSI) often exhibits sharper concentration than general bounded distributions (e.g. Bernoulli measures).

Proposition 1(c) demonstrates that spectral measure concentration arises even when the tail distributions of the M_{ij}'s are much heavier than standard Gaussian random variables, although it might not be as sharp as for sub-Gaussian measures. This remarkable feature comes at a price, namely, the deviation of the objective metrics is much less controlled than for sub-Gaussian distributions. However, this degree of concentration might still suffice for most practical purposes. Note that the concentration result for heavy-tailed distributions is stated in terms of the truncated version \bar{M}. The nice feature of the truncated \bar{M} is that its entries are all bounded (and hence sub-Gaussian), which can often be quantified or estimated in a more convenient fashion. Finally, we remark that the concentration depends on the choice of the sequence c(n), which in turn affects the size of τ_c and σ_c. We will illustrate the resulting size of the confidence intervals in Section III-C via several examples.
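The Monte Carlo sketch below (illustrative only; Gaussian entries, R = I, f(x) = log(1 + x), and a fixed aspect ratio m/n = 2 are assumptions made here for concreteness) shows the empirical spread of f_0(M) shrinking as the dimension grows, in line with Proposition 1(b) and Remark 3.

```python
import numpy as np

rng = np.random.default_rng(1)

def f0(M):
    """f_0(M) = (1/n) sum_i log(1 + lambda_i((1/n) M M*)), cf. (14) with R = I."""
    n = M.shape[0]
    lam = np.linalg.eigvalsh(M @ M.conj().T / n)
    return np.mean(np.log1p(np.maximum(lam, 0.0)))

for n in (10, 40, 160):
    m = 2 * n                                   # fixed aspect ratio m/n = 2
    samples = np.array([f0(rng.standard_normal((n, m))) for _ in range(500)])
    # Empirical fluctuation of f_0(M) around its mean; for a fixed confidence level
    # Proposition 1(b) predicts a deviation of order O(1/n).
    print(f"n={n:4d}  mean={samples.mean():.4f}  std={samples.std():.5f}")
```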

B. Approximation of Expected Empirical Distribution

Although Proposition 1 ensures sharp measure concentration of various linear spectral statistics, a more precise characterization requires evaluating the mean value of the target metric (13) (i.e. E[f_0(M)]). While limiting laws often admit a simple asymptotic characterization of this mean value for a general class of metrics, whether the convergence rate can be quantified often needs to be studied on a case-by-case basis. In fact, this has become an extensively researched topic in mathematics (see [65]–[68]). In this subsection, we develop an approximation result that allows the expected value of a broader class of metrics to be well approximated by concise and informative expressions.

Recall that

\[
  f_0(M) := \frac{1}{n} \sum_{i=1}^{\min(m,n)} f\!\left( \lambda_i\!\Big( \frac{1}{n}\, M R R^* M^* \Big) \right). \tag{20}
\]

We consider a large class of situations where the exponential mean of the target metric (13),

\[
  E(f) := \mathbb{E}\big[ \exp\big( n f_0(M) \big) \big]
        = \mathbb{E}\left[ \exp\left( \sum_{i=1}^{\min\{m,n\}} f\!\left( \lambda_i\!\Big( \frac{1}{n}\, M R R^* M^* \Big) \right) \right) \right] \tag{21}
\]

(instead of E[f_0(M)]) can be approximated in a reasonably accurate manner. This is particularly relevant when f(·) is a logarithmic function. For example, this applies to log-determinant functions

\[
  f_0(M) = \frac{1}{n} \log\det\Big( I + \frac{1}{n}\, M R R^* M^* \Big),
\]

which are of significant interest in various applications such as wireless communications [69], multivariate hypothesis testing [70], etc. While E[det(I + (1/n) M M*)] often admits a simple distribution-free expression, E[log det(I + (1/n) M M*)] is highly dependent on the precise distributions of the entries of the matrix.

TABLE II: SUMMARY OF PARAMETERS OF THEOREM 1

One might already notice that, by Jensen’s inequality, (1/n) log E(f) is no smaller than the mean objective metric E[f_0(M)]. Nevertheless, in many situations, these two quantities differ by only a vanishingly small gap, which is formally demonstrated in the following lemma.

Lemma 1: (a) Sub-Exponential Tail: Suppose that

\[
  \mathbb{P}\Big( \big| f_0(M) - \mathbb{E}[f_0(M)] \big| > \frac{y}{n} \Big) \le c_1 \exp(-c_2 y)
\]

for some values c_1 > 0 and c_2 > 1. Then

\[
  \frac{1}{n}\log E(f) - \frac{\log\big( 1 + \frac{c_1}{c_2 - 1} \big)}{n} \;\le\; \mathbb{E}[f_0(M)] \;\le\; \frac{1}{n}\log E(f) \tag{22}
\]

with E(f) defined in (21).

(b) Sub-Gaussian Tail: Suppose that

\[
  \mathbb{P}\Big( \big| f_0(M) - \mathbb{E}[f_0(M)] \big| > \frac{y}{n} \Big) \le c_1 \exp\big(-c_2 y^2\big)
\]

for some values c_1 > 0 and c_2 > 0. Then

\[
  \frac{1}{n}\log E(f) \;\ge\; \mathbb{E}[f_0(M)]
  \;\ge\; \frac{1}{n}\log E(f) - \frac{\log\Big( 1 + \sqrt{\frac{\pi c_1^2}{c_2}}\, \exp\big( \frac{1}{4 c_2} \big) \Big)}{n} \tag{23}
\]

with E(f) defined in (21).

Proof: See Appendix B. ∎

Remark 5: For measures with sub-exponential tails, if c_2 < 1, the concentration does not decay sufficiently fast and is unable to ensure that E(f) = E[e^{n f_0(M)}] exists.

In short, Lemma 1 asserts that if f_0(M) possesses a sub-exponential or a sub-Gaussian tail, then E[f_0(M)] can be approximated by (1/n) log E(f) in a reasonably tight manner, namely, within a gap no worse than O(1/n). Since Proposition 1 implies sub-Gaussian tails for various measures, we immediately arrive at the following concentration results that concern a large class of sub-Gaussian and heavy-tailed measures.

Theorem 1: Let c_{ρ,f,D}, c_{ρ,f,c_ls}, and c_{ρ,f,τ_c,σ_c} be numerical values defined in Table II, and set

\[
  \mu_{\rho,g,A} := \nu \rho \|g\|_{\mathrm{L}}\, A. \tag{24}
\]

(1) Bounded Measure: Under the assumptions of Proposition 1(a), we have

\[
  \begin{cases}
    f_0(M) \le \frac{1}{n}\log E(f) + \frac{\beta \mu_{\rho,g,D}}{n}, \\[4pt]
    f_0(M) \ge \frac{1}{n}\log E(f) - \frac{\beta \mu_{\rho,g,D}}{n} - \frac{c_{\rho,f,D}}{n},
  \end{cases} \tag{25}
\]

with probability exceeding 1 − 4 exp(−β²/(8κ)).

(2) Logarithmic Sobolev Measure: Under the assumptions of Proposition 1(b), we have

\[
  \begin{cases}
    f_0(M) \le \frac{1}{n}\log E(f) + \frac{\beta \mu_{\rho,g,\sqrt{c_{\mathrm{ls}}}}}{n}, \\[4pt]
    f_0(M) \ge \frac{1}{n}\log E(f) - \frac{\beta \mu_{\rho,g,\sqrt{c_{\mathrm{ls}}}}}{n} - \frac{c_{\rho,f,c_{\mathrm{ls}}}}{n},
  \end{cases} \tag{26}
\]

with probability at least 1 − 2 exp(−β²/κ).

(3) Heavy-Tailed Distribution: Under the assumptions of Proposition 1(c), we have

\[
  \begin{cases}
    f_0(M) \le \frac{1}{n}\log E_{\bar{M}}(f) + \frac{\mu_{\rho,g,\zeta}}{n}, \\[4pt]
    f_0(M) \ge \frac{1}{n}\log E_{\bar{M}}(f) - \frac{\mu_{\rho,g,\zeta}}{n} - \frac{c_{\rho,f,\tau_c,\sigma_c}}{n},
  \end{cases} \tag{27}
\]

with probability exceeding 1 − 5 n^{−c(n)}, where

\[
  \begin{cases}
    \zeta := \tau_c \sigma_c \sqrt{8\kappa\, c(n) \log n}; \\
    E_{\bar{M}}(f) := \mathbb{E}\big[ \exp\big( n f_0(\bar{M}) \big) \big].
  \end{cases}
\]

Proof: See Appendix C. ∎

While Lemma 1 focuses on sub-exponential and sub-Gaussian measures, we are able to extend the concentration phenomenon around (1/n) log E(f) to a much larger class of distributions, including heavy-tailed measures, through certain truncation arguments.

To get a more informative understanding of Theorem 1, consider the case where ‖g‖_L, τ_c, D, c_ls, and ρ are all constants. One can see that with probability exceeding 1 − ε for any small constant ε > 0,

\[
  f_0(M) = \frac{1}{n}\log E(f) + O\!\left( \frac{1}{n} \right)
\]

holds for measures of bounded support and measures satisfying the LSI. In comparison, for symmetric power-law measures satisfying P(|M_{ij}| ≥ x) ≤ x^{−λ} for some λ > 0, the quantity τ_c is typically a function of the form n^δ for some constant δ > 0. In this case, with probability at least 1 − ε, one has

\[
  f_0(M) = \frac{1}{n}\log E_{\bar{M}}(f) + O\!\left( \frac{1}{n^{1-\delta}} \right)
\]

for heavy-tailed measures, where the uncertainty cannot be controlled as well as for bounded measures or measures obeying the LSI.

In the scenario where E(f) can be computed, Theorem 1 presents a full characterization of the confidence interval of the objective metrics taking the form of linear spectral statistics. In fact, in many applications, E(f) (rather than E[f_0]) can be precisely computed for a general class of probability distributions beyond the Gaussian measure, which allows for an accurate characterization of the concentration of the objective metrics via Theorem 1, as illustrated in Section IV.

C. Confidence Interval

In this subsection, we demonstrate that the sharp spectral measure concentration phenomenon allows us to estimate the mean of the target metric in terms of narrow confidence intervals.

Specifically, suppose that we have obtained the value of the objective metric f_0(M) = (1/n) Σ_i f(λ_i((1/n) M R R* M*)) for a given realization M. The goal is to find an interval (called a (1 − α_0) confidence interval) [l(M), u(M)] such that

\[
  \mathbb{P}\big( \mathbb{E}[f_0] \in [\, l(M),\, u(M) \,] \big) \ge 1 - \alpha_0 \tag{28}
\]

for some constant α_0 ∈ (0, 1).

Consider the assumptions of Proposition 1, i.e. the M_{ij}'s are independently generated satisfying E[M_{ij}] = 0 and E[|M_{ij}|²] = ν_{ij}² ≤ ν². An immediate consequence of Proposition 1 is stated as follows.

• (Bounded Measure) If the M_{ij}'s are bounded by D and g(·) is convex, then

\[
  \left[\, f_0(M) \pm \frac{\sqrt{8\kappa}\, D \rho \nu \|g\|_{\mathrm{L}}\, \sqrt{\log\frac{4}{\alpha_0}}}{n} \,\right] \tag{29}
\]

is a (1 − α_0) confidence interval for E[f_0(M)].

• (Logarithmic Sobolev Measure) If the measure of M_{ij} satisfies the LSI with a uniform constant c_ls, then

\[
  \left[\, f_0(M) \pm \frac{\sqrt{\kappa\, c_{\mathrm{ls}}}\, \rho \nu \|g\|_{\mathrm{L}}\, \sqrt{\log\frac{2}{\alpha_0}}}{n} \,\right] \tag{30}
\]

is a (1 − α_0) confidence interval for E[f_0(M)].

• (Sub-Exponential Measure) If the measure of M_{ij} is symmetric about 0 and P(|M_{ij}| > x) ≤ e^{−λx} for some constant λ > 0, then c(n) = log(5/α_0)/log n and τ_c = (1/λ) log(5mn/α_0), indicating that

\[
  \left[\, f_0(M) \pm \frac{\sqrt{4\kappa \log\frac{5}{\alpha_0}}\; \frac{\rho \nu \|g\|_{\mathrm{L}}}{\lambda}\, \log\frac{5mn}{\alpha_0}}{n} \,\right] \tag{31}
\]

is a (1 − α_0) confidence interval for E[f_0(M)].

• (Power-Law Measure) If the measure of M_{ij} is symmetric about 0 and P(|M_{ij}| > x) ≤ x^{−λ} for some constant λ > 0, then c(n) = log(5/α_0)/log n and τ_c = (5mn/α_0)^{1/λ}, indicating that

\[
  \left[\, f_0(M) \pm \frac{\Big( \sqrt{4\kappa \log\frac{5}{\alpha_0}}\, \big(\tfrac{5}{\alpha_0}\big)^{\frac{1}{\lambda}} \rho \nu \|g\|_{\mathrm{L}} \Big)\, (mn)^{\frac{1}{\lambda}}}{n} \,\right]
\]

is a (1 − α_0) confidence interval for E[f_0(M)].

One can see from the above examples that the spans of the confidence intervals under power-law distributions are much less controlled than those under sub-Gaussian measures. Depending on the power-law decay exponent λ, the typical deviation can be as large as O(n^{−1+\frac{2}{\lambda}}), as compared to O(1/n) under various sub-Gaussian measures.

When D, c_ls, ρ, ‖g‖_L, and α_0 are all constants, the widths of the above confidence intervals decay with n, which is negligible for many metrics of interest.
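As a numerical companion to the bullets above, the sketch below (a minimal example under assumed values: standard Gaussian entries so that c_ls = 1, κ = 1, ρ = ‖R‖ = 1, ν = 1, and the illustrative kernel g(x) = log(1/SNR + x²), whose Lipschitz constant is at most √SNR) forms the (1 − α_0) confidence interval of (30) from a single realization.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, snr, alpha0 = 64, 128, 5.0, 0.05

# One realization of f_0(M) with f(x) = log(1/SNR + x) and R = I (illustrative choices).
M = rng.standard_normal((n, m))
lam = np.linalg.eigvalsh(M @ M.T / n)
f0 = np.mean(np.log(1.0 / snr + np.maximum(lam, 0.0)))

# LSI-based interval (30): f_0(M) +/- sqrt(kappa*c_ls)*rho*nu*||g||_L*sqrt(log(2/alpha0))/n,
# here with kappa = c_ls = rho = nu = 1 and ||g||_L <= sqrt(SNR).
half_width = np.sqrt(1.0) * 1.0 * 1.0 * np.sqrt(snr) * np.sqrt(np.log(2.0 / alpha0)) / n
print(f"f0 = {f0:.4f},  95% CI for E[f0]: [{f0 - half_width:.4f}, {f0 + half_width:.4f}]")
```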

D. A General Recipe for Applying Proposition 1 and Theorem 1

For pedagogical reasons, we provide here a general recipe regarding how to apply Proposition 1 and Theorem 1 to evaluate the fluctuation of system performance metrics in random MIMO channels with channel matrix H.

1) Transform the performance metric into a linear spectral statistic, i.e. write the metric in the form f_0(M) = (1/n) Σ_{i=1}^{n} f(λ_i((1/n) H R R* H*)) for some function f(·) and some deterministic matrix R.

2) For measures satisfying the LSI, it suffices to calculate the Lipschitz constant of g(x) := f(x²). For both measures of bounded support and heavy-tailed distributions, since the function g(x) := f(x²) is non-convex in general, one typically needs to convexify g(x) first. In particular, one might want to identify two reasonably tight approximations g_1(x) and g_2(x) of g(x) such that: 1) g_1(x) and g_2(x) are both convex (or concave); 2) g_1(x) ≤ g(x) ≤ g_2(x).

3) Apply Proposition 1 and/or Theorem 1 to either g(x) (for measures satisfying the LSI) or to g_1(x) and g_2(x) (for bounded measures or heavy-tailed measures) to obtain concentration bounds.

This recipe will be used to establish the canonical examples provided in Section IV.
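As a small illustration of Steps 1 and 2 on the log-det metric treated in Section IV-A, the sketch below (assumptions: a Gaussian channel so that only the Lipschitz computation of Step 2 is needed, R = I, unit-variance entries, and an illustrative SNR) writes the per-antenna mutual information as a linear spectral statistic and numerically confirms ‖g‖_L ≤ √SNR for g(x) = log(1/SNR + x²).

```python
import numpy as np

def mutual_info_per_rx_antenna(H, snr):
    """Step 1: C(H,SNR)/n_r = (1/n_r) * sum_i log(1 + SNR * lambda_i(H H*/n_t))."""
    n_r, n_t = H.shape
    lam = np.linalg.eigvalsh(H @ H.conj().T / n_t)
    return np.sum(np.log1p(snr * np.maximum(lam, 0.0))) / n_r

def lipschitz_constant_g(snr, grid_size=10_000):
    """Step 2: estimate the Lipschitz constant of g(x) = log(1/SNR + x^2) on a grid."""
    x = np.linspace(0.0, 10.0, grid_size)
    g = np.log(1.0 / snr + x**2)
    return np.max(np.abs(np.diff(g) / np.diff(x)))   # max slope on the grid

rng = np.random.default_rng(3)
H = rng.standard_normal((32, 64))
print("C/n_r        :", mutual_info_per_rx_antenna(H, snr=5.0))
print("||g||_L (est):", lipschitz_constant_g(5.0), " vs sqrt(SNR) =", np.sqrt(5.0))
```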

IV. SOME CANONICAL EXAMPLES

In this section, we apply our general analysis framework developed in Section III to a few canonical examples that arise in wireless communications and signal processing. Rather than making each example as general as possible, we present only simple settings that admit concise expressions from measure concentration. We emphasize these simple illustrative examples in order to demonstrate the effectiveness of our general treatment.

A. Mutual Information and Power Offset of Random MIMO Channels

Consider the following MIMO channel:

\[
  y = H x + z, \tag{32}
\]

where H ∈ ℝ^{n_r×n_t} denotes the channel matrix, x ∈ ℝ^{n_t} represents the transmit signal, and y ∈ ℝ^{n_r} is the received signal. We denote by z ∼ N(0, σ² I_{n_r}) the additive Gaussian noise. Note that (32) allows modeling of a large class of random vector channels (e.g. MIMO-OFDM channels [71], CDMA systems [18], undersampled channels [72]) beyond multiple antenna channels. For instance, in unfaded direct-sequence CDMA systems, the columns of H can represent random spreading sequences; see [73, Sec. 3.1.1] for details.


The total power is assumed to be P, independent of n_t and n_r, and the signal-to-noise ratio (SNR) is denoted by

\[
  \mathsf{SNR} := \frac{P}{\sigma^2}. \tag{33}
\]

In addition, we denote the degrees of freedom as

\[
  n := \min(n_t, n_r), \tag{34}
\]

and use α > 0 to represent the ratio

\[
  \alpha := \frac{n_t}{n_r}. \tag{35}
\]

We suppose throughout that α is a universal constant that does not scale with n_t and n_r.

Consider the simple channel model where {H_{ij} : 1 ≤ i ≤ n_r, 1 ≤ j ≤ n_t} are independently distributed. Suppose that channel state information (CSI) is available to both the transmitter and the receiver. When equal power allocation is adopted at all transmit antennas, it is well known that the mutual information C(H, SNR) of the MIMO channel (32) under equal power allocation is [2]

\[
  C(H, \mathsf{SNR}) = \log\det\Big( I + \mathsf{SNR}\cdot \frac{1}{n_t}\, H H^* \Big), \tag{36}
\]

which depends only on the eigenvalue distribution of H H*. In the presence of asymptotically high SNR and channel dimensions, it is well known that if the H_{ij}'s are independent with zero mean and unit variance, then, almost surely,

\[
  \lim_{\mathsf{SNR}\to\infty}\, \lim_{n_r\to\infty} \left( \frac{C(H,\mathsf{SNR})}{n_r} - \min\{\alpha, 1\}\cdot\log\mathsf{SNR} \right)
  = \begin{cases}
      -1 + (\alpha-1)\log\big( \frac{\alpha}{\alpha-1} \big), & \text{if } \alpha \ge 1, \\[4pt]
      -\alpha\log(\alpha e) + (1-\alpha)\log\big( \frac{1}{1-\alpha} \big), & \text{if } \alpha < 1,
    \end{cases} \tag{37}
\]

which is independent of the precise entry distribution of H (see [11]). The method of deterministic equivalents has also been used to obtain good approximations under finite channel dimensions [47, Ch. 6]. In contrast, our framework characterizes the concentration of mutual information in the presence of finite SNR and finite n_r with reasonably tight confidence intervals. Interestingly, the MIMO mutual information is well-controlled within a narrow interval, as formally stated in the following theorem.

Theorem 2: Assume perfect CSI at both the transmitter and the receiver, and equal power allocation at all transmit antennas. Suppose that H ∈ ℝ^{n_r×n_t}, where the H_{ij}'s are independent random variables satisfying E[H_{ij}] = 0 and E[|H_{ij}|²] = 1. Set n := min{n_r, n_t}.

(a) If the H_{ij}'s are bounded by D, then for any β > 8√π,

\[
  \frac{C(H,\mathsf{SNR})}{n_r} \in
  \begin{cases}
    \log\mathsf{SNR} + \Big[ \frac{1}{n_r}\log R\big( \frac{2}{e\,\mathsf{SNR}}, n_r, n_t \big) + \frac{\beta r^{\mathrm{lb},+}_{\mathrm{bd}}}{n_r},\;
      \frac{1}{n_r}\log R\big( \frac{e^2}{\mathsf{SNR}}, n_r, n_t \big) + \frac{\beta r^{\mathrm{ub},+}_{\mathrm{bd}}}{n_r} \Big], & \text{if } \alpha \ge 1, \\[6pt]
    \alpha\log\frac{\mathsf{SNR}}{\alpha} + \Big[ \frac{1}{n_r}\log R\big( \frac{2\alpha}{e\,\mathsf{SNR}}, n_t, n_r \big) + \frac{\beta r^{\mathrm{lb},-}_{\mathrm{bd}}}{n_r},\;
      \frac{1}{n_r}\log R\big( \frac{e^2\alpha}{\mathsf{SNR}}, n_t, n_r \big) + \frac{\beta r^{\mathrm{ub},-}_{\mathrm{bd}}}{n_r} \Big], & \text{if } \alpha < 1,
  \end{cases} \tag{38}
\]

with probability exceeding 1 − 8 exp(−β²/8).

(b) If the H_{ij}'s satisfy the LSI with respect to a uniform constant c_ls, then for any β > 0,

\[
  \frac{C(H,\mathsf{SNR})}{n_r} \in
  \begin{cases}
    \log\mathsf{SNR} + \frac{1}{n_r}\log R\big( \frac{1}{\mathsf{SNR}}, n_r, n_t \big) + \frac{\beta}{n_r}\big[ r^{\mathrm{lb},+}_{\mathrm{ls}},\, r^{\mathrm{ub},+}_{\mathrm{ls}} \big], & \text{if } \alpha \ge 1, \\[6pt]
    \alpha\log\frac{\mathsf{SNR}}{\alpha} + \frac{1}{n_r}\log R\big( \frac{\alpha}{\mathsf{SNR}}, n_t, n_r \big) + \frac{\beta}{n_r}\big[ r^{\mathrm{lb},-}_{\mathrm{ls}},\, r^{\mathrm{ub},-}_{\mathrm{ls}} \big], & \text{if } \alpha < 1,
  \end{cases} \tag{39}
\]

with probability exceeding 1 − 4 exp(−β²).

(c) Suppose that the H_{ij}'s are independently drawn from either sub-exponential distributions or heavy-tailed distributions and that the distributions are symmetric about 0. Let τ_c and σ_c be defined as in (7) and (8) with respect to the H_{ij}'s for some sequence c(n). Then,

\[
  \frac{C(H,\mathsf{SNR})}{n_r} \in
  \begin{cases}
    \log\mathsf{SNR} + \Big[ \frac{1}{n_r}\log R\big( \frac{2}{e\,\sigma_c^2\,\mathsf{SNR}}, n_r, n_t \big) + \frac{r^{\mathrm{lb},+}_{\mathrm{ht}}}{n_r},\;
      \frac{1}{n_r}\log R\big( \frac{e^2}{\sigma_c^2\,\mathsf{SNR}}, n_r, n_t \big) + \frac{r^{\mathrm{ub},+}_{\mathrm{ht}}}{n_r} \Big] + 2\log\sigma_c, & \text{if } \alpha \ge 1, \\[6pt]
    \alpha\log\frac{\mathsf{SNR}}{\alpha} + \Big[ \frac{1}{n_r}\log R\big( \frac{2\alpha}{e\,\sigma_c^2\,\mathsf{SNR}}, n_t, n_r \big) + \frac{r^{\mathrm{lb},-}_{\mathrm{ht}}}{n_r},\;
      \frac{1}{n_r}\log R\big( \frac{e^2\alpha}{\sigma_c^2\,\mathsf{SNR}}, n_t, n_r \big) + \frac{r^{\mathrm{ub},-}_{\mathrm{ht}}}{n_r} \Big] + 2\log\sigma_c, & \text{if } \alpha < 1,
  \end{cases} \tag{40}
\]

with probability exceeding 1 − 10 n^{−c(n)}.

Here,

\[
  R(\varepsilon, n, m) := \sum_{i=0}^{n} \binom{n}{i}\, \varepsilon^{\,n-i}\, \frac{m!}{m^{i}\,(m-i)!}, \tag{41}
\]

and the residual terms are provided in Table III.

Proof: See Appendix E. ∎

Remark 6: In the regime where β = Θ(1), one can easily see that the magnitudes of all the residual terms r^{ub,+}_{bd}, r^{lb,+}_{bd}, r^{ub,−}_{bd}, r^{lb,−}_{bd}, r^{ub,+}_{ls}, r^{lb,+}_{ls}, r^{ub,−}_{ls}, and r^{lb,−}_{ls} do not scale with n.

TABLE III: SUMMARY OF PARAMETERS OF THEOREM 2 AND COROLLARY 1

The above theorem relies on the expression R(ε, n, m). In fact, this function is exactly equal to E[det(εI + (1/m) M M*)] in a distribution-free manner, as revealed by the following lemma.

Lemma 2: Consider any random matrix M ∈ ℂ^{n×m} such that the M_{ij}'s are independently generated satisfying

\[
  \mathbb{E}[M_{ij}] = 0, \quad \text{and} \quad \mathbb{E}\big[ |M_{ij}|^2 \big] = 1. \tag{42}
\]

Then one has

\[
  \mathbb{E}\left[ \det\Big( \varepsilon I + \frac{1}{m}\, M M^* \Big) \right]
  = \sum_{i=0}^{n} \binom{n}{i}\, \varepsilon^{\,n-i}\, \frac{m!}{m^{i}\,(m-i)!}. \tag{43}
\]

Proof: This lemma improves upon known results for Gaussian random ensembles by generalizing them to a very general class of random ensembles. See Appendix D for the detailed proof. ∎
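The closed form (43) is easy to check numerically. The sketch below (illustrative dimensions; Rademacher entries are chosen deliberately to emphasize that the identity is distribution-free) compares the formula against a Monte Carlo estimate of E[det(εI + (1/m) M M*)].

```python
import numpy as np
from math import comb, factorial

def R(eps, n, m):
    """Closed form (41)/(43): sum_i C(n,i) * eps^(n-i) * m! / (m^i * (m-i)!)."""
    return sum(comb(n, i) * eps**(n - i) * factorial(m) / (m**i * factorial(m - i))
               for i in range(n + 1))

rng = np.random.default_rng(4)
n, m, eps, trials = 4, 8, 0.3, 200_000
dets = []
for _ in range(trials):
    M = rng.choice([-1.0, 1.0], size=(n, m))      # zero-mean, unit-variance (Rademacher)
    dets.append(np.linalg.det(eps * np.eye(n) + M @ M.T / m))
print("closed form R :", R(eps, n, m))
print("Monte Carlo   :", np.mean(dets))
```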

To get a more quantitative assessment of the concentration intervals, we plot the 95% confidence interval of the MIMO mutual information for a few cases in Fig. 1 when the channel matrix H is an i.i.d. Gaussian random matrix. The expected value of the capacity is adopted from the mean of 3000 Monte Carlo trials. Our theoretical predictions of the deviation bounds are compared against the simulation results consisting of Monte Carlo trials. One can see from the plots that our theoretical predictions are fairly accurate even for small channel dimensions, which corroborates the power of concentration of measure techniques.

At moderate-to-high SNR, the function R(ε, n, m) admits a simple approximation. This observation leads to a concise and informative characterization of the mutual information of the MIMO channel, as stated in the following corollary.

Corollary 1: Suppose that SNR > 2e max{α, 1} · max{e²α³, 4α}, and set n := min{n_t, n_r}.

(a) Consider the assumptions of Theorem 2(a). For any β > 8√π,

\[
  \frac{C(H,\mathsf{SNR})}{n_r} \in
  \begin{cases}
    \log\frac{\mathsf{SNR}}{e} + (\alpha-1)\log\big( \frac{\alpha}{\alpha-1} \big)
      + \Big[ \gamma^{\mathrm{lb},+}_{\mathrm{bd}} + \frac{\beta r^{\mathrm{lb},+}_{\mathrm{bd}}}{n_r},\; \gamma^{\mathrm{ub},+}_{\mathrm{bd}} + \frac{\beta r^{\mathrm{ub},+}_{\mathrm{bd}}}{n_r} \Big], & \text{if } \alpha \ge 1, \\[6pt]
    \alpha\log\frac{\mathsf{SNR}}{\alpha e} + (1-\alpha)\log\big( \frac{1}{1-\alpha} \big)
      + \Big[ \gamma^{\mathrm{lb},-}_{\mathrm{bd}} + \frac{\beta r^{\mathrm{lb},-}_{\mathrm{bd}}}{n_r},\; \gamma^{\mathrm{ub},-}_{\mathrm{bd}} + \frac{\beta r^{\mathrm{ub},-}_{\mathrm{bd}}}{n_r} \Big], & \text{if } \alpha < 1,
  \end{cases} \tag{44}
\]

with probability exceeding 1 − 8 exp(−β²/8).

(b) Consider the assumptions of Theorem 2(b). For any β > 0,

\[
  \frac{C(H,\mathsf{SNR})}{n_r} \in
  \begin{cases}
    \log\frac{\mathsf{SNR}}{e} + (\alpha-1)\log\big( \frac{\alpha}{\alpha-1} \big)
      + \Big[ \gamma^{\mathrm{lb},+}_{\mathrm{ls}} + \frac{\beta r^{\mathrm{lb},+}_{\mathrm{ls}}}{n_r},\; \gamma^{\mathrm{ub},+}_{\mathrm{ls}} + \frac{\beta r^{\mathrm{ub},+}_{\mathrm{ls}}}{n_r} \Big], & \text{if } \alpha \ge 1, \\[6pt]
    \alpha\log\frac{\mathsf{SNR}}{\alpha e} + (1-\alpha)\log\big( \frac{1}{1-\alpha} \big)
      + \Big[ \gamma^{\mathrm{lb},-}_{\mathrm{ls}} + \frac{\beta r^{\mathrm{lb},-}_{\mathrm{ls}}}{n_r},\; \gamma^{\mathrm{ub},-}_{\mathrm{ls}} + \frac{\beta r^{\mathrm{ub},-}_{\mathrm{ls}}}{n_r} \Big], & \text{if } \alpha < 1,
  \end{cases} \tag{45}
\]

with probability exceeding 1 − 4 exp(−β²).


Fig. 1. 95%-confidence interval for the MIMO mutual information C(H, SNR)/n_r when H is drawn from the i.i.d. standard Gaussian ensemble. From Table III, the sizes of the upper and lower confidence spans are given by β√SNR/(√α n_r) (α > 1) and β√SNR/n_r (α < 1), respectively, where β = √(log(8/(1 − 95%))). The expected value of C(H, SNR)/n_r is adopted from the mean of 3000 Monte Carlo trials. The theoretical confidence bounds are compared against the simulation results (with 3000 Monte Carlo trials). The results are shown for (a) α = 2, SNR = 5; (b) α = 2, SNR = 2; (c) α = 1/2, SNR = 5; and (d) α = 1/2, SNR = 2.

(c) Consider the assumptions of Theorem 2(c). Then, one has

\[
  \frac{C(H,\mathsf{SNR})}{n_r} \in
  \begin{cases}
    \log\frac{\mathsf{SNR}}{e} + (\alpha-1)\log\big( \frac{\alpha}{\alpha-1} \big) + \log\sigma_c
      + \Big[ \gamma^{\mathrm{lb},+}_{\mathrm{ht}} + \frac{r^{\mathrm{lb},+}_{\mathrm{ht}}}{n_r},\; \gamma^{\mathrm{ub},+}_{\mathrm{ht}} + \frac{r^{\mathrm{ub},+}_{\mathrm{ht}}}{n_r} \Big], & \text{if } \alpha \ge 1, \\[6pt]
    \alpha\log\frac{\mathsf{SNR}}{\alpha e} + (1-\alpha)\log\big( \frac{1}{1-\alpha} \big) + \log\sigma_c
      + \Big[ \gamma^{\mathrm{lb},-}_{\mathrm{ht}} + \frac{r^{\mathrm{lb},-}_{\mathrm{ht}}}{n_r},\; \gamma^{\mathrm{ub},-}_{\mathrm{ht}} + \frac{r^{\mathrm{ub},-}_{\mathrm{ht}}}{n_r} \Big], & \text{if } \alpha < 1,
  \end{cases} \tag{46}
\]

with probability at least 1 − 10 n^{−c(n)}.

Here, the residual terms are formally provided in Table III.

Proof: See Appendix F. ∎

Remark 7: One can see that the magnitudes of all these extra residual terms γ^{ub,−}_{bd}, γ^{lb,−}_{bd}, γ^{ub,+}_{bd}, γ^{lb,+}_{bd}, γ^{ub,−}_{ls}, γ^{lb,−}_{ls}, γ^{ub,+}_{ls}, and γ^{lb,+}_{ls} are no worse than the order

\[
  O\!\left( \frac{\log n}{n} + \frac{\log\mathsf{SNR}}{\sqrt{\mathsf{SNR}}} \right), \tag{47}
\]

which vanishes as the SNR and the channel dimensions grow.

In fact, Corollary 1 is established based on a simple approximation of R(ε, n, m) := E[det(εI + (1/m) M M*)]. At moderate-to-high SNR, this function can be approximated reasonably well through much simpler expressions, as stated in the following lemma.

Lemma 3: Consider a random matrix M = (M_{ij})_{1≤i≤n, 1≤j≤m} such that α := m/n ≥ 1. Suppose that the M_{ij}'s are independent, satisfying E[M_{ij}] = 0 and E[|M_{ij}|²] = 1. If 4/n < ε < min{1/(e²α³), 1/(4α)}, then one has

\[
  \frac{1}{n}\log\mathbb{E}\left[ \det\Big( \varepsilon I + \frac{1}{m}\, M M^* \Big) \right]
  = (\alpha-1)\log\Big( \frac{\alpha}{\alpha-1} \Big) - 1 + r_E, \tag{48}
\]

where the residual term r_E satisfies

\[
  r_E \in \left[ \frac{\frac{1}{2}\log n - \log\frac{1}{2\pi(m+1)}}{n},\;
                 \frac{1.5\log(en)}{n} + 2\sqrt{\alpha\varepsilon}\,\log\frac{1}{\alpha\varepsilon} \right].
\]

Proof: See Appendix F. ∎

Combining Theorem 2 and Lemma 3 immediately establishes Corollary 1.

1) Implications of Theorem 2 and Corollary 1: Some implications of Corollary 1 are listed as follows.

1) When we set β = Θ(√(log n)), Corollary 1 implies that in the presence of moderate-to-high SNR and moderate-to-large channel dimensionality n = min{n_t, n_r}, the information rate per receive antenna behaves like

\[
  \frac{C(H,\mathsf{SNR})}{n_r} =
  \begin{cases}
    \log\frac{\mathsf{SNR}}{e} + (\alpha-1)\log\big( \frac{\alpha}{\alpha-1} \big)
      + O\Big( \frac{\sqrt{\mathsf{SNR}} + \sqrt{\log n}}{n} + \frac{\log\mathsf{SNR}}{\sqrt{\mathsf{SNR}}} \Big), & \text{if } \alpha \ge 1, \\[6pt]
    \alpha\log\frac{\mathsf{SNR}}{\alpha e} + (1-\alpha)\log\big( \frac{1}{1-\alpha} \big)
      + O\Big( \frac{\sqrt{\mathsf{SNR}} + \sqrt{\log n}}{n} + \frac{\log\mathsf{SNR}}{\sqrt{\mathsf{SNR}}} \Big), & \text{if } \alpha < 1,
  \end{cases} \tag{49}
\]

with high probability. That said, for a fairly large regime, the mutual information falls within a narrow range. In fact, the mutual information depends almost only on SNR and the ratio α = n_t/n_r, and is independent of the precise numbers of antennas n_r and n_t, except for some vanishingly small residual terms. The first-order term of the expression (49) coincides with existing asymptotic results (see [11]). This in turn validates our concentration of measure approach.

2) Theorem 2 indicates that the size of the typical confidence interval for C(H, SNR)/n_r decays with n at a rate not exceeding O(1/n) (by picking β = Θ(1)) under measures of bounded support or measures satisfying the LSI. Note that it has been established (see [35, Th. 2 and Proposition 2]) that the asymptotic standard deviation of C(H, SNR)/n_r scales as Θ(1/n). This reveals that the concentration of measure approach we adopt is able to obtain a confidence interval of optimal size. Recent works by Li, McKay and Chen [74], [75] derive polynomial expansions for each of the moments (e.g. mean, standard deviation, and skewness) of the MIMO mutual information, which can also be used to approximate the distributions. In contrast, our results are able to characterize any concentration interval in a simpler and more informative manner.

3) Define the power offset for any channel dimension as follows:

\[
  L(H, \mathsf{SNR}) := \log\mathsf{SNR} - \frac{C(H,\mathsf{SNR})}{\min\{n_r, n_t\}}. \tag{50}
\]

One can see that this converges to the notion of high-SNR power offset investigated by Lozano et al. [5], [12] in the limiting regime (i.e. when SNR → ∞). Our results reveal the fluctuation of L(H, SNR) such that

\[
  L(H,\mathsf{SNR}) =
  \begin{cases}
    (\alpha-1)\log\big( \frac{\alpha-1}{\alpha} \big) + 1
      + O\Big( \frac{\sqrt{\mathsf{SNR}} + \sqrt{\log n}}{n} + \frac{\log\mathsf{SNR}}{\sqrt{\mathsf{SNR}}} \Big), & \text{if } \alpha \ge 1, \\[6pt]
    \frac{1-\alpha}{\alpha}\log(1-\alpha) + \log(e\alpha)
      + O\Big( \frac{\sqrt{\mathsf{SNR}} + \sqrt{\log n}}{n} + \frac{\log\mathsf{SNR}}{\sqrt{\mathsf{SNR}}} \Big), & \text{if } \alpha < 1,
  \end{cases} \tag{51}
\]

with high probability. The first-order term of L(H, SNR) agrees with the known results on asymptotic power offset (see [5, Proposition 2], [12, Equation (84)]) when n → ∞ and SNR → ∞. Our results are distinguished from existing results in the sense that we can accurately accommodate a much larger regime beyond the one with asymptotically large SNR and channel dimensions.

4) The same mutual information and power offset values are shared among a fairly general class of distributions even in the non-asymptotic regime. Moreover, heavy-tailed distributions lead to well-controlled information rate and power offset as well, although the spectral measure concentration is often less sharp than for sub-Gaussian distributions (which ensure exponentially sharp concentration).

5) Finally, we note that Corollary 1 not only characterizes the order of the residual term, but also provides a reasonably accurate characterization of a narrow confidence interval such that the mutual information and the power offset lie within it with high probability. All pre-constants are explicitly provided, resulting in a full assessment of the mutual information and the power offset. Our results do not rely on careful choice of growing matrix sequences, which are typically required in those works based on asymptotic laws.

2) Connection to the General Framework: We now illustrate how the proof of Theorem 2 follows from the general framework presented in Section III.

1) Note that the mutual information can be alternatively expressed as
$$
\frac{1}{n_r} C(H,\mathrm{SNR}) = \frac{1}{n_r}\sum_{i=1}^{\min\{n_r,n_t\}} \log\!\left(1 + \mathrm{SNR}\cdot\lambda_i\!\left(\tfrac{1}{n_t}HH^*\right)\right)
:= \frac{1}{n_r}\sum_{i=1}^{\min\{n_r,n_t\}} \left\{\log\mathrm{SNR} + f\!\left(\lambda_i\!\left(\tfrac{1}{n_t}HH^*\right)\right)\right\},
$$
where $f(x) := \log\left(\frac{1}{\mathrm{SNR}} + x\right)$ on the domain $x \ge 0$. As a result, the function $g(x) := f(x^2) = \log\left(\frac{1}{\mathrm{SNR}} + x^2\right)$ has Lipschitz norm $\|g\|_{\mathrm{L}} \le \mathrm{SNR}^{1/2}$.

2) For measures satisfying the LSI, Theorem 2(b) immediately follows from Theorem 1(b). For measures of bounded support or heavy-tailed measures, one can introduce the following auxiliary function with respect to some small $\varepsilon$:


$$
g_{\varepsilon}(x) :=
\begin{cases}
\log(\varepsilon + x^2), & \text{if } x \ge \sqrt{\varepsilon},\\[0.5ex]
\frac{1}{\sqrt{\varepsilon}}\left(x - \sqrt{\varepsilon}\right) + \log(2\varepsilon), & \text{if } 0 \le x < \sqrt{\varepsilon},
\end{cases}
$$
which is obtained by convexifying $g(x)$. If we set $f_{\varepsilon}(x) := g_{\varepsilon}(\sqrt{x})$, then one can easily check that
$$
f_{\mathrm{SNR}^{-1}}(x) \le f(x) \le f_{\frac{e}{2}\mathrm{SNR}^{-1}}(x).
$$

Applying Theorem 1(a) allows us to derive the concentration on
$$
\frac{1}{n_r}\sum_{i=1}^{\min\{n_r,n_t\}} f_{\mathrm{SNR}^{-1}}\!\left(\lambda_i\!\left(\tfrac{1}{n_t}HH^*\right)\right)
\quad\text{and}\quad
\frac{1}{n_r}\sum_{i=1}^{\min\{n_r,n_t\}} f_{\frac{e}{2}\mathrm{SNR}^{-1}}\!\left(\lambda_i\!\left(\tfrac{1}{n_t}HH^*\right)\right),
$$
which in turn provide tight lower and upper bounds for $\frac{1}{n_r}\sum_{i=1}^{\min\{n_r,n_t\}} f\!\left(\lambda_i\!\left(\tfrac{1}{n_t}HH^*\right)\right)$.

3) Finally, the mean value $\mathbb{E}\!\left[\det\!\left(I + \mathrm{SNR}\cdot\tfrac{1}{n_t}HH^*\right)\right]$ admits a closed-form expression independent of the precise entry distributions, as derived in Lemma 2.
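As a quick sanity check of step 1) above, the short sketch below (not from the paper) verifies numerically that the mutual information per receive antenna coincides with the separable spectral statistic built from $f(x) = \log(1/\mathrm{SNR} + x)$; the dimensions and SNR are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
nr, nt, snr = 8, 16, 50.0
H = (rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))) / np.sqrt(2)

lam = np.linalg.eigvalsh(H @ H.conj().T / nt)     # eigenvalues of (1/nt) H H*
via_det = np.linalg.slogdet(np.eye(nr) + (snr / nt) * H @ H.conj().T)[1] / nr
via_spectrum = np.mean(np.log(snr) + np.log(1.0 / snr + lam))   # (1/nr) * sum of [log SNR + f(lambda_i)]

print(via_det, via_spectrum)   # identical up to floating-point error
```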

B. MMSE Estimation in Random Vector Channels

Next, we show that our analysis framework can be applied to Bayesian inference problems that involve estimation of uncorrelated signal components. Consider a simple linear vector channel model
$$
y = Hx + z,
$$
where $H\in\mathbb{C}^{n\times p}$ is a known matrix, $x\in\mathbb{C}^{p}$ denotes an input signal with zero mean and covariance matrix $P I_p$, and $z$ represents a zero-mean noise uncorrelated with $x$, which has covariance $\sigma^2 I_n$. Here, we denote by $p$ and $n$ the input dimension and the sample size, respectively, which agrees with the conventional notation adopted in the statistics community. In this subsection, we assume that
$$
\alpha := \frac{p}{n} \le 1, \tag{52}
$$
i.e. the sample size exceeds the dimension of the input signal. The goal is to estimate the channel input $x$ from the channel output $y$ with minimum $\ell_2$ distortion.

The MMSE estimate of $x$ given $y$ can be written as [4]
$$
\hat{x} = \mathbb{E}[x\mid y] = \mathbb{E}[x y^*]\left(\mathbb{E}[y y^*]\right)^{-1} y
= P H^*\left(\sigma^2 I_n + P H H^*\right)^{-1} y, \tag{53}
$$
and the resulting MMSE is given by
$$
\mathrm{MMSE}(H,\mathrm{SNR}) = \mathrm{tr}\!\left(P I_p - P^2 H^*\left(\sigma^2 I_n + P H H^*\right)^{-1} H\right). \tag{54}
$$
The normalized MMSE (NMMSE) can then be written as
$$
\mathrm{NMMSE}(H,\mathrm{SNR}) := \frac{\mathrm{MMSE}(H,\mathrm{SNR})}{\mathbb{E}\left[\|x\|^2\right]}
= \mathrm{tr}\!\left(I_p - H^*\left(\tfrac{1}{\mathrm{SNR}} I_n + H H^*\right)^{-1} H\right), \tag{55}
$$
where
$$
\mathrm{SNR} := \frac{P}{\sigma^2}.
$$
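Before stating the concentration result, here is a minimal Monte Carlo sketch (not from the paper) confirming that the linear estimator (53) indeed attains the error expression (54); the dimensions, $P$, and $\sigma^2$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, P, sigma2 = 40, 20, 1.0, 0.1
H = (rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))) / np.sqrt(2 * p)

W = P * H.conj().T @ np.linalg.inv(sigma2 * np.eye(n) + P * H @ H.conj().T)   # estimator matrix in (53)
mmse_closed = np.real(np.trace(P * np.eye(p) - P * W @ H))                    # eq. (54): P*W@H = P^2 H*(.)^{-1} H

trials, err = 20000, 0.0
for _ in range(trials):
    x = np.sqrt(P / 2) * (rng.standard_normal(p) + 1j * rng.standard_normal(p))
    z = np.sqrt(sigma2 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    y = H @ x + z
    err += np.linalg.norm(x - W @ y) ** 2

print(err / trials, mmse_closed)   # the two agree up to Monte Carlo error
```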

Using the concentration of measure technique, we can evaluate $\mathrm{NMMSE}(H,\mathrm{SNR})$ to within a narrow interval with high probability, as stated in the following theorem.

Theorem 3: Suppose that $H = AM$, where $A\in\mathbb{C}^{m\times n}$ is a deterministic matrix for some integer $m \ge p$, and $M\in\mathbb{C}^{n\times p}$ is a random matrix such that the $M_{ij}$'s are independent random variables satisfying $\mathbb{E}[M_{ij}] = 0$ and $\mathbb{E}[|M_{ij}|^2] = \frac{1}{p}$.

(a) If the $\sqrt{p}M_{ij}$'s are bounded by $D$, then for any $\beta > 8\sqrt{\pi}$,
$$
\frac{\mathrm{NMMSE}(H,\mathrm{SNR})}{p} \in \left[\mathcal{M}\!\left(\tfrac{8}{9\,\mathrm{SNR}}, H\right) + \frac{\beta\tau_{\mathrm{bd}}^{\mathrm{lb}}}{p},\;
\mathcal{M}\!\left(\tfrac{9}{8\,\mathrm{SNR}}, H\right) + \frac{\beta\tau_{\mathrm{bd}}^{\mathrm{ub}}}{p}\right] \tag{56}
$$
with probability exceeding $1 - 8\exp\left(-\frac{\beta^2}{16}\right)$.

(b) If the $\sqrt{p}M_{ij}$'s satisfy the LSI with respect to a uniform constant $c_{\mathrm{ls}}$, then for any $\beta > 0$,
$$
\frac{\mathrm{NMMSE}(H,\mathrm{SNR})}{p} \in \mathcal{M}\!\left(\tfrac{1}{\mathrm{SNR}}, H\right) + \frac{\beta}{p}\left[\tau_{\mathrm{ls}}^{\mathrm{lb}},\,\tau_{\mathrm{ls}}^{\mathrm{ub}}\right] \tag{57}
$$
with probability exceeding $1 - 4\exp\left(-\frac{\beta^2}{2}\right)$.

(c) Suppose that the $\sqrt{p}M_{ij}$'s are independently drawn from either sub-exponential distributions or heavy-tailed distributions and that the distributions are symmetric about 0. Let $\tau_c$ be defined as in (7) with respect to the $\sqrt{p}M_{ij}$'s for some sequence $c(n)$. Then
$$
\frac{\mathrm{NMMSE}(H,\mathrm{SNR})}{p} \in \left[\mathcal{M}\!\left(\tfrac{8}{9\,\mathrm{SNR}}, A\bar{M}\right) + \frac{\tau_{\mathrm{ht}}^{\mathrm{lb}}}{p},\;
\mathcal{M}\!\left(\tfrac{9}{8\,\mathrm{SNR}}, A\bar{M}\right) + \frac{\tau_{\mathrm{ht}}^{\mathrm{ub}}}{p}\right] \tag{58}
$$
with probability exceeding $1 - \frac{10}{p^{c(n)}}$, where $\bar{M}$ is defined such that $\bar{M}_{ij} := M_{ij}\,\mathbf{1}_{\{\sqrt{p}|M_{ij}| < \tau_c\}}$.

Here, the function $\mathcal{M}(\delta, H)$ is defined as
$$
\mathcal{M}(\delta, H) := \frac{1}{p}\,\mathbb{E}\!\left[\mathrm{tr}\!\left(\left(\delta I + H^* H\right)^{-1}\right)\right], \tag{59}
$$
and the residual terms are formally defined in Table IV.

Proof: See Appendix G. □

Theorem 3 ensures that the MMSE per signal component falls within a small interval with high probability. This concentration phenomenon again arises for a very large class of probability measures, including bounded distributions, sub-Gaussian distributions, and heavy-tailed distributions.


Note that $\mathcal{M}\!\left(\frac{9}{8\,\mathrm{SNR}}, H\right)$, $\mathcal{M}\!\left(\frac{8}{9\,\mathrm{SNR}}, H\right)$, and $\mathcal{M}\!\left(\frac{1}{\mathrm{SNR}}, H\right)$ are often very close to each other at moderate-to-high SNR (see the analysis of Corollary 2). The spans of the residual intervals for both bounded and logarithmic Sobolev measures do not exceed the order
$$
O\!\left(\frac{\beta\|A\|\,\mathrm{SNR}^{1.5}}{p}\right), \tag{60}
$$
which is often negligible compared with the MMSE value per signal component.

TABLE IV: SUMMARY OF PARAMETERS OF THEOREM 3 AND COROLLARY 2
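The following minimal sketch (not from the paper) estimates $\mathcal{M}(\delta, H)$ in (59) by Monte Carlo for an illustrative $H = AM$ with a diagonal $A$ (taking $m = n$), and confirms numerically that the three quantities above are nearly indistinguishable at moderate-to-high SNR; all sizes and the mixing matrix are assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n, p, snr, trials = 200, 50, 100.0, 50
A = np.diag(1.0 + 0.5 * rng.random(n))            # an assumed deterministic mixing matrix (m = n here)

def M_est(delta):
    # Monte Carlo estimate of M(delta, H) = (1/p) E[ tr( (delta*I + H* H)^{-1} ) ]
    acc = 0.0
    for _ in range(trials):
        Mat = (rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))) / np.sqrt(2 * p)
        H = A @ Mat
        acc += np.real(np.trace(np.linalg.inv(delta * np.eye(p) + H.conj().T @ H)))
    return acc / (trials * p)

print(M_est(9 / (8 * snr)), M_est(1 / snr), M_est(8 / (9 * snr)))   # nearly identical at high SNR
```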

In general, we are not aware of a closed-form expression for $\mathcal{M}(\delta, H)$ under various random matrix ensembles. If $H$ is drawn from a Gaussian ensemble, however, we are able to derive a simple expression and bound this value to within a narrow confidence interval with high probability, as follows.

Corollary 2: Suppose that the $H_{ij}\sim\mathcal{N}\left(0, \frac{1}{p}\right)$ are independent complex-valued Gaussian random variables, and assume that $\alpha < 1$. Then
$$
\frac{\mathrm{NMMSE}(H,\mathrm{SNR})}{p} \in \frac{\alpha}{1-\alpha} + \left[\frac{\tau_{\mathrm{g}}^{\mathrm{lb}} + \beta\tau_{\mathrm{ls}}^{\mathrm{lb}}}{p},\; \frac{\tau_{\mathrm{g}}^{\mathrm{ub}} + \beta\tau_{\mathrm{ls}}^{\mathrm{ub}}}{p}\right] \tag{61}
$$
with probability exceeding $1 - 4\exp\left(-\frac{\beta^2}{2}\right)$. Here, $\tau_{\mathrm{ls}}^{\mathrm{lb}}$ and $\tau_{\mathrm{ls}}^{\mathrm{ub}}$ are as presented in Theorem 3, while $\tau_{\mathrm{g}}^{\mathrm{lb}}$ and $\tau_{\mathrm{g}}^{\mathrm{ub}}$ are formally defined in Table IV.

Proof: See Appendix G. □

Corollary 2 implies that the MMSE per signal component behaves like
$$
\frac{\mathrm{NMMSE}(H,\mathrm{SNR})}{p} = \frac{\alpha}{1-\alpha} + O\!\left(\frac{\mathrm{SNR}^{1.5}}{n} + \frac{1}{n} + \frac{1}{\mathrm{SNR}}\right)
$$
with high probability (by setting $\beta = \Theta(1)$). Except for the residual term, which vanishes in the presence of high SNR and large signal dimensionality $n$, the first-order term depends only on the SNR and the oversampling ratio $\frac{1}{\alpha}$. Therefore, the normalized MMSE converges to an almost deterministic value even in the non-asymptotic regime. This illustrates the effectiveness of the proposed analysis approach.
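A minimal numerical sketch (not from the paper) of this behavior: for $H$ with i.i.d. complex Gaussian entries of variance $1/p$, the per-component MMSE, computed through the spectral expression used in the proof of Theorem 3, concentrates around $\alpha/(1-\alpha)$. The dimensions and SNR are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, snr = 400, 100, 1e3                 # alpha = p/n = 0.25
alpha = p / n
H = (rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))) / np.sqrt(2 * p)

lam = np.linalg.eigvalsh(H.conj().T @ H)              # eigenvalues of H* H
nmmse_per_comp = np.mean(1.0 / (1.0 / snr + lam))     # (1/p) * sum_i 1/(SNR^{-1} + lambda_i)

print(nmmse_per_comp, alpha / (1 - alpha))            # close for large n and high SNR
```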

1) Connection to the General Framework: We now demonstrate how the proof of Theorem 3 follows from the general template presented in Section III.

1) Observe that the MMSE can be alternatively expressed as
$$
\frac{\mathrm{NMMSE}(H,\mathrm{SNR})}{p} = \frac{1}{p}\sum_{i=1}^{\min\{n,p\}} \frac{1}{\mathrm{SNR}^{-1} + \lambda_i(HH^*)}
:= \frac{1}{p}\sum_{i=1}^{\min\{n,p\}} f\!\left(\lambda_i(HH^*)\right),
$$
where $f(x) := \frac{1}{\mathrm{SNR}^{-1}+x}$ ($x \ge 0$). Consequently, the function $g(x) := f(x^2) = \frac{1}{\mathrm{SNR}^{-1}+x^2}$ has Lipschitz norm $\|g\|_{\mathrm{L}} \le \frac{3\sqrt{3}}{8}\,\mathrm{SNR}^{3/2}$.

2) For measures satisfying the LSI, Theorem 3(b) can be immediately obtained from Proposition 1(b). For measures of bounded support or heavy-tailed measures, one can introduce the following auxiliary function with respect to some small $\varepsilon$:
$$
g_{\varepsilon}(x) :=
\begin{cases}
\frac{1}{\varepsilon^2+x^2}, & \text{if } x > \frac{1}{\sqrt{3}}\varepsilon,\\[0.5ex]
-\frac{3\sqrt{3}}{8\varepsilon^3}\left(x - \frac{\varepsilon}{\sqrt{3}}\right) + \frac{3}{4\varepsilon^2}, & \text{if } x \le \frac{1}{\sqrt{3}}\varepsilon,
\end{cases}
$$
which is obtained by convexifying $g(x)$. By setting $f_{\varepsilon}(x) := g_{\varepsilon}(\sqrt{x})$, one has
$$
f_{\frac{3}{2\sqrt{2}}\mathrm{SNR}^{-1/2}}(x) \le f(x) \le f_{\mathrm{SNR}^{-1/2}}(x).
$$
Applying Proposition 1(a) gives the concentration bounds on
$$
\frac{1}{p}\sum_{i=1}^{\min\{p,n\}} f_{\frac{3}{2\sqrt{2}}\mathrm{SNR}^{-1/2}}\!\left(\lambda_i(HH^*)\right)
\quad\text{and}\quad
\frac{1}{p}\sum_{i=1}^{\min\{p,n\}} f_{\mathrm{SNR}^{-1/2}}\!\left(\lambda_i(HH^*)\right),
$$
which in turn provide tight sandwich bounds for $\frac{1}{p}\sum_{i=1}^{\min\{p,n\}} f\!\left(\lambda_i(HH^*)\right)$.

V. CONCLUSIONS AND FUTURE DIRECTIONS

We have presented a general analysis framework that allows many canonical MIMO system performance metrics taking the form of linear spectral statistics to be assessed within a narrow concentration interval. Moreover, we can guarantee that the metric value falls within the derived interval with high probability, even under moderate channel dimensionality. We demonstrate the effectiveness of the proposed framework through two canonical metrics in wireless communications and signal processing: mutual information and power offset of MIMO channels, and MMSE for Gaussian processes.


For each of these examples, our approach allows a more refined and accurate characterization than those derived in previous work in the asymptotic dimensionality limit. While our examples here are presented for i.i.d. channel matrices, they can all be immediately extended to accommodate dependent random matrices using the same techniques. In other words, our analysis can be applied to characterize various performance metrics of wireless systems under correlated fading, as studied in [8].

Our work suggests a few future research directions for different matrix ensembles. Specifically, if the channel matrix H cannot be expressed as a linear correlation model MA for some known matrix A and a random i.i.d. matrix M, then it will be of great interest to see whether there is a systematic approach to analyzing the concentration of spectral measure phenomenon. For instance, if H consists of independent sub-Gaussian columns but the elements within each column are correlated (i.e. cannot be expressed as a weighted sum of i.i.d. random variables), then it remains to be seen whether generalized concentration of measure techniques (see [52]) can be applied to find confidence intervals for the system performance metrics. The spiked random matrix model [76], which also admits a simple asymptotic characterization, is another popular candidate for modeling various system metrics. Another, more challenging, case is when the rows of H are randomly drawn from a known set of structured candidates. An example is a random DFT ensemble where each row is independently drawn from a DFT matrix, which is frequently encountered in the compressed sensing literature (see [77]). Understanding the spectral concentration of such random matrices could be of great interest for investigating OFDM systems under random subsampling.

APPENDIX A
PROOF OF PROPOSITION 1

Parts (a) and (b) of Proposition 1 are immediate consequences of [51, Corollary 1.8] via simple algebraic manipulation. Here, we only provide the proof for heavy-tailed distributions.

Define a new matrix $\bar{M}$ as a truncated version of $M$ such that
$$
\bar{M}_{ij} =
\begin{cases}
M_{ij}, & \text{if } |M_{ij}| < \tau_c,\\
0, & \text{otherwise.}
\end{cases}
$$

Note that the entries of $\bar{M}$ have zero mean and variance not exceeding $\sigma_c^2$, and are uniformly bounded in magnitude by $\tau_c$. Consequently, Theorem 1(a) asserts that for any $\beta > 8\sqrt{\pi}$, we have
$$
\left| f_0\left(\bar{M}\right) - \mathbb{E}\left[ f_0\left(\bar{M}\right)\right] \right| \le \frac{\beta\tau_c\sigma_c\rho\|g\|_{\mathrm{L}}}{n} \tag{62}
$$
with probability exceeding $1 - 4\exp\left(-\frac{\beta^2}{16}\right)$.

In addition, a simple union bound taken collectively with (7) yields
$$
\mathbb{P}\left(\bar{M} \neq M\right) \le \sum_{1\le i\le n,\,1\le j\le m} \mathbb{P}\left(\frac{1}{\nu_{ij}}\left|M_{ij}\right| > \tau_c\right)
\le mn\,\mathbb{P}\left(\frac{1}{\nu_{ij}}\left|M_{ij}\right| > \tau_c\right) = \frac{1}{n^{c(n)}}. \tag{63}
$$

By setting $\beta = 4\sqrt{c(n)\log n}$, one has
$$
4\exp\left(-\frac{\beta^2}{16}\right) = \frac{4}{n^{c(n)}}.
$$
This combined with (62) and the union bound implies that
$$
\left| f_0(M) - \mathbb{E}\left[ f_0\left(\bar{M}\right)\right]\right| \le \frac{4\sqrt{c(n)\log n}\,\tau_c\sigma_c\rho\|g\|_{\mathrm{L}}}{n} \tag{64}
$$
holds with probability at least $1 - \frac{5}{n^{c(n)}}$, as claimed.
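The sketch below (not from the paper) illustrates the truncation device used in this proof: replacing heavy-tailed entries by $M_{ij}\mathbf{1}_{\{|M_{ij}|<\tau_c\}}$ yields a bounded-entry matrix that coincides with $M$ except on a vanishing fraction of entries, so the two linear spectral statistics are close. The Student-t entries, the threshold, and the example statistic are placeholder choices, not the paper's formal definitions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, tau_c = 200, 400, 8.0
M = rng.standard_t(df=5, size=(n, m)) / np.sqrt(5.0 / 3.0)   # heavy-tailed entries, unit variance

M_bar = np.where(np.abs(M) < tau_c, M, 0.0)                   # truncated copy of M

def f0(X):
    # an example linear spectral statistic: average of log(0.01 + sigma_i^2) over singular values
    s = np.linalg.svd(X / np.sqrt(m), compute_uv=False)
    return np.mean(np.log(1e-2 + s ** 2))

print(np.mean(M != M_bar))      # fraction of entries altered by the truncation (small)
print(f0(M), f0(M_bar))         # the example spectral statistic barely changes
```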

APPENDIX B
PROOF OF LEMMA 1

For notational simplicity, define
$$
Y := n\left\{ f_0(M) - \mathbb{E}\left[ f_0(M)\right]\right\}
= \sum_{i=1}^{\min\{m,n\}}\left\{ f\!\left(\lambda_i\!\left(\tfrac{1}{n}MRR^*M^*\right)\right) - \mathbb{E}\left[ f\!\left(\lambda_i\!\left(\tfrac{1}{n}MRR^*M^*\right)\right)\right]\right\}
$$
and
$$
Z := n f_0(M) = \sum_{i=1}^{\min\{m,n\}} f\!\left(\lambda_i\!\left(\tfrac{1}{n}MRR^*M^*\right)\right),
$$

and denote by $\mu_{|Y|}(y)$ the probability measure of $|Y|$.

(1) Suppose that $\mathbb{P}(|Y| > y) \le c_1\exp(-c_2 y)$ holds for some universal constants $c_1 > 0$ and $c_2 > 1$. Applying integration by parts yields the following inequality:
$$
\mathbb{E}\left[e^{Y}\right] \le \mathbb{E}\left[e^{|Y|}\right] = \int_0^{\infty} e^{y}\,\mathrm{d}\mu_{|Y|}(y)
= -\left. e^{y}\,\mathbb{P}(|Y| > y)\right|_0^{\infty} + \int_0^{\infty} e^{y}\,\mathbb{P}(|Y| > y)\,\mathrm{d}y
\le 1 + c_1\int_0^{\infty}\exp\left(-(c_2-1)y\right)\mathrm{d}y = 1 + \frac{c_1}{c_2-1}. \tag{65}
$$
This gives rise to the following bound:
$$
\log\mathbb{E}\left[e^{Z}\right] - \log e^{\mathbb{E}[Z]} = \log\mathbb{E}\left[e^{Y}\right] \le \log\left(1 + \frac{c_1}{c_2-1}\right).
$$
Therefore, we can conclude that
$$
\frac{1}{n}\log\mathbb{E}\left[e^{Z}\right] - \frac{\log\left(1 + \frac{c_1}{c_2-1}\right)}{n} \le \frac{1}{n}\mathbb{E}[Z] \le \frac{1}{n}\log\mathbb{E}\left[e^{Z}\right], \tag{66}
$$
where the last inequality follows from Jensen's inequality:
$$
\frac{1}{n}\mathbb{E}[Z] = \frac{1}{n}\log e^{\mathbb{E}[Z]} \le \frac{1}{n}\log\mathbb{E}\left[e^{Z}\right].
$$


(2) Similarly, if $\mathbb{P}(|Y| > y) \le c_1\exp\left(-c_2 y^2\right)$ holds for some universal constants $c_1 > 0$ and $c_2 > 0$, then one can bound
$$
\mathbb{E}\left[e^{Y}\right] \le -\left. e^{y}\,\mathbb{P}(|Y| > y)\right|_0^{\infty} + \int_0^{\infty} e^{y}\,\mathbb{P}(|Y| > y)\,\mathrm{d}y
\le 1 + c_1\int_0^{\infty}\exp\left(-c_2 y^2 + y\right)\mathrm{d}y
$$
$$
\le 1 + c_1\sqrt{\frac{\pi}{c_2}}\exp\left(\frac{1}{4c_2}\right)\int_{-\infty}^{\infty}\sqrt{\frac{c_2}{\pi}}\exp\left(-c_2 y^2 + y - \frac{1}{4c_2}\right)\mathrm{d}y
\le 1 + c_1\sqrt{\frac{\pi}{c_2}}\exp\left(\frac{1}{4c_2}\right), \tag{67}
$$
where the last inequality follows since $\sqrt{\frac{c_2}{\pi}}\exp\left(-c_2 y^2 + y - \frac{1}{4c_2}\right)$ is the pdf of $\mathcal{N}\left(\frac{1}{2c_2}, \frac{1}{2c_2}\right)$ and hence integrates to 1. Applying the same argument as for the measure with sub-exponential tails then leads to
$$
\frac{1}{n}\log\mathbb{E}\left[e^{Z}\right] \ge \frac{1}{n}\mathbb{E}[Z] \ge \frac{1}{n}\log\mathbb{E}\left[e^{Z}\right] - \frac{\log\left(1 + c_1\sqrt{\frac{\pi}{c_2}}\exp\left(\frac{1}{4c_2}\right)\right)}{n}.
$$

APPENDIX C
PROOF OF THEOREM 1

For notational convenience, define
$$
Z := \sum_{i=1}^{\min\{m,n\}} f\!\left(\lambda_i\!\left(\tfrac{1}{n}MRR^*M^*\right)\right), \qquad Y := Z - \mathbb{E}[Z].
$$

(1) Under the assumptions of Proposition 1(a), the concentration result (15) implies that for any $y > 8\sqrt{\pi}D\rho\nu\|g\|_{\mathrm{L}}$, one has
$$
\mathbb{P}(|Y| > y) \le 4\exp\left(-\frac{y^2}{8\kappa D^2\rho^2\nu^2\|g\|_{\mathrm{L}}^2}\right).
$$
For $y \le 8\sqrt{\pi}D\rho\nu\|g\|_{\mathrm{L}}$, we can employ the trivial bound $\mathbb{P}(|Y| > y) \le 1$. These two bounds taken collectively lead to a universal bound: for any $y \ge 0$,
$$
\mathbb{P}(|Y| > y) \le \exp\left(\frac{8\pi}{\kappa}\right)\exp\left(-\frac{y^2}{8\kappa D^2\rho^2\nu^2\|g\|_{\mathrm{L}}^2}\right).
$$
Lemma 1 then suggests that
$$
\frac{1}{n}\log\mathcal{E}(f) \ge \frac{1}{n}\mathbb{E}[Z] \ge \frac{\log\mathcal{E}(f)}{n} - \frac{\log\left(1 + \sqrt{8\kappa\pi}\,D\rho\nu\|g\|_{\mathrm{L}}\, e^{\frac{8\pi}{\kappa} + 2\kappa D^2\rho^2\nu^2\|g\|_{\mathrm{L}}^2}\right)}{n}.
$$
This together with (15) establishes (25).

(2) Under the assumptions of Proposition 1(b), the concentration inequality (15) asserts that for any $y > 0$, we have
$$
\mathbb{P}(|Y| > y) \le 2\exp\left(-\frac{y^2}{\kappa c_{\mathrm{ls}}\rho^2\nu^2\|g\|_{\mathrm{L}}^2}\right).
$$
Applying Lemma 1 then yields
$$
\frac{1}{n}\log\mathcal{E}(f) \ge \frac{1}{n}\mathbb{E}[Z] \ge \frac{\log\mathcal{E}(f)}{n} - \frac{\log\left(1 + \sqrt{4\kappa\pi c_{\mathrm{ls}}\rho^2\nu^2\|g\|_{\mathrm{L}}^2}\, e^{\frac{\kappa}{4} c_{\mathrm{ls}}\rho^2\nu^2\|g\|_{\mathrm{L}}^2}\right)}{n}.
$$

This combined with (16) establishes (26).

(3) Under the assumptions of Proposition 1(c), define
$$
\bar{Z} := \sum_{i=1}^{n} f\!\left(\lambda_i\!\left(\tfrac{1}{n}\bar{M}RR^*\bar{M}^*\right)\right), \qquad \bar{Y} := \bar{Z} - \mathbb{E}\left[\bar{Z}\right].
$$
Combining the concentration result (62) with (25) gives rise to
$$
\begin{cases}
\frac{1}{n}\sum_{i=1}^{n} f\!\left(\lambda_i\!\left(\tfrac{1}{n}\bar{M}RR^*\bar{M}^*\right)\right) \le \frac{1}{n}\log\mathcal{E}_{\bar{M}}(f) + \frac{\beta\tau_c\sigma_c\rho\|g\|_{\mathrm{L}}}{n},\\[1ex]
\frac{1}{n}\sum_{i=1}^{n} f\!\left(\lambda_i\!\left(\tfrac{1}{n}\bar{M}RR^*\bar{M}^*\right)\right) \ge \frac{1}{n}\log\mathcal{E}_{\bar{M}}(f) - \frac{\beta\tau_c\sigma_c\rho\|g\|_{\mathrm{L}}}{n} - \frac{c_{\rho,f,\tau_c,\sigma_c}}{n},
\end{cases} \tag{68}
$$
with probability exceeding $1 - 4\exp\left(-\frac{\beta^2}{8\kappa}\right)$. We have shown in (63) that
$$
\mathbb{P}\left(\bar{M} \neq M\right) \le \frac{1}{n^{c(n)}}
$$
and $4\exp\left(-\frac{\beta^2}{8\kappa}\right) = \frac{4}{n^{c(n)}}$ when $\beta = \sqrt{8\kappa c(n)\log n}$, concluding the proof via the union bound.

APPENDIX D
PROOF OF LEMMA 2

For any $n\times n$ matrix $A$ with eigenvalues $\lambda_1,\ldots,\lambda_n$, define the characteristic polynomial of $A$ as
$$
p_{A}(t) = \det(tI - A) = t^n - S_1(\lambda_1,\ldots,\lambda_n)t^{n-1} + \cdots + (-1)^n S_n(\lambda_1,\ldots,\lambda_n), \tag{69}
$$
where $S_l(\lambda_1,\ldots,\lambda_n)$ is the $l$th elementary symmetric polynomial, defined as
$$
S_l(\lambda_1,\ldots,\lambda_n) := \sum_{1\le i_1 < \cdots < i_l \le n}\ \prod_{j=1}^{l}\lambda_{i_j}. \tag{70}
$$
Let $E_l(A)$ represent the sum of the determinants of all $l\times l$ principal minors of $A$. It has been shown in [78, Th. 1.2.12] that
$$
S_l(\lambda_1,\ldots,\lambda_n) = E_l(A), \quad 1\le l\le n, \tag{71}
$$
from which it follows that
$$
\det(tI + A) = t^n + E_1(A)t^{n-1} + \cdots + E_n(A). \tag{72}
$$


On the other hand, for any principal minor $(MM^*)_{s,s}$ of $MM^*$, formed from the rows and columns indexed by $s$ (where $s\subseteq[n]$ is an index set), one can show that
$$
\mathbb{E}\left[\det\left((MM^*)_{s,s}\right)\right]
= \mathbb{E}\left[\det\left(M_{s,[m]}M_{s,[m]}^*\right)\right]
= \sum_{t:\,\mathrm{card}(t) = \mathrm{card}(s)} \mathbb{E}\left[\det\left(M_{s,t}M_{s,t}^*\right)\right]
= \binom{m}{\mathrm{card}(s)}\cdot\mathbb{E}\left[\det\left(M_{s,s}M_{s,s}^*\right)\right]. \tag{73}
$$

Consider now any i.i.d. random matrix $G\in\mathbb{C}^{l\times l}$ such that $\mathbb{E}[G_{ij}] = 0$ and $\mathbb{E}[|G_{ij}|^2] = 1$. If we denote by $\Pi_l$ the permutation group on $l$ elements, then the Leibniz formula for the determinant gives
$$
\det(G) = \sum_{\sigma\in\Pi_l}\mathrm{sgn}(\sigma)\prod_{i=1}^{l}G_{i,\sigma(i)}.
$$
Since the $G_{ij}$'s are jointly independent, we have
$$
\mathbb{E}\left[\det\left(GG^*\right)\right] = \mathbb{E}\left[|\det(G)|^2\right]
= \sum_{\sigma\in\Pi_l}\mathbb{E}\left[\prod_{i=1}^{l}\left|G_{i,\sigma(i)}\right|^2\right]
= \sum_{\sigma\in\Pi_l}\prod_{i=1}^{l}\mathbb{E}\left[\left|G_{i,\sigma(i)}\right|^2\right] = l!, \tag{74}
$$
which is distribution-free.

So far, we are already able to derive the closed-form expression for $\mathbb{E}\left[\det\left(\varepsilon I + \frac{1}{m}MM^*\right)\right]$ by combining (72), (73) and (74) via straightforward algebraic manipulation. Alternatively, (72), (73) and (74) taken collectively indicate that $\mathbb{E}\left[\det\left(\varepsilon I + \frac{1}{m}MM^*\right)\right]$ is independent of the precise distribution of the entries $M_{ij}$. As a result, we only need to evaluate $\mathbb{E}\left[\det\left(\varepsilon I + \frac{1}{m}MM^*\right)\right]$ for i.i.d. Gaussian random matrices. To this end, one can simply cite the closed-form expression derived for Gaussian random matrices, which has been reported in [73, Th. 2.13]. This concludes the proof.
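A minimal Monte Carlo sketch (not from the paper) of the distribution-free identity (74): for an $l\times l$ matrix with i.i.d. zero-mean, unit-variance entries, $\mathbb{E}[\det(GG^*)] = l!$ whether the entries are Gaussian or Rademacher. Real-valued entries are used purely for convenience.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(5)
l, trials = 4, 100000
draws = {
    "gaussian":   lambda: rng.standard_normal((l, l)),
    "rademacher": lambda: rng.choice([-1.0, 1.0], size=(l, l)),
}
for name, draw in draws.items():
    mean_det = np.mean([np.linalg.det(G @ G.T) for G in (draw() for _ in range(trials))])
    print(name, mean_det, factorial(l))   # both close to 4! = 24
```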

APPENDIX E
PROOF OF THEOREM 2

When equal power allocation is adopted, the MIMO mutual information per receive antenna is given by
$$
\frac{C(H,\mathrm{SNR})}{n_r} = \frac{1}{n_r}\log\det\left(I_{n_r} + \frac{\mathrm{SNR}}{n_t}HH^*\right)
= \frac{1}{n_r}\sum_{i=1}^{\min\{n_r,n_t\}}\log\left(\frac{1}{\mathrm{SNR}} + \frac{1}{n_t}\lambda_i\left(HH^*\right)\right) + \min\{1,\alpha\}\log\mathrm{SNR}
$$
$$
= \begin{cases}
\frac{1}{n_r}\log\det\left(\frac{1}{\mathrm{SNR}}I_{n_r} + \frac{1}{n_t}HH^*\right) + \log\mathrm{SNR}, & \text{if } \alpha \ge 1,\\[1ex]
\frac{\alpha}{n_t}\log\det\left(\frac{\alpha}{\mathrm{SNR}}I_{n_t} + \frac{1}{n_r}H^*H\right) + \alpha\log\frac{\mathrm{SNR}}{\alpha}, & \text{if } \alpha < 1,
\end{cases} \tag{75}
$$
where $\alpha = n_t/n_r$ is assumed to be an absolute constant.

The first term of the capacity expression (75) in both cases exhibits sharp measure concentration, as stated in the following lemma. This in turn completes the proof of Theorem 2.

Lemma 4: Suppose that $\alpha := \frac{m}{n} \ge 1$ is an absolute constant. Consider a real-valued random matrix $M = [\zeta_{ij}]_{1\le i\le n,\,1\le j\le m}$, where the $\zeta_{ij}$'s are jointly independent with zero mean and unit variance.

(a) If the $\zeta_{ij}$'s are bounded by $D$, then for any $\beta > 8\sqrt{\pi}$, one has
$$
\begin{cases}
\frac{1}{n}\log\det\left(\varepsilon I + \frac{1}{m}MM^*\right) \le \frac{1}{n}\log\mathcal{R}\left(\frac{e}{2}\varepsilon, n, m\right) + \frac{D}{\sqrt{\frac{e}{2}\varepsilon\alpha}}\cdot\frac{\beta}{n},\\[1ex]
\frac{1}{n}\log\det\left(\varepsilon I + \frac{1}{m}MM^*\right) \ge \frac{1}{n}\log\mathcal{R}\left(\frac{2}{e}\varepsilon, n, m\right) - \frac{D}{\sqrt{\varepsilon\alpha}}\cdot\frac{\beta}{n} - \frac{\log\left\{1 + \sqrt{\frac{8\pi D^2}{\varepsilon\alpha}}\, e^{8\pi + \frac{2D^2}{\varepsilon\alpha}}\right\}}{n},
\end{cases} \tag{76}
$$
with probability exceeding $1 - 4\exp\left(-\frac{\beta^2}{8}\right)$.

(b) If the $\zeta_{ij}$'s satisfy the LSI with respect to a uniform constant $c_{\mathrm{ls}}$, then for any $\beta > 0$, one has
$$
\begin{cases}
\frac{1}{n}\log\det\left(\varepsilon I + \frac{1}{m}MM^*\right) \le \frac{1}{n}\log\mathcal{R}(\varepsilon, n, m) + \frac{\sqrt{c_{\mathrm{ls}}}}{\sqrt{\varepsilon\alpha}}\cdot\frac{\beta}{n},\\[1ex]
\frac{1}{n}\log\det\left(\varepsilon I + \frac{1}{m}MM^*\right) \ge \frac{1}{n}\log\mathcal{R}(\varepsilon, n, m) - \frac{\sqrt{c_{\mathrm{ls}}}}{\sqrt{\varepsilon\alpha}}\cdot\frac{\beta}{n} - \frac{\log\left(1 + \sqrt{\frac{4\pi c_{\mathrm{ls}}}{\varepsilon\alpha}}\, e^{\frac{c_{\mathrm{ls}}}{4\varepsilon\alpha}}\right)}{n},
\end{cases} \tag{77}
$$
with probability at least $1 - 2\exp\left(-\beta^2\right)$.

(c) If the $\zeta_{ij}$'s are heavy-tailed, then one has
$$
\begin{cases}
\frac{1}{n}\log\det\left(\varepsilon I + \frac{1}{m}MM^*\right) \le \frac{1}{n}\log\mathcal{R}\left(\frac{e\varepsilon}{2\sigma_c^2}, n, m\right) + 2\log\sigma_c + \frac{\tau_c\sigma_c\sqrt{8c(n)\log n}}{\sqrt{\varepsilon\alpha}\,n},\\[1ex]
\frac{1}{n}\log\det\left(\varepsilon I + \frac{1}{m}MM^*\right) \ge \frac{1}{n}\log\mathcal{R}\left(\frac{2\varepsilon}{e\sigma_c^2}, n, m\right) + 2\log\sigma_c - \frac{\tau_c\sigma_c\sqrt{8c(n)\log n}}{\sqrt{\varepsilon\alpha}\,n} - \frac{\log\left(1 + \sqrt{\frac{8\pi\tau_c^2\sigma_c^2}{\varepsilon\alpha}}\, e^{8\pi + \frac{2\tau_c^2\sigma_c^2}{\varepsilon\alpha}}\right)}{n},
\end{cases} \tag{78}
$$
with probability exceeding $1 - \frac{5}{n^{c(n)}}$.

Here, the function $\mathcal{R}(\varepsilon, n, m)$ is defined as
$$
\mathcal{R}(\varepsilon, n, m) := \sum_{i=0}^{n}\binom{n}{i}\varepsilon^{n-i}\, m^{-i}\,\frac{m!}{(m-i)!}.
$$
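The following sketch (not from the paper) checks the closed form $\mathcal{R}(\varepsilon, n, m)$ against a direct Monte Carlo estimate of $\mathbb{E}[\det(\varepsilon I + \frac{1}{m}MM^*)]$ for a small matrix with i.i.d. zero-mean, unit-variance entries; the sizes and $\varepsilon$ are arbitrary.

```python
import numpy as np
from math import comb, factorial

def R(eps, n, m):
    # R(eps, n, m) = sum_{i=0}^{n} C(n, i) * eps^(n-i) * m^(-i) * m! / (m-i)!
    return sum(comb(n, i) * eps ** (n - i) * factorial(m) / (factorial(m - i) * m ** i)
               for i in range(n + 1))

rng = np.random.default_rng(6)
n, m, eps, trials = 3, 6, 0.5, 100000
acc = 0.0
for _ in range(trials):
    M = rng.standard_normal((n, m))
    acc += np.linalg.det(eps * np.eye(n) + M @ M.T / m)

print(acc / trials, R(eps, n, m))   # the two agree up to Monte Carlo error
```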

Proof of Lemma 4: Observe that the derivative of $g(x) := \log(\varepsilon + x^2)$ is
$$
g'(x) = \frac{2x}{\varepsilon + x^2}, \quad x \ge 0,
$$
which is bounded within the interval $\left[0, \frac{1}{\sqrt{\varepsilon}}\right]$ for $x \ge 0$. Therefore, the Lipschitz norm of $g(x)$ satisfies
$$
\|g\|_{\mathrm{L}} \le \frac{1}{\sqrt{\varepsilon}}.
$$

The three classes of probability measures are discussed separately as follows.

(a) Consider first the measure uniformly bounded by $D$. In order to apply Theorem 1, we need to first convexify the objective metric. Define a function $g_{\varepsilon}(x)$ such that
$$
g_{\varepsilon}(x) :=
\begin{cases}
\log(\varepsilon + x^2), & \text{if } x \ge \sqrt{\varepsilon},\\[0.5ex]
\frac{1}{\sqrt{\varepsilon}}\left(x - \sqrt{\varepsilon}\right) + \log(2\varepsilon), & \text{if } 0 \le x < \sqrt{\varepsilon}.
\end{cases}
$$


Apparently, $g_{\varepsilon}(x)$ is a concave function for $x \ge 0$, and its Lipschitz constant can be bounded above by
$$
\|g_{\varepsilon}\|_{\mathrm{L}} \le \frac{1}{\sqrt{\varepsilon}}.
$$
By setting $f_{\varepsilon}(x) := g_{\varepsilon}(\sqrt{x})$, one can easily verify that
$$
\log\left(\frac{2}{e}\varepsilon + x\right) \le f_{\varepsilon}(x) \le \log(\varepsilon + x),
$$
and hence
$$
\frac{1}{n}\log\det\left(\frac{2}{e}\varepsilon I + \frac{1}{m}MM^*\right)
\le \frac{1}{n}\sum_{i=1}^{n} f_{\varepsilon}\!\left(\lambda_i\!\left(\tfrac{1}{m}MM^*\right)\right)
\le \frac{1}{n}\log\det\left(\varepsilon I + \frac{1}{m}MM^*\right). \tag{79}
$$
One desired feature of $f_{\varepsilon}(x)$ is that $g_{\varepsilon}(x) = f_{\varepsilon}(x^2)$ is concave with finite Lipschitz norm bounded above by $\frac{1}{\sqrt{\varepsilon}}$, and is sandwiched between two functions whose exponential means are computable.

By assumption, each entry of $M$ is bounded by $D$, and $\rho = \sqrt{\frac{n}{m}} = \frac{1}{\sqrt{\alpha}}$. Theorem 1 taken collectively with (79) suggests that
$$
\frac{1}{n}\sum_{i=1}^{n} f_{\varepsilon}\!\left(\lambda_i\!\left(\tfrac{1}{m}MM^*\right)\right)
\le \frac{1}{n}\log\mathbb{E}\left[\prod_{i=1}^{n}\exp\left\{ f_{\varepsilon}\!\left(\lambda_i\!\left(\tfrac{1}{m}MM^*\right)\right)\right\}\right] + \frac{D}{\sqrt{\varepsilon\alpha}}\cdot\frac{\beta}{n}
\le \frac{1}{n}\log\mathbb{E}\left[\det\left(\varepsilon I + \frac{1}{m}MM^*\right)\right] + \frac{D}{\sqrt{\varepsilon\alpha}}\cdot\frac{\beta}{n} \tag{80}
$$

and
$$
\frac{1}{n}\sum_{i=1}^{n} f_{\varepsilon}\!\left(\lambda_i\!\left(\tfrac{1}{m}MM^*\right)\right)
\ge \frac{1}{n}\log\mathbb{E}\left[\prod_{i=1}^{n}\exp\left\{ f_{\varepsilon}\!\left(\lambda_i\!\left(\tfrac{1}{m}MM^*\right)\right)\right\}\right] - \frac{D}{\sqrt{\varepsilon\alpha}}\cdot\frac{\beta}{n} - \frac{\log\left\{1 + \sqrt{\frac{8\pi D^2}{\varepsilon\alpha}}\, e^{8\pi + \frac{2D^2}{\varepsilon\alpha}}\right\}}{n}
$$
$$
\ge \frac{1}{n}\log\mathbb{E}\left[\det\left(\frac{2}{e}\varepsilon I + \frac{1}{m}MM^*\right)\right] - \frac{D}{\sqrt{\varepsilon\alpha}}\cdot\frac{\beta}{n} - \frac{\log\left\{1 + \sqrt{\frac{8\pi D^2}{\varepsilon\alpha}}\, e^{8\pi + \frac{2D^2}{\varepsilon\alpha}}\right\}}{n} \tag{81}
$$
with probability exceeding $1 - 4\exp\left(-\frac{\beta^2}{8}\right)$.

To complete the argument for Part (a), we observe that (79) also implies

$$
\sum_{i=1}^{n} f_{\varepsilon}\!\left(\lambda_i\!\left(\tfrac{1}{m}MM^*\right)\right)
\le \log\det\left(\varepsilon I + \frac{1}{m}MM^*\right)
\le \sum_{i=1}^{n} f_{\frac{e}{2}\varepsilon}\!\left(\lambda_i\!\left(\tfrac{1}{m}MM^*\right)\right).
$$
Substituting this into (80) and (81) and making use of the identity given in Lemma 2 completes the proof of Part (a).

(b) If the measure of the $M_{ij}$'s satisfies the LSI with a uniformly bounded constant $c_{\mathrm{ls}}$, then $g(x) := \log\left(\varepsilon + x^2\right)$ is Lipschitz with Lipschitz constant bounded by $\frac{1}{\sqrt{\varepsilon}}$. Applying Theorem 1 yields
$$
\begin{cases}
\frac{1}{n}\sum_{i=1}^{n} f\!\left(\lambda_i\!\left(\tfrac{1}{m}MM^*\right)\right) \le \frac{1}{n}\log\mathbb{E}\left[\det\left(\varepsilon I + \frac{1}{m}MM^*\right)\right] + \frac{\sqrt{c_{\mathrm{ls}}}}{\sqrt{\varepsilon\alpha}}\cdot\frac{\beta}{n},\\[1ex]
\frac{1}{n}\sum_{i=1}^{n} f\!\left(\lambda_i\!\left(\tfrac{1}{m}MM^*\right)\right) \ge \frac{1}{n}\log\mathbb{E}\left[\det\left(\varepsilon I + \frac{1}{m}MM^*\right)\right] - \frac{\sqrt{c_{\mathrm{ls}}}}{\sqrt{\varepsilon\alpha}}\cdot\frac{\beta}{n} - \frac{\log\left(1 + \sqrt{\frac{4\pi c_{\mathrm{ls}}}{\varepsilon\alpha}}\, e^{\frac{c_{\mathrm{ls}}}{4\varepsilon\alpha}}\right)}{n},
\end{cases} \tag{82}
$$
with probability exceeding $1 - 2\exp\left(-\beta^2\right)$.

(c) If the measure of the $M_{ij}$'s is heavy-tailed, then we again need to use the sandwich bound between $f_{\varepsilon}(x)$ and $f(x)$. Theorem 1 indicates that
$$
\begin{cases}
\frac{1}{n}\sum_{i=1}^{n} f_{\varepsilon}\!\left(\lambda_i\!\left(\tfrac{1}{m}MM^*\right)\right) \le \frac{1}{n}\log\mathbb{E}\left[\det\left(\varepsilon I + \frac{1}{m}\bar{M}\bar{M}^*\right)\right] + \frac{\tau_c\sigma_c\sqrt{8c(n)\log n}}{\sqrt{\varepsilon\alpha}\,n},\\[1ex]
\frac{1}{n}\sum_{i=1}^{n} f_{\varepsilon}\!\left(\lambda_i\!\left(\tfrac{1}{m}MM^*\right)\right) \ge \frac{1}{n}\log\mathbb{E}\left[\det\left(\frac{2}{e}\varepsilon I + \frac{1}{m}\bar{M}\bar{M}^*\right)\right] - \frac{\tau_c\sigma_c\sqrt{8c(n)\log n}}{\sqrt{\varepsilon\alpha}\,n} - \frac{\log\left(1 + \sqrt{\frac{8\pi\tau_c^2\sigma_c^2}{\varepsilon\alpha}}\, e^{8\pi + \frac{2\tau_c^2\sigma_c^2}{\varepsilon\alpha}}\right)}{n},
\end{cases} \tag{83}
$$

with probability exceeding $1 - \frac{5}{n^{c(n)}}$. The only other difference from the proof of Part (a) is that the entries of $\bar{M}$ have variance $\sigma_c^2$, and hence
$$
\frac{1}{n}\log\mathbb{E}\left[\det\left(\varepsilon I + \frac{1}{m}\bar{M}\bar{M}^*\right)\right]
= 2\log\sigma_c + \frac{1}{n}\log\mathbb{E}\left[\det\left(\frac{\varepsilon}{\sigma_c^2} I + \frac{1}{m}\left(\frac{\bar{M}}{\sigma_c}\right)\left(\frac{\bar{M}}{\sigma_c}\right)^*\right)\right].
$$
Finally, the proof is completed by applying Lemma 2 to $\mathbb{E}\left[\det\left(\varepsilon I + \frac{1}{m}MM^*\right)\right]$.

APPENDIX F
PROOF OF LEMMA 3

Using the results derived in Lemma 2, one can bound
$$
\mathbb{E}\left[\det\left(\varepsilon I + \frac{1}{m}MM^*\right)\right]
= m^{-n}\sum_{i=0}^{n}\binom{n}{i}\frac{m!}{(m-i)!}\varepsilon^{n-i}m^{n-i}
= m!\,m^{-n}\sum_{i=0}^{n}\binom{n}{i}\frac{\varepsilon^{i}m^{i}}{(m-n+i)!}
\le m!\,m^{-n}(n+1)\max_{i:\,0\le i\le n}\binom{n}{i}\frac{\varepsilon^{i}m^{i}}{(m-n+i)!}.
$$
Denote by $i_{\max}$ the index of the largest term:
$$
i_{\max} := \arg\max_{i:\,0\le i\le n}\binom{n}{i}\frac{\varepsilon^{i}m^{i}}{(m-n+i)!}.
$$


Suppose for now that $i_{\max} = \delta n$ for some numerical value $\delta$. Then we can bound
$$
\frac{1}{n}\log\mathbb{E}\left[\det\left(\varepsilon I + \frac{1}{m}MM^*\right)\right]
\le \frac{\log m!}{n} + \frac{\log(n+1)}{n} + \frac{\log\binom{n}{\delta n}}{n} + \delta\log\varepsilon - (1-\delta)\log m - \frac{\log(m-n+\delta n)!}{n}
$$
$$
= \frac{\log\frac{m!}{(m-n+\delta n)!\,(n-\delta n)!}}{n} + \frac{\log(n+1)}{n} + \frac{\log\binom{n}{\delta n}}{n} + \frac{\log(n-\delta n)!}{n} - (1-\delta)\log m + \delta\log\varepsilon
$$
$$
= \frac{\alpha\log\binom{m}{n-\delta n}}{m} + \frac{\log(n+1)}{n} + \frac{\log\binom{n}{\delta n}}{n} + \frac{\log(n-\delta n)!}{n} - (1-\delta)\log m + \delta\log\varepsilon. \tag{84}
$$

It follows immediately from the well-known entropy formula (see [79, Example 11.1.3]) that
$$
H\!\left(\frac{k}{n}\right) - \frac{\log(n+1)}{n} \le \frac{1}{n}\log\binom{n}{k} \le H\!\left(\frac{k}{n}\right), \tag{85}
$$
where $H(x) := x\log\frac{1}{x} + (1-x)\log\frac{1}{1-x}$ denotes the binary entropy function. Also, the famous Stirling approximation asserts that
$$
\sqrt{2\pi}\, n^{n+\frac{1}{2}}e^{-n} \le n! \le e\, n^{n+\frac{1}{2}}e^{-n}
\iff
\frac{2 + \log n}{2n} + \log n - 1 \ge \frac{1}{n}\log n! \ge \frac{\log(2\pi) + \log n}{2n} + \log n - 1. \tag{86}
$$
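A minimal numerical check (not from the paper) of the entropy sandwich (85), with all logarithms natural:

```python
from math import comb, log

def H(x):
    # binary entropy in nats, with the convention H(0) = H(1) = 0
    return 0.0 if x in (0.0, 1.0) else x * log(1 / x) + (1 - x) * log(1 / (1 - x))

n = 50
for k in (0, 5, 17, 25, 50):
    lower = H(k / n) - log(n + 1) / n
    middle = log(comb(n, k)) / n
    upper = H(k / n)
    assert lower <= middle <= upper, (k, lower, middle, upper)
print("the sandwich (85) holds for all tested (n, k) pairs")
```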

Substituting (85) and (86) into (84) leads to
$$
\frac{1}{n}\log\mathbb{E}\left[\det\left(\varepsilon I + \frac{1}{m}MM^*\right)\right]
\le \alpha H\!\left(\frac{(1-\delta)n}{m}\right) + \frac{\log(n+1)}{n} + H(\delta) + \frac{\log e + \frac{1}{2}\log[(1-\delta)n]}{n} + (1-\delta)\log[(1-\delta)n] - (1-\delta)\log e - (1-\delta)\log m + \delta\log\varepsilon
$$
$$
= \alpha H\!\left(\frac{1-\delta}{\alpha}\right) + \frac{\log(n+1) + 1 + \frac{1}{2}\log[(1-\delta)n]}{n} + H(\delta) + (1-\delta)\log\frac{1}{\alpha} + (1-\delta)\log(1-\delta) - (1-\delta) + \delta\log\varepsilon.
$$

Making use of the inequality
$$
\log(n+1) + 1 + \frac{1}{2}\log n \le 1.5\log(en),
$$
one obtains
$$
\frac{1}{n}\log\mathbb{E}\left[\det\left(\varepsilon I + \frac{1}{m}MM^*\right)\right]
\le \alpha H\!\left(\frac{1-\delta}{\alpha}\right) + \frac{1.5\log(en)}{n} + \delta\log\frac{1}{\delta} - (1-\delta)\log(e\alpha) + \delta\log\varepsilon
$$
$$
= \alpha H\!\left(\frac{1}{\alpha}\right) - \log(\alpha e) + \frac{1.5\log(en)}{n} + \delta\log\frac{1}{\delta} + \delta\log(e\alpha) + \delta\log\varepsilon + \alpha\left[H\!\left(\frac{1-\delta}{\alpha}\right) - H\!\left(\frac{1}{\alpha}\right)\right]
$$
$$
\le (\alpha-1)\log\left(\frac{\alpha}{\alpha-1}\right) - 1 + \frac{1.5\log(en)}{n} + \delta\log\frac{1}{\delta} + \delta\log(e\alpha) + \delta\log\varepsilon + H(\delta), \tag{87}
$$

where the last inequality follows from the following identity for the binary entropy function [80]: for any $p, q$ obeying $0\le p, q\le 1$ and $0 < pq < 1$, one has
$$
H(p) - H(pq) = (1-pq)H\!\left(\frac{p-pq}{1-pq}\right) - pH(q)
\;\Longrightarrow\;
H\!\left(\frac{1-\delta}{\alpha}\right) - H\!\left(\frac{1}{\alpha}\right) \le \frac{1}{\alpha}H(1-\delta) = \frac{1}{\alpha}H(\delta).
$$

It remains to estimate the index imax or, equivalently, thevalue δ. If we define

sε (i) :=(

n

i

)εi mi

(m − n + i)!and rε (i) := sε (i + 1)

sε (i),

then one can compute

rε (i)=n!

(i+1)!(n−i−1)!εi+1mi+1

(m−n+i+1)!

n!i!(n−i)!

εi mi

(m−n+i)!

= εm (n − i)

(i +1) (m − n + i + 1),

which is a decreasing function of $i$. Suppose that $n\varepsilon > \max\left\{4,\; 2\left(1 - \frac{1}{\alpha} + \frac{1}{\alpha n}\right)\right\}$. By setting $r_{\varepsilon}(x) = 1$, we can derive a valid positive solution $x_0$ as follows:
$$
x_0 = -\frac{m+n+2+\varepsilon m}{2} + \frac{\sqrt{(m+n+2+\varepsilon m)^2 - 4(m-n+1) + 4\varepsilon mn}}{2}.
$$

Simple algebraic manipulation yields
$$
x_0 < \frac{-(m+n+2+\varepsilon m) + (m+n+2+\varepsilon m) + 2\sqrt{\varepsilon mn}}{2} \le \sqrt{\varepsilon\alpha}\, n,
$$
and
$$
x_0 > \frac{-(m+n+2+\varepsilon m) + \sqrt{(m+n+2+\varepsilon m)^2 + 2\varepsilon mn}}{2}
= \frac{\varepsilon mn}{(m+n+2+\varepsilon m) + \sqrt{(m+n+2+\varepsilon m)^2 + 2\varepsilon mn}}
$$
$$
= \frac{\alpha\varepsilon n}{\alpha + 1 + \frac{2}{n} + \alpha\varepsilon + \sqrt{\left(\alpha + 1 + \frac{2}{n} + \alpha\varepsilon\right)^2 + 2\varepsilon\alpha}}
\ge \frac{\alpha}{2\left(\alpha + 1 + \frac{2}{n} + \alpha\right) + \sqrt{2\alpha}}\,\varepsilon n
> \frac{\alpha}{5(\alpha+1)}\,\varepsilon n.
$$


Therefore, we can conclude that $\delta \in \left[\frac{\alpha}{5(1+\alpha)}\varepsilon,\; \sqrt{\alpha\varepsilon}\right]$. Assume that $\varepsilon < \frac{1}{e^2\alpha}$. Substituting this into (87) then leads to
$$
\frac{1}{n}\log\mathbb{E}\left[\det\left(\varepsilon I + \frac{1}{m}MM^*\right)\right]
\le (\alpha-1)\log\left(\frac{\alpha}{\alpha-1}\right) - 1 + R_{n,\varepsilon},
$$
where $R_{n,\varepsilon}$ denotes the residual term
$$
R_{n,\varepsilon} := \frac{1.5\log(en)}{n} + \sqrt{\alpha\varepsilon}\left(\log\frac{1}{\sqrt{\alpha\varepsilon}} + \log(e\alpha)\right) + H\!\left(\sqrt{\alpha\varepsilon}\right).
$$

In particular, if we further have $\varepsilon < \min\left\{\frac{1}{e^2\alpha^3}, 1\right\}$, then one has $\log(\alpha e) < \log\frac{1}{\sqrt{\alpha\varepsilon}}$ and
$$
H\!\left(\sqrt{\alpha\varepsilon}\right) \le 2\sqrt{\alpha\varepsilon}\log\frac{1}{\sqrt{\alpha\varepsilon}},
$$
which in turn leads to
$$
R_{n,\varepsilon} \le \frac{1.5\log(en)}{n} + 4\sqrt{\alpha\varepsilon}\log\frac{1}{\sqrt{\alpha\varepsilon}}.
$$

On the other hand, the lower bound on $\frac{1}{n}\log\mathbb{E}\left[\det\left(\varepsilon I + \frac{1}{m}MM^*\right)\right]$ can easily be obtained through the following argument:
$$
\mathbb{E}\left[\det\left(\varepsilon I + \frac{1}{m}MM^*\right)\right] \ge \mathbb{E}\left[\det\left(\frac{1}{m}MM^*\right)\right] = \frac{m!}{(m-n)!\,m^{n}},
$$
and hence
$$
\frac{1}{n}\log\mathbb{E}\left[\det\left(\varepsilon I + \frac{1}{m}MM^*\right)\right]
\ge \frac{1}{n}\log\binom{m}{n} + \frac{\log n!}{n} - \log m
$$
$$
\ge \alpha H\!\left(\frac{1}{\alpha}\right) - \frac{\log(m+1)}{n} + \frac{\log(2\pi) + \frac{1}{2}\log n}{n} + \log\left(\frac{n}{m}\right) - 1 \tag{88}
$$
$$
\ge (\alpha-1)\log\left(\frac{\alpha}{\alpha-1}\right) - 1 - \frac{\log\frac{m+1}{2\pi} - \frac{1}{2}\log n}{n}, \tag{89}
$$
where (88) makes use of the bounds (85) and (86). This concludes the proof.

APPENDIX G
PROOF OF THEOREM 3 AND COROLLARY 2

Proof of Theorem 3: Suppose that the singular value decomposition of $H$ is given by $H = U\Sigma V^*$, where $\Sigma$ is a diagonal matrix consisting of all singular values of $H$. Then the MMSE per input component can be expressed as
$$
\frac{\mathrm{NMMSE}(H,\mathrm{SNR})}{p}
= 1 - \frac{1}{p}\mathrm{tr}\!\left(V\Sigma U^*\left(\frac{1}{\mathrm{SNR}} I + U\Sigma^2 U^*\right)^{-1} U\Sigma V^*\right)
= 1 - \frac{1}{p}\mathrm{tr}\!\left(\Sigma^2\left(\frac{1}{\mathrm{SNR}} I + \Sigma^2\right)^{-1}\right)
= \frac{1}{p}\sum_{i=1}^{p}\frac{1}{\mathrm{SNR}^{-1} + \lambda_i\left(H^*H\right)}.
$$

(i) Note that $g_{\varepsilon}(x) := \frac{1}{\varepsilon^2+x^2}$ is not a convex function. In order to apply Proposition 1, we define a convexified variant $\bar{g}_{\varepsilon}(x)$ of $g_{\varepsilon}(x)$. Specifically, we set
$$
\bar{g}_{\varepsilon}(x) :=
\begin{cases}
\frac{1}{\varepsilon^2+x^2}, & \text{if } x > \frac{1}{\sqrt{3}}\varepsilon,\\[0.5ex]
-\frac{3\sqrt{3}}{8\varepsilon^3}\left(x - \frac{\varepsilon}{\sqrt{3}}\right) + \frac{3}{4\varepsilon^2}, & \text{if } x \le \frac{1}{\sqrt{3}}\varepsilon,
\end{cases}
$$
which satisfies
$$
\left\|\bar{g}_{\varepsilon}\right\|_{\mathrm{L}} \le \frac{3\sqrt{3}}{8\varepsilon^3}.
$$
One can easily check that
$$
g_{\frac{3}{2\sqrt{2}}\varepsilon}(x) \le \bar{g}_{\frac{3}{2\sqrt{2}}\varepsilon}(x) \le g_{\varepsilon}(x) \le \bar{g}_{\varepsilon}(x) \le g_{\frac{2\sqrt{2}}{3}\varepsilon}(x).
$$

This implies that we can sandwich the target function as follows:
$$
\frac{1}{p}\sum_{i=1}^{p}\frac{1}{\varepsilon^2 + \lambda_i(H^*H)}
\le \frac{1}{p}\sum_{i=1}^{p}\left(\bar{g}_{\varepsilon}\!\left(\sqrt{\lambda_i(H^*H)}\right) - \mathbb{E}\left[\bar{g}_{\varepsilon}\!\left(\sqrt{\lambda_i(H^*H)}\right)\right]\right)
+ \frac{1}{p}\sum_{i=1}^{p}\mathbb{E}\left[g_{\frac{2\sqrt{2}}{3}\varepsilon}\!\left(\sqrt{\lambda_i(H^*H)}\right)\right], \tag{90}
$$
and
$$
\frac{1}{p}\sum_{i=1}^{p}\frac{1}{\varepsilon^2 + \lambda_i(H^*H)}
\ge \frac{1}{p}\sum_{i=1}^{p}\left(\bar{g}_{\frac{3}{2\sqrt{2}}\varepsilon}\!\left(\sqrt{\lambda_i(H^*H)}\right) - \mathbb{E}\left[\bar{g}_{\frac{3}{2\sqrt{2}}\varepsilon}\!\left(\sqrt{\lambda_i(H^*H)}\right)\right]\right)
+ \frac{1}{p}\sum_{i=1}^{p}\mathbb{E}\left[\bar{g}_{\frac{3}{2\sqrt{2}}\varepsilon}\!\left(\sqrt{\lambda_i(H^*H)}\right)\right]. \tag{91}
$$

Recall that
$$
H^*H = M^*A^*AM, \tag{92}
$$
where the entries of $M$ are independently generated with variance $\frac{1}{n}$. Since $\bar{g}_{\varepsilon}(x)$ is a convex function in $x$, applying Proposition 1 yields the following results.


1) If the $\sqrt{p}M_{ij}$'s are bounded by $D$, then for any $\beta > 8\sqrt{\pi}$, we have
$$
\frac{1}{p}\sum_{i=1}^{p}\frac{1}{\varepsilon^2 + \lambda_i(H^*H)}
\le \frac{3\sqrt{3}D\|A\|}{8\varepsilon^3}\cdot\frac{\beta}{p} + \frac{1}{p}\sum_{i=1}^{p}\mathbb{E}\left[g_{\frac{2\sqrt{2}}{3}\varepsilon}\!\left(\sqrt{\lambda_i(H^*H)}\right)\right]
$$
and
$$
\frac{1}{p}\sum_{i=1}^{p}\frac{1}{\varepsilon^2 + \lambda_i(H^*H)}
\ge -\frac{2\sqrt{2}D\|A\|}{3\sqrt{3}\varepsilon^3}\cdot\frac{\beta}{p} + \frac{1}{p}\sum_{i=1}^{p}\mathbb{E}\left[\bar{g}_{\frac{3}{2\sqrt{2}}\varepsilon}\!\left(\sqrt{\lambda_i(H^*H)}\right)\right]
$$
with probability exceeding $1 - 8\exp\left(-\frac{\beta^2}{16}\right)$.

2) If the measure of the $\sqrt{p}M_{ij}$'s satisfies the LSI with a uniformly bounded constant $c_{\mathrm{ls}}$, then for any $\beta > 0$, one has
$$
\left|\frac{1}{p}\sum_{i=1}^{p}\frac{1}{\varepsilon^2 + \lambda_i(H^*H)} - \mathbb{E}\left[\frac{1}{p}\sum_{i=1}^{p}\frac{1}{\varepsilon^2 + \lambda_i(H^*H)}\right]\right|
\le \frac{3\sqrt{3}\sqrt{c_{\mathrm{ls}}}\,\|A\|}{8\varepsilon^3}\cdot\frac{\beta}{p} \tag{93}
$$
with probability exceeding $1 - 4\exp\left(-\frac{\beta^2}{2}\right)$. Here, we have made use of the fact that $\left\|\bar{g}_{\varepsilon}\right\|_{\mathrm{L}} \le \frac{3\sqrt{3}}{8\varepsilon^3}$.

3) Suppose that the $\sqrt{p}M_{ij}$'s are independently drawn from symmetric heavy-tailed distributions. Proposition 1 suggests that
$$
\frac{1}{p}\sum_{i=1}^{p}\frac{1}{\varepsilon^2 + \lambda_i(H^*H)}
\le \frac{3\sqrt{3}\,\tau_c\sigma_c\|A\|\sqrt{c(n)\log n}}{2\varepsilon^3 p}
+ \frac{1}{p}\sum_{i=1}^{p}\mathbb{E}\left[g_{\frac{2\sqrt{2}}{3}\varepsilon}\!\left(\sqrt{\lambda_i\left(\bar{M}^*A^*A\bar{M}\right)}\right)\right]
$$
and
$$
\frac{1}{p}\sum_{i=1}^{p}\frac{1}{\varepsilon^2 + \lambda_i(H^*H)}
\ge -\frac{8\sqrt{2}\,\tau_c\sigma_c\|A\|\sqrt{c(n)\log n}}{3\sqrt{3}\varepsilon^3 p}
+ \frac{1}{p}\sum_{i=1}^{p}\mathbb{E}\left[\bar{g}_{\frac{3}{2\sqrt{2}}\varepsilon}\!\left(\sqrt{\lambda_i\left(\bar{M}^*A^*A\bar{M}\right)}\right)\right]
$$
with probability exceeding $1 - \frac{10}{n^{c(n)}}$. Here, $\bar{M}$ is a truncated copy of $M$ such that $\bar{M}_{ij} = M_{ij}\,\mathbf{1}_{\{\sqrt{p}|M_{ij}| \le \tau_c\}}$.

This completes the proof.

Proof of Corollary 2: In the moderate-to-high SNR regime (i.e. when all of the $\lambda_i(H^*H)$'s are large), one can apply the following simple bound:
$$
\frac{1}{p}\sum_{i=1}^{p}\left(\frac{1}{\lambda_i(H^*H)} - \frac{\varepsilon^2}{\lambda_i^2(H^*H)}\right)
\le \frac{1}{p}\sum_{i=1}^{p}\frac{1}{\varepsilon^2 + \lambda_i(H^*H)}
\le \frac{1}{p}\sum_{i=1}^{p}\frac{1}{\lambda_i(H^*H)}.
$$
This immediately leads to
$$
\frac{1}{p}\mathrm{tr}\!\left(\left(H^*H\right)^{-1}\right) - \frac{\varepsilon^2}{p}\mathrm{tr}\!\left(\left(H^*H\right)^{-2}\right)
\le \frac{1}{p}\mathrm{tr}\!\left(\left(\varepsilon^2 I_p + H^*H\right)^{-1}\right)
\le \frac{1}{p}\mathrm{tr}\!\left(\left(H^*H\right)^{-1}\right). \tag{94}
$$

When $H_{ij}\sim\mathcal{N}\left(0, \frac{1}{p}\right)$, applying [70, Th. 2.2.8] gives
$$
\mathrm{tr}\!\left(\left(H^*H\right)^{-1}\right) = \frac{p^2}{n-p-1} = \frac{\alpha p}{1-\alpha-\frac{1}{n}}. \tag{95}
$$
Similarly, making use of [70, Th. 2.2.8] leads to
$$
\mathrm{tr}\!\left(\left(M^*M\right)^{-2}\right) = p^2\,\frac{p(n-p-2) + p + p^2}{(n-p)(n-p-1)(n-p-3)}
= \frac{\left(1-\frac{1}{n}\right)\alpha^2 p}{(1-\alpha)\left(1-\alpha-\frac{1}{n}\right)\left(1-\alpha-\frac{3}{n}\right)}. \tag{96}
$$
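A minimal Monte Carlo sketch (not from the paper) of the inverse-Wishart identity behind (95), stated here for real Gaussian entries, for which the mean of $\mathrm{tr}((H^{\mathsf T}H)^{-1})$ is exactly $p^2/(n-p-1)$; the complex-Gaussian case used in Corollary 2 behaves analogously up to an $O(1/n)$ correction. The dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, trials = 30, 10, 5000
acc = 0.0
for _ in range(trials):
    H = rng.standard_normal((n, p)) / np.sqrt(p)       # i.i.d. real N(0, 1/p) entries
    acc += np.trace(np.linalg.inv(H.T @ H))

print(acc / trials, p ** 2 / (n - p - 1))              # both close to 100/19 ≈ 5.26
```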

Substituting (95) and (96) into (94) yields
$$
\frac{\alpha}{1-\alpha} - \frac{1}{\mathrm{SNR}}\cdot\frac{\alpha^2}{\left(1-\alpha-\frac{3}{n}\right)^3}
\le \frac{1}{p}\mathrm{tr}\!\left(\left(\varepsilon^2 I_p + H^*H\right)^{-1}\right)
\le \frac{\alpha}{1-\alpha-\frac{1}{n}}, \tag{97}
$$
thus concluding the proof. □

ACKNOWLEDGMENTS

The authors would like to thank the anonymous reviewers for extensive and constructive comments that greatly improved the paper.

REFERENCES

[1] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge,U.K.: Cambridge Univ. Press, 2011.

[2] E. Telatar, “Capacity of multi-antenna Gaussian channels,” Eur. Trans.Telecommun., vol. 10, no. 6, pp. 585–595, Nov./Dec. 1999.

[3] A. Goldsmith, S. A. Jafar, N. Jindal, and S. Vishwanath, “Capacitylimits of MIMO channels,” IEEE J. Sel. Areas Commun., vol. 21, no. 5,pp. 684–702, Jun. 2003.

[4] T. Kailath, A. H. Sayed, and B. Hassibi, Linear Estimation.Englewood Cliffs, NJ, USA: Prentice-Hall, 2000.

[5] A. Lozano, A. M. Tulino, and S. Verdu, “High-SNR power offset inmultiantenna communication,” IEEE Trans. Inf. Theory, vol. 51, no. 12,pp. 4134–4151, Dec. 2005.

[6] Y. Chen, A. J. Goldsmith, and Y. C. Eldar, submitted to IEEE Transac-tions on Information Theory.

[7] D. N. C. Tse and S. V. Hanly, “Linear multiuser receivers: Effectiveinterference, effective bandwidth and user capacity,” IEEE Trans. Inf.Theory, vol. 45, no. 2, pp. 641–657, Mar. 1999.


[8] C.-N. Chuah, D. N. C. Tse, J. M. Kahn, and R. A. Valenzuela, “Capacityscaling in MIMO wireless systems under correlated fading,” IEEE Trans.Inf. Theory, vol. 48, no. 3, pp. 637–650, Mar. 2002.

[9] P. B. Rapajic and D. Popescu, “Information capacity of a random sig-nature multiple-input multiple-output channel,” IEEE Trans. Commun.,vol. 48, no. 8, pp. 1245–1248, Aug. 2000.

[10] S. V. Hanly and D. N. C. Tse, “Resource pooling and effectivebandwidths in CDMA networks with multiuser receivers and spatialdiversity,” IEEE Trans. Inf. Theory, vol. 47, no. 4, pp. 1328–1351,May 2001.

[11] A. Lozano and A. M. Tulino, “Capacity of multiple-transmit multiple-receive antenna architectures,” IEEE Trans. Inf. Theory, vol. 48, no. 12,pp. 3117–3128, Dec. 2002.

[12] A. M. Tulino, A. Lozano, and S. Verdú, “Impact of antenna correlationon the capacity of multiantenna channels,” IEEE Trans. Inf. Theory,vol. 51, no. 7, pp. 2491–2509, Jul. 2005.

[13] A. Lozano, A. M. Tulino, and S. Verdú, “Multiple-antenna capacityin the low-power regime,” IEEE Trans. Inf. Theory, vol. 49, no. 10,pp. 2527–2544, Oct. 2003.

[14] R. Müller, “A random matrix model of communication via antennaarrays,” IEEE Trans. Inf. Theory, vol. 48, no. 9, pp. 2495–2506,Sep. 2002.

[15] X. Mestre, J. R. Fonollosa, and A. Pages-Zamora, “Capacity of MIMOchannels: Asymptotic evaluation under correlated fading,” IEEE J. Sel.Areas Commun., vol. 21, no. 5, pp. 829–838, Jun. 2003.

[16] A. L. Moustakas, S. H. Simon, and A. M. Sengupta, “MIMO capacitythrough correlated channels in the presence of correlated interferers andnoise: A (not so) large N analysis,” IEEE Trans. Inf. Theory, vol. 49,no. 10, pp. 2545–2561, Oct. 2003.

[17] A. Lozano, “Capacity-approaching rate function for layered multi-antenna architectures,” IEEE Trans. Wireless Commun., vol. 2, no. 4,pp. 616–620, Jul. 2003.

[18] S. Verdú and S. Shamai, “Spectral efficiency of CDMA with randomspreading,” IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 622–640,Mar. 1999.

[19] A. L. Moustakas and S. H. Simon, “On the outage capacity of correlatedmultiple-path MIMO channels,” IEEE Trans. Inf. Theory, vol. 53, no. 11,pp. 3887–3903, Nov. 2007.

[20] F. Dupuy and P. Loubaton, “On the capacity achieving covariance matrixfor frequency selective MIMO channels using the asymptotic approach,”IEEE Trans. Inf. Theory, vol. 57, no. 9, pp. 5737–5753, Sep. 2011.

[21] G. Taricco and E. Riegler, “On the ergodic capacity of correlated Ricianfading MIMO channels with interference,” IEEE Trans. Inf. Theory,vol. 57, no. 7, pp. 4123–4137, Jul. 2011.

[22] T. Tao, Topics in Random Matrix Theory (Graduate Studies in Mathe-matics). Providence, RI, USA: AMS, 2012.

[23] M. L. Mehta, Random Matrices, vol. 142. Amsterdam, The Netherlands:Elsevier, 2004.

[24] B. Hassibi and T. L. Marzetta, “Multiple-antennas and isotropicallyrandom unitary inputs: The received signal density in closed form,” IEEETrans. Inf. Theory, vol. 48, no. 6, pp. 1473–1484, Jun. 2002.

[25] Z. Wang and G. B. Giannakis, “Outage mutual information of space-timeMIMO channels,” IEEE Trans. Inf. Theory, vol. 50, no. 4, pp. 657–662,Apr. 2004.

[26] M. Chiani, M. Z. Win, and A. Zanella, “On the capacity of spatiallycorrelated MIMO Rayleigh-fading channels,” IEEE Trans. Inf. Theory,vol. 49, no. 10, pp. 2363–2371, Oct. 2003.

[27] T. Ratnarajah, R. Vaillancourt, and M. Alvo, “Complex random matricesand Rayleigh channel capacity,” Commun. Inf. Syst., vol. 3, no. 2,pp. 119–138, 2003.

[28] T. Ratnarajah, R. Vaillancourt, and M. Alvo, “Complex random matricesand Rician channel capacity,” Problems Inf. Transmiss., vol. 41, no. 1,pp. 1–22, 2005.

[29] E. P. Wigner, “On the distribution of the roots of certain symmetricmatrices,” Ann. Math., vol. 67, no. 2, pp. 325–327, 1958.

[30] T. Tao, V. Vu, and M. Krishnapur, “Random matrices: Universality ofESDs and the circular law,” Ann. Probab., vol. 38, no. 5, pp. 2023–2065,2010.

[31] T. Tao and V. Vu, “Random matrices: The circular law,” Commun.Contemp. Math., vol. 10, no. 2, pp. 261–307, 2008.

[32] V. A. Marchenko and L. A. Pastur, “Distribution of eigenvalues forsome sets of random matrices,” Matematicheskii Sbornik, vol. 114, no. 4,pp. 507–536, 1967.

[33] H. Shin and J. H. Lee, “Capacity of multiple-antenna fading channels:Spatial fading correlation, double scattering, and keyhole,” IEEE Trans.Inf. Theory, vol. 49, no. 10, pp. 2636–2647, Oct. 2003.

[34] R. Couillet and W. Hachem. (2013). “Analysis of the limiting spectralmeasure of large random matrices of the separable covariance type.”[Online]. Available: http://arxiv.org/abs/1310.8094

[35] W. Hachem, O. Khorunzhiy, P. Loubaton, J. Najim, and L. Pastur,“A new approach for mutual information analysis of large dimensionalmulti-antenna channels,” IEEE Trans. Inf. Theory, vol. 54, no. 9,pp. 3987–4004, Sep. 2008.

[36] G. Taricco, “Asymptotic mutual information statistics of separatelycorrelated Rician fading MIMO channels,” IEEE Trans. Inf. Theory,vol. 54, no. 8, pp. 3490–3504, Aug. 2008.

[37] J. Hoydis, R. Couillet, P. Piantanida, and M. Debbah, “A random matrixapproach to the finite blocklength regime of MIMO fading channels,”in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Jul. 2012, pp. 2181–2185.

[38] J. Hoydis, R. Couillet, and P. Piantanida. (2013). “The second-ordercoding rate of the MIMO Rayleigh block-fading channel.” [Online].Available: http://arxiv.org/abs/1303.3400

[39] J. Hoydis, R. Couillet, and P. Piantanida, “Bounds on the second-ordercoding rate of the MIMO Rayleigh block-fading channel,” in Proc. IEEEInt. Symp. Inf. Theory, Jul. 2013, pp. 1526–1530.

[40] S. Loyka and G. Levin, “Finite-SNR diversity-multiplexing tradeoff viaasymptotic analysis of large MIMO systems,” IEEE Trans. Inf. Theory,vol. 56, no. 10, pp. 4781–4792, Oct. 2010.

[41] P. Kazakopoulos, P. Mertikopoulos, A. L. Moustakas, and G. Caire,“Living at the edge: A large deviations approach to the outage MIMOcapacity,” IEEE Trans. Inf. Theory, vol. 57, no. 4, pp. 1984–2007,Apr. 2011.

[42] Y.-C. Liang, G. Pan, and Z. D. Bai, “Asymptotic performance of MMSEreceivers for large systems using random matrix theory,” IEEE Trans.Inf. Theory, vol. 53, no. 11, pp. 4173–4190, Nov. 2007.

[43] R. Couillet and M. Debbah, “Signal processing in large systems:A new paradigm,” IEEE Signal Process. Mag., vol. 30, no. 1, pp. 24–39,Jan. 2013.

[44] A. Karadimitrakis, A. L. Moustakas, and P. Vivo. (2013).“Outage capacity for the optical MIMO channel.” [Online]. Available:http://arxiv.org/abs/1302.0614

[45] I. M. Johnstone, “On the distribution of the largest eigenvaluein principal components analysis,” Ann. Statist., vol. 29, no. 2,pp. 295–327, 2001.

[46] I. M. Johnstone. (2006). “High dimensional statistical inference and ran-dom matrices.” [Online]. Available: http://arxiv.org/abs/math/0611589

[47] R. Couillet and M. Debbah, Random Matrix Methods for WirelessCommunications. Cambridge, U.K.: Cambridge Univ. Press, 2011.

[48] R. Couillet, M. Debbah, and J. W. Silverstein, “A deterministic equiva-lent for the analysis of correlated mimo multiple access channels,” IEEETrans. Inf. Theory, vol. 57, no. 6, pp. 3493–3514, Jun. 2011.

[49] D. L. Donoho, “High-dimensional data analysis: The curses and bless-ings of dimensionality,” in AMS Math Challenges Lecture. Stanford,CA, USA: Stanford Univ., 2000, pp. 1–32.

[50] R. Vershynin, “Introduction to the non-asymptotic analysis of randommatrices,” in Compressed Sensing, Theory and Applications, Y. C. Eldarand G. Kutyniok, Eds. Cambridge, U.K.: Cambridge Univ. Press, 2012,pp. 210–268.

[51] A. Guionnet and O. Zeitouni, “Concentration of the spectral measure forlarge matrices,” Electron. Commun. Probab., vol. 5, pp. 119–136, 2000.

[52] N. El Karoui, “Concentration of measure and spectra of random matri-ces: Applications to correlation matrices, elliptical distributions andbeyond,” Ann. Appl. Probab., vol. 19, no. 6, pp. 2362–2405, 2009.

[53] Z. D. Bai and J. W. Silverstein, “CLT for linear spectral statistics oflarge-dimensional sample covariance matrices,” Ann. Probab., vol. 32,no. 1A, pp. 553–605, 2004.

[54] P. Massart and J. Picard, Concentration Inequalities and Model Selection(Lecture Notes in Mathematics). New York, NY, USA: Springer-Verlag,2007.

[55] H. Huh, G. Caire, H. C. Papadopoulos, and S. A. Ramprashad,“Achieving ‘massive MIMO’ spectral efficiency with a not-so-largenumber of antennas,” IEEE Trans. Wireless Commun., vol. 11, no. 9,pp. 3226–3239, Sep. 2012.

[56] F. Rusek et al., “Scaling up MIMO: Opportunities and challenges withvery large arrays,” IEEE Signal Process. Mag., vol. 30, no. 1, pp. 40–60,Jan. 2013.

[57] E. Larsson, O. Edfors, F. Tufvesson, and T. Marzetta, “Massive MIMOfor next generation wireless systems,” IEEE Commun. Mag., vol. 52,no. 2, pp. 186–195, Feb. 2013.

[58] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Energy and spectral effi-ciency of very large multiuser MIMO systems,” IEEE Trans. Commun.,vol. 61, no. 4, pp. 1436–1449, Apr. 2013.


[59] H. Huh, A. M. Tulino, and G. Caire, “Network MIMO with linear zero-forcing beamforming: Large system analysis, impact of channel esti-mation, and reduced-complexity scheduling,” IEEE Trans. Inf. Theory,vol. 58, no. 5, pp. 2911–2934, May 2012.

[60] J. Zhang, R. Chen, J. G. Andrews, A. Ghosh, and R. W. Heath,“Networked MIMO with clustered linear precoding,” IEEE Trans. Wire-less Commun., vol. 8, no. 4, pp. 1910–1921, Apr. 2009.

[61] S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequal-ities: A Nonasymptotic Theory of Independence. London, U.K.:Oxford Univ. Press, 2013.

[62] M. Raginsky and I. Sason, “Concentration of measure inequalities ininformation theory, communications and coding,” in Foundations andTrends in Communications and Information Theory, vol. 10. Norwell,MA, USA: NOW Publishers, 2013.

[63] M. Ledoux, “Concentration of measure and logarithmic Sobolevinequalities,” in Séminaire de Probabilités XXXIII. Berlin, Germany:Springer-Verlag, 1999, pp. 120–216.

[64] M. Talagrand, “Concentration of measure and isoperimetric inequalitiesin product spaces,” Pub. Math. Inst. Hautes Etudes Sci., vol. 81, no. 1,pp. 73–205, 1995.

[65] Z. D. Bai, “Convergence rate of expected spectral distributions of largerandom matrices. Part II. Sample covariance matrices,” Ann. Probab.,vol. 21, no. 2, pp. 649–672, 1993.

[66] Z. D. Bai, B. Miao, and J.-F. Yao, “Convergence rates of spectraldistributions of large sample covariance matrices,” SIAM J. Matrix Anal.Appl., vol. 25, no. 1, pp. 105–127, 2003.

[67] F. Götze and A. N. Tikhomirov, “The rate of convergence of spectraof sample covariance matrices,” Theory Probab. Appl., vol. 54, no. 1,pp. 129–140, 2010.

[68] F. Götze and A. Tikhomirov, “The rate of convergence for spectra ofGUE and LUE matrix ensembles,” Central Eur. J. Math., vol. 3, no. 4,pp. 666–704, 2005.

[69] A. Goldsmith, Wireless Communications. New York, NY, USA:Cambridge Univ. Press, 2005.

[70] Y. Fujikoshi, V. V. Ulyanov, and R. Shimizu, Multivariate Statistics:High-Dimensional and Large-Sample Approximations (Probability andStatistics). Hoboken, NJ, USA: Wiley, 2010.

[71] M. Chiani, A. Conti, M. Mazzotti, E. Paolini, and A. Zanella, “Outagecapacity for MIMO-OFDM systems in block fading channels,” inProc. Asilomar Conf. Signals, Syst., Comput. (ASILOMAR), Nov. 2011,pp. 779–783.

[72] Y. Chen, Y. C. Eldar, and A. J. Goldsmith, “Shannon meets Nyquist:Capacity of sampled Gaussian channels,” IEEE Trans. Inf. Theory,vol. 59, no. 8, pp. 4889–4914, Aug. 2013.

[73] A. M. Tulino and S. Verdú, Random Matrix Theory and WirelessCommunications (Foundations and Trends in Communications andInformation Theory). Hanover, MA, USA: Now Publishers, 2004.

[74] S. Li, M. R. McKay, and Y. Chen, “On the distribution of MIMO mutualinformation: An in-depth Painlevé-based characterization,” IEEE Trans.Inf. Theory, vol. 59, no. 9, pp. 5271–5296, Sep. 2013.

[75] Y. Chen and M. R. McKay, “Coulumb fluid, Painlevé transcendents, andthe information theory of MIMO systems,” IEEE Trans. Inf. Theory,vol. 58, no. 7, pp. 4594–4634, Jul. 2012.

[76] D. Passemier, M. R. Mckay, and Y. Chen. (2014). “Asymptotic linearspectral statistics for spiked Hermitian random matrix models.” [Online].Available: http://arxiv.org/abs/1402.6419

[77] E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles:Exact signal reconstruction from highly incomplete frequency informa-tion,” IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.

[78] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, U.K.:Cambridge Univ. Press, 1985.

[79] T. M. Cover and J. A. Thomas, Elements of Information Theory.New York, NY, USA: Wiley, 2012.

[80] S. Verdu, Information Theory, to be published. [Online]. Available:http://blogs.princeton.edu/blogit/

Yuxin Chen (S’09) received the B.S. in Microelectronics with High Distinc-tion from Tsinghua University in 2008, the M.S. in Electrical and ComputerEngineering from the University of Texas at Austin in 2010, and the M.S. inStatistics from Stanford University in 2013. He is currently a Ph.D. candidatein the Department of Electrical Engineering at Stanford University. Hisresearch interests include information theory, compressed sensing, networkscience and high-dimensional statistics.

Andrea J. Goldsmith (S’90–M’93–SM’99–F’05) is the Stephen Harris pro-fessor in the School of Engineering and a professor of Electrical Engineeringat Stanford University. She was previously on the faculty of ElectricalEngineering at Caltech. Her research interests are in information theory andcommunication theory, and their application to wireless communications andrelated fields. She co-founded and serves as Chief Scientist of Accelera, Inc.,and previously co-founded and served as CTO of Quantenna Communications,Inc. She has also held industry positions at Maxim Technologies, MemorylinkCorporation, and AT&T Bell Laboratories. Dr. Goldsmith is a Fellow ofStanford, and she has received several awards for her work, includingthe IEEE Communications Society and Information Theory Society jointpaper award, the IEEE Communications Society Best Tutorial Paper Award,the National Academy of Engineering Gilbreth Lecture Award, the IEEEComSoc Communications Theory Technical Achievement Award, the IEEEComSoc Wireless Communications Technical Achievement Award, the AlfredP. Sloan Fellowship, and the Silicon Valley/San Jose Business Journal’sWomen of Influence Award. She is author of the book Wireless Commu-nications and co-author of the books MIMO Wireless Communications andPrinciples of Cognitive Radio, all published by Cambridge University Press.She received the B.S., M.S. and Ph.D. degrees in Electrical Engineering fromU.C. Berkeley.

Dr. Goldsmith has served on the Steering Committee for the IEEE TRANS-ACTIONS ON WIRELESS COMMUNICATIONS and as editor for the IEEETRANSACTIONS ON INFORMATION THEORY, the Journal on Foundationsand Trends in Communications and Information Theory and in Networks, theIEEE TRANSACTIONS ON COMMUNICATIONS, and the IEEE Wireless Com-munications Magazine. She participates actively in committees and conferenceorganization for the IEEE Information Theory and Communications Societiesand has served on the Board of Governors for both societies. She has alsobeen a Distinguished Lecturer for both societies, served as President of theIEEE Information Theory Society in 2009, founded and chaired the studentcommittee of the IEEE Information Theory society, and chaired the EmergingTechnology Committee of the IEEE Communications Society. At Stanford shereceived the inaugural University Postdoc Mentoring Award, served as Chairof Stanfords Faculty Senate in 2009 and currently serves on its Faculty Senateand on its Budget Group.

Yonina C. Eldar (S’98–M’02–SM’07–F’12) received the B.Sc. degree inphysics and the B.Sc. degree in electrical engineering both from Tel-AvivUniversity (TAU), Tel-Aviv, Israel, in 1995 and 1996, respectively, andthe Ph.D. degree in electrical engineering and computer science from theMassachusetts Institute of Technology (MIT), Cambridge, in 2002.

From January 2002 to July 2002, she was a Postdoctoral Fellow at theDigital Signal Processing Group at MIT. She is currently a Professor inthe Department of Electrical Engineering at the Technion-Israel Institute ofTechnology, Haifa and holds the The Edwards Chair in Engineering. She isalso a Research Affiliate with the Research Laboratory of Electronics at MITand a Visiting Professor at Stanford University, Stanford, CA. Her researchinterests are in the broad areas of statistical signal processing, samplingtheory and compressed sensing, optimization methods, and their applicationsto biology and optics.

Dr. Eldar was in the program for outstanding students at TAU from 1992 to 1996. In 1998, she held the Rosenblith Fellowship for study in electrical engineering at MIT, and in 2000, she held an IBM Research Fellowship. From 2002 to 2005, she was a Horev Fellow of the Leaders in Science and Technology program at the Technion and an Alon Fellow. In 2004, she was awarded the Wolf Foundation Krill Prize for Excellence in Scientific Research, in 2005 the Andre and Bella Meyer Lectureship, in 2007 the Henry Taub Prize for Excellence in Research, in 2008 the Hershel Rich Innovation Award, the Award for Women with Distinguished Contributions, the Muriel & David Jacknow Award for Excellence in Teaching, and the Technion Outstanding Lecture Award, in 2009 the Technion's Award for Excellence in Teaching, in 2010 the Michael Bruno Memorial Award from the Rothschild Foundation, and in 2011 the Weizmann Prize for Exact Sciences. In 2012 she was elected to the Young Israel Academy of Science and to the Israel Committee for Higher Education. In 2013 she received the Technion's Award for Excellence in Teaching, the Hershel Rich Innovation Award, and the IEEE Signal Processing Technical Achievement Award. She received several best paper awards together with her research students and colleagues. She is the Editor in Chief of Foundations and Trends in Signal Processing. In the past, she was a Signal Processing Society Distinguished Lecturer, member of the IEEE Signal Processing Theory and Methods and Bio Imaging Signal Processing technical committees, and served as an associate editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING, the EURASIP Journal of Signal Processing, the SIAM Journal on Matrix Analysis and Applications, and the SIAM Journal on Imaging Sciences.