
arXiv:0908.0744v2 [cs.IT] 5 Nov 2009
TO APPEAR ON IEEE TRANS. INFORMATION THEORY

Performance Analysis for Sparse Support Recovery

Gongguo Tang, Student Member, IEEE, Arye Nehorai, Fellow, IEEE

Abstract—The performance of estimating the common support for jointly sparse signals based on their projections onto a lower-dimensional space is analyzed. Support recovery is formulated as a multiple-hypothesis testing problem. Both upper and lower bounds on the probability of error are derived for general measurement matrices, by using the Chernoff bound and Fano's inequality, respectively. The upper bound shows that the performance is determined by a quantity measuring the measurement matrix incoherence, while the lower bound reveals the importance of the total measurement gain. The lower bound is applied to derive the minimal number of samples needed for accurate direction-of-arrival (DOA) estimation for a sparse representation based algorithm. When applied to Gaussian measurement ensembles, these bounds give necessary and sufficient conditions for a vanishing probability of error for majority realizations of the measurement matrix. Our results offer surprising insights into sparse signal recovery. For example, as far as support recovery is concerned, the well-known bound in Compressive Sensing with the Gaussian measurement matrix is generally not sufficient unless the noise level is low. Our study provides an alternative performance measure, one that is natural and important in practice, for signal recovery in Compressive Sensing and other application areas exploiting signal sparsity.

Index Terms—Chernoff bound, Compressive Sensing, Fano's inequality, jointly sparse signals, multiple hypothesis testing, probability of error, support recovery

I. INTRODUCTION

SUPPORT recovery for jointly sparse signals concerns accurately estimating the non-zero component locations shared by a set of sparse signals based on a limited number of noisy linear observations. More specifically, suppose that x(t) ∈ F^N, t = 1, 2, . . . , T, F = R or C, is a sequence of jointly sparse signals (possibly under a sparsity-inducing basis Φ instead of the canonical domain) with a common support S, which is the index set indicating the non-vanishing signal coordinates. This model is the same as the joint sparsity model 2 (JSM-2) in [1]. The observation model is linear:

y(t) = Ax(t) + w(t), t = 1, 2, . . . , T. (1)

In (1), A ∈ F^{M×N} is the measurement matrix, y(t) ∈ F^M the noisy data vector, and w(t) ∈ F^M an additive noise. In most cases, the sparsity level K := |S| and the number of observations M are far less than N, the dimension of the ambient space. This problem arises naturally in several signal processing areas such as Compressive Sensing [2]–[6], source localization [7]–[10], sparse approximation and signal denoising [11].

This work was supported by the Department of Defense under the Air Force Office of Scientific Research MURI Grant FA9550-05-1-0443, and ONR Grant N000140810849.

G. Tang and A. Nehorai are with the Department of Electrical and Systems Engineering, Washington University in St. Louis, St. Louis, MO 63130 USA.

Compressive Sensing [2]–[4], a recently developed field exploiting the sparsity property of most natural signals, shows great promise to reduce the signal sampling rate. In the classical setting of Compressive Sensing, only one snapshot is considered; i.e., T = 1 in (1). The goal is to recover a long vector x := x(1) with a small fraction of non-zero coordinates from the much shorter observation vector y := y(1). Since most natural signals are compressible under some basis and are well approximated by their K-sparse representations [12], this scheme, if properly justified, will reduce the necessary sampling rate beyond the limit set by Nyquist and Shannon [5], [6]. Surprisingly, for exact K-sparse signals, if M = O(K log(N/K)) ≪ N and the measurement matrix is generated randomly from, for example, a Gaussian distribution, we can recover x exactly in the noise-free setting by solving a linear programming task. Besides, various methods have been designed for the noisy case [13]–[17]. Along with these algorithms, rigorous theoretic analysis is provided to guarantee their effectiveness in terms of, for example, various l_p-norms of the estimation error for x [13]–[17]. However, these results offer no guarantee that we can recover the support of a sparse signal correctly.

The accurate recovery of signal support is crucial to Compressive Sensing both in theory and in practice. Since for signal recovery it is necessary to have K ≤ M, signal component values can be computed by solving a least squares problem once the support is obtained. Therefore, support recovery is a stronger theoretic criterion than various l_p-norms. In practice, the success of Compressive Sensing in a variety of applications relies on its ability for correct support recovery because the non-zero component indices usually have significant physical meanings. The support of temporally or spatially sparse signals reveals the timing or location of important events such as anomalies. The indices of non-zero coordinates in the Fourier domain indicate the harmonics existing in a signal [18], which is critical for tasks such as spectrum sensing for cognitive radios [19]. In compressed DNA microarrays for bio-sensing, the existence of certain target agents in the tested solution is reflected by the locations of non-vanishing coordinates, while the magnitudes are determined by their concentrations [20]–[23]. For compressive radar imaging, the sparsity constraints are usually imposed on the discretized time–frequency domain. The distance and velocity of an object have a direct correspondence to its coordinates in the time–frequency domain. The magnitude determined by the coefficients of reflection is of less physical significance [24]–[26]. In sparse linear regression [27], the recovered parameter support corresponds to the few factors that explain the data. In all these applications, the support is physically more significant than the component values.


Our study of sparse support recovery is also motivated by the recent reformulation of the source localization problem as one of sparse spectrum estimation. In [7], the authors transform the process of source localization using sensor arrays into the task of estimating the spectrum of a sparse signal by discretizing the parameter manifold. This method exhibits super-resolution in the estimation of direction of arrival (DOA) compared with traditional techniques such as beamforming [28], Capon [29], and MUSIC [30], [31]. Since the basic model employed in [7] applies to several other important problems in signal processing (see [32] and references therein), the principle is readily applicable to those cases. This idea is later generalized and extended to other source localization settings in [8]–[10]. For source localization, the support of the sparse signal reveals the DOAs of the sources. Therefore, the recovery algorithm's ability of exact support recovery is key to the effectiveness of the method. We also note that usually multiple temporal snapshots are collected, which results in a jointly sparse signal set as in (1). In addition, since M is the number of sensors while T is the number of temporal samples, it is far more expensive to increase M than T. The same comments apply to several other examples in the Compressive Sensing applications discussed in the previous paragraph, especially the compressed DNA microarrays, spectrum sensing for cognitive radios, and Compressive Sensing radar imaging.

The signal recovery problem with a joint sparsity constraint [33]–[36], also termed the multiple measurement vector (MMV) problem [37]–[41], has been considered in a line of previous works. Several algorithms, among them Simultaneous Orthogonal Matching Pursuit (SOMP) [34], [37], [40], convex relaxation [41], ℓ1-minimization [38], [39], and M-FOCUSS [37], are proposed and analyzed, either numerically or theoretically. These algorithms are multiple-dimension extensions of their one-dimension counterparts. Most performance measures of the algorithms are concerned with bounds on various norms of the difference between the true signals and their estimates, or their closely related variants. The performance bounds usually involve the mutual coherence between the measurement matrix A and the basis matrix Φ under which the measured signals x(t) have a jointly sparse representation. However, with joint sparsity constraints, a natural measure of performance would be the model (1)'s potential for correctly identifying the true common support, and hence the algorithm's ability to achieve this potential. As part of their research, J. Chen and X. Huo derived, in a noiseless setting, sufficient conditions on the uniqueness of solutions to (1) under ℓ0 and ℓ1 minimization. In [37], S. Cotter et al. numerically compared the probabilities of correctly identifying the common support by basic matching pursuit, orthogonal matching pursuit, FOCUSS, and regularized FOCUSS in the multiple-measurement setting with a range of SNRs and different numbers of snapshots.

The availability of multiple temporal samples offers several advantages over the single-sample case. As suggested by the upper bound (26) on the probability of error, increasing the number of temporal samples drives the probability of error to zero exponentially fast as long as a certain condition on the incoherence property of the measurement matrix is satisfied.

In [42], the probability of error is driven to zero by scaling the SNR according to the signal dimension, which is not very natural compared with increasing the number of samples, however. Our results also show that under some conditions increasing the number of temporal samples is usually equivalent to increasing the number of observations for a single snapshot. The latter is generally much more expensive in practice. In addition, when there is considerable noise and the columns of the measurement matrix are normalized to one, it is necessary to have multiple temporal samples for accurate support recovery, as discussed in Section IV and Section V-B.

Our work has several major differences compared to the related works [43] and [42], which also analyze performance bounds on the probability of error for support recovery using information theoretic tools. The first difference is in the way the problem is modeled: in [42], [43], the sparse signal is deterministic with a known smallest absolute value of the non-zero components, while we consider a random signal model. This leads to the second difference: we define the probability of error over the signal and noise distributions with the measurement matrix fixed; in [42], [43], the probability of error is taken over the noise, the Gaussian measurement matrix, and the signal support. Most of the conclusions in this paper apply to general measurement matrices, and we only restrict ourselves to the Gaussian measurement matrix in Section V. Therefore, although we use a similar set of theoretical tools, the exact details of applying them are quite different. In addition, we consider a multiple measurement model while only one temporal sample is available in [42], [43]. In particular, to get a vanishing probability of error, Aeron et al. [42] require scaling the SNR according to the signal dimension, which has a similar effect to having multiple temporal measurements in our paper. Although the first two differences make it difficult to compare corresponding results in these two papers, we will make some heuristic comments in Section V.

The contribution of our work is threefold. First, we introduce a hypothesis-testing framework to study the performance of multiple support recovery. We employ well-known tools in statistics and information theory such as the Chernoff bound and Fano's inequality to derive both upper and lower bounds on the probability of error. The upper bound we derive is for the optimal decision rule, in contrast to performance analysis for specific sub-optimal reconstruction algorithms [13]–[17]. Hence, the bound can be viewed as a measure of the measurement system's ability to correctly identify the true support. Our bounds isolate important quantities that are crucial for system performance. Since our analysis is based on measurement matrices with as few assumptions as possible, the results can be used as guidance in system design. Second, we apply these performance bounds to other more specific situations and derive necessary and sufficient conditions in terms of the system parameters to guarantee a vanishing probability of error. In particular, we study necessary conditions for accurate source localization by the mechanism proposed in [7].


By restricting our attention to Gaussian measurement matrices, we derive a result parallel to those for classical Compressive Sensing [2], [3], namely, the number of measurements that are sufficient for signal reconstruction. Even if we adopt the probability of error as the performance criterion, we get the same bound on M as in [2], [3]. However, our result suggests that generally it is impossible to obtain the true support accurately with only one snapshot when there is considerable noise. We also obtain a necessary condition showing that the log(N/K) term cannot be dropped in Compressive Sensing. Last but not least, in the course of studying the performance bounds we explore the eigenvalue structure of a fundamental matrix in support recovery hypothesis testing, for both general measurement matrices and the Gaussian measurement ensemble. These results are of independent interest.

The paper is organized as follows. In Section II, we introduce the mathematical model and briefly review the fundamental ideas in hypothesis testing. Section III is devoted to the derivation of upper bounds on the probability of error for general measurement matrices. We first derive an upper bound on the probability of error for the binary support recovery problem by employing the well-known Chernoff bound in detection theory [44] and extend it to multiple support recovery. We also study the effect of noise on system performance. In Section IV, an information theoretic lower bound is given by using Fano's inequality, and a necessary condition is shown for the DOA problem considered in [7]. We focus on the Gaussian ensemble in Section V. Necessary and sufficient conditions on system parameters for accurate support recovery are given and their implications discussed. The paper is concluded in Section VI.

II. NOTATIONS, MODELS, AND PRELIMINARIES

A. Notations

We first introduce some notations used throughout this paper. Suppose x ∈ F^N is a column vector. We denote by S = supp(x) ⊆ {1, . . . , N} the support of x, which is defined as the set of indices corresponding to the non-zero components of x. For a matrix X, S = supp(X) denotes the index set of non-zero rows of X. Here the underlying field F can be assumed to be R or C. We consider both real and complex cases simultaneously. For this purpose, we denote a constant κ = 1/2 or 1 for the real or complex case, respectively.

Suppose S is an index set. We denote by |S| the number of elements in S. For any column vector x ∈ F^N, x_S ∈ F^{|S|} is the vector formed by the components of x indicated by the index set S; for any matrix B, B^S denotes the submatrix formed by picking the rows of B corresponding to indices in S, while B_S is the submatrix with columns from B indicated by S. If I and J are two index sets, then B^I_J = (B^I)_J is the submatrix of B with rows indicated by I and columns indicated by J.

The transpose of a vector or matrix is denoted by ′ and the conjugate transpose by †. A ⊗ B represents the Kronecker product of two matrices. For a vector v, diag(v) is the diagonal matrix with the elements of v on the diagonal. The identity matrix of dimension M is I_M. The trace of matrix A is given by tr(A), the determinant by |A|, and the rank by rank(A). Though the notation for the determinant is inconsistent with that for the cardinality of an index set, the exact meaning can always be understood from the context.

Bold symbols are reserved for random vectors and matrices. We use P to denote the probability of an event and E the expectation. The underlying probability space can be inferred from the context. The Gaussian distribution for a random vector in field F with mean µ and covariance matrix Σ is represented by FN(µ, Σ). The matrix variate Gaussian distribution [45] for Y ∈ F^{M×T} with mean Θ ∈ F^{M×T} and covariance matrix Σ ⊗ Ψ, where Σ ∈ F^{M×M} and Ψ ∈ F^{T×T}, is denoted by FN_{M,T}(Θ, Σ ⊗ Ψ).

Suppose {f_n}_{n=1}^∞ and {g_n}_{n=1}^∞ are two positive sequences. f_n = o(g_n) means that lim_{n→∞} f_n/g_n = 0; an alternative notation in this case is g_n ≫ f_n. We use f_n = O(g_n) to denote that there exist an N ∈ ℕ and C > 0 independent of N such that f_n ≤ C g_n for n ≥ N. Similarly, f_n = Ω(g_n) means f_n ≥ C g_n for n ≥ N. These simple but expedient notations introduced by G. H. Hardy greatly simplify derivations [46].

B. Models

Next, we introduce our mathematical model. Suppose x(t) ∈ F^N, t = 1, . . . , T, are jointly sparse signals with common support; that is, only a few components of x(t) are non-zero and the indices corresponding to these non-zero components are the same for all t = 1, . . . , T. The common support S = supp(x(t)) has known size K = |S|. We assume that the vectors x_S(t), t = 1, . . . , T, formed by the non-zero components of x(t) follow i.i.d. FN(0, I_K). The measurement model is as follows:

y (t) = Ax (t) + w (t) , t = 1, 2, . . . , T, (2)

where A is the measurement matrix and y(t) ∈ F^M the measurements. The additive noise w(t) ∈ F^M is assumed to follow i.i.d. FN(0, σ²I_M). Note that assuming unit variance for the signals loses no generality since only the ratio of signal variance to noise variance appears in all subsequent analyses. In this sense, we view 1/σ² as the signal-to-noise ratio (SNR).

Let X = [x(1) x(2) · · · x(T)] and let Y, W be defined in a similar manner. Then we write the model in the more compact matrix form:

Y = AX + W . (3)

We start our analysis with a general measurement matrix A. For an arbitrary measurement matrix A ∈ F^{M×N}, if every M × M submatrix of A is non-singular, we call A a non-degenerate measurement matrix. In this case, the corresponding linear system Ax = b is said to have the Unique Representation Property (URP), the implication of which is discussed in [13]. While most of our results apply to general non-degenerate measurement matrices, we need to impose more structure on the measurement matrices in order to obtain more profound results. In particular, we will consider a Gaussian measurement matrix A whose elements A_{mn} are generated from i.i.d. FN(0, 1). However, since our performance analysis is carried out by conditioning on a particular realization of A, we still use non-bold A except in Section V. The role played by the variance of A_{mn} is indistinguishable from that of the signal variance and hence can be combined into 1/σ², the SNR, by the note in the previous paragraph.
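As a concrete illustration, the following minimal Python sketch (not part of the paper; all parameter values are hypothetical) simulates the measurement model (3) in the real case: a Gaussian measurement matrix A, T jointly sparse snapshots sharing a common support S with i.i.d. unit-variance non-zero entries, and additive noise of variance σ².

    import numpy as np

    rng = np.random.default_rng(0)
    N, M, K, T = 64, 16, 3, 10      # ambient dimension, measurements, sparsity, snapshots
    sigma2 = 0.1                    # noise variance; 1/sigma2 plays the role of the SNR

    S = np.sort(rng.choice(N, size=K, replace=False))    # common support
    A = rng.standard_normal((M, N))                       # i.i.d. FN(0, 1) measurement matrix

    X = np.zeros((N, T))
    X[S, :] = rng.standard_normal((K, T))                 # x_S(t) i.i.d. FN(0, I_K)
    W = np.sqrt(sigma2) * rng.standard_normal((M, T))     # additive noise
    Y = A @ X + W                                         # model (3): Y = AX + W

    print("true support:", S, "| Y shape:", Y.shape)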


We now consider two hypothesis-testing problems. The first one is a binary support recovery problem:

H0 : supp (X) = S0

H1 : supp (X) = S1. (4)

The results we obtain for binary support recovery (4) offer insight into our second problem: the multiple support recovery. In the multiple support recovery problem we choose one among \binom{N}{K} distinct candidate supports of X, which is a multiple-hypothesis testing problem:

H0 : supp(X) = S0
H1 : supp(X) = S1
        ⋮
H_{L−1} : supp(X) = S_{L−1}. (5)

C. Preliminaries for Hypothesis Testing

We now briefly introduce the fundamentals of hypothesis testing. The following discussion is based mainly on [44]. In a simple binary hypothesis test, the goal is to determine which of two candidate distributions is the true one that generates the data matrix (or vector) Y:

H0 : Y ∼ p(Y|H0)
H1 : Y ∼ p(Y|H1). (6)

There are two types of errors when one makes a choice based on the observed data Y. A false alarm corresponds to choosing H1 when H0 is true, while a miss happens by choosing H0 when H1 is true. The probabilities of these two types of errors are called the probability of a false alarm and the probability of a miss, which are denoted by

PF = P(Choose H1 | H0), (7)
PM = P(Choose H0 | H1), (8)

respectively. Depending on whether one knows the prior probabilities P(H0) and P(H1) and assigns losses to errors, different criteria can be employed to derive the optimal decision rule. In this paper we adopt the probability of error with equal prior probabilities of H0 and H1 as the decision criterion; that is, we try to find the optimal decision rule by minimizing

Perr = PF P(H0) + PM P(H1) = (1/2) PF + (1/2) PM. (9)

The optimal decision rule is then given by the likelihood ratio test:

ℓ(Y) = log [ p(Y|H1) / p(Y|H0) ]  ≷_{H0}^{H1}  0, (10)

where log(·) is the natural logarithm function.

The probability of error associated with the optimal decision rule, namely, the likelihood ratio test (10), is a measure of the best performance a system can achieve. In many cases of interest, the simple binary hypothesis testing problem (6) is derived from a signal-generation system. For example, in a digital communication system, hypotheses H0 and H1 correspond to the transmitter sending digit 0 and 1, respectively, and the distributions of the observed data under the hypotheses are determined by the modulation method of the system.

Therefore, the minimal probability of error achieved by the likelihood ratio test is a measure of the performance of the modulation method. For the problem addressed in this paper, the minimal probability of error reflects the measurement matrix's ability to distinguish different signal supports.

The Chernoff bound [44] is a well-known tight upper bound on the probability of error. In many cases, the optimum test can be derived and implemented efficiently, but an exact performance calculation is impossible. Even if such an expression can be derived, it is often too complicated to be of practical use. For this reason, a simple bound turns out to be more useful in many problems of practical importance. The Chernoff bound, based on the moment generating function of the test statistic ℓ(Y) in (10), provides an easy way to compute such a bound.

Define µ(s) as the logarithm of the moment generating function of ℓ(Y):

µ(s) := log ∫ e^{sℓ(Y)} p(Y|H0) dY = log ∫ [p(Y|H1)]^s [p(Y|H0)]^{1−s} dY. (11)

Then the Chernoff bound states that

PF ≤ exp[µ(sm)] ≤ exp[µ(s)], (12)
PM ≤ exp[µ(sm)] ≤ exp[µ(s)], (13)

and

Perr ≤ (1/2) exp[µ(sm)] ≤ (1/2) exp[µ(s)], (14)

where 0 ≤ s ≤ 1 and sm = argmin_{0≤s≤1} µ(s). Note that a refined argument gives the constant 1/2 in (14) instead of 1 as obtained by direct application of (12) and (13) [44]. We use these bounds to study the performance of the support recovery problem.

We next extend the key elements of binary hypothesis testing to multiple-hypothesis testing. The goal in a simple multiple-hypothesis testing problem is to make a choice among L distributions based on the observations:

H0 : Y ∼ p(Y|H0)
H1 : Y ∼ p(Y|H1)
        ⋮
H_{L−1} : Y ∼ p(Y|H_{L−1}). (15)

Using the total probability of error as the decision criterion and assuming equal prior probabilities for all hypotheses, we obtain the optimal decision rule given by

H* = argmax_{0≤i≤L−1} p(Y|H_i). (16)

Application of the union bound and the Chernoff bound (14) shows that the total probability of error is bounded as follows:

Perr = Σ_{i=0}^{L−1} P(H* ≠ H_i | H_i) P(H_i)
     ≤ (1/(2L)) Σ_{i=0}^{L−1} Σ_{j=0, j≠i}^{L−1} exp[µ(s; H_i, H_j)], 0 ≤ s ≤ 1, (17)


where exp[µ(s; H_i, H_j)] is the moment-generating function evaluated for the binary hypothesis testing problem of H_i versus H_j. Hence, we obtain an upper bound for multiple-hypothesis testing from that for binary hypothesis testing.

III. UPPER BOUND ON PROBABILITY OF ERROR FOR NON-DEGENERATE MEASUREMENT MATRICES

In this section, we apply the general theory for hypothesis testing, in particular the Chernoff bound on the probability of error, to the support recovery problems (4) and (5). We first study binary support recovery, which lays the foundation for the general support recovery problem.

A. Binary Support Recovery

Under model (3) and the assumptions pertaining to it, the observations Y follow a matrix variate Gaussian distribution [45] when the true support is S:

Y|S ∼ FN_{M,T}(0, Σ_S ⊗ I_T) (18)

with the probability density function (pdf) given by

p(Y|S) = (π/κ)^{−κMT} |Σ_S|^{−κT} exp[ −κ tr(Y† Σ_S^{−1} Y) ], (19)

where Σ_S = A_S A_S† + σ²I_M is the common covariance matrix for each column of Y. The binary support recovery problem (4) is equivalent to a linear Gaussian binary hypothesis testing problem:

H0 : Y ∼ FN_{M,T}(0, Σ_{S0} ⊗ I_T)
H1 : Y ∼ FN_{M,T}(0, Σ_{S1} ⊗ I_T). (20)

From now on, for notational simplicity we will denote Σ_{S_i} by Σ_i. The optimal decision rule with minimal probability of error given by the likelihood ratio test ℓ(Y) (10) reduces to

−κ tr[ Y†(Σ_1^{−1} − Σ_0^{−1})Y ] − κT log( |Σ_1|/|Σ_0| )  ≷_{H0}^{H1}  0. (21)

To analyze the performance of the likelihood ratio test (21), we first compute the log-moment-generating function of ℓ(Y) according to (11):

µ(s) = log ∫ [p(Y|H1)]^s [p(Y|H0)]^{1−s} dY (22)
     = log [ (π/κ)^{−κMT} |Σ_1|^{−κsT} |Σ_0|^{−κ(1−s)T} ∫ exp{ −κ tr[ Y†( sΣ_1^{−1} + (1 − s)Σ_0^{−1} )Y ] } dY ]
     = log [ |sΣ_1^{−1} + (1 − s)Σ_0^{−1}|^{−κT} |Σ_1|^{−κsT} |Σ_0|^{−κ(1−s)T} ]
     = −κT log | sH^{1−s} + (1 − s)H^{−s} |, 0 ≤ s ≤ 1, (23)

where H = Σ_0^{1/2} Σ_1^{−1} Σ_0^{1/2}. The computation of the exact minimizer sm = argmin_{0≤s≤1} µ(s) is non-trivial and will lead to an expression of µ(sm) too complicated to handle. When |S0| = |S1| and the columns of A are not highly correlated, for example in the case of A with i.i.d. elements, sm ≈ 1/2. We then take s = 1/2 in the Chernoff bounds (12), (13), and (14). Whereas the bounds obtained in this way may not be the absolute best ones, they are still valid.

As positive definite Hermitian matrices, H and H^{−1} can be simultaneously diagonalized by a unitary transformation. Suppose that the eigenvalues of H are λ_1 ≥ · · · ≥ λ_{k_0} > 1 = · · · = 1 > σ_1 ≥ · · · ≥ σ_{k_1} and D = diag[λ_1, . . . , λ_{k_0}, 1, . . . , 1, σ_1, . . . , σ_{k_1}]. Then it is easy to show that

µ(1/2) = −κT log | (D^{1/2} + D^{−1/2}) / 2 |
       = −κT [ Σ_{j=1}^{k_0} log( (√λ_j + 1/√λ_j) / 2 ) + Σ_{j=1}^{k_1} log( (√σ_j + 1/√σ_j) / 2 ) ]. (24)

Therefore, it is necessary to count the numbers of eigenvalues of H that are greater than 1, equal to 1, and less than 1, i.e., the values of k_0 and k_1, for a general non-degenerate measurement matrix A. We have the following proposition on the eigenvalue structure of H:

Proposition 1: For any non-degenerate measurement matrix A, let H = Σ_0^{1/2} Σ_1^{−1} Σ_0^{1/2}, k_i = |S0 ∩ S1|, k_0 = |S0\S1| = |S0| − k_i, k_1 = |S1\S0| = |S1| − k_i, and assume M > k_0 + k_1; then k_0 eigenvalues of the matrix H are greater than 1, k_1 are less than 1, and M − (k_0 + k_1) are equal to 1.

Proof: See Appendix A.
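Proposition 1 is easy to check numerically. The sketch below (a hypothetical example, not from the paper; the helper psd_sqrt is ours) builds Σ_i = A_{S_i} A_{S_i}† + σ²I_M for two overlapping supports, forms H, and counts how many eigenvalues fall above, at, and below one.

    import numpy as np

    def psd_sqrt(B):
        # symmetric square root of a positive definite matrix
        w, V = np.linalg.eigh(B)
        return (V * np.sqrt(w)) @ V.T

    rng = np.random.default_rng(1)
    N, M, sigma2 = 30, 12, 0.5
    S0, S1 = [0, 1, 2, 3], [2, 3, 4, 5, 6]            # overlapping supports
    k_i = len(set(S0) & set(S1))
    k0, k1 = len(S0) - k_i, len(S1) - k_i             # here k0 = 2, k1 = 3, and M > k0 + k1

    A = rng.standard_normal((M, N))                   # non-degenerate with probability one
    cov = lambda S: A[:, S] @ A[:, S].T + sigma2 * np.eye(M)
    R = psd_sqrt(cov(S0))
    H = R @ np.linalg.inv(cov(S1)) @ R

    eig, tol = np.linalg.eigvalsh(H), 1e-6
    print("eigenvalues > 1:", np.sum(eig > 1 + tol), "(expected", k0, ")")
    print("eigenvalues = 1:", np.sum(np.abs(eig - 1) <= tol), "(expected", M - k0 - k1, ")")
    print("eigenvalues < 1:", np.sum(eig < 1 - tol), "(expected", k1, ")")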

For binary support recovery (4) with |S0| = |S1| = K, we have k_0 = k_1 := k_d. The subscripts i and d in k_i and k_d are short for "intersection" and "difference", respectively. Employing the Chernoff bound (14) and Proposition 1, we have

Proposition 2: If M ≥ 2k_d, the probability of error for the binary support recovery problem (4) is bounded by

Perr ≤ (1/2) [ λ_{S0,S1} λ_{S1,S0} / 16 ]^{−κ k_d T / 2}, (25)

where λ_{S_i,S_j} is the geometric mean of the eigenvalues of H = Σ_i^{1/2} Σ_j^{−1} Σ_i^{1/2} that are greater than one.

Proof: According to (14) and (24), we have

Perr ≤ (1/2) exp[µ(1/2)]
     ≤ (1/2) [ Π_{j=1}^{k_d} (√λ_j / 2) · Π_{j=1}^{k_d} ( 1/(2√σ_j) ) ]^{−κT}
     = (1/2) [ ( Π_{j=1}^{k_d} λ_j )^{1/k_d} ( Π_{j=1}^{k_d} 1/σ_j )^{1/k_d} / 16 ]^{−κ k_d T / 2}.


Define λ_{S_i,S_j} as the geometric mean of the eigenvalues of H = Σ_i^{1/2} Σ_j^{−1} Σ_i^{1/2} that are greater than one. Then obviously we have λ_{S0,S1} = (Π_{j=1}^{k_d} λ_j)^{1/k_d}. Since H^{−1} and Σ_1^{1/2} Σ_0^{−1} Σ_1^{1/2} have the same set of eigenvalues, 1/σ_j, j = 1, . . . , k_d, are the eigenvalues of Σ_1^{1/2} Σ_0^{−1} Σ_1^{1/2} that are greater than 1. We conclude that λ_{S1,S0} = (Π_{j=1}^{k_d} 1/σ_j)^{1/k_d}.

Note that λ_{S0,S1} and λ_{S1,S0} completely determine the measurement system (3)'s performance in differentiating two different signal supports. Their product must be larger than the constant 16 for a vanishing bound when more temporal samples are taken. Once the threshold 16 is exceeded, taking more samples will drive the probability of error to 0 exponentially fast. From numerical simulations and our results on the Gaussian measurement matrix, λ_{S_i,S_j} does not vary much when S_i, S_j and k_d change, as long as the elements in the measurement matrix A are highly uncorrelated.¹ Therefore, quite appealing to intuition, the larger the size k_d of the difference set between the two candidate supports, the smaller the probability of error.

¹ Unfortunately, this is not the case when the columns of A are samples from a uniform linear sensor array manifold.
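A rough numerical illustration of Proposition 2 (assuming the real case, κ = 1/2; the supports and parameters below are hypothetical, and the helper lam_geo is ours) computes the geometric means λ_{S0,S1} and λ_{S1,S0} and evaluates the Chernoff upper bound (25).

    import numpy as np

    def lam_geo(A, Si, Sj, sigma2, tol=1e-6):
        # geometric mean of the eigenvalues of Sigma_i^{1/2} Sigma_j^{-1} Sigma_i^{1/2} above one
        cov = lambda S: A[:, list(S)] @ A[:, list(S)].T + sigma2 * np.eye(A.shape[0])
        w, V = np.linalg.eigh(cov(Si))
        R = (V * np.sqrt(w)) @ V.T
        eig = np.linalg.eigvalsh(R @ np.linalg.inv(cov(Sj)) @ R)
        big = eig[eig > 1 + tol]
        return np.exp(np.mean(np.log(big)))

    rng = np.random.default_rng(2)
    N, M, K, T, sigma2, kappa = 40, 16, 4, 8, 0.2, 0.5
    S0, S1 = [0, 1, 2, 3], [2, 3, 4, 5]               # |S0 \ S1| = kd = 2
    kd = len(set(S0) - set(S1))

    A = rng.standard_normal((M, N))
    lam01, lam10 = lam_geo(A, S0, S1, sigma2), lam_geo(A, S1, S0, sigma2)
    bound = 0.5 * (lam01 * lam10 / 16.0) ** (-kappa * kd * T / 2)   # bound (25)
    print("lambda_{S0,S1} =", lam01, " lambda_{S1,S0} =", lam10, " bound (25) =", bound)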

B. Multiple Support Recovery

Now we are ready to use the union bound (17) to study the probability of error for the multiple support recovery problem (5). We assume each candidate support S_i has known cardinality K, and we have L = \binom{N}{K} such supports. Our general approach is also applicable to cases for which we have some prior information on the structure of the signal's sparsity pattern, for example the setup in model-based Compressive Sensing [47]. In these cases, we usually have L ≪ \binom{N}{K} supports, and a careful examination of the intersection pattern of these supports will give a better bound. However, in this paper we will not address this problem and will instead focus on the full support recovery problem with L = \binom{N}{K}. Defining λ = min_{i≠j} λ_{S_i,S_j}, we have the following theorem:

Theorem 1: If M ≥ 2K and λ > 4[K(N − K)]^{1/(κT)}, then the probability of error for the full support recovery problem (5) with |S_i| = K and L = \binom{N}{K} is bounded by

Perr ≤ (1/2) · [ K(N − K)/(λ/4)^{κT} ] / [ 1 − K(N − K)/(λ/4)^{κT} ]. (26)

Proof: Combining the bound in Proposition 2 and Equation (17), we have

Perr ≤ (1/(2L)) Σ_{i=0}^{L−1} Σ_{j=0, j≠i}^{L−1} [ λ_{S_i,S_j} λ_{S_j,S_i} / 16 ]^{−κ k_d T / 2}
     ≤ (1/(2L)) Σ_{i=0}^{L−1} Σ_{j=0, j≠i}^{L−1} (λ/4)^{−κ k_d T}.

Here k_d depends on the supports S_i and S_j. For fixed S_i, the number of supports whose difference set with S_i has cardinality k_d is \binom{K}{k_d} \binom{N−K}{k_d}. Therefore, using \binom{K}{k_d} ≤ K^{k_d} and \binom{N−K}{k_d} ≤ (N − K)^{k_d} and the summation formula for geometric series, we obtain

Perr ≤ (1/(2L)) Σ_{i=0}^{L−1} Σ_{k_d=1}^{K} \binom{K}{k_d} \binom{N−K}{k_d} (λ/4)^{−κ k_d T}
     ≤ (1/2) Σ_{k_d=1}^{K} [ K(N − K) / (λ/4)^{κT} ]^{k_d}
     ≤ (1/2) · [ K(N − K)/(λ/4)^{κT} ] / [ 1 − K(N − K)/(λ/4)^{κT} ].

We make several comments here. First, λ depends solely on the measurement matrix A. Compared with the results in [43], where the bounds involve the signal, we get more insight into what quantity of the measurement matrix is important in support recovery. This information is obtained by modelling the signals x(t) as Gaussian random vectors. The quantity λ effectively characterizes system (3)'s ability to distinguish different supports. Clearly, λ is related to the restricted isometry property (RIP), which guarantees stable sparse signal recovery in Compressive Sensing [4]–[6]. We discuss the relationship between the RIP and λ for the special case with K = 1 at the end of Section III-C. However, a precise relationship for the general case is yet to be discovered.

Second, we observe that increasing the number of temporal samples plays two roles simultaneously in the measurement system. For one thing, it decreases the threshold 4[K(N − K)]^{1/(κT)} that λ must exceed for the bound (26) to hold. However, since lim_{T→∞} 4[K(N − K)]^{1/(κT)} = 4 for fixed K and N, increasing temporal samples can reduce the threshold only to a certain limit. For another, since the bound (26) is proportional to e^{−κT log(λ/4)}, the probability of error goes to 0 exponentially fast as T increases, as long as λ > 4[K(N − K)]^{1/(κT)} is satisfied.

In addition, the final bound (26) is of the same order as the probability of error when k_d = 1. The probability of error Perr is dominated by the probability of error in cases for which the estimated support differs by only one index from the true support, which are the most difficult cases for the decision rule to make a choice. However, in practice we can imagine that these cases induce the least loss. Therefore, if we assign weights/costs to the errors based on k_d, then the weighted probability of error or average cost would be much lower. For example, we can choose the costs to decrease exponentially when k_d increases. Another possible choice of cost function is to assume zero cost when k_d is below a certain critical number. Our results can be easily extended to these scenarios.

Finally, note that our bound (26) applies to any non-degenerate matrix. In Section V, we apply the bound to Gaussian measurement matrices. The additional structure allows us to derive more profound results on the behavior of the bound.
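For very small N and K, the quantity λ in Theorem 1 can be computed by brute force over all support pairs, which makes the behavior of the bound (26) easy to explore. The sketch below (real case, κ = 1/2; illustrative parameters only, and the helper lam_geo is ours) does exactly that.

    import numpy as np
    from itertools import combinations

    def lam_geo(A, Si, Sj, sigma2, tol=1e-6):
        cov = lambda S: A[:, list(S)] @ A[:, list(S)].T + sigma2 * np.eye(A.shape[0])
        w, V = np.linalg.eigh(cov(Si))
        R = (V * np.sqrt(w)) @ V.T
        eig = np.linalg.eigvalsh(R @ np.linalg.inv(cov(Sj)) @ R)
        return np.exp(np.mean(np.log(eig[eig > 1 + tol])))

    rng = np.random.default_rng(3)
    N, M, K, T, sigma2, kappa = 10, 8, 2, 6, 0.1, 0.5
    A = rng.standard_normal((M, N))

    supports = list(combinations(range(N), K))
    lam = min(lam_geo(A, Si, Sj, sigma2)
              for Si in supports for Sj in supports if Si != Sj)

    threshold = 4.0 * (K * (N - K)) ** (1.0 / (kappa * T))   # condition of Theorem 1
    if lam > threshold:
        q = K * (N - K) / (lam / 4.0) ** (kappa * T)
        print("lambda =", lam, " bound (26) =", 0.5 * q / (1 - q))
    else:
        print("lambda =", lam, "does not exceed the threshold", threshold)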

C. The Effect of Noise

In this subsection, we explore how the noise variance affects the probability of error, which is equivalent to analyzing the behavior of λ_{S_i,S_j} and λ as indicated in (25) and (26).

We now derive bounds on the eigenvalues of H. The lower bound is expressed in terms of the QR decomposition of a submatrix of the measurement matrix with the noise variance σ² isolated.

Proposition 3: For any non-degenerate measurement matrix A, let H = Σ_0^{1/2} Σ_1^{−1} Σ_0^{1/2} with Σ_i = A_{S_i} A_{S_i}† + σ²I_M, k_i = |S0 ∩ S1|, k_0 = |S0\S1| = |S0| − k_i, k_1 = |S1\S0| = |S1| − k_i. We have the following:

1) if M > k_0 + k_1, then the sorted eigenvalues of H that are greater than 1 are lower bounded by the corresponding eigenvalues of I_{k_0} + (1/σ²) R_{33} R_{33}†, where R_{33} is the k_0 × k_0 submatrix at the lower-right corner of the upper triangular matrix in the QR decomposition of [A_{S1\S0}  A_{S1∩S0}  A_{S0\S1}];
2) the eigenvalues of H are upper bounded by the corresponding eigenvalues of I_M + (1/σ²) A_{S0\S1} A_{S0\S1}†; in particular, the sorted eigenvalues of H that are greater than 1 are upper bounded by the corresponding ones of I_{k_0} + (1/σ²) A_{S0\S1}† A_{S0\S1}.

Proof: See Appendix B.

The importance of this proposition is twofold. First, by isolating the noise variance from the expression of the matrix H, it clearly shows that when the noise variance decreases to zero, the relatively large eigenvalues of H will blow up, which results in increased performance in support recovery. Second, the bounds provide ways to analyze special measurement matrices, especially the Gaussian measurement ensemble discussed in Section V.

We have the following corollary:

Corollary 1: For support recovery problems (4) and (5) with support size K, suppose M ≥ 2K; then there exist constants c_1, c_2 > 0 that depend only on the measurement matrix A such that

1 + c_2/σ² ≥ λ ≥ 1 + c_1/σ². (27)

From (25) and (26), we then conclude that for any temporal sample size T,

lim_{σ²→0} Perr = 0, (28)

and the speed of convergence is approximately (σ²)^{κ k_d T} and (σ²)^{κT} for the binary and multiple cases, respectively.

Proof: According to Proposition 3, for any fixed S_i, S_j, the eigenvalues of H = Σ_i^{1/2} Σ_j^{−1} Σ_i^{1/2} that are greater than 1 are lower bounded by those of I_{k_d} + (1/σ²) R_{33} R_{33}†; hence we have

λ_{S_i,S_j} ≥ | I_{k_d} + (1/σ²) R_{33} R_{33}† |^{1/k_d}
           ≥ | I_{k_d} + (1/σ²) R_{33}² |^{1/k_d}
           = [ Π_{l=1}^{k_d} ( 1 + (1/σ²) r_{ll}² ) ]^{1/k_d}
           ≥ 1 + (1/σ²) ( Π_{l=1}^{k_d} r_{ll}² )^{1/k_d}, (29)

where r_{ll} is the l-th diagonal element of R_{33}. For the second inequality we have used Fact 8.11.20 in [48]. Since A is non-degenerate and M ≥ 2K, [A_{S_j\S_i}  A_{S_j∩S_i}  A_{S_i\S_j}] is of full rank and r_{ll}² > 0, 1 ≤ l ≤ k_d, for all S_i, S_j. Defining c_1 as the minimal value of (Π_{l=1}^{k_d} r_{ll}²)^{1/k_d} over all possible support pairs S_i, S_j, we then have c_1 > 0 and

λ ≥ 1 + c_1/σ².

On the other hand, the upper bound on the eigenvalues of H yields

λ_{S_i,S_j} ≤ | I_{k_d} + (1/σ²) A_{S_i\S_j}† A_{S_i\S_j} |^{1/k_d}
           ≤ 1 + (1/(σ² k_d)) tr( A_{S_i\S_j}† A_{S_i\S_j} )
           = 1 + (1/(σ² k_d)) Σ_{1≤m≤M, n∈S_i\S_j} |A_{mn}|². (30)

Therefore, we have

λ ≤ 1 + c_2/σ²,

with c_2 = max_{S:|S|≤K} (1/|S|) Σ_{1≤m≤M, n∈S} |A_{mn}|². All other statements in the corollary follow immediately from (25) and (26).

Corollary 1 suggests that in the limiting case where there is no noise, M ≥ 2K is sufficient to recover a K-sparse signal. This fact has been observed in [4]. Our result also shows that the optimal decision rule, which is unfortunately inefficient, is robust to noise. Another extreme case is when the noise variance σ² is very large. Then from log(1 + x) ≈ x, 0 < x ≪ 1, the bounds in (25) and (26) are approximated by e^{−κ k_d T/σ²} and e^{−κT/σ²}, respectively. Therefore, the convergence exponents for the bounds are proportional to the SNR in this limiting case.

The diagonal elements of R_{33}, the r_{ll}'s, have clear meanings. Since QR factorization is equivalent to the Gram-Schmidt orthogonalization procedure, r_{11} is the distance of the first column of A_{S_i\S_j} to the subspace spanned by the columns of A_{S_j}; r_{22} is the distance of the second column of A_{S_i\S_j} to the subspace spanned by the columns of A_{S_j} plus the first column of A_{S_i\S_j}, and so on. Therefore, λ_{S_i,S_j} is a measure of how well the columns of A_{S_i\S_j} can be expressed by the columns of A_{S_j}, or, put another way, a measure of the incoherence between the columns of A_{S_i} and A_{S_j}. Similarly, λ is an indicator of the incoherence of the entire matrix A of order K.

To relate λ to the incoherence, we consider the case with K = 1 and F = R. By restricting our attention to matrices with unit columns, the above discussion implies that a better bound is achieved if the minimal distance between all pairs of column vectors of the matrix A is maximized. Finding such a matrix A is equivalent to finding a matrix with the inner products between columns as small as possible, since the distance between two unit vectors u and v is 2 − 2|⟨u, v⟩|, where ⟨u, v⟩ = u′v is the inner product between u and v. For each integer s, the RIP constant δ_s is defined as the smallest number such that [4], [5]

1 − δ_s ≤ ‖Ax‖²₂ / ‖x‖²₂ ≤ 1 + δ_s, |supp(x)| = s. (31)

A direct computation shows that δ_2 is equal to the maximum of the absolute values of the inner products between all pairs of columns of A. Hence, the requirements of finding the smallest δ_2 that satisfies (31) and maximizing λ coincide when K = 1. For general K, Milenkovic et al. established a relationship between δ_2 and δ_K via Gersgorin's disc theorem [49] and discussed them as well as some coding theoretic issues in the Compressive Sensing context [50].

IV. AN INFORMATION THEORETIC LOWER BOUND ON PROBABILITY OF ERROR

In this section, we derive an information theoretic lower bound on the probability of error for any decision rule in the multiple support recovery problem. The main tool is a variant of the well-known Fano's inequality [51]. In the variant, the average probability of error in a multiple-hypothesis testing problem is bounded in terms of the Kullback-Leibler divergence [52]. Suppose that we have a random vector or matrix Y with L possible densities f_0, . . . , f_{L−1}. Denote the average of the Kullback-Leibler divergence between any pair of densities by

β = (1/L²) Σ_{i,j} D_KL(f_i || f_j). (32)

Then by Fano’s inequality [53], [43], the probability of error(17) for any decision rule to identify the true density is lowerbounded by

Perr ≥ 1 − β + log 2

logL. (33)

Since in the multiple support recovery problem (5) all the distributions involved are matrix variate Gaussian distributions with zero mean and different variances, we now compute the Kullback-Leibler divergence between two matrix variate Gaussian distributions. Suppose f_i = FN_{M,T}(0, Σ_i ⊗ I_T) and f_j = FN_{M,T}(0, Σ_j ⊗ I_T); the Kullback-Leibler divergence has the closed form expression

D_KL(f_i || f_j) = E_{f_i} log(f_i / f_j)
  = (1/2) E_{f_i} [ −κ tr( Y†(Σ_i^{−1} − Σ_j^{−1})Y ) − κT log( |Σ_i|/|Σ_j| ) ]
  = (1/2) κT [ tr( H_{i,j} − I_M ) + log( |Σ_j|/|Σ_i| ) ],

where H_{i,j} = Σ_i^{1/2} Σ_j^{−1} Σ_i^{1/2}. Therefore, we obtain the average Kullback-Leibler divergence (32) for the multiple support recovery problem as

β = (1/L²) Σ_{S_i,S_j} (1/2) κT [ tr(H_{i,j}) − M + log( |Σ_j|/|Σ_i| ) ]
  = (κT/(2L²)) Σ_{S_i,S_j} [ tr(H_{i,j}) − M ],

where the log(|Σ_j|/|Σ_i|) terms all cancel out and L = \binom{N}{K}. Invoking the second part of Proposition 3, we get

tr(H_{i,j}) ≤ tr( I_M + (1/σ²) A_{S_i\S_j} A_{S_i\S_j}† ) = M + (1/σ²) Σ_{1≤m≤M, n∈S_i\S_j} |A_{mn}|².

Therefore, the average Kullback-Leibler divergence is bounded by

β ≤ (κT/(2σ²L²)) Σ_{S_i,S_j} Σ_{1≤m≤M, n∈S_i\S_j} |A_{mn}|².

Due to the symmetry of the right-hand side, it must be of the form a Σ_{1≤m≤M, 1≤n≤N} |A_{mn}|² = a ‖A‖²_F, where ‖·‖_F is the Frobenius norm. Setting all A_{mn} = 1 gives

(κT/(2σ²L²)) Σ_{S_i,S_j} Σ_{1≤m≤M, n∈S_i\S_j} 1
  = (κT/(2σ²L²)) Σ_{i=0}^{L−1} Σ_{k_d=1}^{K} \binom{K}{k_d} \binom{N−K}{k_d} k_d M
  = aMN.

Therefore, we get a = κT K(N − K)/(2σ²N²) using the mean expression for the hypergeometric distribution:

Σ_{k_d=1}^{K} [ \binom{K}{k_d} \binom{N−K}{k_d} / \binom{N}{K} ] k_d = K(N − K)/N.

Hence, we have

β ≤ κT K(N − K)/(2σ²N²) · ‖A‖²_F.

Therefore, the probability of error is lower bounded by

Perr ≥ 1 − [ κT K(N − K)/(2σ²N²) · ‖A‖²_F + log 2 ] / log L. (34)

We conclude with the following theorem:


Theorem 2: For the multiple support recovery problem (5), the probability of error for any decision rule is lower bounded by

Perr ≥ 1 − κT (K/N)(1 − K/N) ‖A‖²_F / ( 2σ² log\binom{N}{K} ) + o(1). (35)
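The lower bound (35) is straightforward to evaluate for a given measurement matrix. The sketch below (not from the paper; real case κ = 1/2, the o(1) term is dropped, so the value is only indicative, and the helper names are ours) computes it from ‖A‖²_F, K, T, σ², and log\binom{N}{K}.

    import numpy as np
    from math import lgamma

    def log_binom(N, K):
        return lgamma(N + 1) - lgamma(K + 1) - lgamma(N - K + 1)

    def fano_lower_bound(A, K, T, sigma2, kappa=0.5):
        # bound (35) with the o(1) term dropped
        N = A.shape[1]
        gain = float(np.sum(A ** 2))                 # ||A||_F^2, the total measurement gain
        frac = K / N
        return 1.0 - kappa * T * frac * (1 - frac) * gain / (2 * sigma2 * log_binom(N, K))

    rng = np.random.default_rng(5)
    M, N, K, T, sigma2 = 16, 64, 3, 10, 0.1
    A = rng.standard_normal((M, N)) / np.sqrt(M)     # columns have unit norm on average
    print("Fano lower bound on Perr:", fano_lower_bound(A, K, T, sigma2))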

Each term in the bound (35) has a clear meaning. The Frobenius norm ‖A‖²_F of the measurement matrix is the total gain of system (2). Since the measured signal is K-sparse, only a fraction of the gain plays a role in the measurement, and its average over all possible K-sparse signals is (K/N)‖A‖²_F. While an increase in signal energy enlarges the distances between signals, a penalty term (1 − K/N) is introduced because we now have more signals. The term log L = log\binom{N}{K} is the total uncertainty, or entropy, of the support variable S, since we impose a uniform prior on it. As long as K ≤ N/2, increasing K increases both the average gain exploited by the measurement system and the entropy of the support variable S. The overall effect, quite counterintuitively, is a decrease of the lower bound in (35). Actually, the term involving K, namely (K/N)(1 − K/N) / log\binom{N}{K}, is approximated by an increasing function α(1 − α)/(N H(α)) with α = K/N and the binary entropy function H(α) = −α log α − (1 − α) log(1 − α). The reason for the decrease of the bound is that the bound only involves the effective SNR without regard to any inner structure of A (e.g., the incoherence), and the effective SNR increases with K. To see this, we compute the effective SNR as E_x[ ‖Ax‖²_F / (Mσ²) ] = (K/N) ‖A‖²_F / (Mσ²). If we scale down the effective SNR by increasing the noise energy σ² by a factor of K, then the bound is strictly increasing with K.

The above analysis suggests that the lower bound (35) is weak, as it disregards any incoherence property of the measurement matrix A. For some cases, the bound reduces to (2σ²/κ) log(N/K) (refer to Corollary 2, Theorems 3 and 4) and is less than K when the noise level or K is relatively large. Certainly recovering the support is not possible with fewer than K measurements. The bound is loose also in the sense that when T, ‖A‖²_F, or the SNR 1/σ² is large enough the bound becomes negative, but when there is noise, perfect support recovery is generally impossible. While the original Fano's inequality

H(Perr) + Perr log(L − 1) ≥ H(S|Y) (36)

is tight in some cases [51], the adoption of the average divergence (32) as an upper bound on the mutual information I(S; Y) between the random support S and the observation Y reduces the tightness (see the proof of (33) in [54]). Due to the difficulty of computing H(S|Y) and I(S; Y) analytically, it is not clear whether a direct application of (36) results in a significantly better bound.

Despite the drawbacks we discussed, the bound (35) identifies the importance of the gain ‖A‖²_F of the measurement matrix, a quantity usually ignored in, for example, Compressive Sensing. We can also draw some interesting conclusions from (35) for measurement matrices with special properties. In particular, in the following corollary, we consider measurement matrices with rows or columns normalized to one. The rows of a measurement matrix are normalized to one in the sensor network scenario (SNET), where each sensor is power limited, while the columns are sometimes normalized to one in Compressive Sensing (refer to [42] and references therein).

Corollary 2: In order to have a probability of error Perr < ε with 0 < ε < 1, the number of measurements must satisfy

MT ≥ (1 − ε) · 2σ² K log(N/K) / [ κ (K/N)(1 − K/N) ] + o(1)
   ≥ (1 − ε) · (8σ²/κ) K log(N/K) + o(1), (37)

if the rows of A have unit norm; and

T ≥ (1 − ε) · 2σ² log(N/K) / [ κ(1 − K/N) ] + o(1)
  ≥ (1 − ε) · (2σ²/κ) log(N/K) + o(1), (38)

if the columns of A have unit norm.
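The two necessary conditions in Corollary 2 translate into simple sample-size calculations. The sketch below (illustrative only; o(1) terms dropped, hypothetical parameter values, helper names ours) evaluates the minimal MT for unit-norm rows, per (37), and the minimal T for unit-norm columns, per (38).

    from math import log

    def min_MT_unit_rows(N, K, sigma2, eps, kappa=0.5):
        # condition (37): rows of A normalized to unit norm (o(1) dropped)
        return (1 - eps) * 2 * sigma2 * K * log(N / K) / (kappa * (K / N) * (1 - K / N))

    def min_T_unit_columns(N, K, sigma2, eps, kappa=0.5):
        # condition (38): columns of A normalized to unit norm; independent of M
        return (1 - eps) * 2 * sigma2 * log(N / K) / (kappa * (1 - K / N))

    N, K, sigma2, eps = 1000, 10, 1.0, 0.1
    print("unit-norm rows   : MT >=", round(min_MT_unit_rows(N, K, sigma2, eps), 1))
    print("unit-norm columns: T  >=", round(min_T_unit_columns(N, K, sigma2, eps), 1))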

Note that the necessary condition (37) involves the same critical quantity as the sufficient condition in Compressive Sensing. The inequality in (38) is independent of M. Therefore, if the columns are normalized to have unit norm, it is necessary to have multiple temporal measurements for a vanishing probability of error. Refer to Theorems 3 and 4 and the discussions following them.

In the work of [7], each column of A is the array manifold vector function evaluated at a sample of the DOA parameter. The implication of the bound (35) for optimal design is that we should construct an array whose geometry leads to maximal ‖A‖²_F. However, under the narrowband signal assumption and the narrowband array assumption [55], the array manifold vector for isotropic sensor arrays always has norm √M [56], which means that ‖A‖²_F = MN. Hence in this case, the probability of error is always bounded by

Perr ≥ 1 − T (K/N)(1 − K/N) MN / ( 2σ² log\binom{N}{K} ) + o(1). (39)

Therefore, we have the following theorem,

Theorem 3: Under the narrowband signal assumption and narrowband array assumption, for an isotropic sensor array in the DOA estimation scheme proposed in [7], in order to let the probability of error Perr < ε with 0 < ε < 1 for any decision rule, the number of measurements must satisfy the following:

MT ≥ (1 − ε) · 2σ² log\binom{N}{K} / [ K(1 − K/N) ] + o(1)
   ≥ (1 − ε) · 2σ² log(N/K) + o(1). (40)

We comment that the same lower bound applies to the Fourier measurement matrix (not normalized by 1/√M) due to the same line of argument. We will not explicitly present this result in the current paper.

Since in radar and sonar applications the number of targets K is usually small, our result shows that the number of samples is lower bounded by log N. Note that N is the number of intervals we use to divide the whole range of DOA; hence, it is a measure of resolution. Therefore, the number of samples only needs to increase with the logarithm of N, which is very desirable. The symmetric roles played by M and T are also desirable since M is the number of sensors and is expensive to increase. As a consequence, we can simply increase the number of temporal samples to achieve a desired probability of error. In addition, unlike the upper bound of Theorem 1, we do not need to assume that M ≥ 2K in Theorems 2 and 3. Actually, Malioutov et al. made the empirical observation that the ℓ1-SVD technique can resolve M − 1 sources if they are well separated [7]. Theorem 3 still applies to this extreme case.

Analysis of the support recovery problem with a measurement matrix obtained from sampling a manifold has considerable complexity compared with the Gaussian case. For example, it presents a significant challenge to estimate λ_{S_i,S_j} in the DOA problem except for a few special cases that we discuss in [57]. As we mentioned before, unlike the Gaussian case, λ_{S_i,S_j} for uniform linear arrays varies greatly with S_i and S_j. Therefore, even if we could compute λ_{S_i,S_j}, replacing it with λ in the upper bound of Theorem 1 would lead to a very loose bound. On the other hand, the lower bound of Theorem 2 involves only the Frobenius norm of the measurement matrix, so we can apply it to the DOA problem effortlessly. However, the lower bound is weak, as it does not exploit any inner structure of the measurement matrix.

Donoho et al. considered the recovery of a "sparse" wide-band signal from narrow-band measurements [58], [59], a problem with essentially the same mathematical structure when we sample the array manifold uniformly in the wavenumber domain instead of the DOA domain. It was found that the spectral norm of the product of the band-limiting and time-limiting operators is crucial to stable signal recovery measured by the l2 norm. In [58], Donoho and Stark bounded the spectral norm using the Frobenius norm, which leads to the well-known uncertainty principle. The authors commented that the uncertainty principle condition demands an extreme degree of sparsity for the signal. However, this condition can be relaxed if the signal support is widely scattered. In [7], Malioutov et al. also observed from numerical simulations that the ℓ1-SVD algorithm performs much better when the sources are well separated than when they are located close together. In particular, they observed that the presence of bias is greatly mitigated when the sources are far apart. Donoho and Logan [59] explored the effect of the scattering of the signal support by using the "analytic principle of the large sieve". They bounded the spectral norm of the limiting operator by the maximum Nyquist density, a quantity that measures the degree of scattering of the signal support. We expect that our results can be improved in a similar manner. The challenges include using support recovery as a performance measure, incorporating multiple measurements, as well as developing the whole theory within a probabilistic framework.

V. SUPPORT RECOVERY FOR THE GAUSSIAN MEASUREMENT ENSEMBLE

In this section, we refine our results in previous sections from general non-degenerate measurement matrices to the Gaussian ensemble. Unless otherwise specified, we always assume that the elements in a measurement matrix A are i.i.d. samples from unit-variance real or complex Gaussian distributions. The Gaussian measurement ensemble is widely used and studied in Compressive Sensing [2]–[6]. The additional structure and the theoretical tools available enable us to derive deeper results in this case. In this section, we assume general scaling of (N, M, K, T). We do not find in our results a clear distinction between the regime of sublinear sparsity and the regime of linear sparsity like the one discussed in [43].

We first show two corollaries on the eigenvalue structure for the Gaussian measurement ensemble. Then we derive sufficient and necessary conditions in terms of M, N, K and T for the system to have a vanishing probability of error.

A. Eigenvalue Structure for a Gaussian Measurement Matrix

First, we observe that a Gaussian measurement matrix is non-degenerate with probability one, since any p ≤ M random vectors a_1, a_2, . . . , a_p from FN(0, Σ) with Σ ∈ R^{M×M} positive definite are linearly independent with probability one (refer to Theorem 3.2.1 in [45]). As a consequence, we have

Corollary 3: For a Gaussian measurement matrix $A$, let $H = \Sigma_0^{1/2}\Sigma_1^{-1}\Sigma_0^{1/2}$, $k_i = |S_0 \cap S_1|$, $k_0 = |S_0\backslash S_1| = |S_0| - k_i$, and $k_1 = |S_1\backslash S_0| = |S_1| - k_i$. If $M > k_0 + k_1$, then with probability one, $k_0$ eigenvalues of the matrix $H$ are greater than 1, $k_1$ are less than 1, and $M - (k_0 + k_1)$ are equal to 1.
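
The following is a minimal numerical sketch of Corollary 3, assuming the covariance model $\Sigma_i = \sigma^2 I_M + A_{S_i}A_{S_i}^\dagger$ that appears in Appendix B; the matrix sizes, supports, and noise level are illustrative choices only, not values prescribed by the analysis.

```python
# Spot check of Corollary 3: count the eigenvalues of H = Sigma0^{1/2} Sigma1^{-1} Sigma0^{1/2}
# (equivalently, of Sigma1^{-1} Sigma0, which is similar to H) for a random real
# Gaussian measurement matrix. All parameter values below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
M, N, sigma2 = 20, 40, 0.5
S0, S1 = np.arange(0, 6), np.arange(3, 9)            # two overlapping supports
ki = len(np.intersect1d(S0, S1))                      # |S0 ∩ S1|
k0, k1 = len(S0) - ki, len(S1) - ki                   # |S0\S1| and |S1\S0|

A = rng.standard_normal((M, N))                       # real Gaussian ensemble
Sigma0 = sigma2 * np.eye(M) + A[:, S0] @ A[:, S0].T
Sigma1 = sigma2 * np.eye(M) + A[:, S1] @ A[:, S1].T

eigs = np.linalg.eigvals(np.linalg.solve(Sigma1, Sigma0)).real
tol = 1e-8
print("eigenvalues > 1:", np.sum(eigs > 1 + tol), " expected:", k0)
print("eigenvalues < 1:", np.sum(eigs < 1 - tol), " expected:", k1)
print("eigenvalues = 1:", np.sum(np.abs(eigs - 1) <= tol), " expected:", M - (k0 + k1))
```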

We refine Proposition 3 based on the well-known QR factorization for Gaussian matrices [45], [60].

Corollary 4: With the same notation as in Corollary 3, the following hold with probability one:

1) if $M > k_0 + k_1$, then the sorted eigenvalues of $H$ that are greater than 1 are lower bounded by the corresponding ones of $I_{k_0} + \frac{1}{\sigma^2}R_{33}R_{33}^\dagger$, where the elements of $R_{33} = (r_{mn})_{k_0\times k_0}$ satisfy
$$2\kappa r_{mm}^2 \sim \chi^2_{2\kappa(M-k_1-k_i-m+1)}, \quad 1 \le m \le k_0,$$
$$r_{mn} \sim \mathcal{FN}(0, 1), \quad 1 \le m < n \le k_0;$$

2) the eigenvalues of $H$ are upper bounded by the corresponding eigenvalues of $I_M + \frac{1}{\sigma^2}A_{S_0\backslash S_1}A_{S_0\backslash S_1}^\dagger$; in particular, the sorted eigenvalues of $H$ that are greater than 1 are upper bounded by the corresponding ones of $I_{k_0} + \frac{1}{\sigma^2}A_{S_0\backslash S_1}^\dagger A_{S_0\backslash S_1}$.

Now with the distributions on the elements of the bounding matrices, we can give a sharp estimate on $\lambda_{S_i,S_j}$. In particular, we have the following proposition:

Proposition 4: For a Gaussian measurement matrix $A$, suppose $S_i$ and $S_j$ are a pair of distinct supports with the same size $K$. Then we have
$$1 + \frac{M}{\sigma^2} \ge \mathbb{E}\lambda_{S_i,S_j} \ge 1 + \frac{M-K-k_d}{\sigma^2}.$$


Proof: We copy the inequalities (29), (30) on $\lambda_{S_i,S_j}$ here:
$$1 + \frac{1}{\sigma^2 k_d}\sum_{1\le m\le M,\; n\in S_i\backslash S_j}|A_{mn}|^2 \ \ge\ \lambda_{S_i,S_j}\ \ge\ 1 + \frac{1}{\sigma^2}\left(\prod_{m=1}^{k_d}|r_{mm}|^2\right)^{1/k_d}.$$

The proof then reduces to the computation of two expectations, one of which is trivial:
$$\mathbb{E}\,\frac{1}{\sigma^2 k_d}\sum_{1\le m\le M,\; n\in S_0\backslash S_1}|A_{mn}|^2 = \frac{M}{\sigma^2}.$$

Next, the independence of the $r_{nn}$'s and the convexity of the exponential function together with Jensen's inequality yield
$$\begin{aligned}
\mathbb{E}\,\frac{1}{\sigma^2}\left(\prod_{n=1}^{k_d} r_{nn}^2\right)^{1/k_d}
&= \frac{1}{2\kappa\sigma^2}\,\mathbb{E}\exp\left[\frac{1}{k_d}\sum_{n=1}^{k_d}\log\left(2\kappa r_{nn}^2\right)\right] \\
&\ge \frac{1}{2\kappa\sigma^2}\exp\left[\frac{1}{k_d}\sum_{n=1}^{k_d}\mathbb{E}\log\left(2\kappa r_{nn}^2\right)\right].
\end{aligned}$$

Since $2\kappa r_{nn}^2 \sim \chi^2_{2\kappa(M-K-n+1)}$, the expectation of the logarithm is $\mathbb{E}\log\left(2\kappa r_{nn}^2\right) = \log 2 + \psi\left(\kappa(M-K-n+1)\right)$, where $\psi(z) = \Gamma'(z)/\Gamma(z)$ is the digamma function. Note that $\psi(z)$ is increasing and satisfies $\psi(z+1) \ge \log z$. Therefore, we have
$$\begin{aligned}
\mathbb{E}\,\frac{1}{\sigma^2}\left(\prod_{n=1}^{k_d} r_{nn}^2\right)^{1/k_d}
&\ge \frac{1}{2\kappa\sigma^2}\exp\left[\log 2 + \frac{1}{k_d}\sum_{n=1}^{k_d}\psi\left(\kappa(M-K-n+1)\right)\right] \\
&\ge \frac{1}{\kappa\sigma^2}\exp\left[\psi\left(\kappa(M-K-k_d+1)\right)\right] \\
&\ge \frac{1}{\kappa\sigma^2}\exp\left[\log\left(\kappa(M-K-k_d)\right)\right] \\
&\ge \frac{M-K-k_d}{\sigma^2}.
\end{aligned}$$

The expected value of the critical quantity $\lambda_{S_i,S_j}$ lies between $1 + \frac{M-2K}{\sigma^2}$ and $1 + \frac{M}{\sigma^2}$, linearly proportional to $M$. Note that in conventional Compressive Sensing, the variance of the elements of $A$ is usually taken to be $\frac{1}{M}$, which is equivalent to scaling the noise variance $\sigma^2$ to $M\sigma^2$ in our model. The resulting $\lambda_{S_i,S_j}$ is then centered between $1 + \frac{1-2K/M}{\sigma^2}$ and $1 + \frac{1}{\sigma^2}$.
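
The lower-bound step in the proof of Proposition 4 can be checked by simulation: sampling $2\kappa r_{nn}^2 \sim \chi^2_{2\kappa(M-K-n+1)}$ directly, the expected geometric mean of the $r_{nn}^2$ divided by $\sigma^2$ should indeed exceed $(M-K-k_d)/\sigma^2$. The sketch below assumes the real case ($\kappa = 1/2$) and uses illustrative parameter values.

```python
# Monte Carlo check of the bound E[(prod_n r_nn^2)^(1/kd)]/sigma^2 >= (M - K - kd)/sigma^2
# used in the proof of Proposition 4, with 2*kappa*r_nn^2 ~ chi^2_{2*kappa*(M-K-n+1)}.
import numpy as np

rng = np.random.default_rng(1)
M, K, kd, sigma2, kappa = 60, 5, 5, 1.0, 0.5           # illustrative values, real case
trials = 20000

dofs = 2 * kappa * (M - K - np.arange(1, kd + 1) + 1)   # degrees of freedom per diagonal term
chi = rng.chisquare(df=dofs, size=(trials, kd))         # samples of 2*kappa*r_nn^2
r2 = chi / (2 * kappa)                                  # samples of r_nn^2
geo_mean = np.exp(np.mean(np.log(r2), axis=1))          # (prod_n r_nn^2)^(1/kd)

print("empirical  E[(prod r^2)^(1/kd)]/sigma^2 :", geo_mean.mean() / sigma2)
print("lower bound (M - K - kd)/sigma^2        :", (M - K - kd) / sigma2)
print("reference value M/sigma^2               :", M / sigma2)
```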

B. Necessary Condition

One fundamental problem in Compressive Sensing is how many samples the system should take to guarantee a stable reconstruction. Although many sufficient conditions are available, non-trivial necessary conditions are rare. Besides, in previous works, stable reconstruction has been measured in the sense of $\ell_p$ norms between the reconstructed signal and the true signal. In this section, we derive two necessary conditions on $M$ and $T$ in terms of $N$ and $K$ in order to guarantee, respectively, that, first, $\mathbb{E}P_{err}$ tends to zero and, second, the probability of error vanishes for the majority of realizations of $A$. More precisely, we have the following theorem:

Theorem 4: In the support recovery problem (5), for any $\varepsilon, \delta > 0$, a necessary condition for $\mathbb{E}P_{err} < \varepsilon$ is
$$MT \ge (1-\varepsilon)\frac{2\sigma^2\log\binom{N}{K}}{\kappa K\left(1-\frac{K}{N}\right)} + o(1) \tag{41}$$
$$\phantom{MT} \ge (1-\varepsilon)\frac{2\sigma^2}{\kappa}\log\frac{N}{K} + o(1), \tag{42}$$
and a necessary condition for $\mathbb{P}\left\{P_{err}(A) \le \varepsilon\right\} \ge 1-\delta$ is
$$MT \ge (1-\varepsilon-\delta)\frac{2\sigma^2\log\binom{N}{K}}{\kappa K\left(1-\frac{K}{N}\right)} + o(1) \tag{43}$$
$$\phantom{MT} \ge (1-\varepsilon-\delta)\frac{2\sigma^2}{\kappa}\log\frac{N}{K} + o(1). \tag{44}$$

Proof: Equation (35) and $\mathbb{E}\|A\|_F^2 = \sum_{m,l}\mathbb{E}|A_{ml}|^2 = MN$ give
$$\mathbb{E}P_{err} \ge 1 - \frac{\kappa T\,\frac{K}{N}\left(1-\frac{K}{N}\right)MN}{2\sigma^2\log\binom{N}{K}} + o(1). \tag{45}$$

Hence, $\mathbb{E}P_{err} < \varepsilon$ entails (41) and (42). Denote by $E$ the event $\{A : P_{err}(A) \le \varepsilon\}$; then $\mathbb{P}(E^c) \le \delta$ and we have
$$\mathbb{E}P_{err} = \int_{E}P_{err}(A) + \int_{E^c}P_{err}(A) \le \varepsilon\,\mathbb{P}(E) + \mathbb{P}(E^c) \le \varepsilon + \delta.$$
Therefore, from the first part of the theorem, we obtain (43) and (44).
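
To illustrate how Theorem 4 and the Fano-type bound (45) behave numerically, the sketch below evaluates the right-hand sides of (41), (42), and (45) for illustrative parameters, taking $\kappa = 1/2$ for real-valued measurements and ignoring the $o(1)$ terms; these choices are assumptions made only for the example.

```python
# Evaluate the necessary condition (41)-(42) and the lower bound (45) for sample parameters.
import numpy as np
from scipy.special import gammaln

def log_binom(n, k):
    # log of the binomial coefficient, computed via log-gamma
    return gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)

N, K, sigma2, kappa, eps = 1000, 10, 1.0, 0.5, 0.01

mt_41 = (1 - eps) * 2 * sigma2 * log_binom(N, K) / (kappa * K * (1 - K / N))
mt_42 = (1 - eps) * 2 * sigma2 / kappa * np.log(N / K)
print(f"(41): MT >= {mt_41:.1f}   (42): MT >= {mt_42:.1f}")

M, T = 20, 1   # a sampling budget violating (41), so the Fano-type bound is nontrivial
lower = 1 - kappa * T * (K / N) * (1 - K / N) * M * N / (2 * sigma2 * log_binom(N, K))
print(f"(45): E[P_err] >= {max(lower, 0.0):.3f}")
```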

We compare our results with those of [43] and [42]. As we mentioned in the introduction, the differences in problem modeling and in the definition of the probability of error make a direct comparison difficult. We first note that Theorem 2 in [43] is established for the restricted problem where it is known a priori that all non-zero components in the sparse signal are equal. Because the set of signal realizations with equal non-zero components is a rare event in our signal model, it is not fitting to compare our result with the corresponding one in [43] by computing the distribution of the smallest on-support element, e.g., the expectation. Actually, the square of the smallest on-support element for the restricted problem, $M^2(\beta^*)$ (or $\beta$ in [42]), is equivalent to the signal variance in our model: both are measures of the signal energy. If we take into account the noise variance and replace $M^2(\beta^*)$ (or $\beta^2\,\mathrm{SNR}$ in [42]) with $1/\sigma^2$, the necessary conditions in these papers coincide with ours when only one temporal sample is available.

Our result shows that as far as support recovery is concerned, one cannot avoid the $\log\frac{N}{K}$ term when only given one temporal sample. Worse, for conventional Compressive Sensing with a measurement matrix generated from a Gaussian random variable with variance $1/M$, the necessary condition becomes
$$T \ge \frac{2\sigma^2\log\binom{N}{K}}{\kappa K\left(1-\frac{K}{N}\right)} + o(1) \ge \frac{2\sigma^2}{\kappa}\log\frac{N}{K} + o(1),$$

which is independent of $M$. Therefore, when there is considerable noise ($\sigma^2 > \kappa/(2\log\frac{N}{K})$), it is impossible to have a vanishing $\mathbb{E}P_{err}$ no matter how large an $M$ one takes. Basically, this situation arises because while taking more samples, one scales down the measurement gains $A_{ml}$, which effectively reduces the SNR and thus is not helpful in support recovery. As discussed below Theorem 3, $\log\binom{N}{K}$ is the uncertainty of the support variable $S$, and $\log\frac{N}{K}$ actually comes from it. Therefore, it is no surprise that the number of samples is determined by this quantity and cannot be made independent of it.
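
The noise threshold in the preceding paragraph is easy to tabulate. The sketch below computes $\kappa/(2\log\frac{N}{K})$ and the corresponding minimal $T$ at $\sigma^2 = 1$ for a few illustrative $(N, K)$ pairs, assuming $\kappa = 1/2$ (real measurements) and ignoring lower-order terms.

```python
# Tabulate the noise threshold sigma^2 > kappa/(2 log(N/K)) and the minimal T implied
# by the M-independent necessary condition for the variance-1/M Gaussian ensemble.
import numpy as np

kappa = 0.5
for N, K in [(1000, 10), (10000, 50), (100000, 100)]:
    threshold = kappa / (2 * np.log(N / K))      # above this noise level, T = 1 cannot suffice
    T_needed = 2 * 1.0 / kappa * np.log(N / K)   # minimal T at sigma^2 = 1
    print(f"N={N:7d}, K={K:5d}:  sigma^2 threshold = {threshold:.4f},  T >= {T_needed:.1f} at sigma^2 = 1")
```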

C. Sufficient Condition

We derive a sufficient condition in parallel with sufficient conditions in Compressive Sensing. In Compressive Sensing, when only one temporal sample is available, $M = \Omega\left(K\log\frac{N}{K}\right)$ is enough for stable signal reconstruction for the majority of the realizations of the measurement matrix $A$ from a Gaussian ensemble with variance $\frac{1}{M}$. As shown in the previous subsection, if we take the probability of error for support recovery as a performance measure, it is impossible in this case to recover the support with a vanishing probability of error unless the noise is small. Therefore, we consider a Gaussian ensemble with unit variance. We first establish a lemma to estimate the lower tail of the distribution of $\lambda_{S_i,S_j}$.

We have shown that $\mathbb{E}\big(\lambda_{S_i,S_j}\big)$ lies between $1 + \frac{M-2K}{\sigma^2}$ and $1 + \frac{M}{\sigma^2}$. When $\gamma$ is much less than $1 + \frac{M-2K}{\sigma^2}$, we expect that $\mathbb{P}\big\{\lambda_{S_i,S_j} \le \gamma\big\}$ decays quickly. More specifically, we have the following large deviation lemma:

Lemma 1: Suppose that $\gamma = \frac{1}{3}\frac{M-2K}{\sigma^2}$. Then there exists a constant $c > 0$ such that for $M - 2K$ sufficiently large, we have
$$\mathbb{P}\left\{\lambda_{S_i,S_j} \le \gamma\right\} \le \exp\left[-c(M-2K)\right].$$
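
A Monte Carlo sanity check of Lemma 1 can be run on the lower-bounding quantity used in its proof (Appendix C), namely $\min_{1\le l\le k_d}q_l/(2\kappa\sigma^2)$ with $q_l \sim \chi^2_{2\kappa(M-2K)}$; its empirical lower tail should fall below $\exp[-c(M-2K)]$ with $c = \kappa(\log 3 - 1)$. The parameters below are illustrative, with $\kappa = 1/2$ assumed for the real case.

```python
# Monte Carlo check of the tail bound in Lemma 1 via the chi-square proxy from Appendix C.
import numpy as np

rng = np.random.default_rng(2)
kappa, sigma2, K, kd = 0.5, 1.0, 5, 5
trials = 500000
c = kappa * (np.log(3) - 1)

for M in (30, 40, 50):
    gamma = (M - 2 * K) / (3 * sigma2)
    q = rng.chisquare(df=2 * kappa * (M - 2 * K), size=(trials, kd))
    proxy = q.min(axis=1) / (2 * kappa * sigma2)      # lower bound on lambda_{Si,Sj}
    print(f"M={M:3d}: empirical P(proxy <= gamma) = {np.mean(proxy <= gamma):.2e}, "
          f"exp(-c(M-2K)) = {np.exp(-c * (M - 2 * K)):.2e}")
```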

This large deviation lemma, together with the union bound, yields the following sufficient condition for support recovery:

Theorem 5: Suppose that
$$M = \Omega\left(K\log\frac{N}{K}\right) \tag{46}$$
and
$$\kappa T\log\frac{M}{\sigma^2} \gg \log\left[K(N-K)\right]. \tag{47}$$
Then, for a realization of the measurement matrix $A$ drawn from a Gaussian ensemble, the optimal decision rule (16) for the multiple support recovery problem (5) has a vanishing $P_{err}$ with probability tending to one. In particular, if $M = \Omega\left(K\log\frac{N}{K}\right)$ and
$$T \gg \frac{\log N}{\log\log N}, \tag{48}$$
then the probability of error tends to zero as $N$ tends to infinity.

Proof: Denote $\gamma = \frac{1}{3}\frac{M-2K}{\sigma^2}$. Then according to the union bound, we have
$$\mathbb{P}\left\{\lambda \le \gamma\right\} = \mathbb{P}\bigg\{\bigcup_{S_i\ne S_j}\left[\lambda_{S_i,S_j} \le \gamma\right]\bigg\} \le \sum_{S_i\ne S_j}\mathbb{P}\left\{\lambda_{S_i,S_j} \le \gamma\right\}.$$

Therefore, application of Lemma 1 gives
$$\mathbb{P}\left\{\lambda \le \gamma\right\} \le \binom{N}{K}^2 K\exp\left[-c(M-2K)\right] \le \exp\left[-c(M-2K) + 2K\log\frac{N}{K} + \log K\right].$$

Hence, as long as $M = \Omega\left(K\log\frac{N}{K}\right)$, we know that the exponent tends to $-\infty$ as $N \to \infty$. We now define $E = \{A : \lambda(A) > \gamma\}$, where $\mathbb{P}\{E\}$ approaches one as $N$ tends to infinity. Now the upper bound (26) becomes
$$P_{err} = O\left(\frac{K(N-K)}{\left(\frac{\lambda-1}{2\sigma^2}\right)^{\kappa T}}\right) = O\left(\frac{K(N-K)}{\left(\frac{M}{\sigma^2}\right)^{\kappa T}}\right).$$
Hence, if $\kappa T\log\frac{M}{\sigma^2} \gg \log\left[K(N-K)\right]$, we get a vanishing probability of error. In particular, under the assumption that $M = \Omega\left(K\log\frac{N}{K}\right)$, if $T \gg \frac{\log N}{\log\log N}$, then $\frac{\log\left[K(N-K)\right]}{\log\left[K\log\frac{N}{K}\right]} \le \frac{\log N}{\log\log N}$ implies that $K(N-K) \ll O\left(\left(\frac{K\log\frac{N}{K}}{\sigma^2}\right)^{\kappa T}\right) = O\left(\left(\frac{M}{\sigma^2}\right)^{\kappa T}\right)$ for suitably selected constants, so that the upper bound on $P_{err}$ vanishes.

We now consider several special cases and explore the implications of the sufficient conditions. The discussions are heuristic in nature, and their validity requires further checking.

If we set $T = 1$, then we need $M$ to be much greater than $N$ to guarantee a vanishing probability $P_{err}$. This restriction suggests that even if we have more observations than the original signal length $N$, in which case we can obtain the original sparse signal by solving a least squares problem, we still might not be able to get the correct support because of the noise, as long as $M$ is not sufficiently large compared to $N$. We discussed in the introduction that for many applications, the support of a signal has significant physical implications and its correct recovery is of crucial importance. Therefore, without multiple temporal samples and with moderate noise, the scheme proposed by Compressive Sensing is questionable as far as support recovery is concerned. Worse, if we set the variance for the elements in $A$ to be $1/M$ as in Compressive Sensing, which is equivalent to replacing $\sigma^2$ with $M\sigma^2$, even increasing the number of temporal samples will not improve the probability of error significantly unless the noise variance is very small. Hence, using support recovery as a criterion, one cannot expect the Compressive Sensing scheme to work very well in the low SNR case. This conclusion is not a surprise, since we reduce the number of samples to achieve compression.

Another special case is when $K = 1$. In this case, the sufficient condition becomes $M \ge \log N$ and $\kappa T\log\frac{M}{\sigma^2} \gg \log N$. Now the total number of samples should satisfy $MT \gg \frac{(\log N)^2}{\log\log N}$, while the necessary condition states that $MT = \Omega(\log N)$. The smallest gap between the necessary condition and the sufficient condition is achieved when $K = 1$.
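
For $K = 1$ the gap between the two scalings can be tabulated directly; the sketch below compares $\log N$ with $(\log N)^2/\log\log N$, with all implicit constants set to one, an assumption made only for illustration.

```python
# Compare the necessary order MT = Omega(log N) with the sufficient order
# MT >> (log N)^2 / log log N for the K = 1 case, constants set to 1.
import numpy as np

for N in (10**3, 10**4, 10**5, 10**6):
    necessary = np.log(N)
    sufficient = np.log(N) ** 2 / np.log(np.log(N))
    print(f"N={N:>8d}:  necessary ~ {necessary:6.1f},  sufficient ~ {sufficient:7.1f},  "
          f"ratio = {sufficient / necessary:4.1f}")
```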

From a denoising perspective, Fletcher et al. [61] upper bounded and approximated the probability of error for support recovery averaged over the Gaussian ensemble. The bound and its approximation are applicable only to the special case with $K = 1$ and involve complex integrals. The authors obtained an interesting SNR threshold as a function of $M$, $N$, and $K$ through the analytical bound. Note that our bounds are valid for general $K$ and have a simple form. Besides, most of our derivation is conditioned on a realization of the Gaussian measurement ensemble. The conditioning makes more sense than averaging, since in practice we usually make observations with a fixed sensing matrix and varying signals and noise.

The result of Theorem 5 also exhibits several interesting properties in the general case. Compared with the necessary conditions (43) and (44), the asymmetry in the sufficient condition is even more desirable in most cases because of the asymmetric cost associated with sensors and temporal samples. Once the threshold $K\log\frac{N}{K}$ on $M$ is exceeded, we can achieve a desired probability of error by taking more temporal samples. If we were concerned only with the total number of samples, we would minimize $MT$ subject to the constraints (46) and (47) to achieve a given level of probability of error. However, in applications for which timing is important, one has to increase the number of sensors to reduce $P_{err}$ below a certain limit.
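
The trade-off mentioned above can be explored with a small search: fix surrogate versions of (46) and (47) with all implicit constants set to one (an assumption, since Theorem 5 does not specify the constants), and minimize $MT$ over admissible integer pairs.

```python
# Toy search for the (M, T) pair minimizing the total sample count M*T subject to
# surrogate forms of (46) and (47) with unit constants. Parameters are illustrative.
import numpy as np

N, K, sigma2, kappa = 10000, 10, 1.0, 0.5
M_min = int(np.ceil(K * np.log(N / K)))        # surrogate for (46)
target = np.log(K * (N - K))                   # right-hand side of (47)

best = None
for M in range(M_min, 20 * M_min):
    T = int(np.ceil(target / (kappa * np.log(M / sigma2))))  # smallest T meeting the surrogate (47)
    if best is None or M * T < best[0]:
        best = (M * T, M, T)
print(f"minimal M*T ~ {best[0]} attained at M = {best[1]}, T = {best[2]}")
```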

The sufficient condition given by (46), (47), and (48) is separable in the following sense. We observe from the proof that the requirement $M = \Omega\left(K\log\frac{N}{K}\right)$ is used only to guarantee that the randomly generated measurement matrix is a good one in the sense that its incoherence $\lambda$ is sufficiently large, as in the case of Compressive Sensing. It is in Lemma 1 that we use the Gaussian ensemble assumption. If another deterministic construction procedure (for attempts in this direction, see [62]) or random distribution gives a measurement matrix with better incoherence $\lambda$, it would be possible to reduce the orders for both $M$ and $T$.

VI. CONCLUSIONS

In this paper, we formulated the support recovery problems for jointly sparse signals as binary and multiple-hypothesis testing problems. Adopting the probability of error as the performance criterion, the optimal decision rules are given by the likelihood ratio test and the maximum a posteriori probability estimator. The latter reduces to the maximum likelihood estimator when equal prior probabilities are assigned to the supports. We then employed the Chernoff bound and Fano's inequality to derive bounds on the probability of error. We discussed the implications of these bounds at the end of Section III-B, Section III-C, Section IV, Section V-B, and Section V-C, in particular when they are applied to the DOA estimation problem considered in [7] and to Compressive Sensing with a Gaussian measurement ensemble. We derived sufficient and necessary conditions for Compressive Sensing using Gaussian measurement matrices to achieve a vanishing probability of error in both the mean and large probability senses. These conditions show the necessity of considering multiple temporal samples. The symmetric and asymmetric roles played by the spatial and temporal samples and their implications for system design were discussed. For Compressive Sensing, we demonstrated that it is impossible to obtain the accurate signal support with only one temporal sample if the variance for the Gaussian measurement matrix scales with $1/M$ and there is considerable noise.

This research on support recovery for jointly sparse signals is far from complete. Several questions remain to be answered. First, we notice an obvious gap between the necessary and sufficient conditions even in the simplest case with $K = 1$. Better techniques need to be introduced to refine the results. Second, as in the case of the RIP, computation of the quantity $\lambda$ for an arbitrary measurement matrix is extremely difficult. Although we derive large deviation bounds on $\lambda$ and compute the expected value of $\lambda_{S_i,S_j}$ for the Gaussian ensemble, its behavior in both the general and Gaussian cases requires further study. Its relationship with the RIP also needs to be clarified. Finally, our lower bound derived from Fano's inequality identifies only the effect of the total gain. The effect of the measurement matrix's incoherence is elusive. The answers to these questions will enhance our understanding of the measurement mechanism (2).

APPENDIX A
PROOF OF PROPOSITION 1

In this proof, we focus on the case for which both $k_0 \ne 0$ and $k_1 \ne 0$. The other cases have similar and simpler proofs. The eigenvalues of $H$ satisfy $|\lambda I_M - H| = 0$, which is equivalent to $|\lambda\Sigma_1 - \Sigma_0| = 0$. The substitution $\lambda = \mu + 1$ defines
$$g(\mu) = |(\mu+1)\Sigma_1 - \Sigma_0| = |\mu\Sigma_1 - (\Sigma_0 - \Sigma_1)|.$$

The following algebraic manipulation
$$\begin{aligned}
G \triangleq \Sigma_0 - \Sigma_1
&= A_{S_0}A_{S_0}^\dagger - A_{S_1}A_{S_1}^\dagger \\
&= \left[A_{S_0\cap S_1}A_{S_0\cap S_1}^\dagger + A_{S_0\backslash S_1}A_{S_0\backslash S_1}^\dagger\right] - \left[A_{S_0\cap S_1}A_{S_0\cap S_1}^\dagger + A_{S_1\backslash S_0}A_{S_1\backslash S_0}^\dagger\right] \\
&= A_{S_0\backslash S_1}A_{S_0\backslash S_1}^\dagger - A_{S_1\backslash S_0}A_{S_1\backslash S_0}^\dagger
\end{aligned}$$
leads to
$$g(\mu) = |\mu\Sigma_1 - G| = |\Sigma_1|^{\frac{1}{2}}\left|\mu I_M - \Sigma_1^{-\frac{1}{2}}G\Sigma_1^{-\frac{1}{2}\dagger}\right||\Sigma_1|^{\frac{1}{2}}.$$

Therefore, to prove the proposition, it suffices to show that $\Sigma_1^{-\frac{1}{2}}G\Sigma_1^{-\frac{1}{2}\dagger}$ has $k_0$ positive eigenvalues, $k_1$ negative eigenvalues, and $M - (k_0+k_1)$ zero eigenvalues or, put another way, that $\Sigma_1^{-\frac{1}{2}}G\Sigma_1^{-\frac{1}{2}\dagger}$ has inertia $(k_0, k_1, M-(k_0+k_1))$. Sylvester's law of inertia ([49], Theorem 4.5.8, p. 223) states that the inertia of a symmetric matrix is invariant under congruence transformations. Hence, we need only show that $G$ has inertia $(k_0, k_1, M-(k_0+k_1))$. Clearly $G = PQ^\dagger$ with $P = \left[A_{S_0\backslash S_1}\ A_{S_1\backslash S_0}\right]$ and $Q = \left[A_{S_0\backslash S_1}\ -A_{S_1\backslash S_0}\right]$. To find the number of zero eigenvalues of $G$, we calculate the rank of $G$. The non-degenerateness of the measurement matrix $A$ implies that $\mathrm{rank}(P) = \mathrm{rank}(Q) = k_0 + k_1$. Therefore, from the rank inequality ([49], Theorem 0.4.5, p. 13),
$$\mathrm{rank}(P) + \mathrm{rank}\left(Q^\dagger\right) - (k_0+k_1) \le \mathrm{rank}\left(PQ^\dagger\right) \le \min\left\{\mathrm{rank}(P), \mathrm{rank}\left(Q^\dagger\right)\right\},$$
we conclude that $\mathrm{rank}(G) = k_0 + k_1$.

To count the number of negative eigenvalues of $G$, we use the Jacobi-Sturm rule ([63], Theorem A.1.4, p. 320), which states that for an $M\times M$ symmetric matrix whose $j$th leading principal minor has determinant $d_j$, $j = 1, \ldots, M$, the number of negative eigenvalues is equal to the number of sign changes of the sequence $1, d_1, \ldots, d_M$. We consider only the first $k_0+k_1$ leading principal minors, since higher order minors have determinant $0$.

Suppose $I = \{1, \ldots, k_0+k_1\}$ is an index set. Without loss of generality, we assume that $P^I$ is nonsingular. Applying the QL factorization (one variation of the QR factorization, see [64]) to the matrix $P^I$, we obtain $P^I = OL$, where $O$ is an orthogonal matrix, $OO^\dagger = I_{k_0+k_1}$, and $L = (l_{ij})_{(k_0+k_1)\times(k_0+k_1)}$ is a lower triangular matrix. The diagonal entries of $L$ are nonzero since $P^I$ is nonsingular. The partition of $L$ into $L = \left[L_1\ L_2\right]$ with $L_1 \in \mathbb{F}^{(k_0+k_1)\times k_0}$, $L_2 \in \mathbb{F}^{(k_0+k_1)\times k_1}$, and $L_2 = \begin{bmatrix} 0 \\ L_3\end{bmatrix}$ with $L_3 \in \mathbb{F}^{k_1\times k_1}$ implies
$$G_I^I = P^I\left(Q^I\right)^\dagger = O\left[L_1\ L_2\right]\begin{bmatrix} L_1^\dagger \\ -L_2^\dagger\end{bmatrix}O^\dagger.$$

Again using the invariance property of inertia under congruence transformations, we focus on the leading principal minors of $U \triangleq \left[L_1\ L_2\right]\begin{bmatrix} L_1^\dagger \\ -L_2^\dagger\end{bmatrix}$. Suppose $J = \{1, \ldots, j\}$. For $1 \le j \le k_0$, from the lower triangularity of $L$, it is clear that
$$\left|U_J^J\right| = \left|(L_1)_J^J\right|^2 = \prod_{i=1}^{j}|l_{ii}|^2 > 0.$$

For $k_0 + 1 \le j \le k_0 + k_1$, suppose $J_0 = \{1, \ldots, k_0\}$ and $J_1 = \{1, \ldots, j-k_0\}$. We then have
$$\begin{aligned}
\left|U_J^J\right|
&= \left|(L_1)_{J_0}^{J_0}\right|^2\left|-(L_3)_{J_1}^{J_1}\left[(L_3)_{J_1}^{J_1}\right]^\dagger\right| \\
&= (-1)^{j-k_0}\left|(L_1)_{J_0}^{J_0}\right|^2\left|(L_3)_{J_1}^{J_1}\right|^2 \\
&= (-1)^{j-k_0}\prod_{i=1}^{j}|l_{ii}|^2.
\end{aligned}$$
Therefore, the sequence $1, d_1, d_2, \ldots, d_{k_0+k_1}$ has $k_1$ sign changes, which implies that $G_I^I$, and hence $G$, has $k_1$ negative eigenvalues. Finally, we conclude that the proposition holds for $H$.
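
The inertia count at the heart of this proof is easy to confirm numerically for a random non-degenerate $A$; the sketch below forms $G = A_{S_0\backslash S_1}A_{S_0\backslash S_1}^\dagger - A_{S_1\backslash S_0}A_{S_1\backslash S_0}^\dagger$ for illustrative supports and counts the signs of its eigenvalues.

```python
# Numerical spot check of the inertia (k0, k1, M - k0 - k1) of G for a random real A.
import numpy as np

rng = np.random.default_rng(3)
M, N = 15, 30
S0, S1 = np.array([0, 1, 2, 3]), np.array([2, 3, 4, 5, 6])
d0, d1 = np.setdiff1d(S0, S1), np.setdiff1d(S1, S0)   # S0\S1 and S1\S0
k0, k1 = len(d0), len(d1)

A = rng.standard_normal((M, N))
G = A[:, d0] @ A[:, d0].T - A[:, d1] @ A[:, d1].T
eigs = np.linalg.eigvalsh(G)                          # G is symmetric
tol = 1e-9
print("positive:", np.sum(eigs > tol), " expected:", k0)
print("negative:", np.sum(eigs < -tol), " expected:", k1)
print("zero:    ", np.sum(np.abs(eigs) <= tol), " expected:", M - (k0 + k1))
```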

APPENDIX B
PROOF OF PROPOSITION 3

We first prove the first claim. From the proof of Proposition 1, it suffices to show that the sorted positive eigenvalues of $\Sigma_1^{-\frac{1}{2}}G\Sigma_1^{-\frac{1}{2}\dagger}$ are greater than those of $\frac{1}{\sigma^2}R_{33}R_{33}^\dagger$, where $G = A_{S_0\backslash S_1}A_{S_0\backslash S_1}^\dagger - A_{S_1\backslash S_0}A_{S_1\backslash S_0}^\dagger$. Since a cyclic permutation of a matrix product does not change its eigenvalues, we restrict ourselves to $\Sigma_1^{-1}G$. Consider the QR decomposition
$$\left[A_{S_1\backslash S_0}\ A_{S_1\cap S_0}\ A_{S_0\backslash S_1}\right] = QR \triangleq \left[Q_1\ Q_2\ Q_3\ Q_4\right]\begin{bmatrix} R_{11} & R_{12} & R_{13} \\ 0 & R_{22} & R_{23} \\ 0 & 0 & R_{33} \\ 0 & 0 & 0\end{bmatrix},$$
where $Q \in \mathbb{F}^{M\times M}$ is an orthogonal matrix with partitions $Q_1 \in \mathbb{F}^{M\times k_1}$, $Q_2 \in \mathbb{F}^{M\times k_i}$, $Q_3 \in \mathbb{F}^{M\times k_0}$, $R \in \mathbb{F}^{M\times(k_1+k_i+k_0)}$ is an upper triangular matrix with partitions $R_{11} \in \mathbb{F}^{k_1\times k_1}$, $R_{22} \in \mathbb{F}^{k_i\times k_i}$, $R_{33} \in \mathbb{F}^{k_0\times k_0}$, and the other submatrices have corresponding dimensions.

First, we note that
$$Q^\dagger G Q = \begin{bmatrix} R_{13} & R_{11} \\ R_{23} & 0 \\ R_{33} & 0 \\ 0 & 0\end{bmatrix}\begin{bmatrix} R_{13}^\dagger & R_{23}^\dagger & R_{33}^\dagger & 0 \\ -R_{11}^\dagger & 0 & 0 & 0\end{bmatrix} = \begin{bmatrix} \begin{bmatrix} R_{13} & R_{11} \\ R_{23} & 0\end{bmatrix}\begin{bmatrix} R_{13}^\dagger & R_{23}^\dagger \\ -R_{11}^\dagger & 0\end{bmatrix} & \begin{bmatrix} R_{13}R_{33}^\dagger \\ R_{23}R_{33}^\dagger\end{bmatrix} & 0 \\[2pt] \begin{bmatrix} R_{33}R_{13}^\dagger & R_{33}R_{23}^\dagger\end{bmatrix} & R_{33}R_{33}^\dagger & 0 \\[2pt] 0 & 0 & 0\end{bmatrix}.$$

Therefore, the last $M - (k_1+k_i+k_0)$ rows and columns of $Q^\dagger G Q$, and hence of $(Q^\dagger\Sigma_1 Q)^{-1}(Q^\dagger G Q)$, are zeros, which leads to the $M - (k_1+k_i+k_0)$ zero eigenvalues of $\Sigma_1^{-\frac{1}{2}}G\Sigma_1^{-\frac{1}{2}\dagger}$. We then drop these rows and columns in all matrices involved in the subsequent analysis. In particular, the submatrix of $Q^\dagger\Sigma_1 Q = Q^\dagger\left(\sigma^2 I_M + A_{S_1}A_{S_1}^\dagger\right)Q$ without the last $M - (k_1+k_i+k_0)$ rows and columns is
$$\sigma^2 I_{k_1+k_i+k_0} + \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22} \\ 0 & 0\end{bmatrix}\begin{bmatrix} R_{11}^\dagger & 0 & 0 \\ R_{12}^\dagger & R_{22}^\dagger & 0\end{bmatrix} = \begin{bmatrix} \sigma^2 I_{k_1+k_i} + \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22}\end{bmatrix}\begin{bmatrix} R_{11}^\dagger & 0 \\ R_{12}^\dagger & R_{22}^\dagger\end{bmatrix} & 0 \\ 0 & \sigma^2 I_{k_0}\end{bmatrix} \triangleq \begin{bmatrix} F & 0 \\ 0 & \sigma^2 I_{k_0}\end{bmatrix}.$$

Define
$$\begin{bmatrix} V & K^\dagger \\ K & R_{33}R_{33}^\dagger\end{bmatrix} \triangleq \begin{bmatrix} \begin{bmatrix} R_{13} & R_{11} \\ R_{23} & 0\end{bmatrix}\begin{bmatrix} R_{13}^\dagger & R_{23}^\dagger \\ -R_{11}^\dagger & 0\end{bmatrix} & \begin{bmatrix} R_{13}R_{33}^\dagger \\ R_{23}R_{33}^\dagger\end{bmatrix} \\[2pt] \begin{bmatrix} R_{33}R_{13}^\dagger & R_{33}R_{23}^\dagger\end{bmatrix} & R_{33}R_{33}^\dagger\end{bmatrix}.$$

Due to the invariance of eigenvalues with respect to orthogonal transformations, and switching to the symmetrized version, we focus on
$$\begin{bmatrix} F & 0 \\ 0 & \sigma^2 I_{k_0}\end{bmatrix}^{-\frac{1}{2}}\begin{bmatrix} V & K^\dagger \\ K & R_{33}R_{33}^\dagger\end{bmatrix}\begin{bmatrix} F & 0 \\ 0 & \sigma^2 I_{k_0}\end{bmatrix}^{-\frac{1}{2}\dagger} = \begin{bmatrix} F^{-\frac{1}{2}}VF^{-\frac{1}{2}\dagger} & \frac{1}{\sigma}F^{-\frac{1}{2}}K^\dagger \\ \frac{1}{\sigma}KF^{-\frac{1}{2}} & \frac{1}{\sigma^2}R_{33}R_{33}^\dagger\end{bmatrix}.$$

Next we argue that the sorted positive eigenvalues of $\begin{bmatrix} F^{-\frac{1}{2}}VF^{-\frac{1}{2}\dagger} & \frac{1}{\sigma}F^{-\frac{1}{2}}K^\dagger \\ \frac{1}{\sigma}KF^{-\frac{1}{2}} & \frac{1}{\sigma^2}R_{33}R_{33}^\dagger\end{bmatrix}$ are greater than the corresponding sorted eigenvalues of $\frac{1}{\sigma^2}R_{33}R_{33}^\dagger$.

For any $\varepsilon > 0$, we define a matrix $M_{\varepsilon,N} = \begin{bmatrix} -N I_{k_1+k_i} & 0 \\ 0 & \frac{1}{\sigma^2}R_{33}R_{33}^\dagger - \varepsilon I_{k_0}\end{bmatrix}$. Then we have

$$\begin{bmatrix} F^{-\frac{1}{2}}VF^{-\frac{1}{2}\dagger} & \frac{1}{\sigma}F^{-\frac{1}{2}}K^\dagger \\ \frac{1}{\sigma}KF^{-\frac{1}{2}} & \frac{1}{\sigma^2}R_{33}R_{33}^\dagger\end{bmatrix} - M_{\varepsilon,N} = \begin{bmatrix} F^{-\frac{1}{2}}VF^{-\frac{1}{2}\dagger} + N I_{k_1+k_i} & \frac{1}{\sigma}F^{-\frac{1}{2}}K^\dagger \\ \frac{1}{\sigma}KF^{-\frac{1}{2}} & \varepsilon I_{k_0}\end{bmatrix}.$$

Note that $\begin{bmatrix} F^{-\frac{1}{2}}VF^{-\frac{1}{2}\dagger} + N I_{k_1+k_i} & \frac{1}{\sigma}F^{-\frac{1}{2}}K^\dagger \\ \frac{1}{\sigma}KF^{-\frac{1}{2}} & \varepsilon I_{k_0}\end{bmatrix}$ is congruent to
$$\begin{bmatrix} F^{-\frac{1}{2}}VF^{-\frac{1}{2}\dagger} + N I_{k_1+k_i} - \frac{1}{\varepsilon\sigma^2}F^{-\frac{1}{2}}K^\dagger K F^{-\frac{1}{2}} & 0 \\ 0 & \varepsilon I_{k_0}\end{bmatrix}.$$

Clearly $F^{-\frac{1}{2}}VF^{-\frac{1}{2}\dagger} + N I_{k_1+k_i} - \frac{1}{\varepsilon\sigma^2}F^{-\frac{1}{2}}K^\dagger K F^{-\frac{1}{2}}$ is positive definite when $N$ is sufficiently large. Hence, when $N$ is large enough, we obtain
$$\begin{bmatrix} F^{-\frac{1}{2}}VF^{-\frac{1}{2}\dagger} & \frac{1}{\sigma}F^{-\frac{1}{2}}K^\dagger \\ \frac{1}{\sigma}KF^{-\frac{1}{2}} & \frac{1}{\sigma^2}R_{33}R_{33}^\dagger\end{bmatrix} \succ M_{\varepsilon,N}.$$

Using Corollary 4.3.3 of [49], we conclude that the eigenvalues of $\begin{bmatrix} F^{-\frac{1}{2}}VF^{-\frac{1}{2}\dagger} & \frac{1}{\sigma}F^{-\frac{1}{2}}K^\dagger \\ \frac{1}{\sigma}KF^{-\frac{1}{2}} & \frac{1}{\sigma^2}R_{33}R_{33}^\dagger\end{bmatrix}$ are greater than those of $M_{\varepsilon,N}$ if sorted. From Proposition 1, we know that $\begin{bmatrix} F^{-\frac{1}{2}}VF^{-\frac{1}{2}\dagger} & \frac{1}{\sigma}F^{-\frac{1}{2}}K^\dagger \\ \frac{1}{\sigma}KF^{-\frac{1}{2}} & \frac{1}{\sigma^2}R_{33}R_{33}^\dagger\end{bmatrix}$ has exactly $k_0$ positive eigenvalues, which are the only eigenvalues that could be greater than $\lambda\left(\frac{1}{\sigma^2}R_{33}R_{33}^\dagger\right) - \varepsilon$. Since $\varepsilon$ is arbitrary, we finally conclude that the positive eigenvalues of $\Sigma_1^{-1}G$ are greater than those of $\frac{1}{\sigma^2}R_{33}R_{33}^\dagger$ if sorted in the same way.

For the second claim, we need some notation and properties of symmetric and Hermitian matrices. For any pair of symmetric (or Hermitian) matrices $P$ and $Q$, $P \prec Q$ means that $Q - P$ is positive definite and $P \preceq Q$ means that $Q - P$ is nonnegative definite. Note that if $P$ and $Q$ are positive definite, then from Corollary 7.7.4 of [49], $P \preceq Q$ if and only if $Q^{-1} \preceq P^{-1}$; if $P \preceq Q$, then the eigenvalues of $P$ and $Q$ satisfy $\lambda_k(P) \le \lambda_k(Q)$, where $\lambda_k(P)$ denotes the $k$th largest eigenvalue of $P$; furthermore, $A \preceq B$ implies that $PAP^\dagger \preceq PBP^\dagger$ for any $P$, square or rectangular. Therefore, $\sigma^2 I_M + A_{S_0\cap S_1}A_{S_0\cap S_1}^\dagger \preceq \sigma^2 I_M + A_{S_1}A_{S_1}^\dagger = \Sigma_1$ yields
$$\Sigma_0^{1/2}\Sigma_1^{-1}\Sigma_0^{1/2} \preceq \Sigma_0^{1/2}\left(\sigma^2 I_M + A_{S_0\cap S_1}A_{S_0\cap S_1}^\dagger\right)^{-1}\Sigma_0^{1/2}.$$

Recall from the definition of eigenvalues that the non-zero eigenvalues of $AB$ and $BA$ are the same for any matrices $A$ and $B$. Since we are interested only in the eigenvalues, a cyclic permutation in the matrix product on the previous inequality's right-hand side gives us
$$\begin{aligned}
&\left(\sigma^2 I_M + A_{S_0\cap S_1}A_{S_0\cap S_1}^\dagger\right)^{-\frac{1}{2}}\Sigma_0\left(\sigma^2 I_M + A_{S_0\cap S_1}A_{S_0\cap S_1}^\dagger\right)^{-\frac{1}{2}} \\
&\quad= I_M + \left(\sigma^2 I_M + A_{S_0\cap S_1}A_{S_0\cap S_1}^\dagger\right)^{-\frac{1}{2}}A_{S_0\backslash S_1}A_{S_0\backslash S_1}^\dagger\left(\sigma^2 I_M + A_{S_0\cap S_1}A_{S_0\cap S_1}^\dagger\right)^{-\frac{1}{2}} \\
&\quad\triangleq I_M + Q^{-\frac{1}{2}}A_{S_0\backslash S_1}A_{S_0\backslash S_1}^\dagger Q^{-\frac{1}{2}} \\
&\quad\triangleq I_M + P.
\end{aligned}$$

Until now we have shown that the sorted eigenvalues of $H$ are less than the corresponding ones of $I_M + P$. The non-zero eigenvalues of $Q^{-\frac{1}{2}}A_{S_0\backslash S_1}A_{S_0\backslash S_1}^\dagger Q^{-\frac{1}{2}}$ are the same as the non-zero eigenvalues of $A_{S_0\backslash S_1}^\dagger Q^{-1}A_{S_0\backslash S_1} \preceq \frac{1}{\sigma^2}A_{S_0\backslash S_1}^\dagger A_{S_0\backslash S_1}$. Using the same fact again, we conclude that the non-zero eigenvalues of $\frac{1}{\sigma^2}A_{S_0\backslash S_1}^\dagger A_{S_0\backslash S_1}$ are the same as the non-zero eigenvalues of $\frac{1}{\sigma^2}A_{S_0\backslash S_1}A_{S_0\backslash S_1}^\dagger$. Therefore, we obtain that
$$\lambda_k\left(\Sigma_0^{1/2}\Sigma_1^{-1}\Sigma_0^{1/2}\right) \le \lambda_k\left(I_M + P\right) \le \lambda_k\left(I_M + \frac{1}{\sigma^2}A_{S_0\backslash S_1}A_{S_0\backslash S_1}^\dagger\right).$$

In particular, the eigenvalues of $H$ that are greater than 1 are upper bounded by the corresponding ones of $I_M + \frac{1}{\sigma^2}A_{S_0\backslash S_1}A_{S_0\backslash S_1}^\dagger$ if they are both sorted in ascending order. Hence, we get that the eigenvalues of $H$ that are greater than 1 are less than those of $I_{k_0} + \frac{1}{\sigma^2}A_{S_0\backslash S_1}^\dagger A_{S_0\backslash S_1}$.

Therefore, the conclusion of the second part of the proposition holds. We comment here that usually it is not true that $H \preceq I_M + \frac{1}{\sigma^2}A_{S_0\backslash S_1}A_{S_0\backslash S_1}^\dagger$. Only the inequality on the eigenvalues holds.


APPENDIX C
PROOF OF LEMMA 1

For arbitrary fixed supports $S_i$, $S_j$, we have
$$\lambda_{S_i,S_j} \ge 1 + \frac{1}{2\kappa\sigma^2}\left(\prod_{l=1}^{k_d}2\kappa r_{ll}^2\right)^{1/k_d} \ge \frac{1}{2\kappa\sigma^2}\min_{1\le l\le k_d}q_l,$$

where $2\kappa r_{ll}^2 \sim \chi^2_{2\kappa(M-K-l+1)}$ can be written as a sum of $2\kappa(M-K-l+1)$ independent squared standard Gaussian random variables and $q_l \sim \chi^2_{2\kappa(M-2K)}$ is obtained by dropping $2\kappa(K-l+1)$ of them. Therefore, using the union bound we obtain
$$\mathbb{P}\left\{\lambda_{S_i,S_j} \le \gamma\right\} \le \mathbb{P}\left\{\frac{1}{2\kappa\sigma^2}\min_{1\le l\le k_d}q_l \le \gamma\right\} \le \mathbb{P}\left\{\bigcup_{1\le l\le k_d}\left[q_l \le 2\kappa\sigma^2\gamma\right]\right\} \le k_d\,\mathbb{P}\left\{q_l \le 2\kappa\sigma^2\gamma\right\}.$$

Since $\gamma = \frac{1}{3}\frac{M-2K}{\sigma^2}$ implies that $2\kappa\sigma^2\gamma = \frac{2\kappa}{3}(M-2K) < 2\kappa(M-2K) - 2$, the mode of $\chi^2_{2\kappa(M-2K)}$, when $M-2K$ is sufficiently large, we have
$$\mathbb{P}\left\{q_l \le 2\kappa\sigma^2\gamma\right\} = \int_{0}^{2\kappa\sigma^2\gamma}\frac{(1/2)^{\kappa(M-2K)}}{\Gamma\left(\kappa(M-2K)\right)}x^{\kappa(M-2K)-1}e^{-x/2}\,dx \le \frac{\left[\kappa\sigma^2\gamma\right]^{\kappa(M-2K)}}{\Gamma\left(\kappa(M-2K)\right)}e^{-\kappa\sigma^2\gamma}.$$

The inequality $\log\Gamma(z) \ge \left(z - \frac{1}{2}\right)\log z - z$ says that when $M-2K$ is large enough,
$$\begin{aligned}
\mathbb{P}\left\{q_l \le 2\kappa\sigma^2\gamma\right\}
&\le \exp\Big\{\kappa(M-2K)\log\left(\kappa\sigma^2\gamma\right) - \kappa\sigma^2\gamma \\
&\qquad\quad - \left[\kappa(M-2K) - \tfrac{1}{2}\right]\log\left[\kappa(M-2K)\right] + \kappa(M-2K)\Big\} \\
&\le \exp\left\{-c(M-2K)\right\},
\end{aligned}$$

where $c < \kappa(\log 3 - 1)$. Therefore, we have
$$\mathbb{P}\left\{\lambda_{S_i,S_j} \le \gamma\right\} \le K\exp\left\{-c(M-2K)\right\}.$$

ACKNOWLEDGMENT

The authors thank the anonymous referees for their careful and helpful comments.

REFERENCES

[1] M. B. Wakin, S. Sarvotham, M. F. Duarte, D. Baron, and R. G. Baraniuk, "Recovery of jointly sparse signals from few random projections," in Proc. Neural Inform. Processing Systems, Vancouver, Canada, Dec. 2005, pp. 1435–1442.
[2] E. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[3] D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[4] E. Candes and T. Tao, "Decoding by linear programming," IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
[5] E. Candes and M. Wakin, "An introduction to compressive sampling," IEEE Signal Process. Mag., vol. 25, no. 2, pp. 21–30, Mar. 2008.
[6] R. Baraniuk, "Compressive sensing [lecture notes]," IEEE Signal Process. Mag., vol. 24, no. 4, pp. 118–121, Jul. 2007.
[7] D. Malioutov, M. Cetin, and A. Willsky, "A sparse signal reconstruction perspective for source localization with sensor arrays," IEEE Trans. Signal Process., vol. 53, no. 8, pp. 3010–3022, Aug. 2005.
[8] D. Model and M. Zibulevsky, "Signal reconstruction in sensor arrays using sparse representations," Signal Processing, vol. 86, no. 3, pp. 624–638, Mar. 2006.
[9] V. Cevher, M. Duarte, and R. G. Baraniuk, "Distributed target localization via spatial sparsity," in European Signal Processing Conference (EUSIPCO 2008), Lausanne, Switzerland, Aug. 2008.
[10] V. Cevher, P. Indyk, C. Hegde, and R. G. Baraniuk, "Recovery of clustered sparse signals from compressive measurements," in Int. Conf. Sampling Theory and Applications (SAMPTA 2009), Marseille, France, May 2009, pp. 18–22.
[11] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comp., vol. 20, no. 1, pp. 33–61, 1998.
[12] S. Mallat, A Wavelet Tour of Signal Processing. San Diego, CA: Academic, 1998.
[13] I. Gorodnitsky and B. Rao, "Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm," IEEE Trans. Signal Process., vol. 45, no. 3, pp. 600–616, Mar. 1997.
[14] E. Candes and T. Tao, "The Dantzig selector: Statistical estimation when p is much larger than n," Ann. Statist., vol. 35, pp. 2313–2351, 2007.
[15] J. Tropp and A. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.
[16] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing signal reconstruction," IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2230–2249, May 2009.
[17] D. Needell and J. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Appl. Comput. Harmonic Anal., vol. 26, no. 3, pp. 301–321, May 2009.
[18] P. Borgnat and P. Flandrin, "Time-frequency localization from sparsity constraints," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP 2008), Las Vegas, NV, Apr. 2008, pp. 3785–3788.
[19] Z. Tian and G. Giannakis, "Compressed sensing for wideband cognitive radios," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP 2007), Honolulu, HI, Apr. 2007, pp. IV-1357–IV-1360.
[20] M. A. Sheikh, S. Sarvotham, O. Milenkovic, and R. G. Baraniuk, "DNA array decoding from nonlinear measurements by belief propagation," in Proc. IEEE Workshop Statistical Signal Processing (SSP 2007), Madison, WI, Aug. 2007, pp. 215–219.
[21] H. Vikalo, F. Parvaresh, and B. Hassibi, "On recovery of sparse signals in compressed DNA microarrays," in Proc. Asilomar Conf. Signals, Systems and Computers (ACSSC 2007), Pacific Grove, CA, Nov. 2007, pp. 693–697.
[22] F. Parvaresh, H. Vikalo, S. Misra, and B. Hassibi, "Recovering sparse signals using sparse measurement matrices in compressed DNA microarrays," IEEE J. Sel. Topics Signal Processing, vol. 2, no. 3, pp. 275–285, Jun. 2008.
[23] H. Vikalo, F. Parvaresh, S. Misra, and B. Hassibi, "Sparse measurements, compressed sampling, and DNA microarrays," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP 2008), Las Vegas, NV, Apr. 2008, pp. 581–584.
[24] R. Baraniuk and P. Steeghs, "Compressive radar imaging," in IEEE Radar Conference, Apr. 2007, pp. 128–133.
[25] M. Herman and T. Strohmer, "Compressed sensing radar," in IEEE Radar Conference, May 2008, pp. 1–6.
[26] M. A. Herman and T. Strohmer, "High-resolution radar via compressed sensing," IEEE Trans. Signal Process., vol. 57, no. 6, pp. 2275–2284, Jun. 2009.
[27] E. G. Larsson and Y. Selen, "Linear regression with a sparse parameter vector," IEEE Trans. Signal Process., vol. 55, no. 2, pp. 451–460, Feb. 2007.
[28] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques. Englewood Cliffs, NJ: Prentice Hall, 1993.
[29] J. Capon, "High-resolution frequency-wavenumber spectrum analysis," Proceedings of the IEEE, vol. 57, no. 8, pp. 1408–1418, Aug. 1969.
[30] R. Schmidt, "Multiple emitter location and signal parameter estimation," IEEE Trans. Antennas Propag., vol. 34, no. 3, pp. 276–280, Mar. 1986.
[31] G. Bienvenu and L. Kopp, "Adaptivity to background noise spatial coherence for high resolution passive methods," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP 1980), vol. 5, Denver, CO, Apr. 1980, pp. 307–310.
[32] P. Stoica and A. Nehorai, "MUSIC, maximum likelihood, and Cramer-Rao bound: Further results and comparisons," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, no. 12, pp. 2140–2150, Dec. 1990.
[33] M. Duarte, S. Sarvotham, D. Baron, M. Wakin, and R. Baraniuk, "Distributed compressed sensing of jointly sparse signals," in Proc. Asilomar Conf. Signals, Systems and Computers (ACSSC 2005), Pacific Grove, CA, Nov. 2005, pp. 1537–1541.
[34] M. Duarte, M. Wakin, D. Baron, and R. Baraniuk, "Universal distributed sensing via random projections," in Int. Conf. Information Processing in Sensor Networks (IPSN 2006), Nashville, TN, Apr. 2006, pp. 177–185.
[35] M. Fornasier and H. Rauhut, "Recovery algorithms for vector-valued data with joint sparsity constraints," SIAM J. Numer. Anal., vol. 46, no. 2, pp. 577–613, 2008.
[36] M. Mishali and Y. Eldar, "The continuous joint sparsity prior for sparse representations: Theory and applications," in IEEE Int. Workshop Computational Advances in Multi-Sensor Adaptive Processing (CAMPSAP 2007), St. Thomas, U.S. Virgin Islands, Dec. 2007, pp. 125–128.
[37] S. Cotter, B. Rao, K. Engan, and K. Kreutz-Delgado, "Sparse solutions to linear inverse problems with multiple measurement vectors," IEEE Trans. Signal Process., vol. 53, no. 7, pp. 2477–2488, Jul. 2005.
[38] J. Chen and X. Huo, "Sparse representations for multiple measurement vectors (MMV) in an over-complete dictionary," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP 2005), Philadelphia, PA, Mar. 2005, pp. 257–260.
[39] ——, "Theoretical results on sparse representations of multiple-measurement vectors," IEEE Trans. Signal Process., vol. 54, no. 12, pp. 4634–4643, Dec. 2006.
[40] J. A. Tropp, A. C. Gilbert, and M. J. Strauss, "Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit," Signal Processing, vol. 86, no. 3, pp. 572–588, 2006.
[41] J. A. Tropp, "Algorithms for simultaneous sparse approximation. Part II: Convex relaxation," Signal Processing, vol. 86, no. 3, pp. 589–602, 2006.
[42] S. Aeron, M. Zhao, and V. Saligrama, "Information theoretic bounds to performance of compressed sensing and sensor networks," Preprint, 2008. [Online]. Available: http://arxiv.org/pdf/0804.3439v4
[43] M. Wainwright, "Information-theoretic bounds on sparsity recovery in the high-dimensional and noisy setting," in IEEE Int. Symp. Information Theory (ISIT 2007), Nice, France, Jun. 2007, pp. 961–965.
[44] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I. New York: Wiley-Interscience, 2001.
[45] A. K. Gupta and D. K. Nagar, Matrix Variate Distributions. Boca Raton, FL: Chapman & Hall/CRC, 1999.
[46] Y. S. Chow and H. Teicher, Probability Theory: Independence, Interchangeability, Martingales. New York: Springer-Verlag, 1988.
[47] R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde, "Model-based compressive sensing," submitted for publication.
[48] D. S. Bernstein, Matrix Mathematics: Theory, Facts, and Formulas with Application to Linear Systems Theory. Princeton, NJ: Princeton University Press, 2005.
[49] R. A. Horn and C. R. Johnson, Matrix Analysis. New York: Cambridge University Press, 1990.
[50] O. Milenkovic, H. Pham, and W. Dai, "Sublinear compressive sensing reconstruction via belief propagation decoding," in Int. Symp. Information Theory (ISIT 2009), Seoul, South Korea, Jul. 2009, pp. 674–678.
[51] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley-Interscience, 2006.
[52] S. Kullback, Information Theory and Statistics. New York: Dover Publications, 1997.
[53] Y. G. Yatracos, "A lower bound on the error in nonparametric regression type problems," Ann. Statist., vol. 16, no. 3, pp. 1180–1187, 1988.
[54] L. Birge, "Approximation dans les espaces metriques et theorie de l'estimation," Probability Theory and Related Fields, vol. 65, no. 2, pp. 181–237, Jun. 1983.
[55] A. Dogandzic and A. Nehorai, "Space-time fading channel estimation and symbol detection in unknown spatially correlated noise," IEEE Trans. Signal Process., vol. 50, no. 3, pp. 457–474, Mar. 2002.
[56] H. L. Van Trees, Optimum Array Processing. New York: Wiley-Interscience, 2002.
[57] G. Tang and A. Nehorai, "Support recovery for source localization based on overcomplete signal representation," submitted to Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP 2010).
[58] D. L. Donoho and P. B. Stark, "Uncertainty principles and signal recovery," SIAM Journal on Applied Mathematics, vol. 49, no. 3, pp. 906–931, 1989.
[59] D. L. Donoho and B. F. Logan, "Signal recovery and the large sieve," SIAM Journal on Applied Mathematics, vol. 52, no. 2, pp. 577–591, 1992.
[60] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, 3rd ed. Hoboken, NJ: Wiley-Interscience, 2003.
[61] A. K. Fletcher, S. Rangan, V. K. Goyal, and K. Ramchandran, "Denoising by sparse approximation: Error bounds based on rate-distortion theory," EURASIP Journal on Applied Signal Processing, vol. 2006, pp. 1–19, 2006.
[62] R. A. DeVore, "Deterministic constructions of compressed sensing matrices," J. Complexity, vol. 23, no. 4-6, pp. 918–925, 2007.
[63] I. Gohberg, P. Lancaster, and L. Rodman, Indefinite Linear Algebra and Applications. Basel, Switzerland: Birkhauser, 2005.
[64] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Baltimore, MD: Johns Hopkins University Press, 1996.

Gongguo Tang earned his B.Sc. degree in Mathematics from Shandong University, China, in 2003, and the M.Sc. degree in System Science from the Chinese Academy of Sciences, China, in 2006.

Currently, he is a Ph.D. candidate with the Department of Electrical and Systems Engineering, Washington University, under the guidance of Dr. Arye Nehorai. His research interests are in the area of Compressive Sensing, statistical signal processing, detection and estimation, and their applications.

Arye Nehorai (S'80-M'83-SM'90-F'94) earned his B.Sc. and M.Sc. degrees in electrical engineering from the Technion–Israel Institute of Technology, Haifa, Israel, and the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA.

From 1985 to 1995, he was a Faculty Member with the Department of Electrical Engineering at Yale University. In 1995, he became a Full Professor in the Department of Electrical Engineering and Computer Science at The University of Illinois at Chicago (UIC). From 2000 to 2001, he was Chair of the Electrical and Computer Engineering (ECE) Division, which then became a new department. In 2001, he was named University Scholar of the University of Illinois. In 2006, he became Chairman of the Department of Electrical and Systems Engineering at Washington University in St. Louis. He is the inaugural holder of the Eugene and Martha Lohman Professorship and the Director of the Center for Sensor Signal and Information Processing (CSSIP) at WUSTL since 2006.

Dr. Nehorai was Editor-in-Chief of the IEEE TRANSACTIONS ON SIGNAL PROCESSING from 2000 to 2002. From 2003 to 2005, he was Vice President (Publications) of the IEEE Signal Processing Society (SPS), Chair of the Publications Board, member of the Board of Governors, and member of the Executive Committee of this Society. From 2003 to 2006, he was the founding editor of the special columns on Leadership Reflections in the IEEE Signal Processing Magazine. He was co-recipient of the IEEE SPS 1989 Senior Award for Best Paper with P. Stoica, coauthor of the 2003 Young Author Best Paper Award, and co-recipient of the 2004 Magazine Paper Award with A. Dogandzic. He was elected Distinguished Lecturer of the IEEE SPS for the term 2004 to 2005 and received the 2006 IEEE SPS Technical Achievement Award. He is the Principal Investigator of the new multidisciplinary university research initiative (MURI) project entitled Adaptive Waveform Diversity for Full Spectral Dominance. He has been a Fellow of the Royal Statistical Society since 1996.