

arXiv:1306.6239v2 [cs.IT] 29 Apr 2014

Near-Optimal Adaptive Compressed Sensing
Matthew L. Malloy, Member, IEEE, and Robert D. Nowak, Fellow, IEEE

Abstract—This paper proposes a simple adaptive sensing and group testing algorithm for sparse signal recovery. The algorithm, termed Compressive Adaptive Sense and Search (CASS), is shown to be near-optimal in that it succeeds at the lowest possible signal-to-noise-ratio (SNR) levels, improving on previous work in adaptive compressed sensing [2]–[5]. Like traditional compressed sensing based on random non-adaptive design matrices, the CASS algorithm requires only k log n measurements to recover a k-sparse signal of dimension n. However, CASS succeeds at SNR levels that are a factor log n less than required by standard compressed sensing. From the point of view of constructing and implementing the sensing operation as well as computing the reconstruction, the proposed algorithm is substantially less computationally intensive than standard compressed sensing. CASS is also demonstrated to perform considerably better in practice through simulation. To the best of our knowledge, this is the first demonstration of an adaptive compressed sensing algorithm with near-optimal theoretical guarantees and excellent practical performance. This paper also shows that methods like compressed sensing, group testing, and pooling have an advantage beyond simply reducing the number of measurements or tests – adaptive versions of such methods can also improve detection and estimation performance when compared to non-adaptive direct (uncompressed) sensing.

I. INTRODUCTION

Compressed sensing (CS) has had a tremendous impact on signal processing, machine learning, and statistics, fundamentally changing the way we think about sensing and data acquisition. Beyond the standard compressed sensing methods that gave birth to the field, the compressive framework naturally suggests an ability to make measurements in an on-line and adaptive manner. Adaptive sensing uses previously collected measurements to guide the design and selection of the next measurement in order to optimize the gain of new information.

There is now a reasonably complete understanding of the potential advantages of adaptive sensing over non-adaptive sensing, the main one being that adaptive sensing can reliably recover sparse signals at lower SNRs than non-adaptive sensing (where SNR is proportional to the squared amplitude of the weakest non-zero element). Roughly speaking, to recover a k-sparse signal of length n, standard (non-adaptive) sensing requires the SNR to grow like log n. Adaptive sensing, on the other hand, succeeds as long as the SNR scales like log k.

Manuscript received May 24, 2013; revised November 7, 2013; accepted April 19, 2014. This work was partially supported by AFOSR grant FA9550-09-1-0140, NSF grant CCF-1218189 and the DARPA KECoM program. The material in this paper was presented in part at the Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA [1], November, 2012.

M. L. Malloy and R. D. Nowak are with the Department of Electrical and Computer Engineering, University of Wisconsin, Madison, WI 53715 USA (e-mail: [email protected], [email protected]).

This is a significant improvement, especially in high-dimensional regimes. In terms of the number of measurements, both standard compressed sensing and adaptive compressed sensing require about k log n measurements.
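As a rough sense of scale (an illustrative calculation, not from the paper), the snippet below compares the two SNR scalings for a moderately high-dimensional problem; the particular values of n and k are assumed only for illustration.

```python
import math

# Illustrative comparison of the SNR scalings log(n) vs. log(k); values assumed.
n, k = 10**6, 10
print(math.log(n))                 # ~13.8: scaling needed by non-adaptive sensing
print(math.log(k))                 # ~2.3:  scaling sufficient for adaptive sensing
print(math.log(n) / math.log(k))   # ~6x reduction in the required SNR
```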

This paper makes two main contributions in adaptive sensing. First, we propose a simple adaptive sensing algorithm, termed Compressive Adaptive Sense and Search (CASS), which is proved to be near-optimal in that it succeeds if the SNR scales like log k. From the point of view of constructing and implementing the sensing operation as well as computing the reconstruction, the CASS algorithm is comparatively less computationally intensive than standard compressed sensing. The algorithm could be easily realized in a number of existing compressive systems, including those based on digital micromirror devices [6], [7]. Second, CASS is demonstrated in simulation to perform considerably better than 1) compressed sensing based on random Gaussian sensing matrices and 2) non-adaptive direct sensing (which requires m = n measurements). To the best of our knowledge, this is the first demonstration of an adaptive sensing algorithm with near-optimal theoretical guarantees and excellent practical performance.

The results presented in this paper answer the following important question. Can sensing systems that make adaptive measurements significantly outperform non-adaptive systems in practice? Perhaps not surprisingly, the answer is yes, contrary to the theme of [8]. This is for two reasons. First, while the required SNR for success of adaptive procedures is at best a logarithmic factor in dimension smaller than for non-adaptive procedures, this factor is significant for problems of even modest size. On test signals (natural and synthetic images, see Figs. 5 and 6) the CASS procedure consistently outperforms standard compressed sensing by 2-8 dB. Second, in terms of computational complexity, adaptive sensing algorithms such as CASS require little or no computation after the measurement stage of the procedure is completed; in general, non-adaptive algorithms involve computation- and memory-intensive optimization routines after measurements are gathered.

This work also sheds light on another relevant question. Can compressive sensing systems with an SNR budget ever hope to outperform direct non-compressive systems which measure each element of an unknown vector directly? The answer to this question is somewhat surprising. For the task of support recovery, when the measurements are collected non-adaptively, compressive techniques are always less reliable than direct sensing with n measurements. However, adaptive compressed sensing with just k log n measurements can be more reliable than non-adaptive direct sensing, provided the signal is sufficiently sparse.


This means that methods like compressed sensing, group testing, and pooling may have an advantage beyond simply reducing the number of measurements – adaptive versions of such methods can also improve detection and estimation performance.

Lastly, while adaptive sensing results in improved performance, we mention a few limitations. Making adaptive measurements can necessitate more flexible measurement hardware, incurring a cost in hardware design. In some systems, adaptive measurements could be easily realized (for example, existing compressive systems based on digital micromirror devices [6], [7]). In other systems, however, hardware constraints may make adaptive sensing entirely impractical. The results in this paper assume adaptive measurements can be made and the additive Gaussian noise model is applicable; both requirements are made precise in the following section.

A. Problem Setup and Background

Sparse support recovery in compressed sensing refers to the following problem. Let x ∈ R^n be an unknown signal with k ≪ n non-zero entries. The unknown signal is measured through m linear projections of the form

$$y_i = \langle a_i, x\rangle + z_i, \qquad i = 1,\ldots,m \qquad (1)$$

where z_i are i.i.d. N(0, 1), and a_i ∈ R^n are sensing vectors with a total sensing energy constraint

$$\sum_{i=1}^{m} \|a_i\|_2^2 \leq M. \qquad (2)$$

The goal of the support recovery problem is to identify the locations of the non-zero entries of x. Roughly speaking, the paramount result of compressed sensing states this is possible if 1) m is greater than a constant times k log n (which can be much less than n), and 2) the amplitude of the non-zero entries of x, and the total sensing energy M, aren't too small.

This paper is concerned with relaxing the second requirement through adaptivity. If the sensing vectors a_1, ..., a_m are fixed prior to making any measurements (the non-adaptive setting), then a necessary condition for exact support recovery is that the smallest non-zero entry of x be greater than a constant times $\sqrt{\frac{n}{M}\log n}$ [9], [10]. The factor M/n is best interpreted as the sensing energy per dimension, and log n is required to control the error rate across the n dimensions. Standard compressed sensing approaches based on random sensing matrices are known to achieve this lower bound while requiring on the order of k log n measurements [11].

In adaptive sensing, a_i can be a function of y_1, ..., y_{i-1}. The sensing vectors a_1, ..., a_m are not fixed prior to making observations, but instead depend on previous measurements. In this adaptive scenario, a necessary condition for sparse recovery² is that the smallest non-zero entry exceed a constant times $\sqrt{\frac{n}{M}\log k}$ [12]. To the best of our knowledge, no previously proposed algorithm achieves this lower bound while also using only order k log n measurements.

A handful of adaptive sensing procedures have been proposed, some coming close to meeting the lower bound while requiring only order k log n measurements.

TABLE I
ASYMPTOTIC REQUIREMENTS FOR EXACT SUPPORT RECOVERY¹ (n, k → ∞, k/n → 0)

Direct, non-adaptive: 1) m ≥ n; 2) $x_{\min} > \sqrt{2\,\frac{n}{M}\log n}$ (necessary and sufficient; traditional detection problem).
Direct, adaptive: 1) m > n; 2) $x_{\min} > \sqrt{2\,\frac{n}{M}\log k}$ (necessary [13]; sufficient [14]).
Compressive, non-adaptive: 1) m ≥ k log n; 2) $x_{\min} > \sqrt{C\,\frac{n}{M}\log n}$ (necessary [9], [10]; sufficient [11]).
Compressive, adaptive: 1) m ≥ k log n; 2) $x_{\min} > \sqrt{C\,\frac{n}{M}\log k}$ (necessary [12]²; sufficient using CASS³).

Most recently, in [5], the authors propose an algorithm that guarantees exact recovery provided the smallest non-zero entry exceeds an unspecified constant times $\sqrt{\frac{n}{M}(\log k + \log\log_2\log n)}$, coming within a triply logarithmic factor of the lower bound. While this triply logarithmic factor is not of practical concern, removing the suboptimal dependence on n is of theoretical interest. Reducing the leading constant, on the other hand, is of great practical concern.

Table I summarizes the necessary and sufficient conditions for exact support recovery in the compressive, direct, non-adaptive and adaptive settings. Direct or un-compressed refers to the setting where individual measurements are made of each component – specifically, each sensing vector is supported on only one index. Here, support recovery is a traditional detection problem, and the non-adaptive SNR requirement becomes apparent by considering the following. If we measure each element of x once using sensing vectors that form an identity matrix when stacked together, then M = n, and the requirement implies the smallest non-zero entry must be greater than $\sqrt{2\log n}$. It is well known that the maximum of n i.i.d. Gaussians grows as $\sqrt{2\log n}$; the smallest signal must exceed this value.

Adaptive direct sensing has recently received a fair amount of attention in the context of sparse recovery [14]–[22], and is closely related to traditional work in sequential statistical analysis. In this non-compressive setting, the number of measurements must be at least on the order of the dimension, i.e., m ≥ n, as testing procedures must at a minimum measure each index once. In this setting, a number of works have shown that scaling of the SNR as log k is necessary and sufficient for exact recovery [14], [18].

To summarize, Table I highlights the potential gains from compressive and adaptive measurements. Allowing for compressive measurements can reduce the total number of measurements from n to k log n, but does not relax the SNR requirement.

¹We write g(n) > f(n) as shorthand for lim_{n→∞} g(n)/f(n) > 1.
²Necessary to control the expected symmetric set difference over a slightly larger class of problems. See Sec. III-C and [12, Proposition 4.2].
³Sufficient for exact support recovery with Alg. 1 if x_i ≥ 0, i = 1, ..., n, or if k ≤ log n; sufficient to recover any fixed fraction of the support for any k, x that satisfy the condition. See Theorems 1 and 2.


Allowing for adaptive measurements does not reduce the total number of measurements required; instead, it can reduce the required SNR scaling from log n to log k. Prior to this work, in the adaptive compressive case, it was unknown if log k scaling of the SNR was sufficient for support recovery.

B. Main Results and Contributions

The main theoretical contribution of this work is to complete Table I by introducing a simple adaptive compressed sensing procedure that 1) requires only order k log n measurements, and 2) succeeds provided the minimum non-zero entry is greater than a constant times $\sqrt{\frac{n}{M}\log k}$, showing this scaling is sufficient. To the best of our knowledge, this is the first demonstration of a procedure that is optimal in terms of dependence on SNR and dimension.

Specifically, we propose a procedure termed Compressive Adaptive Sense and Search (CASS). For recovery of non-negative signals, the procedure succeeds in exact support recovery with probability greater than 1 − δ provided the minimum non-zero entry of x is greater than

$$\sqrt{20\,\frac{n}{M}\left(\log k + \log\left(\frac{8}{\delta}\right)\right)},$$

requires exactly m = 2k log₂(n/k) measurements, and has total sensing energy $\sum_{i=1}^{m}\|a_i\|_2^2 = M$. This result implies that as n grows large, if the minimum non-zero entry exceeds $\sqrt{20\,\frac{n}{M}\log k}$, the procedure succeeds in exact support recovery.
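As a quick illustration of this scaling (all values assumed, not from the paper), the snippet below evaluates the sufficient condition on the minimum amplitude for example choices of n, k, M, and δ.

```python
import math

def cass_xmin_threshold(n, M, k, delta):
    """Sufficient minimum amplitude from the result above:
    x_min >= sqrt(20 * (n/M) * (log k + log(8/delta)))."""
    return math.sqrt(20.0 * (n / M) * (math.log(k) + math.log(8.0 / delta)))

# Illustrative values (assumptions): n = 2**16, M = n (one unit of sensing
# energy per dimension), k = 32, delta = 0.1.
print(cass_xmin_threshold(n=2**16, M=float(2**16), k=32, delta=0.1))
```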

When the signal contains both positive and negative entries, the CASS procedure guarantees the following. On average, the procedure succeeds in recovery of at least a fraction 1 − ǫ of the non-zero components provided their magnitude is at least

$$\sqrt{20\,\frac{n}{M}\left(\log k + 2\log\left(\frac{8}{\epsilon}\right)\right)}.$$

In this setting the procedure requires less than 2k log₂(n/k) + 8k/ǫ measurements and has total sensing energy $\sum_{i=1}^{m}\|a_i\|_2^2 = M$.

In addition to these theoretical developments, we present a series of numerical results that show CASS outperforms standard compressed sensing for problems of even modest size. The numerical experiments reinforce the theoretical results – as the dimension of the problem grows, performance of the CASS algorithm does not deteriorate, confirming the logarithmic dependence on dimension has been removed. We also show side-by-side comparisons between standard compressed sensing, direct sensing, and CASS on approximately sparse signals.

C. Prior Work

A number of adaptive compressed sensing procedures have been proposed (see [3]–[5], [23]–[26] and references therein), some coming close to meeting the lower bound while requiring only order k log n measurements.

Most closely related to the CASS procedure proposed here, from an algorithmic perspective, are the procedures of Iwen [3], Iwen and Tewfik [4], [25], and the Compressive Binary Search of Davenport and Arias-Castro [24]. Much like CASS, these procedures rely on repeated bisection of the support of the signal. The first proposal of a bisecting search procedure applied to the adaptive compressed sensing problem, to the best of our knowledge, was [3]. The theoretical guarantees of this work, when translated to the setting studied here, ensure exact support recovery provided the non-zero entries of x are greater than a constant times $\sqrt{\frac{n}{M}(\log_2 k + \log_2\log n)}$ (see [25, Theorem 2] and, more succinctly, [4, Theorem 2]).

Most recently, in [5], the authors propose an algorithm termed Sequential Compressed Sensing. The procedure is based on random sensing vectors with sequentially masked support, allowing the procedure to focus sensing energy on the suspected non-zero components. The procedure guarantees exact recovery provided the smallest non-zero entry exceeds an unspecified constant times $\sqrt{\frac{n}{M}(\log k + \log\log_2\log n)}$, coming within a triply logarithmic factor of the theoretical lower bound.

While the bisection approach used by CASS is suggested in a number of other works, the distinguishing feature of the procedure presented here is the allocation of the sensing energy. In the case of adaptive compressed sensing, roughly speaking, both procedures in [3], and later in [24], suggest an allocation of sensing energy across the bisecting steps of the procedure that ensures the probability a mistake is made at any step is the same. The CASS procedure instead uses less sensing energy for initial steps, and more sensing energy at later steps, reducing the probability the procedure makes an error from step to step. This allocation of the sensing energy removes any sub-optimal dependence on dimension from the SNR requirement; the CASS procedure succeeds provided the minimum non-zero entry exceeds $\sqrt{20\,\frac{n}{M}\log k}$.

Compressed sensing can be viewed as a variant of non-adaptive group testing (see [27], [28] for a detailed discussion). In traditional group testing [29], binary valued measurements indicate if a group of items contains one or more defectives, distinct from the real valued measurement model in (1). While the measurement models are different, the CASS procedure attempts to emulate adaptive group testing by forming partial sums of the sparse signal in an adaptive, sequential manner. In the traditional group testing literature, CASS is perhaps most similar to the generalized binary splitting procedure of [30]. Like generalized binary splitting, CASS first isolates defective items by measuring groups of size approximately n/k. The procedures deviate beyond this; the main distinguishing feature of CASS is the allocation of the sensing energy across passes required to cope with additive Gaussian noise. To the best of our knowledge, analogous allocations have not been proposed to cope with noise in adaptive group testing, as noisy group testing has remained largely unexplored [27]. This highlights another contribution of CASS – although we focus on the compressed sensing framework, the key idea (the allocation of the sensing energy) could be adapted to other measurement models, such as a binary symmetric noise model in group testing.


In that case, CASS would suggest an allocation of measurements across stages to compensate for a fixed probability of error per measurement.

D. Notation

Notation adopted throughout, in general, follows convention. Bold face letters, x, denote vectors (both random and deterministic); matrices are denoted with capital letters, for example, A, while calligraphic font, such as S, denotes a set or an event. The indicator function I{E} is equal to one if the event E occurs, and equals zero otherwise. The vector 1{S}, for any support set S, is a vector of ones over S, and 0 elsewhere. Expectation is denoted E[·] and the probability of an event P(·).

II. PROBLEM DETAILS

Consider the canonical compressed sensing setup in which a sparse signal x ∈ R^n supported on S ⊂ {1, ..., n}, |S| = k, is measured through m linear projections as in (1). In matrix-vector form,

$$y = Ax + z$$

where z ∈ R^m ∼ N(0, I) and A ∈ R^{m×n} is the sensing matrix with total sensing energy constraint

$$\|A\|_{\mathrm{fro}}^2 \leq M. \qquad (3)$$

The total sensing energy constraint reflects a limit on the total amount of energy or time available to the sensing device, as in [5], [24]. Importantly, it results in a decoupling between the energy constraint and the number of measurements, allowing for fair comparison between procedures that take a different number of measurements. In adaptive sensing the rows of A, denoted a_i^T, are designed sequentially and adaptively based on prior observations, y_1, ..., y_{i−1}, as opposed to the non-adaptive setting in which a_i, i = 1, ..., m, are fixed a priori.
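The following is a minimal sketch (using numpy, with assumed toy parameters) of the measurement model above in the non-adaptive case: a Gaussian sensing matrix rescaled so the Frobenius-norm budget (3) holds with equality.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k, M = 1024, 200, 8, 1024.0               # toy sizes and budget (assumptions)

x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = 3.0   # a k-sparse test signal

A = rng.standard_normal((m, n))
A *= np.sqrt(M) / np.linalg.norm(A, "fro")      # enforce ||A||_fro^2 = M
y = A @ x + rng.standard_normal(m)              # y = A x + z, z ~ N(0, I)

print(np.linalg.norm(A, "fro") ** 2)            # ~= M, the total sensing budget
```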

The error metrics considered are the probability of exact support recovery, or the family wise error rate,

$$P(\widehat{S} \neq S)$$

and the symmetric set difference

$$d(\widehat{S}, S) = \left|\{S \setminus \widehat{S}\} \cup \{\widehat{S} \setminus S\}\right|$$

where $\widehat{S}$ is an estimate of S. Both depend on the procedure, n, k, M and the amplitude of the non-zero entries of x. In particular, we quantify performance in terms of the minimum amplitude of the non-zero elements of x, given as

$$x_{\min} = \min_{i\in S}|x_i|.$$

The SNR is defined as the amplitude of the smallest non-zero entry squared, scaled by the sensing energy per dimension:

$$\mathrm{SNR} = \frac{x_{\min}^2 M}{n}.$$

In general, no assumption is made on the support set S other than |S| = k. Theorem 2 requires the sparse support be chosen uniformly at random over all possible cardinality-k support sets, which can be avoided by randomly permuting the support before running the algorithm.

Lastly, throughout the paper we assume that both n and k are powers of two for simplicity of presentation. This can of course be relaxed, at the cost of increased constants.

III. COMPRESSIVE ADAPTIVE SENSE AND SEARCH

Conceptually, the CASS procedure operates by initially dividing the signal into a number of partitions, and then using compressive measurements to test for the presence of one or more non-zero elements in each partition. The procedure continues its search by bisecting the most promising partitions, with the goal of returning the k largest components of the vector. This compressive sensing and bisecting search approach inspired the name of the algorithm.

A. The CASS Procedure

The CASS procedure is detailed in Alg. 1. The procedure requires three inputs: 1) k, the number of non-zero coefficients to be returned by the procedure, 2) M, the total budget, and 3) an initial scale parameter ǫ, which in most practical scenarios is set to ǫ = 1.

Since the procedure is based on repeated bisections of the signal, it is helpful to define the dyadic subintervals of {1, ..., n}. The dyadic subintervals, J_{j,ℓ} ⊂ {1, ..., n}, are given by

$$J_{j,\ell} = \left\{\frac{(\ell-1)n}{2^j} + 1, \ldots, \frac{\ell n}{2^j}\right\}, \qquad j = 0, 1, \ldots, \log_2 n, \quad \ell = 1, \ldots, 2^j$$

where j indicates the scale and ℓ the location of the dyadic partition.
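A small helper makes this definition concrete (a sketch in Python, using 0-based indices rather than the paper's 1-based indexing):

```python
def dyadic_interval(n, j, ell):
    """0-based indices of J_{j,ell} = {(ell-1)*n/2^j + 1, ..., ell*n/2^j}."""
    width = n // 2**j
    return range((ell - 1) * width, ell * width)

# Example: for n = 8, J_{1,2} is {5, ..., 8} in the paper's notation,
# i.e. 0-based indices 4..7.
print(list(dyadic_interval(8, 1, 2)))
```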

The procedure consists of a series of s_0 steps. Conceptually, on the first step, s = 1, the procedure begins by dividing the signal into ℓ_0 partitions, where ℓ_0 is the smaller of n or the next power of two greater than 4k/ǫ. These initial partitions are given by the dyadic sub-intervals of {1, ..., n} at scale log₂(ℓ_0):

$$J_{\log_2(\ell_0),\ell}, \qquad \ell = 1, \ldots, \ell_0.$$

The procedure measures the signal with ℓ_0 sensing vectors, each with support over a single dyadic sub-interval. The procedure then selects the largest k measurements in absolute value. These k measurements are used to define the support of the sensing vectors used on step two of the procedure. More specifically, the supports of the sensing vectors corresponding to the k largest measurements are bisected, giving 2k support sets. These 2k support sets define the support of the sensing vectors on step s = 2.

The procedure continues in this fashion, taking 2k measurements on each step (after the initial step, which requires ℓ_0 measurements) and bisecting the support of the k largest measurements to define the support of the sensing vectors on the next step.


On the final step, s = s_0, where s_0 = log₂(n/ℓ_0) + 1, the support of the sensing vectors consists of a single index; the procedure returns the support of the sensing vectors corresponding to the k largest measurements as the estimate of S and terminates. Additionally, estimates of the values of the non-zero coefficients are returned based on the measurements made on the last step.

One of the key aspects of the CASS algorithm is the allocation of the measurement budget across the steps of the procedure. On step s, the amplitude of the non-zero entries of the sensing vectors is given as

$$\sqrt{\frac{Ms}{\gamma n}} \qquad (4)$$

where γ is an internal parameter of the procedure used to ensure the total sensing energy constraint in (3) is satisfied with equality. Specifically,

$$\gamma = 1 + \frac{4k}{\ell_0}\sum_{s=2}^{s_0} s 2^{-s}.$$

This particular allocation of the sensing energy gives the CASS procedure its optimal theoretical guarantees. While the amplitude of each sensing vector grows in a polynomial manner with each step, the support of the sensing vectors decreases geometrically; this results in a decrease in the sensing energy across steps, but an increase in reliability. In essence, the allocation exploits the fact that an increase in reliability comes at a smaller cost for steps later in the procedure.
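To make the budget bookkeeping concrete, here is a small sketch (assuming ǫ = 1 so that ℓ_0 = 4k, and n, k powers of two) that sums the energy of all sensing vectors under the allocation in (4); it mirrors the calculation in (5) of Appendix A and returns exactly M.

```python
import math

def cass_total_energy(n, k, M):
    """Total sensing energy used by CASS with eps = 1 (so l0 = 4k); equals M."""
    l0 = 4 * k
    s0 = int(math.log2(n // l0)) + 1
    gamma = 1 + (4 * k / l0) * sum(s * 2.0 ** (-s) for s in range(2, s0 + 1))
    energy = l0 * (n / l0) * (M / (gamma * n))             # step s = 1
    for s in range(2, s0 + 1):                             # steps s = 2, ..., s0
        support = n / (l0 * 2 ** (s - 1))                  # support size per vector
        energy += 2 * k * support * (M * s / (gamma * n))  # 2k vectors, amp^2 = Ms/(gamma*n)
    return energy

print(cass_total_energy(n=2**14, k=8, M=100.0))            # prints 100.0 (up to rounding)
```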

Details of the CASS procedure are found in Alg. 1, and performance is quantified in the following section.

Algorithm 1 CASS
input: number of terms in approximation k, budget M, initial scale parameter ǫ ∈ (0, 1] (default ǫ = 1)
initialize: ℓ_0 = min{4 · 2^⌈log₂(k/ǫ)⌉, n} partitions, s_0 = log₂(n/ℓ_0) + 1 steps, γ = 1 + (4k/ℓ_0) Σ_{s=2}^{s_0} s 2^{−s}, L = {1, 2, ..., ℓ_0}
for s = 1, ..., s_0 do
    for ℓ ∈ L do
        a_{s,ℓ} = √(Ms/(γn)) · 1{J_{log₂(ℓ_0)−1+s, ℓ}}
        measure: y_{s,ℓ} = ⟨a_{s,ℓ}, x⟩ + z_{s,ℓ},  z_{s,ℓ} i.i.d. N(0, 1)
    end for
    let ℓ_1, ℓ_2, ..., ℓ_{|L|} be such that |y_{s,ℓ_1}| ≥ |y_{s,ℓ_2}| ≥ ... ≥ |y_{s,ℓ_{|L|}}|
    if s ≠ s_0 then
        L = {2ℓ_1 − 1, 2ℓ_1, 2ℓ_2 − 1, 2ℓ_2, ..., 2ℓ_k − 1, 2ℓ_k}
    else
        Ŝ = {ℓ_1, ℓ_2, ..., ℓ_k}
        x̂_i = y_{s,i} · √(γn/(Ms)) for all i ∈ Ŝ
    end if
end for
output: Ŝ (k indices), estimates x̂_i for i ∈ Ŝ
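A compact Python sketch of Alg. 1 follows. It is an illustration under assumptions (n and k powers of two, a user-supplied noisy measurement oracle), not a reference implementation; names such as `x_measure` are introduced here for convenience.

```python
import math
import numpy as np

def cass(x_measure, n, k, M, eps=1.0):
    """Sketch of Alg. 1 (CASS). `x_measure(a)` returns <a, x> + z, z ~ N(0, 1)."""
    l0 = min(4 * 2 ** math.ceil(math.log2(k / eps)), n)      # initial partitions
    s0 = int(math.log2(n // l0)) + 1                          # number of steps
    gamma = 1 + (4 * k / l0) * sum(s * 2.0 ** (-s) for s in range(2, s0 + 1))
    L = list(range(1, l0 + 1))                                # active partitions (1-based)
    for s in range(1, s0 + 1):
        j = int(math.log2(l0)) - 1 + s                        # dyadic scale at step s
        width = n // 2 ** j                                   # support size of each a_{s,l}
        amp = math.sqrt(M * s / (gamma * n))                  # amplitude from (4)
        ys = []
        for ell in L:
            a = np.zeros(n)
            a[(ell - 1) * width : ell * width] = amp          # a_{s,l} = amp * 1{J_{j,l}}
            ys.append(x_measure(a))
        order = sorted(range(len(L)), key=lambda i: -abs(ys[i]))
        if s != s0:
            # bisect the supports of the k largest measurements
            L = [c for i in order[:k] for c in (2 * L[i] - 1, 2 * L[i])]
        else:
            S_hat = {L[i] - 1 for i in order[:k]}             # 0-based support estimate
            x_hat = {L[i] - 1: ys[i] / amp for i in order[:k]}
            return S_hat, x_hat

# Illustrative usage (all parameter values assumed).
rng = np.random.default_rng(1)
n, k, M = 2**12, 4, float(2**12)
x = np.zeros(n)
true_support = rng.choice(n, size=k, replace=False)
x[true_support] = 12.0                                        # non-negative spikes
S_hat, _ = cass(lambda a: float(a @ x) + rng.standard_normal(), n, k, M)
print(sorted(S_hat), sorted(true_support))
```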

B. Theoretical Guarantees

The theoretical guarantees of CASS are presented in Theorems 1 and 2. Theorem 1 presents theoretical guarantees for non-negative signals, while Theorem 2 quantifies performance when the signal contains both positive and negative entries.

Theorem 1. Assume x_i ≥ 0 for all i = 1, ..., n. Set ǫ = 1. For δ ∈ (0, 1], Alg. 1 has P(Ŝ ≠ S) ≤ δ provided

$$x_{\min} \geq \sqrt{20\,\frac{n}{M}\left(\log k + \log\left(\frac{8}{\delta}\right)\right)},$$

has total sensing energy $\|A\|_{\mathrm{fro}}^2 = M$, and uses m = 2k log₂(n/k) measurements.

Proof: See Appendix A.

The theoretical guarantees of Theorem 1 do not apply to recovery of signals with positive and negative entries, as scenarios can arise where two non-zero components can cancel when measured by the same sensing vector. In order to avoid this effect, the initial scale parameter, ǫ, can be set to a value less than one. Doing so increases the number of partitions that are created on the first step, but reduces the occurrence of positive and negative signal cancellation. Theorem 2 bounds the expected fraction of the components that are recovered when the signal contains positive and negative entries.

Theorem 2. Assume S is chosen uniformly at random. For any ǫ ∈ (0, 1], Alg. 1 has E[d(Ŝ, S)] ≤ kǫ provided

$$x_{\min} \geq \sqrt{20\,\frac{n}{M}\left(\log k + 2\log\left(\frac{8}{\epsilon}\right)\right)},$$

has total sensing energy $\|A\|_{F}^2 = M$, and uses

$$m \leq \frac{8k}{\epsilon} + 2k\log_2\left(\frac{n}{k}\right)$$

measurements. Moreover, if k/ǫ is a power of two, the procedure uses exactly 4k/ǫ + 2k log₂(nǫ/4k) measurements.

Proof: See Appendix B.

C. Discussion

Theorem 2 implies conditions under which a fraction 1 − ǫ of the support can be recovered in expectation. Provided ǫ ≥ 1/log n, the procedure requires order k log n measurements. In scenarios where one wishes to control P(Ŝ ≠ S), ǫ can be set to satisfy ǫ < 1/k. In this case, while the SNR requirement remains essentially unchanged, the procedure would require on the order of k² measurements. In this sense, the compressive case with positive and negative non-zero values is more difficult, a phenomenon noted in other work [31]. Note that when k ≥ log n, the theoretical guarantee of Theorem 2 adapted to control the family wise error rate requires more than order k log n measurements.
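For concreteness (values assumed, not from the paper), the measurement bound of Theorem 2 can be evaluated for the two regimes just discussed:

```python
import math

def cass_m_bound(n, k, eps):
    """Upper bound on the number of measurements from Theorem 2."""
    return 8 * k / eps + 2 * k * math.log2(n / k)

n, k = 2**20, 64
print(cass_m_bound(n, k, eps=1 / math.log(n)))   # eps >= 1/log n: order k*log(n)
print(cass_m_bound(n, k, eps=1 / (2 * k)))       # eps < 1/k: order k^2 measurements
```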

The minimax lower bound of [12, Proposition 4.2] states the following (over a slightly larger class of signals with sparsity k, k − 1, and k + 1, with k < n/2): if E[d(Ŝ, S)] ≤ ǫ then necessarily

$$x_{\min} \geq \sqrt{\frac{2n}{M}\left(\log k + \log\frac{1}{2\epsilon} - 1\right)}.$$

Since the condition P(Ŝ ≠ S) ≤ δ in Theorem 1 implies E[d(Ŝ, S)] ≤ 2kδ (which is readily shown as the procedure always has d(Ŝ, S) ≤ 2k, and has d(Ŝ, S) = 0 when Ŝ = S), the upper and lower bounds can be compared directly by setting δ = ǫ/2k. In doing so, Theorem 1 implies that CASS succeeds with E[d(Ŝ, S)] ≤ ǫ provided

$$x_{\min} \geq \sqrt{20\,\frac{n}{M}\left(2\log k + \log\frac{16}{\epsilon}\right)}.$$

In the same manner, Theorem 2 can be directly compared. Doing so shows the theorems are tight to within constant factors.

The splitting or bisecting approach in CASS is a common technique for a number of search problems, including adaptive sensing. As discussed in Sec. I-C, to the best of our knowledge, the first work to propose a bisecting search for the compressed sensing problem was [3]. When translated to the framework studied here, the authors in essence allocate the sensing vectors on each bisecting step to be proportional to $\sqrt{M/n}$, as opposed to increasing with the index of the step as in (4). Likewise, the compressive binary search (CBS) algorithm [24] also employs sensing vectors that are proportional to $\sqrt{M/n}$. The CBS procedure finds a single non-zero entry with vanishing probability of error as n gets large provided $x_{\min} \geq \sqrt{8\,\frac{n}{M}\log\log_2 n}$, where x_min is the amplitude of the non-zero entry. The authors of [24] rightly question whether the log log₂ n term is needed. Thm. 1 answers this question, removing the doubly logarithmic term and coming within a constant factor of the lower bound in [24]. A more direct comparison to the work in [24] is found in [32].

Increasing the number of partitions to further isolate sparse entries is an idea suggested in [3], [25]. In particular, the authors suggest a procedure that creates a large number of random partitions based on primes, essentially decreasing the support of the sensing vector such that each non-zero element is isolated with arbitrarily high probability.

IV. NUMERICAL EXPERIMENTS

This section presents a series of numerical experiments aimed at highlighting the performance gains of adaptive sensing. In all experiments, the CASS procedure was implemented in the sparsifying basis according to Alg. 1 with inputs as specified. The signal to noise ratio is defined as

$$\mathrm{SNR\ (dB)} = 10\log_{10}\left(\frac{x_{(k)}^2 M}{n}\right)$$

where x_(k) is the amplitude of the kth largest element of x. This definition reflects the SNR of measuring the kth largest element using a fraction 1/n of the total sensing energy.

Fig. 1. Recovery of a one-sparse signal (k = 1). Empirical performance of CASS (Alg. 1), traditional compressed sensing with orthogonal matching pursuit (OMP), and direct sensing as a function of n, for SNR (dB) = 10 log₁₀(x²_min M/n) = 9.5 dB and SNR = 15.5 dB; 10,000 trials for each n. (Axes: n from 10² to 10⁶ vs. P(Ŝ ≠ S); curves: CASS (ǫ = 1, m = 2 log₂ n), OMP (m = 2 log₂ n), direct (m = n).)

In all experiments, Gaussian noise of unit variance is added to each measurement, as specified in (1).

Fig. 1 shows empirical performance of CASS for 1-sparse recovery, and, for comparison, 1) traditional compressed sensing with orthogonal matching pursuit (OMP) using a Gaussian ensemble with normalized columns and 2) direct sensing. As the dimension of the problem is increased, notice that performance of CASS remains constant, while the error probability of both OMP and direct sensing increases. Note that for non-adaptive methods in the one-sparse case, OMP is equivalent to the maximum likelihood support estimator, and is thus optimal amongst non-adaptive recovery algorithms. The total sensing energy is the same in all cases, allowing for fair comparison. For sufficiently large n, CASS outperforms direct sensing.

Figs. 2 and 3 show performance of CASS for k-sparse signals with equal magnitude positive and negative non-zero entries. Performance is in terms of the empirical value of d(Ŝ, S) averaged over the trials. The support of the test signals was chosen uniformly at random from the set of k-sparse signals, and the non-zero entries were assigned an amplitude of +x_min or −x_min with equal probability. The cancellation effect results in an error floor of around d(Ŝ, S)/(2k) = 0.10; this is greatly reduced when the scale parameter is set to ǫ = 1/8, clearly visible in both Fig. 2 and Fig. 3. The procedure is compared against traditional compressed sensing using a random Gaussian ensemble, recovered with LASSO and a regularizer tuned to return a k-sparse signal (using SpaRSA, [33]). Performance of traditional compressed sensing with m = 2k log(n/k), fairly compared against CASS with ǫ = 1, is shown in both figures. Traditional compressed sensing with m = 4k/ǫ + 2k log₂(nǫ/4k) is also shown, which is fairly compared to CASS with ǫ = 1/8.

In Fig. 3, notice that performance of CASS remains constant as n becomes large.


Fig. 2. Empirical performance of CASS (Alg. 1) with non-zero entries of x positive and negative of equal amplitude, as a function of SNR, compared against traditional compressed sensing with Gaussian ensemble and recovery using LASSO; k = 32, n = 2048, SNR = x²_min M/n, 100 trials for each SNR (linear units). (Axes: SNR from 0 to 80 vs. d(Ŝ, S)/(2k); curves: CASS ǫ = 1 (m = 384), CS/LASSO (m = 384), CASS ǫ = 1/8 (m = 1088), CS/LASSO (m = 1088).)

Fig. 3. Non-zero entries of equal magnitude, positive and negative, as a function of n; k = 32. CASS (Alg. 1) with ǫ = 1, m = 2k log₂(n/k), and ǫ = 1/8, m = 32k + 2k log₂(n/(32k)). LASSO using Gaussian ensemble with m = 2k log₂(n/k) and m = 32k + 2k log₂(n/(32k)). SNR = 15.5 dB, 1000 trials for each n. LASSO was not evaluated for n ≥ 2¹⁹ because of computational intensity. (Axes: n from 10⁴ to 10⁶ vs. d(Ŝ, S)/(2k); curves also include direct sensing, m = n.)

For sufficiently large n, CASS outperforms direct sensing, highlighting a major advantage of adaptive sensing: for support recovery, while standard compressed sensing does not outperform direct sensing in terms of dependence on SNR, for sufficiently sparse problems, CASS does. As solving LASSO with a dense sensing matrix becomes computationally prohibitive for large problem sizes, performance was only evaluated up to n = 2¹⁸. For all dimensions evaluated, compressed sensing using LASSO was inferior to CASS and direct sensing.

Fig. 4. Performance of CASS, standard compressed sensing with LASSO and Gaussian ensemble, Group LASSO of [34] (G-LASSO) with Gaussian ensemble, which exploits signal structure, and direct sensing on the Microsoft Research Object Class Recognition database [35] for two SNRs. The database consists of 20 image classes, with approximately 30 images per class. The images are approximately sparse in the Haar basis. The total sensing energy was the same for all methods; n = 128², k = 512, ǫ = 1. Results are averaged over each class. (Two panel pairs, SNR = 4 dB and SNR = 16 dB, each plotting PSNR and d(Ŝ, S)/(2k) versus image type 1-20.)

Fig. 4 shows results of running CASS on the 20 image classes in the Microsoft Research Object Class Recognition database [35]. Each class consists of approximately 30 images of size 128 × 128. Results were averaged over all images in a particular class. Direct sensing, standard compressed sensing using a Gaussian ensemble, and the group LASSO (G-LASSO) of [34], [36], which exploits signal structure, are included. G-LASSO is known to be one of the best non-adaptive methods for recovery of natural images, making it a natural benchmark for comparison (see [37, Fig. 5]). The total sensing energy was the same in all cases, and all methods used m = 5120 measurements (except direct sensing, which requires m = 16,384 measurements).


TABLE II
HANDWRITTEN EIGHT, 10 TRIALS, n = 256², k = 256, ǫ = 1

                              CASS (Alg. 1)   CS (LASSO)   Direct
SNR = -2 dB    d(Ŝ,S)             120.0          153.2      131.0
               PSNR (dB)           16.88           8.43      12.71
               runtime (s)          0.11          118.3       0.04
SNR = -8 dB    d(Ŝ,S)             152.0          202.0      163.8
               PSNR (dB)           12.91           4.40       6.15
               runtime (s)          0.11           65.0       0.04
SNR = -14 dB   d(Ŝ,S)             236.4          379.6      301.6
               PSNR (dB)            6.69           2.52      -1.76
               runtime (s)          0.11           54.2       0.03

Each algorithm was tuned to output a 512-coefficient approximation. CASS universally outperforms both G-LASSO and standard CS on all image sets and SNRs evaluated.

Fig. 5 shows recovery of a test image from [38] for three SNRs. The image is approximately sparse in the Daubechies wavelet domain ('db4' in MATLAB). Alg. 1 was run with k = 256 as input and compared against LASSO with a random Gaussian ensemble, where the regularizer was tuned to return k = 256 elements (again using SpaRSA, [33]). Using traditional compressed sensing, at an SNR of −8 dB the reconstructed image is essentially unrecognizable; at the same SNR, the image recovered with CASS is still very much recognizable. The peak-signal-to-noise ratio (PSNR) for images on [0, 1) is defined as

$$\mathrm{PSNR\ (dB)} = -10\log_{10}\left(\frac{\|x-\widehat{x}\|_2^2}{n}\right)$$

where $\widehat{x}$ is an estimate of the image.

The experiments of Fig. 5 were repeated 10 times, and the empirical results averaged over the 10 trials are shown in Table II. In terms of PSNR, CASS outperforms compressed sensing by 4-8 dB. The runtime of the simulation, which consists of the entire simulation time, including both measurement and recovery, is included. While the compressed sensing experiments were not optimized for speed, it seems unlikely any recovery method using dense Gaussian matrices would approach the runtime of CASS – standard compressed sensing runtime was on the order of a minute, while the runtime of CASS was approximately 0.1 seconds.
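For reference, the PSNR definition above translates directly into code; a minimal sketch (assuming numpy arrays with values in [0, 1)):

```python
import numpy as np

def psnr_db(x, x_hat):
    """PSNR (dB) = -10*log10(||x - x_hat||_2^2 / n) for images on [0, 1)."""
    return -10.0 * np.log10(np.sum((x - x_hat) ** 2) / x.size)

# Toy check (assumed values): a uniform error of 0.05 over a 256x256 image.
x = np.zeros((256, 256))
print(psnr_db(x, x + 0.05))    # = -10*log10(0.0025), about 26 dB
```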

Fig. 6 shows side-by-side performance of CASS, traditional compressed sensing with a dense Gaussian sensing matrix and LASSO tuned to return k = 2048 non-zero components, and direct sensing on a natural image. In terms of PSNR, CASS outperforms compressed sensing by 2-4 dB. In terms of symmetric set difference, CASS substantially outperforms compressed sensing at all SNRs evaluated.

A. Discussion

The leading constant in the upper bound – a factor of 20 required by the CASS procedure – is significant in practice. Roughly speaking, traditional direct sensing requires $x_{\min} \geq \sqrt{2\,\frac{n}{M}\log n}$, while CASS requires $x_{\min} \geq \sqrt{20\,\frac{n}{M}\log k}$. One would then conclude that anytime the signal is sufficiently sparse, specifically, if k ≤ n^{1/10} (so that 20 log k ≤ 2 log n), the CASS procedure should outperform direct sensing. Notice that in simulation CASS outperforms direct sensing on signals where k ≥ n^{1/10}, implying the leading constant of 20 is loose.
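A quick arithmetic check of this comparison (illustrative values only):

```python
import math

n, M = 10**6, 10**6                                   # assumed problem size and budget
direct = math.sqrt(2 * (n / M) * math.log(n))         # direct sensing: sqrt(2*(n/M)*log n)
for k in (2, 4, 64):
    cass = math.sqrt(20 * (n / M) * math.log(k))      # CASS bound: sqrt(20*(n/M)*log k)
    # CASS is smaller iff 20*log(k) <= 2*log(n), i.e. iff k <= n**(1/10) (~4 here)
    print(k, round(direct, 2), round(cass, 2))
```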

V. CONCLUSION

This work presented an adaptive compressed sensing and group testing procedure for recovery of sparse signals in additive Gaussian noise, termed CASS. The procedure was shown to be near-optimal in that it comes within a constant factor of the best possible dependence on SNR, while requiring on the order of k log n measurements. Simulation showed that the theoretical developments hold in practice, including scenarios in which the signal is only approximately sparse.

While many of the questions regarding the gains of adaptivity for sparse support recovery have been settled, small questions remain. First, it would be of practical interest to further reduce the leading constant in the SNR requirement of CASS. The question of whether or not exact support recovery is possible in full generality (positive and negative non-zero entries, any level of sparsity), with SNR scaling as log k and the number of measurements scaling as k log n, is still outstanding. Other questions, such as the applicability of CASS to signals which are sparse in overcomplete dictionaries, are also interesting future research directions.

APPENDIX A

Proof of Theorem 1: The total number of measurements is given as

$$m = \ell_0 + 2k(s_0 - 1) = 2k\log_2\left(\frac{n}{k}\right)$$

since the first step requires ℓ_0 measurements, and the remaining s_0 − 1 steps each require 2k measurements. The equality follows as ℓ_0 = 4k. The constraint in (3) is satisfied:

$$\|A\|_{\mathrm{fro}}^2 = \sum_{\ell=1}^{\ell_0}\|a_{1,\ell}\|^2 + \sum_{s=2}^{s_0}\sum_{\ell=1}^{2k}\|a_{s,\ell}\|^2 = \ell_0\left(\frac{n}{\ell_0}\right)\left(\frac{M}{\gamma n}\right) + 2k\sum_{s=2}^{s_0}\left(\frac{n}{\ell_0 2^{s-1}}\right)\left(\frac{Ms}{\gamma n}\right) = \frac{M}{\gamma}\left(1 + \frac{4k}{\ell_0}\sum_{s=2}^{s_0}s2^{-s}\right) = M \qquad (5)$$


Fig. 5. Handwritten eight [38], approximately sparse in the 'db4' wavelet basis, n = 256². Column 1: CASS (Alg. 1), k = 256, ǫ = 1, m = 4096. Column 2: traditional compressed sensing with Gaussian ensemble and LASSO (using SpaRSA [33]), tuned to return k = 256 components, m = 4096. Column 3: direct sensing, m = n, using the best k = 256 term approximation. Each method evaluated at three SNRs, SNR (dB) = 10 log₁₀(x²_(k) M/n), where x_(k) is the amplitude of the kth largest component. Per-panel results (d(Ŝ,S), PSNR, simulation time): SNR = −2 dB: CASS 112, 17.5 dB, 0.13 s; CS 152, 8.6 dB, 69 s; direct 134, 12.3 dB, 0.05 s. SNR = −8 dB: CASS 156, 12.9 dB, 0.10 s; CS 198, 4.6 dB, 71 s; direct 162, 5.9 dB, 0.04 s. SNR = −14 dB: CASS 232, 7.29 dB, 0.12 s; CS 380, 2.67 dB, 94 s; direct 310, −1.94 dB, 0.04 s.

where the second equality follows as each sensing vector has support of size n2^{−(s−1)}/ℓ_0 and squared amplitude Ms/(γn) on step s.

A sufficient condition for exact recovery of the support set is that the procedure never incorrectly eliminates a measurement corresponding to a non-zero entry on any of the s_0 steps. Let y_{s,1}, ..., y_{s,t_s} be measurements corresponding to sensing vectors with one or more non-zero entries in their support, and y_{s,t_s+1}, ... be measurements corresponding to the zero entries on step s. Consider a series of thresholds denoted τ_s, s = 1, ..., s_0. If |y_{s,1}|, ..., |y_{s,t_s}| are all greater than τ_s, and |y_{s,t_s+1}|, ... are all less than τ_s, for all s, the sufficient condition is implied. Taking the complement of this event, and from a union bound,

$$P(\widehat{S} \neq S) \leq P\left(\bigcup_{j=1}^{t_1}\{|y_{1,j}| \leq \tau_1\} \cup \bigcup_{j=t_1+1}^{\ell_0}\{|y_{1,j}| \geq \tau_1\}\right) + \sum_{s=2}^{s_0} P\left(\bigcup_{j=1}^{t_s}\{|y_{s,j}| \leq \tau_s\} \cup \bigcup_{j=t_s+1}^{2k}\{|y_{s,j}| \geq \tau_s\}\right)$$

for any series of thresholds τ_s. As all non-zero elements are positive by assumption, and the signals combine constructively, it is straightforward to see that y_{s,1}, ..., y_{s,t_s} are normally distributed with mean greater than x_min a_s and unit variance.


Fig. 6. Cameraman, approximately sparse in the Haar basis, n = 256². Column 1: CASS (Alg. 1), k = 2048, ǫ = 1, m = 20480. Column 2: traditional compressed sensing with Gaussian ensemble and recovery with LASSO (using SpaRSA [33]), regularizer tuned to return k = 2048 components, m = 20480. Column 3: direct sensing, m = n, using the best k = 2048 term approximation. Each method evaluated at three SNRs, SNR (dB) = 10 log₁₀(x²_(k) M/n), where x_(k) is the amplitude of the kth largest component. Per-panel results (d(Ŝ,S), PSNR, simulation time): SNR = 16 dB: CASS 736, 25.5 dB, 0.35 s; CS 1504, 21.8 dB, 312 s; direct 316, 26.2 dB, 0.06 s. SNR = 10 dB: CASS 1100, 24.6 dB, 0.34 s; CS 1774, 21.3 dB, 327 s; direct 714, 25.3 dB, 0.06 s. SNR = 4 dB: CASS 1836, 22.7 dB, 0.36 s; CS 2304, 20.5 dB, 232 s; direct 1694, 21.4 dB, 0.05 s.

On the other hand, y_{s,t_s+1}, ... are normally distributed with mean zero and unit variance. Setting the thresholds at τ_s = x_min a_s/2, where a_s is the amplitude of the non-zero entries of the sensing vector on step s, gives the series of equations in (6)-(8).

Here, (6) follows from the union bound. Equation (7) follows from a standard Gaussian tail bound: $1 - F_N(x) \leq \frac{1}{2}\exp(-x^2/2)$ for x ≥ 0, where F_N(·) is the Gaussian cumulative distribution function. Lastly, (8) follows as γ is always less than 5/2:

$$\gamma = 1 + \frac{4k}{\ell_0}\sum_{s=2}^{s_0} s2^{-s} \leq \frac{5}{2} \qquad (9)$$

as the sum is arithmetico-geometric, and ℓ_0 = 4k. Setting

$$x_{\min} \geq \sqrt{20\,\frac{n}{M}\left(\log k + \log\left(\frac{8}{\delta}\right)\right)}$$

in (8) gives $P(\widehat{S} \neq S) \leq \sum_{s=1}^{s_0}(2/\delta)^{-s} \leq \delta$, completing the proof.


$$P(\widehat{S} \neq S) \leq \sum_{j=1}^{t_1}P\left(|y_{1,j}| \leq \frac{x_{\min}a_1}{2}\right) + \sum_{j=t_1+1}^{4k}P\left(|y_{1,j}| \geq \frac{x_{\min}a_1}{2}\right) + \sum_{s=2}^{s_0}\left[\sum_{j=1}^{t_s}P\left(|y_{s,j}| \leq \frac{x_{\min}a_s}{2}\right) + \sum_{j=t_s+1}^{2k}P\left(|y_{s,j}| \geq \frac{x_{\min}a_s}{2}\right)\right] \qquad (6)$$

$$\leq 4k\exp\left(\frac{-x_{\min}^2 a_1^2}{8}\right) + \sum_{s=2}^{s_0}2k\exp\left(\frac{-x_{\min}^2 a_s^2}{8}\right) \qquad (7)$$

$$\leq \sum_{s=1}^{s_0}\exp\left(\frac{-x_{\min}^2 Ms}{8\gamma n} + \log 4k\right) \leq \sum_{s=1}^{s_0}\exp\left(\frac{-x_{\min}^2 Ms}{20 n} + \log 4k\right) \qquad (8)$$

APPENDIX B

Proof of Theorem 2: First, we confirm that the sensing budget is satisfied. In the same manner as before,

$$\|A\|_{\mathrm{fro}}^2 = \sum_{\ell=1}^{\ell_0}\|a_{1,\ell}\|^2 + \sum_{s=2}^{s_0}\sum_{\ell=1}^{2k}\|a_{s,\ell}\|^2 = \frac{M}{\gamma}\left(1 + \frac{4k}{\ell_0}\sum_{s=2}^{s_0}s2^{-s}\right) = M.$$

The total number of measurements satisfies

$$m = \ell_0 + 2k(s_0 - 1) \leq \frac{8k}{\epsilon} + 2k\log_2(n/k)$$

where the inequality follows as $\min\{n, 4k/\epsilon\} \leq \ell_0 \leq 4\cdot 2^{\lceil\log_2(k/\epsilon)\rceil} \leq 8k/\epsilon$.

We proceed by bounding the expected number of non-zero components that are isolated when the signal is partitioned, and then show these indices are returned in Ŝ with high probability. We assume the support of the signal is chosen uniformly at random from all possible k-sparse support sets (note this is equivalent to randomly permuting the labels of all indices). Conceptually, the algorithm divides the signal support into ℓ_0 disjoint support sets, where ℓ_0 = min{n, next power of 2 greater than 4k/ǫ}. Let the set I ⊂ {1, ..., ℓ_0} index the disjoint support sets that contain exactly one index corresponding to a non-zero entry:

$$I := \left\{\ell \in \{1,\ldots,\ell_0\} : |J_{\log_2\ell_0,\ell} \cap S| = 1\right\}.$$

Then

$$E[|I|] = E\left[\sum_{\ell=1}^{\ell_0}\mathbb{I}_{\{\ell\in I\}}\right] = \sum_{\ell=1}^{\ell_0}E\left[\mathbb{I}_{\{\ell\in I\}}\right] = \sum_{\ell=1}^{\ell_0}P(\ell\in I) = \sum_{\ell=1}^{\ell_0}\frac{k}{\ell_0}\left(\frac{\ell_0-1}{\ell_0}\right)^{k-1} = k\left(\frac{\ell_0-1}{\ell_0}\right)^{k-1} \geq k\,\frac{\ell_0-k}{\ell_0}$$

where $\mathbb{I}_{\{\cdot\}}$ is the indicator function. Since ℓ_0 ≥ min{4k/ǫ, n} we have

$$E[|I|] \geq k(1 - \epsilon/4). \qquad (10)$$

Let E be the event that the procedure fails to find at least as many true non-zero elements as the number that were isolated on the first partitioning, specifically,

$$E = \{|\widehat{S} \cap S| < |I|\}.$$

Additionally, notice that since the procedure returns exactly k indices,

$$E^c \Rightarrow d(\widehat{S},S) \leq 2(k - |I|). \qquad (11)$$

We can bound the probability of the event E in a fashion similar to the previous proof. Consider a threshold at step s given as τ_s = x_min a_s/2. Assume that y_{s,1}, y_{s,2}, ..., y_{s,t'_s} are measurements corresponding to isolated components, y_{s,t'_s+1}, ..., y_{s,t_s} correspond to measurements that contain more than one non-zero index, and y_{s,t_s+1}, ... correspond to noise-only measurements. If all noise-only measurements (y_{s,t_s+1}, ...) are below the threshold in absolute value, and all isolated non-zero components (y_{s,1}, ..., y_{s,t'_s}) are above the threshold, for all s, this implies E does not occur, regardless of y_{s,t'_s+1}, ..., y_{s,t_s}. As before, after applying a union bound,

$$P(E) \leq \sum_{j=1}^{t'_1}P(|y_{1,j}| \leq \tau_1) + \sum_{j=t_1+1}^{\ell_0}P(|y_{1,j}| \geq \tau_1) + \sum_{s=2}^{s_0}\left[\sum_{j=1}^{t'_s}P(|y_{s,j}| \leq \tau_s) + \sum_{j=t_s+1}^{2k}P(|y_{s,j}| \geq \tau_s)\right]$$

$$\leq \ell_0\exp\left(\frac{-x_{\min}^2 a_1^2}{8}\right) + \sum_{s=2}^{s_0}2k\exp\left(\frac{-x_{\min}^2 a_s^2}{8}\right) \leq \sum_{s=1}^{s_0}\exp\left(\frac{-x_{\min}^2 a_s^2}{8} + \log\left(\frac{8k}{\epsilon}\right)\right)$$

where the inequality follows from a standard Gaussian tail bound, and the last line follows as ℓ_0 ≤ 8k/ǫ. Setting $a_s = \sqrt{Ms/\gamma n}$ gives

$$P(E) \leq \sum_{s=1}^{s_0}\exp\left(\frac{-x_{\min}^2 Ms}{8\gamma n} + \log\left(\frac{8k}{\epsilon}\right)\right) \leq \sum_{s=1}^{s_0}\exp\left(\frac{-x_{\min}^2 Ms}{20 n} + \log\left(\frac{8k}{\epsilon}\right)\right)$$

Page 12: Near-Optimal Adaptive Compressed Sensingto perform considerably better than 1) compressed sensing based on random Gaussian sensing matrices and 2) non-adaptive direct sensing (which

since, from (9), γ ≤ 5/2. Next, imposing the condition of the theorem,

$$x_{\min} \geq \sqrt{20\,\frac{n}{M}\left(\log k + 2\log\left(\frac{8}{\epsilon}\right)\right)} = \sqrt{20\,\frac{n}{M}\left(\log\left(\frac{8}{\epsilon}\right) + \log\left(\frac{8k}{\epsilon}\right)\right)},$$

then

$$P(E) \leq \sum_{s=1}^{s_0}\left(\frac{8}{\epsilon}\right)^{-s} \leq \epsilon/4. \qquad (12)$$

By combining (10), (11), and (12), since $E[d(\widehat{S},S)\,|\,E] \leq 2k$,

$$E[d(\widehat{S},S)] = P(E)\,E[d(\widehat{S},S)\,|\,E] + P(E^c)\,E[d(\widehat{S},S)\,|\,E^c] \leq k\epsilon,$$

completing the proof.



Matthew L. Malloy received the B.S. degree in Electrical and Computer Engineering from the University of Wisconsin in 2004, the M.S. degree from Stanford University in 2005 in electrical engineering, and the Ph.D. degree from the University of Wisconsin in December 2012 in electrical engineering.

Dr. Malloy currently holds a postdoctoral research position at the University of Wisconsin in the Wisconsin Institutes for Discovery. From 2005 to 2008, he was a radio frequency design engineer for Motorola in Chicago, IL, USA. In 2008 he received the Wisconsin Distinguished Graduate fellowship. In 2009, 2010, and 2011 he received the Innovative Signal Analysis fellowship.

Dr. Malloy has served as a reviewer for the IEEE Transactions on Signal Processing, the IEEE Transactions on Information Theory, the IEEE Transactions on Automatic Control, and the Annals of Statistics. He was awarded the best student paper award at the Asilomar Conference on Signals and Systems (2011). His current research interests include signal processing, estimation and detection, information theory, statistics, and optimization, with applications in communications, biology, and imaging.

Robert D. Nowak received the B.S., M.S., and Ph.D. degrees in electrical engineering from the University of Wisconsin-Madison in 1990, 1992, and 1995, respectively. He was a Postdoctoral Fellow at Rice University in 1995-1996, an Assistant Professor at Michigan State University from 1996 to 1999, held Assistant and Associate Professor positions at Rice University from 1999 to 2003, and is now the McFarland-Bascom Professor of Engineering at the University of Wisconsin-Madison.

Professor Nowak has held visiting positions at INRIA, Sophia-Antipolis (2001), and Trinity College, Cambridge (2010). He has served as an Associate Editor for the IEEE Transactions on Image Processing and the ACM Transactions on Sensor Networks, and as the Secretary of the SIAM Activity Group on Imaging Science. He was General Chair for the 2007 IEEE Statistical Signal Processing Workshop and Technical Program Chair for the 2003 IEEE Statistical Signal Processing Workshop and the 2004 IEEE/ACM International Symposium on Information Processing in Sensor Networks.

Professor Nowak received the General Electric Genius of Invention Award (1993), the National Science Foundation CAREER Award (1997), the Army Research Office Young Investigator Program Award (1999), the Office of Naval Research Young Investigator Program Award (2000), the IEEE Signal Processing Society Young Author Best Paper Award (2000), the IEEE Signal Processing Society Best Paper Award (2011), the ASPRS Talbert Abrams Paper Award (2012), and the IEEE W.R.G. Baker Award (2014). He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE). His research interests include signal processing, machine learning, imaging and network science, and applications in communications, bioimaging, and systems biology.