Wiener Filters in Gaussian Mixture Signal Estimation with \(\ell_\infty\)-Norm Error
TRANSCRIPT
6626 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 10, OCTOBER 2014
Wiener Filters in Gaussian Mixture Signal Estimation With \(\ell_\infty\)-Norm Error
Jin Tan, Student Member, IEEE, Dror Baron, Member, IEEE, and Liyi Dai, Fellow, IEEE
Abstract— Consider the estimation of a signal \(x \in \mathbb{R}^N\) from noisy observations \(r = x + z\), where the input \(x\) is generated by an independent and identically distributed (i.i.d.) Gaussian mixture source, and \(z\) is additive white Gaussian noise in parallel Gaussian channels. Typically, the \(\ell_2\)-norm error (squared error) is used to quantify the performance of the estimation process. In contrast, we consider the \(\ell_\infty\)-norm error (worst case error). For this error metric, we prove that, in an asymptotic setting where the signal dimension \(N \to \infty\), the \(\ell_\infty\)-norm error always comes from the Gaussian component that has the largest variance, and the Wiener filter asymptotically achieves the optimal expected \(\ell_\infty\)-norm error. The i.i.d. Gaussian mixture case can be extended to i.i.d. Bernoulli-Gaussian distributions, which are often used to model sparse signals. Finally, our results can be extended to linear mixing systems with i.i.d. Gaussian mixture inputs, in settings where a linear mixing system can be decoupled to parallel Gaussian channels.
Index Terms— Estimation theory, Gaussian mixtures, \(\ell_\infty\)-norm error, linear mixing systems, parallel Gaussian channels, Wiener filters.
I. INTRODUCTION
A. Motivation
The Gaussian distribution is widely used to describe the probability densities of various types of data, owing to its advantageous mathematical properties [3]. It has been shown that non-Gaussian distributions can often be sufficiently approximated by an infinite mixture of Gaussians [4], so that the mathematical advantages of the Gaussian distribution can be leveraged when discussing non-Gaussian signals [4]–[8]. In practice, signals are often contaminated by noise during sampling or transmission, and therefore estimation of signals from their noisy observations is needed. Most estimation methods evaluate the performance by the ubiquitous \(\ell_2\)-norm error [4] (squared error). However, there are applications where other error metrics may be preferred [9]. For example, the \(\ell_2\) error criterion ensures that the estimated signal has
Manuscript received January 3, 2014; revised May 14, 2014; accepted July 16, 2014. Date of publication August 1, 2014; date of current version September 11, 2014. This work was supported in part by the National Science Foundation under Grant CCF-1217749 and in part by the U.S. Army Research Office under Grant W911NF-04-D-0003. This paper was presented at the 2013 Information Theory and Applications Workshop [1] and the 2014 IEEE Conference on Information Sciences and Systems [2].
J. Tan and D. Baron are with the Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC 27695 USA (e-mail: [email protected]; [email protected]).
L. Dai is with the Computing Sciences Division, U.S. Army Research Office, Research Triangle Park, Durham, NC 27709 USA (e-mail: [email protected]).
Communicated by O. Milenkovic, Associate Editor for Coding Theory.
Digital Object Identifier 10.1109/TIT.2014.2345260
low squared error on average, but does not guarantee that every estimated signal component is close to the corresponding original signal component. In problems such as image and video compression [10], where the reconstruction quality at every signal component is important, it would be desirable to optimize for the \(\ell_\infty\)-norm error. Our interest in the \(\ell_\infty\)-norm error is also motivated by applications including wireless communications [11], group testing [12], and trajectory planning in control systems [13], where we want to decrease the worst-case sensitivity to noise.
B. Problem Setting
In this correspondence, our main focus is on parallel Gaussian channels, and the results can be extended to linear mixing systems. In both settings, the input \(x \in \mathbb{R}^N\) is generated by an independent and identically distributed (i.i.d.) Gaussian mixture source,
\[
x_i \sim \sum_{k=1}^{K} s_k \cdot \mathcal{N}(\mu_k, \sigma_k^2) = \sum_{k=1}^{K} \frac{s_k}{\sqrt{2\pi\sigma_k^2}}\, e^{-\frac{(x_i - \mu_k)^2}{2\sigma_k^2}}, \tag{1}
\]
where the subscript \((\cdot)_i\) denotes the \(i\)-th component of a sequence (or a vector), \(\mu_1, \mu_2, \ldots, \mu_K\) (respectively, \(\sigma_1^2, \sigma_2^2, \ldots, \sigma_K^2\)) are the means (respectively, variances) of the Gaussian components, and \(0 < s_1, s_2, \ldots, s_K < 1\) are the probabilities of the \(K\) Gaussian components. Note that \(\sum_{k=1}^{K} s_k = 1\). A special case of the Gaussian mixture is Bernoulli-Gaussian,
\[
x_i \sim s \cdot \mathcal{N}(\mu_x, \sigma_x^2) + (1 - s) \cdot \delta(x_i), \tag{2}
\]
for some \(0 < s < 1\), \(\mu_x\), and \(\sigma_x^2\), where \(\delta(\cdot)\) is the delta function [3]. The zero-mean Bernoulli-Gaussian model is often used in sparse signal processing [14]–[21].
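The source models (1) and (2) are straightforward to simulate. The following sketch (illustrative only; the parameter values are arbitrary choices, not taken from this correspondence) draws i.i.d. samples from a Gaussian mixture and from a Bernoulli-Gaussian source:

```python
import numpy as np

def sample_gaussian_mixture(N, weights, means, variances, rng):
    # Mixture (1): pick component k with probability weights[k],
    # then draw x_i ~ N(means[k], variances[k]).
    labels = rng.choice(len(weights), size=N, p=weights)
    x = rng.normal(np.asarray(means)[labels], np.sqrt(np.asarray(variances)[labels]))
    return x, labels

def sample_bernoulli_gaussian(N, s, mu_x, var_x, rng):
    # Special case (2): Gaussian with probability s, exactly zero otherwise.
    active = rng.random(N) < s
    x = np.zeros(N)
    x[active] = rng.normal(mu_x, np.sqrt(var_x), size=int(active.sum()))
    return x, active

rng = np.random.default_rng(0)
x, active = sample_bernoulli_gaussian(100000, 0.1, 0.0, 4.0, rng)
print(active.mean(), x[active].var())  # close to s = 0.1 and sigma_x^2 = 4.0
```

The empirical fraction of Gaussian entries and their sample variance concentrate near the source parameters, as the typical-set argument in Section III exploits.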
In parallel Gaussian channels [5], [6], we consider
\[
r = x + z, \tag{3}
\]
where \(r, x, z \in \mathbb{R}^N\) are the output signal, the input signal, and the additive white Gaussian noise (AWGN), respectively. The AWGN channel can be described by the conditional distribution
\[
f_{R|X}(r|x) = \prod_{i=1}^{N} f_{R|X}(r_i|x_i) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma_z^2}} \exp\left(-\frac{(r_i - x_i)^2}{2\sigma_z^2}\right), \tag{4}
\]
where \(\sigma_z^2\) is the variance of the Gaussian noise.
0018-9448 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
TAN et al.: WIENER FILTERS IN GAUSSIAN MIXTURE SIGNAL ESTIMATION 6627
In a linear mixing system [14], [15], [17], [18], we consider
\[
w = \Phi x, \tag{5}
\]
where the measurement matrix \(\Phi \in \mathbb{R}^{M \times N}\) is sparse and its entries are i.i.d. Because each component of the measurement vector \(w \in \mathbb{R}^M\) is a linear combination of the components of \(x\), we call the system (5) a linear mixing system. The measurements \(w\) are passed through a bank of separable scalar channels characterized by conditional distributions
\[
f_{Y|W}(y|w) = \prod_{i=1}^{M} f_{Y|W}(y_i|w_i), \tag{6}
\]
where \(y \in \mathbb{R}^M\) are the channel outputs. However, unlike the parallel Gaussian channels (4), the channels (6) of the linear mixing system are not restricted to Gaussian [16], [18], [19].
Our goal is to estimate the original input signal \(x\) either from the parallel Gaussian channel outputs \(r\) in (3) or from the linear mixing system outputs \(y\) and the measurement matrix \(\Phi\) in (5) and (6). To evaluate how accurate the estimation process is, we quantify the \(\ell_\infty\)-norm error between \(x\) and its estimate \(\widehat{x}\),
\[
\|x - \widehat{x}\|_\infty = \max_{i \in \{1,\ldots,N\}} |x_i - \widehat{x}_i|;
\]
this error metric helps prevent any significant errors during the estimation process. The estimator that minimizes the expected value of \(\|x - \widehat{x}\|_\infty\) is called the minimum mean \(\ell_\infty\)-norm error estimator. We denote this estimator by \(\widehat{x}_{\ell_\infty}\), which can be expressed as
\[
\widehat{x}_{\ell_\infty} = \arg\min_{\widehat{x}} E\left[\|x - \widehat{x}\|_\infty\right]. \tag{7}
\]
C. Related Work
Gaussian mixtures are widely used to model various types of signals, and a number of signal estimation methods have been introduced to take advantage of the Gaussian mixture distribution. For example, an infinite Gaussian mixture model was proposed in [4] to represent real data such as images, and a denoising scheme based on local linear estimators was developed to estimate the original data. A similar algorithm based on an adaptive Wiener filter was applied to denoise X-ray CT images [6], where a Gaussian mixture model was utilized. However, these works only quantified the \(\ell_2\)-norm error of the denoising process. Signal estimation problems with \(\ell_\infty\)-norm error have not been well explored, but there have been studies on general properties of the \(\ell_\infty\)-norm. For example, in Clark [22], the author developed a deductive method to calculate the distribution of the greatest element in a finite set of random variables; and Indyk [23] discussed how to find the nearest neighbor of a point while taking the \(\ell_\infty\)-norm distance into consideration.
D. Contributions
In this correspondence, we study the estimator that minimizes the \(\ell_\infty\)-norm error in parallel Gaussian channels in an asymptotic setting where the signal dimension \(N \to \infty\). We prove that, when estimating an input signal that is generated by an i.i.d. Gaussian mixture source, the \(\ell_\infty\)-norm error always comes from the Gaussian component that has the largest variance. Therefore, the well-known Wiener filter achieves the minimum mean \(\ell_\infty\)-norm error. The Wiener filter is a simple linear function that is applied to the channel outputs, where the multiplicative constant of the linear function is computed by considering the greatest variance of the Gaussian mixture components (1) and the variance of the channel noise. Moreover, the Wiener filter can be applied to linear mixing systems defined in (5) and (6) to minimize the \(\ell_\infty\)-norm error, based on settings where a linear mixing system can be decoupled to parallel Gaussian channels [18], [19], [24]–[28].
The remainder of the correspondence is arranged as follows. We provide our main results and discuss their applications in Section II. Proofs of the main results appear in Section III, while Section IV concludes.
II. MAIN RESULTS
For parallel Gaussian channels (3), the minimum mean squared error estimator, denoted by \(\widehat{x}_{\ell_2}\), is achieved by the conditional expectation \(E[x|r]\). If the input signal \(x\) is i.i.d. Gaussian (not a Gaussian mixture), i.e., \(x_i \sim \mathcal{N}(\mu_x, \sigma_x^2)\), then the estimate
\[
\widehat{x}_{\ell_2} = E[x|r] = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_z^2}(r - \mu_x) + \mu_x \tag{8}
\]
achieves the minimum mean squared error, where \(\sigma_z^2\) is the variance of the Gaussian noise \(z\) in (3), and we use the convention that adding a scalar to (respectively, subtracting a scalar from) a vector means adding this scalar to (respectively, subtracting this scalar from) every component of the vector. The form in (8) is called the Wiener filter in signal processing [29]. It has been shown by Sherman [30], [31] that, besides the \(\ell_2\)-norm error, the linear Wiener filter is also optimal for all \(\ell_p\)-norm errors (\(p \ge 1\)), including the \(\ell_\infty\)-norm error. Surprisingly, we find that, even if the input signal is generated by an i.i.d. Gaussian mixture source, the Wiener filter still asymptotically minimizes the expected \(\ell_\infty\)-norm error.
Before providing the result for the Gaussian mixture input case, which is mathematically involved, we begin with an analysis of the simpler Bernoulli-Gaussian input case.
Theorem 1: In parallel Gaussian channels (3), if the input signal \(x\) is generated by an i.i.d. Bernoulli-Gaussian source defined in (2), then the Wiener filter
\[
\widehat{x}_{W,BG} = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_z^2}(r - \mu_x) + \mu_x \tag{9}
\]
asymptotically achieves the minimum mean \(\ell_\infty\)-norm error. More specifically,
\[
\lim_{N \to \infty} \frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty\right]}{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]} = 1,
\]
where \(\widehat{x}_{\ell_\infty}\) satisfies (7).
Theorem 1 is proved in Section III-A. The proof combines concepts in typical sets [32] and a result by Gnedenko [33], which provided asymptotic properties of the maximum of a Gaussian sequence. The main idea of the proof is to show that with overwhelming probability the maximum absolute error satisfies \(\|x - \widehat{x}\|_\infty = |x_i - \widehat{x}_i|\), where \(i \in \mathcal{I} = \{i : x_i \sim \mathcal{N}(\mu_x, \sigma_x^2)\}\), i.e., \(\mathcal{I}\) is the index set that includes all the Gaussian components of the vector \(x\) and excludes all the zero components of \(x\). Therefore, minimizing \(\|x - \widehat{x}\|_\infty\) is equivalent to minimizing \(\|x_{\mathcal{I}} - \widehat{x}_{\mathcal{I}}\|_\infty\), where \((\cdot)_{\mathcal{I}}\) denotes a subvector with entries in the index set \(\mathcal{I}\). Because the vector \(x_{\mathcal{I}}\) is i.i.d. Gaussian, the Wiener filter minimizes \(\|x_{\mathcal{I}} - \widehat{x}_{\mathcal{I}}\|_\infty\) [30], [31]; hence the Wiener filter minimizes \(\|x - \widehat{x}\|_\infty\) with overwhelming probability. On the other hand, because the cases where \(\|x - \widehat{x}\|_\infty = |x_i - \widehat{x}_i|\) and \(i \notin \mathcal{I}\) are rare, the mean \(\ell_\infty\)-norm error of the Wiener filter barely increases, and so the Wiener filter asymptotically minimizes the expected \(\ell_\infty\)-norm error.
Having discussed the Bernoulli-Gaussian case, let us proceed to the Gaussian mixture case defined in (1). Here the maximum absolute error between \(x\) and the estimate \(\widehat{x}\) satisfies \(\|x - \widehat{x}\|_\infty = |x_i - \widehat{x}_i|\), where \(i \in \mathcal{I}' = \{i : x_i \sim \mathcal{N}(\mu_m, \sigma_m^2)\}\) and \(m = \arg\max_{k \in \{1,2,\ldots,K\}} \sigma_k^2\). That is, the maximum absolute error between \(x\) and \(\widehat{x}\) lies in an index that corresponds to the Gaussian mixture component with greatest variance.
Theorem 2: In parallel Gaussian channels (3), if the input signal \(x\) is generated by an i.i.d. Gaussian mixture source defined in (1), then the Wiener filter
\[
\widehat{x}_{W,GM} = \frac{\sigma_m^2}{\sigma_m^2 + \sigma_z^2}(r - \mu_m) + \mu_m \tag{10}
\]
asymptotically achieves the minimum mean \(\ell_\infty\)-norm error, where \(m = \arg\max_{k \in \{1,2,\ldots,K\}} \sigma_k^2\). More specifically,
\[
\lim_{N \to \infty} \frac{E\left[\|x - \widehat{x}_{W,GM}\|_\infty\right]}{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]} = 1,
\]
where \(\widehat{x}_{\ell_\infty}\) satisfies (7).
The proof of Theorem 2 is given in Section III-B. We note in passing that the statements in Theorems 1 and 2 do not hold for \(\ell_p\)-norm error (\(0 < p < \infty\)). Because there is a one-to-one correspondence between the parameters (\(\mu_k\) and \(\sigma_k^2\)) of a Gaussian mixture component and its corresponding Wiener filter, if a Wiener filter is optimal in the \(\ell_p\) error sense for any one of the Gaussian mixture components, then it is suboptimal for the rest of the mixture components. Therefore, any single Wiener filter is suboptimal in the \(\ell_p\) error sense for any Gaussian mixture signal comprising more than one Gaussian component.
Remark 1: Theorems 1 and 2 can be extended to linear mixing systems. We consider a linear mixing system defined in (5), where the matrix \(\Phi \in \mathbb{R}^{M \times N}\) is i.i.d. sparse, and let \(\Lambda\) denote the average number of nonzeros in each row of \(\Phi\). It has been shown [16], [18], [19], [24]–[28] that, in a large-sparse-limit where \(M, N, \Lambda \to \infty\) with \(M/N \to \beta < \infty\) for some constant \(\beta > 0\) and \(\Lambda = o(N^{1/2})\), a linear mixing system (5) and (6) can be decoupled to an equivalent set of parallel Gaussian channels, \(q = x + v\), where \(v \in \mathbb{R}^N\) is the equivalent Gaussian noise, and \(q \in \mathbb{R}^N\) are the outputs of the decoupled parallel Gaussian channels. The statistical properties of the noise \(v\) are characterized by Tanaka's fixed point equation [24]–[26], [34]. Therefore, when the input signal \(x\) is generated by an i.i.d. Gaussian mixture source, by applying the Wiener filter to \(q\), we can obtain the estimate that minimizes the \(\ell_\infty\)-norm error of the signal estimation process.
III. PROOFS
A. Proof of Theorem 1
1)Two Error Patterns: We begin by defining two errorpatterns. Consider parallel Gaussian channels (3), where theinput signal xi ∼ s · N (μx , σ
2x ) + (1 − s) · δ(xi ) for some s,
and the noise zi ∼ N (0, σ 2z ). The Wiener filter (linear
estimator) for the Bernoulli-Gaussian input is xW,BG = σ 2x
σ 2x +σ 2
z·
(r − μx) + μx . Let I denote the index set wherexi ∼ N (μx , σ
2x ), and let J denote the index set where
x j ∼ δ(x j ). We define two types of error patterns: (i) for
i ∈ I �{i : xi ∼ N (μx , σ
2x
)},
the error is
ei � xW,BG,i − xi = σ 2x
σ 2x + σ 2
z· (ri − μx )
+ μx − xi ∼ N(
0,σ 2
x σ 2z
σ 2x + σ 2
z
),
where we remind readers that xW,BG,i denotes the i -th com-ponent of the vector xW,BG in (9); and (ii) for
j ∈ J �{
j : x j ∼ δ(x j )},
the error is
e j � xW,BG, j − x j = σ 2x
σ 2x + σ 2
z· (ri − μx) + μx − 0
∼ N(
σ 2z
σ 2x + σ 2
zμx ,
σ 4x σ 2
z
(σ 2x + σ 2
z )2
).
2) Maximum of Error Patterns: Let us compare \(\max_{i \in \mathcal{I}} |e_i|\) and \(\max_{j \in \mathcal{J}} |e_j|\).
Lemma 1: Suppose \(u_i\) is an i.i.d. Gaussian sequence of length \(N\), \(u_i \sim \mathcal{N}(\mu, \sigma^2)\) for \(i \in \{1, 2, \ldots, N\}\); then \(\frac{\max_{1 \le i \le N} |u_i|}{\sqrt{2\sigma^2 \ln(N)}}\) converges to 1 in probability. That is,
\[
\lim_{N \to \infty} \Pr\left(\left|\frac{\max_{1 \le i \le N} |u_i|}{\sqrt{2\sigma^2 \ln(N)}} - 1\right| < \epsilon\right) = 1, \tag{11}
\]
for any \(\epsilon > 0\). Lemma 1 is proved in Section III-C.
Before applying Lemma 1, we define a set \(A_\varepsilon\) of possible inputs \(x\) such that the numbers of components in the sets \(\mathcal{I}\) and \(\mathcal{J}\) both go to infinity as \(N \to \infty\),
\[
A_\varepsilon \triangleq \left\{x : \left|\frac{|\mathcal{I}|}{N} - s\right| < \varepsilon\right\}, \tag{12}
\]
where \(\varepsilon > 0\) and \(\varepsilon \to 0\) (namely, \(\varepsilon \to 0^+\)) as a function of the signal dimension \(N\), and \(|\mathcal{I}|\) denotes the cardinality of the set \(\mathcal{I}\). The definition of \(A_\varepsilon\) implies that \(\left||\mathcal{J}|/N - (1 - s)\right| < \varepsilon\) and \(|\mathcal{I}| + |\mathcal{J}| = N\). Therefore, if \(x \in A_\varepsilon\), then \(|\mathcal{I}|, |\mathcal{J}| \to \infty\) as \(N \to \infty\).
Now we are ready to evaluate \(\max_{i \in \mathcal{I}} |e_i|\) and \(\max_{j \in \mathcal{J}} |e_j|\). For i.i.d. Gaussian random variables \(e_i \sim \mathcal{N}\left(0, \frac{\sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2}\right)\), where \(i \in \mathcal{I}\), the equality (11) in Lemma 1 becomes
\[
\lim_{N \to \infty} \Pr\left(\left|\frac{\max_{i \in \mathcal{I}} |e_i|}{\sqrt{2 \cdot \frac{\sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2} \cdot \ln(|\mathcal{I}|)}} - 1\right| < \epsilon \;\middle|\; x \in A_\varepsilon\right) = 1, \tag{13}
\]
for any \(\epsilon > 0\). For i.i.d. Gaussian random variables \(e_j\), where \(j \in \mathcal{J}\), the equality (11) becomes
\[
\lim_{N \to \infty} \Pr\left(\left|\frac{\max_{j \in \mathcal{J}} |e_j|}{\sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2} \cdot \ln(|\mathcal{J}|)}} - 1\right| < \epsilon \;\middle|\; x \in A_\varepsilon\right) = 1, \tag{14}
\]
for any \(\epsilon > 0\). Equations (13) and (14) suggest that
\[
\lim_{N \to \infty} E\left[\frac{\max_{i \in \mathcal{I}} |e_i|}{\sqrt{2 \cdot \frac{\sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2} \cdot \ln(|\mathcal{I}|)}} \;\middle|\; x \in A_\varepsilon\right] = 1
\]
and
\[
\lim_{N \to \infty} E\left[\frac{\max_{j \in \mathcal{J}} |e_j|}{\sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2} \cdot \ln(|\mathcal{J}|)}} \;\middle|\; x \in A_\varepsilon\right] = 1,
\]
which yield
\[
\lim_{N \to \infty} E\left[\frac{\max_{i \in \mathcal{I}} |e_i|}{\sqrt{\ln(N)}} \;\middle|\; x \in A_\varepsilon\right] = \lim_{N \to \infty} \sqrt{2 \cdot \frac{\sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2} \cdot \frac{\ln(|\mathcal{I}|)}{\ln(N)}} \tag{15}
\]
and
\[
\lim_{N \to \infty} E\left[\frac{\max_{j \in \mathcal{J}} |e_j|}{\sqrt{\ln(N)}} \;\middle|\; x \in A_\varepsilon\right] = \lim_{N \to \infty} \sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2} \cdot \frac{\ln(|\mathcal{J}|)}{\ln(N)}}. \tag{16}
\]
According to the definition of \(A_\varepsilon\) in (12), where \(s\) is a constant and \(\varepsilon \to 0^+\),
\[
N(s - \varepsilon) < |\mathcal{I}| < N(s + \varepsilon)
\]
and
\[
N(1 - s - \varepsilon) < |\mathcal{J}| < N(1 - s + \varepsilon), \tag{17}
\]
and thus
\[
\lim_{N \to \infty} \sqrt{\frac{\ln(|\mathcal{I}|)}{\ln(N)}} = 1 \quad \text{and} \quad \lim_{N \to \infty} \sqrt{\frac{\ln(|\mathcal{J}|)}{\ln(N)}} = 1. \tag{18}
\]
Finally, equations (15) and (16) become
\[
\lim_{N \to \infty} E\left[\frac{\max_{i \in \mathcal{I}} |e_i|}{\sqrt{\ln(N)}} \;\middle|\; x \in A_\varepsilon\right] = \sqrt{2 \cdot \frac{\sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2}} \tag{19}
\]
and
\[
\lim_{N \to \infty} E\left[\frac{\max_{j \in \mathcal{J}} |e_j|}{\sqrt{\ln(N)}} \;\middle|\; x \in A_\varepsilon\right] = \sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2}}. \tag{20}
\]
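The limit (19) can be checked by drawing the errors \(e_i\) directly from their distribution (a sketch with arbitrary parameters; since the convergence is logarithmic in \(N\), the empirical value at \(N = 10^6\) still sits visibly below the limit):

```python
import numpy as np

rng = np.random.default_rng(5)
s, var_x, var_z, N, trials = 0.5, 1.0, 1.0, 10**6, 20
var_e = var_x * var_z / (var_x + var_z)        # variance of e_i for i in I
target = np.sqrt(2 * var_e)                    # right-hand side of (19)
vals = []
for _ in range(trials):
    nI = int(rng.binomial(N, s))               # |I| concentrates near s * N
    e = rng.normal(0.0, np.sqrt(var_e), nI)
    vals.append(np.max(np.abs(e)) / np.sqrt(np.log(N)))
print(np.mean(vals), target)  # the empirical mean approaches the limit from below
```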
Combining (13) and (14),
\[
\lim_{N \to \infty} \Pr\left(\frac{1 - \epsilon}{1 + \epsilon} < \frac{\max_{i \in \mathcal{I}} |e_i|}{\max_{j \in \mathcal{J}} |e_j|} \cdot \frac{\sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2} \cdot \ln(|\mathcal{J}|)}}{\sqrt{2 \cdot \frac{\sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2} \cdot \ln(|\mathcal{I}|)}} < \frac{1 + \epsilon}{1 - \epsilon} \;\middle|\; x \in A_\varepsilon\right) = 1. \tag{21}
\]
Note that
\[
\sqrt{\frac{\ln(N) + \ln(1 - s - \varepsilon)}{\ln(N) + \ln(s + \varepsilon)}} = \sqrt{\frac{\ln(N(1 - s - \varepsilon))}{\ln(N(s + \varepsilon))}} < \sqrt{\frac{\ln(|\mathcal{J}|)}{\ln(|\mathcal{I}|)}} < \sqrt{\frac{\ln(N(1 - s + \varepsilon))}{\ln(N(s - \varepsilon))}} = \sqrt{\frac{\ln(N) + \ln(1 - s + \varepsilon)}{\ln(N) + \ln(s - \varepsilon)}}.
\]
Then the following limit holds,
\[
\lim_{N \to \infty} \sqrt{\frac{\ln(|\mathcal{J}|)}{\ln(|\mathcal{I}|)}} = 1.
\]
We can write the above limit in probabilistic form,
\[
\lim_{N \to \infty} \Pr\left(\left|\sqrt{\frac{\ln(|\mathcal{J}|)}{\ln(|\mathcal{I}|)}} - 1\right| < \epsilon \;\middle|\; x \in A_\varepsilon\right) = 1, \tag{22}
\]
for any \(\epsilon > 0\). Because of the logarithms in (22), the ratio \(\frac{\sqrt{2\ln(|\mathcal{J}|)}}{\sqrt{2\ln(|\mathcal{I}|)}}\) becomes sufficiently close to 1 only when \(N\) is astronomically large. This is why we point out in Section IV that the asymptotic results in this correspondence might be impractical. Plugging (22) into (21),
\[
\lim_{N \to \infty} \Pr\left(\frac{1 - \epsilon}{(1 + \epsilon)^2} \cdot \sqrt{\frac{\sigma_x^2 + \sigma_z^2}{\sigma_x^2}} < \frac{\max_{i \in \mathcal{I}} |e_i|}{\max_{j \in \mathcal{J}} |e_j|} < \frac{1 + \epsilon}{(1 - \epsilon)^2} \cdot \sqrt{\frac{\sigma_x^2 + \sigma_z^2}{\sigma_x^2}} \;\middle|\; x \in A_\varepsilon\right) = 1. \tag{23}
\]
Equation (23) holds for any \(\epsilon > 0\). We note that \(\sqrt{\frac{\sigma_x^2 + \sigma_z^2}{\sigma_x^2}} > 1\), and thus \(\frac{1 - \epsilon}{(1 + \epsilon)^2} \cdot \sqrt{\frac{\sigma_x^2 + \sigma_z^2}{\sigma_x^2}} > 1\) for sufficiently small \(\epsilon\).
Therefore,
\[
\lim_{N \to \infty} \Pr\left(\frac{\max_{i \in \mathcal{I}} |e_i|}{\max_{j \in \mathcal{J}} |e_j|} > 1 \;\middle|\; x \in A_\varepsilon\right) = \lim_{N \to \infty} \Pr\left(\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} > 1 \;\middle|\; x \in A_\varepsilon\right) = 1, \tag{24}
\]
and
\[
\lim_{N \to \infty} \Pr\left(\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} \le 1 \;\middle|\; x \in A_\varepsilon\right) = 0.
\]
Conditioning on whether this ratio exceeds 1, the normalized error decomposes as
\[
\begin{aligned}
\lim_{N \to \infty} \frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \in A_\varepsilon\right]}{\sqrt{\ln(N)}}
&= \lim_{N \to \infty} E\left[\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\sqrt{\ln(N)}} \;\middle|\; x \in A_\varepsilon, \frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} > 1\right] \cdot \Pr\left(\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} > 1 \;\middle|\; x \in A_\varepsilon\right) \\
&\quad + \lim_{N \to \infty} E\left[\frac{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|}{\sqrt{\ln(N)}} \;\middle|\; x \in A_\varepsilon, \frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} \le 1\right] \cdot \Pr\left(\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} \le 1 \;\middle|\; x \in A_\varepsilon\right). \tag{25}
\end{aligned}
\]
3) Mean \(\ell_\infty\)-Norm Error: The road map for the remainder of the proof is to first show that when \(x \in A_\varepsilon\) the Wiener filter is asymptotically optimal for the expected \(\ell_\infty\)-norm error, and then show that \(\Pr(x \in A_\varepsilon)\) is arbitrarily close to 1.
In order to utilize equations (19) and (20), we normalize the quantities in the derivation of equation (25) by \(\sqrt{\ln(N)}\) so that every term is bounded. Let us now verify that the second term in (25) equals 0.
In fact, the following derivations hold from (14) and (24):
\[
\begin{aligned}
1 &= \lim_{N \to \infty} \Pr\left(\left|\frac{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|}{\sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2} \cdot \ln(|\mathcal{J}|)}} - 1\right| < \epsilon \;\middle|\; x \in A_\varepsilon\right) \\
&= \lim_{N \to \infty} \Pr\left(\left|\frac{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|}{\sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2} \cdot \ln(|\mathcal{J}|)}} - 1\right| < \epsilon \;\middle|\; x \in A_\varepsilon, \frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} > 1\right).
\end{aligned}
\]
Therefore,
\[
\lim_{N \to \infty} E\left[\frac{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|}{\sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2} \cdot \ln(|\mathcal{J}|)}} \;\middle|\; x \in A_\varepsilon, \frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} > 1\right] = 1,
\]
which yields (following derivations similar to those of (16) and (20))
\[
\lim_{N \to \infty} E\left[\frac{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|}{\sqrt{\ln(N)}} \;\middle|\; x \in A_\varepsilon, \frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} > 1\right] = \sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2}}.
\]
Therefore, the second term in (25) equals \(\sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2}} \times 0 = 0\), and equation (25) becomes
\[
\begin{aligned}
\lim_{N \to \infty} \frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \in A_\varepsilon\right]}{\sqrt{\ln(N)}}
&= \lim_{N \to \infty} E\left[\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\sqrt{\ln(N)}} \;\middle|\; x \in A_\varepsilon, \frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} > 1\right] \\
&= \lim_{N \to \infty} E\left[\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\sqrt{\ln(N)}} \;\middle|\; x \in A_\varepsilon\right] \\
&\quad - \lim_{N \to \infty} E\left[\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\sqrt{\ln(N)}} \;\middle|\; x \in A_\varepsilon, \frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} \le 1\right] \cdot \Pr\left(\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} \le 1 \;\middle|\; x \in A_\varepsilon\right) \\
&= \lim_{N \to \infty} E\left[\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\sqrt{\ln(N)}} \;\middle|\; x \in A_\varepsilon\right]. \tag{26}
\end{aligned}
\]
Equation (26) shows that the maximum absolute error of the Wiener filter relates to the Gaussian-distributed components of \(x\).
4) Optimality of the Wiener Filter: It has been shown by Sherman [30], [31] that, for parallel Gaussian channels with an i.i.d. Gaussian input \(x\), if an error metric function \(d(x, \widehat{x})\) relating \(x\) and its estimate \(\widehat{x}\) is convex, then the Wiener filter is optimal for that error metric. The \(\ell_\infty\)-norm is convex, and therefore, for any estimator \(\widehat{x}\),
\[
\begin{aligned}
E\left[\|x - \widehat{x}\|_\infty \mid x \in A_\varepsilon\right] &= E\left[\max_{i \in \mathcal{I} \cup \mathcal{J}} |x_i - \widehat{x}_i| \;\middle|\; x \in A_\varepsilon\right] \\
&\ge E\left[\max_{i \in \mathcal{I}} |x_i - \widehat{x}_i| \;\middle|\; x \in A_\varepsilon\right] \\
&\ge E\left[\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}| \;\middle|\; x \in A_\varepsilon\right]. \tag{27}
\end{aligned}
\]
The inequality (27) holds because the set \(\{x_i : i \in \mathcal{I}\}\) only contains the i.i.d. Gaussian components of \(x\), and the Wiener filter is optimal for the \(\ell_\infty\)-norm error when the input signal is i.i.d. Gaussian. The inequality (27) holds for any signal length \(N\), and thus it holds when \(N \to \infty\) and we divide both sides by \(\sqrt{\ln(N)}\):
\[
\begin{aligned}
0 &\le \lim_{N \to \infty} \left(\frac{E\left[\|x - \widehat{x}\|_\infty \mid x \in A_\varepsilon\right]}{\sqrt{\ln(N)}} - \frac{E\left[\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}| \mid x \in A_\varepsilon\right]}{\sqrt{\ln(N)}}\right) \\
&= \lim_{N \to \infty} \left(\frac{E\left[\|x - \widehat{x}\|_\infty \mid x \in A_\varepsilon\right]}{\sqrt{\ln(N)}} - \frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \in A_\varepsilon\right]}{\sqrt{\ln(N)}}\right), \tag{28}
\end{aligned}
\]
where the last step in (28) is justified by the derivation of (26). Equation (28) also holds for \(\widehat{x} = \widehat{x}_{\ell_\infty}\),
\[
\lim_{N \to \infty} \left(\frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \in A_\varepsilon\right]}{\sqrt{\ln(N)}} - \frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \in A_\varepsilon\right]}{\sqrt{\ln(N)}}\right) \ge 0. \tag{29}
\]
5) Typical Set: Let us now evaluate \(\Pr(x \in A_\varepsilon)\). The set \(A_\varepsilon\) only considers whether the components in \(x\) are Gaussian or zero, and so we introduce a binary vector \(\tilde{x} \in \mathbb{R}^N\), where \(\tilde{x}_i = 1_{\{x_i \sim \mathcal{N}(\mu_x, \sigma_x^2)\}}\) and \(1_{\{\cdot\}}\) is the indicator function. That is, \(\tilde{x}_i = 1\) if \(x_i\) is Gaussian, and else \(\tilde{x}_i = 0\). The sequence \(\tilde{x} \triangleq \{\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N\}\) is called a typical sequence ([32], p. 59) if it satisfies
\[
2^{-N(H(X)+\delta)} \le \Pr(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N) \le 2^{-N(H(X)-\delta)}, \tag{30}
\]
for some \(\delta > 0\), where \(H(X)\) denotes the binary entropy [32] of the sequence \(\{\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N\}\). The set \(A_\varepsilon\) is then called a typical set [32], and
\[
\Pr(x \in A_\varepsilon) > 1 - \delta. \tag{31}
\]
We highlight that the inequalities (30) and (31) both hold when \(\delta \to 0^+\) as a function of \(N\).
In our problem setting, where \(\Pr(\tilde{x}_i = 1) = \Pr(x_i \sim \mathcal{N}(\mu_x, \sigma_x^2)) = s\), the entropy of the sequence \(\{\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N\}\) is
\[
H(X) = -s \log_2(s) - (1 - s) \log_2(1 - s), \tag{32}
\]
and the probability of the sequence \(\{\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N\}\) is
\[
\Pr(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_N) = s^{|\mathcal{I}|} \cdot (1 - s)^{|\mathcal{J}|}. \tag{33}
\]
Plugging (17), (32), and (33) into (30), the value of \(\delta\) can be computed,
\[
\delta = \varepsilon \left|\log_2\left(\frac{s}{1 - s}\right)\right|, \tag{34}
\]
for \(0 < s < 1\) and \(s \neq 0.5\). That is,
\[
\Pr(x \in A_\varepsilon) > 1 - \delta = 1 - \varepsilon \left|\log_2\left(\frac{s}{1 - s}\right)\right|. \tag{35}
\]
Finally, we compare \(E[\|x - \widehat{x}_{W,BG}\|_\infty]\) with \(E[\|x - \widehat{x}_{\ell_\infty}\|_\infty]\), where \(\widehat{x}_{\ell_\infty}\) satisfies (7), i.e., the estimate \(\widehat{x}_{\ell_\infty}\) is optimal for minimizing the mean \(\ell_\infty\)-norm error of estimation. By definition,
\[
\lim_{N \to \infty} \left(\frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]}{\sqrt{\ln(N)}} - \frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty\right]}{\sqrt{\ln(N)}}\right) \le 0,
\]
but we already proved in (29) that
\[
\lim_{N \to \infty} \left(\frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \in A_\varepsilon\right]}{\sqrt{\ln(N)}} - \frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \in A_\varepsilon\right]}{\sqrt{\ln(N)}}\right) \ge 0,
\]
and thus
\[
\lim_{N \to \infty} \left(\frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \notin A_\varepsilon\right]}{\sqrt{\ln(N)}} - \frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \notin A_\varepsilon\right]}{\sqrt{\ln(N)}}\right) \le 0. \tag{36}
\]
We know that \(\Pr(x \notin A_\varepsilon) < \delta\) from (35). To complete the proof, it suffices to show that, when \(x \notin A_\varepsilon\), the difference in (36) is bounded. When \(x \notin A_\varepsilon\), there are three cases for the possible values of \(|\mathcal{I}|\) and \(|\mathcal{J}|\):
• Case 1: \(|\mathcal{I}|, |\mathcal{J}| \to \infty\), but (18) may not hold.
• Case 2: \(|\mathcal{J}| \to \infty\) but \(|\mathcal{I}| \not\to \infty\).
• Case 3: \(|\mathcal{I}| \to \infty\) but \(|\mathcal{J}| \not\to \infty\).
We observe that equations (19) and (20) are derived from (15), (16), and (18). In Case 1, equalities similar to (15) and (16) hold,
\[
\lim_{N \to \infty} E\left[\frac{\max_{i \in \mathcal{I}} |e_i|}{\sqrt{\ln(N)}} \;\middle|\; \text{Case 1 of } x \notin A_\varepsilon\right] = \lim_{N \to \infty} \sqrt{2 \cdot \frac{\sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2} \cdot \frac{\ln(|\mathcal{I}|)}{\ln(N)}} \le \sqrt{2 \cdot \frac{\sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2}}
\]
and
\[
\lim_{N \to \infty} E\left[\frac{\max_{j \in \mathcal{J}} |e_j|}{\sqrt{\ln(N)}} \;\middle|\; \text{Case 1 of } x \notin A_\varepsilon\right] = \lim_{N \to \infty} \sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2} \cdot \frac{\ln(|\mathcal{J}|)}{\ln(N)}} \le \sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2}}.
\]
Therefore, the value of \(\lim_{N \to \infty} E\left[\frac{\|x - \widehat{x}_{W,BG}\|_\infty}{\sqrt{\ln(N)}} \,\middle|\, \text{Case 1 of } x \notin A_\varepsilon\right]\) is bounded.
In Case 2, it is obvious that \(\lim_{N \to \infty} E\left[\frac{\max_{i \in \mathcal{I}} |e_i|}{\sqrt{\ln(N)}} \,\middle|\, \text{Case 2 of } x \notin A_\varepsilon\right]\) is bounded because \(|\mathcal{I}| \not\to \infty\), while \(\lim_{N \to \infty} E\left[\frac{\max_{j \in \mathcal{J}} |e_j|}{\sqrt{\ln(N)}} \,\middle|\, \text{Case 2 of } x \notin A_\varepsilon\right]\) is bounded because \(|\mathcal{J}| \le N\), and
\[
\lim_{N \to \infty} E\left[\frac{\max_{j \in \mathcal{J}} |e_j|}{\sqrt{\ln(N)}} \;\middle|\; \text{Case 2 of } x \notin A_\varepsilon\right] \le \sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2}}.
\]
The analysis for Case 3 is similar to that of Case 2. Therefore, we have shown that \(\lim_{N \to \infty} E\left[\frac{\|x - \widehat{x}_{W,BG}\|_\infty}{\sqrt{\ln(N)}} \,\middle|\, x \notin A_\varepsilon\right]\) is bounded.
\[
\begin{aligned}
\lim_{N \to \infty} \frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty\right]}{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]}
&= \lim_{N \to \infty} \frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty\right]/\sqrt{\ln(N)}}{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]/\sqrt{\ln(N)}} \\
&= \lim_{N \to \infty} \frac{\frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \in A_\varepsilon\right]}{\sqrt{\ln(N)}} \Pr(x \in A_\varepsilon) + \frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \notin A_\varepsilon\right]}{\sqrt{\ln(N)}} \Pr(x \notin A_\varepsilon)}{\frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \in A_\varepsilon\right]}{\sqrt{\ln(N)}} \Pr(x \in A_\varepsilon) + \frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \notin A_\varepsilon\right]}{\sqrt{\ln(N)}} \Pr(x \notin A_\varepsilon)} \\
&= \lim_{N \to \infty} \frac{\frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \in A_\varepsilon\right]}{\sqrt{\ln(N)}} \Pr(x \in A_\varepsilon) + \frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \notin A_\varepsilon\right]}{\sqrt{\ln(N)}} \Pr(x \notin A_\varepsilon)}{\frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \in A_\varepsilon\right]}{\sqrt{\ln(N)}} \Pr(x \in A_\varepsilon) + \frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \notin A_\varepsilon\right]}{\sqrt{\ln(N)}} \Pr(x \notin A_\varepsilon)} \\
&\quad + \lim_{N \to \infty} \frac{\frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \notin A_\varepsilon\right]}{\sqrt{\ln(N)}} \Pr(x \notin A_\varepsilon) - \frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \notin A_\varepsilon\right]}{\sqrt{\ln(N)}} \Pr(x \notin A_\varepsilon)}{\frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \in A_\varepsilon\right]}{\sqrt{\ln(N)}} \Pr(x \in A_\varepsilon) + \frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \notin A_\varepsilon\right]}{\sqrt{\ln(N)}} \Pr(x \notin A_\varepsilon)} \\
&\le 1 + \lim_{N \to \infty} \frac{\left(\frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \notin A_\varepsilon\right]}{\sqrt{\ln(N)}} - \frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \notin A_\varepsilon\right]}{\sqrt{\ln(N)}}\right) \cdot \Pr(x \notin A_\varepsilon)}{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]/\sqrt{\ln(N)}} \\
&< 1 + \lim_{N \to \infty} \frac{c \cdot \delta}{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]/\sqrt{\ln(N)}}. \tag{37}
\end{aligned}
\]
By (36), \(\lim_{N \to \infty} \frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \notin A_\varepsilon\right]}{\sqrt{\ln(N)}}\) is bounded above by \(\lim_{N \to \infty} \frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \notin A_\varepsilon\right]}{\sqrt{\ln(N)}}\). Hence,
\[
\lim_{N \to \infty} \left(\frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \notin A_\varepsilon\right]}{\sqrt{\ln(N)}} - \frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \notin A_\varepsilon\right]}{\sqrt{\ln(N)}}\right) = c
\]
is bounded, where \(c > 0\) is a constant. Therefore, the derivation in equation (37) holds. In (37), the value of \(\lim_{N \to \infty} \frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]}{\sqrt{\ln(N)}}\) is bounded below because of (29),
\[
\begin{aligned}
\lim_{N \to \infty} \frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]}{\sqrt{\ln(N)}}
&= \lim_{N \to \infty} \frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \in A_\varepsilon\right]}{\sqrt{\ln(N)}} \cdot \Pr(x \in A_\varepsilon) + \lim_{N \to \infty} \frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \notin A_\varepsilon\right]}{\sqrt{\ln(N)}} \cdot \Pr(x \notin A_\varepsilon) \\
&\ge \lim_{N \to \infty} \frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \in A_\varepsilon\right]}{\sqrt{\ln(N)}} \cdot \Pr(x \in A_\varepsilon) \\
&> \sqrt{2 \cdot \frac{\sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2}} \cdot (1 - \delta).
\end{aligned}
\]
On the other hand, whether the value of \(\frac{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]}{\sqrt{\ln(N)}}\) is bounded above or not, the second term in (37) is always arbitrarily small because \(\delta\) is arbitrarily small, and thus (37) is equivalent to
\[
\lim_{N \to \infty} \frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty\right]}{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]} < 1 + \delta,
\]
where \(\delta \to 0^+\) as a function of \(N\). Finally, because \(\widehat{x}_{\ell_\infty}\) is the optimal estimator for the \(\ell_\infty\)-norm error,
\[
\lim_{N \to \infty} \frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty\right]}{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]} \ge 1.
\]
Therefore,
\[
\lim_{N \to \infty} \frac{E\left[\|x - \widehat{x}_{W,BG}\|_\infty\right]}{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]} = 1,
\]
which completes the proof.
B. Proof of Theorem 2
The road map of the proof of Theorem 2 is the same as that of Theorem 1.
1) K Error Patterns: The input signal of the parallel Gaussian channels (3) is generated by an i.i.d. Gaussian mixture source (1), and suppose without loss of generality that \(\sigma_1^2 = \max_{k \in \{1,2,\ldots,K\}} \sigma_k^2\). The Wiener filter is \(\widehat{x}_{W,GM} = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_z^2}(r - \mu_1) + \mu_1 = \frac{\sigma_1^2 r + \sigma_z^2 \mu_1}{\sigma_1^2 + \sigma_z^2}\). Let \(\mathcal{I}_k\) denote the index set where \(x_i \sim \mathcal{N}(\mu_k, \sigma_k^2)\). Then we define \(K\) types of error patterns: for \(k \in \{1, 2, \ldots, K\}\), the \(k\)-th error pattern is
\[
e_i^{(k)} \triangleq \widehat{x}_{W,GM,i} - x_i = \frac{\sigma_1^2 r_i + \sigma_z^2 \mu_1}{\sigma_1^2 + \sigma_z^2} - x_i \sim \mathcal{N}\left(\frac{\sigma_z^2}{\sigma_1^2 + \sigma_z^2}\mu_1 - \frac{\sigma_z^2}{\sigma_1^2 + \sigma_z^2}\mu_k, \; \frac{\sigma_z^4}{(\sigma_1^2 + \sigma_z^2)^2}\sigma_k^2 + \frac{\sigma_1^4}{(\sigma_1^2 + \sigma_z^2)^2}\sigma_z^2\right),
\]
where
\[
i \in \mathcal{I}_k \triangleq \{i : x_i \sim \mathcal{N}(\mu_k, \sigma_k^2)\}.
\]
Because the variances \(\sigma_z^2, \sigma_1^2, \sigma_2^2, \ldots, \sigma_K^2 > 0\) are constants, and \(\sigma_1^2 = \max_{k \in \{1,2,\ldots,K\}} \sigma_k^2\),
\[
\frac{\sigma_z^4}{(\sigma_1^2 + \sigma_z^2)^2}\sigma_1^2 + \frac{\sigma_1^4}{(\sigma_1^2 + \sigma_z^2)^2}\sigma_z^2 = \max_{k \in \{1,2,\ldots,K\}} \left(\frac{\sigma_z^4}{(\sigma_1^2 + \sigma_z^2)^2}\sigma_k^2 + \frac{\sigma_1^4}{(\sigma_1^2 + \sigma_z^2)^2}\sigma_z^2\right), \tag{38}
\]
which shows that the first error pattern \(e_i^{(1)}\) has the greatest variance.
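The variance comparison (38) is a one-line computation; a quick numerical check (arbitrary illustrative variances with \(\sigma_1^2\) the largest):

```python
import numpy as np

var_z = 1.0
var = np.array([4.0, 2.0, 0.5])     # sigma_1^2 = 4 is the largest variance
# Variance of the k-th error pattern, as in (38).
pattern_var = (var_z**2 * var + var[0]**2 * var_z) / (var[0] + var_z) ** 2
print(pattern_var)  # the first entry is the maximum, as claimed in (38)
```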
2) Maximum of Error Patterns: Define the set \(A_\varepsilon\) as
\[
A_\varepsilon \triangleq \left\{x : \left|\frac{|\mathcal{I}_1|}{N} - s_1\right| < \varepsilon_1, \left|\frac{|\mathcal{I}_2|}{N} - s_2\right| < \varepsilon_2, \ldots, \left|\frac{|\mathcal{I}_K|}{N} - s_K\right| < \varepsilon_K\right\},
\]
where \(\sum_{k=1}^{K} |\mathcal{I}_k| = N\), and \(\varepsilon_k \to 0^+\) as a function of \(N\) for \(k \in \{1, 2, \ldots, K\}\). Applying a derivation similar to that of (24), we obtain
\[
\lim_{N \to \infty} \Pr\left(\frac{\max_{i \in \mathcal{I}_1} |\widehat{x}_{W,GM,i} - x_i|}{\max_{j \in \mathcal{I}_k} |\widehat{x}_{W,GM,j} - x_j|} > \frac{1 - \epsilon}{(1 + \epsilon)^2} \cdot \frac{\sqrt{\frac{\sigma_z^4}{(\sigma_1^2 + \sigma_z^2)^2}\sigma_1^2 + \frac{\sigma_1^4}{(\sigma_1^2 + \sigma_z^2)^2}\sigma_z^2}}{\sqrt{\frac{\sigma_z^4}{(\sigma_1^2 + \sigma_z^2)^2}\sigma_k^2 + \frac{\sigma_1^4}{(\sigma_1^2 + \sigma_z^2)^2}\sigma_z^2}} \;\middle|\; x \in A_\varepsilon\right) \tag{39}
\]
\[
= \lim_{N \to \infty} \Pr\left(\frac{\max_{i \in \mathcal{I}_1} |\widehat{x}_{W,GM,i} - x_i|}{\max_{j \in \mathcal{I}_k} |\widehat{x}_{W,GM,j} - x_j|} \ge 1 \;\middle|\; x \in A_\varepsilon\right) = 1, \tag{40}
\]
for any \(k \neq 1\). Equation (40) is valid because (39) holds for any \(\epsilon > 0\), and
\[
\frac{\sqrt{\frac{\sigma_z^4}{(\sigma_1^2 + \sigma_z^2)^2}\sigma_1^2 + \frac{\sigma_1^4}{(\sigma_1^2 + \sigma_z^2)^2}\sigma_z^2}}{\sqrt{\frac{\sigma_z^4}{(\sigma_1^2 + \sigma_z^2)^2}\sigma_k^2 + \frac{\sigma_1^4}{(\sigma_1^2 + \sigma_z^2)^2}\sigma_z^2}} \ge 1
\]
follows from (38). Hence,
\[
\lim_{N \to \infty} E\left[\frac{\|x - \widehat{x}_{W,GM}\|_\infty}{\sqrt{\ln(N)}} \;\middle|\; x \in A_\varepsilon\right] = \lim_{N \to \infty} E\left[\frac{\max_{i \in \mathcal{I}_1} |x_i - \widehat{x}_{W,GM,i}|}{\sqrt{\ln(N)}} \;\middle|\; x \in A_\varepsilon\right]. \tag{41}
\]
Equation (41) shows that the maximum absolute error of the Wiener filter relates to the Gaussian component that has the greatest variance.
3) Optimality of the Wiener Filter: Applying derivations similar to those of equations (28) and (29),
\[
\lim_{N \to \infty} \left(E\left[\frac{\|x - \widehat{x}_{\ell_\infty}\|_\infty}{\sqrt{\ln(N)}} \;\middle|\; x \in A_\varepsilon\right] - E\left[\frac{\|x - \widehat{x}_{W,GM}\|_\infty}{\sqrt{\ln(N)}} \;\middle|\; x \in A_\varepsilon\right]\right) \ge 0.
\]
4) Typical Set: Similar to the derivation of (34), we obtain the probability of \(x \in A_\varepsilon\) ([32], p. 59),
\[
\Pr(x \in A_\varepsilon) > 1 - \delta,
\]
where
\[
\delta = \sum_{k=1}^{K} \varepsilon_k \left|\log_2(s_k)\right|.
\]
Finally,
\[
\lim_{N \to \infty} \frac{E\left[\|x - \widehat{x}_{W,GM}\|_\infty\right]}{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]} < 1 + \delta,
\]
where \(\delta \to 0^+\), and thus
\[
\lim_{N \to \infty} \frac{E\left[\|x - \widehat{x}_{W,GM}\|_\infty\right]}{E\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]} = 1.
\]
C. Proof of Lemma 1
It has been shown [33], [35] that for an i.i.d. standard Gaussian sequence \(u_i \sim \mathcal{N}(0, 1)\), where \(i \in \{1, 2, \ldots, N\}\), the maximum of the sequence, \(\max_i u_i\), converges to \(\sqrt{2\ln(N)}\) in probability, i.e.,
\[
\lim_{N \to \infty} \Pr\left(\left|\frac{\max_{1 \le i \le N} u_i}{\sqrt{2\ln(N)}} - 1\right| < \epsilon\right) = 1,
\]
for any \(\epsilon > 0\). Therefore, for an i.i.d. non-standard Gaussian sequence \(u_i \sim \mathcal{N}(\mu, \sigma^2)\), we have \(\frac{u_i - \mu}{|\sigma|} \sim \mathcal{N}(0, 1)\), and it follows that
\[
\lim_{N \to \infty} \Pr\left(\left|\frac{\max_{1 \le i \le N} (u_i - \mu)}{\sqrt{2\sigma^2 \ln(N)}} - 1\right| < \epsilon\right) = 1, \tag{42}
\]
for any \(\epsilon > 0\). We observe that, for a given \(\mu\), the following probability equals 1 for sufficiently large \(N\), and therefore,
\[
\lim_{N \to \infty} \Pr\left(\left|\frac{-\mu}{\sqrt{2\sigma^2 \ln(N)}} - 0\right| < \epsilon\right) = 1, \tag{43}
\]
for any \(\epsilon > 0\). Combining (42) and (43),
\[
\lim_{N \to \infty} \Pr\left(\left|\frac{\max_{1 \le i \le N} u_i}{\sqrt{2\sigma^2 \ln(N)}} - 1\right| < 2\epsilon\right) = 1,
\]
for any \(\epsilon > 0\), which owing to the arbitrariness of \(\epsilon\) yields
\[
\lim_{N \to \infty} \Pr\left(\left|\frac{\max_{1 \le i \le N} u_i}{\sqrt{2\sigma^2 \ln(N)}} - 1\right| < \epsilon\right) = 1. \tag{44}
\]
Equation (44) suggests that, for a sequence of i.i.d. Gaussian random variables \(u_i \sim \mathcal{N}(\mu, \sigma^2)\), the maximum of the sequence is not affected by the value of \(\mu\). On the other hand, the i.i.d. Gaussian sequence \((-u_i) \sim \mathcal{N}(-\mu, \sigma^2)\) satisfies
\[
\lim_{N \to \infty} \Pr\left(\left|\frac{\max_{1 \le i \le N} (-u_i)}{\sqrt{2\sigma^2 \ln(N)}} - 1\right| < \epsilon\right) = 1.
\]
Hence,
\[
\begin{aligned}
\lim_{N \to \infty} \Pr\left(\left|\frac{\max_{1 \le i \le N} |u_i|}{\sqrt{2\sigma^2 \ln(N)}} - 1\right| < \epsilon\right)
&= \lim_{N \to \infty} \Pr\left(\left|\frac{\max_{1 \le i \le N} u_i}{\sqrt{2\sigma^2 \ln(N)}} - 1\right| < \epsilon \ \text{and}\ \left|\frac{\max_{1 \le i \le N} (-u_i)}{\sqrt{2\sigma^2 \ln(N)}} - 1\right| < \epsilon\right) \\
&= \lim_{N \to \infty} \Pr\left(\left|\frac{\max_{1 \le i \le N} u_i}{\sqrt{2\sigma^2 \ln(N)}} - 1\right| < \epsilon\right) - \lim_{N \to \infty} \Pr\left(\left|\frac{\max_{1 \le i \le N} u_i}{\sqrt{2\sigma^2 \ln(N)}} - 1\right| < \epsilon \ \text{and}\ \left|\frac{\max_{1 \le i \le N} (-u_i)}{\sqrt{2\sigma^2 \ln(N)}} - 1\right| > \epsilon\right) \\
&= \lim_{N \to \infty} \Pr\left(\left|\frac{\max_{1 \le i \le N} u_i}{\sqrt{2\sigma^2 \ln(N)}} - 1\right| < \epsilon\right) - 0 = 1,
\end{aligned}
\]
for any \(\epsilon > 0\), which completes the proof of Lemma 1.
IV. CONCLUSION
This correspondence focused on estimating input signals in parallel Gaussian channels, where the signals were generated by i.i.d. Gaussian mixture sources, and the \(\ell_\infty\)-norm error was used to quantify the performance. We proved that the Wiener filter (10), a simple linear function that is applied to the Gaussian channel outputs, asymptotically minimizes the mean \(\ell_\infty\)-norm error when the signal dimension \(N \to \infty\). Specifically, the multiplicative constant of the linear filter only relates to the greatest variance of the Gaussian mixture components and the variance of the Gaussian noise. Our results for parallel Gaussian channels can be extended to linear mixing systems, in settings where linear mixing systems can be decoupled to parallel Gaussian channels.
Our results are asymptotic, and one will notice from (22) in Section III-A that they take hold only for astronomically large signal dimension \(N\), which may lead readers to wonder whether the Wiener filter performs well when the signal dimension is finite. To answer this question, we performed numerical simulations for finite signal dimensions. The numerical results showed that the Wiener filter indeed reduces the \(\ell_\infty\)-norm error to some extent. Specifically, the Wiener filter outperforms the relaxed belief propagation algorithm [18], [19] in linear mixing systems. However, our numerical results suggested that there exist better algorithms [2] for the \(\ell_\infty\)-norm error than the Wiener filter in finite signal dimension settings. The development of optimal algorithms in the finite dimension setting is left for future work.
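As a minimal illustration of such a finite-\(N\) experiment (not the simulations reported above; the parameters are arbitrary illustrative choices), the sketch below compares the average \(\ell_\infty\)-norm error of the Wiener filter (10) against using the raw channel outputs:

```python
import numpy as np

rng = np.random.default_rng(7)
weights, var, var_z = np.array([0.5, 0.5]), np.array([4.0, 1.0]), 1.0
N, trials = 5000, 100
gain = var.max() / (var.max() + var_z)   # Wiener filter (10), zero means
raw, wiener = [], []
for _ in range(trials):
    k = rng.choice(2, size=N, p=weights)
    x = rng.normal(0.0, np.sqrt(var[k]))
    r = x + rng.normal(0.0, np.sqrt(var_z), N)
    raw.append(np.max(np.abs(r - x)))            # error of the raw outputs
    wiener.append(np.max(np.abs(gain * r - x)))  # error of the Wiener estimate
print(np.mean(wiener), np.mean(raw))  # the Wiener filter gives the smaller error
```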
ACKNOWLEDGMENTS
The authors would like to thank Nikhil Krishnan for useful discussions. They would also like to thank the reviewers for their comments, which greatly helped improve this manuscript.
REFERENCES
[1] J. Tan and D. Baron, “Signal reconstruction in linear mixing systems with different error metrics,” in Proc. Inf. Theory Appl. Workshop, Feb. 2013, pp. 1–7.
[2] J. Tan, D. Baron, and L. Dai, “Signal estimation with low infinity-norm error by minimizing the mean p-norm error,” in Proc. IEEE 48th Annu. Conf. Inf. Sci. Syst., Mar. 2014, pp. 1–5.
[3] A. Papoulis, Probability, Random Variables, and Stochastic Processes. New York, NY, USA: McGraw-Hill, 1991.
[4] T. I. Alecu, S. Voloshynovskiy, and T. Pun, “The Gaussian transform of distributions: Definition, computation and application,” IEEE Trans. Signal Process., vol. 54, no. 8, pp. 2976–2985, Aug. 2006.
[5] A. Bijaoui, “Wavelets, Gaussian mixtures and Wiener filtering,” Signal Process., vol. 82, no. 4, pp. 709–712, Apr. 2002.
[6] M. Tabuchi, N. Yamane, and Y. Morikawa, “Adaptive Wiener filter based on Gaussian mixture model for denoising chest X-ray CT image,” in Proc. Annu. Conf. SICE, Sep. 2007, pp. 682–689.
[7] J. Vila and P. Schniter, “Expectation-maximization Gaussian-mixture approximate message passing,” in Proc. 46th Annu. Conf. Inf. Sci. Syst., Mar. 2012, pp. 1–6.
[8] J. P. Vila and P. Schniter, “Expectation-maximization Gaussian-mixture approximate message passing,” IEEE Trans. Signal Process., vol. 61, no. 19, pp. 4658–4672, Oct. 2013.
[9] J. Tan, D. Carmon, and D. Baron, “Signal estimation with additive error metrics in compressed sensing,” IEEE Trans. Inf. Theory, vol. 60, no. 1, pp. 150–158, Jan. 2014.
[10] M. Dalai and R. Leonardi, “\(\ell\)-infinity constrained approximations for image and video compression,” in Proc. Picture Coding Symp., Apr. 2006.
[11] C. Studer, W. Yin, and R. G. Baraniuk, “Signal representations with minimum \(\ell_\infty\)-norm,” in Proc. 50th Allerton Conf. Commun., Control, Comput., Oct. 2012, pp. 1270–1277.
[12] A. C. Gilbert, B. Hemenway, A. Rudra, M. J. Strauss, and M. Wootters, “Recovering simple signals,” in Proc. Inf. Theory Appl. Workshop, Feb. 2012, pp. 382–391.
[13] M. Egerstedt and C. F. Martin, “Trajectory planning in the infinity norm for linear control systems,” Int. J. Control, vol. 72, no. 13, pp. 1139–1146, 1999.
[14] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[15] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[16] D. Guo and C.-C. Wang, “Random sparse linear systems observed via arbitrary channels: A decoupling principle,” in Proc. IEEE Int. Symp. Inf. Theory, Jun. 2007, pp. 946–950.
[17] D. Guo and C.-C. Wang, “Multiuser detection of sparsely spread CDMA,” IEEE J. Sel. Areas Commun., vol. 26, no. 3, pp. 421–431, Apr. 2008.
[18] S. Rangan, “Estimation with random linear mixing, belief propagation and compressed sensing,” in Proc. 44th Annu. Conf. Inf. Sci. Syst., Mar. 2010, pp. 1–6.
[19] S. Rangan, “Estimation with random linear mixing, belief propagation and compressed sensing,” arXiv:1001.2228, Jan. 2010.
[20] J.-L. Starck, F. Murtagh, and J. M. Fadili, Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity. Cambridge, U.K.: Cambridge Univ. Press, 2010.
[21] J. Vila and P. Schniter, “Expectation-maximization Bernoulli-Gaussian approximate message passing,” in Proc. IEEE 45th Asilomar Conf. Signals, Syst. Comput., Nov. 2011, pp. 799–803.
[22] C. E. Clark, “The greatest of a finite set of random variables,” Oper. Res., vol. 9, no. 2, pp. 145–162, Mar. 1961.
[23] P. Indyk, “On approximate nearest neighbors under \(\ell_\infty\) norm,” J. Comput. Syst. Sci., vol. 63, no. 4, pp. 627–638, Dec. 2001.
[24] D. Guo and S. Verdú, “Randomly spread CDMA: Asymptotics via statistical physics,” IEEE Trans. Inf. Theory, vol. 51, no. 6, pp. 1983–2010, Jun. 2005.
[25] A. Montanari and D. Tse, “Analysis of belief propagation for non-linear problems: The example of CDMA (or: How to prove Tanaka’s formula),” in Proc. IEEE Inf. Theory Workshop, Mar. 2006, pp. 160–164.
[26] D. Guo, D. Baron, and S. Shamai, “A single-letter characterization of optimal noisy compressed sensing,” in Proc. 47th Allerton Conf. Commun., Control, Comput., Sep. 2009, pp. 52–59.
[27] S. Rangan, A. K. Fletcher, and V. K. Goyal, “Asymptotic analysis of MAP estimation via the replica method and applications to compressed sensing,” IEEE Trans. Inf. Theory, vol. 58, no. 3, pp. 1902–1923, Mar. 2012.
[28] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” Proc. Nat. Acad. Sci., vol. 106, no. 45, pp. 18914–18919, Nov. 2009.
[29] N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series With Engineering Applications. Cambridge, MA, USA: MIT Press, 1949.
[30] S. Sherman, “A theorem on convex sets with applications,” Ann. Math. Statist., vol. 26, no. 4, pp. 763–767, Dec. 1955.
[31] S. Sherman, “Non-mean-square error criteria,” IRE Trans. Inf. Theory, vol. 4, no. 3, pp. 125–126, Sep. 1958.
[32] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York, NY, USA: Wiley, 2006.
[33] B. Gnedenko, “Sur la distribution limite du terme maximum d’une série aléatoire,” Ann. Math., vol. 44, no. 3, pp. 423–453, Jul. 1943.
[34] T. Tanaka, “A statistical-mechanics approach to large-system analysis of CDMA multiuser detectors,” IEEE Trans. Inf. Theory, vol. 48, no. 11, pp. 2888–2910, Nov. 2002.
[35] S. Berman, “A law of large numbers for the maximum in a stationary Gaussian sequence,” Ann. Math. Statist., vol. 33, no. 1, pp. 93–97, Mar. 1962.
Jin Tan (S’11) received the B.Sc. degree in Microelectronics from Fudan University, China, in 2010, and the M.Sc. degree in Electrical and Computer Engineering from North Carolina State University, Raleigh, USA, in 2012. Currently she is a Ph.D. candidate at North Carolina State University, in the Department of Electrical and Computer Engineering. Her research interests include information theory, estimation theory, and statistical signal processing.

Dror Baron (S’99–M’03–M’10) received the B.Sc. (summa cum laude) and M.Sc. degrees from the Technion - Israel Institute of Technology, Haifa, Israel, in 1997 and 1999, and the Ph.D. degree from the University of Illinois at Urbana-Champaign in 2003, all in electrical engineering.

From 1997 to 1999, Dr. Baron worked at Witcom Ltd. in modem design. From 1999 to 2003, he was a research assistant at the University of Illinois at Urbana-Champaign, where he was also a Visiting Assistant Professor in 2003. From 2003 to 2006, he was a Postdoctoral Research Associate in the Department of Electrical and Computer Engineering at Rice University, Houston, TX. From 2007 to 2008, he was a quantitative financial analyst with Menta Capital, San Francisco, CA, and from 2008 to 2010 he was a Visiting Scientist in the Department of Electrical Engineering at the Technion - Israel Institute of Technology, Haifa. Since 2010, Dr. Baron has been an Assistant Professor in the Electrical and Computer Engineering Department at North Carolina State University.

Dr. Baron’s research interests combine information theory, signal processing, and fast algorithms; in recent years, he has focused on compressed sensing. Dr. Baron was a recipient of the 2002 M. E. Van Valkenburg Graduate Research Award, and received honorable mention at the Robert Bohrer Memorial Student Workshop in April 2002, both at the University of Illinois. He also participated from 1994 to 1997 in the Program for Outstanding Students, comprising the top 0.5% of undergraduates at the Technion.

Liyi Dai (S’93–M’93–SM’13–F’14) received a B.S. degree from Shandong University, Shandong, China, in 1983, an M.S. degree from the Institute of Systems Science, Academia Sinica, Beijing, China, in 1986, and a Ph.D. degree from Harvard University, Cambridge, MA, USA, in 1993. He was a recipient of the NSF CAREER Award and an Associate Editor of the IEEE TRANSACTIONS ON AUTOMATIC CONTROL. He has authored/coauthored 78 journal and conference publications and is the author of Singular Control Systems (Springer-Verlag, 1989).