Wiener Filters in Gaussian Mixture Signal Estimation With \(\ell_\infty\)-Norm Error




6626 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 10, OCTOBER 2014

Wiener Filters in Gaussian Mixture Signal Estimation With \(\ell_\infty\)-Norm Error

Jin Tan, Student Member, IEEE, Dror Baron, Member, IEEE, and Liyi Dai, Fellow, IEEE

Abstract— Consider the estimation of a signal \(x \in \mathbb{R}^N\) from noisy observations \(r = x + z\), where the input \(x\) is generated by an independent and identically distributed (i.i.d.) Gaussian mixture source, and \(z\) is additive white Gaussian noise in parallel Gaussian channels. Typically, the \(\ell_2\)-norm error (squared error) is used to quantify the performance of the estimation process. In contrast, we consider the \(\ell_\infty\)-norm error (worst-case error). For this error metric, we prove that, in an asymptotic setting where the signal dimension \(N \to \infty\), the \(\ell_\infty\)-norm error always comes from the Gaussian component that has the largest variance, and the Wiener filter asymptotically achieves the optimal expected \(\ell_\infty\)-norm error. The i.i.d. Gaussian mixture case can be extended to i.i.d. Bernoulli-Gaussian distributions, which are often used to model sparse signals. Finally, our results can be extended to linear mixing systems with i.i.d. Gaussian mixture inputs, in settings where a linear mixing system can be decoupled to parallel Gaussian channels.

Index Terms— Estimation theory, Gaussian mixtures, \(\ell_\infty\)-norm error, linear mixing systems, parallel Gaussian channels, Wiener filters.

I. INTRODUCTION

A. Motivation

The Gaussian distribution is widely used to describe the probability densities of various types of data, owing to its advantageous mathematical properties [3]. It has been shown that non-Gaussian distributions can often be sufficiently approximated by an infinite mixture of Gaussians [4], so that the mathematical advantages of the Gaussian distribution can be leveraged when discussing non-Gaussian signals [4]–[8].

In practice, signals are often contaminated by noise during sampling or transmission, and therefore estimation of signals from their noisy observations is needed. Most estimation methods evaluate performance by the ubiquitous \(\ell_2\)-norm error [4] (squared error). However, there are applications where other error metrics may be preferred [9]. For example, the \(\ell_2\) error criterion ensures that the estimated signal has

Manuscript received January 3, 2014; revised May 14, 2014; accepted July 16, 2014. Date of publication August 1, 2014; date of current version September 11, 2014. This work was supported in part by the National Science Foundation under Grant CCF-1217749 and in part by the U.S. Army Research Office under Grant W911NF-04-D-0003. This paper was presented at the 2013 Information Theory and Applications Workshop [1] and the 2014 IEEE Conference on Information Sciences and Systems [2].

J. Tan and D. Baron are with the Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC 27695 USA (e-mail: [email protected]; [email protected]).

L. Dai is with the Computing Sciences Division, U.S. Army Research Office, Research Triangle Park, Durham, NC 27709 USA (e-mail: [email protected]).

Communicated by O. Milenkovic, Associate Editor for Coding Theory.
Digital Object Identifier 10.1109/TIT.2014.2345260

low squared error on average, but does not guarantee that every estimated signal component is close to the corresponding original signal component. In problems such as image and video compression [10], where the reconstruction quality at every signal component is important, it would be desirable to optimize for the \(\ell_\infty\)-norm error. Our interest in the \(\ell_\infty\)-norm error is also motivated by applications including wireless communications [11], group testing [12], and trajectory planning in control systems [13], where we want to decrease the worst-case sensitivity to noise.

B. Problem Setting

In this correspondence, our main focus is on parallel Gaussian channels, and the results can be extended to linear mixing systems. In both settings, the input \(x \in \mathbb{R}^N\) is generated by an independent and identically distributed (i.i.d.) Gaussian mixture source,
\[
x_i \sim \sum_{k=1}^{K} s_k \cdot \mathcal{N}(\mu_k, \sigma_k^2) = \sum_{k=1}^{K} \frac{s_k}{\sqrt{2\pi\sigma_k^2}}\, e^{-\frac{(x_i - \mu_k)^2}{2\sigma_k^2}}, \quad (1)
\]
where the subscript \((\cdot)_i\) denotes the \(i\)-th component of a sequence (or a vector), \(\mu_1, \mu_2, \ldots, \mu_K\) (respectively, \(\sigma_1^2, \sigma_2^2, \ldots, \sigma_K^2\)) are the means (respectively, variances) of the Gaussian components, and \(0 < s_1, s_2, \ldots, s_K < 1\) are the probabilities of the \(K\) Gaussian components. Note that \(\sum_{k=1}^{K} s_k = 1\). A special case of the Gaussian mixture is Bernoulli-Gaussian,
\[
x_i \sim s \cdot \mathcal{N}(\mu_x, \sigma_x^2) + (1 - s) \cdot \delta(x_i), \quad (2)
\]
for some \(0 < s < 1\), \(\mu_x\), and \(\sigma_x^2\), where \(\delta(\cdot)\) is the delta function [3]. The zero-mean Bernoulli-Gaussian model is often used in sparse signal processing [14]–[21].
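The source models (1) and (2) are straightforward to sample. The sketch below is not from the correspondence; the helper names are ours, and the parameter values are illustrative only.

```python
import numpy as np

def sample_gaussian_mixture(n, weights, means, variances, rng):
    """Draw n i.i.d. samples from the K-component mixture in (1)."""
    k = rng.choice(len(weights), size=n, p=weights)   # component labels
    return rng.normal(np.asarray(means)[k], np.sqrt(np.asarray(variances)[k]))

def sample_bernoulli_gaussian(n, s, mu_x, var_x, rng):
    """Draw n i.i.d. samples from the Bernoulli-Gaussian model in (2)."""
    active = rng.random(n) < s        # Gaussian with probability s, zero otherwise
    x = np.zeros(n)
    x[active] = rng.normal(mu_x, np.sqrt(var_x), int(active.sum()))
    return x

rng = np.random.default_rng(0)
x_gm = sample_gaussian_mixture(50_000, [0.3, 0.7], [0.0, 0.0], [5.0, 0.5], rng)
x_bg = sample_bernoulli_gaussian(100_000, 0.1, 0.0, 1.0, rng)
print(np.mean(x_bg != 0))   # fraction of Gaussian entries, close to s = 0.1
```

For the mixture draw, the empirical variance should be near \(\sum_k s_k \sigma_k^2\) when all means are zero.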

In parallel Gaussian channels [5], [6], we consider
\[
r = x + z, \quad (3)
\]
where \(r, x, z \in \mathbb{R}^N\) are the output signal, the input signal, and the additive white Gaussian noise (AWGN), respectively. The AWGN channel can be described by the conditional distribution
\[
f_{R|X}(r|x) = \prod_{i=1}^{N} f_{R|X}(r_i|x_i) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma_z^2}} \exp\!\left(-\frac{(r_i - x_i)^2}{2\sigma_z^2}\right), \quad (4)
\]
where \(\sigma_z^2\) is the variance of the Gaussian noise.
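A minimal sketch of the channel (3) and the componentwise log-density in (4); the function names are ours, not the correspondence's.

```python
import numpy as np

def awgn_output(x, var_z, rng):
    """Pass x through the parallel Gaussian channels r = x + z of (3)."""
    return x + rng.normal(0.0, np.sqrt(var_z), np.shape(x))

def log_likelihood(r, x, var_z):
    """Componentwise log of the conditional density f_{R|X} in (4)."""
    return -0.5 * np.log(2.0 * np.pi * var_z) - (r - x) ** 2 / (2.0 * var_z)

rng = np.random.default_rng(1)
x = rng.normal(size=5)
r = awgn_output(x, 0.25, rng)
print(log_likelihood(r, x, 0.25))
```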

0018-9448 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


In a linear mixing system [14], [15], [17], [18], we consider
\[
w = \Phi x, \quad (5)
\]
where the measurement matrix \(\Phi \in \mathbb{R}^{M \times N}\) is sparse and its entries are i.i.d. Because each component of the measurement vector \(w \in \mathbb{R}^M\) is a linear combination of the components of \(x\), we call the system (5) a linear mixing system. The measurements \(w\) are passed through a bank of separable scalar channels characterized by conditional distributions
\[
f_{Y|W}(y|w) = \prod_{i=1}^{M} f_{Y|W}(y_i|w_i), \quad (6)
\]
where \(y \in \mathbb{R}^M\) are the channel outputs. However, unlike the parallel Gaussian channels (4), the channels (6) of the linear mixing system are not restricted to Gaussian [16], [18], [19].

Our goal is to estimate the original input signal \(x\) either from the parallel Gaussian channel outputs \(r\) in (3) or from the linear mixing system outputs \(y\) and the measurement matrix \(\Phi\) in (5) and (6). To evaluate how accurate the estimation process is, we quantify the \(\ell_\infty\)-norm error between \(x\) and its estimate \(\widehat{x}\),
\[
\|x - \widehat{x}\|_\infty = \max_{i \in \{1,\ldots,N\}} |x_i - \widehat{x}_i|;
\]
this error metric helps prevent any significant errors during the estimation process. The estimator that minimizes the expected value of \(\|x - \widehat{x}\|_\infty\) is called the minimum mean \(\ell_\infty\)-norm error estimator. We denote this estimator by \(\widehat{x}_{\ell_\infty}\), which can be expressed as
\[
\widehat{x}_{\ell_\infty} = \arg\min_{\widehat{x}} \mathbb{E}\left[\|x - \widehat{x}\|_\infty\right]. \quad (7)
\]

C. Related Work

Gaussian mixtures are widely used to model various types of signals, and a number of signal estimation methods have been introduced to take advantage of the Gaussian mixture distribution. For example, an infinite Gaussian mixture model was proposed in [4] to represent real data such as images, and a denoising scheme based on local linear estimators was developed to estimate the original data. A similar algorithm based on an adaptive Wiener filter was applied to denoise X-ray CT images [6], where a Gaussian mixture model was utilized. However, these works only quantified the \(\ell_2\)-norm error of the denoising process. Signal estimation problems with \(\ell_\infty\)-norm error have not been well explored, but there have been studies on general properties of the \(\ell_\infty\)-norm. For example, in Clark [22], the author developed a deductive method to calculate the distribution of the greatest element in a finite set of random variables; and Indyk [23] discussed how to find the nearest neighbor of a point while taking the \(\ell_\infty\)-norm distance into consideration.

D. Contributions

In this correspondence, we study the estimator that minimizes the \(\ell_\infty\)-norm error in parallel Gaussian channels in an asymptotic setting where the signal dimension \(N \to \infty\). We prove that, when estimating an input signal that is generated by an i.i.d. Gaussian mixture source, the \(\ell_\infty\)-norm error always comes from the Gaussian component that has the largest variance. Therefore, the well-known Wiener filter achieves the minimum mean \(\ell_\infty\)-norm error. The Wiener filter is a simple linear function that is applied to the channel outputs, where the multiplicative constant of the linear function is computed by considering the greatest variance of the Gaussian mixture components (1) and the variance of the channel noise. Moreover, the Wiener filter can be applied to linear mixing systems defined in (5) and (6) to minimize the \(\ell_\infty\)-norm error, based on settings where a linear mixing system can be decoupled to parallel Gaussian channels [18], [19], [24]–[28].

The remainder of the correspondence is arranged as follows. We provide our main results and discuss their applications in Section II. Proofs of the main results appear in Section III, while Section IV concludes.

II. MAIN RESULTS

For parallel Gaussian channels (3), the minimum mean squared error estimator, denoted by \(\widehat{x}_{\ell_2}\), is achieved by the conditional expectation \(\mathbb{E}[x|r]\). If the input signal \(x\) is i.i.d. Gaussian (not a Gaussian mixture), i.e., \(x_i \sim \mathcal{N}(\mu_x, \sigma_x^2)\), then the estimate
\[
\widehat{x}_{\ell_2} = \mathbb{E}[x|r] = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_z^2}(r - \mu_x) + \mu_x \quad (8)
\]
achieves the minimum mean squared error, where \(\sigma_z^2\) is the variance of the Gaussian noise \(z\) in (3), and we use the convention that adding a scalar to (respectively, subtracting a scalar from) a vector means adding this scalar to (respectively, subtracting this scalar from) every component of the vector. The form in (8) is called the Wiener filter in signal processing [29]. It has been shown by Sherman [30], [31] that, besides the \(\ell_2\)-norm error, the linear Wiener filter is also optimal for all \(\ell_p\)-norm errors (\(p \geq 1\)), including the \(\ell_\infty\)-norm error. Surprisingly, we find that, if the input signal is generated by an i.i.d. Gaussian mixture source, then the Wiener filter asymptotically minimizes the expected \(\ell_\infty\)-norm error.
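As a quick numerical sanity check (not from the correspondence), the Wiener filter (8) takes a few lines; for an i.i.d. Gaussian input, its empirical mean squared error should match the classical value \(\sigma_x^2 \sigma_z^2 / (\sigma_x^2 + \sigma_z^2)\). Parameter values below are illustrative.

```python
import numpy as np

def wiener_filter(r, mu_x, var_x, var_z):
    """Linear estimate (8): shrink the observation toward the prior mean."""
    return var_x / (var_x + var_z) * (r - mu_x) + mu_x

rng = np.random.default_rng(2)
N, mu_x, var_x, var_z = 200_000, 1.0, 4.0, 1.0
x = rng.normal(mu_x, np.sqrt(var_x), N)
r = x + rng.normal(0.0, np.sqrt(var_z), N)
mse = float(np.mean((wiener_filter(r, mu_x, var_x, var_z) - x) ** 2))
print(mse)   # close to var_x * var_z / (var_x + var_z) = 0.8
```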

Before providing the result for the Gaussian mixture input case, which is mathematically involved, we begin with an analysis of the simpler Bernoulli-Gaussian input case.

Theorem 1: In parallel Gaussian channels (3), if the input signal \(x\) is generated by an i.i.d. Bernoulli-Gaussian source defined in (2), then the Wiener filter
\[
\widehat{x}_{W,BG} = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_z^2}(r - \mu_x) + \mu_x \quad (9)
\]
asymptotically achieves the minimum mean \(\ell_\infty\)-norm error. More specifically,
\[
\lim_{N \to \infty} \frac{\mathbb{E}\left[\|x - \widehat{x}_{W,BG}\|_\infty\right]}{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]} = 1,
\]
where \(\widehat{x}_{\ell_\infty}\) satisfies (7).

Theorem 1 is proved in Section III-A. The proof combines concepts in typical sets [32] and a result by Gnedenko [33], which provided asymptotic properties of the maximum of a Gaussian sequence. The main idea of


the proof is to show that with overwhelming probability the maximum absolute error satisfies \(\|x - \widehat{x}\|_\infty = |x_i - \widehat{x}_i|\), where \(i \in \mathcal{I} = \{i : x_i \sim \mathcal{N}(\mu_x, \sigma_x^2)\}\), i.e., \(\mathcal{I}\) is the index set that includes all the Gaussian components of the vector \(x\) and excludes all the zero components of \(x\). Therefore, minimizing \(\|x - \widehat{x}\|_\infty\) is equivalent to minimizing \(\|x_{\mathcal{I}} - \widehat{x}_{\mathcal{I}}\|_\infty\), where \((\cdot)_{\mathcal{I}}\) denotes a subvector with entries in the index set \(\mathcal{I}\). Because the vector \(x_{\mathcal{I}}\) is i.i.d. Gaussian, the Wiener filter minimizes \(\|x_{\mathcal{I}} - \widehat{x}_{\mathcal{I}}\|_\infty\) [30], [31]; hence the Wiener filter minimizes \(\|x - \widehat{x}\|_\infty\) with overwhelming probability. On the other hand, the cases where \(\|x - \widehat{x}\|_\infty = |x_i - \widehat{x}_i|\) and \(i \notin \mathcal{I}\) are rare, so the mean \(\ell_\infty\)-norm error of the Wiener filter barely increases, and the Wiener filter asymptotically minimizes the expected \(\ell_\infty\)-norm error.
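This proof idea can be probed numerically. The sketch below (ours, with hypothetical parameters chosen so the effect is visible at moderate \(N\); the correspondence notes the asymptotics kick in slowly) checks that the index achieving the worst absolute error of the Wiener filter falls in \(\mathcal{I}\), the Gaussian part of a zero-mean Bernoulli-Gaussian input.

```python
import numpy as np

rng = np.random.default_rng(3)
N, s, var_x, var_z = 100_000, 0.1, 1.0, 4.0
active = rng.random(N) < s                    # membership in the index set I
x = np.where(active, rng.normal(0.0, np.sqrt(var_x), N), 0.0)
r = x + rng.normal(0.0, np.sqrt(var_z), N)
x_hat = var_x / (var_x + var_z) * r           # Wiener filter (9) with mu_x = 0
i_max = int(np.argmax(np.abs(x_hat - x)))     # index of the worst-case error
print(bool(active[i_max]))                    # True for these parameters
```

Here the error variance on \(\mathcal{I}\) is \(\sigma_x^2\sigma_z^2/(\sigma_x^2+\sigma_z^2) = 0.8\) versus \(\sigma_x^4\sigma_z^2/(\sigma_x^2+\sigma_z^2)^2 = 0.16\) on \(\mathcal{J}\), so the maximum lands in \(\mathcal{I}\) with high probability even at finite \(N\).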

Having discussed the Bernoulli-Gaussian case, let us proceed to the Gaussian mixture case defined in (1). Here the maximum absolute error between \(x\) and the estimate \(\widehat{x}\) satisfies \(\|x - \widehat{x}\|_\infty = |x_i - \widehat{x}_i|\), where \(i \in \mathcal{I}' = \{i : x_i \sim \mathcal{N}(\mu_m, \sigma_m^2)\}\) and \(m = \arg\max_{k \in \{1,2,\ldots,K\}} \sigma_k^2\). That is, the maximum absolute error between \(x\) and \(\widehat{x}\) lies in an index that corresponds to the Gaussian mixture component with greatest variance.

Theorem 2: In parallel Gaussian channels (3), if the input signal \(x\) is generated by an i.i.d. Gaussian mixture source defined in (1), then the Wiener filter
\[
\widehat{x}_{W,GM} = \frac{\sigma_m^2}{\sigma_m^2 + \sigma_z^2}(r - \mu_m) + \mu_m \quad (10)
\]
asymptotically achieves the minimum mean \(\ell_\infty\)-norm error, where \(m = \arg\max_{k \in \{1,2,\ldots,K\}} \sigma_k^2\). More specifically,
\[
\lim_{N \to \infty} \frac{\mathbb{E}\left[\|x - \widehat{x}_{W,GM}\|_\infty\right]}{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]} = 1,
\]
where \(\widehat{x}_{\ell_\infty}\) satisfies (7).

The proof of Theorem 2 is given in Section III-B. We note in passing that the statements in Theorems 1 and 2 do not hold for \(\ell_p\)-norm error (\(0 < p < \infty\)). Because there is a one-to-one correspondence between the parameters (\(\mu_k\) and \(\sigma_k^2\)) of a Gaussian mixture component and its corresponding Wiener filter, if a Wiener filter is optimal in the \(\ell_p\) error sense for any one of the Gaussian mixture components, then it is suboptimal for the rest of the mixture components. Therefore, any single Wiener filter is suboptimal in the \(\ell_p\) error sense for any Gaussian mixture signal comprising more than one Gaussian component.
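Theorem 2 can likewise be checked empirically. The sketch below (ours; mixture parameters are illustrative) compares the \(\ell_\infty\) error of the Wiener filter (10) tuned to the largest-variance component against one tuned to the smaller-variance component, for a zero-mean two-component mixture.

```python
import numpy as np

rng = np.random.default_rng(4)
N, var_z = 200_000, 2.0
weights = np.array([0.3, 0.7])
variances = np.array([5.0, 0.5])        # component 1 has the largest variance
k = rng.choice(2, size=N, p=weights)    # component labels
x = rng.normal(0.0, np.sqrt(variances[k]))
r = x + rng.normal(0.0, np.sqrt(var_z), N)

def linf_error(var_m):
    """l_inf error of the zero-mean Wiener filter (10) tuned to variance var_m."""
    return float(np.max(np.abs(var_m / (var_m + var_z) * r - x)))

err_big = linf_error(variances[0])      # tuned to sigma_m^2 (largest variance)
err_small = linf_error(variances[1])    # tuned to the smaller variance
print(err_big < err_small)              # True: tuning to sigma_m^2 wins
```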

Remark 1: Theorems 1 and 2 can be extended to linear mixing systems. We consider a linear mixing system defined in (5), where the matrix \(\Phi \in \mathbb{R}^{M \times N}\) is i.i.d. sparse, and we let \(L\) denote the average number of nonzeros in each row of \(\Phi\). It has been shown [16], [18], [19], [24]–[28] that, in a large-sparse-limit where \(M, N, L \to \infty\) with \(M/N \to \beta < \infty\) for some constant \(\beta > 0\) and \(L = o(N^{1/2})\), a linear mixing system (5) and (6) can be decoupled to an equivalent set of parallel Gaussian channels, \(q = x + v\), where \(v \in \mathbb{R}^N\) is the equivalent Gaussian noise and \(q \in \mathbb{R}^N\) are the outputs of the decoupled parallel Gaussian channels. The statistical properties of the noise \(v\) are characterized by Tanaka's fixed point equation [24]–[26], [34]. Therefore, when the input signal \(x\) is generated by an i.i.d. Gaussian mixture source, by applying the Wiener filter to \(q\), we can obtain the estimate that minimizes the \(\ell_\infty\)-norm error of the signal estimation process.

III. PROOFS

A. Proof of Theorem 1

1) Two Error Patterns: We begin by defining two error patterns. Consider parallel Gaussian channels (3), where the input signal \(x_i \sim s \cdot \mathcal{N}(\mu_x, \sigma_x^2) + (1 - s) \cdot \delta(x_i)\) for some \(s\), and the noise \(z_i \sim \mathcal{N}(0, \sigma_z^2)\). The Wiener filter (linear estimator) for the Bernoulli-Gaussian input is \(\widehat{x}_{W,BG} = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_z^2}(r - \mu_x) + \mu_x\). Let \(\mathcal{I}\) denote the index set where \(x_i \sim \mathcal{N}(\mu_x, \sigma_x^2)\), and let \(\mathcal{J}\) denote the index set where \(x_j \sim \delta(x_j)\). We define two types of error patterns: (i) for
\[
i \in \mathcal{I} \triangleq \{i : x_i \sim \mathcal{N}(\mu_x, \sigma_x^2)\},
\]
the error is
\[
e_i \triangleq \widehat{x}_{W,BG,i} - x_i = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_z^2} \cdot (r_i - \mu_x) + \mu_x - x_i \sim \mathcal{N}\!\left(0, \frac{\sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2}\right),
\]
where we remind readers that \(\widehat{x}_{W,BG,i}\) denotes the \(i\)-th component of the vector \(\widehat{x}_{W,BG}\) in (9); and (ii) for
\[
j \in \mathcal{J} \triangleq \{j : x_j \sim \delta(x_j)\},
\]
the error is
\[
e_j \triangleq \widehat{x}_{W,BG,j} - x_j = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_z^2} \cdot (r_j - \mu_x) + \mu_x - 0 \sim \mathcal{N}\!\left(\frac{\sigma_z^2}{\sigma_x^2 + \sigma_z^2}\mu_x, \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2}\right).
\]

2) Maximum of Error Patterns: Let us compare \(\max_{i \in \mathcal{I}} |e_i|\) and \(\max_{j \in \mathcal{J}} |e_j|\).

Lemma 1: Suppose \(u_i\) is an i.i.d. Gaussian sequence of length \(N\), \(u_i \sim \mathcal{N}(\mu, \sigma^2)\) for \(i \in \{1, 2, \ldots, N\}\); then \(\frac{\max_{1 \leq i \leq N} |u_i|}{\sqrt{2\sigma^2 \cdot \ln(N)}}\) converges to 1 in probability. That is,
\[
\lim_{N \to \infty} \Pr\left(\left|\frac{\max_{1 \leq i \leq N} |u_i|}{\sqrt{2\sigma^2 \cdot \ln(N)}} - 1\right| < \epsilon\right) = 1, \quad (11)
\]
for any \(\epsilon > 0\). Lemma 1 is proved in Section III-C.

Before applying Lemma 1, we define a set \(\mathcal{A}_\varepsilon\) of possible inputs \(x\) such that the numbers of components in the sets \(\mathcal{I}\) and \(\mathcal{J}\) both go to infinity as \(N \to \infty\),
\[
\mathcal{A}_\varepsilon \triangleq \left\{x : \left|\frac{|\mathcal{I}|}{N} - s\right| < \varepsilon\right\}, \quad (12)
\]
where \(\varepsilon > 0\) and \(\varepsilon \to 0\) (namely, \(\varepsilon \to 0^+\)) as a function of the signal dimension \(N\), and \(|\mathcal{I}|\) denotes the cardinality of the set \(\mathcal{I}\). The definition of \(\mathcal{A}_\varepsilon\) implies that \(\left||\mathcal{J}|/N - (1 - s)\right| < \varepsilon\) and \(|\mathcal{I}| + |\mathcal{J}| = N\). Therefore, if \(x \in \mathcal{A}_\varepsilon\), then \(|\mathcal{I}|, |\mathcal{J}| \to \infty\) as \(N \to \infty\).
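Lemma 1 can be probed numerically; as noted later in this correspondence, the convergence of the normalized maximum is slow in \(N\), so at moderate sizes the ratio sits somewhat below 1. A small sketch with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(5)
N, sigma2 = 10**6, 1.0
u = rng.normal(0.0, np.sqrt(sigma2), N)
# Normalized maximum from Lemma 1: max|u_i| / sqrt(2 * sigma^2 * ln N)
ratio = float(np.max(np.abs(u)) / np.sqrt(2.0 * sigma2 * np.log(N)))
print(ratio)   # approaches 1 as N grows, though convergence is slow
```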


Now we are ready to evaluate \(\max_{i \in \mathcal{I}} |e_i|\) and \(\max_{j \in \mathcal{J}} |e_j|\). For i.i.d. Gaussian random variables \(e_i \sim \mathcal{N}\!\left(0, \frac{\sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2}\right)\), where \(i \in \mathcal{I}\), the equality (11) in Lemma 1 becomes
\[
\lim_{N \to \infty} \Pr\left(\left|\frac{\max_{i \in \mathcal{I}} |e_i|}{\sqrt{2 \cdot \frac{\sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2} \cdot \ln(|\mathcal{I}|)}} - 1\right| < \epsilon \;\middle|\; x \in \mathcal{A}_\varepsilon\right) = 1, \quad (13)
\]
for any \(\epsilon > 0\). For i.i.d. Gaussian random variables \(e_j\), where \(j \in \mathcal{J}\), the equality (11) becomes
\[
\lim_{N \to \infty} \Pr\left(\left|\frac{\max_{j \in \mathcal{J}} |e_j|}{\sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2} \cdot \ln(|\mathcal{J}|)}} - 1\right| < \epsilon \;\middle|\; x \in \mathcal{A}_\varepsilon\right) = 1, \quad (14)
\]

for any \(\epsilon > 0\).

Equations (13) and (14) suggest that
\[
\lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{i \in \mathcal{I}} |e_i|}{\sqrt{2 \cdot \frac{\sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2} \cdot \ln(|\mathcal{I}|)}} \;\middle|\; x \in \mathcal{A}_\varepsilon\right] = 1
\]
and
\[
\lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{j \in \mathcal{J}} |e_j|}{\sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2} \cdot \ln(|\mathcal{J}|)}} \;\middle|\; x \in \mathcal{A}_\varepsilon\right] = 1,
\]
which yield
\[
\lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{i \in \mathcal{I}} |e_i|}{\sqrt{\ln(N)}} \;\middle|\; x \in \mathcal{A}_\varepsilon\right] = \lim_{N \to \infty} \sqrt{\frac{2 \cdot \sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2} \cdot \frac{\ln(|\mathcal{I}|)}{\ln(N)}} \quad (15)
\]
and
\[
\lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{j \in \mathcal{J}} |e_j|}{\sqrt{\ln(N)}} \;\middle|\; x \in \mathcal{A}_\varepsilon\right] = \lim_{N \to \infty} \sqrt{\frac{2 \cdot \sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2} \cdot \frac{\ln(|\mathcal{J}|)}{\ln(N)}}. \quad (16)
\]

According to the definition of \(\mathcal{A}_\varepsilon\) in (12), where \(s\) is a constant and \(\varepsilon \to 0^+\),
\[
N(s - \varepsilon) < |\mathcal{I}| < N(s + \varepsilon)
\]
and
\[
N(1 - s - \varepsilon) < |\mathcal{J}| < N(1 - s + \varepsilon), \quad (17)
\]
and thus
\[
\lim_{N \to \infty} \sqrt{\frac{\ln(|\mathcal{I}|)}{\ln(N)}} = 1 \quad \text{and} \quad \lim_{N \to \infty} \sqrt{\frac{\ln(|\mathcal{J}|)}{\ln(N)}} = 1. \quad (18)
\]
Finally, equations (15) and (16) become
\[
\lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{i \in \mathcal{I}} |e_i|}{\sqrt{\ln(N)}} \;\middle|\; x \in \mathcal{A}_\varepsilon\right] = \sqrt{\frac{2 \cdot \sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2}} \quad (19)
\]
and
\[
\lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{j \in \mathcal{J}} |e_j|}{\sqrt{\ln(N)}} \;\middle|\; x \in \mathcal{A}_\varepsilon\right] = \sqrt{\frac{2 \cdot \sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2}}. \quad (20)
\]

Combining (13) and (14),
\[
\lim_{N \to \infty} \Pr\left(\frac{1 - \epsilon}{1 + \epsilon} < \frac{\max_{i \in \mathcal{I}} |e_i|}{\max_{j \in \mathcal{J}} |e_j|} \cdot \frac{\sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2} \cdot \ln(|\mathcal{J}|)}}{\sqrt{2 \cdot \frac{\sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2} \cdot \ln(|\mathcal{I}|)}} < \frac{1 + \epsilon}{1 - \epsilon} \;\middle|\; x \in \mathcal{A}_\varepsilon\right) = 1. \quad (21)
\]

Note that
\[
\sqrt{\frac{\ln(N) + \ln(1 - s - \varepsilon)}{\ln(N) + \ln(s + \varepsilon)}} = \sqrt{\frac{\ln(N(1 - s - \varepsilon))}{\ln(N(s + \varepsilon))}} < \sqrt{\frac{\ln(|\mathcal{J}|)}{\ln(|\mathcal{I}|)}} < \sqrt{\frac{\ln(N(1 - s + \varepsilon))}{\ln(N(s - \varepsilon))}} = \sqrt{\frac{\ln(N) + \ln(1 - s + \varepsilon)}{\ln(N) + \ln(s - \varepsilon)}}.
\]
Then the following limit holds,
\[
\lim_{N \to \infty} \sqrt{\frac{\ln(|\mathcal{J}|)}{\ln(|\mathcal{I}|)}} = 1.
\]
We can write the above limit in probabilistic form,
\[
\lim_{N \to \infty} \Pr\left(\left|\sqrt{\frac{\ln(|\mathcal{J}|)}{\ln(|\mathcal{I}|)}} - 1\right| < \epsilon \;\middle|\; x \in \mathcal{A}_\varepsilon\right) = 1, \quad (22)
\]
for any \(\epsilon > 0\). Because of the logarithms in (22), the ratio \(\frac{\sqrt{2 \cdot \ln(|\mathcal{J}|)}}{\sqrt{2 \cdot \ln(|\mathcal{I}|)}}\) is only sufficiently close to 1 when \(N\) is astronomically large. This is why we point out in Section IV that the asymptotic results in this correspondence might be impractical. Plugging (22) into (21),

\[
\lim_{N \to \infty} \Pr\left(\frac{1 - \epsilon}{(1 + \epsilon)^2} \cdot \sqrt{\frac{\sigma_x^2 + \sigma_z^2}{\sigma_x^2}} < \frac{\max_{i \in \mathcal{I}} |e_i|}{\max_{j \in \mathcal{J}} |e_j|} < \frac{1 + \epsilon}{(1 - \epsilon)^2} \cdot \sqrt{\frac{\sigma_x^2 + \sigma_z^2}{\sigma_x^2}} \;\middle|\; x \in \mathcal{A}_\varepsilon\right) = 1. \quad (23)
\]
Equation (23) holds for any \(\epsilon > 0\). We note that \(\sqrt{\frac{\sigma_x^2 + \sigma_z^2}{\sigma_x^2}} > 1\), and thus \(\frac{1 - \epsilon}{(1 + \epsilon)^2} \cdot \sqrt{\frac{\sigma_x^2 + \sigma_z^2}{\sigma_x^2}} > 1\) for sufficiently small \(\epsilon\). Therefore,
\[
\lim_{N \to \infty} \Pr\left(\frac{\max_{i \in \mathcal{I}} |e_i|}{\max_{j \in \mathcal{J}} |e_j|} > 1 \;\middle|\; x \in \mathcal{A}_\varepsilon\right) = \lim_{N \to \infty} \Pr\left(\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} > 1 \;\middle|\; x \in \mathcal{A}_\varepsilon\right) = 1, \quad (24)
\]

Page 5: Wiener Filters in Gaussian Mixture Signal Estimation With  \(\ell _\infty \) -Norm Error

6630 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 10, OCTOBER 2014

and
\[
\lim_{N \to \infty} \Pr\left(\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} \leq 1 \;\middle|\; x \in \mathcal{A}_\varepsilon\right) = 0.
\]
Expanding the conditional mean \(\ell_\infty\)-norm error of the Wiener filter over these two events gives (25):
\[
\begin{aligned}
&\lim_{N \to \infty} \frac{\mathbb{E}\left[\|x - \widehat{x}_{W,BG}\|_\infty \;\middle|\; x \in \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} \\
&\quad = \lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\sqrt{\ln(N)}} \;\middle|\; x \in \mathcal{A}_\varepsilon,\; \frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} > 1\right] \cdot \Pr\left(\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} > 1 \;\middle|\; x \in \mathcal{A}_\varepsilon\right) \\
&\qquad + \lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|}{\sqrt{\ln(N)}} \;\middle|\; x \in \mathcal{A}_\varepsilon,\; \frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} \leq 1\right] \cdot \Pr\left(\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} \leq 1 \;\middle|\; x \in \mathcal{A}_\varepsilon\right). \quad (25)
\end{aligned}
\]

3) Mean \(\ell_\infty\)-Norm Error: The road map for the remainder of the proof is to first show that when \(x \in \mathcal{A}_\varepsilon\) the Wiener filter is asymptotically optimal for the expected \(\ell_\infty\)-norm error, and then show that \(\Pr(x \in \mathcal{A}_\varepsilon)\) is arbitrarily close to 1.

In order to utilize equations (19) and (20), we normalize the quantities in the derivation of equation (25) by \(\sqrt{\ln(N)}\) so that every term is bounded.

Let us now verify that the second term in (25) equals 0. In fact, the following derivations hold from (14) and (24),

\[
\begin{aligned}
1 &= \lim_{N \to \infty} \Pr\left(\left|\frac{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|}{\sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2} \cdot \ln(|\mathcal{J}|)}} - 1\right| < \epsilon \;\middle|\; x \in \mathcal{A}_\varepsilon\right) \\
&= \lim_{N \to \infty} \Pr\left(\left|\frac{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|}{\sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2} \cdot \ln(|\mathcal{J}|)}} - 1\right| < \epsilon \;\middle|\; x \in \mathcal{A}_\varepsilon,\; \frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} > 1\right).
\end{aligned}
\]
Therefore,
\[
\lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|}{\sqrt{2 \cdot \frac{\sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2} \cdot \ln(|\mathcal{J}|)}} \;\middle|\; x \in \mathcal{A}_\varepsilon,\; \frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} > 1\right] = 1,
\]
which yields (following derivations similar to those of (16) and (20))
\[
\lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|}{\sqrt{\ln(N)}} \;\middle|\; x \in \mathcal{A}_\varepsilon,\; \frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} > 1\right] = \sqrt{\frac{2 \cdot \sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2}}.
\]
Therefore, the second term in (25) equals \(\sqrt{\frac{2 \cdot \sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2}} \times 0 = 0\), and equation (25) becomes

\[
\begin{aligned}
\lim_{N \to \infty} \frac{\mathbb{E}\left[\|x - \widehat{x}_{W,BG}\|_\infty \;\middle|\; x \in \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}}
&= \lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\sqrt{\ln(N)}} \;\middle|\; x \in \mathcal{A}_\varepsilon,\; \frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} > 1\right] \\
&= \lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\sqrt{\ln(N)}} \;\middle|\; x \in \mathcal{A}_\varepsilon\right] \\
&\quad - \lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\sqrt{\ln(N)}} \;\middle|\; x \in \mathcal{A}_\varepsilon,\; \frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} \leq 1\right] \cdot \Pr\left(\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\max_{j \in \mathcal{J}} |x_j - \widehat{x}_{W,BG,j}|} \leq 1 \;\middle|\; x \in \mathcal{A}_\varepsilon\right) \\
&= \lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}|}{\sqrt{\ln(N)}} \;\middle|\; x \in \mathcal{A}_\varepsilon\right]. \quad (26)
\end{aligned}
\]
Equation (26) shows that the maximum absolute error of the Wiener filter relates to the Gaussian-distributed components of \(x\).

4) Optimality of the Wiener Filter: It has been shown by Sherman [30], [31] that, for parallel Gaussian channels with an i.i.d. Gaussian input \(x\), if an error metric function \(d(x, \widehat{x})\) relating \(x\) and its estimate \(\widehat{x}\) is convex, then the Wiener filter is optimal for that error metric. The \(\ell_\infty\)-norm is convex, and therefore, for any estimator \(\widehat{x}\),
\[
\begin{aligned}
\mathbb{E}\left[\|x - \widehat{x}\|_\infty \;\middle|\; x \in \mathcal{A}_\varepsilon\right] &= \mathbb{E}\left[\max_{i \in \mathcal{I} \cup \mathcal{J}} |x_i - \widehat{x}_i| \;\middle|\; x \in \mathcal{A}_\varepsilon\right] \\
&\geq \mathbb{E}\left[\max_{i \in \mathcal{I}} |x_i - \widehat{x}_i| \;\middle|\; x \in \mathcal{A}_\varepsilon\right] \\
&\geq \mathbb{E}\left[\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}| \;\middle|\; x \in \mathcal{A}_\varepsilon\right]. \quad (27)
\end{aligned}
\]
The inequality (27) holds because the set \(\{x_i : i \in \mathcal{I}\}\) only contains the i.i.d. Gaussian components of \(x\), and the Wiener filter is optimal for the \(\ell_\infty\)-norm error when the input signal is i.i.d. Gaussian. The inequality (27) holds for any signal length \(N\), and thus it holds when \(N \to \infty\) and we divide both sides by \(\sqrt{\ln(N)}\); this gives (28), shown at the top of the next page, where the last step in (28) is justified by the


\[
\begin{aligned}
0 &\leq \lim_{N \to \infty} \left(\frac{\mathbb{E}\left[\|x - \widehat{x}\|_\infty \;\middle|\; x \in \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} - \frac{\mathbb{E}\left[\max_{i \in \mathcal{I}} |x_i - \widehat{x}_{W,BG,i}| \;\middle|\; x \in \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}}\right) \\
&= \lim_{N \to \infty} \left(\frac{\mathbb{E}\left[\|x - \widehat{x}\|_\infty \;\middle|\; x \in \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} - \frac{\mathbb{E}\left[\|x - \widehat{x}_{W,BG}\|_\infty \;\middle|\; x \in \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}}\right), \quad (28)
\end{aligned}
\]

derivation of (26). Equation (28) also holds for \(\widehat{x} = \widehat{x}_{\ell_\infty}\),
\[
\lim_{N \to \infty} \left(\frac{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \;\middle|\; x \in \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} - \frac{\mathbb{E}\left[\|x - \widehat{x}_{W,BG}\|_\infty \;\middle|\; x \in \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}}\right) \geq 0. \quad (29)
\]

5) Typical Set: Let us now evaluate \(\Pr(x \in \mathcal{A}_\varepsilon)\). The set \(\mathcal{A}_\varepsilon\) only considers whether the components of \(x\) are Gaussian or zero, and so we introduce a binary vector \(\widetilde{x} \in \mathbb{R}^N\), where \(\widetilde{x}_i = 1_{\{x_i \sim \mathcal{N}(\mu_x, \sigma_x^2)\}}\) and \(1_{\{\cdot\}}\) is the indicator function. That is, \(\widetilde{x}_i = 1\) if \(x_i\) is Gaussian, and else \(\widetilde{x}_i = 0\). The sequence \(\widetilde{x} \triangleq \{\widetilde{x}_1, \widetilde{x}_2, \ldots, \widetilde{x}_N\}\) is called a typical sequence ([32], p. 59) if it satisfies
\[
2^{-N(H(X) + \delta)} \leq \Pr(\widetilde{x}_1, \widetilde{x}_2, \ldots, \widetilde{x}_N) \leq 2^{-N(H(X) - \delta)}, \quad (30)
\]
for some \(\delta > 0\), where \(H(X)\) denotes the binary entropy [32] of the sequence \(\{\widetilde{x}_1, \widetilde{x}_2, \ldots, \widetilde{x}_N\}\). The set \(\mathcal{A}_\varepsilon\) is then called a typical set [32], and
\[
\Pr(x \in \mathcal{A}_\varepsilon) > 1 - \delta. \quad (31)
\]
We highlight that the inequalities (30) and (31) both hold when \(\delta \to 0^+\) as a function of \(N\).

In our problem setting, where \(\Pr(\widetilde{x}_i = 1) = \Pr(x_i \sim \mathcal{N}(\mu_x, \sigma_x^2)) = s\), the entropy of the sequence \(\{\widetilde{x}_1, \widetilde{x}_2, \ldots, \widetilde{x}_N\}\) is
\[
H(X) = -s \log_2(s) - (1 - s) \log_2(1 - s), \quad (32)
\]
and the probability of the sequence \(\{\widetilde{x}_1, \widetilde{x}_2, \ldots, \widetilde{x}_N\}\) is
\[
\Pr(\widetilde{x}_1, \widetilde{x}_2, \ldots, \widetilde{x}_N) = s^{|\mathcal{I}|} \cdot (1 - s)^{|\mathcal{J}|}. \quad (33)
\]
Plugging (17), (32), and (33) into (30), the value of \(\delta\) can be computed,
\[
\delta = \varepsilon \left|\log_2\!\left(\frac{s}{1 - s}\right)\right|, \quad (34)
\]
for \(0 < s < 1\) and \(s \neq 0.5\). That is,
\[
\Pr(x \in \mathcal{A}_\varepsilon) > 1 - \delta = 1 - \varepsilon \left|\log_2\!\left(\frac{s}{1 - s}\right)\right|. \quad (35)
\]
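The quantities in (32), (34), and (35) are simple to compute directly. A small sketch (the helper names are ours, not the correspondence's):

```python
import numpy as np

def binary_entropy(s):
    """H(X) in (32) for Pr(x_i = 1) = s, in bits."""
    return float(-s * np.log2(s) - (1 - s) * np.log2(1 - s))

def delta_bound(eps, s):
    """delta in (34); by (35), Pr(x in A_eps) > 1 - delta."""
    return float(eps * abs(np.log2(s / (1 - s))))

print(binary_entropy(0.5))      # 1.0 bit, the maximum of the binary entropy
print(delta_bound(0.01, 0.1))   # small delta for small eps
```

Note that \(\delta\) in (34) shrinks linearly with \(\varepsilon\), which is what makes \(\Pr(x \in \mathcal{A}_\varepsilon)\) arbitrarily close to 1.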

Finally, we compare \(\mathbb{E}[\|x - \widehat{x}_{W,BG}\|_\infty]\) with \(\mathbb{E}[\|x - \widehat{x}_{\ell_\infty}\|_\infty]\), where \(\widehat{x}_{\ell_\infty}\) satisfies (7), i.e., the estimate \(\widehat{x}_{\ell_\infty}\) is optimal for minimizing the mean \(\ell_\infty\)-norm error of estimation. By definition,
\[
\lim_{N \to \infty} \left(\frac{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]}{\sqrt{\ln(N)}} - \frac{\mathbb{E}\left[\|x - \widehat{x}_{W,BG}\|_\infty\right]}{\sqrt{\ln(N)}}\right) \leq 0,
\]
but we already proved in (29) that
\[
\lim_{N \to \infty} \left(\frac{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \;\middle|\; x \in \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} - \frac{\mathbb{E}\left[\|x - \widehat{x}_{W,BG}\|_\infty \;\middle|\; x \in \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}}\right) \geq 0,
\]
and thus
\[
\lim_{N \to \infty} \left(\frac{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \;\middle|\; x \notin \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} - \frac{\mathbb{E}\left[\|x - \widehat{x}_{W,BG}\|_\infty \;\middle|\; x \notin \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}}\right) \leq 0. \quad (36)
\]

We know that \(\Pr(x \notin \mathcal{A}_\varepsilon) < \delta\) from (35). To complete the proof, it suffices to show that, when \(x \notin \mathcal{A}_\varepsilon\), the difference in (36) is bounded. When \(x \notin \mathcal{A}_\varepsilon\), there are 3 cases for the possible values of \(|\mathcal{I}|\) and \(|\mathcal{J}|\):

• Case 1: \(|\mathcal{I}|, |\mathcal{J}| \to \infty\), but (18) may not hold.
• Case 2: \(|\mathcal{J}| \to \infty\) but \(|\mathcal{I}| \not\to \infty\).
• Case 3: \(|\mathcal{I}| \to \infty\) but \(|\mathcal{J}| \not\to \infty\).

We observe that equations (19) and (20) are derived from (15), (16), and (18). In Case 1, equalities similar to (15) and (16) hold,
\[
\lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{i \in \mathcal{I}} |e_i|}{\sqrt{\ln(N)}} \;\middle|\; \text{Case 1 of } x \notin \mathcal{A}_\varepsilon\right] = \lim_{N \to \infty} \sqrt{\frac{2 \cdot \sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2} \cdot \frac{\ln(|\mathcal{I}|)}{\ln(N)}} \leq \sqrt{\frac{2 \cdot \sigma_x^2 \sigma_z^2}{\sigma_x^2 + \sigma_z^2}}
\]
and
\[
\lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{j \in \mathcal{J}} |e_j|}{\sqrt{\ln(N)}} \;\middle|\; \text{Case 1 of } x \notin \mathcal{A}_\varepsilon\right] = \lim_{N \to \infty} \sqrt{\frac{2 \cdot \sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2} \cdot \frac{\ln(|\mathcal{J}|)}{\ln(N)}} \leq \sqrt{\frac{2 \cdot \sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2}}.
\]
Therefore, the value of \(\lim_{N \to \infty} \mathbb{E}\left[\frac{\|x - \widehat{x}_{W,BG}\|_\infty}{\sqrt{\ln(N)}} \;\middle|\; \text{Case 1 of } x \notin \mathcal{A}_\varepsilon\right]\) is bounded.

In Case 2, \(\lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{i \in \mathcal{I}} |e_i|}{\sqrt{\ln(N)}} \;\middle|\; \text{Case 2 of } x \notin \mathcal{A}_\varepsilon\right]\) is bounded because \(|\mathcal{I}| \not\to \infty\), while \(\lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{j \in \mathcal{J}} |e_j|}{\sqrt{\ln(N)}} \;\middle|\; \text{Case 2 of } x \notin \mathcal{A}_\varepsilon\right]\) is bounded because \(|\mathcal{J}| \leq N\), and
\[
\lim_{N \to \infty} \mathbb{E}\left[\frac{\max_{j \in \mathcal{J}} |e_j|}{\sqrt{\ln(N)}} \;\middle|\; \text{Case 2 of } x \notin \mathcal{A}_\varepsilon\right] \leq \sqrt{\frac{2 \cdot \sigma_x^4 \sigma_z^2}{(\sigma_x^2 + \sigma_z^2)^2}}.
\]
The analysis for Case 3 is similar to that of Case 2. Therefore, we have shown that \(\lim_{N \to \infty} \mathbb{E}\left[\frac{\|x - \widehat{x}_{W,BG}\|_\infty}{\sqrt{\ln(N)}} \;\middle|\; x \notin \mathcal{A}_\varepsilon\right]\) is bounded.


\[
\begin{aligned}
\lim_{N \to \infty} \frac{\mathbb{E}\left[\|x - \widehat{x}_{W,BG}\|_\infty\right]}{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]}
&= \lim_{N \to \infty} \frac{\mathbb{E}\left[\|x - \widehat{x}_{W,BG}\|_\infty\right]/\sqrt{\ln(N)}}{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]/\sqrt{\ln(N)}} \\
&= \lim_{N \to \infty} \frac{\frac{\mathbb{E}\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \in \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} \cdot \Pr(x \in \mathcal{A}_\varepsilon) + \frac{\mathbb{E}\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \notin \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} \cdot \Pr(x \notin \mathcal{A}_\varepsilon)}{\frac{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \in \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} \cdot \Pr(x \in \mathcal{A}_\varepsilon) + \frac{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \notin \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} \cdot \Pr(x \notin \mathcal{A}_\varepsilon)} \\
&= \lim_{N \to \infty} \frac{\frac{\mathbb{E}\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \in \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} \cdot \Pr(x \in \mathcal{A}_\varepsilon) + \frac{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \notin \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} \cdot \Pr(x \notin \mathcal{A}_\varepsilon)}{\frac{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \in \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} \cdot \Pr(x \in \mathcal{A}_\varepsilon) + \frac{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \notin \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} \cdot \Pr(x \notin \mathcal{A}_\varepsilon)} \\
&\quad + \lim_{N \to \infty} \frac{\frac{\mathbb{E}\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \notin \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} \cdot \Pr(x \notin \mathcal{A}_\varepsilon) - \frac{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \notin \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} \cdot \Pr(x \notin \mathcal{A}_\varepsilon)}{\frac{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \in \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} \cdot \Pr(x \in \mathcal{A}_\varepsilon) + \frac{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \notin \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} \cdot \Pr(x \notin \mathcal{A}_\varepsilon)} \\
&\leq 1 + \lim_{N \to \infty} \frac{\left(\frac{\mathbb{E}\left[\|x - \widehat{x}_{W,BG}\|_\infty \mid x \notin \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}} - \frac{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty \mid x \notin \mathcal{A}_\varepsilon\right]}{\sqrt{\ln(N)}}\right) \cdot \Pr(x \notin \mathcal{A}_\varepsilon)}{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]/\sqrt{\ln(N)}} \\
&< 1 + \lim_{N \to \infty} \frac{c \cdot \delta}{\mathbb{E}\left[\|x - \widehat{x}_{\ell_\infty}\|_\infty\right]/\sqrt{\ln(N)}}. \quad (37)
\end{aligned}
\]

By (36), \(\lim_{N\to\infty} E\bigl[\|x-x_{\ell_\infty}\|_\infty \,\big|\, x\notin A_\varepsilon\bigr]/\sqrt{\ln(N)}\) is bounded above by \(\lim_{N\to\infty} E\bigl[\|x-x_{W,BG}\|_\infty \,\big|\, x\notin A_\varepsilon\bigr]/\sqrt{\ln(N)}\). Hence,

\[
\lim_{N\to\infty}\left(\frac{E[\|x-x_{W,BG}\|_\infty \mid x\notin A_\varepsilon]}{\sqrt{\ln(N)}} - \frac{E[\|x-x_{\ell_\infty}\|_\infty \mid x\notin A_\varepsilon]}{\sqrt{\ln(N)}}\right) = c
\]
is bounded, where \(c > 0\) is a constant. Therefore, the derivation in equation (37) holds. In (37), the value of \(\lim_{N\to\infty} E[\|x-x_{\ell_\infty}\|_\infty]/\sqrt{\ln(N)}\) is bounded below because of (29),

\begin{align*}
\lim_{N\to\infty} \frac{E[\|x - x_{\ell_\infty}\|_\infty]}{\sqrt{\ln(N)}}
&= \lim_{N\to\infty} \frac{E[\|x - x_{\ell_\infty}\|_\infty \mid x\in A_\varepsilon]}{\sqrt{\ln(N)}}\cdot\Pr(x\in A_\varepsilon)\\
&\quad + \lim_{N\to\infty} \frac{E[\|x - x_{\ell_\infty}\|_\infty \mid x\notin A_\varepsilon]}{\sqrt{\ln(N)}}\cdot\Pr(x\notin A_\varepsilon)\\
&\ge \lim_{N\to\infty} \frac{E[\|x - x_{W,BG}\|_\infty \mid x\in A_\varepsilon]}{\sqrt{\ln(N)}}\cdot\Pr(x\in A_\varepsilon)\\
&> \sqrt{\frac{2\sigma_x^2\sigma_z^2}{\sigma_x^2+\sigma_z^2}}\cdot(1-\delta).
\end{align*}

On the other hand, whether the value of \(E[\|x-x_{\ell_\infty}\|_\infty]/\sqrt{\ln(N)}\) is bounded above or not, the second term in (37) is always arbitrarily small because \(\delta\) is arbitrarily small, and thus (37) is equivalent to

\[
\lim_{N\to\infty} \frac{E[\|x - x_{W,BG}\|_\infty]}{E[\|x - x_{\ell_\infty}\|_\infty]} < 1 + \delta,
\]

where \(\delta \to 0^+\) as a function of \(N\). Finally, because \(x_{\ell_\infty}\) is the optimal estimator for \(\ell_\infty\)-norm error,

\[
\lim_{N\to\infty} \frac{E[\|x - x_{W,BG}\|_\infty]}{E[\|x - x_{\ell_\infty}\|_\infty]} \ge 1.
\]

Therefore,

\[
\lim_{N\to\infty} \frac{E[\|x - x_{W,BG}\|_\infty]}{E[\|x - x_{\ell_\infty}\|_\infty]} = 1,
\]

which completes the proof.
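The asymptotic ratio above can be probed numerically. The following sketch is an illustrative check only, not part of the paper's experiments; the function name and parameter values are ours. It draws a Bernoulli-Gaussian signal, applies the Wiener filter \(x_{W,BG} = \frac{\sigma_x^2}{\sigma_x^2+\sigma_z^2}\, r\), and confirms that the worst-case error comes from the nonzero entries, whose error pattern has the larger variance \(\frac{\sigma_x^2\sigma_z^2}{\sigma_x^2+\sigma_z^2}\):

```python
import numpy as np

def wiener_linf_split(N=100_000, s=0.5, sigma_x=1.0, sigma_z=1.5, seed=0):
    """Max absolute Wiener-filter error over the nonzero (I) and zero (J)
    entries of a Bernoulli-Gaussian signal (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    support = rng.random(N) < s                    # I: indices of nonzero entries
    x = np.where(support, rng.normal(0.0, sigma_x, N), 0.0)
    r = x + rng.normal(0.0, sigma_z, N)            # parallel Gaussian channels
    gain = sigma_x**2 / (sigma_x**2 + sigma_z**2)  # Wiener filter coefficient
    e = gain * r - x                               # estimation error
    return np.abs(e[support]).max(), np.abs(e[~support]).max()

max_I, max_J = wiener_linf_split()
print(max_I > max_J)  # the larger-variance error pattern typically dominates
```

With these (arbitrary) parameters, the error standard deviation on the nonzero entries is roughly 1.8 times that on the zero entries, so the separation is already visible at moderate \(N\); the ratio of the two maxima approaches \(\sqrt{(\sigma_x^2+\sigma_z^2)/\sigma_x^2}\) only as \(N \to \infty\).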

B. Proof of Theorem 2

The road map of the proof of Theorem 2 is the same as that of Theorem 1.

1) K Error Patterns: The input signal of the parallel Gaussian channels (3) is generated by an i.i.d. Gaussian mixture source (1), and suppose without loss of generality that \(\sigma_1^2 = \max_{k\in\{1,2,\ldots,K\}} \sigma_k^2\). The Wiener filter is
\[
x_{W,GM} = \frac{\sigma_1^2}{\sigma_1^2+\sigma_z^2}\cdot(r - \mu_1) + \mu_1 = \frac{\sigma_1^2 r + \sigma_z^2 \mu_1}{\sigma_1^2+\sigma_z^2}.
\]
Let \(I_k\) denote the index set where \(x_i \sim \mathcal{N}(\mu_k, \sigma_k^2)\). Then we define \(K\) types of error patterns: for \(k \in \{1, 2, \ldots, K\}\), the \(k\)-th error pattern is

\[
e_i^{(k)} \triangleq x_{W,GM,i} - x_i = \frac{\sigma_1^2 r_i + \sigma_z^2 \mu_1}{\sigma_1^2+\sigma_z^2} - x_i
\sim \mathcal{N}\!\left(\frac{\sigma_z^2}{\sigma_1^2+\sigma_z^2}\,\mu_1 - \frac{\sigma_z^2}{\sigma_1^2+\sigma_z^2}\,\mu_k,\;\; \frac{\sigma_z^4}{(\sigma_1^2+\sigma_z^2)^2}\,\sigma_k^2 + \frac{\sigma_1^4}{(\sigma_1^2+\sigma_z^2)^2}\,\sigma_z^2\right),
\]
where
\[
i \in I_k \triangleq \{i : x_i \sim \mathcal{N}(\mu_k, \sigma_k^2)\}.
\]

Because the variances \(\sigma_z^2, \sigma_1^2, \sigma_2^2, \ldots, \sigma_K^2 > 0\) are constants, and \(\sigma_1^2 = \max_{k\in\{1,2,\ldots,K\}} \sigma_k^2\),
\[
\frac{\sigma_z^4}{(\sigma_1^2+\sigma_z^2)^2}\,\sigma_1^2 + \frac{\sigma_1^4}{(\sigma_1^2+\sigma_z^2)^2}\,\sigma_z^2
= \max_{k\in\{1,2,\ldots,K\}}\left(\frac{\sigma_z^4}{(\sigma_1^2+\sigma_z^2)^2}\,\sigma_k^2 + \frac{\sigma_1^4}{(\sigma_1^2+\sigma_z^2)^2}\,\sigma_z^2\right), \tag{38}
\]
which shows that the first error pattern \(e_i^{(1)}\) has the greatest variance.
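Both the closed-form variance of \(e_i^{(k)}\) and the fact that \(k = 1\) attains the maximum in (38) are easy to check by simulation. The sketch below is a hedged numerical sanity check; the parameter values, sample sizes, and function names are our own choices, not from the paper:

```python
import numpy as np

def error_pattern_var(sigma_k, sigma_1=2.0, sigma_z=1.0):
    """Variance of the k-th error pattern from the displayed formula."""
    d = (sigma_1**2 + sigma_z**2) ** 2
    return (sigma_z**4 * sigma_k**2 + sigma_1**4 * sigma_z**2) / d

def empirical_var(mu_k, sigma_k, sigma_1=2.0, sigma_z=1.0, mu_1=0.0,
                  n=200_000, seed=1):
    """Monte Carlo variance of x_{W,GM,i} - x_i for component k."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu_k, sigma_k, n)
    r = x + rng.normal(0.0, sigma_z, n)
    xw = (sigma_1**2 * r + sigma_z**2 * mu_1) / (sigma_1**2 + sigma_z**2)
    return np.var(xw - x)

# sigma_1 = 2 is the largest component standard deviation, so k = 1
# maximizes the error-pattern variance, in agreement with (38).
variances = [error_pattern_var(s) for s in (2.0, 1.0, 0.5)]
print(int(np.argmax(variances)))  # prints 0
print(abs(empirical_var(0.5, 1.0) - error_pattern_var(1.0)) < 0.02)
```

The Monte Carlo estimate matches the closed form to within sampling error, and the largest-variance component indeed yields the largest error-pattern variance.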


2) Maximum of Error Patterns: Define the set Aε as

\[
A_\varepsilon \triangleq \left\{x : \left|\frac{|I_1|}{N} - s_1\right| < \varepsilon_1, \left|\frac{|I_2|}{N} - s_2\right| < \varepsilon_2, \ldots, \left|\frac{|I_K|}{N} - s_K\right| < \varepsilon_K\right\},
\]
where \(\sum_{k=1}^{K} |I_k| = N\), and \(\varepsilon_k \to 0^+\) as a function of \(N\) for \(k \in \{1, 2, \ldots, K\}\). Applying a similar derivation to that of (24), we obtain that

\begin{align*}
&\lim_{N\to\infty} \Pr\left(\frac{\max_{i\in I_1}|x_{W,GM,i} - x_i|}{\max_{j\in I_k}|x_{W,GM,j} - x_j|} > \frac{1-\epsilon}{(1+\epsilon)^2}\cdot\frac{\sqrt{\frac{\sigma_z^4}{(\sigma_1^2+\sigma_z^2)^2}\,\sigma_1^2 + \frac{\sigma_1^4}{(\sigma_1^2+\sigma_z^2)^2}\,\sigma_z^2}}{\sqrt{\frac{\sigma_z^4}{(\sigma_1^2+\sigma_z^2)^2}\,\sigma_k^2 + \frac{\sigma_1^4}{(\sigma_1^2+\sigma_z^2)^2}\,\sigma_z^2}}\,\middle|\, x\in A_\varepsilon\right) \tag{39}\\
&= \lim_{N\to\infty} \Pr\left(\frac{\max_{i\in I_1}|x_{W,GM,i} - x_i|}{\max_{j\in I_k}|x_{W,GM,j} - x_j|} \ge 1 \,\middle|\, x\in A_\varepsilon\right) = 1, \tag{40}
\end{align*}

for any \(k \ne 1\). Equation (40) is valid because (39) holds for any \(\epsilon > 0\), and
\[
\frac{\sqrt{\frac{\sigma_z^4}{(\sigma_1^2+\sigma_z^2)^2}\,\sigma_1^2 + \frac{\sigma_1^4}{(\sigma_1^2+\sigma_z^2)^2}\,\sigma_z^2}}{\sqrt{\frac{\sigma_z^4}{(\sigma_1^2+\sigma_z^2)^2}\,\sigma_k^2 + \frac{\sigma_1^4}{(\sigma_1^2+\sigma_z^2)^2}\,\sigma_z^2}} \ge 1
\]
is derived from (38). Hence,

\[
\lim_{N\to\infty} E\left[\frac{\|x - x_{W,GM}\|_\infty}{\sqrt{\ln(N)}}\,\middle|\, x\in A_\varepsilon\right]
= \lim_{N\to\infty} E\left[\frac{\max_{i\in I_1}|x_i - x_{W,GM,i}|}{\sqrt{\ln(N)}}\,\middle|\, x\in A_\varepsilon\right]. \tag{41}
\]

Equation (41) shows that the maximum absolute error of the Wiener filter relates to the Gaussian component that has the greatest variance.

3) Optimality of the Wiener Filter: Then, applying derivations similar to those of equations (28) and (29),
\[
\lim_{N\to\infty}\left(E\left[\frac{\|x-x_{\ell_\infty}\|_\infty}{\sqrt{\ln(N)}}\,\middle|\, x\in A_\varepsilon\right] - E\left[\frac{\|x-x_{W,GM}\|_\infty}{\sqrt{\ln(N)}}\,\middle|\, x\in A_\varepsilon\right]\right) \ge 0.
\]

4) Typical Set: Similar to the derivation of (34), we obtain the probability of \(x \in A_\varepsilon\) ([32], p. 59),
\[
\Pr(x \in A_\varepsilon) > 1 - \delta,
\]
where
\[
\delta = \sum_{k=1}^{K} \varepsilon_k \left|\log_2(s_k)\right|.
\]

Finally,
\[
\lim_{N\to\infty} \frac{E[\|x - x_{W,GM}\|_\infty]}{E[\|x - x_{\ell_\infty}\|_\infty]} < 1 + \delta,
\]
where \(\delta \to 0^+\), and thus
\[
\lim_{N\to\infty} \frac{E[\|x - x_{W,GM}\|_\infty]}{E[\|x - x_{\ell_\infty}\|_\infty]} = 1.
\]
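Since this limit is approached only logarithmically in \(N\), a finite-\(N\) simulation shows a ratio near, but typically below, the predicted scaling. The sketch below is illustrative only (parameters, names, and the two-component mixture are our own choices); it compares against the scaling \(\sqrt{2 v_1 \ln(N)}\) implied by the largest error-pattern variance \(v_1\), not against the intractable \(\ell_\infty\)-optimal estimator:

```python
import numpy as np

def gm_linf_ratio(N=500_000, seed=3):
    """l_inf error of the Gaussian-mixture Wiener filter divided by
    sqrt(2*v1*ln(N)), for a two-component mixture (illustrative)."""
    rng = np.random.default_rng(seed)
    s1, mu1, sig1 = 0.4, 0.0, 2.0      # component 1: the largest variance
    mu2, sig2 = 1.0, 0.5               # component 2
    sig_z = 1.0
    comp1 = rng.random(N) < s1
    x = np.where(comp1, rng.normal(mu1, sig1, N), rng.normal(mu2, sig2, N))
    r = x + rng.normal(0.0, sig_z, N)
    xw = (sig1**2 * r + sig_z**2 * mu1) / (sig1**2 + sig_z**2)
    # v1: the largest error-pattern variance, from (38)
    v1 = (sig_z**4 * sig1**2 + sig1**4 * sig_z**2) / (sig1**2 + sig_z**2) ** 2
    return float(np.abs(xw - x).max() / np.sqrt(2 * v1 * np.log(N)))

print(gm_linf_ratio())  # typically below 1 at this N; creeps toward 1 as N grows
```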

C. Proof of Lemma 1

It has been shown [33], [35] that for an i.i.d. standard Gaussian sequence \(u_i \sim \mathcal{N}(0, 1)\), where \(i \in \{1, 2, \ldots, N\}\), the maximum of the sequence, \(\max_i u_i\), converges to \(\sqrt{2\ln(N)}\) in probability, i.e.,
\[
\lim_{N\to\infty} \Pr\left(\left|\frac{\max_{1\le i\le N} u_i}{\sqrt{2\ln(N)}} - 1\right| < \epsilon\right) = 1,
\]

for any \(\epsilon > 0\). Therefore, for an i.i.d. non-standard Gaussian sequence \(u_i \sim \mathcal{N}(\mu, \sigma^2)\), \((u_i - \mu)/\sigma \sim \mathcal{N}(0, 1)\), and it follows that
\[
\lim_{N\to\infty} \Pr\left(\left|\frac{\max_{1\le i\le N}(u_i - \mu)}{\sqrt{2\sigma^2\ln(N)}} - 1\right| < \epsilon\right) = 1, \tag{42}
\]

for any \(\epsilon > 0\). We observe that, for a given \(\mu\), the following probability equals 1 for sufficiently large \(N\), and therefore,
\[
\lim_{N\to\infty} \Pr\left(\left|\frac{-\mu}{\sqrt{2\sigma^2\ln(N)}} - 0\right| < \epsilon\right) = 1, \tag{43}
\]

for any \(\epsilon > 0\). Combining (42) and (43),
\[
\lim_{N\to\infty} \Pr\left(\left|\frac{\max_{1\le i\le N} u_i}{\sqrt{2\sigma^2\ln(N)}} - 1\right| < 2\epsilon\right) = 1,
\]
for any \(\epsilon > 0\), which owing to the arbitrariness of \(\epsilon\) yields
\[
\lim_{N\to\infty} \Pr\left(\left|\frac{\max_{1\le i\le N} u_i}{\sqrt{2\sigma^2\ln(N)}} - 1\right| < \epsilon\right) = 1. \tag{44}
\]
Equation (44) suggests that, for a sequence of i.i.d. Gaussian random variables \(u_i \sim \mathcal{N}(\mu, \sigma^2)\), the maximum of the sequence is not affected by the value of \(\mu\).

On the other hand, the i.i.d. Gaussian sequence \((-u_i) \sim \mathcal{N}(-\mu, \sigma^2)\) satisfies
\[
\lim_{N\to\infty} \Pr\left(\left|\frac{\max_{1\le i\le N}(-u_i)}{\sqrt{2\sigma^2\ln(N)}} - 1\right| < \epsilon\right) = 1.
\]
Hence,
\begin{align*}
&\lim_{N\to\infty} \Pr\left(\left|\frac{\max_{1\le i\le N}|u_i|}{\sqrt{2\sigma^2\ln(N)}} - 1\right| < \epsilon\right)\\
&= \lim_{N\to\infty} \Pr\left(\left|\frac{\max_{1\le i\le N} u_i}{\sqrt{2\sigma^2\ln(N)}} - 1\right| < \epsilon \text{ and } \left|\frac{\max_{1\le i\le N}(-u_i)}{\sqrt{2\sigma^2\ln(N)}} - 1\right| < \epsilon\right)\\
&= \lim_{N\to\infty} \Pr\left(\left|\frac{\max_{1\le i\le N} u_i}{\sqrt{2\sigma^2\ln(N)}} - 1\right| < \epsilon\right)\\
&\quad - \lim_{N\to\infty} \Pr\left(\left|\frac{\max_{1\le i\le N} u_i}{\sqrt{2\sigma^2\ln(N)}} - 1\right| < \epsilon \text{ and } \left|\frac{\max_{1\le i\le N}(-u_i)}{\sqrt{2\sigma^2\ln(N)}} - 1\right| > \epsilon\right)\\
&= \lim_{N\to\infty} \Pr\left(\left|\frac{\max_{1\le i\le N} u_i}{\sqrt{2\sigma^2\ln(N)}} - 1\right| < \epsilon\right) - 0 = 1,
\end{align*}
for any \(\epsilon > 0\).
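Lemma 1 and its insensitivity to the mean \(\mu\) can be illustrated numerically. In the sketch below (names and parameter values are ours), the normalized maximum sits somewhat below 1 even at \(N = 10^6\), reflecting the slow \(\ln(N)\) convergence noted in the conclusion, while shifting the mean changes the ratio only by the vanishing offset \(\mu/\sqrt{2\sigma^2\ln(N)}\):

```python
import numpy as np

def max_ratio(mu, sigma=1.5, N=1_000_000, seed=7):
    """Max of N i.i.d. N(mu, sigma^2) draws over sqrt(2*sigma^2*ln(N)),
    together with the mean's offset mu / sqrt(2*sigma^2*ln(N))."""
    rng = np.random.default_rng(seed)
    target = np.sqrt(2 * sigma**2 * np.log(N))  # Lemma 1 normalization
    return rng.normal(mu, sigma, N).max() / target, mu / target

r0, _ = max_ratio(0.0)
r10, offset = max_ratio(10.0)
print(0.75 < r0 < 1.1)                # near 1, but convergence is logarithmic
print(abs(r10 - offset - r0) < 1e-9)  # same seed: mu only adds mu/target
```

The second check exploits the fact that, with a fixed seed, shifting the mean shifts every draw by exactly \(\mu\), so the mean's entire effect on the ratio is the offset \(\mu/\sqrt{2\sigma^2\ln(N)}\), which vanishes as \(N \to \infty\).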

IV. CONCLUSION

This correspondence focused on estimating input signals in parallel Gaussian channels, where the signals were generated by i.i.d. Gaussian mixture sources, and the \(\ell_\infty\)-norm error was used to quantify the performance. We proved that the Wiener filter (10), a simple linear function that is applied to the Gaussian channel outputs, asymptotically minimizes the mean \(\ell_\infty\)-norm error when the signal dimension \(N \to \infty\). Specifically, the multiplicative constant of the linear filter only relates to the greatest variance of the Gaussian mixture components and the variance of the Gaussian noise. Our results for parallel Gaussian channels can be extended to linear mixing systems, in settings where linear mixing systems can be decoupled to parallel Gaussian channels.

Our results are asymptotic, but one will notice from Section III-A (22) that the asymptotic results hold only for astronomically large signal dimension \(N\), which may lead readers to wonder whether the Wiener filter performs well when the signal dimension is finite. To answer this question, we performed numerical simulations for finite signal dimensions. The numerical results showed that the Wiener filter indeed reduces the \(\ell_\infty\)-norm error to some extent. Specifically, the Wiener filter outperforms the relaxed belief propagation algorithm [18], [19] in linear mixing systems. However, our numerical results suggested that there exist better algorithms [2] for \(\ell_\infty\)-norm error than the Wiener filter in finite signal dimension settings. The development of optimal algorithms in the finite dimension setting is left for future work.

ACKNOWLEDGMENTS

The authors would like to thank Nikhil Krishnan for useful discussions. They would also like to thank the reviewers for their comments, which greatly helped improve this manuscript.

REFERENCES

[1] J. Tan and D. Baron, “Signal reconstruction in linear mixing systems with different error metrics,” in Proc. Inf. Theory Appl. Workshop, Feb. 2013, pp. 1–7.

[2] J. Tan, D. Baron, and L. Dai, “Signal estimation with low infinity-norm error by minimizing the mean p-norm error,” in Proc. IEEE 48th Annu. Conf. Inf. Sci. Syst., Mar. 2014, pp. 1–5.

[3] A. Papoulis, Probability, Random Variables, and Stochastic Processes. New York, NY, USA: McGraw-Hill, 1991.

[4] T. I. Alecu, S. Voloshynovskiy, and T. Pun, “The Gaussian transform of distributions: Definition, computation and application,” IEEE Trans. Signal Process., vol. 54, no. 8, pp. 2976–2985, Aug. 2006.

[5] A. Bijaoui, “Wavelets, Gaussian mixtures and Wiener filtering,” Signal Process., vol. 82, no. 4, pp. 709–712, Apr. 2002.

[6] M. Tabuchi, N. Yamane, and Y. Morikawa, “Adaptive Wiener filter based on Gaussian mixture model for denoising chest X-ray CT image,” in Proc. Annu. Conf. SICE, Sep. 2007, pp. 682–689.

[7] J. Vila and P. Schniter, “Expectation-maximization Gaussian-mixture approximate message passing,” in Proc. 46th Annu. Conf. Inf. Sci. Syst., Mar. 2012, pp. 1–6.

[8] J. P. Vila and P. Schniter, “Expectation-maximization Gaussian-mixture approximate message passing,” IEEE Trans. Signal Process., vol. 61, no. 19, pp. 4658–4672, Oct. 2013.

[9] J. Tan, D. Carmon, and D. Baron, “Signal estimation with additive error metrics in compressed sensing,” IEEE Trans. Inf. Theory, vol. 60, no. 1, pp. 150–158, Jan. 2014.

[10] M. Dalai and R. Leonardi, “\(\ell\)-infinity constrained approximations for image and video compression,” in Proc. Picture Coding Symp., Apr. 2006.

[11] C. Studer, W. Yin, and R. G. Baraniuk, “Signal representations with minimum \(\ell_\infty\)-norm,” in Proc. 50th Allerton Conf. Commun., Control, Comput., Oct. 2012, pp. 1270–1277.

[12] A. C. Gilbert, B. Hemenway, A. Rudra, M. J. Strauss, and M. Wootters, “Recovering simple signals,” in Proc. Inf. Theory Appl. Workshop, Feb. 2012, pp. 382–391.

[13] M. Egerstedt and C. F. Martin, “Trajectory planning in the infinity norm for linear control systems,” Int. J. Control, vol. 72, no. 13, pp. 1139–1146, 1999.

[14] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.

[15] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.

[16] D. Guo and C.-C. Wang, “Random sparse linear systems observed via arbitrary channels: A decoupling principle,” in Proc. IEEE Int. Symp. Inf. Theory, Jun. 2007, pp. 946–950.

[17] D. Guo and C.-C. Wang, “Multiuser detection of sparsely spread CDMA,” IEEE J. Sel. Areas Commun., vol. 26, no. 3, pp. 421–431, Apr. 2008.

[18] S. Rangan, “Estimation with random linear mixing, belief propagation and compressed sensing,” in Proc. 44th Annu. Conf. Inf. Sci. Syst., Mar. 2010, pp. 1–6.

[19] S. Rangan, “Estimation with random linear mixing, belief propagation and compressed sensing,” arXiv:1001.2228, Jan. 2010.

[20] J.-L. Starck, F. Murtagh, and J. M. Fadili, Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity. Cambridge, U.K.: Cambridge Univ. Press, 2010.

[21] J. Vila and P. Schniter, “Expectation-maximization Bernoulli-Gaussian approximate message passing,” in Proc. IEEE 45th Asilomar Conf. Signals, Syst. Comput., Nov. 2011, pp. 799–803.

[22] C. E. Clark, “The greatest of a finite set of random variables,” Oper. Res., vol. 9, no. 2, pp. 145–162, Mar. 1961.

[23] P. Indyk, “On approximate nearest neighbors under \(\ell_\infty\) norm,” J. Comput. Syst. Sci., vol. 63, no. 4, pp. 627–638, Dec. 2001.

[24] D. Guo and S. Verdú, “Randomly spread CDMA: Asymptotics via statistical physics,” IEEE Trans. Inf. Theory, vol. 51, no. 6, pp. 1983–2010, Jun. 2005.

[25] A. Montanari and D. Tse, “Analysis of belief propagation for non-linear problems: The example of CDMA (or: How to prove Tanaka’s formula),” in Proc. IEEE Inf. Theory Workshop, Mar. 2006, pp. 160–164.

[26] D. Guo, D. Baron, and S. Shamai, “A single-letter characterization of optimal noisy compressed sensing,” in Proc. 47th Allerton Conf. Commun., Control, Comput., Sep. 2009, pp. 52–59.

[27] S. Rangan, A. K. Fletcher, and V. K. Goyal, “Asymptotic analysis of MAP estimation via the replica method and applications to compressed sensing,” IEEE Trans. Inf. Theory, vol. 58, no. 3, pp. 1902–1923, Mar. 2012.

[28] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” Proc. Nat. Acad. Sci., vol. 106, no. 45, pp. 18914–18919, Nov. 2009.

[29] N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series With Engineering Applications. Cambridge, MA, USA: MIT Press, 1949.


[30] S. Sherman, “A theorem on convex sets with applications,” Ann. Math. Statist., vol. 26, no. 4, pp. 763–767, Dec. 1955.

[31] S. Sherman, “Non-mean-square error criteria,” IRE Trans. Inf. Theory, vol. 4, no. 3, pp. 125–126, Sep. 1958.

[32] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York, NY, USA: Wiley, 2006.

[33] B. Gnedenko, “Sur la distribution limite du terme maximum d’une série aléatoire,” Ann. Math., vol. 44, no. 3, pp. 423–453, Jul. 1943.

[34] T. Tanaka, “A statistical-mechanics approach to large-system analysis of CDMA multiuser detectors,” IEEE Trans. Inf. Theory, vol. 48, no. 11, pp. 2888–2910, Nov. 2002.

[35] S. Berman, “A law of large numbers for the maximum in a stationary Gaussian sequence,” Ann. Math. Statist., vol. 33, no. 1, pp. 93–97, Mar. 1962.

Jin Tan (S’11) received the B.Sc. degree in Microelectronics from Fudan University, China, in 2010, and the M.Sc. degree in Electrical and Computer Engineering from North Carolina State University, Raleigh, USA, in 2012. Currently she is a Ph.D. candidate at North Carolina State University, in the Department of Electrical and Computer Engineering. Her research interests include information theory, estimation theory, and statistical signal processing.

Dror Baron (S’99–M’03–M’10) received the B.Sc. (summa cum laude) and M.Sc. degrees from the Technion - Israel Institute of Technology, Haifa, Israel, in 1997 and 1999, and the Ph.D. degree from the University of Illinois at Urbana-Champaign in 2003, all in electrical engineering.

From 1997 to 1999, Dr. Baron worked at Witcom Ltd. in modem design. From 1999 to 2003, he was a research assistant at the University of Illinois at Urbana-Champaign, where he was also a Visiting Assistant Professor in 2003. From 2003 to 2006, he was a Postdoctoral Research Associate in the Department of Electrical and Computer Engineering at Rice University, Houston, TX. From 2007 to 2008, he was a quantitative financial analyst with Menta Capital, San Francisco, CA, and from 2008 to 2010 he was a Visiting Scientist in the Department of Electrical Engineering at the Technion - Israel Institute of Technology, Haifa. Since 2010, Dr. Baron has been an Assistant Professor in the Electrical and Computer Engineering Department at North Carolina State University.

Dr. Baron’s research interests combine information theory, signal processing, and fast algorithms; in recent years, he has focused on compressed sensing. Dr. Baron was a recipient of the 2002 M. E. Van Valkenburg Graduate Research Award, and received honorable mention at the Robert Bohrer Memorial Student Workshop in April 2002, both at the University of Illinois. He also participated from 1994 to 1997 in the Program for Outstanding Students, comprising the top 0.5% of undergraduates at the Technion.

Liyi Dai (S’93–M’93–SM’13–F’14) received a B.S. degree from Shandong University, Shandong, China, in 1983, an M.S. degree from the Institute of Systems Science, Academia Sinica, Beijing, China, in 1986, and a Ph.D. degree from Harvard University, Cambridge, MA, USA, in 1993. He was a recipient of the NSF CAREER Award and an Associate Editor of the IEEE TRANSACTIONS ON AUTOMATIC CONTROL. He has authored/coauthored 78 journal and conference publications and is the author of Singular Control Systems (Springer-Verlag, 1989).