
1105 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 37, NO. 4, JULY 1991

Multivariate Probability Density Deconvolution for Stationary Random Processes

Elias Masry, Fellow, IEEE

Abstract - The kernel-type estimation of the joint probability density functions of stationary random processes from noisy observations is considered. Precise asymptotic expressions and bounds on the mean-square estimation error are established, along with rates of mean-square convergence, for processes satisfying a variety of mixing conditions. The dependence of the convergence rates on the joint density of the noise process is studied.

Index Terms - Deconvolution of multivariate probability densities, quadratic-mean convergence and rates, mixing processes.

I. INTRODUCTION

Consider the problem of estimating a probability density function f(x) using observations corrupted by additive noise. Set

Y_i = X_i + ε_i,   (1.1)

where the X_i's are independent and identically distributed (i.i.d.) random variables with (unknown) probability density function f(x), the ε_i's are i.i.d. random variables with known probability density h(x), and the processes {X_i} and {ε_i} are independent. Let g(x) denote the probability density function of Y_i, which is given by the convolution g = f*h. Given n observations {Y_i}_{i=1}^{n}, one desires to estimate the probability density f(x). This is clearly a probability density deconvolution problem; it arises in a variety of contexts such as communication theory [12], [15] and medical sciences [10].

Kernel methods for the estimation of f(x) were considered in [2], [4], [13], where bounds on the rate of mean-square convergence are established. An orthogonal series expansion approach is considered in [9], where similar rates are obtained. A B-spline method is discussed in [10], and a maximum likelihood approach is presented in [12].

The purpose of this paper is to extend the kernel approach considered in [2], [4], [13] in the following directions.

1) The process {X_i}_{i=-∞}^{∞} is allowed to have a dependence structure.

Manuscript received May 22, 1990; revised September 30, 1990. This work was supported by the Office of Naval Research under Grant N00014-90-J-1175.

E. Masry is with the Department of Electrical Engineering, University of California, San Diego, La Jolla, CA 92093.

IEEE Log Number 9143446.

2) Convergence properties of estimators for the joint probability densities f(x; p), p ≥ 1, of the process {X_i}_{i=-∞}^{∞} are established.

3) The noise process {ε_i}_{i=-∞}^{∞} is allowed to have a dependence structure.

4) Exact asymptotic expressions, rather than bounds, are established for a broad class of noise processes.

We thus assume that the process {X_i}_{i=-∞}^{∞} is stationary, and for each integer p ≥ 1 let f(x; p) = f(x_1, ..., x_p; p) be the joint probability density function of the random variables X_1, ..., X_p, which is assumed to exist. Similarly, let h(x; p) be the joint probability density function of the random variables ε_1, ..., ε_p. Denote the joint probability density of Y_1, ..., Y_p by g(y; p). Clearly

g(y; p) = ∫_{R^p} f(y - u; p) h(u; p) du.   (1.2)

Denote the characteristic functions corresponding to f(x; p), h(x; p), and g(y; p) by φ_f(t), φ_h(t), and φ_g(t), respectively. Then

φ_g(t) = φ_f(t) φ_h(t).   (1.3)

Our goal is to study the quadratic-mean convergence properties of appropriate estimators of f(x; p) on the basis of the observations {Y_i}_{i=1}^{n}, n > p. This is clearly a multidimensional density deconvolution problem for dependent data. We adopt the kernel density estimation approach considered in [4], [13].

We make the following basic assumptions. Let K(x) be a real, even, bounded density on R^p satisfying K(x) = O(||x||^{-p-δ}) for some δ > 0, and denote its Fourier transform by φ_K(t),

φ_K(t) = ∫_{R^p} e^{it·x} K(x) dx,

where t·x = Σ_{j=1}^{p} t_j x_j. Assume that

a) |φ_h(t)| > 0, for all t ∈ R^p,   (1.4a)

b) φ_K(t)/φ_h(t/b) ∈ L_1(R^p) ∩ L_2(R^p), for every b > 0.   (1.4b)

Define the function W_b(x) on R^p, b > 0, by

W_b(x) = (1/(2π)^p) ∫_{R^p} e^{-it·x} [φ_K(t)/φ_h(t/b)] dt.   (1.5)

0018-9448/91/0700-1105$01.00 © 1991 IEEE



Note that W_b(x) is real since φ_K(t) is real; if h(x; p) is even, so is W_b(x). By (1.4) it is clear that W_b(x) is bounded and uniformly continuous (the bound depends on b). Note that in general W_b(x) is neither nonnegative nor integrable. However, W_b(x) is always in L_2(R^p) by (1.4) and Parseval's theorem. If φ_K(t)/φ_h(t/b) is sufficiently smooth, then [1, Ch. 9] W_b(x) would be in L_1(R^p), as in Section III.

Let {b_n}_{n=1}^{∞} be a sequence of positive numbers such that b_n → 0 as n → ∞. Given the observations {Y_i}_{i=1}^{n}, we estimate f(x; p) by

f̂_n(x; p) = (1/((n-p) b_n^p)) Σ_{j=1}^{n-p} W_{b_n}((x - Y_j)/b_n),   (1.6)

where

Y_j = (Y_j, Y_{j+1}, ..., Y_{j+p-1}),   (1.7)

and it is naturally assumed that n > p. Note that f̂_n(x; p) has the form of a kernel density estimator, but that the kernel W_{b_n}(x) depends on the bandwidth parameter b_n, unlike the classical density estimator.

It is also clear that the estimator f̂_n(x; p) may not in general be positive or integrable. However, when W_b(x) is in L_1(R^p), then by (1.5) and the inversion formula we have

∫_{R^p} e^{it·x} W_b(x) dx = φ_K(t)/φ_h(t/b),

so that, by taking t = 0, ∫_{R^p} W_b(x) dx = 1 and thus ∫_{R^p} f̂_n(x; p) dx = 1. We finally note that an alternative expression for f̂_n(x; p) is

f̂_n(x; p) = (1/(2π)^p) ∫_{R^p} e^{-it·x} φ_K(b_n t) [φ̂_{g,n}(t)/φ_h(t)] dt,   (1.8)

where φ̂_{g,n}(t) is the standard multivariate estimate of the characteristic function φ_g(t),

φ̂_{g,n}(t) = (1/(n-p)) Σ_{j=1}^{n-p} e^{it·Y_j},   (1.9)

and the integral (1.8) exists in view of (1.4) and the boundedness of φ̂_{g,n}(t).

In Section II we establish upper bounds on the mean-square estimation error for f̂_n(x; p), under various mixing conditions on the process {X_i}, for broad classes of noise densities h(x; p). These quadratic-mean convergence rates (with p = 1) match the optimal rates obtained in [2], [4] for the case of i.i.d. observations. The convergence results established in Section II, as well as those given in [2], [4], [8], [9], [13], are all in the form of bounds on the mean-square estimation error. While the precise asymptotic expression for the bias of these estimators is well known, no such expression has been derived for the variance of the estimators in the previously cited papers.¹ In Section III, we derive the precise asymptotic expression for the variance of the joint density estimators f̂_n(x; p) for the class of noise characteristic functions with algebraically decaying tail. In order to reduce the complexity of the analysis, it is assumed in Section III that the noise process {ε_i}_{i=-∞}^{∞} is an i.i.d. sequence; this assumption, however, is not essential, as shown in Section III.

¹It was brought to my attention by a referee that for the special case of i.i.d. observations with φ_h(t) decaying algebraically, the asymptotic variance was derived in a recently published paper [5].
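Before turning to the convergence analysis, a small numerical sketch may help fix ideas. The following illustration is added here and is not part of the original text; it assumes p = 1, i.i.d. data, and double-exponential (Laplace) noise, whose characteristic function ψ_h(t) = 1/(1 + t²) makes the deconvoluting kernel of (1.5) available in closed form, W_b(x) = K(x) - K''(x)/b², which for a Gaussian kernel K equals K(x)[1 + (1 - x²)/b²].

```python
import numpy as np

def deconvolution_kde(x, Y, b):
    """Sketch of the p = 1 deconvolution kernel estimate (1.6), assuming
    standard Laplace noise (psi_h(t) = 1/(1 + t^2)) and a Gaussian kernel K.
    Then W_b(u) = K(u) - K''(u)/b^2 = K(u) * (1 + (1 - u^2)/b^2)."""
    u = (x - Y) / b
    K = np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)   # Gaussian kernel K(u)
    Wb = K * (1.0 + (1.0 - u**2) / b**2)         # deconvoluting kernel W_b(u)
    return Wb.sum() / (len(Y) * b)

rng = np.random.default_rng(0)
n, b = 20000, 0.5
X = rng.standard_normal(n)        # unobserved X_i with density f = N(0,1)
Y = X + rng.laplace(size=n)       # observed Y_i = X_i + eps_i, per (1.1)
est = deconvolution_kde(0.0, Y, b)
print(est)                        # close to f(0) = 0.3989, up to O(b^2) bias
```

The sample size, bandwidth, and noise law above are arbitrary choices for illustration; the estimate at x = 0 recovers f(0) up to the O(b_n²) bias and the variance terms analyzed in the next sections.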

II. BOUNDS ON QUADRATIC-MEAN CONVERGENCE

The following approximation of the identity [14] is needed in the sequel.

Lemma 1: Assume that Q(x) is a bounded integrable function on R^p such that Q(x) = O(||x||^{-p-δ}) for some δ > 0. Let f ∈ L_1(R^p). Then for almost all x ∈ R^p we have

(1/b^p) ∫_{R^p} Q((x - u)/b) f(u) du → f(x) ∫_{R^p} Q(u) du, as b → 0.

The bias of f̂_n(x; p) is easily obtained.

Theorem 1:

a) For almost all x ∈ R^p we have

E[f̂_n(x; p)] → f(x; p), as n → ∞.

b) Assume that f(x; p) is twice differentiable and its second partial derivatives are bounded and continuous on R^p. The kernel K(x) is assumed to satisfy

∫_{R^p} u_j K(u) du = 0, j = 1, ..., p,

and

∫_{R^p} ||u||² K(u) du < ∞.

Then we have

(1/b_n²) bias[f̂_n(x; p)] → (1/2) ∫_{R^p} u G''(x; p) u^T K(u) du, as n → ∞,   (2.2)

where the (p × p) matrix G'' of second partial derivatives of f is given by

G''(x; p) = [∂² f(x; p)/∂x_i ∂x_j]_{i,j=1}^{p},

and u^T is the transpose of the row vector u.

Proof: a) From (1.9) it follows that E[φ̂_{g,n}(t)] = φ_g(t) = φ_f(t) φ_h(t). Hence by (1.8), Fubini's theorem, and the convolution theorem, we have

E[f̂_n(x; p)] = (1/b_n^p) ∫_{R^p} K((x - u)/b_n) f(u; p) du,   (2.3)

and Part a) follows by Lemma 1.

b) Expanding f in (2.3) in a Taylor series with an integral remainder we obtain


where

a_n(x) = ∫_0^1 (1 - w) ∫_{R^p} [u G''(x - b_n w u; p) u^T] K(u) du dw.

Part b) follows by applying dominated convergence to a_n(x), using the continuity and boundedness of G''. □

It should be noted that the bias of the estimator f̂_n(x; p) is identical to that of the estimator of f(x; p) based on the observations (X_1, ..., X_n), i.e., when no observation noise is present. It is also identical to the case of a vector random sample. The situation is radically different for the variance of f̂_n(x; p), as will be seen next.

Let F_i^k be the σ-algebra of events generated by the random variables {(X_j, ε_j), i ≤ j ≤ k}, and let L_2(F_i^k) denote the collection of all second-order random variables which are F_i^k-measurable. The stationary processes {X_j, ε_j} are called strongly mixing [11] if

sup_{A ∈ F_{-∞}^0, B ∈ F_k^∞} |P[AB] - P[A]P[B]| = α_k → 0, as k → ∞,   (2.6a)

uniformly mixing if

sup_{A ∈ F_{-∞}^0, B ∈ F_k^∞} |P[B|A] - P[B]| = φ_k → 0, as k → ∞,   (2.6b)

and ρ-mixing [7] if

sup_{U ∈ L_2(F_{-∞}^0), V ∈ L_2(F_k^∞)} |cov{U, V}| / (var^{1/2}[U] var^{1/2}[V]) = ρ_k → 0, as k → ∞.   (2.6c)

Here α_k (respectively φ_k) is the strong (respectively uniform) mixing coefficient, and ρ_k is the maximal correlation coefficient. It is well known that α_k ≤ (1/4)ρ_k ≤ (1/2)φ_k^{1/2}, and thus the class of ρ-mixing processes is intermediate between strong and uniform mixing.

Theorem 2:

a) If {X_i, ε_i} are ρ-mixing processes with Σ_{k=1}^{∞} ρ_k < ∞ and f(x; p) ≤ C_p < ∞, then

var[f̂_n(x; p)] ≤ (A_1/((n-p) b_n^p)) ||W_{b_n}||_2² + O(1/n),

where

A_1 = C_p [2p - 1 + 2 Σ_{j=1}^{∞} ρ_j].

b) If {X_i, ε_i} are strongly mixing processes with Σ_{k=1}^{∞} (α_k)^{1-2/ν} < ∞, for some ν > 2, and f(x; p) ≤ C_p < ∞, then

var[f̂_n(x; p)] ≤ (A_2/((n-p) b_n^p)) ||W_{b_n}||_2² + (A_3/((n-p) b_n^{2p(1-1/ν)})) ||W_{b_n}||_ν² + O(1/n),

where A_2 = (2p-1) C_p and A_3 = 16 (C_p)^{2/ν} Σ_{k=1}^{∞} (α_k)^{1-2/ν}.

It follows from Theorem 2 that the rate at which var[f̂_n(x; p)] tends to zero as n → ∞ depends on the norm ||W_{b_n}||_q, q ≥ 2, of the kernel W_b(x) of (1.5) and therefore on the rate of decay of the multivariate characteristic function φ_h(t) of the noise process as ||t|| → ∞. Following the proof of Theorem 2 we consider two broad classes of such characteristic functions, corresponding to exponential and algebraic decay, and estimate the value of the norm ||W_{b_n}||_q.

Proof of Theorem 2: By (1.6) and stationarity we have

var[f̂_n(x; p)] = (1/((n-p) b_n^{2p})) var[W_{b_n}((x - Y_1)/b_n)] + (2/((n-p) b_n^{2p})) Σ_{l=1}^{n-p-1} (1 - l/(n-p)) I_{n,l},   (2.7)

where I_{n,l} = cov{W_{b_n}((x - Y_1)/b_n), W_{b_n}((x - Y_{1+l})/b_n)}. By Theorem 1 (Part a))

(1/((n-p) b_n^{2p})) (E[W_{b_n}((x - Y_1)/b_n)])² = O(1/n),

so that

(1/((n-p) b_n^{2p})) var[W_{b_n}((x - Y_1)/b_n)] ≤ (C_p ||W_{b_n}||_2² / ((n-p) b_n^p)) (1 + o(1)) + O(1/n).   (2.8)

Now by (2.7) we set

S = (2/((n-p) b_n^{2p})) Σ_{l=1}^{n-p-1} (1 - l/(n-p)) I_{n,l},

so that

|S| ≤ (2/((n-p) b_n^{2p})) [Σ_{l=1}^{p-1} |I_{n,l}| + Σ_{l=p}^{n-p-1} |I_{n,l}|] = J_1 + J_2.   (2.9)

For J_1 we apply the Cauchy-Schwarz inequality to obtain

J_1 ≤ (2(p-1)/((n-p) b_n^{2p})) E[W_{b_n}²((x - Y_1)/b_n)] ≤ (2(p-1) C_p / ((n-p) b_n^p)) ||W_{b_n}||_2² (1 + o(1)).   (2.10)


a) For ρ-mixing processes we bound J_2 by

J_2 ≤ (2/((n-p) b_n^{2p})) Σ_{l=p}^{n-p-1} ρ_{l-p+1} var[W_{b_n}((x - Y_1)/b_n)] ≤ (2 C_p ||W_{b_n}||_2² / ((n-p) b_n^p)) Σ_{j=1}^{∞} ρ_j (1 + o(1)),   (2.11)

since W_{b_n}((x - Y_1)/b_n) and W_{b_n}((x - Y_{1+l})/b_n) are F_1^p- and F_{1+l}^{l+p}-measurable, respectively. It follows by (2.7)-(2.11) that

var[f̂_n(x; p)] ≤ (C_p ||W_{b_n}||_2² / ((n-p) b_n^p)) [1 + 2(p-1) + 2 Σ_{j=1}^{∞} ρ_j] + O(1/n).

b) For strongly mixing processes we have for random variables U and V which are F_{-∞}^0- and F_k^∞-measurable, respectively, with E|U|^ν < ∞, E|V|^ν < ∞ for ν > 2, that by the covariance inequality of [6, Corollary A.2],

|cov{U, V}| ≤ 8 (α_k)^{1-2/ν} {E|U|^ν E|V|^ν}^{1/ν}.   (2.12)

Applying this bound to each term in J_2 of (2.9) we obtain

J_2 ≤ (16 (C_p)^{2/ν} ||W_{b_n}||_ν² / ((n-p) b_n^{2p(1-1/ν)})) Σ_{k=1}^{∞} (α_k)^{1-2/ν} (1 + o(1)).   (2.13)

Thus by (2.7)-(2.10) and (2.13) we have

var[f̂_n(x; p)] ≤ ((2p-1) C_p ||W_{b_n}||_2² / ((n-p) b_n^p)) (1 + o(1)) + (16 (C_p)^{2/ν} ||W_{b_n}||_ν² / ((n-p) b_n^{2p(1-1/ν)})) Σ_{k=1}^{∞} (α_k)^{1-2/ν} (1 + o(1)) + O(1/n). □

In order to establish upper bounds on the rate of convergence of var[f̂_n(x; p)] as n → ∞, we now determine the rate of growth of the norms ||W_{b_n}||_q appearing in Theorem 2. We begin with the case of algebraic decay of φ_h(t) as ||t|| → ∞. The proof of the following proposition is relegated to the Appendix.

Proposition 1: Assume that φ_h(t) satisfies

a) |φ_h(t)| > 0, for all t ∈ R^p,

b) ||t||^{β(p)} |φ_h(t)| → B_p as ||t|| → ∞, for some β(p) > 0 and B_p > 0.

Assume that φ_K(t) satisfies

c) A_4 = (1/((2π)^p B_p²)) ∫_{R^p} ||t||^{2β(p)} |φ_K(t)|² dt < ∞.

Then

1) b^{2β(p)} ||W_b||_2² → A_4, as b → 0;

2) if, in addition,

d) A_5 = (1/((2π)^p B_p)) ∫_{R^p} ||t||^{β(p)} |φ_K(t)| dt < ∞,

then for ν > 2 we have

limsup_{b→0} b^{β(p)} ||W_b||_ν ≤ (A_4)^{1/ν} (A_5)^{1-2/ν}.

We remark that Conditions a)-c) of Proposition 1 ensure that W_b(x) ∈ L_2. Condition b) is needed in order to obtain the precise asymptotic value of the norm ||W_b||_2 that is utilized later in Section III. For this section it would have been sufficient to establish only a bound on the norm ||W_b||_2; for this, Condition b) could be replaced by the weaker condition ||t||^{β(p)} |φ_h(t)| > B_p for large ||t||, for some β(p) > 0 and B_p > 0 (see the proof in the Appendix). Theorem 2 and Proposition 1 provide the following rates of convergence for var[f̂_n(x; p)] when the tail of the joint characteristic function φ_h(t) of the noise process {ε_i} decays algebraically as ||t|| → ∞.

Corollary 1:

a) For ρ-mixing processes we have under the assumptions of Theorem 2a) and Conditions a)-c) of Proposition 1 that, as n → ∞,

var[f̂_n(x; p)] ≤ (A_1 A_4 / (n b_n^{p+2β(p)})) (1 + o(1)).

b) For strongly mixing processes we have under the assumptions of Theorem 2b) and Conditions a)-d) of Proposition 1 that, as n → ∞,

var[f̂_n(x; p)] ≤ (Const. / (n b_n^{2β(p)+2p(1-1/ν)})) (1 + o(1)),

where ν > 2.

If we combine the results of Corollary 1a) with the bias of f̂_n(x; p), as given in Theorem 1b), and select an optimal value of the bandwidth parameter, b_n ~ n^{-1/(4+p+2β(p))}, we find that for ρ-mixing processes we have a mean-square convergence rate of

E|f̂_n(x; p) - f(x; p)|² = O(n^{-4/(4+p+2β(p))}),

and for strongly mixing processes with

b_n ~ n^{-1/[4+2p(1-1/ν)+2β(p)]}

we have

E|f̂_n(x; p) - f(x; p)|² = O(n^{-2/[2+p(1-1/ν)+β(p)]}),

where ν > 2.
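For orientation, the rate exponents above are easy to tabulate. The following sketch is an added illustration (not part of the original text); it evaluates the exponents e in E|f̂_n - f|² = O(n^{-e}) for p = 1 and double-exponential (Laplace) noise, for which the decay exponent is β(1) = 2.

```python
# Mean-square convergence rate exponents implied by Corollary 1 and
# Theorem 1b). "beta_p" denotes the algebraic decay exponent beta(p) of the
# noise characteristic function; eps_i = 0 gives the classical noise-free rate.

def rate_rho_mixing(p, beta_p):
    # n^{-4/(4 + p + 2*beta(p))} for rho-mixing processes
    return 4.0 / (4 + p + 2 * beta_p)

def rate_strong_mixing(p, beta_p, nu):
    # n^{-2/(2 + p(1 - 1/nu) + beta(p))} for strongly mixing processes, nu > 2
    return 2.0 / (2 + p * (1 - 1.0 / nu) + beta_p)

def rate_noise_free(p):
    # classical kernel estimation rate n^{-4/(4+p)}
    return 4.0 / (4 + p)

# p = 1 with Laplace noise (beta = 2): deconvolution shrinks the exponent
print(rate_rho_mixing(1, 2), rate_noise_free(1))   # 4/9 versus 4/5
```

The comparison makes the cost of deconvolution explicit: the noise inflates the denominator of the exponent by 2β(p).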

It may be of interest to consider the special case of i.i.d. observation errors {ε_i}. Then

φ_h(t) = Π_{i=1}^{p} ψ_h(t_i),   (2.14)


where ψ_h(t) is the characteristic function of ε_i. For this case we select a factorable kernel

K(x) = Π_{i=1}^{p} K(x_i),   (2.15)

where K(x) is a real, even, bounded density on the real line satisfying K(x) = O(|x|^{-(1+δ)}) for some δ > 0. Then by (1.5) we find

W_b(x) = Π_{i=1}^{p} W_b(x_i),   (2.16)

where

W_b(x) = (1/2π) ∫_{-∞}^{∞} e^{-itx} [ψ_K(t)/ψ_h(t/b)] dt.   (2.17)
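As a worked instance of (2.17), added here for illustration (it assumes a particular noise law and is not a computation carried out in the text): take ψ_h(t) = 1/(1 + t²), the characteristic function of the standard double-exponential (Laplace) density. Then the deconvoluting kernel is available in closed form,

```latex
W_b(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,\psi_K(t)\Bigl(1+\frac{t^2}{b^2}\Bigr)\,dt
       = K(x) - \frac{1}{b^2}\,K''(x),
```

since multiplying ψ_K(t) by t² corresponds to applying -d²/dx² to K(x). Here |t|² ψ_h(t) → 1 as |t| → ∞, so this noise satisfies Condition b) of Proposition 2 with β = 2 and B_1 = 1.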

Then Proposition 1 takes the following form.

Proposition 2: Assume that ψ_h(t) satisfies

a) |ψ_h(t)| > 0, for all t ∈ R,

b) |t|^β |ψ_h(t)| → B_1 as |t| → ∞, for some β > 0.

Assume that ψ_K(t) satisfies

c) A_4 = (1/(2π B_1²)) ∫_{-∞}^{∞} |t|^{2β} |ψ_K(t)|² dt < ∞.

Then

1) b^{2pβ} ||W_b||_2² → (A_4)^p, as b → 0;

2) if, in addition,

d) A_5 = (1/(2π B_1)) ∫_{-∞}^{∞} |t|^β |ψ_K(t)| dt < ∞,

then for ν > 2 we have

limsup_{b→0} b^{pβ} ||W_b||_ν ≤ ((A_4)^{1/ν} (A_5)^{1-2/ν})^p.   (2.18)

Proof: Apply Proposition 1 with p = 1. □

For the special case of i.i.d. observation errors {ε_i} and factorable kernel, the results of Corollary 1 for the variance of f̂_n(x; p) become

var[f̂_n(x; p)] ≤ (A_1 (A_4)^p / (n b_n^{p(2β+1)})) (1 + o(1))   (2.19)

for ρ-mixing processes, and

var[f̂_n(x; p)] ≤ (Const. / (n b_n^{2p(β+1-1/ν)})) (1 + o(1))   (2.20)

for strongly mixing processes, where ν > 2. When the bias of f̂_n(x; p) is given by Theorem 1b), and the bandwidth b_n is chosen optimally, the corresponding mean-square errors are then

E|f̂_n(x; p) - f(x; p)|² = O(n^{-4/[4+p(1+2β)]})   (2.21)

for ρ-mixing processes, and

E|f̂_n(x; p) - f(x; p)|² = O(n^{-2/[2+p(1+β-1/ν)]})   (2.22)

for strongly mixing processes, where ν > 2. For p = 1 the rate (2.21) coincides with the optimal rate given in [4] for the case of i.i.d. observations. We remark that when the probability density of the observation error ε_i is Gamma with parameters a and λ, then the parameter β in Proposition 2 and results (2.19)-(2.22) is equal to a. Similarly, when the observation errors {ε_i} have a double-exponential density, then the parameter β in the results (2.19)-(2.22) is equal to 2. It is worth noting that when ε_i ≡ 0 (no observation noise) the corresponding classical mean-square convergence rate of f̂_n(x; p) is precisely n^{-4/(4+p)} for ρ-mixing and strongly mixing processes. Thus the presence of the observation noise diminishes the rate of convergence by a factor that depends on the rate of decay of the tail of the characteristic function ψ_h(t) of ε_i.

We next establish upper bounds on the mean-square estimation error of f̂_n(x; p) when the joint characteristic function φ_h(t) of the noise process {ε_i} has an exponential decay as ||t|| → ∞. This includes as special cases Gaussian and multivariate Cauchy (with spherical symmetry) noise probability densities. The proof of the following proposition is given in the Appendix.

Proposition 3: Assume that φ_h(t) satisfies the following.

a) |φ_h(t)| > 0, for all t ∈ R^p;

b) |φ_h(t)| ≥ B_p ||t||^{β_0(p)} e^{-α||t||^{β(p)}} as ||t|| → ∞ (say ||t|| > M), with α > 0, β(p) > 0, β_0(p) real.

Assume that φ_K(t) satisfies the following.

c) φ_K(t) = 0, for ||t|| > d, for some d > 0.

Then

||W_b||_2² ≤ A_6 e^{2α(d/b)^{β(p)}} (1 + o(1)), as b → 0,

where A_6 = V_p(d)/((2π)^p B_p² M^{2β_0(p)}). Moreover, for ν > 2,

||W_b||_ν ≤ Const. e^{α(d/b)^{β(p)}} (1 + o(1)), as b → 0,

where the constant V_p(d) is given in (A.7). Theorem 2 and Proposition 3 now provide the following rate of convergence for var[f̂_n(x; p)] when the tail of the joint characteristic function φ_h(t) of the noise process {ε_i} decays exponentially as ||t|| → ∞.

Corollary 2:

a) For ρ-mixing processes we have under the assumptions of Theorem 2a) and Proposition 3 that, as n → ∞,

var[f̂_n(x; p)] ≤ A_1 A_6 (e^{2α(d/b_n)^{β(p)}} / (n b_n^p)) (1 + o(1)).

b) For strongly mixing processes we have under the assumptions of Theorem 2b) and Proposition 3 that, as n → ∞,

var[f̂_n(x; p)] ≤ Const. (e^{2α(d/b_n)^{β(p)}} / (n b_n^{2p(1-1/ν)})) (1 + o(1)),

where ν > 2, and V_p(d) is given in (A.7).

It follows from Corollary 2 that, when the joint characteristic function φ_h(t) of the noise process decays exponentially fast, we must choose the bandwidth parameter b_n such that 1/b_n increases only logarithmically in n in order to have var[f̂_n(x; p)] → 0 as n → ∞. Set e^{2α(d/b_n)^{β(p)}} = n^{1-γ}, for some 0 < γ < 1, in Proposition 3. Then

var[f̂_n(x; p)] = O(n^{-γ} (ln n)^{p/β(p)})

for ρ-mixing processes, and

var[f̂_n(x; p)] = O(n^{-γ} (ln n)^{2p(1-1/ν)/β(p)})

for strongly mixing processes. If we assume, in addition, that f(x; p) is twice continuously differentiable, as in Theorem 1b), then the rate of mean-square convergence of f̂_n(x; p) becomes dominated by the bias, i.e.,

E|f̂_n(x; p) - f(x; p)|² = O((ln n)^{-4/β(p)})   (2.23)

for both ρ-mixing and strongly mixing processes.

When the noise process {ε_i} consists of i.i.d. random variables and the kernel K(x) is factorable, as in (2.14)-(2.17), Proposition 3 could be recast in terms of the behavior of the one-dimensional characteristic functions ψ_h(t) and ψ_K(t) (as was done in Proposition 2 for the algebraic case). In this case we obtain ||W_b||_2² = O(e^{2αp(d/b)^β}) as b → 0, where β = β(1). Then (2.23) becomes

E[f̂_n(x; p) - f(x; p)]² = O((ln n)^{-4/β}).   (2.24)

Note that the rate of convergence in (2.24) does not depend on the dimensionality p of the density f(x; p). The case of Gaussian noise corresponds to β = 2, and that of noise with a Cauchy probability density to β = 1.
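To make the contrast between the logarithmic and algebraic regimes concrete, the following sketch (an added illustration, not part of the original text) evaluates the rate factors of (2.24), (2.21), and the noise-free rate at a fixed sample size, for p = 1.

```python
import math

def mse_exponential_noise(n, beta=2.0):
    # (2.24): E|f_n - f|^2 = O((ln n)^{-4/beta}); Gaussian noise has beta = 2
    return math.log(n) ** (-4.0 / beta)

def mse_algebraic_noise(n, p=1, beta=2.0):
    # (2.21): O(n^{-4/[4 + p(1 + 2*beta)]}); Laplace noise has beta = 2
    return n ** (-4.0 / (4 + p * (1 + 2 * beta)))

def mse_noise_free(n, p=1):
    # classical kernel rate O(n^{-4/(4+p)})
    return n ** (-4.0 / (4 + p))

n = 10**6
print(mse_exponential_noise(n), mse_algebraic_noise(n), mse_noise_free(n))
# Gaussian-noise deconvolution degrades to (ln n)^{-2}: even at n = 10^6 the
# rate factor is only about 5e-3, versus about 1.6e-5 in the noise-free case.
```

The hierarchy of the three factors illustrates why exponentially decaying noise characteristic functions are the hardest case for deconvolution.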

III. EXACT QUADRATIC-MEAN CONVERGENCE

In Section II upper bounds on the mean-square estimation error for f̂_n(x; p) were established. These results have the same form as those given in the literature [2], [4], [8], [9], [13] for the case of estimating a one-dimensional density from i.i.d. observations.

In this section we establish precise (rather than bounds) asymptotic expressions for the variance of the density estimators. This goal does not appear to be feasible when the tail of the noise characteristic function φ_h(t) decays exponentially fast as ||t|| → ∞. We therefore consider the case of algebraic decay only. In order to reduce the complexity of the derivations, we limit ourselves to i.i.d. observation errors {ε_i}. We thus have Y_i = X_i + ε_i, where the process {X_i}_{i=-∞}^{∞} is stationary with joint probability densities f(x; p), p = 1, 2, ..., as in the Introduction. The process {ε_i} consists of i.i.d. random variables, independent of {X_i}, with probability density h(x) and characteristic function ψ_h(t). Let K(x) be a real, even, bounded density on the real line satisfying K(x) = O(|x|^{-1-δ}) for some δ > 0 and let ψ_K(t) be its characteristic function.

The joint density f(x; p) is estimated by

f̂_n(x; p) = (1/((n-p) b_n^p)) Σ_{j=1}^{n-p} Π_{i=1}^{p} W_{b_n}((x_i - Y_{j+i-1})/b_n),   (3.1)

where b_n is the bandwidth parameter, b_n → 0 as n → ∞.

The complexity in establishing precise asymptotic expressions for the variance of f̂_n(x; p) is due to two factors: 1) the need to establish an approximation of the identity result involving the kernel W_b²(x), similar to that of the classical Lemma 1, where the kernel Q(x) does not depend on b; and 2) the need to have W_b(x) ∈ L_1 in order to handle the case of dependent data (see the proof of Theorem 3). This second factor requires us to impose smoothness conditions on ψ_h(t) and ψ_K(t) that can be eliminated in the i.i.d. case (with p = 1). We next present, via a series of lemmas, an approximation of the identity for W_b²(x) (Lemma 4) as well as a bound on the L_1-norm of W_b(x) (Lemma 3) for the case where the noise characteristic function ψ_h(t) decays algebraically as |t| → ∞. Special examples would include Gamma and double-exponential probability densities for h(x), but exclude the Gaussian case. Now set

F_b(t) = ψ_K(t)/ψ_h(t/b);   (3.2)

then

W_b(x) = (1/2π) ∫_{-∞}^{∞} e^{-itx} F_b(t) dt.   (3.3)

If F_b(t) ∈ L_2, as was assumed in Section II, then W_b(x) ∈ L_2 and its L_2-norm can be found as in Proposition 2. If, on the other hand, F_b(t) ∈ L_1 and is twice continuously differentiable with F_b''(t) ∈ L_1, then by integration by parts one has

W_b(x) = -(1/(2π x²)) ∫_{-∞}^{∞} e^{-itx} F_b''(t) dt,   (3.4)

and by the Riemann-Lebesgue lemma, W_b(x) = o(x^{-2}), so that W_b(x) ∈ L_1. Under such smoothness conditions on ψ_h(t) and ψ_K(t) we will show that in fact

b^β ∫_{-∞}^{∞} |F_b''(t)| dt ≤ E_1 < ∞,   (3.5)

for some β > 0, where the constant E_1 does not depend on b, from which one obtains a bound for W_b(x) and for its L_1-norm ||W_b||_1. The following lemma, which is a special case of Proposition 1, provides the asymptotic behavior of ||W_b||_2.


Lemma 2: Assume that ψ_h(t) satisfies

a) |ψ_h(t)| > 0, for all t ∈ R,   (3.6a)

b) |t|^β |ψ_h(t)| → |B_1| as |t| → ∞, for some β > 0.   (3.6b)

Assume that ψ_K(t) satisfies

c) D_1 = (1/(2π |B_1|²)) ∫_{-∞}^{∞} |t|^{2β} |ψ_K(t)|² dt < ∞.   (3.6c)

Then

1) b^{2β} ∫_{-∞}^{∞} |W_b(x)|² dx → D_1, as b → 0.

2) If, in addition,

d) (1/(2π |B_1|)) ∫_{-∞}^{∞} |t|^β |ψ_K(t)| dt < ∞,   (3.6d)

then for ν > 2 we have

limsup_{b→0} b^{pνβ} ∫_{R^p} |W_b(x)|^ν dx ≤ Const.,

where W_b(x) = Π_{i=1}^{p} W_b(x_i).

Note that under Assumptions a)-c) of Lemma 2 we have F_b(t) ∈ L_2. Next we impose smoothness conditions on ψ_h(t) and ψ_K(t) that ensure that (3.5) is satisfied and thus provide a bound on the L_1-norms ||W_b||_{L_1(R)} and ||W_b||_{L_1(R^p)}. The proof of Lemma 3 is given in the Appendix.

Lemma 3: Assume that ψ_h(t) and ψ_K(t) are twice continuously differentiable with bounded derivatives such that

a) |ψ_h(t)| > 0, for all t ∈ R,

b) t^β ψ_h(t) → B_1 as t → ∞, for some β ≥ 1 and |B_1| > 0,

c) δ_{p,1} ∫_{-∞}^{∞} |u|^{β-2} |ψ_K(u)| du < ∞,  ∫_{-∞}^{∞} |u|^{β-1} |ψ_K'(u)| du < ∞,  ∫_{-∞}^{∞} |u|^β |ψ_K''(u)| du < ∞,

where δ_{p,1} is the Kronecker delta. Then b^β x² |W_b(x)| ≤ E_2 < ∞, where E_2 is a constant independent of b. Thus b^β ||W_b||_{L_1(R)} ≤ Const. and b^{pβ} ||W_b||_{L_1(R^p)} ≤ Const.

We now state an approximation of the identity result involving the kernel W_b²(x). Its proof is given in the Appendix.

Lemma 4: Assume that ψ_h(t) and ψ_K(t) satisfy the assumptions of Lemma 3 and Conditions b) and c) of Lemma 2. Set

W̃_b(x) = Π_{i=1}^{p} [b^{2β} W_b²(x_i)]

and let g ∈ L_∞(R^p). Then

lim_{b→0} (1/b^p) ∫_{R^p} W̃_b((x - u)/b) g(u) du = (D_1)^p g(x)

at all points x of continuity of g; the constant D_1 is given by (3.6c).

We now establish precise asymptotic results for the variance of the estimator f̂_n(x; p). In addition to mixing processes, we also consider the following dependence index introduced in [3] for p = 1. Assume that the vector random variables X_0 = (X_1, ..., X_p) and X_j = (X_{j+1}, ..., X_{j+p}) have a joint probability density f(u, v; 2p, j) of order 2p for j ≥ p, and let f(u; p) be the probability density of X_0. Define the dependence index ρ_{p,n} of the process {X_i} by

ρ_{p,n} = sup_{u,v} (1/n) Σ_{j=p}^{n} |f(u, v; 2p, j) - f(u; p) f(v; p)|.   (3.7)

If the random vectors X_0 and X_j become asymptotically independent as j → ∞, we expect ρ_{p,n} to tend to zero as n → ∞. Unlike mixing conditions, ρ_{p,n} depends only on the 2pth-order densities of the process {X_i}. Also note that the corresponding dependence index of the process {Y_i},

ρ^{(Y)}_{p,n} = sup_{u,v} (1/n) Σ_{j=p}^{n} |g(u, v; 2p, j) - g(u; p) g(v; p)|,

satisfies

ρ^{(Y)}_{p,n} ≤ ρ_{p,n},   (3.8)

since the ε_i's are i.i.d. with probability density h(x) and hence

g(x, y; 2p, j) - g(x; p) g(y; p) = ∫_{R^{2p}} [f(x - u, y - v; 2p, j) - f(x - u; p) f(y - v; p)] Π_{i=1}^{p} [h(u_i) h(v_i)] du dv.   (3.9)

We now state and prove the principal result of this section.

Theorem 3: Assume that ψ_h(t) and ψ_K(t) satisfy the assumptions of Lemma 3 and Conditions b) and c) of Lemma 2. Assume that the density g(y; q) of the vector (Y_1, ..., Y_q) exists and is bounded by a constant (M_1) for all 1 ≤ q ≤ 2p. Furthermore, the 2p-dimensional probability density g(x, y; 2p, j) of the vectors Y_0, Y_j, j ≥ p, exists.

a) If

sup_{x, y, j ≥ p} |g(x, y; 2p, j) - g(x; p) g(y; p)| ≤ M_2 < ∞   (3.10)

and {Y_i} is a ρ-mixing process with Σ_{k=1}^{∞} ρ_k < ∞, or

b) if (3.6d) and (3.10) are satisfied and {Y_i} is a strongly mixing process with

Σ_{k=1}^{∞} k^r (α_k)^{1-2/ν} < ∞, for some ν > 2 and r > 1 - 2/ν.   (3.11)


and, by Lemma 4,

(1/b_n^p) ∫_{R^p} W̃_{b_n}((x - u)/b_n) g(u; p) du → (D_1)^p g(x; p).

Hence

n b_n^{(2β+1)p} S_3 ≤ [Σ_{l=c_n+1}^{∞} ρ_l] (D_1)^p g(x; p) (1 + o(1)),

and, since {ρ_l} is summable and c_n → ∞, we have

limsup_{n→∞} n b_n^{(2β+1)p} S_3 = 0.   (3.19)

Part a) of the theorem now follows from (3.13), (3.15)-(3.19).

b) For strongly mixing processes we use the bound (2.12) for ν > 2 to obtain, for c_n + 1 ≤ l ≤ n - p - 1,

and, by Part 2) of Lemma 2, we have J_3 ≤ [Const./b_n^{pβν}] (1 + o(1)). Hence, by (3.20),

n b_n^{(2β+1)p} S_3 ≤ (Const./b_n^{(1-2/ν)p}) Σ_{k=c_n-p}^{∞} (α_k)^{1-2/ν} (1 + o(1)).

This can be bounded from above by

(Const./(b_n^{(1-2/ν)p} (c_n)^r)) Σ_{k=c_n-p}^{∞} k^r (α_k)^{1-2/ν} (1 + o(1)).

Now choose (c_n)^{-1} = b_n^{(1-2/ν)p/r}; then for r > (1 - 2/ν) we have c_n b_n^p → 0 as required. Then

n b_n^{(2β+1)p} S_3 ≤ Const. Σ_{k=c_n-p}^{∞} k^r (α_k)^{1-2/ν} (1 + o(1)) → 0, as n → ∞,   (3.21)

by the summability of {k^r (α_k)^{1-2/ν}} and c_n → ∞ as n → ∞. Part b) of the theorem now follows from (3.13), (3.15)-(3.18), and (3.21).


c) From (3.13) and (3.15) we have a precise asymptotic expression for I_{n,0}. From (3.17) we know that the term S_1 is asymptotically negligible. It remains to show that

n b_n^{(2β+1)p} (S_2 + S_3) → 0, as n → ∞.

Now

S_2 + S_3 = (1/((n-p) b_n^{2p})) Σ_{l=1}^{n-p-1} (n - p - l) I_{n,l},

and by (3.8) we have

|S_2 + S_3| ≤ Const. ρ_{p,n} ||W_{b_n}||²_{L_1(R^p)}.

By Lemma 3, ||W_{b_n}||_{L_1(R^p)} ≤ Const./b_n^{pβ}, so that

S_2 + S_3 = O(ρ_{p,n}/b_n^{2pβ}) = o(1/(n b_n^{(2β+1)p}))

by assumption on ρ_{p,n}. □

We now discuss the implications of Theorem 3. We first note that for strongly mixing processes the convergence rate for the variance of f̂_n(x; p) is exactly 1/(n b_n^{(2β+1)p}), which is considerably faster than the rate of 1/(n b_n^{2p(β+1-1/ν)}) given by the bound in Corollary 1. Note, however, that the condition imposed on the strong mixing coefficient α_k in Theorem 3, Σ_{k=1}^{∞} k^r (α_k)^{1-2/ν} < ∞ for some ν > 2 and r > (1 - 2/ν), is more stringent than the corresponding condition Σ_{k=1}^{∞} (α_k)^{1-2/ν} < ∞ for some ν > 2, needed for Corollary 1.

When f < x ; p > is twice continuously differentiable, then by Theorem 1 (Part b)) and Theorem 3, we have a precise asymptotic expression for the mean-square estimation error,

$$ E\big[\hat f_n(x;p) - f(x;p)\big]^2 \sim \mathrm{Const.}_1\, b_n^4 + \frac{\mathrm{Const.}_2}{n\, b_n^{(2\beta+1)p}}, \qquad \text{as } n \to \infty. $$



The asymptotically optimal bandwidth parameter is then $b_n \sim \mathrm{Const.}\, n^{-1/[4+(2\beta+1)p]}$, under which we have

$$ \lim_{n \to \infty} n^{4/[4+(2\beta+1)p]}\, E\big[\hat f_n(x;p) - f(x;p)\big]^2 = \mathrm{Const.} $$
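This bandwidth choice can be recovered by balancing the squared-bias term of order $b_n^4$ against the variance term of order $1/(n b_n^{(2\beta+1)p})$:

```latex
b_n^4 \asymp \frac{1}{n\, b_n^{(2\beta+1)p}}
\iff b_n^{\,4+(2\beta+1)p} \asymp n^{-1}
\iff b_n \asymp n^{-1/[4+(2\beta+1)p]} ,
```

and substituting this $b_n$ back into either term yields the mean-square error rate $n^{-4/[4+(2\beta+1)p]}$.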

Thus the precise rate of convergence of the mean-square error is $n^{-4/[4+(2\beta+1)p]}$, valid for $\rho$-mixing and strongly mixing processes as well as for processes with multivariate dependence index satisfying the conditions stated in Theorem 3. Note that in the absence of the noise process, the corresponding rate is known to be $n^{-4/(4+p)}$. The effect of the noise is then quite clear, and it depends on the parameter $\beta$ that describes the behavior of the tail of the characteristic function, $\phi_h(t) \sim B/t^{\beta}$ as $t \to \infty$, of the noise process.
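For instance (an illustration of ours, not an example from the paper: the Laplace-type noise and the choice $p = 1$ are assumptions for concreteness), a standard Laplace noise density has characteristic function $1/(1+t^2) \sim t^{-2}$, so $\beta = 2$, and the rate exponents compare as follows:

```python
from fractions import Fraction

def mse_rate_exponent(beta, p):
    # r in the deconvolution rate n^(-r), with r = 4 / (4 + (2*beta+1)*p)
    return Fraction(4, 4 + (2 * beta + 1) * p)

def noiseless_rate_exponent(p):
    # r in the noise-free kernel rate n^(-r), with r = 4 / (4 + p)
    return Fraction(4, 4 + p)

# Laplace noise: characteristic function (1 + t^2)^(-1) decays like t^(-2), so beta = 2
print(mse_rate_exponent(beta=2, p=1))   # 4/9
print(noiseless_rate_exponent(p=1))     # 4/5
```

The degradation from $n^{-4/5}$ to $n^{-4/9}$ for this mildly smooth noise shows how the tail exponent $\beta$ governs the price of deconvolution.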

We finally remark that results similar to Theorem 3 for the variance of $\hat f_n(x;p)$ can be obtained when the noise process $\{\epsilon_i\}$ is not necessarily an i.i.d. sequence and the kernel $K(x)$ is not necessarily of product type: the corresponding result to Lemma 2 has already been given in Proposition 1. All that is needed is a multivariate version of Lemma 3, which can be established by imposing a smoothness condition on the multidimensional function $F_b(t) = \phi_K(t)/\phi_h(t/b)$ and proceeding as in the proof of Lemma 3. The proof of Lemma 4 remains essentially the same.

APPENDIX

Proof of Proposition 1: By Parseval's theorem we have

$$ b^{2\beta(p)}\, \|W_b\|_2^2 = \Big(\frac{1}{2\pi}\Big)^{p} \int_{\mathbb{R}^p} \Psi_b(t)\, dt. \tag{A.1} $$

By Condition b) we have

Moreover, by Condition b) there exists a large (but fixed) M > 0 such that

where $1_A$ is the indicator function of the set $A$. For any $\epsilon > 0$ and all $b < \epsilon/M$, we have

$$ |\Psi_b(t)| \le C(M)\,\epsilon^{2\beta(p)}\,1_{[\|t\| \le \epsilon]} + (2/B_p)^2\,\|t\|^{2\beta(p)}\,|\phi_K(t)|^2 \equiv A(t) \tag{A.3} $$

and note that $A(t)$ is integrable in view of Condition c). It then follows by (A.1)-(A.3) and dominated convergence that $\lim_{b \to 0} b^{2\beta(p)}\|W_b\|_2^2 = A_4$, and Part 1) follows. For Part 2) we note that for $\nu > 2$

$$ \|W_b\|_\nu \le \|W_b\|_\infty^{1-2/\nu}\, \|W_b\|_2^{2/\nu} \tag{A.4} $$

and we need to find a bound on $\|W_b\|_\infty$. Now

and proceeding as in Part 1) we find that $\limsup_{b \to 0} b^{\beta(p)}\,\|W_b\|_\infty \le A_5$. Thus, by (A.4),

$$ \limsup_{b \to 0}\, b^{\beta(p)}\, \|W_b\|_\nu \le (A_5)^{1-2/\nu}\,(A_4)^{1/\nu}. \qquad \square $$
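Inequality (A.4) is the standard interpolation bound between the $L_2$ and $L_\infty$ norms; a quick numerical sanity check on a discretized test function (a sketch of ours: the grid, the test function, and the value of $\nu$ are arbitrary choices, not taken from the paper):

```python
import numpy as np

# Riemann-sum approximations of the L^nu, L^2, and L^inf norms of a test function
x = np.linspace(-5.0, 5.0, 2001)
dx = x[1] - x[0]
W = np.exp(-x**2) * np.cos(3.0 * x)   # arbitrary smooth, bounded test function

nu = 4.0
norm_nu  = (np.sum(np.abs(W)**nu) * dx) ** (1.0 / nu)
norm_2   = np.sqrt(np.sum(W**2) * dx)
norm_inf = np.max(np.abs(W))

# interpolation inequality (A.4): ||W||_nu <= ||W||_inf^(1-2/nu) * ||W||_2^(2/nu)
assert norm_nu <= norm_inf**(1 - 2.0/nu) * norm_2**(2.0/nu) + 1e-12
```

The inequality holds exactly for the discretized sums as well, since $\sum |W|^{\nu}\,dx = \sum |W|^{\nu-2}|W|^2\,dx \le (\max|W|)^{\nu-2} \sum |W|^2\,dx$.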

Proof of Proposition 3: By Parseval’s theorem we have

By Condition a) we then have

$$ I_1 \le \frac{1}{(2\pi)^p\, \min_{\|u\| \le M} |\phi_h(u)|^2} \int_{\|t\| \le bM} |\phi_K(t)|^2\, dt = O(b^p). \tag{A.5} $$

Next by Condition b) we have

with

$$ V_p(d) = \int_{\|u\| \le d} du = \frac{2\,\pi^{p/2}\, d^p}{p\,\Gamma(p/2)}. \tag{A.7} $$
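The closed form $2\pi^{p/2} d^p/(p\,\Gamma(p/2))$ for the volume of the $p$-ball agrees with the familiar low-dimensional cases; a quick check using only the standard library:

```python
import math

def V_p(p, d):
    # volume of the p-dimensional ball of radius d: 2 * pi^(p/2) * d^p / (p * Gamma(p/2))
    return 2.0 * math.pi**(p / 2.0) * d**p / (p * math.gamma(p / 2.0))

assert abs(V_p(1, 1.0) - 2.0) < 1e-12                  # length of [-1, 1]
assert abs(V_p(2, 1.0) - math.pi) < 1e-12              # area of the unit disk
assert abs(V_p(3, 2.0) - (4.0/3.0)*math.pi*8) < 1e-9   # volume of the radius-2 ball
```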

It follows that $I_1$ is asymptotically negligible compared to the bound (A.6) on $I_2$. The bound on the norm $\|W_b\|_2^2$ follows. Now



and by the Cauchy-Schwarz inequality, together with (A.4), the bound on the norm $\|W_b\|_\nu$ for $\nu > 2$ follows. □

Proof of Lemma 3: By integration by parts we have

where $M$ is large but fixed. Since $(\phi_h(t))^{(j)}$ and $(\phi_K(t))^{(j)}$ are bounded for $j = 0,1,2$ and $\min_{|u| \le M} |\phi_h(u)| > 0$, we have

for $\beta \ge 1$. Next by Condition b) we obtain

where $c_1 = 1$, $c_2 = 2\beta$, $c_3 = \beta(\beta - 1)$. Hence $(b^{\beta}/2\pi)\int_{-\infty}^{\infty} |F_b(t)|\, dt \le E_1 < \infty$, where $E_1$ is a constant independent of $b$. The result follows from (3.4). □

Proof of Lemma 4: By Lemma 3 (Part 1)), we have $\|\bar F_b\|_{L_1(\mathbb{R}^p)} \to O_f$ as $b \to 0$. Consider the difference

$$ \frac{1}{b^p} \int_{\mathbb{R}^p} \bar F_b(u/b)\, \big[ g(x-u) - g(x) \big]\, du = \int_{\|u\| \le \delta} + \int_{\|u\| > \delta} $$

for every $\delta > 0$. If $g$ is continuous at $x$ then, given $\epsilon > 0$, there exists a $\delta = \delta(x, \epsilon)$ such that $|g(x-u) - g(x)| < \epsilon$ for $\|u\| \le \delta$, so that

$$ \int_{\|u\| \le \delta} \le \epsilon\, \frac{1}{b^p} \int_{\mathbb{R}^p} \bar F_b(u/b)\, du = \epsilon\, O_f (1 + o(1)). $$

Since $|g| \le C$, say, we have

$$ \int_{\|u\| > \delta} \le 2C\, \frac{1}{b^p} \int_{\|u\| > \delta} \bar F_b(u/b)\, du \to 0, \qquad \text{as } b \to 0. \qquad \square $$

REFERENCES

[1] S. Bochner, Lectures on Fourier Integrals. Princeton, NJ: Princeton Univ. Press, 1959.
[2] R. J. Carroll and P. Hall, "Optimal rates of convergence for deconvolving a density," J. Amer. Statist. Assoc., vol. 83, pp. 1184-1186, 1988.
[3] J. V. Castellana and M. R. Leadbetter, "On smoothed probability density estimation for stationary processes," J. Stochastic Processes Appl., vol. 21, pp. 179-193, 1986.
[4] J. Fan, "On the optimal rates of convergence for nonparametric deconvolution problem," Ann. Statist., vol. 19, 1991, to appear.
[5] J. Fan, "Asymptotic normality for deconvolving kernel density estimators," Sankhya, ser. A, vol. 53, pt. 2, 1990.
[6] P. Hall and C. C. Heyde, Martingale Limit Theory and its Applications. New York: Academic, 1980.
[7] A. N. Kolmogorov and Yu. A. Rozanov, "On strong mixing conditions for stationary Gaussian processes," Theory Prob. Appl., vol. 5, pp. 204-207, 1960.
[8] M. C. Liu and R. L. Taylor, "A consistent nonparametric density estimator for the deconvolution problem," Canad. J. Statist., vol. 17, pp. 399-410, 1989.
[9] E. Masry and J. Rice, "Gaussian deconvolution via differentiation," Canad. J. Statist., to appear.
[10] J. Mendelsohn and J. Rice, "Deconvolution of microfluorometric histograms with B-splines," J. Amer. Statist. Assoc., vol. 77, pp. 748-753, 1982.
[11] M. Rosenblatt, "A central limit theorem and a strong mixing condition," Proc. Nat. Acad. Sci., vol. 42, pp. 43-47, 1956.
[12] D. L. Snyder, M. I. Miller, and T. J. Schultz, "Constrained probability density estimation from noisy data," in Proc. 1988 Conf. Inform. Sci. Syst., 1988, pp. 170-172.
[13] L. A. Stefanski and R. J. Carroll, "Deconvoluting kernel density estimators," Statistics, vol. 21, pp. 169-184, 1990.
[14] R. L. Wheeden and A. Zygmund, Measure and Integral. New York: Marcel Dekker, 1977.
[15] G. L. Wise, A. P. Traganitis, and J. B. Thomas, "The estimation of a probability density function from measurements corrupted by Poisson noise," IEEE Trans. Inform. Theory, vol. IT-23, no. 6, pp. 764-766, Nov. 1977.