THE DELTA METHOD
FOR NONPARAMETRIC KERNEL FUNCTIONALS
Yacine Aït-Sahalia 1
Graduate School of Business
University of Chicago
1101 E 58th St
Chicago, IL 60637
January 1992. Revised, August 1994
1 This paper is part of my Ph.D. dissertation at the MIT Department of Economics. I am very indebted to
Richard Dudley, Jerry Hausman and Whitney Newey for helpful suggestions and advice. The comments of
Peter Robinson and three anonymous referees have considerably improved this paper. I am also grateful to
seminar participants at Berkeley, British Columbia, Carnegie-Mellon, Chicago, Columbia, Harvard,
Northwestern, Michigan, MIT, Princeton, Stanford, Washington University, Wharton, Yale and the Yale-
NSF Conference on Asymptotics for Infinite Dimensional Parameters as well as Emmanuel Guerre, Pascal
Massart and Jens Praestgaard for stimulating conversations. Part of this research was conducted while I was
visiting CREST-INSEE in Paris whose hospitality is gratefully acknowledged. Financial support from
Ecole Polytechnique and MIT Fellowships is also gratefully acknowledged. All errors are mine.
The Delta Method for Nonparametric Kernel Functionals
by Yacine Aït-Sahalia
Abstract
This paper provides under weak conditions a generalized delta method for
functionals of nonparametric kernel estimators, based on (possibly) dependent and
(possibly) multivariate data. Generalized derivatives are allowed to permit the inclusion of
virtually any functional, global or pointwise, explicitly or implicitly defined. It is shown
that forming the estimator with dependent data modifies the asymptotic distribution only if
the functional is more irregular than some threshold level. Variance estimators and rates of
convergence are derived. Many examples are provided.
Keywords: Delta Method, Nonparametric Kernel Estimation, Dependent Data, Rate of
Convergence, Functional Differentiation, Generalized Functions.
1. Introduction
The delta method is a simple and widely used tool to derive the asymptotic
distribution of nonlinear functionals of an estimator. Most parametric estimators converge
at the familiar root-n rate, and so do functionals of them. When the estimator is
nonparametric, however, some functionals will converge at a rate slower than root-n while
others will retain the root-n rate. The slower-than-root-n functionals require some form of
smoothing to be estimated, the most popular being the kernel method. A delta method has
long been available for the class of root-n functionals that can be estimated without
smoothing (von Mises [1947], Reeds [1976], Huber [1981], Dudley [1990]).
In all cases, the essence of the delta method is a first order Taylor expansion of the
functional. The problem is that the slower-than-root-n functionals are not differentiable in
the usual sense. Therefore the examples of slower-than-root-n functionals studied in the
literature have been tackled without the systematic "plug-and-play" feature that made the
delta method attractive in the settings where it was available. This paper proposes a simple
delta method that also covers slower-than-root-n functionals, under conditions that
match and often relax those used in the previous "case-by-case" work. To address the
problem of non-differentiability, the paper allows generalized functions as functional
derivatives. An example of a generalized function is the Dirac delta function, and its
derivatives. With generalized functions, the familiar delta method approach based on
differentiating the functional is shown to be easily implemented for non-trivial examples.
The results are valid even when the data are serially correlated, with independent data as a
special case.
The main contribution of the paper is to show how to systematically linearize
slower-than-root-n functionals, and then to provide a general yet simple result yielding
their asymptotic distribution. The purpose of the examples is two-fold: first, some classical
examples (regression function, etc.) are included to show how the method of this paper
significantly beats the previous approaches; second, new distributions are derived for cases
where they were not previously available (dependent censored least absolute deviation,
quantiles, mode, stochastic differential equations, etc.). The paper is organized as follows:
Section 2 derives the generalized delta method. Consistent estimators of the asymptotic
variances are proposed in Section 3. Section 4 discusses the rates of convergence of the
estimators. Section 5 illustrates the application of the result through many examples.
Section 6 concludes. Proofs are in the Appendix.
2. The Delta Method with Generalized Derivatives
2.1 Assumptions
Consider $R^d$-valued random variables $X_1, X_2, \dots, X_n$ identically distributed as f(.),
an unknown density function with associated cumulative distribution function $F(x) \equiv \int_{-\infty}^{x} f(t)\,dt$
where $x \equiv (x_1, x_2, \dots, x_d)$. The following regularity conditions are imposed:
Assumption A1: The sequence $\{X_i\}$ is a strictly stationary β-mixing sequence satisfying:
$k^{\delta}\,\beta_k \to 0$ as $k \to \infty$, for some fixed $\delta > 1$.
$\beta_k = 0$ for all $k \ge 1$ corresponds to the independence case. As long as $\beta_k \to 0$ as $k \to \infty$,
the sequence is said to be absolutely regular.
Assumption A2: The density function f(.) is continuously differentiable on Rd up to
order s. Its successive derivatives are bounded and in $L^2(R^d)$.
Let Cs be the space of density functions satisfying A2. To estimate the density
function f(.), a Parzen-Rosenblatt kernel function K(.) will be used. The kernel will be
required to satisfy:
-
3
Assumption A3: (i) K is an even function integrating to one;
(ii) The kernel is of order r = s, an even integer:

1) $\forall \lambda \in \mathbb{N}^d$ with $|\lambda| \equiv \lambda_1 + \cdots + \lambda_d \in \{1, \dots, r-1\}$: $\int_{-\infty}^{+\infty} x_1^{\lambda_1} \cdots x_d^{\lambda_d}\, K(x)\, dx = 0$;

2) $\exists \lambda \in \mathbb{N}^d$ with $|\lambda| = r$ such that $\int_{-\infty}^{+\infty} x_1^{\lambda_1} \cdots x_d^{\lambda_d}\, K(x)\, dx \neq 0$;

3) $\int_{-\infty}^{+\infty} |x|^r\, |K(x)|\, dx < +\infty$.

(iii) K is continuously differentiable up to order s+d on $R^d$, and its
derivatives of order up to s are in $L^2(R^d)$.
The last assumption indicates how the bandwidth hn in the kernel density estimator
should be chosen. The statement of the assumption depends upon an exponent parameter
e>0 and an integer m, 0≤m
where $\Phi^{(1)}[G](\cdot)$ is a continuous linear (in H) functional and $\|\cdot\|_{L(2,m)}$ is the sum of the $L^2$
norms of all the derivatives of H up to order m. If this holds uniformly in H in any compact
subset K of $C^s$, with the remainder satisfying $\left|R_\Phi[G,H]\right| \le C(K)\,\|H\|_{L(2,m)}^2$, then Φ is said to be L(2,m)-
Hadamard-differentiable at F. In what follows it will always implicitly be assumed that the
linear term $\Phi^{(1)}[F](\cdot)$ is not degenerate. If it were, then the asymptotic distribution would be
given by a term of higher order in the Taylor expansion.
By the Riesz Representation Theorem (see e.g., Schwartz [1966]), there exists a
distribution $\varphi[F]: R^d \to R$ such that $\Phi^{(1)}[F](H) = \int_{-\infty}^{+\infty} \varphi[F](x)\, dH(x)$. Call $\varphi[F](\cdot)$ the
functional derivative 2 of Φ at F. The standard delta method is applicable only if $\varphi[F](\cdot)$ is a
regular function, i.e., at least cadlag (right-continuous with left limits). For some functionals Φ,
the functional derivative will indeed be a regular function. For example, let
$\Phi[F] \equiv \int_{-\infty}^{+\infty} f(x)^2\, dx$. Then:
$$\Phi[F+H] = \int_{-\infty}^{+\infty} \left\{f(x) + h(x)\right\}^2 dx = \int_{-\infty}^{+\infty} f(x)^2\, dx + 2\int_{-\infty}^{+\infty} f(x)\,h(x)\, dx + \int_{-\infty}^{+\infty} h(x)^2\, dx$$
so $R_\Phi[F,H] = \int_{-\infty}^{+\infty} h(x)^2\, dx$ and $\Phi^{(1)}[F](H) = \int_{-\infty}^{+\infty} \varphi[F](x)\, dH(x) = \int_{-\infty}^{+\infty} 2\,f(x)\, h(x)\, dx$. Thus its
functional derivative is $\varphi[F](\cdot) = 2 f(\cdot)$, a function in $C^s$.
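This textbook functional is also convenient for a quick numerical check of the plug-in idea (a sketch, not from the paper: the Gaussian kernel, Silverman bandwidth, and standard normal data are my choices; the target value is $\int \phi^2 = 1/(2\sqrt{\pi}) \approx 0.2821$):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)              # sample from f = standard normal
h = 1.06 * x.std() * len(x) ** (-1 / 5)    # Silverman rule-of-thumb bandwidth

# Parzen-Rosenblatt kernel density estimate on a grid (Gaussian kernel)
grid = np.linspace(-5.0, 5.0, 801)
u = (grid[:, None] - x[None, :]) / h
f_hat = np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

# Plug-in estimate of Phi[F] = integral of f(x)^2 dx (Riemann sum)
phi_hat = (f_hat**2).sum() * (grid[1] - grid[0])
print(phi_hat)   # close to 1/(2*sqrt(pi)) ~ 0.28, up to smoothing bias
```

The small downward deviation from 0.2821 is the smoothing bias: the expected estimate integrates the square of the convolution f * K_h rather than of f itself.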
Unfortunately, many functionals of interest in econometrics do not have "regular"
functional derivatives, that is ϕ F[ ] ⋅( ) will not be a cadlag function. Instead, it will be a
2 Although $\varphi[F]$ is not unique, the respective asymptotic distributions given by the Theorem are independent
of the choice of $\varphi[F]$. One way to make the representation unique would be to impose that
$\int_{-\infty}^{+\infty} \varphi[F](x)\, dF(x) = 0$. Any $\varphi[F]$ given by the Riesz Theorem will be called "the" derivative, even though
"a" derivative would be more appropriate.
generalized function. The existing delta method cannot treat such functionals. The main
point of the paper is that the same familiar delta approach will work provided that one
includes generalized functions as functional derivatives. This method turns out to be very
simple as well as powerful. Many examples will be provided below. To get the flavor of
the result immediately, consider the hazard rate function $\Phi[F] \equiv \frac{f(y)}{1-F(y)}$ evaluated at some
y. It will be shown below that it is differentiable in the extended sense of this paper, with
functional derivative:
$$\varphi[F](x) = \frac{\delta_y(x)}{1-F(y)} - \frac{F^{(1)}(y)}{\left(1-F(y)\right)^2}\,\mathbf{1}\left(x \ge y\right).$$
This functional derivative is a linear combination of a Dirac mass at y (a generalized function) and an
indicator function (a regular function). The asymptotic distribution of the kernel estimator
of the hazard rate will be driven by the Dirac term in the linear expansion: only the most
unsmooth term counts.
2.3. Generalized Functions
The concept of generalized function, or distribution, was formally introduced by
Schwartz [1954,1966]. Simply put, any function g, no matter how
unsmooth, can be differentiated. Its derivative $g^{(1)}$ is defined by its cross-products against
smooth functions f:
$$\int_{-\infty}^{+\infty} g^{(1)}(x)\, f(x)\, dx \equiv -\int_{-\infty}^{+\infty} g(x)\, f^{(1)}(x)\, dx$$
in the univariate case, where $f^{(1)}$ is a standard derivative. Of course, by integration by parts, this reduces to the
common definition of differentiability if g turns out to be a regular function.
For example, a Dirac function at 0 is defined by $\int_{-\infty}^{+\infty} \delta_0(x)\, f(x)\, dx = f(0)$, and its
derivative is given by $\int_{-\infty}^{+\infty} \delta_0^{(1)}(x)\, f(x)\, dx = -\int_{-\infty}^{+\infty} \delta_0(x)\, f^{(1)}(x)\, dx = -f^{(1)}(0)$. Successive
differentiation of the Dirac function is possible up to the number of derivatives that the
functions f admit, to yield $\int_{-\infty}^{+\infty} \delta_0^{(q)}(x)\, f(x)\, dx = (-1)^q \int_{-\infty}^{+\infty} \delta_0(x)\, f^{(q)}(x)\, dx = (-1)^q f^{(q)}(0)$.
Besides Dirac functions, many other generalized functions can be constructed; see
Schwartz [1954,1966] or Zemanian [1965] for examples.
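These defining identities can be verified numerically by standing in a narrow Gaussian bump for the Dirac mass (a sketch; the test function f(x) = e^x and the width σ = 0.05 are arbitrary choices, and f'(0) = 1):

```python
import numpy as np

sigma = 0.05                         # width of the smooth stand-in for the Dirac mass
x = np.linspace(-1.0, 1.0, 200001)
dx = x[1] - x[0]

delta = np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
delta_prime = -(x / sigma**2) * delta    # exact derivative of the Gaussian bump

f = np.exp(x)                        # smooth test function, f'(0) = 1

lhs = (delta_prime * f).sum() * dx               # integral of delta0'(x) f(x) dx
rhs = -(delta * np.gradient(f, x)).sum() * dx    # minus integral of delta0(x) f'(x) dx
print(lhs, rhs)   # both approximately -f'(0) = -1
```

Both sides agree with each other and with the exact value up to the small smoothing error of order σ².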
This paper allows functional derivatives $\varphi[F](\cdot)$ to be generalized functions. I will
define an increasing sequence of spaces of generalized functions. Each space will contain
functions of a given "level of unsmoothness." It will first be shown that the asymptotic
distribution of the plug-in depends on the particular space containing $\varphi[F](\cdot)$, and that the
more unsmooth $\varphi[F](\cdot)$ is, the slower the rate of convergence. Furthermore, when $\varphi[F](\cdot)$ is
more unsmooth than cadlag (i.e., when it is a generalized function instead of a regular
function), it will be shown that constructing the estimator on correlated data does not affect
its asymptotic variance.

Start by defining the space $C^{-1}$ of bounded cadlag functions from $[0,1]^d$ to R. $C^{-1}$
contains all the usual spaces $C^0$, $C^1$, etc., of continuous, continuously once-differentiable
functions, etc. The regular functions are the elements of $C^{-1}$. Now define $C^{-2}$ to be the
space of linear combinations of Dirac functions and functions of $C^{-1}$, $C^{-3}$ to be the space of
linear combinations of derivatives of Dirac functions and functions of $C^{-2}$, etc. When
$\varphi[F](\cdot)$ belongs to the generalized function space $C^{-q}$, $q \ge 2$, but not to the space
immediately smaller, $C^{-q+1}$, write $\varphi[F] \in C^{-q} \setminus C^{-q+1}$. q can readily be interpreted as an
"order of unsmoothness" of $\varphi[F](\cdot)$. Moving up the following scale, the functions become
more unsmooth, and conversely:
[Figure: the scale of spaces $\dots, C^1, C^0, C^{-1}$ (regular functions), $C^{-2}, C^{-3}, \dots$ (generalized functions); differentiation moves down the scale toward the generalized functions, integration moves back up toward the regular functions.]
2.4. The Generalized Delta Method
The result is stated in dimension d = 1. An extension to the multivariate case is
provided in the Appendix. $\varphi[F] \in C^{-q} \setminus C^{-q+1}$ has the form:
$$\varphi[F](x) = \sum_{l=1}^{L} \alpha_l[F](x)\, \delta_{y_l}^{(q-2)}(x) + B[F](x)$$
where each $y_l$ is a fixed point, $\alpha_l[F](\cdot) \in C^{-1}$
and $B[F](\cdot) \in C^{-q+1}$ (see Section 5 for examples).
The following delta method characterizes the asymptotic distribution of the plug-in
functional as a function of the particular space C-q where the functional derivative ϕ F[ ] ⋅( )
lies:
Theorem: Suppose that Φ is L(2,m)-Hadamard-differentiable at the
true cdf F with functional derivative $\varphi[F](\cdot)$. Then under A1-A3:

(i) If $\varphi[F] \in C^{-1}$, then under A4(r,m):
$$n^{1/2}\left\{\Phi(\hat F_n) - \Phi(F)\right\} \xrightarrow{d} N\left(0, V_\Phi[F]\right)$$
with asymptotic variance:
$$V_\Phi[F] = VAR\left(\varphi[F](X_t)\right) + 2\sum_{k=1}^{+\infty} COV\left(\varphi[F](X_t),\, \varphi[F](X_{t+k})\right)$$

(ii) If $\varphi[F] \in C^{-q} \setminus C^{-q+1}$ for some $q \in [2,s]$, then under A4(r+1/2, m):
$$h_n^{(2q-3)/2}\, n^{1/2}\left\{\Phi(\hat F_n) - \Phi(F)\right\} \xrightarrow{d} N\left(0, V_\Phi[F]\right)$$
with asymptotic variance:
$$V_\Phi[F] = \int_{-\infty}^{+\infty} K^{(q-2)}(x)^2\, dx\; \sum_{l=1}^{L} \left\{\alpha_l[F](y_l)\right\}^2 f(y_l).$$
The asymptotic variance is the
same whether or not the data are serially dependent.
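A small Monte Carlo makes the last claim of part (ii) concrete (a sketch; the Gaussian AR(1) design with ρ = 0.5, the sample size, and the bandwidth are my choices). The variance of the kernel density estimate at a point is nearly unchanged by the dependence (equal in the limit, up to an O(h) term at finite bandwidth), while the variance of the sample mean, a root-n functional, is inflated by roughly the long-run factor (1+ρ)/(1−ρ) = 3:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, rho, h, y = 2000, 400, 0.5, 0.25, 0.0

def simulate(rho):
    """reps stationary AR(1) paths of length n with N(0,1) marginals (iid if rho=0)."""
    x = np.empty((reps, n))
    x[:, 0] = rng.standard_normal(reps)
    innov = np.sqrt(1.0 - rho**2) * rng.standard_normal((reps, n))
    for t in range(1, n):
        x[:, t] = rho * x[:, t - 1] + innov[:, t]
    return x

def f_hat_at(x, y, h):
    """Gaussian-kernel density estimate at the point y, one value per path."""
    u = (y - x) / h
    return np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

iid, dep = simulate(0.0), simulate(rho)
ratio_density = f_hat_at(dep, y, h).var() / f_hat_at(iid, y, h).var()
ratio_mean = dep.mean(axis=1).var() / iid.mean(axis=1).var()
print(ratio_density, ratio_mean)   # roughly 1 versus roughly 3
```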
When the functional derivative is a regular function (that is, $\varphi[F] \in C^{-1}$), the result
does not depend on how smooth it is beyond being cadlag. On the other hand, when it is a
generalized function, the result depends on the exact degree of unsmoothness of the
functional derivative (belonging to the space $C^{-q}$, but not the one immediately smoother,
$C^{-q+1}$). Many (but not all, e.g., the integrated squared density) of the functionals with
regular derivatives can be estimated without smoothing. In that case, the result is exactly
the same whether a kernel or empirical cdf is plugged in, and there is no reason to smooth.

For functionals with unsmooth derivatives, however, smoothing is essential, as the
plug-in cannot even be defined at the empirical cdf. And the asymptotic distribution is
driven exclusively by the "most unsmooth" component of the functional derivative: the
smoother component $B[F](\cdot) \in C^{-q+1}$ of $\varphi[F] \in C^{-q} \setminus C^{-q+1}$ does not appear in the
asymptotic variance. Such a functional (asymptotically) behaves essentially like a linear
combination of the density or its derivatives that are not integrated upon. When
$\varphi[F] \in C^{-q} \setminus C^{-q+1}$, it can also be noted that the asymptotic variance not only contains no
time-series term, but also has no cross-covariances across the L terms in the functional
derivative. For example, it is known that the kernel density estimates evaluated at a point $y_1$ and at a
different point $y_2$ are asymptotically uncorrelated.
This brings the following remark. The slower-than-root-n functionals have a
"local" character, such as the density evaluated at a point, or the mode of the density
function. Consider for example the density function f(.) and the local functional
$\Phi_y[F] \equiv f(y)$ (real-valued) as opposed to the global functional $\Phi[F] \equiv f(\cdot)$ ($C^s$-valued).
Drawing from the experience of root-n functionals, it may be tempting to try to obtain weak
convergence to a Gaussian process of $\Phi[F] \equiv f(\cdot)$. Unfortunately, no such result holds for
slower-than-root-n functionals. Indeed, if a limiting process existed for the normalized
kernel density estimator, this process would have to take independent values W(t) and
W(s) for every t ≠ s.
The delta method derived here has an intuitive duality interpretation. The asymptotic
distribution of an unsmooth functional Φ is driven by the inner product $\int_{-\infty}^{+\infty} \varphi[F](x)\, dH(x)$.
When $\varphi[F]$ is a generalized function (in $C^{-q}$, $q \ge 2$), then H must belong to $C^{+q-1}$. Therefore
one needs a sufficiently regular nonparametric estimator of the unknown cdf to plug
in as $H = \hat F_n - F$. This is the role played by the kernel smoothing. If one uses the empirical
distribution $F_n$ instead of the kernel cdf $\hat F_n$, then $H = F_n - F$ will be in $C^{-1}$ only, and therefore
the only functionals into which it can be plugged must have derivatives in $C^{-1}$.
3. Consistent Estimation of the Asymptotic Variances
The asymptotic variances given by the delta method can be consistently estimated in
each case:
(i) If $\varphi[F] \in C^{-1}$, then under A4(r,m) and the technical regularity condition A5
(given in the Appendix, and designed to guarantee that the truncated sum in the variance
estimator will effectively approximate the infinite sum), the asymptotic variance $V_\Phi[F]$ can
be consistently estimated by:
$$\hat V_n \equiv \frac{1}{n}\sum_{i=1}^{n} \varphi[\hat F_n](x_i)^2 - \left(\frac{1}{n}\sum_{i=1}^{n} \varphi[\hat F_n](x_i)\right)^2 + 2\sum_{k=1}^{G_n}\left(1 - \frac{k}{G_n+1}\right)\left\{\frac{1}{n}\sum_{i=1}^{n-k} \varphi[\hat F_n](x_i)\,\varphi[\hat F_n](x_{i+k}) - \left(\frac{1}{n}\sum_{i=1}^{n} \varphi[\hat F_n](x_i)\right)^2\right\}$$
where $G_n$ is a truncation lag chosen such that $\lim_{n\to\infty} G_n = +\infty$ and $G_n = O(n^{1/3})$. This is an
estimator of the spectral density at zero (see Newey-West [1987], Robinson [1989,1991]).
The choice of the truncation lag $G_n$ and the Bartlett kernel is subject to the same provisions
and can be improved upon as in Andrews [1991] in the parametric case.
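The truncated, Bartlett-weighted sum can be sketched as follows (the helper name is mine; for readability it is applied to a raw scalar series rather than to the values $\varphi[\hat F_n](x_i)$, and the AR(1) check at the end is my design):

```python
import numpy as np

def newey_west_variance(z, G):
    """Bartlett-weighted (Newey-West) estimate of the long-run variance of a
    scalar series z with truncation lag G, i.e. 2*pi times the spectral
    density at frequency zero."""
    z = np.asarray(z, dtype=float) - np.mean(z)
    n = len(z)
    v = np.sum(z**2) / n
    for k in range(1, G + 1):
        w = 1.0 - k / (G + 1.0)                  # Bartlett weight
        v += 2.0 * w * np.sum(z[:-k] * z[k:]) / n
    return v

# Check on an AR(1): x_t = rho*x_{t-1} + e_t has long-run variance 1/(1-rho)^2
rng = np.random.default_rng(2)
rho, n = 0.5, 20000
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = rho * x[t - 1] + e[t]

G = int(n ** (1 / 3))          # truncation lag G_n = O(n^{1/3}) as in the text
lrv = newey_west_variance(x, G)
print(lrv)                     # approximately 1/(1-0.5)^2 = 4
```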
(ii) If $\varphi[F] \in C^{-q} \setminus C^{-q+1}$, then under A4(r+1/2, m) the asymptotic variance $V_\Phi[F]$
can be consistently estimated by:
$$\hat V_n \equiv \int_{-\infty}^{+\infty} K^{(q-2)}(x)^2\, dx\; \sum_{l=1}^{L}\left\{\alpha_l[\hat F_n](y_l)\right\}^2 \hat f_n(y_l).$$
The appropriate estimate of the asymptotic variance makes it possible to construct
confidence intervals on $\Phi[\hat F_n]$ and carry out tests of general hypotheses regarding $\hat F_n$. For
example, to test the hypothesis $H_0: \Phi[F] = 0$ versus $H_1: \Phi[F] \neq 0$, one could simply use
the following Wald-type test statistic:
$$W_n \equiv \lambda(n)\,\Phi(\hat F_n)'\, \hat V_n^{-1}\, \Phi(\hat F_n) \xrightarrow{d} \chi^2_{[1]} \text{ under } H_0,$$
where $\lambda(n) \equiv n$ in case (i) and $\lambda(n) \equiv n\, h_n^{2q-3}$ in case (ii).
4. Rates of Convergence
The speed of decrease of the bandwidth to zero as the sample size increases is
constrained by A4. The bandwidth can be chosen within the bounds allowed by A4 in
order to generate the fastest possible rate of convergence β (the speed of convergence being $n^{-\beta}$).
(i) If $\varphi[F] \in C^{-1}$, then the plug-in will converge at rate β = 1/2. The root-n rate is
achieved by kernel plug-ins under A3 no matter how $h_n$ is chosen within A4, and will
produce an asymptotic distribution centered at zero.

(ii) If $\varphi[F] \in C^{-q} \setminus C^{-q+1}$ for some $q \in [2,s]$, then the rate of convergence is at best
$\beta = \left(r - (q-2)\right)/(2r+1)$. It can be achieved by kernel plug-ins under A3 when choosing
$h_n$ of the order $n^{-\alpha}$, with $\alpha = 1/(2r+1)$. The resulting asymptotic distribution of the plug-
in will not be centered at zero. For any ε > 0, the rate of convergence β − ε can however be
achieved with a resulting asymptotic distribution centered at zero, by choosing $h_n$ of the
order $n^{-\alpha}$ with $\alpha = 1/(2r+1) + 2\varepsilon/(2q-3)$. This choice is admissible under
A4(r+1/2,m). Given the optimal rates of Stone [1980] and Goldstein and Messer [1992], it
therefore turns out that the kernel-type estimators can achieve the optimal rate (but if one
insists on getting the optimal rate, then the limiting distribution is not centered at zero).
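A small helper makes the bandwidth arithmetic concrete (the function name is mine, and the formulas it encodes should be read as my rendering of this section's rate expressions): with a second-order kernel (r = 2) and pointwise density estimation (q = 2), the best rate is β = r/(2r+1) = 2/5, the classical Stone [1980] rate, attained with $h_n$ of order $n^{-1/5}$.

```python
def best_rate(r, q):
    """Fastest rate exponent beta (convergence at n^{-beta}) and bandwidth
    exponent alpha (h_n of order n^{-alpha}) for a plug-in whose functional
    derivative lies in C^{-q}\\C^{-q+1}, using a kernel of order r.
    (Assumption: formulas as stated in Section 4.)"""
    beta = (r - (q - 2)) / (2 * r + 1)
    alpha = 1 / (2 * r + 1)
    return beta, alpha

# Pointwise density estimation (q = 2) with a second-order kernel (r = 2):
print(best_rate(2, 2))   # (0.4, 0.2): the n^{-2/5} rate with h_n ~ n^{-1/5}
```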
5. Examples and Applications
Classical examples as well as new distributions are provided in this section to both
show how the method can very easily yield classical results and provide new results.
Example 1: Ordinary Least Squares
The following trivial example illustrates the method in a very simple case,
recovering the asymptotic distribution of classical parametric estimators. Consider a simple
linear model: $y_t = x_t \beta + \varepsilon_t$, $E[\varepsilon_t \mid x_t] = 0$. Although at first sight a quintessentially parametric model, the linear regression model in fact makes no assumptions whatsoever
regarding the distribution of the disturbances (other than uncorrelatedness with the
regressors). In that sense, the OLS estimator can be treated as a nonparametric estimator.
OLS estimates the functional:
$$\beta = \frac{E[XY]}{E[X^2]} \equiv \Phi(F) = \frac{\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x\, y\, f(x,y)\, dx\, dy}{\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x^2\, f(x,y)\, dx\, dy}$$
by plugging into this expression the empirical cdf $F_n$:
$$\hat\beta_{OLS} \equiv \Phi(F_n) = \frac{\frac{1}{n}\sum_{t=1}^{n} y_t\, x_t}{\frac{1}{n}\sum_{t=1}^{n} x_t^2}$$
Now compute the functional derivative of Φ:
$$\Phi[F+H] = \frac{\iint x y\, \{f + h\}}{\iint x^2\, \{f + h\}} = \frac{\iint x y\, f}{\iint x^2 f} + \frac{\iint x y\, h}{\iint x^2 f} - \frac{\left(\iint x y\, f\right)\left(\iint x^2\, h\right)}{\left(\iint x^2 f\right)^2} + O\left(\|H\|_{L(2,1)}^2\right)$$
$$= \Phi[F] + \iint \varphi[F](x,y)\, h(x,y)\, dx\, dy + O\left(\|H\|_{L(2,1)}^2\right)$$
So the functional $F \mapsto \Phi(F) = \beta$ is L(2,1)-Hadamard-differentiable at F, and its
derivative is:
$$\varphi[F](u,v) = \frac{1}{E[X^2]}\left\{u\, v - \frac{E[XY]}{E[X^2]}\, u^2\right\} \in C^{-1}.$$
Theorem (i) gives the asymptotic
distribution of the plug-in (using either the empirical or the kernel estimator of F) for
$\varphi[F] \in C^{-1}$: $n^{1/2}\left\{\Phi(\hat F_n) - \Phi(F)\right\} \xrightarrow{d} N\left(0, V_\Phi[F]\right)$, with asymptotic variance given by:
$$V_\Phi[F] = VAR\left(\varphi[F](y_t, x_t)\right) + 2\sum_{k=1}^{+\infty} COV\left(\varphi[F](y_t, x_t),\, \varphi[F](y_{t+k}, x_{t+k})\right).$$
Replacing the functional derivative ϕ[F] by its expression and yt-xtβ by εt, it is
easy to check that this expression is equal to the classical OLS asymptotic variance
$$V_{OLS} \equiv \left(E[x_t^2]\right)^{-2}\left\{E\left[x_t^2\, \varepsilon_t^2\right] + 2\sum_{k=1}^{+\infty} E\left[\varepsilon_t\, \varepsilon_{t+k}\, x_t\, x_{t+k}\right]\right\}.$$
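This equality is easy to confirm by simulation (a sketch with an arbitrary iid Gaussian design; the derivative $\varphi[F](u,v) = \{uv - (E[XY]/E[X^2])\,u^2\}/E[X^2]$ is evaluated with sample moments in place of the population ones):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, beta = 500, 2000, 1.5

# Monte Carlo variance of sqrt(n)*(beta_hat - beta)
b = np.empty(reps)
for i in range(reps):
    x = rng.standard_normal(n)
    y = beta * x + rng.standard_normal(n)
    b[i] = (x * y).sum() / (x**2).sum()
mc_var = n * b.var()

# Delta-method variance: VAR(phi[F](y_t, x_t)) estimated on one large sample
x = rng.standard_normal(100000)
y = beta * x + rng.standard_normal(100000)
Ex2, Exy = (x**2).mean(), (x * y).mean()
phi = (x * y - (Exy / Ex2) * x**2) / Ex2    # the functional derivative values
delta_var = phi.var()

print(mc_var, delta_var)   # both close to sigma^2/E[X^2] = 1 in this design
```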
Example 2: Least Absolute Deviations
Consider again the simple linear model $y_t = x_t\beta + \varepsilon_t$ where x is K-variate and β is an
unknown K-dimensional parameter vector. The identification assumption on
$\{\varepsilon_t / t = 1,\dots,T\}$ now consists of being independent of $\{x_t / t = 1,\dots,T\}$ and having zero
median. Let $f_\varepsilon$ be the marginal density of the disturbances ε. The least absolute deviation
(LAD) estimator is defined by (with the minimum taken over a compact set):
$$\hat\beta_{LAD} \equiv \arg\min_\beta\; \frac{1}{T}\sum_{t=1}^{T} \left|y_t - x_t\beta\right|.$$
The first order condition is $\frac{1}{T}\sum_{t=1}^{T} sign\left(y_t - x_t\hat\beta_{LAD}\right) x_t \equiv 0$, where sign(a) = -1 if
a < 0 and +1 if a > 0.
Example 3: Censored Least Absolute Deviations
Powell [1984] extended the LAD estimator to the case where the dependent variable
is censored, i.e., only $y_t \equiv \max\left\{0, x_t\beta + \varepsilon_t\right\}$ is observed. This case is typical of situations
arising in a labor supply context. In that case, consider the CLAD estimator:
$$\hat\beta_{CLAD} \equiv \arg\min_\beta\; \frac{1}{T}\sum_{t=1}^{T} \left|y_t - \max\left\{0, x_t\beta\right\}\right|$$
Regularity conditions guaranteeing the identifiability of β, and the existence and unicity
(asymptotically) of a solution, are given by Powell [1984], and are also assumed here. The
population first order condition is $E\left[sign\left(y_t - \max\{0, x_t\beta\}\right)\, \mathbf{1}\left(x_t\beta > 0\right) x_t\right] = 0$; let $\Phi(F) \equiv \beta$,
so $\hat\beta_{CLAD} = \Phi(F_n)$. Now:
$$\varphi_{CLAD}[F](x,y) = \frac{1}{2}\, f_\varepsilon(0)^{-1}\, B^{-1}\, sign\left(y - \max\{0, x\beta\}\right)\, \mathbf{1}\left(x\beta > 0\right)\, x' \in C^{-1}$$
with $B \equiv E\left[\mathbf{1}\left(x_t\beta > 0\right) x_t'\, x_t\right]$. Thus $n^{1/2}\left\{\hat\beta_{CLAD} - \beta\right\} \xrightarrow{d} N\left(0, \left(2 f_\varepsilon(0)\right)^{-2} B^{-1} W B^{-1}\right)$, where:
$$W \equiv VAR\left(sign(\varepsilon_t)\,\mathbf{1}(x_t\beta > 0)\,x_t\right) + \sum_{k=1}^{+\infty}\Big\{COV\left(sign(\varepsilon_t)\,\mathbf{1}(x_t\beta > 0)\,x_t,\; sign(\varepsilon_{t+k})\,\mathbf{1}(x_{t+k}\beta > 0)\,x_{t+k}\right)$$
$$+\; COV\left(sign(\varepsilon_{t+k})\,\mathbf{1}(x_{t+k}\beta > 0)\,x_{t+k},\; sign(\varepsilon_t)\,\mathbf{1}(x_t\beta > 0)\,x_t\right)\Big\}$$
This asymptotic distribution for dependent data appears to be new.
Example 4: Integrated Functionals
Consider next the family of real-valued functionals of the following form, where
ω(.) is a trimming function:
$$\Phi(F) \equiv \int_{-\infty}^{+\infty} \omega(x)\, \psi\left(x, F^{(1)}(x), F^{(2)}(x), \dots, F^{(m)}(x)\right) dx.$$
This class
includes the information matrix giving the asymptotic variance of maximum likelihood
estimators, the entropy measure, the average derivative estimators of Powell, Stock and
Stoker [1989] and Robinson [1989], the integral of the squared density, etc. The functional
derivative is:
$$\varphi[F](x) = \sum_{q=1}^{m} (-1)^{q-1}\, \frac{\partial^{q-1}}{\partial x^{q-1}}\!\left[\omega(x)\, \frac{\partial \psi}{\partial F^{(q)}}\left(x, F^{(1)}(x), \dots, F^{(m)}(x)\right)\right] \in C^{-1},$$
so the
plug-in will converge at rate root-n and have an asymptotic distribution sensitive to
dependent data.
Example 5: Pointwise Estimation
Consider the classical example $\Phi_q[F] \equiv F^{(q)}(y)$, a derivative of the cdf evaluated at
y. Then if q = 0, $\varphi_0[F](x) = \mathbf{1}\left(x \le y\right) \in C^{-1}$, while $\varphi_q[F](x) = \delta_y^{(q-1)}(x) \in C^{-q-1} \setminus C^{-q}$, hence
for $q \ge 1$:
$$h_n^{(2q-1)/2}\, n^{1/2}\left\{\hat F_n^{(q)}(y) - F^{(q)}(y)\right\} \xrightarrow{d} N\left(0, \int_{-\infty}^{+\infty}\left|K^{(q-1)}(x)\right|^2 dx\; f(y)\right).$$
The extension to multivariate data is immediate given the multivariate result in the Appendix.
Example 6: Smooth Quantiles
Take $\Phi[F] \equiv F^{-1}[y]$ for some y. In the independent case, smooth estimation of
quantiles has been studied, e.g., by Parzen [1979] and Silverman and Young [1987]. Here
the functional derivative can be computed as:
$$\varphi[F](x) = -\frac{1}{F^{(1)}\left[F^{-1}(y)\right]}\, \mathbf{1}\left(x \le F^{-1}(y)\right) \in C^{-1}$$
so the plug-in will converge at rate root-n and its asymptotic distribution will have time-dependent terms.
Letting $f_k$ be the joint density of observations at lag k, the asymptotic variance is:
$$V_\Phi[F] = \frac{y(1-y) + 2\sum_{k=1}^{+\infty}\int_{-\infty}^{F^{-1}(y)}\int_{-\infty}^{F^{-1}(y)}\left\{f_k(s,t) - f(s)\,f(t)\right\} ds\, dt}{\left(F^{(1)}\left[F^{-1}(y)\right]\right)^2}$$
This result also appears to be new. Weak convergence of the quantile process to a
Gaussian process is proved using the same method (see Aït-Sahalia [1993]).
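A smooth quantile estimate can be sketched by inverting the kernel cdf on a grid (assumptions mine: a logistic kernel, whose cdf is the numpy-friendly sigmoid, N(0,1) data, and y = 0.975, so the target is approximately 1.96):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(10000)
h = 0.08                                   # arbitrary small bandwidth

# Kernel cdf: average of kernel cdfs centered at the data points
grid = np.linspace(-4.0, 4.0, 401)
u = np.clip((grid[:, None] - x[None, :]) / h, -60.0, 60.0)
F_hat = (1.0 / (1.0 + np.exp(-u))).mean(axis=1)   # logistic-kernel cdf, monotone

q_hat = np.interp(0.975, F_hat, grid)      # invert the smooth cdf at y = 0.975
print(q_hat)    # close to the true N(0,1) quantile 1.96
```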
Example 7: Mode
The mode of a unimodal univariate density, studied by Parzen [1962] for i.i.d.
data, can be obtained by the following functional: $\Phi(F) \equiv [F^{(2)}]^{-1}(0)$, that is, the point at
which the derivative of the density is zero. The functional derivative can be computed here:
$$\varphi[F](x) = -\frac{1}{F^{(3)}\left[[F^{(2)}]^{-1}(0)\right]}\, \delta^{(1)}_{[F^{(2)}]^{-1}(0)}(x) \in C^{-3} \setminus C^{-2},$$
so it follows that:
$$h_n^{3/2}\, n^{1/2}\left\{[\hat F_n^{(2)}]^{-1}(0) - [F^{(2)}]^{-1}(0)\right\} \xrightarrow{d} N\left(0,\; \int_{-\infty}^{+\infty}\left|K^{(1)}(x)\right|^2 dx\; \frac{F^{(1)}\left([F^{(2)}]^{-1}(0)\right)}{\left(F^{(3)}\left([F^{(2)}]^{-1}(0)\right)\right)^2}\right).$$
This result appears to be new for dependent data.
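The estimator behind this example is simply the argmax of the kernel density estimate (a sketch; the sample size, bandwidth, and grid are my choices, with true mode 0):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(4000)     # unimodal density with true mode 0
h = 0.3

grid = np.linspace(-3.0, 3.0, 1201)
u = (grid[:, None] - x[None, :]) / h
f_hat = np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

mode_hat = grid[np.argmax(f_hat)]  # plug-in mode: the argmax of the kernel density
print(mode_hat)   # close to 0, with fluctuations of order (n*h^3)^(-1/2)
```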
Example 8: Hazard Rate
Consider $\Phi[F] \equiv \frac{F^{(1)}(y)}{1-F(y)}$ for some fixed y. Its kernel estimation has been studied
by Roussas [1989]. Hazard rates are typically useful in unemployment studies. Here the
derivative can easily be computed:
$$\varphi[F](x) = \frac{\delta_y(x)}{1-F(y)} - \frac{F^{(1)}(y)}{\left(1-F(y)\right)^2}\,\mathbf{1}\left(x \ge y\right) \in C^{-2} \setminus C^{-1},$$
and therefore:
$$h_n^{1/2}\, n^{1/2}\left\{\frac{\hat f_n(y)}{1-\hat F_n(y)} - \frac{f(y)}{1-F(y)}\right\} \xrightarrow{d} N\left(0,\; \int_{-\infty}^{+\infty}\left|K(x)\right|^2 dx\; \frac{f(y)}{\left(1-F(y)\right)^2}\right).$$
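A direct sketch of the hazard plug-in (my design: exponential(1) data, so the true hazard is identically 1; for simplicity the survival term uses the empirical cdf, while the kernel density term is the unsmooth part that drives the asymptotics):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.exponential(1.0, size=5000)   # true hazard f(y)/(1-F(y)) = 1 for every y
h, y = 0.2, 1.0

# Kernel density estimate at y (Gaussian kernel) over empirical survival at y
f_hat = np.exp(-0.5 * ((y - x) / h) ** 2).mean() / (h * np.sqrt(2.0 * np.pi))
S_hat = (x > y).mean()                # empirical 1 - F_n(y)

hazard_hat = f_hat / S_hat
print(hazard_hat)   # close to the true hazard 1.0
```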
Example 9: Regression Function
The Nadaraya-Watson method relies on a kernel plug-in to estimate
$$\Phi[F] \equiv E\left[Z \mid Y = y\right] = \int_{-\infty}^{+\infty} z\, f(y,z)\, dz \Big/ \int_{-\infty}^{+\infty} f(y,z)\, dz.$$
The asymptotic distribution of
Robinson [1983] and Bierens [1985] can be recovered by computing the functional
derivative:
$$\varphi[F](w,z) = \frac{z - a[F]}{b[F]}\, \delta_y(w) \in C^{-2} \setminus C^{-1}$$
where $a[F] \equiv E\left[Z \mid Y = y\right]$ and $b[F] \equiv \int_{-\infty}^{+\infty} f(y,z)\, dz$. Hence for $\varepsilon \equiv Z - E\left[Z \mid Y = y\right]$ and k regressors in Y:
$$h_n^{k/2}\, n^{1/2}\left\{\Phi(\hat F_n) - \Phi(F)\right\} \xrightarrow{d} N\left(0,\; \frac{E\left[\varepsilon^2 \mid Y = y\right]\, \int_{-\infty}^{+\infty}\left[K(w)\right]^2 dw}{\int_{-\infty}^{+\infty} f(y,z)\, dz}\right)$$
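The Nadaraya-Watson estimator in this example is the ratio of two kernel sums (a sketch; the regression function cos(y), the noise level, and the bandwidth are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n, h, y0 = 4000, 0.2, 0.0
Y = rng.standard_normal(n)
Z = np.cos(Y) + 0.1 * rng.standard_normal(n)   # E[Z | Y = y] = cos(y)

w = np.exp(-0.5 * ((y0 - Y) / h) ** 2)   # Gaussian kernel weights around y0
m_hat = (w * Z).sum() / w.sum()          # ratio of the two kernel plug-ins

print(m_hat)   # close to E[Z | Y = 0] = cos(0) = 1
```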
6. Conclusions
This paper has extended the delta method to nonparametric estimators of unsmooth
functionals. The regularity conditions are simple, easily verifiable and generally equal or
beat conditions used in case-by-case studies. Generalized derivatives were allowed to
permit the inclusion of virtually any functional, global or pointwise, explicitly or implicitly
defined. It was found here that both the rate of convergence to the asymptotic distribution
and the asymptotic variance were functions of the unsmoothness of the functional
derivative. Basing the estimator on dependent data modifies the asymptotic distribution
only if the functional is more irregular than some threshold level (cadlag). New functional
derivatives were computed for a variety of practical estimators used in econometrics, and
used to obtain their asymptotic distributions straightforwardly.
Compared to the case-by-case approach, the generalized delta method has another
advantage. It isolates the computation of the functional derivative, which is computed once
and for all. When considering dependent sequences, or nonparametric estimation strategies
other than kernel-based, the exact same functional derivative will be needed. The kernel
results of this paper could therefore potentially be extended to cover other nonparametric
methods. Many popular nonparametric procedures for density estimation are indeed of the
form $\hat f_n(u) \equiv \frac{1}{n}\sum_{i=1}^{n} K_n\left(u, x_i\right)$. For example, the kernel method sets
$K_n\left(u, x_i\right) \equiv \frac{1}{h_n^d}\, K\!\left(\frac{u - x_i}{h_n}\right)$ with a fixed function K(.), while the orthogonal function
method is based on $K_n\left(u, x_i\right) \equiv \sum_{j=1}^{h_n^{-1}} p_j(u)\, p_j(x_i)$ where $\{p_j\}$ is a system of orthogonal
functions in $L^2(R^d)$.
References
AÔt-Sahalia, Y., [1993], "Nonparametric Functional Estimation with Applications toFinancial Models," Ph.D. Thesis, MIT, May.
Andrews, D.W.K., [1991], "Heteroskedasticity and Autocorrelation ConsistentCovariance Matrix Estimation," Econometrica, Vol. 59, No. 3, 817-858.
Arcones, M.A., and Yu, B., [1992], "Central Limit Theorems for Empirical and U-Processes of Stationary Mixing Sequences," MSRI Mimeo, U.C. Berkeley.
Bierens, H.J., [1985], "Kernel Estimators of Regression Functions," in Bewley, T.F.,ed., Advances in Econometrics, Fifth World Congress, Vol. I, Econometric SocietyMonographs, Cambridge University Press, Cambridge, England.
Billingsley, P., [1968], Convergence of Probability Measures, Wiley, New-York.
Dudley, R.M., [1990], "Nonlinear Functionals of Empirical Measures and theBootstrap," in Probability in Banach Spaces 7, ed. E. Eberlein et al., Progress inProbability 21, 63-82, Birh‰user, Boston.
Fernholz, L.T., [1983], Von Mises Calculus for Statistical Functionals, LectureNotes in Statistics 19, Springer-Verlag.
Gill, R.D., [1989], "Non- and Semi-parametric Maximum Likelihood Estimators and thevon Mises Method (Part 1)," Scandinavian Journal of Statistics, Vol. 16, 97-128.
Goldstein, L. and Messer, P., [1992], "Optimal Plug-In Estimators forNonparametric Functional Estimation," Annals of Statistics, Vol. 20, 1306-1328.
Gyˆrfi, L., H‰rdle, W., Sarda, P. and Vieu, P., [1989], Nonparametric CurveEstimation from Time Series, Lecture Notes in Statistics 60, Springer-Verlag.
Masry, E., [1989], "Nonparametric Estimation of Conditional Probability Densities andExpectations of Stationary Processes: Strong Consistency and Rates," StochasticProcesses and their Applications, 32, 109-127.
von Mises, R., [1947], "On the Asymptotic Distribution of Differentiable StatisticalFunctions," Annals of Mathematical Statistics, Vol. 18, 309-348.
Newey, W.K. and West, K.D., [1987], "A Simple, Positive Semi-Definite,Heteroskedasticity and Autocorrelation Consistent Covariance Matrix," Econometrica,Vol. 55, No. 3, 703-708.
Pakes, A. and Pollard, D., [1989], "Simulation and the Asymptotics of OptimizationEstimators," Econometrica, Vol. 57, No. 5, 1025-1057.
Parzen, E., [1962], "On Estimation of a Probability Density Function and the Mode,"Annals of Mathematical Statistics, 33, 1065-1076.
-
19
------------, [1979], "Nonparametric Statistical Data Modeling," Journal of theAmerican Statistical Association, 74, 105-131.
Phillips, P.C.B., [1991], "A Shortcut to LAD Estimator Asymptotics," EconometricTheory, 7, 450-463.
Pollard, D., [1990], "Asymptotics for Least Absolute Deviation Estimator,"Econometric Theory, 6.---------------, [1984], Convergence of Stochastic Processes, Springer, New-York.
Powell, J.L., [1984], "Least Absolute Deviations Estimation for the CensoredRegression Model," Journal of Econometrics, Vol. 25, 303-325.Powell, J.L., Stock, J.H. and Stoker, T.M., [1989], "Semiparametric Estimationof Index Coefficients," Econometrica, Vol. 57, No. 6, 1403-1430.
Reeds, J.A.III, [1976], "On the Definition of Von Mises functionals," Ph.D. Thesis,Harvard University, Department of Statistics.
Robinson, P.M., [1991], "Automatic Frequency Domain Inference on Semiparametricand Nonparametric Models," Econometrica, Vol. 59, No. 5, 1329-1363.-----------------, [1989], "Hypothesis Testing in Semiparametric and NonparametricModels for Econometric Time Series," Review of Economic Studies, Vol. 56, 511-534.---------------, [1988], "Root-N Consistent Semiparametric Regression,"Econometrica, Vol. 56, 931-954.-----------------, [1984], "Robust Nonparametric Autoregression," in Robust andNonlinear Time Series Analysis, Franke, H‰rdle and Martin eds., Lecture Notes inStatistics 26, Springer-Verlag, Heidelberg.-----------------, [1983], "Nonparametric Estimators for Time Series," Journal of TimeSeries Analysis, 4, 185-207.
Rosenblatt, M., [1991], Stochastic Curve Estimation, NSF-CMBS RegionalConference Series in Probability and Statistics, Vol. 3, Institute of MathematicalStatistics, Hayward.-----------------, [1971], "Curve Estimates," Annals of Mathematical Statistics, Vol.42, 1815-1842.
Roussas, G., [1969], "Nonparametric Estimation in Markov Processes," Annals of theInstitute of Statistical Mathematics, Vol. 21, 73-87.-------------, [1989], "Hazard Rate Estimation Under Dependence Conditions," Journalof Statistical Planning and Inference, 22, 81-93.
Schwartz, L., [1954,1966], Theory of Distributions, Hermann, Paris.
Stone, C.J., [1980], "Optimal Convergence Rates for Nonparametric Estimators,"Annals of Statistics, Vol. 8, No. 6, 1348-1360.
Zemanian, A.H., [1965], Distribution Theory and Transform Analysis, McGraw-Hill, New York.
Appendix
The statement of the Theorem in the multivariate case is the following:
(i) If $\varphi[F] \in C^{-1}$, the result reads the same in dimension $d$;

(ii) If $\varphi[F] \in C^{-q} \setminus C^{-q+1}$: let
$$\varphi[F](x) = \sum_{\ell=1}^{L} \alpha_\ell[F](x)\,\partial^{\Delta_\ell}\delta_{y_\ell}(x_\ell) + B[F](x)$$
where $\Delta_\ell = q-2$, $\alpha_\ell[F](\cdot) \in C^{-1}$ and $B[F](\cdot) \in C^{-q+1}$. Here $y_\ell \in R^{d(\ell)}$ contains $d(\ell)$ components, and $x = (x_\ell, x_{-\ell})$ is partitioned accordingly. The maximal number of variables affected by the Dirac mass is $d^* \equiv \max\{d(\ell),\ \ell \in \{1,\dots,L\}\}$ and is attained at $L^* \equiv \{\ell \in \{1,\dots,L\} : d(\ell) = d^*\}$.

Then under A4($r+d^*/2$, $m$):
$$h_n^{(d^*+2q-4)/2}\, n^{1/2}\,\{\Phi(\hat F_n) - \Phi(F)\} \to N\big(0,\ V_\Phi[F]\big)$$
where:
$$V_\Phi[F] = \sum_{\ell \in L^*}\Big[\int_{-\infty}^{+\infty}\big(K_\ell^{(\Delta_\ell)}(u_\ell)\big)^2\, du_\ell\Big]\int_{-\infty}^{+\infty}\big\{\partial^{\Delta_\ell}\alpha_\ell[F](y_\ell, t_{-\ell})\big\}^2\, f(y_\ell, t_{-\ell})\, dt_{-\ell}$$
and
$$K_\ell^{(\Delta_\ell)}(\cdot) \equiv \int_{-\infty}^{+\infty}\big(\partial^{\Delta_\ell} K\big)(\cdot, v_{-\ell})\, dv_{-\ell}.$$
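For intuition on the marginal kernel $K_\ell$ (here with $\Delta_\ell = 0$), the sketch below — a hypothetical helper, not part of the paper — integrates a bivariate product Gaussian kernel over the component outside the block $\ell$; for a product kernel the marginal is simply the univariate factor.

```python
import numpy as np

def marginal_kernel(K, u_l, lim=8.0, n_grid=4001):
    # K_l(u_l) = integral of K(u_l, v) over v (the components not in l),
    # approximated by a Riemann sum on a wide truncated grid.
    v = np.linspace(-lim, lim, n_grid)
    return np.sum(K(u_l, v)) * (v[1] - v[0])

# Bivariate product Gaussian kernel (d = 2, one component in the block l).
K = lambda u, v: np.exp(-(u ** 2 + v ** 2) / 2.0) / (2.0 * np.pi)

# Marginalizing the product kernel recovers the univariate Gaussian factor.
print(marginal_kernel(K, 0.7))  # ~ 0.3123, the univariate Gaussian density at 0.7
```

The same quadrature idea extends to higher $\Delta_\ell$ by differentiating the kernel before integrating out $v_{-\ell}$.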
Proof of Theorem: The following two lemmas will be used:
Lemma 1 (Central Limit Theorem): Under A1-A4($\cdot$,0), $\hat{\hat A}_n \equiv n^{1/2}\big(\hat F_n - E[\hat F_n]\big)$ converges in law to a Gaussian $C^0$-stochastic process $G_F$ in the space $(C^0, \|\cdot\|_{L^\infty})$, with finite-dimensional covariances given below. If A4($\cdot$,0) is replaced by the more stringent requirement A4(r,0), then the preceding statements hold for the centered process $\hat A_n \equiv n^{1/2}(\hat F_n - F)$ instead of $\hat{\hat A}_n \equiv n^{1/2}\big(\hat F_n - E[\hat F_n]\big)$. The covariance kernel of the generalized Brownian Bridge $G_F \equiv \tilde B \circ F$ is given by (where $F_k$ is the joint cdf of observations at lag $k$):
$$E\big[\tilde B(F(s))\,\tilde B(F(t))\big] = F(\min(s,t)) - F(s)F(t) + \sum_{k=1}^{+\infty}\big\{F_k(s,t) + F_k(t,s) - 2F(s)F(t)\big\}.$$
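This covariance kernel has a direct sample analogue: replace $F$ and $F_k$ by means of indicator variables and truncate the infinite lag sum. The sketch below (hypothetical helper name; a fixed-lag truncation is a simplification of what a careful implementation would use) estimates the kernel for a scalar dependent series.

```python
import numpy as np

def cdf_cov_kernel(x, s, t, max_lag=20):
    # Sample analogue of E[B~(F(s)) B~(F(t))]:
    #   F(min(s,t)) - F(s)F(t) + sum_k { F_k(s,t) + F_k(t,s) - 2 F(s)F(t) },
    # with the infinite lag sum truncated at max_lag.
    Is = (x <= s).astype(float)
    It = (x <= t).astype(float)
    Fs, Ft = Is.mean(), It.mean()
    cov = (Is * It).mean() - Fs * Ft        # F(min(s,t)) - F(s)F(t)
    for k in range(1, max_lag + 1):
        Fk_st = (Is[:-k] * It[k:]).mean()   # analogue of F_k(s, t)
        Fk_ts = (It[:-k] * Is[k:]).mean()   # analogue of F_k(t, s)
        cov += Fk_st + Fk_ts - 2.0 * Fs * Ft
    return cov

# AR(1) sample: positive serial dependence inflates the covariance kernel.
rng = np.random.default_rng(0)
e = rng.standard_normal(5000)
x = np.empty_like(e)
x[0] = e[0]
for i in range(1, len(e)):
    x[i] = 0.5 * x[i - 1] + e[i]
print(cdf_cov_kernel(x, 0.0, 0.0))  # typically well above the iid value F(0)(1-F(0)) = 0.25
```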
Lemma 2 (Bounds for Remainder Term): Under A1-A3, for q = 0,...,d+s:
$$\|\hat F_n - E[\hat F_n]\|_{L^\infty,q} = O_p\big(n^{-1/2} h_n^{-q}\big) \quad\text{and}\quad \|E[\hat F_n] - F\|_{L^\infty,q} = O\big(h_n^{r-(q-d)}\big).$$
To prove Lemma 1, show that the class of functions $\Gamma \equiv \{W_{y,h},\ y \in R^d,\ h \in R^{+*}\}$, where $W_{y,h}(\cdot) \equiv \int_{-\infty}^{y} h^{-d} K\big((t-\cdot)/h\big)\, dt$, forms a subgraph VC class. But such a class is a Euclidean class (from Lemma (2.12) in Pakes and Pollard [1989]); conclude with Theorem 1 of Arcones and Yu [1992]. Alternatively one could use the U-statistics approach of Robinson [1989]. Lemma 2 is easy given A2. Details are in Aït-Sahalia [1993].
(i) Consider now the first part of the Theorem. By differentiability of the functional $\Phi$ at $F$:
$$n^{1/2}\big\{\Phi[\hat F_n] - \Phi[F]\big\} = \int_{-\infty}^{+\infty} \varphi[F](x)\, d\hat A_n(x) + R_\Phi[F, \hat A_n],$$
where $R_\Phi[F, \hat A_n] = O\big(n^{1/2}\|\hat F_n - F\|^2_{L^\infty,m}\big)$. First, $R_\Phi[F, \hat A_n] = o_p(1)$ follows from Lemma 2 since $n^{1/2}\|\hat F_n - F\|^2_{L^\infty,m} = O_p\big(n^{1/2}(n^{-1}h_n^{-2m} + h_n^{2(r-(m-d))})\big)$ is $o_p(1)$ under A4(r,m) as $r > 2(m-d)$. Then by Slutsky's Theorem, the distribution of $n^{1/2}\{\Phi[\hat F_n] - \Phi[F]\}$ is given by that of $\int_{-\infty}^{+\infty} \varphi[F](x)\, d\hat A_n(x)$.
But by the continuous mapping theorem (e.g., Proposition 9.3.7 in Dudley [1989]), $\int_{-\infty}^{+\infty} \varphi[F](x)\, d\hat A_n(x)$ converges in law to $\int_{-\infty}^{+\infty} \varphi[F](x)\, d\tilde B(F(x))$ since from Lemma 1 $\hat A_n$ converges in law to the process $G_F \equiv \tilde B \circ F$. $\int_{-\infty}^{+\infty} \varphi[F](x)\, d\tilde B(F(x))$ is the Itô integral of the real-valued, non-random function $\varphi[F]$ with respect to the Gaussian stochastic process $\tilde B \circ F$ and is therefore normally distributed. The asymptotic variance of the generic Itô integral $\int_{-\infty}^{+\infty} \omega(x)\, d\tilde B(F(x))$ can be computed as:
$$E\Big[\Big(\int_{-\infty}^{+\infty}\omega(x)\, d\tilde B(F(x))\Big)^2\Big] = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty}\omega(x)\,\omega(y)\, E\big[d\tilde B(F(y))\, d\tilde B(F(x))\big]$$
$$= \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty}\omega(x)\,\omega(y)\,\frac{\partial^{2d} E\big[\tilde B(F(y))\,\tilde B(F(x))\big]}{\partial x_1\cdots\partial x_d\,\partial y_1\cdots\partial y_d}\, dy\, dx.$$
Thus:
$$V_\Phi[F] = E\big[\varphi[F](x_t)^2\big] - E\big[\varphi[F](x_t)\big]^2 + 2\sum_{k=1}^{+\infty}\Big\{E\big[\varphi[F](x_t)\,\varphi[F](x_{t+k})\big] - E\big[\varphi[F](x_t)\big]\,E\big[\varphi[F](x_{t+k})\big]\Big\}.$$
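In other words, in the regular case $V_\Phi[F]$ is the long-run variance of the stationary sequence $\varphi[F](x_t)$. A minimal plug-in sketch (hypothetical helper name; a plain truncated, unweighted lag sum, where a careful implementation would add a lag window):

```python
import numpy as np

def long_run_variance(phi_vals, max_lag):
    # Sample analogue of
    #   Var(phi(x_t)) + 2 * sum_{k>=1} Cov(phi(x_t), phi(x_{t+k})),
    # with the lag sum truncated at max_lag (unweighted).
    v = np.asarray(phi_vals, dtype=float)
    v = v - v.mean()
    out = np.mean(v ** 2)
    for k in range(1, max_lag + 1):
        out += 2.0 * np.mean(v[:-k] * v[k:])
    return out

# For iid data the lag terms vanish on average, and the long-run
# variance reduces to the ordinary variance.
rng = np.random.default_rng(1)
z = rng.standard_normal(10_000)
print(long_run_variance(z, 0))  # close to Var = 1
```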
(ii) Consider now the case where:
$$\varphi[F](x) = \alpha_\ell[F](x)\,\partial^{\Delta_\ell}\delta_{y_\ell}(x_\ell) \in C^{-q}\setminus C^{-q+1}, \quad\text{for } \Delta_\ell = q-2,\ 2 \le q \le s.$$
The remainder term in the expansion of the functional is bounded as in (i). Then by
Slutsky's Theorem the asymptotic distribution is given by the linear term (scaled for now at
the rate $n^{1/2}$):
$$\int_{-\infty}^{+\infty}\partial^{\Delta_\ell}\delta_{y_\ell}(x_\ell)\,\alpha_\ell[F](x)\, d\hat A_n(x)$$
$$= \frac{1}{h_n^{d}}\int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty}\partial^{\Delta_\ell}_{x_\ell}\Big\{\alpha_\ell[F](x_\ell,x_{-\ell})\,K\Big(\frac{x_\ell-t_\ell}{h_n},\frac{x_{-\ell}-t_{-\ell}}{h_n}\Big)\Big\}\Big|_{x_\ell=y_\ell}\, dx_{-\ell}\; a_n(t)\, dt$$
$$\quad + n^{1/2}\Big\{\frac{1}{h_n^{d}}\int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty}\partial^{\Delta_\ell}_{x_\ell}\Big\{\alpha_\ell[F](x_\ell,x_{-\ell})\,K\Big(\frac{x_\ell-t_\ell}{h_n},\frac{x_{-\ell}-t_{-\ell}}{h_n}\Big)\Big\}\Big|_{x_\ell=y_\ell}\, dx_{-\ell}\; f(t)\, dt - \int_{-\infty}^{+\infty}\partial^{\Delta_\ell}_{x_\ell}\big\{\alpha_\ell[F](x_\ell,x_{-\ell})\, f(x)\big\}\big|_{x_\ell=y_\ell}\, dx_{-\ell}\Big\}$$
$$= \frac{1}{h_n^{d}}\int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty}\partial^{\Delta_\ell}_{x_\ell}\Big\{\alpha_\ell[F](x_\ell,x_{-\ell})\,K\Big(\frac{x_\ell-t_\ell}{h_n},\frac{x_{-\ell}-t_{-\ell}}{h_n}\Big)\Big\}\Big|_{x_\ell=y_\ell}\, dx_{-\ell}\; a_n(t)\, dt + O_p\big(n^{1/2} h_n^{r-(q-2)}\big).$$
Write the leading term as $\int_{-\infty}^{+\infty}\omega_n(t)\, a_n(t)\, dt$, where $a_n(t)\, dt = dA_n(t)$ and $A_n \equiv n^{1/2}(\hat F_n - F)$. Then let $\nu_n(t) \equiv h_n^{(d(\ell)+2\Delta_\ell)/2}\,\omega_n(t)$. Next show that:
$$E\Big[\Big(h_n^{(d(\ell)+2\Delta_\ell)/2}\int_{-\infty}^{+\infty}\omega_n(t)\, dA_n(t)\Big)^2\Big] = \Big[\int_{-\infty}^{+\infty}\big(K_\ell^{(\Delta_\ell)}(u_\ell)\big)^2\, du_\ell\Big]\int_{-\infty}^{+\infty}\big\{\partial^{\Delta_\ell}\alpha_\ell[F](y_\ell,t_{-\ell})\big\}^2 f(y_\ell,t_{-\ell})\, dt_{-\ell} + o(1).$$
But:
$$\int_{-\infty}^{+\infty}\omega_n(x)\, dA_n(x) = \frac{1}{n^{1/2}}\sum_{i=1}^{n}\omega_n(x_i) - n^{1/2}\int_{-\infty}^{+\infty}\omega_n(x)\, f(x)\, dx = \frac{1}{n^{1/2}}\sum_{i=1}^{n}\big\{\omega_n(x_i) - E[\omega_n(x_1)]\big\},$$
so:
$$E\Big[\Big(\int_{-\infty}^{+\infty}\omega_n(t)\, dA_n(t)\Big)^2\Big] = \int_{-\infty}^{+\infty}\omega_n(t)^2 f(t)\, dt - \Big(\int_{-\infty}^{+\infty}\omega_n(t)\, f(t)\, dt\Big)^2$$
$$\quad + 2\sum_{k=1}^{n-1}\Big(1-\frac{k}{n}\Big)\int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty}\big\{f_k(t,s) - f(s)\, f(t)\big\}\,\omega_n(t)\,\omega_n(s)\, dt\, ds.$$
The first of the three terms above is $O\big(h_n^{-(d(\ell)+2\Delta_\ell)}\big)$, while the other two are $o\big(h_n^{-(d(\ell)+2\Delta_\ell)}\big)$. In particular the "time-series" term containing the sum over time lags is of lower order than the first term. The computations are very similar for all three terms; for example for the first term:
$$\int_{-\infty}^{+\infty}\omega_n(t)^2 f(t)\, dt = \int_{-\infty}^{+\infty}\Big[\frac{1}{h_n^{d}}\int_{-\infty}^{+\infty}\partial^{\Delta_\ell}_{x_\ell}\Big\{\alpha_\ell[F](x_\ell,x_{-\ell})\,K\Big(\frac{x_\ell-t_\ell}{h_n},\frac{x_{-\ell}-t_{-\ell}}{h_n}\Big)\Big\}\Big|_{x_\ell=y_\ell}\, dx_{-\ell}\Big]^2 f(t)\, dt$$
$$= \frac{1}{h_n^{2d}}\int_{-\infty}^{+\infty}\Big[\sum_{\Theta_\ell\le\Delta_\ell}\binom{\Delta_\ell}{\Theta_\ell}\int_{-\infty}^{+\infty}\partial^{\Theta_\ell}_{x_\ell}\big\{\alpha_\ell[F](x_\ell,x_{-\ell})\big\}\Big|_{x_\ell=y_\ell}\,\partial^{\Delta_\ell-\Theta_\ell}_{x_\ell}K\Big(\frac{x_\ell-t_\ell}{h_n},\frac{x_{-\ell}-t_{-\ell}}{h_n}\Big)\Big|_{x_\ell=y_\ell}\, dx_{-\ell}\Big]^2 f(t)\, dt.$$
Because
$$\partial^{\Delta_\ell-\Theta_\ell}_{x_\ell}K\Big(\frac{x_\ell-t_\ell}{h_n},\frac{x_{-\ell}-t_{-\ell}}{h_n}\Big) = h_n^{-(\Delta_\ell-\Theta_\ell)}\,\big(\partial^{\Delta_\ell-\Theta_\ell}K\big)\Big(\frac{x_\ell-t_\ell}{h_n},\frac{x_{-\ell}-t_{-\ell}}{h_n}\Big),$$
the term of highest order corresponds to $\Theta_\ell = 0$, and is therefore given by:
$$\int_{-\infty}^{+\infty}\Big[\frac{1}{h_n^{d+\Delta_\ell}}\int_{-\infty}^{+\infty}\alpha_\ell[F](y_\ell,x_{-\ell})\,\big(\partial^{\Delta_\ell}K\big)\Big(\frac{y_\ell-t_\ell}{h_n},\frac{x_{-\ell}-t_{-\ell}}{h_n}\Big)\, dx_{-\ell}\Big]^2 f(t)\, dt$$
$$= \frac{1}{h_n^{d(\ell)+2\Delta_\ell}}\Big[\int_{-\infty}^{+\infty}\big(K_\ell^{(\Delta_\ell)}(u_\ell)\big)^2\, du_\ell\Big]\int_{-\infty}^{+\infty}\big\{\partial^{\Delta_\ell}\alpha_\ell[F](y_\ell,t_{-\ell})\big\}^2 f(y_\ell,t_{-\ell})\, dt_{-\ell} + o\big(h_n^{-(d(\ell)+2\Delta_\ell)}\big).$$
Hence:
$$E\Big[\Big(\int_{-\infty}^{+\infty}\nu_n(t)\, dA_n(t)\Big)^2\Big] = \Big[\int_{-\infty}^{+\infty}\big(K_\ell^{(\Delta_\ell)}(u_\ell)\big)^2\, du_\ell\Big]\int_{-\infty}^{+\infty}\big\{\partial^{\Delta_\ell}\alpha_\ell[F](y_\ell,t_{-\ell})\big\}^2 f(y_\ell,t_{-\ell})\, dt_{-\ell} + o(1) \equiv V_\Phi[F] + o(1).$$
Therefore $\int_{-\infty}^{+\infty}\nu_n(x)\, dA_n(x) = \frac{1}{n^{1/2}}\sum_{i=1}^{n}\big\{\nu_n(x_i) - E[\nu_n(x_1)]\big\} \to N\big(0, V_\Phi[F]\big)$
by the Central Limit Theorem. But the first order Taylor expansion of the functional yields:
$$n^{1/2}\big\{\Phi(\hat F_n) - \Phi(F)\big\} = \int_{-\infty}^{+\infty}\omega_n(t)\, dA_n(t) + O_p\big(n^{1/2} h_n^{r-(q-2)}\big) + O\big(n^{1/2}\|\hat F_n - F\|^2_{L^\infty,m}\big).$$
Suppose that $d(\ell) = d^*$ (when there is more than one $\ell$ the only terms that matter are those corresponding to $\ell \in L^*$). Then under Assumption A4($r+d^*/2$, $m$), $h_n^{(d(\ell)+2\Delta_\ell)/2}\, O_p\big(n^{1/2} h_n^{r-(q-2)}\big) = o_p(1)$ and the remainder term is also $o_p(1)$. Therefore:
$$h_n^{(d(\ell)+2\Delta_\ell)/2}\, n^{1/2}\big\{\Phi(\hat F_n) - \Phi(F)\big\} = \int_{-\infty}^{+\infty}\nu_n(t)\, a_n(t)\, dt + o_p(1)$$
$$\to N\Big(0,\ \Big[\int_{-\infty}^{+\infty}\big(K_\ell^{(\Delta_\ell)}(u_\ell)\big)^2\, du_\ell\Big]\int_{-\infty}^{+\infty}\big\{\partial^{\Delta_\ell}\alpha_\ell[F](y_\ell,t_{-\ell})\big\}^2 f(y_\ell,t_{-\ell})\, dt_{-\ell}\Big).$$
Under Assumption A4(r,m) only, this will be the asymptotic distribution of $h_n^{(d(\ell)+2\Delta_\ell)/2}\, n^{1/2}\big\{\Phi(\hat F_n) - \Phi(E[\hat F_n])\big\}$ instead of $h_n^{(d(\ell)+2\Delta_\ell)/2}\, n^{1/2}\big\{\Phi(\hat F_n) - \Phi(F)\big\}$. Indeed: $\Phi(E[\hat F_n]) - \Phi(F) = O\big(h_n^{r-(q-2)}\big)$. Under A4($r+d^*/2$, $m$), this asymptotic bias term once multiplied by $h_n^{(d(\ell)+2\Delta_\ell)/2}\, n^{1/2}$ is $o(1)$.
The absence of a covariance at the same order $O\big(h_n^{-(d(\ell)+2\Delta_\ell)}\big)$ between terms associated with different $\ell$ yields:
$$h_n^{(d^*+2q-4)/2}\, n^{1/2}\big\{\Phi(\hat F_n) - \Phi(F)\big\} \to N\Big(0,\ \sum_{\ell\in L^*}\Big[\int_{-\infty}^{+\infty}\big(K_\ell^{(\Delta_\ell)}(u_\ell)\big)^2\, du_\ell\Big]\int_{-\infty}^{+\infty}\big\{\partial^{\Delta_\ell}\alpha_\ell[F](y_\ell,t_{-\ell})\big\}^2 f(y_\ell,t_{-\ell})\, dt_{-\ell}\Big).$$
If present, the term $B[F](\cdot) \in C^{-q+1}$ only contributes terms of higher order in powers of $h_n^{-1}$ and therefore does not change the asymptotic distribution.
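To fix ideas, the simplest instance of the limit just derived is pointwise density estimation: $\Phi(F) = f(y)$, i.e. $\varphi[F] = \delta_y$, with $d = d^* = 1$, $q = 2$, $\Delta_\ell = 0$, so the normalization is $(n h_n)^{1/2}$ and the limit variance is $f(y)\int K(u)^2\, du$. The Monte Carlo sketch below (illustrative helper names and parameter values, not from the paper; iid data and a Gaussian kernel) checks this numerically:

```python
import numpy as np

def kde_at(sample, y, h):
    # Gaussian-kernel density estimate at the point y.
    u = (y - sample) / h
    return np.mean(np.exp(-u ** 2 / 2.0) / np.sqrt(2.0 * np.pi)) / h

rng = np.random.default_rng(42)
n, h, y = 50_000, 0.1, 0.0
f_y = 1.0 / np.sqrt(2.0 * np.pi)   # true N(0,1) density at y = 0

# (n h)^{1/2} (f_hat(y) - f(y)) across replications: its variance should
# approach f(y) * int K(u)^2 du = f(0) / (2 sqrt(pi)) ~ 0.11 for the
# Gaussian kernel, up to O(h) finite-sample corrections and MC noise.
reps = np.array([np.sqrt(n * h) * (kde_at(rng.standard_normal(n), y, h) - f_y)
                 for _ in range(200)])
print(reps.var())
```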
Estimation of the Asymptotic Variance: The additional technical regularity condition is A5: Let $V_i \equiv V_i[F] \equiv \varphi[F](x_i)$ and $\hat V_i \equiv \varphi[\hat F_n](x_i)$. Assume:
(1) $E\big[|V_i|^{3+\delta}\big] < \infty$ where $\delta > (3+\varepsilon)(3+2\varepsilon)/\varepsilon$ for some $\varepsilon > 0$;
(2) $E\big[\sup_{G\in N} V_i[G]^2\big] < \infty$ where $N$ is a neighborhood of the true cdf $F$;
(3) $E\big[(V_i[G] - V_i[H])^2\big] \le C\,\|G - H\|^2_{L^\infty,m}$, for $G$ and $H$ in $N$.
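As an illustration of the plug-in idea behind A5, take $\Phi(F) = F(c)$, whose influence function is $\varphi[F](x) = 1\{x \le c\} - F(c)$; the sketch below forms $\hat V_i = \varphi[\hat F_n](x_i)$ with a kernel-smoothed cdf and then a truncated long-run variance of the $\hat V_i$ (the helper names and the plain truncation are illustrative simplifications, not the paper's construction):

```python
import numpy as np
from math import erf, sqrt

def smoothed_cdf(sample, c, h):
    # Kernel-smoothed empirical cdf at c (Gaussian kernel).
    z = (c - sample) / (h * sqrt(2.0))
    return float(np.mean([0.5 * (1.0 + erf(zi)) for zi in z]))

def plug_in_variance(sample, c, h, max_lag=10):
    # V_hat_i = phi[F_hat_n](x_i) = 1{x_i <= c} - F_hat_n(c), then a
    # truncated long-run variance of the V_hat_i.
    Fc = smoothed_cdf(sample, c, h)
    V = (sample <= c).astype(float) - Fc
    out = np.mean(V ** 2)
    for k in range(1, max_lag + 1):
        out += 2.0 * np.mean(V[:-k] * V[k:])
    return out

# iid uniforms at the median: Var(1{x <= c}) = F(c)(1 - F(c)) = 1/4.
rng = np.random.default_rng(3)
u = rng.uniform(size=5000)
print(plug_in_variance(u, 0.5, 0.02, max_lag=0))  # close to 0.25
```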
3+[ ] < ∞δ where δ ε ε ε> +( ) +( )3 3 2/ for some ε>0;(2) E V GG isup ∈ [ ][ ] < ∞Ν 2 where N is a neighborhood of the true cdf F;(3) E V G V H C G Hi i L m[ ] − [ ][ ] ≤ − ∞( )2 2 , for G and H in N.