THE DELTA METHOD
FOR NONPARAMETRIC KERNEL FUNCTIONALS
Yacine Aït-Sahalia 1
Graduate School of Business
University of Chicago
1101 E 58th St
Chicago, IL 60637
January 1992. Revised, August 1994
1 This paper is part of my Ph.D. dissertation at the MIT Department of Economics. I am very indebted to
Richard Dudley, Jerry Hausman and Whitney Newey for helpful suggestions and advice. The comments of
Peter Robinson and three anonymous referees have considerably improved this paper. I am also grateful to
seminar participants at Berkeley, British Columbia, Carnegie-Mellon, Chicago, Columbia, Harvard,
Northwestern, Michigan, MIT, Princeton, Stanford, Washington University, Wharton, Yale and the Yale-
NSF Conference on Asymptotics for Infinite Dimensional Parameters as well as Emmanuel Guerre, Pascal
Massart and Jens Praestgaard for stimulating conversations. Part of this research was conducted while I was
visiting CREST-INSEE in Paris whose hospitality is gratefully acknowledged. Financial support from
Ecole Polytechnique and MIT Fellowships is also gratefully acknowledged. All errors are mine.
The Delta Method for Nonparametric Kernel Functionals
by Yacine Aït-Sahalia
Abstract
This paper provides under weak conditions a generalized delta method for
functionals of nonparametric kernel estimators, based on (possibly) dependent and
(possibly) multivariate data. Generalized derivatives are allowed to permit the inclusion of
virtually any functional, global or pointwise, explicitly or implicitly defined. It is shown
that forming the estimator with dependent data modifies the asymptotic distribution only if
the functional is more irregular than some threshold level. Variance estimators and rates of
convergence are derived. Many examples are provided.
Keywords: Delta Method, Nonparametric Kernel Estimation, Dependent Data, Rate of
Convergence, Functional Differentiation, Generalized Functions.
1. Introduction
The delta method is a simple and widely used tool to derive the asymptotic
distribution of nonlinear functionals of an estimator. Most parametric estimators converge
at the familiar root-n rate, and so do functionals of them. When the estimator is
nonparametric, however, some functionals will converge at a rate slower than root-n while
others will retain the root-n rate. The slower-than-root-n functionals require some form of
smoothing to be estimated, the most popular being the kernel method. A delta method has
long been available for the class of root-n functionals that can be estimated without
smoothing (von Mises [1947], Reeds [1976], Huber [1981], Dudley [1990]).
In all cases, the essence of the delta method is a first order Taylor expansion of the
functional. The problem is that the slower-than-root-n functionals are not differentiable in
the usual sense. Therefore the examples of slower-than-root-n functionals studied in the
literature have been tackled without the systematic "plug-and-play" feature that made the
delta method attractive in the settings where it was available. This paper proposes a simple
delta method that also covers slower-than-root-n functionals, under conditions that
match and often relax those used in the previous "case-by-case" work. To address the
problem of non-differentiability, the paper allows generalized functions as functional
derivatives. An example of a generalized function is the Dirac delta function, and its
derivatives. With generalized functions, the familiar delta method approach based on
differentiating the functional is shown to be easily implemented for non-trivial examples.
The results are valid even when the data are serially correlated, with independent data as a
special case.
The main contribution of the paper is to show how to systematically linearize
slower-than-root-n functionals, and then to provide a general yet simple result yielding
their asymptotic distribution. The purpose of the examples is two-fold: first, some classical
examples (regression function, etc.) are included to show how the method of this paper
significantly beats the previous approaches; second, new distributions are derived for cases
where they were not previously available (dependent censored least absolute deviation,
quantiles, mode, stochastic differential equations, etc.). The paper is organized as follows:
Section 2 derives the generalized delta method. Consistent estimators of the asymptotic
variances are proposed in Section 3. Section 4 discusses the rates of convergence of the
estimators. Section 5 illustrates the application of the result through many examples.
Section 6 concludes. Proofs are in the Appendix.
2. The Delta Method with Generalized Derivatives
2.1 Assumptions
Consider $R^d$-valued random variables $X_1, X_2, \dots, X_n$ identically distributed as f(.),
an unknown density function with associated cumulative distribution function $F(x) \equiv \int_{-\infty}^{x} f(t)\,dt$
where $x \equiv (x_1, x_2, \dots, x_d)$. The following regularity conditions are imposed:
Assumption A1: The sequence $\{X_i\}$ is a strictly stationary β-mixing sequence satisfying:
$k^{\delta}\,\beta_k \to 0$ as $k \to \infty$, for some fixed $\delta > 1$.
$\beta_k = 0$ for all $k \ge 1$ corresponds to the independence case. As long as $\beta_k \to 0$ as $k \to \infty$,
the sequence is said to be absolutely regular.
Assumption A2: The density function f(.) is continuously differentiable on Rd up to
order s. Its successive derivatives are bounded and in $L^2(R^d)$.
Let Cs be the space of density functions satisfying A2. To estimate the density
function f(.), a Parzen-Rosenblatt kernel function K(.) will be used. The kernel will be
required to satisfy:
-
3
Assumption A3: (i) K is an even function integrating to one;
(ii) The kernel is of order r = s, an even integer:

1) $\forall \lambda \in \mathbb{N}^d$ with $|\lambda| \equiv \lambda_1 + \cdots + \lambda_d \in \{1, \dots, r-1\}$: $\int_{-\infty}^{+\infty} x_1^{\lambda_1} \cdots x_d^{\lambda_d}\, K(x)\, dx = 0$;

2) $\exists \lambda \in \mathbb{N}^d$ with $|\lambda| = r$ such that $\int_{-\infty}^{+\infty} x_1^{\lambda_1} \cdots x_d^{\lambda_d}\, K(x)\, dx \neq 0$;

3) $\int_{-\infty}^{+\infty} |x|^r\, |K(x)|\, dx < +\infty$.

(iii) K is continuously differentiable up to order s+d on $R^d$, and its
derivatives of order up to s are in $L^2(R^d)$.
The last assumption indicates how the bandwidth hn in the kernel density estimator
should be chosen. The statement of the assumption depends upon an exponent parameter
e>0 and an integer m, 0≤m
where $\Phi^{(1)}[G](\cdot)$ is a continuous linear (in H) functional and $\|\cdot\|_{L(2,m)}$ is the sum of the $L^2$
norms of all the derivatives of H up to order m. If this holds uniformly in H in any compact
subset K of $C^s$, with the remainder satisfying $\left|R_\Phi[G,H]\right| \le C(K)\,\|H\|_{L(2,m)}^2$, then Φ is said to be L(2,m)-
Hadamard-differentiable at F. In what follows it will always implicitly be assumed that the
linear term $\Phi^{(1)}[F](\cdot)$ is not degenerate. If it were, then the asymptotic distribution would be
given by a term of higher order in the Taylor expansion.
By the Riesz Representation Theorem (see e.g., Schwartz [1966]), there exists a
distribution $\varphi[F]: R^d \to R$ such that $\Phi^{(1)}[F](H) = \int_{-\infty}^{+\infty} \varphi[F](x)\, dH(x)$. Call $\varphi[F](\cdot)$ the
functional derivative 2 of Φ at F. The standard delta method is applicable only if $\varphi[F](\cdot)$ is a
regular function, i.e., at least cadlag (right-continuous with left limits). For some functionals Φ,
the functional derivative will indeed be a regular function. For example, let
$\Phi[F] \equiv \int_{-\infty}^{+\infty} f(x)^2\, dx$. Then:
$$\Phi[F+H] = \int_{-\infty}^{+\infty} \left\{f(x) + h(x)\right\}^2 dx = \int_{-\infty}^{+\infty} f(x)^2\, dx + 2\int_{-\infty}^{+\infty} f(x)\,h(x)\, dx + \int_{-\infty}^{+\infty} h(x)^2\, dx$$
so $R_\Phi[F,H] = \int_{-\infty}^{+\infty} h(x)^2\, dx$ and $\Phi^{(1)}[F](H) = \int_{-\infty}^{+\infty} \varphi[F](x)\, dH(x) = \int_{-\infty}^{+\infty} 2\,f(x)\, h(x)\, dx$. Thus its
functional derivative is $\varphi[F](\cdot) = 2 f(\cdot)$, a function in $C^s$.
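This textbook functional is also convenient for a quick numerical check of the plug-in idea (a sketch, not from the paper: the Gaussian kernel, Silverman bandwidth, and standard normal data are my choices; the target value is $\int \phi^2 = 1/(2\sqrt{\pi}) \approx 0.2821$):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5000)              # sample from f = standard normal
h = 1.06 * x.std() * len(x) ** (-1 / 5)    # Silverman rule-of-thumb bandwidth

# Parzen-Rosenblatt kernel density estimate on a grid (Gaussian kernel)
grid = np.linspace(-5.0, 5.0, 801)
u = (grid[:, None] - x[None, :]) / h
f_hat = np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

# Plug-in estimate of Phi[F] = integral of f(x)^2 dx (Riemann sum)
phi_hat = (f_hat**2).sum() * (grid[1] - grid[0])
print(phi_hat)   # close to 1/(2*sqrt(pi)) ~ 0.28, up to smoothing bias
```

The small downward deviation from 0.2821 is the smoothing bias: the expected estimate integrates the square of the convolution f * K_h rather than of f itself.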
Unfortunately, many functionals of interest in econometrics do not have "regular"
functional derivatives, that is ϕ F[ ] ⋅( ) will not be a cadlag function. Instead, it will be a
2 Although $\varphi[F]$ is not unique, the respective asymptotic distributions given by the Theorem are independent
of the choice of $\varphi[F]$. One way to make the representation unique would be to impose that
$\int_{-\infty}^{+\infty} \varphi[F](x)\, dF(x) = 0$. Any $\varphi[F]$ given by the Riesz Theorem will be called "the" derivative, even though
"a" derivative would be more appropriate.
generalized function. The existing delta method cannot treat such functionals. The main
point of the paper is that the same familiar delta approach will work provided that one
includes generalized functions as functional derivatives. This method turns out to be very
simple as well as powerful. Many examples will be provided below. To get the flavor of
the result immediately, consider the hazard rate function $\Phi[F] \equiv \frac{f(y)}{1-F(y)}$ evaluated at some
y. It will be shown below that it is differentiable in the extended sense of this paper, with
functional derivative:
$$\varphi[F](x) = \frac{\delta_y(x)}{1-F(y)} - \frac{F^{(1)}(y)}{\left(1-F(y)\right)^2}\,\mathbf{1}\left(x \ge y\right).$$
This functional derivative is a linear combination of a Dirac mass at y (a generalized function) and an
indicator function (a regular function). The asymptotic distribution of the kernel estimator
of the hazard rate will be driven by the Dirac term in the linear expansion: only the most
unsmooth term counts.
2.3. Generalized Functions
The concept of generalized function, or distribution, was formally introduced by
Schwartz [1954,1966]. Simply put, any function g, no matter how
unsmooth, can be differentiated. Its derivative $g^{(1)}$ is defined by its cross-products against
smooth functions f:
$$\int_{-\infty}^{+\infty} g^{(1)}(x)\, f(x)\, dx \equiv -\int_{-\infty}^{+\infty} g(x)\, f^{(1)}(x)\, dx$$
in the univariate case, where $f^{(1)}$ is a standard derivative. Of course, by integration by parts, this reduces to the
common definition of differentiability if g turns out to be a regular function.
For example, a Dirac function at 0 is defined by $\int_{-\infty}^{+\infty} \delta_0(x)\, f(x)\, dx = f(0)$, and its
derivative is given by $\int_{-\infty}^{+\infty} \delta_0^{(1)}(x)\, f(x)\, dx = -\int_{-\infty}^{+\infty} \delta_0(x)\, f^{(1)}(x)\, dx = -f^{(1)}(0)$. Successive
differentiation of the Dirac function is possible up to the number of derivatives that the
functions f admit, to yield $\int_{-\infty}^{+\infty} \delta_0^{(q)}(x)\, f(x)\, dx = (-1)^q \int_{-\infty}^{+\infty} \delta_0(x)\, f^{(q)}(x)\, dx = (-1)^q f^{(q)}(0)$.
Besides Dirac functions, many other generalized functions can be constructed; see
Schwartz [1954,1966] or Zemanian [1965] for examples.
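These defining identities can be verified numerically by standing in a narrow Gaussian bump for the Dirac mass (a sketch; the test function f(x) = e^x and the width σ = 0.05 are arbitrary choices, and f'(0) = 1):

```python
import numpy as np

sigma = 0.05                         # width of the smooth stand-in for the Dirac mass
x = np.linspace(-1.0, 1.0, 200001)
dx = x[1] - x[0]

delta = np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
delta_prime = -(x / sigma**2) * delta    # exact derivative of the Gaussian bump

f = np.exp(x)                        # smooth test function, f'(0) = 1

lhs = (delta_prime * f).sum() * dx               # integral of delta0'(x) f(x) dx
rhs = -(delta * np.gradient(f, x)).sum() * dx    # minus integral of delta0(x) f'(x) dx
print(lhs, rhs)   # both approximately -f'(0) = -1
```

Both sides agree with each other and with the exact value up to the small smoothing error of order σ².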
This paper allows functional derivatives $\varphi[F](\cdot)$ to be generalized functions. I will
define an increasing sequence of spaces of generalized functions. Each space will contain
functions of a given "level of unsmoothness." It will first be shown that the asymptotic
distribution of the plug-in depends on the particular space containing $\varphi[F](\cdot)$, and that the
more unsmooth $\varphi[F](\cdot)$ is, the slower the rate of convergence. Furthermore, when $\varphi[F](\cdot)$ is
more unsmooth than cadlag (i.e., when it is a generalized function instead of a regular
function), it will be shown that constructing the estimator on correlated data does not affect
its asymptotic variance.

Start by defining the space $C^{-1}$ of bounded cadlag functions from $[0,1]^d$ to R. $C^{-1}$
contains all the usual spaces $C^0$, $C^1$, etc., of continuous, continuously once-differentiable
functions, etc. The regular functions are the elements of $C^{-1}$. Now define $C^{-2}$ to be the
space of linear combinations of Dirac functions and functions of $C^{-1}$, $C^{-3}$ to be the space of
linear combinations of derivatives of Dirac functions and functions of $C^{-2}$, etc. When
$\varphi[F](\cdot)$ belongs to the generalized function space $C^{-q}$, $q \ge 2$, but not to the space
immediately smaller, $C^{-q+1}$, write $\varphi[F] \in C^{-q} \setminus C^{-q+1}$. q can readily be interpreted as an
"order of unsmoothness" of $\varphi[F](\cdot)$. Moving up the following scale, the functions become
more unsmooth, and conversely:
[Figure: the scale of spaces $\dots, C^1, C^0, C^{-1}$ (regular functions), $C^{-2}, C^{-3}, \dots$ (generalized functions); differentiation moves down the scale toward the generalized functions, integration moves back up toward the regular functions.]
2.4. The Generalized Delta Method
The result is stated in dimension d = 1. An extension to the multivariate case is
provided in the Appendix. $\varphi[F] \in C^{-q} \setminus C^{-q+1}$ has the form:
$$\varphi[F](x) = \sum_{l=1}^{L} \alpha_l[F](x)\, \delta_{y_l}^{(q-2)}(x) + B[F](x)$$
where each $y_l$ is a fixed point, $\alpha_l[F](\cdot) \in C^{-1}$
and $B[F](\cdot) \in C^{-q+1}$ (see Section 5 for examples).
The following delta method characterizes the asymptotic distribution of the plug-in
functional as a function of the particular space C-q where the functional derivative ϕ F[ ] ⋅( )
lies:
Theorem: Suppose that Φ is L(2,m)-Hadamard-differentiable at the
true cdf F with functional derivative $\varphi[F](\cdot)$. Then under A1-A3:

(i) If $\varphi[F] \in C^{-1}$, then under A4(r,m):
$$n^{1/2}\left\{\Phi(\hat F_n) - \Phi(F)\right\} \xrightarrow{d} N\left(0, V_\Phi[F]\right)$$
with asymptotic variance:
$$V_\Phi[F] = VAR\left(\varphi[F](X_t)\right) + 2\sum_{k=1}^{+\infty} COV\left(\varphi[F](X_t),\, \varphi[F](X_{t+k})\right)$$

(ii) If $\varphi[F] \in C^{-q} \setminus C^{-q+1}$ for some $q \in [2,s]$, then under A4(r+1/2, m):
$$h_n^{(2q-3)/2}\, n^{1/2}\left\{\Phi(\hat F_n) - \Phi(F)\right\} \xrightarrow{d} N\left(0, V_\Phi[F]\right)$$
with asymptotic variance:
$$V_\Phi[F] = \int_{-\infty}^{+\infty} K^{(q-2)}(x)^2\, dx\; \sum_{l=1}^{L} \left\{\alpha_l[F](y_l)\right\}^2 f(y_l).$$
The asymptotic variance is the
same whether or not the data are serially dependent.
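A small Monte Carlo makes the last claim of part (ii) concrete (a sketch; the Gaussian AR(1) design with ρ = 0.5, the sample size, and the bandwidth are my choices). The variance of the kernel density estimate at a point is nearly unchanged by the dependence (equal in the limit, up to an O(h) term at finite bandwidth), while the variance of the sample mean, a root-n functional, is inflated by roughly the long-run factor (1+ρ)/(1−ρ) = 3:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, rho, h, y = 2000, 400, 0.5, 0.25, 0.0

def simulate(rho):
    """reps stationary AR(1) paths of length n with N(0,1) marginals (iid if rho=0)."""
    x = np.empty((reps, n))
    x[:, 0] = rng.standard_normal(reps)
    innov = np.sqrt(1.0 - rho**2) * rng.standard_normal((reps, n))
    for t in range(1, n):
        x[:, t] = rho * x[:, t - 1] + innov[:, t]
    return x

def f_hat_at(x, y, h):
    """Gaussian-kernel density estimate at the point y, one value per path."""
    u = (y - x) / h
    return np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

iid, dep = simulate(0.0), simulate(rho)
ratio_density = f_hat_at(dep, y, h).var() / f_hat_at(iid, y, h).var()
ratio_mean = dep.mean(axis=1).var() / iid.mean(axis=1).var()
print(ratio_density, ratio_mean)   # roughly 1 versus roughly 3
```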
When the functional derivative is a regular function (that is, $\varphi[F] \in C^{-1}$), the result
does not depend on how smooth it is beyond being cadlag. On the other hand, when it is a
generalized function, the result depends on the exact degree of unsmoothness of the
functional derivative (belonging to the space $C^{-q}$, but not the one immediately smoother,
$C^{-q+1}$). Many (but not all, e.g., the integrated squared density) of the functionals with
regular derivatives can be estimated without smoothing. In that case, the result is exactly
the same whether a kernel or empirical cdf is plugged in, and there is no reason to smooth.

For functionals with unsmooth derivatives, however, smoothing is essential, as the
plug-in cannot even be defined at the empirical cdf. And the asymptotic distribution is
driven exclusively by the "most unsmooth" component of the functional derivative: the
smoother component $B[F](\cdot) \in C^{-q+1}$ of $\varphi[F] \in C^{-q} \setminus C^{-q+1}$ does not appear in the
asymptotic variance. Such a functional (asymptotically) behaves essentially like a linear
combination of the density or its derivatives that are not integrated upon. When
$\varphi[F] \in C^{-q} \setminus C^{-q+1}$, it can also be noted that the asymptotic variance not only contains no
time-series term, but also has no cross-covariances across the L terms in the functional
derivative. For example, it is known that the kernel density estimates evaluated at a point $y_1$ and at a
different point $y_2$ are asymptotically uncorrelated.
This brings the following remark. The slower-than-root-n functionals have a
"local" character, such as the density evaluated at a point, or the mode of the density
function. Consider for example the density function f(.) and the local functional
$\Phi_y[F] \equiv f(y)$ (real-valued) as opposed to the global functional $\Phi[F] \equiv f(\cdot)$ ($C^s$-valued).
Drawing from the experience of root-n functionals, it may be tempting to try to obtain weak
convergence to a Gaussian process of $\Phi[F] \equiv f(\cdot)$. Unfortunately, no such result holds for
slower-than-root-n functionals. Indeed, if a limiting process existed for the normalized
kernel density estimator, this process would have to take independent values W(t) and
W(s) for every t ≠ s.
The delta method derived here has an intuitive duality interpretation. The asymptotic
distribution of an unsmooth functional Φ is driven by the inner product $\int_{-\infty}^{+\infty} \varphi[F](x)\, dH(x)$.
When $\varphi[F]$ is a generalized function (in $C^{-q}$, $q \ge 2$), then H must belong to $C^{+q-1}$. Therefore
one needs a sufficiently regular nonparametric estimator of the unknown cdf to plug
in as $H = \hat F_n - F$. This is the role played by the kernel smoothing. If one uses the empirical
distribution $F_n$ instead of the kernel cdf $\hat F_n$, then $H = F_n - F$ will be in $C^{-1}$ only, and therefore
the only functionals into which it can be plugged must have derivatives in $C^{-1}$.
3. Consistent Estimation of the Asymptotic Variances
The asymptotic variances given by the delta method can be consistently estimated in
each case:
(i) If $\varphi[F] \in C^{-1}$, then under A4(r,m) and the technical regularity condition A5
(given in the Appendix, and designed to guarantee that the truncated sum in the variance
estimator will effectively approximate the infinite sum), the asymptotic variance $V_\Phi[F]$ can
be consistently estimated by:
$$\hat V_n \equiv \frac{1}{n}\sum_{i=1}^{n} \varphi[\hat F_n](x_i)^2 - \left(\frac{1}{n}\sum_{i=1}^{n} \varphi[\hat F_n](x_i)\right)^2 + 2\sum_{k=1}^{G_n}\left(1 - \frac{k}{G_n+1}\right)\left\{\frac{1}{n}\sum_{i=1}^{n-k} \varphi[\hat F_n](x_i)\,\varphi[\hat F_n](x_{i+k}) - \left(\frac{1}{n}\sum_{i=1}^{n} \varphi[\hat F_n](x_i)\right)^2\right\}$$
where $G_n$ is a truncation lag chosen such that $\lim_{n\to\infty} G_n = +\infty$ and $G_n = O(n^{1/3})$. This is an
estimator of the spectral density at zero (see Newey-West [1987], Robinson [1989,1991]).
The choice of the truncation lag $G_n$ and the Bartlett kernel is subject to the same provisions
and can be improved upon as in Andrews [1991] in the parametric case.
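The truncated, Bartlett-weighted sum can be sketched as follows (the helper name is mine; for readability it is applied to a raw scalar series rather than to the values $\varphi[\hat F_n](x_i)$, and the AR(1) check at the end is my design):

```python
import numpy as np

def newey_west_variance(z, G):
    """Bartlett-weighted (Newey-West) estimate of the long-run variance of a
    scalar series z with truncation lag G, i.e. 2*pi times the spectral
    density at frequency zero."""
    z = np.asarray(z, dtype=float) - np.mean(z)
    n = len(z)
    v = np.sum(z**2) / n
    for k in range(1, G + 1):
        w = 1.0 - k / (G + 1.0)                  # Bartlett weight
        v += 2.0 * w * np.sum(z[:-k] * z[k:]) / n
    return v

# Check on an AR(1): x_t = rho*x_{t-1} + e_t has long-run variance 1/(1-rho)^2
rng = np.random.default_rng(2)
rho, n = 0.5, 20000
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = rho * x[t - 1] + e[t]

G = int(n ** (1 / 3))          # truncation lag G_n = O(n^{1/3}) as in the text
lrv = newey_west_variance(x, G)
print(lrv)                     # approximately 1/(1-0.5)^2 = 4
```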
(ii) If $\varphi[F] \in C^{-q} \setminus C^{-q+1}$, then under A4(r+1/2, m) the asymptotic variance $V_\Phi[F]$
can be consistently estimated by:
$$\hat V_n \equiv \int_{-\infty}^{+\infty} K^{(q-2)}(x)^2\, dx\; \sum_{l=1}^{L}\left\{\alpha_l[\hat F_n](y_l)\right\}^2 \hat f_n(y_l).$$
The appropriate estimate of the asymptotic variance makes it possible to construct
confidence intervals on $\Phi[\hat F_n]$ and carry out tests of general hypotheses regarding $\hat F_n$. For
example, to test the hypothesis $H_0: \Phi[F] = 0$ versus $H_1: \Phi[F] \neq 0$, one could simply use
the following Wald-type test statistic:
$$W_n \equiv \lambda(n)\,\Phi(\hat F_n)'\, \hat V_n^{-1}\, \Phi(\hat F_n) \xrightarrow{d} \chi^2_{[1]} \text{ under } H_0,$$
where $\lambda(n) \equiv n$ in case (i) and $\lambda(n) \equiv n\, h_n^{2q-3}$ in case (ii).
4. Rates of Convergence
The speed of decrease of the bandwidth to zero as the sample size increases is
constrained by A4. The bandwidth can be chosen within the bounds allowed by A4 in
order to generate the fastest possible rate of convergence β (the speed of convergence being $n^{-\beta}$).
(i) If $\varphi[F] \in C^{-1}$, then the plug-in will converge at rate β = 1/2. The root-n rate is
achieved by kernel plug-ins under A3 no matter how $h_n$ is chosen within A4, and will
produce an asymptotic distribution centered at zero.

(ii) If $\varphi[F] \in C^{-q} \setminus C^{-q+1}$ for some $q \in [2,s]$, then the rate of convergence is at best
$\beta = \left(r - (q-2)\right)/(2r+1)$. It can be achieved by kernel plug-ins under A3 when choosing
$h_n$ of the order $n^{-\alpha}$, with $\alpha = 1/(2r+1)$. The resulting asymptotic distribution of the plug-
in will not be centered at zero. For any ε > 0, the rate of convergence β − ε can however be
achieved with a resulting asymptotic distribution centered at zero, by choosing $h_n$ of the
order $n^{-\alpha}$ with $\alpha = 1/(2r+1) + 2\varepsilon/(2q-3)$. This choice is admissible under
A4(r+1/2,m). Given the optimal rates of Stone [1980] and Goldstein and Messer [1992], it
therefore turns out that the kernel-type estimators can achieve the optimal rate (but if one
insists on getting the optimal rate, then the limiting distribution is not centered at zero).
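A small helper makes the bandwidth arithmetic concrete (the function name is mine, and the formulas it encodes should be read as my rendering of this section's rate expressions): with a second-order kernel (r = 2) and pointwise density estimation (q = 2), the best rate is β = r/(2r+1) = 2/5, the classical Stone [1980] rate, attained with $h_n$ of order $n^{-1/5}$.

```python
def best_rate(r, q):
    """Fastest rate exponent beta (convergence at n^{-beta}) and bandwidth
    exponent alpha (h_n of order n^{-alpha}) for a plug-in whose functional
    derivative lies in C^{-q}\\C^{-q+1}, using a kernel of order r.
    (Assumption: formulas as stated in Section 4.)"""
    beta = (r - (q - 2)) / (2 * r + 1)
    alpha = 1 / (2 * r + 1)
    return beta, alpha

# Pointwise density estimation (q = 2) with a second-order kernel (r = 2):
print(best_rate(2, 2))   # (0.4, 0.2): the n^{-2/5} rate with h_n ~ n^{-1/5}
```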
5. Examples and Applications
Classical examples as well as new distributions are provided in this section to both
show how the method can very easily yield classical results and provide new results.
Example 1: Ordinary Least Squares
The following trivial example illustrates the method in a very simple case,
recovering the asymptotic distribution of classical parametric estimators. Consider a simple
linear model: $y_t = x_t \beta + \varepsilon_t$, $E[\varepsilon_t \mid x_t] = 0$. Although at first sight a quintessentially parametric model, the linear regression model in fact makes no assumptions whatsoever
regarding the distribution of the disturbances (other than uncorrelatedness with the
regressors). In that sense, the OLS estimator can be treated as a nonparametric estimator.
OLS estimates the functional:
$$\beta = \frac{E[XY]}{E[X^2]} \equiv \Phi(F) = \frac{\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x\, y\, f(x,y)\, dx\, dy}{\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x^2\, f(x,y)\, dx\, dy}$$
by plugging into this expression the empirical cdf $F_n$:
$$\hat\beta_{OLS} \equiv \Phi(F_n) = \frac{\frac{1}{n}\sum_{t=1}^{n} y_t\, x_t}{\frac{1}{n}\sum_{t=1}^{n} x_t^2}$$
Now compute the functional derivative of Φ:
$$\Phi[F+H] = \frac{\iint x y\, \{f + h\}}{\iint x^2\, \{f + h\}} = \frac{\iint x y\, f}{\iint x^2 f} + \frac{\iint x y\, h}{\iint x^2 f} - \frac{\left(\iint x y\, f\right)\left(\iint x^2\, h\right)}{\left(\iint x^2 f\right)^2} + O\left(\|H\|_{L(2,1)}^2\right)$$
$$= \Phi[F] + \iint \varphi[F](x,y)\, h(x,y)\, dx\, dy + O\left(\|H\|_{L(2,1)}^2\right)$$
So the functional $F \mapsto \Phi(F) = \beta$ is L(2,1)-Hadamard-differentiable at F, and its
derivative is:
$$\varphi[F](u,v) = \frac{1}{E[X^2]}\left\{u\, v - \frac{E[XY]}{E[X^2]}\, u^2\right\} \in C^{-1}.$$
Theorem (i) gives the asymptotic
distribution of the plug-in (using either the empirical or the kernel estimator of F) for
$\varphi[F] \in C^{-1}$: $n^{1/2}\left\{\Phi(\hat F_n) - \Phi(F)\right\} \xrightarrow{d} N\left(0, V_\Phi[F]\right)$, with asymptotic variance given by:
$$V_\Phi[F] = VAR\left(\varphi[F](y_t, x_t)\right) + 2\sum_{k=1}^{+\infty} COV\left(\varphi[F](y_t, x_t),\, \varphi[F](y_{t+k}, x_{t+k})\right).$$
Replacing the functional derivative ϕ[F] by its expression and yt-xtβ by εt, it is
easy to check that this expression is equal to the classical OLS asymptotic variance
$$V_{OLS} \equiv \left(E[x_t^2]\right)^{-2}\left\{E\left[x_t^2\, \varepsilon_t^2\right] + 2\sum_{k=1}^{+\infty} E\left[\varepsilon_t\, \varepsilon_{t+k}\, x_t\, x_{t+k}\right]\right\}.$$
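This equality is easy to confirm by simulation (a sketch with an arbitrary iid Gaussian design; the derivative $\varphi[F](u,v) = \{uv - (E[XY]/E[X^2])\,u^2\}/E[X^2]$ is evaluated with sample moments in place of the population ones):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, beta = 500, 2000, 1.5

# Monte Carlo variance of sqrt(n)*(beta_hat - beta)
b = np.empty(reps)
for i in range(reps):
    x = rng.standard_normal(n)
    y = beta * x + rng.standard_normal(n)
    b[i] = (x * y).sum() / (x**2).sum()
mc_var = n * b.var()

# Delta-method variance: VAR(phi[F](y_t, x_t)) estimated on one large sample
x = rng.standard_normal(100000)
y = beta * x + rng.standard_normal(100000)
Ex2, Exy = (x**2).mean(), (x * y).mean()
phi = (x * y - (Exy / Ex2) * x**2) / Ex2    # the functional derivative values
delta_var = phi.var()

print(mc_var, delta_var)   # both close to sigma^2/E[X^2] = 1 in this design
```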
Example 2: Least Absolute Deviations
Consider again the simple linear model $y_t = x_t\beta + \varepsilon_t$ where x is K-variate and β is an
unknown K-dimensional parameter vector. The identification assumption on
$\{\varepsilon_t / t = 1,\dots,T\}$ now consists of being independent of $\{x_t / t = 1,\dots,T\}$ and having zero
median. Let $f_\varepsilon$ be the marginal density of the disturbances ε. The least absolute deviation
(LAD) estimator is defined by (with the minimum taken over a compact set):
$$\hat\beta_{LAD} \equiv \arg\min_\beta\; \frac{1}{T}\sum_{t=1}^{T} \left|y_t - x_t\beta\right|.$$
The first order condition is $\frac{1}{T}\sum_{t=1}^{T} sign\left(y_t - x_t\hat\beta_{LAD}\right) x_t \equiv 0$, where sign(a) = -1 if
a < 0 and +1 if a > 0.
Example 3: Censored Least Absolute Deviations
Powell [1984] extended the LAD estimator to the case where the dependent variable
is censored, i.e., only $y_t \equiv \max\left\{0, x_t\beta + \varepsilon_t\right\}$ is observed. This case is typical of situations
arising in a labor supply context. In that case, consider the CLAD estimator:
$$\hat\beta_{CLAD} \equiv \arg\min_\beta\; \frac{1}{T}\sum_{t=1}^{T} \left|y_t - \max\left\{0, x_t\beta\right\}\right|$$
Regularity conditions guaranteeing the identifiability of β, and the existence and unicity
(asymptotically) of a solution, are given by Powell [1984], and are also assumed here. The
population first order condition is $E\left[sign\left(y_t - \max\{0, x_t\beta\}\right)\, \mathbf{1}\left(x_t\beta > 0\right) x_t\right] = 0$; let $\Phi(F) \equiv \beta$,
so $\hat\beta_{CLAD} = \Phi(F_n)$. Now:
$$\varphi_{CLAD}[F](x,y) = \frac{1}{2}\, f_\varepsilon(0)^{-1}\, B^{-1}\, sign\left(y - \max\{0, x\beta\}\right)\, \mathbf{1}\left(x\beta > 0\right)\, x' \in C^{-1}$$
with $B \equiv E\left[\mathbf{1}\left(x_t\beta > 0\right) x_t'\, x_t\right]$. Thus $n^{1/2}\left\{\hat\beta_{CLAD} - \beta\right\} \xrightarrow{d} N\left(0, \left(2 f_\varepsilon(0)\right)^{-2} B^{-1} W B^{-1}\right)$, where:
$$W \equiv VAR\left(sign(\varepsilon_t)\,\mathbf{1}(x_t\beta > 0)\,x_t\right) + \sum_{k=1}^{+\infty}\Big\{COV\left(sign(\varepsilon_t)\,\mathbf{1}(x_t\beta > 0)\,x_t,\; sign(\varepsilon_{t+k})\,\mathbf{1}(x_{t+k}\beta > 0)\,x_{t+k}\right)$$
$$+\; COV\left(sign(\varepsilon_{t+k})\,\mathbf{1}(x_{t+k}\beta > 0)\,x_{t+k},\; sign(\varepsilon_t)\,\mathbf{1}(x_t\beta > 0)\,x_t\right)\Big\}$$
This asymptotic distribution for dependent data appears to be new.
Example 4: Integrated Functionals
Consider next the family of real-valued functionals of the following form, where
ω(.) is a trimming function:
$$\Phi(F) \equiv \int_{-\infty}^{+\infty} \omega(x)\, \psi\left(x, F^{(1)}(x), F^{(2)}(x), \dots, F^{(m)}(x)\right) dx.$$
This class
includes the information matrix giving the asymptotic variance of maximum likelihood
estimators, the entropy measure, the average derivative estimators of Powell, Stock and
Stoker [1989] and Robinson [1989], the integral of the squared density, etc. The functional
derivative is:
$$\varphi[F](x) = \sum_{q=1}^{m} (-1)^{q-1}\, \frac{\partial^{q-1}}{\partial x^{q-1}}\!\left[\omega(x)\, \frac{\partial \psi}{\partial F^{(q)}}\left(x, F^{(1)}(x), \dots, F^{(m)}(x)\right)\right] \in C^{-1},$$
so the
plug-in will converge at rate root-n and have an asymptotic distribution sensitive to
dependent data.
Example 5: Pointwise Estimation
Consider the classical example $\Phi_q[F] \equiv F^{(q)}(y)$, a derivative of the cdf evaluated at
y. Then if q = 0, $\varphi_0[F](x) = \mathbf{1}\left(x \le y\right) \in C^{-1}$, while $\varphi_q[F](x) = \delta_y^{(q-1)}(x) \in C^{-q-1} \setminus C^{-q}$, hence
for $q \ge 1$:
$$h_n^{(2q-1)/2}\, n^{1/2}\left\{\hat F_n^{(q)}(y) - F^{(q)}(y)\right\} \xrightarrow{d} N\left(0, \int_{-\infty}^{+\infty}\left|K^{(q-1)}(x)\right|^2 dx\; f(y)\right).$$
The extension to multivariate data is immediate given the multivariate result in the Appendix.
Example 6: Smooth Quantiles
Take $\Phi[F] \equiv F^{-1}[y]$ for some y. In the independent case, smooth estimation of
quantiles has been studied, e.g., by Parzen [1979] and Silverman and Young [1987]. Here
the functional derivative can be computed as:
$$\varphi[F](x) = -\frac{1}{F^{(1)}\left[F^{-1}(y)\right]}\, \mathbf{1}\left(x \le F^{-1}(y)\right) \in C^{-1}$$
so the plug-in will converge at rate root-n and its asymptotic distribution will have time-dependent terms.
Letting $f_k$ be the joint density of observations at lag k, the asymptotic variance is:
$$V_\Phi[F] = \frac{y(1-y) + 2\sum_{k=1}^{+\infty}\int_{-\infty}^{F^{-1}(y)}\int_{-\infty}^{F^{-1}(y)}\left\{f_k(s,t) - f(s)\,f(t)\right\} ds\, dt}{\left(F^{(1)}\left[F^{-1}(y)\right]\right)^2}$$
This result also appears to be new. Weak convergence of the quantile process to a
Gaussian process is proved using the same method (see Aït-Sahalia [1993]).
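A smooth quantile estimate can be sketched by inverting the kernel cdf on a grid (assumptions mine: a logistic kernel, whose cdf is the numpy-friendly sigmoid, N(0,1) data, and y = 0.975, so the target is approximately 1.96):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(10000)
h = 0.08                                   # arbitrary small bandwidth

# Kernel cdf: average of kernel cdfs centered at the data points
grid = np.linspace(-4.0, 4.0, 401)
u = np.clip((grid[:, None] - x[None, :]) / h, -60.0, 60.0)
F_hat = (1.0 / (1.0 + np.exp(-u))).mean(axis=1)   # logistic-kernel cdf, monotone

q_hat = np.interp(0.975, F_hat, grid)      # invert the smooth cdf at y = 0.975
print(q_hat)    # close to the true N(0,1) quantile 1.96
```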
Example 7: Mode
The mode of a unimodal univariate density, studied by Parzen [1962] for i.i.d.
data, can be obtained by the following functional: $\Phi(F) \equiv [F^{(2)}]^{-1}(0)$, that is, the point at
which the derivative of the density is zero. The functional derivative can be computed here:
$$\varphi[F](x) = -\frac{1}{F^{(3)}\left[[F^{(2)}]^{-1}(0)\right]}\, \delta^{(1)}_{[F^{(2)}]^{-1}(0)}(x) \in C^{-3} \setminus C^{-2},$$
so it follows that:
$$h_n^{3/2}\, n^{1/2}\left\{[\hat F_n^{(2)}]^{-1}(0) - [F^{(2)}]^{-1}(0)\right\} \xrightarrow{d} N\left(0,\; \int_{-\infty}^{+\infty}\left|K^{(1)}(x)\right|^2 dx\; \frac{F^{(1)}\left([F^{(2)}]^{-1}(0)\right)}{\left(F^{(3)}\left([F^{(2)}]^{-1}(0)\right)\right)^2}\right).$$
This result appears to be new for dependent data.
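The estimator behind this example is simply the argmax of the kernel density estimate (a sketch; the sample size, bandwidth, and grid are my choices, with true mode 0):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(4000)     # unimodal density with true mode 0
h = 0.3

grid = np.linspace(-3.0, 3.0, 1201)
u = (grid[:, None] - x[None, :]) / h
f_hat = np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

mode_hat = grid[np.argmax(f_hat)]  # plug-in mode: the argmax of the kernel density
print(mode_hat)   # close to 0, with fluctuations of order (n*h^3)^(-1/2)
```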
Example 8: Hazard Rate
Consider $\Phi[F] \equiv \frac{F^{(1)}(y)}{1-F(y)}$ for some fixed y. Its kernel estimation has been studied
by Roussas [1989]. Hazard rates are typically useful in unemployment studies. Here the
derivative can easily be computed:
$$\varphi[F](x) = \frac{\delta_y(x)}{1-F(y)} - \frac{F^{(1)}(y)}{\left(1-F(y)\right)^2}\,\mathbf{1}\left(x \ge y\right) \in C^{-2} \setminus C^{-1},$$
and therefore:
$$h_n^{1/2}\, n^{1/2}\left\{\frac{\hat f_n(y)}{1-\hat F_n(y)} - \frac{f(y)}{1-F(y)}\right\} \xrightarrow{d} N\left(0,\; \int_{-\infty}^{+\infty}\left|K(x)\right|^2 dx\; \frac{f(y)}{\left(1-F(y)\right)^2}\right).$$
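A direct sketch of the hazard plug-in (my design: exponential(1) data, so the true hazard is identically 1; for simplicity the survival term uses the empirical cdf, while the kernel density term is the unsmooth part that drives the asymptotics):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.exponential(1.0, size=5000)   # true hazard f(y)/(1-F(y)) = 1 for every y
h, y = 0.2, 1.0

# Kernel density estimate at y (Gaussian kernel) over empirical survival at y
f_hat = np.exp(-0.5 * ((y - x) / h) ** 2).mean() / (h * np.sqrt(2.0 * np.pi))
S_hat = (x > y).mean()                # empirical 1 - F_n(y)

hazard_hat = f_hat / S_hat
print(hazard_hat)   # close to the true hazard 1.0
```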
Example 9: Regression Function
The Nadaraya-Watson method relies on a kernel plug-in to estimate
$$\Phi[F] \equiv E\left[Z \mid Y = y\right] = \int_{-\infty}^{+\infty} z\, f(y,z)\, dz \Big/ \int_{-\infty}^{+\infty} f(y,z)\, dz.$$
The asymptotic distribution of
Robinson [1983] and Bierens [1985] can be recovered by computing the functional
derivative:
$$\varphi[F](w,z) = \frac{z - a[F]}{b[F]}\, \delta_y(w) \in C^{-2} \setminus C^{-1}$$
where $a[F] \equiv E\left[Z \mid Y = y\right]$ and $b[F] \equiv \int_{-\infty}^{+\infty} f(y,z)\, dz$. Hence for $\varepsilon \equiv Z - E\left[Z \mid Y = y\right]$ and k regressors in Y:
$$h_n^{k/2}\, n^{1/2}\left\{\Phi(\hat F_n) - \Phi(F)\right\} \xrightarrow{d} N\left(0,\; \frac{E\left[\varepsilon^2 \mid Y = y\right]\, \int_{-\infty}^{+\infty}\left[K(w)\right]^2 dw}{\int_{-\infty}^{+\infty} f(y,z)\, dz}\right)$$
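The Nadaraya-Watson estimator in this example is the ratio of two kernel sums (a sketch; the regression function cos(y), the noise level, and the bandwidth are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n, h, y0 = 4000, 0.2, 0.0
Y = rng.standard_normal(n)
Z = np.cos(Y) + 0.1 * rng.standard_normal(n)   # E[Z | Y = y] = cos(y)

w = np.exp(-0.5 * ((y0 - Y) / h) ** 2)   # Gaussian kernel weights around y0
m_hat = (w * Z).sum() / w.sum()          # ratio of the two kernel plug-ins

print(m_hat)   # close to E[Z | Y = 0] = cos(0) = 1
```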
6. Conclusions
This paper has extended the delta method to nonparametric estimators of unsmooth
functionals. The regularity conditions are simple, easily verifiable and generally equal or
beat conditions used in case-by-case studies. Generalized derivatives were allowed to
permit the inclusion of virtually any functional, global or pointwise, explicitly or implicitly
defined. It was found here that both the rate of convergence to the asymptotic distribution
and the asymptotic variance were functions of the unsmoothness of the functional
derivative. Basing the estimator on dependent data modifies the asymptotic distribution
only if the functional is more irregular than some threshold level (cadlag). New functional
derivatives were computed for a variety of practical estimators used in econometrics, and
used to obtain their asymptotic distributions straightforwardly.
Compared to the case-by-case approach, the generalized delta method has another
advantage. It isolates the computation of the functional derivative, which is computed once
and for all. When considering dependent sequences, or nonparametric estimation strategies
other than kernel-based, the exact same functional derivative will be needed. The kernel
results of this paper could therefore potentially be extended to cover other nonparametric
methods. Many popular nonparametric procedures for density estimation are indeed of the
form $\hat f_n(u) \equiv \frac{1}{n}\sum_{i=1}^{n} K_n\left(u, x_i\right)$. For example, the kernel method sets
$K_n\left(u, x_i\right) \equiv \frac{1}{h_n^d}\, K\!\left(\frac{u - x_i}{h_n}\right)$ with a fixed function K(.), while the orthogonal function
method is based on $K_n\left(u, x_i\right) \equiv \sum_{j=1}^{h_n^{-1}} p_j(u)\, p_j(x_i)$ where $\{p_j\}$ is a system of orthogonal
functions in $L^2(R^d)$.
References
AÔt-Sahalia, Y., [1993], "Nonparametric Functional Estimation with Applications toFinancial Models," Ph.D. Thesis, MIT, May.
Andrews, D.W.K., [1991], "Heteroskedasticity and Autocorrelation ConsistentCovariance Matrix Estimation," Econometrica, Vol. 59, No. 3, 817-858.
Arcones, M.A., and Yu, B., [1992], "Central Limit Theorems for Empirical and U-Processes of Stationary Mixing Sequences," MSRI Mimeo, U.C. Berkeley.
Bierens, H.J., [1985], "Kernel Estimators of Regression Functions," in Bewley, T.F.,ed., Advances in Econometrics, Fifth World Congress, Vol. I, Econometric SocietyMonographs, Cambridge University Press, Cambridge, England.
Billingsley, P., [1968], Convergence of Probability Measures, Wiley, New-York.
Dudley, R.M., [1990], "Nonlinear Functionals of Empirical Measures and theBootstrap," in Probability in Banach Spaces 7, ed. E. Eberlein et al., Progress inProbability 21, 63-82, Birh‰user, Boston.
Fernholz, L.T., [1983], Von Mises Calculus for Statistical Functionals, LectureNotes in Statistics 19, Springer-Verlag.
Gill, R.D., [1989], "Non- and Semi-parametric Maximum Likelihood Estimators and thevon Mises Method (Part 1)," Scandinavian Journal of Statistics, Vol. 16, 97-128.
Goldstein, L. and Messer, P., [1992], "Optimal Plug-In Estimators forNonparametric Functional Estimation," Annals of Statistics, Vol. 20, 1306-1328.
Gyˆrfi, L., H‰rdle, W., Sarda, P. and Vieu, P., [1989], Nonparametric CurveEstimation from Time Series, Lecture Notes in Statistics 60, Springer-Verlag.
Masry, E., [1989], "Nonparametric Estimation of Conditional Probability Densities andExpectations of Stationary Processes: Strong Consistency and Rates," StochasticProcesses and their Applications, 32, 109-127.
von Mises, R., [1947], "On the Asymptotic Distribution of Differentiable StatisticalFunctions," Annals of Mathematical Statistics, Vol. 18, 309-348.
Newey, W.K. and West, K.D., [1987], "A Simple, Positive Semi-Definite,Heteroskedasticity and Autocorrelation Consistent Covariance Matrix," Econometrica,Vol. 55, No. 3, 703-708.
Pakes, A. and Pollard, D., [1989], "Simulation and the Asymptotics of OptimizationEstimators," Econometrica, Vol. 57, No. 5, 1025-1057.
Parzen, E., [1962], "On Estimation of a Probability Density Function and the Mode,"Annals of Mathematical Statistics, 33, 1065-1076.
-
19
------------, [1979], "Nonparametric Statistical Data Modeling," Journal of theAmerican Statistical Association, 74, 105-131.
Phillips, P.C.B., [1991], "A Shortcut to LAD Estimator Asymptotics," EconometricTheory, 7, 450-463.
Pollard, D., [1990], "Asymptotics for Least Absolute Deviation Estimator,"Econometric Theory, 6.---------------, [1984], Convergence of Stochastic Processes, Springer, New-York.
Powell, J.L., [1984], "Least Absolute Deviations Estimation for the CensoredRegression Model," Journal of Econometrics, Vol. 25, 303-325.Powell, J.L., Stock, J.H. and Stoker, T.M., [1989], "Semiparametric Estimationof Index Coefficients," Econometrica, Vol. 57, No. 6, 1403-1430.
Reeds, J.A.III, [1976], "On the Definition of Von Mises functionals," Ph.D. Thesis,Harvard University, Department of Statistics.
Robinson, P.M., [1991], "Automatic Frequency Domain Inference on Semiparametricand Nonparametric Models," Econometrica, Vol. 59, No. 5, 1329-1363.-----------------, [1989], "Hypothesis Testing in Semiparametric and NonparametricModels for Econometric Time Series," Review of Economic Studies, Vol. 56, 511-534.---------------, [1988], "Root-N Consistent Semiparametric Regression,"Econometrica, Vol. 56, 931-954.-----------------, [1984], "Robust Nonparametric Autoregression," in Robust andNonlinear Time Series Analysis, Franke, H‰rdle and Martin eds., Lecture Notes inStatistics 26, Springer-Verlag, Heidelberg.-----------------, [1983], "Nonparametric Estimators for Time Series," Journal of TimeSeries Analysis, 4, 185-207.
Rosenblatt, M., [1991], Stochastic Curve Estimation, NSF-CMBS RegionalConference Series in Probability and Statistics, Vol. 3, Institute of MathematicalStatistics, Hayward.-----------------, [1971], "Curve Estimates," Annals of Mathematical Statistics, Vol.42, 1815-1842.
Roussas, G., [1969], "Nonparametric Estimation in Markov Processes," Annals of theInstitute of Statistical Mathematics, Vol. 21, 73-87.-------------, [1989], "Hazard Rate Estimation Under Dependence Conditions," Journalof Statistical Planning and Inference, 22, 81-93.
Schwartz, L., [1954,1966], Theory of Distributions, Hermann, Paris.
Stone, C.J., [1980], "Optimal Convergence Rates for Nonparametric Estimators,"Annals of Statistics, Vol. 8, No. 6, 1348-1360.
Zemanian, A.H., [1965], Distribution Theory and Transform Analysis, McGraw-Hill, New York.
Appendix
The statement of the Theorem in the multivariate case is the following:
(i) If $\varphi[F] \in C^{-1}$, the result reads the same in dimension $d$;

(ii) If $\varphi[F] \in C^{-q} \setminus C^{-q+1}$: let
$$\varphi[F](x) = \sum_{\ell=1}^{L} \alpha_\ell[F](x)\,\partial^{\Delta_\ell}\delta_{y_\ell}(x_\ell) + B[F](x)$$
where $\Delta_\ell = q-2$, $\alpha_\ell[F](\cdot) \in C^{-1}$ and $B[F](\cdot) \in C^{-q+1}$. Here $y_\ell \in R^{d(\ell)}$ contains $d(\ell)$ components, and $x = (x_\ell, x_{-\ell})$ is partitioned accordingly. The maximal number of variables affected by the Dirac mass is $d^* \equiv \max\{d(\ell),\ \ell \in \{1,\dots,L\}\}$ and is attained at $L^* \equiv \{\ell \in \{1,\dots,L\} : d(\ell) = d^*\}$.

Then under A4($r+d^*/2$, $m$):
$$h_n^{(d^*+2q-4)/2}\, n^{1/2}\,\{\Phi(\hat F_n) - \Phi(F)\} \to N\big(0,\ V_\Phi[F]\big)$$
where:
$$V_\Phi[F] = \sum_{\ell \in L^*}\Big[\int_{-\infty}^{+\infty}\big(K_\ell^{(\Delta_\ell)}(u_\ell)\big)^2\, du_\ell\Big]\int_{-\infty}^{+\infty}\big\{\partial^{\Delta_\ell}\alpha_\ell[F](y_\ell, t_{-\ell})\big\}^2\, f(y_\ell, t_{-\ell})\, dt_{-\ell}$$
and
$$K_\ell^{(\Delta_\ell)}(\cdot) \equiv \int_{-\infty}^{+\infty}\big(\partial^{\Delta_\ell} K\big)(\cdot, v_{-\ell})\, dv_{-\ell}.$$
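For intuition on the marginal kernel $K_\ell$ (here with $\Delta_\ell = 0$), the sketch below — a hypothetical helper, not part of the paper — integrates a bivariate product Gaussian kernel over the component outside the block $\ell$; for a product kernel the marginal is simply the univariate factor.

```python
import numpy as np

def marginal_kernel(K, u_l, lim=8.0, n_grid=4001):
    # K_l(u_l) = integral of K(u_l, v) over v (the components not in l),
    # approximated by a Riemann sum on a wide truncated grid.
    v = np.linspace(-lim, lim, n_grid)
    return np.sum(K(u_l, v)) * (v[1] - v[0])

# Bivariate product Gaussian kernel (d = 2, one component in the block l).
K = lambda u, v: np.exp(-(u ** 2 + v ** 2) / 2.0) / (2.0 * np.pi)

# Marginalizing the product kernel recovers the univariate Gaussian factor.
print(marginal_kernel(K, 0.7))  # ~ 0.3123, the univariate Gaussian density at 0.7
```

The same quadrature idea extends to higher $\Delta_\ell$ by differentiating the kernel before integrating out $v_{-\ell}$.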
Proof of Theorem: The following two lemmas will be used:
Lemma 1 (Central Limit Theorem): Under A1-A4($\cdot$,0), $\hat{\hat A}_n \equiv n^{1/2}\big(\hat F_n - E[\hat F_n]\big)$ converges in law to a Gaussian $C^0$-stochastic process $G_F$ in the space $(C^0, \|\cdot\|_{L^\infty})$, with finite-dimensional covariances given below. If A4($\cdot$,0) is replaced by the more stringent requirement A4(r,0), then the preceding statements hold for the centered process $\hat A_n \equiv n^{1/2}(\hat F_n - F)$ instead of $\hat{\hat A}_n \equiv n^{1/2}\big(\hat F_n - E[\hat F_n]\big)$. The covariance kernel of the generalized Brownian Bridge $G_F \equiv \tilde B \circ F$ is given by (where $F_k$ is the joint cdf of observations at lag $k$):
$$E\big[\tilde B(F(s))\,\tilde B(F(t))\big] = F(\min(s,t)) - F(s)F(t) + \sum_{k=1}^{+\infty}\big\{F_k(s,t) + F_k(t,s) - 2F(s)F(t)\big\}.$$
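This covariance kernel has a direct sample analogue: replace $F$ and $F_k$ by means of indicator variables and truncate the infinite lag sum. The sketch below (hypothetical helper name; a fixed-lag truncation is a simplification of what a careful implementation would use) estimates the kernel for a scalar dependent series.

```python
import numpy as np

def cdf_cov_kernel(x, s, t, max_lag=20):
    # Sample analogue of E[B~(F(s)) B~(F(t))]:
    #   F(min(s,t)) - F(s)F(t) + sum_k { F_k(s,t) + F_k(t,s) - 2 F(s)F(t) },
    # with the infinite lag sum truncated at max_lag.
    Is = (x <= s).astype(float)
    It = (x <= t).astype(float)
    Fs, Ft = Is.mean(), It.mean()
    cov = (Is * It).mean() - Fs * Ft        # F(min(s,t)) - F(s)F(t)
    for k in range(1, max_lag + 1):
        Fk_st = (Is[:-k] * It[k:]).mean()   # analogue of F_k(s, t)
        Fk_ts = (It[:-k] * Is[k:]).mean()   # analogue of F_k(t, s)
        cov += Fk_st + Fk_ts - 2.0 * Fs * Ft
    return cov

# AR(1) sample: positive serial dependence inflates the covariance kernel.
rng = np.random.default_rng(0)
e = rng.standard_normal(5000)
x = np.empty_like(e)
x[0] = e[0]
for i in range(1, len(e)):
    x[i] = 0.5 * x[i - 1] + e[i]
print(cdf_cov_kernel(x, 0.0, 0.0))  # typically well above the iid value F(0)(1-F(0)) = 0.25
```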
Lemma 2 (Bounds for Remainder Term): Under A1-A3, for q = 0,...,d+s:
$$\|\hat F_n - E[\hat F_n]\|_{L^\infty,q} = O_p\big(n^{-1/2} h_n^{-q}\big) \quad\text{and}\quad \|E[\hat F_n] - F\|_{L^\infty,q} = O\big(h_n^{r-(q-d)}\big).$$
To prove Lemma 1, show that the class of functions $\Gamma \equiv \{W_{y,h},\ y \in R^d,\ h \in R^{+*}\}$, where $W_{y,h}(\cdot) \equiv \int_{-\infty}^{y} h^{-d} K\big((t-\cdot)/h\big)\, dt$, forms a subgraph VC class. But such a class is a Euclidean class (from Lemma (2.12) in Pakes and Pollard [1989]); conclude with Theorem 1 of Arcones and Yu [1992]. Alternatively one could use the U-statistics approach of Robinson [1989]. Lemma 2 is easy given A2. Details are in Aït-Sahalia [1993].
(i) Consider now the first part of the Theorem. By differentiability of the functional $\Phi$ at $F$:
$$n^{1/2}\big\{\Phi[\hat F_n] - \Phi[F]\big\} = \int_{-\infty}^{+\infty} \varphi[F](x)\, d\hat A_n(x) + R_\Phi[F, \hat A_n],$$
where $R_\Phi[F, \hat A_n] = O\big(n^{1/2}\|\hat F_n - F\|^2_{L^\infty,m}\big)$. First, $R_\Phi[F, \hat A_n] = o_p(1)$ follows from Lemma 2 since $n^{1/2}\|\hat F_n - F\|^2_{L^\infty,m} = O_p\big(n^{1/2}(n^{-1}h_n^{-2m} + h_n^{2(r-(m-d))})\big)$ is $o_p(1)$ under A4(r,m) as $r > 2(m-d)$. Then by Slutsky's Theorem, the distribution of $n^{1/2}\{\Phi[\hat F_n] - \Phi[F]\}$ is given by that of $\int_{-\infty}^{+\infty} \varphi[F](x)\, d\hat A_n(x)$.
But by the continuous mapping theorem (e.g., Proposition 9.3.7 in Dudley [1989]), $\int_{-\infty}^{+\infty} \varphi[F](x)\, d\hat A_n(x)$ converges in law to $\int_{-\infty}^{+\infty} \varphi[F](x)\, d\tilde B(F(x))$ since from Lemma 1 $\hat A_n$ converges in law to the process $G_F \equiv \tilde B \circ F$. $\int_{-\infty}^{+\infty} \varphi[F](x)\, d\tilde B(F(x))$ is the Itô integral of the real-valued, non-random function $\varphi[F]$ with respect to the Gaussian stochastic process $\tilde B \circ F$ and is therefore normally distributed. The asymptotic variance of the generic Itô integral $\int_{-\infty}^{+\infty} \omega(x)\, d\tilde B(F(x))$ can be computed as:
$$E\Big[\Big(\int_{-\infty}^{+\infty}\omega(x)\, d\tilde B(F(x))\Big)^2\Big] = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty}\omega(x)\,\omega(y)\, E\big[d\tilde B(F(y))\, d\tilde B(F(x))\big]$$
$$= \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty}\omega(x)\,\omega(y)\,\frac{\partial^{2d} E\big[\tilde B(F(y))\,\tilde B(F(x))\big]}{\partial x_1\cdots\partial x_d\,\partial y_1\cdots\partial y_d}\, dy\, dx.$$
Thus:
$$V_\Phi[F] = E\big[\varphi[F](x_t)^2\big] - E\big[\varphi[F](x_t)\big]^2 + 2\sum_{k=1}^{+\infty}\Big\{E\big[\varphi[F](x_t)\,\varphi[F](x_{t+k})\big] - E\big[\varphi[F](x_t)\big]\,E\big[\varphi[F](x_{t+k})\big]\Big\}.$$
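In other words, in the regular case $V_\Phi[F]$ is the long-run variance of the stationary sequence $\varphi[F](x_t)$. A minimal plug-in sketch (hypothetical helper name; a plain truncated, unweighted lag sum, where a careful implementation would add a lag window):

```python
import numpy as np

def long_run_variance(phi_vals, max_lag):
    # Sample analogue of
    #   Var(phi(x_t)) + 2 * sum_{k>=1} Cov(phi(x_t), phi(x_{t+k})),
    # with the lag sum truncated at max_lag (unweighted).
    v = np.asarray(phi_vals, dtype=float)
    v = v - v.mean()
    out = np.mean(v ** 2)
    for k in range(1, max_lag + 1):
        out += 2.0 * np.mean(v[:-k] * v[k:])
    return out

# For iid data the lag terms vanish on average, and the long-run
# variance reduces to the ordinary variance.
rng = np.random.default_rng(1)
z = rng.standard_normal(10_000)
print(long_run_variance(z, 0))  # close to Var = 1
```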
(ii) Consider now the case where:
$$\varphi[F](x) = \alpha_\ell[F](x)\,\partial^{\Delta_\ell}\delta_{y_\ell}(x_\ell) \in C^{-q}\setminus C^{-q+1}, \quad\text{for } \Delta_\ell = q-2,\ 2 \le q \le s.$$
The remainder term in the expansion of the functional is bounded as in (i). Then by
Slutsky's Theorem the asymptotic distribution is given by the linear term (scaled for now at
the rate $n^{1/2}$):
$$\int_{-\infty}^{+\infty}\partial^{\Delta_\ell}\delta_{y_\ell}(x_\ell)\,\alpha_\ell[F](x)\, d\hat A_n(x)$$
$$= \frac{1}{h_n^{d}}\int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty}\partial^{\Delta_\ell}_{x_\ell}\Big\{\alpha_\ell[F](x_\ell,x_{-\ell})\,K\Big(\frac{x_\ell-t_\ell}{h_n},\frac{x_{-\ell}-t_{-\ell}}{h_n}\Big)\Big\}\Big|_{x_\ell=y_\ell}\, dx_{-\ell}\; a_n(t)\, dt$$
$$\quad + n^{1/2}\Big\{\frac{1}{h_n^{d}}\int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty}\partial^{\Delta_\ell}_{x_\ell}\Big\{\alpha_\ell[F](x_\ell,x_{-\ell})\,K\Big(\frac{x_\ell-t_\ell}{h_n},\frac{x_{-\ell}-t_{-\ell}}{h_n}\Big)\Big\}\Big|_{x_\ell=y_\ell}\, dx_{-\ell}\; f(t)\, dt - \int_{-\infty}^{+\infty}\partial^{\Delta_\ell}_{x_\ell}\big\{\alpha_\ell[F](x_\ell,x_{-\ell})\, f(x)\big\}\big|_{x_\ell=y_\ell}\, dx_{-\ell}\Big\}$$
$$= \frac{1}{h_n^{d}}\int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty}\partial^{\Delta_\ell}_{x_\ell}\Big\{\alpha_\ell[F](x_\ell,x_{-\ell})\,K\Big(\frac{x_\ell-t_\ell}{h_n},\frac{x_{-\ell}-t_{-\ell}}{h_n}\Big)\Big\}\Big|_{x_\ell=y_\ell}\, dx_{-\ell}\; a_n(t)\, dt + O_p\big(n^{1/2} h_n^{r-(q-2)}\big).$$
Write the leading term as $\int_{-\infty}^{+\infty}\omega_n(t)\, a_n(t)\, dt$, where $a_n(t)\, dt = dA_n(t)$ and $A_n \equiv n^{1/2}(\hat F_n - F)$. Then let $\nu_n(t) \equiv h_n^{(d(\ell)+2\Delta_\ell)/2}\,\omega_n(t)$. Next show that:
$$E\Big[\Big(h_n^{(d(\ell)+2\Delta_\ell)/2}\int_{-\infty}^{+\infty}\omega_n(t)\, dA_n(t)\Big)^2\Big] = \Big[\int_{-\infty}^{+\infty}\big(K_\ell^{(\Delta_\ell)}(u_\ell)\big)^2\, du_\ell\Big]\int_{-\infty}^{+\infty}\big\{\partial^{\Delta_\ell}\alpha_\ell[F](y_\ell,t_{-\ell})\big\}^2 f(y_\ell,t_{-\ell})\, dt_{-\ell} + o(1).$$
But:
$$\int_{-\infty}^{+\infty}\omega_n(x)\, dA_n(x) = \frac{1}{n^{1/2}}\sum_{i=1}^{n}\omega_n(x_i) - n^{1/2}\int_{-\infty}^{+\infty}\omega_n(x)\, f(x)\, dx = \frac{1}{n^{1/2}}\sum_{i=1}^{n}\big\{\omega_n(x_i) - E[\omega_n(x_1)]\big\},$$
so:
$$E\Big[\Big(\int_{-\infty}^{+\infty}\omega_n(t)\, dA_n(t)\Big)^2\Big] = \int_{-\infty}^{+\infty}\omega_n(t)^2 f(t)\, dt - \Big(\int_{-\infty}^{+\infty}\omega_n(t)\, f(t)\, dt\Big)^2$$
$$\quad + 2\sum_{k=1}^{n-1}\Big(1-\frac{k}{n}\Big)\int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty}\big\{f_k(t,s) - f(s)\, f(t)\big\}\,\omega_n(t)\,\omega_n(s)\, dt\, ds.$$
The first of the three terms above is $O\big(h_n^{-(d(\ell)+2\Delta_\ell)}\big)$, while the other two are $o\big(h_n^{-(d(\ell)+2\Delta_\ell)}\big)$. In particular the "time-series" term containing the sum over time lags is of lower order than the first term. The computations are very similar for all three terms; for example for the first term:
$$\int_{-\infty}^{+\infty}\omega_n(t)^2 f(t)\, dt = \int_{-\infty}^{+\infty}\Big[\frac{1}{h_n^{d}}\int_{-\infty}^{+\infty}\partial^{\Delta_\ell}_{x_\ell}\Big\{\alpha_\ell[F](x_\ell,x_{-\ell})\,K\Big(\frac{x_\ell-t_\ell}{h_n},\frac{x_{-\ell}-t_{-\ell}}{h_n}\Big)\Big\}\Big|_{x_\ell=y_\ell}\, dx_{-\ell}\Big]^2 f(t)\, dt$$
$$= \frac{1}{h_n^{2d}}\int_{-\infty}^{+\infty}\Big[\sum_{\Theta_\ell\le\Delta_\ell}\binom{\Delta_\ell}{\Theta_\ell}\int_{-\infty}^{+\infty}\partial^{\Theta_\ell}_{x_\ell}\big\{\alpha_\ell[F](x_\ell,x_{-\ell})\big\}\Big|_{x_\ell=y_\ell}\,\partial^{\Delta_\ell-\Theta_\ell}_{x_\ell}K\Big(\frac{x_\ell-t_\ell}{h_n},\frac{x_{-\ell}-t_{-\ell}}{h_n}\Big)\Big|_{x_\ell=y_\ell}\, dx_{-\ell}\Big]^2 f(t)\, dt.$$
Because
$$\partial^{\Delta_\ell-\Theta_\ell}_{x_\ell}K\Big(\frac{x_\ell-t_\ell}{h_n},\frac{x_{-\ell}-t_{-\ell}}{h_n}\Big) = h_n^{-(\Delta_\ell-\Theta_\ell)}\,\big(\partial^{\Delta_\ell-\Theta_\ell}K\big)\Big(\frac{x_\ell-t_\ell}{h_n},\frac{x_{-\ell}-t_{-\ell}}{h_n}\Big),$$
the term of highest order corresponds to $\Theta_\ell = 0$, and is therefore given by:
$$\int_{-\infty}^{+\infty}\Big[\frac{1}{h_n^{d+\Delta_\ell}}\int_{-\infty}^{+\infty}\alpha_\ell[F](y_\ell,x_{-\ell})\,\big(\partial^{\Delta_\ell}K\big)\Big(\frac{y_\ell-t_\ell}{h_n},\frac{x_{-\ell}-t_{-\ell}}{h_n}\Big)\, dx_{-\ell}\Big]^2 f(t)\, dt$$
$$= \frac{1}{h_n^{d(\ell)+2\Delta_\ell}}\Big[\int_{-\infty}^{+\infty}\big(K_\ell^{(\Delta_\ell)}(u_\ell)\big)^2\, du_\ell\Big]\int_{-\infty}^{+\infty}\big\{\partial^{\Delta_\ell}\alpha_\ell[F](y_\ell,t_{-\ell})\big\}^2 f(y_\ell,t_{-\ell})\, dt_{-\ell} + o\big(h_n^{-(d(\ell)+2\Delta_\ell)}\big).$$
Hence:
$$E\Big[\Big(\int_{-\infty}^{+\infty}\nu_n(t)\, dA_n(t)\Big)^2\Big] = \Big[\int_{-\infty}^{+\infty}\big(K_\ell^{(\Delta_\ell)}(u_\ell)\big)^2\, du_\ell\Big]\int_{-\infty}^{+\infty}\big\{\partial^{\Delta_\ell}\alpha_\ell[F](y_\ell,t_{-\ell})\big\}^2 f(y_\ell,t_{-\ell})\, dt_{-\ell} + o(1) \equiv V_\Phi[F] + o(1).$$
Therefore $\int_{-\infty}^{+\infty}\nu_n(x)\, dA_n(x) = \frac{1}{n^{1/2}}\sum_{i=1}^{n}\big\{\nu_n(x_i) - E[\nu_n(x_1)]\big\} \to N\big(0, V_\Phi[F]\big)$
by the Central Limit Theorem. But the first order Taylor expansion of the functional yields:
$$n^{1/2}\big\{\Phi(\hat F_n) - \Phi(F)\big\} = \int_{-\infty}^{+\infty}\omega_n(t)\, dA_n(t) + O_p\big(n^{1/2} h_n^{r-(q-2)}\big) + O\big(n^{1/2}\|\hat F_n - F\|^2_{L^\infty,m}\big).$$
Suppose that $d(\ell) = d^*$ (when there is more than one $\ell$ the only terms that matter are those corresponding to $\ell \in L^*$). Then under Assumption A4($r+d^*/2$, $m$), $h_n^{(d(\ell)+2\Delta_\ell)/2}\, O_p\big(n^{1/2} h_n^{r-(q-2)}\big) = o_p(1)$ and the remainder term is also $o_p(1)$. Therefore:
$$h_n^{(d(\ell)+2\Delta_\ell)/2}\, n^{1/2}\big\{\Phi(\hat F_n) - \Phi(F)\big\} = \int_{-\infty}^{+\infty}\nu_n(t)\, a_n(t)\, dt + o_p(1)$$
$$\to N\Big(0,\ \Big[\int_{-\infty}^{+\infty}\big(K_\ell^{(\Delta_\ell)}(u_\ell)\big)^2\, du_\ell\Big]\int_{-\infty}^{+\infty}\big\{\partial^{\Delta_\ell}\alpha_\ell[F](y_\ell,t_{-\ell})\big\}^2 f(y_\ell,t_{-\ell})\, dt_{-\ell}\Big).$$
Under Assumption A4(r,m) only, this will be the asymptotic distribution of $h_n^{(d(\ell)+2\Delta_\ell)/2}\, n^{1/2}\big\{\Phi(\hat F_n) - \Phi(E[\hat F_n])\big\}$ instead of $h_n^{(d(\ell)+2\Delta_\ell)/2}\, n^{1/2}\big\{\Phi(\hat F_n) - \Phi(F)\big\}$. Indeed: $\Phi(E[\hat F_n]) - \Phi(F) = O\big(h_n^{r-(q-2)}\big)$. Under A4($r+d^*/2$, $m$), this asymptotic bias term once multiplied by $h_n^{(d(\ell)+2\Delta_\ell)/2}\, n^{1/2}$ is $o(1)$.
The absence of a covariance at the same order $O\big(h_n^{-(d(\ell)+2\Delta_\ell)}\big)$ between terms associated with different $\ell$ yields:
$$h_n^{(d^*+2q-4)/2}\, n^{1/2}\big\{\Phi(\hat F_n) - \Phi(F)\big\} \to N\Big(0,\ \sum_{\ell\in L^*}\Big[\int_{-\infty}^{+\infty}\big(K_\ell^{(\Delta_\ell)}(u_\ell)\big)^2\, du_\ell\Big]\int_{-\infty}^{+\infty}\big\{\partial^{\Delta_\ell}\alpha_\ell[F](y_\ell,t_{-\ell})\big\}^2 f(y_\ell,t_{-\ell})\, dt_{-\ell}\Big).$$
If present, the term $B[F](\cdot) \in C^{-q+1}$ only contributes terms of higher order in powers of $h_n^{-1}$ and therefore does not change the asymptotic distribution.
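To fix ideas, the simplest instance of the limit just derived is pointwise density estimation: $\Phi(F) = f(y)$, i.e. $\varphi[F] = \delta_y$, with $d = d^* = 1$, $q = 2$, $\Delta_\ell = 0$, so the normalization is $(n h_n)^{1/2}$ and the limit variance is $f(y)\int K(u)^2\, du$. The Monte Carlo sketch below (illustrative helper names and parameter values, not from the paper; iid data and a Gaussian kernel) checks this numerically:

```python
import numpy as np

def kde_at(sample, y, h):
    # Gaussian-kernel density estimate at the point y.
    u = (y - sample) / h
    return np.mean(np.exp(-u ** 2 / 2.0) / np.sqrt(2.0 * np.pi)) / h

rng = np.random.default_rng(42)
n, h, y = 50_000, 0.1, 0.0
f_y = 1.0 / np.sqrt(2.0 * np.pi)   # true N(0,1) density at y = 0

# (n h)^{1/2} (f_hat(y) - f(y)) across replications: its variance should
# approach f(y) * int K(u)^2 du = f(0) / (2 sqrt(pi)) ~ 0.11 for the
# Gaussian kernel, up to O(h) finite-sample corrections and MC noise.
reps = np.array([np.sqrt(n * h) * (kde_at(rng.standard_normal(n), y, h) - f_y)
                 for _ in range(200)])
print(reps.var())
```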
Estimation of the Asymptotic Variance: The additional technical regularity condition is A5: Let $V_i \equiv V_i[F] \equiv \varphi[F](x_i)$ and $\hat V_i \equiv \varphi[\hat F_n](x_i)$. Assume:
(1) $E\big[|V_i|^{3+\delta}\big] < \infty$ where $\delta > (3+\varepsilon)(3+2\varepsilon)/\varepsilon$ for some $\varepsilon > 0$;
(2) $E\big[\sup_{G\in N} V_i[G]^2\big] < \infty$ where $N$ is a neighborhood of the true cdf $F$;
(3) $E\big[(V_i[G] - V_i[H])^2\big] \le C\,\|G - H\|^2_{L^\infty,m}$, for $G$ and $H$ in $N$.
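As an illustration of the plug-in idea behind A5, take $\Phi(F) = F(c)$, whose influence function is $\varphi[F](x) = 1\{x \le c\} - F(c)$; the sketch below forms $\hat V_i = \varphi[\hat F_n](x_i)$ with a kernel-smoothed cdf and then a truncated long-run variance of the $\hat V_i$ (the helper names and the plain truncation are illustrative simplifications, not the paper's construction):

```python
import numpy as np
from math import erf, sqrt

def smoothed_cdf(sample, c, h):
    # Kernel-smoothed empirical cdf at c (Gaussian kernel).
    z = (c - sample) / (h * sqrt(2.0))
    return float(np.mean([0.5 * (1.0 + erf(zi)) for zi in z]))

def plug_in_variance(sample, c, h, max_lag=10):
    # V_hat_i = phi[F_hat_n](x_i) = 1{x_i <= c} - F_hat_n(c), then a
    # truncated long-run variance of the V_hat_i.
    Fc = smoothed_cdf(sample, c, h)
    V = (sample <= c).astype(float) - Fc
    out = np.mean(V ** 2)
    for k in range(1, max_lag + 1):
        out += 2.0 * np.mean(V[:-k] * V[k:])
    return out

# iid uniforms at the median: Var(1{x <= c}) = F(c)(1 - F(c)) = 1/4.
rng = np.random.default_rng(3)
u = rng.uniform(size=5000)
print(plug_in_variance(u, 0.5, 0.02, max_lag=0))  # close to 0.25
```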
3+[ ] < ∞δ where δ ε ε ε> +( ) +( )3 3 2/ for some ε>0;(2) E V GG isup ∈ [ ][ ] < ∞Ν 2 where N is a neighborhood of the true cdf F;(3) E V G V H C G Hi i L m[ ] − [ ][ ] ≤ − ∞( )2 2 , for G and H in N.