
Asymptotic and Bootstrap Tests for Infinite Order Stochastic Dominance via the Method of Empirical Likelihood

Rami Victor Tabri∗

PhD Candidate, Department of Economics, McGill University, Montreal, Quebec, Canada H3A [email protected]

and Lecturer, School of Economics, University of Sydney, Sydney, New South Wales, Australia 2006

September 14, 2012

Abstract

We develop asymptotic and bootstrap tests for stochastic dominance of the infinite order for distributions with known common support, the set of non-negative real numbers. These tests posit a null of dominance, which is characterized by an inequality in the corresponding Laplace transforms of the distribution functions. The bootstrap procedure uses a bootstrap data generating process that satisfies the two "Golden Rules" of bootstrapping, and is obtained using constrained empirical likelihood estimation. To implement the constrained estimator, we develop a feasible-value-function approach as in Tabri and Davidson (2011). The proposed bootstrap tests are based on the weighted one-sided Kolmogorov-Smirnov and Cramer von Mises test statistics, which we show to be valid, and we also characterize the set of probabilities where the asymptotic size is exactly equal to the nominal level. Additionally, the asymptotic and bootstrap likelihood ratio tests are developed, in which a Wilks phenomenon is unveiled. We prove that the likelihood ratio statistic is asymptotically distributed as $\chi^2_1$ on the boundary of the null hypothesis. Finally, using the Cramer von Mises and likelihood ratio test statistics, preliminary simulations are conducted in which we compare our bootstrap method with the bootstrap empirical process procedure proposed in Linton, Song, and Whang (2010), in terms of size distortion and power.

JEL Classification: C12; C13; C14;

Keywords: Inference for Infinite Order Stochastic Dominance; Empirical Likelihood; Bootstrap Test; Feasible Value-Function Approach.

∗This paper is part of my doctoral thesis at McGill University. Support from CIREQ through my McGill-CIREQ Doctoral Scholarship is gratefully acknowledged. I also thank Drs. Pierre Chaussee, Victoria Zinde-Walsh, and Russell Davidson for their helpful comments.


1 Introduction

Economists have always been interested in the comparison of income and welfare distributions across groups in terms of poverty and income inequality. Since the influential work of Atkinson (1970), much effort has been devoted to making the comparisons of income and welfare distributions more ethically robust, by making judgements only when all members of a wide class of poverty and inequality indices, or social welfare functions, lead to the same conclusion, rather than focusing on a particular index. The difficulties that arose were dealt with by the use of notions of stochastic dominance. For example, Foster and Shorrocks (1988) explored the partial ordering of income distributions induced by unambiguous poverty judgements to provide a new perspective on poverty measurement, and its relation to inequality and welfare. Their key finding is a characterization of the class of measures proposed in Foster et al. (1984) in terms of stochastic dominance partial orderings.

Consider two distributions denoted by A and B, which are characterized by cumulative distribution functions (CDFs) $F_A$ and $F_B$ respectively, having common support given by the interval $[0, s]$, such that $s \le \infty$. Let $D^s_A(x)$ denote the $s$th integral of the CDF $F_A$; that is,

$$D^s_A(x) = \int_0^x D^{s-1}_A(z)\,dz, \quad s = 2, 3, \ldots,$$

where $D^1_A(x) = F_A(x)$, with an analogous definition for $D^s_B(x)$. We say that $F_B$ stochastically dominates $F_A$ at degree $s$, written $B D_s A$, if

$$D^s_B(x) \le D^s_A(x), \quad \forall x \in [0, s].$$

The recursive structure of the above definition implies that

$$B D_s A \implies B D_r A, \quad \forall r > s.$$

That is, stochastic dominance of any finite degree implies stochastic dominance of all higher orders, including the infinite order, where infinite degree stochastic dominance, denoted by $B D_\infty A$, is defined by letting $s \to \infty$ in the definition.
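The recursive definition can be illustrated on sample data. The sketch below is not part of the paper's method: it evaluates the sample analogue of $D^s$ using the closed form $\frac{1}{n}\sum_i \max(x - X_i, 0)^{s-1}/(s-1)!$, which follows from integrating the empirical CDF $s-1$ times, and checks the dominance inequality on a finite grid only. The function names and the grid are hypothetical.

```python
import math


def dominance_function(sample, x, s):
    """Sample analogue of D^s(x) for a non-negative sample.

    s = 1 gives the empirical CDF; for s >= 2 the closed form
    (1/n) * sum_i max(x - X_i, 0)^(s-1) / (s-1)! is used.
    """
    if s < 1:
        raise ValueError("order s must be a positive integer")
    n = len(sample)
    if s == 1:
        return sum(1 for xi in sample if xi <= x) / n
    total = sum(max(x - xi, 0.0) ** (s - 1) for xi in sample)
    return total / (n * math.factorial(s - 1))


def dominates_on_grid(sample_b, sample_a, s, grid, tol=1e-12):
    """True if D^s_B(x) <= D^s_A(x) at every grid point (grid check only)."""
    return all(
        dominance_function(sample_b, x, s)
        <= dominance_function(sample_a, x, s) + tol
        for x in grid
    )
```

A grid check of this kind can only suggest dominance in the sample; it is not a test of the population relation.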

We focus on the weakest form of stochastic dominance, namely, the limiting infinite order stochastic dominance (ISD), because testing for this ordering is essentially a test of whether or not two CDFs can be ranked using a finite order stochastic dominance relation. To see this, Thistle (1993) provides a necessary and sufficient condition for infinite order stochastic dominance that is related to finite orders of stochastic dominance and is given by:

Proposition 1 (Thistle). $B D_\infty A \iff \exists s \in \mathbb{Z}_+$ such that $B D_s A$.

Thistle (1993) also provides an explicit characterization of ISD on which one can base a test, given by:

Proposition 2 (Thistle). $B D_\infty A \iff M_B(-t) \le M_A(-t), \ \forall t \in \mathbb{R}_+$, where $M_A(-t)$ denotes the Laplace transform of the CDF $F_A$, and similarly for B.
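To make the Laplace transform characterization concrete, the following sketch evaluates the empirical Laplace transform $\frac{1}{n}\sum_i e^{-tX_i}$ of each sample and checks the inequality of Proposition 2 on a finite grid of $t$ values. The helper names, the grid, and the numerical tolerance are assumptions made for illustration; a grid check is only a sample diagnostic, not the test developed in this paper.

```python
import math


def empirical_laplace(sample, t):
    """Empirical Laplace transform (1/n) * sum_i exp(-t * X_i)."""
    return sum(math.exp(-t * x) for x in sample) / len(sample)


def laplace_difference(sample_b, sample_a, t):
    """Sample analogue of M_B(-t) - M_A(-t)."""
    return empirical_laplace(sample_b, t) - empirical_laplace(sample_a, t)


def isd_holds_on_grid(sample_b, sample_a, t_grid, tol=1e-12):
    """True if the sample difference is <= 0 at every grid value of t."""
    return all(laplace_difference(sample_b, sample_a, t) <= tol for t in t_grid)
```

For example, shifting every observation upward lowers the empirical Laplace transform at every $t > 0$, so the shifted sample dominates the original one on any grid.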

Additionally, for discrete income distributions having common mean, Fishburn and Willig (1984) show that ISD is related to the ranking of these distributions in terms of the class of inequality indices introduced in Kolm (1976a; 1976b), known as Kolm's 'leftist' measures of income inequality. The continuous analogue of this class of indices is given by

$$I_l(a) = \frac{1}{a} \log\left[\int_0^s e^{a(\mu - x)}\,dF(x)\right], \quad a > 0,$$

where $\mu$ is the mean of $F$. In this case, it follows immediately that

$$B D_\infty A \iff I^B_l(a) < I^A_l(a), \quad \forall a > 0,$$

provided that $\mu_B = \mu_A = \mu > 0$.

An issue that arises is whether it is even necessary to test for infinite order dominance because of a result in Davidson and Duclos (2000), which is presented as Lemma 1 below.

Lemma 1 (Davidson and Duclos). Let A and B be distributions having common support $[0, s]$, $s < \infty$. If B dominates A at first order up to some $w \in (0, s]$, with strict dominance over at least part of the range, then for any finite threshold $z \in [0, s]$, there exists a sufficiently large finite order such that B dominates A up to $z$.

Thus, if we can show that B dominates A in the lower tail, then the conclusion of this lemma is that B dominates A at the infinite order. One can proceed to test for this type of restricted stochastic dominance using the test developed in Barrett and Donald (2003), for instance. However, multiple testing cannot be avoided without prior knowledge of the value of $w$ in Lemma 1. An approach that tests for infinite order dominance will not require multiple testing to determine whether or not a stochastic dominance relation of some finite order holds between two CDFs in a specific direction.

In this paper, we develop a nonparametric approach to construct bootstrap tests for unidirectional ISD based on the discrepancy of the Laplace transform functions of the CDFs. To our knowledge, Knight and Satchell (2008) are the first to propose a unidirectional test of ISD based on Laplace transform functions. They developed an asymptotic test, and their aim was to classify the relationship between two unknown CDFs when modeling the two samples as independent, or pairwise dependent random samples. Furthermore, they used a union-intersection principle based on the minimum and maximum of a set of t statistics. Finally, Bennett (2007) focused on testing the ISD ordering with serially dependent samples with applications to finance, using an integral-type test statistic, and bootstrap empirical process theory.

Bootstrap methods based on a resampling DGP have been used in the inference for stochastic dominance orderings under the null. Most of these approaches considered the least favorable subset of the models of the null hypothesis, examples of which are McFadden (1989), Barrett and Donald (2003), and Horvath, Kokoszka, and Zitikis (2006). The least favorable subset is a subset of the null model where the marginal distribution functions are equal, at which the rejection probability is maximized. Linton, Maasoumi and Whang (2005) showed that such approaches lead to asymptotically non-similar tests, which implies that the test is biased¹. Linton, Song, and Whang (2010) address this problem by testing the null of stochastic dominance using a bootstrap empirical process procedure, where a resampling data generating process (the joint empirical distribution function of the data), and the notion of a "contact set" are used in constructing the bootstrapped test statistics.

Using the method of sieve maximum likelihood, Tabri and Davidson (2011) developed bootstrap tests for finite order stochastic dominance under the null that satisfy the two "Golden Rules" of bootstrapping², which are that (1) the bootstrap DGP should belong to the statistical model that represents the null hypothesis, and (2) the bootstrap DGP should be as good an estimate of the true DGP as possible, under the assumption that the true DGP belongs to the null model. For such estimators, the importance of satisfying the second golden rule is mentioned in Horowitz (1999), who recommends, when possible, the use of restricted maximum likelihood, where the restriction characterizes the null hypothesis, so that further gains in efficiency and numerical accuracy can be obtained. In the case of tests of stochastic dominance, implementing a restricted maximum likelihood estimator can be difficult, because satisfying the null of dominance entails, in principle, imposing an infinite number of inequalities.

¹ See Lehmann and Romano (2005) for the concepts of similarity and unbiasedness for tests.

² They are extensions and "guidelines" for bootstrap hypothesis testing found in Hall and Wilson (1991). For a discussion on bootstrapping econometric models in the context of the golden rules, see Davidson (2007).

Our aim in this paper is to construct bootstrap tests for ISD under the null that use a bootstrap DGP satisfying the two golden rules. To that end, we develop a feasible value-function approach as in Tabri and Davidson (2011), in which we use constrained empirical likelihood (Owen (2001)) to estimate a bootstrap DGP. In contrast to Tabri and Davidson (2011), we do not make any prior assumptions concerning the smoothness of the joint distribution that generated the data. In our statistical model, we only require that the joint CDF that generated the data be continuous and have bounded variance. This added level of flexibility in our statistical model is important for applications in which the distributions to be compared lack smoothness. Zinde-Walsh (2008) showed in examples relating to policy and institutional effects that income distributions may be represented by discontinuous density functions because of these effects. Therefore, it is important that our statistical model be able to capture such behaviors in the income distributions.

Regarding test statistics, we show the validity of our bootstrap tests based on the Kolmogorov-Smirnov (KS), Cramer von Mises (CVM), and likelihood ratio (LR) test statistics. Finally, we do not investigate the case of infinite order residual dominance, because it requires the development of a theory of infinite order dominance for real valued random variables, i.e. residuals can be negative³.

The rest of this paper is organized as follows. Section 2 presents the empirical likelihood optimization problems. Section 3 describes the value function approach. Section 4 presents the bootstrap procedure, and derives the validity results for the asymptotic and bootstrap tests based on the CVM, KS, and LR test statistics. Section 5 reports on our simulation experiments, and finally, Section 6 concludes.

2 Empirical Likelihood

In this section, we develop the unconstrained and (infeasible) constrained estimators of the joint distribution of interest. We then introduce the testing problem of interest, and develop the statistical theory of the estimators.

Let $\mathcal{M}$ denote the statistical model satisfying Assumption 1 stated below, and let $\mathcal{M}_0$ denote the model of the null hypothesis.

Assumption 1 (Statistical Model). The statistical model under consideration is described in terms of the following distributional and sampling assumptions:

1. Smoothness and support of joint distribution: We consider the set of all continuous joint CDFs of two populations A and B, denoted by $F(x_A, x_B)$, supported on $[0,+\infty)^2$.

2. Existence of moments: The distributions in this set have uniformly bounded second moments,

$$E_F\left[(X^K)^2\right] < C^K_0, \quad K = A, B,$$

with known bounds.

3. Sampling: We have access to a bivariate random sample from the unknown joint CDF. The random sample is denoted by $\{X_i\}_{i=1}^n$, where $X_i = (X^A_i, X^B_i)$ $\forall i$.

³ The results of Thistle (1993) hold for non-negative random variables only.

Remarks

• $\mathcal{M}$ is relatively compact in the weak topology (see Proposition 4 in the appendix for a proof). This is important for two reasons: first, we do not have to worry about the Bahadur-Savage problem annulling our efforts at meaningful inference, and second, it will help in proving the validity of our bootstrap procedure (see the conditions on page 38 of Davison and Hinkley (1997)).

• In the context of income distributions, it seems natural to have the lower bound on the support of the distribution be equal to zero.

• The identification of the statistical model is a consequence of the Uniqueness Theorem of Laplace Transforms; see Theorem 1 of chapter 13 in Feller (2008).

• The pairwise dependent sample configuration is useful when dealing with situations involving the comparison of pre- and post-tax income distributions, or the distributions of separate incomes of married couples, for instance.

• In terms of smoothness of the joint distributions, the set of possible distributions in $\mathcal{M}$ is more flexible than what is considered in Tabri and Davidson (2011). Hence, our paper can be viewed as a first step in extending the value-function approach to a richer statistical model.

The testing problem of interest is given by:

$$H_0: M_B(-t) \le M_A(-t), \ \forall t \in [0,+\infty) \qquad H_1: \exists t \in [0,+\infty);\ M_A(-t) < M_B(-t), \tag{1}$$

where $M_A(-t)$ and $M_B(-t)$ are the Laplace transforms of the corresponding marginal distributions. The null hypothesis corresponds to $B D_\infty A$, and the alternative hypothesis to the absence of such dominance.

Since we have a random sample of observations on a two dimensional vector, an "observation" must be thought of as a pair $(X^A_i, X^B_i)$ of dependent drawings. The empirical likelihood function (ELF) now ascribes probabilities $p_i$ to each pair, so that the ELF is $\sum_{i=1}^n \log(p_i)$. If it is maximized with respect to the $p_i$ subject only to the constraint that $\sum_{i=1}^n p_i = 1$, the maximizing probabilities are $p_i = 1/n$. The unrestricted empirical likelihood estimator of the joint distribution is then given by $\frac{1}{n}\sum_{i=1}^n \delta_{X_i}$, where $\delta_x(A) = 1(x \in A)$ and $A$ is measurable.

Utilizing the unrestricted empirical likelihood estimate, if ISD in the sample holds in the direction of the null, then we do not reject the null hypothesis in (1). And if ISD in the sample holds in the direction of the alternative hypothesis, i.e. A dominates B in the sample at the infinite order, then we reject the null hypothesis in (1). Hence, we only need to use a test statistic to determine the outcome of the test when non-dominance at the infinite order holds in the sample in both directions. In this case, we propose a bootstrap test using a bootstrap DGP on the boundary of the null model. The boundary of the null model is the set of all DGPs satisfying Assumption 1, such that

$$M_B(-t) - M_A(-t) \le 0, \ \forall t \in [0,+\infty), \text{ with equality holding for at least one } t \in (0,+\infty), \tag{2}$$

and is denoted by $\partial\mathcal{M}_0$. Note that condition (2) corresponds to an infinite number of inequalities, and hence is not directly executable on a computer. We utilize an equivalent characterization of (2) that circumvents the use of an infinite number of inequality restrictions. It is a single equality constraint in a functional of $F$, given by:

$$V(F) = \max_{t \in (0,+\infty)} \left[M_B(-t) - M_A(-t)\right] = 0. \tag{3}$$

In order to test for ISD, with some abuse of notation, we use the sample version of (3) given by:

$$V(p_1,\dots,p_n; X_1,\dots,X_n) = \max_{t \in (0,+\infty)} \sum_{i=1}^n p_i\left(e^{-tX^B_i} - e^{-tX^A_i}\right) = 0, \tag{4}$$

as the constraint in the constrained empirical likelihood optimization problem. This optimization problem is to maximize the ELF with respect to the $p_i$, and is characterized by the Lagrangean

$$L = \sum_{i=1}^n \log(p_i) + \lambda\left(1 - \sum_{i=1}^n p_i\right) - \mu V(p_1,\dots,p_n; X_1,\dots,X_n). \tag{5}$$

The first order conditions for this maximization problem are given by:

$$\sum_{i=1}^n p_i = 1, \quad V(p_1,\dots,p_n) = 0, \quad \frac{\partial L}{\partial p_i} = \frac{1}{p_i} - \lambda - \mu\frac{\partial V(p_1,\dots,p_n; X_1,\dots,X_n)}{\partial p_i} = 0; \ i = 1,\dots,n, \tag{6}$$

where $\partial V(p_1,\dots,p_n; X_1,\dots,X_n)/\partial p_i$ are computed using the Envelope Theorem, which makes use of $t^\star = t(p_1,\dots,p_n, X_1,\dots,X_n)$, the argument that maximizes the objective function in (4). In this case, we can directly determine that $\lambda = n$. However, we cannot explicitly solve for the probabilities and the remaining Lagrange multiplier, $\mu$, because that would require knowledge of the explicit functional form of $V$, which is equivalent to knowing the functional form of $t^\star$, and this is difficult to obtain for most sample sizes that arise in practice.
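The value $\lambda = n$ can be checked by a short calculation. The following is a sketch of one way to see it, assuming the maximizer $t^\star$ is interior so that the Envelope Theorem gives $\partial V/\partial p_i = e^{-t^\star X^B_i} - e^{-t^\star X^A_i}$: multiply the $i$-th first order condition in (6) by $p_i$ and sum over $i$.

```latex
\begin{aligned}
0 &= \sum_{i=1}^n p_i \frac{\partial L}{\partial p_i}
   = n - \lambda \sum_{i=1}^n p_i
     - \mu \sum_{i=1}^n p_i \left(e^{-t^\star X^B_i} - e^{-t^\star X^A_i}\right) \\
  &= n - \lambda - \mu\, V(p_1,\dots,p_n; X_1,\dots,X_n)
   = n - \lambda .
\end{aligned}
```

The last two equalities use the constraints $\sum_{i=1}^n p_i = 1$ and $V = 0$ from (6), so that $\lambda = n$ regardless of the value of $\mu$.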

Even when these probabilities cannot be directly computed, the existence of interior solutions of the above optimization problem follows from an application of Weierstrass' Theorem: each $p_i$ appears continuously in the Lagrangean (5), and the (random) set

$$\mathcal{H} = \left\{p_i, \ i = 1,\dots,n : \sum_{i=1}^n p_i = 1, \ V(p_1,\dots,p_n; X_1,\dots,X_n) = 0, \ p_i \ge 0 \ \forall i\right\} \tag{7}$$

is compact (closed and bounded) and convex (it is the uncountable intersection of convex sets). Provided that it is non-empty, we know a unique solution of (6) exists. $\mathcal{H}$ may be empty with positive probability when non-dominance at the infinite order is observed in the sample in both directions. However, under the null we expect the probability of the event $\mathcal{H} = \emptyset$ to decrease as the sample size increases. It is important to note that even when $\mathcal{H}$ is empty, this does not limit our ability to conduct inference, because in that case the only possible outcome is rejection of the null hypothesis. In the ensuing section, we describe how to implement the constrained empirical likelihood optimization problem using a feasible-value-function approach (FVFA).

The convergence of the unrestricted estimator follows by an application of the generalized Glivenko-Cantelli Theorem; see Elker, Pollard, and Stute (1979) for details. We postpone the discussion of the convergence of the restricted estimator till the end of the next section.


3 Value Function Approach

In this section we describe how to compute the constrained empirical likelihood estimator presented in the previous section. Recall that this estimator is not directly implementable because $t(p_1,\dots,p_n; X_1,\dots,X_n)$ cannot be obtained in closed form. For this reason, we cannot use the Envelope Theorem to compute the partial derivatives of the value function $V$ with respect to the probabilities in the first order conditions (6).

Following Tabri and Davidson (2011), we recognize that ultimately $t(p_1,\dots,p_n; X_1,\dots,X_n)$ will depend on the bivariate random sample, and that we should solve simultaneously for it and the $p_i$. They proposed a value function approach to accomplish this task, and in our case, this requires that we augment the first order conditions of the infeasible constrained empirical likelihood problem (6) with the first order condition that has $t(p_1,\dots,p_n; X_1,\dots,X_n)$ as its solution. Doing this will give rise to a system of nonlinear equations in the variables $t, \lambda, \mu, p_1,\dots,p_n$.

Formally, the first order condition that defines $t(p_1,\dots,p_n; X_1,\dots,X_n)$ is given by:

$$\sum_{i=1}^n p_i X^B_i e^{-tX^B_i} = \sum_{i=1}^n p_i X^A_i e^{-tX^A_i}.$$

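Then, using the identities:

$$V(p_1,\dots,p_n; X_1,\dots,X_n) = \left.\sum_{i=1}^n p_i\left(e^{-tX^B_i} - e^{-tX^A_i}\right)\right|_{t=t^\star},$$

$$\frac{\partial V(p_1,\dots,p_n; X_1,\dots,X_n)}{\partial p_i} = \left.\left(e^{-tX^B_i} - e^{-tX^A_i}\right)\right|_{t=t^\star},$$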

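where $t^\star = t(p_1,\dots,p_n, X_1,\dots,X_n)$, we solve the system:

$$\begin{aligned}
&\sum_{i=1}^n p_i = 1, \quad \sum_{i=1}^n p_i\left(e^{-tX^B_i} - e^{-tX^A_i}\right) = 0, \quad \sum_{i=1}^n p_i X^B_i e^{-tX^B_i} = \sum_{i=1}^n p_i X^A_i e^{-tX^A_i}, \\
&\frac{\partial L}{\partial p_i} = \frac{1}{p_i} - \lambda - \mu\left(e^{-tX^B_i} - e^{-tX^A_i}\right) = 0; \ i = 1,\dots,n,
\end{aligned} \tag{8}$$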

in the variables $t, \lambda, \mu, p_1,\dots,p_n$, for the solution $\hat{t}, \hat{\lambda}, \hat{\mu}, \hat{p}_1,\dots,\hat{p}_n$ such that

$$\sum_{i=1}^n \hat{p}_i\left(e^{-tX^B_i} - e^{-tX^A_i}\right) \le 0, \quad \forall t \in (0,+\infty), \tag{9}$$

with equality at least at $\hat{t}$, that maximizes the empirical likelihood.

The FVFA recovers the unique solution of the first order conditions (6) of the infeasible constrained empirical likelihood problem. As in the infeasible constrained empirical likelihood problem, the system (8) also yields $\lambda = n$, and from the first order condition for the $p_i$ in (8), we have that

$$p_i = \left(n + \mu\left(e^{-tX^B_i} - e^{-tX^A_i}\right)\right)^{-1}, \quad i = 1,\dots,n. \tag{10}$$

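This allows us to reduce the system (8) to a $2 \times 2$ nonlinear system in $\mu$ and $t$, given by

$$\sum_{i=1}^n \frac{X^A_i e^{-tX^A_i} - X^B_i e^{-tX^B_i}}{n + \mu\left(e^{-tX^B_i} - e^{-tX^A_i}\right)} = 0, \qquad \sum_{i=1}^n \frac{e^{-tX^B_i} - e^{-tX^A_i}}{n + \mu\left(e^{-tX^B_i} - e^{-tX^A_i}\right)} = 0. \tag{11}$$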

Then, using (11) and (10), we solve for the probabilities that are positive, sum to unity, and are such that (9) holds, with equality at $\hat{t}$. This is convenient in practice since it may be difficult to check directly whether $\mathcal{H}$, given by (7), is empty or not. If it is empty, we would not be able to construct such probabilities for any $\mu, t$ that satisfy (11); if it is non-empty, we would be able to do so.
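The reduction to (10) and (11) suggests a simple numerical scheme, sketched below under stated assumptions; it is not the implementation used in this paper. For a fixed $t$, the second equation of (11) is strictly decreasing in $\mu$ on the interval where all weights are positive, so it can be solved by bisection, and the weights then follow from (10). The outer step shown here is a deliberately crude grid scan over $t$ that keeps the candidates satisfying (9) on the grid up to a tolerance and returns the one with the largest empirical log-likelihood. All function names, the grid, and the tolerance are hypothetical, and the tolerance has to be chosen relative to the grid spacing.

```python
import math


def laplace_diffs(xa, xb, t):
    """d_i(t) = exp(-t*X^B_i) - exp(-t*X^A_i) for each observed pair."""
    return [math.exp(-t * b) - math.exp(-t * a) for a, b in zip(xa, xb)]


def solve_mu(d, iters=200):
    """Bisection for mu in sum_i d_i / (n + mu*d_i) = 0, second equation of (11).

    Returns None when the d_i do not take both signs, since then no
    strictly positive weights can satisfy the constraint at this t.
    """
    n = len(d)
    d_min, d_max = min(d), max(d)
    if not (d_min < 0.0 < d_max):
        return None
    lo, hi = -n / d_max, -n / d_min  # n + mu*d_i > 0 for all i iff lo < mu < hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:  # interval exhausted in floating point
            break
        g = sum(di / (n + mid * di) for di in d)
        if g > 0.0:  # g is decreasing in mu, so the root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def el_weights(xa, xb, t):
    """Return (mu, p) with p from (10) at a fixed t, or None if infeasible."""
    d = laplace_diffs(xa, xb, t)
    mu = solve_mu(d)
    if mu is None:
        return None
    denoms = [len(d) + mu * di for di in d]
    if min(denoms) <= 0.0:
        return None
    return mu, [1.0 / q for q in denoms]


def constraint_max(xa, xb, p, t_grid):
    """Largest value of sum_i p_i d_i(s) over the grid: a grid check of (9)."""
    return max(
        sum(pi * di for pi, di in zip(p, laplace_diffs(xa, xb, s)))
        for s in t_grid
    )


def fvfa_grid(xa, xb, t_grid, tol=1e-6):
    """Crude outer step: best grid candidate (loglik, t, mu, p), or None."""
    best = None
    for t in t_grid:
        sol = el_weights(xa, xb, t)
        if sol is None:
            continue
        mu, p = sol
        if constraint_max(xa, xb, p, t_grid) > tol:
            continue  # (9) is violated somewhere on the grid
        loglik = sum(math.log(pi) for pi in p)
        if best is None or loglik > best[0]:
            best = (loglik, t, mu, p)
    return best
```

If weights computed at some $t_0$ satisfy (9) for all $t$, then $t_0$ maximizes the weighted difference and the first equation of (11) holds automatically, which is why the scan checks (9) rather than solving that equation directly. A return value of `None` from `fvfa_grid` means only that no grid candidate passed the check; it should not be read as proof that $\mathcal{H}$ is empty.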

Let $\hat{F} = \sum_{i=1}^n \hat{p}_i \delta_{X_i}$, which is the constrained empirical likelihood estimator of the joint CDF. We have the following consistency result:

Proposition 3. Let the conditions in Assumption 1 hold. If $F$ belongs to the boundary of the null hypothesis in (1), then

$$\sup_{D \in \mathcal{D}} \left|\hat{F}(D) - F(D)\right| \xrightarrow{p} 0 \quad \text{as } n \to \infty,$$

where $\mathcal{D}$ is a uniformity class⁴ for the distribution $F$.

Remarks

• It is not difficult to adapt the proposed methodology to the case of two independent samples of possibly different sizes.

• The fact that the system (8) of $n+3$ equations in $n+3$ unknowns reduces to solving system (11), which is two equations in two unknowns, is a great computational advantage over the semi-nonparametric estimator proposed in Tabri and Davidson (2011).

4 Asymptotic and Bootstrap Tests

In this section, we propose asymptotic and bootstrap tests based on a general bootstrapping procedure that uses the bootstrap DGP obtained by implementing the method of constrained empirical likelihood, which is described in Section 3. Because the null hypothesis is a weak inequality in a function, we can reformulate it in terms of a scalar quantity that represents this. In particular, we focus on the one-sided weighted versions of the Kolmogorov-Smirnov (KS) and Cramer von Mises (CVM) type functionals to achieve this, which are respectively given by

$$KS_I = \sup_{t \in [0,+\infty)} \left(M_B(-t) - M_A(-t)\right) w(t), \tag{12}$$

$$CVM_I = \int_0^{+\infty} \max\left(M_B(-t) - M_A(-t), 0\right)^2 w(t)\,dt, \tag{13}$$

where $w$ is a nonnegative integrable weighting function. Then the null hypothesis can be checked by looking at whether $KS_I \le 0$ or $KS_I > 0$ when the KS functional is used, and if the CVM functional is used, then one checks whether $CVM_I = 0$ or $CVM_I > 0$. We also derive the asymptotic and bootstrap likelihood ratio test for the same testing problem.

⁴ See Elker, Pollard, and Stute (1979) for details.

Let $X$ denote the data vector. The proposed bootstrap procedure follows these steps: if $\mathcal{H}$ is empty, then reject the null hypothesis. Otherwise,

1. Using constrained empirical likelihood and the value function approach, estimate a bootstrap DGP along the boundary of the null hypothesis, $\hat{F}$.

2. Simulate $B$ bootstrap samples each of size $n$, $X^\star_1,\dots,X^\star_B$, from $\hat{F}$.

3. Construct the (approximate) bootstrap p-value:

$$p^\star(X) = \frac{1}{B}\sum_{i=1}^B 1\left[T(X^\star_i) \in \mathrm{Rej}\left(T(X)\right)\right], \tag{14}$$

where $\mathrm{Rej}(\tau)$ is the rejection region for the test with critical value $\tau$.

4. Reject the null hypothesis in (1) if $p^\star(X) < \alpha$, for a nominal level of $\alpha$.
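Steps 2 to 4 can be sketched as follows, taking the constrained weights as given. This is an illustrative sketch only: it assumes the test rejects for large values of the statistic, so that $T(X^\star_i) \in \mathrm{Rej}(T(X))$ is read as $T(X^\star_i) > T(X)$, and it draws bootstrap samples by resampling the observed pairs with probabilities given by the weights. The function name, the `statistic` callable, and the seed handling are hypothetical.

```python
import random


def bootstrap_p_value(pairs, weights, statistic, n_boot=599, seed=0):
    """Approximate bootstrap p-value in the spirit of (14).

    pairs     : list of observed pairs (X^A_i, X^B_i)
    weights   : resampling probabilities attached to the pairs
    statistic : callable mapping a list of pairs to a real number,
                with large values counting against the null
    """
    rng = random.Random(seed)
    n = len(pairs)
    t_obs = statistic(pairs)
    exceed = 0
    for _ in range(n_boot):
        # Draw n pairs with replacement from the weighted empirical distribution.
        resample = rng.choices(pairs, weights=weights, k=n)
        if statistic(resample) > t_obs:
            exceed += 1
    return exceed / n_boot
```

The null hypothesis would then be rejected when the returned value is below the nominal level $\alpha$.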

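4.1 Kolmogorov-Smirnov and Cramer von Mises Tests

The KS and CVM test statistics are respectively given by

$$\sqrt{n} \sup_{t \in [0,+\infty)} \left(\hat{M}_B(-t) - \hat{M}_A(-t)\right) w(t), \tag{15}$$

$$n\int_0^{+\infty} \max\left(\hat{M}_B(-t) - \hat{M}_A(-t), 0\right)^2 w(t)\,dt, \tag{16}$$

where

$$\hat{M}_K(-t) = \frac{1}{n}\sum_{i=1}^n e^{-tX^K_i}, \quad K = A, B,$$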

are the empirical Laplace transforms of the marginal distributions.

Because the null hypothesis is composite, we show that these test statistics attain the highest rejection probability on $\partial\mathcal{M}_0$, which is the subject of Theorem 1, and justifies using a bootstrap DGP in $\partial\mathcal{M}_0$ in order to control the tests' level and size properly. This property is the basis on which we build the tests. We also derive the asymptotic null distribution theory of the test statistics. A key step in developing these two results was to recognize that the test statistics are continuous (convex) functionals of the stochastic dominance random function

$$\frac{1}{\sqrt{n}}\sum_{i=1}^n \left(e^{-tX^B_i} - e^{-tX^A_i}\right), \quad t \in [0,+\infty). \tag{17}$$

To prove Theorem 1, we used the fact that these test statistics are also monotone increasing functionals of the random function (17) (i.e. given two elements in the sample space of the random function (17), denoted by $h$ and $h'$, such that $h(t) \ge h'(t)$ $\forall t$, we have $T(h) \ge T(h')$), and that rejection of the null hypothesis is based on large values of these test statistics.
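As a numerical illustration of (15) and (16), the sketch below approximates both statistics on a finite, increasing grid of $t$ values, using the grid maximum for the supremum and the trapezoid rule for the integral. The truncation of $[0,+\infty)$ to a grid and the particular weight function are assumptions made for the example; the paper only requires $w$ to be nonnegative and integrable.

```python
import math


def empirical_laplace(sample, t):
    """Empirical Laplace transform (1/n) * sum_i exp(-t * X_i)."""
    return sum(math.exp(-t * x) for x in sample) / len(sample)


def ks_cvm_statistics(xa, xb, t_grid, weight):
    """Grid approximations to the KS statistic (15) and CVM statistic (16).

    t_grid must be increasing with at least two points; weight is a
    callable w(t) >= 0 (hypothetical choice below: exp(-t)).
    """
    n = len(xa)
    diff = [empirical_laplace(xb, t) - empirical_laplace(xa, t) for t in t_grid]
    w = [weight(t) for t in t_grid]
    ks = math.sqrt(n) * max(d * wi for d, wi in zip(diff, w))
    integrand = [max(d, 0.0) ** 2 * wi for d, wi in zip(diff, w)]
    integral = 0.0
    for k in range(1, len(t_grid)):
        # Trapezoid rule on each grid cell.
        integral += 0.5 * (integrand[k] + integrand[k - 1]) * (t_grid[k] - t_grid[k - 1])
    return ks, n * integral


def example_weight(t):
    """Hypothetical integrable weight used only for illustration."""
    return math.exp(-t)
```

When B dominates A in the sample, the difference is nonpositive on the whole grid, so both approximations are zero as long as the grid contains $t = 0$.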


Theorem 1. Let $T$ denote either the KS or CVM test statistics. Suppose that the distribution $F_B$ is changed so that the new distribution is weakly stochastically dominated by the old at the infinite order, and that this new distribution also stochastically dominates $F_A$ at the infinite order⁵. Then, the new distribution of $T$ weakly stochastically dominates its old distribution, at first order. If $F_A$ is changed so that the new distribution weakly stochastically dominates the old at the infinite order, and that it is dominated by $F_B$ at the infinite order, the same conclusion holds.

Remarks:

• Theorem 1 shows that as we move toward $\partial\mathcal{M}_0$ from a DGP in the null model, the rejection probability increases. Hence, for the KS and CVM test statistics, Theorem 1 justifies estimating a bootstrap DGP that belongs to the boundary of the null hypothesis.

• The least favorable case is a subset of $\partial\mathcal{M}_0$ where the data generating processes, joint CDFs denoted by $F$, with marginal CDFs $(F_A, F_B)$, are such that $M_B(-t) = M_A(-t)$, $\forall t \in (0,+\infty)$. By Assumption 1, the Laplace transform functions uniquely characterize the marginal CDFs. Then, by the above equality and the Laplace Inversion Theorem, the least favorable case can be characterized in terms of the marginal and joint CDFs. It is the set of all joint distributions with marginal CDFs $(F_A, F_B)$, such that $F_A = F_B$. One can construct a bootstrap test using a bootstrap DGP that belongs to the least favorable case. However, as pointed out in Linton, Maasoumi, and Whang (2005), such a test is asymptotically non-similar on the boundary of the null hypothesis, and therefore, biased⁶.

To develop the null asymptotic distribution theory, we used the Continuous Mapping Theorem, and the following weak convergence result for the random function (17).

Theorem 2. The empirical process

$$\frac{1}{\sqrt{n}}\sum_{i=1}^n \left(e^{-tX^B_i} - e^{-tX^A_i} - M_B(-t) + M_A(-t)\right), \quad t \in [0,+\infty),$$

with pairwise dependent samples converges weakly to a tight, zero mean Gaussian stochastic process $\{G(t), t \in [0,+\infty)\}$ with covariance kernel

$$\begin{aligned}
\Omega(s,t) = {} & M_B(-(s+t)) - M_B(-s)M_B(-t) + M_A(-(s+t)) + M_B(-s)M_A(-t) - M_{B,A}(-s,-t) \\
& - M_A(-s)M_A(-t) + M_B(-t)M_A(-s) - M_{B,A}(-t,-s), \quad \forall s, t \in [0,+\infty),
\end{aligned} \tag{18}$$

where $M_{B,A}(-t,-s)$ is the joint Laplace transform, and $M_A(-s)$, $M_B(-t)$ are the marginal Laplace transforms.

At this point, we can invoke the Continuous Mapping Theorem to establish the existence of asymptotic null distributions. Because the random function (17) is bounded with probability one, no modification of test statistics is necessary to ensure that they have non-degenerate asymptotic null distributions⁷. Now we present the main result of this section, the asymptotic validity of the bootstrap test based on the KS and CVM test statistics.

⁵ This construction ensures that the new DGP is in the null model.

⁶ Linton, Song, and Whang (2010) provide a simple example of how such a test is biased against certain local alternatives.

⁷ In the case of testing for finite order stochastic dominance under the null, when the supports of the random variables are unbounded, Horvath, Kokoszka, and Zitikis (2006) suggested using weighted versions of test statistics that depend on the full unbounded support of the joint CDF, because their asymptotic null distributions were non-degenerate. In testing for infinite order dominance under the null, we do not have this problem.

Theorem 3. Let $F \in \partial\mathcal{M}_0$, and let $T$ denote the KS or CVM test statistics. Furthermore, let $G_{F,T,\infty}(\cdot)$ denote the asymptotic cumulative distribution function of $T$, based on bivariate random samples from the CDF $F$. And let

$$G_{\hat{F},T,n}(q) = \mathrm{Prob}\left[T(X^\star) \le q \mid X\right]$$

be the probability distribution of the test statistic $T$ based on the bootstrap DGP $\hat{F}$, that is, conditional on the data $X$. Then, $\forall \varepsilon > 0$,

$$\lim_{n \to \infty} \mathrm{Prob}\left[\sup_{q \in \mathbb{R}} \left|G_{\hat{F},T,n}(q) - G_{F,T,\infty}(q)\right| > \varepsilon\right] = 0. \tag{19}$$

Now, following Linton et al. (2010), we pay attention to the control of asymptotic rejection probabilities in the presence of a point mass in $G_{F,T,\infty}(\cdot)$ at zero, and describe the set of probabilities in $\partial\mathcal{M}_0$ under which our bootstrap tests have asymptotically exact size.

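Corollary 1. Let $G_{\hat{F},p_T,n}(\cdot)$ denote the distribution of the bootstrap p-value (14) with $B \to \infty$. Then, $\forall \alpha \in (0, 1/2)$:

1. $G_{\hat{F},p_T,\infty}(\alpha) \le \alpha$.

2. If $G_{F,T,\infty}(\cdot)$ is continuous, then $G_{\hat{F},p_T,\infty}(\alpha) = \alpha$.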

The nominal level, $\alpha$, cannot exceed 1/2 because of the possible presence of a point mass at zero in $G_{F,T,\infty}(\cdot)$ (for DGPs in $\partial\mathcal{M}_0$). For every DGP in $\partial\mathcal{M}_0$, this mass is not large enough to make the $(1-\alpha)$-th quantile of $G_{F,T,\infty}(\cdot)$ equal to zero at conventional significance levels; in fact, the mass does not exceed 1/2. The second part of Corollary 1 tells us that the bootstrap tests are asymptotically similar on subsets of $\partial\mathcal{M}_0$ that give rise to a continuous asymptotic null distribution of the test statistic.

4.2 Likelihood Ratio Tests

Since we are able to compute a restricted empirical likelihood estimator, we derive the asymptotic distribution theory of the likelihood ratio test statistic given by

$$LR = -2\sum_{i=1}^n \log(n p_i), \tag{20}$$

where the $p_i$ are derived using the FVFA as described in Section 3. Now we present a result that describes the large sample behavior of the LR statistic with respect to $\mathcal{M}$, which is the main result of this section.

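Theorem 4. Let $F \in \mathcal{M}$, and let $LR$ be given by (20). Then,

$$LR \xrightarrow{d} \begin{cases} \chi^2_1, & \text{if } F \in \partial\mathcal{M}_0, \\ 0, & \text{if } F \in \mathcal{M}_0 \setminus \partial\mathcal{M}_0, \\ +\infty, & \text{if } F \in \mathcal{M} \setminus \mathcal{M}_0, \end{cases}$$

as $n \to +\infty$.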


An important result in Theorem 4 is that the likelihood ratio statistic is asymptotically pivotal on $\partial\mathcal{M}_0$ with a $\chi^2_1$ distribution, because of its implications for bootstrap testing. Specifically, the bootstrap LR test should benefit from asymptotic refinements in finite samples; see Beran (1988). The type of large sample result presented in Theorem 4 is known in the statistics literature as a Wilks Phenomenon, and is unprecedented in the literature on nonparametric inference for stochastic dominance orderings under the null.

Theorem 4 also has implications for the power of the asymptotic and bootstrap LR tests, which are presented in Corollary 2 below.

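Corollary 2. Given $0 < \alpha < 1$, let $cv(\alpha)$ be the $(1-\alpha)$-th quantile of the $\chi^2_1$ distribution. Then, $\forall F \in \mathcal{M} \setminus \mathcal{M}_0$,

1. $\mathrm{Prob}\left[LR \ge cv(\alpha)\right] \to 1$, as $n \to +\infty$,

2. $\mathrm{Prob}\left[p^\star(X) < \alpha\right] \to 1$, as $n \to +\infty$.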

Corollary 2 tells us that the asymptotic and bootstrap tests are consistent.
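Given constrained weights, the asymptotic LR test is straightforward to carry out. The sketch below computes (20) and an asymptotic p-value from the $\chi^2_1$ limit in Theorem 4, using the identity $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$, which follows from $\chi^2_1$ being the square of a standard normal variable. The function names are hypothetical, and the weights are assumed to come from a separate constrained empirical likelihood step.

```python
import math


def lr_statistic(p):
    """LR = -2 * sum_i log(n * p_i), as in (20)."""
    n = len(p)
    return -2.0 * sum(math.log(n * pi) for pi in p)


def chi2_1_sf(x):
    """Survival function of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def asymptotic_lr_test(p, alpha=0.05):
    """Return (LR, asymptotic p-value, reject flag) for weights p."""
    lr = lr_statistic(p)
    p_value = chi2_1_sf(lr)
    return lr, p_value, p_value <= alpha
```

With uniform weights $p_i = 1/n$ the statistic is zero and the null is never rejected, which matches the case where the unconstrained estimate already satisfies the constraint.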

5 Simulation Experiments

In this section, we describe the simulation design in our Monte Carlo experiments. We compare the size and power of three tests: the CVM bootstrap test based on the FVFA, the bootstrap empirical process procedure proposed in Linton, Song, and Whang (2010) (LSW), and the asymptotic LR test.

In practice, it is important to realize that whenever dominance of the infinite order holds in the sample, we can immediately decide on the outcome of the test without resorting to the use of a test statistic. That is because any reasonable test statistic that measures discrepancy between two Laplace transforms, and which is sensitive to departures from the null, will take values in the complement of its critical region. In this respect, one should use a test statistic for testing the null hypothesis whenever non-dominance of the infinite order holds in the sample. Also, note that when non-dominance in the sample takes the form of dominance that is in the opposite direction of that of the null hypothesis, again, a test statistic will not be required to decide on rejecting the null. Therefore, it is only when non-dominance in both directions holds in the sample, with a non-severe form of non-dominance from the perspective of the null hypothesis, that it becomes necessary to use a test statistic to decide on the outcome of the test.

With this in mind, under the null and alternative hypotheses, the simulation in each experiment wasconditional on the sample having the following properties:

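1. Non-dominance:

$$\overline{W} = \max_{t \in \mathbb{R}_+} \left( \hat{M}_B(-t) - \hat{M}_A(-t) \right) > 0 > \min_{t \in \mathbb{R}_+} \left( \hat{M}_B(-t) - \hat{M}_A(-t) \right) = \underline{W}.$$

2. Correct tail of the stochastic dominance function:

$$\hat{M}_B(-t) - \hat{M}_A(-t) \uparrow 0, \quad \text{as } t \to +\infty.$$

3. Severity of non-dominance in the direction opposite to the null hypothesis: $\overline{W} < |\underline{W}|$.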


We considered DGPs with parent joint distribution from the correlated (univariate) Gamma family of distributions, $\Gamma(a, b)$, where $a$ and $b$ are the shape and scale parameters respectively, and the correlational dependence structure is the standard Gaussian copula. This Gamma family is useful for our simulation experiments because the Laplace transforms of its members are known in closed form, and are given by

$$M(-t) = (1 + bt)^{-a}, \quad \forall t \geq 0. \qquad (21)$$
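As a quick check of (21), the closed-form transform can be compared with an empirical Laplace transform computed from simulated Gamma draws. The following is a minimal sketch in Python, not the MATLAB code used for the experiments; the function names `gamma_laplace` and `empirical_laplace`, the seed, and the number of draws are illustrative assumptions, and the draws are taken from a single marginal without the copula.

```python
import math
import random


def gamma_laplace(t, a, b):
    """Closed-form Laplace transform M(-t) of Gamma(shape a, scale b), as in (21)."""
    return (1.0 + b * t) ** (-a)


def empirical_laplace(sample, t):
    """Empirical Laplace transform: the sample mean of exp(-t * x)."""
    return sum(math.exp(-t * x) for x in sample) / len(sample)


# Illustrative check with Gamma(3, 1/2); the seed and sample size are arbitrary.
rng = random.Random(12345)
# random.gammavariate takes (shape, scale), matching the (a, b) convention above.
sample = [rng.gammavariate(3.0, 0.5) for _ in range(20000)]
for t in (0.5, 1.0, 2.0):
    print(t, gamma_laplace(t, 3.0, 0.5), empirical_laplace(sample, t))
```

The two columns should agree up to Monte Carlo error, which shrinks at rate $n^{-1/2}$ in the number of draws.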

In all of our tentative simulations we fixed the correlation parameter to 0.6. We also considered the following sample sizes: 16, 32, 64, and 128, and used 1000 replications per experiment, and 599 bootstrap samples per replication in each experiment.⁸ For completeness, we also describe in detail the LSW bootstrap empirical process procedure that uses the CVM test statistic (16). It proceeds as follows:

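1. Using unconstrained empirical likelihood, estimate a bootstrap DGP, $\hat{F}$.

2. Using the data, estimate the contact set using

$$\hat{S} = \left\{ t \in [0, +\infty) : \left| \hat{M}_B(-t) - \hat{M}_A(-t) \right| \leq c_n \right\}, \qquad (22)$$

and the test statistic as in (16), where $c_n \to 0^+$ and $c_n \sqrt{n} \to +\infty$, as $n \to +\infty$.

3. Simulate $B$ bootstrap samples, each of size $n$, $X^\star_1, \ldots, X^\star_B$ from $\hat{F}$, and their corresponding bootstrapped test statistics $T^\star_1, \ldots, T^\star_B$, where

$$T^\star_i = \begin{cases} n \displaystyle\int_{\hat{S} \cap [0,100)} \max\left( M^\star_B(-t) - M^\star_A(-t), 0 \right)^2 dt, & \displaystyle\int_{\hat{S} \cap [0,100)} dt > 0 \qquad (23) \\[2ex] n \displaystyle\int_{[0,100)} \max\left( M^\star_B(-t) - M^\star_A(-t), 0 \right)^2 dt, & \displaystyle\int_{\hat{S} \cap [0,100)} dt = 0 \qquad (24) \end{cases}$$

4. Construct the (approximate) bootstrap p-value:

$$p^\star(X) = \frac{1}{B} \sum_{i=1}^{B} 1\left[ T^\star_i > T \right]. \qquad (25)$$

5. Reject the null hypothesis in (1) if $p^\star(X) < \alpha$, for a nominal level of $\alpha$.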
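The five steps above can be sketched in code. The block below is a rough Python illustration under several stated assumptions, not the MATLAB implementation used in the experiments: the test statistic (16) is assumed to be $n \int_{[0,100)} \max(\hat{M}_B(-t) - \hat{M}_A(-t), 0)^2 dt$, the integrals are approximated by a midpoint Riemann sum, the positive-measure condition in (23) and (24) is replaced by "at least one grid point lies in the estimated contact set", and the bootstrapped statistics are computed exactly as displayed in (23) and (24). All function names and default arguments are hypothetical.

```python
import math
import random


def laplace_diff(pairs, t):
    """Empirical M_B(-t) - M_A(-t) for a list of (x_a, x_b) sample pairs."""
    return sum(math.exp(-t * xb) - math.exp(-t * xa) for xa, xb in pairs) / len(pairs)


def cvm_stat(pairs, grid, step, keep=None):
    """n times a Riemann sum of max(diff, 0)^2 over grid points flagged in keep."""
    total = 0.0
    for j, t in enumerate(grid):
        if keep is None or keep[j]:
            total += max(laplace_diff(pairs, t), 0.0) ** 2 * step
    return len(pairs) * total


def lsw_pvalue(pairs, c_n, n_boot, rng, upper=100.0, n_grid=200):
    """Approximate bootstrap p-value (25) following steps 1 to 4 (assumed form)."""
    step = upper / n_grid
    grid = [(j + 0.5) * step for j in range(n_grid)]  # midpoints of [0, upper)
    # Step 2: estimated contact set (22) on the grid, and the data statistic.
    contact = [abs(laplace_diff(pairs, t)) <= c_n for t in grid]
    t_data = cvm_stat(pairs, grid, step)
    # Case (23) integrates over the contact set; case (24) over the full interval.
    keep = contact if any(contact) else None
    # Steps 1 and 3: resample pairs with replacement (the unrestricted EL DGP).
    n = len(pairs)
    exceed = 0
    for _ in range(n_boot):
        star = [pairs[rng.randrange(n)] for _ in range(n)]
        if cvm_stat(star, grid, step, keep) > t_data:
            exceed += 1
    # Step 4: share of bootstrapped statistics strictly above the data statistic.
    return exceed / n_boot
```

A call such as `lsw_pvalue(pairs, c_n=len(pairs) ** (-1 / 3), n_boot=599, rng=random.Random(0))` would then be compared with the nominal level as in step 5.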

Remarks:

• The bootstrap DGP proposed in the LSW procedure is the resampling DGP, which is a multinomial distribution on the bivariate sample. In our simulation designs, it satisfies the alternative hypothesis.

⁸ These simulations are tentative because I will need to run the experiments with 10000 replications for more accurate results, and experiment with different DGPs as well. They were implemented in MATLAB 2011a, using M code that we wrote, on a desktop computer equipped with a quad-core processor, and took approximately 35 hours per experiment to complete. We are in the process of working out our computing needs to run the experiments with 10000 replications each.


• The finite sample performance of the LSW bootstrap method depends heavily on the choice of the sequence $c_n$ in estimating the contact set, $\hat{S}$, because it determines how the bootstrapped test statistics are computed, which in turn determines the rejection probabilities (under the null and alternative hypotheses). Furthermore, Linton et al. (2010) do not provide any optimal procedure for selecting the sequence $c_n$. They only show in their Monte Carlo experiments that the choices

$$c_n = k_0 n^{-1/2} \log(\log(n)), \quad k_0 \in \{3, 3.2, \ldots, 4\}, \qquad (26)$$

perform well in finite samples, which is only suggestive because it might be due to the simulation design, and they do not compare them to other sequences. In contrast, our method does not require the use of a contact set, and hence circumvents this problem.

• Notice that if $\hat{S} \cap [0, 100)$ has positive Lebesgue measure (i.e. case (23) above), then the bootstrapped test statistics are not computed in exactly the same way as the test statistic based on the data. For this reason some caution is necessary when interpreting the approximate bootstrap p-value based on them, given by equation (25). Only asymptotically can we interpret it as an actual p-value, because the bootstrapped distribution converges to the distribution of the test statistic under the null. Note that the bootstrap p-value in the LSW approach averages over the random variables (23) and (24), of which the former is the source of over-rejections and the latter is the source of under-rejections. In contrast, our method is based on the restricted maximum likelihood principle, and therefore gives rise to an approximate bootstrap p-value having the proper interpretation in finite samples.

• It is only when the bootstrapped test statistics are solely computed according to (24) that they are computed in exactly the same way as $T$. However, since $\left| \hat{M}_B(-t) - \hat{M}_A(-t) \right|$ is a continuous function of $t$ for each $n$, the contact set intersected with the interval $[0, 100)$ will have positive Lebesgue measure with probability one, provided that $c_n$ is small enough. For this reason we chose $c_n = n^{-1/3}$ in our simulation experiments when comparing their method to ours.
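To see how the candidate sequences compare at the sample sizes used in the experiments, the values of (26) and of $c_n = n^{-1/3}$ can be tabulated directly. This is a small illustrative sketch; the function names `c_lsw` and `c_cube_root` are hypothetical.

```python
import math


def c_lsw(n, k0):
    """Sequence (26): k0 * n^(-1/2) * log(log(n))."""
    return k0 * n ** -0.5 * math.log(math.log(n))


def c_cube_root(n):
    """The sequence c_n = n^(-1/3) used in the simulation experiments."""
    return n ** (-1.0 / 3.0)


# Tabulate both sequences for the sample sizes considered in the experiments.
for n in (16, 32, 64, 128):
    lsw_values = [round(c_lsw(n, k0), 4) for k0 in (3.0, 3.2, 3.4, 3.6, 3.8, 4.0)]
    print(n, round(c_cube_root(n), 4), lsw_values)
```

At these four sample sizes, $n^{-1/3}$ is the smaller threshold for every $k_0$ in (26), so it produces the smaller estimated contact set.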

5.1 Simulation Under the Null Hypothesis

In this section, we present our tentative results from simulation experiments under the null. The DGP is one of equal marginal distributions, both equal to Γ(3, 1/2). In the simulation experiments, we conducted the CVM bootstrap tests using the LSW procedure and the FVFA, and the asymptotic test based on the likelihood ratio statistic. We chose $c_n = n^{-1/3}$ in the LSW bootstrap procedure because, at the sample sizes we consider, it is smaller than any of the sequences in (26), even though it converges to zero at a slower rate as $n \to +\infty$. Interestingly, in all our experiments under the null, we obtained an empirical size of zero for the test that uses the LSW procedure, for all conventional significance levels. The empirical size distortions for the CVM bootstrap test based on the FVFA and for the asymptotic LR test are reported in Table 1. The estimates in Table 1 show that the CVM bootstrap test using the FVFA has lower size distortion across experiments than the asymptotic LR test, for conventional significance levels. The approximate bootstrap p-value distributions for all the tests in our experiments are presented in Figure 1.


Table 1: Empirical Size Distortions

Sample size   Method   Significance level
                       0.01              0.05              0.1
16            FVFA     0 (0.0995)        -0.033 (0.1293)   -0.055 (0.2073)
16            LR       0.065 (0.2618)    0.106 (0.3629)    0.107 (0.4053)
64            FVFA     0.03 (0.196)      -0.001 (0.2159)   -0.029 (0.2568)
64            LR       0.136 (0.3531)    0.166 (0.4115)    0.167 (0.4424)
128           FVFA     0.062 (0.2585)    0.027 (0.2666)    -0.005 (0.2932)
128           LR       0.182 (0.3939)    0.22 (0.444)      0.232 (0.4709)

Notes: The estimated size distortions from the FVFA and LR tests, with estimated standard errors in parentheses.

For all of these experiments, the CVM bootstrap test using the LSW procedure performed very poorly. This is partly because the bootstrap DGP chosen by the LSW procedure in our experiments satisfies the alternative, which naturally gives rise to under-rejections. From Figure 1, we observe that its bootstrap p-value distribution places zero probability on the [0, 0.4] interval. The other aspect driving the inferiority of the LSW procedure is that the event

$$\left\{ \max_{t \in [0,+\infty)} \left| \hat{M}_B(-t) - \hat{M}_A(-t) \right| \leq \frac{1}{\sqrt{n}} \right\}, \qquad (27)$$

has a high probability of occurrence in our experiments. Now, because

$$\left\{ \max_{t \in [0,+\infty)} \left| \hat{M}_B(-t) - \hat{M}_A(-t) \right| \leq \frac{1}{\sqrt{n}} \right\} \subset \left\{ \max_{t \in [0,+\infty)} \left| \hat{M}_B(-t) - \hat{M}_A(-t) \right| \leq c_n \right\},$$

holds for all $c_n$ such that $c_n \sqrt{n} \to \infty$, the estimated contact sets ($\hat{S}$, given by (22)) are equal to the entire interval $[0,+\infty)$ in our experiments, and this would be true for any valid choice of $c_n$. This implies that the bootstrapped test statistics in the LSW procedure are constructed in exactly the same way as $T_n$ is from the data, while being obtained from a bootstrap DGP (multinomial on the sample pairs) that satisfies the alternative hypothesis.

Figure 1: Using MATLAB, we simulated the CDFs of the p-value for the feasible value-function approach (FVFA), the bootstrap empirical process method of Linton, Song, and Whang (2010) (LSW), and the asymptotic LR test (LR).

To estimate the likelihood of the event (27) for small and large sample sizes, we simulated the random variable $|\underline{W}|$, the absolute value of the sample minimum in property 1 above, because

$$\max_{t \in [0,+\infty)} \left| \hat{M}_B(-t) - \hat{M}_A(-t) \right| = |\underline{W}|, \qquad (28)$$

holds in our simulation design. Figure 2 presents the boxplots for 10000 replications of the random variable $|\underline{W}|$ in our simulation design, for sample sizes 64, 128, 1024, and 2048. The important finding is that the relative frequency of the event (27) is unity for all sample sizes, because the maximum in each boxplot is less than $n^{-1/2}$ for its corresponding sample size. Additionally, the boxplots reveal that the (conditional) null distribution of $|\underline{W}|$ is skewed and, as the sample size increases, places more probability on the lower end of its support. We expect to see such characteristics since the "true" DGP in the model of the null hypothesis is one of equality in the marginal distributions.
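The frequency of the event (27) can be approximated with a short simulation. The sketch below is an illustration under simplifying assumptions rather than a replication of the experiment: it draws pairs from Gamma marginals with integer shape (so the CDF has the closed Erlang form) linked by a Gaussian copula, it does not condition on the three sample properties listed earlier, and it replaces the maximum over $[0,+\infty)$ by a maximum over a finite grid. All function names and the grid are hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def erlang_cdf(x, k, scale):
    """CDF of Gamma(shape k, scale) for integer k (the Erlang distribution)."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= z / j
        total += term
    return 1.0 - math.exp(-z) * total


def erlang_quantile(u, k, scale):
    """Invert erlang_cdf by bisection after bracketing the quantile."""
    lo, hi = 0.0, scale
    while erlang_cdf(hi, k, scale) < u:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, k, scale) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def draw_pairs(n, rho, rng, k_a=3, scale_a=0.5, k_b=3, scale_b=0.5):
    """Draw n pairs (x_a, x_b) with Gamma marginals and a Gaussian copula."""
    std = NormalDist()
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((erlang_quantile(std.cdf(z1), k_a, scale_a),
                      erlang_quantile(std.cdf(z2), k_b, scale_b)))
    return pairs


def max_abs_diff(pairs, grid):
    """Grid approximation to max over t of |M_B(-t) - M_A(-t)| (empirical)."""
    n = len(pairs)
    return max(abs(sum(math.exp(-t * xb) - math.exp(-t * xa) for xa, xb in pairs)) / n
               for t in grid)


def event_frequency(n, reps, rng, rho=0.6):
    """Share of replications in which the event (27) occurs (unconditional)."""
    grid = [0.1 * j for j in range(1, 201)]  # t in (0, 20], an arbitrary grid
    hits = sum(max_abs_diff(draw_pairs(n, rho, rng), grid) <= n ** -0.5
               for _ in range(reps))
    return hits / reps
```

For example, `event_frequency(64, 200, random.Random(1))` estimates the relative frequency of (27) under equal Γ(3, 1/2) marginals with correlation parameter 0.6.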

Figure 2: Using MATLAB, for each of the sample sizes 64, 128, 1024, and 2048, we generated a boxplot of $10^4$ samples of the random variable $|\underline{W}|$ that defines the event (27), in our experiments under the null. The boxes in the plot contain 50% of the simulated values of $|\underline{W}|$, and the central bar indicates the median. The whiskers indicate the range containing "most" of the simulated values of $|\underline{W}|$; the simulated values not included in this range are indicated by red crosses.

5.2 Simulation Under the Alternative Hypothesis

In this section, we present our tentative results from simulation experiments under the alternative. The DGP is based on the correlated Γ(3, 1/2) and Γ(2, 1) (marginal) distributions. We selected these marginal distributions because neither dominates the other at the infinite order. Figure 3 presents the graphs of the marginal cumulative distributions, and the difference in their corresponding Laplace transforms. With respect to our notation, we treat Γ(3, 1/2) as distribution B, and Γ(2, 1) as distribution A.

Figure 3: The first panel presents a plot of the marginal CDFs of Γ(3, 1/2) and Γ(2, 1). The second panel depicts the difference in the Laplace transforms of the two marginal CDFs.

For n = 16, 32, 64, with 1000 replications per experiment, we conducted the CVM bootstrap tests using the LSW procedure and the FVFA, and the asymptotic test based on the likelihood ratio statistic. As in the previous section, the test based on the LSW procedure performed very poorly. Its empirical power is zero across all experiments, and for all conventional significance levels. The other tests had much better empirical power properties, which are reported in Table 2 below. Figure 4 displays the boxplots of the p-values for all the tests. Since the alternative is known to be true, a superior test should have smaller p-values, leading to more rejections. In Figure 4, we observe that the test based on the FVFA has higher power than its LSW counterpart, and that the asymptotic LR test clearly has the highest power.

Figure 4: Boxplot of p-values from simulation under the alternative for the LSW (green) and FVFA (blue) bootstrap procedures, and the LR test (red).

As in the previous subsection, the source of the poor finite sample performance of the CVM bootstrap test that uses the LSW procedure is partly due to the fact that the bootstrap DGP (multinomial on the sample pairs) belongs to the alternative in our experimental design. The other source driving the inferiority of this test is that the conditional probability of the event (27) (conditional on our experimental design) is high. To better assess how high this probability is, we simulated the random variable $|\underline{W}|$ 10000 times⁹ under the alternative in our experimental design for sample sizes n = 16, 32, and 64, and plotted the simulated values using boxplots, which are depicted in Figure 5.

Table 2: Empirical Power

Sample size   Method   Significance level
                       0.01              0.05              0.1
16            FVFA     0.024 (0.153)     0.115 (0.319)     0.237 (0.4252)
16            LR       0.225 (0.4176)    0.34 (0.4737)     0.416 (0.4929)
32            FVFA     0.152 (0.359)     0.358 (0.4794)    0.541 (0.4983)
32            LR       0.589 (0.492)     0.697 (0.4596)    0.755 (0.4301)
64            FVFA     0.63 (0.4828)     0.785 (0.4108)    0.865 (0.3417)
64            LR       0.9130 (0.2818)   0.942 (0.2337)    0.954 (0.2095)

Notes: Empirical power from the FVFA and LR tests, with estimated standard errors in parentheses.
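A note on reading the parenthesized values in Table 2: numerically, they coincide with $\sqrt{\hat{p}(1-\hat{p})}$, the sample standard deviation of the rejection indicator, where $\hat{p}$ is the reported empirical power. Under that reading, which is an inference from the numbers rather than something stated in the text, the Monte Carlo standard error of $\hat{p}$ itself would be smaller by a factor of $\sqrt{1000}$. The sketch below reproduces both quantities; the function names are hypothetical.

```python
import math


def indicator_sd(p):
    """Standard deviation of a Bernoulli rejection indicator with mean p."""
    return math.sqrt(p * (1.0 - p))


def mc_standard_error(p, reps):
    """Monte Carlo standard error of an empirical rejection frequency."""
    return math.sqrt(p * (1.0 - p) / reps)


# FVFA row for n = 16 in Table 2, with 1000 replications per experiment.
for p in (0.024, 0.115, 0.237):
    print(p, round(indicator_sd(p), 4), round(mc_standard_error(p, 1000), 4))
```

The first printed column after each power value matches the corresponding entry in parentheses in Table 2 up to rounding.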

In Figure 5, we observe that the relative frequency of the event (27) in $10^4$ replications is unity (i.e. the maximum value of $|\underline{W}|$ is less than $n^{-1/2}$, for each sample size). Hence, as in the case under the null, the bootstrapped test statistics in the LSW procedure will be computed in exactly the same way as $T_n$. With these bootstrapped test statistics being simulated from a bootstrap DGP that satisfies the alternative hypothesis (due to our simulation design), we can expect the bootstrap test based on the LSW procedure not to have high power, as demonstrated in Figure 4 (in fact, the test is biased for all conventional significance levels). The results of this experiment are only suggestive, and do not constitute a definitive conclusion concerning the inferiority of the LSW procedure when compared to the one based on the FVFA for this testing problem. For this reason, more experiments are required, in which we consider other parent distributions and larger sample sizes as well.

Figure 5: Boxplot of simulated values of |W | under the H1 simulation design.

⁹ Note that (28) still holds in our experimental design under the alternative.


6 Conclusion

In this paper, we focused on developing tests for infinite order stochastic dominance under the null, because such tests are essentially tests of whether or not one can rank two (marginal) CDFs using a stochastic dominance relation of some finite order, in a specific direction. We developed a bootstrap procedure which is applicable to a wide class of test statistics that can be used to test for infinite order stochastic dominance. We used the method of constrained empirical likelihood to derive a bootstrap DGP that satisfies the two golden rules of bootstrapping. In deriving it, we extended the FVFA due to Tabri and Davidson (2011) to the case of empirical likelihood. This development is a first step towards working with a more flexible statistical model than the one used in Tabri and Davidson (2011). We also derived asymptotic and bootstrap tests based on the LR statistic, in which a Wilks phenomenon was unveiled. This development is unprecedented in the inference literature for testing unidirectional stochastic dominance under the null. Furthermore, this result has theoretical implications concerning the existence of asymptotic refinements for the bootstrap LR test in finite samples. Finally, a remarkable feature of our proposed restricted estimator of the bootstrap DGP, as described in Section 3, is its computational simplicity: the number of unknown variables required to compute it is always two, regardless of the sample size.

For various sample sizes, we demonstrated in our simulation experiments that our bootstrap procedure performs better than the one proposed by Linton et al. (2010), in terms of lower test size distortion and higher power. The inferiority of their method in our simulation experiments was due to the fact that the approximate contact set (a $c_n$ enlargement of the contact set) did not play any role in their bootstrap procedure. Specifically, it was a high probability event that the approximate contact set had positive measure, and was equal to the entire domain of integration for the CVM test statistic that we used. For this reason, their procedure amounted to bootstrapping from a DGP that satisfies the alternative hypothesis, in which the bootstrapped test statistics are computed in exactly the same way as the test statistic that is computed from the data. It is important to remind ourselves that, in practice, we do not have access to the "true" DGP. Therefore, when using the LSW bootstrap procedure with actual data, we cannot determine whether or not we are facing a situation as in our simulation experiments. This is why it is important, when designing a bootstrap procedure, that the two golden rules of bootstrapping are implemented. Although our simulations demonstrated the power of the asymptotic LR test over the other tests we used, this may be due to the optimality of employing a likelihood ratio test, as it is deeply tied to such notions in statistical testing. Deriving the regularity conditions under which such optimality follows is part of my current research on the subject. Meanwhile, more simulations are underway, in which we consider the bootstrap LR and KS tests, and these will be provided in an updated version of the paper.

The methods developed in this paper can also be adapted to tests for finite orders of stochastic dominance. For order three and higher, it is straightforward to develop a feasible value-function approach as described in this paper. In the case of first order stochastic dominance, the objective function used to determine the value function is the mathematical difference in the two estimated marginal distributions. These estimates of the distribution functions are discontinuous by construction, and for this reason, we cannot prescribe a feasible value-function approach that uses the standard classical optimization theory as in this paper. The discontinuity of the estimates of the distribution functions is also the reason why we cannot directly extend the methods in this paper to testing for unidirectional second order stochastic dominance under the null. In this case, the objective function in the optimization problem that determines the value function is continuous; however, its derivative only exists at points where the estimates of the marginal distribution functions are continuous. How to circumvent this problem is one of the subjects of my on-going research projects.

References

A. Atkinson. On the measurement of inequality. Journal of Economic Theory, 2:244–263, 1970.

G. F. Barrett and S. G. Donald. Consistent tests for stochastic dominance. Econometrica, 71(1):71–104, 2003.

C. Bennett. Consistent tests for completely monotone stochastic dominance. 2007.

R. Beran. Prepivoting test statistics: A bootstrap view of asymptotic refinements. Journal of the American Statistical Association, 83:1967–1986, 1988.

P. Billingsley. Convergence of Probability Measures. John Wiley & Sons, New York, 1999.

R. Davidson. Bootstrapping Econometric Models. Quantile, 3:13–36, 2007.

R. Davidson and J-Y Duclos. Statistical inference for stochastic dominance and for the measurement of poverty and inequality. Econometrica, 68:1435–1464, 2000.

A. C. Davison and D. V. Hinkley. Bootstrap Methods and their Application. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1997.

J. Elker, D. Pollard, and W. Stute. Glivenko-Cantelli Theorems for Classes of Convex Sets. Advances in Applied Probability, 1(4):820–833, 1979.

W. Feller. An Introduction to Probability Theory and Its Applications, 2nd ed. Number v. 2 in Wiley publication in mathematical statistics. Wiley India Pvt. Limited, 2008. ISBN 9788126518067.

P. C. Fishburn and R. D. Willig. Transfer Principles in Income Redistribution. Journal of Public Economics, 25:323–328, 1984.

J. Foster and A. Shorrocks. Poverty orderings. Econometrica, 56:173–177, 1988.

J. Foster, E. J. Greer, and E. Thorbecke. A class of decomposable poverty measures. Econometrica, 52:761–766, 1984.

R. Hall and R. Wilson. Two Guidelines for Bootstrap Hypothesis Testing. Biometrics, 47:757–762, 1991.

J. Horowitz. The Bootstrap, volume 5 of Handbook of Econometrics. Elsevier, 1999.

L. Horvath, P. Kokoszka, and R. Zitikis. Testing for Stochastic Dominance using the Weighted McFadden-type Statistic. Journal of Econometrics, 133:191–205, 2006.

J. Knight and S. Satchell. Testing for infinite order stochastic dominance with applications to finance, risk, and income inequality. Journal of Economics and Finance, 32:35–46, 2008.

S-C. Kolm. Unequal Inequalities I. Journal of Economic Theory, 12:416–442, 1976a.

S-C. Kolm. Unequal Inequalities II. Journal of Economic Theory, 13:82–111, 1976b.

M. Kosorok. Introduction to Empirical Process Theory and Semiparametric Inference. Springer, 2007.

E. Lehmann and P. Romano. Testing Statistical Hypotheses. Springer, 2005.

O. Linton, E. Maasoumi, and Y-J. Whang. Consistent testing for stochastic dominance under general sampling schemes. Review of Economic Studies, 72:735–765, 2005.

O. Linton, K. Song, and Y-J. Whang. An Improved Bootstrap Test for Stochastic Dominance. Journal of Econometrics, 154:186–202, 2010.

D. McFadden. Testing for stochastic dominance. Studies in the Economics of Uncertainty in honor of Josef Hadar. Springer-Verlag, 1989.

A. Owen. Empirical Likelihood, volume 92 of Monographs on Statistics and Applied Probability. Chapman & Hall/CRC, 2001.

R. V. Tabri and R. Davidson. Asymptotic and Bootstrap tests for stochastic dominance via the Method of Maximum Likelihood. 2011.

P. D. Thistle. Negative moments, risk aversion and stochastic dominance. Journal of Financial Quantitative Analysis, 28:301–311, 1993.

A. W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, first edition, 2000.

V. Zinde-Walsh. Kernel Estimation when Density may not Exist. Econometric Theory, 24(3):696–725, 2008.

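A Proofs

Proposition 4. The statistical model defined in Assumption 1 is relatively compact in the topology of weak convergence.

Proof. First, let the statistical model be denoted by $\mathcal{M}$. A typical element of $\mathcal{M}$ is a DGP, which in our setup is a joint distribution of the (bivariate) random sample, with parent bivariate distribution satisfying the properties described in Assumption 1. We denote the parent distribution by $F$, and the element corresponding to it in the statistical model by $F^{\otimes n}$. Our aim is to show that $\mathcal{M}$ is relatively compact, and we will use Prokhorov's Theorem (stated as Theorem 5.1 in Billingsley (1999)). That is, we will show that $\mathcal{M}$ is tight.

Let $\eta > 0$. We need to find a compact set $\mathcal{K} \subset [0,+\infty)^{2n}$ such that $F^{\otimes n}[\mathcal{K}] > 1 - \eta$, for all $F^{\otimes n} \in \mathcal{M}$. To that end, let $\varepsilon > 0$ be given, and consider the set

$$H = \left\{ \exists i \in \{1, \ldots, n\}, K \in \{A, B\} : X_i^K > \varepsilon \right\}.$$

Then,

$$F^{\otimes n}[H] \leq \sum_{i=1}^{n} \sum_{K=A}^{B} F_K\left[ X_i^K > \varepsilon \right],$$

holds by subadditivity of a measure. By Chebyshev's inequality,

$$F_K\left[ X_i^K > \varepsilon \right] \leq \frac{E_{F_K}\left[ (X^K)^2 \right]}{\varepsilon^2}.$$

Then, by the uniform bound in Assumption 1,

$$\frac{E_{F_K}\left[ (X^K)^2 \right]}{\varepsilon^2} \leq C_0^K / \varepsilon^2, \quad \forall F,$$

which implies that

$$F^{\otimes n}[H] \leq 2n \max_{K=A,B} C_0^K / \varepsilon^2, \quad \forall F^{\otimes n} \in \mathcal{M}.$$

Therefore,

$$F^{\otimes n}\left[ X_i^A \leq \varepsilon, X_i^B \leq \varepsilon, i = 1, \ldots, n \right] = 1 - F^{\otimes n}[H] \geq 1 - 2n \max_{K=A,B} C_0^K / \varepsilon^2,$$

holds $\forall F^{\otimes n} \in \mathcal{M}$. Choosing $\varepsilon = 2\sqrt{n \max_{K=A,B} C_0^K / \eta}$ makes the right-hand side equal to $1 - \eta/2 > 1 - \eta$, so we let

$$\mathcal{K} = [0,+\infty)^{2n} - H = \left\{ X_i^K \leq 2\sqrt{n \max_{K=A,B} C_0^K / \eta}, \; i = 1, \ldots, n, \; K = A, B \right\},$$

which is compact.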

Lemma 2. The class $\mathcal{F}$ defined in (38) is a measurable VC subgraph class, and is pointwise measurable. Furthermore, the classes of functions $\mathcal{F}_\delta$ and $\mathcal{F}^2_\infty$, given by Definition 3, are pointwise measurable for every $\delta > 0$.

Proof. For the proof that (38) is a measurable VC subgraph class, see Bennett (2007). To show that $\mathcal{F}$ is pointwise measurable, consider the countable set $\mathcal{G} = \{e^{-tx} - e^{-ty}, t \in \mathbb{Q}_+\}$, where $\mathbb{Q}_+$ is the set of positive rational numbers. Then, since $\mathbb{Q}_+$ is a dense subset of $\mathbb{R}_+$, we are done. To show that $\mathcal{F}_\delta$ and $\mathcal{F}^2_\infty$ are pointwise measurable, consider a dense subset of each, in which $t$ is rational. The result follows from the fact that the rational numbers are a dense subset of the real number system.

Lemma 3. Consider two non-negative random variables $J$ and $J'$, with common support $[0, s]$, and let $(U_i)_{i=1}^n$ be a random sample from the uniform distribution $U[0, 1]$. Obtain random samples from the two distributions by applying their quantile functions to the uniform random numbers, and construct the empirical Laplace transforms of these distributions: $\hat{M}_K(-t) = \frac{1}{n} \sum_{i=1}^n e^{-t z_i^K}$, $t \geq 0$, where $(z_i^K)_{i=1}^n$ is the resulting sample for $K = J, J'$. Then,

$$M_J(-t) \geq M_{J'}(-t), \; \forall t \geq 0 \implies \hat{M}_J(-t) \geq \hat{M}_{J'}(-t), \; \forall t \geq 0.$$


Proof of Lemma 3:

Proof. The proof proceeds by contraposition. Suppose there exists $t'$ such that $\hat{M}_J(-t') < \hat{M}_{J'}(-t')$, which means that (the random variable) $\hat{M}_{J'}(-t')$ dominates $\hat{M}_J(-t')$ at the first order. Let $G^K_{t',n}(\cdot)$, $K = J, J'$, denote their cumulative distribution functions; then this first order dominance in terms of the CDFs is

$$G^{J'}_{t',n}(x) < G^{J}_{t',n}(x), \quad \forall x \in [0, 1].$$

We also have that

$$M_K(-t') = E\left[ \hat{M}_K(-t') \right] = \int_0^1 x \, dG^K_{t',n}(x), \quad K = J, J', \qquad (29)$$

where the first equality follows from the unbiasedness of the empirical Laplace transform at a point. Then, applying integration by parts to the quantity on the extreme right of (29) yields:

$$M_K(-t') = 1 - \int_0^1 G^K_{t',n}(x) \, dx, \quad K = J, J'.$$

Hence,

$$M_J(-t') - M_{J'}(-t') = \int_0^1 G^{J'}_{t',n}(x) \, dx - \int_0^1 G^{J}_{t',n}(x) \, dx < 0.$$

Thus, we found a point $t'$ such that $M_J(-t') - M_{J'}(-t') < 0$.

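Proof of Proposition 3:

Proof. Let $\hat{F}$ be the unrestricted empirical likelihood estimator, and let $\tilde{F}$ denote the restricted one. Fix $D \in \mathcal{D}$; then $\tilde{F}(D) - F(D) = \tilde{F}(D) - \hat{F}(D) + \hat{F}(D) - F(D)$. Then, by the triangle inequality:

$$\sup_{D \in \mathcal{D}} |\tilde{F}(D) - F(D)| \leq \sup_{D \in \mathcal{D}} |\tilde{F}(D) - \hat{F}(D)| + \sup_{D \in \mathcal{D}} |\hat{F}(D) - F(D)|.$$

The part $\sup_{D \in \mathcal{D}} |\hat{F}(D) - F(D)|$ goes to zero in probability by the Glivenko-Cantelli Theorem. All we have left to show is that $\sup_{D \in \mathcal{D}} |\tilde{F}(D) - \hat{F}(D)| = o_p(1)$. Fixing $D \in \mathcal{D}$, notice that

$$\tilde{F}(D) - \hat{F}(D) = \sum_{i=1}^n \left( p_i - \frac{1}{n} \right) \delta_{X_i}(D) = \frac{\hat{\mu}}{n} \sum_{i=1}^n \left( \frac{e^{-\hat{t} X_i^A} - e^{-\hat{t} X_i^B}}{n + \hat{\mu}\left( e^{-\hat{t} X_i^B} - e^{-\hat{t} X_i^A} \right)} \right) \delta_{X_i}(D),$$

which implies that

$$\sup_{D \in \mathcal{D}} |\tilde{F}(D) - \hat{F}(D)| \leq \frac{|\hat{\mu}|}{n} \sum_{i=1}^n \left| \frac{e^{-\hat{t} X_i^A} - e^{-\hat{t} X_i^B}}{n + \hat{\mu}\left( e^{-\hat{t} X_i^B} - e^{-\hat{t} X_i^A} \right)} \right| \leq \frac{|\hat{\mu}|}{n} \sum_{i=1}^n \left| \frac{2}{n + \hat{\mu}\left( e^{-\hat{t} X_i^B} - e^{-\hat{t} X_i^A} \right)} \right|.$$

If $\hat{\mu} = o_p(1)$, then we are done. In fact, $\hat{\mu} = o_p(1)$ holds, and can be shown by applying the Mean Value Theorem to the $2 \times 2$ system (11) in the variables $t$ and $\mu$, in the neighborhood of a

$$t_0 \in \arg\max \left\{ M_B(-t) - M_A(-t), \; t \in (0,+\infty) \right\},$$

and zero respectively. It is important to note that when $F$ belongs to the boundary of the null,

$$\arg\max \left\{ M_B(-t) - M_A(-t), \; t \in (0,+\infty) \right\}, \qquad (30)$$

can be a set. When $F$ belongs to the alternative model, (30) is a singleton.¹⁰ The details are as follows: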

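$$0 = \frac{1}{n} \sum_{i=1}^n \left( e^{-t_0 X_i^B} - e^{-t_0 X_i^A} \right) + \psi_{11} \hat{\mu} + \psi_{12} (\hat{t} - t_0) \qquad (31)$$

$$0 = \frac{1}{n} \sum_{i=1}^n \left( X_i^A e^{-t_0 X_i^A} - X_i^B e^{-t_0 X_i^B} \right) + \psi_{21} \hat{\mu} + \psi_{22} (\hat{t} - t_0), \qquad (32)$$

where

$$\psi_{11} = -\sum_{i=1}^n \left. \frac{\left( e^{-t X_i^B} - e^{-t X_i^A} \right)^2}{n + \mu\left( e^{-t X_i^B} - e^{-t X_i^A} \right)} \right|_{(\bar{\mu}, \bar{t})}, \qquad \psi_{12} = \sum_{i=1}^n \left. \frac{X_i^A e^{-t X_i^A} - X_i^B e^{-t X_i^B}}{\left[ n + \mu\left( e^{-t X_i^B} - e^{-t X_i^A} \right) \right]^2} \right|_{(\bar{\mu}, \bar{t})},$$

$$\psi_{21} = -\sum_{i=1}^n \left. \frac{n \left( e^{-t X_i^B} - e^{-t X_i^A} \right) \left( X_i^A e^{-t X_i^A} - X_i^B e^{-t X_i^B} \right)}{\left[ n + \mu\left( e^{-t X_i^B} - e^{-t X_i^A} \right) \right]^2} \right|_{(\bar{\mu}, \bar{t})},$$

$$\psi_{22} = \sum_{i=1}^n \left. \frac{(X_i^B)^2 e^{-t X_i^B} - (X_i^A)^2 e^{-t X_i^A}}{n + \mu\left( e^{-t X_i^B} - e^{-t X_i^A} \right)} \right|_{(\bar{\mu}, \bar{t})} - \bar{\mu} \sum_{i=1}^n \left. \frac{\left( X_i^A e^{-t X_i^A} - X_i^B e^{-t X_i^B} \right)^2}{\left[ n + \mu\left( e^{-t X_i^B} - e^{-t X_i^A} \right) \right]^2} \right|_{(\bar{\mu}, \bar{t})} \qquad (33)$$

with $|\bar{\mu}| \leq |\hat{\mu}|$, and $|\bar{t} - t_0| \leq |\hat{t} - t_0|$. Now, let $\Psi_n = \begin{pmatrix} \psi_{11} & \psi_{12} \\ \psi_{21} & \psi_{22} \end{pmatrix}$. Then, we have in matrix form

$$\begin{bmatrix} \hat{\mu} \\ \hat{t} - t_0 \end{bmatrix} = \Psi_n^{-1} \begin{bmatrix} -\frac{1}{n} \sum_{i=1}^n \left( e^{-t_0 X_i^B} - e^{-t_0 X_i^A} \right) \\ -\frac{1}{n} \sum_{i=1}^n \left( X_i^A e^{-t_0 X_i^A} - X_i^B e^{-t_0 X_i^B} \right) \end{bmatrix}, \qquad (34)$$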

provided that

$$\psi_{11} \psi_{22} - \psi_{21} \psi_{12} \neq 0. \qquad (35)$$

Note that (35) holds whenever $\mathcal{H}$, given by (7), is non-empty.

From this point on, we want to use Slutsky's Theorem and the Weak Law of Large Numbers to determine the limits of $\hat{\mu}$ and $\hat{t}$, depending on whether the "true" DGP belongs to the boundary of the null hypothesis, or the alternative hypothesis. Therefore, we require that $\Psi_n \overset{p}{\to} \Psi$, positive definite, which is met since

¹⁰ A quick proof by contradiction proceeds as follows. Suppose that $t_0, t_0'$ belong to (30) with $t_0 \neq t_0'$, and $M_B(-t_0) - M_A(-t_0) = M_B(-t_0') - M_A(-t_0')$. Without loss of generality, suppose that $t_0 < t_0'$ holds. Then, using $M_K(-t_0) > M_K(-t_0')$, $K = A, B$ (i.e. Laplace transforms are strictly decreasing functions on $(0,+\infty)$), $M_B(-t_0) - M_A(-t_0) > M_B(-t_0') - M_A(-t_0) > M_B(-t_0') - M_A(-t_0')$. Contradiction.


1. $\psi_{11} \overset{p}{\to} -E_F\left[ \left( e^{-t_0 X^B} - e^{-t_0 X^A} \right)^2 \right] < 0$, because

$$E_F\left[ \left( e^{-t_0 X^B} - e^{-t_0 X^A} \right)^2 \right] > \left( E_F\left[ e^{-t_0 X^B} - e^{-t_0 X^A} \right] \right)^2 \geq 0 \qquad (36)$$

holds, where the strict inequality in (36) is due to an application of Jensen's inequality (i.e. the square function is strictly convex on $[-1, 1]$), and the fact that the random variable $e^{-t X^B} - e^{-t X^A}$ is non-degenerate for every $t \in (0,+\infty)$. The weak inequality in (36) is an equality when $F$ belongs to the boundary, and is strict when $F$ belongs to the alternative.

2. $\psi_{22} \overset{p}{\to} E_F\left[ (X^B)^2 e^{-t_0 X^B} - (X^A)^2 e^{-t_0 X^A} \right] < 0$, because the second moments exist, and the second order condition for a local maximum at $t_0$ is fulfilled.

3. $\psi_{12} \overset{p}{\to} 0$, $\psi_{21} = O_p(1)$.

Furthermore, we have that the probability limit of the vector in (34) is

$$\begin{bmatrix} -\frac{1}{n} \sum_{i=1}^n \left( e^{-t_0 X_i^B} - e^{-t_0 X_i^A} \right) \\ -\frac{1}{n} \sum_{i=1}^n \left( X_i^A e^{-t_0 X_i^A} - X_i^B e^{-t_0 X_i^B} \right) \end{bmatrix} \overset{p}{\longrightarrow} \begin{bmatrix} -E_F\left[ e^{-t_0 X^B} - e^{-t_0 X^A} \right] \\ -E_F\left[ X^A e^{-t_0 X^A} - X^B e^{-t_0 X^B} \right] \end{bmatrix} \qquad (37)$$

by an application of the Weak Law of Large Numbers. If $F$ belongs to the boundary of the null, then the limit is the zero vector. If $F$ belongs to the alternative hypothesis, then the probability limit of $\hat{t} - t_0$ is zero, whereas the probability limit of $\hat{\mu}$ is non-zero. The former follows from the fact that $E_F\left[ X^A e^{-t_0 X^A} - X^B e^{-t_0 X^B} \right]$ is the first order condition defining $t_0$, and the latter results from the fact that $E_F\left[ e^{-t_0 X^B} - e^{-t_0 X^A} \right] > 0$ under the alternative.

From this proof, we observe that $\hat{\mu}$ and $\hat{t} - t_0$ are both $O_p\left( n^{-1/2} \right)$ for $F$ on the boundary of the null, and in the alternative hypothesis.

Proof of Theorem 2:

Proof. Our approach relies on viewing the empirical process under study as indexed by a class of functions satisfying various properties. These properties take the form of suitable measurability requirements and "size" requirements in terms of the concept of entropy, based on covering numbers. Let $\mathcal{F}$ be the class of functions given by

$$\mathcal{F} = \left\{ e^{-tx} - e^{-ty}, \; t \in \mathbb{R}_+ \right\}, \qquad (38)$$

where $(x, y) \in \mathbb{R}^2_+$. To prove the result, we need to show that $\mathcal{F}$ is Donsker, using Theorem 8.19 of Kosorok (2007); the definition of a Donsker class and this theorem are stated in Appendix B as Definition 1 and Theorem 7 respectively.

Bennett (2007) showed that the class of functions given by equation (38) is a Vapnik-Cervonenkis class, or VC class, that is, a class of functions whose covering numbers grow at a polynomial rate. Lemma 2 implies that $J(1, \mathcal{F}, L_2) < \infty$ holds, and that $\mathcal{F}_\delta, \mathcal{F}^2_\infty$ are P-measurable for every $\delta > 0$. Finally, $F(x, y) = 1, \forall (x, y) \in \mathbb{R}^2_+$ is an envelope function for the class $\mathcal{F}$ since $\sup_{(x,y) \in \mathbb{R}^2_+, \, f \in \mathcal{F}} |f(x, y)| = 1$. Let

$$Z_n(t) = \left[ \left( \hat{M}_B(-t) - \hat{M}_A(-t) \right) - \left( M_B(-t) - M_A(-t) \right) \right],$$

where $\hat{M}_K(-t) = \frac{1}{n} \sum_{i=1}^n e^{-t X_i^K}$, for $K = A, B$. Now, we compute the covariance kernel, which is given by the following expectation

$$\Omega(s, t) = E\left[ Z_n(t) Z_n(s) \right] - E\left[ Z_n(t) \right] E\left[ Z_n(s) \right].$$

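Proof of Theorem 1:

Proof. We prove the theorem using Lemma 3. Let $B'$ be such that $B \, D_\infty \, B'$, and $B' \, D_\infty \, A$. Then, letting $(U_i)_{i=1}^n$ be a random sample from the uniform distribution $U[0, 1]$, we use the quantile functions of $B$ and $B'$ to compute the corresponding random samples of the respective CDFs. By Lemma 3,

$$M_B(-t) \leq M_{B'}(-t), \; \forall t \geq 0 \implies \hat{M}_B(-t) \leq \hat{M}_{B'}(-t), \; \forall t \geq 0.$$

This implies that

$$T\left( \hat{M}_{B'}(-t) - \hat{M}_A(-t) \right) \geq T\left( \hat{M}_B(-t) - \hat{M}_A(-t) \right),$$

which implies that

$$P_{A,B}\left[ T\left( \hat{M}_B(-t) - \hat{M}_A(-t) \right) > c \,|\, H_0 \right] \leq P_{A,B'}\left[ T\left( \hat{M}_{B'}(-t) - \hat{M}_A(-t) \right) > c \,|\, H_0 \right],$$

for any $c$ in the union of the supports of $T$, under the two null DGPs.

For the case when $F_A$ is changed to $F_{A'}$ as stated in the theorem above, we have again by Lemma 3, that

$$M_{A'}(-t) \leq M_A(-t), \; \forall t \geq 0 \implies \hat{M}_{A'}(-t) \leq \hat{M}_A(-t), \; \forall t \geq 0.$$

This implies that

$$T\left( \hat{M}_B(-t) - \hat{M}_{A'}(-t) \right) \geq T\left( \hat{M}_B(-t) - \hat{M}_A(-t) \right),$$

which in turn implies that

$$P_{A,B}\left[ T\left( \hat{M}_B(-t) - \hat{M}_A(-t) \right) > c \,|\, H_0 \right] \leq P_{A',B}\left[ T\left( \hat{M}_B(-t) - \hat{M}_{A'}(-t) \right) > c \,|\, H_0 \right],$$

for any $c$ in the union of the supports of $T$, under the two null DGPs.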

Proof of Theorem 3:

26

Proof. We use the direct method of proof to prove this result. Let

GF ,T ,n(q) = Prob[T (X?) ≤ q | X

]be the probability distribution of the test statistic T based on the unrestricted empirical likelihood DGP F ,that is conditional on the data X i.e. the resampling bootstrap based on the empirical measure. Then, givenε > 0, by Markov’s inequality and the triangle inequality, we have that

E[supq∈R

∣∣GF , T , n(q)−GF, T ,∞(q)∣∣ > ε

]ε

is less than or equal to

E[supq∈R

∣∣∣GF , T , n(q)−GF , T , n(q)∣∣∣ > ε

]ε

+E[supq∈R

∣∣∣GF , T , n(q)−GF, T ,∞(q)∣∣∣ > ε

]ε

.

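Since distribution functions are bounded with probability one, it is sufficient to show that

$$\sup_{q\in\mathbb{R}}\left|G_{\tilde{F},T,n}(q) - G_{\hat{F},T,n}(q)\right| \quad \text{and} \quad \sup_{q\in\mathbb{R}}\left|G_{\hat{F},T,n}(q) - G_{F,T,\infty}(q)\right|$$

are $o_P(1)$ to conclude the result. To that end, note that

$$G_{\hat{F},T,n}(q) = \int_{T(u_1,\dots,u_n)\le q} \prod_{i=1}^{n} d\hat{F}(u_i), \qquad G_{\tilde{F},T,n}(q) = \int_{T(u_1,\dots,u_n)\le q} \prod_{i=1}^{n} d\tilde{F}(u_i). \tag{39}$$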

Because $E_q = \{T(u_1,\dots,u_n) \le q\}$ is measurable, we can express it as the $n$-fold cross product of its sections, where a section is a set given by $E_{qj} = \{u_j : (u_j, u_{-j}) \in E_q\}$, $j = 1,\dots,n$. Therefore, by Fubini's Theorem, the distributions in (39) can be expressed as

$$\prod_{j=1}^{n} \hat{F}(E_{qj}) = \prod_{j=1}^{n} \frac{1}{n}\sum_{i=1}^{n} \delta_{X_i}(E_{qj}), \qquad \prod_{j=1}^{n} \tilde{F}(E_{qj}) = \prod_{j=1}^{n} \sum_{i=1}^{n} p_i\,\delta_{X_i}(E_{qj}),$$

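respectively. Then,

$$\left|G_{\tilde{F},T,n}(q) - G_{\hat{F},T,n}(q)\right| = \left|\prod_{j=1}^{n}\sum_{i=1}^{n} p_i\,\delta_{X_i}(E_{qj}) - \prod_{j=1}^{n}\frac{1}{n}\sum_{i=1}^{n}\delta_{X_i}(E_{qj})\right|,$$

which is less than or equal to

$$2\left|\sum_{i=1}^{n} p_i^{\,n}\prod_{j=1}^{n}\delta_{X_i}(E_{qj}) - \left(\frac{1}{n}\right)^{n}\sum_{i=1}^{n}\prod_{j=1}^{n}\delta_{X_i}(E_{qj})\right| \le 2\sum_{i=1}^{n}\left|p_i - \frac{1}{n}\right|\prod_{j=1}^{n}\delta_{X_i}(E_{qj}), \tag{40}$$

where the last inequality follows by applying the triangle inequality and the Mean Value Theorem, with a derivative that is uniformly bounded above by unity. Therefore, we have that

$$\sup_{q\in\mathbb{R}}\left|G_{\tilde{F},T,n}(q) - G_{\hat{F},T,n}(q)\right| \le 2\sum_{i=1}^{n}\left|p_i - \frac{1}{n}\right| \le \frac{2|\mu|}{n}\sum_{i=1}^{n}\left|\frac{2}{n + \mu\left(e^{-tX_i^B} - e^{-tX_i^A}\right)}\right| = o_P(1).$$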


Regarding the second term, $\sup_{q\in\mathbb{R}}\left|G_{\hat{F},T,n}(q) - G_{F,T,\infty}(q)\right|$, add and subtract $G_{F,T,n}(q)$ and apply the triangle inequality. Then follow similar steps as above in deriving the sections to reach the conclusion that

$$\sup_{q\in\mathbb{R}}\left|G_{\hat{F},T,n}(q) - G_{F,T,n}(q)\right| \le \sup_{D\in\mathcal{D}}\left|\frac{1}{n}\sum_{i=1}^{n}\delta_{X_i}(D) - F(D)\right|^{n},$$

where $\mathcal{D}$ is the Borel sigma-algebra on $[0,+\infty)^2$, which tends to zero by the Glivenko-Cantelli Theorem in two dimensions.

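As for

$$\sup_{q\in\mathbb{R}}\left|G_{F,T,n}(q) - G_{F,T,\infty}(q)\right|,$$

we can re-write it using sections as above, yielding

$$\sup_{q\in\mathbb{R}}\left|\prod_{j=1}^{n}F(E_{qj}) - \lim_{m\to\infty}\prod_{j=1}^{m}F(E_{qj})\right| \le \sup_{q\in\mathbb{R}}\left|e^{\sum_{j=1}^{n}\log F(E_{qj})} - e^{\sum_{j=1}^{+\infty}\log F(E_{qj})}\right| \le \sup_{q\in\mathbb{R}}\left|\sum_{j=n}^{+\infty}\log F(E_{qj})\right|,$$

which tends to zero as $n\to\infty$. Combining these intermediate results gives rise to the desired conclusion.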

Proof of Corollary 1:

Proof. Theorem 3 implies that $G_{\tilde{F},T,n}(\cdot)$ converges weakly to $G_{F,T,\infty}(\cdot)$.

Asymptotic similarity: The continuity of $G_{F,T,\infty}(\cdot)$ ensures that the p-value functional, $h_n(\tau) = 1 - G_{\tilde{F},P_T,n}(\tau)$, gives rise to a well defined asymptotic distribution theory for DGPs in $\partial\mathcal{M}_0$. Therefore, the result follows by an application of Theorem 18.11 of Van der Vaart (2000) (Continuous Mapping Theorem) to the sequence $G_{\tilde{F},P_T,n}(\cdot)$.

Asymptotic validity: Some DGPs in $\partial\mathcal{M}_0$ may give rise to $G_{F,T,\infty}(\cdot)$ having point mass at zero. In this case, we show that the probability mass at zero in $G_{F,T,\infty}(\cdot)$ for either test statistic cannot exceed $1/2$. To that end, note that

$$G_{F,T,\infty}(0) = \operatorname{Prob}\left[N(\psi_t, \sigma_t^2) \le 0, \ \forall t \in [0,+\infty)\right],$$

where

$$\psi_t = \left(M_B(-t) - M_A(-t)\right)w(t), \qquad \sigma_t^2 = \Omega(t,t)\,w^2(t),$$

with $\Omega(s,t)$ being the variance-covariance kernel (18) in Theorem 2. Now,

$$\operatorname{Prob}\left[N(\psi_t, \sigma_t^2) \le 0, \ \forall t \in [0,+\infty)\right] \le \operatorname{Prob}\left[N(\psi_{t_0}, \sigma_{t_0}^2) \le 0\right], \tag{41}$$

where $t_0$ is such that $\psi_{t_0} = 0$, and exists since $F \in \partial\mathcal{M}_0$. Connecting this string of inequalities implies

$$G_{F,T,\infty}(0) \le \operatorname{Prob}\left[N(0, \sigma_{t_0}^2) \le 0\right] = 1/2.$$

Therefore, the mass at zero cannot be so large that zero is the $(1-\alpha)$-th quantile for conventional significance levels.

Proof of Theorem 4:


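Proof. Let $Y_i = \frac{\mu}{n}\left(e^{-tX_i^B} - e^{-tX_i^A}\right)$. The constraint defining the Lagrange multiplier $\mu$ can be expressed in terms of the $Y_i$, using the identity $\frac{1}{1+Y_i} = 1 - Y_i + \frac{Y_i^2}{1+Y_i}$:

$$0 = \frac{1}{n}\sum_{i=1}^{n}\left(e^{-tX_i^B} - e^{-tX_i^A}\right)\left(1 - Y_i + \frac{Y_i^2}{1+Y_i}\right),$$

which is equivalent to

$$0 = \frac{1}{n}\sum_{i=1}^{n}\left(e^{-tX_i^B} - e^{-tX_i^A}\right) - \frac{\mu}{n^2}\sum_{i=1}^{n}\left(e^{-tX_i^B} - e^{-tX_i^A}\right)^2 + \frac{1}{n}\sum_{i=1}^{n}\left(e^{-tX_i^B} - e^{-tX_i^A}\right)\frac{Y_i^2}{1+Y_i}. \tag{42}$$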

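Note that

$$\left|\frac{1}{n}\sum_{i=1}^{n}\left(e^{-tX_i^B} - e^{-tX_i^A}\right)\frac{Y_i^2}{1+Y_i}\right| = \frac{\mu^2}{n^3}\left|\sum_{i=1}^{n}\frac{\left(e^{-tX_i^B} - e^{-tX_i^A}\right)^3}{1+Y_i}\right| \le \frac{\mu^2}{n^3}\sum_{i=1}^{n}\left|\left(e^{-tX_i^B} - e^{-tX_i^A}\right)^3\right|\left|1+Y_i\right|^{-1}, \tag{43}$$

and the extreme right term above is

$$8n^{-2}\,O_p(n^{-1})\,O_p(1) = n^{-3}O_p(1) = o_p(n^{-2}),$$

since $\mu = o_p(1)$. Therefore, solving for $\frac{\mu}{n}$ in (42) yields

$$\frac{\mu}{n} = S_n^{-1}\left[\frac{1}{n}\sum_{i=1}^{n}\left(e^{-tX_i^B} - e^{-tX_i^A}\right)\right] + S_n^{-1}\beta,$$

where

$$S_n = \frac{1}{n}\sum_{i=1}^{n}\left(e^{-tX_i^B} - e^{-tX_i^A}\right)^2, \qquad \beta = \frac{1}{n}\sum_{i=1}^{n}\left(e^{-tX_i^B} - e^{-tX_i^A}\right)\frac{Y_i^2}{1+Y_i}. \tag{44}$$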

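Expanding the likelihood ratio statistic (20) in terms of $t$ and $\mu$ using the probabilities (10) yields:

$$LR(F) = 2\sum_{i=1}^{n}\log\left(1 + \frac{\mu\left(e^{-tX_i^B} - e^{-tX_i^A}\right)}{n}\right). \tag{45}$$

Now apply a Maclaurin series expansion of the function $\log(1+x)$ to (45):

$$LR(F) = 2\sum_{i=1}^{n}\left[\frac{\mu}{n}\left(e^{-tX_i^B} - e^{-tX_i^A}\right) - \frac{1}{2}\frac{\mu^2}{n^2}\left(e^{-tX_i^B} - e^{-tX_i^A}\right)^2 + \eta_i\right], \tag{46}$$

where there exists $C_0 > 0$ such that

$$\operatorname{Prob}\left[|\eta_i| \le C_0|Y_i|^3, \ i = 1,\dots,n\right] \to 1, \quad n\to\infty, \quad \text{since} \quad \max_{i=1,\dots,n}|Y_i| \le 2\left|\frac{\mu}{n}\right| = o_p(1).$$

Using (44), substitute for $\frac{\mu}{n}$ in (46), which gives

$$LR(F) = nS_n^{-1}\left[\frac{1}{n}\sum_{i=1}^{n}\left(e^{-tX_i^B} - e^{-tX_i^A}\right)\right]^2 - n\beta^2S_n^{-1} + 2\sum_{i=1}^{n}\eta_i, \tag{47}$$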

after simplifying. Now we need to show that $n\beta^2S_n^{-1}$ and $2\sum_{i=1}^{n}\eta_i$ are $o_p(1)$, and that

$$\frac{1}{n}\sum_{i=1}^{n}\left(e^{-tX_i^B} - e^{-tX_i^A}\right) - \frac{1}{n}\sum_{i=1}^{n}\left(e^{-t_0X_i^B} - e^{-t_0X_i^A}\right) = o_p(1),$$

where

$$t_0 \in \arg\max\left\{M_B(-t) - M_A(-t), \ t \in (0,+\infty)\right\}.$$

After that, we can apply the Central Limit Theorem to obtain the desired result. Proceeding, we have that

$$n\beta^2S_n^{-1} = n\,o_p(n^{-2})\,O_p(1) = o_p(n^{-1}) = o_p(1),$$

$$\left|2\sum_{i=1}^{n}\eta_i\right| \le C_0\,n\max_{i=1,\dots,n}|Y_i|^3 \le C_0\,n\left|\frac{\mu}{n}\right|^3 = C_0|\mu|^3n^{-2} = o_p(1).$$

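By the Mean Value Theorem,

$$\frac{1}{n}\sum_{i=1}^{n}\left(e^{-tX_i^B} - e^{-tX_i^A}\right) - \frac{1}{n}\sum_{i=1}^{n}\left(e^{-t_0X_i^B} - e^{-t_0X_i^A}\right) = \left[\frac{1}{n}\sum_{i=1}^{n}\left(X_i^Ae^{-\bar{t}X_i^A} - X_i^Be^{-\bar{t}X_i^B}\right)\right](t - t_0), \tag{48}$$

where $\bar{t}$ lies between $t$ and $t_0$, so that $|\bar{t} - t_0| \le |t - t_0|$. Since $(t - t_0) = O_p(n^{-1/2})$, asymptotically,

$$\frac{1}{n}\sum_{i=1}^{n}\left(X_i^Ae^{-\bar{t}X_i^A} - X_i^Be^{-\bar{t}X_i^B}\right) = \frac{1}{n}\sum_{i=1}^{n}\left(X_i^Ae^{-t_0X_i^A} - X_i^Be^{-t_0X_i^B}\right),$$

and therefore the right side of (48) is asymptotically equivalent to

$$\left[\frac{1}{n}\sum_{i=1}^{n}\left(X_i^Ae^{-t_0X_i^A} - X_i^Be^{-t_0X_i^B}\right)\right](t - t_0) = O_p(1)\,o_p(1).$$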

Finally, this implies that

$$LR(F) = nS_n^{-1}\left[\frac{1}{n}\sum_{i=1}^{n}\left(e^{-tX_i^B} - e^{-tX_i^A}\right)\right]^2 + o_p(1).$$

The first part: The result follows by realizing that $t$, the solution of the system (11), will tend to $+\infty$ when $F$ belongs to the interior of the model of the null hypothesis.


The second part: This result follows since

$$LR(F) = nS_n^{-1}\left[\frac{1}{n}\sum_{i=1}^{n}\left(e^{-tX_i^B} - e^{-tX_i^A}\right)\right]^2 + o_p(1)$$

can also be expressed as

$$LR(F) = nS_n^{-1}\left[\frac{1}{n}\sum_{i=1}^{n}\left(e^{-t_0X_i^B} - e^{-t_0X_i^A}\right)\right]^2 + o_p(1), \tag{49}$$

which is because $t$ converges in probability to $t_0$. Then, an application of the standard Central Limit Theorem and Slutsky's Theorem gives

$$S_n^{-1/2}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(e^{-t_0X_i^B} - e^{-t_0X_i^A}\right) \xrightarrow{d} N(0,1).$$
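The expansion leading to (49) can be checked numerically at a fixed point $t_0$. The following is a minimal sketch, not the feasible-value-function procedure of the paper: it holds $t_0$ fixed instead of solving the system (11), solves the multiplier equation $\sum_i d_i/(1+\lambda d_i) = 0$ for $\lambda = \mu/n$ by bisection, where $d_i = e^{-t_0X_i^B} - e^{-t_0X_i^A}$, and compares the statistic (45) with the leading quadratic term of (47). The function names, the bisection solver, and the exponential data are assumptions made for the illustration.

```python
import numpy as np

def el_multiplier(d, tol=1e-12, max_iter=200):
    """Solve sum_i d_i / (1 + lam * d_i) = 0 for lam = mu / n by bisection.

    Assumes min(d) < 0 < max(d), so a solution with positive weights exists.
    """
    d = np.asarray(d, dtype=float)
    lo, hi = -1.0 / d.max(), -1.0 / d.min()   # weights stay positive on (lo, hi)
    pad = 1e-10 * (hi - lo)
    lo, hi = lo + pad, hi - pad
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if np.sum(d / (1.0 + mid * d)) > 0.0:  # the sum is decreasing in lam
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def lr_and_quadratic(d):
    """Return the statistic (45), the leading term of (47), and lam."""
    d = np.asarray(d, dtype=float)
    lam = el_multiplier(d)
    lr = 2.0 * np.sum(np.log1p(lam * d))
    quad = d.size * d.mean() ** 2 / np.mean(d ** 2)
    return lr, quad, lam

# Hypothetical boundary DGP: A and B share the Exp(1) law (an assumption).
rng = np.random.default_rng(2)
n, t0 = 2000, 1.0
x_a = rng.exponential(1.0, size=n)
x_b = rng.exponential(1.0, size=n)
d = np.exp(-t0 * x_b) - np.exp(-t0 * x_a)
lr, quad, lam = lr_and_quadratic(d)
```

In this design `lr` and `quad` should agree up to the remainder terms $n\beta^2S_n^{-1}$ and $2\sum_i\eta_i$, which the proof shows to be $o_p(1)$.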

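Third part: We can repeat the arguments above with a DGP in the alternative model, in which the difference to take account of is that for

$$t_0 = \arg\max\left\{M_B(-t) - M_A(-t), \ t \in (0,+\infty)\right\},$$

we have

$$M_B(-t_0) - M_A(-t_0) > 0. \tag{50}$$

Now, we add and subtract (50) inside the sum in (49), and re-arrange to get

$$\begin{aligned} LR(F) ={}& nS_n^{-1}\left[\frac{1}{n}\sum_{i=1}^{n}\left(e^{-t_0X_i^B} - e^{-t_0X_i^A} - M_B(-t_0) + M_A(-t_0)\right)\right]^2 \\ &+ nS_n^{-1}\left[M_B(-t_0) - M_A(-t_0)\right]^2 \\ &+ 2nS_n^{-1}\left[M_B(-t_0) - M_A(-t_0)\right]\frac{1}{n}\sum_{i=1}^{n}\left(e^{-t_0X_i^B} - e^{-t_0X_i^A} - M_B(-t_0) + M_A(-t_0)\right) \\ &+ o_p(1). \end{aligned} \tag{51}$$

Then, we have that

$$nS_n^{-1}\left[\frac{1}{n}\sum_{i=1}^{n}\left(e^{-t_0X_i^B} - e^{-t_0X_i^A} - M_B(-t_0) + M_A(-t_0)\right)\right]^2 \xrightarrow{d} \sigma\,\chi^2(1),$$

where

$$\sigma = \frac{E_F\left[\left(e^{-t_0X_i^B} - e^{-t_0X_i^A}\right)^2\right] - \left(E_F\left[e^{-t_0X_i^B} - e^{-t_0X_i^A}\right]\right)^2}{E_F\left[\left(e^{-t_0X_i^B} - e^{-t_0X_i^A}\right)^2\right]},$$

and that

$$2nS_n^{-1}\left[M_B(-t_0) - M_A(-t_0)\right]\frac{1}{n}\sum_{i=1}^{n}\left(e^{-t_0X_i^B} - e^{-t_0X_i^A} - M_B(-t_0) + M_A(-t_0)\right)$$

is $n\,O_p(n^{-1/2})\,O_p(n^{-1/2}) = O_p(1)$. Now, since

$$nS_n^{-1}\left[M_B(-t_0) - M_A(-t_0)\right]^2 \to \infty, \quad n\to\infty,$$

we have

$$LR(F) \to \infty, \quad n\to\infty.$$

Since $F$ in the alternative model was arbitrary, this divergence of the LR statistic holds for every DGP in the alternative.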

Proof of Corollary 2:

Proof. The first part follows directly from the result in Theorem 4 that

$$LR(F) \to \infty, \quad n\to\infty,$$

in distribution for every $F$ in the alternative model. A similar argument holds for the corresponding bootstrap p-value (14). It converges to zero as the sample size increases, for every $F$ in the alternative model. This implies that the rejection probability under the alternative converges to unity, for every $F$ in the alternative model.

B Some Empirical Process Theory

The material presented here is taken from Kosorok (2007). The aim of this section is to define the various concepts needed to state Theorem 8.19 of Kosorok (2007), which is the main result utilized in this paper to show that the class of functions $\{e^{-tx} - e^{-ty} : t, x, y \in \mathbb{R}_+\}$ is $P$-Donsker.

Consider a random sample $X_1,\dots,X_n$ drawn from a probability measure $P$ on an arbitrary sample space $\chi$. We define the empirical measure to be $P_n = n^{-1}\sum_{i=1}^{n}\delta_{X_i}$, where $\delta_x$ is the measure that assigns mass 1 at $x$ and zero elsewhere. Define the random measure $G_n = \sqrt{n}\,(P_n - P)$, and, for any class $\mathcal{F}$ of measurable functions $f : \chi \to \mathbb{R}$, let $G$ be a mean zero Gaussian process indexed by $\mathcal{F}$, with covariance $E[f(X)g(X)] - Ef(X)\,Eg(X)$ for all $f, g \in \mathcal{F}$, and having appropriately continuous sample paths.

Definition 1. We say that a class of functions $\mathcal{F}$ is $P$-Donsker if $G_n \rightsquigarrow G$ in $\ell^\infty(\mathcal{F})$, where the limiting process, $Gf$ for $f \in \mathcal{F}$, is tight.

The content of Definition 1 is that the sequence of probability distributions on the functions induced by the $\mathcal{F}$-indexed empirical process converges in distribution to a tight Gaussian stochastic process. Next, we present a result that is used to prove the existence of the asymptotic distribution of the test statistics considered in this paper.

Theorem 5 (Continuous Mapping Theorem). Let $g : D \mapsto E$ be continuous at all points in $D_0 \subset D$, where $D$ and $E$ are metric spaces. Then if $X_n \rightsquigarrow X$ in $D$, with $P_{*}(X \in D_0) = 1$, then $g(X_n) \rightsquigarrow g(X)$.

In general, two conditions must be met in order for $X_n$ to converge weakly in $\ell^\infty(T)$ to a tight $X$. This is summarized in the following theorem.

Theorem 6. $X_n$ converges weakly to a tight $X$ in $\ell^\infty(T)$ if and only if:

1. For all finite $\{t_1,\dots,t_k\} \subset T$, the multivariate distribution of $\{X_n(t_1),\dots,X_n(t_k)\}$ converges to that of $\{X(t_1),\dots,X(t_k)\}$.

2. There exists a semimetric $\rho$ for which $T$ is totally bounded and

$$\lim_{\delta\downarrow 0}\,\limsup_{n\to\infty}\, P^{*}\left\{\sup_{s,t\in T \text{ with } \rho(s,t)<\delta}\left|X_n(s) - X_n(t)\right| > \varepsilon\right\} = 0, \tag{52}$$

for all $\varepsilon > 0$.

Condition one of Theorem 6 is convergence of all the finite dimensional distributions, and condition two implies asymptotic tightness. In many applications, the former is not hard to verify while the latter is much more difficult.

Definition 2. An envelope function of a class $\mathcal{F}$ is any function $x \mapsto F(x)$ such that $|f(x)| \le F(x)$ for every $x$ and $f \in \mathcal{F}$.

Definition 3. Let $\mathcal{F}$ be a class of functions and $\delta > 0$. Then define the following sets:

$$\mathcal{F}_\delta = \left\{f - g : f, g \in \mathcal{F}, \ \|f - g\|_{P,2} < \delta\right\}, \tag{53}$$

$$\mathcal{F}_\infty^2 = \left\{(f - g)^2 : f, g \in \mathcal{F}\right\}. \tag{54}$$

Definition 4. A class $\mathcal{F}$ of measurable functions $f : \chi \to \mathbb{R}$ on the probability space $(\chi, \mathcal{A}, P)$ is pointwise measurable if there exists a countable subset $\mathcal{G} \subset \mathcal{F}$ such that, for every $f \in \mathcal{F}$, there exists a sequence $g_m \in \mathcal{G}$ with $g_m(x) \to f(x)$, $\forall x \in \chi$.

Definition 5. A class $\mathcal{F}$ of measurable functions $f : \chi \to \mathbb{R}$ on the probability space $(\chi, \mathcal{A}, P)$ is $P$-measurable if

$$(X_1,\dots,X_n) \mapsto \left\|\sum_{i=1}^{n}e_if(X_i)\right\|_{\mathcal{F}}$$

is measurable on the completion of the product probability space $(\chi^n, \mathcal{A}^n, P^n)$ for every constant vector $(e_1,\dots,e_n) \in \mathbb{R}^n$, where $\left\|\sum_{i=1}^{n}e_if(X_i)\right\|_{\mathcal{F}} = \sup_{f\in\mathcal{F}}\left|\sum_{i=1}^{n}e_i\left(f(X_i) - Ef(X_i)\right)\right|$.

It is not difficult to show that pointwise measurability implies $P$-measurability.

Whether a class of functions is Donsker depends on the "size" of the class. A relatively simple way to measure the "size" of a class $\mathcal{F}$ is in terms of entropy. We shall consider the concept of entropy based on "uniform covering numbers." The covering number $N(\varepsilon, \mathcal{F}, L_2(Q))$ is the minimal number of $L_2(Q)$-balls of radius $\varepsilon$ needed to cover the set $\mathcal{F}$. The entropy is the logarithm of the covering number. The uniform covering number is given by

$$\sup_{Q} N\left(\varepsilon\|F\|_{Q,r}, \mathcal{F}, L_r(Q)\right),$$

where the supremum is taken over all probability measures $Q$ for which the class $\mathcal{F}$ is not identically zero (and hence $\|F\|_{Q,r}^{r} = QF^r > 0$). The uniform covering numbers are relative to a given envelope function $F$. This is fortunate, because the covering numbers under different measures $Q$ typically are more stable if standardized by the norm of the envelope function. Also, notice that the uniform covering number does not depend on the probability measure $P$ of the observed data. The uniform entropy integral is defined as

$$J(\delta, \mathcal{F}, L_2) = \int_{0}^{\delta}\sqrt{\log\sup_{Q}N\left(\varepsilon\|F\|_{Q,2}, \mathcal{F}, L_2(Q)\right)}\,d\varepsilon.$$

Now, we are prepared to state Theorem 8.19 of Kosorok (2007).


Theorem 7. Let $\mathcal{F}$ be a class of measurable functions $f : \chi \to \mathbb{R}$ on the probability space $(\chi, \mathcal{A}, P)$ with envelope $F$ and $J(1, \mathcal{F}, L_2) < \infty$. Let the classes $\mathcal{F}_\delta$ and $\mathcal{F}_\infty^2$ be $P$-measurable for every $\delta > 0$. If $P^{*}F^2 < \infty$, then $\mathcal{F}$ is $P$-Donsker.

An important class of examples for which good estimates of the uniform covering numbers are known are the Vapnik-Červonenkis classes, or VC classes. Many classes of interest in statistics are VC, such as the class of indicator functions, and the class of functions given by equation (38).

Consider an arbitrary collection $\{x_1,\dots,x_n\}$ of points in a set $\chi$ and a collection $\mathcal{C}$ of subsets of $\chi$. We say that $\mathcal{C}$ picks out a certain subset $A$ of $\{x_1,\dots,x_n\}$ if it can be written as $A = \{x_1,\dots,x_n\} \cap C$ for some $C \in \mathcal{C}$. The collection $\mathcal{C}$ is said to shatter $\{x_1,\dots,x_n\}$ if $\mathcal{C}$ picks out each of its $2^n$ subsets. The VC index $V(\mathcal{C})$ of $\mathcal{C}$ is the smallest $n$ for which no set of size $n$ is shattered by $\mathcal{C}$. A collection $\mathcal{C}$ of measurable sets is called a VC class if its index $V(\mathcal{C})$ is finite. By definition, a VC class of sets picks out strictly less than $2^n$ subsets from any set of $n \ge V(\mathcal{C})$ elements. Sauer's lemma is that such a class can necessarily pick out only a polynomial number $O(n^{V(\mathcal{C})-1})$ of subsets, well below the $2^n - 1$ that the definition appears to allow.
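The notions of picking out and shattering can be made concrete on the real line. The following is a minimal sketch, assuming that sets are represented as membership predicates and that a finite grid of endpoints is enough to realize every subset the class can pick out from the chosen points. The function names, the grid, and the two example classes (half-lines $(-\infty, c]$ and closed intervals $[a, b]$) are hypothetical illustrations.

```python
def picked_out(points, sets):
    """Subsets of `points` picked out by a collection of sets (predicates)."""
    return {frozenset(x for x in points if contains(x)) for contains in sets}

def shatters(points, sets):
    """True if the collection picks out all 2^n subsets of `points`."""
    return len(picked_out(points, sets)) == 2 ** len(points)

# Endpoints placed between and around the integer test points 1, 2, 3.
grid = [0.5, 1.5, 2.5, 3.5]

# Half-lines (-inf, c] and closed intervals [a, b], as predicates.
half_lines = [lambda x, c=c: x <= c for c in grid]
intervals = [lambda x, a=a, b=b: a <= x <= b
             for a in grid for b in grid if a <= b]

one_point_shattered = shatters([1], half_lines)        # picks {} and {1}
two_points_by_half_lines = shatters([1, 2], half_lines)  # cannot pick {2} alone
two_points_by_intervals = shatters([1, 2], intervals)
three_points_by_intervals = shatters([1, 2, 3], intervals)  # cannot pick {1, 3}
```

Since only the ordering of the points matters for these two classes, the checks suggest a VC index of 2 for half-lines and 3 for intervals under the definition above. Intervals pick out 7 of the 8 subsets of three points, consistent with the polynomial bound in Sauer's lemma.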

For a function $f : \chi \mapsto \mathbb{R}$, the subset of $\chi \times \mathbb{R}$ given by $\{(x,t) : t < f(x)\}$ is the subgraph of $f$. A collection $\mathcal{F}$ of measurable real functions on the sample space $\chi$ is a VC subgraph class if the collection of all subgraphs of functions in $\mathcal{F}$ forms a VC class of sets in $\chi \times \mathbb{R}$. Let $V(\mathcal{F})$ denote the VC index of the set of subgraphs of $\mathcal{F}$. The next result shows that covering numbers of VC classes of functions grow at a polynomial rate, just like VC classes of sets:

Theorem 8. There exists a universal constant $K < \infty$ such that, for any VC class of measurable functions $\mathcal{F}$ with integrable envelope $F$, any $r \ge 1$, any probability measure $Q$ with $\|F\|_{Q,r} > 0$, and any $0 < \varepsilon < 1$,

$$\sup_{Q}N\left(\varepsilon\|F\|_{Q,r}, \mathcal{F}, L_r(Q)\right) \le K\,V(\mathcal{F})\,(4e)^{V(\mathcal{F})}\left(\frac{2}{\varepsilon}\right)^{r(V(\mathcal{F})-1)}.$$

Thus VC classes of functions easily satisfy the uniform entropy requirements of the Donsker theorem above.

Finally, we state a useful result that helps in the proof of test consistency. It is Theorem 8.14 of Kosorok (2007), but first we state the following definition.

Definition 6. A class $\mathcal{F}$ of measurable functions $f : \chi \to \mathbb{R}$ is said to be a $P$-Glivenko-Cantelli class if

$$\sup_{f\in\mathcal{F}}\left|P_nf - Pf\right| \xrightarrow{a.s.*} 0,$$

where $Pf = \int_{\chi}f(x)\,P(dx)$.

Theorem 9. Let $\mathcal{F}$ be a $P$-measurable class of measurable functions with envelope $F$ and

$$\sup_{Q}N\left(\varepsilon\|F\|_{Q,1}, \mathcal{F}, L_1(Q)\right) < \infty,$$

for every $\varepsilon > 0$, where the supremum is taken over all finite probability measures $Q$ for which the class $\mathcal{F}$ is not identically zero (and hence $\|F\|_{Q,1} = QF > 0$). If $P^{*}F < \infty$, then $\mathcal{F}$ is $P$-Glivenko-Cantelli.
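The uniform convergence in Definition 6 can be illustrated by simulation for the class $\{x \mapsto e^{-tx} : t \ge 0\}$. The following is a minimal sketch, assuming $X \sim \text{Exp}(1)$ so that $Pf_t = 1/(1+t)$ is available in closed form, and approximating the supremum over $t$ by a maximum over a finite grid. The function name, the grid, and the sample sizes are hypothetical choices.

```python
import numpy as np

def sup_deviation(x, grid):
    """Grid approximation of sup_t |P_n f_t - P f_t| for f_t(x) = exp(-t x)."""
    emp = np.exp(-np.outer(grid, x)).mean(axis=1)   # P_n f_t on the grid
    pop = 1.0 / (1.0 + grid)                        # P f_t when X ~ Exp(1)
    return float(np.max(np.abs(emp - pop)))

rng = np.random.default_rng(3)
grid = np.linspace(0.0, 20.0, 201)

# Largest deviation over the grid for a small and a large sample.
devs = {n: sup_deviation(rng.exponential(1.0, size=n), grid)
        for n in (100, 10_000)}
```

The deviation for the larger sample should be close to zero, in line with the almost sure convergence in the definition. A single simulated draw is only suggestive and does not verify the entropy condition of Theorem 9.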
