bootstrapping and permuting paired t-test type statistics



Stat Comput (2014) 24:283–296 DOI 10.1007/s11222-012-9370-4

Bootstrapping and permuting paired t-test type statistics

Frank Konietschke · Markus Pauly

Received: 15 August 2012 / Accepted: 15 November 2012 / Published online: 8 January 2013 © The Author(s) 2013. This article is published with open access at Springerlink.com

Abstract We study various bootstrap and permutation methods for matched pairs, whose distributions can have different shapes even under the null hypothesis of no treatment effect. Although the data may not be exchangeable under the null, we investigate different permutation approaches as valid procedures for finite sample sizes. It will be shown that permutation or bootstrap schemes, which neglect the dependency structure in the data, are asymptotically valid. Simulation studies show that these new tests improve the power of the t-test under non-normality.

Keywords Bootstrap · Heteroscedasticity · Matched pairs · Permutation tests

F. Konietschke, Department of Medical Statistics, University of Goettingen, Humboldtallee 32, 37073 Goettingen, Germany, e-mail: [email protected]

M. Pauly, Institute of Mathematics, University of Duesseldorf, Universitaetsstrasse 1, 40225 Duesseldorf, Germany, e-mail: [email protected]

1 Introduction

In many psychological, biological and medical experiments, data are collected in terms of a matched pairs design, e.g. when a homogeneous group of subjects is repeatedly observed under two conditions called time points in the terminology of repeated measures designs. Hereby different variances of the observations occur in a natural way, e.g. when data are collected over time. The data of such trials can be modeled by independent and identically distributed random vectors

$$X_i = (X_{i,1}, X_{i,2})', \quad i = 1, \dots, n, \tag{1.1}$$

with expectation $E(X_1) = \mu = (\mu_1, \mu_2)'$ and an arbitrary positive definite covariance matrix $\mathrm{Var}(X_1) = \Sigma$. Our aim is to test the null hypothesis $H_0: \mu_1 = \mu_2$, or $H_0^{(1)}: \mu_1 \le \mu_2$, in this semi-parametric framework.

The paired t-test type statistic $|T_{n,stud}|$ with

$$T_{n,stud} = \sqrt{n}\, \bar{D}_n / V_n \tag{1.2}$$

is the commonly used statistic for testing $H_0$, where $D_i = X_{i,1} - X_{i,2}$ denote the differences of the pairs for $i = 1, \dots, n$, $\bar{D}_n = n^{-1} \sum_{i=1}^n D_i = \bar{X}_{\cdot 1} - \bar{X}_{\cdot 2}$ is the difference of the means, and $V_n^2 = (n-1)^{-1} \sum_{i=1}^n (D_i - \bar{D}_n)^2$ denotes the sample variance of the $D_i$'s. As commonly known, $T_{n,stud}$ is exactly $T(n-1)$-distributed under $H_0$ if the differences are normal, even for arbitrary $\Sigma$. Under non-normality, the distribution of $T_{n,stud}$ may be approximated by a $T(n-1)$-distribution, which follows from the central limit theorem. For large sample sizes, the null hypothesis $H_0: \mu_1 = \mu_2$ will be rejected if $|T_{n,stud}| \ge t_{1-\alpha/2}$, where $t_{1-\alpha/2}$ denotes the $(1-\alpha/2)$-quantile of the $T(n-1)$-distribution. Thus, the t-test can be equivalently written as

$$\varphi_t = \mathbf{1}_{(t_{1-\alpha/2},\, \infty)}\big(|T_{n,stud}|\big). \tag{1.3}$$

For testing $H_0^{(1)}$ the t-test $\varphi_t$ can be redefined by using $T_{n,stud}$ as the test statistic in (1.3) and replacing the critical value $t_{1-\alpha/2}$ by $t_{1-\alpha}$. In a variety of papers and applications, however, it has already been shown that the rate of convergence of $T_{n,stud}$ to its asymptotic normality is rather slow, particularly for skewed distributions of the differences. For a detailed explanation we refer the reader to Munzel (1999).
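As a minimal sketch of the statistic in (1.2), the following Python function computes $T_{n,stud}$ from two paired samples. The function name `paired_t_statistic` and the example data are illustrative assumptions and do not come from the paper, whose simulations were run in R.

```python
import math
import statistics


def paired_t_statistic(x1, x2):
    """Return sqrt(n) * mean(D) / sd(D) for paired samples, as in (1.2)."""
    if len(x1) != len(x2):
        raise ValueError("both samples must have the same length")
    d = [a - b for a, b in zip(x1, x2)]   # differences D_i = X_{i,1} - X_{i,2}
    n = len(d)
    d_bar = statistics.fmean(d)           # mean of the differences
    v_n = statistics.stdev(d)             # V_n, uses the (n - 1) denominator
    return math.sqrt(n) * d_bar / v_n


# Hypothetical data for n = 7 pairs (made up for illustration only).
x1 = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 6.2]
x2 = [4.7, 4.9, 5.1, 5.0, 4.4, 5.9, 5.3]
t_obs = paired_t_statistic(x1, x2)
```

The two-sided t-test in (1.3) would then compare `abs(t_obs)` with the $(1-\alpha/2)$-quantile of the $T(n-1)$-distribution, which is not part of the Python standard library and is therefore left out of this sketch.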


It is the aim of the present paper to discuss the limit behaviour of various resampling versions of $T_{n,stud}$ to improve its small sample properties under non-normality. Specific examples are all kinds of bootstrap and permutation resampling statistics. Although the data may not be exchangeable in model (1.1), an accurate and (asymptotically) valid level $\alpha$ resampling test for $H_0$ can be derived if (i) the resampling distribution of the statistic is asymptotically independent from the distribution of the data; (ii) the resampling distribution has a limit; and (iii) the distribution of the test statistic and the conditional resampling distribution (asymptotically) coincide, see Janssen (1997, 1999a, 1999b, 2005), Janssen and Pauls (2003, 2005), Neubert and Brunner (2007), Pauly (2011) or Omelka and Pauly (2012). Items (i)–(iii) will be referred to as the permanence property of resampling tests.

More details on theory and applications of bootstrap and permutation tests can be found in the monographs of Basso et al. (2009), Good (2005) as well as Pesarin and Salmaso (2010b). Moreover, when comparing more than one aspect of the data, Brombin et al. (2011) also discuss permutation tests for paired observations with a useful application. In particular, permutation approaches for multivariate data are intensively discussed by Pesarin and Salmaso (2012) and Brombin and Salmaso (2009). Both papers provide a detailed summary of existing procedures and some new developments. Regarding repeated measures designs, Pesarin and Salmaso (2010a) apply permutation tests by investigating finite-sample properties.

The intuitive resampling or permutation strategy is to draw the differences $D_i$ with replacement from the data, or to permute the variables $X_{i,1}$ and $X_{i,2}$ within the pairs, respectively. The drawback of both resampling schemes is that only a few permutations ($2^n$) are available, or that the resamples show little variety when $n$ is rather small. The counterintuitive resampling or permutation strategies are either to draw the variables $X_{i,s}$ with replacement from all $2n$ observations $X_{1,1}, \dots, X_{n,2}$, to draw the variables $X_{i,s} - \bar{X}_{\cdot s}$ with replacement from each marginal sample $X_{1,s}, \dots, X_{n,s}$, $s = 1, 2$, separately, or to permute all $2n$ observations in $X = (X_{1,1}, X_{1,2}, \dots, X_{n,2})'$, and then to compute the paired t-test statistic repeatedly (e.g. 10,000 times). On the one hand, these counterintuitive resampling methods increase the resampling variability; on the other hand, the dependency structure within the pairs is neglected. In this paper, it will be shown that both the intuitive and the counterintuitive resampling strategies, the latter of which neglect the dependency structure in the data, fulfill the permanence property, and thus the corresponding resampling tests are asymptotically valid. Extensive simulation studies show that especially permutation-based approaches improve the paired t-test, even for extremely small sample sizes.

The paper is organized as follows: In Sect. 2 we explain how resampling and permutation tests work and explain in detail why the resulting tests are asymptotically valid. In Sect. 3 extensive simulations are conducted to compare the different resamplings with the paired t-test. The paper closes with a discussion of the results. All technical details and proofs are given in the Appendix.

2 How do paired bootstrap and permutation tests work?

In this section we will study various resampling versions of the paired t-test. Among other things, we would like to point out why special bootstrap and permutation tests, which neglect the dependency structure of the data within their resampling scheme, are asymptotically valid level $\alpha$ tests for $H_0$. Let $X^* = (X_1^*, \dots, X_n^*)'$, with $X_i^* = (X_{i,1}^*, X_{i,2}^*)$, denote $n$ resampling vectors for $i = 1, \dots, n$, given the original data $X$, where

(I) $X^*$ is a random permutation of all data $X = (X_{1,1}, X_{1,2}, \dots, X_{n,2})'$, or

(II) $X_i^*$ is a random permutation of the sample unit $X_i' = (X_{i,1}, X_{i,2})$, or

(III) $X_{i,s}^*$ is randomly drawn with replacement from all data $X$, or

(IV) $X_{i,s}^*$ is randomly drawn with replacement from each centered marginal sample $X_s = (X_{1,s} - \bar{X}_{\cdot s}, \dots, X_{n,s} - \bar{X}_{\cdot s})'$, $s = 1, 2$, respectively.
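The four schemes (I) through (IV) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `resample_pairs`, its arguments and the string labels for the schemes are hypothetical, and for scheme (I) the permuted pooled vector is split into its first and last $n$ entries, which is one of several equivalent ways to form $n$ resampled pairs from a uniformly random permutation.

```python
import random


def resample_pairs(x1, x2, scheme, rng=random):
    """Return (x1_star, x2_star) drawn under scheme 'I', 'II', 'III' or 'IV'."""
    n = len(x1)
    if scheme == "I":
        pooled = list(x1) + list(x2)
        rng.shuffle(pooled)                # random permutation of all 2n values
        return pooled[:n], pooled[n:]
    if scheme == "II":
        s1, s2 = [], []
        for a, b in zip(x1, x2):
            if rng.random() < 0.5:         # swap within the pair with prob. 1/2
                a, b = b, a
            s1.append(a)
            s2.append(b)
        return s1, s2
    if scheme == "III":
        pooled = list(x1) + list(x2)       # draw with replacement from all data
        return rng.choices(pooled, k=n), rng.choices(pooled, k=n)
    if scheme == "IV":
        m1, m2 = sum(x1) / n, sum(x2) / n
        c1 = [a - m1 for a in x1]          # centered first marginal sample
        c2 = [b - m2 for b in x2]          # centered second marginal sample
        return rng.choices(c1, k=n), rng.choices(c2, k=n)
    raise ValueError("scheme must be 'I', 'II', 'III' or 'IV'")
```

Passing a seeded `random.Random` instance as `rng` makes the resamples reproducible. Only scheme (II) keeps the original pairing; schemes (I), (III) and (IV) break it, which is exactly the neglected dependency structure discussed in the text.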

The conditional resampling statistic of $T_{n,stud}$ is then given by

$$T^*_{n,stud} = \sqrt{n}\, \bar{D}_n^* / V_n^*, \tag{2.1}$$

where $D_i^* = X_{i,1}^* - X_{i,2}^*$ denotes the differences of the resampling variables for $i = 1, \dots, n$, $\bar{D}_n^* = n^{-1} \sum_{i=1}^n D_i^*$ denotes their mean, and $V_n^{*2} = (n-1)^{-1} \sum_{i=1}^n (D_i^* - \bar{D}_n^*)^2$ denotes the sample variance of the differences $D_i^*$.

Here we would like to point out that the denominator in (2.1) is part of the resampling procedures, which is in accordance with the guidelines for bootstrap testing, see Hall and Wilson (1991), Beran (1997), Bickel and Freedman (1981), and Janssen (2005). Delaigle et al. (2011) have further shown that studentized resampling t-statistics are more robust and accurate than non-studentized statistics. The following explains how the corresponding resampling tests can be computed.

The introduced conditional resampling tests rely on a reference distribution $\mathcal{L}(T^*_{n,stud} \mid X)$ given the data $X$. This means that the data are treated as fixed values, and quantiles from the conditional resampling distribution of $T^*_{n,stud}$ are estimated to compute critical values. Denote by $c_n^*(1-\alpha)$ the $(1-\alpha)$-quantile of $\mathcal{L}(T^*_{n,stud} \mid X)$. Then, according to the definition of the paired t-test in (1.3), conditional resampling tests can be written as

$$\varphi_n^* = \mathbf{1}_{(-\infty,\, c_n^*(\alpha/2))}(T_{n,stud}) + \mathbf{1}_{(c_n^*(1-\alpha/2),\, \infty)}(T_{n,stud}). \tag{2.2}$$

Next we will prove that $T^*_{n,stud}$ as given in (2.1) is asymptotically standard normal under all of the different resampling schemes described above. In particular, we will show that the permanence property is fulfilled; thus, $\varphi_n^*$ is an asymptotically valid test for $H_0$. Its asymptotic normality is particularly derived under arbitrary alternatives, i.e. we do not assume that $H_0$ is true. To give an answer to the question "How do paired bootstrap and permutation tests work?" we will introduce the following criterion from Janssen and Pauly (2010), which uses the paired t-test as a benchmark for the resampling procedures.

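Definition 2.1 The conditional tests $\varphi_n^*$ defined in (2.2) are called

(i) asymptotically effective under $H_0$ with respect to the paired t-test, iff

$$E\big(|\varphi_n^* - \varphi_t|\big) \longrightarrow 0 \quad \text{as } n \to \infty, \text{ and} \tag{2.3}$$

(ii) consistent iff

$$E(\varphi_n^*) \longrightarrow \mathbf{1}\{\mu_1 > \mu_2\} \tag{2.4}$$

for $\mu_1 \ne \mu_2$ as $n \to \infty$.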

Now we can formulate

Theorem 2.1 The resampling tests $\varphi_n^*$ defined in (2.1) are asymptotically effective with respect to $\varphi_t$ and consistent under all resampling schemes (I) through (IV).

From the proof it can be seen that a similar result also holds for one-sided versions of the tests. For further details see the Appendix. Specifically, Theorem 2.1 shows that the counterintuitive resampling procedures (I), (III) and (IV) are asymptotically valid, because studentized statistics are resampled. Roughly speaking, the studentization of the resampling variables "deletes" the dependency structure in the data when $n$ is sufficiently large.

2.1 Resampling the differences Di

In this subsection we will also introduce resampling methods, particularly wild bootstrap methods, which are based on the differences $D_i$. The wild bootstrap technique is motivated by the residual bootstrap commonly applied in regression analysis, see Wu (1986), Mammen (1992) and Beran (1997), and in time-series testing problems, see Kreiss and Paparoditis (2011). It is also proposed in the context of survival analysis, see Lin (1997) or Beyersmann et al. (2012).

Here, we adapt the wild bootstrap to the simple matched pairs design and we will compare the accuracy of the resulting test procedures with the resampling tests in (2.1) in extensive simulation studies. Let $D_i^*$ denote $n$ resampling variables given the original differences $D = (D_1, \dots, D_n)'$, where $D_i^*$ denotes the observed value from

(V) drawing with replacement from all differences $D$, or

(VI) a wild bootstrap method with $D_i^* = W_i D_i$, where $W_i$, $i = 1, \dots, n$, denote independent and identically distributed random variables, which are independent from the $D_i$'s, with $E(W_1) = 0$ and $\mathrm{Var}(W_1) = 1$.

The corresponding resampling tests are then defined as in (2.2) with the paired t-test type resampling statistic

$$T^*_{n,stud} = \sqrt{n}\, \bar{D}_n^* / V_n^*, \tag{2.5}$$

where now $\bar{D}_n^* = n^{-1} \sum_{i=1}^n D_i^*$ denotes the mean of the resampled differences, and $V_n^{*2} = (n-1)^{-1} \sum_{i=1}^n (D_i^* - \bar{D}_n^*)^2$ denotes the sample variance of the $D_i^*$'s. The effectiveness of these resampling procedures is given in the next theorem.

Theorem 2.2 The resampling tests $\varphi_n^*$ defined in (2.5) are asymptotically effective with respect to $\varphi_t$ and consistent under both resampling schemes (V) and (VI).
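Schemes (V) and (VI) operate on the differences only and can be sketched as below. The names `resample_differences` and `t_from_differences` and the example data are assumptions made for illustration. The default weights are standard normal, which satisfies $E(W_1) = 0$ and $\mathrm{Var}(W_1) = 1$; any other weight generator with these two moments can be passed in through the hypothetical `weight` argument.

```python
import math
import random
import statistics


def t_from_differences(d):
    """Studentized statistic sqrt(n) * mean(d) / sd(d), as in (2.5)."""
    return math.sqrt(len(d)) * statistics.fmean(d) / statistics.stdev(d)


def resample_differences(d, scheme, rng=random, weight=None):
    """Return resampled differences D* under scheme 'V' or 'VI'."""
    if scheme == "V":
        return rng.choices(d, k=len(d))        # draw with replacement from D
    if scheme == "VI":
        # Wild bootstrap: D*_i = W_i * D_i with i.i.d. weights W_i.
        draw = weight if weight is not None else (lambda: rng.gauss(0.0, 1.0))
        return [draw() * di for di in d]
    raise ValueError("scheme must be 'V' or 'VI'")


# Hypothetical differences for n = 7 pairs (made up for illustration only).
rng = random.Random(42)
d = [0.4, -0.1, 0.9, 0.5, 0.5, -0.2, 0.9]
d_star = resample_differences(d, "VI", rng)
t_star = t_from_differences(d_star)
```

Under scheme (V) with very small $n$, a resample can consist of identical values, in which case the sample standard deviation is zero and `t_from_differences` raises an error. How such degenerate resamples should be treated is not addressed in the text and is left open here.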

Example and Remark 2.1 In our simulation study in Sect. 3, we will focus on the following weight examples. However, there are of course others that may be of interest for particular situations.

(a) $W_i$, $i = 1, \dots, n$, is a sequence of symmetric independent and identically distributed random variables with

$$P\Big(W_1 = \frac{1 + \sqrt{5}}{2}\Big) = \frac{\sqrt{5} - 1}{2\sqrt{5}} \quad \text{and} \quad P\Big(W_1 = \frac{1 - \sqrt{5}}{2}\Big) = \frac{\sqrt{5} + 1}{2\sqrt{5}}.$$

In this case it even holds that $E(W_1^3) = 1$. These wild bootstrap weights are typically used for studentized test statistics, see e.g. Kreiss and Paparoditis (2011). We will call the corresponding test Rademacher wild bootstrap.

(b) $W_i$, $i = 1, \dots, n$, is a sequence of independent and identically distributed Gaussian random variables, i.e. $W_i \sim N(0,1)$. This corresponds to the resampling procedure proposed by Lin (1997).
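The moment conditions for the two-point weights in (a) can be checked numerically. The short sketch below evaluates the first three moments from the stated probabilities and adds a sampler; the names `VALUES`, `PROBS`, `moment` and `draw_weight` are illustrative assumptions.

```python
import math
import random

SQRT5 = math.sqrt(5.0)
VALUES = ((1 + SQRT5) / 2, (1 - SQRT5) / 2)                       # support of W_1
PROBS = ((SQRT5 - 1) / (2 * SQRT5), (SQRT5 + 1) / (2 * SQRT5))    # probabilities


def moment(k):
    """k-th raw moment E(W_1^k) of the two-point weight distribution."""
    return sum(p * w ** k for w, p in zip(VALUES, PROBS))


def draw_weight(rng=random):
    """Draw one weight W_i from the two-point distribution in (a)."""
    return VALUES[0] if rng.random() < PROBS[0] else VALUES[1]
```

Up to floating-point error, `moment(1)` is 0 and `moment(2)` and `moment(3)` are both 1, matching $E(W_1) = 0$, $\mathrm{Var}(W_1) = 1$ and $E(W_1^3) = 1$.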

We note that Arlot et al. (2010a, 2010b) investigate wild bootstrap methods for multiple comparisons and confidence intervals in high-dimensional data using random signs $W_i$, $i = 1, \dots, n$, with distribution $P(W_1 = -1) = P(W_1 = 1) = 1/2$. This resampling method, however, is equivalent to the resampling scheme (II). For further details we refer the reader to Janssen (1999b).
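The equivalence with scheme (II) can be made explicit in one line, assuming that the random permutation of a pair keeps or swaps its two entries with probability 1/2 each:

```latex
D_i^* = X_{i,1}^* - X_{i,2}^* =
\begin{cases}
X_{i,1} - X_{i,2} = D_i,  & \text{pair kept (probability } 1/2\text{)},\\
X_{i,2} - X_{i,1} = -D_i, & \text{pair swapped (probability } 1/2\text{)},
\end{cases}
\qquad \text{so that } D_i^* = W_i D_i \text{ with } P(W_i = \pm 1) = 1/2 .
```

Permuting within the pairs therefore amounts to flipping the sign of each difference independently.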

Theorems 2.1 and 2.2 state that all the considered procedures fulfill the permanence property; thus, the corresponding tests $\varphi_n^*$ are asymptotically valid. The numerical algorithm for the computation of the p-value is as follows:

(1) Given the data $X$, compute the paired t-test statistic $T_{n,stud}$ as given in (1.2).

(2) Repeat the resampling steps $N$ times (e.g. $N = 10{,}000$), compute the values $T^*_{n,stud}$ and save them in $A_1, \dots, A_N$.

(3) Estimate the two-sided p-value by

$$\text{p-value} = \min\{2p_1,\, 2 - 2p_1\}, \quad \text{where} \quad p_1 = \frac{1}{N} \sum_{\ell=1}^{N} \mathbf{1}\{A_\ell \le T_{n,stud}\}.$$

In comparison, the one-sided p-value is given by $p_1$.
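Steps (1) through (3) can be sketched as follows for resampling scheme (II), implemented through random sign flips of the differences. This is an illustrative Python sketch, not the authors' R code; the function names, the default seed and the choice of scheme (II) are assumptions.

```python
import math
import random
import statistics


def t_stat(d):
    """Paired t-type statistic sqrt(n) * mean(d) / sd(d)."""
    return math.sqrt(len(d)) * statistics.fmean(d) / statistics.stdev(d)


def permutation_p_value(x1, x2, n_resamples=10_000, seed=0):
    """Return (two-sided p-value, p1) using within-pair permutations."""
    rng = random.Random(seed)
    d = [a - b for a, b in zip(x1, x2)]
    t_obs = t_stat(d)                                   # step (1)
    count = 0
    for _ in range(n_resamples):                        # step (2)
        # Swapping a pair flips the sign of its difference.
        d_star = [di if rng.random() < 0.5 else -di for di in d]
        if t_stat(d_star) <= t_obs:                     # indicator 1{A_l <= T}
            count += 1
    p1 = count / n_resamples                            # step (3)
    return min(2 * p1, 2 - 2 * p1), p1
```

If all differences have the same absolute value, a resample can have zero variance and `t_stat` fails; the sketch does not handle this edge case. When the observed statistic exceeds or equals every resampled value, $p_1 = 1$ and the formula in step (3) returns a two-sided p-value of exactly 0.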

3 Simulations

For testing the two-sided null hypothesis $H_0: \mu_1 = \mu_2$ formulated above, we consider the unconditional t-test $\varphi_t$ based on the $T(n-1)$-approximation of the statistic $T_{n,stud}$ in (1.2) and the various conditional resampling tests $\varphi_n^*$ based on the resampling schemes (I) through (VI) as described in Sect. 2. The simulation studies are performed to investigate their behaviour with regard to maintaining the pre-assigned type-I error level under the hypothesis, and the power of the statistics under alternatives. The observations $X_i = (X_{i,1}, X_{i,2})'$, $i = 1, \dots, n$, were generated using marginal distributions $F_s$ and varying correlations $\rho \in (-1, 1)$. We hereby generate exchangeable matched pairs having a bivariate normal, exponential, log-normal or uniform distribution, each with correlation $\rho \in (-1, 1)$, as well as non-exchangeable data by simulating

(a) $F_1 = N(0,1)$ and $F_2 = N(0,2)$,

(b) $F_1 = N(0,1)$ and $F_2 = N(0,4)$,

(c) $F_1 = N(3,4)$ and $F_2 = \chi^2_3$, and

(d) $F_1 = N(\exp(0.5), 3)$ and $F_2 = LN(0,1)$,

each with correlation $\rho$, respectively. Routine calculations show that $\mu_1 = \mu_2$ is fulfilled in all of these considerations. We only consider the small sample sizes $n = 7$ and $n = 10$ throughout this paper. All simulations were conducted with the help of the R computing environment, version 2.13.2 (www.r-project.org), each with $n_{sim} = 10{,}000$ and $N = 10{,}000$ bootstrap runs. The simulation results for exchangeable normally, exponentially, log-normally, and uniformly distributed matched pairs with the very small sample size of $n = 7$ and different correlations $\rho$ are displayed in Table 1.
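To make the simulation design concrete, the following sketch estimates the type-I error rate of the within-pair permutation test (scheme (II)) for setting (a). It is a scaled-down illustration in Python rather than the authors' R code: the function names are hypothetical, the second parameter of $N(0,2)$ is assumed to be a variance, the correlated pairs are built from two independent standard normals, and the numbers of runs are far smaller than the 10,000 used in the paper.

```python
import math
import random


def t_stat(d):
    """Paired t-type statistic sqrt(n) * mean(d) / sd(d)."""
    n = len(d)
    m = sum(d) / n
    v2 = sum((x - m) ** 2 for x in d) / (n - 1)
    return math.sqrt(n) * m / math.sqrt(v2)


def simulate_pair(rho, rng):
    """One pair with X1 ~ N(0, 1), X2 ~ N(0, 2) and correlation rho."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x1 = z1
    x2 = math.sqrt(2.0) * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x1, x2


def rejection_rate(n=7, rho=0.5, nsim=200, n_resamples=200, alpha=0.05, seed=1):
    """Share of simulated data sets in which H0: mu1 = mu2 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(nsim):
        d = [a - b for a, b in (simulate_pair(rho, rng) for _ in range(n))]
        t_obs = t_stat(d)
        below = sum(
            t_stat([x if rng.random() < 0.5 else -x for x in d]) <= t_obs
            for _ in range(n_resamples)
        )
        p1 = below / n_resamples
        if min(2 * p1, 2 - 2 * p1) <= alpha:     # two-sided p-value
            rejections += 1
    return rejections / nsim


rate = rejection_rate(nsim=100, n_resamples=100)
```

With so few runs the estimate is noisy, so it should only be read as a rough check that the rejection rate lies near the nominal level, not as a reproduction of the tabulated values.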

It follows from Table 1 that the paired t-test is an accurate procedure for symmetric distributions (normal and uniform), even for the very small sample size of $n = 7$. When the data are skewed (exponential and log-normal), the t-test tends to be conservative. It is apparent that both the wild bootstrap methods using the Rademacher weights as defined in Remark 2.1(a) and the Gaussian weights given in Remark 2.1(b) are inappropriate tests for such small sample sizes. The resampling test with Rademacher weights is very liberal. This can be explained by the fact that the distribution of these weights is very skewed. Roughly speaking, both wild bootstrap resampling distributions are too far away from the distribution of $T_{n,stud}$ when $n$ is rather small and the original data are not resampled. Simply drawing the differences from the data with replacement cannot be recommended either. The corresponding test tends to be quite liberal when the data are skewed. This occurs because the resampling variability (i.e. the variability within the resampling variables $D_i^*$) is rather small when $n = 7$. However, drawing with replacement from either all $2n$ observations or from each marginal separately results in more accurate test decisions. Comparing these results with the permutation-based approaches, it is easily seen that both kinds of permutation tests (i.e. to permute all data, or to permute within the sample unit) control the type-I error level for all distributions and all dependencies $\rho$ in the data. Next we investigate the behaviour of the different resampling tests for the larger sample size $n = 10$. The simulation results are displayed in Table 2.

From Table 2 an interesting phenomenon of replacement procedures with resampling schemes (IV) and (V) can be observed: The rejection rates do not converge monotonically in $n$ to $\alpha$. The tests are more liberal with $n = 10$ than with $n = 7$. Their liberality increases with increasing $n$ up to the breakpoint $n \approx 15$. With larger $n$ (e.g. $n \ge 30$), all resampling tests based on drawing with replacement are accurate. The liberality of the wild bootstrap tests using Rademacher or Gaussian weights decreases. Both kinds of permutation approaches, however, are still the most accurate procedures.

Now we investigate how accurately the tests control the type-I error level when both marginal distributions are different. The simulation results for different non-exchangeable distributions (a) through (d) with $n = 7$ and varying correlations are displayed in Table 3.

It follows from Table 3 that both the permutation approaches are accurate, even for non-exchangeable distributions, $n = 7$, and permutations of all data $X$. When two distributions with extremely different shapes and negative correlations (normal versus log-normal) are compared, they tend to be slightly liberal. The same conclusions, however, can be drawn for the t-test. In Table 4 the simulation results for $n = 10$ and the same non-exchangeable distributions are given.


Table 1 Type-I error level ($\alpha = 5\%$) simulations for very small sample sizes ($n = 7$) with exchangeable distributions

| Distribution | $\rho$ | $T_{n,stud}$ | Permutation, overall (I) | Permutation, per unit (II) | Bootstrap, overall (III) | Bootstrap, marginal (IV) | Bootstrap, differences (V) | Wild bootstrap (VI), Rademacher | Wild bootstrap (VI), Normal |
|---|---|---|---|---|---|---|---|---|---|
| Normal | −0.90 | 4.64 | 4.58 | 4.98 | 4.70 | 4.45 | 4.95 | 14.61 | 7.14 |
| | −0.50 | 4.82 | 4.69 | 5.04 | 4.90 | 4.93 | 5.07 | 15.18 | 7.24 |
| | −0.30 | 4.82 | 4.74 | 5.05 | 4.80 | 5.14 | 5.04 | 14.84 | 7.46 |
| | 0.00 | 4.97 | 5.01 | 4.80 | 5.13 | 5.02 | 5.23 | 14.55 | 7.53 |
| | 0.30 | 5.12 | 5.17 | 5.16 | 5.20 | 4.67 | 5.26 | 15.03 | 7.52 |
| | 0.50 | 5.04 | 5.02 | 5.35 | 5.02 | 4.89 | 5.16 | 15.22 | 7.46 |
| | 0.90 | 4.82 | 4.77 | 5.10 | 4.87 | 5.08 | 5.32 | 15.13 | 7.40 |
| Exp | −0.90 | 4.24 | 5.68 | 5.06 | 5.10 | 6.80 | 5.65 | 15.45 | 7.25 |
| | −0.50 | 4.22 | 5.39 | 5.41 | 5.01 | 6.80 | 6.34 | 16.26 | 7.22 |
| | −0.30 | 4.21 | 5.55 | 5.40 | 4.99 | 5.77 | 6.42 | 16.21 | 7.64 |
| | 0.00 | 3.99 | 5.25 | 5.42 | 4.69 | 5.82 | 6.64 | 16.84 | 7.47 |
| | 0.30 | 3.68 | 5.00 | 5.28 | 4.53 | 5.99 | 7.05 | 17.53 | 7.60 |
| | 0.50 | 3.50 | 4.38 | 5.34 | 4.21 | 5.61 | 7.12 | 17.36 | 7.73 |
| | 0.90 | 3.42 | 4.12 | 5.44 | 4.13 | 5.00 | 7.09 | 17.89 | 7.32 |
| Uniform | −0.90 | 5.79 | 5.27 | 5.09 | 5.69 | 4.12 | 4.02 | 13.82 | 7.31 |
| | −0.50 | 4.95 | 4.68 | 4.52 | 4.71 | 3.21 | 3.73 | 13.54 | 7.03 |
| | −0.30 | 5.44 | 5.29 | 5.32 | 5.27 | 3.60 | 4.76 | 14.34 | 7.37 |
| | 0.00 | 5.17 | 4.96 | 5.12 | 5.07 | 3.04 | 4.86 | 14.69 | 7.65 |
| | 0.30 | 4.80 | 4.48 | 5.39 | 4.77 | 3.62 | 5.04 | 15.70 | 7.58 |
| | 0.50 | 4.54 | 4.34 | 5.02 | 4.60 | 3.35 | 5.64 | 15.06 | 7.50 |
| | 0.90 | 4.49 | 4.15 | 5.17 | 4.38 | 3.73 | 6.25 | 16.27 | 7.54 |
| LNorm | −0.90 | 3.65 | 5.18 | 5.01 | 5.04 | 7.72 | 6.66 | 16.95 | 6.83 |
| | −0.50 | 3.02 | 5.24 | 5.23 | 4.37 | 7.07 | 7.16 | 17.60 | 6.91 |
| | −0.30 | 3.11 | 5.37 | 5.41 | 4.07 | 7.07 | 6.82 | 18.28 | 6.53 |
| | 0.00 | 2.67 | 4.78 | 5.02 | 3.80 | 5.90 | 6.92 | 18.13 | 6.49 |
| | 0.30 | 2.51 | 4.23 | 5.01 | 3.63 | 5.79 | 6.43 | 18.02 | 6.18 |
| | 0.50 | 2.64 | 4.78 | 5.33 | 3.67 | 5.27 | 7.41 | 18.60 | 7.00 |
| | 0.90 | 2.40 | 4.91 | 5.21 | 2.92 | 4.11 | 6.82 | 18.71 | 6.46 |


Table 2 Type-I error level ($\alpha = 5\%$) simulations for moderate sample sizes ($n = 10$) with exchangeable distributions

| Distribution | $\rho$ | $T_{n,stud}$ | Permutation, overall (I) | Permutation, per unit (II) | Bootstrap, overall (III) | Bootstrap, marginal (IV) | Bootstrap, differences (V) | Wild bootstrap (VI), Rademacher | Wild bootstrap (VI), Normal |
|---|---|---|---|---|---|---|---|---|---|
| Normal | −0.90 | 4.58 | 4.56 | 4.89 | 4.70 | 5.14 | 4.44 | 12.13 | 6.08 |
| | −0.50 | 4.76 | 4.73 | 4.77 | 4.83 | 4.81 | 4.76 | 12.67 | 6.46 |
| | −0.30 | 4.98 | 5.03 | 5.32 | 5.03 | 4.76 | 5.13 | 12.95 | 6.88 |
| | 0.00 | 4.33 | 4.30 | 4.15 | 4.35 | 4.90 | 4.63 | 12.21 | 5.77 |
| | 0.30 | 5.08 | 5.02 | 4.85 | 5.13 | 4.91 | 4.99 | 12.54 | 6.71 |
| | 0.50 | 4.63 | 4.65 | 4.50 | 4.64 | 5.25 | 4.76 | 12.03 | 6.28 |
| | 0.90 | 4.83 | 4.86 | 4.98 | 4.80 | 5.35 | 5.00 | 12.08 | 6.53 |
| Exp | −0.90 | 4.58 | 5.49 | 5.11 | 4.95 | 7.27 | 6.20 | 13.91 | 6.65 |
| | −0.50 | 4.67 | 5.47 | 5.27 | 5.23 | 7.76 | 6.57 | 13.59 | 6.61 |
| | −0.30 | 4.49 | 5.27 | 5.17 | 5.08 | 7.10 | 7.00 | 14.49 | 6.67 |
| | 0.00 | 4.07 | 5.06 | 4.85 | 4.62 | 7.08 | 7.39 | 14.42 | 6.61 |
| | 0.30 | 3.93 | 4.60 | 4.80 | 4.45 | 6.94 | 7.36 | 14.17 | 6.44 |
| | 0.50 | 3.86 | 4.44 | 4.64 | 4.24 | 6.24 | 7.74 | 14.79 | 6.55 |
| | 0.90 | 3.99 | 4.18 | 5.12 | 4.25 | 5.10 | 8.20 | 15.02 | 7.00 |
| Uniform | −0.90 | 5.39 | 5.38 | 5.08 | 5.27 | 3.67 | 3.13 | 11.54 | 6.46 |
| | −0.50 | 5.61 | 5.38 | 5.25 | 5.51 | 3.53 | 3.65 | 11.84 | 6.57 |
| | −0.30 | 5.51 | 5.26 | 5.25 | 5.31 | 3.44 | 3.97 | 11.99 | 6.79 |
| | 0.00 | 5.09 | 4.83 | 4.78 | 5.01 | 3.70 | 4.67 | 12.10 | 6.99 |
| | 0.30 | 5.12 | 5.00 | 4.99 | 4.96 | 3.69 | 5.16 | 12.93 | 6.60 |
| | 0.50 | 4.79 | 4.69 | 4.69 | 4.72 | 4.13 | 5.34 | 12.53 | 6.35 |
| | 0.90 | 4.68 | 4.57 | 5.01 | 4.58 | 3.66 | 6.48 | 13.77 | 6.86 |
| LNorm | −0.90 | 3.51 | 5.50 | 4.81 | 4.53 | 8.92 | 7.61 | 14.61 | 6.23 |
| | −0.50 | 3.18 | 4.93 | 4.58 | 4.15 | 8.93 | 7.72 | 14.42 | 5.88 |
| | −0.30 | 3.33 | 5.39 | 5.07 | 4.37 | 8.60 | 8.48 | 15.35 | 6.40 |
| | 0.00 | 3.02 | 5.03 | 5.02 | 3.99 | 8.24 | 8.87 | 15.92 | 6.28 |
| | 0.30 | 2.89 | 4.21 | 4.86 | 3.67 | 7.97 | 8.71 | 16.46 | 6.18 |
| | 0.50 | 2.87 | 4.42 | 5.00 | 3.61 | 7.31 | 8.43 | 15.90 | 5.96 |
| | 0.90 | 2.82 | 4.40 | 4.97 | 3.43 | 5.27 | 8.46 | 16.54 | 6.06 |


Table 3 Type-I error level ($\alpha = 5\%$) simulations for very small sample sizes ($n = 7$) with non-exchangeable distributions (a) through (d) as described in the text

| Distribution | $\rho$ | $T_{n,stud}$ | Permutation, overall (I) | Permutation, per unit (II) | Bootstrap, overall (III) | Bootstrap, marginal (IV) | Bootstrap, differences (V) | Wild bootstrap (VI), Rademacher | Wild bootstrap (VI), Normal |
|---|---|---|---|---|---|---|---|---|---|
| (a) | −0.90 | 5.21 | 5.21 | 5.30 | 5.26 | 5.12 | 5.76 | 15.31 | 7.90 |
| | −0.50 | 4.84 | 4.96 | 5.47 | 5.00 | 5.20 | 5.01 | 15.23 | 7.56 |
| | −0.30 | 4.71 | 4.75 | 5.04 | 4.87 | 5.35 | 5.06 | 14.95 | 6.92 |
| | 0.00 | 4.72 | 4.65 | 4.87 | 4.83 | 4.98 | 4.86 | 14.60 | 7.11 |
| | 0.30 | 5.15 | 5.19 | 5.03 | 5.14 | 4.92 | 5.47 | 14.82 | 7.51 |
| | 0.50 | 4.97 | 5.00 | 5.28 | 4.95 | 5.15 | 5.58 | 15.64 | 7.68 |
| | 0.90 | 4.90 | 4.88 | 5.10 | 4.94 | 5.00 | 5.62 | 15.24 | 7.62 |
| (b) | −0.90 | 4.93 | 5.01 | 5.10 | 5.12 | 5.06 | 4.83 | 15.58 | 7.19 |
| | −0.50 | 4.82 | 4.71 | 4.94 | 5.00 | 4.89 | 5.08 | 14.63 | 7.19 |
| | −0.30 | 5.13 | 5.16 | 5.07 | 5.39 | 4.97 | 5.18 | 14.78 | 7.65 |
| | 0.00 | 5.12 | 5.41 | 5.42 | 5.51 | 4.83 | 5.24 | 15.65 | 7.75 |
| | 0.30 | 5.18 | 5.25 | 5.32 | 5.20 | 5.02 | 5.24 | 14.77 | 7.54 |
| | 0.50 | 4.93 | 4.97 | 5.28 | 5.12 | 5.06 | 5.03 | 15.92 | 7.76 |
| | 0.90 | 5.09 | 5.35 | 5.50 | 5.43 | 5.00 | 5.42 | 14.96 | 7.75 |
| (c) | −0.90 | 6.14 | 6.63 | 6.22 | 6.49 | 6.51 | 5.86 | 16.93 | 8.76 |
| | −0.50 | 5.85 | 5.99 | 5.76 | 6.11 | 5.92 | 5.73 | 15.66 | 8.29 |
| | −0.30 | 5.44 | 5.80 | 5.66 | 5.86 | 5.51 | 5.85 | 15.58 | 7.86 |
| | 0.00 | 5.10 | 5.31 | 5.50 | 5.45 | 5.52 | 5.80 | 15.45 | 7.73 |
| | 0.30 | 5.36 | 5.67 | 5.19 | 5.54 | 5.44 | 6.05 | 16.08 | 8.03 |
| | 0.50 | 5.11 | 5.40 | 5.29 | 5.37 | 5.64 | 5.63 | 15.42 | 7.66 |
| | 0.90 | 5.50 | 5.53 | 5.73 | 5.56 | 5.16 | 6.15 | 16.02 | 8.17 |
| (d) | −0.90 | 6.94 | 7.51 | 6.94 | 7.48 | 6.91 | 6.84 | 17.18 | 9.32 |
| | −0.50 | 6.06 | 6.51 | 5.95 | 6.56 | 6.44 | 6.55 | 16.59 | 8.87 |
| | −0.30 | 5.53 | 5.94 | 5.35 | 6.16 | 6.26 | 6.11 | 15.85 | 7.96 |
| | 0.00 | 5.42 | 6.01 | 5.43 | 6.07 | 6.22 | 6.06 | 16.09 | 7.93 |
| | 0.30 | 5.04 | 5.71 | 5.36 | 5.36 | 5.50 | 5.82 | 15.79 | 7.51 |
| | 0.50 | 5.00 | 5.45 | 4.95 | 5.38 | 5.51 | 5.39 | 15.32 | 7.50 |
| | 0.90 | 5.91 | 6.13 | 5.17 | 6.15 | 6.19 | 5.98 | 16.56 | 8.48 |


Table 4 Type-I error level ($\alpha = 5\%$) simulations for moderate sample sizes ($n = 10$) with non-exchangeable distributions (a) through (d) as described in the text

| Distribution | $\rho$ | $T_{n,stud}$ | Permutation, overall (I) | Permutation, per unit (II) | Bootstrap, overall (III) | Bootstrap, marginal (IV) | Bootstrap, differences (V) | Wild bootstrap (VI), Rademacher | Wild bootstrap (VI), Normal |
|---|---|---|---|---|---|---|---|---|---|
| (a) | −0.90 | 4.99 | 4.91 | 4.90 | 5.06 | 4.96 | 4.90 | 12.88 | 6.57 |
| | −0.50 | 4.94 | 4.91 | 4.92 | 4.92 | 4.70 | 4.98 | 12.42 | 6.31 |
| | −0.30 | 4.82 | 4.77 | 4.75 | 4.70 | 5.11 | 4.75 | 11.74 | 6.28 |
| | 0.00 | 4.61 | 4.71 | 4.63 | 4.70 | 4.95 | 4.66 | 12.72 | 5.90 |
| | 0.30 | 4.97 | 5.09 | 4.92 | 4.94 | 4.66 | 5.08 | 12.93 | 6.42 |
| | 0.50 | 4.96 | 4.95 | 4.74 | 5.06 | 5.03 | 5.17 | 12.35 | 6.50 |
| | 0.90 | 4.86 | 4.91 | 5.02 | 4.97 | 5.35 | 4.90 | 12.38 | 6.48 |
| (b) | −0.90 | 4.89 | 4.98 | 4.86 | 4.88 | 4.16 | 5.18 | 12.61 | 6.53 |
| | −0.50 | 5.11 | 5.20 | 4.99 | 5.12 | 5.46 | 5.17 | 13.00 | 6.62 |
| | −0.30 | 4.80 | 4.97 | 4.56 | 5.05 | 4.56 | 4.66 | 12.37 | 6.18 |
| | 0.00 | 5.13 | 5.31 | 5.12 | 5.26 | 4.99 | 5.00 | 13.09 | 6.81 |
| | 0.30 | 4.76 | 4.78 | 4.49 | 4.86 | 5.15 | 4.82 | 12.60 | 6.22 |
| | 0.50 | 5.05 | 5.13 | 4.82 | 5.18 | 4.66 | 5.09 | 13.06 | 6.66 |
| | 0.90 | 4.98 | 5.02 | 4.77 | 5.15 | 5.20 | 4.71 | 12.47 | 6.45 |
| (c) | −0.90 | 5.83 | 5.85 | 5.81 | 5.75 | 6.53 | 5.22 | 13.51 | 7.23 |
| | −0.50 | 6.22 | 6.47 | 6.18 | 6.14 | 5.81 | 6.06 | 14.49 | 8.02 |
| | −0.30 | 5.82 | 6.09 | 5.80 | 6.07 | 5.70 | 5.63 | 13.74 | 7.36 |
| | 0.00 | 5.42 | 5.75 | 5.39 | 5.68 | 5.25 | 5.88 | 13.71 | 7.61 |
| | 0.30 | 5.09 | 5.27 | 4.96 | 5.14 | 5.37 | 5.42 | 13.26 | 6.64 |
| | 0.50 | 5.11 | 5.24 | 5.15 | 5.09 | 5.17 | 5.52 | 12.99 | 6.82 |
| | 0.90 | 5.38 | 5.57 | 5.41 | 5.50 | 5.41 | 5.80 | 13.89 | 7.30 |
| (d) | −0.90 | 6.38 | 6.45 | 6.29 | 6.40 | 7.07 | 6.27 | 14.62 | 7.89 |
| | −0.50 | 6.02 | 6.41 | 6.27 | 6.12 | 6.07 | 6.09 | 14.02 | 7.63 |
| | −0.30 | 5.66 | 6.06 | 5.89 | 5.87 | 6.19 | 6.25 | 14.57 | 7.79 |
| | 0.00 | 5.11 | 5.43 | 5.50 | 5.33 | 6.23 | 5.59 | 13.43 | 6.93 |
| | 0.30 | 4.90 | 5.12 | 5.05 | 5.16 | 5.94 | 5.16 | 13.21 | 6.43 |
| | 0.50 | 4.82 | 5.03 | 4.95 | 4.97 | 5.40 | 5.60 | 13.24 | 6.48 |
| | 0.90 | 6.17 | 6.32 | 6.17 | 6.26 | 6.22 | 5.84 | 14.48 | 7.83 |


Table 5 Power ($\alpha = 5\%$) simulations for moderate sample sizes ($n = 10$) and $\rho = 1/2$

| Distribution | Shift | $T_{n,stud}$ | Permutation, overall (I) | Permutation, per unit (II) | Bootstrap, overall (III) | Bootstrap, marginal (IV) | Bootstrap, differences (V) | Wild bootstrap (VI), Rademacher | Wild bootstrap (VI), Normal |
|---|---|---|---|---|---|---|---|---|---|
| Normal | 0.00 | 4.91 | 4.85 | 5.00 | 5.00 | 4.79 | 5.01 | 12.99 | 6.61 |
| | 0.10 | 5.75 | 5.74 | 5.56 | 5.54 | 5.88 | 5.84 | 14.48 | 7.65 |
| | 0.20 | 8.85 | 8.82 | 8.74 | 8.76 | 8.73 | 8.33 | 18.67 | 10.95 |
| | 0.30 | 14.41 | 14.30 | 14.03 | 14.23 | 13.99 | 13.18 | 27.46 | 17.55 |
| | 0.40 | 20.44 | 20.47 | 20.11 | 20.39 | 20.28 | 19.50 | 37.07 | 24.21 |
| | 0.50 | 28.78 | 28.71 | 28.34 | 28.81 | 28.45 | 27.25 | 47.83 | 33.98 |
| | 0.60 | 39.80 | 39.60 | 39.04 | 39.87 | 38.97 | 36.96 | 60.28 | 45.34 |
| | 0.70 | 51.24 | 51.20 | 50.61 | 51.17 | 50.29 | 47.47 | 70.98 | 56.96 |
| | 0.80 | 60.12 | 60.08 | 58.79 | 59.82 | 59.42 | 55.71 | 78.51 | 65.81 |
| | 0.90 | 72.13 | 71.89 | 71.07 | 72.34 | 70.78 | 66.49 | 86.86 | 77.03 |
| | 1.00 | 80.64 | 80.60 | 80.18 | 80.85 | 79.69 | 74.49 | 92.27 | 84.28 |
| LNorm | 0.00 | 2.90 | 4.15 | 4.66 | 3.69 | 7.29 | 8.43 | 15.69 | 6.07 |
| | 0.10 | 3.50 | 4.66 | 5.75 | 4.19 | 7.99 | 9.35 | 17.21 | 6.90 |
| | 0.20 | 5.18 | 7.18 | 8.26 | 6.51 | 10.84 | 12.06 | 19.95 | 9.43 |
| | 0.30 | 7.97 | 10.68 | 11.70 | 9.70 | 14.44 | 15.92 | 24.75 | 13.05 |
| | 0.40 | 11.11 | 14.21 | 15.74 | 12.93 | 18.59 | 19.55 | 29.56 | 16.62 |
| | 0.50 | 16.26 | 20.09 | 21.38 | 18.40 | 23.71 | 24.70 | 34.85 | 22.44 |
| | 0.60 | 21.46 | 26.15 | 27.36 | 24.19 | 29.47 | 30.20 | 40.48 | 28.50 |
| | 0.70 | 27.60 | 32.56 | 33.50 | 30.38 | 35.45 | 35.85 | 47.83 | 34.97 |
| | 0.80 | 33.03 | 37.92 | 38.63 | 35.54 | 40.01 | 39.69 | 52.45 | 39.62 |
| | 0.90 | 37.24 | 42.23 | 43.08 | 40.05 | 43.5 | 43.53 | 55.89 | 43.71 |
| | 1.00 | 43.94 | 49.24 | 49.36 | 46.90 | 49.47 | 48.32 | 61.85 | 51.21 |


Table 6 Power (α = 5 %) simulations for moderate sample sizes (n = 20) and ρ = 1/2

                             Permutation tests           Bootstrap tests                                Wild bootstrap (VI)
Distribution  δ     Tn,stud  Overall (I)  Per unit (II)  Overall (III)  Marginal (IV)  Differences (V)  Rademacher  Normal
Normal        0.00  5.04     4.98         4.95           5.00           4.83           4.84             9.49        5.68
              0.10  6.84     6.81         6.99           6.78           6.65           6.72             12.23       7.76
              0.20  13.58    13.45        13.47          13.57          13.27          13.24            21.25       14.68
              0.30  24.03    24.07        23.97          23.90          24.03          23.52            35.03       25.98
              0.40  39.84    39.51        39.99          39.87          39.40          38.37            52.23       42.28
              0.50  56.02    56.21        55.70          55.92          55.88          54.35            67.88       58.51
              0.60  72.30    72.15        72.13          72.12          71.77          71.50            82.33       74.73
              0.70  84.72    84.75        84.57          84.74          84.67          83.78            91.16       86.24
              0.80  92.21    92.12        92.21          92.26          91.92          91.34            96.21       93.25
              0.90  96.96    96.93        96.94          96.95          96.73          96.34            98.70       97.38
              1.00  98.97    99.01        98.98          98.94          98.90          98.57            99.64       99.09
LNorm         0.00  3.47     4.70         5.20           4.28           8.34           9.38             13.72       5.73
              0.10  4.32     5.59         6.01           5.18           9.64           10.78            14.78       6.90
              0.20  7.33     9.10         9.90           8.24           13.64          14.69            18.91       10.69
              0.30  11.83    13.80        14.35          13.00          18.09          19.11            24.72       15.44
              0.40  17.60    20.15        21.26          19.22          24.50          25.59            31.92       22.38
              0.50  24.23    27.32        28.19          26.17          30.81          31.30            38.34       29.21
              0.60  31.55    34.60        35.31          33.37          37.69          37.97            46.48       36.60
              0.70  39.55    42.54        43.07          41.39          44.37          44.55            52.78       44.57
              0.80  47.33    50.23        50.61          49.26          50.33          49.97            59.55       51.91
              0.90  55.02    57.81        58.39          56.86          57.19          56.51            66.15       59.26
              1.00  60.97    63.91        64.33          62.96          61.60          60.62            71.13       65.38


For larger n = 10, both permutation approaches are accurate and behave similarly to the t-test.

To compare the power of the tests, we generated bivariate normally and log-normally distributed matched pairs with n = 10 and n = 20, respectively, each with correlation ρ = 1/2. We shifted the data under time point 2 by δ ∈ (0,1). The simulation results for n = 10 are displayed in Table 5. Although both wild bootstrap methods (using Rademacher and Gaussian weights) and the resampling tests based on schemes (III)–(V) were quite liberal in these situations, we included them in the power simulation study. However, to give a fair comparison between the procedures, we do not grade them in detail and concentrate on the t-test and the permutation-based approaches.

It follows from Table 5 that both permutation approaches have power comparable to that of the t-test under normality. Under non-normality, the power of the permutation-based approaches is remarkably higher. The same conclusions can be drawn for n = 20, as can be seen from Table 6.
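The simulation design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' simulation code: unit variances, a log-normal sample obtained by exponentiating the correlated normal pair, the number of simulation runs, the seed, and all function names are hypothetical choices. Only the t-test is evaluated here; its critical value for n = 10 is hard-coded.

```python
import math
import random

# 97.5 % quantile of the t-distribution with n - 1 = 9 degrees of freedom (n = 10).
T_CRIT_9DF = 2.262
N = 10


def simulate_pairs(delta, lognormal, rng, rho=0.5):
    """Draw N pairs with correlation rho on the normal scale; shift time point 2 by delta."""
    pairs = []
    for _ in range(N):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        if lognormal:  # assumed construction: exponentiate the normal pair
            z1, z2 = math.exp(z1), math.exp(z2)
        pairs.append((z1, z2 + delta))
    return pairs


def t_stud(pairs):
    """Paired t-type statistic sqrt(n) * mean(D) / sd(D) with D = X1 - X2."""
    d = [x1 - x2 for x1, x2 in pairs]
    mean = sum(d) / len(d)
    var = sum((v - mean) ** 2 for v in d) / (len(d) - 1)
    return math.sqrt(len(d)) * mean / math.sqrt(var)


def t_test_rejection_rate(delta, lognormal=False, nsim=4000, seed=1):
    """Monte Carlo estimate of the two-sided rejection rate of the t-test."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(nsim):
        if abs(t_stud(simulate_pairs(delta, lognormal, rng))) >= T_CRIT_9DF:
            rejections += 1
    return rejections / nsim
```

Calling `t_test_rejection_rate(0.0)` estimates the type-I error and `t_test_rejection_rate(1.0)` the power at the largest shift; passing `lognormal=True` switches to the skewed setting under the construction assumed above.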

4 Discussion

We analyzed two different permutation approaches for testing H0 : μ1 = μ2 with paired data under non-normality. In particular, we demonstrated that the usual assumption of exchangeability is not necessary for the construction of permutation tests. We have analytically shown that permutation approaches which are based on permutations of all observed data (i.e. neglecting the dependency structure) are asymptotically valid procedures. The results are obtained by investigating the conditional permutation distribution of studentized statistics. All results in this paper would not hold without the studentization. The investigation of permutation techniques in heteroscedastic repeated measures designs will be part of future research.

In this paper, only mean-based approaches were considered. Rank-based studentized permutation tests are proposed by Konietschke and Pauly (2012).
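As a rough illustration of the two permutation approaches discussed in this section, the following Python sketch permutes either the pooled sample (scheme (I)) or the two observations within each pair (scheme (II)) and recomputes the studentized statistic each time. The Monte Carlo p-value with the usual "+1" correction, the number of permutations, the seed and all function names are assumptions made for this example; the paper itself works with the conditional quantile of the permutation distribution.

```python
import math
import random


def t_stud(first, second):
    """Paired t-type statistic computed from the differences first[i] - second[i]."""
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    return math.sqrt(n) * mean / math.sqrt(var)


def permutation_p_value(first, second, scheme, n_perm=999, seed=0):
    """Two-sided Monte Carlo p-value of the studentized permutation test."""
    rng = random.Random(seed)
    n = len(first)
    observed = abs(t_stud(first, second))
    hits = 0
    for _ in range(n_perm):
        if scheme == "I":
            # Scheme (I): permute all 2n pooled observations, ignoring the pairing.
            pooled = list(first) + list(second)
            rng.shuffle(pooled)
            new_first, new_second = pooled[:n], pooled[n:]
        elif scheme == "II":
            # Scheme (II): swap the two observations within each unit at random.
            new_first, new_second = [], []
            for a, b in zip(first, second):
                if rng.random() < 0.5:
                    a, b = b, a
                new_first.append(a)
                new_second.append(b)
        else:
            raise ValueError("scheme must be 'I' or 'II'")
        if abs(t_stud(new_first, new_second)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because the statistic is studentized anew for every permutation, scheme (I) can be applied even though pooling destroys the dependency structure within the pairs.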

Acknowledgements The authors are grateful to an Associate Editor and two anonymous referees for helpful comments which considerably improved the paper. This work was supported by the German Research Foundation projects DFG-Br 655/16-1 and DFG-Ho 1687/9-1.

Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Appendix

The next lemma shows that, for proving all theorems, it suffices to analyze the limit of the conditional distribution. In the sequel, '$\xrightarrow{P}$' denotes convergence in probability as $n \to \infty$.

Lemma 5.1 Let $\varphi_n^*$ be one of the resampling tests (I)–(VI) defined as in (2.2). If we have convergence

\[
c_n^*(1-\alpha) \xrightarrow{P} \Phi^{-1}(1-\alpha) \tag{5.1}
\]

for all $\alpha \in (0,1)$ and general $\mu \in \mathbb{R}^2$, the test $\varphi_n^*$ is asymptotically effective with respect to $\varphi_t$ and consistent.

Proof For completeness we start by giving a short proof of the asymptotic exactness of $\varphi_t$: By the multivariate central limit theorem we have, for $E(X_1) = \mu$, convergence in distribution

\[
S_n := \frac{1}{\sqrt{n}} \sum_{i=1}^{n} (X_i - \mu) \xrightarrow{D} Y = (Y_1, Y_2)' \sim N(0, \Sigma).
\]

Since $T_n = (1,-1)S_n$ is a linear transformation of $S_n$, we get from the continuous mapping theorem and Pólya's theorem that under $H_0: \mu_1 = \mu_2$

\[
\sup_{x \in \mathbb{R}} |P(T_n \le x) - \Phi(x/\sigma)| \to 0,
\]

where $\sigma^2 = (1,-1)\Sigma(1,-1)^T = \sigma_1^2 - 2\sigma_{12} + \sigma_2^2 = \mathrm{Var}(D_1)$ with $\sigma_j^2 := \mathrm{Var}(X_{1,j})$, $j = 1,2$, and $\sigma_{12} = \mathrm{Cov}(X_{1,1}, X_{1,2})$. Since $V_n^2$ is a consistent estimator of $\sigma^2$, the result follows from Slutsky's theorem.

Note that by (5.1), Lemma 1 in Janssen and Pauls (2003) implies (2.3). Moreover, since the convergence (5.1) also holds under alternatives $\mu_1 \neq \mu_2$, the result follows from the convergence

\[
T_{n,stud}(X) = T_{n,stud}\big((X_i - \mu)_{1 \le i \le n}\big) + \frac{\sqrt{n}(\mu_1 - \mu_2)}{V_n} \xrightarrow{P} \mathrm{sign}(\mu_1 - \mu_2)\,\infty. \qquad \Box
\]

In the following we will apply Lemma 5.1. Note that in order to prove (5.1) it suffices to show that the conditional resampling distribution converges weakly to a standard normal distribution in probability, i.e.

\[
\sup_{x \in \mathbb{R}} \big| P(T^*_{n,stud} \le x) - \Phi(x) \big| \xrightarrow{P} 0. \tag{5.2}
\]
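To make the quantity in (5.2) concrete, the following Python sketch computes the supremum distance exactly for the per-unit scheme (II), where swapping the two observations of a pair flips the sign of the corresponding difference. For small n all 2^n sign patterns can be enumerated, so no Monte Carlo error is involved. The function names and the example data are hypothetical; the sketch illustrates the distance only and is not part of the proof.

```python
import itertools
import math
from statistics import NormalDist


def t_stud_from_differences(d):
    """Paired t-type statistic sqrt(n) * mean(d) / sd(d)."""
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    return math.sqrt(n) * mean / math.sqrt(var)


def sup_distance_to_normal(d):
    """Exact sup_x |P*(T* <= x) - Phi(x)| over all 2^n sign flips of the differences."""
    stats = sorted(
        t_stud_from_differences([s * v for s, v in zip(signs, d)])
        for signs in itertools.product((-1.0, 1.0), repeat=len(d))
    )
    m = len(stats)
    phi = NormalDist().cdf
    # The supremum is attained at a jump of the step function, either at the
    # jump itself (k / m) or in the left limit ((k - 1) / m).
    return max(
        max(abs(k / m - phi(t)), abs((k - 1) / m - phi(t)))
        for k, t in enumerate(stats, start=1)
    )
```

For example, `sup_distance_to_normal([0.9, -0.4, 1.3, 0.2, -0.7, 1.1, 0.5, -0.3, 0.8, 1.6])` evaluates the distance for ten hypothetical differences.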

Proof of Theorem 2.1 We start by analyzing the resampling scheme (I), which is based on permuting the pooled sample:

Let $Z_1, \ldots, Z_{2n}$ with $Z_i = X_{i,1}$ for $1 \le i \le n$ and $Z_i = X_{i-n,2}$ for $n+1 \le i \le 2n$ denote the pooled sample. For studying the permutation test based on the resampling scheme (I), let $\pi$ be a random permutation of $(1, \ldots, 2n)$, i.e. a random variable that is uniformly distributed on the symmetric group $S_{2n}$ and independent of $X$. Consider the modified studentized version of $T_{n,stud}$

\[
\tilde{T}_{n,stud} := \sqrt{n}\, \overline{D}_n \left( (n-1)^{-1} \left[ \sum_{i=1}^{n} (X_{i,1} - \overline{X}_1)^2 + \sum_{i=1}^{n} (X_{i,2} - \overline{X}_2)^2 \right] \right)^{-1/2} =: \frac{T_n}{\tilde{V}_n},
\]

see Eqs. (4.1)–(4.2) in Janssen (2005). We will complete the proof for (I) as follows: First we prove that the permutation version of $T_n / \tilde{V}_n$ derived from (I) fulfills (5.2). After that we argue that $T^*_{n,stud}$ has the same asymptotic limit behaviour by discussing the different permutation versions of the standardizations. For the first part we apply Theorem 4.1 in Janssen (2005). Note that we have convergences in probability

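\[
\frac{1}{2n} \sum_{i=1}^{2n} \Big( Z_i - \frac{1}{2n} \sum_{j=1}^{2n} Z_j \Big)^2
= \frac{1}{2n} \sum_{i=1}^{n} X_{i,1}^2 + \frac{1}{2n} \sum_{i=1}^{n} X_{i,2}^2
- \Big( \frac{1}{2n} \sum_{i=1}^{n} X_{i,1} + \frac{1}{2n} \sum_{i=1}^{n} X_{i,2} \Big)^2
\xrightarrow{P} \frac{1}{2} (\sigma_1^2 + \sigma_2^2) + \frac{1}{4} (\mu_1 - \mu_2)^2 > 0,
\]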

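by the law of large numbers, and

\[
\frac{1}{\sqrt{2n}} \max_{1 \le i \le 2n} \Big| Z_i - \frac{1}{2n} \sum_{j=1}^{2n} Z_j \Big|
\le \frac{4}{\sqrt{n}} \Big( \max_{1 \le i \le n} |X_{i,1}| + \max_{1 \le i \le n} |X_{i,2}| \Big) \xrightarrow{P} 0, \tag{5.3}
\]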

by the fact that $(X_{i,j}/\sqrt{n})_{1 \le i \le n}$ fulfill the Lindeberg condition for each $j = 1,2$, which is more restrictive. Hence Condition (1.12) in Janssen (2005) is fulfilled and Theorem 4.1 in Janssen (2005) implies a conditional central limit theorem for the permutation version of $\tilde{T}_{n,stud}$

\[
\sup_{x \in \mathbb{R}} \big| P\big( \tilde{T}_{n,stud}((Z_{\pi(i)})_{1 \le i \le 2n}) \le x \big) - \Phi(x) \big| \xrightarrow{P} 0.
\]

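Note that $T_{n,stud}$ and $\tilde{T}_{n,stud}$ only differ in their standardizations. Hence, to complete the proof, we have to show that the difference $\tilde{V}_n^2((Z_{\pi(i)})_{1 \le i \le 2n}) - V_n^2((Z_{\pi(i)})_{1 \le i \le 2n})$ converges in probability to zero (note that both are positive on a set with probability tending to 1). Straightforward calculations show that

\[
\tilde{V}_n^2\big((Z_{\pi(i)})_{1 \le i \le 2n}\big) - V_n^2\big((Z_{\pi(i)})_{1 \le i \le 2n}\big)
= 2 \left[ \frac{1}{n-1} \sum_{i=1}^{n} Z_{\pi(i)} Z_{\pi(n+i)} - \Big( \frac{1}{n} \sum_{j=1}^{n} Z_{\pi(j)} \Big) \Big( \frac{1}{n} \sum_{j=1}^{n} Z_{\pi(n+j)} \Big) \right]
=: 2 \big( R^{\pi}_{n,1} - R^{\pi}_{n,2} \big).
\]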

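We will first study $R^{\pi}_{n,1}$. The conditional expectation fulfills

\[
\frac{n-1}{n} E\big( R^{\pi}_{n,1} \,\big|\, X \big)
= \frac{1}{2n(2n-1)} \sum_{1 \le i \neq j \le 2n} Z_i Z_j
= \frac{2n}{2n-1} \Big( \frac{1}{2n} \sum_{i=1}^{2n} Z_i \Big)^2 - \frac{1}{2n(2n-1)} \sum_{i=1}^{2n} Z_i^2
= \frac{1}{4} (\mu_1 + \mu_2)^2 + o_P(1).
\]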

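Here $o_P(1)$ stands for a sequence that converges in probability to zero as $n \to \infty$. Moreover, for the conditional second moment we have by the law of large numbers

\[
\begin{aligned}
\Big( \frac{n-1}{n} \Big)^2 E\big( (R^{\pi}_{n,1})^2 \,\big|\, X \big)
&= \frac{1}{n^2} \sum_{1 \le i \neq j \le n} E\big( Z_{\pi(j)} Z_{\pi(n+j)} Z_{\pi(i)} Z_{\pi(n+i)} \,\big|\, X \big)
 + \frac{1}{n^2} \sum_{k=1}^{n} E\big( Z_{\pi(k)}^2 Z_{\pi(n+k)}^2 \,\big|\, X \big) \\
&= \frac{n-1}{n} \, \frac{1}{2n(2n-1)(2n-2)(2n-3)} \sum_{\substack{1 \le i_1, i_2, i_3, i_4 \le 2n \\ \text{all} \neq}} Z_{i_1} \cdots Z_{i_4}
 + \frac{1}{n} \, \frac{1}{2n(2n-1)} \sum_{1 \le i \neq j \le 2n} Z_i^2 Z_j^2 \\
&= \Big( \frac{1}{2n} \sum_{i=1}^{2n} Z_i \Big)^4 + o_P(1) \\
&= \Big( \frac{1}{2} (\mu_1 + \mu_2) \Big)^4 + o_P(1) \\
&= E\big( R^{\pi}_{n,1} \,\big|\, X \big)^2 + o_P(1).
\end{aligned}
\]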

Note that the third step comprised iterated applications of the law of large numbers together with inequalities that involve the convergence in probability $\max_{1 \le i \le 2n} Z_i^2 / n \xrightarrow{P} 0$. Altogether this shows $\mathrm{Var}(R^{\pi}_{n,1}) \xrightarrow{P} 0$, so that $R^{\pi}_{n,1}$ converges in probability to $\frac{1}{4}(\mu_1 + \mu_2)^2$. For $R^{\pi}_{n,2}$, similar calculations as above show that

\[
E\Big( \frac{1}{n} \sum_{j=1}^{n} Z_{\pi(j)} \,\Big|\, X \Big) \xrightarrow{P} \frac{1}{2} (\mu_1 + \mu_2)
\quad \text{and} \quad
\mathrm{Var}\Big( \frac{1}{n} \sum_{j=1}^{n} Z_{\pi(j)} \,\Big|\, X \Big) \xrightarrow{P} 0.
\]

Thus $\frac{1}{n} \sum_{j=1}^{n} Z_{\pi(j)}$ converges in probability to $\frac{1}{2}(\mu_1 + \mu_2)$. Since the same holds true for $\frac{1}{n} \sum_{j=1}^{n} Z_{\pi(n+j)}$, it follows that $R^{\pi}_{n,2}$ converges in probability to $\frac{1}{4}(\mu_1 + \mu_2)^2$, which completes the proof for the resampling scheme (I).
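The limiting behaviour of the two cross terms can be checked numerically. The Python sketch below draws one random permutation of a large pooled sample and evaluates both terms; under the stated limits both should be close to (μ1 + μ2)²/4. The sample size, the means and standard deviations, the use of independent components and the function name are illustrative assumptions only.

```python
import random


def permuted_cross_terms(x1, x2, rng):
    """Return (R1, R2) for one random permutation of the pooled sample."""
    n = len(x1)
    z = list(x1) + list(x2)   # pooled sample Z_1, ..., Z_2n
    rng.shuffle(z)            # Z_pi(1), ..., Z_pi(2n)
    first, second = z[:n], z[n:]
    # R1: normalized sum of products of the randomly matched "pairs".
    r1 = sum(a * b for a, b in zip(first, second)) / (n - 1)
    # R2: product of the two half-sample means.
    r2 = (sum(first) / n) * (sum(second) / n)
    return r1, r2
```

With, say, 20000 pairs with means 1 and 3, both returned values should lie close to 4, so that their difference is close to zero.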

The proof for scheme (III), where we draw the resample with replacement from the pooled sample, can be obtained with similar methods.

Since case (II) is a special example of (VI), see Remark 2.1 above, the result follows from Theorem 2.2. Hence it remains to prove (IV). We can again proceed as in the proof of (I). First it follows from Theorem 4.2 in Janssen (2005) that $\tilde{T}^*_{n,stud} = \tilde{T}_{n,stud}(X^*)$ is asymptotically standard normal, i.e. (5.2) holds with $T^*_{n,stud}$ replaced by $\tilde{T}^*_{n,stud}$. Again the asymptotic equivalence of the different studentizations follows as in case (I). □

Proof of Theorem 2.2 We start by verifying the result for (V). To this end we apply Theorem 3.1 in Janssen (2005) with the array $X_{n,i} := D_i / \sqrt{n}$. Note that by Eq. (3.4) in Janssen (2005) the result follows from the convergences

\[
\sum_{i=1}^{n} (X_{n,i} - \overline{X})^2 = \frac{1}{n} \sum_{i=1}^{n} D_i^2 - \overline{D}_n^2 \xrightarrow{P} \mathrm{Var}(D_1)
\quad \text{and} \quad
\max_{1 \le i \le n} |X_{n,i}| \xrightarrow{P} 0.
\]

Here the last convergence is a consequence of (5.3). This finishes the proof for (V).

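For the last case (VI) we first analyze the conditional distribution of the numerator $\sqrt{n}\, \overline{D}^*_n$ of the wild bootstrap t-type statistic. Note that, given the data $X$,

\[
W_{n,i} := \frac{1}{\sqrt{n}} W_i D_i, \quad 1 \le i \le n,
\]

defines an array of row-wise independent random variables. It fulfills

\[
E(W_{n,i} \,|\, X) = 0 \quad \text{and} \quad \mathrm{Var}(W_{n,i} \,|\, X) = \frac{1}{n} D_i^2.
\]

Hence the conditional variance of $\sqrt{n}\, \overline{D}^*_n$ fulfills

\[
\mathrm{Var}\big( \sqrt{n}\, \overline{D}^*_n \,\big|\, X \big) = \frac{1}{n} \sum_{i=1}^{n} D_i^2 \xrightarrow{P} E(D_1^2) =: \sigma_W^2.
\]

Since we also have

\[
\sum_{i=1}^{n} E\big( W_{n,i}^2 \mathbf{1}\{ |W_{n,i}| \ge \varepsilon \} \,\big|\, X \big)
\le E\Big( W_1^2 \mathbf{1}\Big\{ \max_{1 \le i \le n} |W_i| \ge \varepsilon \sqrt{n} \Big\} \Big) \longrightarrow 0
\]

for all $\varepsilon > 0$ by the dominated convergence theorem, Lindeberg's central limit theorem implies

\[
\sup_{x \in \mathbb{R}} \big| P\big( \sqrt{n}\, \overline{D}^*_n \le x \,\big|\, X \big) - \Phi(x / \sigma_W) \big| \xrightarrow{P} 0.
\]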
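The conditional variance formula can be verified exactly for Rademacher weights, because for small n all 2^n weight vectors can be enumerated. The following Python sketch does this; the restriction to Rademacher weights, the function name and the example data are assumptions made for illustration.

```python
import itertools
import math


def rademacher_conditional_moments(d):
    """Exact conditional mean and variance of sqrt(n) * mean(W_i * D_i) given the data."""
    n = len(d)
    # Enumerate all 2^n equally likely weight vectors with entries -1 or +1.
    values = [
        sum(w * v for w, v in zip(weights, d)) / math.sqrt(n)
        for weights in itertools.product((-1.0, 1.0), repeat=n)
    ]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var
```

For any vector of differences `d`, the returned variance agrees with `sum(v * v for v in d) / len(d)` up to floating-point error, and the returned mean is zero.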

By Slutsky's lemma it remains to prove that $V_n^{*2}$ converges in probability to $\sigma_W^2$. But this follows from the law of large numbers since

\[
\frac{n-1}{n} V_n^{*2} = \frac{1}{n} \sum_{i=1}^{n} (W_i D_i)^2 - \Big( \frac{1}{n} \sum_{j=1}^{n} W_j D_j \Big)^2
\]

converges in probability to $E((W_1 D_1)^2) - E(W_1 D_1)^2 = E(W_1^2) E(D_1^2) = \sigma_W^2$. □
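A studentized wild bootstrap test along the lines of the quantities used in this proof might look as follows in Python. This is a hedged sketch rather than the authors' implementation: the bootstrap differences are taken to be W_i D_i, the statistic is studentized with their sample variance, and the Monte Carlo p-value with "+1" correction, the number of bootstrap runs, the seed and the function names are assumptions.

```python
import math
import random


def t_stud_from_differences(d):
    """Paired t-type statistic sqrt(n) * mean(d) / sd(d)."""
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    return math.sqrt(n) * mean / math.sqrt(var)


def wild_bootstrap_p_value(d, weights="rademacher", n_boot=999, seed=0):
    """Two-sided Monte Carlo p-value based on the bootstrap differences W_i * D_i."""
    rng = random.Random(seed)
    observed = abs(t_stud_from_differences(d))
    hits = 0
    for _ in range(n_boot):
        if weights == "rademacher":
            w = [rng.choice((-1.0, 1.0)) for _ in d]
        elif weights == "gaussian":
            w = [rng.gauss(0.0, 1.0) for _ in d]
        else:
            raise ValueError("weights must be 'rademacher' or 'gaussian'")
        d_star = [wi * di for wi, di in zip(w, d)]
        if abs(t_stud_from_differences(d_star)) >= observed:
            hits += 1
    return (hits + 1) / (n_boot + 1)
```

Both weight choices have mean zero and unit variance, which is what the variance calculation above relies on.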

References

Arlot, S., Blanchard, G., Roquain, E.: Some nonasymptotic results on resampling in high dimension, I: confidence regions. Ann. Stat. 38, 51–82 (2010a)

Arlot, S., Blanchard, G., Roquain, E.: Some nonasymptotic results on resampling in high dimension, II: multiple tests. Ann. Stat. 38, 83–99 (2010b)

Basso, D., Pesarin, F., Salmaso, L., Solari, A.: Permutation Tests for Stochastic Ordering and ANOVA. Springer, New York (2009)

Beran, R.: Diagnosing bootstrap success. Ann. Inst. Stat. Math. 49, 1–24 (1997)

Beyersmann, J., Di Termini, S., Pauly, M.: Weak convergence of the wild bootstrap for the Aalen-Johansen estimator of the cumulative incidence function of a competing risk. Scand. J. Stat. (2012). doi:10.1111/j.1467-9469.2012.00817.x

Bickel, P.J., Freedman, D.A.: Some asymptotic theory for the bootstrap. Ann. Stat. 9, 1196–1217 (1981)

Brombin, C., Salmaso, L.: Multi-aspect permutation tests in shape analysis with small sample size. Comput. Stat. Data Anal. 53, 3921–3931 (2009)

Brombin, C., Salmaso, L., Ferronato, G., Galzignato, P.-F.: Multi-aspect procedures for paired data with application to biometric morphing. Commun. Stat., Simul. Comput. 40, 3921–3931 (2011)

Delaigle, A., Hall, P., Jin, J.: Robustness and accuracy of methods for high dimensional data analysis based on Student's t-statistic. J. R. Stat. Soc. B 73, 283–301 (2011)

Good, P.: Permutation, Parametric and Bootstrap Tests of Hypotheses, 3rd edn. Springer Series in Statistics. Springer, New York (2005)

Hall, P., Wilson, S.R.: Two guidelines for bootstrap hypothesis testing. Biometrics 47, 757–762 (1991)

Janssen, A.: Studentized permutation tests for non-i.i.d. hypotheses and the generalized Behrens-Fisher problem. Stat. Probab. Lett. 36, 9–21 (1997)

Janssen, A.: Testing nonparametric statistical functionals with application to rank tests. J. Stat. Plan. Inference 81, 71–93 (1999a). Erratum: J. Stat. Plan. Inference 92, 297 (2001)

Janssen, A.: Nonparametric symmetry tests for statistical functionals. Math. Methods Stat. 8, 320–343 (1999b)

Janssen, A.: Resampling Student's t-type statistics. Ann. Inst. Stat. Math. 57, 507–529 (2005)


Janssen, A., Pauls, T.: How do bootstrap and permutation tests work? Ann. Stat. 31, 768–806 (2003)

Janssen, A., Pauls, T.: A Monte Carlo comparison of studentized bootstrap and permutation tests for heteroscedastic two-sample problems. Comput. Stat. 20, 369–383 (2005)

Janssen, A., Pauly, M.: Asymptotics and effectiveness of conditional tests with applications to randomization tests. Tech. Report, University of Duesseldorf (2010)

Konietschke, F., Pauly, M.: A studentized permutation test for the nonparametric Behrens-Fisher problem in paired data. Electron. J. Stat. 6, 1358–1372 (2012)

Kreiss, J.-P., Paparoditis, E.: Bootstrap for dependent data: a review, with discussion, and a rejoinder. J. Korean Stat. Soc. 40, 357–378, 393–395 (2011)

Lin, D.: Non-parametric inference for cumulative incidence functions in competing risks studies. Stat. Med. 16, 901–910 (1997)

Mammen, E.: When Does Bootstrap Work? Asymptotic Results and Simulations. Springer, New York (1992)

Munzel, U.: Nonparametric methods for paired samples. Stat. Neerl. 53, 277–286 (1999)

Neubert, K., Brunner, E.: A studentized permutation test for the nonparametric Behrens-Fisher problem. Comput. Stat. Data Anal. 51, 5192–5204 (2007)

Omelka, M., Pauly, M.: Testing equality of correlation coefficients in an potentially unbalanced two-sample problem via permutation methods. J. Stat. Plan. Inference 142, 1396–1406 (2012)

Pauly, M.: Discussion about the quality of F-ratio resampling tests for comparing variances. Test 20, 163–179 (2011)

Pesarin, F., Salmaso, L.: Finite-sample consistency of combination-based permutation tests with application to repeated measures designs. J. Nonparametr. Stat. 22, 669–684 (2010a)

Pesarin, F., Salmaso, L.: Permutation Tests for Complex Data: Theory, Applications and Software. Wiley, Chichester (2010b)

Pesarin, F., Salmaso, L.: A review and some new results on permutation testing for multivariate problems. Stat. Comput. 22, 639–646 (2012)

Wu, C.: Jackknife, bootstrap and other resampling methods in regression analysis. Ann. Stat. 14, 1261–1295 (1986)