
Testing a Single Regression Coefficient in High Dimensional Linear Models

Wei Lan, Ping-Shou Zhong, Runze Li, Hansheng Wang and Chih-Ling Tsai

Technical Report Series #13-125

College of Health and Human Development, The Pennsylvania State University

Testing a Single Regression Coefficient in High Dimensional Linear Models

Wei Lan, Ping-Shou Zhong, Runze Li, Hansheng Wang and Chih-Ling Tsai

November 30, 2013

Abstract

In linear regression models with high dimensional data, the classical z-test (or t-test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Key conFounder Controlling (KFC) method to control for predictors that are highly correlated with the target covariate. Accordingly, the classical ordinary least squares approach can be employed to obtain the parameter estimator. In addition, we demonstrate that the resulting estimator is consistent and asymptotically normal. This enables us to apply the z-test to assess the significance of each covariate. Based on the p-value obtained from testing the significance of each covariate, we further conduct multiple hypothesis testing by controlling the false discovery rate at the nominal level. Then, we show that the multiple hypothesis testing achieves consistent model selection. Simulation studies and empirical examples are presented to illustrate the finite sample performance and the usefulness of the proposed method, respectively.

KEY WORDS: False Discovery Rate; Key Confounder Controlling; High Dimensional Data; Single Coefficient Test

∗Wei Lan is Assistant Professor, Statistics School and Center of Statistical Research, Southwestern University of Finance and Economics, Chengdu, P.R. China. Email: [email protected]. Ping-Shou Zhong is Assistant Professor, Department of Statistics and Probability, Michigan State University, East Lansing, MI 48823. Email: [email protected]. His research was supported by NSF grant DMS-1309156. Runze Li is Distinguished Professor, Department of Statistics and The Methodology Center, The Pennsylvania State University, University Park, PA 16802-2111. Email: [email protected]. His research was supported by National Institute on Drug Abuse (NIDA) grant P50-DA10075 and National Cancer Institute (NCI) grant R01 CA168676. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or NIDA. Hansheng Wang is Professor, Department of Business Statistics and Econometrics, Guanghua School of Management, Peking University, Beijing, 100871, P.R. China. Email: [email protected]. His research was supported in part by the National Natural Science Foundation of China (NSFC, 11131002, 11271032), the Fok Ying Tung Education Foundation, the Business Intelligence Research Center at Peking University, and the Center for Statistical Science at Peking University. Chih-Ling Tsai is Robert W. Glock Chair Professor, Graduate School of Management, University of California, Davis, CA 95616-8609. E-mail: [email protected].


1. INTRODUCTION

In linear regression models, it is a common practice to employ the z-test (or t-test)

to assess whether an individual predictor (or covariate) is significant when the number of

covariates (p) is smaller than the sample size (n). This test has been widely applied across

various fields and is available in most statistical software. One usually applies the ordinary

least squares (OLS) approach to estimate regression coefficients and standard errors for

constructing a z-test (or t-test); see, for example, Draper and Smith (1998). However, the

z-test (or t-test) is no longer applicable in a high dimensional linear model since p exceeds n.

This motivates us to modify the classical z-test (or t-test) to accommodate high dimensional

data.

In high dimensional regression analysis, hypothesis testing has attracted considerable at-

tention (Goeman et al., 2006, 2011; Zhong and Chen, 2011). Since these papers mainly focus

on testing a large set of coefficients against a high dimensional alternative, their approaches

are not applicable for testing the significance of a single coefficient. Hence, Buhlmann (2013)

recently applied the ridge estimation approach and obtained a test statistic to examine the

significance of an individual coefficient. His proposed test involves a bias correction, which

is different from the classical z-test (or t-test) via the OLS approach. In the meantime,

Zhang and Zhang (2013) proposed a low dimensional projection procedure to construct the

confidence intervals for a linear combination of a small subset of regression coefficients. The

key assumption behind their procedure is the existence of good initial estimators for the

unknown regression coefficients and the unknown standard deviation of random errors. To

this end, the penalty function with a tuning parameter is required to implement Zhang and

Zhang’s (2013) procedure. Later, van de Geer et al. (2013) extended the results of Zhang

and Zhang (2013) to broader models and general loss functions. In addition, they showed that

their estimator is optimal and pointed out that the ridge estimator of Buhlmann (2013) is

not optimal.

Instead of the ridge estimation and low dimensional projection, Fan and Lv (2008) used


the correlation approach to screen out those covariates that have weak correlations with the

response variable. As a result, the total number of predictors that are highly correlated

with the response variable is smaller than the sample size. Motivated by their approach,

we introduce the Key conFounder Controlling (KFC) method to control for predictors that

are highly correlated with the target covariate before a hypothesis test is conducted. If

the total number of key confounders resulting from the KFC procedure is smaller than

the sample size, their effects can be profiled out from both the response and the target

predictor via projections. Based on the profiled response and the profiled predictor, we are

able to employ a classical simple regression model to obtain the OLS estimate of the target

regression coefficient. We then demonstrate that the resulting estimator is $\sqrt{n}$-consistent

and asymptotically normal. Accordingly, a z-test statistic can be constructed for testing

the target coefficient. Under some mild conditions, we show that the p-values obtained from the asymptotic normal distribution satisfy the weak dependence assumption of Storey et al.

(2004). As a result, the multiple hypothesis testing procedure of Storey et al. (2004) can be

directly applied to control the false discovery rate (FDR). Finally, we demonstrate that the

proposed multiple testing procedure achieves model selection consistency.

The rest of the article is organized as follows. Section 2 introduces model notation and

proposes the KFC method. The theoretical properties of hypothesis tests via the KFC as well

as the FDR procedures are obtained. Section 3 presents simulation studies, while Section 4

provides real data analyses. Some concluding remarks are given in Section 5. All technical

details are relegated to the Appendix.

2. THE METHODOLOGY

2.1. The KFC Method

Let $(Y_i, X_i)$ be a random vector collected from the $i$-th subject ($1 \le i \le n$), where $Y_i \in \mathbb{R}^1$ is the response variable and $X_i = (X_{i1}, \cdots, X_{ip})^\top \in \mathbb{R}^p$ is the associated $p$-dimensional predictor vector with $E(X_i) = 0$ and $\mathrm{cov}(X_i) = \Sigma = (\sigma_{j_1 j_2}) \in \mathbb{R}^{p \times p}$. Unless explicitly stated otherwise, we hereafter assume that $p \gg n$ and that $n$ tends to infinity for asymptotic behavior. In addition, the predictors have been standardized such that $E(X_{ij}) = 0$ and $\mathrm{var}(X_{ij}) = 1$, which yields $\sigma_{jj} = 1$ for every $1 \le j \le p$. Then, consider the linear regression model,
$$Y_i = X_i^\top \beta + \varepsilon_i, \qquad (2.1)$$
where $\beta = (\beta_1, \cdots, \beta_p)^\top \in \mathbb{R}^p$ is an unknown regression coefficient vector and the $\varepsilon_i$ are i.i.d. random noise with mean 0 and finite variance $\sigma^2$, independent of $X_i$. To assess the significance of a single coefficient, we test the null hypothesis $H_0: \beta_j = 0$ for any given $j$. Without loss of generality, we focus on testing the first regression coefficient. That is,
$$H_0: \beta_1 = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0, \qquad (2.2)$$
and the same testing procedure is applicable to the rest of the individual regression coefficients.

For the sake of convenience, let $Y = (Y_1, \cdots, Y_n)^\top \in \mathbb{R}^n$ be the vector of responses, $X = (X_1, \cdots, X_n)^\top \in \mathbb{R}^{n \times p}$ be the design matrix with $j$-th column $X_j \in \mathbb{R}^n$, and $E = (\varepsilon_1, \cdots, \varepsilon_n)^\top \in \mathbb{R}^n$. In addition, let $I$ be an arbitrary index set with cardinality $|I|$. Then, define $X_{iI} = (X_{ij}: j \in I)^\top \in \mathbb{R}^{|I|}$, $X_I = (X_{1I}, \cdots, X_{nI})^\top = (X_j: j \in I) \in \mathbb{R}^{n \times |I|}$, $\Sigma_I = (\sigma_{j_1 j_2}: j_1 \in I, j_2 \in I) \in \mathbb{R}^{|I| \times |I|}$, and $\Sigma_{Ij} = \Sigma_{jI}^\top = (\sigma_{j_1 j_2}: j_1 \in I, j_2 = j) \in \mathbb{R}^{|I|}$. Moreover, define $\Sigma_{I_a I_b} = (\sigma_{j_1 j_2}: j_1 \in I_a, j_2 \in I_b) \in \mathbb{R}^{|I_a| \times |I_b|}$ for any two arbitrary index sets $I_a$ and $I_b$, which implies $\Sigma_{II} = \Sigma_I$.

Before constructing the test statistic, we first control those predictors that are highly correlated with $X_{i1}$ and could generate a confounding effect. To this end, define a predictor $X_{ij}$ with $j \neq 1$ to be a key confounder of $X_{i1}$ if $|\sigma_{1j}| > c$ for some pre-specified threshold value $c$. Then, denote $S = \{j: |\sigma_{1j}| > c\}$ to be the set of all key confounders associated with $X_{i1}$. The choice of $c$ (i.e., the KFC set $S$) will be discussed in Remark 1. To remove the confounding effect due to $X_{iS}$, we construct the profiled response and predictor as $\tilde{Y} = Q_S Y$ and $\tilde{X}_1 = Q_S X_1$, respectively, where $Q_S = I_n - X_S(X_S^\top X_S)^{-1} X_S^\top \in \mathbb{R}^{n \times n}$ and $I_n \in \mathbb{R}^{n \times n}$ is the $n \times n$ identity matrix. We next follow the OLS approach and obtain the estimate of the target coefficient $\beta_1$,
$$\hat\beta_1 = (\tilde{X}_1^\top \tilde{X}_1)^{-1}(\tilde{X}_1^\top \tilde{Y}) = (X_1^\top Q_S X_1)^{-1}(X_1^\top Q_S Y).$$
We refer to the above procedure as the Key conFounder Controlling (KFC) method, $\hat\beta_1$ as the KFC estimator of $\beta_1$, and $S$ as the KFC set of $X_{i1}$.

2.2. Asymptotic Normality of the KFC Estimator and Test Statistic

To study asymptotic properties of the KFC estimator $\hat\beta_1$, we define $\varrho_{j_1 j_2}(S) = \sigma_{j_1 j_2} - \Sigma_{S j_1}^\top \Sigma_S^{-1} \Sigma_{S j_2} \in \mathbb{R}^1$, which measures the partial covariance of $X_{ij_1}$ and $X_{ij_2}$ after controlling for the effect of $X_{iS} = (X_{ij}: j \in S)^\top \in \mathbb{R}^{|S|}$. The following regularity conditions are assumed throughout this paper.

(C1) Gaussian Condition. Assume that the $X_i$s are independent and normally distributed with mean 0 and covariance matrix $\Sigma$.

(C2) Identification Condition. Let $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ stand for, respectively, the minimum and maximum eigenvalues of an arbitrary semi-positive definite matrix $A$. We then require that $0 < \lambda_{\min} < \lambda_{\min}(\Sigma) \le \lambda_{\max}(\Sigma) < \lambda_{\max} < \infty$ for some constants $\lambda_{\min}$ and $\lambda_{\max}$.

(C3) Predictor Dimension. There exist two positive constants $\hbar < 1$ and $\nu > 0$ such that $\log p \le \nu n^{\hbar}$ for every $n > 0$.

(C4) Regression Coefficients. Assume $|\beta| = \sum_{j=1}^p |\beta_j| < C_\beta$ for some constant $C_\beta > 0$.

(C5) Partial Covariance. There exists a constant $\xi > 3/2$ such that $\max_{j \notin S} |\varrho_{1j}(S)| = O(|S|^{-\xi})$ as $|S| \to \infty$.

(C6) Dimension of the KFC Set. There exist a KFC set $S$ and two positive constants $C_a$ and $C_b$ such that $C_a n^{\nu_1} \le |S| \le C_b n^{\nu_2}$, where $\nu_1$ and $\nu_2$ are two positive constants with $1/(2\xi) < \nu_1 \le \nu_2 < 1/3$ and $\hbar + 3\nu_2 < 1$.

Condition (C1) is a common condition used for high dimensional data to simplify theoretical proofs; see, for example, Wang (2009) and Zhang and Zhang (2013). This condition can be relaxed to sub-Gaussian random variables (Wang, 2012; Li et al., 2012) and our theoretical results still hold. Condition (C2) assures asymptotic identifiability. Otherwise, the regression model (2.1) becomes asymptotically non-identifiable (Fan and Lv, 2008; Wang, 2009, 2012). By Condition (C3), the predictor dimension could diverge exponentially with the sample size, so that the predictor dimension can be much larger than the sample size. Condition (C4) is satisfied when $\beta$ is sparse with only a finite number of nonzero coefficients. Condition (C5) requires that the partial covariance between the target covariate and any other predictor not belonging to $S$, after controlling for the effect of $X_{iS}$ (i.e., the key confounders), converges towards 0 at a fast speed as $|S| \to \infty$. This condition is satisfied for many typical covariance structures, e.g., diagonal, compound symmetric, and autoregressive structures. It is worth noting that, according to (C1), the conditional distribution of $X_{i1}$ given $X_{i,-1} = (X_{i2}, \cdots, X_{ip})^\top \in \mathbb{R}^{p-1}$ remains normal. As a result, there exists a coefficient vector $\theta_{(1)} = (\theta_{(1),1}, \cdots, \theta_{(1),p-1})^\top \in \mathbb{R}^{p-1}$ such that $X_{i1} = X_{i,-1}^\top \theta_{(1)} + e_{i1}$, where $e_{i1}$ is a random error independent of $X_{i,-1}$. Furthermore, under some sparsity and regularity conditions given by Fan and Lv (2008), $S \supset S_\theta = \{j: \theta_{(1),j} \neq 0\}$. This implies that $\max_{j \notin S} |\varrho_{1j}(S)| = 0$. Hence, Condition (C5) is satisfied. Furthermore, this condition is closely related to the assumption given in Theorem 5 of Zhang and Zhang (2013). Lastly, Condition (C6) assures that the theoretical results in the theorem given below are valid even though the size of the KFC set is much smaller than the sample size under Condition (C5). It is a sufficient condition used to guarantee that $\max_{j \notin S} |\varrho_{1j}(S)|$ is of order $o(n^{-1/2})$, so that the bias of $\hat\beta_1$ vanishes. Note that the size of the KFC set in Condition (C6) depends on the rate at which $\max_{j \notin S} |\varrho_{1j}(S)| \to 0$. Thus, Condition (C6) can be dropped if $\theta_{(1)}$ has finitely many non-zero elements or if $\Sigma$ follows an autoregressive structure so that $\max_{j \notin S} |\varrho_{1j}(S)| = O\{\exp(-\zeta |S|^\eta)\}$ for some positive constants $\zeta$ and $\eta$. Under the above conditions, we obtain the following result.

Theorem 1. Assume that Conditions (C1)–(C6) hold. We then have $n^{1/2}(\hat\beta_1 - \beta_1) \to_d N(0, \sigma_{\beta_1}^2)$, where $\sigma_{\beta_1}^2 = \tau_{\beta_1}^2(\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1})^{-1}$, $\tau_{\beta_1}^2 = \beta_{S^*}^\top(\Sigma_{S^*} - \Sigma_{S^*S^+}\Sigma_{S^+}^{-1}\Sigma_{S^+S^*})\beta_{S^*} + \sigma^2$, $S^+ = \{1\} \cup S$, and $S^* = \{j : j \notin S^+\}$.

Using the results of two lemmas in Appendix A, we are able to prove the above theorem; see the detailed proofs in Appendix B. By Theorem 1, we construct the test statistic,
$$Z_1 = n^{1/2}\hat\beta_1/\hat\sigma_{\beta_1}, \qquad (2.3)$$
where $\hat\sigma_{\beta_1}^2 = \hat\tau_{\beta_1}^2(n^{-1}X_1^\top Q_S X_1)^{-1}$, $\hat\tau_{\beta_1}^2 = (n - |S^+|)^{-1}\hat{E}_{S^+}^\top\hat{E}_{S^+}$, $\hat{E}_{S^+}$ is the residual vector obtained by regressing $Y_i$ on $X_{iS^+}$, and $X_{iS^+} = (X_{ij} : j \in S^+)^\top$. Applying similar techniques to those used in the proof of Theorem 1, we can verify that $\hat\sigma_{\beta_1}^2$ is a consistent estimator of $\sigma_{\beta_1}^2$. As a result, $Z_1$ is asymptotically standard normal under $H_0$, and one can reject the null hypothesis if $|Z_1| > z_{1-\alpha/2}$, where $z_\alpha$ stands for the $\alpha$-th quantile of the standard normal distribution. Note that if $p < n$ and $S = \{j : j \neq 1\}$, the test statistic $Z_1$ is the same as the classical z-test statistic.
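To make the construction concrete, the sketch below (ours, not the authors' code) assembles the test statistic $Z_1$ in (2.3) and its two-sided p-value; it assumes the KFC set $S$ is given and repeats the projection step from the previous snippet.

```python
import numpy as np
from scipy import stats

def kfc_z_test(X, Y, S):
    """Z-statistic and p-value for H0: beta_1 = 0 via the KFC estimator.
    A sketch under the paper's notation; S is the KFC set of predictor 1."""
    n = len(Y)
    Splus = [0] + list(S)                       # S+ = {1} U S (0-based here)
    XSp = X[:, Splus]
    resid = Y - XSp @ np.linalg.lstsq(XSp, Y, rcond=None)[0]
    tau2 = resid @ resid / (n - len(Splus))     # residual variance estimate
    X1, XS = X[:, 0], X[:, S]
    QS = np.eye(n) - XS @ np.linalg.solve(XS.T @ XS, XS.T)
    beta1 = (X1 @ QS @ Y) / (X1 @ QS @ X1)
    sigma2 = tau2 / (X1 @ QS @ X1 / n)          # estimate of sigma^2_{beta_1}
    z = np.sqrt(n) * beta1 / np.sqrt(sigma2)
    return z, 2 * (1 - stats.norm.cdf(abs(z)))
```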

Remark 1. To make the testing procedure practically useful, one needs to select the KFC set $S$. According to Condition (C1), the random vector $X_i$ follows a multivariate normal distribution. As a result, $\max_{j \notin S^+} |\varrho_{1j}(S)| = 0$ if and only if $X_{i1}$ and $X_{ij}$ are conditionally independent given $X_{iS}$. This indicates that finding the KFC set is equivalent to identifying nodes (variables) that have edges connected to the variable $X_{i1}$ in a Gaussian graph (see Kalisch and Buhlmann (2007)). Hence, we adopt their approach to find the edges associated with $X_{i1}$. To this end, let $\hat\varrho_{1j}(S) = \tilde{X}_1^\top \tilde{X}_j/n$, with $\tilde{X}_j = Q_S X_j$, be the sample counterpart of $\varrho_{1j}(S)$, and define $\hat{F}_{1j}(S) = 2^{-1}\log[\{1 + \hat\varrho_{1j}(S)\}/\{1 - \hat\varrho_{1j}(S)\}]$, which is in the spirit of Fisher's Z-transformation. Then, we select the smallest size of $S$ (i.e., the largest threshold value $c$) such that $(n - |S| - 3)^{1/2}\max_{j \notin S^+} |\hat{F}_{1j}(S)| < z_{1-\gamma/2}$, where $\gamma$ is a pre-specified significance level.
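One plausible implementation of this selection rule is sketched below. The stopping criterion follows Remark 1; the greedy ordering of candidate confounders by marginal correlation with $X_1$ is our own assumption, since the remark only fixes the thresholding rule.

```python
import numpy as np
from scipy import stats

def select_kfc_set(X, gamma=0.05):
    """Sketch of Remark 1: grow the KFC set S of predictor 1 until the largest
    Fisher-transformed partial correlation (given X_S) is insignificant.
    Candidates are ranked by marginal correlation with X_1 (our assumption)."""
    n, p = X.shape
    marg = np.abs(np.corrcoef(X, rowvar=False)[0, :])
    marg[0] = -1.0                                   # exclude the target itself
    order = list(np.argsort(-marg))                  # strongest confounders first
    z_cut = stats.norm.ppf(1 - gamma / 2)
    S = []
    while len(S) < n - 4:
        QS = np.eye(n)
        if S:
            XS = X[:, S]
            QS -= XS @ np.linalg.solve(XS.T @ XS, XS.T)
        Xt = QS @ X                                  # profiled predictors
        x1 = Xt[:, 0]
        rest = [j for j in range(1, p) if j not in S]
        rho = Xt[:, rest].T @ x1 / n
        rho /= np.sqrt(x1 @ x1 / n) * np.sqrt((Xt[:, rest] ** 2).sum(0) / n)
        F = 0.5 * np.log((1 + rho) / (1 - rho))
        if np.sqrt(n - len(S) - 3) * np.max(np.abs(F)) < z_cut:
            break
        S.append(order[len(S)])                      # add the next candidate
    return S
```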

Remark 2. We use a single screening approach to obtain the KFC set of the target covariate, which yields the KFC estimator of the target regression coefficient. On the other hand, Zhang and Zhang (2013) employed the scaled lasso procedure (Sun and Zhang, 2012) to obtain the initial estimators of all regression coefficients and the scale parameter estimator. They then applied the classical lasso procedure to find the low dimensional projection vector. In sum, Zhang and Zhang (2013) applied the lasso approach to find the low dimensional projection estimator (LDPE) for the target regression coefficient. When $X$ has orthogonal columns and $p < n$, both approaches lead to the same parameter estimator as that obtained from the marginal univariate regression (MUR). However, these two approaches are quite different, and it seems nearly impossible to find the exact relationship between the KFC estimator and the LDPE when the columns of $X$ are not orthogonal.

2.3. Controlling the False Discovery Rate (FDR)

In identifying significant coefficients among the high dimensional regression coefficients $\beta_j$ ($j = 1, \cdots, p$), a multiple testing procedure can be considered by testing the hypotheses $H_{0j}: \beta_j = 0$ simultaneously. Denote the p-value obtained by testing each individual null hypothesis $H_{0j}$ as $p_j = 2\{1 - \Phi(|Z_j|)\}$, where $Z_j$ is the test statistic, constructed in the same manner as in equation (2.3). To guard against false discoveries, we next develop a procedure to control the false discovery rate (Benjamini and Hochberg, 1995).

Let $N_0 = \{j: \beta_j = 0\}$ be the set of variables whose associated coefficients are truly zero and $N_1 = \{j: \beta_j \neq 0\}$ be the set of variables whose associated coefficients are truly nonzero. For any significance level $t \in [0, 1]$, let $V(t) = \#\{j \in N_0: p_j \le t\}$ be the number of falsely rejected hypotheses, $S(t) = \#\{j \in N_1: p_j \le t\}$ be the number of correctly rejected hypotheses, and $R(t) = \#\{j: p_j \le t\}$ be the total number of rejected hypotheses. We adopt the approach of Storey et al. (2004) to implement the multiple testing procedure, which is less conservative than the method of Benjamini and Hochberg (1995) and is applicable under a general weak dependence structure (Storey et al., 2004). To this end, define $\mathrm{FDP}(t) = V(t)/\{R(t) \vee 1\}$ and $\mathrm{FDR}(t) = E[V(t)/\{R(t) \vee 1\}]$, where $R(t) \vee 1 = \max\{R(t), 1\}$. Then, the estimator proposed by Storey (2002) is
$$\widehat{\mathrm{FDR}}_\lambda(t) = \frac{\hat\pi_0(\lambda)\, t}{\{R(t) \vee 1\}/p}, \qquad (2.4)$$
where $\hat\pi_0(\lambda) = \{(1-\lambda)p\}^{-1}\{p - R(\lambda)\}$ is an estimate of $\pi_0 = p_0/p$ and $\lambda \in [0, 1)$ is a tuning parameter. Then, for any pre-specified significance level $q$ and a fixed $\lambda$, consider the cutoff point chosen by the thresholding rule, $t_q(\widehat{\mathrm{FDR}}_\lambda) = \sup\{0 \le t \le 1: \widehat{\mathrm{FDR}}_\lambda(t) \le q\}$. We reject the null hypotheses for those p-values that are less than or equal to $t_q(\widehat{\mathrm{FDR}}_\lambda)$.
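A compact sketch of the estimator (2.4) and the thresholding rule follows (our illustration; it evaluates $\widehat{\mathrm{FDR}}_\lambda$ only at the observed p-values, which determines the same rejection set since $R(t)$ changes only at those points):

```python
import numpy as np

def storey_threshold(pvals, q=0.05, lam=0.5):
    """Storey (2002) plug-in FDR estimate (2.4) and the cutoff t_q:
    the largest p-value t with estimated FDR at most q (a sketch)."""
    p = len(pvals)
    pi0 = (p - np.sum(pvals <= lam)) / ((1 - lam) * p)    # estimate of p0/p
    ts = np.sort(pvals)                                   # candidate cutoffs
    R = np.arange(1, p + 1)                               # R(t) at each cutoff
    fdr_hat = pi0 * ts / (np.maximum(R, 1) / p)
    ok = fdr_hat <= q
    return ts[ok].max() if ok.any() else 0.0              # reject p_j <= t_q

# usage: t = storey_threshold(pvals); rejected = np.where(pvals <= t)[0]
```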

To study the theoretical property of $\widehat{\mathrm{FDR}}_\lambda(t)$, we begin by introducing some notation and definitions. Let $p_0 = |N_0|$ be the number of true null hypotheses and $p$ be the total number of hypotheses. In addition, let $T_{1,n}(t) = p^{-1}\sum_{j=1}^p P(p_j \le t)$ be the average rejection probability across all hypotheses. Define $S_j$ to be the KFC set of covariate $X_{ij}$. We next demonstrate that $t_q(\widehat{\mathrm{FDR}}_\lambda)$ asymptotically provides strong control of the FDR at the pre-specified nominal level $q$.

Theorem 2. Assume that $p_0/p \to 1$ as $p$ goes to infinity, $\lim_{n\to\infty} T_{1,n}(t) = T_1(t)$, and, for any $k \in N_0$, $\sum_{j \in N_0}\sum_{l \in S_j} \sigma^2_{lk} = o(p/\Lambda_0^2)$, where $T_1(t)$ is a continuous function and $\Lambda_0 = \max\{\max_{j \in N_0} |S_j|, |N_1|\}$. Under Conditions (C1)–(C6), we have
$$\limsup_{n \to \infty}\mathrm{FDR}\{t_q(\widehat{\mathrm{FDR}}_\lambda)\} \le q.$$

The proof is given in Appendix C. In general, the dependencies among the test statistics $Z_j$ become stronger as the overlap among the KFC sets increases. To control the dependence of the test statistics, the covariates must therefore be more weakly dependent as the size of the overlap increases. Accordingly, the condition $\sum_{j \in N_0}\sum_{l \in S_j} \sigma^2_{lk} = o(p/\Lambda_0^2)$ for any $k \in N_0$ in Theorem 2 controls the overall dependence between the covariates in the union of the KFC sets $\cup_{l \in N_0} S_l$ and $X_{ik}$, for any fixed $k \in N_0$; see Fan et al. (2012) for a similar condition on dependence. In addition, $\Lambda_0$ provides an upper bound on the size of the overlap among the KFC sets. Assume that $\Sigma$ follows an autoregressive structure such that the $|S_j|$ are small compared with $n$. This, together with Condition (C2), implies that $\sum_j \sigma^2_{jk} \le \lambda_{\max}^2$ for any $1 \le k \le p$. Hence, the above condition is satisfied.

2.4. Model Selection Consistency

According to Theorem 2, for any given significance level $q > 0$, the FDR can be controlled asymptotically by setting the threshold at $t = t_q(\widehat{\mathrm{FDR}}_\lambda)$. This result motivates us to further investigate model selection consistency by letting $q \to 0$. In fact, model selection consistency in high dimensional linear models has been intensively studied in the variable selection literature. There is a large body of papers discussing model selection consistency via the penalized likelihood approach (e.g., Meinshausen and Buhlmann, 2006; Zhao and Yu, 2007; Huang et al., 2007). However, the use of p-values for model selection has not received considerable attention. Some exceptions include Bunea et al. (2006), who considered variable selection consistency using p-values under the condition $p = o(n^{1/2})$, and Meinshausen et al. (2009), who investigated the consistency of a two-step procedure involving screening followed by a multiple test procedure. It is worth noting that the p-value obtained in Meinshausen et al. (2009) is not designed for assessing the significance of a single coefficient. The aim of this section is to study model selection consistency using the p-values obtained from the Z-tests in Section 2.2.

For any given nominal level $\alpha_n$, let $\hat{N}_1^{\alpha_n} = \{j: p_j \le \alpha_n\}$ be an estimate of $N_1$, the set containing all the variables whose associated coefficients are truly nonzero. Assume that $\alpha_n \to 0$ as $n \to \infty$. By Theorem 2, the probability of obtaining false discoveries is $P\{|\hat{N}_1^{\alpha_n} \cap N_0| \ge 1\} \le \alpha_n \to 0$, which implies that $P\{\hat{N}_1^{\alpha_n} \subset N_1\} \to 1$. Thus, this procedure requires a sure screening property $P\{N_1 \subset \hat{N}_1^{\alpha_n}\} \to 1$ to obtain model selection consistency. Before demonstrating this property, two additional assumptions are given below.

(C7) There exist two positive constants $\kappa$ and $C_\kappa$ such that $\min_{j \in N_1} |\beta_j| > C_\kappa n^{-\kappa}$ with $\kappa + \hbar < 1/2$, where $\hbar$ is defined in (C3).

(C8) There exists some positive constant $C_e$ such that for any $\ell > 0$ and $1 \le j \le p$, $P(n^{-1}|X_j^\top E| > \ell) \le \exp(-C_e n \ell^2)$.

Condition (C7) is a minimum signal assumption, and similar conditions are commonly considered in the variable screening literature (Fan and Lv, 2008; Wang, 2009). For Condition (C8), assume further that the random errors $\varepsilon_i$ are independent and normally distributed. Using the fact that $n^{-1}\|X_j\|^2 \to 1$, $n^{-1/2}X_j^\top E$ follows a normal distribution with finite variance for $j = 1, \cdots, p$, and hence Condition (C8) is satisfied. The above conditions, together with Conditions (C1) to (C6), lead to the following result.

Theorem 3. Under Conditions (C1)–(C8), there exists a sequence of significance levels $\alpha_n \to 0$ such that $P\{\hat{N}_1^{\alpha_n} = N_1\} \to 1$.

The proof of Theorem 3 is given in Appendix D. According to the proof, one can select $\alpha_n$ at the level $\alpha_n = 2\{1 - \Phi(n^\iota)\}$ with $\hbar < \iota < 1/2 - \kappa$. This selection implies that $p\,\alpha_n/\log(p) \to 0$ as $n \to \infty$, which is similar to assumption (Cq) in Bunea et al. (2006). Compared with the penalized likelihood method, the proposed testing procedure is able to control the false discovery rate and the family-wise error rate for the given $\alpha_n$. This is especially important in the finite sample case; see Meinshausen et al. (2009) for a detailed discussion.

3. SIMULATION STUDIES

To demonstrate the finite sample performance of the proposed methods, we consider three simulation studies with different covariance patterns and distributions among the predictors. Each simulation includes three different sample sizes ($n$ = 100, 200, 500) and two different dimensions of predictors ($p$ = 500 and 1,000). All simulation results presented in this section are based on 1,000 realizations. The nominal level $\alpha$ of the KFC test and the significance level $q$ of the FDR are both set to 5%. Moreover, to determine the KFC set for each predictor, three different significance levels were considered ($\gamma$ = 0.01, 0.05, and 0.10); since the results were similar, we only report the case with $\gamma$ = 0.05.

To study the significance of each individual regression coefficient, consider the proposed test statistic $Z_{rj}$ for testing the $j$-th coefficient in the $r$-th simulation, where $j = 1, \cdots, p$ and $r = 1, \cdots, 1000$. Then, define the indicator $I_{rj} = I(|Z_{rj}| > z_{1-\alpha/2})$ and compute the empirical rejection probability (ERP) for the $j$-th coefficient test, $\mathrm{ERP}_j = 1000^{-1}\sum_{r=1}^{1000} I_{rj}$. As a result, $\mathrm{ERP}_j$ is the empirical size under the null hypothesis $H_{0j}: \beta_j = 0$, while it is the empirical power under the alternative hypothesis. Subsequently, define the average empirical size (ES) and the average empirical power (EP) as $\mathrm{ES} = |N_0|^{-1}\sum_{j \in N_0}\mathrm{ERP}_j$ and $\mathrm{EP} = |N_1|^{-1}\sum_{j \in N_1}\mathrm{ERP}_j$, respectively. Accordingly, ES and EP provide overall measures for assessing the performance of the single coefficient test. Based on the p-values of the $Z_{rj}$ tests, we next employ the multiple testing procedure of Storey et al. (2004) to study the performance of multiple tests via the empirical FDR discussed in Section 2.3. It is worth noting that we adopt the commonly used tuning parameter $\lambda = 1/2$ in the first two examples; its robustness is evaluated in Example 3. To assess model selection consistency, we examine the average true rate $\mathrm{TR} = |\hat{N}_1^{\alpha} \cap N_1|/|N_1|$ and the average false rate $\mathrm{FR} = |\hat{N}_1^{\alpha} \cap N_0|/|N_0|$. When the true model can be identified consistently, TR and FR should approach 1 and 0, respectively, as the sample size gets large. For the sake of comparison, we also examine the marginal univariate regression (MUR) test (i.e., the classical t-test obtained from the marginal univariate regression model) in the Monte Carlo studies; a sketch of these summary measures appears below.
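For concreteness, the following sketch (our illustration) computes ES, EP, TR, and FR from a matrix of simulated test statistics, where the set of truly nonzero coefficients is supplied by the user:

```python
import numpy as np
from scipy import stats

def summarize(Z, N1, alpha=0.05):
    """Monte Carlo summaries from Z[r, j], the statistic for coefficient j
    in replication r (a sketch; N1 holds the truly nonzero indices)."""
    reps, p = Z.shape
    N0 = np.setdiff1d(np.arange(p), N1)
    rej = np.abs(Z) > stats.norm.ppf(1 - alpha / 2)     # indicators I_{rj}
    erp = rej.mean(axis=0)                              # ERP_j over replications
    ES, EP = erp[N0].mean(), erp[N1].mean()
    TR = rej[:, N1].mean()                              # avg |N1_hat ∩ N1|/|N1|
    FR = rej[:, N0].mean()                              # avg |N1_hat ∩ N0|/|N0|
    return ES, EP, TR, FR
```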

Example 1: Autocorrelated Predictors

Consider a linear regression model with autocorrelated predictors $X_i$ generated from a multivariate normal distribution with mean 0 and covariance $\Sigma = (\sigma_{j_1 j_2}) \in \mathbb{R}^{p \times p}$, where $\sigma_{j_1 j_2} = 0.5^{|j_1 - j_2|}$. Although different predictors are correlated with each other, the correlation decreases to 0 as the distance $|j_1 - j_2|$ between $X_{ij_1}$ and $X_{ij_2}$ increases. The regression coefficient vector $\beta$ is such that $\beta_1 = 3$, $\beta_4 = 1.5$, $\beta_7 = 2$, and $\beta_j = 0$ for any $j \notin \{1, 4, 7\}$; the variance of $\varepsilon$ is $\sigma^2 = 1$.
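The data-generating process of this example can be reproduced as follows (a sketch; the seed and function name are ours):

```python
import numpy as np

def generate_example1(n=100, p=500, rho=0.5, seed=0):
    """Simulate Example 1: AR(1)-correlated Gaussian predictors with
    beta_1 = 3, beta_4 = 1.5, beta_7 = 2, all other coefficients zero."""
    rng = np.random.default_rng(seed)
    Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    beta = np.zeros(p)
    beta[[0, 3, 6]] = [3.0, 1.5, 2.0]            # 0-based positions 1, 4, 7
    Y = X @ beta + rng.standard_normal(n)        # noise with sigma^2 = 1
    return X, Y, beta
```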

Table 1 shows that KFC is slightly better than MUR in controlling the size, while MUR has larger power than KFC. After closely examining MUR's performance, however, we find that its ES can be misleading. For example, $X_{i2} \in N_0$ is moderately correlated with the non-zero predictor $X_{i1} \in N_1$. As a result, the empirical size for testing $H_0: \beta_2 = 0$ obtained from MUR can be as large as 0.90; that is, MUR falsely rejects this hypothesis in most realizations. On the other hand, most predictors in $N_0$ are nearly independent of the predictors in $N_1$ and the response variable. Accordingly, MUR can have a reasonable average empirical size and a high average true rate TR. This misleading result can be detected by its empirical false discovery rate, which is much greater than the nominal level. Moreover, the empirical FDR gets larger as the sample size increases, and this phenomenon is also observed in the average false rate FR. Therefore, the MUR approach should be used with caution when testing a single coefficient, conducting multiple hypothesis tests, or selecting variables.

In contrast to MUR, the KFC approach not only controls the size well, but also yields an empirical FDR that approaches the nominal level as the sample size increases. Furthermore, the average TR increases to 1 and the average FR decreases to 0, which is consistent with the theoretical findings. In sum, KFC performs well for testing a single coefficient, and the resulting p-values are reliable for multiple hypothesis testing and model selection.

Example 2: Equally Correlated Predictors

Consider a model with equally correlated predictors $X_i$ generated from a multivariate normal distribution with mean 0 and a compound symmetric covariance matrix $\Sigma = (\sigma_{j_1 j_2}) \in \mathbb{R}^{p \times p}$, where $\sigma_{j_1 j_2} = 1$ if $j_1 = j_2$ and $\sigma_{j_1 j_2} = 0.5$ for any $j_1 \neq j_2$. In addition, the regression coefficients are set as follows: $\beta_j = 5$ for $1 \le j \le 3$ and $\beta_j = 0$ for $j > 3$, and the variance of $\varepsilon$ is $\sigma^2 = 1$.

Table 2 presents the results of ES, EP, FDR, TR, and FR for both KFC and MUR

methods. It indicates that MUR performs poorly in terms of both ES and FDR measures.

This finding is not surprising because every predictor in N0 is equally correlated with those

predictors in N1. As a result, the marginal correlation between any predictor in N0 and the

response variable is well bounded away from 0. Thus, MUR’s empirical rejection probability

is close to 100%, which leads to highly inflated ES and FDR. Furthermore, FR tends to 1 as

the sample size increases, which implies that MUR tends to over-reject the null hypothesis

in large samples. On the other hand, the ES and FDR of KFC are close to the nominal level.

Moreover, TR and EP increase to 1 and FR decreases to 0 as the sample size gets larger.

The above results are supported by the theoretical findings and demonstrate that KFC is reliable.

From the above two simulation studies, we find that the FDR plays an important role in examining the reliability of the test statistics. Hence, we next study the accuracy of the FDR estimator discussed in Section 2.3. Since we are interested in the statistical behavior of the number of false discoveries $V(t)$, we follow Fan et al.'s (2012) suggestion and compare $\widehat{\mathrm{FDR}}_\lambda(t)$ in (2.4), with $\lambda = 1/2$, to $\mathrm{FDP}(t)$ calculated via $V(t)/\{R(t) \vee 1\}$. For the sake of illustration, we consider the same simulation settings as given in Examples 1 and 2 with $n = 100$ and $p = 500$. Panels A and B in Figure 1 depict $\widehat{\mathrm{FDR}}_\lambda(t)$ and $\mathrm{FDP}(t)$, obtained via the KFC method for Examples 1 and 2, respectively, across various $t$ values. In contrast, Panels C and D are calculated via the MUR approach. Figure 1 clearly shows that $\widehat{\mathrm{FDR}}_\lambda(t)$ calculated from the p-values of KFC is reliable and consistent with the theoretical finding in Theorem 2. However, MUR does not provide accurate estimates of the FDP, and it should be used with caution in high dimensional data analysis.

Example 3: Robustness to the Covariate Distribution and the λ Parameter

In the first two examples, the covariate vectors $X_i$ were generated from a multivariate normal distribution and we set $\lambda = 1/2$. To assess the robustness of KFC against the covariate distribution and $\lambda$, we conduct simulation studies for various values of $\lambda$ and three distributions of $X_i = \Sigma^{1/2} Z_i$, where each element of $Z_i$ is randomly generated from the standard normal distribution, the standardized exponential distribution $\exp(1)$, or the normal mixture distribution $0.1\,N(0, 3^2) + 0.9\,N(0, 1)$, respectively, for $i = 1, \cdots, n$, and the $\Sigma$s are defined as in Examples 1 and 2. Since all results are qualitatively similar, we only report the case where $\lambda = 0.1$ and $Z_i$ follows the standardized exponential distribution. The results in Tables 3 and 4 show findings similar to those in Tables 1 and 2, respectively. Hence, the Monte Carlo studies indicate that the KFC approach is robust against the covariate distribution and the tuning parameter $\lambda$.

4. REAL DATA ANALYSIS

To illustrate the usefulness of the proposed method, we consider two empirical examples.


The first example studies supermarket data in China (Wang, 2009), in which p (6,398) is

much larger than n (464). The second example analyzes a corn spectroscopic data set, with

p = 700 and n = 80, that has been intensively studied by Lee et al. (2011).

4.1. Supermarket Data

This data set contains a total of $n = 464$ daily records. For each record, the response variable ($Y_i$) is the number of customers and the predictors ($X_{i1}, \cdots, X_{ip}$) are the sales volumes of $p = 6{,}398$ products. Consider a linear relationship between $Y_i$ and $X_i = (X_{i1}, \cdots, X_{ip})^\top \in \mathbb{R}^p$, given by $Y_i = X_i^\top\beta + \varepsilon_i$, where both the response and the predictors are standardized so that they have zero sample mean and unit sample variance.

To identify important products (predictors) that are associated with $Y_i$, we apply the proposed KFC and MUR methods to test the significance of each regression coefficient, namely $H_{0j}: \beta_j = 0$ vs. $H_{1j}: \beta_j \neq 0$. If the underlying coefficient structure is highly sparse, the number of predictors in $N_1$ is very small. This implies that most of the p-values are computed for the case where the null hypothesis is true, so the asymptotic distribution of those p-values should be Uniform[0, 1]. This, together with the small number of p-values calculated for $H_{0j}$ with $j \in N_1$, would yield a very flat histogram except for a bump around the point 0. Figure 2 shows the histogram of the p-values for testing $H_{0j}$ ($j = 1, \cdots, p$) using the proposed KFC method. As one can observe, the pattern indicates that most of the $H_{0j}$ are true and the p-values are asymptotically valid. There were 1,426 p-values less than the significance level $\alpha = 5\%$. After controlling the false discovery rate via the method of Storey et al. (2004) at the level $q = 5\%$, the number of hypotheses $H_{0j}$ rejected drops to 132. In other words, we have identified the 132 most important products, on which the supermarket decision maker (or manager) might perform further analysis. In contrast, the histogram of the p-values generated from the MUR tests in Figure 2 shows a heavily skewed pattern with a very thin tail, which suggests that most of its p-values are very small. MUR rejected a total of 5,648 hypotheses $H_{0j}$ after controlling the FDR at the level $q = 5\%$. In sum, KFC provides a simple and efficient approach to identifying important products from high dimensional data, while MUR fails to accomplish this.


4.2. Corn Spectroscopic Data

The data consist of 80 samples ($n = 80$) of corn measured on 3 different near-infrared (NIR) spectrometers (m5spec, mp5spec and mp6spec). Each spectrum contains 700 channels measured at wavelengths ranging from 1100 nm to 2498 nm at 2 nm intervals. The response variables of interest are the moisture, oil, protein and starch values in each sample. In this study, we only consider moisture as the response variable ($Y_i \in \mathbb{R}^1$), and the covariates ($X_i \in \mathbb{R}^p$) are measured on the m5spec spectrometer, with $p = 700$. Following Lee et al.'s (2011) study, we assume a linear relationship between $Y_i$ and $X_i$, namely $Y_i = X_i^\top\beta + \varepsilon_i$ for some independent random error $\varepsilon_i$. The main goal of this analysis is to identify important variables that are associated with $Y_i$.

We apply the proposed KFC and MUR methods to test the significance of each regression coefficient, namely $H_{0j}: \beta_j = 0$ vs. $H_{1j}: \beta_j \neq 0$. Figure 3 depicts the two histograms of the p-values computed via the KFC and MUR methods, respectively. It indicates that, for the KFC method, most of the $H_{0j}$s are valid and the histogram of the p-values is flat. Specifically, 95 variables were significant at the significance level $\alpha = 5\%$. After controlling the false discovery rate at the level $q = 5\%$, the number of hypotheses $H_{0j}$ rejected decreases to 6. In contrast, for the MUR method, the histogram of the p-values is extremely skewed with very thin tails. This finding is not surprising since the covariates in the model are highly multicollinear, as observed by Lee et al. (2011). In addition, all of the variables remain significant even after controlling the false discovery rate. This implies that the information obtained from the MUR method is quite limited, and the resulting p-values might be invalid. In sum, the KFC method effectively identifies the 6 channels most significantly associated with the moisture of the 80 corn samples.

The above two examples indicate that the KFC method not only provides a simple and efficient approach to computing the p-value for testing a single coefficient in a high dimensional linear model, but also yields reliable p-values for multiple hypothesis testing.


5. DISCUSSION

In linear regression models with high dimensional data, we propose a single screening procedure, Key conFounder Controlling (KFC), to control for predictors that are highly correlated with the target covariate. This allows us to employ the classical ordinary least squares approach to obtain the parameter estimator. We then demonstrate that the resulting estimator is asymptotically normal. Accordingly, we extend the classical t-test (or z-test) for a single coefficient to high dimensional data. Based on the p-values obtained from testing the significance of each covariate, multiple hypothesis testing is conducted by controlling the false discovery rate at the nominal level. Moreover, we show that the multiple hypothesis testing leads to consistent model selection.

The proposed KFC method can be extended to testing a small subset of regression coefficients. Consider the hypothesis:
$$H_0: \beta_{\mathcal{M}} = 0 \quad \text{vs.} \quad H_1: \beta_{\mathcal{M}} \neq 0, \qquad (5.1)$$
where $\mathcal{M}$ is a pre-specified index set with a fixed size and $\beta_{\mathcal{M}} = (\beta_j: j \in \mathcal{M})^\top \in \mathbb{R}^{|\mathcal{M}|}$ is the subvector of $\beta$ corresponding to $\mathcal{M}$. Without loss of generality, we assume that $\mathcal{M} = \{j: 1 \le j \le |\mathcal{M}|\}$ and $1 < |\mathcal{M}| \ll n$. Then, define an overall KFC set of $\mathcal{M}$ as $S_{\mathcal{M}} = \cup_{j \in \mathcal{M}} S_j$, where $S_j$ is the KFC set of the $j$-th predictor in $\mathcal{M}$. Accordingly, the target parameter vector $\beta_{\mathcal{M}}$ can be estimated by $\hat\beta_{\mathcal{M}} = (X_{\mathcal{M}}^\top Q_{S_{\mathcal{M}}} X_{\mathcal{M}})^{-1}(X_{\mathcal{M}}^\top Q_{S_{\mathcal{M}}} Y)$, where $X_{\mathcal{M}} = (X_j: j \in \mathcal{M}) \in \mathbb{R}^{n \times |\mathcal{M}|}$. Applying similar techniques to those used in the proof of Theorem 1, we can show that $n^{1/2}(\hat\beta_{\mathcal{M}} - \beta_{\mathcal{M}}) \to_d N(0, \Sigma_\beta)$, where $\Sigma_\beta = \sigma^2_{\mathcal{M}}(\Sigma_{\mathcal{M}} - \Sigma_{S_{\mathcal{M}}\mathcal{M}}^\top\Sigma_{S_{\mathcal{M}}}^{-1}\Sigma_{S_{\mathcal{M}}\mathcal{M}})^{-1}$ with $\sigma^2_{\mathcal{M}} = \beta_{S^*_{\mathcal{M}}}^\top(\Sigma_{S^*_{\mathcal{M}}} - \Sigma_{S^*_{\mathcal{M}}S^+_{\mathcal{M}}}\Sigma_{S^+_{\mathcal{M}}}^{-1}\Sigma_{S^+_{\mathcal{M}}S^*_{\mathcal{M}}})\beta_{S^*_{\mathcal{M}}} + \sigma^2$, $S^+_{\mathcal{M}} = \cup_{j \in \mathcal{M}} S^+_j$, and $S^*_{\mathcal{M}} = \{j: j \notin S^+_{\mathcal{M}}\}$. Consequently, an $F$-type test statistic can be constructed to test (5.1).
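A sketch of this extension is given below. The estimator $\hat\beta_{\mathcal{M}}$ follows the display above; calibrating the Wald-type statistic against a $\chi^2_{|\mathcal{M}|}$ distribution is our own simplification of the $F$-type construction, and the KFC sets are assumed to be given.

```python
import numpy as np
from scipy import stats

def kfc_subset_test(X, Y, M, S_M):
    """Sketch of the subset extension (5.1): Wald-type statistic for
    H0: beta_M = 0, profiling out the union KFC set S_M (assumed given).
    The chi-square calibration is our assumption, not the paper's."""
    n = len(Y)
    XM, XS = X[:, M], X[:, S_M]
    Q = np.eye(n) - XS @ np.linalg.solve(XS.T @ XS, XS.T)
    A = XM.T @ Q @ XM
    beta_M = np.linalg.solve(A, XM.T @ Q @ Y)
    df = n - len(M) - len(S_M)
    resid = Q @ (Y - XM @ beta_M)
    sigma2 = resid @ resid / df                  # residual variance estimate
    W = beta_M @ A @ beta_M / sigma2             # Wald-type statistic
    pval = 1 - stats.chi2.cdf(W, df=len(M))
    return W, pval
```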

To broaden the usefulness of the proposed method, we conclude the article by discussing two possible research avenues. Firstly, from the model aspect, it would be practically useful to extend the KFC method to generalized linear models, single index models, partial linear models, and survival models. Secondly, from the data aspect, it is important to generalize the proposed KFC method to accommodate categorical explanatory variables, repeated measurements, and missing observations. We believe these extensions would enhance the usefulness of KFC in high dimensional data analysis.

TECHNICAL APPENDIX

Appendix A. Four Useful Lemmas

Before proving the theoretical results, we present the following four lemmas which are

needed in the proofs. The first lemma is directly borrowed from Lemma A.3 of Bickel and

Levina (2008), and the second lemma can be found in Bendat and Piersol (1966). As a

result, we only verify the third and fourth lemmas.

Lemma 1. Let $\hat\sigma_{j_1 j_2} = n^{-1}\sum_i X_{ij_1} X_{ij_2}$ and assume that Condition (C1) holds. Then, there exist three positive constants $\zeta_0 > 0$, $C_1 > 0$, and $C_2 > 0$ such that $P(|\hat\sigma_{j_1 j_2} - \sigma_{j_1 j_2}| > \zeta) \le C_1\exp(-C_2 n\zeta^2)$ for any $0 < \zeta < \zeta_0$ and every $1 \le j_1, j_2 \le p$.

Lemma 2. Let $(U_1, U_2, U_3, U_4)^\top \in \mathbb{R}^4$ be a 4-dimensional normal random vector with $E(U_j) = 0$ and $\mathrm{var}(U_j) = 1$ for $1 \le j \le 4$. We then have $E(U_1 U_2 U_3 U_4) = \delta_{12}\delta_{34} + \delta_{13}\delta_{24} + \delta_{14}\delta_{23}$, where $\delta_{ij} = E(U_i U_j)$.

Lemma 3. Assume that Conditions (C1)–(C3) hold, and $m = O(n^{\nu_2})$ for some positive constant $\nu_2$ satisfying $3\nu_2 + \hbar < 1$, where $\hbar$ is given in (C3). Then, $\max_{|S| \le m}|n^{-1}X_1^\top Q_S X_1 - (\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1})| \to_p 0$.

Proof. Since $n^{-1}X_1^\top Q_S X_1 = n^{-1}X_1^\top X_1 - (n^{-1}X_1^\top X_S)(n^{-1}X_S^\top X_S)^{-1}(n^{-1}X_S^\top X_1)$ and $n^{-1}X_1^\top X_1 \to_p \sigma_{11}$, it suffices to show that
$$\max_{|S| \le m}\bigl|(n^{-1}X_1^\top X_S)(n^{-1}X_S^\top X_S)^{-1}(n^{-1}X_S^\top X_1) - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1}\bigr| \to_p 0. \qquad (A.1)$$
Denote $\|A\| = \{\mathrm{tr}(AA^\top)\}^{1/2}$ for any arbitrary matrix $A$. Then, we obtain (A.1) if the three uniform convergence results given below can be proved:
$$\max_{|S| \le m}\|n^{-1}X_S^\top X_1 - \Sigma_{S1}\| = o_p(1), \qquad (A.2)$$
$$\max_{|S| \le m}\|n^{-1}X_S^\top X_1\| < 2\lambda_{\max}, \qquad (A.3)$$
$$\max_{|S| \le m}\|n^{-1}X_S^\top X_S - \Sigma_S\| = o_p(1). \qquad (A.4)$$
By Condition (C2), one can easily verify that $\|\Sigma_{S1}\|^2 \le \lambda_{\max}^2$. This, in conjunction with (A.2), implies (A.3). Accordingly, it suffices to demonstrate (A.2) and (A.4).

It is noteworthy that, for any $S$ satisfying $|S| \le m$, we have
$$\|n^{-1}X_S^\top X_1 - \Sigma_{S1}\| = \Bigl\{\sum_{j \in S}(\hat\sigma_{1j} - \sigma_{1j})^2\Bigr\}^{1/2} \le m^{1/2}\max_{j \in S}|\hat\sigma_{1j} - \sigma_{1j}|.$$
This, together with the Bonferroni inequality, Condition (C1), Lemma 1, and the fact that $\#\{S \subset \{1, \cdots, p\}: |S| \le m\} \le p^m$, implies
$$P\Bigl(\max_{|S| \le m}\|n^{-1}X_S^\top X_1 - \Sigma_{S1}\| > \varepsilon\Bigr) \le \sum_{|S| \le m} P\bigl(\|n^{-1}X_S^\top X_1 - \Sigma_{S1}\| > \varepsilon\bigr) \le \sum_{|S| \le m} P\Bigl(\max_{j \in S}|\hat\sigma_{1j} - \sigma_{1j}| > \varepsilon/m^{1/2}\Bigr) \le \sum_{|S| \le m}\sum_{j \in S} P\bigl(|\hat\sigma_{1j} - \sigma_{1j}| > \varepsilon/m^{1/2}\bigr) \le p^m m C_1\exp(-C_2 n m^{-1}\varepsilon^2) = C_1\exp\bigl(-C_2 n m^{-1}\varepsilon^2 + \log m + m\log p\bigr). \qquad (A.5)$$
Furthermore, by the assumption of Lemma 3 ($m = O(n^{\nu_2})$) and Condition (C3) ($\log p \le \nu n^{\hbar}$), we have $m\log p = O(n^{\nu_2 + \hbar})$. Moreover, using the assumption of Lemma 3 again ($3\nu_2 + \hbar < 1$), the right-hand side of (A.5) converges towards 0 as $n \to \infty$. Hence, we have proved (A.2). Applying similar techniques to those used in the proof of (A.2), we can also demonstrate (A.4). This completes the entire proof.

Lemma 4. Assume that (a) $\lim_{p\to\infty} V(t)/p_0 = G_0(t)$ and $\lim_{p\to\infty} S(t)/(p - p_0) = G_1(t)$, where $G_0(t)$ and $G_1(t)$ are continuous functions; (b) $0 < G_0(t) \le t$ for $t \in (0, 1]$; (c) $\lim_{p\to\infty} p_0/p = 1$. Then, we have $\limsup_{p\to\infty}\mathrm{FDR}\{t_\alpha(\widehat{\mathrm{FDR}}_\lambda)\} \le \alpha$.

Proof. By slightly modifying the proof of Theorem 4 in Storey et al. (2004), we can demonstrate the result. The detailed proof can be obtained upon request from the authors.

Appendix B. Proof of Theorem 1

Let $T_1 = (X_1^\top Q_S X_1)^{-1}X_1^\top Q_S E$ and $T_2 = (X_1^\top Q_S X_1)^{-1}X_1^\top Q_S X_{S^*}\beta_{S^*}$. Then, $\hat\beta_1 - \beta_1 = T_1 + T_2$. Using the fact that $E(\varepsilon_i \mid X_i) = 0$, one can show that $\mathrm{cov}(T_1, T_2) = E(T_1 T_2) - E(T_1)E(T_2) = 0$; therefore, $T_1$ and $T_2$ are uncorrelated. To prove the theorem, it hence suffices to show that $(\sqrt{n}\,T_1, \sqrt{n}\,T_2)^\top$ is asymptotically bivariate normal. By Conditions (C1)–(C3) and Lemma 3, we obtain that $\|n^{-1}X_1^\top X_S - \Sigma_{1S}\| \to_p 0$, $\|n^{-1}X_S^\top X_S - \Sigma_S\| \to_p 0$, and $\max_S|n^{-1}X_1^\top Q_S X_1 - (\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1})| \to_p 0$. Accordingly, we have
$$\sqrt{n}\,T_1 = \{1 + o_p(1)\}\,(\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1})^{-1}\bigl(n^{-1/2}X_1^\top E - n^{-1/2}\Sigma_{1S}\Sigma_S^{-1}X_S^\top E\bigr) = \{1 + o_p(1)\}\,(\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1})^{-1}\Bigl\{n^{-1/2}\sum_{i=1}^n(X_{i1} - \Sigma_{1S}\Sigma_S^{-1}X_{iS})\varepsilon_i\Bigr\}.$$
Applying the same arguments as those given above, we also obtain that
$$\sqrt{n}\,T_2 = \{1 + o_p(1)\}\,(\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1})^{-1}\Bigl\{n^{-1/2}\sum_{i=1}^n(X_{i1} - \Sigma_{1S}\Sigma_S^{-1}X_{iS})X_{iS^*}^\top\beta_{S^*}\Bigr\}.$$
Let $\xi_{i1} = (X_{i1} - \Sigma_{1S}\Sigma_S^{-1}X_{iS})\varepsilon_i$ and $\delta_{i1} = (X_{i1} - \Sigma_{1S}\Sigma_S^{-1}X_{iS})X_{iS^*}^\top\beta_{S^*}$. Then, it can be shown that $E(\xi_{i1}) = 0$, $\mathrm{var}(\xi_{i1}) = \sigma^2(\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1})$, and $E(\delta_{i1}) = (\Sigma_{1S^*} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{SS^*})\beta_{S^*}$. Using Conditions (C4)–(C6), we further obtain that $\sqrt{n}\,|(\Sigma_{1S^*} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{SS^*})\beta_{S^*}| < C_\beta \times \sqrt{n}\max_{j \notin S^+}|\varrho_{1j}(S)| \to 0$. Moreover,
$$\mathrm{var}(\delta_{i1}) \to E(\delta_{i1}^2) = E\bigl\{(X_{i1} - \Sigma_{1S}\Sigma_S^{-1}X_{iS})^2\,\beta_{S^*}^\top X_{iS^*}X_{iS^*}^\top\beta_{S^*}\bigr\} = E\bigl[(X_{i1} - \Sigma_{1S}\Sigma_S^{-1}X_{iS})^2\,\beta_{S^*}^\top E\{X_{iS^*}X_{iS^*}^\top \mid X_{iS^+}\}\beta_{S^*}\bigr] \to (\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1})\,\beta_{S^*}^\top(\Sigma_{S^*} - \Sigma_{S^*S^+}\Sigma_{S^+}^{-1}\Sigma_{S^+S^*})\beta_{S^*}.$$
The bivariate Central Limit Theorem, together with the above results, implies that
$$(\sqrt{n}\,T_1, \sqrt{n}\,T_2)^\top = \{1 + o_p(1)\}\,(\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1})^{-1}\Bigl\{n^{-1/2}\sum_{i=1}^n(\xi_{i1}, \delta_{i1})^\top\Bigr\}$$
is asymptotically bivariate normal with mean zero and diagonal covariance matrix $V = \mathrm{Diag}(V_{ii})$, where $V_{11} = \sigma^2(\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1})^{-1}$ and $V_{22} = (\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1})^{-1}\beta_{S^*}^\top(\Sigma_{S^*} - \Sigma_{S^*S^+}\Sigma_{S^+}^{-1}\Sigma_{S^+S^*})\beta_{S^*}$. Consequently, $\sqrt{n}(T_1 + T_2)$ is asymptotically normal with mean zero and variance $V_{11} + V_{22}$, which completes the proof.

Appendix C. Proof of Theorem 2

We mainly apply Lemma 4 to prove Theorem 2. To this end, we need to show the following two results as $p \to \infty$:
$$p^{-1}\sum_{j=1}^p I(p_j \le t) - T_{1,n}(t) \to 0 \quad \text{a.s.}, \qquad (A.6)$$
$$p_0^{-1}\sum_{j \in N_0} I(p_j \le t) - G_{0,n}(t) \to 0 \quad \text{a.s.}, \qquad (A.7)$$
where $G_{0,n}(t) = p_0^{-1}\sum_{j \in N_0} P(p_j \le t)$. Since the proofs of (A.6) and (A.7) are quite similar, we only verify (A.6). By the law of large numbers, it is enough to show that
$$\mathrm{var}\Bigl\{p^{-1}\sum_{j=1}^p I(p_j \le t)\Bigr\} = O(p^{-\delta}) \quad \text{for some } \delta > 0. \qquad (A.8)$$
It is worth noting that the left-hand side of (A.8) equals
$$\mathrm{var}\Bigl\{p^{-1}\sum_{j=1}^p I(|Z_j| \ge z_{1-t/2})\Bigr\} = \mathrm{var}\Bigl\{p^{-1}\sum_{j \in N_0} I(|Z_j| \ge z_{1-t/2})\Bigr\} + \mathrm{var}\Bigl\{p^{-1}\sum_{j \in N_1} I(|Z_j| \ge z_{1-t/2})\Bigr\} + 2p^{-2}\sum_{j_1 \in N_0}\sum_{j_2 \in N_1}\mathrm{cov}\bigl\{I(|Z_{j_1}| \ge z_{1-t/2}),\, I(|Z_{j_2}| \ge z_{1-t/2})\bigr\} := J_1 + J_2 + 2J_3. \qquad (A.9)$$

Using the fact that $\mathrm{var}\{I(|Z_j| \ge z_{1-t/2})\} \le E\{I(|Z_j| \ge z_{1-t/2})\} \le 1$ and the assumption that $p_0/p \to 1$, together with the Cauchy-Schwarz inequality, we have $J_2 \le p^{-2}|N_1|\sum_{j \in N_1}\mathrm{var}\{I(|Z_j| \ge z_{1-t/2})\} \le (p - p_0)^2/p^2 \to 0$. In addition, applying the Cauchy-Schwarz inequality, we obtain
$$J_3^2 \le \mathrm{var}\Bigl\{p^{-1}\sum_{j \in N_0} I(|Z_j| \ge z_{1-t/2})\Bigr\}\,\mathrm{var}\Bigl\{p^{-1}\sum_{j \in N_1} I(|Z_j| \ge z_{1-t/2})\Bigr\} \le (p - p_0)^2 p_0^2/p^4 \to 0.$$

Accordingly, to prove (A.8), we only need to show that $J_1 = O(p^{-\delta})$ for some $\delta > 0$. It can be seen that
$$J_1 = p^{-2}\sum_{j \in N_0}\mathrm{var}\bigl\{I(|Z_j| \ge z_{1-t/2})\bigr\} + p^{-2}\sum_{\substack{j_1 \neq j_2 \\ j_1, j_2 \in N_0}}\mathrm{cov}\bigl\{I(|Z_{j_1}| \ge z_{1-t/2}),\, I(|Z_{j_2}| \ge z_{1-t/2})\bigr\} := J_{11} + J_{12}.$$
Since $J_{11} \le p_0/p^2 \to 0$ as $p \to \infty$, it suffices to show that $J_{12} = O(p^{-\delta})$ for some $\delta > 0$. Note that
$$\mathrm{cov}\bigl\{I(|Z_{j_1}| \ge z_{1-t/2}),\, I(|Z_{j_2}| \ge z_{1-t/2})\bigr\} := I_1 + I_2 + I_3 + I_4,$$
where $I_1 = E\{I(Z_{j_1} \ge z_{1-t/2})I(Z_{j_2} \ge z_{1-t/2})\} - E\{I(Z_{j_1} \ge z_{1-t/2})\}E\{I(Z_{j_2} \ge z_{1-t/2})\}$, $I_2 = E\{I(Z_{j_1} \ge z_{1-t/2})I(Z_{j_2} \le -z_{1-t/2})\} - E\{I(Z_{j_1} \ge z_{1-t/2})\}E\{I(Z_{j_2} \le -z_{1-t/2})\}$, $I_3 = E\{I(Z_{j_1} \le -z_{1-t/2})I(Z_{j_2} \ge z_{1-t/2})\} - E\{I(Z_{j_1} \le -z_{1-t/2})\}E\{I(Z_{j_2} \ge z_{1-t/2})\}$, and $I_4 = E\{I(Z_{j_1} \le -z_{1-t/2})I(Z_{j_2} \le -z_{1-t/2})\} - E\{I(Z_{j_1} \le -z_{1-t/2})\}E\{I(Z_{j_2} \le -z_{1-t/2})\}$. Since the proofs for $I_1$ to $I_4$ are essentially the same, we only focus on $I_1$.

Applying the asymptotic expansion of $Z_j$ given in the proof of Theorem 1, we have
$$Z_j = \frac{\sqrt{n}\,\hat\beta_j}{\hat\sigma_{\beta_j}} = \frac{\sqrt{n}\,\beta_j}{\sigma_{\beta_j}} + n^{-1/2}\sum_{i=1}^n u_{ij} + o_p(1), \qquad (A.10)$$
where $u_{ij} = \sigma_{\beta_j}^{-1}(\delta_{ij} + \xi_{ij})$, $\delta_{ij} = (X_{ij} - \Sigma_{jS_j}\Sigma_{S_j}^{-1}X_{iS_j})X_{iS_j^*}^\top\beta_{S_j^*}$, $\xi_{ij} = (X_{ij} - \Sigma_{jS_j}\Sigma_{S_j}^{-1}X_{iS_j})\varepsilon_i$, and $\sigma_{\beta_j}^2 = (\sigma_{jj} - \Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jj})\{\beta_{S_j^*}^\top(\Sigma_{S_j^*} - \Sigma_{S_j^*S_j^+}\Sigma_{S_j^+}^{-1}\Sigma_{S_j^+S_j^*})\beta_{S_j^*} + \sigma^2\}$. As a result, for any $j \in N_0$, $Z_j = n^{-1/2}\sum_{i=1}^n u_{ij} + o_p(1)$, so that $Z_j$ can be expressed as a sum of independent and identically distributed (i.i.d.) random variables $u_{ij}$. In addition, Condition (C1) implies that $u_{ij}$ has an exponential tail. This, together with the bivariate large deviation result of Zhong et al. (2013), leads to
$$E\bigl\{I(Z_{j_1} \ge z_{1-t/2})I(Z_{j_2} \ge z_{1-t/2})\bigr\} = U(z_{1-t/2}, z_{1-t/2}; \rho_{j_1 j_2})\{1 + o(1)\},$$
where $\rho_{j_1 j_2} = \mathrm{corr}(u_{ij_1}, u_{ij_2})$ and
$$U(a, b; \rho) = \{2\pi(1 - \rho^2)^{1/2}\}^{-1}\int_a^\infty\int_b^\infty\exp\Bigl\{-\frac{1}{2(1 - \rho^2)}(y_1^2 + y_2^2 - 2\rho y_1 y_2)\Bigr\}\,dy_1\,dy_2.$$

Without loss of generality, we assume that $\rho_{j_1 j_2} = \mathrm{corr}(u_{ij_1}, u_{ij_2}) > 0$ and $z_{1-t/2} > 0$. Then, by using the inequality in Willink (2004), we have
$$\bar\Phi(z_{1-t/2})\bar\Phi(\zeta z_{1-t/2}) \le U(z_{1-t/2}, z_{1-t/2}; \rho_{j_1 j_2}) \le \bar\Phi(z_{1-t/2})\bar\Phi(\zeta z_{1-t/2})(1 + \rho_{j_1 j_2}), \qquad (A.11)$$
where $\zeta = \{(1 - \rho_{j_1 j_2})/(1 + \rho_{j_1 j_2})\}^{1/2}$ and $\bar\Phi(\cdot) = 1 - \Phi(\cdot)$. Accordingly, we obtain that
$$\bar\Phi(z_{1-t/2})\bigl\{\bar\Phi(\zeta z_{1-t/2}) - \bar\Phi(z_{1-t/2})\bigr\} \le I_1 \le \bar\Phi(z_{1-t/2})\bigl\{\bar\Phi(\zeta z_{1-t/2})(1 + \rho_{j_1 j_2}) - \bar\Phi(z_{1-t/2})\bigr\}.$$
After algebraic simplification, $(1 - \zeta)/\rho_{j_1 j_2} \to 1$ as $\rho_{j_1 j_2} \to 0$. Hence, $I_1/(C_I\rho_{j_1 j_2}) \to 1$ for a positive constant $C_I$, which implies that $\mathrm{cov}\{I(|Z_{j_1}| \ge z_{1-t/2}),\, I(|Z_{j_2}| \ge z_{1-t/2})\} \approx C_I|\rho_{j_1 j_2}|$. Consequently, if $\sum_{j \in N_0}|\rho_{jk}| = o(p)$ for any $k \in N_0$, then $J_{12} = O(p^{-\delta})$ for some $\delta > 0$.

To complete the proof, we next verify the above condition $\sum_{j \in N_0}|\rho_{jk}| = o(p)$ for any $k \in N_0$. By the Cauchy-Schwarz inequality, it suffices to show that $\sum_{j \in N_0}\rho_{jk}^2 = o(p)$. Conditions (C2) and (C4) yield that $\sigma_{jj} - \Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jj} \le \lambda_{\min}^{-1}$ uniformly and that $\beta_{S_j^*}^\top(\Sigma_{S_j^*} - \Sigma_{S_j^*S_j^+}\Sigma_{S_j^+}^{-1}\Sigma_{S_j^+S_j^*})\beta_{S_j^*}$ is bounded, respectively, for any $j \in N_0$. Hence, $\max_j\sigma_{\beta_j}^2 < \infty$. In addition, (A.10) implies $\mathrm{var}(u_{ij}) = 1$. As a result, we only need to demonstrate that $\sum_{j \in N_0}v_{jk}^2 = o(p)$, where $v_{jk} = \mathrm{cov}(u_{ij}, u_{ik})$.

It can be shown that $v_{jk} = v_{jk,1} + v_{jk,2}$, where $v_{jk,1} = \mathrm{cov}(\xi_{ij}, \xi_{ik})$, $v_{jk,2} = \mathrm{cov}(\delta_{ij}, \delta_{ik})$, and $\xi_{ij}$ and $\delta_{ij}$ are defined after equation (A.10). Hence, to complete the proof, it suffices to show the following results:
$$p^{-1}\sum_{j \in N_0}v_{jk,1}^2 = o(1) \quad \text{and} \quad p^{-1}\sum_{j \in N_0}v_{jk,2}^2 = o(1). \qquad (A.12)$$

We begin by proving the first equation of (A.12). Applying the Cauchy-Schwarz inequality, it can be shown that
$$v_{jk,1}^2 = \sigma^4\bigl(\sigma_{jk} - \Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jk} - \Sigma_{jS_k}\Sigma_{S_k}^{-1}\Sigma_{S_kk} + \Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jS_k}\Sigma_{S_k}^{-1}\Sigma_{S_kk}\bigr)^2 \le 4\sigma^4\bigl\{\sigma_{jk}^2 + (\Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jk})^2 + (\Sigma_{jS_k}\Sigma_{S_k}^{-1}\Sigma_{S_kk})^2 + (\Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jS_k}\Sigma_{S_k}^{-1}\Sigma_{S_kk})^2\bigr\}.$$
We then study the above four components separately. According to Condition (C2), one can easily demonstrate that $\sum_{j \in N_0}\sigma_{jk}^2 \le \lambda_{\max}^2$, which leads to $p^{-1}\sum_{j \in N_0}\sigma_{jk}^2 \le p^{-1}\lambda_{\max}^2 = o(1)$. In addition, we have
$$(\Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jk})^2 = \Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jk}\Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jk} \le \mathrm{tr}\bigl(\Sigma_{S_j}^{-1}\Sigma_{S_jj}\Sigma_{jS_j}\Sigma_{S_j}^{-1}\bigr)\|\Sigma_{S_jk}\|^2 = \Sigma_{jS_j}\Sigma_{S_j}^{-2}\Sigma_{S_jj}\|\Sigma_{S_jk}\|^2 \le \lambda_{\min}^{-2}\|\Sigma_{jS_j}\|^2\|\Sigma_{S_jk}\|^2.$$
This, together with the assumption of Theorem 2 and Condition (C2), leads to
$$p^{-1}\sum_{j \in N_0}(\Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jk})^2 \le p^{-1}\sum_{j \in N_0}\lambda_{\min}^{-2}\|\Sigma_{jS_j}\|^2\|\Sigma_{S_jk}\|^2 \le \lambda_{\min}^{-2}\lambda_{\max}^2\,p^{-1}\sum_{j \in N_0}\|\Sigma_{S_jk}\|^2 = \lambda_{\min}^{-2}\lambda_{\max}^2\,p^{-1}\sum_{j \in N_0}\sum_{l \in S_j}\sigma_{lk}^2 = o(1).$$
Employing similar techniques, we can show that $p^{-1}\sum_j(\Sigma_{jS_k}\Sigma_{S_k}^{-1}\Sigma_{S_kk})^2 = o(1)$ and $p^{-1}\sum_j(\Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jS_k}\Sigma_{S_k}^{-1}\Sigma_{S_kk})^2 = o(1)$. The above results complete the proof of the first equation in (A.12).

We next prove the second equation of (A.12). According to the result in the proof of Theorem 1, $E(\delta_{ij}) \le C_\delta\max_{l \in S_j^*}|\varrho_{jl}(S_j)| = o(1)$ for some positive constant $C_\delta$. It follows that $p^{-1}\sum_{j \in N_0}E^2(\delta_{ij})E^2(\delta_{ik}) = o(1)$; hence, we only need to show that $p^{-1}\sum_{j \in N_0}E^2(\delta_{ij}\delta_{ik}) = o(1)$. After algebraic simplification, we obtain that
$$E(\delta_{ij}\delta_{ik}) = E\bigl(X_{ij}X_{iS_j^*}^\top\beta_{S_j^*}X_{ik}X_{iS_k^*}^\top\beta_{S_k^*}\bigr) - E\bigl(\Sigma_{jS_j}\Sigma_{S_j}^{-1}X_{iS_j}X_{iS_j^*}^\top\beta_{S_j^*}X_{ik}X_{iS_k^*}^\top\beta_{S_k^*}\bigr) - E\bigl(X_{ij}X_{iS_j^*}^\top\beta_{S_j^*}\,\Sigma_{kS_k}\Sigma_{S_k}^{-1}X_{iS_k}X_{iS_k^*}^\top\beta_{S_k^*}\bigr) + E\bigl(\Sigma_{jS_j}\Sigma_{S_j}^{-1}X_{iS_j}X_{iS_j^*}^\top\beta_{S_j^*}\,\Sigma_{kS_k}\Sigma_{S_k}^{-1}X_{iS_k}X_{iS_k^*}^\top\beta_{S_k^*}\bigr) := Q_{1j} - Q_{2j} - Q_{3j} + Q_{4j}.$$
For the sake of simplicity, we suppress the subscript $i$ in the rest of the proof.

For any fixed $k \in N_0$, let $Q_1 := \sum_{j \in N_0}Q_{1j}^2$, which can be bounded as follows:
$$Q_1 \le C_Q\Lambda_\beta\sum_{j \in N_0}\sum_{l \in S_j^*\cap N_1}\sum_{u \in S_k^*\cap N_1}E^2(X_jX_kX_lX_u)|\beta_l||\beta_u| = C_Q\Lambda_\beta\sum_{j \in N_0}\sum_{u \in S_j^*\cap S_k^*\cap N_1}E^2(X_jX_kX_u^2)\beta_u^2 + C_Q\Lambda_\beta\sum_{j \in N_0}\sum_{l \in S_j^*\cap N_1}\sum_{u \in S_k^*\cap N_1,\,u \neq l}E^2(X_jX_kX_lX_u)|\beta_l||\beta_u| := C_Q\Lambda_\beta Q_{11} + C_Q\Lambda_\beta Q_{12},$$
where $C_Q$ is a positive constant and $\Lambda_\beta = \max_{j,k \in N_0}\sum_{l \in S_j^*\cap N_1}\sum_{u \in S_k^*\cap N_1}|\beta_l||\beta_u|$.

By Lemma 2, $E(X_jX_kX_u^2) = \sigma_{jk}\sigma_{uu} + 2\sigma_{ju}\sigma_{ku}$. Then, using the fact that $\sum_{j \in N_0}\sigma_{ju}^2 < \infty$ for any $u \in \{1, \cdots, p\}$, we have $\sum_{u \in N_1}\sum_{j \in N_0}\sigma_{ju}^2 = O(|N_1|) = o(p)$. This, in conjunction with Condition (C4), leads to, for any $k \in N_0$,
$$Q_{11} \le \sum_{j \in N_0}\sum_{u \in S_j^*\cap S_k^*\cap N_1}\bigl(2\sigma_{jk}^2\sigma_{uu}^2\beta_u^2 + 8\sigma_{ju}^2\sigma_{ku}^2\beta_u^2\bigr) = o(p).$$
Furthermore, by Lemma 2, for any distinct indices $j \neq k \neq u \neq l$, we have $E(X_jX_kX_lX_u) = \sigma_{jk}\sigma_{lu} + \sigma_{jl}\sigma_{ku} + \sigma_{ju}\sigma_{kl}$. After algebraic simplification, we obtain that
$$Q_{12} \le C_{12}\sum_{j \in N_0}\sum_{l \in S_j^*\cap N_1}\sum_{u \in S_k^*\cap N_1,\,u \neq l}\bigl(\sigma_{jk}^2\sigma_{lu}^2|\beta_l\beta_u| + \sigma_{jl}^2\sigma_{ku}^2|\beta_l\beta_u| + \sigma_{ju}^2\sigma_{kl}^2|\beta_l\beta_u|\bigr)$$
for some positive constant $C_{12} > 0$. This, together with the fact that $\sum_{q \in N_1}\sum_{j \in N_0}\sigma_{jq}^2 = o(p)$ and Condition (C4), yields $Q_{12} = o(p)$. Moreover, Condition (C4) implies that $\Lambda_\beta < C_\beta^2 < \infty$. Combining the above results, we finally have $Q_1 = o(p)$.

We subsequently study the behavior of $Q_4 = \sum_{j \in N_0}Q_{4j}^2$. Let $\Sigma_{jS_j}\Sigma_{S_j}^{-1} := (d_{j1}, \cdots, d_{j|S_j|})$. By Condition (C2), the $d_{jl}$ are bounded constants with $\sum_l d_{jl}^2 < \infty$. We then apply Lemma 2 and obtain that
$$Q_{4j} = \sum_{\substack{q_1 \in S_j,\,q_2 \in S_j^*\cap N_1 \\ l_1 \in S_k,\,l_2 \in S_k^*\cap N_1}}d_{jq_1}d_{kl_1}\beta_{q_2}\beta_{l_2}E(X_{q_1}X_{q_2}X_{l_1}X_{l_2}) = \sum_{\substack{q_1 \in S_j,\,q_2 \in S_j^*\cap N_1 \\ l_1 \in S_k,\,l_2 \in S_k^*\cap N_1}}d_{jq_1}d_{kl_1}\beta_{q_2}\beta_{l_2}\bigl(\sigma_{q_1q_2}\sigma_{l_1l_2} + \sigma_{q_1l_1}\sigma_{q_2l_2} + \sigma_{q_1l_2}\sigma_{q_2l_1}\bigr).$$
By the Cauchy-Schwarz inequality, we have
$$Q_{4j}^2 \le \sum_{\substack{q_1 \in S_j,\,q_2 \in S_j^*\cap N_1 \\ l_1 \in S_k,\,l_2 \in S_k^*\cap N_1}}d_{jq_1}^2d_{kl_1}^2\beta_{q_2}^2\beta_{l_2}^2 \times \sum_{\substack{q_1 \in S_j,\,q_2 \in S_j^*\cap N_1 \\ l_1 \in S_k,\,l_2 \in S_k^*\cap N_1}}\bigl(\sigma_{q_1q_2}\sigma_{l_1l_2} + \sigma_{q_1l_1}\sigma_{q_2l_2} + \sigma_{q_1l_2}\sigma_{q_2l_1}\bigr)^2 := Q_{41,j} \times Q_{42,j}.$$
Conditions (C2) and (C4) imply that
$$Q_{41,j} \le \sum_{q_1 \in S_j}d_{jq_1}^2\sum_{q_2 \in S_j^*}\beta_{q_2}^2\sum_{l_1 \in S_k}d_{kl_1}^2\sum_{l_2 \in S_k^*}\beta_{l_2}^2 < \infty.$$
In addition, by the Cauchy-Schwarz inequality, we obtain
$$Q_{42,j} \le 3\sum_{\substack{q_1 \in S_j,\,q_2 \in S_j^*\cap N_1 \\ l_1 \in S_k,\,l_2 \in S_k^*\cap N_1}}\bigl(\sigma_{q_1q_2}^2\sigma_{l_1l_2}^2 + \sigma_{q_1l_1}^2\sigma_{q_2l_2}^2 + \sigma_{q_1l_2}^2\sigma_{q_2l_1}^2\bigr) := 3(Q_{421,j} + Q_{422,j} + Q_{423,j}).$$

Let $Q_{421} = \sum_{j \in N_0}Q_{421,j}$. After algebraic simplification, we have that
$$Q_{421} = \sum_{j \in N_0}\sum_{\substack{q_1 \in S_j,\,q_2 \in S_j^*\cap N_1 \\ l_1 \in S_k,\,l_2 \in S_k^*\cap N_1}}\sigma_{q_1q_2}^2\sigma_{l_1l_2}^2 = \sum_{j \in N_0}\Bigl(\sum_{q_1 \in S_j,\,q_2 \in S_j^*\cap N_1}\sigma_{q_1q_2}^2\Bigr)\Bigl(\sum_{l_1 \in S_k,\,l_2 \in S_k^*\cap N_1}\sigma_{l_1l_2}^2\Bigr).$$
By Condition (C2), we obtain that $\sum_{l_1 \in S_k,\,l_2 \in S_k^*\cap N_1}\sigma_{l_1l_2}^2 \le \sum_{l_2 \in S_k^*\cap N_1}\sum_{l_1 \in S_k}\sigma_{l_1l_2}^2 = O(|N_1|)$. In addition, the assumption of Theorem 2 leads to
$$\sum_{j \in N_0}\sum_{q_1 \in S_j,\,q_2 \in S_j^*\cap N_1}\sigma_{q_1q_2}^2 \le \sum_{q_2 \in N_1}\sum_{j \in N_0}\sum_{q_1 \in S_j}\sigma_{q_1q_2}^2 = \sum_{q_2 \in N_1}o\bigl(p/|N_1|^2\bigr) = o\bigl(p/|N_1|\bigr).$$
Combining the above results, we obtain that $Q_{421} = o(p)$. Analogously, we can demonstrate that $Q_{423} = \sum_{j \in N_0}Q_{423,j} = o(p)$.

Subsequently, let $Q_{422} = \sum_{j \in N_0}Q_{422,j}$, which can be simplified as
$$Q_{422} = \sum_{j \in N_0}\sum_{\substack{q_1 \in S_j,\,q_2 \in S_j^*\cap N_1 \\ l_1 \in S_k,\,l_2 \in S_k^*\cap N_1}}\sigma_{q_1l_1}^2\sigma_{q_2l_2}^2 = \sum_{j \in N_0}\Bigl(\sum_{q_1 \in S_j,\,l_1 \in S_k}\sigma_{q_1l_1}^2\Bigr)\Bigl(\sum_{q_2 \in S_j^*\cap N_1,\,l_2 \in S_k^*\cap N_1}\sigma_{q_2l_2}^2\Bigr).$$
By Condition (C2), we obtain that $\sum_{q_2 \in S_j^*\cap N_1,\,l_2 \in S_k^*\cap N_1}\sigma_{q_2l_2}^2 \le \sum_{q_2 \in N_1}\sum_{l_2 \in N_1}\sigma_{q_2l_2}^2 = O(|N_1|)$. In addition, the assumption of Theorem 2 leads to
$$\sum_{j \in N_0}\sum_{q_1 \in S_j,\,l_1 \in S_k}\sigma_{q_1l_1}^2 = \sum_{l_1 \in S_k}\sum_{j \in N_0}\sum_{q_1 \in S_j}\sigma_{q_1l_1}^2 \le \max_k|S_k| \cdot o\bigl(p/\Lambda_0^2\bigr) = o\bigl(p/\Lambda_0\bigr).$$
As a result, since $|N_1| \le \Lambda_0$, we have $Q_{422} = o(p)$. This, together with $Q_{421} = o(p)$ and $Q_{423} = o(p)$, yields $Q_{42} = o(p)$. In sum, we obtain $Q_4 = o(p)$. Employing similar techniques to those used in the proofs of $Q_1 = o(p)$ and $Q_4 = o(p)$, we are able to show that $Q_2 = o(p)$ and $Q_3 = o(p)$. Consequently, we complete the proof of the second equation in (A.12), which implies that (A.6) holds.

It is worth noting that $G_{0,n}(t) \to t$. This, in conjunction with (A.6), (A.7), and the assumptions that $T_{1,n}(t) \to T_1(t)$ with $T_1(t)$ continuous and $p_0/p \to 1$ as $p \to \infty$, indicates that Conditions (a), (b), and (c) in Lemma 4 hold. Accordingly, the proof of Theorem 2 is complete.

Appendix D. Proof of Theorem 3

Let $Z_j = n^{1/2}\hat\beta_j/\hat\sigma_{\beta_j}$ be the test statistic and $p_j$ be the corresponding p-value for $j = 1, \cdots, p$. Define $\alpha_n = 2\{1 - \Phi(n^\iota)\}$ for some $\hbar < \iota < 1/2 - \kappa$, where $\hbar$ and $\kappa$ are given in (C3) and (C7); hence, $\alpha_n \to 0$ as $n \to \infty$. To prove the theorem, it suffices to show that
$$\lim_{n\to\infty}P\{V(\alpha_n) > 0\} = 0 \quad \text{and} \quad \lim_{n\to\infty}P\{S(\alpha_n)/(p - p_0) = 1\} = 1.$$
It is worth noting that $n \to \infty$ implicitly implies $p \to \infty$. We demonstrate the above two equations in the following two steps.

STEP I. We show that $P\{V(\alpha_n) > 0\} \to 0$. Using the fact that $\tau_{\beta_j}^2 \ge \sigma^2$ for $1 \le j \le p$, we have
$$|Z_j| = n^{1/2}\bigl|(X_j^\top Q_{S_j}X_j)^{-1}X_j^\top Q_{S_j}E + (X_j^\top Q_{S_j}X_j)^{-1}X_j^\top Q_{S_j}X_{S_j^*}\beta_{S_j^*}\bigr|\big/\hat\sigma_{\beta_j} \le \sigma^{-1}\bigl|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}E\bigr| + \sigma^{-1}\bigl|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}X_{S_j^*}\beta_{S_j^*}\bigr|.$$
This, together with Bonferroni's inequality and the fact that $z_{1-\alpha_n/2} = n^\iota$, leads to
$$P\{V(\alpha_n) > 0\} = P\Bigl(\max_{j \in N_0}|Z_j| > z_{1-\alpha_n/2}\Bigr) \le P\Bigl(\max_{j \in N_0}\bigl|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}E/\sigma\bigr| > n^\iota/2\Bigr) + P\Bigl(\max_{j \in N_0}\bigl|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}X_{S_j^*}\beta_{S_j^*}\bigr| > \sigma n^\iota/2\Bigr). \qquad (A.13)$$
Consider the quantity $|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}E/\sigma|$ in the first term of the right-hand side of the above equation. Employing the same technique as used in the proof of Lemma 3, we obtain that $\max_j|n^{-1}X_j^\top Q_{S_j}X_j - (\sigma_{jj} - \Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jj})| = o_p(1)$. By Condition (C2), one can easily verify that $\sigma_{jj} - \Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jj} \ge \lambda_{\max}^{-1}$. The above two results lead to $n^{-1}X_j^\top Q_{S_j}X_j \ge \lambda_{\max}^{-1}$ uniformly for any $j$. In addition, (A.2), (A.3), and (A.4) imply that $\max_j|X_j^\top Q_{S_j}E| = \max_j|X_j^\top E - \Sigma_{jS_j}\Sigma_{S_j}^{-1}X_{S_j}^\top E| \times \{1 + o_p(1)\}$. Accordingly, there exists some constant $C_m$ such that
$$\max_{j \in N_0}\bigl|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}E/\sigma\bigr| \le C_m\max_{j \in N_0}n^{-1/2}|X_j^\top E| + C_m\max_{j \in N_0}n^{-1/2}|\Sigma_{jS_j}\Sigma_{S_j}^{-1}X_{S_j}^\top E|.$$
By the Cauchy-Schwarz inequality, we have $|\Sigma_{jS_j}\Sigma_{S_j}^{-1}X_{S_j}^\top E| \le \|\Sigma_{jS_j}\Sigma_{S_j}^{-1}\|\max_{l \in S_j}|X_l^\top E|$. By Condition (C2), we further obtain $\|\Sigma_{jS_j}\Sigma_{S_j}^{-1}\|^2 = \Sigma_{jS_j}\Sigma_{S_j}^{-2}\Sigma_{S_jj} \le \lambda_{\max}^2\lambda_{\min}^{-2}$. As a result, there exists some constant $C_\lambda$ such that
$$\max_{j \in N_0}\bigl|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}E/\sigma\bigr| \le C_\lambda\max_{j \in N_0}n^{-1/2}|X_j^\top E|.$$
This, in conjunction with Bonferroni's inequality and Condition (C8), yields
$$P\Bigl(\max_{j \in N_0}\bigl|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}E/\sigma\bigr| > n^\iota/2\Bigr) \le \sum_{j \in N_0}P\bigl(n^{-1}|X_j^\top E| > C_\lambda^{-1}n^{\iota - 1/2}/2\bigr) \le 2p\exp(-C_h n^{2\iota}) \le 2\exp(-C_h n^{2\iota} + \nu n^{\hbar})$$
for some positive constant $C_h$. By definition, $\hbar < 2\iota$. Thus, the first term in the exponent, $-C_h n^{2\iota}$, dominates the second term $\nu n^{\hbar}$, which immediately leads to
$$\lim_{n\to\infty}P\Bigl(\max_{j \in N_0}\bigl|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}E/\sigma\bigr| > n^\iota/2\Bigr) = 0.$$

We next consider the quantity $(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}X_{S_j^*}\beta_{S_j^*}$, which appears in the second term on the right-hand side of equation (A.13). It is worth noting that
$$(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}X_{S_j^*}\beta_{S_j^*} = \big(X_j^\top Q_{S_j}X_j/n\big)^{-1/2}\,n^{1/2}\sum_{j^*\in S_j^*}\varrho_{jj^*}(S_j)\beta_{j^*}$$
$$\le\; C_\beta\big\{\min_j n^{-1}X_j^\top Q_{S_j}X_j\big\}^{-1/2}\max_{j^*\in S_j^*}\big|n^{1/2}\varrho_{jj^*}(S_j)\big|.$$

Using the results $\max_j\big|n^{-1}X_j^\top Q_{S_j}X_j - (\sigma_{jj}-\Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jj})\big| = o_p(1)$ and $\sigma_{jj}-\Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jj}\ge\lambda_{\max}^{-1}$ discussed after (A.13), we have that $\big\{\min_j n^{-1}X_j^\top Q_{S_j}X_j\big\}^{-1/2} = O_p(1)$. In addition, Condition (C5), together with the fact that $S_j$ satisfies Condition (C6), leads to $\max_{j^*\in S_j^*}|n^{1/2}\varrho_{jj^*}(S_j)| = o(1)$. By Corollary 1 of Kalisch and Buhlmann (2007), we immediately obtain
$$\max_{j,S_j^*}P\Big\{\max_{j^*\in S_j^*}\big|n^{1/2}\varrho_{jj^*}(S_j)\big| > O(n^{b/2})\Big\} \to 0$$
for every $\iota < b < 1$. Taking $b = (\iota+\delta)/2$, which satisfies $\iota<b<1$ and $b/2<\delta$, we then have $\max_{j^*\in S_j^*}|n^{1/2}\varrho_{jj^*}(S_j)| = o_p(n^{b/2}) = o_p(n^{\delta})$. This, in conjunction with the above result $\big\{\min_j n^{-1}X_j^\top Q_{S_j}X_j\big\}^{-1/2} = O_p(1)$, results in

$$P\Big(\max_{j\in\mathcal{N}_0}\big|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}X_{S_j^*}\beta_{S_j^*}\big| > \sigma n^{\delta}/2\Big) \to 0.$$
In sum, we have shown that both terms on the right-hand side of (A.13) vanish asymptotically, and hence $P\{V(\alpha_n)>0\}\to 0$, which completes Step I.


STEP II. We prove that $\lim_{n\to\infty}P\{S(\alpha_n)/(p-p_0)=1\}=1$. By definition, we have
$$(p-p_0)^{-1}S(\alpha_n) = (p-p_0)^{-1}\sum_{j\in\mathcal{N}_1}I\big(|n^{1/2}\widehat{\beta}_j/\widehat{\sigma}_{\beta_j}| > n^{\delta}\big).$$
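For intuition, $V(\alpha)$ and $S(\alpha)$ are simply counts of rejections over the null set $\mathcal{N}_0$ and the signal set $\mathcal{N}_1$, respectively. A minimal bookkeeping sketch (our own illustration; the function name and arguments are hypothetical):

```python
import numpy as np

def rejection_counts(p_values, is_null, alpha):
    """Count V(alpha), the falsely rejected true nulls (over N_0), and
    S(alpha), the correctly rejected signals (over N_1), when every
    coefficient is tested at level alpha. Illustrative sketch only."""
    p_values = np.asarray(p_values)
    is_null = np.asarray(is_null, dtype=bool)
    reject = p_values < alpha
    V = int(np.sum(reject & is_null))      # rejections within N_0
    S = int(np.sum(reject & ~is_null))     # rejections within N_1
    return V, S
```

In this notation, Step II asserts that $S(\alpha_n)$ eventually equals $p-p_0$, the size of $\mathcal{N}_1$, with probability tending to one.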

Applying the asymptotic result for the first term on the right-hand side of (A.13), we have $\max_j|n^{1/2}(\widehat{\beta}_j-\beta_j)/\widehat{\sigma}_{\beta_j}| = o_p(n^{\delta})$. Then, using the theorem's assumption that $\min_{j\in\mathcal{N}_1}|\beta_j|\ge C_\kappa n^{-\kappa}$ for some constants $C_\kappa>0$ and $\kappa>0$, we further obtain that $\min_{j\in\mathcal{N}_1}|n^{1/2}\beta_j/\widehat{\sigma}_{\beta_j}|$ is at least of order $n^{1/2-\kappa}$ in probability. Moreover, by Bonferroni's inequality and the fact that $\delta+\kappa<1/2$, we have
$$P\big((p-p_0)^{-1}S(\alpha_n)=1\big) = P\Big(\min_{j\in\mathcal{N}_1}|n^{1/2}\widehat{\beta}_j/\widehat{\sigma}_{\beta_j}| > n^{\delta}\Big)$$
$$\ge P\Big(\min_{j\in\mathcal{N}_1}|n^{1/2}\beta_j/\widehat{\sigma}_{\beta_j}| > 2n^{\delta}\Big) - P\Big(\max_{j\in\mathcal{N}_1}|n^{1/2}(\widehat{\beta}_j-\beta_j)/\widehat{\sigma}_{\beta_j}| > n^{\delta}\Big) \to 1,$$
which completes the proof of Step II. Consequently, the entire proof is complete.

REFERENCES

Bendat, J. S. and Piersol, A. G. (1966). Measurement and Analysis of Random Data, New York: Wiley.

Benjamini, Y. and Hochberg, Y. (1995). "Controlling the false discovery rate: a practical and powerful approach to multiple testing," Journal of the Royal Statistical Society, Series B, 57, 289–300.

Bickel, P. and Levina, E. (2008). "Regularized estimation of large covariance matrices," The Annals of Statistics, 36, 199–227.

Buhlmann, P. (2013). "Statistical significance in high-dimensional linear models," Bernoulli, to appear.

Bunea, F., Wegkamp, M., and Auguste, A. (2006). "Consistent variable selection in high dimensional regression via multiple testing," Journal of Statistical Planning and Inference, 136, 4349–4364.

Draper, N. R. and Smith, H. (1998). Applied Regression Analysis, 3rd Edition, New York: Wiley.

Fan, J. and Lv, J. (2008). "Sure independence screening for ultra-high dimensional feature space (with discussion)," Journal of the Royal Statistical Society, Series B, 70, 849–911.

Fan, J., Han, X., and Gu, W. (2012). "Estimating false discovery proportion under arbitrary covariance dependence," Journal of the American Statistical Association, 107, 1019–1035.

Goeman, J., van de Geer, S., and van Houwelingen, H. (2006). "Testing against a high-dimensional alternative," Journal of the Royal Statistical Society, Series B, 68, 477–493.

Goeman, J., van Houwelingen, H., and Finos, L. (2011). "Testing against a high dimensional alternative in the generalized linear model: asymptotic type I error control," Biometrika, 98, 381–390.

Huang, J., Ma, S., and Zhang, C. H. (2008). "Adaptive Lasso for sparse high-dimensional regression models," Statistica Sinica, 18, 1603–1618.

Kalisch, M. and Buhlmann, P. (2007). "Estimating high-dimensional directed acyclic graphs with the PC-algorithm," Journal of Machine Learning Research, 8, 613–636.

Lee, D., Lee, W., Lee, Y., and Pawitan, Y. (2011). "Sparse partial least-squares regression and its applications to high-throughput data analysis," Chemometrics and Intelligent Laboratory Systems, 109, 1–8.

Li, R., Zhong, W., and Zhu, L. P. (2012). "Feature screening via distance correlation learning," Journal of the American Statistical Association, 107, 1129–1139.

Meinshausen, N. and Buhlmann, P. (2006). "High-dimensional graphs and variable selection with the Lasso," The Annals of Statistics, 34, 1436–1462.

Meinshausen, N., Meier, L., and Buhlmann, P. (2009). "P-values for high-dimensional regression," Journal of the American Statistical Association, 104, 1671–1681.

Storey, J. D. (2002). "A direct approach to false discovery rates," Journal of the Royal Statistical Society, Series B, 64, 479–498.

Storey, J. D., Taylor, J. E., and Siegmund, D. (2004). "Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach," Journal of the Royal Statistical Society, Series B, 66, 187–205.

Sun, T. and Zhang, C. H. (2012). "Scaled sparse linear regression," Biometrika, 99, 879–898.

Wang, H. (2009). "Forward regression for ultra-high dimensional variable screening," Journal of the American Statistical Association, 104, 1512–1524.

Wang, H. (2012). "Factor profiled independence screening," Biometrika, 99, 15–28.

Willink, R. (2004). "Bounds on the bivariate normal distribution function," Communications in Statistics: Theory and Methods, 33, 2281–2297.

van de Geer, S., Buhlmann, P., and Ritov, Y. (2013). "On asymptotically optimal confidence regions and tests for high-dimensional models," Working Paper.

Zhang, C. H. and Zhang, S. S. (2013). "Confidence intervals for low dimensional parameters in high dimensional linear models," Journal of the Royal Statistical Society, Series B, in press.

Zhao, P. and Yu, B. (2006). "On model selection consistency of Lasso," Journal of Machine Learning Research, 7, 2541–2563.

Zhong, P. S. and Chen, S. X. (2011). "Tests for high dimensional regression coefficients with factorial designs," Journal of the American Statistical Association, 106, 260–274.

Zhong, P. S., Chen, S. X., and Xu, M. (2013). "Tests alternative to higher criticism for high dimensional means under sparsity and column-wise dependence," The Annals of Statistics, to appear.

Table 1: Simulation results for Example 1 with α = 5% and q = 5%.

  p     n    Method   ES     EP     FDR    TR     FR
  500   100  MUR      0.062  0.988  0.349  0.805  0.004
             KFC      0.058  0.779  0.103  0.495  0.001
        200  MUR      0.062  1.000  0.539  0.992  0.008
             KFC      0.054  0.950  0.072  0.767  0.001
        500  MUR      0.062  1.000  0.642  1.000  0.011
             KFC      0.051  0.999  0.055  0.965  0.000
  1000  100  MUR      0.058  0.986  0.328  0.771  0.002
             KFC      0.058  0.778  0.111  0.440  0.000
        200  MUR      0.057  1.000  0.508  0.983  0.005
             KFC      0.053  0.944  0.072  0.704  0.000
        500  MUR      0.057  1.000  0.658  1.000  0.007
             KFC      0.051  1.000  0.058  0.957  0.000

Table 2: Simulation results for Example 2 with α = 5% and q = 5%.

  p     n    Method   ES     EP     FDR    TR     FR
  500   100  MUR      1.000  1.000  0.994  1.000  1.000
             KFC      0.055  0.739  0.093  0.442  0.000
        200  MUR      1.000  1.000  0.994  1.000  1.000
             KFC      0.050  0.936  0.053  0.813  0.000
        500  MUR      1.000  1.000  0.994  1.000  1.000
             KFC      0.052  0.999  0.050  1.000  0.000
  1000  100  MUR      1.000  1.000  0.997  1.000  1.000
             KFC      0.060  0.701  0.140  0.353  0.000
        200  MUR      1.000  1.000  0.997  1.000  1.000
             KFC      0.052  0.907  0.073  0.753  0.000
        500  MUR      1.000  1.000  0.997  1.000  1.000
             KFC      0.049  0.999  0.049  1.000  0.000


Table 3: Simulation results for Example 3 with α = 5%, q = 5%, Σ as given in Example 1, λ = 0.1, and the $X_i$'s being generated from a standardized exponential distribution.

  p     n    Method   ES     EP     FDR    TR     FR
  500   100  MUR      0.062  0.985  0.364  0.804  0.004
             KFC      0.057  0.752  0.072  0.456  0.000
        200  MUR      0.061  1.000  0.541  0.989  0.008
             KFC      0.053  0.947  0.049  0.717  0.000
        500  MUR      0.062  1.000  0.644  1.000  0.011
             KFC      0.051  0.999  0.050  0.963  0.000
  1000  100  MUR      0.058  0.988  0.342  0.757  0.002
             KFC      0.058  0.756  0.083  0.403  0.000
        200  MUR      0.057  1.000  0.513  0.983  0.004
             KFC      0.053  0.941  0.058  0.580  0.000
        500  MUR      0.056  1.000  0.639  1.000  0.005
             KFC      0.052  1.000  0.048  0.944  0.000

Table 4: Simulation results for Example 3 with α = 5%, q = 5%, Σ as given in Example 2, λ = 0.1, and the $X_i$'s being generated from a standardized exponential distribution.

  p     n    Method   ES     EP     FDR    TR     FR
  500   100  MUR      1.000  1.000  0.994  1.000  1.000
             KFC      0.062  0.801  0.100  0.390  0.000
        200  MUR      1.000  1.000  0.994  1.000  1.000
             KFC      0.054  0.995  0.054  0.765  0.000
        500  MUR      1.000  1.000  0.994  1.000  1.000
             KFC      0.052  1.000  0.053  1.000  0.001
  1000  100  MUR      1.000  1.000  0.997  1.000  1.000
             KFC      0.071  0.754  0.110  0.310  0.000
        200  MUR      1.000  1.000  0.997  1.000  1.000
             KFC      0.058  0.986  0.076  0.632  0.000
        500  MUR      1.000  1.000  0.997  1.000  1.000
             KFC      0.053  1.000  0.051  1.000  0.000
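Throughout Tables 1–4, q denotes the nominal FDR level for the multiple testing step. For concreteness, the following is a minimal sketch of the Benjamini–Hochberg step-up rule (Benjamini and Hochberg, 1995), one standard way to control FDR at level q; the implementation details are our own and are not taken from the paper:

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Sketch of the Benjamini-Hochberg step-up procedure at FDR level q;
    returns a boolean mask of rejected hypotheses. Illustrative only."""
    p = np.asarray(p_values)
    m = p.size
    order = np.argsort(p)
    critical = q * np.arange(1, m + 1) / m      # BH critical values q*k/m
    passed = p[order] <= critical
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()         # largest k with p_(k) <= q*k/m
        reject[order[:k + 1]] = True
    return reject
```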


[Figure 1 here. Four panels (A, B, C, and D); each plots probability (vertical axis, 0 to 1) against the threshold value t (horizontal axis, 0 to 0.038), with one curve for the true FDP and one for the estimated FDP.]

Figure 1: Panels A and B depict the estimated FDP value (i.e., $\mathrm{FDR}_{0.5}(t)$) against the true FDP value obtained via the KFC method for Examples 1 and 2, respectively, while Panels C and D are obtained via the MUR approach.
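The estimated FDP curve in Figure 1 is indexed by the tuning constant 0.5. Assuming $\mathrm{FDR}_{0.5}(t)$ follows the direct FDR estimation approach of Storey (2002), cited above, with λ = 0.5, a minimal sketch reads as follows (function name and interface are our own):

```python
import numpy as np

def storey_fdp_estimate(p_values, t, lam=0.5):
    """Sketch of a Storey-type estimate of the false discovery
    proportion at threshold t, assuming FDR_0.5(t) is the Storey (2002)
    estimator with lambda = 0.5. Illustrative only."""
    p_values = np.asarray(p_values)
    m = p_values.size
    pi0_hat = np.sum(p_values > lam) / ((1.0 - lam) * m)   # estimate of p0/p
    R_t = max(int(np.sum(p_values <= t)), 1)               # rejections at t
    return pi0_hat * m * t / R_t
```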


[Figure 2 here. Two histograms of p-values (horizontal axes, 0 to 1; vertical axes, frequency up to about 6000), one for the KFC test and one for the MUR test.]

Figure 2: Supermarket data. The histograms of the p-values for the KFC and MUR tests.

[Figure 3 here. Two histograms of p-values, one for the KFC test (horizontal axis, 0 to 1) and one for the MUR test (horizontal axis, 0 to about 1.5e-05); vertical axes show frequency up to about 600.]

Figure 3: Corn spectroscopic data. The histograms of the p-values for the KFC and MUR tests.