model-robust designs for quantile regression


Journal of the American Statistical Association

Model-Robust Designs for Quantile Regression

Linglong Kong & Douglas P. Wiens

Accepted author version posted online: 10 Oct 2014. Published online: 22 Apr 2015.

To cite this article: Linglong Kong & Douglas P. Wiens (2015) Model-Robust Designs for Quantile Regression, Journal of the American Statistical Association, 110:509, 233-245, DOI: 10.1080/01621459.2014.969427

To link to this article: http://dx.doi.org/10.1080/01621459.2014.969427


Model-Robust Designs for Quantile Regression

Linglong KONG and Douglas P. WIENS

We give methods for the construction of designs for regression models, when the purpose of the investigation is the estimation of the conditional quantile function, and the estimation method is quantile regression. The designs are robust against misspecified response functions, and against unanticipated heteroscedasticity. The methods are illustrated by example, and in a case study in which they are applied to growth charts.

KEY WORDS: Asymptotic mean squared error; B-splines; Compound design; Exchange algorithm; Genetic algorithm; Growth charts; Heteroscedasticity; Minimax bias; Minimax mean squared error; Nonlinear models; Regression quantiles; Uniformity.

1. INTRODUCTION

The case for robust methods of analysis in statistical investigations was convincingly made by Huber (1981), whose work concentrates on robustness against departures from the investigator's assumed parametric model of the distribution generating the data. Box and Draper (1959) had earlier argued that, when there is any doubt about the form of the response model in a regression analysis in which the choice of design is under the control of the experimenter, such choices should be made robustly, that is, with an eye to the performance of the resulting designs under a range of plausible alternative models. The work of Box and Draper focused on designs robust against polynomial responses of degrees higher than that anticipated by the experimenter. This was extended in one direction by Huber (1975), who derived minimax designs for straight line fits; these minimize the maximum mean squared error of the fitted values, with the maximum taken over a full $L_2$-neighborhood of the experimenter's assumed response. This work, in which it was assumed that the regression estimates would be obtained by least squares, has in turn been extended in numerous directions: by Li (1984) to finite design spaces, by Wiens (1992) to multiple regression, by Woods et al. (2006) to GLMs, and by Li and Wiens (2011) to dose-response studies, to list but a few.

M-estimation (Huber 1964) is a method of estimation with a degree of distributional robustness. Such methods confer robustness against outliers in the response variable of a regression, but have influence functions that are unbounded in the factor space. For random regressors this unboundedness may be addressed by the use of bounded influence (BI) methods (Maronna and Yohai 1981; Simpson, Ruppert, and Carroll 1992); otherwise it can be controlled by the design. Designs to be used in league with M- or BI-estimates have been studied by Wiens (2000) and Wiens and Wu (2010). In the latter article it was found that there is very little difference between designs optimal (in some sense, robust or not) for least squares and those for M-estimation; this is, however, not the case for BI-estimation.

Linglong Kong and Douglas P. Wiens, Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2G1. This work has been supported by the Natural Sciences and Engineering Research Council of Canada.

Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/r/jasa.

An increasingly popular method of estimation and inference was furnished by Koenker and Bassett (1978), who elegantly restated the case for robustness, extended the notion of univariate quantiles to regression quantiles, and derived quantile regression methods of estimating the conditional quantile function. Koenker and Bassett point out that the influence function of a quantile regression estimator is, like that of an M-estimator, unbounded in the factor space. This can again be addressed by the design. Dette and Trampisch (2012) recently studied this problem, assuming that the experimenter's model is correct; to date there is no published work on designs for quantile regression methods that extend the natural robustness of these methods against outliers to robustness against misspecified response models. We provide such designs in this article, and also consider robustness against unanticipated heteroscedasticity.

The need for optimal designs for quantile regression methods was convincingly articulated by Dette and Trampisch (2012). The need for robustness of design can arise in numerous ways. Beyond the obvious (that in many studies the fitted model is adopted largely as an article of faith), there are numerous scenarios in which the final goal is to fit models that might not fall within a standard design paradigm, but for which a preliminary study with reasonable efficiency against a range of models might furnish a point from which to expand the investigations. Some recent examples employ quantile regression in model selection (Behl, Claeskens, and Dette 2014), ecological studies (Martínez-Silva et al. 2013), financial modeling (Rubia and Sanchis-Marco 2013), and the fitting of time-varying coefficients (Ma and Wei 2012). In such cases the initial fitted model might be nonlinear; we address this in Section 2.2.

In Section 2 we outline our notion of misspecified response models, and set the stage for the optimality problems to be addressed in subsequent sections. The misspecification engenders a bias in the estimate, motivating our use of the mean squared error (MSE) of the estimate of the conditional quantile function as a measure of the loss. In Section 3 we illustrate some designs that minimize the maximum MSE, with this maximum taken over certain very broad classes of response misspecifications. Then in Section 4 we specialize to designs that address only the bias component of the MSE. This is somewhat of a return to the findings of Box and Draper, who stated that “. . . the optimal design in typical situations in which both variance and bias occur is very nearly the same as would be obtained if variance were ignored completely and the experiment designed so as to minimize the bias alone” (Box and Draper 1959, p. 622).

© 2015 American Statistical Association. Journal of the American Statistical Association, March 2015, Vol. 110, No. 509, Theory and Methods. DOI: 10.1080/01621459.2014.969427

The bias-minimizing designs turn out to have design weights proportional to the square roots of the variance functions, when these functions are known. If instead the variance functions are also allowed to range over a certain broad class, then uniform designs minimize the maximum bias over both types of departures from the experimenter's assumptions. In Section 5 this optimality of uniform designs is extended to minimization of the maximum MSE over both types of departures; we find that the minimax designs are uniform on their support. Finally, in Section 6, we illustrate the theory in an application to growth charts, in which the regressors are cubic B-splines and the appropriate choice of knots, and their locations, is in doubt.

We have posted software (see http://www.stat.ualberta.ca/~wiens/homepage/pubs/qrd.zip), which runs on MATLAB, and instructions for its use, to compute the optimal designs in all of these scenarios. All derivations and longer mathematical arguments are in the Appendix, or in the online addendum Kong and Wiens (2014).

2. APPROXIMATE QUANTILE REGRESSION MODELS

To set the stage for the examples of subsequent sections, suppose that an experimenter intends to make observations on random variables Y with structure

$$Y = f'(x)\theta + \sigma(x)\varepsilon, \tag{1}$$

for a p-vector f of functionally independent regressors, each element of which is a function of a q-vector x of independent variables chosen (the “design”) from a space χ. We assume that the errors ε are iid, and that the variance function $\sigma^2(x)$ is strictly positive on the support of the design. For a fixed τ ∈ (0, 1), $f'(x)\theta$ is to be the conditional τ-quantile of Y, given x:

$$\tau = G_\varepsilon(0) = G_{Y|x}\big(f'(x)\theta\big). \tag{2}$$

(We write $G_U(\cdot)$ for the distribution function of a random variable U.)

Now suppose that (1) is only an approximation, and that in fact

$$Y = f'(x)\theta + \delta_n(x) + \sigma(x)\varepsilon, \tag{3}$$

for some “small” model error $\delta_n$. The dependence of δ on n is necessary for a sensible asymptotic treatment: in order that bias and variance remain of the same order we will assume that $\delta_n = O(n^{-1/2})$. For fixed sample sizes this is not necessary.

The experimenter, acting as though $\delta_n \equiv 0$ and σ(·) is constant, computes the quantile regression estimate

$$\hat{\theta} = \arg\min_t \sum_{i=1}^n \rho_\tau\big(Y_i - f'(x_i)t\big), \tag{4}$$

where $\rho_\tau(\cdot)$ is the “check” function $\rho_\tau(r) = r\,\big(\tau - I(r < 0)\big)$, with derivative $\psi_\tau(r) = \tau - I(r < 0)$.
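To make (4) concrete, here is a minimal Python sketch (NumPy assumed; the function names are hypothetical and this is not the authors' MATLAB software). It codes the check function and computes the estimate by brute force, relying on the standard linear-programming fact that (4) has a minimizer fitting p observations exactly. The cost grows like C(n, p), so this is for small illustrations only; real fits would use a linear-programming solver.

```python
import itertools

import numpy as np


def check_loss(r, tau):
    """Check function rho_tau(r) = r * (tau - I(r < 0)), applied elementwise."""
    r = np.asarray(r, dtype=float)
    return r * (tau - (r < 0).astype(float))


def qr_fit_elemental(F, y, tau):
    """Minimize sum_i rho_tau(y_i - f'(x_i) t) over t, as in (4).

    F has rows f'(x_i). Some minimizer fits p observations with linearly
    independent rows exactly, so enumerating all p-subsets finds it. The cost
    is O(C(n, p) * n): suitable for small illustrative problems only.
    """
    F = np.asarray(F, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = F.shape
    best_t, best_val = None, np.inf
    for idx in itertools.combinations(range(n), p):
        rows = list(idx)
        F_h = F[rows]
        if abs(np.linalg.det(F_h)) < 1e-12:
            continue  # singular subset: no unique interpolating fit
        t = np.linalg.solve(F_h, y[rows])
        val = check_loss(y - F @ t, tau).sum()
        if val < best_val:
            best_t, best_val = t, val
    return best_t, best_val


# Usage: a median (tau = 0.5) fit with an intercept only recovers the sample median.
t_hat, _ = qr_fit_elemental(np.ones((5, 1)), [1.0, 2.0, 3.0, 4.0, 5.0], 0.5)
print(t_hat)  # [3.]
```

With `F` built from straight-line regressors (1, x), the same function returns the τ-quantile line.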

We will consider two types of design spaces χ. The first is discrete, with N possible design points $\{x_i\}_{i=1}^N$; here N is arbitrary. We also consider a continuous, compact design space, with Lebesgue measure $\mathrm{VOL}(\chi) := \int_\chi dx$, in which case the design is generated by a design measure ξ(dx). Initially, we shall unify the presentation by writing sums of the form $\sum_{x \in \text{design}} \alpha(x)$, in which a fraction $\xi_{n,i} = n_i/n$ of the n observations are to be made at the design point $x = x_i$, as Lebesgue–Stieltjes integrals, viz., as $n\sum_{i=1}^N \xi_{n,i}\,\alpha(x_i) = n\int_\chi \alpha(x)\,\xi_n(dx)$. We assume that the design measure $\xi_n$ has a weak limit $\xi_\infty$ for which

$$\lim_{n\to\infty} \sum_{i=1}^n \xi_{n,i}\,\rho_\tau\big(Y_i - f'(x_i)t\big) = \int_\chi E_{Y|x}\big[\rho_\tau\big(Y - f'(x)t\big)\big]\,\xi_\infty(dx).$$

Under (3) the meaning of θ becomes ambiguous. Thus we define this “true” regression parameter as that making the experimenter's approximation (1) most accurate, under the experimenter's assumption of homoscedasticity. For a discrete design space this is

$$\theta = \arg\min_t \frac{1}{N}\sum_{i=1}^N E_{Y|x_i}\big[\rho_\tau\big(Y - f'(x_i)t\big)\big]. \tag{5}$$

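Carrying out the minimization in (5) and evaluating at t = θ gives

$$\begin{aligned} 0 &= \frac{1}{N}\sum_{i=1}^N E_{Y|x_i}\big[\psi_\tau\big(Y - f'(x_i)\theta\big)\big] f(x_i) \\ &= \frac{1}{N}\sum_{i=1}^N \big[G_\varepsilon(0) - G_\varepsilon(-\delta_n(x_i))\big] f(x_i) \\ &= \big(g_\varepsilon(0) + O(n^{-1/2})\big)\,\frac{1}{N}\sum_{i=1}^N \delta_n(x_i) f(x_i), \end{aligned} \tag{6}$$

where $g_\varepsilon$ is the density of $G_\varepsilon$. We now define $\delta_0(x) = \lim_{n\to\infty}\sqrt{n}\,\delta_n(x)$, so that

$$\frac{1}{N}\sum_{i=1}^N \delta_0(x_i) f(x_i) = 0. \tag{7}$$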

In a continuous design space the average is replaced by an integral; see Equation (8b).

The true conditional τ-quantile $Y_\tau = f'(x)\theta + \delta_n(x)$ is predicted by $\hat{Y}_\tau = f'(x)\hat{\theta}$. Our approach is to obtain the asymptotic mean squared error matrix $\mathrm{MSE}_{\hat{\theta}}$ of the parameter estimates, from it the average (over χ) MSE of these predicted values, and then to maximize this average MSE over the appropriate class of model errors:

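$$\chi \text{ discrete: } \mathcal{F}_0 = \Big\{\delta_0(\cdot) \;\Big|\; \text{(i) } N^{-1}\sum_{i=1}^N \delta_0(x_i) f(x_i) = 0 \ \text{ and (ii) } N^{-1}\sum_{i=1}^N \delta_0^2(x_i) \le \eta^2 \Big\}, \tag{8a}$$

$$\chi \text{ continuous: } \mathcal{F}_0 = \Big\{\delta_0(\cdot) \;\Big|\; \text{(i) } \int_\chi \delta_0(x) f(x)\,dx = 0 \ \text{ and (ii) } \int_\chi \delta_0^2(x)\,dx \le \eta^2 \Big\}. \tag{8b}$$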
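Any candidate model error can be mapped into the discrete class (8a): subtracting its least-squares projection onto the regressors enforces (i), and rescaling places it on the boundary of (ii). The Python sketch below (NumPy assumed; the helper name is hypothetical) does exactly this.

```python
import numpy as np


def to_class_8a(delta, F, eta):
    """Map an arbitrary model error to the boundary of class (8a).

    delta : values delta(x_1), ..., delta(x_N) on the discrete design space
    F     : N x p matrix with rows f'(x_i)
    eta   : size bound in condition (ii)
    """
    delta = np.asarray(delta, dtype=float)
    F = np.asarray(F, dtype=float)
    # (i): subtract the least-squares projection onto the columns of F,
    # so that sum_i d(x_i) f(x_i) = F' d = 0
    coef, *_ = np.linalg.lstsq(F, delta, rcond=None)
    d = delta - F @ coef
    # (ii): rescale so that N^{-1} sum_i d(x_i)^2 = eta^2
    rms = np.sqrt(np.mean(d ** 2))
    if rms == 0:
        raise ValueError("delta lies in the span of the regressors")
    return d * (eta / rms)
```

For straight-line regressors, a quadratic departure such as δ(x) = x² is turned into a centered parabola of root-mean-square size η.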


This is carried out in Section 2.3. We also consider classes of variance functions. These may be independent of the design or (see Equation (18)) vary with the design weights, in which case we also maximize the MSE over this class. In either case we then go on to find the MSE-minimizing designs $\xi_*$, using a variety of analytic and numerical techniques.

In most cases the optimal designs $\xi_*$ must be approximated in order to implement them in finite samples; for example, when q = 1 we will do this by placing the design points at the quantiles

$$x_i = \xi_*^{-1}\left(\frac{i - 0.5}{n}\right), \quad i = 1, \ldots, n, \tag{9}$$

or at the closest available points in discrete design spaces. For q > 1 the situation is more interesting, and some suggestions are in Fang and Wang (1994) and Xu and Yuen (2011); an intriguing possibility as yet (to our knowledge) unexplored is the use of vector quantization to approximate the designs.
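For q = 1, rule (9) reads off the (i − 0.5)/n quantiles of the design measure. A minimal Python sketch for a measure given by weights on a sorted grid (NumPy assumed; the function name is hypothetical). It uses the left-continuous inverse of the cumulative weights, so each returned point is automatically one of the available grid points.

```python
import numpy as np


def implement_design(support, weights, n):
    """Place n points at the (i - 0.5)/n quantiles of a design measure, as in (9).

    support : sorted 1-D array of available design points
    weights : nonnegative design weights on those points (need not sum to 1)
    Uses the left-continuous inverse of the cumulative weights, so every
    returned point is one of the available points.
    """
    support = np.asarray(support, dtype=float)
    w = np.asarray(weights, dtype=float)
    cdf = np.cumsum(w) / w.sum()
    probs = (np.arange(1, n + 1) - 0.5) / n
    idx = np.searchsorted(cdf, probs, side="left")  # smallest i with cdf[i] >= p
    return support[np.minimum(idx, support.size - 1)]  # guard against rounding at 1
```

For a design density on an interval, one possible approach (an assumption here, not the authors' prescription) is to tabulate the density on a fine grid and pass those values as the weights.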

2.1 Asymptotics

In Equation (8) the imposition of Equation (7), and its analogue in continuous spaces, ensures the identifiability of the parameter in (3). The bound η² forces the errors due to variation, and those due to the bias engendered by the model misspecification, to remain of the same order asymptotically, a situation akin to the imposition of contiguity in the asymptotic theory of hypothesis testing. Define

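$$\mu_0 = \int_\chi \delta_0(x)\,\frac{1}{\sigma(x)}\,f(x)\,\xi_\infty(dx), \tag{10a}$$

$$P_0 = \int_\chi f(x) f'(x)\,\xi_\infty(dx), \tag{10b}$$

$$P_1 = \int_\chi f(x)\,\frac{1}{\sigma(x)}\,f'(x)\,\xi_\infty(dx). \tag{10c}$$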

Assume that the support of $\xi_\infty$ is large enough that $P_0$ and $P_1$ are positive definite. Define the target parameter θ to be the asymptotic solution to (4), so that

$$\sum_{i=1}^n \xi_{n,i}\,\psi_\tau\big(Y_i - f'(x_i)\theta\big) f(x_i) \xrightarrow{\;\text{pr}\;} 0, \tag{11}$$

in agreement with (6). The proof of the asymptotic normality of the estimate runs along familiar lines (see Knight (1998) and Koenker (2005)), and so we merely state the result. Complete details are in Kong and Wiens (2014).

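Theorem 1. Under conditions (A1)–(A3) the quantile regression estimate $\hat{\theta}_n$ of the parameter θ defined by Equation (11) is asymptotically normally distributed:

$$\sqrt{n}\,\big(\hat{\theta}_n - \theta\big) \xrightarrow{\;\mathcal{L}\;} N\!\left(P_1^{-1}\mu_0,\ \frac{\tau(1-\tau)}{g_\varepsilon^2(0)}\,P_1^{-1} P_0 P_1^{-1}\right).$$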
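Once the limiting design, σ(·) and δ0(·) are fixed, the limiting mean and covariance in Theorem 1 are plain matrix computations. A minimal NumPy sketch for a discrete limiting design (names hypothetical; `g0` stands for $g_\varepsilon(0)$, which the user must supply):

```python
import numpy as np


def asymptotic_bias_cov(F, xi, sigma, delta0, tau, g0):
    """Mean and covariance of the normal limit of sqrt(n)(theta_hat_n - theta).

    F : N x p rows f'(x_i); xi : limiting design weights xi_inf(x_i);
    sigma, delta0 : sigma(x_i) and delta_0(x_i); g0 : error density g_eps(0).
    """
    F = np.asarray(F, dtype=float)
    w = np.asarray(xi, dtype=float)
    s = np.asarray(sigma, dtype=float)
    d = np.asarray(delta0, dtype=float)
    P0 = F.T @ (w[:, None] * F)           # (10b)
    P1 = F.T @ ((w / s)[:, None] * F)     # (10c)
    mu0 = F.T @ (w * d / s)               # (10a)
    P1_inv = np.linalg.inv(P1)
    mean = P1_inv @ mu0
    cov = tau * (1 - tau) / g0 ** 2 * P1_inv @ P0 @ P1_inv
    return mean, cov
```

For instance, with ξ uniform on {−1, 0, 1}, σ ≡ 1, δ0 ≡ 0, τ = 0.5 and standard normal errors (so $g_\varepsilon(0) = 1/\sqrt{2\pi}$), the mean is zero and the covariance reduces to $0.25 \cdot 2\pi\,P_0^{-1}$.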

2.2 Nonlinear Models

Dette and Trampisch (2012) obtained (nonrobust) designs for quantile regression and nonlinear models; these were locally optimal, with, in our notation, (2) replaced by $\tau = G_{Y|x}\big(F(x;\theta)\big)$, where F(x; θ) is evaluated at fixed values of those elements of θ that enter in a nonlinear manner. Some robustness against misspecification of these local parameters was then introduced by considering Bayesian and maximin designs. In this article the examples pertain only to linear models. However, since our approach (as also in Dette and Trampisch 2012) is asymptotic in nature, the results presented here are easily modified to accommodate nonlinear models. The definition (4) of the estimate is replaced by $\hat{\theta} = \arg\min_t \sum_{i=1}^n \rho_\tau\big(Y_i - F(x_i; t)\big)$, and then, in all occurrences, f(x) is to be replaced by the gradient $f_\theta(x) = \partial F(x;\theta)/\partial\theta$. With these changes Theorem 1 continues to hold, as does the rest of the theory of the article. The robustness is then attained against misspecifications in the functional form of F(x; ·), possibly but not necessarily arising from misspecified parameters.
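In the nonlinear case the rows f′(x_i) are replaced by gradients evaluated at the local parameter values. A minimal Python sketch using central differences (NumPy assumed). The exponential response F(x; θ) = θ0 exp(θ1 x) and the local value θ = (1, 0.5) are hypothetical choices for illustration only.

```python
import numpy as np


def gradient_regressors(F_model, x, theta, h=1e-6):
    """Rows f_theta'(x_i) = dF(x_i; theta)/dtheta', by central differences."""
    x = np.asarray(x, dtype=float)
    theta = np.asarray(theta, dtype=float)
    G = np.empty((x.size, theta.size))
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = h
        G[:, j] = (F_model(x, theta + e) - F_model(x, theta - e)) / (2 * h)
    return G


def expo(x, th):
    # hypothetical nonlinear response F(x; theta) = theta_0 * exp(theta_1 * x)
    return th[0] * np.exp(th[1] * x)


G = gradient_regressors(expo, np.linspace(-1, 1, 7), [1.0, 0.5])
```

The matrix `G` then plays the role of the regressor matrix with rows f′(x_i) in every formula that follows.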

2.3 Maximum MSE Over $\mathcal{F}_0$; Discrete Design Spaces

From Theorem 1, the asymptotic MSE matrix of $\hat{\theta}$ is

$$\mathrm{MSE}_{\hat{\theta}} = P_1^{-1}\left[\frac{\tau(1-\tau)}{g_\varepsilon^2(0)}\,P_0 + \mu_0\mu_0'\right]P_1^{-1}.$$

We now introduce a measure of the asymptotic loss when the conditional quantile $Y_\tau(x) = f'(x)\theta + \delta_n(x)$, for x ∈ χ, is incorrectly estimated by $\hat{Y}_n(x) = f'(x)\hat{\theta}_n$. For a discrete design space $\chi = \{x_1, \ldots, x_N\}$ this measure is the limiting average mean squared error

$$\mathrm{AMSE} = \lim_n \frac{1}{N}\sum_{i=1}^N E\Big[\big\{\sqrt{n}\,\big(\hat{Y}_n(x_i) - Y_\tau(x_i)\big)\big\}^2\Big].$$

In terms of $A = N^{-1}\sum_{i=1}^N f(x_i) f'(x_i)$, and using (i) of (8a), we find that

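$$\begin{aligned} \mathrm{AMSE} &= \operatorname{tr}\big(A\,\mathrm{MSE}_{\hat{\theta}}\big) + \frac{1}{N}\sum_{i=1}^N \delta_0^2(x_i) \\ &= \frac{\tau(1-\tau)}{g_\varepsilon^2(0)}\operatorname{tr}\big(A P_1^{-1} P_0 P_1^{-1}\big) + \mu_0' P_1^{-1} A P_1^{-1}\mu_0 + \frac{1}{N}\sum_{i=1}^N \delta_0^2(x_i). \end{aligned} \tag{12}$$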
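Equation (12) relies on condition (i) of (8a) to drop the cross term $-2N^{-1}\sum_i \delta_0(x_i) f'(x_i) P_1^{-1}\mu_0$. The Python sketch below (NumPy assumed; names hypothetical) computes AMSE both pointwise and from (12), so the two can be compared for an error satisfying (i).

```python
import numpy as np


def amse_two_ways(F, xi, sigma, delta0, tau, g0):
    """Return (pointwise AMSE, AMSE from (12)) on a discrete design space.

    F : N x p rows f'(x_i) over all N points of chi; xi : design weights;
    sigma, delta0 : values at x_1, ..., x_N; g0 : g_eps(0).
    """
    F, w = np.asarray(F, dtype=float), np.asarray(xi, dtype=float)
    s, d = np.asarray(sigma, dtype=float), np.asarray(delta0, dtype=float)
    N = F.shape[0]
    P0 = F.T @ (w[:, None] * F)
    P1_inv = np.linalg.inv(F.T @ ((w / s)[:, None] * F))
    bias = P1_inv @ (F.T @ (w * d / s))           # P_1^{-1} mu_0
    var = tau * (1 - tau) / g0 ** 2 * P1_inv @ P0 @ P1_inv
    mse = var + np.outer(bias, bias)              # MSE matrix of theta_hat
    # pointwise: E[{f'(x) Z - delta0(x)}^2] with E Z = bias and E ZZ' = mse
    quad = np.einsum("ij,jk,ik->i", F, mse, F)
    pointwise = np.mean(quad - 2 * d * (F @ bias) + d ** 2)
    A = F.T @ F / N
    formula = np.trace(A @ var) + bias @ A @ bias + np.mean(d ** 2)
    return pointwise, formula
```

With an error such as δ0(x) = x² − mean(x²) on a symmetric grid and straight-line regressors, condition (i) holds and the two values agree; for an error violating (i) they generally differ.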

We now write merely ξ for $\xi_\infty$. We impose a bound $N^{-1}\sum_{i=1}^N \sigma^2(x_i) \le \sigma_0^2$ for a given $\sigma_0^2$, and we denote by $\mathrm{ch}_{\max}$ the maximum eigenvalue of a matrix. The maximum value of AMSE over $\mathcal{F}_0$ is given in the following theorem.

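Theorem 2. For a discrete design space χ define

$$T_{0,0} = \sum_{\xi_i > 0} f(x_i) f'(x_i)\,\xi_i, \qquad T_{0,k} = \sum_{\xi_i > 0} f(x_i) f'(x_i)\left(\frac{\xi_i}{\sigma(x_i)/\sigma_0}\right)^{k}, \quad k = 1, 2,$$

and

$$T_0 = T_{0,1}^{-1} T_{0,0} T_{0,1}^{-1}, \qquad T_2 = T_{0,1}^{-1} T_{0,2} T_{0,1}^{-1}. \tag{13}$$

Then $\max_{\mathcal{F}_0}\mathrm{AMSE}$ is $\Big\{\dfrac{\tau(1-\tau)\sigma_0^2}{g_\varepsilon^2(0)} + \eta^2\Big\}$ times

$$L_\nu(\xi|\sigma) = (1-\nu)\operatorname{tr}(A T_0) + \nu\,\mathrm{ch}_{\max}(A T_2), \tag{14}$$

where $\nu = \eta^2 \Big/ \Big\{\dfrac{\tau(1-\tau)\sigma_0^2}{g_\varepsilon^2(0)} + \eta^2\Big\}$.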
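Theorem 2 turns the design problem into the minimization of an explicit function of the weights. A NumPy sketch evaluating (13), (14) and ν (names hypothetical). As in the theorem, A is formed from all N points of χ, while the T-matrices sum over the support only.

```python
import numpy as np


def loss_theorem2(F, xi, sigma, nu, sigma0=1.0):
    """L_nu(xi | sigma) of (14) for a discrete design space.

    F : N x p rows f'(x_i) over all of chi; xi : design weights (sum to 1);
    sigma : sigma(x_i); nu in [0, 1].
    """
    F, xi, sigma = (np.asarray(v, dtype=float) for v in (F, xi, sigma))
    N = F.shape[0]
    A = F.T @ F / N
    on = xi > 0                                   # sums run over the support only
    Fs, ws = F[on], xi[on]
    rs = ws / (sigma[on] / sigma0)
    T00 = Fs.T @ (ws[:, None] * Fs)
    T01 = Fs.T @ (rs[:, None] * Fs)
    T02 = Fs.T @ ((rs ** 2)[:, None] * Fs)
    T01_inv = np.linalg.inv(T01)
    T0 = T01_inv @ T00 @ T01_inv                  # (13)
    T2 = T01_inv @ T02 @ T01_inv
    # A T2 is similar to a symmetric psd matrix, so its eigenvalues are real
    ch_max = np.max(np.linalg.eigvals(A @ T2).real)
    return (1 - nu) * np.trace(A @ T0) + nu * ch_max


def nu_from(eta2, tau, g0, sigma0=1.0):
    """nu = eta^2 / {tau(1 - tau) sigma_0^2 / g_eps(0)^2 + eta^2}."""
    v = tau * (1 - tau) * sigma0 ** 2 / g0 ** 2
    return eta2 / (v + eta2)
```

As a check: for the uniform design with σ ≡ 1 (meeting the bound with σ0 = 1), $T_0 = A^{-1}$ and $T_2 = (NA)^{-1}$, so the loss is (1 − ν)p + ν/N.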



[Figure 1: sixteen panels, (a)–(p), each showing the design weights ξ(x) against x ∈ [−1, 1].]

Figure 1. Minimax (over $\mathcal{F}_0$) design measures for heteroscedastic straight line regression, N = 101, normalized so that bar area = 1. Columns 1–4 use ν = 0.05, 0.35, 0.65, 0.95, respectively; rows 1–4 use σ(x) ∝ $(1 + |x|)^{-1}$, 1, 0.2 + |x|, $1 + (x/2)^2$, respectively. The bullets below the horizontal axes are the locations of n = 10 design points, implemented as at (9).

The first component ($\operatorname{tr}(AT_0)$) of $L_\nu(\xi|\sigma)$ arises solely from variation, the second ($\mathrm{ch}_{\max}(AT_2)$) from (squared) bias. Note that (14) depends on $\sigma_0$ only through $\{\sigma(x)/\sigma_0\}$ and through ν. We may thus without loss of generality take $\sigma_0 = 1$ and parameterize the designs solely by ν ∈ [0, 1], which may be chosen by the experimenter to represent his relative concern for errors due to bias rather than to variation.

3. EXAMPLES: DESIGNS MINIMIZING $\max_{\mathcal{F}_0}$ MSE FOR FIXED VARIANCE FUNCTIONS

Before extending the theory presented thus far, we illustrate it for some representative, fixed variance functions in two cases: approximate straight line regression in a discrete design space, and approximate quadratic regression in a continuous design space. The development of the first case is given in some detail in the Appendix; that for the second is outlined only briefly.

3.1 Discrete Design Spaces

For least squares regression problems with univariate design variables and homoscedastic variances, optimally robust designs have been constructed by, among others, Fang and Wiens (2000), who computed exact designs by simulated annealing. Here we construct optimal designs for heteroscedastic quantile regression problems and also take a different approach to the implementation: we obtain exact optimal values $\{\xi_{*,i}\}$ and then implement the designs as at (9).

For a fixed variance function and a discrete design space we seek a design $\xi_*$ minimizing (14). We illustrate the method in the case of approximate straight line models, $f(x_i) = (1, x_i)'$, and suppose that the design space χ consists of N points in [−1, 1]. The space χ is symmetric in that if $\chi = (x_1, \ldots, x_N)'$ (with $-1 = x_1 < \cdots < x_N = 1$) and $\chi_\pi$ denotes the reversal $(x_N, \ldots, x_1)'$, then $\chi_\pi = -\chi$. We consider symmetric designs, that is, designs for which $\xi = (\xi_1, \ldots, \xi_N)'$, with $\xi_i = \xi(x_i)$, satisfies $\xi(x_i) = \xi(-x_i)$. We also assume a symmetric but arbitrary variance function $\sigma_i = \sigma(|x_i|)$.

The designs are obtained by variational arguments followed by a constrained numerical minimization; the details are in the Appendix. See Figure 1 for representative plots of the designs, scaled so as to have unit area. In these plots the bullets below the horizontal axes are the locations of n = 10 design points, implemented as at (9). In the case of homoscedasticity (plots (e)–(h)) the designs for very small ν are close in nature to their nonrobust counterparts, placing point masses at ±1. As ν increases these replicates spread out into clusters near ±1 and, depending upon the variance function, possibly near 0 as well. The limiting behavior as ν → 1 is studied in Section 4.
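The authors' variational construction is in their Appendix and MATLAB software and is not reproduced here. As a much cruder stand-in, the Python sketch below (NumPy assumed; all names hypothetical) minimizes (14) over symmetric straight-line designs by random transfers of mass between symmetric pairs of points, accepting only improvements. It starts from the uniform design and so can only improve on it, but it may stop at a local optimum. σ(x) is rescaled so that $N^{-1}\sum_i \sigma^2(x_i) = 1$, an assumption matching the choice σ0 = 1.

```python
import numpy as np


def loss14(F, xi, sigma, nu):
    """L_nu(xi | sigma) of (14) with sigma_0 = 1; the T-matrices sum over xi_i > 0."""
    A = F.T @ F / F.shape[0]
    on = xi > 0
    Fs, w, r = F[on], xi[on], xi[on] / sigma[on]
    T01_inv = np.linalg.inv(Fs.T @ (r[:, None] * Fs))
    T0 = T01_inv @ (Fs.T @ (w[:, None] * Fs)) @ T01_inv
    T2 = T01_inv @ (Fs.T @ ((r ** 2)[:, None] * Fs)) @ T01_inv
    return (1 - nu) * np.trace(A @ T0) + nu * np.max(np.linalg.eigvals(A @ T2).real)


def full_design(v, N):
    """Expand half-design weights v (left half, then the center if N is odd)."""
    half = N // 2
    xi = np.concatenate([v[:half], v[half:half + N % 2], v[:half][::-1]])
    return xi / xi.sum()


def symmetric_search(x, sigma, nu, iters=4000, seed=0):
    """Random-transfer minimization of (14) over symmetric designs on sorted x."""
    x = np.asarray(x, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    sigma = sigma / np.sqrt(np.mean(sigma ** 2))  # N^{-1} sum sigma^2 = 1
    N = x.size
    F = np.column_stack([np.ones(N), x])         # straight line: f(x) = (1, x)'
    rng = np.random.default_rng(seed)
    v = np.ones(N // 2 + N % 2)                  # start from the uniform design
    best = loss14(F, full_design(v, N), sigma, nu)
    for _ in range(iters):
        i, j = rng.choice(v.size, size=2, replace=False)
        frac = 1.0 if rng.random() < 0.1 else rng.random()
        trial = v.copy()
        moved = frac * trial[i]
        trial[i] -= moved                        # move mass from pair i to pair j
        trial[j] += moved
        try:
            val = loss14(F, full_design(trial, N), sigma, nu)
        except np.linalg.LinAlgError:
            continue                             # support too small: T_{0,1} singular
        if val < best:
            v, best = trial, val
    return full_design(v, N), best
```

The returned weights can then be turned into n design points with rule (9).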

3.2 Continuous Design Spaces

The continuous case requires special consideration. Rather than AMSE at (12) we instead use the integrated mean squared error

$$\mathrm{IMSE} = \lim_n \int_\chi E\Big[\big\{\sqrt{n}\,\big(\hat{Y}_n(x) - Y_\tau(x)\big)\big\}^2\Big]\,dx,$$

together with $A = \int_\chi f(x) f'(x)\,dx$, and obtain

$$\mathrm{IMSE} = \frac{\tau(1-\tau)}{g_\varepsilon^2(0)}\operatorname{tr}\big(A P_1^{-1} P_0 P_1^{-1}\big) + \mu_0' P_1^{-1} A P_1^{-1}\mu_0 + \int_\chi \delta_0^2(x)\,dx.$$

[Figure 2: sixteen panels, (a)–(p), each showing the design density m(x) against x ∈ [−1, 1].]

Figure 2. Minimax (over $\mathcal{F}_0$) design densities for heteroscedastic quadratic regression on [−1, 1]. Columns 1–4 use ν = 0.05, 0.35, 0.65, 0.95, respectively; rows 1–4 use σ(x) ∝ $(1 + |x|)^{-1}$, 1, 0.2 + |x|, $1 + (x/2)^2$, respectively. The bullets on the horizontal axes are the locations of n = 10 design points, implemented as in (9).

In order that the maximum IMSE be finite, it is necessary that the design measure be absolutely continuous. That this should be so is intuitively clear: if $\xi_\infty$ in (10) places positive mass on sets of Lebesgue measure zero, such as individual points, then $\delta_0$ may be chosen arbitrarily large on such sets without altering its membership in $\mathcal{F}_0$, and one can do this in such a way as to drive IMSE beyond all bounds, through (10a). A formal proof may be based on that of Lemma 1 in Heo, Schmuland, and Wiens (2001).

When implementing continuous designs we discretize; for instance, when there is only one covariate we employ (9). As a referee has pointed out, this might result in an unbounded IMSE along particularly pathological sequences $\{\delta_n\}$. A possible alternative, which we do not illustrate here since it is unlikely to find favor with practitioners, is to choose design points randomly from the optimal design measure; in the parlance of game theory this would thwart the intentions of a malevolent Nature, which then cannot anticipate the design.

In the same vein Bischoff (2010), in the context of discretized, absolutely continuous, lack-of-fit designs as proposed by Wiens (1991) and Biedermann and Dette (2001), criticized the very rich class of alternatives, analogous to (8b), used by those authors. Bischoff suggested using a smaller class of alternatives; here we are, however, in accord with Wiens (1992), who states, “Our attitude is that an approximation to a design which is robust against more realistic alternatives is preferable to an exact solution in a neighbourhood which is unrealistically sparse.”

We write m(x) for the density of ξ when dealing with continuous design spaces and take $\int_\chi \sigma^2(x)\,dx \le \sigma_0^2$ (= 1, as in the discrete case).

Theorem 3. For a continuous design space χ define $T_0$ and $T_2$ as at (13), with

$$T_{0,0} = \int_\chi f(x) f'(x)\,m(x)\,dx \quad\text{and}\quad T_{0,k} = \int_\chi f(x) f'(x)\left(\frac{m(x)}{\sigma(x)/\sigma_0}\right)^{k} dx, \quad k = 1, 2.$$

Then the maximum IMSE is given by Equation (14).
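Theorem 3 can be evaluated numerically by replacing each integral with a quadrature rule. A minimal NumPy sketch using the midpoint rule (the function name, the grid size `M`, and the choice of quadrature are assumptions; σ0 = 1 and σ(x) > 0 are assumed):

```python
import numpy as np


def loss_continuous(f, m, sigma, nu, lo=-1.0, hi=1.0, M=4000):
    """L_nu(xi | sigma) of (14) for a continuous design space [lo, hi] (Theorem 3).

    f     : maps an array x to the matrix whose rows are f'(x)
    m     : vectorized design density; sigma : vectorized sigma(x) > 0
    sigma_0 = 1 is assumed, and every integral uses the midpoint rule on M cells.
    """
    h = (hi - lo) / M
    x = lo + h * (np.arange(M) + 0.5)             # cell midpoints
    Fx, mx, sx = f(x), m(x), sigma(x)

    def integral(weight):
        # midpoint approximation of int f(x) f'(x) weight(x) dx
        return Fx.T @ (weight[:, None] * Fx) * h

    A = integral(np.ones(M))
    r = mx / sx                                   # m(x) / (sigma(x) / sigma_0)
    T01_inv = np.linalg.inv(integral(r))
    T0 = T01_inv @ integral(mx) @ T01_inv
    T2 = T01_inv @ integral(r ** 2) @ T01_inv
    return (1 - nu) * np.trace(A @ T0) + nu * np.max(np.linalg.eigvals(A @ T2).real)


def quad(x):
    # quadratic regression: f(x) = (1, x, x^2)'
    return np.column_stack([np.ones_like(x), x, x ** 2])
```

For example, with σ² ≡ 1/2 on [−1, 1] (so that ∫σ² = 1) and the uniform density m ≡ 1/2, one finds $T_0 = T_2 = A^{-1}$, giving a loss of 3(1 − ν) + ν.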

As an example we minimize IMSE for approximate quadratic regression, that is, $f(x) = (1, x, x^2)'$, and a fixed variance function $\sigma^2(x)$, over the design space χ = [−1, 1]. Similar problems, assuming homoscedasticity, were studied previously by Shi, Ye, and Zhou (2003) using methods of nonsmooth optimization, and by Daemi and Wiens (2013) following the methods used here and outlined in the Appendix.

We show in the Appendix that the minimizing density is of the form

$$m(x; a) = \left(\frac{q_1(x)\,\sigma(x) + q_2(x)}{a_{00} + q_3(x)\,\sigma(x)}\right)^{+}, \tag{15}$$

for polynomials $q_j(x) = a_{0j} + a_{2j}x^2 + a_{4j}x^4$, j = 1, 2, 3. The 10 constants $a_{ij}$ forming a are chosen to minimize the loss $L_\nu(\xi|\sigma)$ at (14) over a, subject to $\int_{-1}^{1} m(x; a)\,dx = 1$. Some examples are illustrated in Figure 2. Again there is a pronounced increase in the spreading out of the mass as ν increases, and again under homoscedasticity these masses are initially concentrated near ±1 and 0, as in the nonrobust case. It is evident from the plots in the rightmost panels of Figure 2 that, as ν → 1, the density m(x; a) becomes proportional to σ(x), a phenomenon explained in the following section.
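The parametric form (15) is easy to evaluate once the constants are known; choosing them by minimizing (14) is the hard part and is not attempted here. A NumPy sketch (hypothetical names) that evaluates (15) on a midpoint grid over [−1, 1] and normalizes it. Since (15) scales linearly with $(q_1, q_2)$, dividing by the integral is the same as rescaling those two polynomials. It assumes the denominator stays positive on the grid.

```python
import numpy as np


def density_15(a00, coef, sigma, M=2000):
    """Evaluate the form (15) on a midpoint grid over [-1, 1] and normalize it.

    coef  : 3 x 3 array; column j - 1 holds (a_0j, a_2j, a_4j) for q_j, j = 1, 2, 3
    a00   : the constant in the denominator; sigma : vectorized sigma(x)
    Assumes a00 + q_3(x) sigma(x) > 0 on the grid.
    """
    h = 2.0 / M
    x = -1.0 + h * (np.arange(M) + 0.5)
    s = sigma(x)
    powers = np.column_stack([np.ones(M), x ** 2, x ** 4])   # (1, x^2, x^4)
    q = powers @ np.asarray(coef, dtype=float)                # columns q_1, q_2, q_3
    raw = np.maximum((q[:, 0] * s + q[:, 1]) / (a00 + q[:, 2] * s), 0.0)  # positive part
    return x, raw / (raw.sum() * h)
```

Setting only $a_{01} \ne 0$ (so $q_1$ is constant and $q_2 = q_3 = 0$) gives a density proportional to σ(x), the limiting shape described above as ν → 1.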

4. BIAS MINIMIZING DESIGNS

The following result is quite elementary, but since we use it repeatedly we give it a formal statement and proof.

Proposition 1. (i) Suppose that χ is discrete, that the function p(x) is defined on $\chi_0 \subset \chi$, and that $M_q := \sum_{x_i \in \chi_0} q(x_i) f(x_i) f'(x_i)$ exists for q = p, q = p² and q = 1 (with $1(x_i) \equiv 1$), and is invertible for q = p and q = 1. Then, under the ordering “$\succeq$” with respect to positive semidefiniteness,

$$M_p^{-1} M_{p^2} M_p^{-1} \succeq M_1^{-1}. \tag{16}$$

(ii) Suppose that χ is continuous, that the function p(x) is defined on $\chi_0 \subset \chi$, and that $M_q := \int_{\chi_0} f(x) f'(x)\,q(x)\,dx$ exists for q = p, q = p² and q = 1, and is invertible for q = p and q = 1. Then (16) holds.
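One way to see (16), which may differ from the authors' proof, is as a matrix Cauchy–Schwarz inequality. The block matrix $\begin{pmatrix} M_{p^2} & M_p \\ M_p & M_1 \end{pmatrix} = \sum_i \binom{p_i f_i}{f_i}\binom{p_i f_i}{f_i}'$ is positive semidefinite, and its Schur complement gives $M_{p^2} \succeq M_p M_1^{-1} M_p$. This rearranges to (16). The NumPy sketch below (hypothetical name) checks the inequality numerically through the smallest eigenvalue of the difference.

```python
import numpy as np


def prop1_gap(F, p):
    """Smallest eigenvalue of M_p^{-1} M_{p^2} M_p^{-1} - M_1^{-1}; (16) says it is >= 0.

    F : rows f'(x_i) for x_i in chi_0; p : values p(x_i).
    """
    F, p = np.asarray(F, dtype=float), np.asarray(p, dtype=float)

    def M(q):
        return F.T @ (q[:, None] * F)             # sum_i q(x_i) f(x_i) f'(x_i)

    Mp_inv = np.linalg.inv(M(p))
    D = Mp_inv @ M(p ** 2) @ Mp_inv - np.linalg.inv(M(np.ones_like(p)))
    return np.linalg.eigvalsh((D + D.T) / 2).min()  # symmetrize against rounding
```

Equality holds when p is constant, in which case the gap is zero up to rounding.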

In discrete design spaces we define $A_\xi = \sum_{\xi_i > 0} f(x_i) f'(x_i)$. It then follows from Proposition 1 that $T_2 \succeq A_\xi^{-1}$; note as well that $A_\xi^{-1} \succeq (NA)^{-1}$. Together these imply that

$$L_{\nu=1}(\xi|\sigma) = \mathrm{ch}_{\max}(A T_2) \ge \mathrm{ch}_{\max}\big(A A_\xi^{-1}\big) \ge 1/N. \tag{17}$$

Motivated by the remark of Box and Draper (1959) quoted in Section 1 of this article, we note that, if the experimenter seeks robustness only against errors due to bias (so that ν = 1), whether arising from a misspecified response model or a particular variance function $\sigma^2(\cdot)$, then the maximum bias is minimized by $\xi_i = \sigma(x_i)/\sum_{i=1}^N \sigma(x_i)$, since then $T_2 = A_\xi^{-1}$ and the lower bound in (17) is attained. Similarly, in a continuous design space the maximum bias is minimized by $m(x) = \sigma(x)/\int_\chi \sigma(x)\,dx$; for this we use $A_m = \int_{m(x) > 0} f(x) f'(x)\,dx$ in place of $A_\xi$ and obtain a lower bound of 1 in (17).
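The claim that weights proportional to σ(x_i) attain the bound 1/N in (17) is easy to check numerically. In the NumPy sketch below (hypothetical name), the bias term $\mathrm{ch}_{\max}(AT_2)$ is computed directly. Because $T_2$ is unchanged when σ is rescaled, σ0 plays no role in this term. The variance function 0.2 + |x| is one of those used in Figure 1.

```python
import numpy as np


def bias_term(F, xi, sigma):
    """ch_max(A T_2) from (14), i.e. L_{nu=1}(xi | sigma), on a discrete design space."""
    F, xi, sigma = (np.asarray(v, dtype=float) for v in (F, xi, sigma))
    A = F.T @ F / F.shape[0]
    on = xi > 0
    Fs, r = F[on], xi[on] / sigma[on]
    T01_inv = np.linalg.inv(Fs.T @ (r[:, None] * Fs))
    T2 = T01_inv @ (Fs.T @ ((r ** 2)[:, None] * Fs)) @ T01_inv
    return np.max(np.linalg.eigvals(A @ T2).real)


x = np.linspace(-1, 1, 101)
F = np.column_stack([np.ones_like(x), x])        # straight line: f(x) = (1, x)'
s = 0.2 + np.abs(x)
best = bias_term(F, s / s.sum(), s)              # design proportional to sigma
unif = bias_term(F, np.full(101, 1 / 101), s)    # uniform design, same sigma
```

Here `best` equals 1/101 up to rounding, while `unif` is strictly larger because this σ is not constant.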

If the form of the variance function is in doubt, then a minimax approach dictates taking a further maximum over a class of such functions. We consider the class $\mathcal{V}_0 = \{\sigma_\xi^2(\cdot|r) \mid r \in (-\infty, \infty)\}$ of variance functions given by

$$\sigma_\xi(x|r) = \begin{cases} c_r\,\xi^{r/2}(x)\,I(\xi(x) > 0), & \chi \text{ discrete}, \\ c_r\,m^{r/2}(x)\,I(m(x) > 0), & \chi \text{ continuous}; \end{cases} \tag{18}$$

$c_r$ is the required constant of proportionality, determined by, for example, $N^{-1}\sum_{\xi_i > 0}\sigma_\xi^2(x_i|r) = 1$. In the discrete case define

$$S_0 = \sum_{\xi_i > 0} f(x_i) f'(x_i)\,\xi_i \quad\text{and}\quad S_k = S_k(r) = \sum_{\xi_i > 0} f(x_i) f'(x_i)\,\xi_i^{k(1 - r/2)} \quad \text{for } k = 1, 2.$$

Note that $S_0 = S_1(0) = S_2(1)$. If $\sigma^2(\cdot) \in \mathcal{V}_0$ then $L_{\nu=1}(\xi|\sigma) = \mathrm{ch}_{\max}\big(A S_1^{-1}(r) S_2(r) S_1^{-1}(r)\big)$, which by Proposition 1 is at least $\mathrm{ch}_{\max}(A A_\xi^{-1})$; this in turn is minimized by the uniform design $\xi_*$, with $\xi_{*,i} \equiv 1/N$. Thus this approximate design, which we implement as at (9), is minimax with respect to bias over $\mathcal{V}_0$. Similarly, in a continuous design space the minimax bias design is the continuous uniform: $m_*(x) \equiv 1/\mathrm{VOL}(\chi)$.

This discussion has revealed why the minimax designs exhibited in Figures 1 and 2 become proportional to σ(x) as ν → 1, and for ν = 1 are uniform if the maximization is also carried out over $\mathcal{V}_0$. For ν < 1, if the form of σ(·) is known then this knowledge can be used to increase the efficiency of the design, relative to uniformity. In the next section we show that, if σ(·) is unknown but is allowed to range over $\mathcal{V}_0$, then uniform (on their support) designs are again minimax with respect to MSE.

5. MSE MINIMIZING DESIGNS

Under (18) the loss (14), maximized over (8a), is

$$L_\nu(\xi|r) = (1-\nu)\,c_r^2\operatorname{tr}\big(A S_1^{-1}(r) S_0 S_1^{-1}(r)\big) + \nu\,\mathrm{ch}_{\max}\big(A S_1^{-1}(r) S_2(r) S_1^{-1}(r)\big).$$

Several cases are of interest for fixed r. The case r = 0 corresponds to homoscedasticity. That for r = 2 is treated in Kong and Wiens (2014). If r = 1 (a case that turns out to be least favorable; see the proof of Lemma 1), then

$$L_\nu(\xi|r=1) = (1-\nu)\,N\operatorname{tr}\big(A S_1^{-1}(1) S_0 S_1^{-1}(1)\big) + \nu\,\mathrm{ch}_{\max}\big(A S_1^{-1}(1) S_0 S_1^{-1}(1)\big).$$

In this case the optimal approximate design $\xi_*$ is again uniform on all of χ: $\xi_{*,i} \equiv 1/N$. And again in the parlance of game theory, the experimenter's optimal reply to Nature's strategy of placing the variances proportional to the design weights is to design in such a way that this variance structure is in fact homoscedastic.

To see that $\xi_*$ is uniform, note that at this design we have $S_0 = S_2(1) = A$ and $S_1(1) = \sqrt{N}A$, so that $S_1^{-1}(1) S_0 S_1^{-1}(1) = (NA)^{-1}$, and it suffices to note that for any other design ξ, by Proposition 1, $S_1^{-1}(1) S_0 S_1^{-1}(1) \succeq A_\xi^{-1} \succeq (NA)^{-1}$. Similarly, in the continuous case the uniform design, with density $m_*(x)$, minimizes $L_\nu(\xi|r=1)$.

To extend these optimality properties of uniform designs to all of $\mathcal{V}_0$, we first consider the discrete case, and define $L_\nu(\xi) = \max_r L_\nu(\xi|r)$. By the following lemma, a minimax design is necessarily uniform on its support.

Lemma 1. If ξ is a design with k-point support $\{x_{i_1}, \ldots, x_{i_k}\} \subset \chi$ (k ≤ N), placing mass $\xi_{i_j}$ at $x_{i_j}$, and $\xi_k$ is the design placing mass 1/k at each point $x_{i_j}$, then $A_\xi = \sum_{j=1}^k f(x_{i_j}) f'(x_{i_j})$ and

$$L_\nu(\xi) \ge L_\nu(\xi_k) = (1-\nu)\,N\operatorname{tr}\big(A A_\xi^{-1}\big) + \nu\,\mathrm{ch}_{\max}\big(A A_\xi^{-1}\big).$$

[Figure 3: design points for polynomial degrees p = 1, ..., 8 (vertical axis, “degree”) plotted over x ∈ [−1, 1]; efficiencies shown at the right: 0.754, 0.778, 0.796, 0.803, 0.811, 0.812, 0.807, 0.799.]

Figure 3. Minimax compound, uniform designs $\xi_p$ minimizing $\max_{\mathcal{F}_0, \mathcal{V}_0}$ AMSE for polynomial regression of degrees p = 1, ..., 8; n = 41, N = 101, ν = 0.5. Bullets indicate design points; the bottom line is the n-point implementation of the design $\xi_{*,i} \equiv 1/N$. Efficiencies $L_\nu(\xi_p)/L_\nu(\xi_*)$ are given at the right.
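The inner maximization in $L_\nu(\xi) = \max_r L_\nu(\xi|r)$ can be explored numerically. The NumPy sketch below (hypothetical name) evaluates $L_\nu(\xi|r)$ from (18) and the $S_k(r)$, with $c_r^2 = N/\sum_{\xi_i > 0}\xi_i^r$ from the normalization stated after (18). A short calculation from (18) shows that, for a design uniform on its support, the value does not depend on r. For the uniform design on all N points it therefore equals (1 − ν)p + ν/N. A finite grid of r values stands in for the maximum over all real r.

```python
import numpy as np


def loss_r(F, xi, nu, r):
    """L_nu(xi | r) under the variance class (18), discrete design space.

    F : N x p rows f'(x_i) over all of chi; xi : design weights. The constant
    c_r^2 is fixed by N^{-1} sum_{xi_i > 0} sigma_xi^2(x_i | r) = 1.
    """
    F, xi = np.asarray(F, dtype=float), np.asarray(xi, dtype=float)
    N = F.shape[0]
    A = F.T @ F / N
    on = xi > 0
    Fs, w = F[on], xi[on]

    def S(power):
        return Fs.T @ ((w ** power)[:, None] * Fs)  # sum f f' xi^power over support

    c2 = N / np.sum(w ** r)
    S1_inv = np.linalg.inv(S(1 - r / 2))            # S_1(r)
    var = c2 * np.trace(A @ S1_inv @ S(1.0) @ S1_inv)                   # uses S_0
    bias = np.max(np.linalg.eigvals(A @ S1_inv @ S(2 - r) @ S1_inv).real)  # S_2(r)
    return (1 - nu) * var + nu * bias


x = np.linspace(-1, 1, 51)
F = np.column_stack([np.ones_like(x), x])
r_grid = np.linspace(-2, 4, 13)
w = (1 + x ** 2) / np.sum(1 + x ** 2)               # a nonuniform design
worst_nonuniform = max(loss_r(F, w, 0.5, r) for r in r_grid)
```

Consistent with Lemma 1, `worst_nonuniform` exceeds the r-free value 0.5·2 + 0.5/51 attained by the uniform design.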

By Lemma 1 the search for minimax designs reduces to searching for support points on which the design is to be uniform. Since $A_\xi$ increases, in the sense of positive semidefiniteness, as k increases, a minimax design $\xi_*$ necessarily has maximum support size. Among approximate designs the optimal choice is thus $\xi_{*,i} \equiv 1/N$, i = 1, ..., N. Among exact designs $\xi_*$ must have support size $k_* = \min(n, N)$; the support points $\{x^*_{i_1}, \ldots, x^*_{i_{k_*}}\}$ are those minimizing

$$L_\nu(\xi_*) = (1-\nu)\,N\operatorname{tr}\big(A A_{k_*}^{-1}\big) + \nu\,\mathrm{ch}_{\max}\big(A A_{k_*}^{-1}\big), \tag{19}$$

with $A_{k_*} = \sum_{j=1}^{k_*} f(x^*_{i_j}) f'(x^*_{i_j})$. We are then seeking a compound optimal design, for which problems some general theory has been furnished by Cook and Wong (1994); in our case there is, however, the additional restriction to uniformity.

Example 5.1. In the case of straight line models and symmetric designs on a symmetric interval, $A_{k_*} = \mathrm{diag}\left(k_*, \sum_{j=1}^{k_*} x_{i_j}^{*2}\right)$.

Both components of $L_\nu(\xi_*)$ are decreased by progressively including in the support the remaining design points of largest absolute value, so as to "increase" $A_{k_*}$. If $n$ is odd, then 0 must be in the support; the remaining points (all points if $n$ is even) are the $2 \times \min([n/2], [N/2])$ symmetrically placed design points of largest absolute value. If $n$ is a multiple of $N$, say $n = mN$, then this design is replicated $m$ times. If $n = mN + t$ for $0 < t < N$, then an exact uniform design is not attainable if $m > 0$. A possible implementation is to place $m$ observations at each of the $N$ points in the design space, and to append to this the $2[t/2]$ symmetrically placed design points of largest absolute value (and 0, if $t$ is odd).
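The Example 5.1 construction can be sketched in a few lines. This is a minimal illustration, assuming a sorted design space symmetric about 0 with $N$ odd, so that 0 is a design point; the names `straight_line_exact_design` and `line_loss` are hypothetical, and `line_loss` evaluates (19) only for the straight-line case, where $A = \mathrm{diag}(1, \gamma_0)$ with $\gamma_0$ the mean of the squared design points.

```python
import numpy as np
from itertools import combinations


def straight_line_exact_design(n, grid):
    """Observation counts per grid point for the Example 5.1 construction.

    Assumes `grid` is sorted and symmetric about 0 with N odd, so that the
    middle point is x = 0.
    """
    grid = np.asarray(grid, dtype=float)
    N = len(grid)
    m, t = divmod(n, N)                 # n = m*N + t
    counts = np.full(N, m)              # m replicates of the full uniform design
    for j in range(t // 2):             # add the 2[t/2] extreme points, in symmetric pairs
        counts[j] += 1
        counts[N - 1 - j] += 1
    if t % 2 == 1:                      # odd remainder: also add x = 0
        assert N % 2 == 1 and np.isclose(grid[N // 2], 0.0)
        counts[N // 2] += 1
    return counts


def line_loss(support, grid, nu):
    """Criterion (19) for f(x) = (1, x)' with a symmetric support and uniform weights."""
    grid, support = np.asarray(grid, float), np.asarray(support, float)
    gamma0 = np.mean(grid ** 2)                  # A = diag(1, gamma0) on a symmetric grid
    k, s = len(support), np.sum(support ** 2)    # A_{k*} = diag(k*, sum of squared points)
    roots = np.array([1.0 / k, gamma0 / s])      # eigenvalues of A A_{k*}^{-1}
    return (1 - nu) * len(grid) * roots.sum() + nu * roots.max()


grid = np.linspace(-1, 1, 11)
print(straight_line_exact_design(25, grid))  # n = 2*11 + 3 -> [3 2 2 2 2 3 2 2 2 2 3]

# Brute-force check for n = 5: the point 0 plus the best two of the five symmetric pairs.
pairs = [(grid[j], grid[10 - j]) for j in range(5)]
best = min(combinations(range(5), 2),
           key=lambda js: line_loss([0.0] + [v for j in js for v in pairs[j]], grid, 0.5))
print(best)  # (0, 1): the pairs +-1 and +-0.8, as the construction prescribes
```

Because, for fixed $k_*$, both eigenvalues of $A A_{k_*}^{-1}$ fall as $\sum x_{i_j}^{*2}$ grows, the brute-force search and the extreme-point rule agree here.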

Example 5.2. We have found an exchange algorithm to be very effective at constructing compound designs minimizing (19). This has been carried out for polynomial regression over $[-1, 1]$, with the restriction to symmetric designs. See Figure 3, in which some typical cases are displayed and compared with the approximate design $\xi_{*,i} \equiv 1/N$, implemented as at (9). The efficiencies given in the figure have been found to be quite stable over other choices of $n$, $N$, and $\nu$.
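The exchange algorithm itself is not spelled out in the text, so the following is only a generic sketch of the idea for symmetric supports: start from a random set of symmetric pairs and repeatedly swap one included pair for an excluded one whenever this lowers (19), stopping when no single swap helps. The names `loss19` and `exchange_symmetric`, the random start, and the first-improvement rule are our assumptions; a search of this kind finds a local optimum, so several random starts would be prudent.

```python
import numpy as np


def loss19(support, grid, nu, degree):
    """Criterion (19): uniform weights on `support`, polynomial f(x) = (1, x, ..., x^degree)'."""
    F = np.vander(grid, degree + 1, increasing=True)
    A = F.T @ F / len(grid)                       # A = N^{-1} sum_i f(x_i) f'(x_i)
    Fk = np.vander(support, degree + 1, increasing=True)
    M = A @ np.linalg.inv(Fk.T @ Fk)              # A A_{k*}^{-1}
    return (1 - nu) * len(grid) * np.trace(M) + nu * np.linalg.eigvals(M).real.max()


def exchange_symmetric(grid, k, nu, degree, seed=None):
    """Pairwise exchange over symmetric k-point supports of a sorted symmetric grid (N odd)."""
    grid = np.sort(np.asarray(grid, dtype=float))
    N = len(grid)
    n_pairs, centre = divmod(k, 2)                # odd k: include the middle point x = 0
    all_pairs = set(range(N // 2))                # pair j is (grid[j], grid[N - 1 - j])
    rng = np.random.default_rng(seed)
    chosen = set(rng.choice(N // 2, size=n_pairs, replace=False).tolist())

    def support(ids):
        pts = [grid[j] for j in ids] + [grid[N - 1 - j] for j in ids]
        return np.array(pts + ([grid[N // 2]] if centre else []))

    best = loss19(support(chosen), grid, nu, degree)
    improved = True
    while improved:                               # stop when no single swap helps
        improved = False
        for out in sorted(chosen):
            for inn in sorted(all_pairs - chosen):
                trial = (chosen - {out}) | {inn}
                value = loss19(support(trial), grid, nu, degree)
                if value < best - 1e-12:          # accept the first strict improvement
                    chosen, best, improved = trial, value, True
                    break
            if improved:
                break
    return np.sort(support(chosen)), best


pts, val = exchange_symmetric(np.linspace(-1, 1, 21), k=7, nu=0.5, degree=2, seed=1)
```

For the straight-line case the search necessarily ends at the extreme-point support of Example 5.1, since any other support admits a swap that increases $\sum x_{i_j}^{*2}$.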

In a completely analogous manner we find that the continuous uniform design, with density $m_*(x)$, minimizes the maximum of $L_\nu(\xi \mid r)$ over �0 and �0.

6. CASE STUDY: ROBUST DESIGN IN GROWTH CHARTS

Growth charts, also known as reference centile charts, were first conceived by Quetelet in the 19th century, and are commonly used to screen the measurements from an individual subject in the context of population values; to this end they are used by medical practitioners, and others, to monitor people's growth. A typical growth chart consists of a family of smooth curves representing a few selected quantiles of the distribution of some physical measurements (height, weight, head circumference, etc.) of the reference population as a function of age. Extreme measurements on the growth chart suggest that the subject should be studied further, to confirm or to rule out an unusual underlying physical condition or disease. The conventional method of constructing growth charts is to get the empirical quantiles of the measurements at a series of time points, and to then fit a smooth polynomial curve using the empirical quantiles; see Hamill et al. (1979). In recent years, a number of different methods have been developed in the medical statistics literature; see Wei and He (2006) for a review.

[Figure 4: three panels against age (0 to 18 years): (a) log height, N = 44207; (b) frequency; (c) conditional quantiles for τ = 0.05, 0.25, 0.5, 0.75, 0.95.]

Figure 4. (a) log(height) versus age in full dataset; (b) frequencies of ages; (c) conditional quantile curves computed from full dataset.

[Figure 5: two panels of cubic B-spline basis functions on [0, 18], with values between 0 and 1.]

Figure 5. Cubic splines for growth study. (a) "Full" spline basis is of dimension 16; (b) "Reduced" basis of dimension 12 has fewer and different internal knots.

A recent method proposed by Wei et al. (2006) is to estimate a family of conditional quantile functions by solving nonparametric quantile regression. In particular, suppose that we want to construct the growth charts for height. As is common practice in pediatrics, we will take the logarithm of height ($Y$, in centimeters) as response, and age ($x$, in years) as the covariate. We consider the nonparametric location/scale model

$$Y = \mu(x) + \sigma(x)\varepsilon,$$

where the location function $\mu(x)$ and scale function $\sigma(x)$ satisfy certain smoothness conditions. Given data $(y_i, x_i)$, $i = 1, \ldots, n$, the $\tau$th quantile curve can be estimated by minimizing $\sum_{i=1}^{n} \rho_\tau(y_i - \mu(x_i))$.

For growth charts, it is convenient to parameterize the conditional quantile functions as linear combinations of a few basis functions. Particularly convenient for this purpose are cubic B-splines. Given a choice of knots for the B-splines, estimation of the growth charts is a straightforward exercise in parametric linear regression.
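One such parametric fit can be computed as a linear program. The sketch below assumes SciPy's `linprog` with the HiGHS backend and the usual check function $\rho_\tau(u) = u\{\tau - I(u < 0)\}$: splitting each residual $y_i - f'(x_i)\theta$ into nonnegative parts $u_i - v_i$ turns $\sum \rho_\tau$ into the linear objective $\tau\sum u_i + (1-\tau)\sum v_i$. For a growth chart, `X` would be the B-spline design matrix; the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def quantile_regression(X, y, tau):
    """Minimize sum_i rho_tau(y_i - x_i' theta) as a linear program.

    Variables are (theta, u, v) with X theta + u - v = y and u, v >= 0.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)   # theta free, u and v nonnegative
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
theta = quantile_regression(np.ones((5, 1)), y, 0.5)  # recovers the sample median, 3, up to solver tolerance
```

With an intercept-only design the fitted value is the empirical $\tau$-quantile, which is a convenient sanity check before moving to a spline basis.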

The data (see Figure 4, and the detailed description in Pere (2000)) were collected retrospectively from health centres and schools in Finland. To construct the conditional quantile curves in Figure 4(c), for ages from birth to 18 years, we used the entire dataset of size 44,207 and the internal knot sequence

$$\{0.2, 0.5, 1.0, 1.5, 2.0, 5.0, 8.0, 10.0, 11.5, 13.0, 14.5, 16.0\}. \tag{20}$$

This sequence was also used by Wei et al. (2006); see also Kong and Mizera (2012). Spacing of the internal knots is dictated by the need for more flexibility during infancy and in the pubertal growth spurt period. Linear combinations of these functions provide a simple and quite flexible model for the entire curve over $[0, 18]$. Denoting the selected B-splines by $b_j(x)$, $j = 1, \ldots, p = 16$, we obtain the model (1) with $\mu(x) = f'(x)\theta$ for $f(x) = (b_1(x), \ldots, b_p(x))'$ and $\theta = (\theta_1, \ldots, \theta_p)'$.
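To make the dimensions concrete, the sketch below evaluates clamped cubic B-spline bases for the knot sequences (20) and (21) with the Cox–de Boor recursion, in plain NumPy. Clamping the boundary knots at 0 and 18 (each repeated four times) is our assumption, though it is the standard choice and reproduces $p = 12 + 4 = 16$ for (20) and $p = 8 + 4 = 12$ for the reduced basis; `bspline_basis` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def bspline_basis(x, internal_knots, lo=0.0, hi=18.0, degree=3):
    """All clamped B-splines of the given degree evaluated at x.

    Returns an array of shape (len(x), len(internal_knots) + degree + 1).
    """
    x = np.asarray(x, dtype=float)
    t = np.concatenate([[lo] * (degree + 1), internal_knots, [hi] * (degree + 1)])
    # Degree 0: indicators of [t_i, t_{i+1}); the last non-empty interval is closed at hi.
    B = np.zeros((len(x), len(t) - 1))
    for i in range(len(t) - 1):
        if t[i] < t[i + 1]:
            B[:, i] = (x >= t[i]) & (x < t[i + 1])
    last = np.flatnonzero(t[:-1] < t[1:])[-1]
    B[x == hi, last] = 1.0
    # Cox-de Boor recursion; terms with a zero-width denominator are dropped (0/0 = 0).
    for k in range(1, degree + 1):
        Bk = np.zeros((len(x), len(t) - k - 1))
        for i in range(len(t) - k - 1):
            left = t[i + k] - t[i]
            right = t[i + k + 1] - t[i + 1]
            if left > 0:
                Bk[:, i] += (x - t[i]) / left * B[:, i]
            if right > 0:
                Bk[:, i] += (t[i + k + 1] - x) / right * B[:, i + 1]
        B = Bk
    return B


knots_20 = [0.2, 0.5, 1.0, 1.5, 2.0, 5.0, 8.0, 10.0, 11.5, 13.0, 14.5, 16.0]
knots_21 = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
ages = np.linspace(0.0, 18.0, 1801)
F_full = bspline_basis(ages, knots_20)     # shape (1801, 16)
F_reduced = bspline_basis(ages, knots_21)  # shape (1801, 12)
```

The rows of each matrix are the vectors $f'(x)$; since clamped B-splines sum to one on $[0, 18]$, every row sums to 1, which is a quick check on the construction.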

However, due to uncertainty in the selection of knots and to other approximations underlying the model, the designer might well seek protection against departures of the form (3). In this study we will explore how to sample from the available ages to robustly estimate the growth charts of heights.

In computing and assessing the designs we supposed that the experimenter would use the internal knot sequence

$$\{2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0\}; \tag{21}$$

one measure of design quality is then the accuracy with which the quantile curves in Figure 4(c), using the "true" model defined by (20), are recovered from the much smaller designed sample fitted using (21).

[Figure 6: three panels of design frequencies (0 to 30) against age (0 to 18 years).]

Figure 6. Designs (uniform design not shown) computed for Example 6.1: (a) Saturated design on 12 points; (b) Minbias design; (c) Minimax design with ν0 = 0.5.

[Figure 7: four panels, titled σ ∝ 1/(1 + x), σ ∝ 1, σ ∝ 0.2 + x, and σ ∝ 1 + (x/2)², each plotting maximum AMSE against ν ∈ [0, 1] for the saturated, uniform, minbias, and minimax designs.]

Figure 7. Maximum AMSE Lν(ξ|σ) versus ν for various choices of σ and the designs of Example 6.1; the minimax design was tailored to ν0 = 0.5.

The design space consisted of the N = 1799 unique values of x in the original dataset; these span the range [0, 18.0] in increments of 0.01 with only two exceptions. We investigated four types of designs; in all cases illustrated here we used n = 200. The first design ("saturated") places equal weight at each of p points, where p = 12 is the number of regression parameters to be estimated to fit the reduced cubic spline basis. The literature provides little guidance on the optimal locations of these points, but we have followed Kaishev (1989), who studied D-optimal designs for spline models and conjectured that a "near" optimal design places its mass at the p locations at which the individual splines (see Figure 5(b)) attain their maxima. Saturated designs enjoy favored status within optimal design theory when there is no doubt that the fitted model is in fact the correct one. In this study they turn out to be quite efficient unless ν is quite large, that is, unless the loss is dominated by bias, in which case both the uniform and minimax designs, described below, result in predictions with substantially less bias. As well, the saturated designs are rather poor at recovering the quantile curves from the data gathered at this small number of locations.
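Kaishev's heuristic for the saturated design can be sketched by locating, on a fine age grid, the maximum of each function in the reduced basis on knots (21). This sketch assumes clamped boundary knots at 0 and 18 and uses the full 0.01 grid on [0, 18] as a stand-in for the 1799 observed ages; it relies on SciPy's `BSpline` with an identity coefficient matrix, so that column j of the output is $b_j(x)$.

```python
import numpy as np
from scipy.interpolate import BSpline

k = 3                                        # cubic splines
knots_21 = np.arange(2.0, 17.0, 2.0)         # internal knots (21): 2, 4, ..., 16
t = np.concatenate([[0.0] * (k + 1), knots_21, [18.0] * (k + 1)])  # assumed clamped ends
p = len(t) - k - 1                           # 12 basis functions
basis = BSpline(t, np.eye(p), k)             # identity coefficients: one column per b_j

# Hypothetical stand-in for the observed ages: every 0.01 from 0 to 18.
x_grid = np.round(np.linspace(0.0, 18.0, 1801), 2)
values = basis(x_grid)                       # shape (1801, 12)
saturated_points = x_grid[np.argmax(values, axis=0)]
print(saturated_points)
```

The interior basis functions on equally spaced knots are symmetric bumps, so several of these locations fall exactly on knots (for example 4, 6, ..., 14), while the two boundary functions peak at 0 and 18.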

The second design is the uniform, implemented as at (9). This has been seen to have minimax properties when the maximum is taken over very broad classes of departures from the nominal model. The third ("minbias") is as described in Section 4, with design weights proportional to σ(x), again implemented as at (9). It is not possible to implement such a design very accurately when n < N, and it will be seen that because of this its minimum bias property is lost. In some cases it does, however, have attractive behavior with respect to the variance component of the MSE.

[Figure 8: four panels, (a) Saturated design, (b) Uniform design, (c) Minbias design, (d) Minimax design; each shows the quantile curves (log height, 3.8 to 5.2) and their errors (−0.1 to 0.1) against age.]

Figure 8. Quantile curves computed in Example 6.1 from the four designs (a)–(d) and reduced spline basis on knots (21), and deviations from those computed using the full dataset and knots (20).

Table 1. Root-mean-squared errors (standard errors) for the designs in Example 6.1

τ       Saturated         Uniform           Minbias           Minimax
0.05    0.061 (0.0009)    0.044 (0.0003)    0.060 (0.0016)    0.052 (0.0008)
0.25    0.028 (0.0005)    0.020 (0.0002)    0.035 (0.0014)    0.033 (0.0012)
0.50    0.012 (0.0004)    0.009 (0.0002)    0.026 (0.0015)    0.026 (0.0014)
0.75    0.022 (0.0002)    0.021 (0.0002)    0.031 (0.0011)    0.038 (0.0015)
0.95    0.053 (0.0007)    0.046 (0.0003)    0.054 (0.0009)    0.054 (0.0010)

The final design ("minimax") minimizes Lν(ξ|σ) at (14) for a particular variance function σ²(x) chosen from those itemized in the captions of Figures 1 and 2. The minimax designs were obtained using a genetic algorithm similar to that described in Welsh and Wiens (2013). The algorithm begins by generating a "population" of 40 designs: the three designs described above and 37 which are randomly generated. Each is assigned a "fitness" value, with the designs having the smallest MSE being the "most fit," and a probabilistic mechanism is introduced by which the most fit members become most likely to be chosen to have "children." The children are formed from the parents in a particular way; with a certain probability they are then subjected to random mutations. In this way the possible parents in each generation are replaced by their children, thus forming the next generation of designs. A feature of the algorithm is that a certain proportion of the members (the most fit 10%) always survive intact; in essence they become their own children. This ensures that the best member of each generation has MSE no larger than that in the previous generation. In all cases we terminated after 1000 generations without improvement.
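The genetic algorithm can be outlined as below. This is a minimal sketch only: the population size (40), the 10% elite that survives intact, and the stopping rule after a fixed number of generations without improvement follow the description above, but the rank-based selection, one-point crossover with a repair step, and single-observation mutation are our own choices, not necessarily those of Welsh and Wiens (2013). The loss is supplied by the caller; the toy quadratic loss in the usage lines is hypothetical and merely stands in for Lν(ξ|σ) at (14).

```python
import numpy as np


def repair(counts, n, rng):
    """Add or remove single observations at random until the counts sum to n."""
    counts = counts.copy()
    while counts.sum() > n:
        counts[rng.choice(np.flatnonzero(counts > 0))] -= 1
    while counts.sum() < n:
        counts[rng.integers(len(counts))] += 1
    return counts


def genetic_design(loss, N, n, seeds=(), pop_size=40, elite_frac=0.1,
                   p_mutate=0.3, patience=1000, seed=None):
    """Minimize loss(counts) over allocations of n observations to N design points."""
    rng = np.random.default_rng(seed)
    pop = [np.asarray(s, dtype=int) for s in seeds]
    while len(pop) < pop_size:                       # random members fill the population
        pop.append(np.bincount(rng.integers(N, size=n), minlength=N))
    n_elite = max(1, int(round(elite_frac * pop_size)))
    best, best_loss, stall = pop[0].copy(), np.inf, 0
    while stall < patience:                          # generations without improvement
        losses = np.array([loss(d) for d in pop])
        order = np.argsort(losses)                   # most fit first
        if losses[order[0]] < best_loss:
            best, best_loss, stall = pop[order[0]].copy(), losses[order[0]], 0
        else:
            stall += 1
        weights = np.empty(pop_size)
        weights[order] = np.arange(pop_size, 0, -1)  # rank weights: fitter => likelier parent
        probs = weights / weights.sum()
        children = [pop[i].copy() for i in order[:n_elite]]   # elites survive intact
        while len(children) < pop_size:
            i, j = rng.choice(pop_size, size=2, replace=False, p=probs)
            cut = rng.integers(1, N)                 # one-point crossover, then repair
            child = repair(np.concatenate([pop[i][:cut], pop[j][cut:]]), n, rng)
            if rng.random() < p_mutate:              # mutation: move one observation
                child[rng.choice(np.flatnonzero(child > 0))] -= 1
                child[rng.integers(N)] += 1
            children.append(child)
        pop = children
    return best, best_loss


target = np.array([5, 0, 0, 3, 0, 2])                  # hypothetical ideal allocation, n = 10
toy_loss = lambda d: float(np.sum((d - target) ** 2))  # stands in for L_nu(xi | sigma) at (14)
seed_design = np.array([2, 2, 2, 2, 1, 1])
best, best_loss = genetic_design(toy_loss, N=6, n=10, seeds=[seed_design], patience=50, seed=0)
```

Because the elite is copied unchanged into each new generation, the best loss can never increase, which is the monotonicity property noted in the text.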

[Figure 9: two panels of design frequencies against age (0 to 18 years).]

Figure 9. Minimax designs in Examples 6.2 and 6.3; (a) ν0 = 0; (b) ν0 = 1.

Example 6.1. We computed the four designs, using the variance function with σ0(x) ∝ 0.2 + x and, in the case of the minimax design, a proportion ν0 = 0.5 of the emphasis placed on bias reduction. See Figure 6. The performance of all designs against all four of the variance functions is illustrated in Figure 7, where the maximum MSE Lν(ξ|σ) at (14) is plotted against ν. The efficiency of the minimax design relative to the best of the other three, which we define in terms of the ratio of the corresponding values of Lν0(ξ|σ0), was 1.40, a substantial gain. We then fit quantile curves, for τ = 0.05, 0.25, 0.5, 0.75, 0.95, to the full dataset (Figure 4(c)) and after each design. See Figure 8. For each combination of design and τ, root-mse values were computed as

$$\mathrm{rmse} = \sqrt{\operatorname{mean}\left(Y_{\text{design}} - Y_{\text{full}}\right)^2},$$

where $Y_{\text{full}}$ and $Y_{\text{design}}$ refer to predicted values using the full dataset or those obtained from the designs. This required simulating data, which we did as follows. To get data at design point x we sampled from a Normal distribution, with mean given by the value, at x, of the "τ = 0.5" curve in Figure 4(c) and variance $\sigma_Y^2(x)$ estimated from the Y-values, at x, in the original data. This process was carried out 100 times; the rmse values given in Table 1 are the averages of those so obtained, followed by the standard errors in parentheses. The growth and error curves are based on one representative sample. The uniform and minimax designs yielded samples from which the quantile curves were recovered quite accurately; the saturated and minbias designs were generally less successful. In examples not reported here we found, however, that for substantially larger values of n (for instance n = 1000) the minbias design performed as well as the others in this regard.
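The replication scheme behind Table 1 can be sketched as follows, assuming Normal responses with a given mean curve and standard deviation at each design point. The fitting step (in the study, quantile regression on the reduced spline basis) is left as a hypothetical user-supplied function `fit_and_predict`; reporting the standard error as the sample standard deviation of the replicate rmse values divided by the square root of the number of replicates is our reading of "standard errors" here.

```python
import numpy as np


def rmse(y_design, y_full):
    """rmse = sqrt(mean((Y_design - Y_full)^2)) over a common set of prediction ages."""
    diff = np.asarray(y_design, float) - np.asarray(y_full, float)
    return float(np.sqrt(np.mean(diff ** 2)))


def replicate_rmse(fit_and_predict, x_design, mean_fn, sd_fn, y_full, reps=100, seed=0):
    """Average rmse over `reps` simulated samples, with its standard error.

    At each design point x the response is drawn from N(mean_fn(x), sd_fn(x)^2);
    fit_and_predict(x, y) is a hypothetical fitting routine returning predictions
    on the same ages as y_full.
    """
    rng = np.random.default_rng(seed)
    values = np.empty(reps)
    for r in range(reps):
        y = rng.normal(mean_fn(x_design), sd_fn(x_design))
        values[r] = rmse(fit_and_predict(x_design, y), y_full)
    return values.mean(), values.std(ddof=1) / np.sqrt(reps)
```

Using the same seed for every design makes the comparison between designs less noisy, although the study itself may not have done so.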

Example 6.2. We next took ν0 = 0 (all emphasis on variance reduction) but otherwise retained the features of Example 6.1. The saturated, uniform, and minbias designs, whose construction does not depend on ν, were thus as in Example 6.1; the minimax design is in Figure 9(a) and again enjoyed a relative efficiency of 1.40 over the best of the others. The plots of the quantile curves (not shown) tell much the same story as those for Example 6.1.

Example 6.3. We then took ν0 = 1 (all emphasis on bias reduction) and obtained the minimax design in Figure 9(b), with a relative efficiency of 1.62. See Table 2, where we give the values of Lν(ξ|σ0) for all six designs discussed in Examples 6.1–6.3, at ν = 0, 0.5, 1.

Table 2. Maximum MSE Lν(ξ|σ0 ∝ 0.2 + x) of the designs in Examples 6.1–6.3

                                          Minimax for:
ν      Saturated   Uniform   Minbias   ν0 = 0   ν0 = 0.5   ν0 = 1
0      10.33       12.94     12.02     7.37     7.40       12.20
0.5    5.22        6.47      6.31      3.72     3.72       6.10
1      0.111       0.008     0.600     0.068    0.037      0.005

Figure 10. (a) Minbias design and (b) minimax (ν0 = 0.5) design for Example 6.4; both for σ0(x) ∝ 1/(1 + x).

Example 6.4. As a final example we reran Example 6.1, but using σ0(x) ∝ 1/(1 + x). The minimax design had a relative efficiency of 1.17 against the best of the other three (the minbias design); the efficiency was much greater against the uniform and saturated designs. See Table 3.

7. SUMMARY AND CONCLUDING REMARKS

Dette and Trampisch (2012) studied locally optimal quantile regression designs for nonlinear models, and concluded with a call for future research into the robustness of designs with respect to the model assumptions. In this article we have detailed such research, with specific attention to linear models but with an outline of the modest changes required to address nonlinear models.

Although a number of our methods described here are analytically and numerically complex, some general guidance is possible. One recurring theme of this article is that uniform designs are often minimax in sufficiently large classes of the types of departures we consider. It has long been recognized in problems of design for least squares regression that the uniform design plays much the same role as does the median in robust estimation (highly robust if not terribly efficient), and our findings seem to extend this role to quantile regression.

In seeking protection against bias alone, resulting from model misspecification and a particular variance function, designs with weights proportional to the root of the variance function turn out to be minimax against response misspecifications.

Table 3. Maximum MSE Lν(ξ|σ0 ∝ 1/(1 + x)) of the designs in Example 6.4; minimax design uses ν0 = 0.5

ν      Saturated   Uniform   Minbias   Minimax
0      10.69       20.81     7.88      6.29
0.5    5.40        10.41     3.95      3.15
1      0.111       0.006     0.017     0.016

Uniform designs and minimum bias designs are easily implemented. The more complex design strategies illustrated in Section 3 are more laborious, but it has been seen that a rough description of the outcomes, when there are already available nonrobust designs which minimize the loss at the experimenter's assumed model, is that the robust designs can at least be approximated by taking the replicates prescribed by the nonrobust strategies, and spreading these out into clusters of distinct but nearby design points.

The robust designs obtained here all yield substantial gains in efficiency, as measured in terms of maximum loss, when compared to their competitors, enough to warrant some computational complexity in their construction. As is seen from the plots of the designs (Figures 1, 2, 6, 9, and 10 in particular), a gain in efficiency should be realizable, without a great deal of computation, by merely following the preceding heuristic of clustering replicates, and combining this with design weights suggested by the minimum bias paradigm.

APPENDIX: DERIVATIONS

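Mathematical developments for Section 3.1. With definitions $\zeta_i = \xi_i/\sigma_i$, $\zeta = (\zeta_1, \ldots, \zeta_N)'$ and

$$\gamma_0 = \frac{1}{N}\sum_{i=1}^{N} x_i^2,\quad \kappa_1 = \sum_{i=1}^{N}\zeta_i,\quad \omega_1 = \sum_{i=1}^{N}\zeta_i^2,\quad \gamma_1 = \sum_{i=1}^{N} x_i^2\sigma_i\zeta_i,\quad \kappa_2 = \sum_{i=1}^{N} x_i^2\zeta_i,\quad \omega_2 = \sum_{i=1}^{N} x_i^2\zeta_i^2,$$

(14) becomes

$$L_\nu(\xi) = (1-\nu)\left\{\frac{1}{\kappa_1^2} + \frac{\gamma_0\gamma_1}{\kappa_2^2}\right\} + \nu\max\left\{\frac{\omega_1}{\kappa_1^2}, \frac{\gamma_0\omega_2}{\kappa_2^2}\right\}.$$

We shall restrict to variance functions for which we can verify that, evaluated at $\{\xi_{*,i}\}_{i=1}^{N}$,

$$\frac{\omega_1}{\kappa_1^2} \ge \frac{\gamma_0\omega_2}{\kappa_2^2}. \tag{A.1}$$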

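We thus minimize $(1-\nu)\{1/\kappa_1^2 + \gamma_0\gamma_1/\kappa_2^2\} + \nu\,\omega_1/\kappa_1^2$, first with $\gamma_1$, $\kappa_1$, and $\kappa_2$ fixed; we then minimize over these values. For this we minimize $\omega_1$ subject to

$$\text{(i) } \sum_{i=1}^{N} x_i^2\sigma_i\zeta_i = \gamma_1,\quad \text{(ii) } \sum_{i=1}^{N}\zeta_i = \kappa_1,\quad \text{(iii) } \sum_{i=1}^{N} x_i^2\zeta_i = \kappa_2,\quad \text{(iv) } \sum_{i=1}^{N}\sigma_i\zeta_i = 1, \tag{A.2}$$

where (iv) is the requirement $\sum_{i=1}^{N}\xi_i = 1$.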

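It is sufficient that $\zeta \succeq 0$ (i.e., all elements nonnegative) minimize the convex function

$$\Phi(\zeta, \lambda) = \sum_{i=1}^{N}\left[\zeta_i^2 - 2a\left\{(1 + \lambda_1 x_i^2) + \sigma_i(\lambda_2 + \lambda_3 x_i^2)\right\}\zeta_i\right],$$

with the multipliers $a(1, \lambda_1, \lambda_2, \lambda_3)'$, prearranged in this convenient manner, chosen to satisfy the side conditions. Since $\Phi$ is a sum of univariate convex functions, it is minimized over $\zeta \succeq 0$ at the pointwise positive part $\zeta_0^+ \overset{\text{def}}{=} (\zeta_{01}^+, \ldots, \zeta_{0N}^+)'$, where $\zeta_0$ is the stationary point of $\Phi$ and $\zeta_{0i}^+ = \max(\zeta_{0i}, 0)$. The calculations yield

$$\zeta_{*i} = \zeta_{*i}(\lambda) = \frac{\left\{(1 + \lambda_1 x_i^2) + \sigma_i(\lambda_2 + \lambda_3 x_i^2)\right\}^+}{\sum_{j=1}^{N}\sigma_j\left\{(1 + \lambda_1 x_j^2) + \sigma_j(\lambda_2 + \lambda_3 x_j^2)\right\}^+}, \tag{A.3}$$

with $\lambda = (\lambda_1, \lambda_2, \lambda_3)'$ determined from (i), (ii), and (iii) of (A.2).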



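We may now minimize over $\lambda$ rather than over $(\gamma_1, \kappa_1, \kappa_2)$, so that the numerical problem is to minimize

$$L(\lambda) = (1-\nu)\left\{\frac{1}{\kappa_1^2} + \frac{\gamma_0\gamma_1}{\kappa_2^2}\right\} + \frac{\nu}{\kappa_1^2}\sum_{i=1}^{N}\zeta_i^2(\lambda),$$

with $\zeta_i(\lambda)$ defined by (A.3) and $\gamma_1 = \gamma_1(\lambda)$, $\kappa_1 = \kappa_1(\lambda)$, $\kappa_2 = \kappa_2(\lambda)$ defined by (i), (ii), and (iii) of (A.2). After doing this with a numerical constrained minimizer we check (A.1). Then $\xi_{*i} = \sigma_i\zeta_{*i}$.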
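A sketch of this numerical step for a straight-line model on a symmetric grid follows: it evaluates $\zeta_*(\lambda)$ from (A.3) and $L(\lambda)$, minimizes over $\lambda$, and then checks (A.1). Two assumptions should be flagged: unconstrained Nelder–Mead from SciPy replaces the constrained minimizer, whose details are not given, and values of $\lambda$ for which the positive part vanishes (or $\kappa_2 = 0$) are simply assigned an infinite loss. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def zeta_star(lam, x, sigma):
    """(A.3): positive part of the stationary point, normalized so sum(sigma * zeta) = 1."""
    l1, l2, l3 = lam
    core = np.maximum((1 + l1 * x**2) + sigma * (l2 + l3 * x**2), 0.0)
    total = np.sum(sigma * core)
    return core / total if total > 0 else None


def moments(z, x, sigma):
    return dict(gamma0=np.mean(x**2), kappa1=z.sum(), kappa2=np.sum(x**2 * z),
                gamma1=np.sum(x**2 * sigma * z), omega1=np.sum(z**2),
                omega2=np.sum(x**2 * z**2))


def loss_lambda(lam, x, sigma, nu):
    """L(lambda): the variance part plus nu * omega1 / kappa1^2 (the branch valid under (A.1))."""
    z = zeta_star(lam, x, sigma)
    if z is None:
        return np.inf
    m = moments(z, x, sigma)
    if m["kappa2"] <= 0:
        return np.inf
    var = 1 / m["kappa1"]**2 + m["gamma0"] * m["gamma1"] / m["kappa2"]**2
    return (1 - nu) * var + nu * m["omega1"] / m["kappa1"]**2


def design_section_31(x, sigma, nu):
    """Minimize L(lambda), then return xi = sigma * zeta, the loss, and whether (A.1) holds."""
    res = minimize(loss_lambda, x0=np.zeros(3), args=(x, sigma, nu), method="Nelder-Mead")
    z = zeta_star(res.x, x, sigma)
    m = moments(z, x, sigma)
    a1_holds = m["omega1"] / m["kappa1"]**2 >= m["gamma0"] * m["omega2"] / m["kappa2"]**2
    return sigma * z, res.fun, a1_holds


x = np.linspace(-1, 1, 11)
xi, value, a1_holds = design_section_31(x, 1 + x**2, nu=0.5)   # hypothetical sigma(x) = 1 + x^2
```

At $\lambda = 0$ the weights reduce to $\xi_i \propto \sigma_i$, and with $\sigma \equiv 1$ the loss equals $(1-\nu)2 + \nu/N$, the uniform-design value; both are convenient checks on an implementation.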

In Figure 1 we have illustrated only some representative variance functions for which (A.1) holds. When it does not, one can instead minimize $(1-\nu)\{1/\kappa_1^2 + \gamma_0\gamma_1/\kappa_2^2\} + \nu\,\gamma_0\omega_2/\kappa_2^2$ and then check that, at the optimal design, $\gamma_0\omega_2/\kappa_2^2 \ge \omega_1/\kappa_1^2$. If this also fails, then a more complex method, which is however guaranteed to succeed, is that of Daemi and Wiens (2013), used in Section 3.2. □

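Proof of Theorem 2. By (12) we are to find, with both maxima taken over �0,

$$\max \mathrm{AMSE} = \frac{\tau(1-\tau)}{g_\varepsilon^2(0)}\operatorname{tr}\left(A P_1^{-1} P_0 P_1^{-1}\right) + \max\left[\mu_0' P_1^{-1} A P_1^{-1}\mu_0 + N^{-1}\sum_{i=1}^{N}\delta_0^2(x_i)\right].$$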

We use methods introduced in Fang and Wiens (2000). We first represent the design by a diagonal matrix $D_\xi$ with diagonal elements $\{\xi_i\}$. Define $D_\sigma$ to be the diagonal matrix with diagonal elements $\{\sigma(x_i)\}$. Let $Q_1$ be an $N \times p$ matrix whose columns form an orthonormal basis for the column space of the matrix $F$ with rows $\{f'(x) \mid x \in \chi\}$; recall that this is "Q" in the QR-decomposition of $F$. Then $F = Q_1 R$ for a $p \times p$ nonsingular triangular matrix $R$. Augment $Q_1$ by $Q_2$: $N \times (N - p)$, whose columns form an orthonormal basis for the orthogonal complement of this space. Then $[Q_1 \,\vdots\, Q_2]$ is an orthogonal matrix and $\delta_0 = (\delta_0(x_1), \ldots, \delta_0(x_N))'$ is, by (i) of (8a), of the form $\delta_0 = \eta Q_2 c$, where $\|c\| \le 1$. In these terms $A = N^{-1}R'R$ and, from (10a)–(10c),

$$\mu_0 = \eta R' Q_1' D_\sigma^{-1} D_\xi Q_2 c,\qquad P_0 = R' Q_1' D_\xi Q_1 R,\qquad P_1 = R' Q_1' D_\sigma^{-1} D_\xi Q_1 R.$$

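Thus, maximizing over �0,

$$\begin{aligned}
\max\left[\mu_0' P_1^{-1} A P_1^{-1}\mu_0 + N^{-1}\sum_{i=1}^{N}\delta_0^2(x_i)\right]
&= \frac{\eta^2}{N}\max_{\|c\| \le 1}\left[\left(c' Q_2' D_\xi D_\sigma^{-1} Q_1 R\right)\left(P_1^{-1} R' R P_1^{-1}\right)\left(R' Q_1' D_\sigma^{-1} D_\xi Q_2 c\right) + c' Q_2' Q_2 c\right]\\
&= \frac{\eta^2}{N}\,\mathrm{ch}_{\max}\left[Q_2' D_\xi D_\sigma^{-1} Q_1 R P_1^{-1} R' R P_1^{-1} R' Q_1' D_\sigma^{-1} D_\xi Q_2 + I_{N-p}\right].
\end{aligned}$$

Some algebra, followed by a return to the original parameterization, results in (14). □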
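The reduction from the maximum over $\|c\| \le 1$ to a largest eigenvalue is easy to check numerically. The sketch below builds $Q_1$, $Q_2$, and $R$ from a complete QR decomposition and compares the eigenvalue formula with a direct evaluation of $\mu_0' P_1^{-1} A P_1^{-1}\mu_0 + N^{-1}\sum_i \delta_0^2(x_i)$ at the maximizing $\delta_0 = \eta Q_2 c$. It assumes $\mu_0 = F' D_\sigma^{-1} D_\xi \delta_0$ and $P_1 = F' D_\sigma^{-1} D_\xi F$, which is how the displayed expressions read once $F = Q_1 R$ is substituted back; the quadratic model, design, variance function, and $\eta$ in the usage lines are hypothetical.

```python
import numpy as np


def worst_case_bias(F, xi, sigma, eta):
    """(eta^2/N) * ch_max[...] from the proof of Theorem 2, and the maximizing delta_0."""
    N, p = F.shape
    Q, R_full = np.linalg.qr(F, mode="complete")
    Q1, Q2, R = Q[:, :p], Q[:, p:], R_full[:p]     # F = Q1 R
    d = xi / sigma                                # diagonal of D_xi D_sigma^{-1}
    P1inv = np.linalg.inv(R.T @ Q1.T @ (d[:, None] * Q1) @ R)
    B = (d[:, None] * Q2).T @ Q1 @ R              # Q2' D_xi D_sigma^{-1} Q1 R
    M = B @ P1inv @ R.T @ R @ P1inv @ B.T + np.eye(N - p)
    evals, evecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    return eta**2 / N * evals[-1], eta * Q2 @ evecs[:, -1]


def bias_term(F, xi, sigma, delta0):
    """Direct evaluation of mu_0' P_1^{-1} A P_1^{-1} mu_0 + N^{-1} sum delta_0^2."""
    N = F.shape[0]
    d = xi / sigma
    mu0 = F.T @ (d * delta0)
    P1inv = np.linalg.inv(F.T @ (d[:, None] * F))
    A = F.T @ F / N
    return mu0 @ P1inv @ A @ P1inv @ mu0 + delta0 @ delta0 / N


x = np.linspace(-1, 1, 9)
F = np.vander(x, 3, increasing=True)   # quadratic model, p = 3
xi = np.full(9, 1 / 9)                 # uniform design
sigma = 1 + x**2                       # hypothetical variance function
value, delta0 = worst_case_bias(F, xi, sigma, eta=0.1)
```

Any other admissible $\delta_0$, that is, any vector orthogonal to the columns of $F$ with norm at most $\eta$, should give a bias term no larger than `value`.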

Proof of Theorem 3. This parallels the proof of Theorem 1 of Wiens (1992), and can also be obtained by taking limits, as $N \to \infty$, in Theorem 2. □

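Derivation of (15). The rather lengthy calculations for this section are available in Kong and Wiens (2014). As in Section 3.1, we consider symmetric designs and variance functions: $m(x) = m(-x)$ and $\sigma(x) = \sigma(-x)$. In terms of

$$\mu_i = \int_{-1}^{1} x^i m(x)\,dx,\qquad \kappa_i = \int_{-1}^{1} x^i \frac{m(x)}{\sigma(x)}\,dx,\qquad \omega_i = \int_{-1}^{1} x^i\left(\frac{m(x)}{\sigma(x)}\right)^2 dx,$$

we define $\pi = 2/(\kappa_4\kappa_0 - \kappa_2^2)^2$, $\phi_{002} = \pi/(3\kappa_2^2)$, and

$$\begin{aligned}
\phi_{110} &= \pi\left[\kappa_4^2 - \tfrac{1}{3}\kappa_4\kappa_2\right], & \phi_{112} &= \pi\left[\tfrac{1}{3}\left(\kappa_4\kappa_0 + \kappa_2^2\right) - 2\kappa_4\kappa_2\right], & \phi_{114} &= \pi\left[\kappa_2^2 - \tfrac{1}{3}\kappa_2\kappa_0\right],\\
\phi_{120} &= \pi\left[\tfrac{1}{3}\kappa_2^2 - \kappa_4\kappa_2\right], & \phi_{122} &= \pi\left[\kappa_4\kappa_0 + \kappa_2^2 - \tfrac{2}{3}\kappa_2\kappa_0\right], & \phi_{124} &= \pi\left[\tfrac{1}{3}\kappa_0^2 - \kappa_2\kappa_0\right],\\
\phi_{210} &= \pi\left[\tfrac{1}{3}\kappa_4^2 - \tfrac{1}{5}\kappa_4\kappa_2\right], & \phi_{212} &= \pi\left[\tfrac{1}{5}\left(\kappa_4\kappa_0 + \kappa_2^2\right) - \tfrac{2}{3}\kappa_4\kappa_2\right], & \phi_{214} &= \pi\left[\tfrac{1}{3}\kappa_2^2 - \tfrac{1}{5}\kappa_2\kappa_0\right],\\
\phi_{220} &= \pi\left[\tfrac{1}{5}\kappa_2^2 - \tfrac{1}{3}\kappa_4\kappa_2\right], & \phi_{222} &= \pi\left[\tfrac{1}{3}\left(\kappa_4\kappa_0 + \kappa_2^2\right) - \tfrac{2}{5}\kappa_2\kappa_0\right], & \phi_{224} &= \pi\left[\tfrac{1}{5}\kappa_0^2 - \tfrac{1}{3}\kappa_2\kappa_0\right].
\end{aligned}$$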

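We then calculate that

$$\operatorname{tr}(A T_0) \overset{\text{def}}{=} \rho_0(m) = \left[\phi_{110} + \phi_{220}\right] + \left[\phi_{002} + \phi_{112} + \phi_{222}\right]\mu_2 + \left[\phi_{114} + \phi_{224}\right]\mu_4,$$

and that

$$A T_2 = \begin{pmatrix} \phi_{110}\omega_0 + \phi_{112}\omega_2 + \phi_{114}\omega_4 & 0 & \phi_{120}\omega_0 + \phi_{122}\omega_2 + \phi_{124}\omega_4\\ 0 & \phi_{002}\omega_2 & 0\\ \phi_{210}\omega_0 + \phi_{212}\omega_2 + \phi_{214}\omega_4 & 0 & \phi_{220}\omega_0 + \phi_{222}\omega_2 + \phi_{224}\omega_4 \end{pmatrix},$$

whose characteristic roots are $\rho_1(m) = \phi_{002}\omega_2$ and the two roots of

$$\begin{pmatrix} \phi_{110}\omega_0 + \phi_{112}\omega_2 + \phi_{114}\omega_4 & \phi_{120}\omega_0 + \phi_{122}\omega_2 + \phi_{124}\omega_4\\ \phi_{210}\omega_0 + \phi_{212}\omega_2 + \phi_{214}\omega_4 & \phi_{220}\omega_0 + \phi_{222}\omega_2 + \phi_{224}\omega_4 \end{pmatrix} \overset{\text{def}}{=} \begin{pmatrix}\psi_{11} & \psi_{12}\\ \psi_{21} & \psi_{22}\end{pmatrix}.$$

Of these two roots, one is uniformly greater than the other, and is

$$\rho_2(m) = \frac{\psi_{11} + \psi_{22}}{2} + \left\{\left(\frac{\psi_{11} - \psi_{22}}{2}\right)^2 + \psi_{12}\psi_{21}\right\}^{1/2}.$$

Thus the loss is $\max(L_1(m), L_2(m))$, where $L_k(m) = (1-\nu)\rho_0(m) + \nu\rho_k(m)$, $k = 1, 2$.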
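For completeness, the closed form for $\rho_2(m)$ follows from the characteristic polynomial of the $2 \times 2$ matrix $(\psi_{jk})$. The short derivation below is ours and assumes the discriminant is nonnegative, so that both roots are real:

$$\det\left[(\psi_{jk}) - \lambda I_2\right] = \lambda^2 - (\psi_{11} + \psi_{22})\lambda + (\psi_{11}\psi_{22} - \psi_{12}\psi_{21}) = 0,$$

$$\lambda_\pm = \frac{\psi_{11}+\psi_{22}}{2} \pm \left\{\left(\frac{\psi_{11}+\psi_{22}}{2}\right)^2 - \psi_{11}\psi_{22} + \psi_{12}\psi_{21}\right\}^{1/2} = \frac{\psi_{11}+\psi_{22}}{2} \pm \left\{\left(\frac{\psi_{11}-\psi_{22}}{2}\right)^2 + \psi_{12}\psi_{21}\right\}^{1/2},$$

using $\{(\psi_{11}+\psi_{22})/2\}^2 - \psi_{11}\psi_{22} = \{(\psi_{11}-\psi_{22})/2\}^2$. The larger root $\lambda_+$ is $\rho_2(m)$.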

We apply Theorem 1 of Daemi and Wiens (2013), by which we may proceed as follows. We first find a density $m_1$ minimizing $L_1(m)$ in the class of densities for which $L_1(m) = \max(L_1(m), L_2(m))$, and a density $m_2$ minimizing $L_2(m)$ in the class of densities for which $L_2(m) = \max(L_1(m), L_2(m))$. Then the optimal design $\xi_*$ has density

$$m_* = \begin{cases} m_1, & \text{if } L_1(m_1) \le L_2(m_2),\\ m_2, & \text{if } L_2(m_2) \le L_1(m_1).\end{cases}$$

The two minimizations are first carried out with $\mu_2, \mu_4, \kappa_0, \kappa_2, \kappa_4$ held fixed, thus fixing all $\phi_{ijk}$ and $\rho_0(m)$. Under these constraints $L_1(m_1) \le L_2(m_2)$ iff $\rho_1(m_1) \le \rho_2(m_2)$.

With the aid of Lagrange multipliers we find that both $m_1$ and $m_2$ are of the form (15). The ten constants $a_{ij}$ forming $a$ are chosen to minimize the loss subject to the side conditions, but it is now numerically simpler to minimize $L_\nu(\xi \mid \sigma)$ at (14) directly over $a$, subject to $\int_{-1}^{1} m(x; a)\,dx = 1$. The density $m(x; a)$ is overparameterized, and when $\sigma(\cdot)$ is nonconstant we take $a_{01} = 1$. In the homogeneous case we take $a_{02} = 1$ and also $a_{i1} \equiv 0$ and $a_{00} = 0$. □

Proof of Proposition 1. We give the proof of (i); that of (ii) is similar. For $i = 1, \ldots, N$ define $b(x_i) = \left(M_p^{-1} p(x_i) - M_1^{-1}\right) f(x_i)\, I(x_i \in \chi_0)$. Then

$$0 \preceq \sum_{i=1}^{N} b(x_i)\, b'(x_i) = M_p^{-1} M_{p2} M_p^{-1} - M_1^{-1}.$$

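Proof of Lemma 1. Write

$$L_\nu(\xi \mid r) = (1-\nu)\, N\, \frac{\operatorname{tr}\left(A S_1^{-1}(r) S_0 S_1^{-1}(r)\right)}{\sum_{\xi_i > 0} \xi_i^r} + \nu\,\mathrm{ch}_{\max}\left(A S_1^{-1}(r) S_2(r) S_1^{-1}(r)\right),$$

and note that $L_\nu(\xi_k \mid r) = (1-\nu) N \operatorname{tr}(A A_\xi^{-1}) + \nu\,\mathrm{ch}_{\max}(A A_\xi^{-1})$, independently of $r$. Thus it suffices to show that for some $r = r_\xi$,

$$L_\nu(\xi \mid r_\xi) \ge (1-\nu) N \operatorname{tr}\left(A A_\xi^{-1}\right) + \nu\,\mathrm{ch}_{\max}\left(A A_\xi^{-1}\right). \tag{A.5}$$

In fact $r_\xi = 1$ serves the purpose. To see this note that, by Proposition 1,

$$\frac{\operatorname{tr}\left(A S_1^{-1}(1) S_0 S_1^{-1}(1)\right)}{\sum_{\xi_i > 0}\xi_i} \ge \operatorname{tr}\left(A A_\xi^{-1}\right),$$

and that for any $r$, $S_1^{-1}(r) S_2(r) S_1^{-1}(r) \succeq A_\xi^{-1}$, so that also $\mathrm{ch}_{\max}\left(A S_1^{-1}(r) S_2(r) S_1^{-1}(r)\right) \ge \mathrm{ch}_{\max}\left(A A_\xi^{-1}\right)$. This establishes (A.5) with $r_\xi = 1$. □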

[Received July 2014. Revised September 2014.]

REFERENCES

Behl, P., Claeskens, G., and Dette, H. (2014), "Focussed Model Selection in Quantile Regression," Statistica Sinica, 24, 601–624. [233]

Biedermann, S., and Dette, H. (2001), "Optimal Designs for Testing the Functional Form of a Regression via Nonparametric Estimation Techniques," Statistics and Probability Letters, 52, 215–224. [237]

Bischoff, W. (2010), "An Improvement in the Lack-of-Fit Optimality of the (Absolutely) Continuous Uniform Design in Respect of Exact Designs," in Proceedings of the 9th International Workshop in Model-Oriented Design and Analysis (moda9), eds. Giovagnoli, Atkinson, and Torsney, Berlin Heidelberg: Springer-Verlag. [237]

Box, G. E. P., and Draper, N. R. (1959), "A Basis for the Selection of a Response Surface Design," Journal of the American Statistical Association, 54, 622–654. [233,238]

Cook, R. D., and Wong, W. K. (1994), "On the Equivalence of Constrained and Compound Optimal Designs," Journal of the American Statistical Association, 89, 687–692. [239]

Daemi, M., and Wiens, D. P. (2013), "Techniques for the Construction of Robust Regression Designs," The Canadian Journal of Statistics, 41, 679–695. [238,244]

Dette, H., and Trampisch, M. (2012), "Optimal Designs for Quantile Regression Methods," Journal of the American Statistical Association, 107, 1140–1151. [233,235,243]

Fang, K. T., and Wang, Y. (1994), Number-Theoretic Methods in Statistics, London, UK: Chapman and Hall. [235]

Fang, Z., and Wiens, D. P. (2000), "Integer-Valued, Minimax Robust Designs for Estimation and Extrapolation in Heteroscedastic, Approximately Linear Models," Journal of the American Statistical Association, 95, 807–818. [236,244]

Hamill, P. V. V., Drizd, T. A., Johnson, C. L., Reed, R. B., Roche, A. F., and Moore, W. M. (1979), "Physical Growth: National Center for Health Statistics Percentiles," American Journal of Clinical Nutrition, 32, 607–629. [239]

Heo, G., Schmuland, B., and Wiens, D. P. (2001), "Restricted Minimax Robust Designs for Misspecified Regression Models," The Canadian Journal of Statistics, 29, 117–128. [237]

Huber, P. J. (1964), "Robust Estimation of a Location Parameter," The Annals of Mathematical Statistics, 35, 73–101. [233]

Huber, P. J. (1975), "Robustness and Designs," in A Survey of Statistical Design and Linear Models, ed. J. N. Srivastava, Amsterdam: North Holland, pp. 287–303. [233]

Huber, P. J. (1981), Robust Statistics, New York: Wiley. [233]

Kaishev, V. K. (1989), "Optimal Experimental Designs for the B-Spline Regression," Computational Statistics & Data Analysis, 8, 39–47. [240]

Knight, K. (1998), "Limiting Distributions for l1 Estimators Under General Conditions," The Annals of Statistics, 26, 755–770. [235]

Koenker, R. (2005), Quantile Regression, Cambridge, UK: Cambridge University Press. [235]

Koenker, R., and Bassett, G. (1978), "Regression Quantiles," Econometrica, 46, 33–50. [233]

Kong, L., and Mizera, I. (2012), "Quantile Tomography: Using Quantiles With Multivariate Data," Statistica Sinica, 22, 1589–1610. [240]

Kong, L., and Wiens, D. P. (2014), "Robust Quantile Regression Designs," University of Alberta Department of Mathematical and Statistical Sciences Technical Report S129, available at http://www.stat.ualberta.ca/~wiens/homepage/pubs/TR S129.pdf. [234,235,238,244]

Li, K. C. (1984), "Robust Regression Designs When the Design Space Consists of Finitely Many Points," The Annals of Statistics, 12, 269–282. [233]

Li, P., and Wiens, D. P. (2011), "Robustness of Design for Dose-Response Studies," Journal of the Royal Statistical Society, Series B, 73, 215–238. [233]

Ma, Y., and Wei, Y. (2012), "Analysis on Censored Quantile Residual Life Model via Spline Smoothing," Statistica Sinica, 22, 47–68. [233]

Maronna, R. A., and Yohai, V. J. (1981), "Asymptotic Behaviour of General M-Estimates for Regression and Scale With Random Carriers," Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 58, 7–20. [233]

Martínez-Silva, I., Roca-Pardiñas, J., Lustres-Pérez, V., Lorenzo-Arribas, A., and Cadarso-Suárez, C. (2013), "Flexible Quantile Regression Models: Application to the Study of the Purple Sea Urchin," SORT – Statistics and Operations Research Transactions, 37, 81–94. [233]

Pere, A. (2000), "Comparison of Two Methods of Transforming Height and Weight to Normality," Annals of Human Biology, 27, 35–45. [240]

Rubia, A., and Sanchis-Marco, L. (2013), "On Downside Risk Predictability Through Liquidity and Trading Activity: A Dynamic Quantile Approach," International Journal of Forecasting, 29, 202–219. [233]

Shi, P., Ye, J., and Zhou, J. (2003), "Minimax Robust Designs for Misspecified Regression Models," The Canadian Journal of Statistics, 31, 397–414. [237]

Simpson, D. G., Ruppert, D., and Carroll, R. J. (1992), "On One-Step GM Estimates and Stability of Inferences in Linear Regression," Journal of the American Statistical Association, 87, 439–450. [233]

Wei, Y., and He, X. (2006), "Discussion Article: Conditional Growth Charts," The Annals of Statistics, 34, 2069–2097. [240]

Wei, Y., Pere, A., Koenker, R., and He, X. (2006), "Quantile Regression Methods for Reference Growth Charts," Statistics in Medicine, 25, 1369–1382. [240]

Welsh, A. H., and Wiens, D. P. (2013), "Robust Model-Based Sampling Designs," Statistics and Computing, 23, 689–701. [242]

Wiens, D. P. (1991), "Designs for Approximately Linear Regression: Two Optimality Properties of Uniform Designs," Statistics and Probability Letters, 12, 217–221. [237]

Wiens, D. P. (1992), "Minimax Designs for Approximately Linear Regression," Journal of Statistical Planning and Inference, 31, 353–371. [233,237,244]

Wiens, D. P. (2000), "Robust Weights and Designs for Biased Regression Models: Least Squares and Generalized M-Estimation," Journal of Statistical Planning and Inference, 83, 395–412. [233]

Wiens, D. P., and Wu, E. K. H. (2010), "A Comparative Study of Robust Designs for M-Estimated Regression Models," Computational Statistics and Data Analysis, 54, 1683–1695. [233]

Woods, D. C., Lewis, S. M., Eccleston, J. A., and Russell, K. G. (2006), "Designs for Generalized Linear Models With Several Variables and Model Uncertainty," Technometrics, 48, 284–292. [233]

Xu, X., and Yuen, W. K. (2011), "Applications and Implementations of Continuous Robust Designs," Communications in Statistics – Theory and Methods, 40, 969–988. [235]
