TRANSCRIPT
The Automatic Construction of Bootstrap
Confidence Intervals
Bradley Efron
Stanford University
The Standard Intervals
θ̂ ± z(α) σ̂    (z(0.975) = 1.96)

θ̂  a point estimate of θ
σ̂  an estimate of its standard error
(Taylor series, jackknife, bootstrap)
Automatic!
Poisson Example
x ∼ Poisson(θ)
Observe x = 16
95% central intervals:
            Lo      Up      Right/Left
Standard    8.16    23.84   1
Exact       9.47    25.41   1.44
Bootstrap   9.42    25.53   1.45
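A small Python sketch (the talk's own code is R) that reproduces the standard endpoints θ̂ ± 1.96 σ̂ with σ̂ = √x, and computes an exact central interval by solving the tail equations P(X ≥ x | θ_lo) = P(X ≤ x | θ_up) = 0.025. The helpers `pois_cdf` and `solve` are ours; the exact construction used here (Garwood-style) gives endpoints close to, but not identical with, the slide's "Exact" row.

```python
import math

def pois_cdf(x, theta):
    """P(X <= x) for X ~ Poisson(theta), by direct summation."""
    term = math.exp(-theta)
    total = term
    for k in range(1, x + 1):
        term *= theta / k
        total += term
    return total

def solve(f, lo, hi, tol=1e-8):
    """Bisection for f(theta) = 0, assuming a sign change on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

x, alpha = 16, 0.05

# standard interval: theta_hat +- 1.96 * sigma_hat, with sigma_hat = sqrt(x)
std_lo = x - 1.96 * math.sqrt(x)
std_up = x + 1.96 * math.sqrt(x)

# exact central interval: lower solves P(X >= x | theta) = alpha/2,
# upper solves P(X <= x | theta) = alpha/2
ex_lo = solve(lambda t: (1 - pois_cdf(x - 1, t)) - alpha / 2, 1e-6, x)
ex_up = solve(lambda t: pois_cdf(x, t) - alpha / 2, x, 5 * x)
```

Note the asymmetry: the exact interval sits well to the right of the standard one, matching the table's Right/Left ratio of about 1.44.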
[Figure: Poisson example, x ~ Poisson(theta), observe x = 16; exact (black) and standard (red) 95% endpoints plotted against the density f(x), with x = 16 marked.]
Student-t Corrections
x_i ~ind N(θ, σ²),  i = 1, 2, ..., n

θ̂ = x̄,  σ̂² = Σ (x_i − x̄)² / (n − 1)

θ ∈ θ̂ ± t_{n−1}(α) σ̂

t_{15}(0.975) = 2.13  cf  z(0.975) = 1.96

The t correction is of order O(1/n)
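A quick numerical check of the correction, in Python: for a toy sample of n = 16 the t interval is wider than the normal-theory interval by the factor 2.13/1.96 ≈ 1.087. The data values are invented for illustration; the t quantile is taken from the slide rather than computed.

```python
import math

# toy sample with n = 16, so n - 1 = 15 degrees of freedom
x = [9.8, 10.1, 10.4, 9.6, 10.0, 10.3, 9.9, 10.2,
     9.7, 10.5, 10.0, 9.9, 10.1, 9.8, 10.2, 10.0]
n = len(x)
xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)   # sigma_hat^2
se = math.sqrt(s2 / n)                              # standard error of xbar

z975 = 1.96      # standard normal quantile
t975_15 = 2.13   # t quantile, 15 df (value from the slide)

z_int = (xbar - z975 * se, xbar + z975 * se)
t_int = (xbar - t975_15 * se, xbar + t975_15 * se)
# the t interval is wider by the factor 2.13 / 1.96
```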
Bootstrap Confidence Intervals
bca algorithm makes 3 corrections to standard endpoints
Corrections of order O(1/√n)
Nonparametric bcajack (automatic)
Parametric bcapar (almost automatic)
The Diabetes Data
n = 442 patients
xi = (pi , yi)
pi vector of 10 baseline predictors
yi measure of disease progression, 1 year
x a 442 × 11 matrix
θ = adjusted R2 of y regressed on p
θ̂ = summary(lm(y ∼ p))$adj.r.squared = 0.507
Diabetes Data
n = 442 subjects, 10 predictors, response y "progression".

 age   sex   bmi   map    tc   ldl   hdl   tch   ltg   glu       y
 .04   .05   .06   .02  −.04  −.03  −.04   .00   .02  −.02   −1.13
 .00  −.04  −.05  −.03  −.01  −.02   .07  −.04  −.07  −.09  −77.13
 .09   .05   .04  −.01  −.05  −.03  −.03   .00   .00  −.03  −11.13
−.09  −.04  −.01  −.04   .01   .02  −.04   .03   .02  −.01   53.87
 .01  −.04  −.04   .02   .00   .02   .01   .00  −.03  −.05  −17.13
−.09  −.04  −.04  −.02  −.07  −.08   .04  −.08  −.04  −.10  −55.13
 ...
Nonparametric Bootstrapping
Data matrix x (n × p): rows are p-vectors

θ̂ = t(x) estimates the parameter of interest θ

Bootstrap data matrix x*: randomly choose n rows from x, with replacement

Gives θ̂* = t(x*)

{θ̂*(1), θ̂*(2), ..., θ̂*(B)} provides inferences for θ
(σ̂_boot = standard deviation of the θ̂* values)
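The recipe above can be sketched in a few lines of Python (the talk uses R). The function name `nonparametric_boot` is ours, and the toy data and statistic (the mean, standing in for the talk's adjusted R²) are invented for illustration.

```python
import random
import statistics

def nonparametric_boot(x, stat, B=2000, seed=1):
    """Resample the rows of x with replacement B times and apply stat."""
    rng = random.Random(seed)
    n = len(x)
    return [stat([x[rng.randrange(n)] for _ in range(n)]) for _ in range(B)]

# toy data and statistic; the talk's t(x) is the adjusted R^2 of a regression
x = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5, 3.9, 2.2, 3.6]
boots = nonparametric_boot(x, statistics.mean)
sigma_boot = statistics.stdev(boots)   # bootstrap standard error
```

For the mean, `sigma_boot` should come out near the textbook standard error s/√n.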
[Figure: histogram of B = 2000 nonparametric bootstrap reps of adj.r.squared (thetahat*); only 37.2% of the boots are less than .507.]
[Figure: bca and standard two-sided intervals for the adjusted R² statistic (from bcajack; red bars indicate Monte Carlo error); limits vs coverage at 68%, 80%, 90%, 95%; black = bca, green = standard.]
Correcting the Standard Intervals
Standard CIs assume θ̂ ∼ N(θ, σ²), σ² constant

bca makes 3 corrections:

1 for non-normality: using the bootstrap cdf Ĝ

2 for bias: using ẑ0 = Φ⁻¹ Ĝ(θ̂)
  (diabetes: ẑ0 = Φ⁻¹(0.372) = −0.327)

3 for "acceleration", i.e. nonconstant σ²: using the jackknife
The bca Level α Endpoint
θ̂_bca(α) = Ĝ⁻¹ Φ( ẑ0 + (ẑ0 + z(α)) / (1 − â (ẑ0 + z(α))) )

Ĝ   bootstrap cdf
Φ   standard normal cdf
ẑ0 = Φ⁻¹ Ĝ(θ̂)
â   acceleration estimate

Second-order accuracy: claimed coverage accurate to O(1/n), compared to O(1/√n) for the standard CIs
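The endpoint formula translates directly into code. Below is a minimal Python sketch, not the bcajack/bcapar implementation: `bca_endpoint` is our own name, Ĝ⁻¹ is approximated by the sorted bootstrap values, and the toy bootstrap sample (normal, so z0 ≈ 0) with a = 0 should reproduce nearly standard endpoints.

```python
import random
from statistics import NormalDist

N = NormalDist()  # standard normal: .cdf and .inv_cdf

def bca_endpoint(theta_hat, boots, a, alpha):
    """bca level-alpha endpoint: G^{-1} Phi(z0 + (z0 + z(alpha))/(1 - a(z0 + z(alpha))))."""
    B = len(boots)
    z0 = N.inv_cdf(sum(t < theta_hat for t in boots) / B)  # bias correction
    z_alpha = N.inv_cdf(alpha)
    adj = N.cdf(z0 + (z0 + z_alpha) / (1 - a * (z0 + z_alpha)))
    srt = sorted(boots)                                    # empirical G^{-1}
    return srt[min(B - 1, max(0, int(adj * B)))]

# toy bootstrap distribution centered at theta_hat = 0.5, sd 0.03
rng = random.Random(7)
boots = [rng.gauss(0.5, 0.03) for _ in range(2000)]
lo = bca_endpoint(0.5, boots, a=0.0, alpha=0.025)
up = bca_endpoint(0.5, boots, a=0.0, alpha=0.975)
```

With a symmetric bootstrap distribution and a = 0, the bca endpoints collapse to the percentile endpoints, roughly 0.5 ± 1.96 × 0.03.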
Jackknife Calculations
x(i) is the (n − 1) × p matrix with row xi removed from x

θ̂(i) = t(x(i))

θ̂(·) = Σ_{i=1}^n θ̂(i) / n

Di = θ̂(i) − θ̂(·)

Jackknife standard error  σ̂_jack = [ ((n − 1)/n) Σ_1^n Di² ]^{1/2}

Acceleration estimate  â = (1/6) Σ_1^n Di³ / ( Σ_1^n Di² )^{3/2}
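The two formulas, as a Python sketch (`jackknife` is our own helper; the data are the same toy sample used earlier). For the mean, σ̂_jack agrees exactly with the usual s/√n.

```python
import math

def jackknife(x, stat):
    """Leave-one-out jackknife: standard error and acceleration estimate."""
    n = len(x)
    theta_i = [stat(x[:i] + x[i + 1:]) for i in range(n)]   # theta_hat(i)
    theta_dot = sum(theta_i) / n                            # theta_hat(.)
    D = [t - theta_dot for t in theta_i]
    s2 = sum(d * d for d in D)
    se = math.sqrt((n - 1) / n * s2)                        # sigma_jack
    a = (1 / 6) * sum(d ** 3 for d in D) / s2 ** 1.5        # acceleration
    return se, a

x = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5, 3.9, 2.2, 3.6]
se, a = jackknife(x, lambda v: sum(v) / len(v))
```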
Grouped Jackknife
n can be enormous these days
partition x into m groups of size g = n/m:

Xk = {x_{i1}, x_{i2}, ..., x_{ig}}   (k = 1, 2, ..., m)

Now X = (X1, X2, ..., Xm) has just m "points"

Diabetes: m = 40, g = 11 (2 left over)

bcajack(x, 2000, rfun, m = 40)
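The grouping step is just index bookkeeping; a Python sketch under our own naming (`make_groups`), dropping leftovers as on the slide. In practice one would typically shuffle the row indices before grouping.

```python
def make_groups(n, m):
    """Partition indices 0..n-1 into m groups of size g = n // m;
    leftover indices are dropped, as in the slide's diabetes example."""
    g = n // m
    return [list(range(k * g, (k + 1) * g)) for k in range(m)]

groups = make_groups(442, 40)   # 40 groups of 11; 2 rows left over
```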
Partial output of bcajack(x, 2000, rfun, m=40)
x = diabetes data, rfun = adjusted R²

  α      bcalims   jacksd   standard
 .025    .424      .004     .443
 .16     .465      .001     .474
 .5      .497      .001     .507
 .84     .529      .001     .539
 .975    .560      .002     .570
Monte Carlo Error
Two types of error: "sampling" and "Monte Carlo"

σ̂_boot concerns the sampling error of θ̂
Was B = 2000 enough bootstrap replications?
Answer Jackknife the bootstrap replications.
Jackknife Estimates of Monte Carlo Error

Randomly partition the bootstrap replications {θ̂*(1), θ̂*(2), ..., θ̂*(2000)} into 10 groups of 200 each
Remove one group at a time, and rerun bcajack
Use jackknife formula to get “jacksd”
No new bootstrapping required
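The jackknife-after-bootstrap idea can be sketched for a simple percentile limit (the real bcajack reruns the full bca calculation on each leave-one-group-out set; the helper names and toy bootstrap sample here are ours).

```python
import math
import random

def percentile_limit(boots, alpha):
    """Empirical alpha-quantile of the bootstrap replications."""
    srt = sorted(boots)
    return srt[min(len(srt) - 1, int(alpha * len(srt)))]

def jacksd(boots, alpha, ngroups=10, seed=3):
    """Jackknife estimate of the Monte Carlo sd of a percentile limit."""
    rng = random.Random(seed)
    reps = boots[:]
    rng.shuffle(reps)                        # random partition into groups
    g = len(reps) // ngroups
    groups = [reps[k * g:(k + 1) * g] for k in range(ngroups)]
    # remove one group at a time and recompute the limit
    vals = [percentile_limit(sum(groups[:k] + groups[k + 1:], []), alpha)
            for k in range(ngroups)]
    vbar = sum(vals) / ngroups
    return math.sqrt((ngroups - 1) / ngroups
                     * sum((v - vbar) ** 2 for v in vals))

# toy bootstrap sample shaped like the diabetes example
rng = random.Random(11)
boots = [rng.gauss(0.507, 0.033) for _ in range(2000)]
sd_lo = jacksd(boots, 0.025)   # Monte Carlo sd of the lower limit
```

Note that no new bootstrap replications are drawn: the 2000 existing ones are simply regrouped.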
[Figure: Monte Carlo error is small for the upper limits, not so small for the lower limits; limits vs coverage at 68%, 80%, 90%, 95%; black = bca, green = standard.]
More output of bcajack(x, 2000, rfun, m=40)
           θ̂     sdboot    z0      a      sdjack
estimate  .507    .033    −.327   −.004    .034
jacksd    .000    .001     .027    .000    .000
Parametric Example: The Pediatric Death Data
800 cases, African facility for very sick babies

600 survived, 200 died

11 predictors x = (respiration, heart rate, weight, ...)

Logistic regression model: πi the death probability (i = 1, 2, ..., 800)

yi = 1 with probability πi, 0 with probability 1 − πi,
where πi = 1 / (1 + e^{−x_i' α})

θ = "resp" coefficient α1
Parametric Bootstrapping
glm(y ∼ X, binomial), X the 800 × 11 matrix, gives MLE α̂, θ̂ = α̂1 = 0.943

π̂i = 1 / (1 + e^{−x_i' α̂}),  i = 1, 2, ..., 800

Bootstrap sample: y*_i = 1 with prob π̂i, 0 with prob 1 − π̂i

glm(y* ∼ X, binomial) gives α̂* and θ̂* = α̂*1

Also need b* = X'y* for the bca calculations
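One parametric bootstrap draw can be sketched in Python (the talk uses R's glm): compute π̂ from α̂, draw Bernoulli y*, and form b* = X'y*. The function name `boot_sample`, the toy X, and the coefficient vector are ours; the refit of glm(y* ∼ X, binomial) to get θ̂* is omitted here.

```python
import math
import random

def boot_sample(X, alpha_hat, rng):
    """One parametric bootstrap draw: y*_i ~ Bernoulli(pi_i) with
    pi_i = 1/(1 + exp(-x_i' alpha_hat)); returns y* and b* = X'y*."""
    n, p = len(X), len(X[0])
    pi = [1 / (1 + math.exp(-sum(xj * aj for xj, aj in zip(row, alpha_hat))))
          for row in X]
    y_star = [1 if rng.random() < pi_i else 0 for pi_i in pi]
    b_star = [sum(X[i][j] * y_star[i] for i in range(n)) for j in range(p)]
    return y_star, b_star

# toy design matrix (intercept + 2 covariates) and coefficients
rng = random.Random(5)
X = [[1.0, rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
y_star, b_star = boot_sample(X, [0.2, 0.9, -0.4], rng)
```

In the talk's setup, each (y*, X) pair would be refit with glm to give one θ̂* = α̂*₁, and the accompanying b* is saved for bcapar.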
Program bcapar
B bootstrap samples give
θ̂* = (θ̂*(1), θ̂*(2), ..., θ̂*(B))  and  b* (11 × B) = (b*(1), b*(2), ..., b*(B))

bcapar(θ̂, θ̂*, b*) yields the bca limits

θ̂ and θ̂* give Ĝ and ẑ0

b* gives the acceleration â
[Figure: histogram of 2000 parametric bootstrap replications (resp*), pediatric death data; coef 'resp', logistic regression, 11 covariates; 43.1% of the replications fall below .943.]
[Figure: confidence intervals for resp, using bcapar (resphat = .943); limits vs coverage at 68%, 80%, 90%, 95%; black = bca, green = standard.]
glmnet Estimate of α (instead of MLE)

glmnet(X, y) fits an increasing succession of logistic regressions; cross-validation is used to choose the "best" one (shrinkage)

glmnet gave α̂ and θ̂ = α̂1 = 0.862 (cf 0.943 for the MLE)

Bootstrapping: α̂ −→ π̂ −→ y*, and then glmnet(X, y*) gives θ̂* and b*

Confidence limits from bcapar(θ̂, θ̂*, b*)
[Figure: histogram of 2000 bootstrap estimates of 'resp' (resp*) using glmnet estimation (point estimate .862); 62.7% of the estimates fall below .862.]
[Figure: bcapar confidence limits for 'resp' based on glmnet (glmnet point estimate = .862); limits vs coverage at 68%, 80%, 90%, 95%; black = bca, green = standard.]
Comparing MLE and glmnet Estimates of "resp"

The glmnet estimate θ̂ is smaller than the MLE:

glmnet  .86 ± .13
MLE     .94 ± .15

However, the bcapar confidence limits have almost the same centering (but the glmnet intervals are shorter)
[Figure: logistic regression limits (black) compared with glmnet limits (red); limits vs coverage at 68%, 80%, 90%, 95%.]
References
DiCiccio, T. J. and Efron, B. (1996). Bootstrap confidence intervals. Statist.
Sci. 11: 189–228.
Efron, B. (2018). R program bcajack. Available from the author’s web site
efron.web.stanford.edu under “Talks”.
Efron, B. and Hastie, T. (2016). Computer Age Statistical Inference:
Algorithms, Evidence, and Data Science. Cambridge: Cambridge
University Press, Institute of Mathematical Statistics Monographs (Book 5).