
Bootstrap Methods: Recent Advances and New Applications

2007 ICTP Lectures
Anestis Antoniadis

UJF, LJK Lab
Trieste, October 2007


Bootstrap Topics

• Introduction to Bootstrap

• Wide Variety of Applications

• Confidence regions and hypothesis tests

• Examples where the bootstrap is not consistent, and remedies: (1) the infinite variance case for a population mean and (2) extreme values

• Available Software


Introduction

• The bootstrap is a general method for doing statistical analysis without making strong parametric assumptions.

• Efron’s nonparametric bootstrap resamples the original data.

• It was originally designed to estimate bias and standard errors for statistical estimates, much like the jackknife.


Introduction (continued)

• The bootstrap is similar to earlier techniques which are also called resampling methods:
– (1) the jackknife,
– (2) cross-validation,
– (3) the delta method,
– (4) permutation methods, and
– (5) subsampling.


Introduction (continued)

The technique was extended, modified and refined to handle a wide variety of problems, including:
– (1) confidence intervals and hypothesis tests,
– (2) linear and nonlinear regression,
– (3) time series analysis and other problems.


Introduction (continued)

Definition of Efron’s nonparametric bootstrap. Given a sample of n independent, identically distributed (i.i.d.) observations X1, X2, …, Xn from a distribution F, and a parameter θ of F with a real-valued estimator θ̂ = θ̂(X1, X2, …, Xn), the bootstrap estimates the accuracy of the estimator by replacing F with Fn, the empirical distribution, where Fn places probability mass 1/n at each observation Xi.
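As a concrete illustration, here is a minimal Python/NumPy sketch of drawing one bootstrap sample from Fn; the exponential data, sample size and seed are assumptions made for the example, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical i.i.d. sample X1, ..., Xn from an unknown distribution F.
x = rng.exponential(scale=2.0, size=50)
n = len(x)

# Sampling n indices uniformly with replacement is the same as sampling
# from the empirical distribution Fn, which puts mass 1/n on each Xi.
boot_sample = x[rng.integers(0, n, size=n)]
print(boot_sample[:5])
```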


Introduction (continued)

• Let X1*, X2*, …, Xn* be a bootstrap sample, that is, a sample of size n taken with replacement from Fn.

• The bootstrap estimates the variance of θ̂(X1, X2, …, Xn) by computing or approximating the variance of θ* = θ̂(X1*, X2*, …, Xn*) (a sketch of this follows below).
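A sketch of that Monte Carlo variance approximation, assuming Python/NumPy, the sample median as the estimator, B = 1000 resamples and simulated data; all of these choices are illustrative.

```python
import numpy as np

def bootstrap_variance(x, estimator, n_boot=1000, seed=0):
    """Monte Carlo approximation of Var(theta*), the bootstrap
    estimate of the variance of estimator(X1, ..., Xn)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    theta_star = np.empty(n_boot)
    for b in range(n_boot):
        resample = x[rng.integers(0, n, size=n)]   # X1*, ..., Xn* from Fn
        theta_star[b] = estimator(resample)
    return theta_star.var(ddof=1)

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=3.0, size=100)      # hypothetical data
print(bootstrap_variance(x, np.median))
```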


Schematic of Bootstrap Process


Introduction (continued)

• Statistical Functionals - A functional is a mapping that takes functions into real numbers.

• Parameters of a distribution can usually be expressed as functionals of the population distribution.

• Often the standard estimate of a parameter is the same functional applied to the empirical distribution (see the small example below).
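For instance, the mean is the functional T(F) = ∫ x dF(x), and applying the same T to Fn, which weights each observation by 1/n, gives the sample mean. A small check in Python/NumPy (the gamma-distributed data are a hypothetical example):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.gamma(shape=3.0, scale=1.5, size=200)   # hypothetical sample

# T(Fn): integrate x against the empirical distribution, i.e. weight
# each observation by 1/n -- which is exactly the sample mean.
plug_in_mean = np.sum(np.full(len(x), 1.0 / len(x)) * x)

assert np.isclose(plug_in_mean, x.mean())
print(plug_in_mean)
```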


Introduction (continued)

• Statistical functionals and the bootstrap.

• A parameter θ is a functional T(F), where T denotes the functional and F is a population distribution.

• An estimator of θ is θ̂ = T(Fn), where Fn is the empirical distribution function.

• Many statistical problems involve properties of the distribution of θ̂ − θ: its mean (the bias of θ̂), variance, median, etc.


Introduction (continued)

• Bootstrap idea: We cannot determine the distribution of θ̂ − θ, but through the bootstrap we can determine, or approximate through Monte Carlo, the distribution of θ* − θ̂, where θ* = T(Fn*) and Fn* is the empirical distribution of a bootstrap sample X1*, X2*, …, Xn* (θ* is a bootstrap estimate of θ).

• Based on k bootstrap samples, the Monte Carlo approximation to the distribution of θ* − θ̂ is used to estimate the bias, variance, etc. of θ̂.

• In bootstrapping, θ̂ substitutes for θ and θ* substitutes for θ̂. This is called the bootstrap principle (sketched below).
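A Python/NumPy sketch of the bootstrap principle under illustrative assumptions (k = 2000 bootstrap samples, T taken to be the plug-in variance, lognormal data): the Monte Carlo draws of θ* − θ̂ stand in for the unavailable distribution of θ̂ − θ.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.lognormal(mean=0.0, sigma=1.0, size=80)   # hypothetical sample

def T(sample):
    """The functional applied to a (re)sample; here the plug-in variance."""
    return sample.var()

theta_hat = T(x)          # T(Fn)
k, n = 2000, len(x)

# theta* = T(Fn*) for each of k bootstrap samples.
theta_star = np.array([T(x[rng.integers(0, n, size=n)]) for _ in range(k)])

# The distribution of theta* - theta_hat approximates that of theta_hat - theta.
bias_estimate = (theta_star - theta_hat).mean()
variance_estimate = theta_star.var(ddof=1)
print(bias_estimate, variance_estimate)
```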

Introduction (continued)

• Basic Theory: Mathematical results show that bootstrap estimates are consistent in particular cases.

• Basic Idea: Empirical distributions behave in large samples like population distributions. The Glivenko-Cantelli theorem tells us this (a numerical illustration follows below).

• A smoothness condition on the functional is needed to transfer consistency to functionals of Fn, such as the estimate of the parameter θ.
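A small numerical illustration of the Glivenko-Cantelli idea; the Uniform(0,1) population (so that F(x) = x) and the sample sizes are assumptions chosen only to make the sup distance easy to compute.

```python
import numpy as np

rng = np.random.default_rng(4)

def sup_distance(n):
    """sup_x |Fn(x) - F(x)| for a Uniform(0,1) sample, where F(x) = x.
    The supremum is attained at the order statistics, where Fn jumps."""
    x = np.sort(rng.uniform(size=n))
    upper = np.arange(1, n + 1) / n - x   # Fn(x_(i)) - F(x_(i))
    lower = x - np.arange(0, n) / n       # F(x_(i)) - Fn(x_(i)-)
    return max(upper.max(), lower.max())

for n in (100, 1_000, 10_000, 100_000):
    print(n, round(sup_distance(n), 4))   # shrinks toward 0 as n grows
```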


Wide Variety of Applications

• Efron and others recognized that, through the power of fast computing, the Monte Carlo approximation could be used to extend the bootstrap to many different statistical problems.


Wide Variety of Applications (continued)

• It can estimate process capability indices for non-Gaussian data.

• It is used to adjust p-values in a variety of multiple comparison situations.

• It can be extended to problems involving dependent data, including multivariate, spatial and time series data, and to sampling from finite populations (a block bootstrap sketch for dependent data follows below).
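One common modification for dependent data is a block bootstrap. The moving block bootstrap sketched here resamples blocks of consecutive observations so that short-range dependence is preserved within blocks; the block length of 20 and the AR(1)-style series are illustrative assumptions, not from the lecture.

```python
import numpy as np

def moving_block_bootstrap(x, block_length, rng):
    """Resample overlapping blocks of consecutive observations and
    concatenate them, truncating to the original series length."""
    n = len(x)
    n_blocks = -(-n // block_length)                      # ceil(n / block_length)
    starts = rng.integers(0, n - block_length + 1, size=n_blocks)
    return np.concatenate([x[s:s + block_length] for s in starts])[:n]

rng = np.random.default_rng(5)

# Hypothetical serially dependent series (AR(1) with coefficient 0.6).
e = rng.normal(size=300)
x = np.empty(300)
x[0] = e[0]
for t in range(1, 300):
    x[t] = 0.6 * x[t - 1] + e[t]

x_star = moving_block_bootstrap(x, block_length=20, rng=rng)
print(x_star[:10])
```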


Wide Variety of Applications (continued)

• It also has been applied to problems involving missing data.

• In many cases, the theory justifying the use of the bootstrap (e.g. consistency theorems) has been extended to these non-i.i.d. settings.

• In other cases, the bootstrap has been modified to “make it work.” The general case of confidence interval estimation is a notable example.


Confidence regions and hypothesis tests

• The percentile method and other bootstrap variations may require 1000 or more bootstrap replications to be very useful.

• The percentile method (sketched below) only works under special conditions.

• Bias correction and other adjustments are sometimes needed to make the bootstrap “accurate” and “correct” when the sample size n is small or moderate.
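A sketch of the percentile method itself, assuming Python/NumPy, B = 2000 replications and a 95% interval for the mean of hypothetical skewed data: the interval endpoints are the α/2 and 1 − α/2 quantiles of the bootstrap replicates θ*.

```python
import numpy as np

def percentile_ci(x, estimator, alpha=0.05, n_boot=2000, seed=0):
    """Percentile bootstrap interval: quantiles of the bootstrap replicates."""
    rng = np.random.default_rng(seed)
    n = len(x)
    theta_star = np.array([estimator(x[rng.integers(0, n, size=n)])
                           for _ in range(n_boot)])
    return np.quantile(theta_star, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(6)
x = rng.exponential(scale=3.0, size=60)   # hypothetical skewed sample
print(percentile_ci(x, np.mean))          # approximate 95% interval for the mean
```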


Confidence regions and hypothesis tests (continued)

• Confidence intervals are accurate or nearly exact when the stated confidence level for the intervals is approximately the long-run probability that the random interval contains the “true” value of the parameter.

• Accurate confidence intervals are said to be correct if they are approximately the shortest-length confidence intervals possible for the given confidence level.


Examples where the bootstrap fails

• Athreya (1987) shows that the bootstrap estimate of the distribution of the sample mean is inconsistent when the population distribution has an infinite variance.

• Angus (1993) provides similar inconsistency results for the maximum and minimum of a sequence of independent, identically distributed observations (a numerical illustration for the maximum follows below).
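A numerical illustration of the failure for extremes; the Uniform(0,1) data and the resample counts are my assumptions. In roughly a 1 − 1/e ≈ 0.632 fraction of bootstrap resamples the bootstrap maximum simply equals the observed sample maximum, so the bootstrap distribution of the maximum has a large atom there and cannot mimic the continuous limiting extreme-value behaviour.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000
x = rng.uniform(size=n)        # hypothetical i.i.d. sample
sample_max = x.max()

# Fraction of bootstrap resamples whose maximum equals the observed maximum.
hits = np.mean([x[rng.integers(0, n, size=n)].max() == sample_max
                for _ in range(2_000)])

# Analytically: P(max* = max) = 1 - (1 - 1/n)**n -> 1 - 1/e as n grows.
print(hits, 1.0 - (1.0 - 1.0 / n) ** n)
```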


Bootstrap Remedies

• In the past decade, for many of the problems where the bootstrap is inconsistent, researchers have found remedies that give good modified bootstrap solutions that are consistent.

• For both of the problems described thus far, a simple procedure called the m-out-of-n bootstrap has been shown to lead to consistent estimates.


The m-out-of-n Bootstrap

• This idea was proposed by Bickel and Ren (1996) for handling doubly censored data.

• Instead of sampling n times with replacement from a sample of size n, they suggest doing it only m times, where m is much less than n.

• To get the consistency results, both m and n need to get large but at different rates. We need m = o(n), that is, m/n → 0 as m and n both → ∞.

• This method leads to consistent bootstrap estimates in many cases where the ordinary bootstrap has problems, particularly (1) the mean with infinite variance and (2) extreme value distributions (see the sketch below).
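A sketch of the m-out-of-n bootstrap in Python/NumPy; the Pareto sample with tail index 1.5 (finite mean, infinite variance) and the choice m = √n are illustrative assumptions satisfying m = o(n).

```python
import numpy as np

def m_out_of_n_bootstrap(x, estimator, m, n_boot=2000, seed=0):
    """Each resample has size m << n, drawn with replacement from the
    original sample of size n."""
    rng = np.random.default_rng(seed)
    n = len(x)
    return np.array([estimator(x[rng.integers(0, n, size=m)])
                     for _ in range(n_boot)])

rng = np.random.default_rng(8)
n = 10_000
x = rng.pareto(1.5, size=n) + 1.0     # hypothetical heavy tails: infinite variance
m = int(n ** 0.5)                     # m = o(n); here m = 100

theta_star = m_out_of_n_bootstrap(x, np.mean, m=m)
print(np.quantile(theta_star, [0.025, 0.975]))   # resampling distribution of the mean
```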


Available Software

• Resampling Stats from Resampling Stats Inc. (provides basic bootstrap tools in easy-to-use software and is good as an elementary teaching tool).

• S-PLUS from Insightful Corporation (good for advanced bootstrap techniques such as BCa, easy to use in the new Windows-based version). The current module Resample is what I use in my bootstrap class at statistics.com.

• S functions provided by Tibshirani (see the Appendix in the Efron and Tibshirani text or visit Rob Tibshirani’s web site, http://www-stat.stanford.edu/~tibs).


Available Software (continued)

• Stata has a bootstrap algorithm available that some users rave about.

• MathWorks and other examples (see Susan Holmes’s web page, http://www-stat.stanford.edu/~susan, or contact her by email).

• SAS macros are available, and PROC MULTTEST does bootstrap sampling.


References on confidence intervals and hypothesis tests

(1) Chernick, M.R. (1999). Bootstrap Methods: A Practitioner’s Guide. Wiley, New York.
(2) Chernick, M.R. (2007). Bootstrap Methods: A Guide for Practitioners and Researchers, 2nd Edition. Wiley, New York.
(3) Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer-Verlag, New York.
(4) Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans. Society for Industrial and Applied Mathematics, CBMS-NSF Regional Conference Series 38, Philadelphia.


References on confidence intervals and hypothesis tests (continued)

(5) Carpenter, J. and Bithell, J. (2000). Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Statistics in Medicine 19, 1141-1164.
(6) Bahadur, R.R. and Savage, L.J. (1956). The nonexistence of certain statistical procedures in nonparametric problems. Annals of Mathematical Statistics 27, 1115-1122.
(7) Ewens, W.J. and Grant, G.R. (2001). Statistical Methods in Bioinformatics: An Introduction.


References on when bootstrap fails

(1) Angus, J. E. (1993). Asymptotic theory for bootstrapping the extremes. Communs. Statist. Theory and Methods 22, 15-30.
(2) Athreya, K. B. (1987). Bootstrap estimation of the mean in the infinite variance case. Ann. Statist. 15, 724-731.
(3) Bickel, P. J. and Freedman, D. A. (1981). Some asymptotic theory for the bootstrap. Ann. Statist. 9, 1196-1217.


References on when bootstrap fails (continued)

(4) Chernick, M.R. (1999). Bootstrap Methods: A Practitioner’s Guide. Wiley, New York.
(5) Chernick, M.R. (2007). Bootstrap Methods: A Guide for Practitioners and Researchers, 2nd Edition. Wiley, New York.
(6) Cochran, W. (1977). Sampling Techniques, 3rd ed. Wiley, New York.


References on when bootstrap fails (continued)

(7) Knight, K. (1989). On the bootstrap of the sample mean in the infinite variance case. Ann. Statist. 17, 1168-1175.
(8) LePage, R. and Billard, L. (editors) (1992). Exploring the Limit of Bootstrap. Wiley, New York.
(9) Mammen, E. (1992). When Does the Bootstrap Work? Asymptotic Results and Simulations. Springer-Verlag, Heidelberg.
(10) Singh, K. (1981). On the asymptotic accuracy of Efron’s bootstrap. Ann. Statist. 9, 1187-1195.

