statistical bootstrapping peter d. christenson biostatistician january 20, 2005

Statistical Bootstrapping

Peter D. Christenson

Biostatistician

January 20, 2005

Outline

• Example of bootstrap in a previous paper

• Basics: Precision of estimates

• Practical difficulties

• Bootstrap concept

• Particular use in our example

• Other uses

Good bootstrap reference with tutorial pdf* and software:

www.insightful.com/hesterberg/bootstrap

*Source for most figures here.

Paper Using Bootstrapping

Outcome: Labor progression was estimated by the duration of labor for each cm of cervical dilation using serial vaginal exams.

Predictor: Classification as overweight, obese or normal weight.

Many possible confounding factors, e.g. fetal size.

Paper: Major Results

p-value is based on the precision of the estimated durations of 6.20 and 7.52 hours.

p<0.01 ↔ the 99% CI for the difference, centered at 7.52 − 6.20 = 1.32 hours, lies entirely above 0 (so it is unlikely that there is no difference).

Paper: From statistical methods

Survival analysis methods were needed to estimate the median duration from, say, 4 to 5 cm dilation, since the exact times were not known.

However, the authors did not simply use the p-values from the software output; they used “bootstrapping”. Why?

Basics: Precision vs. Normal Range

• A random sample of 100 women shows a 4-10 cm dilation duration mean±SD of 7.0±1.25.

• Normal (95%) range is ~ 7±2(1.25) = 4.5 to 9.5 hours.

• With no other information about a patient, we predict her duration will be between 4.5 and 9.5 hours, based on the SD, i.e., the variation among individuals.

• But how precisely did the study estimate the mean duration for all women? We have only one mean, but want 7.0±2(SD of means).
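The contrast between the two intervals can be checked with a few lines of arithmetic. This is only a sketch using the slide's numbers; it assumes the standard result that the SD of a mean of N observations is SD/√N, and the variable names are illustrative.

```python
import math

mean, sd, n = 7.0, 1.25, 100  # values quoted on the slide

# Normal (95%) range: describes variation among individual women.
normal_range = (mean - 2 * sd, mean + 2 * sd)

# Precision of the mean: uses the SD of means, SD / sqrt(N).
sem = sd / math.sqrt(n)
ci_mean = (mean - 2 * sem, mean + 2 * sem)

print(normal_range)  # (4.5, 9.5)
print(sem)           # 0.125
print(ci_mean)       # (6.75, 7.25)
```

The interval for the mean is ten times narrower than the normal range here because N = 100.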

Estimating Precision from Theory

“All” pregnant women

To get SD of mean, conceptually take many samples:

Of course, we don’t have the luxury of more than 1 sample. From math theory, SD of a mean of N is SD/√N = SEM:

[Figure: distribution of the sample means (X-bars) from “all” pregnant women, labeled with the mean of X-bars and the SD of X-bars.]
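The "many samples" thought experiment can be simulated. The sketch below assumes a normally distributed population with the earlier slide's mean and SD; that assumption, the number of repeated samples, and the seed are choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

pop_mean, pop_sd, n = 7.0, 1.25, 100  # slide values, treated as the "population"
n_samples = 5000                       # assumed number of repeated samples

# Draw many independent samples of size n and keep each sample mean (X-bar).
samples = rng.normal(pop_mean, pop_sd, size=(n_samples, n))
xbars = samples.mean(axis=1)

sd_of_xbars = xbars.std(ddof=1)
theory_sem = pop_sd / np.sqrt(n)

print(f"mean of X-bars: {xbars.mean():.3f}")
print(f"SD of X-bars:   {sd_of_xbars:.3f}  (theory SD/sqrt(N): {theory_sem:.3f})")
```

The simulated SD of the X-bars should land close to SD/√N, which is what the formula predicts without needing more than one sample.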

Extensions of the Theory

The theory has been extended beyond means (SEM) to SEs for more complex measures, such as predictions from regression:

Blue bands are “normal ranges” and red bands are CIs, showing precision.

But, the relation is not just a factor of √N, as with means.

Difficulties

For most situations, standard errors (SE) have been developed based on theory. They may not be accurate in some circumstances:

• The sample may not be a simple random sample, which standard SE formulas require.

• There may be non-sampling sources of variation, e.g., using estimated results from one analysis in further analyses.

• The formulas based on theory may require approximations or assumptions that are known not to hold.

The Bootstrap Standard Error

Obtain a single sample of size N. Then:

1. Draw thousands of samples of size N, with replacement, from the original sample. These are called “bootstrap samples” or “resamples”.

2. Calculate the quantity of interest, the bootstrap estimate, for each sample.

3. Find the bootstrap distribution of these quantities, and in particular their SD, which is the bootstrap SE:

[Figure: the original estimate M and the bootstrap estimates M*.]

The M*s are the bootstrap estimates of M.
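The three steps can be sketched in code as follows. This is a minimal illustration, not the software referenced in the talk: the function name `bootstrap_se`, the default of 2000 resamples, and the simulated data are all assumptions.

```python
import numpy as np

def bootstrap_se(sample, statistic, n_resamples=2000, seed=0):
    """Return (bootstrap SE, array of bootstrap estimates) for `statistic`."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = len(sample)
    estimates = np.empty(n_resamples)
    for b in range(n_resamples):
        # Step 1: resample N observations with replacement from the original sample.
        resample = rng.choice(sample, size=n, replace=True)
        # Step 2: compute the quantity of interest on this resample.
        estimates[b] = statistic(resample)
    # Step 3: the SD of the bootstrap estimates is the bootstrap SE.
    return estimates.std(ddof=1), estimates

# Hypothetical data: 100 simulated durations (hours), not taken from the paper.
data = np.random.default_rng(1).normal(7.0, 1.25, size=100)

se_mean, _ = bootstrap_se(data, np.mean)
se_median, _ = bootstrap_se(data, np.median)
formula_sem = data.std(ddof=1) / np.sqrt(len(data))

print(f"bootstrap SE of mean:   {se_mean:.3f}")
print(f"formula SEM:            {formula_sem:.3f}")
print(f"bootstrap SE of median: {se_median:.3f}")
```

For the mean, the bootstrap SE should be close to the formula SEM. The same function also gives an SE for the median, for which no equally simple formula is available.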

The Bootstrap SE: Concept

Consider a sample of N=6 with 3 bootstrap samples:

Mean±SD of original sample = 4.46±7.54.

SEM = 7.54/√6 = 3.08

Bootstrap SEM is SD(4.13,4.64,1.74) = 1.55

Here, the bootstrap SE is poor because only 3 resamples were drawn. Typically, thousands are used.
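The two standard errors quoted above can be reproduced from the slide's numbers. The sketch uses the sample SD (dividing by the number of resamples minus 1) for the bootstrap estimates, which matches the quoted 1.55.

```python
import math
import statistics

# Values quoted on the slide.
sd_original, n = 7.54, 6
bootstrap_means = [4.13, 4.64, 1.74]

sem_formula = sd_original / math.sqrt(n)
sem_bootstrap = statistics.stdev(bootstrap_means)  # sample SD of the 3 resample means

print(round(sem_formula, 2))    # 3.08
print(round(sem_bootstrap, 2))  # 1.55
```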

Back to Labor Progression Paper

Why was bootstrapping used in the paper?

The design used a stratified random sample, not a simple random sample:

There are SE formulas for some quantities from studies that over-sample some groups, as was done here, but perhaps not for these adjusted medians.
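One common way to respect a stratified design is to resample separately within each stratum. The sketch below illustrates that idea only; it is not taken from the paper, whose exact resampling scheme is not described here. The function name, the simulated data, and the use of unadjusted medians are assumptions.

```python
import numpy as np

def stratified_bootstrap_se(values, strata, statistic, n_resamples=2000, seed=0):
    """Bootstrap SE where each resample redraws within each stratum separately."""
    rng = np.random.default_rng(seed)
    values, strata = np.asarray(values), np.asarray(strata)
    groups = [np.flatnonzero(strata == s) for s in np.unique(strata)]
    estimates = np.empty(n_resamples)
    for b in range(n_resamples):
        # Resample indices with replacement inside every stratum, keeping
        # each stratum's sample size fixed as in the original design.
        idx = np.concatenate(
            [rng.choice(g, size=len(g), replace=True) for g in groups]
        )
        estimates[b] = statistic(values[idx], strata[idx])
    return estimates.std(ddof=1)

# Hypothetical data: simulated durations (hours) for two weight strata.
rng = np.random.default_rng(1)
values = np.concatenate([rng.normal(6.2, 1.5, 80), rng.normal(7.5, 1.5, 120)])
strata = np.array(["normal"] * 80 + ["obese"] * 120)

def median_difference(v, s):
    # Difference in median duration between the two simulated strata.
    return np.median(v[s == "obese"]) - np.median(v[s == "normal"])

se = stratified_bootstrap_se(values, strata, median_difference)
print(f"stratified bootstrap SE of the median difference: {se:.3f}")
```

Because every resample keeps the same number of women per stratum, the oversampling in the design is carried through to each bootstrap estimate.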

Other Bootstrap Advantages

• Typically fewer assumptions.

• Very general and reliable: can use the same software code for many estimation problems that have different formulas.

• Some studies show more accuracy than classical methods.

• Can model the entire estimation process, not just sampling error. See next 2 slides.

Labor Progression Paper: Covariate Adjustment Method

Recall that durations were adjusted for several covariates, including oxytocin use and fetal size.

The oversampling of heavier women was accounted for with bootstrapping so that these adjustments were unbiased.

A single set of covariates was used for all bootstrap samples.

Labor Progression Paper: Alternative Covariate Adjustment

Recall that durations were adjusted for several covariates, including oxytocin use and fetal size.

The entire process of selecting covariates could have been performed separately with each bootstrap sample.

This could better incorporate the uncertainty of choosing which factors need to be used for adjustment.
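The idea of repeating covariate selection inside every bootstrap sample can be sketched as below. This is an illustration under loud assumptions: it uses ordinary least squares on simulated data rather than the paper's survival analysis, and the selection rule (keep covariates whose absolute correlation with the outcome exceeds a threshold) is a hypothetical stand-in for whatever procedure a study actually uses.

```python
import numpy as np

def select_and_fit(y, group, covariates, threshold=0.1):
    """Select covariates, then return the covariate-adjusted group effect."""
    # Hypothetical selection rule: keep covariates correlated with the outcome.
    keep = [j for j in range(covariates.shape[1])
            if abs(np.corrcoef(covariates[:, j], y)[0, 1]) > threshold]
    X = np.column_stack([np.ones(len(y)), group, covariates[:, keep]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]  # coefficient on the group indicator

def bootstrap_with_selection(y, group, covariates, n_resamples=1000, seed=0):
    """Bootstrap SE of the group effect, redoing selection on every resample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = np.empty(n_resamples)
    for b in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        # Both selection and fitting are repeated, so the SE reflects
        # the uncertainty in which covariates get chosen.
        estimates[b] = select_and_fit(y[idx], group[idx], covariates[idx])
    return estimates.std(ddof=1)

# Hypothetical data: simulated outcome, group indicator, and three covariates.
rng = np.random.default_rng(1)
n = 200
group = rng.integers(0, 2, size=n).astype(float)
covariates = rng.normal(size=(n, 3))
y = 6.2 + 1.3 * group + 0.8 * covariates[:, 0] + rng.normal(0, 1.5, size=n)

se = bootstrap_with_selection(y, group, covariates)
print(f"bootstrap SE including covariate selection: {se:.3f}")
```

Fixing one covariate set for all resamples, as on the previous slide, would correspond to calling the selection step once outside the loop.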

Conclusions

• Bootstrapping can avoid the need for unnecessary assumptions.

• It is not needed in most applications.

• It is needed for studies without simple random sampling, unless software for other sampling designs is used.

• For our paper here, it probably had a small impact, but could have been used to gain further advantages.

• In general, it is of relatively minor importance for most studies. Just excluding a confounder would typically have a greater impact.
