Biostatistics Case Studies 2007 Peter D. Christenson Biostatistician http://gcrc.labiomed.org/biostat Session 3: Incomplete Data in Longitudinal Studies

Biostatistics Case Studies 2007

Peter D. Christenson

Biostatistician

http://gcrc.labiomed.org/biostat

Session 3:

Incomplete Data in Longitudinal Studies

Case Study

Study Design

Study Results

Enrolled and Completed Subjects

Completer Analysis: N=97+100.

LOCF Analysis: N=130+130. 33+30=63 were imputed.

MMRM Analysis: N=130+130. None were imputed.

When?

General Reasoning

Completer Analysis: Biased; completers may differ from the randomized population in a Phase III study. May be preferred in Phase II.

LOCF: Proposed as a neutral method to implement analysis on an intent-to-treat population.

MMRM: Uses all data, unlike completer analysis, but doesn’t impute unobserved data as in LOCF.

? Reasoning for this particular study ?

Imputation with LOCF

[Figure: HAM-A score (axis up to 30) by week (0, 1, 2, 3, 4, 6, 8) for individual subjects, including a completer; marked points denote imputed values, N=63/260.]

• Ignores potential progression; conservative; usually attenuates likely changes and increases standard deviations.

• No correction for using unobserved data as if real.

Use all 260 values as if observed.
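The carry-forward rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the paper's analysis code: the function name `locf` and the HAM-A scores below are hypothetical, and `None` is assumed to mark a missed visit.

```python
from typing import List, Optional, Sequence


def locf(scores: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing visit with the most recent observed value."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for score in scores:
        if score is not None:
            last = score  # remember the latest observed value
        filled.append(last)  # a missing visit inherits that value
    return filled


# Hypothetical subject who drops out after week 2 (visits at weeks 0, 1, 2, 3, 4, 6, 8).
weeks = [0, 1, 2, 3, 4, 6, 8]
dropout = [25.0, 22.0, 20.0, None, None, None, None]
imputed = locf(dropout)
print(dict(zip(weeks, imputed)))
```

The week 8 value is simply the week 2 value, which is why the method ignores any progression after dropout and treats the imputed value as if it had been observed.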

One Alternative: Last Rank Carried Forward

[Figure: two panels showing change from baseline for individual subjects across Baseline, Intermediate Visit, and Final Visit.]

LOCF: Ignore potential progression.

LRCF: Maintain expected relative progression.

Completer vs. LOCF Analysis

LOCF Analysis: Δ between groups = 1.8; N=260 (197 actual, 63 imputed).

Completer Analysis: Δ between groups = 2.5; N=197 (197 actual).

Δ from baseline ≈ 10

Clinically relevant Δ=?

(Week 8 or earlier)

Mixed Model Approach

The completer analysis removes some early data. The LOCF method adds unobserved later data.

E.g., remove week 0 or add week 8.

Mixed Model for Repeated Measures (MMRM):

• Performs completer analysis.

• Makes valid, but less preferred, comparison using data omitted from completer analysis.

• Combines these two results.

• This paper only gives p-values for results.

Example: Next Slide

MMRM Example*

*Brown, Applied Mixed Models in Medicine, Wiley 1999.

Consider a crossover (paired) study with 6 subjects. Subject 5 missed treatment A and subject 6 missed B.

Completer analysis would use IDs 1-4; trt diff=4.25.

Strict LOCF analysis would impute 22, 17; trt diff=2.83.

LOCF Difference by subject: 8, 2, -1, 8, 0, 0 (mean 2.83)
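The two treatment differences quoted on this slide can be checked from the difference column alone. The sketch below assumes the four nonzero-or-observed differences (8, 2, -1, 8) belong to the completers, subjects 1-4, and that the two zeros are the LOCF-imputed subjects 5 and 6; the variable names are illustrative.

```python
from statistics import mean

# A-minus-B differences for the four completers (assumed to be subjects 1-4).
completer_diffs = [8, 2, -1, 8]

# Strict LOCF copies the one observed value into the missing period,
# so subjects 5 and 6 each contribute a difference of 0.
locf_diffs = completer_diffs + [0, 0]

completer_estimate = mean(completer_diffs)
locf_estimate = mean(locf_diffs)
print(round(completer_estimate, 2), round(locf_estimate, 2))
```

The imputed zeros pull the LOCF estimate toward no difference, which is the attenuation described earlier.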

MMRM Example Cont’d

ΔW=4.25 Paired

ΔB=5 Unpaired

Mixed model gets the better* estimate of the A-B difference from the 4 completers: paired mean ΔW=4.25. It gets a poorer unpaired estimate from the other 2 subjects: ΔB = 22-17 = 5.

How are these two “sub-studies” combined?

*Why better?

MMRM Example Cont’d

ΔW=4.25 Paired

ΔB=5 Unpaired

The overall estimated Δ is a weighted average of the separate Δs, inversely weighting by their variances:

Δ = [ΔW/SE²(ΔW) + ΔB/SE²(ΔB)]/K, where K = 1/SE²(ΔW) + 1/SE²(ΔB)

= [4.25/4.45 + 5.0/43.1]/(1/4.45 + 1/43.1) = 4.32

The 4.45 and 43.1 incorporate the Ns and whether data is paired or unpaired: How are they found?
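The weighted average on this slide can be reproduced numerically. This is a small sketch of generic inverse-variance weighting using the slide's numbers; the function name `inverse_variance_average` is hypothetical.

```python
from typing import Sequence, Tuple


def inverse_variance_average(
    estimates: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Return the inverse-variance weighted mean and the total weight K."""
    weights = [1.0 / v for v in variances]  # weight = 1 / SE^2
    k = sum(weights)
    combined = sum(w * e for w, e in zip(weights, estimates)) / k
    return combined, k


# Paired estimate 4.25 (SE^2 = 4.45) and unpaired estimate 5.0 (SE^2 = 43.1).
combined, k = inverse_variance_average([4.25, 5.0], [4.45, 43.1])
print(round(combined, 2))
```

Because the paired estimate has roughly a tenth of the variance, it receives about ten times the weight, so the combined value stays close to 4.25.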

MMRM Example - SAS Output

Covariance Parameter Estimates
  CS         12.6264   (Among Subjects)
  Residual    8.8996   (Within Subjects, Paired)

Effect     trt  Estimate  Standard Error  DF  t Value  Pr > |t|
Intercept       18.1454   2.0162          5   9.00     0.000
trt        1     4.3203   2.0082          3   2.15     0.1206   (A-B Diff)

SE² for N=4+4 Paired ΔW=4.25:

8.90(1/4 + 1/4) = 4.45

SE² for N=1+1 Unpaired ΔB=5:

(8.90+12.63)(1/1 + 1/1) = 43.1

Also, 4.45 and 43.1 are used to get SE(Δ) = 2.01
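The arithmetic linking the SAS variance components to the two SE² values and to SE(Δ) can be sketched as follows. The step from the two variances to SE(Δ) assumes the usual inverse-variance result, SE(Δ) = sqrt(1/K); the slide states only that the two values are used, so treat that formula as an assumption that happens to match the printed output.

```python
import math

# Variance components from the SAS output on this slide.
residual_var = 8.8996  # within-subject variance
cs_var = 12.6264       # among-subject (compound symmetry) variance

# Paired difference of means, 4 subjects per treatment: subject effects cancel.
se2_paired = residual_var * (1 / 4 + 1 / 4)

# Unpaired difference, 1 subject per treatment: both variance sources remain.
se2_unpaired = (residual_var + cs_var) * (1 / 1 + 1 / 1)

# Assumed: standard error of the inverse-variance weighted combination.
se_combined = math.sqrt(1.0 / (1.0 / se2_paired + 1.0 / se2_unpaired))

print(round(se2_paired, 2), round(se2_unpaired, 1), round(se_combined, 2))
```

Under that assumption the result agrees with the standard error 2.0082 reported for trt in the SAS output to about four decimal places.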

MMRM - More General I

The example was “balanced” in missing data, with information from both treatments A and B in the unpaired data.

What if all missing data are at week 8, and none at week 0, as in our paper?

The unpaired week 0 mean is compared with the combined paired week 0 and week 8 mean, giving an estimate of half of the week 0 to week 8 difference. It is appropriately weighted with the paired week 0 to week 8 estimate.
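The "half of the difference" statement follows from simple expectations, shown here with hypothetical population means. The sketch assumes that subjects with a missing week 8 value have the same expected week 0 mean as the completers; the numbers 25 and 15 are made up for illustration.

```python
# Hypothetical population means, chosen only for illustration.
mu_week0 = 25.0
mu_week8 = 15.0

# Expected week 0 mean among subjects who have no week 8 value.
unpaired_week0_mean = mu_week0

# Expected mean of completers' week 0 and week 8 values combined.
paired_combined_mean = (mu_week0 + mu_week8) / 2

contrast = unpaired_week0_mean - paired_combined_mean
full_change = mu_week0 - mu_week8
print(contrast, full_change / 2)
```

The contrast equals half of the week 0 to week 8 change, so doubling it gives a second, less precise estimate that can be weighted together with the paired estimate.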

MMRM - More General II

Can the intervening week 1 to week 6 data be used to improve further the week 0 to week 8 comparison?

That information could be used to better estimate the variances and covariances, if we are willing to make assumptions, e.g., a consistency of variability at each time.

Are these just “just so”, common-sense results?

Mixed model estimates satisfy certain statistical optimality criteria, provided that the model assumptions hold.

MMRM - Warning

Software has many options since mixed models are general and flexible. Defaults may not be appropriate.

Requires specifying model structure; assumptions needed; should check assumptions.

More experience is needed than for typical methods. Start by comparing ANOVA for a no-missing-data study with a mixed model.

See next slide for some modeling needed.

Some Covariance Patterns

Compound Symmetry

Estimated Covariance Pattern:

Week      0          8
0         (7.06)²    12.4
8         12.4       (7.06)²

Correlation = 12.4/(7.1 × 7.1) = 0.25

This model forces the SD among subjects to be the same at each week.

But: Week 0 SD = 5.2 Week 8 SD = 8.8

Unstructured

Estimated Covariance Pattern:

Week      0          8
0         (5.21)²    12.4
8         12.4       (8.79)²

Correlation = 12.4/(5.2 × 8.8) = 0.27

This model allows different SDs among subjects at each week.
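The two correlations on this slide come from dividing the covariance by the product of the two standard deviations. A minimal sketch using the slide's estimates; the helper name `correlation` is illustrative.

```python
def correlation(covariance: float, sd_a: float, sd_b: float) -> float:
    """Correlation implied by a covariance and the two standard deviations."""
    return covariance / (sd_a * sd_b)


# Compound symmetry: one common SD (7.06) at week 0 and week 8.
cs_corr = correlation(12.4, 7.06, 7.06)

# Unstructured: separate SDs at week 0 (5.21) and week 8 (8.79).
un_corr = correlation(12.4, 5.21, 8.79)

print(round(cs_corr, 2), round(un_corr, 2))
```

The implied correlations are similar under both patterns; the main difference is whether the week 0 and week 8 SDs are forced to be equal.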

MMRM Role in Major Analysis Methods

Remaining slides put MMRM in context with other major methods.

Mixed models are more general; MMRM is a special case.

Big Picture: “Multiple” Data Lingo

Multiple Regression:
Outcome: Single value, say HAM-A at 8 weeks.
Predictors: Multiple - treatment, covariates (age, baseline disease severity, other meds, etc.)

Multivariate ANOVA (MANOVA):
Outcome: Multiple, say (HAM-A, SDS, Dizziness) at 8 weeks, as a pattern or profile.
Predictors: Single, say only treatment, or multiple.

Repeated Measures (longitudinal, as in this paper):
Outcome: Single quantity, say HAM-A.
Predictors: Time, and others (treatment, covariates).

Repeated Measures Studies

The same subjects are measured repeatedly on the same outcome, usually at different times or body sites to be compared.

Does not apply to only replicated measurements, e.g. multiple histology slices that are averaged.

Time is usually relative, such as from start of treatment, or may be calendar time as in epidemiological studies.

Usually have fixed time intervals, but times may be different for different subjects, e.g., retrospective series of clinic visits.

Study goals will dictate type of analysis - next slide.

Goals in Repeated Measures Studies

Some study objectives:

• Compare treatments on the overall time-averaged outcome.

• Compare specific features of the time pattern, as in pharmacokinetic studies of AUC, peak, half-life, etc.

• Compare treatments at every time point.

• Compare treatments on rate of change over time.

• Compare treatments at end of study.

Mixed Models in General

“Mixed” means combination of fixed effects (e.g., drugs; want info on those particular drugs) and random effects (e.g., centers or patients; not interested in the particular ones in the study).

AKA multilevel models, hierarchical models.

Very flexible; can incorporate unequal patient variability, correlation, pairing, repeated values at multiple levels, subject clustering (e.g., from the same family), and data missing at random.

More specifications required than typical analyses.

Summary: Mixed Models Repeated Measures

• Currently one of the preferred methods for missing data.

• Does not resolve bias if missingness is related to treatment.

• Requires more model specifications than is typical.

• Mild deviations from assumed covariance pattern do not usually have a large influence.

• May be difficult to apply objectively in clinical trials where the primary analysis needs to be detailed a priori.

• Can be intimidating; need experience with modeling; software has many options to be general and flexible.