Power Analysis and Sample Size Determination AK Dhamija

Upload: ajay-dhamija

Post on 24-May-2015


Page 1: Power Analysis and Sample Size Determination

Power Analysis and Sample Size Determination

AK Dhamija

Page 2: Power Analysis and Sample Size Determination

Researchers differ

A researcher conducted a study comparing the effect of an intervention vs placebo on reducing body weight, and found 5 kg reduction among the intervention group with P=0.01.

Another researcher conducted a similar study comparing the effect of the same intervention vs the same placebo on reducing body weight, and found the same 5 kg reduction with the intervention group but could not claim that the intervention was effective because P=0.35.


Page 3: Power Analysis and Sample Size Determination

Agenda

Power
Sample Size Calculations
Examples
Changes in the basic formulae
Flaws in Statements

Page 4: Power Analysis and Sample Size Determination

Power Is Affected By...

Variation in the outcome (σ²): ↓ σ² → power ↑
Significance level (α): ↑ α → power ↑
Difference (effect) to be detected (δ): ↑ δ → power ↑
One-tailed vs. two-tailed tests: power is greater in one-tailed tests than in comparable two-tailed tests


Page 5: Power Analysis and Sample Size Determination

Power Changes

2n = 32, 2 sample test, 81% power, δ = 2, σ = 2, α = 0.05, 2-sided test

Variance/Standard deviation
– σ: 2 → 1, Power: 81% → 99.99%
– σ: 2 → 3, Power: 81% → 47%

Significance level (α)
– α: 0.05 → 0.01, Power: 81% → 60%
– α: 0.05 → 0.10, Power: 81% → 88%


Page 6: Power Analysis and Sample Size Determination

Power Changes

2n = 32, 2 sample test, 81% power, δ = 2, σ = 2, α = 0.05, 2-sided test

Difference to be detected (δ)
– δ: 2 → 1, Power: 81% → 29%
– δ: 2 → 3, Power: 81% → 99%

Sample size (n)
– n: 32 → 64, Power: 81% → 98%
– n: 32 → 28, Power: 81% → 75%

One-tailed vs. two-tailed tests
– Power: 81% → 88%
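The power figures on this slide can be approximated with a normal (z) calculation. The sketch below is only an illustration: the function name `power_two_sample` is hypothetical, the slides do not say which method produced their numbers, and the formula assumes equal group sizes and a known common σ.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(n_per_group, delta, sigma, alpha=0.05, two_sided=True):
    """Approximate power of a two-sample z-test with equal group sizes."""
    z = NormalDist()
    # Critical value: alpha is split across both tails for a two-sided test.
    z_crit = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    # Standard error of the difference between the two group means.
    se = sigma * sqrt(2 / n_per_group)
    # Chance of exceeding the critical value in the direction of the effect
    # (the tiny chance of rejecting in the wrong direction is ignored).
    return z.cdf(delta / se - z_crit)

# 2n = 32 means 16 subjects per group.
print(f"baseline         {power_two_sample(16, 2, 2):.2f}")
print(f"delta 2 -> 1     {power_two_sample(16, 1, 2):.2f}")
print(f"delta 2 -> 3     {power_two_sample(16, 3, 2):.2f}")
print(f"2n 32 -> 64      {power_two_sample(32, 2, 2):.2f}")
print(f"2n 32 -> 28      {power_two_sample(14, 2, 2):.2f}")
print(f"one-tailed       {power_two_sample(16, 2, 2, two_sided=False):.2f}")
```

Under these assumptions the output tracks the slide's values (about 0.81, 0.29, 0.99, 0.98, 0.75 and 0.88).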


Page 7: Power Analysis and Sample Size Determination

Power Formula

Depends on study design
Not hard, but can be VERY algebra intensive
May want to use a computer program or statistician


Page 8: Power Analysis and Sample Size Determination

How Big a Sample Do We Need?

Fundamental research question
Should be addressed after determining the primary objective and study design
Too few patients in a clinical study
– May fail to detect a clinically important difference
Too many
– Involve extra patients
– Therapy may have risks
– Cost more


Page 9: Power Analysis and Sample Size Determination

How Big a Sample Do We Need?

Fundamental research question
How big? 18, 180, 1,800, 18,000, 180,000


Page 10: Power Analysis and Sample Size Determination

Sample Size Formula Information

Variables of interest
– Type of data, e.g. continuous, categorical
Desired power
Desired significance level
Effect/difference of clinical importance
Standard deviations of continuous outcome variables
One or two-sided tests


Page 11: Power Analysis and Sample Size Determination

Sample Size & Study Design

Randomized controlled trial (RCT)
Block/stratified-block randomized trial
Equivalence trial
Non-randomized intervention study
Observational study
Prevalence study
Measuring sensitivity and specificity


Page 12: Power Analysis and Sample Size Determination

Sample Size & Data Structure

Paired data
Repeated measures
Groups of equal sizes
Hierarchical data


Page 13: Power Analysis and Sample Size Determination

Sample Size

Non-randomized studies looking for differences or associations require a larger sample to allow adjustment for confounding factors
Absolute sample size is of interest
– Surveys sometimes take a % of population approach
Study's primary outcome is the variable you do the sample size calculation for
– If secondary outcome variables are considered important, make sure the sample size is sufficient
Increase the 'real' sample size to reflect loss to follow up, expected response rate, lack of compliance, etc.
– Make the link between the calculation and the increase
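One common way to link the calculation to the increase is to divide the calculated size by the fraction of subjects expected to remain. The sketch below is illustrative only: the function name `inflate_for_attrition` and the numbers (200 subjects, 15% loss) are hypothetical and do not come from the slides.

```python
from math import ceil

def inflate_for_attrition(n_required, expected_loss):
    """Number to enrol so that about n_required remain after losses.

    expected_loss is the anticipated fraction lost to follow up,
    non-response, or non-compliance (0 <= expected_loss < 1).
    """
    if not 0 <= expected_loss < 1:
        raise ValueError("expected_loss must be in [0, 1)")
    return ceil(n_required / (1 - expected_loss))

# Hypothetical example: the calculation asks for 200, and 15% loss is expected.
print(inflate_for_attrition(200, 0.15))  # 236
```

Stating both numbers (200 required, 236 enrolled because of an assumed 15% loss) makes the link between the calculation and the increase explicit.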


Page 14: Power Analysis and Sample Size Determination

Steps

Step 1. Define Primary Objective
To see if feeding milk to 5 year old kids enhances growth.

Step 2. Study Design
5 yr olds → Extra Milk Diet or Normal Milk Diet
Outcome: height (cm)

Step 3. Define clinically significant difference one wishes to detect
Difference (∆) of 0.5 cm


Page 15: Power Analysis and Sample Size Determination

Steps

Step 4. Define degree of certainty of finding this difference
Beta (β) or type II error: The probability of NOT detecting a significant difference when there really is one.
Risk of a false-negative finding, i.e. risk of declaring no significant difference in height between the milk diets when a difference really does exist.
Set at ≤ 20%
Power of the Test: Probability of detecting a predefined clinically significant difference.
Power = (1 − β) = 1 − 20% = 80%


Page 16: Power Analysis and Sample Size Determination

Steps

Step 5. Define significance level
Alpha (α) or type I error: The probability of detecting a significant difference when the treatments are really equally effective
Risk of a false-positive finding
Set at 5%: One has a 5% chance (1 in 20) of declaring a significant difference between the milk diets when in fact they are really equal.
We are willing to accept that 1 time out of 20 we will produce a false positive finding

Statistical significance testing does not eliminate uncertainty, it merely quantifies it.


Page 17: Power Analysis and Sample Size Determination

For the Milk Study

Type I error (α) = 0.05
Type II error (β) = 0.20
Power = (1 − β) = 0.80
Clinically significant difference (∆) = 0.5 cm
Measure of variation (SD) = 2.0 cm
– Exists in literature or "Guesstimate"

Formula

$$N = \frac{2\,(SD)^2 \times f(\alpha, \beta)}{\Delta^2} = \frac{2\,(2)^2 \times 7.9}{0.5^2} = 252.8 \text{ (each group)}$$


Values of f(α, β)

              Beta
Alpha    0.05   0.10   0.20   0.50
0.10     10.8    8.6    6.2    2.7
0.05     13.0   10.5    7.9    3.8
0.02     15.8   13.0   10.0    5.4
0.01     17.8   14.9   11.7    6.6
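The tabulated values agree with f(α, β) = (z₁₋α/₂ + z₁₋β)², the usual two-sided normal-theory multiplier. The slides do not state this definition, so treat it as an assumption. The sketch below recomputes the table and the milk-study sample size; the function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def f_alpha_beta(alpha, beta):
    """(z_{1-alpha/2} + z_{1-beta})^2 for a two-sided test (assumed definition)."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(1 - beta)) ** 2

def n_per_group(sd, delta, alpha=0.05, beta=0.20):
    """Unrounded per-group size for two means: 2 * SD^2 * f(alpha, beta) / delta^2."""
    return 2 * sd ** 2 * f_alpha_beta(alpha, beta) / delta ** 2

# Recompute the table: rows are alpha, columns are beta.
betas = (0.05, 0.10, 0.20, 0.50)
for alpha in (0.10, 0.05, 0.02, 0.01):
    row = "  ".join(f"{f_alpha_beta(alpha, b):5.1f}" for b in betas)
    print(f"{alpha:4.2f}  {row}")

# Milk study: SD = 2.0 cm, difference = 0.5 cm, alpha = 0.05, beta = 0.20.
n = n_per_group(sd=2.0, delta=0.5)
print(f"{n:.1f} per group, rounded up to {ceil(n)}")
```

With unrounded z values the α = 0.05, β = 0.20 entry is about 7.85 rather than the tabulated 7.9, so the milk study comes out near 251.2 per group instead of 252.8. The difference is rounding only.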

Page 18: Power Analysis and Sample Size Determination

Simple Method

Nomogram
Standardized difference = smallest medically relevant difference / estimated standard deviation = 0.5 / 2.0 = 0.25

Assumptions:
1. 2 sample comparison only
2. Same number of subjects per group
3. Variable is a continuous measure that is normally distributed
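Under the same three assumptions, the standardized difference can be fed into the normal-theory formula directly instead of being read off a nomogram. This is a sketch, not the nomogram's own method: the function name is hypothetical, and α = 0.05 (two-sided) and 80% power are carried over from the milk study.

```python
from statistics import NormalDist

def n_from_standardized_difference(d, alpha=0.05, power=0.80):
    """Unrounded per-group size for two equal groups: 2 * (z_a + z_b)^2 / d^2."""
    z = NormalDist().inv_cdf
    return 2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2

d = 0.5 / 2.0  # smallest relevant difference / estimated SD
n = n_from_standardized_difference(d)
print(f"d = {d}: about {n:.0f} per group, {2 * n:.0f} in total")
```

For d = 0.25 this gives roughly 251 per group, about 500 subjects in total.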


[Figure: nomogram; surviving label: 500]

Page 19: Power Analysis and Sample Size Determination

1 sample test

Study Objective: Study effect of new sleep aid
Baseline to sleep time after taking the medication for one week
Two-sided test, α = 0.05, power = 1 − β = 90%
Difference (δ) = 1 (4 hours of sleep to 5)
Standard deviation (σ) = 2 hr

$$n = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2 \sigma^2}{\delta^2} = \frac{(1.960 + 1.282)^2 \, 2^2}{1^2} = 42.04 \;\Rightarrow\; n = 43$$

Changing δ from 1 hr to 2 hr makes n go from 43 to 11

$$n = \frac{(1.960 + 1.282)^2 \, 2^2}{2^2} = 10.51 \;\Rightarrow\; n = 11$$

Page 20: Power Analysis and Sample Size Determination

1 sample test

Changing power from 90% to 80% makes n go from 11 to 8
(Small sample: start thinking about using the t distribution)

$$n = \frac{(1.960 + 0.841)^2 \, 2^2}{2^2} = 7.85 \;\Rightarrow\; n = 8$$

Changing the standard deviation from 2 to 3 makes n go from 8 to 18

$$n = \frac{(1.960 + 0.841)^2 \, 3^2}{2^2} = 17.65 \;\Rightarrow\; n = 18$$
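The one-sample sleep-aid calculations can be reproduced with a few lines of code. This is a minimal sketch of the normal-approximation formula n = (Z₁₋α/₂ + Z₁₋β)² σ² / δ² used on these slides; the function name `n_one_sample` is hypothetical, and for small n a t-based calculation would give slightly larger values.

```python
from math import ceil
from statistics import NormalDist

def n_one_sample(delta, sigma, alpha=0.05, power=0.90):
    """Sample size for a two-sided one-sample z-test, rounded up."""
    z = NormalDist().inv_cdf
    n = (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)

print(n_one_sample(delta=1, sigma=2))              # 43
print(n_one_sample(delta=2, sigma=2))              # 11
print(n_one_sample(delta=2, sigma=2, power=0.80))  # 8
print(n_one_sample(delta=2, sigma=3, power=0.80))  # 18
```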

Page 21: Power Analysis and Sample Size Determination

Sleep Aid Example: 2 Sample

Original design (2-sided test, α = 0.05, 1 − β = 90%, σ = 2 hr, δ = 1 hr)
Two sample randomized parallel design
Needed 43 in the one-sample design
In 2-sample need twice that, in each group!
4 times as many people are needed in this design

$$n = \frac{2\,(Z_{1-\alpha/2} + Z_{1-\beta})^2 \sigma^2}{\delta^2} = \frac{2\,(1.960 + 1.282)^2 \, 2^2}{1^2} = 84.1 \;\Rightarrow\; n = 85 \text{ per group, 170 total!}$$

Changing δ from 1 hr to 2 hr makes the total n go from 170 to 44

$$n = \frac{2\,(1.960 + 1.282)^2 \, 2^2}{2^2} = 21.02 \;\Rightarrow\; n = 22 \text{ per group, 44 total}$$

Page 22: Power Analysis and Sample Size Determination

Sleep Aid Example: 2 Sample

Changing power from 90% to 80% makes the total n go from 44 to 32

$$n = \frac{2\,(1.960 + 0.841)^2 \, 2^2}{2^2} = 15.69 \;\Rightarrow\; n = 16 \text{ per group, 32 total}$$

Changing the standard deviation from 2 to 3 makes the total n go from 32 to 72

$$n = \frac{2\,(1.960 + 0.841)^2 \, 3^2}{2^2} = 35.31 \;\Rightarrow\; n = 36 \text{ per group, 72 total}$$
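The two-sample version differs from the one-sample formula only by the factor of 2 in the numerator. The sketch below uses the normal approximation with equal group sizes and a common σ; the function name `n_two_sample` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_two_sample(delta, sigma, alpha=0.05, power=0.90):
    """Per-group size for a two-sided, two-sample z-test, rounded up."""
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)

scenarios = [
    ("original design", dict(delta=1, sigma=2)),
    ("delta 1 -> 2", dict(delta=2, sigma=2)),
    ("power 90% -> 80%", dict(delta=2, sigma=2, power=0.80)),
    ("sigma 2 -> 3", dict(delta=2, sigma=3, power=0.80)),
]
for label, kwargs in scenarios:
    n = n_two_sample(**kwargs)
    print(f"{label:18s} {n:3d} per group, {2 * n:3d} total")
```

The totals come out as 170, 44, 32 and 72, matching the slides.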

Page 23: Power Analysis and Sample Size Determination

Summary

Changes in the detectable difference have HUGE impacts on sample size
– 20 point difference → 25 patients/group
– 10 point difference → 100 patients/group
– 5 point difference → 400 patients/group

Changes in α, β, σ, number of samples, and whether it is a 1- or 2-sided test can all have a large impact on your sample size calculation
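The 25 → 100 → 400 pattern follows from δ² sitting in the denominator of the sample size formulas: halving the detectable difference quadruples n when everything else is held fixed. A small sketch with a hypothetical helper name:

```python
def scaled_n(n_ref, delta_ref, delta_new):
    """Rescale a per-group size when only the detectable difference changes."""
    return n_ref * (delta_ref / delta_new) ** 2

# Start from the slide's reference point: 20 points -> 25 patients per group.
for delta in (20, 10, 5):
    print(f"{delta:2d} point difference -> {scaled_n(25, 20, delta):.0f} patients/group")
```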


Page 24: Power Analysis and Sample Size Determination

Matched Pair Designs

Similar to 1-sample formula
Means (paired t-test)
– Mean difference from paired data
– Variance of differences
Proportions
– Based on discordant pairs


Page 25: Power Analysis and Sample Size Determination

Difference in Proportion

Study Objective: To increase survival by 5% with a new cancer drug
P1 = % survival (std) = 85%
P2 = % survival (new) = 90%
Power = 90%

$$N = \frac{P_1(100 - P_1) + P_2(100 - P_2)}{(P_2 - P_1)^2} \times f(\alpha, \beta) = \frac{1275 + 900}{25} \times 10.5 = 913.5 \text{ (each group)} = 1827 \text{ total}$$

A very large study has the power to demonstrate statistical significance for very small, even clinically inconsequential differences.
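The proportion formula can be checked numerically. The sketch below takes percentages, as the slide does, and assumes a two-sided α = 0.05; the slide does not state α, but 913.5 corresponds to f(α, β) = 10.5, the tabulated value for α = 0.05 and β = 0.10. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.90):
    """Unrounded per-group size for comparing two proportions given in percent."""
    z = NormalDist().inv_cdf
    f = (z(1 - alpha / 2) + z(power)) ** 2
    return (p1 * (100 - p1) + p2 * (100 - p2)) / (p2 - p1) ** 2 * f

n = n_per_group_proportions(85, 90)
print(f"{n:.1f} per group, rounded up to {ceil(n)}")
```

With unrounded z values the result is about 914 per group; the slide's 913.5 uses the rounded multiplier 10.5.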


Page 26: Power Analysis and Sample Size Determination

Changes in basic formulae

Unequal #s in Each Group
Ratio of cases to controls
Use if want λ patients randomized to the treatment arm for every patient randomized to the placebo arm
Take no more than 4-5 controls/case

$$n_2 = \lambda n_1 \quad (\lambda \text{ controls for every case})$$

$$n_1 = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2 \left(\sigma_1^2 + \sigma_2^2/\lambda\right)}{\delta^2}$$
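A sketch of the unequal-allocation calculation, assuming the formula on this slide is n₁ = (Z₁₋α/₂ + Z₁₋β)² (σ₁² + σ₂²/λ) / δ² with n₂ = λ·n₁. The function name and the reuse of the sleep-aid numbers (σ = 2 hr, δ = 1 hr, 90% power) are illustrative choices, not part of the slide.

```python
from math import ceil
from statistics import NormalDist

def n_unequal(delta, sigma1, sigma2, lam, alpha=0.05, power=0.90):
    """Return (n1, n2), where n2 = lam * n1, for a two-sided z-test."""
    z = NormalDist().inv_cdf
    zsum_sq = (z(1 - alpha / 2) + z(power)) ** 2
    n1 = ceil(zsum_sq * (sigma1 ** 2 + sigma2 ** 2 / lam) / delta ** 2)
    n2 = ceil(lam * n1)
    return n1, n2

for lam in (1, 2, 4):
    n1, n2 = n_unequal(delta=1, sigma1=2, sigma2=2, lam=lam)
    print(f"lambda = {lam}: n1 = {n1}, n2 = {n2}, total = {n1 + n2}")
```

With λ = 1 this reduces to the balanced two-sample result (85 per group). Larger λ shrinks n₁ but raises the total, which is one reason ratios much beyond 4-5 to 1 rarely pay off.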


Page 27: Power Analysis and Sample Size Determination

# of Covariates & # of Subjects

At least 10 subjects for every variable investigated
– In logistic regression
– No general justification
– This is stability, not power
– Peduzzi et al. (1985): biased regression coefficients and variance estimates

Principal component analysis (PCA) (Thorndike 1978, p. 184): N ≥ 10m + 50 or even N ≥ m² + 50


Page 28: Power Analysis and Sample Size Determination

Balanced Designs: Easier

Equal numbers in two groups is the easiest to handle
If you have more than two groups, still, equal sample sizes easiest
Complicated design = simulations
– Done by the statistician


Page 29: Power Analysis and Sample Size Determination

Multiple Comparisons

If you have 4 groups
– All 2 way comparisons of means
– 6 different tests
Bonferroni: divide α by # of tests
– 0.025/6 ≈ 0.0042
High-throughput laboratory tests
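The count of pairwise tests and the Bonferroni-adjusted level can be computed directly. A minimal sketch with a hypothetical helper name; the starting level 0.025 is the one used on the slide.

```python
from math import comb

def bonferroni_pairwise(alpha, n_groups):
    """Return (number of pairwise tests, per-test alpha) for all 2-way comparisons."""
    n_tests = comb(n_groups, 2)
    return n_tests, alpha / n_tests

n_tests, per_test_alpha = bonferroni_pairwise(0.025, 4)
print(n_tests, round(per_test_alpha, 4))  # 6 0.0042
```

The per-test α, not the original one, is what goes into the sample size formula, which is why multiple comparisons push the required n upward.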


Page 30: Power Analysis and Sample Size Determination

Flaws in Statements

"A previous study in this area recruited 150 subjects and found highly significant results (p=0.014), and therefore a similar sample size should be sufficient here."
– Previous studies may have been 'lucky' to find significant results, due to random sampling variation.

"Sample sizes are not provided because there is no prior information on which to base them."
– Find previously published information
– Conduct small pre-study
– If a very preliminary pilot study, sample size calculations not usually necessary

No prior information on standard deviations
– Give the size of difference that may be detected in terms of number of standard deviations

Page 31: Power Analysis and Sample Size Determination

Roadmap

1. Do a sample size calculation before you start collecting data
2. Collect data
3. Perform statistical test: if p value < 0.05, declare statistical significance
4. Consider clinical significance by looking at the size of the difference


Page 32: Power Analysis and Sample Size Determination

References

"Sample Size Estimation", Phil Hahn, Queen's University

"Sample Size and Power", Laura Lee Johnson, Ph.D., Statistician, National Center for Complementary and Alternative Medicine

"Sample Size Estimation and Power Analysis", Ayumi Shintani, PhD, MPH, Department of Biostatistics, Vanderbilt University