Post on 22-Dec-2015


Sample Size

Annie Herbert
Medical Statistician
Research & Development Support Unit
Salford Royal Hospitals NHS Foundation Trust

[email protected]
0161 206 4567

Timetable

Time       Task
60 mins    Presentation
           Discussion
15 mins    Break
60 mins    Demonstration of sample size (S.S.) calculation
           Practical tasks

Outline

When are sample size calculations necessary?

Single proportions
Two proportions
Two means
Other situations

Practical implications
Useful references

Why is it important to consider sample size?

To have a high chance of detecting a clinically important treatment effect if it exists.

To ensure appropriate precision of estimates.

To avoid wasting resources and the time of participants.

To avoid misleading conclusions.

When is a sample size calculation not necessary?

Truly qualitative research.

Pilot studies that will be used to inform larger studies (and not make conclusions).

Example 1: Population studies, single proportion (1)

What is the prevalence of dysfunctional breathing amongst asthma patients in general practice?
(Thomas et al, BMJ 2001)

Results: Sample proportion of those suffering from dysfunctional breathing and a confidence interval for this proportion.

Population studies, single proportion (2)

Primary outcome variable: Binary

Required information:

1) Estimate of what the proportion will be (if the rate is totally unknown, pick 50% as the most conservative estimate).

2) Size of the population if the population is small, e.g., < 20,000.

3) Acceptable deviation from this population estimate (half-width of the confidence interval).

Population studies, single proportion (3)

Statement for Protocol (1)

Include all figures that you have inputted into the calculation.

Possibly add a statement about response rate/drop-out.

Name the person who did the sample size calculation and any software used.

E.g., ‘A sample of 324 patients will be required to obtain a 95% confidence interval of +/- 5% around a prevalence of approximately 30%. This was calculated by the PI using StatsDirect. We expect that 60% of those whom we approach will agree to take part and, as this is a questionnaire-based study, that there will be next to no drop-outs, so we intend to approach 540 patients.’
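The step from 324 required patients to 540 approached patients in the example statement can be sketched as below. This is only an illustration: the function name and the optional drop-out argument are assumptions, not part of the original slides.

```python
def number_to_approach(required_n: int, response_percent: int, dropout_percent: int = 0) -> int:
    """Smallest number of people to approach so that, on average,
    at least `required_n` of them agree to take part and complete the study."""
    # Work in whole percentages with integer arithmetic to avoid
    # floating-point rounding surprises when taking the ceiling.
    retained = response_percent * (100 - dropout_percent)  # expressed out of 100 * 100
    return -(-required_n * 10_000 // retained)  # ceiling division


# Figures from the example statement: 324 needed, 60% expected to agree, no drop-out.
print(number_to_approach(324, 60))  # 540
```

The same function can be rerun with a more pessimistic response rate when planning recruitment.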

How sample size varies with precision:

Expected Prevalence    Acceptable Deviation (+/- %)    Sample Size
30%                    10                              81
30%                    5                               313
30%                    1                               4466
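A rough sketch of how figures like those in the table can be obtained, using the normal approximation for a single proportion with a finite population correction. The population size of 10,000 is an assumption: the slides do not state which population size was used, but this value happens to reproduce the three sample sizes shown. StatsDirect may well use a different method internally.

```python
from math import ceil
from statistics import NormalDist


def n_single_proportion(p, half_width, confidence=0.95, population=None):
    """Approximate sample size to estimate a proportion `p` to within
    +/- `half_width` (both given as fractions, e.g. 0.30 and 0.05)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * p * (1 - p) / half_width ** 2
    if population is not None:
        # Finite population correction for small populations
        n = n / (1 + (n - 1) / population)
    return ceil(n)


# Assumed population of 10,000 (not stated in the slides)
for deviation in (0.10, 0.05, 0.01):
    print(f"+/- {deviation:.0%}: n = {n_single_proportion(0.30, deviation, population=10_000)}")
```

Without the population correction the same approximation gives 323 for +/- 5%, close to but not identical to the 324 quoted in the protocol statement.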

Example 2: Comparing two proportions (1)

Study: RCT comparing the effectiveness of colony-stimulating factors (CSFs) in reducing sepsis in premature babies.

Results: Rate of sepsis at 2 weeks in CSF group and Placebo group, difference between these two proportions, confidence interval for this difference.

Comparing two proportions (2)

Primary outcome variable: Binary.

Required information:

1) Estimate of the proportion in each group (such that the difference between them is clinically important).

2) Power.

3) Significance level.

4) Treatment:Control ratio (often 1:1).

Definitions (1)

‘Effect Size’
- What do you expect to see?
- What has been seen previously?
- What is a clinically important difference?

‘Power’
- Probability of detecting a clinically important effect, if it exists.
- Typically 80% or 90%.

Definitions (2)

‘Significance Level’
- Cut-off level at which you would say a p-value is significant/non-significant.
- Probability of concluding that there is a statistically significant difference in the sample when there is in fact no true difference in the population.
- Typically 5%.
- Should be set lower if multiple statistical tests have been planned.

Comparing two proportions (3)

Statement for Protocol (2)

Comparing two proportions:

E.g., ‘A sample size of at least 149 patients per group is required to be able to detect an absolute difference of 16% (50% vs. 34%) in the rate of sepsis between groups with 80% power, at 5% significance level’.
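As an illustration of where a figure like 149 per group can come from, the sketch below uses a standard normal-approximation formula for comparing two proportions with equal group sizes and no continuity correction. The choice of formula is an assumption; the slides do not say which method the software used, although this one reproduces the quoted number.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, power=0.80, alpha=0.05):
    """Approximate sample size per group (1:1 allocation, two-sided test)
    to detect a difference between proportions p1 and p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2  # average proportion, used under the null hypothesis
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Sepsis example: 50% vs. 34%, 80% power, 5% significance level
print(n_per_group_two_proportions(0.50, 0.34))  # 149
```

Raising the power to 90% or shrinking the difference to be detected increases the required size per group.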

Example 3: Comparing two means (1)

An RCT to evaluate a brief psychological intervention in comparison to usual treatment in the reduction of suicidal ideation. (Guthrie et al, BMJ 2001)

‘Suicidal ideation will be measured on the Beck scale; the standard deviation of this scale in a previous study was 7.7, and a difference of 5 points is considered to be of clinical importance.’

Results: Mean reduction in Beck score in the Intervention group and Usual Treatment group, difference between these two means, confidence interval for this difference.

Comparing two means (2)

Primary outcome variable: Numerical

Required information:

1) Estimate of standard deviation of primary outcome variable.

2) Effect size (difference in means).

3) Power.

4) Significance level.

5) Treatment:Control ratio.

Where to find an estimate of the standard deviation:

Pilot study.
- Though note standard deviations on very small numbers may be imprecise.

Previous studies.

In-house data.

Rough estimate: Quarter of the range of ‘usual’ values.

Comparing two means (3)

Statement for Protocol (3)

Comparing two means:

E.g., ‘A sample size of at least 39 patients per group is required to be able to detect a difference in mean Beck score of 5 points or more with 80% power at 5% significance level. This is assuming a standard deviation of 7.7 for the Beck scale.’
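The Beck scale example can be approximated as follows. This is a sketch under stated assumptions: it uses the usual normal-approximation formula for two means with equal group sizes, plus an optional small-sample adjustment (adding a quarter of the squared critical value) that is often used to mimic a t-test based calculation. The slides do not say how the software arrived at 39, so the adjustment is an assumption that happens to match.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_means(sd, difference, power=0.80, alpha=0.05, small_sample_correction=True):
    """Approximate sample size per group (1:1 allocation, two-sided test)
    to detect `difference` in means, given a common standard deviation `sd`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * (sd / difference) ** 2
    if small_sample_correction:
        # Assumed adjustment to allow for using a t-test rather than a z-test
        n += z_alpha ** 2 / 4
    return ceil(n)


# Beck scale example: SD 7.7, difference of 5 points, 80% power, 5% significance
print(n_per_group_two_means(7.7, 5))                                 # 39
print(n_per_group_two_means(7.7, 5, small_sample_correction=False))  # 38
```

The required size depends on the ratio of the standard deviation to the difference, so an imprecise standard deviation estimate feeds directly into the answer.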

Things to Note

Power is linked to effect size.
- All trials have an infinite number of powers!

Post-hoc power calculations are pointless.
- Power is conveyed by the confidence interval.

If secondary outcomes are important, separate sample size calculations should be done for these too.
- The largest sample size resulting from these calculations should be used, so that the study is powerful enough for all analyses.
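To illustrate the point that power is tied to the effect size, the sketch below computes approximate power for a fixed sample size across several candidate differences. It assumes a two-sided z-test approximation for two means with equal groups, reuses the Beck scale figures (39 per group, standard deviation 7.7) purely as an example, and the candidate differences are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(n_per_group, sd, difference, alpha=0.05):
    """Approximate power of a two-sided comparison of two means
    (normal approximation; the negligible opposite tail is ignored)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sd * sqrt(2 / n_per_group)
    return NormalDist().cdf(abs(difference) / standard_error - z_alpha)


# Same trial, different true differences: one sample size, many powers
for difference in (2, 3, 5, 7):
    print(f"difference = {difference}: power = {power_two_means(39, 7.7, difference):.2f}")
```

A trial sized for a 5-point difference has noticeably less power for smaller differences and more for larger ones, which is why a power figure means little unless the effect size is stated alongside it.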

Other Situations

More than 2 groups.

Non-randomised studies, e.g., Case-Control.

Equivalence trials.

Paired data, e.g., crossover, before/after trial.

Time-to-event data.

Cluster-randomised studies.

Diagnosis studies.

Sample size calculations in practice:

Often look at a range of assumptions.
- Best case/worst case scenario.

Balance between ideal statistical power, resources and time.
- Bear in mind only ¼ may consent (or lower).

Interim/sub-group analyses.
- Seek advice and adjust p-values, etc. accordingly.

Issues of power should not overshadow issues of quality.

Useful References

Sample size calculations in randomised trials: mandatory and mystical.
Schulz & Grimes, The Lancet 2005; 365

An Introduction to Medical Statistics.
Bland, M, OUP 2000

Sample size calculations for clinical studies.
Machin, Campbell, Fayers & Pinol, Blackwells