Sample Size Determination

Bandit Thinkhamrop, PhD (Statistics)
Dept. of Biostatistics & Demography
Khon Kaen University

Essentials of sample size calculation

– No one accepts a “magic number”
– Too large vs. too small
– To justify the study to the sponsor and the Ethics Committee
– To ensure:
  – adequate power to test a hypothesis
  – the desired precision to obtain an estimate

Two main approaches

– Hypothesis-based sample size calculation
  – Involves “power” or beta error
  – Ensures a significant finding, but the finding may not be clinically conclusive
  – Easy and widely available
– Confidence interval methods of sample size calculation
  – Involve the precision of the estimate
  – Ensure a clinically conclusive finding, as this method directly estimates the magnitude of effect
  – Difficult and not widely available

Overall steps

– Identify the primary outcome.
– Identify and review the magnitude of effect and its variability that will be used as the basis of the conclusion of the research.
– Identify the statistical method that will be used to obtain the main magnitude of effect.
– Calculate the sample size.
– Describe how the sample size is calculated in enough detail that the calculation can be explained and checked.

Steps in the calculation

– Base sample size calculation
– Design effect (for correlated outcomes)
– Contingency (increase to account for non-response or dropout)
– Rounding up to the nearest (and comfortable) number
– Evaluate whether this sample size would provide a precise and conclusive answer to the research question by analyzing the data as if they turn out as expected.
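The adjustment steps listed above can be sketched in Python. This is a minimal illustration, not a prescribed method. It assumes the common design-effect form 1 + (m - 1) × ICC for equal-sized clusters and inflates for dropout by dividing by the expected retention proportion. The function name, parameter names, and example inputs are hypothetical.

```python
import math

def adjusted_sample_size(base_n, cluster_size=1, icc=0.0, dropout=0.0, round_to=1):
    """Inflate a base sample size for clustering and expected dropout.

    Assumptions (not from the slides): the design effect is
    1 + (cluster_size - 1) * icc, and dropout is handled by dividing
    by the expected retention proportion (1 - dropout).
    """
    # Step 2: design effect for correlated outcomes.
    design_effect = 1 + (cluster_size - 1) * icc
    n = base_n * design_effect
    # Step 3: contingency for non-response or dropout.
    n = n / (1 - dropout)
    # Step 4: round up to the next multiple of `round_to`.
    return math.ceil(n / round_to) * round_to

# Hypothetical inputs: base n of 385, clusters of 20, ICC 0.02, 10% dropout.
print(adjusted_sample_size(385, cluster_size=20, icc=0.02, dropout=0.10, round_to=10))
```

With no clustering and no dropout, the function returns the base sample size unchanged.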

Suggested approaches

– For unknown parameters in the formula, try to find existing evidence or use your best “GUESSTIMATE”, a.k.a. educated guess.
– Do not use only one scenario or rely on only one reference for the calculation. It is highly recommended that all key parameters be varied to see how they affect the sample size.
– Always evaluate its sufficiency by estimating the main magnitude of effect and its 95% CI and checking whether it provides a conclusive finding.
– Consult with the statistician early.
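One way to vary all key parameters at once is to tabulate the sample size over a grid of scenarios. The sketch below is illustrative only. `scenario_table` and `n_one_mean` are hypothetical helper names, and the demonstration uses the normal-approximation formula for a single mean with z fixed at 1.96.

```python
import math
from itertools import product

def scenario_table(calc, **ranges):
    """Evaluate calc(**params) for every combination of the given ranges."""
    names = list(ranges)
    rows = []
    for values in product(*(ranges[name] for name in names)):
        params = dict(zip(names, values))
        rows.append({**params, "n": calc(**params)})
    return rows

def n_one_mean(sd, d, z=1.96):
    """Normal-approximation sample size for a mean: (z * sd / d)^2, rounded up."""
    return math.ceil((z * sd / d) ** 2)

# Vary the assumed standard deviation and the desired half-width together.
for row in scenario_table(n_one_mean, sd=[25, 30], d=[5, 10]):
    print(row)
```

Any other sample size function can be passed in place of `n_one_mean`, as long as its keyword arguments match the names of the ranges.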

Common pitfalls

– Unjustified sample size, specified as a “magic” number
– Based on a simplified formula or a sample size table without understanding its limitations
– "A previous study in this area recruited 50 subjects and found highly significant results (p=0.001), and therefore a similar sample size should be sufficient." Never do it like this.
– Inconsistent with the protocol
– Relying too much on previous findings in the sample size calculation

Examples of common calculations

– Mean – one group
– Mean – two independent groups
– Proportion – one group
– Proportion – two independent groups
– Get some ideas from these
– Practice with your own research

Mean – one group: Formula

$$n = \frac{Z_{\alpha/2}^{2}\, s^{2}}{d^{2}}$$

Where:

n = the sample size
$Z_{\alpha/2}$ = the standard normal coefficient, typically 1.96 for a 95% CI
s = the standard deviation
d = the desired precision level, expressed as half of the maximum acceptable confidence interval width

Mean – one group: Calculations (fixed α = 0.05)

Expected standard deviation   Precision (half width)   n
25                            5                        99
30                            5                        141
25                            10                       27
30                            10                       38

Mean – one group: Descriptions

A sample size of 38 would be able to estimate a mean with a precision of 10, assuming a standard deviation of 30 according to a study by <Reference>. That is, based on the expected mean of 55 <Reference>, the 95% confidence interval of the estimated mean would be between 45 and 65.
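The description can be checked with a short Python sketch, assuming the normal-approximation formula n = Z² s² / d² and a z-based confidence interval. The function names are hypothetical. With a standard deviation of 30 and a half-width of 10, this sketch gives a smaller n than the 38 quoted above. The quoted figure is consistent with software that iterates on t quantiles instead of z, but that reading is an assumption.

```python
import math
from statistics import NormalDist

def n_one_mean(sd, d, conf=0.95):
    """Sample size to estimate a mean within +/- d (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * sd / d) ** 2)

def expected_ci(mean, sd, n, conf=0.95):
    """Confidence interval expected if the data turn out as assumed (z-based)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

print(n_one_mean(sd=30, d=10))
low, high = expected_ci(mean=55, sd=30, n=38)
print(round(low, 1), round(high, 1))
```

Running `expected_ci` with the planned n is the "analyze the data as if it is as expected" check: the interval should be narrow enough to support a conclusive statement.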

Mean – two independent groups: Formula

$$n = \frac{2\sigma^{2}\,(Z_{\alpha/2} + Z_{\beta})^{2}}{\Delta^{2}}$$

Where:

n = the sample size in each group (assumes equal-sized groups)
$Z_{\alpha/2}$ = represents the desired level of statistical significance (typically 1.96 for α = 0.05)
$Z_{\beta}$ = represents the desired power (typically 0.84 for 80% power)
$\sigma^{2}$ = a measure of variability (the variance, i.e. the square of the standard deviation)
$\Delta$ = the minimum meaningful difference, or effect size

Mean – two independent groups: Calculations (fixed α = 0.05)

H0: M1 - M2 = 0. H1: M1 - M2 = D1 ≠ 0. Test statistic: Z test with pooled variance (SD1 = 20; SD2 = 25)

Power   Mean in control group   Minimum meaningful difference   n1    n2
90%     30                      10                              109   109
80%     30                      10                              82    82
90%     30                      20                              28    28
80%     30                      20                              22    22
90%     35                      5                               432   432
80%     35                      5                               322   322
90%     35                      15                              49    49
80%     35                      15                              37    37

Mean – two independent groups: Descriptions

A sample size of 37 in group one and 37 in group two would have a power of 80% to detect a difference between groups of 15, assuming a mean of 35 in the control group with estimated group standard deviations of 20 and 25, respectively, according to a study by <Reference>.

The test statistic used is the two-sided two-sample t-test. The significance level of the test was targeted at 0.05.
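A minimal Python sketch of the usual normal-approximation calculation for comparing two means, with n per group = (SD1² + SD2²)(Z_{α/2} + Z_β)² / D². This reduces to 2σ²(Z_{α/2} + Z_β)² / D² when both standard deviations are equal. The function name is hypothetical. Because it uses z quantiles, it returns values about one subject per group below those quoted above, which presumably come from t-test-based software (an assumption).

```python
import math
from statistics import NormalDist

def n_two_means(sd1, sd2, diff, power=0.80, alpha=0.05):
    """Per-group sample size for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil((sd1**2 + sd2**2) * (z_alpha + z_beta) ** 2 / diff**2)

# Scenario from the description: SDs of 20 and 25, difference of 15, 80% power.
print(n_two_means(sd1=20, sd2=25, diff=15, power=0.80))
```

Halving the detectable difference roughly quadruples the required n, which is why the minimum meaningful difference deserves the most scrutiny.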

Proportion – one group: Formula

$$n = \frac{Z_{\alpha/2}^{2}\, p\,(1 - p)}{d^{2}}$$

Where:

n = the sample size
$Z_{\alpha/2}$ = the standard normal coefficient, typically 1.96 for a 95% CI
p = the expected proportion, expressed as a decimal (e.g., 0.45)
d = the desired precision level, expressed as half of the maximum acceptable confidence interval width

Proportion – one group: Calculations (fixed α = 0.05)

Expected prevalence   Precision (half width)   n
15%                   2%                       1,225
20%                   2%                       1,537
15%                   4%                       307
20%                   4%                       385

Proportion – one group: Descriptions

A sample size of 400 would have a 95% confidence interval of 16% to 24%, assuming a prevalence of 20% according to a study by <Reference>.
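The one-group proportion calculation can be sketched in Python with the normal-approximation formula n = Z² p(1 - p) / d². The function name is hypothetical. For a prevalence of 20% and a half-width of 4% it gives 385, which the description rounds up to a comfortable 400.

```python
import math
from statistics import NormalDist

def n_one_proportion(p, d, conf=0.95):
    """Sample size to estimate a proportion p within +/- d (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z**2 * p * (1 - p) / d**2)

for p, d in [(0.15, 0.02), (0.20, 0.02), (0.15, 0.04), (0.20, 0.04)]:
    print(f"prevalence={p:.0%}  half-width={d:.0%}  n={n_one_proportion(p, d)}")
```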

Proportion – two independent groups: Formula

$$n = \frac{2\,\bar{p}\,(1 - \bar{p})\,(Z_{\alpha/2} + Z_{\beta})^{2}}{(p_1 - p_2)^{2}}$$

Where:

n = the sample size in each group (assumes equal-sized groups)
$Z_{\alpha/2}$ = represents the desired level of statistical significance (typically 1.96 for α = 0.05)
$Z_{\beta}$ = represents the desired power (typically 0.84 for 80% power)
$\bar{p}\,(1 - \bar{p})$ = a measure of variability (similar to standard deviation)
$p_1 - p_2$ = the minimum meaningful difference, or effect size

Proportion – two independent groups: Calculations (fixed α = 0.05)

H0: P1 - P2 = 0. H1: P1 - P2 = D1 ≠ 0. Test statistic: Z test with pooled variance

Power   Proportion in control group   Minimum meaningful difference   n1      n2
90%     40%                           5%                              2,053   2,053
80%     40%                           5%                              1,534   1,534
90%     50%                           5%                              2,095   2,095
80%     50%                           5%                              1,565   1,565
90%     40%                           10%                             519     519
80%     40%                           10%                             388     388
90%     50%                           10%                             519     519
80%     50%                           10%                             388     388

Proportion – two independent groups: Descriptions

A sample size of 388 in group one and 388 in group two would have a power of 80% to detect a difference between groups of 10%, assuming a prevalence of 50% in the control group according to a study by <Reference>.

The test statistic used is the two-sided Z test. The significance level of the test was targeted at 0.05.
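A minimal Python sketch for two proportions, assuming one common formula for the two-sided Z test with pooled variance: n per group = [Z_{α/2} √(2 p̄(1 - p̄)) + Z_β √(p1(1 - p1) + p2(1 - p2))]² / (p1 - p2)². The function name is hypothetical, and this is only one of several formulas in use. Under these assumptions it returns 388 per group for the scenario described above.

```python
import math
from statistics import NormalDist

def n_two_proportions(p1, diff, power=0.80, alpha=0.05):
    """Per-group sample size for a two-sided pooled-variance Z test."""
    p2 = p1 + diff
    p_bar = (p1 + p2) / 2  # pooled proportion under equal group sizes
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / diff**2)

# Scenario from the description: 50% in the control group, 10% difference.
print(n_two_proportions(p1=0.50, diff=0.10, power=0.80))
```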

Other considerations

– Sampling design affects the calculation of sample size
  – Simple random sampling / assignment
  – Stratified random sampling / assignment
  – Clustered random sampling / assignment
– Complex study designs affect the calculation of sample size
  – Matching
  – Multiple stages of sampling
  – Repeated measures
– Usually the sample size calculation is based on the method of analysis
  – Correlation, agreement, diagnostic performance
  – Z-test
  – Regression: multiple linear, logistic
  – Multivariate analyses such as principal component or factor analysis
  – Survival analyses
  – Multilevel models

Other considerations

– Demonstrate superiority
  – Sample size sufficient to detect a difference between treatments
  – Requires specifying the “minimum meaningful” difference
– Demonstrate non-inferiority or equal effectiveness
  – The sample size required to demonstrate equivalence is larger than that required to demonstrate superiority
  – Requires specifying the “non-inferiority margin” or “equivalence range”

Precision or Power Estimation

– Equivalent to a sample size calculation: do it in the planning phase of the study
– Do it when the number of available subjects is known
– Wrong: “There are around 50 patients per year, of whom 10% may refuse to take part in the study. Therefore over the 2 years of the study, the sample size will be 90 patients.”
– Correct: “It is estimated that there will be 90 patients in the clinic. This will give a precision of the prevalence estimation of 20% assuming a prevalence of 65%.”
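When the number of available subjects is fixed, the same normal-approximation formula can be turned around to give the precision that the sample will deliver. The sketch below is illustrative and the function name is hypothetical. For 90 patients and a prevalence of 65%, it gives a half-width of about 10 percentage points, so the 20% quoted in the example appears to correspond to the full width of the 95% confidence interval (an interpretation, not stated in the slides).

```python
import math
from statistics import NormalDist

def ci_half_width(p, n, conf=0.95):
    """Half-width of the confidence interval for a proportion, given n."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * math.sqrt(p * (1 - p) / n)

half_width = ci_half_width(p=0.65, n=90)
print(f"half-width = {half_width:.3f}, full width = {2 * half_width:.3f}")
```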

Suggested learning resources

– WWW: Statistics Guide for Research Grant Applicants at St George’s University of London (maintained by Martin Bland):
  – http://www-users.york.ac.uk/~mb55/guide/size.htm
– Software: PASS2008, nQuery, EpiTable, SeqTrial, PS, etc.
