lecture 4 sample size determination


Upload: keran

Post on 31-Jan-2016


Page 1: Lecture 4 Sample size determination

RDP Statistical Methods in Scientific Research - Lecture 4 1

Lecture 4

Sample size determination

4.1 Criteria for sample size determination

4.2 Finding the sample size

4.3 Some simple variations

4.4 Further considerations

Page 2: Lecture 4 Sample size determination


4.1 Criteria for sample size determination

Suppose that we are to conduct an investigation comparing two populations, A and B

Sample A comprises nA units of observation from A

Sample B comprises nB units of observation from B

Suppose that nA = nB and that n = nA + nB

The responses will be quantitative, and the analysis will use a t-test

How should we choose n?

Page 3: Lecture 4 Sample size determination

Let

$\mu_A$ = mean response for A

$\mu_B$ = mean response for B

Null hypothesis is H0: $\mu_A = \mu_B$

From the data, we will obtain the sample means $\bar{x}_A$ and $\bar{x}_B$ and sample standard deviations $S_A$ and $S_B$ for groups A and B

Once we have the data, we can:

Reject H0 and say that $\mu_A > \mu_B$

Reject H0 and say that $\mu_A < \mu_B$

Not reject H0

Page 4: Lecture 4 Sample size determination

When $n_A = n_B = n/2$, the t-statistic is

$$t = \frac{\bar{x}_A - \bar{x}_B}{S\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}} = \frac{(\bar{x}_A - \bar{x}_B)\sqrt{n}}{2S}$$

where

$$S = \sqrt{\frac{(n_A - 1)S_A^2 + (n_B - 1)S_B^2}{n_A + n_B - 2}} = \sqrt{\frac{S_A^2 + S_B^2}{2}}$$

t will tend to be positive if $\mu_A > \mu_B$, negative if $\mu_A < \mu_B$ and close to zero if $\mu_A = \mu_B$
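The t-statistic on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `pooled_t` and the two small data sets are made up for the example and do not come from the lecture.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample_a, sample_b):
    """Two-sample t-statistic using the pooled standard deviation S."""
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance uses the n - 1 denominator, giving S_A^2 and S_B^2
    s2_a, s2_b = variance(sample_a), variance(sample_b)
    pooled_sd = sqrt(((n_a - 1) * s2_a + (n_b - 1) * s2_b) / (n_a + n_b - 2))
    return (mean(sample_a) - mean(sample_b)) / (pooled_sd * sqrt(1 / n_a + 1 / n_b))

# Made-up responses, for illustration only
a = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7]
b = [4.2, 4.9, 4.4, 5.0, 4.1, 4.6]
t = pooled_t(a, b)

# With equal group sizes the statistic reduces to (mean_A - mean_B) * sqrt(n) / (2 S)
n = len(a) + len(b)
s = sqrt((variance(a) + variance(b)) / 2)
t_equal = (mean(a) - mean(b)) * sqrt(n) / (2 * s)
print(round(t, 3), round(t_equal, 3))
```

The two printed values agree, which checks the simplification used when the groups are of equal size.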

Page 5: Lecture 4 Sample size determination

We will:

Reject H0 and say that $\mu_A > \mu_B$ if $t \ge k$

Reject H0 and say that $\mu_A < \mu_B$ if $t \le -k$

Not reject H0 if $-k < t < k$

[Figure: the t axis marked at $-k$, 0 and $k$. Say $\mu_A < \mu_B$ to the left of $-k$, do not reject H0 between $-k$ and $k$, say $\mu_A > \mu_B$ to the right of $k$]

Now we need to find both n and k

Page 6: Lecture 4 Sample size determination

Suppose that, in truth, $\mu_A = \mu_B$

This does not mean that we will observe $\bar{x}_A = \bar{x}_B$, nor t = 0

In fact, we may observe $t \ge k$ or $t \le -k$, just by chance

This means that we might reject H0 when H0 is true

This is called type I error

Page 7: Lecture 4 Sample size determination

Suppose that, in truth, $\mu_A = \mu_B + \delta$

where $\delta > 0$, and $\delta$ is of a magnitude that would be scientifically worth detecting

We may still observe $t < k$ by chance

This means that we might fail to reject H0 when H0 is false

This is called type II error

Page 8: Lecture 4 Sample size determination

The probability that $t \ge k$ or $t \le -k$, when $\mu_A = \mu_B$, is called the risk of type I error, and is denoted by $\alpha$

(This is for a two-sided alternative: the probability that $t \ge k$, when $\mu_A = \mu_B$, is the risk of type I error for a one-sided alternative and is equal to $\alpha/2$)

The probability that $t < k$, when $\mu_A = \mu_B + \delta$, is called the risk of type II error, and is denoted by $\beta$

The probability that $t \ge k$, when $\mu_A = \mu_B + \delta$, is called the power, and is equal to $1 - \beta$

Page 9: Lecture 4 Sample size determination


Reducing type I error

Increase k – make it difficult to reject H0

Increasing power

Decrease k – make it easy to reject H0

Reducing type I error and increasing power simultaneously

Increase n – this will make the study more informative, but it will cost more

Page 10: Lecture 4 Sample size determination

4.2 Finding the sample size

Suppose that the true standard deviation within each of the populations A and B is $\sigma$

Then $t \approx Z$ where

$$Z = \frac{(\bar{x}_A - \bar{x}_B)\sqrt{n}}{2\sigma}$$

Z follows the normal distribution, with standard deviation 1

When $\mu_A = \mu_B$, Z has mean 0

When $\mu_A = \mu_B + \delta$, Z has mean $\delta\sqrt{n}/(2\sigma)$

Page 11: Lecture 4 Sample size determination

Specify that the type I risk of error (two-sided) should be $\alpha$:

$$P(Z \le -k \text{ or } Z \ge k : \mu_A = \mu_B) = \alpha \qquad (1)$$

Under H0, Z is normally distributed with mean 0 and st dev 1

k is the value exceeded by a normal (0, 1) random variable with prob $\alpha/2$

Page 12: Lecture 4 Sample size determination

Specify that the type II risk of error should be $\beta$:

$$P(Z < k : \mu_A = \mu_B + \delta) = \beta \qquad (2)$$

Under the alternative $\mu_A = \mu_B + \delta$, Z is normally distributed with mean $\delta\sqrt{n}/(2\sigma)$ and st dev 1

$k - \delta\sqrt{n}/(2\sigma)$ is the value exceeded by a normal (0, 1) random variable with prob $1 - \beta$
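Conditions (1) and (2) can be evaluated numerically for any candidate n and k. The sketch below uses the normal approximation from these slides and Python's standard-library `statistics.NormalDist`. The function names and the design values (n = 100, delta = 0.5, sigma = 1) are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # normal distribution with mean 0 and st dev 1

def type_i_risk(k):
    """Two-sided risk of type I error: P(Z <= -k or Z >= k) when Z has mean 0."""
    return 2 * (1 - STD_NORMAL.cdf(k))

def power(n, k, delta, sigma):
    """P(Z >= k) when Z has mean delta * sqrt(n) / (2 * sigma) and st dev 1."""
    return 1 - STD_NORMAL.cdf(k - delta * sqrt(n) / (2 * sigma))

# Hypothetical design: total n = 100, delta = 0.5, sigma = 1
for k in (1.645, 1.960, 2.576):
    print(k, round(type_i_risk(k), 3), round(power(100, k, 0.5, 1.0), 3))
```

Raising k lowers both the type I risk and the power, while raising n with k fixed raises the power only. This is the trade-off described at the end of Section 4.1.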

Page 13: Lecture 4 Sample size determination

For $\alpha = 0.05$ and $1 - \beta = 0.90$, we have

$k = 1.960$ and $k - \delta\sqrt{n}/(2\sigma) = -1.282$

Thus

$$n = \frac{4\,(1.960 + 1.282)^2\,\sigma^2}{\delta^2} = 42.030\,\frac{\sigma^2}{\delta^2}$$

Values of $n\delta^2/\sigma^2$:

Type I error $\alpha$    Power $1 - \beta$
                         0.8       0.9       0.95
0.1                      24.730    34.255    43.289
0.05                     31.396    42.030    51.979
0.01                     46.716    59.518    71.257
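The calculation on this slide can be written as a short function. This is a sketch under the same normal approximation; the name `total_sample_size` is an assumption of the example, and the normal quantiles come from the standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

def total_sample_size(alpha, power, delta, sigma):
    """Total n (both groups, equal allocation) under the normal approximation."""
    z = NormalDist()
    k = z.inv_cdf(1 - alpha / 2)   # value exceeded with probability alpha / 2
    z_power = z.inv_cdf(power)     # equals delta * sqrt(n) / (2 * sigma) - k
    return 4 * (k + z_power) ** 2 * sigma ** 2 / delta ** 2

# With delta = sigma = 1 the result is the tabulated quantity n * delta^2 / sigma^2
for alpha in (0.1, 0.05, 0.01):
    row = [round(total_sample_size(alpha, p, 1.0, 1.0), 3) for p in (0.8, 0.9, 0.95)]
    print(alpha, row)
```

The function returns an unrounded value. In practice it would be rounded up to a whole number of units, and to an even number if the two groups are to be of equal size.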

Page 14: Lecture 4 Sample size determination

Sample size increases:

as $\sigma$ increases

as $\delta$ decreases

as $\alpha$ decreases

as $1 - \beta$ increases

Page 15: Lecture 4 Sample size determination

4.3 Some simple variations

Unequal randomisation

The power of a study depends on

$$\frac{n_E\,n_C}{n}$$

which, for equal sample sizes, is equal to

$$\frac{(n/2)(n/2)}{n} = \frac{n}{4}$$

For $n_E = Rn_C$, $n = Rn_C + n_C$ and so

$$\frac{n_E\,n_C}{n} = \frac{\{Rn/(R+1)\}\{n/(R+1)\}}{n} = \frac{Rn}{(R+1)^2}$$

Page 16: Lecture 4 Sample size determination

Unequal randomisation

So, the overall sample size is multiplied by the factor

$$F = \frac{(R+1)^2}{4R}$$

and $n_E$ by $F_E$ and $n_C$ by $F_C$, where

$$F_E = \frac{R+1}{2} \quad \text{and} \quad F_C = \frac{R+1}{2R}$$

R      1    2        3        5        10
F      1    1.125    1.333    1.800    3.025
FE     1    1.500    2.000    3.000    5.500
FC     1    0.750    0.667    0.600    0.550
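The three factors are simple enough to tabulate directly. In this sketch the function name `allocation_factors` is an assumption of the example, and R is the allocation ratio nE/nC.

```python
def allocation_factors(r):
    """Inflation factors for allocation ratio R = n_E / n_C, relative to R = 1."""
    f = (r + 1) ** 2 / (4 * r)   # factor for the total sample size
    f_e = (r + 1) / 2            # factor for group E
    f_c = (r + 1) / (2 * r)      # factor for group C
    return f, f_e, f_c

for r in (1, 2, 3, 5, 10):
    f, f_e, f_c = allocation_factors(r)
    print(r, round(f, 3), round(f_e, 3), round(f_c, 3))
```

Since each group starts at n/2 under equal allocation, F is the average of FE and FC, which gives a quick consistency check.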

Page 17: Lecture 4 Sample size determination

Unknown standard deviation

The sample size formula depends on guessing $\sigma$

If this guess is smaller than the truth, the sample size will be too small and the study underpowered

If this guess is larger than the truth, the sample size will be unnecessarily large

A more accurate calculation can be based on the t-distribution rather than the normal, but this makes little difference and does not overcome the dependence on $\sigma$
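The effect of a wrong guess can be quantified by fixing n from the guessed standard deviation and then computing the power at the true value. The sketch below uses the normal approximation of Section 4.2. The function names and the numbers (delta = 0.5, guessed sigma = 1) are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # normal distribution with mean 0 and st dev 1

def planned_n(alpha, power, delta, guessed_sigma):
    """Total n from the sample size formula, using the guessed st dev."""
    k = Z.inv_cdf(1 - alpha / 2)
    return 4 * (k + Z.inv_cdf(power)) ** 2 * guessed_sigma ** 2 / delta ** 2

def achieved_power(n, alpha, delta, true_sigma):
    """Power actually obtained with n units when the st dev is true_sigma."""
    k = Z.inv_cdf(1 - alpha / 2)
    return 1 - Z.cdf(k - delta * sqrt(n) / (2 * true_sigma))

# Design for power 0.90 with a guessed sigma of 1, then vary the true sigma
n = planned_n(0.05, 0.90, delta=0.5, guessed_sigma=1.0)
for true_sigma in (0.8, 1.0, 1.25):
    print(true_sigma, round(achieved_power(n, 0.05, 0.5, true_sigma), 3))
```

When the true standard deviation is 25% above the guess, the achieved power falls well below the planned 0.90. When it is below the guess, the power is higher than needed and the study larger than necessary.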

Page 18: Lecture 4 Sample size determination

Unknown standard deviation

Often, the final analysis will be based on a linear model, not just a t-test

The formulae given can still be used, but $\sigma$ is now the residual standard deviation (the SD about the fitted model)

Fitting the right factors will reduce the residual standard deviation, and so the sample size will also be reduced

but you have to guess what $\sigma$ will be in advance!

Page 19: Lecture 4 Sample size determination

Sample size for estimation

The sample size can be determined to give a confidence interval of specified width W

The 95% confidence interval for $\delta = \mu_A - \mu_B$ is of the form

$$\left(\bar{x}_A - \bar{x}_B - 1.96\,\sigma\sqrt{\frac{1}{n_A} + \frac{1}{n_B}},\ \ \bar{x}_A - \bar{x}_B + 1.96\,\sigma\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}\right)$$

when sample sizes are large (Lecture 1, Slide 24)

When $n_A = n_B = n/2$, this has length

$$2 \times 1.96\,\sigma\sqrt{\frac{4}{n}} = \frac{7.84\,\sigma}{\sqrt{n}}$$

Page 20: Lecture 4 Sample size determination

Sample size for estimation

We need to set

$$\frac{7.84\,\sigma}{\sqrt{n}} = W$$

which means that

$$n = 61.47\,\frac{\sigma^2}{W^2}$$
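The same calculation for an arbitrary confidence level can be sketched as follows. The function name `n_for_ci_width` and its `confidence` parameter are assumptions of the example; the slide itself covers only the 95% case.

```python
from statistics import NormalDist

def n_for_ci_width(width, sigma, confidence=0.95):
    """Total n (equal groups) giving a large-sample interval of the stated width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    # Interval length is 2 * z * sigma * sqrt(4 / n); solve length = width for n
    return (4 * z * sigma / width) ** 2

print(round(n_for_ci_width(width=1.0, sigma=1.0), 2))
print(round(n_for_ci_width(width=0.5, sigma=1.0), 2))
```

Halving the required width multiplies the sample size by four, because n is proportional to 1/W².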

Page 21: Lecture 4 Sample size determination

Binary data

For R = 1, $\alpha = 0.05$ and $1 - \beta = 0.90$, we have

$$n = \frac{4\,(1.960 + 1.282)^2}{\theta^2\,\bar{p}(1 - \bar{p})} = \frac{42.030}{\theta^2\,\bar{p}(1 - \bar{p})}$$

where

$$\theta = \log_e\frac{p_E}{1 - p_E} - \log_e\frac{p_C}{1 - p_C} \quad \text{and} \quad \bar{p} = \frac{Rp_E + p_C}{R + 1}$$

$p_C$ is the anticipated success rate in C, and $p_E$ the improved rate in E to be detected with power $1 - \beta$

Page 22: Lecture 4 Sample size determination

Examples for binary data: R = 1, $\alpha = 0.05$ and $1 - \beta = 0.90$

pC     pE     $\theta$    $\bar{p}(1 - \bar{p})$    n
0.1    0.2    0.811       0.1275                    502
0.1    0.3    1.350       0.1600                    144
0.1    0.5    2.197       0.2100                    42
0.3    0.4    0.442       0.2275                    946
0.3    0.5    0.847       0.2400                    244
0.3    0.7    1.695       0.2500                    60
0.4    0.5    0.405       0.2475                    1034
0.4    0.6    0.811       0.2500                    256
0.4    0.8    1.792       0.2400                    56
0.5    0.6    0.405       0.2475                    1034
0.5    0.7    0.847       0.2400                    244
0.5    0.9    2.197       0.2100                    42
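The binary-data formula for equal allocation (R = 1) can be sketched as below. The function names are assumptions of the example. The function returns the unrounded n, which is close to the tabulated whole numbers but will not match them exactly.

```python
from math import log
from statistics import NormalDist

def log_odds_ratio(p_c, p_e):
    """theta = log-odds of success in E minus log-odds of success in C."""
    return log(p_e / (1 - p_e)) - log(p_c / (1 - p_c))

def binary_total_sample_size(p_c, p_e, alpha=0.05, power=0.90):
    """Unrounded total n for equal allocation (R = 1), normal approximation."""
    z = NormalDist()
    theta = log_odds_ratio(p_c, p_e)
    p_bar = (p_c + p_e) / 2  # average success rate when R = 1
    numerator = 4 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return numerator / (theta ** 2 * p_bar * (1 - p_bar))

for p_c, p_e in ((0.1, 0.2), (0.3, 0.5), (0.5, 0.9)):
    p_bar = (p_c + p_e) / 2
    print(p_c, p_e, round(log_odds_ratio(p_c, p_e), 3),
          round(p_bar * (1 - p_bar), 4),
          round(binary_total_sample_size(p_c, p_e), 1))
```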

Page 23: Lecture 4 Sample size determination

Binary data

This approach is based on the log-odds ratio

$$\theta = \log_e\frac{p_E}{1 - p_E} - \log_e\frac{p_C}{1 - p_C}$$

Many other approximate formulae exist

All give similar answers when sample sizes are large: exact calculations can be made for small sample sizes

Page 24: Lecture 4 Sample size determination

4.4 Further considerations

Setting the values for $\alpha$ and $\beta$

The standard scientific convention is to ensure that $\alpha$ will be small, and allow any risks to be taken with $\beta$

For example, if an SD or a control success rate is underestimated at the design stage, the study will be underpowered – the analysis maintains the type I error at the cost of losing power

$\alpha$ is the community’s risk of being given a false conclusion

$\beta$ is the scientist’s risk of not proving his/her point

Page 25: Lecture 4 Sample size determination

Exceptions

If the scientist wishes to prove the null hypothesis (equivalence testing) then $\beta$ should be kept small, while $\alpha$ can be inflated if necessary

In a pilot study, preliminary to a larger confirmatory study, type I errors can be rectified in the next study, but type II errors will mean that the next study is not conducted at all

Page 26: Lecture 4 Sample size determination


Finally:

Many more sample size formulae exist – see Machin et al. (1997)

Software also exists: nQuery advisor, PASS

Ensure that the sample size formula used matches the intended final analysis

In complicated situations, the whole study can be simulated on the computer in advance to determine its power
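As a minimal illustration of the simulation approach, the sketch below estimates the power of the simple two-group t-test by repeated sampling. It is not a method given in the lecture. The function names, the normally distributed responses, the use of the normal critical value k = 1.960 and the design values (84 per group, delta = 0.5, sigma = 1) are all assumptions of the example.

```python
import random
from math import sqrt
from statistics import fmean

def sample_variance(values):
    """Sample variance with the n - 1 denominator."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def simulate_power(n_per_group, delta, sigma, k=1.960, reps=2000, seed=1):
    """Proportion of simulated studies in which t >= k, for a true difference delta."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(delta, sigma) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        s = sqrt((sample_variance(a) + sample_variance(b)) / 2)
        # Equal group sizes, so t = (mean_A - mean_B) * sqrt(n) / (2 S) with n = 2 * n_per_group
        t = (fmean(a) - fmean(b)) * sqrt(2 * n_per_group) / (2 * s)
        if t >= k:
            rejections += 1
    return rejections / reps

# n = 168 in total is roughly 42.030 * sigma^2 / delta^2 for delta = 0.5, sigma = 1
print(simulate_power(84, delta=0.5, sigma=1.0))
# With delta = 0 the same function estimates the one-sided type I risk
print(simulate_power(84, delta=0.0, sigma=1.0))
```

For a more complicated design, the data-generating step and the test statistic would be replaced by the intended model and final analysis, keeping the same loop.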