
PSY2004 Research Methods

PSY2005 Applied Research Methods

Week Eleven

Stephen Nunn

• What it is

• Why it is important

sensitivity of a statistical test

why stats?

variability in the data

lots of different, random sources of variability

we’re trying to see if changes in the

Independent Variable

• type of non-word

• treatment type

affects scores on the

Dependent Variable

• reaction time

• no. of days drugs taken

lots of other things affect the DV

• individual differences

• time of day

• mood

• level of attention

• etc etc etc etc

Lots of random, unsystematic, sources of variation, unrelated to IV

‘noise’

sometimes the effects due to the IV are big, strong

easy to see through the noise

but what if the effect you’re looking for is small, weak

your ‘equipment’ (eyes, statistical test)

needs to be sensitive enough to spot it

otherwise you’ll miss it

sensitivity of a statistical test

ability or

probability of detecting an effect

[when there is one]

sounds like a good thing

[but is often ignored]

Reviews of meta-analyses* suggest most social science effect sizes are medium at best, mostly small (Ellis, 2010)

*meta-analyses combine the results from several studies addressing the same hypotheses

Estimates of power in published Psychological research (e.g., Clark-Carter, 1997, looking at BJP)

mean power for medium effects = 0.6

mean power for small effects = 0.2

NB recommended level of power = 0.8

What does power = 0.2 mean?

[when there is an effect to detect]

you only have a 20% chance of detecting it [i.e., getting a statistically significant result]

The ‘noise’ will tend to swamp the effect of your IV.

Repeated running of the same study would only give a significant result 20% of the time

Or,

you have an 80% probability of making a Type II error

[failing to reject the null hypothesis when it is false]
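What "power = 0.2" means can be checked by running the same study many times on a computer. The sketch below is only an illustration under stated assumptions, none of which come from the slides: two independent groups of 10, a true standardised difference of 0.5, normally distributed scores with a known standard deviation of 1, and a two-sided z-test at alpha = 0.05. Those values are chosen because they give a power of roughly 0.2. The function name is made up for this example.

```python
import random
from statistics import NormalDist, fmean


def simulate_detection_rate(d=0.5, n_per_group=10, alpha=0.05,
                            n_studies=5000, seed=1):
    """Proportion of simulated studies that reach significance.

    Assumes normal scores with a known sd of 1, so a z-test is used.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5  # standard error of the difference in means
    significant = 0
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(d, 1.0) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            significant += 1
    return significant / n_studies


detection_rate = simulate_detection_rate()  # estimate of power
type_2_rate = 1 - detection_rate            # estimate of the Type II error rate
print(f"significant in {detection_rate:.0%} of runs, "
      f"missed in {type_2_rate:.0%} of runs")
```

The effect is real in every simulated study, yet most runs fail to reach significance. Those misses are the Type II errors.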

what affects power?

anything that changes the effect / ‘noise’ ratio

effect size

all other things being equal you will have greater power with a bigger effect, less power with a smaller effect
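A rough way to see this is to hold the sample size fixed and vary only the effect size. The sketch below uses a normal approximation to the power of a two-sided, two-group comparison. The function `approx_power`, the group size of 30, and the effect sizes 0.2, 0.5 and 0.8 (Cohen's conventional small, medium and large values for d) are assumptions for illustration, not taken from the slides.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power for two independent groups (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # effect expressed in standard errors
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


effect_sizes = {"small": 0.2, "medium": 0.5, "large": 0.8}
powers_by_effect = {label: approx_power(d, n_per_group=30)
                    for label, d in effect_sizes.items()}

for label, power in powers_by_effect.items():
    print(f"{label:>6} effect (d = {effect_sizes[label]}): power = {power:.2f}")
```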

design

all other things being equal repeated measures designs are more powerful than independent groups

because they allow you to remove the ‘noise’ in the data due to individual differences
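One way to put numbers on this is to compare the standard error of the difference between conditions under each design. In the sketch below, `r` is the assumed correlation between a participant's scores in the two conditions; the more of the variability that comes from stable individual differences, the higher `r` and the smaller the repeated measures standard error. The formulas are the standard ones for equal variances, and all values (difference of 0.5, sd of 1, 20 scores per condition, r = 0.6) are illustrative assumptions.

```python
from statistics import NormalDist


def power_from_se(mean_difference, se, alpha=0.05):
    """Approximate two-sided power given a true difference and its standard error."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = mean_difference / se
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


def se_independent(sd, n_per_condition):
    """Standard error of the difference for two independent groups."""
    return sd * (2 / n_per_condition) ** 0.5


def se_repeated(sd, n_per_condition, r):
    """Standard error of the difference when the same people do both conditions."""
    return sd * (2 * (1 - r) / n_per_condition) ** 0.5


difference, sd, n, r = 0.5, 1.0, 20, 0.6
power_independent = power_from_se(difference, se_independent(sd, n))
power_repeated = power_from_se(difference, se_repeated(sd, n, r))
print(f"independent groups: {power_independent:.2f}")
print(f"repeated measures:  {power_repeated:.2f}")
```

With `r = 0` the two standard errors are identical, so the advantage of repeated measures depends entirely on how consistent individuals are across conditions.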

cell size

all other things being equal simpler designs, fewer levels of your IV will increase power

[for a fixed total N, fewer cells means more participants in each cell]

alpha [criterion for rejecting H0]

stricter (smaller) alphas DECREASE power

e.g., Post-hoc Type 1 error rate correction

Bonferroni

achieved at the expense of power
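The cost of a Bonferroni correction can be sketched by dividing alpha by the number of comparisons and recomputing power. The example uses a normal approximation for a two-group comparison; the effect size (d = 0.5), the group size (64) and the numbers of comparisons are illustrative assumptions.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha):
    """Approximate power for two independent groups (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


d, n_per_group, family_alpha = 0.5, 64, 0.05
comparisons = (1, 3, 6, 10)

# Bonferroni: each comparison is tested at family_alpha / m
powers = [approx_power(d, n_per_group, family_alpha / m) for m in comparisons]

for m, power in zip(comparisons, powers):
    print(f"{m:>2} comparisons: alpha per test = {family_alpha / m:.4f}, "
          f"power = {power:.2f}")
```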

measures, samples

unreliable measures

heterogeneous samples

–> increase the ‘noise’

–> decrease power

sample size

a larger N gives you more power

[from Central Limit Theorem, increasing N reduces the variability in the sample means, reduces the ‘noise’]
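The shrinking variability of sample means can be seen directly in a small simulation. The sketch below draws many samples of size N from a normal population with sd = 1 (an assumption for illustration) and compares the observed spread of the sample means with the theoretical standard error, sd / sqrt(N).

```python
import random
from statistics import fmean, stdev


def sd_of_sample_means(n, population_sd=1.0, n_samples=2000, seed=2):
    """Standard deviation of the means of many simulated samples of size n."""
    rng = random.Random(seed)
    sample_means = [
        fmean([rng.gauss(0.0, population_sd) for _ in range(n)])
        for _ in range(n_samples)
    ]
    return stdev(sample_means)


# For each N: (observed spread of sample means, theoretical sd / sqrt(N))
results = {n: (sd_of_sample_means(n), 1.0 / n ** 0.5) for n in (10, 40, 160)}

for n, (observed, expected) in results.items():
    print(f"N = {n:>3}: observed = {observed:.3f}, sd / sqrt(N) = {expected:.3f}")
```

Each fourfold increase in N roughly halves the ‘noise’ in the sample means.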

but does this matter?

for the individual researcher:

power = 0.2 = highly likely to waste time and other resources

for ‘science’:

should we not worry more about Type 1 errors?

[rejecting H0 when it is true]

maybe, but:

common (but mistaken) tendency to interpret non-significant results as evidence for no difference

i.e., non-significant result due to low power isn’t just waste of resources, but can be misinterpreted in a misleading way

maybe, but:

a strong publication bias in Psychology means Type 1 errors and Power are intertwined

i.e., only significant results tend to get published

This bias means that if all H0 were true

then all published studies would be Type 1 errors

i.e., keeping the type 1 error rate at 5% for individual studies or research as a whole doesn’t keep the error rate in the literature at that level due to the publication bias

Low power across the discipline increases the proportion of published studies that are Type 1 errors

i.e., general low power reduces the proportion of studies with false H0s that reach significance and which are therefore published (due to the publication bias). The ratio of Type 1 errors to correct rejections of H0 is therefore increased (Ellis, 2010)

[Figure: two panels, one for 80% power and one for 40% power. Each panel splits studies into H0 true (no effect) and H0 false (effect). Where H0 is true, 5% of studies are Type 1 errors and the rest are correct failures to reject H0. Where H0 is false, the studies divide into correct rejections of H0 (80% or 40%) and Type 2 errors. Only the Type 1 errors and the correct rejections of H0 are marked as published.]

80% power: ratio of Type 1 errors to correct rejections = 5:80 (6.25%)

40% power: ratio of Type 1 errors to correct rejections = 5:40 (12.5%)
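The 5:80 and 5:40 ratios can be reproduced with a few lines of arithmetic. The sketch below assumes, as the ratios appear to, that half of all studies test a true H0 and half a false H0, and that only significant results are published. The function name and the `share_h0_true` parameter are made up for this example.

```python
def published_type1_summary(power, alpha=0.05, share_h0_true=0.5):
    """Type 1 errors relative to correct rejections among significant results."""
    type_1_errors = share_h0_true * alpha            # true H0, significant anyway
    correct_rejections = (1 - share_h0_true) * power  # false H0, detected
    return {
        "ratio": type_1_errors / correct_rejections,
        "share_of_published": type_1_errors / (type_1_errors + correct_rejections),
    }


high_power = published_type1_summary(power=0.8)
low_power = published_type1_summary(power=0.4)

for label, summary in (("80% power", high_power), ("40% power", low_power)):
    print(f"{label}: ratio = {summary['ratio']:.2%}, "
          f"share of published results = {summary['share_of_published']:.2%}")
```

Raising `share_h0_true` above 0.5 makes the picture worse at any level of power, because more of the significant results then come from studies where there was no effect to find.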

NB the publication bias also makes it harder to reveal Type 1 errors in the literature

i.e., non-significant failures to replicate a published study (that reveal it as a possible Type 1 error) are less likely to be published due to the publishing bias against non-significant findings.

maybe, but:

small sample (i.e., low power) studies tend to over-estimate the size of effects and are more likely to be Type 1 errors (Ellis, 2010)

i.e., studies with small N are more likely to give misleading results (but not always)
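The over-estimation of effect sizes can be illustrated by simulating many studies and keeping only the significant ones, as a biased literature would. The sketch below covers only the over-estimation point, not the Type 1 error point. It assumes a true standardised effect of 0.3, normal scores with a known sd of 1, and a two-sided z-test; all names and values are illustrative.

```python
import random
from statistics import NormalDist, fmean


def mean_published_effect(true_d, n_per_group, n_studies=2000,
                          alpha=0.05, seed=3):
    """Average estimated effect among simulated studies that reach significance."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5  # known sd of 1 assumed
    significant_estimates = []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        estimate = fmean(treated) - fmean(control)
        if abs(estimate / se) > z_crit:
            significant_estimates.append(estimate)
    if not significant_estimates:
        return float("nan")
    return fmean(significant_estimates)


TRUE_D = 0.3
small_n_estimate = mean_published_effect(TRUE_D, n_per_group=10)
large_n_estimate = mean_published_effect(TRUE_D, n_per_group=100)
print(f"true effect = {TRUE_D}")
print(f"mean significant estimate, N = 10 per group:  {small_n_estimate:.2f}")
print(f"mean significant estimate, N = 100 per group: {large_n_estimate:.2f}")
```

With a small N, only studies that happen to observe an unusually large difference cross the significance threshold, so the significant results exaggerate the true effect.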

low power is BAD for individual researchers and BAD for Psychology as a discipline

what you should do:

• make sure your own study (e.g., FYP) has sufficient power

  • use something like G*Power to calculate your N for power (1-β) = 0.8

• simplify your designs, only include what’s necessary

  • an extra IV or condition either reduces power or raises the N you need
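As a rough cross-check on a G*Power calculation, the N needed for a given power can be approximated by hand. The sketch below uses the common normal-approximation formula for comparing two independent groups; it is not how G*Power works internally, and G*Power's t-test calculations give slightly larger values. The function name and the effect sizes tried are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_for_power(d, power=0.8, alpha=0.05):
    """Approximate N per group for a two-sided, two-group comparison."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_power = z.inv_cdf(power)          # quantile for the target power (1 - beta)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


required_n = {d: n_per_group_for_power(d) for d in (0.2, 0.5, 0.8)}

for d, n in required_n.items():
    print(f"d = {d}: about {n} participants per group for power = 0.8")
```

Because the required N grows with the inverse square of the effect size, halving the expected effect roughly quadruples the sample you need.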

what you should do:

• pay more attention to published studies that have greater power – e.g., larger samples

• distrust results from small samples

• look for meta-analyses
