
Effect size

An effect size is the strength or magnitude of the difference between two sets of data or, in outcome studies, between two time points for the same population; it reflects the degree to which the null hypothesis is false. In statistical hypothesis testing and power analysis, an effect size is the size of a statistically significant difference; that is, a difference between a mathematical characteristic (often the mean) of the distribution of a dependent variable associated with one level of an independent variable and the same characteristic of the distribution defined by a different level of the independent variable. Effect size is a different concept from statistical significance, and it is often relevant to compute an effect size measure even when a conventional threshold for statistical significance, such as p < .05, has not been met.

In its simplest form, an effect size is the difference between two means divided by the pooled standard deviation for those means. Advice on interpreting the resulting value varies, but the most widely accepted benchmarks are those of Cohen (1992), where 0.2 indicates a small effect, 0.5 a medium effect, and 0.8 a large effect.

The basic formula for the effect size is to subtract the mean of the control group from that of the experimental group and then divide the result by the standard deviation of the control group's scores. Effect size is expressed as a decimal number and, while values greater than 1.00 are possible, they do not occur very often. Thus, an effect size near .00 means that, on average, the experimental and control groups performed the same; a positive effect size means that, on average, the experimental group performed better than the control group; and a negative effect size means that, on average, the control group performed better than the experimental group. For positive effect sizes, the larger the number, the more effective the experimental treatment.
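
As a rough illustration, the basic formula and the pooled-standard-deviation variant mentioned above can be computed in a few lines of Python. The scores below are made-up numbers used purely for demonstration:

    import numpy as np

    # Made-up scores for illustration only.
    experimental = np.array([78, 85, 90, 72, 88, 81, 79, 94])
    control = np.array([70, 75, 83, 68, 77, 74, 72, 80])

    # Basic formula: (experimental mean - control mean) / control-group SD.
    d_basic = (experimental.mean() - control.mean()) / control.std(ddof=1)

    # Variant using the pooled standard deviation of both groups.
    n1, n2 = len(experimental), len(control)
    pooled_sd = np.sqrt(((n1 - 1) * experimental.var(ddof=1) +
                         (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
    d_pooled = (experimental.mean() - control.mean()) / pooled_sd

    print(f"Effect size (control SD): {d_basic:.2f}")
    print(f"Effect size (pooled SD):  {d_pooled:.2f}")

Either version is then read against Cohen's benchmarks of 0.2, 0.5, and 0.8.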

Statistical power

The power of any test of statistical significance is defined as the probability that it will reject a false null hypothesis. Statistical power is inversely related to beta, the probability of making a Type II error; in short, power = 1 - β. Statistical power is the likelihood that a study will detect an effect when there is an effect there to be detected. If statistical power is high, the probability of making a Type II error (concluding there is no effect when, in fact, there is one) goes down.

Statistical power is affected chiefly by the size of the effect and the size of the sample used to detect it. Bigger effects are easier to detect than smaller effects, and large samples offer greater test sensitivity than small samples. The power of any test of statistical significance will be affected by four main parameters:

The effect size

The sample size (N)

The alpha significance criterion (α)

Statistical power, or the chosen or implied beta (β)
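
These four parameters are interrelated: fix any three and the fourth is determined. If the statsmodels package is available, a sketch along the following lines computes one from the others for an independent-samples t test; the effect size of 0.5, α of .05, and target power of .80 are conventional illustrative choices, not values prescribed by the text:

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # Given effect size, alpha, and desired power (1 - beta),
    # solve for the per-group sample size.
    n_required = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
    print(f"Per-group n for d = 0.5, alpha = .05, power = .80: {n_required:.1f}")

    # Conversely, given effect size, sample size, and alpha, compute power.
    achieved_power = analysis.power(effect_size=0.5, nobs1=30, alpha=0.05)
    print(f"Power with 30 per group: {achieved_power:.2f}")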

The larger the alpha level (e.g., α = .05), the more likely it is that the researcher will reject a true null hypothesis (a Type I error); that is, the researcher judges that a significant difference exists between the sample means when there isn't one. On the other hand, the smaller the alpha level (e.g., α = .001), the more likely it is that the researcher will accept a false null hypothesis (a Type II error); that is, the researcher judges that there is not a significant difference between the sample means when there is one.

Assuming the researcher cannot conduct a census yet wants to make it tougher to detect a significant difference between two sample means (that is, to reject the null hypothesis), the researcher will select a smaller alpha level. For example, where α = .05 (a "bigger" alpha, allowing 5 in 100 chances of error), the researcher is more likely to find a significant difference than when α = .001 (a "smaller" alpha, allowing 1 in 1000 chances of error).
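
One way to see why a smaller α makes significance harder to reach is to compare critical values. The sketch below uses a two-tailed z test purely for illustration:

    from scipy import stats

    # Two-tailed critical values of the standard normal for two alpha levels.
    for alpha in (0.05, 0.001):
        critical_z = stats.norm.ppf(1 - alpha / 2)
        print(f"alpha = {alpha}: |z| must exceed {critical_z:.2f} to reject the null")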

The challenge confronting the researcher is that if α is set either too high or too low, it is likely that the researcher will make a wrong determination regarding the null hypothesis. Don't forget: the null hypothesis says that there is no difference between the means, that is, that they are equal, subject to a stated probability of error (i.e., of the researcher being wrong).

This is the point where confusion can enter the picture, especially as students begin to learn these relatively basic concepts. To avoid any confusion, re-read the statements in the preceding paragraphs while applying them to the Null Hypothesis Chart:


                        Null Hypothesis
                    True                  False
    Accept      Correct Decision      Type II Error
    Reject      Type I Error          Correct Decision

A Type I error is when the researcher rejects a true null hypothesis; that is, the researcher says there is a significant difference between the sample means when, in fact, there is no significant difference. A Type II error is when the researcher accepts a false null hypothesis; that is, the researcher says that there is no significant difference when, in reality, there is one. Thus, if an analysis has little statistical power, the researcher is likely to miss the very outcome he set out to discover, because the analysis lacked the power to detect a significant difference that would have been evident had the power been greater.
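
A small simulation can make the two error types concrete. The group sizes, means, and standard deviations below are arbitrary choices for illustration; the true-null runs show the Type I error rate settling near α, while the false-null runs with a modest effect and small samples show a sizable Type II error rate, i.e., low power:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    alpha, n, trials = 0.05, 20, 5000

    type_i = 0   # rejections of a true null (groups share the same mean)
    type_ii = 0  # acceptances of a false null (the groups really differ)

    for _ in range(trials):
        # True null: both groups drawn from the same distribution.
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(0.0, 1.0, n)
        if stats.ttest_ind(a, b).pvalue < alpha:
            type_i += 1

        # False null: the second group's mean is shifted by half an SD (d = 0.5).
        c = rng.normal(0.0, 1.0, n)
        d = rng.normal(0.5, 1.0, n)
        if stats.ttest_ind(c, d).pvalue >= alpha:
            type_ii += 1

    print(f"Type I error rate (should be near alpha): {type_i / trials:.3f}")
    print(f"Type II error rate (beta):                {type_ii / trials:.3f}")
    print(f"Power (1 - beta):                         {1 - type_ii / trials:.3f}")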

There are three ways to increase statistical power, all of which are interrelated, meaning that they impact one another. The obvious first choice is to increase the sample size, which decreases the amount of sampling error present in the sample. The second choice involves adjusting the significance level, for example, setting a more lenient α a priori (such as .05 rather than .01). The third choice is to alter the effect size, that is, to seek an outcome of a statistical test that departs more from the null hypothesis. Thus, as the sample size, significance level, and effect size increase, so does the power of the significance test. This is logical, because power increases automatically with an increase in sample size, and virtually any difference can be made significant if the sample is large enough.
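
The direction of these relationships can be checked numerically. Assuming an independent-samples t test and the statsmodels package again, the grid of values below is arbitrary; power rises as the effect size, the sample size, or the alpha level goes up:

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    for effect_size in (0.2, 0.5, 0.8):
        for alpha in (0.01, 0.05):
            for n in (25, 50, 100):
                p = analysis.power(effect_size=effect_size, nobs1=n, alpha=alpha)
                print(f"d = {effect_size}, alpha = {alpha}, n = {n:>3}: power = {p:.2f}")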


Sample size

Sample size is the number of observations used to calculate estimates for a given population. For example, if we interviewed 30 randomly selected students at a given high school to see whether they liked a certain music artist, those 30 students would be our sample size.

The aim of statistical testing is to uncover a significant difference when it actually exists. In its simplest form, this involves comparing samples from one regime with samples from another (which may be a control). Sample size is important because larger samples increase the chance of finding a significant difference, but they also cost more money.

The sample size is chosen to maximize the chance of uncovering a specific mean difference that is also statistically significant. Please note that a specific difference and statistical significance are two quite different ideas.

The specific difference is chosen by the researcher in terms of the outcome measure of the experiment: for instance, a 3 kg mean weight change in a diet experiment, or a 10% mean improvement in a teaching-method experiment.
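
To turn a specific difference into a required sample size, the difference first has to be expressed as an effect size, which means assuming a standard deviation for the outcome. In the sketch below, the 3 kg difference comes from the diet example above, while the 5 kg standard deviation, the .05 alpha, and the .80 target power are assumptions made purely for illustration:

    from statsmodels.stats.power import TTestIndPower

    specific_difference = 3.0   # kg, the mean weight change we want to detect
    assumed_sd = 5.0            # kg, assumed SD of weight change (illustrative)
    effect_size = specific_difference / assumed_sd   # Cohen's d

    n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                              alpha=0.05, power=0.80)
    print(f"Effect size: {effect_size:.2f}")
    print(f"Per-group sample size needed: {n_per_group:.1f}")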

Statistical significance is a probability statement telling us how likely it is that the observed difference was due to chance alone. The reason larger samples increase the chance of significance is that they more reliably reflect the population mean.
