effect size
Effect size
An effect size is the strength or magnitude of the difference between two sets of data or, in
outcome studies, between two time points for the same population. It can be thought of as the
degree to which the null hypothesis is false. In statistical hypothesis testing and power analysis,
an effect size is the size of the difference under study; that is, a difference between a
mathematical characteristic (often the mean) of a distribution of a dependent variable associated
with a specific level of an independent variable and the same characteristic of another
distribution defined by a different level of the independent variable. Effect size is a different
concept from statistical significance, and it is often relevant to compute an effect size measure
even when a conventional threshold for statistical significance, such as p < .05, has not been met.
In its simplest form, an effect size is the difference between two means divided by the
pooled standard deviation for those means. Different people offer different advice regarding how
to interpret the resultant effect size, but the most widely accepted guidance is that of Cohen
(1992), in which 0.2 indicates a small effect, 0.5 a medium effect, and 0.8 a large effect.
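The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the scores are invented for the example, and treating 0.2, 0.5, and 0.8 as hard cut-offs is a simplifying assumption, since Cohen's values are benchmarks rather than strict boundaries.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Difference between two means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def cohen_label(d):
    """Rough label based on the 0.2 / 0.5 / 0.8 benchmarks (cut-offs are an assumption)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Hypothetical scores, invented purely for illustration
treated = [12, 14, 15, 17, 18]
untreated = [10, 12, 13, 15, 16]

d = cohens_d(treated, untreated)
print(round(d, 2), cohen_label(d))
```

Because the result is expressed in standard deviation units, it can be compared across studies that measure outcomes on different scales.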
The basic formula to calculate the effect size is to subtract the mean of the control group
from that of the experimental group and then to divide the result by the standard deviation
of the scores for the control group (rather than the pooled standard deviation). Effect size is
expressed as a decimal number and, while numbers greater than 1.00 are possible, they do not
occur very often. Thus, an effect size near .00 means that, on average, experimental and control
groups performed the same; a positive effect size means that, on average, the experimental group
performed better than the control group; and a negative effect size means that, on average, the
control group performed better than the experimental group did. For positive effect sizes, the
larger the number, the more effective the experimental treatment.
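The control-group version of the formula, which is commonly called Glass's delta, might look like the sketch below. The function names, the scores, and the tolerance used to decide what counts as "near .00" are all assumptions made for illustration.

```python
import statistics


def control_sd_effect_size(experimental, control):
    """(experimental mean - control mean) / standard deviation of the control group."""
    mean_diff = statistics.mean(experimental) - statistics.mean(control)
    return mean_diff / statistics.stdev(control)  # sample SD of the control scores


def interpret_sign(effect_size, tolerance=0.05):
    """Plain-language reading of the sign; `tolerance` is an arbitrary illustrative choice."""
    if abs(effect_size) <= tolerance:
        return "groups performed about the same on average"
    if effect_size > 0:
        return "experimental group performed better on average"
    return "control group performed better on average"


# Hypothetical test scores, invented for illustration
experimental_scores = [78, 82, 85, 88, 92]
control_scores = [70, 75, 80, 85, 90]

es = control_sd_effect_size(experimental_scores, control_scores)
print(round(es, 2), "-", interpret_sign(es))
```

Swapping the two groups flips the sign of the result, and it also changes the magnitude, because the denominator is always the standard deviation of whichever group is treated as the control.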
Statistical power
The power of any test of statistical significance is defined as the probability that it will reject a
false null hypothesis. Statistical power is inversely related to beta or the probability of making
a Type II error. In short, power = 1 – β. Statistical power is the likelihood that a study will detect
an effect when there is an effect there to be detected. If statistical power is high, the probability
of making a Type II error, or concluding there is no effect when, in fact, there is one, goes down.
Statistical power is affected chiefly by the size of the effect and the size of
the sample used to detect it. Bigger effects are easier to detect than smaller effects, while large
samples offer greater test sensitivity than small samples. Power analysis involves four main
parameters, which are interrelated so that any one can be determined from the other three:
The effect size
The sample size (N)
The alpha significance criterion (α)
Statistical power, or the chosen or implied beta (β)
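To see how the four parameters fit together, the sketch below computes approximate power from the other three. It assumes a two-sided comparison of two equal-sized groups and uses the normal (z) approximation; an exact calculation for a t-test would give slightly lower power in small samples. The function name `approx_power` is hypothetical.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with equal group sizes.

    effect_size is the standardized mean difference (difference / standard deviation).
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    shift = effect_size * (n_per_group / 2) ** 0.5  # expected value of the test statistic
    # Probability that the test statistic lands in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


power = approx_power(effect_size=0.5, n_per_group=64, alpha=0.05)
beta = 1 - power  # probability of a Type II error
print(round(power, 3), round(beta, 3))
```

When the effect size is zero, the function returns alpha itself, which matches the definition of alpha as the probability of rejecting a true null hypothesis.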
The larger the alpha level (e.g., α = .05), the more likely it is that the researcher will
reject a true null hypothesis (a Type I error). That is, the researcher judges that a significant
difference exists between the sample means when there isn't one. On the other hand, the smaller
the alpha level (e.g., α = .001), the more likely it is that the researcher will accept a false
null hypothesis (a Type II error). That is, the researcher judges that there is not a significant
difference between the sample means when there is one.
Assuming the researcher cannot conduct a census yet wants to make it tougher to find a
significant difference between two sample means (that is, to reject the null hypothesis), the
researcher will select a smaller alpha level. For example, where α = .05 (a "bigger" alpha: a
5 in 100 chance of a Type I error when the null hypothesis is true) the researcher is more likely
to find a significant difference than when α is .001 (a "smaller" alpha: a 1 in 1000 chance of a
Type I error).
The challenge confronting the researcher is that if α is set either too high or too low, the
researcher becomes more likely to make a wrong determination regarding the null hypothesis.
Don't forget: the null hypothesis says that there is no difference between the means, that is,
that they are equal. The alpha level is the stated probability of error, i.e., the chance that the
researcher is wrong when rejecting a null hypothesis that is actually true.
This is the point where confusion can enter into the picture, especially as students begin
to learn these relatively basic concepts. To avoid any confusion, re-read the statements in the
preceding paragraphs and apply them to the Null Hypothesis Chart:
                    Null Hypothesis
                    True                  False
Accept              Correct Decision      Type II Error
Reject              Type I Error          Correct Decision
A Type I Error is when the researcher rejects a true null hypothesis. That is, the
researcher says there is a significant difference between the sample means when, in fact, there is
no significant difference. A Type II Error is when the researcher accepts a false null hypothesis.
That is, the researcher says that there is no significant difference when, in reality, there is a
significant difference. Thus, if an analysis has little statistical power, the researcher is likely
to miss the outcome he or she hoped to discover: a significant difference that would have been
evident with greater statistical power goes undetected.
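Both kinds of error can be observed directly by simulating many experiments. The sketch below is illustrative only: it assumes normally distributed scores with a known standard deviation of 1 (so a z-test applies), and the sample size, true difference, number of trials, and random seed are arbitrary choices.

```python
import random
from statistics import NormalDist


def rejection_rate(true_diff, n_per_group, alpha=0.05, trials=2000, seed=1):
    """Fraction of simulated experiments in which the null hypothesis is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    std_error = (2 / n_per_group) ** 0.5  # SE of the mean difference when SD = 1
    rejections = 0
    for _ in range(trials):
        group_a = [rng.gauss(true_diff, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        mean_diff = sum(group_a) / n_per_group - sum(group_b) / n_per_group
        if abs(mean_diff / std_error) > z_crit:
            rejections += 1
    return rejections / trials


# Null hypothesis true (no real difference): every rejection is a Type I error
type_i_rate = rejection_rate(true_diff=0.0, n_per_group=30)

# Null hypothesis false (real difference of 0.5 SD): every non-rejection is a Type II error
type_ii_rate = 1 - rejection_rate(true_diff=0.5, n_per_group=30)

print(type_i_rate, type_ii_rate)
```

Under these assumptions the Type I rate should sit close to alpha, while the Type II rate is far larger, which shows how little power 30 observations per group provide for a medium-sized effect.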
There are three ways to increase statistical power, all of which are interrelated, meaning
that they impact one another. The obvious first choice is to increase the sample size, which
decreases the amount of sampling error present in the sample. The second choice involves
adjusting the significance level a priori (e.g., relaxing α = .01 to α = .05). The third choice is
to alter the effect size, that is, to seek an outcome of a statistical test that departs more from
the null hypothesis. Thus, as the sample size, significance level, and effect size increase, so
does the power of the significance test. This is logical, because power increases automatically
with an increase in the sample size, and virtually any difference can be made significant if the
sample is large enough.
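The closing point, that a large enough sample can make almost any difference significant, can be illustrated with a small calculation. The sketch assumes a two-sided z-test on two equal-sized groups and holds the observed standardized difference fixed at a trivially small 0.05; the function name and the sample sizes are hypothetical.

```python
from statistics import NormalDist


def two_sided_p(observed_d, n_per_group):
    """Two-sided p-value of a two-sample z-test for a given standardized difference."""
    z = observed_d * (n_per_group / 2) ** 0.5  # test statistic grows with sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same tiny difference, evaluated at increasing sample sizes
for n in (100, 1000, 10000):
    print(n, round(two_sided_p(0.05, n), 4))
```

The difference itself never changes; only the sample size does. This is one reason to report an effect size alongside the significance test.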
Sample size
Sample size is the number of observations used for calculating estimates of a given population.
For example, if we interviewed 30 random students at a given high school to see if they liked a
certain music artist, "30 students" would be our sample size.
The aim of statistical testing is to uncover a significant difference when it actually exists.
In its simplest form this involves comparing samples between one regime and another (which
may be a control). Sample size is important because larger samples increase the chance of
finding a significant difference, but cost more money.
The sample size is chosen to maximize the chance of uncovering a specific mean
difference, which is also statistically significant. Please note that specific difference and
statistically significant are two quite different ideas.
The specific difference is chosen by the researcher in terms of the outcome measure of
the experiment. For instance, it might be a 3 kg mean weight change in a diet experiment or a
10% mean improvement in a teaching method experiment.
Statistical significance is a probability statement telling us how likely it is that a
difference at least as large as the one observed would arise by chance alone. The reason larger
samples increase your chance of significance is that they more reliably reflect the population
mean.
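Choosing a sample size for a specific difference can be sketched with the standard normal-approximation formula for comparing two group means. The function name is hypothetical, the defaults of α = .05 and power = .80 are conventional choices rather than values from this text, and an exact t-test calculation would typically ask for one or two more participants per group.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided, two-sample comparison.

    effect_size is the specific difference divided by the standard deviation.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # controls the Type I error rate
    z_power = z.inv_cdf(power)  # controls the Type II error rate (beta = 1 - power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical diet study: a 3 kg target difference with an ASSUMED standard
# deviation of 6 kg gives a standardized effect size of 0.5
print(n_per_group(3 / 6))

# Smaller differences need far larger samples
print(n_per_group(0.2), n_per_group(0.8))
```

The required sample size scales with the inverse square of the effect size, so halving the difference to be detected roughly quadruples the number of participants needed.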