Statistical Power The ability to find a difference when one really exists.

Upload: dana-harrison

Post on 28-Dec-2015


Page 1: Statistical Power The ability to find a difference when one really exists

Statistical Power

The ability to find a difference when one really exists.

Page 2

Statistical Power

The probability of rejecting a false null hypothesis (H0).

The probability of obtaining a value of t (or z) that is large enough to reject H0 when H0 is actually false.

We always test the null hypothesis against an alternative (research) hypothesis.

Usually the goal is to reject the null hypothesis in favor of the alternative.

Page 3

Why is Power Important?

As researchers, we put a lot of effort into designing and conducting our research. This effort may be wasted if we do not have sufficient power in our studies to find the effect of interest.

Page 4

Type I versus Type II Error

A researcher can make two types of error when reporting the results of a statistical test.

Actual State of Reality

Researcher Decision    H0 is true                  H0 is false
Reject H0              Type I error (α)            Correct decision (1 – β)
Accept H0              Correct decision (1 – α)    Type II error (β)

Page 5

Type I Error

A type I error is rejecting H0 when it is actually true. Its probability is determined by the alpha (α) level set by the researcher.


Page 6

Type II Error

A type II error results when the researcher finds that there isn't a difference when there really is one. Its probability is β.


Page 7

Statistical Power

Power is the ability of a test to detect a real effect. It is measured as a probability that equals 1 – β.

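The error rates in the decision table can be estimated by simulation. The sketch below is an illustration rather than part of the original slides: it assumes a two-sample z-test with a known standard deviation, and the function name `rejection_rate` and all numeric settings (group size, effect, number of trials, seed) are hypothetical choices.

```python
import random
from statistics import NormalDist

def rejection_rate(true_diff, sigma=1.0, n=20, alpha=0.05, trials=5000, seed=0):
    """Fraction of simulated two-sample z-tests (known sigma) that reject H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cutoff
    se_diff = sigma * (2 / n) ** 0.5               # SE of the difference in means
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, sigma) for _ in range(n)]
        b = [rng.gauss(true_diff, sigma) for _ in range(n)]
        z = (sum(b) / n - sum(a) / n) / se_diff
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

type_i_rate = rejection_rate(true_diff=0.0)   # H0 true: rejections are type I errors
power = rejection_rate(true_diff=1.0)         # H0 false: rejections are correct
beta = 1 - power                              # H0 false: misses are type II errors
print(f"type I rate ~ {type_i_rate:.3f}, power ~ {power:.3f}, beta ~ {beta:.3f}")
```

When H0 is true the rejection rate should sit near alpha; when H0 is false the rejection rate is the estimated power, and the remainder is the estimated β.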

Page 8

Power depends on…

To discuss power we need to understand the variables that affect its size.

1. The alpha level set by the researcher

2. The sample size (N)

3. The effect size (e.g., Cohen’s d)
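One way to see how the three factors combine is the normal (z-test) approximation for two equal groups. This formula is not on the original slides; it assumes a two-tailed test, n scores per group, and a known standard deviation, and it ignores the negligible chance of rejecting in the wrong tail:

$$ \text{Power} \approx \Phi\!\left(d\sqrt{n/2} - z_{1-\alpha/2}\right) $$

Here Φ is the standard normal cumulative distribution function and z_{1-α/2} is the two-tailed critical value. A larger α lowers the critical value, while a larger n or a larger d raises the first term, so each change increases power.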

Page 9

Power and Alpha (α)

An increase in alpha, say from .05 to .1, artificially increases the power of a study.

Increasing alpha reduces the risk of making a type II error, but increases that of a type I.

In many cases, however, a type I error may be worse than a type II error, e.g., replacing an effective chemotherapy drug with one that is, in reality, less effective.
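The trade-off between alpha and β can be shown numerically. This is a minimal sketch under a normal approximation (two-tailed, two-sample z-test); the function `approx_power` and the values d = 0.5 and 30 scores per group are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha):
    """Two-tailed, two-sample z-test power under a normal approximation."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5   # expected z when the effect is real
    # Probability of landing beyond either critical value
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

power_05 = approx_power(d=0.5, n_per_group=30, alpha=0.05)
power_10 = approx_power(d=0.5, n_per_group=30, alpha=0.10)
print(f"alpha = .05: power = {power_05:.2f}, beta = {1 - power_05:.2f}")
print(f"alpha = .10: power = {power_10:.2f}, beta = {1 - power_10:.2f}")
```

Raising alpha from .05 to .10 raises power for the same data, but only because the test now rejects a true H0 twice as often.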

Page 10

Power and Sample Size (N)

Power increases as N increases.

The more independent scores that are measured or collected, the more likely it is that the sample mean represents the true mean.

Prior to a study, researchers rearrange the power calculation to determine how many scores (subjects or N) are needed to achieve a certain level of power (usually 80%).
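Rearranging the power calculation for sample size can be sketched as follows. The slides do not give a formula, so this example assumes the common normal-approximation formula for two equal groups, n = 2((z_{1-α/2} + z_{power}) / d)²; the function name `n_per_group` is hypothetical. A t-based calculation gives slightly larger values.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate scores needed per group for a two-tailed, two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the chosen alpha
    z_power = z.inv_cdf(power)           # quantile matching the target power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} scores per group for 80% power")
```

The required sample size grows quickly as the expected effect shrinks, which is why the effect size has to be estimated before the study is run.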

Page 11

Power and Effect Size

Effect size is a measure of the difference between the means of two groups of data.

For example, the difference in mean jump height between samples of volleyball and basketball players.

As effect size increases, so does power. For example, if the difference in mean jump height was very large, then it would be very likely that a t-test on the two samples would detect that true difference.

Page 12

A Little More on Effect Size

While a p-value indicates the statistical significance of a test, the effect size indicates the “practical” significance.

If the units of measurement are meaningful (e.g., jump height in cm), then the effect size can simply be portrayed as the difference between two means.

If the units of measurement are not meaningful (e.g., scores on a behaviour questionnaire), then a standardized method of calculating effect size is useful.
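The gap between statistical and practical significance can be illustrated with two made-up studies. This sketch assumes a two-sample z-test in which the observed standardized difference equals d; the function `two_tailed_p` and both scenarios are hypothetical.

```python
from statistics import NormalDist

def two_tailed_p(d, n_per_group):
    """p-value of a two-sample z-test for an observed standardized difference d."""
    z = d * (n_per_group / 2) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

# A trivially small effect measured on a huge sample
tiny_effect_big_n = two_tailed_p(d=0.02, n_per_group=100_000)
# A large effect measured on a very small sample
large_effect_small_n = two_tailed_p(d=0.8, n_per_group=8)

print(f"d = 0.02, n = 100000 per group: p = {tiny_effect_big_n:.2g}")
print(f"d = 0.80, n = 8 per group:      p = {large_effect_small_n:.2g}")
```

The tiny effect is statistically significant and the large one is not, so the p-value on its own says little about how big or useful a difference is.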

Page 13

Cohen's d

Cohen's d is a common effect size index.

It describes the difference between two means in terms of number of standard deviations.

The pooled standard deviation (σpooled) is the square root of a weighted average of the variances of both samples.

$$ d = \frac{\mu_1 - \mu_2}{\sigma_{pooled}} $$

$$ \sigma_{pooled} = \sqrt{\frac{\sigma_1^2 (N_1 - 1) + \sigma_2^2 (N_2 - 1)}{(N_1 - 1) + (N_2 - 1)}} $$
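A minimal sketch of the Cohen's d calculation, using sample means and sample variances as estimates of the population values in the formulas. The function name `cohens_d` and the jump heights are hypothetical data invented for illustration.

```python
from statistics import mean, variance

def cohens_d(sample1, sample2):
    """Difference between two sample means in units of the pooled standard deviation."""
    n1, n2 = len(sample1), len(sample2)
    # Weighted average of the two sample variances (weights are N - 1)
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (
        (n1 - 1) + (n2 - 1)
    )
    return (mean(sample1) - mean(sample2)) / pooled_var ** 0.5

# Hypothetical jump heights in inches
volleyball = [34.0, 38.5, 36.0, 41.0, 30.5]
basketball = [29.0, 33.5, 27.0, 35.0, 25.5]

d = cohens_d(volleyball, basketball)
print(f"Cohen's d = {d:.2f}")
```

Because d is expressed in standard deviations, the same function works for measures without meaningful units, such as questionnaire scores.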

Page 14

Hypothetical Example

The following slides work through a hypothetical example to illustrate statistical power.

Assume that we know the actual effect size, i.e., the actual difference between the means.

Page 15

Jump Height Example

Basketball vs. volleyball: who jumps higher?

We have 16 athletes in each sample (N = 16).

We know the population means are:

Basketball: Mean jump ht = 30 ± 5.7 in.

Volleyball: Mean jump ht = 36 ± 5.7 in.

Alpha = .05

Using the above information we can graphically demonstrate statistical power.

Knowing there is a difference, how many times out of 100 tests would we be correct?

Page 16

Step 1. What’s t-critical for the study?

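For what t score will we consider there to be a significant difference between basketball and volleyball?

We know N = 32 (df = 30), α = .05, and the test is 2-tailed.

Use =tinv() in Excel: tinv(.05/2, 30) = 2.36, so t critical = 2.36.

Note that Excel's tinv() treats its first argument as the total two-tailed probability, so tinv(.05/2, 30) = 2.36 leaves .0125 in each tail. The value that leaves α/2 = .025 in each tail is tinv(.05, 30) = 2.04. The rest of this example uses 2.36.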
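The critical value can be checked without Excel. The sketch below uses only the Python standard library: it integrates the t density numerically with Simpson's rule and inverts it by bisection. The function names are hypothetical, and `two_tailed_critical` is written to mimic the two-tailed behaviour of Excel's tinv() rather than to reproduce its implementation.

```python
import math

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, integrating the t density on [0, t] with Simpson's rule."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: coef * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

def two_tailed_critical(p, df):
    """t such that P(|T| > t) = p, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 2 * upper_tail(mid, df) > p:
            lo = mid   # tails still too heavy: move the cutoff outward
        else:
            hi = mid
    return (lo + hi) / 2

t_05 = two_tailed_critical(0.05, 30)    # .025 in each tail
t_025 = two_tailed_critical(0.025, 30)  # .0125 in each tail
print(f"two-tailed p = .05:  t critical = {t_05:.2f}")
print(f"two-tailed p = .025: t critical = {t_025:.2f}")
```

The second value matches the 2.36 used in this example; the first is the conventional two-tailed cutoff for α = .05 with 30 degrees of freedom.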

Page 17

Step 2

[Figure: t-distribution for 30 degrees of freedom, with t critical = ±2.36 and α/2 = .025 in each tail.]

We know the mean jump height for basketball is 30 in. What would the volleyball mean need to be to get t = 2.36?

Use the independent t-test equation:

$$ t = \frac{\bar{X}_V - \bar{X}_B}{\sqrt{SEM_V^2 + SEM_B^2}} $$

For both groups, N = 16 and Stdev = 5.7, so each SEM is 5.7/√16 ≈ 1.43 and the denominator is ≈ 2.0 (2.015 before rounding):

$$ 2.36 = \frac{\bar{X}_V - 30}{2.0} \quad\Rightarrow\quad \bar{X}_V \approx 34.8 $$
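The arithmetic in Step 2 can be reproduced in a few lines. This is a sketch using the numbers given in the example (Stdev = 5.7, N = 16 per group, basketball mean 30, t critical 2.36); the variable names are hypothetical.

```python
import math

sd, n = 5.7, 16
mean_bball = 30.0
t_critical = 2.36   # value used in this example

sem = sd / math.sqrt(n)                   # standard error of each group mean
se_diff = math.sqrt(sem ** 2 + sem ** 2)  # denominator of the t equation
critical_mean_vball = mean_bball + t_critical * se_diff

print(f"SEM = {sem:.3f}, SE of difference = {se_diff:.3f}")
print(f"Volleyball mean needed for t = {t_critical}: {critical_mean_vball:.2f} in")
```

Keeping the unrounded standard error explains why the critical mean rounds to 34.8 rather than 34.7.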

Page 18

Step 3

On the actual distribution of the volleyball sample mean (X̄V), which has a mean of 36, what would be the t-value for 34.8 in?

Can you calculate the probability (area) of getting a mean below 34.8 if the real mean is 36? That area is the type II error.

[Figure: distribution of values of X̄V if H0 is true (μV = 30) and SEDiff is 2, on an axis from 22 to 38 in, aligned with the t-distribution for 30 degrees of freedom. t critical = ±2.36, α/2 = .025 in each tail, and the critical value of X̄V is 34.8 in.]

Page 19

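Step 4

[Figure: distribution of values of X̄V if μV = 36 and SEMean is 1.43, on an axis from 30.3 to 41.7 in, with the critical value 34.8 marked. Below it, the t-distribution for 15 degrees of freedom with t = -0.87 marked.]

$$ t = \frac{34.8 - 36}{1.43} \approx -0.87 $$

The value -0.87 comes from the unrounded critical mean (34.76) and SEM (1.425).

tdist(t, df, tails): tdist(.87, 15, 1) = .20

β = .20

Power (1 – β) = .80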
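Step 4 can be reproduced with the standard library alone. The sketch below follows the example's own simplification, a one-tailed area on a t-distribution with 15 degrees of freedom using the SE of a single mean; a full two-sample power calculation would use the noncentral t distribution, which this example does not cover. The function name `t_upper_tail` is hypothetical and stands in for Excel's tdist(t, df, 1).

```python
import math

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, integrating the t density on [0, t] with Simpson's rule."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: coef * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

sem = 5.7 / math.sqrt(16)    # SE of the volleyball mean
critical_mean = 34.756       # unrounded critical value from Step 2
true_mean = 36.0

t = (critical_mean - true_mean) / sem
beta = t_upper_tail(abs(t), df=15)   # area below the critical mean, by symmetry
power = 1 - beta

print(f"t = {t:.2f}, beta = {beta:.2f}, power = {power:.2f}")
```

By the symmetry of the t-distribution, the area below t = -0.87 equals the area above +0.87, which is why a one-tailed call is enough.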

Page 20

Step 4

[Figure: the two distributions from the previous steps shown together. Top: distribution of values of X̄V if H0 is true (μV = 30) and SEDiff is 2, aligned with the t-distribution for 30 degrees of freedom, with t critical = ±2.36, α/2 = .025 in each tail, and the critical value of X̄V at 34.8 in. Bottom: distribution of values of X̄V if μV = 36 and SEMean is 1.43, where the area below 34.8 is β = .20 and the area above it is power (1 – β) = .80.]

Page 21

Power Calculator

http://www.stat.ubc.ca/~rollin/stats/ssize/n2.html