Week 9

Testing Hypotheses

Philosophy of Hypothesis Testing

Null hypothesis, H0 (and alternative, HA)

Test statistic, T

p-value = prob(T is as extreme as observed), if H0 is true

Interpret p-value

Examples of H0 and HA

Null hypothesis examples:

• There is no extrasensory perception.

• There is no difference between the mean pulse rates of men and women.

• There is no relationship between exercise intensity and the resulting aerobic benefit.

Alternative hypotheses examples:

• There is extrasensory perception.

• Men have lower mean pulse rates than women do.

• Increasing exercise intensity increases the resulting aerobic benefit.

Side-effects of generic drug

Pharmac considers replacing an expensive drug with a cheaper generic ‘equivalent’. It is known that 20% of patients experience side-effects from the original drug.

Null, H0: 20% (or fewer) will have side effects.

Alternative, HA: More than 20% will have side effects.

Does the generic equivalent have more side-effects?

H0: π = 0.2

HA: π > 0.2

(π = proportion of patients who experience side-effects with the generic drug)

Logic of hypothesis testing

Like “Presumed innocent until proven guilty”

Assume H0 is true and evaluate the evidence against it

Assume H0 is true and find P(getting data like what was observed)

Psychic Powers

Person correctly guesses outcome of 100 coin flips.

Psychic powers?

(or cheating!!!)

Guessing?

Psychic Powers

Model: Each of 100 ‘guesses’ independent

Each has prob π of being correct

Data: 100 guesses with all 100 correct

H0: π = 0.5 (guessing)

HA: π > 0.5 (psychic or cheating)

Test statistic: number correct

Psychic Powers

p-value: P(all 100 correct) if guessing = (½)^100 ≈ 7.9 × 10^-31 = very small

Conclusion: Sample results inconsistent with H0 (guessing)

Extremely strong evidence of psychic powers (or cheating)
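The size of this p-value can be checked directly. The snippet below is a minimal sketch using only Python's standard library; the variable names are illustrative and not part of the original slides.

```python
from fractions import Fraction

# Under H0 (pure guessing) each guess is correct with probability 1/2,
# and the 100 guesses are assumed independent.
n_flips = 100
p_value_exact = Fraction(1, 2) ** n_flips   # exact rational value of (1/2)^100
p_value = float(p_value_exact)

print(f"P(all {n_flips} correct | guessing) = {p_value:.3e}")
```

Since "all 100 correct" is the most extreme outcome possible, the probability of this single outcome is the whole upper tail.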

Interpreting p-value

p-value > 0.1: No evidence that H0 is wrong. Data could easily have occurred by chance with H0.

0.05 < p-value < 0.1: Only mild evidence that H0 is wrong.

0.01 < p-value < 0.05: Moderately strong evidence that H0 is wrong.

p-value < 0.01: Strong evidence that H0 is wrong. Data unlikely to have occurred if H0 were true.
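The verbal scale above can be written as a small lookup. This is only a sketch: the function name is hypothetical, and placing the exact boundary values (0.1, 0.05, 0.01) in the weaker category is an assumption, since the scale uses strict inequalities throughout.

```python
def evidence_against_h0(p_value: float) -> str:
    """Translate a p-value into the verbal scale used in these notes."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value > 0.1:
        return "no evidence that H0 is wrong"
    if p_value > 0.05:
        return "only mild evidence that H0 is wrong"
    if p_value > 0.01:
        return "moderately strong evidence that H0 is wrong"
    # Assumption: values exactly on a boundary fall into the category above it.
    return "strong evidence that H0 is wrong"


for p in (0.186, 0.07, 0.015, 0.001):
    print(p, "->", evidence_against_h0(p))
```

The cut-offs are conventions for wording a conclusion, not sharp dividing lines.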

Interpreting p-value

However large the p-value, you must never conclude that H0 is true. The best you can say is that there is no evidence H0 is false.

(You could never conclude that H0: π = 0.5 is true, since π could just as easily be 0.50000001, and you would have no chance of distinguishing the two.)

Even if p-value < 0.01, there is still some chance that H0 could be true.

Testing a proportion

Model: n independent trials

Each has prob π of being a success

Data: x successes

Possible null and alternative hypotheses

H0: π = π0 vs HA: π ≠ π0

H0: π ≥ π0 vs HA: π < π0

H0: π ≤ π0 vs HA: π > π0

Test statistic X = #success ~ binomial(n, π0) if H0 is true

p-value = prob(X as extreme as observed, if H0 is true)

Evaluate from binomial(n, π0), e.g. with Minitab

Side-effects of drug

Standard drug has p(side effect) = 0.2

Generic drug: x = 14 out of n = 50 with side-effect

H0: π = 0.2

HA: π > 0.2

p-value = prob(X ≥ 14) for binomial(n=50, π=0.2)

= 1 – prob(X ≤ 13)

= 1 – 0.8894

= 0.1106

No evidence of higher rate of side-effects with generic
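The slides evaluate this tail probability with Minitab. As a rough alternative, the sketch below sums the binomial probabilities directly with Python's standard library; the function name is illustrative.

```python
from math import comb


def binomial_upper_tail(x: int, n: int, pi0: float) -> float:
    """Return P(X >= x) for X ~ binomial(n, pi0)."""
    return sum(
        comb(n, k) * pi0**k * (1 - pi0) ** (n - k)
        for k in range(x, n + 1)
    )


# Generic drug: 14 of 50 patients with side-effects, null value 0.2.
p_value = binomial_upper_tail(14, 50, 0.2)
print(f"p-value = {p_value:.4f}")
```

Only the upper tail is summed because HA is one-sided (more than 20% with side-effects).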

Test statistic X = #success ~ binomial(n, π0) if H0 is true

Big sample: nπ0 ≥ 10 and n(1 − π0) ≥ 10

With p = x/n the sample proportion,

$$p \sim \text{approx normal}\left(\pi_0,\ \sqrt{\frac{\pi_0(1-\pi_0)}{n}}\right)$$

Test statistic

$$z = \frac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} \sim \text{approx normal}(0, 1)$$

p-value = prob(Z as extreme as observed, if H0 is true)

Evaluate from normal(0, 1) using Minitab or tables

Side-effects of drug

Standard drug has p(side effect) = 0.2

Generic drug: x = 112 out of n = 400 with side-effect

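H0: π = 0.2

HA: π > 0.2

p = 112/400 = 0.28

$$z = \frac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} = \frac{0.28 - 0.2}{\sqrt{0.2 \times 0.8/400}} = 4$$

p-value = prob(X ≥ 112) for binomial(n=400, π=0.2)

≈ prob(Z ≥ 4) for normal(0, 1)

= 0.0000 (to 4 decimal places)

Almost certain of higher rate of side-effects with generic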
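The large-sample calculation for the generic drug can be sketched as follows. This assumes Python's standard library in place of Minitab or tables; the function name is hypothetical, and the large-sample check uses the rule of thumb (at least 10 expected successes and failures) stated earlier.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_upper(x: int, n: int, pi0: float) -> tuple[float, float]:
    """One-sided (HA: pi > pi0) large-sample z test for a proportion."""
    # Rule of thumb for using the normal approximation.
    if n * pi0 < 10 or n * (1 - pi0) < 10:
        raise ValueError("sample too small for the normal approximation")
    p_hat = x / n
    se = sqrt(pi0 * (1 - pi0) / n)        # s.e. computed under H0
    z = (p_hat - pi0) / se
    p_value = 1 - NormalDist().cdf(z)     # upper-tail area
    return z, p_value


z, p_value = proportion_z_upper(112, 400, 0.2)
print(f"z = {z:.2f}, p-value = {p_value:.6f}")
```

The p-value is tiny but not literally zero; it simply rounds to 0.0000 at four decimal places.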

Left and Right Foot Lengths

Are Left and Right Foot Lengths Equal or Different?

n = 112 students with unequal right and left foot size. x = 63 have longer right foot.

Let π = population proportion with a longer right foot.

H0: π = 0.5 versus HA: π ≠ 0.5

p = 63/112 = 0.5625

Assuming π = 0.5 (H0),

$$z = \frac{0.5625 - 0.5}{\sqrt{0.5(1 - 0.5)/112}} = 1.32$$

prob(Z ≥ 1.32) = 0.093 and the test is two-tailed, so p-value = 2(0.093) = 0.186

Conclusion in words

p-value = 0.186 so no evidence against the null hypothesis.

Although there was a tendency toward a longer right foot in the sample, there is insufficient evidence to conclude that the proportion in the population with a longer right foot is different from the proportion with a longer left foot.
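Because HA is two-sided here, the tail area is doubled. The sketch below reproduces the foot-length calculation with Python's standard library; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_two_sided(x: int, n: int, pi0: float) -> tuple[float, float]:
    """Two-sided (HA: pi != pi0) large-sample z test for a proportion."""
    p_hat = x / n
    z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
    one_tail = 1 - NormalDist().cdf(abs(z))   # area beyond |z| in one tail
    return z, 2 * one_tail                    # both tails count as "extreme"


# 63 of 112 students have a longer right foot; null value 0.5.
z, p_value = proportion_z_two_sided(63, 112, 0.5)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

Using |z| means the same function works whether the sample proportion falls above or below the null value.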

Statistical significance vs Real importance

The p-value does not provide information about the magnitude of the effect.

The magnitude of a statistically significant effect can be so small that the practical effect is not important.

If the sample size is large enough, almost any null hypothesis can be rejected.
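To see how sample size drives significance, the sketch below holds a small observed effect fixed and varies n. The numbers (a sample proportion of 0.51 against a null value of 0.5) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

pi0 = 0.5      # null value
p_hat = 0.51   # hypothetical observed proportion, held fixed for every n

results = []
for n in (100, 10_000, 1_000_000):
    z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
    results.append((n, z, p_value))
    print(f"n = {n:>9,}: z = {z:6.2f}, p-value = {p_value:.4f}")
```

The effect size (one percentage point) never changes; only the standard error shrinks, so the p-value says nothing about whether the difference matters in practice.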

Birth Month and Height

Austrian study of heights of 507,125 military recruits.

Men born in spring were, on average, about 0.6 cm taller than men born in fall (Weber et al., Nature, 1998, 391:754–755).

A small difference: 0.6 cm = about 1/4 inch.

Sample size so large that even a very small difference was statistically significant.

Headline: Spring Birthday Confers Height Advantage

Internet and Loneliness

A closer look: actual effects were quite small.

“one hour a week on the Internet was associated, on average, with an increase of 0.03, or 1 percent on the depression scale” (Harman, 30 August 1998, p. A3).

“greater use of the Internet was associated with declines in participants’ communication with family members in the household, declines in size of their social circle, and increases in their depression and loneliness” (Kraut et al., 1998, p. 1017)

Test statistic and p-value

Common definition

$$z = \frac{\text{statistic} - \text{null value}}{\text{standard error}}$$

p-value from normal(0, 1)

The normal(0, 1) approximation may be improved upon when the s.e. is estimated

Testing Hypotheses About Mean

1. H0: μ = μ0 versus HA: μ ≠ μ0 (two-tailed)

2. H0: μ ≥ μ0 versus HA: μ < μ0 (one-tailed)

3. H0: μ ≤ μ0 versus HA: μ > μ0 (one-tailed)

Often write H0: μ = μ0 for all tests.

Always use μ = μ0 for finding p-values

σ known (or large n)

Model: Random sample from popn with mean μ. Popn is approx normal or reasonably large sample.

Hypotheses: H0: μ = μ0 against one of the alternatives above

Test statistic:

$$z = \frac{\text{statistic} - \text{null value}}{\text{standard error}} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

p-value: From normal(0, 1)

Only if the value of σ is known, or the sample is big enough that we can estimate it well (e.g. n > 30)

σ unknown: t-test

Test statistic:

$$t = \frac{\text{statistic} - \text{null value}}{\text{standard error}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

t has a standard distn if μ = μ0: the t-distn with n − 1 degrees of freedom

p-value: Prob of more extreme t if μ = μ0 (i.e. H0), from the t(n − 1) distribution

Finding the p-value

Normal Body Temperature

What is normal body temperature?

Less than 98.6 ºF (on average)?

Data: Random sample of n = 18 normal body temps

98.2 97.8 99.0 98.6 98.2 97.8 98.4 99.7 98.2
97.4 97.6 98.4 98.0 99.2 98.6 97.1 97.2 98.5

Model: Body temps are random sample from popn with mean μ. Popn is approx normal.

Hypotheses:

H0: μ = 98.6

HA: μ < 98.6

Test statistic:

x̄ = 98.217

s = 0.684

$$\text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{0.684}{\sqrt{18}} = 0.161$$

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{98.217 - 98.6}{0.161} = -2.38$$

p-value:

If μ = 98.6, t has a t(n − 1) distn = t(17 d.f.) distn. Minitab or Excel can find the tail area below t = −2.38: p-value = 0.015

Conclusion:

p-value = 0.015. Moderately strong evidence that the mean body temperature (of this population) is less than 98.6 ºF

Minitab: Does all the calculations for you

Test of mu = 98.600 vs mu < 98.600

Variable      N    Mean     StDev   SE Mean   T       P
Temperature   18   98.217   0.684   0.161     -2.38   0.015
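The Minitab figures can be reproduced from the 18 temperatures listed earlier. The sketch below is an approximation rather than Minitab's method: Python's standard library has no t distribution, so the tail area is found by numerically integrating the t density with Simpson's rule. The helper names are illustrative.

```python
from math import copysign, gamma, pi, sqrt
from statistics import mean, stdev


def t_pdf(t: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return const * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t), integrating the density over [0, |t|] with Simpson's rule."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area between 0 and |t|
    return 0.5 + copysign(area, t)     # the density is symmetric about 0


temps = [98.2, 97.8, 99.0, 98.6, 98.2, 97.8, 98.4, 99.7, 98.2,
         97.4, 97.6, 98.4, 98.0, 99.2, 98.6, 97.1, 97.2, 98.5]
mu0 = 98.6
n = len(temps)
x_bar, s = mean(temps), stdev(temps)   # stdev uses the n - 1 divisor
se = s / sqrt(n)
t_stat = (x_bar - mu0) / se
p_value = t_cdf(t_stat, n - 1)         # lower tail, since HA: mu < 98.6

print(f"mean = {x_bar:.3f}, s = {s:.3f}, s.e. = {se:.3f}")
print(f"t = {t_stat:.2f}, p-value = {p_value:.3f}")
```

For an upper-tailed alternative the p-value would be `1 - t_cdf(t_stat, n - 1)` instead.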

Paired data example

n = 10 pilots perform flight simulation (a) sober then (b) after alcohol.

Does useful performance time decrease with alcohol use?

Is mean difference zero or > zero?

Paired Data and the Paired t-Test

Data: d = x1 – x2

Model: Differences are random sample from popn with mean μd

Parameter estimate: d̄, with s.e.(d̄) = sd/√n

Hypothesis test: Is the mean difference in the population different from 0?

Same analysis as before (CI & test) but using differences

Paired t-Test

Hypotheses: H0: μd = 0, HA: μd ≠ 0 (or μd < 0 or μd > 0)

(Take care picking HA.)

Test statistic

$$t = \frac{\text{sample mean} - \text{null value}}{\text{standard error}} = \frac{\bar{d} - 0}{s_d/\sqrt{n}}$$

p-value and conclusion: From t(n – 1 d.f.) distn. Make sure you use correct tails, especially 1-tailed.

Effect of Alcohol

Data: 10 pilots do flight simulation (a) sober, then (b) after alcohol. Response = useful performance time.

Model: Differences are random sample from popn with mean μd. Approx normal.

Does mean useful performance time decrease?

Hypotheses:

H0: μd = 0

HA: μd > 0

Test statistic:

d̄ = 195.6, sd = 230.5

$$\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}} = \frac{230.5}{\sqrt{10}} = 72.9$$

$$t = \frac{\bar{d} - 0}{\text{s.e.}(\bar{d})} = \frac{195.6 - 0}{72.9} = 2.68$$

p-value:

If μd = 0, t has a t(n – 1) distn = t(9 d.f.) distn

Minitab or Excel can find the tail area above t = 2.68: p-value = 0.013

Conclusion:

p-value = 0.013. Even with a small experiment, there is moderately strong evidence that alcohol has decreased the mean performance time

Minitab: Does all the calculations for you

Test of mu = 0.0 vs mu > 0.0

Variable   N    Mean    StDev   SE Mean   T      P
Diff       10   195.6   230.5   72.9      2.68   0.013
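The individual differences are not listed in these notes, so the sketch below works from the summary statistics in the Minitab output (n, mean and standard deviation of the differences). As an approximation to Minitab's calculation, the upper-tail area of the t distribution is obtained by Simpson's-rule integration of its density; the function names are hypothetical.

```python
from math import gamma, pi, sqrt


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T >= t) for t >= 0, via Simpson's rule on the t density over [0, t]."""
    if t < 0:
        raise ValueError("this sketch only handles t >= 0")
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def pdf(u: float) -> float:
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    h = t / steps                      # steps must be even
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3         # half the area lies above 0


def paired_t_from_summary(n: int, d_bar: float, s_d: float) -> tuple[float, float, float]:
    """Paired t test of H0: mu_d = 0 against HA: mu_d > 0 from summary statistics."""
    se = s_d / sqrt(n)
    t_stat = (d_bar - 0) / se
    return se, t_stat, t_upper_tail(t_stat, n - 1)


se, t_stat, p_value = paired_t_from_summary(n=10, d_bar=195.6, s_d=230.5)
print(f"s.e. = {se:.1f}, t = {t_stat:.2f}, p-value = {p_value:.3f}")
```

With raw data, the same test is a one-sample t-test applied to the differences d = x1 – x2.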

More about p-values

If H0 is true, you can still get small p-values

Prob(p-value ≤ 0.05) = 0.05

Prob(p-value ≤ 0.01) = 0.01

Prob(p-value ≤ 0.001) = 0.001

If HA is true, small p-values are more likely … but big p-values are still possible

p-value is approx 0 — conclude HA almost certainly true.

Never conclude H0 is true
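The claim that small p-values still turn up at the stated rates when H0 is true can be checked by simulation. The sketch below is one way to do so under assumed settings that are not from the slides: a normal population with known σ, samples of size 25, a two-sided z test, and a fixed random seed.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)                      # fixed seed so the run is repeatable

mu0, sigma, n = 0.0, 1.0, 25        # hypothetical population and sample size
n_sims = 20_000
std_normal = NormalDist()

p_values = []
for _ in range(n_sims):
    # H0 is true by construction: the data really do have mean mu0.
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
    p_values.append(2 * (1 - std_normal.cdf(abs(z))))

rejection_rates = {}
for alpha in (0.05, 0.01, 0.001):
    rejection_rates[alpha] = sum(p <= alpha for p in p_values) / n_sims
    print(f"Proportion of p-values <= {alpha}: {rejection_rates[alpha]:.4f}")
```

Each proportion should land close to its threshold, which is why a small p-value is evidence against H0 rather than proof.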

P-values for other tests

If you know

• H0 and HA

• p-value (e.g. from Minitab output)

… then you should be able to report the conclusion from the test

e.g. Test for normality: H0: sample from normal popn; HA: not normal. p-value = 0.163. No evidence that popn is not normal.
