18. two-sample problems for population means (σ unknown)vollmer/stat307pdfs/book_ch18.pdf · 18....

17
18. Two-sample problems for population means (σ unknown) The Practice of Statistics in the Life Sciences Third Edition © 2014 W.H. Freeman and Company

Upload: others

Post on 14-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 18. Two-sample problems for population means (σ unknown)vollmer/stat307pdfs/book_ch18.pdf · 18. Two-sample problems for population means σ ... Chapter 18 Matched pairs samples:

18. Two-sample problems for population means (σ unknown)

The Practice of Statistics in the Life Sciences Third Edition

© 2014 W.H. Freeman and Company

Page 2: 18. Two-sample problems for population means (σ unknown)vollmer/stat307pdfs/book_ch18.pdf · 18. Two-sample problems for population means σ ... Chapter 18 Matched pairs samples:

Objectives (PSLS Chapter 18)

Comparing two means (σ unknown)

p Two-sample situations

p t-distribution for two independent samples

p Two-sample t test

p Two-sample t confidence interval

p Robustness

Page 3: 18. Two-sample problems for population means (σ unknown)vollmer/stat307pdfs/book_ch18.pdf · 18. Two-sample problems for population means σ ... Chapter 18 Matched pairs samples:

Two samples situations

We may need to compare 2 treatments or 2 conditions. It is important to determine if the 2 samples are independent or not.

Independent samples: the individuals in both samples are chosen separately (“independently”)

! Chapter 18

Matched pairs samples: the individuals in both samples are related (for example, the same subjects assessed twice, or siblings)

! Chapter 17

Page 4: 18. Two-sample problems for population means (σ unknown)vollmer/stat307pdfs/book_ch18.pdf · 18. Two-sample problems for population means σ ... Chapter 18 Matched pairs samples:

t distribution for 2 independent samples

We have 2 independent SRSs coming from 2 populations with (µ1,σ1)

and (µ2,σ2) unknown. We use ( 1,s1) and ( 2,s2) to estimate (µ1,σ1) and

(µ2,σ2) respectively.

p Both populations should be Normally distributed.

p In practice, it is enough that both distributions have similar shapes

and that the sample data contain no strong outliers.

x

x

Page 5: 18. Two-sample problems for population means (σ unknown)vollmer/stat307pdfs/book_ch18.pdf · 18. Two-sample problems for population means σ ... Chapter 18 Matched pairs samples:

( ) ( )

2

22

1

21

2121

ns

ns

xxt

+

−−−=

µµThe two-sample t statistic follows approximately a t distribution with a standard error SE (denominator) reflecting variation from both samples.

The degrees of freedom of the t distribution are computed as follows:

df =

s12

n1+s22

n2

⎝ ⎜

⎠ ⎟

2

1n1 −1

s12

n1

⎝ ⎜

⎠ ⎟

2

+1

n2 −1s22

n2

⎝ ⎜

⎠ ⎟

2

Page 6: 18. Two-sample problems for population means (σ unknown)vollmer/stat307pdfs/book_ch18.pdf · 18. Two-sample problems for population means σ ... Chapter 18 Matched pairs samples:

Two-sample t test

If you have 2 independent random samples and want to test

H0: µ1 = µ2 <=> µ1 − µ2 = 0

with either a one-sided or a two-sided alternative hypothesis:

Compute the t statistic and appropriate df.

Obtain and interpret the P-value (one- or two-sided, depending Ha).

( ) ( )1 2 1 2

2 21 2

1 2

x xt

s sn n

µ µ− − −= ⇒

+

1 2x xt

SE−

=

Page 7: 18. Two-sample problems for population means (σ unknown)vollmer/stat307pdfs/book_ch18.pdf · 18. Two-sample problems for population means σ ... Chapter 18 Matched pairs samples:

Does smoking damage the lungs of children exposed to parental smoking?

Forced Vital Capacity (FVC) is the volume (in milliliters) of air that an individual can exhale in 6 seconds.

FVC was obtained for a sample of children not exposed to parental smoking and for a sample of children exposed to parental smoking.

Is the mean FVC lower in the population of children exposed to parental

smoking? We test:

Parental smoking Mean FVC s n

Yes 75.5 9.3 30

No 88.2 15.1 30

H0: µsmoke = µno <=> (µsmoke − µno) = 0

Ha: µsmoke < µno <=> (µsmoke − µno) < 0 (one-sided)

Page 8: 18. Two-sample problems for population means (σ unknown)vollmer/stat307pdfs/book_ch18.pdf · 18. Two-sample problems for population means σ ... Chapter 18 Matched pairs samples:

92.3 24.3

7.12

301.15

303.9

2.885.752222

−≈−

=

+

−=

+

−=

no

no

smoke

smoke

nosmoke

ns

ns

xxt

Lung capacity is significantly impaired in children exposed to parental smoking, compared with children not exposed to parental smoking.

Parental smoking xbar s n

Yes 75.5 9.3 30

No 88.2 15.1 30

( )

( ) ( )

22 21 2

21 22 22 2 2 2

1 2

1 1 2 2

2.9 7.6481 11 1 2.9 7.6

29 291 1

s sn n

dfs s

n n n n

⎛ ⎞+⎜ ⎟ +⎝ ⎠= = ≈

⎛ ⎞ ⎛ ⎞ ++⎜ ⎟ ⎜ ⎟− −⎝ ⎠ ⎝ ⎠

In Table C, for df 40, we find: |t| > 3.551 ➔ P < 0.0005 (one-sided)

Software gives P = 0.00014, highly significant ➔ We reject H0

Page 9: 18. Two-sample problems for population means (σ unknown)vollmer/stat307pdfs/book_ch18.pdf · 18. Two-sample problems for population means σ ... Chapter 18 Matched pairs samples:

Geckos are lizards with specialized toe pads that enable them to easily climb even slick surfaces. Researchers want to know if male and female geckos differ significantly in the size of their toe pads. In a random sample of Tokay geckos, they find that the mean toe pad area is 6.0 cm2 for the males and 5.3 cm2 for the females. What is the appropriate null hypothesis here?

A. .

B. .

C. .

D. .

E. .7.0:

0:

0:

0:

7.0:

0

0

0

0

0

=

=

=−

=−

=−

FMdifference

FMdifference

femalemale

femalemale

femalemale

H

H

H

xxH

xxH

µ

µ

µµ

Should the alternative hypothesis be one-side or two-sided?

Page 10: 18. Two-sample problems for population means (σ unknown)vollmer/stat307pdfs/book_ch18.pdf · 18. Two-sample problems for population means σ ... Chapter 18 Matched pairs samples:

Two sample t-confidence interval

Because we have 2 independent samples we use the difference between

both sample averages to estimate (µ1 − µ2).

C

t*−t*

m m

p C is the area between −t* and t*

p Find t* in the line of Table C for the computed degrees of freedom

The margin of error m is:

SEtns

nstm **

2

22

1

21 =+=

( )21 xx −

Page 11: 18. Two-sample problems for population means (σ unknown)vollmer/stat307pdfs/book_ch18.pdf · 18. Two-sample problems for population means σ ... Chapter 18 Matched pairs samples:

How does pesticide help seedling growth? Seeds are randomly assigned to be planted in pots with soil treated with pesticide or in pots with untreated soil. Seedling growth (in mm) is recorded after 2 weeks.

A 95% confidence interval for (µ1 - µ2) is:

Table C

2

22

1

21

21 *)(ns

ns

txx +±−

Using df = 30 from Table C, we get:

mm 80.896.9or ,2315.17

2101.11

042.2)52.4148.51(22

±+±−

Page 12: 18. Two-sample problems for population means (σ unknown)vollmer/stat307pdfs/book_ch18.pdf · 18. Two-sample problems for population means σ ... Chapter 18 Matched pairs samples:

72.831.4*024.2 38, dffor

≈=

=

m

t-Test: Two-Sample Assuming Unequal Variances

Treatment group Control groupMean 51.476 41.522 Variance 121.162 294.079 Observations 21 23 Hypothesized Mean Difference - df 38 t Stat 2.311 P(T<=t) one-tail 0.013 t Critical one-tail 1.686 P(T<=t) two-tail 0.026 t Critical two-tail 2.024

Excel

t*

Independent Samples Test

2.362 .132 2.267 42 .029 9.95445 4.39189 1.09125 18.81765

2.311 37.855 .026 9.95445 4.30763 1.23302 18.67588

Equal variancesassumedEqual variancesnot assumed

ReadingScoreF Sig.

Levene's Test forEquality of Variances

t df Sig. (2-tailed)Mean

DifferenceStd. ErrorDifference Lower Upper

95% ConfidenceInterval of the

Difference

t-test for Equality of Means

SPSS

Growth

We are 95% confident that using pesticide yields seeds that are 1.2 to 18.7 mm longer on average after 2 weeks.

Page 13: 18. Two-sample problems for population means (σ unknown)vollmer/stat307pdfs/book_ch18.pdf · 18. Two-sample problems for population means σ ... Chapter 18 Matched pairs samples:

Robustness

The two-sample statistic is most robust when both sample sizes are

equal and both sample distributions are similar. But even when we

deviate from this, two-sample tests tend to remain quite robust.

As a guideline, a combined sample size (n1 + n2) of 40 or more will allow

you to work even with the most skewed distributions.

Page 14: 18. Two-sample problems for population means (σ unknown)vollmer/stat307pdfs/book_ch18.pdf · 18. Two-sample problems for population means σ ... Chapter 18 Matched pairs samples:

Avoid the pooled two-sample t procedures

There are two versions of the two-sample t procedures: one assuming equal variance (“pooled”) and one not assuming equal variance for

the two populations. They have slightly different formulas and df.

Two Normally distributed populations with unequal variances

The pooled (equal variance) two-sample t test has degrees of freedom n1 + n2 − 2.

However, the assumption of equal variance is hard to check, and thus the unequal variance test is safer.

Page 15: 18. Two-sample problems for population means (σ unknown)vollmer/stat307pdfs/book_ch18.pdf · 18. Two-sample problems for population means σ ... Chapter 18 Matched pairs samples:

REVIEW: t procedures

One-sample t procedure

One sample summarized by its mean x̅ and standard deviation s.

Population parameters µ and σ unknown.

Inference about

Matched pairs t procedure

Two paired datasets (from a matched-pairs design). From the n pairwise differences we compute

Population parameters µdiff and σdiff unknown.

Inference about

Two-sample t procedure

Two independent samples (unrelated individuals in the two samples). We summarize each sample separately with

Population parameters µ1 , σ1 , µ2 , σ2 unknown.

Inference about

1 2µ µ−

µ

).,( diffdiff sx

1 1 2 2( , ; , ).x s x s

µdiff

Page 16: 18. Two-sample problems for population means (σ unknown)vollmer/stat307pdfs/book_ch18.pdf · 18. Two-sample problems for population means σ ... Chapter 18 Matched pairs samples:

Which type of t inference procedure? A: one sample, B: matched pairs, C: two samples ?

p Is blood pressure altered by use of an oral contraceptive? Comparing a sample of women not using an OC with a sample of women taking it.

p Average cholesterol level in general adult population is 175 mg/dl. Take a sample of adults with ‘high cholesterol’ parents. Is the mean cholesterol level higher in this population?

p Does bread lose vitamin with storage? Take a sample of bread loaves and compare vitamin content right after baking and again after 3 days later.

p Does bread lose vitamin with storage? Take a sample of bread loaves just baked and a sample of bread loaves stored for 3 days and compare vitamin content.

Page 17: 18. Two-sample problems for population means (σ unknown)vollmer/stat307pdfs/book_ch18.pdf · 18. Two-sample problems for population means σ ... Chapter 18 Matched pairs samples:

Table C