Chapter 10: Inferences Involving Two Populations


Upload: augusta-stephens

Post on 29-Dec-2015


Page 1: Chapter 10: Inferences Involving Two Populations

Chapter 10: Inferences Involving Two Populations

[Title-slide graphic: hypotheses H0 and Ha about μ1 and μ2]

Page 2: Chapter 10: Inferences Involving Two Populations

Independent and Dependent Samples

• The object is to compare means of two samples and draw conclusions about the differences in population means.

• Two basic kinds of samples: independent and dependent (paired).

• Which kind you have depends on the sources of the two samples and how the data was collected.

Page 3: Chapter 10: Inferences Involving Two Populations

Dependent Samples

• If one observation is collected for each sample from the same source, the samples are dependent.

• This is often called “paired data” because you get a pair of observations from one individual or experimental unit.

• Examples include pretest & posttest scores, weight before and after a diet, left eye and right eye acuity, etc.

• There is a one-to-one correspondence between an observation in one sample and an observation in the other sample.

Page 4: Chapter 10: Inferences Involving Two Populations

Independent Samples

• Two samples are independent if there is no connection between an observation in one sample and any particular observation in the other.

• Also, there can be no connection in the sampling procedure (an individual selected for one sample in no way affects the selection of any individual in the other, including by exclusion).

Page 5: Chapter 10: Inferences Involving Two Populations

Dependent Sample Examples

• The same test is given to all students at the beginning and end of a course to measure learning (one pair of scores per person).

• IQ tests are given to husband & wife pairs.

• A medical treatment is given to patients matched for condition, age, sex, race, weight, and other characteristics with patients in a control group.

Page 6: Chapter 10: Inferences Involving Two Populations

Independent Sample Examples

• The same test is given to all students in two classes (no pairing of scores occurs).

• IQ tests are given to men and women without consideration of relationship between any of them.

• Subjects are randomly assigned to a treatment and a control group to test a new drug. No attempt is made to “match” them.

Page 7: Chapter 10: Inferences Involving Two Populations

Difference of means for Paired Data

• When dependent samples are involved, the data is paired data.

• Paired data results from:

– before and after studies,

– a common source, or

– matched pairs.

• We will denote the random variables from the two samples by X1 and X2.

Page 8: Chapter 10: Inferences Involving Two Populations

The meaning of X1 and X2

• Two samples are being taken. For example, the pretest is one sample, and the posttest is another sample.

• Then X1 represents a pretest score and X2 represents a posttest score.

• However, there is an X1 and an X2 for each person taking the test.

Page 9: Chapter 10: Inferences Involving Two Populations

Paired Data

X1 X2

x11 x21

x12 x22

x13 x23

x14 x24

x15 x25

x16 x26

x17 x27

… …

Here the random variable names are given by capital letters, and individual observations by small letters. The data appear side by side, each pair having the same second subscript. You cannot change the order of one column without destroying the relationship between the columns. That is what makes it “paired data.” There are the same number of observations, n, in each sample.

Page 10: Chapter 10: Inferences Involving Two Populations

What do we want to know?

• In paired data studies, the parameter of interest is the mean difference between the groups.

• This is conceptually different from the difference between the means of the groups.

• In other words, the population of interest is actually the differences between X1 and X2.

• We define a new value, d = x1 − x2, as one observation taken from this population.

Page 11: Chapter 10: Inferences Involving Two Populations

Why is this important?

• The mean difference between the groups and the difference between the means of the groups are the same number.

• But their sampling distributions are different!

• From $d = x_1 - x_2$, we calculate $\bar{d}$, the mean difference.

• Now, $\bar{d}$ will have a normal distribution if X1 and X2 are normal or n > 30 (approx).
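A small numerical check of the point made on this slide may help. The sketch below uses made-up paired scores (the lists `x1` and `x2` are hypothetical, not data from the slides) to show that the mean of the differences equals the difference of the means, while the standard error based on the paired differences differs from the one obtained by treating the two columns as independent samples.

```python
from math import isclose, sqrt
from statistics import mean, stdev

# Hypothetical paired scores (illustrative only): one (x1, x2) pair per person.
x1 = [72, 85, 64, 90, 78, 69]
x2 = [75, 88, 70, 91, 84, 70]
n = len(x1)

# One difference per pair: d = x1 - x2
d = [a - b for a, b in zip(x1, x2)]

mean_of_differences = mean(d)
difference_of_means = mean(x1) - mean(x2)

# Standard error that uses the pairing: s_d / sqrt(n)
se_paired = stdev(d) / sqrt(n)

# Standard error if the columns were treated as independent samples
se_unpaired = sqrt(stdev(x1) ** 2 / n + stdev(x2) ** 2 / n)

print("mean of differences:", mean_of_differences)
print("difference of means:", difference_of_means)
print("same number?", isclose(mean_of_differences, difference_of_means))
print("SE (paired):", se_paired)
print("SE (treated as independent):", se_unpaired)
```

With these particular numbers the paired standard error comes out much smaller, because most of the person-to-person variation cancels when each pair is subtracted.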

Page 12: Chapter 10: Inferences Involving Two Populations

Distribution of mean differences

• If we know $\bar{d}$ is normally distributed, then we can use the same tests and confidence intervals that we learned for $\bar{x}$.

• We won’t bother with the “variance known” situation this time. We will calculate the variance from the sample and use the t distribution.

• In other words, treat the d’s as the sample. Find their mean and standard deviation.

Page 13: Chapter 10: Inferences Involving Two Populations

Distribution of mean differences

• There is a population parameter, $\mu_d$, that we are trying to estimate.

• The point estimate is $\bar{d}$, taken from a sample of n differences (d’s).

• The d’s have a standard deviation, $s_d$, which is calculated in the same way as s.

• The standard deviation of $\bar{d}$ is $s_{\bar{d}} = \dfrac{s_d}{\sqrt{n}}$.

• This is no different from what we had before, except for symbols!

Page 14: Chapter 10: Inferences Involving Two Populations

Confidence Interval for Paired Differences

• A (1−α)100% CI for $\mu_d$ is given by:

$\bar{d} \pm t_{(n-1,\,\alpha/2)}\, s_{\bar{d}}$, where $s_{\bar{d}} = \dfrac{s_d}{\sqrt{n}}$.

Page 15: Chapter 10: Inferences Involving Two Populations

Example: Salt-free diets are often prescribed for people with high blood pressure. The following data was obtained from an experiment designed to estimate the reduction in diastolic blood pressure as a result of following a salt-free diet for two weeks. Assume diastolic readings to be normally distributed.

Find a 99% confidence interval for the mean reduction.

Question: How do you decide which way to subtract?

Before 93 106 87 92 102 95 88 110

After 92 102 89 92 101 96 88 105

Difference 1 4 -2 0 1 -1 0 5

Page 16: Chapter 10: Inferences Involving Two Populations

Solution:

Population Parameter of Interest: The mean reduction (difference) in diastolic blood pressure.

Determine the distribution to use:

Assumptions: Both sample populations are assumed normal, σ unknown.

Use t with df = 8 − 1 = 7.

Confidence level: 1 − α = 0.99

Two-tailed situation, α/2 = 0.005

t(df, α/2) = t(7, 0.005) = 3.50

Sample evidence:

Sample information: $n = 8$, $\bar{d} = 1.0$, and $s_d = 2.39$

Page 17: Chapter 10: Inferences Involving Two Populations

Calculate the error bound:

$t_{(n-1,\,\alpha/2)} \cdot \dfrac{s_d}{\sqrt{n}} = 3.50 \cdot \dfrac{2.39}{\sqrt{8}} = (3.50)(0.845) = 2.957$

A 99% confidence interval for $\mu_d$ is

$1.0 \pm 2.957$

(-1.957, 3.957)
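The salt-free diet interval can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function name `paired_ci` is an illustrative choice, and the critical value 3.50 is taken from the t table as on the slide rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

def paired_ci(before, after, t_crit):
    """Confidence interval for mu_d, where d = before - after.

    t_crit is the tabled value t(n-1, alpha/2); it is supplied by the
    caller because the standard library has no t-distribution quantile.
    """
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    d_bar = mean(d)          # point estimate of mu_d
    s_d = stdev(d)           # sample standard deviation of the differences
    margin = t_crit * s_d / sqrt(n)
    return d_bar, s_d, (d_bar - margin, d_bar + margin)

# Data from the salt-free diet example
before = [93, 106, 87, 92, 102, 95, 88, 110]
after = [92, 102, 89, 92, 101, 96, 88, 105]

T_7_005 = 3.50  # t(7, 0.005), read from the table as on the slide
d_bar, s_d, ci = paired_ci(before, after, T_7_005)

print(f"d-bar = {d_bar:.1f}, s_d = {s_d:.2f}")
print(f"99% CI for mu_d: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Small differences in the third decimal place relative to the slide come from the slide rounding $s_d/\sqrt{n}$ to 0.845 before multiplying.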

Page 18: Chapter 10: Inferences Involving Two Populations

Hypothesis Testing:

When testing a null hypothesis about the mean difference, the test statistic is

$t^* = \dfrac{\bar{d} - \mu_{d_0}}{s_d / \sqrt{n}}$

where t* has a t distribution with df = n − 1.

Example: The corrosive effects of various chemicals on normal and specially treated pipes were tested by using a dependent sampling plan. The data collected is summarized by

$n = 17$, $\bar{d} = 5.7$, $s_d = 4.8$

where d is the amount of corrosion on the treated pipe subtracted from the amount of corrosion on the normal pipe.

Page 19: Chapter 10: Inferences Involving Two Populations

Example (continued): Does this sample provide sufficient evidence to conclude the specially treated pipes are more resistant to corrosion? Use α = 0.05.

a. Solve using the classical approach.

b. Solve using the p-value approach.

Solution:

1. State the hypotheses (you must say something about the direction of the difference): Test for the mean difference in corrosion, normal pipe − treated pipe. The null and alternative hypotheses:

H0: $\mu_d = 0$ (did not lower corrosion)

Ha: $\mu_d > 0$ (did lower corrosion)

Page 20: Chapter 10: Inferences Involving Two Populations

2. Determine the appropriate type of test:

Assumptions: Assume corrosion measures are approximately normal, σ unknown.

Use t-test for paired differences.

3. Define the rejection region:

a. Right-tailed test: reject H0 if t* > t(16, 0.05) = 1.75.

b. Reject H0 if p < .05.

4. Calculate the value of the test statistic, using $n = 17$, $\bar{d} = 5.7$, $s_d = 4.8$:

$t^* = \dfrac{\bar{d} - \mu_{d_0}}{s_d / \sqrt{n}} = \dfrac{5.7 - 0.0}{4.8 / \sqrt{17}} = \dfrac{5.7}{1.164} = 4.896$

$p = P(t^* > 4.896) < .0001$ by the table

$p = P(t^* > 4.896) = 0.00006731$ using Excel

Page 21: Chapter 10: Inferences Involving Two Populations

5. State the conclusion:

a. Decision: Reject H0 because t* = 4.896 > 1.75.

b. Decision: Reject H0 because p < .0001 < α = .05.

Conclusion: The treated pipes do not corrode as much as the normal pipes when subjected to chemicals.
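The classical-approach calculation for the pipe example can be checked from the summary statistics alone. The sketch below is illustrative: `paired_t_statistic` is a hypothetical helper name, and the critical value 1.75 is the tabled t(16, 0.05) quoted on the slide.

```python
from math import sqrt

def paired_t_statistic(d_bar, s_d, n, mu_d0=0.0):
    """t* = (d_bar - mu_d0) / (s_d / sqrt(n)), with df = n - 1."""
    return (d_bar - mu_d0) / (s_d / sqrt(n))

# Summary statistics from the pipe-corrosion example
t_star = paired_t_statistic(d_bar=5.7, s_d=4.8, n=17)

T_16_005 = 1.75  # t(16, 0.05), read from the table as on the slide
reject_h0 = t_star > T_16_005  # right-tailed test

print(f"t* = {t_star:.3f}, reject H0: {reject_h0}")
```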

Page 22: Chapter 10: Inferences Involving Two Populations

Two Independent Samples

• Compare the means of two populations

• Parameter of interest: $(\mu_1 - \mu_2)$

• Base inferences on $(\bar{x}_1 - \bar{x}_2)$

• The parentheses indicate that we are thinking of the difference as one parameter

• Consider the general confidence interval formula, P ± TS. We know what P is now.

• We need to know the distribution of $(\bar{X}_1 - \bar{X}_2)$ to find T and S.

Page 23: Chapter 10: Inferences Involving Two Populations

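Distribution of $(\bar{X}_1 - \bar{X}_2)$

• The sampling distribution of $(\bar{X}_1 - \bar{X}_2)$ has a mean, $\mu_{\bar{x}_1 - \bar{x}_2} = (\mu_1 - \mu_2)$

• The point estimate of $(\mu_1 - \mu_2)$ is $(\bar{x}_1 - \bar{x}_2)$

• The standard deviation of $(\bar{X}_1 - \bar{X}_2)$ is $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

• Since the variances are hardly ever known, we will have to estimate them.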

Page 24: Chapter 10: Inferences Involving Two Populations

Sample Standard Deviation

• The sample standard deviation of $(\bar{X}_1 - \bar{X}_2)$ is $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

• The following assumptions are needed to use the above formula:

– The samples are randomly selected from normally distributed populations

– The samples are independent

– There is no reason to believe σ1 = σ2

– The populations (not samples) are “large”

Page 25: Chapter 10: Inferences Involving Two Populations

Distribution

• The t distribution will be used.

• Degrees of freedom:

– If n1 = n2, no problem, df = n1 − 1.

– Otherwise, df may be calculated by a complicated formula. Statistical computer software will do this automatically.

– Alternatively, the smaller of n1 − 1 and n2 − 1 can be used as an approximation. This is conservative: the actual confidence level will be higher, and the actual p-value will be lower.

Page 26: Chapter 10: Inferences Involving Two Populations

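Confidence Interval

• Now we have all the information we need.

• P = $(\bar{x}_1 - \bar{x}_2)$

T = t(df, α/2)

S = $\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

• A (1−α)100% confidence interval for $(\mu_1 - \mu_2)$ is given by

$(\bar{x}_1 - \bar{x}_2) \pm t_{(df,\,\alpha/2)} \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$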

Page 27: Chapter 10: Inferences Involving Two Populations

Example: A recent study reported that the longest average workweeks for non-supervisory employees in private industry were those of chefs and construction workers.

Find a 95% confidence interval for the difference in mean length of workweek between chefs and construction workers. Assume normality for the sampled populations.

Solution:

Parameter of interest: $\mu_1 - \mu_2$, where $\mu_1$ is the mean hours/week for chefs and $\mu_2$ is the mean hours/week for construction workers.

Industry n Average Hours/Week Standard Deviation

Chef 18 48.2 6.7

Construction 12 44.1 2.3

Page 28: Chapter 10: Inferences Involving Two Populations

df = 11, the smaller of n1 − 1 = 18 − 1 = 17 and n2 − 1 = 12 − 1 = 11.

α = 0.05

t(df, α/2) = t(11, 0.025) = 2.20

$\bar{x}_1 - \bar{x}_2 = 48.2 - 44.1 = 4.1$

$\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = \sqrt{\dfrac{6.7^2}{18} + \dfrac{2.3^2}{12}} = 1.7131$

A 95% confidence interval for $\mu_1 - \mu_2$ is

$4.1 \pm (2.20)(1.7131) = 4.1 \pm 3.77$

(0.33, 7.87)

Page 29: Chapter 10: Inferences Involving Two Populations

Note:

1. Using a calculator, the confidence interval is .55 to 7.65.

2. This confidence interval is narrower than the approximate interval computed on the previous slide. This illustrates the conservative (wider) nature of the confidence interval when approximating the degrees of freedom.
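The conservative interval and the degrees of freedom a calculator would use can be compared in a few lines. This is a sketch under one stated assumption: the "complicated formula" is taken to be the Welch-Satterthwaite approximation, which is what statistical software commonly uses; the slides do not name it. The function names are illustrative, and the critical value 2.20 is the tabled t(11, 0.025) from the slide.

```python
from math import sqrt

def two_sample_se(s1, n1, s2, n2):
    """Estimated standard error of (x1_bar - x2_bar)."""
    return sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def conservative_df(n1, n2):
    """Smaller of n1 - 1 and n2 - 1."""
    return min(n1, n2) - 1

def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite df (assumed to be the slide's 'complicated formula')."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

def mean_diff_ci(x1_bar, s1, n1, x2_bar, s2, n2, t_crit):
    """(x1_bar - x2_bar) +/- t_crit * SE, with t_crit taken from a table."""
    diff = x1_bar - x2_bar
    margin = t_crit * two_sample_se(s1, n1, s2, n2)
    return diff - margin, diff + margin

# Workweek example: chefs (sample 1) vs. construction workers (sample 2)
se = two_sample_se(6.7, 18, 2.3, 12)
df_small = conservative_df(18, 12)
df_welch = welch_df(6.7, 18, 2.3, 12)
ci = mean_diff_ci(48.2, 6.7, 18, 44.1, 2.3, 12, t_crit=2.20)  # t(11, 0.025)

print(f"SE = {se:.4f}")
print(f"conservative df = {df_small}, Welch df = {df_welch:.2f}")
print(f"95% CI with df = 11: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The larger Welch df gives a smaller critical value than 2.20, which is why the calculator interval in the note above is narrower than the hand-computed one.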

Page 30: Chapter 10: Inferences Involving Two Populations

Hypothesis Tests:

To test a null hypothesis about the difference between two population means, use the test statistic

$t^* = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_{01} - \mu_{02})}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

where df is the smaller of df1 or df2 when computing t* without the aid of a computer.

Note: The hypothesized difference between the two population means $(\mu_{01} - \mu_{02})$ can be any specified value. The most common value is zero.

Page 31: Chapter 10: Inferences Involving Two Populations

Example: A recent study compared a new drug to ease post-operative pain with the leading brand. Independent random samples were obtained and the number of hours of pain relief for each patient was recorded. The summary statistics are given in the table below.

Is there any evidence to suggest the new drug provides longer relief from post-operative pain? Use α = 0.05.

a. Solve using the p-value approach.

b. Solve using the classical approach.

Pain Reliever n Mean St.Dev.

New Drug 10 4.350 0.542

Leading Brand 17 3.929 0.169

Page 32: Chapter 10: Inferences Involving Two Populations

Solution:

1. The Hypotheses:

H0: $\mu_1 - \mu_2 = 0$ (new drug relieves pain no longer)

Ha: $\mu_1 - \mu_2 > 0$ (new drug works longer to relieve pain)

2. The appropriate test:

Assumptions: Both populations are assumed to be approximately normal. The samples were random and independently selected.

Use t*, df = 9

3. Rejection Region: Reject if t*>t(9, 0.05) = 1.83 or p<.05.

4. Calculations:

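$t^* = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_{01} - \mu_{02})}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{(4.350 - 3.929) - (0.00)}{\sqrt{\dfrac{0.542^2}{10} + \dfrac{0.169^2}{17}}} = \dfrac{0.421}{\sqrt{0.0294 + 0.0017}} = \dfrac{0.421}{0.1763} = 2.39$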

Page 33: Chapter 10: Inferences Involving Two Populations

4. (cont’d) The p-value:

$P = P(t^* > 2.39, \text{ with df} = 9) = 0.019$

5. The Conclusion:

Decision: Reject H0.

Conclusion: There is evidence to suggest that the new drug provides longer relief from post-operative pain.
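The test statistic for the pain-reliever example can be recomputed from the summary table. A minimal sketch, assuming the conservative df rule used on the slide; `two_sample_t` is an illustrative name and 1.83 is the tabled t(9, 0.05) quoted in the rejection region.

```python
from math import sqrt

def two_sample_t(x1_bar, s1, n1, x2_bar, s2, n2, hypothesized_diff=0.0):
    """t* for two independent samples, variances not assumed equal."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return ((x1_bar - x2_bar) - hypothesized_diff) / se

# New drug (sample 1) vs. leading brand (sample 2)
t_star = two_sample_t(4.350, 0.542, 10, 3.929, 0.169, 17)

df = min(10, 17) - 1      # conservative df = 9
T_9_005 = 1.83            # t(9, 0.05), read from the table as on the slide
reject_h0 = t_star > T_9_005  # right-tailed test

print(f"t* = {t_star:.2f}, df = {df}, reject H0: {reject_h0}")
```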

Page 34: Chapter 10: Inferences Involving Two Populations

Difference of Two Proportions

If independent samples of sizes n1 and n2 are drawn randomly from large populations with p1 = P1(success) and p2 = P2(success), respectively, then the sampling distribution of $p'_1 - p'_2$ has these properties:

1. a mean $\mu_{p'_1 - p'_2} = p_1 - p_2$

2. a standard error $\sigma_{p'_1 - p'_2} = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$

3. an approximately normal distribution if n1 and n2 are sufficiently large.

Page 35: Chapter 10: Inferences Involving Two Populations

Note: To ensure normality:

1. The sample sizes are both larger than 20.

2. The products n1p1, n1q1, n2p2, n2q2 are all larger than 5. Since p1 and p2 are unknown, these products are estimated by $n_1 p'_1,\ n_1 q'_1,\ n_2 p'_2,\ n_2 q'_2$.

3. The samples consist of less than 10% of respective populations.

Confidence Intervals:

1. A confidence interval for p1 − p2 is based on the unbiased sample statistic $p'_1 - p'_2$.

2. The confidence limits are found using the following formula:

$(p'_1 - p'_2) \pm z(\alpha/2) \sqrt{\dfrac{p'_1 q'_1}{n_1} + \dfrac{p'_2 q'_2}{n_2}}$

Page 36: Chapter 10: Inferences Involving Two Populations

Example: A consumer group compared the reliability of two similar microcomputers from two different manufacturers. The proportion requiring service within the first year after purchase was determined for samples from each of two manufacturers.

Find a 98% confidence interval for p1 − p2, the difference in proportions needing service.

Manufacturer Sample Size Proportion Needing Service

1 200 0.15

2 250 0.09

Page 37: Chapter 10: Inferences Involving Two Populations

Solution:

1. Population Parameter of Interest: p1-p2 where p1 is the proportion of computers needing service for manufacturer 1 and p2 is the proportion of computers needing service for manufacturer 2.

2. Check the Assumptions:

Sample sizes larger than 20.

Products $n_1 p'_1,\ n_1 q'_1,\ n_2 p'_2,\ n_2 q'_2$ all larger than 5.

$p'_1 - p'_2$ should have an approximate normal distribution.

Use Z distribution.

Page 38: Chapter 10: Inferences Involving Two Populations

3. The Sample Evidence:

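Sample information:

$n_1 = 200$, $p'_1 = 0.15$, $q'_1 = 1 - 0.15 = 0.85$

$n_2 = 250$, $p'_2 = 0.09$, $q'_2 = 1 - 0.09 = 0.91$

Point estimate: $p'_1 - p'_2 = 0.15 - 0.09 = 0.06$

4. The Confidence Interval:

a. Confidence coefficient:

z(α/2) = z(0.01) = 2.33

b. Error Bound:

$E = z(\alpha/2) \sqrt{\dfrac{p'_1 q'_1}{n_1} + \dfrac{p'_2 q'_2}{n_2}} = 2.33 \sqrt{\dfrac{(0.15)(0.85)}{200} + \dfrac{(0.09)(0.91)}{250}} = 2.33 \sqrt{0.0006375 + 0.0003276} = (2.33)(0.0311) = 0.0724$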

Page 39: Chapter 10: Inferences Involving Two Populations

c. Confidence limits:

$(p'_1 - p'_2) \pm E$

$0.06 \pm 0.0724$

(−0.0124, 0.1324)

Hypothesis Tests for the difference of two proportions:

Look at what we’ve done before, e.g.:

$t^* = \dfrac{\bar{x} - \mu_0}{s_{\bar{x}}}$

All of our hypothesis tests have made use of this form:

$\text{Test Statistic} = \dfrac{\text{Estimate} - \text{Hypothesized Value}}{\text{St. Dev. of Estimate}}$
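Returning to the microcomputer example, the 98% interval for p1 − p2 can be checked numerically. This is a sketch using only the standard library; `two_proportion_ci` is an illustrative name. It computes z(α/2) with `statistics.NormalDist` instead of reading 2.33 from a table, so the limits differ from the slide's in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(p1, n1, p2, n2, confidence):
    """CI for p1 - p2 using the unpooled standard error."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z(alpha/2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    margin = z * se                            # error bound E
    return diff, margin, (diff - margin, diff + margin)

# Sample proportions needing service: manufacturer 1 and manufacturer 2
diff, margin, ci = two_proportion_ci(0.15, 200, 0.09, 250, confidence=0.98)

print(f"point estimate = {diff:.2f}, E = {margin:.4f}")
print(f"98% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

Because the interval contains 0, it is consistent with no difference between the two manufacturers at this confidence level.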

Page 40: Chapter 10: Inferences Involving Two Populations

Parts of the Test

If the null hypothesis is there is no difference between proportions, and we can assume normality,

a. the test statistic is z*

b. the parameter estimate is $p'_1 - p'_2$

c. the hypothesized value is 0

d. the standard error is … (more to come)

Page 41: Chapter 10: Inferences Involving Two Populations

Let’s consider how we can construct a standard error term.

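Now the standard deviation of $p'_1 - p'_2$ is actually

$\sigma_{p'_1 - p'_2} = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$

However, if the null hypothesis is true, p1 = p2 = p (and q1 = q2 = q), so we can say

$\sigma_{p'_1 - p'_2} = \sqrt{\dfrac{pq}{n_1} + \dfrac{pq}{n_2}} = \sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$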

Page 42: Chapter 10: Inferences Involving Two Populations

But we don’t know p and q!

How can we estimate these from the sample?

Under the null hypothesis, the proportions of the two populations are the same. So simply take all of the data and pool it together to estimate the common proportion:

$p'_p = \dfrac{x_1 + x_2}{n_1 + n_2}$ and $q'_p = 1 - p'_p$

The test statistic becomes

$z^* = \dfrac{p'_1 - p'_2}{\sqrt{(p'_p)(q'_p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$

Page 43: Chapter 10: Inferences Involving Two Populations

Example: The proportions of defective parts from two different suppliers were compared. The following data were collected.

Is there any evidence to suggest the proportion of defectives is different for the two suppliers? Use α = 0.01.

Supplier Sample Size Number Defective

1 300 15

2 275 9

Page 44: Chapter 10: Inferences Involving Two Populations

1. The null and alternative hypotheses:

H0: p1 − p2 = 0 (proportion of defectives the same)

Ha: p1 − p2 ≠ 0 (proportion of defectives different)

2. The type of test:

Difference of proportions.

Samples are larger than 20.

Products $n_1 p'_1,\ n_1 q'_1,\ n_2 p'_2,\ n_2 q'_2$ are larger than 5.

Sampling distribution should be approximately normal.

Use z* for difference of proportions.

3. Rejection region: Reject H0 if z* > z(.005) = 2.575 or z* < −z(.005) = −2.575

4. Calculations:

$p'_p = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{15 + 9}{300 + 275} = \dfrac{24}{575} = 0.042$

$q'_p = 1 - p'_p = 1 - 0.042 = 0.958$

Page 45: Chapter 10: Inferences Involving Two Populations

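4. Calculations cont’d:

$z^* = \dfrac{p'_1 - p'_2}{\sqrt{(p'_p)(q'_p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{0.05 - 0.0327}{\sqrt{(0.042)(0.958)\left(\dfrac{1}{300} + \dfrac{1}{275}\right)}} = \dfrac{0.0173}{\sqrt{0.0002804}} = \dfrac{0.0173}{0.0167} = 1.03$

5. Conclusion:

Do not reject H0 and conclude that there is no evidence to suggest the proportion of defectives is different for the two suppliers.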
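The supplier comparison can be reproduced with the pooled-proportion test statistic. This is a minimal standard-library sketch; `two_proportion_z` is an illustrative name, and the two-tailed p-value is an addition that the slides do not report.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z* for H0: p1 - p2 = 0, using the pooled estimate of the common p."""
    p_pool = (x1 + x2) / (n1 + n2)
    q_pool = 1 - p_pool
    se = sqrt(p_pool * q_pool * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    # Two-tailed p-value, because Ha is p1 - p2 != 0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_pool, z, p_value

# Defective counts: supplier 1 (15 of 300) and supplier 2 (9 of 275)
p_pool, z_star, p_value = two_proportion_z(15, 300, 9, 275)

Z_CRIT = 2.575  # z(.005), as quoted in the rejection region
reject_h0 = abs(z_star) > Z_CRIT

print(f"pooled p = {p_pool:.3f}, z* = {z_star:.2f}, p-value = {p_value:.3f}")
print("reject H0:", reject_h0)
```

Both approaches agree here: z* falls well inside ±2.575 and the p-value is far above α = 0.01.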