Chapter 11.1 Inference for the Mean of a Population.

Upload: ezra-stanley

Post on 26-Dec-2015


TRANSCRIPT

Page 1: Chapter 11.1 Inference for the Mean of a Population

Chapter 11.1

Inference for the Mean of a Population.

Page 2: Chapter 11.1 Inference for the Mean of a Population

Example 1: One concern employers have about the use of technology is the amount of time that employees spend each day making personal use of company technology, such as phone, e-mail, internet, and games. The Associated Press reports that, on average, workers spend 72 minutes a day on such personal technology uses. A CEO of a large company wants to know if the employees of her company are comparable to this survey. In a random sample of 10 employees, with the guarantee of anonymity, each reported their daily personal computer use. The times are recorded at right.

Employee   1   2   3   4   5   6   7   8   9  10
Time      66  70  75  88  69  71  71  63  89  86

What is different about this problem?

When the standard deviation of a statistic is estimated from the data, the result is called the standard error of the statistic. For the sample mean, the standard error is s/√n.

When we use this estimator, the resulting test statistic does not have a normal distribution. Instead, it has a new distribution called the t-distribution.

Does the data provide evidence that the mean for this company is greater than 72 minutes?
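As a rough sketch of the standard error idea, the snippet below computes s/√n for the ten times in Example 1 using only Python's standard library. The variable names are illustrative and not part of the slides.

```python
import math
import statistics

# Daily personal-technology times (minutes) for the 10 sampled employees
times = [66, 70, 75, 88, 69, 71, 71, 63, 89, 86]

n = len(times)
x_bar = statistics.mean(times)       # sample mean
s = statistics.stdev(times)          # sample standard deviation (divides by n - 1)
standard_error = s / math.sqrt(n)    # estimated standard deviation of the sample mean

print(f"n = {n}, mean = {x_bar:.1f}, s = {s:.2f}, SE = {standard_error:.3f}")
```

The mean and s computed here should agree with the 74.8 and 9.45 that appear when the test is worked out later in the deck.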

Page 3: Chapter 11.1 Inference for the Mean of a Population

Time for some Nspiration!

Page 4: Chapter 11.1 Inference for the Mean of a Population

One-Sample z-statistic

σ known:

test statistic = (statistic − parameter) / (standard deviation of statistic)

z = (x̄ − μ) / (σ / √n)

Page 5: Chapter 11.1 Inference for the Mean of a Population

One-sample t-statistic:

σ unknown:

test statistic = (statistic − parameter) / (standard deviation of statistic)

t = (x̄ − μ) / (s / √n)

Page 6: Chapter 11.1 Inference for the Mean of a Population

The variability of the t-statistic is controlled by the sample size.

The number of degrees of freedom is equal to n − 1.
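One way to see how sample size controls the variability is to compare tail areas. The sketch below approximates P(|T| > 2) for a few degrees of freedom and compares it with the standard normal. Python's standard library has no t-distribution, so the helper functions `t_pdf` and `t_upper_tail` are hypothetical names that integrate the density numerically; with SciPy installed, `scipy.stats.t.sf` gives the same tail area.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3      # P(T > |t|), using symmetry about 0
    return upper if t >= 0 else 1 - upper

# Two-sided tail area beyond 2 for small, medium, and larger samples
t_tails = {df: 2 * t_upper_tail(2.0, df) for df in (2, 9, 43)}
normal_tail = 2 * (1 - NormalDist().cdf(2.0))

for df, tail in t_tails.items():
    print(f"df = {df:2d}: P(|T| > 2) = {tail:.4f}")
print(f"normal : P(|Z| > 2) = {normal_tail:.4f}")
```

The tail area shrinks toward the normal value as the degrees of freedom grow, which is why small samples call for extra caution.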

Page 7: Chapter 11.1 Inference for the Mean of a Population

ASSUMING NORMALITY?

1. SRS is extremely important.
2. Check for skewness.
3. Check for outliers.
4. If necessary, make a cautionary statement.
5. In real life, statisticians and researchers try very hard to avoid small samples.

Use a box-and-whisker plot to check.

Page 8: Chapter 11.1 Inference for the Mean of a Population

Example 2: The Degree of Reading Power (DRP) is a test of the reading ability of children. Here are DRP scores for a random sample of 44 third-grade students in a suburban district:

40 26 39 14 42 18 25
43 46 27 19 47 19 26
35 34 15 44 40 38 31
46 52 25 35 35 33 29
34 41 49 28 52 47 35
48 22 33 41 51 27 14
54 45

At the α = .1 level, is there sufficient evidence to suggest that this district’s third-graders’ reading ability is different from the national mean of 34?

Page 9: Chapter 11.1 Inference for the Mean of a Population

Assumptions (SRS? Normal? How do you know? Do you know σ?):

• I have an SRS of third-graders.
• Since the sample size is large, the sampling distribution is approximately normally distributed
  OR
  since the histogram is unimodal with no outliers, the sampling distribution is approximately normally distributed.
• σ is unknown.

Hypotheses (What are your hypothesis statements? Is there a key word?):

H0: μ = 34
Ha: μ ≠ 34
where μ is the true mean reading ability of the district’s third-graders

Name the test: one-sample t-test for the mean

Plug values into the formula:

t = (35.091 − 34) / (11.189 / √44) = .6467

Use tcdf to calculate the p-value:

p-value = tcdf(.6467, 1E99, 43) = .2606 × 2 = .5212

α = .1

Page 10: Chapter 11.1 Inference for the Mean of a Population

Conclusion: Compare your p-value to α & make a decision.

Since p-value > α, I fail to reject the null hypothesis.

Write the conclusion in context in terms of Ha.

There is not sufficient evidence to suggest that the true mean reading ability of the district’s third-graders is different from the national mean of 34.
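For readers without a calculator at hand, here is a minimal sketch of the Example 2 test from the raw scores. It assumes the 44 values are read row by row from the slide, and it stands in for the calculator's tcdf with a numerical integral of the t density (`t_pdf` and `t_upper_tail` are hypothetical helper names, not a library API).

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3      # P(T > |t|), using symmetry about 0
    return upper if t >= 0 else 1 - upper

drp = [40, 26, 39, 14, 42, 18, 25, 43, 46, 27, 19, 47, 19, 26,
       35, 34, 15, 44, 40, 38, 31, 46, 52, 25, 35, 35, 33, 29,
       34, 41, 49, 28, 52, 47, 35, 48, 22, 33, 41, 51, 27, 14,
       54, 45]
mu_0 = 34       # national mean under H0
alpha = 0.1

n = len(drp)
x_bar = statistics.mean(drp)
s = statistics.stdev(drp)
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
p_value = 2 * t_upper_tail(abs(t_stat), n - 1)   # two-sided: double one tail

print(f"t = {t_stat:.4f}, df = {n - 1}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The tail is doubled because Ha is two-sided ("different").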

Page 11: Chapter 11.1 Inference for the Mean of a Population

Back to Example 1. The times are recorded below.

Employee   1   2   3   4   5   6   7   8   9  10
Time      66  70  75  88  69  71  71  63  89  86

Does this data provide evidence that the mean for this company is greater than 72 minutes?

Page 12: Chapter 11.1 Inference for the Mean of a Population

Assumptions (SRS? Normal? How do you know? Do you know σ?):

• I have an SRS of employees.
• Since the histogram has no outliers and is roughly symmetric, the sampling distribution is approximately normally distributed.
• σ is unknown, therefore we are using a one-sample t-test.

Hypotheses (What are your hypothesis statements? Is there a key word?):

H0: μ = 72
Ha: μ > 72
where μ is the true mean number of minutes of personal technology (PT) time spent by this company’s employees

Plug values into the formula:

t = (74.8 − 72) / (9.45 / √10) = .937

Use tcdf to calculate the p-value. The key word is "greater", so the test is one-sided and the tail area is not doubled:

p-value = tcdf(.937, 1E99, 9) = .1866

Page 13: Chapter 11.1 Inference for the Mean of a Population

Conclusion: Compare your p-value to α & make a decision.

Since p-value > α, I fail to reject the null hypothesis that this company’s employees spend 72 minutes on average on personal technology uses.

Write the conclusion in context in terms of Ha.

There is not sufficient evidence to suggest that the true mean amount of time spent on personal technology use by employees of this company is more than the national mean of 72 min.
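The Example 1 calculation can be sketched the same way. Because the question asks whether the mean is greater than 72, the p-value here is taken as the upper-tail area only. The helper names are hypothetical, and the numerical integral is a stand-in for tcdf.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3      # P(T > |t|), using symmetry about 0
    return upper if t >= 0 else 1 - upper

times = [66, 70, 75, 88, 69, 71, 71, 63, 89, 86]
mu_0 = 72       # reported average under H0

n = len(times)
x_bar = statistics.mean(times)
s = statistics.stdev(times)
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
p_value = t_upper_tail(t_stat, n - 1)    # Ha: mu > 72, so upper tail only

print(f"t = {t_stat:.3f}, df = {n - 1}, one-sided p-value = {p_value:.4f}")
```

The p-value is well above any common α, so the decision is to fail to reject H0.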

Page 14: Chapter 11.1 Inference for the Mean of a Population

Now for the fun calculator stuff!

Page 15: Chapter 11.1 Inference for the Mean of a Population

Example 3: The Wall Street Journal (January 27, 1994) reported that based on sales in a chain of Midwestern grocery stores, President’s Choice Chocolate Chip Cookies were selling at a mean rate of $1323 per week. Suppose a random sample of 30 weeks in 1995 in the same stores showed that the cookies were selling at the average rate of $1208 with standard deviation of $275. Does this indicate that the sales of the cookies is different from the earlier figure?

Page 16: Chapter 11.1 Inference for the Mean of a Population

Assume:

• Have an SRS of weeks
• The sampling distribution of mean sales is approximately normal due to the large sample size
• σ unknown

H0: μ = 1323
Ha: μ ≠ 1323
where μ is the true mean cookie sales per week

Name the test: one-sample t-test for the mean

t = (1208 − 1323) / (275 / √30) = −2.29
p-value = .0295

Since p-value < α of 0.05, I reject the null hypothesis. There is sufficient evidence to suggest that the sales of cookies are different from the earlier figure.
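When only summary statistics are given, as in Example 3, the same test needs just x̄, s, and n. A minimal sketch under that assumption follows; the helper names are hypothetical, and the t tail area is approximated numerically rather than taken from a library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3      # P(T > |t|), using symmetry about 0
    return upper if t >= 0 else 1 - upper

# Summary statistics from Example 3
x_bar, s, n = 1208, 275, 30
mu_0 = 1323

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
one_tail = t_upper_tail(abs(t_stat), n - 1)
p_two_sided = 2 * one_tail           # Ha: mu != 1323

print(f"t = {t_stat:.2f}, df = {n - 1}")
print(f"one-tail area = {one_tail:.5f}, two-sided p-value = {p_two_sided:.4f}")
```

The one-tail area is also the p-value that would apply to the one-sided alternative Ha: μ < 1323.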

Page 17: Chapter 11.1 Inference for the Mean of a Population

Example 3: President’s Choice Chocolate Chip Cookies were selling at a mean rate of $1323 per week. Suppose a random sample of 30 weeks in 1995 in the same stores showed that the cookies were selling at the average rate of $1208 with standard deviation of $275.

Compute a 95% confidence interval for the mean weekly sales rate.

CI = ($1105.30, $1310.70)

Based on this interval, is the mean weekly sales rate statistically different from the reported $1323?

Page 18: Chapter 11.1 Inference for the Mean of a Population

What do you notice about the decision from the confidence interval & the hypothesis test?

What decision would you make on Example 3 if α = .01?
You would fail to reject H0 since the p-value > α.

What confidence level would be correct to use?
You should use a 99% confidence level for a two-sided hypothesis test at α = .01.

Does that confidence interval provide the same decision?
CI = ($1069.60, $1346.40). Since $1323 is in this interval, we would fail to reject H0.

If Ha: μ < 1323, what decision would the hypothesis test give at α = .02?
Remember, your one-sided p-value = .01475. At α = .02, we would reject H0.

Now, what confidence level is appropriate for this alternative hypothesis?
The 98% CI = ($1084.40, $1331.60). Since $1323 is in the interval, we would fail to reject H0.

Why are we getting different answers?
In a one-sided test, all of α goes into that tail (the lower tail), so α = .02 sits entirely in the lower tail.
In a CI, the tails have equal area, so there should also be 2% in the upper tail.
That leaves 96% in the middle, and that should be your confidence level.

A 96% CI = ($1100, $1316).
Since $1323 is not in the interval, we would reject H0.

Tail probabilities between the significance level (α) and the confidence level MUST match!
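The link between tail areas and confidence levels can be checked numerically. The sketch below builds t intervals for the Example 3 summary statistics at several confidence levels. The critical value t* is found by bisection on a numerically integrated t density; `t_critical` and the other helper names are hypothetical, not a library API.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 with Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(conf, df):
    """t* such that the central area between -t* and t* equals conf."""
    target = (1 - conf) / 2          # area left in each tail
    lo, hi = 0.0, 50.0
    for _ in range(50):              # bisection: the tail area decreases in t
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x_bar, s, n = 1208, 275, 30          # Example 3 summary statistics
se = s / math.sqrt(n)

intervals = {}
for level in (95, 99, 98, 96):
    t_star = t_critical(level / 100, n - 1)
    intervals[level] = (x_bar - t_star * se, x_bar + t_star * se)
    lo_end, hi_end = intervals[level]
    inside = lo_end <= 1323 <= hi_end
    print(f"{level}% CI: ({lo_end:.2f}, {hi_end:.2f})  contains 1323: {inside}")
```

Only intervals that leave the same area in the tail as the test does will agree with the test's decision, which is the point of the 96% interval for a one-sided test at α = .02.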

Page 19: Chapter 11.1 Inference for the Mean of a Population

Ex4: The times of first sprinkler activation (seconds) for a series of fire-prevention sprinklers were as follows:

27 41 22 27 23 35 30 33 24 27 28 22 24

Construct a 95% confidence interval for the mean activation time for the sprinklers.
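One possible way to set up this interval is sketched below, assuming the 13 activation times are read in order from the slide and that the usual t-interval conditions hold. The slides do not give the answer, so the printed interval is only what this sketch computes. The helper names are hypothetical, and t* is found by bisection on a numerically integrated t density.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 with Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(conf, df):
    """t* such that the central area between -t* and t* equals conf."""
    target = (1 - conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):              # bisection: the tail area decreases in t
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Sprinkler activation times in seconds
activation = [27, 41, 22, 27, 23, 35, 30, 33, 24, 27, 28, 22, 24]

n = len(activation)
x_bar = statistics.mean(activation)
s = statistics.stdev(activation)
t_star = t_critical(0.95, n - 1)
margin = t_star * s / math.sqrt(n)
ci = (x_bar - margin, x_bar + margin)

print(f"n = {n}, mean = {x_bar:.2f}, s = {s:.2f}, t* = {t_star:.3f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}) seconds")
```

With n = 13 the interval uses 12 degrees of freedom, so it is worth checking a boxplot for outliers before trusting it.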

Page 20: Chapter 11.1 Inference for the Mean of a Population

Matched Pairs Test

A special type of t-inference

Page 21: Chapter 11.1 Inference for the Mean of a Population

Matched Pairs – two forms

Form 1:

• Pair individuals by certain characteristics

• Randomly select treatment for individual A

• Individual B is assigned to the other treatment

• Assignment of B is dependent on assignment of A

Form 2:

• Individual persons or items receive both treatments

• The order of treatments is randomly assigned, or before & after measurements are taken

• The two measures are dependent on the individual

Page 22: Chapter 11.1 Inference for the Mean of a Population

Is this an example of matched pairs?

1) A college wants to see if there’s a difference in time it took last year’s class to find a job after graduation and the time it took the class from five years ago to find work after graduation. Researchers take a random sample from both classes and measure the number of days between graduation and first day of employment.

No, there is no pairing of individuals; you have two independent samples.

Page 23: Chapter 11.1 Inference for the Mean of a Population

Is this an example of matched pairs?

2) In a taste test, a researcher asks people in a random sample to taste a certain brand of spring water and rate it. Another random sample of people is asked to taste a different brand of water and rate it. The researcher wants to compare these samples.

No, there is no pairing of individuals; you have two independent samples. If the same people tasted both brands in random order, then it would be an example of matched pairs.

Page 24: Chapter 11.1 Inference for the Mean of a Population

Is this an example of matched pairs?

3) A pharmaceutical company wants to test its new weight-loss drug. Before giving the drug to a random sample, company researchers take a weight measurement on each person. After a month of using the drug, each person’s weight is measured again.

Yes, you have two measurements that are dependent on each individual.

Page 25: Chapter 11.1 Inference for the Mean of a Population

A whale-watching company noticed that many customers wanted to know whether it was better to book an excursion in the morning or the afternoon. To test this question, the company collected the following data on 15 randomly selected days over the past month. (Note: days were not consecutive.)

Day          1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Morning      8   9   7   9  10  13  10   8   2   5   7   7   6   8   7
Afternoon    8  10   9   8   9  11   8  10   4   7   8   9   6   6   9

Since you have two values for each day, they are dependent on the day, making this data matched pairs.

First, you must find the differences for each day.

You may subtract either way; just be careful when writing Ha.

Page 26: Chapter 11.1 Inference for the Mean of a Population

Day          1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Morning      8   9   7   9  10  13  10   8   2   5   7   7   6   8   7
Afternoon    8  10   9   8   9  11   8  10   4   7   8   9   6   6   9
Differences  0  -1  -2   1   1   2   2  -2  -2  -2  -1  -2   0   2  -2

Assumptions:

• Have an SRS of days for whale-watching

• σ unknown

• Since the boxplot doesn’t show any outliers, we can assume the distribution is approximately normal.

I subtracted: Morning − Afternoon. You could subtract the other way!

You need to state assumptions using the differences!

Notice the skewness of the boxplot. However, with no outliers, we can still assume normality!

Page 27: Chapter 11.1 Inference for the Mean of a Population

Differences 0 -1 -2 1 1 2 2 -2 -2 -2 -1 -2 0 2 -2

Is there sufficient evidence that more whales are sighted in the afternoon?

Be careful writing your Ha! Think about how you subtracted: M − A. If afternoon is more, should the differences be + or −? Don’t look at the numbers!

H0: μD = 0
Ha: μD < 0
where μD is the true mean difference in whale sightings (morning minus afternoon)

Notice we used μD for differences, and it equals 0 since the null should be that there is NO difference.

If you subtract afternoon − morning, then Ha: μD > 0.

Page 28: Chapter 11.1 Inference for the Mean of a Population

Finishing the hypothesis test:

In your calculator, perform a t-test using the differences (L3).

Differences 0 -1 -2 1 1 2 2 -2 -2 -2 -1 -2 0 2 -2

t = x̄ / (s / √n) = −.4 / (1.639 / √15) = −.945
p = .1803
df = 14
α = .05

Since p-value > α, I fail to reject H0. There is insufficient evidence to suggest that more whales are sighted in the afternoon than in the morning.

Notice that if you subtracted A − M, then your test statistic would be t = +.945, but the p-value would be the same.
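The matched pairs test is just a one-sample t-test on the differences. A minimal sketch with the whale data follows, subtracting morning minus afternoon as on the slide. The helper names are hypothetical, and the numerical integral stands in for the calculator's t-test.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3      # P(T > |t|), using symmetry about 0
    return upper if t >= 0 else 1 - upper

morning   = [8, 9, 7, 9, 10, 13, 10, 8, 2, 5, 7, 7, 6, 8, 7]
afternoon = [8, 10, 9, 8, 9, 11, 8, 10, 4, 7, 8, 9, 6, 6, 9]

# One difference per day, morning minus afternoon
diffs = [m - a for m, a in zip(morning, afternoon)]

n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
t_stat = d_bar / (s_d / math.sqrt(n))        # H0: mean difference = 0

# Ha: mean difference < 0, so the p-value is the lower-tail area P(T < t_stat),
# which equals P(T > -t_stat) by symmetry.
p_value = t_upper_tail(-t_stat, n - 1)

print(f"differences = {diffs}")
print(f"mean = {d_bar:.1f}, s = {s_d:.3f}, t = {t_stat:.3f}, p-value = {p_value:.4f}")
```

Subtracting the other way flips the sign of every difference and of t, and the alternative becomes an upper-tail test, so the p-value is unchanged.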

Page 29: Chapter 11.1 Inference for the Mean of a Population

Player  Before  After
1       13      18
2       20      37
3       17      40
4       13      35
5       13      30
6       16      20
7       15      33
8       16      19

Ex: The effect of exercise on the amount of lactic acid in the blood was examined in the journal Research Quarterly for Exercise and Sport. Eight males were selected at random from those attending a week-long training camp. Blood lactate levels were measured before and after playing 3 games of racquetball, as shown in the table.

What is the parameter of interest in this problem?

Construct a 95% confidence interval for the mean change in blood lactate level.

Page 30: Chapter 11.1 Inference for the Mean of a Population

Based on the data, would you conclude, at the 5% significance level, that the mean change in blood lactate level was more than 10 points?

Player  Before  After
1       13      18
2       20      37
3       17      40
4       13      35
5       13      30
6       16      20
7       15      33
8       16      19
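A possible setup for this exercise is sketched below. It assumes the change is defined as after minus before (the slides leave the direction open), that the parameter of interest is the true mean change μD, and that the test is H0: μD = 10 against Ha: μD > 10. The slides give no worked answer, so treat the printed numbers as this sketch's output only. The helper names are hypothetical.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 with Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(conf, df):
    """t* such that the central area between -t* and t* equals conf."""
    target = (1 - conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):              # bisection: the tail area decreases in t
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

before = [13, 20, 17, 13, 13, 16, 15, 16]
after  = [18, 37, 40, 35, 30, 20, 33, 19]
diffs = [a - b for a, b in zip(after, before)]   # assumed direction: after - before

n = len(diffs)
d_bar = statistics.mean(diffs)
se = statistics.stdev(diffs) / math.sqrt(n)

# 95% confidence interval for the mean change
t_star = t_critical(0.95, n - 1)
ci = (d_bar - t_star * se, d_bar + t_star * se)

# One-sided test of H0: mean change = 10 against Ha: mean change > 10
t_stat = (d_bar - 10) / se
p_value = t_upper_tail(t_stat, n - 1)    # valid here because t_stat is positive

print(f"mean change = {d_bar:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"t = {t_stat:.3f}, one-sided p-value = {p_value:.3f}")
```

With only eight players, the normality check on the differences matters more than usual before relying on either result.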