South High School

Post on 05-Feb-2021

Ch. 12 Review

1. A group of men and women were surveyed to investigate the association between gender and the number of friends the person has on a social media web site. Results are shown in the table below:

Which of the following procedures is the most appropriate for investigating whether an association exists between gender and the number of friends a person has on a social media web site?

(A) A matched-pairs t-test for a mean difference

(B) A two-sample t-test for the difference between means

(C) A t-test for the slope of the regression line

(D) A chi-square goodness-of-fit test

(E) A chi-square test of independence

2. A 90 percent confidence interval for the slope of a regression line is determined to be (-0.181, 1.529). Which of the following statements must be true?

(A) The correlation coefficient of the data is positive.

(B) The sum of the residuals for the data based on the regression line is positive.

(C) A scatterplot of the data would show a linear pattern.

(D) The slope of the sample regression line is 1.348.

(E) The slope of the sample regression line is 0.
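A t interval for a slope has the form estimate ± margin of error, so it is symmetric about the sample slope. As a minimal sketch (the variable names are illustrative, not part of the question), the point estimate and margin of error can be recovered from the two endpoints given in question 2:

```python
# Endpoints of the 90 percent confidence interval from question 2.
lower, upper = -0.181, 1.529

# The interval is (estimate - margin, estimate + margin), so the sample
# slope is the midpoint and the margin of error is the half-width.
sample_slope = (lower + upper) / 2
margin_of_error = (upper - lower) / 2

print(f"sample slope    = {sample_slope:.3f}")
print(f"margin of error = {margin_of_error:.3f}")
```

The sign of the sample slope matters because the correlation coefficient always has the same sign as the slope of the least-squares line.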

3. A department store manager wants to know if a greater proportion of customers on the store’s mailing list would redeem a coupon for $5 off the price of an item than would redeem a coupon for 10 percent off the price of an item. The manager mails a $5 off coupon to a random sample of 500 customers and mails a 10 percent off coupon to an independent random sample of 500 customers. The number of coupons of each type that were redeemed was recorded. Assuming that the conditions for inference are met, what test procedure should be used to answer the manager’s question?

(A) A one-sample t-test for a mean

(B) A one-sample z-test for a proportion

(C) A t-test for the slope of a regression line

(D) A matched-pairs t-test for a mean difference

(E) A two-sample z-test for a difference between two proportions

4. A statistics teacher wants to determine whether there is a linear relationship between high school students’ heights, in inches (in.), and the lengths of their feet, in centimeters (cm). The teacher obtains height and foot length measurements for a random sample of 23 students at the high school and generates the following graph and computer output.

Provided that the assumptions for regression inference are satisfied, which of the following provides a 95% confidence interval estimate of the slope of the population regression line for predicting foot length from height?

(A)

(B)

(C)

(D)

(E)

5. A study compared the language skills and mental development of two groups of 24-month-old children. One group consisted of children identified as talkative, and the other group consisted of children identified as quiet. The scores for the two groups on a test that measured language skills are shown in the table below.

Assuming that it is reasonable to regard the groups as simple random samples and that the other conditions for inference are met, what statistical test should be used to determine if there is a significant difference in the average test score of talkative and quiet children at the age of 24 months?

(A) A chi-square goodness-of-fit test

(B) A chi-square test of independence

(C) A matched-pairs t-test for means

(D) A two-sample t-test for means

(E) A linear regression t-test

6. The residual plots from five different least-squares regression lines are shown below. Which of the plots provides the strongest evidence that its regression line is an appropriate model for the data and is consistent with the assumptions required for regression inference?

7. As part of a class project at a large university, Amber selected a random sample of 12 students in her major field of study. All students in the sample were asked to report the number of hours they spent studying for the final exam and their score on the final exam. A regression analysis on the data produced the following partial computer output.

Amber wants to compute a 95 percent confidence interval for the slope of the least-squares regression line in the population of all students in her major field of study. Assuming that conditions for inference are satisfied, which of the following gives the margin of error for the confidence interval?

(A) (2.228)(0.745)

(B) (2.228)

(C) (2.228)(5.505)

(D) (2.228)

(E) (2.228)(2.697)
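The multiplier 2.228 in every option is the t critical value for 95 percent confidence with n - 2 = 12 - 2 = 10 degrees of freedom. The margin of error is that critical value times the standard error of the slope, which is read from the computer output. The sketch below checks the critical value using only the Python standard library. The helper names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical, and the approach (Simpson's rule plus bisection) is just one simple way to approximate what a t table or calculator provides.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(df, confidence=0.95):
    """Two-sided critical value t* such that P(-t* < T < t*) = confidence."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):  # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 12                      # sample size in question 7
t_star = t_critical(n - 2)  # degrees of freedom for slope inference
print(f"t* for df = {n - 2}: {t_star:.3f}")
```

The same helper applies to any slope interval in this review: pass the sample size minus 2 as the degrees of freedom.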

8. Raoul performed an experiment using 16 windup rubber band single-propeller airplanes. He wound up the propeller a different number of times and recorded the amount of time (in seconds) that the airplane flew for each number of rotations that the propeller was wound. A regression analysis was performed and the partial computer output is given below.

Which of the following is a 95 percent confidence interval for the slope of the regression line that relates the number of rotations the rubber band is wound and the plane’s flight time?

(A)

(B)

(C)

(D)

(E)

9. In a study of the performance of a computer printer, the size (in kilobytes) and the printing time (in seconds) for each of 22 small text files were recorded. A regression line was a satisfactory description of the relationship between size and printing time. The results of the regression analysis are shown below.

Which of the following should be used to compute a 95 percent confidence interval for the slope of the regression line?

(A)

(B)

(C)

(D)

(E)

10. Two measures x and y were taken on 18 subjects. The first of two regressions, Regression I, yielded and had the following residual plot.

The second regression, Regression II, yielded and had the following residual plot.

Which of the following conclusions is best supported by the evidence above?

(A) There is a linear relationship between x and y, and Regression I yields a better fit.

(B) There is a linear relationship between x and y, and Regression II yields a better fit.

(C) There is a negative correlation between x and y.

(D) There is a nonlinear relationship between x and y, and Regression I yields a better fit.

(E) There is a nonlinear relationship between x and y, and Regression II yields a better fit.

11. According to data from the U.S. Health Care Financing Administration, the national expenditures for drugs and other medical nondurables (in billions of dollars) for selected years from 1970 to 1997 are as follows: (Note that Year is coded: 1970 is recorded simply as 70.)

Year   Expenditures (billions of dollars)
70     8.8
80     21.6
85     37.1
87     43.2
89     50.6
90     59.9
91     65.6
92     71.2
93     75
94     77.7
95     83.4
97     108.9

The computer printouts for four different linear regression models are shown below. Model 1 fits expenditures as a function of the year, Model 2 fits the square root of expenditures as a function of the year, Model 3 fits the logarithm base 10 of expenditures as a function of the year, and Model 4 fits the logarithm base 10 of expenditures as a function of the logarithm base 10 of year. Each printout also includes a plot of the residuals from the linear model versus the fitted values, as well as additional descriptive data produced from the least squares procedure.

Model 1

The regression equation is

Spent = - 253 + 3.51 Year

Predictor   Coef      SE Coef   T       P
Constant    -252.77   35.91     -7.04   0.000
Year        3.5148    0.4041    8.70    0.000

S = 10.0208   R-Sq = 88.3%   R-Sq(adj) = 87.2%

Analysis of Variance

Source           DF   SS       MS       F       P
Regression       1    7596.6   7596.6   75.65   0.000
Residual Error   10   1004.2   100.4
Total            11   8600.8

Model 2

The regression equation is

SQRT(Spent) = - 16.7 + 0.272 Year

Predictor   Coef      SE Coef   T        P
Constant    -16.738   1.419     -11.79   0.000
Year        0.27241   0.01597   17.06    0.000

S = 0.396008   R-Sq = 96.7%   R-Sq(adj) = 96.3%

Analysis of Variance

Source           DF   SS       MS       F        P
Regression       1    45.630   45.630   290.97   0.000
Residual Error   10   1.568    0.157
Total            11   47.199

Model 3

The regression equation is

LOGT(Spent) = - 1.87 + 0.0402 Year

Predictor   Coef        SE Coef     T        P
Constant    -1.86503    0.07017     -26.58   0.000
Year        0.0402058   0.0007896   50.92    0.000

S = 0.0195800   R-Sq = 99.6%   R-Sq(adj) = 99.6%

Analysis of Variance

Source           DF   SS        MS        F         P
Regression       1    0.99402   0.99402   2592.78   0.000
Residual Error   10   0.00383   0.00038
Total            11   0.99785

Model 4

The regression equation is

LOGT(Spent) = - 13.3 + 7.69 LOGT(Year)

Predictor    Coef       SE Coef   T        P
Constant     -13.2593   0.3172    -41.80   0.000
LOGT(Year)   7.6862     0.1630    47.16    0.000

S = 0.0211339   R-Sq = 99.6%   R-Sq(adj) = 99.5%

Analysis of Variance

Source           DF   SS        MS        F         P
Regression       1    0.99338   0.99338   2224.11   0.000
Residual Error   10   0.00447   0.00045
Total            11   0.99785

(a) Choose a model (Model 1-4) for predicting the national drug expenditures shortly after 1997. Justify your choice.

(b) Use the model you chose in part (a) to predict the national drug expenditures for 1998.
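The coefficient, S, and R-Sq values in the four printouts can be reproduced from the data table with ordinary least squares. The following is a minimal standard-library sketch, not the software that produced the printouts. The function name `least_squares` and the dictionary keys are illustrative choices, and the code deliberately does not select a model.

```python
import math

# Data from the table in question 11 (Year coded as 70 for 1970, and so on).
years = [70, 80, 85, 87, 89, 90, 91, 92, 93, 94, 95, 97]
spent = [8.8, 21.6, 37.1, 43.2, 50.6, 59.9, 65.6, 71.2, 75, 77.7, 83.4, 108.9]

def least_squares(x, y):
    """Simple linear regression of y on x with the usual summary statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = syy - slope * sxy           # residual sum of squares
    s = math.sqrt(sse / (n - 2))      # "S" in the printouts
    return {
        "intercept": intercept,
        "slope": slope,
        "se_slope": s / math.sqrt(sxx),
        "s": s,
        "r_sq": 1 - sse / syy,
    }

log_years = [math.log10(v) for v in years]
log_spent = [math.log10(v) for v in spent]

models = {
    "Model 1": least_squares(years, spent),
    "Model 2": least_squares(years, [math.sqrt(v) for v in spent]),
    "Model 3": least_squares(years, log_spent),
    "Model 4": least_squares(log_years, log_spent),
}

for name, fit in models.items():
    print(f"{name}: intercept = {fit['intercept']:.5f}, slope = {fit['slope']:.5f}, "
          f"S = {fit['s']:.5f}, R-Sq = {100 * fit['r_sq']:.1f}%")
```

When predicting with Model 2, 3, or 4, remember that the fitted value is on the transformed scale (square root or base-10 logarithm) and must be transformed back to billions of dollars.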

12. Windmills generate electricity by transferring energy from wind to a turbine. A study was conducted to examine the relationship between wind velocity in miles per hour (mph) and electricity production in amperes for one particular windmill. For the windmill, measurements were taken on twenty-five randomly selected days, and the computer output for the regression analysis for predicting electricity production based on wind velocity is given below. The regression model assumptions were checked and determined to be reasonable over the interval of wind speeds represented in the data, which were from 10 miles per hour to 40 miles per hour.

(a) Use the computer output above to determine the equation of the least squares regression line. Identify all variables used in the equation.

(b) What proportion of the variation in electricity production is explained by its linear relationship with wind velocity?

(c) Is there statistically convincing evidence that electricity production by the windmill is related to wind velocity? Explain.

[Plots accompanying the printouts in question 11 (images not reproduced):

Model 1: Scatterplot of Spent vs Year; Residuals Versus the Fitted Values (response is Spent)

Model 2: Scatterplot of SQRT(Spent) vs Year; Residuals Versus the Fitted Values (response is SQRT(Spent))

Model 3: Scatterplot of LOGT(Spent) vs Year; Residuals Versus the Fitted Values (response is LOGT(Spent))

Model 4: Scatterplot of LOGT(Spent) vs LOGT(Year); Residuals Versus the Fitted Values (response is LOGT(Spent))]