
Question 1: What is the value m in the table below, if the mean and the variance of the random variable X are μ_X = 25 and σ²_X = 80?

X            10   20   30   35   m
Probability  .1   .5   .1   .2   .1

m = 30

m = 35

m = 38

m = 40

m = 45

Question 2: Two distributions D1 and D2 are displayed on the same graph. If the distribution D1 is right skewed, the distribution D2 is left skewed, and the mean of D1 is lower than the mean of D2, which of the following statements is not true?

The median of D1 is lower than the median of D2

The mean of D1 is lower than the median of D2.


The median of D1 is lower than the mean of D2.


Question 3: A large sample of size n is used to estimate the confidence interval for a proportion p. After further evaluation, the standard deviation of the sampling distribution is considered too large. What sample size do we need for the new standard error to equal one tenth of the original standard error?

10n

50n

100n

141n

200n

Question 4: A box contains 3 yellow, 2 red, 4 green and 3 black marbles. Two marbles are taken one after the other at random from the box. What is the probability that both marbles are red?

1/50

1/60

1/66

1/24

1/18

Question 5: A large automobile manufacturer states that approximately 4% of all their cars made in 2010 have a defective component. Two samples of 150 cars (sample A) and 250 cars (sample B) were taken to test for the defective component. Which of the following statements must be true:

mean of sample A < mean of sample B

mean of sample A > mean of sample B

standard deviation of sample A = standard deviation of sample B

standard deviation of sample A < standard deviation of sample B

standard deviation of sample A > standard deviation of sample B

Question 6: You roll a die until you get a 5 or a 6. What is the variance of this distribution?

6

3

1/3

8

12

Question 7: A back to back stem and leaf plot compares the heights of the players of two basketball teams. All heights in the plot below are in inches.

        Team A |   | Team B
               | 6 | 8 9
 1 4 4 4 5 8 9 | 7 | 2 3 6 6 6 8 9
       1 1 2 4 | 8 | 2 5 6

The means of the two distributions are the same.

The medians of the two distributions are the same.

The ranges of the two distributions are the same.

The distributions have the same number of observations.

None of the statements above is correct.

Question 8: A car manufacturer wishes to estimate the difference in early failures between cars sold in a warm climate and cars sold in a cold climate. They take a random sample of 500 cars from each group. The means obtained are 4.1 failures for cars running in a warm climate and 3.8 failures for cars running in a cold climate. The standard deviation for both populations is 0.4 failures. Find the 95% confidence interval for the difference in the population means.

0.3 +/- 0.0333

0.3 +/- 0.0350

0.3 +/- 0.0400

0.3 +/- 0.0452

0.3 +/- 0.0496

Question 9: A nutritional consulting company is trying to find what percentage of the population of a town is overweight. The marketing department of the company contacts by telephone 600 people from a list of the entire town's population. All 600 people give answers to the survey. Which of the following is the most significant source of bias in this survey?

Size of sample.

Undercoverage.

Voluntary response bias

Nonresponse.

Response bias.

Question 10: Which of the following are true statements? I. All bell-shaped distributions are symmetric. II. Bar charts are useful to describe quantitative data. III. Cumulative frequency plots are useful to describe quantitative data.

I only.

I and II only.

II and III only.

I and III only.

I, II and III.

Question 11: A random sample of 400 passengers of an airline is polled after their flights. Of the passengers, 300 say they will fly again with the same airline. Which of the following is a 90% confidence interval for the proportion of passengers that will fly again with the same airline?

0.75 +/- 0.066

0.75 +/- 0.005

0.75 +/- 0.045

0.75 +/- 0.036

0.75 +/- 0.15

Question 12: The mean number of points per game scored by basketball players during a high school championship is 9.4, and the standard deviation is 1.5. Assuming that the number of points is normally distributed, what number of points per game will place a player in the top 15% of players taking part in the basketball championship?

9.10 points per game





Question 13: A residual plot:

displays residuals of the response variable versus the independent variable.

displays residuals of the independent variable versus the response variable

displays residuals of the independent variable versus residuals of the response variable.

displays the independent variable versus the response variable.

displays the response variable versus the dependent variable.

Question 14: For any random variables A and B, which of the following statements must be true?

I. μ_{A+B} = μ_A + μ_B

II. σ²_{A-B} = σ²_A + σ²_B

III. σ²_{A+B} = σ²_A - σ²_B

I only.

II only.

I and II.

II and III.

I, II and III.

Question 15: A random sample of 1000 balances in the retirement accounts of exempt employees of a company has a sample mean of μ1 = $100,000 and a standard deviation σ1 = $12,000. A random sample of 4000 balances in the retirement accounts of hourly employees of the same company has a sample mean of μ2 = $80,000 and a standard deviation σ2 = $14,000. If X is the sampling distribution of the differences in account balances of the two categories of employees, what is σx?

$439.3

$2657.6

$5490.9

$11,065

$13,000

Question 16: Four children are asked to pick their favorite ice cream flavor out of 8 different flavors, and each of them is equally likely to pick any of the eight ice cream flavors. What is the probability that each child orders a different ice cream type?

5/72

2/5

7/64

105/256

45/128

Question 17: The rainfall of a county is measured for 14 years in a row to ascertain the local corn crop. No irrigation was used during this time.

Variable   Coefficient   s.e. of coefficient
Constant   33            .8
Rainfall   1.45          .422

Find the 96% confidence interval for the slope of the least squares regression line.

1.45 +/- .056

1.45 +/- .972

1.45 +/- 1.1

33 +/- .972

33 +/- .645

Question 18: An electronics company designs and manufactures DC voltage power supplies. The output voltages of the power supplies have accuracy errors that are caused by three independent internal circuits: A, B and C. Past measurements have shown that circuit A creates an error with a mean of 100mV and a standard deviation of 20mV, circuit B creates an error with a mean of 80mV and a standard deviation of 10mV and circuit C creates an error with a mean of 50mV and a standard deviation of 10mV. What is the standard deviation of the error of the power supplies caused by all three circuits?

12mV

24.5mV

32.5mV

40mV

55mV

Question 19: An opinion survey will be conducted at three corporations. Corporation A has 40,000 employees, corporation B has 60,000 employees and corporation C has 125,000 employees. The results of each survey will be used to estimate the opinions of employees at each corporation. Each survey will be conducted with a simple random sample of 500 employees. Which corporation will have its employees' opinions estimated more accurately by the surveys?

Corporation A.

Corporation B.

Neither corporation will have a more accurate estimate.

Corporation C.

Corporations A and B.

Question 20: The quality department of an electronics manufacturer randomly selected 100 resistors. The mean resistance of the resistors was 201.5Ω and the standard deviation was .4Ω. Find the 98% confidence interval for this problem.

201.5+/-.093

201.5+/-.125

201.5+/-.133

201.5+/-.155

201.5+/-.212

Question 21: The heights of 100 students are normally distributed with a mean of 172cm. What is the standard deviation of the heights given that the probability of a height above 180cm is .25?

7.9cm

8.7cm

9.9cm

10.5cm

11.9cm

Question 22: Six hundred travelers have purchased airline tickets through the same travel agency. The following table gives the two-way classification of their destination choices. What is the joint relative conditional frequency for female travelers to Asia if the marginal row totals are fixed?

                Male   Female   Totals
Europe          189    195      384
Asia            49     62       111
South America   55     50       105
Totals          293    307      600

a) 53%

b) 56%

c) 58%

d) 61%

e) 63%

Question 23: Out of the 500 students of a school, 30% wear glasses. If we use a simple random sample of 25 students, which of the following statements is correct:

The sampling distribution is small relative to the population.

The sampling distribution is a skewed distribution.

The sampling distribution is normal.

The mean of the sampling distribution is equal to .3.

None of the above.

Question 24: The mean of the weights of a group of 100 men and women is 160lb. If the number of men in the group is 60 and the mean weight of the men is 180lb, what is the mean weight of the women?

a) 120lb

b) 125lb

c) 130lb

d) 132lb

e) 135lb

Question 25: A real estate agent finds home buyers and closes the sales for 70% of his clients that sell their houses. What is the mean number of sales for his next 10 clients and what is the standard deviation of this distribution?

mean = 7 and standard deviation = 1.45





Question 26: The height and the weight of 18 students were measured and a scatterplot of the measures is shown below. If two pairs of measurements need to be removed from the set of 18, which of the choices shown below decreases the coefficient of correlation the most?

a) S2 and S3

b) S2 and S5

c) S1 and S3

d) S2 and S4

e) S1 and S2

Question 27: It takes different times for different workers to perform the same specific task, as it is shown in the distribution below. The boxplot displays time in minutes. Which of the following statements must be true?

a) The 25th percentile is greater than 70 minutes.

b) The distribution is skewed to the left.

c) The interquartile range is higher than 20 minutes.

d) distribution median < distribution mean

e) distribution median = distribution mean

Question 28: Six fruit baskets contain peaches, apples and oranges. Three of the baskets contain two apples and one orange each, two other baskets contain three apples and one peach each, and the last basket contains two peaches and two oranges. You select a basket at random and then select a fruit at random from the basket. Which of the following is the probability that the fruit is an apple?

.32

.4

.46

.5

.58

Question 29: A pharmaceutical company claims that its weight loss drug allows women to lose 8.5lb after one month of treatment. If we want to conduct an experiment to determine if the patients are losing less weight than advertised, which of the following hypotheses should be used?

H0: μ = 8.5; Ha: μ > 8.5

H0: μ = 8.5; Ha: μ ≠ 8.5

H0: μ = 8.5; Ha: μ < 8.5

H0: μ ≠ 8.5; Ha: μ < 8.5

H0: μ ≠ 8.5; Ha: μ > 8.5

Question 30: A car manufacturer claims that 90% of their cars do not experience engine failures before reaching a mileage of 150,000 miles. A sample of 65 cars is investigated and 9 of the cars had an engine failure before 150,000 miles. Which of the following is an appropriate test outcome?

z = 0.552 p = .214

z = -0.274 p = .214

z = -0.274 p = .151

z = -1.034 p = .214

z = -1.034 p = .151

Question 31: A tutoring company tests the results of their intensive training for a standardized test. Six students randomly selected have taken the test before and after having been trained by the company. The following table gives the test scores of the 6 students:

Before   88   79   77   83   69
After    78   54   83   73   71

The appropriate test for this situation is:

a matched pair t-test

a chi-square goodness of fit test

a two-sample z-test

a one-sample t-test

a one-sample z-test

Question 32: The Department of Transportation of the State of New York claimed that it takes an average of 200 minutes to travel by train from New York to Buffalo. A random sample of 40 trains was taken and the average time required to travel from New York to Buffalo was 188 minutes, with a standard deviation of 28 minutes. What is the p-value for this test?

.0355

.0099

.1294

.2881

.1167

1 m = 40

2 The mean of D2 is lower than the median of D1.

3 100n

4 1/66

5 standard deviation of sample A > standard deviation of sample B

6 6

7 None of the statements above is correct.

8 0.3 +/- 0.0496

9 Response bias.

10 I and III only.

11 0.75 +/- 0.036

12 10.95 points per game

13 displays residuals of the response variable versus the independent variable.

14 I only

15 $439.3

16 105/256

17 1.45 +/- .972

18 24.5mV

19 Neither corporation will have a more accurate estimate.

20 201.5+/-.093

21 11.9cm

22 56%

23 The sampling distribution is a skewed distribution.

24 130lb

25 mean = 7 and standard deviation = 1.45

26 S2 and S4

27 The distribution is skewed to the left.

28 .58

29 H0: μ = 8.5; Ha: μ < 8.5

30 z = -1.034 p = .151

31 a matched pair t-test

32 .0099
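
Several of the answers above can be checked with a few lines of arithmetic. The script below is only a sketch and is not part of the original key: it assumes Python 3.8 or later (for math.perm), and every variable name is made up for illustration. It recomputes the answers to Questions 1, 4, 6, 16, 18 and 28, using exact fractions where possible.

```python
from fractions import Fraction
from math import perm, sqrt

# Question 1: solve the mean equation for m, then confirm the variance.
xs = [10, 20, 30, 35]
ps = [Fraction(1, 10), Fraction(5, 10), Fraction(1, 10), Fraction(2, 10)]
p_m = Fraction(1, 10)
known_part = sum(x * p for x, p in zip(xs, ps))   # contribution of the known values
m = (25 - known_part) / p_m                       # value that makes the mean 25
variance = sum((x - 25) ** 2 * p for x, p in zip(xs + [m], ps + [p_m]))

# Question 4: two red marbles in a row, drawn without replacement from 12.
p_two_red = Fraction(2, 12) * Fraction(1, 11)

# Question 6: geometric distribution with success probability p = 2/6.
p = Fraction(2, 6)
geometric_variance = (1 - p) / p ** 2

# Question 16: four children, eight flavors, all four choices different.
p_all_different = Fraction(perm(8, 4), 8 ** 4)

# Question 18: the variances of independent errors add.
sd_total = sqrt(20 ** 2 + 10 ** 2 + 10 ** 2)

# Question 28: law of total probability over the three kinds of basket.
p_apple = (Fraction(3, 6) * Fraction(2, 3)
           + Fraction(2, 6) * Fraction(3, 4)
           + Fraction(1, 6) * 0)

print(m, variance, p_two_red, geometric_variance, p_all_different,
      round(sd_total, 1), float(p_apple))
```

Exact fractions avoid rounding questions when a result is compared with answer choices such as 1/66 or 105/256.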

Problem 1

Which of the following statements are true? (Check one)

I. Categorical variables are the same as qualitative variables.

II. Categorical variables are the same as quantitative variables.

III. Quantitative variables can be continuous variables.

(A) I only

(B) II only

(C) III only

(D) I and II

(E) I and III

Solution

The correct answer is (E). Categorical variables are just another name for qualitative

variables. And quantitative variables are numeric variables, so they can be continuous

variables. Categorical variables, however, are not quantitative variables.

See also: Variables

Problem 2

A coin is tossed three times. What is the probability that it lands on heads exactly one time?

(A) 0.125

(B) 0.250

(C) 0.333

(D) 0.375

(E) 0.500

Solution

The correct answer is (D). If you toss a coin three times, there are a total of eight possible

outcomes. They are: HHH, HHT, HTH, THH, HTT, THT, TTH, and TTT. Of the eight possible

outcomes, three have exactly one head. They are: HTT, THT, and TTH. Therefore, the

probability that three flips of a coin will produce exactly one head is 3/8 or 0.375.
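
The same count can be reproduced by listing the outcomes programmatically. This is a minimal sketch, not part of the original solution; the variable names are illustrative.

```python
from itertools import product

# All 2**3 equally likely sequences of three tosses.
outcomes = list(product("HT", repeat=3))

# Keep only the sequences with exactly one head.
one_head = [o for o in outcomes if o.count("H") == 1]

probability = len(one_head) / len(outcomes)
print(len(outcomes), len(one_head), probability)
```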

See also: Probability

Problem 3

An auto analyst is conducting a satisfaction survey, sampling from a list of 10,000 new car

buyers. The list includes 2,500 Ford buyers, 2,500 GM buyers, 2,500 Honda buyers, and

2,500 Toyota buyers. The analyst selects a sample of 400 car buyers, by randomly sampling

100 buyers of each brand.

Is this an example of a simple random sample?

(A) Yes, because each buyer in the sample was randomly sampled.

(B) Yes, because each buyer in the sample had an equal chance of being sampled.

(C) Yes, because car buyers of every brand were equally represented in the sample.

(D) No, because not every possible 400-buyer sample had an equal chance of

being chosen.

(E) No, because the population consisted of purchasers of four different brands of car.

Solution

The correct answer is (D). A simple random sample requires that every sample of size n (in

this problem, n is equal to 400) have an equal chance of being selected. In this problem,

there was a 100 percent chance that the sample would include 100 purchasers of each

brand of car. There was zero percent chance that the sample would include, for example, 99

Ford buyers, 101 Honda buyers, 100 Toyota buyers, and 100 GM buyers. Thus, not all possible

samples of size 400 had an equal chance of being selected; so this cannot be a

simple random sample.

The fact that each buyer in the sample was randomly sampled is a necessary condition for a

simple random sample, but it is not sufficient. Similarly, the fact that each buyer in the

sample had an equal chance of being selected is characteristic of a simple random sample,

but it is not sufficient. The sampling method in this problem used random sampling and

gave each buyer an equal chance of being selected; but the sampling method was

actually stratified random sampling.

The fact that car buyers of every brand were equally represented in the sample is irrelevant

to whether the sampling method was simple random sampling. Similarly, the fact that the

population consisted of buyers of different car brands is irrelevant.
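
The difference between the two sampling methods can be seen in a small simulation. This is a sketch under stated assumptions, not part of the original solution: the population list, the two function names, and the seed are all invented for illustration.

```python
import random
from collections import Counter

random.seed(0)  # arbitrary seed, used only to make the run repeatable

brands = ["Ford", "GM", "Honda", "Toyota"]
# Hypothetical population: 2,500 numbered buyers per brand.
population = [(brand, i) for brand in brands for i in range(2500)]

def stratified_sample(per_brand=100):
    """Draw per_brand buyers at random from each brand separately."""
    sample = []
    for brand in brands:
        stratum = [buyer for buyer in population if buyer[0] == brand]
        sample.extend(random.sample(stratum, per_brand))
    return sample

def simple_random_sample(size=400):
    """Draw size buyers at random from the whole list at once."""
    return random.sample(population, size)

stratified_counts = Counter(brand for brand, _ in stratified_sample())
srs_counts = Counter(brand for brand, _ in simple_random_sample())

print(stratified_counts)  # always exactly 100 per brand
print(srs_counts)         # brand counts are free to vary from run to run
```

The stratified draw can never produce a 99/101 split between two brands, which is exactly why it is not a simple random sample.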

See also: Survey Sampling Methods

Problem 4

Which of the following statements is true?

I. The center of a confidence interval is a population parameter.

II. The bigger the margin of error, the smaller the confidence interval.

III. The confidence interval is a type of point estimate.

IV. A population mean is an example of a point estimate.

(A) I only

(B) II only

(C) III only

(D) IV only

(E) None of the above.

Solution

The correct answer is (E). The center of a confidence interval is a sample statistic, not a

population parameter. The confidence interval is equal to the sample statistic plus or minus

the margin of error; so the confidence interval gets bigger as the margin of error gets

bigger. A confidence interval is a type of interval estimate, not a type of point estimate.

A population mean is not an example of a point estimate; a sample mean is an example of a

point estimate.

See also: Estimation Problems

Problem 5

A sample consists of four observations: {1, 3, 5, 7}. What is the standard deviation?

(A) 2

(B) 2.58

(C) 6

(D) 6.67

(E) None of the above

Solution

The correct answer is (B). First, we need to compute the sample mean.

x̄ = ( 1 + 3 + 5 + 7 ) / 4 = 4

Then, we plug all of the known values into the formula for the standard deviation of a

sample, as shown below:

s = sqrt [ Σ ( xi - x̄ )^2 / ( n - 1 ) ]

s = sqrt { [ ( 1 - 4 )^2 + ( 3 - 4 )^2 + ( 5 - 4 )^2 + ( 7 - 4 )^2 ] / ( 4 - 1 ) }

s = sqrt { [ ( -3 )^2 + ( -1 )^2 + ( 1 )^2 + ( 3 )^2 ] / 3 }

s = sqrt { [ 9 + 1 + 1 + 9 ] / 3 } = sqrt (20 / 3) = sqrt ( 6.67 ) = 2.58
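
The hand calculation can be checked with Python's standard statistics module. This is a small sketch rather than part of the original solution; the variable names are illustrative. Note that stdev divides by n - 1 (sample formula), while pstdev divides by n (population formula).

```python
from math import sqrt
from statistics import mean, pstdev, stdev

data = [1, 3, 5, 7]

x_bar = mean(data)                       # sample mean
s = stdev(data)                          # sample standard deviation (n - 1)
by_hand = sqrt(sum((x - x_bar) ** 2 for x in data) / (len(data) - 1))
sigma = pstdev(data)                     # population formula (n), for contrast

print(x_bar, round(s, 2), round(by_hand, 2), round(sigma, 2))
```

Dividing by n instead of n - 1 would give about 2.24, so the choice of formula matters for a sample this small.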

See also: Measures of Variability

Problem 6

A card is drawn randomly from a deck of ordinary playing cards. You win $10 if the card is a

spade or an ace. What is the probability that you will win the game?

(A) 1/13

(B) 13/52

(C) 4/13

(D) 17/52


Solution

The correct answer is C. Let S = the event that the card is a spade; and let A = the event

that the card is an ace. We know the following:

There are 52 cards in the deck.

There are 13 spades, so P(S) = 13/52.

There are 4 aces, so P(A) = 4/52.

There is 1 ace that is also a spade, so P(S ∩ A) = 1/52.

Therefore, based on the rule of addition:

P(S ∪ A) = P(S) + P(A) - P(S ∩ A)

P(S ∪ A) = 13/52 + 4/52 - 1/52 = 16/52 = 4/13
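
The rule of addition can be confirmed by building the deck and counting. This is an illustrative sketch, not part of the original solution; the rank and suit labels and the variable names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))       # 52 (rank, suit) pairs

spades = {card for card in deck if card[1] == "spades"}
aces = {card for card in deck if card[0] == "A"}

# Rule of addition: P(S or A) = P(S) + P(A) - P(S and A)
by_formula = Fraction(len(spades) + len(aces) - len(spades & aces), len(deck))

# Direct count of the cards that are a spade, an ace, or both.
by_counting = Fraction(len(spades | aces), len(deck))

print(by_formula, by_counting)
```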

See also: Rules of Probability

Problem 7

Which of the following statements is true?

I. The standard error is computed solely from sample attributes.

II. The standard deviation is computed solely from sample attributes.

III. The standard error is a measure of central tendency.

(A) I only

(B) II only

(C) III only

(D) I and II

(E) I and III

Solution

The correct answer is (A). The standard error can be computed from a knowledge of sample

attributes - sample size and sample statistics. The standard deviation of a statistic cannot be

computed solely from sample attributes; it requires a knowledge of one or more population

parameters. The standard error is a measure of variability, not a measure of central

tendency.

See also: Standard Error

Problem 8

Nine hundred (900) high school freshmen were randomly selected for a national survey.

Among survey participants, the mean grade-point average (GPA) was 2.7, and the standard

deviation was 0.4. What is the margin of error, assuming a 95% confidence level?

(A) 0.013

(B) 0.025

(C) 0.500

(D) 1.960


Solution

The correct answer is (B). To compute the margin of error, we need to find the critical value

and the standard error of the mean. To find the critical value, we take the following steps.

Compute alpha (α): α = 1 - (confidence level / 100) = 1 - 0.95 = 0.05

Find the critical probability (p*): p* = 1 - α/2 = 1 - 0.05/2 = 0.975

Find the critical z score. Since the sample size is large, the sampling distribution will

be roughly normal in shape. Therefore, we can express the critical value as a z score.

For this problem, it will be the z score having a cumulative probability equal to 0.975.

Then, using an online calculator (e.g., Stat Trek's free normal distribution calculator),

a handheld graphing calculator, or the standard normal distribution table, we find the

z-score associated with that cumulative probability. Using the Normal Distribution

Calculator, we find that the critical value is 1.96. (On the actual AP Statistics exam,

you may need to use a graphing calculator or a normal table, since Stat Trek's

analytical tools will not be available.)

Next, we find the standard error of the mean, using the following equation:

SEx = s / sqrt( n ) = 0.4 / sqrt( 900 ) = 0.4 / 30 = 0.013

And finally, we compute the margin of error (ME).

ME = Critical value x Standard error = 1.96 * 0.013 = 0.025
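
The steps above can be sketched in Python, with statistics.NormalDist standing in for the normal table. This is not part of the original solution, and the variable names are illustrative. Without intermediate rounding the margin of error comes out near 0.026; the value 0.025 in the solution arises from rounding the standard error to 0.013 first.

```python
from math import sqrt
from statistics import NormalDist

confidence = 0.95
alpha = 1 - confidence                   # 0.05
critical_p = 1 - alpha / 2               # 0.975

# z score whose cumulative probability equals critical_p.
z_star = NormalDist().inv_cdf(critical_p)

se = 0.4 / sqrt(900)                     # standard error of the mean
me = z_star * se                         # margin of error, no rounding
me_as_in_text = round(1.96 * 0.013, 3)   # rounding used in the solution

print(round(z_star, 2), round(se, 3), round(me, 4), me_as_in_text)
```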

See also: Margin of Error

Problem 9

A national achievement test is administered annually to 3rd graders. The test has a mean

score of 100 and a standard deviation of 15. If Jane's z-score is 1.20, what was her score on

the test?

(A) 82

(B) 88

(C) 100

(D) 112

(E) 118

Solution

The correct answer is (E). From the z-score equation, we know

z = (X - μ) / σ

where z is the z-score, X is the value of the element, μ is the mean of the population, and σ

is the standard deviation.

Solving for Jane's test score (X), we get

X = ( z * σ ) + μ = ( 1.20 * 15 ) + 100 = 18 + 100 = 118
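
The z-score equation and its inverse are short enough to write as two helper functions. This is a sketch for illustration only; the function names are hypothetical and not from the original solution.

```python
def z_from_score(x, mean, sd):
    """z = (X - mu) / sigma"""
    return (x - mean) / sd

def score_from_z(z, mean, sd):
    """Solve the z-score equation for X: X = mu + z * sigma"""
    return mean + z * sd

jane = score_from_z(1.20, 100, 15)
print(jane, z_from_score(jane, 100, 15))
```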

See also: Measures of Position

Problem 10

Which of the following is a discrete random variable?

I. The average height of a randomly selected group of boys.

II. The annual number of sweepstakes winners from New York City.

III. The number of presidential elections in the 20th century.

(A) I only

(B) II only

(C) III only

(D) I and II

(E) II and III

Solution

The correct answer is B. The annual number of sweepstakes winners is an integer

value and it results from a random process; so it is a discrete random variable. The average

height of a group of boys could be a non-integer, so it is not a discrete variable. And the

number of presidential elections in the 20th century is an integer, but it does not vary and it

does not result from a random process; so it is not a random variable.

See also: Random Variables

Problem 11

Which of the following statements are true? (Check one)

I. A sample survey is an example of an experimental study.

II. An observational study requires fewer resources than an experiment.

III. The best method for investigating causal relationships is an observational study.

(A) I only

(B) II only

(C) III only

(D) All of the above.

(E) None of the above.


Solution

The correct answer is (E). In a sample survey, the researcher does not assign treatments to

survey respondents. Therefore, a sample survey is not an experimental study; rather, it is an

observational study. An observational study may or may not require fewer resources (time,

money, manpower) than an experiment. The best method for investigating causal

relationships is an experiment - not an observational study - because an experiment

features randomized assignment of subjects to treatment groups. Randomization "evens

out" the effects of extraneous variables, which makes it easier for the researcher to identify

causal effects of treatment variables.

See also: Data Collection Methods

Problem 12

Suppose we want to estimate the average weight of an adult male in Dekalb County,

Georgia. We draw a random sample of 1,000 men from a population of 1,000,000 men and

weigh them. We find that the average man in our sample weighs 180 pounds, and the

standard deviation of the sample is 30 pounds. What is the 95% confidence interval?

(A) 180 ± 1.86

(B) 180 ± 3.0

(C) 180 ± 5.88

(D) 180 ± 30


Solution

The answer is (A). To specify the confidence interval, we work through the four steps below.

Identify a sample statistic. Since we are trying to estimate the mean weight in the

population, we choose the mean weight in our sample (180) as the sample statistic.

Select a confidence level. In this case, the confidence level is defined for us in the

problem. We are working with a 95% confidence level.

Find the margin of error. Previously, we described how to compute the margin of

error. The key steps are shown below.

Find standard error. The standard error (SE) of the mean is:

SE = s / sqrt( n ) = 30 / sqrt(1000) = 30/31.62 = 0.95

Find critical value. The critical value is a factor used to compute the margin of

error. To express the critical value as a t score (t*), follow these steps.

o Compute alpha (α): α = 1 - (confidence level / 100) = 0.05

o Find the critical probability (p*): p* = 1 - α/2 = 1 - 0.05/2 = 0.975

o Find the degrees of freedom (df): df = n - 1 = 1000 - 1 = 999

o The critical value is the t score having 999 degrees of freedom and

a cumulative probability equal to 0.975. Using an online calculator

(e.g., Stat Trek's free t Distribution Calculator), a handheld graphing

calculator, or a t distribution table, we find that the t score associated

with a cumulative probability of 0.975 is 1.96. (On the actual AP

Statistics exam, you may need to use a graphing calculator or a t table,

since Stat Trek's analytical tools will not be available.)

Note: We might also have expressed the critical value as a z score. Because

the sample size is large, a z score analysis produces the same result - a

critical value equal to 1.96.

Compute margin of error (ME): ME = critical value * standard error = 1.96 *

0.95 = 1.86

Specify the confidence interval. The range of the confidence interval is defined by

the sample statistic ± margin of error. And the uncertainty is denoted by the

confidence level. Therefore, we can be 95% confident that the population mean falls

within the interval 180 ± 1.86.
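
The four steps can be collected into one small function. This is a sketch, not part of the original solution: the function name is hypothetical, and it uses a z critical value from statistics.NormalDist, which the note above says gives the same result as the t score for a sample this large.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sample_sd, n, confidence):
    """Return (lower, upper, margin) using a z critical value."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sample_sd / sqrt(n)    # critical value * standard error
    return sample_mean - margin, sample_mean + margin, margin

lower, upper, margin = mean_confidence_interval(180, 30, 1000, 0.95)
print(round(lower, 2), round(upper, 2), round(margin, 2))
```

For small samples the z critical value would understate the margin, and a t critical value with n - 1 degrees of freedom should be used instead.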

See also: Confidence Intervals

Problem 13

The stemplot below shows the number of hot dogs eaten by contestants in a recent hot dog

eating contest.

80

70

60

50

40

30

20

10

1

4 7

2 2 6

0 2 5 7 9 9

5 7 9

7 9

1

Which of the following statements are true?

I. The range is 70.

II. The median is 46.

III. The mean is 47.

(A) I only

(B) II only

(C) III only

(D) I and II

(E) I, II, and III

Solution

The correct answer is (D). The range is equal to the biggest value minus the smallest value.

The biggest value is 81, and the smallest value is 11; so the range is equal to 81 -11 or 70.

The median is equal to the middle value in the data set. Here, we have an even number of

values - 45 and 47 - in the middle of the data set. Their average is (45 + 47)/2 or 46, so the

median is equal to 46. The mean is equal to the average of all the values - 45.56.

See also: Stemplots

Problem 14

The number of adults living in homes on a randomly selected city block is described by the

following probability distribution.

Number of adults, x   1      2      3      4 or more
Probability, P(x)     0.25   0.50   0.15   ???

What is the probability that 4 or more adults reside at a randomly selected home?

(A) 0.10

(B) 0.15

(C) 0.25

(D) 0.50

(E) There is not enough information to answer this question.

Solution

The correct answer is A. The sum of all the probabilities is equal to 1. Therefore, the

probability that four or more adults reside in a home is equal to 1 - (0.25 + 0.50 + 0.15) or

0.10.

See also: Probability Distributions

Problem 15

A major metropolitan newspaper selected a simple random sample of 1,600 readers from

their list of 100,000 subscribers. They asked whether the paper should increase its coverage

of local news. Forty percent of the sample wanted more local news. What is the 99%

confidence interval for the proportion of readers who would like more coverage of local

news?

(A) 0.30 to 0.50

(B) 0.32 to 0.48

(C) 0.35 to 0.45

(D) 0.37 to 0.43

(E) 0.39 to 0.41

Solution

The answer is (D). The approach that we used to solve this problem is valid when the

following conditions are met.

The sampling method must be simple random sampling. This condition is satisfied;

the problem statement says that we used simple random sampling.

The sample should include at least 10 successes and 10 failures. Suppose we classify

a "more local news" response as a success, and any other response as a failure.

Then, we have 0.40 * 1600 = 640 successes, and 0.60 * 1600 = 960 failures - plenty

of successes and failures.

If the population size is much larger than the sample size, we can use an

"approximate" formula for the standard deviation or the standard error. This

condition is satisfied, so we will use a simple "approximate" formula for the standard

error.

Since the above requirements are satisfied, we can use the following four-step approach to

construct a confidence interval.

Identify a sample statistic. Since we are trying to estimate a population proportion,

we choose the sample proportion (0.40) as the sample statistic.

Select a confidence level. The confidence level is defined for us in the problem

statement. We are working with a 99% confidence level.

Find the margin of error. Elsewhere on this site, we show how to compute the margin

of error when the sampling distribution is approximately normal. The key steps are

shown below.

Find standard deviation or standard error. Since we do not know the

population proportion, we cannot compute the standard deviation; instead, we

compute the standard error. And since the population is more than 10 times

larger than the sample, we can use the following formula to compute the

standard error (SE) of the proportion:

SE = sqrt [ p(1 - p) / n ] = sqrt [ (0.4)*(0.6) / 1600 ] = sqrt [ 0.24/1600 ] =

0.012


Find critical value. The critical value is a factor used to compute the margin of

error. Because the sampling distribution is approximately normal and the

sample size is large, we can express the critical value as a z score by following

these steps.

o Compute alpha (α): α = 1 - (confidence level / 100) = 1 - (99/100) =

0.01

o Find the critical probability (p*): p* = 1 - α/2 = 1 - 0.01/2 = 0.995


o The critical value is the z score having a cumulative probability equal

to 0.995. Using an online calculator (e.g., Stat Trek's free Normal

Distribution Calculator), a handheld graphing calculator, or a normal

distribution table, we find that the z score associated with a cumulative

probability of 0.995 is 2.58. (On the actual AP Statistics exam, you may

need to use a graphing calculator or a normal distribution table, since

Stat Trek's analytical tools will not be available.)


Compute margin of error (ME): ME = critical value * standard error = 2.58 *

0.012 = 0.03



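Specify the confidence interval. The range of the confidence interval is defined by

the sample statistic ± margin of error. And the uncertainty is denoted by the

confidence level.

Therefore, the 99% confidence interval is 0.37 to 0.43. That is, we are 99% confident that

the true population proportion is in the range defined by 0.4 ± 0.03.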
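
The calculation for a proportion follows the same pattern and can be sketched as a short function. This is an illustration, not part of the original solution; the function name is hypothetical, and it uses the "approximate" standard error formula from the solution, which assumes the population is much larger than the sample.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence):
    """Return (lower, upper, margin) for a sample proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)       # approximate standard error
    margin = z_star * se
    return p_hat - margin, p_hat + margin, margin

lower, upper, margin = proportion_confidence_interval(0.40, 1600, 0.99)
print(round(lower, 2), round(upper, 2), round(margin, 3))
```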

See also: Estimating a Proportion

Problem 16

Suppose a simple random sample of 150 students is drawn from a population of 3000

college students. Among sampled students, the average IQ score is 115 with a standard

deviation of 10. What is the 99% confidence interval for the students' IQ score?

(A) 115 ± 0.01

(B) 115 ± 0.82

(C) 115 ± 2.1

(D) 115 ± 2.6


Solution

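The correct answer is (C). The approach that we used to solve this problem is valid when the

following conditions are met.

The sampling method must be simple random sampling. This condition is satisfied;

the problem statement says that we used simple random sampling.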




The sampling distribution should be approximately normally distributed. Because the

sample size is large, we know from the central limit theorem that the sampling

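distribution of the mean will be normal or nearly normal; so this condition is satisfied.

Since the above requirements are satisfied, we can use the following four-step approach to

construct a confidence interval.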



Identify a sample statistic. Since we are trying to estimate a population mean, we

choose the sample mean (115) as the sample statistic.

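Select a confidence level. In this analysis, the confidence level is defined for us in the

problem. We are working with a 99% confidence level.

Find the margin of error. Elsewhere on this site, we show how to compute the margin

of error when the sampling distribution is approximately normal. The key steps are

shown below.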

Find standard deviation or standard error. Since we do not know the standard

deviation of the population, we cannot compute the standard deviation of the

sample mean; instead, we compute the standard error (SE). Because the

sample size is much smaller than the population size, we can use the

"approximate" formula for the standard error.

SE = s / sqrt( n ) = 10 / sqrt(150) = 10 / 12.25 = 0.82


error. Because the standard deviation of the population is unknown, we

express the critical value as a t score rather than a z score. To find the critical

value, we take these steps.

o Compute alpha (α): α = 1 - (confidence level / 100) = 1 - 99/100 = 0.01




a cumulative probability equal to 0.995. Using an online calculator

(e.g., Stat Trek's free t Distribution Calculator), a handheld graphing





Note: We might also have expressed the critical value as a z score. Because

the sample size is fairly large, a z score analysis produces a similar result - a

critical value equal to 2.58.


Margin of error = critical value * standard error = 2.61 * 0.82 = 2.1



confidence level.

Therefore, the 99% confidence interval is 112.9 to 117.1. That is, we are 99% confident that the true population mean is in the range defined by 115 ± 2.1.
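
As a rough check, the sketch below repeats the calculation with Python's standard library. Since the standard library has no t distribution, it uses the z approximation mentioned in the note above (critical value about 2.58); a t table with 149 degrees of freedom gives a slightly larger value, about 2.61. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, s, n = 115, 10, 150   # sample mean, sample SD, sample size from the problem

# "Approximate" standard error of the mean
se = s / sqrt(n)

# Assumption: z score used in place of the t score (the sample is large)
z_crit = NormalDist().inv_cdf(0.995)

margin = z_crit * se
print(f"SE = {se:.2f}, margin of error = {margin:.1f}")
print(f"99% CI: {x_bar - margin:.1f} to {x_bar + margin:.1f}")
```

With either critical value the margin of error rounds to 2.1, which is why answer (C) holds.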

See also: Estimating the Population Mean

Problem 17

Consider the boxplot below.

[Boxplot figure not reproduced; the horizontal axis runs from 2 to 18 in steps of 2.]


I. The distribution is skewed right.

II. The interquartile range is about 8.

III. The median is about 10.

(A) I only

(B) II only

(C) III only

(D) I and II

(E) II and III

Solution

The correct answer is (B). Most of the observations are on the high end of the scale, so the

distribution is skewed left. The interquartile range is indicated by the length of the box,

which is 18 minus 10 or 8. And the median is indicated by the vertical line running through

the middle of the box, which is roughly centered over 15. So the median is about 15.

See also: Boxplots

Problem 18

The number of adults living in homes on a randomly selected city block is described by the

following probability distribution.

Number of adults, x     1      2      3      4

Probability, P(x)       0.25   0.50   0.15   0.10

What is the standard deviation of the probability distribution?

(A) 0.50

(B) 0.62

(C) 0.79

(D) 0.89

(E) 2.10

Solution

The correct answer is D. The solution has three parts. First, find the expected value; then,

find the variance; then, find the standard deviation. Computations are shown below,

beginning with the expected value.

E(X) = Σ [ xi * P(xi) ]

E(X) = 1*0.25 + 2*0.50 + 3*0.15 + 4*0.10 = 2.10

Now that we know the expected value, we find the variance.

σ^2 = Σ { [ xi - E(X) ]^2 * P(xi) }

σ^2 = (1 - 2.1)^2 * 0.25 + (2 - 2.1)^2 * 0.50 + (3 - 2.1)^2 * 0.15 + (4 - 2.1)^2 * 0.10

σ^2 = (1.21 * 0.25) + (0.01 * 0.50) + (0.81 * 0.15) + (3.61 * 0.10) = 0.3025 + 0.0050 + 0.1215 + 0.3610 = 0.79

And finally, the standard deviation is equal to the square root of the variance; so the

standard deviation is sqrt(0.79) or 0.89.
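
The three-part calculation (expected value, variance, standard deviation) can be reproduced in a few lines. This is a minimal sketch; the dictionary `dist` is simply the probability table from the problem, and the names are illustrative.

```python
from math import sqrt

# Probability distribution from the problem: number of adults -> probability
dist = {1: 0.25, 2: 0.50, 3: 0.15, 4: 0.10}

# Expected value: sum of x * P(x)
expected = sum(x * p for x, p in dist.items())

# Variance: sum of squared deviations from E(X), weighted by probability
variance = sum((x - expected) ** 2 * p for x, p in dist.items())

std_dev = sqrt(variance)
print(f"E(X) = {expected:.2f}, variance = {variance:.2f}, SD = {std_dev:.2f}")
```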

See also: Attributes of Random Variables

Problem 19


I. Random sampling is a good way to reduce response bias.

II. To guard against bias from undercoverage, use a convenience sample.

III. Increasing the sample size tends to reduce survey bias.

IV. To guard against nonresponse bias, use a mail-in survey.

(A) I only

(B) II only

(C) III only

(D) IV only


Solution

The correct answer is (E). None of the statements is true. Random sampling provides strong

protection against undercoverage bias and voluntary response bias; but it is not

effective against response bias. A convenience sample does not protect against

undercoverage bias; in fact, it sometimes causes undercoverage bias. Increasing sample

size does not affect survey bias. And finally, using a mail-in survey does not

prevent nonresponse bias. In fact, mail-in surveys are quite vulnerable to nonresponse bias.

See also: Survey Sampling Bias

Problem 20

Twenty-two students were randomly selected from a population of 1000 students. The

sampling method was simple random sampling. All of the students were given a

standardized English test and a standardized math test. Test results are summarized below.

Student   English   Math   Difference, d   (d - d̄)^2

1 95 90 5 16

2 89 85 4 9

3 76 73 3 4

4 92 90 2 1

5 91 90 1 0

6 53 53 0 1

7 67 68 -1 4

8 88 90 -2 9

9 75 78 -3 16

10 85 89 -4 25

11 90 95 -5 36

12 85 83 2 1

13 87 83 4 9

14 85 83 2 1

15 85 82 3 4

16 68 65 3 4

17 81 79 2 1

18 84 83 1 0

19 71 60 11 100

20 46 47 -1 4

21 75 77 -2 9

22 80 83 -3 16

Σ(d - d̄)^2 = 270

d̄ = 1

What is the 90% confidence interval for the mean difference between student scores on the

math and English tests? Assume that the mean differences are approximately normally

distributed.

(A) 1 ± 0.8

(B) 1 ± 1.0

(C) 1 ± 1.3

(D) 1 ± 2.0

(E) 1 ± 3.6

Solution

The answer is (C). The approach that we used to solve this problem is valid when the following conditions are met.




The sampling distribution should be approximately normally distributed. The problem

statement says that the differences were normally distributed; so this condition is

satisfied.



Identify a sample statistic. Since we are trying to estimate a population mean

difference in math and English test scores, we use the sample mean difference (d̄ = 1) as the sample statistic.

Select a confidence level. In this analysis, the confidence level is defined for us in the




shown below.

Find standard deviation or standard error. Since we do not know the standard

deviation of the population, we cannot compute the standard deviation of the

sample mean; instead, we compute the standard error (SE). Since the sample

size is much smaller than the population size, we can use the approximation

equation for the standard error.

sd = sqrt[ Σ(di - d̄)^2 / (n - 1) ] = sqrt[ 270 / (22 - 1) ] = sqrt(12.857) = 3.586

SE = sd / sqrt(n) = 3.586 / sqrt(22) = 3.586 / 4.69 = 0.765


error. Because the standard deviation of the population is unknown and the sample size is fairly small, we express the critical value as a t score rather than a z score. To find the critical value, we take these steps.

o Compute alpha (α): α = 1 - (confidence level / 100) = 1 - 90/100 = 0.10




a cumulative probability equal to 0.95. Using an online calculator (e.g.,

Stat Trek's free t Distribution Calculator), a handheld graphing






Margin of error = critical value * standard error = 1.72 * 0.765 = 1.3



confidence level.

Therefore, the 90% confidence interval is -0.3 to 2.3. That is, we are 90% confident that the true population mean difference is in the range defined by 1 ± 1.3.
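
The matched-pairs calculation can be re-run from the raw scores in the table. This is a sketch, not the original author's method: the standard library has no t distribution, so the critical value `T_CRIT` is hard-coded from a t table (21 degrees of freedom, cumulative probability 0.95) and should be treated as an assumption. The differences are computed as English minus math, matching the table.

```python
from math import sqrt
from statistics import mean, stdev

# Scores for the 22 sampled students, in table order
english = [95, 89, 76, 92, 91, 53, 67, 88, 75, 85, 90,
           85, 87, 85, 85, 68, 81, 84, 71, 46, 75, 80]
math_scores = [90, 85, 73, 90, 90, 53, 68, 90, 78, 89, 95,
               83, 83, 83, 82, 65, 79, 83, 60, 47, 77, 83]

diffs = [e - m for e, m in zip(english, math_scores)]

d_bar = mean(diffs)            # sample mean difference
s_d = stdev(diffs)             # sample SD of the differences (n - 1 denominator)
se = s_d / sqrt(len(diffs))    # standard error of the mean difference

T_CRIT = 1.721                 # assumption: t table value for df = 21, p = 0.95
margin = T_CRIT * se

print(f"mean difference = {d_bar}, sd = {s_d:.3f}, SE = {se:.3f}")
print(f"90% CI: {d_bar - margin:.1f} to {d_bar + margin:.1f}")
```

The unrounded standard error is marginally smaller than the hand-rounded 0.765, but the margin of error still rounds to 1.3.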

See also: Mean Difference Between Matched Data Pairs

Problem 21

Below, the cumulative frequency plot shows height (in inches) of college basketball players.

What is the interquartile range?

(A) 3 inches

(B) 6 inches

(C) 25 inches

(D) 50 inches


Solution

The correct answer is (B). The interquartile range is the middle range of the distribution,

defined by Q3 minus Q1.

Q1 is the height for which the cumulative percentage is 25%. To find Q1 from the cumulative

frequency plot, follow the grid line to the right from the Y axis at 25%. This line intersects

the curve over the X axis at a height of about 71 inches. This means that 25% of the

basketball players are at most 71 inches tall, so Q1 is 71.

To find Q3, follow the grid line to the right from the Y axis at 75%. This line intersects the

curve over the X axis at a height of about 77 inches. This means that 75% of the basketball

players are at most 77 inches tall, so Q3 is 77.

Since the interquartile range is Q3 minus Q1, the interquartile range is 77 - 71 or 6 inches.

See also: Cumulative Frequency Plots

Problem 22

Suppose X and Y are independent random variables. The variance of X is equal to 16; and

the variance of Y is equal to 9. Let Z = X - Y.

What is the standard deviation of Z?

(A) 2.65

(B) 5.00

(C) 7.00

(D) 25.0

(E) It is not possible to answer this question, based on the information given.

Solution

The correct answer is B. The solution requires us to recognize that Variable Z is a

combination of two independent random variables. As such, the variance of Z is equal to the

variance of X plus the variance of Y.

Var(Z) = Var(X - Y) = Var(X) + Var(Y) = 16 + 9 = 25

The standard deviation of Z is equal to the square root of the variance. Therefore, the

standard deviation is equal to the square root of 25, which is 5.
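
To see why the variances add even though Z is a difference, it may help to write the rule for a general linear combination of independent variables, Var(aX + bY) = a^2 Var(X) + b^2 Var(Y). The helper below is a hypothetical illustration of that rule, not something defined in the problem.

```python
from math import sqrt

def sd_of_combination(a, var_x, b, var_y):
    """SD of a*X + b*Y, assuming X and Y are independent."""
    # Coefficients are squared, so the sign of b does not matter
    return sqrt(a ** 2 * var_x + b ** 2 * var_y)

sd_difference = sd_of_combination(1, 16, -1, 9)   # Z = X - Y
sd_sum = sd_of_combination(1, 16, 1, 9)           # X + Y, for comparison

print(sd_difference, sd_sum)
```

Both calls give the same standard deviation, because the coefficient -1 is squared before it multiplies Var(Y).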

See also: Combinations of Random Variables

Problem 23

Acme Toy Company sells baseball cards in packages of 100. Three types of players are

represented in each package -- rookies, veterans, and All-Stars. The company claims that

30% of the cards are rookies, 60% are veterans, and 10% are All-Stars. Cards from each

group are randomly assigned to packages.

Suppose you bought a package of cards and counted the players from each group. What

method would you use to test Acme's claim that 30% of the production run are rookies;

60%, veterans; and 10%, All-Stars?

(A) Chi-square goodness of fit test

(B) Chi-square test for homogeneity

(C) Chi-square test for independence

(D) One-sample t test

(E) Matched pairs t-test

Solution

The answer is (A). The chi-square goodness of fit test is used to find out whether an

observed pattern of categorical data is consistent with a specified distribution. In this

problem, we are dealing with a categorical variable -- the type of baseball card. And we want

to determine whether the distribution in our package is consistent with the production

distribution claimed by Acme Toy Company. So the chi-square goodness of fit test is

appropriate.

The other chi-square options (the independence test and the homogeneity test) involve

comparing data from two samples. Since we only have one sample in this baseball card

problem, they are not appropriate. The t-tests are not appropriate, since they are used with

quantitative data; and this problem involves categorical data.

See also: Chi-Square Goodness of Fit Test

Problem 24

Suppose a researcher conducts an experiment to test a hypothesis. If she doubles her

sample size, which of the following will increase?

I. The power of the hypothesis test.

II. The effect size of the hypothesis test.

III. The probability of making a Type II error.

(A) I only

(B) II only

(C) III only

(D) All of the above


Solution

The answer is (A). Increasing sample size makes the hypothesis test more sensitive - more

likely to reject the null hypothesis when it is, in fact, false. Thus, it increases the power of

the test. The effect size is not affected by sample size. And the probability of making a Type

II error gets smaller, not bigger, as sample size increases.

See also: Power of a Hypothesis Test

Problem 25

College      Stem    High school

      7       0
  3 6 6       1      0 0 3 5
1 2 3 4       2      1 2 4 4 6
6 8 8 9       3      1 8 9
    2 8       4      0 1
              5
              6
      3       7

(Stems are tens; leaves are ones.)

The back-to-back stemplot on the right shows the number of books read in a year by a

random sample of college and high school students. Which of the following statements are

true?

I. One college student read seven books.

II. The college median is equal to the high school median.

III. The mean is greater than the median in both groups.

(A) I only

(B) II only

(C) I and III only

(D) II and III only

(E) I, II, and III

Solution

The correct answer is (E). All of the college students read books during the year; the fewest

books read by a college student was seven. In both groups, the median is equal to 24. And

the mean number of books read per year is 25.3 for high school students and 30.4 for

college students; so the mean is greater than the median in both groups.
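
The medians and means quoted above can be checked from the stemplot values. The lists below are a reading of the plot with stems as tens and leaves as ones; the placement of the lone college leaf on stem 7 (73 books) is an assumption, chosen because it reproduces the stated college mean of 30.4.

```python
from statistics import mean, median

# Values read from the back-to-back stemplot (assumed reading, see note above)
college = [7, 13, 16, 16, 21, 22, 23, 24, 36, 38, 38, 39, 42, 48, 73]
high_school = [10, 10, 13, 15, 21, 22, 24, 24, 26, 31, 38, 39, 40, 41]

for label, data in (("College", college), ("High school", high_school)):
    print(f"{label}: min = {min(data)}, median = {median(data)}, "
          f"mean = {mean(data):.1f}")
```

In both groups a few large values pull the mean above the median, which is the pattern statement III describes.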

See also: Comparing Distributions

Problem 26

Suppose a die is tossed 5 times. What is the probability of getting exactly 2 fours?

(A) 0.028

(B) 0.161

(C) 0.167

(D) 0.333

(E) There is not enough information to answer this question.

Solution

The answer is (B). This is a binomial experiment in which the number of trials is equal to 5,

the number of successes is equal to 2, and the probability of success on a single trial is 1/6

or about 0.167. Therefore, the binomial probability is:

b(2; 5, 0.167) = nCx * P^x * (1 - P)^(n - x) = 5C2 * (0.167)^2 * (0.833)^3 = 10 * (0.167)^2 * (0.833)^3

b(2; 5, 0.167) = 0.161
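
The binomial formula translates directly into code. This is a minimal sketch using `math.comb` (Python 3.8 or later); the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Exactly 2 fours in 5 tosses of a fair die
prob = binomial_pmf(2, 5, 1 / 6)
print(round(prob, 3))
```

Using the exact fraction 1/6 rather than the rounded 0.167 gives the same answer to three decimal places.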

See also: Binomial Distribution

Problem 27

With respect to experimental design, which of the following statements are true?

I. Blinding controls for the effects of confounding.

II. Randomization controls for effects of lurking variables.

III. Each experimental factor has one treatment level.

(A) I only

(B) II only

(C) III only



Solution

The correct answer is (B). By randomly assigning subjects to treatment levels, randomization

spreads potential effects of lurking variables roughly evenly across treatment

levels. Blinding ensures that subjects in control and treatment conditions experience

the placebo effect equally, but it does not guard against confounding. And finally,

each factor has two or more treatment levels. If a factor had only one treatment level, each

subject in the experiment would get the same treatment on that factor. As a result, that

factor would be confounded with every other factor in the experiment.

See also: Experiments

Problem 28

In hypothesis testing, which of the following statements is always true?

I. The P-value is greater than the significance level.

II. The P-value is computed from the significance level.

III. The P-value is the parameter in the null hypothesis.

IV. The P-value is a test statistic.

V. The P-value is a probability.

(A) I only

(B) II only

(C) III only

(D) IV only

(E) V only

Solution

The answer is (E). The P-value is the probability of observing a sample statistic as extreme

as the test statistic. It can be greater than the significance level, but it can also be smaller

than the significance level. It is not computed from the significance level, it is not the

parameter in the null hypothesis, and it is not a test statistic.

See also: How to Test Hypotheses

Problem 29

A national consumer magazine reported the following correlations.

The correlation between car weight and car reliability is -0.30.

The correlation between car weight and annual maintenance cost is 0.20.


I. Heavier cars tend to be less reliable.

II. Heavier cars tend to cost more to maintain.

III. Car weight is related more strongly to reliability than to maintenance cost.

(A) I only

(B) II only

(C) III only

(D) I and II

(E) I, II, and III

Solution

The correct answer is (E). The correlation between car weight and reliability is negative. This

means that reliability tends to decrease as car weight increases. The correlation between

car weight and maintenance cost is positive. This means that maintenance costs tend to

increase as car weight increases.

The strength of a relationship between two variables is indicated by the absolute value of

the correlation coefficient. The correlation between car weight and reliability has an absolute

value of 0.30. The correlation between car weight and maintenance cost has an absolute

value of 0.20. Therefore, the relationship between car weight and reliability is stronger than

the relationship between car weight and maintenance cost.

See also: Correlation

Problem 30

Bob is a high school basketball player. He is a 70% free throw shooter. That means his

probability of making a free throw is 0.70. What is the probability that Bob makes his first

free throw on his fifth shot?

(A) 0.0024

(B) 0.0057

(C) 0.0081

(D) 0.0720

(E) 0.1681

Solution

The answer is (B). This is an example of a geometric distribution, which is a special case of a

negative binomial distribution. Therefore, this problem can be solved using the negative

binomial formula or the geometric formula. We demonstrate each approach below,

beginning with the negative binomial formula.

The probability of success (P) is 0.70, the number of trials (x) is 5, and the number of

successes (r) is 1. We enter these values into the negative binomial formula.

b*(x; r, P) = (x-1)C(r-1) * P^r * Q^(x - r)

b*(5; 1, 0.7) = 4C0 * 0.7^1 * 0.3^4

b*(5; 1, 0.7) = 0.00567

where Q = 1 - P = 0.3 is the probability of failure on a single trial.

Now, we demonstrate a solution based on the geometric formula.

g(x; P) = P * Q^(x - 1)

g(5; 0.7) = 0.7 * 0.3^4 = 0.00567

Notice that each approach yields the same answer.
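
Both formulas can be written as small functions to confirm that they agree. This is a sketch with illustrative function names; Q is computed as 1 - P inside each function.

```python
from math import comb

def negative_binomial_pmf(x, r, p):
    """Probability that the r-th success occurs on trial x."""
    return comb(x - 1, r - 1) * p ** r * (1 - p) ** (x - r)

def geometric_pmf(x, p):
    """Probability that the first success occurs on trial x."""
    return p * (1 - p) ** (x - 1)

nb = negative_binomial_pmf(5, 1, 0.7)   # first success (r = 1) on shot 5
geo = geometric_pmf(5, 0.7)

print(f"negative binomial: {nb:.5f}, geometric: {geo:.5f}")
```

With r = 1 the combination term is 4C0 = 1, so the negative binomial formula reduces to the geometric formula.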

See also: Binomial and Geometric Distributions

Problem 31

An archer claims that 25% of her shots will be in the center of the target (i.e., a bulls-eye). A

sports writer plans to test this claim by sampling 300 shots. If the 300 shots result in 60 or

fewer bulls-eyes (i.e., 20% bulls-eyes), the writer will reject the archer's claim.

What is the probability that the sports writer will reject the archer's claim, when it is actually

true?

(A) 0.01

(B) 0.02

(C) 0.04

(D) 0.08

(E) 0.16

Solution

The answer is (B). The solution to this problem involves finding the P-value, the probability

of making 60 or fewer bulls-eyes, assuming that 25% of the archer's shots are normally

bulls-eyes. In other words, the null hypothesis is P = 0.25.

To find the P-value, take the following steps.

Calculate the standard deviation (σ), assuming that the null hypothesis is true.

σ = sqrt[ P * ( 1 - P ) / n ] = sqrt [(0.25 * 0.75) / 300] = sqrt(0.000625) = 0.025

where P is the hypothesized value of population proportion in the null hypothesis,

and n is the sample size.

Compute the z-score test statistic (z).

z = (p - P) / σ = (0.20 - 0.25)/0.025 = -2

where P is the hypothesized value of population proportion in the null hypothesis,

and p is the proportion of bulls-eyes observed in the sample.

The P-value is the probability that a z-score will be less than -2.0. Using an online

calculator (e.g., Stat Trek's free Normal Distribution Calculator), a handheld graphing

calculator, or a normal distribution table, we find that the z score of -2.0 has a

cumulative probability of 0.023. (On the actual AP Statistics exam, you may need to

use a graphing calculator or a normal distribution table, since Stat Trek's analytical

tools will not be available.)

Note: This problem can also be treated as a binomial experiment. Previously, we

showed how to analyze a binomial experiment. The binomial experiment is actually the

more exact analysis. It produces a P-value with a cumulative probability of 0.0245. Without a

computer, the binomial approach is computationally demanding. Therefore, many statistics

texts emphasize the approach presented above, which is a normal approximation of the

binomial.
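
The sketch below runs both analyses side by side using only Python's standard library: the normal approximation described in the steps above, and the exact binomial sum mentioned in the note. Variable names are illustrative, and no particular output for the exact sum is asserted here.

```python
from math import comb, sqrt
from statistics import NormalDist

P0 = 0.25      # hypothesized proportion of bulls-eyes
n = 300        # number of shots sampled
cutoff = 60    # writer rejects the claim at 60 or fewer bulls-eyes

# Normal approximation
sigma = sqrt(P0 * (1 - P0) / n)
z = (cutoff / n - P0) / sigma
p_normal = NormalDist().cdf(z)

# Exact binomial: P(X <= 60) when X ~ Binomial(300, 0.25)
p_exact = sum(comb(n, k) * P0 ** k * (1 - P0) ** (n - k)
              for k in range(cutoff + 1))

print(f"sigma = {sigma:.3f}, z = {z:.2f}")
print(f"normal approximation: {p_normal:.3f}, exact binomial: {p_exact:.4f}")
```

Either way the probability is close to 0.02, so answer (B) does not depend on which method is used.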

See also: Hypothesis Test for a Proportion

Problem 32

The Acme Car Company claims that at most 8% of its new cars have a manufacturing defect.

A quality control inspector randomly selects 300 new cars and finds that 33 have a defect.

Should she reject the 8% claim? Assume that the significance level is 0.05.

(A) Yes, because the P-value is 0.016.

(B) Yes, because the P-value is 0.028.

(C) No, because the P-value is 0.16.

(D) No, because the P-value is 0.28.

(E) There is not enough information to reach a conclusion.

Solution

The answer is (B). The solution to this problem takes four steps: (1) state the hypotheses, (2)

formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work

through those steps below:

State the hypotheses. The first step is to state the null hypothesis and an

alternative hypothesis.

Null hypothesis: P <= 0.08

Alternative hypothesis: P > 0.08

Note that these hypotheses constitute a one-tailed test. The null hypothesis will be

rejected only if the sample proportion is too big.

Formulate an analysis plan. For this analysis, the significance level is 0.05. The

test method, shown in the next section, is a one-sample z-test.

Analyze sample data. Using sample data, we calculate the standard deviation (σ)

and compute the z-score test statistic (z).

σ = sqrt[ P * ( 1 - P ) / n ] = sqrt [(0.08 * 0.92) / 300] = sqrt(0.0002453) = 0.0157

z = (p - P) / σ = (0.11 - 0.08) / 0.0157 = 1.91

where P is the hypothesized value of population proportion in the null hypothesis, p is

the sample proportion, and n is the sample size.

Since we have a one-tailed test, the P-value is the probability that the z-score is

greater than 1.91. We use the Normal Distribution Calculator to find P(z > 1.91) =

0.028. Thus, the P-value = 0.028.

Interpret results. Since the P-value (0.028) is less than the significance level (0.05),

we reject the null hypothesis.

Note: If you use this approach on an exam, you may also want to mention why this approach

is appropriate. Specifically, the approach is appropriate because the sampling method was

simple random sampling, the sample included at least 10 successes and 10 failures, and the

population size was at least 10 times the sample size.
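
The four steps can be condensed into a short script. This is a sketch using the standard library's `NormalDist` in place of the Normal Distribution Calculator; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

P0 = 0.08        # hypothesized defect proportion (null hypothesis: P <= 0.08)
n = 300          # cars inspected
defects = 33     # cars with a defect
alpha = 0.05     # significance level

p_hat = defects / n
sigma = sqrt(P0 * (1 - P0) / n)          # SD of the sampling distribution under H0
z = (p_hat - P0) / sigma                 # z-score test statistic
p_value = 1 - NormalDist().cdf(z)        # one-tailed (upper tail) P-value

print(f"p = {p_hat:.2f}, sigma = {sigma:.4f}, z = {z:.2f}, P-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Because the script does not round σ to 0.0157 before dividing, its z score is slightly larger than the hand-computed 1.91; the P-value still rounds to 0.028.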

See also: Hypothesis Test for a Proportion

Problem 33

In the context of regression analysis, which of the following statements are true?

I. When the sum of the residuals is greater than zero, the model is nonlinear.

II. A random pattern in the residual plot indicates that linear regression is appropriate.

III. Influential points always reduce the correlation coefficient.

(A) I only

(B) II only

(C) III only

(D) I and II

(E) I, II, and III

Solution

The correct answer is (B). Researchers use residual plots to decide whether data fit a linear

or a non-linear model. A random pattern of residuals suggests that a linear model is

appropriate; a non-random pattern suggests that a non-linear model may be appropriate.

The sum of the residuals is always zero, whether the regression model is linear or nonlinear.

And influential points often increase the correlation coefficient.

See also: Residual Analysis in Regression

Problem 34

Molly earned a score of 940 on a national achievement test. The mean test score was 850

with a standard deviation of 100. What proportion of students had a higher score than Molly?

(Assume that test scores are normally distributed.)

(A) 0.10

(B) 0.18

(C) 0.50

(D) 0.82

(E) 0.90

Solution

The correct answer is B. As part of the solution to this problem, we assume that test scores

are normally distributed. In this way, we use the normal distribution as a model for

measurement. Given an assumption of normality, the solution involves three steps.

First, we transform Molly's test score into a z-score, using the z-score transformation

equation.

z = (X - μ) / σ = (940 - 850) / 100 = 0.90

Then, using an online calculator (e.g., Stat Trek's free normal distribution calculator),

a handheld graphing calculator, or the standard normal distribution table, we find the

cumulative probability associated with the z-score. In this case, we find P(Z < 0.90) =

0.8159.

Therefore, P(Z > 0.90) = 1 - P(Z < 0.90) = 1 - 0.8159 = 0.1841.

Thus, we estimate that 18.41 percent of the students tested had a higher score than Molly.
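
The same three steps can be carried out with `statistics.NormalDist` (Python 3.8 or later). This is a minimal sketch; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 850, 100     # mean and standard deviation of test scores
molly = 940              # Molly's score

z = (molly - mu) / sigma                       # z-score transformation
below = NormalDist(mu, sigma).cdf(molly)       # P(score < 940), same as P(Z < z)
above = 1 - below                              # proportion scoring higher than Molly

print(f"z = {z:.2f}, P(Z < z) = {below:.4f}, P(Z > z) = {above:.4f}")
```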

See also: Standard Normal Distribution

Problem 35


I. A completely randomized design offers no control for lurking variables.

II. A randomized block design controls for the placebo effect.

III. In a matched pairs design, subjects within each pair receive the same treatment.

(A) I only

(B) II only

(C) III only



Solution

The correct answer is (E). In a completely randomized design, subjects are randomly

assigned to treatment conditions. Randomization provides some control for lurking

variables. By itself, a randomized block design does not control for the placebo effect. To

control for the placebo effect, the experimenter must include a placebo in one of the

treatment levels. In a matched pairs design, subjects within each pair are assigned

to different treatment levels.

See also: Experimental Design

Problem 36

A sports writer hypothesized that Tiger Woods plays better on par 3 holes than on par 4

holes. He reviewed Woods' performance in a random sample of golf tournaments. On the par

3 holes, Woods made a birdie in 20 out of 80 attempts. On the par 4 holes, he made a birdie

in 40 out of 200 attempts. How would you interpret this result?

(A) The P-value is < 0.001, very strong evidence that Woods plays better on par 3

holes.

(B) The P-value is between 0.001 and 0.01, strong evidence that Woods plays better on

par 3 holes.

(C) The P-value is between 0.01 and 0.05, moderate evidence that Woods plays better

on par 3 holes.

(D) The P-value is between 0.05 and 0.10, some evidence that Woods plays better on

par 3 holes.

(E) The P-value is > 0.10, little or no support for the notion that Woods plays better on

par 3 holes.

Solution

The answer is (E). The solution to this problem takes four steps: (1) state the hypotheses, (2)

formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work

through those steps below:

State the hypotheses. The first step is to state the null hypothesis and an

alternative hypothesis.

Null hypothesis: P3 <= P4

Alternative hypothesis: P3 > P4

Note that these hypotheses constitute a one-tailed test. The null hypothesis will be

rejected if the proportion of birdies on par 3 holes (p3) is sufficiently greater than the

proportion of birdies on par 4 holes (p4).

Formulate an analysis plan. For this analysis, the test method is a two-proportion

z-test, which is shown below.

Analyze sample data. Using sample data, we calculate the pooled sample

proportion (p) and the standard error (SE). Using those measures, we compute the z-

score test statistic (z).

p = (p3 * n3 + p4 * n4) / (n3 + n4) = [(0.25 * 80) + (0.20 * 200)] / (80 + 200) = 60/280 = 0.214

SE = sqrt{ p * ( 1 - p ) * [ (1/n3) + (1/n4) ] }

SE = sqrt[ 0.214 * 0.786 * (1/80 + 1/200) ] = sqrt[ 0.214 * 0.786 * 0.0175 ] = sqrt[ 0.002944 ] = 0.0543

z = (p3 - p4) / SE = (0.25 - 0.20) / 0.0543 = 0.92

where p3 is the sample proportion of birdies on par 3 holes, p4 is the sample proportion of birdies on par 4 holes, n3 is the number of par 3 holes, and n4 is the number of par 4 holes.

Since we have a one-tailed test, the P-value is the probability that the z-score is

greater than 0.92. We use the Normal Distribution Calculator to find P(z > 0.92) =

0.18. Thus, the P-value = 0.18.

Interpret results. Since the P-value (0.18) is greater than 0.10, we have little

support for the notion that Woods plays better on par 3 holes. In short, we cannot

reject the null hypothesis.

Note: If you use this approach on an exam, you may also want to mention why this approach

is appropriate. Specifically, the approach is appropriate because the sampling method was

simple random sampling, the samples were independent, each population was at least 10

times larger than its sample, and each sample included at least 10 successes and 10

failures.
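
The two-proportion z-test can be reproduced from the raw counts. This is a sketch using the standard library; `NormalDist` stands in for the Normal Distribution Calculator, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x3, n3 = 20, 80      # birdies and attempts on par 3 holes
x4, n4 = 40, 200     # birdies and attempts on par 4 holes

p3, p4 = x3 / n3, x4 / n4
pooled = (x3 + x4) / (n3 + n4)                        # pooled sample proportion
se = sqrt(pooled * (1 - pooled) * (1 / n3 + 1 / n4))  # standard error
z = (p3 - p4) / se                                    # z-score test statistic
p_value = 1 - NormalDist().cdf(z)                     # one-tailed (upper tail)

print(f"pooled p = {pooled:.3f}, SE = {se:.4f}, z = {z:.2f}, P-value = {p_value:.2f}")
```

Computing the pooled proportion from counts, (20 + 40) / 280, avoids the rounding that creeps in when the sample proportions are multiplied back by their sample sizes.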

See also: Hypothesis Test for Difference Between Proportions

Problem 37

In the context of regression analysis, which of the following statements are true?

I. A linear transformation increases the linear relationship between variables.

II. A logarithmic model is the most effective transformation method.

III. A residual plot reveals departures from linearity.

(A) I only

(B) II only

(C) III only

(D) I and II

(E) I, II, and III

Solution

The correct answer is (C). A linear transformation neither increases nor decreases the linear

relationship between variables; it preserves the relationship. A nonlinear transformation is

used to increase the relationship between variables. The most effective transformation

method depends on the data being transformed. In some cases, a logarithmic model may be

more effective than other methods; but in other cases it may be less effective. Non-random

patterns in a residual plot suggest a departure from linearity in the data being plotted.

See also: Transformations to Achieve Linearity

Problem 38

Acme Corporation manufactures light bulbs. The CEO claims that an average Acme light

bulb lasts 300 days. A researcher randomly selects 15 bulbs for testing. The sampled bulbs

last an average of 290 days, with a standard deviation of 50 days. If the CEO's claim were

true, what is the probability that 15 randomly selected bulbs would have an average life of

no more than 290 days?

(A) 0.100

(B) 0.226

(C) 0.334

(D) 0.443

(E) 0.775

Solution

The answer is (B). The first thing we need to do is compute the t score, based on the

following equation:

t = [ x - μ ] / [ s / sqrt( n ) ]

t = ( 290 - 300 ) / [ 50 / sqrt( 15) ] = -10 / 12.909945 = - 0.7745966

where x is the sample mean, μ is the population mean, s is the standard deviation of the

sample, and n is the sample size.

Then, using an online calculator (e.g., Stat Trek's free T Distribution Calculator), a

handheld graphing calculator, or the t distribution table, we find the cumulative probability

associated with the t score. For this practice test, we can use the T Distribution Calculator;

but on the actual AP Statistics Exam, you may need to use a graphing calculator or a t

distribution table.

Since we know the t score, we select "T score" from the Random Variable dropdown box of

the T Distribution Calculator. Then, we enter the following data:

The degrees of freedom are equal to 15 - 1 = 14.

The t score is equal to - 0.7745966.

The calculator displays the cumulative probability: 0.226. Hence, if the true bulb life were

300 days, there is a 22.6% chance that the average bulb life for 15 randomly selected bulbs

would be less than or equal to 290 days.
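
For readers without a t calculator at hand, the cumulative probability can be approximated numerically. The sketch below is an assumption-laden stand-in for the T Distribution Calculator: it integrates the Student's t density with Simpson's rule, using only the standard library. The function names `t_pdf` and `t_cdf` are hypothetical helpers, not part of any library.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                 # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

x_bar, mu, s, n = 290, 300, 50, 15
t_score = (x_bar - mu) / (s / sqrt(n))
prob = t_cdf(t_score, n - 1)             # 14 degrees of freedom

print(f"t = {t_score:.4f}, cumulative probability = {prob:.3f}")
```

The symmetry of the t distribution is what lets the function integrate only from 0 to |t| and then add or subtract the area from 0.5.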

See also: Student's t Distribution

Problem 39

Which of the following would be a reason to use a one-sample t-test instead of a one-sample

z-test?

I. The standard deviation of the population is unknown.

II. The null hypothesis involves a continuous variable.

III. The sample size is large (greater than 40).

(A) I only

(B) II only

(C) III only

(D) I and II

(E) I and III

Solution

The answer is (A). When the standard deviation of the population is unknown, the t-test is

preferred. Either test can be used when the null hypothesis involves a continuous variable,

or when the sample size is large.

See also: Student's t Distribution

Problem 40

A public opinion poll surveyed a simple random sample of voters. Respondents were

classified by gender (male or female) and by voting preference (Republican, Democrat, or

Independent). Results are shown below.

                 Voting Preferences

                 Republican   Democrat   Independent   Row total

Male             200          150        50            400

Female           250          300        50            600

Column total     450          450        100           1000

If you conduct a chi-square test of independence, what is the expected frequency count of

male Independents?

(A) 40

(B) 50

(C) 60

(D) 180

(E) 270

Solution

The answer is (A). To apply the chi-square test for independence, we compute the expected

frequency counts for each cell of the table, using the following equation. The computation

for males who are classified as Independents is shown below.

E_{r,c} = (n_r * n_c) / n

E_{1,3} = (400 * 100) / 1000 = 40000/1000 = 40

where r indexes the levels of gender, c indexes the levels of voting preference, n_r is the number of observations from level r of gender, n_c is the number of observations from level c of voting preference, n is the number of observations in the sample, E_{r,c} is the expected frequency count when gender is level r and voting preference is level c, and O_{r,c} is the observed frequency count when gender is level r and voting preference is level c.
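
The expected-count formula can be applied to every cell of the table at once. This is a minimal sketch; the nested dictionary `observed` is just the contingency table from the problem, and the other names are illustrative.

```python
# Observed counts from the contingency table (rows: gender, columns: preference)
observed = {
    "Male":   {"Republican": 200, "Democrat": 150, "Independent": 50},
    "Female": {"Republican": 250, "Democrat": 300, "Independent": 50},
}

# Row totals, column totals, and overall sample size
row_totals = {gender: sum(cells.values()) for gender, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for party, count in cells.items():
        col_totals[party] = col_totals.get(party, 0) + count
n = sum(row_totals.values())

# Expected count for each cell: (row total * column total) / n
expected = {
    gender: {party: row_totals[gender] * col_totals[party] / n
             for party in col_totals}
    for gender in observed
}

print(expected["Male"]["Independent"])
```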
