SAS Statistics Quizzes


Upload: mahesh-kumar-joshi

Posted on 19-Jan-2016


Description: practice quizzes for exam A00-240 (not the actual test).

Page 1: SAS Statistics Quizzes

Quiz, Lesson 1: Introduction to Statistics

Select the best answer for each question. When you are finished, click Submit Quiz.

1. For an asymmetric (or skewed) distribution, which of the following statistics is a good measure for the middle of the data?

 a. mean

 b. median

 c. either mean or median

2. Which of the following code examples correctly calculates descriptive statistics of popcorn yield (Yield) for each level of the class variable (Type) in the data set Statdata.Popcorn, as well as statistics for all levels combined?

The output should include the following statistics: sample size, mean, median, standard deviation, variance, range, and interquartile range.

 a. proc means data=statdata.popcorn maxdec=2 fw=10 n mean median std var range qrange; class Type; var Yield; run;

 b. proc means data=statdata.popcorn maxdec=2 fw=10 printalltypes n mean median std var range qrange; class Yield; var Class; run;

 c. proc means data=statdata.popcorn maxdec=2 fw=10 printalltypes n mean median std var range qrange; class Type; var Yield; run;

 d. proc means data=statdata.popcorn maxdec=2 fw=10 printalltypes n mean median std range IQR; class Type; var Yield; run;

3. Read the following statement about the central limit theorem and choose the answer that contains the correct values for all of the missing fields:

The central limit theorem states that the distribution of sample __(1)__ is approximately __(2)__, regardless of the distribution of the population data, as long as the sample size is at least n = __(3)__.

 a. means, skewed, 20

 b. variance, equal, 30

 c. means, normal, 30

 d. proportions, equal, 10

4. Psychologists at a college want to know if students are sleeping more or less than the recommended average of 8 hours a day.

Which of the following code choices correctly tests the null hypothesis?

 a. proc univariate data=statdata.sleep mu0<>8; var hours; run;

 b. proc univariate data=statdata.sleep; var hours / mu0=8; run;

 c. proc univariate data=statdata.sleep; var hours / mu0<>8; run;

 d. proc univariate data=statdata.sleep mu0=8; var hours; run;

5. How do you define the term power?

 a. the measure of the ability of the statistical hypothesis test to reject the null hypothesis when it is actually false

 b. the probability of committing a Type I error

 c. the probability of failing to reject the null hypothesis when it is actually false

6. Select the choice that lists only continuous variables.

 a. body temperature, number of children, gender, beverage size

 b. age, body temperature, gas mileage, income

 c. number of children, gender, gas mileage, income

 d. gender, gas mileage, beverage size, income

7. Which of the following code choices creates a histogram for the variable Speed from the data set SpeedTest with a normal curve overlay and a box with the skewness and kurtosis statistics printed in the northeast corner?

 a. proc univariate data=statdata.speedtest; histogram Speed / normal(mu=est sigma=est); inset skewness kurtosis; run;

 b. proc univariate data=statdata.speedtest; histogram Speed / normal; inset skewness kurtosis / position=ne; run;

 c. proc univariate data=statdata.speedtest; histogram Speed / normal(mu=est sigma=est); inset skewness kurtosis / position=ne; run;

 d. proc univariate data=statdata.speedtest; histogram Speed / normal(skewness kurtosis); run;

8. Select the statement below that incorrectly interprets a 95% confidence interval (15.02, 15.04) for the population mean, if the sample mean is 15.03 ounces of cereal.

 a. You are 95% confident that the true average weight for a box of cereal is between 15.02 and 15.04 ounces.

 b. The probability is .95 that the true average weight is between 15.02 and 15.04 ounces.

 c. In the long run, approximately 95% of the intervals calculated with this procedure will capture the true average weight.

9. The shape of a normal distribution depends on the value of which two parameters?

 a. the mean (x̄) and the standard deviation (s)

 b. the standard deviation (σ) and the variance (σ²)

 c. the mean (µ) and the standard deviation (σ)

 d. none of the above

10. The standard error of the mean is

 a. used to calculate confidence intervals of the mean.

 b. always normally distributed.

 c. sometimes less than 0.

 d. none of the above


Quiz Feedback, Lesson 1: Introduction to Statistics 

Your Score: 90%

Congratulations! Your score of 90% indicates that you've mastered the topics in this lesson. If you'd like, check the feedback below and select Review links for any sections containing topics you want to review.

When you're ready to start the next lesson, select the lesson in the Course Menu. Then click a topic in the Lesson Menu to begin.

1. For an asymmetric (or skewed) distribution, which of the following statistics is a good measure for the middle of the data?

 a. mean

 b. median

 c. either mean or median

Your answer: b
Correct answer: b

The median is not affected by outliers and is less affected by skewness. The mean, on the other hand, averages in every value, so outliers and a long tail pull it away from the bulk of the data.

Review: Measures of Location
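To see this effect outside SAS, here is a minimal Python sketch with a hypothetical right-skewed sample (the values are made up for illustration). It compares the mean and the median with and without the one extreme value.

```python
from statistics import mean, median

# Hypothetical right-skewed sample (for example, incomes in $1000s):
# most values are moderate, one is very large.
incomes = [28, 31, 33, 35, 36, 38, 40, 42, 45, 250]

print(f"mean   = {mean(incomes):.1f}")    # pulled toward the long right tail
print(f"median = {median(incomes):.1f}")  # stays with the bulk of the data

# Dropping the extreme value moves the mean a lot but the median only slightly.
without_outlier = incomes[:-1]
print(f"mean without outlier   = {mean(without_outlier):.1f}")
print(f"median without outlier = {median(without_outlier):.1f}")
```

In this sample the mean falls from 57.8 to about 36.4 when the extreme value is dropped, while the median only moves from 37.0 to 36.0.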

2. Which of the following code examples correctly calculates descriptive statistics of popcorn yield (Yield) for each level of the class variable (Type) in the data set Statdata.Popcorn, as well as statistics for all levels combined?

The output should include the following statistics: sample size, mean, median, standard deviation, variance, range, and interquartile range.

 a. proc means data=statdata.popcorn maxdec=2 fw=10 n mean median std var range qrange; class Type; var Yield; run;

 b. proc means data=statdata.popcorn maxdec=2 fw=10 printalltypes n mean median std var range qrange; class Yield; var Class; run;

 c. proc means data=statdata.popcorn maxdec=2 fw=10 printalltypes n mean median std var range qrange; class Type; var Yield; run;

 d. proc means data=statdata.popcorn maxdec=2 fw=10 printalltypes n mean median std range IQR; class Type; var Yield; run;

Your answer: c
Correct answer: c

The PROC MEANS statement must include the PRINTALLTYPES option so that SAS displays statistics for all requested combinations of class variables, that is, for each level of Type and for all levels combined. The statistic keywords must be N MEAN MEDIAN STD VAR RANGE QRANGE; QRANGE requests the interquartile range, whereas choice d uses IQR instead and omits VAR. The CLASS statement must specify Type as the class variable, and the VAR statement must specify Yield as the analysis variable.

Review: The MEANS Procedure
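The following Python sketch mimics what the correct PROC MEANS step reports: one set of statistics per level of Type, plus one set for all levels combined (the extra row that PRINTALLTYPES adds). The yields are hypothetical, not the Statdata.Popcorn data. The quartiles come from Python's `statistics.quantiles` with `method="inclusive"`, which may not match SAS's default percentile definition (QNTLDEF=5), so the QRANGE values can differ slightly from SAS output.

```python
from statistics import mean, median, quantiles, stdev, variance

def describe(values):
    """N, mean, median, std, var, range, and interquartile range of a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # quartile rule may differ from SAS's default
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "std": stdev(values),
        "var": variance(values),
        "range": max(values) - min(values),
        "qrange": q3 - q1,
    }

# Hypothetical yields by popcorn Type (not the actual Statdata.Popcorn values).
popcorn = {
    "Gourmet": [8.2, 8.6, 10.4, 9.2, 9.9],
    "Regular": [6.1, 7.0, 6.6, 7.8, 6.9],
}

# One summary per level of the class variable ...
summaries = {level: describe(ys) for level, ys in popcorn.items()}
# ... plus one for all levels combined, which is what PRINTALLTYPES adds.
summaries["All"] = describe([y for ys in popcorn.values() for y in ys])

for level, stats in summaries.items():
    print(level, {name: round(value, 2) for name, value in stats.items()})
```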

3. Read the following statement about the central limit theorem and choose the answer that contains the correct values for all of the missing fields:

The central limit theorem states that the distribution of sample __(1)__ is approximately __(2)__, regardless of the distribution of the population data, as long as the sample size is at least n = __(3)__.

 a. means, skewed, 20

 b. variance, equal, 30

 c. means, normal, 30

 d. proportions, equal, 10

Your answer: c
Correct answer: c

The central limit theorem states that the distribution of sample means is approximately normal, regardless of the distribution of the population data, as long as the sample size is large enough. The course uses n = 30 as the rule of thumb; strongly skewed populations can need larger samples.

Review: Normality and the Central Limit Theorem
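A quick simulation makes the theorem concrete. This Python sketch (hypothetical setup, fixed seed) draws 5,000 samples of size 30 from a strongly right-skewed exponential population. It checks that the sample means center on the population mean, spread out by about σ/√n, and fall within ±1.96 standard errors roughly 95% of the time, as a normal distribution would.

```python
import random
from statistics import mean, stdev

random.seed(240)  # fixed seed so the run is reproducible

def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, sd 1, strongly right-skewed)."""
    return sum(random.expovariate(1.0) for _ in range(n)) / n

n = 30
means = [sample_mean(n) for _ in range(5000)]

# The CLT predicts the sample means center on the population mean (1.0)
# with a spread close to sigma / sqrt(n) = 1 / sqrt(30), about 0.183.
print(f"mean of sample means = {mean(means):.3f}")
print(f"sd of sample means   = {stdev(means):.3f}")

# If the means are roughly normal, about 95% lie within 1.96 standard errors.
se = 1 / n ** 0.5
inside = sum(abs(m - 1.0) <= 1.96 * se for m in means) / len(means)
print(f"share within 1.96 SE = {inside:.3f}")
```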


4. Psychologists at a college want to know if students are sleeping more or less than the recommended average of 8 hours a day.

Which of the following code choices correctly tests the null hypothesis?

 a. proc univariate data=statdata.sleep mu0<>8; var hours; run;

 b. proc univariate data=statdata.sleep; var hours / mu0=8; run;

 c. proc univariate data=statdata.sleep; var hours / mu0<>8; run;

 d. proc univariate data=statdata.sleep mu0=8; var hours; run;

Your answer: d
Correct answer: d

You specify the MU0= option as part of the PROC UNIVARIATE statement to indicate the test value of the null hypothesis. The alternative hypothesis is that μ is not equal to 8 hours, but this does not need to be specified in the PROC UNIVARIATE code.

Review: Using PROC UNIVARIATE to Generate a t Statistic
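The t statistic behind the MU0=8 test is (x̄ − μ0)/(s/√n) with n − 1 degrees of freedom. This Python sketch computes it for hypothetical sleep data (not the course's Statdata.Sleep values). The two-sided p-value, Pr > |t|, needs the t distribution, which Python's standard library does not provide, so that step is left out.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """t statistic and degrees of freedom for H0: mu = mu0."""
    n = len(data)
    se = stdev(data) / sqrt(n)  # standard error of the mean
    return (mean(data) - mu0) / se, n - 1

# Hypothetical hours of sleep per night.
hours = [7.0, 6.5, 8.0, 7.5, 6.0, 7.0, 8.5, 6.5]
t, df = one_sample_t(hours, mu0=8)
print(f"t = {t:.3f} on {df} df")  # negative t: this sample averages less than 8 hours
```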

5. How do you define the term power?

 a. the measure of the ability of the statistical hypothesis test to reject the null hypothesis when it is actually false

 b. the probability of committing a Type I error

 c. the probability of failing to reject the null hypothesis when it is actually false

Your answer: a
Correct answer: a

Power is the ability of a statistical test to detect a true difference, that is, the probability of correctly rejecting a false null hypothesis. The probability of committing a Type I error is α. The probability of failing to reject the null hypothesis when it is actually false is β, the probability of a Type II error, so power equals 1 − β.

Review: Types of Errors and Power
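Power can be computed directly for a simple case. This Python sketch assumes a two-sided one-sample z-test with a known standard deviation (a textbook simplification, not what SAS computes for a t-test); the function name and inputs are hypothetical. It shows that power equals α when there is no true difference and grows with the sample size.

```python
from statistics import NormalDist

def power_two_sided_z(delta, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test when the true mean differs from mu0 by delta.

    Assumes sigma is known. The Type II error probability is beta = 1 - power.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = delta / (sigma / n ** 0.5)  # true difference measured in standard errors
    # The test rejects when the statistic lands in either tail.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

for n in (10, 30, 100):
    print(n, round(power_two_sided_z(delta=0.5, sigma=2.0, n=n), 3))
```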

6. Select the choice that lists only continuous variables.

 a. body temperature, number of children, gender, beverage size

 b. age, body temperature, gas mileage, income

 c. number of children, gender, gas mileage, income

 d. gender, gas mileage, beverage size, income

Your answer: b
Correct answer: b

The continuous variables are age, body temperature, gas mileage, and income.

Review: Types of Variables: Quantitative and Categorical

7. Which of the following code choices creates a histogram for the variable Speed from the data set SpeedTest with a normal curve overlay and a box with the skewness and kurtosis statistics printed in the northeast corner?

 a. proc univariate data=statdata.speedtest; histogram Speed / normal(mu=est sigma=est); inset skewness kurtosis; run;

 b. proc univariate data=statdata.speedtest; histogram Speed / normal; inset skewness kurtosis / position=ne; run;

 c. proc univariate data=statdata.speedtest; histogram Speed / normal(mu=est sigma=est); inset skewness kurtosis / position=ne; run;

 d. proc univariate data=statdata.speedtest; histogram Speed / normal(skewness kurtosis); run;

Your answer: c
Correct answer: c

In the HISTOGRAM statement, you specify the Speed variable and the NORMAL option using estimates of the population mean and the population standard deviation. In the INSET statement, you specify the keywords SKEWNESS and KURTOSIS, as well as the POSITION=NE option.

Review: The UNIVARIATE Procedure
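For reference, this Python sketch computes sample skewness and excess kurtosis with the bias-adjusted formulas, which are assumed here to match SAS's default (VARDEF=DF). The MU=EST and SIGMA=EST estimates used for the normal overlay are simply the sample mean and standard deviation. The data values are hypothetical.

```python
from statistics import mean, stdev

def skewness(x):
    """Bias-adjusted sample skewness (0 for symmetric data)."""
    n, m, s = len(x), mean(x), stdev(x)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in x)

def kurtosis(x):
    """Bias-adjusted sample excess kurtosis (about 0 for normal data)."""
    n, m, s = len(x), mean(x), stdev(x)
    fourth = sum(((v - m) / s) ** 4 for v in x)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]  # hypothetical Speed values with a long right tail
print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed), kurtosis(right_skewed))
```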

8. Select the statement below that incorrectly interprets a 95% confidence interval (15.02, 15.04) for the population mean, if the sample mean is 15.03 ounces of cereal.

 a. You are 95% confident that the true average weight for a box of cereal is between 15.02 and 15.04 ounces.

 b. The probability is .95 that the true average weight is between 15.02 and 15.04 ounces.

 c. In the long run, approximately 95% of the intervals calculated with this procedure will capture the true average weight.

Your answer: c
Correct answer: b

A 95% confidence interval means that you are 95% confident that the interval contains the true population mean. If you sample repeatedly and calculate a confidence interval from each sample, about 95% of those intervals will contain the true population mean. A confidence interval is not a probability statement about a single interval: once an interval is calculated, the true mean either is in it or is not. That is why statement b, which assigns a probability of .95 to this particular interval, is the incorrect interpretation.

Review: Confidence Intervals
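The long-run interpretation can be checked by simulation. This Python sketch uses a hypothetical true mean, standard deviation, and sample size, and a known-σ interval for simplicity. It builds 10,000 intervals and counts how many capture the true mean.

```python
import random
from statistics import NormalDist

random.seed(15)

MU, SIGMA, N = 15.0, 0.05, 25      # hypothetical true mean weight, sd, and sample size
z = NormalDist().inv_cdf(0.975)    # about 1.96
half_width = z * SIGMA / N ** 0.5  # known-sigma interval keeps the sketch short

reps, hits = 10_000, 0
for _ in range(reps):
    xbar = sum(random.gauss(MU, SIGMA) for _ in range(N)) / N
    # Any single interval either contains MU or it does not.
    hits += (xbar - half_width) <= MU <= (xbar + half_width)

coverage = hits / reps
print(f"share of intervals that captured MU = {coverage:.3f}")  # close to 0.95
```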

9. The shape of a normal distribution depends on the value of which two parameters?

 a. the mean (x̄) and the standard deviation (s)

 b. the standard deviation (σ) and the variance (σ²)

 c. the mean (µ) and the standard deviation (σ)

 d. none of the above

Your answer: c
Correct answer: c

The shape of a normal distribution depends on the value of two parameters, the mean (µ) and the standard deviation (σ).

Review: Normal Distribution

10. The standard error of the mean is

 a. used to calculate confidence intervals of the mean.

 b. always normally distributed.

 c. sometimes less than 0.

 d. none of the above

Your answer: a
Correct answer: a

The standard error of the mean is part of the equation used to calculate a confidence interval of the mean. It is not normally distributed, and it is never less than 0.

Review: Point Estimators, Variability, and Standard Error, Interval Estimators
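As a sketch of where the standard error enters, this Python function computes SE = s/√n and the interval x̄ ± t·SE. The weights are hypothetical, and the t critical value (2.262 for 9 degrees of freedom at 95% confidence) is passed in because Python's standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci(data, t_crit):
    """Return (xbar, se, lower, upper) for a t-based confidence interval of the mean."""
    xbar = mean(data)
    se = stdev(data) / sqrt(len(data))  # a standard deviation over sqrt(n), so never negative
    return xbar, se, xbar - t_crit * se, xbar + t_crit * se

# Hypothetical cereal box weights in ounces.
weights = [15.01, 15.04, 15.03, 15.02, 15.05, 15.03, 15.02, 15.04, 15.03, 15.03]
xbar, se, lo, hi = mean_ci(weights, t_crit=2.262)  # t(0.975, df=9) is about 2.262
print(f"mean = {xbar:.3f}, SE = {se:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```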

Quiz, Lesson 2: Analysis of Variance (ANOVA)

Select the best answer for each question. When you are finished, click Submit Quiz.

1. Given this SAS output, is there sufficient evidence to reject the assumption of equal variances?

 a. yes

 b. no

2. Given this SAS output, is there sufficient evidence to reject the hypothesis of equal means?

 a. yes

 b. no

3. A cereal manufacturer uses two different processes to package boxes of cereal. He wants to be sure the two processes are putting the same amount of cereal in each box. He plans to perform a two-sample t-test to determine whether the mean weight of cereal is significantly different between the two processes. What type of test should he run?

 a. an upper-tailed t-test

 b. a two-sided t-test

 c. a lower-tailed t-test

4. Which of the following is not an assumption you make when including a blocking factor in an ANOVA randomized block design?

 a. The treatments are randomly assigned within each block.

 b. The errors are normally distributed.

 c. The effects of the treatment factor are constant across the levels of the blocking variable.

 d. The observations are dependent.

5. When you perform ANOVA for a randomized block design, where do you indicate your blocking variable to SAS?

 a. PROC GLM statement

 b. MODEL statement only

 c. CLASS statement and MODEL statement

 d. LSMEANS statement

6. If your blocking variable has a very small F-value in the ANOVA report, what would be a valid next step?

 a. Remove it from the MODEL statement and re-run the analysis.

 b. Test an interaction term.

 c. Report the F-value and plan a new study.

7. The Dunnett method compares all possible pairs of means, so it can only be used when you make pairwise comparisons.

 a. true

 b. false

8. You can examine Levene's Test for Homogeneity to more formally test which of the following assumptions?

 a. the assumption of errors being normally distributed

 b. the assumption of independent observations

 c. the assumption of equal variances

 d. the assumption of treatments being randomly assigned

9. When you perform a two-way ANOVA in SAS, which of the following statements correctly defines the model that includes the interaction between the two main effect variables?

 a. class Drug*Disease;

 b. class Drug=Disease;

 c. model Drug*Disease;

 d. model Health=Drug Disease Drug*Disease;

10. This table shows output from a post hoc pairwise comparison in which you tested the significance of a drug on patients' health for three different diseases. What conclusion can you make based on this output?

 a. The drug effect is significant when used in patients with disease Z.

 b. The drug effect is significant when used in patients with diseases Y and Z.

 c. The drug effect is not significant when used in patients with disease Z.

Quiz, Lesson 3: Regression

Select the best answer for each question. When you are finished, click Submit Quiz.

1. Based on this correlation matrix, what type of relationship do Performance and RunTime have?

Pearson Correlation Coefficients, N = 31
Prob > |r| under H0: Rho=0

              Performance     RunTime         Age
Performance       1.00000    -0.82049    -0.71257
                               <.0001      <.0001
RunTime          -0.82049     1.00000     0.19523
                   <.0001                  0.2926
Age              -0.71257     0.19523     1.00000
                   <.0001      0.2926

 a. a fairly strong, positive linear relationship

 b. a fairly strong, negative linear relationship

 c. a fairly weak, positive linear relationship

 d. a fairly weak, negative linear relationship

2. When Oxygen_Consumption is regressed on RunTime, Age, Run_Pulse, and Maximum_Pulse, the parameter estimate for Age is -2.78. What does this mean?

 a. For each year older, the predicted value of oxygen consumption is 2.78 greater.

 b. For each year older, the predicted value of oxygen consumption is 2.78 lower.

 c. For every 2.78 years older, oxygen consumption doubles.

 d. For every 2.78 years younger, oxygen consumption doubles.

3. Given the information in this summary of variable selection, which stepwise selection method was specified in the PROC REG step?

      Variable   Variable   Number    Partial    Model
Step  Entered    Removed    Vars In   R-Square   R-Square   C(p)     F Value   Pr > F
1     RunTime               1         0.7434     0.7434     3.3432   84.00     <.0001
2     Age                   2         0.0213     0.7647     2.8192   2.54      0.1222

 a. FORWARD

 b. BACKWARD

 c. STEPWISE

 d. can't tell from the information given

4. Here is a table of output statistics from PROC REG. If you sample a new value of the dependent variable when Performance equals 55, what are the lower and upper prediction limits for this newly sampled individual value?

Output Statistics

                      Dependent   Predicted   Std Error
Name    Performance   Variable    Value       Mean Predict   95% CL Mean         95% CL Predict
Jack    48            40.8400     44.9026     1.0190         42.0732  47.7319    37.4190  52.3861
Annie   43            45.1200     45.3793     1.3081         41.7475  49.0112    37.5570  53.2016
Kate    55            44.7500     44.2351     1.4885         40.1023  48.3678    36.1680  52.3021
Carl    40            46.0800     45.6654     1.6493         41.0862  50.2446    37.3608  53.9700
Don     58            44.6100     43.9490     1.8646         38.7719  49.1261    35.3003  52.5977
Effie   45            47.9200     45.1886     1.1361         42.0343  48.3429    37.5763  52.8009

 a. 44.7500 and 44.2351

 b. 40.1023 and 48.3678

 c. 36.1680 and 52.3021

 d. can't tell from the information given

5. Which of the following statements describes a positive linear relationship between two variables?

 1. The more I eat, the less I want to exercise.
 2. The more salty snacks I eat, the more water I want to drink.
 3. No matter how much I exercise, I still weigh the same.

 a. 1 only

 b. 1 and 2

 c. 2 only

 d. 2 and 3

 e. 3 only

6. What output does this program produce?

proc corr data=statdata.bodyfat2 nosimple plots=matrix(nvar=all histogram);
   var Age Weight Height;
run;

 a. individual correlation plots and simple descriptive statistics

 b. a scatter plot matrix only, with histograms along its diagonal

 c. a table of correlations and a scatter plot matrix with histograms along its diagonal

 d. can't tell from the information given

7. How many of the following models meet Mallows' Cp criterion for model selection?

Model   Number
Index   in Model   C(p)     R-Square   Variables in Model
1       7          5.8653   0.7445     Age Weight Neck Abdomen Thigh Forearm Wrist
2       8          5.8986   0.7466     Age Weight Neck Abdomen Hip Thigh Forearm Wrist
3       8          6.4929   0.7459     Age Weight Neck Abdomen Thigh Biceps Forearm Wrist
4       9          6.7834   0.7477     Age Weight Neck Abdomen Hip Thigh Biceps Forearm Wrist
5       7          6.9017   0.7434     Age Weight Neck Abdomen Biceps Forearm Wrist

 a. 0

 b. 1

 c. 3

 d. 5

8. In this PROC SCORE step, which option specifies the data set containing the parameter estimates that are used to score observations?

proc score data=dataset1 score=dataset2 out=dataset3 type=parms;
   var Performance;
run;

 a. the DATA= option

 b. the SCORE= option

 c. the OUT= option

9. According to these parameter estimates, are any of the variables in the model statistically significant in predicting or explaining the percentage of body fat?

Parameter Estimates

                   Parameter   Standard
Variable    DF     Estimate    Error      t Value   Pr > |t|
Intercept   1     -20.98714    5.55433    -3.78     0.0002
Age         1       0.01226    0.02836     0.43     0.6658
Hip         1      -0.40163    0.09994    -4.02     <.0001
Abdomen     1       0.86123    0.06814    12.64     <.0001

 a. no

 b. yes, Age

 c. yes, Hip and Abdomen

 d. yes, Age, Hip, and Abdomen

10. What output does this program produce?

proc reg data=statdata.bodyfat2 plots(only)=(cp);
   model PctBodyFat2 = Age Weight Height Neck Chest Abdomen Hip Thigh
                       Knee Ankle Biceps Forearm Wrist / selection=cp best=15;
run;
quit;

 a. only models that meet both Mallows' and Hocking's Cp criteria for model selection

 b. the best 15 models that meet the criteria for the forward, backward, and stepwise selection methods

 c. a set of the best 15 candidate models according to the Cp statistic generated using the all-possible regressions technique

 d. can't tell from the information given

Quiz Feedback, Lesson 2: Analysis of Variance (ANOVA) 

Your Score: 90%

Congratulations! Your score of 90% indicates that you've mastered the topics in this lesson. If you'd like, check the feedback below and select Review links for any sections containing topics you want to review.


When you're ready to start the next lesson, select the lesson in the Course Menu. Then click a topic in the Lesson Menu to begin.

1. Given this SAS output, is there sufficient evidence to reject the assumption of equal variances?

 a. yes

 b. no

Your answer: b
Correct answer: b

The p-value of 0.2942 is greater than 0.05, so you fail to reject the null hypothesis of equal variances. There is not enough evidence that the variances differ, so the equal-variance assumption is reasonable.

Review: The GLM Procedure

2. Given this SAS output, is there sufficient evidence to reject the hypothesis of equal means?

 a. yes

 b. no

Your answer: a
Correct answer: a

The p-value of <.001 is less than 0.05, so you reject the null hypothesis and conclude that the means of the two groups are significantly different.

Review: Examining the Equal Variance t-Test and p-Values

3. A cereal manufacturer uses two different processes to package boxes of cereal. He wants to be sure the two processes are putting the same amount of cereal in each box. He plans to perform a two-sample t-test to determine whether the mean weight of cereal is significantly different between the two processes. What type of test should he run?

 a. an upper-tailed t-test

 b. a two-sided t-test

 c. a lower-tailed t-test

Your answer: x
Correct answer: b

Because the cereal manufacturer wants to know whether the two processes produce a different mean cereal weight, in either direction, he needs to perform a two-sided t-test.

Review: Scenario: Comparing Group Means, Scenario: Testing for Differences on One Side
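The difference between one- and two-sided tests lies only in which tail areas count as evidence. This Python sketch computes a pooled two-sample t statistic for hypothetical box weights and then the upper-tailed, lower-tailed, and two-sided p-values. The p-values use a normal approximation to the t distribution, which is reasonable here only because each sample has 50 observations; it is not exactly what PROC TTEST reports.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance

def pooled_t(a, b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)  # pooled variance
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb)), na + nb - 2

def tail_p_values(t):
    """Upper, lower, and two-sided p-values using a normal approximation."""
    z = NormalDist()
    upper = 1 - z.cdf(t)               # H1: mean_a > mean_b
    lower = z.cdf(t)                   # H1: mean_a < mean_b
    two_sided = 2 * min(upper, lower)  # H1: mean_a != mean_b, either direction
    return upper, lower, two_sided

random.seed(3)
process_1 = [random.gauss(15.00, 0.05) for _ in range(50)]  # hypothetical box weights
process_2 = [random.gauss(15.02, 0.05) for _ in range(50)]
t, df = pooled_t(process_1, process_2)
upper, lower, two_sided = tail_p_values(t)
print(f"t = {t:.2f} (df = {df}); upper = {upper:.4f}, lower = {lower:.4f}, two-sided = {two_sided:.4f}")
```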

4. Which of the following is not an assumption you make when including a blocking factor in an ANOVA randomized block design?

 a. The treatments are randomly assigned within each block.

 b. The errors are normally distributed.

 c. The effects of the treatment factor are constant across the levels of the blocking variable.

 d. The observations are dependent.

Your answer: d
Correct answer: d

In an ANOVA model, you assume that the errors are normally distributed for each treatment, the errors have equal variances across treatments, and the observations are independent. When you add a blocking factor to your ANOVA model, you also assume that the treatments are randomly assigned within each block and that the effects of the treatment are the same within each block. Dependent observations are therefore not an assumption.

Review: More ANOVA Assumptions

5. When you perform ANOVA for a randomized block design, where do you indicate your blocking variable to SAS?

 a. PROC GLM statement

 b. MODEL statement only

 c. CLASS statement and MODEL statement

 d. LSMEANS statement

Your answer: c
Correct answer: c

You list the blocking variable in the CLASS statement. You also specify every term of the ANOVA model in the MODEL statement, so the blocking variable appears there as well.

Review: Performing ANOVA with Blocking
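Listing the blocking variable in the model gives it its own sum of squares and F value. This Python sketch computes those by hand for a hypothetical randomized complete block design with one observation per block-treatment cell. It illustrates the arithmetic only and is not the PROC GLM implementation.

```python
from statistics import mean

def rbd_anova(table):
    """F values for a randomized complete block design with one observation per cell.

    table[i][j] is the response for block i and treatment j.
    """
    b, t = len(table), len(table[0])
    grand = mean([v for row in table for v in row])
    block_means = [mean(row) for row in table]
    trt_means = [mean([row[j] for row in table]) for j in range(t)]

    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
    ss_block = t * sum((m - grand) ** 2 for m in block_means)
    ss_error = ss_total - ss_trt - ss_block  # variation left after both effects

    df_trt, df_block, df_error = t - 1, b - 1, (t - 1) * (b - 1)
    mse = ss_error / df_error
    return {
        "F_treatment": (ss_trt / df_trt) / mse,
        "F_block": (ss_block / df_block) / mse,  # below 1 suggests blocking did not help
        "df_error": df_error,
    }

# Hypothetical data: 4 blocks (rows) by 3 treatments (columns).
responses = [
    [10, 12, 15],
    [11, 14, 16],
    [9, 12, 13],
    [12, 15, 17],
]
result = rbd_anova(responses)
print(result)
```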

6. If your blocking variable has a very small F-value in the ANOVA report, what would be a valid next step?

 a. Remove it from the MODEL statement and re-run the analysis.

 b. Test an interaction term.

 c. Report the F-value and plan a new study.

Your answer: c
Correct answer: c

If the F-value for the blocking variable is small, that is, less than 1, then adding the blocking factor did not help your analysis. Because the data were collected according to the blocking factor, the blocking factor must stay in every ANOVA model that you fit to the sample you have already collected. Your only choice is to report the F-value and plan a new study.

Review: Performing ANOVA with Blocking

7. The Dunnett method compares all possible pairs of means, so it can only be used when you make pairwise comparisons.

 a. true

 b. false

Your answer: b
Correct answer: b

The Tukey method and pairwise t-tests are the two methods you learned about that compare all possible pairs of means, so they are the ones used when you make all pairwise comparisons. The Dunnett method instead compares each group with a single control group, so the statement is false.

Review: Dunnett's Multiple Comparison Method, Tukey's Multiple Comparison Method
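The two methods differ in which comparisons make up the family being tested. This Python sketch lists them for hypothetical group labels: with four groups, Tukey's family contains all six pairs, while Dunnett's contains only the three comparisons with the control.

```python
from itertools import combinations

def tukey_pairs(groups):
    """All pairwise comparisons (the family examined by Tukey and pairwise t-tests)."""
    return list(combinations(groups, 2))

def dunnett_pairs(groups, control):
    """Each group compared with a single control group (the family examined by Dunnett)."""
    return [(g, control) for g in groups if g != control]

groups = ["Placebo", "DrugA", "DrugB", "DrugC"]  # hypothetical group labels
print(len(tukey_pairs(groups)), tukey_pairs(groups))
print(len(dunnett_pairs(groups, "Placebo")), dunnett_pairs(groups, "Placebo"))
```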

8. You can examine Levene's Test for Homogeneity to more formally test which of the following assumptions?

 a. the assumption of errors being normally distributed

 b. the assumption of independent observations

 c. the assumption of equal variances

 d. the assumption of treatments being randomly assigned

Your answer: c
Correct answer: c

You use Levene's Test for Homogeneity in PROC GLM to check the assumption of equal variances in a one-way ANOVA model.

Review: The GLM Procedure
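Levene's test is itself a one-way ANOVA, run on how far each observation lies from its group mean. This Python sketch uses squared deviations, which is assumed here to match the default form of PROC GLM's HOVTEST=LEVENE option; it returns only the F statistic, not a p-value. The group data are hypothetical.

```python
from statistics import mean

def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers."""
    all_vals = [v for g in groups for v in g]
    grand = mean(all_vals)
    k, n = len(groups), len(all_vals)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def levene_squared(groups):
    """Levene's test statistic: one-way ANOVA on squared deviations from each group mean."""
    sq_devs = [[(v - mean(g)) ** 2 for v in g] for g in groups]
    return one_way_f(sq_devs)

same_spread = [[1, 2, 3], [11, 12, 13], [5, 6, 7]]  # different centers, identical spread
different_spread = [[1, 2, 3, 4, 5], [0, 5, 10, 15, 20]]
print(levene_squared(same_spread), levene_squared(different_spread))
```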

12.

9. When you perform a two-way ANOVA in SAS, which of the following statements correctly defines the model that includes the interaction between the two main effect variables?

 a. class Drug*Disease;
 b. class Drug=Disease;
 c. model Drug*Disease;
 d. model Health=Drug Disease Drug*Disease;

Your answer: d
Correct answer: d

In the MODEL statement, you first list the main effect variables as they appear in the two-way ANOVA model. You then define the interaction term by joining the two main effect variables with an asterisk.

Review: Performing Two-Way ANOVA with Interactions, Applying the Two-Way ANOVA Model
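The following sketch shows the two-way model with interaction in PROC GLM. It does not use the course's Drug/Disease/Health data. Instead, SASHELP.HEART stands in as an assumption, with Sex and BP_Status as the two main effects.

```sas
/* Stand-in data (assumption): Sex and BP_Status play the two main effects */
proc glm data=sashelp.heart;
   class Sex BP_Status;
   /* main effects first, then the interaction joined by an asterisk */
   model Cholesterol = Sex BP_Status Sex*BP_Status;
run;
quit;
```

The bar operator is shorthand for the same model: `model Cholesterol = Sex|BP_Status;` expands to both main effects plus their interaction.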

10. This table shows output from a post hoc pairwise comparison in which you tested the significance of a drug on patients' health for three different diseases. What conclusion can you make based on this output?


 a. The drug effect is significant when used in patients with disease Z.
 b. The drug effect is significant when used in patients with diseases Y and Z.
 c. The drug effect is not significant when used in patients with disease Z.

Your answer: c
Correct answer: c

The p-value for disease Z is 0.7815. Because this p-value is greater than your alpha of 0.05, you fail to reject the null hypothesis and conclude that there is no significant effect of Drug on Health for patients with disease Z.

Review: Performing a Post Hoc Pairwise Comparison
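A post hoc comparison like the one in this question is typically requested by slicing the interaction LS-means, so that one factor's effect is tested separately within each level of the other. This sketch uses the same stand-in assumptions as the interaction example: SASHELP.HEART, with Sex playing the role of Drug and BP_Status the role of Disease.

```sas
/* Stand-in data (assumption): Sex ~ Drug, BP_Status ~ Disease */
proc glm data=sashelp.heart;
   class Sex BP_Status;
   model Cholesterol = Sex|BP_Status;       /* Sex BP_Status Sex*BP_Status */
   /* test the Sex effect separately within each level of BP_Status */
   lsmeans Sex*BP_Status / slice=BP_Status;
run;
quit;
```

Each row of the sliced table has its own p-value. You compare it with alpha in the same way as in the disease Z example.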

Quiz Feedback, Lesson 3: Regression 

Your Score: 60%

Your score of 60% indicates that you would benefit from reviewing topics in this lesson. Check the feedback below and select Review links for questions you missed. When you're ready, take the quiz again.

1. Based on this correlation matrix, what type of relationship do Performance and RunTime have?


Pearson Correlation Coefficients, N = 31
Prob > |r| under H0: Rho=0

              Performance     RunTime         Age
Performance       1.00000    -0.82049    -0.71257
                               <.0001      <.0001
RunTime          -0.82049     1.00000     0.19523
                   <.0001                  0.2926
Age              -0.71257     0.19523     1.00000
                   <.0001      0.2926

 a. a fairly strong, positive linear relationship
 b. a fairly strong, negative linear relationship
 c. a fairly weak, positive linear relationship
 d. a fairly weak, negative linear relationship

Your answer: b
Correct answer: b

The correlation coefficient for the relationship between Performance and RunTime is -0.82049, which is negative. Its absolute value is also close to 1, which makes it a relatively strong relationship.

Review: Using Correlation to Measure Relationships between Continuous Variables

2. When Oxygen_Consumption is regressed on RunTime, Age, Run_Pulse, and Maximum_Pulse, the parameter estimate for Age is -2.78. What does this mean?

 a. For each year older, the predicted value of oxygen consumption is 2.78 greater.
 b. For each year older, the predicted value of oxygen consumption is 2.78 lower.
 c. For every 2.78 years older, oxygen consumption doubles.
 d. For every 2.78 years younger, oxygen consumption doubles.

Your answer: d
Correct answer: b

The parameter estimate for Age is the average change in predicted Oxygen_Consumption for a 1-unit change in Age, holding the other predictors constant. In this case, the parameter estimate is negative, so for each year older (a 1-unit change in Age), predicted oxygen consumption decreases by 2.78 units.

Review: The Simple Linear Regression Model
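Because the model is linear in Age, the change in the predicted value equals the estimate times the change in Age, with the other predictors held fixed. The small DATA step below makes the arithmetic concrete. The -2.78 comes from the question, and the 1-, 5-, and 10-year differences are arbitrary illustrations.

```sas
/* Arithmetic only: no model is fitted here */
data age_effect;
   b_age = -2.78;                         /* Age parameter estimate from the question */
   do years_older = 1, 5, 10;
      /* other predictors held fixed, so only Age contributes to the change */
      change_in_pred = b_age * years_older;
      output;
   end;
run;

proc print data=age_effect noobs;
run;
```

The three rows give changes of -2.78, -13.9, and -27.8: the effect scales with the difference in Age and does not double at any point.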


3. Given the information in this summary of variable selection, which stepwise selection method was specified in the PROC REG step?

Step  Variable   Variable   Number   Partial    Model      C(p)     F Value  Pr > F
      Entered    Removed    Vars In  R-Square   R-Square
1     RunTime               1        0.7434     0.7434     3.3432   84.00    <.0001
2     Age                   2        0.0213     0.7647     2.8192   2.54     0.1222

 a. FORWARD
 b. BACKWARD
 c. STEPWISE
 d. can't tell from the information given

Your answer: d
Correct answer: c

The summary table contains both Variable Entered and Variable Removed columns. Of the three types of stepwise selection (forward, backward, and stepwise), only stepwise selection can both enter and remove variables. Therefore, STEPWISE must have been specified in the PROC REG step.

Review: The Stepwise Selection Approach to Model Building, Specifying Stepwise Selection Methods in SAS, The REG Procedure: Performing Stepwise Regression
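The following sketch requests stepwise selection in PROC REG. It assumes SASHELP.CARS as stand-in data and does not use the course's fitness data. SLENTRY= and SLSTAY= are shown explicitly at 0.15, which are the defaults for SELECTION=STEPWISE.

```sas
/* Stand-in data (assumption): SASHELP.CARS, MPG_City as the response */
proc reg data=sashelp.cars;
   model MPG_City = EngineSize Cylinders Horsepower Weight Wheelbase Length
         / selection=stepwise slentry=0.15 slstay=0.15;
run;
quit;
```

Only a stepwise summary can show entries in both the Variable Entered and Variable Removed columns. FORWARD never removes a variable, and BACKWARD never enters one.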

4. Here is a table of output statistics from PROC REG. If you sample a new value of the dependent variable when Performance equals 55, what are the lower and upper prediction limits for this newly sampled individual value?

Output Statistics

Obs  Name   Performance  Dependent  Predicted  Std Error     95% CL Mean         95% CL Predict      Residual
                         Variable   Value      Mean Predict
1    Jack   48           40.8400    44.9026    1.0190        42.0732  47.7319    37.4190  52.3861    -4.0626
2    Annie  43           45.1200    45.3793    1.3081        41.7475  49.0112    37.5570  53.2016    -0.2593
3    Kate   55           44.7500    44.2351    1.4885        40.1023  48.3678    36.1680  52.3021     0.5149
4    Carl   40           46.0800    45.6654    1.6493        41.0862  50.2446    37.3608  53.9700     0.4146
5    Don    58           44.6100    43.9490    1.8646        38.7719  49.1261    35.3003  52.5977     0.6610
6    Effie  45           47.9200    45.1886    1.1361        42.0343  48.3429    37.5763  52.8009     2.7314

 a. 44.7500 and 44.2351
 b. 40.1023 and 48.3678
 c. 36.1680 and 52.3021
 d. can't tell from the information given

Your answer: d
Correct answer: c

The CLI option, which displays the 95% CL Predict column in the Output Statistics table, produces confidence limits for an individual predicted value (prediction limits). In this table, the third observation, for Kate, has the value 55 for Performance. Therefore, the values in her 95% CL Predict column, 36.1680 and 52.3021, are the lower and upper limits for a new individual value at the same value of Performance. In contrast, the CLM option displays the values in the 95% CL Mean column, which are the lower and upper confidence limits for the mean predicted value at each observation's predictor values.

Review: Specifying Confidence and Prediction Intervals in SAS, Viewing and Printing Confidence Intervals and Prediction Intervals, The REG Procedure: Producing Predicted Values
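The sketch below shows the MODEL statement options that produce the two interval columns, using SASHELP.CLASS as stand-in data (an assumption). To get limits at a new predictor value, one common approach is to append an observation whose response is missing. PROC REG does not use that observation to fit the model, but it should still report a predicted value and limits for it. The height of 60 and the name 'New' are hypothetical.

```sas
/* Append a hypothetical new individual: Height=60, Weight unknown */
data class_plus;
   set sashelp.class end=last;
   output;
   if last then do;
      Name   = 'New';
      Height = 60;
      Weight = .;        /* missing response: excluded from the fit */
      output;
   end;
run;

proc reg data=class_plus;
   id Name;                          /* label rows by Name, as in the table above */
   model Weight = Height / clm cli;  /* CLM: 95% CL Mean, CLI: 95% CL Predict */
run;
quit;
```

For the appended row, the 95% CL Predict limits are always wider than the 95% CL Mean limits, because they also include the variability of an individual response around the mean.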

5. Which of the following statements describes a positive linear relationship between two variables?

1. The more I eat, the less I want to exercise.
2. The more salty snacks I eat, the more water I want to drink.
3. No matter how much I exercise, I still weigh the same.

 a. 1 only
 b. 1 and 2
 c. 2 only
 d. 2 and 3
 e. 3 only

Your answer: c
Correct answer: c

In statement 2, the amount of salty snacks eaten and thirst have a positive linear relationship: as the values of one variable (amount of salty snacks eaten) increase, the values of the other variable (thirst) increase as well. Statement 1 describes a negative relationship, and statement 3 describes no relationship.

Review: Using Scatter Plots to Describe Relationships between Continuous Variables, Using Correlation to Measure Relationships between Continuous Variables

6. What output does this program produce?

proc corr data=statdata.bodyfat2 nosimple
          plots=matrix(nvar=all histogram);
   var Age Weight Height;
run;

 a. individual correlation plots and simple descriptive statistics
 b. a scatter plot matrix only, with histograms along its diagonal
 c. a table of correlations and a scatter plot matrix with histograms along its diagonal
 d. can't tell from the information given

Your answer: b
Correct answer: c

By default, PROC CORR produces a table of correlations (which can be a correlation matrix, depending on your program). The NOSIMPLE option suppresses printing of the simple descriptive statistics for each variable, and PLOTS=MATRIX requests a scatter plot matrix instead of individual scatter plots. The HISTOGRAM option displays histograms of the variables in the VAR statement along the diagonal of the scatter plot matrix.

Review: Producing a Correlation Matrix and a Scatter Plot Matrix
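The sketch below contrasts the matrix form and the individual-plot form of the PLOTS= option. It assumes SASHELP.CLASS as stand-in data and assumes that ODS Graphics is available in your session.

```sas
ods graphics on;

/* Correlation table plus a scatter plot matrix with histograms on the diagonal */
proc corr data=sashelp.class nosimple
          plots=matrix(nvar=all histogram);
   var Age Height Weight;
run;

/* Same correlation table, but individual pairwise scatter plots instead */
proc corr data=sashelp.class nosimple
          plots=scatter(nvar=all);
   var Age Height Weight;
run;
```

Removing NOSIMPLE from either step adds the Simple Statistics table back to the output. The correlation table is printed in both cases.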

7. How many of the following models meet Mallows' Cp criterion for model selection?

Model  Number    C(p)    R-Square  Variables in Model
Index  in Model
1      7         5.8653  0.7445    Age Weight Neck Abdomen Thigh Forearm Wrist
2      8         5.8986  0.7466    Age Weight Neck Abdomen Hip Thigh Forearm Wrist
3      8         6.4929  0.7459    Age Weight Neck Abdomen Thigh Biceps Forearm Wrist
4      9         6.7834  0.7477    Age Weight Neck Abdomen Hip Thigh Biceps Forearm Wrist
5      7         6.9017  0.7434    Age Weight Neck Abdomen Biceps Forearm Wrist

 a. 0
 b. 1
 c. 3
 d. 5

Your answer: d
Correct answer: d

In Mallows' Cp criterion, p equals the number of variables in the model plus 1 for the intercept. Therefore, for these models, p equals 8, 9, or 10, depending on the number of terms in the model. All the C(p) values are less than their respective p values, so all five models meet Mallows' Cp criterion.

Review: Evaluating Models Using Mallows' Cp Statistic, Viewing Mallows' Cp Statistic in PROC REG, The REG Procedure: Using the All-Possible Regressions Technique, The REG Procedure: Using Automatic Model Selection
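The Mallows' Cp check is simple arithmetic. The DATA step below reproduces it for the five models in this question's table. Here, p is the number of variables in the model plus 1, and a model meets the criterion when C(p) is less than or equal to p.

```sas
/* Values copied from the question's table */
data cp_check;
   input ModelIndex NumInModel Cp;
   p = NumInModel + 1;            /* add 1 for the intercept */
   meets_mallows = (Cp <= p);     /* 1 if the model meets Mallows' criterion */
   datalines;
1 7 5.8653
2 8 5.8986
3 8 6.4929
4 9 6.7834
5 7 6.9017
;
run;

proc print data=cp_check noobs;
run;
```

All five rows get meets_mallows=1, which matches answer d.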

8. In this PROC SCORE step, which option specifies the data set containing the parameter estimates that are used to score observations?

proc score data=dataset1 score=dataset2
           out=dataset3 type=parms;
   var Performance;
run;

 a. the DATA= option
 b. the SCORE= option
 c. the OUT= option

Your answer: b
Correct answer: b

The SCORE= option specifies the data set that contains the parameter estimates. PROC SCORE reads the parameter estimates from this data set, scores the observations in the data set that the DATA= option specifies, and writes the scored observations to the data set that the OUT= option specifies.

Review: The SCORE Procedure: Scoring Predicted Values Using Parameter Estimates
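The SCORE= data set is often created with the OUTEST= option of PROC REG, which writes the parameter estimates with _TYPE_='PARMS'. Below is a minimal end-to-end sketch that uses SASHELP.CLASS as stand-in data in place of dataset1 and dataset2. The model label PredWeight is hypothetical. The scored variable should take its name from that label.

```sas
/* Fit the model and save its parameter estimates (_TYPE_='PARMS') */
proc reg data=sashelp.class outest=est noprint;
   PredWeight: model Weight = Height;   /* hypothetical model label */
run;
quit;

/* Apply the estimates: predicted Weight = Intercept + (Height estimate)*Height */
proc score data=sashelp.class score=est out=scored type=parms;
   var Height;                          /* predictors only, as in the quiz step */
run;

proc print data=scored;
run;
```

Listing only the predictors in the VAR statement gives predicted values, which mirrors `var Performance;` in the quiz program.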

9. According to these parameter estimates, are any of the variables in the model statistically significant in predicting or explaining the percentage of body fat?

Parameter Estimates

Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept  1   -20.98714           5.55433         -3.78    0.0002
Age        1   0.01226             0.02836         0.43     0.6658
Hip        1   -0.40163            0.09994         -4.02    <.0001
Abdomen    1   0.86123             0.06814         12.64    <.0001

 a. no
 b. yes, Age
 c. yes, Hip and Abdomen
 d. yes, Age, Hip, and Abdomen

Your answer: c
Correct answer: c

Hip and Abdomen both have p-values lower than 0.05, so they are statistically significant in predicting or explaining the variability of the percentage of body fat. Age, with a p-value of 0.6658, is not.

Review: Performing Simple Linear Regression, Analysis versus Prediction in Multiple Regression, The REG Procedure: Performing Multiple Linear Regression
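The decision rule compares each Pr > |t| value with alpha = 0.05. The small DATA step below applies it to the estimates in this question's table. The value <.0001 is entered as 0.0001, an upper bound, which is enough for the comparison.

```sas
/* p-values copied from the question's table (<.0001 entered as 0.0001) */
data parm_check;
   length Variable $ 9;
   input Variable $ ProbT;
   significant = (ProbT < 0.05);   /* 1 if the p-value is below alpha */
   datalines;
Intercept 0.0002
Age 0.6658
Hip 0.0001
Abdomen 0.0001
;
run;

proc print data=parm_check noobs;
run;
```

Age gets significant=0, and Hip and Abdomen get 1. The intercept's test does not concern any predictor, so it does not count toward the answer.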

10. What output does this program produce?

proc reg data=statdata.bodyfat2 plots(only)=(cp);
   model PctBodyFat2 = Age Weight Height
         Neck Chest Abdomen Hip
         Thigh Knee Ankle Biceps
         Forearm Wrist
         / selection=cp best=15;
run;
quit;

 a. only models that meet both Mallows' and Hocking's Cp criteria for model selection
 b. the best 15 models that meet the criteria for the forward, backward, and stepwise selection methods
 c. a set of the best 15 candidate models according to the Cp statistic generated using the all-possible regressions technique
 d. can't tell from the information given

Your answer: c
Correct answer: c

When you use the all-possible regressions technique, you specify RSQUARE, ADJRSQ, or CP in the SELECTION= option to rank models. The BEST= option selects the specified number of best models based on the SELECTION= statistic. If more than one statistic is specified, the first statistic listed determines the ranking.
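The following sketch runs all-possible regressions and lists more than one statistic. It assumes SASHELP.CARS as stand-in data. Following the explanation above, CP is listed first so that it determines the ranking, and ADJRSQ adds the adjusted R-square to the listing. BEST=10 is an arbitrary choice.

```sas
/* Stand-in data (assumption): SASHELP.CARS, six candidate predictors */
proc reg data=sashelp.cars;
   model MPG_City = EngineSize Cylinders Horsepower Weight Wheelbase Length
         / selection=cp adjrsq best=10;   /* rank by Cp, also show adjusted R-square */
run;
quit;
```

With six candidate predictors there are 63 possible models, and BEST=10 keeps only the ten with the best Cp values in the listing.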

Quiz, Lesson 3: Regression

Select the best answer for each question. When you are finished, click Submit Quiz.

1. Based on this correlation matrix, what type of relationship do Performance and RunTime have?

Pearson Correlation Coefficients, N = 31
Prob > |r| under H0: Rho=0

              Performance     RunTime         Age
Performance       1.00000    -0.82049    -0.71257
                               <.0001      <.0001
RunTime          -0.82049     1.00000     0.19523
                   <.0001                  0.2926
Age              -0.71257     0.19523     1.00000
                   <.0001      0.2926

 a. a fairly strong, positive linear relationship

 b. a fairly strong, negative linear relationship

 c. a fairly weak, positive linear relationship

 d. a fairly weak, negative linear relationship


2. When Oxygen_Consumption is regressed on RunTime, Age, Run_Pulse, and Maximum_Pulse, the parameter estimate for Age is -2.78. What does this mean?

 a. For each year older, the predicted value of oxygen consumption is 2.78 greater.

 b. For each year older, the predicted value of oxygen consumption is 2.78 lower.

 c. For every 2.78 years older, oxygen consumption doubles.

 d. For every 2.78 years younger, oxygen consumption doubles.


3. Given the information in this summary of variable selection, which stepwise selection method was specified in the PROC REG step?

Step  Variable   Variable   Number   Partial    Model      C(p)     F Value  Pr > F
      Entered    Removed    Vars In  R-Square   R-Square
1     RunTime               1        0.7434     0.7434     3.3432   84.00    <.0001
2     Age                   2        0.0213     0.7647     2.8192   2.54     0.1222

 a. FORWARD

 b. BACKWARD


 c. STEPWISE

 d. can't tell from the information given


4. Here is a table of output statistics from PROC REG. If you sample a new value of the dependent variable when Performance equals 55, what are the lower and upper prediction limits for this newly sampled individual value?

Output Statistics

Obs  Name   Performance  Dependent  Predicted  Std Error     95% CL Mean         95% CL Predict      Residual
                         Variable   Value      Mean Predict
1    Jack   48           40.8400    44.9026    1.0190        42.0732  47.7319    37.4190  52.3861    -4.0626
2    Annie  43           45.1200    45.3793    1.3081        41.7475  49.0112    37.5570  53.2016    -0.2593
3    Kate   55           44.7500    44.2351    1.4885        40.1023  48.3678    36.1680  52.3021     0.5149
4    Carl   40           46.0800    45.6654    1.6493        41.0862  50.2446    37.3608  53.9700     0.4146
5    Don    58           44.6100    43.9490    1.8646        38.7719  49.1261    35.3003  52.5977     0.6610
6    Effie  45           47.9200    45.1886    1.1361        42.0343  48.3429    37.5763  52.8009     2.7314

 a. 44.7500 and 44.2351

 b. 40.1023 and 48.3678

 c. 36.1680 and 52.3021

 d. can't tell from the information given


5. Which of the following statements describes a positive linear relationship between two variables?

1. The more I eat, the less I want to exercise.
2. The more salty snacks I eat, the more water I want to drink.
3. No matter how much I exercise, I still weigh the same.

 a. 1 only

 b. 1 and 2

 c. 2 only

 d. 2 and 3

 e. 3 only

6. What output does this program produce?

proc corr data=statdata.bodyfat2 nosimple
          plots=matrix(nvar=all histogram);
   var Age Weight Height;
run;

 a. individual correlation plots and simple descriptive statistics

 b. a scatter plot matrix only, with histograms along its diagonal

 c. a table of correlations and a scatter plot matrix with histograms along its diagonal

 d. can't tell from the information given


7. How many of the following models meet Mallows' Cp criterion for model selection?

Model  Number    C(p)    R-Square  Variables in Model
Index  in Model
1      7         5.8653  0.7445    Age Weight Neck Abdomen Thigh Forearm Wrist
2      8         5.8986  0.7466    Age Weight Neck Abdomen Hip Thigh Forearm Wrist
3      8         6.4929  0.7459    Age Weight Neck Abdomen Thigh Biceps Forearm Wrist
4      9         6.7834  0.7477    Age Weight Neck Abdomen Hip Thigh Biceps Forearm Wrist
5      7         6.9017  0.7434    Age Weight Neck Abdomen Biceps Forearm Wrist

 a. 0

 b. 1

 c. 3

 d. 5

8. In this PROC SCORE step, which option specifies the data set containing the parameter estimates that are used to score observations?

proc score data=dataset1 score=dataset2
           out=dataset3 type=parms;
   var Performance;
run;

 a. the DATA= option

 b. the SCORE= option


 c. the OUT= option


9. According to these parameter estimates, are any of the variables in the model statistically significant in predicting or explaining the percentage of body fat?

Parameter Estimates

Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept  1   -20.98714           5.55433         -3.78    0.0002
Age        1   0.01226             0.02836         0.43     0.6658
Hip        1   -0.40163            0.09994         -4.02    <.0001
Abdomen    1   0.86123             0.06814         12.64    <.0001


 a. no

 b. yes, Age

 c. yes, Hip and Abdomen

 d. yes, Age, Hip, and Abdomen

10. What output does this program produce?

proc reg data=statdata.bodyfat2 plots(only)=(cp);
   model PctBodyFat2 = Age Weight Height
         Neck Chest Abdomen Hip
         Thigh Knee Ankle Biceps
         Forearm Wrist
         / selection=cp best=15;
run;
quit;

 a. only models that meet both Mallows' and Hocking's Cp criteria for model selection

 b. the best 15 models that meet the criteria for the forward, backward, and stepwise selection methods

 c. a set of the best 15 candidate models according to the Cp statistic generated using the all-possible regressions technique

 d. can't tell from the information given
