Page 1: STAT 4510/7510 - Applied Statistical Models I Midterm Exam ...faculty.missouri.edu/...ASM1_midterm1_Solutions.pdf · Midterm Exam (A) - Spring 2018 Name: True or False: Circle the

STAT 4510/7510 - Applied Statistical Models I Midterm Exam (A) - Spring 2018

Name:

True or False: Circle the correct answer.

1. True or False: Statistical learning, compared to machine learning, puts emphasis on the underlying models, their interpretability, and uncertainty.

2. True or False: Building an understanding of relationships between features is an example of supervised statistical learning.

3. True or False: One benefit of parametric approaches for modeling is that it is easy to interpret the parameters.

4. True or False: Parametric approaches typically require many more observations because we have to estimate the parameters.

5. True or False: In logistic regression, the more well-separated the classes are, the more reliable the parameter estimates.

6. True or False: The primary difference between QDA and logistic regression is that QDA assumes that the observations come from an MVN distribution with common variance/covariance matrices across classes.

7. True or False: The Bayes classifier gives the optimal test error rate.

8. True or False: KNN can be used for both classification and regression.

9. True or False: In deciding between competing regression models, we choose the one with the smallest training MSE.

10. True or False: As the amount of measurement error, $\epsilon$, increases, it becomes easier to estimate the function, $f$, in the model $Y = f(X) + \epsilon$.

11. True or False: Measurement error is a form of reducible error.

12. True or False: Parametric approaches tend to have less variance but more bias.

13. True or False: Test MSE should be used to compare across models for classification.

14. True or False: Cross-validation should be used to choose the optimal number of neighbors for KNN.

15. True or False: Using LDA for two classes and one feature variable, $x$, the decision boundary is the value of $x^*$ such that $\delta_1(x^*) = \delta_2(x^*)$.

16. True or False: The decision boundary for QDA is linear in x.

17. True or False: For classification, we always want to minimize the misclassification rate.

18. True or False: The area under the ROC curve for a model equivalent to random guessing is 0.5.

19. True or False: Because of the non-linear form of the logistic function, we cannot interpret the coefficient estimate, $\hat{\beta}_1$.

20. True or False: Bootstrapping is an effective resampling method for evaluating model performance.

21. True or False: LOOCV results in low variance and high bias.

22. True or False: Validation is important for assessing whether the model is representing the true underlying process that is generating the data.

23. True or False: K-fold cross-validation is a model validation approach for parametric models only.

24. True or False: RSS is a measure of lack of fit.

25. True or False: A regression model with all dummy variables is equivalent to an ANOVA.

26. True or False: The variance inflation factor (VIF) is used to assess for outliers.

27. True or False: When p > n, we cannot use backward selection as a variable selection procedure.

28. True or False: If the relationship between X and Y is not linear, we cannot use a linear regression model.

29. True or False: Outliers are observations with unusual x values such that they have too much influence on the regression fit.

30. True or False: KNN is particularly useful when the feature space is of high dimension.
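A minimal sketch for Question 15, assuming the standard one-feature LDA setup with class means $\mu_1, \mu_2$, a shared variance $\sigma^2$, and priors $\pi_1, \pi_2$ (these symbols and the numeric values below are illustrative assumptions, not part of the exam). The discriminant is $\delta_k(x) = x\mu_k/\sigma^2 - \mu_k^2/(2\sigma^2) + \log\pi_k$, and solving $\delta_1(x^*) = \delta_2(x^*)$ gives $x^* = (\mu_1 + \mu_2)/2 + \sigma^2 \log(\pi_2/\pi_1)/(\mu_1 - \mu_2)$.

```python
import numpy as np

def lda_discriminant(x, mu_k, sigma2, pi_k):
    """One-feature LDA discriminant delta_k(x) with a variance sigma2 shared by all classes."""
    return x * mu_k / sigma2 - mu_k ** 2 / (2 * sigma2) + np.log(pi_k)

def lda_boundary(mu1, mu2, sigma2, pi1, pi2):
    """Closed-form x* solving delta_1(x*) = delta_2(x*) (requires mu1 != mu2)."""
    return (mu1 + mu2) / 2 + sigma2 * np.log(pi2 / pi1) / (mu1 - mu2)

# Illustrative (assumed) parameter values
x_star = lda_boundary(mu1=-1.0, mu2=2.0, sigma2=1.5, pi1=0.3, pi2=0.7)
d1 = lda_discriminant(x_star, -1.0, 1.5, 0.3)
d2 = lda_discriminant(x_star, 2.0, 1.5, 0.7)
print(x_star, d1 - d2)  # the two discriminants agree at the boundary
```

With equal priors the log term vanishes and $x^*$ is simply the midpoint of the two class means.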
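For Question 18, a sketch of the AUC computed directly as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties counted as one half), which equals the area under the ROC curve. The simulated labels and scores are assumptions for illustration; scores generated independently of the labels play the role of random guessing.

```python
import numpy as np

def auc_mann_whitney(scores, labels):
    """AUC = P(score of random positive > score of random negative), ties count 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # Compare every positive score with every negative score
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

rng = np.random.default_rng(42)
labels = rng.integers(0, 2, size=2000).astype(bool)
random_scores = rng.uniform(size=labels.size)  # scores that ignore the labels
print(auc_mann_whitney(random_scores, labels))
```

A constant score ties every pair and gives exactly 0.5; independent random scores give a value close to, but not exactly, 0.5.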

31. Which of the following are assumptions of ANOVA (circle all that are true)?

(a) random sampling from the k populations

(b) independent sampling from the k populations

(c) the k populations have equal means

(d) the k populations have equal variances

(e) the population distributions are normal
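Questions 25 and 31 both concern one-way ANOVA. A minimal sketch with numpy on simulated data (the group sizes and means are arbitrary assumptions): the F statistic divides the between-group mean square by a pooled within-group mean square, which is where the equal-variance assumption enters, and the same F statistic comes out of a regression on an intercept plus $k-1$ dummy variables.

```python
import numpy as np

def one_way_anova_f(groups):
    """One-way ANOVA F; the pooled within-group mean square assumes a common variance."""
    all_y = np.concatenate(groups)
    n, k = all_y.size, len(groups)
    grand_mean = all_y.mean()
    ssb = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    ssw = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

def dummy_regression_f(groups):
    """Overall regression F from y ~ intercept + (k - 1) group dummies."""
    y = np.concatenate(groups)
    n, k = y.size, len(groups)
    labels = np.concatenate([np.full(len(g), j) for j, g in enumerate(groups)])
    X = np.column_stack([np.ones(n)] + [(labels == j).astype(float) for j in range(1, k)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = ((y - X @ beta) ** 2).sum()
    tss = ((y - y.mean()) ** 2).sum()
    p = k - 1
    return ((tss - rss) / p) / (rss / (n - p - 1))

rng = np.random.default_rng(0)
groups = [rng.normal(mu, 1.0, size=20) for mu in (0.0, 0.5, 1.0)]
print(one_way_anova_f(groups), dummy_regression_f(groups))
```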

32. Use the regression output to answer the following questions.

(a) What is the equation of the estimated regression model?

(b) Report the estimate of $R^2$, and interpret it in context of the problem.

(c) What are the null and alternative hypotheses being tested by the F-statistic?

(d) Interpret the coefficient of $x_2$.

(e) What is the estimate of $\sigma^2$?

(f) What was the sample size, n, used to fit this regression?
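Because the regression output for Question 32 is not reproduced here, the following is a minimal sketch on simulated data with two hypothetical predictors $x_1, x_2$, showing how the requested quantities relate: $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$, $\hat\sigma^2 = \mathrm{RSS}/(n - p - 1)$, the overall F statistic for $H_0: \beta_1 = \cdots = \beta_p = 0$ versus $H_a$: at least one $\beta_j \neq 0$, and $n = \text{residual df} + p + 1$. If the output came from R's `lm`, the reported residual standard error is $\hat\sigma$, so its square estimates $\sigma^2$.

```python
import numpy as np

def ols_summary(X, y):
    """Quantities usually read off regression output.

    X is an (n, p) predictor matrix WITHOUT an intercept column; one is added here.
    """
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta_hat
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    df_resid = n - p - 1
    return {
        "beta_hat": beta_hat,
        "r2": 1 - rss / tss,
        "sigma2_hat": rss / df_resid,                  # estimate of error variance sigma^2
        "f_stat": ((tss - rss) / p) / (rss / df_resid),  # tests H0: all slopes are zero
        "df_resid": df_resid,
    }

rng = np.random.default_rng(1)
n = 50
X = rng.normal(size=(n, 2))  # hypothetical predictors x1, x2
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=1.0, size=n)
out = ols_summary(X, y)
n_recovered = out["df_resid"] + X.shape[1] + 1  # n = residual df + p + 1
print(out["r2"], out["sigma2_hat"], out["f_stat"], n_recovered)
```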

33. Based on the figure below, write down an appropriate parametric regression model capturing the relationship between the variables. Clearly define all terms in your model. [Note: You do not need to estimate the values of the parameters, i.e., you can just write $\beta_0$, $\beta_1$, etc. You must interpret every parameter you use.]

[Figure: scatterplot of y against x (x-axis ticks 0 to 6, y-axis ticks -6 to 4), with points distinguished by species: species 1, species 2, species 3.]