
STAT 360 – Regression Analysis – Assignment 7 (75 pts.) Fall 2015

1 – Black Cherry Trees in R

Note: You must complete all parts of this problem in R. These data were also used in Section 14 – Response Transformations, so you can certainly look at the analysis done in the notes for some guidance with this problem. However, I am going to have you look at an alternative to the model developed in that section.

These data record the girth in inches, height in feet, and volume of timber in cubic feet for each of a sample of 31 felled black cherry trees in Allegheny National Forest, Pennsylvania. Note that girth is the diameter of the tree (in inches) measured at 4 ft. 6 in. above the ground.

The variables in this dataset:

$Y$ = Vol – volume of the black cherry tree (ft$^3$)

$X_1$ = D – girth/diameter of tree (in.)

$X_2$ = Ht – height of tree (ft.)

a) Construct a scatterplot matrix of these data using the pairs.plus function in the Regression.RData directory I sent to you via e-mail. Briefly comment on what you learn from looking at this plot. (3 pts.)

b) Fit the model $E(\text{Vol} \mid D, Ht) = \beta_0 + \beta_1 D + \beta_2 Ht$ and call this model bc.lm1. Use plot(bc.lm1) to examine residual plots and case diagnostics for this model. Comment on any model deficiencies suggested by these plots. (3 pts.)

c) Even though there is clearly curvature in the plot of the residuals vs. the fitted values, conduct Tukey’s Test for Nonadditivity by adding the squared fitted values from bc.lm1 as a term in the model above (see the haystack example in the notes). Does this test suggest curvature? Explain. (3 pts.)
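The mechanics of the test in part (c) can be sketched roughly as follows. This is only an illustration under stated assumptions: R's built-in `trees` data frame (the same 31 black cherry trees, with columns Girth, Height, Volume) stands in for the course data file, and the names `bc`, `fit.sq`, `bc.tukey`, and `tukey.row` are made up for this sketch rather than taken from the notes.

```r
# Assumption: the built-in `trees` data stand in for the course data file.
bc <- data.frame(D = trees$Girth, Ht = trees$Height, Vol = trees$Volume)

# Model from part (b): E(Vol | D, Ht) = b0 + b1*D + b2*Ht
bc.lm1 <- lm(Vol ~ D + Ht, data = bc)

# Tukey's one-degree-of-freedom test: add the squared fitted values
# from bc.lm1 as an extra term and refit.
bc$fit.sq <- fitted(bc.lm1)^2
bc.tukey <- lm(Vol ~ D + Ht + fit.sq, data = bc)

# The t-statistic and p-value for fit.sq are the test for curvature.
tukey.row <- summary(bc.tukey)$coefficients["fit.sq", ]
print(tukey.row)
```

A small p-value in the `fit.sq` row would indicate curvature that the mean function in part (b) does not capture.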

d) In the notes from Section 14 we used an inverse fitted value (response) plot to suggest $T(Y) = \sqrt{\text{Vol}}$ as a transformation to address the curvature. For this assignment, use the Box-Cox transformation method to identify a suitable transformation to achieve approximate normality of the response. It will give a range of possible $\lambda$ values; choose the common transformation closest to the optimal $\lambda$ value chosen by Box-Cox. R commands for doing this are shown below.

Black Cherry Tree


You can do this in R by using either (or both) of the functions below:

> BCtran(Vol)                     # this function is in the Regression.RData directory
> summary(powerTransform(Vol))    # function in the car library

What are the optimal $\lambda$ value and the nearest common transformation recommended by the Box-Cox method? What is the LR test p-value for testing $NH: \lambda = 0$ vs. $AH: \lambda \neq 0$? (4 pts.)
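To see what these functions are computing, the Box-Cox profile log-likelihood can also be examined by hand. The sketch below swaps in `boxcox()` from the MASS package instead of BCtran or powerTransform, evaluates it on a grid of $\lambda$ values (so results are only accurate to the grid spacing), and again assumes the built-in `trees` data stand in for the course data. The object names are hypothetical.

```r
library(MASS)  # provides boxcox(); this is NOT the BCtran/powerTransform route

# Profile log-likelihood of lambda for the response alone (intercept-only model).
# Assumption: trees$Volume plays the role of Vol.
bc.prof <- boxcox(Volume ~ 1, data = trees,
                  lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Grid value of lambda that maximizes the profile log-likelihood
lambda.hat <- bc.prof$x[which.max(bc.prof$y)]

# Approximate 95% interval: all lambdas whose log-likelihood is within
# qchisq(0.95, 1)/2 of the maximum
in.ci <- bc.prof$y > max(bc.prof$y) - qchisq(0.95, df = 1) / 2
lambda.ci <- range(bc.prof$x[in.ci])

# Likelihood ratio test of lambda = 0 (log transformation) vs. lambda != 0
loglik0 <- bc.prof$y[which.min(abs(bc.prof$x))]
lr.stat <- 2 * (max(bc.prof$y) - loglik0)
lr.pval <- pchisq(lr.stat, df = 1, lower.tail = FALSE)

print(c(lambda.hat = lambda.hat, lower = lambda.ci[1], upper = lambda.ci[2]))
print(c(LR = lr.stat, p.value = lr.pval))
```

The "nearest common transformation" is then the familiar power (e.g. -1, 0, 1/3, 1/2, 1) closest to `lambda.hat`, ideally one that falls inside the interval.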

e) Fit the model using the common transformation for the response found in part (d) and call this model bc.lm2. Examine residual plots for this model, i.e. use the command plot(bc.lm2), and comment on the adequacy of this model. (3 pts.)

f) If we assume that black cherry trees are conical (not comical), then we might expect the volume of a tree to be given by the volume of a cone (see the linked notes for a variety of geometric considerations that are used in estimating the volume of a tree: http://www2.latech.edu/~strimbu/Teaching/FOR306/T4.pdf).

$\text{Vol} = \frac{1}{3} \pi r^2 h$, where $r$ = radius of tree ($D/2$) and $h = Ht$

Taking the logarithm of both sides gives,

$\log(\text{Vol}) = \log\left(\frac{\pi}{3}\right) + 2\log(r) + \log(h)$

This suggests using the following mean function with the response log transformed.

$E(\log(\text{Vol}) \mid D, Ht) = \beta_0 + \beta_1 \log(D) + \beta_2 \log(Ht)$

Fit the model with the response and both predictors log transformed and call this model bc.lm3. Examine the residuals from this fit and comment on the model's adequacy. (3 pts.)
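A minimal sketch of how the log-log fit might be set up is given here, assuming once more that R's built-in `trees` data (columns Girth, Height, Volume) stand in for the course data file. The renamed columns D, Ht, and Vol are a convenience for this sketch only.

```r
# Assumption: the built-in `trees` data stand in for the course data file.
bc <- data.frame(D = trees$Girth, Ht = trees$Height, Vol = trees$Volume)

# Mean function E(log(Vol) | D, Ht) = b0 + b1*log(D) + b2*log(Ht)
bc.lm3 <- lm(log(Vol) ~ log(D) + log(Ht), data = bc)

# Coefficient table and R-squared
print(summary(bc.lm3))

# 95% confidence intervals for the coefficients
print(confint(bc.lm3))

# Residual plots and case diagnostics in one plotting window
par(mfrow = c(2, 2))
plot(bc.lm3)
par(mfrow = c(1, 1))
```

The transformations are written directly in the model formula, so no new columns need to be created before fitting.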


g) What is the $R^2$ for this model? Interpret this quantity. (2 pts.)

h) Given the expression for the log of the volume above,

$\log(\text{Vol}) = \log\left(\frac{\pi}{3}\right) + 2\log(r) + \log(h)$

we might expect the estimated coefficient for the diameter to be twice as large as the estimated coefficient for the height. Does this appear to be the case? Explain. (2 pts.)
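The reasoning behind this expectation can be made explicit by substituting $r = D/2$ and $h = Ht$ into the cone expression. The sketch below ignores units; since D is recorded in inches and Ht in feet, a unit conversion would presumably change only the constant term, not the two slopes.

```latex
\begin{aligned}
\log(\text{Vol}) &= \log\left(\frac{\pi}{3}\right) + 2\log\left(\frac{D}{2}\right) + \log(Ht) \\
                 &= \log\left(\frac{\pi}{3}\right) - 2\log(2) + 2\log(D) + \log(Ht) \\
                 &= \log\left(\frac{\pi}{12}\right) + 2\log(D) + \log(Ht)
\end{aligned}
```

Under the cone assumption, then, the coefficient on $\log(D)$ would be 2 and the coefficient on $\log(Ht)$ would be 1, which is the sense in which one is "twice as large" as the other.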

i) Conduct a test to see if there are any outliers for the model fit in part (f), i.e. bc.lm3. Summarize your results. (2 pts.)
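One common way to carry out such a test is the Bonferroni-adjusted outlier t-test on the externally studentized residuals, which is the idea behind outlierTest in the car library. The base-R sketch below works through the calculation by hand. It assumes the built-in `trees` data stand in for the course data, and names such as `t.i`, `worst`, and `p.bonf` are illustrative only.

```r
# Assumption: the built-in `trees` data stand in for the course data file.
bc <- data.frame(D = trees$Girth, Ht = trees$Height, Vol = trees$Volume)
bc.lm3 <- lm(log(Vol) ~ log(D) + log(Ht), data = bc)

# Externally studentized residuals: under the model, each one follows a
# t distribution with (residual df - 1) degrees of freedom.
t.i <- rstudent(bc.lm3)
n <- length(t.i)
df.t <- df.residual(bc.lm3) - 1

# Test the case with the largest |t_i|. Because the case was chosen after
# looking at all n residuals, multiply the two-sided p-value by n (Bonferroni).
worst <- which.max(abs(t.i))
p.raw <- 2 * pt(abs(t.i[worst]), df = df.t, lower.tail = FALSE)
p.bonf <- min(1, n * p.raw)

print(c(case = worst, t = unname(t.i[worst]), p.raw = unname(p.raw), p.bonf = p.bonf))
```

A Bonferroni p-value below the chosen significance level would flag that case as an outlier; otherwise there is no evidence of outliers under this model.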

j) Do the CERES plots for $\log(D)$ and $\log(Ht)$ in the model fit in part (f), i.e. bc.lm3, suggest that any further transformation of these terms would improve the model? Explain. (3 pts.)


2 – Abalones for the North Coast and Islands of Bass Strait Tasmania, Australia (Datafiles: Abalone (no sex).JMP and Abalone (no sex).txt)

The goal of this regression analysis is to develop a model for predicting the age of abalone (or paua) from physical measurements. The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope -- a boring and very time-consuming task. It is hoped that other measurements, which are easier to obtain, can be used to predict the age. Also, the dust from cleaning a paua shell is toxic and can lead to health problems for those routinely exposed to it. Further information, such as weather patterns and location (hence food availability), may be required to solve the problem.

a) Examine a scatterplot matrix of the response ($Y$) and all of the potential predictors ($X_1, \ldots, X_7$).

Discuss these plots in terms of the following: (5 pts.)

- marginal/univariate distributions of $Y$ and the $X_j$'s
- relationships between $Y$ and the $X_j$'s
- relationships between the $X_j$'s
- unusual cases
- multicollinearity issues

In R this can be done using the function pairs.plus in Regression.RData.

Abalone or Paua shell – before and after cleaning.

The variables in these data:

$Y$ = rings – number of rings

$X_1$ = length – longest shell measurement (mm)

$X_2$ = diam – perpendicular to length (mm)

$X_3$ = height – height of abalone with meat inside (mm)

$X_4$ = whole.weight – weight of the whole abalone (g)

$X_5$ = shucked.weight – weight of meat (g)

$X_6$ = visc.weight – gut weight (after bleeding) (g)

$X_7$ = shell.weight – weight of shell after drying (g)

Data Source: Warwick J. Nash, Tracy L. Sellers, Simon R. Talbot, Andrew J. Cawthorn & Wes B. Ford from their paper:

"The Population Biology of Abalone (Haliotis) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait" (1994)


b) Fit the model $E(Y \mid X) = \beta_0 + \beta_1 X_1 + \ldots + \beta_7 X_7$ and $\text{Var}(Y \mid X) = \sigma^2$ in R and call it ab.lm1. Construct plots of the residuals from this model and comment on any model deficiencies suggested. Use the code below to plot all the diagnostic displays in one plotting window and include the plot. (4 pts.)

> ab.lm1 = lm(rings~.,data=Abalone)
> par(mfrow=c(2,2))
> plot(ab.lm1)
> par(mfrow=c(1,1))

c) Construct an inverse fitted value (response) plot using the function invResPlot in the car library.

> invResPlot(ab.lm1)

What transformation $T(y)$ would you use for the response, given what you have seen in part (b) and what the inverse fitted value (response) plot shows? Explain why this plot may not give an accurate visual impression of the response transformation $T(y)$ for these data. (4 pts.)

d) Fit the model using the transformation you chose in part (c) and call it ab.lm2. Again examine the residual plots and comment on any model deficiencies exhibited by these plots. (3 pts.)

e) Use Tukey’s Nonadditivity Test to test for curvature in the model from part (d). Summarize your findings giving the t-statistic and associated p-value. (3 pts.)

f) In order to address the curvature not addressed by our current model (ab.lm2) we will now consider creating terms that allow for nonlinear effects in some of the predictors. One tool to do this is the component-plus-residual plot or C+R Plot. Construct C+R Plots for each predictor in ab.lm2.

> crPlots(ab.lm2)

Which predictors show the greatest degree of nonlinearity? Explain why some of these plots may not be giving an accurate impression of the functional form for the predictors in this model. (4 pts.)

g) To address the problems with the C+R Plots for these data, examine CERES Plots for each predictor.

> ceresPlots(ab.lm2)


Which predictors show the greatest degree of nonlinearity? What terms or transformations are suggested by the CERES plot for the following variables: length, diam, height, whole.weight, and shell.weight? Note: you do not necessarily need to transform all of these predictors. (6 pts.)

h) Using your choices for predictor transformations or nonlinear functions of the predictors (e.g. polynomials) from part (g), build a multiple regression model incorporating them. Write out the mean function for your final model. Summarize your final model and include that summary below. (8 pts.)

i) Use the plot function to examine residuals and case diagnostics for your final model from part (h). Discuss any problems/concerns exhibited by these plots and case diagnostics. (3 pts.)

j) Use the outlier t-test to check for outliers in your final model. Which observation(s) are classified as outliers? (3 pts.)

k) At this point you should have 2 – 5 observations/cases you find questionable due to their influence or the fact they are poorly fit (i.e. outliers). Rerun your model with these observations deleted and compare this model summary to the one from part (h), which included these observations.

> poo.sub = lm(formula(poo),data=Abalone,subset=-c(1,2,3,4))

Here poo is the model from part (h) and -c(1,2,3,4) should be replaced with the observations you identified in parts (i) & (j), e.g. if the observations to remove are 337, 888, and 3144 then you would use -c(337,888,3144). Compare and contrast the differences in the model summaries. (4 pts.)