multiple linear regression

June 11, 2022 AGR206 1 Multiple Linear Regression. Concept and uses. Model and assumptions. Intrinsically linear models. Model development and validation. Problem areas. Non-normality. Heterogeneous variance. Correlated errors. Influential points and outliers. Model inadequacies. Collinearity. Errors in X variables.

Author: robert-larson

Post on 31-Dec-2015



TRANSCRIPT

  • Multiple Linear Regression.
    Concept and uses.
    Model and assumptions.
    Intrinsically linear models.
    Model development and validation.
    Problem areas: non-normality, heterogeneous variance, correlated errors, influential points and outliers, model inadequacies, collinearity, errors in X variables.

    AGR206

  • Concept & Uses.
    Description restricted to the data set: did biomass increase with pH in the sample?
    Prediction of Y: how much biomass do we expect to find under certain soil conditions?
    Extrapolation to new conditions: can we predict biomass in other estuaries?
    Estimation and understanding: how much does biomass change per unit change in pH, controlling for other factors?
    Control of a process (requires causality): can we create sites with a certain biomass by changing the pH?


  • Body fat example in JMP.
    Three variables (X1, X2, X3) were measured to predict body fat % (Y) in people.
    Random sample of people.
    Y was measured by an expensive and very accurate method (assume it reveals the true % fat).
    X1: thickness of triceps skinfold.
    X2: thigh circumference.
    X3: midarm circumference.
    Data file: Bodyfat.jmp


  • H0s or values of interest.
    Does thickness of the triceps skinfold contribute significantly to predicting fat content?
    What is the CI for fat content for a person whose Xs have been measured?
    Do I have more or less fat than last summer?
    Do I have more fat than recommended?


  • Model and Assumptions.
    Linear, additive model relating Y to p independent variables.
    Note: here p is the number of variables, but some authors use p for the number of parameters, which is one more than the number of variables because of the intercept.
    $Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip} + \varepsilon_i$
    where the $\varepsilon_i$ are normal, independent random variables with common variance $\sigma^2$.
    In matrix notation the model and solution are exactly the same as for SLR: $Y = X\beta + \varepsilon$, $b = (X'X)^{-1} X'Y$.
    All equations from SLR apply without change.
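
The matrix solution on this slide can be tried out on a small example. The sketch below is only an illustration, not the course's JMP or SAS workflow: it solves the normal equations (X'X) b = X'Y in plain Python, using made-up, noise-free data (not the body fat data) so that the known coefficients are recovered. The helper names `transpose`, `matmul`, `solve` and `ols` are hypothetical.

```python
# Sketch: least squares via the normal equations (X'X) b = X'Y.
# All data here are invented for illustration; they are not from Bodyfat.jmp.

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ols(x_rows, y):
    """Return b = (X'X)^-1 X'Y; a column of ones is added for the intercept."""
    X = [[1.0] + list(row) for row in x_rows]
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtY = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    return solve(XtX, XtY)

# Hypothetical noise-free data generated from Y = 1 + 2*X1 + 3*X2.
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]
b = ols(x_rows, y)
print([round(v, 6) for v in b])
```

In practice one would rely on a statistics package rather than hand-rolled elimination; the point is only that the same formula handles one X or many.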


  • Linear models.
    Linear and intrinsically linear models.
    Linearity refers to the parameters. The model can involve any functions of the Xs, as long as those functions contain no parameters that have to be estimated.
    A linear model does not always produce a hyperplane.
    $Y_i = \beta_0 + \beta_1 f_1(X_{i1}) + \dots + \beta_p f_p(X_{ip}) + \varepsilon_i$
    Polynomial regression is a special case where the functions are powers of X.
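
A few standard textbook forms may help separate the cases. These are illustrations chosen here, not examples taken from the slides: the first is linear in the parameters even though it is curved in X, the second becomes linear after a log transformation (assuming a multiplicative error), and the third cannot be made linear because $\beta_2$ sits inside the function.

```latex
% Linear in the parameters (polynomial regression):
Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i

% Intrinsically linear: taking logs gives a linear model
Y_i = \beta_0 X_i^{\beta_1} \varepsilon_i
\quad\Longrightarrow\quad
\ln Y_i = \ln \beta_0 + \beta_1 \ln X_i + \ln \varepsilon_i

% Not linear in the parameters:
Y_i = \beta_0 + \beta_1 e^{\beta_2 X_i} + \varepsilon_i
```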


  • Matrix Equations


  • Extra Sum of Squares.
    Effects of order of entry on SS.

    The 4 types of SS.

    Partial correlation.
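
For reference, the usual textbook definitions behind these three points are sketched below, using the body fat variable names X1, X2, X3. The notation SSR(. | .) and $r^2_{Y2\cdot1}$ is an assumption here; the slide's own equations were not preserved in this transcript.

```latex
% Extra sum of squares for X2 given that X1 is already in the model:
SSR(X_2 \mid X_1) = SSR(X_1, X_2) - SSR(X_1) = SSE(X_1) - SSE(X_1, X_2)

% Sequential decomposition, which depends on the order of entry:
SSR(X_1, X_2, X_3) = SSR(X_1) + SSR(X_2 \mid X_1) + SSR(X_3 \mid X_1, X_2)

% Coefficient of partial determination (squared partial correlation):
r^2_{Y2 \cdot 1} = \frac{SSR(X_2 \mid X_1)}{SSE(X_1)}
```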


  • Extra Sum of Squares: body fat


  • Response plane and error.
    The response surface in more than 3D is a hyperplane.


  • Model development.
    What variables to include depends on the objective:
    Description: no need to reduce the number of variables.
    Prediction and estimation of Yhat: OK to reduce the number of variables for economical use.
    Estimation of b and understanding: sensitive to deletions, which may bias MSE and b. There is no real solution other than getting more data from a better experiment. (Sorry!)


  • Variable Selection.
    Effects of eliminating variables:
    MSE is positively biased unless the true b for the eliminated variables is 0.
    bhat and Yhat are biased unless the previous condition holds or the eliminated variables are orthogonal to those retained.
    The variance of the estimated parameters and predictions is usually lower.
    There are conditions under which the MSE for the reduced model (variance plus bias^2) is smaller.
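
The last point rests on the standard decomposition of mean squared error, written out here for a generic estimator $\hat{\theta}$ of $\theta$ (for instance a predicted Yhat). The symbol $\hat{\theta}$ is introduced only for this illustration. Dropping variables can raise the bias term while lowering the variance term, so the sum may go either way.

```latex
\mathrm{MSE}(\hat{\theta})
  = E\big[(\hat{\theta} - \theta)^2\big]
  = \mathrm{Var}(\hat{\theta}) + \big[\mathrm{Bias}(\hat{\theta})\big]^2,
\qquad
\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta
```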


  • Criteria for variable selection.
    (In the formulas on this slide, p is the number of parameters, including the intercept.)
    R2: coefficient of determination. $R^2 = SSReg / SSTotal$
    MSE or MSRes: mean squared residual. If all the Xs are in the model, it estimates $\sigma^2$.
    R2adj: adjusted R2. $R^2_{adj} = 1 - MSE/MSTo = 1 - \frac{n-1}{n-p} \cdot \frac{SSE}{SSTo}$
    Mallows Cp: $C_p = SSRes / MSE_{Full} + 2p - n$
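
As a quick arithmetic check, the criteria can be computed from the sums of squares reported in the Spartina ANOVA output later in this transcript (SSE = 2801379.9, SSTotal = 19170963.2, 45 observations, 15 predictors plus an intercept). The function name `selection_criteria` is hypothetical, and the snippet is a sketch of the slide's formulas rather than output from JMP or SAS.

```python
def selection_criteria(sse, ssto, n, p, mse_full):
    """Return (R2, adjusted R2, Mallows Cp).

    p counts parameters, intercept included, as in the slide's formulas.
    mse_full is the MSE of the full model (all candidate Xs included).
    """
    r2 = 1.0 - sse / ssto                       # same as SSReg / SSTotal
    r2_adj = 1.0 - ((n - 1) / (n - p)) * (sse / ssto)
    cp = sse / mse_full + 2 * p - n
    return r2, r2_adj, cp

# Values taken from the Spartina ANOVA table; the full model is its own reference here.
sse, ssto, n, p = 2801379.9, 19170963.2, 45, 16
r2, r2_adj, cp = selection_criteria(sse, ssto, n, p, mse_full=sse / (n - p))
print(f"R2={r2:.4f}  R2adj={r2_adj:.4f}  Cp={cp:.1f}")
```

For the full model Cp equals p by construction; the criterion becomes informative when `sse` and `p` come from a reduced model while `mse_full` still comes from the full one.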


  • Example


  • Checking assumptions.
    Note that although we have many Xs, errors are still in a single dimension.
    Residual analysis is performed as for SLR, sometimes repeated over different Xs.
    Normality. Use the NORMAL option of PROC UNIVARIATE. Transform.
    Homogeneity of variance. Plot residuals vs. each X. Transform. Weighted least squares.
    Independence of errors.
    Adequacy of the model. Plot residuals. Lack of fit (LOF).
    Influence and outliers. Use the INFLUENCE option in PROC REG.
    Collinearity. Use the COLLINOINT option of PROC REG.


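  • Code for PROC REG.
    /* colin is built as a near-linear combination of ph, acid and sal */
    data s00.spart2;
      set s00.spartina;
      colin = 2*ph + 0.5*acid + sal + rannor(23);
    run;
    proc reg data=s00.spart2;
      model bmss = colin h2s sal eh7 ph acid p k ca mg na mn zn cu nh4
            / r influence vif collinoint stb partial;
    run;
      /* PROC REG is interactive: another MODEL statement can follow RUN */
      model colin = ph sal acid;
    run;
    quit;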
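
The VIF option requested above reports variance inflation factors, which by the standard definition are VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the other Xs. The sketch below shows the simplest case, with one other predictor, so that R_j^2 is just a squared correlation. The data and the function names `r_squared_simple` and `vif` are invented for illustration and have nothing to do with the Spartina data set.

```python
def r_squared_simple(x, z):
    """R^2 from regressing z on a single predictor x (squared correlation)."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((c - mz) ** 2 for c in z)
    sxz = sum((a - mx) * (c - mz) for a, c in zip(x, z))
    return sxz ** 2 / (sxx * szz)

def vif(r2):
    """Variance inflation factor for a predictor whose R^2 on the others is r2."""
    return 1.0 / (1.0 - r2)

# Hypothetical predictors: z is almost exactly 2*x, so collinearity is severe.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [2.1, 3.9, 6.2, 7.8, 10.1]
r2 = r_squared_simple(x, z)
print(f"R2 = {r2:.4f}, VIF = {vif(r2):.1f}")
```

With several other predictors, R_j^2 would come from a multiple regression instead, which is the kind of fit the second MODEL statement (colin on ph, sal and acid) produces.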


  • Spartina ANOVA output.
    Model: MODEL1
    Dependent Variable: BMSS

    Analysis of Variance

    Source     DF   Sum of Squares   Mean Square   F Value   Prob>F
    Model      15       16369583.2   1091305.552    11.297   0.0001
    Error      29        2801379.9     96599.307
    C Total    44       19170963.2

    Root MSE    310.80429    R-square   0.8539
    Dep Mean   1000.80000    Adj R-sq   0.7783
    C.V.         31.05558


  • Parameters and VIF
