# regression analysis and multiple regression

Post on 02-Jan-2016


Regression Analysis and Multiple Regression (Session 7). Covers: Using Statistics; The Simple Linear Regression Model; Estimation: The Method of Least Squares; Error Variance and the Standard Errors of Regression Estimators; Correlation; Hypothesis Tests about the Regression Relationship.


• Regression Analysis and Multiple Regression (Session 7)

• Simple Linear Regression Model: Using Statistics; The Simple Linear Regression Model; Estimation: The Method of Least Squares; Error Variance and the Standard Errors of Regression Estimators; Correlation; Hypothesis Tests about the Regression Relationship; How Good is the Regression?; Analysis of Variance Table and an F Test of the Regression Model; Residual Analysis and Checking for Model Inadequacies; Use of the Regression Model for Prediction; Using the Computer; Summary and Review of Terms

• 7-1 Using Statistics

• Examples of Other Scatterplots

• Model Building

• 7-2 The Simple Linear Regression Model

• Picturing the Simple Linear Regression Model. The simple linear regression model posits an exact linear relationship between the expected (average) value of Y, the dependent variable, and X, the independent or predictor variable:

E[Yi] = β0 + β1Xi

Actual observed values of Y differ from the expected value by an unexplained or random error:

Yi = E[Yi] + εi = β0 + β1Xi + εi

• Assumptions of the Simple Linear Regression Model. The relationship between X and Y is a straight-line relationship. The values of the independent variable X are assumed fixed (not random); the only randomness in the values of Y comes from the error term εi. The errors εi are normally distributed with mean 0 and variance σ², and are uncorrelated (not related) across successive observations. That is: ε ~ N(0, σ²)

• 7-3 Estimation: The Method of Least Squares. Estimation of a simple linear regression relationship involves finding estimated or predicted values of the intercept and slope of the linear regression line. The estimated regression equation:

Y = b0 + b1X + e

where b0 estimates the intercept of the population regression line, β0; b1 estimates the slope of the population regression line, β1; and e stands for the observed errors, the residuals from fitting the estimated regression line b0 + b1X to a set of n points.

• Fitting a Regression Line. [Figure: the (X, Y) data; three errors from an arbitrary fitted line; three errors from the least squares regression line. Errors from the least squares regression line are minimized.]

• Errors in Regression. [Figure: an observed point, its fitted value on the regression line, and the error between them.]

• Least Squares Regression

• Sums of Squares, Cross Products, and Least Squares Estimators

• Example 7-1

| Miles | Dollars | Miles² | Miles·Dollars |
|-------|---------|--------|---------------|
| 1211 | 1802 | 1466521 | 2182222 |
| 1345 | 2405 | 1809025 | 3234725 |
| 1422 | 2005 | 2022084 | 2851110 |
| 1687 | 2511 | 2845969 | 4236057 |
| 1849 | 2332 | 3418801 | 4311868 |
| 2026 | 2305 | 4104676 | 4669930 |
| 2133 | 3016 | 4549689 | 6433128 |
| 2253 | 3385 | 5076009 | 7626405 |
| 2400 | 3090 | 5760000 | 7416000 |
| 2468 | 3694 | 6091024 | 9116792 |
| 2699 | 3371 | 7284601 | 9098329 |
| 2806 | 3998 | 7873636 | 11218388 |
| 3082 | 3555 | 9498724 | 10956510 |
| 3209 | 4692 | 10297681 | 15056628 |
| 3466 | 4244 | 12013156 | 14709704 |
| 3643 | 5298 | 13271449 | 19300614 |
| 3852 | 4801 | 14837904 | 18493452 |
| 4033 | 5147 | 16265089 | 20757852 |
| 4267 | 5738 | 18207288 | 24484046 |
| 4498 | 6420 | 20232004 | 28877160 |
| 4533 | 6059 | 20548088 | 27465448 |
| 4804 | 6426 | 23078416 | 30870504 |
| 5090 | 6321 | 25908100 | 32173890 |
| 5233 | 7026 | 27384288 | 36767056 |
| 5439 | 6964 | 29582720 | 37877196 |
| Σ = 79448 | 106605 | 293426944 | 390185024 |
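A quick way to check the least-squares estimates is to plug the column totals (recomputed here from the 25 data rows, since the scanned totals are partly illegible) into the sums-of-squares formulas; a minimal Python sketch:

```python
# Least-squares slope and intercept from the Example 7-1 column totals.
n = 25
sum_x, sum_y = 79448, 106605            # Σ(miles), Σ(dollars)
sum_x2, sum_xy = 293426944, 390185024   # Σ(miles²), Σ(miles·dollars)

ss_xx = sum_x2 - sum_x ** 2 / n         # Σx² - (Σx)²/n
ss_xy = sum_xy - sum_x * sum_y / n      # Σxy - (Σx)(Σy)/n

b1 = ss_xy / ss_xx                      # slope estimate
b0 = sum_y / n - b1 * sum_x / n         # intercept estimate

print(round(b1, 5), round(b0, 1))       # ≈ 1.25533 and ≈ 274.8
```

These match the Minitab output on the next slide (Dollars = 275 + 1.26 Miles).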

• Example 7-1: Using the Computer

MTB > Regress 'Dollars' 1 'Miles';
SUBC> Constant.

Regression Analysis

The regression equation is
Dollars = 275 + 1.26 Miles

Predictor      Coef    Stdev  t-ratio      p
Constant      274.8    170.3     1.61  0.120
Miles       1.25533  0.04972    25.25  0.000

s = 318.2   R-sq = 96.5%   R-sq(adj) = 96.4%

Analysis of Variance

SOURCE      DF        SS        MS       F      p
Regression   1  64527736  64527736  637.47  0.000
Error       23   2328161    101224
Total       24  66855896

• Example 7-1: Using the Computer (Excel). The output below was created by selecting the REGRESSION option from the DATA ANALYSIS toolkit.

SUMMARY OUTPUT

Regression Statistics
Multiple R      0.9824
R Square        0.9652
Standard Error  318.1578
Observations    25

ANOVA

|            | df | SS          | MS          | F      | Significance F |
|------------|----|-------------|-------------|--------|----------------|
| Regression | 1  | 64527736.80 | 64527736.80 | 637.47 | 0.0000         |
| Residual   | 23 | 2328161.20  | 101224.40   |        |                |
| Total      | 24 | 66855898    |             |        |                |

|           | Coefficients | Standard Error | t Stat  | P-value | Lower 95% | Upper 95% |
|-----------|--------------|----------------|---------|---------|-----------|-----------|
| Intercept | 274.8497     | 170.3368       | 1.6136  | 0.1203  | -77.5184  | 627.2178  |
| MILES     | 1.2553       | 0.0497         | 25.2482 | 0.0000  | 1.1525    | 1.3582    |


• Example 7-1: Using the Computer (Excel), Residual Analysis. The plot shows the absence of a relationship between the residuals and the X-values (miles).

• Total Variance and Error Variance

• 7-4 Error Variance and the Standard Errors of Regression Estimators

• Standard Errors of Estimates in Regression

• Confidence Intervals for the Regression Parameters

• 7-5 Correlation. The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables. The population correlation, denoted by ρ, can take on any value from -1 to 1. ρ = -1 indicates a perfect negative linear relationship; -1 < ρ < 0 indicates a negative linear relationship; ρ = 0 indicates no linear relationship; 0 < ρ < 1 indicates a positive linear relationship; ρ = 1 indicates a perfect positive linear relationship.
• Illustrations of Correlation

• Covariance and Correlation. Note: if ρ < 0 then b1 < 0; if ρ = 0 then b1 = 0; if ρ > 0 then b1 > 0.
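The sign relationship above, and the sample correlation itself, can be checked directly; a minimal Python sketch using the ten (United States, International) pairs from Example 7-2:

```python
import math

# Ten (x, y) pairs from Example 7-2: x = United States, y = International.
x = [7.6, 7.9, 8.3, 8.6, 8.8, 9.0, 9.4, 10.2, 11.4, 12.1]
y = [2.3, 2.6, 2.9, 3.2, 3.7, 4.1, 4.8, 5.7, 7.0, 8.9]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
ss_xx = sum((xi - mx) ** 2 for xi in x)
ss_yy = sum((yi - my) ** 2 for yi in y)
ss_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

r = ss_xy / math.sqrt(ss_xx * ss_yy)   # sample correlation
b1 = ss_xy / ss_xx                     # regression slope: same sign as r

print(round(r, 4), round(b1, 4))       # ≈ 0.9923 and ≈ 1.4236
```

Both numbers match the Excel output for Example 7-2 (Multiple R = 0.9923, slope = 1.4236), and r and b1 necessarily share a sign because they share the numerator ss_xy.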

• Example 7-2: Using the Computer (Excel)

SUMMARY OUTPUT

Regression Statistics
Multiple R      0.9923
R Square        0.9846
Standard Error  0.2798
Observations    10

ANOVA

|            | df | SS      | MS      | F      | Significance F |
|------------|----|---------|---------|--------|----------------|
| Regression | 1  | 40.0099 | 40.0099 | 511.20 | 0.0000000155   |
| Residual   | 8  | 0.6261  | 0.0783  |        |                |
| Total      | 9  | 40.636  |         |        |                |

|           | Coefficients | Standard Error | t Stat   | P-value      | Lower 95% | Upper 95% |
|-----------|--------------|----------------|----------|--------------|-----------|-----------|
| Intercept | -8.7625      | 0.5941         | -14.7494 | 0.0000004    | -10.1325  | -7.3925   |
| US        | 1.4236       | 0.0630         | 22.6098  | 0.0000000155 | 1.2784    | 1.5688    |

RESIDUAL OUTPUT

| Observation | Predicted Y | Residuals |
|-------------|-------------|-----------|
| 1  | 2.0571 | 0.2429  |
| 2  | 2.4842 | 0.1158  |
| 3  | 3.0537 | -0.1537 |
| 4  | 3.4807 | -0.2807 |
| 5  | 3.7655 | -0.0655 |
| 6  | 4.0502 | 0.0498  |
| 7  | 4.6197 | 0.1803  |
| 8  | 5.7586 | -0.0586 |
| 9  | 7.4669 | -0.4669 |
| 10 | 8.4635 | 0.4365  |

The data (X = United States, Y = International):

| US (X) | International (Y) |
|--------|-------------------|
| 7.6  | 2.3 |
| 7.9  | 2.6 |
| 8.3  | 2.9 |
| 8.6  | 3.2 |
| 8.8  | 3.7 |
| 9.0  | 4.1 |
| 9.4  | 4.8 |
| 10.2 | 5.7 |
| 11.4 | 7.0 |
| 12.1 | 8.9 |


• Example 7-2: Regression Plot. [Line fit plot of International (Y) against United States (X), showing observed and predicted values; fitted line: y = 1.4236x - 8.7625]


• Hypothesis Tests for the Correlation Coefficient

• Hypothesis Tests about the Regression Relationship

• Hypothesis Tests for the Regression Slope

• 7-7 How Good is the Regression?

• The Coefficient of Determination. [Figure: three panels showing the decomposition of SST into SSR and SSE for r² = 0, r² = 0.50, and r² = 0.90.]

• 7-8 Analysis of Variance and an F Test of the Regression Model

• 7-9 Residual Analysis and Checking for Model Inadequacies

• 7-10 Use of the Regression Model for Prediction. Point Prediction: a single-valued estimate of Y for a given value of X, obtained by inserting the value of X in the estimated regression equation. Prediction Interval: for a value of Y given a value of X, it accounts for both variation in the regression line estimate and variation of points around the regression line; for an average value of Y given a value of X, it accounts only for variation in the regression line estimate.

• Errors in Predicting E[Y|X]

• Prediction Interval for E[Y|X]. [Figure: regression line with its prediction band.] The prediction band for E[Y|X] is narrowest at the mean value of X. The prediction band widens as the distance from the mean of X increases. Predictions become very unreliable when we extrapolate beyond the range of the sample itself.

• Additional Error in Predicting Individual Value of Y

• Prediction Interval for a Value of Y

• Prediction Interval for the Average Value of Y

• Using the Computer

MTB > regress 'Dollars' 1 'Miles' tres in C3 fits in C4;
SUBC> predict 4000;
SUBC> residuals in C5.

Regression Analysis

The regression equation is
Dollars = 275 + 1.26 Miles

Predictor      Coef    Stdev  t-ratio      p
Constant      274.8    170.3     1.61  0.120
Miles       1.25533  0.04972    25.25  0.000

s = 318.2   R-sq = 96.5%   R-sq(adj) = 96.4%

Analysis of Variance

SOURCE      DF        SS        MS       F      p
Regression   1  64527736  64527736  637.47  0.000
Error       23   2328161    101224
Total       24  66855896

    Fit  Stdev.Fit      95.0% C.I.          95.0% P.I.
 5296.2       75.6  ( 5139.7, 5452.7)  ( 4619.5, 5972.8)
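The fit and both 95% intervals at Miles = 4000 can be reproduced from quantities already reported; a minimal Python sketch (the mean and sum of squares of Miles are recomputed from the Example 7-1 data, and t(0.025, 23) = 2.069 is taken from a t table):

```python
import math

# Quantities from the Example 7-1 regression output and data.
b0, b1 = 274.8497, 1.2553338        # estimated intercept and slope
s, n = 318.2, 25                    # standard error of estimate, sample size
x_bar = 79448 / 25                  # mean of Miles
ss_xx = 293426944 - 79448 ** 2 / 25 # Σx² - (Σx)²/n
t = 2.069                           # t(0.025) with n - 2 = 23 df

x0 = 4000
fit = b0 + b1 * x0

# Standard error of the mean response E[Y|x0], and of an individual Y at x0.
se_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_xx)
se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_xx)

ci = (fit - t * se_mean, fit + t * se_mean)   # ≈ (5139.7, 5452.7)
pi = (fit - t * se_pred, fit + t * se_pred)   # ≈ (4619.5, 5972.9)
print(round(fit, 1), round(se_mean, 1))       # ≈ 5296.2 and ≈ 75.6
```

Note how the extra "1 +" inside se_pred is exactly the additional point-scatter variance that makes the prediction interval wider than the confidence interval.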

• Plotting on the Computer (1)

• Plotting on the Computer (2)

• Using Statistics; The k-Variable Multiple Regression Model; The F Test of a Multiple Regression Model; How Good is the Regression; Tests of the Significance of Individual Regression Parameters; Testing the Validity of the Regression Model; Using the Multiple Regression Model for Prediction

• Qualitative Independent Variables; Polynomial Regression; Nonlinear Models and Transformations; Multicollinearity; Residual Autocorrelation and the Durbin-Watson Test; Partial F Tests and Variable Selection Methods; Using the Computer; The Matrix Approach to Multiple Regression Analysis; Summary and Review of Terms

• 7-11 Using Statistics

• 7-12 The k-Variable Multiple Regression Model

• Simple and Multiple Least-Squares Regression

• The Estimated Regression Relationship. The estimated regression relationship:

ŷ = b0 + b1x1 + b2x2 + . . . + bkxk

where ŷ is the predicted value of Y, the value lying on the estimated regression surface. The terms b0, b1, ..., bk are the least-squares estimates of the population regression parameters βi. The actual, observed value of Y is the predicted value plus an error:

y = b0 + b1x1 + b2x2 + . . . + bkxk + e

• Least-Squares Estimation: The 2-Variable Normal Equations. Minimizing the sum of squared errors with respect to the estimated coefficients b0, b1, and b2 yields the following normal equations:

Σy = nb0 + b1Σx1 + b2Σx2
Σx1y = b0Σx1 + b1Σx1² + b2Σx1x2
Σx2y = b0Σx2 + b1Σx1x2 + b2Σx2²
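As a check, the normal equations can be assembled from the Example 7-3 data (listed on a later slide) and solved directly; a minimal Python sketch with a small Gaussian elimination:

```python
# Example 7-3 data: Y with two predictors X1, X2.
Y  = [72, 76, 78, 70, 68, 80, 82, 65, 62, 90]
X1 = [12, 11, 15, 10, 11, 16, 14, 8, 8, 18]
X2 = [5, 8, 6, 5, 3, 9, 12, 4, 3, 10]
n = len(Y)

s12 = sum(a * b for a, b in zip(X1, X2))   # Σx1x2

# Normal equations A · [b0, b1, b2] = c, from sums of squares and cross products.
A = [
    [n,       sum(X1),                    sum(X2)],
    [sum(X1), sum(x * x for x in X1),     s12],
    [sum(X2), s12,                        sum(x * x for x in X2)],
]
c = [sum(Y),
     sum(a * b for a, b in zip(X1, Y)),
     sum(a * b for a, b in zip(X2, Y))]

# Solve the 3x3 system by Gaussian elimination with back-substitution.
m = [row[:] + [ci] for row, ci in zip(A, c)]
for i in range(3):
    p = max(range(i, 3), key=lambda r: abs(m[r][i]))
    m[i], m[p] = m[p], m[i]
    for r in range(i + 1, 3):
        f = m[r][i] / m[i][i]
        m[r] = [a - f * b for a, b in zip(m[r], m[i])]
b = [0.0, 0.0, 0.0]
for i in (2, 1, 0):
    b[i] = (m[i][3] - sum(m[i][j] * b[j] for j in range(i + 1, 3))) / m[i][i]

print([round(v, 3) for v in b])   # ≈ [47.165, 1.599, 1.149]
```

The solution agrees with the Excel and Minitab outputs for Example 7-3 (Y = 47.2 + 1.60 X1 + 1.15 X2).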

• Example 7-3

• Example 7-3: Using the Computer. Excel output:

SUMMARY OUTPUT

Regression Statistics
Multiple R      0.9803
R Square        0.9610
Standard Error  1.9109
Observations    10

ANOVA

|            | df | SS       | MS       | F       | Significance F |
|------------|----|----------|----------|---------|----------------|
| Regression | 2  | 630.5381 | 315.2691 | 86.3350 | 0.0000117      |
| Residual   | 7  | 25.5619  | 3.6517   |         |                |
| Total      | 9  | 656.1    |          |         |                |

|           | Coefficients | Standard Error | t Stat  | P-value   | Lower 95% | Upper 95% |
|-----------|--------------|----------------|---------|-----------|-----------|-----------|
| Intercept | 47.1649      | 2.4704         | 19.0919 | 0.0000003 | 41.3233   | 53.0065   |
| X1        | 1.5990       | 0.2810         | 5.6913  | 0.0007    | 0.9347    | 2.2634    |
| X2        | 1.1487       | 0.3052         | 3.7633  | 0.0070    | 0.4269    | 1.8705    |

The data:

| Y  | X1 | X2 |
|----|----|----|
| 72 | 12 | 5  |
| 76 | 11 | 8  |
| 78 | 15 | 6  |
| 70 | 10 | 5  |
| 68 | 11 | 3  |
| 80 | 16 | 9  |
| 82 | 14 | 12 |
| 65 | 8  | 4  |
| 62 | 8  | 3  |
| 90 | 18 | 10 |

• Decomposition of the Total Deviation in a Multiple Regression Model: Total Deviation = Regression Deviation + Error Deviation, that is, SST = SSR + SSE

• 7-13 The F Test of a Multiple Regression Model. A statistical test for the existence of a linear relationship between Y and any or all of the independent variables X1, X2, ..., Xk:

H0: β1 = β2 = ... = βk = 0
H1: Not all the βi (i = 1, 2, ..., k) are 0

| Source of Variation | Sum of Squares | Degrees of Freedom      | Mean Square           | F Ratio     |
|---------------------|----------------|-------------------------|-----------------------|-------------|
| Regression          | SSR            | k                       | MSR = SSR/k           | F = MSR/MSE |
| Error               | SSE            | n - (k+1) = n - k - 1   | MSE = SSE/(n - k - 1) |             |
| Total               | SST            | n - 1                   |                       |             |

• Using the Computer: Analysis of Variance Table (Example 7-3)

Analysis of Variance

SOURCE      DF      SS      MS      F      p
Regression   2  630.54  315.27  86.34  0.000
Error        7   25.56    3.65
Total        9  656.10

The test statistic, F = 86.34, is greater than the critical point of F(2, 7) for any common level of significance (p-value ≈ 0), so the null hypothesis is rejected, and we conclude that the dependent variable is related to one or more of the independent variables.

• 7-14 How Good is the Regression

• Decomposition of the Sum of Squares and the Adjusted Coefficient of Determination. [Figure: SST decomposed into SSR and SSE.] Example 7-3: s = 1.911, R-sq = 96.1%, R-sq(adj) = 95.0%
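Those summary numbers follow directly from the ANOVA decomposition; a minimal Python sketch using the Example 7-3 sums of squares:

```python
# ANOVA quantities from Example 7-3 (n = 10 observations, k = 2 predictors).
n, k = 10, 2
ssr, sse = 630.538, 25.562
sst = ssr + sse                          # SST = SSR + SSE

r_sq = ssr / sst                         # coefficient of determination
mse = sse / (n - k - 1)
adj_r_sq = 1 - mse / (sst / (n - 1))     # penalized for degrees of freedom
s = mse ** 0.5                           # standard error of estimate
f = (ssr / k) / mse                      # F statistic for the overall model

print(round(r_sq, 3), round(adj_r_sq, 2), round(s, 3))   # 0.961 0.95 1.911
```

Adjusted R² is always at most R², because each added predictor costs an error degree of freedom.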

• Measures of Performance in Multiple Regression and the ANOVA Table

| Source of Variation | Sum of Squares | Degrees of Freedom      | Mean Square           | F Ratio     |
|---------------------|----------------|-------------------------|-----------------------|-------------|
| Regression          | SSR            | k                       | MSR = SSR/k           | F = MSR/MSE |
| Error               | SSE            | n - (k+1) = n - k - 1   | MSE = SSE/(n - k - 1) |             |
| Total               | SST            | n - 1                   |                       |             |

• 7-15 Tests of the Significance of Individual Regression Parameters. Hypothesis tests about individual regression slope parameters:

(1) H0: β1 = 0 vs H1: β1 ≠ 0
(2) H0: β2 = 0 vs H1: β2 ≠ 0
...
(k) H0: βk = 0 vs H1: βk ≠ 0

• Regression Results for Individual Parameters

| Variable | Coefficient Estimate | Standard Error | t-Statistic |   |
|----------|----------------------|----------------|-------------|---|
| Constant | 53.12                | 5.43           | 9.783       | * |
| X1       | 2.03                 | 0.22           | 9.227       | * |
| X2       | 5.60                 | 1.30           | 4.308       | * |
| X3       | 10.35                | 6.88           | 1.504       |   |
| X4       | 3.45                 | 2.70           | 1.259       |   |
| X5       | -4.25                | 0.38           | 11.184      | * |

n = 150, t(0.025) = 1.96; * marks parameters whose |t| exceeds 1.96.
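The t-statistics in the table are just coefficient-to-standard-error ratios compared against t(0.025) ≈ 1.96 at these degrees of freedom; a minimal Python sketch (small differences from the table are possible since its entries are themselves rounded):

```python
# (variable, coefficient estimate, standard error) from the table above.
rows = [
    ("Constant", 53.12, 5.43),
    ("X1", 2.03, 0.22),
    ("X2", 5.60, 1.30),
    ("X3", 10.35, 6.88),
    ("X4", 3.45, 2.70),
    ("X5", -4.25, 0.38),
]
t_crit = 1.96  # t(0.025) with n - (k+1) = 144 df, essentially the normal value

# t ratio and a significance flag for each parameter.
results = {name: (coef / se, abs(coef / se) > t_crit) for name, coef, se in rows}
for name, (t, sig) in results.items():
    print(f"{name}: t = {t:.3f}{' *' if sig else ''}")
```

X3 and X4 fail the individual tests even though the overall regression may be significant, which is exactly the situation the multicollinearity slides later warn about.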

• Example 7-3: Using the Computer

MTB > regress 'Y' on 2 predictors 'X1' 'X2'

Regression Analysis

The regression equation is
Y = 47.2 + 1.60 X1 + 1.15 X2

Predictor     Coef   Stdev  t-ratio      p
Constant    47.165   2.470    19.09  0.000
X1          1.5990  0.2810     5.69  0.000
X2          1.1487  0.3052     3.76  0.007

s = 1.911   R-sq = 96.1%   R-sq(adj) = 95.0%

Analysis of Variance

SOURCE      DF      SS      MS      F      p
Regression   2  630.54  315.27  86.34  0.000
Error        7   25.56    3.65
Total        9  656.10

SOURCE  DF  SEQ SS
X1       1  578.82
X2       1   51.72

• Using the Computer: Example 7-4

MTB > READ a:\data\c11_t6.dat C1-C5
MTB > NAME c1 'EXPORTS' c2 'M1' c3 'LEND' c4 'PRICE' c5 'EXCHANGE'
MTB > REGRESS 'EXPORTS' on 4 predictors 'M1' 'LEND' 'PRICE' 'EXCHANGE'

Regression Analysis

The regression equation is
EXPORTS = - 4.02 + 0.368 M1 + 0.0047 LEND + 0.0365 PRICE + 0.27 EXCHANGE

Predictor       Coef     Stdev  t-ratio      p
Constant      -4.015     2.766    -1.45  0.152
M1           0.36846   0.06385     5.77  0.000
LEND         0.00470   0.04922     0.10  0.924
PRICE       0.036511  0.009326     3.91  0.000
EXCHANGE       0.268     1.175     0.23  0.820

s = 0.3358   R-sq = 82.5%   R-sq(adj) = 81.4%

Analysis of Variance

SOURCE      DF       SS      MS      F      p
Regression   4  32.9463  8.2366  73.06  0.000
Error       62   6.9898  0.1127
Total       66  39.9361

• Example 7-5: Three Predictors

MTB > REGRESS 'EXPORTS' on 3 predictors 'LEND' 'PRICE' 'EXCHANGE'

Regression Analysis

The regression equation is
EXPORTS = - 0.29 - 0.211 LEND + 0.0781 PRICE - 2.10 EXCHANGE

Predictor       Coef     Stdev  t-ratio      p
Constant      -0.289     3.308    -0.09  0.931
LEND        -0.21140   0.03929    -5.38  0.000
PRICE       0.078148  0.007268    10.75  0.000
EXCHANGE      -2.095     1.355    -1.55  0.127

s = 0.4130   R-sq = 73.1%   R-sq(adj) = 71.8%

Analysis of Variance

SOURCE      DF       SS      MS      F      p
Regression   3  29.1919  9.7306  57.06  0.000
Error       63  10.7442  0.1705
Total       66  39.9361

• Example 7-5: Two Predictors

MTB > REGRESS 'EXPORTS' on 2 predictors 'M1' 'PRICE'

Regression Analysis

The regression equation is
EXPORTS = - 3.42 + 0.361 M1 + 0.0370 PRICE

Predictor       Coef     Stdev  t-ratio      p
Constant     -3.4230    0.5409    -6.33  0.000
M1           0.36142   0.03925     9.21  0.000
PRICE       0.037033  0.004094     9.05  0.000

s = 0.3306   R-sq = 82.5%   R-sq(adj) = 81.9%

Analysis of Variance

SOURCE      DF      SS      MS       F      p
Regression   2  32.940  16.470  150.67  0.000
Error       64   6.996   0.109
Total       66  39.936

• 7-16 Investigating the Validity of the Regression Model: Residual Plots

• Investigating the Validity of the Regression: Residual Plots (2)

• Histogram of Standardized Residuals: Example 7-6

• Investigating the Validity of the Regression: Outliers and Influential Observations

• Outliers and Influential Observations: Example 7-6

Unusual Observations
Obs.    M1  EXPORTS     Fit  Stdev.Fit  Residual  St.Resid
  1   5.10   2.6000  2.6420     0.1288   -0.0420    -0.14 X
  2   4.90   2.6000  2.6438     0.1234   -0.0438    -0.14 X
 25   6.20   5.5000  4.5949     0.0676    0.9051     2.80R
 26   6.30   3.7000  4.6311     0.0651   -0.9311    -2.87R
 50   8.30   4.3000  5.1317     0.0648   -0.8317    -2.57R
 67   8.20   5.6000  4.9474     0.0668    0.6526     2.02R

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.

• 7-17 Using the Multiple Regression Model for Prediction

• Prediction in Multiple Regression

MTB > regress 'EXPORTS' 2 'M1' 'PRICE';
SUBC> predict 6 160;
SUBC> predict 5 150;
SUBC> predict 4 130.

    Fit  Stdev.Fit      95.0% C.I.          95.0% P.I.
 4.6708     0.0853  ( 4.5003, 4.8412)  ( 3.9885, 5.3530)
 3.9390     0.0901  ( 3.7590, 4.1190)  ( 3.2543, 4.6237)
 2.8370     0.1116  ( 2.6140, 3.0599)  ( 2.1397, 3.5342)
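Each Fit value is just the two-predictor equation from Example 7-5 evaluated at the requested (M1, PRICE) point; a minimal Python sketch:

```python
# Coefficients of the two-predictor model EXPORTS = b0 + b1*M1 + b2*PRICE.
b0, b1, b2 = -3.4230, 0.36142, 0.037033

def point_prediction(m1, price):
    """Single-valued estimate of EXPORTS at the given predictor values."""
    return b0 + b1 * m1 + b2 * price

fits = [point_prediction(6, 160), point_prediction(5, 150), point_prediction(4, 130)]
print([round(f, 3) for f in fits])   # [4.671, 3.939, 2.837]
```

These reproduce the Fit column above; the interval half-widths additionally depend on the estimated covariance of the coefficients, which Minitab computes internally.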

• 7-18 Qualitative (or Categorical) Independent Variables (in Regression)

| MOVIE | EARN | COST | PROM | BOOK |
|-------|------|------|------|------|
| 1  | 28 | 4.2  | 1.0  | 0 |
| 2  | 35 | 6.0  | 3.0  | 1 |
| 3  | 50 | 5.5  | 6.0  | 1 |
| 4  | 20 | 3.3  | 1.0  | 0 |
| 5  | 75 | 12.5 | 11.0 | 1 |
| 6  | 60 | 9.6  | 8.0  | 1 |
| 7  | 15 | 2.5  | 0.5  | 0 |
| 8  | 45 | 10.8 | 5.0  | 0 |
| 9  | 50 | 8.4  | 3.0  | 1 |
| 10 | 34 | 6.6  | 2.0  | 0 |
| 11 | 48 | 10.7 | 1.0  | 1 |
| 12 | 82 | 11.0 | 15.0 | 1 |
| 13 | 24 | 3.5  | 4.0  | 0 |
| 14 | 50 | 6.9  | 10.0 | 0 |
| 15 | 58 | 7.8  | 9.0  | 1 |
| 16 | 63 | 10.1 | 10.0 | 0 |
| 17 | 30 | 5.0  | 1.0  | 1 |
| 18 | 37 | 7.5  | 5.0  | 0 |
| 19 | 45 | 6.4  | 8.0  | 1 |
| 20 | 72 | 10.0 | 12.0 | 1 |

MTB > regress 'EARN' 3 'COST' 'PROM' 'BOOK'

Regression Analysis

The regression equation is
EARN = 7.84 + 2.85 COST + 2.28 PROM + 7.17 BOOK

Predictor    Coef   Stdev  t-ratio      p
Constant    7.836   2.333     3.36  0.004
COST       2.8477  0.3923     7.26  0.000
PROM       2.2782  0.2534     8.99  0.000
BOOK        7.166   1.818     3.94  0.001

s = 3.690   R-sq = 96.7%   R-sq(adj) = 96.0%

Analysis of Variance

SOURCE      DF      SS      MS       F      p
Regression   3  6325.2  2108.4  154.89  0.000
Error       16   217.8    13.6
Total       19  6543.0

• Picturing Qualitative Variables in RegressionA multiple regression with two quantitative variables (X1 and X2) and one qualitative variable (X3):

A regression with one quantitative variable (X1) and one qualitative variable (X2):

• Picturing Qualitative Variables in Regression: Three Categories and Two Dummy Variables

• Using Qualitative Variables in Regression: Example 7-6

• Interactions between Quantitative and Qualitative Variables: Shifting Slopes. A regression with interaction between a quantitative variable (X1) and a qualitative variable (X2):

• 7-19 Polynomial Regression

• Polynomial Regression: Example 7-7

• Polynomial Regression: Other Variables and Cross-Product Terms

• 7-20 Nonlinear Models and Transformations: Multiplicative Model

MTB > loge c1 c3
MTB > loge c2 c4
MTB > name c3 'LOGSALE' c4 'LOGADV'
MTB > regress 'logsale' 1 'logadv'

Regression Analysis

The regression equation is
LOGSALE = 1.70 + 0.553 LOGADV

Predictor     Coef    Stdev  t-ratio      p
Constant   1.70082  0.05123    33.20  0.000
LOGADV     0.55314  0.03011    18.37  0.000

s = 0.1125   R-sq = 94.7%   R-sq(adj) = 94.4%

Analysis of Variance

SOURCE      DF      SS      MS       F      p
Regression   1  4.2722  4.2722  337.56  0.000
Error       19  0.2405  0.0127
Total       20  4.5126
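The multiplicative model Y = αX^β becomes linear after taking logs of both variables: log Y = log α + β log X. The sales/advertising data behind the output above is not reproduced in the deck, so the sketch below uses noise-free synthetic data (α = 5.5 and β = 0.553 are made-up values) to show that ordinary least squares on the logged variables recovers the exponent:

```python
import math

# Synthetic multiplicative data: y = alpha * x**beta (no noise; values assumed).
alpha, beta = 5.5, 0.553
x = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
y = [alpha * xi ** beta for xi in x]

# Transform to logs and fit a straight line by least squares.
lx = [math.log(xi) for xi in x]
ly = [math.log(yi) for yi in y]
n = len(lx)
mx, my = sum(lx) / n, sum(ly) / n
b1 = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum((a - mx) ** 2 for a in lx)
b0 = my - b1 * mx

print(round(b1, 3), round(math.exp(b0), 1))   # recovers beta = 0.553, alpha = 5.5
```

With real data the fit is not exact, and the slope on the log scale is interpreted as an elasticity: a 1% increase in X multiplies the expected Y by roughly (1.01)^β.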

• Transformations: Exponential Model

MTB > regress 'sales' 1 'logadv'

Regression Analysis

The regression equation is
SALES = 3.67 + 6.78 LOGADV

Predictor    Coef   Stdev  t-ratio      p
Constant   3.6683  0.4016     9.13  0.000
LOGADV     6.7840  0.2360    28.74  0.000

s = 0.8819   R-sq = 97.8%   R-sq(adj) = 97.6%

Analysis of Variance

SOURCE      DF      SS      MS       F      p
Regression   1  642.62  642.62  826.24  0.000
Error       19   14.78    0.78
Total       20  657.40

• Plots of Transformed Variables

• Variance Stabilizing Transformations
  - Square root transformation: useful when the variance of the regression errors is approximately proportional to the conditional mean of Y.
  - Logarithmic transformation: useful when the variance of the regression errors is approximately proportional to the square of the conditional mean of Y.
  - Reciprocal transformation: useful when the variance of the regression errors is approximately proportional to the fourth power of the conditional mean of Y.

• Regression with Dependent Indicator Variables

• 7.21 Multicollinearity

• Effects of Multicollinearity
  - Variances of regression coefficients are inflated.
  - Magnitudes of regression coefficients may be different from what is expected.
  - Signs of regression coefficients may not be as expected.
  - Adding or removing variables produces large changes in coefficients.
  - Removing a data point may cause large changes in coefficient estimates or signs.
  - In some cases, the F ratio may be significant while the t ratios are not.

• Detecting the Existence of Multicollinearity: Correlation Matrix of Independent Variables and Variance Inflation Factors

MTB > CORRELATION 'M1' 'LEND' 'PRICE' 'EXCHANGE'

Correlations (Pearson)

              M1    LEND   PRICE
LEND      -0.112
PRICE      0.447   0.745
EXCHANGE  -0.410  -0.279  -0.420

MTB > regress 'EXPORTS' on 4 predictors 'M1' 'LEND' 'PRICE' 'EXCHANGE';
SUBC> vif.

Regression Analysis

The regression equation is
EXPORTS = - 4.02 + 0.368 M1 + 0.0047 LEND + 0.0365 PRICE + 0.27 EXCHANGE

Predictor       Coef     Stdev  t-ratio      p  VIF
Constant      -4.015     2.766    -1.45  0.152
M1           0.36846   0.06385     5.77  0.000  3.2
LEND         0.00470   0.04922     0.10  0.924  5.4
PRICE       0.036511  0.009326     3.91  0.000  6.3
EXCHANGE       0.268     1.175     0.23  0.820  1.4

s = 0.3358   R-sq = 82.5%   R-sq(adj) = 81.4%

• Variance Inflation Factor. Relationship between VIF and R²h: VIF(Xh) = 1 / (1 - R²h), where R²h is the coefficient of determination from regressing Xh on the other independent variables.
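A minimal sketch of that relationship; the R²h values below are illustrative, not from the deck's data:

```python
def vif(r_squared_h):
    """Variance inflation factor for a predictor whose regression on the
    other predictors has coefficient of determination r_squared_h."""
    return 1.0 / (1.0 - r_squared_h)

# As R²h approaches 1 (near-perfect multicollinearity), VIF blows up.
for r2 in (0.0, 0.5, 0.9, 0.99):
    print(r2, round(vif(r2), 1))   # 1.0, 2.0, 10.0, 100.0
```

Since the variance of an estimated coefficient scales with its VIF, R²h = 0.9 means that coefficient's standard error is about √10 ≈ 3.2 times larger than it would be with orthogonal predictors.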

• Solutions to the Multicollinearity Problem
  - Drop a collinear variable from the regression.
  - Change the sampling plan to include elements outside the multicollinearity range.
  - Transform the variables.
  - Use ridge regression.

• 7-22 Residual Autocorrelation and the Durbin-Watson TestAn autocorrelation is a correlation of the values of a variable with values of the same variable lagged one or more periods back. Consequences of autocorrelation include inaccurate estimates of variances and inaccurate predictions.

• Critical Points of the Durbin-Watson Statistic: α = 0.05, n = sample size, k = number of independent variables

|  n  | k=1 dL | k=1 dU | k=2 dL | k=2 dU | k=3 dL | k=3 dU | k=4 dL | k=4 dU | k=5 dL | k=5 dU |
|-----|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| 15  | 1.08 | 1.36 | 0.95 | 1.54 | 0.82 | 1.75 | 0.69 | 1.97 | 0.56 | 2.21 |
| 16  | 1.10 | 1.37 | 0.98 | 1.54 | 0.86 | 1.73 | 0.74 | 1.93 | 0.62 | 2.15 |
| 17  | 1.13 | 1.38 | 1.02 | 1.54 | 0.90 | 1.71 | 0.78 | 1.90 | 0.67 | 2.10 |
| 18  | 1.16 | 1.39 | 1.05 | 1.53 | 0.93 | 1.69 | 0.82 | 1.87 | 0.71 | 2.06 |
| ... | ...  | ...  | ...  | ...  | ...  | ...  | ...  | ...  | ...  | ...  |
| 65  | 1.57 | 1.63 | 1.54 | 1.66 | 1.50 | 1.70 | 1.47 | 1.73 | 1.44 | 1.77 |
| 70  | 1.58 | 1.64 | 1.55 | 1.67 | 1.52 | 1.70 | 1.49 | 1.74 | 1.46 | 1.77 |
| 75  | 1.60 | 1.65 | 1.57 | 1.68 | 1.54 | 1.71 | 1.51 | 1.74 | 1.49 | 1.77 |
| 80  | 1.61 | 1.66 | 1.59 | 1.69 | 1.56 | 1.72 | 1.53 | 1.74 | 1.51 | 1.77 |
| 85  | 1.62 | 1.67 | 1.60 | 1.70 | 1.57 | 1.72 | 1.55 | 1.75 | 1.52 | 1.77 |
| 90  | 1.63 | 1.68 | 1.61 | 1.70 | 1.59 | 1.73 | 1.57 | 1.75 | 1.54 | 1.78 |
| 95  | 1.64 | 1.69 | 1.62 | 1.71 | 1.60 | 1.73 | 1.58 | 1.75 | 1.56 | 1.78 |
| 100 | 1.65 | 1.69 | 1.63 | 1.72 | 1.61 | 1.74 | 1.59 | 1.76 | 1.57 | 1.78 |

• Using the Durbin-Watson Statistic

MTB > regress 'EXPORTS' 4 'M1' 'LEND' 'PRICE' 'EXCHANGE';
SUBC> dw.

Durbin-Watson statistic = 2.58

The scale from 0 to 4 divides into five regions: positive autocorrelation (below dL), inconclusive (dL to dU), no autocorrelation (dU to 4-dU), inconclusive (4-dU to 4-dL), and negative autocorrelation (above 4-dL). For n = 67, k = 4: dU ≈ 1.73, so 4-dU ≈ 2.27; dL ≈ 1.47, so 4-dL ≈ 2.53. Since 2.53 < 2.58, H0 is rejected, and we conclude there is negative first-order autocorrelation.
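The statistic itself is the ratio of the sum of squared successive residual differences to the residual sum of squares; a minimal Python sketch with a made-up residual series:

```python
def durbin_watson(residuals):
    """DW = Σ(e_t - e_{t-1})² / Σe_t²; near 2 suggests no first-order
    autocorrelation, near 0 positive, near 4 negative."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Sign-alternating residuals (made up) imitate negative autocorrelation.
print(durbin_watson([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]))   # 3.333...
```

A constant residual series gives DW = 0 (extreme positive autocorrelation), while strict sign alternation pushes DW toward 4, matching the decision scale above.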

• 7-23 Partial F Tests and Variable Selection Methods

Full model: Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε
Reduced model: Y = β0 + β1X1 + β2X2 + ε

Partial F test:
H0: β3 = β4 = 0
H1: β3 and β4 not both 0

Partial F statistic:

F(r, n-(k+1)) = [(SSER - SSEF) / r] / MSEF

where SSER is the sum of squared errors of the reduced model, SSEF is the sum of squared errors of the full model, MSEF is the mean square error of the full model [MSEF = SSEF/(n-(k+1))], and r is the number of variables dropped from the full model.
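The Example 7-4 (four-predictor) and Example 7-5 (three-predictor) outputs earlier give everything needed for a partial F test of dropping M1; a minimal Python sketch (for a single dropped variable, the partial F equals the square of that variable's t-ratio in the full model, 5.77² ≈ 33.3):

```python
# Sums of squared errors from the Minitab outputs above.
sse_full, df_full = 6.9898, 62     # EXPORTS on M1, LEND, PRICE, EXCHANGE
sse_reduced = 10.7442              # EXPORTS on LEND, PRICE, EXCHANGE
r = 1                              # number of variables dropped (M1)

mse_full = sse_full / df_full
partial_f = ((sse_reduced - sse_full) / r) / mse_full

print(round(partial_f, 1))   # ≈ 33.3, roughly the square of M1's t-ratio 5.77
```

Dropping M1 inflates the error sum of squares so much that the partial F is far beyond any common F(1, 62) critical point, so M1 clearly belongs in the model.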

• Variable Selection Methods
  - All possible regressions: run regressions with all possible combinations of independent variables and select the best model.
  - Stepwise procedures:
    - Forward selection: add one variable at a time to the model, on the basis of its F statistic.
    - Backward elimination: remove one variable at a time, on the basis of its F statistic.
    - Stepwise regression: adds variables to and removes variables from the model, on the basis of the F statistic.

• Stepwise Regression

• Stepwise Regression: Using the Computer

MTB > STEPWISE 'EXPORTS' PREDICTORS 'M1' 'LEND' 'PRICE' 'EXCHANGE'

Stepwise Regression

F-to-Enter: 4.00 F-to-Remove: 4.00

Response is EXPORTS on 4 predictors, with N = 67

Step           1        2
Constant  0.9348  -3.4230

M1         0.520    0.361
T-Ratio     9.89     9.21

PRICE             0.0370
T-Ratio             9.05

S          0.495    0.331
R-Sq       60.08    82.48

• Using the Computer: MINITAB

MTB > REGRESS 'EXPORTS' 4 'M1' 'LEND' 'PRICE' 'EXCHANGE';
SUBC> vif;
SUBC> dw.

Regression Analysis

The regression equation is
EXPORTS = - 4.02 + 0.368 M1 + 0.0047 LEND + 0.0365 PRICE + 0.27 EXCHANGE

Predictor       Coef     Stdev  t-ratio      p  VIF
Constant      -4.015     2.766    -1.45  0.152
M1           0.36846   0.06385     5.77  0.000  3.2
LEND         0.00470   0.04922     0.10  0.924  5.4
PRICE       0.036511  0.009326     3.91  0.000  6.3
EXCHANGE       0.268     1.175     0.23  0.820  1.4

s = 0.3358   R-sq = 82.5%   R-sq(adj) = 81.4%

Analysis of Variance

SOURCE      DF       SS      MS      F      p
Regression   4  32.9463  8.2366  73.06  0.000
Error       62   6.9898  0.1127
Total       66  39.9361

Durbin-Watson statistic = 2.58

• Using the Computer: SAS

data exports;
  infile 'c:\aczel\data\c11_t6.dat';
  input exports m1 lend price exchange;
proc reg data = exports;
  model exports = m1 lend price exchange / dw vif;
run;

Model: MODEL1
Dependent Variable: EXPORTS

Analysis of Variance

Sum of Mean Source DF Squares Square F Value Prob>F

Model 4 32.94634 8.23658 73.059 0.0001 Error 62 6.98978 0.11274 C Total 66 39.93612

Root MSE 0.33577 R-square 0.8250 Dep Mean 4.52836 Adj R-sq 0.8137 C.V. 7.41473

• Using the Computer: SAS (continued)Parameter Estimates

Parameter Standard T for H0: Variable DF Estimate Error Parameter=0 Prob > |T|

INTERCEP 1 -4.015461 2.76640057 -1.452 0.1517 M1 1 0.368456 0.06384841 5.771 0.0001 LEND 1 0.004702 0.04922186 0.096 0.9242 PRICE 1 0.036511 0.00932601 3.915 0.0002 EXCHANGE 1 0.267896 1.17544016 0.228 0.8205

Variance Variable DF Inflation

INTERCEP 1 0.00000000 M1 1 3.20719533 LEND 1 5.35391367 PRICE 1 6.28873181 EXCHANGE 1 1.38570639

Durbin-Watson D             2.583
(For Number of Obs.)           67
1st Order Autocorrelation  -0.321

• The Matrix Approach to Regression Analysis (1)

• The Matrix Approach to Regression Analysis (2)