Regression Analysis and Multiple Regression (Session 7)

TRANSCRIPT

Page 1: Regression Analysis and Multiple Regression

Regression Analysis and Multiple Regression

Session 7

Page 2: Regression Analysis and Multiple Regression

• Using Statistics

• The Simple Linear Regression Model

• Estimation: The Method of Least Squares

• Error Variance and the Standard Errors of Regression Estimators

• Correlation

• Hypothesis Tests about the Regression Relationship

• How Good is the Regression?

• Analysis of Variance Table and an F Test of the Regression Model

• Residual Analysis and Checking for Model Inadequacies

• Use of the Regression Model for Prediction

• Using the Computer

• Summary and Review of Terms

Simple Linear Regression Model

Page 3: Regression Analysis and Multiple Regression

This scatterplot locates pairs of observations of advertising expenditures on the x-axis and sales on the y-axis. We notice that:

Larger (smaller) values of sales tend to be associated with larger (smaller) values of advertising.

Scatterplot of Advertising Expenditures (X) and Sales (Y)

[Figure: scatterplot; x-axis: Advertising (0 to 50), y-axis: Sales (0 to 140)]

The scatter of points tends to be distributed around a positively sloped straight line.

The pairs of values of advertising expenditures and sales are not located exactly on a straight line. The scatter plot reveals a more or less strong tendency rather than a precise linear relationship. The line represents the nature of the relationship on average.

7-1 Using Statistics

Page 4: Regression Analysis and Multiple Regression

[Figure: several example scatterplots, each with axes X and Y, illustrating different patterns of association]

Examples of Other Scatterplots

Page 5: Regression Analysis and Multiple Regression

The inexact nature of the relationship between advertising and sales suggests that a statistical model might be useful in analyzing the relationship.

A statistical model separates the systematic component of a relationship from the random component.

Statistical model: Data = Systematic component + Random errors

In ANOVA, the systematic component is the variation of means between samples or treatments (SSTR) and the random component is the unexplained variation (SSE).

In regression, the systematic component is the overall linear relationship, and the random component is the variation around the line.

Model Building

Page 6: Regression Analysis and Multiple Regression

The population simple linear regression model:

The population simple linear regression model:

$$Y = \beta_0 + \beta_1 X + \epsilon$$

where $\beta_0 + \beta_1 X$ is the systematic (nonrandom) component and $\epsilon$ is the random component.

Here Y is the dependent variable, the variable we wish to explain or predict; X is the independent variable, also called the predictor variable; and ε is the error term, the only random component in the model, and thus the only source of randomness in Y.

β0 is the intercept of the systematic component of the regression relationship.

β1 is the slope of the systematic component.

The conditional mean of Y: $E[Y \mid X] = \beta_0 + \beta_1 X$

7-2 The Simple Linear Regression Model

Page 7: Regression Analysis and Multiple Regression

The simple linear regression model posits an exact linear relationship between the expected or average value of Y, the dependent variable, and X, the independent or predictor variable: $E[Y_i] = \beta_0 + \beta_1 X_i$

Actual observed values of Y differ from the expected value by an unexplained or random error:

$$Y_i = E[Y_i] + \epsilon_i = \beta_0 + \beta_1 X_i + \epsilon_i$$

[Regression plot: the line $E[Y] = \beta_0 + \beta_1 X$ with intercept β0 and slope β1; the error εi is the vertical distance between the observed Yi and the line at Xi]

Picturing the Simple Linear Regression Model

Page 8: Regression Analysis and Multiple Regression

• The relationship between X and Y is a straight-line relationship.

• The values of the independent variable X are assumed fixed (not random); the only randomness in the values of Y comes from the error term εi.

• The errors εi are normally distributed with mean 0 and variance σ², and are uncorrelated (not related) in successive observations. That is: $\epsilon_i \sim N(0, \sigma^2)$

[Figure: the regression line $E[Y] = \beta_0 + \beta_1 X$ with identical normal distributions of errors, all centered on the regression line]

Assumptions of the Simple Linear Regression Model
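To make the assumptions concrete, here is a minimal Python simulation sketch (not from the deck); the values of beta0, beta1, and sigma are illustrative assumptions only:

import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters, not from the deck:
beta0, beta1, sigma = 275.0, 1.25, 300.0

x = np.linspace(1000, 5500, 25)            # fixed, nonrandom X values
eps = rng.normal(0.0, sigma, size=x.size)  # errors ~ N(0, sigma^2), uncorrelated
y = beta0 + beta1 * x + eps                # Y = beta0 + beta1*X + epsilon

print(y[:3])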

Page 9: Regression Analysis and Multiple Regression

Estimation of a simple linear regression relationship involves finding estimated or predicted values of the intercept and slope of the linear regression line.

The estimated regression equation: $Y = b_0 + b_1 X + e$

where b0 estimates the intercept of the population regression line, β0; b1 estimates the slope of the population regression line, β1; and e stands for the observed errors, the residuals from fitting the estimated regression line $b_0 + b_1 X$ to a set of n points.

The estimated regression line: $\hat{Y} = b_0 + b_1 X$

where $\hat{Y}$ (Y-hat) is the value of Y lying on the fitted regression line for a given value of X.

7-3 Estimation: The Method of Least Squares

Page 10: Regression Analysis and Multiple Regression

Fitting a Regression Line

[Figure: panels with axes X and Y showing (1) the data, (2) three errors from a fitted line, (3) three errors from the least squares regression line, and (4) how errors from the least squares regression line are minimized]

Page 11: Regression Analysis and Multiple Regression

[Figure: the fitted regression line $\hat{Y} = b_0 + b_1 X$; at a given Xi the error is $e_i = Y_i - \hat{Y}_i$, where $\hat{Y}_i$ is the predicted value of Y for Xi]

Errors in Regression

Page 12: Regression Analysis and Multiple Regression

Least Squares Regression

The sum of squared errors in regression is:

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

The least squares regression line is that which minimizes the SSE with respect to the estimates b0 and b1.

The normal equations:

$$\sum_{i=1}^{n} y_i = n b_0 + b_1 \sum_{i=1}^{n} x_i$$

$$\sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2$$

[Figure: the SSE as a function of b0 and b1, minimized at the least squares estimates of b0 and b1]

Page 13: Regression Analysis and Multiple Regression

Sums of squares and cross products:

$$SS_x = \sum (x - \bar{x})^2 = \sum x^2 - \frac{\left(\sum x\right)^2}{n}$$

$$SS_y = \sum (y - \bar{y})^2 = \sum y^2 - \frac{\left(\sum y\right)^2}{n}$$

$$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}$$

Least squares regression estimators:

$$b_1 = \frac{SS_{XY}}{SS_X} \qquad\qquad b_0 = \bar{y} - b_1 \bar{x}$$

Sums of Squares, Cross Products, and Least Squares Estimators
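A short Python sketch of these estimators; the small x and y arrays below are made-up illustrative data, not the deck's:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = x.size

# Sums of squares and cross products
SS_x = np.sum(x**2) - np.sum(x)**2 / n
SS_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n

# Least squares estimators
b1 = SS_xy / SS_x
b0 = y.mean() - b1 * x.mean()

print(b1, b0)  # agrees with np.polyfit(x, y, 1)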

Page 14: Regression Analysis and Multiple Regression

Miles   Dollars   Miles²       Miles·Dollars
1211    1802      1466521      2182222
1345    2405      1809025      3234725
1422    2005      2022084      2851110
1687    2511      2845969      4236057
1849    2332      3418801      4311868
2026    2305      4104676      4669930
2133    3016      4549689      6433128
2253    3385      5076009      7626405
2400    3090      5760000      7416000
2468    3694      6091024      9116792
2699    3371      7284601      9098329
2806    3998      7873636      11218388
3082    3555      9498724      10956510
3209    4692      10297681     15056628
3466    4244      12013156     14709704
3643    5298      13271449     19300614
3852    4801      14837904     18493452
4033    5147      16265089     20757852
4267    5738      18207288     24484046
4498    6420      20232004     28877160
4533    6059      20548088     27465448
4804    6426      23078416     30870504
5090    6321      25908100     32173890
5233    7026      27384288     36767056
5439    6964      29582720     37877196
-----   ------    ---------    ---------
79448   106605    293426944    390185024

$$SS_x = \sum x^2 - \frac{(\sum x)^2}{n} = 293{,}426{,}944 - \frac{79{,}448^2}{25} = 40{,}947{,}557.84$$

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 390{,}185{,}024 - \frac{(79{,}448)(106{,}605)}{25} = 51{,}402{,}852.4$$

$$b_1 = \frac{SS_{XY}}{SS_X} = \frac{51{,}402{,}852.4}{40{,}947{,}557.84} = 1.255333776 \approx 1.26$$

$$b_0 = \bar{y} - b_1 \bar{x} = \frac{106{,}605}{25} - (1.255333776)\left(\frac{79{,}448}{25}\right) = 274.85$$

Example 7-1

Page 15: Regression Analysis and Multiple Regression

[Figure: Regression of Dollars Charged against Miles; scatterplot with fitted line, R-Squared = 0.965, Y = 274.850 + 1.25533X]

MTB > Regress 'Dollars' 1 'Miles';
SUBC> Constant.

Regression Analysis

The regression equation is
Dollars = 275 + 1.26 Miles

Predictor   Coef      Stdev     t-ratio   p
Constant    274.8     170.3     1.61      0.120
Miles       1.25533   0.04972   25.25     0.000

s = 318.2   R-sq = 96.5%   R-sq(adj) = 96.4%

Analysis of Variance

SOURCE       DF   SS         MS         F        p
Regression   1    64527736   64527736   637.47   0.000
Error        23   2328161    101224
Total        24   66855896

Example 7-1: Using the Computer

Page 16: Regression Analysis and Multiple Regression

The results below are the output created by selecting the REGRESSION option from Excel's DATA ANALYSIS toolkit.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.98243393
R Square            0.965176428
Adjusted R Square   0.963662359
Standard Error      318.1578225
Observations        25

ANOVA
             df   SS            MS           F             Significance F
Regression   1    64527736.8    64527736.8   637.4721586   2.85084E-18
Residual     23   2328161.201   101224.4
Total        24   66855898

            Coefficients   Standard Error   t Stat       P-value       Lower 95%      Upper 95%
Intercept   274.8496867    170.3368437      1.61356569   0.120259309   -77.51844165   627.217815
MILES       1.255333776    0.049719712      25.248211    2.85084E-18   1.152480856    1.358186696

Example 7-1: Using the Computer (Excel)

Page 17: Regression Analysis and Multiple Regression

Residual Analysis. The plot shows the absence of a relationship between the residuals and the X-values (miles).

[Figure: Residuals vs. Miles; residuals (-800 to 600) scattered randomly around zero across miles 0 to 6000]

Example 7-1: Using the Computer (Excel)

Page 18: Regression Analysis and Multiple Regression

[Figure: two views of the same scatter. Left: what you see when looking at the total variation of Y. Right: what you see when looking along the regression line at the error variance of Y.]

Total Variance and Error Variance

Page 19: Regression Analysis and Multiple Regression

Degrees of Freedom in Regression:

df = (n - 2)  (n total observations, less one degree of freedom for each parameter estimated, b0 and b1)

Square and sum all regression errors to find SSE:

$$SSE = \sum (Y - \hat{Y})^2 = SS_Y - \frac{SS_{XY}^2}{SS_X} = SS_Y - b_1 SS_{XY}$$

An unbiased estimator of σ², denoted by s²:

$$MSE = \frac{SSE}{n - 2}$$

Example 7-1:

$$SSE = SS_Y - b_1 SS_{XY} = 66{,}855{,}898 - (1.255333776)(51{,}402{,}852.4) = 2{,}328{,}161.2$$

$$MSE = \frac{SSE}{n-2} = \frac{2{,}328{,}161.2}{23} = 101{,}224.4$$

$$s = \sqrt{MSE} = \sqrt{101{,}224.4} = 318.158$$

7-4 Error Variance and the Standard Errors of Regression Estimators
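The same computation in Python, using the Example 7-1 summary numbers quoted above:

import math

n = 25
SS_y = 66_855_898.0        # from Example 7-1
SS_xy = 51_402_852.4
b1 = 1.255333776

SSE = SS_y - b1 * SS_xy    # SSE = SS_Y - b1 * SS_XY
MSE = SSE / (n - 2)        # unbiased estimator of sigma^2
s = math.sqrt(MSE)         # standard error of estimate

print(round(SSE, 1), round(MSE, 1), round(s, 3))
# approximately 2328161.2, 101224.4, 318.158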

Page 20: Regression Analysis and Multiple Regression

The standard error of b0 (intercept):

$$s(b_0) = \frac{s\sqrt{\sum x^2}}{\sqrt{n \, SS_X}}$$

where $s = \sqrt{MSE}$.

The standard error of b1 (slope):

$$s(b_1) = \frac{s}{\sqrt{SS_X}}$$

Example 7-1:

$$s(b_0) = \frac{318.158\sqrt{293{,}426{,}944}}{\sqrt{(25)(40{,}947{,}557.84)}} = 170.338$$

$$s(b_1) = \frac{318.158}{\sqrt{40{,}947{,}557.84}} = 0.04972$$

Standard Errors of Estimates in Regression
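A Python sketch of these standard errors, again with the Example 7-1 quantities:

import math

n = 25
s = 318.158                 # from Example 7-1
SS_x = 40_947_557.84
sum_x2 = 293_426_944.0

s_b0 = s * math.sqrt(sum_x2) / math.sqrt(n * SS_x)  # standard error of the intercept
s_b1 = s / math.sqrt(SS_x)                          # standard error of the slope

print(round(s_b0, 3), round(s_b1, 5))  # approximately 170.338 and 0.04972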

Page 21: Regression Analysis and Multiple Regression

A (1 - α)100% confidence interval for β0:

$$b_0 \pm t_{\alpha/2, (n-2)} \, s(b_0)$$

A (1 - α)100% confidence interval for β1:

$$b_1 \pm t_{\alpha/2, (n-2)} \, s(b_1)$$

Example 7-1, 95% confidence intervals:

$$b_0 \pm t_{0.025, 23} \, s(b_0) = 274.85 \pm (2.069)(170.338) = 274.85 \pm 352.43 = [-77.58, \; 627.28]$$

$$b_1 \pm t_{0.025, 23} \, s(b_1) = 1.25533 \pm (2.069)(0.04972) = 1.25533 \pm 0.10287 = [1.15246, \; 1.35820]$$

[Figure: the least-squares point estimate of the slope, b1 = 1.25533, with lower 95% bound 1.15246 and upper 95% bound 1.35820; zero is not a possible value of the regression slope at 95%]

Confidence Intervals for the Regression Parameters
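A Python sketch of the 95% intervals, assuming SciPy is available for the t critical value:

from scipy import stats

n = 25
b0, s_b0 = 274.85, 170.338   # from Example 7-1
b1, s_b1 = 1.25533, 0.04972

t = stats.t.ppf(0.975, n - 2)           # t_{0.025, 23} = 2.069
ci_b0 = (b0 - t * s_b0, b0 + t * s_b0)  # approximately [-77.58, 627.28]
ci_b1 = (b1 - t * s_b1, b1 + t * s_b1)  # approximately [1.15246, 1.35820]

print(ci_b0, ci_b1)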

Page 22: Regression Analysis and Multiple Regression

The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables. The population correlation, denoted by ρ, can take on any value from -1 to 1.

ρ = -1 indicates a perfect negative linear relationship
-1 < ρ < 0 indicates a negative linear relationship
ρ = 0 indicates no linear relationship
0 < ρ < 1 indicates a positive linear relationship
ρ = 1 indicates a perfect positive linear relationship

The absolute value of ρ indicates the strength or exactness of the relationship.

7-5 Correlation

Page 23: Regression Analysis and Multiple Regression

[Figure: six scatterplots with axes X and Y illustrating correlation: ρ = 0 (two panels), ρ = -0.8, ρ = 0.8, ρ = -1, and ρ = 1]

Illustrations of Correlation

Page 24: Regression Analysis and Multiple Regression

The covariance of two random variables X and Y:

$$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$

where μX and μY are the population means of X and Y respectively.

The population correlation coefficient:

$$\rho = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$$

The sample correlation coefficient*:

$$r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}}$$

Example 7-1:

$$r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}} = \frac{51{,}402{,}852.4}{\sqrt{(40{,}947{,}557.84)(66{,}855{,}898)}} = \frac{51{,}402{,}852.4}{52{,}321{,}943.29} = 0.9824$$

*Note: if ρ < 0 then b1 < 0; if ρ = 0 then b1 = 0; if ρ > 0 then b1 > 0.

Covariance and Correlation
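In Python, with the Example 7-1 sums of squares:

import math

SS_xy = 51_402_852.4        # from Example 7-1
SS_x = 40_947_557.84
SS_y = 66_855_898.0

r = SS_xy / math.sqrt(SS_x * SS_y)
print(round(r, 4))  # approximately 0.9824; np.corrcoef gives the same from raw data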

Page 25: Regression Analysis and Multiple Regression

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.992265946
R Square            0.984591707
Adjusted R Square   0.98266567
Standard Error      0.279761372
Observations        10

ANOVA
             df   SS            MS            F             Significance F
Regression   1    40.0098686    40.0098686    511.2009204   1.55085E-08
Residual     8    0.626131402   0.078266425
Total        9    40.636

            Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept   -8.762524695   0.594092798      -14.74942084   4.39075E-07   -10.13250603   -7.39254336
US          1.423636087    0.062965575      22.60975277    1.55085E-08   1.278437117    1.568835058

RESIDUAL OUTPUT

Observation   Predicted Y   Residuals
1             2.057109569   0.242890431
2             2.484200395   0.115799605
3             3.05365483    -0.15365483
4             3.480745656   -0.280745656
5             3.765472874   -0.065472874
6             4.050200091   0.049799909
7             4.619654526   0.180345474
8             5.758563396   -0.058563396
9             7.466926701   -0.466926701
10            8.463471962   0.436528038

[Figure: X Variable 1 Line Fit Plot, showing Y and Predicted Y against X Variable 1]

Example 7-2: Using the Computer (Excel)

Page 26: Regression Analysis and Multiple Regression

[Figure: Regression Plot of International (y-axis, 2 to 9) against United States (x-axis, 8 to 12); Y = -8.76252 + 1.42364X, R-Sq = 0.9846]

Example 7-2: Regression Plot

Page 27: Regression Analysis and Multiple Regression

H0: ρ = 0 (no linear relationship)
H1: ρ ≠ 0 (some linear relationship)

Test statistic:

$$t_{(n-2)} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

Example 7-1:

$$t_{(n-2)} = \frac{0.9824}{\sqrt{\dfrac{1 - 0.9651}{25 - 2}}} = \frac{0.9824}{0.0389} = 25.25$$

Since $t_{0.005, 23} = 2.807 < 25.25$, H0 is rejected at the 1% level.

Hypothesis Tests for the Correlation Coefficient

Page 28: Regression Analysis and Multiple Regression

[Figure: three scatterplots showing patterns with no linear relationship between X and Y: constant Y, unsystematic variation, and a nonlinear relationship]

A hypothesis test for the existence of a linear relationship between X and Y:

H0: β1 = 0
H1: β1 ≠ 0

Test statistic for the existence of a linear relationship between X and Y:

$$t_{(n-2)} = \frac{b_1 - 0}{s(b_1)}$$

where b1 is the least-squares estimate of the regression slope and s(b1) is the standard error of b1. When the null hypothesis is true, the statistic has a t distribution with n - 2 degrees of freedom.

Hypothesis Tests about the Regression Relationship

Page 29: Regression Analysis and Multiple Regression

Example 7-1:

H0: β1 = 0
H1: β1 ≠ 0

$$t_{(n-2)} = \frac{b_1 - 0}{s(b_1)} = \frac{1.25533}{0.04972} = 25.25$$

Since $t_{0.005, 23} = 2.807 < 25.25$, H0 is rejected at the 1% level, and we may conclude that there is a relationship between charges and miles traveled.

Example 10-3:

H0: β1 = 1
H1: β1 ≠ 1

$$t_{(n-2)} = \frac{b_1 - 1}{s(b_1)} = \frac{1.24 - 1}{0.21} = 1.14$$

Since $t_{0.05, 58} = 1.671 > 1.14$, H0 is not rejected at the 10% level. We may not conclude that the beta coefficient is different from 1.

Hypothesis Tests for the Regression Slope
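A Python sketch of the slope test for Example 7-1, assuming SciPy for the p-value:

from scipy import stats

n = 25
b1, s_b1 = 1.25533, 0.04972   # from Example 7-1

t = b1 / s_b1                      # test of H0: beta1 = 0
p = 2 * stats.t.sf(abs(t), n - 2)  # two-tailed p-value

print(round(t, 2), p)  # t approximately 25.25, p essentially 0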

Page 30: Regression Analysis and Multiple Regression

The coefficient of determination, r², is a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data.

[Figure: at a point in the scatter, the total deviation $y - \bar{y}$ splits into the explained deviation $\hat{y} - \bar{y}$ and the unexplained deviation $y - \hat{y}$]

Total Deviation = Unexplained Deviation (Error) + Explained Deviation (Regression)

$$\sum (y - \bar{y})^2 = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2$$

$$SST = SSE + SSR$$

$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

r² is the percentage of total variation explained by the regression.

7-7 How Good is the Regression?

Page 31: Regression Analysis and Multiple Regression

[Figure: three scatterplots. r² = 0: SSE = SST, nothing explained. r² = 0.50: SSE and SSR each about half of SST. r² = 0.90: SSR accounts for most of SST.]

Example 7-1:

$$r^2 = \frac{SSR}{SST} = \frac{64{,}527{,}736.8}{66{,}855{,}898} = 0.96518$$

[Figure: scatterplot of Dollars (2000 to 7000) against Miles (1000 to 5500) with the fitted regression line]

The Coefficient of Determination

Page 32: Regression Analysis and Multiple Regression

7-8 Analysis of Variance and an F Test of the Regression Model

Example 7-1:

Source of    Sum of       Degrees of   Mean Square   F Ratio   p Value
Variation    Squares      Freedom
Regression   64527736.8   1            64527736.8    637.47    0.000
Error        2328161.2    23           101224.4
Total        66855898.0   24

In general:

Source of    Sum of    Degrees of   Mean Square   F Ratio
Variation    Squares   Freedom
Regression   SSR       1            MSR           F = MSR/MSE
Error        SSE       n-2          MSE
Total        SST       n-1          MST

Page 33: Regression Analysis and Multiple Regression

[Figure: four residual plots.
Homoscedasticity: residuals (against x or ŷ) appear completely random; no indication of model inadequacy.
Heteroscedasticity: the variance of the residuals changes as x changes.
A curved pattern in the residuals results from an underlying nonlinear relationship.
Residuals exhibiting a linear trend with time indicate time dependence.]

7-9 Residual Analysis and Checking for Model Inadequacies

Page 34: Regression Analysis and Multiple Regression

• Point prediction: a single-valued estimate of Y for a given value of X, obtained by inserting the value of X into the estimated regression equation.

• Prediction interval for a value of Y given a value of X: reflects both the variation in the regression line estimate and the variation of points around the regression line.

• Prediction interval for the average value of Y given a value of X: reflects only the variation in the regression line estimate.

7-10 Use of the Regression Model for Prediction

Page 35: Regression Analysis and Multiple Regression

[Figure: 1) Uncertainty about the slope of the regression line: the regression line shown with upper and lower limits on the slope. 2) Uncertainty about the intercept of the regression line: the regression line shown with upper and lower limits on the intercept.]

Errors in Predicting E[Y|X]

Page 36: Regression Analysis and Multiple Regression

[Figure: the regression line with the prediction band for E[Y|X] around it]

• The prediction band for E[Y|X] is narrowest at the mean value of X.

• The prediction band widens as the distance from the mean of X increases.

• Predictions become very unreliable when we extrapolate beyond the range of the sample itself.

Prediction Interval for E[Y|X]


Page 37: Regression Analysis and Multiple Regression

Additional Error in Predicting an Individual Value of Y

3) Variation around the regression line.

[Figure: the regression line with the prediction band for E[Y|X] and the wider prediction band for an individual value of Y]

Page 38: Regression Analysis and Multiple Regression

A (1 - α)100% prediction interval for Y:

$$\hat{y} \pm t_{\alpha/2, (n-2)} \, s \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}}$$

Example 7-1 (X = 4000):

$$[274.85 + (1.2553)(4000)] \pm (2.069)(318.16)\sqrt{1 + \frac{1}{25} + \frac{(4000 - 3177.92)^2}{40{,}947{,}557.84}}$$

$$= 5296.05 \pm 676.62 = [4619.43, \; 5972.67]$$

Prediction Interval for a Value of Y
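A Python sketch of this prediction interval with the Example 7-1 quantities (SciPy assumed for the t critical value):

import math
from scipy import stats

# Example 7-1 quantities from the deck
n, x_bar, SS_x, s = 25, 3177.92, 40_947_557.84, 318.16
b0, b1 = 274.85, 1.2553

x0 = 4000
y_hat = b0 + b1 * x0
t = stats.t.ppf(0.975, n - 2)
half = t * s * math.sqrt(1 + 1/n + (x0 - x_bar)**2 / SS_x)

print(y_hat - half, y_hat + half)  # approximately [4619.4, 5972.7]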

Page 39: Regression Analysis and Multiple Regression

A (1 - α)100% prediction interval for E[Y|X]:

$$\hat{y} \pm t_{\alpha/2, (n-2)} \, s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}}$$

Example 7-1 (X = 4000):

$$[274.85 + (1.2553)(4000)] \pm (2.069)(318.16)\sqrt{\frac{1}{25} + \frac{(4000 - 3177.92)^2}{40{,}947{,}557.84}}$$

$$= 5296.05 \pm 156.48 = [5139.57, \; 5452.53]$$

Prediction Interval for the Average Value of Y

Page 40: Regression Analysis and Multiple Regression

MTB > regress 'Dollars' 1 'Miles' tres in C3 fits in C4;
SUBC> predict 4000;
SUBC> residuals in C5.

Regression Analysis

The regression equation is
Dollars = 275 + 1.26 Miles

Predictor   Coef      Stdev     t-ratio   p
Constant    274.8     170.3     1.61      0.120
Miles       1.25533   0.04972   25.25     0.000

s = 318.2   R-sq = 96.5%   R-sq(adj) = 96.4%

Analysis of Variance

SOURCE       DF   SS         MS         F        p
Regression   1    64527736   64527736   637.47   0.000
Error        23   2328161    101224
Total        24   66855896

Fit      Stdev.Fit   95.0% C.I.         95.0% P.I.
5296.2   75.6        (5139.7, 5452.7)   (4619.5, 5972.8)

Using the Computer

Page 41: Regression Analysis and Multiple Regression

MTB > PLOT 'Resids' * 'Fits'
MTB > PLOT 'Resids' * 'Miles'

[Figure: two plots, residuals against Miles (1000 to 5500) and residuals against Fits (2000 to 7000), both scattered around zero]

Plotting on the Computer (1)

Page 42: Regression Analysis and Multiple Regression

Plotting on the Computer (2)

MTB > HISTOGRAM 'StRes'

[Figure: histogram of standardized residuals StRes, roughly bell-shaped over the range -2 to 2]

MTB > PLOT 'Dollars' * 'Miles'

[Figure: scatterplot of Dollars (2000 to 7000) against Miles (1000 to 5500)]

Page 43: Regression Analysis and Multiple Regression

• Using Statistics.

• The k-Variable Multiple Regression Model.

• The F Test of a Multiple Regression Model.

• How Good is the Regression.

• Tests of the Significance of Individual Regression Parameters.

• Testing the Validity of the Regression Model.

• Using the Multiple Regression Model for Prediction.

Multiple Regression (1)

Page 44: Regression Analysis and Multiple Regression

• Qualitative Independent Variables.

• Polynomial Regression.

• Nonlinear Models and Transformations.

• Multicollinearity.

• Residual Autocorrelation and the Durbin-Watson Test.

• Partial F Tests and Variable Selection Methods.

• Using the Computer.

• The Matrix Approach to Multiple Regression Analysis.

• Summary and Review of Terms.

Multiple Regression (2)

Page 45: Regression Analysis and Multiple Regression

Any two points (A and B), or an intercept and slope (β0 and β1), define a line on a two-dimensional surface.

Any three points (A, B, and C), or an intercept and coefficients of x1 and x2 (β0, β1, and β2), define a plane in three-dimensional space.

[Figure: left, a line through points A and B in the (x, y) plane, with intercept β0 and slope β1; right, a plane through points A, B, and C in (x1, x2, y) space]

Lines and Planes

7-11 Using Statistics

Page 46: Regression Analysis and Multiple Regression

The population regression model of a dependent variable, Y, on a set of k independent variables, X1, X2, ..., Xk is given by:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \epsilon$$

where β0 is the Y-intercept of the regression surface and each βi, i = 1, 2, ..., k, is the slope of the regression surface, sometimes called the response surface, with respect to Xi.

[Figure: the plane $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ in (x1, x2, y) space]

Model assumptions:
1. ε ~ N(0, σ²), independent of the other errors.
2. The variables Xi are uncorrelated with the error term.

7-12 The k-Variable Multiple Regression Model

Page 47: Regression Analysis and Multiple Regression

In a simple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression line.

In a multiple regression model, the least-squares estimators minimize the sum of squared errors from the estimated regression plane.

[Figure: left, the fitted line $\hat{y} = b_0 + b_1 x$ in the (x, y) plane; right, the fitted plane $\hat{y} = b_0 + b_1 x_1 + b_2 x_2$ in (x1, x2, y) space]

Simple and Multiple Least-Squares Regression

Page 48: Regression Analysis and Multiple Regression

The estimated regression relationship:

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$$

where $\hat{Y}$ is the predicted value of Y, the value lying on the estimated regression surface. The terms b0, b1, ..., bk are the least-squares estimates of the population regression parameters βi.

The actual, observed value of Y is the predicted value plus an error:

$$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k + e$$

The Estimated Regression Relationship

Page 49: Regression Analysis and Multiple Regression

Minimizing the sum of squared errors with respect to the estimated coefficients b0, b1, and b2 yields the following normal equations:

$$\sum y = n b_0 + b_1 \sum x_1 + b_2 \sum x_2$$

$$\sum x_1 y = b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2$$

$$\sum x_2 y = b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2$$

Least-Squares Estimation: The 2-Variable Normal Equations

Page 50: Regression Analysis and Multiple Regression

Y     X1    X2   X1X2   X1²    X2²   X1Y    X2Y
72    12    5    60     144    25    864    360
76    11    8    88     121    64    836    608
78    15    6    90     225    36    1170   468
70    10    5    50     100    25    700    350
68    11    3    33     121    9     748    204
80    16    9    144    256    81    1280   720
82    14    12   168    196    144   1148   984
65    8     4    32     64     16    520    260
62    8     3    24     64     9     496    186
90    18    10   180    324    100   1620   900
---   ---   --   ---    ----   ---   ----   ----
743   123   65   869    1615   509   9382   5040

Normal Equations:

743 = 10b0 + 123b1 + 65b2
9382 = 123b0 + 1615b1 + 869b2
5040 = 65b0 + 869b1 + 509b2

b0 = 47.164942
b1 = 1.5990404
b2 = 1.1487479

Estimated regression equation:

$$\hat{Y} = 47.164942 + 1.5990404 X_1 + 1.1487479 X_2$$

Example 7-3
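The normal equations above form a 3x3 linear system, which can be solved directly; a minimal NumPy sketch:

import numpy as np

# Normal equations from Example 7-3, written as A @ b = c
A = np.array([[10.0,  123.0,  65.0],
              [123.0, 1615.0, 869.0],
              [65.0,  869.0,  509.0]])
c = np.array([743.0, 9382.0, 5040.0])

b = np.linalg.solve(A, c)
print(b)  # approximately [47.1649, 1.5990, 1.1487]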

Page 51: Regression Analysis and Multiple Regression

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.980326323
R Square            0.961039699
Adjusted R Square   0.949908185
Standard Error      1.910940432
Observations        10

ANOVA
             df   SS            MS            F             Significance F
Regression   2    630.5381466   315.2690733   86.33503537   1.16729E-05
Residual     7    25.56185335   3.651693336
Total        9    656.1

            Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
Intercept   47.16494227    2.470414433      19.09191496   2.69229E-07   41.32334457   53.00653997
X1          1.599040336    0.280963057      5.691283238   0.00074201    0.934668753   2.263411919
X2          1.148747938    0.30524885       3.763316185   0.007044246   0.426949621   1.870546256

Excel Output

Example 7-3: Using the Computer

Page 52: Regression Analysis and Multiple Regression

Total Deviation = Regression Deviation + Error Deviation

$$SST = SSR + SSE$$

[Figure: in (x1, x2, y) space, the total deviation $Y - \bar{Y}$ splits into the regression deviation $\hat{Y} - \bar{Y}$ and the error deviation $Y - \hat{Y}$]

Decomposition of the Total Deviation in a Multiple Regression Model

Page 53: Regression Analysis and Multiple Regression

A statistical test for the existence of a linear relationship between Y and any or all of the independent variables X1, X2, ..., Xk:

H0: β1 = β2 = ... = βk = 0
H1: Not all the βi (i = 1, 2, ..., k) are 0

Source of    Sum of    Degrees of        Mean Square           F Ratio
Variation    Squares   Freedom
Regression   SSR       k                 MSR = SSR/k           F = MSR/MSE
Error        SSE       n-(k+1) = n-k-1   MSE = SSE/(n-(k+1))
Total        SST       n-1               MST = SST/(n-1)

7-13 The F Test of a Multiple Regression Model

Page 54: Regression Analysis and Multiple Regression

Analysis of Variance

SOURCE       DF   SS       MS       F       p
Regression   2    630.54   315.27   86.34   0.000
Error        7    25.56    3.65
Total        9    656.10

The test statistic, F = 86.34, is greater than the critical point of F(2, 7) for any common level of significance (p-value ≈ 0), so the null hypothesis is rejected, and we may conclude that the dependent variable is related to one or more of the independent variables.

[Figure: F distribution with 2 and 7 degrees of freedom; F0.01 = 9.55 at α = 0.01, with the test statistic 86.34 far in the right tail]

Using the Computer: Analysis of Variance Table (Example 7-3)

Page 55: Regression Analysis and Multiple Regression

The multiple coefficient of determination, R², measures the proportion of the variation in the dependent variable that is explained by the combination of the independent variables in the multiple regression model:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

The mean square error is an unbiased estimator of the variance of the population errors, denoted by σ²:

$$MSE = \frac{SSE}{n - (k+1)} = \frac{\sum (y - \hat{y})^2}{n - (k+1)}$$

Standard error of estimate:

$$s = \sqrt{MSE}$$

[Figure: errors $y - \hat{y}$ around the regression plane in (x1, x2, y) space]

7-14 How Good is the Regression

Page 56: Regression Analysis and Multiple Regression

The adjusted multiple coefficient of determination, $\bar{R}^2$, is the coefficient of determination with the SSE and SST divided by their respective degrees of freedom:

$$\bar{R}^2 = 1 - \frac{SSE/(n - (k+1))}{SST/(n-1)}$$

This adjusts the ordinary decomposition SST = SSR + SSE, for which $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$.

Example 7-3: s = 1.911, R-sq = 96.1%, R-sq(adj) = 95.0%

Decomposition of the Sum of Squares and the Adjusted Coefficient of Determination
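A quick Python check using the Example 7-3 ANOVA values:

# Example 7-3 values from the deck's ANOVA table
SSE, SST = 25.56, 656.10
n, k = 10, 2

r2 = 1 - SSE / SST
r2_adj = 1 - (SSE / (n - (k + 1))) / (SST / (n - 1))

print(round(r2, 3), round(r2_adj, 3))  # approximately 0.961 and 0.950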

Page 57: Regression Analysis and Multiple Regression

Source of    Sum of    Degrees of        Mean Square           F Ratio
Variation    Squares   Freedom
Regression   SSR       k                 MSR = SSR/k           F = MSR/MSE
Error        SSE       n-(k+1) = n-k-1   MSE = SSE/(n-(k+1))
Total        SST       n-1               MST = SST/(n-1)

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \qquad \bar{R}^2 = 1 - \frac{SSE/(n-(k+1))}{SST/(n-1)} = 1 - \frac{MSE}{MST}$$

$$F = \frac{MSR}{MSE} = \frac{R^2 / k}{(1 - R^2)/(n - (k+1))}$$

Measures of Performance in Multiple Regression and the ANOVA Table

Page 58: Regression Analysis and Multiple Regression

Hypothesis tests about individual regression slope parameters:

(1) H0: β1 = 0;  H1: β1 ≠ 0
(2) H0: β2 = 0;  H1: β2 ≠ 0
...
(k) H0: βk = 0;  H1: βk ≠ 0

Test statistic for test i:

$$t_{(n-(k+1))} = \frac{b_i - 0}{s(b_i)}$$

7-15 Tests of the Significance of Individual Regression Parameters

Page 59: Regression Analysis and Multiple Regression

Variable   Coefficient   Standard   t-Statistic
           Estimate      Error
Constant   53.12         5.43       9.783 *
X1         2.03          0.22       9.227 *
X2         5.60          1.30       4.308 *
X3         10.35         6.88       1.504
X4         3.45          2.70       1.259
X5         -4.25         0.38       -11.184 *

n = 150, t0.025 ≈ 1.96 (an asterisk marks a t-statistic exceeding 1.96 in absolute value)

Regression Results for Individual Parameters

Page 60: Regression Analysis and Multiple Regression

MTB > regress 'Y' on 2 predictors 'X1' 'X2'

Regression Analysis

The regression equation is
Y = 47.2 + 1.60 X1 + 1.15 X2

Predictor   Coef     Stdev    t-ratio   p
Constant    47.165   2.470    19.09     0.000
X1          1.5990   0.2810   5.69      0.000
X2          1.1487   0.3052   3.76      0.007

s = 1.911   R-sq = 96.1%   R-sq(adj) = 95.0%

Analysis of Variance

SOURCE       DF   SS       MS       F       p
Regression   2    630.54   315.27   86.34   0.000
Error        7    25.56    3.65
Total        9    656.10

SOURCE   DF   SEQ SS
X1       1    578.82
X2       1    51.72

Example 7-3: Using the Computer

Page 61: Regression Analysis and Multiple Regression

MTB > READ 'a:\data\c11_t6.dat' C1-C5
MTB > NAME c1 'EXPORTS' c2 'M1' c3 'LEND' c4 'PRICE' C5 'EXCHANGE'
MTB > REGRESS 'EXPORTS' on 4 predictors 'M1' 'LEND' 'PRICE' 'EXCHANGE'

Regression Analysis

The regression equation is
EXPORTS = -4.02 + 0.368 M1 + 0.0047 LEND + 0.0365 PRICE + 0.27 EXCHANGE

Predictor   Coef       Stdev      t-ratio   p
Constant    -4.015     2.766      -1.45     0.152
M1          0.36846    0.06385    5.77      0.000
LEND        0.00470    0.04922    0.10      0.924
PRICE       0.036511   0.009326   3.91      0.000
EXCHANGE    0.268      1.175      0.23      0.820

s = 0.3358   R-sq = 82.5%   R-sq(adj) = 81.4%

Analysis of Variance

SOURCE       DF   SS        MS       F       p
Regression   4    32.9463   8.2366   73.06   0.000
Error        62   6.9898    0.1127
Total        66   39.9361

Using the Computer: Example 7-4

Page 62: Regression Analysis and Multiple Regression

MTB > REGRESS 'EXPORTS' on 3 predictors 'LEND' 'PRICE' 'EXCHANGE'

Regression Analysis

The regression equation is
EXPORTS = -0.29 - 0.211 LEND + 0.0781 PRICE - 2.10 EXCHANGE

Predictor   Coef       Stdev      t-ratio   p
Constant    -0.289     3.308      -0.09     0.931
LEND        -0.21140   0.03929    -5.38     0.000
PRICE       0.078148   0.007268   10.75     0.000
EXCHANGE    -2.095     1.355      -1.55     0.127

s = 0.4130   R-sq = 73.1%   R-sq(adj) = 71.8%

Analysis of Variance

SOURCE       DF   SS        MS       F       p
Regression   3    29.1919   9.7306   57.06   0.000
Error        63   10.7442   0.1705
Total        66   39.9361

Example 7-5: Three Predictors

Page 63: Regression Analysis and Multiple Regression

MTB > REGRESS 'EXPORTS' on 2 predictors 'M1' 'PRICE'

Regression Analysis

The regression equation is
EXPORTS = -3.42 + 0.361 M1 + 0.0370 PRICE

Predictor   Coef       Stdev      t-ratio   p
Constant    -3.4230    0.5409     -6.33     0.000
M1          0.36142    0.03925    9.21      0.000
PRICE       0.037033   0.004094   9.05      0.000

s = 0.3306   R-sq = 82.5%   R-sq(adj) = 81.9%

Analysis of Variance

SOURCE       DF   SS       MS       F        p
Regression   2    32.940   16.470   150.67   0.000
Error        64   6.996    0.109
Total        66   39.936

Example 7-5: Two Predictors

Page 64: Regression Analysis and Multiple Regression

[Figure: residuals plotted against M1 (apparently random) and residuals plotted against PRICE (apparent heteroscedasticity)]

7-16 Investigating the Validity of the Regression Model: Residual Plots

Page 65: Regression Analysis and Multiple Regression

Investigating the Validity of the Regression: Residual Plots (2)

[Figure: residuals plotted against TIME (apparently random) and residuals plotted against the fitted values Y-HAT (apparent heteroscedasticity)]

Page 66: Regression Analysis and Multiple Regression

MTB > Histogram 'SRES1'.

Histogram of SRES1   N = 67

Midpoint   Count
-3.0       1    *
-2.5       1    *
-2.0       3    ***
-1.5       1    *
-1.0       5    *****
-0.5       13   *************
 0.0       19   *******************
 0.5       12   ************
 1.0       6    ******
 1.5       3    ***
 2.0       2    **
 2.5       0
 3.0       1    *

Standardized residuals are approximately distributed N(0, 1).

Histogram of Standardized Residuals: Example 7-6

Page 67: Regression Analysis and Multiple Regression

[Figure: Outliers: a scatter in which a single outlier pulls the fitted regression line away from the line obtained without the outlier. Influential observations: a cluster of points showing no relationship, plus one point with a large value of xi that by itself determines the slope of the line fitted when all the data are included.]

Investigating the Validity of the Regression: Outliers and Influential Observations

Page 68: Regression Analysis and Multiple Regression

Unusual Observations

Obs.   M1     EXPORTS   Fit      Stdev.Fit   Residual   St.Resid
1      5.10   2.6000    2.6420   0.1288      -0.0420    -0.14 X
2      4.90   2.6000    2.6438   0.1234      -0.0438    -0.14 X
25     6.20   5.5000    4.5949   0.0676      0.9051     2.80R
26     6.30   3.7000    4.6311   0.0651      -0.9311    -2.87R
50     8.30   4.3000    5.1317   0.0648      -0.8317    -2.57R
67     8.20   5.6000    4.9474   0.0668      0.6526     2.02R

R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.

Outliers and Influential Observations: Example 7-6

Page 69: Regression Analysis and Multiple Regression

[Figure: estimated regression plane of Sales against Advertising and Promotions]

Estimated Regression Plane

7-17 Using the Multiple Regression Model for Prediction

Page 70: Regression Analysis and Multiple Regression

MTB > regress 'EXPORTS' 2 'M1' 'PRICE';
SUBC> predict 6 160;
SUBC> predict 5 150;
SUBC> predict 4 130.

Fit      Stdev.Fit   95.0% C.I.         95.0% P.I.
4.6708   0.0853      (4.5003, 4.8412)   (3.9885, 5.3530)
3.9390   0.0901      (3.7590, 4.1190)   (3.2543, 4.6237)
2.8370   0.1116      (2.6140, 3.0599)   (2.1397, 3.5342)

A (1 - α)100% prediction interval for a value of Y given values of the Xi:

$$\hat{y} \pm t_{\alpha/2, (n-(k+1))} \sqrt{s^2(\hat{y}) + MSE}$$

A (1 - α)100% prediction interval for the conditional mean of Y given values of the Xi:

$$\hat{y} \pm t_{\alpha/2, (n-(k+1))} \, s(E[\hat{Y}])$$

Prediction in Multiple Regression

Page 71: Regression Analysis and Multiple Regression

MOVIE   EARN   COST   PROM   BOOK
1       28     4.2    1.0    0
2       35     6.0    3.0    1
3       50     5.5    6.0    1
4       20     3.3    1.0    0
5       75     12.5   11.0   1
6       60     9.6    8.0    1
7       15     2.5    0.5    0
8       45     10.8   5.0    0
9       50     8.4    3.0    1
10      34     6.6    2.0    0
11      48     10.7   1.0    1
12      82     11.0   15.0   1
13      24     3.5    4.0    0
14      50     6.9    10.0   0
15      58     7.8    9.0    1
16      63     10.1   10.0   0
17      30     5.0    1.0    1
18      37     7.5    5.0    0
19      45     6.4    8.0    1
20      72     10.0   12.0   1

MTB > regress 'EARN' 3 'COST' 'PROM' 'BOOK'

Regression Analysis

The regression equation is
EARN = 7.84 + 2.85 COST + 2.28 PROM + 7.17 BOOK

Predictor   Coef     Stdev    t-ratio   p
Constant    7.836    2.333    3.36      0.004
COST        2.8477   0.3923   7.26      0.000
PROM        2.2782   0.2534   8.99      0.000
BOOK        7.166    1.818    3.94      0.001

s = 3.690   R-sq = 96.7%   R-sq(adj) = 96.0%

Analysis of Variance

SOURCE       DF   SS       MS       F        p
Regression   3    6325.2   2108.4   154.89   0.000
Error        16   217.8    13.6
Total        19   6543.0

An indicator (dummy, binary) variable of qualitative level A:

$$X_h = \begin{cases} 1 & \text{if level A is obtained} \\ 0 & \text{if level A is not obtained} \end{cases}$$

7-18 Qualitative (or Categorical) Independent Variables (in Regression)

Page 72: Regression Analysis and Multiple Regression

A regression with one quantitative variable (X1) and one qualitative variable (X2):

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$

[Figure: two parallel lines against X1, the line for X2 = 0 with intercept b0 and the line for X2 = 1 with intercept b0 + b2]

A multiple regression with two quantitative variables (X1 and X2) and one qualitative variable (X3):

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$

[Figure: two parallel planes in (x1, x2, y) space, a vertical distance b3 apart]

Picturing Qualitative Variables in Regression

Page 73: Regression Analysis and Multiple Regression

A regression with one quantitative variable (X1) and two qualitative variables (X2 and X3):

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$

[Figure: three parallel lines against X1, with intercepts b0 (X2 = 0, X3 = 0), b0 + b2 (X2 = 1, X3 = 0), and b0 + b3 (X2 = 0, X3 = 1)]

A qualitative variable with r levels or categories is represented with (r - 1) 0/1 (dummy) variables:

Category    X2   X3
Adventure   0    0
Drama       0    1
Romance     1    0

Picturing Qualitative Variables in Regression: Three Categories and Two Dummy Variables
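A minimal Python sketch of this (r - 1)-dummy coding for the three categories; the short category list is illustrative only:

import numpy as np

# Three categories coded with r - 1 = 2 dummy variables,
# following the deck's coding (Adventure is the baseline).
categories = ["Adventure", "Drama", "Romance", "Drama"]

x2 = np.array([1 if c == "Romance" else 0 for c in categories])
x3 = np.array([1 if c == "Drama" else 0 for c in categories])

print(x2, x3)  # Adventure -> (0,0), Drama -> (0,1), Romance -> (1,0)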

Page 74: Regression Analysis and Multiple Regression

Gender = 1 if female, 0 if male

Salary = 8547 + 949 Education + 1258 Experience - 3256 Gender
 (SE)     (32.6)  (45.1)          (78.5)           (212.4)
 (t)      (262.2) (21.0)          (16.0)           (-15.3)

On average, female salaries are $3256 below male salaries, holding education and experience constant.

Using Qualitative Variables in Regression: Example 7-6

Page 75: Regression Analysis and Multiple Regression

A regression with interaction between a quantitative variable (X1) and a qualitative variable (X2):

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2$$

[Figure: the line for X2 = 0 with intercept b0 and slope b1, and the line for X2 = 1 with intercept b0 + b2 and slope b1 + b3]

Interactions between Quantitative and Qualitative Variables: Shifting Slopes

Page 76: Regression Analysis and Multiple Regression

One-variable polynomial regression model:

$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \cdots + \beta_m X^m + \epsilon$$

where m is the degree of the polynomial, the highest power of X appearing in the equation. The degree of the polynomial is the order of the model.

[Figure: example fits such as the straight line $\hat{y} = b_0 + b_1 X$, the quadratic $\hat{y} = b_0 + b_1 X + b_2 X^2$ (with b2 < 0), and the cubic $\hat{y} = b_0 + b_1 X + b_2 X^2 + b_3 X^3$]

7-19 Polynomial Regression
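A minimal Python sketch of fitting a second-order polynomial; the data are simulated for illustration and are not the deck's Example 7-7 data:

import numpy as np

rng = np.random.default_rng(1)

# Simulated illustrative data only
x = np.linspace(0, 15, 21)
y = 3.5 + 2.5 * x - 0.09 * x**2 + rng.normal(0, 1, x.size)

# Fit a second-order polynomial; np.polyfit returns the highest power first
b2, b1, b0 = np.polyfit(x, y, 2)
print(b0, b1, b2)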

Page 77: Regression Analysis and Multiple Regression

MTB > regress 'sales' 2 'advert' 'advsqr'

Regression Analysis

The regression equation is
SALES = 3.52 + 2.51 ADVERT - 0.0875 ADVSQR

Predictor   Coef       Stdev     t-ratio   p
Constant    3.5150     0.7385    4.76      0.000
ADVERT      2.5148     0.2580    9.75      0.000
ADVSQR      -0.08745   0.01658   -5.28     0.000

s = 1.228   R-sq = 95.9%   R-sq(adj) = 95.4%

Analysis of Variance

SOURCE       DF   SS       MS       F        p
Regression   2    630.26   315.13   208.99   0.000
Error        18   27.14    1.51
Total        20   657.40

[Figure: scatterplot of SALES (5 to 25) against ADVERT (0 to 15) with the fitted quadratic curve]

Polynomial Regression: Example 7-7

Page 78: Regression Analysis and Multiple Regression

Variable   Estimate   Standard Error   t-Statistic
X1         2.34       0.92             2.54
X2         3.11       1.05             2.96
X1²        4.22       1.00             4.22
X2²        3.57       2.12             1.68
X1X2       2.77       2.30             1.20

Polynomial Regression: Other Variables and Cross-Product Terms

Page 79: Regression Analysis and Multiple Regression

The multiplicative model:

$$Y = \beta_0 X_1^{\beta_1} X_2^{\beta_2} X_3^{\beta_3} \epsilon$$

The logarithmic transformation:

$$\log Y = \log \beta_0 + \beta_1 \log X_1 + \beta_2 \log X_2 + \beta_3 \log X_3 + \log \epsilon$$

MTB > loge c1 c3
MTB > loge c2 c4
MTB > name c3 'LOGSALE' c4 'LOGADV'
MTB > regress 'logsale' 1 'logadv'

Regression Analysis

The regression equation is
LOGSALE = 1.70 + 0.553 LOGADV

Predictor   Coef      Stdev     t-ratio   p
Constant    1.70082   0.05123   33.20     0.000
LOGADV      0.55314   0.03011   18.37     0.000

s = 0.1125   R-sq = 94.7%   R-sq(adj) = 94.4%

Analysis of Variance

SOURCE       DF   SS       MS       F        p
Regression   1    4.2722   4.2722   337.56   0.000
Error        19   0.2405   0.0127
Total        20   4.5126

7-20 Nonlinear Models and Transformations: Multiplicative Model

Page 80: Regression Analysis and Multiple Regression

The exponential model:

$$Y = \beta_0 e^{\beta_1 X_1} \epsilon$$

The logarithmic transformation:

$$\log Y = \log \beta_0 + \beta_1 X_1 + \log \epsilon$$

MTB > regress 'sales' 1 'logadv'

Regression Analysis

The regression equation is
SALES = 3.67 + 6.78 LOGADV

Predictor   Coef     Stdev    t-ratio   p
Constant    3.6683   0.4016   9.13      0.000
LOGADV      6.7840   0.2360   28.74     0.000

s = 0.8819   R-sq = 97.8%   R-sq(adj) = 97.6%

Analysis of Variance

SOURCE       DF   SS       MS       F        p
Regression   1    642.62   642.62   826.24   0.000
Error        19   14.78    0.78
Total        20   657.40

Transformations: Exponential Model

Page 81: Regression Analysis and Multiple Regression

[Figure: four panels.
(1) Simple regression of Sales on Advertising: Y = 6.59271 + 1.19176X, R-Squared = 0.895.
(2) Regression of Log(Sales) on Log(Advertising): Y = 1.70082 + 0.553136X, R-Squared = 0.947.
(3) Regression of Sales on Log(Advertising): Y = 3.66825 + 6.784X, R-Squared = 0.978.
(4) Residual plot for Sales vs. Log(Advertising): residuals against Y-HAT.]

Plots of Transformed Variables

Page 82: Regression Analysis and Multiple Regression

• Square root transformation, $\sqrt{Y}$: useful when the variance of the regression errors is approximately proportional to the conditional mean of Y.

• Logarithmic transformation, $\log(Y)$: useful when the variance of the regression errors is approximately proportional to the square of the conditional mean of Y.

• Reciprocal transformation, $1/Y$: useful when the variance of the regression errors is approximately proportional to the fourth power of the conditional mean of Y.

Variance Stabilizing Transformations

Page 83: Regression Analysis and Multiple Regression

The logistic function:

$$E[Y \mid X] = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$

Transformation to linearize the logistic function:

$$p' = \log\left(\frac{p}{1 - p}\right)$$

[Figure: the logistic function, an S-shaped curve in x rising from 0 toward 1]

Regression with Dependent Indicator Variables

Page 84: Regression Analysis and Multiple Regression

[Figure: pairs of direction vectors for x1 and x2 illustrating degrees of collinearity]

Orthogonal X variables provide information from independent sources: no multicollinearity.

Perfectly collinear X variables provide identical information content: no regression is possible.

Some degree of collinearity: problems with regression depend on the degree of collinearity.

A high degree of negative collinearity also causes problems with regression.

7-21 Multicollinearity

Page 85: Regression Analysis and Multiple Regression

• Variances of regression coefficients are inflated.

• Magnitudes of regression coefficients may be different from what is expected.

• Signs of regression coefficients may not be as expected.

• Adding or removing variables produces large changes in coefficients.

• Removing a data point may cause large changes in coefficient estimates or signs.

• In some cases, the F ratio may be significant while the t ratios are not.

Effects of Multicollinearity

Page 86: Regression Analysis and Multiple Regression

MTB > CORRELATION 'm1' 'lend' 'price' 'exchange'

Correlations (Pearson)

           M1       LEND     PRICE
LEND       -0.112
PRICE      0.447    0.745
EXCHANGE   -0.410   -0.279   -0.420

MTB > regress 'exports' on 4 predictors 'm1' 'lend' 'price' 'exchange';
SUBC> vif.

Regression Analysis

The regression equation is
EXPORTS = -4.02 + 0.368 M1 + 0.0047 LEND + 0.0365 PRICE + 0.27 EXCHANGE

Predictor   Coef       Stdev      t-ratio   p       VIF
Constant    -4.015     2.766      -1.45     0.152
M1          0.36846    0.06385    5.77      0.000   3.2
LEND        0.00470    0.04922    0.10      0.924   5.4
PRICE       0.036511   0.009326   3.91      0.000   6.3
EXCHANGE    0.268      1.175      0.23      0.820   1.4

s = 0.3358   R-sq = 82.5%   R-sq(adj) = 81.4%

Detecting the Existence of Multicollinearity: Correlation Matrix of Independent Variables and Variance Inflation Factors

Page 87: Regression Analysis and Multiple Regression

The variance inflation factor associated with $X_h$:

$$VIF(X_h) = \frac{1}{1 - R_h^2}$$

where $R_h^2$ is the $R^2$ value obtained for the regression of $X_h$ on the other independent variables.

[Figure: relationship between VIF and $R_h^2$; VIF grows without bound as $R_h^2$ approaches 1]

Variance Inflation Factor
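A Python sketch of computing VIFs directly from this definition; the collinear data here are simulated for illustration:

import numpy as np

def vif(X, h):
    """VIF of column h: regress X[:, h] on the other columns, VIF = 1/(1 - R_h^2)."""
    y = X[:, h]
    others = np.delete(X, h, axis=1)
    A = np.column_stack([np.ones(len(y)), others])   # add an intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid @ resid / np.sum((y - y.mean())**2)
    return 1.0 / (1.0 - r2)

# Simulated, nearly collinear data
rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=100)   # nearly collinear with x1
X = np.column_stack([x1, x2, rng.normal(size=100)])
print([round(vif(X, h), 2) for h in range(X.shape[1])])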

Page 88: Regression Analysis and Multiple Regression

• Drop a collinear variable from the regression.

• Change the sampling plan to include elements outside the multicollinearity range.

• Transformations of variables.

• Ridge regression.

Solutions to the Multicollinearity Problem

Page 89: Regression Analysis and Multiple Regression

An autocorrelation is a correlation of the values of a variable with values of the same variable lagged one or more periods back. Consequences of autocorrelation include inaccurate estimates of variances and inaccurate predictions.

Lagged Residuals:

 i    εi     εi-1   εi-2   εi-3   εi-4
 1    1.0    *      *      *      *
 2    0.0    1.0    *      *      *
 3    -1.0   0.0    1.0    *      *
 4    2.0    -1.0   0.0    1.0    *
 5    3.0    2.0    -1.0   0.0    1.0
 6    -2.0   3.0    2.0    -1.0   0.0
 7    1.0    -2.0   3.0    2.0    -1.0
 8    1.5    1.0    -2.0   3.0    2.0
 9    1.0    1.5    1.0    -2.0   3.0
10    -2.5   1.0    1.5    1.0    -2.0

The Durbin-Watson test (first-order autocorrelation):

H0: ρ1 = 0
H1: ρ1 ≠ 0

The Durbin-Watson test statistic:

$$d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$$

7-22 Residual Autocorrelation and the Durbin-Watson Test
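A minimal Python sketch of the Durbin-Watson statistic, applied to the ten residuals in the table above:

import numpy as np

def durbin_watson(e):
    """d = sum((e_i - e_{i-1})^2, i=2..n) / sum(e_i^2, i=1..n)."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e)**2) / np.sum(e**2)

# The ten residuals from the slide's table
e = [1.0, 0.0, -1.0, 2.0, 3.0, -2.0, 1.0, 1.5, 1.0, -2.5]
print(round(durbin_watson(e), 3))  # approximately 1.992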

Page 90: Regression Analysis and Multiple Regression

        k = 1        k = 2        k = 3        k = 4        k = 5
 n      dL    dU     dL    dU     dL    dU     dL    dU     dL    dU
 15     1.08  1.36   0.95  1.54   0.82  1.75   0.69  1.97   0.56  2.21
 16     1.10  1.37   0.98  1.54   0.86  1.73   0.74  1.93   0.62  2.15
 17     1.13  1.38   1.02  1.54   0.90  1.71   0.78  1.90   0.67  2.10
 18     1.16  1.39   1.05  1.53   0.93  1.69   0.82  1.87   0.71  2.06
 ...
 65     1.57  1.63   1.54  1.66   1.50  1.70   1.47  1.73   1.44  1.77
 70     1.58  1.64   1.55  1.67   1.52  1.70   1.49  1.74   1.46  1.77
 75     1.60  1.65   1.57  1.68   1.54  1.71   1.51  1.74   1.49  1.77
 80     1.61  1.66   1.59  1.69   1.56  1.72   1.53  1.74   1.51  1.77
 85     1.62  1.67   1.60  1.70   1.57  1.72   1.55  1.75   1.52  1.77
 90     1.63  1.68   1.61  1.70   1.59  1.73   1.57  1.75   1.54  1.78
 95     1.64  1.69   1.62  1.71   1.60  1.73   1.58  1.75   1.56  1.78
100     1.65  1.69   1.63  1.72   1.61  1.74   1.59  1.76   1.57  1.78

Critical Points of the Durbin-Watson Statistic: α = 0.05, n = Sample Size, k = Number of Independent Variables

Page 91: Regression Analysis and Multiple Regression

MTB > regress 'EXPORTS' 4 'M1' 'LEND' 'PRICE' 'EXCHANGE';
SUBC> dw.

Durbin-Watson statistic = 2.58

[Figure: the Durbin-Watson decision regions on the interval from 0 to 4: positive autocorrelation below dL, inconclusive between dL and dU, no autocorrelation between dU and 4 - dU, inconclusive between 4 - dU and 4 - dL, and negative autocorrelation above 4 - dL]

For n = 67, k = 4: dL ≈ 1.47, dU ≈ 1.73, 4 - dU ≈ 2.27, and 4 - dL ≈ 2.53 < 2.58.

H0 is rejected, and we conclude there is negative first-order autocorrelation.

Using the Durbin-Watson Statistic

Page 92: Regression Analysis and Multiple Regression

Full model: Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε

Reduced model: Y = β0 + β1X1 + β2X2 + ε

Partial F test:
H0: β3 = β4 = 0
H1: β3 and β4 are not both 0

Partial F statistic:

$$F_{(r, \, n-(k+1))} = \frac{(SSE_R - SSE_F)/r}{MSE_F}$$

where SSE_R is the sum of squared errors of the reduced model, SSE_F is the sum of squared errors of the full model, MSE_F is the mean square error of the full model [MSE_F = SSE_F/(n - (k+1))], and r is the number of variables dropped from the full model.

7-23 Partial F Tests and Variable Selection Methods
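A Python sketch of the partial F statistic (SciPy assumed for the p-value), using the deck's full-model SSE (Example 7-4, four predictors) and reduced-model SSE (Example 7-5, two predictors):

from scipy import stats

def partial_f(sse_reduced, sse_full, r, n, k):
    """Partial F = ((SSE_R - SSE_F)/r) / MSE_F, with MSE_F = SSE_F/(n-(k+1))."""
    mse_full = sse_full / (n - (k + 1))
    F = (sse_reduced - sse_full) / r / mse_full
    p = stats.f.sf(F, r, n - (k + 1))
    return F, p

# Deck values: full model SSE = 6.9898 (n = 67, k = 4);
# reduced model (M1, PRICE only) SSE = 6.996, so r = 2 dropped variables.
print(partial_f(6.996, 6.9898, r=2, n=67, k=4))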

Page 93: Regression Analysis and Multiple Regression

• All possible regressions: run regressions with all possible combinations of independent variables and select the best model.

• Stepwise procedures:

Forward selection: add one variable at a time to the model, on the basis of its F statistic.

Backward elimination: remove one variable at a time, on the basis of its F statistic.

Stepwise regression: adds variables to the model and subtracts variables from the model, on the basis of the F statistic.

Variable Selection Methods

Page 94: Regression Analysis and Multiple Regression

[Flowchart: stepwise regression.
1. Compute the F statistic for each variable not in the model.
2. If no variable qualifies for entry (no p-value below P_in), stop.
3. Otherwise, enter the most significant variable (smallest p-value) into the model.
4. Calculate the partial F for all variables in the model.
5. If a variable has p-value > P_out, remove it.
6. Return to step 1.]

Stepwise Regression

Page 95: Regression Analysis and Multiple Regression

MTB > STEPWISE 'EXPORTS' PREDICTORS 'M1' 'LEND' 'PRICE' 'EXCHANGE'

Stepwise Regression

F-to-Enter: 4.00   F-to-Remove: 4.00

Response is EXPORTS on 4 predictors, with N = 67

Step        1        2
Constant    0.9348   -3.4230

M1          0.520    0.361
T-Ratio     9.89     9.21

PRICE                0.0370
T-Ratio              9.05

S           0.495    0.331
R-Sq        60.08    82.48

Stepwise Regression: Using the Computer

Page 96: Regression Analysis and Multiple Regression

MTB > REGRESS 'EXPORTS' 4 'M1' 'LEND' 'PRICE' 'EXCHANGE';
SUBC> vif;
SUBC> dw.

Regression Analysis

The regression equation is
EXPORTS = -4.02 + 0.368 M1 + 0.0047 LEND + 0.0365 PRICE + 0.27 EXCHANGE

Predictor   Coef       Stdev      t-ratio   p       VIF
Constant    -4.015     2.766      -1.45     0.152
M1          0.36846    0.06385    5.77      0.000   3.2
LEND        0.00470    0.04922    0.10      0.924   5.4
PRICE       0.036511   0.009326   3.91      0.000   6.3
EXCHANGE    0.268      1.175      0.23      0.820   1.4

s = 0.3358   R-sq = 82.5%   R-sq(adj) = 81.4%

Analysis of Variance

SOURCE       DF   SS        MS       F       p
Regression   4    32.9463   8.2366   73.06   0.000
Error        62   6.9898    0.1127
Total        66   39.9361

Durbin-Watson statistic = 2.58

Using the Computer: MINITAB

Page 97: Regression Analysis and Multiple Regression

data exports;
  infile 'c:\aczel\data\c11_t6.dat';
  input exports m1 lend price exchange;
proc reg data = exports;
  model exports = m1 lend price exchange / dw vif;
run;

Model: MODEL1
Dependent Variable: EXPORTS

Analysis of Variance

                  Sum of     Mean
Source    DF      Squares    Square    F Value   Prob>F
Model     4       32.94634   8.23658   73.059    0.0001
Error     62      6.98978    0.11274
C Total   66      39.93612

Root MSE   0.33577   R-square   0.8250
Dep Mean   4.52836   Adj R-sq   0.8137
C.V.       7.41473

Using the Computer: SAS

Page 98: Regression Analysis and Multiple Regression

Parameter Estimates

                 Parameter   Standard     T for H0:
Variable   DF    Estimate    Error        Parameter=0   Prob > |T|
INTERCEP   1     -4.015461   2.76640057   -1.452        0.1517
M1         1     0.368456    0.06384841   5.771         0.0001
LEND       1     0.004702    0.04922186   0.096         0.9242
PRICE      1     0.036511    0.00932601   3.915         0.0002
EXCHANGE   1     0.267896    1.17544016   0.228         0.8205

                 Variance
Variable   DF    Inflation
INTERCEP   1     0.00000000
M1         1     3.20719533
LEND       1     5.35391367
PRICE      1     6.28873181
EXCHANGE   1     1.38570639

Durbin-Watson D             2.583
(For Number of Obs.)        67
1st Order Autocorrelation   -0.321

Using the Computer: SAS (continued)

Page 99: Regression Analysis and Multiple Regression

The population regression model:

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} & x_{13} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & x_{23} & \cdots & x_{2k} \\ 1 & x_{31} & x_{32} & x_{33} & \cdots & x_{3k} \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & x_{n3} & \cdots & x_{nk} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \vdots \\ \epsilon_n \end{bmatrix}$$

that is, $Y = X\beta + \epsilon$.

The estimated regression model:

$$Y = Xb + e$$

The Matrix Approach to Regression Analysis (1)

Page 100: Regression Analysis and Multiple Regression

The normal equations:

$$X'Xb = X'Y$$

Estimators:

$$b = (X'X)^{-1} X'Y$$

Predicted values:

$$\hat{Y} = Xb = X(X'X)^{-1}X'Y = HY$$

Variances of the estimators:

$$V(b) = \sigma^2 (X'X)^{-1} \qquad\qquad s^2(b) = MSE \, (X'X)^{-1}$$

The Matrix Approach to Regression Analysis (2)
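A NumPy sketch of these matrix formulas, using the Example 7-3 data from the deck:

import numpy as np

# Example 7-3 data (Y regressed on X1 and X2)
y = np.array([72, 76, 78, 70, 68, 80, 82, 65, 62, 90], dtype=float)
x1 = np.array([12, 11, 15, 10, 11, 16, 14, 8, 8, 18], dtype=float)
x2 = np.array([5, 8, 6, 5, 3, 9, 12, 4, 3, 10], dtype=float)

X = np.column_stack([np.ones(10), x1, x2])

b = np.linalg.solve(X.T @ X, X.T @ y)   # b = (X'X)^{-1} X'Y via the normal equations
y_hat = X @ b                           # predicted values, Xb = HY
mse = (y - y_hat) @ (y - y_hat) / (10 - 3)
cov_b = mse * np.linalg.inv(X.T @ X)    # s^2(b) = MSE (X'X)^{-1}

print(b)                        # approximately [47.165, 1.599, 1.149]
print(np.sqrt(np.diag(cov_b)))  # approximately [2.470, 0.281, 0.305]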