Anareg Week 10 Multicollinearity Interesting special cases Polynomial regression

Upload: pauline-edwards

Post on 18-Jan-2018

Page 1: Anareg Week 10 Multicollinearity Interesting special cases Polynomial regression

Anareg Week 10
Multicollinearity
Interesting special cases
Polynomial regression

Page 2: Multicollinearity

• The numerical analysis problem is that the matrix X’X is close to singular and is therefore difficult to invert accurately.
• The statistical problem is that there is too much correlation among the explanatory variables, so it is difficult to determine the regression coefficients.
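The numerical side of the problem can be illustrated with the condition number of X’X. The sketch below uses simulated data (the sample size, seed, and noise scale are illustrative assumptions, not values from the slides) and compares an unrelated pair of predictors with a nearly identical pair.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for reproducibility
n = 100                          # illustrative sample size
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)                 # unrelated to x1
x2_near = x1 + 1e-4 * rng.normal(size=n)      # almost a copy of x1

def xtx_condition(a, b):
    """Condition number of X'X for a design with intercept, a and b."""
    X = np.column_stack([np.ones_like(a), a, b])
    return np.linalg.cond(X.T @ X)

cond_indep = xtx_condition(x1, x2_indep)
cond_near = xtx_condition(x1, x2_near)
print(f"cond(X'X), unrelated predictors:      {cond_indep:.3g}")
print(f"cond(X'X), nearly collinear predictors: {cond_near:.3g}")
```

A large condition number means that small rounding errors can be amplified when X’X is inverted, which is why the near-collinear design is the harder one numerically.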

Page 3: Multicollinearity (2)

• Solve the statistical problem and the numerical problem will also be solved.
• The statistical problem is more serious than the numerical problem.
• We want to refine a model that has redundancy in the explanatory variables even if X’X can be inverted without difficulty.

Page 4: Multicollinearity (3)

Extreme cases can help us to understand the problem:

• If all X’s are uncorrelated, Type I SS and Type II SS will be the same, i.e., the contribution of each explanatory variable to the model will be the same whether or not the other explanatory variables are in the model.
• If there is a linear combination of the explanatory variables that is a constant (e.g., X1 = X2, so X1 − X2 = 0), then the Type II SS for the X’s involved will be zero.
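Both extreme cases can be checked with a small numerical sketch. The data here are simulated (the design, coefficients, and seed are assumptions made for illustration). The Type I SS for X1 is taken as the drop in error sum of squares when X1 enters first, and the Type II SS as the drop when X1 is added to a model that already contains X2.

```python
import numpy as np

def sse(y, *cols):
    """Error sum of squares for a regression of y on an intercept plus cols."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # also handles rank-deficient X
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(1)
# Balanced two-level design: x1 and x2 are exactly uncorrelated.
x1 = np.tile([-1.0, -1.0, 1.0, 1.0], 5)
x2 = np.tile([-1.0, 1.0, -1.0, 1.0], 5)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=20)

type1_x1 = sse(y) - sse(y, x1)          # X1 entered first
type2_x1 = sse(y, x2) - sse(y, x1, x2)  # X1 given X2
print(f"uncorrelated X's: Type I = {type1_x1:.4f}, Type II = {type2_x1:.4f}")

# Extreme case X1 = X2: X1 adds nothing once X2 is in the model.
x2_dup = x1.copy()
type2_x1_dup = sse(y, x2_dup) - sse(y, x1, x2_dup)
print(f"X1 = X2: Type II SS for X1 = {type2_x1_dup:.2e}")
```

The second result is zero up to floating-point rounding, since the duplicated column cannot reduce the residuals any further.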

Page 5

Y = gpa
X1 = hsm
X3 = hss
X4 = hse
X5 = satm
X6 = satv
X7 = genderm;

Define: sat=satm+satv;
We will regress Y on sat satm and satv;

Page 6

Source            DF
Model              2
Error            221
Corrected Total  223

• Something is wrong
• dfM = 2 but there are 3 Xs

Page 7

NOTE: Model is not full rank. Least-squares solutions for the parameters are not unique. Some statistics will be misleading. A reported DF of 0 or B means that the estimate is biased.

Page 8

NOTE: The following parameters have been set to 0, since the variables are a linear combination of other variables as shown.

satv = sat - satm
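The loss of one model degree of freedom can be reproduced with a rank check. This is a sketch on simulated scores, not the CS data set: the score ranges and seed are assumptions, and only the sample size of 224 is taken from the corrected total of 223 df.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 224  # corrected total df of 223 implies 224 observations

# Simulated stand-ins for the SAT scores (hypothetical values).
satm = rng.integers(300, 801, size=n).astype(float)
satv = rng.integers(300, 801, size=n).astype(float)
sat = satm + satv  # exact linear combination, as in the example

# Design matrix: intercept plus the three predictors.
X = np.column_stack([np.ones(n), sat, satm, satv])
rank = int(np.linalg.matrix_rank(X))
model_df = rank - 1      # subtract one for the intercept
error_df = n - rank

print(f"columns = {X.shape[1]}, rank = {rank}")
print(f"model df = {model_df}, error df = {error_df}")
```

The design matrix has four columns but rank three, so only two model degrees of freedom are available for the three predictors.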

Page 9

Var    DF   Par Est   St Err   t       P
Int    1    1.28      0.37     3.43    0.0007
sat    B    -0.00     0.00     -0.04   0.9684
satm   B    0.00      0.00     2.10    0.0365
satv   0    0         .        .       .

Page 10: Extent of multicollinearity

• Our CS example had one explanatory variable equal to a linear combination of other explanatory variables
• This is the most extreme case of multicollinearity and is detected by statistical software because (X’X) does not have an inverse
• We are concerned with cases less extreme

Page 11: Effects of multicollinearity

• Regression coefficients are not well estimated and may be meaningless
• Similarly for standard errors of these estimates
• Type I SS and Type II SS will differ
• R2 and predicted values are usually ok
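The effect on standard errors can be seen in a small simulation. Everything in this sketch is an illustrative assumption (sample size, true coefficients, target correlation of 0.99, seed). For two predictors, the variance inflation factor is 1/(1 − r²), where r is their sample correlation.

```python
import numpy as np

def ols_se(X, y):
    """OLS coefficients and their estimated standard errors."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    mse = resid @ resid / (n - p)
    cov = mse * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
noise = rng.normal(size=n)
eps = rng.normal(size=n)

results = {}
for label, rho in [("uncorrelated", 0.0), ("collinear", 0.99)]:
    # x2 has (approximately) correlation rho with x1.
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * noise
    y = 1.0 + x1 + x2 + eps
    X = np.column_stack([np.ones(n), x1, x2])
    beta, se = ols_se(X, y)
    r = np.corrcoef(x1, x2)[0, 1]
    vif = 1.0 / (1.0 - r**2)
    results[label] = (se[1], vif)
    print(f"{label:>12}: b1 = {beta[1]:.3f}, se(b1) = {se[1]:.3f}, VIF = {vif:.1f}")
```

The error variance is the same in both runs, so the larger standard error in the collinear case comes from the correlation between the predictors alone.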

Page 12: Two separate problems

• Numerical accuracy
  • (X’X) is difficult to invert
  • Need good software
• Statistical problem
  • Results are difficult to interpret
  • Need a better model

Page 13: Polynomial regression

• We can do linear, quadratic, cubic, etc. by defining squares, cubes, etc. in a data step and using these as predictors in a multiple regression
• We can do this with more than one explanatory variable
• When we do this we generally create a multicollinearity problem

Page 14: Polynomial Regression (2)

• We can remove the correlation between explanatory variables and their squares
  • Center (subtract the mean) before squaring
  • NKNW rescale by standardizing (subtract the mean and divide by the standard deviation)
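A short calculation suggests why centering helps. The notation is assumed here for illustration: x_i is the centered value X_i − X̄, n is the sample size, and the overline denotes a sample mean.

```latex
\sum_{i=1}^{n} x_i = 0
\quad\Longrightarrow\quad
\sum_{i=1}^{n} x_i \left( x_i^2 - \overline{x^2} \right)
  = \sum_{i=1}^{n} x_i^3 - \overline{x^2} \sum_{i=1}^{n} x_i
  = \sum_{i=1}^{n} x_i^3 .
```

The sample covariance between x and x² is therefore proportional to the sum of cubed deviations. It is exactly zero when the X values are symmetric about their mean, as in a balanced, equally spaced design, and is usually much smaller than for the uncentered variable otherwise.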

Page 15: Interaction Models

With several explanatory variables, we need to consider the possibility that the effect of one variable depends on the value of another variable.

Special cases:
• One independent variable – second order
• One independent variable – third order
• Two independent variables – second order

Page 16: One Independent Variable – Second Order

The regression model:

$$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \varepsilon_i, \quad \text{where } x_i = X_i - \bar{X}$$

The mean response is

$$E(Y_i) = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2$$

This is a parabola and is frequently called a quadratic response function.

β0 represents the mean response of Y when x = 0; β1 is often called the linear effect coefficient, while β11 is called the quadratic effect coefficient.

Page 17: One Independent Variable – Third Order

The regression model:

$$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \beta_{111} x_i^3 + \varepsilon_i, \quad \text{where } x_i = X_i - \bar{X}$$

The mean response is

$$E(Y_i) = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \beta_{111} x_i^3$$

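Page 18: Two Independent Variables – Second Order

The regression model:

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \beta_{12} x_{i1} x_{i2} + \varepsilon_i,$$

$$\text{where } x_{i1} = X_{i1} - \bar{X}_1, \quad x_{i2} = X_{i2} - \bar{X}_2$$

The mean response is

$$E(Y_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{11} x_{i1}^2 + \beta_{22} x_{i2}^2 + \beta_{12} x_{i1} x_{i2}$$

This is the equation of a conic section. The coefficient β12 is often called the interaction effect coefficient.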
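To see why β12 is read as an interaction effect, it may help to differentiate the second-order mean response with respect to one predictor. This is a worked sketch in the slide's notation, with the observation index dropped for brevity.

```latex
\begin{aligned}
E(Y) &= \beta_0 + \beta_1 x_1 + \beta_2 x_2
        + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 \\
\frac{\partial E(Y)}{\partial x_1} &= \beta_1 + 2\beta_{11} x_1 + \beta_{12} x_2
\end{aligned}
```

The slope in x1 shifts by β12 for each unit change in x2. When β12 = 0, the effect of x1 on the mean response does not depend on the value of x2.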

Page 19: NKNW Example p 330

• Response variable is the life (in cycles) of a power cell
• Explanatory variables are
  • Charge rate (3 levels)
  • Temperature (3 levels)
• This is a designed experiment

Page 20

Obs   cycles   chrate   temp
1     150      0.6      10
2     86       1.0      10
3     49       1.4      10
4     288      0.6      20
5     157      1.0      20
6     131      1.0      20
7     184      1.0      20
8     109      1.4      20
9     279      0.6      30
10    235      1.0      30
11    224      1.4      30

Page 21

Create new variables

chrate2=chrate*chrate;
temp2=temp*temp;
ct=chrate*temp;

Then regress cycles on chrate, temp, chrate2, temp2, and ct;
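For readers without SAS, the same steps can be sketched in NumPy. This is a translation for illustration, not the program used for the slides. The data are the eleven observations listed on Page 20, and the variable names mirror the statements above.

```python
import numpy as np

# Power cell data from the table on Page 20.
cycles = np.array([150, 86, 49, 288, 157, 131, 184, 109, 279, 235, 224], dtype=float)
chrate = np.array([0.6, 1.0, 1.4, 0.6, 1.0, 1.0, 1.0, 1.4, 0.6, 1.0, 1.4])
temp = np.array([10, 10, 10, 20, 20, 20, 20, 20, 30, 30, 30], dtype=float)

# Derived variables, as in the data step statements.
chrate2 = chrate * chrate
temp2 = temp * temp
ct = chrate * temp

# Design matrix with an intercept column, fitted by least squares.
X = np.column_stack([np.ones_like(cycles), chrate, temp, chrate2, temp2, ct])
b, *_ = np.linalg.lstsq(X, cycles, rcond=None)

resid = cycles - X @ b
sse = float(resid @ resid)
ssto = float(np.sum((cycles - cycles.mean()) ** 2))
r_squared = 1.0 - sse / ssto

print("coefficients (int, chrate, temp, chrate2, temp2, ct):")
print(np.round(b, 4))
print(f"SSE = {sse:.1f}, SSTO = {ssto:.1f}, R^2 = {r_squared:.4f}")
```

Coefficients from this raw-scale fit differ from those of a fit on centered or rescaled predictors, but the error sum of squares and R² are the same, because both sets of predictors span the same column space.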

Page 22

Var       b        S(b)    t       Pr>|t|
int       162.84   16.61   9.81    <.0002
Chrate    -55.83   13.22   -4.22   <0.01
Temp      75.50    13.22   5.71    <0.005
Chrate2   27.39    20.34   1.35    0.2359
Temp2     -10.61   20.34   -.52    0.6244
ct        11.50    16.19   .71     0.5092

Page 23: b. ANOVA Table

Source                         df   SS      MS
Regression                     5    55366   11073
  X1                           1    18704   18704
  X2 | X1                      1    34201   34201
  X1^2 | X1, X2                1    1646    1646
  X2^2 | X1, X2, X1^2          1    285     285
  X1X2 | X1, X2, X1^2, X2^2    1    529     529
Error                          5    5240    1048
Total                          10   60606

Page 24: Conclusion

• We have a multicollinearity problem
• Let’s look at the correlations (use proc corr)
• There are some very high correlations
  • r(chrate,chrate2) = 0.99103
  • r(temp,temp2) = 0.98609
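These two correlations can be recomputed from the chrate and temp columns listed on Page 20. The NumPy sketch below stands in for proc corr. It is an illustration, not the program used for the slides.

```python
import numpy as np

# Predictor values from the table on Page 20.
chrate = np.array([0.6, 1.0, 1.4, 0.6, 1.0, 1.0, 1.0, 1.4, 0.6, 1.0, 1.4])
temp = np.array([10, 10, 10, 20, 20, 20, 20, 20, 30, 30, 30], dtype=float)

# Pearson correlation between each predictor and its (uncentered) square.
r_chrate = np.corrcoef(chrate, chrate ** 2)[0, 1]
r_temp = np.corrcoef(temp, temp ** 2)[0, 1]

print(f"r(chrate, chrate2) = {r_chrate:.5f}")
print(f"r(temp, temp2)     = {r_temp:.5f}")
```

Both values agree with the correlations quoted on the slide to five decimals.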

Page 25: A remedy

• We can remove the correlation between explanatory variables and their squares
  • Center (subtract the mean) before squaring
  • NKNW rescale by standardizing (subtract the mean and divide by the standard deviation)
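The remedy can be tried on the power cell data from Page 20. The coding used below is an assumption: each predictor is centered at its mean and divided by the spacing between its levels (0.4 for charge rate, 10 for temperature), not by its standard deviation. With this coding the estimates agree with the coefficient table on Page 22 to two decimals.

```python
import numpy as np

# Power cell data from the table on Page 20.
cycles = np.array([150, 86, 49, 288, 157, 131, 184, 109, 279, 235, 224], dtype=float)
chrate = np.array([0.6, 1.0, 1.4, 0.6, 1.0, 1.0, 1.0, 1.4, 0.6, 1.0, 1.4])
temp = np.array([10, 10, 10, 20, 20, 20, 20, 20, 30, 30, 30], dtype=float)

# Assumed coding: subtract the mean, divide by the spacing between levels.
x1 = (chrate - chrate.mean()) / 0.4
x2 = (temp - temp.mean()) / 10.0

# Correlation between each coded predictor and its square.
r_x1 = np.corrcoef(x1, x1 ** 2)[0, 1]
r_x2 = np.corrcoef(x2, x2 ** 2)[0, 1]
print(f"r(x1, x1^2) = {r_x1:.2e}, r(x2, x2^2) = {r_x2:.2e}")

# Second-order model on the coded predictors.
X = np.column_stack([np.ones_like(cycles), x1, x2, x1 ** 2, x2 ** 2, x1 * x2])
b, *_ = np.linalg.lstsq(X, cycles, rcond=None)
print("b (int, x1, x2, x1^2, x2^2, x1*x2):", np.round(b, 2))
```

Because the design is balanced and symmetric about its center, the correlation between each coded predictor and its square is zero up to rounding error, compared with about 0.99 before centering.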

Page 26

Last slide

Read NKNW 7.6 to 7.7 and the problems on pp 317-326

We used programs cs4.sas and NKNW302.sas to generate the output for today

Page 27

Last slide

Read NKNW 8.5 and Chapter 9