
Page 1: Chapter 11 Multiple Linear Regression Group Project AMS 572

Chapter 11: Multiple Linear Regression

Group Project

AMS 572

Page 2: Chapter 11 Multiple Linear Regression Group Project AMS 572

Group Members

From Left to Right: Yongjun Cheng, William Ho, Katy Sharpe, Renyuan Luo, Farahnaz Maroof, Shuqiang Wang, Cai Rong, Jianping Zhang, Lingling Wu.

Yuanhua Li

Page 3: Chapter 11 Multiple Linear Regression Group Project AMS 572

Overview

11.1-11.3 Multiple Linear Regression -- William Ho

11.4 Statistical Inference -- Katy Sharpe & Farahnaz Maroof

11.6 Topics in Regression Modeling -- Renyuan Luo & Yongjun Cheng

11.7 Variable Selection Methods & SAS -- Lingling Wu, Yuanhua Li & Shuqiang Wang

11.5, 11.8 Regression Diagnostics and Strategy for Building a Model -- Cai Rong

Summary -- Jianping Zhang

Page 4: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.1-11.3 Multiple Linear Regression Intro

William Ho

Page 5: Chapter 11 Multiple Linear Regression Group Project AMS 572

Multiple Linear Regression

Historical Background

• Regression analysis is a statistical methodology to estimate the relationship of a response variable to a set of predictor variables

• Multiple linear regression extends the simple linear regression model to the case of two or more predictor variables

• Francis Galton started using the term regression in his biology research

• Karl Pearson and Udny Yule extended Galton's work to the statistical context

• Gauss developed the method of least squares used in regression analysis

Example:

A multiple regression analysis might show us that the demand for a product varies directly with changes in the demographic characteristics (age, income) of a market area.

Page 6: Chapter 11 Multiple Linear Regression Group Project AMS 572

Probabilistic Model

$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i, \quad i = 1, 2, \ldots, n$$

Here $y_i$ is the observed value of the r.v. $Y_i$, which depends on the fixed predictor values $x_{i1}, x_{i2}, \ldots, x_{ik}$ according to the model above, where $\beta_0, \beta_1, \ldots, \beta_k$ are unknown parameters.

The random errors $\epsilon_i$ are assumed to be independent $N(0, \sigma^2)$ r.v.'s. Then the $Y_i$ are independent $N(\mu_i, \sigma^2)$ r.v.'s with

$$\mu_i = E(Y_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$$

Page 7: Chapter 11 Multiple Linear Regression Group Project AMS 572

Fitting the model

• The least squares (LS) method is used to fit the model
$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i$$

• Specifically, LS provides estimates of the unknown parameters $\beta_0, \beta_1, \ldots, \beta_k$ that minimize
$$Q = \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}) \right]^2,$$
the sum of squared differences between the observed values $y_i$ and the corresponding points on the fitted equation

• The LS estimates can be found by taking the partial derivatives of Q with respect to the unknown parameters and setting them equal to 0. The result is a set of simultaneous linear equations, usually solved by computer

• The resulting solutions, $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$, are the least squares (LS) estimates of $\beta_0, \beta_1, \ldots, \beta_k$, respectively

Page 8: Chapter 11 Multiple Linear Regression Group Project AMS 572

Goodness of Fit of the Model

• To evaluate the goodness of fit of the LS model, we use the residuals, defined by
$$e_i = y_i - \hat{y}_i \quad (i = 1, 2, \ldots, n)$$

• The $\hat{y}_i$ are the fitted values:
$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik} \quad (i = 1, 2, \ldots, n)$$

• An overall measure of the goodness of fit is the error sum of squares (SSE):
$$SSE = \min Q = \sum_{i=1}^{n} e_i^2$$

• A few other definitions, similar to those in simple linear regression:

total sum of squares (SST): $SST = \sum_i (y_i - \bar{y})^2$

regression sum of squares (SSR): $SSR = SST - SSE$

Page 9: Chapter 11 Multiple Linear Regression Group Project AMS 572

• coefficient of multiple determination:
$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

• $0 \le r^2 \le 1$; values closer to 1 represent better fits

• adding predictor variables never decreases $r^2$ and generally increases it

• multiple correlation coefficient (the positive square root of $r^2$):
$$r = +\sqrt{r^2}$$

• only the positive square root is used

• r is a measure of the strength of the association between the predictor variables and the single response variable; a small numerical illustration follows
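For example, with hypothetical values SST = 2716 and SSE = 58:

$$r^2 = 1 - \frac{58}{2716} \approx 0.979, \qquad r = +\sqrt{0.979} \approx 0.989$$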

Page 10: Chapter 11 Multiple Linear Regression Group Project AMS 572

Multiple Regression Model in Matrix Notation

The multiple regression model can be presented in a compact form by using matrix notation

Let:

$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad \boldsymbol{\epsilon} = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}$$

be the n x 1 vectors of the r.v.'s $Y_i$, their observed values $y_i$, and the random errors $\epsilon_i$, respectively. Let:

$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}$$

be the n x (k + 1) matrix of the values of the predictor variables (the first column corresponds to the constant term $\beta_0$).

Page 11: Chapter 11 Multiple Linear Regression Group Project AMS 572

Let:

$$\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} \qquad \text{and} \qquad \hat{\boldsymbol{\beta}} = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{pmatrix}$$

be the (k + 1) x 1 vectors of the unknown parameters and their LS estimates, respectively.

• The model can be rewritten as:
$$\mathbf{Y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$$

• The simultaneous linear equations whose solution yields the LS estimates:
$$X'X\hat{\boldsymbol{\beta}} = X'\mathbf{y}$$

• If the inverse of the matrix $X'X$ exists, then the solution is given by:
$$\hat{\boldsymbol{\beta}} = (X'X)^{-1}X'\mathbf{y}$$
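As a quick illustration of this closed-form solution, here is a minimal SAS/IML sketch (the toy data are invented for this example, and SAS/IML is assumed to be available):

proc iml;
   /* hypothetical toy data: n = 4 observations, k = 1 predictor */
   y = {1, 3, 4, 6};
   X = {1 0,
        1 1,
        1 2,
        1 3};                      /* first column of 1's for the intercept */
   beta_hat = inv(X`*X) * X` * y;  /* LS estimates: (X'X)^(-1) X'y */
   print beta_hat;                 /* gives intercept 1.1 and slope 1.6 */
quit;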

Page 12: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.4 Statistical Inference

Katy Sharpe & Farahnaz Maroof

Page 13: Chapter 11 Multiple Linear Regression Group Project AMS 572

Determining the statistical significance of the predictor variables:

• We test the hypotheses:
$$H_{0j}: \beta_j = 0 \quad \text{vs.} \quad H_{1j}: \beta_j \ne 0$$

• If we can't reject $H_{0j}: \beta_j = 0$, then the corresponding variable $x_j$ is not a useful predictor of y.

• Recall that each $\hat\beta_j$ is normally distributed with mean $\beta_j$ and variance $\sigma^2 v_{jj}$, where $v_{jj}$ is the jth diagonal entry of the matrix $V = (X'X)^{-1}$.

Page 14: Chapter 11 Multiple Linear Regression Group Project AMS 572

Deriving a pivotal quantity

• Note that $\hat\beta_j \sim N(\beta_j, \sigma^2 v_{jj})$.

• Since the error variance $\sigma^2$ is unknown, we employ its unbiased estimate, which is given by
$$S^2 = \frac{SSE}{n - (k+1)} = \frac{\sum e_i^2}{n - (k+1)} = MSE,$$
the error sum of squares divided by its degrees of freedom.

• We know that
$$\frac{[n - (k+1)]S^2}{\sigma^2} = \frac{SSE}{\sigma^2} \sim \chi^2_{n-(k+1)},$$
and that $S^2$ and the $\hat\beta_j$ are statistically independent.

• Using
$$Z = \frac{\hat\beta_j - \beta_j}{\sigma\sqrt{v_{jj}}} \sim N(0, 1)$$
and the definition of the t-distribution,
$$T = \frac{Z}{\sqrt{W/[n-(k+1)]}}, \quad W \sim \chi^2_{n-(k+1)},$$
we obtain the pivotal quantity
$$\frac{\hat\beta_j - \beta_j}{S\sqrt{v_{jj}}} \sim t_{n-(k+1)}$$

Page 15: Chapter 11 Multiple Linear Regression Group Project AMS 572

Derivation of a Confidence Interval for $\beta_j$

$$P\left( -t_{n-(k+1),\,\alpha/2} \le \frac{\hat\beta_j - \beta_j}{S\sqrt{v_{jj}}} \le t_{n-(k+1),\,\alpha/2} \right) = 1 - \alpha$$

$$P\left( \hat\beta_j - t_{n-(k+1),\,\alpha/2}\, s\sqrt{v_{jj}} \le \beta_j \le \hat\beta_j + t_{n-(k+1),\,\alpha/2}\, s\sqrt{v_{jj}} \right) = 1 - \alpha$$

so the $100(1-\alpha)\%$ CI for $\beta_j$ is
$$\hat\beta_j \pm t_{n-(k+1),\,\alpha/2}\, SE(\hat\beta_j)$$

Note that $SE(\hat\beta_j) = s\sqrt{v_{jj}}$.
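In SAS, PROC REG prints these intervals when the CLB model option is given; a minimal sketch (using the cement data set example115 that appears later in these slides):

proc reg data=example115;
   model y = x1 x2 / clb alpha=0.05;   /* 95% confidence limits for each beta_j */
run;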

Page 16: Chapter 11 Multiple Linear Regression Group Project AMS 572

Deriving the Hypothesis Test:

Hypotheses:
$$H_0: \beta_j = 0 \quad \text{vs.} \quad H_a: \beta_j \ne 0$$

$$\alpha = P(\text{Reject } H_0 \mid H_0 \text{ is true}) = P\left( \frac{|\hat\beta_j|}{s\sqrt{v_{jj}}} \ge t_{n-(k+1),\,\alpha/2} \;\middle|\; \beta_j = 0 \right)$$

$$= P\left( \hat\beta_j \ge t_{n-(k+1),\,\alpha/2}\, s\sqrt{v_{jj}} \right) + P\left( \hat\beta_j \le -t_{n-(k+1),\,\alpha/2}\, s\sqrt{v_{jj}} \right)$$

Therefore, we reject $H_0$ at level $\alpha$ if
$$|\hat\beta_j| \ge t_{n-(k+1),\,\alpha/2}\, s\sqrt{v_{jj}}$$

Page 17: Chapter 11 Multiple Linear Regression Group Project AMS 572

Another Hypothesis Test

Now consider:
$$H_0: \beta_j = 0 \text{ for all } 1 \le j \le k \quad \text{vs.} \quad H_a: \beta_j \ne 0 \text{ for at least one } j$$

When $H_0$ is true,
$$F = \frac{MSR}{MSE} \sim f_{k,\, n-(k+1)}$$

• F is our pivotal quantity for this test.
• Compute the p-value of the test.
• Compare p to $\alpha$, and reject $H_0$ if $p \le \alpha$.
• If we reject $H_0$, we know that at least one $\beta_j \ne 0$, and we refer to the previous test in this case.

Page 18: Chapter 11 Multiple Linear Regression Group Project AMS 572

The General Hypothesis Test

• Consider the full model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i \quad (i = 1, 2, \ldots, n)$$

• Now consider the partial model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{k-m} x_{i,k-m} + \epsilon_i \quad (i = 1, 2, \ldots, n)$$

• Hypotheses:
$$H_0: \beta_{k-m+1} = \cdots = \beta_k = 0 \quad \text{vs.} \quad H_a: \beta_j \ne 0 \text{ for at least one } k-m+1 \le j \le k$$

• We test with
$$F = \frac{(SSE_{k-m} - SSE_k)/m}{SSE_k/[n - (k+1)]} \sim f_{m,\, n-(k+1)}$$

• Reject $H_0$ when $F > f_{m,\, n-(k+1),\,\alpha}$
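PROC REG can carry out this partial F test directly with its TEST statement; a minimal sketch on the cement data (here m = 2, testing whether x3 and x4 can be dropped from the full model):

proc reg data=example115;
   model y = x1 x2 x3 x4;
   test x3 = 0, x4 = 0;   /* partial F test of H0: beta3 = beta4 = 0 */
run;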

Page 19: Chapter 11 Multiple Linear Regression Group Project AMS 572

Predicting Future Observations

• Let $\mathbf{x}^* = (x_0^*, x_1^*, \ldots, x_k^*)'$ with $x_0^* = 1$, and let
$$\mu^* = E(Y^*) = \beta_0 + \beta_1 x_1^* + \cdots + \beta_k x_k^*,$$
estimated by
$$\hat{Y}^* = \hat\beta_0 + \hat\beta_1 x_1^* + \cdots + \hat\beta_k x_k^*$$
with
$$Var(\hat{Y}^*) = \sigma^2\, \mathbf{x}^{*\prime} V \mathbf{x}^*, \qquad V = (X'X)^{-1}$$

• Our pivotal quantity becomes
$$\frac{\hat{Y}^* - \mu^*}{s\sqrt{\mathbf{x}^{*\prime} V \mathbf{x}^*}} \sim t_{n-(k+1)}$$

• Using this pivotal quantity, we can derive a CI to estimate $\mu^*$:
$$\hat{Y}^* \pm t_{n-(k+1),\,\alpha/2}\; s\sqrt{\mathbf{x}^{*\prime} V \mathbf{x}^*}$$

• Additionally, we can derive a prediction interval (PI) to predict $Y^*$:
$$\hat{Y}^* \pm t_{n-(k+1),\,\alpha/2}\; s\sqrt{1 + \mathbf{x}^{*\prime} V \mathbf{x}^*}$$
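A minimal SAS sketch of these intervals (CLM requests the CI for the mean response $\mu^*$ and CLI the PI for a new observation $Y^*$, printed for each observation in the data set):

proc reg data=example115;
   model y = x1 x2 / clm cli;
run;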

Page 20: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.6.1-11.6.3 Topics in Regression Modeling

Renyuan Luo

Page 21: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.6.1 Multicollinearity

Def. The predictor variables are linearly dependent.

This can cause serious numerical and statistical difficulties in fitting the regression model unless “extra” predictor variables are deleted.

Page 22: Chapter 11 Multiple Linear Regression Group Project AMS 572

How does the multicollinearity cause difficulties?

$\hat{\boldsymbol{\beta}}$ is the solution to the equation $X'X\hat{\boldsymbol{\beta}} = X'\mathbf{y}$; thus $X'X$ must be invertible in order for $\hat{\boldsymbol{\beta}}$ to be unique and computable.

If approximate multicollinearity happens:

1. $X'X$ is nearly singular, which makes $\hat{\boldsymbol{\beta}}$ numerically unstable. This is reflected in large changes in their magnitudes with small changes in the data.

2. The matrix $V = (X'X)^{-1}$ has very large elements. Therefore the $Var(\hat\beta_j) = \sigma^2 v_{jj}$ are large, which makes the $\hat\beta_j$ statistically nonsignificant.

Page 23: Chapter 11 Multiple Linear Regression Group Project AMS 572

Measures of Multicollinearity

1. The correlation matrix R. Easy to compute, but it can't reflect linear relationships among more than two variables.

2. The determinant of R can be used as a measure of the singularity of $X'X$.

3. Variance Inflation Factors (VIF): the diagonal elements of $R^{-1}$. VIF > 10 is regarded as unacceptable.
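In SAS, the VIFs can be requested directly from PROC REG; a minimal sketch (using the cement data set example115 that appears later in these slides):

proc reg data=example115;
   model y = x1 x2 x3 x4 / vif;   /* adds a Variance Inflation column to the output */
run;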

Page 24: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.6.2 Polynomial Regression

A special case of a linear model:
$$y = \beta_0 + \beta_1 x + \cdots + \beta_k x^k + \epsilon$$

Problems:

1. The powers of x, i.e., $x, x^2, \ldots, x^k$, tend to be highly correlated.

2. If k is large, the magnitudes of these powers tend to vary over a rather wide range.

So let k <= 3 if possible, and never use k > 5.

Page 25: Chapter 11 Multiple Linear Regression Group Project AMS 572

Solutions

1. Centering the x-variable:
$$y = \beta_0^* + \beta_1^*(x - \bar{x}) + \cdots + \beta_k^*(x - \bar{x})^k + \epsilon$$
Effect: removes the non-essential multicollinearity in the data.

2. Furthermore, we can standardize: divide by the standard deviation of x, using $\dfrac{x - \bar{x}}{s_x}$.
Effect: helps to alleviate the second problem. (A short SAS sketch follows.)
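A minimal SAS sketch of both remedies (raw is a hypothetical input data set with variable x; PROC STANDARD recenters x to mean 0 and rescales it to standard deviation 1):

proc standard data=raw mean=0 std=1 out=scaled;
   var x;
run;

data poly;
   set scaled;
   x2 = x**2;   /* powers of the centered and standardized x */
   x3 = x**3;
run;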

Page 26: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.6.3 Dummy Predictor Variables

What to do with the categorical predictor variables?

1. If we have categories of an ordinal variable, such as the prognosis of a patient (poor, average, good), just assign numerical scores to the categories. (poor=1, average=2, good=3)

Page 27: Chapter 11 Multiple Linear Regression Group Project AMS 572

2. If we have a nominal variable with c >= 2 categories, use c - 1 indicator variables, $x_1, \ldots, x_{c-1}$, called dummy variables, to code it:

$x_i = 1$ for the ith category $(1 \le i \le c - 1)$, with all other indicators 0;

$x_1 = \cdots = x_{c-1} = 0$ for the cth category.

Page 28: Chapter 11 Multiple Linear Regression Group Project AMS 572

Why don’t we just use c indicator variables:

?

If we use this, there will be a linear dependency among them:

This will cause multicollinearity.

1 2, ,..., cx x x

1 2 ... 1cx x x

Page 29: Chapter 11 Multiple Linear Regression Group Project AMS 572

Example

Suppose we have four years of quarterly sales data for a certain brand of soda cans. How can we model the time trend by fitting a multiple regression equation?

Solution: We use the quarter index as a predictor variable x1. To model the seasonal trend, we use indicator variables x2, x3, x4 for Winter, Spring and Summer, respectively; for Fall, all three equal zero. That is: Winter = (1,0,0), Spring = (0,1,0), Summer = (0,0,1), Fall = (0,0,0).

Then we have the model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \epsilon_i \quad (i = 1, \ldots, 16)$$
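A minimal SAS data step for this coding (sales_raw is a hypothetical input data set with the quarter index t = 1,...,16 and response y, with the series assumed to start in Winter):

data soda;
   set sales_raw;
   q  = mod(t - 1, 4) + 1;   /* 1 = Winter, 2 = Spring, 3 = Summer, 4 = Fall */
   x1 = t;                   /* time trend */
   x2 = (q = 1);             /* Winter dummy */
   x3 = (q = 2);             /* Spring dummy */
   x4 = (q = 3);             /* Summer dummy; Fall has x2 = x3 = x4 = 0 */
run;

proc reg data=soda;
   model y = x1 x2 x3 x4;
run;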

Page 30: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.6.4-11.6.5 Topics in Regression Modeling

Yongjun Cheng

Page 31: Chapter 11 Multiple Linear Regression Group Project AMS 572

Logistic Regression Model

In 1938, R. A. Fisher and Frank Yates suggested the logistic transform for analyzing binary data:

$$\ln(\text{Odds of } Y = 1 \mid x) = \ln\frac{P(Y=1 \mid x)}{1 - P(Y=1 \mid x)} = \ln\frac{P(Y=1 \mid x)}{P(Y=0 \mid x)} = \beta_0 + \beta_1 x$$

Page 32: Chapter 11 Multiple Linear Regression Group Project AMS 572

Why is it important?

The logistic regression model is the most popular model for binary data. It is generally used for binary response variables:

Y = 1 (true, success, YES, etc.) or

Y = 0 (false, failure, NO, etc.)

Page 33: Chapter 11 Multiple Linear Regression Group Project AMS 572

What is the Logistic Regression Model?

Consider a response variable Y = 0 or 1 and a single predictor variable x. We want to model E(Y|x) = P(Y=1|x) as a function of x. The logistic regression model expresses the logistic transform of P(Y=1|x) as a linear function of x:

$$\ln(\text{Odds of } Y = 1 \mid x) = \ln\frac{P(Y=1 \mid x)}{1 - P(Y=1 \mid x)} = \beta_0 + \beta_1 x$$

This model may be rewritten as:

$$P(Y=1 \mid x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}$$

Page 34: Chapter 11 Multiple Linear Regression Group Project AMS 572

Some properties of the logistic model

• E(Y|x) = P(Y=1|x) · 1 + P(Y=0|x) · 0 = P(Y=1|x) is bounded between 0 and 1 for all values of x. This is not true if we use the linear model P(Y=1|x) = $\beta_0 + \beta_1 x$.

• As in ordinary regression, the regression coefficient has a simple interpretation: $\beta_1$ is the log of the odds ratio of a success event (Y=1) for a unit change in x:
$$\ln\frac{P(Y=1 \mid x+1)}{P(Y=0 \mid x+1)} - \ln\frac{P(Y=1 \mid x)}{P(Y=0 \mid x)} = [\beta_0 + \beta_1(x+1)] - [\beta_0 + \beta_1 x] = \beta_1$$

• For multiple predictor variables, the logistic regression model is
$$\ln\frac{P(Y=1 \mid x_1, x_2, \ldots, x_k)}{P(Y=0 \mid x_1, x_2, \ldots, x_k)} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$
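A minimal SAS sketch for fitting such a model (trials is a hypothetical data set with a 0/1 response y and predictors x1, x2):

proc logistic data=trials;
   model y(event='1') = x1 x2;   /* fits ln[P(Y=1)/P(Y=0)] = b0 + b1*x1 + b2*x2 */
run;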

Page 35: Chapter 11 Multiple Linear Regression Group Project AMS 572

Standardized Regression Coefficients

Why do we need standardized regression coefficients?

The regression equation for the linear regression model is:
$$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \cdots + \hat\beta_k x_k$$

1. The magnitudes of the $\hat\beta_j$ can NOT be directly compared to judge the relative effects of the $x_j$ on y.

2. Standardized regression coefficients may be used to judge the importance of different predictors.

Page 36: Chapter 11 Multiple Linear Regression Group Project AMS 572

How to standardize regression coefficients?

Example: Industrial sales data

Linear model:
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i, \quad i = 1, 2, \ldots, 10$$

The fitted regression equation:
$$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 = 2.61 + 0.192 x_1 + 0.341 x_2$$

with $s_{x_1} = 6.830$, $s_{x_2} = 0.461$, and $s_y = 1.501$. The standardized coefficients are

$$\hat\beta_1^* = \hat\beta_1 \frac{s_{x_1}}{s_y} = 0.192 \times \frac{6.830}{1.501} \approx 0.875, \qquad \hat\beta_2^* = \hat\beta_2 \frac{s_{x_2}}{s_y} = 0.3406 \times \frac{0.461}{1.501} \approx 0.105$$

NOTE: $|\hat\beta_1| < |\hat\beta_2|$, but $|\hat\beta_1^*| > |\hat\beta_2^*|$; thus $x_1$ has a much larger effect than $x_2$ on y.

Page 37: Chapter 11 Multiple Linear Regression Group Project AMS 572

Summary for the general case

Standardizing transform:
$$y_i^* = \frac{y_i - \bar{y}}{s_y}, \quad i = 1, 2, \ldots, n$$
$$x_{ij}^* = \frac{x_{ij} - \bar{x}_j}{s_{x_j}}, \quad i = 1, 2, \ldots, n;\; j = 1, 2, \ldots, k$$

Standardized regression coefficients:
$$\hat\beta_j^* = \hat\beta_j \frac{s_{x_j}}{s_y}, \quad j = 1, 2, \ldots, k$$
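In SAS, PROC REG reports standardized coefficients via the STB model option; a minimal sketch (sales is a hypothetical data set for the industrial sales example):

proc reg data=sales;
   model y = x1 x2 / stb;   /* adds a Standardized Estimate column */
run;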

Page 38: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.7.1 Variable Selection Methods

Stepwise Regression

Lingling Wu

Page 39: Chapter 11 Multiple Linear Regression Group Project AMS 572

Variables selection method

(1) Why do we need variable selection methods?

(2) How do we select variables?

* stepwise regression

* best subsets regression

Page 40: Chapter 11 Multiple Linear Regression Group Project AMS 572

Stepwise Regression

(p-1)-variable model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \epsilon_i$$

p-variable model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \beta_p x_{ip} + \epsilon_i$$

Page 41: Chapter 11 Multiple Linear Regression Group Project AMS 572

Partial F-test

Hypothesis test: $H_0: \beta_p = 0$ vs. $H_1: \beta_p \ne 0$

$$F_p = \frac{(SSE_{p-1} - SSE_p)/1}{SSE_p/[n - (p+1)]} \sim f_{1,\, n-(p+1)}$$

Equivalently, the t-test statistic is $t_p = \dfrac{\hat\beta_p}{SE(\hat\beta_p)}$; reject $H_0$ if $|t_p| > t_{n-(p+1),\,\alpha/2}$.

Page 42: Chapter 11 Multiple Linear Regression Group Project AMS 572

Partial correlation coefficients

The partial correlation coefficient measures the proportion of the variation in y not accounted for by $x_1, \ldots, x_{p-1}$ that is accounted for by $x_p$:

$$r^2_{yx_p \mid x_1 \cdots x_{p-1}} = \frac{SSE(x_1, \ldots, x_{p-1}) - SSE(x_1, \ldots, x_p)}{SSE(x_1, \ldots, x_{p-1})}$$

It is related to the partial F statistic by

$$F_p = t_p^2 = \frac{r^2_{yx_p \mid x_1 \cdots x_{p-1}}\,[n - (p+1)]}{1 - r^2_{yx_p \mid x_1 \cdots x_{p-1}}}$$

Page 43: Chapter 11 Multiple Linear Regression Group Project AMS 572

Stepwise regression algorithm

Page 44: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.7.1 Variable Selection Methods

SAS Example

Yuanhua Li, Jianping Zhang

Page 45: Chapter 11 Multiple Linear Regression Group Project AMS 572

No. X1 X2 X3 X4 Y

1 7 26 6 60 78.5

2 1 29 15 52 74.3

3 11 56 8 20 104.3

4 11 31 8 47 87.6

5 7 52 6 33 95.9

6 11 55 9 22 109.2

7 3 71 17 6 102.7

8 1 31 22 44 72.5

9 2 54 18 22 93.1

10 21 47 4 26 115.9

11 1 40 23 34 83.8

12 11 66 9 12 113.3

13 10 68 8 12 109.4

Example 11.5 (pg. 416), 11.9 (pg. 431)

The following table gives data on the heat evolved in calories during the hardening of cement on a per-gram basis (y), along with the percentages of four ingredients: tricalcium aluminate (x1), tricalcium silicate (x2), tetracalcium alumino ferrite (x3), and dicalcium silicate (x4).

Page 46: Chapter 11 Multiple Linear Regression Group Project AMS 572

SAS Program (stepwise selection is used)

data example115;
   input x1 x2 x3 x4 y;
   datalines;
 7 26  6 60  78.5
 1 29 15 52  74.3
11 56  8 20 104.3
11 31  8 47  87.6
 7 52  6 33  95.9
11 55  9 22 109.2
 3 71 17  6 102.7
 1 31 22 44  72.5
 2 54 18 22  93.1
21 47  4 26 115.9
 1 40 23 34  83.8
11 66  9 12 113.3
10 68  8 12 109.4
;
run;

proc reg data=example115;
   model y = x1 x2 x3 x4 / selection=stepwise;
run;

Page 47: Chapter 11 Multiple Linear Regression Group Project AMS 572

Selected SAS output

The SAS System 22:10 Monday, November 26, 2006 3

The REG Procedure Model: MODEL1 Dependent Variable: y

Stepwise Selection: Step 4

                  Parameter     Standard
Variable           Estimate        Error    Type II SS   F Value   Pr > F

Intercept          52.57735      2.28617    3062.60416    528.91   <.0001
x1                  1.46831      0.12130     848.43186    146.52   <.0001
x2                  0.66225      0.04585    1207.78227    208.58   <.0001

Bounds on condition number: 1.0551, 4.2205

Page 48: Chapter 11 Multiple Linear Regression Group Project AMS 572

SAS Output (cont)

All variables left in the model are significant at the 0.1500 level.

No other variable met the 0.1500 significance level for entry into the model.

Summary of Stepwise Selection

        Variable   Variable    Number    Partial     Model
Step    Entered    Removed    Vars In   R-Square   R-Square       C(p)   F Value   Pr > F

   1    x4                          1     0.6745     0.6745    138.731     22.80   0.0006
   2    x1                          2     0.2979     0.9725     5.4959    108.22   <.0001
   3    x2                          3     0.0099     0.9823     3.0182      5.03   0.0517
   4               x4               2     0.0037     0.9787     2.6782      1.86   0.2054

Page 49: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.7.2 Variable Selection Methods

Best Subsets Regression

Shuqiang Wang

Page 50: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.7.2 Best Subsets Regression

For the stepwise regression algorithm, the final model is not guaranteed to be optimal in any specified sense.

In best subsets regression, a subset of variables is chosen from the collection of all subsets of the k predictor variables that optimizes a well-defined objective criterion.

Page 51: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.7.2 Best Subsets Regression

In stepwise regression, we get only a single final model.

In best subsets regression, the investigator can specify the size of the subset of predictors for the model.

Page 52: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.7.2 Best Subsets Regression

Optimality Criteria

• $r_p^2$-Criterion:
$$r_p^2 = \frac{SSR_p}{SST} = 1 - \frac{SSE_p}{SST}$$

• Adjusted $r_p^2$-Criterion:
$$r_{adj,p}^2 = 1 - \frac{MSE_p}{MST}$$

• $C_p$-Criterion (recommended for its ease of computation and its ability to judge the predictive power of a model). The standardized mean square error of prediction is
$$\Gamma_p = \frac{1}{\sigma^2} \sum_{i=1}^{n} \left[ E(\hat{Y}_{ip}) - E(Y_i) \right]^2$$
The sample estimator, Mallows' $C_p$-statistic, is given by
$$C_p = \frac{SSE_p}{\hat\sigma^2} + 2(p+1) - n$$

Page 53: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.7.2 Best Subsets Regression

Algorithm: Note that our problem is to find the minimum of a given criterion function.

Use the stepwise regression algorithm, replacing the partial F criterion with another criterion such as $C_p$.

Enumerate all possible cases and find the minimum of the criterion functions.

Other possibility?

Page 54: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.7.2 Best Subsets Regression & SAS

proc reg data=example115;

model y = x1 x2 x3 x4 /selection=stepwise;

run;

For the SELECTION= option, SAS has implemented 9 methods in total. For the best subsets method, we have the following options:

Maximum R2 Improvement (MAXR)
Minimum R2 Improvement (MINR)
R2 Selection (RSQUARE)
Adjusted R2 Selection (ADJRSQ)
Mallows' Cp Selection (CP)
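For example, to rank subsets by Mallows' Cp instead of running stepwise selection, one small change to the program above should suffice:

proc reg data=example115;
   model y = x1 x2 x3 x4 / selection=cp;   /* lists subsets ordered by Cp */
run;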

Page 55: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.5, 11.8 Building a Multiple Regression Model

Steps and Strategy
By Rong Cai

Page 56: Chapter 11 Multiple Linear Regression Group Project AMS 572

Modeling is an iterative process. Several cycles of the steps may be needed before arriving at the final model.

The basic process consists of seven steps

Page 57: Chapter 11 Multiple Linear Regression Group Project AMS 572

Get started and follow the steps:

1. Categorization by Usage
2. Collect the Data
3. Explore the Data
4. Divide the Data
5. Fit Candidate Models
6. Select and Evaluate
7. Select the Final Model

Page 58: Chapter 11 Multiple Linear Regression Group Project AMS 572

Step I

Decide the type of model needed, according to its intended usage.

Main categories include: Predictive, Theoretical, Control, Inferential, Data Summary

• Sometimes, a model serves multiple purposes.

Page 59: Chapter 11 Multiple Linear Regression Group Project AMS 572

Step II

Collect the Data

Predictor (X)

Response (Y)

Data should be relevant and bias-free

Reference: Chapter 3

Page 60: Chapter 11 Multiple Linear Regression Group Project AMS 572

Step III

Explore the Data

The linear regression model is sensitive to noise, so we should treat outliers and influential observations cautiously.

Reference: Chapter 4

Chapter 10

Page 61: Chapter 11 Multiple Linear Regression Group Project AMS 572

Step IV

Divide the Data

Training set: for building the model

Test set: for checking the model

How to divide?
Large sample: half and half
Small sample: keep the size of the training set > 16

Page 62: Chapter 11 Multiple Linear Regression Group Project AMS 572

Step V

Fit several candidate models using the training set.

Page 63: Chapter 11 Multiple Linear Regression Group Project AMS 572

Step VI

Select and evaluate a good model, checking for and correcting violations of the model assumptions.

Page 64: Chapter 11 Multiple Linear Regression Group Project AMS 572

Step VII

Select the Final Model

Use test set to compare competing models by cross-validating them.

Page 65: Chapter 11 Multiple Linear Regression Group Project AMS 572

Regression Diagnostics (Step VI)

Graphical Analysis of Residuals

• Plot the estimated errors (residuals) vs. the $x_i$ values; the residuals are the differences between the actual $y_i$ and the predicted $\hat{y}_i$
• Plot a histogram or stem-and-leaf display of the residuals

Purposes:
• Examine the functional form (linearity)
• Evaluate violations of assumptions

A SAS sketch follows.
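A minimal SAS sketch producing such plots (reusing the cement data set example115; the OUTPUT statement saves fitted values and residuals, and PROC GPLOT, which assumes SAS/GRAPH is licensed, draws them):

proc reg data=example115;
   model y = x1 x2;
   output out=diag p=yhat r=resid;   /* save fitted values and residuals */
run;

proc gplot data=diag;
   plot resid*yhat resid*x1;   /* residuals vs. fitted values and vs. a predictor */
run;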

Page 66: Chapter 11 Multiple Linear Regression Group Project AMS 572

Linear Regression Assumptions

• Mean of the probability distribution of the error is 0
• Probability distribution of the error has constant variance
• Probability distribution of the error is normal
• Errors are independent

Page 67: Chapter 11 Multiple Linear Regression Group Project AMS 572

Residual Plot for Functional Form (Linearity)

[Two plots of residuals e vs. X: a curved pattern indicates an X^2 term should be added; a random scatter indicates correct specification.]

Page 68: Chapter 11 Multiple Linear Regression Group Project AMS 572

Residual Plot for Equal Variance

[Two plots of standardized residuals (SR) vs. X: a fan shape indicates unequal variance; a uniform band indicates correct specification. Standardized residuals (the residual divided by the standard error of prediction) are typically used.]

Page 69: Chapter 11 Multiple Linear Regression Group Project AMS 572

Residual Plot for Independence

[Two plots of standardized residuals (SR) vs. X: a systematic pattern indicates the errors are not independent; a random scatter indicates correct specification.]

Page 70: Chapter 11 Multiple Linear Regression Group Project AMS 572

Summary

Jianping Zhang

Page 71: Chapter 11 Multiple Linear Regression Group Project AMS 572

Chapter Summary

• Multiple linear regression model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon$$
• How to fit the multiple regression model (LS fit)
• How to evaluate the goodness of fit
• How to select predictor variables
• How to use SAS to do multiple regression
• How to build a multiple regression model

Page 72: Chapter 11 Multiple Linear Regression Group Project AMS 572

Thank you!

Happy holidays!