
Page 1: Chapter 11 Multiple Linear Regression Group Project AMS 572

Chapter 11: Multiple Linear Regression

Group Project

AMS 572

Page 2: Chapter 11 Multiple Linear Regression Group Project AMS 572

Group Members

From Left to Right: Yongjun Cheng, William Ho, Katy Sharpe, Renyuan Luo, Farahnaz Maroof, Shuqiang Wang, Cai Rong, Jianping Zhang, Lingling Wu.

Yuanhua Li

Page 3: Chapter 11 Multiple Linear Regression Group Project AMS 572

Overview

11.1-11.3 Multiple Linear Regression -- William Ho

11.4 Statistical Inference -- Katy Sharpe & Farahnaz Maroof

11.6 Topics in Regression Modeling -- Renyuan Luo & Yongjun Cheng

11.7 Variable Selection Methods & SAS -- Lingling Wu, Yuanhua Li & Shuqiang Wang

11.5, 11.8 Regression Diagnostics and Strategy for Building a Model -- Cai Rong

Summary -- Jianping Zhang

Page 4: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.1-11.3 Multiple Linear Regression Intro

William Ho

Page 5: Chapter 11 Multiple Linear Regression Group Project AMS 572

Multiple Linear Regression

Historical Background

• Regression analysis is a statistical methodology to estimate the relationship of a response variable to a set of predictor variables

• Multiple linear regression extends the simple linear regression model to the case of two or more predictor variables

• Francis Galton started using the term regression in his biology research

• Karl Pearson and Udny Yule extended Galton's work to the statistical context

• Gauss developed the method of least squares used in regression analysis

Example:

A multiple regression analysis might show us that the demand for a product varies directly with changes in the demographic characteristics (age, income) of a market area.

Page 6: Chapter 11 Multiple Linear Regression Group Project AMS 572

Probabilistic Model

$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i, \quad i = 1, 2, \ldots, n$$

Here $y_i$ is the observed value of the r.v. $Y_i$, which depends on the fixed predictor values $x_{i1}, x_{i2}, \ldots, x_{ik}$ according to the model above, where $\beta_0, \beta_1, \ldots, \beta_k$ are unknown parameters.

The random errors $\epsilon_i$ are assumed to be independent $N(0, \sigma^2)$ r.v.'s. Then the $Y_i$ are independent $N(\mu_i, \sigma^2)$ r.v.'s with

$$\mu_i = E(Y_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$$

Page 7: Chapter 11 Multiple Linear Regression Group Project AMS 572

Fitting the model

• The least squares (LS) method is used to fit the model
$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i$$

• Specifically, LS provides estimates of the unknown parameters $\beta_0, \beta_1, \ldots, \beta_k$ that minimize
$$Q = \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}) \right]^2,$$
the sum of squared differences between the observed values $y_i$ and the corresponding points on the fitted equation

• The LS estimates can be found by taking the partial derivatives of Q with respect to the unknown parameters and setting them equal to 0. The result is a set of simultaneous linear equations, usually solved by computer

• The resulting solutions, $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$, are the least squares (LS) estimates of $\beta_0, \beta_1, \ldots, \beta_k$, respectively

Page 8: Chapter 11 Multiple Linear Regression Group Project AMS 572

Goodness of Fit of the Model

• To evaluate the goodness of fit of the LS model, we use the residuals, defined by
$$e_i = y_i - \hat{y}_i \quad (i = 1, 2, \ldots, n)$$

• The $\hat{y}_i$ are the fitted values:
$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik} \quad (i = 1, 2, \ldots, n)$$

• An overall measure of the goodness of fit is the error sum of squares (SSE):
$$SSE = \min Q = \sum_{i=1}^{n} e_i^2$$

• A few other definitions, similar to those in simple linear regression:

total sum of squares (SST): $SST = \sum_i (y_i - \bar{y})^2$

regression sum of squares (SSR): $SSR = SST - SSE$

Page 9: Chapter 11 Multiple Linear Regression Group Project AMS 572

• coefficient of multiple determination:
$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

• $0 \le r^2 \le 1$; values closer to 1 represent better fits

• adding predictor variables never decreases $r^2$ and generally increases it

• multiple correlation coefficient (the positive square root of $r^2$):
$$r = +\sqrt{r^2}$$

• only the positive square root is used

• r is a measure of the strength of the association between the predictor variables and the single response variable; a small numerical illustration follows
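For example, with hypothetical values SST = 2716 and SSE = 58:

$$r^2 = 1 - \frac{58}{2716} \approx 0.979, \qquad r = +\sqrt{0.979} \approx 0.989$$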

Page 10: Chapter 11 Multiple Linear Regression Group Project AMS 572

Multiple Regression Model in Matrix Notation

The multiple regression model can be presented in a compact form by using matrix notation

Let:

$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad \boldsymbol{\epsilon} = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}$$

be the n x 1 vectors of the r.v.'s $Y_i$, their observed values $y_i$, and the random errors $\epsilon_i$, respectively. Let:

$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}$$

be the n x (k + 1) matrix of the values of the predictor variables (the first column corresponds to the constant term $\beta_0$).

Page 11: Chapter 11 Multiple Linear Regression Group Project AMS 572

Let:

$$\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} \qquad \text{and} \qquad \hat{\boldsymbol{\beta}} = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{pmatrix}$$

be the (k + 1) x 1 vectors of the unknown parameters and their LS estimates, respectively.

• The model can be rewritten as:
$$\mathbf{Y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$$

• The simultaneous linear equations whose solution yields the LS estimates:
$$X'X\hat{\boldsymbol{\beta}} = X'\mathbf{y}$$

• If the inverse of the matrix $X'X$ exists, then the solution is given by:
$$\hat{\boldsymbol{\beta}} = (X'X)^{-1}X'\mathbf{y}$$
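As a quick illustration of this closed-form solution, here is a minimal SAS/IML sketch (the toy data are invented for this example, and SAS/IML is assumed to be available):

proc iml;
   /* hypothetical toy data: n = 4 observations, k = 1 predictor */
   y = {1, 3, 4, 6};
   X = {1 0,
        1 1,
        1 2,
        1 3};                      /* first column of 1's for the intercept */
   beta_hat = inv(X`*X) * X` * y;  /* LS estimates: (X'X)^(-1) X'y */
   print beta_hat;                 /* gives intercept 1.1 and slope 1.6 */
quit;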

Page 12: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.4 Statistical Inference

Katy Sharpe & Farahnaz Maroof

Page 13: Chapter 11 Multiple Linear Regression Group Project AMS 572

Determining the statistical significance of the predictor variables:

• We test the hypotheses:
$$H_{0j}: \beta_j = 0 \quad \text{vs.} \quad H_{1j}: \beta_j \ne 0$$

• If we can't reject $H_{0j}: \beta_j = 0$, then the corresponding variable $x_j$ is not a useful predictor of y.

• Recall that each $\hat\beta_j$ is normally distributed with mean $\beta_j$ and variance $\sigma^2 v_{jj}$, where $v_{jj}$ is the jth diagonal entry of the matrix $V = (X'X)^{-1}$.

Page 14: Chapter 11 Multiple Linear Regression Group Project AMS 572

Deriving a pivotal quantity

• Note that $\hat\beta_j \sim N(\beta_j, \sigma^2 v_{jj})$.

• Since the error variance $\sigma^2$ is unknown, we employ its unbiased estimate, which is given by
$$S^2 = \frac{SSE}{n - (k+1)} = \frac{\sum e_i^2}{n - (k+1)} = MSE,$$
the error sum of squares divided by its degrees of freedom.

• We know that
$$\frac{[n - (k+1)]S^2}{\sigma^2} = \frac{SSE}{\sigma^2} \sim \chi^2_{n-(k+1)},$$
and that $S^2$ and the $\hat\beta_j$ are statistically independent.

• Using
$$Z = \frac{\hat\beta_j - \beta_j}{\sigma\sqrt{v_{jj}}} \sim N(0, 1)$$
and the definition of the t-distribution,
$$T = \frac{Z}{\sqrt{W/[n-(k+1)]}}, \quad W \sim \chi^2_{n-(k+1)},$$
we obtain the pivotal quantity
$$\frac{\hat\beta_j - \beta_j}{S\sqrt{v_{jj}}} \sim t_{n-(k+1)}$$

Page 15: Chapter 11 Multiple Linear Regression Group Project AMS 572

Derivation of a Confidence Interval for $\beta_j$

$$P\left( -t_{n-(k+1),\,\alpha/2} \le \frac{\hat\beta_j - \beta_j}{S\sqrt{v_{jj}}} \le t_{n-(k+1),\,\alpha/2} \right) = 1 - \alpha$$

$$P\left( \hat\beta_j - t_{n-(k+1),\,\alpha/2}\, s\sqrt{v_{jj}} \le \beta_j \le \hat\beta_j + t_{n-(k+1),\,\alpha/2}\, s\sqrt{v_{jj}} \right) = 1 - \alpha$$

so the $100(1-\alpha)\%$ CI for $\beta_j$ is
$$\hat\beta_j \pm t_{n-(k+1),\,\alpha/2}\, SE(\hat\beta_j)$$

Note that $SE(\hat\beta_j) = s\sqrt{v_{jj}}$.
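In SAS, PROC REG prints these intervals when the CLB model option is given; a minimal sketch (using the cement data set example115 that appears later in these slides):

proc reg data=example115;
   model y = x1 x2 / clb alpha=0.05;   /* 95% confidence limits for each beta_j */
run;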

Page 16: Chapter 11 Multiple Linear Regression Group Project AMS 572

Deriving the Hypothesis Test:

Hypotheses:
$$H_0: \beta_j = 0 \quad \text{vs.} \quad H_a: \beta_j \ne 0$$

$$\alpha = P(\text{Reject } H_0 \mid H_0 \text{ is true}) = P\left( \frac{|\hat\beta_j|}{s\sqrt{v_{jj}}} \ge t_{n-(k+1),\,\alpha/2} \;\middle|\; \beta_j = 0 \right)$$

$$= P\left( \hat\beta_j \ge t_{n-(k+1),\,\alpha/2}\, s\sqrt{v_{jj}} \right) + P\left( \hat\beta_j \le -t_{n-(k+1),\,\alpha/2}\, s\sqrt{v_{jj}} \right)$$

Therefore, we reject $H_0$ at level $\alpha$ if
$$|\hat\beta_j| \ge t_{n-(k+1),\,\alpha/2}\, s\sqrt{v_{jj}}$$

Page 17: Chapter 11 Multiple Linear Regression Group Project AMS 572

Another Hypothesis Test

Now consider:
$$H_0: \beta_j = 0 \text{ for all } 1 \le j \le k \quad \text{vs.} \quad H_a: \beta_j \ne 0 \text{ for at least one } j$$

When $H_0$ is true,
$$F = \frac{MSR}{MSE} \sim f_{k,\, n-(k+1)}$$

• F is our pivotal quantity for this test.
• Compute the p-value of the test.
• Compare p to $\alpha$, and reject $H_0$ if $p \le \alpha$.
• If we reject $H_0$, we know that at least one $\beta_j \ne 0$, and we refer to the previous test in this case.

Page 18: Chapter 11 Multiple Linear Regression Group Project AMS 572

The General Hypothesis Test

• Consider the full model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i \quad (i = 1, 2, \ldots, n)$$

• Now consider the partial model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{k-m} x_{i,k-m} + \epsilon_i \quad (i = 1, 2, \ldots, n)$$

• Hypotheses:
$$H_0: \beta_{k-m+1} = \cdots = \beta_k = 0 \quad \text{vs.} \quad H_a: \beta_j \ne 0 \text{ for at least one } k-m+1 \le j \le k$$

• We test with
$$F = \frac{(SSE_{k-m} - SSE_k)/m}{SSE_k/[n - (k+1)]} \sim f_{m,\, n-(k+1)}$$

• Reject $H_0$ when $F > f_{m,\, n-(k+1),\,\alpha}$
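PROC REG can carry out this partial F test directly with its TEST statement; a minimal sketch on the cement data (here m = 2, testing whether x3 and x4 can be dropped from the full model):

proc reg data=example115;
   model y = x1 x2 x3 x4;
   test x3 = 0, x4 = 0;   /* partial F test of H0: beta3 = beta4 = 0 */
run;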

Page 19: Chapter 11 Multiple Linear Regression Group Project AMS 572

Predicting Future Observations

• Let $\mathbf{x}^* = (x_0^*, x_1^*, \ldots, x_k^*)'$ with $x_0^* = 1$, and let
$$\mu^* = E(Y^*) = \beta_0 + \beta_1 x_1^* + \cdots + \beta_k x_k^*,$$
estimated by
$$\hat{Y}^* = \hat\beta_0 + \hat\beta_1 x_1^* + \cdots + \hat\beta_k x_k^*$$
with
$$Var(\hat{Y}^*) = \sigma^2\, \mathbf{x}^{*\prime} V \mathbf{x}^*, \qquad V = (X'X)^{-1}$$

• Our pivotal quantity becomes
$$\frac{\hat{Y}^* - \mu^*}{s\sqrt{\mathbf{x}^{*\prime} V \mathbf{x}^*}} \sim t_{n-(k+1)}$$

• Using this pivotal quantity, we can derive a CI to estimate $\mu^*$:
$$\hat{Y}^* \pm t_{n-(k+1),\,\alpha/2}\; s\sqrt{\mathbf{x}^{*\prime} V \mathbf{x}^*}$$

• Additionally, we can derive a prediction interval (PI) to predict $Y^*$:
$$\hat{Y}^* \pm t_{n-(k+1),\,\alpha/2}\; s\sqrt{1 + \mathbf{x}^{*\prime} V \mathbf{x}^*}$$
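A minimal SAS sketch of these intervals (CLM requests the CI for the mean response $\mu^*$ and CLI the PI for a new observation $Y^*$, printed for each observation in the data set):

proc reg data=example115;
   model y = x1 x2 / clm cli;
run;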

Page 20: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.6.1-11.6.3 Topics in Regression Modeling

Renyuan Luo

Page 21: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.6.1 Multicollinearity

Def. The predictor variables are linearly dependent.

This can cause serious numerical and statistical difficulties in fitting the regression model unless “extra” predictor variables are deleted.

Page 22: Chapter 11 Multiple Linear Regression Group Project AMS 572

How does the multicollinearity cause difficulties?

$\hat{\boldsymbol{\beta}}$ is the solution to the equation $X'X\hat{\boldsymbol{\beta}} = X'\mathbf{y}$; thus $X'X$ must be invertible in order for $\hat{\boldsymbol{\beta}}$ to be unique and computable.

If approximate multicollinearity happens:

1. $X'X$ is nearly singular, which makes $\hat{\boldsymbol{\beta}}$ numerically unstable. This is reflected in large changes in their magnitudes with small changes in the data.

2. The matrix $V = (X'X)^{-1}$ has very large elements. Therefore the $Var(\hat\beta_j) = \sigma^2 v_{jj}$ are large, which makes the $\hat\beta_j$ statistically nonsignificant.

Page 23: Chapter 11 Multiple Linear Regression Group Project AMS 572

Measures of Multicollinearity

1. The correlation matrix R. Easy to compute, but it can't reflect linear relationships among more than two variables.

2. The determinant of R can be used as a measure of the singularity of $X'X$.

3. Variance Inflation Factors (VIF): the diagonal elements of $R^{-1}$. VIF > 10 is regarded as unacceptable.
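In SAS, the VIFs can be requested directly from PROC REG; a minimal sketch (using the cement data set example115 that appears later in these slides):

proc reg data=example115;
   model y = x1 x2 x3 x4 / vif;   /* adds a Variance Inflation column to the output */
run;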

Page 24: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.6.2 Polynomial Regression

A special case of a linear model:
$$y = \beta_0 + \beta_1 x + \cdots + \beta_k x^k + \epsilon$$

Problems:

1. The powers of x, i.e., $x, x^2, \ldots, x^k$, tend to be highly correlated.

2. If k is large, the magnitudes of these powers tend to vary over a rather wide range.

So let k <= 3 if possible, and never use k > 5.

Page 25: Chapter 11 Multiple Linear Regression Group Project AMS 572

Solutions

1. Centering the x-variable:
$$y = \beta_0^* + \beta_1^*(x - \bar{x}) + \cdots + \beta_k^*(x - \bar{x})^k + \epsilon$$
Effect: removes the non-essential multicollinearity in the data.

2. Furthermore, we can standardize: divide by the standard deviation of x, using $\dfrac{x - \bar{x}}{s_x}$.
Effect: helps to alleviate the second problem. (A short SAS sketch follows.)
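A minimal SAS sketch of both remedies (raw is a hypothetical input data set with variable x; PROC STANDARD recenters x to mean 0 and rescales it to standard deviation 1):

proc standard data=raw mean=0 std=1 out=scaled;
   var x;
run;

data poly;
   set scaled;
   x2 = x**2;   /* powers of the centered and standardized x */
   x3 = x**3;
run;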

Page 26: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.6.3 Dummy Predictor Variables

What to do with the categorical predictor variables?

1. If we have categories of an ordinal variable, such as the prognosis of a patient (poor, average, good), just assign numerical scores to the categories. (poor=1, average=2, good=3)

Page 27: Chapter 11 Multiple Linear Regression Group Project AMS 572

2. If we have a nominal variable with c >= 2 categories, use c - 1 indicator variables, $x_1, \ldots, x_{c-1}$, called dummy variables, to code it:

$x_i = 1$ for the ith category $(1 \le i \le c - 1)$, with all other indicators 0;

$x_1 = \cdots = x_{c-1} = 0$ for the cth category.

Page 28: Chapter 11 Multiple Linear Regression Group Project AMS 572

Why don’t we just use c indicator variables:

?

If we use this, there will be a linear dependency among them:

This will cause multicollinearity.

1 2, ,..., cx x x

1 2 ... 1cx x x

Page 29: Chapter 11 Multiple Linear Regression Group Project AMS 572

Example

Suppose we have four years of quarterly sales data for a certain brand of soda cans. How can we model the time trend by fitting a multiple regression equation?

Solution: We use the quarter index as a predictor variable x1. To model the seasonal trend, we use indicator variables x2, x3, x4 for Winter, Spring and Summer, respectively; for Fall, all three equal zero. That is: Winter = (1,0,0), Spring = (0,1,0), Summer = (0,0,1), Fall = (0,0,0).

Then we have the model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \epsilon_i \quad (i = 1, \ldots, 16)$$
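A minimal SAS data step for this coding (sales_raw is a hypothetical input data set with the quarter index t = 1,...,16 and response y, with the series assumed to start in Winter):

data soda;
   set sales_raw;
   q  = mod(t - 1, 4) + 1;   /* 1 = Winter, 2 = Spring, 3 = Summer, 4 = Fall */
   x1 = t;                   /* time trend */
   x2 = (q = 1);             /* Winter dummy */
   x3 = (q = 2);             /* Spring dummy */
   x4 = (q = 3);             /* Summer dummy; Fall has x2 = x3 = x4 = 0 */
run;

proc reg data=soda;
   model y = x1 x2 x3 x4;
run;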

Page 30: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.6.4-11.6.5 Topics in Regression Modeling

Yongjun Cheng

Page 31: Chapter 11 Multiple Linear Regression Group Project AMS 572

Logistic Regression Model

In 1938, R. A. Fisher and Frank Yates suggested the logistic transform for analyzing binary data:

$$\ln(\text{Odds of } Y = 1 \mid x) = \ln\frac{P(Y=1 \mid x)}{1 - P(Y=1 \mid x)} = \ln\frac{P(Y=1 \mid x)}{P(Y=0 \mid x)} = \beta_0 + \beta_1 x$$

Page 32: Chapter 11 Multiple Linear Regression Group Project AMS 572

Why is it important?

The logistic regression model is the most popular model for binary data. It is generally used for binary response variables:

Y = 1 (true, success, YES, etc.) or

Y = 0 (false, failure, NO, etc.)

Page 33: Chapter 11 Multiple Linear Regression Group Project AMS 572

What is the Logistic Regression Model?

Consider a response variable Y = 0 or 1 and a single predictor variable x. We want to model E(Y|x) = P(Y=1|x) as a function of x. The logistic regression model expresses the logistic transform of P(Y=1|x) as a linear function of x:

$$\ln(\text{Odds of } Y = 1 \mid x) = \ln\frac{P(Y=1 \mid x)}{1 - P(Y=1 \mid x)} = \beta_0 + \beta_1 x$$

This model may be rewritten as:

$$P(Y=1 \mid x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}$$

Page 34: Chapter 11 Multiple Linear Regression Group Project AMS 572

Some properties of the logistic model

• E(Y|x) = P(Y=1|x) · 1 + P(Y=0|x) · 0 = P(Y=1|x) is bounded between 0 and 1 for all values of x. This is not true if we use the linear model P(Y=1|x) = $\beta_0 + \beta_1 x$.

• As in ordinary regression, the regression coefficient has a simple interpretation: $\beta_1$ is the log of the odds ratio of a success event (Y=1) for a unit change in x:
$$\ln\frac{P(Y=1 \mid x+1)}{P(Y=0 \mid x+1)} - \ln\frac{P(Y=1 \mid x)}{P(Y=0 \mid x)} = [\beta_0 + \beta_1(x+1)] - [\beta_0 + \beta_1 x] = \beta_1$$

• For multiple predictor variables, the logistic regression model is
$$\ln\frac{P(Y=1 \mid x_1, x_2, \ldots, x_k)}{P(Y=0 \mid x_1, x_2, \ldots, x_k)} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$
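A minimal SAS sketch for fitting such a model (trials is a hypothetical data set with a 0/1 response y and predictors x1, x2):

proc logistic data=trials;
   model y(event='1') = x1 x2;   /* fits ln[P(Y=1)/P(Y=0)] = b0 + b1*x1 + b2*x2 */
run;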

Page 35: Chapter 11 Multiple Linear Regression Group Project AMS 572

Standardized Regression Coefficients

Why do we need standardized regression coefficients?

The regression equation for the linear regression model is:
$$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \cdots + \hat\beta_k x_k$$

1. The magnitudes of the $\hat\beta_j$ can NOT be directly compared to judge the relative effects of the $x_j$ on y.

2. Standardized regression coefficients may be used to judge the importance of different predictors.

Page 36: Chapter 11 Multiple Linear Regression Group Project AMS 572

How to standardize regression coefficients?

Example: Industrial sales data

Linear model:
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i, \quad i = 1, 2, \ldots, 10$$

The fitted regression equation:
$$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 = 2.61 + 0.192 x_1 + 0.341 x_2$$

with $s_{x_1} = 6.830$, $s_{x_2} = 0.461$, and $s_y = 1.501$. The standardized coefficients are

$$\hat\beta_1^* = \hat\beta_1 \frac{s_{x_1}}{s_y} = 0.192 \times \frac{6.830}{1.501} \approx 0.875, \qquad \hat\beta_2^* = \hat\beta_2 \frac{s_{x_2}}{s_y} = 0.3406 \times \frac{0.461}{1.501} \approx 0.105$$

NOTE: $|\hat\beta_1| < |\hat\beta_2|$, but $|\hat\beta_1^*| > |\hat\beta_2^*|$; thus $x_1$ has a much larger effect than $x_2$ on y.

Page 37: Chapter 11 Multiple Linear Regression Group Project AMS 572

Summary for the general case

Standardizing transform:
$$y_i^* = \frac{y_i - \bar{y}}{s_y}, \quad i = 1, 2, \ldots, n$$
$$x_{ij}^* = \frac{x_{ij} - \bar{x}_j}{s_{x_j}}, \quad i = 1, 2, \ldots, n;\; j = 1, 2, \ldots, k$$

Standardized regression coefficients:
$$\hat\beta_j^* = \hat\beta_j \frac{s_{x_j}}{s_y}, \quad j = 1, 2, \ldots, k$$
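In SAS, PROC REG reports standardized coefficients via the STB model option; a minimal sketch (sales is a hypothetical data set for the industrial sales example):

proc reg data=sales;
   model y = x1 x2 / stb;   /* adds a Standardized Estimate column */
run;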

Page 38: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.7.1 Variable Selection Methods

Stepwise Regression

Lingling Wu

Page 39: Chapter 11 Multiple Linear Regression Group Project AMS 572

Variables selection method

(1) Why do we need variable selection methods?

(2) How do we select variables?

* stepwise regression

* best subsets regression

Page 40: Chapter 11 Multiple Linear Regression Group Project AMS 572

Stepwise Regression

(p-1)-variable model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \epsilon_i$$

p-variable model:
$$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \beta_p x_{ip} + \epsilon_i$$

Page 41: Chapter 11 Multiple Linear Regression Group Project AMS 572

Partial F-test

Hypothesis test: $H_0: \beta_p = 0$ vs. $H_1: \beta_p \ne 0$

$$F_p = \frac{(SSE_{p-1} - SSE_p)/1}{SSE_p/[n - (p+1)]} \sim f_{1,\, n-(p+1)}$$

Equivalently, the t-test statistic is $t_p = \dfrac{\hat\beta_p}{SE(\hat\beta_p)}$; reject $H_0$ if $|t_p| > t_{n-(p+1),\,\alpha/2}$.

Page 42: Chapter 11 Multiple Linear Regression Group Project AMS 572

Partial correlation coefficients

The partial correlation coefficient measures the proportion of the variation in y not accounted for by $x_1, \ldots, x_{p-1}$ that is accounted for by $x_p$:

$$r^2_{yx_p \mid x_1 \cdots x_{p-1}} = \frac{SSE(x_1, \ldots, x_{p-1}) - SSE(x_1, \ldots, x_p)}{SSE(x_1, \ldots, x_{p-1})}$$

It is related to the partial F statistic by

$$F_p = t_p^2 = \frac{r^2_{yx_p \mid x_1 \cdots x_{p-1}}\,[n - (p+1)]}{1 - r^2_{yx_p \mid x_1 \cdots x_{p-1}}}$$

Page 43: Chapter 11 Multiple Linear Regression Group Project AMS 572

Stepwise regression algorithm

Page 44: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.7.1 Variable Selection Methods

SAS Example

Yuanhua Li, Jianping Zhang

Page 45: Chapter 11 Multiple Linear Regression Group Project AMS 572

No. X1 X2 X3 X4 Y

1 7 26 6 60 78.5

2 1 29 15 52 74.3

3 11 56 8 20 104.3

4 11 31 8 47 87.6

5 7 52 6 33 95.9

6 11 55 9 22 109.2

7 3 71 17 6 102.7

8 1 31 22 44 72.5

9 2 54 18 22 93.1

10 21 47 4 26 115.9

11 1 40 23 34 83.8

12 11 66 9 12 113.3

13 10 68 8 12 109.4

Example 11.5 (pg. 416), 11.9 (pg. 431)

The following table gives data on the heat evolved in calories during the hardening of cement on a per-gram basis (y), along with the percentages of four ingredients: tricalcium aluminate (x1), tricalcium silicate (x2), tetracalcium alumino ferrite (x3), and dicalcium silicate (x4).

Page 46: Chapter 11 Multiple Linear Regression Group Project AMS 572

SAS Program (stepwise selection is used)

data example115;
   input x1 x2 x3 x4 y;
   datalines;
 7 26  6 60  78.5
 1 29 15 52  74.3
11 56  8 20 104.3
11 31  8 47  87.6
 7 52  6 33  95.9
11 55  9 22 109.2
 3 71 17  6 102.7
 1 31 22 44  72.5
 2 54 18 22  93.1
21 47  4 26 115.9
 1 40 23 34  83.8
11 66  9 12 113.3
10 68  8 12 109.4
;
run;

proc reg data=example115;
   model y = x1 x2 x3 x4 / selection=stepwise;
run;

Page 47: Chapter 11 Multiple Linear Regression Group Project AMS 572

Selected SAS output

The SAS System 22:10 Monday, November 26, 2006 3

The REG Procedure Model: MODEL1 Dependent Variable: y

Stepwise Selection: Step 4

                  Parameter     Standard
Variable           Estimate        Error    Type II SS   F Value   Pr > F

Intercept          52.57735      2.28617    3062.60416    528.91   <.0001
x1                  1.46831      0.12130     848.43186    146.52   <.0001
x2                  0.66225      0.04585    1207.78227    208.58   <.0001

Bounds on condition number: 1.0551, 4.2205

Page 48: Chapter 11 Multiple Linear Regression Group Project AMS 572

SAS Output (cont)

All variables left in the model are significant at the 0.1500 level.

No other variable met the 0.1500 significance level for entry into the model.

Summary of Stepwise Selection

        Variable   Variable    Number    Partial     Model
Step    Entered    Removed    Vars In   R-Square   R-Square       C(p)   F Value   Pr > F

   1    x4                          1     0.6745     0.6745    138.731     22.80   0.0006
   2    x1                          2     0.2979     0.9725     5.4959    108.22   <.0001
   3    x2                          3     0.0099     0.9823     3.0182      5.03   0.0517
   4               x4               2     0.0037     0.9787     2.6782      1.86   0.2054

Page 49: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.7.2 Variable Selection Methods

Best Subsets Regression

Shuqiang Wang

Page 50: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.7.2 Best Subsets Regression

For the stepwise regression algorithm, the final model is not guaranteed to be optimal in any specified sense.

In best subsets regression, a subset of variables is chosen from the collection of all subsets of the k predictor variables that optimizes a well-defined objective criterion.

Page 51: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.7.2 Best Subsets Regression

In stepwise regression, we get only a single final model.

In best subsets regression, the investigator can specify the size of the subset of predictors for the model.

Page 52: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.7.2 Best Subsets Regression

Optimality Criteria

• $r_p^2$-Criterion:
$$r_p^2 = \frac{SSR_p}{SST} = 1 - \frac{SSE_p}{SST}$$

• Adjusted $r_p^2$-Criterion:
$$r_{adj,p}^2 = 1 - \frac{MSE_p}{MST}$$

• $C_p$-Criterion (recommended for its ease of computation and its ability to judge the predictive power of a model). The standardized mean square error of prediction is
$$\Gamma_p = \frac{1}{\sigma^2} \sum_{i=1}^{n} \left[ E(\hat{Y}_{ip}) - E(Y_i) \right]^2$$
The sample estimator, Mallows' $C_p$-statistic, is given by
$$C_p = \frac{SSE_p}{\hat\sigma^2} + 2(p+1) - n$$

Page 53: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.7.2 Best Subsets Regression

Algorithm: Note that our problem is to find the minimum of a given criterion function.

Use the stepwise regression algorithm, replacing the partial F criterion with another criterion such as $C_p$.

Enumerate all possible cases and find the minimum of the criterion functions.

Other possibility?

Page 54: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.7.2 Best Subsets Regression & SAS

proc reg data=example115;

model y = x1 x2 x3 x4 /selection=stepwise;

run;

For the SELECTION= option, SAS has implemented 9 methods in total. For the best subsets method, we have the following options:

Maximum R2 Improvement (MAXR)
Minimum R2 Improvement (MINR)
R2 Selection (RSQUARE)
Adjusted R2 Selection (ADJRSQ)
Mallows' Cp Selection (CP)
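For example, to rank subsets by Mallows' Cp instead of running stepwise selection, one small change to the program above should suffice:

proc reg data=example115;
   model y = x1 x2 x3 x4 / selection=cp;   /* lists subsets ordered by Cp */
run;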

Page 55: Chapter 11 Multiple Linear Regression Group Project AMS 572

11.5, 11.8 Building a Multiple Regression Model

Steps and Strategy
By Rong Cai

Page 56: Chapter 11 Multiple Linear Regression Group Project AMS 572

Modeling is an iterative process. Several cycles of the steps may be needed before arriving at the final model.

The basic process consists of seven steps

Page 57: Chapter 11 Multiple Linear Regression Group Project AMS 572

Get started and follow the steps:

1. Categorization by Usage
2. Collect the Data
3. Explore the Data
4. Divide the Data
5. Fit Candidate Models
6. Select and Evaluate
7. Select the Final Model

Page 58: Chapter 11 Multiple Linear Regression Group Project AMS 572

Step I

Decide the type of model needed, according to its intended usage.

Main categories include: Predictive, Theoretical, Control, Inferential, Data Summary

• Sometimes, a model serves multiple purposes.

Page 59: Chapter 11 Multiple Linear Regression Group Project AMS 572

Step II

Collect the Data

Predictor (X)

Response (Y)

Data should be relevant and bias-free

Reference: Chapter 3

Page 60: Chapter 11 Multiple Linear Regression Group Project AMS 572

Step III

Explore the Data

The linear regression model is sensitive to noise, so we should treat outliers and influential observations cautiously.

Reference: Chapter 4

Chapter 10

Page 61: Chapter 11 Multiple Linear Regression Group Project AMS 572

Step IV

Divide the Data

Training set: for building the model

Test set: for checking the model

How to divide?
Large sample: half and half
Small sample: keep the size of the training set > 16

Page 62: Chapter 11 Multiple Linear Regression Group Project AMS 572

Step V

Fit several candidate models using the training set.

Page 63: Chapter 11 Multiple Linear Regression Group Project AMS 572

Step VI

Select and evaluate a good model, checking for and correcting violations of the model assumptions.

Page 64: Chapter 11 Multiple Linear Regression Group Project AMS 572

Step VII

Select the Final Model

Use test set to compare competing models by cross-validating them.

Page 65: Chapter 11 Multiple Linear Regression Group Project AMS 572

Regression Diagnostics (Step VI)

Graphical Analysis of Residuals

• Plot the estimated errors (residuals) vs. the $x_i$ values; the residuals are the differences between the actual $y_i$ and the predicted $\hat{y}_i$
• Plot a histogram or stem-and-leaf display of the residuals

Purposes:
• Examine the functional form (linearity)
• Evaluate violations of assumptions

A SAS sketch follows.
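A minimal SAS sketch producing such plots (reusing the cement data set example115; the OUTPUT statement saves fitted values and residuals, and PROC GPLOT, which assumes SAS/GRAPH is licensed, draws them):

proc reg data=example115;
   model y = x1 x2;
   output out=diag p=yhat r=resid;   /* save fitted values and residuals */
run;

proc gplot data=diag;
   plot resid*yhat resid*x1;   /* residuals vs. fitted values and vs. a predictor */
run;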

Page 66: Chapter 11 Multiple Linear Regression Group Project AMS 572

Linear Regression Assumptions

• Mean of the probability distribution of the error is 0
• Probability distribution of the error has constant variance
• Probability distribution of the error is normal
• Errors are independent

Page 67: Chapter 11 Multiple Linear Regression Group Project AMS 572

Residual Plot for Functional Form (Linearity)

[Two plots of residuals e vs. X: a curved pattern indicates an X^2 term should be added; a random scatter indicates correct specification.]

Page 68: Chapter 11 Multiple Linear Regression Group Project AMS 572

Residual Plot for Equal Variance

[Two plots of standardized residuals (SR) vs. X: a fan shape indicates unequal variance; a uniform band indicates correct specification. Standardized residuals (the residual divided by the standard error of prediction) are typically used.]

Page 69: Chapter 11 Multiple Linear Regression Group Project AMS 572

Residual Plot for Independence

[Two plots of standardized residuals (SR) vs. X: a systematic pattern indicates the errors are not independent; a random scatter indicates correct specification.]

Page 70: Chapter 11 Multiple Linear Regression Group Project AMS 572

Summary

Jianping Zhang

Page 71: Chapter 11 Multiple Linear Regression Group Project AMS 572

Chapter Summary

• Multiple linear regression model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon$$
• How to fit the multiple regression model (LS fit)
• How to evaluate the goodness of fit
• How to select predictor variables
• How to use SAS to do multiple regression
• How to build a multiple regression model

Page 72: Chapter 11 Multiple Linear Regression Group Project AMS 572

Thank you!

Happy holidays!