Chapter 11

Multiple Linear Regression

Our Group Members

Content:
► Multiple Regression Model: Yifan Wang
► Statistical Inference: Shaonan Zhang & Yicheng Li
► Variable Selection Methods & SAS: Guangtao Li & Ruixue Wang
► Strategy for Building a Model and Data Transformation: Xiaoyu Zhang & Siyuan Luo
► Topics in Regression Modeling: Yikang Chai & Tao Li
► Summary: Xing Chen

Ch 11.1-11.3 Introduction to Multiple Linear Regression

Yifan Wang

Dec. 6th, 2007

In Chapter 10, we studied how to fit a linear relationship between a response variable y and a single predictor variable x.

Simple linear regression, however, cannot handle problems in which there are two or more predictor variables.

For Example
The salary of a company employee may depend on
► job category
► years of experience
► education
► performance evaluations

Extend the simple linear regression model to the case of two or more predictor variables.

Multiple Linear Regression (or simply Multiple Regression) is the statistical methodology used to fit such models.

Multiple Linear Regression

In multiple regression we fit a model of the form (excluding the error term)

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k , \]

where $x_1, x_2, \ldots, x_k$ are $k \ge 2$ predictor variables and $\beta_0, \beta_1, \ldots, \beta_k$ are $k+1$ unknown parameters. The model is called linear because it is linear in the parameters $\beta_j$.

For example, this model includes the kth degree polynomial model in a single variable x, namely,

\[ y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k , \]

since we can put $x_1 = x,\ x_2 = x^2,\ \ldots,\ x_k = x^k$.

11.1 A Probabilistic Model for Multiple Linear Regression

Regard the response variable as random.

Regard the predictor variables as nonrandom.

The data for multiple regression consist of n vectors of observations $(x_{i1}, x_{i2}, \ldots, x_{ik};\ y_i)$ for i = 1, 2, …, n.

Example 1

The response variable $y_i$: the salary of the i-th person in the sample.

The predictor variables: $x_{i1}$, his/her years of experience, and $x_{i2}$, his/her years of education.

Example 2

$y_i$ is the observed value of the r.v. $Y_i$, which depends on the fixed predictor values $x_{i1}, x_{i2}, \ldots, x_{ik}$ according to the following model:

\[ Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i , \quad i = 1, 2, \ldots, n , \]

where $\epsilon_i$ is a random error with $E(\epsilon_i) = 0$, and $\beta_0, \beta_1, \ldots, \beta_k$ are unknown parameters. Assume the $\epsilon_i$ are independent $N(0, \sigma^2)$ random variables. Then the $Y_i$ are independent $N(\mu_i, \sigma^2)$ random variables with

\[ \mu_i = E(Y_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} . \]

11.2 Fitting the Multiple Regression Model

11.2.1 Least Squares (LS) Fit

The LS estimates of the unknown parameters $\beta_0, \beta_1, \ldots, \beta_k$ minimize

\[ Q = \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}) \right]^2 . \]

The LS estimates can be found by setting the first partial derivatives of Q with respect to $\beta_0, \beta_1, \ldots, \beta_k$ equal to zero.

The result is a set of simultaneous linear equations in (k+1) unknowns. The resulting solutions, $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$, are the least squares (LS) estimates of $\beta_0, \beta_1, \ldots, \beta_k$, respectively.
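As a worked step, and assuming the model notation used in this section, differentiating Q with respect to each parameter and setting the result to zero gives the following k+1 equations (commonly called the normal equations):

```latex
\begin{align*}
\frac{\partial Q}{\partial \beta_0}
  &= -2 \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}) \right] = 0, \\
\frac{\partial Q}{\partial \beta_j}
  &= -2 \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}) \right] x_{ij} = 0
  \qquad (j = 1, 2, \ldots, k).
\end{align*}
```

Each equation is linear in the unknown parameters, which is why the system can be solved directly.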

11.2.2 Goodness of Fit of the Model

To assess the goodness of fit of the LS model, we use the residuals defined by

\[ e_i = y_i - \hat{y}_i \quad (i = 1, 2, \ldots, n) , \]

where the $\hat{y}_i$ are the fitted values:

\[ \hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik} \quad (i = 1, 2, \ldots, n) . \]

An overall measure of the goodness of fit is the error sum of squares (SSE):

\[ Q_{\min} = SSE = \sum_{i=1}^{n} e_i^2 . \]

Compare it to the total sum of squares (SST):

\[ SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 . \]

As in Chapter 10, define the regression sum of squares (SSR) given by

\[ SSR = SST - SSE . \]

The ratio

\[ r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \]

is called the coefficient of multiple determination.

• $0 \le r^2 \le 1$; values closer to 1 represent better fits.
• Adding predictor variables generally increases $r^2$; thus $r^2$ can be made to approach 1 by increasing the number of predictors.

Multiple correlation coefficient (the positive square root of $r^2$):

\[ r = +\sqrt{r^2} \]

• Only the positive square root is used.
• r is a measure of the strength of the association between the predictor variables and the one response variable.

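11.3 Multiple Regression Model in Matrix Notation

The multiple regression model can be presented in a compact form by using matrix notation. Let

\[
\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad
\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
\boldsymbol{\epsilon} = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}
\]

be the n x 1 vectors of the r.v.'s $Y_i$, their observed values $y_i$, and random errors $\epsilon_i$, respectively. Next let

\[
\mathbf{X} = \begin{pmatrix}
1 & x_{11} & \cdots & x_{1k} \\
1 & x_{21} & \cdots & x_{2k} \\
\vdots & \vdots & & \vdots \\
1 & x_{n1} & \cdots & x_{nk}
\end{pmatrix}
\]

be the n x (k+1) matrix of the values of predictor variables.

Finally, let

\[
\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}
\quad \text{and} \quad
\hat{\boldsymbol{\beta}} = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{pmatrix}
\]

be the (k + 1) x 1 vectors of unknown parameters and their LS estimates, respectively.

The model can be rewritten as:

\[ \mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon} \]

The simultaneous linear equations whose solution yields the LS estimates can be written in matrix notation as

\[ \mathbf{X}'\mathbf{X} \hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y} \]

If the inverse of the matrix $\mathbf{X}'\mathbf{X}$ exists, then the solution is given by

\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}'\mathbf{y} \]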
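The matrix form of the LS solution can be illustrated with a minimal Python sketch. It assumes NumPy is available; the helper name `ls_fit` and the small noise-free data set are hypothetical and serve only to show that solving X'X b = X'y recovers the coefficients.

```python
import numpy as np

def ls_fit(x_raw, y):
    """Return the LS estimates by solving the normal equations X'X b = X'y."""
    x_raw = np.asarray(x_raw, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend the column of ones that carries the intercept beta_0.
    X = np.column_stack([np.ones(len(y)), x_raw])
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2.
x_demo = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]])
y_demo = 1 + 2 * x_demo[:, 0] + 3 * x_demo[:, 1]

beta_hat = ls_fit(x_demo, y_demo)
print(np.round(beta_hat, 6))  # estimates of (beta_0, beta_1, beta_2)
```

Solving the system with `np.linalg.solve` is a design choice made here for numerical stability; it gives the same estimates as the inverse formula whenever X'X is invertible.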

11.4 Statistical Inference

Shaonan Zhang & Yicheng Li

Statistical Inference on β's: General Hypothesis Test

Determining the statistical significance of predictor variables

We test the hypotheses:

\[ H_{0j}: \beta_j = 0 \quad \text{vs.} \quad H_{1j}: \beta_j \ne 0 \]

If we can't reject $H_{0j}: \beta_j = 0$, then $x_j$ can be dropped from the model.

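Pivotal Quantity

Recall:

\[ \hat\beta_j \sim N(\beta_j, \sigma^2 v_{jj}) , \]

where $v_{jj}$ is the j-th diagonal entry of $V = (\mathbf{X}'\mathbf{X})^{-1}$, so that

\[ Z = \frac{\hat\beta_j - \beta_j}{\sigma \sqrt{v_{jj}}} \sim N(0, 1) . \]

Unbiased estimate of $\sigma^2$:

\[ S^2 = \frac{SSE}{n - (k+1)} = \frac{\sum e_i^2}{n - (k+1)} = MSE , \]

where n - (k+1) is the error degrees of freedom, and

\[ W = \frac{(n - (k+1)) S^2}{\sigma^2} = \frac{SSE}{\sigma^2} \sim \chi^2_{n-(k+1)} . \]

Therefore

\[ T = \frac{Z}{\sqrt{W / (n - (k+1))}} = \frac{\hat\beta_j - \beta_j}{S \sqrt{v_{jj}}} \sim T_{n-(k+1)} . \]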

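Confidence Interval for $\beta_j$

Note that, since

\[ \frac{\hat\beta_j - \beta_j}{S \sqrt{v_{jj}}} \sim T_{n-(k+1)} , \]

we have

\[ P\left( -t_{n-(k+1),\alpha/2} \le \frac{\hat\beta_j - \beta_j}{S \sqrt{v_{jj}}} \le t_{n-(k+1),\alpha/2} \right) = 1 - \alpha , \]

or equivalently

\[ P\left( \hat\beta_j - t_{n-(k+1),\alpha/2}\, s \sqrt{v_{jj}} \le \beta_j \le \hat\beta_j + t_{n-(k+1),\alpha/2}\, s \sqrt{v_{jj}} \right) = 1 - \alpha . \]

So, the $100(1-\alpha)\%$ CI for $\beta_j$ is:

\[ \hat\beta_j \pm t_{n-(k+1),\alpha/2}\, SE(\hat\beta_j) , \quad \text{where } SE(\hat\beta_j) = s \sqrt{v_{jj}} . \]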

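Hypothesis Test

\[ H_{0j}: \beta_j = \beta_j^0 \quad \text{vs.} \quad H_{1j}: \beta_j \ne \beta_j^0 \]

We reject $H_{0j}$ when $|\hat\beta_j - \beta_j^0| > C$, with C chosen so that

\[ P(\text{Reject } H_0 \mid H_0 \text{ is true}) = P\left( |\hat\beta_j - \beta_j^0| > C \right) = \alpha . \]

By symmetry,

\[ P(\hat\beta_j - \beta_j^0 > C) = P(\hat\beta_j - \beta_j^0 < -C) = \frac{\alpha}{2} , \]

that is,

\[ P\left( \frac{\hat\beta_j - \beta_j^0}{S \sqrt{v_{jj}}} > \frac{C}{s \sqrt{v_{jj}}} \right) = \frac{\alpha}{2}
\quad \Rightarrow \quad
\frac{C}{s \sqrt{v_{jj}}} = t_{n-(k+1),\alpha/2} , \quad C = s \sqrt{v_{jj}}\, t_{n-(k+1),\alpha/2} . \]

So we reject $H_{0j}$ if

\[ |\hat\beta_j - \beta_j^0| > s \sqrt{v_{jj}}\, t_{n-(k+1),\alpha/2} . \]

Specially, when $\beta_j^0 = 0$, we reject $H_{0j}$ if

\[ |t_j| = \frac{|\hat\beta_j|}{s \sqrt{v_{jj}}} > t_{n-(k+1),\alpha/2} . \]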

Statistical Inference on β's: Another Hypothesis Test

Hypothesis:

\[ H_0: \beta_1 = \cdots = \beta_k = 0 \quad \text{vs.} \quad H_1: \text{At least one } \beta_j \ne 0 \]

Pivotal Quantity

Under $H_0$,

\[ F = \frac{MSR}{MSE} \sim f_{k,\, n-(k+1)} ; \]

also,

\[ F = \frac{MSR}{MSE} = \frac{SSR\,(n - (k+1))}{SSE \cdot k} = \frac{r^2 (n - (k+1))}{k (1 - r^2)} . \]

P-value

If the P-value is less than α, we reject $H_0$, and in that case we use the previous test on the individual $\beta_j$'s.
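The two forms of the F statistic can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the sums of squares are the rounded values printed in step 2 of the SAS output that appears later in this chapter (SSR = 27663, SSE = 28802, n = 26, k = 2).

```python
def overall_f(ssr, sse, n, k):
    """F = MSR / MSE for H0: beta_1 = ... = beta_k = 0."""
    msr = ssr / k                    # regression mean square, k d.f.
    mse = sse / (n - (k + 1))        # error mean square, n-(k+1) d.f.
    return msr / mse

def overall_f_from_r2(r2, n, k):
    """Equivalent form F = r^2 (n-(k+1)) / (k (1-r^2))."""
    return r2 * (n - (k + 1)) / (k * (1.0 - r2))

# Rounded sums of squares from the two-predictor (x1, x2) fit, 26 weeks.
ssr, sse, n, k = 27663.0, 28802.0, 26, 2
r2 = ssr / (ssr + sse)               # SST = SSR + SSE

f_stat = overall_f(ssr, sse, n, k)
f_from_r2 = overall_f_from_r2(r2, n, k)
print(round(f_stat, 2), round(f_from_r2, 2))
```

Both forms agree, and the value is close to the F Value that SAS reports for that model; turning it into a P-value would need an F-distribution routine, which is left out of this sketch.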

ANOVA Table for Multiple Regression

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square                 F
(Source)              (SS)             (d.f.)               (MS)
Regression            SSR              k                    MSR = SSR / k               F = MSR / MSE
Error                 SSE              n - (k+1)            MSE = SSE / [n - (k+1)]
Total                 SST              n - 1

Statistical Inference on β's: Test Subsets of Parameters

Full Model

\[ Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i \quad (i = 1, 2, \ldots, n) \]

Partial Model

\[ Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{k-m} x_{i,k-m} + \epsilon_i \quad (i = 1, 2, \ldots, n) \]

Hypothesis

\[ H_0: \beta_{k-m+1} = \cdots = \beta_k = 0 \quad \text{vs.} \quad H_1: \text{At least one of } \beta_{k-m+1}, \ldots, \beta_k \ne 0 \]

Since

\[ SST = SSR_{k-m} + SSE_{k-m} = SSR_k + SSE_k , \]

the test statistic is

\[ F = \frac{(SSE_{k-m} - SSE_k) / m}{SSE_k / [n - (k+1)]} \sim f_{m,\, n-(k+1)} . \]

Reject $H_0$ when

\[ F > f_{m,\, n-(k+1),\, \alpha} . \]

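Prediction of Future Observations

Let $x^* = (x_0^*, x_1^*, \ldots, x_k^*)'$ with $x_0^* = 1$, and

\[ \mu^* = E(Y^*) = \beta_0 + \beta_1 x_1^* + \cdots + \beta_k x_k^* = x^{*\prime} \boldsymbol{\beta} . \]

Whether we want a CI (Confidence Interval) or a PI (Prediction Interval), we have

\[ \hat{Y}^* = \hat\mu^* = \hat\beta_0 + \hat\beta_1 x_1^* + \cdots + \hat\beta_k x_k^* = x^{*\prime} \hat{\boldsymbol{\beta}} \]

and

\[ Var(x^{*\prime} \hat{\boldsymbol{\beta}}) = \sigma^2 x^{*\prime} V x^* , \quad \text{estimated by } s^2 x^{*\prime} V x^* , \]

where $V = (\mathbf{X}'\mathbf{X})^{-1}$.

Pivotal Quantity

\[ T = \frac{\hat\mu^* - \mu^*}{s \sqrt{x^{*\prime} V x^*}} \sim T_{n-(k+1)} \]

A (1-α)-level CI to estimate $\mu^*$:

\[ \hat\mu^* \pm t_{n-(k+1),\alpha/2}\, s \sqrt{x^{*\prime} V x^*} \]

A (1-α)-level PI to predict $Y^*$:

\[ \hat{Y}^* \pm t_{n-(k+1),\alpha/2}\, s \sqrt{1 + x^{*\prime} V x^*} \]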

11.7 Variable Selection Methods

Guangtao Li, RuiXue Wang

1. Why do we need variable selection methods?

2. Two methods are introduced:
► Stepwise Regression
► Best Subsets Regression

11.7.1 STEPWISE REGRESSION

Guangtao Li

Recall: Test for Subsets of Parameters in 11.4

• Full model:

\[ Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i \quad (i = 1, 2, \ldots, n) \]

• Partial model:

\[ Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{k-m} x_{i,k-m} + \epsilon_i \quad (i = 1, 2, \ldots, n) \]

• We test:

\[ H_0: \beta_{k-m+1} = \cdots = \beta_k = 0 \quad \text{vs.} \quad H_a: \beta_j \ne 0 \text{ for at least one } k-m+1 \le j \le k \]

• Reject $H_0$ when

\[ F = \frac{(SSE_{k-m} - SSE_k) / m}{SSE_k / [n - (k+1)]} > f_{m,\, n-(k+1),\, \alpha} \]

Partial F-test

► (p-1)-variable model:

\[ Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \epsilon_i \]

► p-variable model:

\[ Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \beta_p x_{ip} + \epsilon_i \]

• Hypotheses:

\[ H_{0p}: \beta_p = 0 \quad \text{vs.} \quad H_{1p}: \beta_p \ne 0 \]

► Reject $H_{0p}$ if

\[ F_p = \frac{(SSE_{p-1} - SSE_p) / 1}{SSE_p / [n - (p+1)]} > f_{1,\, n-(p+1),\, \alpha} \]
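The partial F statistic is simple arithmetic once the two error sums of squares are known. The sketch below is illustrative only: `partial_f` is a hypothetical helper, and the inputs are the rounded SSE values from steps 1 and 2 of the SAS stepwise output shown later in this chapter (SSE with x1 only = 35797, SSE with x1 and x2 = 28802, n = 26).

```python
def partial_f(sse_reduced, sse_full, n, p):
    """Partial F for adding the p-th predictor to a (p-1)-variable model.

    sse_reduced: SSE of the (p-1)-variable model
    sse_full:    SSE of the p-variable model
    """
    mse_full = sse_full / (n - (p + 1))       # error mean square of the larger model
    return (sse_reduced - sse_full) / mse_full  # numerator has 1 d.f.

# Does x2 add significantly to a model that already contains x1?
f_p = partial_f(sse_reduced=35797.0, sse_full=28802.0, n=26, p=2)
print(round(f_p, 2))
```

The result is close to the F Value that SAS prints for x2 in step 2; the small difference comes from using rounded sums of squares.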

Partial Correlation Coefficients

\[
r^2_{y x_p | x_1 \ldots x_{p-1}}
= \frac{SSE_{p-1} - SSE_p}{SSE_{p-1}}
= \frac{SSE(x_1, \ldots, x_{p-1}) - SSE(x_1, \ldots, x_p)}{SSE(x_1, \ldots, x_{p-1})}
\]

\[
F_p = \frac{r^2_{y x_p | x_1 \ldots x_{p-1}} \, [n - (p+1)]}{1 - r^2_{y x_p | x_1 \ldots x_{p-1}}}
\]

We should add $x_p$ to the regression equation only if $F_p$ is large enough, i.e., only if $r^2_{y x_p | x_1 \ldots x_{p-1}}$ is statistically significant.

Stepwise Regression Algorithm

► Example: The Director of Broadcasting Operations for a television station wants to study the issue of “standby hours,” which are hours where unionized graphic artists at the station are paid but are not actually involved in any activity. We are trying to predict the total number of Standby Hours per Week (Y). Possible explanatory variables are: Total Staff Present (X1), Remote Hours (X2), Dubner Hours (X3) and Total Labor Hours (X4). The results for 26 weeks are given below.

SAS Program for the Algorithm

data test;
input y x1 x2 x3 x4;
datalines;
245 338 414 323 2001
177 333 598 340 2030
271 358 656 340 2226
211 372 631 352 2154
196 339 528 380 2078
135 289 409 339 2080
195 334 382 331 2073
118 293 399 311 1758
116 325 343 328 1624
147 311 338 353 1889
154 304 353 518 1988
146 312 289 440 2049
115 283 388 276 1796
161 307 402 207 1720
274 322 151 287 2056
245 335 228 290 1890
201 350 271 355 2187
183 339 440 300 2032
237 327 475 284 1856
175 328 347 337 2068
152 319 449 279 1813
188 325 336 244 1808
188 322 267 253 1834
197 317 235 272 1973
261 315 164 223 1839
232 331 270 272 1935
;
run;
proc reg data=test;
model y = x1 x2 x3 x4 / SELECTION=stepwise;
run;

Selected SAS Output

Stepwise Selection: Step 1

Variable x1 Entered: R-Square = 0.3660 and C(p) = 13.3215

Analysis of Variance

                          Sum of         Mean
Source             DF    Squares       Square    F Value    Pr > F
Model               1      20667        20667      13.86    0.0011
Error              24      35797   1491.55073
Corrected Total    25      56465

              Parameter     Standard
Variable       Estimate        Error    Type II SS    F Value    Pr > F
Intercept    -272.38165    124.24020    7169.17926       4.81    0.0383
x1              1.42405      0.38256         20667      13.86    0.0011

Bounds on condition number: 1, 1
----------------------------------------------------------------------

Stepwise Selection: Step 2

Variable x2 Entered: R-Square = 0.4899 and C(p) = 8.4193

Analysis of Variance

                          Sum of         Mean
Source             DF    Squares       Square    F Value    Pr > F
Model               2      27663        13831      11.05    0.0004
Error              23      28802   1252.26402
Corrected Total    25      56465

              Parameter     Standard
Variable       Estimate        Error    Type II SS    F Value    Pr > F
Intercept    -330.67483    116.48022         10092       8.06    0.0093
x1              1.76486      0.37904         27149      21.68    0.0001
x2             -0.13897      0.05880    6995.14489       5.59    0.0269

SAS Output (cont.)

► All variables left in the model are significant at the 0.1500 level.

No other variable met the 0.1500 significance level for entry into the model.

► Summary of Stepwise Selection

        Variable    Variable    Number     Partial      Model
Step    Entered     Removed     Vars In    R-Square     R-Square    C(p)       F Value    Pr > F
1       x1                      1          0.3660       0.3660      13.3215    13.86      0.0011
2       x2                      2          0.1239       0.4899       8.4193     5.59      0.0269

11.7.2 Best Subsets Regression

► In practice there are often several almost equally good models, and the choice of the final model may depend on side considerations such as the number of variables, the ease of observing and/or controlling variables, etc. The best subsets regression algorithm permits determination of a specified number of best subsets of size p = 1, 2, …, k from which the choice of the final model can be made by the investigator.

Optimality Criteria

$r_p^2$-Criterion:

\[ r_p^2 = \frac{SSR_p}{SST} = 1 - \frac{SSE_p}{SST} \]

Adjusted $r_p^2$-Criterion:

\[ r_{adj,p}^2 = 1 - \frac{MSE_p}{MST} \]

$MSE_p$ Criterion: Some programs use minimization of $MSE_p$ as an optimization criterion. However, $MSE_p$ is not an independent criterion, for minimizing it is equivalent to maximizing adjusted $r_p^2$.
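The adjusted criterion can be reproduced from quantities that SAS prints. The sketch below is illustrative only: `adjusted_r2` is a hypothetical helper, SST = 56465 is the rounded "Corrected Total" from the stepwise output earlier in this chapter, and the MSE values are those listed for three of the models in the best subsets results (n = 26 weeks).

```python
def adjusted_r2(mse_p, sst, n):
    """Adjusted r^2 = 1 - MSE_p / MST, with MST = SST / (n - 1)."""
    mst = sst / (n - 1)
    return 1.0 - mse_p / mst

SST, N = 56465.0, 26

# MSE_p values for three candidate models from the SAS best subsets listing.
mse_by_model = {
    "x1": 1491.55073,
    "x1 x2": 1252.26402,
    "x1 x2 x3 x4": 1013.46770,
}

adj = {name: adjusted_r2(mse, SST, N) for name, mse in mse_by_model.items()}
for name, value in adj.items():
    print(f"{name:12s} adjusted r^2 = {value:.4f}")
```

Because MST does not depend on the model, the model with the smallest MSE_p always has the largest adjusted r², which is the equivalence stated above.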

$C_p$-Criterion (recommended for its ease of computation and its ability to judge the predictive power of a model)

► The sample estimator, Mallows’ $C_p$-statistic, is given by

\[ C_p = \frac{SSE_p}{\hat\sigma^2} + 2(p+1) - n \]

► $C_p$ is an almost unbiased estimator of

\[ \Gamma_p = \frac{1}{\sigma^2} \sum_{i=1}^{n} E\left[ \left( \hat{Y}_{ip} - E[Y_i] \right)^2 \right] \]
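Mallows' C_p can also be recomputed from the SAS listing. The sketch below is illustrative only and rests on two assumptions: that the estimate of σ² is the MSE of the full four-predictor model (1013.46770), and that SSE_p can be recovered as MSE_p times its error degrees of freedom. The helper name `mallows_cp` is hypothetical.

```python
def mallows_cp(mse_p, p, n, sigma2_hat):
    """C_p = SSE_p / sigma2_hat + 2(p+1) - n for a model with p predictors."""
    sse_p = mse_p * (n - (p + 1))    # recover SSE_p from the printed MSE_p
    return sse_p / sigma2_hat + 2 * (p + 1) - n

N = 26
SIGMA2_HAT = 1013.46770              # assumption: MSE of the full model (x1 x2 x3 x4)

cp_x1 = mallows_cp(1491.55073, p=1, n=N, sigma2_hat=SIGMA2_HAT)
cp_x1_x2 = mallows_cp(1252.26402, p=2, n=N, sigma2_hat=SIGMA2_HAT)
cp_full = mallows_cp(1013.46770, p=4, n=N, sigma2_hat=SIGMA2_HAT)
print(round(cp_x1, 4), round(cp_x1_x2, 4), round(cp_full, 4))
```

For the full model the statistic always equals p + 1 under this choice of σ² estimate, so it is only informative for comparing the smaller subsets.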

$PRESS_p$ Criterion: The total prediction error sum of squares (PRESS) is:

\[ PRESS_p = \sum_{i=1}^{n} \left( Y_i - \hat{Y}_{ip} \right)^2 , \]

where $\hat{Y}_{ip}$ here denotes the predicted value for observation i from the p-predictor model fitted with observation i omitted.

This criterion evaluates the predictive ability of a postulated model by omitting one observation at a time, fitting the model based on the remaining observations, and computing the predicted value for the omitted observation.

The $PRESS_p$ criterion is intuitively easier to grasp than the $C_p$-criterion, but it is computationally much more intensive and is not available in many packages.
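The leave-one-out procedure behind PRESS can be sketched directly. The code below is a minimal illustration, assuming NumPy is available; the function names and the small one-predictor data set are hypothetical and are not taken from the chapter's example.

```python
import numpy as np

def _design(x_raw):
    """Add the intercept column of ones to the predictor values."""
    x_raw = np.asarray(x_raw, dtype=float)
    return np.column_stack([np.ones(x_raw.shape[0]), x_raw])

def sse(x_raw, y):
    """Ordinary error sum of squares from a fit on all observations."""
    X, y = _design(x_raw), np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

def press(x_raw, y):
    """Sum of squared errors when each y_i is predicted without observation i."""
    X, y = _design(x_raw), np.asarray(y, dtype=float)
    total = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i                  # omit observation i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += float((y[i] - X[i] @ beta) ** 2)      # error on the omitted point
    return total

# Hypothetical data: roughly linear with a little scatter.
x_demo = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y_demo = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 6.3])

press_value = press(x_demo, y_demo)
sse_value = sse(x_demo, y_demo)
print(press_value >= sse_value)
```

The loop refits the model n times, which is the computational cost the text refers to; PRESS is never smaller than SSE because each omitted point is predicted by a model that did not see it.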

SAS PROGRAM

data test;
input y x1 x2 x3 x4;
datalines;
245 338 414 323 2001
177 333 598 340 2030
271 358 656 340 2226
211 372 631 352 2154
196 339 528 380 2078
135 289 409 339 2080
195 334 382 331 2073
118 293 399 311 1758
116 325 343 328 1624
147 311 338 353 1889
154 304 353 518 1988
146 312 289 440 2049
115 283 388 276 1796
161 307 402 207 1720
274 322 151 287 2056
245 335 228 290 1890
201 350 271 355 2187
183 339 440 300 2032
237 327 475 284 1856
175 328 347 337 2068
152 319 449 279 1813
188 325 336 244 1808
188 322 267 253 1834
197 317 235 272 1973
261 315 164 223 1839
232 331 270 272 1935
;
run;
proc reg data=test;
model y = x1 x2 x3 x4 / SELECTION=RSQUARE adjrsq CP mse;
run;

Results

Number in               Adjusted
Model       R-Square    R-Square    C(p)       MSE           Variables in Model
1           0.3660      0.3396      13.3215    1491.55073    x1
1           0.1710      0.1365      24.1846    1950.27491    x4
1           0.0597      0.0205      30.3884    2212.24598    x3
1           0.0091      -.0322      33.2078    2331.30545    x2
--------------------------------------------------------------------------------
2           0.4899      0.4456       8.4193    1252.26402    x1 x2
2           0.4499      0.4021      10.6486    1350.49234    x1 x3
2           0.4288      0.3791      11.8231    1402.24672    x3 x4
2           0.3754      0.3211      14.7982    1533.34044    x1 x4
2           0.2238      0.1563      23.2481    1905.67595    x2 x4
2           0.0612      -.0205      32.3067    2304.83375    x2 x3
--------------------------------------------------------------------------------
3           0.5378      0.4748       7.7517    1186.29444    x1 x3 x4
3           0.5362      0.4729       7.8418    1190.44739    x1 x2 x3
3           0.5092      0.4423       9.3449    1259.69053    x1 x2 x4
3           0.4591      0.3853      12.1381    1388.36444    x2 x3 x4
--------------------------------------------------------------------------------
4           0.6231      0.5513       5.0000    1013.46770    x1 x2 x3 x4

11.7.2 Best Subsets Regression & SAS

The source of the example is http://www.math.udel.edu/teaching/course_materials/m202_climent/Multiple%20Regression%20-%20Model%20Building.pdf

11.5, 11.8 Building a Multiple Regression Model

by SiYuan Luo & Xiaoyu Zhang

Introduction

• Building a multiple regression model consists of 7 steps.

• Though it is not necessary to follow each and every step in the exact sequence shown on the next slide, the general approach and major steps should be followed.

• Model building is an iterative process; it may take several cycles of the steps before arriving at the final model.

The “7 steps”

1. Decide the type
2. Collect the data
3. Explore the data
4. Divide the data
5. Fit candidate models
6. Select and evaluate
7. Select the final model

Step 1 Decide the type

► Decide the type of model needed. Different types of models include:
   Predictive: a model used to predict the response variable from a chosen set of predictor variables.
   Theoretical: a model based on a theoretical relationship between a response variable and predictor variables.
   Control: a model used to control a response variable by manipulating predictor variables.
   Inferential: a model used to explore the strength of relationships between a response variable and individual predictor variables.
   Data summary: a model used primarily as a device to summarize a large set of data by a single equation.

► Often a model can be used for multiple purposes.
► The type of model dictates the type of data needed.

Step 2 Collect the data

► Decide the variables (predictor and response) on which to collect data. The variables should be measured in a way appropriate to the type of subject.

► See chapter 3 for precautions necessary to obtain relevant, bias-free data.

Step 3 Explore the data

► The data should be examined for outliers, gross errors, missing values, etc. on a univariate basis using the techniques discussed in chapter 4. Outliers cannot just be omitted because much useful information can be lost. See chapter 10 for how to deal with outliers.

► Scatter plots should be made to study bivariate relationships between the response variable and each of the predictors. They are useful in suggesting possible transformations to linearize the relationships.

Step 4 Divide the data

► Divide the data into training and test sets: only a subset of the data, the training set, should be used to fit the model (steps 5 and 6); the remainder, called the test set, should be used for cross-validation of the fitted model (step 7).

► The reason for using an independent data set to test the model is that if the same data are used for both fitting and testing, then an overoptimistic estimate of the predictive ability of the fitted model is obtained.

► The split for the two sets should be done randomly.
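A random split of this kind can be sketched in a few lines of Python. This is an illustration only: the function name `train_test_split`, the 70% training fraction, and the fixed seed are assumptions chosen here, not values prescribed by the text.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Randomly partition rows into a training set and a test set."""
    rng = random.Random(seed)            # fixed seed so the split is reproducible
    indices = list(range(len(rows)))
    rng.shuffle(indices)                 # random order of observation indices
    n_train = int(round(train_fraction * len(rows)))
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:]]
    return train, test

# Hypothetical use: week numbers 1..26 stand in for the 26 weekly observations.
weeks = list(range(1, 27))
train_weeks, test_weeks = train_test_split(weeks)
print(len(train_weeks), len(test_weeks))
```

Every observation lands in exactly one of the two sets, so the test set stays independent of the data used for fitting.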

Step 5 Fit candidate models

► Generally several equally good models can be identified using the training data set.

► By conducting several runs with varying FIN and FOUT values, we can identify several models that fit the training set.

Step 6 Select and evaluate

► From the list of candidate models we are now ready to select two or three good models based on criteria such as the Cp-statistic, the number of predictors (p), and the nature of predictors.

► These selected models should be checked for violation of model assumptions using standard diagnostic techniques, in particular, residual plots. Transformations in the response variable or some of the predictor variables may be necessary to improve model fits.

Step 7 Select the Final Model

► This is the step where we compare competing models by cross-validating them against the test data.

► The model with the smaller cross-validation SSE is the better predictive model.

► The final selection of the model is based on a number of considerations, both statistical and nonstatistical. These include residual plots, outliers, parsimony, relevance, and ease of measurement of predictors. A final test of any model is that it makes practical sense and the client is willing to buy it.
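The cross-validation comparison in Step 7 could be sketched as below. The data are synthetic and the candidate subsets are hypothetical; only the idea (fit on the training set, compute SSE on the test set, prefer the smaller value) comes from the slides.

```python
import numpy as np

def fit_ols(X, y):
    """Least squares fit with an intercept; returns the coefficient vector."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def sse(beta, X, y):
    """Sum of squared prediction errors of a fitted model on data (X, y)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    resid = y - X1 @ beta
    return float(resid @ resid)

# Synthetic data (assumption): y depends on x1 and x2 but not on x3
rng = np.random.default_rng(1)
n = 60
X = rng.normal(size=(n, 3))
y = 2.0 + 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

perm = rng.permutation(n)
train, test = perm[:40], perm[40:]

# Candidate models, identified by the predictor columns they use
candidates = {"x1": [0], "x1,x2": [0, 1], "x1,x2,x3": [0, 1, 2]}
cv_sse = {}
for name, cols in candidates.items():
    beta = fit_ols(X[train][:, cols], y[train])           # fit on training data only
    cv_sse[name] = sse(beta, X[test][:, cols], y[test])   # evaluate on test data

best = min(cv_sse, key=cv_sse.get)
```

As the slide notes, the smallest cross-validation SSE is only one input to the final choice; parsimony and practical sense still matter.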

Regression Diagnostics (Step VI)

► Graphical Analysis of Residuals
   Plot Estimated Errors vs. $X_i$ Values
      ► Difference Between Actual $Y_i$ & Predicted $Y_i$
      ► Estimated Errors Are Called Residuals
   Plot Histogram or Stem-&-Leaf of Residuals

► Purposes
   Examine Functional Form (Linearity)
   Evaluate Violations of Assumptions

Linear Regression Assumptions

► Mean of Probability Distribution of Error Is 0

► Probability Distribution of Error Has Constant Variance

► Probability Distribution of Error Is Normal

► Errors Are Independent

Residual Plot for Functional Form (Linearity)

[Figure: two plots of residuals e versus X. Left panel: "Add X^2 Term". Right panel: "Correct Specification".]

Residual Plot for Equal Variance

[Figure: two plots of standardized residuals (SR) versus X. Left panel: "Unequal Variance" (fan-shaped). Right panel: "Correct Specification".]

Standardized residuals are typically used (residual divided by standard error of prediction).
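As a rough sketch, standardized residuals could be computed as below. The definition used here, $e_i / (s\sqrt{1 - h_{ii}})$ with $h_{ii}$ the diagonal of the hat matrix, is one common convention and is an assumption; the slides do not spell out the exact formula. The data are synthetic, with noise that grows with x so that a plot of `sr` against `x` would show the fan shape.

```python
import numpy as np

def standardized_residuals(x, y):
    """Residuals divided by their estimated standard errors (assumed definition)."""
    X1 = np.column_stack([np.ones(len(y)), x])
    H = X1 @ np.linalg.pinv(X1)        # hat matrix: fitted values = H @ y
    e = y - H @ y                      # ordinary residuals
    n, p = X1.shape
    s2 = e @ e / (n - p)               # estimate of the error variance
    return e / np.sqrt(s2 * (1.0 - np.diag(H)))

# Synthetic example (assumption): error standard deviation proportional to x
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 30)
y = 3 + 2 * x + rng.normal(scale=x / 5)
sr = standardized_residuals(x, y)
```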

Residual Plot for Independence

[Figure: two plots of standardized residuals (SR) versus X. Left panel: "Not Independent". Right panel: "Correct Specification".]

Data Transformations

► Why do we need data transformations?
   Make seemingly nonlinear models linear. Example:

   $$ y = \beta_0 x_1^{\beta_1} x_2^{\beta_2} $$

   becomes, after taking logarithms,

   $$ \log y = \log \beta_0 + \beta_1 \log x_1 + \beta_2 \log x_2 $$

   Sometimes a transformation gives a better explanation of the variation in the data.
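The log transformation of the multiplicative model above can be illustrated with a small simulation. Everything here is assumed for illustration: the "true" parameter values, the sample size, and the multiplicative error term.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)

# Hypothetical true parameters of y = b0 * x1^b1 * x2^b2 (times multiplicative noise)
b0, b1, b2 = 2.0, 0.7, -0.4
y = b0 * x1**b1 * x2**b2 * np.exp(rng.normal(scale=0.05, size=n))

# After taking logs the model is linear in log(b0), b1, b2
A = np.column_stack([np.ones(n), np.log(x1), np.log(x2)])
coef, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)

b0_hat = np.exp(coef[0])   # back-transform the intercept
b1_hat, b2_hat = coef[1], coef[2]
```

The least squares fit on the log scale recovers the exponents directly, which is the sense in which the model is only "seemingly" nonlinear.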

► How do we do the data transformations?
   Power family of transformations on the response: the Box-Cox method

► Requirements:
   All the data are positive.
   The ratio of the largest observed Y to the smallest is at least 10.

Transformation form

$$ V = \begin{cases} \dfrac{Y^{\lambda} - 1}{\lambda \dot{Y}^{\lambda - 1}}, & \text{for } \lambda \neq 0 \\[2ex] \dot{Y} \ln Y, & \text{for } \lambda = 0 \end{cases} $$

where $\dot{Y} = (Y_1 Y_2 \cdots Y_n)^{1/n}$ is the geometric mean of $Y_1, Y_2, \ldots, Y_n$.

How to estimate $\lambda$

► 1. Choose a value of $\lambda$ from a selected range. Usually we look for it in the range (-1, 1), and we would usually cover the selected range with about 11-21 values of $\lambda$.

► 2. For each $\lambda$ value, evaluate V by applying the formula above to each Y. This creates a vector $V = (V_1, \ldots, V_n)$, which is then used to fit the linear model $V = X\beta + \epsilon$ by the least squares method. Record the residual sum of squares $S(\lambda, V)$ for the regression.

► 3. Plot $S(\lambda, V)$ versus $\lambda$. Draw a smooth curve through the plotted points and find the value of $\lambda$ at which the lowest point of the curve lies. That value, $\hat{\lambda}$, is the maximum likelihood estimate of $\lambda$.

► Example:
   The data in the table are part of a more extensive set given by Derringer (1974). This paper has been adapted with permission of John Wiley & Sons, Inc. We wish to find a transformation of the form $V = (Y^{\lambda} - 1) / (\lambda \dot{Y}^{\lambda - 1})$ for $\lambda \neq 0$, or $V = \dot{Y} \ln Y$ for $\lambda = 0$, which will provide a good first-order fit to the data. Our model form is $V = \beta_0 + \beta_1 f + \beta_2 p + \epsilon$, where f is the filler level and p is the plasticizer level.

                              Filler, phr, f
Naphthenic oil, phr, p      0     12     24     36     48     60
          0                26     38     50     76    108    157
         10                17     26     37     53     83    124
         20                13     20     27     37     57     87
         30                --     15     22     27     41     63

Note that the response data range from 157 to 13, a ratio of 157/13 = 12.1 > 10; hence a transformation on Y is likely to be effective. The geometric mean is 41.5461 for this set of data.

The next table shows selected values of $S(\lambda, V)$. We pick 20 different values of $\lambda$ from (-1, 1) in this case.

λ          -1.0    -0.8    -0.6    -0.4    -0.2    -0.15   -0.10   -0.08   -0.06   -0.05
S(λ, V)    2456    1453    779.1   354.7   131.7   104.5   88.3    84.9    83.3    83.2

λ          -0.04   -0.02   0.00    0.05    0.10    0.2     0.4     0.6     0.8     1.0
S(λ, V)    83.5    85.5    89.3    106.7   135.9   231.1   588.0   1222    2243    3821

A smooth curve through these points is plotted in the next figure. We see that the minimum occurs at about $\lambda = -0.05$. This is close to zero, suggesting the transformation $V = \dot{Y} \ln Y$, or more simply $\ln Y$.

[Figure: plot of $S(\lambda, V)$ versus $\lambda$.]
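The grid search described in this example could be sketched as follows, using the 23 responses from the data table (the missing cell at f = 0, p = 30 is skipped). The function names and the 41-point grid with spacing 0.05 are assumptions; the slides use a slightly different, unevenly spaced set of 20 values.

```python
import numpy as np

f_levels = [0, 12, 24, 36, 48, 60]   # filler, phr
p_levels = [0, 10, 20, 30]           # naphthenic oil, phr
Y_table = [
    [26, 38, 50, 76, 108, 157],
    [17, 26, 37, 53, 83, 124],
    [13, 20, 27, 37, 57, 87],
    [None, 15, 22, 27, 41, 63],      # first cell not observed
]

# Flatten the table into (f, p, Y) triples, dropping the missing cell
rows = [(fv, pv, yv)
        for pv, y_row in zip(p_levels, Y_table)
        for fv, yv in zip(f_levels, y_row) if yv is not None]
f_arr, p_arr, Y = (np.array(col, dtype=float) for col in zip(*rows))

X = np.column_stack([np.ones(len(Y)), f_arr, p_arr])   # first-order model in f and p
gm = np.exp(np.mean(np.log(Y)))                        # geometric mean of the Y values

def boxcox_v(Y, lam, gm):
    """Scaled Box-Cox transform V for a given lambda."""
    if abs(lam) < 1e-12:
        return gm * np.log(Y)
    return (Y**lam - 1.0) / (lam * gm**(lam - 1.0))

def rss(lam):
    """Residual sum of squares S(lambda, V) from regressing V on f and p."""
    V = boxcox_v(Y, lam, gm)
    beta, *_ = np.linalg.lstsq(X, V, rcond=None)
    resid = V - X @ beta
    return float(resid @ resid)

lams = np.round(np.linspace(-1, 1, 41), 2)   # assumed grid: -1.00, -0.95, ..., 1.00
S = np.array([rss(lam) for lam in lams])
lam_hat = lams[np.argmin(S)]                 # grid value with the smallest S
```

Dividing by $\dot{Y}^{\lambda - 1}$ puts the residual sums of squares for different $\lambda$ on a comparable scale, which is why they can be compared directly on one plot.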

Applying the transformation to the original data gives a set of data that are more nearly linearly related. The best plane, fitted to these transformed data by least squares, is

$$ \widehat{\ln Y} = 3.212 + 0.03088 f - 0.03152 p. $$

The ANOVA table for this model is

Source              Df    SS           MS         F
b0                   1    319.44855    --
b1, b2 | b0          2    10.55167     5.27583    2045
Residual            20    0.05171      0.00258
Total               23    330.05193

If we had fitted a first-order model to the untransformed data, we would have obtained

$$ \hat{Y} = 28.184 + 1.55 f - 1.717 p. $$

The ANOVA table for this model is

Source              Df    SS          MS          F
b1, b2 | b0          2    27842.62    13921.31    72.9
Residual            20    3820.60     191.03
Total, corrected    22    31663.22

► The transformed model has a much larger F-value (2045 versus 72.9).
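Each F-value in the two ANOVA tables is the ratio of the regression mean square to the residual mean square. A short check using the mean squares as printed in the tables (the helper name `f_statistic` is just illustrative):

```python
def f_statistic(ms_regression, ms_residual):
    """F ratio for testing the regression: MS(regression) / MS(residual)."""
    return ms_regression / ms_residual

# Mean squares copied from the two ANOVA tables
f_transformed = f_statistic(5.27583, 0.00258)      # model for ln Y
f_untransformed = f_statistic(13921.31, 191.03)    # model for Y
```

Both models use the same 2 and 20 degrees of freedom, so the two F-values can be compared directly.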

11.6.1-11.6.3 Topics in Regression Modeling

Yikang Chai & Tao Li

11.6.1 Multicollinearity

► Def. The columns of the X matrix are exactly or approximately linearly dependent.
   It means the predictor variables are related.

► Why are we concerned about it?
   This can cause serious numerical and statistical difficulties in fitting the regression model unless “extra” predictor variables are deleted.

How does multicollinearity cause difficulties?

Multicollinearity leads to the following problems:

1. $X^T X$ is nearly singular, which makes $\hat{\beta}$ numerically unstable. This is reflected in large changes in the magnitudes of the $\hat{\beta}_j$ with small changes in the data.

2. The matrix $V = (X^T X)^{-1}$ has very large elements. Therefore the variances $\mathrm{Var}(\hat{\beta}_j) = \sigma^2 v_{jj}$ are large, which makes the $\hat{\beta}_j$ statistically nonsignificant.

Measures of Multicollinearity

Three ways:

1. The correlation matrix R. Easy, but it can’t reflect linear relationships among more than two variables.

2. The determinant of R can be used as a measure of the singularity of $X^T X$.

3. Variance Inflation Factors (VIF): the diagonal elements of $R^{-1}$. Generally, VIF > 10 is regarded as unacceptable.
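The second and third measures could be computed as in the sketch below. The data are synthetic (an assumption): `x3` is built as nearly the sum of `x1` and `x2`, a dependency that no single pairwise correlation fully reveals.

```python
import numpy as np

def vif(X):
    """Variance inflation factors: the diagonal elements of the inverse correlation matrix."""
    R = np.corrcoef(X, rowvar=False)   # correlation matrix of the predictor columns
    return np.diag(np.linalg.inv(R))

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.1, size=n)   # nearly a linear combination of x1 and x2
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
det_R = np.linalg.det(np.corrcoef(X, rowvar=False))   # near 0 signals near-singularity
```

With the rule of thumb from the slide, any entry of `vifs` above 10 would flag the corresponding predictor.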

11.6.2 Polynomial Regression

Consider the special case:

$$ y = \beta_0 + \beta_1 x + \cdots + \beta_k x^k + \epsilon $$

Problems:

1. The powers of x, i.e., $x, x^2, \ldots, x^k$, tend to be highly correlated.

2. If k is large, the magnitudes of these powers tend to vary over a rather wide range.

These problems lead to numerical errors.

How to solve these problems?

Two ways:

1. Center the x-variable:

$$ y = \beta_0^* + \beta_1^* (x - \bar{x}) + \cdots + \beta_k^* (x - \bar{x})^k + \epsilon $$

This removes the non-essential multicollinearity in the data.

2. Standardize the x-variable:

$$ \frac{x - \bar{x}}{s_x} $$

This alleviates the problem of x varying over a wide range.
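A small numerical illustration of centering and standardizing, using an assumed evenly spaced grid of x values between 10 and 20:

```python
import numpy as np

x = np.linspace(10, 20, 50)                        # hypothetical predictor values

r_raw = np.corrcoef(x, x**2)[0, 1]                 # correlation between x and x^2

xc = x - x.mean()                                  # centered predictor
r_centered = np.corrcoef(xc, xc**2)[0, 1]          # correlation after centering

z = xc / x.std(ddof=1)                             # standardized predictor (unit sample s.d.)
```

For this symmetric grid the correlation between the centered variable and its square vanishes up to rounding error, while the raw powers are almost perfectly correlated. With asymmetric data the reduction is usually substantial but not complete.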

11.6.3 Dummy Predictor Variables

It’s a method to deal with categorical variables.

1. For ordinal categorical variables, such as the prognosis of a patient (poor, average, good), just assign numerical scores to the categories (poor=1, average=2, good=3).

2. If we have a nominal variable with c >= 2 categories, use c-1 indicator variables, $x_1, \ldots, x_{c-1}$, called Dummy Variables, to code it.

How to code?

Set $x_i = 1$ for the ith category ($1 \le i \le c-1$), and $x_1 = \cdots = x_{c-1} = 0$ for the cth category.

Why don’t we just use c indicator variables $x_1, x_2, \ldots, x_c$?

Because there will be a linear dependency among them:

$$ x_1 + x_2 + \cdots + x_c = 1. $$

This will cause multicollinearity.

Example

The season of a year can be coded with three indicators: x1 (winter), x2 (spring), x3 (summer). With this coding, (1,0,0) stands for Winter, (0,1,0) for Spring, (0,0,1) for Summer, and (0,0,0) for Fall.
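The season coding above can be sketched in a few lines. The helper `dummy_code` and the sample observations are hypothetical; the last category listed (Fall) is treated as the baseline coded (0,0,0).

```python
import numpy as np

def dummy_code(values, categories):
    """Code a nominal variable with c-1 indicators; the last category is the baseline."""
    levels = categories[:-1]
    return np.array([[1 if v == lev else 0 for lev in levels] for v in values])

seasons = ["Winter", "Spring", "Summer", "Fall"]
obs = ["Winter", "Spring", "Summer", "Fall", "Spring"]   # hypothetical observations

D = dummy_code(obs, seasons)    # one row per observation, columns x1, x2, x3

# Using all c indicators together with an intercept creates an exact linear dependency
full = np.array([[1 if v == s else 0 for s in seasons] for v in obs])
X_bad = np.column_stack([np.ones(len(obs)), full])
rank_bad = np.linalg.matrix_rank(X_bad)   # smaller than the number of columns (5)
```

The rank deficiency of `X_bad` is the multicollinearity problem the slide warns about: the four indicator columns sum to the intercept column.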

Consider modeling the temperature of an area over a year as a function of the season (X) and its latitude (A). We get the following model:

$$ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 A_i + \epsilon_i $$

For winter: $Y_i = (\beta_0 + \beta_1) + \beta_4 A_i + \epsilon_i$

For spring: $Y_i = (\beta_0 + \beta_2) + \beta_4 A_i + \epsilon_i$

For summer: $Y_i = (\beta_0 + \beta_3) + \beta_4 A_i + \epsilon_i$

For fall: $Y_i = \beta_0 + \beta_4 A_i + \epsilon_i$

Logistic Regression Model

► 1938, by R. A. Fisher and Frank Yates
► Logistic transform for analyzing binary data:

$$ \ln(\text{Odds of } Y = 1 \mid x) = \ln \frac{P(Y = 1 \mid x)}{1 - P(Y = 1 \mid x)} = \ln \frac{P(Y = 1 \mid x)}{P(Y = 0 \mid x)} = \beta_0 + \beta_1 x $$

► The Importance of the Logistic Regression Model

1. The logistic regression model is the most popular model for binary data.

2. The logistic regression model is generally used for binary response variables: Y = 1 (true, success, YES, etc.), while Y = 0 (false, failure, NO, etc.).

Logistic Regression Model
► Details of Regression Model

Main Steps

1. Consider a response variable Y (0 or 1) and a single predictor variable x.

2. Model E(Y|x) = P(Y=1|x) as a function of x. The logistic regression model expresses the logistic transform of P(Y=1|x) as a linear function of x:

$$ \ln(\text{Odds of } Y = 1 \mid x) = \ln \frac{P(Y = 1 \mid x)}{1 - P(Y = 1 \mid x)} = \ln \frac{P(Y = 1 \mid x)}{P(Y = 0 \mid x)} = \beta_0 + \beta_1 x $$

Equivalently,

$$ P(Y = 1 \mid x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)} $$
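The logistic transform and its inverse, as given by the two equations above, can be written as a pair of small functions. The coefficient values used in the example are purely hypothetical.

```python
import numpy as np

def logit(p):
    """Logistic transform: log odds of a probability p."""
    return np.log(p / (1.0 - p))

def logistic_prob(x, b0, b1):
    """P(Y=1|x) under the logistic regression model with coefficients b0, b1."""
    eta = b0 + b1 * x                      # linear predictor
    return np.exp(eta) / (1.0 + np.exp(eta))

# Hypothetical coefficients, chosen so that the linear predictor is 0 at x = 2
b0, b1 = -1.0, 0.5
xs = np.array([-10.0, 0.0, 2.0, 10.0])
probs = logistic_prob(xs, b0, b1)          # always strictly between 0 and 1
round_trip = logit(probs)                  # recovers b0 + b1 * xs
```

Applying `logit` to the modeled probabilities returns the linear predictor, which is why the model is linear on the log-odds scale.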

Logistic Regression Model
► Example

http://faculty.vassar.edu/lowry/logreg1.html

i      ii                iii               iv        v              vi          vii
X      Instances of Y    Instances of Y    Total     Y as Observed  Y as Odds   Y as Log
       Coded as 0        Coded as 1        ii+iii    Probability    Ratio       Odds Ratio
28     4                 2                 6         .3333          .5000       -.6931
29     3                 2                 5         .4000          .6667       -.4055
30     2                 7                 9         .7778          3.5000      1.2528
31     2                 7                 9         .7778          3.5000      1.2528
32     4                 16                20        .8000          4.0000      1.3863
33     1                 14                15        .9333          14.0000     2.6391

Logistic Regression Model

[Figure: two panels. A. Ordinary Linear Regression. B. Logistic Regression.]

► Weighted Linear Regression of Observed Log Odds Ratios on X

X      Observed Probability    Log Odds Ratio    Weight
28     0.3333                  -.6931            6
29     0.4                     -.4055            5
30     0.7778                  1.2528            9
31     0.7778                  1.2528            9
32     0.8                     1.3863            20
33     0.9333                  2.6391            15
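The weighted regression in this table could be carried out as sketched below, solving the weighted normal equations $(X^T W X)\, b = X^T W y$ with the group totals as weights. This is an illustrative calculation on the tabulated values, not necessarily the procedure used on the cited web page.

```python
import numpy as np

x = np.array([28, 29, 30, 31, 32, 33], dtype=float)
log_odds = np.array([-0.6931, -0.4055, 1.2528, 1.2528, 1.3863, 2.6391])
w = np.array([6, 5, 9, 9, 20, 15], dtype=float)    # number of observations at each X

X = np.column_stack([np.ones_like(x), x])          # intercept and slope columns
XtW = X.T * w                                      # X^T W, with W = diag(w)
b0, b1 = np.linalg.solve(XtW @ X, XtW @ log_odds)  # weighted least squares estimates

# Fitted probability at each X from the fitted log odds
fitted_prob = np.exp(b0 + b1 * x) / (1.0 + np.exp(b0 + b1 * x))
```

Weighting by group size gives more influence to the log odds that are based on more observations, such as those at X = 32 and X = 33.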

Logistic Regression Model
► Properties of Regression Model
   E(Y|x) = P(Y=1|x) * 1 + P(Y=0|x) * 0 = P(Y=1|x) is bounded between 0 and 1 for all values of x. This is not true if we use the model

   $$ P(Y = 1 \mid x) = \beta_0 + \beta_1 x. $$

   In logistic regression, the regression coefficient $\beta_1$ has the interpretation that it is the log of the odds ratio of a success event (Y=1) for a unit change in x.

► Extension to Multiple Predictor Variables

   $$ \ln \frac{P(Y = 1 \mid x_1, x_2, \ldots, x_k)}{P(Y = 0 \mid x_1, x_2, \ldots, x_k)} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k $$

Standardized Regression Coefficients

► Why do we need to standardize regression coefficients?
   Recall the regression equation for the linear regression model:

   $$ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k $$

1. The magnitudes of the $\hat{\beta}_j$ cannot be directly used to judge the relative effects of the $x_j$ on y.

2. By using standardized regression coefficients, we may be able to judge the importance of different predictors.

Standardized Regression Coefficients

► Standardized Transform

$$ y_i^* = \frac{y_i - \bar{y}}{s_y}, \quad i = 1, 2, \ldots, n $$

$$ x_{ij}^* = \frac{x_{ij} - \bar{x}_j}{s_{x_j}}, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, k $$

► Standardized Regression Coefficients

$$ \hat{\beta}_j^* = \hat{\beta}_j \frac{s_{x_j}}{s_y}, \quad j = 1, 2, \ldots, k $$

Standardized Regression Coefficients
► Example (industrial sales data from the textbook)

► Linear model:

$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i, \quad i = 1, 2, \ldots, 10 $$

► The regression equation:

$$ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 = -2.61 + 0.192 x_1 + 0.341 x_2 $$

with $\hat{\beta}_1 = 0.192$, $\hat{\beta}_2 = 0.341$, $s_{x_1} = 6.830$, $s_{x_2} = 0.461$, $s_y = 1.501$.

$$ \hat{\beta}_1^* = \hat{\beta}_1 \frac{s_{x_1}}{s_y} = 0.192 \times \frac{6.830}{1.501} = 0.875 $$

$$ \hat{\beta}_2^* = \hat{\beta}_2 \frac{s_{x_2}}{s_y} = 0.3406 \times \frac{0.461}{1.501} = 0.105 $$

► Notice: $|\hat{\beta}_1| < |\hat{\beta}_2|$, but $|\hat{\beta}_1^*| > |\hat{\beta}_2^*|$; thus $x_1$ has a much larger effect than $x_2$ on y.
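The standardized coefficients in this example follow directly from $\hat{\beta}_j^* = \hat{\beta}_j\, s_{x_j} / s_y$. A minimal check using the rounded inputs quoted on the slide (the function name is illustrative; because the inputs are rounded, the first result may differ from the slide's value in the third decimal):

```python
def standardized_coef(beta_hat, s_x, s_y):
    """Standardized regression coefficient: beta_hat scaled by s_x / s_y."""
    return beta_hat * s_x / s_y

s_y = 1.501
b1_star = standardized_coef(0.192, 6.830, s_y)    # predictor x1
b2_star = standardized_coef(0.3406, 0.461, s_y)   # predictor x2

# The ordering of the coefficients reverses after standardization
x1_dominates = abs(b1_star) > abs(b2_star)
```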

Chapter Summary

► Multiple linear regression model

   $$ Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i, \quad i = 1, 2, \ldots, n $$

► Fitting the multiple regression model
   Least squares fit
   Goodness of fit of the model: SSE, SST, SSR, r^2

► Statistical inference for multiple regression
   1. t-test: $H_{0j}: \beta_j = 0$ vs. $H_{1j}: \beta_j \neq 0$
   2. F-test: $H_0: \beta_j = 0$ for all $0 < j \le k$ vs. $H_1: \beta_j \neq 0$ for at least one $0 < j \le k$
   3. F-test: $H_0: \beta_{k-m+1} = \cdots = \beta_k = 0$ vs. $H_a: \beta_j \neq 0$ for at least one $k-m+1 \le j \le k$

► How do we select variables (SAS)?
   Stepwise regression: a fancy algorithm
   Best subsets regression: more realistic, flexible

► What if the data are not linear?
   Data transformation

► Building a multiple regression model: 7 steps

We greatly appreciate your attention =)

Please feel free to ask questions.

The End

Thank You!
