# generalized linear regression - little dumb doctor to multiple linear regression 3 tics tutorial at...

Post on 02-Apr-2018

213 views

Embed Size (px)

TRANSCRIPT

Multiple Linear Regressiondand

the General Linear Modelthe General Linear Model

1

More Statistics tutorial at www.dumblittledoctor.com

OutlineOutline

1. Introduction to Multiple Linear Regressionp g2. Statistical Inference 3. Topics in Regression Modelingp g g4. Example 5. Variable Selection Methods6. Regression Diagnostic and Strategy for Building a Model

2

More Statistics tutorial at www.dumblittledoctor.com

1.Introduction to Multiple Linear

RegressionRegression

3

More Statistics tutorial at www.dumblittledoctor.com

Multiple Linear RegressionMultiple Linear Regression Regression analysis is a statistical methodology to estimate the relationship of a response variable to a set of predictor variables

Multiple linear regression extends simple linear regression model to the case of two or more predictor variable

Example:Example:

A multiple regression analysis might show us that the demand of a product varies directly with the change in demographic characteristics (age, income) of a market area

Historical Background Francis Galton started using the term regression in his biology research

area.

Francis Galton started using the term regression in his biology research Karl Pearson and Udny Yule extended Galtons work to the statistical context Legendre and Gauss developed the method of least squares used in regression analysis Ronald Fisher developed the maximum likelihood method used in the related statistical inference (test of the significance of regression etc.).

4

More Statistics tutorial at www.dumblittledoctor.com

Probabilistic Model

iy is the observed value of the random variable (r.v.) iY which depends on

1 1 , 1, 2,...,ki i i k i iY x x x i n 0 1 2= + + + ... + + =

1 2, , ,i i ikx x xK according to the following model:fixed predictor values

1 1 , 1, 2,...,ki i i k i iY x x x i n 0 1 2 + + + ... + +

Here 0 1, , , k K are unknown model parameters, and n is the number of observations.

The random error, , are assumed to be independent r.v.s with mean 0 and variance i

Thus are independent r.v.s with mean and variance , where iY i

22p ,i i

1 1( ) ki i i k ii E Y x x x 0 1 2= = + + + ... +

6

More Statistics tutorial at www.dumblittledoctor.com

Fitting the model

The least squares (LS) method is used to find a line that fits the equationThe least squares (LS) method is used to find a line that fits the equation

Specifically, LS provides estimates of the unknown model parameters,

1 1 , 1, 2,...,ki i i k i iY x x x i n 0 1 2= + + + ... + + =

0 1, , , k K Qwhich minimizes, , the sum of squared difference of theobserved values, , and the corresponding points on the line with the same xsiy

1 22

0 1 21[ ( ... )]k

n

i i i k ii

Q y x x x =

= + + + + The LS can be found by taking partial derivatives of Q with respect to the unknown variables and setting them equal to 0. The result is a set of simultaneous linear equations.

0 1, , , k K

The resulting solutions, are the least squares (LS) estimators of , respectively.

0 1, , , k K0 1 , , , k K

Please note the LS method is non-parametric. That is, no probability distribution assumptions on Y or are needed. 7

More Statistics tutorial at www.dumblittledoctor.com

Goodness of Fit of the Model

To evaluate the goodness of fit of the LS model, we use the residuals ( 1, 2 , , )i i ie y y i n= = K

gdefined by are the fitted values: iy

An overall measure of the goodness of fit is the error sum of squares (SSE)

1 1 ( 1, 2, ..., )ki i i k iy x x x i n 0 1 2= + + + ... + =

2

1min

n

ii

Q SSE e=

= = A f h d fi i i i il h i i l li i A few other definition similar to those in simple linear regression:

total sum of squares (SST):2( )SST 2( )iSST y y=

regression sum of squares (SSR):

SSR SST SSE=

8

More Statistics tutorial at www.dumblittledoctor.com

coefficient of multiple determination:

2 1SSR SSER2 1SS SSRSST SST

= =

20 1R values closer to 1 represent better fits adding predictor variables never decreases and generally increases 2R

multiple correlation coefficient (positive square root of ):2R

2R R= + only positive square root is used

R is a measure of the strength of the association between the predictors ( ) d th i bl Y(xs) and the one response variable Y

9

More Statistics tutorial at www.dumblittledoctor.com

Multiple Regression Model in Matrix Notationp g

The multiple regression model can be represented in a compact form using matrix notationLet:

1

2 ,

YY

Y

= M

1

2 ,

yy

y

= M

1

2

= M

nY

M

n

y

y

M

n

M

be the n x 1 vectors of the r.v.s , their observed values , and random errors , ti l f ll b ti

'iY s 'iy s 'i srespectively for all n observationsLet:

11 1

21 2

11

k

k

x xx x

X

=

L

L

M M O M

11 n nkx x

M M O M

L

be the n x (k + 1) matrix of the values of the predictor variables for all n observationsbe the n x (k + 1) matrix of the values of the predictor variables for all n observations(the first column corresponds to the constant term )

010

More Statistics tutorial at www.dumblittledoctor.com

Let: 0

1

and

0

1

1

k

=

M

and

k

=

M

be the (k + 1) x 1 vectors of unknown model parameters and their LS estimates, respectively

The model can be rewritten as:

Y X Y X = + The simultaneous linear equations whose solutions yields the LS estimates:

' 'X X X y =X X X y = If the inverse of the matrix exists, then the solution is given by:'X X

1 ( ' ) 'X X X y =

11

More Statistics tutorial at www.dumblittledoctor.com

22.Statistical InferenceStatistical Inference

12

More Statistics tutorial at www.dumblittledoctor.com

Determining the statistical significance of the di t i blpredictor variables:

For statistical inferences, we need the assumption that

We test the hypotheses: H0 j : j = 0 H1 j : j 0vs.( )2~ 0,

iid

i N *i.i.d. means independent & identically distributed

If we cant reject H0 j : Bj = 0 , then the corresponding variable x j is not a significant predictor of y.j is not a significant predictor of y.

It is easily shown that each j is normal with mean jand variance 2v jj , where is the jth diagonal entry of the

matrix V = (X ' X)1v jj

13

More Statistics tutorial at www.dumblittledoctor.com

Deriving a pivotal quantity for the inference on

),(~ 2 jjjj vN Recall

j

The unbiased estimator of the unknown error varianceis given by

2

S2 = SSEn (k +1)

=ei

2

n (k +1)=

MSEd.o. f .

We also know that2

2( 1)2 2

( ( 1)) ~ n kn k S SSEW

+ +

= = , and thatS2and j are statistically independent.

With ~ (0,1),j jjj

Z Nv

= and by the definition of the t-distribution,

j jZT T

= =

we obtain the pivotal quantity for the inference on j

( 1)~/ ( 1) n kjjT T

W n k S v += =

+14

Derivation of the Confidence Interval for j jj

= ++ 1)( )1(,2/)1(,2/ knjj

jjkn tvS

tP

=+ ++ 1)( )1(,2/)1(,2/ jjknjjjjknj vstvstP

Thus the 100(1-)% confidence interval for is: j

/ 2, ( 1) ( )j n k jt SE +

where jjj vsSE =)(15

Derivation of the Hypothesis Test for jat the significance level Hypotheses: 0 : 0jH =

j

yp: 0

j

a jH

The test statistic is: 0 0 HjT T

The test statistic is: 0 ( 1)~

jn k

jj

T TS v +

=

The decision rule of the test is derived based on the Type I

P (Reject H0 | H0 is true) =

t

error rate . That is0( )P T c =

/ 2, ( 1)n kc t +=

Therefore, we reject H0 at the significance level if and only if h i th b d l f t Tif , where is the observed value of 0 / 2, ( 1)n kt t +

16

0t 0T

Another Hypothesis TestAnother Hypothesis Test

f llH 0 0 kNow consider:

for allfor at least one

H0 : j = 0Ha : j 0

0 j k0 j k

When H0 is true, the test statistics 0 , ( 1)~ k n kMSRF fMSE +

=

An alternative and equivalent way to make a decision for a statisticaltest is through the p-value, defined as:

p = P(observe a test statistic value at least as extreme as the one observed

At the significance level we reject H0 if and only if p <

0| )H

At the significance level , we reject H0 if and only if p < 17

The General Hypothesis TestThe General Hypothesis Test

Consider the full model:

Now consider a partial model:

Yi = 0 + 1xi1 + ...+ k xik + i (i=1,2,n)

Yi = 0 + 1xi1 + ...+ km xi,km + i

H : = = = 0 vs H : 0

(i=1,2,n)

H th H0 : km +1 = ... = k = 0 vs. Ha : j 0

for at least one k m +1 j k Hypotheses:

Test statistic: 0 , ( 1)( ) / ~

/[ ( 1)]k m k

m n kk

SSE SSE mF fSSE n k

+

=

+

Reject H0 when 0 , ( 1),m n kF f +> 18

Estimating and Predicting Future ObservationsEstimating and Predicting Future Observations

Let x* = (x0*,x1

*,...,xk*)' and let

* * * * 0 1 1 ... k kY x x x = + + + =

* * * *

Th i t l tit f i* *

T T =

* * * * 0 1 1 ... k kx x x = + + + =

* The pivotal quantity for is ( 1)* * ~ n kT Ts x Vx +=

Using this pivotal quantity, we can derive a CI for the estimated

mean *: * *2/),1(

* Vxxst kn +

Additionally, we can derive a prediction interval (PI) to predict Y*:Additionally, we can derive a prediction interval (PI) to predict Y :* *

2/),1(* 1 VxxstY kn + + 19

3. T i i R iTopics in Regression

ModelingModeling

20

More Statistics tutorial at www.dumblittledoctor.com

3.1 Multicollinearity3.1 Multicollinearity

Definition The predictor variables areDefinition. The predictor variables are linearly dependent.

This can cause serious numerical and t ti ti l diffi lti i fitti thstatistical difficulties in fitting the

regression model unless extra predictor i bl d l t dvariables are deleted.

21

How does the multicollinearity cause diffi lti ?difficulties?

( ) ( ) invertable be must Thus , equation the to solution the is ^ XXyXXX TTT =( ) ( ).computable and unique be to for order in

^

If th i t lti lli it hIf the approximate multicollinearity happens:

1. is nearly singular, which makes numerically unstable. This reflected in large changes in their magnitudes with small changes in data.

TX X^

2. The matrix has very large elements. Therefore are large, which makes statistically nonsignificant.

1)( = XXV T jjvVar 2^)( =

j

^g , y gj

22

More Statistics tutorial at www.dumblittledoctor.com

Measures of MulticollinearityMeasures of Multicollinearity

1. The correlation matrix R.1. The correlation matrix R.Easy but cant reflect linear relationships between more than two variables.

2. Determinant of R can be used as measurement of singularity of .TX X

3. Variance Inflation Factors (VIF): the diagonal elements of . VIF>10 is regarded as

bl1R

unacceptable.

23

3.2 Polynomial Regression3.2 Polynomial Regression

A special case of a linear model:p

0 1 ...k

ky x x = + + + +

Problems:1. The powers of x, i.e., tend to be highly

correlated

2, , , kx x xKcorrelated.

2. If k is large, the magnitudes of these powers tend to vary over a rather wide range.g

So, set k5.

24

More Statistics tutorial at www.dumblittledoctor.com

SolutionsSolutions1. Centering the x-variable:

* * *( ) ( ) ky x x x x

= + + + +

Effect: removing the non-essential multicollinearity in the data.

0 1 ( ) ... ( )ky x x x x = + + + +

2. Further more, we can standardize the data by dividing the standard deviation of x: s

Eff t h l i t ll i t th d bl

x

x xs

xs

Effect: helping to alleviate the second problem.

3. Using the first few principal components of the original fvariables instead of the original variables.

25

More Statistics tutorial at www.dumblittledoctor.com

3.3 Dummy Predictor Variables & The G l Li M d lGeneral Linear Model

How to handle the categorical predictorHow to handle the categorical predictor variables?

1. If we have categories of an ordinal i bl h th i fvariable, such as the prognosis of a

patient (poor, average, good), one can i i l t thassign numerical scores to the

categories. (poor=1, average=2, good=3)

26

More Statistics tutorial at www.dumblittledoctor.com

2. If we have nominal variable with c>=2 categories. Use c-1 indicator variables,

, called Dummy Variables, to 1 1, , cx x Kcode.

for the ith category, 1ix = 1 1i c

for the cth category.1 1... 0cx x = = =

27

Why dont we just use c indicator variables:?1 2, , ..., cx x x

If th t th ill b liIf we use that, there will be a linear dependency among them:

1 2 ... 1cx x x+ + + =

This will cause multicollinearity.

28

Example of the dummy variablesExample of the dummy variables

For instance, if we have four years of quarterly sale data , y q yof a certain brand of soda. How can we model the time trend by fitting a multiple regression equation?

Solution: We use quarter as a predictor variable x1. To model the seasonal trend, we use indicator variables x2, x3, x4, for Winter, S SSpring and Summer, respectively. For Fall, all three equal zero. That means: Winter-(1,0,0), Spring-(0,1,0), Summer-(0,0,1), Fall-(0,0,0).Then we have the model:

0 1 1 2 2 3 3 4 4 ( 1,...,16)i i i i i iY x x x x i = + + + + + =

29

3. Once the dummy variables are included, the resulting regression model is referred to as aresulting regression model is referred to as a General Linear Model.

This term must be differentiated from that of the Generalized Linear Model which include the General Linear Model as a special case with the identity link function:

The generalized linear model will link the model

1 1( ) ki i i k ii E Y x x x 0 1 2= = + + + ... +

The generalized linear model will link the model parameters to the predictors through a link function. For another example, we will check out the logit link in the logistic regression this afternoon.

30

More Statistics tutorial at www.dumblittledoctor.com

4. Example4. ExampleHere we revisit the classic regression towards Mediocrity in Hereditary Stature by Francis GaltonMediocrity in Hereditary Stature by Francis GaltonHe performed a simple regression to predict offspring height based on the average parent heightSlope of regression line was less than 1 showing that extremely tall parents had less extremely tall childrenAt the time Galton did not have multiple regression asAt the time, Galton did not have multiple regression as a tool so he had to use other methods to account for the difference between male and female heightsWe can now perform multiple regression on parent-offspring height and use multiple variables as predictorsp

31

More Statistics tutorial at www.dumblittledoctor.com

ExampleExample

OUR MODEL: iiiioi xxxY ++++= 332211

Y = height of childh i ht f f thx1 = height of father

x2 = height of motherx = gender of childx3 = gender of child

32

ExampleExampleIn matrix notation:In matrix notation:

XY +=

yXXX ')'( 1

=

We find that: 0 = 15.344761 = 0.4059782 = 0.3214953 = 5.22595

33

More Statistics tutorial at www.dumblittledoctor.com

ExampleExampleFather Mother Gender Child

1 78.5 67 1 73.2

1 78.5 67 1 69.2

1 78.5 67 0 69

1 78.5 67 0 69

1 75.5 66.5 1 73.5X =1 75.5 66.5 1 72.5

1 75.5 66.5 0 65.5

1 75.5 66.5 0 65.5

1 75 64 1 71

1 75 64 0 68

34

More Statistics tutorial at www.dumblittledoctor.com

ExampleExampleImportant calculations

162.4149)( 2 ==

yySSE ii060.515,11)( 2 ==

yySST i

ii

Is the predicted height

i iy X

=

6397.012 ==SSTSSEr

p gof each child given a set of predictor variables

64.4)1(

=+

=kn

SSEMSE)1( +kn

35

More Statistics tutorial at www.dumblittledoctor.com

ExampleExampleAre these values significantly different than zero?

Ho: j = 0H : 0

g y

Ha: j 0

Reject H0j if 2 245

j

t t t

= > = =( 1), / 2 894,0.025 2.245( )

1.63 0.0118 0.0125 0.005960 0118 0 000184 0 0000143 0 0000223

j n k

j

t t tSE

+= > = =

1 0.0118 0.000184 0.0000143 0.0000223( ' )

0.0125 0.0000143 0.000211 0.00003270.00596 0.0000223 0.0000327 0.00447

V X X

= =

( )jSE

= * jjMSE v36

ExampleExample-estimate SE t

Intercept 15.3 2.75 5.59*

Father 0.406 0.0292 13.9*HeightMother H i ht

0.321 0.0313 10.3*HeightGender 5.23 0.144 36.3*

* p

EXAMPLEEXAMPLETesting the model as a whole:

Ho: 0 = 1 = 2 = 3 = 0Ha: The above is not true.

22

, ( 1), 3,894,.052

2

[ ( 1)] 2.615(1 )

1 0 6397

k n kr n kF f f

k rSSE

+ +

= > = =Reject H0 if

2

2

1 0.6397

( ) 4149.162i i

rSST

SSE y y

= =

= =2( ) 11,515.060iSST y y= =

Since F = 529.032 >2.615, we reject Ho and conclude that our model predicts height better than by chance.

38

More Statistics tutorial at www.dumblittledoctor.com

EXAMPLEMaking Predictions

EXAMPLE

Lets say George Clooney (71 inches) and Madonna (64 inches) have a baby boy.( ) y y* 15.34476 0.405978 (71) 0.321495(64) + 5.225951(1) 69.97Y

= + + =

95% Prediction interval:

**)*'1(** 025,.894 YVxxMSEtY +

%

*)*'1(** 025,.894 VxxMSEtY ++

69.97 4.84 = (65.13, 74.81)39

More Statistics tutorial at www.dumblittledoctor.com

EXAMPLESAS code

EXAMPLE

ods graphics ondata revise;

set mylib.galton;

SAS code

set mylib.galton;if sex = 'M' then gender = 1.0;else gender = 0.0;

run;

proc reg data=revise;title "Dependence of Child Heights on Parental

Heights";g ;model height = father mother gender / vif;run;quit;

Alternatively one can use proc GLM procedure that canAlternatively, one can use proc GLM procedure that can incorporate the categorical variable (sex) directly via the class statement. 40

More Statistics tutorial at www.dumblittledoctor.com

Dependence of Child Heights on Parental Heights 2

The REG ProcedureModel: MODEL1

Dependent Variable: height height

Number of Observations Read 898Number of Observations Used 898

Analysis of Variance

Sum of MeanSum of MeanSource DF Squares Square F Value Pr > F

Model 3 7365.90034 2455.30011 529.03 |t|

Intercept Intercept 1 15.34476 2.74696 5.59

EXAMPLEEXAMPLE

42

EXAMPLEEXAMPLE

By Gary Bedford &Bedford &Christine Vendikos 43

5. Variables Selection Method

A. Stepwise RegressionA. Stepwise Regression

44

Variables selection methodVariables selection method

(1) Why do we need to select the variables?(1) Why do we need to select the variables?

(2) How do we select variables?(2) How do we select variables?* stepwise regression* best subset regression best subset regression

45

Stepwise RegressionStepwise Regression

(p-1)-variable model:(p 1) variable model:

ipipii xxY ++++= 1,11,10 ...

P-variable model

ipippipii xxxY +++++= ,1,11,10 ... pppp

46

test-F Partial

0 testHypothesis

H

0

0

:1

:0

=

pp

pp

H

H

:1 pp

)1(11

)]1(/[1/)(

+ >

= pn

ppp fSSE

SSESSEF ),1(,1

:

)]1(/[

++

p

pnp

p

tstatistictest

fpnSSE

|:|

)(:

>

=p

pp

ttHreject

SEtstatistictest

2/),1(0 |:| +> pnpp ttHreject47

Partial correlation coefficientsPartial correlation coefficients

11

1111|

21...1 )...(

)...()...(

=

=

pp xxSSE

xxSSExxSSESSE

SSESSEr

p

pp

p

ppxxyx

2

2

2 1...1|

1

)]1([

+== pxxpyx

r

pnrtF pp

1...1|1

pxxpyxr

48

5. Variables selection method

A. Stepwise Regression:A. Stepwise Regression: SAS Example

49

More Statistics tutorial at www.dumblittledoctor.com

Example 11.5 (T&D pg. 416), 11.9 (T&D pg. 431)The following table shows data on the heat evolved in calories during the hardening of

No. X1 X2 X3 X4 Y

cement on a per gram basis (y) along with the percentages of four ingredients: tricalcium aluminate (x1), tricalcium silicate (x2), tetracalcium alumino ferrite (x3), and dicalcium silicate (x4).

1 7 26 6 60 78.52 1 29 15 52 74.33 11 56 8 20 104.34 11 31 8 47 87.65 7 52 6 33 95.96 11 55 9 22 109.26 11 55 9 22 109.27 3 71 17 6 102.78 1 31 22 44 72.59 2 54 18 22 93 19 2 54 18 22 93.110 21 47 4 26 115911 1 40 23 34 83.812 11 66 9 12 113 312 11 66 9 12 113.313 10 68 8 12 109.4

50

SAS Program (stepwise variable selection is used)

data example115;data example115;input x1 x2 x3 x4 y;datalines;7 26 6 60 78.51 29 15 52 74 31 29 15 52 74.311 56 8 20 104.311 31 8 47 87.67 52 6 33 95.911 55 9 22 109.23 71 17 6 102.71 31 22 44 72.52 54 18 22 93.12 54 18 22 93.121 47 4 26 115.91 40 23 34 83.811 66 9 12 113.310 68 8 12 109 410 68 8 12 109.4;run;proc reg data=example115;

model y = x1 x2 x3 x4 /selection=stepwise;run; 51

Selected SAS outputSelected SAS output

The SAS System 22:10 Monday, November 26, 2006 3

The REG ProcedureModel: MODEL1

Dependent Variable: y

Stepwise Selection: Step 4

Parameter StandardVariable Estimate Error Type II SS FVariable Estimate Error Type II SS F Value Pr > F

Intercept 52.57735 2.28617 3062.60416 528.91

SAS Output (cont)SAS Output (cont)All variables left in the model are significant at the 0.1500 level.

No other variable met the 0.1500 significance level for entry into the model.

Summary of Stepwise SelectionSummary of Stepwise Selection

Variable Variable Number Partial ModelStep Entered Removed Vars In R-Square R-Square C(p) F Value Pr > F

1 x4 1 0.6745 0.6745 138.731 22.80 0.00062 x1 2 0.2979 0.9725 5.4959 108.22

5. Variables selection method

B. Best Subsets RegressionB. Best Subsets Regression

54

More Statistics tutorial at www.dumblittledoctor.com

Best Subsets RegressionBest Subsets Regression

For the stepwise regression algorithmFor the stepwise regression algorithmThe final model is not guaranteed to be optimal in any specified sensein any specified sense.

In the best subsets regressionIn the best subsets regression, subset of variables is chosen from the collection of all subsets of k predictor variables) thatof all subsets of k predictor variables) that optimizes a well-defined objective criterion

55

More Statistics tutorial at www.dumblittledoctor.com

Best Subsets RegressionBest Subsets Regression

In the stepwise regressionIn the stepwise regression,We get only one single final models.

In the best subsets regression, f fThe investor could specify a size for the

predictors for the model.

56

Best Subsets RegressionBest Subsets Regression

SSESSROptimality Criteria

SSTSSE

SSTSSR

r ppp == 12 rp2-Criterion:

MSE

C Criterion ( d d f it f t ti d it bilit

Adjusted rp2-Criterion: MSTMSE

r ppadj = 12

,

n

YEYE 2]][][[1

Cp-Criterion (recommended for its ease of computation and its ability to judge the predictive power of a model)

=

=i

iipp YEYE1

2 ]][][[

The sample estimator, Mallows Cp-statistic, is given by

npSSE

C pp ++= )1(2 2 57

More Statistics tutorial at www.dumblittledoctor.com

Best Subsets RegressionBest Subsets Regression

AlgorithmAlgorithmNote that our problem is to find the minimum of a given function.

Use the stepwise subsets regression algorithm p g gand replace the partial F criterion with other criterion such as Cp.pEnumerate all possible cases and find the minimum of the criterion functions.Other possibility?

58

More Statistics tutorial at www.dumblittledoctor.com

Best Subsets Regression & SASBest Subsets Regression & SAS

proc reg data=example115;model y = x1 x2 x3 x4 /selection=ADJRSQ;

run;

For the selection option SAS has implemented 9 methodsFor the selection option, SAS has implemented 9 methods in total. For best subset method, we have the following options:Maximum R2 Improvement (MAXR)Minimum R2 (MINR) ImprovementR2 Selection (RSQUARE)( Q )Adjusted R2 Selection (ADJRSQ)Mallows' Cp Selection (CP)

59

More Statistics tutorial at www.dumblittledoctor.com

6. Building A Multiple Regression ModelRegression Model

Steps and StrategySteps and Strategy

60

More Statistics tutorial at www.dumblittledoctor.com

Modeling is an iterative process SeveralModeling is an iterative process. Several cycles of the steps maybe needed before arriving at the final modelarriving at the final model.The basic process consists of seven steps

61

More Statistics tutorial at www.dumblittledoctor.com

Get started and Follow the StepsGet started and Follow the Steps

Categorization by Usage Collect the Datag y g

Divide the Data Explore the Data

Fit Candidate Models

Select and Evaluate

Select the Final ModelSelect the Final Model62

More Statistics tutorial at www.dumblittledoctor.com

Step IStep I

Decide the type of model neededDecide the type of model needed, according to different usage.Main categories include:Main categories include:PredictiveTheoreticalControlInferentialData Summary

Sometimes, models are involved in multiple purposes. p p p

63

More Statistics tutorial at www.dumblittledoctor.com

Step IIStep II

Collect the DataCollect the DataPredictor (X)R (Y)Response (Y)

Data should be relevant and bias-free

64

More Statistics tutorial at www.dumblittledoctor.com

Step IIIStep III

Explore the DataExplore the Data

Linear Regression Model is sensitive to theLinear Regression Model is sensitive to the noise. Thus, we should treat outliers and influential observations cautiously.influential observations cautiously.

65

More Statistics tutorial at www.dumblittledoctor.com

Step IVStep IV

Divide the DataDivide the DataTraining Sets: buildingTest Sets: checkingTest Sets: checking

How to divide?Large sample Half-Half

Small sample size of training set >16Small sample size of training set 16

66

More Statistics tutorial at www.dumblittledoctor.com

Step VStep V

Fit several Candidate ModelsFit several Candidate ModelsUsing Training Set.

67

More Statistics tutorial at www.dumblittledoctor.com

Step VIStep VI

Select and Evaluate a Good ModelSelect and Evaluate a Good ModelTo improve the violations of model assumptions.

68

More Statistics tutorial at www.dumblittledoctor.com

Step VIIStep VII

Select the Final ModelSelect the Final ModelUse test set to compare competing models by cross validating themmodels by cross-validating them.

69

More Statistics tutorial at www.dumblittledoctor.com

Regression Diagnostics (Step VI)Regression Diagnostics (Step VI)

Graphical Analysis of ResidualsGraphical Analysis of ResidualsPlot Estimated Errors vs. Xi Values

Difference Between Actual Yi & Predicted YiDifference Between Actual Yi & Predicted YiEstimated Errors Are Called Residuals

Plot Histogram or Stem-&-Leaf of Residualsg

PurposesExamine Functional Form (Linearity )Examine Functional Form (Linearity )Evaluate Violations of Assumptions

70

More Statistics tutorial at www.dumblittledoctor.com

Linear Regression AssumptionsLinear Regression Assumptions

Mean of Probability Distribution of Error IsMean of Probability Distribution of Error Is 0Probability Distribution of Error HasProbability Distribution of Error Has Constant VarianceP b bilit Di t ib ti f E i N lProbability Distribution of Error is NormalErrors Are Independent

71

More Statistics tutorial at www.dumblittledoctor.com

Residual Plot f F ti l F (Li it )for Functional Form (Linearity)

Add X^2 TermAdd X^2 Term Correct SpecificationCorrect Specification

e e

X XX X

72

More Statistics tutorial at www.dumblittledoctor.com

Residual Plot f E l V ifor Equal Variance

Unequal VarianceUnequal Variance Correct SpecificationCorrect Specification

SR SR

X XX XFanFan--shaped.shaped.Standardized residuals used typically (residual Standardized residuals used typically (residual

divided by standard error of prediction)divided by standard error of prediction)73

More Statistics tutorial at www.dumblittledoctor.com

Residual Plot f I d dfor Independence

SR SR

Not IndependentNot Independent Correct SpecificationCorrect Specification

SR SR

X X

74

More Statistics tutorial at www.dumblittledoctor.com