
  • Multiple Linear Regression and the General Linear Model

    More Statistics tutorial at www.dumblittledoctor.com

  • Outline

    1. Introduction to Multiple Linear Regression
    2. Statistical Inference
    3. Topics in Regression Modeling
    4. Example
    5. Variable Selection Methods
    6. Regression Diagnostics and Strategy for Building a Model

  • 1. Introduction to Multiple Linear Regression

  • Multiple Linear Regression

    Regression analysis is a statistical methodology to estimate the relationship of a response variable to a set of predictor variables.

    Multiple linear regression extends the simple linear regression model to the case of two or more predictor variables.

    Example: A multiple regression analysis might show us that the demand for a product varies directly with changes in the demographic characteristics (age, income) of a market area.

    Historical Background:

    Francis Galton started using the term "regression" in his biology research.
    Karl Pearson and Udny Yule extended Galton's work to the statistical context.
    Legendre and Gauss developed the method of least squares used in regression analysis.
    Ronald Fisher developed the maximum likelihood method used in the related statistical inference (test of the significance of regression, etc.).

  • Probabilistic Model

    $y_i$ is the observed value of the random variable (r.v.) $Y_i$, which depends on the fixed predictor values $x_{i1}, x_{i2}, \ldots, x_{ik}$ according to the following model:

    $$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i, \quad i = 1, 2, \ldots, n$$

    Here $\beta_0, \beta_1, \ldots, \beta_k$ are unknown model parameters, and $n$ is the number of observations.

    The random errors $\epsilon_i$ are assumed to be independent r.v.s with mean 0 and variance $\sigma^2$.

    Thus the $Y_i$ are independent r.v.s with mean $\mu_i$ and variance $\sigma^2$, where

    $$\mu_i = E(Y_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}$$

  • Fitting the model

    The least squares (LS) method is used to fit the equation

    $$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i, \quad i = 1, 2, \ldots, n$$

    Specifically, LS provides estimates of the unknown model parameters $\beta_0, \beta_1, \ldots, \beta_k$ which minimize $Q$, the sum of squared differences between the observed values $y_i$ and the corresponding points on the fitted surface with the same $x$'s:

    $$Q = \sum_{i=1}^{n}\left[y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik})\right]^2$$

    The LS estimates can be found by taking the partial derivatives of $Q$ with respect to the unknown parameters and setting them equal to 0. The result is a set of simultaneous linear equations.

    The resulting solutions $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ are the least squares (LS) estimators of $\beta_0, \beta_1, \ldots, \beta_k$, respectively.

    Please note that the LS method is non-parametric: no probability distribution assumptions on $Y$ or $\epsilon$ are needed.
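    The "set of simultaneous linear equations" can be written out explicitly (a standard step, stated here for completeness): setting each partial derivative of $Q$ to zero gives the $k+1$ normal equations

    $$\frac{\partial Q}{\partial \beta_0} = -2\sum_{i=1}^{n}\left[y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})\right] = 0$$

    $$\frac{\partial Q}{\partial \beta_j} = -2\sum_{i=1}^{n} x_{ij}\left[y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})\right] = 0, \quad j = 1, \ldots, k$$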

  • Goodness of Fit of the Model

    To evaluate the goodness of fit of the LS model, we use the residuals, defined by

    $$e_i = y_i - \hat y_i \quad (i = 1, 2, \ldots, n)$$

    where the $\hat y_i$ are the fitted values:

    $$\hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \hat\beta_2 x_{i2} + \cdots + \hat\beta_k x_{ik} \quad (i = 1, 2, \ldots, n)$$

    An overall measure of the goodness of fit is the error sum of squares (SSE):

    $$SSE = \min Q = \sum_{i=1}^{n} e_i^2$$

    A few other definitions, similar to those in simple linear regression:

    total sum of squares (SST):

    $$SST = \sum_i (y_i - \bar y)^2$$

    regression sum of squares (SSR):

    $$SSR = SST - SSE$$

  • coefficient of multiple determination:

    $$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

    $0 \le R^2 \le 1$; values closer to 1 represent better fits. Adding predictor variables never decreases and generally increases $R^2$.

    multiple correlation coefficient (the positive square root of $R^2$):

    $$R = +\sqrt{R^2}$$

    only the positive square root is used

    $R$ is a measure of the strength of the association between the predictors (the $x$'s) and the one response variable $Y$.
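    As a concrete instance (using the numbers from the Galton example in Section 4 below, where SSE = 4149.162 and SST = 11,515.060):

    $$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{4149.162}{11515.060} \approx 0.6397$$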

  • Multiple Regression Model in Matrix Notation

    The multiple regression model can be represented in a compact form using matrix notation. Let:

    $$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \boldsymbol\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$

    be the $n \times 1$ vectors of the r.v.s $Y_i$, their observed values $y_i$, and the random errors $\epsilon_i$, respectively, for all $n$ observations. Let:

    $$X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{bmatrix}$$

    be the $n \times (k+1)$ matrix of the values of the predictor variables for all $n$ observations (the first column of 1's corresponds to the constant term $\beta_0$).

  • Let:

    $$\boldsymbol\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} \quad \text{and} \quad \hat{\boldsymbol\beta} = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{bmatrix}$$

    be the $(k+1) \times 1$ vectors of the unknown model parameters and their LS estimates, respectively.

    The model can be rewritten as:

    $$\mathbf{Y} = X\boldsymbol\beta + \boldsymbol\epsilon$$

    The simultaneous linear equations whose solution yields the LS estimates:

    $$X'X\hat{\boldsymbol\beta} = X'\mathbf{y}$$

    If the inverse of the matrix $X'X$ exists, then the solution is given by:

    $$\hat{\boldsymbol\beta} = (X'X)^{-1}X'\mathbf{y}$$
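    As an aside (not from the original slides), the closed-form solution can be computed directly in SAS with PROC IML; the tiny data set below is invented purely for the sketch:

    /* Minimal sketch: compute beta-hat = (X'X)^{-1} X'y with PROC IML.
       The data (n = 5 observations, k = 2 predictors) are made up. */
    proc iml;
       X = {1 2 5,                      /* first column of 1's = intercept */
            1 3 7,
            1 4 6,
            1 5 9,
            1 6 8};
       y = {10, 14, 15, 20, 21};
       beta_hat = inv(X`*X) * X` * y;   /* solves the normal equations X'X b = X'y */
       print beta_hat;
    quit;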

  • 2. Statistical Inference

  • Determining the statistical significance of the predictor variables:

    For statistical inferences, we need the assumption that

    $$\epsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$$

    (i.i.d. means independent and identically distributed)

    We test the hypotheses:

    $$H_{0j}: \beta_j = 0 \quad \text{vs.} \quad H_{1j}: \beta_j \ne 0$$

    If we can't reject $H_{0j}: \beta_j = 0$, then the corresponding variable $x_j$ is not a significant predictor of $y$.

    It is easily shown that each $\hat\beta_j$ is normal with mean $\beta_j$ and variance $\sigma^2 v_{jj}$, where $v_{jj}$ is the $j$th diagonal entry of the matrix $V = (X'X)^{-1}$.

  • Deriving a pivotal quantity for the inference on $\beta_j$

    Recall that $\hat\beta_j \sim N(\beta_j, \sigma^2 v_{jj})$.

    The unbiased estimator of the unknown error variance $\sigma^2$ is given by

    $$S^2 = \frac{SSE}{n-(k+1)} = \frac{\sum e_i^2}{n-(k+1)} = MSE$$

    where $n-(k+1)$ is the degrees of freedom (d.o.f.).

    We also know that

    $$W = \frac{[n-(k+1)]S^2}{\sigma^2} = \frac{SSE}{\sigma^2} \sim \chi^2_{n-(k+1)}$$

    and that $S^2$ and $\hat\beta_j$ are statistically independent.

    With

    $$Z = \frac{\hat\beta_j - \beta_j}{\sigma\sqrt{v_{jj}}} \sim N(0,1)$$

    and by the definition of the t-distribution, we obtain the pivotal quantity for the inference on $\beta_j$:

    $$T = \frac{Z}{\sqrt{W/[n-(k+1)]}} = \frac{\hat\beta_j - \beta_j}{S\sqrt{v_{jj}}} \sim t_{n-(k+1)}$$

  • Derivation of the Confidence Interval for $\beta_j$

    $$P\left(-t_{n-(k+1),\,\alpha/2} \le \frac{\hat\beta_j - \beta_j}{S\sqrt{v_{jj}}} \le t_{n-(k+1),\,\alpha/2}\right) = 1-\alpha$$

    $$\Rightarrow\; P\left(\hat\beta_j - t_{n-(k+1),\,\alpha/2}\,s\sqrt{v_{jj}} \le \beta_j \le \hat\beta_j + t_{n-(k+1),\,\alpha/2}\,s\sqrt{v_{jj}}\right) = 1-\alpha$$

    Thus the $100(1-\alpha)\%$ confidence interval for $\beta_j$ is:

    $$\hat\beta_j \pm t_{n-(k+1),\,\alpha/2}\,SE(\hat\beta_j), \qquad \text{where } SE(\hat\beta_j) = s\sqrt{v_{jj}}$$

  • Derivation of the Hypothesis Test for $\beta_j$ at the significance level $\alpha$

    Hypotheses: $H_0: \beta_j = 0$ vs. $H_a: \beta_j \ne 0$

    The test statistic is:

    $$T_0 = \frac{\hat\beta_j}{S\sqrt{v_{jj}}} \overset{H_0}{\sim} t_{n-(k+1)}$$

    The decision rule of the test is derived based on the Type I error rate $\alpha$. That is,

    $$P(\text{Reject } H_0 \mid H_0 \text{ is true}) = P(|T_0| \ge c) = \alpha \quad \Rightarrow \quad c = t_{n-(k+1),\,\alpha/2}$$

    Therefore, we reject $H_0$ at the significance level $\alpha$ if and only if $|t_0| \ge t_{n-(k+1),\,\alpha/2}$, where $t_0$ is the observed value of $T_0$.

  • Another Hypothesis Test

    Now consider:

    $$H_0: \beta_j = 0 \text{ for all } 0 \le j \le k \quad \text{vs.} \quad H_a: \beta_j \ne 0 \text{ for at least one } 0 \le j \le k$$

    When $H_0$ is true, the test statistic is

    $$F_0 = \frac{MSR}{MSE} \sim f_{k,\,n-(k+1)}$$

    An alternative and equivalent way to make a decision for a statistical test is through the p-value, defined as:

    p = P(observe a test statistic value at least as extreme as the one observed | $H_0$)

    At the significance level $\alpha$, we reject $H_0$ if and only if p < $\alpha$.

  • The General Hypothesis Test

    Consider the full model:

    $$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i \quad (i = 1, 2, \ldots, n)$$

    Now consider a partial model:

    $$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{k-m} x_{i,k-m} + \epsilon_i \quad (i = 1, 2, \ldots, n)$$

    Hypotheses:

    $$H_0: \beta_{k-m+1} = \cdots = \beta_k = 0 \quad \text{vs.} \quad H_a: \beta_j \ne 0 \text{ for at least one } k-m+1 \le j \le k$$

    Test statistic:

    $$F_0 = \frac{(SSE_m - SSE_k)/m}{SSE_k/[n-(k+1)]} \sim f_{m,\,n-(k+1)}$$

    Reject $H_0$ when $F_0 > f_{m,\,n-(k+1),\,\alpha}$.
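    As a sketch of how this extra-sum-of-squares F test can be run in practice (the data set mydata and the variables below are hypothetical, not from the slides), PROC REG's TEST statement computes exactly this partial F statistic:

    /* Sketch: partial F test of H0: beta_3 = beta_4 = 0 in the full model,
       F = [(SSE_m - SSE_k)/m] / {SSE_k/[n-(k+1)]}, here with m = 2. */
    proc reg data=mydata;
       model y = x1 x2 x3 x4;
       Partial: test x3 = 0, x4 = 0;
    run;
    quit;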

  • Estimating and Predicting Future Observations

    Let $\mathbf{x}^* = (x_0^*, x_1^*, \ldots, x_k^*)'$ (with $x_0^* = 1$ multiplying the intercept), and let

    $$\mu^* = \beta_0 + \beta_1 x_1^* + \cdots + \beta_k x_k^* = \mathbf{x}^{*\prime}\boldsymbol\beta, \qquad \hat\mu^* = \hat Y^* = \hat\beta_0 + \hat\beta_1 x_1^* + \cdots + \hat\beta_k x_k^* = \mathbf{x}^{*\prime}\hat{\boldsymbol\beta}$$

    The pivotal quantity for $\mu^*$ is

    $$T = \frac{\hat\mu^* - \mu^*}{s\sqrt{\mathbf{x}^{*\prime}V\mathbf{x}^*}} \sim t_{n-(k+1)}$$

    Using this pivotal quantity, we can derive a CI for the estimated mean $\mu^*$:

    $$\hat\mu^* \pm t_{n-(k+1),\,\alpha/2}\,s\sqrt{\mathbf{x}^{*\prime}V\mathbf{x}^*}$$

    Additionally, we can derive a prediction interval (PI) to predict $Y^*$:

    $$\hat Y^* \pm t_{n-(k+1),\,\alpha/2}\,s\sqrt{1 + \mathbf{x}^{*\prime}V\mathbf{x}^*}$$
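    A sketch of how these intervals are obtained in SAS (the data set and variable names are hypothetical): supply the point $\mathbf{x}^*$ as an extra observation with a missing response, which PROC REG scores but excludes from the fit, and request the CLM (CI for the mean) and CLI (PI for a new observation) options:

    data withnew;
       set mydata end=last;
       output;
       if last then do;               /* append the point x* with missing y */
          x1 = 10; x2 = 3; y = .;
          output;
       end;
    run;

    proc reg data=withnew;
       model y = x1 x2 / clm cli;     /* default alpha = 0.05, i.e. 95% CI and PI */
    run;
    quit;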

  • 3. Topics in Regression Modeling

  • 3.1 Multicollinearity

    Definition. The predictor variables are linearly dependent.

    This can cause serious numerical and statistical difficulties in fitting the regression model unless extra predictor variables are deleted.

  • How does multicollinearity cause difficulties?

    $\hat{\boldsymbol\beta} = (X'X)^{-1}X'\mathbf{y}$ is the solution to the equation $X'X\hat{\boldsymbol\beta} = X'\mathbf{y}$. Thus $X'X$ must be invertible in order for $\hat{\boldsymbol\beta}$ to be unique and computable.

    If approximate multicollinearity happens:

    1. $X'X$ is nearly singular, which makes $\hat{\boldsymbol\beta}$ numerically unstable. This is reflected in large changes in the magnitudes of the estimates with small changes in the data.

    2. The matrix $V = (X'X)^{-1}$ has very large elements. Therefore the variances $Var(\hat\beta_j) = \sigma^2 v_{jj}$ are large, which makes the $\hat\beta_j$ statistically nonsignificant.

  • Measures of Multicollinearity

    1. The correlation matrix R. Easy, but it cannot reflect linear relationships among more than two variables.

    2. The determinant of R can be used as a measure of the singularity of $X'X$.

    3. Variance Inflation Factors (VIF): the diagonal elements of $R^{-1}$. VIF > 10 is regarded as unacceptable.

  • 3.2 Polynomial Regression

    A special case of a linear model:

    $$y = \beta_0 + \beta_1 x + \cdots + \beta_k x^k + \epsilon$$

    Problems:

    1. The powers of x, i.e., $x, x^2, \ldots, x^k$, tend to be highly correlated.

    2. If k is large, the magnitudes of these powers tend to vary over a rather wide range.

    So, set $k \le 5$.

  • Solutions

    1. Centering the x-variable:

    $$y = \beta_0^* + \beta_1^*(x - \bar x) + \cdots + \beta_k^*(x - \bar x)^k + \epsilon$$

    Effect: removes the non-essential multicollinearity in the data.

    2. Furthermore, we can standardize the data by dividing by the standard deviation $s_x$ of $x$:

    $$\frac{x - \bar x}{s_x}$$

    Effect: helps to alleviate the second problem (see the SAS sketch below).

    3. Use the first few principal components of the original variables instead of the original variables.
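    A sketch of solutions 1 and 2 in SAS (data set and variable names are hypothetical): PROC STANDARD centers and scales x, and the powers are then formed from the standardized variable:

    /* Sketch: standardize x, then fit a quadratic in the standardized x
       to reduce the correlation between x and x^2. */
    proc standard data=mydata mean=0 std=1 out=stdzd;
       var x;                          /* x becomes (x - xbar)/s_x */
    run;

    data stdzd;
       set stdzd;
       x2 = x*x;                       /* quadratic term in the standardized x */
    run;

    proc reg data=stdzd;
       model y = x x2;
    run;
    quit;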

  • 3.3 Dummy Predictor Variables & The General Linear Model

    How do we handle categorical predictor variables?

    1. If we have categories of an ordinal variable, such as the prognosis of a patient (poor, average, good), one can assign numerical scores to the categories (poor = 1, average = 2, good = 3).

  • 2. If we have a nominal variable with $c \ge 2$ categories, use $c-1$ indicator variables $x_1, \ldots, x_{c-1}$, called dummy variables, to code it:

    $x_i = 1$ for the $i$th category ($1 \le i \le c-1$),

    $x_1 = \cdots = x_{c-1} = 0$ for the $c$th category.

  • Why don't we just use c indicator variables $x_1, x_2, \ldots, x_c$?

    If we did, there would be a linear dependency among them:

    $$x_1 + x_2 + \cdots + x_c = 1$$

    This would cause multicollinearity.

  • Example of the dummy variables

    For instance, suppose we have four years of quarterly sales data for a certain brand of soda. How can we model the time trend by fitting a multiple regression equation?

    Solution: We use the quarter as a predictor variable $x_1$. To model the seasonal trend, we use indicator variables $x_2$, $x_3$, $x_4$ for Winter, Spring and Summer, respectively; for Fall, all three equal zero. That is: Winter = (1,0,0), Spring = (0,1,0), Summer = (0,0,1), Fall = (0,0,0). Then we have the model (a SAS sketch of this coding follows below):

    $$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \epsilon_i \quad (i = 1, \ldots, 16)$$
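    A sketch of this coding in SAS (the input data set soda with variables quarter, season, and sales is hypothetical):

    /* Sketch: build the three seasonal dummies; Fall is the reference. */
    data soda2;
       set soda;
       x2 = (season = 'Winter');       /* 1 if Winter, else 0 */
       x3 = (season = 'Spring');
       x4 = (season = 'Summer');       /* Fall: x2 = x3 = x4 = 0 */
    run;

    proc reg data=soda2;
       model sales = quarter x2 x3 x4;
    run;
    quit;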

  • 3. Once the dummy variables are included, the resulting regression model is referred to as a General Linear Model.

    This term must be differentiated from the Generalized Linear Model, which includes the General Linear Model as a special case with the identity link function:

    $$\mu_i = E(Y_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}$$

    The generalized linear model links the model parameters to the predictors through a link function. As another example, we will check out the logit link in the logistic regression this afternoon.

  • 4. Example

    Here we revisit the classic "Regression towards Mediocrity in Hereditary Stature" by Francis Galton. He performed a simple regression to predict offspring height based on the average parent height. The slope of the regression line was less than 1, showing that extremely tall parents had less extremely tall children. At the time, Galton did not have multiple regression as a tool, so he had to use other methods to account for the difference between male and female heights. We can now perform multiple regression on parent-offspring height and use multiple variables as predictors.

  • Example

    OUR MODEL:

    $$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$$

    Y = height of child
    x1 = height of father
    x2 = height of mother
    x3 = gender of child

  • Example

    In matrix notation:

    $$\mathbf{Y} = X\boldsymbol\beta + \boldsymbol\epsilon, \qquad \hat{\boldsymbol\beta} = (X'X)^{-1}X'\mathbf{y}$$

    We find that:

    $$\hat\beta_0 = 15.34476, \quad \hat\beta_1 = 0.405978, \quad \hat\beta_2 = 0.321495, \quad \hat\beta_3 = 5.22595$$

  • Example

    Ten example rows of the design matrix X (intercept, father's height, mother's height, gender) and the response (child's height):

         Father   Mother   Gender   Child
    1    78.5     67       1        73.2
    1    78.5     67       1        69.2
    1    78.5     67       0        69
    1    78.5     67       0        69
    1    75.5     66.5     1        73.5
    1    75.5     66.5     1        72.5
    1    75.5     66.5     0        65.5
    1    75.5     66.5     0        65.5
    1    75       64       1        71
    1    75       64       0        68

  • Example

    Important calculations:

    $$SSE = \sum_i (y_i - \hat y_i)^2 = 4149.162, \qquad SST = \sum_i (y_i - \bar y)^2 = 11{,}515.060$$

    where $\hat y_i = \mathbf{x}_i'\hat{\boldsymbol\beta}$ is the predicted height of each child given its predictor values.

    $$r^2 = 1 - \frac{SSE}{SST} = 0.6397, \qquad MSE = \frac{SSE}{n-(k+1)} = 4.64$$

  • Example

    Are these values significantly different from zero?

    $$H_0: \beta_j = 0 \quad \text{vs.} \quad H_a: \beta_j \ne 0$$

    Reject $H_{0j}$ if

    $$|t| = \left|\frac{\hat\beta_j}{SE(\hat\beta_j)}\right| > t_{n-(k+1),\,\alpha/2} = t_{894,\,.025} = 2.245$$

    $$V = (X'X)^{-1} = \begin{bmatrix} 1.63 & 0.0118 & 0.0125 & 0.00596 \\ 0.0118 & 0.000184 & 0.0000143 & 0.0000223 \\ 0.0125 & 0.0000143 & 0.000211 & 0.0000327 \\ 0.00596 & 0.0000223 & 0.0000327 & 0.00447 \end{bmatrix}$$

    $$SE(\hat\beta_j) = \sqrt{MSE \cdot v_{jj}}$$

  • Example

    Coefficient       Estimate   SE       t
    Intercept         15.3       2.75     5.59*
    Father height     0.406      0.0292   13.9*
    Mother height     0.321      0.0313   10.3*
    Gender            5.23       0.144    36.3*

    * p < 0.05
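    For instance (a worked example added here, using the estimates above and the critical value quoted on the previous slide), the confidence interval for the father-height coefficient is:

    $$\hat\beta_1 \pm t_{894,\,.025}\,SE(\hat\beta_1) = 0.406 \pm 2.245 \times 0.0292 \approx (0.340,\ 0.472)$$

    Since the interval excludes 0, this agrees with the starred t-test above.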

  • Example

    Testing the model as a whole:

    $$H_0: \beta_0 = \beta_1 = \beta_2 = \beta_3 = 0 \quad \text{vs.} \quad H_a: \text{the above is not true}$$

    Reject $H_0$ if

    $$F = \frac{r^2\,[n-(k+1)]}{k\,(1-r^2)} > f_{k,\,n-(k+1),\,\alpha} = f_{3,\,894,\,.05} = 2.615$$

    with

    $$r^2 = 0.6397, \qquad SSE = \sum_i (y_i - \hat y_i)^2 = 4149.162, \qquad SST = \sum_i (y_i - \bar y)^2 = 11{,}515.060$$

    Since F = 529.032 > 2.615, we reject $H_0$ and conclude that our model predicts height better than chance.

  • Example: Making Predictions

    Let's say George Clooney (71 inches) and Madonna (64 inches) have a baby boy.

    $$\hat Y^* = 15.34476 + 0.405978(71) + 0.321495(64) + 5.225951(1) = 69.97$$

    95% prediction interval:

    $$\hat Y^* \pm t_{894,\,.025}\sqrt{MSE\,(1 + \mathbf{x}^{*\prime}V\mathbf{x}^*)} = 69.97 \pm 4.84 = (65.13,\ 74.81)$$

  • Example: SAS code

    ods graphics on;

    data revise;
       set mylib.galton;
       if sex = 'M' then gender = 1.0;
       else gender = 0.0;
    run;

    proc reg data=revise;
       title "Dependence of Child Heights on Parental Heights";
       model height = father mother gender / vif;
    run;
    quit;

    Alternatively, one can use the PROC GLM procedure, which can incorporate the categorical variable (sex) directly via the CLASS statement.

  • Dependence of Child Heights on Parental Heights

    The REG Procedure
    Model: MODEL1
    Dependent Variable: height

    Number of Observations Read    898
    Number of Observations Used    898

    Analysis of Variance

    Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
    Model               3        7365.90034     2455.30011     529.03    <.0001
    Error             894        4149.162           4.64
    Corrected Total   897       11515.060

    Parameter Estimates

    Variable      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
    Intercept      1            15.34476             2.74696       5.59     <.0001
    father         1             0.40598             0.0292       13.9      <.0001
    mother         1             0.32150             0.0313       10.3      <.0001
    gender         1             5.22595             0.144        36.3      <.0001

  • Example

    [Figures shown on these slides are not included in the transcript.]

    By Gary Bedford & Christine Vendikos

  • 5. Variable Selection Methods

    A. Stepwise Regression

  • Variable selection methods

    (1) Why do we need to select the variables?

    (2) How do we select variables?
    * stepwise regression
    * best subsets regression

  • Stepwise Regression

    (p-1)-variable model:

    $$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \epsilon_i$$

    p-variable model:

    $$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \beta_p x_{ip} + \epsilon_i$$

  • Partial F-test

    Hypothesis test:

    $$H_0: \beta_p = 0 \quad \text{vs.} \quad H_1: \beta_p \ne 0$$

    Test statistic (partial F):

    $$F_p = \frac{(SSE_{p-1} - SSE_p)/1}{SSE_p/[n-(p+1)]}$$

    Reject $H_0$ if $F_p > f_{1,\,n-(p+1),\,\alpha}$.

    Equivalent t-test: with test statistic $t_p = \hat\beta_p / SE(\hat\beta_p)$, reject $H_0$ if $|t_p| > t_{n-(p+1),\,\alpha/2}$.

  • Partial correlation coefficients

    $$r^2_{yx_p \mid x_1 \cdots x_{p-1}} = \frac{SSE(x_1, \ldots, x_{p-1}) - SSE(x_1, \ldots, x_p)}{SSE(x_1, \ldots, x_{p-1})}$$

    $$F_p = t_p^2 = \frac{r^2_{yx_p \mid x_1 \cdots x_{p-1}}\,[n-(p+1)]}{1 - r^2_{yx_p \mid x_1 \cdots x_{p-1}}}$$

  • 5. Variable Selection Methods

    A. Stepwise Regression: SAS Example

  • Example 11.5 (T&D pg. 416), 11.9 (T&D pg. 431)

    The following table shows data on the heat evolved in calories during the hardening of cement on a per gram basis (y) along with the percentages of four ingredients: tricalcium aluminate (x1), tricalcium silicate (x2), tetracalcium alumino ferrite (x3), and dicalcium silicate (x4).

    No.   X1   X2   X3   X4      Y
     1     7   26    6   60    78.5
     2     1   29   15   52    74.3
     3    11   56    8   20   104.3
     4    11   31    8   47    87.6
     5     7   52    6   33    95.9
     6    11   55    9   22   109.2
     7     3   71   17    6   102.7
     8     1   31   22   44    72.5
     9     2   54   18   22    93.1
    10    21   47    4   26   115.9
    11     1   40   23   34    83.8
    12    11   66    9   12   113.3
    13    10   68    8   12   109.4

  • SAS Program (stepwise variable selection is used)

    data example115;
       input x1 x2 x3 x4 y;
       datalines;
    7 26 6 60 78.5
    1 29 15 52 74.3
    11 56 8 20 104.3
    11 31 8 47 87.6
    7 52 6 33 95.9
    11 55 9 22 109.2
    3 71 17 6 102.7
    1 31 22 44 72.5
    2 54 18 22 93.1
    21 47 4 26 115.9
    1 40 23 34 83.8
    11 66 9 12 113.3
    10 68 8 12 109.4
    ;
    run;

    proc reg data=example115;
       model y = x1 x2 x3 x4 / selection=stepwise;
    run;

  • Selected SAS output

    The REG Procedure
    Model: MODEL1
    Dependent Variable: y

    Stepwise Selection: Step 4

                 Parameter    Standard
    Variable      Estimate       Error    Type II SS    F Value    Pr > F
    Intercept     52.57735     2.28617    3062.60416     528.91    <.0001

  • SAS Output (cont.)

    All variables left in the model are significant at the 0.1500 level.
    No other variable met the 0.1500 significance level for entry into the model.

    Summary of Stepwise Selection

            Variable    Variable    Number     Partial     Model
    Step    Entered     Removed     Vars In    R-Square    R-Square       C(p)    F Value    Pr > F
    1       x4                      1          0.6745      0.6745      138.731      22.80    0.0006
    2       x1                      2          0.2979      0.9725       5.4959     108.22    <.0001

  • 5. Variable Selection Methods

    B. Best Subsets Regression

  • Best Subsets Regression

    For the stepwise regression algorithm, the final model is not guaranteed to be optimal in any specified sense.

    In best subsets regression, a subset of variables is chosen from the collection of all subsets of the k predictor variables that optimizes a well-defined objective criterion.

  • Best Subsets Regression

    In stepwise regression, we get only one single final model.

    In best subsets regression, the investigator can specify a size for the number of predictors in the model.

  • Best Subsets Regression

    Optimality criteria:

    $r_p^2$-criterion:

    $$r_p^2 = \frac{SSR_p}{SST} = 1 - \frac{SSE_p}{SST}$$

    Adjusted $r_p^2$-criterion:

    $$r_{p,\mathrm{adj}}^2 = 1 - \frac{MSE_p}{MST}$$

    $C_p$-criterion (recommended for its ease of computation and its ability to judge the predictive power of a model):

    $$\Gamma_p = \frac{1}{\sigma^2}\sum_{i=1}^{n} E\left[\hat Y_{ip} - E(Y_i)\right]^2$$

    The sample estimator, Mallows' $C_p$-statistic, is given by

    $$C_p = \frac{SSE_p}{\hat\sigma^2} + 2(p+1) - n$$

    where $\hat\sigma^2$ is the MSE of the full model; models whose $C_p$ is small and close to $p+1$ are preferred.

  • Best Subsets Regression

    Algorithm. Note that our problem is to find the minimum of a given criterion function. Possible approaches:

    Use the stepwise regression algorithm and replace the partial F criterion with another criterion such as $C_p$.

    Enumerate all possible cases and find the minimum of the criterion function.

    Other possibilities?

  • Best Subsets Regression & SAS

    proc reg data=example115;
       model y = x1 x2 x3 x4 / selection=ADJRSQ;
    run;

    For the SELECTION= option, SAS has implemented 9 methods in total. For the best subset method, we have the following options:

    Maximum $R^2$ Improvement (MAXR)
    Minimum $R^2$ Improvement (MINR)
    $R^2$ Selection (RSQUARE)
    Adjusted $R^2$ Selection (ADJRSQ)
    Mallows' $C_p$ Selection (CP)

  • 6. Building a Multiple Regression Model

    Steps and Strategy

  • Modeling is an iterative process. Several cycles of the steps may be needed before arriving at the final model.

    The basic process consists of seven steps.

  • Get Started and Follow the Steps

    Categorization by Usage → Collect the Data → Explore the Data → Divide the Data → Fit Candidate Models → Select and Evaluate → Select the Final Model

  • Step I

    Decide the type of model needed, according to the intended usage. The main categories include:

    Predictive
    Theoretical
    Control
    Inferential
    Data Summary

    Sometimes a model serves multiple purposes.

  • Step II

    Collect the data:

    Predictor (X)
    Response (Y)

    The data should be relevant and bias-free.

  • Step III

    Explore the data.

    The linear regression model is sensitive to noise, so outliers and influential observations should be treated cautiously.

  • Step IV

    Divide the data:

    Training set: for building the model
    Test set: for checking the model

    How to divide?
    Large sample: half and half
    Small sample: the size of the training set should be greater than 16

  • Step V

    Fit several candidate models using the training set.

  • Step VI

    Select and evaluate a good model, checking for and remedying violations of the model assumptions.

  • Step VII

    Select the final model. Use the test set to compare competing models by cross-validating them.

  • Regression Diagnostics (Step VI)

    Graphical analysis of residuals (a SAS sketch follows below):

    Plot the estimated errors vs. the $x_i$ values. The differences between the actual $y_i$ and the predicted $\hat y_i$ are the estimated errors, called residuals.
    Plot a histogram or stem-and-leaf display of the residuals.

    Purposes:

    Examine the functional form (linearity)
    Evaluate violations of assumptions
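    A sketch of the basic residual plot in SAS (data set and variable names are hypothetical): save the residuals and fitted values with an OUTPUT statement, then plot them:

    /* Sketch: residuals vs. fitted values for a fitted multiple regression. */
    proc reg data=mydata;
       model y = x1 x2;
       output out=diag r=resid p=fitted;   /* residuals and predicted values */
    run;
    quit;

    proc sgplot data=diag;
       scatter x=fitted y=resid;
       refline 0 / axis=y;                 /* residuals should scatter evenly about 0 */
    run;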

  • Linear Regression Assumptions

    The mean of the probability distribution of the error is 0
    The probability distribution of the error has constant variance
    The probability distribution of the error is normal
    Errors are independent

  • Residual Plot for Functional Form (Linearity)

    [Two plots of residuals $e$ vs. $X$ from the original slide, titled "Add X^2 Term" and "Correct Specification".]

  • Residual Plot for Equal Variance

    [Two plots of standardized residuals (SR) vs. $X$ from the original slide, titled "Unequal Variance" (fan-shaped) and "Correct Specification".]

    Standardized residuals (the residual divided by the standard error of prediction) are typically used.

  • Residual Plot for Independence

    [Two plots of standardized residuals (SR) vs. $X$ from the original slide, titled "Not Independent" and "Correct Specification".]