
  • (Simple) Multiple linear regression and Nonlinear models

    Multiple regression

    One response (dependent) variable: Y

    More than one predictor (independent) variable: X1, X2, X3, etc.; number of predictors = p

    Number of observations = n

  • Multiple regression - graphical interpretation

    [Figure: scatterplots of Y (0 to 15) against X1 (0 to 7) and against X2 (7 to 12)]

    Multiple regression graphical explanation.syd

    Two possible single-variable models:
    1) yi = β0 + β1xi1 + εi
    2) yi = β0 + β2xi2 + εi

    Which is a better fit?

    Multiple regression - graphical interpretation

    Multiple regression graphical explanation.syd

    Two possible single-variable models:
    1) yi = β0 + β1xi1 + εi
    2) yi = β0 + β2xi2 + εi

    Which is a better fit?

    [Figure: the same two scatterplots with fitted regression lines]

    Y vs X1: P = 0.02, r² = 0.67
    Y vs X2: P = 0.61, r² = 0.00

  • Multiple regression - graphical interpretation

    Multiple regression graphical explanation.syd

    Perhaps a multiple regression model would fit better:

    yi = β0 + β1xi1 + β2xi2 + εi

    [Figure: Y vs X1 with the fitted line ŷi = b0 + b1xi1 and residuals marked; Y vs X2]

    X1    Y      expected   residual   X2
    1     4      3.02        0.98      11.5
    2     3      4.58       -1.58      9.25
    3     5      6.14       -1.14      9.25
    4     9      7.70        1.30      11.2
    5     11.5   9.26        2.24      11.9
    6     9      10.82      -1.82      8

    expected = ŷi = b0 + b1xi1 from the simple regression on X1; residual = yi - ŷi

    Multiple regression - graphical interpretation

    Multiple regression graphical explanation.syd

    Perhaps a multiple regression model would fit better:

    yi = β0 + β1xi1 + β2xi2 + εi

    [Figure: Y vs X1 with the fitted line ŷi = b0 + b1xi1; the residuals of that fit plotted against X2]

  • Multiple regression - graphical interpretation

    Perhaps a multiple regression model would fit better:

    yi = β0 + β1xi1 + β2xi2 + εi, estimated by

    ŷi = b0 + b1xi1 + b2xi2

    Whole Model

    Summary of Fit
    RSquare                        0.999469
    RSquare Adj                    0.999114
    Root Mean Square Error         0.100661
    Mean of Response               6.916667
    Observations (or Sum Wgts)     6

    Analysis of Variance
    Source     DF   Sum of Squares   Mean Square   F Ratio
    Model       2        57.177935       28.5890   2821.464
    Error       3         0.030398        0.0101   Prob > F
    C. Total    5        57.208333
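    A fit like the one above can be reproduced with any ordinary least squares routine. Below is a minimal sketch in Python using statsmodels (not part of the original handout); the data are the six observations from the worked table above, and the variable names are the sketch's own.

        import numpy as np
        import statsmodels.api as sm

        # Six observations from the worked example (columns X1, X2, Y)
        x1 = np.array([1, 2, 3, 4, 5, 6])
        x2 = np.array([11.5, 9.25, 9.25, 11.2, 11.9, 8.0])
        y = np.array([4, 3, 5, 9, 11.5, 9])

        # Design matrix with an intercept column, then the OLS fit
        X = sm.add_constant(np.column_stack([x1, x2]))
        fit = sm.OLS(y, X).fit()
        print(fit.summary())  # reports R^2, the ANOVA F test, and b0, b1, b2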

  • Simple regression results

    Multiple regression 1.syd

    [Figure: scatterplots of Y against each of X1, X2, X3 and X4, with the fitted simple regression for each]

    y = β0 + β1x4: 0.580
    y = β0 + β1x3: 0.0127
    y = β0 + β1x2: 0.366

  • Multiple regression - partial residual plots

    Multiple regression 1.syd

    y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε

    Model                              Partial residual
    y = β0 + β2x2 + β3x3 + β4x4        Ypartial(1)
    y = β0 + β1x1 + β3x3 + β4x4        Ypartial(2)
    y = β0 + β1x1 + β2x2 + β4x4        Ypartial(3)
    y = β0 + β1x1 + β2x2 + β3x3        Ypartial(4)

    [Figure: top row, partial residuals Ypartial(1) to Ypartial(4) plotted against X1 to X4; bottom row, raw data (Y) plotted against X1 to X4]

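    As a rough illustration of how the partial residuals above are built, here is a sketch in Python with statsmodels (not from the handout; the data are made up, since the handout's values live in Multiple regression 1.syd). It follows the handout's definition: Ypartial(j) is the residual from the model that omits Xj, plotted against Xj.

        import numpy as np
        import statsmodels.api as sm

        # Made-up data standing in for the handout's four predictors
        rng = np.random.default_rng(0)
        X = rng.normal(size=(30, 4))                  # columns x1..x4
        y = 2 + 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=30)

        def partial_residual(y, X, j):
            """Residuals of y regressed on every predictor except column j."""
            others = np.delete(X, j, axis=1)
            return sm.OLS(y, sm.add_constant(others)).fit().resid

        # Ypartial(1): residuals of y ~ x2 + x3 + x4, to be plotted against x1
        yp1 = partial_residual(y, X, 0)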

  • Regression models

    Linear model:

    yi = β0 + β1xi1 + β2xi2 + .... + εi

    Sample equation:

    ŷi = b0 + b1xi1 + b2xi2 + ....

    Partial regression coefficients
    H0: β1 = 0, i.e. the partial population regression coefficient (slope) for Y on X1, holding all other Xs constant, equals zero

    Example: assume Y = bird abundance, X1 = patch area and X2 = year. H0: the slope of the regression of Y against patch area, holding year constant, equals 0.

  • Multiple regression plane

    [Figure: regression plane of bird abundance against years and patch area]

    Testing H0: βi = 0

    Use partial t-tests: t = bi / SE(bi)
    Compare with a t-distribution with n - p - 1 df (the residual df)
    Separate t-test for each partial regression coefficient in the model
    Usual logic of t-tests: reject H0 if P < 0.05 (again, this is a convention; don't feel tied to it)
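    A quick sketch of the partial t-tests in Python (an assumption of this write-up, not part of the handout); statsmodels reports the same numbers directly as fit.tvalues and fit.pvalues.

        import numpy as np
        from scipy import stats
        import statsmodels.api as sm

        # Small made-up data set, just to show the arithmetic
        rng = np.random.default_rng(1)
        X = rng.normal(size=(20, 2))                    # two predictors
        y = 1 + 2 * X[:, 0] + rng.normal(size=20)

        fit = sm.OLS(y, sm.add_constant(X)).fit()
        t_stats = fit.params / fit.bse                  # t = b_i / SE(b_i)
        p_vals = 2 * stats.t.sf(np.abs(t_stats), fit.df_resid)  # df = n - p - 1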

  • Overall regression model

    H0: β1 = β2 = ... = 0 (all population slopes equal zero).

    Test of whether overall regression equation is significant.

    Use the ANOVA F-test: the ratio of variation explained by the regression to unexplained (residual) variation.

    Assumptions

    Normality and homogeneity of variance for the response variable (previously discussed)

    Independence of observations (previously discussed)

    Linearity (previously discussed)

    No collinearity (a big deal in multiple regression)

  • Collinearity

    Collinearity: predictors correlated

    Assumption of no collinearity: predictor variables uncorrelated with (i.e. independent of) each other

    Effect of collinearity: estimates of the βi and their significance tests are unreliable

    Checks for collinearity

    Correlation matrix and/or SPLOM between predictors

    Tolerance for each predictor: 1 - r² for the regression of that predictor on all the others; if tolerance is low (near 0.1) then collinearity is a problem

    VIF values: 1/tolerance (variance inflation factor); look for large values (> 10) (see the sketch after this list)

    Condition indices (not in JMP Pro): greater than 15, be cautious; greater than 30, a serious problem

    Look at all indicators to determine the extent of collinearity
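    A sketch of the tolerance and VIF checks in Python (the handout uses JMP/SYSTAT; this translation and the made-up data are assumptions of this write-up).

        import numpy as np
        import statsmodels.api as sm
        from statsmodels.stats.outliers_influence import variance_inflation_factor

        # Made-up predictors, with the first two deliberately correlated
        rng = np.random.default_rng(2)
        x1 = rng.normal(size=50)
        X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=50), rng.normal(size=50)])

        Xc = sm.add_constant(X)
        # VIF for each predictor (skip column 0, the intercept); tolerance = 1 / VIF
        vif = [variance_inflation_factor(Xc, j) for j in range(1, Xc.shape[1])]
        tolerance = [1.0 / v for v in vif]
        print(vif)        # the correlated pair shows large VIFs (> 10)
        print(tolerance)  # and correspondingly low tolerances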

  • Scatterplots

    Scatterplot matrix (SPLOM): pairwise plots for all variables

    Example: build a multiple regression model to predict total employment using the values of six independent variables. See Longley.syd

    MODEL total = CONSTANT + deflator + gnp + unemployment + armforce + population + time

    [Figure: scatterplot matrix (SPLOM) of DEFLATOR, GNP, UNEMPLOY, ARMFORCE, POPULATN and TIME]

    Look at the relationships between the predictor variables: you can immediately see collinearity problems.


  • Condition indices

    Index              1          2           3           4           5            6            7
    Condition index    1.00000    9.14172     12.25574    25.33661    230.42395    1048.08030   43275.04738

    Dependent Variable: TOTAL
    N: 16
    Multiple R: 0.998
    Squared Multiple R: 0.995
    Adjusted Squared Multiple R: 0.992
    Standard Error of Estimate: 304.854

    Effect      Coefficient    Std Error      Std Coef   Tolerance   t          P(2 Tail)
    CONSTANT    -3.48226E+06   8.90420E+05    0.00000    .           -3.91080   0.00356
    DEFLATOR    15.06187       84.91493       0.04628    0.00738     0.17738    0.86314
    GNP         -0.03582       0.03349        -1.01375   0.00056     -1.06952   0.31268
    UNEMPLOY    -2.02023       0.48840        -0.53754   0.02975     -4.13643   0.00254
    ARMFORCE    -1.03323       0.21427        -0.20474   0.27863     -4.82199   0.00094
    POPULATN    -0.05110       0.22607        -0.10122   0.00251     -0.22605   0.82621
    TIME        1829.15146     455.47850      2.47966    0.00132     4.01589    0.00304

    Tolerance and Condition Indices

    Longley.syz

    Variance Inflation Factor (VIF)

    Confidence Interval for Regression Coefficients

    95.0% Confidence Interval
    Effect      Coefficient       Lower             Upper             VIF
    CONSTANT    -3.482259E+06     -5.496529E+06     -1.467988E+06     .
    DEFLATOR    15.061872         -177.029036       207.152780        135.532438
    GNP         -0.035819         -0.111581         0.039943          1,788.513483
    UNEMPLOY    -2.020230         -3.125067         -0.915393         33.618891
    ARMFORCE    -1.033227         -1.517949         -0.548505         3.588930
    POPULATN    -0.051104         -0.562517         0.460309          399.151022
    TIME        1,829.151465      798.787513        2,859.515416      758.980597
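    Condition indices are not hard to compute by hand. The sketch below (Python with numpy/statsmodels, an assumption of this write-up) scales each column of the design matrix to unit length and compares singular values; the Longley data ship with statsmodels, though the exact values depend on the scaling convention and so may not match the SYSTAT output above.

        import numpy as np
        import statsmodels.api as sm

        # Longley data as distributed with statsmodels (total employment vs six predictors)
        data = sm.datasets.longley.load_pandas()
        X = sm.add_constant(data.exog).to_numpy()

        # Scale columns to unit length, then condition index_i = s_max / s_i
        Xs = X / np.sqrt((X ** 2).sum(axis=0))
        s = np.linalg.svd(Xs, compute_uv=False)
        print(np.sort(s.max() / s))   # values > 30 flag serious collinearity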

  • Solutions to collinearity

    Simplest - Drop redundant (correlated) predictors

    Principal components regression potentially useful

    Best model?

    Model that best fits the data with fewest predictors

    Criteria for comparing the fit of different models:
    r²: generally unsuitable
    adjusted r²: better
    Mallows' Cp: better
    AIC: best; lower values indicate better fit

  • Explained variance

    r²: the proportion of variation in Y explained by the linear relationship with X1, X2, etc.

    r² = SS(Regression) / SS(Total)

    Screening models

    All subsets: recommended, but there are many models if there are many predictors (a big problem); see the sketch after this list

    Automated stepwise selection (forward, backward, stepwise): NOT recommended unless you get the same model both ways; check AIC values

    Hierarchical partitioning: contribution of each predictor to r²
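    The all-subsets screen can be sketched in a few lines of Python (an assumption of this write-up, not the handout's JMP workflow); note that the number of fits grows as 2^p, which is the "big problem" with many predictors.

        import itertools
        import numpy as np
        import statsmodels.api as sm

        def all_subsets(y, X, names):
            """Fit every non-empty subset of predictors; return them sorted by AIC."""
            results = []
            for k in range(1, X.shape[1] + 1):
                for combo in itertools.combinations(range(X.shape[1]), k):
                    fit = sm.OLS(y, sm.add_constant(X[:, list(combo)])).fit()
                    results.append(([names[j] for j in combo], fit.aic, fit.rsquared))
            return sorted(results, key=lambda r: r[1])   # lowest AIC first

        # usage (with an n x 4 predictor array X and response y):
        # all_subsets(y, X, ["x1", "x2", "x3", "x4"])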

  • Model comparison (simple version)

    Fit the full model: y = β0 + β1x1 + β2x2 + β3x3 + ε

    Fit reduced models (e.g.): y = β0 + β2x2 + β3x3 + ε

    Compare

    Multiple regression 1

    [Figure: scatterplot matrix (SPLOM) of X1, X2, X3, X4 and Y]

    y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε

    Any evidence of collinearity?

    Model Building

  • Again, check for collinearity

    Compare Models using AIC

    Model 1: y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε
    AIC 78.67, corrected AIC (AICc) 85.67

    Model 2: y = β0 + β1x1 + β2x2 + β3x3 + ε
    AIC 77.06, corrected AIC (AICc) 81.67

  • Formally: Akaike information criterion (AIC, AICc)

    Sometimes the following equation is used: AIC = 2k + n[ln(RSS/n)]

    where k = number of fitted parameters, n = number of observations, RSS = residual sum of squares.

    AICc = AIC corrected for small sample size. A lower score means a better fit.

    AIC:  AIC = n ln(RSS/n) + 2k
    AICc: AICc = AIC + 2k(k + 1) / (n - k - 1)
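    Written as code, the slide's formulas look like this (a sketch; packages differ by additive constants, so only differences in AIC between models fitted to the same data are meaningful).

        import numpy as np

        def aic_from_rss(rss, n, k):
            """AIC = n*ln(RSS/n) + 2k, as on the slide (constants dropped)."""
            return n * np.log(rss / n) + 2 * k

        def aicc_from_rss(rss, n, k):
            """Small-sample correction: AICc = AIC + 2k(k+1)/(n - k - 1)."""
            return aic_from_rss(rss, n, k) + 2 * k * (k + 1) / (n - k - 1)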

    Model Selection: All Possible Models

    Ordered up to the best 4 models, with up to 4 terms per model.

    Model            Number   RSquare   RMSE      AICc      BIC
    X1               1        0.9702    17.5561   168.292   169.525
    X3               1        0.3134    84.2609   227.895   229.129
    X2               1        0.0482    99.2053   234.100   235.333
    X4               1        0.0184    100.748   234.686   235.919
    X1,X2            2        0.9963    6.3536    131.774   132.695
    X1,X3            2        0.9767    16.0121   166.899   167.819
    X1,X4            2        0.9718    17.5913   170.473   171.394
    X3,X4            2        0.3346    85.4973   230.554   231.475
    X1,X2,X3         3        0.9998    1.5903    81.6718   81.7786
    X1,X2,X4         3        0.9964    6.4809    135.060   135.167
    X1,X3,X4         3        0.9789    15.7401   168.780   168.887
    X2,X3,X4         3        0.3440    87.6765   234.042   234.149
    X1,X2,X3,X4      4        0.9998    1.6295    85.6721   84.3388

  • How important is each predictor variable to the model?

    Compare models: sequential sums of squares

    Model                                    Adjusted r²   Contribution to model r²
    y = β0 + β1x1                            0.96844       0.96844
    y = β0 + β1x1 + β2x2                     0.99587       0.02743
    y = β0 + β1x1 + β2x2 + β3x3              0.99974       0.00387
    y = β0 + β1x1 + β2x2 + β3x3 + β4x4       0.99973       -0.00001

    For reference: the full model is y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε (output shown earlier).

    (Simple) Non-linear regression models

  • Non-linear regression

    Use when you cannot easily linearize a relationship (that is, one that is clearly non-linear)

    One response (dependent) variable: Y

    One predictor (independent) variable: X1

    Non-linear functions (of many types)

    Regression models

    Linear model:

    yi = β0 + β1xi1 + εi

    Non-linear model (one of many possible):

    yi = β0 + β1xi1² + εi

  • Non-linear regression

    What is the hypothesis?

    This is a very big question; let's come back to it. What does r² mean?

    In linear regression it is the explained variance divided by total variance

    In non-linear regression it is the same, but the explained variance can be calculated in two ways:

    Raw r²: based on Σ yi²

    Mean-corrected r²: based on Σ (yi - ȳ)²
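    The two versions differ only in the denominator, as in this short sketch (Python, an assumption of this write-up):

        import numpy as np

        def r2_raw(y, y_hat):
            """Raw r^2: 1 - SS(residual) / sum(y_i^2), no mean correction."""
            return 1 - np.sum((y - y_hat) ** 2) / np.sum(y ** 2)

        def r2_mean_corrected(y, y_hat):
            """Mean-corrected r^2: 1 - SS(residual) / sum((y_i - ybar)^2)."""
            return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)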

    Non-linear regression

    What is the hypothesis??

    [Figure: scatterplot of Y (0 to 60) against X (0 to 16)]

  • Non-linear regression (for example)

    Fit Curve: y = a + b*Exp(c*x)

    Model Comparison
    Model            AICc        BIC         SSE         MSE         RMSE        R-Square
    Exponential 3P   81.089952   79.922153   87.897377   7.3247814   2.7064333   0.9729491

    [Figure: data with the fitted Exponential 3P curve, Y against X (0 to 15)]

    Parameter Estimates
    Parameter     Estimate    Std Error   Lower 95%   Upper 95%
    Asymptote     1.7609613   1.9559091   -2.07255    5.5944727
    Scale         1.5794384   0.7859004   0.039102    3.1197748
    Growth Rate   0.2293354   0.032577    0.1654857   0.2931851

    What are the hypotheses?

    Non-linear regression (many models might be adequate)

    What are the hypotheses?

    Exponential 2P: Y = a*Exp(b*X)

    Exponential 3P: Y = a + b*Exp(c*X)

    Polynomial cubic: Y = a + b*X + c*X² + d*X³

  • What are the hypotheses?

    Exponential 2P: Y = a*Exp(b*X)  (parameters a, b)

    Exponential 3P: Y = a + b*Exp(c*X)  (parameters a, b, c)

    Polynomial cubic: Y = a + b*X + c*X² + d*X³  (parameters a, b, c, d)
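    Models like these are fitted by nonlinear least squares. Below is a minimal sketch with scipy's curve_fit for the Exponential 3P form (the handout uses JMP's Fit Curve; the data here are made up, with parameter values loosely based on the estimates shown earlier).

        import numpy as np
        from scipy.optimize import curve_fit

        # Exponential 3P model from the slide: Y = a + b*exp(c*X)
        def exp3p(x, a, b, c):
            return a + b * np.exp(c * x)

        rng = np.random.default_rng(3)
        x = np.linspace(0, 15, 25)
        y = 1.8 + 1.6 * np.exp(0.23 * x) + rng.normal(0, 2, 25)

        # Nonlinear least squares needs starting values; poor ones may not converge
        params, cov = curve_fit(exp3p, x, y, p0=[1.0, 1.0, 0.1])
        se = np.sqrt(np.diag(cov))    # standard errors of a, b, c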

    Comparing regression models

    Evaluate assumptions: sometimes (as in the examples here) there are violations

    Simple (but not always correct): compare adjusted r². Problem: what counts as a meaningful difference? This is particularly problematic when models differ in the number of estimated parameters

    One solution: compare the added fit to the added fit expected simply from the increased number of parameters (see the F-test sketch after this list)

    One major restriction: nested models are easier to compare. Nested means the general form is the same, or can be made the same, simply by modifying parameter values
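    For nested linear models, "added fit versus expected added fit" is the extra sum-of-squares F test. A sketch in Python (an assumption of this write-up; statsmodels also provides it directly as fit_full.compare_f_test(fit_reduced)):

        from scipy import stats

        def nested_f_test(fit_reduced, fit_full):
            """Extra sum-of-squares F test for two nested OLS fits."""
            df_diff = fit_reduced.df_resid - fit_full.df_resid
            num = (fit_reduced.ssr - fit_full.ssr) / df_diff    # added fit per extra parameter
            den = fit_full.ssr / fit_full.df_resid              # residual mean square, full model
            F = num / den
            p = stats.f.sf(F, df_diff, fit_full.df_resid)
            return F, p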

  • Non-linear regression (many models might be adequate)

    What are the hypotheses?

    Fit Curve

    Model Comparison
    Model            AICc        AICc Weight   BIC         SSE         MSE         RMSE        R-Square
    Exponential 2P   78.068182   0.810952      78.010515   92.690324   7.1300249   2.6702106   0.971474
    Exponential 3P   81.089952   0.1789889     79.922153   87.897377   7.3247814   2.7064333   0.9729491
    Cubic            86.847655   0.010059      83.72124    94.528911   8.5935373   2.9314736   0.9709082

    [Figure: data with the three fitted curves, Y against X (0 to 15)]

    Exponential 2P: Y = a*Exp(b*X)
    Exponential 3P: Y = a + b*Exp(c*X)
    Polynomial cubic: Y = a + b*X + c*X² + d*X³

    Multiple and Non-Linear Regression

    Be careful! Know what your hypotheses are.

    Understand how to build models to test your hypotheses.

    Understand the statistical output: you may be misled if you don't.