-
(Simple) Multiple linear regression and Nonlinear models
Multiple regression
One response (dependent) variable: Y
More than one predictor (independent) variable: X1, X2, X3, etc.; number of predictors = p
Number of observations = n
-
Multiple regression - graphical interpretation
[Two scatterplots: Y (0–15) against X1 (0–7), and Y (0–15) against X2 (7–12)]
Multiple regression graphical explanation.syd
Two possible single-variable models:
1) yi = β0 + β1xi1 + εi
2) yi = β0 + β2xi2 + εi
Which is a better fit?
Multiple regression - graphical interpretation
Multiple regression graphical explanation.syd
Two possible single-variable models:
1) yi = β0 + β1xi1 + εi
2) yi = β0 + β2xi2 + εi
Which is a better fit?
[The same two scatterplots with fitted lines: Y vs X1 (P = 0.02, r² = 0.67); Y vs X2 (P = 0.61, r² = 0.00)]
-
Multiple regression - graphical interpretation
Multiple regression graphical explanation.syd
Perhaps a multiple regression model would fit better:
yi = β0 + β1xi1 + β2xi2 + εi
[Scatterplot: Y (0–15) against X1 (0–7) with the fitted line ŷi = b0 + b1xi1; each residual is the vertical distance from a point (labelled 1–6) to the line]

X1   Y      expected   residual   X2
1    4      3.02       0.98       11.5
2    3      4.58       -1.58      9.25
3    5      6.14       -1.14      9.25
4    9      7.70       1.30       11.2
5    11.5   9.26       2.24       11.9
6    9      10.82      -1.82      8
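A minimal sketch in Python (assuming numpy is available) of the single-predictor fit in this table; np.polyfit recovers the least-squares line and reproduces the "expected" and "residual" columns up to rounding, and the residuals turn out to be strongly correlated with X2:

    import numpy as np

    x1 = np.array([1, 2, 3, 4, 5, 6])
    x2 = np.array([11.5, 9.25, 9.25, 11.2, 11.9, 8.0])
    y  = np.array([4.0, 3, 5, 9, 11.5, 9])

    b1, b0 = np.polyfit(x1, y, deg=1)   # least-squares slope and intercept
    expected = b0 + b1 * x1             # the "expected" column above
    residual = y - expected             # the "residual" column above

    print(round(b0, 2), round(b1, 2))                  # ~1.47 and ~1.56
    print(np.round(expected, 2), np.round(residual, 2))
    print(np.corrcoef(x2, residual)[0, 1])             # ~0.97: X2 explains what X1 missed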
Multiple regression - graphical interpretation
Multiple regression graphical explanation.syd
Perhaps a multiple regression model would fit better:
yi = β0 + β1xi1 + β2xi2 + εi
[Left: Y vs X1 with the fitted line ŷi = b0 + b1xi1. Right: the residuals of Y from that fit (-2 to 3) plotted against X2 (7–12)]
-
Multiple regression - graphical interpretation
Perhaps a multiple regression model would fit better:
yi = β0 + β1xi1 + β2xi2 + εi, estimated by ŷi = b0 + b1xi1 + b2xi2
Whole Model
Summary of Fit
RSquare                       0.999469
RSquare Adj                   0.999114
Root Mean Square Error        0.100661
Mean of Response              6.916667
Observations (or Sum Wgts)    6

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio    Prob > F
Model      2    57.177935        28.5890       2821.464   <0.0001
Error      3    0.030398         0.0101
C. Total   5    57.208333
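A sketch of the same two-predictor fit via statsmodels (an assumption; any OLS routine would do). The summary reproduces this style of whole-model output, including the F ratio and the partial t-tests:

    import numpy as np
    import statsmodels.api as sm

    x1 = np.array([1, 2, 3, 4, 5, 6])
    x2 = np.array([11.5, 9.25, 9.25, 11.2, 11.9, 8.0])
    y  = np.array([4.0, 3, 5, 9, 11.5, 9])

    X = sm.add_constant(np.column_stack([x1, x2]))  # design matrix [1, x1, x2]
    fit = sm.OLS(y, X).fit()

    print(fit.rsquared, fit.rsquared_adj)  # compare with RSquare / RSquare Adj above
    print(fit.fvalue, fit.f_pvalue)        # overall F test: H0: beta1 = beta2 = 0
    print(fit.summary())                   # coefficient table with partial t-tests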
-
Simple regression results
Multiple regression 1.syd
[SPLOM: Y plotted against each of X1–X4]

P = 0.580 for y = β0 + β1x4 + ε
P = 0.0127 for y = β0 + β1x3 + ε
P = 0.366 for y = β0 + β1x2 + ε
-
Multiple regression - partial residual plots
Multiple regression 1.syd
y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε

Model                                  Partial residual
y = β0 + β2x2 + β3x3 + β4x4 + ε        Ypartial(1)
y = β0 + β1x1 + β3x3 + β4x4 + ε        Ypartial(2)
y = β0 + β1x1 + β2x2 + β4x4 + ε        Ypartial(3)
y = β0 + β1x1 + β2x2 + β3x3 + ε        Ypartial(4)
[Top row: partial residuals Ypartial(1)–Ypartial(4) plotted against X1–X4. Bottom row: raw data (Y) plotted against X1–X4.]
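A sketch of the partial-residual construction in the table above, with toy arrays standing in for the .syd data: drop one predictor, refit, and take the residuals of Y from that reduced model.

    import numpy as np
    import statsmodels.api as sm

    def partial_residuals(y, X, i):
        # Residuals of y from the model that omits column i of X: Ypartial(i).
        X_minus_i = sm.add_constant(np.delete(X, i, axis=1))
        fit = sm.OLS(y, X_minus_i).fit()
        return y - fit.fittedvalues

    # toy data standing in for Multiple regression 1.syd
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 4))
    y = 3 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(size=50)

    for i in range(4):
        yp = partial_residuals(y, X, i)   # plot yp against X[:, i] for the panels above
        print(i + 1, round(float(np.corrcoef(X[:, i], yp)[0, 1]), 2))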
-
Regression models
Linear model:
yi = β0 + β1xi1 + β2xi2 + ... + εi
Sample equation:
ŷi = b0 + b1xi1 + b2xi2 + ...
Partial regression coefficients
H0: β1 = 0. The partial population regression coefficient (slope) for Y on X1, holding all other Xs constant, equals zero.
Example: assume Y = bird abundance, X1 = patch area and X2 = year. Then H0 is that the slope of the regression of Y against patch area, holding year constant, equals zero.
-
Multiple regression plane
[3-D surface: the fitted regression plane for Bird Abundance as a function of Years and Patch Area]
Testing H0: βi = 0
Use partial t-tests: t = bi / SE(bi).
Compare with a t-distribution with n - p - 1 df (n - 2 in simple regression).
Separate t-test for each partial regression coefficient in the model.
Usual logic of t-tests: reject H0 if P < 0.05 (again, this is convention; don't feel tied to it).
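A sketch of the arithmetic (assuming scipy; the numbers below are illustrative, not from the handout output):

    from scipy import stats

    def partial_t_test(bi, se_bi, n, p):
        # t statistic for H0: beta_i = 0, with n - p - 1 residual df
        t = bi / se_bi
        df = n - p - 1
        return t, 2 * stats.t.sf(abs(t), df)   # two-tailed P value

    print(partial_t_test(bi=1.56, se_bi=0.05, n=6, p=2))  # illustrative values only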
-
Overall regression model
H0: β1 = β2 = ... = 0 (all population slopes equal zero).
Test of whether the overall regression equation is significant.
Use the ANOVA F-test: variation explained by the regression over the unexplained (residual) variation.
Assumptions
Normality and homogeneity of variance for the response variable (previously discussed)
Independence of observations (previously discussed)
Linearity (previously discussed)
No collinearity (a big deal in multiple regression)
-
Collinearity
Collinearity: predictors are correlated.
Assumption of no collinearity: predictor variables are uncorrelated with (i.e. independent of) each other.
Effect of collinearity: estimates of the βi and their significance tests are unreliable.
Checks for collinearity (a sketch follows below):
Correlation matrix and/or SPLOM between predictors.
Tolerance for each predictor: 1 - r² for the regression of that predictor on all the others; if tolerance is low (near 0.1), collinearity is a problem.
VIF values: 1/tolerance (variance inflation factor); look for large values (>10).
Condition indices (not in JMP Pro): greater than 15, be cautious; greater than 30, a serious problem.
Look at all indicators to determine the extent of collinearity.
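A sketch of the tolerance/VIF computation on toy data (names and data are ours): regress each predictor on all the others; tolerance is 1 - r² of that regression and VIF is its reciprocal.

    import numpy as np
    import statsmodels.api as sm

    def tolerances(X):
        # tolerance of predictor i = 1 - r^2 from regressing X_i on the others
        tol = []
        for i in range(X.shape[1]):
            others = sm.add_constant(np.delete(X, i, axis=1))
            tol.append(1.0 - sm.OLS(X[:, i], others).fit().rsquared)
        return np.array(tol)

    # toy predictors: x3 is nearly a copy of x1, so its tolerance is near 0
    rng = np.random.default_rng(0)
    x1, x2 = rng.normal(size=100), rng.normal(size=100)
    x3 = x1 + 0.05 * rng.normal(size=100)
    X = np.column_stack([x1, x2, x3])

    tol = tolerances(X)
    print("tolerance:", tol.round(3))
    print("VIF:      ", (1 / tol).round(1))   # values > 10 flag collinearity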
-
Scatterplots
Scatterplot matrix (SPLOM): pairwise plots for all variables.
Example: build a multiple regression model to predict total employment using the values of six independent variables. See Longley.syd.
MODEL total = CONSTANT + deflator + gnp + unemployment + armforce + population + time
[SPLOM of the predictors: DEFLATOR, GNP, UNEMPLOY, ARMFORCE, POPULATN, TIME]
Look at the relationships between the predictor variables: you can immediately see the collinearity problems.
-
Condition indices
1: 1.00000   2: 9.14172   3: 12.25574   4: 25.33661   5: 230.42395   6: 1048.08030   7: 43275.04738
Dependent Variable: TOTAL    N: 16
Multiple R: 0.998    Squared Multiple R: 0.995
Adjusted Squared Multiple R: 0.992    Standard Error of Estimate: 304.854

Effect     Coefficient    Std Error     Std Coef   Tolerance   t          P(2 Tail)
CONSTANT   -3.48226E+06   8.90420E+05   0.00000    .           -3.91080   0.00356
DEFLATOR   15.06187       84.91493      0.04628    0.00738     0.17738    0.86314
GNP        -0.03582       0.03349       -1.01375   0.00056     -1.06952   0.31268
UNEMPLOY   -2.02023       0.48840       -0.53754   0.02975     -4.13643   0.00254
ARMFORCE   -1.03323       0.21427       -0.20474   0.27863     -4.82199   0.00094
POPULATN   -0.05110       0.22607       -0.10122   0.00251     -0.22605   0.82621
TIME       1829.15146     455.47850     2.47966    0.00132     4.01589    0.00304
Tolerance and Condition Indices
Longley.syz
Variance Inflation Factor (VIF)
Confidence Interval for Regression Coefficients
Effect     Coefficient     Lower 95%       Upper 95%       VIF
CONSTANT   -3.482259E+06   -5.496529E+06   -1.467988E+06   .
DEFLATOR   15.061872       -177.029036     207.152780      135.532438
GNP        -0.035819       -0.111581       0.039943        1788.513483
UNEMPLOY   -2.020230       -3.125067       -0.915393       33.618891
ARMFORCE   -1.033227       -1.517949       -0.548505       3.588930
POPULATN   -0.051104       -0.562517       0.460309        399.151022
TIME       1829.151465     798.787513      2859.515416     758.980597
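As a check, the Longley data ship with statsmodels (column names differ from the SYSTAT file), so the VIF and tolerance values above can be reproduced directly; a sketch:

    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    data = sm.datasets.longley.load_pandas()
    X = sm.add_constant(data.exog)   # GNPDEFL, GNP, UNEMP, ARMED, POP, YEAR

    for i, name in enumerate(X.columns[1:], start=1):
        vif = variance_inflation_factor(X.values, i)
        print(f"{name:8s} VIF = {vif:12.1f}  tolerance = {1 / vif:.5f}")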
-
Solutions to collinearity
Simplest: drop redundant (correlated) predictors.
Principal components regression: potentially useful.
Best model?
The model that best fits the data with the fewest predictors.
Criteria for comparing the fit of different models:
r²: generally unsuitable
Adjusted r²: better
Mallows' Cp: better
AIC: best; lower values indicate better fit
-
Explained variance
r² = SS Regression / SS Total: the proportion of variation in Y explained by the linear relationship with X1, X2, etc.
Screening models
All subsets: recommended; many models if there are many predictors (a big problem). A sketch follows below.
Automated stepwise selection (forward, backward, stepwise): NOT recommended unless you get the same model both ways. Check AIC values.
Hierarchical partitioning: contribution of each predictor to r².
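A sketch of all-subsets screening with generic arrays, ranking models by AIC = 2k + n·ln(RSS/n) (the formula used later in this handout; packages differ in whether the error variance is counted in k):

    from itertools import combinations
    import numpy as np
    import statsmodels.api as sm

    def all_subsets(y, X, names):
        # Fit every non-empty subset of predictors; feasible only for modest p
        # since there are 2^p - 1 candidate models.
        n, results = len(y), []
        for k in range(1, X.shape[1] + 1):
            for cols in combinations(range(X.shape[1]), k):
                fit = sm.OLS(y, sm.add_constant(X[:, list(cols)])).fit()
                kp = k + 1                              # slopes plus intercept
                aic = 2 * kp + n * np.log(fit.ssr / n)  # handout's AIC formula
                results.append((aic, [names[c] for c in cols], fit.rsquared))
        return sorted(results)

    # usage: for aic, model, r2 in all_subsets(y, X, ["X1","X2","X3","X4"])[:4]: ...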
-
Model comparison (simple version)
Fit full model: y = β0 + β1x1 + β2x2 + β3x3 + ε
Fit reduced models (e.g.): y = β0 + β2x2 + β3x3 + ε
Compare
Multiple regression 1.syd
[SPLOM of Y and the predictors X1–X4]
y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε
Any evidence of collinearity?
Model Building
-
Again, check for collinearity.
Compare models using AIC:
Model 1: y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε    (AIC 78.67, corrected AIC 85.67)
Model 2: y = β0 + β1x1 + β2x2 + β3x3 + ε           (AIC 77.06, corrected AIC 81.67)
-
Formally: Akaike information criterion (AIC, AICc)
Sometimes the following equation is used: AIC = 2k + n·ln(RSS/n)
where k = number of fitted parameters, n = number of observations, and RSS = residual sum of squares.
AICc = AIC corrected for small sample size. A lower score means a better fit.
AIC = -2·ln(L) + 2k
AICc = AIC + 2k(k + 1) / (n - k - 1)
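A direct transcription of these formulas in Python (a sketch; the inputs below are illustrative, and whether the error variance is counted in k varies between packages):

    import math

    def aic_aicc(rss, n, k):
        aic = 2 * k + n * math.log(rss / n)
        aicc = aic + (2 * k * (k + 1)) / (n - k - 1)   # small-sample correction
        return aic, aicc

    # e.g. a model with RSS = 88.0 on n = 15 observations and k = 4 parameters
    print(aic_aicc(rss=88.0, n=15, k=4))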
Model Selection: All Possible Models
Ordered, up to the best 4 models, up to 4 terms per model.

Model         Number   RSquare   RMSE      AICc      BIC
X1            1        0.9702    17.5561   168.292   169.525
X3            1        0.3134    84.2609   227.895   229.129
X2            1        0.0482    99.2053   234.100   235.333
X4            1        0.0184    100.74    234.686   235.919
X1,X2         2        0.9963    6.3536    131.774   132.695
X1,X3         2        0.9767    16.0121   166.899   167.819
X1,X4         2        0.9718    17.5913   170.473   171.394
X3,X4         2        0.3346    85.4973   230.554   231.475
X1,X2,X3      3        0.9998    1.5903    81.6718   81.7786
X1,X2,X4      3        0.9964    6.4809    135.060   135.167
X1,X3,X4      3        0.9789    15.7401   168.780   168.887
X2,X3,X4      3        0.3440    87.676    234.042   234.149
X1,X2,X3,X4   4        0.9998    1.6295    85.6721   84.3388
-
How important is each predictor variable to the model?
Compare models: sequential sums of squares.

Model
y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε
y = β0 + β1x1 + β2x2 + β3x3 + ε
y = β0 + β1x1 + β2x2 + ε
y = β0 + β1x1 + ε

For reference: the output from the full model, y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε.
-
For reference: the output from the full model, y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε.
Compare models: sequential sums of squares.

Model                                 Adjusted r²   Contribution to model r²
y = β0 + β1x1                         0.96844       0.96844
y = β0 + β1x1 + β2x2                  0.99587       0.02743
y = β0 + β1x1 + β2x2 + β3x3           0.99974       0.00387
y = β0 + β1x1 + β2x2 + β3x3 + β4x4    0.99973       -0.00001
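A sketch of this sequential comparison with generic arrays: add the predictors in column order and record each one's increment (the handout's "contributions" are successive differences of adjusted r²):

    import numpy as np
    import statsmodels.api as sm

    def r2_increments(y, X):
        # Adjusted r^2 after adding each predictor in order, plus its increment.
        prev, rows = 0.0, []
        for i in range(1, X.shape[1] + 1):
            r2 = sm.OLS(y, sm.add_constant(X[:, :i])).fit().rsquared_adj
            rows.append((r2, r2 - prev))
            prev = r2
        return rows

    # with columns ordered X1..X4 this yields the (adjusted r^2, contribution)
    # pairs tabulated above, given the original data.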
(Simple) Non-linear regression models
-
Non-linear regression
Use when you cannot easily linearize a relationship (that is clearly non-linear).
One response (dependent) variable: Y
One predictor (independent) variable: X1
Non-linear functions (of many types)
Regression models
Linear model:
yi = β0 + β1xi1 + εi
Non-linear model (one of many possible):
yi = β0 + β1xi1² + εi
-
Non-linear regression: what is the hypothesis??
This is a very big question; let's come back to it.
What does r² mean??
In linear regression it is the explained variance divided by the total variance.
In non-linear regression it is the same, but the variance to be explained can be calculated in two ways:
Raw r²: based on Σ yi²
Mean-corrected r²: based on Σ (yi - ȳ)²
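A sketch of the two definitions, for any vector of observations y and model predictions yhat:

    import numpy as np

    def r2_raw(y, yhat):
        # variance "explained" relative to sum(y_i^2)
        return 1 - np.sum((y - yhat) ** 2) / np.sum(y ** 2)

    def r2_mean_corrected(y, yhat):
        # relative to sum((y_i - ybar)^2), as in linear regression
        return 1 - np.sum((y - yhat) ** 2) / np.sum((y - np.mean(y)) ** 2)

    y, yhat = np.array([2.0, 4, 6]), np.array([2.2, 3.9, 5.8])   # toy values
    print(r2_raw(y, yhat), r2_mean_corrected(y, yhat))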
Non-linear regression
What is the hypothesis??
[Scatterplot: Y (0–60) against X (0–16)]
-
Non-linear regression (for example)
Fit Curve: y = a + b*Exp(c*x)

Model Comparison
Model            AICc        BIC         SSE         MSE         RMSE        R-Square
Exponential 3P   81.089952   79.922153   87.897377   7.3247814   2.7064333   0.9729491
[Plot: Exponential 3P curve fitted to Y (0–50) against X (0–15)]
Parameter Estimates
Parameter     Estimate    Std Error   Lower 95%   Upper 95%
Asymptote     1.7609613   1.9559091   -2.07255    5.5944727
Scale         1.5794384   0.7859004   0.039102    3.1197748
Growth Rate   0.2293354   0.032577    0.1654857   0.2931851
What are the hypotheses?
Non-linear regression (many models might be adequate)
What are the hypotheses?
Exponential 2P: Y = a*Exp(b*X)
Exponential 3P: Y = a + b*Exp(c*X)
Polynomial cubic: Y = a + b*X + c*X² + d*X³
-
What are the hypotheses?
Exponential 2P: Y = a*Exp(b*X)                  parameters a, b
Exponential 3P: Y = a + b*Exp(c*X)              parameters a, b, c
Polynomial cubic: Y = a + b*X + c*X² + d*X³     parameters a, b, c, d
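A sketch of fitting all three candidate curves with scipy; the data here are simulated to resemble the Exponential 3P estimates above (an assumption; the handout's raw values are not reproduced), and non-linear fitting needs starting guesses p0:

    import numpy as np
    from scipy.optimize import curve_fit

    def exp2p(x, a, b):        return a * np.exp(b * x)
    def exp3p(x, a, b, c):     return a + b * np.exp(c * x)
    def cubic(x, a, b, c, d):  return a + b * x + c * x**2 + d * x**3

    rng = np.random.default_rng(2)                 # simulated stand-in data
    x = np.linspace(0, 15, 15)
    y = 1.76 + 1.58 * np.exp(0.23 * x) + rng.normal(scale=2.7, size=x.size)

    for model, p0 in [(exp2p, (1.0, 0.2)), (exp3p, (1.0, 1.0, 0.2)),
                      (cubic, (1.0, 1.0, 0.0, 0.0))]:
        params, _ = curve_fit(model, x, y, p0=p0)
        sse = float(np.sum((y - model(x, *params)) ** 2))
        print(model.__name__, np.round(params, 3), round(sse, 2))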
Comparing regression models
Evaluate assumptions: sometimes (as in the examples here) there are violations.
Simple (but not always correct): compare adjusted r². Problem: what counts?? Particularly problematic when there are differences in the number of estimated parameters.
One solution: compare the added fit to the expected added fit (given the increased number of parameters); a sketch follows below.
One major restriction: nested models are easier to compare. Nested means the general form is the same, or can be made the same simply by modifying parameter values.
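One common way to do that for nested models is the extra sum-of-squares F test; a sketch (the residual df below are inferred from the SSE/MSE values on the comparison slide that follows, and the 2P exponential is nested in the 3P one):

    from scipy import stats

    def extra_ss_f_test(sse_reduced, sse_full, df_reduced, df_full):
        # Does the fuller model's extra parameter buy more fit than chance?
        f = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)
        return f, stats.f.sf(f, df_reduced - df_full, df_full)

    # Exponential 2P (reduced) vs Exponential 3P (full), SSEs from the next slide
    print(extra_ss_f_test(92.690, 87.897, 13, 12))   # F ~ 0.65, P ~ 0.43: keep 2P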
-
Non-linear regression (many models might be adequate)
What are the hypotheses?
Fit Curve
Model Comparison
Model            AICc        AICc Weight   BIC         SSE         MSE         RMSE        R-Square
Exponential 2P   78.068182   0.810952      78.010515   92.690324   7.1300249   2.6702106   0.971474
Exponential 3P   81.089952   0.1789889     79.922153   87.897377   7.3247814   2.7064333   0.9729491
Cubic            86.847655   0.010059      83.72124    94.528911   8.5935373   2.9314736   0.9709082
[Plot: all three fitted curves over Y (0–50) against X (0–15)]
Exponential 2P: Y = a*Exp(b*X)
Exponential 3P: Y = a + b*Exp(c*X)
Polynomial cubic: Y = a + b*X + c*X² + d*X³
Multiple and Non-Linear Regression
Be careful! Know what your hypotheses are. Understand how to build models to test your hypotheses. Understand the statistical output: you may be misled if you don't.