Multiple and complex regression
Extensions of simple linear regression
• Multiple regression models: predictor variables are continuous
• Analysis of variance: predictor variables are categorical (grouping variables)
• But… general linear models can include both continuous and categorical predictors
Relative abundance of C3 and C4 plants
• Paruelo & Lauenroth (1996)
• Geographic distribution and the effects of climate variables on the relative abundance of a number of plant functional types (PFTs): shrubs, forbs, succulents, C3 grasses and C4 grasses.
Data
• 73 sites across temperate central North America
• Response variable: relative abundance of PFTs (based on cover, biomass, and primary production) for each site
• Predictor variables: longitude, latitude, mean annual temperature, mean annual precipitation, winter (%) precipitation, summer (%) precipitation, biome (grassland, shrubland)
Box 6.1
[Figure: frequency histograms of relative abundance. C3: mean = 0.27, Std. Dev = 0.26, N = 73. C4: mean = 0.29, Std. Dev = 0.31, N = 73.]
Relative abundance was transformed as ln(abundance + 1) because its distribution is positively skewed
Comparing the log10 and ln transformations
[Figure: frequency histograms of transformed C3 abundance. LC3: mean = -0.55, Std. Dev = 0.33, N = 73. LNC3: mean = 0.22, Std. Dev = 0.20, N = 73.]
Collinearity
• Causes computational problems: it makes the determinant of the matrix of X-variables close to zero, and matrix inversion essentially involves dividing by the determinant, so the estimates become very sensitive to small differences in the numbers
• Inflates the standard errors of the estimated regression slopes
Detecting collinearity
• Check tolerance values
• Plot the variables
• Examine a matrix of correlation coefficients between predictor variables
Dealing with collinearity
• Omit predictor variables that are highly correlated with other predictor variables remaining in the model
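The tolerance check can be sketched in a few lines. Tolerance for a predictor is 1 - r2 from regressing that predictor on all the others, and the variance inflation factor (VIF) is its reciprocal. The data below are synthetic and the function name `tolerance_and_vif` is made up for illustration; this is a minimal sketch, not the routine used by any particular statistics package.

```python
import numpy as np

def tolerance_and_vif(X):
    """Return (tolerance, VIF) arrays, one entry per column of X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    tol = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on the remaining predictors (with intercept).
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        tol[j] = 1.0 - r2
    return tol, 1.0 / tol

# Synthetic example: x3 is almost a copy of x1, x2 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)
tol, vif = tolerance_and_vif(np.column_stack([x1, x2, x3]))
print("tolerance:", np.round(tol, 4))
print("VIF:", np.round(vif, 2))
```

Tolerances near zero (very large VIFs) flag predictors that are nearly linear combinations of the others; here that should be the case for x1 and x3 but not x2.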
Correlations (Pearson correlation, with 2-tailed significance beneath; N = 73 for every pair)

          LAT       LONG      MAP       MAT       JJAMAP    DJFMAP
LAT       1         .097      -.247*    -.839**   .074      -.065
  Sig.    .         .416      .036      .000      .533      .584
LONG      .097      1         -.734**   -.213     -.492**   .771**
  Sig.    .416      .         .000      .070      .000      .000
MAP       -.247*    -.734**   1         .355**    .112      -.405**
  Sig.    .036      .000      .         .002      .344      .000
MAT       -.839**   -.213     .355**    1         -.081     .001
  Sig.    .000      .070      .002      .         .497      .990
JJAMAP    .074      -.492**   .112      -.081     1         -.792**
  Sig.    .533      .000      .344      .497      .         .000
DJFMAP    -.065     .771**    -.405**   .001      -.792**   1
  Sig.    .584      .000      .000      .990      .000      .

* Correlation is significant at the 0.05 level (2-tailed).
** Correlation is significant at the 0.01 level (2-tailed).
(lnC3) = β0 + β1(lat) + β2(long) + β3(lat x long)

Coefficients (dependent variable: LC3; B = unstandardized coefficient, Beta = standardized coefficient)
Model 1      B        Std. Error   Beta      t         Sig.    Tolerance   VIF
(Constant)   7.391    3.625                  2.039     .045
LAT          -.191    .091         -3.095    -2.101    .039    .003        307.745
LONG         -.093    .035         -1.824    -2.659    .010    .015        66.784
LOXLA        .002     .001         4.323     2.572     .012    .002        400.939

After centering both lat and long

Coefficients (dependent variable: LC3; B = unstandardized coefficient, Beta = standardized coefficient)
Model 1      B        Std. Error   Beta      t         Sig.    Tolerance   VIF
(Constant)   -.553    .027                   -20.131   .000
LONRE        -.003    .004         -.051     -.597     .552    .980        1.020
LATRE        .048     .006         .783      8.484     .000    .827        1.209
RELALO       .002     .001         .238      2.572     .012    .820        1.220

R2 = 0.514
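Why centering helps can be seen with a small simulation. The sketch below uses made-up latitudes and longitudes (uniform draws over hypothetical ranges, not the study's coordinates) and compares how strongly a main effect correlates with the product term before and after subtracting the means.

```python
import numpy as np

rng = np.random.default_rng(1)
lat = rng.uniform(30, 50, size=73)    # hypothetical latitudes
lon = rng.uniform(95, 120, size=73)   # hypothetical longitudes

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

# Raw product term: dominated by the main effects, because both
# variables sit far from zero.
raw_r = corr(lat, lat * lon)

# Centered product term: main effects and interaction are decoupled.
lat_c = lat - lat.mean()
lon_c = lon - lon.mean()
centered_r = corr(lat_c, lat_c * lon_c)

print(f"corr(lat, lat x lon), raw:      {raw_r:.3f}")
print(f"corr(lat, lat x lon), centered: {centered_r:.3f}")
```

The slope and t value of the interaction term are unchanged by centering (.002 and 2.572 in both tables), while the main-effect estimates, their standard errors and the VIFs change.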
Analysis of variance
Source of variation   SS            df          MS
Regression            Σ(ŷ - ȳ)²     p           Σ(ŷ - ȳ)² / p
Residual              Σ(y - ŷ)²     n - p - 1   Σ(y - ŷ)² / (n - p - 1)
Total                 Σ(y - ȳ)²     n - 1
(ŷ = fitted value, ȳ = mean of the observed values, y = observed value)
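The partition in the table can be checked numerically. This is a minimal sketch on synthetic data (n = 73 and p = 2 are chosen only for illustration); it also forms the usual F-ratio, MS Regression / MS Residual, which the table itself does not list.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 73, 2
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(scale=0.5, size=n)

# Least-squares fit with an intercept column.
A = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ b

ss_regression = np.sum((y_hat - y.mean()) ** 2)
ss_residual = np.sum((y - y_hat) ** 2)
ss_total = np.sum((y - y.mean()) ** 2)

ms_regression = ss_regression / p
ms_residual = ss_residual / (n - p - 1)
f_ratio = ms_regression / ms_residual

print(f"SS regression + SS residual = {ss_regression + ss_residual:.4f}")
print(f"SS total                    = {ss_total:.4f}")
print(f"F({p}, {n - p - 1}) = {f_ratio:.2f}")
```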
Matrix algebra approach to OLS estimation of multiple regression models
• Y = Xβ + ε
• X’Xb = X’Y (the normal equations)
• b = (X’X)⁻¹(X’Y)
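The normal equations translate directly into NumPy. The sketch below uses synthetic data with made-up coefficients. It solves X’Xb = X’Y with `np.linalg.solve` rather than forming the inverse explicitly, and compares the result with `np.linalg.lstsq`, which is generally the more stable choice when predictors are collinear.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), x1, x2])
beta_true = np.array([2.0, 0.5, -1.0])   # illustrative values only
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# b = (X'X)^-1 X'Y, computed by solving the normal equations.
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Same estimate from a least-squares solver.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("normal equations:", np.round(b_normal, 3))
print("lstsq:           ", np.round(b_lstsq, 3))
```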
Forward selection
Coefficients (dependent variable: LC3; B = unstandardized coefficient, Beta = standardized coefficient)
Model   Term         B        Std. Error   Beta    t         Sig.
1       (Constant)   -2.230   .218                 -10.246   .000
        LAT          .042     .005         .680    7.805     .000
2       (Constant)   -2.448   .245                 -10.005   .000
        LAT          .044     .005         .720    8.144     .000
        MAP          .000     .000         .163    1.840     .070
Backward selection
Coefficients (dependent variable: LC3; B = unstandardized coefficient, Beta = standardized coefficient)
Model   Term         B        Std. Error   Beta    t         Sig.
1       (Constant)   -2.689   1.239                -2.170    .034
        MAP          .000     .000         .181    1.261     .212
        MAT          -.001    .012         -.012   -.073     .942
        JJAMAP       -.834    .475         -.268   -1.755    .084
        DJFMAP       -.962    .716         -.275   -1.343    .184
        LONG         .007     .010         .136    .690      .493
        LAT          .043     .010         .703    4.375     .000
2       (Constant)   -2.730   1.093                -2.498    .015
        MAP          .000     .000         .180    1.269     .209
        JJAMAP       -.831    .470         -.267   -1.769    .082
        DJFMAP       -.963    .711         -.276   -1.354    .180
        LONG         .007     .010         .138    .708      .481
        LAT          .044     .006         .713    7.932     .000
3       (Constant)   -2.011   .406                 -4.959    .000
        MAP          .000     .000         .113    1.074     .287
        JJAMAP       -.812    .467         -.261   -1.738    .087
        DJFMAP       -.670    .577         -.192   -1.163    .249
        LAT          .044     .006         .714    7.983     .000
4       (Constant)   -1.725   .306                 -5.640    .000
        JJAMAP       -1.002   .433         -.322   -2.314    .024
        DJFMAP       -1.005   .486         -.288   -2.070    .042
        LAT          .042     .005         .685    8.033     .000
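The logic of backward selection (start with every predictor, drop the weakest, refit, repeat) can be sketched as follows. The removal rule used here, dropping the predictor with the smallest |t| while that |t| is below 2, is an assumption chosen for simplicity; it stands in for whatever removal criterion the statistics package applied. The data and the names "a" to "d" are synthetic.

```python
import numpy as np

def slope_t_values(X, y):
    """Fit y on X (intercept added) and return t statistics for the slopes."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    xtx_inv = np.linalg.inv(A.T @ A)
    b = xtx_inv @ A.T @ y
    resid = y - A @ b
    sigma2 = (resid @ resid) / (n - k - 1)      # residual mean square
    se = np.sqrt(sigma2 * np.diag(xtx_inv))     # standard errors
    return (b / se)[1:]                         # drop the intercept

def backward_eliminate(X, y, names, t_min=2.0):
    """Drop the weakest predictor until all remaining have |t| >= t_min."""
    keep = list(range(X.shape[1]))
    while keep:
        t = np.abs(slope_t_values(X[:, keep], y))
        worst = int(np.argmin(t))
        if t[worst] >= t_min:
            break
        keep.pop(worst)
    return [names[j] for j in keep]

# Synthetic example: only "a" and "c" truly affect y.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=200)
selected = backward_eliminate(X, y, ["a", "b", "c", "d"])
print("retained predictors:", selected)
```

As in the table above, each step refits the model after a removal, so the coefficients and t values of the remaining predictors shift from one model to the next.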
Criteria for “best” fitting in multiple regression with p predictors.
Criterion                              Formula
r2                                     SS Regression / SS Total = 1 - SS Residual / SS Total
Adjusted r2                            1 - [(n - 1) / (n - p - 1)] (1 - r2)
Akaike Information Criterion (AIC)     [formula garbled in the transcript]
Akaike Information Criterion (AIC)     [formula garbled in the transcript]
Hierarchical partitioning and model selection
No. of predictors   Model                   r2        Adj r2    AIC (R)   AIC
1                   Lon                     0.00005   -0.014    49.179    -165.10
1                   Lat                     0.4619    0.454     3.942     -204.44
2                   Lon + Lat               0.4671    0.4519    5.220     -201.20
3                   Lon + Lat + Lon x Lat   0.5137    0.4926    0.437     -209.69
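The adjusted r2 column can be reproduced from the r2 column with the usual definition, 1 - [(n - 1) / (n - p - 1)] (1 - r2), taking n = 73 sites and p as the number of predictors. The sketch below only re-derives values already shown in the table; the function name `adjusted_r2` is made up for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted r2 for a model with p predictors fitted to n observations."""
    return 1.0 - (n - 1) / (n - p - 1) * (1.0 - r2)

n = 73
# (r2, number of predictors) as reported in the table above.
models = {
    "Lon": (0.00005, 1),
    "Lat": (0.4619, 1),
    "Lon + Lat": (0.4671, 2),
    "Lon + Lat + Lon x Lat": (0.5137, 3),
}

adj = {name: adjusted_r2(r2, n, p) for name, (r2, p) in models.items()}
for name, value in adj.items():
    print(f"{name}: adjusted r2 = {value:.4f}")
```

Adding Lon to the Lat-only model raises r2 slightly (0.4619 to 0.4671) but lowers adjusted r2 (0.454 to 0.4519), which is the penalty for the extra predictor at work.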