
Multiple and complex regression

Extensions of simple linear regression

• Multiple regression models: predictor variables are continuous

• Analysis of variance: predictor variables are categorical (grouping variables)

• But… general linear models can include both continuous and categorical predictors
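As a rough illustration of the last point, the sketch below builds one design matrix that holds both a continuous and a categorical predictor. The data are simulated, and the variable names (`temperature`, `biome`) and coefficient values are assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
temperature = rng.uniform(2, 20, size=n)            # continuous predictor (simulated)
biome = rng.choice(["grassland", "shrubland"], n)   # categorical predictor (simulated)

# Dummy-code the grouping variable: 0 = grassland (reference), 1 = shrubland.
is_shrubland = (biome == "shrubland").astype(float)

# One design matrix holds the intercept, the continuous term and the dummy term.
X = np.column_stack([np.ones(n), temperature, is_shrubland])

# Simulated response with known coefficients, then an ordinary least-squares fit.
true_coef = np.array([1.0, 0.05, -0.4])
y = X @ true_coef + rng.normal(scale=0.1, size=n)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["intercept", "temperature", "shrubland"], coef.round(3))))
```

The coefficient on the dummy column estimates the shift in the response between the two groups at a fixed value of the continuous predictor.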

Relative abundance of C3 and C4 plants

• Paruelo & Lauenroth (1996)

• Geographic distribution and the effects of climate variables on the relative abundance of a number of plant functional types (PFTs): shrubs, forbs, succulents, C3 grasses and C4 grasses.

Data

73 sites across temperate central North America

Response variable

• Relative abundance of PFTs (based on cover, biomass, and primary production) for each site

Predictor variables

• Longitude (LONG)

• Latitude (LAT)

• Mean annual temperature (MAT)

• Mean annual precipitation (MAP)

• Winter (%) precipitation (DJFMAP)

• Summer (%) precipitation (JJAMAP)

• Biome (grassland, shrubland)

Box 6.1

[Histogram of C3 relative abundance: Mean = .27, Std. Dev = .26, N = 73]

[Histogram of C4 relative abundance: Mean = .29, Std. Dev = .31, N = 73]

Relative abundance was transformed as ln(x + 1) because its distribution is positively skewed
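A minimal sketch of the effect of this transformation on skewness. The abundance values are made up for illustration (they are not the Paruelo & Lauenroth data), and `skewness` is a helper defined here, not a library function.

```python
import math

def skewness(values):
    """Sample skewness (third central moment / variance^1.5, no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical relative abundances in [0, 1], with many values near zero.
abundance = [0.0, 0.0, 0.01, 0.02, 0.05, 0.08, 0.10, 0.20, 0.45, 0.90]

# ln(x + 1); adding 1 keeps zero abundances defined (ln(1) = 0).
ln_abundance = [math.log1p(v) for v in abundance]

print("skewness raw:", skewness(abundance))
print("skewness ln(x + 1):", skewness(ln_abundance))
```

The transformed values still have a right tail, but it is shorter than in the raw values.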

Comparing log10 vs ln transformations

[Histogram of LC3: Mean = -.55, Std. Dev = .33, N = 73]

[Histogram of LNC3: Mean = .22, Std. Dev = .20, N = 73]

Collinearity

• Causes computational problems because it makes the determinant of the X'X matrix of the predictor variables close to zero, and matrix inversion essentially involves dividing by the determinant, so the estimates become very sensitive to small differences in the numbers

• Standard errors of the estimated regression slopes are inflated
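Both effects can be seen in a small simulation. This is a sketch on simulated predictors, assuming a known residual standard deviation `sigma`; the function name `slope_se_and_det` is made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 73  # same number of sites as the example; the values themselves are simulated

x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)                    # unrelated to x1
x2_collin = x1 + rng.normal(scale=0.01, size=n)  # almost a copy of x1

def slope_se_and_det(x1, x2, sigma=1.0):
    """Return det(X'X) and the standard errors of the two slopes,
    assuming a known residual standard deviation `sigma`."""
    X = np.column_stack([np.ones_like(x1), x1, x2])
    xtx = X.T @ X
    cov_b = sigma ** 2 * np.linalg.inv(xtx)
    return np.linalg.det(xtx), np.sqrt(np.diag(cov_b))[1:]

det_indep, se_indep = slope_se_and_det(x1, x2_indep)
det_collin, se_collin = slope_se_and_det(x1, x2_collin)

print("det(X'X):   independent", det_indep, " collinear", det_collin)
print("SE of slopes: independent", se_indep, " collinear", se_collin)
```

With the nearly duplicated predictor the determinant shrinks and the slope standard errors grow, even though the residual variance is the same in both cases.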

Detecting collinearity

• Check tolerance values

• Plot the variables

• Examine a matrix of correlation coefficients between predictor variables
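The first and third checks can be sketched as follows. Tolerance for a predictor is taken here as 1 minus the r2 from regressing it on the other predictors, and VIF as its reciprocal. The data are simulated and `tolerance_and_vif` is a helper written for this example.

```python
import numpy as np

def tolerance_and_vif(X):
    """X: (n, p) array of predictors without an intercept column.
    tolerance_j = 1 - r2 from regressing column j on the other columns;
    VIF_j = 1 / tolerance_j."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    tol = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        tol[j] = 1.0 - r2
    return tol, 1.0 / tol

rng = np.random.default_rng(4)
a = rng.normal(size=73)
c = a + rng.normal(scale=0.1, size=73)   # strongly correlated with a
d = rng.normal(size=73)                  # unrelated to a and c
X = np.column_stack([a, c, d])

tol, vif = tolerance_and_vif(X)
corr = np.corrcoef(X, rowvar=False)      # correlation matrix of the predictors
print("tolerance:", tol.round(3))
print("VIF:", vif.round(1))
print(corr.round(2))
```

Low tolerance (high VIF) for `a` and `c`, together with their large pairwise correlation, flags the collinear pair; `d` is unaffected.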

Dealing with collinearity

• Omit predictor variables if they are highly correlated with other predictor variables that remain in the model

Correlations

Pearson correlation (r) and two-tailed significance (p) between predictor variables; N = 73 for every pair.

                LAT       LONG      MAP       MAT       JJAMAP    DJFMAP
LAT      r      1         .097      -.247*    -.839**   .074      -.065
         p      .         .416      .036      .000      .533      .584
LONG     r      .097      1         -.734**   -.213     -.492**   .771**
         p      .416      .         .000      .070      .000      .000
MAP      r      -.247*    -.734**   1         .355**    .112      -.405**
         p      .036      .000      .         .002      .344      .000
MAT      r      -.839**   -.213     .355**    1         -.081     .001
         p      .000      .070      .002      .         .497      .990
JJAMAP   r      .074      -.492**   .112      -.081     1         -.792**
         p      .533      .000      .344      .497      .         .000
DJFMAP   r      -.065     .771**    -.405**   .001      -.792**   1
         p      .584      .000      .000      .990      .000      .

* Correlation is significant at the 0.05 level (2-tailed).

** Correlation is significant at the 0.01 level (2-tailed).

$\mathrm{LC3} = \beta_0 + \beta_1(\mathrm{lat}) + \beta_2(\mathrm{long}) + \beta_3(\mathrm{lat} \times \mathrm{long})$

Coefficients (dependent variable: LC3)

Term          B        Std. Error   Beta     t         Sig.    Tolerance   VIF
(Constant)    7.391    3.625                 2.039     .045
LAT           -.191    .091         -3.095   -2.101    .039    .003        307.745
LONG          -.093    .035         -1.824   -2.659    .010    .015        66.784
LOXLA         .002     .001         4.323    2.572     .012    .002        400.939

B and Std. Error are unstandardized coefficients, Beta is the standardized coefficient, and Tolerance and VIF are collinearity statistics. LOXLA is the lat × long interaction.

After centering both lat and long

Coefficients

Term          B        Std. Error   Beta     t         Sig.    Tolerance   VIF
(Constant)    -.553    .027                  -20.131   .000
LONRE         -.003    .004         -.051    -.597     .552    .980        1.020
LATRE         .048     .006         .783     8.484     .000    .827        1.209
RELALO        .002     .001         .238     2.572     .012    .820        1.220

LONRE and LATRE are the centered longitude and latitude, and RELALO is their product.

$R^2 = 0.514$
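Why centering helps can be sketched with simulated coordinates: a raw product term is strongly correlated with its components, while the product of centered variables is much less so. The coordinate ranges below are assumptions for illustration, not the actual site locations, and `max_abs_corr_with_product` is a helper made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
lat = rng.uniform(29, 53, size=73)    # hypothetical latitude range
lon = rng.uniform(93, 120, size=73)   # hypothetical longitude range

def max_abs_corr_with_product(a, b):
    """Largest absolute correlation between a*b and either of its components."""
    prod = a * b
    return max(abs(np.corrcoef(a, prod)[0, 1]), abs(np.corrcoef(b, prod)[0, 1]))

raw = max_abs_corr_with_product(lat, lon)
centered = max_abs_corr_with_product(lat - lat.mean(), lon - lon.mean())

print("raw product:", round(raw, 3))
print("centered product:", round(centered, 3))
```

In the two coefficient tables the interaction term keeps the same B, t and Sig. before and after centering; what changes is its tolerance and the estimates for the main effects.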

Analysis of variance

Source of variation   SS                              df            MS
Regression            $\sum(\hat{y}_i - \bar{y})^2$   $p$           $\sum(\hat{y}_i - \bar{y})^2 / p$
Residual              $\sum(y_i - \hat{y}_i)^2$       $n - p - 1$   $\sum(y_i - \hat{y}_i)^2 / (n - p - 1)$
Total                 $\sum(y_i - \bar{y})^2$         $n - 1$
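The partition in this table can be computed directly from a fitted model. The sketch below uses simulated data; `regression_anova` is a helper written for this example and takes the p predictors without an intercept column.

```python
import numpy as np

def regression_anova(X, y):
    """Sums of squares, df and mean squares for a multiple regression.
    X: (n, p) predictors without an intercept column; y: (n,) response."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ b
    ss_reg = float(np.sum((y_hat - y.mean()) ** 2))
    ss_res = float(np.sum((y - y_hat) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return {
        "regression": {"SS": ss_reg, "df": p, "MS": ss_reg / p},
        "residual": {"SS": ss_res, "df": n - p - 1, "MS": ss_res / (n - p - 1)},
        "total": {"SS": ss_tot, "df": n - 1},
    }

rng = np.random.default_rng(2)
X = rng.normal(size=(73, 2))
y = 0.5 + X @ np.array([1.0, -0.5]) + rng.normal(scale=0.5, size=73)

table = regression_anova(X, y)
for source, row in table.items():
    print(source, row)
```

For a least-squares fit with an intercept, the regression and residual SS add up to the total SS, and their df add up to n - 1.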

Matrix algebra approach to OLS estimation of multiple regression models

• $Y = X\beta + \varepsilon$

• $X'Xb = X'Y$

• $b = (X'X)^{-1}X'Y$
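A minimal sketch of these three steps on simulated data. The true coefficients in `beta_true` are arbitrary values chosen for the example; solving the normal equations with `np.linalg.solve` is used here in place of forming the inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 73
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Design matrix X: a column of ones for the intercept, then the predictors.
X = np.column_stack([np.ones(n), x1, x2])

# Y = X beta + epsilon, with arbitrary coefficients chosen for the simulation.
beta_true = np.array([0.5, 1.0, -2.0])
Y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Normal equations X'X b = X'Y, solved for b.
b = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's least-squares routine.
b_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print("b from normal equations:", b.round(3))
print("b from lstsq:", b_lstsq.round(3))
```

When predictors are strongly collinear, X'X is close to singular and this direct solution becomes numerically unstable, which is the computational problem described in the collinearity slides.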

The forward selection is:

Coefficients

Model   Term          B        Std. Error   Beta    t         Sig.
1       (Constant)    -2.230   .218                 -10.246   .000
        LAT           .042     .005         .680    7.805     .000
2       (Constant)    -2.448   .245                 -10.005   .000
        LAT           .044     .005         .720    8.144     .000
        MAP           .000     .000         .163    1.840     .070
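The general idea of forward selection can be sketched as below. This is an illustration, not the procedure that produced the output above: the entry rule here is a simple threshold on |t| (`t_enter`, an assumed value), whereas statistical packages typically use a probability of F to enter. Data and function names are made up for the example.

```python
import numpy as np

def t_statistics(X, y):
    """t = b / SE(b) for each column of X (X already includes the intercept)."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    resid = y - X @ b
    ms_res = resid @ resid / (n - k)
    return b / np.sqrt(ms_res * np.diag(xtx_inv))

def forward_select(predictors, y, t_enter=2.0):
    """predictors: dict mapping name -> 1-D array. Returns names in order of entry."""
    n = len(y)
    chosen = []
    remaining = list(predictors)
    while remaining:
        best_name, best_t = None, 0.0
        for name in remaining:
            cols = [predictors[c] for c in chosen + [name]]
            X = np.column_stack([np.ones(n)] + cols)
            t_new = abs(t_statistics(X, y)[-1])   # t of the candidate term
            if t_new > best_t:
                best_name, best_t = name, t_new
        if best_t < t_enter:                      # no candidate is strong enough
            break
        chosen.append(best_name)
        remaining.remove(best_name)
    return chosen

rng = np.random.default_rng(3)
n = 73
data = {name: rng.normal(size=n) for name in ["a", "b", "c"]}
y = 2.0 + 3.0 * data["a"] + rng.normal(size=n)    # only "a" has a real effect

chosen = forward_select(data, y)
print(chosen)
```

Each pass adds the single candidate with the largest |t| given the terms already in the model, and the loop stops when no remaining candidate reaches the threshold.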

The backward selection is:

Coefficients

Model   Term          B        Std. Error   Beta    t        Sig.
1       (Constant)    -2.689   1.239                -2.170   .034
        MAP           .000     .000         .181    1.261    .212
        MAT           -.001    .012         -.012   -.073    .942
        JJAMAP        -.834    .475         -.268   -1.755   .084
        DJFMAP        -.962    .716         -.275   -1.343   .184
        LONG          .007     .010         .136    .690     .493
        LAT           .043     .010         .703    4.375    .000
2       (Constant)    -2.730   1.093                -2.498   .015
        MAP           .000     .000         .180    1.269    .209
        JJAMAP        -.831    .470         -.267   -1.769   .082
        DJFMAP        -.963    .711         -.276   -1.354   .180
        LONG          .007     .010         .138    .708     .481
        LAT           .044     .006         .713    7.932    .000
3       (Constant)    -2.011   .406                 -4.959   .000
        MAP           .000     .000         .113    1.074    .287
        JJAMAP        -.812    .467         -.261   -1.738   .087
        DJFMAP        -.670    .577         -.192   -1.163   .249
        LAT           .044     .006         .714    7.983    .000
4       (Constant)    -1.725   .306                 -5.640   .000
        JJAMAP        -1.002   .433         -.322   -2.314   .024
        DJFMAP        -1.005   .486         -.288   -2.070   .042
        LAT           .042     .005         .685    8.033    .000
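Backward elimination can be sketched in the same spirit: start with all predictors and repeatedly drop the weakest one. As an assumption for this illustration, the removal rule is a threshold on |t| (`t_remove`) rather than the probability-of-F rule a statistical package would normally apply; data and function names are made up.

```python
import numpy as np

def t_statistics(X, y):
    """t = b / SE(b) for each column of X (X already includes the intercept)."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    resid = y - X @ b
    ms_res = resid @ resid / (n - k)
    return b / np.sqrt(ms_res * np.diag(xtx_inv))

def backward_eliminate(predictors, y, t_remove=2.0):
    """predictors: dict mapping name -> 1-D array. Returns the names retained."""
    n = len(y)
    kept = list(predictors)
    while kept:
        X = np.column_stack([np.ones(n)] + [predictors[c] for c in kept])
        t = np.abs(t_statistics(X, y))[1:]   # skip the intercept
        weakest = int(np.argmin(t))
        if t[weakest] >= t_remove:           # every remaining term is strong enough
            break
        kept.pop(weakest)                    # drop the weakest term and refit
    return kept

rng = np.random.default_rng(6)
n = 73
data = {name: rng.normal(size=n) for name in ["a", "b", "c"]}
y = 1.0 + 2.0 * data["a"] - 1.5 * data["b"] + rng.normal(size=n)

kept = backward_eliminate(data, y)
print(kept)
```

Only one term is removed per refit, which mirrors the sequence of models 1 to 4 in the output above, where one predictor disappears at each step.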

Criteria for “best” fitting in multiple regression with p predictors.

Criterion                                          Formula
$r^2$                                              $\dfrac{SS_{Regression}}{SS_{Total}} = 1 - \dfrac{SS_{Residual}}{SS_{Total}}$
Adjusted $r^2$                                     $1 - \dfrac{n - 1}{n - p - 1}\,(1 - r^2)$
Akaike Information Criterion (AIC)                 [formula not recoverable from the source]
Akaike Information Criterion as reported by R      $n\left[\ln\!\left(\dfrac{2\pi\, SS_{Residual}}{n}\right) + 1\right] + 2(p + 2)$
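The criteria can be computed with a few lines of code. This is a sketch under two assumptions: p counts predictors (so the residual df is n - p - 1, as in the analysis of variance table), and the AIC function uses the Gaussian log-likelihood form that R reports for a linear model, which may not be the form behind every AIC value in these slides. Function names are made up for the example.

```python
import math

def r_squared(ss_residual, ss_total):
    """Proportion of the total variation explained by the regression."""
    return 1.0 - ss_residual / ss_total

def adjusted_r_squared(r2, n, p):
    """r2 penalised for the number of predictors p, with n observations."""
    return 1.0 - (n - 1) / (n - p - 1) * (1.0 - r2)

def aic_gaussian(ss_residual, n, p):
    """-2 log-likelihood + 2k for a Gaussian linear model, with k = p + 2
    (p slopes, the intercept and the residual variance)."""
    return n * (math.log(2 * math.pi * ss_residual / n) + 1) + 2 * (p + 2)

# r2 values and predictor counts for the four latitude/longitude models (n = 73).
n = 73
models = [("Lon", 0.00005, 1), ("Lat", 0.4619, 1),
          ("Lon + Lat", 0.4671, 2), ("Lon + Lat + Lon x Lat", 0.5137, 3)]
for label, r2, p in models:
    print(label, round(adjusted_r_squared(r2, n, p), 4))
```

Unlike r2, which can only increase as predictors are added, adjusted r2 and AIC both penalise extra predictors, so they can favour a smaller model.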

Hierarchical partitioning and model selection

No. of predictors   Model                    r2        Adj r2    AIC (R)   AIC
1                   Lon                      0.00005   -0.014    49.179    -165.10
1                   Lat                      0.4619    0.454     3.942     -204.44
2                   Lon + Lat                0.4671    0.4519    5.220     -201.20
3                   Lon + Lat + Lon x Lat    0.5137    0.4926    0.437     -209.69
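Reading the table amounts to taking the largest adjusted r2 and the smallest AIC. A short sketch, with the rows copied from the table and the variable names chosen for the example:

```python
# Rows copied from the model-selection table:
# (number of predictors, model, r2, adjusted r2, AIC (R), AIC)
models = [
    (1, "Lon", 0.00005, -0.014, 49.179, -165.10),
    (1, "Lat", 0.4619, 0.454, 3.942, -204.44),
    (2, "Lon + Lat", 0.4671, 0.4519, 5.220, -201.20),
    (3, "Lon + Lat + Lon x Lat", 0.5137, 0.4926, 0.437, -209.69),
]

# Larger adjusted r2 is better; smaller AIC is better.
best_by_adj_r2 = max(models, key=lambda m: m[3])[1]
best_by_aic_r = min(models, key=lambda m: m[4])[1]
best_by_aic = min(models, key=lambda m: m[5])[1]

print("adjusted r2:", best_by_adj_r2)
print("AIC (R):", best_by_aic_r)
print("AIC:", best_by_aic)
```

All three criteria point to the model with the interaction term, while Lat alone ranks ahead of Lon + Lat on every criterion except r2.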
