More on regression
[Diagram: species richness and its candidate drivers: pH, humidity, light, temperature, organic matter content]
Is it possible to infer causal relationships between
model drivers from regression analysis?
Is it possible to compare the goodness of different
models?
Is it possible to quantify the influence of different drivers?
Path analysis and linear structural models (structural equation modelling, SEM)
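[Diagram: multiple regression, with the predictors X1, X2, X3, X4, the response Y and a single error term e, contrasted with a path model of the same variables in which several error terms e appear]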
Path analysis tries to do something that is logically impossible, to derive causal relationships from sets of observations.
Path analysis defines a whole model and tries to separate correlations into direct and indirect effects.
$Y = a_0 + a_1X_1 + a_2X_2 + a_3X_3 + a_4X_4 + e$
The error term e contains the part of the variance in Y that is not explained by the model. These errors are called residuals.
Regression analysis does not study the relationships among the predictor variables.
[Path diagram: the variables W, X, Y, Z linked by paths with the coefficients pXW, pZX, pZY, pXY, plus error terms e]
Path analysis is largely based on the computation of partial coefficients of correlation.
Path coefficients
Path analysis is a model-confirmatory tool. It should not be used to generate models, or even to search for models that fit the data set.
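We start from regression functions:

$W = p_{xw}X + e$
$X = p_{xy}Y + e$
$Z = p_{zx}X + p_{zy}Y + e$

or, with all terms on one side:

$p_{xw}X - W + e = 0$
$p_{xy}Y - X + e = 0$
$p_{zx}X + p_{zy}Y - Z + e = 0$

[Path diagram: W, X, Y, Z linked by paths with the coefficients pXW, pYX, pXZ, pYZ]

Using Z-transformed values we get

$Z_W = p_{xw}Z_X + e$
$Z_X = p_{xy}Z_Y + e$
$Z_Z = p_{zx}Z_X + p_{zy}Z_Y + e$

Each equation is then multiplied by another variable of the model:

$Z_WZ_Y = p_{xw}Z_XZ_Y + eZ_Y$
$Z_XZ_W = p_{xy}Z_YZ_W + eZ_W$
$Z_ZZ_W = p_{zx}Z_XZ_W + p_{zy}Z_YZ_W + eZ_W$
$Z_XZ_Z = p_{xy}Z_YZ_Z + eZ_Z$
$Z_XZ_Y = p_{xy}Z_YZ_Y + eZ_Y$
$Z_ZZ_Y = p_{zx}Z_XZ_Y + p_{zy}Z_YZ_Y + eZ_Y$

Averaged over all cases, the products of Z-transformed variables are correlation coefficients: $eZ_Y = 0$, $Z_YZ_Y = 1$, $Z_XZ_Y = r_{XY}$. The equations therefore become

$r_{WY} = p_{xw}r_{XY}$
$r_{XW} = p_{xy}r_{YW}$
$r_{ZW} = p_{zx}r_{XW} + p_{zy}r_{YW}$
$r_{XZ} = p_{xy}r_{YZ}$
$r_{XY} = p_{xy}$
$r_{ZY} = p_{zx}r_{XY} + p_{zy}$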
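The step from Z-transformed products to correlation coefficients can be checked numerically. The sketch below is only an illustration: it borrows the X and A columns of the 14-case table printed elsewhere in this transcript as two stand-in variables (the names `x`, `y` and `z_transform` are not from the slides). It shows that the averaged product of two Z-transformed variables equals their correlation coefficient, and that the single path coefficient in $Z_X = p_{xy}Z_Y + e$ equals $r_{XY}$.

```python
import numpy as np

# Stand-in data: the X and A columns of the 14-case table in this transcript.
x = np.array([1.00, 1.30, 1.42, 1.70, 2.47, 3.02, 3.91,
              4.42, 5.09, 5.27, 5.58, 6.34, 6.64, 7.32])
y = np.array([0.68, 0.98, 0.74, 0.12, 0.63, 0.73, 0.19,
              0.73, 1.91, 1.49, 1.11, 0.84, 1.72, 0.87])

def z_transform(v):
    # Population standard deviation (ddof=0), so that mean(z * z) is exactly 1.
    return (v - v.mean()) / v.std()

zx, zy = z_transform(x), z_transform(y)

r_xy = np.corrcoef(x, y)[0, 1]             # Pearson correlation
mean_product = np.mean(zx * zy)            # averaged Z_X * Z_Y
slope = np.sum(zx * zy) / np.sum(zy * zy)  # least-squares slope of Z_X on Z_Y
resid = zx - slope * zy                    # error term e

print(f"r_XY = {r_xy:.4f}, mean(Z_X Z_Y) = {mean_product:.4f}, p_xy = {slope:.4f}")
print(f"mean(e Z_Y) = {np.mean(resid * zy):.2e}")
```

The last line prints a value that is zero up to rounding error, which is the condition $eZ_Y = 0$ used in the derivation.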
Path analysis is a useful tool for generating hypotheses. It fails when correlation coefficients are low and when the model structure is circular.
$Y = RP$
Species richness and soil characteristics of ground beetles
[Path diagram: species richness, pH, humidity, light, temperature and organic matter content, linked by the path coefficients pLH, pLT, pTH, pPHO, pTS, pHS, pOS]
We formulate a model of causal relationships (S: species richness, H: humidity, O: organic matter content, T: temperature, L: light, PH: pH):

$S = p_{HS}H + p_{OS}O + p_{TS}T$
$H = p_{LH}L + p_{TH}T$
$T = p_{LT}L$
$O = p_{PHO}\,PH$

We multiply each equation by the other variables:

$SH = p_{HS}HH + p_{OS}OH + p_{TS}TH$
$ST = p_{HS}HT + p_{OS}OT + p_{TS}TT$
$SO = p_{HS}HO + p_{OS}OO + p_{TS}TO$
$HT = p_{LH}LT + p_{TH}TT$
$HO = p_{LH}LO + p_{TH}TO$
$TO = p_{LT}LO$
$OT = p_{PHO}\,PH\,T$

We have seven unknowns and need seven linear equations.
Species  Light  Temperature  pH    Organic matter content  Humidity
10       3.97   3.42         4.16  4.30                    3.35
18       3.17   3.37         4.13  3.70                    3.21
25       3.38   3.40         4.05  3.75                    3.38
25       3.01   3.45         4.09  3.91                    3.18
29       2.67   3.30         3.93  3.95                    3.09
40       2.95   3.42         4.03  3.90                    3.17
20       3.04   3.30         4.06  4.01                    3.08
19       2.86   3.54         4.23  3.93                    3.05
23       3.24   3.31         4.14  4.12                    3.07
30       2.96   3.40         4.08  3.94                    3.05
23       3.74   3.52         4.25  4.14                    3.07
31       3.02   3.30         3.83  3.82                    3.27
14       3.09   3.42         3.96  3.89                    3.13
16       3.81   3.45         4.34  4.19                    3.14
24       3.68   3.49         4.33  3.38                    3.41
16       3.67   3.48         4.03  3.92                    3.37
13       3.47   3.44         3.93  3.55                    3.34
Correlation matrix

                        Species   Light     Temperature  pH         Organic_matter_content  Humidity
Species                 0         0.019219  0.27591      0.34139    0.58463                 0.36388
Light                   -0.56068  0         0.07543      0.033355   0.56839                 0.022344
Temperature             -0.28026  0.44232   0            0.011559   0.71838                 0.52892
pH                      -0.24591  0.51755   0.59609      0          0.49036                 0.75606
Organic_matter_content  -0.14277  0.14892   -0.094463    0.1796     0                       0.040297
Humidity                -0.23502  0.54942   0.16418      -0.081422  -0.50144                0
Y    | pHS  pOS  pTS  pLH  pTH  pLT  pPHO
rSH  | 1    rOH  rTH  0    0    0    0
rST  | rHT  rOT  1    0    0    0    0
rSO  | rHO  1    rTO  0    0    0    0
rHT  | 0    0    0    rLT  1    0    0
rHO  | 0    0    0    rLO  rTO  0    0
rTO  | 0    0    0    0    0    rLO  0
rOT  | 0    0    0    0    0    0    rPHT

Y         | pHS       pOS       pTS       pLH      pTH       pLT      pPHO
-0.23502  | 1         0.040297  0.16418   0        0         0        0
-0.28026  | 0.52892   0.71838   1         0        0         0        0
-0.14277  | 0.040297  1         -0.09446  0        0         0        0
0.52892   | 0         0         0         0.44232  1         0        0
0.040297  | 0         0         0         0.14892  -0.09446  0        0
-0.09446  | 0         0         0         0        0         0.14892  0
0.71838   | 0         0         0         0        0         0        0.011559
pHS   -0.21888
pOS   -0.13999
pTS   -0.06392
pLH   0.473304
pTH   0.319568
pLT   -0.63432
pPHO  62.14897

$P = R^{-1}Y$
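The solution of the seven linear equations can be reproduced numerically. The sketch below copies the numeric matrix R and the vector Y exactly as printed in this transcript and solves $Y = RP$ with NumPy. `numpy.linalg.solve` is used here instead of forming $R^{-1}$ explicitly, which is a choice made for this illustration and not something stated on the slides.

```python
import numpy as np

# Coefficient matrix R, columns ordered pHS, pOS, pTS, pLH, pTH, pLT, pPHO.
R = np.array([
    [1.0,      0.040297,  0.16418, 0.0,      0.0,     0.0,     0.0],
    [0.52892,  0.71838,   1.0,     0.0,      0.0,     0.0,     0.0],
    [0.040297, 1.0,      -0.09446, 0.0,      0.0,     0.0,     0.0],
    [0.0,      0.0,       0.0,     0.44232,  1.0,     0.0,     0.0],
    [0.0,      0.0,       0.0,     0.14892, -0.09446, 0.0,     0.0],
    [0.0,      0.0,       0.0,     0.0,      0.0,     0.14892, 0.0],
    [0.0,      0.0,       0.0,     0.0,      0.0,     0.0,     0.011559],
])

# Left-hand sides of the seven equations (rSH, rST, rSO, rHT, rHO, rTO, rOT).
Y = np.array([-0.23502, -0.28026, -0.14277, 0.52892, 0.040297, -0.09446, 0.71838])

# Solve Y = R P for the vector of path coefficients P.
P = np.linalg.solve(R, Y)

names = ["pHS", "pOS", "pTS", "pLH", "pTH", "pLT", "pPHO"]
for name, value in zip(names, P):
    print(f"{name:5s} {value: .5f}")
```

The printed values agree with the path coefficients listed in the transcript to the precision of the input data.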
[Path diagram: species richness, pH, humidity, light, temperature and organic matter content, with the paths labelled by the estimated coefficients 0.47, -0.63, 0.31, 62.2, -0.06, -0.22, -0.14]
$Y = RP$

N   X     A     B     C
1   1.00  0.68  0.55  2.16
2   1.30  0.98  1.49  0.45
3   1.42  0.74  0.13  0.55
4   1.70  0.12  0.28  2.34
5   2.47  0.63  0.73  0.60
6   3.02  0.73  1.73  0.14
7   3.91  0.19  0.28  2.60
8   4.42  0.73  1.36  2.74
9   5.09  1.91  1.89  0.99
10  5.27  1.49  0.96  1.21
11  5.58  1.11  1.14  3.20
12  6.34  0.84  1.31  1.01
13  6.64  1.72  2.57  0.92
14  7.32  0.87  1.17  3.21

R² is the explained variance in a bivariate comparison.
Logistic and other regression techniques
$Y = a_0 + \sum_{i=1}^{n} a_i x_i$
We use odds
The logistic regression model
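$p = \dfrac{e^{Y}}{1 + e^{Y}}$

$Y = \ln\left(\dfrac{p}{1-p}\right) = a_0 + \sum a_i x_i$

$\dfrac{p}{1-p} = e^{a_0 + \sum a_i x_i}$

$p = \dfrac{e^{a_0 + \sum a_i x_i}}{1 + e^{a_0 + \sum a_i x_i}}$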
P defines a probability according to a logistic model
[Figure: P on an axis marked 0.5 and 1, with a threshold and regions labelled "Surely males" and "Surely females"]
Gender  A      B      C
Female  0.038  0.165  2.211
Female  0.500  0.987  2.894
Female  0.864  0.759  0.860
Female  0.590  1.071  2.434
Female  0.385  0.749  0.984
Female  0.703  0.879  2.745
Female  0.629  1.047  2.774
Male    0.730  0.798  2.951
Male    1.367  1.841  3.174
Male    1.325  0.850  1.337
Male    0.958  1.551  3.000
Male    1.173  1.164  1.077
Male    1.559  1.521  3.266
Male    1.027  1.251  3.315
X       0.900  0.856  2.345
[Bar chart: P (0 to 1) by Gender for the seven females and seven males]
Gender  A      B      C
X       0.900  0.856  2.345

a1      a2     a3     a0
37.425  2.900  8.000  -52.5

Y = 2.436, e^Y = 11.43, p = 0.92
X is male with probability 0.92.
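The worked example for case X can be retraced in a few lines. The sketch below assumes that the printed coefficients pair up as a1 with A, a2 with B, a3 with C, and a0 as the intercept. That pairing is inferred from the column order and is not stated explicitly in the transcript. The function name `logistic_probability` is made up for this illustration.

```python
import math

def logistic_probability(a0, coeffs, xs):
    """Return (Y, p) for the logistic model p = e^Y / (1 + e^Y)."""
    y = a0 + sum(a * x for a, x in zip(coeffs, xs))
    p = math.exp(y) / (1.0 + math.exp(y))
    return y, p

a0 = -52.5                       # intercept (assumed pairing, see lead-in)
coeffs = [37.425, 2.900, 8.000]  # a1, a2, a3 (assumed to belong to A, B, C)
case_x = [0.900, 0.856, 2.345]   # A, B, C of the unclassified case X

y, p = logistic_probability(a0, coeffs, case_x)
print(f"Y = {y:.3f}, e^Y = {math.exp(y):.2f}, p = {p:.2f}")
```

With the rounded coefficients as printed, Y comes out at about 2.42, not the 2.436 shown on the slide, presumably because the slide used unrounded coefficients. The probability still rounds to 0.92.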
Regression trees
AMT: annual mean temperature
TAR: temperature range
RAI: annual mean precipitation
RAR: precipitation range

Region            AMT   TAR   RAI  RAR
Argentina_South   7.3   27.3  217  30
Argentina_South   7.9   25.7  375  66
Argentina_South   7.2   24.4  568  94
Argentina_South   7.1   23.8  685  104
Argentina_South   7.4   26.5  284  48
Argentina_South   7.8   25.3  416  74
Argentina_Pampas  15    30.2  363  33
Argentina_Pampas  15.1  31    342  32
Argentina_Pampas  15.2  31.6  320  30
Argentina_Pampas  15.2  32.2  313  26
Argentina_Pampas  14.7  32.7  275  27
Argentina_Pampas  14.4  32.5  194  17
Argentina_East    18.6  31.8  243  51
Argentina_East    19.2  30    355  73
Root
  RAR < 14.5: Australia Central (12), Other (29)
    AMT < 11.15: Argentina South (6), Other (23)
      AMT < 16.45: Argentina Pampas (6), Other (17)
        RAI < 380: Argentina East (6), Other (11)
Regression tree analysis tries to group cases according to predefined nominal and ordinal variables and returns the variable levels that best group these cases. It uses a heuristic pattern-seeking algorithm.
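Once the split levels are known, the tree is simply a chain of threshold tests. The sketch below applies the four splits shown in the tree to the rows of the region table. It assumes that cases below each threshold go to the named region and all others move on to the next split. The transcript does not state that direction, but it is consistent with every row listed. The function name `classify` is made up for this illustration, and this is not the pattern-seeking algorithm that finds the thresholds in the first place.

```python
def classify(amt, tar, rai, rar):
    """Apply the four splits of the tree in order (direction of splits assumed)."""
    if rar < 14.5:
        return "Australia_Central"
    if amt < 11.15:
        return "Argentina_South"
    if amt < 16.45:
        return "Argentina_Pampas"
    if rai < 380:
        return "Argentina_East"
    return "Other"

# (Region, AMT, TAR, RAI, RAR) rows as printed in the transcript.
rows = [
    ("Argentina_South", 7.3, 27.3, 217, 30),
    ("Argentina_South", 7.9, 25.7, 375, 66),
    ("Argentina_South", 7.2, 24.4, 568, 94),
    ("Argentina_South", 7.1, 23.8, 685, 104),
    ("Argentina_South", 7.4, 26.5, 284, 48),
    ("Argentina_South", 7.8, 25.3, 416, 74),
    ("Argentina_Pampas", 15.0, 30.2, 363, 33),
    ("Argentina_Pampas", 15.1, 31.0, 342, 32),
    ("Argentina_Pampas", 15.2, 31.6, 320, 30),
    ("Argentina_Pampas", 15.2, 32.2, 313, 26),
    ("Argentina_Pampas", 14.7, 32.7, 275, 27),
    ("Argentina_Pampas", 14.4, 32.5, 194, 17),
    ("Argentina_East", 18.6, 31.8, 243, 51),
    ("Argentina_East", 19.2, 30.0, 355, 73),
]

predicted = [classify(*row[1:]) for row in rows]
n_correct = sum(pred == row[0] for pred, row in zip(predicted, rows))
print(f"{n_correct} of {len(rows)} listed cases are assigned to their own region")
```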
N   X     A     B
1   1.00  0.68  0.55
2   1.30  0.98  1.49
3   1.42  0.74  0.13
4   1.70  0.12  0.28
5   2.47  0.63  0.73
6   3.02  0.73  1.73
7   3.91  0.19  0.28
8   4.42  0.73  1.36
9   5.09  1.91  1.89
10  5.27  1.49  0.96
11  5.58  1.11  1.14
12  6.34  0.84  1.31
13  6.64  1.72  2.57
14  7.32  0.87  1.17
What is the correlation between B and X?
[Scatter plot of B against X with regression line y = 0.17x + 0.43, R² = 0.29]
What is the pure correlation between B and X excluding the influence of A on both X and B?
We need the partial correlation of X and B.
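[Diagram: A, B and X linked by the correlations rAB, rBX, rAX]
[Scatter plot of A against X with regression line y = 0.12x + 0.43]
[Scatter plot of B against A with regression line y = 0.99x + 0.21]
[Scatter plot of the residuals DB and DX with regression line y = 0.05x - 0.18, R² = 0.06]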
A partial regression is a regression between residuals, from which the influence of a third factor has been removed.
$r_{XY/Z} = \dfrac{r_{XY} - r_{XZ}\,r_{YZ}}{\sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}}$
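The formula and the residual view of partial correlation give the same number, which can be checked on the X, A, B data of the 14-case table. The sketch below computes the partial correlation of X and B with A held constant in two ways: from the three pairwise correlations, and as the correlation between the residuals of X on A and of B on A. The helper names `partial_corr` and `residuals` are made up for this illustration.

```python
import numpy as np

x = np.array([1.00, 1.30, 1.42, 1.70, 2.47, 3.02, 3.91,
              4.42, 5.09, 5.27, 5.58, 6.34, 6.64, 7.32])
a = np.array([0.68, 0.98, 0.74, 0.12, 0.63, 0.73, 0.19,
              0.73, 1.91, 1.49, 1.11, 0.84, 1.72, 0.87])
b = np.array([0.55, 1.49, 0.13, 0.28, 0.73, 1.73, 0.28,
              1.36, 1.89, 0.96, 1.14, 1.31, 2.57, 1.17])

def partial_corr(r_xy, r_xz, r_yz):
    # First-order partial correlation of X and Y with Z held constant.
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def residuals(v, z):
    # Residuals of the simple linear regression of v on z.
    slope, intercept = np.polyfit(z, v, 1)
    return v - (slope * z + intercept)

r = np.corrcoef(np.vstack([x, a, b]))   # rows and columns ordered X, A, B
r_xb, r_xa, r_ab = r[0, 2], r[0, 1], r[1, 2]

r_formula = partial_corr(r_xb, r_xa, r_ab)
r_residual = np.corrcoef(residuals(x, a), residuals(b, a))[0, 1]

print(f"r_XB = {r_xb:.3f}, r_XB/A (formula) = {r_formula:.3f}, "
      f"r_XB/A (residuals) = {r_residual:.3f}")
```

This removes only A. A partial correlation that also removes a further variable such as C will in general give a different value.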
Partial linear correlations

r\p  X        A         B         C
X    0        0.29371   0.17742   0.03325
A    0.33073  0         0.12024   0.3568
B    0.41704  0.4732    0         0.27957
C    0.61517  -0.29216  -0.33999  0

The partial linear correlations of A, B, and C on X.

Multiple regression results

          Coeff.  Std.err.  r2     p      R^2
Constant  -0.561  1.317            0.679  0.000
A         1.425   1.286     0.109  0.294  0.262
B         1.388   0.957     0.174  0.177  0.291
C         1.065   0.432     0.378  0.033  0.094

To show the isolated influence of single predictors, we report the squared partial correlation coefficients (r2) within the linear regression results.
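A multiple regression of X on A, B and C can be sketched with ordinary least squares. The example below uses the 14-case table printed in this transcript and `numpy.linalg.lstsq`. Whether this is the software or exact data behind the slide's results table is not stated, and the printed data are rounded to two decimals, so the output need not match the table digit for digit.

```python
import numpy as np

x = np.array([1.00, 1.30, 1.42, 1.70, 2.47, 3.02, 3.91,
              4.42, 5.09, 5.27, 5.58, 6.34, 6.64, 7.32])
a = np.array([0.68, 0.98, 0.74, 0.12, 0.63, 0.73, 0.19,
              0.73, 1.91, 1.49, 1.11, 0.84, 1.72, 0.87])
b = np.array([0.55, 1.49, 0.13, 0.28, 0.73, 1.73, 0.28,
              1.36, 1.89, 0.96, 1.14, 1.31, 2.57, 1.17])
c = np.array([2.16, 0.45, 0.55, 2.34, 0.60, 0.14, 2.60,
              2.74, 0.99, 1.21, 3.20, 1.01, 0.92, 3.21])

# Design matrix: a column of ones for the constant, then the predictors.
design = np.column_stack([np.ones_like(a), a, b, c])

# Least-squares estimate of X = a0 + a1*A + a2*B + a3*C + e.
coef, *_ = np.linalg.lstsq(design, x, rcond=None)
resid = x - design @ coef                  # the residuals e
r_squared = 1.0 - resid.var() / x.var()    # variance explained by the whole model

for name, value in zip(["Constant", "A", "B", "C"], coef):
    print(f"{name:8s} {value: .3f}")
print(f"R^2 = {r_squared:.3f}")
```

The residuals sum to zero and are uncorrelated with each predictor, which is what makes them the unexplained part of the variance in X.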