Nicolaus Copernicus University – Department of Animal Ecology
Community and gradient analysis: Matrix approaches in macroecology
The world comes in fragments
Statistical inference means comparing your hypothesis H1 with an appropriate null hypothesis H0.

                 H0 correct          H0 wrong
H0 rejected      Type I error (α)    1-β
H0 accepted      1-α                 Type II error (β)
Simple examples in ecology are
• The correlation between species richness and area
(H0: no correlation, t-test)
• Differences in productivity between plots of different soil properties.
(H0: no difference between means, ANOVA)
But what about more complex patterns:
• Relative abundance distributions
• Productivity – diversity relationship
• Succession
• Community assembly
Galapagos Islands
$$H = -\sum_{i=1}^{n} p_i \ln p_i$$
$$\mathrm{Var}(H) = \mathrm{Jackknife}[\mathrm{Samp}(H)]$$
But: Your variance estimator comes from the underlying distribution of species and individuals.
Does the variance stem from
• Species interactions?
• Random processes?
• Evolutionary history?
• Ecological history?
In fact we do not have an appropriate null hypothesis.
Bootstrapped or jackknifed variance estimators capture only the variability in the underlying distribution.
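The diversity index and its jackknifed variance can be sketched as follows. This is only an illustration: the slide does not say which units are jackknifed, so the sketch assumes a delete-one-individual jackknife, and the function names are hypothetical. The counts are the "lip" abundances from the beetle table in this transcript.

```python
import math

def shannon(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over species with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def jackknife_variance(counts):
    """Delete-one-individual jackknife variance of H (an assumed scheme).

    Removing one individual of species i gives the same leave-one-out
    value for all c_i individuals of that species, so each value is
    weighted by c_i instead of being recomputed c_i times.
    """
    n = sum(counts)
    leave_one_out = []  # pairs of (H without one individual, weight)
    for i, c in enumerate(counts):
        if c == 0:
            continue
        reduced = list(counts)
        reduced[i] -= 1
        leave_one_out.append((shannon(reduced), c))
    mean = sum(h * w for h, w in leave_one_out) / n
    return (n - 1) / n * sum(w * (h - mean) ** 2 for h, w in leave_one_out)

# Abundances of the 26 carabid species at site "lip"
lip = [704, 180, 117, 113, 94, 69, 37, 24, 8, 7, 7, 7, 5, 5,
       4, 4, 3, 3, 2, 2, 2, 2, 2, 2, 2, 1]

h = shannon(lip)
var_h = jackknife_variance(lip)
print(f"H = {h:.3f}, jackknife Var(H) = {var_h:.6f}")
```

Whatever resampling unit is chosen, the estimate reflects only the sampling variability of the observed abundance distribution, which is the point made above.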
We compare diversities on islands
A t-test points to significant differences in diversity.
Statistical inference
Species                                        lip    kor    hel
Pterostichus melanarius                        704   1199    169
Pterostichus oblongopunctatus (Fabricius)      180   1019      8
Platynus assimilis (Paykull)                   117     76      9
Carabus granulatus                             113    154     11
Nebria brevicolis (Fabricius)                   94     34      0
Harpalus 4-punctatus Dejean                     69    555     67
Patrobus atrorufus (Stroem)                     37     11      0
Pterostichus strennus (Panzer)                  24     28      6
Epaphius secalis (Paykull)                       8      0      0
Badister bullatus (Schrank)                      7      0      1
Oxypselaphus obscurus (Herbst)                   7     96      0
Notiophilus palustris (Duftshmid)                7      4      2
Synuchus vivalis (Illiger)                       5      4     10
Carabus nemoralis Muller                         5     10     16
Amara plebeja (Gyllenhal)                        4      0      0
Leistus terminatus (Hellwig)                     4      3      1
Badister unipustulatus Bonelli                   3      0      0
Leistus rufomarginatus (Duftshmid)               3      0      3
Pterostichus nigrita (Paykull)                   2      0      0
Poecilus versicolor (Sturm)                      2      0      0
Pseudoophonus rufipes (De Geer)                  2     90      3
Harpalus latus (Linnaeus)                        2      1      0
Pterostichus antracinus                          2      2     11
Notiophilus biguttatus (Fabricius)               2      0      1
Stomis pumicatus (Panzer)                        2     14      2
Amara brunea (Gyllenhal)                         1      0      3
Is species co-occurrence random or do species have
similar habitat requirements?
A simple regression analysis points to joint occurrences.
$$F = \frac{R^2}{1 - R^2}\,(n - 2)$$
$$P_F(r=0) < 0.00001$$
Abundances scale exponentially. Extreme values bias the results.
Spearman’s r = 0.67, PF(r=0) < 0.001
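The two correlation tests can be sketched in plain Python. The function names are hypothetical and the code is a minimal illustration, not the software used for the slides. The F statistic uses F = R²(n - 2)/(1 - R²) with 1 and n - 2 degrees of freedom. The data are the "lip" and "kor" columns of the beetle table.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman_r(x, y):
    """Rank correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def f_statistic(r_squared, n):
    """F = R^2 (n - 2) / (1 - R^2) for a simple linear regression."""
    return r_squared * (n - 2) / (1 - r_squared)

lip = [704, 180, 117, 113, 94, 69, 37, 24, 8, 7, 7, 7, 5, 5,
       4, 4, 3, 3, 2, 2, 2, 2, 2, 2, 2, 1]
kor = [1199, 1019, 76, 154, 34, 555, 11, 28, 0, 0, 96, 4, 4, 10,
       0, 3, 0, 0, 0, 0, 90, 1, 2, 0, 14, 0]

r2 = pearson_r(lip, kor) ** 2
print(f"R^2 = {r2:.4f}, F = {f_statistic(r2, len(lip)):.2f}")
print(f"Spearman r = {spearman_r(lip, kor):.2f}")
```

Because the rank correlation ignores the magnitude of the few very abundant species, it is less sensitive to the extreme values mentioned above.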
Classical Fisherian testing relies on an equiprobable null assumption. All values are equiprobable.
In ecology this assumption is often not realistic.
[Figure: abundances at site kor (y-axis, 0 to 1500) plotted against abundances at site lip (x-axis, 0 to 1000); linear regression y = 1.8589x + 26.397, R² = 0.7075]
Species do not have the same abundances in the metacommunity, and sites differ in capacity.
Statistical testing should incorporate such differences in occurrence probabilities.
Ecologists often have a good hypothesis H1. Much discussion is about the appropriate null assumption H0.
What do we expect if colonization of these
three islands is random?
Ecology is interested in the differences between observed pattern and random expectation.
Our statistical tests should deal with these differences and not with raw pattern!
If we use classical Fisherian testing, nearly all empirical ecological matrices are significantly non-random.
Thus we can’t separate ecological interactions from mass effects.
Theory of island biogeography
The theory tries to understand diversity from a stochastic, species-based approach.
[Figure: Galapagos Islands]
We treat the theory as H1
The theory gives us expectations that have
to be confirmed by observation.
[Figure: species richness plotted against isolation; label: z]
[Figure: species richness plotted against isolation, with 95% confidence limits; label: z]
We treat the theory as H0
The theory gives us random expectations.
Residuals need ecological interpretation.
Multispecies metapopulation and patch occupancy models
Islands in a fragmented landscape
Random dispersal of individuals
between islands results in a
stable pattern of colonization
$$\frac{dp_i}{dt} = a\,m \left(\sum_{j \neq i} e^{-d_{ij}} A_j\, p_j\right)(1 - p_i) - \frac{p_i}{A_i}$$
The change of occupancy p in time depends on
patch size and distance according to a logistic
growth equation.
Metapopulation models are single
species equivalents of the island
biogeography model.
Multispecies metapopulation models
give null expectations on community structure.
The neutral theory of biodiversity
In the neutral theory, all ecological patterns are governed by only one fundamental constant, the universal biodiversity number, which is the product 2pm, with p being the speciation rate and m being the size of the metacommunity.
[Figure: a metacommunity linked to local communities; diagram labels: BD, EI, P, S, Ecological drift, Zero sum multinomial]
Neutral models try to explain ecological patterns by five basic stochastic processes:
- Simple birth processes
- Simple death processes
- Immigration of individuals
- Dispersal of individuals
- Lineage branching
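Three of these processes (death, birth, immigration) are enough for a toy simulation of zero-sum ecological drift in one local community. The sketch below is a simplification under stated assumptions: the function name, the immigration probability `m`, and the fixed metacommunity pool are all hypothetical, and lineage branching (speciation) is left out.

```python
import random
from collections import Counter

def neutral_drift(local, metacommunity, m, steps, rng):
    """Zero-sum drift: in each step one individual dies and is replaced.

    local:         list of species labels, one per individual (size stays constant)
    metacommunity: list of species labels from which immigrants are drawn
    m:             probability that the replacement is an immigrant
    """
    local = list(local)
    size = len(local)
    for _ in range(steps):
        dead = rng.randrange(size)                    # death of a random individual
        if rng.random() < m:
            local[dead] = rng.choice(metacommunity)   # immigration
        else:
            parent = rng.randrange(size - 1)          # birth: parent among the survivors
            if parent >= dead:
                parent += 1
            local[dead] = local[parent]
    return local

rng = random.Random(1)
meta = ["a"] * 50 + ["b"] * 30 + ["c"] * 20           # hypothetical species pool
start = ["a"] * 100                                   # local community of 100 individuals
final = neutral_drift(start, meta, m=0.1, steps=5000, rng=rng)
print(Counter(final))
```

No species-specific trait enters the simulation: every individual has the same chance to die, reproduce, or be replaced by an immigrant.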
Neutral models are the individual-based equivalents of the species-based theory of island biogeography!
Although they make predictions about diversities, they do not explicitly refer to species!
Diversities refer to evolutionary lineages
Ecological drift
The main trigger of neutrality is dispersal.
At high dispersal rates, species-specific traits are of minor importance
for the shape of basic ecological distributions.
Used as H1, neutral models make explicit predictions about
• Shape and parameters of species rank order distributions
• Species – area relationships
• Abundance – range size relations
• Local diversity patterns
• Patterns of succession
• Local and regional species numbers
• Branching patterns of taxonomic lineages
Used as H0, residuals from model predictions are a measure of ecological interactions.
• The model contains a number of hidden variables (dispersal limitation, branching mode, dispersal probability, isolation, matrix shape…)
• CPU times are a limiting resource
• Variable carrying capacities are needed to obtain realistic evolutionary time scales
Birth / Death
Dispersal limitation function
Dispersal rate
Mode and frequency of speciation
Immigration rate
Carrying capacities
The neutral, metapopulation, and island biogeography models contain too many hidden variables to be of use as null hypotheses.
Ecological realism without too many parameters
We need null models that are ecologically realistic and rely on few assumptions that
apply to all species.
Gradient of null model assumptions including more and more constraints.
Null models use only information given in the matrix. These are matrix fill, marginal totals, and degree distributions.
Retain fill
Retain fill and row totals
Retain fill and column totals
Retain fill and row degree distribution
Retain fill and column degree distribution
Retain fill and row and column degree distribution
Retain row and column totals
Possible constraints

Rows \ Columns                      equiprobable    proportional to marginal totals    marginal totals fixed
equiprobable                        x               x                                  x
proportional to marginal totals     x               x                                  x
marginal totals fixed               x               x                                  x
        4   2   7   1   3   5   6   8   Sum
  8     1   1   0   0   0   1   1   1     5
  3     1   1   0   0   1   0   1   0     4
  9     1   1   1   0   0   1   0   0     4
  1     1   0   0   1   0   0   0   1     3
  2     0   0   1   1   0   0   1   0     3
  6     0   0   1   0   1   1   0   0     3
  4     0   1   0   0   1   0   0   0     2
 10     0   0   1   1   0   0   0   0     2
  5     1   0   0   0   0   0   0   0     1
  7     0   0   0   0   0   0   0   1     1
Sum     5   4   4   3   3   3   3   3

Figure labels: Degree distribution, Marginal totals
Start from an empty matrix and fill it randomly, either without constraints or according to some constraints.
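Filling an empty matrix can be sketched for the two least constrained cases, equiprobable cells and cells weighted by marginal totals. The function name and its parameters are hypothetical. Weighting a cell by the product of its row and column totals is one plausible reading of "proportional", not necessarily the exact scheme behind the slides.

```python
import random

def fill_matrix(n_rows, n_cols, fill, rng, row_weights=None, col_weights=None):
    """Place `fill` occurrences in an empty n_rows x n_cols matrix.

    Without weights every empty cell is equally likely. With weights the
    chance of an empty cell is proportional to row_weights[i] * col_weights[j].
    """
    row_weights = row_weights or [1.0] * n_rows
    col_weights = col_weights or [1.0] * n_cols
    matrix = [[0] * n_cols for _ in range(n_rows)]
    empty = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    for _ in range(fill):
        weights = [row_weights[i] * col_weights[j] for i, j in empty]
        k = rng.choices(range(len(empty)), weights=weights)[0]
        i, j = empty.pop(k)          # each cell can be filled only once
        matrix[i][j] = 1
    return matrix

# Marginal totals of the 10 x 8 example matrix shown above (fill = 28)
row_totals = [5, 4, 4, 3, 3, 3, 2, 2, 1, 1]
col_totals = [5, 4, 4, 3, 3, 3, 3, 3]

rng = random.Random(0)
equi = fill_matrix(10, 8, 28, rng)                          # equiprobable - equiprobable
prop = fill_matrix(10, 8, 28, rng, row_totals, col_totals)  # proportional - proportional
for row in prop:
    print(" ".join(str(v) for v in row))
```

Both variants retain the matrix fill. Only the weighted one tends to reproduce the observed differences among row and column totals, and neither fixes them exactly.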
Gradient of null model assumptions including more and more constraints.
Equiprobable - equiprobable
Proportional - proportional
Equiprobable - fixed
Fixed - Equiprobable
Fixed - proportional
Fixed - fixed
• Includes mass effects. Most liberal. Identifies nearly all empirical matrices as being not random. Low discrimination power.
• Partly includes mass effects. Appropriate if species abundances or site capacities are equal. Identifies most empirical matrices as being not random.
• Partly excludes mass effects. Appropriate if species abundances or site capacities are proportional to metapopulation abundances or site capacities. Identifies many empirical matrices as being not random.
• Excludes most mass effects. Appropriate if column totals are proportional to site capacities. Identifies many empirical matrices as being random.
• Excludes mass effects. Appropriate if nothing is known about abundances and capacities. Identifies most empirical matrices as being random.
Original matrix (species in rows, sites in columns):
Species  1 2 3 4 5 6 7 8
A        1 1 1 0 0 0 0 1
B        1 1 1 1 1 1 1 0
C        1 1 1 1 1 1 1 1
D        1 1 1 1 1 1 1 1
E        1 1 0 1 0 0 1 1
F        1 1 1 1 1 1 1 1
G        1 0 1 1 1 0 0 0
H        1 1 1 0 0 1 1 1
I        0 1 1 0 1 0 0 1
J        1 1 1 1 1 1 1 1
K        0 1 1 1 1 0 0 1
L        1 1 0 0 0 1 1 0

Checkerboard submatrices:
1 0     0 1
0 1     1 0

Empty matrix:
Species  1 2 3 4 5 6 7 8
A        0 0 0 0 0 0 0 0
B        0 0 0 0 0 0 0 0
C        0 0 0 0 0 0 0 0
D        0 0 0 0 0 0 0 0
E        0 0 0 0 0 0 0 0
F        0 0 0 0 0 0 0 0
G        0 0 0 0 0 0 0 0
H        0 0 0 0 0 0 0 0
I        0 0 0 0 0 0 0 0
J        0 0 0 0 0 0 0 0
K        0 0 0 0 0 0 0 0
L        0 0 0 0 0 0 0 0

Partly filled matrix:
Species  1 2 3 4 5 6 7 8
A        0 1 0 1 0 0 0 0
B        0 0 0 0 1 0 1 0
C        0 1 0 0 1 0 0 0
D        0 0 0 0 0 0 1 0
E        0 0 1 0 1 0 0 0
F        0 0 0 0 0 0 1 0
G        0 0 0 0 0 0 0 0
H        0 1 0 1 0 1 0 0
I        0 0 0 0 0 0 0 0
J        1 0 0 0 1 0 1 0
K        1 0 0 0 0 0 0 1
L        0 0 1 1 0 1 0 1
Fill algorithm
An initially empty matrix is filled step by step at random. If a placement violates the above constraints, the algorithm steps back and places the occurrence elsewhere. The process continues until all occurrences are placed.
Major drawbacks:
Long computation times
Potential dead ends
Swap algorithm
The algorithm screens the original matrix for checkerboards and swaps them, which leaves row and column sums constant. Use at least 10*species*sites swaps.
Major drawbacks:
Generates biased matrices that depend on the original distribution
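A minimal sketch of the swap idea: pick two rows and two columns at random and, if the 2x2 submatrix is a checkerboard, flip it. The function name and the `max_tries` safeguard are hypothetical additions. The input is the 12 x 8 species by sites example matrix from the slides.

```python
import random

def independent_swap(matrix, n_swaps, rng, max_tries=None):
    """Randomize a presence-absence matrix by checkerboard swaps.

    Row and column sums stay constant because each swap only exchanges
    the two diagonals of a 2x2 checkerboard submatrix.
    """
    m = [row[:] for row in matrix]            # work on a copy of the original
    n_rows, n_cols = len(m), len(m[0])
    max_tries = max_tries or 1000 * n_swaps   # guard against matrices without checkerboards
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        a, b, c, d = m[r1][c1], m[r1][c2], m[r2][c1], m[r2][c2]
        if a == d and b == c and a != b:      # 10/01 or 01/10 pattern
            m[r1][c1], m[r1][c2] = b, a
            m[r2][c1], m[r2][c2] = d, c
            done += 1
    return m

observed = [
    [1, 1, 1, 0, 0, 0, 0, 1],  # A
    [1, 1, 1, 1, 1, 1, 1, 0],  # B
    [1, 1, 1, 1, 1, 1, 1, 1],  # C
    [1, 1, 1, 1, 1, 1, 1, 1],  # D
    [1, 1, 0, 1, 0, 0, 1, 1],  # E
    [1, 1, 1, 1, 1, 1, 1, 1],  # F
    [1, 0, 1, 1, 1, 0, 0, 0],  # G
    [1, 1, 1, 0, 0, 1, 1, 1],  # H
    [0, 1, 1, 0, 1, 0, 0, 1],  # I
    [1, 1, 1, 1, 1, 1, 1, 1],  # J
    [0, 1, 1, 1, 1, 0, 0, 1],  # K
    [1, 1, 0, 0, 0, 1, 1, 0],  # L
]

n_species, n_sites = len(observed), len(observed[0])
randomized = independent_swap(observed, 10 * n_species * n_sites, random.Random(42))
for row in randomized:
    print(" ".join(str(v) for v in row))
```

Calling the function again on `observed` gives a further independent null matrix. Feeding each result back in as the next input would instead give a sequential swap.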
Trial algorithm (sum of squares reduction)
The algorithm starts with a random matrix that satisfies the row and column constraints and sequentially swaps all 2x2 submatrices until only 1 and 0 remain.
Major drawbacks:
Randomized matrices have a low variance and are therefore prone to type II errors.
Random start matrix (species in rows, sites in columns):
Species  1 2 3 4 5 6 7 8
A        0 2 0 2 0 0 0 0
B        0 1 1 0 3 0 2 0
C        0 1 0 2 1 2 3 0
D        1 1 1 2 3 0 0 0
E        0 0 1 0 1 0 0 3
F        0 0 0 0 0 0 4 4
G        0 1 0 1 1 0 1 0
H        0 2 0 2 0 2 0 0
I        0 1 1 0 0 2 0 0
J        1 0 5 0 1 0 0 0
K        1 0 3 0 0 0 0 1
L        0 0 4 0 0 0 0 0

Example swap of a 2x2 submatrix:
2 0     1 1
0 1     1 0
Algorithms for the fixed - fixed null model
The swap algorithm is most often used:
1. Sequential swap: first run a burn-in of 30000 swaps, then take the matrix after each further 5000 swaps as a new random matrix.
2. Independent swap: generate each random matrix from the original matrix using at least 10*species*sites swaps.
Compare the observed metric scores with the simulated ones (100 or more randomized matrices)
$$Z\text{-score} = \frac{x - \bar{x}}{s} = \frac{3.621 - 3.711}{0.141} = -0.638$$
Z-score of the lower CL = -0.37
Z-score of the upper CL = -3.00
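The comparison of an observed score with the simulated ones can be sketched as below. The function names are hypothetical, and the list of simulated scores is a made-up placeholder rather than data from the slides. Only the summary values 3.621, 3.711 and 0.141 come from the example.

```python
import statistics

def z_from_summary(observed, sim_mean, sim_sd):
    """Z-score from the mean and standard deviation of the simulated scores."""
    return (observed - sim_mean) / sim_sd

def z_score(observed, simulated):
    """Z-score of an observed metric against a list of null-model scores."""
    return z_from_summary(observed,
                          statistics.mean(simulated),
                          statistics.stdev(simulated))

def lower_tail_fraction(observed, simulated):
    """Fraction of simulated scores that are less than or equal to the observed one."""
    return sum(1 for s in simulated if s <= observed) / len(simulated)

# Summary values from the example: observed 3.621, simulated mean 3.711, SD 0.141
z_example = z_from_summary(3.621, 3.711, 0.141)
print(f"Z = {z_example:.3f}")

# Hypothetical simulated scores, only to show the call pattern
simulated_scores = [3.60, 3.65, 3.70, 3.72, 3.75, 3.80, 3.90]
print(z_score(3.621, simulated_scores), lower_tail_fraction(3.621, simulated_scores))
```

When the simulated distribution is skewed, as in the histogram of this example, the tail fraction can be more informative than a Z-score that implicitly assumes symmetry.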
[Figure: frequency histogram of the simulated scores (x-axis: scores from 3.57 to 4.1; y-axis: frequency from 0 to 2000), with the observed score and the lower and upper CL marked]
Using abundances
Levels: Abundance, Species, Populations
Options at each level:
• equiprobable
• proportional to marginal (observed) totals
• marginal totals fixed (populations fixed)
Including abundances in null models increases the number of possible null models.
These 27 combinations regard rows, columns, and rows and columns.
[Figure: overview of the tests of metrics and null models]
Metrics: CA, SA, AA, MA, U, Mantel
Null models: IA, IT, ITC, ITR, IS, ISC, ISR, IR, IF, OA, OF, PM, PR, PC
Further label: BR
Test matrices: 200 MS and 200 MR; Mmod = 600 seeded MR; Mmod = 200 seeded MR; 185 empirical abundance matrices; 185 empirical matrices transformed to presence – absence matrices (C-score, fixed - fixed null model)
Tests: dependence on size, fill, and abundance; dependence on null matrix construction; power to detect segregation; power to detect aggregation; test for differences between taxa and biome
Null model                                     CA          Morisita    Variance    Mantel
                                               LCL  UCL    LCL  UCL    LCL  UCL    LCL  UCL
Prop-prob, total abundance fixed                 0   39      5    8      3    2      0    0
Prop-prob, row/column abundances fixed          20    1      0   21      2   20      0    0
Prop-prop, row/column richness fixed            43   25      0   39      0   98      0    2
Prop-prob, total richness fixed                 65    6     91    0     73    4     49    0
Row/column richness and abundance fixed          8   11     25   12      3   45      0    0
Occurrences fixed                               58    2      5   48      2   17      0  166
Occurrences and row/column abundances fixed      0   72      8   24     20   56      0  164
Populations fixed                              182    3      0  194      0  196      0  158
Populations per column fixed                   171    1      1  145      1  181     15  142
Populations per row fixed                      180    1      0  193      0  193      0  158
Testing of null models and metrics using proportional random matrices. The metrics shouldn’t detect these matrices as being non-random.
200 random matrices
[Figure: C-score; bar chart (axis 0 to 1) for the groups Interaction, Arthropods, Carabidae, Non-arthropod invertebrates, Plants, Vertebrates]
Abundance matrices are more often detected as being non-random
[Figure: CA; bar chart (axis 0 to 1) for the groups Interaction, Arthropods, Carabidae, Non-arthropod invertebrates, Plants, Vertebrates]
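CA: a 2x2 submatrix
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
is counted if $(a>b,\ a>c,\ d>b,\ d>c)$ or $(a<b,\ a<c,\ d<b,\ d<c)$.
$$CA_{ST} = \frac{4\,CA}{m(m-1)\,n(n-1)}$$
C-score:
$$CS = \frac{2\sum_{i<j}(N_i - N_{ij})(N_j - N_{ij})}{S(S-1)}$$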
Fraction of 185 matrices detected as being significantly (two-sided 95% CL) segregated (dark bars) or aggregated (white bars).
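For a presence-absence matrix, the C-score can be computed as a minimal sketch of the standard definition: the mean number of checkerboard units (N_i - N_ij)(N_j - N_ij) per species pair. It assumes species are in rows. The function name is hypothetical.

```python
from itertools import combinations

def c_score(matrix):
    """Mean number of checkerboard units per species pair.

    matrix: presence-absence matrix with species in rows and sites in columns.
    N_i is the number of occurrences of species i, and N_ij is the number of
    sites shared by species i and j.
    """
    s = len(matrix)
    totals = [sum(row) for row in matrix]
    units = 0
    for i, j in combinations(range(s), 2):
        shared = sum(1 for a, b in zip(matrix[i], matrix[j]) if a and b)
        units += (totals[i] - shared) * (totals[j] - shared)
    return 2 * units / (s * (s - 1))

perfect_checkerboard = [[1, 0],
                        [0, 1]]
fully_nested = [[1, 1],
                [1, 1]]
print(c_score(perfect_checkerboard), c_score(fully_nested))
```

High scores relative to the null distribution point to segregated species, and low scores to aggregated ones.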