After 50 Years of Quantitative Palaeoecology – Senescence, Maturity, or Progress? H John B Birks University of Bergen and University College London Lanzhou, August 2015


After 50 Years of Quantitative Palaeoecology – Senescence, Maturity, or Progress?

H John B Birks, University of Bergen and University College London

Lanzhou, August 2015


Typical ‘Life’ of a Scientific Approach – Main Phases

At end of mature phase, stable state implies now accepted as ‘normal science’

Senescence implies some of the earlier ideas not very useful, best forgotten about!

Progress indicates more to be done, not yet fully mature

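Figure: activity (e.g. publications) plotted against time (1965, 1975, 1985, 1995, 2005, 2015), with the phases labelled Pioneer, Building, and Mature?, followed by three possible trajectories: stability, senescence, or progress.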

Where will we go from 2015?


Cognisance, ignorance, knowledge, and uncertainty

The Pioneering Phase: 1965–1974 (began 50 years ago)


The Pioneering Phase: 1965–1974

All descriptive – characterise patterns in complex multivariate data (stratigraphical or surface-sample data). Exceptions are Webb & Bryson (1972), which provides narratives (untestable climate reconstructions) from pollen data, and Mosimann (1965), which presents robust statistical methods for estimating errors in pollen counting – sadly ignored today!


The Building Phase: 1975–1985 (began 40 years ago)


The Building Phase: 1975–1985

1985

1992

Primarily descriptive or narrative, with a hint of analytical hypothesis testing in Birks & Peglar (1979) in relation to different interglacials.


The Building Phase: 1975–1985

At the same time, important developments going on in quantitative plant ecology

Correspondence analysis (CA), detrended correspondence analysis (DCA), canonical correspondence analysis (CCA)

Publications shown: 1987; J Ecol 1973; Vegetatio 1980; Ecology 1986


The Mature Phase: 1985–2015 (began 30 years ago)


The Mature Phase: 1985–2015

2012

Primarily narratives (plausible but untestable environmental reconstructions) and analytical hypothesis-testing


The Mature Phase: 1985–2015

Culminated in the ‘big blue book’ of 2012 edited by Andy Lotter, Steve Juggins, John Smol, and myself

Was that the ‘pinnacle’ of the subject’s life?


At the same time, applied statisticians were developing new ‘state-of-the-art’ techniques for handling and analysing huge data-sets, so-called ‘data-mining’ and ‘statistical learning’ procedures.

2013 2011 2015


Data-sets so large that analysis can ‘learn’ from the data when split into a ‘training’ set, a ‘validation’ set, and a ‘test’ set. With ever-increasing computer power, can repeat the analyses with such random splitting of the data many times (e.g. 1000) to assess uncertainties, significance levels, etc. ‘Statistics in the computer age’

Cross-validation – bootstrapping, leave-one-out, leave-n-out (where n may be 100, 500, etc. objects)
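A minimal sketch of two of these resampling ideas, assuming a toy 'model' that simply predicts the mean (the data values and both function names are invented for illustration, not taken from the talk):

```python
import random
import statistics

def leave_one_out_rmse(values):
    """Leave-one-out: predict each value from the mean of the others."""
    n = len(values)
    total = sum(values)
    squared_errors = []
    for v in values:
        prediction = (total - v) / (n - 1)  # mean of the remaining n-1 values
        squared_errors.append((v - prediction) ** 2)
    return (sum(squared_errors) / n) ** 0.5

def bootstrap_se_of_mean(values, n_boot=1000, seed=1):
    """Bootstrap: resample with replacement n_boot times, return SE of the mean."""
    rng = random.Random(seed)
    n = len(values)
    means = [statistics.mean(rng.choices(values, k=n)) for _ in range(n_boot)]
    return statistics.stdev(means)

# Hypothetical measurements, for illustration only
data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4]
print(leave_one_out_rmse(data))
print(bootstrap_se_of_mean(data))
```

The same pattern (hold some objects out, refit, predict, repeat) carries over to leave-n-out by holding out n objects at a time.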


Discuss six of these new techniques that in the last few years have shown themselves to be important additions to the quantitative palaeoecologist’s tool-kit.

To put them into context, outline the basic uses of numerical methods in quantitative palaeoecology (Birks 2013 Encyclopedia of Quaternary Science 3, 821-830).

Data collection and data assessment
• Identification
• Error estimation

Data summarisation
• Single stratigraphical or geographical data-sets (e.g. zonation, ordination)
• Two or more stratigraphical or geographical data-sets

Data analyses
• Palynological richness
• Population analysis
• Rate-of-change analysis
• Time-series analysis
• Pollen-based climate reconstructions

Data interpretation
• Vegetation reconstruction
• Causative factors

Quaternary data-sets can be modern (‘surface samples’) (M) or fossil (stratigraphical) (F) data


Six new techniques

1. Co-correspondence analysis (Co-CA)

2. Classification and regression trees (CART) + indicator species analysis (INDVAL)

3. Procrustes analysis and comparison of ordinations and classification

4. Principal curves (PC)

5. Intrinsic and extrinsic drivers of change

6. Statistical significance of environmental reconstructions

M or F, data summarisation, data analysis

F, data interpretation

F, data analysis, data interpretation


Palaeoecological pioneers of these new techniques in the last 2–3 years – next generation

Jacob Carstensen, Gavin Simpson, Ulrike Herzschuh, Tom Davidson, Vivian Felde, Alistair Seddon


And the applied statisticians who developed these methods

Cajo ter Braak, Glenn De’ath, Trevor Hastie, Robert Tibshirani, Mark Hill, Pierre Legendre


Co-Correspondence Analysis (Co-CA)

ter Braak & Schaffers (2004) Ecology 85: 834-846

Problems: are carabid beetles in grassland more closely related to vegetation structure (height, cover, biomass, etc.) than to vegetation composition? Are fen bryophytes more closely related to vegetation composition than to water chemistry?

Data - beetles, plants, vegetation type, vegetation structure, and environmental variables all from same set of sites.

Approaches

1. RDA/CCA - predict beetles from environmental data.

- cannot predict beetles from vegetation data because may be more plant species (predictors) than sites. No constraints.

2. CA/DCA of beetles and plants separately, correlate the axes (compare with Procrustes rotation). Correlative rather than predictive approach. Can reduce plant data to (D)CA axes first, use these as predictors. Will work if major patterns in one biological data-set are important for the other response data-set. Need not be so.

Need a one-step method where the most important relationships are expressed in the first few axes so that nothing important is missed. Co-correspondence analysis.


Co-correspondence analysis (Co-CA)

Problem with combined CA is that each CA has its own site weights (the site's total abundance of the species in the analysis). Pointless to have weights that are a sum of both beetles and plants.

As in CA (reciprocal averaging algorithm), but Co-CA has an explicit maximisation criterion: the covariance between the WA species and site scores of the beetle data and the WA species and site scores of the plant data should be maximised. Replaces linear combinations (PCA, PLS, RDA) with weighted averages, so it is suitable for unimodal biological data.

Symmetric, descriptive Co-CA (can swap data sets)

Asymmetric, predictive Co-CA (data A are thought to influence data B)

CA - selects species scores (by WA) to maximise variance of site scores under the constraint that the species scores have unit variance. Symmetrical in that species and sites can be interchanged in the optimisation criterion.

Co-CA - calculates two sets of WA species and site scores to maximise weighted covariance between the two sets of WA species and site scores with allowance for differences in weights among data. What is maximised is covariance between two sets of site scores with common site weights; covariance is maximised by finding appropriate species weights. Species scores of one set are weighted averages of other set's site scores and site scores are weighted averages of the species scores of own set.


Beetles 91 species, 173 plant species, 30 sites.

Eigenvalues of the first three axes of separate CAs and DCAs and of symmetric Co-CA of beetles and plants.

Method                  Axis 1   Axis 2   Axis 3
Beetles
  CA                    0.50     0.36     0.32     4.99
  DCA                   0.50     0.32     0.21
  Length of gradient    3.22     2.74     2.57
Plants
  CA                    0.57     0.53     0.42     5.65
  DCA                   0.57     0.41     0.27
  Length of gradient    3.44     2.99     2.88
Beetles-plants
  Co-CA                 0.25     0.13     0.08     0.94

Highly structured data-sets - high eigenvalues, long gradients.


Correlation coefficients between beetle-derived and plant-derived site scores of the first three axes of separate CAs and DCAs and of symmetric Co-CA (% fit = the percentage fit of the beetle data by the first two plant-derived axes).

Method   Axis 1   Axis 2   Axis 3   % fit (2 axes)
CA       0.88     0.27     0.46     15
DCA      0.89     0.53     0.07     16
Co-CA    0.96     0.94     0.88     19

Highest correlations for all three axes in Co-CA

Percentage variance for 2 axes highest (19%) in Co-CA


Direct ordination procedure for relating one community data-set to another community data-set.

Combines WA and PLS to maximise covariance between WA species scores of one community data-set with those of another. Finds ecological gradients common to both.

Species assemblages are a multivariate 'bio-assay' of the environment. Assemblages analysed by Co-CA often give better predictions of another set of assemblages than using environmental variables alone. Often environmental basis for ecological gradients is not precisely known.

Fen bryophytes - vascular plants: 28% explained (Co-CA)

Fen bryophytes - environmental variables: 17%

Co-CA can be used to find good indicators for biodiversity. Not all species groups are equally easy to sample or identify. Can try to predict a 'difficult' group from an 'easy' group. Need representative full data for both species groups from common set of sites for Co-CA. After this, only the 'easy' group need be sampled. Another idea is to look at biological data at different taxonomic levels - species, genera, families, or as functional types. See how well each predicts each other.


Felde et al. (2014)

Map: Setesdal, south-central Norway (* = lakes)

A palaeoecological application of Co-CA


Felde et al. (2014)

Setesdal pollen percentages


Felde et al. (2014)

Setesdal plant abundances


Felde et al. (2014)

Setesdal plants as pollen equivalents


How similar are the patterns in the modern pollen and the modern vegetation (plants or pollen equivalents)

Co-correspondence analysis

Co-CA has also been used to quantify co-correspondence between down-core variables (must be in identical samples) (e.g. diatoms, cladocera, chironomids)

See how the co-correspondence decreases with increasing elevation: far-distance pollen blown up from low elevations distorts the pollen–vegetation relationship.

Felde et al. (2014)


Classification and Regression Trees (CART) and Indicator-Species Analysis (IndVal)

Common questions in analysis of large multivariate data-sets (fossil or modern) are

(1) Are there any ‘real’ groups or clusters in the data (i.e. groups that are not simply an artefact of a clustering algorithm, which will, by design, force anything, even random numbers, into ‘groups’)?

(2) How many ‘real’ groups are there in a data-set?

Related question is which variables are ‘indicative’ of particular groups (‘indicator species’).

Recent developments in applied statistics now make it possible to answer these questions.


Explain variation of single response variable by one or more explanatory or predictor variables.

Response variable can be quantitative (regression trees) or categorical (classification trees).

Predictor variables can be categorical and/or quantitative.

Trees constructed by repeated splitting of data, defined by a simple rule based on single predictor variable.

At each split, data partitioned into two mutually exclusive groups, each of which is as homogeneous as possible. Splitting procedure is then applied to each group separately.

Aim is to partition the response into homogeneous groups but to keep the tree as small and as simple as possible.

Usually create an over-large tree first, pruned back to desired size by cross-validation.

Each group typically characterised by either the distribution (categorical response) or mean value (quantitative response) of the response variable, group size, and the predictor variables that define it.


Splitting Procedures

Way that predictor variables are used to form splits depends on their type.

1. Categorical variable with two levels (e.g. small, large), only one split is possible, with each level defining a group.

2. Categorical variables with more than two levels, any combination of levels can be used to form a split. With k levels, there are $2^{k-1} - 1$ possible splits.

3. Quantitative predictor variables, a split is defined by values less than and greater than some chosen value. Only the rank order of quantitative variables determines a split, and for u unique values there are u-1 possible splits.

From all possible splits of predictor variables, select the one that maximises the homogeneity of the two resulting groups. Homogeneity can be defined in many ways, depending on the type of response variable.
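The split counts and the search over candidate splits can be sketched as follows. This is a minimal illustration assuming a quantitative response and within-group sum-of-squares as the homogeneity measure; the function names and the depth/abundance values are hypothetical.

```python
def n_categorical_splits(k):
    """Number of binary splits of a categorical predictor with k levels."""
    return 2 ** (k - 1) - 1

def sum_of_squares(ys):
    """Sum of squared deviations about the group mean."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_quantitative_split(x, y):
    """Try the u-1 splits between adjacent unique x values.

    Returns (threshold, within-group sum-of-squares) of the most
    homogeneous two-group partition of the response y.
    """
    unique = sorted(set(x))
    best = None
    for lower, upper in zip(unique, unique[1:]):
        threshold = (lower + upper) / 2
        left = [yy for xx, yy in zip(x, y) if xx < threshold]
        right = [yy for xx, yy in zip(x, y) if xx >= threshold]
        within = sum_of_squares(left) + sum_of_squares(right)
        if best is None or within < best[1]:
            best = (threshold, within)
    return best

# Hypothetical data: abundance rating against depth (m)
depth = [1, 2, 3, 4, 5, 6]
abundance = [0, 0, 0, 2, 2, 2]
print(n_categorical_splits(3))
print(best_quantitative_split(depth, abundance))
```

Because only the rank order of a quantitative predictor matters, any threshold between two adjacent unique values gives the same partition; the midpoint is used here purely by convention.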

Trees drawn graphically, with root node representing the undivided data at the top, and the branches and leaves (each leaf representing a final group) beneath. Can also show summary statistics of nodes and distributional plots.


Ecological example: regression tree (5-point abundance) and classification tree (+/-)

Splits minimise sum-of-squares within groups in regression tree; splits are based on proportions of presence and absence in the classification tree.

CART can be used for (i) description and summarisation of data and (ii) prediction purposes for new data.

Can identify the environmental conditions under which a taxon is particularly abundant (regression tree) or particularly frequent (classification tree).

Regression tree analysis of the abundance of the soft coral species Asterospicularia laurae rated on a 0-5 scale; only values 0-3 were observed. The explanatory variables were shelf position (inner, mid, outer), site location (back, flank, front, channel), and depth (m). Each of the three splits (nonterminal nodes) is labelled with the variable and its values that determine the split. For each of the four leaves (terminal nodes), the distribution of the observed values of A. laurae is shown in a histogram. Each node is labelled with the mean rating and number of observations in the group (italic, in parentheses). A. laurae is least abundant on inner- and mid-reefs (mean rating = 0.038) and most abundant on front outer-reefs at depths 3m (1.49). The tree explained 49.2% of the total sum of squares (ss), and the vertical depth of each split is proportional to the variation explained.

Classification tree on the presence-absence of A. laurae. Each leaf is labelled (classified) according to whether A. laurae is predominantly present or absent, the proportions of observations in that class, and the number of observations in the group (italic, in parentheses). The misclassification rate of the model was 9%, compared to 15% for the null model (guessing with the majority, in this case the 85% of absences).


Regression trees explaining the abundances of the soft coral taxa Efflatounaria, Sinularia spp., and Sinularia flexibilis in terms of the four spatial variables (shelf position, location, reef type, and depth) and four physical variables (sediment, visibility, waves, and slope). At the bottom of the cross-validation plots (a, d, g), the bar charts show the relative proportions of trees of each size selected under the 1-SE rule (grey) and minimum rules (white) from a series of 50 cross-validations. For Efflatounaria (a), a five-leaf tree is most likely by either the 1-SE or the minimum rule. For Sinularia spp. (d), five- to eight-leaf trees have support, and for S. flexibilis (g), five- to nine-leaf trees have support. Cross-validation plots (a, d, g), representative of the modal choice for each taxon according to the 1-SE rule, are also shown. For all three taxa, a five-leaf tree was selected (c, f, i). The shaded ellipses enclose nodes pruned from the full trees (b, e, h), each of which accounted for > 99% of the total ss.


Multivariate Regression Trees

De'ath, G. (2002) Ecology 83, 1105-1117

Natural extension of univariate regression trees. Considers multivariate response, not single response.

Replace univariate response by multivariate assemblage response and redefine the impurity of a node by summing the univariate impurity measure over the multivariate response.

Extend univariate sum-of-squares impurity criterion to multivariate sum-of-squares about the multivariate mean. Sum of squared Euclidean distances (SSD) of samples about the node centroid.

Each split minimises the SSD of samples from the centroids of the nodes to which they belong. Equivalently, it maximises the SSD between node centroids (cf. k-means clustering). It also minimises the SSD between all pairs of samples within nodes and maximises the SSD between all pairs of samples in different nodes.

Each tree leaf can be characterised by multivariate mean of its samples, number of samples at the leaf, and the predictor values that define it.

Forms clusters of sites by repeated splitting of data, each split defined by simple rule based on environmental values. Splits chosen to minimise the dissimilarity of sites within node.
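A single multivariate split can be sketched as below, assuming one quantitative environmental predictor and SSD about the node centroid as the impurity. The helper names and the four two-taxon assemblages are invented for illustration; this is not De'ath's implementation.

```python
def centroid(rows):
    """Multivariate mean of a list of assemblage vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]

def ssd(rows):
    """Sum of squared Euclidean distances of samples about their centroid."""
    if not rows:
        return 0.0
    centre = centroid(rows)
    return sum((v - m) ** 2 for row in rows for v, m in zip(row, centre))

def best_multivariate_split(env, assemblages):
    """Find the threshold on env that minimises the within-node SSD."""
    unique = sorted(set(env))
    best = None
    for lower, upper in zip(unique, unique[1:]):
        threshold = (lower + upper) / 2
        left = [a for e, a in zip(env, assemblages) if e < threshold]
        right = [a for e, a in zip(env, assemblages) if e >= threshold]
        within = ssd(left) + ssd(right)
        if best is None or within < best[1]:
            best = (threshold, within)
    return best

# Hypothetical data: four samples, two taxa, one predictor (e.g. depth)
env = [10, 20, 30, 40]
assemblages = [[8, 1], [7, 2], [1, 9], [2, 8]]

threshold, within = best_multivariate_split(env, assemblages)
total = ssd(assemblages)
print(threshold, within, total)
print("fraction of total SSD explained:", (total - within) / total)
```

Since total SSD = within-node SSD + between-centroid SSD, minimising the first term is the same as maximising the second, which is the equivalence stated above.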


MRT is a form of constrained clustering, with constraints set by the predictor variables and their values

MRT can be extended to dissimilarity measures other than squared Euclidean distance (distance-based MRT)

Can identify indicator species using Dufrêne & Legendre (1997) INDVAL approach


Use multivariate regression trees for numerical zonation of fossil data (depth or age as predictor)

Simpson & Birks (2012)

Lowest cross-validated relative error is for 8 groups, but 6 groups lie within one standard error of 8 groups, so this is the simplest partition into ‘real’ statistically different groups
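The 1-SE selection described here can be sketched as follows. The error and standard-error values are invented to mimic an 8-versus-6 situation; they are not the values from Simpson & Birks (2012), and `one_se_rule` is a hypothetical helper name.

```python
def one_se_rule(sizes, cv_error, cv_se):
    """Return (size with minimum CV error, smallest size within 1 SE of it)."""
    i_min = min(range(len(sizes)), key=lambda i: cv_error[i])
    limit = cv_error[i_min] + cv_se[i_min]
    within_one_se = [s for s, e in zip(sizes, cv_error) if e <= limit]
    return sizes[i_min], min(within_one_se)

# Hypothetical cross-validated relative errors for trees with 2..10 groups
sizes = [2, 3, 4, 5, 6, 7, 8, 9, 10]
cv_error = [0.80, 0.62, 0.50, 0.43, 0.36, 0.35, 0.33, 0.34, 0.35]
cv_se = [0.04] * len(sizes)

print(one_se_rule(sizes, cv_error, cv_se))
```

Choosing the smallest tree within one standard error of the minimum trades a statistically negligible loss of fit for a simpler zonation.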


Can also be applied to modern surface samples (multivariate classification trees with vegetation groups as predictors)

Felde et al. (2014)


Given these groups, are there any statistically significant ‘indicator species’?

Basic concept and tradition in ecology and biogeography – characteristic or indicator species, e.g. species characteristic of particular habitat, geographical region, vegetation type. Valuable in monitoring, conservation, management, description, and stratigraphy. Add ecological meaning to groups of sites discovered by clustering.

INDICATOR SPECIES – indicative of particular groups of sites. ‘Good’ indicator species should be found mostly in a single group of a classification and be present at most of the sites belonging to that group. Important DUALITY (faithful AND high constancy).

INDVAL – Dufrêne & Legendre (1997) Ecological Monographs 67, 345-366

Derives indicator species from any hierarchical or non-hierarchical classification of objects.

Indicator value index based only on within-species abundance and occurrence comparisons. Its value is not affected by the abundances of other species. Significance of indicator value of each species is assessed by a randomisation procedure.


Indicator Species Value

Specificity measure (FAITHFULNESS)

$$A_{ij} = \frac{N\,\mathrm{individuals}_{ij}}{N\,\mathrm{individuals}_{i\cdot}}$$

where $N\,\mathrm{individuals}_{ij}$ is the mean abundance of species i across the sites in group j, and $N\,\mathrm{individuals}_{i\cdot}$ is the sum of the mean abundance of species i over all groups (means are used to remove any effects of variation in the number of sites belonging to the various groups).

Fidelity measure (CONSTANCY)

$$B_{ij} = \frac{N\,\mathrm{sites}_{ij}}{N\,\mathrm{sites}_{\cdot j}}$$

where $N\,\mathrm{sites}_{ij}$ is the number of sites in group j where species i is present, and $N\,\mathrm{sites}_{\cdot j}$ is the total number of sites in cluster j.

$A_{ij}$ is maximum when species i is present in group j only.

$B_{ij}$ is maximum when species i is present in all sites in group j.

Indicator value

$$\mathrm{INDVAL}_{ij} = A_{ij} \times B_{ij} \times 100\ (\%)$$

Indicator value of species i for a grouping of sites is the largest value of $\mathrm{INDVAL}_{ij}$ observed over all groups j of that classification.

$$\mathrm{INDVAL}_{i} = \max_j\left(\mathrm{INDVAL}_{ij}\right)$$

Will be 100% when individuals of species i are observed at all sites belonging to a single group


A random re-allocation procedure of sites among the groups is used to test the significance of INDVALi

Can be computed for any given partition or grouping of sites and/or for all levels of a hierarchical classification of sites.
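The index and its randomisation test can be sketched for a single species as below. This is a minimal illustration of the definitions of specificity and fidelity given for INDVAL, with invented abundances and hypothetical function names; the p-value convention (counting the observed value as one member of the null distribution) is an assumption.

```python
import random

def indval(abund, groups):
    """Return (INDVAL_i in %, group where it is attained) for one species."""
    labels = sorted(set(groups))
    mean_abund = {}
    fidelity = {}
    for g in labels:
        values = [a for a, gg in zip(abund, groups) if gg == g]
        mean_abund[g] = sum(values) / len(values)
        # B_ij: proportion of sites in group g where the species is present
        fidelity[g] = sum(1 for v in values if v > 0) / len(values)
    total = sum(mean_abund.values())
    if total == 0:
        return 0.0, None
    # A_ij = mean abundance in group g / sum of group means
    scores = {g: 100.0 * (mean_abund[g] / total) * fidelity[g] for g in labels}
    best = max(labels, key=lambda g: scores[g])
    return scores[best], best

def indval_p_value(abund, groups, n_perm=999, seed=1):
    """Randomly re-allocate sites among groups and compare with the observed value."""
    rng = random.Random(seed)
    observed, _ = indval(abund, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if indval(abund, shuffled)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical species found only, and always, in group A
abund = [5, 3, 4, 6, 0, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(indval(abund, groups))
print(indval_p_value(abund, groups))
```

Shuffling the group labels keeps the group sizes fixed, so the null distribution reflects only the allocation of sites to groups.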


Diagram of the analysis steps for the indicator value method: a sites × species table yields site groups (hierarchical clusters, e.g. UPGMA, Ward; non-hierarchical clusters, e.g. k-means) or a site ranking (MDS, DCA, PCoA, CA); any site typology can then be used for measuring species indicator power; random permutation of sites in the typology gives a randomised INDVAL distribution, against which the observed value is compared.


Dendrogram representing the TWINSPAN classification of the year-catch cycles. The indicator species relative abundance levels are expressed on an ordinal scale (1, 0-2%; 2, 2-5%; 3, 5-10%; 4, 10-20%; and 5, 20-100%).

Habitat groups: chalky mesic grasslands; chalky xeric grasslands; Zn grasslands and xeric sandy heathlands; atypical and xeric gravelly heathlands; temporary flooded heathlands; peaty heathlands; fringes of ponds and alluvial grasslands; swamps and raised mires.

Indicator species labels on the dendrogram (abundance level in parentheses): T. secalis (1), P. nigrita (1), D. globosus (1), A. communis (1), P. melanarius (1), P. cupreus (1), A. equestris (1), C. problematicus (3), A. ater (1), P. versicolor (3), T. cognatus (1), P. madidus (1), H. rubripes (1), P. cupreus (1), P. versicolor (3), P. lepidus (1), C. melanocephalus (1), P. cupreus (1), B. ruficolis (1), D. globosus (1), C. violaceus (1), P. diligens (1), P. rhaeticus (1), A. fuliginosus (1), P. minor (1), P. minor (3), A. fuliginosus (1), L. pilicornis (1).

Carabid beetles 97 species. 123 year-catches from 69 different localities representing 9 habitats.


Felde et al. (2014)

Modern pollen and vegetation types


Procrustes Analysis and Comparisons of Ordinations and Classification

Many ordination or ‘scaling’ methods can be used to summarise complex multivariate data in few (usually 1 or 2) dimensions

Principal components analysis
Correspondence analysis
Detrended correspondence analysis
Non-metric multidimensional scaling
Metric scaling (= principal coordinates analysis)
Constrained ordinations (e.g. canonical correspondence analysis)

All these methods make different assumptions of the data (linear or unimodal responses, abundant taxa have greatest influence, rare taxa important or unimportant, etc.)

Are the results we obtain with different methods consistent or are they method-dependent?

Need a numerical method to compare two ordinations of the same n samples. Procrustes rotation or analysis


Procrustes Rotation rotates ordinations to maximum similarity between two ordinations and estimates the minimised difference

Two configurations of points in ordinations representing the same n samples.

Take one configuration as fixed, move the other to match as closely as possible and to minimise the sum of squared distances of the transformed points from the respective points of the fixed configuration

1. Translation of origin – shift the origins of the co-ordinate axes
2. Rotation and/or reflection of axes
3. Uniform scaling (deflate or inflate the axis scale)

Single points can move a lot although the sum of squared distances can stay fairly constant, especially in large data sets.
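The three steps can be sketched for the special case of two-dimensional ordinations, where the best rotation angle has a closed form and reflection can be handled by simply trying both orientations. This is an illustrative sketch with invented coordinates and a hypothetical function name; general implementations use a singular value decomposition instead.

```python
import math

def centre(points):
    """Step 1: translate so that the configuration's centroid is the origin."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    return [(p[0] - mx, p[1] - my) for p in points]

def procrustes_2d(fixed, moving):
    """Fit `moving` to `fixed`; return (residual sum of squares, fitted points).

    Fitted points are expressed in the centred coordinates of `fixed`.
    """
    X = centre(fixed)
    Y = centre(moving)
    best = None
    for flip in (1.0, -1.0):  # step 2: with and without reflection
        Yf = [(y1, flip * y2) for y1, y2 in Y]
        a = sum(x1 * y1 + x2 * y2 for (x1, x2), (y1, y2) in zip(X, Yf))
        b = sum(x2 * y1 - x1 * y2 for (x1, x2), (y1, y2) in zip(X, Yf))
        fit = math.hypot(a, b)  # best achievable cross-product for this orientation
        if best is None or fit > best[0]:
            best = (fit, math.atan2(b, a), Yf)
    fit, theta, Yf = best
    ss_moving = sum(y1 * y1 + y2 * y2 for y1, y2 in Yf)
    scale = fit / ss_moving  # step 3: uniform scaling
    c, s = math.cos(theta), math.sin(theta)
    fitted = [(scale * (y1 * c - y2 * s), scale * (y1 * s + y2 * c)) for y1, y2 in Yf]
    residual = sum((x1 - f1) ** 2 + (x2 - f2) ** 2
                   for (x1, x2), (f1, f2) in zip(X, fitted))
    return residual, fitted

# Hypothetical ordination scores; `moving` is `fixed` rotated by 90 degrees,
# inflated threefold, and shifted, so the residual should be essentially zero.
fixed = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
moving = [(-3.0 * y + 5.0, 3.0 * x - 2.0) for x, y in fixed]
residual, fitted = procrustes_2d(fixed, moving)
print(round(residual, 6))
```

The per-point differences between `fitted` and the centred fixed configuration are the Procrustes residuals, which is where the movement of single points shows up even when the overall sum of squares is small.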


Procrustes rotation of NMDSCAL (circles) and PCA (arrows) ordinations


Procrustes rotation residuals (differences between NMDSCAL and PCA ordination site scores)


Generalised Procrustes Rotation

Any number of configurations. Basic idea is to find a consensus or centroid configuration so that the fit of ordinary Procrustes rotation to this centroid over all configurations is optimal. Minimise $m^2$, where $m^2 = \sum_i m_i^2$ and $m_i^2$ is the Procrustes statistic for each pair-wise comparison.

1. Translation
2. Rotation and/or reflection
3. Scaling

Can ordinate all the $m^2$ values in a principal coordinates analysis


Example – to compare results of 12 different ordination methods applied to the same data

$m^2$ can be considered as squared distances in a PCOORD analysis

Location of ordination methods on the two principal co-ordinates axes: these two axes represent 75% of the variation in the $m^2$ statistics.

N = non-metric scaling
C = correspondence analysis
P = principal co-ordinates analysis

1,2 = presence/absence data only
2 = all joint absences ignored
3,4,5 = abundance data
3 = log abundance data
4 = all joint absences ignored
5 = abundance data


Felde et al. (2014)

See how CA, DCA, and CCA form second axis

Can also use PROTEST to assess if the $m^2$ value for a given comparison of two ordinations is statistically significantly different from random expectation derived from a computer-intensive randomisation test
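A randomisation test of this kind can be sketched as follows. This is an illustrative approximation of the idea, not the PROTEST program itself: it assumes two-dimensional configurations, permutes the site order of one configuration, and counts how often a permuted fit is at least as good (as small an $m^2$) as the observed one. Function names, the number of permutations, and the seed are all assumptions.

```python
import math
import random


def procrustes_m2(config_a, config_b):
    """Symmetric Procrustes m^2 for two 2-D configurations (lists of (x, y))."""
    def standardise(points):
        n = len(points)
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        centred = [(x - mean_x, y - mean_y) for x, y in points]
        norm = math.sqrt(sum(x * x + y * y for x, y in centred))
        return [(x / norm, y / norm) for x, y in centred]

    a, b = standardise(config_a), standardise(config_b)
    m = [[sum(p[r] * q[c] for p, q in zip(a, b)) for c in range(2)]
         for r in range(2)]
    frob2 = sum(v * v for row in m for v in row)
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return 1.0 - (frob2 + 2.0 * abs(det))


def protest(config_a, config_b, n_perm=999, seed=0):
    """Permutation p-value for the observed m^2 (small m^2 = good fit)."""
    rng = random.Random(seed)
    observed = procrustes_m2(config_a, config_b)
    shuffled = list(config_b)
    as_good = 1  # the observed fit counts as one member of the reference set
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the site-to-site correspondence
        if procrustes_m2(config_a, shuffled) <= observed + 1e-12:
            as_good += 1
    return observed, as_good / (n_perm + 1)
```

With 999 permutations the smallest attainable p-value is 0.001.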

Comparison of ordinations of modern pollen data

Page 51:

Can have many classifications (clusterings or partitionings) of same data

e.g. k-means clustering

Spherical k-means

TWINSPAN

Ward’s clustering method

Multivariate classification trees

How can we compare these clusterings?

Page 52:

How to Compare Classifications

(1) Cross-classification table

                          Classification A
                          I     II    III
Classification B    I     2     2     1
                    II    1     0     4      (n = 10)

(2) Rand coefficient (1971) J. Amer. Stat. Assoc. 66, 846-850

$$c = 1 - \frac{\tfrac{1}{2}\sum_i \Bigl(\sum_j n_{ij}\Bigr)^2 + \tfrac{1}{2}\sum_j \Bigl(\sum_i n_{ij}\Bigr)^2 - \sum_i \sum_j n_{ij}^2}{\tfrac{1}{2}\, n (n - 1)}$$

where $n_{ij}$ is the number of objects in group i of one classification and group j of the other.

c = 1 – [½{(2 + 1)² + (2 + 0)² + (1 + 4)² + (2 + 2 + 1)² + (1 + 0 + 4)²} – (2² + 2² + 1² + 1² + 0² + 4²)] / 45

= 1 – [½{38 + 50} – 26] / 45

= 1 – 18/45

= 0.6

Range 0 (dissimilar) to 1 (identical classifications)
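The worked example can be checked with a short sketch that computes the Rand coefficient directly from a cross-classification table. The function name is hypothetical; the arithmetic follows the formula on this slide.

```python
from math import comb


def rand_from_table(table):
    """Rand coefficient from a cross-classification table (list of rows)."""
    n = sum(sum(row) for row in table)
    row_ss = sum(sum(row) ** 2 for row in table)        # squared row totals
    col_ss = sum(sum(col) ** 2 for col in zip(*table))  # squared column totals
    cell_ss = sum(v * v for row in table for v in row)  # squared cell counts
    # Number of object pairs on which the two classifications disagree
    disagreements = 0.5 * (row_ss + col_ss) - cell_ss
    return 1.0 - disagreements / comb(n, 2)


# Table from the slide: rows = classification B, columns = classification A
table = [[2, 2, 1],
         [1, 0, 4]]
c = rand_from_table(table)
```

For this table the row and column sums of squares are 50 and 38, the cell sum of squares is 26, and there are 45 pairs of objects, which reproduces the value of 0.6 above.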

Page 53:

Rand's coefficient should be corrected for chance so as to ensure

1. its expected value is 0 when the partitions are selected at random (subject to the constraint that the row and column totals are fixed)

2. its maximum value is 1

The similarity between two independent classifications of the same set of objects can be assessed by comparing their Rand statistic with its distribution under the randomisation model.

For small values of n objects, the complete set of n! values of Rand can be evaluated. For large values of n, comparison is made with the values resulting from a random subset of permutations.
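The randomisation approach for large n can be sketched as below. This is an illustration under stated assumptions rather than a reference implementation: group labels of one classification are shuffled a fixed number of times (which keeps the row and column totals fixed), the expected value under chance is estimated from those shuffles instead of analytically, and the function names are hypothetical.

```python
import random
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Proportion of object pairs treated the same way by both classifications."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


def rand_permutation_test(labels_a, labels_b, n_perm=999, seed=0):
    """Return (observed, chance-corrected, p-value) for the Rand coefficient."""
    rng = random.Random(seed)
    observed = rand_index(labels_a, labels_b)
    shuffled = list(labels_b)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes (marginal totals) stay fixed
        null.append(rand_index(labels_a, shuffled))
    expected = sum(null) / n_perm
    # Rescale so that chance agreement maps to 0 and identity maps to 1
    corrected = (observed - expected) / (1.0 - expected) if expected < 1.0 else float("nan")
    p_value = (1 + sum(v >= observed for v in null)) / (n_perm + 1)
    return observed, corrected, p_value
```

Applied to the ten objects of the cross-classification example, `rand_index` returns the same 0.6 as the table-based formula.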

Page 54:

Matrix of Rand’s (1971) Coefficients between Partitions of the Lichti-Federovich and Ritchie (1968) Data Based on Vegetation-Landform Units and Partitions Suggested by Several Numerical Classifications of the Surface-Pollen Data

                                    Vegetation-landform classification
Numerical pollen classification     3 groups    7 groups    11 groups
Agglomerative (3 groups)            0.76        0.65        0.64
Agglomerative (5 groups)            0.69        0.76        0.77
Agglomerative (9 groups)            0.61        0.86        0.87
Hybrid (9 groups)                   0.61        0.86        0.87
Hybrid (11 groups)                  0.59        0.85        0.88

Page 55:

Given the Rand values between all pairs of classifications, can ordinate them using principal coordinates analysis to see how similar different classifications are

Felde et al. (2014)

Page 56:

Results from related methods (e.g. k-means, Ward’s method) that are based on same underlying numerical approach (e.g. sum-of-squares) more similar than results based on methods with different underlying numerical approach (e.g. random forests, TWINSPAN).

Bottom line is that, in general, palaeoecological data are well structured and any robust ordination or clustering method detects this structure.

General recommendations for data summarisation

1. Modern surface samples

Correspondence analysis, detrended correspondence analysis, principal curves

Multivariate classification trees, k-means clustering

2. Fossil data

Principal components analysis, CA, DCA, principal curves

Multivariate regression trees

Page 57:

Principal Curves (PC)

Principal components analysis (PCA) widely used as data-summarisation technique. Axes are linear combinations of the variables that best explain, in a statistical sense, the data. Components are inherently linear and if data do not follow linear patterns, PCA is sub-optimal in capturing this non-linear variation. Hence CA, non-metric scaling, or principal coordinates analysis are used, as ecological and palaeoecological data are inherently non-linear. Species responses are non-linear, usually unimodal.

Page 58:

De'ath, G. (1999) Ecology 80, 2237-2253

Principal curves are smooth one-dimensional curves in a high-dimensional space.

Form of non-linear PCA, analogous to LOESS smoothers as a non-linear regression tool.

Principal curves minimise sum of squared distances from data (as does PCA) but to a curve, not to a line or plane as in PCA.

Two species along single gradient Principal curve showing gradient location

Page 59:

(a) least-square regression

(b) PCA

(c) cubic smoothing spline

(d) PC – combines (b) and (c) to create PC. Tries to minimise the orthogonal distances

Simpson & Birks (2012)

Degree of smoothness constrained by a penalty term. Optimal degree of smoothing identified by generalised cross-validation. Point on the PC to which an object projects is the point on the curve that is closest to the object in m dimensions.
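The projection and the quantity being minimised can be written compactly. The notation below is an assumption introduced for illustration and does not appear on the slides: $\mathbf{x}_i$ is the vector of m taxon values for object i, $f(\lambda)$ is the curve parameterised by position $\lambda$ along it, and $\bar{\mathbf{x}}$ is the mean vector.

```latex
% Projection: position on the curve closest to object i in m dimensions
\lambda_i = \arg\min_{\lambda} \, \lVert \mathbf{x}_i - f(\lambda) \rVert

% Criterion: sum of squared orthogonal distances from the objects to the curve
D^2(f) = \sum_{i=1}^{n} \lVert \mathbf{x}_i - f(\lambda_i) \rVert^2

% One common way to express the fit as a proportion of variance explained
R^2 = 1 - \frac{D^2(f)}{\sum_{i=1}^{n} \lVert \mathbf{x}_i - \bar{\mathbf{x}} \rVert^2}
```

If $f$ is restricted to a straight line, minimising $D^2(f)$ gives the first PCA axis, which is why PCA is a natural starting curve.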

Page 60:

Fitting is complex two-step iterative procedure. Start with a PCA result

25 sites, 4 species, Gaussian responses, one gradient. Plotted on first two PCA axes. Iterative fitting of principal curves.

(a) Data using PCA axis 2 (39.4%) as initial curve

(b) First iteration, smooth spline 3 d.f.

(c) Convergence with 3 d.f.

(d) Improved and final fit with 7 d.f. 98.3% variance

(e) Result of using PCA axis 1 (50.4%) as start and 3 d.f.

(f) As (e) but 7 d.f.

Page 61:

Also models, using smoothers, the response variables along the PC

Page 62:

Principal curves and real data

12 hunting spiders at 28 sites and 6 environmental variables

Principal curve superimposed on PCA biplot. Numbers are locations along the gradient. Principal curve captures 90% of species variance. Modelled environmental variable values for 6 locations show PC is mainly moisture, sand, moss and twig gradient.

Page 63:

Species responses along principal curves

All are unimodal. Optima well approximated by intersection of species vectors with the principal curve. Curves have approximately equal tolerances.

Ideal for finding 1-dimensional gradients that explain species composition as well as, or better than, higher-dimensional ordination methods. Have been extended to 2-dimensional gradients as principal surfaces.

Less restrictive in assumptions than PCA, CA, or DCA. Only assumes smooth responses. Very neutral method (cf. LOESS in regression).

Computationally difficult, hardly used yet...

Abundances and response curves from the principal curve gradient analysis of the hunting spider data. Each panel represents a single species (8-letter code). The plots suggest that the principal curve fit is adequate and show unimodal response curves of approximately equal tolerances, with maxima located at widely varying locations along the gradient.

Page 64:

Palaeoecological use – Abernethy Forest late-glacial–early-Holocene pollen data

Simpson & Birks (2012)

PC axis 1 46.5%

PC axis 2 23.7%

Total 80.2%

Page 65:

Simpson & Birks (2012)

PC 95.8%

PCA1 + PCA2 80.2%

CA1 + CA2 52.3%

Distance along PC expressed as rate of change per kyr

Distance along gradient expressed as a proportion of total gradient for PC, PCA1, and CA1

Page 66:

Simpson & Birks (2012)

Response curves for 9 most abundant pollen taxa in Abernethy Forest

Page 67:

Felde et al. (2014)

PC using different start configurations

PCA 41.6% 76.1%

PCoA 69.7% 77.9%

CA 37.8% 79.3%

NMDS 69.5% 73.2%

RDA 55.5% 74.8%

CCA 58.6% 72.7%

Modern pollen assemblages

Page 68:

PC always as good as or better than the first 1 or 2 axes of a simple ordination

Felde et al. (2014)

Page 69:

PCs very useful and powerful data-summarisation technique for very long (1–3.5 million year) pollen records of alternating glacial and interglacial stages from Colombia, Siberia, and Greece – on-going work by Vivian Felde, Chronis Tzedakis, Henry Hooghiemstra, Ulrike Herzschuh, and myself.

Page 70:

Intrinsic and Extrinsic Drivers of Change

Applies to palaeoecological fossil sequences only; concerns data analysis and interpretation rather than data summarisation.

What drives observed stratigraphical changes in a sequence?

Williams et al. (2011) J Ecol 99: 664-677

Extrinsic and intrinsic forcing of abrupt ecological change: case studies from the late Quaternary

Page 71:

Extrinsic (external) drivers of abrupt ecological change and intrinsic (internal) drivers of abrupt ecological change

Extrinsic

Intrinsic

Williams et al. (2011)

Page 72:

How to detect extrinsic and intrinsic drivers from palaeoecological data? Need fossil and past environmental data

Seddon et al. (2014) Ecology 95: 3046-3055

Page 73:

To detect regime shifts (change points), several methods for ‘change-point’ analysis

Seddon et al. (2014)

3 major change points

Page 74:

Seddon et al. (2014)

Non-linear regressions of diatom transitions in relation to drivers

Closely track environment, suggesting extrinsic drivers

Page 75:

Seddon et al. (2014)

Change from mangrove to a microbial-mat dominated system 945 yr BP

No fit in non-linear regression between δ13C and Ti influx

Good fits for two halves of data

Suggestive of intrinsic regime shift

‘Regime shift’ or ‘tipping point’

Major challenge to apply this methodology to evaluate relative importance of extrinsic and intrinsic drivers. Suspect extrinsic drivers are the most frequent and important

Extrinsic Intrinsic

Page 76:

Evaluation of Palaeoenvironmental Reconstructions

Major breakthrough in Quaternary science was the development of transfer functions (calibration functions) by Imbrie & Kipp (1971) that transformed fossil data (e.g. pollen, foraminifera, diatoms) into estimates of past environment (e.g. climate, sea-surface temperatures, lake-water pH)

John Imbrie

Nilva Kipp†

Page 77:

General Theory of Reconstruction

Based on a diagram by Steve Juggins

[Diagram labels: Xm, Xf, Yf, Ym, and the transfer function Ûm]

Page 78:

Many assumptions in this approach irrespective of which numerical method is used to derive the transfer function.

“Environmental variable (e.g. summer temperature) to be reconstructed is, or is linearly related to, an ecologically important variable in the system”

“Other environmental variables than, say summer temperature, have negligible influence, or their joint distribution with summer temperature in the fossil set is the same as in the training set”

Birks et al. (1990)

Page 79:

Numerical methods such as weighted averaging (WA), WA partial least squares, and modern analogue technique (MAT) will produce ‘reconstructions’ even with random data!

Key question therefore

Is an environmental reconstruction statistically significant?

Telford and Birks (2011) Quat Sci Rev 30: 1272-1278 doi: 10.1016/j.quatscirev.2011.03.002

A reconstruction is considered statistically significant if it explains more of the variance in the fossil data than most (95% by convention) reconstructions derived from transfer functions trained on randomised data

Richard Telford

Page 80:

Stages

• PCA of fossil core data to determine the maximum amount of variance explicable by one axis or latent variable, say 30%

• Do a reconstruction and use the reconstruction as an ‘environmental’ variable in a RDA to see how much variance the reconstruction explains, say 20%

• Do 999 reconstructions using the same biological data, modern and fossil, but with environmental data drawn from a uniform distribution

• Derive an empirical distribution of variance explained based on 999 randomisations and calculate the p-value of the actual reconstructed value as

$$p = \frac{\text{number of reconstructions explaining} \ge 20\% \text{ (including the actual one)}}{\text{number of randomised reconstructions} + 1 \text{ (the actual one)}}$$
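The stages above can be sketched end to end. This is a simplified illustration, not the published implementation: it assumes plain weighted averaging without deshrinking as the transfer function (a linear rescaling of the reconstruction does not change the variance it explains as a single constraint), it computes the RDA variance explained by a single constraint as the summed regression sum of squares over taxa on centred, untransformed data, and all function names and the synthetic data are hypothetical.

```python
import math
import random


def wa_reconstruct(modern_spp, modern_env, fossil_spp):
    """Weighted-averaging optima from modern data, applied to fossil samples."""
    n_taxa = len(modern_spp[0])
    optima = []
    for k in range(n_taxa):
        total = sum(row[k] for row in modern_spp)
        optima.append(sum(row[k] * x for row, x in zip(modern_spp, modern_env)) / total)
    return [sum(y * u for y, u in zip(row, optima)) / sum(row) for row in fossil_spp]


def variance_explained(fossil_spp, predictor):
    """Proportion of fossil variance explained by one constraining variable."""
    n = len(predictor)
    x_mean = sum(predictor) / n
    xc = [x - x_mean for x in predictor]
    sxx = sum(x * x for x in xc)
    fitted = total = 0.0
    for k in range(len(fossil_spp[0])):
        col = [row[k] for row in fossil_spp]
        y_mean = sum(col) / n
        yc = [y - y_mean for y in col]
        sxy = sum(x * y for x, y in zip(xc, yc))
        fitted += sxy * sxy / sxx      # regression sum of squares for taxon k
        total += sum(y * y for y in yc)
    return fitted / total


def reconstruction_p_value(modern_spp, modern_env, fossil_spp, n_random=999, seed=0):
    """Return (variance explained by the real reconstruction, p-value)."""
    rng = random.Random(seed)
    observed = variance_explained(
        fossil_spp, wa_reconstruct(modern_spp, modern_env, fossil_spp))
    low, high = min(modern_env), max(modern_env)
    as_good = 1  # the actual reconstruction is included in the count
    for _ in range(n_random):
        random_env = [rng.uniform(low, high) for _ in modern_env]
        recon = wa_reconstruct(modern_spp, random_env, fossil_spp)
        if variance_explained(fossil_spp, recon) >= observed:
            as_good += 1
    return observed, as_good / (n_random + 1)


# Hypothetical synthetic data: four taxa with Gaussian responses to one gradient
def _abundance(x, optimum, tolerance=2.0):
    return math.exp(-0.5 * ((x - optimum) / tolerance) ** 2)

taxon_optima = [2.0, 4.0, 6.0, 8.0]
modern_env = [0.5 * i for i in range(21)]
modern_spp = [[_abundance(x, o) for o in taxon_optima] for x in modern_env]
fossil_spp = [[_abundance(3.0 + 0.2 * i, o) for o in taxon_optima] for i in range(20)]
observed, p_value = reconstruction_p_value(modern_spp, modern_env, fossil_spp)
```

The first stage (PCA of the fossil data) is omitted here because it only supplies the upper bound against which the observed proportion is judged.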

Page 81:

Telford & Birks (2011)

Round Loch of Glenhead, p = 0.006

Page 82:

Can test if more than one reconstruction made from one biological data-set is statistically significant.

Chukchi Sea dinoflagellates – summer sea-surface temperature; sea-ice duration; summer salinity

Summer salinity not significant (p = 0.146)

What about ice duration and SST?

Telford & Birks (2011)

Page 83:

Partial out SST first as it explains marginally more of the variance (p = 0.003). Ice no longer significant when SST is allowed first. No significant independent information.

Applicable to almost all reconstruction methods, not just WA or WA-PLS

Telford & Birks (2011)

Page 84:

Many reconstructions turn out not to be statistically significant as basic assumptions of the transfer functions are violated because of spatial autocorrelation, or because of strong collinearity of environmental variables (e.g. July temperature, JJA temperature, growing season length, growing degree days).

Important approach because it is testing a hypothesis, namely a reconstruction. Analytical phase in palaeoecology.

Page 85:

Conclusions

Quantitative palaeoecology is not senescent or stable but is continuing to make important progress – principal curves, statistical testing of reconstructions, intrinsic and extrinsic drivers, co-correspondence analysis, etc.

Progress is possible because of a very talented young generation of researchers in quantitative palaeoecology and a brilliant set of applied statisticians. Essential to have effective and full joint discussions. Both are needed if progress is to continue.

Page 86:

Quaternary palaeoecology has reached a major stage in its development, namely identifying key questions and priority research areas for palaeoecology.

December 2012 Palaeo-50 workshop

905 questions submitted from 127 individuals in 26 countries and 5 continents

Reduced by removing duplicates to 804 questions in 55 topics

The 66 participants then narrowed the 804 questions down to 50 in 6 topics during an intensive 2-day workshop

Page 87:

1. Human-environment interactions in the Anthropocene

2. Biodiversity, conservation, and novel ecosystems

3. Ecosystem processes and biogeochemical cycling

4. Comparing, combining, and synthesising information from multiple records

5. Developments in palaeoecology

Seddon et al. (2014)

Alistair Seddon

Anson Mackay

Ambroise Baker

Page 88:

Of the 50 questions, 18 are clearly quantitative and can only be answered using state-of-the-art numerical procedures, and 17 require significant numerical input. (In total 35 out of the 50 questions require quantitative input)

Quantitative palaeoecology is thus now part of mainstream palaeoecology. Quite a change from the pioneer phase of 1965-1974 – continual criticism and doubts about the value of what we were trying to do.

Much has happened in quantitative palaeoecology in last 50 years. Very much still to be done by the current generation of active researchers and the up-coming new generation.

Subject is very much alive, well, and progressing. Hopefully it will continue to develop in next 50 years.

Page 89:

Acknowledgements

Cajo ter Braak

Alistair Seddon

Vivian Felde

Anson Mackay

Trevor Hastie

Gavin Simpson

Steve Juggins

Richard Telford

Cathy Jenks