QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Upload: wendy-rich

Post on 14-Jan-2016


Page 1: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

QUANTITATIVE PALAEOECOLOGY

Lecture 4. Quantitative Environmental Reconstructions

BIO-351

Page 2: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Introduction
Indicator-species approach
Assemblage approach
Mutual Climate Range method
Probability density functions
Proxy data
General theory
Assumptions of transfer functions
Linear-based methods
    Inverse linear regression
    Inverse multiple linear regression
    Principal components analysis regression
    Segmented inverse regression
    Partial least squares
Requirements – biological and statistical

CONTENTS

Page 3: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Non-linear (unimodal) based methods
    Maximum likelihood regression and calibration
    Weighted averaging regression and calibration
Error estimation
Training set assessment
Reconstruction evaluation
Reconstruction validation
Examples of weighted-averaging reconstructions
Weighted-averaging – assessment
Correspondence analysis regression
Weighted-averaging partial least-squares (WA-PLS)
Pollen-climate response surfaces
Analogue-based approaches
Consensus reconstructions and smoothers
Use of artificial simulated data sets
No analogue problem
Multiple analogue problem
Multi-proxy approaches
Synthesis

CONTENTS (2)

Page 4: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

INTRODUCTION

‘TRANSFER FUNCTION’ or ‘BIOTIC INDEX’

CALIBRATION or BIOINDICATION

INDICATOR-SPECIES APPROACH = SINGLE SPECIES BIOASSAY 

ASSEMBLAGE APPROACH = MULTI-SPECIES BIOASSAY

Birks H. J. B. (1995) Quantitative palaeoenvironmental reconstructions. In Statistical modelling of Quaternary science data (ed D Maddy & J S Brew), Quaternary Research Association pp161–254.

ter Braak C. J. F. (1995) Chemometrics and Intelligent Laboratory Systems 28, 165–180.

Page 5: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Gradient analysis: Environment gradient → Community

Bioindication: Community → Environment

GRADIENT ANALYSIS AND BIOINDICATION

Relation of species to environmental variables or gradients

In bioindication, use species optima or indicator values to obtain an estimate of environmental conditions or gradient values. Calibration, bioindication, reconstruction.

Page 6: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

The thermal-limit curves for Ilex aquifolium, Hedera helix, and Viscum album in relation to the mean temperatures of the warmest and coldest months. Samples 1, 2, and 3 represent samples with pollen of Ilex, Hedera, and Viscum; Hedera and Viscum; and Ilex and Hedera, respectively. From Iversen 1944.

INDICATOR-SPECIES APPROACH

Page 7: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

ASSEMBLAGE APPROACH

Compare fossil assemblage with modern assemblages from known environments.

Identify the modern assemblages that are most similar to the fossil assemblage and infer the past environment to be similar to the modern environment of the relevant most similar modern assemblages.

If done qualitatively, standard approach in Quaternary pollen analysis, etc., since 1950s.

If done quantitatively, modern analogue technique or analogue matching.
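
A minimal sketch of analogue matching, under stated assumptions: the squared chord distance is used as the dissimilarity measure (the slide does not name one), the reconstruction is the mean environment of the k closest modern samples, and all data and function names are hypothetical.

```python
import math

def squared_chord_distance(p, q):
    """Squared chord distance between two assemblages given as proportions."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

def analogue_reconstruction(fossil, modern_assemblages, modern_env, k=2):
    """Mean environment of the k modern samples most similar to the fossil sample."""
    ranked = sorted(
        zip(modern_assemblages, modern_env),
        key=lambda pair: squared_chord_distance(fossil, pair[0]),
    )
    nearest = ranked[:k]
    return sum(env for _, env in nearest) / k

# Hypothetical training set: proportions of three taxa and mean July temperature (deg C).
modern_assemblages = [
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.10, 0.30, 0.60],
    [0.05, 0.15, 0.80],
]
modern_july_temp = [10.0, 12.0, 16.0, 18.0]

fossil_sample = [0.65, 0.25, 0.10]
estimate = analogue_reconstruction(fossil_sample, modern_assemblages, modern_july_temp, k=2)
print(estimate)
```

The choice of k and of the dissimilarity measure both affect the result, so they would need to be assessed for any real training set.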

Page 8: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Grichuk et al., USSR, 1950s–1960s

Atkinson et al., UK, 1986, 1987

Coleoptera

TMAX - mean temperature of warmest month

TMIN - mean temperature of coldest month

TRANGE - TMAX–TMIN

Quote median values of mutual overlap and 'limits given by the extremes of overlap'.

MUTUAL CLIMATIC RANGE METHOD
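
A one-dimensional sketch of the mutual-overlap idea. This is a simplification: the method itself works with climatic envelopes in two dimensions (e.g. TMAX and TRANGE), whereas here each species has only a single TMAX interval. The species names, tolerance ranges, and function name are hypothetical, and the midpoint of the overlap stands in for the quoted median.

```python
def mutual_climatic_range(ranges):
    """Intersect per-species (low, high) tolerance ranges.

    Returns the overlap as (low, high), or None if the ranges do not all overlap.
    """
    low = max(r[0] for r in ranges)
    high = min(r[1] for r in ranges)
    if low > high:
        return None
    return low, high

# Hypothetical TMAX tolerance ranges (deg C) for three beetle species in one fossil fauna.
tmax_ranges = {
    "species A": (8.0, 16.0),
    "species B": (10.0, 19.0),
    "species C": (12.0, 21.0),
}

overlap = mutual_climatic_range(list(tmax_ranges.values()))
midpoint = (overlap[0] + overlap[1]) / 2  # central value of the mutual range
print(overlap, midpoint)
```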

Page 9: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Thermal envelopes for hypothetical species A, B, and C

Schematic representation of the Mutual Climate Range method of quantitative temperature reconstructions (courtesy of Adrian Walking).

Page 10: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

ASSUMPTIONS

1. Species distribution is in equilibrium with climate.

2. Distribution data and climatic data are same age.

3. Species distributions are well known, no problems with species introductions, taxonomy or nomenclature.

4. All the suitable climate space is available for species to occur. ? Arctic ocean, ? Truncation of climate space.

5. Climate values used in MCR are the actual values where the beetle species lives in all its known localities. Climate stations tend to be at low altitudes; cold-tolerant beetles tend to be at high altitudes. ? Bias towards warm temperatures. Problems of altitude, lapse rates.

495 climate stations across Palaearctic region from Greenland to Japan.

Page 11: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Climate reconstructions from (a) British Isles, (b) western Norway, (c) southern Sweden and (d) central Poland. TMAX refers to the mean temperature of the warmest month (July). The chronology is expressed in radiocarbon years BP × 1000 (ka). Each vertical bar represents the mutual climatic range (MCR) of a single dated fauna. The bold lines show the most probable value or best estimate of the palaeotemperature derived from the median values of the MCR estimates and adjusted with the consideration of the ecological preferences of the recorded insect assemblages.

Coope & Lemdahl 1995

Page 12: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Kühl et al. (2002) Quaternary Research 58, 381–392

Kühl (2003) Dissertationes Botanicae 375, 149 pp.

Kühl & Litt (2003) Vegetation History and Archaeobotany 12, 205–214

The basic idea is to quantify the present-day distribution of plants that occur as Quaternary fossils (pollen and/or macrofossils) in terms of July and January temperature, expressed as probability density functions (pdfs).

Assuming statistical independence, a joint pdf can be calculated for a fossil assemblage as the product of the pdfs of the individual taxa. Each taxon is weighted by the extent of its climatic response range, so 'narrow' indicators receive 'high' weight.

The maximum pdf is the most likely past climate and its confidence interval is the range of uncertainty.

Can be used with pollen (+/-) and/or macrofossils (+/-).

PROBABILITY DENSITY FUNCTIONS
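
A one-dimensional sketch of the joint pdf, assuming each taxon's pdf is normal and the taxa are independent. The product of normal pdfs is again normal, with a precision-weighted mean, so taxa with narrow response ranges dominate the result. The published method works with two-dimensional (January and July) pdfs; the taxa, means, and standard deviations below are hypothetical.

```python
import math

def combine_normal_pdfs(params):
    """Product of independent 1-D normal pdfs, renormalised.

    params: list of (mean, sd) pairs. Returns (mean, sd) of the joint pdf.
    """
    precision = sum(1.0 / sd ** 2 for _, sd in params)
    mean = sum(m / sd ** 2 for m, sd in params) / precision
    return mean, math.sqrt(1.0 / precision)

# Hypothetical January-temperature pdfs (mean, sd in deg C) for three taxa.
taxa = {"taxon A": (0.0, 2.0), "taxon B": (2.0, 1.0), "taxon C": (-1.0, 4.0)}

mean, sd = combine_normal_pdfs(list(taxa.values()))
low, high = mean - 1.645 * sd, mean + 1.645 * sd  # central 90% interval
print(round(mean, 2), round(sd, 2), round(low, 2), round(high, 2))
```

The joint standard deviation is always smaller than that of the narrowest taxon, which is why violations of the independence assumption can narrow the uncertainty range artificially.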

Page 13: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Estimated probability density function of Ilex aquifolium, as an example for which the parametric normal distribution (solid line) fits the non-parametric distributions well (e.g. kernel function (dashed line), histogram).

Distribution of Ilex aquifolium in combination with January temperature.

Page 14: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Estimated one- and two-dimensional pdfs of four selected species. The histograms (non-parametric pdf) and normal distributions (parametric pdf) on the left represent the one-dimensional pdfs. Crosses in the right-hand plots display the temperature values provided by the 0.5º x 0.5º gridded climatology (New et al., 1999). Black crosses indicate presence, grey crosses absence of the specific taxon. A small red circle marks the mean of the corresponding normal distribution and the ellipses represent 90% of the integral of the normal distribution centred on this mean. Most sample points lie within this range. The interval, however, may not necessarily include 90% of the data points. Carex secalina, as an example of an azonally distributed species, is an exception. A normal distribution does not appear to be an appropriate estimating function for this species, and therefore no normal distribution is indicated.

Page 15: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Climate dependences of Carpinus (betulus) (C), Ilex (aquifolium) (I), Hedera (helix) (H), and Tilia (T) and their combination. The pdf resulting from the product of the four individual pdfs (dotted) is similar to the ellipse calculated on the basis of the 216 points with common occurrences for the four taxa (dashed). No artificial narrowing of the uncertainty range is evident.

Climate dependences of Acer (A), Corylus (avellana) (C), Fraxinus (excelsior) (F), and Ulmus (U), and their combination. The pdf (dotted) resulting from the product of the four individual pdfs has a mean very similar to the mean of the pdf (dashed) calculated on the basis of the 1667 points with common occurrences, but its variances are much smaller.

Page 16: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 17: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Reconstruction for the fossil assemblage of Gröbern. The thin ellipses indicate the pdfs of the individual taxa included in the reconstruction, and the thick ellipse the 90% uncertainty range of the reconstruction result.

Page 18: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Simplified pollen diagram from Gröbern (Litt 1994), reconstructed January and July temperature, and 18O (after Boettger et al. 2000).

Page 19: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Reconstructed most probable mean January (blue) and July (red) temperature and 90% uncertainty range (dotted lines)

Kühl & Litt (2003)

Page 20: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Comparison of the reconstructed mean January temperature using the pdf-method (green) and the analog technique (blue).

Bispingen uncertainty range – 90%; La Grande Pile – 70%.

Page 21: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

• Biological data from palaeoecological studies 

• Pollen, molluscs, foraminifera, macrofossil plant remains, diatoms, chrysophytes, coleoptera, chironomids, rhizopods, moss remains, ostracods

 • Quantitative counts (usually %)

• Ordinal estimates (e.g. 1-5 scale)

• Presence-absence data (1/0) at different stratigraphical intervals and hence times

ENVIRONMENTAL PROXY DATA

Page 22: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Y - biological responses ("proxy data")

X - set of environmental variables that are assumed to be causally related to Y (e.g. sea-surface temperatures)

B - set of other environmental variables that together with X completely determine Y (e.g. trace nutrients)

If Y is totally explicable as responses to variables represented by X and B, we have a deterministic model (no allowance for random factors, historical influences)

Y = XB

If B = 0 or is constant, we can model Y in terms of X and Re, a set of ecological response functions

Y = X (Re)

In palaeoecology we need to know Re. We cannot derive Re deductively from ecological studies. We cannot build an explanatory model from our currently poor ecological knowledge.

Instead we have to use direct empirical models based on observed patterns of Y in modern surface-samples in relation to X, to derive U, our empirical calibration functions.

Y = XU

GENERAL THEORY

Page 23: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

In practice, this is a two-step process:

1. Regression, in which we estimate $\hat{U}_m$, the modern calibration functions or regression coefficients, from the training set (Ym = modern surface-sample data; Xm = associated environmental data):

$Y_m = \hat{U}_m(X_m)$

or

$X_m = \hat{U}_m^{-1}(Y_m)$ (inverse regression)

2. Calibration, in which we reconstruct $\hat{X}_f$, the past environment, from the fossil set (Yf = fossil core data):

$\hat{X}_f = \hat{U}_m^{-1}(Y_f)$

TRANSFER FUNCTION

Page 24: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Modern data ("training set"):

Ym - BIOLOGICAL DATA (e.g. diatoms, pollen, chironomids): n samples x 1, ..., m taxa

Xm - ENVIRONMENTAL DATA (e.g. mean July temperature): n samples x 1 variable

Fossil data:

Yf - BIOLOGICAL DATA: t samples x 1, ..., m taxa

Xo - ENVIRONMENTAL DATA: t samples x 1 variable. Unknown. To be reconstructed.

Page 25: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Outline of the transfer function approach to quantitative palaeoenvironmental reconstruction

Page 26: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Step 1: Regression to estimate modern optima for each species

$Y_m = \hat{U}_m(X_m)$

where Ym = modern diatom abundance

Xm = modern chemical data (e.g. pH)

$\hat{U}_m$ = estimated modern pH optimum for diatom species

Step 2: Calibration to reconstruct past chemistry

$X_f = \hat{U}_m^{*}(Y_f)$

where Yf = fossil diatom abundance

Xf = reconstructed past chemistry (e.g. pH)

$\hat{U}_m^{*} = \hat{U}_m^{-1}$ = inverse of modern species optima from Step 1

GENERAL THEORY OF RECONSTRUCTION

Page 27: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

REGRESSION ('CLASSICAL REGRESSION')

$Y = f(X) + \text{ERROR}$

Estimate $f(\cdot)$ from the training set by regression. The estimated $\hat{f}(\cdot)$ is then 'inverted' to find the unknown $X_0$ from the fossil $Y_0$, i.e. the $X_0$ for which $\hat{f}(X_0) = Y_0$.

INVERSE REGRESSION = CALIBRATION

$X = g(Y) + \text{ERROR}$

'Plug in' estimate given $Y_0$ and $\hat{g}$:

$\hat{X}_0 = \hat{g}(Y_0)$

Page 28: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

• Contain many taxa

• Contain many zero values

• Commonly expressed as percentages - "closed" compositional data

• Quantitative data are highly variable, invariably show a skewed distribution

• Non-quantitative data are either presence / absence or ordinal ranks

• Taxa generally have non-linear relationship with their environment, and the relationship is often a unimodal function of the environmental variables

PROXY-DATA PROPERTIES

Page 29: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

SPECIES RESPONSES

Species nearly always have non-linear unimodal responses along gradients

J. Oksanen 2002


Page 30: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

1. Taxa in training set (Ym) are systematically related to the physical environment (Xm) in which they live. 

2. Environmental variable (Xf , e.g. summer temperature) to be reconstructed is, or is linearly related to, an ecologically important variable in the system.

3. Taxa in the training set (Ym) are the same as in the fossil data (Yf) and their ecological responses (Ûm) have not changed significantly over the timespan represented by the fossil assemblage.

4. Mathematical methods used in regression and calibration adequately model the biological responses (Um) to the environmental variable (Xm).

5. Environmental variables other than, say, summer temperature have negligible influence, or their joint distribution with summer temperature in the fossil set is the same as in the training set.

6. In model evaluation by cross-validation, the test data are independent of the training data. The 'secret assumption' until Telford & Birks (2005).

ASSUMPTIONS IN QUANTITATIVE PALAEOENVIRONMENTAL RECONSTRUCTIONS

Page 31: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

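INVERSE REGRESSION

$X_m = \hat{U}_m Y_m$

July temperature = b0 + b1y1 + b2y2 + ... + bzyz

where y1, y2, ... are the abundances of the taxa (e.g. Pinus, Betula) and b1, b2, ... are the species parameters.

'CLASSICAL' REGRESSION

[ Y = UX ]

where X is the ‘predictor’ (e.g. environmental variables) and Y is the ‘response’ (e.g. biology).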

Inverse regression is most efficient if relation between each taxon and the environment is LINEAR and with a normal error distribution. Basically a linear model.

LINEAR-BASED METHODS

Page 32: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Light micrograph of the Quaternary fossil S. herbacea leaf showing epidermal cells and stomata (x40). The cuticle was macerated in sodium hypochlorite (8% w/v) for 2 min and mounted in glycerol jelly with safranin.

Page 33: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

CLASSICAL REGRESSION – e.g. GLM

Stomatal density = a + b (CO2) + ε

Y (response variable) = stomatal density; X (predictor variable) = CO2

INVERSE REGRESSION

CO2 = a1 + b1 (stomatal density) + ε

Y (response variable) = CO2; X (predictor variable) = stomatal density

$X_m = \hat{U}_m Y_m$

CO2 past = a1 + b1 (stomatal density of fossil leaves)

$X_f = \hat{U}_m Y_f$
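
A minimal sketch of inverse regression followed by calibration: CO2 is regressed on stomatal density in a modern training set, and the fitted line is then applied to fossil densities. All numbers are hypothetical and chosen to be exactly linear so the result is easy to check; `fit_line` is an illustrative helper, not part of any package.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical modern training set: stomatal density (Ym) and CO2 (Xm).
modern_density = [120.0, 110.0, 100.0, 90.0]
modern_co2 = [270.0, 290.0, 310.0, 330.0]

# Regression step (inverse regression): environment regressed on biology.
a1, b1 = fit_line(modern_density, modern_co2)

# Calibration step: apply the fitted coefficients to fossil leaves (Yf).
fossil_density = [115.0, 95.0]
co2_past = [a1 + b1 * d for d in fossil_density]
print(a1, b1, co2_past)
```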

Page 34: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Regression plot of the total training set (n=29) for stomatal density of Salix herbacea leaves and the atmospheric CO2 concentration in which they grew. The regression details are as follows:

Term            Regression coefficient   Standard error   t       p
Constant (b0)   294.99                   33.248           8.87    <0.001
CO2 (b1)        -0.647                   0.155            -5.61   <0.001

R2=0.538 R2adj=0.521

Both terms have regression coefficients significantly different from zero and the variance ratio (F[1,27] = 31.44) exceeds the critical value of F at the 0.01 significance level (7.68), indicating that stomatal density has a strong statistical relationship with CO2 concentration.

Beerling et al. 1995
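
The fitted equation in the table is a classical regression (stomatal density as a function of CO2), so one way to obtain a CO2 estimate from it is to invert the line algebraically. The sketch below does this with the two tabulated coefficients. Note that this is the inversion of the classical equation, not the inverse regression CO2 = a1 + b1 (stomatal density), whose coefficients are not given here; the units are those of the source figure and are not restated.

```python
B0 = 294.99  # constant (b0) from the regression table
B1 = -0.647  # CO2 coefficient (b1) from the regression table

def stomatal_density(co2):
    """Classical regression: predicted stomatal density at a given CO2 concentration."""
    return B0 + B1 * co2

def co2_from_density(density):
    """Invert the classical regression line to estimate CO2 from stomatal density."""
    return (density - B0) / B1

density = stomatal_density(280.0)       # predicted density at CO2 = 280
recovered = co2_from_density(density)   # inverting recovers the CO2 value
print(round(density, 2), round(recovered, 2))
```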

Page 35: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Kråkenes Lake

Page 36: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Late-glacial CO2 reconstructions at Kråkenes, western Norway (38 m a.s.l.)

Page 37: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Multiple regression of temperature (Xm) on abundance of taxa in core tops (Ym) (inverse regression), i.e.

$X_m = \hat{U}_m Y_m = b_0 + b_1 y_1 + b_2 y_2 + b_3 y_3 + \dots + b_m y_m$

$\hat{X}_f = \hat{U}_m Y_f = b_0 + b_1 y_1 + b_2 y_2 + b_3 y_3 + \dots + b_m y_m$

$\hat{X}_f = b_0 + \sum_{k=1}^{m} b_k y_{ik}$

Approach most efficient if:

1. relation between each taxon and environment is linear with normal error distribution

2. environmental variable has normal distribution

Usually not usable because:

1. taxon abundances show multicollinearity

2. very many taxa

3. many zero values, hence regression coefficients unstable

4. basically linear model

Consider non-linear model and introduce extra terms:

$X_m = b_0 + b_1 c_1 + b_2 c_1^2 + b_3 c_2 + b_4 c_2^2 + b_5 c_3 + b_6 c_3^2 + \dots$

Can end up with more terms than samples. Cannot be solved. Hence "ad hoc" approach of Imbrie & Kipp (1971), and related approaches of Webb et al.

INVERSE MULTIPLE REGRESSION APPROACH

Page 38: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Location of 61 core top samples (Imbrie & Kipp 1971)

61 core-top samples x 27 taxa

Principal components analysis

61 samples x 4 assemblages (79%)

PRINCIPAL COMPONENTS REGRESSION (PCR)

Page 39: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Abundance of the tropical assemblage versus winter surface temperature for 61 core top samples. Data from Tables 4 and 13. Curve fitted by eye

Abundance of the subtropical assemblage versus winter surface temperature for 61 core top samples. Data from Tables 4 and 13. Curve fitted by eye

Abundance of the subpolar assemblage versus winter surface temperature for 61 core top samples. Data from Tables 4 and 13. Curve fitted by eye

Abundance of the polar assemblage versus winter surface temperature for 61 core top samples. Data from Tables 4 and 13. Curve fitted by eye

Page 40: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Inverse regression was then done using the 4 varimax assemblages rather than the 27 original taxa.

Linear:

$X_m = b_0 + b_1 A + b_2 B + b_3 C + b_4 D$

where A, B, C and D are varimax assemblages.

Non-linear:

$X_m = b_0 + b_1 A + b_2 B + b_3 C + b_4 D + b_5 AB + b_6 AC + b_7 AD + b_8 BC + b_9 BD + b_{10} CD + b_{11} A^2 + b_{12} B^2 + b_{13} C^2 + b_{14} D^2$

CALIBRATION STAGE using the fossil assemblages described as the 4 varimax assemblages:

$X_f = b_0 + b_1 A_f + b_2 B_f + b_3 C_f + b_4 D_f$

Page 41: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

General abundance trends for four of the varimax assemblages related to winter surface temperatures.

Winter surface temperatures "measured" by Defant (1961) versus those estimated from the fauna in 61 core top samples by means of the transfer function.

Imbrie & Kipp 1971

Page 42: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Average surface salinities ”measured” by Defant (1961) versus those estimated from the fauna in 61 core top samples.

Summer surface temperatures ”measured” by Defant (1961) versus those estimated from the fauna in 61 core top samples.

Imbrie & Kipp 1971

Page 43: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Palaeoclimatic estimates for 110 samples of Caribbean core V12-133, based on palaeoecological equations (Table 12) derived from 61 core tops. Tw = winter surface temperature; Ts = summer surface temperature; ‰ = average surface salinity.

Salinity

Page 44: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

1. Why 4 assemblages? Why not 3, 5, 6? No cross-validation

2. Assemblages inevitably unstable, because of many transformation, standardization, and scaling options in PCA

3. Assumes linear relationships between taxa and their environment

4. No sound theoretical basis

APPROACH AD HOC BECAUSE

Page 45: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Scatter diagrams of: (A) the percent birch (Betula); and (B) the percent oak (Quercus) pollen versus latitude.

The thirteen regions for which regression equations were obtained.

Bartlein & Webb 1985

SEGMENTED LINEAR INVERSE REGRESSION

Page 46: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Regression equations for mean July temperature from the thirteen calibration regions in eastern North America

Region A: 54-71ºN; 90-110ºW

Pollen sum: Alnus + Betula + Cyperaceae + Forb sum + Gramineae + Picea + Pinus

July T (ºC) = 12.39 + 0.50*Pinus^0.5 + 0.26*Forb sum + 0.15*Picea^0.5 - 0.89*Cyperaceae^0.5 - 0.37*Gramineae - 0.03*Alnus

Parenthesised values, in the order of the terms above: (1.61) (.14) (.05) (.10) (.13) (.08) (.01)

R2 = 0.80; adj. R2 = 0.78; Se = 0.96ºC

n = 114; F = 69.86; Pr = 0.0000

Region B: 53-71ºN; 50-80ºW

Pollen sum: Abies + Alnus + Betula + Herb sum + Picea + Pinus

July T (ºC) = 8.17 + 0.54*Picea^0.5 + 0.17*Betula^0.5 - 0.04*Herb sum - 0.01*Alnus

Parenthesised values, in the order of the terms above: (2.27) (.19) (.14) (.01) (.01)

R2 = 0.70; adj. R2 = 0.70; Se = 1.52ºC

n = 165; F = 95.48; Pr = 0.0000

Page 47: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Regression equations used to reconstruct mean July temperature at 6000 yr BP. Bartlein & Webb 1985

"We selected the appropriate equation for each sample by identifying the calibration region that; (1) contains modern pollen data that are analogous to the fossil sample; and (2) has an equation that does not produce an unwarranted extrapolation when applied to the fossil sample."

Page 48: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Isotherms for estimated mean July temperatures (ºC) at 6000 yr BP.

Page 49: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Difference map for mean July temperatures (ºC) between 6000 yr BP and today. Positive values indicate temperatures that were higher at 6000 yr BP than today.

Page 50: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Reconstructions produced by the regression approach

Elk Lake, Minnesota

Page 51: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Regression equation applications to the Elk Lake pollen data.

Variable                   Calibration region     Age range (varve range)   R2
Mean January temperature   45-55ºN, 85-105ºW      320-10084                 0.876
                           45-55ºN, 95-105ºW      10134-11638               0.842
Mean July temperature      40-50ºN, 85-95ºW       320-6562                  0.799
                           40-50ºN, 85-105ºW      6746-10084                0.786
                           45-50ºN, 95-105ºW      10134-11638               0.701
Annual precipitation       40-55ºN, 85-105ºW      320-3692                  0.578
                           40-50ºN, 85-105ºW      3794-7662                 0.940
                           45-55ºN, 85-105ºW      7862-11638                0.578

Page 52: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Chemometrics – predicting chemical concentrations from near-infrared spectra

APPROACHES TO MULTIVARIATE CALIBRATION

Responses

Predictors

Page 53: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Form of PC regression developed in chemometrics

PCR - components are selected to capture maximum variance within the predictor variables irrespective of their predictive value for the environmental response variable

PLS - components are selected to maximise the covariance with the response variables

PARTIAL LEAST SQUARES REGRESSION – PLS

PLS usually requires fewer components and gives a lower prediction error than PCR. Both are ‘biased’ inverse regression methods that guard against multi-collinearity among predictors by selecting a limited number of uncorrelated orthogonal components. (Biased because some data are discarded).
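
A contrived sketch of the difference in how the first component is chosen. For centred data, the first PCR direction is the leading eigenvector of X'X (found here by power iteration) and ignores y, whereas the first PLS direction is proportional to X'y. The data are hypothetical and deliberately extreme: the high-variance taxon is unrelated to temperature, so a one-component PCR fits nothing while a one-component PLS fits perfectly.

```python
import math

def centre(col):
    m = sum(col) / len(col)
    return [v - m for v in col]

def normalise(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def scores(X, w):
    """Project each sample (row of X) onto the direction w."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in X]

def xt_times(X, v):
    """Compute X^T v for a row-major matrix X."""
    return [sum(row[j] * vi for row, vi in zip(X, v)) for j in range(len(X[0]))]

def first_pcr_direction(X, iterations=200):
    """Leading eigenvector of X^T X by power iteration (does not use y)."""
    w = normalise([1.0] * len(X[0]))
    for _ in range(iterations):
        w = normalise(xt_times(X, scores(X, w)))
    return w

def first_pls_direction(X, y):
    """Direction of maximum covariance with y: proportional to X^T y."""
    return normalise(xt_times(X, y))

def r_squared(t, y):
    """r2 of a least-squares fit of centred y on centred scores t."""
    sty = sum(a * b for a, b in zip(t, y))
    return sty ** 2 / (sum(a * a for a in t) * sum(b * b for b in y))

# Hypothetical data: taxon 1 varies a lot but is unrelated to temperature;
# taxon 2 varies little but tracks temperature exactly.
taxon1 = [2.0, 8.0, 2.0, 8.0, 2.0, 8.0]
taxon2 = [1.0, 1.0, 1.5, 1.5, 2.0, 2.0]
temperature = [10.0, 10.0, 11.0, 11.0, 12.0, 12.0]

X = [list(row) for row in zip(centre(taxon1), centre(taxon2))]
y = centre(temperature)

r2_pcr = r_squared(scores(X, first_pcr_direction(X)), y)
r2_pls = r_squared(scores(X, first_pls_direction(X, y)), y)
print(r2_pcr, r2_pls)
```

Real data are never this clean; the example only illustrates why a component chosen for variance alone may have little predictive value.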

Page 54: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

CONTINUUM REGRESSION

α = 0 = normal least-squares regression

α = 0.5 = PLS

α = 1.0 = PCR

PLS is thus a compromise and performs so well by combining desirable properties of inverse regression (high correlation) and PCR (stable predictors of high variance) into one technique.

PLS will always give a better fit (r2) than PCR with same number of components.

Page 55: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

1. Need biological system with abundant fossils that is responsive and sensitive to environmental variables of interest. 

2. Need a large, high-quality training set of modern samples. Should be representative of the likely range of variables, be of consistent taxonomy and nomenclature, be of highest possible taxonomic detail, be of comparable quality (methodology, count size, etc.), and be from the same sedimentary environment. 

3. Need fossil set of comparable taxonomy, nomenclature, quality, and sedimentary environment. 

4. Need good independent chronological control for fossil set. 

5. Need robust statistical methods for regression and calibration that can adequately model taxa and their environment with the lowest possible error of prediction and the lowest bias possible. 

6. Need statistical estimation of standard errors of prediction for each reconstructed value. 

7. Need statistical and ecological evaluation and validation of the reconstructions.

BASIC REQUIREMENTS IN QUANTITATIVE PALAEOENVIRONMENTAL RECONSTRUCTIONS

Page 56: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

A straight line displays the linear relation between the abundance value (y) of a species and an environmental variable (x), fitted to artificial data (•). (a=intercept; b=slope or regression coefficient).

A unimodal relation between the abundance value (y) of a species and an environmental variable (x). (u=optimum or mode; t=tolerance; c=maximum).

Page 57: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Outline of ordination techniques presented in this paper. DCA (detrended correspondence analysis) was applied for the determination of the length of gradient (LG). LG is important for choosing between ordination based on a linear or on a unimodal response model. Correspondence analysis (CA) is not considered any further because in the microcosm experiment discussed here LG was ≤ 1.5 SD units. LG < 3 SD units are considered to be typical in experimental ecotoxicology. In cases where LG < 3, ordination based on linear response models is considered to be the most appropriate. PCA (principal component analysis) visualizes variation in species data in relation to best-fitting theoretical variables. Environmental variables explaining this visualized variation are deduced afterwards, hence indirectly. RDA (redundancy analysis) visualizes variation in species data directly in relation to quantified environmental variables. Before analysis, covariables may be introduced in RDA to compensate for systematic differences in experimental units. After RDA, a permutation test can be used to examine the significance of effects.

Diagram labels: Gradient length estimation; INDIRECT GRADIENT ANALYSIS; DIRECT GRADIENT ANALYSIS

Page 58: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Estimate the gradient length for the environmental variable(s) of interest, using detrended canonical correspondence analysis with x as the only external or environmental predictor. Detrend by segments, non-linear rescaling, ? rare taxa downweighted. This gives an estimate of gradient length in relation to x in standard deviation (SD) units of compositional turnover. Length may be different for different environmental variables and the same biological data.

pH 2.62 SD
alkalinity 2.76 SD
colour 1.52 SD

If gradient length < 2 SD, taxa are generally behaving monotonically along gradient and linear-based methods are appropriate. If gradient length > 2 SD, several taxa have their optima located within the gradient and unimodal-based methods are appropriate.

LINEAR OR UNIMODAL METHODS

Page 59: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

• Species

Sample 'core tops' (second plot symbol)

Imbrie & Kipp (1971) Core-top data

Page 60: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

1. Forward selection of environmental variables

Winter SST   0.73   p=0.01   82.0%
Salinity     0.13   p=0.01   17.0%
Summer SST   0.02   p=0.17   1.0%

2. Three environmental variables together explain 46.1% of the observed variation in the 61 core tops.

3. First axis (λ1 = 0.75) is significantly different (p = 0.01) from random expectation, indicating that the taxa are significantly related to the environmental variables. 

CANONICAL CORRESPONDENCE ANALYSIS

Page 61: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

• Bioindication, Calibration, Transfer function, Reconstruction

• Gaussian response model - regression
    + We know observed abundances y
    + We know gradient values x
    = Estimate or model the species response curves for all species

• Bioindication - calibration
    + We know observed abundances y
    + We know the modelled species response curves for all species
    = Estimate the gradient value of x

• The most likely value of the gradient is the one that maximises the likelihood function given observed and expected abundances of species

• Can be generalised for any response function

NON-LINEAR (UNIMODAL) METHODS

MAXIMUM LIKELIHOOD PREDICTION OF GRADIENT VALUES

Page 62: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Species - pH response curve

Page 63: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 64: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

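GAUSSIAN RESPONSE MODEL

$\mu = h \exp\left[-\frac{(x-u)^2}{2t^2}\right]$

where μ is the expected abundance, u the optimum, t the tolerance and h the height of the curve.

Can be reparametrised as a generalised linear model:

• Gradient as a 2nd degree polynomial

• Logarithmic link function

$\log(\mu) = b_0 + b_1 x + b_2 x^2$

$u = -\frac{b_1}{2 b_2}$

$t = \sqrt{-\frac{1}{2 b_2}}$

$h = \exp\left(b_0 - \frac{b_1^2}{4 b_2}\right)$

J. Oksanen 2002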
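
A short sketch of the reparametrisation: given the coefficients b0, b1, b2 of the log-linear polynomial log(μ) = b0 + b1x + b2x², the optimum u, tolerance t and height h of the Gaussian curve follow as u = -b1/(2b2), t = sqrt(-1/(2b2)) and h = exp(b0 - b1²/(4b2)). The round trip below uses hypothetical values for a pH gradient; function names are illustrative.

```python
import math

def gaussian_parameters(b0, b1, b2):
    """Convert polynomial (log-link) coefficients to optimum u, tolerance t, height h."""
    if b2 >= 0:
        raise ValueError("b2 must be negative for a unimodal (Gaussian) response")
    u = -b1 / (2.0 * b2)
    t = math.sqrt(-1.0 / (2.0 * b2))
    h = math.exp(b0 - b1 ** 2 / (4.0 * b2))
    return u, t, h

def polynomial_coefficients(u, t, h):
    """Inverse conversion: Gaussian parameters to polynomial coefficients."""
    b2 = -1.0 / (2.0 * t ** 2)
    b1 = u / t ** 2
    b0 = math.log(h) - u ** 2 / (2.0 * t ** 2)
    return b0, b1, b2

# Hypothetical species: optimum at pH 6.0, tolerance 0.5 pH units, maximum abundance 40.
b0, b1, b2 = polynomial_coefficients(u=6.0, t=0.5, h=40.0)
u, t, h = gaussian_parameters(b0, b1, b2)
print(b1, b2, u, t, round(h, 6))
```

A fitted b2 that is zero or positive means the curve has no maximum within the model, which is why the conversion rejects it.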

Page 65: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Globigerina pachyderma (left coiling)

Summer sea-surface temperature ºC

Optimum = 2º

GLIM

Page 66: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Globigerina pachyderma (right coiling)

Page 67: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Orbulina universa

Page 68: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Globigerina rubescens

Page 69: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Imbrie and Kipp 1971: 61 core tops, 27 taxa

                                             Summer SST   Winter SST   Salinity
Significant Gaussian logit model                 19           21          21
Significant increasing linear logit model         6            3           4
Significant decreasing linear logit model         1            1           0
No relationship                                   1            2           2

GAUSSIAN LOGIT REGRESSION

Page 70: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351


Page 71: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

MAXIMUM LIKELIHOOD APPROACH

• Likelihood is the probability of a given observed value with a certain expected value

• Maximum likelihood estimation: expected or reconstructed values that give the best likelihood for the observed fossil assemblages

- ML estimates are close to observed values, and the proximity is measured with the likelihood function

- commonly we use the negative logarithm for the likelihood, since combined probabilities may be very small

J. Oksanen 2002
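As an illustration of the maximum likelihood calibration idea, here is a minimal sketch under stated assumptions: species responses are Gaussian curves with known (u, t, h), abundances are treated as Poisson counts (an assumption made for this example, not something the lecture specifies), and the gradient value is found by a simple grid search. All function names are hypothetical.

```python
import math

def gaussian_response(x, u, t, h):
    """Expected abundance at gradient value x for optimum u, tolerance t, height h."""
    return h * math.exp(-((x - u) ** 2) / (2.0 * t ** 2))

def neg_log_likelihood(x, counts, species):
    """Negative log-likelihood of the observed counts at x (Poisson assumption)."""
    nll = 0.0
    for y, (u, t, h) in zip(counts, species):
        mu = max(gaussian_response(x, u, t, h), 1e-300)  # guard against log(0)
        nll += mu - y * math.log(mu) + math.lgamma(y + 1)
    return nll

def ml_calibrate(counts, species, lo, hi, steps=1000):
    """Return the grid value of x that minimises the negative log-likelihood."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda x: neg_log_likelihood(x, counts, species))
```

Working with the negative logarithm, as the slide notes, keeps the combined probabilities from becoming vanishingly small.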

Page 72: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

INFERRING PAST TEMPERATURE FROM MULTIVARIATE SPECIES COMPOSITION

Modified from J. Oksanen 2002

[Figure: modern species responses and -logLik plotted against temperature (ºC, axis 9 to 14), with inferred and observed values marked.]

Page 73: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

ROOT MEAN SQUARED ERROR FOR WINTER SST, SUMMER SST, & SALINITY USING DIFFERENT PROCEDURES

(values in brackets are RMSE when taxa with significant fits only are used)

Procedure                                                        Winter SST     Summer SST     Salinity

Imbrie & Kipp 1971 linear                                        2.57           2.55           0.573
Imbrie & Kipp 1971 non-linear                                    1.54           2.15           0.571

WACALIB 2.1 – 3.3
Maximum likelihood regression and ML calibration                 3.21           2.09           0.711
Weighted averaging regression and WA calibration                 1.97           2.02           0.570
WA regression and WA calibration with tolerance downweighting    1.92           2.03           0.560
ML regression, WA calibration                                    1.56 (1.56)    1.94 (1.94)    0.557 (0.656)
ML regression, WA calibration with tolerance downweighting       1.25 (1.25)    1.80 (1.80)    0.534 (0.615)

WACALIB 3.5 (debugged version!)
Maximum likelihood regression and ML calibration                 1.20           1.63           0.54

Page 74: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

• Only three parameters:

- u: location of the optimum on gradient x

- h: modal height at the optimum

- t: tolerance or width of response

• Parameters can be estimated with non-linear regression or generalised linear models

• WEIGHTED AVERAGES CAN APPROXIMATE U

Page 75: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

The basic idea is very simple.

In a lake with a certain pH range, diatoms with their pH optima close to the lake’s pH will tend to be the most abundant species present.

A simple estimate of the species’ pH optimum is thus an average of all the pH values for lakes in which that species occurs, weighted by the species’ relative abundance.

(WA regression)

WEIGHTED AVERAGING

Conversely, an estimate of a lake's pH is the weighted average of the pH optima of all the species present.

(WA calibration)

Page 76: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

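Weighted averaging regression

Optimum

$$\hat{U}_k = \frac{\sum_{i=1}^{n} Y_{ik} X_i}{\sum_{i=1}^{n} Y_{ik}}$$

Tolerance

$$\hat{t}_k = \left[\frac{\sum_{i=1}^{n} Y_{ik} (X_i - \hat{U}_k)^2}{\sum_{i=1}^{n} Y_{ik}}\right]^{1/2}$$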

where Uk is the WA optimum of taxon k

tk is WA standard deviation or tolerance of k

Yik is percentage of taxon k in sample i

Xi is environmental variable of interest in sample i

And there are i=1,....,n samples

and k=1, ....,m taxa
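The two definitions above can be sketched in a few lines. This is an illustrative example only: the function names and the small pH data set used to try it out are made up for the purpose.

```python
import math

def wa_optimum(y, x):
    """WA optimum of one taxon: abundance-weighted mean of the environmental variable."""
    return sum(yi * xi for yi, xi in zip(y, x)) / sum(y)

def wa_tolerance(y, x):
    """WA tolerance of one taxon: abundance-weighted standard deviation about the optimum."""
    u = wa_optimum(y, x)
    return math.sqrt(sum(yi * (xi - u) ** 2 for yi, xi in zip(y, x)) / sum(y))
```

For a hypothetical taxon with percentages 0, 10, 40, 30, 0 in lakes of pH 4.5, 5.0, 5.5, 6.0, 6.5, the optimum is (10 × 5.0 + 40 × 5.5 + 30 × 6.0) / 80 = 5.625. Lakes where the taxon is absent contribute nothing.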

Page 77: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Weighted averaging calibration or reconstruction

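WA:

$$\hat{X}_i = \frac{\sum_{k=1}^{m} Y_{ik} \hat{U}_k}{\sum_{k=1}^{m} Y_{ik}}$$

WA with tolerance downweighting (WAtol):

$$\hat{X}_i = \frac{\sum_{k=1}^{m} Y_{ik} \hat{U}_k / \hat{t}_k^2}{\sum_{k=1}^{m} Y_{ik} / \hat{t}_k^2}$$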
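A minimal sketch of the two calibration formulas, assuming the optima and tolerances have already been estimated from a training set. The function names are hypothetical and no deshrinking is applied here.

```python
def wa_calibrate(y, optima):
    """Simple WA reconstruction: abundance-weighted mean of the taxon optima."""
    return sum(yk * uk for yk, uk in zip(y, optima)) / sum(y)

def wa_tol_calibrate(y, optima, tolerances):
    """Tolerance-downweighted WA: each taxon's weight is its abundance / tolerance^2."""
    weights = [yk / tk ** 2 for yk, tk in zip(y, tolerances)]
    return sum(wk * uk for wk, uk in zip(weights, optima)) / sum(weights)
```

Taxa with narrow tolerances dominate the tolerance-weighted estimate. When all tolerances are equal the two functions return the same value.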

Page 78: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Weighted averaging - the simple site average

In the simple average all sites where the species is present have equal weight when calculating the optimum.

However, the species is likely to be most abundant at sites near the optimum. Therefore samples with high abundance of the species should be given more weight. In weighted averaging this is achieved by weighting the environmental variable by a measure of species abundance.

Page 79: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

RECONSTRUCTING AN ENVIRONMENTAL VARIABLE FROM A FOSSIL ASSEMBLAGE

Weighted averaging calibration

A lake will tend to be dominated by taxa with chemical optima close to the lake's chemistry. An estimate of this chemistry is given by averaging the optima of all taxa present in the lake. If species abundance data are available, these can be used as weights:

WA Calibration

$$\hat{x}_i = \frac{\sum_{k=1}^{m} y_{ik} \hat{u}_k}{\sum_{k=1}^{m} y_{ik}}$$

where $\hat{x}_i$ = estimate of environmental variable for fossil sample i

$y_{ik}$ = abundance of species k in fossil sample i

$\hat{u}_k$ = optimum of species k

Page 80: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 81: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

ESTIMATION OF SAMPLE-SPECIFIC ERRORS: BASIC IDEA OF COMPUTER RE-SAMPLING PROCEDURES

TRAINING SET - 178 modern diatom samples and lake-water pH

Jack-knifing
Do reconstruction 178 times. Leave out sample 1 and reconstruct pH; add sample 1 but leave out sample 2 and reconstruct pH. Repeat for all 178 reconstructions, each using a training set of size 177 and leaving out one sample every time. Can derive jack-knifing estimate of pH and its variance and hence its standard error.

Bootstrap
Draw at random a training set of 178 samples using sampling with replacement so that the same sample can, in theory, be selected more than once. Any samples not selected form an independent test set. Reconstruct pH for both modern test-set samples and for fossil samples. Repeat for 1000 bootstrap cycles.

Mean square error of prediction =

1. error due to variability in estimating species parameters in training set (i.e. s.e. of bootstrap estimates) +

2. error due to variation in species abundances at a given pH (i.e. actual prediction error: differences between observed pH and the mean bootstrap estimate of pH for modern samples when in the independent test set).

Birks et al. 1990

Page 82: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Use of data and the bootstrap distribution to infer a sampling distribution. The bootstrap procedure estimates the sampling distribution of a statistic in two steps. The unknown distribution of population values is estimated from the sample data, then the estimated population is repeatedly sampled to estimate the sampling distribution of the statistic.

Page 83: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

The bootstrap algorithm for estimating the standard error of a statistic $\hat{\theta} = s(\mathbf{x})$; each bootstrap sample is an independent random sample of size n from $\hat{F}$. The number of bootstrap replications B for estimating a standard error is usually between 25 and 200. As $B \to \infty$, $\widehat{se}_B$ approaches the plug-in estimate of $se_F(\hat{\theta})$.

Page 84: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

ERROR ESTIMATION BY BOOTSTRAPPING (WACALIB 3.1+ AND C2)

61 sample training set: draw 61 samples at random with replacement to give a bootstrap training set of size 61. Any samples not selected form a test set.

Mean square error of prediction = $s_1^2 + s_2^2$

$s_1^2$: error due to variability in estimates of optima and/or tolerances in training set (s.e. of bootstrap estimates)

$$s_1^2 = \frac{\sum_{boot} (\hat{x}_{i,boot} - \bar{x}_{i,boot})^2}{n}$$

$s_2^2$: error due to variation in abundances at a given temperature (actual prediction error: differences between observed $x_i$ and mean bootstrap estimate)

$$s_2^2 = \frac{\sum_{i} (x_i - \bar{x}_{i,boot})^2}{n}$$

($\bar{x}_{i,boot}$ is mean of $\hat{x}_{i,boot}$ for all cycles when sample i is in test set).

RMSE = $(s_1^2 + s_2^2)^{1/2}$

For a fossil sample

$$\mathrm{MSEP} = \frac{\sum_{boot} (\hat{x}_{i,boot} - \bar{x}_{i,boot})^2}{n} + s_2^2$$
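The bootstrap partition into s1 and s2 can be sketched as follows. This is an illustration under simplifying assumptions, not the WACALIB or C2 implementation: it uses simple WA without deshrinking or tolerance weighting, pools s1 over the training samples as a root mean square, and all function names are hypothetical.

```python
import math
import random

def fit_optima(Y, x):
    """WA regression: abundance-weighted mean of x per taxon (None if taxon absent)."""
    optima = []
    for k in range(len(Y[0])):
        total = sum(row[k] for row in Y)
        weighted = sum(row[k] * xi for row, xi in zip(Y, x))
        optima.append(weighted / total if total > 0 else None)
    return optima

def predict(row, optima):
    """WA calibration using only taxa that received an optimum."""
    pairs = [(y, u) for y, u in zip(row, optima) if u is not None and y > 0]
    total = sum(y for y, _ in pairs)
    return sum(y * u for y, u in pairs) / total if total > 0 else None

def bootstrap_errors(Y, x, cycles=200, seed=1):
    """Return (s1, s2, rmsep) for the training set from out-of-bag predictions."""
    rng = random.Random(seed)
    n = len(Y)
    preds = [[] for _ in range(n)]
    for _ in range(cycles):
        drawn = [rng.randrange(n) for _ in range(n)]       # sample with replacement
        optima = fit_optima([Y[i] for i in drawn], [x[i] for i in drawn])
        for i in set(range(n)) - set(drawn):               # samples not selected = test set
            p = predict(Y[i], optima)
            if p is not None:
                preds[i].append(p)
    s1_sq = s2_sq = 0.0
    used = 0
    for i in range(n):
        if not preds[i]:
            continue
        mean_i = sum(preds[i]) / len(preds[i])
        s1_sq += sum((p - mean_i) ** 2 for p in preds[i]) / len(preds[i])
        s2_sq += (x[i] - mean_i) ** 2
        used += 1
    s1 = math.sqrt(s1_sq / used)
    s2 = math.sqrt(s2_sq / used)
    return s1, s2, math.sqrt(s1 ** 2 + s2 ** 2)
```

For a fossil sample the same loop would store one prediction per cycle, and its sample-specific error would combine the spread of those predictions with the training-set s2.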

Page 85: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

ROOT MEAN SQUARE ERRORS OF PREDICTION ESTIMATED BY BOOTSTRAPPING

                                      WA              WAtol

Summer sea-surface temperature ºC
  Training set
    RMSE total                        2.31            2.37
    RMSE S1                           0.63            0.70
    RMSE S2                           2.22            2.27
  Fossil samples                      2.225-2.251     2.283-2.296

Winter sea-surface temperature ºC
  Training set
    RMSE total                        2.23            2.19
    RMSE S1                           0.62            0.7
    RMSE S2                           2.14            2.07
  Fossil samples                      2.156-2.201     2.106-2.249

Salinity ‰
  Training set
    RMSE total                        0.61            0.60
    RMSE S1                           0.11            0.13
    RMSE S2                           0.60            0.59
  Fossil samples                      0.603-0.607     0.599-0.606

Page 86: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 87: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

TRAINING SET ASSESSMENT

ROOT MEAN SQUARED ERROR (RMSE) of $(x_i - \hat{x}_i)$:

$$\mathrm{RMSE} = \sqrt{\sum_{i} (x_i - \hat{x}_i)^2 / n}$$

CORRELATION r BETWEEN $x_i$ and $\hat{x}_i$

COEFFICIENT OF DETERMINATION $r^2$

r or $r^2$ measures the strength of the relationship between observed and inferred values and allows comparison between transfer functions for different variables.

$\mathrm{RMSE}^2 = \mathrm{error}^2 + \mathrm{bias}^2$

Error = SE of $(x_i - \hat{x}_i)$: RANDOM PREDICTION ERROR ABOUT BIAS

Bias = Mean of $(x_i - \hat{x}_i)$: SYSTEMATIC PREDICTION ERROR (mean of prediction errors)

Also maximum bias: divide the sampling interval of $x_i$ into equal intervals (usually 10), calculate mean bias for each interval, and use the largest absolute value of mean bias for an interval as a measure of maximum bias.

Note that in RMSE the divisor is n, not (n - 1) as in standard deviation. This is because we are using the known gradient values only.
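These training-set statistics can be computed as in the sketch below. It is illustrative only: the function name is hypothetical, residuals are taken as observed minus inferred, and the error term uses the divisor n so that RMSE² = error² + bias² holds exactly.

```python
import math

def prediction_stats(observed, inferred, n_intervals=10):
    """Return RMSE, mean bias, error about the bias, and maximum bias."""
    n = len(observed)
    resid = [x - xh for x, xh in zip(observed, inferred)]
    rmse = math.sqrt(sum(r * r for r in resid) / n)
    bias = sum(resid) / n
    error = math.sqrt(sum((r - bias) ** 2 for r in resid) / n)
    # Maximum bias: mean residual within equal-width intervals of the observed gradient.
    lo, hi = min(observed), max(observed)
    width = (hi - lo) / n_intervals
    bins = {}
    for x, r in zip(observed, resid):
        idx = min(int((x - lo) / width), n_intervals - 1) if width > 0 else 0
        bins.setdefault(idx, []).append(r)
    max_bias = max(abs(sum(v) / len(v)) for v in bins.values())
    return {"rmse": rmse, "bias": bias, "error": error, "max_bias": max_bias}
```

Empty intervals are simply skipped, which matters for small training sets where ten intervals may not all contain samples.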

Page 88: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

BIAS AND ERROR

• Correlation unreliable: depends on the range of observations

• Good: Prediction root mean squared error (RMSEP)

• Root mean squared error

$$\mathrm{RMSE} = \sqrt{\sum_{i=1}^{N} (\hat{x}_i - x_i)^2 / N}$$

• Bias b: systematic difference

• Error ε: random error about bias

• $\mathrm{RMSE}^2 = b^2 + \varepsilon^2$

Must be cross-validated or will be badly biased

J. Oksanen 2002

Page 89: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

ACCURACY OF PREDICTION

• Root Mean Squared Error

$$\mathrm{RMSE} = \sqrt{\sum_{i} (\hat{x}_i - x_i)^2 / n}$$

• Two components
  - Error
  - Bias

$\mathrm{RMSE}^2 = \mathrm{bias}^2 + \mathrm{error}^2$

• Correlation coefficient is dependent on the range of observations
  - Large range: Large part of variance explained

• Cross-validation must be used in assessing the prediction accuracy

1: Split sample
  - Divide your data into training and test data sets

2: Jack-knife
  - For every site i repeat:
      Remove site i from the data set
      Estimate species response curves
      Do the calibration for site i
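The jack-knife loop above can be sketched for simple weighted averaging. This is a minimal illustration with hypothetical function names; it uses WA optima in place of fitted response curves and applies no deshrinking.

```python
import math

def wa_fit(Y, x):
    """WA regression: one optimum per taxon (None if the taxon never occurs)."""
    optima = []
    for k in range(len(Y[0])):
        total = sum(row[k] for row in Y)
        weighted = sum(row[k] * xi for row, xi in zip(Y, x))
        optima.append(weighted / total if total > 0 else None)
    return optima

def wa_predict(row, optima):
    """WA calibration for one sample, skipping taxa without an optimum."""
    pairs = [(y, u) for y, u in zip(row, optima) if u is not None]
    total = sum(y for y, _ in pairs)
    if total == 0:
        raise ValueError("sample has no taxa with an estimated optimum")
    return sum(y * u for y, u in pairs) / total

def rmse(observed, predicted):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def apparent_rmse(Y, x):
    """RMSE when every sample is predicted from a model that included it."""
    optima = wa_fit(Y, x)
    return rmse(x, [wa_predict(row, optima) for row in Y])

def jackknife_rmsep(Y, x):
    """Leave-one-out RMSEP: refit without site i, then calibrate site i."""
    preds = []
    for i in range(len(Y)):
        optima = wa_fit(Y[:i] + Y[i + 1:], x[:i] + x[i + 1:])
        preds.append(wa_predict(Y[i], optima))
    return rmse(x, preds)
```

With four sites at x = 1, 2, 3, 4, where taxon A occurs only in the first two and taxon B only in the last two, the apparent RMSE is 0.5 but the leave-one-out RMSEP is 1.0, which shows how much the apparent figure can flatter the model.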

Page 90: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

CROSS-VALIDATION

Leave-one-out ('jack-knife'), each in turn, or divide data into training and test data sets. Leave-one-out changes the data too little, and hence exaggerates the goodness of prediction. K-fold cross-validation leaves out a certain proportion (e.g. 1/10) and evaluates the model for each of the data sets left out.

Badly biased unless one does cross-validation

J. Oksanen 2002

Page 91: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

CROSS-VALIDATION STATISTICS

PREDICTED VALUES      cf.   APPARENT VALUES or ESTIMATED VALUES
RMSEP                 cf.   RMSE
r jack                cf.   r
r2 jack               cf.   r2
mean bias                   mean bias
maximum bias                maximum bias

TRAINING SET ASSESSMENT AND SELECTION

Lowest RMSEP, highest r jack or r2 jack, lowest mean bias, lowest maximum bias.

Often a compromise between RMSEP and bias.

PARTITIONING RMSEP

$\mathrm{RMSEP}^2 = \mathrm{ERROR}^2 + \mathrm{BIAS}^2$

$\mathrm{RMSEP}^2 = s_1^2 + s_2^2$

$s_1$: Error due to estimating optima and tolerances

$s_2$: Error due to variations in abundance of taxa at given environmental value

Page 92: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

SWAP (= Surface Waters Acidification Project) Diatom – pH Training Set

178 surface sediments

England     5
Norway     51
Wales      32
Scotland   60
Sweden     30

267 taxa – in 2 or more samples, with 1% or more in a sample

pH (arithmetic mean) 4.33 – 7.25, mean = 5.59, median = 5.51

Screened to 167 samples: pH 4.33 – 7.25, mean = 5.56, median = 5.27, 262 taxa

RMSE = 0.297
r = 0.933
RMSEP (bootstrapping) = 0.32
RMSEP (split-sampling) = 0.31

Page 93: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 94: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

ROOT MEAN SQUARED ERRORS OF PREDICTION FOR THE TRAINING SET

                                WA               WAtol

RMSE S1                         0.072            0.305
RMSE S2                         0.312            0.371
Total RMSE of prediction        0.320            0.480
________________________________________________
Cross-validation RMSE           0.308            0.376
                                (0.269-0.338)    (0.287-0.541)
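As a worked check, the totals in this table follow from combining the two components in quadrature, assuming the total RMSE of prediction is $(S_1^2 + S_2^2)^{1/2}$ as in the bootstrap error partition:

```latex
\begin{aligned}
\text{WA:}\quad & \sqrt{0.072^2 + 0.312^2} = \sqrt{0.005184 + 0.097344} = \sqrt{0.102528} \approx 0.320 \\
\text{WA}_{tol}\text{:}\quad & \sqrt{0.305^2 + 0.371^2} = \sqrt{0.093025 + 0.137641} = \sqrt{0.230666} \approx 0.480
\end{aligned}
```

In both cases S2 dominates, so most of the prediction error comes from variation in taxon abundances at a given pH rather than from uncertainty in the estimated optima.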

Page 95: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

The Round Loch of Glenhead, Galloway

Page 96: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

WA pH reconstructions with bootstrap standard errors of prediction

Page 97: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

INITIAL ASSUMPTIONS

1. Taxa related to physical environment.

2. Modern and fossil taxa have same ecological responses.

3. Mathematical methods adequately model the biological responses.

4. Reconstructions have low errors.

5. Training set is representative of the range of variation in the fossil set.

STATISTICAL AND ECOLOGICAL EVALUATION OF RECONSTRUCTIONS

Page 98: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

1. RMSEP for individual fossil samples

Monte Carlo simulation using leave-one-out initially to estimate standard errors of taxon coefficient and then to derive specific sample standard errors, or bootstrapping.

2. Goodness-of-fit statistics

CCA of calibration set, fit fossil sample passively on axis (environmental variable of interest), examine squared residual distance to axis, see if any fossil samples poorly fitted.

3. Analogue statistics

Good and close analogues. Extreme 5% and 2.5% of modern DCs.

4. Percentage of the total fossil assemblage that consists of taxa not represented at all in the calibration data set, and percentage of the total assemblage that consists of taxa poorly represented in the training set (e.g. < 10% occurrences) and with coefficients poorly estimated in the training set (high variance of beta values in cross-validation).

< 5% not present: reliable
< 10% not present: okay
< 25% not present: possibly okay
> 25% not present: not reliable

RECONSTRUCTION EVALUATION

Page 99: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 100: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

ASSESSMENT OF ANALOGUES (ANALOG, MAT)

Chord distance or chi-squared distance as dissimilarity coefficient (DC).

n samples give $\tfrac{1}{2} n (n - 1)$ DC values.

Select percentiles of all pairs of DC values:

first    VERY GOOD or CLOSE ANALOGUE
fifth    GOOD
tenth    FAIR

RANDOMISATION TESTS (ANALOG)
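A minimal sketch of this percentile-based screening, assuming samples are expressed as proportions and using the chord distance. The function names, the nearest-rank percentile rule, and the label returned when no cut-off is met are choices made for this example rather than the behaviour of ANALOG or MAT.

```python
import math
from itertools import combinations

def chord_distance(p, q):
    """Chord distance between two samples of taxon proportions."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def modern_dc_values(samples):
    """All n(n-1)/2 pairwise dissimilarities among the modern samples, sorted."""
    return sorted(chord_distance(p, q) for p, q in combinations(samples, 2))

def nearest_rank_percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]

def analogue_class(fossil, samples):
    """Classify the closest modern analogue of a fossil sample."""
    dcs = modern_dc_values(samples)
    closest = min(chord_distance(fossil, s) for s in samples)
    for pct, label in ((1, "close"), (5, "good"), (10, "fair")):
        if closest <= nearest_rank_percentile(dcs, pct):
            return label
    return "no fair analogue"
```

A fossil sample identical to a modern sample has a closest DC of zero and is therefore always classed as having a close analogue.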

Page 101: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 102: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Poor fit

Page 103: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Ordination of (a) chironomid taxa and environmental variables in Labrador, Canada, and (b) lakes.

Relationship between actual and chironomid-inferred summer surface-water temperatures for Labrador lakes.

Walker et al. 1991

Chironomids and climate

Page 104: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Walker et al. 1996

Page 105: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Summer surface-water paleotemperature reconstruction for Splan Pond. For comparison names of climatic events for correlative European time intervals are included. The apparent root-mean-square error of the temperature estimates is 1.32ºC (10).

Percentage abundance of common midge taxa in sediments of Splan Pond, New Brunswick, Canada. For comparison, names of climatic events for correlative European time intervals are included.

Page 106: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Walker et al. 1997

Page 107: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Reconstructions of the pH history of Lysevatten based on historical data and inference from the subfossil diatoms in the sediment. Historical data are pH measurements (thin solid line) and indirect data from fish reports and data from other similar lakes (thin broken line). The insert, showing pH variations from April 1961 to March 1962, is based on real measurements. Diatom-inferred values (thick solid line) were obtained by weighted averaging.

VALIDATION Diatoms and pH

Page 108: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Plot of observed vs. inferred annual mean TP concentrations (log µg l-1) based on simple WA classical regression of 44 lakes.

Comparison of the measured seasonal range in TP concentrations for Mondsee (mean is shown by a line with open circles; minimum and maximum are shown as single lines) with the bootstrap RMSE of prediction for each individual reconstructed TP value (Est_se_p) using the diatom model (Mean boot is shown by a line with filled circles and the lower and upper errors are shown as single lines). All model values are back-transformed to µg l-1.

Diatoms and total P validation

Page 109: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Measured annual mean TP concentrations (line with open circles) compared with the diatom-inferred TP values calculated as 3-year running means (single line), for the period 1975-93. All model values are back-transformed to µg l-1.

Measured

Diatom-inferred total P

Page 110: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Baldeggersee frozen core

Page 111: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Baldeggersee

Diatom succession in Baldeggersee freeze-core BA93-C between 1885 & 1993. Only major taxa shown

Lotter 1998

Page 112: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Measured total phosphorus (TP) during spring circulation compared to diatom-inferred TP values and median grain-size distribution in the Baldeggersee annual layers (see Lotter et al. 1997c). The large filled circles show the measured spring circulation TP values for the upper-most 15m, whereas the horizontal lines represent the annual TP range in the uppermost 15m of the water column. The dots on the right side of the graph represent samples with close (filled dots; 2nd percentile) and good modern analogues (open dots; 5th percentile).

Page 113: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Diatoms and climate

Diatom-inferred mean July air temperature (black dots) from sediments of the three study lakes Alanen Laanijärvi, Lake 850, and Lake Njulla, including sample-specific error estimates (vertical error bars) and 210Pb-dating errors (horizontal error bars), compared with measured July T (grey dots) in Kiruna (for Alanen Laanijärvi) and in Abisko (for Lakes 850 and Njulla) during the past century. Measured July T are corrected for elevation (0.57ºC per 100m; Laaksonen, 1976) and smoothed (grey line) with a running mean (n = 13). The stippled lines separate periods with apparent 'good' and 'poor' correspondence between diatom-inferred and measured July T in Lakes 850 and Njulla.

Bigler and Hall 2003

Page 114: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Chironomids and climate

Comparison between meteorological data and chironomid-inferred temperatures at each of the 4 study sites. The blue line represents the 5- or 2-year running means of the meteorological data at Abisko and Kiruna respectively, corrected using a lapse rate of 0.57ºC per 100m. The red line represents the 5-year (for lakes Njulla, 850, and Vuoskkujarvi) or 2-year (for Alanen Laanijärvi) running means of the meteorological data corresponding to the date obtained at each level. The black line shows the chironomid-inferred temperatures with the estimated errors as vertical bars (mean±SSE). The horizontal error bars represent an estimated error in dating. The open stars indicate sediment intervals where the instrumental values fall outside the range of chironomid-inferred temperature (mean±SSE). The Pearson correlation coefficient r and associated p-values are presented and indicate statistically significant correlations between measured and chironomid-inferred mean July air temperature at all study sites. The arrows indicate the climate normals (mean 1960-1999).

Larocque & Hall 2003

Page 115: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

1. Ecologically plausible – based on unimodal species response model.

2. Mathematically simple but has a rigorous mathematical theory. Properties fairly well known now.

3. Empirically powerful:

a. does not assume linear responses

b. not hindered by too many species, in fact helped by many species!

c. relatively insensitive to outliers

4. Tests with simulated and real data – at its best with noisy, species-rich compositional percentage data with many zero values over long environmental gradients (> 3 standard deviations).

5. Because of its computational simplicity, can derive error estimates for predicted inferred values.

6. Does well in ‘non-analogue’ situations as it is not based on the assemblage as a whole but on INDIVIDUAL species optima and/or tolerances.

7. Ignores absences.

8. Weaknesses.

WEIGHTED AVERAGING – AN ASSESSMENT

Page 116: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Species packing model: Gaussian logit curves of the probability (p) that a species occurs at a site, against environmental variable x. The curves shown have equispaced optima (spacing = 1), equal tolerances (t = 1) and equal maximum probabilities occurrence (pmax = 0.5). xo is the value of x at a particular site.

Page 117: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Diatoms and pH

Page 118: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

1. Sensitive to distribution of environmental variable in training set.

2. Considers each environmental variable separately.

3. Disregards residual correlations in species data. 

Can extend WA to WA-partial least squares to include residual correlations in species data in an attempt to improve our estimates of species optima.

Page 119: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

WEIGHTED AVERAGES

$$\tilde{u}_i = \frac{\sum_{j} y_{ij} x_j}{\sum_{j} y_{ij}} \qquad\qquad \tilde{x}_j = \frac{\sum_{i} y_{ij} u_i}{\sum_{i} y_{ij}}$$

y = abundance, u = optimum, x = gradient value, i = species, j = site, ~ = WA estimate

• WA estimate of species optimum (u) is good if:

1. Sites are uniformly distributed over species range

2. Sites are close to each other

• WA estimates of gradient values (x) are good if:

1. Species optima are dispersed uniformly around x

2. All species have equal tolerances

3. All species have equal modal abundances

4. Optima are close together

These conditions are only true for infinite species packing conditions!

Page 120: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

1. Both species and sites must have uniform and dense distribution over the gradient

2. To estimate values at gradient ends, some species optima must be outside the gradient endpoints. Result is bias and truncation

3. To estimate extreme species optima, some sites must be outside the most extreme species optima. Result is bias and truncation

4. Conditions 2 and 3 can be satisfied simultaneously only with infinite gradients

5. WA equations define the two-way reciprocal averaging algorithm of CA: $x \to \tilde{u} \to \tilde{x} \to \tilde{u} \to \tilde{x} \to \dots$

6. Ranges and variances of weighted averages are smaller than the range of values that they are based on. Need to 'deshrink' to restore the original range and variance.

WEIGHTED AVERAGING CONDITIONS JOINTLY

Page 121: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

BIAS AND TRUNCATION IN WEIGHTED AVERAGES

Weighted averages are good estimates of Gaussian optima, unless the response is truncated.

Bias towards the gradient centre: shrinking.

J. Oksanen 2002

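[Figure: species responses along a pH gradient, comparing WA and GLR (Gaussian logit regression) estimates of the optima.]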

Page 122: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

APPROACHES TO MULTIVARIATE CALIBRATION

Chemometrics – predicting chemical concentrations from near infra-red spectra

Responses

Predictors

Page 123: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

CORRESPONDENCE ANALYSIS REGRESSION

Roux 1979: Reduced Imbrie & Kipp (1971) modern foraminifera data to 3 CA axes, then used these in inverse regression.

RMSE apparent       Summer temp    Winter temp

PC regression       2.55°C         2.57°C
CA regression       1.72°C         1.37°C
WA-PLS              1.53°C         1.17°C

Page 124: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

WEIGHTED AVERAGING PARTIAL LEAST SQUARES (WA-PLS)

Extend simple WA to WA-PLS to include residual correlations in species data in an attempt to improve our estimates of species optima.

Partial least squares (PLS)

Form of PCA regression of x on y.

PLS components are selected to show maximum covariance with x, whereas in PCA regression components of y are calculated irrespective of their predictive value for x.

Weighted averaging PLS

WA = WA-PLS if only the first WA-PLS component is used.

WA-PLS uses further components, namely as many as are useful in terms of predictive power. It uses residual structure in species data to improve our estimates of species parameters (optima) in the final WA predictor. Optima of species that are abundant in sites with large residuals are likely to be updated most in WA-PLS.

Page 125: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

WEIGHTED AVERAGING (WA)

1. Take the environmental variable ($x_i$) as the site scores.

2. Calculate species scores (optima) ($u_k$) by weighted averaging of site scores – WA regression.

3. Calculate new site scores by weighted averaging of species scores – WA calibration.

4. Regress the environmental variable ($x_i$) on the preliminary new site scores (initial $\hat{x}_i$) and take the fitted values as the estimate of $x_i$ – deshrinking regression.

[Regression of initial $\hat{x}_i$ on $x_i$: CLASSICAL, good for 'ends'; or of $x_i$ on initial $\hat{x}_i$: INVERSE, lower RMSE]

$$\text{final } \hat{x}_i = b_0 + b_1 (\text{initial } \hat{x}_i) = b_0 + b_1 \frac{\sum_{k=1}^{m} y_{ik} u_k}{\sum_{k=1}^{m} y_{ik}} = \frac{\sum_{k=1}^{m} y_{ik} \hat{u}_k}{\sum_{k=1}^{m} y_{ik}}$$

where $\hat{u}_k = b_0 + b_1 u_k$

Page 126: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

The weighted averaging (WA) method thus consists of three parts: WA regression, WA calibration and a deshrinking regression. The parts are motivated as follows. A species with a particular optimum will be most abundant in sites with x-values close to its optimum. This motivates

Part 1 (WA regression): Estimate species optima ($u^*_k$) by weighted averaging of the x-value of the sites, i.e.

$$u^*_k = \sum_{i} y_{ik} x_i / y_{+k}$$

Species present and abundant in a particular site will tend to have optima close to its x-value. This motivates

Part 2 (WA calibration): Estimate the x-value of the sites by weighted averaging of the species optima, i.e.

$$x^*_i = \sum_{k} y_{ik} u^*_k / y_{i+}$$

Because averages are taken twice, the range of the estimated x-values ($x^*_i$) is shrunken. The amount of shrinking can be estimated from the training set by regressing either ($x^*_i$) on ($x_i$) or ($x_i$) on ($x^*_i$), as proposed by ter Braak (1988) and ter Braak & Van Dam (1989), respectively. Birks et al. (1990a) discuss the virtues of these two deshrinking methods. For establishing the link with PLS we need the latter, "inverse" deshrinking regression. This method also has the attractive property of giving minimum root mean squared error in the training set. This motivates

Page 127: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Part 3 (deshrinking regression): Regress the environmental variable ($x_i$) on the preliminary estimates ($x^*_i$) and take the fitted values as the estimates of ($x_i$).

The final prediction formula for inferring the value of the environmental variable from a fossil species assemblage is thus

$$\hat{x}_0 = a_0 + a_1 x^*_0 = a_0 + a_1 \sum_{k} y_{0k} u^*_k / y_{0+} = \sum_{k} y_{0k} \hat{u}_k / y_{0+}$$

where $a_0$ and $a_1$ are the coefficients of the deshrinking regression and $\hat{u}_k = a_0 + a_1 u^*_k$.

The final prediction formula is thus again a weighted average, but one with updated species optima.

Page 128: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

The problem of weighted averaging – shrinking of range of environmental reconstructions

Solution – deshrinking inverse regression

Derive inverse regression coefficients

initial $\hat{x}_i$ = a + b $x_i$

Apply regression to reconstructed values to "deshrink"

final $\hat{x}_i$ = (initial $\hat{x}_i$ – a)/b

Where $x_i$ = the measured env var; initial $\hat{x}_i$ = the initial WA estimate of the env var; final $\hat{x}_i$ = the final, deshrunk env var; and a and b are regression coefficients.

Page 129: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

FULL DEFINITION OF TWO-WAY WEIGHTED AVERAGING

1. Estimate species optima ($\hat{u}_k$) by weighted averaging of the environmental variable (x) at the sites

$$\hat{u}_k = \sum_{i=1}^{n} y_{ik} x_i / y_{+k}$$

where $y_{+k}$ has + to replace the summation over the subscript, in this case i = 1, ...., n sites.

2. Estimate the x values of the sites by weighted averaging of the species optima

$$\text{initial } \hat{x}_i = \sum_{k=1}^{m} y_{ik} \hat{u}_k / y_{i+}$$

Page 130: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

3. Because averages are taken twice, the range of the estimated initial x-values ($\hat{x}$) is shrunken. Need to deshrink using either

(a) Inverse linear regression

$$x_i = b_0 + b_1 (\text{initial } \hat{x}_i) + \varepsilon_i$$

$$\text{final } \hat{x}_i = b_0 + b_1 (\text{initial } \hat{x}_i)$$

This minimises RMSE in the training set

or

(b) Classical linear regression

$$\text{initial } \hat{x}_i = b_0 + b_1 x_i + \varepsilon_i$$

$$\text{final } \hat{x}_i = (\text{initial } \hat{x}_i - b_0) / b_1$$

This deshrinks more than inverse regression and takes inferred values further away from the mean.
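The two deshrinking options can be compared with a small sketch. It is illustrative only: the function names are hypothetical, ordinary (unweighted) least squares is used, and the initial estimates are assumed to be positively correlated with the observed values.

```python
def ols(pred, resp):
    """Least-squares line resp = b0 + b1 * pred; returns (b0, b1)."""
    n = len(pred)
    mp, mr = sum(pred) / n, sum(resp) / n
    sxy = sum((p - mp) * (r - mr) for p, r in zip(pred, resp))
    sxx = sum((p - mp) ** 2 for p in pred)
    b1 = sxy / sxx
    return mr - b1 * mp, b1

def deshrink_inverse(x_obs, x_init):
    """Inverse deshrinking: regress observed x on the initial WA estimates."""
    b0, b1 = ols(x_init, x_obs)
    return [b0 + b1 * xi for xi in x_init]

def deshrink_classical(x_obs, x_init):
    """Classical deshrinking: regress initial estimates on observed x, then invert."""
    b0, b1 = ols(x_obs, x_init)
    return [(xi - b0) / b1 for xi in x_init]
```

Because the two slopes multiply to r², which is at most 1, the classical version always stretches the initial estimates at least as much as the inverse version, while the inverse version gives the lower training-set RMSE.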

Page 131: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

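For inverse regression and two-way WA

$$\hat{x}_0 = b_0 + b_1 (\text{initial } \hat{x}_0) = b_0 + b_1 \sum_{k=1}^{m} y_{0k} \hat{u}_k / y_{0+} = \sum_{k=1}^{m} y_{0k} u^*_k / y_{0+}$$

where $u^*_k = b_0 + b_1 \hat{u}_k$

For classical regression and two-way WA

$$\hat{x}_0 = \left( \sum_{k=1}^{m} y_{0k} \hat{u}_k / y_{0+} - b_0 \right) / b_1 = \sum_{k=1}^{m} y_{0k} u^*_k / y_{0+}$$

where $u^*_k = (\hat{u}_k - b_0) / b_1$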

Page 132: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

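Can also estimate for each species its WA tolerance or standard deviation (niche breadth) as

$$\hat{t}_k = \left[ \sum_{i=1}^{n} y_{ik}\,(x_i - \hat{u}_k)^2 \Big/ y_{+k} \right]^{1/2}$$

and use these in a tolerance-weighted estimate of x

$$\hat{x}_i = \sum_{k=1}^{m} \frac{y_{ik}\,\hat{u}_k}{\hat{t}_k^{\,2}} \Bigg/ \sum_{k=1}^{m} \frac{y_{ik}}{\hat{t}_k^{\,2}}$$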
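A small sketch of the tolerance calculation and the tolerance-weighted estimate follows. The function names and toy data are hypothetical, and the sketch assumes every taxon has a non-zero tolerance; a taxon found in a single sample would have a tolerance of zero and would need separate handling.

```python
import math


def wa_tolerances(Y, x, optima):
    """t_k = sqrt( sum_i y_ik * (x_i - u_k)^2 / y_+k )."""
    tolerances = []
    for k, u in enumerate(optima):
        total = sum(row[k] for row in Y)                      # y_+k
        ss = sum(row[k] * (xi - u) ** 2 for row, xi in zip(Y, x))
        tolerances.append(math.sqrt(ss / total))
    return tolerances


def wa_tol_estimate(y, optima, tolerances):
    """Tolerance-weighted WA: taxa with narrow tolerances count for more."""
    num = sum(yk * uk / tk ** 2 for yk, uk, tk in zip(y, optima, tolerances))
    den = sum(yk / tk ** 2 for yk, tk in zip(y, tolerances))
    return num / den


# Hypothetical toy training set (sites x taxa) and its WA optima.
Y = [[8, 2, 0],
     [4, 6, 0],
     [0, 6, 4],
     [0, 2, 8]]
x = [5.0, 6.0, 7.0, 8.0]
optima = [64 / 12, 6.5, 92 / 12]

tolerances = wa_tolerances(Y, x, optima)
estimate = wa_tol_estimate(Y[0], optima, tolerances)
```

In this toy case the first sample's estimate moves towards the optimum of its narrow-tolerance taxon compared with the plain WA estimate.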

Page 133: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

WEIGHTED AVERAGING PARTIAL LEAST SQUARES – WA-PLS

1. Centre the environmental variable by subtracting the weighted mean.

2. Take the centred environmental variable (xi) as initial site scores (cf. WA/CA).

3. Calculate new species scores by WA of site scores.

4. Calculate new site scores by WA of species scores.

5. For axis 1, go to step 6. For axes 2 and higher, make the site scores uncorrelated with previous axes.

6. Standardise the new site scores and use them as the new component (cf. WA/CA).

7. Regress the environmental variable on the components obtained so far using a weighted (inverse) regression and take the fitted values as the current estimate of the environmental variable. Go to step 2, using the residuals of the regression as the new site scores (hence the name ‘partial’) (cf. WA/CA).

Optima of species that are abundant in sites with large residuals are likely to be updated most.

Page 134: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

DEFINITION OF WA-PLS

Step 0 Centre the environmental variable by subtracting the weighted mean, i.e.

$$x_i := x_i - \sum_i y_{i+}\,x_i \Big/ y_{++}$$

This simplifies the formulae.

Step 1 Take the centred environmental variable ($x_i$) as initial site scores ($r_i$)

Do steps 2 to 7 for each component:

Step 2 Calculate new species scores ($u^*_k$) by weighted averaging of the site scores, i.e.

$$u^*_k = \sum_i y_{ik}\,r_i \Big/ y_{+k}$$

Step 3 Calculate new site scores ($r_i$) by weighted averaging of the species scores, i.e.

$$\text{new } r_i = \sum_k y_{ik}\,u^*_k \Big/ y_{i+}$$

Step 4 For the first axis go to step 5. For second and higher components, make the new site scores ($r_i$) uncorrelated with the previous components by orthogonalization.

Step 5 Standardise the new site scores ($r_i$).

Step 6 Take the standardised scores as the new component.

Step 7 Regress the environmental variable ($x_i$) on the components obtained so far using weights ($y_{i+}/y_{++}$) in the regression and take the fitted values as current estimates ($\hat{x}_i$). Go to step 2 with the residuals of the regression as the new site scores ($r_i$).
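Steps 0 to 7 can be followed line by line in a short pure-Python sketch. It is an illustration under stated assumptions rather than a reference implementation: it returns only the training-set fitted values after each component, it does not store the species coefficients needed to predict new samples, and the function name and toy data are hypothetical.

```python
import math


def wapls_fitted(Y, x, n_components):
    """Training-set fitted values after each WA-PLS component."""
    n, m = len(Y), len(Y[0])
    row = [sum(r) for r in Y]                                   # y_i+
    col = [sum(Y[i][k] for i in range(n)) for k in range(m)]    # y_+k
    total = sum(row)                                            # y_++
    w = [ri / total for ri in row]                              # weights y_i+/y_++

    def wdot(a, b):
        """Inner product weighted by y_i+/y_++."""
        return sum(wi * ai * bi for wi, ai, bi in zip(w, a, b))

    xbar = sum(wi * xi for wi, xi in zip(w, x))
    xc = [xi - xbar for xi in x]            # step 0: centre on weighted mean
    r = xc[:]                               # step 1: initial site scores
    components, fits = [], []
    for _ in range(n_components):
        # step 2: species scores by WA of site scores
        u = [sum(Y[i][k] * r[i] for i in range(n)) / col[k] for k in range(m)]
        # step 3: site scores by WA of species scores
        r = [sum(Y[i][k] * u[k] for k in range(m)) / row[i] for i in range(n)]
        # step 4: remove any part explained by earlier components
        for c in components:
            proj = wdot(r, c)
            r = [ri - proj * ci for ri, ci in zip(r, c)]
        # step 5: standardise to weighted mean 0 and weighted variance 1
        mean = sum(wi * ri for wi, ri in zip(w, r))
        r = [ri - mean for ri in r]
        norm = math.sqrt(wdot(r, r))
        r = [ri / norm for ri in r]
        components.append(r)                # step 6: new component
        # step 7: weighted regression of x on the components; they are
        # orthonormal under the weights, so each coefficient is <xc, c>
        fit = [0.0] * n
        for c in components:
            coef = wdot(xc, c)
            fit = [f + coef * ci for f, ci in zip(fit, c)]
        fits.append([f + xbar for f in fit])
        r = [a - f for a, f in zip(xc, fit)]  # residuals become site scores
    return fits, w


# Hypothetical toy training set: 6 sites x 4 taxa.
Y = [[8, 2, 0, 0],
     [5, 4, 1, 0],
     [2, 6, 2, 0],
     [0, 5, 4, 1],
     [0, 2, 5, 3],
     [0, 0, 3, 7]]
x = [4.8, 5.5, 6.1, 6.4, 7.2, 8.0]

fits, weights = wapls_fitted(Y, x, 2)
```

Because each new component is orthogonal to the earlier ones, the weighted apparent error can only stay the same or fall as components are added, which is why the choice of the number of components has to rest on cross-validation rather than on apparent RMSE.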

Page 135: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

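Method for calculating inferred temperatures

Using WA inverse deshrinking models, inferred summer surface water temperatures (°C) for shallow lakes may be calculated as:

$$\hat{x}_i = a + b \left( \sum_{k=1}^{m} y_{ik}\,\hat{u}_k \Big/ \sum_{k=1}^{m} y_{ik} \right)$$

(without tolerance down-weighting)

or

$$\hat{x}_i = a + b \left( \sum_{k=1}^{m} \frac{y_{ik}\,\hat{u}_k}{\hat{t}_k^{\,2}} \Bigg/ \sum_{k=1}^{m} \frac{y_{ik}}{\hat{t}_k^{\,2}} \right)$$

(with tolerance down-weighting)

With WA classical deshrinking models, the inferred summer surface water temperatures (°C) are calculated as:

$$\hat{x}_i = \left( \sum_{k=1}^{m} y_{ik}\,\hat{u}_k \Big/ \sum_{k=1}^{m} y_{ik} - a \right) \Big/ b$$

(without tolerance down-weighting)

or

$$\hat{x}_i = \left( \sum_{k=1}^{m} \frac{y_{ik}\,\hat{u}_k}{\hat{t}_k^{\,2}} \Bigg/ \sum_{k=1}^{m} \frac{y_{ik}}{\hat{t}_k^{\,2}} - a \right) \Big/ b$$

(with tolerance down-weighting)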

Page 136: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Using the WA-PLS models, inferred summer surface water temperatures (°C) may be calculated as:

$$\hat{x}_i = \sum_{k=1}^{m} y_{ik}\,\hat{\beta}_k \Big/ \sum_{k=1}^{m} y_{ik}$$

where $\hat{x}_i$ is the inferred temperature for sample i, a and b are the intercept and slope for the deshrinking equations, $y_{ik}$ is the abundance (depending on the model, either expressed as a percent of the total identifiable Chironomidae, or as the square-root of this value) of taxon k in sample i, $\hat{u}_k$ is the temperature optimum (°C) of species k, $\hat{\beta}_k$ is the Beta of species k, and $\hat{t}_k$ is the tolerance (°C) of species k (Fritz et al. 1991; ter Braak 1987; Birks, pers. comm.).

Page 137: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

LEAVE-ONE-OUT AND TEST SET CROSS-VALIDATION

Performance of WA-PLS in relation to the number of components (s): apparent error (RMSE) and prediction error (RMSEP) in simulated data (R = 1 from simulation series III). The estimated optimum number of components is 3 because three components give the lowest RMSEP in the training set. The last column is not available for real data.

     Training set                  Test set
s    Apparent    Leave-one-out
     RMSE        RMSEP             RMSEP
1    6.14        6.22              6.61
2    3.37        4.24              4.40*
3    2.87        4.16*             4.57
4    2.22        4.65              4.94
5    2.01        4.65              5.11
6    1.82        4.50              5.62

ter Braak & Juggins, 1993
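The contrast between apparent RMSE and leave-one-out RMSEP can be reproduced on a very small example. The sketch below uses plain WA without deshrinking to stay short (an assumption, not the WA-PLS model of the table), and the function names and toy data are hypothetical.

```python
import math


def wa_optima(Y, x):
    """WA optima; None for taxa absent from every training sample."""
    optima = []
    for k in range(len(Y[0])):
        total = sum(row[k] for row in Y)
        if total > 0:
            optima.append(sum(row[k] * xi for row, xi in zip(Y, x)) / total)
        else:
            optima.append(None)
    return optima


def wa_predict(y, optima):
    """WA estimate using only taxa that have an optimum."""
    pairs = [(yk, uk) for yk, uk in zip(y, optima) if uk is not None and yk > 0]
    return sum(yk * uk for yk, uk in pairs) / sum(yk for yk, _ in pairs)


def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def apparent_rmse(Y, x):
    """Error when every sample is predicted by a model that included it."""
    optima = wa_optima(Y, x)
    return rmse([wa_predict(row, optima) for row in Y], x)


def loo_rmsep(Y, x):
    """Error when each sample is predicted by a model fitted without it."""
    preds = []
    for i in range(len(Y)):
        optima = wa_optima(Y[:i] + Y[i + 1:], x[:i] + x[i + 1:])
        preds.append(wa_predict(Y[i], optima))
    return rmse(preds, x)


# Hypothetical toy training set: 4 sites x 3 taxa.
Y = [[8, 2, 0],
     [4, 6, 0],
     [0, 6, 4],
     [0, 2, 8]]
x = [5.0, 6.0, 7.0, 8.0]

apparent = apparent_rmse(Y, x)
leave_one_out = loo_rmsep(Y, x)
```

On this toy set the leave-one-out RMSEP is roughly twice the apparent RMSE, the same direction of difference as in the table.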

Page 138: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Performance of WA-PLS applied to three diatom data sets in relation to the number of components (s), in terms of apparent RMSE and leave-one-out RMSEP (* = selected model).

     SWAP              Bergen            Thames
s    RMSE    RMSEP     RMSE    RMSEP     RMSE    RMSEP
1    0.276   0.310*    0.353   0.394     0.341   0.354
2    0.232   0.302     0.256   0.318*    0.238   0.279
3    0.194   0.315     0.213   0.330     0.196   0.239*
4    0.173   0.327     0.192   0.335     0.166   0.224
5    0.153   0.344     0.174   0.359     0.153   0.219
6    0.134   0.369     0.164   0.374     0.140   0.219

Reduction in prediction error (%)

     0                 19                32
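The percentage reductions can be recovered from the RMSEP columns, assuming (as the numbers suggest) that the reduction is measured for the selected model relative to the one-component model:

$$\text{reduction} = 100 \times \frac{\text{RMSEP}_{s=1} - \text{RMSEP}_{\text{selected}}}{\text{RMSEP}_{s=1}}$$

$$\text{Bergen: } 100 \times \frac{0.394 - 0.318}{0.394} \approx 19\%, \qquad \text{Thames: } 100 \times \frac{0.354 - 0.239}{0.354} \approx 32\%$$

For SWAP the selected model is the one-component model itself, so the reduction is 0%.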

Page 139: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Bergen data set: predicted pH and bias as a function of observed pH for components 1 and 2 in WA-PLS. Solid lines represent Cleveland’s LOESS scatterplot smooth (1979). 19% gain.

Page 140: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Thames data set: predicted salinity and bias as a function of observed salinity for components 1 and 3 in WA-PLS. Solid lines represent Cleveland’s LOESS scatterplot smooth (1979). Salinity is in g l-1 and transformed as log10 (salinity – 0.08). 32% gain.

Page 141: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

The relationship between (a) diatom-inferred TP and (b) residuals (inferred TP – observed TP) and observed TP for the one- and two-component WA-PLS models. Solid lines show LOWESS scatter plot smoothers.

Summary diatom diagram and reconstructed annual mean TP concentrations using one- and two-component WA-PLS models for Lake SøbyGård, showing standard errors of prediction for the two-component model. 210Pb dates (AD) are shown on the right hand side.

Bennion et al. 1996

NW Europe

Total P

152 lakes

Page 142: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Imbrie & Kipp (1971) data

Panels: WA, WA-PLS

Page 143: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

ALPE - DIATOM - pH TRAINING SET

Italian and Austrian Alps (Aldo Marchetto & Roland Schmidt)     31
Spanish Pyrenees (Jordi Catalan & Joan Garcia)                  28
ALPE sites (Nigel Cameron & Viv Jones)                          30
Norway (Frode Berge & John Birks)                               10
SWAP Norway & UK (Frode Berge, Roger Flower, Viv Jones)         20
Total                                                          119

One 'rogue' sample detected

118 samples

527 diatom taxa

pH 4.48 - 8.04, median 6.10, mean 6.15

Gradient length 5.19 standard deviations

Page 144: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

ALPE TRAINING SET - 118 SAMPLES

Square root transformation

WA-PLS
Components   RMSE    r2     RMSEP   r2 (jack)
1            0.299   0.85   0.359   0.78
2            0.178   0.96   0.337   0.81
3            0.131   0.97   0.331   0.81
4            0.100   0.98   0.331   0.81
5            0.075   0.99   0.339   0.80

Select the WA-PLS model with 3 components as the simplest model (fewest parameters) that gives the lowest RMSEP.
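The selection rule stated above (lowest RMSEP, with ties going to the simpler model) can be written as a one-line comparison. The RMSEP values are those of the table; the function name is hypothetical.

```python
# Leave-one-out RMSEP of the ALPE WA-PLS models, keyed by number of components.
rmsep = {1: 0.359, 2: 0.337, 3: 0.331, 4: 0.331, 5: 0.339}


def simplest_best(rmsep_by_components):
    """Lowest RMSEP; ties are broken in favour of fewer components."""
    return min(rmsep_by_components,
               key=lambda s: (rmsep_by_components[s], s))


selected = simplest_best(rmsep)
```

Sorting on the pair (RMSEP, number of components) is what makes the three-component model win over the four-component model, which has the same RMSEP.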

Page 145: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 146: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 147: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

NORWEGIAN CHIRONOMID – CLIMATE TRAINING SET

Leave-one-out cross validation, 109 samples

RMSEP = 0.89 °C

Bias = 0.61 °C

[Figure: predicted air temperature, and predicted – observed air temperature, each with a 1:1 reference line.]

Page 148: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 149: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Inferred mean July air temperature

Oxygen isotope ratios

Page 150: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

NORWEGIAN POLLEN AND CLIMATE

Precipitation 300 - 3537 mm

Mean July 7.7 - 16.4 °C

Mean January -17.8 - 1.1 °C

Page 151: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 152: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 153: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Root mean squared errors of prediction (RMSEP) based on leave-one-out jack-knifing cross-validation for annual precipitation, mean July temperature, and mean January temperature using five different statistical models.  

Method                                 Pptn (mm)   July (°C)   January (°C)
Weighted averaging (WA) (classical)    486.5       1.33        2.86
Weighted averaging (WA) (inverse)      427.2       1.07        2.61
Partial least squares (PLS)            420.1       0.94        2.82
WA-PLS                                 417.5       1.03        2.57
Modern analogue technique (MAT)        385.3       0.91        2.42

Page 154: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 155: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Vuoskojaurasj, Abisko, Sweden

Page 156: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 157: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Vuoskojaurasj consensus reconstructions

Page 158: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Tibetanus, Abisko Valley

Inferred from pollen

Inferred from pollen

Hammarlund et al. 2002

Page 159: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Björnfjelltjörn, N. Norway

Page 160: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 161: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Björnfjelltjörn consensus reconstructions

Page 162: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 163: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 164: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

LINEAR AND UNIMODAL-BASED NUMERICAL METHODS

Problem                           Linear response model                       Unimodal response model
Regression                        Multiple linear regression                  Weighted averaging (WA) of sample scores
Calibration                       Linear calibration 'inverse regression'     WA of taxa scores and simple two-way WA
                                  Principal components regression             Correspondence analysis regression
                                  Partial least squares (PLS-1)               WA-PLS (WAPLS-1)
                                  Multivariate calibration (PLS-2)            WAPLS-2
Ordination                        Principal components analysis (PCA)         Correspondence analysis (CA)
Constrained ordination            Redundancy analysis (RDA)                   Canonical correspondence analysis (CCA)
(= reduced rank regression)
Partial ordination                Partial PCA                                 Partial CA
Partial constrained ordination    Partial RDA                                 Partial CCA

Page 165: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Pollen percentages in modern samples plotted in ‘climate space’ (cf Iversen’s thermal limit species +/– plotted in climate space). Contoured

Trend-surface analysis R2 Bartlein et al. 1986

Contoured only Webb et al. 1987

Reconstruction purposes – grid, analogue matching

Simulation purposes

PROBLEMS

1. Need large high-quality modern data sets for large geographical areas.

2. No error estimation for reconstruction purposes.

3. Reconstruction procedure ‘ad hoc’ – grid size, etc.

RESPONSE SURFACES

Page 166: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Response surfaces for individual pollen types. Each point is labelled by the abundance of the type. Many points are hidden – only the observation with the highest abundance was plotted at each position. For (a) to (e), '+' denotes 0%, '0' denotes 0-10%, '1' denotes 10-20%, '2' denotes 20-30%, etc. For (f) to (h), '+' denotes 0%, '0' denotes 0-1%, '1' denotes 1-2%, '2' denotes 2-3%, etc. 'H' denotes greater than 10%.

Page 167: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 168: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

(A) Percentage of spruce (Picea) pollen at individual sites plotted in climate space along axes for mean July temperature and annual precipitation. (B) Grid laid over the climate data to which the pollen percentages are fitted by local-area regression. The box with the plus sign is the window used for local-area regression. (C) Spruce pollen percentages fitted onto the grid. (D) Contours representing the response surface and pollen percentages shown in part C.

Page 169: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Scatter diagram showing the smoothed distribution of percentages of spruce (Picea) and beech (Fagus) pollen from sediment with modern pollen data in eastern North America when the pollen percentages are plotted at coordinates for modern January and July mean temperature (P.J. Bartlein, unpublished). The arrow indicates the direction and approximate magnitude of temperature change at Montreal since 6000 yr BP.

Page 170: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Reconstruction produced by the response surface approach

Elk Lake, Minnesota

Page 171: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Fossil and simulated isopoll map sequences for Betula. Isopolls are drawn at 5, 10, 25, 50 and 75% using an automatic contouring program.

Simulation purposes

Page 172: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Fossil and simulated isopoll map sequences for Quercus (deciduous). Maps are drawn at 3000-year intervals between 12000 yr BP and the present. The upper map sequence presents the observed fossil and contemporary pollen values. The lower map sequence presents the pollen values simulated by means of the pollen-climate response surface from the climate conditions obtained by applying to the measured contemporary climate the palaeoclimate anomalies that Kutzbach & Guetter (1986) simulated using the NCAR CCM, for 12000 to 3000 yr BP. The map for the present is simulated from the measured contemporary climate. Isopolls are drawn at 2, 5, 10, 25, and 50% using an automatic contouring program. Huntley 1992

Page 173: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

1. Choice of how much or how little smoothing.

2. Choice of scale of grid for reconstructions.

3. No statistical measure of ‘goodness-of-fit’.

4. No reliable error estimation for predicted values.

RESPONSE SURFACES - ‘Ad hoc’

Page 174: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

ANALOGUE-BASED APPROACH

Do an analogue-matching between fossil sample i and available modern samples with associated environmental data. Find the modern sample(s) most similar to i, and infer the past environment for sample i to be the modern environment of those modern samples. Repeat for all fossil samples.

PROBLEMS

1. Assessment of ‘most similar’?

2. How many analogues to use: the 1, 2, 9 or 10 most similar?

3. No-analogues for past assemblages.

4. Choice of similarity measure.

5. Require a huge set of modern samples of comparable site type, pollen morphological quality, etc., as the fossil samples. Must cover a vast geographical area.

6. Human impact.

Page 175: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Reconstructions produced using the analogue approach

Elk Lake, Minnesota

Page 176: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

MODIFIED MODERN ANALOGUE APPROACHES

Joel Guiot

1. Taxon weighting

Palaeobioclimatic operators (PBO) computed from either a time-series of a fossil sequence or from a PCA of fossil pollen data from a large spatial array of sites.

Weights are selected to 'emphasise the climate signal within the fossil data' and to 'highlight those taxa that show the most coherent behaviour in the vegetational dynamics', 'to minimise the human action which has significantly disturbed the pollen spectra', 'to reduce noise'.

2. Environmental estimates are weighted means of estimates based on 20, 40 or 50 or so most similar assemblages.

3. Standard deviations of these estimates give an approximate standard error.

Page 177: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Reconstruction of variations in annual total precipitation and mean temperature expressed as deviations from the modern values (1080 mm and 9.5 °C for La Grande Pile; 800 mm and 11 °C for Les Echets). The error bars are computed by simulation. The vertical axis is obtained by linear interpolation from the dates indicated in Fig. 2.

Cor is the correlation between estimated and actual data. +ME is the mean upper standard deviation associated to the estimates, -ME is the lower standard deviation. These statistics are calculated on the fossil data and on the modern data. In this case, R must be replaced by C.

Guiot et al. 1989

Page 178: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

MAT, ANALOG, C2

1. Modern data and environmental variable(s) of interest.

Do analog matches and environmental prediction for all samples but with cross-validation jack-knifing.

Find number of analogues to give lowest RMSEP for environmental variable based on mean or weighted mean of estimates of environmental variable. Can calculate bias statistics as well.

2. Reconstruct from the fossil data using the ‘optimal’ number of analogues (lowest RMSEP, lowest bias).

Advise chord distance or chi-squared distance as dissimilarity measure. Optimises signal to noise ratio.

MODERN ANALOGUE TECHNIQUES FOR ENVIRONMENTAL RECONSTRUCTION = k-NEAREST NEIGHBOURS (k-NN)
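The modern analogue procedure described above can be sketched as a k-nearest-neighbour search. This is a minimal illustration with hypothetical function names and toy data: it uses the chord distance on proportions, an unweighted mean of the k closest modern samples, and leave-one-out RMSEP to choose k.

```python
import math


def to_proportions(counts):
    total = sum(counts)
    return [c / total for c in counts]


def chord_distance(a, b):
    """Chord distance between two assemblages given as proportions."""
    return math.sqrt(sum((math.sqrt(p) - math.sqrt(q)) ** 2
                         for p, q in zip(a, b)))


def mat_predict(sample, modern, env, k):
    """Mean environment of the k modern samples closest to `sample`."""
    ranked = sorted((chord_distance(sample, mod), e)
                    for mod, e in zip(modern, env))
    return sum(e for _, e in ranked[:k]) / k


def loo_rmsep(modern, env, k):
    """Leave-one-out RMSEP of the k-analogue prediction."""
    sq = 0.0
    for i in range(len(modern)):
        rest = modern[:i] + modern[i + 1:]
        rest_env = env[:i] + env[i + 1:]
        sq += (mat_predict(modern[i], rest, rest_env, k) - env[i]) ** 2
    return math.sqrt(sq / len(modern))


# Hypothetical modern training set: counts (sites x taxa) and environment.
counts = [[8, 2, 0, 0],
          [5, 4, 1, 0],
          [2, 6, 2, 0],
          [0, 5, 4, 1],
          [0, 2, 5, 3],
          [0, 0, 3, 7]]
env = [4.8, 5.5, 6.1, 6.4, 7.2, 8.0]
modern = [to_proportions(row) for row in counts]

# Choose the number of analogues with the lowest leave-one-out RMSEP.
best_k = min(range(1, 4), key=lambda k: loo_rmsep(modern, env, k))

# Reconstruct for a hypothetical fossil sample.
fossil = to_proportions([1, 5, 3, 1])
reconstruction = mat_predict(fossil, modern, env, best_k)
```

A weighted mean (for example, weights inversely related to distance) could replace the plain mean in `mat_predict`; the slide mentions both options.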

Page 179: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Elk Lake climate reconstruction summary. The three series plotted with red, green and blue lines show the reconstructions produced by the individual approaches, the series plotted with the thin black line show the envelope of the prediction intervals, and the series plotted with a thick purple line represents the stacked and smoothed reconstruction of each variable (constructed by simple averaging of the individual reconstructions for each level, followed by smoothing [Velleman, 1980]). The modern observed values (1978-1984) for Itasca Park are also shown.

CONSENSUS RECONSTRUCTIONS

Page 180: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

PLOTTING OF RECONSTRUCTED VALUES

1. Plot against depth or age the reconstructed values, indicate the observed modern value if known.

2. Plot deviations from the observed modern value or the inferred modern value against depth or age.

3. Plot centred values (subtract the mean of the reconstructed values) against depth or age to give relative deviations.

4. Plot standardised values (subtract the mean of the reconstructed values and divide by the standard deviation of the reconstructed values) against depth or age to give standardised deviations.

Add LOESS smoother to help highlight major trends.
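The three derived series in options 2 to 4 are simple transformations of the reconstructed values. The sketch below uses hypothetical values and function names, and assumes the sample standard deviation is the one intended for standardisation.

```python
import statistics


def deviations(values, reference):
    """Option 2: deviations from an observed or inferred modern value."""
    return [v - reference for v in values]


def centred(values):
    """Option 3: subtract the mean of the reconstructed values."""
    return deviations(values, statistics.mean(values))


def standardised(values):
    """Option 4: centred values divided by their standard deviation."""
    sd = statistics.stdev(values)
    return [v / sd for v in centred(values)]


# Hypothetical reconstructed July temperatures (deg C), youngest first.
reconstructed = [11.9, 12.5, 11.8, 11.0, 10.2]
modern_observed = 11.5            # hypothetical modern value

from_modern = deviations(reconstructed, modern_observed)
relative = centred(reconstructed)
z_scores = standardised(reconstructed)
```

Each series would then be plotted against depth or age, with a smoother added on top if required.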

Page 181: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

LOESS smoother

Page 182: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

THE SECRET ASSUMPTION OF TRANSFER FUNCTIONS

Telford & Birks (2005) Quaternary Science Reviews 24: 2173-2179

Estimating the predictive power and performance of a training set as RMSEP, maximum bias, r2, etc., by cross-validation ASSUMES that the test set (one or many samples) is INDEPENDENT of the training set (The Secret or Totally Ignored Assumption).

Cross-validation in the presence of spatial auto-correlation seriously violates this assumption.

See Richard Telford's lecture after this lecture.

Page 183: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

USE OF ARTIFICIAL, SIMULATED DATA-SETS

SIMULATED DATA-SETS

Generate many training sets (different numbers of samples and taxa, different gradient lengths, vary extent of noise, absences, etc) and evaluation test sets, all under different species response models.
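One way to generate such a training set is sketched below. It assumes a Gaussian (unimodal) response of each taxon along a single gradient with multiplicative noise; the function name, parameter names and default values are hypothetical, and other response models or noise structures could be substituted.

```python
import math
import random


def simulate_training_set(n_sites, n_taxa, gradient=(0.0, 100.0),
                          tolerance=10.0, noise=0.2, seed=1):
    """Simulate abundances from Gaussian responses along one gradient."""
    rng = random.Random(seed)                 # seeded for reproducibility
    lo, hi = gradient
    x = [rng.uniform(lo, hi) for _ in range(n_sites)]        # site values
    optima = [rng.uniform(lo, hi) for _ in range(n_taxa)]    # taxon optima
    Y = []
    for xi in x:
        row = []
        for u in optima:
            # Expected abundance: maximum of 100 at the optimum.
            expected = 100.0 * math.exp(-((xi - u) ** 2) / (2 * tolerance ** 2))
            # Multiplicative noise, truncated so abundances stay >= 0.
            row.append(expected * max(0.0, 1.0 + rng.gauss(0.0, noise)))
        Y.append(row)
    return x, optima, Y


x, optima, Y = simulate_training_set(n_sites=50, n_taxa=20)
```

Varying `n_sites`, `n_taxa`, `gradient`, `tolerance` and `noise`, and generating a second set with a different seed as an evaluation set, gives the kind of controlled comparison the slide describes.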

Page 184: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

1. Probably widespread.

2. Does it matter?

3. Analog-based techniques for reconstruction - YES!

Modern analog technique

Response surfaces 

4. WA and related inverse regression methods

What we need are ‘good’ (i.e. reliable) estimates of ûk. Apply them to same taxa but in no-analog conditions in the past.

Assume that the realised niche parameter ûk is close to the potential or theoretical niche parameter uk*.

WA and WA-PLS are, in reality, additive indicator species approaches rather than strict multivariate analog-based methods.

5. Simulated data

ter Braak, 1995. Chemometrics & Intelligent Laboratory Systems 28, 165–180

NO-ANALOG PROBLEM

Page 185: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

L-shaped climate configuration of samples (circles) in the training set (Table 3), with x the climate variable to be calibrated and z another climate variable. Also indicated are the regions of the samples in evaluation set A and set C.

[Figure: axes X and Z, each running from 0 to 100. Labelled regions: Training set, Set A, Set C, No analogue test set.]

Inverse versus classical methods; method-dependent bias in the leave-one-out error estimate. Comparison of the prediction error of inverse (WA-PLS and k-NN) and classical (MLM) approaches in the training set (t) and the three evaluation sets (B, A and C). Set B is a five-fold replication of t, set A is a subset of t and set C is an extrapolation set. The data are from simulation series 3 of Ref [53] in which species composition is governed by two climate variables (x and z) with an intermediate amount of unimodality (Rx = Rz = 1).

Numbers are geometric means of root mean squared errors of prediction of x in four replications. The coefficient of variation of each mean is ca. 10%. The coefficient of variation of the ratio of 2 means within a column is ca. 15%. The range of x is [0, 100]. The number in parentheses (a superscript in the original table) is the range of the optimal number of components in WA-PLS and of the optimal number of nearest neighbours in k-NN in the four replicates. k-NN uses Eq. (3) & (5).

a Significant difference (P<0.01) between leave-one-out validation and validation by the independent evaluation set B.

Set                     t              B      A      C

Inverse approach
  WA-PLS                2.9 (7-9)      3.0    2.8    5.9
  k-NN                  4.4 (3-5) a    2.5    2.9    13.5

Classical approach
  MLM w.r.t. x          4.3            4.4    3.5    10.6
  MLM w.r.t. x and z    2.8            3.0    2.9    4.6

ter Braak 1995

Page 186: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

6. General conclusions from simulated data experiments 

WA, WA-PLS, Maximum likelihood and MAT all perform poorly and no one method performs consistently better than other methods.

 For strong extrapolation, WA performed best. Appears WA-PLS deteriorates quicker than WA with increasing extrapolation.

 Hutson (1977) – under no-analog conditions WA outperformed inverse regression and PCR.

 Important therefore to assess analog status of fossil samples as well as ‘best’ training set in terms of RMSEP, bias, etc.

 Dynamic training set concept.

 Analogues (say 10–20) for each fossil sample, devise dynamic training set, use linear PLS methods, avoids edge effects, truncated responses, etc.

Page 187: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Fossil assemblage is similar to a number of modern samples that differ widely in their modern environment. Happens in pollen studies with training sets covering Europe, N America and parts of Asia. Major taxa only included, e.g. Pinus pollen may dominate northern, Mediterranean and southern assemblages.

Constrained analog matching – Guiot
e.g. constrain pollen choices on basis of inferred biome, fossil beetles, inferred lake-level changes

Constrained response surfaces (analog matching) – Huntley
e.g. constrain area of search on the basis of inferred biome or plant macrofossils

MULTIPLE ANALOG PROBLEM

Page 188: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Reconstructed range of July temperatures (°C) at La Grande Pile (Vosges, France) from three methods:

a) using beetles alone,

b) using pollen alone,

c) using pollen constrained by beetles

Guiot et al. 1993

Page 189: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Swiss surface pollen samples – lake sediments

Selected trees and shrubs

MULTI-PROXY APPROACHES

Page 190: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Swiss surface lake sediments.

Selected herbs and pteridophytes

Page 191: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Root mean squared errors of prediction (RMSEP) based on leave-one-out jack-knifing cross-validation for mean summer temperature (June, July, August), mean winter temperature (December, January, February) and mean annual precipitation using WA-PLS model.

Variable                     RMSEP       R2     No. of comps.
Mean summer temperature      1.252 °C    0.90   3
Mean winter temperature      1.025 °C    0.88   3
Mean annual precipitation    194.1 mm    0.57   2

Page 192: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Modern Swiss pollen - climate

Page 193: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Gerzensee, Bernese Oberland, Switzerland

Page 194: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 195: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351
Page 196: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Lotter et al. 2000

PB-O

YD-PB Tr

YD

AL-YD Tr

G-O

Gerzensee

Page 197: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Lotter et al. 2000

Pollen

O16/O18

Chydorids

Gerzensee

Page 198: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

Birks & Ammann 2000

Page 199: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

GRADIENT LENGTH (Compositional turnover along environmental gradient – SD units)

Columns: SHORT (<2 SD), LINEAR METHODS; MEDIUM (2-4 SD) and LONG (>4 SD), UNIMODAL-BASED METHODS

NOISE OF TRAINING SET DATA †

VERY LOW
  Short: Least squares linear regression and calibration (inverse regression) GLM
  Medium: Gaussian logit or multinomial regression and calibration GLM
  Long: ? Generalised additive models GAM

LOW
  Short: Partial least squares PLS regression and calibration PLS
  Medium: Weighted averaging PLS regression and calibration WA-PLS
  Long: ? WA-PLS

MEDIUM
  Short: Partial least squares PLS or robust linear regression and calibration
  Medium: Weighted averaging regression and calibration WA
  Long: ? WA or WA-PLS

HIGH
  Short: PCA regression
  Medium: CA-regression
  Long: ? DCA-regression

Overall
  Short: ?
  Medium: IDEAL
  Long: PROBLEMS AT GRADIENT ENDS. TRY TO AVOID, CAUSES MULTIPLE ANALOG PROBLEM?

Page 200: QUANTITATIVE PALAEOECOLOGY Lecture 4. Quantitative Environmental Reconstructions BIO-351

1. Zero values

2. High sample heterogeneity (root mean squared deviation for samples)

3. High taxon tolerances (root mean squared deviation) 

4. Rare taxa

5. % variance in Y explained by X, constrained λ1 relative to unconstrained λ2.

† HOW TO ESTIMATE NOISE IN REAL DATA?