connecting dynamic vegetation models to data – an inverse ......model succession, impacts of...

13
SPECIAL ISSUE Connecting dynamic vegetation models to data – an inverse perspective Florian Hartig 1 *, James Dyke 2 , Thomas Hickler 3 , Steven I. Higgins 4 , Robert B. O’Hara 3 , Simon Scheiter 3 and Andreas Huth 1 INTRODUCTION Plants are essential for life on Earth. Not only is their photosynthetic production the primary energy source for the biosphere, plants also influence the main global biogeochem- ical cycles, structure terrestrial habitats, and interact with other important compartments of the Earth such as soil and atmosphere. Moreover, the global climate is substantially influenced by vegetation through the exchange and accumu- lation of carbon, and through effects on the Earth’s albedo, water balance and evapotranspiration (Bonan, 2008). Under- standing and predicting how climate and other factors such as soil, water and nutrients influence vegetation dynamics and species distributions is therefore a key question in ecology (Grace, 2004). Correlative versus dynamic vegetation models Current approaches to modelling plant species, communities and biomes, particularly in response to climate change, broadly fall into two categories: correlative models on the one hand and dynamic vegetation models (DVMs) on the other (Pereira et al., 2010; Dormann et al., 2012). Correlative modelling approaches, also referred to as climatic envelope models, species distribution models (SDMs) or niche models, use statistical methods to relate environmen- tal variables such as climate or soil conditions to the presence or absence of a species (Elith & Leathwick, 2009). An important factor for the success of these models is the way they make use of empirical data: in correlative species distribution models, parameters and model structure are 1 UFZ – Helmholtz Centre for Environmental Research, 04318 Leipzig, Germany, 2 Institute for Complex Systems Simulation, University of Southampton, Southampton SO17 1BJ, UK, 3 Biodiversity and Climate Research Centre (LOEWE BiK-F), 60325 Frankfurt am Main, Germany, 4 Institut fu ¨r Physische Geographie, Goethe Universita ¨t Frankfurt am Main, 60438 Frankfurt am Main, Germany *Correspondence: Florian Hartig, UFZ – Helmholtz Centre for Environmental Research, Permoserstr. 15, 04318 Leipzig, Germany. E-mail: fl[email protected] ABSTRACT Dynamic vegetation models provide process-based explanations of the dynamics and the distribution of plant ecosystems. They offer significant advantages over static, correlative modelling approaches, particularly for ecosystems that are outside their equilibrium due to global change or climate change. A persistent problem, however, is their parameterization. Parameters and processes of dynamic vegetation models (DVMs) are traditionally determined independently of the model, while model outputs are compared to empirical data for validation and informal model comparison only. But field data for such independent esti- mates of parameters and processes are often difficult to obtain, and the desire to include better descriptions of processes such as biotic interactions, dispersal, phenotypic plasticity and evolution in future vegetation models aggravates lim- itations related to the current parameterization paradigm. In this paper, we discuss the use of Bayesian methods to bridge this gap. We explain how Bayesian methods allow direct estimates of parameters and processes, encoded in prior distributions, to be combined with inverse estimates, encoded in likelihood functions. The combination of direct and inverse estimation of parameters and processes allows a much wider range of vegetation data to be used simultaneously, including vegetation inventories, species traits, species distributions, remote sensing, eddy flux measurements and palaeorecords. The possible reduction of uncertainty regarding structure, parameters and predictions of DVMs may not only foster scientific progress, but will also increase the relevance of these models for policy advice. Keywords Bayesian statistics, calibration, data assimilation, forest models, inverse mod- elling, model selection, parameterization, plant functional types, predictive uncertainty, process-based models. Journal of Biogeography (J. Biogeogr.) (2012) 39, 2240–2252 2240 http://wileyonlinelibrary.com/journal/jbi ª 2012 Blackwell Publishing Ltd doi:10.1111/j.1365-2699.2012.02745.x

Upload: others

Post on 09-Oct-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Connecting dynamic vegetation models to data – an inverse ......model succession, impacts of disturbances, and transient vegetation dynamics that result from changes in environmental

SPECIALISSUE

Connecting dynamic vegetation modelsto data – an inverse perspective

Florian Hartig1*, James Dyke2, Thomas Hickler3, Steven I. Higgins4,

Robert B. O’Hara3, Simon Scheiter3 and Andreas Huth1

INTRODUCTION

Plants are essential for life on Earth. Not only is their

photosynthetic production the primary energy source for the

biosphere, plants also influence the main global biogeochem-

ical cycles, structure terrestrial habitats, and interact with other

important compartments of the Earth such as soil and

atmosphere. Moreover, the global climate is substantially

influenced by vegetation through the exchange and accumu-

lation of carbon, and through effects on the Earth’s albedo,

water balance and evapotranspiration (Bonan, 2008). Under-

standing and predicting how climate and other factors such as

soil, water and nutrients influence vegetation dynamics and

species distributions is therefore a key question in ecology

(Grace, 2004).

Correlative versus dynamic vegetation models

Current approaches to modelling plant species, communities

and biomes, particularly in response to climate change, broadly

fall into two categories: correlative models on the one hand

and dynamic vegetation models (DVMs) on the other (Pereira

et al., 2010; Dormann et al., 2012).

Correlative modelling approaches, also referred to as

climatic envelope models, species distribution models (SDMs)

or niche models, use statistical methods to relate environmen-

tal variables such as climate or soil conditions to the presence

or absence of a species (Elith & Leathwick, 2009). An

important factor for the success of these models is the way

they make use of empirical data: in correlative species

distribution models, parameters and model structure are

1UFZ – Helmholtz Centre for Environmental

Research, 04318 Leipzig, Germany, 2Institute

for Complex Systems Simulation, University of

Southampton, Southampton SO17 1BJ, UK,3Biodiversity and Climate Research Centre

(LOEWE BiK-F), 60325 Frankfurt am Main,

Germany, 4Institut fur Physische Geographie,

Goethe Universitat Frankfurt am Main, 60438

Frankfurt am Main, Germany

*Correspondence: Florian Hartig, UFZ –

Helmholtz Centre for Environmental Research,

Permoserstr. 15, 04318 Leipzig, Germany.

E-mail: [email protected]

ABSTRACT

Dynamic vegetation models provide process-based explanations of the dynamics

and the distribution of plant ecosystems. They offer significant advantages over

static, correlative modelling approaches, particularly for ecosystems that are

outside their equilibrium due to global change or climate change. A persistent

problem, however, is their parameterization. Parameters and processes of

dynamic vegetation models (DVMs) are traditionally determined independently

of the model, while model outputs are compared to empirical data for validation

and informal model comparison only. But field data for such independent esti-

mates of parameters and processes are often difficult to obtain, and the desire to

include better descriptions of processes such as biotic interactions, dispersal,

phenotypic plasticity and evolution in future vegetation models aggravates lim-

itations related to the current parameterization paradigm. In this paper, we

discuss the use of Bayesian methods to bridge this gap. We explain how Bayesian

methods allow direct estimates of parameters and processes, encoded in prior

distributions, to be combined with inverse estimates, encoded in likelihood

functions. The combination of direct and inverse estimation of parameters and

processes allows a much wider range of vegetation data to be used simultaneously,

including vegetation inventories, species traits, species distributions, remote

sensing, eddy flux measurements and palaeorecords. The possible reduction of

uncertainty regarding structure, parameters and predictions of DVMs may not

only foster scientific progress, but will also increase the relevance of these models

for policy advice.

Keywords

Bayesian statistics, calibration, data assimilation, forest models, inverse mod-

elling, model selection, parameterization, plant functional types, predictive

uncertainty, process-based models.

Journal of Biogeography (J. Biogeogr.) (2012) 39, 2240–2252

2240 http://wileyonlinelibrary.com/journal/jbi ª 2012 Blackwell Publishing Ltddoi:10.1111/j.1365-2699.2012.02745.x

Page 2: Connecting dynamic vegetation models to data – an inverse ......model succession, impacts of disturbances, and transient vegetation dynamics that result from changes in environmental

inferred inversely by comparing model predictions with

observed species distribution patterns. It is widely acknowl-

edged that the resulting phenomenological descriptions of the

species’ niche must be interpreted with care (Araujo & Guisan,

2006; Elith & Leathwick, 2009), but the ability of these models

to rapidly convert distribution data into predictive models for

a large range of taxa has made them a widely used tool in

ecology and conservation biology (Zimmermann et al., 2010).

Dynamic vegetation models, on the other hand, provide a

bottom-up description of plant communities by explicitly

modelling physiological and population level processes such as

growth, photosynthesis, carbon allocation, regeneration and

mortality. For many questions that arise in the context of

environmental change, they have considerable advantages

compared to static correlative models. For example, dynamic

models do not need to assume that the vegetation is in

equilibrium with the environment. Thus, they may be used to

model succession, impacts of disturbances, and transient

vegetation dynamics that result from changes in environmental

and climatic conditions (Shugart, 1998; Prentice et al., 2007;

Hickler et al., 2012), as well as reactions to biotic changes or

invasions (Gritti et al., 2006).

Dynamic vegetation models, however, are also associated

with a major practical disadvantage: the model structure and

model parameters are traditionally derived from independent

experiments or empirical observations. We call this a direct

parameterization, as opposed to the inverse parameterization

approach used with correlative models. A direct parameteri-

zation is certainly appealing because it derives model outputs

from basic processes, and allows the validity of these processes

to be tested with independent data. However, the necessity for

direct observations of all parameters also severely limits

complexity in terms of processes, diversity and locations that

can be modelled. Process-based models are therefore often

characterized as being ‘data-hungry’ (Guisan & Thuiller, 2005).

The best of both worlds – combining direct and

inverse data sources with process-based models

The term ‘data-hungry’ suggests that parameterizing DVMs

requires large amounts of data. This may be the case, but the

more pressing problem is the type of data required. Direct

parameterizations require all model parameters (e.g. on growth

or physiology) to be determined by direct measurements or

observations, which are often difficult or laborious to obtain.

On the other hand, many existing data, such as species

distribution databases, eddy flux data or palaeorecords, cannot

be used because they describe variables that are modelled on

the basis of multiple, interacting processes and therefore

depend on many parameters at the same time.

Inverse modelling is the key to using these data. As with

correlative models, inverse modelling methods adjust model

parameters and processes according to their ability to repro-

duce field observations. Moreover, Bayesian methods offer the

possibility of systematically combining information gained by

this inverse modelling process with other independent infor-

mation. There are already some successful applications of these

techniques (e.g. Van Oijen et al., 2005), and we anticipate that

their wider application, particularly on more complex models,

is only a matter of time and computing power. In particular,

we believe that inverse modelling will improve parameteriza-

tions of processes such as local interactions, dispersal, migra-

tion and plant physiology, enable better representations of

functional and species diversity, and allow accommodation of

additional processes such as phenotypic plasticity and evolu-

tion. Inverse modelling may therefore be a crucial ingredient in

making DVMs a viable alternative to correlative species

distribution models for many questions of global change and

conservation.

The aim of this paper is to discuss the use of inverse

modelling with Bayesian statistics as a means to combine direct

and inverse parameterization paradigms for model selection

and parameter estimation. In the following, we first discuss the

properties of DVMs and the types of predictions they make.

Then, we explain the general principles of inferring parameters

and underlying processes by inverse modelling, and discuss

computational and methodological challenges. Finally, we

discuss data types and sources that we consider promising for

constraining parameters and underlying processes of DVMs,

and give some specific hints for including these data in an

inverse modelling process.

DYNAMIC VEGETATION MODELS

We understand a DVM to be any process-based model that

predicts the population dynamics of plant species or plant

functional types over time. A large number of different model

types fall into this category, from dynamic global vegetation

models (DGVMs; see Prentice et al., 2007) to more local

approaches such as forest gap models (see Bugmann, 2001).

The conceptual differences between these model approaches,

however, will not be the focus of this paper (see e.g. Bugmann,

2001; Prentice et al., 2007; Jeltsch et al., 2008).

A central component of most DVMs is the description of a

plant’s response to its environmental conditions (which may

be influenced by competition with other plants, see below) in

terms of processes such as resource uptake, photosynthesis,

respiration, resource allocation, mortality and reproduction

(Fig. 1). This description may be based on differential or

difference equations, and can be provided at different levels of

aggregation, ranging from fully individual-based representa-

tions over age classes up to population-level descriptions

(Smith et al., 2001; Prentice et al., 2007). When sufficient data

are available, all species of interest can be modelled explicitly.

Forest models, for example, often include parameterizations

for 5–30 species (e.g. Bugmann, 1996; ForClim model; Pacala

et al., 1996; SORTIE model; Xiaodong & Shugart, 2005;

FAREAST model; Hickler et al., 2012; LPJ-GUESS model).

For global models and for local models in highly diverse

ecosystems such as grasslands, savannas and tropical forests,

species are typically grouped into plant functional types (PFTs)

that represent a number of species with similar ecological

Dynamic vegetation models – an inverse perspective

Journal of Biogeography 39, 2240–2252 2241ª 2012 Blackwell Publishing Ltd

Page 3: Connecting dynamic vegetation models to data – an inverse ......model succession, impacts of disturbances, and transient vegetation dynamics that result from changes in environmental

properties (Kohler et al., 2000; see also Huth & Ditzer, 2000;

FORMIX model; Sitch et al., 2003; LPJ model; Scheiter &

Higgins, 2009; aDGVM model). It should be noted, however,

that the number of PFTs that are locally co-occurring in global

vegetation models is typically much lower than in their more

local counterparts (e.g. Huth & Ditzer, 2000).

Central to most DVMs are also interactions between plants,

for example through competition for resources such as light,

water and nutrients that are available in a common resource

pool (Fig. 1). These competition processes are major determi-

nants for predicted community composition, non-equilibrium

dynamics and succession. Examples are the dry limit of trees,

which is determined in many DGVMs by competition with

grasses and fire (Sitch et al., 2003; Scheiter & Higgins, 2009),

and successional dynamics in forest gap models, which are

based on vertical light competition between trees of different

sizes (Bugmann, 1996; Huth & Ditzer, 2000). Finally, geo-

graphical distribution and productivity maps can be created by

upscaling model predictions to the scale at which environ-

mental information is available (Fig. 1).

Thus, DVMs take a bottom-up approach to predict from

basic processes the responses of plants to environmental

conditions and competition. There are, however, limitations

to constraining all necessary parameters and processes bottom-

up. For example, despite the fact that many processes in DGVMs

are relatively well understood, small differences in parameter-

ization and model design can propagate to create large

uncertainties (Cramer et al., 2001; Sitch et al., 2008; Galbraith

et al., 2010). These uncertainties are unlikely to have been

reduced by the newer generation of DVMs (Moorcroft et al.,

2001; Smith et al., 2001; Sato et al., 2007). Moreover, the desire

to include additional processes, such as dispersal and migration

(Lischke et al., 2006), more detailed plant physiology (Higgins

et al., 2012), seed competition (Bohn et al., 2011), environ-

mental modulation (Linder et al., 2012) or phenotypic plasticity

and evolution (e.g. Parmesan, 2006; Kramer et al., 2010), is

bound to add to the problem of constraining uncertainty about

parameters and processes. A further issue in global vegetation

models is the practice of summarizing a large number of species

or individuals by an average functional type. The Amazonian

Basin, for example, is home to c. 11,000 tree species (Hubbell

et al., 2008), but many global vegetation models summarize this

diversity by only two functional types. While we concede that it

may not always be necessary to model each species explicitly,

there is substantial evidence that interspecific and intraspecific

diversity in species traits may be important for understanding

and correctly predicting community dynamics (Dıaz & Cabido,

2001; Clark, 2010; McMahon et al., 2011).

To address these challenges, we have to find additional

information to constrain parameters and processes. To that

end, it seems natural to explore the potential of inverse

modelling approaches, which allow for the use of a large range

of data types from different sources and scales that cannot be

used for direct parameter estimation. Modern Bayesian

methods provide the means to combine such inverse param-

eter estimates with any direct parameter estimates that are

available, and thus provide a method to synthesize all available

data sources. In the following section we review the general

technical prerequisites of Bayesian methods for parameter

estimation and model selection. This will be useful for

species1

species 2

Factor 1 Factor 1

Fact

or 2

Fact

or 2

Fundamental niche, growth rate > 0

in isolation

Realized niche, growth rate > 0 in community

LongitudeLa

titu

de

Realized distribution, map of realized niche to

environmental conditions

Individual scale Plot scale Geographical scale

Seeds

Carbonallocation

Upscaling

Projection into geographic space

species 2

species1

species1

species 2

Figure 1 Structure of dynamic vegetation

models and the connection to the niche. At

the core of the model are the properties of the

individual species (physiological,

morphological, phenological, life history and

bioclimatic parameters), which determine the

range of environmental factors at which an

individual plant is able to maintain positive

growth at low population density

(corresponding to its fundamental niche).

Species interactions determine the actual

community composition under given

environmental conditions (corresponding to

its realized niche). Finally, the realized niche

may be mapped into geographical space by

connecting a spatial map of environmental

conditions with the realized niches of the

species.

F. Hartig et al.

2242 Journal of Biogeography 39, 2240–2252ª 2012 Blackwell Publishing Ltd

Page 4: Connecting dynamic vegetation models to data – an inverse ......model succession, impacts of disturbances, and transient vegetation dynamics that result from changes in environmental

understanding our subsequent discussion of promising data

types and their properties.

INFERRING PARAMETERS AND PROCESSES

BY INVERSE MODELLING

Let us assume that we have some empirical observations D that

can be reproduced with a DVM, and that we want to compare

several versions of this model that may differ in the choice of

model parameters (parameter estimation) or in the choice of

processes (model selection) in their ability to reproduce these

data.

The likelihood as the goodness of fit

Clearly, a core question for any inverse methodology is to

define a measure that quantifies how well the predictions of

any particular model or parameterization fit to our observa-

tions. Different approaches have been advocated. A common

method is to define ad hoc some objective function that

measures ‘goodness of fit’, that is, the difference between

predicted and observed data (see e.g. Schroder & Seppelt, 2006).

Usually, this function is subsequently used to find the best fitting

parameter set by optimization. Although useful in some

situations, a disadvantage of the ad hoc choice of objective

functions is their lack of statistical interpretability, which

prevents a clear definition of confidence boundaries and creates

problems when multiple data sources have to be merged into one

single objective function (see Hartig et al., 2011).

An alternative is provided by the likelihood, a statistical

measure of the quality of the fit (e.g. Gelman et al., 2003).

Given that we want to compare different parameterizations Qof a model in terms of their ability to reproduce some

empirical data D, the likelihood p(D|Q) is defined as the

probability of obtaining the observations D given the model

parameterized with Q. One may ask why there should be a

probability of obtaining the data, given that even stochastic

vegetation models often produce very stable outputs and

therefore either reproduce the data or not. An analogy that

some readers may find useful is that of a nonlinear regression: in

such a regression, the aim is to fit a nonlinear function (which

could also be a DVM) that relates a dependent variable (in our

case the structure or dynamics of the vegetation) to an

independent variable (the environment). To this end, a regres-

sion uses a probabilistic model (e.g. that observations are

distributed around the ‘true’ value according to a normal

distribution) to represent additional processes that lead to

unexplained variation in the data (observation error, unob-

served covariates). This error model allows a probability p(D|Q)

(called the likelihood) to be assigned to the observed deviations

between observation and model prediction (residuals).

Thus, the likelihood is given by the probability of obtaining

the observed deviations between model predictions and data,

based on an error model that needs to be defined. This error

model can have a large effect on the results of the inference. It

must therefore be chosen with care to reflect our knowledge

about the processes that lead to variation in system states

(process error) and observations (observation error), as well as

with the actual deviations between model predictions and

observed data after the fit (below).

If we assume that the error is given by an independent,

normally distributed random variable (typically an observation

error), the likelihood is proportional to:

pðDjHÞ / e�½MðHÞ�D�2

2r2 ;

where M(Q) are the model predictions resulting from the

parameterization Q, D the observed data, and r the standard

deviation of the error (see Fig. 2a). Because of its simplicity and

general acceptance, the normal error model is widely used (e.g.

Lasslop et al., 2008 or Higgins et al., 2010), but many other

options are possible (e.g. Fig. 2b, see also Gelman et al., 2003).

When using multiple datasets, likelihoods can often be

combined with relative ease: when deviations between pre-

dicted and observed data are uncorrelated, their corresponding

likelihoods represent independent probabilities, and so their

joint likelihood is simply the product of the individual

likelihoods. However, difficulties arise when deviations

between model predictions and data are correlated within or

across data sets. Luckily, we can build on a large stock of

existing knowledge about how to treat these problems from

statistical models of the vegetation. Dormann et al. (2007), for

example, discuss the problem of specifying error models for

spatially autocorrelated data. Previous work on more complex

(hierarchical) error models that include several layers of

dependent stochastic processes, such as environmental

stochasticity, intraspecific variability, process errors and

observation errors is also of interest in this context (Clark,

e.g. Biomass

Observation Dobs

Model prediction M(Φ)

Likelihood: p(Dobs

|M(Φ))~

exp[-1/(2σ2)(M(Φ)-Dobs

)2]

(a) Normal error model

(b) Bernoulli error model

M(Φ) = pred. occurrence probability π

Dobs

= n observations, k presence

Likelihood:

p(Dobs

|M(Φ)) ~ πk (1-π)(n-k)

Figure 2 Examples of calculating the likelihood of observing Dobs

given the model predictions M(F): (a) a likelihood based on a

normal observation error. (b) A Bernoulli process is one of the

simplest error models for presence/absence observations, given

that the model predicts occurrence probabilities p.

Dynamic vegetation models – an inverse perspective

Journal of Biogeography 39, 2240–2252 2243ª 2012 Blackwell Publishing Ltd

Page 5: Connecting dynamic vegetation models to data – an inverse ......model succession, impacts of disturbances, and transient vegetation dynamics that result from changes in environmental

2005; Schurr et al., 2012). Likelihood functions may be

difficult to formulate for such complicated error structures

within DVMs, but new methods allow approximate likelihood

functions to be constructed directly from the stochastic model

outputs (Hartig et al., 2011). This allows us to go beyond

fitting process-based models with phenomenological error

models towards an approach where dynamic models explain

both the mean effects and the stochasticity in the data through

mechanistic descriptions of the ecological system and the data

acquisition process (Zurell et al., 2009).

Bayes’ theorem

By specifying the likelihood function, we have established a

statistically interpretable measure of fit, which could, for

example, be used to search for the model parameters Q with

the highest likelihood (maximum likelihood estimation).

Often, however, there is additional, independent information

about parameters or processes available, for example from field

data that provide direct parameter estimates. Bayes’ theorem

offers the possibility of merging such independent information

with the inversely generated information that is contained in

the likelihood. The theorem states that

pðHjDÞ ¼ pðDjHÞ � pðHÞpðDÞ

p(Q|D) is called the posterior density, and is interpreted as a

probability density that summarizes our information about the

probable values of Q. The posterior p(Q|D) depends on the

likelihood p(D|Q) (see above), and a new term p(Q), which

will be discussed below (see Gelman et al., 2003 and Stephens

et al., 2007 for more details on the Bayesian approach and its

relation to other measures of statistical inference). The

denominator, p(D), is the integral of the numerator over Q,

and acts only as a normalization constant.

Specifying prior information

The new term, p(Q), is called the ‘prior’. It quantifies the

uncertainty about the model parameters before comparing

model outputs to empirical data. As an example, if measurements

about a parameter such as the specific leaf area (SLA) of a plant

are available, we may summarize this information by a normal

distribution for this parameter (Fig. 3). Also, if a model was

already parameterized with Bayesian methods, and new infor-

mation becomes available, this new information can be merged

with the existing information by using the posterior distribution

from the old data as the prior for the new data (Fig. 4).

The possibility of including prior information about a

parameter in the inference is one of the great advantages of

Bayesian statistics. Often, however, there is a desire to avoid

doing so, either because there is no prior knowledge, or to

avoid the criticism of biasing the posterior in a particular

direction (‘let the data speak’). For this purpose, so called

‘vague’, ‘weak’, ‘non-informative’ or ‘reference’ priors are

used. Intuitively, one might think that the natural choice for

such a prior is the uniform (flat) prior, which associates the

same probability with all parameter values (Fig. 3). However,

it is well known that this is not generally true because it leads

to logical inconsistencies, for example when considering

nonlinear parameter transformations. As a rule of thumb, flat

priors are a reasonable way of expressing the absence of prior

information for parameters that act linearly and uncorrelated

on the model predictions. In other cases, corrections may be

needed (see Kass & Wasserman, 1996).

Checking model assumptions and posterior

diagnostics

A core assumption of all likelihood-based inference is that

deviations between model predictions and observations orig-

inate from a random source of variation that is captured well

by the error model underlying the likelihood function. If there

are structural errors in the model, however, or systematic

measurement errors, these assumptions may not hold. Poster-

ior distributions can then be biased because parameters are

adjusted to compensate for structural or systematic errors.

Detecting such systematic errors in a Bayesian fit, however, is

still helpful because it aids in the identification of problems in

data acquisition and model assumptions.

One diagnostic to check for is the consistency of prior and

posterior values. Typically, one would expect likelihood and

prior distributions to point towards the same range of

parameter values, which will result in a reduction of parameter

uncertainty. However, when there are substantial differences,

posterior uncertainty will remain large, or can even increase

compared to the prior. Although it is possible that such effects

result from pure chance, they may also suggest a structural

problem in the vegetation model, a systematic observation

error in the data, or an error in the specification of the priors.

A second aspect to examine is systematic patterns in the

deviations between model predictions and observed data

(systematic mismatch, correlation, heteroscedasticity) that

are not in accordance with the error model on which the

likelihood was based. Such discrepancies can often be

addressed by correcting the error model. Remaining systematic

Strong / informativeprior

Parameter valueParameter value

Pro

bab

ility

den

sity

Pro

bab

ility

den

sity

Weak / non-informative prior

Figure 3 A conceptual illustration of the difference between a

weak (non-informative) prior on the left, where all parameter

values are equally likely, and a strong (informative) prior on the

right.

F. Hartig et al.

2244 Journal of Biogeography 39, 2240–2252ª 2012 Blackwell Publishing Ltd

Page 6: Connecting dynamic vegetation models to data – an inverse ......model succession, impacts of disturbances, and transient vegetation dynamics that result from changes in environmental

deviations, however, that cannot be sensibly related to a

stochastic process, hint towards structural problems in the

vegetation model or systematic observation errors in the data.

Predictions from posterior parameter estimates

Apart from analysing posterior parameter estimates or using

them as new priors if new data become available (Fig. 4), the

posterior can also be used to estimate predictive uncertainties.

This is usually done by drawing a number of parameter

combinations from the posterior and calculating the

distribution of the model predictions that arises from these

draws.

Comparisons between structurally different models

The cycle of Bayesian parameter estimation, as summarized in

Fig. 4, can also be used to test or improve our knowledge

about underlying processes (e.g. ecophysiology or demogra-

phy) by comparing several structurally different models and

seeing which receives more support from the data. Various

approaches have been advocated, and the choice of method

depends on whether the aim is to find one best model (model

selection), or an ensemble of models, weighted by their model

fit (model averaging, e.g. Butler et al., 2009). For an overview

of these approaches, see the reviews by Johnson & Omland

(2004) and Kadane & Lazar (2004).

Remarks on algorithms and computing time

The estimation of posterior densities can usually not be done

analytically. Therefore, Monte Carlo algorithms such as

importance sampling (IS), Markov chain Monte Carlo

(MCMC) and sequential Monte Carlo (SMC) algorithms are

commonly used. These algorithms generate approximate

posterior densities by evaluating a large number of model

runs with different parameterizations (Andrieu et al., 2003;

Hartig et al., 2011). For most computing languages such as R,

Information independent of the model(prior information)

Comparison of model outputs and observed data

Analysis and applications of the fit

Dynamicvegetation

model

Repeat to add newinformation

Forecasting, quantification of predictive uncertainty

Model selectionand improvement

Measurements

e.g. specificleaf area Stem diameter

Stem diameter

Abu

ndan

ceA

bund

ance

image source: NASA

Error model

Prior parameterestimate Likelihood

of modelpredictionsgiven the observeddata

Posteriorparameterestimate

NPP predicted Forest size structure predicted

Forest size structureobserved

NPP observed

Figure 4 The inverse modelling cycle starts with the specification of prior information (top left). In the next stage, prior probabilities are

multiplied by the likelihood. As a result, we obtain a posterior estimate for parameters or model structure. This posterior estimate may be

used to calculate the predictive uncertainty of the model, to select and improve models, or as an input for a further inverse modelling cycle.

The comparison between model and data shows net primary productivity (NPP) predictions (T. Hickler, unpublished data) from LPJ-

DGVM (Sitch et al., 2003) with hydrology updates by Gerten et al. (2004) as implemented within the generalized ecosystem modelling

framework LPJ-GUESS (Smith et al., 2001) compared with normalized difference vegetation index (NDVI) as proxy for observed NPP from

Terra/MODIS (February 2011), and abundance of trees in different size classes for a tropical mountain forest simulated by the forest model

FORMIND and compared with field data (Dislich et al., 2009).

Dynamic vegetation models – an inverse perspective

Journal of Biogeography 39, 2240–2252 2245ª 2012 Blackwell Publishing Ltd

Page 7: Connecting dynamic vegetation models to data – an inverse ......model succession, impacts of disturbances, and transient vegetation dynamics that result from changes in environmental

MATLAB or PYTHON, there are packages that provide

implementations for the most common algorithms. Also,

there are stand-alone implementations such as OpenBUGS or

JAGS.

A point of practical importance is the number of model

calculations needed for these algorithms: an inverse Bayesian

parameter estimation typically requires about 5 · 104 to

5 · 105 model runs for a model of the order of 20 parameters.

Given that there are limits to the parallelizability particularly of

MCMC algorithms, it follows that a single model run should

not take longer than a minute (a runtime of 1 min for a single

simulation and 105 steps of the algorithm result in a total

runtime of c. 70 days). For models with longer runtimes,

model simplification, model parallelization or model emula-

tion (Conti & O’Hagan, 2010) can be considered.

SELECTING FIELD DATA FOR INVERSE

MODELLING OF VEGETATION MODELS

The final part of this paper is devoted to data and applications.

We discuss what we perceive as the most promising data types

for constraining parameters and processes of DVMs, and argue

that combining multiple data types at different scales provides

a promising general strategy for obtaining strong, independent

data sets.

Data from local vegetation inventories

One traditional source of vegetation data is local vegetation

inventories. Such observations usually describe the vegetation

on sampling plots of up to a few hectares, listing species,

abundance, sizes or biomass, growth and sometimes also the

spatial location of plant individuals. Direct vegetation obser-

vations are comparably abundant for forests [Condit et al.,

2000; Jenkins et al., 2001; ter Steege et al., 2006; see also the

Forest Inventory Analysis program (FIA) and the Center for

Tropical Forest Science (CTFS) forest plot network], but there

are also good datasets available for grasslands (Roscher et al.,

2004), savannas (Sankaran et al., 2005) and from general

databases such as the Global Index of Vegetation-Plot

Databases (see http://www.givd.info/). Additionally, local

observations can be used to generate chronosequences from

stands of different ages (space for time substitution, Pickett,

1989; for an application see Purves et al., 2008). Data are also

accumulating on plant functional traits, for example in the

TRY database (Kattge et al., 2011).

Local vegetation data are particularly useful because they

contain species-specific information about the abundance and

size-structure of the vegetation. Interesting examples of

Bayesian calibrations with local inventory data are provided

by Van Oijen et al. (2005), Higgins et al. (2010), Martınez

et al. (2011) and Weng et al. (2011).

Extending the spatial scale – remote sensing and

distribution data

Local vegetation inventories give a detailed picture of the local

vegetation structure, but they are expensive and so cannot be

extended to continental or global scales. Here, remote sensing

of the vegetation and species distribution databases (Fig. 4)

offer complementary information.

Distribution databases are usually generated from presence

or presence/absence observations as well as from expert

knowledge. Examples are GBIF (http://www.gbif.org/) or the

Flora Europaea database (http://www.faunaeur.org/). Distri-

bution data are the main data source for the inverse calibration

of correlative species distribution models, but they are

currently rarely used for the calibration of DVMs (but see

Purves et al., 2007). One reason for this may be that most

large-scale vegetation models are calibrated for functional

types and not for species. In principle, however, these data

might provide valuable information and could be used with

error functions similar to those used for correlative species

distribution models (see Fig. 2b).

Remote sensing data can be subdivided into two main types:

passive sensors that detect reflected light from the vegetation

(optical, infrared and microwave spectrum), and active sensors

that record the backscattering from either a laser (lidar = light

detection and ranging) or a radar signal (see Table 1;

Chambers et al., 2007; Frolking et al., 2009). Remote sensing

Table 1 Overview of remote sensing technologies and their application.

Type Notes, products, references

Passive Sensors such as Landsat or MODIS (Moderate Resolution Imaging Spectroradiometer) observe the visible, infrared and

microwave spectrum; allows estimation of leaf area via the normalized difference vegetation index, NDVI

(see, e.g. Zhang et al., 2003), as well as water content and temperature profiles of canopies (infrared spectrum,

Fensholt et al., 2010) and soil moisture and temperature (microwave spectrum, Kerr, 2007) and even chemical and

taxonomic diversity (Asner & Martin, 2009).

Laser (Lidar) A lidar device creates a laser beam and measures the reflection created by all optically reflecting material, i.e. ground,

leaves etc. May be used to estimate vegetation height and leaf area index, LAI (Asner et al., 2010; Lefsky, 2010).

Radar A radar device creates an electromagnetic signal, typically radio waves or microwaves, and measures reflections of

this signal. Depending on the wavelength, reflections are triggered particularly by the ‘massive’ parts of the vegetation,

i.e. branches and trunks. Used to estimate aboveground biomass, height/vertical structure of the vegetation (Krieger

et al., 2010; Kohler & Huth, 2010), also soil moisture (Kerr, 2007) and temperature.

F. Hartig et al.

2246 Journal of Biogeography 39, 2240–2252ª 2012 Blackwell Publishing Ltd

Page 8: Connecting dynamic vegetation models to data – an inverse ......model succession, impacts of disturbances, and transient vegetation dynamics that result from changes in environmental

techniques have undergone substantial progress over the past

few decades, so that global maps and time series of vegetation

types, leaf area and productivity are becoming increasingly

available. For all these reasons, a number of authors have

recently stressed the potential of these data for vegetation

modelling, in particular for forest models (Frolking et al.,

2009; Hurtt et al., 2010; Kohler & Huth, 2010).

Error models for remote sensing data will depend on the

variables and the resolution of the data. One obvious issue is that

remote sensing data can be obtained at very high resolutions

(below a metre), which can result in strong spatial autocorre-

lation. To avoid problems associated with this, one can consider

upscaling the data to match the spatial scale of the model, and

one should account for remaining strong residual autocorrelation

(Dormann et al., 2007). In some cases, however, information

may be contained in the small-scale variation itself: Kohler &

Huth (2010), for example, suggest that the local variability of

biomass values contains additional information that may be

used to infer the degree of human forest use (logging). Another

interesting example comes from Patenaude et al. (2008) who

fitted a model with a combination of hyperspectral, lidar and

radar measurements, as well as with additional local stand data.

Extending the temporal scale: eddy flux data and

palaeorecords

In addition to the spatial dimension, a second dimension along

which we can extend our empirical baseline is provided by

observations over time-scales that are not covered by the

aforementioned data types. Here, we discuss two particularly

interesting examples, namely eddy flux measurements and

palaeorecords (Fig. 4).

An eddy flux experiment typically consists of recording

wind, temperature and gas concentrations at different heights

above a vegetated surface. From these data, one can estimate

the heat, water, and gas exchange of the vegetation at scales up

to 30 m. The number of eddy flux experiments has increased

significantly in recent years. Most of these are organized in

FLUXNET (Baldocchi et al., 2001), a network of over 500 eddy

flux sited all over the world.

Eddy flux data are particularly useful for estimating

parameters related to processes that act on short time-scales,

such as photosynthesis or evapotranspiration. Like remote

sensing data, eddy flux data can be measured at a very high

resolution (hours or minutes), but it is important to check

whether this resolution fits to the scale of the model that is

used. When errors originate predominantly from processes

that change slowly compared to the measurement frequency,

observations may show strong temporal autocorrelation.

Hence, either the data have to be summarized at a coarser

temporal scale, or the temporal autocorrelation must be

accounted for in the error model. Friend et al. (2007) and

Williams et al. (2009) give an overview of inverse modelling

with FLUXNET data, partly also coupled with remotely sensed

information (see also: Hanson et al., 2004; Braswell et al.,

2005; Knorr & Kattge, 2005; Medvigy et al., 2009).

Finally, an exciting possibility is to compare vegetation

models to palaeodata. Pollen data, for example, can be used to

reconstruct vegetation composition over long time-scales. At

the same time, global climate reconstructions are available that

may be used as inputs for vegetation models (Allen et al.,

2010). Together, these data might prove invaluable for

inferring processes that are rare, slow or difficult to test, such

as long-distance dispersal abilities of plants, phenotypic

plasticity and rapid evolution or large-scale vegetation

responses to elevated CO2 levels (Miller et al., 2008; Prentice

& Harrison, 2009; McMahon et al., 2011). We stress, however,

that large uncertainties exist due to limited dating accuracy

and spatial resolution, taxonomic uncertainty, unknown

disturbance events, and large differences between species in

pollen productivity (e.g. Sugita et al., 1999). Nevertheless,

palaeodata provide important information on a time-scale

unmatched by other data.

The need to use multiple data types at a time

Which of these many different data types should we use? When

deciding on field data for inverse modelling, we have two aims.

The first aim is to find data that are informative about the

parameters and processes of interest. When a model output is

independent of a subset of the parameters, we cannot expect

field data for this output to be informative about these

parameters. A global sensitivity analysis can be useful for

identifying the relevant parameters for the available data, or

conversely, to identify the data that are needed to fit a

particular parameter subset or submodel.

The second aim relates to the problem of correlation. In

principle, it is always better to have more data, but there is one

caveat: as mentioned before, likelihoods can most easily be

combined when the underlying data are uncorrelated. Combin-

ing strongly correlated data requires more complex error

Figure 5 Complementarity of spatial and temporal scales within

the data sources discussed.

Dynamic vegetation models – an inverse perspective

Journal of Biogeography 39, 2240–2252 2247ª 2012 Blackwell Publishing Ltd

Page 9: Connecting dynamic vegetation models to data – an inverse ......model succession, impacts of disturbances, and transient vegetation dynamics that result from changes in environmental

models, which may increase the computational burden and lead

to instabilities in inference if the correlation structure is not

captured adequately. Thus, information gains have to be

weighed up against potential problems that come with new data.

Correlations, however, may not only be influenced by the

types of data that are chosen, but also by the way the data are

aggregated and represented. Data can be transformed, changed

in scale (up/downscaling) or summarized. A sensible choice

here may considerably reduce correlations. Thus, we believe

that independence of data is a technical problem, but we still

recommend considering the full spectrum of vegetation data

for inference. In particular, we recommend including data that

are measured at different scales (Fig. 5) because errors will

tend to be less correlated and information more complemen-

tary than for data obtained at the same scale (Grimm &

Railsback, 2012). Using all available data sources will not only

facilitate a synthesis of our empirical information as a product

that is more useful for ecological theory and prediction, it will

also facilitate the further development of DVMs by systemat-

ically assessing their ability to match multiple empirical

observations at the same time.

CONCLUSIONS

We have discussed the principles of the Bayesian modelling

approach for inferring parameters and structure of DVMs.

Inverse parameterization of DVMs resembles the correlative

species distribution modelling approach. The importance of

prior knowledge about parameters, and also about model

structure, however, will remain an area where DVMs differ

significantly from correlative modelling approaches. We

therefore think that inverse modelling methods will not, as

one might fear, reduce DVMs to merely a ‘very complicated’

version of a correlative model that is blindly adjusted to data.

Rather, Bayesian methods offer a state-of-the-art technique

that allows us to use all available data (on model inputs and

model outputs) to improve our knowledge about the

processes that govern vegetation dynamics, and hence our

ability to predict the future development of the Earth’s

vegetation.

The additional information that can be gained by applying

inverse modelling methods to heterogeneous data types such as

species distributions, local vegetation inventories, eddy flux

measurements, and remote sensing data offers tremendous

scientific potential (see also Luo et al., 2011). Using many or

all of these data in parallel will allow us to build models that

provide a better representation of the dynamics and functional

diversity of the terrestrial vegetation. Bayesian methods will

also improve our knowledge about the uncertainty of model

predictions, which is an important factor for informing policy.

The benefits of such a route lie not only in the potential to

generate substantially better predictions, making DVMs more

relevant for applied questions, but also in the ability to test

ecological theory with DVMs at different temporal and spatial

scales, which allows fundamental questions of evolution,

biogeography and community ecology to be addressed. For

all these reasons, we anticipate a further growth in the

popularity of Bayesian methods for inferring parameters and

processes of DVMs.

ACKNOWLEDGEMENTS

We would like to thank Christine Romermann, Steve Higgins

and Bob O’Hara for organizing the workshop from which

this paper originated, as well as the LOEWE initiative (see

below) for funding this event. We are grateful for comments

by Olga Bykova, Anne Carney, Carsten Dormann, Martin

Kazmierczak, Peter Linder, Glenn Marion, Peter Pearman,

Christopher Reyer, Stan Schymanski, Robert Whittaker and

three anonymous referees, which greatly improved this

manuscript. F.H. acknowledges support from ERC advanced

grant 233066. Funding for R.B.O’H. and S.S was provided by

the Biodiversity and Climate Research Centre (BiK-F), which

is part of the LOEWE programme ‘Landes-Offensive zur

Entwicklung Wissenschaftlich-okonomischer Exzellenz’ of

Hesse’s Ministry of Higher Education, Research and the

Arts. J.D. acknowledges support from the Helmholtz

Association through the research alliance ‘Planetary Evolution

and Life’.

REFERENCES

Allen, J.R., Hickler, T., Singarayer, J.S., Sykes, M.T., Valdes,

P.J. & Huntley, B. (2010) Last glacial vegetation of

northern Eurasia. Quaternary Science Reviews, 29, 2604–

2618.

Andrieu, C., de Freitas, N., Doucet, A. & Jordan, M.I. (2003)

An introduction to MCMC for machine learning. Machine

Learning, 50, 5–43.

Araujo, M.B. & Guisan, A. (2006) Five (or so) challenges for

species distribution modelling. Journal of Biogeography, 33,

1677–1688.

Asner, G.P. & Martin, R.E. (2009) Airborne spectranomics:

mapping canopy chemical and taxonomic diversity in

tropical forests. Frontiers in Ecology and the Environment, 7,

269–276.

Asner, G.P., Powell, G.V.N., Mascaro, J., Knapp, D.E., Clark,

J.K., Jacobson, J., Kennedy-Bowdoin, T., Balaji, A.,

Paez-Acosta, G., Victoria, E., Secada, L., Valqui, M. &

Hughes, R.F. (2010) High-resolution forest carbon

stocks and emissions in the Amazon. Proceedings of the

National Academy of Sciences of Sciences USA, 107, 16738–

16742.

Baldocchi, D., Falge, E., Gu, L. et al. (2001) FLUXNET: a new

tool to study the temporal and spatial variability of eco-

system-scale carbon dioxide, water vapor, and energy flux

densities. Bulletin of the American Meteorological Society, 82,

2415–2434.

Bohn, K., Dyke, J.G., Pavlick, R., Reineking, B. & Reu, B.

(2011) The relative importance of seed competition,

resource competition and perturbations on community

structure. Biogeosciences, 8, 1107–1120.

F. Hartig et al.

2248 Journal of Biogeography 39, 2240–2252ª 2012 Blackwell Publishing Ltd

Page 10: Connecting dynamic vegetation models to data – an inverse ......model succession, impacts of disturbances, and transient vegetation dynamics that result from changes in environmental

Bonan, G.B. (2008) Forests and climate change: forcings,

feedbacks, and the climate benefits of forests. Science, 320,

1444–1449.

Braswell, B.H., Sacks, W.J., Linder, E. & Schimel, D.S. (2005)

Estimating diurnal to annual ecosystem parameters by

synthesis of a carbon flux model with eddy covariance net

ecosystem exchange observations. Global Change Biology, 11,

335–355.

Bugmann, H. (2001) A review of forest gap models. Climatic

Change, 51, 259–305.

Bugmann, H.K.M. (1996) A simplified forest model to study

species composition along climate gradients. Ecology, 77,

2055–2074.

Butler, A., Doherty, R.M. & Marion, G. (2009) Model aver-

aging to combine simulations of future global vegetation

carbon stocks. Environmetrics, 20, 791–811.

Chambers, J.Q., Asner, G.P., Morton, D.C., Anderson, L.O.,

Saatchi, S.S., Espirito-Santo, F.D.B., Palace, M. & Souza, C.,

Jr (2007) Regional ecosystem structure and function: eco-

logical insights from remote sensing of tropical forests.

Trends in Ecology and Evolution, 22, 414–423.

Clark, J.S. (2005) Why environmental scientists are becoming

Bayesians. Ecology Letters, 8, 2–14.

Clark, J.S. (2010) Individuals and the variation needed for high

species diversity in forest trees. Science, 327, 1129–1132.

Condit, R., Ashton, P.S., Baker, P., Bunyavejchewin, S., Guna-

tilleke, S., Gunatilleke, N., Hubbell, S.P., Foster, R.B., Itoh, A.,

LaFrankie, J.V., Lee, H.S., Losos, E., Manokaran, N., Sukumar,

R. & Yamakura, T. (2000) Spatial patterns in the distribution

of tropical tree species. Science, 288, 1414–1418.

Conti, S. & O’Hagan, A. (2010) Bayesian emulation of com-

plex multi-output and dynamic computer models. Journal of

Statistical Planning and Inference, 140, 640–651.

Cramer, W., Bondeau, A., Woodward, F.I., Prentice, I.C.,

Betts, R.A., Brovkin, V., Cox, P.M., Fisher, V., Foley, J.A.,

Friend, A.D., Kucharik, C., Lomas, M.R., Ramankutty, N.,

Sitch, S., Smith, B., White, A. & Young-Molling, C. (2001)

Global response of terrestrial ecosystem structure and

function to CO2 and climate change: results from six

dynamic global vegetation models. Global Change Biology, 7,

357–373.

Dıaz, S. & Cabido, M. (2001) Vive la difference: plant func-

tional diversity matters to ecosystem processes. Trends in

Ecology and Evolution, 16, 646–655.

Dislich, C., Gunter, S., Homeier, J., Schroder, B. & Huth, A.

(2009) Simulating forest dynamics of a tropical montane

forest in South Ecuador. Erdkunde, 63, 347–364.

Dormann, C.F., McPherson, J.M., Araujo, M.B., Bivand, R.,

Bolliger, J., Carl, G., Davies, R.G., Hirzel, A., Jetz, W., Kis-

sling, D., Kuhn, I., Ohlemuller, R., Peres-Neto, P.R.,

Reineking, B., Schroder, B., Schurr, F.M. & Wilson, R.

(2007) Methods to account for spatial autocorrelation in the

analysis of species distributional data: a review. Ecography,

30, 609–628.

Dormann, C.F., Schymanski, S.J., Cabral, J., Chuine, I.,

Graham, C., Hartig, F., Kearney, M., Morin, X., Romer-

mann, C., Schroder, B. & Singer, A. (2012) Correlation and

process in species distribution models: bridging a dichot-

omy. Journal of Biogeography, 39, 2119–2131.

Elith, J. & Leathwick, J.R. (2009) Species distribution models:

ecological explanation and prediction across space and time.

Annual Review of Ecology, Evolution, and Systematics, 40,

677–697.

Fensholt, R., Huber, S., Proud, S.R. & Mbow, C. (2010)

Detecting canopy water status using shortwave infrared

reflectance data from polar orbiting and geostationary

platforms. IEEE Journal of Selected Topics in Applied Earth

Observations and Remote Sensing, 3, 271–285.

Friend, A.D., Arneth, A., Kiang, N.Y., Lomas, M., Ogee, J.,

Rodenbeck, C., Running, S.W., Santaren, J.-D., Sitch, S.,

Viovy, N., Woodward, F.I. & Zaehle, S. (2007) FLUXNET

and modelling the global carbon cycle. Global Change Biol-

ogy, 13, 610–633.

Frolking, S., Palace, M.W., Clark, D.B., Chambers, J.Q., Shu-

gart, H.H. & Hurtt, G.C. (2009) Forest disturbance and

recovery: a general review in the context of spaceborne re-

mote sensing of impacts on aboveground biomass and

canopy structure. Journal of Geophysical Research, 114,

G00E02.

Galbraith, D., Levy, P.E., Sitch, S., Huntingford, C., Cox, P.,

Williams, M. & Meir, P. (2010) Multiple mechanisms of

Amazonian forest biomass losses in three dynamic global

vegetation models under climate change. New Phytologist,

187, 647–665.

Gelman, A., Carlin, J.B., Stern, H.S. & Rubin, D.B. (2003)

Bayesian data analysis, 2nd edn. Chapman & Hall/CRC,

Boca Raton, FL.

Gerten, D., Schaphoff, S., Haberlandt, U., Lucht, W. & Sitch, S.

(2004) Terrestrial vegetation and water balance – hydro-

logical evaluation of a dynamic global vegetation model.

Journal of Hydrology, 286, 249–270.

Grace, J. (2004) Understanding and managing the global car-

bon cycle. Journal of Ecology, 92, 189–202.

Grimm, V. & Railsback, S.F. (2012) Pattern-oriented model-

ling: a ‘multi-scope’ for predictive systems ecology. Philo-

sophical Transactions of the Royal Society B: Biological

Sciences, 367, 298–310.

Gritti, E.S., Smith, B. & Sykes, M.T. (2006) Vulnerability of

Mediterranean Basin ecosystems to climate change and

invasion by exotic plant species. Journal of Biogeography, 33,

145–157.

Guisan, A. & Thuiller, W. (2005) Predicting species distribu-

tion: offering more than simple habitat models. Ecology

Letters, 8, 993–1009.

Hanson, P.J., Amthor, J.S., Wullschleger, S.D., Wilson, K.B.,

Grant, R.F., Hartley, A., Hui, D., Hunt, E.R., Jr, Johnson,

D.W., Kimball, J.S., King, A.W., Luo, Y., McNulty, G., Sun,

G., Thornton, P.E., Wang, S., Williams, M., Baldocchi, D.D.

& Cushman, R.M. (2004) Oak forest carbon and water

simulations: model intercomparisons and evaluations

against independent data. Ecological Monographs, 74, 443–

489.

Dynamic vegetation models – an inverse perspective

Journal of Biogeography 39, 2240–2252 2249ª 2012 Blackwell Publishing Ltd

Page 11: Connecting dynamic vegetation models to data – an inverse ......model succession, impacts of disturbances, and transient vegetation dynamics that result from changes in environmental

Hartig, F., Calabrese, J.M., Reineking, B., Wiegand, T. & Huth,

A. (2011) Statistical inference for stochastic simulation

models – theory and application. Ecology Letters, 14, 816–

827.

Hickler, T., Vohland, K., Feehan, J., Miller, P.A., Smith, B.,

Costa, L., Giesecke, T., Fronzek, S., Carter, T., Cramer, W.,

Kuhn, I. & Sykes, M.T. (2012) Projecting the future distri-

bution of European potential natural vegetation zones with

a generalized, tree species-based dynamic vegetation model.

Global Ecology and Biogeography, 21, 50–63.

Higgins, S.I., Scheiter, S. & Sankaran, M. (2010) The stability

of African savannas: insights from the indirect estimation

of the parameters of a dynamic model. Ecology, 91, 1682–

1692.

Higgins, S.I., O’Hara, R.B., Bykova, O., Cramer, M.D., Chuine,

I., Gerstner, E.-M., Hickler, T., Morin, X., Kearney, M.R.,

Midgley, G.F. & Scheiter, S. (2012) A physiological analogy

of the niche for projecting the potential distribution of

plants. Journal of Biogeography, 39, 2132–2145.

Hubbell, S.P., He, F., Condit, R., Borda-de-Agua, L., Kellner, J.

& ter Steege, H. (2008) How many tree species are there in

the Amazon and how many of them will go extinct? Pro-

ceedings of the National Academy of Sciences USA, 105,

11498–11504.

Hurtt, G.C., Fisk, J., Thomas, R.Q., Dubayah, R., Moorcroft,

P.R. & Shugart, H.H. (2010) Linking models and data on

vegetation structure. Journal of Geophysical Research, 115,

G00E10.

Huth, A. & Ditzer, T. (2000) Simulation of the growth of a

lowland Dipterocarp rain forest with FORMIX3. Ecological

Modelling, 134, 1–25.

Jeltsch, F., Moloney, K.A., Schurr, F.M., Kochy, M. & Schwa-

ger, M. (2008) The state of plant population modelling in

light of environmental change. Perspectives in Plant Ecology,

Evolution and Systematics, 9, 171–189.

Jenkins, J.C., Birdsey, R.A. & Pan, Y. (2001) Biomass and NPP

estimation for the mid-Atlantic region (USA) using plot-

level forest inventory data. Ecological Applications, 11, 1174–

1193.

Johnson, J.B. & Omland, K.S. (2004) Model selection in

ecology and evolution. Trends in Ecology and Evolution, 19,

101–108.

Kadane, J.B. & Lazar, N.A. (2004) Methods and criteria for

model selection. Journal of the American Statistical Associa-

tion, 99, 279–290.

Kass, R.E. & Wasserman, L. (1996) The selection of prior

distributions by formal rules. Journal of the American Sta-

tistical Association, 91, 1343–1370.

Kattge, J., Dıaz, S., Lavorel, S. et al. (2011) TRY – a global

database of plant traits. Global Change Biology, 17, 2905–2935.

Kerr, Y.H. (2007) Soil moisture from space: where are we?

Hydrogeology Journal, 15, 117–120.

Knorr, W. & Kattge, J. (2005) Inversion of terrestrial ecosystem

model parameter values against eddy covariance measure-

ments by Monte Carlo sampling. Global Change Biology, 11,

1333–1351.

Kohler, P. & Huth, A. (2010) Towards ground-truthing

of spaceborne estimates of above-ground life biomass and

leaf area index in tropical rain forests. Biogeosciences, 7,

2531–2543.

Kohler, P., Ditzer, T. & Huth, A. (2000) Concepts for the

aggregation of tropical tree species into functional types and

the application to Sabah’s lowland rain forests. Journal of

Tropical Ecology, 16, 591–602.

Kramer, K., Degen, B., Buschbom, J., Hickler, T., Thuiller, W.,

Sykes, M.T. & de Winter, W. (2010) Modelling exploration

of the future of European beech (Fagus sylvatica L.) under

climate change – range, abundance, genetic diversity and

adaptive response. Forest Ecology and Management, 259,

2213–2222.

Krieger, G., Hajnsek, I., Papathanassiou, K.P., Younis, M. &

Moreira, A. (2010) Interferometric synthetic aperture radar

(SAR) missions employing formation flying. Proceedings of

the IEEE, 98, 816–843.

Lasslop, G., Reichstein, M., Kattge, J. & Papale, D. (2008)

Influences of observation errors in eddy flux data on inverse

model parameter estimation. Biogeosciences, 5, 1311–1324.

Lefsky, M.A. (2010) A global forest canopy height map from

the Moderate Resolution Imaging Spectroradiometer and

the Geoscience Laser Altimeter System. Geophysical Research

Letters, 37, L15401.

Linder, H.P., Bykova, O., Dyke, J., Etienne, R.S., Hickler, T.,

Kuhn, I., Marion, G., Ohlemuller, R., Schymanski, S.J. &

Singer, A. (2012) Biotic modifiers, environmental modula-

tion and species distribution models. Journal of Biogeogra-

phy, 39, 2179–2190.

Lischke, H., Zimmermann, N.E., Bolliger, J., Rickebusch, S. &

Loffler, T.J. (2006) TreeMig: a forest-landscape model for

simulating spatio-temporal patterns from stand to landscape

scale. Ecological Modelling, 199, 409–420.

Luo, Y., Ogle, K., Tucker, C., Fei, S., Gao, C., LaDeau, S., Clark,

J.S. & Schimel, D.S. (2011) Ecological forecasting and data

assimilation in a data-rich era. Ecological Applications, 21,

1429–1442.

Martınez, I., Wiegand, T., Camarero, J.J., Batllori, E. & Gut-

ierrez, E. (2011) Disentangling the formation of contrasting

tree line physiognomies combining model selection and

Bayesian parameterization for simulation models. The

American Naturalist, 5, E136–E152.

McMahon, S.M., Harrison, S.P., Armbruster, W.S., Bartlein,

P.J., Beale, C.M., Edwards, M.E., Kattge, J., Midgley, G.,

Morin, X. & Prentice, I.C. (2011) Improving assessment

and modelling of climate change impacts on global terres-

trial biodiversity. Trends in Ecology and Evolution, 26, 249–

259.

Medvigy, D., Wofsy, S.C., Munger, J.W., Hollinger, D.Y. &

Moorcroft, P.R. (2009) Mechanistic scaling of ecosystem

function and dynamics in space and time: Ecosystem

Demography model version 2. Journal of Geophysical Re-

search, 114, G01002.

Miller, P.A., Giesecke, T., Hickler, T., Bradshaw, R.H.W., Smith,

B., Seppa, H., Valdes, P.J. & Sykes, M.T. (2008) Exploring

F. Hartig et al.

2250 Journal of Biogeography 39, 2240–2252ª 2012 Blackwell Publishing Ltd

Page 12: Connecting dynamic vegetation models to data – an inverse ......model succession, impacts of disturbances, and transient vegetation dynamics that result from changes in environmental

climatic and biotic controls on Holocene vegetation change in

Fennoscandia. Journal of Ecology, 96, 247–259.

Moorcroft, P.R., Hurtt, G.C. & Pacala, S.W. (2001) A method

for scaling vegetation dynamics: the ecosystem demography

model (ED). Ecological Monographs, 71, 557–586.

Pacala, S.W., Canham, C.D., Saponara, J., Silander, J.A., Kobe,

R.K. & Ribbens, E. (1996) Forest models defined by field

measurements: estimation, error analysis and dynamics.

Ecological Monographs, 66, 1–43.

Parmesan, C. (2006) Ecological and evolutionary responses to

recent climate change. Annual Review of Ecology, Evolution,

and Systematics, 37, 637–669.

Patenaude, G., Milne, R., Van Oijen, M., Rowland, C. & Hill,

R. (2008) Integrating remote sensing datasets into ecological

modelling: a Bayesian approach. International Journal of

Remote Sensing, 29, 1295–1315.

Pereira, H.M., Leadley, P.W., Proenca, V. et al. (2010) Sce-

narios for global biodiversity in the 21st century. Science,

330, 1496–1501.

Pickett, S.T.A. (1989) Space-for-time substitution as an alter-

native to long-term studies. Long-term studies in ecology:

approaches and alternatives (ed. by G.E. Likens), pp. 110–

135. Springer-Verlag, New York.

Prentice, I.C. & Harrison, S.P. (2009) Ecosystem effects of CO2

concentration: evidence from past climates. Climate of the

Past, 5, 297–307.

Prentice, I.C., Bondeau, A., Cramer, W., Harrison, S.P., Hic-

kler, T., Lucht, W., Sitch, S., Smith, B. & Sykes, M.T. (2007)

Dynamic global vegetation modeling: quantifying terrestrial

ecosystem responses to large-scale environmental change.

Terrestrial ecosystems in a changing world (ed. by J.G.

Canadell, D.E. Pataki and L.F. Pitelka), pp. 175–192.

Springer-Verlag, New York.

Purves, D.W., Zavala, M.A., Ogle, K., Prieto, F. & Benayas,

J.M.R. (2007) Environmental heterogeneity, bird-mediated

directed dispersal, and oak woodland dynamics in Medi-

terranean Spain. Ecological Monographs, 77, 77–97.

Purves, D.W., Lichstein, J.W., Strigul, N. & Pacala, S.W. (2008)

Predicting and understanding forest dynamics using a sim-

ple tractable model. Proceedings of the National Academy of

Sciences USA, 105, 17018–17022.

Roscher, C., Schumacher, J., Baade, J., Wilcke, W., Gleixner,

G., Weisser, W.W., Schmid, B. & Schulze, E.-D. (2004) The

role of biodiversity for element cycling and trophic inter-

actions: an experimental approach in a grassland commu-

nity. Basic and Applied Ecology, 5, 107–121.

Sankaran, M., Hanan, N.P., Scholes, R.J. et al. (2005) Deter-

minants of woody cover in African savannas. Nature, 438,

846–849.

Sato, H., Itoh, A. & Kohyama, T. (2007) SEIB-DGVM: a

new Dynamic Global Vegetation Model using a spatially

explicit individual-based approach. Ecological Modelling,

200, 279–307.

Scheiter, S. & Higgins, S.I. (2009) Impacts of climate change on

the vegetation of Africa: an adaptive dynamic vegetation

modelling approach. Global Change Biology, 15, 2224–2246.

Schroder, B. & Seppelt, R. (2006) Analysis of pattern–process

interactions based on landscape models – overview, general

concepts, and methodological issues. Ecological Modelling,

199, 505–516.

Schurr, F.M., Pagel, J., Cabral, J.S., Groeneveld, J., Bykova, O.,

O’Hara, R.B., Hartig, F., Kissling, W.D., Linder, H.P.,

Midgley, G.F., Schroder, B., Singer, A. & Zimmermann, N.E.

(2012) How to understand species’ niches and range

dynamics: a demographic research agenda for biogeography.

Journal of Biogeography, 39, 2146–2162.

Shugart, H.H. (1998) Terrestrial ecosystems in changing envi-

ronments. Cambridge University Press, Cambridge.

Sitch, S., Smith, B., Prentice, I.C., Arneth, A., Bondeau, A.,

Cramer, W., Kaplan, J.O., Levis, S., Lucht, W., Sykes, M.T.,

Thonicke, K. & Venevsky, S. (2003) Evaluation of ecosystem

dynamics, plant geography and terrestrial carbon cycling in

the LPJ dynamic global vegetation model. Global Change

Biology, 9, 161–185.

Sitch, S., Huntingford, C., Gedney, N., Levy, P.E., Lomas, M.,

Piao, S.L., Betts, R., Ciais, P., Cox, P., Friedlingstein, P.,

Jones, C.D., Prentice, I.C. & Woodward, F.I. (2008) Evalu-

ation of the terrestrial carbon cycle, future plant geography

and climate-carbon cycle feedbacks using five Dynamic

Global Vegetation Models (DGVMs). Global Change Biology,

14, 2015–2039.

Smith, B., Prentice, I.C. & Sykes, M.T. (2001) Representation

of vegetation dynamics in the modelling of terrestrial eco-

systems: comparing two contrasting approaches within

European climate space. Global Ecology and Biogeography,

10, 621–637.

ter Steege, H., Pitman, N.C.A., Phillips, O.L., Chave, J.,

Sabatier, D., Duque, A., Molino, J.-F., Prevost, M.-F.,

Spichiger, R., Castellanos, H., von Hildebrand, P. & Vas-

quez, R. (2006) Continental-scale patterns of canopy tree

composition and function across Amazonia. Nature, 443,

444–447.

Stephens, P.A., Buskirk, S.W. & del Rio, C.M. (2007) Inference

in ecology and evolution. Trends in Ecology and Evolution,

22, 192–197.

Sugita, S., Gaillard, M.-J. & Brostrom, A. (1999) Landscape

openness and pollen records: a simulation approach. The

Holocene, 9, 409–421.

Van Oijen, M., Rougier, J. & Smith, R. (2005) Bayesian

calibration of process-based forest models: bridging the

gap between models and data. Tree Physiology, 25,

915–927.

Weng, E., Luo, Y., Gao, C. & Oren, R. (2011) Uncertainty

analysis of forest carbon sink forecast with varying mea-

surement errors: a data assimilation approach. Journal of

Plant Ecology, 4, 178–191.

Williams, M., Richardson, A.D., Reichstein, M., Stoy, P.C.,

Peylin, P., Verbeeck, H., Carvalhais, N., Jung, M., Hollinger,

D.Y., Kattge, J., Leuning, R., Luo, Y., Tomelleri, E.,

Trudinger, C.M. & Wang, Y.P. (2009) Improving land

surface models with FLUXNET data. Biogeosciences, 6,

1341–1359.

Dynamic vegetation models – an inverse perspective

Journal of Biogeography 39, 2240–2252 2251ª 2012 Blackwell Publishing Ltd

Page 13: Connecting dynamic vegetation models to data – an inverse ......model succession, impacts of disturbances, and transient vegetation dynamics that result from changes in environmental

Xiaodong, Y. & Shugart, H.H. (2005) FAREAST: a forest gap

model to simulate dynamics and patterns of eastern Eur-

asian forests. Journal of Biogeography, 32, 1641–1658.

Zhang, X., Friedl, M.A., Schaaf, C.B., Strahler, A.H., Hodges,

J.C.F., Gao, F., Reed, B.C. & Huete, A. (2003) Monitoring

vegetation phenology using MODIS. Remote Sensing of

Environment, 84, 471–475.

Zimmermann, N.E., Edwards, T.C., Graham, C.H., Pearman,

P.B. & Svenning, J.-C. (2010) New trends in species distri-

bution modelling. Ecography, 33, 985–989.

Zurell, D., Berger, U., Cabral, J.S., Jeltsch, F., Meynard, C.N.,

Munkemuller, T., Nehrbass, N., Pagel, J., Reineking, B.,

Schroder, B. & Grimm, V. (2009) The virtual ecologist

approach: simulating data and observers. Oikos, 119,

622–635.

BIOSKETCH

Florian Hartig is interested in the processes that govern the

assembly, dynamics, distribution and evolution of ecological

communities. His current research focuses on diversity

patterns of tropical rain forests. All authors of this paper

share a common interest in vegetation modelling and/or

statistical methodology.

Author contributions: All authors worked together in design-

ing and writing this review. F.H. coordinated the discussion

and led the writing.

Editor: Peter Pearman

The papers in this Special Issue arose from two workshops

entitled ‘The ecological niche as a window to biodiversity’ held

on 26–30 July 2010 and 24–27 January 2011 in Arnoldshain

near Frankfurt, Germany. The workshops combined recent

advances in our empirical and theoretical understanding of the

niche with advances in statistical modelling, with the aim of

developing a more mechanistic theory of the niche. Funding

for the workshops was provided by the Biodiversity and

Climate Research Centre (BiK-F), which is part of the LOEWE

programme ‘Landes-Offensive zur Entwicklung Wissenschaft-

lich-okonomischer Exzellenz’ of Hesse’s Ministry of Higher

Education, Research and the Arts.

F. Hartig et al.

2252 Journal of Biogeography 39, 2240–2252ª 2012 Blackwell Publishing Ltd