
Page 1: Bayesian Uncertainty Analysis with Application to Tipping Points

Bayesian Uncertainty Analysis

for complex physical systems

modelled by computer simulators,

with application to tipping points

Michael Goldstein

Camila Caiado

Department of Mathematical Sciences, Durham University ∗

Thanks to Leverhulme for funding through the Durham Tipping Points project, to EPSRC for funding on

the Managing Uncertainty for Complex Models Consortium, to Ian Vernon for Galaxy Analysis, and Jonty

Rougier for Reification

Page 2: Bayesian Uncertainty Analysis with Application to Tipping Points

Tipping Points: Work Packages

WP5:Critical Transitions

WP1: Neoglacial climate

transitions in the

North Atlantic

WP2: Financial

Crisis in the

Banking sector

WP4: Metaphor

and Agency

WP3: Mathematical

Basis of

Tipping Points

WP1: Neoglacial climate

transitions in the

North Atlantic

WP2: Financial

Crisis in the

Banking sector

WP4: Metaphor

and Agency

WP3: Mathematical

Basis of

Tipping Points

Page 3: Bayesian Uncertainty Analysis with Application to Tipping Points

WP3: The Mathematical Basis of Tipping Points

Objectives

• Assess the predictability of tipping events and the system’s behaviour after

such events

• Develop models for complex systems like paleoclimate reconstruction,

societal dynamics and healthcare monitoring

• Characterization of tipping points

• Investigate deterministic and stochastic approaches to the modelling of

tipping points

Current Projects

• Multi-proxy paleoclimate reconstruction

• Live hospital monitoring

• Agent-based modelling of social dynamics

• Compartmental modelling of health and social issues

Page 4: Bayesian Uncertainty Analysis with Application to Tipping Points

Managing uncertainty in complex models

There is a growing field of study relating to uncertainty quantification for

complex physical systems modelled by computer simulators.

Many people work on different aspects of such uncertainty analyses

Great resource: the Managing Uncertainty in Complex Models web-site

http://www.mucm.ac.uk/ (for references, papers, toolkit, etc.)

[MUCM is a consortium of U. of Aston, Durham, LSE, Sheffield, Southampton -

with Basic Technology funding. Now mutating into MUCM community.]

The aims of this talk are

(i) to introduce and illustrate some of the basic principles in this area

(ii) to discuss how this methodology applies to the study of tipping points.

Page 5: Bayesian Uncertainty Analysis with Application to Tipping Points

EXAMPLE: Understanding the Universe

Major advances in cosmology in the last 100 years (mainly thanks to Einstein)

• Universe began in hot dense state: The Big Bang

• Since then Universe has been expanding rapidly

Cosmologists have spent much time and money researching the beginning, the

evolution, the current content, and the ultimate fate of the Universe.

Now know that the observable Universe is composed of billions of galaxies

each made up of 10 million - 10 trillion stars

How did these galaxies form?

Page 6: Bayesian Uncertainty Analysis with Application to Tipping Points

Andromeda Galaxy and Hubble Deep Field View

• Andromeda Galaxy: closest large galaxy to our own Milky Way, contains 1

trillion stars.

• Hubble Deep Field: furthest image yet taken. Covers 2 millionths of the sky

but contains over 3000 galaxies.

Page 7: Bayesian Uncertainty Analysis with Application to Tipping Points

Dark Matter and the Evolution of the Universe

Recent observations of galaxies have suggested that only 3 percent of the

entire energy content of the universe is the normal matter which forms stars,

planets and us.

A further 23 percent is ’Dark Matter’ (and the rest is Dark Energy).

Dark Matter cannot be ’seen’ as it does not give off light (or anything else).

However it does have mass and therefore affects stars and galaxies via gravity.

In order to study the effects of Dark Matter cosmologists try to model Galaxy

formation

• Inherently linked to amount of Dark Matter

• Of fundamental interest as tests cosmologists’ knowledge of a wide range

of complicated physical phenomena

Page 8: Bayesian Uncertainty Analysis with Application to Tipping Points

Simulating Galaxy Evolution: Two Stage approach

The simulation is performed in two parts:

[1] First an N-Body simulation is run to determine the behaviour of fluctuations

of mass in the early Universe, and their subsequent growth into millions of

galaxy sized lumps of mass in the following 12 billion years.

[A very heavy simulation which takes 3 months, done on a supercomputer and

cannot be easily repeated.]

[2] These results on the behaviour of the massive lumps are then used by a

more detailed Galaxy Formation simulation (called GALFORM) which models

the far more complicated interactions of normal matter: gas cloud formation,

star formation and the effects of black holes at the centre of galaxies.

The first simulation is done on a volume of size (500 Mega-Parsec)³, i.e. (1.63 billion light-years)³.

This volume is split into 512 sub-volumes which are independently simulated

using the second model GALFORM. This simulation is run on up to 256 parallel

processors, and takes 20-30 minutes per sub-volume per processor

Page 9: Bayesian Uncertainty Analysis with Application to Tipping Points

Universe at < 100 million years

[Figure: ΛCDM N-body simulation snapshot at z = 5.0 (20 Mpc/h region), galaxies coloured by B−V magnitude; Benson, Frenk, Baugh, Cole & Lacey (2001).]

Page 10: Bayesian Uncertainty Analysis with Application to Tipping Points

Universe at ∼ 1 billion years

[Figure: ΛCDM N-body simulation snapshot at z = 3.0 (20 Mpc/h region), galaxies coloured by B−V magnitude; Benson, Frenk, Baugh, Cole & Lacey (2001).]

Page 11: Bayesian Uncertainty Analysis with Application to Tipping Points

Universe at ∼ 2 billion years

[Figure: ΛCDM N-body simulation snapshot at z = 2.0 (20 Mpc/h region), galaxies coloured by B−V magnitude; Benson, Frenk, Baugh, Cole & Lacey (2001).]

Page 12: Bayesian Uncertainty Analysis with Application to Tipping Points

Universe at ∼ 4 billion years

[Figure: ΛCDM N-body simulation snapshot at z = 1.0 (20 Mpc/h region), galaxies coloured by B−V magnitude; Benson, Frenk, Baugh, Cole & Lacey (2001).]

Page 13: Bayesian Uncertainty Analysis with Application to Tipping Points

Universe at ∼ 13 billion years (today)

[Figure: ΛCDM N-body simulation snapshot at z = 0.0 (20 Mpc/h region), galaxies coloured by B−V magnitude; Benson, Frenk, Baugh, Cole & Lacey (2001).]

Page 14: Bayesian Uncertainty Analysis with Application to Tipping Points

Galform: Inputs and Outputs

Outputs: Galform provides many outputs but we start by looking at the bj and

K luminosity functions

• bj luminosity function: the number of blue (i.e. young) galaxies of a certain

luminosity per unit volume

• K luminosity function: the number of red (i.e. old) galaxies of a certain

luminosity per unit volume

These outputs can be compared to observational data

Inputs: 17 input variables reduced to 8 after expert judgements. These include:

• vhotdisk: relative amount of energy in the form of gas blown out of a galaxy

due to star formation

• alphacool: regulates the effect the central black hole has in keeping large

galaxies ’hot’

• yield: the metal content of large galaxies

and five others: alphahot, stabledisk, epsilonStar, alphareheat and vhotburst

Page 15: Bayesian Uncertainty Analysis with Application to Tipping Points

Observational Data: Galaxy Surveys

Earth at centre of image. Data taken by telescopes looking in two separate

directions. Galaxies observed up to a distance of 1.2 billion light years.

Page 16: Bayesian Uncertainty Analysis with Application to Tipping Points

Galaxy Formation: Main Issues

Basic Questions

• Do we understand how galaxies form?

• Could the galaxies we observe have been formed in the presence of large

amounts of dark matter?

Fundamental Sources of Uncertainty

• We only observe the galaxies in our ‘local’ region of the Universe: it is

possible that they are not representative of the whole Universe.

• The output of the simulation is a ‘possible’ Universe which should have

similar properties to ours, but is not an exact copy.

• The output of the simulation is 512 different computer models for “slices” of

the universe which are exchangeable with each other and (hopefully) with

slices of our universe.

• We are uncertain which values of the input parameters should be used

when running the model

Page 17: Bayesian Uncertainty Analysis with Application to Tipping Points

Computer simulators

A simulator f is a deterministic complex computer model for the physical

system. We denote the simulator as

y = f(x)

where x are uncertain model parameters, corresponding to unknown system

properties.

• We have n evaluations of the simulator at inputs X = (x1, . . . , xn).

• We denote the resulting evaluations as F = (f(x1), . . . , f(xn)).

Page 18: Bayesian Uncertainty Analysis with Application to Tipping Points

How to relate models and physical systems?

• Basic ingredients:

x∗: system properties (unknown)

y: system behaviour (influenced by x∗)

z: partial observation of y (with error)

• Ideally, we would like to construct a deterministic model f , embodying the

laws of nature, which satisfies

y = f(x∗)

• In practice, however, our actual model f is inadequate because:

(i) f simplifies the physics;

(ii) f approximates the solution of the physical equations

• Further practical issues:

(i) x may be high dimensional

(ii) evaluating f(x) for any x may be VERY expensive.

Page 19: Bayesian Uncertainty Analysis with Application to Tipping Points

The Best input

How does learning about f teach us about y?

The simplest (and therefore most popular) way to relate uncertainty about the

simulator and the system is the so-called “Best Input Approach”.

We proceed as though there exists a value x∗, independent of the function f, such that the value of f∗ = f(x∗) summarises all of the information that the simulator conveys about the system.

Define the model discrepancy as ǫ = y − f∗

Our assumption is that ǫ is independent of f, x∗.

(Here, and onwards, all probabilistic statements relate to the uncertainty

judgements of the analyst.)

This formulation raises the question of why x∗ should correspond to “true” system properties. We’ll come back to this later.

Page 20: Bayesian Uncertainty Analysis with Application to Tipping Points

Representing beliefs about f using emulators

An emulator is a probabilistic belief specification for a deterministic function.

Our emulator for component i of f might be

fi(x) = ∑j βij gij(x) + ui(x)

where B = {βij} are unknown scalars, gij are known deterministic functions

of x, and u(x) is a weakly stationary stochastic process.

[A simple case is to suppose, for each x, that ui(x) is normal with constant

variance and Corr(ui(x), ui(x′)) is a function of ‖x− x′‖.]

Bg(x) expresses global variation in f; u(x) expresses local variation in f.

The emulator expresses prior uncertainty judgements about the function.

These are modified by observation of F .
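As a toy sketch of such an emulator (all numerical choices here are illustrative: a squared-exponential correlation for u(x), the basis g(x) = (1, x), and a made-up set of "simulator runs"):

```python
import numpy as np

# Toy 1-D emulator of the form f(x) = sum_j beta_j g_j(x) + u(x):
# a global least-squares trend plus a stationary Gaussian-process residual.
def emulate(X, F, x_new, theta=0.2, sigma2=1.0):
    G = np.vstack([np.ones_like(X), X]).T            # basis g(x) = (1, x)
    beta, *_ = np.linalg.lstsq(G, F, rcond=None)     # global variation, Bg(x)
    r = F - G @ beta                                 # residuals carry local variation
    corr = lambda a, b: sigma2 * np.exp(-((a[:, None] - b[None, :]) / theta) ** 2)
    K = corr(X, X) + 1e-6 * np.eye(len(X))           # jitter for numerical stability
    kx = corr(np.atleast_1d(x_new), X)
    mean = (np.array([1.0, x_new]) @ beta + kx @ np.linalg.solve(K, r)).item()
    var = (sigma2 - kx @ np.linalg.solve(K, kx.T)).item()
    return mean, var

X = np.linspace(0.0, 1.0, 8)           # design points
F = np.sin(3.0 * X) + 0.5 * X          # stand-in for expensive simulator runs
mean, var = emulate(X, F, 0.35)        # emulator mean and variance at a new input
```

At the design points the emulator interpolates the runs with (near) zero variance; between them the variance expresses our remaining uncertainty about the function.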

Page 21: Bayesian Uncertainty Analysis with Application to Tipping Points

Emulator comments

We fit the emulator, fi(x) = ∑j βij gij(x) + ui(x), given a collection of

model evaluations, using our favourite statistical tools - generalised least

squares, maximum likelihood, Bayes - with a generous helping of expert

judgement.

So, we need careful experimental design to choose which evaluations of the

model to make, and detailed diagnostics, to check emulator validity.

We have some useful backup tricks - for example, if we can only make a few

evaluations of our model, we may be able to make many evaluations of a

simpler approximate version of the model to get us started.

From the emulator, we may extract the mean, variance and covariance for the

function, at each input value x.

µi(x) = E(fi(x)),   κi(x, x′) = Cov(fi(x), fi(x′))

Page 22: Bayesian Uncertainty Analysis with Application to Tipping Points

Uncertainty analysis for complex models

Aim: to tackle problems arising from the uncertainties inherent in imperfect

computer models of highly complex physical systems, using a Bayesian

formulation. This involves

• prior probability distribution for best inputs x∗

• a probabilistic emulator for the computer function f
• a probabilistic discrepancy measure relating f(x∗) to the system y
• a likelihood function relating historical data z to y

This full probabilistic description provides a formal framework to synthesise

expert elicitation, historical data and a careful choice of simulator runs.

We may then use our collection of computer evaluations and historical

observations to analyse the physical process to

• determine values for simulator inputs (calibration; history matching);

• assess the future behaviour of the system (forecasting).

• “optimise” the performance of the system

Page 23: Bayesian Uncertainty Analysis with Application to Tipping Points

Bayes linear analysis

Within the Bayesian approach, we have two choices.

(i) Full Bayes analysis: complete joint probabilistic specification for all quantities

(ii) Bayes linear analysis, based only on expectation as a primitive, involving

prior specification of means, variances and covariances

Probability is the most common choice, but there are advantages in working

with expectations - the uncertainty specification is simpler, the analysis

(particularly for experimental design) is much faster and more straightforward

and there are careful foundations.

The statistical approach based around expectation is termed Bayes linear

analysis, based around these updating equations for mean and variance:

Ez[y] = E(y) + Cov(y, z)Var(z)−1(z − E(z)),
Varz[y] = Var(y) − Cov(y, z)Var(z)−1Cov(z, y)

Some of the examples that we will describe use Bayes linear methods.

(There are natural (but much more complicated) probabilistic counterparts.)
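The two updating equations translate directly into code; the minimal sketch below assumes the prior means, variances and covariances are supplied as NumPy arrays:

```python
import numpy as np

def bayes_linear_adjust(Ey, Vy, Ez, Vz, Cyz, z):
    """Bayes linear adjustment of beliefs about y given observed z:
    E_z[y] = E(y) + Cov(y,z) Var(z)^-1 (z - E(z))
    Var_z[y] = Var(y) - Cov(y,z) Var(z)^-1 Cov(z,y)."""
    W = Cyz @ np.linalg.inv(Vz)          # resolution of y on z
    Ey_z = Ey + W @ (z - Ez)             # adjusted expectation
    Vy_z = Vy - W @ Cyz.T                # adjusted variance (never exceeds prior)
    return Ey_z, Vy_z

# 1-D illustration: prior Var(y) = 4, Cov(y,z) = Var(z) = 1, observe z = 2
Ey_z, Vy_z = bayes_linear_adjust(np.array([0.0]), np.array([[4.0]]),
                                 np.array([0.0]), np.array([[1.0]]),
                                 np.array([[1.0]]), np.array([2.0]))
```

Note that the adjusted variance does not depend on the observed value z, which is part of what makes Bayes linear experimental design fast.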

Page 24: Bayesian Uncertainty Analysis with Application to Tipping Points

History matching

Model calibration aims to identify the best choices of input parameters x∗,

based on matching data z to the corresponding simulator outputs fh(x). However:

(i) we may not believe in a unique true input value for the model;

(ii) we may be unsure whether there are any good choices of input parameters

(iii) full probabilistic calibration analysis may be very difficult/non-robust.

A conceptually simple procedure is “history matching”, i.e. finding the

collection, C(z), of all input choices x for which you judge the match of the

model outputs fh(x) to observed data, z, to be acceptably small, taking into

account all of the uncertainties in the problem.

If C(z) is non-empty, then an analysis of its elements reveals the constraints

on the parameter space imposed by the data.

Further, the model projections {f(x) : x ∈ C(z)} over future outcomes reveal the futures consistent with the model physics and the historical data.

If the data is informative for the parameter space, then C(z) will typically form

a tiny percentage of the original parameter space, so that even if we do wish to

calibrate the model, history matching is a useful prior step.

Page 25: Bayesian Uncertainty Analysis with Application to Tipping Points

History matching by implausibility

We use an ‘implausibility measure’ I(x) based on a probabilistic metric (e.g. the number of standard deviations between z and fh(x)), where z = yh ⊕ e, yh = fh(x∗) ⊕ ǫ, for observational error e and model discrepancy ǫ.

For example, if we are matching a single output, then we might choose

I(x) = (z − E(fh(x)))² / Var(z − E(fh(x)))

where Var(z − E(fh(x))) is the sum of measurement variance, Var(e), structural discrepancy variance, Var(ǫ), and emulator variance, Var(fh(x)).

The implausibility calculation can be performed univariately, or by multivariate calculation over sub-vectors. The implausibilities are then combined, for example as IM(x) = maxi I(i)(x), and used to identify regions of x with large IM(x) as implausible, i.e. unlikely to be good choices for x∗.

With this information, we can then refocus our analysis on the ‘non-implausible’

regions of the input space, by (i) making more simulator runs & (ii) refitting our

emulator over such sub-regions and repeating the analysis. This process is a

form of iterative global search.
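A minimal sketch of the univariate implausibility cut: the emulator mean and variance functions below are hypothetical stand-ins (not any real application), and the 3-sigma cutoff is the conventional choice:

```python
import numpy as np

def implausibility(x, z, emul_mean, emul_var, var_e, var_eps):
    """I(x): standardised distance between data z and emulator expectation,
    normalised by emulator + discrepancy + observation variance."""
    return np.abs(z - emul_mean(x)) / np.sqrt(emul_var(x) + var_e + var_eps)

# Hypothetical stand-ins for an emulated single output over a 1-D input
emul_mean = lambda x: x ** 2
emul_var = lambda x: 0.01 * np.ones_like(x)

z, var_e, var_eps = 0.25, 0.01, 0.01      # data and uncertainty budget (illustrative)
x_grid = np.linspace(0.0, 1.0, 1001)
I = implausibility(x_grid, z, emul_mean, emul_var, var_e, var_eps)
non_implausible = x_grid[I < 3.0]         # candidate region C(z) after this wave
```

Refocusing would now place new simulator runs inside `non_implausible`, refit the emulator there (typically with smaller variance), and repeat the cut.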

Page 26: Bayesian Uncertainty Analysis with Application to Tipping Points

Back to Galaxy Formation

We want to history match the Galaxy Formation model Galform using the

emulation and implausibility techniques that we have outlined.

We want to reduce the volume of input parameter space as much as we can by

discarding all points that we are (reasonably) sure will not give an ’acceptable’

fit to the output data

We do this in stages, as follows:

• design a set of runs of the simulator within the input volume of interest

• choose a subset of the outputs for which we have system observations

• emulate these outputs

• calculate implausibility over the selected input volume

• discard all x input points that have implausibility greater than a certain cutoff

This process is then repeated. This is refocusing. As we are now in a reduced

input volume, outputs may be of simpler form and therefore easier to emulate.

As we have reduced the variation in the outputs arising from the most important

inputs, this also allows us to assess variation due to secondary inputs.

Page 27: Bayesian Uncertainty Analysis with Application to Tipping Points

Galform analysis

Following the cosmologists’ own attempt to history match Galform, we chose to

run only the first 40 sub-volumes (out of 512) and examine their mean. The

simulator function fi(x) is now taken to be the mean of the luminosity outputs

over the first 40 sub-volumes.

Design: 1000 point Latin Hypercube design in key inputs

Outputs: Decided to choose 11 outputs from the luminosity functions as they

could be emulated accurately

Active Variables: For each output we choose 5 active variables xA, i.e. those

inputs which are the most important for explaining variation in the output.

We then emulate each of the 11 outputs univariately using:

fi(x) = ∑j βij gij(xA) + ui(xA) + δi(x)

where now B = {βij} are unknown scalars, gij are now monomials in xA of

order 3 or less, and u(xA) is a Gaussian process. The nugget δi(x) models

the effects of inactive variables as random noise.
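The 1000-point Latin hypercube design can be sketched as follows. The design is generated on [0, 1]⁸; the mapping to the real Galform input ranges (which comes from expert judgement) is omitted here:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 1000, 8                             # runs and retained inputs, as in the text

# Each column is a random permutation of the n strata, plus uniform jitter,
# so every input dimension has exactly one point in each of the n bins.
strata = rng.permuted(np.tile(np.arange(n), (d, 1)), axis=1).T
design = (strata + rng.random((n, d))) / n   # 1000 x 8 Latin hypercube on [0,1]^8
```

Each row of `design` would then be rescaled to the expert-judged ranges and run through the simulator.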

Page 28: Bayesian Uncertainty Analysis with Application to Tipping Points

11 Output points Chosen

Outputs chosen to be informative enough to allow us to cut down the parameter

space, but simple enough to be emulated easily.

Page 29: Bayesian Uncertainty Analysis with Application to Tipping Points

Model Discrepancy

Before calculating the implausibility we need to assess the Model Discrepancy

and Measurement error.

Model Discrepancy has three components:

• ΦE : Expert assessment of model discrepancy of full model with 17

parameters and using 512 sub-volumes

• Φ40: Discrepancy term due to (i) choosing first 40 sub-volumes from full

512 sub-volumes, and (ii) need to extrapolate to our universe. Assess this

by repeating 100 runs but now choosing 40 random regions.

[More carefully, we may construct an exchangeable system of emulators to

fully account for this discrepancy.]

• Φ12: As we have neglected 9 parameters (due to expert advice) we need to

assess the effect of this (by running a Latin hypercube design across all 17

parameters)

Page 30: Bayesian Uncertainty Analysis with Application to Tipping Points

Measurement Error

Observational Errors composed of 4 parts:

• Normalisation Error: correlated vertical error on all luminosity output points

• Luminosity Zero Point Error: correlated horizontal error on all luminosity

points

• k + e Correction Error: Outputs have to be corrected for the fact that

galaxies are moving away from us at different speeds (light is red-shifted),

and for the fact that galaxies are seen in the past (as light takes millions of

years to reach us)

• Poisson Error: assumed Poisson process to describe galaxy production (not

very accurate assumption!)

Page 31: Bayesian Uncertainty Analysis with Application to Tipping Points

Implausibility

We can now calculate the Implausibility at any input parameter point x for

each outputs, given by:

I(i)(x) = |E(fi(x)) − zi|² / (Var(fi(x)) + MD + OE)

where E(fi(x)) and Var(fi(x)) are the emulator expectation and variance, zi are the observed data, and MD and OE are the Model Discrepancy and Observational Error variances.

We can then combine the implausibilities across outputs by maximizing over

outputs, as IM(x) = maxi I(i)(x).

Alternatively, we can use a multivariate implausibility measure:

I²(x) = (E(f(x)) − z)T (Var(f(x)) + MD + OE)−1 (E(f(x)) − z),

where Var(fi(x)), MD and OE are now the multivariate emulator variance,

multivariate model discrepancy and multivariate observational errors

respectively.
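A sketch of the multivariate calculation, with placeholder variance matrices standing in for the emulator, discrepancy and observation contributions:

```python
import numpy as np

def multivariate_implausibility(Ef, z, Vf, MD, OE):
    """Mahalanobis-style I^2(x) for a vector of emulated outputs:
    (E(f(x)) - z)^T (Vf + MD + OE)^-1 (E(f(x)) - z)."""
    diff = Ef - z
    total = Vf + MD + OE                 # emulator + discrepancy + obs error
    return float(diff @ np.linalg.solve(total, diff))

# Illustrative 2-output case: one standard-deviation-squared of total variance
# is split equally between MD and OE on top of a unit emulator variance.
I2 = multivariate_implausibility(np.array([2.0, 2.0]), np.array([1.0, 2.0]),
                                 np.eye(2), 0.5 * np.eye(2), 0.5 * np.eye(2))
```

Unlike the maximum of univariate implausibilities, this version accounts for correlation between outputs when `total` is non-diagonal.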

Page 32: Bayesian Uncertainty Analysis with Application to Tipping Points

Summary of Results

We have completed Five Stages:

In later stages, we use a Multivariate Implausibility measure.

          No. Model Runs   No. Active Vars   Adjusted R²    Space Remaining
Stage 1        1000               5           0.58 - 0.90        8.0 %
Stage 2        1916               8           0.83 - 0.98        2.9 %
Stage 3        1487               8           0.79 - 0.99        1.2 %
Stage 4        1899              10           0.75 - 0.99        0.12 %

In stage 5, we evaluate many good fits to the data, and we stop.

Some of these choices give simultaneous matches to data sets that the

Cosmologists have been unable to match before.

Page 33: Bayesian Uncertainty Analysis with Application to Tipping Points

2D Implausibility Projections: Stage 1 (8%) to Stage 4 (0.12%)

Page 34: Bayesian Uncertainty Analysis with Application to Tipping Points

Linking models to reality: exchangeability

We must compensate for our lack of knowledge about the state of dark matter

over all of time and space (which the simulator requires). This is typical of initial

condition/forcing function uncertainty and is a large factor in the mismatch

between model and reality.

What Galform provides is 512 exchangeable computer models, f r(x) (one for

each description of dark matter). The exchangeability representation for

random functions allows us to express each function as

f r(x) = M(x) ⊕ Rr(x)

where M(x) is the mean function and Rr(x) are the (uncorrelated, exchangeable, mean-zero) residual functions.

If we consider dark matter in our universe to be exchangeable with the 512

individual simulations, then our emulator for Galform evaluated on the correct

dark matter configuration is

f∗(x) = (MB + R∗B) g(x) + Me(x) + R∗e(x)

We cannot evaluate this simulator (because we don’t know the appropriate dark

matter configuration) but we can emulate it, based on a detailed analysis of the

Galform experiments.

Page 35: Bayesian Uncertainty Analysis with Application to Tipping Points

Reification

Consider both our inputs x and the simulator f as abstractions/simplifications

of real physical quantities and processes (through approximations in physics,

solution methods, level of detail, limitations of current understanding) to a much

more realistic simulator f∗, for which real, physical x∗ would be the best input,

in the sense that (y − f∗(x∗)) would be judged independent of (x∗, f∗).

We call f∗ the reified simulator (from reify: to treat an abstract concept as if it

were real). Our model f is informative for y because f is informative for a more elaborate model f∗.

Suppose that our emulator for f is f(x) = Bg(x) + u(x).

Our simplest emulator for f∗ might be

f∗(x, w) = B∗g(x) + u∗(x) + u∗(x, w)

where we might model our judgements as B∗ = CB + Γ for known C and uncertain Γ, correlate u(x) and u∗(x), but leave u∗(x, w), involving any additional parameters, w,

uncorrelated.

Structured reification: systematic probabilistic modelling for all those aspects of

model deficiency whose effects we are prepared to consider explicitly.

Page 36: Bayesian Uncertainty Analysis with Application to Tipping Points

Reification

Page 37: Bayesian Uncertainty Analysis with Application to Tipping Points

Reification

[Belief network: the Model f and the ‘Best’ input x∗ jointly determine the Model evaluations and f(x∗); f(x∗) plus Discrepancy gives the Actual system, which plus Measurement error gives the System observations.]

Page 38: Bayesian Uncertainty Analysis with Application to Tipping Points

Reification

[Belief network: as before, but the Model f is now linked to the Actual system only through the reified simulator f∗: f → f∗ → f∗(x∗) → Actual system → System observations.]

Page 39: Bayesian Uncertainty Analysis with Application to Tipping Points

Structural reification

Usually, we can think more carefully about the reasons for our model’s

inadequacy.

• Often we can imagine specific generalisations of f , with extra model components and parameters, giving a better model f ′(x, v).
• Suppose f ′(x, v0) = f(x). We might emulate f ′ ‘on top’ of f , i.e. as

f ′i(x, v) = fi(x) + ∑k γik gik(x, v) + ui(x, v)

where gik(x, v0) = ui(x, v0) = 0.

• The reified emulator for f∗i (x, v, w) would then be

∑j β∗ij gij(x) + ∑k γ∗ik gik(x, v) + u∗i (x) + u∗i (x, v) + u∗i (x, v, w)

where we now model the relationship between the coefficients in the three

emulators.

Page 40: Bayesian Uncertainty Analysis with Application to Tipping Points

The Zickfeld et al (2004) model of the Atlantic

[Figure: schematic of the four-box Atlantic model: South, Tropics and North surface boxes plus a deep box; freshwater forcings F1, F2; temperature forcings T∗1, T∗2, T∗3; meridional overturning m.]

• Model parameters: T∗1, T∗2, T∗3, Γ, k (last two not shown above).
• Model outputs: steady-state (SS) temperature for compartments 1, 2 and 3, SS salinity differences between compartments 1 & 2 and between 2 & 3, SS overturning m, and critical freshwater forcing F crit1.

Page 41: Bayesian Uncertainty Analysis with Application to Tipping Points

One possible generalisation

An extra compartment at the ‘southern’ end denoting ‘Other oceans’.

[Figure: as before, but with the southern box split into 1A and 1B (‘Other oceans’), with T5 = T∗5, S5 = S0, and the overturning m split into flows qm and (1 − q)m.]

• Two extra model parameters, T∗5 and q, with q = 0 returning us to the original model.
• Same model outputs as before.

Page 42: Bayesian Uncertainty Analysis with Application to Tipping Points

The system and the data

• Our system y is the Atlantic, but, due to the very aggregate nature of our

model it is more natural to use data from a much larger model (CLIMBER-2)

that has been carefully tuned to the Atlantic in a separate experiment. The

data from CLIMBER-2 comprises SS. temperatures, SS. salinity differences

and SS. overturning. We write

y = f∗(x∗, v∗) + ǫ∗ and z = Hy.

• For the discrepancy,

Var(ǫ∗i) = 0.84²,   i = 1, 2, 3  [SS temperatures]
Var(ǫ∗i) = 0.075²,  i = 4, 5     [SS salinity differences]
Var(ǫ∗i) = 3.30²,   i = 6        [SS overturning]
Var(ǫ∗i) = 0.044²,  i = 7        [F crit1].

Page 43: Bayesian Uncertainty Analysis with Application to Tipping Points

Some aspects of our emulation

• Our emulator for F crit1 (x) is (×10−3)

87 − T∗2 + 68(T∗1 − T∗2) − 5(T∗3 − T∗1) + 14Γ + 21 k + u7(x)

with Sd(u7(x)) = 21 [based on 30 evaluations of the simulator]

• Assess the difference between f ′ and f , and between f∗ and f ′, as

∆′ = E(Var(f ′(x∗, v∗) − f(x∗)) | x∗, v∗)
∆∗ = E(Var(f∗(x∗, v∗, w∗) − f ′(x∗, v∗)) | x∗, v∗, w∗)

For F crit1 our modelling leads to the following assignments:

SD(f(x∗)) = 0.079,   √∆′ = 0.033,   √∆∗ = 0.066,   SD(ǫ∗) = 0.044

Page 44: Bayesian Uncertainty Analysis with Application to Tipping Points

Our resulting predictions for F crit1

1. From 30 evaluations of f , plus the reified statistical modelling, our prior

prediction is

F crit1 = 0.085± 0.224 (mean ±2 std. dev.)

2. Using the CLIMBER-2 data z, then the Bayes linear adjusted prediction is

F crit1 = 0.119± 0.099

[Our conclusions are ‘reasonably insensitive’ to moderate tweaks to our

prior values, and our modelling survives ‘reasonable diagnostic testing’.]

3. Model design: the effect of making many more evaluations of the current

simulator f would be to reduce uncertainty about F crit1 by around 2%.

In comparison, the effect of constructing and evaluating the 5 compartment

model f ′ would be to reduce uncertainty in F crit1 by around 10%.

Page 45: Bayesian Uncertainty Analysis with Application to Tipping Points

History matching for tipping points

The methods that we have described involve emulation of the computer

simulator, treated as a smooth function of the inputs.

If simulator output is a time series with more than one form of limiting behaviour

(e.g. collapse or non-collapse of the MOC), then often the input space divides

into regions Ri such that simulator output is smooth within each region, but

discontinuous across boundaries.

To identify qualitative behaviour consistent with the simulator and historical

observation, we may aim to eliminate all regions but one.

We may be able to do this by

(i) history matching based on early parts of the time series, if early time

simulator behaviour is smooth across regions, or

(ii) by introducing observations on additional outputs, which allow improved

history matching.

Here, we outline a general method when neither of these possibilities applies.

Page 46: Bayesian Uncertainty Analysis with Application to Tipping Points

Illustration: History matching for Zickfeld’s model

For illustration, we apply the method to Zickfeld’s four-box model.

We run the model forward from the present day to equilibrium.

We want to identify whether the simulator suggests collapse, or not, given

current history.

There are many analyses that we could perform.

Although the model is fast to evaluate, we aim to produce methods that are

applicable for expensive models.

(In practice, we would use many evaluations of fast models to create informed

priors for the small number of evaluations of the slow models.)

Therefore, in this illustration,

(i) we use simulator ensembles of 25 runs per iteration and

(ii) we suppose that we collect and re-analyse data once every ten years.

Page 47: Bayesian Uncertainty Analysis with Application to Tipping Points

Reminder: the Zickfeld model of the Atlantic

[Figure: schematic of the four-box Atlantic model, as on Page 40.]

• Model parameters: T∗1, T∗2, T∗3, Γ, k.
• Model outputs: temperature for compartments 1, 2 and 3, salinity differences between compartments 1 & 2 and between 2 & 3, and overturning m.

Page 48: Bayesian Uncertainty Analysis with Application to Tipping Points

Four-box model - Zickfeld et al (2004)

Ṫ1 = (m/V1)(T4 − T1) + λ1(T∗1 − T1)
Ṫ2 = (m/V2)(T3 − T2) + λ2(T∗2 − T2)
Ṫ3 = (m/V3)(T1 − T3) + λ3(T∗3 − T3)
Ṫ4 = (m/V4)(T2 − T4)

Ṡ1 = (m/V1)(S4 − S1) + S0F1/V1
Ṡ2 = (m/V2)(S3 − S2) + S0F2/V2
Ṡ3 = (m/V3)(S1 − S3) + S0(F1 − F2)/V3
Ṡ4 = (m/V4)(S2 − S4)

m = k [β(S2 − S1) − α(T2 − T1)]
λi = Γ/(cρ0 zi), i = 1, . . . , 4

Variables of interest:
• ‘overturning’ m
• freshwater flux into the tropical box, F1
• thermal coupling constant, Γ

We treat m as a function of F1 and Γ:

x = (F1, Γ),   m(t) = f(x, t), t ≥ 0,   meq = m(t → ∞)
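The system above can be integrated directly. The sketch below uses placeholder parameter values (unit volumes, illustrative relaxation rates and restoring temperatures, not Zickfeld's calibrated values) and no freshwater perturbation, purely to illustrate running the box model to equilibrium:

```python
import numpy as np
from scipy.integrate import solve_ivp

# All numerical values here are illustrative placeholders, not Zickfeld's.
V = np.ones(4)                            # box volumes
lam = np.array([0.1, 0.1, 0.1, 0.0])      # box 4 has no temperature forcing
Tstar = np.array([0.5, 0.2, 0.1, 0.0])    # Tstar[3] unused since lam[3] = 0
k, alpha, beta, S0 = 1.0, 0.2, 0.8, 1.0

def rhs(t, y, F1, F2):
    T, S = y[:4], y[4:]
    m = k * (beta * (S[1] - S[0]) - alpha * (T[1] - T[0]))
    adv = [3, 2, 0, 1]                    # advection: 4->1, 3->2, 1->3, 2->4
    dT = (m / V) * (T[adv] - T) + lam * (Tstar - T)
    dS = (m / V) * (S[adv] - S)
    dS[:3] += S0 * np.array([F1, F2, F1 - F2]) / V[:3]
    return np.concatenate([dT, dS])

# Spin up to (near) equilibrium with no freshwater perturbation
sol = solve_ivp(rhs, (0.0, 500.0), np.zeros(8), args=(0.0, 0.0),
                rtol=1e-10, atol=1e-10)
T_eq, S_eq = sol.y[:4, -1], sol.y[4:, -1]
m_eq = k * (beta * (S_eq[1] - S_eq[0]) - alpha * (T_eq[1] - T_eq[0]))
```

Sweeping x = (F1, Γ) over a grid of forcings (with λi rescaled accordingly) would then trace out the collapse/non-collapse behaviour of meq discussed below.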

Page 49: Bayesian Uncertainty Analysis with Application to Tipping Points

Time series

[Figure: overturning m (Sv) against time (years), over 0–600 years.]

25 runs of the model for different choices of x.

Runs which collapse are in red.

Page 50: Bayesian Uncertainty Analysis with Application to Tipping Points

Classification and emulation

Suppose the simulator output is a time series f(x, t), and that there are two equilibrium regions for f, R1 and R2, whose qualitative behaviour is different (eg collapse/non-collapse).

We may proceed as follows.

(i) Choose a training sample X = (x1, ..., xm). Run the simulator to equilibrium for each member of X. Separate X into X1 and X2, depending on the region containing the equilibrium value.

(ii) Construct a probabilistic classifier, P(x ∈ R1), based on the training sample. Divide the input space into three regions, RX1, RX2, RX0, depending on whether the probabilistic classifier assigns high, low or medium probability that x ∈ R1, for each input choice x.

(iii) Build emulators f1(x, t), f2(x, t) for x ∈ RXi* = (RXi ∪ RX0), i = 1, 2, respectively.

[We may want/need to expand our training sample for this.]
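A concrete sketch of step (ii): fit any off-the-shelf probabilistic classifier to the labelled training inputs, then cut its probabilities at thresholds (0.9 and 0.1 here, chosen for illustration) to form RX1, RX2 and the "gray area" RX0. The training sample and the collapse rule below are synthetic stand-ins:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the training sample over x = (F1, Gamma):
# label 1 if the run reached equilibrium region R1 ("collapse").
F1g, Gg = np.meshgrid(np.linspace(-0.2, 0.2, 5), np.linspace(20.0, 70.0, 5))
X_train = np.column_stack([F1g.ravel(), Gg.ravel()])
y_train = (X_train[:, 0] > 0.05).astype(int)   # toy collapse rule, illustration only

clf = LogisticRegression().fit(X_train, y_train)

def classify_region(x, hi=0.9, lo=0.1):
    """RX1 if P(x in R1) is high, RX2 if low, RX0 (the 'gray area') otherwise."""
    p = clf.predict_proba(np.atleast_2d(x))[0, 1]
    return "RX1" if p >= hi else ("RX2" if p <= lo else "RX0")

regions = [classify_region(x) for x in X_train]
```

Any classifier returning calibrated probabilities would do; the thresholds control how wide the uncertain region RX0 is, and hence how much the two emulators' domains overlap.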

Page 51: Bayesian Uncertainty Analysis with Application to Tipping Points

Equilibrium

[Figure: equilibrium overturning (Sv, colour scale) over the (Γ, F1) input plane.]

Equilibrium overturning value for each value of x. Collapsed values in red. (The boundary is sharp.)

Page 52: Bayesian Uncertainty Analysis with Application to Tipping Points

Classifying the equilibrium

[Figure, two panels over the (Γ, F1) plane: "Training Sample" (equilibrium overturning, Sv, at each training input) and "Region separation" (Region 1, Region 2 and a gray area).]

LEFT: Equilibrium values for the 25 runs in the training sample. Collapsed values in red.

Page 53: Bayesian Uncertainty Analysis with Application to Tipping Points

Classifying the equilibrium

[Figure, two panels as on the previous slide: "Training Sample" and "Region separation".]

RIGHT: The probabilistic classifier gives the posterior probability for each region, given the training sample. The light green area is where the posterior probability is not near one for either region.

Page 54: Bayesian Uncertainty Analysis with Application to Tipping Points

Emulated time series

[Figure: emulated overturning (Sv) against time (years), over 0-600 years.]

In region 1, we emulate as f1(x, t) ∼ GP(a1 + a2t + a3t² + a4t³, Σ1) and in region 2 as f2(x, t) ∼ GP(b1 + b2t + b3t², Σ2), where the ai and bj are functions of x and Σi = si exp(−(x − x′)′Pi(x − x′)).
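A minimal numpy sketch of an emulator of this form: fit the polynomial-in-t coefficients ai to each training run, then treat each coefficient as a function of x via a GP with the slide's covariance si exp(−(x − x′)′Pi(x − x′)). The toy training runs, the P matrix and the nugget below are illustrative assumptions:

```python
import numpy as np

def sq_exp(X1, X2, s=1.0, P=np.diag([50.0, 5.0])):
    """Covariance s * exp(-(x - x')' P (x - x')), as in the slide's Sigma_i."""
    d = X1[:, None, :] - X2[None, :, :]
    return s * np.exp(-np.einsum("ijk,kl,ijl->ij", d, P, d))

def fit_emulator(X, runs, t, degree=3, nugget=1e-8):
    """Fit a degree-`degree` polynomial in t to each run, then GP-interpolate
    the coefficient vectors a(x) across the input space."""
    A = np.array([np.polyfit(t, y, degree) for y in runs])   # (n_runs, degree + 1)
    K = sq_exp(X, X) + nugget * np.eye(len(X))
    K_inv_A = np.linalg.solve(K, A)
    def mean_curve(x_star):
        k_star = sq_exp(np.atleast_2d(x_star), X)            # (1, n_runs)
        coefs = (k_star @ K_inv_A).ravel()                   # emulated a(x*)
        return np.polyval(coefs, t)
    return mean_curve

# Toy training set standing in for the simulator runs
t = np.linspace(0.0, 1.0, 61)                                # scaled time
X = np.column_stack([np.linspace(-0.2, 0.2, 9), np.linspace(20.0, 70.0, 9) / 70.0])
runs = [20.0 + 10.0 * x[0] + 5.0 * x[1] * t - 3.0 * t**2 for x in X]
emu = fit_emulator(X, runs, t)
pred = emu(X[4])     # at a training input the emulator should reproduce that run
```

This gives only the emulator mean; a fuller treatment would also propagate the GP variance of the coefficients into uncertainty bands for the curve, as in the plots that follow.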

Page 55: Bayesian Uncertainty Analysis with Application to Tipping Points

Emulated time series

[Figure: emulated overturning (Sv) against time (years), over 0-600 years.]

For illustration, we choose as the value for x∗ the present-day values used in the CLIMBER-2 model, and display the emulated future overturning time series (mean and 2 sd bands) under emulators 1 (collapse) and 2 (no collapse).

Page 56: Bayesian Uncertainty Analysis with Application to Tipping Points

Emulated time series

[Figure: as on the previous slide, with the simulator's actual series overlaid.]

We add the actual time series for this choice of x∗.

Page 57: Bayesian Uncertainty Analysis with Application to Tipping Points

Emulated time series

[Figure: as before, with a synthetic observed data series overlaid.]

Continuing the illustration, we now add a 'plausible' synthetic observed data series, by first adding correlated model discrepancy and then observational error.
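One way to construct such a series, sketched below with all variances, the correlation form and the stand-in simulator output chosen purely for illustration: draw the discrepancy from a zero-mean process with temporal correlation, then add independent observation noise on top.

```python
import numpy as np

rng = np.random.default_rng(1)

t = np.arange(0.0, 100.0)                  # historical period, years
f_xstar = 22.0 - 0.02 * t                  # stand-in for the simulator output at x*

# Correlated model discrepancy: zero mean, exponentially decaying correlation in time
sd_disc, corr_len = 1.0, 20.0
C = sd_disc**2 * np.exp(-np.abs(t[:, None] - t[None, :]) / corr_len)
discrepancy = rng.multivariate_normal(np.zeros(t.size), C)

# Independent observational error on top
z_obs = f_xstar + discrepancy + rng.normal(0.0, 0.5, size=t.size)
```

The correlated discrepancy term matters: treating the model-world mismatch as white noise would understate how far the observed series can wander from any simulator run.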

Page 58: Bayesian Uncertainty Analysis with Application to Tipping Points

Iterative history matching

(iv) History match the observed historical data, using emulator fi over RXi*, i = 1, 2.

[(i) We might choose to assess different structural discrepancies for the two cases.
(ii) We might make some additional runs of f(x, t) just over historical time, if this is much faster than running to equilibrium.]

Remove implausible values of x from RXi, based on the history match using fi, and from RX0 if implausible given both matches.

(v) Recompute the probabilistic classifier over the reduced input space, re-assess the regions RX1, RX2, RX0, and re-emulate f1, f2.

[We might want/need to resample f(x, t).]

(vi) Repeat the sequence: sample function, classify, emulate, history match.

If we are only interested in qualitative behaviour, we stop when RX0 and one of RX1, RX2 are empty.

(vii) We may not be able to achieve this with current information, so we may need to repeat this process every time period as new data come in.
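The pruning in step (iv) rests on an implausibility measure. A common choice, sketched here with a toy emulator, toy data and illustrative variances, and the cutoff of 3 used on the following slides, standardises the data-emulator mismatch by the emulator, discrepancy and observation variances:

```python
import numpy as np

def implausibility(z, emu_mean, emu_var, disc_var, obs_var):
    """Maximum standardised distance between data and emulator prediction over time."""
    return np.max(np.abs(z - emu_mean) / np.sqrt(emu_var + disc_var + obs_var))

def history_match(z, candidates, emulate, disc_var=1.0, obs_var=0.25, cutoff=3.0):
    """Retain only candidate inputs x that are not implausible for this emulator."""
    return [x for x in candidates
            if implausibility(z, *emulate(x), disc_var, obs_var) <= cutoff]

# Toy illustration: 'emulator' mean is a line in t with slope x; data generated at x = 0.5
t = np.linspace(0.0, 10.0, 11)
z = 0.5 * t + np.random.default_rng(2).normal(0.0, 0.3, size=t.size)
emulate = lambda x: (x * t, np.full(t.size, 0.01))   # (mean, variance) at input x
survivors = history_match(z, candidates=np.linspace(-2.0, 2.0, 41), emulate=emulate)
```

Applied twice, once per emulator, this yields the not-implausible sets shown in the implausibility plots that follow; points implausible under both matches are removed from RX0.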

Page 59: Bayesian Uncertainty Analysis with Application to Tipping Points

Emulated time series

[Figure: emulated overturning (Sv) against time (years), with the observed data overlaid.]

Now observe the first 10 years of this data and history match using each emulator. Resample 25 new values for x, re-evaluate the probabilistic classifier, and re-assess each emulator.

Page 60: Bayesian Uncertainty Analysis with Application to Tipping Points

Implausibility

[Figure, four panels over the (Γ, F1) plane: "Implausibility - Region 1", "Implausibility - Region 1 - Cutoff 3", "Implausibility - Region 2" and "Implausibility - Region 2 - Cutoff 3"; the cutoff panels mark each input as implausible or not implausible.]

Left hand side: Implausibility plot using each of the two emulators.
[White line separates regions. White cross is x∗.]

Page 61: Bayesian Uncertainty Analysis with Application to Tipping Points

Implausibility

[Figure: the four implausibility panels from the previous slide.]

Right hand side: Implausible points in red, not implausible in blue.
[White line separates regions. White cross is x∗.]

Page 62: Bayesian Uncertainty Analysis with Application to Tipping Points

Revised emulators

[Figure: the two revised emulator bands for overturning (Sv) against time (years), with the actual series overlaid.]

The two revised emulators for f(x∗). Note that the actual series for f(x∗) is now well within the appropriate emulator.

Before 30 years, we can identify that the model and data are inconsistent with collapse.

Page 63: Bayesian Uncertainty Analysis with Application to Tipping Points

Time to classification

[Figure: over the (Γ, F1) plane, the time (years, logarithmic colour scale up to about 55 years) at which each input is classified.]

Repeating this process every ten years (observe the data, sample 25 new values, re-emulate and history match), for each possible choice of x∗, we plot the time at which we identify, for sure, the region (collapse or no collapse) containing this value.

Page 64: Bayesian Uncertainty Analysis with Application to Tipping Points

Concluding comments

We have discussed some of the many aspects of uncertainty analysis for

complex physical systems modelled by computer simulators. In particular:

• Multi-output emulation

• Structural analysis relating simulators to the world

• Iterative history matching

These methods are widely applicable. In particular, they open a variety of ways

of investigating tipping points systematically.

Page 65: Bayesian Uncertainty Analysis with Application to Tipping Points

References

I. Vernon, M. Goldstein and R. Bower (2010). Galaxy formation: a Bayesian uncertainty analysis (with discussion). Bayesian Analysis, 5(4), 619-670.

M. Goldstein and J. C. Rougier (2009). Reified Bayesian modelling and inference for physical systems (with discussion). Journal of Statistical Planning and Inference, 139, 1221-1239.

J. Cumming and M. Goldstein (2009). Small sample Bayesian designs for complex high-dimensional models based on information gained using fast approximations. Technometrics, 51, 377-388.

M. Goldstein (2006). Subjective Bayesian analysis: principles and practice. Bayesian Analysis, 1, 403-420 (and 'Rejoinder to discussion', 465-472).

M. Goldstein and D. A. Wooff (2007). Bayes Linear Statistics: Theory and Methods. Wiley.

M. C. Kennedy and A. O'Hagan (2001). Bayesian calibration of computer models (with discussion). Journal of the Royal Statistical Society, B, 63, 425-464.

J. J. Bissell, C. C. S. Caiado, M. Goldstein and B. Straughan (2013). Compartmental modelling of social dynamics with generalised peer incidence. Mathematical Models and Methods in Applied Sciences, to appear.