Gaussian process modelling
GEM-SA course - session 2
Outline
Emulators
The basic GP emulator
Practical matters
Emulators
Simulator, meta-model, emulator
I’ll refer to a computer model as a simulator
It aims to simulate some real-world phenomenon
A meta-model is a simplified representation or approximation of a simulator
Built using a training set of simulator runs
Importantly, it should run much more quickly than the simulator itself
So it serves as a quick surrogate for the simulator, for any task that would require many simulator runs
An emulator is a particular kind of meta-model
More than just an approximation, it makes fully probabilistic predictions of what the simulator would produce
And those probability statements correctly reflect the training information
Meta-models
Various kinds of meta-models have been proposed by modellers and model users
Notably regression models and neural networks
But they misrepresent the training data
The line does not pass through the points
The variance around the line also has the wrong form
Emulation
Desirable properties for a meta-model:
If asked to predict the simulator output at one of the training data points, it returns the observed output with zero variance
Assuming the simulator output doesn’t have random noise
So it must be sufficiently flexible to pass through all the training data points
Not restricted to some regression form
If asked to predict output at another point its predictions will have non-zero variance, reflecting realistic uncertainty
Given enough training data it should be able to predict simulator output to any desired accuracy
These properties characterise what we call an emulator
2 code runs
Consider one input and one output
Emulator estimate interpolates data
Emulator uncertainty grows between data points
3 code runs
Adding another point changes estimate and reduces uncertainty
5 code runs
And so on
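The behaviour in these figures is easy to reproduce. Below is a minimal sketch in Python (illustrative only, not GEM-SA code), assuming a zero-mean GP with a squared-exponential covariance and unit variance and correlation length; the formulas behind it come in the next section.

```python
import numpy as np

def predictive_var(x_train, x):
    """Predictive variance at input x for a zero-mean GP with
    squared-exponential covariance, sigma^2 = correlation length = 1."""
    d = x_train[:, None] - x_train[None, :]
    K = np.exp(-d ** 2) + 1e-10 * np.eye(len(x_train))  # jitter for stability
    k = np.exp(-(x_train - x) ** 2)
    return 1.0 - k @ np.linalg.solve(K, k)

runs2 = np.array([0.0, 2.0])                 # two code runs
runs3 = np.array([0.0, 1.0, 2.0])            # add a third run between them
print(predictive_var(runs2, 0.5))            # uncertainty between two runs
print(predictive_var(runs3, 0.5))            # smaller after the extra run
print(predictive_var(runs3, 1.0))            # ~0 at a training point
```

Adding the third run shrinks the predictive variance near it, as in the figures, and at a training point the variance is (numerically) zero.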
The basic GP emulator
Gaussian processes
A Gaussian process (GP) is a probability distribution for an unknown function
A kind of infinite-dimensional multivariate normal distribution
If a function f(x) has a GP distribution, we write f(.) ~ GP(m(.), c(.,.))
m(.) is the mean function
c(.,.) is the covariance function
f(x) has a normal distribution with mean m(x) and variance c(x,x)
c(x,x') is the covariance between f(x) and f(x')
A GP emulator represents the simulator as a GP
Conditional on some unknown parameters
Estimated from the training data
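To make this concrete, here is a minimal sketch in Python of GP prediction (illustrative only, not GEM-SA's implementation): it assumes a zero mean function and a squared-exponential covariance, with σ² and the correlation length fixed rather than estimated from the training data.

```python
import numpy as np

def sq_exp_cov(xa, xb, sigma2=1.0, corr_len=1.0):
    """c(x, x') = sigma^2 exp(-((x - x') / corr_len)^2) for 1-D inputs."""
    d = xa[:, None] - xb[None, :]
    return sigma2 * np.exp(-(d / corr_len) ** 2)

def gp_predict(x_train, y_train, x_new, sigma2=1.0, corr_len=1.0):
    """Posterior mean and variance of f(x_new), assuming a zero prior mean."""
    K = sq_exp_cov(x_train, x_train, sigma2, corr_len)
    K += 1e-10 * np.eye(len(x_train))        # jitter for numerical stability
    k = sq_exp_cov(x_train, x_new, sigma2, corr_len)
    mean = k.T @ np.linalg.solve(K, y_train)
    var = sigma2 - np.sum(k * np.linalg.solve(K, k), axis=0)
    return mean, var

x_train = np.array([0.0, 1.0, 2.0])          # three hypothetical simulator runs
y_train = np.array([1.0, 3.0, 2.0])
mean, var = gp_predict(x_train, y_train, np.array([1.0, 1.5]))
print(mean[0], var[0])                       # at a training point: 3.0, ~0
print(mean[1], var[1])                       # between points: nonzero variance
```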
The mean function
The emulator’s mean function provides the central estimate for predicting the model output f(x)
It has two parts:
1. A conventional regression component
r(x) = μ + β1h1(x) + β2h2(x) + … + βphp(x)
The regression terms hj(x) are a modelling choice
Should reflect how we expect the simulator to respond to its inputs
E.g. r(x) = μ + β1x1 + β2x2 + … + βpxp models a general linear trend
The coefficients μ and βj are estimated from the training data
2. A smooth interpolator of the residuals yi – r(xi) at the training points
Smoothness is controlled by correlation length parameters
Also estimated from the training data
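A minimal sketch of this two-part structure: the regression is fitted by ordinary least squares and its residuals are then smoothly interpolated with a squared-exponential correlation. The correlation length is hand-picked here; as noted above, in the real emulator it and the coefficients are estimated from the training data.

```python
import numpy as np

def sq_exp_corr(xa, xb, corr_len):
    d = xa[:, None] - xb[None, :]
    return np.exp(-(d / corr_len) ** 2)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])       # training inputs
y = np.array([1.2, 2.9, 4.4, 5.8, 8.1])       # training outputs

# Part 1: regression component r(x) = mu + beta * x  (h1(x) = x)
H = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(H, y, rcond=None)
resid = y - H @ coef

# Part 2: smooth interpolator of the residuals
corr_len = 1.5                                # hand-picked for illustration
A = sq_exp_corr(x, x, corr_len) + 1e-10 * np.eye(len(x))
w = np.linalg.solve(A, resid)

def emulator_mean(x_new):
    r = np.column_stack([np.ones_like(x_new), x_new]) @ coef
    return r + sq_exp_corr(x_new, x, corr_len) @ w

print(emulator_mean(np.array([2.0])))         # reproduces y[2] = 4.4 exactly
```

Far from the training data the interpolation term decays to zero, so the emulator mean falls back to the regression component.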
The mean function – example
[Figure: two panels. Left: output y against input x (0 to 5); red dots are training data, the green line is the regression line, the black line is the emulator mean. Right: residuals r against x; red dots are the residuals from the regression at the training points, the black line is the smoothed residuals.]
The prediction variance
The variance of f(x) depends on where x is relative to training data
At a training data point, it is zero
Moving away from a training point, it grows
Growth depends on correlation lengths
When far from any training point (relative to correlation lengths), it resolves into two components:
1. The usual regression variance
2. An interpolator variance, estimated from the observed variance of the residuals
The mean function is then just the regression part
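Numerically the behaviour looks like this (a sketch assuming a zero prior mean and a squared-exponential covariance with σ² = 1 and correlation length 1, and ignoring the regression-variance component):

```python
import numpy as np

x_train = np.array([0.0, 1.0, 2.0])
sigma2, corr_len = 1.0, 1.0

def cov(xa, xb):
    return sigma2 * np.exp(-((xa[:, None] - xb[None, :]) / corr_len) ** 2)

K = cov(x_train, x_train) + 1e-10 * np.eye(len(x_train))

for x in [1.0, 1.5, 3.0, 10.0]:
    k = cov(x_train, np.array([x]))[:, 0]
    var = sigma2 - k @ np.linalg.solve(K, k)
    print(f"x = {x:5.1f}   predictive variance = {var:.4f}")
# zero at the training point, growing in between, -> sigma^2 far away
```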
Correlation length
Correlation length parameters are crucial
But difficult to estimate
There is one correlation length for each input
Points less than one correlation length apart in a single input are highly correlated
Learning f(x') says a lot about f(x)
So if x' is a training point, the predictive uncertainty about f(x) is small
But if we go more than about two correlation lengths away, the correlation is minimal
We then ignore f(x') when predicting f(x)
Just use regression
A large correlation length signifies an input with a very smooth and predictable effect on the simulator output
A small correlation length denotes an input with a more variable and fine-scale influence on the output
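Assuming a squared-exponential correlation function exp(−(d/ℓ)²), one common choice, the numbers behind these rules of thumb are:

```python
import numpy as np

# correlation at distance d, measured in correlation lengths
for d in [0.25, 0.5, 1.0, 2.0, 3.0]:
    print(f"{d:4.2f} correlation lengths apart: correlation = {np.exp(-d**2):.4f}")
# 0.9394, 0.7788, 0.3679, 0.0183, 0.0001
```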
Correlation length and variance
Examples of GP realisations. GEM-SA uses a roughness parameter b, which is the inverse square of the correlation length. σ² is the interpolation variance.
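A sketch of how such realisations can be drawn: with roughness b = 1/(correlation length)², the covariance is σ² exp(−b(x − x')²), and each realisation is one draw from the implied multivariate normal on a grid. The b values and grid here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
xs = np.linspace(0.0, 1.0, 200)
sigma2 = 1.0                                  # interpolation variance

for b in [1.0, 10.0, 100.0]:                  # roughness b = 1 / corr_len^2
    d = xs[:, None] - xs[None, :]
    C = sigma2 * np.exp(-b * d ** 2) + 1e-8 * np.eye(len(xs))
    f = rng.multivariate_normal(np.zeros(len(xs)), C)   # one realisation
    print(f"b = {b:6.1f}: corr length = {1 / np.sqrt(b):.2f}, "
          f"range of f = [{f.min():.2f}, {f.max():.2f}]")
```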
Practical matters
Modelling
The main modelling decision is to choose the regression terms hj(x)
Want to capture the broad shape of the response of the simulator to its inputs
Then residuals are small
Emulator predicts f(x) with small variance
And predicts realistically for x far from training data
If we get it wrong:
Residuals will be unnecessarily large
Emulator has unnecessarily large variance when interpolating
And extrapolates wrongly
Design
Another choice is the set of training data points
This is a kind of experimental design problem
We want points spread over the part of the input space for which the emulator is needed
So that no prediction is too far from a training point
We want this to be true also when we project the points into lower dimensions
So that prediction points are not too far from training points in dimensions (inputs) with small correlation lengths
We also want some points closer to each other
To estimate correlation lengths better
Conventional designs don’t take account of this yet!
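A common starting point is a space-filling design such as a Latin hypercube, which spreads points through the input space and keeps its one-dimensional projections well spread too. A sketch using scipy (note that, as the last point above says, it does not add the pairs of nearby points that help estimate correlation lengths):

```python
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=3, seed=0)     # a simulator with 3 inputs
design = sampler.random(n=20)                 # 20 training points in [0, 1]^3
# rescale to illustrative input ranges for the simulator
design = qmc.scale(design, l_bounds=[0.0, 10.0, -1.0], u_bounds=[1.0, 50.0, 1.0])
print(design.shape)                           # (20, 3)
```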
Validation
No emulator is perfect
The GP emulator is based on assumptions:
A particular form of covariance function, parametrised by just one correlation length parameter per input
Homogeneity of variance and correlation structure
Simulators rarely behave this nicely!
Getting the regression component right
Normality
Not usually a big issue
Estimating parameters accurately from the training data
Can be a problem for correlation lengths
Failure of these assumptions will mean the emulator does not predict faithfully
f(x) will too often lie outside the range of its predictive distribution
So we need to apply suitable diagnostic checks
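One simple diagnostic (a sketch, not necessarily one of GEM-SA's own checks) is leave-one-out cross-validation: re-predict each training run from the remaining ones and examine the standardised errors, most of which should lie within about ±2 if the emulator is predicting faithfully.

```python
import numpy as np

def loo_standardised_errors(x, y, sigma2, corr_len):
    """Leave-one-out (y_i - mean_i) / sd_i under a zero-mean GP with
    squared-exponential covariance; values mostly in (-2, 2) look healthy."""
    n = len(x)
    errs = []
    for i in range(n):
        keep = np.arange(n) != i
        d = x[keep][:, None] - x[keep][None, :]
        K = sigma2 * np.exp(-(d / corr_len) ** 2) + 1e-10 * np.eye(n - 1)
        k = sigma2 * np.exp(-((x[keep] - x[i]) / corr_len) ** 2)
        w = np.linalg.solve(K, k)
        mean = w @ y[keep]
        var = sigma2 - k @ w
        errs.append((y[i] - mean) / np.sqrt(max(var, 1e-12)))
    return np.array(errs)

x = np.array([0.0, 0.7, 1.5, 2.2, 3.0, 4.0])
y = np.sin(x)                                 # stand-in for simulator output
print(loo_standardised_errors(x, y, sigma2=1.0, corr_len=1.0))
```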
When to use GP emulation
The simulator output should vary smoothly in response to changing its inputs
Discontinuities are difficult to emulate
Very rapid and erratic responses to inputs may also need unreasonably many training data points
The simulator is computer intensive
So it’s not practical to run it many thousands of times for Monte Carlo methods
But not so intensive that we can’t run it a few hundred times to build a good emulator
Not too many inputs
Fitting the emulator is hard, particularly if more than a few inputs influence the output strongly
Stochastic simulators
Throughout this course we are assuming the simulator is deterministic
Running it again at the same inputs will produce the same outputs
If there is random noise in the outputs we can modify the emulation theory
Mean function doesn’t have to pass through the data
Noise increases predictive variance
The benefits of the GP emulator are less compelling
But we are working on this!
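The usual modification, sketched below, is to add a "nugget" noise variance to the covariance of the training outputs. The mean then smooths rather than interpolates the data, and the predictive variance at a training input is no longer zero. Parameter values here are illustrative.

```python
import numpy as np

def gp_predict_noisy(x_train, y_train, x_new,
                     sigma2=1.0, corr_len=1.0, nugget=0.1):
    """GP prediction when outputs carry noise: a nugget variance is added
    to the training covariance, so the mean need not interpolate the data."""
    d = x_train[:, None] - x_train[None, :]
    K = sigma2 * np.exp(-(d / corr_len) ** 2) + nugget * np.eye(len(x_train))
    k = sigma2 * np.exp(-((x_train[:, None] - x_new[None, :]) / corr_len) ** 2)
    mean = k.T @ np.linalg.solve(K, y_train)
    # variance of the underlying smooth function; add `nugget` to predict
    # a fresh noisy run at x_new
    var = sigma2 - np.sum(k * np.linalg.solve(K, k), axis=0)
    return mean, var

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 2.0])
mean, var = gp_predict_noisy(x, y, np.array([1.0]))
print(mean, var)    # mean != 3.0 and variance > 0, even at a training input
```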
References
1. O'Hagan, A. (2006). Bayesian analysis of computer code outputs: a tutorial. Reliability Engineering and System Safety 91, 1290-1300.
2. Santner, T. J., Williams, B. J. and Notz, W. I. (2003). The Design and Analysis of Computer Experiments. New York: Springer.
3. Rasmussen, C. E., and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press.