Gaussian process modelling
GEM-SA course - session 2
Outline
Emulators
The basic GP emulator
Practical matters
Emulators
Simulator, meta-model, emulator
I’ll refer to a computer model as a simulator
It aims to simulate some real-world phenomenon
A meta-model is a simplified representation or approximation of a simulator
Built using a training set of simulator runs
Importantly, it should run much more quickly than the simulator itself
So it serves as a quick surrogate for the simulator, for any task that would require many simulator runs
An emulator is a particular kind of meta-model
More than just an approximation, it makes fully probabilistic predictions of what the simulator would produce
And those probability statements correctly reflect the training information
Meta-models
Various kinds of meta-models have been proposed by modellers and model users
Notably regression models and neural networks
But they misrepresent the training data
The line does not pass through the points
The variance around the line also has the wrong form
Emulation
Desirable properties for a meta-model:
If asked to predict the simulator output at one of the training data points, it returns the observed output with zero variance
Assuming the simulator output doesn’t have random noise
So it must be sufficiently flexible to pass through all the training data points
Not restricted to some regression form
If asked to predict output at another point its predictions will have non-zero variance, reflecting realistic uncertainty
Given enough training data it should be able to predict simulator output to any desired accuracy
These properties characterise what we call an emulator
2 code runs
Consider one input and one output
Emulator estimate interpolates data
Emulator uncertainty grows between data points
3 code runs
Adding another point changes estimate and reduces uncertainty
5 code runs
And so on
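The behaviour in these figures is easy to reproduce. Below is a minimal sketch in Python (illustrative only, not GEM-SA code), assuming a zero-mean GP with a squared-exponential covariance and unit variance and correlation length; the formulas behind it come in the next section.

```python
import numpy as np

def predictive_var(x_train, x):
    """Predictive variance at input x for a zero-mean GP with
    squared-exponential covariance, sigma^2 = correlation length = 1."""
    d = x_train[:, None] - x_train[None, :]
    K = np.exp(-d ** 2) + 1e-10 * np.eye(len(x_train))  # jitter for stability
    k = np.exp(-(x_train - x) ** 2)
    return 1.0 - k @ np.linalg.solve(K, k)

runs2 = np.array([0.0, 2.0])                 # two code runs
runs3 = np.array([0.0, 1.0, 2.0])            # add a third run between them
print(predictive_var(runs2, 0.5))            # uncertainty between two runs
print(predictive_var(runs3, 0.5))            # smaller after the extra run
print(predictive_var(runs3, 1.0))            # ~0 at a training point
```

Adding the third run shrinks the predictive variance near it, as in the figures, and at a training point the variance is (numerically) zero.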
The basic GP emulator
Gaussian processes
A Gaussian process (GP) is a probability distribution for an unknown function
A kind of infinite-dimensional multivariate normal distribution
If a function f(x) has a GP distribution, we write f(.) ~ GP(m(.), c(.,.))
m(.) is the mean function
c(.,.) is the covariance function
f(x) has a normal distribution with mean m(x) and variance c(x,x)
c(x,x') is the covariance between f(x) and f(x')
A GP emulator represents the simulator as a GP
Conditional on some unknown parameters
Estimated from the training data
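To make this concrete, here is a minimal sketch in Python of GP prediction (illustrative only, not GEM-SA's implementation): it assumes a zero mean function and a squared-exponential covariance, with σ² and the correlation length fixed rather than estimated from the training data.

```python
import numpy as np

def sq_exp_cov(xa, xb, sigma2=1.0, corr_len=1.0):
    """c(x, x') = sigma^2 exp(-((x - x') / corr_len)^2) for 1-D inputs."""
    d = xa[:, None] - xb[None, :]
    return sigma2 * np.exp(-(d / corr_len) ** 2)

def gp_predict(x_train, y_train, x_new, sigma2=1.0, corr_len=1.0):
    """Posterior mean and variance of f(x_new), assuming a zero prior mean."""
    K = sq_exp_cov(x_train, x_train, sigma2, corr_len)
    K += 1e-10 * np.eye(len(x_train))        # jitter for numerical stability
    k = sq_exp_cov(x_train, x_new, sigma2, corr_len)
    mean = k.T @ np.linalg.solve(K, y_train)
    var = sigma2 - np.sum(k * np.linalg.solve(K, k), axis=0)
    return mean, var

x_train = np.array([0.0, 1.0, 2.0])          # three hypothetical simulator runs
y_train = np.array([1.0, 3.0, 2.0])
mean, var = gp_predict(x_train, y_train, np.array([1.0, 1.5]))
print(mean[0], var[0])                       # at a training point: 3.0, ~0
print(mean[1], var[1])                       # between points: nonzero variance
```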
The mean function
The emulator’s mean function provides the central estimate for predicting the model output f(x)
It has two parts:
1. A conventional regression component
r(x) = μ + β1h1(x) + β2h2(x) + … + βphp(x)
The regression terms hj(x) are a modelling choice
Should reflect how we expect the simulator to respond to its inputs
E.g. r(x) = μ + β1x1 + β2x2 + … + βpxp models a general linear trend
The coefficients μ and βj are estimated from the training data
2. A smooth interpolator of the residuals yi – r(xi) at the training points
Smoothness is controlled by correlation length parameters
Also estimated from the training data
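A minimal sketch of this two-part structure: the regression is fitted by ordinary least squares and its residuals are then smoothly interpolated with a squared-exponential correlation. The correlation length is hand-picked here; as noted above, in the real emulator it and the coefficients are estimated from the training data.

```python
import numpy as np

def sq_exp_corr(xa, xb, corr_len):
    d = xa[:, None] - xb[None, :]
    return np.exp(-(d / corr_len) ** 2)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])       # training inputs
y = np.array([1.2, 2.9, 4.4, 5.8, 8.1])       # training outputs

# Part 1: regression component r(x) = mu + beta * x  (h1(x) = x)
H = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(H, y, rcond=None)
resid = y - H @ coef

# Part 2: smooth interpolator of the residuals
corr_len = 1.5                                # hand-picked for illustration
A = sq_exp_corr(x, x, corr_len) + 1e-10 * np.eye(len(x))
w = np.linalg.solve(A, resid)

def emulator_mean(x_new):
    r = np.column_stack([np.ones_like(x_new), x_new]) @ coef
    return r + sq_exp_corr(x_new, x, corr_len) @ w

print(emulator_mean(np.array([2.0])))         # reproduces y[2] = 4.4 exactly
```

Far from the training data the interpolation term decays to zero, so the emulator mean falls back to the regression component.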
The mean function – example
[Figure: two panels. Left: output y against input x (0 to 5); red dots are training data, the green line is the regression line, the black line is the emulator mean. Right: residuals r against x; red dots are the residuals from the regression at the training points, the black line is the smoothed residuals.]
The prediction variance
The variance of f(x) depends on where x is relative to training data
At a training data point, it is zero
Moving away from a training point, it grows
Growth depends on correlation lengths
When far from any training point (relative to correlation lengths), it resolves into two components:
1. The usual regression variance
2. An interpolator variance, estimated from the observed variance of the residuals
The mean function is then just the regression part
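Numerically the behaviour looks like this (a sketch assuming a zero prior mean and a squared-exponential covariance with σ² = 1 and correlation length 1, and ignoring the regression-variance component):

```python
import numpy as np

x_train = np.array([0.0, 1.0, 2.0])
sigma2, corr_len = 1.0, 1.0

def cov(xa, xb):
    return sigma2 * np.exp(-((xa[:, None] - xb[None, :]) / corr_len) ** 2)

K = cov(x_train, x_train) + 1e-10 * np.eye(len(x_train))

for x in [1.0, 1.5, 3.0, 10.0]:
    k = cov(x_train, np.array([x]))[:, 0]
    var = sigma2 - k @ np.linalg.solve(K, k)
    print(f"x = {x:5.1f}   predictive variance = {var:.4f}")
# zero at the training point, growing in between, -> sigma^2 far away
```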
Correlation length
Correlation length parameters are crucial
But difficult to estimate
There is one correlation length for each input
Points less than one correlation length apart in a single input are highly correlated
Learning f(x') says a lot about f(x)
So if x' is a training point, the predictive uncertainty about f(x) is small
But if we go more than about two correlation lengths away, the correlation is minimal
We then ignore f(x') when predicting f(x)
Just use regression
A large correlation length signifies an input with a very smooth and predictable effect on the simulator output
A small correlation length denotes an input with a more variable and fine-scale influence on the output
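Assuming a squared-exponential correlation function exp(−(d/ℓ)²), one common choice, the numbers behind these rules of thumb are:

```python
import numpy as np

# correlation at distance d, measured in correlation lengths
for d in [0.25, 0.5, 1.0, 2.0, 3.0]:
    print(f"{d:4.2f} correlation lengths apart: correlation = {np.exp(-d**2):.4f}")
# 0.9394, 0.7788, 0.3679, 0.0183, 0.0001
```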
Correlation length and variance
Examples of GP realisations. GEM-SA uses a roughness parameter b, which is the inverse square of the correlation length. σ² is the interpolation variance.
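A sketch of how such realisations can be drawn: with roughness b = 1/(correlation length)², the covariance is σ² exp(−b(x − x')²), and each realisation is one draw from the implied multivariate normal on a grid. The b values and grid here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
xs = np.linspace(0.0, 1.0, 200)
sigma2 = 1.0                                  # interpolation variance

for b in [1.0, 10.0, 100.0]:                  # roughness b = 1 / corr_len^2
    d = xs[:, None] - xs[None, :]
    C = sigma2 * np.exp(-b * d ** 2) + 1e-8 * np.eye(len(xs))
    f = rng.multivariate_normal(np.zeros(len(xs)), C)   # one realisation
    print(f"b = {b:6.1f}: corr length = {1 / np.sqrt(b):.2f}, "
          f"range of f = [{f.min():.2f}, {f.max():.2f}]")
```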
Practical matters
Modelling
The main modelling decision is to choose the regression terms hj(x)
Want to capture the broad shape of the response of the simulator to its inputs
Then residuals are small
Emulator predicts f(x) with small variance
And predicts realistically for x far from training data
If we get it wrong:
Residuals will be unnecessarily large
Emulator has unnecessarily large variance when interpolating
And extrapolates wrongly
Design
Another choice is the set of training data points
This is a kind of experimental design problem
We want points spread over the part of the input space for which the emulator is needed
So that no prediction is too far from a training point
We want this to be true also when we project the points into lower dimensions
So that prediction points are not too far from training points in dimensions (inputs) with small correlation lengths
We also want some points closer to each other
To estimate correlation lengths better
Conventional designs don’t take account of this yet!
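A common starting point is a space-filling design such as a Latin hypercube, which spreads points through the input space and keeps its one-dimensional projections well spread too. A sketch using scipy (note that, as the last point above says, it does not add the pairs of nearby points that help estimate correlation lengths):

```python
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=3, seed=0)     # a simulator with 3 inputs
design = sampler.random(n=20)                 # 20 training points in [0, 1]^3
# rescale to illustrative input ranges for the simulator
design = qmc.scale(design, l_bounds=[0.0, 10.0, -1.0], u_bounds=[1.0, 50.0, 1.0])
print(design.shape)                           # (20, 3)
```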
Validation
No emulator is perfect
The GP emulator is based on assumptions:
A particular form of covariance function, parametrised by just one correlation length parameter per input
Homogeneity of variance and correlation structure
Simulators rarely behave this nicely!
Getting the regression component right
Normality
Not usually a big issue
Estimating parameters accurately from the training data
Can be a problem for correlation lengths
Failure of these assumptions will mean the emulator does not predict faithfully
f(x) will too often lie outside the range of its predictive distribution
So we need to apply suitable diagnostic checks
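One simple diagnostic (a sketch, not necessarily one of GEM-SA's own checks) is leave-one-out cross-validation: re-predict each training run from the remaining ones and examine the standardised errors, most of which should lie within about ±2 if the emulator is predicting faithfully.

```python
import numpy as np

def loo_standardised_errors(x, y, sigma2, corr_len):
    """Leave-one-out (y_i - mean_i) / sd_i under a zero-mean GP with
    squared-exponential covariance; values mostly in (-2, 2) look healthy."""
    n = len(x)
    errs = []
    for i in range(n):
        keep = np.arange(n) != i
        d = x[keep][:, None] - x[keep][None, :]
        K = sigma2 * np.exp(-(d / corr_len) ** 2) + 1e-10 * np.eye(n - 1)
        k = sigma2 * np.exp(-((x[keep] - x[i]) / corr_len) ** 2)
        w = np.linalg.solve(K, k)
        mean = w @ y[keep]
        var = sigma2 - k @ w
        errs.append((y[i] - mean) / np.sqrt(max(var, 1e-12)))
    return np.array(errs)

x = np.array([0.0, 0.7, 1.5, 2.2, 3.0, 4.0])
y = np.sin(x)                                 # stand-in for simulator output
print(loo_standardised_errors(x, y, sigma2=1.0, corr_len=1.0))
```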
When to use GP emulation
The simulator output should vary smoothly in response to changing its inputs
Discontinuities are difficult to emulate
Very rapid and erratic responses to inputs may also need unreasonably many training data points
The simulator is computer intensive
So it’s not practical to run it many thousands of times for Monte Carlo methods
But not so intensive that we can’t run it a few hundred times to build a good emulator
Not too many inputs
Fitting the emulator is hard, particularly if more than a few inputs influence the output strongly
Stochastic simulators
Throughout this course we are assuming the simulator is deterministic
Running it again at the same inputs will produce the same outputs
If there is random noise in the outputs we can modify the emulation theory
Mean function doesn’t have to pass through the data
Noise increases predictive variance
The benefits of the GP emulator are less compelling
But we are working on this!
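The usual modification, sketched below, is to add a "nugget" noise variance to the covariance of the training outputs. The mean then smooths rather than interpolates the data, and the predictive variance at a training input is no longer zero. Parameter values here are illustrative.

```python
import numpy as np

def gp_predict_noisy(x_train, y_train, x_new,
                     sigma2=1.0, corr_len=1.0, nugget=0.1):
    """GP prediction when outputs carry noise: a nugget variance is added
    to the training covariance, so the mean need not interpolate the data."""
    d = x_train[:, None] - x_train[None, :]
    K = sigma2 * np.exp(-(d / corr_len) ** 2) + nugget * np.eye(len(x_train))
    k = sigma2 * np.exp(-((x_train[:, None] - x_new[None, :]) / corr_len) ** 2)
    mean = k.T @ np.linalg.solve(K, y_train)
    # variance of the underlying smooth function; add `nugget` to predict
    # a fresh noisy run at x_new
    var = sigma2 - np.sum(k * np.linalg.solve(K, k), axis=0)
    return mean, var

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 2.0])
mean, var = gp_predict_noisy(x, y, np.array([1.0]))
print(mean, var)    # mean != 3.0 and variance > 0, even at a training input
```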
References
1. O'Hagan, A. (2006). Bayesian analysis of computer code outputs: a tutorial. Reliability Engineering and System Safety 91, 1290-1300.
2. Santner, T. J., Williams, B. J. and Notz, W. I. (2003). The Design and Analysis of Computer Experiments. New York: Springer.
3. Rasmussen, C. E., and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press.