Slide 1 John Paul Gosling University of Sheffield Uncertainty and Sensitivity Analysis of Complex Computer Codes

Post on 01-Apr-2015

Slide 1

John Paul Gosling

University of Sheffield

Uncertainty and Sensitivity Analysis of Complex Computer Codes

Slide 2 (mucm.group.shef.ac.uk)

Outline

Uncertainty and computer models

Uncertainty analysis (UA)

Sensitivity analysis (SA)

Slide 3

Why worry about uncertainty?

How accurate are model predictions?

There is increasing concern about uncertainty in model outputs, particularly where model predictions are used to inform scientific debate or environmental policy

Are model predictions robust enough for high-stakes decision-making?

Slide 4

For instance…

Models for climate change produce different predictions for the extent of global warming or other consequences

Which ones should we believe?

What error bounds should we put around these?

Are model differences consistent with the error bounds?

Until we can answer such questions convincingly, decision makers can continue to dismiss our results

Slide 5

Where is the uncertainty?

Several principal sources of uncertainty:

Accuracy of parameters in model equations

Accuracy of data inputs

Accuracy of the model in representing the real phenomenon, even with accurate values for parameters and data

In this section, we will be concerned with the first two

Slide 6

Inputs

We will interpret “inputs” widely:

Initial conditions

Other data defining the particular context being simulated

Forcing data (e.g. rainfall in hydrology models)

Parameters in model equations; these are often hard-wired (which is a problem!)

Slide 7

Input uncertainty

We are typically uncertain about the values of many of the inputs:

Measurement error

Lack of knowledge

Parameters with no real physical meaning

However, we must have beliefs about the parameters.

The elicitation of these beliefs must be done in a careful manner as there will often be no data to contradict or support them.

Slide 8

Output Uncertainty

Input uncertainty induces uncertainty in the output y

The output y therefore also has a probability distribution

In theory, this is completely determined by the probability distribution on x and the model f

In practice, finding this distribution and its properties is not straightforward

Slide 9

A trivial model

Suppose we have just two inputs and a simple linear model

y = x1 + 3*x2

Suppose that x1 and x2 have independent uniform distributions over [0, 1], i.e. they define a point that is equally likely to be anywhere in the unit square

Then we can determine the distribution of y exactly

Slide 10

A trivial model – y’s distribution

The distribution of y has this trapezium form
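The trapezium can be written down explicitly. As a worked sketch (the density symbol p(y) is notation introduced here, not taken from the slides): x1 is uniform on [0, 1] and 3*x2 is uniform on [0, 3], so the density of their sum is the convolution of two rectangles:

```latex
p(y) =
\begin{cases}
  y/3,       & 0 \le y \le 1, \\
  1/3,       & 1 \le y \le 3, \\
  (4 - y)/3, & 3 \le y \le 4, \\
  0,         & \text{otherwise.}
\end{cases}
```

The three non-zero pieces have areas 1/6, 2/3 and 1/6, which sum to 1 as a density must.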

Slide 11

A trivial model – y’s distribution

If x1 and x2 have normal distributions (x1, x2 ~ N(0.5, 0.25^2)), we get a normal output
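Assuming x1 and x2 are independent (the slide does not say so explicitly), the output distribution for y = x1 + 3*x2 follows from the usual rules for linear combinations of normal variables:

```latex
\mathrm{E}[y] = 0.5 + 3 \times 0.5 = 2, \qquad
\mathrm{Var}[y] = 0.25^2 + 3^2 \times 0.25^2 = 0.625, \qquad
y \sim N(2,\ 0.625)
```

so the standard deviation of y is about 0.79.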

Slide 12

A slightly less trivial model

Now consider the simple nonlinear model

y = sin(x1)/{1+exp(x1+x2)}

We still have only 2 inputs and quite a simple equation

But even for nice input distributions, we cannot get the output distribution exactly

The simplest way to compute it would be by Monte Carlo

Slide 13

Monte Carlo output distribution

This is for the normal inputs

10,000 random normal pairs were generated and y calculated for each pair
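A minimal sketch of this Monte Carlo procedure, assuming independent N(0.5, 0.25^2) inputs as on the earlier slide. The seed, the number of bins and the names `model`, `x1`, `x2` and `y` are choices made for this illustration, and a text histogram stands in for the plot that appeared on the slide.

```python
import numpy as np

def model(x1, x2):
    """The simple nonlinear model from the slides."""
    return np.sin(x1) / (1.0 + np.exp(x1 + x2))

rng = np.random.default_rng(seed=1)  # seed chosen arbitrarily
n_runs = 10_000

# Assumption: the two inputs are independent N(0.5, 0.25^2)
x1 = rng.normal(loc=0.5, scale=0.25, size=n_runs)
x2 = rng.normal(loc=0.5, scale=0.25, size=n_runs)
y = model(x1, x2)

# Crude text histogram of the output sample (one '#' per 50 runs)
counts, edges = np.histogram(y, bins=12)
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"{left:7.3f} to {right:7.3f}  {'#' * (count // 50)}")
```

Each of the 10,000 rows is one run of the model, which is why this approach becomes expensive when a single run is slow.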

Slide 14

Uncertainty analysis (UA)

The process of characterising the distribution of the output y is called uncertainty analysis

Plotting the distribution is a good graphical way to characterise it

Quantitative summaries are often more important:

Mean, median

Standard deviation, quartiles

Probability intervals

Slide 15

UA of slightly nonlinear model

Mean = 0.117, median = 0.122

Std. dev. = 0.049

50% range (quartiles) = [0.093, 0.148]

95% range = [0.002, 0.200]
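Summaries of this kind can be computed directly from a Monte Carlo sample. The sketch below assumes independent N(0.5, 0.25^2) inputs; `summarise` is a hypothetical helper written for this example, and because the sample is random the printed figures should be close to, but not identical to, those on the slide.

```python
import numpy as np

def model(x1, x2):
    return np.sin(x1) / (1.0 + np.exp(x1 + x2))

def summarise(y):
    """Return the UA summaries listed on the slide for a sample of outputs."""
    q025, q25, q50, q75, q975 = np.percentile(y, [2.5, 25, 50, 75, 97.5])
    return {
        "mean": float(np.mean(y)),
        "median": float(q50),
        "std": float(np.std(y, ddof=1)),
        "range_50": (float(q25), float(q75)),     # quartiles
        "range_95": (float(q025), float(q975)),   # central 95% interval
    }

rng = np.random.default_rng(seed=2)  # seed chosen arbitrarily
x1 = rng.normal(0.5, 0.25, size=10_000)
x2 = rng.normal(0.5, 0.25, size=10_000)
summary = summarise(model(x1, x2))
for name, value in summary.items():
    print(name, value)
```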

Slide 16

UA versus plug-in

Even if we just want to estimate y, UA does better than the “plug-in” approach of running the model for estimated values of x

For the simple nonlinear model, the central estimates of x1 and x2 are 0.5, but

sin(0.5)/(1+exp(1)) = 0.129

is slightly too high an estimate of y compared with the mean of 0.117 or median of 0.122

The difference can be much more marked for highly nonlinear models
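A second-order Taylor expansion about the central estimates gives a rough sense of where the gap comes from. This is an approximation added for illustration, assuming independent N(0.5, 0.25^2) inputs and writing f for the nonlinear model; the second derivatives were evaluated at x1 = x2 = 0.5:

```latex
\mathrm{E}[y] \approx f(0.5, 0.5)
  + \tfrac{1}{2}\,\sigma^2
    \left( \frac{\partial^2 f}{\partial x_1^2}
         + \frac{\partial^2 f}{\partial x_2^2} \right)
  \approx 0.129 + \tfrac{1}{2}\,(0.25^2)\,(-0.430 + 0.044)
  \approx 0.117
```

The negative curvature in x1 is what pulls the mean below the plug-in value.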

Slide 17

Summary

Why UA?

Proper quantification of output uncertainty (this needs a proper probabilistic expression of input uncertainty)

Improved central estimate of output, better than the usual plug-in approach

Slide 18

Which inputs affect output most?

This is a common question

Sensitivity analysis (SA) attempts to address it

There are various forms of SA

The methods most frequently used are not the most helpful!

Slide 19

Local sensitivity analysis

To measure the sensitivity of y to input x_i, compute the derivative of y with respect to x_i

Nonlinear model: At x1 = x2 = 0.5, the derivatives are

wrt x1, 0.142; wrt x2, –0.094

What does this tell us?
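The derivatives quoted on this slide can be reproduced numerically. The sketch below uses central finite differences, which is one convenient way to do it and not necessarily how the slide's values were obtained; `local_sensitivities` and the step size `h` are choices made for this example.

```python
import math

def model(x1, x2):
    return math.sin(x1) / (1.0 + math.exp(x1 + x2))

def local_sensitivities(f, x1, x2, h=1e-5):
    """Central finite-difference estimates of df/dx1 and df/dx2 at (x1, x2)."""
    d1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    d2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    return d1, d2

d_x1, d_x2 = local_sensitivities(model, 0.5, 0.5)
print(f"wrt x1: {d_x1:.3f}, wrt x2: {d_x2:.3f}")
```

Both numbers describe the slope at a single point only, and both would change if an input were expressed in different units.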

Slide 20

Local SA – deficiencies

Derivatives are evaluated at the central estimate, and could be quite different at other points nearby

Doesn’t capture interactions between inputs, e.g. the sensitivity of y to increasing both x1 and x2 could be greater or less than the sum of their individual sensitivities

Not invariant to change of units

Not invariant to change of units

Slide 21

One-way SA

Vary inputs one at a time from the central estimate

Nonlinear model:

Vary x1 to 0.25, 0.75, output is 0.079, 0.152

Vary x2 to 0.25, 0.75, output is 0.154, 0.107
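These one-way runs are easy to script. A small sketch, where the dictionary `central` and the tuple `levels` are names introduced for this example:

```python
import math

def model(x1, x2):
    return math.sin(x1) / (1.0 + math.exp(x1 + x2))

central = {"x1": 0.5, "x2": 0.5}   # central estimates from the slides
levels = (0.25, 0.75)              # values each input is moved to

one_way = {}
for name in central:
    outputs = []
    for level in levels:
        point = dict(central)      # start from the central estimate
        point[name] = level        # move one input only
        outputs.append(model(**point))
    one_way[name] = outputs
    print(name, [f"{out:.3f}" for out in outputs])
```

Changing `levels` changes the apparent ranking of the inputs, which is the weakness of this approach.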

Is this really a good idea?

Slide 22

One-way SA – deficiencies

Depends on how far we vary each input

Relative sensitivities of different inputs change if we change the ranges

Also fails to capture interactions

Statisticians have known for decades that varying factors one at a time is bad experimental design!

Slide 23

Multi-way SA

Vary factors two or more at a time, maybe using a statistical factorial design

Full factorial designs require very many runs

Can find interactions, but these are hard to interpret

Often just look for the biggest change of output among all runs

Still dependent on how far we vary each input

Slide 24

Probabilistic SA (PSA)

Inputs are varied according to their probability distributions, as in UA

Gives an overall picture and can identify interactions

Slide 25

Variance decomposition

One way to characterise the sensitivity of the output to individual inputs is to compute how much of the UA variance is due to each input

For the simple non-linear model, we have

Input                 Contribution

X1                    80.30 %

X2                    16.77 %

X1.X2 interaction      2.93 %
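One conventional way to estimate such a decomposition is a Monte Carlo "pick-freeze" estimator of the first-order variance shares. The sketch below is an illustration only: it assumes independent N(0.5, 0.25^2) inputs (the slides do not state which input distribution the table was computed under), it is not the emulator-based method behind the table, and its percentages need not match the table exactly.

```python
import numpy as np

def model(x1, x2):
    return np.sin(x1) / (1.0 + np.exp(x1 + x2))

rng = np.random.default_rng(seed=3)  # seed chosen arbitrarily
n = 200_000

# Two independent input samples, one column per input
A = rng.normal(0.5, 0.25, size=(n, 2))
B = rng.normal(0.5, 0.25, size=(n, 2))
y_A = model(A[:, 0], A[:, 1])
y_B = model(B[:, 0], B[:, 1])
var_y = np.var(np.concatenate([y_A, y_B]))

shares = {}
for i, name in enumerate(["x1", "x2"]):
    AB = A.copy()
    AB[:, i] = B[:, i]            # take input i from B, the other input from A
    y_AB = model(AB[:, 0], AB[:, 1])
    # Estimates Var(E[y | x_i]) / Var(y)
    shares[name] = float(np.mean(y_B * (y_AB - y_A)) / var_y)

# With two inputs, whatever is left over is the interaction share
shares["interaction"] = 1.0 - shares["x1"] - shares["x2"]

for name, share in shares.items():
    print(f"{name:12s} {100 * share:6.2f} %")
```

Note that this uses 600,000 model runs, which is only feasible because the example model is trivial to evaluate.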

Slide 26

Main effects

We can also plot the effect of varying one input averaged over the others

Nonlinear model:

Averaging y = sin(x1)/{1+exp(x1+x2)} with respect to the uncertainty in x2, we can plot it as a function of x1

Similarly, we can plot it as a function of x2 averaged over uncertainty in x1

We can also plot interaction effects
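The averaging described above can be sketched by brute force: fix one input on a grid and average the model over random draws of the other. The input distribution (independent N(0.5, 0.25^2)), the grid on [0, 1] and the names `main_x1` and `main_x2` are assumptions made for this example.

```python
import numpy as np

def model(x1, x2):
    return np.sin(x1) / (1.0 + np.exp(x1 + x2))

rng = np.random.default_rng(seed=4)          # seed chosen arbitrarily
other = rng.normal(0.5, 0.25, size=20_000)   # draws of the input being averaged out
grid = np.linspace(0.0, 1.0, 21)             # values of the input being kept fixed

# Average over x2 at each fixed x1, and over x1 at each fixed x2
main_x1 = np.array([model(g, other).mean() for g in grid])
main_x2 = np.array([model(other, g).mean() for g in grid])

for g, m1, m2 in zip(grid[::5], main_x1[::5], main_x2[::5]):
    print(f"x = {g:.2f}   E[y | x1 = x] = {m1:.3f}   E[y | x2 = x] = {m2:.3f}")
```

Plotting `main_x1` and `main_x2` against `grid` gives one curve per input.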

Slide 27

Nonlinear example – main effects

[Figure: main effects, y (axis 0.00 to 0.15) plotted against x (axis 0.0 to 1.0)]

Red is main effect of x1 (averaged over x2)

Blue is main effect of x2 (averaged over x1)

Slide 28

Summary

Why SA?

For the model user: identifies which inputs it would be most useful to reduce uncertainty about

For the model builder: main effect and interaction plots demonstrate how the model is behaving, sometimes surprisingly!

Slide 29

What’s this got to do with emulators?

Computation of UA and (particularly) SA by conventional methods (like Monte Carlo) can be an enormous task for complex environmental models

Typically at least 10,000 model runs are needed

Not very practical when each run takes 1 minute (10,000 minutes is about a week of computing)

And out of the question if a run takes 30 minutes (about 30 weeks of computing)

Emulators use only a fraction of the model runs, and their probabilistic framework helps keep track of all the uncertainty.

Slide 30

What’s to come?

GEM-SA is the first stage of the GEM project (GEM = “Gaussian Emulation Machine”)

It uses highly efficient emulation methods based on Bayesian statistics

The fundamental idea is that of “emulating” the physical model by a statistical representation called a Gaussian process
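To make the idea concrete, here is a heavily simplified sketch of a Gaussian process emulator for the nonlinear example, written with plain NumPy. It is not GEM-SA's algorithm: the kernel, the 6 x 6 training grid, the fixed `length_scale` and the `nugget` are all assumptions chosen by hand for this illustration, whereas a real emulator would estimate its hyperparameters and also report predictive uncertainty.

```python
import numpy as np

def model(x1, x2):
    return np.sin(x1) / (1.0 + np.exp(x1 + x2))

def sq_exp_kernel(a, b, length_scale):
    """Squared-exponential covariance between two sets of input points."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

# Training design: 36 model runs on a grid covering +/- 3 sd of each input
axis = np.linspace(-0.25, 1.25, 6)
g1, g2 = np.meshgrid(axis, axis)
X_train = np.column_stack([g1.ravel(), g2.ravel()])
y_train = model(X_train[:, 0], X_train[:, 1])

length_scale = 0.5   # assumption: fixed by hand, not estimated
nugget = 1e-6        # small jitter for numerical stability
y_mean = y_train.mean()
K = sq_exp_kernel(X_train, X_train, length_scale) + nugget * np.eye(len(X_train))
weights = np.linalg.solve(K, y_train - y_mean)

def emulate(X_new):
    """Posterior mean of the GP at new input points (rows of X_new)."""
    return y_mean + sq_exp_kernel(X_new, X_train, length_scale) @ weights

# Monte Carlo UA run on the cheap emulator instead of the model itself
rng = np.random.default_rng(seed=5)  # seed chosen arbitrarily
X_mc = rng.normal(0.5, 0.25, size=(10_000, 2))
y_emulated = emulate(X_mc)
print("emulated mean of y:", float(y_emulated.mean()))
print("emulator at (0.5, 0.5):", float(emulate(np.array([[0.5, 0.5]]))[0]))
```

The point of the design is that only the 36 training runs touch the model; the 10,000 Monte Carlo evaluations are made on the emulator.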

GEM-SA does UA and SA

Future stages of GEM will add more functionality

Slide 31

References

There are two papers that cover the material in these slides:

Oakley and O’Hagan (2002). Bayesian inference for the uncertainty distribution of computer model outputs, Biometrika, 89, 769–784.

Oakley and O’Hagan (2004). Probabilistic sensitivity analysis of complex models: a Bayesian approach, J. R. Statist. Soc. B, 66, 751–769.