data-driven modelling. part 4. models of uncertaintymodels ... · pdf file1 data-driven...

29
1 Data-driven modelling. Part 4. Models of uncertainty Models of uncertainty Dimitri P. Solomatine UNESCO-IHE Institute for Water Education Hydroinformatics Chair Let’s consider a (physically-based) Model Example: a hydrological conceptual model SF Snow SF Snow Input(s) SM RF R EA SP SF IN SF Snow RF – Rain EA – Evapotranspiration SP – Snow cover IN – Infiltration R – Recharge SM – Soil moisture CFLUX – Capillary transport UZ – Storage in upper reservoir PERC – Percolation LZ – Storage in lower reservoir SM RF RF R R EA EA SP SF SF IN IN SF Snow RF – Rain EA – Evapotranspiration SP – Snow cover IN – Infiltration R – Recharge SM – Soil moisture CFLUX – Capillary transport UZ – Storage in upper reservoir PERC – Percolation LZ – Storage in lower reservoir Parameters States D.P. Solomatine. Data-driven models of uncertainty 2 LZ UZ R PERC Q=Q0+Q1 Q1 Transform function Q0 CFLUX Qo – Fast runoff component Q1 – Slow runoff component Q – Total runoff LZ UZ R R PERC PERC Q=Q0+Q1 Q1 Q1 Transform function Q0 Q0 CFLUX CFLUX Qo – Fast runoff component Q1 – Slow runoff component Q – Total runoff Output

Upload: dangque

Post on 06-Mar-2018

226 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

1

Data-driven modelling.Part 4.

Models of uncertaintyModels of uncertainty Dimitri P. Solomatine

UNESCO-IHE Institute for Water EducationHydroinformatics Chair

Let’s consider a (physically-based) Model

Example: a hydrological conceptual modelSF SnowSF Snow

Input(s)

SM

RF

R

EA

SP

SF

IN

SF – Snow

RF – Rain

EA – Evapotranspiration

SP – Snow cover

IN – Infiltration

R – Recharge

SM – Soil moisture

CFLUX – Capillary transport

UZ – Storage in upper reservoir

PERC – Percolation

LZ – Storage in lower reservoirSM

RFRF

RR

EAEA

SP

SFSF

ININ

SF – Snow

RF – Rain

EA – Evapotranspiration

SP – Snow cover

IN – Infiltration

R – Recharge

SM – Soil moisture

CFLUX – Capillary transport

UZ – Storage in upper reservoir

PERC – Percolation

LZ – Storage in lower reservoir

ParametersStates

D.P. Solomatine. Data-driven models of uncertainty 2

LZ

UZ

R

PERC Q=Q0+Q1Q1

Transformfunction

Q0

CFLUX Qo – Fast runoff component

Q1 – Slow runoff component

Q – Total runoff

LZ

UZ

RR

PERCPERC Q=Q0+Q1Q1Q1

Transformfunction

Q0Q0

CFLUXCFLUX Qo – Fast runoff component

Q1 – Slow runoff component

Q – Total runoff Output

Page 2: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

2

Model output: deterministic forecasts. How to find the 90% uncertainty bounds?

3500

4000

1500

2000

2500

3000

3500

Dis

char

ge(m

3/s)

D.P. Solomatine. Data-driven models of uncertainty 3

900 920 940 960 980 1000 10200

500

1000

Time(days)

Steps in uncertainty analysis

Identification of sources of uncertainty (Input, parameter, model operational)model, operational)

Quantification of uncertainty (e.g. distribution and range of parameters)

Reduction of uncertainty (e.g. better understanding of the model, accurate measurement etc)

Studying propagation of uncertainty through the model ( b M t C l i l ti )

D.P. Solomatine. Data-driven models of uncertainty 4

(e.g. by Monte Carlo simulation) Quantification of uncertainty in the model outputs (e.g.

analysis of the outputs of the results) Application of the uncertain information in decision

making process

Page 3: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

3

Sources of model uncertaintyy = M(x, p) + εs + εθ + εx + εy

SF – Snow

RF – Rain

SF – Snow

RF – Rain

D.P. Solomatine. Data-driven models of uncertainty 5

LZ

UZ

SM

RF

R

PERC

EA

Q=Q0+Q1Q1

Transformfunction

SP

Q0

SF

CFLUX

IN

EA – Evapotranspiration

SP – Snow cover

IN – Infiltration

R – Recharge

SM – Soil moisture

CFLUX – Capillary transport

UZ – Storage in upper reservoir

PERC – Percolation

LZ – Storage in lower reservoir

Qo – Fast runoff component

Q1 – Slow runoff component

Q – Total runoff

LZ

UZ

SM

RFRF

RR

PERCPERC

EAEA

Q=Q0+Q1Q1Q1

Transformfunction

SP

Q0Q0

SFSF

CFLUXCFLUX

ININ

EA – Evapotranspiration

SP – Snow cover

IN – Infiltration

R – Recharge

SM – Soil moisture

CFLUX – Capillary transport

UZ – Storage in upper reservoir

PERC – Percolation

LZ – Storage in lower reservoir

Qo – Fast runoff component

Q1 – Slow runoff component

Q – Total runoff

Sources of model uncertainty

Observational (data) uncertainty (εx , εy) a) measurement errors due to both instrumental and human

y = M(x, p) + εs + εθ + εx + εy

a) measurement errors due to both instrumental and human errors

b) error due to inadequate representativeness of a data sample due to scale incompatibility or difference in time and space between the variable measured by the instrument and the corresponding model variable.

Model (structural) uncertainty εs

D.P. Solomatine. Data-driven models of uncertainty 6

( ) y s

Parametric uncertainty εθ

Page 4: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

4

Measures of Uncertainty

1 400

Probability distributionConfidence bounds (two quantiles, say 5% and 95%)

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Pre

dic

tive

Pro

ba

bili

ty P

(H<

=h

)

50

100

150

200

250

300

350

Dis

cha

rge

(m

3.s

)

D.P. Solomatine. Data-driven models of uncertainty 77

8 8.5 9 9.5 10 10.5 11 11.5 120

Observed River Stage h (m)

10 20 30 40 50 60 700

Time (hr)

Other measures : range, variance, other moments

Uncertainty of a calibrated (“optimal”) model

Output YModel

Model output y^ Measured

Uncertainty of an optimal model M Model is calibrated on measured data y

time

Actual

Actual value y*(unknown)

Measured value y

MeasuredModel error

Observation error

D.P. Solomatine. Data-driven models of uncertainty 8

Model is calibrated on measured data y We say the model M uncertainty is manifested in the model error This error incorporates all uncertainties due to:

observational errors, inaccurately estimated parameters, inadequate model structure

We can bulid a ML model U to predict the pdf of the model M error

Page 5: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

5

Propagation of uncertainty through the model

y^ = M (x, p) x = input p = parameters x = input, p = parameters

Uncertainty in X and p propagates to output y pdf of parameters pdf of output pdfp pdfy pdf of inputs pdfx pdf of output pdfx pdfy

D.P. Solomatine. Data-driven models of uncertainty 9

Suggested approach to “building up”model uncertainty

uncertainty of an optimal model M (p*)t i t f M ( *) d t DATA t i t + uncertainty of M (p*) due to DATA uncertainty

+ uncertainty of M (p) due to PARAMETERS uncertainty + uncertainty of M (p) due to STRUCTURAL uncertainty = uncertainty of a model class M (p), given the

probabilistic properties of parameters and data

D.P. Solomatine. Data-driven models of uncertainty 10

Page 6: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

6

Monte Carlo Simulation

Mote Carlo casino: roulette wheel

D.P. Solomatine. Data-driven models of uncertainty 12

It is a random number generator – uses uniform distribution with the range of [0, 36]

Page 7: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

7

Monte Carlo simulation in analysing parametric uncertainty

D.P. Solomatine. Data-driven models of uncertainty 13

Monte Carlo sampling: illustrationy = M(x, s, θ) + εs + εθ + εx + εy

D.P. Solomatine. Data-driven models of uncertainty 14

Page 8: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

8

Monte Carlo simulation of parametric uncertainty

Consider the model M calculating y (e.g., discharge) y^(t) = M (X(t) p) y (t) = M (X(t), p) where X(t) = vector of inputs (precipitation, temperature etc)

known for t = 1,…, T p = vector of parameters (soil properties, roughness, etc)

Monte Carlo approach: sample N parameter vectors pi

run the model for each of them y^i(t) = M (X(t), pi)

D.P. Solomatine. Data-driven models of uncertainty 15

y i( ) ( ( ), pi)and generate N outputs (leave some of them if GLUE used)

assess distribution of Qi(t) for each time moment t (or its parameters - mean, variance, prediction intervals, quantiles)

Issues with Monte Carlo

There is however a problem during model operation: How to assess the parametric uncertainty of the p y

model M for t = T+1 when new input data X(t+1) is fed?

It is difficult since MC analysis is computationally intensive: 1) convergence of the Monte Carlo simulation is very slow

(O(N^-0.5)) so larger number of runs needed to establish a reliable estimate of uncertainties2) b f i l ti i ti ll ith th

D.P. Solomatine. Data-driven models of uncertainty 16

2) number of simulation increases exponentially with the dimension of the parameter vector ((O(n^d)) to cover the entire parameter domain

Idea: encapsulate the results of MC simulation in a machine learning

model (MLUE method, to be considered later)

Page 9: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

9

Economic sampling: Latin Hypercube Sampling (LHS), McKay 1979

the range of each variable is divided into M equally probable intervals

M sample points are placed as shown

the number of divisions, M, is equal for each variable

D.P. Solomatine. Data-driven models of uncertainty 17

for each variable LHS does not require

more samples for more dimensions (variables)

Building Models of Uncertainty : Using Methods of

Computational Intelligence

Page 10: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

10

Data: attributes, inputs, outputs

Data often missing, noisyg, y temporal and spatial, heterogeneous non-stationary

Measured data Attributes

Inputs OutputInstances x1 x2 … xn y

Model output

y* = f (x)

y*

D.P. Solomatine. Data-driven models of uncertainty 19

Instances x1 x2 … xn yInstance 1 x11 x12 x1n y1

Instance 2 x21 x21 x2 n y2

…Instance K xK1 xK2 xK n yK

yy1*y2*…yK*

CI in building models of natural processes -why not build a model of uncertainty?

Actual (observed) t t Y

Input dataNatural process

Xoutput Y

Data-drivenmodel

M (p, x)P di t d t t Y’

Error (p) min

yyKKxxKK nnxxKK22xxKK11Instance KInstance K……

yy22xx22 nnxx2121xx2121Instance Instance 22yy11xx11nnxx1212xx1111Instance Instance 11yyxxnn……xx22xx11InstancesInstances

OutputOutputInputsInputsAttributesAttributesMeasured dataMeasured data

yyKKxxKK nnxxKK22xxKK11Instance KInstance K……

yy22xx22 nnxx2121xx2121Instance Instance 22yy11xx11nnxx1212xx1111Instance Instance 11yyxxnn……xx22xx11InstancesInstances

OutputOutputInputsInputsAttributesAttributesMeasured dataMeasured data

D.P. Solomatine. Data-driven models of uncertainty 20

CI provides methods to build Data-driven models Ideally, such models are “ultimate models” since they are not

polluted by theories

(p, )Predicted output Y’

Page 11: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

11

Example of a data-driven (ML) model

observed data characterises the input-output relationship

Linear regression model

Y = a0 + a1 X

Y actual input output relationship X Y

model parameters are found by optimization

the model then predicts output for the new input without actual knowledge of what drives Y

(e.g., flow)output value

model predicts new output value

redgreen

blue

D.P. Solomatine. Data-driven models of uncertainty 21

X (e.g. rainfall)

new input value

Which model is “better”:green, red or blue?

An example of a ML model: artificial neural network

Artificial neural network

D.P. Solomatine. Data-driven models of uncertainty 22

i

tiijj

j

hidjkk

out xaagbbgY 2)(00 ))((

e + 1

1 = (u) gwhere

u-

Page 12: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

12

A novel idea: train a machine learning model to predict Error distribution

Observed outputPHYSICAL SYSTEM

Observed outputPHYSICAL SYSTEM

HYDROLOGIC FORECASTING

MODEL

Input data Modelerrors

Forecastederrors

DATA-DRIVEN

error forecasting

Improvedoutput

Modelparameters

Model outputHYDROLOGIC

MODEL

Input data Modelerrors

Forecastederrors

Improvedoutput

Modelparameters

Model output

ML ERROR FORE-CASTING

D.P. Solomatine. Data-driven models of uncertainty 23

Train ML model (e.g. Neural Network) to represent pdf of output uncertaintyD.P. Solomatine, D.L. Shrestha (2009). A novel method to estimate model uncertainty

using machine learning techniques. Water Resources Res. 45, W00B11. D. L. Shrestha, N. Kayastha, and D. P. Solomatine (2009). A novel approach to

parameter uncertainty analysis of hydrological models using neural networks. Hydrol. Earth Syst. Sci., 13, 1235–1248.

gmodelMODEL

Models and CI in Hydroinformatics

1. Models of systems (processes) M (s)

2. Models of models M (M(s))

3. Models of models’ uncertainty U (M(s))

4. Optimisation of models and systems s*, M*(s)

D.P. Solomatine. Data-driven models of uncertainty 24

4. Optimisation of models and systems s , M (s)

Page 13: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

13

Uncertainty analysis methods considered

1. UNNEC = machine learning model of the past errors of the optimal process model is builtthe optimal process model is built

2. MLUE = machine learning model of the process model’s Monte Carlo simulation results is built

D.P. Solomatine. Data-driven models of uncertainty 25

1. UNEEC methodUNcertainty Estimation based on local Errors and Clustering

D.P. Solomatine. Data-driven models of uncertainty 26

machine learning model of the past errors of the optimal process model is built

Page 14: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

14

UNEEC: assumptions, constraints

Assumptions

Model error is an indicator of the model uncertainty Model error is an indicator of the model uncertainty

Model error depends on the current condition of a natural system and can be predicted

Model errors are similar for similar conditions

Constraints

Model structure and parameters are fixed

D.P. Solomatine. Data-driven models of uncertainty 27

p

Need to re-train the error model with the changes in the catchment characteristics (e.g. land use change)

Data hungry, more data are needed for reliable results

Idea 1: assess CDF from all historical data about errors

Error (Qt-Qt’)

time

past records (examples)

i

iN

i

1

iN

i

12/

iN

i

1)2/1(

P di ti i t l

Error distribution

D.P. Solomatine. Data-driven models of uncertainty 28

Prediction interval

Page 15: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

15

Idea 2: local modelling of errors (link CDF to “characteristis variables”)

Error (Qt-Qt’)

Flow Qt-1

past records (examples in the space of inputs)

Outputi

iN

i

1

iN

i

12/

iN

i

1)2/1(

Error distribution in cluster

D.P. Solomatine. Data-driven models of uncertainty 29

Rainfall Rt-2

Prediction interval(different for each cluster)

Idea 3: Use fuzzy clustering of examples to generate training data sets

Lclus

Nclus

exampleclusLexample PICPI ,

Eager learning (ANN or M5 model tree)

Error limits(or prediction intervals)

Flow Qt-1

past records (examples in the space of inputs)

Output

clus1

•clus,,example is the membership grade of the example to cluster clus

Train regression (ANN) models:

( )

D.P. Solomatine. Data-driven models of uncertainty 30

New record. The trained f L and f U models will estimate the prediction interval

Rainfall Rt-2

models:PIL = fL (X)PIU = fU (X)

Page 16: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

16

Using instance-based learning

Lclus

Nclus

exampleclusLexample PICPI ,

Instance based learning

Error limits(or prediction intervals)

Flow Qt-1

past records (examples in the space of inputs)

New record

Output

cluspp

1,

•clus,,example is the membership grade of the example to cluster clus

g

D.P. Solomatine. Data-driven models of uncertainty 31

Rainfall Rt-2

record

The distance function is computed to estimate fuzzy weight

Clustering (finding groups of data in the space characterising hydro-meteo condition): K-means clustering, fuzzy C-means l t i

UNEEC details. Step 1: clustering

clustering

Obj. function

c

j

N

iji

mjim DVUJVU

1 1

2,,),(),min(

Distance22

, Ajiji vxD

Degree ofFuzzification

1m

D.P. Solomatine. Data-driven models of uncertainty 32

Constraint ic

jji

,1

1,

Fuzzification

Page 17: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

17

iLj ePIC

UNEEC details. Step 2: Determining Prediction Interval (PI) for each cluster

UPIC

N

ijiji

i

ki

1,,

12/:

Ni

ji,

N

iji

1,

N

iji

1,)2/1(

iUj ePIC

ijiji

ki

1,,

1)2/1(:

D.P. Solomatine. Data-driven models of uncertainty 33

N

iji

1,2/

UNEEC details. Step 3, 4, 5: Building and using the model

Lj

c

jji

Li PICPI

1, U

jc

jji

Ui PICPI

1,

Step 3: Generation of Prediction intervals for

each example

)X( uL

uL fPI )X(U

uU fPI

Step 4: Building the uncertainty Model

)X(LL fPI )X(UU fPI Step 5: Using the uncertainty Modelep

ende

nt C

ompu

tati

on

D.P. Solomatine. Data-driven models of uncertainty 34

)X( vufPI )X( vufPI uncertainty Model

Uii

Ui PIyPL ˆL

iiLi PIyPL ˆ

Model Outputs with uncertainty bounds

Inde

Page 18: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

18

UNEEC methodology

D.P. Solomatine. Data-driven models of uncertainty 35

2. MLUE methodMachine Learning in Uncertainty Estimation

D.P. Solomatine. Data-driven models of uncertainty 36

machine learning model of the process model’s Monte Carlo simulation results is built

Page 19: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

19

Monte Carlo simulation of parametric uncertaintyy = M(x, s, θ) + εs + εθ + εx + εy

D.P. Solomatine. Data-driven models of uncertainty 37

Monte Carlo simulation of parametric uncertainty

Consider the model M calculating y (e.g., discharge) y(t) = M (X(t), p)y( ) ( ( ), p) where X(t) = vector of inputs (precipitation, temperature etc)

known for t = 1,…, T p = vector of parameters (soil properties, roughness, etc)

Monte Carlo approach: sample N parameter vectors pi run the model for each of them yi(t) = M (X(t), pi)

and generate N outputs (leave some of them if GLUE used)

D.P. Solomatine. Data-driven models of uncertainty 38

and generate N outputs (leave some of them if GLUE used) assess distribution of Qi(t) for each time moment t (or its

parameters - mean, variance, prediction intervals, quantiles) The problem:

How to assess the parametric uncertainty of the model M for t = T+1 when new input data X(t+1) is fed?

Page 20: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

20

Issues with MC

Issues with re-running MC for new inputs: 1) convergence of the Monte Carlo simulation is very slow 1) convergence of the Monte Carlo simulation is very slow

(O(N^-0.5)) so larger number of runs needed to establish a reliable estimate of uncertainties

2) number of simulation increases exponentially with the dimension of the parameter vector ((O(n^d)) to cover the entire parameter domain

Idea:

D.P. Solomatine. Data-driven models of uncertainty 39

encapsulate the results of MC simulation in a machine learning model

MLUE Methodology (1)

Consider the sources of the uncertainty analysis to be conducted within the framework of Monte Carlo simulation

Execute the MC simulations to generate the datayi(t) = M (X(t), pi)

Estimate the uncertainty measures of the MC realizations, e.g., mean, variance, prediction intervals, quantiles to start with, estimate two quantiles (say, 5% and

D.P. Solomatine. Data-driven models of uncertainty 40

, q ( y,95%), forming the prediction interval PI

Page 21: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

21

MLUE Methodology (2)

Analyze the dependency of the uncertainty measures (quantiles) on the input and state variables of the(quantiles) on the input and state variables of the hydrological model we used Correlation and Average mutual information

analysis Select the input variables for machine learning model

based on the dependency analysisT i th hi l i d l U t di t th

D.P. Solomatine. Data-driven models of uncertainty 41

Train the machine learning model U to predict the uncertainty measures of MC realizations PI = U (X)

Validate machine learning model U by estimating the uncertainty measures with the “new” input data

Validation

Measuring predictive capability of uncertainty model U (measures the accuracy of uncertainty models in approximating the quantiles of the model outputs generated by MC simulations) Coefficient of correlation (r) and root mean squared error (RMSE)

Measuring the statistics of the uncertainty estimation (i.e. goodness of the model U as uncertainty estimator) Prediction interval coverage probability (PICP) and

mean prediction interval (MPI) (Shrestha & Solomatine 2006, 2008) 1 n

D.P. Solomatine. Data-driven models of uncertainty 42

Visualizing such as scatter and time plot of the prediction intervals obtained from the MC simulation and their predicted values

1

1

1, with =

0, otherwise

t

L Ut t t

PICP Cn

PL y PLC

1

1( )

nU Lt t

t

MPI PL PLn

Page 22: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

22

Application

• Catchment area: 135 km2

• Location: south west of England

Study area: Brue catchment, UK

• Average annual rainfall: 867 mm

• Mean river flow: 1.92 m3/s

• Calibration data: 24/06/94-24/06/95

• Validation data: 24/06/95-31/05/96

D.P. Solomatine. Data-driven models of uncertainty 44

Page 23: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

23

Study area: Brue catchment, UK

40

45 0

5

10

15

20

25

30

35

40

Dis

char

ge

(m3/s

)

5

10

15

20

25

30

35

Rai

nfa

ll (

mm

/ho

ur)

D.P. Solomatine. Data-driven models of uncertainty 45

0

5

0 2000 4000 6000 8000 10000 12000 14000 16000

Time (houry) (1994/6/24 05:00 - 1996/05/31 13:00)

35

40

Calibration data (8760 points):

Validation data (8217 points):

Conceptual Hydrological model HBV

SF – Snow

RF – Rain

EA – Evapotranspiration

SF – Snow

RF – Rain

EA – Evapotranspiration

SM

RF

R

EA

SP

Q0

SF

CFLUX

IN

p p

SP – Snow cover

IN – Infiltration

R – Recharge

SM – Soil moisture

CFLUX – Capillary transport

UZ – Storage in upper reservoir

PERC – Percolation

LZ – Storage in lower reservoir

Qo – Fast runoff component

Q1 – Slow runoff component

Q T t l ff

SM

RFRF

RR

EAEA

SP

Q0Q0

SFSF

CFLUXCFLUX

ININ

p p

SP – Snow cover

IN – Infiltration

R – Recharge

SM – Soil moisture

CFLUX – Capillary transport

UZ – Storage in upper reservoir

PERC – Percolation

LZ – Storage in lower reservoir

Qo – Fast runoff component

Q1 – Slow runoff component

Q T t l ff

D.P. Solomatine. Data-driven models of uncertainty 46

LZ

UZ

PERC Q=Q0+Q1Q1

Transformfunction

Q – Total runoff

LZ

UZ

PERCPERC Q=Q0+Q1Q1Q1

Transformfunction

Q – Total runoff

Page 24: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

24

Data Analysis

Analysis of dependency btw various combinations of the input variables and the outputinput variables and the output

Correlation

Average mutual information (AMI) between REt and PIs,

( optimal lag time is around 7-9 hours).

Additional analysis of the correlation and AMI between the PIs and observed discharge Qt are carried out. (i.e. with

D.P. Solomatine. Data-driven models of uncertainty 47

g Q (the lag of 0, 1, 2) have very high correlation with the PIs.

Experimental setup

MC simulation 9 Parameters of HBV model are sampled uniformly from 9 Parameters of HBV model are sampled uniformly from the feasible ranges

Nash-Sutcliffe coefficient of efficiency (CE) is used as error measure

Convergence – stabilized after 10,000 (75,000 runs made) Only 25,000 “good” models considered (rejection threshold

is set to 0) to compute prediction quantiles

D.P. Solomatine. Data-driven models of uncertainty 48

is set to 0) to compute prediction quantiles

Page 25: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

25

Experimental setup

Machine learning model U PI = U (RE Q Q ) PI = U (REt-5a, Qt-1, Qt-1 )

PI - lower or upper prediction intervals,

REt-5a - average of REt-5, REt-6, REt-7, REt-8, and REt-9 Qt-1 - Qt-1 - Qt-2.

Input variables were selected based on the analysis of their relatedness to output error (average mutual information)

( , )AMI ( ) l

XY i jP x yP

D.P. Solomatine. Data-driven models of uncertainty 49

Methods: M5 model trees, locally weighted regression MLP neural networks

2,

AMI= ( , ) log( ) ( )

jXY i j

X i Y ji j

P x yP x P y

Results

Page 26: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

26

UNEEC: Clustering result example

35

40

C1

15

20

25

30

Dis

char

ge (

m3 /s

)

C1

C2

C3C4

C5

D.P. Solomatine. Data-driven models of uncertainty 51

0 1000 2000 3000 4000 5000 6000 7000 80000

5

10

Time (hours)

UNEEC: Performance (MLP ANN)

200

250

val (

m3 /s

)

200

250

val (

m3 /s

)

0 50 100 150 200 250

50

100

150

Target lower interval (m3/s)

Pre

dict

ed lo

wer

inte

rv

0 50 100 150 200 250

50

100

150

Target upper interval (m3/s)

Pre

dict

ed u

pper

inte

rv

Prediction interval Data set Mean Std. dev. RMSE Corr. coef.

D.P. Solomatine. Data-driven models of uncertainty 52

Prediction interval Data set Mean Std. dev. RMSE Corr. coef.

training 110.91 53.6 5.9582 0.9937

CV 112.18 52.64 6.0852 0.9934 lower

training+CV 111.35 53.32 5.9582 0.9937

training 115.16 55.11 3.9002 0.9975

CV 116.69 54.18 3.9332 0.9974 upper

training+CV 115.66 54.79 3.9002 0.9975

Page 27: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

27

UNEEC: Estimation of uncertainty bounds

25 0 10

15

20

25

sch

arg

e (m

3/s

)

0

5

10

15

tion

bo

und

s (m

3/s

)

C1C2C3C4C5

PIs by instance baseObservedPIs by regression

0

5

10

15

20

Dis

cha

rge

(m

3/s

) 5

10

15

20

25

Pre

dic

tion

bo

un

ds

(m3/s

)

/s)

C1C2C3C4C5

PIs by instance baseObservedPIs by regression

0

5

Dis

20

25

Pre

dic

4700 4750 4800 4850 4900 4950 5000-2

0

2

Time (hours)Re

sid

ual

s (m

3/s

)

D.P. Solomatine. Data-driven models of uncertainty 53

0 25

0 1000 2000 3000 4000 5000 6000 7000 8000-20

0

20

Time (hours)Re

sid

ua

ls (

m3/

MLUE: Estimation of prediction intervals

D.P. Solomatine. Data-driven models of uncertainty 54

Page 28: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

28

MLUE: Performances

Predictive capability Goodness of uncertainty measures

MCS MT LWR ANN

PICP%

77.24 66.97 75.16 65.54

MPI m3/s

2.09 2.03 1.93 1.96

Corr C RMSE

PIL PIU PIL PIU

MT 0.841 0.792 0.614 1.641

LWR 0.822 0.798 0.643 1.604

ANN 0.847 0.806 0.584 1.568

uncertainty measures

D.P. Solomatine. Data-driven models of uncertainty 55

MCS = Monte CarloMT = M5 Model treeLWR = local weighted regressionANN =MLP neural network

Extensions

Estimation of several quantiles 5%, 10%:10%:90%, 95% i.e. estimating cdf of MC realizations by machine learning

modelsmodels

D.P. Solomatine. Data-driven models of uncertainty 56

Page 29: Data-driven modelling. Part 4. Models of uncertaintyModels ... · PDF file1 Data-driven modelling. Part 4. Models of uncertaintyModels of uncertainty Dimitri P. Solomatine UNESCO-IHE

29

Use of Machine learning methods in predicting uncertainty: conclusions

Machine learning methods are able to replicate: Past performance of a process model Past performance of a process model Results of Monte-Carlo simulations

The methods are computationally efficient and can be used in real time application

They are applicable to various kinds of models Future work:

Other ML methods are to be tested

D.P. Solomatine. Data-driven models of uncertainty 57

Other ML methods are to be tested The methods can be applied in the context of other

sources of uncertainty - input, structure, or combined

Thank you for your attention

D.P. Solomatine. Data-driven models of uncertainty 58