data-driven modelling. part 4. models of uncertaintymodels ... · pdf file1 data-driven...
TRANSCRIPT
1
Data-driven modelling.Part 4.
Models of uncertaintyModels of uncertainty Dimitri P. Solomatine
UNESCO-IHE Institute for Water EducationHydroinformatics Chair
Let’s consider a (physically-based) Model
Example: a hydrological conceptual modelSF SnowSF Snow
Input(s)
SM
RF
R
EA
SP
SF
IN
SF – Snow
RF – Rain
EA – Evapotranspiration
SP – Snow cover
IN – Infiltration
R – Recharge
SM – Soil moisture
CFLUX – Capillary transport
UZ – Storage in upper reservoir
PERC – Percolation
LZ – Storage in lower reservoirSM
RFRF
RR
EAEA
SP
SFSF
ININ
SF – Snow
RF – Rain
EA – Evapotranspiration
SP – Snow cover
IN – Infiltration
R – Recharge
SM – Soil moisture
CFLUX – Capillary transport
UZ – Storage in upper reservoir
PERC – Percolation
LZ – Storage in lower reservoir
ParametersStates
D.P. Solomatine. Data-driven models of uncertainty 2
LZ
UZ
R
PERC Q=Q0+Q1Q1
Transformfunction
Q0
CFLUX Qo – Fast runoff component
Q1 – Slow runoff component
Q – Total runoff
LZ
UZ
RR
PERCPERC Q=Q0+Q1Q1Q1
Transformfunction
Q0Q0
CFLUXCFLUX Qo – Fast runoff component
Q1 – Slow runoff component
Q – Total runoff Output
2
Model output: deterministic forecasts. How to find the 90% uncertainty bounds?
3500
4000
1500
2000
2500
3000
3500
Dis
char
ge(m
3/s)
D.P. Solomatine. Data-driven models of uncertainty 3
900 920 940 960 980 1000 10200
500
1000
Time(days)
Steps in uncertainty analysis
Identification of sources of uncertainty (Input, parameter, model operational)model, operational)
Quantification of uncertainty (e.g. distribution and range of parameters)
Reduction of uncertainty (e.g. better understanding of the model, accurate measurement etc)
Studying propagation of uncertainty through the model ( b M t C l i l ti )
D.P. Solomatine. Data-driven models of uncertainty 4
(e.g. by Monte Carlo simulation) Quantification of uncertainty in the model outputs (e.g.
analysis of the outputs of the results) Application of the uncertain information in decision
making process
3
Sources of model uncertaintyy = M(x, p) + εs + εθ + εx + εy
SF – Snow
RF – Rain
SF – Snow
RF – Rain
D.P. Solomatine. Data-driven models of uncertainty 5
LZ
UZ
SM
RF
R
PERC
EA
Q=Q0+Q1Q1
Transformfunction
SP
Q0
SF
CFLUX
IN
EA – Evapotranspiration
SP – Snow cover
IN – Infiltration
R – Recharge
SM – Soil moisture
CFLUX – Capillary transport
UZ – Storage in upper reservoir
PERC – Percolation
LZ – Storage in lower reservoir
Qo – Fast runoff component
Q1 – Slow runoff component
Q – Total runoff
LZ
UZ
SM
RFRF
RR
PERCPERC
EAEA
Q=Q0+Q1Q1Q1
Transformfunction
SP
Q0Q0
SFSF
CFLUXCFLUX
ININ
EA – Evapotranspiration
SP – Snow cover
IN – Infiltration
R – Recharge
SM – Soil moisture
CFLUX – Capillary transport
UZ – Storage in upper reservoir
PERC – Percolation
LZ – Storage in lower reservoir
Qo – Fast runoff component
Q1 – Slow runoff component
Q – Total runoff
Sources of model uncertainty
Observational (data) uncertainty (εx , εy) a) measurement errors due to both instrumental and human
y = M(x, p) + εs + εθ + εx + εy
a) measurement errors due to both instrumental and human errors
b) error due to inadequate representativeness of a data sample due to scale incompatibility or difference in time and space between the variable measured by the instrument and the corresponding model variable.
Model (structural) uncertainty εs
D.P. Solomatine. Data-driven models of uncertainty 6
( ) y s
Parametric uncertainty εθ
4
Measures of Uncertainty
1 400
Probability distributionConfidence bounds (two quantiles, say 5% and 95%)
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Pre
dic
tive
Pro
ba
bili
ty P
(H<
=h
)
50
100
150
200
250
300
350
Dis
cha
rge
(m
3.s
)
D.P. Solomatine. Data-driven models of uncertainty 77
8 8.5 9 9.5 10 10.5 11 11.5 120
Observed River Stage h (m)
10 20 30 40 50 60 700
Time (hr)
Other measures : range, variance, other moments
Uncertainty of a calibrated (“optimal”) model
Output YModel
Model output y^ Measured
Uncertainty of an optimal model M Model is calibrated on measured data y
time
Actual
Actual value y*(unknown)
Measured value y
MeasuredModel error
Observation error
D.P. Solomatine. Data-driven models of uncertainty 8
Model is calibrated on measured data y We say the model M uncertainty is manifested in the model error This error incorporates all uncertainties due to:
observational errors, inaccurately estimated parameters, inadequate model structure
We can bulid a ML model U to predict the pdf of the model M error
5
Propagation of uncertainty through the model
y^ = M (x, p) x = input p = parameters x = input, p = parameters
Uncertainty in X and p propagates to output y pdf of parameters pdf of output pdfp pdfy pdf of inputs pdfx pdf of output pdfx pdfy
D.P. Solomatine. Data-driven models of uncertainty 9
Suggested approach to “building up”model uncertainty
uncertainty of an optimal model M (p*)t i t f M ( *) d t DATA t i t + uncertainty of M (p*) due to DATA uncertainty
+ uncertainty of M (p) due to PARAMETERS uncertainty + uncertainty of M (p) due to STRUCTURAL uncertainty = uncertainty of a model class M (p), given the
probabilistic properties of parameters and data
D.P. Solomatine. Data-driven models of uncertainty 10
6
Monte Carlo Simulation
Mote Carlo casino: roulette wheel
D.P. Solomatine. Data-driven models of uncertainty 12
It is a random number generator – uses uniform distribution with the range of [0, 36]
7
Monte Carlo simulation in analysing parametric uncertainty
D.P. Solomatine. Data-driven models of uncertainty 13
Monte Carlo sampling: illustrationy = M(x, s, θ) + εs + εθ + εx + εy
D.P. Solomatine. Data-driven models of uncertainty 14
8
Monte Carlo simulation of parametric uncertainty
Consider the model M calculating y (e.g., discharge) y^(t) = M (X(t) p) y (t) = M (X(t), p) where X(t) = vector of inputs (precipitation, temperature etc)
known for t = 1,…, T p = vector of parameters (soil properties, roughness, etc)
Monte Carlo approach: sample N parameter vectors pi
run the model for each of them y^i(t) = M (X(t), pi)
D.P. Solomatine. Data-driven models of uncertainty 15
y i( ) ( ( ), pi)and generate N outputs (leave some of them if GLUE used)
assess distribution of Qi(t) for each time moment t (or its parameters - mean, variance, prediction intervals, quantiles)
Issues with Monte Carlo
There is however a problem during model operation: How to assess the parametric uncertainty of the p y
model M for t = T+1 when new input data X(t+1) is fed?
It is difficult since MC analysis is computationally intensive: 1) convergence of the Monte Carlo simulation is very slow
(O(N^-0.5)) so larger number of runs needed to establish a reliable estimate of uncertainties2) b f i l ti i ti ll ith th
D.P. Solomatine. Data-driven models of uncertainty 16
2) number of simulation increases exponentially with the dimension of the parameter vector ((O(n^d)) to cover the entire parameter domain
Idea: encapsulate the results of MC simulation in a machine learning
model (MLUE method, to be considered later)
9
Economic sampling: Latin Hypercube Sampling (LHS), McKay 1979
the range of each variable is divided into M equally probable intervals
M sample points are placed as shown
the number of divisions, M, is equal for each variable
D.P. Solomatine. Data-driven models of uncertainty 17
for each variable LHS does not require
more samples for more dimensions (variables)
Building Models of Uncertainty : Using Methods of
Computational Intelligence
10
Data: attributes, inputs, outputs
Data often missing, noisyg, y temporal and spatial, heterogeneous non-stationary
Measured data Attributes
Inputs OutputInstances x1 x2 … xn y
Model output
y* = f (x)
y*
D.P. Solomatine. Data-driven models of uncertainty 19
Instances x1 x2 … xn yInstance 1 x11 x12 x1n y1
Instance 2 x21 x21 x2 n y2
…Instance K xK1 xK2 xK n yK
yy1*y2*…yK*
CI in building models of natural processes -why not build a model of uncertainty?
Actual (observed) t t Y
Input dataNatural process
Xoutput Y
Data-drivenmodel
M (p, x)P di t d t t Y’
Error (p) min
yyKKxxKK nnxxKK22xxKK11Instance KInstance K……
yy22xx22 nnxx2121xx2121Instance Instance 22yy11xx11nnxx1212xx1111Instance Instance 11yyxxnn……xx22xx11InstancesInstances
OutputOutputInputsInputsAttributesAttributesMeasured dataMeasured data
yyKKxxKK nnxxKK22xxKK11Instance KInstance K……
yy22xx22 nnxx2121xx2121Instance Instance 22yy11xx11nnxx1212xx1111Instance Instance 11yyxxnn……xx22xx11InstancesInstances
OutputOutputInputsInputsAttributesAttributesMeasured dataMeasured data
D.P. Solomatine. Data-driven models of uncertainty 20
CI provides methods to build Data-driven models Ideally, such models are “ultimate models” since they are not
polluted by theories
(p, )Predicted output Y’
11
Example of a data-driven (ML) model
observed data characterises the input-output relationship
Linear regression model
Y = a0 + a1 X
Y actual input output relationship X Y
model parameters are found by optimization
the model then predicts output for the new input without actual knowledge of what drives Y
(e.g., flow)output value
model predicts new output value
redgreen
blue
D.P. Solomatine. Data-driven models of uncertainty 21
X (e.g. rainfall)
new input value
Which model is “better”:green, red or blue?
An example of a ML model: artificial neural network
Artificial neural network
D.P. Solomatine. Data-driven models of uncertainty 22
i
tiijj
j
hidjkk
out xaagbbgY 2)(00 ))((
e + 1
1 = (u) gwhere
u-
12
A novel idea: train a machine learning model to predict Error distribution
Observed outputPHYSICAL SYSTEM
Observed outputPHYSICAL SYSTEM
HYDROLOGIC FORECASTING
MODEL
Input data Modelerrors
Forecastederrors
DATA-DRIVEN
error forecasting
Improvedoutput
Modelparameters
Model outputHYDROLOGIC
MODEL
Input data Modelerrors
Forecastederrors
Improvedoutput
Modelparameters
Model output
ML ERROR FORE-CASTING
D.P. Solomatine. Data-driven models of uncertainty 23
Train ML model (e.g. Neural Network) to represent pdf of output uncertaintyD.P. Solomatine, D.L. Shrestha (2009). A novel method to estimate model uncertainty
using machine learning techniques. Water Resources Res. 45, W00B11. D. L. Shrestha, N. Kayastha, and D. P. Solomatine (2009). A novel approach to
parameter uncertainty analysis of hydrological models using neural networks. Hydrol. Earth Syst. Sci., 13, 1235–1248.
gmodelMODEL
Models and CI in Hydroinformatics
1. Models of systems (processes) M (s)
2. Models of models M (M(s))
3. Models of models’ uncertainty U (M(s))
4. Optimisation of models and systems s*, M*(s)
D.P. Solomatine. Data-driven models of uncertainty 24
4. Optimisation of models and systems s , M (s)
13
Uncertainty analysis methods considered
1. UNNEC = machine learning model of the past errors of the optimal process model is builtthe optimal process model is built
2. MLUE = machine learning model of the process model’s Monte Carlo simulation results is built
D.P. Solomatine. Data-driven models of uncertainty 25
1. UNEEC methodUNcertainty Estimation based on local Errors and Clustering
D.P. Solomatine. Data-driven models of uncertainty 26
machine learning model of the past errors of the optimal process model is built
14
UNEEC: assumptions, constraints
Assumptions
Model error is an indicator of the model uncertainty Model error is an indicator of the model uncertainty
Model error depends on the current condition of a natural system and can be predicted
Model errors are similar for similar conditions
Constraints
Model structure and parameters are fixed
D.P. Solomatine. Data-driven models of uncertainty 27
p
Need to re-train the error model with the changes in the catchment characteristics (e.g. land use change)
Data hungry, more data are needed for reliable results
Idea 1: assess CDF from all historical data about errors
Error (Qt-Qt’)
time
past records (examples)
i
iN
i
1
iN
i
12/
iN
i
1)2/1(
P di ti i t l
Error distribution
D.P. Solomatine. Data-driven models of uncertainty 28
Prediction interval
15
Idea 2: local modelling of errors (link CDF to “characteristis variables”)
Error (Qt-Qt’)
Flow Qt-1
past records (examples in the space of inputs)
Outputi
iN
i
1
iN
i
12/
iN
i
1)2/1(
Error distribution in cluster
D.P. Solomatine. Data-driven models of uncertainty 29
Rainfall Rt-2
Prediction interval(different for each cluster)
Idea 3: Use fuzzy clustering of examples to generate training data sets
Lclus
Nclus
exampleclusLexample PICPI ,
Eager learning (ANN or M5 model tree)
Error limits(or prediction intervals)
Flow Qt-1
past records (examples in the space of inputs)
Output
clus1
•clus,,example is the membership grade of the example to cluster clus
Train regression (ANN) models:
( )
D.P. Solomatine. Data-driven models of uncertainty 30
New record. The trained f L and f U models will estimate the prediction interval
Rainfall Rt-2
models:PIL = fL (X)PIU = fU (X)
16
Using instance-based learning
Lclus
Nclus
exampleclusLexample PICPI ,
Instance based learning
Error limits(or prediction intervals)
Flow Qt-1
past records (examples in the space of inputs)
New record
Output
cluspp
1,
•clus,,example is the membership grade of the example to cluster clus
g
D.P. Solomatine. Data-driven models of uncertainty 31
Rainfall Rt-2
record
The distance function is computed to estimate fuzzy weight
Clustering (finding groups of data in the space characterising hydro-meteo condition): K-means clustering, fuzzy C-means l t i
UNEEC details. Step 1: clustering
clustering
Obj. function
c
j
N
iji
mjim DVUJVU
1 1
2,,),(),min(
Distance22
, Ajiji vxD
Degree ofFuzzification
1m
D.P. Solomatine. Data-driven models of uncertainty 32
Constraint ic
jji
,1
1,
Fuzzification
17
iLj ePIC
UNEEC details. Step 2: Determining Prediction Interval (PI) for each cluster
UPIC
N
ijiji
i
ki
1,,
12/:
Ni
ji,
N
iji
1,
N
iji
1,)2/1(
iUj ePIC
ijiji
ki
1,,
1)2/1(:
D.P. Solomatine. Data-driven models of uncertainty 33
N
iji
1,2/
UNEEC details. Step 3, 4, 5: Building and using the model
Lj
c
jji
Li PICPI
1, U
jc
jji
Ui PICPI
1,
Step 3: Generation of Prediction intervals for
each example
)X( uL
uL fPI )X(U
uU fPI
Step 4: Building the uncertainty Model
)X(LL fPI )X(UU fPI Step 5: Using the uncertainty Modelep
ende
nt C
ompu
tati
on
D.P. Solomatine. Data-driven models of uncertainty 34
)X( vufPI )X( vufPI uncertainty Model
Uii
Ui PIyPL ˆL
iiLi PIyPL ˆ
Model Outputs with uncertainty bounds
Inde
18
UNEEC methodology
D.P. Solomatine. Data-driven models of uncertainty 35
2. MLUE methodMachine Learning in Uncertainty Estimation
D.P. Solomatine. Data-driven models of uncertainty 36
machine learning model of the process model’s Monte Carlo simulation results is built
19
Monte Carlo simulation of parametric uncertaintyy = M(x, s, θ) + εs + εθ + εx + εy
D.P. Solomatine. Data-driven models of uncertainty 37
Monte Carlo simulation of parametric uncertainty
Consider the model M calculating y (e.g., discharge) y(t) = M (X(t), p)y( ) ( ( ), p) where X(t) = vector of inputs (precipitation, temperature etc)
known for t = 1,…, T p = vector of parameters (soil properties, roughness, etc)
Monte Carlo approach: sample N parameter vectors pi run the model for each of them yi(t) = M (X(t), pi)
and generate N outputs (leave some of them if GLUE used)
D.P. Solomatine. Data-driven models of uncertainty 38
and generate N outputs (leave some of them if GLUE used) assess distribution of Qi(t) for each time moment t (or its
parameters - mean, variance, prediction intervals, quantiles) The problem:
How to assess the parametric uncertainty of the model M for t = T+1 when new input data X(t+1) is fed?
20
Issues with MC
Issues with re-running MC for new inputs: 1) convergence of the Monte Carlo simulation is very slow 1) convergence of the Monte Carlo simulation is very slow
(O(N^-0.5)) so larger number of runs needed to establish a reliable estimate of uncertainties
2) number of simulation increases exponentially with the dimension of the parameter vector ((O(n^d)) to cover the entire parameter domain
Idea:
D.P. Solomatine. Data-driven models of uncertainty 39
encapsulate the results of MC simulation in a machine learning model
MLUE Methodology (1)
Consider the sources of the uncertainty analysis to be conducted within the framework of Monte Carlo simulation
Execute the MC simulations to generate the datayi(t) = M (X(t), pi)
Estimate the uncertainty measures of the MC realizations, e.g., mean, variance, prediction intervals, quantiles to start with, estimate two quantiles (say, 5% and
D.P. Solomatine. Data-driven models of uncertainty 40
, q ( y,95%), forming the prediction interval PI
21
MLUE Methodology (2)
Analyze the dependency of the uncertainty measures (quantiles) on the input and state variables of the(quantiles) on the input and state variables of the hydrological model we used Correlation and Average mutual information
analysis Select the input variables for machine learning model
based on the dependency analysisT i th hi l i d l U t di t th
D.P. Solomatine. Data-driven models of uncertainty 41
Train the machine learning model U to predict the uncertainty measures of MC realizations PI = U (X)
Validate machine learning model U by estimating the uncertainty measures with the “new” input data
Validation
Measuring predictive capability of uncertainty model U (measures the accuracy of uncertainty models in approximating the quantiles of the model outputs generated by MC simulations) Coefficient of correlation (r) and root mean squared error (RMSE)
Measuring the statistics of the uncertainty estimation (i.e. goodness of the model U as uncertainty estimator) Prediction interval coverage probability (PICP) and
mean prediction interval (MPI) (Shrestha & Solomatine 2006, 2008) 1 n
D.P. Solomatine. Data-driven models of uncertainty 42
Visualizing such as scatter and time plot of the prediction intervals obtained from the MC simulation and their predicted values
1
1
1, with =
0, otherwise
t
L Ut t t
PICP Cn
PL y PLC
1
1( )
nU Lt t
t
MPI PL PLn
22
Application
• Catchment area: 135 km2
• Location: south west of England
Study area: Brue catchment, UK
• Average annual rainfall: 867 mm
• Mean river flow: 1.92 m3/s
• Calibration data: 24/06/94-24/06/95
• Validation data: 24/06/95-31/05/96
D.P. Solomatine. Data-driven models of uncertainty 44
23
Study area: Brue catchment, UK
40
45 0
5
10
15
20
25
30
35
40
Dis
char
ge
(m3/s
)
5
10
15
20
25
30
35
Rai
nfa
ll (
mm
/ho
ur)
D.P. Solomatine. Data-driven models of uncertainty 45
0
5
0 2000 4000 6000 8000 10000 12000 14000 16000
Time (houry) (1994/6/24 05:00 - 1996/05/31 13:00)
35
40
Calibration data (8760 points):
Validation data (8217 points):
Conceptual Hydrological model HBV
SF – Snow
RF – Rain
EA – Evapotranspiration
SF – Snow
RF – Rain
EA – Evapotranspiration
SM
RF
R
EA
SP
Q0
SF
CFLUX
IN
p p
SP – Snow cover
IN – Infiltration
R – Recharge
SM – Soil moisture
CFLUX – Capillary transport
UZ – Storage in upper reservoir
PERC – Percolation
LZ – Storage in lower reservoir
Qo – Fast runoff component
Q1 – Slow runoff component
Q T t l ff
SM
RFRF
RR
EAEA
SP
Q0Q0
SFSF
CFLUXCFLUX
ININ
p p
SP – Snow cover
IN – Infiltration
R – Recharge
SM – Soil moisture
CFLUX – Capillary transport
UZ – Storage in upper reservoir
PERC – Percolation
LZ – Storage in lower reservoir
Qo – Fast runoff component
Q1 – Slow runoff component
Q T t l ff
D.P. Solomatine. Data-driven models of uncertainty 46
LZ
UZ
PERC Q=Q0+Q1Q1
Transformfunction
Q – Total runoff
LZ
UZ
PERCPERC Q=Q0+Q1Q1Q1
Transformfunction
Q – Total runoff
24
Data Analysis
Analysis of dependency btw various combinations of the input variables and the outputinput variables and the output
Correlation
Average mutual information (AMI) between REt and PIs,
( optimal lag time is around 7-9 hours).
Additional analysis of the correlation and AMI between the PIs and observed discharge Qt are carried out. (i.e. with
D.P. Solomatine. Data-driven models of uncertainty 47
g Q (the lag of 0, 1, 2) have very high correlation with the PIs.
Experimental setup
MC simulation 9 Parameters of HBV model are sampled uniformly from 9 Parameters of HBV model are sampled uniformly from the feasible ranges
Nash-Sutcliffe coefficient of efficiency (CE) is used as error measure
Convergence – stabilized after 10,000 (75,000 runs made) Only 25,000 “good” models considered (rejection threshold
is set to 0) to compute prediction quantiles
D.P. Solomatine. Data-driven models of uncertainty 48
is set to 0) to compute prediction quantiles
25
Experimental setup
Machine learning model U PI = U (RE Q Q ) PI = U (REt-5a, Qt-1, Qt-1 )
PI - lower or upper prediction intervals,
REt-5a - average of REt-5, REt-6, REt-7, REt-8, and REt-9 Qt-1 - Qt-1 - Qt-2.
Input variables were selected based on the analysis of their relatedness to output error (average mutual information)
( , )AMI ( ) l
XY i jP x yP
D.P. Solomatine. Data-driven models of uncertainty 49
Methods: M5 model trees, locally weighted regression MLP neural networks
2,
AMI= ( , ) log( ) ( )
jXY i j
X i Y ji j
P x yP x P y
Results
26
UNEEC: Clustering result example
35
40
C1
15
20
25
30
Dis
char
ge (
m3 /s
)
C1
C2
C3C4
C5
D.P. Solomatine. Data-driven models of uncertainty 51
0 1000 2000 3000 4000 5000 6000 7000 80000
5
10
Time (hours)
UNEEC: Performance (MLP ANN)
200
250
val (
m3 /s
)
200
250
val (
m3 /s
)
0 50 100 150 200 250
50
100
150
Target lower interval (m3/s)
Pre
dict
ed lo
wer
inte
rv
0 50 100 150 200 250
50
100
150
Target upper interval (m3/s)
Pre
dict
ed u
pper
inte
rv
Prediction interval Data set Mean Std. dev. RMSE Corr. coef.
D.P. Solomatine. Data-driven models of uncertainty 52
Prediction interval Data set Mean Std. dev. RMSE Corr. coef.
training 110.91 53.6 5.9582 0.9937
CV 112.18 52.64 6.0852 0.9934 lower
training+CV 111.35 53.32 5.9582 0.9937
training 115.16 55.11 3.9002 0.9975
CV 116.69 54.18 3.9332 0.9974 upper
training+CV 115.66 54.79 3.9002 0.9975
27
UNEEC: Estimation of uncertainty bounds
25 0 10
15
20
25
sch
arg
e (m
3/s
)
0
5
10
15
tion
bo
und
s (m
3/s
)
C1C2C3C4C5
PIs by instance baseObservedPIs by regression
0
5
10
15
20
Dis
cha
rge
(m
3/s
) 5
10
15
20
25
Pre
dic
tion
bo
un
ds
(m3/s
)
/s)
C1C2C3C4C5
PIs by instance baseObservedPIs by regression
0
5
Dis
20
25
Pre
dic
4700 4750 4800 4850 4900 4950 5000-2
0
2
Time (hours)Re
sid
ual
s (m
3/s
)
D.P. Solomatine. Data-driven models of uncertainty 53
0 25
0 1000 2000 3000 4000 5000 6000 7000 8000-20
0
20
Time (hours)Re
sid
ua
ls (
m3/
MLUE: Estimation of prediction intervals
D.P. Solomatine. Data-driven models of uncertainty 54
28
MLUE: Performances
Predictive capability Goodness of uncertainty measures
MCS MT LWR ANN
PICP%
77.24 66.97 75.16 65.54
MPI m3/s
2.09 2.03 1.93 1.96
Corr C RMSE
PIL PIU PIL PIU
MT 0.841 0.792 0.614 1.641
LWR 0.822 0.798 0.643 1.604
ANN 0.847 0.806 0.584 1.568
uncertainty measures
D.P. Solomatine. Data-driven models of uncertainty 55
MCS = Monte CarloMT = M5 Model treeLWR = local weighted regressionANN =MLP neural network
Extensions
Estimation of several quantiles 5%, 10%:10%:90%, 95% i.e. estimating cdf of MC realizations by machine learning
modelsmodels
D.P. Solomatine. Data-driven models of uncertainty 56
29
Use of Machine learning methods in predicting uncertainty: conclusions
Machine learning methods are able to replicate: Past performance of a process model Past performance of a process model Results of Monte-Carlo simulations
The methods are computationally efficient and can be used in real time application
They are applicable to various kinds of models Future work:
Other ML methods are to be tested
D.P. Solomatine. Data-driven models of uncertainty 57
Other ML methods are to be tested The methods can be applied in the context of other
sources of uncertainty - input, structure, or combined
Thank you for your attention
D.P. Solomatine. Data-driven models of uncertainty 58