TRANSCRIPT
Dimension reduction and emulation of fine-resolution data-assimilating models
Hobart, Marine & Atmospheric Division
Nugzar Margvelashvili
It is the mark of an educated man not to expect more precision from the treatment of a subject than the nature of that subject permits.
Aristotle (Nicomachean Ethics)
How much precision can we achieve in environmental research and management?
Appreciation of uncertainty has been rising over the last decades, but understanding is still poor.
Often we know neither the uncertainty itself nor how to handle problems with high uncertainty.
Uncertainty highlights
“Wicked” problems are charged with a social dimension and uncertainty. Traditional approaches are bound to fail…
Rittel and Webber, “Dilemmas in a general theory of planning”, Policy Sciences, 4, 1973
Expansion of post-modern culture and thought into coastal engineering practices. The need for new approaches and paradigm shifts
Kamphuis J.W., “Coastal engineering – quo vadis?”, Coastal Engineering, 53, 2006
Contingency of language, selfhood and community. Transition to a new literary culture, where individuals are engaged in creating a diverse range of narratives, “creating private self-images, and reweaving their webs of belief and desire”
Richard Rorty, “Contingency, irony, and solidarity”
Questions
Managers and scientists routinely use interpretations of complex systems in their practices.
- How much freedom can they afford in developing such interpretations?
- Can they always reduce the uncertainty of these interpretations using observations?
- If the uncertainty is high and irreducible, how can different interpretations be discriminated and management decisions justified?
How can data be assimilated into a fine-resolution model using fast and cheap statistical surrogates of the model (emulators)?
Dimensionality curse
Evaluation of a model of N parameters at the vertexes of an N-dimensional hypercube:
To - single model run-time; N - number of parameters; T - total run-time over all 2^N vertexes.
• To = 1 sec; N = 2; T = 2^2 sec = 4 sec
• To = 1 sec; N = 30; T = 2^30 sec ≈ 34 years
• To = 1 sec; N = 100; T = 2^100 sec >> age of Universe (~2^59 sec)
• To = 100 attoseconds (10^-16 sec); N = 100; T > 100,000 years
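The growth of T with N can be checked directly; here is a small illustrative calculation using the same To values as above:

```python
# Cost of evaluating a model once per vertex of an N-dimensional hypercube:
# T = T0 * 2**N, with T0 the single model run-time in seconds.
SECONDS_PER_YEAR = 3600 * 24 * 365

for T0, N in [(1.0, 2), (1.0, 30), (1.0, 100), (1e-16, 100)]:
    T = T0 * 2**N  # total run-time over all 2**N vertexes
    print(f"T0={T0:g} s, N={N:3d}: T = {T:.3e} s = {T / SECONDS_PER_YEAR:.3e} years")
```

Even at 100 attoseconds per run, the final case still exceeds 100,000 years, which is why the problem is fundamental rather than a matter of faster hardware.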
Dimensionality curse: implications
• Vast areas in a parameter space remain unexplored
• There are exceptions (simple response surface or hierarchical dependences)
• In a general case there is always room for surprises (the problem is fundamental)
• Ensemble of model evaluations provides a limited set of “observations” of the model behaviour
• To make sense of such “observations”, plausible hypotheses on the model behaviour beyond the sampling points are required
• Emulator: fast and cheap surrogate of the model
An outline
• In what follows we build an emulator for a fine-resolution coastal model and implement it for sequential data assimilation
• Brief description of the method
• Preliminary results from 1-d and 3-d applications with synthetic data sets
• Work in progress (more questions than answers)
Bayesian framework
• Assume we have a model with unknown initial state c(x) and parameters p (the priors are known and measurements D are available)
• Inference: Posterior ~ Likelihood × Prior
• MCMC machinery to sample from the posterior
• MCMC is intractable with complex fine-resolution models: the dimensionality is too high (~10^5) and the model is too slow (hours to weeks per run)
• Potential solution: reduce dimensionality and speed up simulations
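As a toy illustration of the “Posterior ~ Likelihood × Prior” machinery, here is a minimal random-walk Metropolis sampler; the standard-normal log-posterior is a stand-in for exposition, not the coastal model discussed in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(theta):
    # Toy log-posterior: a standard normal (stand-in for Likelihood x Prior).
    return -0.5 * theta**2

theta, chain = 0.0, []
for _ in range(20000):
    prop = theta + 0.5 * rng.standard_normal()   # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop                             # accept the move
    chain.append(theta)

print(np.mean(chain), np.std(chain))  # should approach 0 and 1
```

With ~10^5 dimensions and model runs taking hours, each `log_post` evaluation becomes prohibitively expensive, which motivates the reduction and emulation steps that follow.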
Dimension reduction
• Decompose the initial state c(x) into a set of basis functions:
  c(x) = Σ_{k=1}^{K} a_k f_k(x) + r
• Replace MCMC sampling from the space of initial state + parameters, p(c, p | D),
• with MCMC sampling from the space of decomposition coefficients + parameters, p(a, p | D)
• Even in the reduced space MCMC sampling is a challenge, since the model is computationally expensive
• The problem of state estimation is reduced to the problem of “parameter” estimation
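The decomposition step can be sketched with NumPy's SVD; the sinusoidal ensemble below is a hypothetical stand-in for an ensemble of model states c(x):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)

# Hypothetical ensemble of 16 model states c(x) (stand-in for real model runs).
ensemble = np.array([np.sin(2 * np.pi * (1 + 0.05 * rng.standard_normal()) * x)
                     for _ in range(16)])

mean = ensemble.mean(axis=0)
U, s, Vt = np.linalg.svd(ensemble - mean, full_matrices=False)

K = 3
basis = Vt[:K]                          # basis functions f_k(x)
coeffs = (ensemble - mean) @ basis.T    # coefficients a_k per ensemble member
recon = mean + coeffs @ basis           # c(x) ~ mean + sum_k a_k f_k(x) (+ r)

err = np.linalg.norm(recon - ensemble) / np.linalg.norm(ensemble)
print(f"K={K} modes, relative residual r: {err:.2e}")
```

MCMC then operates on the few coefficients a_k (plus parameters) rather than on the full ~10^5-dimensional state.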
Emulator
• Decompose the ensemble of model runs at times (t) and (t+1):
  c_t(x) = Σ_{k=1}^{K} a_k^t f_k(x),   c_{t+1}(x) = Σ_{k=1}^{K} a_k^{t+1} f_k(x)
• Build a Gaussian Process Model (GPM) to map (a_t, p) to a_{t+1}:
  a_{t+1} = g(a_t, p)
• The GPM learns from the ensemble of model runs
• The GPM (plus the decomposition) gives a fast and cheap approximation of the model, called an emulator
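A bare-bones GP regression sketch of the emulator step is shown below; the quadratic map `g` is an illustrative stand-in for one model step in coefficient space, and the kernel choice is an assumption, not the talk's configuration:

```python
import numpy as np

rng = np.random.default_rng(2)

def g(a, p):
    # Stand-in for one model step in coefficient space, a_{t+1} = g(a_t, p).
    return 0.9 * a + 0.1 * p * a**2

# "Ensemble of model runs": training pairs ((a_t, p), a_{t+1}).
X = rng.uniform(-1.0, 1.0, size=(40, 2))
y = g(X[:, 0], X[:, 1])

def rbf(A, B, ell=0.5):
    # Squared-exponential kernel between two sets of (a_t, p) points.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / ell**2)

alpha = np.linalg.solve(rbf(X, X) + 1e-6 * np.eye(len(X)), y)

def emulate(a, p):
    # GP posterior mean: a fast, cheap surrogate for g.
    return float(rbf(np.array([[a, p]]), X) @ alpha)

print(emulate(0.3, 0.5), g(0.3, 0.5))
```

Once `alpha` is precomputed, each emulator call costs a single kernel evaluation against the training set, instead of an hours-long model run.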
Schematic representation of the sampling strategy
[figure: conventional vs adopted sampling schemes]
Sequential assimilation strategy
• Run model ensemble from T0 to T1 (forecast)
• Do SVD & build emulator
• Run MCMC? Degeneracy problem (ensemble size too small to capture the proposal distribution)
• Introduce a large pool of emulator particles
• Run MCMC with emulators
• Update model ensemble and run analysis
[figure legend: model particle; emulator particle]
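One assimilation cycle can be sketched as a weight-and-resample step over a large particle pool; the 1-d state, the Gaussian likelihood and all numbers below are hypothetical stand-ins for the coefficient-space quantities in the talk:

```python
import numpy as np

rng = np.random.default_rng(3)

truth = 1.0
obs = truth + 0.1 * rng.standard_normal()    # synthetic observation at T1

# Large pool of cheap emulator particles (vs. only a handful of model particles).
pool = rng.normal(0.0, 1.0, size=10000)

# Weight each particle by a Gaussian observation likelihood ...
w = np.exp(-0.5 * ((pool - obs) / 0.1) ** 2)
w /= w.sum()

# ... and resample 16 particles to update the model ensemble (analysis step).
analysis = rng.choice(pool, size=16, p=w)
print(analysis.mean(), obs)
```

With only 16 particles the weights would collapse onto one or two members (degeneracy); the 10,000-strong emulator pool keeps enough effective samples to represent the proposal distribution.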
Further assumptions/approximations
• The emulators ameliorate the degeneracy problem but it is still there
• Observation models with tails heavier than the Gaussian (Gaussian density with normalised error; Lorentz density)
• Gaussian mixture approximation to populate new samples into the proposal density
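The point of the heavy-tailed observation model is visible in a two-line comparison: the Lorentz (Cauchy) density penalises large residuals far less than the Gaussian, so outlying particles keep non-negligible weight:

```python
import numpy as np

def log_gauss(r, s=1.0):
    # Gaussian log-density of a residual r with scale s.
    return -0.5 * (r / s) ** 2 - np.log(s * np.sqrt(2.0 * np.pi))

def log_lorentz(r, s=1.0):
    # Lorentz (Cauchy) log-density: much heavier tails than the Gaussian.
    return -np.log(np.pi * s * (1.0 + (r / s) ** 2))

for r in (0.0, 3.0, 10.0):
    print(f"r={r:4.1f}  Gauss: {log_gauss(r):8.2f}  Lorentz: {log_lorentz(r):8.2f}")
```

At r = 10 the Gaussian log-likelihood is around -51 while the Lorentz is only about -5.8, which is precisely what keeps the particle weights from collapsing.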
Test application with 1d model
• 1-d vertical sediment/pollutant transport model with coupled benthic-pelagic layers
• 3 state variables (sediment, dissolved & particulate tracers)
• 4 unknown parameters (ripple height, settling velocity, sorption rate constant, sorption Kd)
• Synthetic data given by 2-hourly concentrations at a reference point
• Sequential assimilation strategy
Estimated state variables
[figures: sediment (kg m^-3) and dissolved tracer (A m^-3) over days 0-12; model vs truth with 95% interval]
Ensemble mean bias of estimated parameters for different assimilation scenarios
[figures: bias of settling velocity (m s^-1) and Kd (m^3 kg^-1) over days 0-12]
Test application with 3-d sediment model (SE TAS)
• 2 state variables (silt & clay in water and sediments)
• 3 unknown parameters (ripple height and 2 settling velocities)
• Synthetic data given by 12-hour surface concentrations (“satellite” data)
• Ensemble of 16 model runs
• 2 eigen-functions
• Online sequential assimilation
Snapshot of surface TSS
Operational hydrodynamic model
Estimated parameters & model error
Fitzroy Estuary & Keppel Bay
• 2 state variables (silt & clay in water and sediments)
• 3 unknown parameters (ripple height and 2 settling velocities)
• Synthetic data given by 24-hour surface concentrations (“satellite” data)
• Ensemble of 16 model runs
• 2 eigen-functions
Surface suspended sediment
Estimated parameters & error (baseline scenario with “perfect” model)
Estimated TSS (top) vs “truth” (bottom)
Satellite data for suspended sediments (2003-2004)
A. Dekker et al.
Model error approximation projected on 2d sub-spaces (Keppel Bay)
1. For a particular configuration of parameters, run the emulator up to time t
2. Build the anomaly of the emulator solutions
3. Decompose the anomaly via SVD and keep 3 + 3 basis functions
4. The decomposition coefficients corresponding to these basis functions define a dot point in 6-d space
5. Project 6-d to 2-d
Ups and downs of the technique
• Treats the simulation model as a black box
• Has the potential to deliver cheap and fast emulators of complex models
• Needs further research and development
• Easy to parallelise (ensemble runs, SVD decomposition, emulator runs)
• Computationally expensive on a conventional desktop PC (requires multiprocessor machines)
Acknowledgments
John Parslow
Lawrence Murray
Eddy Campbell
Emlyn Jones
Mike Herzfeld
John Andrewartha
Farhan Rizwi
CSS TCP, WfO & WfHC themes