

  • Using R for Global Optimization of a Fully-distributed Hydrological Model at Continental Scale

    Mauricio Zambrano-Bigiarini*, Zuzanna Zajac and Peter Salamon

    European Commission, Joint Research Centre, Institute for Environment and Sustainability

    *Currently at: EULA-Chile Centre, University of Concepción (Chile). Email: [email protected]

    AGU 2013-1804792, Identifier: H51R-06, Dec 13th, 2013

    1) Motivation

    The spatially-distributed LISFLOOD hydrological model is used for flood forecasting at pan-European scale, within the European Flood Awareness System (EFAS).

    Several model parameters need to be estimated through calibration for ca. 700 subcatchments.

    Calibrating every individual catchment across the whole of Europe is a very time-consuming and error-prone task.

    4) Pre-processing

    Historical daily data for 4062 stream gauges (from national providers).

    The hydroTSM, sp and raster packages were used to select ~700 stations with enough temporal data and good spatial distribution across Europe.
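    The "enough temporal data" criterion can be sketched in base R as follows. The 70% coverage threshold, the station names and the synthetic discharge matrix are assumptions made for illustration only. The poster's own selection relied on hydroTSM, sp and raster and also considered spatial distribution, which this sketch does not.

```r
# Synthetic daily discharge matrix: rows = days, columns = stations (hypothetical names)
set.seed(1)
n_days <- 365
q <- matrix(runif(n_days * 4, min = 1, max = 100), nrow = n_days,
            dimnames = list(NULL, c("st_A", "st_B", "st_C", "st_D")))

# Introduce gaps: st_B misses about half of the record, st_D about 90%
q[1:182, "st_B"] <- NA
q[1:328, "st_D"] <- NA

# Fraction of non-missing daily values per station
coverage <- colMeans(!is.na(q))

# Keep stations whose coverage reaches the (assumed) minimum threshold
min_coverage <- 0.70
selected <- names(coverage)[coverage >= min_coverage]

print(round(coverage, 2))
print(selected)
```

    A spatial filter (for example, a minimum distance between retained gauges) would be applied on top of this temporal filter.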

    Nine parameters were selected for calibration based on previous expert knowledge.

    The pan-European spatial extent was split up into 7 main calibration areas, in order to speed up the model computation time.

    Customized R scripts were used to extract observed time series for each catchment and to prepare the input files required for individual calibrations (i.e., ParamRanges.txt, ParamFiles.txt, obs.tss, and hydroPSO-subbXXX.R files along with a masking area map defining the drainage area of individual catchments).
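    Preparing one set of input files per catchment might look like the sketch below. The four-column layout of ParamRanges.txt (ParameterNmbr, ParameterName, MinValue, MaxValue) is an assumption here, and the parameter names, ranges, catchment identifiers and directory names are placeholders, not the nine LISFLOOD parameters or the catchments of the poster.

```r
# Placeholder parameters and ranges (NOT the nine LISFLOOD parameters of the poster)
param_ranges <- data.frame(
  ParameterNmbr = 1:3,
  ParameterName = c("par_a", "par_b", "par_c"),
  MinValue      = c(0.01, 1, 0.1),
  MaxValue      = c(1, 10, 5)
)

# Hypothetical catchment identifiers; one working directory per catchment
catchments <- c("001", "002")
base_dir   <- file.path(tempdir(), "calibration")

for (id in catchments) {
  out_dir <- file.path(base_dir, paste0("subb", id))
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  write.table(param_ranges, file = file.path(out_dir, "ParamRanges.txt"),
              row.names = FALSE, quote = FALSE, sep = "\t")
}

# Read one file back to check the round trip
check <- read.table(file.path(base_dir, "subb001", "ParamRanges.txt"),
                    header = TRUE, stringsAsFactors = FALSE)
print(check)
```

    Generating the files from a single data frame keeps the ranges identical across catchments and makes later changes a one-line edit.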

    2) Aim

    To describe and illustrate how the free software R has been used as a single environment to pre-process hydro-meteorological data, carry out global optimization, and post-process calibration results at European scale.

    References:

    EFAS (2013), "European Flood Awareness System", http://www.efas.eu/. [Online. Last accessed 05-Dec-2013]

    van Der Knijff, J. M., J. Younis, and A. P. J. De Roo (2010), LISFLOOD: a GIS-based distributed model for river basin scale water balance and flood simulation, International Journal of Geographical Information Science, 24(2), 189–212, doi:10.1080/13658810802549154.

    Zambrano-Bigiarini, M., and R. Rojas (2013), A model-independent Particle Swarm Optimisation software for model calibration, Environmental Modelling & Software, 43, 5–25, doi:10.1016/j.envsoft.2013.01.004.

    Zambrano-Bigiarini, M. (2013). hydroTSM: Time series management, analysis and interpolation for hydrological modelling. R package version 0.4-1. http://CRAN.R-project.org/package=hydroTSM

    Zambrano-Bigiarini, M. (2013). hydroGOF: Goodness-of-fit functions for comparison of simulated and observed hydrological time series. R package version 0.3-7. http://CRAN.R-project.org/package=hydroGOF

    7) Concluding Remarks

    The use of the 'parallel' option available in hydroPSO allowed a substantial reduction of the total calibration time (ca. 50% with 6 cores).

    R proved to be an efficient environment to facilitate modeling, visualization and data analysis at continental scale.

    The use of a single environment for pre-processing, calibration and post-processing of results made it easier to change any step of the workflow.

    Results in hundreds of catchments with different hydro-climatological regimes showed that hydroPSO is an effective and efficient R package for finding near-optimal parameter sets at a low computational cost.

    Although this case study concerns only the calibration of a hydrological model written in Python+PCRaster, we believe that a similar approach can be applied to a wide class of environmental models requiring some form of parameter optimization, from micro to global scale.
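    As a rough reading of the timing reported in these remarks, a ca. 50% shorter calibration corresponds to a speed-up of about 2 on 6 cores. The sketch below assumes that Amdahl's law applies, which is an idealization not made in the poster, and computes the parallel fraction that such a speed-up would imply.

```r
# Amdahl's law: speed-up S on n cores when a fraction p of the run time is parallel
amdahl_speedup <- function(p, n) 1 / ((1 - p) + p / n)

# Inverse relation: parallel fraction implied by an observed speed-up S on n cores
implied_parallel_fraction <- function(S, n) (1 - 1 / S) / (1 - 1 / n)

time_reduction <- 0.50                     # ca. 50% shorter calibration (from the poster)
n_cores        <- 6                        # cores used (from the poster)
speedup        <- 1 / (1 - time_reduction) # time halved means speed-up of 2

p <- implied_parallel_fraction(speedup, n_cores)
print(c(speedup = speedup, parallel_fraction = p))
```

    Under this assumption about 60% of the run time would be parallelizable, which is consistent with a workflow in which each model run also carries serial file input and output.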

    3) Why use R for massive hydrological analysis?

    Base functions allow efficient data manipulation and storage (spatial data and time series).

    Support for almost every vector and raster spatial format (rgdal, raster and sp packages).

    R is both scientific software and a programming language (types, objects, functions, extensions).

    Scripting capabilities allow explicit documentation and reproducible research.

    Fully-customizable and high-quality graphical functions for exploratory data analysis and visualization.

    Highly extensible (4000+ packages with state-of-the-art contributions in several fields of knowledge).

    Easy integration with other languages (C/C++, Fortran, Python, etc.), e.g., for intensive computations.

    Easy parallelization (multi-core machines or network clusters).

    Multi-platform (GNU/Linux, Mac OS X, Windows).

    Free and open-source.

    Fig 01. Shaded boxes represent the seven major calibration areas used for splitting up the pan-European spatial domain. Colored dots represent discharge stations coming from two different data sources, which were analyzed to select ca. 700 stations for calibration.

    Fig 02. Flow chart of the calibration of a single catchment. Files ParamRanges.txt and ParamFiles.txt define which parameters are to be calibrated and where they have to be modified, respectively. Settings.xml defines the location of model input files and the value of model parameters. Light-blue shaded boxes indicate some user intervention, while light-yellow shaded boxes represent static input files (not modified during optimization).

    obs.tss: file with observed discharges.

    dis.tss: file with simulated discharges.

    read_tss(): user-defined R function for reading .tss files.

    NSE(): R function for computing the Nash-Sutcliffe efficiency (hydroGOF package).

    SPSO-2011: Standard Particle Swarm Optimization 2011 (hydroPSO package).
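    The objective-function step of the flow chart (read obs.tss and dis.tss, then compute the Nash-Sutcliffe efficiency) can be sketched in base R. The poster does not show its read_tss() function, so read_tss_sketch() below is a hypothetical stand-in, and the .tss layout it expects (a description line, the number of columns, one name per column, then data rows) is an assumption. nse_sketch() writes out the usual NSE formula instead of calling hydroGOF, so that the example is self-contained.

```r
# Hypothetical reader for a PCRaster-style .tss file (assumed layout:
# line 1 = description, line 2 = number of columns, then one name per column,
# then whitespace-separated data rows)
read_tss_sketch <- function(path) {
  lines  <- readLines(path)
  n_cols <- as.integer(trimws(lines[2]))
  header <- trimws(lines[3:(2 + n_cols)])
  read.table(text = lines[-(1:(2 + n_cols))], col.names = header)
}

# Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
nse_sketch <- function(sim, obs) {
  ok <- !is.na(sim) & !is.na(obs)
  1 - sum((obs[ok] - sim[ok])^2) / sum((obs[ok] - mean(obs[ok]))^2)
}

# Helper that writes a tiny synthetic .tss file in the assumed layout
write_tss <- function(path, values) {
  writeLines(c("timeseries scalar", "2", "timestep", "1",
               sprintf("%8d %10.3f", seq_along(values), values)), path)
}

obs_file <- file.path(tempdir(), "obs.tss")
sim_file <- file.path(tempdir(), "dis.tss")
write_tss(obs_file, c(10, 12, 15, 11, 9))   # synthetic observed discharges
write_tss(sim_file, c(11, 12, 14, 10, 9))   # synthetic simulated discharges

obs <- read_tss_sketch(obs_file)
sim <- read_tss_sketch(sim_file)

# The second column holds the discharge values
nse_value <- nse_sketch(sim = sim[[2]], obs = obs[[2]])
print(nse_value)
```

    A value of 1 means a perfect match, while values at or below 0 mean the simulation is no better than the mean of the observations.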

    5) Model Calibration

    6) Calibration results + post-processing

    Fig 03. Evolution of the global optimum (Nash-Sutcliffe efficiency) and the normalized swarm radius (δnorm) over the iterations.
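    The two quantities tracked in Fig 03 can be illustrated with a deliberately simple particle swarm written in base R. This is a basic global-best PSO with textbook coefficients, not the SPSO-2011 algorithm implemented in hydroPSO. The toy objective, the swarm settings and the definition of the normalized swarm radius used here (largest particle-to-best distance divided by the diagonal of the search space) are assumptions made for the sketch.

```r
set.seed(42)

# Toy objective with maximum 1 at (0.3, 0.7); it stands in for the NSE of a model run
objective <- function(x) 1 - sum((x - c(0.3, 0.7))^2)

lower <- c(0, 0); upper <- c(1, 1)         # search-space bounds
n_part <- 20; n_iter <- 50                 # swarm size and iterations (arbitrary)
w <- 0.72; c1 <- 1.49; c2 <- 1.49          # common textbook PSO coefficients
n_dim <- length(lower)
diag_len <- sqrt(sum((upper - lower)^2))   # diameter of the search space

# Random initial positions (one particle per row) and zero initial velocities
x <- matrix(runif(n_part * n_dim, lower, upper), ncol = n_dim, byrow = TRUE)
v <- matrix(0, n_part, n_dim)
lo_mat <- matrix(lower, n_part, n_dim, byrow = TRUE)
up_mat <- matrix(upper, n_part, n_dim, byrow = TRUE)

pbest <- x
pbest_val <- apply(x, 1, objective)
g <- which.max(pbest_val)
gbest <- pbest[g, ]; gbest_val <- pbest_val[g]

history <- data.frame(iter = seq_len(n_iter), gbest = NA_real_, radius_norm = NA_real_)

for (it in seq_len(n_iter)) {
  r1 <- matrix(runif(n_part * n_dim), n_part, n_dim)
  r2 <- matrix(runif(n_part * n_dim), n_part, n_dim)
  g_mat <- matrix(gbest, n_part, n_dim, byrow = TRUE)

  # Velocity and position update, then clamp positions to the bounds
  v <- w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g_mat - x)
  x <- pmin(pmax(x + v, lo_mat), up_mat)

  # Update personal bests and the global best (maximization)
  val <- apply(x, 1, objective)
  better <- val > pbest_val
  pbest[better, ] <- x[better, ]
  pbest_val[better] <- val[better]
  g <- which.max(pbest_val)
  gbest <- pbest[g, ]; gbest_val <- pbest_val[g]

  # Normalized swarm radius: largest particle-to-gbest distance over the space diameter
  g_mat <- matrix(gbest, n_part, n_dim, byrow = TRUE)
  history$gbest[it] <- gbest_val
  history$radius_norm[it] <- max(sqrt(rowSums((x - g_mat)^2))) / diag_len
}

print(tail(history, 3))
```

    A global optimum that stops improving while the radius shrinks towards zero indicates that the swarm has collapsed around one region of the parameter space.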

    Fig 06. Figure automatically generated for assessing the quality of the calibration results of each single catchment. The upper panel shows a comparison of the observed and simulated hydrographs during the verification period; the lower left panel shows a comparison of the flow duration curves thereof, while the lower right panel shows numerical statistics for comparing observations with their simulated counterparts.
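    The flow duration curve comparison in the lower left panel can be sketched in base R. The poster does not state how its curves were computed, so the Weibull plotting position, the synthetic discharge series and the helper names below are assumptions.

```r
# Empirical flow duration curve: discharge versus the percentage of time
# that discharge is equalled or exceeded
fdc_sketch <- function(q) {
  q <- sort(q[!is.na(q)], decreasing = TRUE)
  n <- length(q)
  data.frame(exceedance_pct = 100 * seq_len(n) / (n + 1),  # Weibull plotting position
             discharge = q)
}

set.seed(7)
obs <- rlnorm(730, meanlog = 3, sdlog = 0.8)       # synthetic "observed" flows
sim <- obs * exp(rnorm(730, mean = 0, sd = 0.2))   # synthetic "simulated" flows

fdc_obs <- fdc_sketch(obs)
fdc_sim <- fdc_sketch(sim)

# Interpolate the discharge exceeded a given percentage of the time
q_at <- function(fdc, pct) approx(fdc$exceedance_pct, fdc$discharge, xout = pct)$y

# High, median and low flows for observations and simulations
pcts <- c(5, 50, 95)
summary_tab <- data.frame(pct = pcts,
                          obs = sapply(pcts, q_at, fdc = fdc_obs),
                          sim = sapply(pcts, q_at, fdc = fdc_sim))
print(summary_tab)
```

    Comparing the two curves at the 5% and 95% exceedance levels shows whether the calibrated model reproduces high and low flows, which a single NSE value can hide.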

    Fig 04. Nash-Sutcliffe efficiency (NSE) response surface projected onto the parameter space (pseudo 3D-dotty plots) for selected parameters, to highlight equifinality issues.

    Fig 05. Dotty plots showing the model performance (NSE) versus parameter values, for three selected parameters. Vertical red line indicates the "optimum" parameter value.
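    A dotty plot of this kind can be produced with base R graphics. The one-parameter toy model, the parameter name k and the sampling range below are assumptions made for the sketch. In the poster the points come from the parameter sets evaluated during the hydroPSO calibration.

```r
set.seed(3)

# Toy one-parameter "model": the simulation is obs scaled by k, so NSE peaks at k = 1
obs <- c(10, 12, 15, 11, 9, 14, 20, 17)
nse <- function(sim, obs) 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)

k_values   <- runif(500, min = 0.5, max = 1.5)   # sampled parameter values
nse_values <- sapply(k_values, function(k) nse(k * obs, obs))

# Parameter value with the best performance among the sampled ones
best_k <- k_values[which.max(nse_values)]

# Dotty plot written to a temporary PDF, with the best value marked in red
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(k_values, nse_values, pch = 20, xlab = "parameter k", ylab = "NSE")
abline(v = best_k, col = "red")
dev.off()

print(best_k)
```

    A sharp peak in the cloud of points suggests a well-identified parameter, whereas a flat upper envelope points to the equifinality issues mentioned for Fig 04.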
