

Research Article

Multivariate Streamflow Simulation Using Hybrid Deep Learning Models

Hindawi Computational Intelligence and Neuroscience, vol. 2021, Article ID 5172658, 16 pages, https://doi.org/10.1155/2021/5172658

Eyob Betru Wegayehu and Fiseha Behulu Muluneh

School of Civil and Environmental Engineering, Addis Ababa Institute of Technology, Addis Ababa University, Addis Ababa, Ethiopia

Correspondence should be addressed to Eyob Betru Wegayehu; [email protected]

Received 13 August 2021; Revised 30 September 2021; Accepted 5 October 2021; Published 27 October 2021

Academic Editor: Maciej Lawrynczuk

Copyright © 2021 Eyob Betru Wegayehu and Fiseha Behulu Muluneh. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reliable and accurate streamflow simulation has a vital role in water resource development, mainly in agriculture, environment, domestic water supply, hydropower generation, flood control, and early warning systems. In this context, deep learning algorithms have attracted enormous attention in recent years due to their high-performance simulation capacity. In this study, we compared multilayer perceptron (MLP), long short-term memory (LSTM), and gated recurrent unit (GRU) with newly proposed hybrid models, CNN-LSTM and CNN-GRU, to simulate one-step daily streamflow under different agroclimatic conditions, rolling time windows, and a range of input variable combinations. The analysis used daily multivariate and multisite time series data collected from Awash River Basin (Borkena watershed, Ethiopia) and Tiber River Basin (Upper Tiber River Basin, Italy) stations. The datasets were subjected to rigorous quality control, smoothed with average rolling windows of different lengths to remove noise in the time series, and split chronologically into training and testing datasets at a ratio of 80:20. Finally, the results showed that integrating a GRU layer with a convolutional layer and using monthly rolled average daily input time series can substantially improve the simulation of streamflow time series.

1. Introduction

One of the emerging research areas in hydrology is hydrological simulation [1], through which catchment responses are evaluated in terms of meteorological forcing variables. Hydrological simulation is also crucial for water resource planning and management, such as flood prevention, water supply distribution, hydraulic structure design, and reservoir operation [2, 3]. However, river flow simulation is not an easy task, since river flow time series are commonly random, dynamic, and chaotic. The relationship between streamflow generation and other hydrologic processes is nonlinear and is controlled not only by external climatic factors and global warming but also by physical catchment characteristics.

Stream flows are mostly recorded at river gauging stations. However, different studies show that the availability of gauging station records is generally decreasing in most parts of the world [4]. Tourian et al. [5] compiled a time series of the number of stations with available discharge data in the Global Runoff Data Centre (GRDC); it indicates a decline in the total monitored annual stream flows between 1970 and 2010. Besides, inadequate discharge observation and malfunctioning gauging stations worsen the situation in developing countries [6]. Sparsely distributed rain gauge stations in Ethiopia also limit the performance of physical hydrological models. Therefore, research on robust, innovative discharge estimation models is undeniably important.

Streamflow simulation models in the literature are generally divided into two groups: (1) process-based (physical) models built from catchment characteristics and (2) data-driven models that depend on historically collected data [2, 3, 7]. Process-based models commonly rely on formulations that provide insight into the physical characteristics of the catchment and have extensive data requirements. On the other hand, data-driven models are suitable and can function easily without considering the internal physical mechanisms of the watershed system [2, 3, 7].

Artificial neural networks (ANNs) are the most used and studied "black-box" models. They are utilized in more scientific and technological areas than other available black-box algorithms such as support vector machine (SVM), genetic programming (GP), fuzzy logic (FL), recurrent neural network (RNN), and long short-term memory (LSTM) [7, 8]. ANNs are available in different functionalities and architectural forms, from simple to advanced. A recurrent neural network (RNN) is one of the advanced ANN architectures: a deep learning network specially designed for time series analysis that quickly adapts to temporal dynamics using previous time step information [2]. However, RNNs cannot capture long-time dependencies and are susceptible to vanishing and exploding gradients.

Couta et al. suggested the advanced RNN, or long short-term memory (LSTM), as one of the most effective approaches [8]. The LSTM unit has a cell that comprises an input gate, an output gate, and a forget gate [9]. Due to these gates, the LSTM model has shown promising results in different applications, including speech recognition, time series modelling, natural language processing, handwriting recognition, and traffic flow simulation [3, 10]. Studies have also shown that LSTM outperforms other powerful multilayered (ML) tools for streamflow simulation [3, 11]. Campos et al. [10] applied autoregressive integrated moving average (ARIMA) and LSTM networks to forecast floods at four Paraíba do Sul River stations in Brazil. Aljahdali et al. [7] also compared the LSTM network and layered RNN to forecast streamflow in two US rivers, the Black and Gila rivers. A recent article by Rahimzad et al. [12] used time-lagged Qt-1, Qt-2, and other climatic variables to forecast Qt and concluded that the LSTM network outperforms linear regression (LR), multilayer perceptron (MLP), and support vector machine (SVM) in forecasting daily streamflow.

A few years back, Cho et al. [13] introduced gated recurrent units (GRUs), similar to LSTM with a forget gate but with fewer parameters, as a GRU lacks an output gate. GRU's capacities in speech signal modelling and natural language processing were similar to those of LSTM. However, there are debates on the relative performance of these two architectures for streamflow and reservoir inflow simulation, which is not well studied across different timescales and environments.

Notwithstanding the difference in their performance, selecting appropriate time series models from the various known deep learning network architectures is difficult; LSTMs and GRUs are not always the ideal sequence prediction option. Simulation with better prediction accuracy, fast running time, and less complicated models requires more research. Hence, this comparative analysis of network architectures helps identify the optimal alternative for time series analysis. Recently, different hybrid deep learning models have been getting wide attention from researchers in various fields of study. Chen et al. [14] used convolutional neural network (CNN), LSTM, and hybrid CNN-LSTM models for nitrogen oxide emission prediction; they concluded that CNN-LSTM gives an accurate and stable forecast of periodic nitrogen oxide emissions from the refining industry. Moreover, Li et al. [15] used univariate and multivariate time series data as input for LSTM and CNN-LSTM models; for air quality analysis using particulate matter (PM2.5) concentration prediction, the proposed multivariate CNN-LSTM model gave the best result due to low error and short training time.

The integration of CNN and LSTM models benefits time series prediction in that the LSTM model can efficiently capture long sequences of pattern information, while CNN models filter out the noise of the input data and extract more valuable features, which can increase the accuracy of the prediction model [16]. Moreover, integrating CNN with GRU can also provide robust preprocessing of the data, a viable option to improve a model's accuracy [17]. Even though combining CNN with LSTM has shown remarkable results in different studies, its application in hydrological fields still demands more research [18]. Muhammad et al. [19] used LSTM, GRU, and hybrid CNN-GRU models for streamflow simulation based on 35 years of the Model Parameter Estimation Experiment (MOPEX) dataset for 10 river basins in the USA; they revealed that the proposed hybrid model outperforms the conventional LSTM, though its performance is almost the same as GRU's. Recently, Barzegar et al. [20] studied short-term water quality variable prediction using a hybrid CNN-LSTM model and effectively captured low and high water quality variables, mainly dissolved oxygen concentrations.

Screening input variables for different model architectures is also a challenging task for researchers. Even though rainfall, evaporation, and temperature are causal variables for streamflow modelling, data availability and study objectives limit the choice [21]. Van et al. [21] found that adding temperature and evapotranspiration input nodes to the model increases network complexity and causes overfitting. In contrast, Parisouj et al. [22] concluded that using readily available input variables such as temperature and precipitation for data-driven streamflow simulation provides reliable results. Hence, this research contributes to this debate by testing different input combinations from various climatic regions on the performance of the proposed models.

To the best of our knowledge, there is minimal literature examining the performance variation of different hybrid models for streamflow simulation under various input variability conditions at once. Thus, we compared various forms of hybrid CNN-LSTM and CNN-GRU architectures with the classical MLP, GRU, and LSTM networks to simulate single-step streamflow in two climatic regions using available precipitation and minimum and maximum temperature data. Moreover, the study tests the hybrid models with different layer arrangements and applies the Keras tuner to optimize model hyperparameters. In general, the primary objective of this study is to test the performance variation of the proposed models under extreme input variability conditions, including climatic input combination, input time window, and average rolling time window variability.

This study used different open-source software and machine learning libraries, including Python 3.6 for programming and the NumPy, pandas, scikit-learn, Hydroeval, statsmodels, and Matplotlib libraries for data preprocessing, evaluation, and graphical interpretation. Moreover, the TensorFlow and Keras deep learning frameworks were employed for modelling the deep learning architectures.

2. Study Area

In the present study, two river subcatchments were selected in two climatic regions: the Awash River Basin Borkena subcatchment in Ethiopia (Figure 1(a)) and the Upper Tiber River Basin in Italy (Figure 1(b)).

2.1. Borkena Watershed (Ethiopia). The first case study area is in the Borkena watershed at the Kombolcha station outlet, located in the upper part of the Awash River Basin in northern Ethiopia. The mainstream of the watershed emanates from Tosa mountain, found near Dessie town. The area's altitude ranges from 1,775 m at the lowest site near Kombolcha to 2,638 m at the highest site upstream of Dessie. The main rainy season of this watershed is from July to September.

2.2. Upper Tiber River Basin (Italy). The second case study area is located in the Upper Tiber River Basin (UTRB) in Italy. The Tiber River Basin (TRB) is the second-largest catchment in Italy [23]. Geographically, the basin lies between 40.5°N and 43°N latitude and 10.5°E and 13°E longitude, covering about 17,500 km², roughly 5% of the Italian territory. The Upper Tiber River Basin (UTRB) is part of the TRB, covering 4,145 km² (~20% of the TRB), with its outlet at Ponte Nuovo. The elevation of the catchment ranges from 148 to 1,561 m above sea level. The area's climate is Mediterranean, with precipitation mainly occurring from autumn (September to November) to spring (March to May). Intense rainfall strongly influences the basin's hydrology in the upstream part, causing frequent floods in the downstream areas [24].

3. Data Source and Preprocessing

Borkena's required hydrological and meteorological datasets were collected from the Ministry of Water, Irrigation and Energy (MoWIE) of Ethiopia and the National Meteorological Agency of Ethiopia (NMA), respectively. UTRB's datasets were collected from the National Research Council of Italy (CNR) and are archived for public use on the Water Resource Management and Evaluation (WRME) platform at the following link: http://hydrogate.unipg.it/wrme.

We collected 5,844 daily records from the time window of January 1, 1999 to December 31, 2014 for the Borkena watershed. Similarly, for UTRB, 7,670 daily records were collected from January 1, 1958 to December 31, 1978. Both case study datasets are multivariate and multisite. Even though we carefully chose time windows with minimal data gaps for both stations, the datasets contain many missing values for various reasons. Thus, our first task for this research was to fill the missing values with a Monte Carlo approach.
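The paper does not detail the gap-filling procedure beyond naming a Monte Carlo approach, so the sketch below shows only one plausible reading under that assumption: each missing value is drawn at random from the empirical distribution of the observed values of the same variable. The file name and column layout are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

def monte_carlo_fill(series: pd.Series) -> pd.Series:
    """Replace NaNs with random draws from the variable's observed values."""
    observed = series.dropna().to_numpy()
    filled = series.copy()
    n_missing = int(filled.isna().sum())
    filled[filled.isna()] = rng.choice(observed, size=n_missing)
    return filled

# Hypothetical multisite daily file: one datetime index, one column per series.
df = pd.read_csv("borkena_daily.csv", parse_dates=["date"], index_col="date")
df = df.apply(monte_carlo_fill)  # fill every column independently
```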

The study applied linear correlation statistics to measure the strength of dependency between the different input variables [25]. Even though Mehr and Gandomi [26] stated that linear correlation might mislead or provide redundant inputs, our study does not have a feature set large enough to require intensive feature selection criteria, so we adopted a linear correlation coefficient. Moreover, Kun et al. [27] concluded that the Pearson correlation coefficient (PCC) is the most applicable for multiple linear regressions (MLRs), and Oyebode [28] also reported that inputs selected with PCC showed superior model accuracy. Hence, this study applied the Pearson linear correlation coefficient [29, 30]. It has a value ranging between +1 and -1, where +1 indicates a positive linear correlation, 0 indicates no linear correlation, and -1 indicates a negative linear correlation [25]. Equation (1) calculates the Pearson correlation coefficient, and Tables 1 and 2 present the results. Correlation values between 0 and 0.3 (or 0 and -0.3) show a weak linear relationship among variables [31]. However, since we have a small number of variables and a small data size for this study, we decided to omit only the Borkena station maximum temperature (Tmax) values, whose r values range between -0.129 and +0.107; the details are presented in Table 1.

$$r = \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}} \qquad (1)$$
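As a quick illustration of this screening step, equation (1) is exactly what pandas computes with its Pearson correlation; a minimal sketch (with hypothetical file and column names) is:

```python
import pandas as pd

df = pd.read_csv("borkena_daily.csv", parse_dates=["date"], index_col="date")

# Pearson r of every candidate predictor against streamflow, per equation (1).
r = df.corr(method="pearson")["streamflow"].sort_values(ascending=False)
print(r)  # |r| < 0.3 marks a weak predictor; Borkena Tmax was dropped on this basis
```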

After rigorous quality control, the raw data were split chronologically into training and testing datasets at a ratio of 80:20. The time series graph and the corresponding box plot of the split data for both stations are presented in Figure 2. Different options exist in the literature to remove noise from a time series. A sliding window is the first option, temporarily approximating the actual value of the time series data [32]. A rolling window (moving average) is the second option, smoothing the time series by calculating the average, maximum, minimum, or sum over a specific period [33]. Hence, for this study, we applied average rolling windows to smooth and remove noise from the time series while keeping the length of the data unchanged.

Then, daily, weekly, and monthly average rolling windows were used to rebuild the input and output time series into a supervised learning format. Accordingly, the rolled time series data were prepared with a time lag window of 30 days for Borkena and 45 days for UTRB for single-step streamflow simulation. Moreover, the split time series variables were scaled with StandardScaler for computational ease and numerical stability in the modelling process.
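A minimal sketch of this preprocessing pipeline, assuming streamflow is the first column and using the 30-day Borkena lag window, might look as follows; the rolling window length, file name, and column order are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("borkena_daily.csv", parse_dates=["date"], index_col="date")

# Average rolling window (here ~monthly, 30 days); series length is preserved.
rolled = df.rolling(window=30, min_periods=1).mean()

# Chronological 80:20 split -- no shuffling for time series.
n_train = int(len(rolled) * 0.8)
train, test = rolled.iloc[:n_train], rolled.iloc[n_train:]

# Fit the scaler on the training block only, then apply it to both splits.
scaler = StandardScaler()
train_s, test_s = scaler.fit_transform(train), scaler.transform(test)

def make_windows(arr: np.ndarray, lag: int, target_col: int = 0):
    """Rebuild a scaled array into supervised (X, y) lag windows."""
    X, y = [], []
    for t in range(lag, len(arr)):
        X.append(arr[t - lag:t, :])   # past `lag` days, all features
        y.append(arr[t, target_col])  # one-step-ahead streamflow
    return np.array(X), np.array(y)

X_train, y_train = make_windows(train_s, lag=30)
X_test, y_test = make_windows(test_s, lag=30)
```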


4. Methods

In this study, three types of network architectures, MLP, GRU, and LSTM, were compared with the proposed hybrid deep neural network architectures, CNN-LSTM and CNN-GRU, for the simulation of single-step streamflow by taking different combinations of precipitation (P), minimum temperature (Tmin), and maximum temperature (Tmax) as inputs. The proposed simulation model architectures with their input and output variables are briefly presented as a flowchart in Figure 3.

4.1. Deep Learning Models. Deep learning models are part of a broader family of machine learning and include recurrent neural networks (RNNs), convolutional neural networks (CNNs), deep belief networks (DBNs), and deep neural networks (DNNs). These models have been applied to different fields of study, including speech recognition, computer vision, natural language processing, and time series analysis [13, 16, 34-36]. The following sections briefly discuss the architectures used in the present study.

4.2. Artificial Neural Network (ANN). The artificial neural network (ANN) is the most common machine learning model and has found application in streamflow simulation over the last two decades [1, 37]. It is known for modelling the complex input-output relationships inherent in hydrological time series within a river catchment. The traditional feedforward neural network (FFNN), with input, hidden, and output layers trained by the backpropagation (BP) algorithm, gained popularity for nonlinear hydrological time series modelling.

Figure 4 displays the typical architecture of an ANN; the network output is computed as

$$\hat{y}_j = f_j\left[\sum_{h=1}^{m} w_{jh} \cdot f_h\left(\sum_{i=1}^{n} w_{hi}x_i + w_{hb}\right) + w_{jb}\right] \qquad (2)$$

where i, h, j, b, and w indicate the input, hidden, and output layer neurons, the bias, and the applied neuron weights, respectively; fh and fj are the activation functions of the hidden and output layers, respectively; xi, n, and m represent the input value and the numbers of input and hidden neurons, respectively; and y and ŷj denote the observed and calculated target values, respectively. In the calibration phase of the model, the hidden and output layer values and the corresponding weights are varied and calibrated [38].

The ability of ANNs to link input and output variables in complex hydrological systems without prior knowledge about the nature of the process has led to a huge leap in the use of ANN models in hydrological simulations [38].
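For concreteness, below is a minimal Keras sketch of an MLP baseline of the kind compared in this study: two hidden ReLU layers over a flattened lag window and a single linear output neuron, trained with Adam and MSE as described in Section 5.1. The unit counts fall inside the Table 4 search range but are illustrative, not the tuned values.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

n_features = 2 * 30  # e.g. P and Tmin over a 30-day lag window, flattened

mlp = models.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(20, activation="relu"),
    layers.Dense(10, activation="relu"),
    layers.Dense(1),  # single-step streamflow
])
mlp.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
mlp.summary()
```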

4.3. Long Short-Term Memory (LSTM). LSTM differs from the classical MLP network in that the neuron layers in LSTM have recurrent connections, so the state from the previous activation time step is used to formulate an output. The LSTM replaces the typical neuron in the hidden layer with a memory cell and three gates: an input gate, a forget gate, and an output gate [39]. It is an advanced form of recurrent neural network (RNN) that can capture long-term dependencies. An RNN, by contrast, is a circular network in which an additional input represents the state of the hidden layer neurons at the previous time steps [40]. LSTM has two critical benefits over RNN: overcoming vanishing and exploding gradients, and holding memory to capture the long-term temporal dependency in input sequences. The mathematical formulation of the different parameters is listed in Table 3, and Figure 5 displays the LSTM memory cell with its three gated layers. *Wi, Wf, Wo, and Wc are the weights that map the hidden layer input to the three gates of input, forget, and output; the Ui, Uf, Uo, and Uc weight matrices map the hidden layer output to the gates; bi, bf, bo, and bc are bias vectors; and Ct and ht are the outcome of the cell and the outcome of the layer, respectively.

Figure 1: Location of the case study areas. (a) Borkena. (b) UTRB.


4.4. Gated Recurrent Unit (GRU). The GRU is a special type of LSTM architecture that merges the input and forget gates into an update gate, which makes the number of parameters fewer and the training easier. There are two input features at each time step: the input vector xt and the previous output vector ht-1. The output of each gate can be calculated through logical operations and nonlinear transformations of the input [34]. The mathematical formulations among the inputs, outputs, and different parameters are listed in equations (3)-(6). Moreover, Figure 6 displays the structure of the gated recurrent unit (GRU) network.

$$Z_t = \sigma\left(W_z X_t + U_z h_{t-1} + b_z\right) \qquad (3)$$

$$r_t = \sigma\left(W_r X_t + U_r h_{t-1} + b_r\right) \qquad (4)$$

$$\tilde{h}_t = \tanh\left(W_h X_t + U_h\left(r_t \ast h_{t-1}\right) + b_h\right) \qquad (5)$$

$$h_t = \left(1 - Z_t\right) \ast h_{t-1} + Z_t \ast \tilde{h}_t \qquad (6)$$

where Zt is the update gate vector, rt is the reset gate vector, h̃t is the candidate activation, W and U are parameter matrices, σ is the sigmoid function, and tanh is the hyperbolic tangent.
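To make the gate algebra concrete, the following numpy transcription performs one GRU step exactly as in equations (3)-(6); the random weights are placeholders that a trained layer would supply.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU time step; W, U, b are dicts keyed by gate: 'z', 'r', 'h'."""
    z_t = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])             # update gate, eq. (3)
    r_t = sigmoid(W["r"] @ x_t + U["r"] @ h_prev + b["r"])             # reset gate, eq. (4)
    h_cand = np.tanh(W["h"] @ x_t + U["h"] @ (r_t * h_prev) + b["h"])  # candidate, eq. (5)
    return (1.0 - z_t) * h_prev + z_t * h_cand                         # new state, eq. (6)

n_in, n_hid = 5, 20  # e.g. five input series, twenty hidden units
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((n_hid, n_in)) for k in "zrh"}
U = {k: rng.standard_normal((n_hid, n_hid)) for k in "zrh"}
b = {k: np.zeros(n_hid) for k in "zrh"}

h = np.zeros(n_hid)
h = gru_step(rng.standard_normal(n_in), h, W, U, b)
```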

4.5. Convolutional Neural Network (CNN). The convolutional neural network (CNN) is one of the most successful deep learning models, especially for feature extraction, and its network structures include 1D CNN, 2D CNN, and 3D CNN [15]. A CNN structure generally consists of a convolution layer, a pooling layer, and a fully connected layer [18].

Table 1: Descriptive statistics of the split time series data for the Borkena watershed.

| Station | Data type | Pearson r with streamflow | Train mean | Train max | Train min | Train SD | Test mean | Test max | Test min | Test SD |
|---|---|---|---|---|---|---|---|---|---|---|
| Kombolcha | Streamflow (m³/sec) | 1.000 | 10.9 | 216.9 | 0.00 | 23.2 | 10.1 | 94.8 | 0.0 | 20.2 |
| Kombolcha | P (mm/day) | 0.321 | 3.1 | 73.2 | 0.0 | 7.5 | 2.9 | 60.4 | 0.0 | 7.2 |
| Kombolcha | Tmin (°C) | 0.271 | 12.5 | 20.9 | 1.5 | 3.3 | 12.5 | 20.6 | 2.6 | 3.4 |
| Kombolcha | Tmax (°C) | -0.099 | 27.2 | 33.6 | 16.4 | 2.5 | 27.3 | 33.0 | 19.6 | 2.1 |
| Chefa | P (mm/day) | 0.344 | 3.5 | 81.6 | 0.0 | 8.6 | 3.4 | 64.3 | 0.0 | 8.1 |
| Chefa | Tmin (°C) | 0.266 | 13.3 | 21.5 | 0.1 | 3.7 | 14.1 | 22.2 | 3.9 | 3.5 |
| Chefa | Tmax (°C) | -0.069 | 29.9 | 38.0 | 18.5 | 2.8 | 30.3 | 38.0 | 22.2 | 2.5 |
| Dessie | P (mm/day) | 0.335 | 3.5 | 80.6 | 0.0 | 8.6 | 2.9 | 67.0 | 0.0 | 7.3 |
| Dessie | Tmin (°C) | 0.319 | 8.5 | 15.5 | 0.1 | 2.5 | 7.8 | 15.5 | 0.0 | 3.1 |
| Dessie | Tmax (°C) | 0.107 | 23.8 | 30.0 | 16.0 | 1.9 | 24.1 | 30.0 | 15.0 | 2.1 |
| Kemise | P (mm/day) | 0.372 | 3.1 | 81.9 | 0.0 | 8.3 | 2.9 | 72.1 | 0.0 | 7.5 |
| Kemise | Tmin (°C) | 0.282 | 13.8 | 22.0 | 3.0 | 3.4 | 13.5 | 20.1 | 4.5 | 3.6 |
| Kemise | Tmax (°C) | -0.129 | 31.0 | 38.3 | 14.0 | 2.7 | 31.9 | 37.8 | 23.5 | 2.4 |
| Majete | P (mm/day) | 0.347 | 3.3 | 80.7 | 0.0 | 8.6 | 3.3 | 81.3 | 0.0 | 8.6 |
| Majete | Tmin (°C) | 0.202 | 14.7 | 23.0 | 1.4 | 2.9 | 14.6 | 21.5 | 6.7 | 2.9 |
| Majete | Tmax (°C) | -0.057 | 28.6 | 37.8 | 17.2 | 2.8 | 29.1 | 38.0 | 20.8 | 2.4 |

Train columns describe the training data (80%); test columns describe the testing data (20%).

Table 2: Descriptive statistics of the split time series data for the UTRB.

| Station | Data type | Pearson r with streamflow | Train mean | Train max | Train min | Train SD | Test mean | Test max | Test min | Test SD |
|---|---|---|---|---|---|---|---|---|---|---|
| Ponte Nuovo | Streamflow (m³/sec) | 1.000 | 50.6 | 939.0 | 1.9 | 75.5 | 50.6 | 737.0 | 3.7 | 68.6 |
| Castel Rigone | P (mm/day) | 0.384 | 2.6 | 72.8 | 0.0 | 6.6 | 2.7 | 67.7 | 0.0 | 6.9 |
| Montecoronaro | P (mm/day) | 0.339 | 3.9 | 229.0 | 0.0 | 10.7 | 4.0 | 110.0 | 0.0 | 10.5 |
| Perugia (ISA) | P (mm/day) | 0.379 | 2.4 | 120.4 | 0.0 | 6.6 | 2.5 | 61.8 | 0.0 | 6.3 |
| Perugia (ISA) | Tmin (°C) | -0.353 | 9.7 | 30.4 | -9.0 | 6.3 | 9.3 | 25.2 | -5.0 | 5.6 |
| Perugia (ISA) | Tmax (°C) | -0.379 | 17.4 | 37.4 | -4.5 | 8.1 | 16.3 | 33.0 | 0.6 | 7.2 |
| Petrelle | P (mm/day) | 0.345 | 2.51 | 90.0 | 0.0 | 6.9 | 2.7 | 117.1 | 0.0 | 7.4 |
| Pietralunga | P (mm/day) | 0.428 | 3.22 | 150.0 | 0.0 | 8.1 | 3.1 | 73.1 | 0.0 | 7.3 |
| Spoleto | P (mm/day) | 0.412 | 2.9 | 113.6 | 0.0 | 7.9 | 2.9 | 94.2 | 0.0 | 7.8 |
| Spoleto | Tmin (°C) | -0.265 | 7.5 | 23.0 | -12.6 | 6.4 | 8.8 | 21.7 | -5.4 | 5.8 |
| Spoleto | Tmax (°C) | -0.383 | 18.8 | 38.7 | -3.5 | 8.6 | 18.7 | 36.8 | 2.0 | 7.8 |
| Torgiano | P (mm/day) | 0.364 | 2.4 | 141.2 | 0.0 | 7.1 | 2.5 | 62.0 | 0.0 | 6.9 |
| Gubbio | Tmin (°C) | -0.315 | 8.7 | 26.0 | -12.0 | 5.9 | 6.1 | 19.3 | -11.3 | 5.4 |
| Gubbio | Tmax (°C) | -0.377 | 18.1 | 39.0 | -8.0 | 8.1 | 17.4 | 34.1 | -0.9 | 7.5 |
| Assisi | Tmin (°C) | -0.325 | 9.2 | 25.6 | -11.6 | 6.2 | 8.2 | 21.5 | -8.0 | 5.6 |
| Assisi | Tmax (°C) | -0.378 | 18.2 | 37.8 | -5.0 | 8.3 | 18.1 | 35.8 | 0.0 | 7.8 |

Train columns describe the training data (80%); test columns describe the testing data (20%).


1D CNN is mainly implemented for sequence data processing [41], 2D CNN is usually used for text and image identification [42], and 3D CNN is typically used for medical image and video data identification [43]. Since the aim of the present study is time series analysis, we implemented 1D CNN. The detailed process of 1D CNN is described in Figure 7. As depicted in Figure 7, the input series is convoluted through the convolution layer from top to bottom (shown by the arrows). The grey and mesh colours represent different filters; the size of the convolution layer output depends on the number of input data dimensions, the size of the filter, and the convolution step length.
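The arithmetic in Figure 7 is plain cross-correlation, as the small numpy check below illustrates for a filter of size 2; the series and weights are arbitrary examples.

```python
import numpy as np

P = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # input series P1..Pn
W = np.array([0.5, -0.2])                # filter weights W1, W2

# C_i = W1*P_i + W2*P_(i+1), exactly as in Figure 7
C = np.array([W[0] * P[i] + W[1] * P[i + 1] for i in range(len(P) - 1)])
assert np.allclose(C, np.correlate(P, W, mode="valid"))
```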

4.6. CNN-LSTM and CNN-GRU Hybrid Models. In this study, hybrid models were designed by integrating CNN with LSTM or GRU layers. The feature sequence from the CNN layer is taken as the input for the LSTM or GRU layer, from which the short- and long-time dependencies are further extracted.

The proposed CNN-LSTM and CNN-GRU models contain two main components. The first consists of one-dimensional single or double convolutional and average pooling layers, plus a flatten layer that further processes the data into the format required by the LSTM or GRU. In the second component, the generated features are processed using LSTM or GRU and dense layers. Additionally, dropouts are introduced to prevent overfitting. Figure 8 shows the designed model inputs and outputs with a basic description of the convolutional, pooling, and LSTM or GRU layers proposed for this project.
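Below is a minimal Keras sketch of the single-convolution CNN-GRU variant (CNN-GRU2), using the Borkena hyperparameters later reported in Table 8. It is a simplified plain-Conv1D reading of the structure in Figure 11 (the published figure wraps the convolutional stage in TimeDistributed layers and flattens before the GRU), not a line-for-line reproduction of the authors' code.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

lag, n_features = 30, 5  # 30-day window over five input series (Figure 11)

model = models.Sequential([
    layers.Input(shape=(lag, n_features)),
    layers.Conv1D(filters=24, kernel_size=2, activation="relu"),  # -> (29, 24)
    layers.AveragePooling1D(pool_size=3),                         # -> (9, 24)
    layers.GRU(15, return_sequences=True),
    layers.Dropout(0.1),
    layers.GRU(20),
    layers.Dropout(0.2),
    layers.Dense(1),  # single-step streamflow
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
model.summary()
```

Training would then follow the tuned settings, e.g. `model.fit(X_train, y_train, epochs=80, batch_size=20, validation_data=(X_test, y_test))`.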

5. Data Analysis

Simulation with deep learning requires selecting a suitable combination of hyperparameters: batch size, epochs, number of layers, and number of units for each layer [8]. Optimizing hyperparameters is not always consistent, as there is no hard rule to follow; "the process is more of an art than a science" [44]. Hence, in this study, we chose the Keras tuner optimizer, developed by the Google team and included in the open Keras library [45, 46].

5.1. Hyperparameter Optimization. Tuning machine learning model hyperparameters is critical; varying hyperparameter values often results in models with significantly different performances [47]. The models applied in this study mainly contain two types of hyperparameters: constant hyperparameters that are not altered through the optimization process and variable hyperparameters. The Adam optimizer is applied as a constant hyperparameter because of its efficiency and ease of implementation; it requires minimal memory and suits many different problems [48]. In this category, the rectified linear unit (ReLU) was used as the activation function, and mean squared error (MSE) was used as the loss function.

Figure 2: Streamflow time series graph and the corresponding box plot of the split data. (a) Borkena. (b) UTRB.

In contrast, the second type, the variable hyperparameters, is optimized by the Keras tuner, with hyperparameter choices or value ranges set over different trials. We also considered our PC capacity (processor: Intel(R) Core(TM) i7-6500U CPU @ 2.50 GHz; RAM: 8 GB) with the Windows 10 operating system. Hyperparameters were optimized with 20 trials, and since deep learning networks produce different training and validation plots for each run, we decided to repeat each trial three times.

All hyperparameter ranges and choices are listed in Table 4. The CNN-LSTM1 and CNN-GRU1 models used hyperparameter values from numbers 1 to 13 for optimization (Table 4), while numbers 4, 5, and 6 were omitted for CNN-LSTM2 and CNN-GRU2. The remaining deep learning models, MLP, LSTM, and GRU, used the hyperparameters from numbers 7 to 13. Finally, the optimized hyperparameters were used for each training and testing experiment. Moreover, the train and test traces from each run can be plotted to give a more robust idea of the behaviour of the model and to inspect overfitting and underfitting issues.
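A sketch of this search with the Keras tuner is shown below, wiring the Table 4 ranges for hyperparameters 7-13 (the non-convolutional subset used by the MLP, LSTM, and GRU models) into a GRU builder; the objective, trial count, and executions per trial follow the Table 4 footnote, while the input shape assumes the Borkena lag window.

```python
import keras_tuner as kt
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(hp):
    model = models.Sequential([
        layers.Input(shape=(30, 5)),
        layers.GRU(hp.Int("layer_1_units", 5, 30, step=5), return_sequences=True),
        layers.Dropout(hp.Float("dropout_1", 0.0, 0.3, step=0.1, default=0.2)),
        layers.GRU(hp.Int("layer_2_units", 5, 30, step=5)),
        layers.Dropout(hp.Float("dropout_2", 0.0, 0.3, step=0.1, default=0.2)),
        layers.Dense(1),
    ])
    lr = hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr), loss="mse")
    return model

tuner = kt.RandomSearch(build_model, objective="val_loss",
                        max_trials=20, executions_per_trial=3)
# tuner.search(X_train, y_train, validation_split=0.2, epochs=50)
# (Epochs and batch size, items 12-13 in Table 4, can be swept in the
#  search call or via a custom tuner subclass.)
```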

5.2. Performance Measures. A wide variety of evaluation metrics are listed in the literature [49]; popular ones include mean error (ME), coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), mean percentage error (MPE), mean absolute percentage error (MAPE), mean absolute scaled error (MASE), and Nash–Sutcliffe efficiency (NSE). This study used different input and model variability conditions. Hence, to concisely measure the analysis output and present the results, we applied the following three standard performance evaluation criteria, which also have the potential to capture extreme streamflow time series values effectively [50].

Coefficient of determination (R²):

$$R^2 = \frac{n\sum\left(Q_{obs}Q_{sim}\right) - \left(\sum Q_{obs}\right)\left(\sum Q_{sim}\right)}{\sqrt{\left[n\sum Q_{obs}^2 - \left(\sum Q_{obs}\right)^2\right]}\sqrt{\left[n\sum Q_{sim}^2 - \left(\sum Q_{sim}\right)^2\right]}} \qquad (7)$$

Root mean square error (RMSE):

$$RMSE = \sqrt{\frac{\sum_{t=1}^{N}\left(Q_{obs}^{t} - Q_{sim}^{t}\right)^2}{N}} \qquad (8)$$

Mean absolute error (MAE):

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|Q_{obs}^{t} - Q_{sim}^{t}\right| \qquad (9)$$

Figure 3: A simple architecture of the proposed models (inputs: P, Tmin, and Tmax time series per catchment; model architectures: MLP, GRU, LSTM, CNN-LSTM, and CNN-GRU; output: Q).

Figure 4: Typical architecture of an ANN (input layer, hidden layer, and output layer).


Table 3: Mathematical formulation for the LSTM cell.

| Network gate | Purpose | Equations* |
|---|---|---|
| Forget gate | Chooses the information to reject from the cell | f_t = σ(u_f x_t + w_f h_(t-1) + b_f) |
| Input gate | Decides what information is relevant to update in the current cell state | i_t = σ(u_i x_t + w_i h_(t-1) + b_i) |
| Output gate | Decides what to output based on input and the long-term memory of the cell | o_t = σ(u_o x_t + w_o h_(t-1) + b_o) |
| Cell state | Long-term memory | C̃_t = tanh(W_c X_t + U_c h_(t-1) + b_c); C_t = f_t ∗ C_(t-1) + i_t ∗ C̃_t |
| Hidden state | Short-term memory | h_t = tanh(C_t) ∗ o_t |
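Analogously to the GRU step in Section 4.4, the numpy sketch below runs one LSTM step by transcribing the Table 3 equations directly; the random weights are placeholders. (Table 3 writes the candidate cell state with W on the input and U on the hidden state; the roles of input and recurrent weights are the same, so the code uses one consistent convention.)

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, U, W, b):
    """One LSTM time step; U, W, b are dicts keyed by 'f', 'i', 'o', 'c'."""
    f_t = sigmoid(U["f"] @ x_t + W["f"] @ h_prev + b["f"])     # forget gate
    i_t = sigmoid(U["i"] @ x_t + W["i"] @ h_prev + b["i"])     # input gate
    o_t = sigmoid(U["o"] @ x_t + W["o"] @ h_prev + b["o"])     # output gate
    c_cand = np.tanh(U["c"] @ x_t + W["c"] @ h_prev + b["c"])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_cand                          # long-term memory
    h_t = np.tanh(c_t) * o_t                                   # short-term memory
    return h_t, c_t

n_in, n_hid = 5, 20
rng = np.random.default_rng(1)
U = {k: rng.standard_normal((n_hid, n_in)) for k in "fioc"}
W = {k: rng.standard_normal((n_hid, n_hid)) for k in "fioc"}
b = {k: np.zeros(n_hid) for k in "fioc"}

h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.standard_normal(n_in), h, c, U, W, b)
```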

Figure 5: LSTM memory cell with three gated layers [11].

Figure 6: The structure of the gated recurrent unit (GRU) network [34].

Figure 7: The process of 1D CNN [15]. For a filter of size 2 with weights W1 and W2 sliding over a multivariate time series P1 ... Pn: C1 = W1P1 + W2P2, C2 = W1P2 + W2P3, ..., Cn = W1Pn-1 + W2Pn.


Figure 8: The basic architecture of the proposed CNN-LSTM or CNN-GRU models. Daily multivariate time series (daily, weekly, or monthly rolled averages; P and Tmin of the past 30 days for the Borkena stations; P, Tmin, and Tmax of the past 45 days for the Upper Tiber stations) feed the pipeline input → 1D conv → average pooling → flatten → LSTM or GRU → dropout → LSTM or GRU → dropout → dense (single output), producing daily, weekly, or monthly rolled single-step streamflow for both stations.

Table 4: Model hyperparameter choices or value ranges for optimization by the Keras tuner.**

| No. | Hyperparameter | Min | Max | Step | Choices | Default |
|---|---|---|---|---|---|---|
| 1 | Conv_1_filter | 8 | 32 | 8 | * | * |
| 2 | Conv_1_kernal | * | * | * | 2 or 3 | * |
| 3 | Conv_1_pool_size | * | * | * | 2 or 3 | * |
| 4 | Conv_2_filter | 8 | 32 | 8 | * | * |
| 5 | Conv_2_kernal | * | * | * | 2 or 3 | * |
| 6 | Conv_2_pool_size | * | * | * | 2 or 3 | * |
| 7 | CNN-LSTM1, CNN-LSTM2, CNN-GRU1, CNN-GRU2, LSTM, GRU, or MLP layer 1 units | 5 | 30 | 5 | * | * |
| 8 | Dropout 1 | 0.0 | 0.3 | 0.1 | * | 0.2 |
| 9 | CNN-LSTM1, CNN-LSTM2, CNN-GRU1, CNN-GRU2, LSTM, GRU, or MLP layer 2 units | 5 | 30 | 5 | * | * |
| 10 | Dropout 2 | 0.0 | 0.3 | 0.1 | * | 0.2 |
| 11 | Learning rate | * | * | * | 1e-2, 1e-3, or 1e-4 | * |
| 12 | Number of epochs | 10 | 100 | 10 | * | * |
| 13 | Number of batch sizes | 10 | 100 | 10 | * | * |

**Value ranges or choices for optimization by the Keras tuner (objective: validation loss; max trials: 20; executions per trial: 3). *Not applicable.

Table 5: Daily streamflow simulation performance comparison of the proposed models for different input variables and climatic conditions (Model 1). Column groups: Borkena (P + Tmin), Borkena (P), UTRB (P + Tmin + Tmax), UTRB (P); each group lists RMSE, MAE, R², and TTPE* (sec).

| Model | RMSE | MAE | R² | TTPE | RMSE | MAE | R² | TTPE | RMSE | MAE | R² | TTPE | RMSE | MAE | R² | TTPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MLP | 9.91 | 5.01 | 0.77 | 0.89 | 9.38 | 4.63 | 0.79 | 0.63 | 49.11 | 22.74 | 0.49 | 0.78 | 56.57 | 28.14 | 0.33 | 0.41 |
| GRU | 8.78 | 4.37 | 0.82 | 3.61 | 7.94 | 3.64 | 0.85 | 3.32 | 46.63 | 20.89 | 0.55 | 2.61 | 51.09 | 26.74 | 0.45 | 3.39 |
| LSTM | 8.41 | 4.09 | 0.83 | 2.35 | 9.65 | 4.87 | 0.78 | 2.92 | 48.64 | 22.79 | 0.51 | 3.86 | 48.59 | 25.00 | 0.51 | 5.98 |
| CNN-LSTM1 | 8.09 | 4.07 | 0.84 | 0.46 | 8.57 | 4.67 | 0.82 | 0.41 | 51.20 | 22.95 | 0.45 | 1.19 | 56.16 | 26.55 | 0.34 | 0.57 |
| CNN-LSTM2 | 7.99 | 4.09 | 0.85 | 0.72 | 9.14 | 4.50 | 0.80 | 0.45 | 45.38 | 21.85 | 0.57 | 0.82 | 51.57 | 25.84 | 0.44 | 1.85 |
| CNN-GRU1 | 7.94 | 3.66 | 0.85 | 0.63 | 8.32 | 4.09 | 0.83 | 0.86 | 55.06 | 23.49 | 0.37 | 1.16 | 52.42 | 24.98 | 0.43 | 0.83 |
| CNN-GRU2 | 9.07 | 4.19 | 0.80 | 1.01 | 8.43 | 4.26 | 0.83 | 0.28 | 45.61 | 21.79 | 0.57 | 0.64 | 49.96 | 25.38 | 0.48 | 0.68 |

*TTPE: training time per epoch.

Table 6: Weekly rolled streamflow simulation performance comparison of the proposed models for different input variables and climatic conditions (Model 2). Column groups as in Table 5.

| Model | RMSE | MAE | R² | TTPE | RMSE | MAE | R² | TTPE | RMSE | MAE | R² | TTPE | RMSE | MAE | R² | TTPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MLP | 8.11 | 4.29 | 0.84 | 0.23 | 7.33 | 4.19 | 0.87 | 0.22 | 33.01 | 20.01 | 0.60 | 0.74 | 38.03 | 25.17 | 0.47 | 0.79 |
| GRU | 7.59 | 3.71 | 0.86 | 2.04 | 7.15 | 4.13 | 0.87 | 2.43 | 25.21 | 14.83 | 0.77 | 5.56 | 31.39 | 19.26 | 0.64 | 16.79 |
| LSTM | 8.41 | 4.01 | 0.82 | 2.98 | 7.93 | 3.91 | 0.84 | 1.27 | 31.07 | 18.87 | 0.65 | 3.55 | 31.07 | 19.49 | 0.65 | 2.69 |
| CNN-LSTM1 | 7.90 | 4.09 | 0.85 | 0.78 | 7.72 | 4.25 | 0.85 | 0.63 | 28.04 | 17.33 | 0.71 | 0.93 | 34.57 | 21.92 | 0.57 | 0.62 |
| CNN-LSTM2 | 7.33 | 3.86 | 0.87 | 0.52 | 7.63 | 4.25 | 0.86 | 0.55 | 28.45 | 16.66 | 0.71 | 1.14 | 35.04 | 21.77 | 0.55 | 1.56 |
| CNN-GRU1 | 7.83 | 3.94 | 0.85 | 0.44 | 7.91 | 4.31 | 0.85 | 0.50 | 30.57 | 18.01 | 0.66 | 2.32 | 35.14 | 22.58 | 0.55 | 0.63 |
| CNN-GRU2 | 8.73 | 4.61 | 0.81 | 0.43 | 8.43 | 4.35 | 0.82 | 0.97 | 27.81 | 16.99 | 0.72 | 4.37 | 33.76 | 23.01 | 0.59 | 1.01 |

*TTPE: training time per epoch.

Table 7: Monthly rolled streamflow simulation performance comparison of the proposed models for different input variables and climatic conditions (Model 3). Column groups as in Table 5.

| Model | RMSE | MAE | R² | TTPE | RMSE | MAE | R² | TTPE | RMSE | MAE | R² | TTPE | RMSE | MAE | R² | TTPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MLP | 6.68 | 4.37 | 0.87 | 0.58 | 5.57 | 3.80 | 0.91 | 0.41 | 20.24 | 13.84 | 0.78 | 0.44 | 28.79 | 21.05 | 0.56 | 0.41 |
| GRU | 5.15 | 3.52 | 0.91 | 1.62 | 5.22 | 3.06 | 0.92 | 3.31 | 20.79 | 14.30 | 0.77 | 16.63 | 26.47 | 20.08 | 0.63 | 4.70 |
| LSTM | 5.55 | 3.49 | 0.91 | 2.75 | 5.76 | 3.51 | 0.90 | 2.51 | 21.49 | 15.11 | 0.76 | 4.15 | 32.29 | 24.47 | 0.45 | 5.09 |
| CNN-LSTM1 | 6.05 | 4.42 | 0.89 | 0.98 | 5.58 | 3.40 | 0.91 | 0.58 | 21.53 | 14.87 | 0.76 | 1.29 | 27.48 | 21.19 | 0.60 | 0.42 |
| CNN-LSTM2 | 5.36 | 3.17 | 0.92 | 1.41 | 6.87 | 4.05 | 0.86 | 1.44 | 19.07 | 13.53 | 0.81 | 0.70 | 27.79 | 20.90 | 0.59 | 0.42 |
| CNN-GRU1 | 5.76 | 3.62 | 0.90 | 0.52 | 5.77 | 3.56 | 0.90 | 0.69 | 19.31 | 13.78 | 0.80 | 4.87 | 28.67 | 21.07 | 0.57 | 3.08 |
| CNN-GRU2 | 5.36 | 3.25 | 0.92 | 0.62 | 5.15 | 3.18 | 0.92 | 0.78 | 17.98 | 12.99 | 0.83 | 0.71 | 27.77 | 20.36 | 0.59 | 1.22 |

*TTPE: training time per epoch.

Figure 9: Training and test loss functions of the optimized high-score hybrid model. (a) CNN-GRU2 model for Borkena station. (b) CNN-GRU2 model for UTRB station.


where Q_obs is the observed discharge, Q_sim is the simulated discharge, and n (or N) is the number of observations. The range of R² lies between 0 and 1, representing no correlation and a perfect correlation between observed and simulated values, respectively, whereas RMSE and MAE scores closest to zero indicate the best model performance.
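The three criteria translate directly into numpy; in the sketch below, r2 follows equation (7) as printed (a Pearson correlation ratio, whose square is the conventional coefficient of determination).

```python
import numpy as np

def r2(q_obs, q_sim):
    """Equation (7) as printed."""
    n = len(q_obs)
    num = n * np.sum(q_obs * q_sim) - np.sum(q_obs) * np.sum(q_sim)
    den = (np.sqrt(n * np.sum(q_obs**2) - np.sum(q_obs)**2)
           * np.sqrt(n * np.sum(q_sim**2) - np.sum(q_sim)**2))
    return num / den

def rmse(q_obs, q_sim):
    """Equation (8)."""
    return np.sqrt(np.mean((q_obs - q_sim) ** 2))

def mae(q_obs, q_sim):
    """Equation (9)."""
    return np.mean(np.abs(q_obs - q_sim))
```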

6. Results

Streamflow simulation results for the seven proposed deep learning architectures, different input time window series, two climatic regions, two input combinations, and three average rolling time windows are presented in Tables 5, 6, and 7.

Figure 10: Comparison of true values and predicted values of the optimized high-score hybrid model. (a) CNN-GRU2 model for Borkena station (test period 2011-2015). (b) CNN-GRU2 model for UTRB station (test period 1975-1979). Each panel shows a scatter plot of predicted versus test discharge with a linear fit, alongside the test and predicted hydrographs.

Table 8: Best hybrid model type, input feature, and Keras-tuner-optimized hyperparameter values for Borkena station, with its MSE score.

| Hyperparameter | CNN-GRU2 (monthly rolled P) |
|---|---|
| Conv_1_filter | 24 |
| Conv_1_kernal | 2 |
| Conv_1_pool_size | 3 |
| GRU_l1_units | 15 |
| Dropout1 | 0.1 |
| GRU_l2_units | 20 |
| Dropout2 | 0.2 |
| Learning rate | 0.0001 |
| Number of epochs | 80 |
| Number of batch sizes | 20 |
| Score (MSE) | 0.083 |


Table 9: Best hybrid model type, input features, and Keras-tuner-optimized hyperparameter values for UTRB station, with its MSE score.

| Hyperparameter | CNN-GRU2 (monthly rolled P, Tmin, and Tmax) |
|---|---|
| Conv_1_filter | 8 |
| Conv_1_kernal | 2 |
| Conv_1_pool_size | 2 |
| GRU_l1_units | 20 |
| Dropout1 | 0.3 |
| GRU_l2_units | 30 |
| Dropout2 | 0.2 |
| Learning rate | 0.0001 |
| Number of epochs | 60 |
| Number of batch sizes | 40 |
| Score (MSE) | 0.193 |

Figure 11: Internal network structure of the optimized high-score hybrid CNN-GRU2 model for Borkena station: input (30, 5) → TimeDistributed Conv1D (29, 24) → TimeDistributed average pooling (9, 24) → TimeDistributed flatten (216) → GRU (15) → dropout → GRU (20) → dropout → dense (1).


Regardless of the combination of these conditions, the CNN-GRU model showed promising results in most scenarios (Tables 5 and 7). The highest scores are presented here:

(1) In daily streamflow simulation, for Borkena station, CNN-GRU1 scored 7.94, 3.66, 0.85, and 0.63, and for UTRB station, CNN-GRU2 scored 45.61, 21.79, 0.57, and 0.64 for RMSE, MAE, R², and training time per epoch, respectively.

(2) In weekly rolled streamflow simulation, for Borkena station, CNN-LSTM2 scored 7.33, 3.86, 0.87, and 0.52, and for UTRB station, GRU scored 25.21, 14.83, 0.77, and 5.56 for RMSE, MAE, R², and training time per epoch, respectively.

(3) In monthly rolled streamflow simulation, the CNN-GRU2 model showed the highest performance, with scores of 5.15, 3.18, 0.92, and 0.78 for Borkena station and 17.98, 12.99, 0.83, and 0.71 for UTRB station for RMSE, MAE, R², and training time per epoch, respectively.

Moreover, of the four proposed hybrid models, CNN-GRU2, the variant designed with a single 1D CNN layer, showed the most promising results in trial model 1 (UTRB) and model 3, as shown in Tables 5 and 7. In contrast, GRU in model 2 (UTRB), CNN-LSTM2 in model 2 (Borkena), and CNN-GRU1 in model 1 (Borkena) shared the second-highest results. Overall, streamflow simulation with the CNN-GRU2 model showed higher performance than the other tested hybrid deep learning models and the state-of-the-art LSTM, GRU, and MLP models. In line with our objectives, the results are discussed for the different variability conditions in the following paragraphs.

6.1. Climatic Region Variability. Testing models in different climatic conditions with historical data is likely to yield robust deep learning models for streamflow simulation in the future [51]. Hence, this research tested the different models in two climatic regions, and irrespective of climatic and time window variation, the CNN-GRU model displayed the highest scores in both case study areas.

Figure 12: Internal network structure of the optimized high-score hybrid CNN-GRU2 model for UTRB station: input (45, 15) → TimeDistributed Conv1D (44, 8) → TimeDistributed average pooling (22, 8) → TimeDistributed flatten (176) → GRU (20) → dropout → GRU (30) → dropout → dense (1).



6.2. Input Combination Variability. Combining minimum temperature (Tmin) with precipitation (P) as input does not show a significant performance increment at the Borkena station (Tables 5 and 6); in some scenarios, adopting P only as input increases the performance of the model (Table 7). In contrast, for UTRB, streamflow simulation with all input variables (Tmin, Tmax, and P) showed significant performance increments (Table 7).

6.3. Average Rolling Time Window Variability. Streamflow simulation without rolling the daily time series data showed deficient performance compared to monthly rolled average time series. This could be because the time series noise in UTRB is pronounced compared to that at Borkena station; as a result, the performance increment from daily to monthly rolled window models is much higher in UTRB than at Borkena station.

Generally, the monthly rolled time window with the CNN-GRU2 model showed the top performance results at both stations (Table 7). The corresponding training and test loss functions of this optimized high-score hybrid model for both stations are displayed in Figure 9, and Figure 10 compares the true values and predicted values of this model. The optimized hybrid model boosts the performance score and lowers the training time per epoch relative to the GRU and LSTM models. The model input features and Keras-tuner-optimized hyperparameter values for both stations, with the MSE scores, are presented in Tables 8 and 9. Moreover, the internal network structures of these models are shown in Figures 11 and 12, which display the model input and output parameter matrices for each layer.

7. Conclusions

This study presented a comparative analysis of different hybrid deep learning algorithms against state-of-the-art machine learning models for one-step daily streamflow simulation at two river basin or subcatchment streamflow outlets. The proposed algorithms are CNN-LSTM and CNN-GRU hybrid deep learning models, each with one or two 1D CNN layers, compared with the classic MLP, LSTM, and GRU models. This study conducted a series of experiments to observe the performance variation of the proposed models under different input combinations, rolling time windows, and climatic conditions for streamflow simulation. The following points summarize the significant findings of this study:

(i) CNN-GRU2, with one 1D CNN layer, showed the best simulation performance, reporting the lowest RMSE and MAE and the highest R² of all models in both case study areas. Such results indicate that the performance of the selected architectures is irrespective of the climatic characteristics of the basins.

(ii) Combining temperature data with precipitation as input to the proposed models produced a minimal performance increment at Borkena station compared to the UTRB case study area, which clearly shows that temperature data scarcity has greater performance loss implications at the UTRB station. On the other hand, the Borkena station has more significant natural streamflow variability than UTRB, which is also reflected in the model results. This implies that catchment response should be considered before any deep learning model application.

(iii) Rolling the time window of the input and output time series for streamflow simulation using the proposed models increases performance considerably more in the UTRB than at the Borkena station.

(iv) The analysis results also showed that the training time per epoch for the hybrid deep learning models is much lower than that of the GRU and LSTM models.

Deep learning models usually require massive datasets, and their performance drops with small to medium datasets. However, given the acceptable results of this case study, and considering the hyperparameter sensitivity and complexity of hybrid models, future research may further design optimized configurations and test these hybrid models for long-term streamflow simulation in ephemeral, seasonal, and perennial river systems, as well as in other fields of study. Our future research will synchronize the high-performing hybrid deep learning models of this study with remote sensing datasets to address the problems we experience in ungauged catchments.

Data Availability

The raw hydrological and meteorological datasets used for the Borkena watershed are available from the corresponding author upon request. However, authorization letters are required from the Ministry of Water, Irrigation and Energy (MoWIE) of Ethiopia (http://mowie.gov.et) and the National Meteorological Agency of Ethiopia (NMA) (http://www.ethiomet.gov.et), whereas for UTRB the datasets can be retrieved from an online repository (http://hydrogate.unipg.it/wrme).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The corresponding author acknowledges the Ministry of Water, Irrigation and Energy of Ethiopia (MoWIE) and the Italian National Research Council (Centro Nazionale di Ricerca, CNR) for the hydrological and meteorological datasets. The corresponding author also thanks Dr. Fiseha Behulu for his advice and continuous follow-up.


References

[1] Z Zhang Q Zhang and V P Singh ldquoUnivariate streamflowforecasting using commonly used data-driven models liter-ature review and case studyrdquo Hydrological Sciences Journalvol 63 no 7 pp 1091ndash1111 2018

[2] L Ni D Wang V P Singh et al ldquoStreamflow and rainfallforecasting by two long short-term memory-based modelsrdquoJournal of Hydrology vol 583 Article ID 124296 2020

[3] X Yuan C Chen X Lei Y Yuan and R Muhammad AdnanldquoMonthly runoff forecasting based on LSTM-ALO modelrdquoStochastic Environmental Research and Risk Assessmentvol 32 no 8 pp 2199ndash2212 2018

[4] A Sichangi L Wang and Z Hu ldquoEstimation of river dis-charge solely from remote-sensing derived data an initialstudy over the Yangtze riverrdquo Remote Sensing vol 10 no 9p 1385 2018

[5] M J Tourian N Sneeuw and A Bardossy ldquoA quantilefunction approach to discharge estimation from satellite al-timetry (ENVISAT)rdquoWater Resources Research vol 49 no 7pp 4174ndash4186 2013

[6] A W Sichangi L Wang K Yang et al ldquoEstimating conti-nental river basin discharges using multiple remote sensingdata setsrdquo Remote Sensing of Environment vol 179 pp 36ndash532016

[7] S Aljahdali A Sheta and H Turabieh ldquoRiver flow fore-casting a comparison between feedforward and layered re-current neural networkrdquo Learning and Analytics in IntelligentSystems vol 43 pp 523ndash532 2020

[8] D Couta Y Zhang and Y Li ldquoRiver flow forecasting usinglong short-termmemoryrdquoDEStech Transactions on ComputerScience and Engineering vol 16 2019

[9] Y Bai N Bezak K Sapac M Klun and J Zhang ldquoShort-termstreamflow forecasting using the feature-enhanced regressionmodelrdquo Water Resources Management vol 33 no 14pp 4783ndash4797 2019

[10] L C D Campos L Goliatt da Fonseca T L FonsecaG D de Abreu L F Pires and Y Gorodetskaya ldquoShort-termstreamflow forecasting for paraıba do Sul river using deeplearningrdquo Progress in Artificial Intelligence vol 43 pp 507ndash518 2019

[11] B B Sahoo R Jha A Singh and D Kumar ldquoLong short-termmemory (LSTM) recurrent neural network for low-flowhydrological time series forecastingrdquo Acta Geophysica vol 67no 5 pp 1471ndash1481 2019

[12] M Rahimzad A Moghaddam Nia H Zolfonoon J SoltaniA Danandeh Mehr and H-H Kwon Performance Com-parison of an LSTM-Based Deep Learning Model versusConventional Machine Learning Algorithms for StreamflowForecasting Water Resources Management New York NYUSA 2021

[13] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: encoder-decoder approaches," in Proceedings of the SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp. 103–111, Doha, Qatar, October 2014.

[14] C. Chen, W. He, J. Li, and Z. Tang, "A novel hybrid CNN-LSTM scheme for nitrogen oxide emission prediction in FCC unit," Mathematical Problems in Engineering, vol. 2020, Article ID 8071810, 2020.

[15] T. Li, M. Hua, and X. Wu, "A hybrid CNN-LSTM model for forecasting particulate matter (PM2.5)," IEEE Access, vol. 8, pp. 26933–26940, 2020.

[16] I. E. Livieris, E. Pintelas, and P. Pintelas, "A CNN-LSTM model for gold price time-series forecasting," Neural Computing and Applications, vol. 32, 2020.

[17] A. A. M. Ahmed, R. C. Deo, N. Raj et al., "Deep learning forecasts of soil moisture: convolutional neural network and gated recurrent unit models coupled with satellite-derived MODIS observations and synoptic-scale climate index data," Remote Sensing, vol. 13, no. 4, p. 554, 2021.

[18] Y. Liu, T. Zhang, A. Kang, J. Li, and X. Lei, "Research on runoff simulations using deep-learning methods," Sustainability, vol. 13, no. 3, p. 1336, 2021.

[19] A. U. Muhammad, X. Li, and J. Feng, "Using LSTM GRU and hybrid models for streamflow forecasting," Machine Learning and Intelligent Communications, vol. 294, pp. 510–524, 2019.

[20] R. Barzegar, M. T. Aalami, and J. Adamowski, "Short-term water quality variable prediction using a hybrid CNN-LSTM deep learning model," Stochastic Environmental Research and Risk Assessment, vol. 34, no. 2, pp. 415–433, 2020.

[21] S. P. Van, H. M. Le, D. V. Thanh, T. D. Dang, H. H. Loc, and D. T. Anh, "Deep learning convolutional neural network in rainfall-runoff modelling," Journal of Hydroinformatics, vol. 22, no. 3, pp. 541–561, 2020.

[22] P. Parisouj, H. Mohebzadeh, and T. Lee, "Employing machine learning algorithms for streamflow prediction: a case study of four river basins with different climatic zones in the United States," Water Resources Management, vol. 34, no. 13, pp. 4113–4131, 2020.

[23] A. Annis and F. Nardi, "Integrating VGI and 2D hydraulic models into a data assimilation framework for real time flood forecasting and mapping," Geo-Spatial Information Science, vol. 22, no. 4, pp. 223–236, 2019.

[24] B. M. Fiseha, S. G. Setegn, A. M. Melesse, E. Volpi, and A. Fiori, "Impact of climate change on the hydrology of upper Tiber River Basin using bias corrected regional climate model," Water Resources Management, vol. 28, no. 5, pp. 1327–1343, 2014.

[25] S. Rania, B. Waad, and E. Nadia, "Hybrid feature selection method based on the genetic algorithm and pearson correlation coefficient," Machine Learning Paradigms: Theory and Application, Studies in Computational Intelligence, vol. 801, 2000.

[26] A. D. Mehr and A. H. Gandomi, "MSGP-LASSO: an improved multi-stage genetic programming model for streamflow prediction," Information Sciences, vol. 561, pp. 181–195, 2021.

[27] R. Kun, F. Wei, Q. Jihong, Z. Xia, and S. Xiaoyu, "Comparison of eight filter-based feature selection methods for monthly streamflow forecasting – three case studies on CAMELS data sets," Journal of Hydrology, vol. 586, Article ID 124897, 2020.

[28] O. Oyebode, "Evolutionary modelling of municipal water demand with multiple feature selection techniques," Journal of Water Supply: Research & Technology - Aqua, vol. 68, no. 4, pp. 264–281, 2019.

[29] R. Dehghani, H. Torabi Poudeh, and Z. Izadi, "Dissolved oxygen concentration predictions for running waters using hybrid machine learning techniques," Modeling Earth Systems and Environment, vol. 597, 2021.

[30] N. Yuvaraj, V. Chang, B. Gobinathan et al., "Automatic detection of cyberbullying using multi-feature based artificial intelligence with deep decision tree classification," Computers & Electrical Engineering, vol. 92, Article ID 107186, 2021.

[31] B. Ratner, "The correlation coefficient: its values range between +1/−1, or do they?," Journal of Targeting, Measurement and Analysis for Marketing, vol. 17, no. 2, pp. 139–142, 2009.


[32] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, "Sliding window-based frequent pattern mining over data streams," Information Sciences, vol. 179, no. 22, pp. 3843–3865, 2009.

[33] E. Zivot and J. Wang, "Rolling analysis of time series," in Modeling Financial Time Series with S-Plus, pp. 299–346, Springer, New York, NY, USA, 2003.

[34] Y. Wang, W. Liao, and Y. Chang, "Gated recurrent unit network-based short-term photovoltaic forecasting," Energies, vol. 11, no. 8, p. 2163, 2018.

[35] O. B. Sezer, M. U. Gudelek, and A. M. Ozbayoglu, "Financial time series forecasting with deep learning: a systematic literature review: 2005-2019," Applied Soft Computing, vol. 90, Article ID 106181, 2020.

[36] Q. Zou, Q. Xiong, Q. Li, H. Yi, Y. Yu, and C. Wu, "A water quality prediction method based on the multi-time scale bidirectional long short-term memory network," Environmental Science and Pollution Research, vol. 27, no. 14, pp. 16853–16864, 2020.

[37] S.-C. Wang, "Artificial neural network," Interdisciplinary Computing in Java Programming, vol. 26, pp. 81–100, 2003.

[38] V. Nourani, A. Molajou, H. Najafi, and A. Danandeh Mehr, "Emotional ANN (EANN): a new generation of neural networks for hydrological modeling in IoT," in Artificial Intelligence in IoT, Transactions on Computational Science and Computational Intelligence, pp. 45–61, Springer, New York, NY, USA, 2019.

[39] L. Yan, J. Feng, and T. Hang, "Small watershed stream-flow forecasting based on LSTM," Advances in Intelligent Systems and Computing, vol. 935, pp. 1006–1014, 2019.

[40] S. Zhu, X. Luo, X. Yuan, and Z. Xu, "An improved long short-term memory network for streamflow forecasting in the upper Yangtze River," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 9, 2020.

[41] A. Osama, A. Onur, K. Serkan, G. Moncef, and I. D. Inman, "Real-time vibration-based structural damage detection using one-dimensional convolutional neural networks," Journal of Sound and Vibration, vol. 388, pp. 154–170, 2017.

[42] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[43] H.-C. Shin, H. R. Roth, M. Gao et al., "Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1285–1298, 2016.

[44] A. Essien and C. Giannetti, "A deep learning framework for univariate time series prediction using convolutional LSTM stacked autoencoders," in Proceedings of the IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA), pp. 1–6, Alberobello, Italy, July 2019.

[45] D. S. Abdelminaam, F. H. Ismail, M. Taha, A. Taha, E. H. Houssein, and A. Nabil, "CoAID-DEEP: an optimized intelligent framework for automated detecting COVID-19 misleading information on Twitter," IEEE Access, vol. 9, pp. 27840–27867, 2021.

[46] O. Kazakov and O. Mikheenko, "Transfer learning and domain adaptation based on modeling of socio-economic systems," Business Inform, vol. 14, no. 2, pp. 7–20, 2020.

[47] B. Wang and N. Z. Gong, "Stealing hyperparameters in machine learning," in Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 36–52, San Francisco, CA, USA, May 2018.

[48] C. Boyraz and S. N. Engin, "Streamflow prediction with deep learning," in Proceedings of the 6th International Conference on Control Engineering & Information Technology (CEIT), pp. 1–5, Istanbul, Turkey, October 2018.

[49] E. K. Jackson, W. Roberts, B. Nelsen, G. P. Williams, E. J. Nelson, and D. P. Ames, "Introductory overview: error metrics for hydrologic modelling - a review of common practices and an open source library to facilitate use and adoption," Environmental Modelling & Software, vol. 119, pp. 32–48, 2019.

[50] S. Kumar, T. Roshni, and D. Himayoun, "A comparison of emotional neural network (ENN) and artificial neural network (ANN) approach for rainfall-runoff modelling," Civil Engineering Journal, vol. 5, no. 10, pp. 2120–2130, 2019.

[51] P. Bai, X. Liu, and J. Xie, "Simulating runoff under changing climatic conditions: a comparison of the long short-term memory network with two conceptual hydrologic models," Journal of Hydrology, vol. 592, Article ID 125779, 2021.



On the other hand, data-driven models are suitable and can function easily without considering the internal physical mechanism of the watershed system [2, 3, 7].

Artificial neural networks (ANNs) are the most used and studied "black-box" models. They are utilized in many scientific and technological areas more than other available black-box algorithms, such as support vector machine (SVM), genetic programming (GP), fuzzy logic (FL), recurrent neural network (RNN), and long short-term memory (LSTM) [7, 8]. ANN is available in different functionalities and architectural forms, from simple to advanced levels. A recurrent neural network (RNN) is one of the advanced ANN architectures. It has been considered a specially designed deep learning network for time series analysis that quickly adapts to temporal dynamics using previous time step information [2]. However, RNN cannot capture long-time dependencies, and it is susceptible to vanishing and exploding gradients.

Couta et al. suggested the advanced RNN, or long short-term memory (LSTM), as one of the most effective approaches [8]. The LSTM unit has a cell that comprises an input gate, an output gate, and a forget gate [9]. Due to these gates, the LSTM model has shown promising results in different applications, including speech recognition, time series modelling, natural language processing, handwriting recognition, and traffic flow simulation [3, 10]. Studies have also shown that LSTM has powerful performance for streamflow simulation over different powerful multilayered (ML) tools [3, 11]. Campos et al. [10] applied autoregressive integrated moving average (ARIMA) and LSTM networks to forecast floods at four Paraíba do Sul River stations in Brazil. Aljahdali et al. [7] also compared the LSTM network and a layered RNN to forecast streamflow in two rivers of the USA, the Black and Gila rivers. A recent article by Rahimzad et al. [12] used time-lagged Qt-1, Qt-2, and other climatic variables to forecast Qt in the future and concluded that the LSTM network outperforms linear regression (LR), multilayer perceptron (MLP), and support vector machine (SVM) in forecasting daily streamflow.

A few years back, Cho et al. [13] introduced gated recurrent units (GRUs), similar to LSTM with a forget gate, which have fewer parameters than LSTM because they lack an output gate. GRU's capacities in speech signal modelling and natural language processing were similar to those of LSTM. However, there are debates on the relative performance of these two architectures for streamflow and reservoir inflow simulation, which is not well studied across different timescales and environments.

Notwithstanding the difference in their performance, selecting appropriate time series models from the various known deep learning network architectures is difficult. LSTMs and GRUs are not always the ideal sequence prediction option; simulation with better prediction accuracy, fast running time, and less complicated models requires more research. Hence, this comparative analysis of the network architectures helps decide the optimized alternative for time series analysis. Recently, different hybrid deep learning models have been getting wide attention from researchers in various fields of study. Chen et al. [14] used convolutional neural network (CNN), LSTM, and hybrid CNN-LSTM models for nitrogen oxide emission prediction. They concluded that CNN-LSTM gives an accurate and stable forecast of periodic nitrogen oxide emissions from the refining industry. Moreover, Li et al. [15] used univariate and multivariate time series data as input for LSTM and CNN-LSTM models. For the analysis of air quality using particulate matter (PM2.5) concentration prediction, their proposed multivariate CNN-LSTM model gave the best result due to low error and short training time.

The integration of CNN and LSTM models benefits time series prediction such that the LSTM model can efficiently capture long time sequences of pattern information, whereas CNN models can filter out the noise of the input data and extract more valuable features, which could increase the accuracy of the prediction model [16]. Moreover, integrating CNN with GRU can also lead to robust preprocessing of data, providing a viable option to improve the model's accuracy [17]. Even though combining CNN with LSTM showed remarkable results in different studies, its application in hydrological fields still demands more research [18]. Muhammad et al. [19] used LSTM, GRU, and hybrid CNN-GRU models for streamflow simulation based on 35 years of the Model Parameter Estimation Experiment (MOPEX) dataset of 10 river basins in the USA. They revealed that the proposed hybrid model outperforms the conventional LSTM; nevertheless, its performance is almost the same as GRU. Recently, Barzegar et al. [20] studied short-term water quality variable prediction using a hybrid CNN-LSTM model and effectively captured low and high water quality variables, mainly dissolved oxygen concentrations.

Screening input variables for different model architectures is also a challenging task for researchers. Even though rainfall, evaporation, and temperature are causal variables for streamflow modelling, data availability and study objectives limit the choice variability [21]. Van et al. [21] discussed that applying temperature and evapotranspiration input nodes into the model increases the network complexity and causes overfitting. In contrast, Parisouj et al. [22] concluded that using readily available input variables such as temperature and precipitation for data-driven streamflow simulation will provide a reliable result. Hence, this research will contribute a step to this debate by testing different input combinations of various climatic regions in the performance of the proposed models.

To the best of our knowledge, there is minimal literature that shows the performance variation of different hybrid models for streamflow simulation under various input variability conditions at once. Thus, we compared various forms of hybrid CNN-LSTM and CNN-GRU architectures with the classical MLP, GRU, and LSTM networks to simulate single-step streamflow using two climatic regions' available precipitation and minimum and maximum temperature data. Moreover, the study tests the hybrid models with different layer arrangements and applies Keras tuner to optimize model hyperparameters. In general, the primary objective of this study is to test the performance variation of the proposed models under extreme input variability conditions, which include climatic input combination, input time window, and average rolling time window variability.

This study used different open-source software and machine learning libraries, including Python 3.6 for programming and the NumPy, pandas, Scikit-learn, Hydroeval, Statsmodels, and Matplotlib libraries. All were used for data preprocessing, evaluation, and graphical interpretation. Moreover, the TensorFlow and Keras deep learning frameworks were employed for modelling the deep learning architectures.

2. Study Area

In the present study, two river subcatchments were selected in two climatic regions: the Awash River Basin Borkena subcatchment in Ethiopia (Figure 1(a)) and the Upper Tiber River Basin in Italy (Figure 1(b)).

2.1. Borkena Watershed (Ethiopia). The first case study area is in the Borkena watershed at the Kombolcha station outlet, located in the upper part of the Awash River Basin in the northern part of Ethiopia. The mainstream of the watershed emanates from Tosa mountain, which is found near Dessie town. The area's altitude ranges from 1775 m at the lowest site near Kombolcha to 2638 m at the highest site upstream of Dessie. The main rainy season of this watershed is from July to September.

2.2. Upper Tiber River Basin (Italy). The second case study area is located in the Upper Tiber River Basin (UTRB) in Italy. The Tiber River Basin (TRB) is the second-largest catchment in Italy [23]. Geographically, the basin is located between 40.5°N to 43°N latitude and 10.5°E to 13°E longitude, covering about 17,500 km², roughly 5% of the Italian territory. The Upper Tiber River Basin (UTRB) is part of the TRB, covering 4145 km² (~20% of the TRB), with its outlet at Ponte Nuovo. The elevation of the catchment ranges from 148 to 1561 m above sea level. The area's climate is Mediterranean, with precipitation mainly occurring from autumn (September to November) to spring (March to May). The intense rainfall highly influences the basin's hydrology in the upstream part, causing frequent floods in the downstream areas [24].

3. Data Source and Preprocessing

Borkena's required hydrological and meteorological datasets were collected from the Ministry of Water, Irrigation and Energy (MoWIE) of Ethiopia and the National Meteorological Agency of Ethiopia (NMA), respectively. UTRB's datasets were collected from the National Research Council of Italy (CNR) and archived for public use on the Water Resource Management and Evaluation (WRME) platform at the following link: http://hydrogate.unipg.it/wrme.

We collected 5844 available data series from the time window of January 1, 1999 to December 31, 2014 from the Borkena watershed. Similarly, for UTRB, 7670 data series were collected from January 1, 1958 to December 31, 1978. Both case study datasets are multivariate and multisite. Even though we carefully chose time windows with minimum data gaps for both stations, the datasets contain many missing values due to different reasons. Thus, our first task for this research was to fill the missing values with the Monte Carlo approach.

The study applied linear correlation statistics to measure the strength of dependency between different input variables [25]. Even though Mehr and Gandomi [26] stated that linear correlation might mislead or provide redundant inputs, our study does not have a huge feature size that requires intensive feature selection criteria. Hence, we adopted a linear correlation coefficient. Moreover, Kun et al. [27] concluded that the Pearson correlation coefficient (PCC) is the most applicable for multiple linear regressions (MLRs), and Oyebode [28] also stated that inputs selected with PCC showed superior model accuracy. Hence, this study applied the Pearson linear correlation coefficient [29, 30]. It has a value ranging between "+1" and "−1", where "+1" indicates a positive linear correlation, "0" indicates no linear correlation, and "−1" indicates a negative linear correlation [25]. Equation (1) calculates the Pearson correlation coefficient, and Tables 1 and 2 present the results. Correlation values between positive (0 and 0.3) and negative (0 and −0.3) show a weak linear relationship among variables [31]. However, since we have a small number of variables and a small data size for this study, we decided to omit only the Borkena station Tmax values, which have r values ranging between −0.129 and +0.107; the details are presented in Table 1.

$$ r = \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[N\sum X^{2} - \left(\sum X\right)^{2}\right]\left[N\sum Y^{2} - \left(\sum Y\right)^{2}\right]}} \quad (1) $$

After passing rigorous quality control processes, the raw data were split chronologically into training and testing datasets with a ratio of 80:20, respectively. The time series graph and the corresponding box plot of the split data for both stations are presented in Figure 2. Different options exist in the literature to remove noise from a time series. A sliding window is the first option to temporarily approximate the actual value of the time series data [32]. In comparison, a rolling window (moving average) is the second option, which smooths the time series data by calculating the average, maximum, minimum, or sum over a specific time [33]. Hence, for this study, we applied average rolling windows to smooth and remove noise from the time series while keeping the length of the data unchanged.

Then, daily, weekly, and monthly average rolling sliding windows were used to rebuild the input and output time series into a supervised learning format. Accordingly, the rolled time series data were prepared with a time lag window of 30 or 45 for single-step streamflow simulation at the Borkena and UTRB stations, respectively. Moreover, variable scaling of the split time series data was performed using StandardScaler for the modelling process's computational easiness and numerical stability.
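A minimal sketch of this preprocessing pipeline is shown below, assuming the same hypothetical CSV layout as above; the helper name, the 7-day rolling window shown, and the column handling are illustrative rather than the authors' exact code.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def make_supervised(df, target="Q", window=7, lags=30):
    """Rolling-average smoothing plus lag-window framing into (X, y) pairs."""
    rolled = df.rolling(window).mean().dropna()      # e.g., weekly rolled series
    values = rolled.to_numpy()
    X, y = [], []
    for t in range(lags, len(values)):
        X.append(values[t - lags:t, :])              # past `lags` days, all features
        y.append(rolled[target].iloc[t])             # single-step target
    return np.array(X), np.array(y)

df = pd.read_csv("borkena_daily.csv", parse_dates=["date"], index_col="date")
X, y = make_supervised(df, window=7, lags=30)

# Chronological 80:20 split, then scaling fitted on the training data only.
split = int(0.8 * len(X))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]
scaler = StandardScaler()
n_feat = X_train.shape[-1]
X_train = scaler.fit_transform(X_train.reshape(-1, n_feat)).reshape(X_train.shape)
X_test = scaler.transform(X_test.reshape(-1, n_feat)).reshape(X_test.shape)
```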


4. Methods

In this study, three types of network architectures, MLP, GRU, and LSTM, were compared with the proposed hybrid deep neural network architectures, CNN-LSTM and CNN-GRU, for the simulation of single-step streamflow, taking different combinations of precipitation (P), minimum temperature (Tmin), and maximum temperature (Tmax) as inputs. The proposed simulation model architectures with their input and output variables are briefly presented as a flowchart in Figure 3.

4.1. Deep Learning Models. Deep learning models are part of a broader family of machine learning, including recurrent neural networks (RNNs), convolutional neural networks (CNNs), deep belief networks (DBNs), and deep neural networks (DNNs). These models have been applied to different fields of study, including speech recognition, computer vision, natural language processing, and time series analysis [13, 16, 34–36]. The following sections briefly discuss the architectures used in the present study.

4.2. Artificial Neural Network (ANN). The artificial neural network (ANN) is the most common machine learning model that has found application in streamflow simulation over the last two decades [1, 37]. It is known for modelling the complex input-output relationships inherent in hydrological time series features within a river catchment. The traditional feedforward neural network (FFNN), with input, hidden, and output layers trained by the backpropagation (BP) algorithm, gained popularity for nonlinear hydrological time series modelling. Figure 4 displays the typical architecture of an ANN, whose output is computed as

$$ \hat{y}_j = f_j\left[\sum_{h=1}^{m} w_{jh} \cdot f_h\left(\sum_{i=1}^{n} w_{hi} x_i + w_{hb}\right) + w_{jb}\right] \quad (2) $$

where i, h, j, b, and w indicate the neurons of the input, hidden, and output layers, the bias, and the applied weight of the neuron, respectively; fh and fj are the activation functions of the hidden layer and output layer, respectively; xi, n, and m represent, respectively, the input value, the number of input neurons, and the number of hidden neurons; and y and ŷj denote the observed and calculated target values, respectively. In the calibration phase of the model, the values of the hidden and output layers and the corresponding weights can be varied and calibrated [38].

The ability of ANN to link input and output variables in complex hydrological systems without the need for prior knowledge about the nature of the process has led to a huge leap in the use of ANN models in hydrological simulations [38].

4.3. Long Short-Term Memory (LSTM). The difference of LSTM from the classical MLP network is that the layers of neurons in LSTM have recurrent connections; thus, the state from the previous activation time step is used to formulate an output. The LSTM replaces the typical neuron in the hidden layer with a memory cell and three gates: an input gate, a forget gate, and an output gate [39]. It is an advanced form of recurrent neural network (RNN) that can capture long-term dependencies. By contrast, the plain RNN is a circular network in which an additional input is added to represent the state of the neuron in the hidden layer at the previous time steps [40]. LSTM has two critical benefits over RNN: overcoming vanishing and exploding gradients, and holding memory to capture long-term temporal dependency in input sequences. The mathematical formulation of the different parameters is listed in Table 3, and Figure 5 displays the LSTM memory cell with its three gated layers. Wi, Wf, Wo, and Wc are the weights that map the hidden layer input to the three gates of input, forget, and output; the Ui, Uf, Uo, and Uc weight matrices map the hidden layer output to the gates; bi, bf, bo, and bc are bias vectors; and Ct and ht are the outcome of the cell and the outcome of the layer, respectively.

Figure 1: Location of the case study areas: (a) the Borkena watershed and its gauging station within the Awash River Basin, Ethiopia; (b) the Upper Tiber River Basin and the Ponte Nuovo gauging station within the Tiber River Basin, Italy.


4.4. Gated Recurrent Unit (GRU). GRU is a special type of LSTM architecture that merges the input and forget gates into an update gate, which makes the number of parameters fewer and the training easier. There are two input features at each time step: the input vector xt and the previous output vector ht−1. The output of each specific gate can be calculated through logical operation and nonlinear transformation of the input [34]. The mathematical formulations among inputs, outputs, and the different parameters are listed in equations (3)-(6), and Figure 6 displays the structure of the gated recurrent unit (GRU) network:

$$ Z_t = \sigma\left(W_z X_t + U_z h_{t-1} + b_z\right) \quad (3) $$

$$ r_t = \sigma\left(W_r X_t + U_r h_{t-1} + b_r\right) \quad (4) $$

$$ \tilde{h}_t = \tanh\left(W_h X_t + U_h\left(r_t \ast h_{t-1}\right) + b_h\right) \quad (5) $$

$$ h_t = \left(1 - Z_t\right) \ast h_{t-1} + Z_t \ast \tilde{h}_t \quad (6) $$

where Zt is the update gate vector, rt is the reset gate vector, W and U are parameter matrices, σ is the sigmoid function, and tanh is the hyperbolic tangent.
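To make equations (3)-(6) concrete, the following NumPy sketch performs one GRU update. It is didactic only: real frameworks such as Keras fuse these weight matrices into larger kernels for speed, and the function and argument names here are our own.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One GRU state update following equations (3)-(6)."""
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev + bz)               # update gate, eq. (3)
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev + br)               # reset gate, eq. (4)
    h_cand = np.tanh(Wh @ x_t + Uh @ (r_t * h_prev) + bh)    # candidate state, eq. (5)
    return (1.0 - z_t) * h_prev + z_t * h_cand               # new hidden state, eq. (6)
```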

4.5. Convolutional Neural Network (CNN). The convolutional neural network (CNN) is one of the most successful deep learning models, especially for feature extraction, and its network structures include 1D CNN, 2D CNN, and 3D CNN [15]. A CNN structure generally consists of a convolution layer, a pooling layer, and a fully connected layer [18].

Table 1: Descriptive statistics of the split time series data for the Borkena watershed.

| Station | Data type | r with streamflow | Train mean | Train max | Train min | Train SD | Test mean | Test max | Test min | Test SD |
|---|---|---|---|---|---|---|---|---|---|---|
| Kombolcha | Streamflow (m³/s) | 1.000 | 10.9 | 216.9 | 0.0 | 23.2 | 10.1 | 94.8 | 0.0 | 20.2 |
| Kombolcha | P (mm/day) | 0.321 | 3.1 | 73.2 | 0.0 | 7.5 | 2.9 | 60.4 | 0.0 | 7.2 |
| Kombolcha | Tmin (°C) | 0.271 | 12.5 | 20.9 | 1.5 | 3.3 | 12.5 | 20.6 | 2.6 | 3.4 |
| Kombolcha | Tmax (°C) | −0.099 | 27.2 | 33.6 | 16.4 | 2.5 | 27.3 | 33.0 | 19.6 | 2.1 |
| Chefa | P (mm/day) | 0.344 | 3.5 | 81.6 | 0.0 | 8.6 | 3.4 | 64.3 | 0.0 | 8.1 |
| Chefa | Tmin (°C) | 0.266 | 13.3 | 21.5 | 0.1 | 3.7 | 14.1 | 22.2 | 3.9 | 3.5 |
| Chefa | Tmax (°C) | −0.069 | 29.9 | 38.0 | 18.5 | 2.8 | 30.3 | 38.0 | 22.2 | 2.5 |
| Dessie | P (mm/day) | 0.335 | 3.5 | 80.6 | 0.0 | 8.6 | 2.9 | 67.0 | 0.0 | 7.3 |
| Dessie | Tmin (°C) | 0.319 | 8.5 | 15.5 | 0.1 | 2.5 | 7.8 | 15.5 | 0.0 | 3.1 |
| Dessie | Tmax (°C) | 0.107 | 23.8 | 30.0 | 16.0 | 1.9 | 24.1 | 30.0 | 15.0 | 2.1 |
| Kemise | P (mm/day) | 0.372 | 3.1 | 81.9 | 0.0 | 8.3 | 2.9 | 72.1 | 0.0 | 7.5 |
| Kemise | Tmin (°C) | 0.282 | 13.8 | 22.0 | 3.0 | 3.4 | 13.5 | 20.1 | 4.5 | 3.6 |
| Kemise | Tmax (°C) | −0.129 | 31.0 | 38.3 | 14.0 | 2.7 | 31.9 | 37.8 | 23.5 | 2.4 |
| Majete | P (mm/day) | 0.347 | 3.3 | 80.7 | 0.0 | 8.6 | 3.3 | 81.3 | 0.0 | 8.6 |
| Majete | Tmin (°C) | 0.202 | 14.7 | 23.0 | 1.4 | 2.9 | 14.6 | 21.5 | 6.7 | 2.9 |
| Majete | Tmax (°C) | −0.057 | 28.6 | 37.8 | 17.2 | 2.8 | 29.1 | 38.0 | 20.8 | 2.4 |

Train = training data (80%); Test = testing data (20%); r = Pearson correlation with streamflow.

Table 2: Descriptive statistics of the split time series data for the UTRB.

| Station | Data type | r with streamflow | Train mean | Train max | Train min | Train SD | Test mean | Test max | Test min | Test SD |
|---|---|---|---|---|---|---|---|---|---|---|
| Ponte Nuovo | Streamflow (m³/s) | 1.000 | 50.6 | 939.0 | 1.9 | 75.5 | 50.6 | 737.0 | 3.7 | 68.6 |
| Castel Rigone | P (mm/day) | 0.384 | 2.6 | 72.8 | 0.0 | 6.6 | 2.7 | 67.7 | 0.0 | 6.9 |
| Montecoronaro | P (mm/day) | 0.339 | 3.9 | 229.0 | 0.0 | 10.7 | 4.0 | 110.0 | 0.0 | 10.5 |
| Perugia (ISA) | P (mm/day) | 0.379 | 2.4 | 120.4 | 0.0 | 6.6 | 2.5 | 61.8 | 0.0 | 6.3 |
| Perugia (ISA) | Tmin (°C) | −0.353 | 9.7 | 30.4 | −9.0 | 6.3 | 9.3 | 25.2 | −5.0 | 5.6 |
| Perugia (ISA) | Tmax (°C) | −0.379 | 17.4 | 37.4 | −4.5 | 8.1 | 16.3 | 33.0 | 0.6 | 7.2 |
| Petrelle | P (mm/day) | 0.345 | 2.51 | 90.0 | 0.0 | 6.9 | 2.7 | 117.1 | 0.0 | 7.4 |
| Pietralunga | P (mm/day) | 0.428 | 3.22 | 150.0 | 0.0 | 8.1 | 3.1 | 73.1 | 0.0 | 7.3 |
| Spoleto | P (mm/day) | 0.412 | 2.9 | 113.6 | 0.0 | 7.9 | 2.9 | 94.2 | 0.0 | 7.8 |
| Spoleto | Tmin (°C) | −0.265 | 7.5 | 23.0 | −12.6 | 6.4 | 8.8 | 21.7 | −5.4 | 5.8 |
| Spoleto | Tmax (°C) | −0.383 | 18.8 | 38.7 | −3.5 | 8.6 | 18.7 | 36.8 | 2.0 | 7.8 |
| Torgiano | P (mm/day) | 0.364 | 2.4 | 141.2 | 0.0 | 7.1 | 2.5 | 62.0 | 0.0 | 6.9 |
| Gubbio | Tmin (°C) | −0.315 | 8.7 | 26.0 | −12.0 | 5.9 | 6.1 | 19.3 | −11.3 | 5.4 |
| Gubbio | Tmax (°C) | −0.377 | 18.1 | 39.0 | −8.0 | 8.1 | 17.4 | 34.1 | −0.9 | 7.5 |
| Assisi | Tmin (°C) | −0.325 | 9.2 | 25.6 | −11.6 | 6.2 | 8.2 | 21.5 | −8.0 | 5.6 |
| Assisi | Tmax (°C) | −0.378 | 18.2 | 37.8 | −5.0 | 8.3 | 18.1 | 35.8 | 0.0 | 7.8 |

Train = training data (80%); Test = testing data (20%); r = Pearson correlation with streamflow.


1D CNN is mainly implemented for sequence data processing [41], 2D CNN is usually used for text and image identification [42], and 3D CNN is usually recognized for modelling medical image and video data identification [43]. Hence, since the aim of the present study is time series analysis, we implemented 1D CNN. The detailed process of 1D CNN is described in Figure 7.

As depicted in Figure 7, the input series is convolved in the convolution layer from top to bottom (shown by the arrows). The grey and mesh colours represent different filters, where the size of the convolution layer depends on the number of input data dimensions, the size of the filter, and the convolution step length.
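The sliding dot product in Figure 7 (Ci = W1·Pi + W2·Pi+1 for a length-2 filter) can be reproduced in a few lines of NumPy; the input values and filter weights below are purely illustrative.

```python
import numpy as np

P = np.array([1.0, 2.0, 3.0, 4.0])    # illustrative univariate input series
W = np.array([0.5, -0.25])            # illustrative filter weights W1, W2

# np.convolve flips its second argument, so reversing W first yields the
# cross-correlation used by CNNs: C[i] = W1*P[i] + W2*P[i+1].
C = np.convolve(P, W[::-1], mode="valid")
# C[0] = 0.5*1 - 0.25*2, C[1] = 0.5*2 - 0.25*3, C[2] = 0.5*3 - 0.25*4
```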

4.6. CNN-LSTM and CNN-GRU Hybrid Models. In this study, hybrid models were designed by integrating CNN with LSTM or GRU layers. Hence, the feature sequence from the CNN layer was considered as the input for the LSTM or GRU layer, from which the short- and long-time dependencies were then further extracted.

The proposed CNN-LSTM and CNN-GRU models contain two main components. The first component consists of one-dimensional single or double convolutional and average pooling layers; a flatten layer is then connected to further process the data into the format required by the LSTM or GRU. In the second component, the generated features are processed using LSTM, GRU, and dense layers. Additionally, dropouts are introduced to prevent overfitting. Figure 8 shows the designed model inputs and outputs with a basic description of the convolutional, pooling, and LSTM or GRU layers proposed for this project.

5. Data Analysis

Simulation with deep learning requires selecting a probable combination of hyperparameters: batch size, epochs, number of layers, and number of units for each layer [8]. Optimizing hyperparameters is not always consistent, as there is no hard rule to follow; "the process is more of an art than a science" [44]. Hence, in this study, we chose the Keras tuner optimizer, which was developed by the Google team and is included in the Keras open library [45, 46].

5.1. Hyperparameter Optimization. Tuning machine learning model hyperparameters is critical. Varying hyperparameter values often results in models with significantly different performances [47]. The models applied in this study mainly contain two types of hyperparameters: constant hyperparameters, which are not altered through the optimization process, and variable hyperparameters. The Adam optimizer is applied under the category of constant hyperparameters because of its efficiency and ease of implementation; it requires minimum memory and is commonly suited to different problems [48].

Figure 2: Streamflow time series graphs with the training/testing split and the corresponding box plots of the split data: (a) Borkena; (b) UTRB.


In this category, the rectified linear unit (ReLU) was used as the activation function, and the mean squared error (MSE) was used as the loss function.

In contrast, the second type, the changing hyperparameters, is optimized by Keras tuner, with the hyperparameter choices or value ranges for optimization set using different trials. We also considered our PC capacity (processor: Intel(R) Core(TM) i7-6500U CPU, 2.50 GHz; RAM: 8 gigabytes) with the Windows 10 operating system. Hyperparameters are optimized with 20 trials, and since deep learning networks produce different training and validation plots for each run, we decided to repeat the iteration three times.

All hyperparameter ranges and choices are listed in Table 4. The CNN-LSTM1 and CNN-GRU1 models used hyperparameter values from numbers 1 to 13 for optimization (Table 4), while numbers 4, 5, and 6 were omitted for CNN-LSTM2 and CNN-GRU2. The remaining deep learning models, MLP, LSTM, and GRU, used the list of hyperparameters from numbers 7 to 13. Finally, each optimized hyperparameter set is used for each training and testing experiment. Moreover, the train and test traces from each run can be plotted to give a more robust idea of the behaviour of the model and to inspect overfitting and underfitting issues.
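The sketch below shows how such a search could be set up with keras-tuner for the plain GRU model (rows 7-13 of Table 4), with objective, trial count, and executions per trial matching the note under Table 4. It is a hedged reconstruction: the build function, input shape, and variable names are ours, not the authors'.

```python
import keras_tuner as kt
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(hp):
    """Search space mirroring rows 7-13 of Table 4 for the plain GRU model."""
    model = models.Sequential([
        layers.GRU(hp.Int("gru_1_units", 5, 30, step=5),
                   return_sequences=True, input_shape=(30, 5)),
        layers.Dropout(hp.Float("dropout_1", 0.0, 0.3, step=0.1, default=0.2)),
        layers.GRU(hp.Int("gru_2_units", 5, 30, step=5)),
        layers.Dropout(hp.Float("dropout_2", 0.0, 0.3, step=0.1, default=0.2)),
        layers.Dense(1),
    ])
    lr = hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="mse")
    return model

tuner = kt.RandomSearch(build_model, objective="val_loss",
                        max_trials=20, executions_per_trial=3)
# Epochs and batch size (rows 12-13 of Table 4) can also be tuned, e.g. by
# overriding Tuner.run_trial; they are shown fixed here for brevity.
# tuner.search(X_train, y_train, validation_split=0.2, epochs=50, batch_size=50)
```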

5.2. Performance Measures. A wide variety of evaluation metrics are listed in the literature [49]. The popular ones are mean error (ME), coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), mean percentage error (MPE), mean absolute percentage error (MAPE), mean absolute scaled error (MASE), and Nash-Sutcliffe efficiency (NSE). This study used different input and model variability conditions. Hence, to concisely measure the analysis output and present the results, we applied the following three standard performance evaluation criteria, which also have the potential to capture extreme streamflow time series values effectively [50].

Coefficient of determination (R²):

$$ R^{2} = \frac{n\sum\left(Q_{\mathrm{obs}}\,Q_{\mathrm{sim}}\right) - \left(\sum Q_{\mathrm{obs}}\right)\left(\sum Q_{\mathrm{sim}}\right)}{\left[n\sum Q_{\mathrm{obs}}^{2} - \left(\sum Q_{\mathrm{obs}}\right)^{2}\right]^{1/2}\left[n\sum Q_{\mathrm{sim}}^{2} - \left(\sum Q_{\mathrm{sim}}\right)^{2}\right]^{1/2}} \quad (7) $$

Root mean square error (RMSE):

$$ \mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{N}\left(Q_{\mathrm{obs}}^{t} - Q_{\mathrm{sim}}^{t}\right)^{2}}{N}} \quad (8) $$

Mean absolute error (MAE):

$$ \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|Q_{\mathrm{obs}}^{t} - Q_{\mathrm{sim}}^{t}\right| \quad (9) $$

where Qobs is the observed discharge, Qsim is the simulated discharge, and n is the number of observations. R² lies between 0 and 1, representing no correlation and a perfect correlation between observed and simulated values, respectively, whereas RMSE and MAE scores closest to zero indicate the best model performance.

Figure 3: A simple architecture of the proposed models: catchment time series inputs P(t), Tmin(t), and Tmax(t) feed the MLP, GRU, LSTM, CNN-LSTM, and CNN-GRU model architectures, which output the streamflow Q.
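For a quick check of these three criteria, they can be computed directly with NumPy, as in the sketch below. Note one assumption we make: equation (7) as printed gives the correlation ratio, and we square it to obtain the conventional 0-1 R² that the reported scores suggest.

```python
import numpy as np

def evaluate(q_obs, q_sim):
    """R² (squared form of equation (7)), RMSE (8), and MAE (9)."""
    q_obs, q_sim = np.asarray(q_obs), np.asarray(q_sim)
    n = len(q_obs)
    num = n * np.sum(q_obs * q_sim) - q_obs.sum() * q_sim.sum()
    den = np.sqrt(n * np.sum(q_obs**2) - q_obs.sum()**2) * \
          np.sqrt(n * np.sum(q_sim**2) - q_sim.sum()**2)
    r2 = (num / den) ** 2                               # squared correlation
    rmse = np.sqrt(np.mean((q_obs - q_sim) ** 2))       # equation (8)
    mae = np.mean(np.abs(q_obs - q_sim))                # equation (9)
    return r2, rmse, mae
```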

Figure 4: Typical architecture of an ANN, with an input layer (x1 ... xn), one hidden layer (h1 ... hn), connection weights w, and an output layer y.


Table 3: Mathematical formulation for the LSTM cell.

| Network gate/state | Purpose | Equations |
|---|---|---|
| Forget gate | Chooses the information to reject from the cell | f_t = σ(u_f x_t + w_f h_{t−1} + b_f) |
| Input gate | Decides what information is relevant to update in the current cell state | i_t = σ(u_i x_t + w_i h_{t−1} + b_i) |
| Output gate | Decides what to output based on the input and the long-term memory of the cell | o_t = σ(u_o x_t + w_o h_{t−1} + b_o) |
| Cell state | Long-term memory | C̃_t = tanh(W_c x_t + U_c h_{t−1} + b_c); C_t = f_t ∗ C_{t−1} + i_t ∗ C̃_t |
| Hidden state | Short-term memory | h_t = tanh(C_t) ∗ o_t |

Figure 5: LSTM memory cell with three gated layers (forget gate f_t, input gate i_t, and output gate o_t), showing the previous/updated long-term memory C_{t−1}/C_t and short-term memory h_{t−1}/h_t [11].

Figure 6: The structure of the gated recurrent unit (GRU) network, with update gate z_t, reset gate r_t, and a combined long-/short-term memory state h_t [34].

Figure 7: The process of 1D CNN for a filter of size 2: a multivariate input time series P1 ... Pn is convolved with weights W1 and W2 so that C1 = W1P1 + W2P2, C2 = W1P2 + W2P3, ..., Cn = W1Pn−1 + W2Pn [15].


Figure 8: The basic architecture of the proposed CNN-LSTM/CNN-GRU models. Daily multivariate time series (Borkena stations: P and Tmin of the past 30 days; Upper Tiber stations: P, Tmin, and Tmax of the past 45 days), rolled with daily, weekly, or monthly averages, feed a 1D convolution layer, average pooling, and a flatten layer, followed by two LSTM or GRU layers each with dropout and a dense single-output layer that produces the daily, weekly, or monthly rolled single-step streamflow for both stations.

Table 4: Model hyperparameter choices or value ranges for optimization by Keras tuner.

| No. | Hyperparameter | Min | Max | Step | Choices | Default |
|---|---|---|---|---|---|---|
| 1 | Conv_1_filter | 8 | 32 | 8 | * | * |
| 2 | Conv_1_kernel | * | * | * | 2 or 3 | * |
| 3 | Conv_1_pool_size | * | * | * | 2 or 3 | * |
| 4 | Conv_2_filter | 8 | 32 | 8 | * | * |
| 5 | Conv_2_kernel | * | * | * | 2 or 3 | * |
| 6 | Conv_2_pool_size | * | * | * | 2 or 3 | * |
| 7 | CNN-LSTM1, CNN-LSTM2, CNN-GRU1, CNN-GRU2, LSTM, GRU, or MLP layer 1 units | 5 | 30 | 5 | * | * |
| 8 | Dropout 1 | 0.0 | 0.3 | 0.1 | * | 0.2 |
| 9 | CNN-LSTM1, CNN-LSTM2, CNN-GRU1, CNN-GRU2, LSTM, GRU, or MLP layer 2 units | 5 | 30 | 5 | * | * |
| 10 | Dropout 2 | 0.0 | 0.3 | 0.1 | * | 0.2 |
| 11 | Learning rate | * | * | * | 1e-2, 1e-3, or 1e-4 | * |
| 12 | Number of epochs | 10 | 100 | 10 | * | * |
| 13 | Number of batch sizes | 10 | 100 | 10 | * | * |

Tuner settings: objective = validation loss, max trials = 20, executions per trial = 3. *Not applicable.

Table 5: Daily streamflow simulation performance comparison of the proposed models for different input variables and climatic conditions (Model 1). Each cell gives RMSE / MAE / R² / TTPE, where TTPE is the training time per epoch in seconds; bold in the original marks the highest performance score.

| Model | Borkena, P + Tmin | Borkena, P | UTRB, P + Tmin + Tmax | UTRB, P |
|---|---|---|---|---|
| MLP | 9.91 / 5.01 / 0.77 / 0.89 | 9.38 / 4.63 / 0.79 / 0.63 | 49.11 / 22.74 / 0.49 / 0.78 | 56.57 / 28.14 / 0.33 / 0.41 |
| GRU | 8.78 / 4.37 / 0.82 / 3.61 | 7.94 / 3.64 / 0.85 / 3.32 | 46.63 / 20.89 / 0.55 / 2.61 | 51.09 / 26.74 / 0.45 / 3.39 |
| LSTM | 8.41 / 4.09 / 0.83 / 2.35 | 9.65 / 4.87 / 0.78 / 2.92 | 48.64 / 22.79 / 0.51 / 3.86 | 48.59 / 25.00 / 0.51 / 5.98 |
| CNN-LSTM1 | 8.09 / 4.07 / 0.84 / 0.46 | 8.57 / 4.67 / 0.82 / 0.41 | 51.20 / 22.95 / 0.45 / 1.19 | 56.16 / 26.55 / 0.34 / 0.57 |
| CNN-LSTM2 | 7.99 / 4.09 / 0.85 / 0.72 | 9.14 / 4.50 / 0.80 / 0.45 | 45.38 / 21.85 / 0.57 / 0.82 | 51.57 / 25.84 / 0.44 / 1.85 |
| CNN-GRU1 | 7.94 / 3.66 / 0.85 / 0.63 | 8.32 / 4.09 / 0.83 / 0.86 | 55.06 / 23.49 / 0.37 / 1.16 | 52.42 / 24.98 / 0.43 / 0.83 |
| CNN-GRU2 | 9.07 / 4.19 / 0.80 / 1.01 | 8.43 / 4.26 / 0.83 / 0.28 | 45.61 / 21.79 / 0.57 / 0.64 | 49.96 / 25.38 / 0.48 / 0.68 |


Table 6: Weekly rolled streamflow simulation performance comparison of the proposed models for different input variables and climatic conditions (Model 2). Each cell gives RMSE / MAE / R² / TTPE, where TTPE is the training time per epoch in seconds; bold in the original marks the highest performance score.

| Model | Borkena, P + Tmin | Borkena, P | UTRB, P + Tmin + Tmax | UTRB, P |
|---|---|---|---|---|
| MLP | 8.11 / 4.29 / 0.84 / 0.23 | 7.33 / 4.19 / 0.87 / 0.22 | 33.01 / 20.01 / 0.60 / 0.74 | 38.03 / 25.17 / 0.47 / 0.79 |
| GRU | 7.59 / 3.71 / 0.86 / 2.04 | 7.15 / 4.13 / 0.87 / 2.43 | 25.21 / 14.83 / 0.77 / 5.56 | 31.39 / 19.26 / 0.64 / 16.79 |
| LSTM | 8.41 / 4.01 / 0.82 / 2.98 | 7.93 / 3.91 / 0.84 / 1.27 | 31.07 / 18.87 / 0.65 / 3.55 | 31.07 / 19.49 / 0.65 / 2.69 |
| CNN-LSTM1 | 7.90 / 4.09 / 0.85 / 0.78 | 7.72 / 4.25 / 0.85 / 0.63 | 28.04 / 17.33 / 0.71 / 0.93 | 34.57 / 21.92 / 0.57 / 0.62 |
| CNN-LSTM2 | 7.33 / 3.86 / 0.87 / 0.52 | 7.63 / 4.25 / 0.86 / 0.55 | 28.45 / 16.66 / 0.71 / 1.14 | 35.04 / 21.77 / 0.55 / 1.56 |
| CNN-GRU1 | 7.83 / 3.94 / 0.85 / 0.44 | 7.91 / 4.31 / 0.85 / 0.50 | 30.57 / 18.01 / 0.66 / 2.32 | 35.14 / 22.58 / 0.55 / 0.63 |
| CNN-GRU2 | 8.73 / 4.61 / 0.81 / 0.43 | 8.43 / 4.35 / 0.82 / 0.97 | 27.81 / 16.99 / 0.72 / 4.37 | 33.76 / 23.01 / 0.59 / 1.01 |

Table 7: Monthly rolled streamflow simulation performance comparison of the proposed models for different input variables and climatic conditions (Model 3). Each cell gives RMSE / MAE / R² / TTPE, where TTPE is the training time per epoch in seconds; bold in the original marks the highest performance score.

| Model | Borkena, P + Tmin | Borkena, P | UTRB, P + Tmin + Tmax | UTRB, P |
|---|---|---|---|---|
| MLP | 6.68 / 4.37 / 0.87 / 0.58 | 5.57 / 3.80 / 0.91 / 0.41 | 20.24 / 13.84 / 0.78 / 0.44 | 28.79 / 21.05 / 0.56 / 0.41 |
| GRU | 5.15 / 3.52 / 0.91 / 1.62 | 5.22 / 3.06 / 0.92 / 3.31 | 20.79 / 14.30 / 0.77 / 16.63 | 26.47 / 20.08 / 0.63 / 4.70 |
| LSTM | 5.55 / 3.49 / 0.91 / 2.75 | 5.76 / 3.51 / 0.90 / 2.51 | 21.49 / 15.11 / 0.76 / 4.15 | 32.29 / 24.47 / 0.45 / 5.09 |
| CNN-LSTM1 | 6.05 / 4.42 / 0.89 / 0.98 | 5.58 / 3.40 / 0.91 / 0.58 | 21.53 / 14.87 / 0.76 / 1.29 | 27.48 / 21.19 / 0.60 / 0.42 |
| CNN-LSTM2 | 5.36 / 3.17 / 0.92 / 1.41 | 6.87 / 4.05 / 0.86 / 1.44 | 19.07 / 13.53 / 0.81 / 0.70 | 27.79 / 20.90 / 0.59 / 0.42 |
| CNN-GRU1 | 5.76 / 3.62 / 0.90 / 0.52 | 5.77 / 3.56 / 0.90 / 0.69 | 19.31 / 13.78 / 0.80 / 4.87 | 28.67 / 21.07 / 0.57 / 3.08 |
| CNN-GRU2 | 5.36 / 3.25 / 0.92 / 0.62 | 5.15 / 3.18 / 0.92 / 0.78 | 17.98 / 12.99 / 0.83 / 0.71 | 27.77 / 20.36 / 0.59 / 1.22 |

Figure 9: Training and test loss functions of the optimized high-score hybrid model: (a) the CNN-GRU2 model for Borkena station; (b) the CNN-GRU2 model for UTRB station.



6. Results

The streamflow simulation results for the proposed seven deep learning architectures, different input time window series, two climatic regions, two input combinations, and three average rolling time windows are presented in Tables 5, 6, and 7.

Figure 10: Comparison of the true (test) values and predicted values of the optimized high-score hybrid model, shown as scatter plots with a linear fit and as test-period hydrographs: (a) the CNN-GRU2 model for Borkena station (2011-2015); (b) the CNN-GRU2 model for UTRB station (1975-1979).

Table 8: Best hybrid model type, input feature, and Keras tuner optimized hyperparameter values for Borkena station, with its MSE score.

| Hyperparameter | CNN-GRU2 (monthly rolled P) |
|---|---|
| Conv_1_filter | 24 |
| Conv_1_kernel | 2 |
| Conv_1_pool_size | 3 |
| GRU_l1_units | 15 |
| Dropout 1 | 0.1 |
| GRU_l2_units | 20 |
| Dropout 2 | 0.2 |
| Learning rate | 0.0001 |
| Number of epochs | 80 |
| Number of batch sizes | 20 |
| Score (MSE) | 0.083 |


Table 9: Best hybrid model type, input features, and Keras tuner optimized hyperparameter values for UTRB station, with its MSE score.

| Hyperparameter | CNN-GRU2 (monthly rolled P, Tmin, and Tmax) |
|---|---|
| Conv_1_filter | 8 |
| Conv_1_kernel | 2 |
| Conv_1_pool_size | 2 |
| GRU_l1_units | 20 |
| Dropout 1 | 0.3 |
| GRU_l2_units | 30 |
| Dropout 2 | 0.2 |
| Learning rate | 0.0001 |
| Number of epochs | 60 |
| Number of batch sizes | 40 |
| Score (MSE) | 0.193 |

Figure 11: Internal network structure of the optimized high-score hybrid CNN-GRU2 model for Borkena station, with per-layer input/output shapes: input (None, 30, 5) → TimeDistributed Conv1D → (None, 29, 24) → TimeDistributed AveragePooling1D → (None, 9, 24) → TimeDistributed Flatten → (None, 216) → GRU → (None, 15) → Dropout → GRU → (None, 20) → Dropout → Dense → (None, 1).


Regardless of the combination of these conditions, the CNN-GRU model showed promising results in most of these scenarios (Tables 5 and 7). The highest scores are presented here:

(1) In daily streamflow simulation, CNN-GRU1 scored 7.94, 3.66, 0.85, and 0.63 for Borkena station, and CNN-GRU2 scored 45.61, 21.79, 0.57, and 0.64 for UTRB station, for RMSE, MAE, R², and training time per epoch, respectively.

(2) In weekly rolled streamflow simulation, CNN-LSTM2 scored 7.33, 3.86, 0.87, and 0.52 for Borkena station, and GRU scored 25.21, 14.83, 0.77, and 5.56 for UTRB station, for RMSE, MAE, R², and training time per epoch, respectively.

(3) In monthly rolled streamflow simulation, the CNN-GRU2 model showed high performance, with scores of 5.15, 3.18, 0.92, and 0.78 for Borkena station and 17.98, 12.99, 0.83, and 0.71 for UTRB station, which are RMSE, MAE, R², and training time per epoch, respectively.

Moreover, among the proposed four hybrid models, CNN-GRU2, the model designed with a single 1D CNN layer, showed the most promising result on trial model 1 (UTRB) and model 3, as shown in Tables 5 and 7. In contrast, GRU on model 2 (UTRB), CNN-LSTM2 on model 2 (Borkena), and CNN-GRU1 on model 1 (Borkena) shared the second-most promising results. Streamflow simulation with the CNN-GRU2 model generally showed higher performance than the other tested hybrid deep learning models and the state-of-the-art LSTM, GRU, and MLP models. In line with our objectives, the results are discussed under the different variability conditions in the following paragraphs.

6.1. Climatic Region Variability. Testing models in different climatic conditions with historical data will likely provide robust deep learning models for streamflow simulation in the future [51]. Hence, this research also tested the different models in two climatic regions, and irrespective of climatic and time window variation, the CNN-GRU model displayed the highest scores in the tested case study areas.

Figure 12: Internal network structure of the optimized high-score hybrid CNN-GRU2 model for UTRB station, with per-layer input/output shapes: input (None, 45, 15) → TimeDistributed Conv1D → (None, 44, 8) → TimeDistributed AveragePooling1D → (None, 22, 8) → TimeDistributed Flatten → (None, 176) → GRU → (None, 20) → Dropout → GRU → (None, 30) → Dropout → Dense → (None, 1).



6.2. Input Combination Variability. Combining minimum temperature (Tmin) with precipitation (P) as input does not show a significant performance increment at the Borkena station (Tables 5 and 6). In some scenarios, adopting P only as input increases the performance of the model (Table 7). In contrast, for UTRB, streamflow simulation with all input variables, Tmin, Tmax, and P, showed significant performance increments (Table 7).

6.3. Average Rolling Time Window Variability. Streamflow simulation without rolling the daily time series data had deficient performance compared to the monthly rolled average time series. This could be because the time series noise in UTRB is visible compared to that in Borkena station. As a result, the performance increment from daily to monthly rolled window models is much higher in UTRB than in Borkena station.

Generally, the monthly rolled time window with the CNN-GRU2 model showed the top performance results at both stations (Table 7). The corresponding training and test loss functions of this optimized high-score hybrid model for both stations are displayed in Figure 9, and Figure 10 compares the true values and predicted values of this model. The optimized hybrid model boosts the performance score and lowers the training time per epoch much better than the GRU and LSTM models. This model's input features and Keras tuner optimized hyperparameter values for both stations, with the corresponding MSE scores, are presented in Tables 8 and 9. Moreover, the internal network structures of these models are shown in Figures 11 and 12, which display the model input and output parameter matrices for each layer.

7. Conclusions

This study presented a comparative analysis of different hybrid deep learning algorithms against state-of-the-art machine learning models for one-step daily streamflow simulation at two river basin or subcatchment streamflow outlets. The proposed algorithms for this study are CNN-LSTM and CNN-GRU hybrid deep learning models, each having one or two 1D CNN layers, alongside the classic MLP, LSTM, and GRU models. The study conducted a series of experiments to observe the performance variation of the proposed models by introducing different input combinations, rolling time windows, and climatic conditions for streamflow simulation. The following points summarize the significant findings of this study:

(i) CNN-GRU2, with one 1D CNN layer, showed the best simulation performance, reporting the lowest RMSE and MAE and the highest R² of all models in both case study areas. Such results indicate that the performance of the selected architectures is irrespective of the climatic characteristics of the basins.

(ii) Combining temperature data with precipitation as input to the proposed models produced a minimal performance increment at Borkena station compared with the UTRB case study area, which clearly showed that temperature data scarcity has a greater performance loss implication at the UTRB station. On the other hand, Borkena station has more significant natural streamflow variability than UTRB, which is also reflected in the model results. This implies that the catchment response should be considered before any deep learning model application.

(iii) Rolling the time window of the input and output time series for streamflow simulation using the proposed models increases performance considerably more in the UTRB than in the Borkena station.

(iv) The analysis results also showed that the training time per epoch for the hybrid deep learning models is much lower than that of the GRU and LSTM models.

Deep learning models usually require massive datasets, and their performance drops with small to medium datasets. However, given the acceptable results of this case study, and considering the hybrid models' hyperparameter sensitivity and complexity, future research may further design optimized configurations. Moreover, these hybrid models can be tested for long-term streamflow simulation in ephemeral, seasonal, and perennial river systems and in other fields of study. Our future research will try to synchronize the high-performing hybrid deep learning models of this study with remote sensing datasets for the problems we experience in ungauged catchments.

Data Availability

The raw hydrological and meteorological datasets used for the Borkena watershed are available from the corresponding author upon request. However, authorization letters are required from the Ministry of Water, Irrigation and Energy (MoWIE) of Ethiopia (http://mowie.gov.et) and the National Meteorological Agency of Ethiopia (NMA) (http://www.ethiomet.gov.et), whereas for UTRB the datasets can be retrieved from an online repository (http://hydrogate.unipg.it/wrme).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The corresponding author acknowledges the Ministry of Water, Irrigation and Energy of Ethiopia (MoWIE) and the Italian National Research Council (Centro Nazionale di Ricerca, CNR) for the hydrological and meteorological datasets. The corresponding author also thanks Dr. Fiseha Behulu for his advice and continuous follow-up.



Page 3: Multivariate Streamflow Simulation Using Hybrid Deep

conditions, which include climatic input combination, input time window, and average rolling time window variability.

This study used different open-source software and machine learning libraries, including Python 3.6 for programming and the NumPy, pandas, scikit-learn, Hydroeval, statsmodels, and Matplotlib libraries; all were used for data preprocessing, evaluation, and graphical interpretation. Moreover, the TensorFlow and Keras deep learning frameworks were employed for modelling the deep learning architectures.

2. Study Area

In the present study, two river subcatchments were selected in two climatic regions: the Borkena subcatchment of the Awash River Basin in Ethiopia (Figure 1(a)) and the Upper Tiber River Basin in Italy (Figure 1(b)).

2.1. Borkena Watershed (Ethiopia). The first case study area is in the Borkena watershed at the Kombolcha station outlet, located in the upper part of the Awash River Basin in northern Ethiopia. The mainstream of the watershed emanates from Tosa mountain, which is found near Dessie town. The area's altitude ranges from 1,775 m at the lowest site near Kombolcha to 2,638 m at the highest site upstream of Dessie. The main rainy season of this watershed is from July to September.

2.2. Upper Tiber River Basin (Italy). The second case study area is located in the Upper Tiber River Basin (UTRB) in Italy. The Tiber River Basin (TRB) is the second-largest catchment in Italy [23]. Geographically, the basin lies between latitudes 40.5°N and 43°N and longitudes 10.5°E and 13°E, covering about 17,500 km², roughly 5% of the Italian territory. The Upper Tiber River Basin (UTRB) is the part of the TRB covering 4,145 km² (about 20% of the TRB), with its outlet at Ponte Nuovo. The elevation of the catchment ranges from 148 to 1,561 m above sea level. The area's climate is Mediterranean, with precipitation mainly occurring from autumn (September to November) to spring (March to May). Intense rainfall strongly influences the basin's hydrology in the upstream part and causes frequent floods in the downstream areas [24].

3. Data Source and Preprocessing

Borkena's required hydrological and meteorological datasets were collected from the Ministry of Water, Irrigation and Energy (MoWIE) of Ethiopia and the National Meteorological Agency of Ethiopia (NMA), respectively. UTRB's datasets were collected from the National Research Council of Italy (CNR) and are archived for public use on the Water Resource Management and Evaluation (WRME) platform at the following link: http://hydrogate.unipg.it/wrme.

We collected 5,844 available data series from the time window of January 1, 1999 to December 31, 2014 for the Borkena watershed. Similarly, for UTRB, 7,670 data series were collected from January 1, 1958 to December 31, 1978. Both case study datasets are multivariate and multisite. Even though we carefully chose series time windows with a minimum data gap for both stations, the datasets contain many missing values for different reasons. Thus, our first task was to fill the missing values with the Monte Carlo approach.

The study applied linear correlation statistics to measure the strength of dependency between the different input variables [25]. Even though Mehr and Gandomi [26] stated that linear correlation might mislead or provide redundant inputs, our study does not have a huge feature set that requires intensive feature selection criteria. Hence, we adopted a linear correlation coefficient. Moreover, Kun et al. [27] concluded that the Pearson correlation coefficient (PCC) is the most applicable for multiple linear regressions (MLRs), and Oyebode [28] also stated that inputs selected with PCC showed superior model accuracy. Hence, this study applied the Pearson linear correlation coefficient [29, 30]. It has a value ranging between +1 and −1, where +1 indicates a positive linear correlation, 0 indicates no linear correlation, and −1 indicates a negative linear correlation [25]. Equation (1) calculates the Pearson correlation coefficient, and Tables 1 and 2 present the results. Correlation values between 0 and 0.3 (positive) and between 0 and −0.3 (negative) show a weak linear relationship among variables [31]. However, since we have a small number of variables and a small data size for this study, we decided to omit only the Borkena station Tmax values, whose r values range between −0.129 and +0.107; the details are presented in Table 1.

$$ r = \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[N\sum X^{2} - \left(\sum X\right)^{2}\right]\left[N\sum Y^{2} - \left(\sum Y\right)^{2}\right]}} \qquad (1) $$
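As a hedged illustration of this screening step, the correlation of each candidate input against streamflow can be computed directly with pandas; the file name, column labels, and the 0.2 cut-off below are illustrative assumptions rather than the study's exact choices:

```python
# Minimal sketch of Pearson-correlation input screening with pandas.
# "borkena_daily.csv" and the column names are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("borkena_daily.csv", parse_dates=["date"], index_col="date")

# Pearson correlation of every input against the streamflow column.
r = df.corr(method="pearson")["streamflow"].drop("streamflow")
print(r.sort_values())

# Keep inputs whose linear relationship with streamflow is not weak,
# e.g. drop Tmax-type columns whose r lies between -0.129 and +0.107.
selected = r[r.abs() >= 0.2].index.tolist()
```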

After passing rigorous quality control, the raw data were split chronologically into training and testing datasets with a ratio of 80:20, respectively. The time series graph and the corresponding box plot of the split data for both stations are presented in Figure 2. Different options exist in the literature to remove noise from a time series. A sliding window is the first option, used to temporarily approximate the actual value of the time series data [32]. In comparison, a rolling window (moving average) is the second option, which smooths the time series data by calculating the average, maximum, minimum, or sum over a specific time [33]. Hence, for this study, we applied average rolling windows to smooth and remove noise from the time series while keeping the length of the data unchanged.
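A minimal sketch of this smoothing step with pandas rolling means is shown below; the synthetic frame and the 7-day and 30-day window lengths assumed for the weekly and monthly variants are illustrative:

```python
# Sketch of average rolling windows that smooth the series while keeping
# its length unchanged. The synthetic data stands in for the cleaned series.
import numpy as np
import pandas as pd

idx = pd.date_range("1999-01-01", periods=5844, freq="D")
df = pd.DataFrame({"streamflow": np.random.rand(5844),
                   "P": np.random.rand(5844)}, index=idx)

daily = df                                          # unsmoothed daily series
weekly = df.rolling(window=7, min_periods=1).mean()   # assumed 7-day window
monthly = df.rolling(window=30, min_periods=1).mean() # assumed 30-day window
```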

Then, daily, weekly, and monthly average rolling sliding windows were used to rebuild the input and output time series into a supervised learning format. Accordingly, the rolled time series data were prepared with a time lag window of 30 or 45 for single-step streamflow simulation at the Borkena and UTRB stations, respectively. Moreover, the split time series variables were scaled using StandardScaler for computational easiness and numerical stability of the modelling process.
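Continuing the sketch above, the following hedged example frames the rolled series as a supervised one-step problem with a 30-lag window (45 would be used for UTRB) and fits StandardScaler on the training 80% only; variable names are illustrative:

```python
# Sketch: lagged supervised framing plus StandardScaler, using the rolled
# `monthly` frame from the previous sketch. Column 0 is assumed to be
# streamflow; all names here are illustrative.
import numpy as np
from sklearn.preprocessing import StandardScaler

def make_windows(X, y, lag):
    """Stack `lag` consecutive input rows to predict the next target."""
    Xs, ys = [], []
    for t in range(lag, len(X)):
        Xs.append(X[t - lag:t])   # shape (lag, n_features)
        ys.append(y[t])           # single-step target
    return np.array(Xs), np.array(ys)

values = monthly.values
split = int(0.8 * len(values))                 # chronological 80:20 split
scaler = StandardScaler().fit(values[:split])  # fit on training portion only
scaled = scaler.transform(values)

X, y = scaled[:, 1:], scaled[:, 0]
X_train, y_train = make_windows(X[:split], y[:split], lag=30)
X_test, y_test = make_windows(X[split:], y[split:], lag=30)
```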


4. Methods

In this study, three types of network architectures (MLP, GRU, and LSTM) were compared with the proposed hybrid deep neural network architectures (CNN-LSTM and CNN-GRU) for the simulation of single-step streamflow, taking different combinations of precipitation (P), minimum temperature (Tmin), and maximum temperature (Tmax) as inputs. The proposed simulation model architectures, with their input and output variables, are presented as a flowchart in Figure 3.

4.1. Deep Learning Models. Deep learning models are part of a broader family of machine learning, including recurrent neural networks (RNNs), convolutional neural networks (CNNs), deep belief networks (DBNs), and deep neural networks (DNNs). These models have been applied to different fields of study, including speech recognition, computer vision, natural language processing, and time series analysis [13, 16, 34–36]. The following sections briefly discuss the architectures used in the present study.

4.2. Artificial Neural Network (ANN). The artificial neural network (ANN) is the most common machine learning model applied to streamflow simulation over the last two decades [1, 37]. It is known for modelling the complex input-output relationships inherent in hydrological time series within a river catchment. The traditional feedforward neural network (FFNN), with three layers (input, hidden, and output) trained by the backpropagation (BP) algorithm, gained popularity for nonlinear hydrological time series modelling.

Figure 4 displays the typical architecture of an ANN:

$$ \hat{y}_{j} = f_{j}\left[\sum_{h=1}^{m} w_{jh} \cdot f_{h}\left(\sum_{i=1}^{n} w_{hi}x_{i} + w_{hb}\right) + w_{jb}\right] \qquad (2) $$

where i, h, j, b, and w indicate the neurons of the input, hidden, and output layers, the bias, and the applied weight of the neuron, respectively; f_h and f_j are the activation functions of the hidden layer and the output layer, respectively; x_i, n, and m represent, respectively, the input value and the numbers of input and hidden neurons; and y and ŷ_j denote the observed and calculated target values, respectively. In the calibration phase of the model, the values of the hidden and output layers and the corresponding weights can be varied and calibrated [38].

The ability of the ANN to link input and output variables in complex hydrological systems, without the need for prior knowledge about the nature of the process, has led to a huge leap in the use of ANN models in hydrological simulations [38].
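For concreteness, a small NumPy sketch of the forward pass in equation (2) is given below; the layer sizes and the choice of a sigmoid hidden activation with a linear output are illustrative assumptions:

```python
# Sketch of the single-hidden-layer forward pass of equation (2).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ann_forward(x, W_h, b_h, W_j, b_j):
    h = sigmoid(W_h @ x + b_h)   # f_h(sum_i w_hi * x_i + w_hb)
    return W_j @ h + b_j         # f_j(sum_h w_jh * h + w_jb), linear f_j

rng = np.random.default_rng(0)
n_in, n_hidden = 5, 8            # illustrative sizes
y_hat = ann_forward(rng.normal(size=n_in),
                    rng.normal(size=(n_hidden, n_in)),
                    rng.normal(size=n_hidden),
                    rng.normal(size=(1, n_hidden)),
                    rng.normal(size=1))
```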

4.3. Long Short-Term Memory (LSTM). LSTM differs from the classical MLP network in that the layers of neurons in an LSTM have recurrent connections; thus, the state from the previous activation time step is used to formulate an output. The LSTM replaces the typical neuron in the hidden layer with a memory cell and three gates: an input gate, a forget gate, and an output gate [39]. It is an advanced form of recurrent neural network (RNN) that can capture long-term dependencies. An RNN, in turn, is a circular network in which an additional input is added to represent the state of the neuron in the hidden layer at the previous time steps [40]. LSTM has two critical benefits over RNN: overcoming vanishing and exploding gradients, and holding memory to capture long-term temporal dependency in input sequences. The mathematical formulation of the different parameters is listed in Table 3, and Figure 5 displays the LSTM memory cell with its three gated layers. In Table 3, W_i, W_f, W_o, and W_c are the weights that map the hidden layer input to the three gates of input, forget, and output; the U_i, U_f, U_o, and U_c weight matrices map the hidden layer output to the gates; b_i, b_f, b_o, and b_c are bias vectors; and C_t and h_t are the outcome of the cell and the outcome of the layer, respectively.

Figure 1: Location of the case study areas: (a) Borkena (Ethiopian river basins, Awash River Basin, Borkena watershed, gauging station); (b) UTRB (river catchments in Italy, Tiber River Basin, Upper Tiber River Basin, gauging station at Ponte Nuovo).


4.4. Gated Recurrent Unit (GRU). GRU is a special type of LSTM architecture that merges the input and forget gates into an update gate, which reduces the number of parameters and makes training easier. At each time step there are two input features: the input vector x_t and the previous output vector h_(t−1). The output of each gate can be calculated through logical operations and nonlinear transformations of the input [34]. The mathematical formulations among the inputs, the outputs, and the different parameters are listed in equations (3)-(6). Moreover, Figure 6 displays the structure of the gated recurrent unit (GRU) network:

$$ z_{t} = \sigma\left(W_{z}X_{t} + U_{z}h_{t-1} + b_{z}\right) \qquad (3) $$

$$ r_{t} = \sigma\left(W_{r}X_{t} + U_{r}h_{t-1} + b_{r}\right) \qquad (4) $$

$$ \tilde{h}_{t} = \tanh\left(W_{h}X_{t} + r_{t} \ast h_{t-1} + b_{h}\right) \qquad (5) $$

$$ h_{t} = \left(1 - z_{t}\right) \ast h_{t-1} + z_{t} \ast \tilde{h}_{t} \qquad (6) $$

where z_t is the update gate vector, r_t is the reset gate vector, W and U are parameter matrices, σ is a sigmoid function, and tanh is a hyperbolic tangent.
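The following NumPy sketch steps a single GRU cell exactly as written in equations (3)-(6); the dimensions and random parameters are illustrative:

```python
# Sketch of one GRU step per equations (3)-(6); `*` is elementwise product,
# matching the formulation above (eq. (5) uses r_t * h_{t-1} as printed).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    z_t = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])    # eq. (3)
    r_t = sigmoid(W["r"] @ x_t + U["r"] @ h_prev + b["r"])    # eq. (4)
    h_tilde = np.tanh(W["h"] @ x_t + r_t * h_prev + b["h"])   # eq. (5)
    return (1.0 - z_t) * h_prev + z_t * h_tilde               # eq. (6)

rng = np.random.default_rng(1)
n_in, n_hid = 5, 4                                            # illustrative
W = {k: rng.normal(size=(n_hid, n_in)) for k in "zrh"}
U = {k: rng.normal(size=(n_hid, n_hid)) for k in "zr"}
b = {k: np.zeros(n_hid) for k in "zrh"}
h = gru_step(rng.normal(size=n_in), np.zeros(n_hid), W, U, b)
```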

4.5. Convolutional Neural Network (CNN). The convolutional neural network (CNN) is one of the most successful deep learning models, especially for feature extraction, and its network structures include 1D CNN, 2D CNN, and 3D CNN [15]. A CNN structure generally consists of a convolution layer, a pooling layer, and a fully connected layer [18].

Table 1: Descriptive statistics of the split time series data for the Borkena watershed (r = Pearson correlation with streamflow).

Station     Data type            r        Training data (80%)           Testing data (20%)
                                          Mean   Max    Min    SD       Mean   Max    Min    SD
Kombolcha   Streamflow (m³/sec)  1.000    10.9   216.9  0.00   23.2     10.1   94.8   0.0    20.2
            P (mm/day)           0.321    3.1    73.2   0.0    7.5      2.9    60.4   0.0    7.2
            Tmin (°C)            0.271    12.5   20.9   1.5    3.3      12.5   20.6   2.6    3.4
            Tmax (°C)            −0.099   27.2   33.6   16.4   2.5      27.3   33.0   19.6   2.1
Chefa       P (mm/day)           0.344    3.5    81.6   0.0    8.6      3.4    64.3   0.0    8.1
            Tmin (°C)            0.266    13.3   21.5   0.1    3.7      14.1   22.2   3.9    3.5
            Tmax (°C)            −0.069   29.9   38.0   18.5   2.8      30.3   38.0   22.2   2.5
Dessie      P (mm/day)           0.335    3.5    80.6   0.0    8.6      2.9    67.0   0.0    7.3
            Tmin (°C)            0.319    8.5    15.5   0.1    2.5      7.8    15.5   0.0    3.1
            Tmax (°C)            0.107    23.8   30.0   16.0   1.9      24.1   30.0   15.0   2.1
Kemise      P (mm/day)           0.372    3.1    81.9   0.0    8.3      2.9    72.1   0.0    7.5
            Tmin (°C)            0.282    13.8   22.0   3.0    3.4      13.5   20.1   4.5    3.6
            Tmax (°C)            −0.129   31.0   38.3   14.0   2.7      31.9   37.8   23.5   2.4
Majete      P (mm/day)           0.347    3.3    80.7   0.0    8.6      3.3    81.3   0.0    8.6
            Tmin (°C)            0.202    14.7   23.0   1.4    2.9      14.6   21.5   6.7    2.9
            Tmax (°C)            −0.057   28.6   37.8   17.2   2.8      29.1   38.0   20.8   2.4

Table 2: Descriptive statistics of the split time series data for the UTRB (r = Pearson correlation with streamflow).

Station         Data type            r        Training data (80%)            Testing data (20%)
                                              Mean   Max    Min     SD       Mean   Max    Min     SD
Ponte Nuovo     Streamflow (m³/sec)  1.000    50.6   939.0  1.9     75.5     50.6   737.0  3.7     68.6
Castel Rigone   P (mm/day)           0.384    2.6    72.8   0.0     6.6      2.7    67.7   0.0     6.9
Montecoronaro   P (mm/day)           0.339    3.9    229.0  0.0     10.7     4.0    110.0  0.0     10.5
Perugia (ISA)   P (mm/day)           0.379    2.4    120.4  0.0     6.6      2.5    61.8   0.0     6.3
                Tmin (°C)            −0.353   9.7    30.4   −9.0    6.3      9.3    25.2   −5.0    5.6
                Tmax (°C)            −0.379   17.4   37.4   −4.5    8.1      16.3   33.0   0.6     7.2
Petrelle        P (mm/day)           0.345    2.51   90.0   0.0     6.9      2.7    117.1  0.0     7.4
Pietralunga     P (mm/day)           0.428    3.22   150.0  0.0     8.1      3.1    73.1   0.0     7.3
Spoleto         P (mm/day)           0.412    2.9    113.6  0.0     7.9      2.9    94.2   0.0     7.8
                Tmin (°C)            −0.265   7.5    23.0   −12.6   6.4      8.8    21.7   −5.4    5.8
                Tmax (°C)            −0.383   18.8   38.7   −3.5    8.6      18.7   36.8   2.0     7.8
Torgiano        P (mm/day)           0.364    2.4    141.2  0.0     7.1      2.5    62.0   0.0     6.9
Gubbio          Tmin (°C)            −0.315   8.7    26.0   −12.0   5.9      6.1    19.3   −11.3   5.4
                Tmax (°C)            −0.377   18.1   39.0   −8.0    8.1      17.4   34.1   −0.9    7.5
Assisi          Tmin (°C)            −0.325   9.2    25.6   −11.6   6.2      8.2    21.5   −8.0    5.6
                Tmax (°C)            −0.378   18.2   37.8   −5.0    8.3      18.1   35.8   0.0     7.8


The 1D CNN is mainly implemented for sequence data processing [41]; the 2D CNN is usually used for text and image identification [42]; and the 3D CNN is typically used for modelling medical image and video data [43]. Hence, since the aim of the present study is time series analysis, we implemented a 1D CNN. The detailed process of the 1D CNN is described in Figure 7.

As depicted in Figure 7, the input series is convolved in the convolution layer from top to bottom (shown by the arrows). The grey and mesh colours represent different filters, and the size of the convolution layer's output depends on the number of input data dimensions, the size of the filter, and the convolution step length.
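A hedged NumPy sketch of this filter-size-2 convolution is given below; it reproduces the C_i = W1·P_i + W2·P_(i+1) mapping of Figure 7, so a length-n series yields n − 1 outputs:

```python
# Sketch of the 1D convolution in Figure 7 for a filter of size 2:
# each output C_i = W1*P_i + W2*P_(i+1), consistent with the
# (30, 5) -> (29, 24) shapes reported in Figure 11 for kernel size 2.
import numpy as np

def conv1d(p, w):
    k = len(w)
    return np.array([np.dot(w, p[i:i + k]) for i in range(len(p) - k + 1)])

p = np.array([1.0, 2.0, 3.0, 4.0])
c = conv1d(p, w=np.array([0.5, -0.5]))   # -> length-3 feature map
```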

4.6. CNN-LSTM and CNN-GRU Hybrid Models. In this study, hybrid models were designed by integrating CNN with LSTM or GRU layers. The feature sequence from the CNN layer is taken as the input for the LSTM or GRU layer, from which the short- and long-term dependencies are then further extracted.

The proposed CNN-LSTM and CNN-GRU models contain two main components. The first component consists of one-dimensional single or double convolutional and average pooling layers; moreover, a flatten layer is connected to further process the data into the format required by the LSTM or GRU. In the second component, the generated features are processed using LSTM or GRU and dense layers. Additionally, dropouts are introduced to prevent overfitting. Figure 8 shows the designed model inputs and outputs, with a basic description of the convolutional, pooling, and LSTM or GRU layers proposed for this project.
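A minimal Keras sketch of this design, following the layer flow of Figure 8 and the tuned values later reported in Table 8 for the Borkena CNN-GRU2 model (24 filters, kernel 2, pool 3, GRU units 15 and 20, dropouts 0.1 and 0.2), is shown below; the reshaping of the lag window into subsequences is an assumption made for illustration:

```python
# Sketch of a CNN-GRU2-style hybrid: one TimeDistributed 1D convolution
# with average pooling and flattening, two GRU layers with dropout, and a
# single-output dense layer. n_subseq is an illustrative assumption.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (TimeDistributed, Conv1D,
                                     AveragePooling1D, Flatten,
                                     GRU, Dropout, Dense)

n_subseq, n_steps, n_features = 4, 30, 5

model = Sequential([
    TimeDistributed(Conv1D(filters=24, kernel_size=2, activation="relu"),
                    input_shape=(n_subseq, n_steps, n_features)),
    TimeDistributed(AveragePooling1D(pool_size=3)),
    TimeDistributed(Flatten()),
    GRU(15, return_sequences=True),
    Dropout(0.1),
    GRU(20),
    Dropout(0.2),
    Dense(1),                     # single-step streamflow output
])
model.compile(optimizer="adam", loss="mse")
```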

5. Data Analysis

Simulation with deep learning requires selecting a suitable combination of hyperparameters: batch size, epochs, number of layers, and number of units for each layer [8]. Optimizing hyperparameters is not always consistent, as there is no hard rule to follow; "the process is more of an art than a science" [44]. Hence, in this study, we chose the Keras tuner, developed by the Google team and included in the Keras open library [45, 46].

5.1. Hyperparameter Optimization. Tuning machine learning model hyperparameters is critical; varying hyperparameter values often results in models with significantly different performance [47]. The models applied in this study mainly contain two types of hyperparameters: constant hyperparameters, which are not altered through the optimization process, and variable hyperparameters. The Adam optimizer is applied under the category of constant hyperparameters because of its efficiency and ease of implementation; it requires minimal memory and is

Figure 2: Streamflow time series graph and the corresponding box plot of the split data: (a) Borkena; (b) UTRB.


commonly suited to different problems [48]. In this category, the rectified linear unit (ReLU) was used as the activation function, and mean squared error (MSE) was used as the loss function.

In contrast, the second type, the variable hyperparameters, is optimized by the Keras tuner, and the hyperparameter choices or value ranges for optimization were set using different trials. We also considered our PC capacity (processor: Intel(R) Core(TM) i7-6500U CPU @ 2.50 GHz; RAM: 8 GB) running the Windows 10 operating system. Hyperparameters were optimized with 20 trials, and since deep learning networks produce different training and validation plots for each run, we decided to repeat each trial three times.

All hyperparameter ranges and choices are listed in Table 4. The CNN-LSTM1 and CNN-GRU1 models used hyperparameters 1 to 13 for optimization (Table 4), while numbers 4, 5, and 6 were omitted for CNN-LSTM2 and CNN-GRU2. The remaining deep learning models (MLP, LSTM, and GRU) used hyperparameters 7 to 13. Finally, each optimized hyperparameter set was used for each training and testing experiment. Moreover, the train and test traces from each run can be plotted to give a more robust idea of the behaviour of the model and to inspect overfitting and underfitting issues.
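A hedged keras-tuner sketch of this search, using the Table 4 ranges for the GRU variant and the footnoted settings (objective: validation loss; 20 trials; three executions per trial), is given below; the build function is illustrative, not the study's exact code:

```python
# Sketch of the Keras tuner search: unit counts, dropouts, and learning
# rate are sampled from the Table 4 ranges; other details are assumptions.
import keras_tuner as kt
from tensorflow.keras import Sequential, layers, optimizers

def build_model(hp):
    model = Sequential([
        layers.GRU(hp.Int("units_1", 5, 30, step=5), return_sequences=True),
        layers.Dropout(hp.Float("dropout_1", 0.0, 0.3, step=0.1, default=0.2)),
        layers.GRU(hp.Int("units_2", 5, 30, step=5)),
        layers.Dropout(hp.Float("dropout_2", 0.0, 0.3, step=0.1, default=0.2)),
        layers.Dense(1),
    ])
    lr = hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])
    model.compile(optimizer=optimizers.Adam(lr), loss="mse")
    return model

tuner = kt.RandomSearch(build_model, objective="val_loss",
                        max_trials=20, executions_per_trial=3)
# tuner.search(X_train, y_train, validation_split=0.2,
#              epochs=100, batch_size=20)  # illustrative fit settings
```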

5.2. Performance Measures. A wide variety of evaluation metrics are listed in the literature [49]. The popular ones are mean error (ME), coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), mean percentage error (MPE), mean absolute percentage error (MAPE), mean absolute scaled error (MASE), and Nash–Sutcliffe efficiency (NSE). This study used different input and model variability conditions. Hence, to measure the analysis output concisely and present the results, we applied the following three standard performance evaluation criteria, which also have the potential to capture extreme streamflow time series values effectively [50].

Coefficient of determination (R²):

$$ R^{2} = \left[\frac{n\sum\left(Q_{obs}Q_{sim}\right) - \left(\sum Q_{obs}\right)\left(\sum Q_{sim}\right)}{\sqrt{n\sum Q_{obs}^{2} - \left(\sum Q_{obs}\right)^{2}}\,\sqrt{n\sum Q_{sim}^{2} - \left(\sum Q_{sim}\right)^{2}}}\right]^{2} \qquad (7) $$

Root mean square error (RMSE):

$$ \mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{N}\left(Q_{obs}^{t} - Q_{sim}^{t}\right)^{2}}{N}} \qquad (8) $$

Mean absolute error (MAE):

$$ \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|Q_{obs}^{t} - Q_{sim}^{t}\right| \qquad (9) $$

where Q_obs is the observed discharge, Q_sim is the simulated discharge, and n is the number of observations. The range of R² lies between 0 and 1, representing, respectively, no correlation and a perfect correlation between the observed and simulated values, whereas RMSE and MAE scores that are smallest or closest to zero indicate the best model performance.
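The three criteria can be computed directly with NumPy, as in the following sketch (equation (7) is implemented as the squared Pearson correlation between observed and simulated flows):

```python
# Sketch of the evaluation criteria in equations (7)-(9).
import numpy as np

def r2(q_obs, q_sim):
    n = len(q_obs)
    num = n * np.sum(q_obs * q_sim) - np.sum(q_obs) * np.sum(q_sim)
    den = (np.sqrt(n * np.sum(q_obs**2) - np.sum(q_obs)**2)
           * np.sqrt(n * np.sum(q_sim**2) - np.sum(q_sim)**2))
    return (num / den) ** 2            # eq. (7)

def rmse(q_obs, q_sim):
    return np.sqrt(np.mean((q_obs - q_sim) ** 2))   # eq. (8)

def mae(q_obs, q_sim):
    return np.mean(np.abs(q_obs - q_sim))           # eq. (9)
```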

Figure 3: A simple architecture of the proposed models (inputs X_t per catchment: P, Tmin, and Tmax over time steps t to t+n; model architectures: MLP, GRU, LSTM, CNN-LSTM, and CNN-GRU; output: Q).

Figure 4: Typical architecture of an ANN (input layer x_1 ... x_n, hidden layer h_1 ... h_n, and output layer y, connected by weights w).


Table 3: Mathematical formulation of the LSTM cell.

Network gate/state   Purpose                                                                          Equations
Forget gate          Chooses the information to reject from the cell                                  f_t = σ(u_f x_t + w_f h_(t−1) + b_f)
Input gate           Decides what information is relevant to update in the current cell state         i_t = σ(u_i x_t + w_i h_(t−1) + b_i)
Output gate          Decides what to output based on the input and the long-term memory of the cell   o_t = σ(u_o x_t + w_o h_(t−1) + b_o)
Cell state           Long-term memory                                                                 C̃_t = tanh(W_c X_t + U_c h_(t−1) + b_c); C_t = f_t ∗ C_(t−1) + i_t ∗ C̃_t
Hidden state         Short-term memory                                                                h_t = tanh(C_t) ∗ o_t
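The table's update rules can be traced with a small NumPy step function, sketched below with illustrative dimensions (following the gate rows above, u matrices act on the input and w matrices on the previous hidden state):

```python
# Sketch of one LSTM step per Table 3; `*` is the elementwise product.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, U, W, b):
    f_t = sigmoid(U["f"] @ x_t + W["f"] @ h_prev + b["f"])      # forget gate
    i_t = sigmoid(U["i"] @ x_t + W["i"] @ h_prev + b["i"])      # input gate
    o_t = sigmoid(U["o"] @ x_t + W["o"] @ h_prev + b["o"])      # output gate
    c_tilde = np.tanh(U["c"] @ x_t + W["c"] @ h_prev + b["c"])  # candidate
    c_t = f_t * c_prev + i_t * c_tilde                          # long-term memory
    h_t = np.tanh(c_t) * o_t                                    # short-term memory
    return h_t, c_t
```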

Figure 5: LSTM memory cell with three gated layers [11] (previous long-term memory C_(t−1) and short-term memory h_(t−1) are updated through f_t, i_t, o_t, and C̃_t to give C_t and h_t).

Figure 6: The structure of the gated recurrent unit (GRU) network [34] (previous memory h_(t−1) is updated through the gates z_t and r_t and the candidate state h̃_t to give h_t).

Figure 7: The process of the 1D CNN [15]. For a filter of size 2 with weights W1 and W2, the convolution layer maps the multivariate time series inputs P_1 ... P_n to C_1 = W1·P_1 + W2·P_2, C_2 = W1·P_2 + W2·P_3, ..., C_n = W1·P_(n−1) + W2·P_n.


Figure 8: The basic architecture of the proposed CNN-LSTM and CNN-GRU models. Daily multivariate time series data (Borkena stations: P and Tmin of the past 30 days; Upper Tiber stations: P, Tmin, and Tmax of the past 45 days), with daily, weekly, or monthly rolled averages, pass through input → 1D conv → average pooling → flatten → LSTM or GRU → dropout → LSTM or GRU → dropout → dense (single output) layers to give daily, weekly, or monthly rolled single-step streamflow for both stations.

Table 4: Model hyperparameter choices or value ranges for optimization by the Keras tuner.

No.  Hyperparameter                                                         Min   Max   Step   Choices               Default
1    Conv_1_filter                                                          8     32    8      n/a                   n/a
2    Conv_1_kernel                                                          n/a   n/a   n/a    2 or 3                n/a
3    Conv_1_pool_size                                                       n/a   n/a   n/a    2 or 3                n/a
4    Conv_2_filter                                                          8     32    8      n/a                   n/a
5    Conv_2_kernel                                                          n/a   n/a   n/a    2 or 3                n/a
6    Conv_2_pool_size                                                       n/a   n/a   n/a    2 or 3                n/a
7    Layer 1 units (CNN-LSTM1, CNN-LSTM2, CNN-GRU1, CNN-GRU2, LSTM, GRU, or MLP)  5     30    5      n/a                   n/a
8    Dropout 1                                                              0.0   0.3   0.1    n/a                   0.2
9    Layer 2 units (CNN-LSTM1, CNN-LSTM2, CNN-GRU1, CNN-GRU2, LSTM, GRU, or MLP)  5     30    5      n/a                   n/a
10   Dropout 2                                                              0.0   0.3   0.1    n/a                   0.2
11   Learning rate                                                          n/a   n/a   n/a    1e-2, 1e-3, or 1e-4   n/a
12   Number of epochs                                                       10    100   10     n/a                   n/a
13   Batch size                                                             10    100   10     n/a                   n/a

Keras tuner settings: objective = validation loss; max trials = 20; executions per trial = 3. "n/a": not applicable.

Table 5: Daily streamflow simulation performance comparison of the proposed models (Model 1) for different input variables and climatic conditions.

            | Borkena, P + Tmin        | Borkena, P               | UTRB, P + Tmin + Tmax      | UTRB, P
Model       | RMSE  MAE   R²    TTPE*  | RMSE  MAE   R²    TTPE*  | RMSE   MAE    R²    TTPE*  | RMSE   MAE    R²    TTPE*
MLP         | 9.91  5.01  0.77  0.89   | 9.38  4.63  0.79  0.63   | 49.11  22.74  0.49  0.78   | 56.57  28.14  0.33  0.41
GRU         | 8.78  4.37  0.82  3.61   | 7.94  3.64  0.85  3.32   | 46.63  20.89  0.55  2.61   | 51.09  26.74  0.45  3.39
LSTM        | 8.41  4.09  0.83  2.35   | 9.65  4.87  0.78  2.92   | 48.64  22.79  0.51  3.86   | 48.59  25.00  0.51  5.98
CNN-LSTM1   | 8.09  4.07  0.84  0.46   | 8.57  4.67  0.82  0.41   | 51.20  22.95  0.45  1.19   | 56.16  26.55  0.34  0.57
CNN-LSTM2   | 7.99  4.09  0.85  0.72   | 9.14  4.50  0.80  0.45   | 45.38  21.85  0.57  0.82   | 51.57  25.84  0.44  1.85
CNN-GRU1    | 7.94  3.66  0.85  0.63   | 8.32  4.09  0.83  0.86   | 55.06  23.49  0.37  1.16   | 52.42  24.98  0.43  0.83
CNN-GRU2    | 9.07  4.19  0.80  1.01   | 8.43  4.26  0.83  0.28   | 45.61  21.79  0.57  0.64   | 49.96  25.38  0.48  0.68

*TTPE: training time per epoch (sec). The bold values indicate the highest performance score.


Table 6: Weekly rolled streamflow simulation performance comparison of the proposed models (Model 2) for different input variables and climatic conditions.

            | Borkena, P + Tmin        | Borkena, P               | UTRB, P + Tmin + Tmax      | UTRB, P
Model       | RMSE  MAE   R²    TTPE*  | RMSE  MAE   R²    TTPE*  | RMSE   MAE    R²    TTPE*  | RMSE   MAE    R²    TTPE*
MLP         | 8.11  4.29  0.84  0.23   | 7.33  4.19  0.87  0.22   | 33.01  20.01  0.60  0.74   | 38.03  25.17  0.47  0.79
GRU         | 7.59  3.71  0.86  2.04   | 7.15  4.13  0.87  2.43   | 25.21  14.83  0.77  5.56   | 31.39  19.26  0.64  16.79
LSTM        | 8.41  4.01  0.82  2.98   | 7.93  3.91  0.84  1.27   | 31.07  18.87  0.65  3.55   | 31.07  19.49  0.65  2.69
CNN-LSTM1   | 7.90  4.09  0.85  0.78   | 7.72  4.25  0.85  0.63   | 28.04  17.33  0.71  0.93   | 34.57  21.92  0.57  0.62
CNN-LSTM2   | 7.33  3.86  0.87  0.52   | 7.63  4.25  0.86  0.55   | 28.45  16.66  0.71  1.14   | 35.04  21.77  0.55  1.56
CNN-GRU1    | 7.83  3.94  0.85  0.44   | 7.91  4.31  0.85  0.50   | 30.57  18.01  0.66  2.32   | 35.14  22.58  0.55  0.63
CNN-GRU2    | 8.73  4.61  0.81  0.43   | 8.43  4.35  0.82  0.97   | 27.81  16.99  0.72  4.37   | 33.76  23.01  0.59  1.01

*TTPE: training time per epoch (sec). The bold values indicate the highest performance score.

Table 7: Monthly rolled streamflow simulation performance comparison of the proposed models (Model 3) for different input variables and climatic conditions.

            | Borkena, P + Tmin        | Borkena, P               | UTRB, P + Tmin + Tmax      | UTRB, P
Model       | RMSE  MAE   R²    TTPE*  | RMSE  MAE   R²    TTPE*  | RMSE   MAE    R²    TTPE*  | RMSE   MAE    R²    TTPE*
MLP         | 6.68  4.37  0.87  0.58   | 5.57  3.80  0.91  0.41   | 20.24  13.84  0.78  0.44   | 28.79  21.05  0.56  0.41
GRU         | 5.15  3.52  0.91  1.62   | 5.22  3.06  0.92  3.31   | 20.79  14.30  0.77  16.63  | 26.47  20.08  0.63  4.70
LSTM        | 5.55  3.49  0.91  2.75   | 5.76  3.51  0.90  2.51   | 21.49  15.11  0.76  4.15   | 32.29  24.47  0.45  5.09
CNN-LSTM1   | 6.05  4.42  0.89  0.98   | 5.58  3.40  0.91  0.58   | 21.53  14.87  0.76  1.29   | 27.48  21.19  0.60  0.42
CNN-LSTM2   | 5.36  3.17  0.92  1.41   | 6.87  4.05  0.86  1.44   | 19.07  13.53  0.81  0.70   | 27.79  20.90  0.59  0.42
CNN-GRU1    | 5.76  3.62  0.90  0.52   | 5.77  3.56  0.90  0.69   | 19.31  13.78  0.80  4.87   | 28.67  21.07  0.57  3.08
CNN-GRU2    | 5.36  3.25  0.92  0.62   | 5.15  3.18  0.92  0.78   | 17.98  12.99  0.83  0.71   | 27.77  20.36  0.59  1.22

*TTPE: training time per epoch (sec). The bold values indicate the highest performance score.

Figure 9: Training and test loss functions of the optimized high-score hybrid model: (a) CNN-GRU2 model for Borkena station; (b) CNN-GRU2 model for UTRB station.



6. Results

The streamflow simulation results for the proposed seven deep learning architectures, different input time window series, two climatic regions, two input combinations, and three

Figure 10: Comparison of the true values and predicted values of the optimized high-score hybrid model (test time series and predicted-versus-test scatter with linear fit): (a) CNN-GRU2 model for Borkena station; (b) CNN-GRU2 model for UTRB station.

Table 8: Best hybrid model type, input feature, and Keras tuner optimized hyperparameter values for Borkena station, with its MSE score.

Hyperparameter       CNN-GRU2 (monthly rolled P)
Conv_1_filter        24
Conv_1_kernel        2
Conv_1_pool_size     3
GRU_l1_units         15
Dropout1             0.1
GRU_l2_units         20
Dropout2             0.2
Learning rate        0.0001
Number of epochs     80
Batch size           20
Score (MSE)          0.083


Table 9: Best hybrid model type, input features, and Keras tuner optimized hyperparameter values for UTRB station, with its MSE score.

Hyperparameter       CNN-GRU2 (monthly rolled P, Tmin, and Tmax)
Conv_1_filter        8
Conv_1_kernel        2
Conv_1_pool_size     2
GRU_l1_units         20
Dropout1             0.3
GRU_l2_units         30
Dropout2             0.2
Learning rate        0.0001
Number of epochs     60
Batch size           40
Score (MSE)          0.193

Figure 11: Internal network structure of the optimized high-score hybrid CNN-GRU2 model for Borkena station. Layer flow and output shapes: InputLayer (30, 5) → TimeDistributed Conv1D (29, 24) → TimeDistributed AveragePooling1D (9, 24) → TimeDistributed Flatten (216) → GRU (15) → Dropout (15) → GRU (20) → Dropout (20) → Dense (1).


average rolling time windows are presented in Tables 5, 6, and 7. Regardless of the combination of these conditions, the CNN-GRU model showed promising results in most of these scenarios (Tables 5 and 7). The highest scores are presented here:

(1) In daily streamflow simulation, for Borkena station, CNN-GRU1 scored 7.94, 3.66, 0.85, and 0.63, and for UTRB station, CNN-GRU2 scored 45.61, 21.79, 0.57, and 0.64 for RMSE, MAE, R², and training time per epoch, respectively.

(2) In weekly rolled streamflow simulation, for Borkena station, CNN-LSTM2 scored 7.33, 3.86, 0.87, and 0.52, and for UTRB station, GRU scored 25.21, 14.83, 0.77, and 5.56 for RMSE, MAE, R², and training time per epoch, respectively.

(3) In monthly rolled streamflow simulation, the CNN-GRU2 model showed high performance, with scores of 5.15, 3.18, 0.92, and 0.78 for Borkena station and 17.98, 12.99, 0.83, and 0.71 for UTRB station, which are RMSE, MAE, R², and training time per epoch, respectively.

Moreover, of the four proposed hybrid models, CNN-GRU2, the model designed with a single 1D CNN layer, showed the most promising results in trial model 1 (UTRB) and model 3, as shown in Tables 5 and 7. In contrast, GRU in model 2 (UTRB), CNN-LSTM2 in model 2 (Borkena), and CNN-GRU1 in model 1 (Borkena) shared the second-highest results. Streamflow simulation with the CNN-GRU2 model generally showed higher performance than the other tested hybrid deep learning models and the state-of-the-art LSTM, GRU, and MLP models. In line with our objectives, the results are discussed under different variability conditions in the following paragraphs.

6.1. Climatic Region Variability. Testing models in different climatic conditions with historical data will likely provide robust deep learning models for streamflow simulation in the future [51]. Hence, this research also tested the different

Figure 12: Internal network structure of the optimized high-score hybrid CNN-GRU2 model for UTRB station. Layer flow and output shapes: InputLayer (45, 15) → TimeDistributed Conv1D (44, 8) → TimeDistributed AveragePooling1D (22, 8) → TimeDistributed Flatten (176) → GRU (20) → Dropout (20) → GRU (30) → Dropout (30) → Dense (1).


models in two climatic regions, and irrespective of climatic and time window variation, the CNN-GRU model displayed the highest scores in the tested case study areas.

6.2. Input Combination Variability. Combining minimum temperature (Tmin) with precipitation (P) as input did not show a significant performance increment at the Borkena station (Tables 5 and 6). In some scenarios, adopting P alone as input increased the performance of the model (Table 7). In contrast, for UTRB, streamflow simulation with all input variables (Tmin, Tmax, and P) showed significant performance increments (Table 7).

6.3. Average Rolling Time Window Variability. Streamflow simulation without rolling the daily time series data had deficient performance compared to the monthly rolled average time series. This could be because the time series noise in UTRB is more visible than that at the Borkena station. As a result, the performance increment from daily to monthly rolled window models is much higher for UTRB than for the Borkena station.

Generally, the monthly rolled time window with the CNN-GRU2 model showed the top performance results at both stations (Table 7). The corresponding training and test loss functions of this optimized high-score hybrid model for both stations are displayed in Figure 9. Consequently, Figure 10 compares the true values and predicted values of this model. The optimized hybrid model boosts the performance score and lowers the training time per epoch much better than the GRU and LSTM models. The input features and Keras tuner optimized hyperparameter values of this model for both stations, with their MSE scores, are presented in Tables 8 and 9. Moreover, the internal network structures of these models are shown in Figures 11 and 12, which display the model input and output parameter matrices for each layer.

7. Conclusions

This study presented a comparative analysis of different hybrid deep learning algorithms against state-of-the-art machine learning models for one-step daily streamflow simulation at two river basin or subcatchment streamflow outlets. The proposed algorithms for this study are the CNN-LSTM and CNN-GRU hybrid deep learning models, each having one or two 1D CNN layers, alongside the classic MLP, LSTM, and GRU models. This study conducted a series of experiments to observe the performance variation of the proposed models by introducing different input combinations, rolling time windows, and climatic conditions for streamflow simulation. The following points summarize the significant findings of this study:

(i) CNN-GRU2, with one 1D CNN layer, showed the best simulation performance, reporting the lowest RMSE and MAE and the highest R² of all models in both case study areas. Such results indicate that the performance of the selected architectures is independent of the climatic characteristics of the basins.

(ii) Combining temperature data with precipitation as input to the proposed models produced a minimal performance increment at the Borkena station compared to the UTRB case study area, which clearly shows that temperature data scarcity has a larger performance-loss implication at the UTRB station. On the other hand, the Borkena station has greater natural streamflow variability than UTRB, which is also reflected in the model results. This implies that the catchment response should be considered before any deep learning model application.

(iii) Rolling the time window of the input and output time series for streamflow simulation using the proposed models increases performance considerably more for the UTRB than for the Borkena station.

(iv) The analysis results also showed that the training time per epoch of the hybrid deep learning models is much lower than that of the GRU and LSTM models.

Deep learning models usually require massive datasets, and their performance drops with small to medium datasets. However, given the acceptable results of this case study, and considering the hyperparameter sensitivity and complexity of hybrid models, future research may further design optimized configurations. Moreover, these hybrid models can be tested for long-term streamflow simulation in ephemeral, seasonal, and perennial river systems and in other fields of study. Our future research will try to combine the best-performing hybrid deep learning models from this study with remote sensing datasets to address the problems experienced in ungauged catchments.

Data Availability

The raw hydrological and meteorological datasets used for the Borkena watershed are available from the corresponding author upon request. However, authorization letters are required from the Ministry of Water, Irrigation and Energy (MoWIE) of Ethiopia (http://mowie.gov.et) and the National Meteorological Agency of Ethiopia (NMA) (http://www.ethiomet.gov.et), whereas for UTRB, the datasets can be retrieved from an online repository (http://hydrogate.unipg.it/wrme).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The corresponding author acknowledges the Ministry of Water, Irrigation and Energy of Ethiopia (MoWIE) and the Italian National Research Council (Consiglio Nazionale delle Ricerche, CNR) for the hydrological and meteorological datasets. The corresponding author also thanks Dr. Fiseha Behulu for his advice and continuous follow-up.


References

[1] Z. Zhang, Q. Zhang, and V. P. Singh, "Univariate streamflow forecasting using commonly used data-driven models: literature review and case study," Hydrological Sciences Journal, vol. 63, no. 7, pp. 1091–1111, 2018.
[2] L. Ni, D. Wang, V. P. Singh et al., "Streamflow and rainfall forecasting by two long short-term memory-based models," Journal of Hydrology, vol. 583, Article ID 124296, 2020.
[3] X. Yuan, C. Chen, X. Lei, Y. Yuan, and R. Muhammad Adnan, "Monthly runoff forecasting based on LSTM-ALO model," Stochastic Environmental Research and Risk Assessment, vol. 32, no. 8, pp. 2199–2212, 2018.
[4] A. Sichangi, L. Wang, and Z. Hu, "Estimation of river discharge solely from remote-sensing derived data: an initial study over the Yangtze river," Remote Sensing, vol. 10, no. 9, p. 1385, 2018.
[5] M. J. Tourian, N. Sneeuw, and A. Bardossy, "A quantile function approach to discharge estimation from satellite altimetry (ENVISAT)," Water Resources Research, vol. 49, no. 7, pp. 4174–4186, 2013.
[6] A. W. Sichangi, L. Wang, K. Yang et al., "Estimating continental river basin discharges using multiple remote sensing data sets," Remote Sensing of Environment, vol. 179, pp. 36–53, 2016.
[7] S. Aljahdali, A. Sheta, and H. Turabieh, "River flow forecasting: a comparison between feedforward and layered recurrent neural network," Learning and Analytics in Intelligent Systems, vol. 43, pp. 523–532, 2020.
[8] D. Couta, Y. Zhang, and Y. Li, "River flow forecasting using long short-term memory," DEStech Transactions on Computer Science and Engineering, vol. 16, 2019.
[9] Y. Bai, N. Bezak, K. Sapac, M. Klun, and J. Zhang, "Short-term streamflow forecasting using the feature-enhanced regression model," Water Resources Management, vol. 33, no. 14, pp. 4783–4797, 2019.
[10] L. C. D. Campos, L. Goliatt da Fonseca, T. L. Fonseca, G. D. de Abreu, L. F. Pires, and Y. Gorodetskaya, "Short-term streamflow forecasting for Paraíba do Sul river using deep learning," Progress in Artificial Intelligence, vol. 43, pp. 507–518, 2019.
[11] B. B. Sahoo, R. Jha, A. Singh, and D. Kumar, "Long short-term memory (LSTM) recurrent neural network for low-flow hydrological time series forecasting," Acta Geophysica, vol. 67, no. 5, pp. 1471–1481, 2019.
[12] M. Rahimzad, A. Moghaddam Nia, H. Zolfonoon, J. Soltani, A. Danandeh Mehr, and H.-H. Kwon, Performance Comparison of an LSTM-Based Deep Learning Model versus Conventional Machine Learning Algorithms for Streamflow Forecasting, Water Resources Management, New York, NY, USA, 2021.
[13] K. Cho, B. van Merriënboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: encoder–decoder approaches," in Proceedings of the SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp. 103–111, Doha, Qatar, October 2014.
[14] C. Chen, W. He, J. Li, and Z. Tang, "A novel hybrid CNN-LSTM scheme for nitrogen oxide emission prediction in FCC unit," Mathematical Problems in Engineering, vol. 2020, Article ID 8071810, 2020.
[15] T. Li, M. Hua, and X. Wu, "A hybrid CNN-LSTM model for forecasting particulate matter (PM2.5)," IEEE Access, vol. 8, pp. 26933–26940, 2020.
[16] I. E. Livieris, E. Pintelas, and P. Pintelas, "A CNN–LSTM model for gold price time-series forecasting," Neural Computing and Applications, vol. 32, 2020.
[17] A. A. M. Ahmed, R. C. Deo, N. Raj et al., "Deep learning forecasts of soil moisture: convolutional neural network and gated recurrent unit models coupled with satellite-derived MODIS observations and synoptic-scale climate index data," Remote Sensing, vol. 13, no. 4, p. 554, 2021.
[18] Y. Liu, T. Zhang, A. Kang, J. Li, and X. Lei, "Research on runoff simulations using deep-learning methods," Sustainability, vol. 13, no. 3, p. 1336, 2021.
[19] A. U. Muhammad, X. Li, and J. Feng, "Using LSTM GRU and hybrid models for streamflow forecasting," Machine Learning and Intelligent Communications, vol. 294, pp. 510–524, 2019.
[20] R. Barzegar, M. T. Aalami, and J. Adamowski, "Short-term water quality variable prediction using a hybrid CNN-LSTM deep learning model," Stochastic Environmental Research and Risk Assessment, vol. 34, no. 2, pp. 415–433, 2020.
[21] S. P. Van, H. M. Le, D. V. Thanh, T. D. Dang, H. H. Loc, and D. T. Anh, "Deep learning convolutional neural network in rainfall-runoff modelling," Journal of Hydroinformatics, vol. 22, no. 3, pp. 541–561, 2020.
[22] P. Parisouj, H. Mohebzadeh, and T. Lee, "Employing machine learning algorithms for streamflow prediction: a case study of four river basins with different climatic zones in the United States," Water Resources Management, vol. 34, no. 13, pp. 4113–4131, 2020.
[23] A. Annis and F. Nardi, "Integrating VGI and 2D hydraulic models into a data assimilation framework for real time flood forecasting and mapping," Geo-Spatial Information Science, vol. 22, no. 4, pp. 223–236, 2019.
[24] B. M. Fiseha, S. G. Setegn, A. M. Melesse, E. Volpi, and A. Fiori, "Impact of climate change on the hydrology of upper Tiber River Basin using bias corrected regional climate model," Water Resources Management, vol. 28, no. 5, pp. 1327–1343, 2014.
[25] S. Rania, B. Waad, and E. Nadia, "Hybrid feature selection method based on the genetic algorithm and Pearson correlation coefficient," in Machine Learning Paradigms: Theory and Application, Studies in Computational Intelligence, vol. 801, 2000.
[26] A. D. Mehr and A. H. Gandomi, "MSGP-LASSO: an improved multi-stage genetic programming model for streamflow prediction," Information Sciences, vol. 561, pp. 181–195, 2021.
[27] R. Kun, F. Wei, Q. Jihong, Z. Xia, and S. Xiaoyu, "Comparison of eight filter-based feature selection methods for monthly streamflow forecasting – three case studies on CAMELS data sets," Journal of Hydrology, vol. 586, Article ID 124897, 2020.
[28] O. Oyebode, "Evolutionary modelling of municipal water demand with multiple feature selection techniques," Journal of Water Supply: Research & Technology - Aqua, vol. 68, no. 4, pp. 264–281, 2019.
[29] R. Dehghani, H. Torabi Poudeh, and Z. Izadi, "Dissolved oxygen concentration predictions for running waters with using hybrid machine learning techniques," Modeling Earth Systems and Environment, vol. 597, 2021.
[30] N. Yuvaraj, V. Chang, B. Gobinathan et al., "Automatic detection of cyberbullying using multi-feature based artificial intelligence with deep decision tree classification," Computers & Electrical Engineering, vol. 92, Article ID 107186, 2021.
[31] B. Ratner, "The correlation coefficient: its values range between +1/−1, or do they?," Journal of Targeting, Measurement and Analysis for Marketing, vol. 17, no. 2, pp. 139–142, 2009.
[32] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, "Sliding window-based frequent pattern mining over data streams," Information Sciences, vol. 179, no. 22, pp. 3843–3865, 2009.
[33] E. Zivot and J. Wang, "Rolling analysis of time series," in Modeling Financial Time Series with S-Plus, pp. 299–346, Springer, New York, NY, USA, 2003.
[34] Y. Wang, W. Liao, and Y. Chang, "Gated recurrent unit network-based short-term photovoltaic forecasting," Energies, vol. 11, no. 8, p. 2163, 2018.
[35] O. B. Sezer, M. U. Gudelek, and A. M. Ozbayoglu, "Financial time series forecasting with deep learning: a systematic literature review: 2005–2019," Applied Soft Computing, vol. 90, Article ID 106181, 2020.
[36] Q. Zou, Q. Xiong, Q. Li, H. Yi, Y. Yu, and C. Wu, "A water quality prediction method based on the multi-time scale bidirectional long short-term memory network," Environmental Science and Pollution Research, vol. 27, no. 14, pp. 16853–16864, 2020.
[37] S.-C. Wang, "Artificial neural network," Interdisciplinary Computing in Java Programming, vol. 26, pp. 81–100, 2003.
[38] V. Nourani, A. Molajou, H. Najafi, and A. Danandeh Mehr, "Emotional ANN (EANN): a new generation of neural networks for hydrological modeling in IoT," in Artificial Intelligence in IoT, Transactions on Computational Science and Computational Intelligence, pp. 45–61, Springer, New York, NY, USA, 2019.
[39] L. Yan, J. Feng, and T. Hang, "Small watershed stream-flow forecasting based on LSTM," Advances in Intelligent Systems and Computing, vol. 935, pp. 1006–1014, 2019.
[40] S. Zhu, X. Luo, X. Yuan, and Z. Xu, "An improved long short-term memory network for streamflow forecasting in the upper Yangtze River," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 9, 2020.
[41] A. Osama, A. Onur, K. Serkan, G. Moncef, and I. D. Inman, "Real-time vibration-based structural damage detection using one-dimensional convolutional neural networks," Journal of Sound and Vibration, vol. 388, pp. 154–170, 2017.
[42] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[43] H.-C. Shin, H. R. Roth, M. Gao et al., "Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1285–1298, 2016.
[44] A. Essien and C. Giannetti, "A deep learning framework for univariate time series prediction using convolutional LSTM stacked autoencoders," in Proceedings of the IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA), pp. 1–6, Alberobello, Italy, July 2019.
[45] D. S. Abdelminaam, F. H. Ismail, M. Taha, A. Taha, E. H. Houssein, and A. Nabil, "CoAID-DEEP: an optimized intelligent framework for automated detecting COVID-19 misleading information on Twitter," IEEE Access, vol. 9, pp. 27840–27867, 2021.
[46] O. Kazakov and O. Mikheenko, "Transfer learning and domain adaptation based on modeling of socio-economic systems," Business Inform, vol. 14, no. 2, pp. 7–20, 2020.
[47] B. Wang and N. Z. Gong, "Stealing hyperparameters in machine learning," in Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 36–52, San Francisco, CA, USA, May 2018.
[48] C. Boyraz and S. N. Engin, "Streamflow prediction with deep learning," in Proceedings of the 6th International Conference on Control Engineering & Information Technology (CEIT), pp. 1–5, Istanbul, Turkey, October 2018.
[49] E. K. Jackson, W. Roberts, B. Nelsen, G. P. Williams, E. J. Nelson, and D. P. Ames, "Introductory overview: error metrics for hydrologic modelling – a review of common practices and an open source library to facilitate use and adoption," Environmental Modelling & Software, vol. 119, pp. 32–48, 2019.
[50] S. Kumar, T. Roshni, and D. Himayoun, "A comparison of emotional neural network (ENN) and artificial neural network (ANN) approach for rainfall-runoff modelling," Civil Engineering Journal, vol. 5, no. 10, pp. 2120–2130, 2019.
[51] P. Bai, X. Liu, and J. Xie, "Simulating runoff under changing climatic conditions: a comparison of the long short-term memory network with two conceptual hydrologic models," Journal of Hydrology, vol. 592, Article ID 125779, 2021.

16 Computational Intelligence and Neuroscience

Page 4: Multivariate Streamflow Simulation Using Hybrid Deep

4 Methods

In this study three types of network architectures MLPGRU and LSTM were compared with the proposed hybriddeep neural network architectures CNN-LSTM and CNN-GRU for the simulation of single-step streamflow by takingdifferent combinations of precipitation (P) minimumtemperature (Tmin) and maximum temperature (Tmax) asinputs -e proposed simulation model architectures withtheir input and output variables are briefly presented as aflowchart in Figure 3

41 Deep LearningModels Deep learning models are part ofa broader family of machine learning including recurrentneural networks (RNNs) convolutional neural networks(CNNs) deep belief networks (DBNs) and deep neuralnetworks (DNNs) -ese models have been applied to dif-ferent fields of study including speech recognition com-puter vision natural language processing and time seriesanalysis [13 16 34ndash36] -e following sections will brieflydiscuss some of these architectures that were used in thepresent study

42 Artificial Neural Network (ANN) Artificial neural net-work (ANN) is the most common machine learning modelthat has found application in streamflow simulation over thelast two decades [1 37] It is known for modelling complexinput-output relationships inherent in hydrological timeseries features within a river catchment -e traditionalfeedforward neural network (FFNN) with three layers ofinput-output and hidden layers trained by backpropagation(BP) algorithm gained popularity for nonlinear hydrologicaltime series modelling

Figure 4 displays the typical architecture of ANN

1113954yj fj 1113944

m

h1wjh lowastfh 1113944

n

i1whixi + whb

⎛⎝ ⎞⎠ + wjb⎡⎢⎢⎣ ⎤⎥⎥⎦ (2)

where i h j b and w indicate neurons of the input hiddenoutput layers bias and applied weight of the neuron re-spectively fh and fj show the activation functions of thehidden layer and output layer respectively xi n and mrepresent respectively the input value input neuron andhidden neuron numbers and y and 1113954yj denote the observedand calculated target values respectively In the calibrationphase of the model the values of the hidden and outputlayers and corresponding weights could be varied andcalibrated [38]

-e ability of ANN to link input and output variables incomplex hydrological systems without the need for priorknowledge about the nature of the process has led to a hugeleap in the use of ANN models in hydrological simulations[38]

43 Long Short-Term Memory (LSTM) -e difference ofLSTM from the classical MLP network is that layers of theneurons in LSTM have recurrent connections thus the statefrom the previous activation time step is used to formulatean output -e LSTM replaces the typical neuron in thehidden layer with a memory cell and three gates an inputgate a forget gate and an output gate [39] It is an advancedform of recurrent neural network (RNN) that can capturelong-term dependencies On the other hand RNN is acircular network in which an additional input is added torepresent the state of the neuron in the hidden layer at theprevious time steps [40] LSTM has two critical benefits overRNN overcoming vanishing and exploding gradients andholding memory to capture long-term temporal dependencyin input sequences -e mathematical formulation for dif-ferent parameters is listed in Table 3 and Figure 5 displaysthe LSTM memory cell with three gated layerslowastWiWfWoand Wc are the weights that map the hidden layer input tothe three gates of input forget and outputUi Uf Uo andUcweight matrices map the hidden layer output to gates bi bfbo and bc are vectors Ct and ht are the outcome of the celland the outcome of the layer respectively

39deg30prime0PrimeE

10deg30prime0PrimeN

11deg0prime0PrimeE

10deg30prime0PrimeNGuaging Station

0 35 7 14 21 28Kilometers

EW

N

S

Borkena Watershed

Awash River Basin

Ethiopian River Basins

11deg0prime0PrimeN

40deg0prime0PrimeE

39deg30prime0PrimeE 40deg0prime0PrimeE

(a)

12deg0prime0PrimeE

43deg0prime0PrimeN

43deg30prime0PrimeN

43deg0prime0PrimeN

River Catchments in ItalyTiber River BasinUpper Tiber River Basin

0 210420 840 1260 1680Kilometers

EW

N

S

Gauging Station(Ponte Nuovo)

43deg30prime0PrimeN

12deg30prime0PrimeE

12deg0prime0PrimeE 12deg30prime0PrimeE

(b)

Figure 1 Location of case study areas (a) Borkena (b) UTRB

4 Computational Intelligence and Neuroscience

44 Gated Recurrent Unit (GRU) GRU is a special type ofLSTM architecture in which it merges the input and forgetgates and converts them into an update gate which makesthe parameter numbers fewer and the training will be easier-ere are two input features each time the input vector xtand the previous output vector htminus1 -e output of eachspecific gate can be calculated through logical operation andnonlinear transformation of the input [34] -e mathe-matical formulations among inputs outputs and differentparameters are listed in equations (3) (4) (5) and (6)Moreover Figure 6 displays the structure of the gated re-current unit (GRU) network

Zt σ WzXt + Uzhtminus1 + bz( 1113857 (3)

rt σ WrXt + Urhtminus1 + br( 1113857 (4)

h

t tanh WhXt + rt lowast htminus1( 1113857 + bh( 1113857 (5)

ht 1 minus Zt( 1113857lowast htminus1 + Zt lowast h

t (6)

where Zt is the update gate vector rt is the reset gate vectorW and U are parameter matrices σ is a sigmoid functionand tanh is a hyperbolic tangent

45 Convolutional Neural Network (CNN) Convolutionalneural network (CNN) is one of the most successful deeplearning models especially for feature extraction and itsnetwork structures include 1D CNN 2DCNN and 3DCNN[15] CNN structure generally consists of a convolutionlayer a pooling layer and a full connection layer [18]

Table 1 Descriptive statistics of split time series data for the Borkena watershed

Stations Data type Pearson correlation with streamflowTraining data (80) Testing data (20)

Mean Max Min SD Mean Max Min SD

Kombolcha

Stream flow (m3sec) 1000 109 2169 000 232 101 948 00 202P (mmday) 0321 31 732 00 75 29 604 00 72Tmin (oc) 0271 125 209 15 33 125 206 26 34Tmax (degc) minus0099 272 336 164 25 273 330 196 21

ChefaP (mmday) 0344 35 816 00 86 34 643 00 81Tmin (oc) 0266 133 215 01 37 141 222 39 35Tmax (oc) minus0069 299 380 185 28 303 380 222 25

DessieP (mmday) 0335 35 806 00 86 29 670 00 73Tmin (oc) 0319 85 155 01 25 78 155 00 31Tmax (oc) 0107 238 300 160 19 241 300 150 21

KemiseP (mmday) 0372 31 819 00 83 29 721 00 75Tmin (oc) 0282 138 220 30 34 135 201 45 36Tmax (oc) minus0129 310 383 140 27 319 378 235 24

MajeteP (mmday) 0347 33 807 00 86 33 813 00 86Tmin (oc) 0202 147 230 14 29 146 215 67 29Tmax (oc) minus0057 286 378 172 28 291 380 208 24

Table 2 Descriptive statistics of split time series data for the UTRB

Stations Data type Pearson correlation with streamflowTraining data (80) Testing data (20)

Mean Max Min SD Mean Max Min SDPonte Nuovo Streamflow (m3sec) 1000 506 9390 19 755 506 7370 37 686Castel Rigone P (mmday) 0384 26 728 00 66 27 677 00 69Montecoronaro P (mmday) 0339 39 2290 00 107 40 1100 00 105

Perugia (ISA)P (mmday) 0379 24 1204 00 66 25 618 00 63Tmin (oc) minus0353 97 304 minus90 63 93 252 minus50 56Tmax (oc) minus0379 174 374 minus45 81 163 330 06 72

Petrelle P (mmday) 0345 251 900 00 69 27 1171 00 74Pietralunga P (mmday) 0428 322 1500 00 81 31 731 00 73

SpoletoP (mmday) 0412 29 1136 00 79 29 942 00 78Tmin (oc) minus0265 75 230 minus126 64 88 217 minus54 58Tmax (oc) minus0383 188 387 minus35 86 187 368 20 78

Torgiano P (mmday) 0364 24 1412 00 71 25 620 00 69

Gubbio Tmin (oc) minus0315 87 260 minus120 59 61 193 minus113 54Tmax (oc) minus0377 181 390 minus80 81 174 341 minus09 75

Assisi Tmin (oc) minus0325 92 256 minus116 62 82 215 minus80 56Tmax (oc) minus0378 182 378 minus50 83 181 358 00 78

Computational Intelligence and Neuroscience 5

1D CNN is mainly implemented for sequence dataprocessing [41] 2D CNN is usually used for text and imageidentification [42] and usually 3D CNN is recognized formodelling medical image and video data identification [43]Hence since the aim of the present study is time seriesanalysis we implemented 1D CNN -e detailed process of1D CNN is described in Figure 7

As depicted in Figure 7 the input series is convoluted tothe convolution layer from top to bottom (shown by thearrows) -e grey or the mesh colours represent differentfilters where the size of the convolution layer depends on thenumber of input data dimensions the size of the filter andthe convolution step length

46 CNN-LSTM and CNN-GRU Hybrid Models In thisstudy hybrid models were designed by integrating CNNwith LSTM or GRU layers Hence the feature sequence fromthe CNN layer was considered as the input for the LSTM orGRU layer and then the short and long-time dependencieswere further extracted

-e proposed CNN-LSTM or CNN-GRU modelscontain two main components the first component con-sists of one dimensional single or double convolutional andaverage pooling layers Moreover a flatten layer is con-nected to further process the data into the format requiredby the LSTM or GRU In the second component thegenerated features are processed using LSTM GRU and

dense layers Additionally dropouts are introduced toprevent overfitting Figure 8 shows the designed modelinputs and outputs with a basic description of the con-volutional pooling and LSTM or GRU layers proposed forthis project

5. Data Analysis

Simulation with deep learning requires selecting a probable combination of hyperparameters: batch size, epochs, number of layers, and number of units for each layer [8]. Optimizing hyperparameters is not always consistent, as there is no hard rule to follow; "the process is more of an art than a science" [44]. Hence, in this study, we chose the Keras tuner, a hyperparameter optimizer developed by the Google team and included in the Keras open library [45, 46].

5.1. Hyperparameter Optimization. Tuning machine learning model hyperparameters is critical: varying hyperparameter values often results in models with significantly different performances [47]. The models applied in this study contain two types of hyperparameters: constant hyperparameters, which are not altered through the optimization process, and variable hyperparameters. The Adam optimizer is applied as a constant hyperparameter because of its efficiency and ease of implementation; it requires minimal memory and is commonly suited to different problems [48]. In this category, the rectified linear unit (ReLU) was used as the activation function, and the mean squared error (MSE) was used as the loss function.

[Figure 2: Streamflow time series graphs and the corresponding box plots of the split (training/testing) data. (a) Borkena. (b) UTRB.]

In contrast, the second type, the variable hyperparameters, is optimized by the Keras tuner, and the hyperparameter choices or value ranges for optimization are set using different trials. We also considered our PC capacity (processor: Intel(R) Core(TM) i7-6500U CPU, 2.50 GHz; RAM: 8 gigabytes) with the Windows 10 operating system. Hyperparameters were optimized with 20 trials, and since deep learning networks produce different training and validation plots for each run, we decided to repeat each trial three times.

All hyperparameter ranges and choices are listed in Table 4. The CNN-LSTM1 and CNN-GRU1 models used hyperparameters 1 to 13 for optimization (Table 4), while numbers 4, 5, and 6 were omitted for CNN-LSTM2 and CNN-GRU2. The remaining deep learning models (MLP, LSTM, and GRU) used hyperparameters 7 to 13. Finally, each optimized hyperparameter set is used for each training and testing experiment. Moreover, the train and test traces from each run can be plotted to give a more robust idea of the behaviour of the model and to inspect overfitting and underfitting issues.
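As an illustration of how such a search can be wired, the following hedged sketch uses the keras_tuner package with the ranges of Table 4 and the tuner settings above (validation-loss objective, 20 trials, three executions per trial). The build function and data pipeline are our assumptions, not the authors' released code, and the epoch/batch-size search (rows 12 and 13 of Table 4) is omitted for brevity:

```python
import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers

def build_model(hp):
    """Single-convolution CNN-GRU hypermodel using the ranges of Table 4."""
    model = keras.Sequential()
    model.add(layers.Conv1D(
        filters=hp.Int("conv_1_filter", min_value=8, max_value=32, step=8),
        kernel_size=hp.Choice("conv_1_kernel", [2, 3]),
        activation="relu",
        input_shape=(30, 5)))  # assumed Borkena input window
    model.add(layers.AveragePooling1D(
        pool_size=hp.Choice("conv_1_pool_size", [2, 3])))
    model.add(layers.GRU(hp.Int("gru_1_units", 5, 30, step=5),
                         return_sequences=True))
    model.add(layers.Dropout(hp.Float("dropout_1", 0.0, 0.3, step=0.1, default=0.2)))
    model.add(layers.GRU(hp.Int("gru_2_units", 5, 30, step=5)))
    model.add(layers.Dropout(hp.Float("dropout_2", 0.0, 0.3, step=0.1, default=0.2)))
    model.add(layers.Dense(1))
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])),
        loss="mse")
    return model

tuner = kt.RandomSearch(build_model, objective="val_loss",
                        max_trials=20, executions_per_trial=3)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val),
#              epochs=..., batch_size=...)  # epochs/batch size also tunable
```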

5.2. Performance Measures. A wide variety of evaluation metrics are listed in the literature [49]. The popular ones include mean error (ME), coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), mean percentage error (MPE), mean absolute percentage error (MAPE), mean absolute scaled error (MASE), and Nash–Sutcliffe efficiency (NSE). This study used different input and model variability conditions; hence, to concisely measure the analysis output and present the results, we applied the following three standard performance evaluation criteria, which also have the potential to capture extreme streamflow time series values effectively [50]:

Coefficient of determination (R²):

$$R^{2}=\left[\frac{n\sum\left(Q_{\mathrm{obs}}\,Q_{\mathrm{sim}}\right)-\left(\sum Q_{\mathrm{obs}}\right)\left(\sum Q_{\mathrm{sim}}\right)}{\left[n\sum Q_{\mathrm{obs}}^{2}-\left(\sum Q_{\mathrm{obs}}\right)^{2}\right]^{1/2}\left[n\sum Q_{\mathrm{sim}}^{2}-\left(\sum Q_{\mathrm{sim}}\right)^{2}\right]^{1/2}}\right]^{2}\tag{7}$$

Root mean square error (RMSE):

$$\mathrm{RMSE}=\sqrt{\frac{\sum_{t=1}^{N}\left(Q_{\mathrm{obs}}^{t}-Q_{\mathrm{sim}}^{t}\right)^{2}}{N}}\tag{8}$$

[Figure 3: A simple architecture of the proposed models — multivariate catchment inputs (P, Tmin, and Tmax at times t, t+1, …, t+n) feed the MLP, GRU, LSTM, CNN-LSTM, and CNN-GRU model architectures, which output streamflow Q.]

[Figure 4: Typical architecture of an ANN — an input layer (X1, …, Xn), a hidden layer (h1, …, hn), and an output layer (y), connected by weights W.]


Table 3: Mathematical formulation of the LSTM cell.

| Network gates | Purpose | Equations* |
|---|---|---|
| Forget gate | Chooses the information to reject from the cell | f_t = σ(u_f x_t + w_f h_{t−1} + b_f) |
| Input gate | Decides what information is relevant to update in the current cell state | i_t = σ(u_i x_t + w_i h_{t−1} + b_i) |
| Output gate | Decides what to output based on the input and the long-term memory of the cell | o_t = σ(u_o x_t + w_o h_{t−1} + b_o) |
| Cell state | Long-term memory | C̃_t = tanh(W_c x_t + U_c h_{t−1} + b_c); C_t = f_t ∗ C_{t−1} + i_t ∗ C̃_t |
| Hidden state | Short-term memory | h_t = tanh(C_t) ∗ o_t |
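The gate equations of Table 3 can be traced with a short NumPy sketch (didactic only; the weights are random illustrative values, not trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4  # illustrative input and hidden sizes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Random illustrative weights: u* act on the input, w* on the previous
# hidden state, b* are biases (one triple per gate/candidate f, i, o, c).
params = {g: (rng.normal(size=(n_hid, n_in)),
              rng.normal(size=(n_hid, n_hid)),
              rng.normal(size=n_hid)) for g in "fioc"}

def lstm_step(x_t, h_prev, c_prev):
    uf, wf, bf = params["f"]; ui, wi, bi = params["i"]
    uo, wo, bo = params["o"]; uc, wc, bc = params["c"]
    f_t = sigmoid(uf @ x_t + wf @ h_prev + bf)      # forget gate
    i_t = sigmoid(ui @ x_t + wi @ h_prev + bi)      # input gate
    o_t = sigmoid(uo @ x_t + wo @ h_prev + bo)      # output gate
    c_tilde = np.tanh(uc @ x_t + wc @ h_prev + bc)  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde              # long-term memory update
    h_t = np.tanh(c_t) * o_t                        # short-term memory update
    return h_t, c_t

h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid))
```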

[Figure 5: LSTM memory cell with three gated layers [11] — the forget (f_t), input (i_t), and output (o_t) gates combine the input x_t with the previous long-term memory C_{t−1} and short-term memory h_{t−1} to produce the updated memories C_t and h_t and the output y_t.]

[Figure 6: The structure of the gated recurrent unit (GRU) network [34] — the update gate z_t and reset gate r_t combine the input x_t with the previous state h_{t−1} to produce the updated state h_t.]

[Figure 7: The process of 1D CNN [15] — for a filter of size 2 with weights W1 and W2, the convolution layer maps the multivariate input series P1, …, Pn to C1 = W1·P1 + W2·P2, C2 = W1·P2 + W2·P3, …, Cn = W1·Pn−1 + W2·Pn.]


[Figure 8: The basic architecture of the proposed CNN-LSTM and CNN-GRU models. Daily multivariate time series (Borkena stations: P and Tmin of the past 30 days; Upper Tiber stations: P, Tmin, and Tmax of the past 45 days), rolled to daily, weekly, or monthly averages, pass through Input → 1D conv → AveragePooling → Flatten → LSTM or GRU → Dropout → LSTM or GRU → Dropout → Dense (single output), producing single-step streamflow for the corresponding rolling window at both stations.]
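The input tensors implied by Figure 8 can be built by sliding a window over the rolled series. A minimal sketch follows (our illustration; the array and function names are hypothetical), assuming `data` is shaped (days, features) with streamflow in the last column:

```python
import numpy as np

def make_windows(data, n_steps):
    """Slice a (days, features) array into one-step-ahead samples:
    X has shape (samples, n_steps, features); y is next-day streamflow."""
    X, y = [], []
    for i in range(len(data) - n_steps):
        X.append(data[i:i + n_steps, :])   # past n_steps days of all inputs
        y.append(data[i + n_steps, -1])    # next day's streamflow (last column)
    return np.array(X), np.array(y)

# e.g., 30-day windows for Borkena, 45-day windows for the Upper Tiber:
data = np.random.rand(1000, 5)  # placeholder for the rolled multivariate series
X, y = make_windows(data, n_steps=30)
print(X.shape, y.shape)  # (970, 30, 5) (970,)
```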

Table 4: Model hyperparameter choices or value ranges for optimization by Keras tuner.**

| No. | Hyperparameter | Min | Max | Step | Choices | Default |
|---|---|---|---|---|---|---|
| 1 | Conv_1_filter | 8 | 32 | 8 | * | * |
| 2 | Conv_1_kernel | * | * | * | 2 or 3 | * |
| 3 | Conv_1_pool_size | * | * | * | 2 or 3 | * |
| 4 | Conv_2_filter | 8 | 32 | 8 | * | * |
| 5 | Conv_2_kernel | * | * | * | 2 or 3 | * |
| 6 | Conv_2_pool_size | * | * | * | 2 or 3 | * |
| 7 | CNN-LSTM1, CNN-LSTM2, CNN-GRU1, CNN-GRU2, LSTM, GRU, or MLP layer 1 units | 5 | 30 | 5 | * | * |
| 8 | Dropout 1 | 0.0 | 0.3 | 0.1 | * | 0.2 |
| 9 | CNN-LSTM1, CNN-LSTM2, CNN-GRU1, CNN-GRU2, LSTM, GRU, or MLP layer 2 units | 5 | 30 | 5 | * | * |
| 10 | Dropout 2 | 0.0 | 0.3 | 0.1 | * | 0.2 |
| 11 | Learning rate | * | * | * | 1e-2, 1e-3, or 1e-4 | * |
| 12 | Number of epochs | 10 | 100 | 10 | * | * |
| 13 | Batch size | 10 | 100 | 10 | * | * |

**Value ranges or choices for optimization by Keras tuner (objective: validation loss; max trials: 20; executions per trial: 3). *Not applicable.

Table 5: Daily streamflow simulation performance comparison of the proposed models for different input variables and climatic conditions (Model 1). Each cell lists RMSE / MAE / R² / TTPE* (s).

| Model | Borkena, P + Tmin | Borkena, P | UTRB, P + Tmin + Tmax | UTRB, P |
|---|---|---|---|---|
| MLP | 9.91 / 5.01 / 0.77 / 0.89 | 9.38 / 4.63 / 0.79 / 0.63 | 49.11 / 22.74 / 0.49 / 0.78 | 56.57 / 28.14 / 0.33 / 0.41 |
| GRU | 8.78 / 4.37 / 0.82 / 3.61 | 7.94 / 3.64 / 0.85 / 3.32 | 46.63 / 20.89 / 0.55 / 2.61 | 51.09 / 26.74 / 0.45 / 3.39 |
| LSTM | 8.41 / 4.09 / 0.83 / 2.35 | 9.65 / 4.87 / 0.78 / 2.92 | 48.64 / 22.79 / 0.51 / 3.86 | 48.59 / 25.00 / 0.51 / 5.98 |
| CNN-LSTM1 | 8.09 / 4.07 / 0.84 / 0.46 | 8.57 / 4.67 / 0.82 / 0.41 | 51.20 / 22.95 / 0.45 / 1.19 | 56.16 / 26.55 / 0.34 / 0.57 |
| CNN-LSTM2 | 7.99 / 4.09 / 0.85 / 0.72 | 9.14 / 4.50 / 0.80 / 0.45 | 45.38 / 21.85 / 0.57 / 0.82 | 51.57 / 25.84 / 0.44 / 1.85 |
| CNN-GRU1 | 7.94 / 3.66 / 0.85 / 0.63 | 8.32 / 4.09 / 0.83 / 0.86 | 55.06 / 23.49 / 0.37 / 1.16 | 52.42 / 24.98 / 0.43 / 0.83 |
| CNN-GRU2 | 9.07 / 4.19 / 0.80 / 1.01 | 8.43 / 4.26 / 0.83 / 0.28 | 45.61 / 21.79 / 0.57 / 0.64 | 49.96 / 25.38 / 0.48 / 0.68 |

*TTPE: training time per epoch, in seconds.


Table 6: Weekly rolled streamflow simulation performance comparison of the proposed models for different input variables and climatic conditions (Model 2). Each cell lists RMSE / MAE / R² / TTPE* (s).

| Model | Borkena, P + Tmin | Borkena, P | UTRB, P + Tmin + Tmax | UTRB, P |
|---|---|---|---|---|
| MLP | 8.11 / 4.29 / 0.84 / 0.23 | 7.33 / 4.19 / 0.87 / 0.22 | 33.01 / 20.01 / 0.60 / 0.74 | 38.03 / 25.17 / 0.47 / 0.79 |
| GRU | 7.59 / 3.71 / 0.86 / 2.04 | 7.15 / 4.13 / 0.87 / 2.43 | 25.21 / 14.83 / 0.77 / 5.56 | 31.39 / 19.26 / 0.64 / 16.79 |
| LSTM | 8.41 / 4.01 / 0.82 / 2.98 | 7.93 / 3.91 / 0.84 / 1.27 | 31.07 / 18.87 / 0.65 / 3.55 | 31.07 / 19.49 / 0.65 / 2.69 |
| CNN-LSTM1 | 7.90 / 4.09 / 0.85 / 0.78 | 7.72 / 4.25 / 0.85 / 0.63 | 28.04 / 17.33 / 0.71 / 0.93 | 34.57 / 21.92 / 0.57 / 0.62 |
| CNN-LSTM2 | 7.33 / 3.86 / 0.87 / 0.52 | 7.63 / 4.25 / 0.86 / 0.55 | 28.45 / 16.66 / 0.71 / 1.14 | 35.04 / 21.77 / 0.55 / 1.56 |
| CNN-GRU1 | 7.83 / 3.94 / 0.85 / 0.44 | 7.91 / 4.31 / 0.85 / 0.50 | 30.57 / 18.01 / 0.66 / 2.32 | 35.14 / 22.58 / 0.55 / 0.63 |
| CNN-GRU2 | 8.73 / 4.61 / 0.81 / 0.43 | 8.43 / 4.35 / 0.82 / 0.97 | 27.81 / 16.99 / 0.72 / 4.37 | 33.76 / 23.01 / 0.59 / 1.01 |

*TTPE: training time per epoch, in seconds.

Table 7: Monthly rolled streamflow simulation performance comparison of the proposed models for different input variables and climatic conditions (Model 3). Each cell lists RMSE / MAE / R² / TTPE* (s).

| Model | Borkena, P + Tmin | Borkena, P | UTRB, P + Tmin + Tmax | UTRB, P |
|---|---|---|---|---|
| MLP | 6.68 / 4.37 / 0.87 / 0.58 | 5.57 / 3.80 / 0.91 / 0.41 | 20.24 / 13.84 / 0.78 / 0.44 | 28.79 / 21.05 / 0.56 / 0.41 |
| GRU | 5.15 / 3.52 / 0.91 / 1.62 | 5.22 / 3.06 / 0.92 / 3.31 | 20.79 / 14.30 / 0.77 / 16.63 | 26.47 / 20.08 / 0.63 / 4.70 |
| LSTM | 5.55 / 3.49 / 0.91 / 2.75 | 5.76 / 3.51 / 0.90 / 2.51 | 21.49 / 15.11 / 0.76 / 4.15 | 32.29 / 24.47 / 0.45 / 5.09 |
| CNN-LSTM1 | 6.05 / 4.42 / 0.89 / 0.98 | 5.58 / 3.40 / 0.91 / 0.58 | 21.53 / 14.87 / 0.76 / 1.29 | 27.48 / 21.19 / 0.60 / 0.42 |
| CNN-LSTM2 | 5.36 / 3.17 / 0.92 / 1.41 | 6.87 / 4.05 / 0.86 / 1.44 | 19.07 / 13.53 / 0.81 / 0.70 | 27.79 / 20.90 / 0.59 / 0.42 |
| CNN-GRU1 | 5.76 / 3.62 / 0.90 / 0.52 | 5.77 / 3.56 / 0.90 / 0.69 | 19.31 / 13.78 / 0.80 / 4.87 | 28.67 / 21.07 / 0.57 / 3.08 |
| CNN-GRU2 | 5.36 / 3.25 / 0.92 / 0.62 | 5.15 / 3.18 / 0.92 / 0.78 | 17.98 / 12.99 / 0.83 / 0.71 | 27.77 / 20.36 / 0.59 / 1.22 |

*TTPE: training time per epoch, in seconds.

[Figure 9: Training and test loss (MSE) versus epochs for the optimized high-score hybrid model. (a) CNN-GRU2 for the Borkena station. (b) CNN-GRU2 for the UTRB station.]


Mean absolute error (MAE):

$$\mathrm{MAE}=\frac{1}{n}\sum_{t=1}^{n}\left|Q_{\mathrm{obs}}^{t}-Q_{\mathrm{sim}}^{t}\right|\tag{9}$$

where $Q_{\mathrm{obs}}$ is the observed discharge, $Q_{\mathrm{sim}}$ is the simulated discharge, and $n$ is the number of observations. The range of R² lies between 0 and 1, representing, respectively, no correlation and a perfect correlation between observed and simulated values, whereas RMSE and MAE scores close to zero indicate the best model performance.
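For reference, a minimal NumPy implementation of equations (7)–(9) could look as follows (our sketch, not the authors' code; q_obs and q_sim are 1-D arrays of observed and simulated discharge):

```python
import numpy as np

def r_squared(q_obs, q_sim):
    """Equation (7): squared Pearson correlation between obs and sim."""
    n = len(q_obs)
    num = n * np.sum(q_obs * q_sim) - np.sum(q_obs) * np.sum(q_sim)
    den = (np.sqrt(n * np.sum(q_obs**2) - np.sum(q_obs)**2)
           * np.sqrt(n * np.sum(q_sim**2) - np.sum(q_sim)**2))
    return (num / den) ** 2

def rmse(q_obs, q_sim):
    """Equation (8): root mean square error."""
    return np.sqrt(np.mean((q_obs - q_sim) ** 2))

def mae(q_obs, q_sim):
    """Equation (9): mean absolute error."""
    return np.mean(np.abs(q_obs - q_sim))
```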

6. Results

Streamflow simulation results for the proposed seven deep learning architectures, different input time window series, two climatic regions, two input combinations, and three average rolling time windows are presented in Tables 5, 6, and 7. Regardless of the combination of these conditions, the CNN-GRU model showed promising results in most of these scenarios (Tables 5 and 7). The highest scores are presented here:

(1) In daily streamflow simulation, for the Borkena station, CNN-GRU1 scored 7.94, 3.66, 0.85, and 0.63, and for the UTRB station, CNN-GRU2 scored 45.61, 21.79, 0.57, and 0.64, for RMSE, MAE, R², and training time per epoch, respectively.

(2) In weekly rolled streamflow simulation, for the Borkena station, CNN-LSTM2 scored 7.33, 3.86, 0.87, and 0.52, and for the UTRB station, GRU scored 25.21, 14.83, 0.77, and 5.56, for RMSE, MAE, R², and training time per epoch, respectively.

(3) In monthly rolled streamflow simulation, the CNN-GRU2 model showed high performance with 5.15, 3.18, 0.92, and 0.78 scores for the Borkena station and 17.98, 12.99, 0.83, and 0.71 for the UTRB station, which are RMSE, MAE, R², and training time per epoch, respectively.

[Figure 10: Comparison of true and predicted values of the optimized high-score hybrid model (CNN-GRU2): scatter plots of test versus predicted Q with linear fit, and test-period hydrographs. (a) Borkena station (2011–2015). (b) UTRB station (1975–1979).]

Table 8: Best hybrid model type, input feature, and Keras tuner-optimized hyperparameter values for the Borkena station, with its MSE score.

| Hyperparameter | CNN-GRU2 (monthly rolled P) |
|---|---|
| Conv_1_filter | 24 |
| Conv_1_kernel | 2 |
| Conv_1_pool_size | 3 |
| GRU_l1_units | 15 |
| Dropout1 | 0.1 |
| GRU_l2_units | 20 |
| Dropout2 | 0.2 |
| Learning rate | 0.0001 |
| Number of epochs | 80 |
| Batch size | 20 |
| Score (MSE) | 0.083 |

Table 9: Best hybrid model type, input features, and Keras tuner-optimized hyperparameter values for the UTRB station, with its MSE score.

| Hyperparameter | CNN-GRU2 (monthly rolled P, Tmin, and Tmax) |
|---|---|
| Conv_1_filter | 8 |
| Conv_1_kernel | 2 |
| Conv_1_pool_size | 2 |
| GRU_l1_units | 20 |
| Dropout1 | 0.3 |
| GRU_l2_units | 30 |
| Dropout2 | 0.2 |
| Learning rate | 0.0001 |
| Number of epochs | 60 |
| Batch size | 40 |
| Score (MSE) | 0.193 |

[Figure 11: Internal network structure of the optimized high-score hybrid CNN-GRU2 model for the Borkena station: input (30, 5) → TimeDistributed Conv1D → (29, 24) → TimeDistributed AveragePooling1D → (9, 24) → TimeDistributed Flatten → (216) → GRU (15) → Dropout → GRU (20) → Dropout → Dense → (1).]

Moreover, of the four proposed hybrid models, CNN-GRU2 (the model designed with a single 1D CNN layer) showed the most promising results on trial model 1 (UTRB) and model 3, as shown in Tables 5 and 7. In contrast, GRU on model 2 (UTRB), CNN-LSTM2 on model 2 (Borkena), and CNN-GRU1 on model 1 (Borkena) shared the second-highest results. Streamflow simulation with the CNN-GRU2 model generally showed higher performance than the other tested hybrid deep learning models and the state-of-the-art LSTM, GRU, and MLP models. In line with our objectives, the results are discussed under different variability conditions in the following paragraphs.

6.1. Climatic Region Variability. Testing models in different climatic conditions with historical data will likely provide robust deep learning models for streamflow simulation in the future [51]. Hence, this research also tested the different models in two climatic regions, and irrespective of climatic and time window variation, the CNN-GRU model displayed the highest scores in both tested case study areas.

[Figure 12: Internal network structure of the optimized high-score hybrid CNN-GRU2 model for the UTRB station: input (45, 15) → TimeDistributed Conv1D → (44, 8) → TimeDistributed AveragePooling1D → (22, 8) → TimeDistributed Flatten → (176) → GRU (20) → Dropout → GRU (30) → Dropout → Dense → (1).]

6.2. Input Combination Variability. Combining minimum temperature (Tmin) with precipitation (P) as input does not show a significant performance increment at the Borkena station (Tables 5 and 6); in some scenarios, adopting P only as input increases the performance of the model (Table 7). In contrast, for the UTRB, streamflow simulation with all input variables (Tmin, Tmax, and P) showed significant performance increments (Table 7).

6.3. Average Rolling Time Window Variability. Streamflow simulation without rolling the daily time series data had deficient performance compared to the monthly rolled average time series. This could be because the time series noise in the UTRB is more visible than that at the Borkena station. As a result, the performance increment from daily to monthly rolled window models is much higher in the UTRB than at the Borkena station.
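Such rolled averages can be produced, for example, with pandas (a minimal sketch; the file and column names are hypothetical):

```python
import pandas as pd

# df: daily multivariate series (P, Tmin, Tmax, Q) indexed by date;
# the file name below is only a placeholder.
df = pd.read_csv("daily_series.csv", index_col="date", parse_dates=True)

weekly = df.rolling(window=7).mean().dropna()    # weekly rolled average
monthly = df.rolling(window=30).mean().dropna()  # monthly rolled average
```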

Generally, the monthly rolled time window with the CNN-GRU2 model showed the top performance results at both stations (Table 7). The corresponding training and test loss functions of this optimized high-score hybrid model for both stations are displayed in Figure 9, and Figure 10 compares its true and predicted values. The optimized hybrid model boosts the performance score and lowers the training time per epoch much more than the GRU and LSTM models. This model's input features and Keras tuner-optimized hyperparameter values for both stations, with MSE scores, are presented in Tables 8 and 9. Moreover, the internal network structures of these models are shown in Figures 11 and 12, which display the model input and output parameter matrices for each layer.

7. Conclusions

This study presented a comparative analysis of different hybrid deep learning algorithms against state-of-the-art machine learning models for one-step daily streamflow simulation at two river basin (subcatchment) streamflow outlets. The proposed algorithms are the CNN-LSTM and CNN-GRU hybrid deep learning models, each having one or two 1D CNN layers, compared with the classic MLP, LSTM, and GRU models. The study conducted a series of experiments to observe the performance variation of the proposed models under different input combinations, rolling time windows, and climatic conditions for streamflow simulation. The following points summarize the significant findings of this study:

(i) CNN-GRU2, with one 1D CNN layer, showed the best simulation performance, reporting the lowest RMSE and MAE and the highest R² of all models in both case study areas. Such results indicate that the performance of the selected architectures is irrespective of the climatic characteristics of the basins.

(ii) Combining temperature data with precipitation as input to the proposed models gave a minimal performance increment at the Borkena station compared to the UTRB case study area, which clearly showed that temperature data scarcity has a greater performance loss implication for the UTRB station. On the other hand, the Borkena station has more significant natural streamflow variability than the UTRB, which is also reflected in the model results. This implies that catchment response should be considered before any deep learning model application.

(iii) Rolling the time window of the input and output time series for streamflow simulation using the proposed models increases performance considerably more in the UTRB than at the Borkena station.

(iv) The analysis results also showed that the training time per epoch for the hybrid deep learning models is much lower than that of the GRU and LSTM models.

Deep learning models usually require massive datasets, and their performance drops with small to medium datasets. However, given the acceptable results from this case study, and considering the hybrid models' hyperparameter sensitivity and complexity, future research may further design optimized configurations. These hybrid models can also be tested for long-term streamflow simulation in ephemeral, seasonal, and perennial river systems and in other fields of study. Our future research will try to combine the best-performing hybrid deep learning models from this study with remote sensing datasets to address the problems experienced in ungauged catchments.

Data Availability

The raw hydrological and meteorological datasets used for the Borkena watershed are available from the corresponding author upon request. However, authorization letters are required from the Ministry of Water, Irrigation and Energy (MoWIE) of Ethiopia (http://mowie.gov.et) and the National Meteorological Agency of Ethiopia (NMA) (http://www.ethiomet.gov.et), whereas for the UTRB, the datasets can be retrieved from an online repository (http://hydrogate.unipg.it/wrme).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The corresponding author acknowledges the Ministry of Water, Irrigation and Energy of Ethiopia (MoWIE) and the Italian National Research Center (Centro Nazionale di Ricerca, CNR) for the hydrological and meteorological datasets. The corresponding author also thanks Dr. Fiseha Behulu for his advice and serious follow-up.


References

[1] Z. Zhang, Q. Zhang, and V. P. Singh, "Univariate streamflow forecasting using commonly used data-driven models: literature review and case study," Hydrological Sciences Journal, vol. 63, no. 7, pp. 1091–1111, 2018.

[2] L. Ni, D. Wang, V. P. Singh et al., "Streamflow and rainfall forecasting by two long short-term memory-based models," Journal of Hydrology, vol. 583, Article ID 124296, 2020.

[3] X. Yuan, C. Chen, X. Lei, Y. Yuan, and R. Muhammad Adnan, "Monthly runoff forecasting based on LSTM-ALO model," Stochastic Environmental Research and Risk Assessment, vol. 32, no. 8, pp. 2199–2212, 2018.

[4] A. Sichangi, L. Wang, and Z. Hu, "Estimation of river discharge solely from remote-sensing derived data: an initial study over the Yangtze river," Remote Sensing, vol. 10, no. 9, p. 1385, 2018.

[5] M. J. Tourian, N. Sneeuw, and A. Bardossy, "A quantile function approach to discharge estimation from satellite altimetry (ENVISAT)," Water Resources Research, vol. 49, no. 7, pp. 4174–4186, 2013.

[6] A. W. Sichangi, L. Wang, K. Yang et al., "Estimating continental river basin discharges using multiple remote sensing data sets," Remote Sensing of Environment, vol. 179, pp. 36–53, 2016.

[7] S. Aljahdali, A. Sheta, and H. Turabieh, "River flow forecasting: a comparison between feedforward and layered recurrent neural network," Learning and Analytics in Intelligent Systems, vol. 43, pp. 523–532, 2020.

[8] D. Couta, Y. Zhang, and Y. Li, "River flow forecasting using long short-term memory," DEStech Transactions on Computer Science and Engineering, vol. 16, 2019.

[9] Y. Bai, N. Bezak, K. Sapac, M. Klun, and J. Zhang, "Short-term streamflow forecasting using the feature-enhanced regression model," Water Resources Management, vol. 33, no. 14, pp. 4783–4797, 2019.

[10] L. C. D. Campos, L. Goliatt da Fonseca, T. L. Fonseca, G. D. de Abreu, L. F. Pires, and Y. Gorodetskaya, "Short-term streamflow forecasting for Paraíba do Sul river using deep learning," Progress in Artificial Intelligence, vol. 43, pp. 507–518, 2019.

[11] B. B. Sahoo, R. Jha, A. Singh, and D. Kumar, "Long short-term memory (LSTM) recurrent neural network for low-flow hydrological time series forecasting," Acta Geophysica, vol. 67, no. 5, pp. 1471–1481, 2019.

[12] M. Rahimzad, A. Moghaddam Nia, H. Zolfonoon, J. Soltani, A. Danandeh Mehr, and H.-H. Kwon, Performance Comparison of an LSTM-Based Deep Learning Model versus Conventional Machine Learning Algorithms for Streamflow Forecasting, Water Resources Management, New York, NY, USA, 2021.

[13] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: encoder–decoder approaches," in Proceedings of the SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp. 103–111, Doha, Qatar, October 2014.

[14] C. Chen, W. He, J. Li, and Z. Tang, "A novel hybrid CNN-LSTM scheme for nitrogen oxide emission prediction in FCC unit," Mathematical Problems in Engineering, vol. 2020, Article ID 8071810, 2020.

[15] T. Li, M. Hua, and X. Wu, "A hybrid CNN-LSTM model for forecasting particulate matter (PM2.5)," IEEE Access, vol. 8, pp. 26933–26940, 2020.

[16] I. E. Livieris, E. Pintelas, and P. Pintelas, "A CNN–LSTM model for gold price time-series forecasting," Neural Computing and Applications, vol. 32, 2020.

[17] A. A. M. Ahmed, R. C. Deo, N. Raj et al., "Deep learning forecasts of soil moisture: convolutional neural network and gated recurrent unit models coupled with satellite-derived MODIS observations and synoptic-scale climate index data," Remote Sensing, vol. 13, no. 4, p. 554, 2021.

[18] Y. Liu, T. Zhang, A. Kang, J. Li, and X. Lei, "Research on runoff simulations using deep-learning methods," Sustainability, vol. 13, no. 3, p. 1336, 2021.

[19] A. U. Muhammad, X. Li, and J. Feng, "Using LSTM GRU and hybrid models for streamflow forecasting," Machine Learning and Intelligent Communications, vol. 294, pp. 510–524, 2019.

[20] R. Barzegar, M. T. Aalami, and J. Adamowski, "Short-term water quality variable prediction using a hybrid CNN-LSTM deep learning model," Stochastic Environmental Research and Risk Assessment, vol. 34, no. 2, pp. 415–433, 2020.

[21] S. P. Van, H. M. Le, D. V. Thanh, T. D. Dang, H. H. Loc, and D. T. Anh, "Deep learning convolutional neural network in rainfall-runoff modelling," Journal of Hydroinformatics, vol. 22, no. 3, pp. 541–561, 2020.

[22] P. Parisouj, H. Mohebzadeh, and T. Lee, "Employing machine learning algorithms for streamflow prediction: a case study of four river basins with different climatic zones in the United States," Water Resources Management, vol. 34, no. 13, pp. 4113–4131, 2020.

[23] A. Annis and F. Nardi, "Integrating VGI and 2D hydraulic models into a data assimilation framework for real time flood forecasting and mapping," Geo-Spatial Information Science, vol. 22, no. 4, pp. 223–236, 2019.

[24] B. M. Fiseha, S. G. Setegn, A. M. Melesse, E. Volpi, and A. Fiori, "Impact of climate change on the hydrology of upper Tiber River Basin using bias corrected regional climate model," Water Resources Management, vol. 28, no. 5, pp. 1327–1343, 2014.

[25] S. Rania, B. Waad, and E. Nadia, "Hybrid feature selection method based on the genetic algorithm and Pearson correlation coefficient," in Machine Learning Paradigms: Theory and Application, Studies in Computational Intelligence, vol. 801, 2000.

[26] A. D. Mehr and A. H. Gandomi, "MSGP-LASSO: an improved multi-stage genetic programming model for streamflow prediction," Information Sciences, vol. 561, pp. 181–195, 2021.

[27] R. Kun, F. Wei, Q. Jihong, Z. Xia, and S. Xiaoyu, "Comparison of eight filter-based feature selection methods for monthly streamflow forecasting – three case studies on CAMELS data sets," Journal of Hydrology, vol. 586, Article ID 124897, 2020.

[28] O. Oyebode, "Evolutionary modelling of municipal water demand with multiple feature selection techniques," Journal of Water Supply: Research & Technology – Aqua, vol. 68, no. 4, pp. 264–281, 2019.

[29] R. Dehghani, H. Torabi Poudeh, and Z. Izadi, "Dissolved oxygen concentration predictions for running waters using hybrid machine learning techniques," Modeling Earth Systems and Environment, vol. 597, 2021.

[30] N. Yuvaraj, V. Chang, B. Gobinathan et al., "Automatic detection of cyberbullying using multi-feature based artificial intelligence with deep decision tree classification," Computers & Electrical Engineering, vol. 92, Article ID 107186, 2021.

[31] B. Ratner, "The correlation coefficient: its values range between +1/−1, or do they?" Journal of Targeting, Measurement and Analysis for Marketing, vol. 17, no. 2, pp. 139–142, 2009.

[32] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, "Sliding window-based frequent pattern mining over data streams," Information Sciences, vol. 179, no. 22, pp. 3843–3865, 2009.

[33] E. Zivot and J. Wang, "Rolling analysis of time series," in Modeling Financial Time Series with S-Plus, pp. 299–346, Springer, New York, NY, USA, 2003.

[34] Y. Wang, W. Liao, and Y. Chang, "Gated recurrent unit network-based short-term photovoltaic forecasting," Energies, vol. 11, no. 8, p. 2163, 2018.

[35] O. B. Sezer, M. U. Gudelek, and A. M. Ozbayoglu, "Financial time series forecasting with deep learning: a systematic literature review: 2005–2019," Applied Soft Computing, vol. 90, Article ID 106181, 2020.

[36] Q. Zou, Q. Xiong, Q. Li, H. Yi, Y. Yu, and C. Wu, "A water quality prediction method based on the multi-time scale bidirectional long short-term memory network," Environmental Science and Pollution Research, vol. 27, no. 14, pp. 16853–16864, 2020.

[37] S.-C. Wang, "Artificial neural network," Interdisciplinary Computing in Java Programming, vol. 26, pp. 81–100, 2003.

[38] V. Nourani, A. Molajou, H. Najafi, and A. Danandeh Mehr, "Emotional ANN (EANN): a new generation of neural networks for hydrological modeling in IoT," in Artificial Intelligence in IoT, Transactions on Computational Science and Computational Intelligence, pp. 45–61, Springer, New York, NY, USA, 2019.

[39] L. Yan, J. Feng, and T. Hang, "Small watershed stream-flow forecasting based on LSTM," Advances in Intelligent Systems and Computing, vol. 935, pp. 1006–1014, 2019.

[40] S. Zhu, X. Luo, X. Yuan, and Z. Xu, "An improved long short-term memory network for streamflow forecasting in the upper Yangtze River," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 9, 2020.

[41] A. Osama, A. Onur, K. Serkan, G. Moncef, and I. D. Inman, "Real-time vibration-based structural damage detection using one-dimensional convolutional neural networks," Journal of Sound and Vibration, vol. 388, pp. 154–170, 2017.

[42] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[43] H.-C. Shin, H. R. Roth, M. Gao et al., "Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1285–1298, 2016.

[44] A. Essien and C. Giannetti, "A deep learning framework for univariate time series prediction using convolutional LSTM stacked autoencoders," in Proceedings of the IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA), pp. 1–6, Alberobello, Italy, July 2019.

[45] D. S. Abdelminaam, F. H. Ismail, M. Taha, A. Taha, E. H. Houssein, and A. Nabil, "CoAID-DEEP: an optimized intelligent framework for automated detecting COVID-19 misleading information on Twitter," IEEE Access, vol. 9, pp. 27840–27867, 2021.

[46] O. Kazakov and O. Mikheenko, "Transfer learning and domain adaptation based on modeling of socio-economic systems," Business Inform, vol. 14, no. 2, pp. 7–20, 2020.

[47] B. Wang and N. Z. Gong, "Stealing hyperparameters in machine learning," in Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 36–52, San Francisco, CA, USA, May 2018.

[48] C. Boyraz and S. N. Engin, "Streamflow prediction with deep learning," in Proceedings of the 6th International Conference on Control Engineering & Information Technology (CEIT), pp. 1–5, Istanbul, Turkey, October 2018.

[49] E. K. Jackson, W. Roberts, B. Nelsen, G. P. Williams, E. J. Nelson, and D. P. Ames, "Introductory overview: error metrics for hydrologic modelling – a review of common practices and an open source library to facilitate use and adoption," Environmental Modelling & Software, vol. 119, pp. 32–48, 2019.

[50] S. Kumar, T. Roshni, and D. Himayoun, "A comparison of emotional neural network (ENN) and artificial neural network (ANN) approach for rainfall-runoff modelling," Civil Engineering Journal, vol. 5, no. 10, pp. 2120–2130, 2019.

[51] P. Bai, X. Liu, and J. Xie, "Simulating runoff under changing climatic conditions: a comparison of the long short-term memory network with two conceptual hydrologic models," Journal of Hydrology, vol. 592, Article ID 125779, 2021.


Page 5: Multivariate Streamflow Simulation Using Hybrid Deep

44 Gated Recurrent Unit (GRU) GRU is a special type ofLSTM architecture in which it merges the input and forgetgates and converts them into an update gate which makesthe parameter numbers fewer and the training will be easier-ere are two input features each time the input vector xtand the previous output vector htminus1 -e output of eachspecific gate can be calculated through logical operation andnonlinear transformation of the input [34] -e mathe-matical formulations among inputs outputs and differentparameters are listed in equations (3) (4) (5) and (6)Moreover Figure 6 displays the structure of the gated re-current unit (GRU) network

Zt σ WzXt + Uzhtminus1 + bz( 1113857 (3)

rt σ WrXt + Urhtminus1 + br( 1113857 (4)

h

t tanh WhXt + rt lowast htminus1( 1113857 + bh( 1113857 (5)

ht 1 minus Zt( 1113857lowast htminus1 + Zt lowast h

t (6)

where Zt is the update gate vector rt is the reset gate vectorW and U are parameter matrices σ is a sigmoid functionand tanh is a hyperbolic tangent

45 Convolutional Neural Network (CNN) Convolutionalneural network (CNN) is one of the most successful deeplearning models especially for feature extraction and itsnetwork structures include 1D CNN 2DCNN and 3DCNN[15] CNN structure generally consists of a convolutionlayer a pooling layer and a full connection layer [18]

Table 1 Descriptive statistics of split time series data for the Borkena watershed

Stations Data type Pearson correlation with streamflowTraining data (80) Testing data (20)

Mean Max Min SD Mean Max Min SD

Kombolcha

Stream flow (m3sec) 1000 109 2169 000 232 101 948 00 202P (mmday) 0321 31 732 00 75 29 604 00 72Tmin (oc) 0271 125 209 15 33 125 206 26 34Tmax (degc) minus0099 272 336 164 25 273 330 196 21

ChefaP (mmday) 0344 35 816 00 86 34 643 00 81Tmin (oc) 0266 133 215 01 37 141 222 39 35Tmax (oc) minus0069 299 380 185 28 303 380 222 25

DessieP (mmday) 0335 35 806 00 86 29 670 00 73Tmin (oc) 0319 85 155 01 25 78 155 00 31Tmax (oc) 0107 238 300 160 19 241 300 150 21

KemiseP (mmday) 0372 31 819 00 83 29 721 00 75Tmin (oc) 0282 138 220 30 34 135 201 45 36Tmax (oc) minus0129 310 383 140 27 319 378 235 24

MajeteP (mmday) 0347 33 807 00 86 33 813 00 86Tmin (oc) 0202 147 230 14 29 146 215 67 29Tmax (oc) minus0057 286 378 172 28 291 380 208 24

Table 2 Descriptive statistics of split time series data for the UTRB

Stations Data type Pearson correlation with streamflowTraining data (80) Testing data (20)

Mean Max Min SD Mean Max Min SDPonte Nuovo Streamflow (m3sec) 1000 506 9390 19 755 506 7370 37 686Castel Rigone P (mmday) 0384 26 728 00 66 27 677 00 69Montecoronaro P (mmday) 0339 39 2290 00 107 40 1100 00 105

Perugia (ISA)P (mmday) 0379 24 1204 00 66 25 618 00 63Tmin (oc) minus0353 97 304 minus90 63 93 252 minus50 56Tmax (oc) minus0379 174 374 minus45 81 163 330 06 72

Petrelle P (mmday) 0345 251 900 00 69 27 1171 00 74Pietralunga P (mmday) 0428 322 1500 00 81 31 731 00 73

SpoletoP (mmday) 0412 29 1136 00 79 29 942 00 78Tmin (oc) minus0265 75 230 minus126 64 88 217 minus54 58Tmax (oc) minus0383 188 387 minus35 86 187 368 20 78

Torgiano P (mmday) 0364 24 1412 00 71 25 620 00 69

Gubbio Tmin (oc) minus0315 87 260 minus120 59 61 193 minus113 54Tmax (oc) minus0377 181 390 minus80 81 174 341 minus09 75

Assisi Tmin (oc) minus0325 92 256 minus116 62 82 215 minus80 56Tmax (oc) minus0378 182 378 minus50 83 181 358 00 78

Computational Intelligence and Neuroscience 5

1D CNN is mainly implemented for sequence dataprocessing [41] 2D CNN is usually used for text and imageidentification [42] and usually 3D CNN is recognized formodelling medical image and video data identification [43]Hence since the aim of the present study is time seriesanalysis we implemented 1D CNN -e detailed process of1D CNN is described in Figure 7

As depicted in Figure 7 the input series is convoluted tothe convolution layer from top to bottom (shown by thearrows) -e grey or the mesh colours represent differentfilters where the size of the convolution layer depends on thenumber of input data dimensions the size of the filter andthe convolution step length

46 CNN-LSTM and CNN-GRU Hybrid Models In thisstudy hybrid models were designed by integrating CNNwith LSTM or GRU layers Hence the feature sequence fromthe CNN layer was considered as the input for the LSTM orGRU layer and then the short and long-time dependencieswere further extracted

-e proposed CNN-LSTM or CNN-GRU modelscontain two main components the first component con-sists of one dimensional single or double convolutional andaverage pooling layers Moreover a flatten layer is con-nected to further process the data into the format requiredby the LSTM or GRU In the second component thegenerated features are processed using LSTM GRU and

dense layers Additionally dropouts are introduced toprevent overfitting Figure 8 shows the designed modelinputs and outputs with a basic description of the con-volutional pooling and LSTM or GRU layers proposed forthis project

5 Data Analysis

Simulation with deep learning requires selecting a probablecombination of hyperparameters batch size epochsnumber of layers and number of units for each layer [8]Optimizing hyperparameters is not always consistent asthere is no hard rule to follow ldquo-e process is more of an artthan a sciencerdquo [44] Hence in this study we chose the Kerastuner optimizer developed by the Google team and includedit in the Keras open library [45 46]

51 Hyperparameter Optimization Tuning machine learn-ing model hyperparameters is critical Varying hyper-parameter values often results in models with significantlydifferent performances [47] -e models applied in thisstudy mainly contain two types of hyperparameters con-stant hyperparameters that are not altered through theoptimization process and variable hyperparameters Adamoptimizer is applied under the category of the constanthyperparameters because of its efficiency and easiness toimplementation that requires minimum memory and is

02000 2002 20082004 2006

Time (years)

Training Testing

01960 1964 1968

Time (years)

1972 1976 1980

200Q (m

3se

c)

400600800

2010 2012 2014

50Q (m

3se

c)100150200

0

Total TrainSplit data

Test

2468

Q (m

3se

c) 10121416

0

Total TrainSplit data

Test

2040Q

(m3

sec)

6080

100120

(a)

(a) (b)

(b)

Figure 2 Streamflow time series graph and the corresponding box plot of split data (a) Borkena (b) UTRB

6 Computational Intelligence and Neuroscience

commonly suited in different problems [48] In this cate-gory rectified linear unit (Relu) was used as an activationfunction and mean squared error (MSE) was used as a lossfunction

In contrast the second type of changing hyper-parameters is optimized by Keras tuner and hyperparameterchoices or value ranges for optimization are set using dif-ferent trials We also considered our PC capacity (processorIntel(R) Core (TM) i7-6500U CPU 250GHz and RAM 8gigabytes) with Windows 10 operating system Hyper-parameters are optimized with 20 trials and since deeplearning networks have different training and validationplots for each run we decided to repeat the iteration threetimes

All hyperparameter ranges or choices are listed inTable 4 CNN-LSTM1 and CNN-GRU1 models usedhyperparameter values from numbers 1 to 13 for opti-mization (Table 4) while we omitted 4 5 and 6 for CNN-LSTM2 and CNN-GRU2 -e remaining deep learningmodels MLP LSTM and GRU used a list of hyper-parameters from numbers 7 to 13 Finally each optimizedhyperparameter is used for each training and testing ex-periment Moreover the train and test traces from eachrun can be plotted to give a more robust idea of the be-haviour of the model to inspect overfitting and under-fitting issues

52 Performance Measures A wide variety of evaluationmetrics are listed in the literature [49] However the popularones are mean error (ME) coefficient of determination (R2)root mean square error (RMSE) mean absolute error(MAE) mean percentage error (MPE) mean absolutepercentage error (MAPE) mean absolute scaled error(MASE) and NashndashSutcliffe efficiency (NSE) -is study

used different input and model variability conditionsHence to concisely measure the analysis output and presentthe result we applied the following top three standardperformance evaluation criteria that can also have the po-tential to capture the extreme streamflow time series valueseffectively [50]

Coefficient of determination (R2)

R2

n 1113936

Qobs lowastQsim( 1113857 minus 1113936

Qobs( 1113857lowast 1113936

Qsim( 1113857

n 1113936Q

2obs1113872 1113873 minus 1113936

Qobs( 1113857

21113960 1113961

121113969

lowast n 1113936Q

2sim1113872 1113873 minus 1113936

Qsim( 1113857

21113960 1113961

(7)

Root mean square error (RMSE)

RMSE

1113936Nn1 Q

t0bs minus Q

tsim1113872 1113873

2

N

1113971

(8)

Mean absolute error (MAE)

P(t+n) Q(t+n)

P(t+1)

P(t)

Tmin(t+n)

Time

Xt Q

TimeCatchment

Tmin(t+1)

Tmin(t)

Input

Tmax(t+n)MLP

GRU

LSTM

CNN-LSTM

CNN-GRU

ModelArchitectures

Tmax(t+1)

Tmax(t)

Output

Figure 3 A simple architecture of the proposed models

X1W111

W112

W113

W133

W231

W221

W211

X2

Xn

Input layer Hidden layer

Output layer

h1

h2 y

hn

Figure 4 Typical architecture of ANN

Computational Intelligence and Neuroscience 7

Table 3 Mathematical formulation for LSTM cell

Network gates Purpose Equationslowast

Forget gate Chooses the information to reject from the cell ft σ (uf xt +wf ht-1 + bf )Input gate Decides what information is relevant to update in the current cell state it σ (ui xt +wi ht-1 + bi)Output gate Decides what to output based on input and the long-term memory of the cell ot σ (uo xt +wo ht-1 + bo)

Cell state Long-term memory Cet tanh (WcXt +Uchtminus1 + bc)Ct ftlowast Ctminus1 + itlowastCet

Hidden state Short-term memory ht tanh (Ct) lowast Ot

Ct-1

ht-1

ft

it

Ot

Cet

ht

φ

PreviousShort-Term

Memory

UpdatedShort-Term

Memory

PreviousLong-Term

Memory

UpdatedLong-Term

Memory

Ct

yt

ht

Xt Input

Output

+

Figure 5 LSTM memory cell with three gated layers [11]

htht-1

PreviousLong-Term

+Short-TermMemory

Xt Input

y^t

h^t

Zt

rt

tanh

1-

σσ

Output

UpdatedLong-Term

+Short-TermMemory

Figure 6 -e structure of gated recurrent unit (GRU) network [34]

W1

Input P1

C1

I1

I2

In

C2

C1=W1P1+W2P2C2=W1P2+W2P3Cn=W1Pn-1+W2Pn

C3 Cn

P2 P3 Pn

Convolution - Layer

Multi-variate Timeseries

Convolution - Layer

For Filter Size of 21

W2

Figure 7 -e process of 1D CNN [15]

8 Computational Intelligence and Neuroscience

Daily Multivariate Timeseries Data

Daily Weekly RolledAverage

Borkena StationsP and Tmin of the past 30 days

Upper Tiber StationsP Tmin and Tmax of the Past 45 days

Inpu

t

1D co

nv

Aver

ageP

oolin

g

Flat

ten

LSTM

or G

RU

LSTM

or G

RU

Dro

pout

Dro

pout

Out

put

Den

se(s

ingl

e out

put)

Both stations Daily Weekly and Monthlyrolled single step Streamflow

Monthly RolledAverage

Figure 8 -e basic architecture of the proposed CNN-LSTM or CNN-GRU models

Table 4 Model hyperparameter choices or value ranges for optimization by Keras tuner

No HyperparametersValue rangeslowastlowast

Choices DefaultMin Max Step

1 Conv_1_filter 8 32 8 lowast lowast

2 Conv_1_kernal lowast lowast lowast 2 or 3 lowast

3 Conv_1_pool_size lowast lowast lowast 2 or 3 lowast

4 Conv_2_filter 8 32 8 lowast lowast

5 Conv_2_kernal lowast lowast lowast 2 or 3 lowast

6 Conv_2_pool_size lowast lowast lowast 2 or 3 lowast

7 CNN-LSTM1 CNN-LSTM2 CNN-GRU1 CNN-GRU2 LSTM GRU or MLP layer 1units 5 30 5 lowast lowast

8 Dropout 1 00 03 01 lowast 02

9 CNN-LSTM1 CNN-LSTM2 CNN-GRU1 CNN-GRU2 LSTM GRU or MLP layer 2units 5 30 5 lowast lowast

10 Dropout 2 00 03 01 lowast 0211 Learning rate lowast lowast lowast 1e-2 1e-3 or 1e-4 lowast

12 Number of epochs 10 100 10 lowast lowast

13 Number of batch sizes 10 100 10 lowast lowast

lowastlowastValue ranges or choices for optimization by Keras tuner (objective ldquovalidation lossrdquo max trials 20 and executions per trial 3) lowastNot applicable

Table 5 Daily streamflow simulation performance comparison of the proposed models for different input variables and climaticconditions

Model 1

Borkena UTRBP+Tmin P P+Tmin +Tmax P

R M R2 T R M R2 T R M R2 T R M R2 TM A T M A T M A T M A TS E P S E P S E P S E PE Elowast (sec) E Elowast (sec) E Elowast (sec) E Elowast (sec)

MLP 991 501 077 089 938 463 079 063 4911 2274 049 078 5657 2814 033 041GRU 878 437 082 361 794 364 085 332 4663 2089 055 261 5109 2674 045 339LSTM 841 409 083 235 965 487 078 292 4864 2279 051 386 4859 2500 051 598CNN-LSTM1 809 407 084 046 857 467 082 041 5120 2295 045 119 5616 2655 034 057CNN-LSTM2 799 409 085 072 914 450 080 045 4538 2185 057 082 5157 2584 044 185CNN-GRU1 794 366 085 063 832 409 083 086 5506 2349 037 116 5242 2498 043 083CNN-GRU2 907 419 080 101 843 426 083 028 4561 2179 057 064 4996 2538 048 068lowastTTPE (training time per epoch) -e bold values indicate the highest performance score

Computational Intelligence and Neuroscience 9

Table 6 Weekly rolled streamflow simulation performance comparison of the proposed models for different input variables and climaticconditions

Model 2

Borkena UTRBP+Tmin P P+Tmin+Tmax P

R M R2 T R M R2 T R M R2 T R M R2 TM A T M A T M A T M A TS E P S E P S E P S E PE Elowast (sec) E Elowast (sec) E Elowast (sec) E Elowast (sec)

MLP 811 429 084 023 733 419 087 022 3301 2001 060 074 3803 2517 047 079GRU 759 371 086 204 715 413 087 243 2521 1483 077 556 3139 1926 064 1679LSTM 841 401 082 298 793 391 084 127 3107 1887 065 355 3107 1949 065 269CNN-LSTM1 790 409 085 078 772 425 085 063 2804 1733 071 093 3457 2192 057 062CNN-LSTM2 733 386 087 052 763 425 086 055 2845 1666 071 114 3504 2177 055 156CNN-GRU1 783 394 085 044 791 431 085 050 3057 1801 066 232 3514 2258 055 063CNN-GRU2 873 461 081 043 843 435 082 097 2781 1699 072 437 3376 2301 059 101lowastTTPE (training time per epoch) -e bold values indicate the highest performance score

Table 7 Monthly rolled streamflow simulation performance comparison of the proposed models for different input variables and climaticconditions

Model 2

Borkena UTRBP+Tmin P P+Tmin+Tmax P

R M R2 T R M R2 T R M R2 T R M R2 TM A T M A T M A T M A TS E P S E P S E P S E PE Elowast (sec) E Elowast (sec) E Elowast (sec) E Elowast (sec)

MLP 668 437 087 058 557 380 091 041 2024 1384 078 044 2879 2105 056 041GRU 515 352 091 162 522 306 092 331 2079 1430 077 1663 2647 2008 063 470LSTM 555 349 091 275 576 351 090 251 2149 1511 076 415 3229 2447 045 509CNN-LSTM1 605 442 089 098 558 340 091 058 2153 1487 076 129 2748 2119 060 042CNN-LSTM2 536 317 092 141 687 405 086 144 1907 1353 081 070 2779 2090 059 042CNN-GRU1 576 362 090 052 577 356 090 069 1931 1378 080 487 2867 2107 057 308CNN-GRU2 536 325 092 062 515 318 092 078 1798 1299 083 071 2777 2036 059 122lowastTTPE (training time per epoch) -e bold values indicate the highest performance score

0

0102030405

Loss

06070809

10 20 30 40

Epochs

50 60 70 80

Train

Test

(a)

00102030405Lo

ss 06070809

10 20 30 40

Epochs

50 60

Train

Test

(b)

Figure 9 Training and test loss function of the optimized high score hybrid model (a) CNN-GRU2 model for Borkena Station (b) CNN-GRU2 model for UTRB Station

10 Computational Intelligence and Neuroscience

MAE 1n

1113944

n

i1Q

t0bs minus Q

tsim

11138681113868111386811138681113868111386811138681113868 (9)

where Qobs discharge observed Qsim discharge simu-lated and n number of observations -e range of R2 liesbetween 0 and 1 representing respectively no correlationand a perfect correlation between observed and simulatedvalues whereas smallest RMSE and MAE scores or valuesclose to zero direct to the best model performance

6 Results

Streamflow simulation result with the proposed seven deeplearning architectures different input time window seriestwo climatic regions two input combinations and three

0

0

10

20

30

Test

Q (m

3se

c)

40

50

60

70

10 20 30 40

Predicted Q (m3sec)

50 60 70

Linear_fit

Data_points

2011

-09

2012

-01

2012

-05

2012

-09

2013

-01

2013

-05

2013

-09

2014

-01

2014

-05

2014

-09

2015

-01

010203040

Q (m

3se

c) 506070

Time (Year)

Test

Predicted

(a)

0

0255075

Test

Q (m

3se

c)100125150175

25 50 75 100

Predicted Q (m3sec)

125 150 175

Linear_fit

Data_points

1975

-01

1975

-07

1976

-01

1976

-07

1977

-01

1977

-07

1978

-01

1978

-07

1979

-01

0255075

100

Q (m

3se

c) 125150175

Time (Year)

Test

Predicted

(b)

Figure 10 Comparison of true values and predicted values of the optimized high score hybrid model (a) CNN-GRU2 model for BorkenaStation (b) CNN-GRU2 model for UTRB Station

Table 8 Best hybrid model type input feature and Keras tuneroptimized hyperparameter values for Borkena station with its MSEscore

Hyperparameters CNN-GRU2Monthly rolled P

Conv_1_filter 24Conv_1_kernal 2Conv_1_pool_size 3GRU_l1_units 15Dropout1 01GRU_l2_units 20Dropout2 02Learning rate 00001Number of epochs 80Number of batch sizes 20Score (MSE) 0083

Computational Intelligence and Neuroscience 11

Table 9 Best hybrid model type input features and Keras tuner optimized hyperparameter values for UTRB station with its MSE score

Hyperparameters CNN-GRU2Monthly rolled P Tmin and Tmax

Conv_1_filter 8Conv_1_kernal 2Conv_1_pool_size 2GRU_l1_units 20Dropout1 03GRU_l2_units 30Dropout2 02Learning rate 00001Number of epochs 60Number of batch sizes 40Score (MSE) 0193

time_distributed_3_input InputLayer

time_distributed_3(conv1d_1) TimeDistributed(Conv1d)

time_distributed_4(average_pooling1d_1) TimeDistributed(AveragePooling1D)

time_distributed_5(flatten_1) TimeDistributed(Flatten)

gru GRU

dropout_2 Dropout

gru_1 GRU

dropout_3 Dropout

dense_1 Dense

input [( 30 5)]

[( 30 5)]output

input ( 30 5)

( 29 24)output

input ( 29 24)

( 9 24)output

input ( 9 24)

( 216)output

input ( 216)

( 15)output

input ( 15)

( 15)output

input ( 15)

( 20)output

input ( 20)

( 20)output

input ( 20)

( 1)output

Figure 11 Internal network structure of the optimized high score hybrid CNN-GRU2 model for Borkena Station

12 Computational Intelligence and Neuroscience

average rolling time windows is presented in Tables 5 6 and7 Regardless of the combination of these conditions theCNN-GRUmodel showed promising results in most of thesescenarios (Tables 5 and 7) -e highest scores are presentedhere

(1) In daily streamflow simulation for Borkena stationCNN-GRU1 scored 794 366 085 and 063 and forUTRB station CNN-GRU2 scored 4561 2179 057and 064 for RMSE MAE R2 and training time perepoch respectively

(2) In weekly rolled streamflow simulation for Borkenastation CNN-LSTM2 scored 733 386 087 and052 and for UTRB station GRU scored 2521 1483077 and 556 for RMSE MAE R2 and training timeper epoch respectively

(3) In monthly rolled streamflow simulation the CNN-GRU2 model showed high performance with 515318 092 and 078 scores for Borkena station and1798 1299 083 and 071 for UTRB station which

are RMSE MAE R2 and training time per epochrespectively

Moreover from the proposed four hybrid models CNN-GRU2 or the model designed by a single 1D CNN layershowed the highest promising result on trial model 1(UTRB)and model 3 as shown in Tables 5 and 7 In contrast GRUon model 2 (UTRB) CNN-LSTM2 on model 2 (Borkena)and CNN-GRU1 on model 1 (Borkena) shared the second-highest promising result Streamflow simulation with theCNN-GRU2 model generally showed the highest perfor-mance than the other tested hybrid deep learning modelsand state-of-the-art LSTM GRU and MLP models In linewith our objectives the result is discussed with differentvariability conditions in the following paragraphs

61 Climatic Region Variability Testing models in differentclimatic conditions with historical data will likely providerobust deep learning models for streamflow simulation inthe future [51] Hence this research also tested different

time_distributed_15_input InputLayer

time_distributed_15 (conv1d_5) TimeDistributed (Conv1d)

time_distributed_16 (average_pooling1d_5) TimeDistributed (AveragePooling1D)

time_distributed_17 (flatten_5) TimeDistributed (Flatten)

gru_10 GRU

dropout_10 Dropout

gru_11 GRU

dropout_11 Dropout

dense_5 Dense

input [( 45 15)]

[( 45 15)]output

input ( 45 15)

( 44 8)output

input ( 44 8)

( 22 8)output

input ( 22 8)

( 176)output

input ( 176)

( 20)output

input ( 20)

( 20)output

input ( 20)

( 30)output

input ( 30)

( 30)output

input ( 30)

( 1)output

Figure 12 Internal network structure of the optimized high score hybrid CNN-GRU2 model for UTRB Station

Computational Intelligence and Neuroscience 13

models in two climatic regions and irrespective of climaticand time window variation the CNN-GRUmodel displayedthe highest scores on tested case study areas

62 Input Combination Variability Input combinationminimum temperature (Tmin) with precipitation (P) doesnot show significant performance increment in the Borkenastation (Tables 5 and 6) In some scenarios adopting P onlyas input increases the performance of the model (Table 7) Incontrast for UTRB streamflow simulation with all inputvariables or Tmin Tmax and P showed significant perfor-mance increments (Table 7)

63 Average Rolling Time Window VariabilityStreamflow simulation without rolling daily time series datahad deficient performance compared to monthly rolledaverage time series -is could be because the time seriesnoise in UTRB is visible compared to that in Borkenastation As a result performance increment from daily tomonthly rolled window models is much higher in UTRBthan in Borkena station

Generally the monthly rolled time window with theCNN-GRU2 model showed the top performance results inboth stations (Table 7) -e corresponding training and testloss functions of this optimized high score hybrid model forboth stations are displayed in Figure 9 ConsequentlyFigure 10 compares the true values and predicted values ofthis model -e optimized hybrid model boosts the per-formance score and lowers the training time per epochmuchbetter than GRU and LSTM models -is model inputfeature and Keras tuner optimized hyperparameter valuesfor both stations with its MSE score are presented in Tables 8and 9 Moreover the internal network structures of thesemodels are also shown in Figures 11 and 12 which displaythe model input and output parameter matrices for eachlayer

7 Conclusions

-is study showed a comparative analysis of different hybriddeep learning algorithms with state-of-the-art machinelearning models for one-step daily streamflow simulation attwo river basins or subcatchment stream flow outlets -eproposed algorithms for this study are CNN-LSTM andCNN-GRU hybrid deep learning models each model havingone or two 1D CNN layers with the classic MLP LSTM andGRU models -is study conducted a series of experimentsto observe the performance variation of the proposedmodels by introducing different input combinations rollingtime windows and climatic conditions for streamflowsimulation -e following list of points will summarize thesignificant findings of this study

(i) CNN-GRU2 with one 1D CNN layer showed thebest simulation performance reporting the lowestRMSE MAE and R2 out of all models in both casestudy areas Such results dictate that the

performance of the selected architectures is irre-spective of the climatic characteristics of the basins

(ii) Combining temperature data with precipitation asinput and inserting to the proposed models hadminimum performance increment in Borkenastation compared to UTRB case study area whichclearly showed that temperature data scarcity hasmore performance loss implication in UTRB sta-tion On the other hand the Borkena station hassignificant natural streamflow variability thanUTRB which is also reflected in the model results-is implies the consideration of catchment re-sponse before any deep learning modelapplications

(iii) Rolling the time window of input and output timeseries for streamflow simulation using the proposedmodels considerably increases performance in theUTRB than in the Borkena station

(iv) -e analysis results also showed that training timeper epoch for the hybrid deep learning models ismuch lower than that of GRU and LSTM models

Deep learning models usually require massive datasetsand their performance drops with small to medium datasetsHowever from this case study acceptable results andconsidering hybrid modelsrsquo hyperparameters sensitivity andcomplexity future research may further design optimizedconfigurations Moreover they can test these hybrid modelsfor long-term streamflow simulation in ephemeral seasonaland perennial river systems and other fields of study Ourfuture research will try to synchronize the highly performedhybrid deep learning models in this study with remotesensing datasets for the problem we experience in theungauged catchments

Data Availability

-e raw hydrological and metrological datasets used for theBorkena watershed are available from the correspondingauthor upon request However authorization letters arerequired from the Ministry of Water Irrigation and Energy(MoWIE) of Ethiopia (httpmowiegovet) and the Na-tional Meteorological Agency of Ethiopia (NMA) (httpwwwethiometgovet) whereas for UTRB the datasets canbe retrieved from an online repository (httphydrogateunipgitwrme)

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The corresponding author acknowledges the Ministry of Water, Irrigation and Energy of Ethiopia (MoWIE) and the Italian National Research Center (Centro Nazionale di Ricerca, CNR) for the hydrological and meteorological datasets. The corresponding author also thanks Dr. Fiseha Behulu for his advice and serious follow-up.


References

[1] Z. Zhang, Q. Zhang, and V. P. Singh, "Univariate streamflow forecasting using commonly used data-driven models: literature review and case study," Hydrological Sciences Journal, vol. 63, no. 7, pp. 1091–1111, 2018.

[2] L. Ni, D. Wang, V. P. Singh et al., "Streamflow and rainfall forecasting by two long short-term memory-based models," Journal of Hydrology, vol. 583, Article ID 124296, 2020.

[3] X. Yuan, C. Chen, X. Lei, Y. Yuan, and R. Muhammad Adnan, "Monthly runoff forecasting based on LSTM-ALO model," Stochastic Environmental Research and Risk Assessment, vol. 32, no. 8, pp. 2199–2212, 2018.

[4] A. Sichangi, L. Wang, and Z. Hu, "Estimation of river discharge solely from remote-sensing derived data: an initial study over the Yangtze river," Remote Sensing, vol. 10, no. 9, p. 1385, 2018.

[5] M. J. Tourian, N. Sneeuw, and A. Bardossy, "A quantile function approach to discharge estimation from satellite altimetry (ENVISAT)," Water Resources Research, vol. 49, no. 7, pp. 4174–4186, 2013.

[6] A. W. Sichangi, L. Wang, K. Yang et al., "Estimating continental river basin discharges using multiple remote sensing data sets," Remote Sensing of Environment, vol. 179, pp. 36–53, 2016.

[7] S. Aljahdali, A. Sheta, and H. Turabieh, "River flow forecasting: a comparison between feedforward and layered recurrent neural network," Learning and Analytics in Intelligent Systems, vol. 43, pp. 523–532, 2020.

[8] D. Couta, Y. Zhang, and Y. Li, "River flow forecasting using long short-term memory," DEStech Transactions on Computer Science and Engineering, vol. 16, 2019.

[9] Y. Bai, N. Bezak, K. Sapac, M. Klun, and J. Zhang, "Short-term streamflow forecasting using the feature-enhanced regression model," Water Resources Management, vol. 33, no. 14, pp. 4783–4797, 2019.

[10] L. C. D. Campos, L. Goliatt da Fonseca, T. L. Fonseca, G. D. de Abreu, L. F. Pires, and Y. Gorodetskaya, "Short-term streamflow forecasting for Paraíba do Sul river using deep learning," Progress in Artificial Intelligence, vol. 43, pp. 507–518, 2019.

[11] B. B. Sahoo, R. Jha, A. Singh, and D. Kumar, "Long short-term memory (LSTM) recurrent neural network for low-flow hydrological time series forecasting," Acta Geophysica, vol. 67, no. 5, pp. 1471–1481, 2019.

[12] M. Rahimzad, A. Moghaddam Nia, H. Zolfonoon, J. Soltani, A. Danandeh Mehr, and H.-H. Kwon, Performance Comparison of an LSTM-Based Deep Learning Model versus Conventional Machine Learning Algorithms for Streamflow Forecasting, Water Resources Management, New York, NY, USA, 2021.

[13] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: encoder–decoder approaches," in Proceedings of the SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp. 103–111, Doha, Qatar, October 2014.

[14] C. Chen, W. He, J. Li, and Z. Tang, "A novel hybrid CNN-LSTM scheme for nitrogen oxide emission prediction in FCC unit," Mathematical Problems in Engineering, vol. 2020, Article ID 8071810, 2020.

[15] T. Li, M. Hua, and X. Wu, "A hybrid CNN-LSTM model for forecasting particulate matter (PM2.5)," IEEE Access, vol. 8, pp. 26933–26940, 2020.

[16] I. E. Livieris, E. Pintelas, and P. Pintelas, "A CNN–LSTM model for gold price time-series forecasting," Neural Computing and Applications, vol. 32, 2020.

[17] A. A. M. Ahmed, R. C. Deo, N. Raj et al., "Deep learning forecasts of soil moisture: convolutional neural network and gated recurrent unit models coupled with satellite-derived MODIS observations and synoptic-scale climate index data," Remote Sensing, vol. 13, no. 4, p. 554, 2021.

[18] Y. Liu, T. Zhang, A. Kang, J. Li, and X. Lei, "Research on runoff simulations using deep-learning methods," Sustainability, vol. 13, no. 3, p. 1336, 2021.

[19] A. U. Muhammad, X. Li, and J. Feng, "Using LSTM GRU and hybrid models for streamflow forecasting," Machine Learning and Intelligent Communications, vol. 294, pp. 510–524, 2019.

[20] R. Barzegar, M. T. Aalami, and J. Adamowski, "Short-term water quality variable prediction using a hybrid CNN-LSTM deep learning model," Stochastic Environmental Research and Risk Assessment, vol. 34, no. 2, pp. 415–433, 2020.

[21] S. P. Van, H. M. Le, D. V. Thanh, T. D. Dang, H. H. Loc, and D. T. Anh, "Deep learning convolutional neural network in rainfall-runoff modelling," Journal of Hydroinformatics, vol. 22, no. 3, pp. 541–561, 2020.

[22] P. Parisouj, H. Mohebzadeh, and T. Lee, "Employing machine learning algorithms for streamflow prediction: a case study of four river basins with different climatic zones in the United States," Water Resources Management, vol. 34, no. 13, pp. 4113–4131, 2020.

[23] A. Annis and F. Nardi, "Integrating VGI and 2D hydraulic models into a data assimilation framework for real time flood forecasting and mapping," Geo-Spatial Information Science, vol. 22, no. 4, pp. 223–236, 2019.

[24] B. M. Fiseha, S. G. Setegn, A. M. Melesse, E. Volpi, and A. Fiori, "Impact of climate change on the hydrology of upper Tiber River Basin using bias corrected regional climate model," Water Resources Management, vol. 28, no. 5, pp. 1327–1343, 2014.

[25] S. Rania, B. Waad, and E. Nadia, "Hybrid feature selection method based on the genetic algorithm and Pearson correlation coefficient," Machine Learning Paradigms: Theory and Application, Studies in Computational Intelligence, vol. 801, 2000.

[26] A. D. Mehr and A. H. Gandomi, "MSGP-LASSO: an improved multi-stage genetic programming model for streamflow prediction," Information Sciences, vol. 561, pp. 181–195, 2021.

[27] R. Kun, F. Wei, Q. Jihong, Z. Xia, and S. Xiaoyu, "Comparison of eight filter-based feature selection methods for monthly streamflow forecasting – three case studies on CAMELS data sets," Journal of Hydrology, vol. 586, Article ID 124897, 2020.

[28] O. Oyebode, "Evolutionary modelling of municipal water demand with multiple feature selection techniques," Journal of Water Supply: Research & Technology - Aqua, vol. 68, no. 4, pp. 264–281, 2019.

[29] R. Dehghani, H. Torabi Poudeh, and Z. Izadi, "Dissolved oxygen concentration predictions for running waters with using hybrid machine learning techniques," Modeling Earth Systems and Environment, vol. 597, 2021.

[30] N. Yuvaraj, V. Chang, B. Gobinathan et al., "Automatic detection of cyberbullying using multi-feature based artificial intelligence with deep decision tree classification," Computers & Electrical Engineering, vol. 92, Article ID 107186, 2021.

[31] B. Ratner, "The correlation coefficient: its values range between +1/−1, or do they?" Journal of Targeting, Measurement and Analysis for Marketing, vol. 17, no. 2, pp. 139–142, 2009.


[32] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, "Sliding window-based frequent pattern mining over data streams," Information Sciences, vol. 179, no. 22, pp. 3843–3865, 2009.

[33] E. Zivot and J. Wang, "Rolling analysis of time series," in Modeling Financial Time Series with S-Plus, pp. 299–346, Springer, New York, NY, USA, 2003.

[34] Y. Wang, W. Liao, and Y. Chang, "Gated recurrent unit network-based short-term photovoltaic forecasting," Energies, vol. 11, no. 8, p. 2163, 2018.

[35] O. B. Sezer, M. U. Gudelek, and A. M. Ozbayoglu, "Financial time series forecasting with deep learning: a systematic literature review: 2005–2019," Applied Soft Computing, vol. 90, Article ID 106181, 2020.

[36] Q. Zou, Q. Xiong, Q. Li, H. Yi, Y. Yu, and C. Wu, "A water quality prediction method based on the multi-time scale bidirectional long short-term memory network," Environmental Science and Pollution Research, vol. 27, no. 14, pp. 16853–16864, 2020.

[37] S.-C. Wang, "Artificial neural network," Interdisciplinary Computing in Java Programming, vol. 26, pp. 81–100, 2003.

[38] V. Nourani, A. Molajou, H. Najafi, and A. Danandeh Mehr, "Emotional ANN (EANN): a new generation of neural networks for hydrological modeling in IoT," in Artificial Intelligence in IoT, Transactions on Computational Science and Computational Intelligence, pp. 45–61, Springer, New York, NY, USA, 2019.

[39] L. Yan, J. Feng, and T. Hang, "Small watershed stream-flow forecasting based on LSTM," Advances in Intelligent Systems and Computing, vol. 935, pp. 1006–1014, 2019.

[40] S. Zhu, X. Luo, X. Yuan, and Z. Xu, "An improved long short-term memory network for streamflow forecasting in the upper Yangtze River," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 9, 2020.

[41] A. Osama, A. Onur, K. Serkan, G. Moncef, and I. D. Inman, "Real-time vibration-based structural damage detection using one-dimensional convolutional neural networks," Journal of Sound and Vibration, vol. 388, pp. 154–170, 2017.

[42] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[43] H.-C. Shin, H. R. Roth, M. Gao et al., "Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1285–1298, 2016.

[44] A. Essien and C. Giannetti, "A deep learning framework for univariate time series prediction using convolutional LSTM stacked autoencoders," in Proceedings of the IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA), pp. 1–6, Alberobello, Italy, July 2019.

[45] D. S. Abdelminaam, F. H. Ismail, M. Taha, A. Taha, E. H. Houssein, and A. Nabil, "CoAID-DEEP: an optimized intelligent framework for automated detecting COVID-19 misleading information on Twitter," IEEE Access, vol. 9, pp. 27840–27867, 2021.

[46] O. Kazakov and O. Mikheenko, "Transfer learning and domain adaptation based on modeling of socio-economic systems," Business Inform, vol. 14, no. 2, pp. 7–20, 2020.

[47] B. Wang and N. Z. Gong, "Stealing hyperparameters in machine learning," in Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 36–52, San Francisco, CA, USA, May 2018.

[48] C. Boyraz and S. N. Engin, "Streamflow prediction with deep learning," in Proceedings of the 6th International Conference on Control Engineering & Information Technology (CEIT), pp. 1–5, Istanbul, Turkey, October 2018.

[49] E. K. Jackson, W. Roberts, B. Nelsen, G. P. Williams, E. J. Nelson, and D. P. Ames, "Introductory overview: error metrics for hydrologic modelling - a review of common practices and an open source library to facilitate use and adoption," Environmental Modelling & Software, vol. 119, pp. 32–48, 2019.

[50] S. Kumar, T. Roshni, and D. Himayoun, "A comparison of emotional neural network (ENN) and artificial neural network (ANN) approach for rainfall-runoff modelling," Civil Engineering Journal, vol. 5, no. 10, pp. 2120–2130, 2019.

[51] P. Bai, X. Liu, and J. Xie, "Simulating runoff under changing climatic conditions: a comparison of the long short-term memory network with two conceptual hydrologic models," Journal of Hydrology, vol. 592, Article ID 125779, 2021.



Generally the monthly rolled time window with theCNN-GRU2 model showed the top performance results inboth stations (Table 7) -e corresponding training and testloss functions of this optimized high score hybrid model forboth stations are displayed in Figure 9 ConsequentlyFigure 10 compares the true values and predicted values ofthis model -e optimized hybrid model boosts the per-formance score and lowers the training time per epochmuchbetter than GRU and LSTM models -is model inputfeature and Keras tuner optimized hyperparameter valuesfor both stations with its MSE score are presented in Tables 8and 9 Moreover the internal network structures of thesemodels are also shown in Figures 11 and 12 which displaythe model input and output parameter matrices for eachlayer

7 Conclusions

-is study showed a comparative analysis of different hybriddeep learning algorithms with state-of-the-art machinelearning models for one-step daily streamflow simulation attwo river basins or subcatchment stream flow outlets -eproposed algorithms for this study are CNN-LSTM andCNN-GRU hybrid deep learning models each model havingone or two 1D CNN layers with the classic MLP LSTM andGRU models -is study conducted a series of experimentsto observe the performance variation of the proposedmodels by introducing different input combinations rollingtime windows and climatic conditions for streamflowsimulation -e following list of points will summarize thesignificant findings of this study

(i) CNN-GRU2 with one 1D CNN layer showed thebest simulation performance reporting the lowestRMSE MAE and R2 out of all models in both casestudy areas Such results dictate that the

performance of the selected architectures is irre-spective of the climatic characteristics of the basins

(ii) Combining temperature data with precipitation asinput and inserting to the proposed models hadminimum performance increment in Borkenastation compared to UTRB case study area whichclearly showed that temperature data scarcity hasmore performance loss implication in UTRB sta-tion On the other hand the Borkena station hassignificant natural streamflow variability thanUTRB which is also reflected in the model results-is implies the consideration of catchment re-sponse before any deep learning modelapplications

(iii) Rolling the time window of input and output timeseries for streamflow simulation using the proposedmodels considerably increases performance in theUTRB than in the Borkena station

(iv) -e analysis results also showed that training timeper epoch for the hybrid deep learning models ismuch lower than that of GRU and LSTM models

Deep learning models usually require massive datasetsand their performance drops with small to medium datasetsHowever from this case study acceptable results andconsidering hybrid modelsrsquo hyperparameters sensitivity andcomplexity future research may further design optimizedconfigurations Moreover they can test these hybrid modelsfor long-term streamflow simulation in ephemeral seasonaland perennial river systems and other fields of study Ourfuture research will try to synchronize the highly performedhybrid deep learning models in this study with remotesensing datasets for the problem we experience in theungauged catchments

Data Availability

-e raw hydrological and metrological datasets used for theBorkena watershed are available from the correspondingauthor upon request However authorization letters arerequired from the Ministry of Water Irrigation and Energy(MoWIE) of Ethiopia (httpmowiegovet) and the Na-tional Meteorological Agency of Ethiopia (NMA) (httpwwwethiometgovet) whereas for UTRB the datasets canbe retrieved from an online repository (httphydrogateunipgitwrme)

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e corresponding author acknowledges the Ministry ofWater Irrigation and Energy of Ethiopia (MoWIE) and theItalian National Research Center (Centro National di Ric-cerca CNR) for the hydrological and metrological datasets-e corresponding author also thanks Dr Fiseha Behulu forhis advice and series follow-up

14 Computational Intelligence and Neuroscience

References

[1] Z Zhang Q Zhang and V P Singh ldquoUnivariate streamflowforecasting using commonly used data-driven models liter-ature review and case studyrdquo Hydrological Sciences Journalvol 63 no 7 pp 1091ndash1111 2018

[2] L Ni D Wang V P Singh et al ldquoStreamflow and rainfallforecasting by two long short-term memory-based modelsrdquoJournal of Hydrology vol 583 Article ID 124296 2020

[3] X Yuan C Chen X Lei Y Yuan and R Muhammad AdnanldquoMonthly runoff forecasting based on LSTM-ALO modelrdquoStochastic Environmental Research and Risk Assessmentvol 32 no 8 pp 2199ndash2212 2018

[4] A Sichangi L Wang and Z Hu ldquoEstimation of river dis-charge solely from remote-sensing derived data an initialstudy over the Yangtze riverrdquo Remote Sensing vol 10 no 9p 1385 2018

[5] M J Tourian N Sneeuw and A Bardossy ldquoA quantilefunction approach to discharge estimation from satellite al-timetry (ENVISAT)rdquoWater Resources Research vol 49 no 7pp 4174ndash4186 2013

[6] A W Sichangi L Wang K Yang et al ldquoEstimating conti-nental river basin discharges using multiple remote sensingdata setsrdquo Remote Sensing of Environment vol 179 pp 36ndash532016

[7] S Aljahdali A Sheta and H Turabieh ldquoRiver flow fore-casting a comparison between feedforward and layered re-current neural networkrdquo Learning and Analytics in IntelligentSystems vol 43 pp 523ndash532 2020

[8] D Couta Y Zhang and Y Li ldquoRiver flow forecasting usinglong short-termmemoryrdquoDEStech Transactions on ComputerScience and Engineering vol 16 2019

[9] Y Bai N Bezak K Sapac M Klun and J Zhang ldquoShort-termstreamflow forecasting using the feature-enhanced regressionmodelrdquo Water Resources Management vol 33 no 14pp 4783ndash4797 2019

[10] L C D Campos L Goliatt da Fonseca T L FonsecaG D de Abreu L F Pires and Y Gorodetskaya ldquoShort-termstreamflow forecasting for paraıba do Sul river using deeplearningrdquo Progress in Artificial Intelligence vol 43 pp 507ndash518 2019

[11] B B Sahoo R Jha A Singh and D Kumar ldquoLong short-termmemory (LSTM) recurrent neural network for low-flowhydrological time series forecastingrdquo Acta Geophysica vol 67no 5 pp 1471ndash1481 2019

[12] M Rahimzad A Moghaddam Nia H Zolfonoon J SoltaniA Danandeh Mehr and H-H Kwon Performance Com-parison of an LSTM-Based Deep Learning Model versusConventional Machine Learning Algorithms for StreamflowForecasting Water Resources Management New York NYUSA 2021

[13] K Cho B vanMerrienboer D Bahdanau and Y Bengio ldquoOnthe properties of neural machine translation enco-derndashdecoder approachesrdquo in Proceedings of the SSST-8 EighthWorkshop on Syntax Semantics and Structure in StatisticalTranslation pp 103ndash111 Doha Qatar October 2014

[14] C Chen W He J Li and Z Tang ldquoA novel hybrid CNN-LSTM scheme for nitrogen oxide emission prediction inFCC unitrdquoMathematical Problems in Engineering vol 2020Article ID 8071810 2020

[15] T Li M Hua and X Wu ldquoA hybrid CNN-LSTM model forforecasting particulate matter (PM25)rdquo IEEE Access vol 8pp 26933ndash26940 2020

[16] I E Livieris E Pintelas and P Pintelas ldquoA CNNndashLSTMmodel for gold price time-series forecastingrdquo Neural Com-puting and Applications vol 32 2020

[17] A A M Ahmed R C Deo N Raj et al ldquoDeep learningforecasts of soil moisture convolutional neural network andgated recurrent unit models coupled with satellite-derivedMODIS observations and synoptic-scale climate index datardquoRemote Sensing vol 13 no 4 p 554 2021

[18] Y Liu T Zhang A Kang J Li and X Lei ldquoResearch onrunoff simulations using deep-learning methodsrdquo Sustain-ability vol 13 no 3 p 1336 2021

[19] A U Muhammad X Li and J Feng ldquoUsing LSTM GRU andhybrid models for streamflow forecastingrdquoMachine Learningand Intelligent Communications vol 294 pp 510ndash524 2019

[20] R Barzegar M T Aalami and J Adamowski ldquoShort-termwater quality variable prediction using a hybrid CNN-LSTMdeep learning modelrdquo Stochastic Environmental Research andRisk Assessment vol 34 no 2 pp 415ndash433 2020

[21] S P Van H M Le D V -anh T D Dang H H Loc andD T Anh ldquoDeep learning convolutional neural network inrainfall-runoff modellingrdquo Journal of Hydroinformaticsvol 22 no 3 pp 541ndash561 2020

[22] P Parisouj H Mohebzadeh and T Lee ldquoEmploying machinelearning algorithms for streamflow prediction a case study offour river basins with different climatic zones in the UnitedStatesrdquo Water Resources Management vol 34 no 13pp 4113ndash4131 2020

[23] A Annis and F Nardi ldquoIntegrating VGI and 2D hydraulicmodels into a data assimilation framework for real time floodforecasting and mappingrdquo Geo-Spatial Information Sciencevol 22 no 4 pp 223ndash236 2019

[24] B M Fiseha S G Setegn A M Melesse E Volpi andA Fiori ldquoImpact of climate change on the hydrology of upperTiber River Basin using bias corrected regional climatemodelrdquo Water Resources Management vol 28 no 5pp 1327ndash1343 2014

[25] S Rania B Waad and E Nadia ldquoHybrid feature selectionmethod based on the genetic algorithm and pearson corre-lation coefficientrdquo Machine Learning Paradigms Ceory andApplication Studies in Computational Intelligence vol 8012000

[26] A D Mehr and A H Gandomi ldquoMSGP-LASSO an im-proved multi-stage genetic programming model for stream-flow predictionrdquo Information Sciences vol 561 pp 181ndash1952021

[27] R Kun F Wei Q Jihong Z Xia and S Xiaoyu ldquoComparisonof eight filter-based feature selection methods for monthlystreamflow forecasting ndash three case studies on CAMELS datasetsrdquo Journal of Hydrology vol 586 Article ID 124897 2020

[28] O Oyebode ldquoEvolutionary modelling of municipal waterdemand with multiple feature selection techniquesrdquo Journalof Water Supply Research amp Technology - Aqua vol 68 no 4pp 264ndash281 2019

[29] R Dehghani H Torabi Poudeh and Z Izadi ldquoDissolvedoxygen concentration predictions for running waters withusing hybrid machine learning techniquesrdquo Modeling EarthSystems and Environment vol 597 2021

[30] N Yuvaraj V Chang B Gobinathan et al ldquoAutomatic de-tection of cyberbullying using multi-feature based artificialintelligence with deep decision tree classificationrdquo Computersamp Electrical Engineering vol 92 Article ID 107186 2021

[31] B Ratner ldquo-e correlation coefficient its values range be-tween +1minus1 or do theyrdquo Journal of Targeting Measurementand Analysis for Marketing vol 17 no 2 pp 139ndash142 2009

Computational Intelligence and Neuroscience 15

[32] S K Tanbeer C F Ahmed B-S Jeong and Y-K LeeldquoSliding window-based frequent pattern mining over datastreamsrdquo Information Sciences vol 179 no 22pp 3843ndash3865 2009

[33] E Zivot and J Wang ldquoRolling analysis of time seriesrdquo inModeling Financial Time Series with S-Plus pp 299ndash346Springer New York NY USA 2003

[34] Y Wang W Liao and Y Chang ldquoGated recurrent unitnetwork-based short-term photovoltaic forecastingrdquo Energiesvol 11 no 8 p 2163 2018

[35] O B Sezer M U Gudelek and A M Ozbayoglu ldquoFinancialtime series forecasting with deep learning a systematic lit-erature review 2005-2019rdquo Applied Soft Computing vol 90Article ID 106181 2020

[36] Q Zou Q Xiong Q Li H Yi Y Yu and C Wu ldquoA waterquality prediction method based on the multi-time scalebidirectional long short-term memory networkrdquo Environ-mental Science and Pollution Research vol 27 no 14pp 16853ndash16864 2020

[37] S-C Wang ldquoArtificial neural networkrdquo InterdisciplinaryComputing in Java Programming vol 26 pp 81ndash100 2003

[38] V Nourani A Molajou H Najafi and A Danandeh MehrldquoEmotional ANN (EANN) a new generation of neural net-works for hydrological modeling in IoTrdquo in Artificial Intel-ligence In IoT Transactions On Computational Science AndComputational Intelligence pp 45ndash61 Springer New YorkNY USA 2019

[39] L Yan J Feng and T Hang ldquoSmall watershed stream-flowforecasting based on LSTMrdquo Advances in Intelligent Systemsand Computing vol 935 pp 1006ndash1014 2019

[40] S Zhu X Luo X Yuan and Z Xu ldquoAn improved long short-termmemory network for streamflow forecasting in the upperYangtze Riverrdquo Stochastic Environmental Research and RiskAssessment vol 14 no 9 2020

[41] A Osama A Onur K Serkan G Moncef and I D InmanldquoReal-time vibration-based structural damage detection usingone-dimensional convolutional neural networksrdquo Journal ofSound and Vibration vol 388 pp 154ndash170 2017

[42] Y Lecun L Bottou Y Bengio and P Haffner ldquoGradient-based learning applied to document recognitionrdquo Proceedingsof the IEEE vol 86 no 11 pp 2278ndash2324 1998

[43] H-C Shin H R Roth M Gao et al ldquoDeep convolutionalneural networks for computer-aided detection CNN archi-tectures dataset characteristics and transfer learningrdquo IEEETransactions on Medical Imaging vol 35 no 5 pp 1285ndash1298 2016

[44] A Essien and C Giannetti ldquoA deep learning framework forunivariate time series prediction using convolutional LSTMstacked autoencodersrdquo in Proceedings of the IEEE Interna-tional Symposium on Innovations in Intelligent Systems andApplications (INISTA) pp 1ndash6 Alberobello Italy July 2019

[45] D S Abdelminaam F H Ismail M Taha A TahaE H Houssein and A Nabil ldquoCoAID-DEEP an optimizedintelligent framework for automated detecting COVID-19misleading information on Twitterrdquo IEEE Access vol 9pp 27840ndash27867 2021

[46] O Kazakov and O Mikheenko ldquoTransfer learning and do-main adaptation based on modeling of socio-economic sys-temsrdquo Business Inform vol 14 no 2 pp 7ndash20 2020

[47] B Wang and N Z Gong ldquoStealing hyperparameters inmachine learningrdquo in Proceedings of the IEEE Symposium onSecurity and Privacy (SP) pp 36ndash52 San Francisco CA USAMay 2018

[48] C Boyraz and S N Engin ldquoStreamflow prediction with deeplearningrdquo in Proceedings of the 6th International Conferenceon Control Engineering Information Technology (CEIT)pp 1ndash5 Istanbul Turkey October 2018

[49] E K Jackson W Roberts B Nelsen G P WilliamsE J Nelson and D P Ames ldquoIntroductory overview errormetrics for hydrologic modelling - a review of commonpractices and an open source library to facilitate use andadoptionrdquo Environmental Modelling amp Software vol 119pp 32ndash48 2019

[50] S Kumar T Roshni and D Himayoun ldquoA comparison ofemotional neural network (ENN) and artificial neural net-work (ANN) approach for rainfall-runoff modellingrdquo CivilEngineering Journal vol 5 no 10 pp 2120ndash2130 2019

[51] P Bai X Liu and J Xie ldquoSimulating runoff under changingclimatic conditions a comparison of the long short-termmemory network with two conceptual hydrologic modelsrdquoJournal of Hydrology vol 592 Article ID 125779 2021

16 Computational Intelligence and Neuroscience

Page 7: Multivariate Streamflow Simulation Using Hybrid Deep

commonly suited in different problems [48] In this cate-gory rectified linear unit (Relu) was used as an activationfunction and mean squared error (MSE) was used as a lossfunction

In contrast the second type of changing hyper-parameters is optimized by Keras tuner and hyperparameterchoices or value ranges for optimization are set using dif-ferent trials We also considered our PC capacity (processorIntel(R) Core (TM) i7-6500U CPU 250GHz and RAM 8gigabytes) with Windows 10 operating system Hyper-parameters are optimized with 20 trials and since deeplearning networks have different training and validationplots for each run we decided to repeat the iteration threetimes

All hyperparameter ranges or choices are listed inTable 4 CNN-LSTM1 and CNN-GRU1 models usedhyperparameter values from numbers 1 to 13 for opti-mization (Table 4) while we omitted 4 5 and 6 for CNN-LSTM2 and CNN-GRU2 -e remaining deep learningmodels MLP LSTM and GRU used a list of hyper-parameters from numbers 7 to 13 Finally each optimizedhyperparameter is used for each training and testing ex-periment Moreover the train and test traces from eachrun can be plotted to give a more robust idea of the be-haviour of the model to inspect overfitting and under-fitting issues

52 Performance Measures A wide variety of evaluationmetrics are listed in the literature [49] However the popularones are mean error (ME) coefficient of determination (R2)root mean square error (RMSE) mean absolute error(MAE) mean percentage error (MPE) mean absolutepercentage error (MAPE) mean absolute scaled error(MASE) and NashndashSutcliffe efficiency (NSE) -is study

used different input and model variability conditionsHence to concisely measure the analysis output and presentthe result we applied the following top three standardperformance evaluation criteria that can also have the po-tential to capture the extreme streamflow time series valueseffectively [50]

Coefficient of determination (R2)

R2

n 1113936

Qobs lowastQsim( 1113857 minus 1113936

Qobs( 1113857lowast 1113936

Qsim( 1113857

n 1113936Q

2obs1113872 1113873 minus 1113936

Qobs( 1113857

21113960 1113961

121113969

lowast n 1113936Q

2sim1113872 1113873 minus 1113936

Qsim( 1113857

21113960 1113961

(7)

Root mean square error (RMSE)

RMSE

1113936Nn1 Q

t0bs minus Q

tsim1113872 1113873

2

N

1113971

(8)

Mean absolute error (MAE)

P(t+n) Q(t+n)

P(t+1)

P(t)

Tmin(t+n)

Time

Xt Q

TimeCatchment

Tmin(t+1)

Tmin(t)

Input

Tmax(t+n)MLP

GRU

LSTM

CNN-LSTM

CNN-GRU

ModelArchitectures

Tmax(t+1)

Tmax(t)

Output

Figure 3 A simple architecture of the proposed models

X1W111

W112

W113

W133

W231

W221

W211

X2

Xn

Input layer Hidden layer

Output layer

h1

h2 y

hn

Figure 4 Typical architecture of ANN

Computational Intelligence and Neuroscience 7

Table 3 Mathematical formulation for LSTM cell

Network gates Purpose Equationslowast

Forget gate Chooses the information to reject from the cell ft σ (uf xt +wf ht-1 + bf )Input gate Decides what information is relevant to update in the current cell state it σ (ui xt +wi ht-1 + bi)Output gate Decides what to output based on input and the long-term memory of the cell ot σ (uo xt +wo ht-1 + bo)

Cell state Long-term memory Cet tanh (WcXt +Uchtminus1 + bc)Ct ftlowast Ctminus1 + itlowastCet

Hidden state Short-term memory ht tanh (Ct) lowast Ot

Ct-1

ht-1

ft

it

Ot

Cet

ht

φ

PreviousShort-Term

Memory

UpdatedShort-Term

Memory

PreviousLong-Term

Memory

UpdatedLong-Term

Memory

Ct

yt

ht

Xt Input

Output

+

Figure 5 LSTM memory cell with three gated layers [11]

htht-1

PreviousLong-Term

+Short-TermMemory

Xt Input

y^t

h^t

Zt

rt

tanh

1-

σσ

Output

UpdatedLong-Term

+Short-TermMemory

Figure 6 -e structure of gated recurrent unit (GRU) network [34]

W1

Input P1

C1

I1

I2

In

C2

C1=W1P1+W2P2C2=W1P2+W2P3Cn=W1Pn-1+W2Pn

C3 Cn

P2 P3 Pn

Convolution - Layer

Multi-variate Timeseries

Convolution - Layer

For Filter Size of 21

W2

Figure 7 -e process of 1D CNN [15]

8 Computational Intelligence and Neuroscience

Daily Multivariate Timeseries Data

Daily Weekly RolledAverage

Borkena StationsP and Tmin of the past 30 days

Upper Tiber StationsP Tmin and Tmax of the Past 45 days

Inpu

t

1D co

nv

Aver

ageP

oolin

g

Flat

ten

LSTM

or G

RU

LSTM

or G

RU

Dro

pout

Dro

pout

Out

put

Den

se(s

ingl

e out

put)

Both stations Daily Weekly and Monthlyrolled single step Streamflow

Monthly RolledAverage

Figure 8 -e basic architecture of the proposed CNN-LSTM or CNN-GRU models

Table 4 Model hyperparameter choices or value ranges for optimization by Keras tuner

No HyperparametersValue rangeslowastlowast

Choices DefaultMin Max Step

1 Conv_1_filter 8 32 8 lowast lowast

2 Conv_1_kernal lowast lowast lowast 2 or 3 lowast

3 Conv_1_pool_size lowast lowast lowast 2 or 3 lowast

4 Conv_2_filter 8 32 8 lowast lowast

5 Conv_2_kernal lowast lowast lowast 2 or 3 lowast

6 Conv_2_pool_size lowast lowast lowast 2 or 3 lowast

7 CNN-LSTM1 CNN-LSTM2 CNN-GRU1 CNN-GRU2 LSTM GRU or MLP layer 1units 5 30 5 lowast lowast

8 Dropout 1 00 03 01 lowast 02

9 CNN-LSTM1 CNN-LSTM2 CNN-GRU1 CNN-GRU2 LSTM GRU or MLP layer 2units 5 30 5 lowast lowast

10 Dropout 2 00 03 01 lowast 0211 Learning rate lowast lowast lowast 1e-2 1e-3 or 1e-4 lowast

12 Number of epochs 10 100 10 lowast lowast

13 Number of batch sizes 10 100 10 lowast lowast

lowastlowastValue ranges or choices for optimization by Keras tuner (objective ldquovalidation lossrdquo max trials 20 and executions per trial 3) lowastNot applicable

Table 5 Daily streamflow simulation performance comparison of the proposed models for different input variables and climaticconditions

Model 1

Borkena UTRBP+Tmin P P+Tmin +Tmax P

R M R2 T R M R2 T R M R2 T R M R2 TM A T M A T M A T M A TS E P S E P S E P S E PE Elowast (sec) E Elowast (sec) E Elowast (sec) E Elowast (sec)

MLP 991 501 077 089 938 463 079 063 4911 2274 049 078 5657 2814 033 041GRU 878 437 082 361 794 364 085 332 4663 2089 055 261 5109 2674 045 339LSTM 841 409 083 235 965 487 078 292 4864 2279 051 386 4859 2500 051 598CNN-LSTM1 809 407 084 046 857 467 082 041 5120 2295 045 119 5616 2655 034 057CNN-LSTM2 799 409 085 072 914 450 080 045 4538 2185 057 082 5157 2584 044 185CNN-GRU1 794 366 085 063 832 409 083 086 5506 2349 037 116 5242 2498 043 083CNN-GRU2 907 419 080 101 843 426 083 028 4561 2179 057 064 4996 2538 048 068lowastTTPE (training time per epoch) -e bold values indicate the highest performance score

Computational Intelligence and Neuroscience 9

Table 6 Weekly rolled streamflow simulation performance comparison of the proposed models for different input variables and climaticconditions

Model 2

Borkena UTRBP+Tmin P P+Tmin+Tmax P

R M R2 T R M R2 T R M R2 T R M R2 TM A T M A T M A T M A TS E P S E P S E P S E PE Elowast (sec) E Elowast (sec) E Elowast (sec) E Elowast (sec)

MLP 811 429 084 023 733 419 087 022 3301 2001 060 074 3803 2517 047 079GRU 759 371 086 204 715 413 087 243 2521 1483 077 556 3139 1926 064 1679LSTM 841 401 082 298 793 391 084 127 3107 1887 065 355 3107 1949 065 269CNN-LSTM1 790 409 085 078 772 425 085 063 2804 1733 071 093 3457 2192 057 062CNN-LSTM2 733 386 087 052 763 425 086 055 2845 1666 071 114 3504 2177 055 156CNN-GRU1 783 394 085 044 791 431 085 050 3057 1801 066 232 3514 2258 055 063CNN-GRU2 873 461 081 043 843 435 082 097 2781 1699 072 437 3376 2301 059 101lowastTTPE (training time per epoch) -e bold values indicate the highest performance score

Table 7 Monthly rolled streamflow simulation performance comparison of the proposed models for different input variables and climaticconditions

Model 2

Borkena UTRBP+Tmin P P+Tmin+Tmax P

R M R2 T R M R2 T R M R2 T R M R2 TM A T M A T M A T M A TS E P S E P S E P S E PE Elowast (sec) E Elowast (sec) E Elowast (sec) E Elowast (sec)

MLP 668 437 087 058 557 380 091 041 2024 1384 078 044 2879 2105 056 041GRU 515 352 091 162 522 306 092 331 2079 1430 077 1663 2647 2008 063 470LSTM 555 349 091 275 576 351 090 251 2149 1511 076 415 3229 2447 045 509CNN-LSTM1 605 442 089 098 558 340 091 058 2153 1487 076 129 2748 2119 060 042CNN-LSTM2 536 317 092 141 687 405 086 144 1907 1353 081 070 2779 2090 059 042CNN-GRU1 576 362 090 052 577 356 090 069 1931 1378 080 487 2867 2107 057 308CNN-GRU2 536 325 092 062 515 318 092 078 1798 1299 083 071 2777 2036 059 122lowastTTPE (training time per epoch) -e bold values indicate the highest performance score

0

0102030405

Loss

06070809

10 20 30 40

Epochs

50 60 70 80

Train

Test

(a)

00102030405Lo

ss 06070809

10 20 30 40

Epochs

50 60

Train

Test

(b)

Figure 9 Training and test loss function of the optimized high score hybrid model (a) CNN-GRU2 model for Borkena Station (b) CNN-GRU2 model for UTRB Station

10 Computational Intelligence and Neuroscience

MAE 1n

1113944

n

i1Q

t0bs minus Q

tsim

11138681113868111386811138681113868111386811138681113868 (9)

where Qobs discharge observed Qsim discharge simu-lated and n number of observations -e range of R2 liesbetween 0 and 1 representing respectively no correlationand a perfect correlation between observed and simulatedvalues whereas smallest RMSE and MAE scores or valuesclose to zero direct to the best model performance

6 Results

Streamflow simulation result with the proposed seven deeplearning architectures different input time window seriestwo climatic regions two input combinations and three

0

0

10

20

30

Test

Q (m

3se

c)

40

50

60

70

10 20 30 40

Predicted Q (m3sec)

50 60 70

Linear_fit

Data_points

2011

-09

2012

-01

2012

-05

2012

-09

2013

-01

2013

-05

2013

-09

2014

-01

2014

-05

2014

-09

2015

-01

010203040

Q (m

3se

c) 506070

Time (Year)

Test

Predicted

(a)

0

0255075

Test

Q (m

3se

c)100125150175

25 50 75 100

Predicted Q (m3sec)

125 150 175

Linear_fit

Data_points

1975

-01

1975

-07

1976

-01

1976

-07

1977

-01

1977

-07

1978

-01

1978

-07

1979

-01

0255075

100

Q (m

3se

c) 125150175

Time (Year)

Test

Predicted

(b)

Figure 10 Comparison of true values and predicted values of the optimized high score hybrid model (a) CNN-GRU2 model for BorkenaStation (b) CNN-GRU2 model for UTRB Station

Table 8 Best hybrid model type input feature and Keras tuneroptimized hyperparameter values for Borkena station with its MSEscore

Hyperparameters CNN-GRU2Monthly rolled P

Conv_1_filter 24Conv_1_kernal 2Conv_1_pool_size 3GRU_l1_units 15Dropout1 01GRU_l2_units 20Dropout2 02Learning rate 00001Number of epochs 80Number of batch sizes 20Score (MSE) 0083

Computational Intelligence and Neuroscience 11

Table 9 Best hybrid model type input features and Keras tuner optimized hyperparameter values for UTRB station with its MSE score

Hyperparameters CNN-GRU2Monthly rolled P Tmin and Tmax

Conv_1_filter 8Conv_1_kernal 2Conv_1_pool_size 2GRU_l1_units 20Dropout1 03GRU_l2_units 30Dropout2 02Learning rate 00001Number of epochs 60Number of batch sizes 40Score (MSE) 0193

time_distributed_3_input InputLayer

time_distributed_3(conv1d_1) TimeDistributed(Conv1d)

time_distributed_4(average_pooling1d_1) TimeDistributed(AveragePooling1D)

time_distributed_5(flatten_1) TimeDistributed(Flatten)

gru GRU

dropout_2 Dropout

gru_1 GRU

dropout_3 Dropout

dense_1 Dense

input [( 30 5)]

[( 30 5)]output

input ( 30 5)

( 29 24)output

input ( 29 24)

( 9 24)output

input ( 9 24)

( 216)output

input ( 216)

( 15)output

input ( 15)

( 15)output

input ( 15)

( 20)output

input ( 20)

( 20)output

input ( 20)

( 1)output

Figure 11 Internal network structure of the optimized high score hybrid CNN-GRU2 model for Borkena Station

12 Computational Intelligence and Neuroscience

average rolling time windows is presented in Tables 5 6 and7 Regardless of the combination of these conditions theCNN-GRUmodel showed promising results in most of thesescenarios (Tables 5 and 7) -e highest scores are presentedhere

(1) In daily streamflow simulation for Borkena stationCNN-GRU1 scored 794 366 085 and 063 and forUTRB station CNN-GRU2 scored 4561 2179 057and 064 for RMSE MAE R2 and training time perepoch respectively

(2) In weekly rolled streamflow simulation for Borkenastation CNN-LSTM2 scored 733 386 087 and052 and for UTRB station GRU scored 2521 1483077 and 556 for RMSE MAE R2 and training timeper epoch respectively

(3) In monthly rolled streamflow simulation, the CNN-GRU2 model showed high performance, with scores of 5.15, 3.18, 0.92, and 0.78 for Borkena station and 17.98, 12.99, 0.83, and 0.71 for UTRB station, which are RMSE, MAE, R2, and training time per epoch, respectively.

Moreover, of the proposed four hybrid models, CNN-GRU2, the model designed with a single 1D CNN layer, showed the most promising results on trial model 1 (UTRB) and model 3, as shown in Tables 5 and 7. In contrast, GRU on model 2 (UTRB), CNN-LSTM2 on model 2 (Borkena), and CNN-GRU1 on model 1 (Borkena) shared the second most promising results. Streamflow simulation with the CNN-GRU2 model generally showed higher performance than the other tested hybrid deep learning models and the state-of-the-art LSTM, GRU, and MLP models. In line with our objectives, the results are discussed under different variability conditions in the following paragraphs.

6.1. Climatic Region Variability. Testing models in different climatic conditions with historical data will likely provide robust deep learning models for streamflow simulation in the future [51]. Hence, this research also tested the proposed models in two climatic regions, and irrespective of climatic and time window variation, the CNN-GRU model displayed the highest scores in both tested case study areas.

Figure 12: Internal network structure of the optimized high score hybrid CNN-GRU2 model for UTRB station. Per-layer input and output shapes (batch and subsequence dimensions omitted): InputLayer (45, 15) → TimeDistributed(Conv1D) (44, 8) → TimeDistributed(AveragePooling1D) (22, 8) → TimeDistributed(Flatten) (176) → GRU (20) → Dropout (20) → GRU (30) → Dropout (30) → Dense (1).

6.2. Input Combination Variability. The input combination of minimum temperature (Tmin) with precipitation (P) does not show a significant performance increment at the Borkena station (Tables 5 and 6). In some scenarios, adopting P only as input increases the performance of the model (Table 7). In contrast, for UTRB, streamflow simulation with all input variables, that is, Tmin, Tmax, and P, showed significant performance increments (Table 7).
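In code terms, these input combinations amount to selecting different feature columns before windowing the series into supervised samples. Below is a hedged sketch of that framing; the variable names and random data are illustrative, and for the TimeDistributed CNN front end shown in Figures 11 and 12 the resulting windows would additionally be grouped into subsequences.

```python
# Hedged sketch: frame a multivariate daily series as one-step-ahead
# supervised samples of the past `lag` days (names and data are invented).
import numpy as np

def make_windows(features, target, lag=30):
    X, y = [], []
    for t in range(lag, len(target)):
        X.append(features[t - lag:t])  # inputs: past `lag` days of all features
        y.append(target[t])            # output: next-day streamflow
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
p = rng.random(400)      # precipitation series (illustrative)
tmin = rng.random(400)   # minimum temperature series (illustrative)
q = rng.random(400)      # streamflow series (illustrative)

# "P only" versus "P + Tmin" input combinations
X_p, y_p = make_windows(p[:, None], q)
X_pt, y_pt = make_windows(np.column_stack([p, tmin]), q)
print(X_p.shape, X_pt.shape)  # (370, 30, 1) (370, 30, 2)
```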

6.3. Average Rolling Time Window Variability. Streamflow simulation without rolling the daily time series data had deficient performance compared to the monthly rolled average time series. This could be because the time series noise in UTRB is more visible than that in Borkena station. As a result, the performance increment from daily to monthly rolled window models is much higher in UTRB than in Borkena station.
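As an illustration of this preprocessing step, the sketch below smooths a daily series with weekly (7-day) and monthly (30-day) rolling averages using pandas; the window lengths follow the paper's weekly and monthly rolled settings, while the column names and data are invented for illustration.

```python
# Illustrative sketch of the rolling-average preprocessing (column names
# and data are invented, not taken from the study datasets).
import pandas as pd

df = pd.DataFrame(
    {"P": [0.0, 5.2, 1.1, 0.0, 12.4, 3.3, 0.7, 0.0, 2.9, 8.1] * 9,
     "Q": [10.2, 10.8, 11.5, 11.1, 14.9, 16.2, 15.0, 13.7, 13.1, 14.4] * 9},
    index=pd.date_range("2011-01-01", periods=90, freq="D"),
)

weekly = df.rolling(window=7).mean().dropna()    # weekly rolled average
monthly = df.rolling(window=30).mean().dropna()  # monthly rolled average
print(monthly.head())
```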

Generally, the monthly rolled time window with the CNN-GRU2 model showed the top performance results in both stations (Table 7). The corresponding training and test loss functions of this optimized high score hybrid model for both stations are displayed in Figure 9. Consequently, Figure 10 compares the true values and predicted values of this model. The optimized hybrid model boosts the performance score and lowers the training time per epoch much better than the GRU and LSTM models. The model input features and Keras tuner optimized hyperparameter values for both stations, with their MSE scores, are presented in Tables 8 and 9. Moreover, the internal network structures of these models are shown in Figures 11 and 12, which display the model input and output parameter matrices for each layer.
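To make the tabulated configuration concrete, here is a minimal Keras sketch of the Borkena CNN-GRU2 model reconstructed from Table 8 and Figure 11. It is a sketch under stated assumptions, not the authors' released code: the variable subsequence dimension implied by the TimeDistributed wrappers and the ReLU activation are assumptions, and X_train/y_train are hypothetical names.

```python
# Hedged reconstruction of the CNN-GRU2 architecture (Table 8 / Figure 11).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_gru2(steps=30, n_features=5, filters=24, kernel=2, pool=3,
                   gru1=15, drop1=0.1, gru2=20, drop2=0.2, lr=1e-4):
    model = models.Sequential([
        # Conv block applied to each subsequence of shape (steps, n_features)
        layers.TimeDistributed(
            layers.Conv1D(filters, kernel, activation="relu"),
            input_shape=(None, steps, n_features)),   # (30, 5) -> (29, 24)
        layers.TimeDistributed(layers.AveragePooling1D(pool_size=pool)),  # -> (9, 24)
        layers.TimeDistributed(layers.Flatten()),     # -> (216,) per subsequence
        layers.GRU(gru1, return_sequences=True),      # GRU layer 1: 15 units
        layers.Dropout(drop1),
        layers.GRU(gru2),                             # GRU layer 2: 20 units
        layers.Dropout(drop2),
        layers.Dense(1),                              # single-step streamflow output
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr), loss="mse")
    return model

# Borkena configuration (Table 8); the UTRB variant (Table 9) would instead use
# steps=45, n_features=15, filters=8, pool=2, gru1=20, drop1=0.3, gru2=30.
model = build_cnn_gru2()
model.summary()
# model.fit(X_train, y_train, epochs=80, batch_size=20)  # epochs/batch per Table 8
```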

7. Conclusions

This study presented a comparative analysis of different hybrid deep learning algorithms against state-of-the-art machine learning models for one-step daily streamflow simulation at two river basin or subcatchment streamflow outlets. The proposed algorithms for this study are the CNN-LSTM and CNN-GRU hybrid deep learning models, each having one or two 1D CNN layers, compared with the classic MLP, LSTM, and GRU models. This study conducted a series of experiments to observe the performance variation of the proposed models by introducing different input combinations, rolling time windows, and climatic conditions for streamflow simulation. The following points summarize the significant findings of this study:

(i) CNN-GRU2, with one 1D CNN layer, showed the best simulation performance, reporting the lowest RMSE and MAE and the highest R2 of all models in both case study areas. Such results indicate that the performance of the selected architectures holds irrespective of the climatic characteristics of the basins.

(ii) Combining temperature data with precipitation as input to the proposed models yielded a minimal performance increment at Borkena station compared to the UTRB case study area, which clearly shows that temperature data scarcity has a greater performance loss implication at the UTRB station. On the other hand, the Borkena station has more significant natural streamflow variability than UTRB, which is also reflected in the model results. This implies that catchment response should be considered before any deep learning model application.

(iii) Rolling the time window of the input and output time series for streamflow simulation using the proposed models increases performance considerably more in the UTRB than in the Borkena station.

(iv) The analysis results also showed that the training time per epoch for the hybrid deep learning models is much lower than that of the GRU and LSTM models.

Deep learning models usually require massive datasets, and their performance drops with small to medium datasets. However, given the acceptable results from this case study, and considering the hybrid models' hyperparameter sensitivity and complexity, future research may further design optimized configurations. Moreover, these hybrid models can be tested for long-term streamflow simulation in ephemeral, seasonal, and perennial river systems and in other fields of study. Our future research will try to combine the best-performing hybrid deep learning models from this study with remote sensing datasets to address the problems we experience in ungauged catchments.

Data Availability

The raw hydrological and meteorological datasets used for the Borkena watershed are available from the corresponding author upon request. However, authorization letters are required from the Ministry of Water, Irrigation and Energy (MoWIE) of Ethiopia (http://mowie.gov.et) and the National Meteorological Agency of Ethiopia (NMA) (http://www.ethiomet.gov.et), whereas for UTRB the datasets can be retrieved from an online repository (http://hydrogate.unipg.it/wrme).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The corresponding author acknowledges the Ministry of Water, Irrigation and Energy of Ethiopia (MoWIE) and the Italian National Research Center (Centro Nazionale di Ricerca, CNR) for the hydrological and meteorological datasets. The corresponding author also thanks Dr. Fiseha Behulu for his advice and serious follow-up.


References

[1] Z. Zhang, Q. Zhang, and V. P. Singh, "Univariate streamflow forecasting using commonly used data-driven models: literature review and case study," Hydrological Sciences Journal, vol. 63, no. 7, pp. 1091–1111, 2018.
[2] L. Ni, D. Wang, V. P. Singh et al., "Streamflow and rainfall forecasting by two long short-term memory-based models," Journal of Hydrology, vol. 583, Article ID 124296, 2020.
[3] X. Yuan, C. Chen, X. Lei, Y. Yuan, and R. Muhammad Adnan, "Monthly runoff forecasting based on LSTM-ALO model," Stochastic Environmental Research and Risk Assessment, vol. 32, no. 8, pp. 2199–2212, 2018.
[4] A. Sichangi, L. Wang, and Z. Hu, "Estimation of river discharge solely from remote-sensing derived data: an initial study over the Yangtze river," Remote Sensing, vol. 10, no. 9, p. 1385, 2018.
[5] M. J. Tourian, N. Sneeuw, and A. Bardossy, "A quantile function approach to discharge estimation from satellite altimetry (ENVISAT)," Water Resources Research, vol. 49, no. 7, pp. 4174–4186, 2013.
[6] A. W. Sichangi, L. Wang, K. Yang et al., "Estimating continental river basin discharges using multiple remote sensing data sets," Remote Sensing of Environment, vol. 179, pp. 36–53, 2016.
[7] S. Aljahdali, A. Sheta, and H. Turabieh, "River flow forecasting: a comparison between feedforward and layered recurrent neural network," Learning and Analytics in Intelligent Systems, vol. 43, pp. 523–532, 2020.
[8] D. Couta, Y. Zhang, and Y. Li, "River flow forecasting using long short-term memory," DEStech Transactions on Computer Science and Engineering, vol. 16, 2019.
[9] Y. Bai, N. Bezak, K. Sapac, M. Klun, and J. Zhang, "Short-term streamflow forecasting using the feature-enhanced regression model," Water Resources Management, vol. 33, no. 14, pp. 4783–4797, 2019.
[10] L. C. D. Campos, L. Goliatt da Fonseca, T. L. Fonseca, G. D. de Abreu, L. F. Pires, and Y. Gorodetskaya, "Short-term streamflow forecasting for Paraíba do Sul river using deep learning," Progress in Artificial Intelligence, vol. 43, pp. 507–518, 2019.
[11] B. B. Sahoo, R. Jha, A. Singh, and D. Kumar, "Long short-term memory (LSTM) recurrent neural network for low-flow hydrological time series forecasting," Acta Geophysica, vol. 67, no. 5, pp. 1471–1481, 2019.
[12] M. Rahimzad, A. Moghaddam Nia, H. Zolfonoon, J. Soltani, A. Danandeh Mehr, and H.-H. Kwon, Performance Comparison of an LSTM-Based Deep Learning Model versus Conventional Machine Learning Algorithms for Streamflow Forecasting, Water Resources Management, New York, NY, USA, 2021.
[13] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: encoder–decoder approaches," in Proceedings of the SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp. 103–111, Doha, Qatar, October 2014.
[14] C. Chen, W. He, J. Li, and Z. Tang, "A novel hybrid CNN-LSTM scheme for nitrogen oxide emission prediction in FCC unit," Mathematical Problems in Engineering, vol. 2020, Article ID 8071810, 2020.
[15] T. Li, M. Hua, and X. Wu, "A hybrid CNN-LSTM model for forecasting particulate matter (PM2.5)," IEEE Access, vol. 8, pp. 26933–26940, 2020.
[16] I. E. Livieris, E. Pintelas, and P. Pintelas, "A CNN–LSTM model for gold price time-series forecasting," Neural Computing and Applications, vol. 32, 2020.
[17] A. A. M. Ahmed, R. C. Deo, N. Raj et al., "Deep learning forecasts of soil moisture: convolutional neural network and gated recurrent unit models coupled with satellite-derived MODIS observations and synoptic-scale climate index data," Remote Sensing, vol. 13, no. 4, p. 554, 2021.
[18] Y. Liu, T. Zhang, A. Kang, J. Li, and X. Lei, "Research on runoff simulations using deep-learning methods," Sustainability, vol. 13, no. 3, p. 1336, 2021.
[19] A. U. Muhammad, X. Li, and J. Feng, "Using LSTM GRU and hybrid models for streamflow forecasting," Machine Learning and Intelligent Communications, vol. 294, pp. 510–524, 2019.
[20] R. Barzegar, M. T. Aalami, and J. Adamowski, "Short-term water quality variable prediction using a hybrid CNN-LSTM deep learning model," Stochastic Environmental Research and Risk Assessment, vol. 34, no. 2, pp. 415–433, 2020.
[21] S. P. Van, H. M. Le, D. V. Thanh, T. D. Dang, H. H. Loc, and D. T. Anh, "Deep learning convolutional neural network in rainfall-runoff modelling," Journal of Hydroinformatics, vol. 22, no. 3, pp. 541–561, 2020.
[22] P. Parisouj, H. Mohebzadeh, and T. Lee, "Employing machine learning algorithms for streamflow prediction: a case study of four river basins with different climatic zones in the United States," Water Resources Management, vol. 34, no. 13, pp. 4113–4131, 2020.
[23] A. Annis and F. Nardi, "Integrating VGI and 2D hydraulic models into a data assimilation framework for real time flood forecasting and mapping," Geo-Spatial Information Science, vol. 22, no. 4, pp. 223–236, 2019.
[24] B. M. Fiseha, S. G. Setegn, A. M. Melesse, E. Volpi, and A. Fiori, "Impact of climate change on the hydrology of upper Tiber River Basin using bias corrected regional climate model," Water Resources Management, vol. 28, no. 5, pp. 1327–1343, 2014.
[25] S. Rania, B. Waad, and E. Nadia, "Hybrid feature selection method based on the genetic algorithm and pearson correlation coefficient," Machine Learning Paradigms: Theory and Application, Studies in Computational Intelligence, vol. 801, 2000.
[26] A. D. Mehr and A. H. Gandomi, "MSGP-LASSO: an improved multi-stage genetic programming model for streamflow prediction," Information Sciences, vol. 561, pp. 181–195, 2021.
[27] R. Kun, F. Wei, Q. Jihong, Z. Xia, and S. Xiaoyu, "Comparison of eight filter-based feature selection methods for monthly streamflow forecasting – three case studies on CAMELS data sets," Journal of Hydrology, vol. 586, Article ID 124897, 2020.
[28] O. Oyebode, "Evolutionary modelling of municipal water demand with multiple feature selection techniques," Journal of Water Supply: Research & Technology - Aqua, vol. 68, no. 4, pp. 264–281, 2019.
[29] R. Dehghani, H. Torabi Poudeh, and Z. Izadi, "Dissolved oxygen concentration predictions for running waters with using hybrid machine learning techniques," Modeling Earth Systems and Environment, vol. 597, 2021.
[30] N. Yuvaraj, V. Chang, B. Gobinathan et al., "Automatic detection of cyberbullying using multi-feature based artificial intelligence with deep decision tree classification," Computers & Electrical Engineering, vol. 92, Article ID 107186, 2021.
[31] B. Ratner, "The correlation coefficient: its values range between +1/−1, or do they?" Journal of Targeting, Measurement and Analysis for Marketing, vol. 17, no. 2, pp. 139–142, 2009.
[32] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, "Sliding window-based frequent pattern mining over data streams," Information Sciences, vol. 179, no. 22, pp. 3843–3865, 2009.
[33] E. Zivot and J. Wang, "Rolling analysis of time series," in Modeling Financial Time Series with S-Plus, pp. 299–346, Springer, New York, NY, USA, 2003.
[34] Y. Wang, W. Liao, and Y. Chang, "Gated recurrent unit network-based short-term photovoltaic forecasting," Energies, vol. 11, no. 8, p. 2163, 2018.
[35] O. B. Sezer, M. U. Gudelek, and A. M. Ozbayoglu, "Financial time series forecasting with deep learning: a systematic literature review: 2005-2019," Applied Soft Computing, vol. 90, Article ID 106181, 2020.
[36] Q. Zou, Q. Xiong, Q. Li, H. Yi, Y. Yu, and C. Wu, "A water quality prediction method based on the multi-time scale bidirectional long short-term memory network," Environmental Science and Pollution Research, vol. 27, no. 14, pp. 16853–16864, 2020.
[37] S.-C. Wang, "Artificial neural network," Interdisciplinary Computing in Java Programming, vol. 26, pp. 81–100, 2003.
[38] V. Nourani, A. Molajou, H. Najafi, and A. Danandeh Mehr, "Emotional ANN (EANN): a new generation of neural networks for hydrological modeling in IoT," in Artificial Intelligence in IoT, Transactions on Computational Science and Computational Intelligence, pp. 45–61, Springer, New York, NY, USA, 2019.
[39] L. Yan, J. Feng, and T. Hang, "Small watershed stream-flow forecasting based on LSTM," Advances in Intelligent Systems and Computing, vol. 935, pp. 1006–1014, 2019.
[40] S. Zhu, X. Luo, X. Yuan, and Z. Xu, "An improved long short-term memory network for streamflow forecasting in the upper Yangtze River," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 9, 2020.
[41] A. Osama, A. Onur, K. Serkan, G. Moncef, and I. D. Inman, "Real-time vibration-based structural damage detection using one-dimensional convolutional neural networks," Journal of Sound and Vibration, vol. 388, pp. 154–170, 2017.
[42] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[43] H.-C. Shin, H. R. Roth, M. Gao et al., "Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1285–1298, 2016.
[44] A. Essien and C. Giannetti, "A deep learning framework for univariate time series prediction using convolutional LSTM stacked autoencoders," in Proceedings of the IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA), pp. 1–6, Alberobello, Italy, July 2019.
[45] D. S. Abdelminaam, F. H. Ismail, M. Taha, A. Taha, E. H. Houssein, and A. Nabil, "CoAID-DEEP: an optimized intelligent framework for automated detecting COVID-19 misleading information on Twitter," IEEE Access, vol. 9, pp. 27840–27867, 2021.
[46] O. Kazakov and O. Mikheenko, "Transfer learning and domain adaptation based on modeling of socio-economic systems," Business Inform, vol. 14, no. 2, pp. 7–20, 2020.
[47] B. Wang and N. Z. Gong, "Stealing hyperparameters in machine learning," in Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 36–52, San Francisco, CA, USA, May 2018.
[48] C. Boyraz and S. N. Engin, "Streamflow prediction with deep learning," in Proceedings of the 6th International Conference on Control Engineering & Information Technology (CEIT), pp. 1–5, Istanbul, Turkey, October 2018.
[49] E. K. Jackson, W. Roberts, B. Nelsen, G. P. Williams, E. J. Nelson, and D. P. Ames, "Introductory overview: error metrics for hydrologic modelling - a review of common practices and an open source library to facilitate use and adoption," Environmental Modelling & Software, vol. 119, pp. 32–48, 2019.
[50] S. Kumar, T. Roshni, and D. Himayoun, "A comparison of emotional neural network (ENN) and artificial neural network (ANN) approach for rainfall-runoff modelling," Civil Engineering Journal, vol. 5, no. 10, pp. 2120–2130, 2019.
[51] P. Bai, X. Liu, and J. Xie, "Simulating runoff under changing climatic conditions: a comparison of the long short-term memory network with two conceptual hydrologic models," Journal of Hydrology, vol. 592, Article ID 125779, 2021.




Page 9: Multivariate Streamflow Simulation Using Hybrid Deep

Daily Multivariate Timeseries Data

Daily Weekly RolledAverage

Borkena StationsP and Tmin of the past 30 days

Upper Tiber StationsP Tmin and Tmax of the Past 45 days

Inpu

t

1D co

nv

Aver

ageP

oolin

g

Flat

ten

LSTM

or G

RU

LSTM

or G

RU

Dro

pout

Dro

pout

Out

put

Den

se(s

ingl

e out

put)

Both stations Daily Weekly and Monthlyrolled single step Streamflow

Monthly RolledAverage

Figure 8 -e basic architecture of the proposed CNN-LSTM or CNN-GRU models

Table 4 Model hyperparameter choices or value ranges for optimization by Keras tuner

No HyperparametersValue rangeslowastlowast

Choices DefaultMin Max Step

1 Conv_1_filter 8 32 8 lowast lowast

2 Conv_1_kernal lowast lowast lowast 2 or 3 lowast

3 Conv_1_pool_size lowast lowast lowast 2 or 3 lowast

4 Conv_2_filter 8 32 8 lowast lowast

5 Conv_2_kernal lowast lowast lowast 2 or 3 lowast

6 Conv_2_pool_size lowast lowast lowast 2 or 3 lowast

7 CNN-LSTM1 CNN-LSTM2 CNN-GRU1 CNN-GRU2 LSTM GRU or MLP layer 1units 5 30 5 lowast lowast

8 Dropout 1 00 03 01 lowast 02

9 CNN-LSTM1 CNN-LSTM2 CNN-GRU1 CNN-GRU2 LSTM GRU or MLP layer 2units 5 30 5 lowast lowast

10 Dropout 2 00 03 01 lowast 0211 Learning rate lowast lowast lowast 1e-2 1e-3 or 1e-4 lowast

12 Number of epochs 10 100 10 lowast lowast

13 Number of batch sizes 10 100 10 lowast lowast

lowastlowastValue ranges or choices for optimization by Keras tuner (objective ldquovalidation lossrdquo max trials 20 and executions per trial 3) lowastNot applicable

Table 5 Daily streamflow simulation performance comparison of the proposed models for different input variables and climaticconditions

Model 1

Borkena UTRBP+Tmin P P+Tmin +Tmax P

R M R2 T R M R2 T R M R2 T R M R2 TM A T M A T M A T M A TS E P S E P S E P S E PE Elowast (sec) E Elowast (sec) E Elowast (sec) E Elowast (sec)

MLP 991 501 077 089 938 463 079 063 4911 2274 049 078 5657 2814 033 041GRU 878 437 082 361 794 364 085 332 4663 2089 055 261 5109 2674 045 339LSTM 841 409 083 235 965 487 078 292 4864 2279 051 386 4859 2500 051 598CNN-LSTM1 809 407 084 046 857 467 082 041 5120 2295 045 119 5616 2655 034 057CNN-LSTM2 799 409 085 072 914 450 080 045 4538 2185 057 082 5157 2584 044 185CNN-GRU1 794 366 085 063 832 409 083 086 5506 2349 037 116 5242 2498 043 083CNN-GRU2 907 419 080 101 843 426 083 028 4561 2179 057 064 4996 2538 048 068lowastTTPE (training time per epoch) -e bold values indicate the highest performance score

Computational Intelligence and Neuroscience 9

Table 6 Weekly rolled streamflow simulation performance comparison of the proposed models for different input variables and climaticconditions

Model 2

Borkena UTRBP+Tmin P P+Tmin+Tmax P

R M R2 T R M R2 T R M R2 T R M R2 TM A T M A T M A T M A TS E P S E P S E P S E PE Elowast (sec) E Elowast (sec) E Elowast (sec) E Elowast (sec)

MLP 811 429 084 023 733 419 087 022 3301 2001 060 074 3803 2517 047 079GRU 759 371 086 204 715 413 087 243 2521 1483 077 556 3139 1926 064 1679LSTM 841 401 082 298 793 391 084 127 3107 1887 065 355 3107 1949 065 269CNN-LSTM1 790 409 085 078 772 425 085 063 2804 1733 071 093 3457 2192 057 062CNN-LSTM2 733 386 087 052 763 425 086 055 2845 1666 071 114 3504 2177 055 156CNN-GRU1 783 394 085 044 791 431 085 050 3057 1801 066 232 3514 2258 055 063CNN-GRU2 873 461 081 043 843 435 082 097 2781 1699 072 437 3376 2301 059 101lowastTTPE (training time per epoch) -e bold values indicate the highest performance score

Table 7 Monthly rolled streamflow simulation performance comparison of the proposed models for different input variables and climaticconditions

Model 2

Borkena UTRBP+Tmin P P+Tmin+Tmax P

R M R2 T R M R2 T R M R2 T R M R2 TM A T M A T M A T M A TS E P S E P S E P S E PE Elowast (sec) E Elowast (sec) E Elowast (sec) E Elowast (sec)

MLP 668 437 087 058 557 380 091 041 2024 1384 078 044 2879 2105 056 041GRU 515 352 091 162 522 306 092 331 2079 1430 077 1663 2647 2008 063 470LSTM 555 349 091 275 576 351 090 251 2149 1511 076 415 3229 2447 045 509CNN-LSTM1 605 442 089 098 558 340 091 058 2153 1487 076 129 2748 2119 060 042CNN-LSTM2 536 317 092 141 687 405 086 144 1907 1353 081 070 2779 2090 059 042CNN-GRU1 576 362 090 052 577 356 090 069 1931 1378 080 487 2867 2107 057 308CNN-GRU2 536 325 092 062 515 318 092 078 1798 1299 083 071 2777 2036 059 122lowastTTPE (training time per epoch) -e bold values indicate the highest performance score

0

0102030405

Loss

06070809

10 20 30 40

Epochs

50 60 70 80

Train

Test

(a)

00102030405Lo

ss 06070809

10 20 30 40

Epochs

50 60

Train

Test

(b)

Figure 9 Training and test loss function of the optimized high score hybrid model (a) CNN-GRU2 model for Borkena Station (b) CNN-GRU2 model for UTRB Station

10 Computational Intelligence and Neuroscience

MAE 1n

1113944

n

i1Q

t0bs minus Q

tsim

11138681113868111386811138681113868111386811138681113868 (9)

where Qobs discharge observed Qsim discharge simu-lated and n number of observations -e range of R2 liesbetween 0 and 1 representing respectively no correlationand a perfect correlation between observed and simulatedvalues whereas smallest RMSE and MAE scores or valuesclose to zero direct to the best model performance

6 Results

Streamflow simulation result with the proposed seven deeplearning architectures different input time window seriestwo climatic regions two input combinations and three

0

0

10

20

30

Test

Q (m

3se

c)

40

50

60

70

10 20 30 40

Predicted Q (m3sec)

50 60 70

Linear_fit

Data_points

2011

-09

2012

-01

2012

-05

2012

-09

2013

-01

2013

-05

2013

-09

2014

-01

2014

-05

2014

-09

2015

-01

010203040

Q (m

3se

c) 506070

Time (Year)

Test

Predicted

(a)

0

0255075

Test

Q (m

3se

c)100125150175

25 50 75 100

Predicted Q (m3sec)

125 150 175

Linear_fit

Data_points

1975

-01

1975

-07

1976

-01

1976

-07

1977

-01

1977

-07

1978

-01

1978

-07

1979

-01

0255075

100

Q (m

3se

c) 125150175

Time (Year)

Test

Predicted

(b)

Figure 10 Comparison of true values and predicted values of the optimized high score hybrid model (a) CNN-GRU2 model for BorkenaStation (b) CNN-GRU2 model for UTRB Station

Table 8 Best hybrid model type input feature and Keras tuneroptimized hyperparameter values for Borkena station with its MSEscore

Hyperparameters CNN-GRU2Monthly rolled P

Conv_1_filter 24Conv_1_kernal 2Conv_1_pool_size 3GRU_l1_units 15Dropout1 01GRU_l2_units 20Dropout2 02Learning rate 00001Number of epochs 80Number of batch sizes 20Score (MSE) 0083

Computational Intelligence and Neuroscience 11

Table 9 Best hybrid model type input features and Keras tuner optimized hyperparameter values for UTRB station with its MSE score

Hyperparameters CNN-GRU2Monthly rolled P Tmin and Tmax

Conv_1_filter 8Conv_1_kernal 2Conv_1_pool_size 2GRU_l1_units 20Dropout1 03GRU_l2_units 30Dropout2 02Learning rate 00001Number of epochs 60Number of batch sizes 40Score (MSE) 0193

time_distributed_3_input InputLayer

time_distributed_3(conv1d_1) TimeDistributed(Conv1d)

time_distributed_4(average_pooling1d_1) TimeDistributed(AveragePooling1D)

time_distributed_5(flatten_1) TimeDistributed(Flatten)

gru GRU

dropout_2 Dropout

gru_1 GRU

dropout_3 Dropout

dense_1 Dense

input [( 30 5)]

[( 30 5)]output

input ( 30 5)

( 29 24)output

input ( 29 24)

( 9 24)output

input ( 9 24)

( 216)output

input ( 216)

( 15)output

input ( 15)

( 15)output

input ( 15)

( 20)output

input ( 20)

( 20)output

input ( 20)

( 1)output

Figure 11 Internal network structure of the optimized high score hybrid CNN-GRU2 model for Borkena Station

12 Computational Intelligence and Neuroscience

average rolling time windows is presented in Tables 5 6 and7 Regardless of the combination of these conditions theCNN-GRUmodel showed promising results in most of thesescenarios (Tables 5 and 7) -e highest scores are presentedhere

(1) In daily streamflow simulation for Borkena stationCNN-GRU1 scored 794 366 085 and 063 and forUTRB station CNN-GRU2 scored 4561 2179 057and 064 for RMSE MAE R2 and training time perepoch respectively

(2) In weekly rolled streamflow simulation for Borkenastation CNN-LSTM2 scored 733 386 087 and052 and for UTRB station GRU scored 2521 1483077 and 556 for RMSE MAE R2 and training timeper epoch respectively

(3) In monthly rolled streamflow simulation the CNN-GRU2 model showed high performance with 515318 092 and 078 scores for Borkena station and1798 1299 083 and 071 for UTRB station which

are RMSE MAE R2 and training time per epochrespectively

Moreover from the proposed four hybrid models CNN-GRU2 or the model designed by a single 1D CNN layershowed the highest promising result on trial model 1(UTRB)and model 3 as shown in Tables 5 and 7 In contrast GRUon model 2 (UTRB) CNN-LSTM2 on model 2 (Borkena)and CNN-GRU1 on model 1 (Borkena) shared the second-highest promising result Streamflow simulation with theCNN-GRU2 model generally showed the highest perfor-mance than the other tested hybrid deep learning modelsand state-of-the-art LSTM GRU and MLP models In linewith our objectives the result is discussed with differentvariability conditions in the following paragraphs

61 Climatic Region Variability Testing models in differentclimatic conditions with historical data will likely providerobust deep learning models for streamflow simulation inthe future [51] Hence this research also tested different

time_distributed_15_input InputLayer

time_distributed_15 (conv1d_5) TimeDistributed (Conv1d)

time_distributed_16 (average_pooling1d_5) TimeDistributed (AveragePooling1D)

time_distributed_17 (flatten_5) TimeDistributed (Flatten)

gru_10 GRU

dropout_10 Dropout

gru_11 GRU

dropout_11 Dropout

dense_5 Dense

input [( 45 15)]

[( 45 15)]output

input ( 45 15)

( 44 8)output

input ( 44 8)

( 22 8)output

input ( 22 8)

( 176)output

input ( 176)

( 20)output

input ( 20)

( 20)output

input ( 20)

( 30)output

input ( 30)

( 30)output

input ( 30)

( 1)output

Figure 12 Internal network structure of the optimized high score hybrid CNN-GRU2 model for UTRB Station

Computational Intelligence and Neuroscience 13

models in two climatic regions and irrespective of climaticand time window variation the CNN-GRUmodel displayedthe highest scores on tested case study areas

62 Input Combination Variability Input combinationminimum temperature (Tmin) with precipitation (P) doesnot show significant performance increment in the Borkenastation (Tables 5 and 6) In some scenarios adopting P onlyas input increases the performance of the model (Table 7) Incontrast for UTRB streamflow simulation with all inputvariables or Tmin Tmax and P showed significant perfor-mance increments (Table 7)

63 Average Rolling Time Window VariabilityStreamflow simulation without rolling daily time series datahad deficient performance compared to monthly rolledaverage time series -is could be because the time seriesnoise in UTRB is visible compared to that in Borkenastation As a result performance increment from daily tomonthly rolled window models is much higher in UTRBthan in Borkena station

Generally the monthly rolled time window with theCNN-GRU2 model showed the top performance results inboth stations (Table 7) -e corresponding training and testloss functions of this optimized high score hybrid model forboth stations are displayed in Figure 9 ConsequentlyFigure 10 compares the true values and predicted values ofthis model -e optimized hybrid model boosts the per-formance score and lowers the training time per epochmuchbetter than GRU and LSTM models -is model inputfeature and Keras tuner optimized hyperparameter valuesfor both stations with its MSE score are presented in Tables 8and 9 Moreover the internal network structures of thesemodels are also shown in Figures 11 and 12 which displaythe model input and output parameter matrices for eachlayer

7 Conclusions

-is study showed a comparative analysis of different hybriddeep learning algorithms with state-of-the-art machinelearning models for one-step daily streamflow simulation attwo river basins or subcatchment stream flow outlets -eproposed algorithms for this study are CNN-LSTM andCNN-GRU hybrid deep learning models each model havingone or two 1D CNN layers with the classic MLP LSTM andGRU models -is study conducted a series of experimentsto observe the performance variation of the proposedmodels by introducing different input combinations rollingtime windows and climatic conditions for streamflowsimulation -e following list of points will summarize thesignificant findings of this study

(i) CNN-GRU2, with one 1D CNN layer, showed the best simulation performance, reporting the lowest RMSE and MAE and the highest R2 of all models in both case study areas. Such results indicate that the performance of the selected architectures holds irrespective of the climatic characteristics of the basins.

(ii) Combining temperature data with precipitation as input to the proposed models yielded a minimal performance increment at the Borkena station compared to the UTRB case study area, which clearly shows that temperature data scarcity implies a greater performance loss at the UTRB station. On the other hand, the Borkena station has greater natural streamflow variability than UTRB, which is also reflected in the model results. This implies that catchment response should be considered before any deep learning model application.

(iii) Rolling the time window of the input and output time series for streamflow simulation with the proposed models increases performance considerably more in the UTRB than at the Borkena station.

(iv) The analysis results also showed that the training time per epoch of the hybrid deep learning models is much lower than that of the GRU and LSTM models.

Deep learning models usually require massive datasets, and their performance drops with small to medium datasets. However, given the acceptable results of this case study, and considering the hyperparameter sensitivity and complexity of the hybrid models, future research may design further optimized configurations. These hybrid models can also be tested for long-term streamflow simulation in ephemeral, seasonal, and perennial river systems, as well as in other fields of study. Our future research will seek to couple the best-performing hybrid deep learning models of this study with remote sensing datasets to address the problems encountered in ungauged catchments.

Data Availability

The raw hydrological and meteorological datasets used for the Borkena watershed are available from the corresponding author upon request. However, authorization letters are required from the Ministry of Water, Irrigation and Energy (MoWIE) of Ethiopia (http://mowie.gov.et) and the National Meteorological Agency of Ethiopia (NMA) (http://www.ethiomet.gov.et), whereas for UTRB the datasets can be retrieved from an online repository (http://hydrogate.unipg.it/wrme).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The corresponding author acknowledges the Ministry of Water, Irrigation and Energy of Ethiopia (MoWIE) and the Italian National Research Center (Centro Nazionale di Ricerca, CNR) for the hydrological and meteorological datasets. The corresponding author also thanks Dr. Fiseha Behulu for his advice and serious follow-up.


References

[1] Z. Zhang, Q. Zhang, and V. P. Singh, "Univariate streamflow forecasting using commonly used data-driven models: literature review and case study," Hydrological Sciences Journal, vol. 63, no. 7, pp. 1091-1111, 2018.
[2] L. Ni, D. Wang, V. P. Singh et al., "Streamflow and rainfall forecasting by two long short-term memory-based models," Journal of Hydrology, vol. 583, Article ID 124296, 2020.
[3] X. Yuan, C. Chen, X. Lei, Y. Yuan, and R. Muhammad Adnan, "Monthly runoff forecasting based on LSTM-ALO model," Stochastic Environmental Research and Risk Assessment, vol. 32, no. 8, pp. 2199-2212, 2018.
[4] A. Sichangi, L. Wang, and Z. Hu, "Estimation of river discharge solely from remote-sensing derived data: an initial study over the Yangtze river," Remote Sensing, vol. 10, no. 9, p. 1385, 2018.
[5] M. J. Tourian, N. Sneeuw, and A. Bardossy, "A quantile function approach to discharge estimation from satellite altimetry (ENVISAT)," Water Resources Research, vol. 49, no. 7, pp. 4174-4186, 2013.
[6] A. W. Sichangi, L. Wang, K. Yang et al., "Estimating continental river basin discharges using multiple remote sensing data sets," Remote Sensing of Environment, vol. 179, pp. 36-53, 2016.
[7] S. Aljahdali, A. Sheta, and H. Turabieh, "River flow forecasting: a comparison between feedforward and layered recurrent neural network," Learning and Analytics in Intelligent Systems, vol. 43, pp. 523-532, 2020.
[8] D. Couta, Y. Zhang, and Y. Li, "River flow forecasting using long short-term memory," DEStech Transactions on Computer Science and Engineering, vol. 16, 2019.
[9] Y. Bai, N. Bezak, K. Sapac, M. Klun, and J. Zhang, "Short-term streamflow forecasting using the feature-enhanced regression model," Water Resources Management, vol. 33, no. 14, pp. 4783-4797, 2019.
[10] L. C. D. Campos, L. Goliatt da Fonseca, T. L. Fonseca, G. D. de Abreu, L. F. Pires, and Y. Gorodetskaya, "Short-term streamflow forecasting for Paraíba do Sul river using deep learning," Progress in Artificial Intelligence, vol. 43, pp. 507-518, 2019.
[11] B. B. Sahoo, R. Jha, A. Singh, and D. Kumar, "Long short-term memory (LSTM) recurrent neural network for low-flow hydrological time series forecasting," Acta Geophysica, vol. 67, no. 5, pp. 1471-1481, 2019.
[12] M. Rahimzad, A. Moghaddam Nia, H. Zolfonoon, J. Soltani, A. Danandeh Mehr, and H.-H. Kwon, Performance Comparison of an LSTM-Based Deep Learning Model versus Conventional Machine Learning Algorithms for Streamflow Forecasting, Water Resources Management, New York, NY, USA, 2021.
[13] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: encoder-decoder approaches," in Proceedings of the SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp. 103-111, Doha, Qatar, October 2014.
[14] C. Chen, W. He, J. Li, and Z. Tang, "A novel hybrid CNN-LSTM scheme for nitrogen oxide emission prediction in FCC unit," Mathematical Problems in Engineering, vol. 2020, Article ID 8071810, 2020.
[15] T. Li, M. Hua, and X. Wu, "A hybrid CNN-LSTM model for forecasting particulate matter (PM2.5)," IEEE Access, vol. 8, pp. 26933-26940, 2020.
[16] I. E. Livieris, E. Pintelas, and P. Pintelas, "A CNN-LSTM model for gold price time-series forecasting," Neural Computing and Applications, vol. 32, 2020.
[17] A. A. M. Ahmed, R. C. Deo, N. Raj et al., "Deep learning forecasts of soil moisture: convolutional neural network and gated recurrent unit models coupled with satellite-derived MODIS observations and synoptic-scale climate index data," Remote Sensing, vol. 13, no. 4, p. 554, 2021.
[18] Y. Liu, T. Zhang, A. Kang, J. Li, and X. Lei, "Research on runoff simulations using deep-learning methods," Sustainability, vol. 13, no. 3, p. 1336, 2021.
[19] A. U. Muhammad, X. Li, and J. Feng, "Using LSTM GRU and hybrid models for streamflow forecasting," Machine Learning and Intelligent Communications, vol. 294, pp. 510-524, 2019.
[20] R. Barzegar, M. T. Aalami, and J. Adamowski, "Short-term water quality variable prediction using a hybrid CNN-LSTM deep learning model," Stochastic Environmental Research and Risk Assessment, vol. 34, no. 2, pp. 415-433, 2020.
[21] S. P. Van, H. M. Le, D. V. Thanh, T. D. Dang, H. H. Loc, and D. T. Anh, "Deep learning convolutional neural network in rainfall-runoff modelling," Journal of Hydroinformatics, vol. 22, no. 3, pp. 541-561, 2020.
[22] P. Parisouj, H. Mohebzadeh, and T. Lee, "Employing machine learning algorithms for streamflow prediction: a case study of four river basins with different climatic zones in the United States," Water Resources Management, vol. 34, no. 13, pp. 4113-4131, 2020.
[23] A. Annis and F. Nardi, "Integrating VGI and 2D hydraulic models into a data assimilation framework for real time flood forecasting and mapping," Geo-Spatial Information Science, vol. 22, no. 4, pp. 223-236, 2019.
[24] B. M. Fiseha, S. G. Setegn, A. M. Melesse, E. Volpi, and A. Fiori, "Impact of climate change on the hydrology of upper Tiber River Basin using bias corrected regional climate model," Water Resources Management, vol. 28, no. 5, pp. 1327-1343, 2014.
[25] S. Rania, B. Waad, and E. Nadia, "Hybrid feature selection method based on the genetic algorithm and Pearson correlation coefficient," Machine Learning Paradigms: Theory and Application, Studies in Computational Intelligence, vol. 801, 2000.
[26] A. D. Mehr and A. H. Gandomi, "MSGP-LASSO: an improved multi-stage genetic programming model for streamflow prediction," Information Sciences, vol. 561, pp. 181-195, 2021.
[27] R. Kun, F. Wei, Q. Jihong, Z. Xia, and S. Xiaoyu, "Comparison of eight filter-based feature selection methods for monthly streamflow forecasting - three case studies on CAMELS data sets," Journal of Hydrology, vol. 586, Article ID 124897, 2020.
[28] O. Oyebode, "Evolutionary modelling of municipal water demand with multiple feature selection techniques," Journal of Water Supply: Research & Technology - Aqua, vol. 68, no. 4, pp. 264-281, 2019.
[29] R. Dehghani, H. Torabi Poudeh, and Z. Izadi, "Dissolved oxygen concentration predictions for running waters with using hybrid machine learning techniques," Modeling Earth Systems and Environment, vol. 597, 2021.
[30] N. Yuvaraj, V. Chang, B. Gobinathan et al., "Automatic detection of cyberbullying using multi-feature based artificial intelligence with deep decision tree classification," Computers & Electrical Engineering, vol. 92, Article ID 107186, 2021.
[31] B. Ratner, "The correlation coefficient: its values range between +1/-1, or do they?" Journal of Targeting, Measurement and Analysis for Marketing, vol. 17, no. 2, pp. 139-142, 2009.
[32] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, "Sliding window-based frequent pattern mining over data streams," Information Sciences, vol. 179, no. 22, pp. 3843-3865, 2009.
[33] E. Zivot and J. Wang, "Rolling analysis of time series," in Modeling Financial Time Series with S-Plus, pp. 299-346, Springer, New York, NY, USA, 2003.
[34] Y. Wang, W. Liao, and Y. Chang, "Gated recurrent unit network-based short-term photovoltaic forecasting," Energies, vol. 11, no. 8, p. 2163, 2018.
[35] O. B. Sezer, M. U. Gudelek, and A. M. Ozbayoglu, "Financial time series forecasting with deep learning: a systematic literature review: 2005-2019," Applied Soft Computing, vol. 90, Article ID 106181, 2020.
[36] Q. Zou, Q. Xiong, Q. Li, H. Yi, Y. Yu, and C. Wu, "A water quality prediction method based on the multi-time scale bidirectional long short-term memory network," Environmental Science and Pollution Research, vol. 27, no. 14, pp. 16853-16864, 2020.
[37] S.-C. Wang, "Artificial neural network," Interdisciplinary Computing in Java Programming, vol. 26, pp. 81-100, 2003.
[38] V. Nourani, A. Molajou, H. Najafi, and A. Danandeh Mehr, "Emotional ANN (EANN): a new generation of neural networks for hydrological modeling in IoT," in Artificial Intelligence in IoT, Transactions on Computational Science and Computational Intelligence, pp. 45-61, Springer, New York, NY, USA, 2019.
[39] L. Yan, J. Feng, and T. Hang, "Small watershed stream-flow forecasting based on LSTM," Advances in Intelligent Systems and Computing, vol. 935, pp. 1006-1014, 2019.
[40] S. Zhu, X. Luo, X. Yuan, and Z. Xu, "An improved long short-term memory network for streamflow forecasting in the upper Yangtze River," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 9, 2020.
[41] A. Osama, A. Onur, K. Serkan, G. Moncef, and I. D. Inman, "Real-time vibration-based structural damage detection using one-dimensional convolutional neural networks," Journal of Sound and Vibration, vol. 388, pp. 154-170, 2017.
[42] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[43] H.-C. Shin, H. R. Roth, M. Gao et al., "Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1285-1298, 2016.
[44] A. Essien and C. Giannetti, "A deep learning framework for univariate time series prediction using convolutional LSTM stacked autoencoders," in Proceedings of the IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA), pp. 1-6, Alberobello, Italy, July 2019.
[45] D. S. Abdelminaam, F. H. Ismail, M. Taha, A. Taha, E. H. Houssein, and A. Nabil, "CoAID-DEEP: an optimized intelligent framework for automated detecting COVID-19 misleading information on Twitter," IEEE Access, vol. 9, pp. 27840-27867, 2021.
[46] O. Kazakov and O. Mikheenko, "Transfer learning and domain adaptation based on modeling of socio-economic systems," Business Inform, vol. 14, no. 2, pp. 7-20, 2020.
[47] B. Wang and N. Z. Gong, "Stealing hyperparameters in machine learning," in Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 36-52, San Francisco, CA, USA, May 2018.
[48] C. Boyraz and S. N. Engin, "Streamflow prediction with deep learning," in Proceedings of the 6th International Conference on Control Engineering & Information Technology (CEIT), pp. 1-5, Istanbul, Turkey, October 2018.
[49] E. K. Jackson, W. Roberts, B. Nelsen, G. P. Williams, E. J. Nelson, and D. P. Ames, "Introductory overview: error metrics for hydrologic modelling - a review of common practices and an open source library to facilitate use and adoption," Environmental Modelling & Software, vol. 119, pp. 32-48, 2019.
[50] S. Kumar, T. Roshni, and D. Himayoun, "A comparison of emotional neural network (ENN) and artificial neural network (ANN) approach for rainfall-runoff modelling," Civil Engineering Journal, vol. 5, no. 10, pp. 2120-2130, 2019.
[51] P. Bai, X. Liu, and J. Xie, "Simulating runoff under changing climatic conditions: a comparison of the long short-term memory network with two conceptual hydrologic models," Journal of Hydrology, vol. 592, Article ID 125779, 2021.

Moreover from the proposed four hybrid models CNN-GRU2 or the model designed by a single 1D CNN layershowed the highest promising result on trial model 1(UTRB)and model 3 as shown in Tables 5 and 7 In contrast GRUon model 2 (UTRB) CNN-LSTM2 on model 2 (Borkena)and CNN-GRU1 on model 1 (Borkena) shared the second-highest promising result Streamflow simulation with theCNN-GRU2 model generally showed the highest perfor-mance than the other tested hybrid deep learning modelsand state-of-the-art LSTM GRU and MLP models In linewith our objectives the result is discussed with differentvariability conditions in the following paragraphs

61 Climatic Region Variability Testing models in differentclimatic conditions with historical data will likely providerobust deep learning models for streamflow simulation inthe future [51] Hence this research also tested different

time_distributed_15_input InputLayer

time_distributed_15 (conv1d_5) TimeDistributed (Conv1d)

time_distributed_16 (average_pooling1d_5) TimeDistributed (AveragePooling1D)

time_distributed_17 (flatten_5) TimeDistributed (Flatten)

gru_10 GRU

dropout_10 Dropout

gru_11 GRU

dropout_11 Dropout

dense_5 Dense

input [( 45 15)]

[( 45 15)]output

input ( 45 15)

( 44 8)output

input ( 44 8)

( 22 8)output

input ( 22 8)

( 176)output

input ( 176)

( 20)output

input ( 20)

( 20)output

input ( 20)

( 30)output

input ( 30)

( 30)output

input ( 30)

( 1)output

Figure 12 Internal network structure of the optimized high score hybrid CNN-GRU2 model for UTRB Station

Computational Intelligence and Neuroscience 13

models in two climatic regions and irrespective of climaticand time window variation the CNN-GRUmodel displayedthe highest scores on tested case study areas

62 Input Combination Variability Input combinationminimum temperature (Tmin) with precipitation (P) doesnot show significant performance increment in the Borkenastation (Tables 5 and 6) In some scenarios adopting P onlyas input increases the performance of the model (Table 7) Incontrast for UTRB streamflow simulation with all inputvariables or Tmin Tmax and P showed significant perfor-mance increments (Table 7)

63 Average Rolling Time Window VariabilityStreamflow simulation without rolling daily time series datahad deficient performance compared to monthly rolledaverage time series -is could be because the time seriesnoise in UTRB is visible compared to that in Borkenastation As a result performance increment from daily tomonthly rolled window models is much higher in UTRBthan in Borkena station

Generally the monthly rolled time window with theCNN-GRU2 model showed the top performance results inboth stations (Table 7) -e corresponding training and testloss functions of this optimized high score hybrid model forboth stations are displayed in Figure 9 ConsequentlyFigure 10 compares the true values and predicted values ofthis model -e optimized hybrid model boosts the per-formance score and lowers the training time per epochmuchbetter than GRU and LSTM models -is model inputfeature and Keras tuner optimized hyperparameter valuesfor both stations with its MSE score are presented in Tables 8and 9 Moreover the internal network structures of thesemodels are also shown in Figures 11 and 12 which displaythe model input and output parameter matrices for eachlayer

7 Conclusions

-is study showed a comparative analysis of different hybriddeep learning algorithms with state-of-the-art machinelearning models for one-step daily streamflow simulation attwo river basins or subcatchment stream flow outlets -eproposed algorithms for this study are CNN-LSTM andCNN-GRU hybrid deep learning models each model havingone or two 1D CNN layers with the classic MLP LSTM andGRU models -is study conducted a series of experimentsto observe the performance variation of the proposedmodels by introducing different input combinations rollingtime windows and climatic conditions for streamflowsimulation -e following list of points will summarize thesignificant findings of this study

(i) CNN-GRU2 with one 1D CNN layer showed thebest simulation performance reporting the lowestRMSE MAE and R2 out of all models in both casestudy areas Such results dictate that the

performance of the selected architectures is irre-spective of the climatic characteristics of the basins

(ii) Combining temperature data with precipitation asinput and inserting to the proposed models hadminimum performance increment in Borkenastation compared to UTRB case study area whichclearly showed that temperature data scarcity hasmore performance loss implication in UTRB sta-tion On the other hand the Borkena station hassignificant natural streamflow variability thanUTRB which is also reflected in the model results-is implies the consideration of catchment re-sponse before any deep learning modelapplications

(iii) Rolling the time window of input and output timeseries for streamflow simulation using the proposedmodels considerably increases performance in theUTRB than in the Borkena station

(iv) -e analysis results also showed that training timeper epoch for the hybrid deep learning models ismuch lower than that of GRU and LSTM models

Deep learning models usually require massive datasetsand their performance drops with small to medium datasetsHowever from this case study acceptable results andconsidering hybrid modelsrsquo hyperparameters sensitivity andcomplexity future research may further design optimizedconfigurations Moreover they can test these hybrid modelsfor long-term streamflow simulation in ephemeral seasonaland perennial river systems and other fields of study Ourfuture research will try to synchronize the highly performedhybrid deep learning models in this study with remotesensing datasets for the problem we experience in theungauged catchments

Data Availability

-e raw hydrological and metrological datasets used for theBorkena watershed are available from the correspondingauthor upon request However authorization letters arerequired from the Ministry of Water Irrigation and Energy(MoWIE) of Ethiopia (httpmowiegovet) and the Na-tional Meteorological Agency of Ethiopia (NMA) (httpwwwethiometgovet) whereas for UTRB the datasets canbe retrieved from an online repository (httphydrogateunipgitwrme)

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e corresponding author acknowledges the Ministry ofWater Irrigation and Energy of Ethiopia (MoWIE) and theItalian National Research Center (Centro National di Ric-cerca CNR) for the hydrological and metrological datasets-e corresponding author also thanks Dr Fiseha Behulu forhis advice and series follow-up

14 Computational Intelligence and Neuroscience

References

[1] Z Zhang Q Zhang and V P Singh ldquoUnivariate streamflowforecasting using commonly used data-driven models liter-ature review and case studyrdquo Hydrological Sciences Journalvol 63 no 7 pp 1091ndash1111 2018

[2] L Ni D Wang V P Singh et al ldquoStreamflow and rainfallforecasting by two long short-term memory-based modelsrdquoJournal of Hydrology vol 583 Article ID 124296 2020

[3] X Yuan C Chen X Lei Y Yuan and R Muhammad AdnanldquoMonthly runoff forecasting based on LSTM-ALO modelrdquoStochastic Environmental Research and Risk Assessmentvol 32 no 8 pp 2199ndash2212 2018

[4] A Sichangi L Wang and Z Hu ldquoEstimation of river dis-charge solely from remote-sensing derived data an initialstudy over the Yangtze riverrdquo Remote Sensing vol 10 no 9p 1385 2018

[5] M J Tourian N Sneeuw and A Bardossy ldquoA quantilefunction approach to discharge estimation from satellite al-timetry (ENVISAT)rdquoWater Resources Research vol 49 no 7pp 4174ndash4186 2013

[6] A W Sichangi L Wang K Yang et al ldquoEstimating conti-nental river basin discharges using multiple remote sensingdata setsrdquo Remote Sensing of Environment vol 179 pp 36ndash532016

[7] S Aljahdali A Sheta and H Turabieh ldquoRiver flow fore-casting a comparison between feedforward and layered re-current neural networkrdquo Learning and Analytics in IntelligentSystems vol 43 pp 523ndash532 2020

[8] D Couta Y Zhang and Y Li ldquoRiver flow forecasting usinglong short-termmemoryrdquoDEStech Transactions on ComputerScience and Engineering vol 16 2019

[9] Y Bai N Bezak K Sapac M Klun and J Zhang ldquoShort-termstreamflow forecasting using the feature-enhanced regressionmodelrdquo Water Resources Management vol 33 no 14pp 4783ndash4797 2019

[10] L C D Campos L Goliatt da Fonseca T L FonsecaG D de Abreu L F Pires and Y Gorodetskaya ldquoShort-termstreamflow forecasting for paraıba do Sul river using deeplearningrdquo Progress in Artificial Intelligence vol 43 pp 507ndash518 2019

[11] B B Sahoo R Jha A Singh and D Kumar ldquoLong short-termmemory (LSTM) recurrent neural network for low-flowhydrological time series forecastingrdquo Acta Geophysica vol 67no 5 pp 1471ndash1481 2019

[12] M Rahimzad A Moghaddam Nia H Zolfonoon J SoltaniA Danandeh Mehr and H-H Kwon Performance Com-parison of an LSTM-Based Deep Learning Model versusConventional Machine Learning Algorithms for StreamflowForecasting Water Resources Management New York NYUSA 2021

[13] K Cho B vanMerrienboer D Bahdanau and Y Bengio ldquoOnthe properties of neural machine translation enco-derndashdecoder approachesrdquo in Proceedings of the SSST-8 EighthWorkshop on Syntax Semantics and Structure in StatisticalTranslation pp 103ndash111 Doha Qatar October 2014

[14] C Chen W He J Li and Z Tang ldquoA novel hybrid CNN-LSTM scheme for nitrogen oxide emission prediction inFCC unitrdquoMathematical Problems in Engineering vol 2020Article ID 8071810 2020

[15] T Li M Hua and X Wu ldquoA hybrid CNN-LSTM model forforecasting particulate matter (PM25)rdquo IEEE Access vol 8pp 26933ndash26940 2020

[16] I E Livieris E Pintelas and P Pintelas ldquoA CNNndashLSTMmodel for gold price time-series forecastingrdquo Neural Com-puting and Applications vol 32 2020

[17] A A M Ahmed R C Deo N Raj et al ldquoDeep learningforecasts of soil moisture convolutional neural network andgated recurrent unit models coupled with satellite-derivedMODIS observations and synoptic-scale climate index datardquoRemote Sensing vol 13 no 4 p 554 2021

[18] Y Liu T Zhang A Kang J Li and X Lei ldquoResearch onrunoff simulations using deep-learning methodsrdquo Sustain-ability vol 13 no 3 p 1336 2021

[19] A U Muhammad X Li and J Feng ldquoUsing LSTM GRU andhybrid models for streamflow forecastingrdquoMachine Learningand Intelligent Communications vol 294 pp 510ndash524 2019

[20] R Barzegar M T Aalami and J Adamowski ldquoShort-termwater quality variable prediction using a hybrid CNN-LSTMdeep learning modelrdquo Stochastic Environmental Research andRisk Assessment vol 34 no 2 pp 415ndash433 2020

[21] S P Van H M Le D V -anh T D Dang H H Loc andD T Anh ldquoDeep learning convolutional neural network inrainfall-runoff modellingrdquo Journal of Hydroinformaticsvol 22 no 3 pp 541ndash561 2020

[22] P Parisouj H Mohebzadeh and T Lee ldquoEmploying machinelearning algorithms for streamflow prediction a case study offour river basins with different climatic zones in the UnitedStatesrdquo Water Resources Management vol 34 no 13pp 4113ndash4131 2020

[23] A Annis and F Nardi ldquoIntegrating VGI and 2D hydraulicmodels into a data assimilation framework for real time floodforecasting and mappingrdquo Geo-Spatial Information Sciencevol 22 no 4 pp 223ndash236 2019

[24] B M Fiseha S G Setegn A M Melesse E Volpi andA Fiori ldquoImpact of climate change on the hydrology of upperTiber River Basin using bias corrected regional climatemodelrdquo Water Resources Management vol 28 no 5pp 1327ndash1343 2014

[25] S Rania B Waad and E Nadia ldquoHybrid feature selectionmethod based on the genetic algorithm and pearson corre-lation coefficientrdquo Machine Learning Paradigms Ceory andApplication Studies in Computational Intelligence vol 8012000

[26] A D Mehr and A H Gandomi ldquoMSGP-LASSO an im-proved multi-stage genetic programming model for stream-flow predictionrdquo Information Sciences vol 561 pp 181ndash1952021

[27] R Kun F Wei Q Jihong Z Xia and S Xiaoyu ldquoComparisonof eight filter-based feature selection methods for monthlystreamflow forecasting ndash three case studies on CAMELS datasetsrdquo Journal of Hydrology vol 586 Article ID 124897 2020

[28] O Oyebode ldquoEvolutionary modelling of municipal waterdemand with multiple feature selection techniquesrdquo Journalof Water Supply Research amp Technology - Aqua vol 68 no 4pp 264ndash281 2019

[29] R Dehghani H Torabi Poudeh and Z Izadi ldquoDissolvedoxygen concentration predictions for running waters withusing hybrid machine learning techniquesrdquo Modeling EarthSystems and Environment vol 597 2021

[30] N Yuvaraj V Chang B Gobinathan et al ldquoAutomatic de-tection of cyberbullying using multi-feature based artificialintelligence with deep decision tree classificationrdquo Computersamp Electrical Engineering vol 92 Article ID 107186 2021

[31] B Ratner ldquo-e correlation coefficient its values range be-tween +1minus1 or do theyrdquo Journal of Targeting Measurementand Analysis for Marketing vol 17 no 2 pp 139ndash142 2009

Computational Intelligence and Neuroscience 15

[32] S K Tanbeer C F Ahmed B-S Jeong and Y-K LeeldquoSliding window-based frequent pattern mining over datastreamsrdquo Information Sciences vol 179 no 22pp 3843ndash3865 2009

[33] E Zivot and J Wang ldquoRolling analysis of time seriesrdquo inModeling Financial Time Series with S-Plus pp 299ndash346Springer New York NY USA 2003

[34] Y Wang W Liao and Y Chang ldquoGated recurrent unitnetwork-based short-term photovoltaic forecastingrdquo Energiesvol 11 no 8 p 2163 2018

[35] O B Sezer M U Gudelek and A M Ozbayoglu ldquoFinancialtime series forecasting with deep learning a systematic lit-erature review 2005-2019rdquo Applied Soft Computing vol 90Article ID 106181 2020

[36] Q Zou Q Xiong Q Li H Yi Y Yu and C Wu ldquoA waterquality prediction method based on the multi-time scalebidirectional long short-term memory networkrdquo Environ-mental Science and Pollution Research vol 27 no 14pp 16853ndash16864 2020

[37] S-C Wang ldquoArtificial neural networkrdquo InterdisciplinaryComputing in Java Programming vol 26 pp 81ndash100 2003

[38] V Nourani A Molajou H Najafi and A Danandeh MehrldquoEmotional ANN (EANN) a new generation of neural net-works for hydrological modeling in IoTrdquo in Artificial Intel-ligence In IoT Transactions On Computational Science AndComputational Intelligence pp 45ndash61 Springer New YorkNY USA 2019

[39] L Yan J Feng and T Hang ldquoSmall watershed stream-flowforecasting based on LSTMrdquo Advances in Intelligent Systemsand Computing vol 935 pp 1006ndash1014 2019

[40] S Zhu X Luo X Yuan and Z Xu ldquoAn improved long short-termmemory network for streamflow forecasting in the upperYangtze Riverrdquo Stochastic Environmental Research and RiskAssessment vol 14 no 9 2020

[41] A Osama A Onur K Serkan G Moncef and I D InmanldquoReal-time vibration-based structural damage detection usingone-dimensional convolutional neural networksrdquo Journal ofSound and Vibration vol 388 pp 154ndash170 2017

[42] Y Lecun L Bottou Y Bengio and P Haffner ldquoGradient-based learning applied to document recognitionrdquo Proceedingsof the IEEE vol 86 no 11 pp 2278ndash2324 1998

[43] H-C Shin H R Roth M Gao et al ldquoDeep convolutionalneural networks for computer-aided detection CNN archi-tectures dataset characteristics and transfer learningrdquo IEEETransactions on Medical Imaging vol 35 no 5 pp 1285ndash1298 2016

[44] A Essien and C Giannetti ldquoA deep learning framework forunivariate time series prediction using convolutional LSTMstacked autoencodersrdquo in Proceedings of the IEEE Interna-tional Symposium on Innovations in Intelligent Systems andApplications (INISTA) pp 1ndash6 Alberobello Italy July 2019

[45] D S Abdelminaam F H Ismail M Taha A TahaE H Houssein and A Nabil ldquoCoAID-DEEP an optimizedintelligent framework for automated detecting COVID-19misleading information on Twitterrdquo IEEE Access vol 9pp 27840ndash27867 2021

[46] O Kazakov and O Mikheenko ldquoTransfer learning and do-main adaptation based on modeling of socio-economic sys-temsrdquo Business Inform vol 14 no 2 pp 7ndash20 2020

[47] B Wang and N Z Gong ldquoStealing hyperparameters inmachine learningrdquo in Proceedings of the IEEE Symposium onSecurity and Privacy (SP) pp 36ndash52 San Francisco CA USAMay 2018

[48] C Boyraz and S N Engin ldquoStreamflow prediction with deeplearningrdquo in Proceedings of the 6th International Conferenceon Control Engineering Information Technology (CEIT)pp 1ndash5 Istanbul Turkey October 2018

[49] E K Jackson W Roberts B Nelsen G P WilliamsE J Nelson and D P Ames ldquoIntroductory overview errormetrics for hydrologic modelling - a review of commonpractices and an open source library to facilitate use andadoptionrdquo Environmental Modelling amp Software vol 119pp 32ndash48 2019

[50] S Kumar T Roshni and D Himayoun ldquoA comparison ofemotional neural network (ENN) and artificial neural net-work (ANN) approach for rainfall-runoff modellingrdquo CivilEngineering Journal vol 5 no 10 pp 2120ndash2130 2019

[51] P Bai X Liu and J Xie ldquoSimulating runoff under changingclimatic conditions a comparison of the long short-termmemory network with two conceptual hydrologic modelsrdquoJournal of Hydrology vol 592 Article ID 125779 2021

16 Computational Intelligence and Neuroscience

Page 11: Multivariate Streamflow Simulation Using Hybrid Deep

MAE 1n

1113944

n

i1Q

t0bs minus Q

tsim

11138681113868111386811138681113868111386811138681113868 (9)

where Qobs discharge observed Qsim discharge simu-lated and n number of observations -e range of R2 liesbetween 0 and 1 representing respectively no correlationand a perfect correlation between observed and simulatedvalues whereas smallest RMSE and MAE scores or valuesclose to zero direct to the best model performance

6 Results

Streamflow simulation result with the proposed seven deeplearning architectures different input time window seriestwo climatic regions two input combinations and three

0

0

10

20

30

Test

Q (m

3se

c)

40

50

60

70

10 20 30 40

Predicted Q (m3sec)

50 60 70

Linear_fit

Data_points

2011

-09

2012

-01

2012

-05

2012

-09

2013

-01

2013

-05

2013

-09

2014

-01

2014

-05

2014

-09

2015

-01

010203040

Q (m

3se

c) 506070

Time (Year)

Test

Predicted

(a)

0

0255075

Test

Q (m

3se

c)100125150175

25 50 75 100

Predicted Q (m3sec)

125 150 175

Linear_fit

Data_points

1975

-01

1975

-07

1976

-01

1976

-07

1977

-01

1977

-07

1978

-01

1978

-07

1979

-01

0255075

100

Q (m

3se

c) 125150175

Time (Year)

Test

Predicted

(b)

Figure 10 Comparison of true values and predicted values of the optimized high score hybrid model (a) CNN-GRU2 model for BorkenaStation (b) CNN-GRU2 model for UTRB Station

Table 8 Best hybrid model type input feature and Keras tuneroptimized hyperparameter values for Borkena station with its MSEscore

Hyperparameters CNN-GRU2Monthly rolled P

Conv_1_filter 24Conv_1_kernal 2Conv_1_pool_size 3GRU_l1_units 15Dropout1 01GRU_l2_units 20Dropout2 02Learning rate 00001Number of epochs 80Number of batch sizes 20Score (MSE) 0083

Computational Intelligence and Neuroscience 11

Table 9 Best hybrid model type input features and Keras tuner optimized hyperparameter values for UTRB station with its MSE score

Hyperparameters CNN-GRU2Monthly rolled P Tmin and Tmax

Conv_1_filter 8Conv_1_kernal 2Conv_1_pool_size 2GRU_l1_units 20Dropout1 03GRU_l2_units 30Dropout2 02Learning rate 00001Number of epochs 60Number of batch sizes 40Score (MSE) 0193

time_distributed_3_input InputLayer

time_distributed_3(conv1d_1) TimeDistributed(Conv1d)

time_distributed_4(average_pooling1d_1) TimeDistributed(AveragePooling1D)

time_distributed_5(flatten_1) TimeDistributed(Flatten)

gru GRU

dropout_2 Dropout

gru_1 GRU

dropout_3 Dropout

dense_1 Dense

input [( 30 5)]

[( 30 5)]output

input ( 30 5)

( 29 24)output

input ( 29 24)

( 9 24)output

input ( 9 24)

( 216)output

input ( 216)

( 15)output

input ( 15)

( 15)output

input ( 15)

( 20)output

input ( 20)

( 20)output

input ( 20)

( 1)output

Figure 11 Internal network structure of the optimized high score hybrid CNN-GRU2 model for Borkena Station

12 Computational Intelligence and Neuroscience

average rolling time windows is presented in Tables 5 6 and7 Regardless of the combination of these conditions theCNN-GRUmodel showed promising results in most of thesescenarios (Tables 5 and 7) -e highest scores are presentedhere

(1) In daily streamflow simulation for Borkena stationCNN-GRU1 scored 794 366 085 and 063 and forUTRB station CNN-GRU2 scored 4561 2179 057and 064 for RMSE MAE R2 and training time perepoch respectively

(2) In weekly rolled streamflow simulation for Borkenastation CNN-LSTM2 scored 733 386 087 and052 and for UTRB station GRU scored 2521 1483077 and 556 for RMSE MAE R2 and training timeper epoch respectively

(3) In monthly rolled streamflow simulation the CNN-GRU2 model showed high performance with 515318 092 and 078 scores for Borkena station and1798 1299 083 and 071 for UTRB station which

are RMSE MAE R2 and training time per epochrespectively

Moreover from the proposed four hybrid models CNN-GRU2 or the model designed by a single 1D CNN layershowed the highest promising result on trial model 1(UTRB)and model 3 as shown in Tables 5 and 7 In contrast GRUon model 2 (UTRB) CNN-LSTM2 on model 2 (Borkena)and CNN-GRU1 on model 1 (Borkena) shared the second-highest promising result Streamflow simulation with theCNN-GRU2 model generally showed the highest perfor-mance than the other tested hybrid deep learning modelsand state-of-the-art LSTM GRU and MLP models In linewith our objectives the result is discussed with differentvariability conditions in the following paragraphs

61 Climatic Region Variability Testing models in differentclimatic conditions with historical data will likely providerobust deep learning models for streamflow simulation inthe future [51] Hence this research also tested different

time_distributed_15_input InputLayer

time_distributed_15 (conv1d_5) TimeDistributed (Conv1d)

time_distributed_16 (average_pooling1d_5) TimeDistributed (AveragePooling1D)

time_distributed_17 (flatten_5) TimeDistributed (Flatten)

gru_10 GRU

dropout_10 Dropout

gru_11 GRU

dropout_11 Dropout

dense_5 Dense

input [( 45 15)]

[( 45 15)]output

input ( 45 15)

( 44 8)output

input ( 44 8)

( 22 8)output

input ( 22 8)

( 176)output

input ( 176)

( 20)output

input ( 20)

( 20)output

input ( 20)

( 30)output

input ( 30)

( 30)output

input ( 30)

( 1)output

Figure 12 Internal network structure of the optimized high score hybrid CNN-GRU2 model for UTRB Station

Computational Intelligence and Neuroscience 13

models in two climatic regions and irrespective of climaticand time window variation the CNN-GRUmodel displayedthe highest scores on tested case study areas

62 Input Combination Variability Input combinationminimum temperature (Tmin) with precipitation (P) doesnot show significant performance increment in the Borkenastation (Tables 5 and 6) In some scenarios adopting P onlyas input increases the performance of the model (Table 7) Incontrast for UTRB streamflow simulation with all inputvariables or Tmin Tmax and P showed significant perfor-mance increments (Table 7)

63 Average Rolling Time Window VariabilityStreamflow simulation without rolling daily time series datahad deficient performance compared to monthly rolledaverage time series -is could be because the time seriesnoise in UTRB is visible compared to that in Borkenastation As a result performance increment from daily tomonthly rolled window models is much higher in UTRBthan in Borkena station

Generally the monthly rolled time window with theCNN-GRU2 model showed the top performance results inboth stations (Table 7) -e corresponding training and testloss functions of this optimized high score hybrid model forboth stations are displayed in Figure 9 ConsequentlyFigure 10 compares the true values and predicted values ofthis model -e optimized hybrid model boosts the per-formance score and lowers the training time per epochmuchbetter than GRU and LSTM models -is model inputfeature and Keras tuner optimized hyperparameter valuesfor both stations with its MSE score are presented in Tables 8and 9 Moreover the internal network structures of thesemodels are also shown in Figures 11 and 12 which displaythe model input and output parameter matrices for eachlayer

7 Conclusions

-is study showed a comparative analysis of different hybriddeep learning algorithms with state-of-the-art machinelearning models for one-step daily streamflow simulation attwo river basins or subcatchment stream flow outlets -eproposed algorithms for this study are CNN-LSTM andCNN-GRU hybrid deep learning models each model havingone or two 1D CNN layers with the classic MLP LSTM andGRU models -is study conducted a series of experimentsto observe the performance variation of the proposedmodels by introducing different input combinations rollingtime windows and climatic conditions for streamflowsimulation -e following list of points will summarize thesignificant findings of this study

(i) CNN-GRU2 with one 1D CNN layer showed thebest simulation performance reporting the lowestRMSE MAE and R2 out of all models in both casestudy areas Such results dictate that the

performance of the selected architectures is irre-spective of the climatic characteristics of the basins

(ii) Combining temperature data with precipitation asinput and inserting to the proposed models hadminimum performance increment in Borkenastation compared to UTRB case study area whichclearly showed that temperature data scarcity hasmore performance loss implication in UTRB sta-tion On the other hand the Borkena station hassignificant natural streamflow variability thanUTRB which is also reflected in the model results-is implies the consideration of catchment re-sponse before any deep learning modelapplications

(iii) Rolling the time window of input and output timeseries for streamflow simulation using the proposedmodels considerably increases performance in theUTRB than in the Borkena station

(iv) -e analysis results also showed that training timeper epoch for the hybrid deep learning models ismuch lower than that of GRU and LSTM models

Deep learning models usually require massive datasetsand their performance drops with small to medium datasetsHowever from this case study acceptable results andconsidering hybrid modelsrsquo hyperparameters sensitivity andcomplexity future research may further design optimizedconfigurations Moreover they can test these hybrid modelsfor long-term streamflow simulation in ephemeral seasonaland perennial river systems and other fields of study Ourfuture research will try to synchronize the highly performedhybrid deep learning models in this study with remotesensing datasets for the problem we experience in theungauged catchments

Data Availability

-e raw hydrological and metrological datasets used for theBorkena watershed are available from the correspondingauthor upon request However authorization letters arerequired from the Ministry of Water Irrigation and Energy(MoWIE) of Ethiopia (httpmowiegovet) and the Na-tional Meteorological Agency of Ethiopia (NMA) (httpwwwethiometgovet) whereas for UTRB the datasets canbe retrieved from an online repository (httphydrogateunipgitwrme)

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e corresponding author acknowledges the Ministry ofWater Irrigation and Energy of Ethiopia (MoWIE) and theItalian National Research Center (Centro National di Ric-cerca CNR) for the hydrological and metrological datasets-e corresponding author also thanks Dr Fiseha Behulu forhis advice and series follow-up

14 Computational Intelligence and Neuroscience

References

[1] Z Zhang Q Zhang and V P Singh ldquoUnivariate streamflowforecasting using commonly used data-driven models liter-ature review and case studyrdquo Hydrological Sciences Journalvol 63 no 7 pp 1091ndash1111 2018

[2] L Ni D Wang V P Singh et al ldquoStreamflow and rainfallforecasting by two long short-term memory-based modelsrdquoJournal of Hydrology vol 583 Article ID 124296 2020

[3] X Yuan C Chen X Lei Y Yuan and R Muhammad AdnanldquoMonthly runoff forecasting based on LSTM-ALO modelrdquoStochastic Environmental Research and Risk Assessmentvol 32 no 8 pp 2199ndash2212 2018

[4] A Sichangi L Wang and Z Hu ldquoEstimation of river dis-charge solely from remote-sensing derived data an initialstudy over the Yangtze riverrdquo Remote Sensing vol 10 no 9p 1385 2018

[5] M J Tourian N Sneeuw and A Bardossy ldquoA quantilefunction approach to discharge estimation from satellite al-timetry (ENVISAT)rdquoWater Resources Research vol 49 no 7pp 4174ndash4186 2013

[6] A W Sichangi L Wang K Yang et al ldquoEstimating conti-nental river basin discharges using multiple remote sensingdata setsrdquo Remote Sensing of Environment vol 179 pp 36ndash532016

[7] S Aljahdali A Sheta and H Turabieh ldquoRiver flow fore-casting a comparison between feedforward and layered re-current neural networkrdquo Learning and Analytics in IntelligentSystems vol 43 pp 523ndash532 2020

[8] D Couta Y Zhang and Y Li ldquoRiver flow forecasting usinglong short-termmemoryrdquoDEStech Transactions on ComputerScience and Engineering vol 16 2019

[9] Y Bai N Bezak K Sapac M Klun and J Zhang ldquoShort-termstreamflow forecasting using the feature-enhanced regressionmodelrdquo Water Resources Management vol 33 no 14pp 4783ndash4797 2019

[10] L C D Campos L Goliatt da Fonseca T L FonsecaG D de Abreu L F Pires and Y Gorodetskaya ldquoShort-termstreamflow forecasting for paraıba do Sul river using deeplearningrdquo Progress in Artificial Intelligence vol 43 pp 507ndash518 2019

[11] B B Sahoo R Jha A Singh and D Kumar ldquoLong short-termmemory (LSTM) recurrent neural network for low-flowhydrological time series forecastingrdquo Acta Geophysica vol 67no 5 pp 1471ndash1481 2019

[12] M Rahimzad A Moghaddam Nia H Zolfonoon J SoltaniA Danandeh Mehr and H-H Kwon Performance Com-parison of an LSTM-Based Deep Learning Model versusConventional Machine Learning Algorithms for StreamflowForecasting Water Resources Management New York NYUSA 2021

[13] K Cho B vanMerrienboer D Bahdanau and Y Bengio ldquoOnthe properties of neural machine translation enco-derndashdecoder approachesrdquo in Proceedings of the SSST-8 EighthWorkshop on Syntax Semantics and Structure in StatisticalTranslation pp 103ndash111 Doha Qatar October 2014

[14] C Chen W He J Li and Z Tang ldquoA novel hybrid CNN-LSTM scheme for nitrogen oxide emission prediction inFCC unitrdquoMathematical Problems in Engineering vol 2020Article ID 8071810 2020

[15] T Li M Hua and X Wu ldquoA hybrid CNN-LSTM model forforecasting particulate matter (PM25)rdquo IEEE Access vol 8pp 26933ndash26940 2020

[16] I E Livieris E Pintelas and P Pintelas ldquoA CNNndashLSTMmodel for gold price time-series forecastingrdquo Neural Com-puting and Applications vol 32 2020

[17] A A M Ahmed R C Deo N Raj et al ldquoDeep learningforecasts of soil moisture convolutional neural network andgated recurrent unit models coupled with satellite-derivedMODIS observations and synoptic-scale climate index datardquoRemote Sensing vol 13 no 4 p 554 2021

[18] Y Liu T Zhang A Kang J Li and X Lei ldquoResearch onrunoff simulations using deep-learning methodsrdquo Sustain-ability vol 13 no 3 p 1336 2021

[19] A U Muhammad X Li and J Feng ldquoUsing LSTM GRU andhybrid models for streamflow forecastingrdquoMachine Learningand Intelligent Communications vol 294 pp 510ndash524 2019

[20] R Barzegar M T Aalami and J Adamowski ldquoShort-termwater quality variable prediction using a hybrid CNN-LSTMdeep learning modelrdquo Stochastic Environmental Research andRisk Assessment vol 34 no 2 pp 415ndash433 2020

[21] S P Van H M Le D V -anh T D Dang H H Loc andD T Anh ldquoDeep learning convolutional neural network inrainfall-runoff modellingrdquo Journal of Hydroinformaticsvol 22 no 3 pp 541ndash561 2020

[22] P Parisouj H Mohebzadeh and T Lee ldquoEmploying machinelearning algorithms for streamflow prediction a case study offour river basins with different climatic zones in the UnitedStatesrdquo Water Resources Management vol 34 no 13pp 4113ndash4131 2020

[23] A Annis and F Nardi ldquoIntegrating VGI and 2D hydraulicmodels into a data assimilation framework for real time floodforecasting and mappingrdquo Geo-Spatial Information Sciencevol 22 no 4 pp 223ndash236 2019

[24] B M Fiseha S G Setegn A M Melesse E Volpi andA Fiori ldquoImpact of climate change on the hydrology of upperTiber River Basin using bias corrected regional climatemodelrdquo Water Resources Management vol 28 no 5pp 1327ndash1343 2014

[25] S Rania B Waad and E Nadia ldquoHybrid feature selectionmethod based on the genetic algorithm and pearson corre-lation coefficientrdquo Machine Learning Paradigms Ceory andApplication Studies in Computational Intelligence vol 8012000

[26] A D Mehr and A H Gandomi ldquoMSGP-LASSO an im-proved multi-stage genetic programming model for stream-flow predictionrdquo Information Sciences vol 561 pp 181ndash1952021

[27] R Kun F Wei Q Jihong Z Xia and S Xiaoyu ldquoComparisonof eight filter-based feature selection methods for monthlystreamflow forecasting ndash three case studies on CAMELS datasetsrdquo Journal of Hydrology vol 586 Article ID 124897 2020

[28] O Oyebode ldquoEvolutionary modelling of municipal waterdemand with multiple feature selection techniquesrdquo Journalof Water Supply Research amp Technology - Aqua vol 68 no 4pp 264ndash281 2019

[29] R Dehghani H Torabi Poudeh and Z Izadi ldquoDissolvedoxygen concentration predictions for running waters withusing hybrid machine learning techniquesrdquo Modeling EarthSystems and Environment vol 597 2021

[30] N Yuvaraj V Chang B Gobinathan et al ldquoAutomatic de-tection of cyberbullying using multi-feature based artificialintelligence with deep decision tree classificationrdquo Computersamp Electrical Engineering vol 92 Article ID 107186 2021

[31] B Ratner ldquo-e correlation coefficient its values range be-tween +1minus1 or do theyrdquo Journal of Targeting Measurementand Analysis for Marketing vol 17 no 2 pp 139ndash142 2009

Computational Intelligence and Neuroscience 15

[32] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, "Sliding window-based frequent pattern mining over data streams," Information Sciences, vol. 179, no. 22, pp. 3843–3865, 2009.

[33] E. Zivot and J. Wang, "Rolling analysis of time series," in Modeling Financial Time Series with S-Plus, pp. 299–346, Springer, New York, NY, USA, 2003.

[34] Y. Wang, W. Liao, and Y. Chang, "Gated recurrent unit network-based short-term photovoltaic forecasting," Energies, vol. 11, no. 8, p. 2163, 2018.

[35] O. B. Sezer, M. U. Gudelek, and A. M. Ozbayoglu, "Financial time series forecasting with deep learning: a systematic literature review: 2005–2019," Applied Soft Computing, vol. 90, Article ID 106181, 2020.

[36] Q. Zou, Q. Xiong, Q. Li, H. Yi, Y. Yu, and C. Wu, "A water quality prediction method based on the multi-time scale bidirectional long short-term memory network," Environmental Science and Pollution Research, vol. 27, no. 14, pp. 16853–16864, 2020.

[37] S.-C. Wang, "Artificial neural network," Interdisciplinary Computing in Java Programming, vol. 26, pp. 81–100, 2003.

[38] V. Nourani, A. Molajou, H. Najafi, and A. Danandeh Mehr, "Emotional ANN (EANN): a new generation of neural networks for hydrological modeling in IoT," in Artificial Intelligence in IoT, Transactions on Computational Science and Computational Intelligence, pp. 45–61, Springer, New York, NY, USA, 2019.

[39] L. Yan, J. Feng, and T. Hang, "Small watershed stream-flow forecasting based on LSTM," Advances in Intelligent Systems and Computing, vol. 935, pp. 1006–1014, 2019.

[40] S. Zhu, X. Luo, X. Yuan, and Z. Xu, "An improved long short-term memory network for streamflow forecasting in the upper Yangtze River," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 9, 2020.

[41] A. Osama, A. Onur, K. Serkan, G. Moncef, and I. D. Inman, "Real-time vibration-based structural damage detection using one-dimensional convolutional neural networks," Journal of Sound and Vibration, vol. 388, pp. 154–170, 2017.

[42] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[43] H.-C. Shin, H. R. Roth, M. Gao, et al., "Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1285–1298, 2016.

[44] A. Essien and C. Giannetti, "A deep learning framework for univariate time series prediction using convolutional LSTM stacked autoencoders," in Proceedings of the IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA), pp. 1–6, Alberobello, Italy, July 2019.

[45] D. S. Abdelminaam, F. H. Ismail, M. Taha, A. Taha, E. H. Houssein, and A. Nabil, "CoAID-DEEP: an optimized intelligent framework for automated detecting COVID-19 misleading information on Twitter," IEEE Access, vol. 9, pp. 27840–27867, 2021.

[46] O. Kazakov and O. Mikheenko, "Transfer learning and domain adaptation based on modeling of socio-economic systems," Business Inform, vol. 14, no. 2, pp. 7–20, 2020.

[47] B. Wang and N. Z. Gong, "Stealing hyperparameters in machine learning," in Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 36–52, San Francisco, CA, USA, May 2018.

[48] C. Boyraz and S. N. Engin, "Streamflow prediction with deep learning," in Proceedings of the 6th International Conference on Control Engineering & Information Technology (CEIT), pp. 1–5, Istanbul, Turkey, October 2018.

[49] E. K. Jackson, W. Roberts, B. Nelsen, G. P. Williams, E. J. Nelson, and D. P. Ames, "Introductory overview: error metrics for hydrologic modelling – a review of common practices and an open source library to facilitate use and adoption," Environmental Modelling & Software, vol. 119, pp. 32–48, 2019.

[50] S. Kumar, T. Roshni, and D. Himayoun, "A comparison of emotional neural network (ENN) and artificial neural network (ANN) approach for rainfall-runoff modelling," Civil Engineering Journal, vol. 5, no. 10, pp. 2120–2130, 2019.

[51] P. Bai, X. Liu, and J. Xie, "Simulating runoff under changing climatic conditions: a comparison of the long short-term memory network with two conceptual hydrologic models," Journal of Hydrology, vol. 592, Article ID 125779, 2021.


Table 9: Best hybrid model type, input features, and Keras tuner optimized hyperparameter values for the UTRB station, with its MSE score.

Best model: CNN-GRU2; input features: monthly rolled P, Tmin, and Tmax.

Hyperparameter          Value
Conv_1_filter           8
Conv_1_kernal           2
Conv_1_pool_size        2
GRU_l1_units            20
Dropout1                0.3
GRU_l2_units            30
Dropout2                0.2
Learning rate           0.0001
Number of epochs        60
Number of batch sizes   40
Score (MSE)             0.193
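Assuming a standard Keras workflow, the Table 9 configuration can be assembled as in the following minimal sketch. Only the filter, kernel, pool, unit, dropout, learning-rate, epoch, and batch-size values are taken from Table 9; the subsequence layout (n_seq), the ReLU activation, and the Adam optimizer are illustrative assumptions rather than details confirmed by the study.

from tensorflow import keras
from tensorflow.keras import layers

# Assumed sample layout: one 45-step subsequence of 15 input series per sample
# (compare the shapes recovered in Figure 12).
n_seq, n_steps, n_features = 1, 45, 15

model = keras.Sequential([
    keras.Input(shape=(n_seq, n_steps, n_features)),
    layers.TimeDistributed(layers.Conv1D(8, kernel_size=2, activation="relu")),  # Conv_1_filter = 8, kernel = 2
    layers.TimeDistributed(layers.AveragePooling1D(pool_size=2)),                # Conv_1_pool_size = 2
    layers.TimeDistributed(layers.Flatten()),
    layers.GRU(20, return_sequences=True),  # GRU_l1_units = 20
    layers.Dropout(0.3),                    # Dropout1 = 0.3
    layers.GRU(30),                         # GRU_l2_units = 30
    layers.Dropout(0.2),                    # Dropout2 = 0.2
    layers.Dense(1),                        # one-step streamflow output
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4), loss="mse")  # learning rate = 0.0001
# model.fit(X_train, y_train, epochs=60, batch_size=40)  # epochs and batch size from Table 9

Note that the first GRU layer uses return_sequences=True so the second GRU receives a sequence; stacked recurrent layers in Keras require this even though the simplified figure shapes elide it.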

Figure 11: Internal network structure of the optimized high-score hybrid CNN-GRU2 model for the Borkena station. The layer sequence and input/output shapes recovered from the figure (leading batch dimensions elided) are:

time_distributed_3_input (InputLayer): [(…, 30, 5)] → [(…, 30, 5)]
time_distributed_3 (TimeDistributed(Conv1D)): (…, 30, 5) → (…, 29, 24)
time_distributed_4 (TimeDistributed(AveragePooling1D)): (…, 29, 24) → (…, 9, 24)
time_distributed_5 (TimeDistributed(Flatten)): (…, 9, 24) → (…, 216)
gru (GRU): (…, 216) → (…, 15)
dropout_2 (Dropout): (…, 15) → (…, 15)
gru_1 (GRU): (…, 15) → (…, 20)
dropout_3 (Dropout): (…, 20) → (…, 20)
dense_1 (Dense): (…, 20) → (…, 1)


The overall performance of the proposed models under the different input combinations and average rolling time windows is presented in Tables 5, 6, and 7. Regardless of the combination of these conditions, the CNN-GRU model showed promising results in most of these scenarios (Tables 5 and 7). The highest scores are presented here; a sketch of how such scores are conventionally computed follows the list.

(1) In daily streamflow simulation, CNN-GRU1 scored 7.94, 3.66, 0.85, and 0.63 for the Borkena station, and CNN-GRU2 scored 45.61, 21.79, 0.57, and 0.64 for the UTRB station, for RMSE, MAE, R², and training time per epoch, respectively.

(2) In weekly rolled streamflow simulation, CNN-LSTM2 scored 7.33, 3.86, 0.87, and 0.52 for the Borkena station, and GRU scored 25.21, 14.83, 0.77, and 5.56 for the UTRB station, for RMSE, MAE, R², and training time per epoch, respectively.

(3) In monthly rolled streamflow simulation, the CNN-GRU2 model showed high performance, with scores of 5.15, 3.18, 0.92, and 0.78 for the Borkena station and 17.98, 12.99, 0.83, and 0.71 for the UTRB station, which are RMSE, MAE, R², and training time per epoch, respectively.
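For reference, the three accuracy metrics above are conventionally computed as in the following minimal helper (training time per epoch is measured wall-clock time, not a formula); the function name and array inputs are illustrative, not code from the study.

import numpy as np

def report_scores(q_obs: np.ndarray, q_sim: np.ndarray) -> dict:
    # RMSE and MAE summarize error magnitude; R^2 is the coefficient of
    # determination, 1 minus the ratio of residual to total variance.
    err = q_sim - q_obs
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    r2 = 1.0 - float(np.sum(err ** 2)) / float(np.sum((q_obs - q_obs.mean()) ** 2))
    return {"RMSE": rmse, "MAE": mae, "R2": r2}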

Moreover, among the four proposed hybrid models, CNN-GRU2, the model designed with a single 1D CNN layer, showed the most promising result on trial model 1 (UTRB) and model 3, as shown in Tables 5 and 7. In contrast, GRU on model 2 (UTRB), CNN-LSTM2 on model 2 (Borkena), and CNN-GRU1 on model 1 (Borkena) shared the second-most promising results. Streamflow simulation with the CNN-GRU2 model generally showed higher performance than the other tested hybrid deep learning models and the state-of-the-art LSTM, GRU, and MLP models. In line with our objectives, the results are discussed under different variability conditions in the following paragraphs.

6.1. Climatic Region Variability. Testing models in different climatic conditions with historical data will likely provide robust deep learning models for streamflow simulation in the future [51]. Hence, this research also tested the different models in two climatic regions, and irrespective of climatic and time window variation, the CNN-GRU model displayed the highest scores in both tested case study areas.

Figure 12: Internal network structure of the optimized high-score hybrid CNN-GRU2 model for the UTRB station. The layer sequence and input/output shapes recovered from the figure (leading batch dimensions elided) are:

time_distributed_15_input (InputLayer): [(…, 45, 15)] → [(…, 45, 15)]
time_distributed_15 (TimeDistributed(Conv1D)): (…, 45, 15) → (…, 44, 8)
time_distributed_16 (TimeDistributed(AveragePooling1D)): (…, 44, 8) → (…, 22, 8)
time_distributed_17 (TimeDistributed(Flatten)): (…, 22, 8) → (…, 176)
gru_10 (GRU): (…, 176) → (…, 20)
dropout_10 (Dropout): (…, 20) → (…, 20)
gru_11 (GRU): (…, 20) → (…, 30)
dropout_11 (Dropout): (…, 30) → (…, 30)
dense_5 (Dense): (…, 30) → (…, 1)


6.2. Input Combination Variability. The input combination of minimum temperature (Tmin) with precipitation (P) does not show a significant performance increment at the Borkena station (Tables 5 and 6). In some scenarios, adopting P alone as input increases the performance of the model (Table 7). In contrast, for UTRB, streamflow simulation with all input variables, that is, Tmin, Tmax, and P, showed significant performance increments (Table 7). The sketch below illustrates how such input sets can be enumerated.
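As an illustration only, the three kinds of input sets compared in Tables 5–7 (P alone; P with Tmin; P with Tmin and Tmax) can be enumerated as below; the column names and the DataFrame df are placeholders, not objects from the study's code.

import pandas as pd

def input_combinations(df: pd.DataFrame) -> dict:
    # Return one candidate feature table per input combination.
    combos = [["P"], ["P", "Tmin"], ["P", "Tmin", "Tmax"]]
    return {"+".join(cols): df[cols] for cols in combos}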

6.3. Average Rolling Time Window Variability. Streamflow simulation without rolling the daily time series data performed poorly compared to simulation with monthly rolled average time series. This could be because the time series noise in UTRB is more visible than that at the Borkena station. As a result, the performance increment from daily to monthly rolled window models is much higher in UTRB than at the Borkena station. A minimal sketch of this rolling preprocessing step follows.
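A minimal sketch of the rolling-average preprocessing, assuming a pandas DataFrame named daily that holds the daily P, Tmin, Tmax, and streamflow series, and assuming 7- and 30-day windows for the weekly and monthly rolls (the exact window lengths are assumptions):

import pandas as pd

def rolled(daily: pd.DataFrame, window: int) -> pd.DataFrame:
    # A trailing rolling mean smooths high-frequency noise in the daily
    # series; leading rows without a full window are dropped.
    return daily.rolling(window=window).mean().dropna()

# weekly = rolled(daily, 7)    # weekly rolled series (assumed window length)
# monthly = rolled(daily, 30)  # monthly rolled series (assumed window length)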

Generally, the monthly rolled time window with the CNN-GRU2 model showed the top performance results at both stations (Table 7). The corresponding training and test loss functions of this optimized high-score hybrid model for both stations are displayed in Figure 9. Consequently, Figure 10 compares the true and predicted values of this model. The optimized hybrid model boosts the performance score and lowers the training time per epoch much more than the GRU and LSTM models. The input features and Keras tuner optimized hyperparameter values of this model for both stations, with its MSE score, are presented in Tables 8 and 9. Moreover, the internal network structures of these models are shown in Figures 11 and 12, which display each layer's input and output parameter matrices. A sketch of the kind of Keras Tuner search that produces such hyperparameter values is given below.
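The following is a hedged sketch of a Keras Tuner random search over hyperparameters of the kind listed in Tables 8 and 9 (filter count, GRU units, dropout rates, learning rate); the search ranges, the choice of RandomSearch, and the variable names are illustrative assumptions, not the study's actual tuning script.

import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers

def build_model(hp):
    # Search space mirrors the tuned quantities reported in Tables 8 and 9.
    model = keras.Sequential([
        keras.Input(shape=(1, 45, 15)),  # assumed sample layout (see Figure 12)
        layers.TimeDistributed(layers.Conv1D(hp.Choice("conv_1_filter", [8, 16, 24]),
                                             kernel_size=2, activation="relu")),
        layers.TimeDistributed(layers.AveragePooling1D(pool_size=2)),
        layers.TimeDistributed(layers.Flatten()),
        layers.GRU(hp.Int("gru_l1_units", 10, 40, step=5), return_sequences=True),
        layers.Dropout(hp.Float("dropout1", 0.1, 0.5, step=0.1)),
        layers.GRU(hp.Int("gru_l2_units", 10, 40, step=5)),
        layers.Dropout(hp.Float("dropout2", 0.1, 0.5, step=0.1)),
        layers.Dense(1),
    ])
    model.compile(optimizer=keras.optimizers.Adam(hp.Choice("learning_rate", [1e-3, 1e-4, 1e-5])),
                  loss="mse")
    return model

tuner = kt.RandomSearch(build_model, objective="val_loss", max_trials=20, overwrite=True)
# tuner.search(X_train, y_train, validation_split=0.2, epochs=60, batch_size=40)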

7. Conclusions

This study presented a comparative analysis of hybrid deep learning algorithms against state-of-the-art machine learning models for one-step daily streamflow simulation at two river basin (subcatchment) streamflow outlets. The proposed algorithms are CNN-LSTM and CNN-GRU hybrid deep learning models, each with one or two 1D CNN layers, compared with the classic MLP, LSTM, and GRU models. A series of experiments was conducted to observe the performance variation of the proposed models under different input combinations, rolling time windows, and climatic conditions. The following points summarize the significant findings of this study:

(i) CNN-GRU2, with one 1D CNN layer, showed the best simulation performance, reporting the lowest RMSE and MAE and the highest R² of all models in both case study areas. Such results indicate that the performance of the selected architecture is largely independent of the climatic characteristics of the basins.

(ii) Combining temperature data with precipitation as input to the proposed models yielded a minimal performance increment at the Borkena station compared to the UTRB case study area, which clearly showed that temperature data scarcity has a larger performance loss implication at the UTRB station. On the other hand, the Borkena station has more pronounced natural streamflow variability than UTRB, which is also reflected in the model results. This implies that catchment response should be considered before any deep learning model application.

(iii) Rolling the time window of the input and output time series for streamflow simulation with the proposed models increases performance considerably more in the UTRB than at the Borkena station.

(iv) The analysis results also showed that the training time per epoch for the hybrid deep learning models is much lower than that of the GRU and LSTM models.

Deep learning models usually require massive datasets, and their performance drops with small to medium datasets. However, given the acceptable results of this case study, and considering the hybrid models' hyperparameter sensitivity and complexity, future research may further design optimized configurations. These hybrid models can also be tested for long-term streamflow simulation in ephemeral, seasonal, and perennial river systems, and in other fields of study. Our future research will try to couple the best-performing hybrid deep learning models from this study with remote sensing datasets to address the problems experienced in ungauged catchments.

Data Availability

The raw hydrological and meteorological datasets used for the Borkena watershed are available from the corresponding author upon request. However, authorization letters are required from the Ministry of Water, Irrigation and Energy (MoWIE) of Ethiopia (http://mowie.gov.et) and the National Meteorological Agency of Ethiopia (NMA) (http://www.ethiomet.gov.et), whereas for UTRB the datasets can be retrieved from an online repository (http://hydrogate.unipg.it/wrme).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The corresponding author acknowledges the Ministry of Water, Irrigation and Energy of Ethiopia (MoWIE) and the Italian National Research Council (Consiglio Nazionale delle Ricerche, CNR) for the hydrological and meteorological datasets. The corresponding author also thanks Dr. Fiseha Behulu for his advice and serious follow-up.

Page 14: Multivariate Streamflow Simulation Using Hybrid Deep

models in two climatic regions and irrespective of climaticand time window variation the CNN-GRUmodel displayedthe highest scores on tested case study areas

62 Input Combination Variability Input combinationminimum temperature (Tmin) with precipitation (P) doesnot show significant performance increment in the Borkenastation (Tables 5 and 6) In some scenarios adopting P onlyas input increases the performance of the model (Table 7) Incontrast for UTRB streamflow simulation with all inputvariables or Tmin Tmax and P showed significant perfor-mance increments (Table 7)

63 Average Rolling Time Window VariabilityStreamflow simulation without rolling daily time series datahad deficient performance compared to monthly rolledaverage time series -is could be because the time seriesnoise in UTRB is visible compared to that in Borkenastation As a result performance increment from daily tomonthly rolled window models is much higher in UTRBthan in Borkena station

Generally the monthly rolled time window with theCNN-GRU2 model showed the top performance results inboth stations (Table 7) -e corresponding training and testloss functions of this optimized high score hybrid model forboth stations are displayed in Figure 9 ConsequentlyFigure 10 compares the true values and predicted values ofthis model -e optimized hybrid model boosts the per-formance score and lowers the training time per epochmuchbetter than GRU and LSTM models -is model inputfeature and Keras tuner optimized hyperparameter valuesfor both stations with its MSE score are presented in Tables 8and 9 Moreover the internal network structures of thesemodels are also shown in Figures 11 and 12 which displaythe model input and output parameter matrices for eachlayer

7 Conclusions

-is study showed a comparative analysis of different hybriddeep learning algorithms with state-of-the-art machinelearning models for one-step daily streamflow simulation attwo river basins or subcatchment stream flow outlets -eproposed algorithms for this study are CNN-LSTM andCNN-GRU hybrid deep learning models each model havingone or two 1D CNN layers with the classic MLP LSTM andGRU models -is study conducted a series of experimentsto observe the performance variation of the proposedmodels by introducing different input combinations rollingtime windows and climatic conditions for streamflowsimulation -e following list of points will summarize thesignificant findings of this study

(i) CNN-GRU2 with one 1D CNN layer showed thebest simulation performance reporting the lowestRMSE MAE and R2 out of all models in both casestudy areas Such results dictate that the

performance of the selected architectures is irre-spective of the climatic characteristics of the basins

(ii) Combining temperature data with precipitation asinput and inserting to the proposed models hadminimum performance increment in Borkenastation compared to UTRB case study area whichclearly showed that temperature data scarcity hasmore performance loss implication in UTRB sta-tion On the other hand the Borkena station hassignificant natural streamflow variability thanUTRB which is also reflected in the model results-is implies the consideration of catchment re-sponse before any deep learning modelapplications

(iii) Rolling the time window of input and output timeseries for streamflow simulation using the proposedmodels considerably increases performance in theUTRB than in the Borkena station

(iv) -e analysis results also showed that training timeper epoch for the hybrid deep learning models ismuch lower than that of GRU and LSTM models

Deep learning models usually require massive datasetsand their performance drops with small to medium datasetsHowever from this case study acceptable results andconsidering hybrid modelsrsquo hyperparameters sensitivity andcomplexity future research may further design optimizedconfigurations Moreover they can test these hybrid modelsfor long-term streamflow simulation in ephemeral seasonaland perennial river systems and other fields of study Ourfuture research will try to synchronize the highly performedhybrid deep learning models in this study with remotesensing datasets for the problem we experience in theungauged catchments

Data Availability

-e raw hydrological and metrological datasets used for theBorkena watershed are available from the correspondingauthor upon request However authorization letters arerequired from the Ministry of Water Irrigation and Energy(MoWIE) of Ethiopia (httpmowiegovet) and the Na-tional Meteorological Agency of Ethiopia (NMA) (httpwwwethiometgovet) whereas for UTRB the datasets canbe retrieved from an online repository (httphydrogateunipgitwrme)

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

-e corresponding author acknowledges the Ministry ofWater Irrigation and Energy of Ethiopia (MoWIE) and theItalian National Research Center (Centro National di Ric-cerca CNR) for the hydrological and metrological datasets-e corresponding author also thanks Dr Fiseha Behulu forhis advice and series follow-up

14 Computational Intelligence and Neuroscience

References

[1] Z Zhang Q Zhang and V P Singh ldquoUnivariate streamflowforecasting using commonly used data-driven models liter-ature review and case studyrdquo Hydrological Sciences Journalvol 63 no 7 pp 1091ndash1111 2018

[2] L Ni D Wang V P Singh et al ldquoStreamflow and rainfallforecasting by two long short-term memory-based modelsrdquoJournal of Hydrology vol 583 Article ID 124296 2020

[3] X Yuan C Chen X Lei Y Yuan and R Muhammad AdnanldquoMonthly runoff forecasting based on LSTM-ALO modelrdquoStochastic Environmental Research and Risk Assessmentvol 32 no 8 pp 2199ndash2212 2018

[4] A Sichangi L Wang and Z Hu ldquoEstimation of river dis-charge solely from remote-sensing derived data an initialstudy over the Yangtze riverrdquo Remote Sensing vol 10 no 9p 1385 2018

[5] M J Tourian N Sneeuw and A Bardossy ldquoA quantilefunction approach to discharge estimation from satellite al-timetry (ENVISAT)rdquoWater Resources Research vol 49 no 7pp 4174ndash4186 2013

[6] A W Sichangi L Wang K Yang et al ldquoEstimating conti-nental river basin discharges using multiple remote sensingdata setsrdquo Remote Sensing of Environment vol 179 pp 36ndash532016

[7] S Aljahdali A Sheta and H Turabieh ldquoRiver flow fore-casting a comparison between feedforward and layered re-current neural networkrdquo Learning and Analytics in IntelligentSystems vol 43 pp 523ndash532 2020

[8] D Couta Y Zhang and Y Li ldquoRiver flow forecasting usinglong short-termmemoryrdquoDEStech Transactions on ComputerScience and Engineering vol 16 2019

[9] Y Bai N Bezak K Sapac M Klun and J Zhang ldquoShort-termstreamflow forecasting using the feature-enhanced regressionmodelrdquo Water Resources Management vol 33 no 14pp 4783ndash4797 2019

[10] L C D Campos L Goliatt da Fonseca T L FonsecaG D de Abreu L F Pires and Y Gorodetskaya ldquoShort-termstreamflow forecasting for paraıba do Sul river using deeplearningrdquo Progress in Artificial Intelligence vol 43 pp 507ndash518 2019

[11] B B Sahoo R Jha A Singh and D Kumar ldquoLong short-termmemory (LSTM) recurrent neural network for low-flowhydrological time series forecastingrdquo Acta Geophysica vol 67no 5 pp 1471ndash1481 2019

[12] M Rahimzad A Moghaddam Nia H Zolfonoon J SoltaniA Danandeh Mehr and H-H Kwon Performance Com-parison of an LSTM-Based Deep Learning Model versusConventional Machine Learning Algorithms for StreamflowForecasting Water Resources Management New York NYUSA 2021

[13] K Cho B vanMerrienboer D Bahdanau and Y Bengio ldquoOnthe properties of neural machine translation enco-derndashdecoder approachesrdquo in Proceedings of the SSST-8 EighthWorkshop on Syntax Semantics and Structure in StatisticalTranslation pp 103ndash111 Doha Qatar October 2014

[14] C Chen W He J Li and Z Tang ldquoA novel hybrid CNN-LSTM scheme for nitrogen oxide emission prediction inFCC unitrdquoMathematical Problems in Engineering vol 2020Article ID 8071810 2020

[15] T Li M Hua and X Wu ldquoA hybrid CNN-LSTM model forforecasting particulate matter (PM25)rdquo IEEE Access vol 8pp 26933ndash26940 2020

[16] I E Livieris E Pintelas and P Pintelas ldquoA CNNndashLSTMmodel for gold price time-series forecastingrdquo Neural Com-puting and Applications vol 32 2020

[17] A A M Ahmed R C Deo N Raj et al ldquoDeep learningforecasts of soil moisture convolutional neural network andgated recurrent unit models coupled with satellite-derivedMODIS observations and synoptic-scale climate index datardquoRemote Sensing vol 13 no 4 p 554 2021

[18] Y Liu T Zhang A Kang J Li and X Lei ldquoResearch onrunoff simulations using deep-learning methodsrdquo Sustain-ability vol 13 no 3 p 1336 2021

[19] A U Muhammad X Li and J Feng ldquoUsing LSTM GRU andhybrid models for streamflow forecastingrdquoMachine Learningand Intelligent Communications vol 294 pp 510ndash524 2019

[20] R Barzegar M T Aalami and J Adamowski ldquoShort-termwater quality variable prediction using a hybrid CNN-LSTMdeep learning modelrdquo Stochastic Environmental Research andRisk Assessment vol 34 no 2 pp 415ndash433 2020

[21] S P Van H M Le D V -anh T D Dang H H Loc andD T Anh ldquoDeep learning convolutional neural network inrainfall-runoff modellingrdquo Journal of Hydroinformaticsvol 22 no 3 pp 541ndash561 2020

[22] P Parisouj H Mohebzadeh and T Lee ldquoEmploying machinelearning algorithms for streamflow prediction a case study offour river basins with different climatic zones in the UnitedStatesrdquo Water Resources Management vol 34 no 13pp 4113ndash4131 2020

[23] A Annis and F Nardi ldquoIntegrating VGI and 2D hydraulicmodels into a data assimilation framework for real time floodforecasting and mappingrdquo Geo-Spatial Information Sciencevol 22 no 4 pp 223ndash236 2019

[24] B M Fiseha S G Setegn A M Melesse E Volpi andA Fiori ldquoImpact of climate change on the hydrology of upperTiber River Basin using bias corrected regional climatemodelrdquo Water Resources Management vol 28 no 5pp 1327ndash1343 2014

[25] S Rania B Waad and E Nadia ldquoHybrid feature selectionmethod based on the genetic algorithm and pearson corre-lation coefficientrdquo Machine Learning Paradigms Ceory andApplication Studies in Computational Intelligence vol 8012000

[26] A D Mehr and A H Gandomi ldquoMSGP-LASSO an im-proved multi-stage genetic programming model for stream-flow predictionrdquo Information Sciences vol 561 pp 181ndash1952021

[27] R Kun F Wei Q Jihong Z Xia and S Xiaoyu ldquoComparisonof eight filter-based feature selection methods for monthlystreamflow forecasting ndash three case studies on CAMELS datasetsrdquo Journal of Hydrology vol 586 Article ID 124897 2020

[28] O Oyebode ldquoEvolutionary modelling of municipal waterdemand with multiple feature selection techniquesrdquo Journalof Water Supply Research amp Technology - Aqua vol 68 no 4pp 264ndash281 2019

[29] R Dehghani H Torabi Poudeh and Z Izadi ldquoDissolvedoxygen concentration predictions for running waters withusing hybrid machine learning techniquesrdquo Modeling EarthSystems and Environment vol 597 2021

[30] N Yuvaraj V Chang B Gobinathan et al ldquoAutomatic de-tection of cyberbullying using multi-feature based artificialintelligence with deep decision tree classificationrdquo Computersamp Electrical Engineering vol 92 Article ID 107186 2021

[31] B Ratner ldquo-e correlation coefficient its values range be-tween +1minus1 or do theyrdquo Journal of Targeting Measurementand Analysis for Marketing vol 17 no 2 pp 139ndash142 2009

Computational Intelligence and Neuroscience 15

[32] S K Tanbeer C F Ahmed B-S Jeong and Y-K LeeldquoSliding window-based frequent pattern mining over datastreamsrdquo Information Sciences vol 179 no 22pp 3843ndash3865 2009

[33] E Zivot and J Wang ldquoRolling analysis of time seriesrdquo inModeling Financial Time Series with S-Plus pp 299ndash346Springer New York NY USA 2003

[34] Y Wang W Liao and Y Chang ldquoGated recurrent unitnetwork-based short-term photovoltaic forecastingrdquo Energiesvol 11 no 8 p 2163 2018

[35] O B Sezer M U Gudelek and A M Ozbayoglu ldquoFinancialtime series forecasting with deep learning a systematic lit-erature review 2005-2019rdquo Applied Soft Computing vol 90Article ID 106181 2020

[36] Q Zou Q Xiong Q Li H Yi Y Yu and C Wu ldquoA waterquality prediction method based on the multi-time scalebidirectional long short-term memory networkrdquo Environ-mental Science and Pollution Research vol 27 no 14pp 16853ndash16864 2020

[37] S-C Wang ldquoArtificial neural networkrdquo InterdisciplinaryComputing in Java Programming vol 26 pp 81ndash100 2003

[38] V Nourani A Molajou H Najafi and A Danandeh MehrldquoEmotional ANN (EANN) a new generation of neural net-works for hydrological modeling in IoTrdquo in Artificial Intel-ligence In IoT Transactions On Computational Science AndComputational Intelligence pp 45ndash61 Springer New YorkNY USA 2019

[39] L Yan J Feng and T Hang ldquoSmall watershed stream-flowforecasting based on LSTMrdquo Advances in Intelligent Systemsand Computing vol 935 pp 1006ndash1014 2019

[40] S Zhu X Luo X Yuan and Z Xu ldquoAn improved long short-termmemory network for streamflow forecasting in the upperYangtze Riverrdquo Stochastic Environmental Research and RiskAssessment vol 14 no 9 2020

[41] A Osama A Onur K Serkan G Moncef and I D InmanldquoReal-time vibration-based structural damage detection usingone-dimensional convolutional neural networksrdquo Journal ofSound and Vibration vol 388 pp 154ndash170 2017

[42] Y Lecun L Bottou Y Bengio and P Haffner ldquoGradient-based learning applied to document recognitionrdquo Proceedingsof the IEEE vol 86 no 11 pp 2278ndash2324 1998

[43] H-C Shin H R Roth M Gao et al ldquoDeep convolutionalneural networks for computer-aided detection CNN archi-tectures dataset characteristics and transfer learningrdquo IEEETransactions on Medical Imaging vol 35 no 5 pp 1285ndash1298 2016

[44] A Essien and C Giannetti ldquoA deep learning framework forunivariate time series prediction using convolutional LSTMstacked autoencodersrdquo in Proceedings of the IEEE Interna-tional Symposium on Innovations in Intelligent Systems andApplications (INISTA) pp 1ndash6 Alberobello Italy July 2019

[45] D S Abdelminaam F H Ismail M Taha A TahaE H Houssein and A Nabil ldquoCoAID-DEEP an optimizedintelligent framework for automated detecting COVID-19misleading information on Twitterrdquo IEEE Access vol 9pp 27840ndash27867 2021

[46] O Kazakov and O Mikheenko ldquoTransfer learning and do-main adaptation based on modeling of socio-economic sys-temsrdquo Business Inform vol 14 no 2 pp 7ndash20 2020

[47] B Wang and N Z Gong ldquoStealing hyperparameters inmachine learningrdquo in Proceedings of the IEEE Symposium onSecurity and Privacy (SP) pp 36ndash52 San Francisco CA USAMay 2018

[48] C Boyraz and S N Engin ldquoStreamflow prediction with deeplearningrdquo in Proceedings of the 6th International Conferenceon Control Engineering Information Technology (CEIT)pp 1ndash5 Istanbul Turkey October 2018

[49] E K Jackson W Roberts B Nelsen G P WilliamsE J Nelson and D P Ames ldquoIntroductory overview errormetrics for hydrologic modelling - a review of commonpractices and an open source library to facilitate use andadoptionrdquo Environmental Modelling amp Software vol 119pp 32ndash48 2019

[50] S Kumar T Roshni and D Himayoun ldquoA comparison ofemotional neural network (ENN) and artificial neural net-work (ANN) approach for rainfall-runoff modellingrdquo CivilEngineering Journal vol 5 no 10 pp 2120ndash2130 2019

[51] P Bai X Liu and J Xie ldquoSimulating runoff under changingclimatic conditions a comparison of the long short-termmemory network with two conceptual hydrologic modelsrdquoJournal of Hydrology vol 592 Article ID 125779 2021

16 Computational Intelligence and Neuroscience

Page 15: Multivariate Streamflow Simulation Using Hybrid Deep

References

[1] Z. Zhang, Q. Zhang, and V. P. Singh, "Univariate streamflow forecasting using commonly used data-driven models: literature review and case study," Hydrological Sciences Journal, vol. 63, no. 7, pp. 1091–1111, 2018.

[2] L. Ni, D. Wang, V. P. Singh et al., "Streamflow and rainfall forecasting by two long short-term memory-based models," Journal of Hydrology, vol. 583, Article ID 124296, 2020.

[3] X. Yuan, C. Chen, X. Lei, Y. Yuan, and R. Muhammad Adnan, "Monthly runoff forecasting based on LSTM-ALO model," Stochastic Environmental Research and Risk Assessment, vol. 32, no. 8, pp. 2199–2212, 2018.

[4] A. Sichangi, L. Wang, and Z. Hu, "Estimation of river discharge solely from remote-sensing derived data: an initial study over the Yangtze river," Remote Sensing, vol. 10, no. 9, p. 1385, 2018.

[5] M. J. Tourian, N. Sneeuw, and A. Bardossy, "A quantile function approach to discharge estimation from satellite altimetry (ENVISAT)," Water Resources Research, vol. 49, no. 7, pp. 4174–4186, 2013.

[6] A. W. Sichangi, L. Wang, K. Yang et al., "Estimating continental river basin discharges using multiple remote sensing data sets," Remote Sensing of Environment, vol. 179, pp. 36–53, 2016.

[7] S. Aljahdali, A. Sheta, and H. Turabieh, "River flow forecasting: a comparison between feedforward and layered recurrent neural network," Learning and Analytics in Intelligent Systems, vol. 43, pp. 523–532, 2020.

[8] D. Couta, Y. Zhang, and Y. Li, "River flow forecasting using long short-term memory," DEStech Transactions on Computer Science and Engineering, vol. 16, 2019.

[9] Y. Bai, N. Bezak, K. Sapac, M. Klun, and J. Zhang, "Short-term streamflow forecasting using the feature-enhanced regression model," Water Resources Management, vol. 33, no. 14, pp. 4783–4797, 2019.

[10] L. C. D. Campos, L. Goliatt da Fonseca, T. L. Fonseca, G. D. de Abreu, L. F. Pires, and Y. Gorodetskaya, "Short-term streamflow forecasting for Paraíba do Sul river using deep learning," Progress in Artificial Intelligence, vol. 43, pp. 507–518, 2019.

[11] B. B. Sahoo, R. Jha, A. Singh, and D. Kumar, "Long short-term memory (LSTM) recurrent neural network for low-flow hydrological time series forecasting," Acta Geophysica, vol. 67, no. 5, pp. 1471–1481, 2019.

[12] M. Rahimzad, A. Moghaddam Nia, H. Zolfonoon, J. Soltani, A. Danandeh Mehr, and H.-H. Kwon, Performance Comparison of an LSTM-Based Deep Learning Model versus Conventional Machine Learning Algorithms for Streamflow Forecasting, Water Resources Management, New York, NY, USA, 2021.

[13] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: encoder–decoder approaches," in Proceedings of the SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp. 103–111, Doha, Qatar, October 2014.

[14] C. Chen, W. He, J. Li, and Z. Tang, "A novel hybrid CNN-LSTM scheme for nitrogen oxide emission prediction in FCC unit," Mathematical Problems in Engineering, vol. 2020, Article ID 8071810, 2020.

[15] T. Li, M. Hua, and X. Wu, "A hybrid CNN-LSTM model for forecasting particulate matter (PM2.5)," IEEE Access, vol. 8, pp. 26933–26940, 2020.

[16] I. E. Livieris, E. Pintelas, and P. Pintelas, "A CNN–LSTM model for gold price time-series forecasting," Neural Computing and Applications, vol. 32, 2020.

[17] A. A. M. Ahmed, R. C. Deo, N. Raj et al., "Deep learning forecasts of soil moisture: convolutional neural network and gated recurrent unit models coupled with satellite-derived MODIS observations and synoptic-scale climate index data," Remote Sensing, vol. 13, no. 4, p. 554, 2021.

[18] Y. Liu, T. Zhang, A. Kang, J. Li, and X. Lei, "Research on runoff simulations using deep-learning methods," Sustainability, vol. 13, no. 3, p. 1336, 2021.

[19] A. U. Muhammad, X. Li, and J. Feng, "Using LSTM, GRU and hybrid models for streamflow forecasting," Machine Learning and Intelligent Communications, vol. 294, pp. 510–524, 2019.

[20] R. Barzegar, M. T. Aalami, and J. Adamowski, "Short-term water quality variable prediction using a hybrid CNN-LSTM deep learning model," Stochastic Environmental Research and Risk Assessment, vol. 34, no. 2, pp. 415–433, 2020.

[21] S. P. Van, H. M. Le, D. V. Thanh, T. D. Dang, H. H. Loc, and D. T. Anh, "Deep learning convolutional neural network in rainfall-runoff modelling," Journal of Hydroinformatics, vol. 22, no. 3, pp. 541–561, 2020.

[22] P. Parisouj, H. Mohebzadeh, and T. Lee, "Employing machine learning algorithms for streamflow prediction: a case study of four river basins with different climatic zones in the United States," Water Resources Management, vol. 34, no. 13, pp. 4113–4131, 2020.

[23] A. Annis and F. Nardi, "Integrating VGI and 2D hydraulic models into a data assimilation framework for real time flood forecasting and mapping," Geo-Spatial Information Science, vol. 22, no. 4, pp. 223–236, 2019.

[24] B. M. Fiseha, S. G. Setegn, A. M. Melesse, E. Volpi, and A. Fiori, "Impact of climate change on the hydrology of upper Tiber River Basin using bias corrected regional climate model," Water Resources Management, vol. 28, no. 5, pp. 1327–1343, 2014.

[25] S. Rania, B. Waad, and E. Nadia, "Hybrid feature selection method based on the genetic algorithm and Pearson correlation coefficient," Machine Learning Paradigms: Theory and Application, Studies in Computational Intelligence, vol. 801, 2000.

[26] A. D. Mehr and A. H. Gandomi, "MSGP-LASSO: an improved multi-stage genetic programming model for streamflow prediction," Information Sciences, vol. 561, pp. 181–195, 2021.

[27] R. Kun, F. Wei, Q. Jihong, Z. Xia, and S. Xiaoyu, "Comparison of eight filter-based feature selection methods for monthly streamflow forecasting – three case studies on CAMELS data sets," Journal of Hydrology, vol. 586, Article ID 124897, 2020.

[28] O. Oyebode, "Evolutionary modelling of municipal water demand with multiple feature selection techniques," Journal of Water Supply: Research & Technology – Aqua, vol. 68, no. 4, pp. 264–281, 2019.

[29] R. Dehghani, H. Torabi Poudeh, and Z. Izadi, "Dissolved oxygen concentration predictions for running waters with using hybrid machine learning techniques," Modeling Earth Systems and Environment, vol. 597, 2021.

[30] N. Yuvaraj, V. Chang, B. Gobinathan et al., "Automatic detection of cyberbullying using multi-feature based artificial intelligence with deep decision tree classification," Computers & Electrical Engineering, vol. 92, Article ID 107186, 2021.

[31] B. Ratner, "The correlation coefficient: its values range between +1/−1, or do they?" Journal of Targeting, Measurement and Analysis for Marketing, vol. 17, no. 2, pp. 139–142, 2009.


[32] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, "Sliding window-based frequent pattern mining over data streams," Information Sciences, vol. 179, no. 22, pp. 3843–3865, 2009.

[33] E. Zivot and J. Wang, "Rolling analysis of time series," in Modeling Financial Time Series with S-Plus, pp. 299–346, Springer, New York, NY, USA, 2003.

[34] Y. Wang, W. Liao, and Y. Chang, "Gated recurrent unit network-based short-term photovoltaic forecasting," Energies, vol. 11, no. 8, p. 2163, 2018.

[35] O. B. Sezer, M. U. Gudelek, and A. M. Ozbayoglu, "Financial time series forecasting with deep learning: a systematic literature review: 2005–2019," Applied Soft Computing, vol. 90, Article ID 106181, 2020.

[36] Q. Zou, Q. Xiong, Q. Li, H. Yi, Y. Yu, and C. Wu, "A water quality prediction method based on the multi-time scale bidirectional long short-term memory network," Environmental Science and Pollution Research, vol. 27, no. 14, pp. 16853–16864, 2020.

[37] S.-C. Wang, "Artificial neural network," Interdisciplinary Computing in Java Programming, vol. 26, pp. 81–100, 2003.

[38] V. Nourani, A. Molajou, H. Najafi, and A. Danandeh Mehr, "Emotional ANN (EANN): a new generation of neural networks for hydrological modeling in IoT," in Artificial Intelligence in IoT, Transactions on Computational Science and Computational Intelligence, pp. 45–61, Springer, New York, NY, USA, 2019.

[39] L. Yan, J. Feng, and T. Hang, "Small watershed stream-flow forecasting based on LSTM," Advances in Intelligent Systems and Computing, vol. 935, pp. 1006–1014, 2019.

[40] S. Zhu, X. Luo, X. Yuan, and Z. Xu, "An improved long short-term memory network for streamflow forecasting in the upper Yangtze River," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 9, 2020.

[41] A. Osama, A. Onur, K. Serkan, G. Moncef, and I. D. Inman, "Real-time vibration-based structural damage detection using one-dimensional convolutional neural networks," Journal of Sound and Vibration, vol. 388, pp. 154–170, 2017.

[42] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[43] H.-C. Shin, H. R. Roth, M. Gao et al., "Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1285–1298, 2016.

[44] A. Essien and C. Giannetti, "A deep learning framework for univariate time series prediction using convolutional LSTM stacked autoencoders," in Proceedings of the IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA), pp. 1–6, Alberobello, Italy, July 2019.

[45] D. S. Abdelminaam, F. H. Ismail, M. Taha, A. Taha, E. H. Houssein, and A. Nabil, "CoAID-DEEP: an optimized intelligent framework for automated detecting COVID-19 misleading information on Twitter," IEEE Access, vol. 9, pp. 27840–27867, 2021.

[46] O. Kazakov and O. Mikheenko, "Transfer learning and domain adaptation based on modeling of socio-economic systems," Business Inform, vol. 14, no. 2, pp. 7–20, 2020.

[47] B. Wang and N. Z. Gong, "Stealing hyperparameters in machine learning," in Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 36–52, San Francisco, CA, USA, May 2018.

[48] C. Boyraz and S. N. Engin, "Streamflow prediction with deep learning," in Proceedings of the 6th International Conference on Control Engineering & Information Technology (CEIT), pp. 1–5, Istanbul, Turkey, October 2018.

[49] E. K. Jackson, W. Roberts, B. Nelsen, G. P. Williams, E. J. Nelson, and D. P. Ames, "Introductory overview: error metrics for hydrologic modelling – a review of common practices and an open source library to facilitate use and adoption," Environmental Modelling & Software, vol. 119, pp. 32–48, 2019.

[50] S. Kumar, T. Roshni, and D. Himayoun, "A comparison of emotional neural network (ENN) and artificial neural network (ANN) approach for rainfall-runoff modelling," Civil Engineering Journal, vol. 5, no. 10, pp. 2120–2130, 2019.

[51] P. Bai, X. Liu, and J. Xie, "Simulating runoff under changing climatic conditions: a comparison of the long short-term memory network with two conceptual hydrologic models," Journal of Hydrology, vol. 592, Article ID 125779, 2021.
