

Journal of Hydrology 511 (2014) 764–775

Contents lists available at ScienceDirect

Journal of Hydrology

journal homepage: www.elsevier.com/locate/jhydrol

Monthly streamflow prediction using modified EMD-based support vector machine

http://dx.doi.org/10.1016/j.jhydrol.2014.01.062
0022-1694/© 2014 Elsevier B.V. All rights reserved.

⇑ Corresponding author. Tel.: +86 15686060577; fax: +86 29 82312797.
E-mail addresses: [email protected] (S. Huang), chxiang@xaut.edu.cn (J. Chang).

Shengzhi Huang, Jianxia Chang ⇑, Qiang Huang, Yutong Chen
State Key Laboratory Base of Eco-Hydraulic Engineering in Arid Area, Xi'an University of Technology, Xi'an 710048, China

Article info

Article history:
Received 24 July 2013
Received in revised form 23 January 2014
Accepted 28 January 2014
Available online 7 February 2014
This manuscript was handled by Andras Bardossy, Editor-in-Chief, with the assistance of Sheng Yue, Associate Editor

Keywords:
Empirical mode decomposition
Support vector machine
Monthly streamflow
The Wei River Basin
Grid search method

Summary

Accurate monthly streamflow prediction is of great significance for the operation, planning and dispatching of hydropower stations. The main goal of this study is therefore to investigate the accuracy of a modified EMD–SVM model for monthly streamflow forecasting in the Wei River Basin. The modification improves on the conventional EMD–SVM model by removing the highest-frequency component (IMF1). The EMD–SVM model combines empirical mode decomposition (EMD) with a support vector machine (SVM). The grid search method was employed to obtain the optimal c and g values of the SVM. Three standard quantitative performance measures were used to evaluate the ANN, SVM, EMD–SVM and M-EMDSVM models: root mean squared error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). The comparison shows that the M-EMDSVM approach is a superior alternative to the ANN, SVM and EMD–SVM models for forecasting monthly streamflow at Huaxian hydrological station, where its prediction pass rate reaches 82.6%. To further examine the stability and representativeness of the modified EMD–SVM model, the model was verified at the Lintong and Xianyang stations. The results show that the modified EMD–SVM model has good stability and representativeness as well as high prediction precision.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Streamflow prediction, especially long-term forecasting, is of great significance for the optimal management of water resources. Many water resources applications rely on runoff forecasts for decision making (Guo et al., 2011; Wang et al., 2009). These include hydropower generation, optimal reservoir operation, the rational allocation of water resources and environmental protection. Streamflow prediction has therefore attracted considerable attention in recent decades.

Physically based numerical models are typically employed to predict streamflow (Partington et al., 2012; Marco et al., 2010). They simulate the streamflow generation process through governing equations subject to specific boundary conditions. However, these models require a great deal of precise data for parameter calibration. In practice, such data are extremely difficult to acquire, which leads to poor model performance and additional uncertainty (Yoon et al., 2011).

Many previous researchers have used time-series models, including AR, MA and ARIMA models (Raman and Sunilkumar, 1995), to forecast streamflow. These models all assume a linear relationship between the input and output series, whereas the actual relationship is highly nonlinear. They therefore miss the nonlinear information hidden in the streamflow series and perform poorly. With their strong nonlinear mapping ability, ANNs have been successfully applied in many fields, including hydrology and water resources (Yoon et al., 2007; Zealand et al., 1999; Sudheer et al., 2002). Nevertheless, ANNs have their own shortcomings, including slow learning, over-fitting, the curse of dimensionality and convergence to local minima. As a result, they tend to perform poorly on complex hydrological processes. Support Vector Machines (SVMs), proposed by Cortes and Vapnik (1995), are instead based on VC dimension theory and the structural risk minimization principle, and can in theory achieve the global optimum. In recent decades, SVMs have been applied in hydrology:

- Yu et al. (2012) introduced SVM to predict multi-layer soil moisture.
- Lin et al. (2013) employed SVM to forecast typhoon floods.
- Yu et al. (2006) proposed a real-time flood stage forecasting model based on SVR.
- Chen et al. (2010) predicted daily precipitation using support vector machines and multivariate analysis.
- Lin et al. (2009) employed SVM to forecast hourly reservoir inflow during typhoon-warning periods.

Streamflow is affected by many factors, including rainfall, evaporation and atmospheric circulation. Its generation process therefore tends to be uncertain, highly nonlinear and time-varying, especially during extreme weather, so the monthly streamflow series contains components of different frequencies. However, many previous studies (Guo et al., 2011) used the original series directly as input variables when constructing a forecasting model, which misses features at different resolutions. According to Chou and Wang (2004), a streamflow prediction model built on a single-resolution component cannot easily reflect the internal mechanism of runoff changes. For this reason, empirical mode decomposition (EMD) is used for data preprocessing in this paper.

Unlike wavelet analysis and almost all other earlier decomposition methods, EMD is based on the principle of local scale separation and does not need any predetermined basis functions (Huang et al., 1998; Kim and Oh, 2008; Lee and Ouarda, 2010). EMD is thus adaptive, empirical, direct and intuitive. It decomposes each series into a set of components called intrinsic mode functions (IMFs). Because the basis is adaptive, the IMFs are physically meaningful and no harmonics are needed (Kim and Oh, 2006, 2009). Unlike wavelet analysis, EMD is therefore ideally suited to analyzing nonlinear and non-stationary data (Huang and Wu, 2008). The monthly streamflow series is highly nonlinear and non-stationary, which is why EMD rather than wavelet analysis is used for data preprocessing in this paper.

Based on the above, this paper develops a modified streamflow forecasting model. It improves on the EMD-based support vector machine (EMD–SVM) by removing the highest-frequency component (IMF1). The paper is organized as follows:

- Section 2 briefly introduces the methodology, including EMD and SVM.
- Section 3 describes the study area and data.
- Section 4 presents and discusses the results of the case study.
- Section 5 draws conclusions.

2. Methodology

2.1. Empirical mode decomposition (EMD)

EMD is an adaptive method for signal analysis proposed by Huang et al. (1998), specifically designed to analyze nonlinear and non-stationary data. Coupled with Hilbert spectral analysis (HSA), EMD forms the Hilbert–Huang transform (HHT), a new data-driven paradigm for data analysis (Huang and Wu, 2008). In general, HHT works similarly to wavelet analysis (WA). However, the theoretical basis of WA is mathematical, whereas that of HHT is empirical. In addition, unlike WA, HHT does not fix its basis functions a priori. EMD decomposes a time series $x(t)$ into a set of approximately orthogonal, band-limited functions $c_i(t)$, $i = 1, \dots, L$, called intrinsic mode functions (IMFs). It also yields a residual $r(t)$ representing the overall trend of the series. Summing all the IMFs and the final residual reconstructs the original signal $x(t)$:

$$x(t) = \sum_{i=1}^{L} c_i(t) + r(t) \tag{1}$$

The IMFs carry physically meaningful instantaneous frequency and amplitude. In other words, they provide a physically meaningful time–frequency–energy description of a time series (Huang and Wu, 2008). Because the basis adapts to local variations in the data, it can capture the physics of the underlying processes, which is a significant advantage for analyzing nonlinear and non-stationary data. Huang et al. (1998) argued that fitting all phenomena with a predetermined basis, as Fourier analysis does, leads to "harmonic distortion" when the data are nonlinear and non-stationary. EMD instead derives the necessary adaptive basis directly from the data. Furthermore, superposition does not hold in a nonlinear system, so any linear expansion of a nonlinear system has no physical significance. The HHT, by contrast, obtains the physical meaning of the full nonlinear system from its individual components rather than from a linear expansion. In this way, HHT can capture the important features of non-stationary and nonlinear data (Huang et al., 1998; Kijewski-Correa and Kareem, 2007; Lee and Ouarda, 2011).

Every IMF is a time series that must satisfy two conditions:

(1) the number of extrema (maxima and minima) and the number of zero crossings must be equal or differ by at most one;

(2) at any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero.

IMFs are produced by an iterative process called "sifting". Sifting is the core of the EMD approach; its main function is to remove riding waves and make the wave profiles more symmetric. The sifting process is as follows (Huang et al., 1998; Yang et al., 2010):

(1) Identify all local extrema, both maxima and minima, of $x(t)$.

(2) Connect all local maxima with a cubic spline to form the upper envelope $e_{up}(t)$, and all local minima with a cubic spline to form the lower envelope $e_{low}(t)$.

(3) Calculate the mean of the envelopes, $m(t) = [e_{low}(t) + e_{up}(t)]/2$.

(4) Obtain the detail $h(t) = x(t) - m(t)$.

(5) Check $h(t)$:
- If $h(t)$ satisfies both IMF conditions above, it is taken as an IMF. The residual, i.e. the signal from which this IMF was sifted minus the IMF, then replaces $x(t)$ before the next IMF is extracted.
- Otherwise, $h(t)$ is not an IMF and replaces $x(t)$.
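IMF condition (1) can be checked numerically. The minimal Python sketch below covers only that condition: it counts strict local extrema and sign changes of a sampled series. Condition (2) requires the spline envelopes of steps (2) and (3) and is not implemented here. The function names are illustrative, and skipping exact zeros when counting crossings is an assumption of this sketch, not part of the original method.

```python
from typing import Sequence


def count_extrema(x: Sequence[float]) -> int:
    """Count strict interior local maxima and minima of a sampled series."""
    count = 0
    for i in range(1, len(x) - 1):
        is_max = x[i] > x[i - 1] and x[i] > x[i + 1]
        is_min = x[i] < x[i - 1] and x[i] < x[i + 1]
        if is_max or is_min:
            count += 1
    return count


def count_zero_crossings(x: Sequence[float]) -> int:
    """Count sign changes between consecutive non-zero samples."""
    signs = [v > 0 for v in x if v != 0]  # assumption: exact zeros are skipped
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def satisfies_imf_condition_1(x: Sequence[float]) -> bool:
    """IMF condition (1): extrema and zero crossings are equal or differ by at most one."""
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1
```

A series with riding waves that never crosses zero, such as `[1, 3, 2, 4, 3, 5, 4, 6]`, has six extrema and no crossings, so it fails the check and must be sifted further.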

Steps (1)–(5) are repeated until the stoppage criterion is met. A Cauchy-type convergence test serves as the stoppage criterion: the normalized squared difference between two successive sifting results must be small. This difference is defined as:

$$SD_k = \sum_{t=0}^{T} \frac{\left| h_{1(k-1)}(t) - h_{1k}(t) \right|^2}{h_{1(k-1)}^2(t)} \tag{2}$$

Following Huang et al. (1998), sifting stops once $SD_k$ falls below a threshold set between 0.2 and 0.3. The current detail, $c_1(t) = h_{1k}(t)$, is then taken as the first IMF. Extensive tests have shown that limiting $SD_k$ to 0.2–0.3 ensures that the resulting IMF retains sufficient physical meaning.
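Eq. (2) and the 0.2–0.3 rule translate directly into code. The sketch below uses hypothetical helper names and is not the authors' implementation. It assumes the two sifting results are equal-length lists, and it skips any sample where $h_{1(k-1)}(t) = 0$, because the text does not say how a zero denominator is handled.

```python
from typing import Sequence


def sifting_sd(h_prev: Sequence[float], h_curr: Sequence[float]) -> float:
    """Eq. (2): normalized squared difference between two successive sifting results."""
    if len(h_prev) != len(h_curr):
        raise ValueError("both sifting results must have the same length")
    total = 0.0
    for a, b in zip(h_prev, h_curr):
        if a == 0.0:
            continue  # assumption: skip samples where the denominator vanishes
        total += (a - b) ** 2 / a ** 2
    return total


def sifting_has_converged(h_prev: Sequence[float], h_curr: Sequence[float],
                          threshold: float = 0.2) -> bool:
    """Stop sifting once SD_k drops below a threshold chosen in the 0.2-0.3 range."""
    return sifting_sd(h_prev, h_curr) < threshold
```

Raising `threshold` toward 0.3 ends sifting earlier. Lowering it runs more iterations, which risks over-sifting the IMF into a constant-amplitude wave.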

2.2. Support vector machine (SVM)

The support vector machine (SVM), one of the most effective forecasting tools of recent years, was proposed by Vapnik (1995). The basic idea of SVM is to realize nonlinear class boundaries with a linear model. The input vectors are projected into a high-dimensional feature space by a nonlinear mapping, and a linear model built in this new space represents a nonlinear decision boundary in the original space. SVM is based on the structural risk minimization principle rather than empirical risk minimization. Many papers and books describe the theory of SVM in detail (Vapnik, 1998; Lin et al., 2008; Gao et al., 2001; Karamouz et al., 2009), so only a brief description of support vector regression is given here.

Consider $N$ training data $\{(x_i, d_i)\}_{i=1}^{N}$, where $x_i$ is the input vector, $d_i$ is the desired value and $N$ is the total number of training data. The SVM regression estimator is expressed as:

$$y = f(x) = w^{\top}\phi(x) + b \tag{3}$$

where $\phi(\cdot)$ is a nonlinear transfer function mapping the input vectors into a high-dimensional feature space, $w$ is the weight vector and $b$ is the bias. The coefficients $w$ and $b$ are estimated by minimizing the following regularized risk function (Vapnik, 1995, 1998):

$$r(C) = C\,\frac{1}{N}\sum_{i=1}^{N} L_{\varepsilon}(d_i, y_i) + \frac{1}{2}\lVert w \rVert^2 \tag{4}$$

where

$$L_{\varepsilon}(d, y) = \begin{cases} |d - y| - \varepsilon, & \text{if } |d - y| \ge \varepsilon \\ 0, & \text{otherwise} \end{cases} \tag{5}$$
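The following Python sketch evaluates the ε-insensitive loss of Eq. (5) and the regularized risk of Eq. (4) to make them concrete. It assumes an explicit, finite-dimensional weight vector `w` purely for illustration. In an actual kernel SVM, $w$ lives in the feature space and is never formed explicitly.

```python
from typing import Sequence


def eps_insensitive_loss(d: float, y: float, eps: float) -> float:
    """Eq. (5): zero inside the eps-tube, |d - y| - eps outside it."""
    deviation = abs(d - y)
    return deviation - eps if deviation >= eps else 0.0


def regularized_risk(d: Sequence[float], y: Sequence[float], w: Sequence[float],
                     C: float, eps: float) -> float:
    """Eq. (4): C times the mean eps-insensitive loss plus 0.5 * ||w||^2."""
    empirical = sum(eps_insensitive_loss(di, yi, eps) for di, yi in zip(d, y)) / len(d)
    flatness = 0.5 * sum(wi * wi for wi in w)  # assumption: explicit finite w
    return C * empirical + flatness
```

A forecast within ε of the observation costs nothing, so `eps_insensitive_loss(1.0, 1.05, 0.1)` is `0.0`. Only the part of a deviation beyond ε is penalized.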

In Eq. (4), the first term is the empirical risk, measured by Eq. (5). $L_{\varepsilon}(d, y)$ is the ε-insensitive loss function: when the forecast value lies within the ε-tube, the loss is zero. The second term measures the flatness of the function. $C$ is the regularization constant, which determines the weight of the empirical error in the optimization problem. Increasing $C$ increases the relative importance of the empirical risk with respect to the regularization term. $\varepsilon$ is the error tolerance, equivalent to the approximation accuracy required of the training process. $\xi_i$ and $\xi_i^*$ are positive slack variables that penalize training errors exceeding the error tolerance $\varepsilon$. The minimization of Eq. (4) is then converted into the following constrained form:

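$$\text{Minimize: } \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^*\right) \tag{6}$$

$$\text{subject to: } \begin{cases} w^{\top}\phi(x_i) + b - d_i \le \varepsilon + \xi_i^* \\ d_i - w^{\top}\phi(x_i) - b \le \varepsilon + \xi_i \\ \xi_i,\ \xi_i^* \ge 0, \quad i = 1, 2, \dots, N \end{cases}$$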

To solve this constrained optimization problem, the primal Lagrangian is constructed as follows:

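$$\begin{aligned} L = {} & \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^*\right) - \sum_{i=1}^{N}\alpha_i\left(w^{\top}\phi(x_i) + b - d_i + \varepsilon + \xi_i\right) \\ & - \sum_{i=1}^{N}\alpha_i^*\left(d_i - w^{\top}\phi(x_i) - b + \varepsilon + \xi_i^*\right) - \sum_{i=1}^{N}\left(\beta_i\xi_i + \beta_i^*\xi_i^*\right) \end{aligned} \tag{7}$$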

Eq. (7) is minimized with respect to the primal variables $w$, $b$, $\xi_i$ and $\xi_i^*$. It is maximized with respect to the non-negative Lagrange multipliers $\alpha_i$, $\alpha_i^*$, $\beta_i$ and $\beta_i^*$. Finally, applying the Karush–Kuhn–Tucker conditions to the regression problem gives the dual Lagrangian form of Eq. (7):

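$$\upsilon(\alpha_i, \alpha_i^*) = \sum_{i=1}^{N} d_i(\alpha_i - \alpha_i^*) - \varepsilon\sum_{i=1}^{N}(\alpha_i + \alpha_i^*) - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,K(x_i, x_j) \tag{8}$$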

subject to $\sum_{i=1}^{N}(\alpha_i - \alpha_i^*) = 0$ and $\alpha_i, \alpha_i^* \in [0, C]$, $i = 1, 2, \dots, N$. In Eq. (8), the Lagrange multipliers satisfy $\alpha_i \alpha_i^* = 0$. Once the Lagrange multipliers $\alpha_i$ and $\alpha_i^*$ are computed, the optimal weight vector of the regression hyperplane is:

$$w^* = \sum_{i=1}^{N}(\alpha_i - \alpha_i^*)\,\phi(x_i) \tag{9}$$

Therefore, the regression function can be expressed as follows:

$$f(x, \alpha, \alpha^*) = \sum_{i=1}^{N}(\alpha_i - \alpha_i^*)\,K(x, x_i) + b \tag{10}$$

where $K(x, x_i)$ is the kernel function, defined as:

$$K(x_i, x_j) = \phi(x_i)^{\top}\phi(x_j) \tag{11}$$

Any function satisfying Mercer's condition (Vapnik, 1998) can be used as the kernel function. In this study, the radial basis function (RBF) is employed as the kernel:

$$K(x, x_j) = \exp\left(-\frac{\lVert x - x_j \rVert^2}{2\sigma^2}\right) \tag{12}$$

where $\sigma$ is the width (standard deviation) of the Gaussian kernel.
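The sketch below evaluates the RBF kernel of Eq. (12) and the dual-form regression function of Eq. (10) for given coefficients. Fitting the coefficients, i.e. solving Eq. (8), is left to an SVM solver. The helper `sigma_to_gamma` is included because many SVM packages, LIBSVM among them, parameterize the RBF kernel as exp(−γ‖x − x_j‖²), so γ = 1/(2σ²). Whether the g tuned in this study follows that convention is an assumption here, and all function names are hypothetical.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], xj: Sequence[float], sigma: float) -> float:
    """Eq. (12): K(x, x_j) = exp(-||x - x_j||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigma_to_gamma(sigma: float) -> float:
    """Express the kernel width as gamma in the form exp(-gamma * ||x - x_j||^2)."""
    return 1.0 / (2.0 * sigma ** 2)


def svr_predict(x: Sequence[float], support_vectors: Sequence[Sequence[float]],
                dual_coef: Sequence[float], b: float, sigma: float) -> float:
    """Eq. (10): f(x) = sum_i (alpha_i - alpha_i^*) K(x, x_i) + b.

    dual_coef[i] holds the difference alpha_i - alpha_i^* for support_vectors[i].
    """
    return sum(c * rbf_kernel(x, sv, sigma)
               for c, sv in zip(dual_coef, support_vectors)) + b
```

Only samples with a non-zero `dual_coef` (the support vectors) contribute to the prediction. This is why the dual form is cheap to evaluate once training is done.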

2.3. The modified EMD-based SVM model

To capture more features of the nonlinear and non-stationary monthly streamflow series at different resolutions, EMD is used to decompose the original series into several IMFs of different frequencies and one residual series. For brevity, the IMFs and the residual are collectively called sub-series in the following. Each sub-series is then modeled and forecast by its own SVM. Finally, the fitted and predicted values of the original monthly streamflow series are obtained by summing the corresponding results of all sub-series. This model is denoted EMD–SVM.

According to Giulia et al. (2011) and Guo et al. (2012), the highest-frequency component (IMF1) is the most nonlinear and unsystematic part of the monthly streamflow series. Although its amplitude is always very small, it greatly disturbs the prediction accuracy of monthly streamflow. These characteristics make IMF1 the most difficult component to model accurately and the main source of error in the model fitting process. Even when the model fits IMF1 well, its prediction is unstable and often poor. Moreover, the more nonlinear and non-stationary the original series, the more irregular the predicted IMFs and the worse the prediction accuracy. Based on the above, a modified model (M-EMDSVM) that removes IMF1 is employed to improve the prediction accuracy for the non-stationary monthly streamflow series.
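EMD–SVM and M-EMDSVM differ only in which sub-series forecasts enter the final sum. The minimal Python sketch below assumes each sub-series forecast is already available as a list keyed by a hypothetical name such as "IMF1" or "residual". It is an illustration of the combination step, not the authors' code.

```python
from typing import Dict, Iterable, List


def combine_subseries(forecasts: Dict[str, List[float]],
                      drop: Iterable[str] = ("IMF1",)) -> List[float]:
    """Sum the per-sub-series forecasts, skipping the components named in `drop`.

    drop=() gives the plain EMD-SVM ensemble; drop=("IMF1",) gives M-EMDSVM.
    """
    dropped = set(drop)
    kept = [values for name, values in forecasts.items() if name not in dropped]
    if not kept:
        raise ValueError("no sub-series left to combine")
    # Add the forecasts of all kept sub-series month by month.
    return [sum(step) for step in zip(*kept)]
```

Dropping IMF1 discards its small, hard-to-predict contribution. The modified model thus trades a slight bias for the removal of the least stable forecast component.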

In addition, motivated by the identification of the order p in the ARMA(p, q) model, the partial autocorrelation function (PACF) is used so that the relationship between the inputs and output of the SVM is not neglected. The input variables are determined from the partial autocorrelation diagram, i.e. the plot of the PACF against lag. Specifically, let $x_i$ be the output variable. If the PACF at lag $k$ lies outside the 95% confidence interval, approximately $[-1.96/\sqrt{n},\ 1.96/\sqrt{n}]$ with $n$ the series length, then $x_{i-k}$ is selected as an input variable. Sometimes none of the PACF coefficients lies outside the 95% confidence interval; in that case, the previous value $x_{i-1}$ is used as the input variable. In this paper, all models select inputs with this method, including the basic SVM and the SVMs in EMD–SVM and its modified version.
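The input selection rule can be written in a few lines of Python. This sketch assumes the PACF values at lags 1, 2, … have already been computed, and it reads "outside the interval" as an absolute value strictly greater than 1.96/√n. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def select_input_lags(pacf: Sequence[float], n: int) -> List[int]:
    """Return lags k whose PACF lies outside the approximate 95% band +/-1.96/sqrt(n).

    pacf[k - 1] is the partial autocorrelation at lag k and n is the series length.
    When no lag is significant, fall back to lag 1 (the previous value x_{i-1}).
    """
    bound = 1.96 / math.sqrt(n)
    lags = [k for k, value in enumerate(pacf, start=1) if abs(value) > bound]
    return lags or [1]
```

For a series of n = 432 values the bound is about 0.094. A returned list `[1, 2, 4]` would mean that $x_{t-1}$, $x_{t-2}$ and $x_{t-4}$ are used as inputs.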

The ARMA(p, q) model is written as:

$$\varphi(B)x_t = \theta(B)a_t \tag{13}$$

where

$$\varphi(B) = 1 - \varphi_1 B - \varphi_2 B^2 - \cdots - \varphi_p B^p, \qquad \theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q$$

with $B$ the backshift operator and $a_t$ a white-noise series.

The PACF is briefly introduced as follows. For a time series $\{w_1, w_2, \dots, w_n\}$, the covariance at lag $k$, denoted $\gamma_k$, is computed by Eq. (14); when $k = 0$ it is the variance.

$$\gamma_k = \frac{1}{n}\sum_{i=1}^{n-k}(w_i - \bar{w})(w_{i+k} - \bar{w}), \qquad k = 0, 1, \dots, M \tag{14}$$

where $\bar{w}$ is the mean of the series and $M = n/4$ is the maximum lag. Clearly, $\gamma_{-k} = \gamma_k$.

The autocorrelation function (ACF) at lag $k$, denoted $\rho_k$, is then computed from Eq. (15):

$$\rho_k = \frac{\gamma_k}{\gamma_0} \tag{15}$$

From the covariance and the resulting ACF, the PACF at lag $k$, denoted $\alpha_{kk}$, is calculated as follows:

$$\begin{cases} \alpha_{11} = \rho_1 \\ \alpha_{k+1,k+1} = \dfrac{\rho_{k+1} - \sum_{j=1}^{k}\rho_{k+1-j}\,\alpha_{kj}}{1 - \sum_{j=1}^{k}\rho_j\,\alpha_{kj}} \\ \alpha_{k+1,j} = \alpha_{kj} - \alpha_{k+1,k+1}\,\alpha_{k,k-j+1}, \quad j = 1, 2, \dots, k \end{cases} \tag{16}$$

where $k = 1, 2, \dots, M$. Based on the above approaches and techniques, the M-EMDSVM framework is presented in Fig. 1.
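Eqs. (14)–(16) amount to the sample autocovariance followed by the Durbin–Levinson recursion. The Python sketch below implements them directly from the definitions above. The function names are illustrative, and the default maximum lag is $M = n/4$ rounded down.

```python
from typing import List, Optional, Sequence


def autocovariance(w: Sequence[float], k: int) -> float:
    """Eq. (14): gamma_k = (1/n) sum_{i=1}^{n-k} (w_i - mean)(w_{i+k} - mean)."""
    n = len(w)
    mean = sum(w) / n
    return sum((w[i] - mean) * (w[i + k] - mean) for i in range(n - k)) / n


def pacf(w: Sequence[float], max_lag: Optional[int] = None) -> List[float]:
    """Eqs. (15)-(16): PACF values alpha_kk for k = 1..M (M = n // 4 by default)."""
    M = max_lag if max_lag is not None else len(w) // 4
    gamma0 = autocovariance(w, 0)
    rho = [autocovariance(w, k) / gamma0 for k in range(M + 1)]  # rho[0] == 1

    phi = [rho[1]]     # phi[j - 1] holds alpha_{k,j} for the current order k
    result = [rho[1]]  # alpha_11 = rho_1
    for k in range(1, M):
        num = rho[k + 1] - sum(rho[k + 1 - j] * phi[j - 1] for j in range(1, k + 1))
        den = 1.0 - sum(rho[j] * phi[j - 1] for j in range(1, k + 1))
        a_next = num / den  # alpha_{k+1,k+1}
        # alpha_{k+1,j} = alpha_{k,j} - alpha_{k+1,k+1} * alpha_{k,k-j+1}
        phi = [phi[j - 1] - a_next * phi[k - j] for j in range(1, k + 1)] + [a_next]
        result.append(a_next)
    return result
```

The recursion reuses the order-k coefficients to build order k+1. This avoids solving a new Yule–Walker system at every lag.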

Fig. 1. The overall process of the M-EMDSVM ensemble methodology. [Flowchart: time series data → EMD → IMF1, IMF2, …, IMFn and residual Rn; each sub-series has its own PACF-based input selection (PACF1, …, PACFn+1) and SVM (SVM1, …, SVMn+1); the high-frequency component is removed; the remaining outputs give the fitness and prediction.]

2.4. Three comparison indicators

The root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) were used to evaluate the accuracy of the ANN, SVM, EMD–SVM and M-EMDSVM models. The RMSE, MAE and MAPE are defined as:

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y_i^{\text{observed}} - Y_i^{\text{estimated}}\right)^2} \tag{17}$$

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|Y_i^{\text{observed}} - Y_i^{\text{estimated}}\right| \tag{18}$$

$$MAPE = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{Y_i^{\text{observed}} - Y_i^{\text{estimated}}}{Y_i^{\text{observed}}}\right| \times 100\% \tag{19}$$

RMSE evaluates the goodness of fit for high streamflow values, whereas MAE and MAPE give a more balanced assessment of the fit at moderate streamflows (Karunanithi et al., 1994). Here $N$ is the number of data points and $Y_i$ is the monthly streamflow.
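Eqs. (17)–(19) are straightforward to compute. The minimal Python sketch below assumes that observed and estimated values are equal-length lists and that no observed value is zero; MAPE is undefined otherwise.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], est: Sequence[float]) -> float:
    """Eq. (17): root mean squared error."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def mae(obs: Sequence[float], est: Sequence[float]) -> float:
    """Eq. (18): mean absolute error."""
    return sum(abs(o - e) for o, e in zip(obs, est)) / len(obs)


def mape(obs: Sequence[float], est: Sequence[float]) -> float:
    """Eq. (19): mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((o - e) / o) for o, e in zip(obs, est)) / len(obs)
```

Because RMSE squares the errors, a single large deviation raises RMSE more than MAE. This is why RMSE is the indicator most sensitive to errors at high flows.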

3. Study area and data

The Wei River Basin, shown in Fig. 2, was selected for this research. The Wei River is the largest tributary of the Yellow River. It originates from Bird-Mouse Mountain in Weiyuan County, Gansu Province, and flows about 818 km through Longxi, Wushan, Tianshui, Baoji, Yangling, Xianyang and Xi'an to Tongguan, where it finally joins the Yellow River. The drainage basin lies between 103.5°E and 110.5°E and between 33.5°N and 37.5°N, covering a total area of 1.35 × 10⁵ km². The basin is located in the continental monsoon climate zone. It is characterized by abundant precipitation and high temperatures in summer and by scarce precipitation and very low temperatures in winter. The annual precipitation of the basin is about 559 mm (Zhang et al., 2008). Topographically, the altitude decreases from the mountainous northwest, the highest part of the basin, to the Guanzhong Plain in the southeast and south, the lowest part.

The Guanzhong Plain is designated as a state key economic development zone and acts as a strong stimulus to the economic development of the surrounding area. Its economic development therefore directly affects the sustainable socio-economic development of Shaanxi Province. In recent years, with economic and population growth, water demand in the Guanzhong region has increased greatly. At present, local water resources in the Guanzhong region can hardly satisfy the requirements of economic and social development. Given the importance of water security in the Guanzhong Plain, improving monthly streamflow forecasts in the Wei River Basin is of great significance for formulating a scientific water resources management strategy.

This study employed daily streamflow data from the Huaxian, Lintong and Xianyang hydrological stations in the Wei River Basin, whose locations are shown in Fig. 2. The daily runoff records cover 1 January 1960 to 31 December 2005 and were obtained from the hydrologic manual. The data quality was strictly controlled before release. A double-mass curve check indicates that all daily streamflow data used in this paper are consistent. Data from January 1960 to December 1995 are used for calibration, and data from January 1996 to December 2005 for validation. The Huaxian station was used to build the prediction models, while the Lintong and Xianyang stations were used to verify the stability and representativeness of the modified EMD–SVM model. Because the computations for these two stations are similar to those for Huaxian and space is limited, their details are omitted.

Fig. 3 shows the long-term monthly streamflow at Huaxian hydrological station. The streamflow is highly unsteady, which makes precise monthly streamflow forecasting difficult. The series also shows a marked decreasing trend.

Fig. 2. The study area and location of the three hydrological stations.

Fig. 3. The observed monthly streamflow for Huaxian hydrological station from 1960 to 2005.
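As a quick check on the data split described above, the calibration period (January 1960 to December 1995) contains 36 × 12 = 432 months. The validation period (January 1996 to December 2005) contains 10 × 12 = 120 months, matching the 120 months referred to in the notes to the performance tables. The Python sketch below, with hypothetical function names, performs the split by month index on an already monthly series. How the daily records are aggregated to months is not specified here and is outside the sketch.

```python
from typing import List, Sequence, Tuple


def month_index(year: int, month: int, start_year: int = 1960, start_month: int = 1) -> int:
    """Zero-based position of (year, month) in a monthly series starting at the given month."""
    return (year - start_year) * 12 + (month - start_month)


def split_calibration_validation(series: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split a Jan 1960 - Dec 2005 monthly series into calibration and validation parts."""
    cut = month_index(1996, 1)  # first validation month: January 1996
    return list(series[:cut]), list(series[cut:])
```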

4. Results and discussions

4.1. Decomposition of the original series

The original monthly streamflow series is first decomposed by EMD into several IMFs and one residual. The decomposition continues until the residual is smaller than a predetermined value (0.01 in this paper), using the sifting stoppage criterion introduced in Section 2. Finally, five IMFs and one residual are obtained from the original series, as shown in Fig. 4.

Fig. 4. The decomposed sub-series of the monthly streamflow at Huaxian station.

Fig. 5. The PACFs of the normalized original series and sub-series of the mean monthly streamflow during the period from 1960 to 2005 for Huaxian station.

Table 1. The input variables of each normalized series.

Series      Input variables
Original    $x_{t-1}$, $x_{t-2}$
IMF1        $x_{t-1}$, $x_{t-2}$
IMF2        $x_{t-1}$, $x_{t-2}$, $x_{t-3}$, $x_{t-4}$, $x_{t-5}$, $x_{t-6}$
IMF3        $x_{t-1}$, $x_{t-2}$, $x_{t-3}$, $x_{t-4}$, $x_{t-5}$
IMF4        $x_{t-1}$, $x_{t-2}$, $x_{t-3}$, $x_{t-4}$, $x_{t-5}$
IMF5        $x_{t-1}$, $x_{t-2}$, $x_{t-3}$, $x_{t-4}$, $x_{t-5}$
r           $x_{t-1}$, $x_{t-2}$, $x_{t-3}$, $x_{t-4}$, $x_{t-5}$

Table 2. Forecasting performance indicators of models at Huaxian hydrological station.

Indicator         ANN     SVM     EMD–SVM   M-EMDSVM
A-RMSE (m³/s)     2.90    2.39    1.8       1.07
F-RMSE (m³/s)     1.83    1.22    0.91      0.53
S-RMSE (m³/s)     3.97    3.16    2.38      1.41
A-MAE (m³/s)      1.66    1.22    0.85      0.51
F-MAE (m³/s)      0.95    0.73    0.55      0.3
S-MAE (m³/s)      2.37    1.72    1.15      0.72
A-MAPE (%)        47.6    44.60   30.40     17.40
F-MAPE (%)        34.8    30.60   22.90     12.60
S-MAPE (%)        60.4    58.60   37.90     22.20

Notes: A denotes all 120 validation months, F the first half of the 120 months and S the second half. The same applies to Tables 3 and 4.

Table 3. Forecasting performance indicators of models at Lintong hydrological station.

Indicator         ANN     SVM     EMD–SVM   M-EMDSVM
A-RMSE (m³/s)     2.78    2.20    1.66      1.087
F-RMSE (m³/s)     1.91    1.33    0.96      0.62
S-RMSE (m³/s)     3.65    3.07    2.36      1.54
A-MAE (m³/s)      1.65    1.37    0.84      0.56
F-MAE (m³/s)      0.86    0.81    0.63      0.45
S-MAE (m³/s)      2.44    1.93    1.05      0.67
A-MAPE (%)        48.8    45.40   31.10     18.60
F-MAPE (%)        36.4    31.50   23.70     13.80
S-MAPE (%)        61.2    59.30   38.50     23.40

Fig. 4 shows that IMF1, which is irregular, is the most nonlinear and disordered component of the monthly streamflow series. Because its amplitude is extremely small, IMF1 contributes little to model fitting, yet this high-frequency part is extremely difficult to predict accurately. For this reason, the modified model removes IMF1. Moreover, IMF3 and IMF4 indicate that the monthly streamflow of the Wei River Basin has periods of approximately 1 year and 11 years. The 1-year period corresponds to the Earth's revolution around the Sun, while the 11-year period may be a response to the sunspot cycle. The residual r shows a marked downward trend, consistent with the analysis in Section 3, indicating a significant decline in the monthly streamflow of the Wei River Basin. Meanwhile, the rapid economic development of the Wei River Basin in recent years has consumed a large amount of water resources. The decline in monthly streamflow will therefore aggravate the imbalance between water supply and demand in the basin, especially in the Guanzhong Plain. Accurate streamflow forecasting is thus of great importance for the government's scientific planning of the water resources system.
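The text does not state how the 1-year and 11-year periods were read from IMF3 and IMF4. One common rough estimate of an IMF's mean period relies on successive zero crossings being half a cycle apart. The Python sketch below applies that idea; it is an illustration, not necessarily the authors' procedure, and it treats exact zeros as non-positive. For monthly data, a result near 12 corresponds to an annual cycle and one near 132 to an 11-year cycle.

```python
from typing import Sequence


def mean_period_from_zero_crossings(imf: Sequence[float]) -> float:
    """Approximate mean period (in samples) of an oscillatory component.

    Successive zero crossings are half a cycle apart, so the mean period is
    twice the average spacing between the first and last crossing.
    """
    # Index i marks a crossing between samples i - 1 and i (zero counts as non-positive).
    crossings = [i for i in range(1, len(imf)) if (imf[i - 1] > 0) != (imf[i] > 0)]
    if len(crossings) < 2:
        raise ValueError("need at least two zero crossings")
    return 2.0 * (crossings[-1] - crossings[0]) / (len(crossings) - 1)
```

Measuring between the first and last crossing avoids the bias that incomplete cycles at the ends of the record would introduce.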

Fig. 6. The process of acquiring the optimal parameters by the grid search method.

4.2. Standardization of data processing

As the streamflow series is highly disordered and nonlinear, if these data were input directly to the SVM for training, the learning process would produce large fluctuations in the values and fail to reflect small changes in the measured values. Therefore, the monthly streamflow series needs to be normalized before the training process. The formula for normalization is as follows:


Table 4
Forecasting performance indicators of models at Xianyang hydrological station.

Indicators      ANN    SVM    EMD–SVM  M-EMDSVM
A-RMSE (m3/s)   3.04   2.38   1.74     1.11
F-RMSE (m3/s)   1.97   1.41   1.03     0.65
S-RMSE (m3/s)   4.11   3.35   2.45     1.57
A-MAE (m3/s)    1.75   1.37   0.96     0.62
F-MAE (m3/s)    1.08   0.86   0.67     0.38
S-MAE (m3/s)    2.42   1.88   1.25     0.86
A-MAPE (%)      49.3   46.10  32.05    19.53
F-MAPE (%)      35.7   32.80  24.60    14.20
S-MAPE (%)      62.9   59.40  39.50    24.86


$$x_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \qquad (20)$$

After simulation, the data can be re-scaled by reversing the procedure of Eq. (20).
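Eq. (20) and its inverse are straightforward to code. The sketch below (hypothetical function names) returns the bounds so that model outputs can be re-scaled afterwards. In an operational setting one would usually take `x_min` and `x_max` from the training period only; that choice is an assumption here, since the text does not specify it.

```python
import numpy as np


def minmax_normalize(x):
    """Scale a series to [0, 1] as in Eq. (20); also return the bounds for re-scaling."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = float(x.min()), float(x.max())
    return (x - x_min) / (x_max - x_min), x_min, x_max


def minmax_rescale(z, x_min, x_max):
    """Invert Eq. (20): map normalized values back to the original units."""
    return np.asarray(z, dtype=float) * (x_max - x_min) + x_min
```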

4.3. Determining the input variables

Firstly, the series, including the sub-series and the original series, were standardized according to Eq. (20). Based on the method described in Section 2, the PACFs of all the normalized sub-series and of the normalized original series were calculated; the results are illustrated in Fig. 5, where PACF0 denotes the PACF of the normalized original series and PACF1–PACF6 denote the PACFs of the normalized sub-series. Following the input selection method introduced in Section 2, the input variables are easily identified from Fig. 5. Taking xt as the output variable, the input variables of the seven normalized series for the SVM model are shown in Table 1.
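The PACF-based input selection can be sketched as below. The exact rule from Section 2 is not reproduced in this section, so the sketch assumes the common convention of keeping every lag whose sample PACF lies outside the approximate 95% band ±1.96/√N; the PACF itself is computed with the Durbin–Levinson recursion. Function names are hypothetical.

```python
import numpy as np


def pacf(x, max_lag):
    """Sample partial autocorrelations at lags 1..max_lag via the Durbin-Levinson recursion."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    acov = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(max_lag + 1)])
    rho = acov / acov[0]
    phi = np.zeros((max_lag + 1, max_lag + 1))
    phi[1, 1] = rho[1]
    for k in range(2, max_lag + 1):
        prev = phi[k - 1, 1:k]  # phi_{k-1,1..k-1}
        phi[k, k] = (rho[k] - prev @ rho[k - 1:0:-1]) / (1.0 - prev @ rho[1:k])
        phi[k, 1:k] = prev - phi[k, k] * prev[::-1]
    return np.diag(phi)[1:]  # phi_{k,k} for k = 1..max_lag


def select_lags(x, max_lag, z=1.96):
    """Lags whose |PACF| exceeds the approximate 95% band +/- z / sqrt(N)."""
    bound = z / np.sqrt(len(x))
    return [k + 1 for k, v in enumerate(pacf(x, max_lag)) if abs(v) > bound]
```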

In Table 1, {xi} denotes each of the seven normalized series. In the iterative process of monthly streamflow prediction, when i exceeds the length of the series, xi denotes the corresponding predicted value of that series.
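The iterative scheme for i beyond the series length amounts to recursive multi-step forecasting: each prediction is appended to the history and reused as a lagged input. A minimal sketch, assuming a fitted model that exposes a scikit-learn style `predict(features)` method; `lagged_matrix` and `recursive_forecast` are hypothetical helpers, with `lags` playing the role of the lags listed in Table 1.

```python
import numpy as np


def lagged_matrix(x, lags):
    """Rows [x[t - l] for l in lags] with target x[t], for every t with a full history."""
    x = np.asarray(x, dtype=float)
    p = max(lags)
    X = np.array([[x[t - l] for l in lags] for t in range(p, len(x))])
    y = x[p:]
    return X, y


def recursive_forecast(predict, history, lags, steps):
    """Forecast `steps` values ahead, feeding each prediction back in as a lagged input."""
    buf = list(np.asarray(history, dtype=float))
    for _ in range(steps):
        # buf[-l] is x[t - l] for the next time step t = len(buf).
        features = np.array([[buf[-l] for l in lags]])
        buf.append(float(predict(features)[0]))
    return np.array(buf[len(history):])
```

For example, with lags [1, 2] and a predictor that extrapolates linearly, `recursive_forecast(lambda F: 2 * F[:, 0] - F[:, 1], [1.0, 2.0, 3.0], [1, 2], 3)` continues the history as 4, 5, 6.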

Fig. 7. Forecasted and observed streamflow during validation period by ANN, SVM, EMD–SVM and M-EMDSVM in Huaxian hydrological station.

4.4. Grid search method for parameter selection

After the input variables of each normalized series had been determined, the SVM model was employed to simulate the monthly streamflow. SVM is known to be sensitive to parameter selection, so obtaining the optimal parameters is important. In this study, the grid search method with a 3D display technique was employed to acquire the optimal c and g values. Fig. 6 shows the optimal parameter search process of the single SVM model using the grid search method. The best c and g values can be obtained easily through this method.
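A grid search of this kind can be sketched as an exhaustive loop over exponentially spaced candidates. The sketch assumes that c and g are the LIBSVM-style penalty (cost) and RBF kernel width (gamma), and uses the ranges 2^-5 to 2^15 and 2^-15 to 2^3 often suggested for RBF SVMs; the paper's actual grid and scoring criterion are not given here. `svr_cv_mse` shows one possible objective built on scikit-learn's `SVR` and `cross_val_score`; both helper names are hypothetical.

```python
import itertools

import numpy as np


def grid_search(score, log2_c=range(-5, 16, 2), log2_g=range(-15, 4, 2)):
    """Exhaustive search over c = 2**i, g = 2**j; returns (best_c, best_g, best_score).

    `score(c, g)` must return a value to MINIMIZE (e.g. cross-validated MSE).
    """
    best = (None, None, np.inf)
    for i, j in itertools.product(log2_c, log2_g):
        c, g = 2.0 ** i, 2.0 ** j
        s = score(c, g)
        if s < best[2]:
            best = (c, g, s)
    return best


def svr_cv_mse(X, y, folds=5):
    """Build a score(c, g) function from cross-validated RBF-SVR error (needs scikit-learn)."""
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVR

    def score(c, g):
        neg_mse = cross_val_score(SVR(kernel="rbf", C=c, gamma=g), X, y,
                                  cv=folds, scoring="neg_mean_squared_error")
        return -neg_mse.mean()

    return score
```

Because `folds` is forwarded to `cv`, a `TimeSeriesSplit` instance can be passed instead of an integer, which avoids training on months that come after the validation fold.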

4.5. Comparative analysis

After the seven normalized series were simulated using the SVM model and re-scaled, simulations of the six sub-series and of the original series were obtained. The original series was also modeled by the basic SVM model. At the same time, the predictions of all the sub-series were summed to generate an ensemble fitting and forecasting result, which constitutes the EMD–SVM modeling procedure. The M-EMDSVM model is based on EMD–SVM but sums the results of all the sub-series except IMF1. In addition, to further demonstrate the superiority of the modified model, an artificial neural network (ANN) was employed to predict the monthly streamflow. The assessments of the monthly streamflow forecasting results acquired by the four models are presented in Table 2.
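The difference between the two ensembles reduces to which component forecasts are summed. A sketch, assuming the re-scaled forecasts are stacked as rows ordered IMF1, IMF2, …, residual (the function name is hypothetical):

```python
import numpy as np


def combine_components(component_forecasts, drop_first_imf=False):
    """Sum per-component forecasts; rows are ordered [IMF1, IMF2, ..., IMFn, residual].

    drop_first_imf=False gives the EMD-SVM ensemble; True drops IMF1 (M-EMDSVM).
    """
    parts = np.asarray(component_forecasts, dtype=float)
    if drop_first_imf:
        parts = parts[1:]
    return parts.sum(axis=0)
```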

Fig. 8. Scatter plot of prediction by each model in Huaxian station.

Fig. 9. Forecasted and observed streamflow during validation period by ANN, SVM, EMD–SVM and M-EMDSVM in Lintong hydrological station.

It can be clearly observed from Table 2 that the ANN model has the worst performance of all the models. The SVM model performs well, with RMSE, MAE and MAPE values of 2.39 m3/s, 1.22 m3/s and 44.6%, respectively, reducing the RMSE, MAE and MAPE by 21.7%, 21.7% and 6.5%, respectively, compared with ANN. The EMD–SVM model performs better still and outperforms the SVM model, with RMSE, MAE and MAPE values of 1.8 m3/s, 0.8 m3/s and 30.4%, respectively, reducing the RMSE, MAE and MAPE by 42.8%, 45.1% and 35.0%, respectively, compared with ANN. Amongst all the models, the M-EMDSVM model has the best performance and outperforms the SVM and EMD–SVM models in terms of all the standard statistical indicators, with the best RMSE, MAE and MAPE values of 1.07 m3/s, 0.51 m3/s and 17.4%, respectively, which reduce the RMSE, MAE and MAPE by 63.5%, 64.6% and 60.1%, respectively, compared with ANN. In terms of MAPE, the ANN has the highest value, while the value of the SVM model (44.6%) is larger than those of EMD–SVM (30.4%) and M-EMDSVM (17.4%). Since the ANN model suffers from over-fitting, the curse of dimensionality and convergence to local minima, its performance is worse than that of the SVM model. As the EMD–SVM model uses the EMD technique to eliminate the noise of the original series, it performs better than SVM. Building on the EMD–SVM model, the M-EMDSVM model removes the high-frequency component, which is the main source of error in the modeling stage; therefore, it obtains the best performance among the four models.
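The three evaluation measures and the relative reductions quoted above can be computed as follows. The A-, F- and S- prefixes in Tables 2–4 are not defined in this section; the `a_f_s` helper assumes they refer to the whole validation period and to its first and second halves, which fits the later remark that accuracy drops as the prediction period grows but remains an assumption. All function names are hypothetical.

```python
import numpy as np


def rmse(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def mae(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs(obs - pred)))


def mape(obs, pred):
    """Mean absolute percentage error in %; observed values must be non-zero."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(100.0 * np.mean(np.abs((obs - pred) / obs)))


def reduction(baseline, model):
    """Relative reduction (%) of an error metric for `model` versus `baseline`."""
    return 100.0 * (baseline - model) / baseline


def a_f_s(metric, obs, pred):
    """Metric over the whole test period (A), its first half (F) and second half (S)."""
    half = len(obs) // 2
    return (metric(obs, pred),
            metric(obs[:half], pred[:half]),
            metric(obs[half:], pred[half:]))
```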

Fig. 10. Scatter plot of prediction by each model in Lintong hydrological station.

The results in Tables 3 and 4 are similar to those in Table 2. In Tables 3 and 4, the ANN model has the worst performance of all the models. Amongst the four models, the modified EMD–SVM has the best performance, with the lowest RMSE, MAE and MAPE values. The models give consistent results at all three hydrological stations, which indicates that the M-EMDSVM model is highly stable and consistent.

Furthermore, the F-values and S-values of each model in Table 2 indicate that the forecast accuracy decreases as the prediction period extends. For the ANN and SVM models, the S-MAPE value is 37.2% and 28% higher, respectively, than the F-MAPE value, compared with increases of only 15% for EMD–SVM and 9.6% for M-EMDSVM. This indicates that both ANN and SVM have poor stability for long-term streamflow forecasting. By comparison, EMD–SVM has better stability, whilst M-EMDSVM has the best stability. Similarly, Tables 3 and 4 show that M-EMDSVM has the best stability of the models, whilst the ANN and SVM models have poor stability. The performances of all prediction models developed at the three hydrological stations in this study are presented in Figs. 7–12.

Fig. 11. Forecasted and observed streamflow during validation period by ANN, SVM, EMD–SVM and M-EMDSVM in Xianyang hydrological station.

Fig. 12. Scatter plot of prediction by each model in Xianyang hydrological station.

Fig. 7 illustrates the streamflow prediction at Huaxian station by the ANN, SVM, EMD–SVM and M-EMDSVM models in the test period. Generally speaking, the hydrographs show that all four models perform well in simulating the monthly streamflow. Moreover, Fig. 8 shows scatter plots of the predictions by the four models, indicating that the M-EMDSVM model performs best in forecasting the monthly streamflow, as its linear trend line is closest to the 45-degree line compared with those of ANN, SVM and EMD–SVM. Similarly, Figs. 9–12 show that the M-EMDSVM model has the best accuracy in forecasting the monthly streamflow at Lintong and Xianyang stations, consistent with the Huaxian station. Therefore, the prediction results at these two stations further verify the good stability and high consistency of the modified EMD–SVM.
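The comparison with the 45-degree line can be quantified by fitting the least-squares trend line of predictions against observations; a slope near 1 and an intercept near 0 mean the trend line lies close to the 45-degree line. A sketch using NumPy's `polyfit` (the helper name is hypothetical):

```python
import numpy as np


def trend_line(obs, pred):
    """Least-squares line pred ~ slope * obs + intercept for a predicted-vs-observed scatter plot."""
    slope, intercept = np.polyfit(np.asarray(obs, float), np.asarray(pred, float), 1)
    return float(slope), float(intercept)
```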

The SVM model achieves better prediction accuracy than the ANN model primarily because of the shortcomings of ANN, e.g. slow learning, over-fitting, the curse of dimensionality and convergence to local minima. In contrast, the SVM model is based on VC dimension theory and the structural risk minimization principle, which address these problems in theory, and it therefore predicts more accurately than the ANN model.

In general, the EMD–SVM, which combines the EMD and SVM methods, tends to be more adequate than the single SVM model for forecasting monthly streamflow. EMD represents the original signal (the monthly flow time series in the present study) at different resolutions; in other words, the complex hydrological time series is decomposed into several sub-series using the EMD technique. Thus, some characteristics of the sub-series, such as their daily, monthly and annual periods, can be seen more clearly than in the original signal. In building the EMD–SVM model, an SVM model is constructed for each sub-series, each of which belongs to a different scale. Predictions are more precise than those obtained directly from the original signal because the features (such as periodicity) of the sub-series are more evident (Ning and Yunping, 1998). This is why the EMD–SVM model performs better than the SVM model. In practice, monthly streamflow time series are difficult to study because they are affected by complex factors, so each series contains components of different frequencies. On this basis, this study employed a modified EMD–SVM model, improved by removing the high-frequency component (IMF1), which is the most disordered and unsystematic part of the monthly streamflow series; this accounts for the better performance of the M-EMDSVM model compared with the EMD–SVM model.

To sum up, the modified EMD–SVM performs well in predicting monthly flow. It predicts peak flows more accurately than Ozgur and Mesut (2011) and Wang and Zhao (2009), who used a wavelet–SVM and several artificial intelligence methods, respectively, to predict monthly flow. Therefore, this paper is of some significance for monthly flow prediction.

5. Conclusions

This study has investigated the accuracy of the modified EMD–SVM model for predicting monthly streamflow in the Wei River Basin. The M-EMDSVM model is based on the EMD–SVM, which couples the EMD technique with SVM, and is improved by removing the high-frequency component (IMF1). The test results were compared with those of the ANN, SVM and EMD–SVM models. At Huaxian hydrological station, the ANN has the worst performance of all the models, while the EMD–SVM model performs better and outperforms the ANN and SVM models, obtaining RMSE, MAE and MAPE statistics of 1.8 m3/s, 0.8 m3/s and 30.4%, respectively, which reduce the RMSE, MAE and MAPE by 42.8%, 45.1% and 35.0%, respectively, compared with ANN. Among all the models, the M-EMDSVM model has the best performance and outperforms the ANN, SVM and EMD–SVM models in terms of all the standard statistical measures, acquiring the best RMSE, MAE and MAPE statistics of 1.07 m3/s, 0.51 m3/s and 17.4%, respectively, which reduce the RMSE, MAE and MAPE by 63.5%, 64.6% and 60.1%, respectively, compared with ANN. In terms of the different criteria (RMSE, MAE and MAPE), the M-EMDSVM model performs best, with the lowest RMSE, MAE and MAPE in all testing cases.

The Lintong and Xianyang hydrological stations are used to verify the stability, consistency and representativeness of the modified EMD–SVM model. Similarly, the prediction results show that M-EMDSVM performs best among the models at these two stations. Thus, the M-EMDSVM model has good stability and representativeness. Therefore, this study concludes that the M-EMDSVM method provides a superior alternative to the ANN, SVM and EMD–SVM models for forecasting monthly streamflow. Given its significantly improved and stable monthly streamflow forecasting precision, the developed model is an efficient tool for the operation, planning and dispatching of hydropower stations.

Acknowledgements

This research was supported by the National Major Fundamental Research Program, 973 (2011CB403306), the National Natural Fund Major Research Plan (51190093), the Natural Science Foundation of China (51179149) and the Ministry of Education's New Century Talents Program (NCET-10-0933). Sincere gratitude is extended to the editor and anonymous reviewers for their professional comments and corrections, which greatly improved the presentation of the paper.

References

Chen, S.T., Yu, P.S., Tang, Y.H., 2010. Statistical downscaling of daily precipitation using support vector machines and multivariate analysis. J. Hydrol. 385, 13–22.

Chou, C.M., Wang, R.Y., 2004. Application of wavelet-based multi-model Kalman filters to real-time flood forecasting. Hydrol. Process. 18, 987–1008.

Cortes, C., Vapnik, V., 1995. Support vector networks. Mach. Learn. 20 (3), 273–297.

Gao, J.B., Gunn, S.R., Harris, C.J., Brown, M., 2001. A probabilistic framework for SVM regression and error bar estimation. Mach. Learn. 46, 71–89.

Giulia, N., Francesco, S., Linda, S., 2011. Impact of EMD decomposition and random initialisation of weights in ANN hindcasting of daily stream flow series: an empirical examination. J. Hydrol. 406, 199–214.

Guo, J., Zhou, J.Z., Qin, H., et al., 2011. Monthly streamflow forecasting based on improved support vector machine model. Expert Syst. Appl. 38, 13073–13081.

Guo, Z.H., Zhao, W.G., Lu, H.Y., Wang, J.Z., 2012. Multi-step forecasting for wind speed using a modified EMD-based artificial neural network model. Renew. Energy 37, 241–249.

Huang, N.E., Wu, Z., 2008. A review on Hilbert-Huang transform: method and its applications to geophysical studies. Rev. Geophys. 46, RG2006.

Huang, N.E., Shen, Z., Long, S.R., Wu, M.L.C., Shih, H.H., Zheng, Q.N., Yen, N.C., Tung, C.C., Liu, H.H., 1998. The empirical mode decomposition method and the Hilbert spectrum for non-stationary time series analysis. Proc. R. Soc. A: Math. Phys. Eng. Sci. 454A, 903–995.

Karamouz, M., Ahmadi, A., Moridi, A., 2009. Probabilistic reservoir operation using Bayesian stochastic model and support vector machine. Adv. Water Resour. 32 (11), 1588–1600.

Karunanithi, N., Grenney, W.J., Whitley, D., Bovee, K., 1994. Neural networks for river flow prediction. J. Comput. Civil Eng. 8 (2), 201–220.

Kijewski-Correa, T., Kareem, A., 2007. Using multi-objective genetic algorithm for SVM construction. J. Eng. Mech. 133 (2), 238–245.

Kim, D., Oh, H.S., 2006. Hierarchical smoothing technique by empirical mode decomposition. Korean J. Appl. Stat. 19, 319–330.

Kim, D., Oh, H.S., 2008. EMD: Empirical Mode Decomposition and Hilbert Spectral Analysis. <http://cran.r-project.org/web/packages/EMD/index.html>.

Kim, D., Oh, H.S., 2009. EMD: a package for empirical mode decomposition and Hilbert spectral analysis. Royal J. 1, 40–46.

Lee, T., Ouarda, T.B.M.J., 2010. Long-term prediction of precipitation and hydrologic extremes with nonstationary oscillation processes. J. Geophys. Res.: Atmos. 115, D13107.

Lee, T., Ouarda, T.B.M.J., 2011. Prediction of climate non-stationary oscillation processes with empirical mode decomposition. J. Geophys. Res.: Atmos. 116, D06107.

Lin, S.W., Ying, K.C., Chen, S.C., Lee, Z.-J., 2008. Particle swarm optimization for parameter determination and feature selection of support vector machines. Expert Syst. Appl. 35 (4), 1817–1824.

Lin, G.F., Chen, G.R., Huang, P.Y., Chou, Y.C., 2009. Support vector machine-based models for hourly reservoir inflow forecasting during typhoon-warning periods. J. Hydrol. 372, 17–29.

Lin, G.F., Chou, Y.C., Wu, M.C., 2013. Typhoon flood forecasting using integrated two-stage Support Vector Machine approach. J. Hydrol. 486, 334–342.

Marco, B., Fausto, T., Alberto, P., et al., 2010. Development and testing of a physically based, three-dimensional model of surface and subsurface hydrology. Adv. Water Resour. 33, 106–122.

Ning, M., Yunping, C., 1998. An ANN and wavelet transformation based method for short term load forecast. Energy management and power delivery. Int. Conf. 2, 405–410.

Ozgur, K., Mesut, C., 2011. A wavelet-support vector machine conjunction model for monthly streamflow forecasting. J. Hydrol. 399, 132–140.

Partington, D., Brunner, P., Simmons, C.T., et al., 2012. Evaluation of outputs from automated baseflow separation methods against simulated baseflow from a physically based, surface water–groundwater flow model. J. Hydrol. 458–459, 28–39.

Raman, H., Sunilkumar, N., 1995. Multivariate modelling of water resources time series using artificial neural networks. Hydrol. Sci. J. 40 (2), 145–163.

Sudheer, K.P., Gosain, A.K., Ramasastri, K.S., 2002. A data-driven algorithm for constructing artificial neural network rainfall–runoff models. Hydrol. Process. 16, 1325–1330.

Vapnik, V., 1995. The Nature of Statistical Learning Theory. Springer, New York.

Vapnik, V., 1998. Statistical Learning Theory. Wiley, New York.

Wang, H., Zhao, W., 2009. ARIMA model estimated by Particle Swarm optimization algorithm for Consumer price index forecasting. Lecture Notes in Computer Science. Artif. Intel. Comput. Intel. 5855, 48–58.

Wang, W.C., Chau, K.W., Cheng, C.T., Qiu, L., 2009. A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series. J. Hydrol. 374, 294–306.

Yang, P.C., Wang, G.L., Bian, J.C., Zhou, X.J., 2010. The prediction of non-stationary climate series based on empirical mode decomposition. Adv. Atmos. Sci. 27 (4), 845–854.

Yoon, H., Hyun, Y., Lee, K.K., 2007. Forecasting solute breakthrough curves through the unsaturated zone using artificial neural networks. J. Hydrol. 335, 68–77.

Yoon, H., Jun, S.C., Hyun, Y.J., Bae, G.O., Lee, K.K., 2011. A comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer. J. Hydrol. 396, 128–138.

Yu, P.S., Chen, S.T., Chang, I.F., 2006. Support vector regression for real-time flood stage forecasting. J. Hydrol. 328, 704–716.

Yu, Z.B., Liu, D., Lü, H.S., Fu, X.L., Xiang, L., Zhu, Y.H., 2012. A multi-layer soil moisture data assimilation using support vector machines and ensemble particle filter. J. Hydrol. 475, 53–64.

Zealand, C.M., Burn, D.H., Simonovic, S.P., 1999. Short-term streamflow forecasting using artificial neural networks. J. Hydrol. 214, 32–48.

Zhang, H., Chen, Y., Ren, G., Yang, G., 2008. The characteristics of precipitation variation of Weihe River Basin in Shaanxi Province during recent 50 years. Agri. Res. Arid Areas 26 (4), 236–242.