Engineering Applications of Artificial Intelligence 25 (2012) 783–792
journal homepage: www.elsevier.com/locate/engappai
Precipitation forecasting by using wavelet-support vector machine conjunction model
Ozgur Kisi a,*, Mesut Cimen b
a Erciyes University, Engineering Faculty, Civil Engineering Department, 38039 Kayseri, Turkiye
b Suleyman Demirel University, Engineering Faculty, Civil Engineering Department, Isparta, Turkiye
Article info
Article history:
Received 22 March 2011
Received in revised form 26 August 2011
Accepted 14 November 2011
Available online 30 November 2011
Keywords:
Precipitation
Discrete wavelet transform
Support vector machine
Forecast
0952-1976/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.engappai.2011.11.003
* Corresponding author. Tel.: +90 352 4374901; fax: +90 352 4375784.
E-mail address: [email protected] (O. Kisi).
Abstract
A new wavelet-support vector machine conjunction model for daily precipitation forecasting is proposed in this study. The conjunction method, which combines the discrete wavelet transform and the support vector machine, is compared with the single support vector machine for one-day-ahead precipitation forecasting. Daily precipitation data from the Izmir and Afyon stations in Turkey are used in the study. The root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (R) statistics are used as the comparison criteria. The comparison results indicate that the conjunction method can increase the forecast accuracy and performs better than the single support vector machine. For the Izmir and Afyon stations, the conjunction models, with RMSE = 46.5 mm, MAE = 13.6 mm, R = 0.782 and RMSE = 21.4 mm, MAE = 9.0 mm, R = 0.815 in the test period, are superior in forecasting daily precipitation to the most accurate support vector regression models, with RMSE = 71.6 mm, MAE = 19.6 mm, R = 0.276 and RMSE = 38.7 mm, MAE = 14.2 mm, R = 0.103, respectively. The ANN method was also applied to the same data set, and only a slight difference was found between the ANN and SVR methods.
© 2011 Elsevier Ltd. All rights reserved.
1. Introduction
Forecasting precipitation is essential for the planning and management of water resources, but the temporal and spatial variation of rainfall makes it difficult. In particular, it is important for hydrologists and meteorologists to accurately estimate excessive daily precipitation, which is responsible for some floods. Quantitative rainfall prediction has been studied using numerical models (Bustamante et al., 1999; Olson et al., 2004) and physical models (Georgakakos and Bras, 1984). However, these models are not successful enough in forecasting precipitation (Olson et al., 1995) due to inaccurate initial conditions, parameterization schemes of subscale phenomena, and limited spatial resolution (Ramirez et al., 2005). Rainfall prediction with a numerical weather model generally depends heavily on grid point values, which is quite difficult for rainfall because it varies in both space and time.
In recent decades, soft computing techniques have been successfully used in precipitation/rainfall forecasting. French et al. (1992) used an artificial neural network (ANN) for forecasting rainfall intensity fields at a lead time of 1 h. Navone and Ceccatto (1994) predicted summer monsoon rainfall over India using an ANN. Sivapragasam et al. (2001) forecasted rainfall using singular spectrum analysis coupled with the support vector machine (SVM) approach. Freiwan and Cigizoglu (2005) investigated the accuracy of the ANN technique for estimating monthly precipitation amounts. Marzano et al. (2006) compared the accuracy of the ANN approach with previously developed regression techniques in the estimation of precipitation intensity. Ingsrisawang et al. (2008) used decision tree, ANN and SVM methods for short-term rain forecasting. Hung et al. (2009) applied ANN for forecasting rainfall in Bangkok, Thailand. Wu et al. (2010a,b) used SVM for daily rainfall forecasting. Dastorani et al. (2010) examined the potential of ANN and neuro-fuzzy models for dryland precipitation prediction. Wu et al. (2010a,b) predicted daily and monthly rainfall time series by using modular ANN coupled with data-preprocessing techniques. Moustris et al. (2011) forecasted precipitation in Greece by using the ANN method. El-Shafie et al. (2011) used neuro-fuzzy and ANN techniques for forecasting monthly rainfall of the Klang River in Malaysia. Hong (2008) used SVM and a chaotic particle swarm optimization algorithm for rainfall forecasting.
The application of the wavelet transform for analyzing variations, periodicities and trends in time series has received much attention in recent years (Smith et al., 1998; Lu, 2002; Chou and Wang, 2002; Dai et al., 2003; Coulibaly and Burn, 2004; Partal and Kucuk, 2006; Zhou et al., 2008; Santos et al., 2009; Kisi, 2007, 2009; Kisi and Cimen, 2011). Smith et al. (1998) used a discrete wavelet transform (DWT) for quantifying streamflow variability. They suggested that streamflows could be effectively classified into distinct hydroclimatic categories using DWT. Lu (2002) applied the wavelet transform for decomposition of interdecadal and interannual components of rainfall data in the rainy season. Chou and Wang (2002) used a DWT for decomposition of the unit hydrograph. They employed on-line estimation of the unit hydrograph using the DWT components. Xingang et al. (2003) used wavelet analysis to investigate the rainfall spectrum of North China in the rainy season and its evolution as the summer monsoon decays on an interdecadal time scale. Coulibaly and Burn (2004) used wavelet analysis to identify and describe variability in annual Canadian streamflows and to gain insights into the dynamical link between the streamflows and the dominant modes of climate variability in the Northern Hemisphere. Partal and Kucuk (2006) used a DWT for determining the possible trends in annual total precipitation series. They used the precipitation records from meteorological stations of Turkiye and concluded that trend analysis on the DWT components of the precipitation time series clearly explains the trend structure of the data. Kisi (2008) predicted monthly streamflows using wavelet network models. Zhou et al. (2008) proposed a wavelet predictor-corrector model for the simulation and prediction of monthly discharge time series. Kisi (2009) introduced the wavelet-ANN conjunction model for forecasting intermittent streamflow. Nourani et al. (2009) predicted Ligvanchai watershed precipitation using a combined neural-wavelet model. Kisi and Cimen (2011) proposed a wavelet and support vector machine conjunction model for monthly streamflow forecasting. All these studies showed that the wavelet transform is an effective tool for precisely locating irregularly distributed multi-scale features of climate elements in space and time.
The purpose of this paper is to investigate the performance of the wavelet and support vector machine conjunction model for one-day-ahead precipitation forecasting and to compare it with the performance of single SVM and ANN models. The presented study is the first application of the wavelet and support vector machine conjunction to precipitation forecasting in the literature.
2. Support vector machines
The idea of SVMs, which are known as classification and regression procedures, was developed by Vapnik (1995). Support vector regression (SVR) is the term used in the open literature to describe regression with SVMs. In regression estimation with SVR we attempt to estimate a functional dependency $f(\vec{x})$ between a set of sampled points $X = \{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_l\}$ taken from $R^n$ and target values $Y = \{y_1, y_2, \ldots, y_l\}$ with $y_i \in R$ (herein, $\vec{x}_i$ denotes preceding daily precipitation and $y_i$ denotes current daily precipitation). Let us assume that these samples have been generated independently from an unknown probability distribution function $P(\vec{x}, y)$ and a class of functions (Vapnik, 1995):
$$F = \{ f \mid f(\vec{x}) = (\vec{w}, \vec{x}) + B : \vec{w} \in R^n,\ R^n \to R \} \tag{1}$$
where $\vec{w}$ and $B$ are coefficients that have to be estimated from the input data. Herein, the fundamental problem is to find a function $f(\vec{x}) \in F$ that minimizes a risk functional:
$$R[f(\vec{x})] = \int l(y - f(\vec{x}), \vec{x})\, dP(\vec{x}, y) \tag{2}$$
where $l$ is a loss function used to measure the deviation between the target, $y$, and the estimate, $f(\vec{x})$. As the probability distribution function $P(\vec{x}, y)$ is unknown, $R[f(\vec{x})]$ cannot be minimized directly; only the empirical risk function can be computed:
$$R_{emp}[f(\vec{x})] = \frac{1}{N} \sum_{i=1}^{N} l(y_i - f(\vec{x}_i)) \tag{3}$$
This traditional empirical risk minimization is not advisable without any means of structural control or regularization. Therefore, a regularized risk function with the smallest steepness among the functions that minimize the empirical risk function could be used as
$$R_{reg}[f(\vec{x})] = R_{emp}[f(\vec{x})] + \gamma \|\vec{w}\|^2 \tag{4}$$
where $\gamma$ is a constant ($\gamma \ge 0$). This additional term reduces the model space and thereby controls the complexity of the solution. For this reason, the following form of this expression can be considered (Smola, 1996; Cimen, 2008):
$$R_{reg}[f(\vec{x})] = C_c \sum_{x_i \in X} l_\varepsilon(y_i - f(\vec{x}_i)) + \frac{1}{2} \|\vec{w}\|^2 \tag{5}$$
where $C_c$ is a positive constant (i.e., an additional capacity control parameter) that has to be chosen beforehand. The constant $C_c$, which influences the trade-off between the approximation error and the regression (weight) vector $\|\vec{w}\|$, is a design parameter. The loss function in this expression, which is called the $\varepsilon$-insensitive loss function, has the advantage that not all the input data are needed for describing the regression vector $\vec{w}$, and it can be written as
$$l_\varepsilon(y_i - f(\vec{x}_i)) = \begin{cases} 0 & \text{for } |y_i - f(\vec{x}_i)| < \varepsilon \\ |y_i - f(\vec{x}_i)| & \text{otherwise} \end{cases} \tag{6}$$
This function behaves as a biased estimator when combined with a regularization term ($\gamma \|\vec{w}\|^2$). The loss is equal to 0 if the difference between the predicted value $f(\vec{x}_i)$ and the measured value $y_i$ is less than $\varepsilon$. The choice of the $\varepsilon$ value is easier than the choice of $C_c$, and it is often given as a desired percentage of the output values $y_i$. Hence, the nonlinear regression function is given by the function that minimizes Eq. (5) subject to Eq. (6), as in the following expression (Vapnik, 1995; Gunn, 1998; Cimen, 2008):
$$f(x) = \sum_{i=1}^{N} (\alpha_i^* - \alpha_i) K(x, x_i) + B \tag{7}$$
where $\alpha_i$ and $\alpha_i^* \ge 0$ are the Lagrange multipliers, $B$ is a bias term, and $K(x, x_i)$ is the kernel function, which is based upon Reproducing Kernel Hilbert Spaces. The data are often assumed to have zero mean (this can be achieved by pre-processing), so the bias term is dropped. The kernel function enables operations to be performed in the input space rather than in the potentially high-dimensional feature space. Hence an inner product in the feature space has an equivalent kernel in the input space. In general, the kernel functions treated by the SVR are the polynomial, Gaussian Radial Basis, Exponential Radial Basis, Multi-Layer Perceptron and Spline functions, among others. The Gaussian Radial Basis function (GRBF) is taken into consideration in this study because it is the most common kernel function (Caputo et al., 2002). It can be written as follows:
$$K(x, x_i) = \exp\left(-\|x - x_i\|^2 / 2\sigma^2\right) \tag{8}$$
where $\sigma$ is the Gaussian noise level of standard deviation.
During the learning by SVR the purpose is to find a nonlinear function given by Eq. (7) that minimizes a regularized risk function (i.e., Eq. (5)). This is achieved by searching for the least value of the desired error criterion (for example, RMSE) over various constant parameters $C_c$ and $\varepsilon$ and various kernel functions $K(x, x_i)$ with various constant $\sigma$ values. This process was carried out by a program written in Fortran 90. The program also produces the Lagrange multipliers in Eq. (7).
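As an illustration of Eqs. (6)–(8), the following minimal Python sketch evaluates the ε-insensitive loss as printed in Eq. (6), the GRBF kernel of Eq. (8) and the regression function of Eq. (7). It is not the Fortran 90 program used in the study; the function names, the use of NumPy and any multiplier values passed in are assumptions made purely for illustration. Note that Eq. (6) as printed returns the full absolute residual outside the ε-tube, whereas the form commonly given in the SVR literature subtracts ε from it.

```python
import numpy as np


def eps_insensitive_loss(y, f, eps):
    """Eq. (6) as printed: zero inside the eps-tube, |y - f| outside it."""
    residual = np.abs(np.asarray(y, dtype=float) - np.asarray(f, dtype=float))
    return np.where(residual < eps, 0.0, residual)


def grbf_kernel(x, xi, sigma):
    """Eq. (8): Gaussian radial basis kernel between two input vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(xi, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))


def svr_predict(x, support_x, alpha_star, alpha, sigma, bias=0.0):
    """Eq. (7): kernel expansion weighted by the Lagrange multipliers."""
    kernels = np.array([grbf_kernel(x, xi, sigma) for xi in support_x])
    weights = np.asarray(alpha_star, dtype=float) - np.asarray(alpha, dtype=float)
    return float(np.dot(weights, kernels) + bias)
```

In such a sketch the multipliers would come from solving the optimization problem of Eq. (5); here they are simply passed in, and the bias defaults to zero in line with the zero-mean assumption mentioned above.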
Fig. 1. The location of the precipitation stations (map of Turkiye showing the Afyon and Izmir stations).
Table 1
The daily statistical parameters of the data sets for the Izmir and Afyon stations.
Station  Data set  xmean (cm)  Sx (cm)  Csx (cm)  xmin (cm)  xmax (cm)  r1     r2     r3
Izmir    Training  1.81        6.95     6.46      0.00       108        0.273  0.087  0.057
Izmir    Test      1.94        7.30     5.95      0.00       79.5       0.283  0.096  0.083
Izmir    Entire    1.84        7.03     6.32      0.00       108        0.275  0.088  0.062
Afyon    Training  1.16        3.47     4.86      0.00       37.2       0.174  0.076  0.015
Afyon    Test      1.23        3.62     4.41      0.00       38.0       0.146  0.065  0.021
Afyon    Entire    1.17        3.49     4.76      0.00       38.0       0.168  0.060  0.016
3. Discrete wavelet transform
The theory of wavelet analysis was developed based on Fourier analysis. In Fourier analysis, a signal is broken up into smooth sinusoids of unlimited duration. In wavelet analysis, a signal is similarly broken up into wavelets, which are waveforms of effectively limited duration and zero mean. Wavelet analysis is a windowing technique with variable-sized regions. It gives a time-scale view of a signal and provides a method of expressing natural phenomena by utilizing their very rudimentary multi-fractal basis. The lower scales refer to the compressed wavelet and are able to follow the high-frequency component, or the rapidly changing details, of the signal. The higher scales represent the stretched version of a wavelet, and the corresponding coefficients indicate the slowly changing features of a low-frequency component (Kucuk and Agiralioglu, 2006).
The wavelet function $\psi(t)$, called the mother wavelet, can be defined as $\int_{-\infty}^{+\infty} \psi(t)\,dt = 0$, and $\psi_{a,b}(t)$ can be obtained through compressing and expanding $\psi(t)$:
$$\psi_{a,b}(t) = |a|^{-1/2}\, \psi\left(\frac{t-b}{a}\right), \quad b \in R,\ a \in R,\ a \neq 0 \tag{9}$$
where $\psi_{a,b}(t)$ is the successive wavelet, $a$ is the scale or frequency factor, $b$ is a time factor, and $R$ is the domain of real numbers.
If $\psi_{a,b}(t)$ satisfies Eq. (9), for the time series $f(t) \in L^2(R)$ or finite energy signal, the successive wavelet transform of $f(t)$ is defined as
$$W_\psi f(a,b) = |a|^{-1/2} \int_R f(t)\, \bar{\psi}\left(\frac{t-b}{a}\right) dt \tag{10}$$
where $\bar{\psi}(t)$ is the complex conjugate function of $\psi(t)$. It can be seen from Eq. (10) that the wavelet transform is the decomposition of $f(t)$ under different resolution levels (scales). In other words, filtering $f(t)$ with different filters is the essence of the wavelet transform.
The successive wavelet is often discrete in real applications. Let $a = a_0^j$, $b = k b_0 a_0^j$, $a_0 > 1$, $b_0 \in R$, where $k$ and $j$ are integers. The discrete wavelet transform of $f(t)$ can be written as
$$W_\psi f(j,k) = a_0^{-j/2} \int_R f(t)\, \bar{\psi}(a_0^{-j} t - k b_0)\, dt \tag{11}$$
The most common (and simplest) choice for the parameters $a_0$ and $b_0$ is 2 and 1 time steps, respectively. This power-of-two logarithmic scaling of the time and scale is known as the dyadic grid arrangement and is the simplest and most efficient case for practical purposes (Mallat, 1989). Eq. (11) becomes the binary wavelet transform when $a_0 = 2$, $b_0 = 1$:
$$W_\psi f(j,k) = 2^{-j/2} \int_R f(t)\, \bar{\psi}(2^{-j} t - k)\, dt \tag{12}$$
The characteristics of the original time series in the frequency (a or j) and time (b or k) domains are reflected at the same time by $W_\psi f(a,b)$ or $W_\psi f(j,k)$. When the frequency resolution of the wavelet transform is low and the time-domain resolution is high, a or j becomes small. When the frequency resolution of the wavelet transform is high and the time-domain resolution is low, a or j becomes large (Wang and Ding, 2003).
For a discrete time series $f(t)$, where $f(t)$ occurs at different times $t$ (here integer time steps are used), the DWT can be defined as (Mallat, 1989)
$$W_\psi f(j,k) = 2^{-j/2} \sum_{t=0}^{N-1} f(t)\, \bar{\psi}(2^{-j} t - k) \tag{13}$$
where $W_\psi f(j,k)$ is the wavelet coefficient for the discrete wavelet of scale $a = 2^j$, $b = 2^j k$.
DWT operates two sets of functions viewed as high-pass and low-pass filters. The original time series is passed through the high-pass and low-pass filters and separated at different scales (Kisi, 2009). The time series is decomposed into one component comprising its trend (the approximation) and one comprising the high frequencies and the fast events (the detail). In the present study, the detail coefficients and approximation sub-time series are obtained using Eq. (13).
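The decomposition into detail and approximation sub-series can be sketched in Python as follows. The mother wavelet is not stated in this section, so the sketch assumes the Haar wavelet, for which the approximation at level j is the mean over dyadic blocks of 2^j samples and each detail is the difference between successive approximations. The function name `haar_mra` and the requirement that the series length be a multiple of 2^levels are assumptions of this illustration, not part of the study.

```python
import numpy as np


def haar_mra(x, levels=4):
    """Additive Haar multiresolution: x = D1 + ... + D_levels + approximation.

    Assumes len(x) is a multiple of 2**levels (trim or pad the series first).
    """
    x = np.asarray(x, dtype=float)
    if len(x) % (2 ** levels) != 0:
        raise ValueError("series length must be a multiple of 2**levels")

    details = []
    previous_approx = x.copy()  # level-0 approximation is the series itself
    for j in range(1, levels + 1):
        width = 2 ** j
        # Haar approximation at level j: block means held constant per block.
        block_means = x.reshape(-1, width).mean(axis=1)
        approx = np.repeat(block_means, width)
        # Detail at level j: what is lost when moving from level j-1 to level j.
        details.append(previous_approx - approx)
        previous_approx = approx
    return details, previous_approx
```

With `levels=4` the returned details correspond to the 2-, 4-, 8- and 16-step modes, and adding all details to the approximation recovers the original series exactly.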
4. Data and statistical analysis
In this study, the daily precipitation data of two stations located in the Aegean Region in the west of Turkiye are used. The location of the stations is shown in Fig. 1. The observed data records were obtained from the DMI (Turkish State Meteorological Services). The observed data are 15 years (5479 days) long, with an observation period from January 1987 to December 2001 for both stations.
In the applications, the first 12 years of precipitation data (4383 days, 80% of the whole data set) are used for training and the remaining 3 years (1096 days, 20% of the whole data set) are used for testing. The daily statistics of the data sets are presented in Table 1 for the Izmir and Afyon stations. In this table, xmean, Sx, Csx, xmin, xmax, r1, r2 and r3 denote the overall mean, standard deviation, skewness, minimum, maximum, and lag-1, lag-2 and lag-3 autocorrelation coefficients, respectively. The highest maximum daily precipitation value was observed at the Izmir Station (xmax = 1080 mm). The observed daily precipitations show quite high positive skewness values (Csx = 6.32 and 4.76). It can be seen from the skewness coefficients in the fifth column of Table 1 that the precipitation data show a scattered distribution. The data of the Izmir Station have a more scattered distribution than those of the Afyon Station. The autocorrelations are quite low, showing low persistence (e.g., r1 = 0.275, r2 = 0.088, r3 = 0.062).
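The chronological split and the descriptive statistics of Table 1 can be sketched as below. The estimator formulas are not given in the text, so the moment-based skewness and the lag-k autocorrelation used here are assumptions, as are the function names; other common estimators differ slightly for finite samples.

```python
import numpy as np


def chronological_split(series, n_train):
    """First n_train values for training, the remainder for testing."""
    series = np.asarray(series, dtype=float)
    return series[:n_train], series[n_train:]


def skewness(series):
    """Moment-based skewness coefficient (assumed estimator)."""
    d = np.asarray(series, dtype=float)
    d = d - d.mean()
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)


def lag_autocorrelation(series, lag):
    """Lag-k autocorrelation about the overall mean (assumed estimator)."""
    d = np.asarray(series, dtype=float)
    d = d - d.mean()
    return float(np.sum(d[:-lag] * d[lag:]) / np.sum(d * d))
```

For a 5479-day record, `chronological_split(series, 4383)` gives the 4383-day training and 1096-day test sets described above.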
The root mean square error (RMSE), mean absolute error (MAE), Nash–Sutcliffe coefficient (NS) (Nash and Sutcliffe, 1970) and correlation coefficient (R) statistics are used to evaluate the WSVR and SVR model accuracies. R shows the degree to which two variables are linearly related. Different types of information about the predictive capabilities of the model are measured through RMSE and MAE. The RMSE measures the goodness of fit relevant to high precipitation values, whereas the MAE gives a more balanced perspective of the goodness of fit at moderate precipitations (Karunanithi et al., 1994). The NS is a coefficient of efficiency and indicates the relative assessment of the model performance in dimensionless measures (Nash and Sutcliffe, 1970). The RMSE, MAE and NS are defined as
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(Y_{i,\mathrm{observed}} - Y_{i,\mathrm{estimate}}\right)^2} \tag{14}$$
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left|Y_{i,\mathrm{observed}} - Y_{i,\mathrm{estimate}}\right| \tag{15}$$
$$\mathrm{NS} = 1 - \frac{\sum_{i=1}^{N} \left(Y_{i,\mathrm{observed}} - Y_{i,\mathrm{estimate}}\right)^2}{\sum_{i=1}^{N} \left(Y_{i,\mathrm{observed}} - \bar{Y}_{\mathrm{observed}}\right)^2} \tag{16}$$
in which $N$ is the number of data, $Y_i$ is the daily precipitation and the bar denotes the mean.
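Eqs. (14)–(16) translate directly into code. The sketch below is a minimal NumPy version; the function names are chosen for illustration, and R is computed as the Pearson correlation coefficient via `np.corrcoef`, which is an assumption because the text does not write out the formula for R.

```python
import numpy as np


def rmse(observed, estimated):
    """Eq. (14): root mean square error."""
    o, e = np.asarray(observed, dtype=float), np.asarray(estimated, dtype=float)
    return float(np.sqrt(np.mean((o - e) ** 2)))


def mae(observed, estimated):
    """Eq. (15): mean absolute error."""
    o, e = np.asarray(observed, dtype=float), np.asarray(estimated, dtype=float)
    return float(np.mean(np.abs(o - e)))


def nash_sutcliffe(observed, estimated):
    """Eq. (16): one minus residual variance over observed variance."""
    o, e = np.asarray(observed, dtype=float), np.asarray(estimated, dtype=float)
    return float(1.0 - np.sum((o - e) ** 2) / np.sum((o - o.mean()) ** 2))


def correlation(observed, estimated):
    """Pearson correlation coefficient R (assumed definition)."""
    return float(np.corrcoef(observed, estimated)[0, 1])
```

Because the errors are squared before averaging, a single badly missed peak raises RMSE far more than MAE, which is why the two statistics are reported together.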
Fig. 2. The WSVR model structure: input time series (e.g., preceding daily precipitations); decomposition of the input time series using DWT; addition of the effective Ds components as input to the SVR; SVR model; output (e.g., current precipitation value).
Fig. 3. Decomposed wavelet sub-time series components (Ds) of precipitation data of Izmir Station. Panels: precipitation (mm), D1, D2, D3, D4 and approximation, each plotted against day.
5. Application and results
The wavelet support vector regression (WSVR) models are obtained by combining two methods, DWT and SVR. The WSVR model is an SVR model that uses sub-time series components obtained by applying DWT to the original data. The WSVR model structure developed in the present study is shown in Fig. 2. For the WSVR model inputs, the original time series are decomposed into a certain number of sub-time series components (Ds). Each component plays a different role in the original time series, and the behavior of each sub-time series is distinct (Wang and Ding, 2003). In the WSVR model, the inputs are the Ds of the preceding daily precipitation and the output is the original current daily precipitation.
The previous daily precipitation time series are decomposed into various Ds at different resolution levels by using DWT to estimate the current precipitation value. Four resolution levels (2–4–8–16) are employed in this study. In general, there are log(n) resolution levels, where n is the length of the time series (Wang and Ding, 2003). In this study, 4383 daily data are used to obtain SVR models for both stations, which gives approximately four resolution levels. The original precipitation input time series of the Izmir Station and its Ds, namely the time series of the 2-day mode (D1), 4-day mode (D2), 8-day mode (D3), 16-day mode (D4) and the approximate mode, are shown in Fig. 3. The correlation coefficients between each D sub-time series and the original daily precipitation time series are given in Table 2 for the Izmir and Afyon stations. In this table, Dt-i (i = 1, 2, 3, 4) and Pt denote the D sub-time series at time t-i and the measured precipitation at time t, respectively. These correlation values provide information for determining the effective wavelet components on precipitation. It can be seen from Table 2 that the D1 component shows low correlations for both stations. The D2 components have the highest correlations. In order to select the dominant D components, total absolute correlations are evaluated. The total correlations are given in the last column of Table 2. According to the total correlations, the D2, D3 and D4 components seem to be more effective than D1. Then, the new series obtained by adding the effective Ds and the approximation component are used as input to the SVR model. This summed new series is labeled DW.
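The selection step can be illustrated with the Izmir correlations for D1 to D4 reported in Table 2. The sketch below sums the absolute lagged correlations of each component, ranks the components, and forms a summed series from the retained ones. The helper names are assumptions for illustration, and no numerical cut-off is implied, since the text states only that D2, D3 and D4 appear more effective than D1.

```python
import numpy as np

# Lagged correlations for the Izmir Station, copied from Table 2
# (columns Dt-1/Pt, Dt-2/Pt, Dt-3/Pt, Dt-4/Pt).
TABLE2_IZMIR = {
    "D1": [-0.348, -0.069, 0.132, 0.016],
    "D2": [0.216, -0.274, -0.344, -0.043],
    "D3": [0.347, 0.150, -0.072, -0.246],
    "D4": [0.262, 0.217, 0.159, 0.095],
}


def total_absolute_correlation(correlations):
    """Sum of absolute correlations over the four lags."""
    return sum(abs(c) for c in correlations)


def rank_components(table):
    """Return the rounded totals and component names sorted by total."""
    totals = {name: round(total_absolute_correlation(c), 3)
              for name, c in table.items()}
    ranking = sorted(totals, key=totals.get, reverse=True)
    return totals, ranking


def summed_series(details, approximation, keep):
    """Add the retained detail sub-series to the approximation (the DW series)."""
    dw = np.asarray(approximation, dtype=float).copy()
    for name in keep:
        dw = dw + np.asarray(details[name], dtype=float)
    return dw


totals, ranking = rank_components(TABLE2_IZMIR)
```

For these values the totals match the last column of Table 2 for D1 to D4, with D1 ranked last, which is consistent with leaving D1 out of the summed series.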
For the Izmir Station, four input combinations based on preceding daily precipitations are evaluated to estimate the current precipitation value. The input combinations evaluated in the study are: (i) Pt-1; (ii) Pt-1 and Pt-2; (iii) Pt-1, Pt-2 and Pt-3; (iv) Pt-1, Pt-2, Pt-3 and Pt-4. In all cases, the output is the precipitation Pt for the current day. The RMSE, MAE, NS and R statistics of the SVR models in the training and test periods are given in Table 3. The Cc, ε and σ parameters of the optimum SVR and WSVR models are provided in Table 5.
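The four input combinations amount to building a lagged design matrix from a single series. A minimal sketch is given below; the function name `lagged_inputs` is hypothetical, and the same construction would apply to the summed DW series when it replaces the raw precipitation as input.

```python
import numpy as np


def lagged_inputs(series, n_lags):
    """Build inputs [P(t-1), ..., P(t-n_lags)] and targets P(t).

    Row r of X holds the n_lags values preceding the target y[r];
    the first n_lags values of the series have no complete history
    and are therefore used only as inputs.
    """
    s = np.asarray(series, dtype=float)
    columns = [s[n_lags - k: len(s) - k] for k in range(1, n_lags + 1)]
    X = np.column_stack(columns)
    y = s[n_lags:]
    return X, y
```

Input combination (i) corresponds to `n_lags=1` and combination (iv) to `n_lags=4`.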
Table 3 indicates that the SVR model whose inputs are the precipitations of the four previous days (input combination (iv)) has the best accuracy in the training period when compared to WSVR. In the test period, however, this SVR model seems to be worse than the others from the MAE and R viewpoints. The training and test performance statistics of the WSVR models are given in Table 4. In this table, DWt-1 indicates the summed series obtained by adding the effective Ds and the approximation component at time t-1. Table 4 shows that the WSVR model with three inputs has better accuracy than the others in the training and test periods. It can be seen from Tables 3 and 4 that the WSVR models perform much better than the single SVR models with respect to the various performance criteria. For the Izmir Station, the relative RMSE, MAE, NS and R differences between the WSVR (input combination iii) and SVR (input combination iv) models are 35%, 31%, 1502% and 183% in the test period, respectively. The WSVR and SVR forecasts and residuals in the test period are shown in Fig. 4. It can clearly be seen that the WSVR approximates the precipitation data better than the SVR. The scatterplots of each model are illustrated in Fig. 5. It is obvious from the fit line equations and R values that the WSVR model performs much better than the single SVR model. The WSVR predicts the maximum peak as 879 mm instead of the measured 795 mm, an overestimation of 10.6%, while the SVR gives 363 mm, an underestimation of 54.3%. However, the WSVR prediction of the second maximum peak, 780 mm, is 280 mm, an underestimation of 64.6%, while the SVR yields 0 mm. The WSVR model seems to perform better than the SVR in forecasting peak precipitations.
Table 2
The correlation coefficients between each of the sub-time series and the original precipitation data.
Component    Dt-1/Pt  Dt-2/Pt  Dt-3/Pt  Dt-4/Pt  Total (absolute)
Izmir station
D1           -0.348   -0.069    0.132    0.016   0.565
D2            0.216   -0.274   -0.344   -0.043   0.877
D3            0.347    0.150   -0.072   -0.246   0.815
D4            0.262    0.217    0.159    0.095   0.733
Approximate   0.377    0.372    0.363    0.352   1.405
Afyon station
D1           -0.382   -0.015    0.095    0.018   0.510
D2            0.189   -0.276   -0.318   -0.034   0.817
D3            0.331    0.130   -0.082   -0.241   0.784
D4            0.276    0.235    0.172    0.091   0.774
Approximate   0.318    0.314    0.307    0.293   1.232
Table 3
The RMSE, MAE, NS and R statistics of SVR applications for Izmir Station (RMSE and MAE in mm).
Model inputs                    Train RMSE  Train MAE  Train NS  Train R  Test RMSE  Test MAE  Test NS  Test R
(i) Pt-1                        63.8        15.0       0.162     0.441    74.2       20.2      0.033    0.212
(ii) Pt-1 and Pt-2              65.4        19.6       0.120     0.375    74.1       22.9      0.034    0.253
(iii) Pt-1, Pt-2 and Pt-3       69.4        20.9       0.009     0.186    72.1       18.9      0.024    0.288
(iv) Pt-1, Pt-2, Pt-3 and Pt-4  63.9        15.0       0.161     0.448    71.6       19.6      0.037    0.276
Table 4
The RMSE, MAE, NS and R statistics of WSVR applications for Izmir Station (RMSE and MAE in mm).
Model inputs                        Train RMSE  Train MAE  Train NS  Train R  Test RMSE  Test MAE  Test NS  Test R
(i) DWt-1                           59.8        16.4       0.265     0.556    64.8       18.8      0.211    0.486
(ii) DWt-1 and DWt-2                48.5        14.6       0.516     0.754    51.8       14.9      0.496    0.745
(iii) DWt-1, DWt-2 and DWt-3        43.3        12.9       0.615     0.794    46.5       13.6      0.593    0.782
(iv) DWt-1, DWt-2, DWt-3 and DWt-4  43.3        15.1       0.614     0.784    47.4       16.2      0.577    0.762
Table 5
The optimum parameters of the SVR and WSVR models.
Input                      Parameter  Izmir SVR  Izmir WSVR  Afyon SVR  Afyon WSVR
Pt-1                       Cc         10         100         100        10
Pt-1                       Epsilon    0          0.0001      1.00E-06   0.1
Pt-1                       Sigma      1000       0.9         1000       5
Pt-1 and Pt-2              Cc         10,000     10          1000       100
Pt-1 and Pt-2              Epsilon    0.000001   0.01        0.5        0.1
Pt-1 and Pt-2              Sigma      0.7        0.01        0.28       0.09
Pt-1, Pt-2 and Pt-3        Cc         1000       1000        100        100
Pt-1, Pt-2 and Pt-3        Epsilon    0.5        0.5         0.0001     0.1
Pt-1, Pt-2 and Pt-3        Sigma      0.285      0.285       2          0.2
Pt-1, Pt-2, Pt-3 and Pt-4  Cc         10         10,000      1000       100
Pt-1, Pt-2, Pt-3 and Pt-4  Epsilon    0.001      0.1         0.5        0.1
Pt-1, Pt-2, Pt-3 and Pt-4  Sigma      0.8        0.001       0.28       0.08
Fig. 4. Daily precipitation forecasts of SVR and WSVR models in test period—Izmir Station.
Fig. 5. The scatterplots of the SVR and WSVR models in test period—Izmir Station.
Table 6
The RMSE, MAE, NS and R statistics of SVR applications for Afyon Station (RMSE and MAE in mm).
Model inputs                    Train RMSE  Train MAE  Train NS  Train R  Test RMSE  Test MAE  Test NS  Test R
(i) Pt-1                        34.3        12.2       0.018     0.262    37.3       14.0      0.075    0.035
(ii) Pt-1 and Pt-2              34.7        11.7       0.002     0.251    37.3       13.2      0.074    0.076
(iii) Pt-1, Pt-2 and Pt-3       32.9        14.0       0.100     0.348    37.5       14.5      0.086    0.118
(iv) Pt-1, Pt-2, Pt-3 and Pt-4  32.9        11.2       0.096     0.374    38.7       14.2      0.154    0.103
Table 7
The RMSE, MAE, NS and R statistics of WSVR applications for Afyon Station (RMSE and MAE in mm).
Model inputs                        Train RMSE  Train MAE  Train NS  Train R  Test RMSE  Test MAE  Test NS  Test R
(i) DWt-1                           31.6        10.8       0.170     0.504    35.2       12.5      0.041    0.300
(ii) DWt-1 and DWt-2                23.1        8.3        0.555     0.761    23.5       8.7       0.578    0.771
(iii) DWt-1, DWt-2 and DWt-3        20.6        7.1        0.645     0.812    22.2       8.0       0.626    0.796
(iv) DWt-1, DWt-2, DWt-3 and DWt-4  20.0        7.3        0.666     0.826    21.4       9.0       0.647    0.815
For the Afyon Station, four input combinations based on preceding daily precipitations are also tried to estimate the current precipitation value. The performance statistics of the SVR models are given in Table 6. Table 6 indicates that the SVR model whose inputs are the precipitations of the four previous days (input combination iv) has the smallest RMSE (32.9 mm) and MAE (11.2 mm) and the highest R (0.374) in the training period. The RMSE, MAE, NS and R statistics of the WSVR models are shown in Table 7. From this table, it is obvious that the WSVR model has the best accuracy for input combination iv. Compared with the SVR models given in Table 6, the WSVR models provide much better performance in daily precipitation forecasting. The relative RMSE, MAE, NS and R differences between the WSVR (input combination iv) and SVR (input combination iv) models are 45%, 37%, 320% and 691% in the test period, respectively. Fig. 6 illustrates the precipitation forecasts of the WSVR and SVR models and the residuals in the test period. It can be seen from the hydrographs and residuals that the WSVR model performs much better than the single SVR model. The scatterplots of each model are shown in Fig. 7. From this figure, the WSVR model also seems to have much better forecast accuracy than the SVR model. The WSVR predicts the maximum peak as 344 mm instead of the measured 380 mm, an underestimation of 9%, while the SVR gives 335 mm, an underestimation of 12%. For the second maximum peak (272 mm), the SVR prediction is 16 mm, an underestimation of 94%, while the WSVR yields 334 mm, an overestimation of 23%. Here also the WSVR model has the better accuracy.
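The relative differences quoted for the two stations can be checked from the test-period statistics reported for the compared SVR and WSVR models. The sketch below assumes that the relative difference is the absolute change divided by the SVR value; this definition is not written out in the text, but it reproduces the quoted percentages to within one percentage point. The dictionary and function names are illustrative.

```python
def relative_difference_pct(svr_value, wsvr_value):
    """Absolute change relative to the SVR value, in percent (assumed definition)."""
    return abs(wsvr_value - svr_value) / abs(svr_value) * 100.0


# Test-period statistics as (SVR, WSVR) pairs taken from the text and tables:
# Izmir compares SVR input combination iv with WSVR input combination iii,
# Afyon compares input combination iv for both models.
IZMIR_TEST = {"RMSE": (71.6, 46.5), "MAE": (19.6, 13.6),
              "NS": (0.037, 0.593), "R": (0.276, 0.782)}
AFYON_TEST = {"RMSE": (38.7, 21.4), "MAE": (14.2, 9.0),
              "NS": (0.154, 0.647), "R": (0.103, 0.815)}


def compare(stats):
    """Relative difference for every statistic of one station."""
    return {name: relative_difference_pct(svr, wsvr)
            for name, (svr, wsvr) in stats.items()}


izmir = compare(IZMIR_TEST)
afyon = compare(AFYON_TEST)
```

The very large NS and R percentages arise because the SVR values in the denominator are close to zero, so they are better read as a sign of how weak the single SVR forecasts are than as a proportional gain.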
Daily precipitation estimation has also been carried out with feed-forward artificial neural networks (ANN) for the purpose of comparison. The Levenberg–Marquardt algorithm is used here for adjusting the weights of the ANN model. Sigmoid and linear activation functions are used for the hidden and output node(s), respectively. The number of hidden layer nodes of each ANN model was determined after trying various network structures. The optimal ANN structures for each combination are given in Table 8. In this table, ANN(1,9,1) indicates an ANN model comprising 1 input, 9 hidden and 1 output nodes. The training of the ANN networks was stopped after 100 epochs since the variation of error was too small after this epoch. The error graph for an ANN model during training is shown in Fig. 8. Various ANN structures were tried by increasing the number of hidden neurons from 1 to 10. Hecht-Nielsen (1987) suggested an upper limit of (2n+1) hidden layer neurons, where n is the number of input neurons. The variation of MSE versus the number of hidden nodes for the ANN(1,9,1) model in the test period is shown in Fig. 8. It is clear from the figure that the model with 9 hidden neurons has the lowest MSE. The comparison of Tables 3, 6 and 8 indicates that there is only a slight difference between the ANN and SVR models. It is clear from Table 8 that the ANN method also gives poor precipitation forecasts, as found for the SVR method.
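The network structure described above can be sketched as a forward pass with a sigmoid hidden layer and a linear output node. Training with the Levenberg–Marquardt algorithm is not reproduced here; the weight arrays are placeholders supplied by the caller, and the function names are assumptions made for illustration.

```python
import numpy as np


def hecht_nielsen_upper_limit(n_inputs):
    """Upper limit of 2n + 1 hidden neurons suggested by Hecht-Nielsen (1987)."""
    return 2 * n_inputs + 1


def candidate_hidden_sizes(max_hidden=10):
    """Hidden-layer sizes tried in the study: 1 up to 10 neurons."""
    return range(1, max_hidden + 1)


def ann_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of an ANN(n, h, 1): sigmoid hidden layer, linear output.

    x has shape (samples, n), w_hidden (n, h), b_hidden (h,),
    w_out (h, 1); b_out is a scalar.
    """
    hidden = 1.0 / (1.0 + np.exp(-(np.asarray(x, dtype=float) @ w_hidden + b_hidden)))
    return hidden @ w_out + b_out
```

In a structure search of the kind described, each candidate hidden-layer size would be trained separately and the size with the lowest error retained.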
Overall, the wavelet and support vector regression conjunction model, obtained by combining two methods, DWT and SVR, seems to be more adequate than the single SVR model for forecasting daily precipitation. The original signal (the daily precipitation time series in the present study) is represented at different resolution intervals by DWT. In other words, the complex hydrological time series is decomposed into several simple time series using a DWT. Thus, some features of the sub-series, such as its daily, monthly and annual periods, can be seen more clearly than in the original signal. In setting up a WSVR model, the SVR model is constructed with appropriate sub-series belonging to different scales. The forecasts are more accurate than those obtained directly from the original signal because the features (such as periodicity) of the sub-series are obvious (Ning and Yunping, 1998). This is why the WSVR model performs better than the SVR model.
6. Conclusions
The accuracy of WSVR model has been investigated for fore-casting daily precipitations in the present study. The WSVRmodels were developed by combining two methods, discretewavelet transform and support vector regression. The WSVRmodels were tested applying to different input combinations ofdaily precipitation data of Izmir and Afyon Station located inEagean Region of Turkiye. The test results were compared withthe single support vector regression and artificial neural networkmodels. The comparison results indicated that the WSVRperformed better than the SVR and ANN models in forecastingdaily precipitations. For the Izmir and Afyon stations, the WSVRconjunction model increased the prediction correlation coefficientand Nash–Sutcliffe coefficient with respect to the single SVRmodel by 183–691% and 1502–320% and reduced the root mean
Fig. 6. Daily precipitation forecasts of SVR and WSVR models in test period—Afyon Station.
Fig. 7. The scatterplots of the SVR and WSVR models in test period—Afyon Station.
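Table 8
The RMSE, MAE, NS and R statistics of ANN applications.

| Model inputs | Model | Training RMSE (mm) | Training MAE (mm) | Training NS | Training R | Test RMSE (mm) | Test MAE (mm) | Test NS | Test R |
|---|---|---|---|---|---|---|---|---|---|
| Izmir station | | | | | | | | | |
| (i) P_{t-1} | (1,9,1) | 65.6 | 26.5 | 0.110 | 0.332 | 68.8 | 27.3 | 0.110 | 0.333 |
| (ii) P_{t-1} and P_{t-2} | (2,4,1) | 65.7 | 26.5 | 0.108 | 0.329 | 68.4 | 27.3 | 0.120 | 0.348 |
| (iii) P_{t-1}, P_{t-2} and P_{t-3} | (3,1,1) | 66.6 | 27.4 | 0.082 | 0.286 | 68.8 | 27.8 | 0.110 | 0.336 |
| (iv) P_{t-1}, P_{t-2}, P_{t-3} and P_{t-4} | (4,1,1) | 66.6 | 27.3 | 0.083 | 0.288 | 68.6 | 27.7 | 0.115 | 0.345 |
| Afyon station | | | | | | | | | |
| (i) P_{t-1} | (1,5,1) | 33.6 | 16.6 | 0.061 | 0.247 | 35.3 | 17.6 | 0.050 | 0.226 |
| (ii) P_{t-1} and P_{t-2} | (2,5,1) | 33.4 | 16.5 | 0.069 | 0.263 | 35.2 | 17.4 | 0.056 | 0.239 |
| (iii) P_{t-1}, P_{t-2} and P_{t-3} | (3,4,1) | 33.4 | 16.5 | 0.068 | 0.261 | 35.2 | 17.4 | 0.054 | 0.235 |
| (iv) P_{t-1}, P_{t-2}, P_{t-3} and P_{t-4} | (4,2,1) | 33.5 | 16.5 | 0.067 | 0.259 | 35.1 | 17.4 | 0.058 | 0.243 |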
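The statistics reported in Table 8 can be computed from paired observed and forecast values. The sketch below is a minimal pure-Python illustration that assumes the standard textbook definitions of RMSE, MAE, the Nash–Sutcliffe coefficient (NS) and the correlation coefficient (R); the function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, forecast):
    """Root mean square error."""
    squared = [(f - o) ** 2 for o, f in zip(observed, forecast)]
    return math.sqrt(sum(squared) / len(observed))


def mae(observed, forecast):
    """Mean absolute error."""
    return sum(abs(f - o) for o, f in zip(observed, forecast)) / len(observed)


def nash_sutcliffe(observed, forecast):
    """Nash-Sutcliffe coefficient: 1 minus error variance over observed variance."""
    mean_obs = sum(observed) / len(observed)
    error_ss = sum((o - f) ** 2 for o, f in zip(observed, forecast))
    total_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error_ss / total_ss


def correlation(observed, forecast):
    """Pearson correlation coefficient R."""
    mean_obs = sum(observed) / len(observed)
    mean_for = sum(forecast) / len(forecast)
    cov = sum((o - mean_obs) * (f - mean_for) for o, f in zip(observed, forecast))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_for = sum((f - mean_for) ** 2 for f in forecast)
    return cov / math.sqrt(var_obs * var_for)


# Hypothetical values, for illustration only.
observed = [0.0, 2.0, 4.0, 6.0]
forecast = [1.0, 2.0, 4.0, 5.0]

stats = {
    "RMSE": rmse(observed, forecast),
    "MAE": mae(observed, forecast),
    "NS": nash_sutcliffe(observed, forecast),
    "R": correlation(observed, forecast),
}
```

With these definitions, lower RMSE and MAE and higher NS and R indicate better forecasts, which is how the models in the tables are ranked.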
Fig. 8. The variation of MSE vs. iteration and hidden node numbers for the ANN(1,9,1) model (Izmir Station).
square errors and mean absolute errors by 35–45% and 31–37%, respectively.
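The percentage changes quoted above are relative differences with respect to the single SVR model. As a small worked check, assuming the conventional definition of relative change (the text does not spell out the formula), the sketch below uses the test-period RMSE values given in the abstract, 71.6 mm for the best SVR model and 46.5 mm for the conjunction model, which are taken here to refer to the Izmir station. The function and variable names are hypothetical.

```python
def percent_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Test-period RMSE values from the abstract (assumed to be the Izmir station).
svr_rmse = 71.6   # best single SVR model, mm
wsvr_rmse = 46.5  # WSVR conjunction model, mm

reduction = percent_reduction(svr_rmse, wsvr_rmse)
print(f"RMSE reduction: {reduction:.1f}%")  # prints "RMSE reduction: 35.1%"
```

The result of about 35% appears consistent with the lower end of the reported 35–45% range for the RMSE reduction.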
References
Bustamante, J., Gomes, J., Chou, S., Rozante, J., 1999. Evaluation of April 1999 Rainfall Forecasts over South America Using the ETA Model. INPE, Cachoeira Paulista, SP, Brasil.
Caputo, B., Sim, K., Furesjo, F., Smola, A., 2002. Appearance-based object recognition using SVMs: which kernel should I use? In: Proceedings of NIPS Workshop on Statistical Methods for Computational Experiments in Visual Processing and Computer Vision, Whistler.
Cimen, M., 2008. Estimation of daily suspended sediments using support vector machines. Hydrol. Sci. J. 53 (3), 656–666.
Chou, C.M., Wang, R.Y., 2002. On-line estimation of unit hydrographs using the wavelet-based LMS algorithm. Hydrol. Sci. J. 47 (5), 721–738.
Coulibaly, P., Burn, H.D., 2004. Wavelet analysis of variability in annual Canadian streamflows. Water Resour. Res. 40, W03105.
Dastorani, M.T., Afkhami, H., Sharifidarani, H., Dastorani, M., 2010. Application of ANN and ANFIS models on dryland precipitation prediction (Case Study: Yazd in Central Iran). J. Appl. Sci. 10, 2387–2394.
El-Shafie, A., Jaafer, O., Seyed, A., 2011. Adaptive neuro-fuzzy inference system based model for rainfall forecasting in Klang River, Malaysia. Int. J. Phys. Sci. 6 (12), 2875–2888.
Freiwan, M., Cigizoglu, H.K., 2005. Prediction of total monthly rainfall in Jordan using feed forward backpropagation method. Fresenius Environ. Bull. 14 (2), 142–151.
French, M.N., Krajewski, W.F., Cuykendal, R.R., 1992. Rainfall forecasting in space and time using a neural network. J. Hydrol. 137, 1–37.
Georgakakos, K.P., Bras, L.R., 1984. A hydrologically useful station precipitation model. Part I and II: formulation and application. Water Resour. Res. 20 (11), 1585–1596, 1597–1610.
Gunn, S.R., 1998. Support Vector Machines for Classification and Regression. Technical Report, University of Southampton, England.
Hecht-Nielsen, R., 1987. Neurocomputing: picking the human brain. IEEE Spectrum 25 (3), 36–41.
Hong, W.C., 2008. Rainfall forecasting by technological machine learning models. Appl. Math. Comput. 200 (1), 41–57.
Hung, N.Q., Babel, M.S., Weesakul, S., Tripathi, N.K., 2009. An artificial neural network model for rainfall forecasting in Bangkok, Thailand. Hydrol. Earth Syst. Sci. 13, 1413–1425.
Ingsrisawang, L., Ingsriswang, S., Somchit, S., Aungsuratana, P., Khantiyanan, W., 2008. Machine learning techniques for short-term rain forecasting system in the northeastern part of Thailand. World Acad. Sci. Eng. Technol. 41, 248–253.
Karunanithi, N., Grenney, W.J., Whitley, D., Bovee, K., 1994. Neural networks for river flow prediction. J. Comp. Civil Eng. ASCE 8 (2), 201–220.
Kisi, O., 2008. Stream flow forecasting using neuro-wavelet technique. Hydrol. Process. 22 (20), 4142–4152.
Kisi, O., 2009. Neural networks and wavelet conjunction model for intermittent streamflow forecasting. J. Hydrol. Eng. 14 (8), 773–782.
Kisi, O., Cimen, M., 2011. A wavelet-support vector machine conjunction model for monthly streamflow forecasting. J. Hydrol.
Kucuk, M., Agiralioglu, N., 2006. Wavelet regression techniques for streamflow predictions. J. Appl. Stat. 33 (9), 943–960.
Lu, R.Y., 2002. Decomposition of interdecadal and interannual components for North China rainfall in rainy season. Chin. J. Atmos. 26, 611–624 (in Chinese).
Mallat, S.G., 1989. A theory for multi resolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 11 (7), 674–693.
Marzano, F.S., Fionda, E., Ciotti, P., 2006. Neural-network approach to ground-based passive microwave estimation of precipitation intensity and extinction. J. Hydrol. 328, 121–131.
Moustris, K.P., Larissi, I.K., Nastos, P.T., Paliatsos, A.G., 2011. Precipitation forecast using artificial neural networks in specific regions of Greece. Water Resour. Manage.
Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models, Part 1: a discussion of principles. J. Hydrol. 10 (3), 282–290.
Navone, H.D., Ceccatto, H.A., 1994. Predicting Indian monsoon rainfall: a neural network approach. Clim. Dyn. 10, 305–312.
Ning, M., Yunping, C., 1998. An ANN and wavelet transformation based method for short term load forecast. In: Energy Management and Power Delivery. International Conferences, vol. 2, pp. 405–410.
Nourani, V., Alami, M.T., Aminfar, M.H., 2009. A combined neural-wavelet model for prediction of Ligvanchai watershed precipitation. Eng. Appl. Artif. Intell. 22, 466–472.
Olson, D.A., Junker, N.W., Korty, 1995. Evaluation 33 years of quantitative precipitation forecasting at NMC. Weather Forecast. 10, 498–511.
Olsson, J., Uvo, C.B., Jinno, K., Kawamura, A., Nishiyama, K., Koreeda, N., Nakashima, T., Morita, O., 2004. Neural networks for rainfall forecasting by atmospheric downscaling. J. Hydrol. Eng. ASCE 9, 1–12.
Partal, T., Kucuk, M., 2006. Long-term trend analysis using discrete wavelet components of annual precipitations measurements in Marmara region (Turkey). Phys. Chem. Earth 31, 1189–1200.
Ramirez, M.C.V., Velho, H.F.C., Ferreira, N.J., 2005. Artificial neural network technique for rainfall forecasting applied to the Sao Paulo region. J. Hydrol. 301, 146–160.
Santos, C.A.G., Morais, B.S., Silva, G.B.L., 2009. Drought forecast using artificial neural network for three hydrological zones in San Francisco river basin. IAHS-AISH Publ. 333, 302–312.
Sivapragasam, C., Liong, S.-Y., Pasha, M.F.K., 2001. Rainfall and runoff forecasting with SSA–SVM approach. J. Hydroinf. 3 (3), 141–152.
Smith, L.C., Turcotte, D.L., Isacks, B., 1998. Stream flow characterization and feature detection using a discrete wavelet transform. Hydrol. Process. 12, 233–249.
Smola, A.J., 1996. Regression Estimation with Support Vector Learning Machines. MSc Thesis, Technische Universitat Munchen, Germany.
Dai, X.G., Wang, P., Chou, J.F., 2003. Multiscale characteristics of the rainy season rainfall and interdecadal decaying of summer monsoon in North China. Chin. Sci. Bull. 48, 2730–2734.
Vapnik, V., 1995. The Nature of Statistical Learning Theory. Springer Verlag, New York, USA.
Wang, W., Ding, J., 2003. Wavelet network model and its application to the prediction of the hydrology. Nat. Sci. 1 (1), 67–71.
Wu, C.L., Chau, K.W., Fan, C., 2010a. Prediction of rainfall time series using modular artificial neural networks coupled with data-preprocessing techniques. J. Hydrol. 389, 146–167.
Wu, J., Liu, M., Jin, L., 2010b. Least square support vector machine ensemble for daily rainfall forecasting based on linear and nonlinear regression. Advances in Neural Network Research and Applications. Lect. Notes Electr. Eng. 67 (Part 1), 55–64. doi:10.1007/978-3-642-12990-2_7.
Xingang, D., Ping, W., Jifan, C., 2003. Multiscale characteristics of the rainy season rainfall and interdecadal decaying of summer monsoon in North China. Chinese Sci. Bull. 48, 2730–2734.
Zhou, H.C., Peng, Y., Liang, G.-H., 2008. The research of monthly discharge predictor-corrector model based on wavelet decomposition. Water Resour. Manage. 22 (2), 217–227.