Accepted Manuscript
Performance Assessment of Different Data Mining Methods in Statistical
Downscaling of Daily Precipitation
M. Nasseri, H. Tavakol-Davani, B. Zahraie
PII: S0022-1694(13)00300-4
DOI: http://dx.doi.org/10.1016/j.jhydrol.2013.04.017
Reference: HYDROL 18844
To appear in: Journal of Hydrology
Received Date: 16 April 2012
Revised Date: 7 April 2013
Accepted Date: 9 April 2013
Please cite this article as: Nasseri, M., Tavakol-Davani, H., Zahraie, B., Performance Assessment of Different Data
Mining Methods in Statistical Downscaling of Daily Precipitation, Journal of Hydrology (2013), doi: http://dx.doi.org/10.1016/j.jhydrol.2013.04.017
Performance Assessment of Different Data Mining
Methods in Statistical Downscaling of Daily Precipitation
M. Nasseri1, H. Tavakol-Davani2, B. Zahraie3
Abstract
In this paper, nonlinear Data-Mining (DM) methods are used to extend the most cited statistical downscaling model, SDSM, for downscaling of daily precipitation. The proposed model is the Nonlinear Data-Mining Downscaling Model (NDMDM). The four nonlinear and semi-nonlinear DM methods included in NDMDM are cubic-order Multivariate Adaptive Regression Splines (MARS), Model Tree (MT), k-Nearest Neighbor (kNN) and Genetic Algorithm-optimized Support Vector Machine (GA-SVM). The daily records of 12 rain gauge stations scattered in basins with various climates in Iran are used to compare the performance of NDMDM with that of the statistical downscaling method. Comparison between the statistical downscaling and NDMDM results at the selected stations indicates that a combination of the MT and MARS methods can provide daily rain estimations with lower mean absolute error and with monthly standard deviation and skewness values closer to the historical records for both the calibration and validation periods. The future projections of precipitation at the selected rain gauge stations under the A2 and B2 SRES scenarios show significant uncertainty in both the NDMDM and statistical downscaling models.

Key Words: Statistical Downscaling, nonlinear Data-Mining method, Climate Change

1 Ph.D. Candidate, School of Civil Engineering, University of Tehran, Tehran, Iran, Corresponding Author
2 Graduate Student, School of Civil Engineering, University of Tehran, Tehran, Iran, [email protected]
3 Associate Professor, Center of Excellence for Engineering and Management of Civil Infrastructures, School of Civil Engineering, University of Tehran, Tehran, Iran, [email protected], P.O. Box 11155-4563
1. Introduction
Outputs of Global Circulation Models (GCMs) are the basis of climate change studies. The spatial resolution of these data is too coarse to determine local climate change effects, so they must be recalculated at a suitable resolution to be valid for local meteorological analysis. The methods of extracting regional-scale meteorological variables from GCM outputs are known as downscaling approaches. The four general categories of downscaling approaches are regression (empirical) methods (Enke and Spekat, 1997; Faucher et al. 1999; Li and Sailor, 2000; Wilby et al. 2002; Hessami et al. 2008; Raje and Mujumdar, 2011), weather pattern approaches (Bárdossy and Plate, 1992; Yarnal et al. 2001; Bárdossy et al. 2002; Wetterhall et al., 2009; Anandhi et al. 2011), stochastic weather generators (Semenov and Barrow, 1997; Bates et al. 1998) and regional climate models (Mearns et al. 1995).
Regression or empirical methods are the most cited approaches in downscaling simulation. Ease of use, relatively low pre-processing cost and a straightforward computational procedure are the main reasons for the popularity of these downscaling techniques.
Finding empirical relationships between the global and local scales of climate circulation is the basic premise of any statistical downscaling method. Under this assumption, the correlation between global GCM meteorological variables (predictors) and local meteorological variables such as observed precipitation and temperature (predictands) is the key point of this type of downscaling procedure. The most well-known regression-based downscaling methods are structured for separate estimation of the occurrence and amount of meteorological variables. The advantages and disadvantages of statistical regression-based downscaling methods have been comprehensively discussed by Hessami et al. (2008).
Different nonlinear Data Mining (DM) methods such as Artificial Neural Networks (ANN) (Tomassetti et al. 2009; Pasini, 2009; Mendes and Marengo, 2010; Fistikoglu and Okan, 2011), k-Nearest Neighbor (kNN) (Yates et al. 2003; Gangopadhyay et al. 2005; Raje and Mujumdar, 2011), Support Vector Machine (SVM) (Tripathi et al. 2006; Chen et al. 2010), Model Tree (MT) (Li and Sailor 2000) and Multivariate Adaptive Regression Splines (MARS) (Corte-Real et al. 1995), besides linear regression methods (Wilby et al. 2002; Hessami et al. 2008), have been used in previous climatological studies.
SDSM (Statistical DownScaling Model) is the most cited concept and package among regression-based statistical downscaling methods. This computer package uses the Multiple Linear Regression (MLR) method to estimate the amount and/or the occurrence of local meteorological predictands.
In this paper, the efficiency of four nonlinear and semi-nonlinear DM methods with previous applications in climatological research, namely MARS, MT, kNN and Genetic Algorithm-optimized SVM (GA-SVM), is evaluated against the standard MLR in estimating both the occurrence and the amount of precipitation. The structure of SDSM is used as the main platform to develop the Nonlinear Data-Mining Downscaling Model (NDMDM) by replacing its MLR kernels with the selected DM methods. In the next sections, the local-scale datasets (areas of interest and predictands) and large-scale datasets (predictors) used in this study are described. Then, SDSM and the utilized DM methods are briefly described. The remaining sections present the results of the case study, concluding remarks and recommendations for further studies.

2. Datasets
2.1. Local Dataset
To assess the efficiency of the proposed downscaling method, twelve rain gauge stations scattered in five different climatological basins in Iran, namely Hamoon-Jazmoorian, Sefidrood, Mordab-Anzali, Shapoor-dalky and Mond, are used. These basins are located in an arid region in the southeast of Iran near the Iran-Pakistan border, a wet region in the north of Iran near the Caspian Sea and a semi-arid region in the southwest of Iran near the Persian Gulf. Statistical characteristics of the observed daily precipitation at the selected stations, such as the average, maximum and standard deviation, are presented in Table 1. The locations of these rain gauge stations are shown in Fig. 1. As presented in Table 1, twenty-six to thirty-five years of daily precipitation records up to the year 2000 (the start year of the simulations of the climate change scenarios) are available for the selected rain gauge stations. For each station, the first 75% of the available record is used for calibration of the downscaling model and the rest of the record is used for validation. The daily precipitation records were obtained from the Iran Water Resources Management Company.

2.2. Large scale Datasets
The data bank of the Hadley Center GCM, namely HadCM3, for the A2 and B2 SRES (Special Report on Emission Scenarios) scenarios is used in this study to project the future climate behavior. The coarse-resolution (2.5° × 2.5°) reanalysis of atmospheric data from the U.S. National Center for Environmental Prediction (NCEP) (Table 2) is used as the source of the downscaling model predictors.
Because the spatial resolutions of the HadCM3 outputs (3.75° (long.) × 2.5° (lat.)) and the NCEP dataset are inconsistent, the large-scale NCEP predictors projected onto the HadCM3 computational grid are used in this study. The daily projected data and HadCM3 outputs are available from the Canadian Climate Impacts Scenarios (CCIS) website (www.cics.uvic.ca/scenarios/sdsm/select.cgi).
Twenty-six different atmospheric variables are available for each grid box in this database. For each rain gauge station, nine grid boxes covering and surrounding the study area are selected. Fig. 1 depicts the center of each meteorological grid box and the locations of the selected rain gauge stations. As illustrated in this figure, the grid boxes cover a large area over and around the selected basins. In addition, one- to three-day lags of the predictors are considered as candidate model inputs to incorporate cross-correlation and auto-correlation in the modeling process. For each station, 936 (9 (grid boxes) × 26 (meteorological predictors) × 4 (0 to 3-day time lag)) predictors have been analyzed.

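The count of 936 candidate predictors follows directly from stacking the unlagged fields with their one- to three-day lags. The sketch below illustrates this with NumPy on synthetic data; the function name `build_lagged_predictors`, the array layout and the random stand-in for the NCEP fields are assumptions made for illustration and are not taken from the NDMDM code.

```python
import numpy as np

def build_lagged_predictors(fields, max_lag=3):
    """Stack lag-0..max_lag copies of daily predictor fields.

    fields:  array of shape (n_days, n_boxes, n_vars)  (assumed layout)
    returns: array of shape (n_days - max_lag, n_boxes * n_vars * (max_lag + 1))
    """
    n_days = fields.shape[0]
    flat = fields.reshape(n_days, -1)            # (n_days, n_boxes * n_vars)
    # Row t of each lagged copy holds the predictors observed `lag` days before day t.
    lagged = [flat[max_lag - lag : n_days - lag] for lag in range(max_lag + 1)]
    return np.hstack(lagged)

rng = np.random.default_rng(0)
demo = rng.normal(size=(30, 9, 26))   # synthetic stand-in: 30 days, 9 boxes, 26 variables
X = build_lagged_predictors(demo)
print(X.shape)                        # (27, 936)
```

The first `max_lag` days are dropped because their lagged values are not available.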
3. Methodology
In this section, the platform of SDSM is described first. Then the data-mining methods used in NDMDM are described and, finally, the structure of NDMDM is explained.

3.1. Statistical DownScaling Model (SDSM)
The SDSM software is based on the Multiple Linear Regression Downscaling Model (MLRDM) (Wilby et al. 2002). SDSM outputs are the average of several weather ensembles that result from linear regression models with stochastic terms of bias correction. Because of the linear structure of SDSM, the selection of predictors is based on correlation and partial correlation analysis between the predictand and the predictors, and the weights of the predictors are estimated via the ordinary least squares method. The dual simplex method is also provided in SDSM because regression coefficients are unstable for non-orthogonal predictor vectors. Hessami et al. (2008) added the option of ridge regression (Hoerl and Kennard, 1970) to their downscaling model, namely ASD, as another remedy for the non-orthogonality of the predictor vectors.
SDSM consists of two separate sub-models that determine the occurrence and the amount of conditional meteorological variables (or discrete variables) such as precipitation, and an amount model for unconditional variables (or continuous variables) such as temperature or evaporation. Statistical downscaling using SDSM consists of the following steps:
1. In the first step, suitable predictors are selected. SDSM provides statistical analysis tools that help users select the best predictors. In SDSM, predictors should have acceptable unconditional and conditional correlations with the predictand. The partial correlation, P-value and explained variance of the predictors can also be checked in SDSM. The scatter plot is another tool provided in SDSM for selecting appropriate predictors. Acceptable ranges for the above-mentioned terms are proposed by Wilby et al. (2004).
2. A multiple linear regression model, called the unconditional model, is calibrated to simulate precipitation occurrence. This model can be calibrated by two different methods, namely the ordinary least squares and dual simplex methods. An autoregressive term can be added to this model. For each month, one MLR model must be calibrated for occurrence estimation. Days with and without events (precipitation) are represented by 1 and 0, respectively. For each day and ensemble, a uniformly distributed random number between 0 and 1 is generated. If the random number is less than the output of the occurrence model for that day, precipitation occurs. Otherwise, precipitation does not occur.
3. Another multiple linear regression model, namely the conditional model, is calibrated to simulate the precipitation amount. This model is calibrated using the rainy-day data. As with the unconditional model, SDSM calibrates a separate conditional model for each of the 12 months of the year. For a day identified as rainy in the previous step, the output of the amount model is calculated. Then, a normally distributed random number is added to the output to account for the modeling error. This random number is drawn from a normal distribution with zero mean and standard deviation equal to the standard error.
4. The result of the previous step is compared with a pre-defined threshold. If the result is less than the threshold, precipitation does not occur. Otherwise, the result is taken as the rainfall amount for that day and that ensemble.
5. Furthermore, in SDSM, bias correction (b) (Eq. 1) and variance inflation (VIF) (Eq. 2) can be applied to the results of each monthly model to achieve acceptable ensemble results in both the calibration and validation periods (Hessami et al. 2008):
$$b = Mean_{obs} - Mean_{mod} \qquad (1)$$
$$VIF = \frac{12\,(Var_{obs} - Var_{mod})}{Ste^{2}} \qquad (2)$$
where $Mean_{obs}$ and $Mean_{mod}$ are the mean values of the observed and modeled precipitation, respectively, $Var_{obs}$ and $Var_{mod}$ are the variances of the observed and modeled precipitation for the calibration period, and $Ste$ is the standard error in the same period. b-1 is added to the amount of precipitation in each day and $\sqrt{VIF/12}$ is multiplied by the standard deviation of the modeling error. Although the downscaling model is calibrated using the NCEP dataset, in estimating VIF and the bias correction the variables with the subscript mod are estimated from downscaling model outputs based on GCM simulations. This approach allows the modeler to take the bias of the GCM into account in the downscaling process.
6. Finally, to obtain a single downscaled time series from all projected ensembles, their arithmetic mean is calculated.
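Steps 2, 3, 4 and 6 above can be summarized in a short simulation routine. The following is a minimal sketch, assuming that the occurrence and amount models have already been calibrated and that their daily outputs are supplied as arrays; the function name `simulate_ensembles` and its arguments are hypothetical, and bias correction and variance inflation (step 5) are deliberately left out.

```python
import numpy as np

def simulate_ensembles(p_wet, amount, std_err, threshold=0.0, n_ens=100, seed=0):
    """Generate daily precipitation ensembles from already-fitted model outputs.

    p_wet:   occurrence-model output per day (compared with a uniform random number)
    amount:  amount-model output per day (mm)
    std_err: standard error of the amount model (mm)
    """
    rng = np.random.default_rng(seed)
    p_wet = np.asarray(p_wet, dtype=float)
    amount = np.asarray(amount, dtype=float)
    n_days = p_wet.size
    # Step 2: a day is wet when the uniform random number is below the occurrence output.
    wet = rng.uniform(size=(n_ens, n_days)) < p_wet
    # Step 3: add zero-mean normal noise with standard deviation equal to the standard error.
    rain = amount + rng.normal(0.0, std_err, size=(n_ens, n_days))
    # Step 4: results below the threshold are treated as dry days.
    rain = np.where(wet & (rain >= threshold), rain, 0.0)
    # Step 6: the single downscaled series is the arithmetic mean over the ensembles.
    return rain, rain.mean(axis=0)

ens, series = simulate_ensembles([0.0, 1.0, 0.5], [5.0, 5.0, 5.0], std_err=1.0,
                                 threshold=0.3, n_ens=200)
```

In practice one such pair of models would be used per calendar month, as described in steps 2 and 3.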
In this study, SDSM has been rewritten in the MATLAB environment. The accuracy and compatibility of the new MATLAB code with the SDSM package have been tested using several datasets. The SDSM MATLAB code has then been extended to allow the user to choose different predictors for the amount and occurrence models. Because of this difference between the commercial SDSM package developed by Wilby et al. (2002) and the code developed in this study, the latter is referred to as MLRDM in the following sections.

3.2. MARS
The initial idea of MARS (Friedman and Stuetzle, 1981) was developed and completed by Friedman (1991). This method is based on multivariate regression with linear or nonlinear mathematical kernels enhanced by continuous data partitioning. Various applications of MARS can be found in hydrological studies (Coulibaly and Baldwin, 2005; Buccola and Wood, 2010; Herrera et al. 2010). MARS detects inherent nonlinearity in the data and appropriate partitions of the data structure for the model parameters using a weighted summation of conditional linear or nonlinear polynomial basis functions. MARS uses the following structure for conditional regression:
$$f(x) = \sum_{i=1}^{k} C_i B_i(x) \qquad (3)$$
where $B_i(x)$ and $C_i$ are the basis function and its constant coefficient, respectively, and $k$ is the total number of basis functions used in the final model. The basis functions are known as hinge functions. The general form of a hinge function is as follows:
$$B_i(x) = \max(0,\, x_i - const_i) \quad \text{or} \quad \max(0,\, const_i - x_i) \qquad (4)$$
In Eq. 4 the hinge function is linear, but it can also be presented in the form of nonlinear functions with orders higher than one. MARS has two important routines, a forward and a backward routine, for identifying the best structure and, in particular, the model parameters. These coupled routines allow optimization of the MARS structure while avoiding over-parameterization and over-fitting. For more information, readers are referred to Friedman and Stuetzle (1981).
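To make the role of the hinge functions concrete, the sketch below evaluates a weighted sum of linear hinge functions for a single predictor. It is only an illustration: the coefficients and knots are made up, the helper names `hinge` and `mars_predict` are hypothetical, and the constant term is written as an explicit intercept (equivalent to a basis function that is identically 1). The forward and backward routines that choose the knots are not shown.

```python
import numpy as np

def hinge(x, knot, direction=+1):
    """max(0, x - knot) for direction=+1, max(0, knot - x) for direction=-1."""
    return np.maximum(0.0, direction * (np.asarray(x, dtype=float) - knot))

def mars_predict(x, intercept, terms):
    """Evaluate intercept + sum_i C_i * B_i(x) for a single predictor x.

    terms: list of (coefficient, knot, direction) tuples (illustrative values).
    """
    y = np.full(np.shape(x), intercept, dtype=float)
    for coef, knot, direction in terms:
        y += coef * hinge(x, knot, direction)
    return y

x = np.array([0.0, 1.0, 2.0, 3.0])
y = mars_predict(x, intercept=1.0, terms=[(2.0, 1.0, +1), (0.5, 2.0, -1)])
print(y)   # [2.  1.5 3.  5. ]
```

Each hinge is zero on one side of its knot, so the fitted function is piecewise linear with slope changes at the knots.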
FAST MARS (Friedman, 1993) and ARESLab (Jekabsons, 2010) are recent software packages developed for MARS. The ARESLab package provided by Jekabsons (2010) is used in this study, and the maximum order of the polynomial function in ARESLab has been set to 3.

3.3. Model Tree (MT)
MT is classified as a data-structured and modular DM method. Similar to MARS, MT is built on a data-partitioning foundation. MT has splitting rules in the non-terminal nodes of the conceptual mathematical tree and regression functions at its leaves. The construction of an MT is therefore similar to that of a decision tree, while it converges faster than other DM methods. MT can successfully handle problems in high-dimensional spaces with up to hundreds of variables, and it combines a conventional decision tree with the possibility of generating linear regression functions at its leaves, which increases simulation performance. MT operates much like a piecewise or conditional mathematical function.
One of the first applications of a conditional regression method to describe the behavior of a hydrological rainfall-runoff system was presented in the 1970s by Becker (1976) and later by Becker and Kundzewicz (1987). MT has found many applications in climatological and hydrological sciences (Faucher et al. 1999; Li and Sailor, 2000; Xiong et al. 2001; Solomatine and Xue, 2004).
M5 is a popular MT method. The M5 learning paradigm was developed by Quinlan (1986, 1993). The first version of M5 consisted of piecewise or conditional linear models, which made it an intermediate model between linear models and truly nonlinear models such as ANNs. The details of the algorithm of the first version of M5 can be found in Quinlan (1993) and Solomatine and Dulal (2003). A new version of M5, namely M5′, was presented by Wang & Witten (1997) and is used in the current paper. The main kernel of M5′ implemented in MATLAB by Jekabsons (2010) is used in the NDMDM model.

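The piecewise-linear behavior of a model tree can be illustrated with the smallest possible case: a single split with one linear regression in each leaf. The sketch below is a deliberate simplification and is not M5 or M5′ (it chooses the split by minimizing the summed squared error of the two leaf regressions, and it has no pruning or smoothing); all function names and the synthetic data are assumptions for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares slope and intercept for one leaf."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def fit_one_split_tree(x, y, min_leaf=3):
    """Pick the split that minimizes the summed squared error of two leaf regressions."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = None
    for i in range(min_leaf, len(x) - min_leaf + 1):
        left, right = fit_line(x[:i], y[:i]), fit_line(x[i:], y[i:])
        sse = (np.sum((y[:i] - (left[0] * x[:i] + left[1])) ** 2)
               + np.sum((y[i:] - (right[0] * x[i:] + right[1])) ** 2))
        if best is None or sse < best[0]:
            best = (sse, 0.5 * (x[i - 1] + x[i]), left, right)
    return best[1:]          # (split value, left leaf model, right leaf model)

def predict(tree, x):
    split, left, right = tree
    return np.where(x <= split, left[0] * x + left[1], right[0] * x + right[1])

x = np.linspace(0.0, 10.0, 21)
y = np.where(x <= 5.0, 1.0 * x, 5.0 + 3.0 * (x - 5.0))   # synthetic piecewise-linear data
tree = fit_one_split_tree(x, y)
```

A full model tree applies this kind of splitting recursively over many predictors, which is what allows it to partition the data space.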
3.4. k-Nearest Neighbor (kNN)
One of the simplest methods in pattern recognition is k-Nearest Neighbor (kNN). It is an instance-based, supervised machine learning method. It classifies an object based on the nearest observations in the training dataset in the initial feature space: the object of interest is assigned to the most common class among its k nearest neighbors (k is a positive integer). k is the only parameter of this method that must be calibrated.
Different revised forms of kNN have been presented in the literature (Lall and Sharma, 1996; Sharif and Burn, 2006). Useful reports and articles about applications of kNN in the fields of hydro-science are available in the literature (Yates et al. 2003; Gangopadhyay et al. 2005; Sharif and Burn, 2006; Raje and Mujumdar, 2011).
In this paper, the original type of kNN (with geometric distance) is used for statistical downscaling simulation, and the best value of k in the range of 1 to 20 is identified automatically, without user supervision.

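One way to search for k in the range 1 to 20 is to score each candidate on held-out data. The sketch below shows this for a regression-type kNN with Euclidean distance on synthetic data; the source does not state how the search over k was carried out, so the hold-out scheme, the mean-absolute-error score and the function names are all assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Average the targets of the k nearest training points (Euclidean distance)."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def select_k(X_cal, y_cal, X_val, y_val, k_values=range(1, 21)):
    """Return the k with the lowest mean absolute error on the held-out set."""
    errors = {k: np.mean(np.abs(knn_predict(X_cal, y_cal, X_val, k) - y_val))
              for k in k_values}
    return min(errors, key=errors.get), errors

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)      # synthetic predictand
best_k, errors = select_k(X[:150], y[:150], X[150:], y[150:])
```

For the occurrence model the same neighbor search would be used with a vote or an average over 0/1 wet-day flags instead of rainfall amounts.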
3.5. Genetic Algorithm-Optimized Support Vector Machine (GA-SVM)
SVM is one of the newer machine learning and data mining methods intended to recognize data structures for classification or regression. The basic version of SVM was developed by Vapnik and Cortes (1995). The most important feature of SVM in detecting the data structure is the transformation of the original data from the input space to a new target space (feature space) through a mathematical construct called the kernel function (Boser et al. 1992). For this purpose, a nonlinear transformation function $\varphi(\cdot)$ is defined to project the input space into a higher-dimensional feature space of dimension $n_h$.
According to Cover's theorem (Cover, 1965), a linear function $f(\cdot)$ can be formulated in the higher-dimensional feature space to represent a nonlinear relation between the inputs $x_i$ and the outputs $y_i$ as follows:
$$y_i = f(x_i) = \langle w, \varphi(x_i) \rangle + b \qquad (5)$$
where $w$ and $b$ are the model parameters. This mathematical approach was presented previously by Aizerman et al. (1964). Boser et al. (1992) utilized this formulation to develop the nonlinear SVM. For more information about SVM in regression and pattern recognition modes, readers are referred to Vapnik (1998).
Because of the nonlinearity of SVM and its parameters, some researchers have optimized the SVM and kernel parameters with evolutionary algorithms (Fei and Sun, 2008; Oliveira et al. 2010).
In this paper, regression-based SVM is used to downscale daily precipitation in both the occurrence and amount modes of the proposed downscaling model. To achieve the best performance of SVM, the kernel and SVM parameters (6 parameters) are optimized using GA. Two kernel functions, namely the sigmoid and Radial Basis Function (RBF) kernels, have been tested and calibrated in this study. For a more detailed description of SVM, readers are referred to Vapnik and Cortes (1995). In the next section, the procedure of predictor selection is described in detail.

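For reference, the kernel function replaces the inner product of the transformed inputs, so that $\varphi(\cdot)$ never has to be evaluated explicitly. The commonly used forms of the two kernels named above are given below; the symbols $\gamma$ and $r$ are the usual kernel parameter names and are an assumption here, since the source does not list which six parameters were optimized.

```latex
K(x_i, x_j) = \langle \varphi(x_i), \varphi(x_j) \rangle
K_{RBF}(x_i, x_j) = \exp\left(-\gamma \,\lVert x_i - x_j \rVert^{2}\right), \quad \gamma > 0
K_{sigmoid}(x_i, x_j) = \tanh\left(\gamma \,\langle x_i, x_j \rangle + r\right)
```

In an epsilon-type SVM regression, parameters of this kind are tuned together with the penalty and tolerance parameters, which is the type of search a GA can perform.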
3.6. Selection of the Predictors
Feature selection techniques (here, the selection of predictors) can be categorized into three main branches, namely embedded, wrapper and filter-based methods (Tan et al. 2006). Most of the well-known general approaches to feature selection fall into the latter two broad classes of wrapper and filter methods (Guyon and Elisseeff, 2003). Wrapper methods measure the model performance for all or most of the possible subsets of input variables in order to find the appropriate input subset based on the calibration results (Liu and Yu, 2005). Filter-based methods are model-free techniques that use statistical criteria to find the dependencies between the input candidates (predictors) and the output variable(s) (predictands). These criteria act as statistical benchmarks for reaching a suitable predictor dataset. The linear correlation coefficient is a popular criterion for measuring dependencies between input and output variables. Battiti (1994) showed that the efficiency of the linear correlation coefficient is affected by noise and by data transformation during data preprocessing and feature selection. Despite the popularity and simplicity of the linear correlation coefficient in exploring the dependency of variables, it is inappropriate for real nonlinear systems (Battiti 1994).
Mutual Information (MI), another filter method, describes the reduction in the uncertainty of estimating one parameter when another is available (Liue et al. 2008). It is a robust, nonlinear filter method and has recently been found to be an appropriate statistical criterion in feature or predictor selection problems in hydrology (Bowden et al. 2005a, 2005b; May et al. 2008a, 2008b). Finding the best subset of input predictors in downscaling problems is complicated and challenging because of the large number of meteorological predictors and the interactions between the model parameters and the model structure. Since nonlinear mathematical kernels are used in the proposed downscaling model, MI is selected for choosing the best set of downscaling model predictor(s).

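A filter-type selection based on MI can be sketched as ranking every candidate predictor by an estimate of its MI with the predictand and keeping the top few. The example below uses a simple histogram (plug-in) estimator on synthetic data; the estimator, the number of bins and the function names are assumptions for illustration, since the source does not describe how MI was estimated.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram (plug-in) estimate of I(X;Y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0                      # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def top_predictors(X, y, n_select=5, bins=10):
    """Indices of the n_select columns of X with the highest MI with y."""
    scores = np.array([mutual_information(X[:, j], y, bins) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:n_select], scores

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 8))
y = X[:, 3] ** 2 + 0.1 * rng.normal(size=2000)   # nonlinear dependence on column 3 only
selected, scores = top_predictors(X, y, n_select=3)
```

The synthetic predictand depends on column 3 through a square, a relation that a linear correlation coefficient would largely miss but that MI is designed to detect.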
3.7. Statistical Downscaling using NDMDM
NDMDM is a fully automated MATLAB package developed in this study. In this computational package, five models, namely MLR (which is used in the SDSM package), nonlinear MARS, MT, kNN and GA-SVM, are available for calibrating both the occurrence and amount models. An auto-calibration capability is available for all five models as well.
NDMDM includes two separate subroutines for ensemble simulation of precipitation occurrence and amount. It also includes variance inflation and bias correction similar to SDSM. The following steps are taken to downscale precipitation with NDMDM:
1- A uniformly distributed random number in [0, 1] is generated to determine whether precipitation occurs. Similar to SDSM, for each day and each ensemble, a wet day occurs when the random number is less than or equal to the output of the calibrated occurrence model, which can be any of the five MLR, nonlinear MARS, MT, kNN and GA-SVM models.
2- Another model (from the set of five models available in NDMDM) is calibrated to simulate the precipitation amount using the rainy-day data. Similar to SDSM, NDMDM can calibrate a separate conditional model for each of the 12 months of the year. For a day identified as rainy in the previous step, the output of the amount model is calculated. Then, similar to SDSM, a normally distributed random number is added to the output to account for the modeling error. This random number is drawn from a normal distribution with zero mean and standard deviation equal to the standard error.
3- In the last step, the results of the previous step are compared with a user-defined threshold to avoid generating irrational results (such as negative values, or very small positive values that can interrupt the dry spell analysis).
These three steps are also shown in Fig. 2. They are similar to those of SDSM; the only major differences between NDMDM and SDSM are the four nonlinear MARS, MT, kNN and GA-SVM models available in NDMDM and the possibility of using different sets of predictors for precipitation occurrence and amount modeling in NDMDM. The following five steps have been performed in this study to evaluate the performance of the models:
1- Singularity analysis is carried out in NDMDM. In this study, NCEP variables are used in the model calibration phase, while GCM variables are used in the scenario generation phase. The calibrated models in NDMDM must be checked for possible over-fitting, extrapolation and singular response modes. In this paper, computed precipitation values greater than one hundred times the maximum observed value are considered singular results. Consequently, the model combinations that produce such results are rejected. This threshold is selected based on engineering judgment and can be different for other basins.
2- The absolute relative errors of the mean, standard deviation and skewness are calculated in the dry and wet seasons for each model that passes the previous step.
3- The average, standard deviation and skewness of the errors are calculated for the dry and wet seasons and are used in the evaluation of the NDMDM results.
4- The final error (FER) is calculated using Eq. (6), assuming 3, 2 and 1 as the relative weights of the mean error ($Error_{Mean}$), the standard deviation of errors ($Error_{Std.}$) and the skewness of errors ($Error_{Skw.}$), respectively:
$$FER = \frac{3\,Error_{Mean} + 2\,Error_{Std.} + Error_{Skw.}}{6} \qquad (6)$$
The weights used in this equation are selected based on expert judgment and can be changed by the modeler for different purposes. For example, in the case of extreme precipitation, the weights of skewness and standard deviation may be set greater than the weight of the mean. The current study aims at a general evaluation of precipitation.
5- Finally, the model with the lowest FER value is selected as the best one.
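The singularity check and the FER-based ranking can be sketched in a few lines. The error values below are made up purely for illustration and the function names are hypothetical; the three error terms are assumed to have been computed beforehand as described in the evaluation steps.

```python
import numpy as np

def is_singular(simulated, observed_max, factor=100.0):
    """Flag a model whose output exceeds `factor` times the observed maximum."""
    return bool(np.max(simulated) > factor * observed_max)

def final_error(err_mean, err_std, err_skw, weights=(3.0, 2.0, 1.0)):
    """Weighted average of the three error terms (weights 3, 2 and 1 by default)."""
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w, [err_mean, err_std, err_skw]) / w.sum())

# Hypothetical error terms for three occurrence-amount combinations.
candidates = {
    "MT-MLR": (0.10, 0.20, 0.30),
    "MLR-MLR": (0.08, 0.35, 0.50),
    "MARS-kNN": (0.15, 0.25, 0.20),
}
fer = {name: final_error(*errs) for name, errs in candidates.items()}
best = min(fer, key=fer.get)
print(best)   # MT-MLR
```

Changing the `weights` argument shows how the ranking responds when, for example, skewness is given more weight for extreme-precipitation studies.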
In the next section, the modeling results and the advantages and disadvantages of the proposed methodology are described.

4. Results and discussion
To implement NDMDM, suitable predictors must first be extracted from the pool of meteorological predictors. As described in Section 3.6, the MI index is calculated for different combinations of predictors and predictands to select suitable predictors. The five predictors with the highest MI values for each predictand are selected. The MI values of the selected predictors range from 0.011 to 0.09. The selected predictors are mostly relative humidity and zonal velocity at different geopotential heights without any time lag. The selected predictors are scattered over all nine neighboring grid boxes shown in Fig. 1. Tables 3 and 4 show the selected sets of five and four predictors for the occurrence and amount models, respectively, for all of the stations. The MI values between the predictands and the selected predictors are presented in these tables as well. According to these tables, five and four predictors with a one-day lag have been selected for occurrence and amount simulation, respectively (at the Far. and Khan. stations for occurrence and at the Ars., Khan. and Deh. stations for amount). Relative humidity is the most frequently selected predictor for both the occurrence and amount models at all of the stations. Using NDMDM, a set of occurrence and amount models has been calibrated automatically for each month, and stochastic weather generation has then been performed for all twenty-five possible combinations of the five occurrence models and the five amount models.
The number of generated ensembles in each downscaling simulation is set to 100, and all of the selected models have passed the singularity test explained in the previous section. In the case of GA-SVM, the population size, cross-over rate and mutation rate of the GA have been set to 50, 10% and 80%, respectively; the selected SVM type is epsilon-SVM for both the precipitation amount and occurrence models.
Sample results obtained from NDMDM for various combinations of the amount and occurrence models for Del. station are presented in Table 5. To give a clear picture of the seasonal performance of NDMDM, these results have been categorized into wet (November to April) and dry (May to October) seasons. At Del. station, MT-MLR has been found to be the best combination of occurrence-amount models. The best combinations of NDMDM models (both occurrence and amount models) for the twelve stations are presented in Table 6.
According to Table 6, MT, MARS, MLR and kNN have been selected 17, 4, 2 and 1 times, respectively. The only DM methods selected in occurrence mode are MT and MARS. The most important similarity of these two models is the use of data partitioning in developing their regression-based model structures, and this might be the main reason for their better performance in precipitation occurrence modeling compared with the other nonlinear methods used in this study.
Table 7 compares the statistical characteristics of MLRDM (downscaling using the MLR
method for both occurrence and amount modeling, which is similar to SDSM), the selected
NDMDM models (based on the proposed FER criterion) and the observations at the rain gauge
stations, in the dry and wet seasons and for the calibration and validation periods. Based
on Table 7, the mean values of the downscaled precipitation series estimated by NDMDM have
been closer to the historical mean values than the MLRDM results in 58 and 42 percent of
the selected rain gauge stations in the calibration and validation periods, respectively.
In other words, the overall performances of NDMDM and MLRDM in reproducing mean values are
competitive, although MLRDM performs slightly better than NDMDM.
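A comparison of this kind can be sketched as the share of cases in which one model's statistic lies closer to the observed value than the other's. The function name is hypothetical, and the exact aggregation behind the 58 and 42 percent figures is not stated in the text, so the example uses only the dry-season calibration means of four stations from Table 7 and is not expected to reproduce those percentages.

```python
def fraction_closer(observed, candidate, baseline):
    """Share of cases in which `candidate` is strictly closer to `observed` than `baseline`."""
    wins = sum(
        abs(c - o) < abs(b - o)
        for o, c, b in zip(observed, candidate, baseline)
    )
    return wins / len(observed)


# Dry-season calibration means (mm) for Del., Deh., Kho. and Khar., from Table 7.
obs = [0.28, 0.12, 0.23, 0.07]
ndmdm = [0.20, 0.12, 0.22, 0.09]
mlrdm = [0.35, 0.15, 0.26, 0.11]

share = fraction_closer(obs, ndmdm, mlrdm)
print(f"NDMDM closer to observations in {share:.0%} of these four cases")
```

In this small subset NDMDM is closer for Deh., Kho. and Khar., while MLRDM is closer for Del.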
Table 7 also shows that the standard deviation and skewness values of the downscaled
precipitation series estimated by NDMDM have been closer to the historical values than the
MLRDM results in 89 and 92 percent of the selected rain gauge stations in the calibration
and validation periods, respectively. In other words, NDMDM shows significant superiority
over MLRDM in preserving the historical standard deviation and skewness of the
precipitation series. A comparison of NDMDM performance in the dry and wet seasons also
shows that NDMDM performs better in preserving the historical mean precipitation of the dry
season.
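The seasonal statistics compared in Table 7 can be computed along the following lines. This is a minimal sketch: the wet and dry months follow the definition given in the text, but the function names are hypothetical and the population (moment) estimators used here are an assumption, since the paper does not state which estimators were applied.

```python
from statistics import fmean, pstdev

WET_MONTHS = {11, 12, 1, 2, 3, 4}  # November to April, as defined in the text


def season_of(month):
    """Classify a calendar month (1-12) as 'wet' or 'dry'."""
    return "wet" if month in WET_MONTHS else "dry"


def skewness(values):
    """Population (moment) skewness; the estimator choice is an assumption."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        return 0.0
    return fmean([((v - mean) / std) ** 3 for v in values])


def seasonal_stats(records):
    """records: iterable of (month, daily precipitation in mm) pairs."""
    grouped = {"wet": [], "dry": []}
    for month, amount in records:
        grouped[season_of(month)].append(amount)
    return {
        season: {"mean": fmean(vals), "std": pstdev(vals), "skew": skewness(vals)}
        for season, vals in grouped.items()
        if vals
    }


# Hypothetical daily values, for illustration only.
records = [(1, 0.0), (1, 4.0), (12, 2.0), (7, 0.0), (7, 0.0), (8, 3.0)]
stats = seasonal_stats(records)
print(stats)
```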
The results for the three selected stations (Del., Ras. and Khan.), including the monthly
mean, monthly standard deviation, monthly skewness and q-q plot of observed versus computed
values, are shown in Figs. 3, 4 and 5 for the calibration and validation periods. Both
MLRDM (the MLR-MLR combination, which is similar to SDSM) and NDMDM have performed
acceptably in simulating the monthly mean precipitation values in the calibration and
validation periods. In nearly all of the presented q-q plots a threshold can be found above
which the simulated daily precipitation values are lower than the observations. For
Del. Station this threshold is about 4.5 mm. For Ras. Station it is 8 mm, with the
exception of NDMDM in the validation period. For Khan. Station it is 4 mm, with the
exception of the validation period.
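One way to locate such a threshold numerically is sketched below. The text does not describe how the thresholds were read from the q-q plots, so the rule used here (the smallest observed quantile above which every simulated quantile falls below its observed counterpart) is an assumption, as are the function name and the sample values.

```python
def qq_underestimation_threshold(observed, simulated):
    """Smallest observed quantile above which all simulated quantiles are lower
    than the matching observed quantiles; None if the largest pair is not
    underestimated. Both samples must have the same length."""
    if len(observed) != len(simulated):
        raise ValueError("samples must have the same length")
    pairs = list(zip(sorted(observed), sorted(simulated)))
    threshold = None
    # Walk down from the largest quantile while the model underestimates.
    for obs_q, sim_q in reversed(pairs):
        if sim_q < obs_q:
            threshold = obs_q
        else:
            break
    return threshold


# Hypothetical quantiles (mm), for illustration only.
observed = [1.0, 2.0, 3.0, 4.5, 6.0, 9.0]
simulated = [1.2, 2.1, 3.0, 4.4, 5.0, 7.0]
threshold = qq_underestimation_threshold(observed, simulated)
print(threshold)
```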
Table 8 presents the number of available samples used for occurrence and amount modeling
in the calibration period for Del. Station, together with the number of parameters of the
different NDMDM models. The numbers of parameters in MARS and MT are set automatically
according to the available samples in order to avoid over-fitting. In occurrence simulation
for the calibration period, many samples are available, so models with more parameters than
MLR are also acceptable. In amount simulation for the dry season, which has fewer samples,
MARS and MT automatically determine a suitable number of parameters, as shown in the table.
For example, in June and September, the months with the fewest samples for amount modeling,
both the calibrated MT and MARS models have only one parameter, while MLR has 5 parameters.
GA-SVM has 6 parameters, which can cause over-fitting in the dry season at stations with
few precipitation observations.
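The over-fitting concern can be made concrete with the June values of Table 8. The sketch below computes the number of calibration samples available per model parameter; this ratio is an illustrative diagnostic chosen for the example, not a criterion used in the paper.

```python
# Amount-model calibration at Del. Station in June, from Table 8.
N_SAMPLES_JUNE = 7
PARAMETERS_JUNE = {"MLR": 5, "kNN": 1, "MARS": 1, "MT": 1, "GA-SVM": 6}

# Samples available per fitted parameter (illustrative diagnostic only).
samples_per_parameter = {
    method: N_SAMPLES_JUNE / n_params
    for method, n_params in PARAMETERS_JUNE.items()
}

for method, ratio in samples_per_parameter.items():
    print(f"{method:<7} {ratio:.2f} samples per parameter")
```

MARS and MT keep seven samples for their single parameter, whereas MLR and GA-SVM are left with fewer than two samples per parameter.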
To evaluate the climate change impacts at the studied stations, the SRES A2 and B2
scenarios have been considered for the years 2000 to 2050. The downscaling results of
MLRDM and NDMDM generally differ because the two models calculate different bias
correction and variance inflation factors in the calibration period. Fig. 6 presents the
5-year moving average of precipitation for the A2 and B2 scenarios at Del., Ras. and Khan.
Stations. As can be seen in Fig. 6, except for Khan. Station, NDMDM has produced
significantly different results from MLRDM. For example, the downscaled precipitation for
Del. Station with MLRDM is significantly higher than that of NDMDM, and for Ras. Station it
is vice versa.
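A 5-year moving average of an annual series can be sketched as follows. Whether the windows in Fig. 6 are centered or trailing is not stated, so the function below simply returns one mean per complete window; the function name and the annual totals are hypothetical.

```python
def moving_average(values, window=5):
    """Mean of each complete run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical annual precipitation totals (mm), for illustration only.
annual = [300, 320, 280, 310, 290, 350, 330]
smoothed = moving_average(annual, window=5)
print(smoothed)
```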
Table 9 reports some statistical properties of the annual precipitation estimated by
MLRDM and NDMDM for all of the stations. Based on the table, except for Kas. and Far.
Stations, the NDMDM and MLRDM results have been relatively similar for each of the
scenarios. In other words, if the long-term maximum value in one scenario has increased or
decreased according to one model, the same type of variation has also been predicted by the
other model. This similarity of behavior has also been observed for the minimum values at
all stations except Shan., Deh. and Shoo. Stations. It is also worth mentioning that, in
both scenarios, lower minimum values have been estimated by NDMDM than by MLRDM in almost
75% of the stations. Higher variances have also been estimated by NDMDM for all of the
stations except Khan., Shap. and Del. Stations.
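The statement about the minimum values can be checked against Table 9. The sketch below transcribes the minimum annual precipitation for both scenarios and computes the share of stations at which NDMDM gives the lower value; the variable and function names are assumptions made for this example.

```python
# Minimum annual precipitation from Table 9: station -> (MLRDM, NDMDM).
MIN_A2 = {
    "Del.": (137, 90), "Deh.": (137, 156), "Kho.": (159, 89), "Khar.": (45, 22),
    "Ras.": (689, 1099), "Far.": (693, 546), "Kas.": (792, 592), "Shan.": (707, 782),
    "Khan.": (129, 93), "Shap.": (96, 45), "Shoo.": (35, 28), "Ars.": (83, 47),
}
MIN_B2 = {
    "Del.": (188, 100), "Deh.": (188, 142), "Kho.": (223, 111), "Khar.": (66, 22),
    "Ras.": (609, 1003), "Far.": (533, 464), "Kas.": (690, 492), "Shan.": (733, 600),
    "Khan.": (132, 107), "Shap.": (73, 47), "Shoo.": (32, 51), "Ars.": (118, 70),
}


def share_lower(table):
    """Share of stations where the NDMDM minimum is below the MLRDM minimum."""
    lower = [s for s, (mlrdm, ndmdm) in table.items() if ndmdm < mlrdm]
    return len(lower) / len(table)


share_a2 = share_lower(MIN_A2)
share_b2 = share_lower(MIN_B2)
print(f"A2: {share_a2:.0%}, B2: {share_b2:.0%}")
```

With the tabulated values NDMDM gives the lower minimum at 9 of 12 stations under A2 and at 10 of 12 under B2.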
These results demonstrate the high uncertainty associated with downscaling model structures
in climate change modeling for different climate regions and future scenarios.
5. Conclusions
The results of this study have shown that different combinations of DM methods can
provide good alternative approaches in empirical or regression-based downscaling
simulations. NDMDM has also proved to be a useful software tool for statistical downscaling.
The proposed approach is applicable to downscaling of all meteorological variables, and
there is no restriction on using NDMDM for other variables, although some tuning might be
necessary, for example of the singularity test threshold.
Based on the results of the MARS and MT models, it can be concluded that partitioning the
data into similar groups plays an important role in the statistical downscaling results.
Detection of similarity in the data is therefore highly recommended as one of the most
important pre-processing steps in statistical downscaling.
This study also shows that good performance of NDMDM is not related only to the
complexity of the selected DM methods and their higher numbers of parameters. For example,
GA-SVM provides much better results in amount simulation than MLR, although the two have
almost the same number of parameters. As further evidence, kNN (with only one parameter, k)
is the selected amount model for Shap. Station, while MLR (with 5 parameters) is selected
only twice. The results of this study show that the performance of a statistical
downscaling model is highly related to the modeling concept and to the overall performance
of the occurrence-amount simulation.
Occurrence simulation is very similar to pattern recognition, and regression-based SVM
does not provide good results in pattern detection. SVM for classification is expected to
provide better results in occurrence modeling, but because the SDSM platform has been
used, it has not been an option in this study. The results presented in this paper show
that NDMDM preserves the historical monthly mean of precipitation better in dry seasons
than in wet seasons, and that it estimates the historical monthly standard deviation and
skewness more closely than MLRDM. Overall, the results of this study have shown that NDMDM
can be a useful tool for statistical downscaling of precipitation in semi-arid regions with
high seasonal variability of precipitation and long dry seasons.
The significant uncertainties in the projection of climate change effects on precipitation
in this study show that future work can focus on uncertainty assessment of the MLRDM and
NDMDM models. Such simulations would also help in evaluating the mathematical stability of
the regression models and their parameters. Comparison of the presented approaches in
downscaling other climatic variables, such as temperature, evaporation and the number of
days with events at the stations, is also recommended.
470
21
6. References 471
Aizerman, M., Braverman, E., Rozonoer, L., 1964. Theoretical foundations of the potential 472
function method in pattern recognition learning. Automation and Remote Control 25, 473
821-837, 474
Anandhi, A., Frei, A., Pierson, D.C., Schneiderman, E.M., Zion, M.S., Lounsbury, D., 475
Matonse, A.H., 2011. Examination of change factor methodologies for climate change 476
impact assessment. Water Resour Res, 47, W03501, doi: 10.1029/2010WR009104 477
Bárdossy, A., Plate, E.J., 1992. Space-time model for daily rainfall using atmospheric 478
circulation patterns. Water Resour Res, 28(5), 1247–1259 479
Bárdossy, A., Stehlík, J., Caspary, H-J., 2002. Automated objective classification of daily 480
circulation patterns for precipitation and temperature downscaling based on optimized 481
fuzzy rules. Clim Res, 23, 11–22 482
Bates, B. C., Charles, S.P., Hughes, J.P., 1998. Stochastic downscaling of numerical climate 483
model simulations. Environ Modell Softw., 13, 325–331 484
Battiti, R., 1994. Using mutual information for selecting features in supervised neural net 485
learning. IEEE Transactions on Neural Networks, 5 (4), 537–550. 486
Becker, A., 1976, Simulations of nonlinear flow systems by combining linear models. IAHS 487
116, 135-142 488
Becker, A., Kundzewicz, Z.W., 1987. Nonlinear flood routing with multi linear models. 489
Water Resour Res, 23 (6), 1043-1048 490
Bowden, G. J., Dandy, G. C., Maier, H. R., 2005 a. Input determination for neural network 491
models in water resources applications Part 1—background and methodology. Journal of 492
Hydrology, 301(1-4), 75-92 493
22
Bowden, G. J., Maier, H. R., Dandy, G. C., 2005 b. Input determination for neural network 494
models in water resources applications Part 2. Case study: forecasting salinity in a river. 495
Journal of Hydrology, 301(1-4), 93-107 496
Boser, B. E., Guyon, I. M., Vapnik, V. N., 1992. A training algorithm for optimal margin 497
classifiers. In D. Haussler, editor, 5th Annual ACM Workshop on COLT, Pittsburgh, 498
PA, ACM Press,144-152, 499
Buccola, N.L., Wood, T.M., 2010. Empirical models of wind conditions on Upper Klamath 500
Lake. Oregon. U.S. Geological Survey Scientific-Investigations Report 2010–5201 501
Chen, S.T., Yu, P.S., Tang, Y.H., 2010. Statistical Downscaling of Daily Precipitation using 502
Support Vector Machines and Multivariate Analysis. J Hydrol, 385(1-4), 13-22 503
Coulibaly, P., Baldwine, C. K., 2005. Non stationary hydrological time series forecasting 504
using nonlinear dynamic methods. J Hydrol, 307, 164-174 505
Cover, T. M., 1965. Geometrical and Statistical Properties of Systems of Linear Inequalities 506
with Applications in Pattern Recognition. IEEE Trans. Elec. Comp., EC-14, 326-334. 507
Corte-Real, J., Zhang, X. and Wang, X. 1995. Downscaling GCM information to regional 508
scales: a non-parametric multivariate regression approach. Climate Dynamics, 11, 413–509
424. 510
Enke, W., Spekat, A., 1997. Downscaling climate model outputs into local and regional 511
weather elements by classification and regression. Clim Res, 8, 195-207 512
Faucher, M., Burrows, W.R., Pandolfo, L., 1999. Empirical-statistical reconstruction of 513
surface marine winds along the western coast of Canada. Clim Res, 11, 173-190 514
Fei, Sh. W., Sun, Y., 2008. Forecasting dissolved gases content in power transformer oil 515
based on support vector machine with genetic algorithm. Electric Power Systems 516
Research, 78 (3), 507-514 517
23
Fistikoglu, O., Okkan. U., 2011. Statistical Downscaling of Monthly Precipitation Using 518
NCEP/NCAR Reanalysis Data for Tahtali River Basin in Turkey. J Hydrol. Eng., 16(2), 519
doi:10.1061/(ASCE)HE.1943-5584.0000300 520
Friedman, J.H., Stuetzle, W., 1981. Projection pursuit regression. JASA, 76, 817–823 521
Friedman, J.H., 1991. Multivariate Adaptive Regression Splines. Ann Stat, 19(1), 1–67 522
Friedman, J.H., 1993. Fast MARS. Dept. of Statistics, Stanford University, Technical Report: 523
110, 524
Gangopadhyay, S., Clark, M., Rajagopalan, B., 2005, Statistical Downscaling using K-525
nearest neighbors. Water Resour Res, 41, W02024, doi:10.1029/2004WR003444 526
Guyon, I., Elisseeff, A., 2003. An introduction to variable and feature Selection. Journal of 527
Machine Learning Research, 3, 1157-1182. 528
Herrera, M., Torgo, L., Izquierdo, J., Perez-Garcia, R., 2010. Predictive models for 529
forecasting hourly urban water demand. J Hydrol, 384, 141-150 530
Hessami, M., Gachon, Ph., Ouarda, T.B.M.J., St-Hilaire, A., 2008. Automated regression-531
based statistical downscaling tool. Environ Modell Softw. 23(6), 813-834 532
Hoerl, A.E., Kennard, R.W., 1970. Ridge regression: application to nonorthogonal problems. 533
Technometrics 12 (1), 69-82 534
Jekabsons, G., 2010, ARESLab: Adaptive Regression Splines toolbox for Matlab/Octave. 535
available at http://www.cs.rtu.lv/jekabsons/ 536
Jekabsons, G., 2010. M5PrimeLab: M5' regression tree and model tree toolbox for Matlab. 537
available at http://www.cs.rtu.lv/jekabsons/ 538
Li, X., Sailor, D., 2000. Application of tree-structured regression for regional precipitation 539
prediction using general circulation model output. Clim Res, 16, 17-30 540
Liue, H., Yu, L., 2005. Toward integrating feature selection algorithms for classification and 541
clustering. IEEE Transactions on Knowledge and Data Engineering, 17(4), 491-502. 542
24
Lall, U., Sharma, A., 1996. A nearest neighbor bootstrap for resampling hydrologic time 543
series. Water Resources Research, 32(3), 679-693 544
May, R. J., Dandy, G. C., Maier, H. R., Nixon, J. B., 2008a. Application of partial mutual 545
information variable selection to ANN forecasting of water quality in water distribution 546
systems. Environmental Modeling and Software, 23, 1289-1299 547
May, R. J., Maier, H. R., Dandy, G. C., Fernando, T. G., 2008b. Non-linear variable selection 548
for artificial neural networks using partial mutual information. Environmental Modeling 549
and Software, 23, 1312-1326. 550
Mendes, D., Marengo, J.A., 2010. Temporal downscaling: a comparison between artificial 551
neural network and autocorrelation techniques over the Amazon Basin in present and 552
future climate change scenarios. Theor Appl Climatol, 100 (3-4), 413-421 553
Oliveira, A. L. I., Braga, P.L., Lima, R. M. F., Cornélio, M. L., 2010. GA-based method for 554
feature selection and parameters optimization for machine learning regression applied to 555
software effort estimation, Information and Software Technology, 52(11), 1155-1166 556
Pasini, A., 2009. Neural Network Modelling in Climate Change Studies. Artificial 557
Intelligence Methods in the Environmental Sciences II, 413-421 558
Quinlan, J. R., 1986. Induction on decision trees. Mach Learn 1, 81-106. 559
Quinlan, J. R., 1993. C4.5: Programs for Machine Learning. San Mateo, CA: Morgan 560
Kaufmann 561
Raje, D., Mujumdar, P.P., 2011. A comparison of three methods for downscaling daily 562
precipitation in the Punjab region. Hydrol Process, doi:10.1002/hyp.8083 563
Semenov, M.A., Barrow, E., 1997. Use of stochastic weather generator in the development of 564
climate change scenarios. Clim. Change, 35, 397–414 565
Sharif, M., Burn, D. H., 2006. Simulating climate change scenarios using an improved K-566
nearest neighbor model. Journal of Hydrology, 325, 179-196 567
25
Solomatine, D., Dulal, K. h., 2003. Model trees as an alternative to neural networks in 568
rainfall-runoff modeling. Hydrolog. Sci. J., 48, 399-411 569
Solomatine, D., Xue, Y., 2004, M5 model trees and neural networks: application to flood 570
forecasting in the upper reach of the Huai river in China. J. Hydrol. Eng., 9(6), 275-287 571
Tan, P. N., Steinbach, M., Kumar, V., 2006. Introduction to data mining. Addison Wesley, 572
Tomassetti, B., Verdecchia, M., Giorgi, F., 2009. NN5: A neural network based approach for 573
the downscaling of precipitation fields–Model description and preliminary results. J 574
Hydrol, 367(1-2), 14-26 575
Tripathi, Sh., Srinivas, V. V., Nanjundiah, R. S., 2006. Downscaling of Precipitation for 576
Climate Change Scenarios: A Support Vector Machine Approach. J Hydrol 330(3-4), 577
621-640 578
Xiong, L. H., Shamseldin, A. Y., O’Connor, K. M., 2001. A non-linear combination of the 579
forecasts of rainfall–runoff models by the first-order Takagi-Sugeno. fuzzy system. J 580
Hydrol. 245(1–4), 196–217, 581
Vapnik, V. N., Cortes, C., 1995. Support vector networks. Machine Learning, 20, 273–297. 582
Vapnik, V. N., 1998. Statistical Learning Theory. Wiley, New York. 583
Wang. Y., Witten, I. H., 1997. Induction of model trees for predicting continuous classes. 584
Proceedings of European Conference on Machine Learning, Prague, 128-137 585
Wetterhall, F., Bárdossy, A., Chen, D., Halldin, S., Xu, Ch., 2009, Statistical downscaling of 586
daily precipitation over Sweden using GCM output. Theor Appl Climatol, 96, 95–103 587
Wilby, R. L., Hay, L. E., Leavesley, G. H., 1999. A comparison of downscaled and raw GCM 588
output: implications for climate change scenarios in the San Juan River basin, Colorado. 589
J Hydrol, 225, 67–91 590
Wilby, R. L., Dawson, C. W., Barrow, E. M., 2002. SDSM–a decision support tool for the 591
assessment of regional climate change impacts. Environ Modell Softw, 17, 147–159 592
26
Wilby, R. L., Charles, S. P., Zorita, E., Timbal, B., Whetton, P., Mearns, L. O., 2004. 593
Guidelines for use of climate scenarios developed from statistical downscaling methods. 594
Supporting material of the Intergovernmental Panel on Climate Change, available from 595
the DDC of IPCC TGCIA, 27 596
Yarnal, B., Comrie, A. C., Frakes, B., Brown, D. P., 2001. Developments and prospects in 597
synoptic climatology. Int J Climatol, 21, 1923–1950 598
Yates, D., Gangopadhyay, S., Rajagopalan, B., Strzepek, K., 2003. A Technique for 599
Generation Regional Climate Scenarios using a Nearest-Neighbor Algorithm. Water 600
Resour Res, 39(7), 1199, doi:10.1029/2002WR001769 601
Table 1. Basic information about 12 rain gauge stations (Max.=Maximum and Std.=Standard deviation)
Mean, Max. and Std. are statistical characteristics of observed daily rainfall (mm).
No.  Station code  Station name  Abbr.  Basin          Length of dataset (year)  Longitude (°E)  Latitude (°N)  Mean  Max.  Std.
1    44-014        Delfard       Del.   Jazmoorian     1975–2000                 57.60           29.00          1.20  150   6.31
2    44-009        Dehrood       Deh.   Jazmoorian     1975–2000                 57.73           28.87          0.76  132   4.71
3    44-016        Khoramshahi   Kho.   Jazmoorian     1975–2000                 57.75           29.00          1.32  194   6.95
4    44-024        Kharposht     Khar.  Jazmoorian     1975–2000                 57.83           28.48          0.46  80    3.20
5    17-082        Rasht         Ras.   Sefidrood      1966–2000                 49.60           37.25          3.58  188   10.34
6    17-075        Farshekan     Far.   Sefidrood      1966–2000                 49.58           37.40          3.30  168   9.93
7    18-007        Kasma         Kas.   Mordab-anzali  1966–2000                 49.30           37.31          3.01  317   9.59
8    18-017        Shanderman    Shan.  Mordab-anzali  1966–2000                 49.11           37.41          2.67  177   8.23
9    24-033        Khanzanian    Khan.  Mond           1972–2000                 52.15           29.67          1.27  92    5.26
10   23-011        Shapoor       Shap.  Shapoor-dalky  1972–2000                 51.11           29.58          0.85  75    4.45
11   23-019        Shoorjareh    Shoo.  Shapoor-dalky  1972–2000                 51.98           29.25          0.99  120   4.92
12   43-034        Arsanjan      Ars.   Shapoor-dalky  1972–2000                 51.30           29.92          0.87  111   4.56
Table 2. Large-scale Predictors from NCEP database
No. Predictor Abbreviation
1 Mean Sea Level Pressure mslp
2 Surface Airflow Strength p__f
3 Surface Zonal Velocity p__u
4 Surface Meridional Velocity p__v
5 Surface Vorticity p__z
6 Surface Wind Direction p_th
7 Surface Divergence p_zh
8 500 hpa Airflow Strength p5_f
9 500 hpa Zonal Velocity p5_u
10 500 hpa Meridional Velocity p5_v
11 500 hpa Vorticity p5_z
12 500 hpa Wind Direction p5th
13 500 hpa Divergence p5zh
14 850 hpa Airflow Strength p8_f
15 850 hpa Zonal Velocity p8_u
16 850 hpa Meridional Velocity p8_v
17 850 hpa Vorticity p8_z
18 850 hpa Wind Direction p8th
19 850 hpa Divergence p8zh
20 500 hpa Geopotential Height p500
21 850 hpa Geopotential Height p850
22 Relative Humidity at 500 hpa r500
23 Relative Humidity at 850 hpa r850
24 Near Surface Relative Humidity Rhum
25 Near Surface Specific Humidity Shum
26 Mean Temperature at 2m Temp
Table 3. Selected predictors for occurrence model calibrated for different stations
Station  Predictor  Lag  Longitude (degree)  Latitude (degree)  MI
Del.
rhum 0 56.25 30.00 0.0170
r850 0 56.25 30.00 0.0170
r850 0 60.00 27.50 0.0165
r850 0 56.25 27.50 0.0162
p500 0 52.50 30.00 0.0162
Deh.
r850 0 56.25 27.50 0.0169
r850 0 56.25 30.00 0.0161
r850 0 60.00 27.50 0.0161
rhum 0 56.25 27.50 0.0152
rhum 0 56.25 30.00 0.0152
Kho.
p500 0 52.50 30.00 0.0209
pr850 0 52.50 32.50 0.0207
pr850 0 56.25 27.50 0.0197
pr850 0 60.00 27.50 0.0190
rhum 0 56.25 27.50 0.0181
Khar.
r850 0 56.25 27.50 0.0113
r850 0 56.25 30.00 0.0113
r850 0 60.00 27.50 0.0112
rhum 0 56.25 27.50 0.0111
rhum 0 56.25 30.00 0.0110
Ras.
r850 0 48.75 37.50 0.0493
r850 0 48.75 40.00 0.0429
rhum 0 48.75 37.50 0.0380
r850 0 52.50 37.50 0.0328
p__u 0 48.75 40.00 0.0299
Far.
r850 0 48.75 37.50 0.0490
r850 0 48.75 40.00 0.0456
r850 0 52.50 37.50 0.0379
rhum 0 48.75 37.50 0.0335
r850 1 48.75 40.00 0.0319
Kas.
p__u 0 48.75 40.00 0.0449
r850 0 48.75 37.50 0.0426
r850 0 48.75 40.00 0.0352
r850 0 52.50 37.50 0.0320
rhum 0 48.75 37.50 0.0283
Shan.
p__u 0 48.75 40.00 0.0407
p__z 0 48.75 40.00 0.0394
pr850 0 48.75 37.50 0.0308
pr850 0 48.75 40.00 0.0304
rhum 0 48.75 37.50 0.0258
Khan.
rhum 0 52.50 30.00 0.0436
r850 0 52.50 30.00 0.0436
r850 0 48.75 30.00 0.0366
r850 0 52.50 27.50 0.0350
p8zh 1 48.75 27.50 0.0350
Shap.
r850 0 48.75 30.00 0.0473
r850 0 48.75 32.50 0.0472
r850 0 52.50 27.50 0.0394
r850 0 52.50 30.00 0.0387
rhum 0 52.50 30.00 0.0358
Shoo.
r850 0 48.75 30.00 0.0485
r850 0 52.50 27.50 0.0485
r850 0 52.50 30.00 0.0412
rhum 0 52.50 30.00 0.0384
rhum 0 56.25 30.00 0.0382
Ars.
r850 0 52.50 27.50 0.0459
r850 0 52.50 30.00 0.0459
r850 0 56.25 30.00 0.0389
rhum 0 52.50 30.00 0.0389
rhum 0 56.25 30.00 0.0359
Table 4. Selected predictors for the amount model calibrated for different stations
Station  Predictor  Lag  Longitude (degree)  Latitude (degree)  MI
Del.
p5zh 0 52.50 30.00 0.0515
p5zh 0 56.25 27.50 0.0511
p5_v 0 56.25 30.00 0.0508
p5_v 0 52.50 30.00 0.0489
Deh.
r850 0 56.25 52.50 0.0538
rhum 0 56.25 27.50 0.0521
r850 1 56.25 30.00 0.0520
rhum 1 56.25 30.00 0.0492
Kho.
p5_v 0 52.50 27.50 0.0482
p5_v 0 52.50 30.00 0.0475
p5_v 0 52.50 32.50 0.0471
p5zh 0 52.50 27.50 0.0452
Khar.
r850 0 60.00 27.50 0.0879
r850 0 60.00 30.00 0.0849
rhum 0 56.25 27.50 0.0808
rhum 0 60.00 30.00 0.0771
Ras.
p8th 0 52.50 40.00 0.0243
p__u 0 48.75 37.50 0.0225
p8th 0 52.50 37.50 0.0223
p__u 0 45.00 40.00 0.0215
Far.
p__u 0 45.00 40.00 0.0242
p__u 0 48.75 37.50 0.0217
p8th 0 52.50 40.00 0.0217
r850 0 48.75 37.50 0.0215
Kas.
p__u 0 45.00 37.50 0.0140
p__u 0 48.75 37.50 0.0137
p8_u 0 45.00 37.50 0.0136
p8th 0 45.00 37.50 0.0135
Shan.
p__u 0 45.00 40.00 0.0194
p__z 0 48.75 40.00 0.0175
p8_u 0 45.00 37.50 0.0168
p8th 0 52.50 40.00 0.0166
Khan.
r850 0 52.50 30.00 0.0522
rhum 0 52.50 30.00 0.0522
p8_v 1 52.50 27.50 0.0434
p8_v 0 56.25 30.00 0.0431
Shap.
p8_v 0 56.25 32.50 0.0624
r850 0 52.50 30.00 0.0618
rhum 0 52.50 27.50 0.0509
rhum 0 52.50 30.00 0.0471
Shoo.
r850 0 52.50 30.00 0.0583
r850 0 56.25 30.00 0.0583
rhum 0 52.50 30.00 0.0446
rhum 0 56.25 30.00 0.0445
Ars.
p8_v 0 52.50 27.50 0.0563
p8_v 0 56.25 30.00 0.0489
p8zh 0 52.50 27.50 0.0483
p8_v 1 52.50 27.50 0.0440
Table 5. NDMDM results for Del. Station (based on the validation period)
Columns 4 to 6 refer to the dry season and columns 7 to 9 to the wet season (Std.=Standard deviation).
No.  Occurrence model  Amount model  Dry Mean  Dry Std.  Dry Skewness  Wet Mean  Wet Std.  Wet Skewness  FER
1    MLR               MLR           0.27      0.51      3.92          1.98      2.96      2.59          0.4
2    MLR               kNN           0.25      0.44      4.23          1.93      2.83      3.23          0.4
3    MLR               MARS          0.25      0.41      3.73          1.97      2.83      3.46          0.41
4    MLR               MT            0.27      0.44      3.06          1.94      2.88      3.15          0.43
5    MLR               GA-SVM        0.28      0.53      8.22          2.16      2.76      2.1           0.37
6    kNN               MLR           0.16      0.54      4.33          1.84      3.56      3             0.35
7    kNN               kNN           0.14      0.47      5.1           1.78      3.23      2.94          0.4
8    kNN               MARS          0.15      0.58      11.64         1.87      3.68      5.51          0.27
9    kNN               MT            0.16      0.49      4.22          1.78      3.51      4.82          0.35
10   kNN               GA-SVM        0.17      0.64      10.34         1.93      3.23      2.17          0.29
11   MARS              MLR           0.75      1.81      2.96          1.93      3.56      3.41          0.6
12   MARS              kNN           0.67      1.62      3.15          1.88      3.34      3.79          0.48
13   MARS              MARS          0.77      1.87      2.86          1.97      3.92      9.19          0.61
14   MARS              MT            0.8       1.94      3.06          1.89      3.49      4.74          0.66
15   MARS              GA-SVM        0.82      1.97      2.96          2.08      3.21      2.39          0.71
16   MT                MLR           0.17      1.16      12.27         2.07      5.02      2.82          0.07
17   MT                kNN           0.14      0.99      13.89         2.05      4.9       2.95          0.15
18   MT                MARS          0.15      1.11      17.73         2.11      4.98      3.59          0.13
19   MT                MT            0.17      1.03      8.97          2.07      4.93      2.99          0.09
20   MT                GA-SVM        0.22      2.15      25.04         2.22      4.87      2.25          0.26
21   GA-SVM            MLR           0.29      0.7       9.56          1.75      1.46      1.07          0.42
22   GA-SVM            kNN           0.24      0.54      8.41          1.82      1.47      1.34          0.43
23   GA-SVM            MARS          0.25      0.53      10.29         1.85      1.38      1.34          0.43
24   GA-SVM            MT            0.27      0.51      5.73          1.81      1.6       3.79          0.45
25   GA-SVM            GA-SVM        0.29      0.64      12.28         2.13      1.58      1.78          0.42
Observation                          0.2       1.48      9.93          2.33      8.72      6.23          -
*The selected combination of occurrence and amount models (No. 16, MT-MLR) is marked in gray in the original table.
Table 6. Best combination of occurrence and amount models in the twelve rain gauge stations
No.  Station  Occurrence model  Amount model
1 Del. MT MLR
2 Deh. MT MARS
3 Kho. MT MT
4 Khar. MARS MT
5 Ras. MARS MT
6 Far. MT MT
7 Kas. MARS MT
8 Shan. MT MT
9 Khan. MT MT
10 Shap. MT kNN
11 Shoo. MT MLR
12 Ars. MT MT
Table 7. Comparison of statistics of daily precipitation values downscaled by MLRDM and NDMDM (Std.=Standard deviation, Skw.=Skewness)
Column groups, left to right: Calibration Dry Season, Calibration Wet Season, Validation Dry Season, Validation Wet Season.
Station / Model  Mean Std. Skw.  Mean Std. Skw.  Mean Std. Skw.  Mean Std. Skw.
Del.
MLRDM 0.35 0.89 5.35 2.18 4.22 4.68 0.35 0.84 3.82 2.26 4.17 3.28
NDMDM 0.2 0.99 6.58 2.04 5.36 3.61 0.17 1.16 12.27 2.07 5.02 2.82
Obs. 0.28 2.58 22.40 2.12 8.47 8.20 0.20 1.48 9.93 2.33 8.72 6.23
Deh.
MLRDM 0.15 0.54 9.03 1.49 2.99 4.16 0.18 0.54 5.74 1.50 2.86 2.94
NDMDM 0.12 0.73 9.54 1.48 4.58 4.95 0.11 0.64 7.77 1.59 4.63 4.3
Obs. 0.12 1.59 19.96 1.47 6.78 7.76 0.08 0.95 20.12 1.29 5.32 5.94
Kho.
MLRDM 0.26 0.77 10.91 2.61 4.89 3.38 0.21 0.63 6.55 2.72 4.84 2.81
NDMDM 0.22 1.17 7.37 2.45 7.21 5.9 0.16 0.97 7.65 2.29 6.12 4.05
Obs. 0.23 1.99 14.27 2.55 9.63 6.77 0.20 1.52 10.84 2.08 9.28 8.45
Khar.
MLRDM 0.11 0.37 8.17 0.90 1.85 3.66 0.13 0.39 4.42 0.97 1.90 3.18
NDMDM 0.09 0.41 18.76 0.89 2.57 6.12 0.12 0.63 14.62 0.9 2.61 6.95
Obs. 0.07 1.02 24.68 0.87 4.55 7.95 0.12 1.53 22.28 0.76 3.63 6.39
Ras.
MLRDM 2.89 5.51 4.08 4.05 5.95 2.35 3.01 6.36 3.91 4.30 6.60 2.57
NDMDM 3.01 6.96 5.21 4.01 6.97 4.04 3.04 7.87 5.04 4.16 7.76 3.89
Obs. 3.00 10.28 6.62 4.16 10.33 4.71 3.44 11.31 6.06 3.77 9.38 4.12
Far.
MLRDM 2.66 5.36 4.01 3.70 5.36 2.29 2.64 5.93 3.72 3.91 5.93 2.51
NDMDM 2.63 7.41 4.68 3.67 7.34 3.32 1.68 6.17 6.11 3.74 7.85 3.53
Obs. 2.69 9.52 6.29 3.79 9.64 4.92 3.16 11.47 6.77 3.78 10.20 4.54
Kas.
MLRDM 3.04 5.02 3.31 2.93 4.17 2.48 2.79 5.55 3.68 2.74 4.32 2.88
NDMDM 3.14 7.06 5.72 3.03 5.44 4.2 2.88 7.72 5.81 2.89 5.59 4.12
Obs. 3.02 11.47 9.49 3.06 7.84 4.57 3.07 10.24 5.84 2.76 7.15 4.46
Shan.
MLRDM 2.81 4.13 2.86 2.48 2.90 1.83 2.63 4.63 3.10 2.37 3.18 2.86
NDMDM 2.8 6.82 4.72 2.55 5.2 5.76 2.12 6.52 5.93 2.47 5.6 6.78
Obs. 2.81 9.72 7.00 2.54 6.59 6.97 2.87 9.45 6.35 2.40 6.08 4.50
Khan.
MLRDM 0.20 0.72 7.13 2.33 4.12 2.82 0.21 0.70 6.11 2.36 4.47 3.16
NDMDM 0.14 1.08 12.85 2.3 5.21 3.39 0.08 0.57 11.86 2.35 5.64 3.69
Obs. 0.15 1.61 16.93 2.39 7.08 4.58 0.12 1.20 17.40 2.48 7.29 3.81
Shap.
MLRDM 0.07 0.53 22.94 1.51 3.40 3.58 0.06 0.37 12.27 1.58 3.52 3.55
NDMDM 0.06 0.7 20.78 1.5 4.01 3.69 0.03 0.5 23.65 1.33 3.81 3.95
Obs. 0.07 1.20 24.90 1.57 5.87 5.70 0.03 0.49 18.24 1.90 6.77 5.31
Shoo.
MLRDM 0.07 0.30 10.73 1.80 3.68 3.21 0.07 0.24 5.41 1.75 3.60 3.29
NDMDM 0.07 0.64 15.6 1.86 4.43 2.92 0.06 0.63 18.41 1.82 4.46 2.95
Obs. 0.07 0.91 19.45 1.83 6.53 6.59 0.06 0.74 19.86 2.24 7.53 5.01
Ars.
MLRDM 0.24 0.90 7.25 1.59 3.85 4.85 0.37 1.38 6.77 1.82 4.15 3.83
NDMDM 0.08 0.55 11.91 1.56 4.7 6.55 0.07 0.49 11.03 1.69 5.11 5.85
Obs. 0.09 1.19 23.65 1.62 6.10 6.99 0.05 0.71 23.11 1.79 6.78 5.89
Table 8. Number of parameters for the amount and occurrence models and number of available samples at Del. Station in the calibration period
DM Method  Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Occurrence simulation
MLR 6 6 6 6 6 6 6 6 6 6 6 6
kNN 1 1 1 1 1 1 1 1 1 1 1 1
MARS 29 16 31 29 31 34 29 21 31 31 34 29
MT 48 50 54 29 17 9 16 22 7 17 11 40
GA-SVM 6 6 6 6 6 6 6 6 6 6 6 6
No. of Samples 620 565 612 570 589 570 589 589 578 620 600 620
Amount simulation
MLR 5 5 5 5 5 5 5 5 5 5 5 5
kNN 1 1 1 1 1 1 1 1 1 1 1 1
MARS 6 6 1 1 6 1 1 14 1 1 1 31
MT 9 8 5 2 3 1 3 1 1 1 1 7
GA-SVM 6 6 6 6 6 6 6 6 6 6 6 6
No. of Samples 87 110 121 47 25 7 13 26 7 18 16 69
Table 9. Maximum, minimum and variance of annual precipitation of all stations for different future scenarios
Station / Statistics  A2 MLRDM  A2 NDMDM  B2 MLRDM  B2 NDMDM
Del.
Maximum 610 497 520 452
Minimum 137 90 188 100
Variance 10692 10554 6090 8259
Deh.
Maximum 610 805 520 797
Minimum 137 156 188 142
Variance 10692 20140 6090 16655
Kho.
Maximum 585 535 582 442
Minimum 159 89 223 111
Variance 7988 10074 4616 4982
Khar.
Maximum 226 311 209 229
Minimum 45 22 66 22
Variance 1730 3380 1168 2226
Ras.
Maximum 1242 2264 1628 2786
Minimum 689 1099 609 1003
Variance 19241 63680 36767 89604
Far.
Maximum 1293 1225 1673 2106
Minimum 693 546 533 464
Variance 23853 30720 41407 91291
Kas.
Maximum 1323 1470 1486 1325
Minimum 792 592 690 492
Variance 18707 31380 25873 33943
Shan.
Maximum 1311 1632 1332 2201
Minimum 707 782 733 600
Variance 13901 46507 17757 84526
Khan.
Maximum 576 533 501 490
Minimum 129 93 132 107
Variance 12005 8839 8659 7614
Shap.
Maximum 462 262 382 261
Minimum 96 45 73 47
Variance 5480 3035 4168 2552
Shoo.
Maximum 93 188 90 193
Minimum 35 28 32 51
Variance 237 1220 197 994
Ars.
Maximum 540 581 416 453
Minimum 83 47 118 70
Variance 10241 12360 6562 7824
Fig. 1. Location map of rain gauge stations in Jazmoorian, Sefidrood, Mordab-anzali, Shapoor-dalky and Mond basins
[Fig. 2 flowchart, recovered text. Inputs: selected predictand and selected predictors. Set model structure: an occurrence model and an amount model, each chosen from MLR, kNN, MARS, MT and SVM-GA. For each day, in each ensemble and using each model: generate a uniformly distributed random number in [0,1]; if the random number is less than the occurrence model output, precipitation occurs, otherwise it does not occur; calculate the amount model's output; add a normally distributed random number to the output (Mean=0, Std.=Standard Error); compare the result with the threshold. Finally, select the best combination of occurrence and amount models.]
Fig. 2. NDMDM procedure
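The daily generation step of the NDMDM procedure in Fig. 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the NDMDM implementation: the function name and arguments are hypothetical, the occurrence and amount model outputs are passed in as plain numbers, and treating a value that does not exceed the threshold as a dry day is an assumption, since the flowchart only states that the result is compared with the threshold.

```python
import random


def simulate_day(occurrence_output, amount_output, standard_error,
                 threshold=0.0, rng=random):
    """One day of one ensemble member, following the steps listed for Fig. 2."""
    u = rng.random()  # uniformly distributed random number in [0, 1)
    if u >= occurrence_output:
        return 0.0  # precipitation does not occur
    # Add normally distributed noise (mean 0, std = standard error of the model).
    noisy_amount = amount_output + rng.gauss(0.0, standard_error)
    # Assumption: results at or below the threshold are treated as a dry day.
    return noisy_amount if noisy_amount > threshold else 0.0


# 100 ensemble members for a single day with hypothetical model outputs.
rng = random.Random(0)
ensemble = [simulate_day(0.3, 5.0, 1.0, rng=rng) for _ in range(100)]
wet_members = sum(value > 0 for value in ensemble)
print(f"{wet_members} of {len(ensemble)} members are wet")
```

Passing a seeded `random.Random` instance keeps the ensemble reproducible, which is convenient when different occurrence-amount combinations are compared on the same random draws.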
Fig. 3. Downscaling results for Del. rain gauge: a) Monthly mean, b) Monthly standard deviation, c) Monthly skewness and d) q-q plot of monthly precipitation (left column: calibration period, right column: validation period)
[Panels a-1, a-2, b-1, b-2, c-1 and c-2: Month (1 to 12) versus Monthly Mean (mm), Monthly Std. (mm) and Monthly Skewness; series: Observed, MLRDM, NDMDM. Panels d-1 and d-2: Observed Monthly Precip. (mm) versus Modeled Monthly Precip. (mm); series: MLRDM, NDMDM, Bisector.]
Fig. 4. Ras. rain gauge: a) Monthly mean, b) Monthly standard deviation, c) Monthly skewness and d) q-q plot of monthly precipitation (left column: calibration period, right column: validation period)
[Panels a-1, a-2, b-1, b-2, c-1 and c-2: Month (1 to 12) versus Monthly Mean (mm), Monthly Std. (mm) and Monthly Skewness; series: Observed, MLRDM, NDMDM. Panels d-1 and d-2: Observed Monthly Precip. (mm) versus Modeled Monthly Precip. (mm); series: MLRDM, NDMDM, Bisector.]
Fig.5. Downscaling results for Khan. rain gauge
a) Monthly mean, b) Monthly standard deviation, c) Monthly skewness and d) q-q plot of
monthly precipitation (left column: calibration period, right column: validation period)
[Figure panels a-1 to d-2. Panels a to c: monthly mean (mm), monthly std. (mm) and monthly skewness versus month (1 to 12) for Observed, MLRDM and NDMDM. Panel d: modeled versus observed monthly precip. (mm) for MLRDM and NDMDM, with the bisector line.]
Fig.6. 5-year moving average of annual precipitation for the A2 and B2 scenarios at a) Del., b) Ras. and c) Khan. stations
[Figure panels a) to c). Annual precip. (mm) versus year (2005 to 2050) for MLRDM-A2, MLRDM-B2, NDMDM-A2 and NDMDM-B2.]
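A 5-year moving average of an annual precipitation series, as plotted in Fig.6, can be sketched as below. The source does not say whether the window is trailing or centered, so a trailing window is assumed here. The function name and the sample totals are illustrative only.

```python
def moving_average(values, window=5):
    """Trailing moving average; returns one value per complete window."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Made-up annual precipitation totals in mm.
annual_precip = [310.0, 280.0, 350.0, 295.0, 330.0, 300.0, 270.0]
smoothed = moving_average(annual_precip, window=5)
print(smoothed)
```

With a trailing window, the first smoothed value corresponds to the fifth year of the series, so the smoothed curve is four points shorter than the input.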
Highlights
A regression-based downscaling method such as SDSM has been coded in MATLAB.
The Nonlinear Data-Mining Downscaling Model (NDMDM) has been programmed as a toolbox.
Four nonlinear and semi-nonlinear data-mining methods have been implemented in NDMDM.
Twelve rain gauges in different climates have been used as case studies.
According to three statistical indices, NDMDM results were better than those of the statistical downscaling method.