

Robust parametric models of runoff characteristics at the mesoscale

Luis Samaniego a,*, András Bárdossy b

a Institute of Regional Development Planning, University of Stuttgart, Stuttgart D-70569, Germany
b Institute of Hydraulic Engineering, University of Stuttgart, Stuttgart, Germany

Received 20 February 2004; revised 28 July 2004; accepted 3 August 2004

Abstract

Many hydrologic studies report that runoff characteristics such as means or extremes of a given basin may be modified by climatic and/or land use/cover changes, and that the magnitude of these changes largely depends on the geographic location and the scale at which the study is carried out. Identifying the main causes of variability at the mesoscale, however, is a challenging task because data on the spatial distribution of relevant explanatory variables are often lacking and, where they exist, are highly uncertain. This study proposes a general method to find a robust non-linear model by solving a constrained multiobjective optimization problem whose solution space is composed of all feasible combinations of given explanatory variables. As a result, a model that simultaneously fulfills several criteria such as parsimony, robustness, significance, and overall performance is expected. Furthermore, the method requires no assumptions regarding the sampling distributions of either the parameters or the estimators because their p-values are estimated by a non-parametric technique. Finally, there is no limitation with respect to the functional form adopted for a given model and its estimator because a generalized reduced gradient algorithm is used for the calibration of its parameters. The proposed method was tested in the upper catchment of the Neckar River (Germany), which covers an area of approximately 4000 km². The objective of this study was to detect trends and responses of runoff characteristics in mesoscale catchments due to changes of climatic or land use/cover conditions. In this case, the explained variables are the specific total discharge in summer and winter, whereas the explanatory variables comprise several physiographic, land cover and climatic characteristics evaluated for 46 subcatchments during the period 1961–1993. The results of the study indicate a significant gain in performance and robustness of the selected models compared to traditional stepwise methods. The method may also be applicable to other disciplines and/or locations.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Runoff; Multiobjective optimization; Cross-validation; Permutation test; Mallows’ Cp* statistic

Journal of Hydrology 303 (2005) 136–151
www.elsevier.com/locate/jhydrol
0022-1694/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.jhydrol.2004.08.022

* Corresponding author.
E-mail address: [email protected] (L. Samaniego).

1. Introduction

In general, the purpose of modeling is to simulate a part of ‘reality’ or a system using a set of rules and algorithms that resemble the behavior and relationships of the observed variables. By doing so, a modeler may gain expertise, get a deep understanding of the underlying processes and their mutual interactions, forecast future trends and estimate likely outcomes of plausible scenarios (Casti, 1984). However, how complex should a model be to describe an observed ‘reality’, for instance, one aimed at describing a given characteristic of the runoff process at the mesoscale? In order to answer this question one must take into account three crucial issues, namely: (1) the data availability, (2) the wise selection of relevant observables, and (3) the level of predictability of the model. Let us consider these issues in greater detail in order to formulate the objective of this study.

1 But not in the sense of continuous rainfall-runoff modeling.
2 That is to say, those basins whose length ranges from $10^2$ to less than $10^5$ m and with an area less than 5000 km² (Dooge, 1988).

First, the data availability, as pointed out by Wilby (1997), must be ‘carefully considered’ in any modeling exercise, especially if its output (i.e. a model) is projected to have a practical application or perhaps to become a planning tool (e.g. one to be used in environmental or regional planning applications). This implies that a model should have variables that can be obtained or derived either from existing databases or by direct surveying; in other words, it must avoid variables that cannot be estimated because the technical capabilities are lacking, or because their acquisition is too costly, too complex, or even impossible. If these guidelines are not observed, a model, perhaps interesting from a theoretical point of view, would be impractical and most probably misleading in the realm of planning.

The second and third points mentioned above are closely related and can be summarized as follows: a chosen model should exhibit the minimum number of parameters (i.e. be parsimonious); the relationship among its explanatory variables and the explained variable should be as simple as possible; the selected explanatory variables should be as few as possible but should explain as much as possible of the observed variability of the phenomenon represented by the explained variable (e.g. a given runoff characteristic); all its variables should be statistically significant; and finally, it should be resistant to outliers, which are very likely to occur in a given sample.

This paper presents a method for the selection and validation of robust parametric models, which is based on long-established stepwise regression methods and on non-parametric and cross-validation techniques.

2. Defining the formal system

There are a number of examples in the literature,

e.g. Chow (1984), Rodriguez-Iturbe (1969), Raudkivi

(1979), Clarke (1994) and Abdulla and Lettenmaier

(1997), in which a characteristic of the water cycle

was related to a set of appropriate explanatory

variables. In general, these methods regard the

intervening variables as time independent. That

means that the sample used for the calibration and

validation of the model is composed of a set of

constant information and a relevant statistic of a runoff

characteristic (e.g. a long-term mean, a percentile, a

maximum, or a minimum) for a set of basins.

In this study, in contrast, the main goal is to formulate a parametric model that simulates the development of a runoff characteristic over time¹ for a given set of mesoscale basins² based on statistically significant variables representing the main processes involved in the water cycle at this scale. This implies that the variables employed in the subsequent analysis are time series rather than averages over a fixed period. In order to achieve the objective mentioned above it is helpful to recall the following definition: the hydrologic cycle within a drainage basin is a ‘sequential, dynamic system in which water is the major throughput’ (Chow, 1984). This system is dynamic because it comprises several intertwined spatial phenomena, or processes, that are changing constantly over time. It is sequential because there are inputs, an output, and a working fluid (i.e. water), called throughput, passing through the system. Consequently, the available information can be divided, for analytical purposes, into two major categories, namely: output or explained variable, and inputs or explanatory variables. The latter can, in turn, be further subdivided into three main subcategories, namely: (1) physiographical factors, (2) shares of land cover types, and (3) climatic or meteorological factors.


Physiographical factors comprise all those variables that can be regarded as constant or quasi-static; put differently, those variables for which the period needed to appreciate a significant change has an order of magnitude greater than $10^5$ years. These factors comprise basin and channel characteristics such as: geological formations that constitute the basin’s underground; the basin’s soil layers and their specific soil types; and geometric factors of the drainage basin such as slope, aspect, shape, size, elevation, and drainage density (Chow, 1984).

Shares of land cover types exhibit, in general, a slow rate of change over time (apart from some local exceptions, land use and land cover seldom change by more than 5% per year (Robinson et al., 1998)). The order of magnitude of the time interval necessary to perceive a significant change in their values varies from place to place, but in general it ranges from $10^0$ to $10^1$ years. These variables stand for the observable consequences of anthropogenic activities happening within a basin.

Finally, climatic or meteorological factors are those variables characterized by extreme variations in their order of magnitude in very short periods. The period in which a significant change can be expected ranges from $10^1$ to even less than $10^{-4}$ years (Kleeberg and Cemus, 1992). In general, these factors exhibit some periodicity combined with partly chaotic and stochastic behaviors. This category comprises the following variables: precipitation, evaporation, solar radiation, temperature, and atmospheric circulation patterns (closely related with relative humidity and wind velocity, among others).

The system described above can be formally written as follows. Let $Q^t_{il}$ be a given observed runoff characteristic $l$, or output variable, for a given basin $i$ at time point $t$; then $Q^t_{il}$ can be written as a function of relevant observables, namely

$$Q^t_{il} = f(G^t_i, U^t_i, M^t_i, \beta) + \varepsilon^t_i, \quad \forall\, i = 1, \ldots, n; \ \forall\, t = 1, \ldots, T \qquad (1)$$

where $G^t_i = [x^t_{i,1}, x^t_{i,2}, \ldots, x^t_{i,g}]^T$ denotes a vector of size $g$ composed of those observables that describe the physiographic characteristics of a given basin $i$ at time point $t$; it is assumed that $G^{t+1}_i \cong G^t_i$, $\forall\, i = 1, \ldots, n$, $\forall\, t = 1, \ldots, T-1$. $U^t_i = [x^t_{i,g+1}, x^t_{i,g+2}, \ldots, x^t_{i,g+u}]^T$ denotes a vector of size $u$ composed of input variables that describe the land cover and land use states of a given basin $i$ at time point $t$; $M^t_i = [x^t_{i,g+u+1}, x^t_{i,g+u+2}, \ldots, x^t_{i,g+u+m}]^T$ is a vector of size $m$ composed of input variables that describe the climatic conditions of basin $i$ at time point $t$. $J = g + u + m$ is the total number of explanatory variables available in a given sample; $f(\cdot)$ is a non-linear function of the previous variables, to be determined; $\beta$ is a vector of size $p^*$ composed of parameters to be estimated; and $\varepsilon^t_i$ is an independent and identically distributed additive error.

It is assumed that these variables are known for $n$ basins and $T$ time points, which need not be consecutive; thus, for each variable, a sample composed of at most $n_0 \le nT$ observations is known. It should be noted that these variables have to be evaluated at intervals of at least half a year to avoid serial autocorrelation.

3. Method

3.1. Notation

Let $\tilde{p}$ be a vector of indexes denoting which variables are included in a given model $f(\cdot)$ having $p^*$ parameters; here, $p_j$ denotes the $j$th element of $\tilde{p}$. Let $p$ denote the cardinality of $\tilde{p}$ or, in other words, the number of explanatory variables of such a model. Let $C_{p^*}(\tilde{p})$ denote the Mallows’ statistic associated with a model having the set of variables $\{x^t_{ip_1}, x^t_{ip_2}, \ldots, x^t_{ip_p}\}$, and $c$ a given threshold value. Additionally, let the statistic $\Theta_{p_j}$ denote a convenient measure of dependence between the variable $x^t_{ip_j}$ and $Q^t_{il}$ given a function $f(\cdot)$ under the conditions of the null hypothesis $H_0^{(j)}$ (i.e. variables $Q^t_{il}$ and $x^t_{ip_j}$ are independent in $\mathbb{R}^{p+1}$). Let $w$ be the value of the test statistic based on the available data and $\alpha$ an adequate level of significance, say 10%; and finally, let $F_k(\tilde{p})$, $k = 1, 2$, be two cross-validation indicators evaluated independently for a given model but whose parameters $\beta_k$ were calibrated by minimizing two different estimators $L_k$.

3.2. Estimation of the Mallows’ $C_{p^*}$ statistic

Finding the trade-off between the number of variables included in a model ($p$) and its explanatory power is a crucial point during the model building process. As a matter of fact, the greater $p$ is, the better the fit and the greater the value of $R^2$; hence, many statistics have been proposed to counterbalance this negative effect, for instance the adjusted $\bar{R}^2$ (Ezekiel, 1930), the Mallows’ $C_{p^*}$ statistic (Mallows, 1973), and Akaike’s Information Criterion (Akaike, 1973). Compared with the adjusted $\bar{R}^2$, the $C_{p^*}$ criterion has the advantage that, in addition to adjusting the sum of squared errors, its expectation can be shown to equal the number of parameters used in the model (Daniel and Wood, 1980), or

$$E[C_{p^*}] = p^*. \qquad (2)$$

This means that the closer the value of $C_{p^*}$ is to $p^*$, the smaller the bias of the fitted model and, hence, the better the model fit. This statistic therefore guides the selection of a model that is composed of the minimum number of variables but explains, as much as possible, the observed variability in the observations. Using this property and a given threshold $c$, a subset of best performing models can be identified, as shown in Fig. 1. As a rule of thumb, $c$ can be made equal to the maximum number of explanatory variables in the saturated model, i.e. $c = p$.

Fig. 1. $C_{p^*}$ vs. $p^*$ plot depicting the subsets of potential models (POT) satisfying the constraint $C_{p^*} \le c = 13$ for winter and summer, i.e. points under the dashed line. In this case $c$ is equal to the maximum number of explanatory variables in the saturated model.

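The Mallows’ statistic for a model composed of $p$ variables is

$$C_{p^*}(\tilde{p}) = \frac{1 - R^2_{p^*}(\tilde{p})}{1 - R^2_{J^*}}\,(n_0 - J) + 2p^* - n_0, \qquad (3)$$

where

$$R^2_{p^*}(\tilde{p}) = 1 - \frac{\sum_{t=1}^{T}\sum_{i=1}^{n}\left(Q^t_{il} - \hat{Q}^t_{il}(\tilde{p})\right)^2}{\sum_{t=1}^{T}\sum_{i=1}^{n}\left(Q^t_{il} - \frac{1}{n_0}\sum_{t=1}^{T}\sum_{i=1}^{n} Q^t_{il}\right)^2}, \qquad (4)$$

$R^2_{J^*}$: equal to $R^2_{p^*}$ if $p = J$ and $p^* = J^*$; in other words, the coefficient of determination associated with a model containing all available input variables (i.e. $J$),
$n_0$: the total number of observations,
$i$: an index related to a given basin,
$l$: an index for an observed runoff characteristic,
$t$: a time index,

and

$$\hat{Q}^t_{il}(\tilde{p}) = f(x^t_{ip_1}, x^t_{ip_2}, \ldots, x^t_{ip_p}; \beta_k). \qquad (5)$$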
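As a minimal sketch of how Eqs. (3) and (4) might be evaluated once observed and modeled values are available, the Python functions below compute $R^2$ and the Mallows’ statistic. The function and argument names (`r_squared`, `mallows_cp`, `n_vars_full`, `n_params_subset`) are illustrative assumptions and do not come from the paper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, following Eq. (4)."""
    n0 = len(observed)
    mean_obs = sum(observed) / n0
    ss_res = sum((q - q_hat) ** 2 for q, q_hat in zip(observed, predicted))
    ss_tot = sum((q - mean_obs) ** 2 for q in observed)
    return 1.0 - ss_res / ss_tot


def mallows_cp(r2_subset, r2_full, n0, n_vars_full, n_params_subset):
    """Mallows' statistic, following Eq. (3).

    r2_subset       -- R^2 of the candidate model with p* parameters
    r2_full         -- R^2 of the saturated model (all J variables)
    n0              -- number of valid observations
    n_vars_full     -- J, number of variables in the saturated model
    n_params_subset -- p*, number of parameters of the candidate model
    """
    ratio = (1.0 - r2_subset) / (1.0 - r2_full)
    return ratio * (n0 - n_vars_full) + 2 * n_params_subset - n0
```

A candidate model would then be retained when `mallows_cp(...) <= c` for the chosen threshold `c`.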

Provided a function $f(\cdot)$, $\beta_k$ can be estimated for each proposed model by minimizing the estimator $L_k$ given by

$$\min_{\beta_k} L_k = \sum_{t=1}^{T}\sum_{i=1}^{n} w^t_i \left|Q^t_{il} - \hat{Q}^t_{il}\right|^{\varphi}, \qquad (6)$$

where

$w^t_i$: a factor corresponding to a spatial unit $i$ at time point $t$, introduced to correct heteroscedasticity if present in the data set or to diminish the influence of outliers in the estimation of the model’s parameters; hence, it contributes to improving the model’s robustness. This factor is estimated as follows:

$$w^t_i = \begin{cases} 1 & \text{if } \left|\varepsilon^t_i / \sigma_\varepsilon\right| \le Z_c \\ 0 & \text{if } \left|\varepsilon^t_i / \sigma_\varepsilon\right| > Z_c \end{cases} \qquad (7)$$

$\sigma_\varepsilon$: the estimated sample standard deviation of the random errors, provided that the expectation of $\varepsilon^t_i$ is zero, $\bar{\varepsilon} = E[\varepsilon^t_i] = 0$:

$$\sigma_\varepsilon = \sqrt{\frac{1}{n_0 - 1}\sum_{t}\sum_{i}\left(\varepsilon^t_i\right)^2} \qquad (8)$$

$Z_c$: a threshold value normally ranging from 2 to 3 (Rousseeuw and Leroy, 1987),
$\varphi$: an exponent usually equal to 1 or 2.
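The weighting scheme of Eqs. (6)–(8) can be illustrated with a short Python sketch. It assumes that residuals from some preliminary fit are already available; the names `residual_weights` and `estimator`, the default `z_c=2.5` (an arbitrary value inside the 2 to 3 range quoted above), and the guard for a zero standard deviation are assumptions made for this example only.

```python
import math


def residual_weights(residuals, z_c=2.5):
    """0/1 weights of Eq. (7), using the standard deviation of Eq. (8)."""
    n0 = len(residuals)
    sigma = math.sqrt(sum(e * e for e in residuals) / (n0 - 1))
    if sigma == 0.0:
        # Assumption: a perfect fit has no outliers, so keep every observation.
        return [1] * n0
    return [1 if abs(e / sigma) <= z_c else 0 for e in residuals]


def estimator(observed, predicted, weights, phi=1):
    """Weighted estimator L_k of Eq. (6); phi is the exponent (1 or 2)."""
    return sum(
        w * abs(q - q_hat) ** phi
        for q, q_hat, w in zip(observed, predicted, weights)
    )
```

With `phi=1` the estimator sums absolute deviations and with `phi=2` squared deviations; observations whose standardized residual exceeds `z_c` receive weight zero and therefore do not influence the calibration.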

3 Often called permutation test; see Davison and Hinkley (1997) for more details.

3.3. Significance test

The inclusion of statistically significant variables in a model is of key importance to find ‘good’ but ‘simple’ models among the numerous possibilities given a set of predictors. The main reason for this is that a non-significant variable will only increase the total variance without increasing the goodness of fit of the model. Otherwise stated, it will only add noise to the system that, in turn, will deteriorate the explanatory power of other significant predictors. In order to perform a significance test within the context of this study the following definitions are necessary. Let the set of observations (i.e. a sample) of an observed runoff characteristic $l$ be denoted by

$$D = \{(Q^t_{il}, x^t_{ij}) : i = 1, \ldots, n;\ t = 1, \ldots, T;\ j \in (p_1, \ldots, p_p)\}, \qquad (9)$$

whose cardinality (i.e. the number of valid observations) is

$$n_0 = |D| \le nT. \qquad (10)$$

Based on $D$, assume that $Q^t_{il}$ can be predicted by a model using $p$ explanatory variables linked by a known functional $f(\cdot)$ and a vector of calibrated parameters $\beta_k$, as in (5). In this case, there would be $p$ null hypotheses $H_0$ and their respective alternatives $H_A$ that require testing. The objective of the $j$th null hypothesis $H_0^{(j)}$ is to test whether the variable $x_j$ in model (5) is independent of the explained variable $Q^t_{il}$, considering the multivariate joint distribution function in which this model is defined; in other words, to infer, based on the previous sample, that there is no evidence at a given level of significance $\alpha$ that the variable $x_j$ was chosen by chance when such a model was assessed. Consequently, a measure of the discrepancy between the data and the null hypothesis (i.e. a test statistic $\Theta$) should be identified in order to perform these tests. There are many possible choices for such a statistic, but the simplest measure of dependence between the variables described above is the estimator $L_k$ (6), because it would take a large value under the null hypothesis and, conversely, a small one if the null hypothesis is not true.

In the present study, a non-parametric test³ can be used to obtain a simulated sampling distribution of $\Theta$ from which the significance probabilities (i.e. p-values) for each respective hypothesis are estimated. Algorithm 1 shows the steps needed to carry out this significance test.

Algorithm 1. Significance test

(1) Let $\varphi = k \in \{1, 2\}$.
(2) Given a functional $Q^t_{il} = f(x^t_{ip_1}, x^t_{ip_2}, \ldots, x^t_{ip_p}; \beta_\varphi) + \varepsilon^t_i$ and the sample $D$, estimate $\beta_k$ so that $\min L_k$. Set the test statistic $w = L_k$.
(3) For all $j \in \{p_1, \ldots, p_p\}$,
    (a) For $r = 1, \ldots, R$,
        (i) Generate $\{x^{t*}_{ij}\}$ as a random permutation of $\{x^t_{ij}\}$, where $i = 1, \ldots, n$ and $t = 1, \ldots, T$;
        (ii) Generate the simulated data set $D^*_r$ by replacing $\{x^t_{ij}\}$ with $\{x^{t*}_{ij}\}$;
        (iii) Based on $D^*_r$, estimate $\beta^*_{kr}$ so that $\min L^*_{kr}$. The value of the statistic for the simulated data set (if $H_0^{(j)}$ is true) is then $w^*_r = L^*_{kr}$.
    (b) Sort $w$ among $w^*_r$, $\forall\, r = 1, \ldots, R$, so that
        $$w^*_{(1)} \le \cdots \le w^*_{(r-1)} \le w \le w^*_{(r)} \le \cdots \le w^*_{(R)}. \qquad (11)$$
    (c) Estimate a one-sided Monte Carlo p-value by
        $$\text{p-value} \approx p_{mc} = \frac{r - 1}{R + 1}. \qquad (12)$$
    (d) Select a level of significance (say, $\alpha = 5\%$); then
    (e) Make a decision:
        If p-value $\le \alpha$, then
            - Reject $H_0^{(j)}$ in favor of $H_A^{(j)}$ at the level of significance $\alpha$, and
            - Conclude that, at this level of significance, variables $Q^t_{il}$ and $x^t_{ij}$ are not independent.
        Else, $H_0^{(j)}$ cannot be rejected at this level of significance.

4 The sensitivity of a model to the presence of outliers in the data.
5 Those models that satisfy the constraints given in (16)–(21).

Here $R$ is the number of realizations carried out in the permutation test. As a rule of thumb, Davison and Hinkley (1997) have suggested that a reasonable estimate of the p-value can be obtained when $R$ is greater than or equal to 500. In the present case, the convergence of the p-value was always achieved when $R \approx 500$ (Samaniego, 2003).

Since the observations of the sample $D$ are independent in space (each $i$ corresponds to a different basin), and assuming that the temporal autocorrelation can be neglected because of the time span used in the evaluation (e.g. semi-annual intervals), a random permutation of the vector $\{x^t_{ij}\}$ for a given $j$ [see step (3a.i)] can be obtained as follows: (1) generate a vector of uniformly distributed random numbers $\{y_k\}$, $k = 1, \ldots, n_0$; (2) associate a random number $y_k$ with each $x^t_{ij}$; (3) sort $\{y_k\}$ in ascending order so that $y_{(1)} \le \cdots \le y_{(n_0)}$; (4) rearrange $x^t_{ij}$ according to the relative ordering in $\{y_{(k)}\}$ so that the random permutation $\{x^{t*}_{ij}\} = \{x^{(t)}_{(i)j}\}$ is obtained.
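The sort-by-random-key procedure described above can be sketched in a few lines of Python. The function name `permute_by_random_keys` and the use of an explicit `random.Random` generator are assumptions of this example; the observations for one variable are taken to be stored in a flat list of length $n_0$.

```python
import random


def permute_by_random_keys(values, rng):
    """Return a random permutation of `values` by sorting uniform random keys."""
    # (1)-(2) attach one uniformly distributed key y_k to each observation
    keys = [rng.random() for _ in values]
    # (3) sort the positions by their keys in ascending order
    order = sorted(range(len(values)), key=keys.__getitem__)
    # (4) rearrange the observations according to that ordering
    return [values[k] for k in order]


shuffled = permute_by_random_keys([0.4, 1.2, 3.3, 0.9], random.Random(42))
```

The result contains exactly the same values as the input, only in a different order, which is what step (3a.i) of Algorithm 1 requires.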

In the previous algorithm the set $\{w^*_{(1)}, \ldots, w^*_{(r)}, \ldots, w^*_{(R)}\}$ constitutes a good approximation to the null distribution of the statistic $w$. Based on this simulated sampling distribution and the percentile method [see steps (3b) and (3c)], the standard hypothesis testing method is applied [see steps (3d) and (3e)] to infer whether enough evidence exists in favor of $H_0^{(j)}$ or against it. This procedure is repeated for every variable $j$ belonging to a given model.
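A heavily simplified sketch of the significance test may help to fix ideas. It assumes a single explanatory variable, replaces the non-linear calibration by an ordinary least-squares straight line, and uses the sum of squared residuals (the estimator with $\varphi = 2$ and all weights equal to one) as test statistic. The names `line_fit_loss` and `permutation_p_value` are hypothetical; none of this is the implementation used in the study.

```python
import random


def line_fit_loss(xs, ys):
    """Fit y = a + b*x by least squares and return the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


def permutation_p_value(xs, ys, loss=line_fit_loss, n_perm=500, seed=0):
    """One-sided Monte Carlo p-value in the spirit of Eq. (12)."""
    rng = random.Random(seed)
    w = loss(xs, ys)                 # statistic for the observed sample
    shuffled = list(xs)
    below = 0                        # plays the role of r - 1 in Eq. (11)
    for _ in range(n_perm):
        rng.shuffle(shuffled)        # break any link between x and y
        if loss(shuffled, ys) <= w:  # re-calibrate on the permuted sample
            below += 1
    return below / (n_perm + 1)
```

A small p-value means that permuting the variable almost always worsens the fit, which is evidence against independence; for a model with several variables the same loop would be repeated for each variable $j$ while the others are left untouched.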

3.4. Model validation

The purpose of this section is to assess the robustness⁴ of a given model, e.g. $\hat{Q}^t_{il}(\tilde{p})$. Here, two Jackknife statistics, hereafter called objective functions, are calculated in order to fulfill this objective.

The first objective function, denoted by $F_1$, is estimated based on the given model whose parameters $\beta_1$ are obtained by minimizing the estimator $L_1$, whereas the second one, $F_2$, is estimated based on a model containing the same variables and functional relationship as the previous one, but whose parameters $\beta_2$ are obtained by minimizing $L_2$. The estimators $L_1$ and $L_2$ can be calculated as shown in (6), with $\varphi = 1$ and $\varphi = 2$, respectively. It is worth mentioning that the former is remarkably more robust than the latter, as was demonstrated by Rousseeuw and Leroy (1987). Put differently, each objective function independently assesses the quality of a given model with regard to a given estimator.

The sensitivity to outliers, here symbolized by $F_1$ and $F_2$, of each feasible model⁵ composed of $p$ variables is estimated by a cross-validation technique (Efron, 1982; Simonoff, 1996) that is a special case of the Jackknife Method introduced by Quenouille (1949) and Tukey (1982). Algorithm 2 describes the procedure used in this study to validate a given model $\hat{Q}^t_{il}(\tilde{p})$.

Algorithm 2. Model validation

(1) Given a runoff characteristic $l$ and $\varphi = k \in \{1, 2\}$.
(2) For $i = 1, \ldots, n$,
    (a) For $t = 1, \ldots, T$,
        (i) Let $E^t_i = \{(Q^t_{il}, x^t_{ij}) : j \in (p_1, \ldots, p_p)\}$ be a subset of observations for a given $i$ and $t$. Eliminate the subset $E^t_i$ from the original sample so that the new subset is $\tilde{D} = D - E^t_i$;
        (ii) Using $\tilde{D}$, estimate $\tilde{\beta}_k$ so that $\min \tilde{L}_k$;
        (iii) Estimate $\tilde{Q}^t_{il} = f(x^t_{ip_1}, x^t_{ip_2}, \ldots, x^t_{ip_p}; \tilde{\beta}_k)$;
        (iv) Calculate the Jackknife statistic for the observation $i$, $t$ as follows
            $$q^t_i = \left(Q^t_{il} - \tilde{Q}^t_{il}\right)^2. \qquad (13)$$
(3) Calculate the objective functions for a given model by
    $$F_k(\tilde{p}) = \sum_{i=1}^{n}\sum_{t=1}^{T} q^t_i, \quad q^t_i \ge 0. \qquad (14)$$

The interpretation of the value estimated by (14) is as follows: the smaller its value, the more robust a model is with regard to the disturbances caused by outliers.
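The leave-one-out loop of Algorithm 2 can be sketched as follows. As a stand-in for the calibrated non-linear model, the example assumes a one-variable straight line fitted by least squares; `fit_line` and `jackknife_objective` are hypothetical names, and the inputs are plain Python lists with one entry per observation $(i, t)$.

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope


def jackknife_objective(xs, ys, fit=fit_line):
    """Sum of squared leave-one-out prediction errors, as in Eqs. (13)-(14)."""
    total = 0.0
    for k in range(len(xs)):
        # (i) drop observation k from the sample
        xs_rest = xs[:k] + xs[k + 1:]
        ys_rest = ys[:k] + ys[k + 1:]
        # (ii) re-calibrate the model without it
        intercept, slope = fit(xs_rest, ys_rest)
        # (iii)-(iv) predict the left-out value and accumulate the squared error
        total += (ys[k] - (intercept + slope * xs[k])) ** 2
    return total
```

A sample containing a gross outlier yields a much larger value than a clean one, which is the behavior the objective functions are meant to penalize.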

6 A combinatorial problem that is solvable in nondeterministic polynomial time (Hartmann and Rieger, 2002; Coello et al., 2002).

3.5. Problem definition

The goal of this study, based on the previous definitions, can be formalized as a constrained multiobjective optimization problem as follows:

Find $\tilde{p} = [p_1, \ldots, p_p]^T$ so that

$$\min\ [F_1(\tilde{p}), F_2(\tilde{p})]^T \qquad (15)$$

subject to

$$\{x^t_{ip_j} : 1 \le p_j \le g\} \subset G^t_i, \qquad (16)$$

$$\{x^t_{ip_j} : g + 1 \le p_j \le g + u\} \subset U^t_i, \qquad (17)$$

$$\{x^t_{ip_j} : g + u + 1 \le p_j \le J\} \subset M^t_i, \qquad (18)$$

$$C_{p^*}(\tilde{p}) \le c, \qquad (19)$$

$$\Pr(\Theta_{p_j} \le w \mid H_0^{(j)}) \le \alpha \quad \forall\, p_j \in \tilde{p}, \qquad (20)$$

$$J > p \ge 3. \qquad (21)$$

$\tilde{p}$ is to be determined by a simultaneous minimization of both objective functions $F_k$, $k = 1, 2$, subject to the constraints given by (16)–(21), i.e. by finding a Pareto optimum. This optimization problem, therefore, aims to single out a model that belongs to the feasible region and, additionally, exhibits the slightest sensitivity to outliers regardless of the estimator employed for the calibration of its parameters.

The aim of these constraints is threefold: (1) to ensure that each subset of explanatory variables of the system has a cardinality of at least one; (2) to reduce as much as possible the cardinality of the solution space of a given problem composed of $J$ explanatory variables; and (3) to guarantee that all variables of a given model are statistically significant.
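The structural constraints (16)–(18) and (21) can be checked without fitting any model. The sketch below enumerates index sets that contain at least one variable from each of the three groups and satisfy $J > p \ge 3$; it reads constraints (16)–(18) as "at least one variable per group", following the first aim stated above. The function name `feasible_index_sets` is an assumption, and constraints (19) and (20) are deliberately left out because they require calibrated models.

```python
from itertools import combinations


def feasible_index_sets(g, u, m):
    """Yield index tuples (1-based) satisfying constraints (16)-(18) and (21)."""
    total = g + u + m                       # J
    indices = range(1, total + 1)
    for p in range(3, total):               # (21): J > p >= 3
        for subset in combinations(indices, p):
            has_physio = any(j <= g for j in subset)            # (16)
            has_cover = any(g < j <= g + u for j in subset)     # (17)
            has_climate = any(j > g + u for j in subset)        # (18)
            if has_physio and has_cover and has_climate:
                yield subset


candidates = list(feasible_index_sets(g=2, u=1, m=2))
```

For this toy case with five variables, eight index sets survive, far fewer than the $2^5$ unconstrained subsets.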

4. Searching for a ‘robust’ model

The multiobjective combinatorial optimization problem presented in (15) is NP-complete.⁶ This means that no algorithm is known that solves (15) in a running time growing only polynomially with the number of variables $J$; an exhaustive search, for instance, grows exponentially with $J$. Therefore, if $J$ is big, say more than 25, ‘good’ solutions for this problem can only be found by heuristic approaches such as Simulated Annealing, Genetic Algorithms, or Neural Networks, among others. On the other hand, if the number of variables is small, say 13 or less, the ‘optimum’ solution can be found after estimating all feasible combinations (i.e. an enumerative approach), since the running time of such an algorithm is still ‘acceptable’. If a given problem lies between those thresholds, the chosen method would largely depend on how big the sample size is and how many realizations for the significance test are required. A big sample size (say $n_0 > 1000$) would indicate that heuristic approaches should be used. Since this paper intends to show the feasibility of the proposed method (i.e. to solve the problem explicitly written in (15)), it is convenient to keep the number of variables below the lower threshold so that the enumerative approach can be used.

Finding an ‘optimum’ solution for a multiobjective problem implies that one has to make compromises or trade-offs between the objectives. For instance, a solution that has the highest value in the first objective function but the smallest one in the second objective function is considered far from the ‘optimum’. This kind of optimality was originally proposed by Edgeworth (1881) and later generalized by Pareto (1896). A formal definition of Pareto optimality can be found in Coello et al. (2002). Since both objective functions are commensurable [see (13) and (14)], the simplest technique to find the global minimum of the problem stated in (15) is to minimize a weighted sum of the components (other approaches are also possible; see Sarker et al. (2002) or Coello et al. (2002)). Formally, this can be written as

$$\min F(\tilde{p}) = \sum_{k=1}^{2} \omega_k F_k(\tilde{p}), \quad \forall\, k\ \ \omega_k \ge 0, \qquad (22)$$

where $\omega_k$ are weighting coefficients to be selected. In this case they were chosen equal to one because both objective functions are equally important. This technique, which is the oldest among mathematical programming methods for solving multiobjective optimization problems, is commonly used in scientific and engineering problems despite its shortcomings (Das and Dennis, 1997), probably because of its simplicity and because it can be derived from the Kuhn–Tucker conditions for non-dominated solutions (Kuhn and Tucker, 1951). Algorithm 3 describes the searching technique employed in this study.

Algorithm 3. Searching technique

(1) Select a function $f(\cdot)$ for a given runoff characteristic $l$, e.g. potential, multilinear, or a combination of both.

(2) Calibrate all possible models⁷ (i.e. $\min L_k$) given a set of variables $(x^{t}_{i1}, x^{t}_{i2}, \ldots, x^{t}_{iJ})$ that satisfy constraints (16)–(18), using two estimators $L_k$, $k = 1, 2$: one with $\varphi = 1$ and another with $\varphi = 2$.

(3) Select all models whose $C_p^{*}(\vec{p}) \le c$ for each estimator. These models constitute the subset of the best-performing ones estimated for a given $\varphi$.

(4) Calculate, for the previously selected subsets of models, the objective functions $F_k(\vec{p})$, $k = 1, 2$; then estimate $F$ as in (22).

(5) Rank the models in ascending order with regard to $F$. The model with the minimum value of $F$ is chosen as the most robust model for the given functional type.

(6) Repeat Steps (1)–(5) if necessary (e.g. if another function is to be tested).

(7) If several functions are tested, the most suitable function is the one exhibiting the minimum $F$ among those attempted.

(8) Check that all variables constituting the most robust model are statistically significant, i.e. that their p-values are less than 10% for both estimators.

(9) Additional quality measures such as BIAS, MSE, MAE or RMSE (Bardossy, 1993; Lettenmaier and Wood, 1993) can be employed for further screening of less robust models when there are competing models, i.e. models that fulfill all constraints and have very similar values of the aggregated objective function $F$.

7 The number of possible models given $J$ explanatory variables is $2^J$.
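Steps (2)–(5) of Algorithm 3 can be sketched as an exhaustive loop over variable subsets. This is only an illustration under stated assumptions: the three callables `calibrate`, `cp_star`, and `objectives` are hypothetical placeholders for the calibration, the $C_p^{*}$ criterion, and the objective functions defined earlier in the paper, and the toy stand-ins at the end are invented purely to make the example run.

```python
from itertools import combinations


def search(variables, calibrate, cp_star, objectives, cp_max, weights=(1.0, 1.0)):
    """Rank every non-empty variable subset by the aggregated objective F.

    calibrate(subset, phi) -> fitted parameters   (step 2, user supplied)
    cp_star(subset, params, phi) -> float         (step 3, user supplied)
    objectives(subset, fits) -> (F1, F2)          (step 4, user supplied)
    """
    ranked = []
    for size in range(1, len(variables) + 1):
        for subset in combinations(variables, size):
            # Step 2: calibrate once per estimator (phi = 1 and phi = 2).
            fits = {phi: calibrate(subset, phi) for phi in (1, 2)}
            # Step 3: discard the model unless it passes the bound for both estimators.
            if any(cp_star(subset, fits[phi], phi) > cp_max for phi in (1, 2)):
                continue
            # Step 4: aggregate the two objective functions as in (22).
            f1, f2 = objectives(subset, fits)
            ranked.append((weights[0] * f1 + weights[1] * f2, subset))
    # Step 5: ascending order, so the most robust model comes first.
    ranked.sort()
    return ranked


# Toy stand-ins, invented for illustration only.
toy_cost = {7: 0.2, 8: 0.5, 20: 0.1}
ranked = search(
    variables=[7, 8, 20],
    calibrate=lambda subset, phi: None,
    cp_star=lambda subset, params, phi: len(subset),
    objectives=lambda subset, fits: (sum(toy_cost[j] for j in subset), 1.0 / len(subset)),
    cp_max=2,
)
```

The loop visits the $2^J - 1$ non-empty subsets, so the cost grows exponentially with the number of candidate variables, which is why the variable screening described in Section 5.3 matters in practice.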

Since the randomization test used in this study is a computationally intensive technique, it is applied only to those models that satisfy step (5) of Algorithm 3. The calibration of the parameters carried out in step (7) was done with a Generalized Reduced Gradient (GRG) technique (Wolfe, 1963; Abadie and Carpentier, 1969), which has been implemented in many Fortran subroutines (e.g. Lasdon et al., 1978). The GRG algorithm is based on a robust implementation of the BFGS quasi-Newton algorithm. This procedure requires a non-linear, convex, and continuously differentiable objective function such as (6), an iterative search procedure that employs a Hessian matrix estimated by central differences, and a quadratic extrapolation technique that searches for local minima. Moreover, in order to ease and speed up the convergence of the solution, the domains of the input data $Q^{t}_{il}$ and $x^{t}_{ij}$, originally in $[0, \mathbb{R}^{+}]$, were re-scaled to the interval $[\varepsilon, 1]$. Values originally equal to zero were replaced by a very small positive number (e.g. $\varepsilon = 1 \times 10^{-10}$) in order to avoid indeterminate expressions.

All parameters after the optimization were


transformed back to their original domains, with the exception of the estimators $L_k$; thus $L_1 > L_2$, as can be seen in Table 2.
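The re-scaling step can be illustrated with a short sketch. It assumes that the re-scaling is a division by the maximum of each series, which is consistent with the note to Table 2 but is not spelled out in the text; the function names are hypothetical.

```python
EPS = 1e-10  # small positive stand-in for zero, as in the text


def rescale(values, eps=EPS):
    """Divide by the maximum so values fall in [eps, 1]; zeros become eps."""
    vmax = max(values)
    if vmax <= 0:
        raise ValueError("need at least one positive value")
    return [max(v / vmax, eps) for v in values], vmax


def restore(scaled, vmax, eps=EPS):
    """Map re-scaled values back to the original domain; eps maps back to zero."""
    return [0.0 if s <= eps else s * vmax for s in scaled]


def estimator_to_original(l_scaled, q_max, phi):
    """Back-transform an estimator value as L_k * max(Q) ** phi (note to Table 2)."""
    return l_scaled * q_max ** phi
```

Keeping all inputs in a common range avoids badly conditioned gradients when explanatory variables with very different units (km², mm, K) enter the same model.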

5. Application

Fig. 2. Map showing the location of the Upper Neckar Catchment within the State of Baden-Württemberg, Germany.

5.1. The study area

The proposed method was tested in the upper catchment of the Neckar River upstream of the Plochingen gauging station, covering an area of approximately 4000 km². As shown in Fig. 2, the Study Area is located to the south and southeast of Stuttgart, Germany. Its elevation above sea level ranges from 240 to 1014 m, with a mean of 546 m. Slopes are generally mild; 90% of the area has slopes between 0 and 15°, although some areas in the Swabian Jura or in the Black Forest may have values as high as 50°. The climate of the Study Area can be classified as Cf according to Köppen's notation. This climatic type is characterized by warm-to-hot summers and generally mild winters, and it is wet in all seasons. The coldest and hottest months in the Study Area are January and July, respectively. The daily mean air temperature is about −0.8 °C in the former and about 17 °C in the latter (for the period from 1961 to 1990, DWD⁸). Although the climate of the area is moderate, a maximum annual range of about 47.4 °C has been observed in past decades. The annual variation of precipitation in the Study Area exhibits a multimodal distribution. Precipitation events may occur all year round; the rainiest month is June and the driest is October, with monthly means of 126 and 64 mm, respectively (1961–1995, DWD). The mean annual precipitation observed during this period is 908 mm.

With regard to land use, the Study Area has undergone rapid transitions from cropland or grassland to either built-up areas or industrial uses since the early 1960s.

8 German Meteorological Service.

5.2. Data availability and variable definition

The basic information for the Study Area was

obtained from several sources, namely:

The output or explained variables in this study are the cumulated specific runoff in the winter and summer seasons, $Q_1$ and $Q_2$, respectively. These variables are estimated for each basin $i$ and time point $t$ based on the time series of mean daily flows from midnight to midnight, which were obtained from LfU⁹ and DWD for 46 gauging stations within the Study Area from Nov 1, 1961 to Oct 31, 1993. Many other runoff characteristics can also be estimated as described in Samaniego (2003); in this study, however, only these two are used to show how to apply the proposed method.

The physiographic variables were derived from: (1) a digital elevation model with a spatial resolution of 30 × 30 m (LfU); (2) a digitized soil map at the scale 1:200,000 (LfU); and (3) a digitized geological map at the scale 1:600,000 (LfU).

The land cover data were obtained from two main sources: topographical maps at the scale 1:25,000 for 1961 (LVA¹⁰) and three LANDSAT scenes for the years 1975, 1984, and 1993 (LfU). These images have a spatial resolution of 30 × 30 m and were reclassified into three land cover classes: forest, impervious, and permeable cover.

The climatological variables of daily precipitation and temperature were obtained for 288 meteorological stations in Baden-Württemberg from Nov 1, 1961 to Oct 31, 1993 (LfU and DWD). This information was subsequently interpolated by External Drift Kriging with a spatial resolution of 300 × 300 m (Bardossy, 1999).

Based on this basic information, a number of indicators or predictors were derived for each subcatchment and time point within the Study Area (i.e. 46 subcatchments from 1961 to 1993, hence $n = 46$ and $T = 33$), as displayed in Table 1. For more details on how to estimate each indicator, please refer to Samaniego (2003). The two samples used in this study (one for winter and one for summer) each contain about $n' \approx 1000$ observations (after excluding outliers and years with no information). They include basins whose areas range from a few square kilometers to about 4000 km².

9 Institute for Environmental Protection Baden-Württemberg.

10 State Surveying Agency Baden-Württemberg.

Table 1
Definition and notation of input and output variables for the study area

| † Factor | Name | Unit | Description |
|---|---|---|---|
| Q | $Q_{i1}$ | mm | Total discharge in winter, $l = 1$ |
| | $Q_{i2}$ | mm | Total discharge in summer, $l = 2$ |
| G | $x_{i1}$ | km² | Area of the catchment $i$ |
| | $x_{i2}$ | ° | Mean catchment slope |
| | $x_{i3}$ | ° | Median of the catchment's slope |
| | $x_{i4}$ | ° | Trimmed mean slope F(15)–F(85) |
| | $x_{i5}$ | ° | Trimmed mean slope F(30)–F(70) |
| | $x_{i6}$ | ° | Mean slope of the stream network |
| | $x_{i7}$ | ° | Mean slope in floodplains |
| | $x_{i8}$ | 1/km | Drainage density |
| | $x_{i9}$ | – | Shape factor |
| | $x_{i10}$ | – | Fraction of north-facing slopes |
| | $x_{i11}$ | – | Fraction of south-facing slopes |
| | $x_{i12}$ | m | Mean elevation of the catchment |
| | $x_{i13}$ | m | Difference between max and min elevation within a catchment |
| | $x_{i14}$ | – | Fraction of saturated areas |
| | $x_{i15}$ | mm | Mean field capacity |
| | $x_{i16}$ | – | Fraction of karstic formations |
| U | $x_{i17}$ | – | Mean fraction of forest cover |
| | $x_{i18}$ | – | Mean fraction of impervious cover |
| | $x_{i19}$ | – | Mean fraction of permeable cover |
| M | $x_{i20}$ | mm | Cumulative winter precipitation |
| | $x_{i21}$ | mm | Cumulative summer precipitation |
| | $x_{i22}$ | mm | Mean winter precipitation |
| | $x_{i23}$ | mm | Mean summer precipitation |
| | $x_{i24}$ | mm | Maximum antecedent precipitation index in winter |
| | $x_{i25}$ | mm | Maximum antecedent precipitation index in summer |
| | $x_{i26}$ | K | Mean temperature in January |
| | $x_{i27}$ | K | Mean temperature in July |
| | $x_{i28}$ | K | Maximum temperature in January |
| | $x_{i29}$ | K | Maximum temperature in July |
| | $x_{i30}$ | K | Maximum antecedent temperature index in winter |
| | $x_{i31}$ | K | Maximum antecedent temperature index in summer |

5.3. Model definition

Based on the variables shown in Table 1, it can be concluded that the solution space is large since there are 25 explanatory variables (i.e. $J = 25$). Knowing that each of these explanatory variables is correlated with the rest to some degree, it was decided to retain for further analysis only those variables of each subcategory that have the highest correlation coefficient with the explained variable and that are least correlated within each subcategory. As a result of this screening, not only was the solution space reduced but the multicollinearity of the explanatory variables was also minimized. In the present case, however, it is not recommended to have a total number of variables less than seven, because the dimensionality of the present system is about seven (Samaniego, 2003). The selected subsets of variables for winter and summer are

$$\{x_j : j = 7, 8, 9, 11, 12, 14, 15, 16, 17, 18, 19, 20, 26\} \tag{23}$$

and

$$\{x_j : j = 7, 9, 10, 13, 14, 15, 16, 17, 18, 19, 21, 29\}, \tag{24}$$

respectively.

In this study, three convex and continuously differentiable functions are investigated. The first is a potential model (shortened to POT) that treats all explanatory variables as having non-linear relationships with the explained variable. The second model type, hereafter called MLP1, regards the climatic variables $x_{20}$ and $x_{21}$ as the only ones having a non-linear relationship with the explained variable, whilst the rest are considered linearly related to it. Lastly, the third model type (shortened to MLP2) regards the land cover variables as the only ones exhibiting linear relationships with the output variable. These models can be written explicitly as

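$$Q^{t}_{il} = b_0 \prod_{j} \left(x^{t}_{ij}\right)^{b_j} + \varepsilon^{t}_{i}, \tag{25}$$

$$Q^{t}_{il} = b_0 + \sum_{j \ne j'} b_j x^{t}_{ij} + b_{j'} \left(x^{t}_{ij'}\right)^{b_{j'}} + \varepsilon^{t}_{i} \tag{26}$$

and

$$Q^{t}_{il} = b_0 + \sum_{j \in U} b_j x^{t}_{ij} + b_{J^{*}} \prod_{j \notin U} \left(x^{t}_{ij}\right)^{b_j} + \varepsilon^{t}_{i}. \tag{27}$$

Here

$$U = \{x_j : j = 17, 18, 19\}, \qquad l = 1, 2, \qquad j, j' \in \{p_1, \ldots, p_p\},$$

$$j' = \begin{cases} 20 & \text{if } l = 1 \\ 21 & \text{if } l = 2 \end{cases}, \qquad J^{*} = p + 1,$$

and $b_0$, $b_j$, $b_{J^{*}}$ are the coefficients to be optimized.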
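The three functional forms can be written compactly as plain functions. The sketch below follows (25)–(27) as printed, omits the error term, and uses hypothetical names; variables and coefficients are passed as dictionaries keyed by the variable index $j$, which is a representation chosen here for illustration only.

```python
from math import prod


def pot(x, b0, b):
    """Potential model, Eq. (25): b0 * prod_j x_j ** b_j."""
    return b0 * prod(x[j] ** b[j] for j in b)


def mlp1(x, b0, b, j_prime):
    """Eq. (26) as printed: linear in every x_j except x_{j'}, which enters as a power term."""
    linear = sum(b[j] * x[j] for j in b if j != j_prime)
    return b0 + linear + b[j_prime] * x[j_prime] ** b[j_prime]


def mlp2(x, b0, b, b_star, linear_set=(17, 18, 19)):
    """Eq. (27): linear in the land cover variables (set U), potential in the rest."""
    linear = sum(b[j] * x[j] for j in b if j in linear_set)
    power = prod(x[j] ** b[j] for j in b if j not in linear_set)
    return b0 + linear + b_star * power


# Made-up inputs, only to show the call pattern.
x = {7: 2.0, 17: 0.2, 20: 3.0}
q_pot = pot(x, 2.0, {7: 1.0, 20: 2.0})
q_mlp1 = mlp1(x, 1.0, {7: 0.5, 20: 2.0}, j_prime=20)
q_mlp2 = mlp2(x, 1.0, {7: 2.0, 17: 0.5}, b_star=2.0)
```

Because only the variables present in the coefficient dictionary are used, the same functions can evaluate any of the variable subsets enumerated during the search.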

It should be noted that in this paper (as opposed to other studies, e.g. Abdulla and Lettenmaier (1997)) the error term $\varepsilon^{t}_{i}$ in (25)–(27) is additive. This is possible here because the explained variables are specific discharges rather than absolute values, which depend on basin size. This, in turn, enables catchments of various sizes to be used for the calibration of a given model.

5.4. Results and discussion

The results summarized in Table 2 were obtained by applying Algorithm 3 to the available data, with the aim of obtaining two robust models for the total discharge in winter and in summer, respectively. It should be noted, however, that Table 2 shows only the three best models for each function type, ordered by decreasing robustness (out of a total of 49,146 models generated and evaluated for winter and summer, respectively).
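One combination of quantities from this section reproduces the reported total. This is an inference from the figures given here, not a statement made by the authors: taking the 13 winter variables in (23), every non-empty subset, the three function types, and the two estimators gives

$$\underbrace{(2^{13} - 1)}_{8191\ \text{non-empty subsets}} \times \underbrace{3}_{\text{function types}} \times \underbrace{2}_{\text{estimators}} = 49{,}146.$$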

The optimized parameters for winter and summer are shown in Table 3. In general, the signs of these coefficients agree with physical intuition about this natural system. For instance, precipitation and mean slope should have a positive sign: the higher their values, the larger the specific discharge from a given basin. Field capacity, on the contrary, should have a negative sign because the higher its average value, the larger the quantity of water stored in the soil matrix and, hence, the lower the expected runoff.

In particular, the negative signs of the land cover variables in winter can be explained by the following hydrological considerations. Forest and permeable surfaces (e.g. grassland, cropland, or meadows) tend to have both higher evapotranspiration and higher infiltration rates than impervious surfaces. Additionally, the overall roughness of the former is higher than that of the latter; hence, longer concentration times and smaller

runoff volumes can be expected. This assertion has been confirmed by long-term controlled catchment experiments in several locations around the globe and with different tree species. Studies carried out or reported by Law (1956), Bosch and Hewlett (1982), Kirby et al. (1991), Eeles and Blackie (1993), and Jones (1997) indicate that afforestation leads to a considerable reduction of annual runoff yield. For this reason, forest and permeable cover tend to reduce the seasonal specific yield, and hence a negative sign should be expected in the case of a linear submodel (MLP2).

Table 2
Sample of the best models for total discharge in winter and in summer

No. x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 x20 x21 x26 x29 Cp* L1 F1 L2 F2 Obs.
Winter
Potential models: POT
3729 1 1 1 1 1 1 12.6 20.55 0.992 0.967 0.999
3829 1 1 1 1 1 1 1 1 1 9.5 20.24 1.004 0.953 0.986
3837 1 1 1 1 1 1 1 1 1 1 8.5 20.24 1.006 0.949 0.984
Multilinear-potential models: MLP1
7827 1 1 1 1 1 1 1 1 5.1 20.33 0.995 0.940 0.971
7318 1 1 1 1 1 1 1 5.1 20.35 0.996 0.942 0.970
7315 1 1 1 1 1 1 1 5.1 20.35 0.996 0.942 0.970
Multilinear-potential models: MLP2
3733 1 1 1 1 1 1 1 4.8 20.29 0.978 0.934 0.962 *
3734 1 1 1 1 1 1 1 4.7 20.29 0.983 0.934 0.962
3731 1 1 1 1 1 1 4.7 20.30 0.986 0.934 0.963
Summer
Potential models: POT
3965 1 1 1 1 1 1 1 1 1 1 9.9 70.83 7.501 7.249 7.433 *
4093 1 1 1 1 1 1 1 1 1 1 1 11.5 70.82 7.493 7.246 7.449
3967 1 1 1 1 1 1 1 1 1 1 1 11.5 70.81 7.524 7.246 7.443
Multilinear-potential models: MLP1
3967 1 1 1 1 1 1 1 1 1 1 1 12.2 74.97 8.556 8.244 8.457
4095 1 1 1 1 1 1 1 1 1 1 1 1 14.0 75.03 8.540 8.242 8.476
3455 1 1 1 1 1 1 1 1 1 1 15.0 75.09 8.560 8.279 8.477
Multilinear-potential models: MLP2
3967 1 1 1 1 1 1 1 1 1 1 1 16.6 71.49 7.791 7.518 7.736
4095 1 1 1 1 1 1 1 1 1 1 1 1 14.0 71.48 7.809 7.487 7.719
4028 1 1 1 1 1 1 1 1 1 19.9 71.77 7.791 7.567 7.762

1 denotes that a variable is included in the model; otherwise it is omitted. The most robust models are highlighted with the symbol *. All values are dimensionless since the optimization was carried out in the interval (0, 1]. Because of this, $L_1 > L_2$ (see the corresponding columns). The real values of the estimators can be obtained by $L_k \left(\max(Q^{t}_{il})\right)^{\varphi}$ with $\varphi = k$.

Table 3
Results of the permutation test and optimized parameters for models No. 3733 in winter and No. 3965 in summer, with $R = 500$ and $\varphi = 2$

Winter, model type MLP2, No. 3733

| Index $j$ | 0 | 17 | 19 | $J^{*}$ | 7 | 8 | 11 | 15 | 20 |
|---|---|---|---|---|---|---|---|---|---|
| $b^{2}_{j}$ | 36.783 | −1.166 | −0.849 | 0.223 | 0.090 | 0.205 | 0.089 | −0.115 | 1.199 |
| p-value $x_j$ | – | ≈0.000 | 0.008 | – | ≈0.000 | 0.042 | ≈0.000 | ≈0.000 | ≈0.000 |

Summer, model type POT, No. 3965

| Index $j$ | 0 | 7 | 9 | 13 | 14 | 15 | 16 | 17 | 18 | 21 | 29 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $b^{2}_{j}$ | 20.235 | 0.647 | 0.135 | 0.095 | −1.822 | −0.654 | 0.007 | −0.299 | −0.016 | 1.946 | −0.023 |
| p-value $x_j$ | – | ≈0.000 | ≈0.000 | 0.012 | ≈0.000 | ≈0.000 | ≈0.000 | ≈0.000 | ≈0.000 | ≈0.000 | 0.054 |

In summer both coefficients have negative signs. With regard to forested areas, an inverse relationship between $x_{17}$ and $Q_2$ can be expected based on the rationale presented above. Impervious areas, on the other hand, evaporate water to the atmosphere owing to the absorption of solar heat, but in much smaller amounts than forested areas because they lack a very important component of the evapotranspiration process, namely the transpiration of plant tissue. As a result, a higher yield should be expected at the outlet of those areas. This relationship is reflected in the potential model No. 3965 by the negative sign of the exponent of variable $x_{18}$ and by its smaller absolute value in comparison with that of variable $x_{17}$. In fact, these exponents are in the ratio $b_{17} : b_{18} = 18.7 : 1$.

The performance of the three model types can be visualized by plotting the values of the two objective functions, as depicted in Fig. 3. The left panel of this figure shows that the best model type for describing the specific runoff in winter is MLP2, whereas the worst is POT. In summer, however, the opposite occurs: the POT type is the most appropriate, as can be seen in the right panel of Fig. 3.

The significance test for the models marked with a '*' in Table 2 shows that all variables, with the exception of $x_{29}$, are significant at the 5% level, and in many cases even at the 1% level. Hence, the null hypotheses can be rejected at the 5% level of significance in favor of the alternative hypotheses, i.e. these variables are not independent of the explained variable. Results of the Monte Carlo simulations carried out with 500 replicates are shown in Table 3.
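The general idea of a permutation test with 500 replicates can be sketched as follows. This is a generic illustration, not the authors' procedure: the test statistic used in the paper is defined in an earlier section, whereas the sketch assumes the absolute Pearson correlation between one predictor and the explained variable as a stand-in, and all function names are hypothetical.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def permutation_p_value(x, y, replicates=500, seed=0):
    """Share of random re-orderings of x that look at least as dependent on y as the data."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(x)
    hits = 0
    for _ in range(replicates):
        rng.shuffle(shuffled)  # breaks any link between x and y
        if abs(pearson(shuffled, y)) >= observed:
            hits += 1
    return hits / replicates
```

No distributional assumption enters the calculation; the null distribution is built from the data themselves, which is the property the paper relies on.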

Pearson's correlation coefficient (r) of the selected models is 0.96 and 0.88 for winter and summer, respectively. The lower value of r obtained for the latter, along with the higher values of the objective functions (see Table 2), indicates that the level of uncertainty of the water system in summer is higher than that in winter. Furthermore, the RMSE, which can be thought of as a typical magnitude of the prediction errors, is 28.0 and 38.9 mm for winter and summer, respectively.
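For reference, the error measures mentioned here and in step (9) of Algorithm 3 can be computed as in the sketch below. The definitions are the common textbook ones; the sign convention for the bias (calculated minus observed) is an assumption, and the function names are hypothetical.

```python
def bias(observed, calculated):
    """Mean error; positive values mean the model overestimates on average."""
    return sum(c - o for o, c in zip(observed, calculated)) / len(observed)


def mae(observed, calculated):
    """Mean absolute error."""
    return sum(abs(c - o) for o, c in zip(observed, calculated)) / len(observed)


def rmse(observed, calculated):
    """Root mean squared error, in the units of the data (here mm)."""
    mse = sum((c - o) ** 2 for o, c in zip(observed, calculated)) / len(observed)
    return mse ** 0.5
```

Since the RMSE weights large deviations more strongly than the MAE, a large gap between the two hints at a few poorly predicted seasons rather than a uniform misfit.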

Fig. 3. Plots showing the performance of various model types for winter (left panel) and summer (right panel).

To visualize the goodness of fit achieved by this method, the basin of the River Korsch was chosen. Fig. 4 depicts the observed and calculated values of the specific discharge as well as other important variables such as precipitation and three categories of land cover. This basin, which is located in the vicinity of Stuttgart, is an interesting case because it has undergone a fast land use/cover change triggered mainly by anthropogenic driving forces. Hence, it offers a good example for validating the calibrated models under extreme situations. In this basin the impervious cover grew from approximately 7.3% of the total area in 1961 to approximately 30.9% in 1993, i.e. an average annual growth rate of approximately 4.6%. Forest, in contrast, grew slowly from 1961 to the middle of the 1970s and has declined since then (see upper panel of Fig. 4). During the same period, precipitation showed a continuous downward trend, as illustrated by the dashed line. Precipitation has a marked periodicity but, in general, its average is decreasing at a rate of 1.1 mm/year. Conversely, the seasonal specific discharge has increased at a rate of 0.83 mm/year during the same period (see the lower panel of Fig. 4).
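The quoted growth rate of the impervious cover can be checked with a short calculation, assuming that it is meant as a compound annual rate over the 32 years from 1961 to 1993:

$$g = \left(\frac{30.9}{7.3}\right)^{1/32} - 1 \approx 0.046,$$

i.e. about 4.6% per year, in agreement with the value given in the text.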

Based on the facts presented above, and considering that other factors are quasi-constant or reveal almost no trend, the upward tendency of the specific discharge can only be attributed to land cover changes occurring in the basin since 1961. This assertion is corroborated by the models presented above, which not only predict an upward trend, as can be seen in Fig. 4, but also relate the specific discharge to two land cover variables for which the hypothesis of independence from the explained variable can be rejected even at significance levels lower than 1%, based on the Monte Carlo simulations carried out. Moreover, it should be noted that the selected models represent a regionalization for all basins within the Study Area; because of this, they might fail to predict with high certainty a peak or a nadir at a given time point. However, they have an advantage: they can capture upward or downward tendencies of the variables included in the model and hence predict an expected value for the explained variable based on such trends. It is worth noting that the relationship between land cover variables and the specific discharge is non-linear in summer, whereas in winter, owing to almost no physiological activity of vegetation, this relationship is very close to linear.

6. Conclusions

The following conclusions with regard to the

presented method can be drawn based on the results

reported in this paper.

The proposed method proved feasible to implement, and its application yielded parsimonious and robust models for the specific discharge in winter and summer in the Study Area. These models are able to reveal many of the entangled relationships between the predictors, i.e. the trends contained in the data.

Although the results presented here are valid only

for the Study Area, the proposed method is general

and transferable to other regions provided that

enough information is available for the calibration

and validation phases.


Fig. 4. Comparison of time series of land cover, precipitation, and specific discharge in winter and summer for the basin of River Korsch.

Calculated values using models No. 3733 for winter and No. 3965 for summer are also displayed.


The use of a nonlinear optimization algorithm offered several advantages compared with more traditional methods such as the Least Squares Method. First, it allowed a model to be calibrated and selected so that it performs well under two different estimators simultaneously. This, in turn, provided robust models exhibiting a high degree of agreement between calculated and observed values. Second, it permitted the calibration of nonlinear models with an additive error term. If the error term of a calibrated model exhibits heteroscedasticity, it can be removed by introducing a continuous weighting function instead of (7).

The proposed method always selects parsimonious models with the minimum number of variables and parameters. This feature not only provides a clear insight into the functioning of the system but also considerably reduces the risk of over-parameterization as well as possible multicollinearity among predictors.

The use of the Jackknife statistic during the cross-validation of the best models greatly facilitated the selection of the 'best' model. Additionally, it was of essential importance in the present study since it allows the level of predictability and the robustness of a model to be estimated at the same time in the presence of data that contain outliers. One important advantage of this statistic is that it can be used regardless of the estimator employed.

Finally, the permutation test employed in this study to simulate the sampling distribution of the chosen test statistic under the null hypothesis (e.g. independence) has proved to be an indispensable analytical tool where the multivariate joint distribution function of the predictors is unknown. In this case, had any conventional parametric statistical test been used, misleading decisions as to which variables should be included in or excluded from a model might have resulted.


References

Abadie, J., Carpentier, J., 1969. Generalization of the Wolfe

reduced gradient method to the case of nonlinear constraints, in:

Fletcher, R. (Ed.), Optimization. Academic Press, London,

pp. 37–47.

Abdulla, F., Lettenmaier, D., 1997. Development of regional

parameter estimation equations for a macroscale hydrologic

model. Journal of Hydrology 197, 230–257.

Akaike, H., 1973. A new look at the statistical model identification.

IEEE Transactions on Automatic Control 19, 716–723.

Bardossy, A., 1993. Stochastische Modelle zur Beschreibung der raumzeitlichen Variabilität des Niederschlages. Institut für Hydrologie und Wasserwirtschaft der Universität Karlsruhe 43.

Bardossy, A., 1999. Impact of climate change on river basin

hydrology under different climatic conditions, in: Nachtnebel, P.

(Ed.), Final report. CC-HYDROENV4-CT95-0133.

Bosch, J., Hewlett, J., 1982. A review of catchment experiments to

determine the effect of vegetation changes on water yield and

evapotranspiration. Journal of Hydrology 55, 3–23.

Casti, J., 1984. On the theory of models and modelling natural

phenomenon, in: Bahrenberg, G., Fischer, M.M., Nijkamp, P.

(Eds.), Recent Developments in Spatial Data Analysis (Method-

ology, Measurement, Models). Gower, Vermont, pp. 73–92.

Chow, V. (Ed.), 1984. Handbook of Applied Hydrology: A

Compendium of Water Resources Technology. McGraw-Hill,

New York.

Clarke, R., 1994. Statistical Modelling in Hydrology. Wiley,

Chichester.

Coello, C., van Veldhuizen, D., Lamont, G.B., 2002. Evolutionary

Algorithms for Solving Multi-Objective Problems. Kluwer,

New York.

Daniel, C., Wood, F., 1980. Fitting Equations to Data: Computer

Analysis of Multifactor Data, second ed. Wiley, New York.

Das, I., Dennis, J., 1997. A closer look at drawbacks of minimizing weighted-sums of objectives for Pareto set generation in multicriteria optimization problems. Structural Optimization 14.

Davison, A., Hinkley, D., 1997. Bootstrap Methods and Their

Application. Cambridge University Press, Cambridge.

Dooge, J., 1988. Hydrology in perspective. Hydrological Sciences Journal 33.

Edgeworth, F., 1881. Mathematical Psychics. C. Kegan Paul, London.

Eeles, C., Blackie, J., 1993. Land-use changes in the Balquhidder

catchments simulated by a daily streamflow model. Journal of

Hydrology 145, 315–336.

Efron, B., 1982. The Jackknife, the bootstrap and other resampling plans. Society for Industrial and Applied Mathematics. Regional Conference Series in Applied Mathematics 38.

Ezekiel, M., 1930. Methods of Correlation Analysis. Wiley, New

York.

Hartmann, A., Rieger, H., 2002. Optimization Algorithms in

Physics. Wiley/VCH, Berlin.

Jones, J., 1997. Global Hydrology: Processes, Resources and

Environmental Management. Longman, Harlow.

Kirby, C., Newson, M., Gilman, K., 1991. Plynlimon Research: The

First Two Decades. Institute of Hydrology, Wallingford.

Kleeberg, H., Cemus, J., 1992. Regionalisierung in der hydrologie,

in: Kleeberg, H. (Ed.), Regionalisierung Hydrologischer Daten.

Weinheim, New York, pp. 1–15.

Kuhn, H., Tucker, A., 1951. Nonlinear programming, in: Neyman, J.

(Ed.), Second Berkeley Symposium on Mathematical Statistics

and Probability. University of California Press, Berkeley,

pp. 481–492.

Lasdon, L.S., Warren, A.D., Jain, A., Ratner, M., 1978. Design and

testing of a generalized reduced gradient code for nonlinear

programming. ACM Transactions on Mathematical Software 4,

34–50.

Law, F., 1956. The effects of afforestation upon water yields of

catchment areas. Journal of British Waterworks Association 38,

484–494.

Lettenmaier, D., Wood, E., 1993. Hydrologic Forecasting, in:

Maidment, D. (Ed.), Handbook of Hydrology. McGraw-Hill,

New York (Chapter 26).

Mallows, C.L., 1973. Some comments on Cp. Technometrics 15.

Pareto, V., 1896. Cours D’Economie Politique, vols I and II. F.

Rouge, Lausanne.

Quenouille, M., 1949. Approximate tests of correlation in time series. Journal of the Royal Statistical Society 11B.

Raudkivi, A., 1979. Hydrology: An Advanced Introduction to

Hydrological Processes and Modelling, first ed. Pergamon

Press, Oxford.

Robinson, J., Brush, S., Douglas, I., Graedel, T., Graetz, D.,

Hodge, W., Liverman, D., Melillo, J., Moss, R., Naumov,

A., Njiru, G., Penner, J., Rogers, P., Ruttan, V., Sturdevant,

J., 1998. Land-use and land-cover projections, in: Meyer, W.B.,

Turner II, B.L. (Eds.), Changes in Land Use and Land Cover:

A Global Perspective. Cambridge University Press, Cambridge,

pp. 73–96.

Rodriguez-Iturbe, I., 1969. Estimations of statistical parameters for annual river flows. Water Resources Research 5.

Rousseeuw, P., Leroy, A., 1987. Robust Regression and Outlier

Detection. Wiley, New York.

Samaniego, L., 2003. Hydrological consequences of land use/land

cover change in mesoscale catchments. PhD Dissertation No.

118. Transactions of the Institute of Hydraulic Engineering,

Faculty of Civil Engineering, University of Stuttgart,

Stuttgart.

Sarker, R., Mohammadian, M., Xin, Y. (Eds.), 2002. Evolutionary

Optimization International Series in Operations Research and

Management Science. Kluwer, Boston.

Simonoff, J.S., 1996. Smoothing Methods in Statistics. Springer,

New York.

Tukey, J., 1982. Bias and confidence in not quite large samples.

Annals of Mathematical Statistics 29 (Abstract).

Wilby, R. (Ed.), 1997. Contemporary Hydrology: Towards Holistic

Environmental Science. Wiley, Chichester.

Wolfe, P., 1963. Methods of nonlinear programming, in:

Graves, R.L., Wolfe, P. (Eds.), Recent Advances in Mathemat-

ical Programming. McGraw-Hill, New York, pp. 67–86.