
PERGAMON Mathematical and Computer Modelling 33 (2001) 707-721 www.elsevier.nl/locate/mcm

Modelling Rainfall-Runoff using Genetic Programming

P. A. WHIGHAM* AND P. F. CRAPPER CSIRO Land and Water

P.O. Box 1666, Canberra, A.C.T. 2601, Australia pwhigham@infoscience.otago.ac.nz Peter.Crapper@cbr.clw.csiro.au

Abstract--Genetic programming is an inductive form of machine learning that evolves a computer program to perform a task defined by a set of presented (training) examples. It has been successfully applied to problems that are complex and nonlinear, and where the size, shape, and overall form of the solution are not explicitly known in advance. This paper describes the application of a grammatically-based genetic programming system to discover rainfall-runoff relationships for two vastly different catchments. A context-free grammar is used to define the search space for the mathematical language used to express the evolving programs. A daily time series of rainfall-runoff is used to train the evolving population. A deterministic lumped parameter model, based on the unit hydrograph, is compared with the results of the evolved models on an independent data set. The favourable results of the genetic programming approach show that machine learning techniques are potentially a useful tool for developing hydrological models, especially when surface water movement and water losses are poorly understood. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords--Rainfall runoff, Genetic programming.

1. INTRODUCTION

The major objective of rainfall-runoff modelling is to predict the runoff of a catchment from the rainfall incident on the catchment. The response of the catchment (especially Australian catchments) is highly capricious, depending not only on the catchment characteristics (e.g., topography, area), vegetation characteristics, and antecedent conditions, but also on the meteorological conditions (e.g., areal distribution of rainfall) in a highly nonlinear and unpredictable fashion. Developing models that describe this relationship may help in understanding the overall behaviour of the catchment and support the development of more process-based models and catchment classification schemes.

1.1. Machine Learning

The field of evolutionary computation has been widely studied since the 1960s [1,2]. This form of machine learning is characterised by the use of a population of objects that compete to perform some specified task. Using biological analogies, the population of possible solutions is modified in two main ways:

*Present address: Dept. of Information Science, University of Otago, P.O. Box 53, Dunedin, New Zealand.

0895-7177/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved. Typeset by AMS-TeX PII: S0895-7177(00)00274-0


• mutation of an individual, which is an asexual operation causing (normally) a small change in the individual, and

• mating between individuals, which mixes the parent representations to create a child.

Individuals of the population are given a fitness measure, which is defined by their performance against a training set of examples relating input and output patterns. Selection of individuals for mutation and mating, which creates the next generation of objects, is based in a proportional way on their fitness. This selection mechanism (similar in concept to Darwinian natural selection) drives the population towards better individuals, and therefore, better solutions.

1.2. Bias and Learning

Bias may be defined as the factors that influence a learning system to favour certain hypotheses or strategies. The application of a learning system always involves some form of bias. Bias may be introduced in any of the following areas:

• the problem representation,

• the operators used to search the representation space,

• the structural constraints of the representation,

• the search constraints when manipulating the representation, and

• the criterion used to evaluate proposed solutions.

The use of bias in machine learning has been promoted for many years when knowledge is available that can help narrow the search space of the problem. For example, Lenat [3] stated “All our experiences in AI research have led us to believe that for automatic programming, the answer lies in knowledge, in adding a collection of expert rules which will guide code synthesis and transformation.”

When knowledge about the structure and form of good solutions is known, there should be the opportunity to define this knowledge explicitly. This is described as background knowledge, and includes bias of the language and how particular parts of the search space are to be explored. The machine learning technique described in this paper allows language and search bias to be explicitly defined. Although this ability is used only to a limited extent within this paper, it is hoped that the possibilities for expression of bias with time series modelling will be made clear.

1.3. Rainfall-Runoff Modelling

One of the traditional approaches to hydrograph modelling is to use the concept of the instantaneous unit hydrograph (IUH). The IUH is defined as the hydrograph produced by the instantaneous application of a unit depth of rainfall to a catchment. The shape of the IUH is similar to a single peak hydrograph with a rapid rise and a slower decay. The fundamental assumption in the IUH model is that the precipitation input is equal to the integrated streamflow output. The nonlinear relationship between rainfall and streamflow has led to the development of effective rainfall, which is determined by applying a nonlinear filter to the raw rainfall data. This effective rainfall is then equated with the integrated streamflow for the specified catchment.

The IHACRES model applied in this paper is based on IUH principles. The model defines a unit hydrograph for total streamflow by defining separate unit hydrographs for the quickflow and the slowflow components. The model is defined by six parameters, four of which are determined directly from the raw rainfall, streamflow, and temperature (or a surrogate), while the other two (the nonlinear parameters) are calibrated using a trial and error search procedure, optimising the model to fit the observed rainfall-runoff relationship. Additional details about the model are contained in [4,5].

1.4. Aims

This paper will compare two different approaches to predicting rainfall-runoff relationships. One approach uses an evolutionary machine learning algorithm which evolves programs defined in a formal language; the other approach uses a (more traditional) deterministic modelling framework based on the unit hydrograph.

The paper proceeds as follows: genetic programming (GP) is introduced as a form of program induction using evolution. Formal grammars are then shown to be a useful method for defining language structure that may be neatly described in the same framework as GP. Grammars can be used as generators for sentences of structured languages and may be manipulated using a crossover operator that maintains the language structure. Additionally, a grammar is shown to be able to express both language and search bias. Finally, a grammatical GP system is used to model two catchments with vastly different rainfall-runoff behaviour. The results are compared with the deterministic model IHACRES.

2. GENETIC PROGRAMMING

The field of program induction, using a tree-structured approach, was first clearly defined by Koza [6]. This field, named genetic programming (GP), evolved a solution in the form of a Lisp program using an evolutionary, population-based, search algorithm which extended the fixed-length concepts of genetic algorithms defined by Holland [2]. The structures were defined as a combination of functions (arity > 0) and terminals (0-arity functions) which combined to form Lisp programs. The following steps summarise the search procedure used with GP.

1. Create an initial population of programs, randomly generated as compositions of the function and terminal sets.

2. WHILE termination criterion not reached DO

   (a) Execute each program to obtain a performance (fitness) measure representing how well each program performs the specified task.

   (b) Use a fitness proportionate selection method to select programs for reproduction to the next generation.

   (c) Use probabilistic operators (crossover and mutation) to combine and modify components of the selected programs.

3. The fittest program represents a (generalised) solution to the problem.

The termination criterion is based upon the problem that is being solved. Normally, an exact solution cannot be obtained and so the search for a solution is complete after a certain number of generations have been performed. In the case of the time series model that is developed in this paper, an exact solution to the training data is neither desirable (since this would imply that the solution has been overly specialised) nor is it likely to occur. Hence, the solution that is accepted is the one with the best fitness after a fixed number of generations.
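The search procedure summarised above can be sketched in a few lines of Python. This is only an illustrative skeleton, not the system used in the paper: the function name `evolve`, its parameters, and the weight transform 1/(1 + error) used to turn an error into a selection weight are all assumptions, and mutation is omitted for brevity. The caller supplies the program generator, the error measure, and the crossover operator.

```python
import random

def evolve(random_program, error, crossover, pop_size=50, generations=20,
           p_crossover=0.9, seed=0):
    """Generic GP-style search loop; returns (best_program, best_error).

    random_program(rng) -> program, error(program) -> float (lower is better),
    crossover(a, b, rng) -> two offspring. All three are caller-supplied.
    """
    rng = random.Random(seed)
    # Step 1: create the initial random population.
    population = [random_program(rng) for _ in range(pop_size)]
    best = min(population, key=error)
    # Step 2: iterate for a fixed number of generations (termination criterion).
    for _ in range(generations):
        errors = [error(p) for p in population]
        # Assumed transform: a lower error gives a larger selection weight.
        weights = [1.0 / (1.0 + e) for e in errors]
        next_pop = []
        while len(next_pop) < pop_size:
            a, b = rng.choices(population, weights=weights, k=2)
            if rng.random() < p_crossover:
                next_pop.extend(crossover(a, b, rng))
            else:
                next_pop.extend([a, b])  # passed on unchanged
        population = next_pop[:pop_size]
        gen_best = min(population, key=error)
        if error(gen_best) < error(best):
            best = gen_best
    # Step 3: the fittest program seen is returned as the solution.
    return best, error(best)
```

Because the best program seen so far is tracked separately from the population, the returned error can never be worse than that of the best initial individual.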

GP has been previously applied to time series prediction [7]; however, no previous work has applied GP to the important problem of predicting rainfall-runoff patterns.

3. FORMAL GRAMMARS AND LEARNING

This section introduces the use of formal grammars to allow explicit biasing in the framework of genetic programming. The grammar structures are shown to be capable of explicitly stating both the language bias and how the language is searched.

A formal grammar is a production system which defines how nonterminal symbols may be transformed to create terminal sentences of a language. A grammar is represented by a four-tuple (N, Σ, P, S), where N is the alphabet of nonterminal symbols, Σ is the alphabet of terminal symbols, P is the set of productions, and S is the designated start symbol. The following grammar, G_math, defines a language for generating all possible mathematical expressions using the operators +, -, *, /, and a set of random real numbers, represented by the symbol ℜ.

$$
\begin{aligned}
G_{math} = \{\, & S, \\
& N = \{M\}, \\
& \Sigma = \{+, -, *, /, \Re\}, \\
& P = \{\, S \to M, \\
& \qquad\;\; M \to +\,M\,M \mid -\,M\,M \mid *\,M\,M \mid /\,M\,M \mid \Re \,\} \,\}
\end{aligned}
$$

3.1. Derivation Steps and Derivation Trees

A derivation step represents the application of a production to some string which contains a nonterminal. In general, a series of derivation steps may be represented by a syntax tree or derivation tree. As shown in Figure 1, the following derivation steps, which define the mathematical expression for (5.2 + (2.12 * 73.89)),

$$
S \Rightarrow M \Rightarrow +MM \Rightarrow +\Re M \Rightarrow +5.2\,M \Rightarrow +5.2 * MM \Rightarrow +5.2 * \Re M \Rightarrow +5.2 * 2.12\,M \Rightarrow +5.2 * 2.12\,\Re \Rightarrow +5.2 * 2.12\ 73.89, \tag{1}
$$

may be represented as a tree. The genetic search operators of crossover and mutation are applied directly to these trees.

Figure 1. A derivation tree for the expression 5.2 + 2.12 * 73.89 using G_math.
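To make the prefix form of the derived sentence concrete, the following minimal sketch evaluates the terminal string produced by derivation (1). The function name `eval_prefix` and the token-list representation are assumptions made for illustration; division is left unprotected because the text does not say how division by zero is handled.

```python
import operator

# Binary operators of the example grammar, in prefix (operator-first) form.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_prefix(tokens):
    """Evaluate a list of prefix tokens such as ['+', 5.2, '*', 2.12, 73.89]."""
    def walk(pos):
        tok = tokens[pos]
        if tok in OPS:
            # An operator consumes the two sub-expressions that follow it.
            left, pos = walk(pos + 1)
            right, pos = walk(pos)
            return OPS[tok](left, right), pos
        return float(tok), pos + 1

    value, end = walk(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after a complete expression")
    return value

# The sentence generated by derivation (1): + 5.2 * 2.12 73.89
sentence = ["+", 5.2, "*", 2.12, 73.89]
result = eval_prefix(sentence)  # same value as 5.2 + (2.12 * 73.89)
```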

3.2. How Do We Use the Grammar?

A grammar G may be used as a generator of sentences which are a part of the language L(G). A limit on the depth of the derivation tree created from the grammar is necessary to ensure that the generation process halts (there are also the practical issues of implementing the generation on a finite machine and being able to execute the created programs). There are two steps involved in using G to generate random sentences from L(G). Initially, each production P ∈ G is labelled with the minimum depth of derivation tree that can be created from this production to produce a string of terminal symbols. This min-depth-tree value is then used to guide the selection of productions when randomly creating sentences from L(G), limited by some maximum depth of derivation tree.

Each derivation tree is evaluated against the test data to obtain a fitness measure. Selection of programs (derivation trees) for crossover and mutation is based on their relative fitness. Each generation, the population is transformed using the genetic operators of crossover and mutation to give the next population. This process continues until some maximum number of generations has passed, or an acceptable solution has been discovered.
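The two-step generation procedure can be illustrated with a short sketch for G_math. It is written under several assumptions that are not taken from the paper: the dictionary encoding of the grammar, the names `min_depths`, `production_depth`, and `generate`, the convention that a terminal leaf has depth 0, and the range (-10, 10) used for the random-real terminal (written "R" here).

```python
import random

# G_math: S -> M ; M -> + M M | - M M | * M M | / M M | R
# Symbols that are not keys of GRAMMAR are terminals.
GRAMMAR = {
    "S": [["M"]],
    "M": [["+", "M", "M"], ["-", "M", "M"],
          ["*", "M", "M"], ["/", "M", "M"], ["R"]],
}
RANDOM_REAL = "R"

def production_depth(prod, depth):
    """Minimum tree depth reachable through this production (leaves count 0)."""
    return 1 + max(depth.get(sym, 0) for sym in prod)

def min_depths(grammar):
    """Step 1: label each nonterminal with its minimum derivation-tree depth."""
    depth = {nt: float("inf") for nt in grammar}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                d = production_depth(prod, depth)
                if d < depth[nt]:
                    depth[nt] = d
                    changed = True
    return depth

def generate(symbol, max_depth, depth, rng):
    """Step 2: randomly expand `symbol`, never exceeding `max_depth`."""
    if symbol not in GRAMMAR:  # terminal symbol
        return rng.uniform(-10.0, 10.0) if symbol == RANDOM_REAL else symbol
    # Only productions that can still terminate within the budget are allowed.
    options = [p for p in GRAMMAR[symbol]
               if production_depth(p, depth) <= max_depth]
    if not options:
        raise ValueError("max_depth too small for this grammar")
    prod = rng.choice(options)
    return (symbol, [generate(s, max_depth - 1, depth, rng) for s in prod])

def tree_depth(tree):
    """Depth of a generated derivation tree (terminal leaves have depth 0)."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(tree_depth(child) for child in tree[1])
```

Filtering productions by their minimum depth is what guarantees that generation halts: once the remaining budget for M reaches 1, only the production M → ℜ is still eligible.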

3.3. Genetic Operators

This section describes how a derivation tree is modified from one generation to the next. Two search operators are used to modify the evolving rainfall-runoff model. The crossover operator is used as the standard search operator for each generation, mixing elements of “potentially” useful partial solutions in an attempt to build a better solution. A hill-climbing mutation is used as a final operator (after a fixed number of generations) that allows the random constants that have been used to seed the population to be modified in an attempt to move towards a more optimal solution. These two operators will now be described.

3.3.1. Crossover

Crossover is applied to two (parent) individuals and creates two (offspring) individuals. Crossover is defined by two parameters: the probability of crossover occurring and some nonterminal B ∈ N where crossover will be attempted. Given two derivation trees d1 and d2, the steps involved are as follows (see Figure 2).

• Randomly select a nonterminal site B from d1 and B from d2.

• Swap the derivation trees below B, thereby creating two new derivation trees.

• Insert these trees into the next-generation population.

If d1 or d2 does not contain the nonterminal B, then no crossover is possible and the operation is aborted.

Figure 2. Crossover between derivation trees.

The benefit of using derivation trees to represent the population now becomes clear: defining crossover to swap subtrees at the same nonterminal guarantees that the language defined by the grammar is maintained.
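A minimal sketch of this crossover is given below. The tree encoding (a node is a two-element list `[symbol, children]`, with terminals stored as plain leaves) and the function names `sites` and `crossover` are assumptions chosen for illustration; the parents are copied so that they remain available to the rest of the population.

```python
import copy
import random

# A derivation-tree node is [symbol, children]; terminals are str/float leaves.

def sites(tree, nonterminal, found=None):
    """Collect every node in the tree labelled with the given nonterminal."""
    if found is None:
        found = []
    if isinstance(tree, list):
        if tree[0] == nonterminal:
            found.append(tree)
        for child in tree[1]:
            sites(child, nonterminal, found)
    return found

def crossover(d1, d2, nonterminal, rng):
    """Swap the subtrees below one randomly chosen site in each parent.

    Returns the two offspring, or None if either parent lacks the
    nonterminal (the operation is aborted).
    """
    c1, c2 = copy.deepcopy(d1), copy.deepcopy(d2)
    s1, s2 = sites(c1, nonterminal), sites(c2, nonterminal)
    if not s1 or not s2:
        return None
    n1, n2 = rng.choice(s1), rng.choice(s2)
    # Both sites carry the same nonterminal label, so exchanging what lies
    # below them keeps each offspring a valid derivation of the grammar.
    n1[1], n2[1] = n2[1], n1[1]
    return c1, c2
```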

3.3.2. Hill climbing mutation for real numbers

The rainfall-runoff grammar (see Section 6) defines mathematical expressions which are initially seeded with variables related to the process under study and random real numbers. These real numbers are used as constants throughout the evolution, and are combined into the partial solutions that are evolved. A final solution that uses one or more of these constants may be improved (based on the training data) by slight modifications of these “constants” and then reevaluating the new solution to see if it performs better. The hill climbing mutation works by applying small random changes to the “constants” and maintaining the new solution only if it improves the final performance based on the training data. The small changes to constants are produced by creating a random number between -1 and +1 which is added to the constant. This is performed for each constant in the equation that is being mutated. Although it has been shown that different distributions for random mutations will perform better on some problems [8], the purpose of this approach was only to show that it produced results which were comparable with the currently accepted approaches for rainfall-runoff modelling. This mutation is applied a fixed number of times and only affects solutions which contain random constants. This operation may be considered as a fine tuning of the constants in the evolved solution.
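The hill climbing mutation can be sketched as follows. The function name `hill_climb`, the default of 100 attempts, and the representation of a solution as a flat list of constants with a caller-supplied error function are assumptions; only the uniform perturbation in (-1, +1) applied to every constant, and the rule of keeping a change only when the training error improves, come from the description above.

```python
import random

def hill_climb(constants, error, attempts=100, seed=0):
    """Fine-tune the constants of a fixed expression.

    constants: list of floats appearing in the evolved expression.
    error: function mapping a list of constants to the training error.
    Returns (best_constants, best_error).
    """
    rng = random.Random(seed)
    best = list(constants)
    best_error = error(best)
    for _ in range(attempts):  # applied a fixed number of times
        # Perturb every constant by a random number between -1 and +1.
        candidate = [c + rng.uniform(-1.0, 1.0) for c in best]
        candidate_error = error(candidate)
        # Keep the new solution only if it performs better.
        if candidate_error < best_error:
            best, best_error = candidate, candidate_error
    return best, best_error
```

In use, `error` would re-evaluate the evolved expression on the training data with the candidate constants substituted in, for example returning its RMSE.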

4. LANGUAGE AND SEARCH BIAS

The language L(G) is shaped by the grammar G. Since this grammar is declaratively defined, the language bias for the learning system is clearly stated and easily changed without changing underlying functions of the system. This promotes the exploration of new language constructions which become apparent during the evolution of partial solutions. For example, a combination of terms that consistently appear in partial solutions may be explicitly stated as part of the grammar. Thus, an incremental approach to developing a solution may be performed in a declarative manner.

An additional language bias may be introduced when defining the grammar. Each production is labelled with a weighting factor, which biases the probability of each production being selected during the generation of the initial population. This allows background knowledge about which terminal symbols are likely to be most useful in developing a solution to be expressed.

Search bias is introduced by allowing the crossover and mutation sites (i.e., the nonterminals) to be individually specified. For a particular setup, a number of crossover and mutation operations may be defined for different nonterminals with different probabilities of occurrence. The ability to specify which nonterminals are used for these operators allows a declarative specification of where the majority of the search for better solutions will be performed. It also promotes the exploration of the search space in an interactive sense by the user.

This paper will not pursue the goal of achieving the best possible solution. Rather, the paper is being used to demonstrate the appropriateness of this technique to rainfall-runoff modelling, recognising the potential for further discovery and improvements. The system described in this paper has been previously defined by Whigham [9,10] and is entitled CFG-GP.

5. CATCHMENT DESCRIPTIONS

In order to test the modelling approaches, two very different catchments were chosen. The first catchment was the Teifi catchment at Glan Teifi in Wales. The second catchment was located within the Namoi River catchment in northeastern New South Wales, Australia. The Teifi catchment is a rural catchment draining 893.6 km² with an average annual rainfall of 1368 mm. This station was maintained and operated by the UK Environment Agency and the data can be obtained from the Institute of Hydrology. Compared with the Namoi catchment, the number of rain days per annum is much greater at Teifi but the maximum daily rainfall is only about half the value for the Namoi. The other major difference is that runoff percentages are very much higher at Teifi. For the calibration period, the runoff percentage was 66.7%, and for the simulation period the runoff percentage was 74.95%.

The Namoi River catchment was chosen to be as different as possible from the Teifi catchment. Using the Department of Land and Water Conservation (the gauging authority) naming convention, the catchment is referred to as 419030 or the Manilla Rv at Barraba (30° 23' 24" S and 150° 37' 08" E). This catchment is approximately 568 km² and drains the southern part of the Nandewar Range. Within the catchment, there are three reliable long-term raingauge stations with average annual rainfalls of 686 mm, 704 mm, and 727 mm. These stations have a reasonable spread of location and altitude and the average of the three values has been used as the catchment rainfall. For very large rainfall events (greater than 100 mm), there was a strong relationship between rainfall and runoff, but as the size of the event decreased, the relationship between rainfall and runoff became more random. The rainfall in this part of the country is strongly summer dominated, which influenced the selection of the calibration and simulation periods. The calibration run was done from 13 November 1965 to 10 March 1966, and the simulation run was done from 4 November 1966 to 13 March 1967. In spite of this catchment having a comparatively high rainfall (by Australian standards at least) and our selection of the high rainfall months, the runoff percentage over the calibration period was only 6.12% and over the simulation period was 8.24%.

6. CFG-GP SETUP

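The grammar, G_exp, used by CFG-GP to develop the rainfall-runoff models allowed general mathematical functions to be evolved, and was defined as follows.

$$
\begin{aligned}
G_{exp} = \{\, & S, \\
& N = \{EQU, NL, EXPN\}, \\
& \Sigma = \{+, -, *, /, \exp, r0, r1, r2, r3, r4, r5, av5, av10, av15, \\
& \qquad\;\; av20, av25, av30, av40, av50, av60, av100, \Re\}, \\
& P = \{\, S \to +\ EQU\ NL \\
& \qquad\;\; NL \to *\ EQU\ EXPN \\
& \qquad\;\; EXPN \to \exp\ EQU \\
& \qquad\;\; EQU \to +\ EQU\ EQU \mid -\ EQU\ EQU \\
& \qquad\;\; EQU \to *\ EQU\ EQU \mid /\ EQU\ EQU \\
& \qquad\;\; EQU \to \exp\ EQU \\
& \qquad\;\; EQU \to r0 \mid r1 \mid r2 \mid r3 \mid r4 \mid r5 \\
& \qquad\;\; EQU \to av5 \mid av10 \mid av15 \mid av20 \mid av25 \\
& \qquad\;\; EQU \to av30 \mid av40 \mid av50 \mid av60 \mid av100 \\
& \qquad\;\; EQU \to \Re \,\} \,\}
\end{aligned}
$$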

The terminal symbols r0, r1, ..., r5 represent the rainfall for the current day back to the rainfall five days earlier. The av5, av10, ..., av100 terminals are the average rainfall for the last 5, 10, ..., 100 days, respectively. The terminal ℜ is a random floating point number between -10.0 and 10.0 which is generated for each occurrence of ℜ when the initial population is created. These random “constants” are potentially modified using the hill climbing mutation once the initial generation of a solution has been created. The exponential function is represented by the terminal string “exp”.

The grammar has a structural bias to form equations that are composed of a linear component and a nonlinear (exponential) component. This is shown by the production S → + EQU NL, which forces all programs to have the minimal structure of A + B * exp(C), where A, B, and C are climate variables or a random real number. The production EQU → exp EQU allows the exponential function to be included with any part of the evolved mathematical expression. The language bias just forces the use of the exponential function at least once.
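As an illustration of how the rainfall terminals might be computed from a daily series, the sketch below builds the values of r0 to r5 and av5 to av100 for a given day. The function name `rainfall_terminals` is hypothetical, and it is an assumption (not stated in the text) that each n-day average includes the current day; the function also assumes at least 100 days of history are available.

```python
def rainfall_terminals(rain, t):
    """Return the grammar's rainfall terminals for day index t.

    rain: list of daily rainfall totals, oldest first.
    t: index of the current day (needs 99 earlier days for av100).
    """
    windows = (5, 10, 15, 20, 25, 30, 40, 50, 60, 100)
    if t < max(windows) - 1 or t >= len(rain):
        raise ValueError("day index needs 100 days of rainfall history")
    # r0 is the current day's rainfall, r1 the previous day's, and so on.
    values = {f"r{k}": rain[t - k] for k in range(6)}
    # av<n> is the mean over the last n days (assumed to include day t).
    for n in windows:
        values[f"av{n}"] = sum(rain[t - n + 1 : t + 1]) / n
    return values
```

An evolved expression can then be evaluated for day t by looking up each terminal name in the returned dictionary.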

Table 1 shows the CFG-GP setup parameters which were used to develop both catchment models. The crossover operator is applied only to the nonterminal EXP, with a probability of 90%. Hence, approximately 10% of the population is passed unchanged into each subsequent generation. This ensures that good solutions are not prematurely removed from the population and that the building blocks that are useful are maintained. The fitness measure used to evaluate the performance of each program during calibration was the root mean square error (RMSE). This approach treated each point, irrespective of magnitude, as equally important. Further work of interest would be to study the effects of different metrics for the fitness measurement, based on various criteria. A likely outcome of this work would be a classification for catchments based on which type of metric best evolves an overall rainfall-runoff relationship.

Table 1. CFG-GP parameter settings for evolving a simple rainfall-runoff model.

CFG-GP Parameter      Setting

Population Size       1000

Generations           50

Fitness Measure       Minimise RMSE

The CFG-GP system used the same calibration data as IHACRES to evolve the rainfall-runoff models. Comparison of results will only refer to the simulation run, which uses previously unseen data of a similar length. The predictive performance for each model is measured by the RMSE for the simulation period and the error in predicted total discharge for the simulation period.
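The two performance measures can be written down directly. The sketch below is illustrative: the function names are hypothetical, and the discharge error is assumed to be the absolute difference between predicted and measured totals as a percentage of the measured total, a definition that is consistent with the percentages reported in Section 7 but is not stated explicitly in the text.

```python
import math

def rmse(predicted, observed):
    """Root mean square error between two equal-length series."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))

def discharge_error_percent(predicted_total, observed_total):
    """Assumed definition: |predicted - observed| / observed, in percent."""
    return 100.0 * abs(predicted_total - observed_total) / observed_total

# Totals reported for the Teifi simulation period (cumecs):
# measured 35,600 and IHACRES prediction 34,776.
teifi_ihacres_error = discharge_error_percent(34776, 35600)
```

With the Teifi totals this gives about 2.3%, matching the IHACRES figure quoted in the results.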

7. RESULTS

7.1. The Glan Teifi Catchment

The measured rainfall and streamflow for the Teifi catchment between 1979 and 1982 are shown in Figure 3 (rainfall events are shown in black). The simulated streamflows determined by IHACRES and CFG-GP are shown in Figures 4 and 6. The actual error for each model is shown in Figures 5 and 7. This error represents the difference between predicted and actual runoff. A visual comparison with the measured streamflow indicates that both approaches have captured the basic response of the catchment; however, IHACRES appears to have better represented the extreme streamflow events. The root mean square error (RMSE) for IHACRES was 15.4 and for CFG-GP was 15.7. The total discharge for the simulation period was measured as 35,600 cumecs, with IHACRES predicting 34,776 cumecs (2.3% error) and the evolved CFG-GP solution predicting 33,295 cumecs (6.4% error). Note that the total discharge was not used as part of the fitness measure for CFG-GP. The CFG-GP expression for the Teifi catchment was defined as follows:

+(+(r1, +(av40, *(av10, av100))), *(*(av5, +(-17.121983, *(av5, av40))), exp(-3.739896))).   (2)

This may be simplified to give

$$
\mathit{r1} + \mathit{av40} + (\mathit{av10} * \mathit{av100}) + (\mathit{av5} * (-17.121983 + (\mathit{av5} * \mathit{av40}))) * 0.0237. \tag{3}
$$

Equation (3) shows that the catchment was influenced by antecedent conditions that could extend for several months into the past (the av100 variable represents the average rainfall for the last 100 days). Note also that the constant exponential expression in equation (2), namely exp(-3.739896), means that the resultant equation for runoff is a linear function of r1 (the previous day's rain), av5, av10, av40, and av100.
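The step from equation (2) to equation (3) can be checked numerically. The sketch below translates the prefix program and its simplified form into two hypothetical Python functions and compares them; the input rainfall values used in the comparison are made up purely for illustration.

```python
import math

def teifi_evolved(r1, av5, av10, av40, av100):
    """Direct translation of the prefix program in equation (2)."""
    linear = r1 + (av40 + av10 * av100)
    nonlinear = (av5 * (-17.121983 + av5 * av40)) * math.exp(-3.739896)
    return linear + nonlinear

def teifi_simplified(r1, av5, av10, av40, av100):
    """Equation (3): the constant exponential is replaced by 0.0237."""
    return (r1 + av40 + (av10 * av100)
            + (av5 * (-17.121983 + (av5 * av40))) * 0.0237)

# Illustrative (made-up) inputs in the same units as the rainfall series.
inputs = dict(r1=2.0, av5=3.0, av10=4.0, av40=4.0, av100=5.0)
difference = teifi_evolved(**inputs) - teifi_simplified(**inputs)
```

The two forms differ only through the rounding of exp(-3.739896), which evaluates to roughly 0.0238, to the constant 0.0237, so the difference is small relative to the predicted runoff.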

7.2. The Manilla River at Barraba, N.S.W.

The measured rainfall and subsequent streamflow for the simulation period in the Barraba catchment are shown in Figure 8. As can be seen from this data, for large events, there is a strong relationship between rainfall and runoff. For smaller events, however, there is not a significant relationship between the rainfall and runoff. The simulated streamflows determined by the two approaches are shown in Figures 9 and 11. The associated errors for each model are shown in Figures 10 and 12. The RMSE for the IHACRES approach was 13.1 and for CFG-GP was 5.7. The total discharge for the simulation period was measured as 187 cumecs, with IHACRES predicting 330 cumecs (76% error) and the evolved CFG-GP solution predicting 99 cumecs (47% error). For the purposes of our comparison (based on RMSE), these results are similar. The evolved expression found by CFG-GP was

+(/(/(/(/(r0, -1.911790), -0.474622), -4.164400), -(/(-1.542888, 5.944119), -(av10, r0))), *(r0, exp(/(0.251564, av10)))),   (4)

which may be simplified to

$$
\left[ (\mathit{r0}/3.7) / (0.26 + \mathit{av10} - \mathit{r0}) \right] + \mathit{r0} * \exp(0.252/\mathit{av10}). \tag{5}
$$

Figure 3. Measured rainfall and runoff at Glan Teifi.

Figure 4. IHACRES modelled runoff at Glan Teifi.

Figure 5. IHACRES error (predicted-actual) for the simulation period at Glan Teifi.

Figure 6. CFG-GP modelled runoff at Glan Teifi.

Figure 7. CFG-GP error (predicted-actual) for the simulation period at Glan Teifi.

Figure 8. Measured rainfall and runoff at Barraba.

Figure 9. IHACRES modelled runoff at Barraba.

Figure 10. IHACRES error (predicted-actual) for the simulation period at Barraba.

Figure 11. CFG-GP modelled runoff at Barraba.

Figure 12. CFG-GP error (predicted-actual) for the simulation period at Barraba.

It is worth noting that equation (5) uses the current day's rainfall (r0) and the average of the last ten days' rainfall (av10). Additionally, equation (5) has the nonlinear term exp(/(0.251564, av10)), which is a function of av10. A comparison of equations (3) and (5) shows that the two catchments have been modelled in very different ways. The Welsh catchment has been modelled using long term averages in a linear combination, whereas the Australian catchment has been modelled using short average times and the current day in a nonlinear fashion. This would suggest that the underlying processes that are driving the water movement throughout both catchments are quite different.

When an attempt was made to calibrate over different consecutive seasons for the Barraba data, the IHACRES model was not able to find coefficients to suit all seasons, and therefore could not converge. This accounts for the short calibration and simulation periods of only four months that have been used for testing these models. However, the CFG-GP approach, because it makes no assumption about underlying relationships, could be calibrated over successive seasons and could therefore use more information about the catchment response to rainfall. When CFG-GP was calibrated using this longer period (1000 days), the resultant model achieved significantly better results on the original simulation data set (RMSE = 3.1). Additionally, the predicted total discharge changed to 157 cumecs (16% error), which is superior to either previous solution. The response of this modelled streamflow is shown in Figure 13, with the associated error between predicted and actual shown in Figure 14.

Figure 13. Rainfall and CFG-GP modelled runoff (1000 days calibration).

Figure 14. CFG-GP error (predicted-actual) using 1000 days calibration for Barraba.

The evolved expression was

+(exp(+(/(exp(-4.874963), -0.796608),
        /(/(r0, -1.864706), +(-3.018028, -4.418388)))),
  *(-3.240420, exp(-1.181253))),                                (6)

which may be simplified to give

exp (0.0096 + (r0/13.35)) - 0.994. (7)
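A simplification of this kind can be checked numerically by evaluating the prefix expression directly. The sketch below is illustrative only: the tuple encoding, the names `EQ6`, `evaluate` and `eq7`, and the use of plain (unprotected) division are assumptions made for this example. The constants are copied as they appear in the text, so the two forms should be close but need not agree exactly, whether through rounding in (7) or transcription of the constants.

```python
import math

# Equation (6) as a nested tuple in prefix form: (operator, arg1, arg2) or ("exp", arg).
# Constants are copied from the text as printed; "r0" is the current day's rainfall.
EQ6 = ("+",
       ("exp",
        ("+",
         ("/", ("exp", -4.874963), -0.796608),
         ("/", ("/", "r0", -1.864706), ("+", -3.018028, -4.418388)))),
       ("*", -3.240420, ("exp", -1.181253)))


def evaluate(expr, env):
    """Recursively evaluate a prefix expression tree."""
    if isinstance(expr, (int, float)):
        return float(expr)
    if isinstance(expr, str):
        return env[expr]  # variable lookup
    op, *args = expr
    vals = [evaluate(a, env) for a in args]
    if op == "exp":
        return math.exp(vals[0])
    if op == "+":
        return vals[0] + vals[1]
    if op == "*":
        return vals[0] * vals[1]
    if op == "/":
        # Assumption: plain division; a GP system may guard against zero.
        return vals[0] / vals[1]
    raise ValueError(f"unknown operator: {op}")


def eq7(r0):
    """Simplified form, equation (7)."""
    return math.exp(0.0096 + r0 / 13.35) - 0.994


if __name__ == "__main__":
    for r0 in (0.0, 5.0, 20.0):
        print(r0, evaluate(EQ6, {"r0": r0}), eq7(r0))
```

Printing the two columns side by side for a range of rainfall values shows how closely the rounded form tracks the evolved expression.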

The interesting comparison between equations (5) and (7) is that using the larger dataset for calibration (training) resulted in a solution that was a nonlinear function solely of r0, the current day's measured rainfall. No average rainfall value was found to be useful. This implies that the Barraba catchment has a very quick response between rainfall and runoff, and no significant seasonal signal.
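The two measures quoted for these models, RMSE and the percentage error in total discharge, can be computed as in the following sketch. The function names are hypothetical, and the definition of the total discharge error (absolute difference of the summed flows, relative to the observed total) is an assumption, since the exact formula is not restated here.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean squared error between predicted and observed daily flows."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def total_discharge_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Percentage error in total discharge over the period (assumed definition)."""
    total_observed = sum(observed)
    return 100.0 * abs(sum(predicted) - total_observed) / total_observed


if __name__ == "__main__":
    observed = [1.0, 2.0, 5.0]   # hypothetical flows, in cumecs
    predicted = [1.0, 2.0, 3.0]  # hypothetical model output
    print(rmse(predicted, observed))
    print(total_discharge_error(predicted, observed))
```

The two measures are complementary: RMSE penalises day-by-day mismatches, while the total discharge error only reflects the water balance over the whole period.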


8. DISCUSSION

The results of applying CFG-GP to rainfall-runoff modelling in the previous examples have been encouraging. The use of a simple, nonlinear mathematical grammar has allowed the system to produce equations that capture some measure of the underlying response of the catchment. The linear, strongly seasonal model evolved for the Teifi catchment has a natural interpretation in terms of the underlying climatic and topographic characteristics of this catchment. The nonlinear, weakly seasonal model evolved for the Namoi catchment also corresponds with the perceived behaviour of this landscape.

The grammar, Gexp, had only a weak bias towards forming certain types of mathematical expressions. Future work will involve extending the set of useful mathematical functions (such as a power function between variables) and exploring other language forms which may have a more direct interpretation in terms of natural processes.
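To illustrate what extending the grammar might involve, the sketch below defines a small context-free grammar for prefix expressions and derives random expressions from it. It is purely illustrative: the productions, the names `GRAMMAR` and `derive`, the constant range, and the `pow` production are assumptions made for this example and are not the grammar Gexp used in this work.

```python
import random

# Hypothetical grammar: each nonterminal maps to a list of productions.
# Symbols in angle brackets are nonterminals; all other strings are emitted as-is.
GRAMMAR = {
    "<expr>": [
        ["+(", "<expr>", ", ", "<expr>", ")"],
        ["*(", "<expr>", ", ", "<expr>", ")"],
        ["/(", "<expr>", ", ", "<expr>", ")"],
        ["exp(", "<expr>", ")"],
        ["pow(", "<var>", ", ", "<var>", ")"],  # added power function between variables
        ["<var>"],
        ["<const>"],
    ],
    "<var>": [["r0"], ["av10"]],
}


def derive(symbol, rng, depth=0, max_depth=4):
    """Expand a grammar symbol into a random prefix expression string."""
    if symbol == "<const>":
        # Assumed constant range; drawn fresh for each occurrence.
        return f"{rng.uniform(-5.0, 5.0):.6f}"
    if symbol not in GRAMMAR:
        return symbol  # terminal text
    productions = GRAMMAR[symbol]
    if symbol == "<expr>" and depth >= max_depth:
        # Force termination once the derivation is deep enough.
        productions = [["<var>"], ["<const>"]]
    production = rng.choice(productions)
    return "".join(derive(s, rng, depth + 1, max_depth) for s in production)


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(3):
        print(derive("<expr>", rng))
```

Adding or removing a production changes the space of expressions that can be generated, which is one way a grammar expresses bias in a search of this kind.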

9. CONCLUSION

In the present work, we have compared the results obtained with a deterministic lumped parameter model, based on the unit hydrograph approach, with those obtained using a stochastic machine learning model.

For the Welsh catchment, the results of the two models were similar. Since rainfall and runoff were highly correlated, the deterministic assumption underlying the IHACRES model was satisfied. Therefore, IHACRES could achieve a satisfactory correlation between calibration and simulation data. It is also interesting to note that for this catchment, the runoff ratio was approximately 70%, which suggests that a relationship does indeed exist between the rainfall and runoff. The CFG-GP approach does not require any causal relationships but achieved similar results.

The behaviour of the studied Australian catchment is very different from that of the Welsh catchment. The runoff ratio was very low (7%), and hence, the a priori assumptions of IHACRES (and other deterministic models) were a poor representation of the real world. This was demonstrated by the inability of IHACRES to use more than one season's data for calibration purposes; it could only use data from a high rainfall period. Since the CFG-GP approach did not make any assumptions about the underlying physical processes, calibration periods over more than one season could be used. This led to significantly improved generalisations for the modelled behaviour of the catchment.

In summary, either approach worked satisfactorily when rainfall and runoff were correlated. However, when this correlation was poor, the CFG-GP had some advantages because it did not assume any underlying relationships. This is particularly important when considering the modelling of environmental problems, where typically the relationships are nonlinear, and are often measured at a scale which does not match conceptual or deterministic modelling assumptions.

REFERENCES

1. L.J. Fogel, A.J. Owens and M.J. Walsh, Artificial Intelligence through Simulated Evolution, John Wiley & Sons, (1966).

2. J.H. Holland, Adaptation in Natural and Artificial Systems, MIT Press, (1975); Second edition, (1992).

3. D. Lenat, The role of heuristics in learning by discovery, In Machine Learning: An Artificial Intelligence Approach, Chapter 9, (Edited by R.S. Michalski, J. Carbonell and T. Mitchell), Springer-Verlag, (1984).

4. A.J. Jakeman, I.G. Littlewood and P.G. Whitehead, Computation of the instantaneous unit hydrograph and identifiable component flows with application to two small upland catchments, Journal of Hydrology, 117, 275-300, (1990).

5. I.G. Littlewood and A.J. Jakeman, A new method of rainfall-runoff modelling and its applications in catchment hydrology, In Environmental Modelling, Volume II, (Edited by P. Zannetti), pp. 143-171, Computational Mechanics, Southampton, UK, (1994).

6. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, A Bradford Book, MIT Press, (1992).

7. B. Mulloy, R. Riolo and R. Savit, Dynamics of genetic programming and chaotic time series prediction, In Genetic Programming: Proceedings of the First Annual Conference, (Edited by J. Koza), MIT, Cambridge, MA, (1996).

8. X. Yao and Y. Liu, An Analysis of Evolutionary Algorithms Based on Neighbourhood and Step Size, Lecture Notes in Computer Science, Volume 1213, pp. 297-307, Springer-Verlag, Berlin, (1997).

9. P.A. Whigham, Inductive bias and genetic programming, In First International Conference on Genetic Algorithms in Engineering Systems: Innovations and Applications, pp. 461-466, IEE, UK, (1995).

10. P.A. Whigham, Grammatical bias for evolutionary learning, Ph.D. Thesis, School of Computer Science, University College, University of New South Wales, Australia, (1996).