stochastic models for surface hydrology


Surface hydrology

Stochastic models

Prof. W. Bauwens

Department of Hydrology and Hydraulic Engineering

STOCHASTIC MODELS

1 INTRODUCTION
2 DEFINITIONS
3 BASIC CONCEPTS OF STATISTICS
3.1 Distribution functions
3.2 Statistical parameters
3.2.1 Midpoint parameters
3.2.2 Measures for the variability
3.2.3 Symmetry measures
3.3 The stochastic characteristics of time series
3.3.1 Persistence
3.3.2 The serial correlation coefficients
3.3.3 Approximate significance test on the serial correlation coefficients
3.3.4 An exact test on the serial correlation coefficient with lag 1
3.3.5 The turning point test
3.3.6 Example
4 COMMONLY USED DISTRIBUTION FUNCTIONS IN HYDROLOGY
4.1 The Normal distribution (ND)
4.2 The Lognormal distribution (LND)
4.3 The Exponential distribution (ED)
4.4 The Gamma distribution (GD)
4.5 The Pearson III distribution (PIII)
4.6 The Log Pearson III distribution (LPIII)
4.7 Extreme values distributions (EVD)
5 THE CHOICE AND PARAMETER ESTIMATION OF THE DISTRIBUTION
5.1 The graphical representation of the distribution
5.2 The choice of the distribution function
5.3 The parameter estimation
5.3.1 The method of moments
5.3.2 Optimisation of the parameters
6 RETURN PERIOD AND RISK OF A DESIGN
6.1 The return period
6.2 The risk of a design
7 STOCHASTIC MODELS FOR YEARLY FLOW VOLUMES
7.1 Introduction
7.2 A stochastic model for yearly flow volumes if no persistence is present
7.2.1 The general procedure
7.2.2 The probability distribution for the observed time series
7.2.3 The generation of a series with pseudo random numbers
7.2.4 The transformation of the pseudo random time series
7.3 A stochastic model for yearly flow volumes if persistence is present
8 STOCHASTIC MODELS FOR MONTHLY FLOW VOLUMES
8.1 Introduction
8.2 Mean, variance and correlations are non-stationary: the Thomas-Fiering model
8.3 The correlation structure and the variance are stationary
8.4 Only the correlation structure is stationary
9 ARIMA MODELS
9.1 Introduction
9.2 Autoregressive models
9.3 Moving average models
9.4 ARMA and ARIMA models
9.5 Model identification
REFERENCES

1 INTRODUCTION

The most important application of stochastic models is the generation of synthetic time series. Synthetic time series are artificial series that never occurred and will never occur, but that have the same stochastic characteristics as the historic observations.

Synthetic time series that are longer than the historical observation series can be used for the evaluation of control strategies for hydraulic structures such as reservoirs.

Other applications of stochastic models include:

the prediction of extreme events (in case of persistence);
short term predictions;
the extension of series of observations by means of correlation.

The syllabus does not aim at providing a complete overview of the stochastic modelling techniques in hydrology, but at introducing the concepts and techniques that are used in this framework. To this end, only models on a yearly or monthly time base are considered. The syllabus is also limited to stochastic models with one variable; multivariate stochastic models are not discussed.

2 DEFINITIONS

Series, time series: a succession of values (observations) with time

Stochastic model: a mathematical model whereby the variables and/or parameters are considered as random variables, characterised by a probability distribution, and whereby the time dependence of the variables is explicitly taken into account (as opposed to statistical models)

Synthetic time series: a time series obtained by means of a stochastic model and for which the stochastic characteristics correspond to those of the observed time series

3 BASIC CONCEPTS OF STATISTICS

3.1 Distribution functions

The cumulative distribution function of a stochastic variable X is by definition

$$F(x) = P(X \le x)$$

whereby P is the probability.

The probability density function is

$$f(x) = \frac{dF(x)}{dx}$$

and, consequently,

$$F(x) = \int_{-\infty}^{x} f(u)\,du$$

3.2 Statistical parameters

The aim of the statistical methods is to extract the essential information from a set of observations.

3.2.1 Midpoint parameters

Midpoint parameters indicate where the bulk of the probability mass of a stochastic variable X is situated.

The mean of the random variable, E(X), is the first-order moment (about the origin) of the probability distribution:

$$E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$$

When only a limited sample of N observations is available, the mean of a stochastic variable is estimated by means of the arithmetic mean:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

If the distribution function is asymmetric, the geometric mean provides a better estimation of the bulk of the probability mass:

$$\bar{x}_g = \left(\prod_{i=1}^{N} x_i\right)^{1/N}$$

The median, being the value for which F(x) = 0.5, is preferred when the number of observations is very limited and the distribution is asymmetric as, in this case, the arithmetic mean is strongly influenced by the extremes.

The mode is the value at which the probability density function peaks, i.e. the most frequently observed value.

Finally, a weighted mean can be used, whereby a weight, w, is attributed to each of the observations:

$$\bar{x}_w = \frac{\sum_{i=1}^{N} w_i\,x_i}{\sum_{i=1}^{N} w_i}$$

An example of the latter is the method of Thiessen for the calculation of the areal rainfall.

3.2.2 Measures for the variability

Variability measures provide information about the spreading of the probability mass around the midpoint value.

The variance is the central moment of second order of the probability distribution. The variance is estimated as (dividing by N, as in the worked example of Section 3.3.6):

$$s^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2$$

The standard deviation is the square root of the variance:

$$s = \sqrt{s^2}$$

It has the same dimensions as the variable.

The coefficient of variation is a dimensionless measure of the variability:

$$C_v = \frac{s}{\bar{x}}$$

A p-percentile is a solution of the equation F(x) = p.
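The midpoint and variability measures above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_statistics` is hypothetical, and the variance is assumed to be divided by N (not N - 1), which reproduces the value 86358 quoted for the example series of Section 3.3.6.

```python
import math


def sample_statistics(x):
    """Midpoint and variability measures of a sample of positive values."""
    n = len(x)
    mean = sum(x) / n
    # Geometric mean via logarithms (requires strictly positive values).
    geo_mean = math.exp(sum(math.log(v) for v in x) / n)
    s = sorted(x)
    mid = n // 2
    median = s[mid] if n % 2 else 0.5 * (s[mid - 1] + s[mid])
    # Assumption: the variance estimate divides by N, not N - 1.
    variance = sum((v - mean) ** 2 for v in x) / n
    std = math.sqrt(variance)
    return {
        "mean": mean,
        "geometric_mean": geo_mean,
        "median": median,
        "variance": variance,
        "std": std,
        "cv": std / mean,  # coefficient of variation (dimensionless)
    }


# Yearly flow volumes (10^6 m3) of the example in Section 3.3.6.
flows = [1881, 1945, 1390, 1016, 1346, 1076, 1487, 1696, 1296, 1536]
stats = sample_statistics(flows)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

For a positively skewed sample the geometric mean and the median lie below the arithmetic mean, which is the behaviour described in the text.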

3.2.3 Symmetry measures

Symmetry measures characterise the shape of a distribution.

The estimate of the skewness, a dimensionless parameter, is calculated as

When the distribution is symmetric, the skewness is zero; if the distribution has a tail to the right, then the skewness is positive.

The kurtosis measures the peakedness of a symmetric distribution, whereby the influence of the spreading is excluded. The higher the value, the sharper the distribution. The parameter is calculated as

3.3 The stochastic characteristics of time series

3.3.1 Persistence

A time series shows persistence if the values at a given moment are influenced by the values in the past. Many hydrologic series show persistence, and are therefore not random, as high values tend to be followed by high values and low values by low values; in other words: the series are characterised by consecutive values with the same order of magnitude.

The serial correlation coefficients are a measure of these tendencies: significant values of the serial correlation coefficients are an indication of the existence of persistence in the time series. For purely random series, the values of the serial correlation coefficients will only depart from 0 as a consequence of the randomness of the sampling; for series with strong persistence, values of the serial correlation coefficients will tend to 1.

The existence or absence of persistence is very important, as the choice of the type of stochastic model will depend on it.

The literature mentions many tests to check for persistence in a series. In what follows, we will only mention a few of these tests.

3.3.2 The serial correlation coefficients

Consider a time series {y_t}, t = 1, 2, ..., N.

The serial correlation coefficient r_k is defined as:

$$r_k = \frac{c_k}{c_0}$$

with c_k the serial covariance with lag k (c_0 being the variance s^2):

$$c_k = \frac{1}{N}\sum_{t=1}^{N-k} (y_t - \bar{y})(y_{t+k} - \bar{y})$$

The serial correlation coefficient with lag k describes the strength of the relation between a value in the time series and the value that precedes it by k time intervals.

The correlogram is a representation of the values r_1, r_2, ..., r_k, ... as a function of k (Fig.3.1).

3.3.3 Approximate significance test on the serial correlation coefficients

In the absence of persistence, the serial correlation coefficients should theoretically be equal to zero. In practice, one has to account for a certain variance on this value, due to randomness and the limited number of samples. It can be shown that the standard deviation of the serial correlation coefficients for a series of N observations is $1/\sqrt{N}$.

An approximate significance test consists of comparing the serial correlations with the interval $\pm 2/\sqrt{N}$; values outside this interval indicate the presence of persistence in the series.
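As an illustration, the serial correlation coefficients and the approximate test can be sketched as follows. The function name `serial_correlation` is hypothetical, and the serial covariance is assumed to be divided by N, which is what the spreadsheet calculation of Section 3.3.6 does (c1 = 243449 / 10).

```python
import math


def serial_correlation(y, k):
    """Lag-k serial correlation r_k = c_k / c_0.

    c_k = (1/N) * sum over t = 1..N-k of (y_t - mean) * (y_{t+k} - mean)
    (assumption: division by N, as in the worked example).
    """
    n = len(y)
    mean = sum(y) / n
    x = [v - mean for v in y]
    c0 = sum(v * v for v in x) / n
    ck = sum(x[t] * x[t + k] for t in range(n - k)) / n
    return ck / c0


# Yearly flow volumes (10^6 m3) of the example in Section 3.3.6.
flows = [1881, 1945, 1390, 1016, 1346, 1076, 1487, 1696, 1296, 1536]

limit = 2 / math.sqrt(len(flows))  # approximate significance limit
for k in (1, 2):
    r = serial_correlation(flows, k)
    verdict = "significant" if abs(r) > limit else "not significant"
    print(f"r{k} = {r:.2f} (limit {limit:.2f}): {verdict}")
```

Plotting `serial_correlation(flows, k)` against k for increasing lags gives the correlogram.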

Fig.3.1: Correlogram

3.3.4 An exact test on the serial correlation coefficient with lag 1

For a random series, the serial correlation coefficient with lag 1 is Normally distributed around a mean of -1/(N-1) with a variance of (N-2)^2/(N-1)^3. The coefficient r_1 is therefore significantly different from zero, with a probability of 95%, if it is outside the interval

$$-\frac{1}{N-1} \pm 1.96\,\frac{N-2}{(N-1)^{3/2}}$$
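The 95% interval follows directly from the stated mean -1/(N-1) and variance (N-2)^2/(N-1)^3. A minimal sketch (the function name `lag1_interval` is hypothetical; 1.96 is the usual two-sided 95% quantile of the Normal distribution):

```python
def lag1_interval(n, z=1.96):
    """95% interval for r1 of a random series of n observations."""
    mean = -1.0 / (n - 1)
    std = (n - 2) / (n - 1) ** 1.5  # square root of (n-2)^2 / (n-1)^3
    return mean - z * std, mean + z * std


low, high = lag1_interval(10)
print(f"({low:.2f}, {high:.2f})")
```

For N = 10 this gives the interval used in the example of Section 3.3.6.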

3.3.5 The turning point test

The turning point test consists of counting the number of local maxima and minima (the turning points) in a time series (Fig.3.2).

Persistence in a series will lead to a smaller number of turning points than would be the case if the series was random.

Let n_tp be the number of turning points in a time series of N observations. It can be shown that the time series is a random series if

$$n_{tp} > \frac{2(N-2)}{3} - 1.96\sqrt{\frac{16N-29}{90}}$$
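A sketch of the turning point test is given below. The function names are hypothetical. The expected number 2(N-2)/3 and the variance (16N-29)/90 are the standard results for a random series; they reproduce the values 5.33 and 1.45 used in the example of Section 3.3.6.

```python
import math


def count_turning_points(y):
    """Number of interior values that are a local maximum or minimum."""
    return sum(
        1
        for a, b, c in zip(y, y[1:], y[2:])
        if (b > a and b > c) or (b < a and b < c)
    )


def turning_point_lower_limit(n, z=1.96):
    """Lower limit on the number of turning points of a random series."""
    expected = 2 * (n - 2) / 3
    variance = (16 * n - 29) / 90
    return expected - z * math.sqrt(variance)


# Yearly flow volumes (10^6 m3) of the example in Section 3.3.6.
flows = [1881, 1945, 1390, 1016, 1346, 1076, 1487, 1696, 1296, 1536]
n_tp = count_turning_points(flows)
limit = turning_point_lower_limit(len(flows))
print(n_tp, round(limit, 2), "random" if n_tp > limit else "persistent")
```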

Fig.3.2: The turning points in a time series (purple dots)

3.3.6 Example

The yearly flow volume for a river over a period of 10 years is given in Tab.3.1.

Year   flow y (10^6 m3)   x = y - mean   x(t) * x(t+1)   x(t) * x(t+2)
1      1881                414.1          197981          -31844
2      1945                478.1          -36766          -215575
3      1390                -76.9           34674            9297
4      1016               -450.9           54514          176257
5      1346               -120.9           47260           -2430
6      1076               -390.9           -7857          -89555
7      1487                 20.1            4605           -3435
8      1696                229.1          -39153           15831
9      1296               -170.9          -11809
10     1536                 69.1
Sum    14669                              243449         -141455

Tab.3.1: Spreadsheet for the calculation of the serial correlation coefficients

The mean of the time series is 1467 and the variance s2 is 86358.

The columns 3 and 4 of the spreadsheet are used for the calculation of the serial correlation coefficient with lag 1 (see equation above). The sum of the values in column 4 is 243449. Consequently

c1 = 243449 / 10 = 24345

and r1 = c1 / s2 = 24345 / 86358 = 0.28

Similarly, using column 5, we find

c2 = - 141455 / 10 = - 14146

r2 = c2 / s2 = - 14146 / 86358 = - 0.16

The approximate significance test for persistence (3.3.3) states that the serial correlations are significant if their absolute values are larger than $2/\sqrt{N} = 2/\sqrt{10} = 0.63$. As this is not the case for r1 or r2, it can be concluded that there is no persistence in the time series.

The exact test for r1 (3.3.4) rejects persistence if r1 is situated in the interval

$$-\frac{1}{N-1} \pm 1.96\,\frac{N-2}{(N-1)^{3/2}}$$

or, with N = 10

( -0.69 , 0.47 ).

Also from this test, it can be concluded that there is no persistence in the series.

The number of turning points in the time series equals 6. For a random series, the expected number of turning points is 2(N-2)/3 or, with N = 10, 5.33. For the series to be random, the number of turning points should be higher than

$$\frac{2(N-2)}{3} - 1.96\sqrt{\frac{16N-29}{90}}$$

being $5.33 - 1.96\sqrt{1.45}$ or ca. 3. This condition is also satisfied.

Fig.3.3: The correlogram for the example

4 COMMONLY USED DISTRIBUTION FUNCTIONS IN HYDROLOGY

4.1 The Normal distribution (ND)

The probability density function for the Normal or Gauss distribution is given in Tab.4.1.

The central limit theorem states that a variable is characterised by a Normal distribution if:

the variable is the result of a large number of causative factors;
each of these factors has only a limited influence on this variable;
the different factors are independent of each other;
the effect of the factors is cumulative.

The theorem explains the use of the ND for time-integrated hydrologic variables. As an example, consider the yearly precipitation amount: it is the sum of the rainfall in a large number of independent rain storms, whereby each storm has only a limited impact on the outcome. Note however that the theorem does not hold for shorter durations (e.g. the monthly rainfall).

A problem with the use of the ND for most hydrologic variables is the fact that the distribution is unbounded: a probability mass will also be attributed to negative values. Consequently, the normal distribution is normally not used for hydrologic variables.

The normal distribution is therefore mainly used as an auxiliary distribution:

as a reference for other distributions;
for the representation of random (measurement) errors;
for the generation of pseudo-random numbers (see later).

4.2 The Lognormal distribution (LND)

The Lognormal distribution (LND) is an example of a distribution with positive skewness, a feature that is often observed for hydrologic variables. If a variable X is Lognormally distributed, the variable Y = log X is Normally distributed.

The 2-parameter LND (Tab.4.1) has a lower bound equal to 0; a 3-parameter LND exists, whereby a lower bound, different from 0, can be defined.

Applications of the LND in hydrology are e.g.

the distribution of the hydraulic conductivity of the soil over an area;
the distribution of the size of rain drops in a storm.

4.3 The Exponential distribution (ED)

The Exponential distribution, which describes the time between successive events of a Poisson process, is commonly used for the representation of arrival times between independent events, e.g.

the arrival time between successive rain storms;
the arrival time between successive pollution clouds in a river.

The distribution function is given in Tab.4.1. The parameter λ represents the mean frequency of occurrence of the event.

Distribution                   f(x)   range   parameters
Normal
Lognormal (y = log x)
Exponential
Gamma
Pearson III
Log Pearson III (y = log x)
EVD I

Tab.4.1: Distribution functions used in hydrology

Type     Name      k      Range
EVD I    Gumbel           -∞ < x < ∞
EVD II   Frechet

Tab.4.2: The extreme value distributions

4.4 The Gamma distribution (GD)

The Gamma distribution (Tab.4.1) is suited for skewed distributions, without the need of transforming the variable.

An example of application concerns the rainfall amount in storms.

A limitation of the distribution is its lower bound of 0, which is a problem in cases where a lower bound greater than zero should be used. In those cases, the Pearson III distribution offers an alternative.

4.5 The Pearson III distribution (PIII)

The Pearson III distribution (Tab.4.1) is a 3-parameter Gamma distribution, whereby a lower bound different from zero can be defined.

It is a very flexible distribution that is often used in hydrology. As an example, we mention the distribution of the yearly maximum flows.

4.6 The Log Pearson III distribution (LPIII)

This distribution (Tab.4.1; Fig.4.2) can be used if the variable is extremely skewed, as is often the case for peak flows of flood waves.

4.7 Extreme values distributions (EVD)

Extreme values are selected minimal or maximal values from a dataset, e.g. yearly maximum rainfall intensities.

The general equation for the extreme values distributions is:

Depending on the value of k, 3 types of distributions are distinguished (Tab.4.2).

The Gumbel distribution is frequently used for the distribution of extreme low or high flows.
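For orientation, the usual textbook forms of several of the densities listed in Tab.4.1 are sketched below. The symbols (μ, σ, λ, β, u, α) and the use of the natural logarithm in y = ln x are assumptions made here; the parametrisation used in Tab.4.1 may differ.

```latex
\begin{aligned}
\text{Normal:}\quad
  & f(x) = \frac{1}{\sigma\sqrt{2\pi}}
    \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
  && -\infty < x < \infty \\
\text{Lognormal } (y = \ln x):\quad
  & f(x) = \frac{1}{x\,\sigma_y\sqrt{2\pi}}
    \exp\!\left(-\frac{(y-\mu_y)^2}{2\sigma_y^2}\right),
  && x > 0 \\
\text{Exponential:}\quad
  & f(x) = \lambda\,e^{-\lambda x},
  && x \ge 0 \\
\text{Gamma:}\quad
  & f(x) = \frac{\lambda^{\beta}\,x^{\beta-1}\,e^{-\lambda x}}{\Gamma(\beta)},
  && x \ge 0 \\
\text{EVD I (Gumbel):}\quad
  & f(x) = \frac{1}{\alpha}
    \exp\!\left[-\frac{x-u}{\alpha}
    - \exp\!\left(-\frac{x-u}{\alpha}\right)\right],
  && -\infty < x < \infty
\end{aligned}
```

In the exponential form, the mean of the distribution is 1/λ, which is consistent with λ being a mean frequency of occurrence.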

Fig.4.2: The Log Pearson III distribution

5 THE CHOICE AND PARAMETER ESTIMATION OF THE DISTRIBUTION

5.1 The graphical representation of the distribution

For the graphical representation of the distribution, the N values are sorted in ascending order. Every value is associated with its rank m.

Theoretically, the cumulative distribution function for an infinite series is (theorem of Bernoulli):

$$F(x_m) = \frac{m}{N}$$

For a limited series, this formula implies that the highest observed value can never be exceeded, which is unrealistic.

On the other hand, the method of Weibull, whereby it is stated that:

$$F(x_m) = \frac{m}{N+1}$$

results in a return period that is too small for the highest values.

A number of compromise formulae have been presented in the literature (see e.g. Chow, 1988, p.398 and Mutreja, 1982, p.88 for an overview). They are usually of the type:

$$F(x_m) = \frac{m - b}{N + 1 - 2b}$$

The parameter b depends on the distribution: Blom states that b = 3/8 for a Normal distribution and Gringorten states that b = 0.44 for an EVD I distribution. For skewed distributions, such as the LPIII, the optimal value of b also depends on the skewness: a value higher than 3/8 should be applied for positive skewness; a lower value for negative skewness.
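The plotting positions can be sketched as follows. The general form (m - b)/(N + 1 - 2b) is assumed here as the compromise formula (b = 0 gives the Weibull formula), and the return period is taken as T = 1/(1 - F) for a non-exceedance probability F; the function name `plotting_positions` is hypothetical.

```python
def plotting_positions(values, b=0.0):
    """Return (value, F) pairs with F = (m - b) / (N + 1 - 2b).

    m is the rank of the value in ascending order; b = 0 gives the
    Weibull formula, b = 3/8 Blom and b = 0.44 Gringorten.
    """
    n = len(values)
    return [
        (v, (m - b) / (n + 1 - 2 * b))
        for m, v in enumerate(sorted(values), start=1)
    ]


# Yearly flow volumes (10^6 m3) of the example in Section 3.3.6.
flows = [1881, 1945, 1390, 1016, 1346, 1076, 1487, 1696, 1296, 1536]

weibull = plotting_positions(flows)
gringorten = plotting_positions(flows, b=0.44)

# Return period attributed to the largest observation, T = 1 / (1 - F).
for name, positions in (("Weibull", weibull), ("Gringorten", gringorten)):
    value, f = positions[-1]
    print(f"{name}: largest value {value}, F = {f:.3f}, T = {1 / (1 - f):.1f} years")
```

The comparison illustrates the remark in the text: the Weibull formula attributes a shorter return period to the highest value than the compromise formulae do.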

5.2 The choice of the distribution function

Although some general guidelines for the selection of a distribution have been put forward in the above text, it remains essential to test the validity of the selected distribution. Techniques that can be used to this end include the χ² test and the Kolmogorov-Smirnov test.

5.3 The parameter estimation

5.3.1 The method of moments

The method of moments calculates the parameters of the distribution by assuming that the moments of the distribution (mean, variance, ...) are equal to the corresponding moments of the observed series. Tab.4.1 shows how these moments can be calculated for frequently used distributions.

5.3.2 Optimisation of the parameters

The most commonly used parameter optimisation method is the Maximum Likelihood method (see Chapter on parameter calibration).

6 RETURN PERIOD AND RISK OF A DESIGN

6.1 The return period

See Chapter on precipitation.

6.2 The risk of a design

Consider a structure that was designed based on a storm with return period T. One may wonder what the probability of failure of this structure is within a period of n years. We therefore need to define the probability PF that the structure will experience, within a period of n years, 1 or more storms that are more extreme than the one for which the structure was designed:

PF = P(1 or more failures in n years) = 1 - P(0 failures in n years)

The product law of statistics states that, for independent years:

PF = 1 - (1-p)^n

with p the probability that a more extreme storm than the design storm occurs within an arbitrary year. Consequently,

PF = 1 - (1-(1/T))^n

Example

Consider a reservoir designed for a storm with a return period of 10 years. The probability that the structure fails e.g. during the first 2 (5) years equals 19% (41%).
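The risk formula PF = 1 - (1 - 1/T)^n can be checked numerically with a short sketch (the function name `failure_risk` is hypothetical):

```python
def failure_risk(return_period, years):
    """Probability of at least one exceedance of the design event
    within the given number of years, assuming independent years."""
    return 1.0 - (1.0 - 1.0 / return_period) ** years


for years in (2, 5):
    print(f"T = 10, n = {years}: PF = {100 * failure_risk(10, years):.0f}%")
```

Note that the risk over a period equal to the return period itself, `failure_risk(10, 10)`, is about 65%, not 100%.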

7 STOCHASTIC MODELS FOR YEARLY FLOW VOLUMES

7.1 Introduction

The most important application of stochastic models concerns the generation of synthetic time series. Synthetic time series with a duration that is longer than the historical observations can be used for the evaluation of control strategies of hydraulic structures such as reservoirs.

Although stochastic models for yearly flow volumes are of little practical use in this framework, we will consider this case first, as it allows the basic concepts of stochastic modelling to be introduced.

Synthetic time series have the same stochastic characteristics as the historical observations and are thus as probable as an observed series. To this end, the stochastic model has to reproduce the important stochastic parameters of the historical time series:

the mean;
the variance;
the persistence.

The existence or absence of persistence in this framework is of the greatest importance, as the choice of the type of stochastic model depends on it.

7.2 A stochastic model for yearly flow volumes if no persistence is present

7.2.1 The general procedure

The general procedure for the generation of the synthetic time series consists of 3 steps:

the definition of the probability distribution of the observed time series;
the generation of a time series with pseudo random variables (N(0,1)-distributed);
the conversion of the pseudo random series to a time series with characteristics as defined in step 1.

7.2.2 The probability distribution for the observed time series

The following distributions may be suited for yearly flow volumes:

the log-normal distribution;
the gamma distribution (Pearson III).

Besides the type of distribution, its parameters also need to be defined.

7.2.3 The generation of a series with pseudo random numbers

The series with pseudo random numbers should fulfil the following requirements:

the numbers are normally distributed with mean 0 and standard deviation 1 (ND(0,1));
no persistence may be present in the time series;
the time series should not consist of partial series that repeat themselves.

The generation of ND(0,1) pseudo random numbers occurs in two steps:

the generation of uniformly distributed numbers, using the multiplicative congruence method;
the transformation of these numbers to ND(0,1) distributed numbers, e.g. using the method of Box and Muller.

The multiplicative congruence method

The multiplicative congruence method is used to generate uniformly distributed random numbers: each number within the series has the same probability of occurrence.

The method uses a recursive formula of the type

$$x_i = (a\,x_{i-1} + b) \bmod m$$

Hereby, the MOD operator results in the remainder after an integer division of (a x_{i-1} + b) by m; a, b and m are the (integer) parameters of the model. The model also requires an initial value for x, x0, that is called the seed of the generator.

The x values that are generated by this formula are integers between 0 and m-1. To obtain values between 0 and 1, the x values are transformed through a real division by m.

Example

Let a = 3, b = 5, m = 16 and x0 = 4.

The recursive formula states

x1= (3*4 + 5) MOD 16 = 17 MOD 16 = 1

x2 = (3*1 + 5) MOD 16 = 8 MOD 16 = 8

x3 = 29 MOD 16 = 13

x4 = 44 MOD 16 = 12

x5 = 41 MOD 16 = 9

x6 = 32 MOD 16 = 0

x7 = 5 MOD 16 = 5

x8 = 20 MOD 16 = 4

x9 = 17 MOD 16 = 1

x10 = 8 MOD 16 = 8

The normalised time series thus becomes

4/16, 1/16, 8/16, 13/16, ...

The example illustrates one of the limitations of the method: the series repeats itself after a definite number of iterations (in the example: every 8 steps). The choice of the parameters a, b and m therefore requires special attention, as these parameters define the maximum length of the series, before it starts to repeat itself.
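The recursion of the example can be reproduced with a few lines of Python. The function names `congruential` and `cycle_length` are hypothetical; the parameters are those of the example (a = 3, b = 5, m = 16, x0 = 4).

```python
def congruential(a, b, m, seed, count):
    """Generate `count` integers with x_i = (a * x_{i-1} + b) MOD m."""
    x = seed
    out = []
    for _ in range(count):
        x = (a * x + b) % m
        out.append(x)
    return out


def cycle_length(a, b, m, seed):
    """Number of steps after which the generated series repeats itself."""
    first_seen = {}
    x, step = seed, 0
    while x not in first_seen:
        first_seen[x] = step
        x = (a * x + b) % m
        step += 1
    return step - first_seen[x]


sequence = congruential(3, 5, 16, 4, 10)
print(sequence)                       # integers between 0 and m - 1
print([x / 16 for x in sequence])     # normalised to values between 0 and 1
print(cycle_length(3, 5, 16, 4))      # length of the repeating cycle
```

With m = 16 at most 16 different values can occur, so the cycle length can never exceed m; the parameters of the example reach only half of that.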

Many pocket calculators and compilers include functions for the generation of random numbers. They should be used with caution if very long time series are to be generated. When using them, the series should be checked for persistence and repetition.

Tab.7.1 shows selected values for the parameters. When selecting the parameters, one should also pay attention to problems with numerical overflow (in relation to the characteristics of the computer hardware and software).

To avoid those problems, the use of a mixing procedure, as suggested by Bays and Durham (see e.g. Knuth, 1981), is recommended. The procedure is illustrated in Fig.7.1.

Fig.7.1: The mixing procedure

Example of the mixing procedure

Step 1. Simulate n = 5 random numbers (normally e.g. 100 values would be generated):

R(1) = 0.2   R(2) = 0.7   R(3) = 0.1   R(4) = 0.4   R(5) = 0.9

We call this the mixing pile.

Step 2. Simulate a new random number Y; Y = 0.3

Consequently j = 1 + TRUNC(5*0.3) = 2

The next output value becomes R(2) = 0.7

Step 3. Simulate a new random number Y; Y = 0.5

Replace R(2) in the mixing pile by Y

The pile becomes R(1) = 0.2   R(2) = 0.5   R(3) = 0.1   R(4) = 0.4   R(5) = 0.9

Consequently j = 1 + TRUNC(5*0.5) = 3

The next output value becomes R(3) = 0.1

Step 4. Simulate a new random number Y; Y = 0.8

Replace R(3) in the mixing pile by Y

The pile becomes R(1) = 0.2   R(2) = 0.5   R(3) = 0.8   R(4) = 0.4   R(5) = 0.9

Etc.
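The worked example can be traced with the sketch below. It follows the example as read here: each newly generated number both replaces the slot that was just output and selects the next slot. The function name `mix` is hypothetical, and the numbers are fed in as a list so that the steps of the example can be reproduced exactly; in practice they would come from the congruential generator.

```python
def mix(pile, numbers):
    """Mixing procedure on an initial pile R(1..n).

    `numbers` are the further numbers delivered by the generator.
    Returns the output series and the final state of the pile.
    """
    pile = list(pile)
    stream = iter(numbers)
    y = next(stream)
    outputs = []
    for new_y in stream:
        j = int(len(pile) * y)   # zero-based form of j = 1 + TRUNC(n * Y)
        outputs.append(pile[j])  # output R(j)
        pile[j] = new_y          # replace R(j) by the new number
        y = new_y                # the new number selects the next slot
    return outputs, pile


outputs, final_pile = mix([0.2, 0.7, 0.1, 0.4, 0.9], [0.3, 0.5, 0.8])
print(outputs)
print(final_pile)
```

The final pile equals the one listed after Step 4 of the example.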

a        b         m          overflow at 2^x, x =
106      1283      6075       20
211      1663      7875       21
421      1663      7875       22
430      2531      11979      23
171      11213     53125      24
141      28411     134456     25
421      17117     81000      26
1093     18257     86436      27
1277     24749     117128     28
2311     25367     120050     29
3877     29573     139968     30
8121     28411     134456     31
9301     49297     233280     32
2416     374441    1771875    33
17221    107839    510300     34
84589    45989     217728     35

Tab.7.1: Constants for random generators (Knuth, 1981)

The Box and Muller method

Let {x_i} and {y_i} be two time series with uniformly distributed numbers between 0 and 1.

These series can be transformed into time series with Normally distributed pseudo random numbers, {u_i}, with mean zero and standard deviation 1 by the following transformation:

$$u_i = \sqrt{-2\ln x_i}\,\cos(2\pi y_i)$$

or by the transformation

$$u_i = \sqrt{-2\ln x_i}\,\sin(2\pi y_i)$$
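A sketch of the Box and Muller transformation in its usual form, u = sqrt(-2 ln x) cos(2π y) with its sine counterpart, is given below. For brevity the uniform numbers are taken from Python's built-in generator rather than from a congruential generator; this substitution and the function name `box_muller` are choices made for the illustration.

```python
import math
import random


def box_muller(x, y):
    """Transform two uniform numbers (0 < x <= 1, 0 <= y < 1) into
    two independent N(0,1) numbers."""
    radius = math.sqrt(-2.0 * math.log(x))
    angle = 2.0 * math.pi * y
    return radius * math.cos(angle), radius * math.sin(angle)


rng = random.Random(1)  # fixed seed, so the series can be reproduced
u = []
for _ in range(5000):
    x = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    y = rng.random()
    u.extend(box_muller(x, y))

mean = sum(u) / len(u)
std = math.sqrt(sum((v - mean) ** 2 for v in u) / len(u))
print(f"mean = {mean:.3f}, standard deviation = {std:.3f}")
```

The sample mean and standard deviation should be close to 0 and 1; as recommended in the text, the generated series can additionally be checked for persistence.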

7.2.4 The transformation of the pseudo random time series

During the final step of the procedure, the pseudo random time series {u_i} is transformed into a series with the same distribution characteristics as the observed time series.

If the observed time series consists of Normally distributed numbers, the time series {u_i} of Normally distributed numbers with mean 0 and standard deviation 1 is transformed into a time series {Y_i} with the observed mean $\bar{y}$ and standard deviation s by simply applying the following equation:

$$Y_i = \bar{y} + s\,u_i$$
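For the Normal case, the three-step procedure of Section 7.2.1 reduces to scaling and shifting N(0,1) numbers. The sketch below is an illustration only: it uses Python's `random.gauss` in place of the congruential generator and the Box and Muller method, the function name is hypothetical, and the mean and variance are those of the example series of Section 3.3.6.

```python
import math
import random


def synthetic_normal_series(mean, std, length, seed=None):
    """Synthetic series Y_i = mean + std * u_i with u_i ~ N(0,1)."""
    rng = random.Random(seed)
    return [mean + std * rng.gauss(0.0, 1.0) for _ in range(length)]


# Mean and variance of the yearly flows of Section 3.3.6 (10^6 m3).
series = synthetic_normal_series(1467.0, math.sqrt(86358.0), 1000, seed=42)
sample_mean = sum(series) / len(series)
print(f"generated {len(series)} years, sample mean = {sample_mean:.0f}")
print(f"number of negative values: {sum(1 for q in series if q < 0)}")
```

Because the Normal distribution is unbounded, negative flow volumes can be generated when the standard deviation is large relative to the mean, which is one reason why the log-normal or Pearson III distributions are usually preferred.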

If the observed time series is log-normally distributed (with mean m and standard deviation s), the following transformation should be applied:

If the observed time series is characterised by a Pearson III type distribution, the transformation equation is:

with and Cs the skewness coefficient.

7.3 A stochastic model for yearly flow volumes if persistence is present

If persistence is present, the synthetic time series cannot be generated by simply sampling values from a probability distribution, as the persistence needs to be included in the synthetic series. This can be achieved by means of an AR(I)MA model (see Section 9).

8 STOCHASTIC MODELS FOR MONTHLY FLOW VOLUMES

8.1 Introduction

Monthly flow data are often characterised by a typical periodicity or yearly cycle, as a consequence of the yearly rainfall pattern. This periodicity, which is absent from yearly data, requires the use of more complex models.

The periodicity makes the monthly time series non-stationary. The non-stationarity can occur for:

the mean monthly values (e.g. the mean flows for the dry months of the year are smaller than those during the wet season);
the variances of the monthly values (e.g. the variance of the flows for the dry months is smaller than that for the wet months);
the correlation between the flows of consecutive months (e.g. the correlation between the flows of June and July is higher than the correlation between the flows of January and February).

The method that can be used for monthly flows depends on the presence or absence of these non-stationarities. Note that the classification that follows may refer to the actual values in the series or to a transformation of these values (e.g. a logarithmic transformation).

8.2 Mean, variance and correlations are non-stationary: the Thomas-Fiering model

The method presented by Thomas and Fiering is based on the use of 12 linear regression equations, one for each month of the year:

$$q_{i+1} = \bar{q}_{j+1} + b_j\,(q_i - \bar{q}_j) + \varepsilon_i\,s_{j+1}\sqrt{1 - r_j^2}$$

with q the flow volume, $\bar{q}_j$ the mean flow volume of month j and s the standard deviation of the volume.

The index i represents a sequential time index, index j stands for the corresponding month of the year (j = 1, ..., 12). Example: consider a time series that starts in January 2000; the index j for the month of March 2001 is 3; index i = 12+3 = 15.

ε represents a random Normal deviate with mean 0 and standard deviation 1.

r_j is the correlation between the flows of months j and j+1.

b_j represents the slope of the regression line between the flow of month j+1 and the flow of month j, i.e. $b_j = r_j\,s_{j+1}/s_j$.

Note that if the correlation between the flows of consecutive months tends towards 1 or -1, the random term tends towards zero. Conversely, for low correlations, b will be small and the random term will predominate.

A synthetic time series can be generated with the model, using a pseudo random generator for the generation of the ε values.
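A sketch of a generator based on the Thomas and Fiering regression is given below, using the model in its usual form, q(i+1) = mean(j+1) + b(j) (q(i) - mean(j)) + ε s(j+1) sqrt(1 - r(j)^2) with b(j) = r(j) s(j+1) / s(j). The function name and all monthly statistics are hypothetical values chosen for the illustration, and the series is assumed to start in January.

```python
import math
import random


def thomas_fiering(means, stds, corrs, q_start, n_months, seed=None):
    """Generate monthly flows with 12 regression equations.

    means[j], stds[j]: mean and standard deviation of month j (0 = January);
    corrs[j]: correlation between month j and the following month
    (corrs[11] links December to January); q_start: flow of the first month.
    """
    rng = random.Random(seed)
    flows = [q_start]
    for i in range(n_months - 1):
        j, j_next = i % 12, (i + 1) % 12
        slope = corrs[j] * stds[j_next] / stds[j]
        eps = rng.gauss(0.0, 1.0)
        q = (means[j_next]
             + slope * (flows[-1] - means[j])
             + eps * stds[j_next] * math.sqrt(1.0 - corrs[j] ** 2))
        flows.append(q)
    return flows


# Hypothetical monthly statistics (wet winter, dry summer).
means = [90, 85, 70, 55, 40, 30, 25, 25, 35, 50, 70, 85]
stds = [30, 28, 22, 16, 10, 7, 6, 6, 9, 15, 22, 28]
corrs = [0.6, 0.6, 0.65, 0.7, 0.8, 0.85, 0.85, 0.8, 0.7, 0.6, 0.6, 0.6]

synthetic = thomas_fiering(means, stds, corrs, q_start=90.0, n_months=120, seed=3)
print([round(q, 1) for q in synthetic[:12]])
```

As the random term is Normally distributed, negative flows can occur; in practice the model is therefore often applied to transformed (e.g. logarithmic) values, as noted in Section 8.1.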

8.3 The correlation structure and the variance are stationary

In this case, the periodicity in the mean flows has to be removed.

Let $\bar{q}_j$ be the average flow of month j; j = 1, ..., 12.

A residual time series can be derived:

$$y_i = q_i - \bar{q}_j$$

So far, the number of parameters of the model equals 12 (the monthly mean flows). The number of parameters can be reduced by fitting a Fourier series on the mean flows:

$$\bar{q}_j \approx A_0 + \sum_{h=1}^{H}\left[A_h\cos\left(\frac{2\pi h j}{12}\right) + B_h\sin\left(\frac{2\pi h j}{12}\right)\right]$$

with H the number of harmonics that is retained.

The next step of the procedure consists of the analysis of the residual time series for persistence.

If persistence is absent, {y_i} corresponds to a pseudo random series with a given distribution and parameters (to be determined). The synthetic series is then generated as:

$$q_i = \bar{q}_j + \varepsilon_i$$

whereby ε_i represents a pseudo random number, characterised by the same distribution as {y_i}.

If the residual series shows persistence, y needs to be represented by an AR(I)MA model.

8.4 Only the correlation structure is stationary

The procedure is similar to the one described for the previous case, except that the residual time series will now be represented by

$$y_i = \frac{q_i - \bar{q}_j}{s_j}$$

with $\bar{q}_j$ the mean and s_j the standard deviation of the flows of month j.

Fourier harmonics can be defined for both the monthly means and the standard deviations.

As a consequence of the smoothing process, the mean of the y values usually deviates from zero and an additional transformation is performed. Let $\bar{y}$ be the mean and s_y the standard deviation of the y values. The new reduced variables are determined as:

$$z_i = \frac{y_i - \bar{y}}{s_y}$$

The z time series is then tested for persistence.

If there is no persistence, the z values can be generated by a pseudo random generator. Next the y values and, finally, the q values are derived by inverse transformation.

Most of the time, persistence will be present and, consequently, the z values will have to be modelled by means of an AR(I)MA model.
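The removal of the periodicity in the mean and the standard deviation can be sketched as follows. The sketch assumes a series of whole years starting in January, uses the raw monthly means and standard deviations (no Fourier smoothing, so the second reduction to z changes nothing but is kept to show the step), and divides by the number of years in the variance; the function name `standardise_monthly` is hypothetical.

```python
import math


def standardise_monthly(flows):
    """Return (monthly means, monthly stds, y, z) for a monthly series.

    y_i = (q_i - mean_j) / std_j removes the periodicity in mean and
    variance; z_i = (y_i - mean_y) / std_y is the reduced variable.
    """
    if len(flows) % 12 != 0:
        raise ValueError("the series must cover whole years")
    n_years = len(flows) // 12
    means, stds = [], []
    for j in range(12):
        month = flows[j::12]  # all values of calendar month j
        m = sum(month) / n_years
        means.append(m)
        stds.append(math.sqrt(sum((q - m) ** 2 for q in month) / n_years))
    y = [(q - means[i % 12]) / stds[i % 12] for i, q in enumerate(flows)]
    y_mean = sum(y) / len(y)
    s_y = math.sqrt(sum((v - y_mean) ** 2 for v in y) / len(y))
    z = [(v - y_mean) / s_y for v in y]
    return means, stds, y, z


# Hypothetical two-year record: the second year is 20% wetter.
record = [10.0 * (j + 1) for j in range(12)] + [12.0 * (j + 1) for j in range(12)]
monthly_means, monthly_stds, y, z = standardise_monthly(record)
print(monthly_means[:3], monthly_stds[:3])
print(z[:3], z[12:15])
```

The z series can then be tested for persistence with the tests of Section 3.3; the inverse transformation q = mean_j + std_j (mean_y + std_y z) returns flows.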

9 ARIMA MODELS

9.1 Introduction

Autoregressive Integrated Moving Average (ARIMA) models are general linear models for the representation of stationary or non-stationary sequences of data.

The ARIMA models can be reduced to one of the following models:

AR or autoregressive model;
MA or moving average model;
ARMA or autoregressive - moving average model.

9.2 Autoregressive models

In an autoregressive (AR) model, the deviation of a value from the mean value at a given moment in time is explained by similar deviations that occurred in the past and by a random component:

$$(y_t - \bar{y}) = \phi_1 (y_{t-1} - \bar{y}) + \phi_2 (y_{t-2} - \bar{y}) + \dots + \phi_k (y_{t-k} - \bar{y}) + \varepsilon_t$$

or

$$\phi(B)\,(y_t - \bar{y}) = \varepsilon_t$$

whereby

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_k B^k, \qquad B\,y_t = y_{t-1}$$

k is called the order of the AR model.

An autoregressive model of order k is characterised by k+2 parameters:

k values of φ;
the mean of the y values;
the standard deviation of the normally distributed random variable ε.

The autoregressive model of first order, also called Markov chain, has found many applications in hydrology. The parameter φ1 represents the lag 1 serial correlation coefficient and is a measure for the strength of the persistence in the time series: a value of |φ1| close to 1 indicates a strong persistence and a value close to 0 a weak persistence.
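A first-order autoregressive (Markov) generator can be sketched as below. The relation between the standard deviation of the random component and that of the series, sigma_eps = s * sqrt(1 - phi1^2), is the standard result for a stationary AR(1) process and is not stated in the text; the function name and the parameter values are hypothetical.

```python
import math
import random


def ar1_series(mean, std, phi1, length, seed=None):
    """Generate y_t = mean + phi1 * (y_{t-1} - mean) + eps_t.

    eps_t is Normally distributed with mean 0 and standard deviation
    std * sqrt(1 - phi1^2), so that the series keeps the given std.
    """
    rng = random.Random(seed)
    sigma_eps = std * math.sqrt(1.0 - phi1 ** 2)
    y = [mean + std * rng.gauss(0.0, 1.0)]  # start from the marginal distribution
    for _ in range(length - 1):
        y.append(mean + phi1 * (y[-1] - mean) + rng.gauss(0.0, sigma_eps))
    return y


# Hypothetical parameters: yearly flows with a moderate persistence.
series = ar1_series(mean=1467.0, std=294.0, phi1=0.5, length=5000, seed=11)

# Check: the lag 1 serial correlation of the result should be near phi1.
m = sum(series) / len(series)
x = [v - m for v in series]
r1 = sum(a * b for a, b in zip(x, x[1:])) / sum(a * a for a in x)
print(f"lag 1 serial correlation of the synthetic series: {r1:.2f}")
```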

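9.3 Moving average models

In a moving average (MA) model, the deviation of a value from the mean value is explained by a weighted sum of independent random deviations ε:

$$(y_t - \bar{y}) = \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_l \varepsilon_{t-l}$$

or

$$(y_t - \bar{y}) = \theta(B)\,\varepsilon_t$$

whereby, with B the backward shift operator ($B\,\varepsilon_t = \varepsilon_{t-1}$),

$$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_l B^l$$

An MA model of order l includes l+2 parameters:

l values of θ;
the mean of the y values;
the standard deviation of the ε values (the mean is zero).

Pure MA models are rarely used in hydrology.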

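9.4 ARMA and ARIMA models

By combining an AR model with an MA model, an ARMA model is obtained:

$$(y_t - \bar{y}) = \phi_1 (y_{t-1} - \bar{y}) + \dots + \phi_k (y_{t-k} - \bar{y}) + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \dots - \theta_l \varepsilon_{t-l}$$

or

$$\phi(B)\,(y_t - \bar{y}) = \theta(B)\,\varepsilon_t$$

with φ(B) and θ(B) the AR and MA polynomials in the backward shift operator B.

Even more general are the autoregressive integrated moving average models:

$$\phi(B)\,\nabla^d y_t = \theta(B)\,\varepsilon_t$$

whereby the difference operator is defined by:

$$\nabla y_t = y_t - y_{t-1} = (1 - B)\,y_t$$

The reference to an ARIMA model is usually of the type ARIMA(k,d,l). Hereby, k represents the order of the AR model, d the order of differencing and l the order of the MA model.

ARIMA models are of particular importance for the representation of the Hurst phenomenon in series with long term persistence. Pure AR models are not able to represent such long term persistence.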

9.5 Model identification

The key elements for the identification of the appropriate model type include:

the visual representation of the time series;
the behaviour of the autocorrelation function;
the behaviour of the partial autocorrelation function.

The partial autocorrelation φ_kk is obtained by fitting AR processes of order k = 1, 2, ... to the time series; φ_kk is the last coefficient of the fitted AR model of order k:

$$(y_t - \bar{y}) = \phi_{k1} (y_{t-1} - \bar{y}) + \dots + \phi_{kk} (y_{t-k} - \bar{y}) + \varepsilon_t$$

The values of φ_kk as a function of k represent the partial autocorrelation function.

The characteristics of the identification elements for the different models are given in Tab.9.1.
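One common way to obtain the partial autocorrelations from the serial correlation coefficients is the Durbin-Levinson recursion. It is not named in the text and is used here only as an illustration; the function name `partial_autocorrelations` is hypothetical.

```python
def partial_autocorrelations(r):
    """Partial autocorrelations from serial correlations.

    r[0] = r_1, r[1] = r_2, ...; returns [phi_11, phi_22, ...] computed
    with the Durbin-Levinson recursion.
    """
    pacf = []
    prev = []  # coefficients phi_{k-1,1} ... phi_{k-1,k-1}
    for k in range(1, len(r) + 1):
        if k == 1:
            phi_kk = r[0]
            current = [phi_kk]
        else:
            num = r[k - 1] - sum(prev[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * r[j] for j in range(k - 1))
            phi_kk = num / den
            current = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf


# Theoretical correlogram of an AR(1) process with phi1 = 0.5: r_k = 0.5^k.
r = [0.5 ** k for k in range(1, 6)]
print(partial_autocorrelations(r))
```

For the AR(1) correlogram the partial autocorrelation is non-zero at lag 1 only, which is the cut-off behaviour listed for AR(k) models in Tab.9.1.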

model | autocorrelation | partial autocorrelation
AR(k) | Infinite in length, damped exponentials and/or sine waves | Finite in length (up to lag k)
MA(l) | Finite in length (up to lag l) | Infinite in length, damped exponentials and/or sine waves
ARMA(k,l) | Infinite in length, first values are irregular, afterwards damped exponentials and/or sine waves | Infinite in length, first values are irregular, afterwards damped exponentials and/or sine waves

Tab.9.1: Identification of ARMA processes

REFERENCES

Box, G.E.P. & G.M. Jenkins (1970). Time series analysis : forecasting and control. Holden-Day Inc.

Clarke R.T. (1973). Mathematical models in hydrology. Irrigation and drainage paper, 19, FAO, UN, Rome.

Delleur J.W. (1991). Time series analysis applied to hydrology. VUB-Hydrology, No.19, Vrije Universiteit Brussel

Knuth D. (1981). Seminumerical algorithms. In: The art of computer programming, Addison-Wesley.

Mutreja K.N. (1986). Applied hydrology. Tata McGraw Hill Co., New Delhi.

Cunnane C. (1989). Statistical distributions for flood frequency analysis. World Meteorological Organization. Operational Hydrology Report No.33. WMO No.718.

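[Fig.3.1: correlogram of r_k against the lag k (0 to 6), with the significance limits ±2/√N]

[Fig.7.1: flow chart of the mixing procedure: generate n random numbers R(1), ..., R(n); generate random number Y; j = 1 + TRUNC(n * Y); output R(j); generate new Y and store it in R(j)]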
