8/14/2019 Frequency_Analysis
KNS3143/07-08/RB
10 Frequency Analysis
FREQUENCY ANALYSES
1 Introduction
Hydrologic processes such as rainfall, snowfall, floods, droughts, etc., are usually investigated by analyzing their records of observations. Many characteristics of these processes seem to vary in a way not amenable to deterministic analysis. Frequency analysis is performed to determine the
frequency of the likely occurrence of hydrologic events. This information is required in the
solution of a variety of water resource problems. Examples include design of reservoirs,
floodways, irrigation systems, stream-control works, water supply systems, hydroelectric
power plants, etc.
2 Motivation for Using Probability
An engineering design always takes into account the extreme situations. When a highway bridge
is built, it must be able to pass the design discharge of specified magnitude without being flooded
during its life span. The same holds when a dam or a drainage structure is built. When an urban area is developed, storm drainage is provided so that the area is not likely to be flooded.
When a building is constructed, it must withstand design wind loads of given magnitude. When a
water-supply scheme is designed, the scheme must be able to supply water to the specified extent.
Clearly, a design event is required for designing a water-resources project. The magnitude of this
design event varies from one project to the other, that is, it will be different for different
structures such as culverts, bridges, dams, etc. No matter how large the design event used, there is
always some chance (or risk) that this design event will be exceeded during the intended useful
life of the project. How large is this risk? This and similar questions can be answered using concepts of probability.
Concepts of Probability
Let us consider an experiment of tossing a coin, which has a head (H) and a tail (T). When a coin
is tossed, we get either H or T, but cannot get both at the same time, for they are mutually
exclusive. Since there are only two possibilities in this case, the probability of getting H and T is
0.5 each. Thus, one can say that in a single trial the probability of an equally likely event is
equal to the number of favorable outcomes divided by the total number of possible outcomes. In tossing, H is an
event, the appearance of H is an outcome, and H and T are the possible outcomes. In more than one
trial, the probability of an event is equal to the number of successes divided by the number of trials.
The probability of H or T is always 1, for success will be achieved every time the coin is tossed.
This leads to formulation of the following rules.
Rule 1
The probability of an event E is non-negative and less than or equal to 1:

0 ≤ P(E) ≤ 1

The sum of the probabilities of all possible outcomes in any trial is 1:

Σ_{i=1}^{N} P(E_i) = 1
Rule 2
For two mutually exclusive events, E1 and E2, the probability of E1 or E2 is equal to
the probability of E1 plus the probability of E2:

P(E1 or E2) = P(E1 ∪ E2) = P(E1) + P(E2)
Rule 3
For two independent events E1 and E2, the probability of E1 and E2 is equal to the product of
the individual probabilities of E1 and E2:

P(E1 and E2) = P(E1 ∩ E2) = P(E1) × P(E2)
Return Period
Suppose a coin is tossed once a year. On the average, its head will appear once in 2 years, or
1/P(H). This reciprocal of the probability of occurrence is termed the return period, or recurrence interval, T, in hydrology, and is widely used in hydrologic frequency analysis.
T = 1 / P
Clearly, T is an average value and not an actual value of occurrence of the associated outcome.
Thus a storm that is exceeded on the average once in 10 years has a probability of
exceedance in any year of 1/10, or 0.1. In other terms, a storm that is exceeded on the average
once in 10 years has a 0.1 × 100 = 10% probability of being exceeded in any year. This
does not mean that a storm of that magnitude will occur every 10 years. Therefore, the probability
that the storm will occur in any year is
P = 1 / T
The probability that the storm will not occur, P′, in any year is

P′ = 1 − P = 1 − 1/T

The probability that the storm will not occur for n successive years is

(P′)^n = (1 − P)^n = (1 − 1/T)^n

This is because the probability of storm occurrence is the same from year to year. The probability
that the storm will occur at least once in n successive years, sometimes also called the risk R, is

R = 1 − (P′)^n = 1 − (1 − P)^n = 1 − (1 − 1/T)^n

The above equation can be used to calculate the return periods for various degrees of risk and
expected project life, as given in Table 1.
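These relations are straightforward to evaluate numerically. A minimal sketch in Python (the function names are mine, not from the text):

```python
# Risk of at least one exceedance of the T-year event during an n-year
# project life: R = 1 - (1 - 1/T)**n  (from the equations above).

def risk(T, n):
    """Probability that the T-year event is equaled or exceeded
    at least once in n successive years."""
    return 1.0 - (1.0 - 1.0 / T) ** n

def required_return_period(R, n):
    """Return period T needed so the risk over an n-year life is R.
    Obtained by inverting R = 1 - (1 - 1/T)**n."""
    return 1.0 / (1.0 - (1.0 - R) ** (1.0 / n))

# Example: a 100-year flood over a 50-year project life
print(round(risk(100, 50), 3))                  # ~0.395, about a 40% risk
print(round(required_return_period(0.10, 50)))  # T = 475 yr for a 10% risk
```

This is the kind of calculation that produces Table 1: a fixed design return period carries a surprisingly large risk over a long project life.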
Frequency and Probability Functions
If the observations in a sample are identically distributed (each sample value drawn from the
same probability distribution), they can be arranged to form a frequency histogram. First, the
feasible range of the random variable is divided into discrete intervals, then the number of
observations falling into each interval is counted, and finally the result is plotted as a bar graph as
shown in Fig 1.
The width Δx of the interval used in setting up the frequency histogram is chosen to be as small as possible while still having sufficient observations falling into each interval for the histogram to have a reasonably smooth variation over the range of the data.
If the number of observations n_i in interval i, covering the range (x_i − Δx, x_i), is divided by the total number of observations n, the result is called the relative frequency function f_s(x).
f_s(x_i) = n_i / n

which is an estimate of P(x_i − Δx ≤ X ≤ x_i), the probability that the random variable X will lie in the interval [x_i − Δx, x_i]. The subscript s indicates that the function is calculated from the sample data.
The sum of the values of the relative frequencies up to a given point is the cumulative frequency
function F_s(x):

F_s(x_i) = Σ_{j=1}^{i} f_s(x_j)
This is an estimate of P(X ≤ x_i), the cumulative probability of x_i.
The relative frequency and cumulative frequency functions are defined for a sample; the
corresponding functions for the population are approached as limits as n → ∞ and Δx → 0. In the limit, the relative frequency function divided by the interval length Δx becomes the probability density function f(x):

f(x) = lim_{n→∞, Δx→0} f_s(x) / Δx

The cumulative frequency function becomes the probability distribution function F(x):

F(x) = lim_{n→∞, Δx→0} F_s(x)

For a given value of x, F(x) is the cumulative probability P(X ≤ x), which can be expressed as the integral of the probability density function over the range X ≤ x:
P(X ≤ x) = F(x) = ∫_{−∞}^{x} f(u) du
where u is a dummy variable of integration.
One of the best-known probability density functions is that forming the familiar bell-shaped curve
for the normal distribution.
Distribution Characteristics
There are four principal characteristics of probability distributions. Their parameters are estimated
from observed sample data and are then used as estimates of the parameters of
the population distributions.
1. Central Tendency
a. Arithmetic Mean
This is the most common and most reliable measure of central tendency. It is the first moment
about the origin and can be expressed as

E(x) = μ = ∫_{−∞}^{∞} x f(x) dx
An estimate of the population mean is obtained from a sample as
x̄ = (1/n) Σ_{i=1}^{n} x_i
2. Dispersion
The two most common measures of dispersion or variability are the range and the variance. For a
sample, the range is the difference between the largest and smallest values, and it conveys an idea
of the spread of the data. The range of many continuous hydrologic variables is from 0 to ∞.
The most commonly used measure of dispersion is the variance, or its square root, the standard deviation. The
variance is the second moment about the mean and is expressed for a continuous variable X as

Var[X] = σ² = E[(X − μ)²] = E[X²] − μ²

This expresses the variance as the average squared deviation about the mean.
For a discrete population of size n,

σ² = (1/n) Σ_{i=1}^{n} (x_i − μ)²
Because μ is not known precisely, an estimate of the variance, s², is computed from the observed sample as

s² = (1/(n − 1)) Σ_{i=1}^{n} (x_i − x̄)²
If the sample data are grouped, then

s² = Σ_{i=1}^{k} f(x_i)(x_i − x̄)²

where k is the number of class intervals and f(x_i) is the relative frequency of the ith class interval.
3. Symmetry (Skewness)
For a sample of n values, the third moment about the mean, M₃, is estimated as
M₃ = (1/n) Σ_{i=1}^{n} (x_i − x̄)³
However, the most commonly used measure of asymmetry is the coefficient of skewness, Cs, which is defined as the ratio of the third central moment to the cube of the standard
deviation:

Cs = μ₃ / σ³
An unbiased estimate of Cs for a sample of size n is obtained as

Cs = n² M₃ / [(n − 1)(n − 2) s³]
This coefficient is dimensionless and useful in comparing distributions. For a symmetrical
distribution, μ₃ = 0 and therefore Cs = 0. For a distribution that has a long tail to the right, Cs > 0 and the
distribution is positively skewed. If the distribution has a long tail to the left, it has negative
skewness. If the sample size n is small, the estimate of Cs can be strongly biased.
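The sample estimators above (x̄, s² in its 1/(n − 1) form, M₃, and Cs) can be sketched together as follows; the function name and the tiny test sample are mine:

```python
import math

# Sketch of the sample estimators discussed above; in practice x would
# be a series such as annual maximum flows from gauged records.

def sample_stats(x):
    n = len(x)
    mean = sum(x) / n                                    # x-bar
    s2 = sum((v - mean) ** 2 for v in x) / (n - 1)       # sample variance s^2
    s = math.sqrt(s2)                                    # standard deviation
    m3 = sum((v - mean) ** 3 for v in x) / n             # third moment M3
    cs = n * n * m3 / ((n - 1) * (n - 2) * s ** 3)       # skewness coeff. Cs
    return mean, s, cs

# A symmetrical sample gives Cs = 0
mean, s, cs = sample_stats([1.0, 2.0, 3.0, 4.0, 5.0])
print(mean, s, cs)   # 3.0, ~1.581, 0.0
```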
2- Log Normal Distribution
This is an extension of the normal distribution wherein the logarithms of a sequence are
considered to be normally distributed. In this form it is a two-parameter, bell-shaped, symmetrical
distribution. In terms of the untransformed variate, x, it is a three-parameter (skewed) distribution having a range from 0 to ∞.
3- Extreme Value Distribution
Consider n data series with m observations in each series. The largest or smallest value is
obtained from the m observations in each series, giving n such extreme values. The
probability distribution of these extreme values depends on the sample size, m, and the parent
distribution of the series. Fréchet, and Fisher and Tippett, found that the distribution approaches an
asymptotic form as m is increased indefinitely. The type of asymptotic form depends on
the parent distribution from which the extreme values were obtained. Three types of
asymptotic distributions have been developed based on different parent distributions.
The type I extreme value distribution, also known as the Gumbel distribution, results from the exponential-type parent distribution.
The type II distribution originates from a Cauchy-type parent distribution
but has little application to hydrologic events.
The type III, or Weibull, distribution also arises from an exponential-type parent distribution
but is limited in the direction of the extreme values.
4- Log-Pearson Type III (Gamma-Type) Distribution
Karl Pearson proposed a general equation for a distribution that fits many distributions,
including the normal, beta, and gamma distributions, by choosing appropriate values for its
parameters. A form of the Pearson function similar to the gamma distribution is known as the Pearson type III distribution. It is a three-parameter distribution with a limited range in the left
direction, unbounded to the right, and can have a large skew. Since flood flow series commonly
show considerable skew, it is used as a distribution of flood peaks. The distribution is
usually fitted to the logarithms of the flood values because this results in lower skewness. The log-
Pearson type III distribution has been adopted as a standard by US federal agencies for flood
analyses.
Method of Flood Frequency Analysis
The different methods to prepare a flood frequency curve are:
1. Graphical method
2. Analytical method
3. Using a spreadsheet (Microsoft Excel)
4. Using special software (SMADA or HEC)
Graphical Method using Probability Paper
The cumulative probability of a theoretical distribution may be represented graphically on
probability paper designed for that distribution. On such paper the ordinate usually represents the
value of x in a certain scale, and the abscissa represents the probability P(X ≤ x) or P(X ≥ x), the
return period T, or the reduced variate y_T. The ordinate and abscissa scales are designed so that the
data to be fitted are expected to appear close to a straight line. The purpose of using
probability paper is to linearize the probability relationship so that the plotted data can be used for interpolation, extrapolation, or comparison.
Plotting Positions
Plotting position refers to the probability value assigned to each piece of data to be plotted.
Numerous methods have been proposed for determining plotting positions, most of which
are empirical. If n is the total number of values to be plotted and m is the rank of a value in a list ordered by descending magnitude, then for large n the exceedance probability of the mth largest value, x_m, is

P(X ≥ x_m) = m / n
However, this simple formula (known as California's formula) produces a probability of 100% for m = n, which cannot easily be plotted on a probability scale. As an adjustment, the above
formula may be modified to

P(X ≥ x_m) = (m − 1) / n

While this formula does not produce 100%, it yields a zero probability (for m = 1), which cannot
easily be plotted on a probability scale either.
Several formulas have been suggested, as shown in Table 3.
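As a sketch of how plotting positions are computed, take the Weibull formula P = m/(n + 1), a widely used formula of the kind collected in Table 3 (assumed here, since the table itself is not reproduced); the function name and the flow values are illustrative:

```python
# Sketch: plotting positions for an annual flood series using the
# Weibull formula P = m / (n + 1), where m is the rank in descending
# order. This avoids both the 0% and 100% endpoints.

def plotting_positions(flows):
    """Return (flow, exceedance probability, return period) tuples,
    ranked from largest to smallest."""
    ranked = sorted(flows, reverse=True)     # step 1: rank the data
    n = len(ranked)
    out = []
    for m, q in enumerate(ranked, start=1):
        p = m / (n + 1)                      # step 2: plotting position
        out.append((q, p, 1.0 / p))          # T = 1 / P
    return out

floods = [410.0, 290.0, 520.0, 350.0, 600.0, 440.0, 380.0, 310.0, 470.0]
for q, p, T in plotting_positions(floods):
    print(f"{q:6.1f}  P = {p:.3f}  T = {T:5.2f} yr")
```

With 9 values, the largest flood gets P = 1/10 = 0.1, i.e. a 10-year plotting position.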
Procedure
1. Rank the data from the largest to the smallest value. If two or more observations have the same value, assume that they have slightly different values and assign each a different
rank.
2. Calculate the plotting positions using one of the formulas given in Table 3.
3. Do not omit any years during the period of record, since this will have a biasing effect. If any
data are missing, estimate them.
4. Select the type of probability paper to be used.
5. Plot the magnitude of the flood on the ordinate and the corresponding plotting position on the
abscissa, with the probability of exceedance on one side of the scale and the
return period on the other.
Extrapolation of the data to longer return periods should be done very cautiously, because the
probability distribution is very sensitive in the tail of the curve.
Probability plotting on a spreadsheet
Normal distribution
For the normal distribution, P_i is converted to another scale Z_i by the following relation:

Z_i = 5.063 [ P_i^0.135 − (1 − P_i)^0.135 ]

Now x_i and Z_i are linearly related, and a plot of x_i vs Z_i will give a straight line on an arithmetic scale
if the data are normally distributed.
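A sketch of this transformation, treating P_i as a cumulative (non-exceedance) probability, under which the relation closely approximates the standard normal quantile:

```python
# Sketch: approximate standard-normal scale Z for probability P,
# using the relation above: Z = 5.063 * (P**0.135 - (1 - P)**0.135).
# P is taken here as a non-exceedance probability (an assumption).

def z_scale(p):
    """Approximate standard-normal transform of probability p (0 < p < 1)."""
    return 5.063 * (p ** 0.135 - (1.0 - p) ** 0.135)

# P = 0.5 maps to Z = 0; P = 0.975 maps to roughly 1.97, close to the
# exact standard normal quantile 1.96.
print(z_scale(0.5))
print(round(z_scale(0.975), 2))
```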
Lognormal distribution
Exactly the same procedure as for normal plotting, but a log scale is used for the y-axis.
Gumbel Probability Plot
To perform a Gumbel plot, P_i needs to be converted to another scale y_i by using the following
relation:

y_i = −ln[−ln(P_i)]

Now x_i and y_i are linearly related; a plot of x_i vs y_i will give a straight line on an arithmetic scale if
the data are Gumbel distributed.
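A sketch of the Gumbel transformation, again treating P_i as a non-exceedance probability (an assumption; if exceedance plotting positions are used, substitute 1 − P_i):

```python
import math

# Sketch: Gumbel reduced variate y = -ln(-ln(P)) for non-exceedance
# probability P; plotting x_i against y_i linearizes a Gumbel fit.

def gumbel_reduced_variate(p):
    """Reduced variate y for non-exceedance probability p (0 < p < 1)."""
    return -math.log(-math.log(p))

# The reduced variate grows with the return period T, since P = 1 - 1/T:
for T in (2, 10, 100):
    p = 1.0 - 1.0 / T
    print(T, round(gumbel_reduced_variate(p), 3))
```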
Log-Gumbel Probability Plotting
Exactly the same procedure as for Gumbel plotting, but use a log scale for the y-axis.
Goodness-of-fit tests
The distribution that has been chosen to fit the observed data can be tested for goodness-of-fit.
This is done by comparing theoretical and sample values. With graphical methods this is
done by visually assessing the goodness-of-fit. Statistical goodness-of-fit tests are available to test the
hypothesis that the observed data come from the fitted probability distribution.
Probability plot correlation coefficient (PPCC) tests.
This test is usually performed to test the linearity of a probability plot on normal, lognormal,
Gumbel, or log-Gumbel paper.
PPCC test for normal probability plot
Compute the correlation coefficient r between x_i and z_i. If r is greater than the critical r value
at α = 5%, then the data are normally distributed.
PPCC test for lognormal probability plot
Compute the correlation coefficient r between log(x_i) and z_i. If r is greater than the critical r
value at α = 5%, then the data are lognormally distributed.
PPCC test for Gumbel probability plot
Compute the correlation coefficient r between x_i and y_i. If r is greater than the critical r value
at α = 5%, then the data are Gumbel distributed.
PPCC test for log-Gumbel probability plot
Compute the correlation coefficient r between log(x_i) and y_i. If r is greater than the critical r
value at α = 5%, then the data are log-Gumbel distributed.
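The PPCC computation for, say, the Gumbel plot can be sketched as follows; the plotting-position formula and the sample data are assumptions of mine, and the critical r value must still be looked up in published PPCC tables:

```python
import math

# Sketch: PPCC for a Gumbel probability plot. Pearson's r is computed
# between the ordered data x_i and the Gumbel reduced variates y_i
# derived from Weibull plotting positions (an assumed choice).

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def gumbel_ppcc(flows):
    x = sorted(flows)                          # data in ascending order
    n = len(x)
    # non-exceedance plotting positions m/(n+1), m = 1..n (ascending)
    y = [-math.log(-math.log(m / (n + 1))) for m in range(1, n + 1)]
    return pearson_r(x, y)

flows = [310.0, 350.0, 380.0, 410.0, 440.0, 470.0, 520.0, 600.0, 290.0]
r = gumbel_ppcc(flows)
print(round(r, 3))   # r close to 1 indicates a nearly linear Gumbel plot
# Compare r with the critical value from a PPCC table at alpha = 5%.
```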