8/14/2019 Frequency_Analysis
KNS3143/07-08/RB
10 Frequency Analysis
FREQUENCY ANALYSES
1 Introduction
Hydrologic processes such as rainfall, snowfall, floods, droughts, etc., are usually investigated by analyzing their records of observations. Many characteristics of these processes seem to vary in a way not amenable to deterministic analysis. Frequency analysis is performed to determine the
frequency of the likely occurrence of hydrologic events. This information is required in the
solution of a variety of water resource problems. Examples include design of reservoirs,
floodways, irrigation systems, stream-control works, water supply systems, hydroelectric
power plants, etc.
2 Motivation for Using Probability
An engineering design always takes into account the extreme situations. When a highway bridge
is built, it must be able to pass the design discharge of specified magnitude without being flooded
during its life span. The same holds when a dam or a drainage structure is built. When an urban area is developed, storm drainage is provided so that the area is not likely to be flooded.
When a building is constructed, it must withstand design wind loads of given magnitude. When a
water-supply scheme is designed, the scheme must be able to supply water to the specified extent.
Clearly, a design event is required for designing a water-resources project. The magnitude of this
design event varies from one project to the other, that is, it will be different for different
structures such as culverts, bridges, dams, etc. No matter how large the design event used, there is
always some chance (or risk) that this design event will be exceeded during the intended useful
life of the project. How large is this risk? This and similar questions can be answered using concepts of probability.
Concepts of Probability
Let us consider an experiment of tossing a coin, which has a head (H) and a tail (T). When a coin
is tossed, we get either H or T, but cannot get both at the same time, for they are mutually
exclusive. Since there are only two possibilities in this case, the probability of getting H and T is
0.5 each. Thus, one can say that in a single trial the probability of an equally likely event is
equal to the number of favorable outcomes divided by the total number of possible outcomes. In tossing, H is an
event, the appearance of H is an outcome, and H and T are the possible outcomes. In more than one
trial, the probability of an event is equal to the number of successes divided by the number of trials.
The probability of H or T is always 1, for success will be achieved every time the coin is tossed.
This leads to formulation of the following rules.
Rule 1
The probability of an event E is non-negative and less than or equal to 1:

0 ≤ P(E) ≤ 1

The sum of the probabilities of all possible outcomes in any trial is 1:

Σ_{i=1}^{N} P(E_i) = 1
Rule 2
For two mutually exclusive events, E1 and E2, the probability of E1 or E2 is equal to
the probability of E1 plus the probability of E2:

P(E1 or E2) = P(E1 ∪ E2) = P(E1) + P(E2)
Rule 3
For two independent events E1 and E2, the probability of E1 and E2 is equal to the product of
the individual probabilities of E1 and E2:

P(E1 and E2) = P(E1 ∩ E2) = P(E1) × P(E2)
Return Period
Suppose a coin is tossed once a year. On the average, its head will appear once in 2 years, or
1/P(H). This reciprocal of the probability of occurrence is termed the return period, or recurrence interval, T, in hydrology, and is widely used in hydrologic frequency analysis.
T = 1 / P
Clearly, T is an average value and not an actual value of occurrence of the associated outcome.
Thus a storm that is exceeded on the average once in 10 years has a probability of
exceedance in any year of 1/10, or 0.1. In other terms, a storm that is exceeded on the average
once in 10 years has a 0.1 × 100 = 10% probability of being exceeded in any year. This
does not mean that a storm of that magnitude will occur every 10 years. Therefore, the probability
that the storm will occur in any year is
P = 1 / T
The probability that the storm will not occur, P′, in any year is

P′ = 1 − P = 1 − 1/T

The probability that the storm will not occur for n successive years is

(P′)^n = (1 − P)^n = (1 − 1/T)^n

This is because the probability of storm occurrence is the same from year to year. The probability
that the storm will occur at least once in n successive years, sometimes also called the risk R, is

R = 1 − (P′)^n = 1 − (1 − P)^n = 1 − (1 − 1/T)^n

The above equation can be used to calculate the return periods for various degrees of risk and
expected project life, as given in Table 1.
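These relations are straightforward to evaluate numerically. A minimal sketch in Python (the function names are mine, not from the text):

```python
# Risk of at least one exceedance of the T-year event during an n-year
# project life: R = 1 - (1 - 1/T)**n  (from the equations above).

def risk(T, n):
    """Probability that the T-year event is equaled or exceeded
    at least once in n successive years."""
    return 1.0 - (1.0 - 1.0 / T) ** n

def required_return_period(R, n):
    """Return period T needed so the risk over an n-year life is R.
    Obtained by inverting R = 1 - (1 - 1/T)**n."""
    return 1.0 / (1.0 - (1.0 - R) ** (1.0 / n))

# Example: a 100-year flood over a 50-year project life
print(round(risk(100, 50), 3))                  # ~0.395, about a 40% risk
print(round(required_return_period(0.10, 50)))  # T = 475 yr for a 10% risk
```

This is the kind of calculation that produces Table 1: a fixed design return period carries a surprisingly large risk over a long project life.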
Frequency and Probability Functions
If the observations in a sample are identically distributed (each sample value drawn from the
same probability distribution), they can be arranged to form a frequency histogram. First, the
feasible range of the random variable is divided into discrete intervals, then the number of
observations falling into each interval is counted, and finally the result is plotted as a bar graph as
shown in Fig 1.
The width Δx of the interval used in setting up the frequency histogram is chosen to be as small as possible while still having sufficient observations falling into each interval for the histogram to have a reasonably smooth variation over the range of the data.
If the number of observations n_i in interval i, covering the range (x_i − Δx, x_i), is divided by the total number of observations n, the result is called the relative frequency function f_s(x).
f_s(x_i) = n_i / n

which is an estimate of P(x_i − Δx ≤ X ≤ x_i), the probability that the random variable X will lie in the interval [x_i − Δx, x_i]. The subscript s indicates that the function is calculated from the sample data.
The sum of the values of the relative frequencies up to a given point is the cumulative frequency
function F_s(x):

F_s(x_i) = Σ_{j=1}^{i} f_s(x_j)
This is an estimate of P(X ≤ x_i), the cumulative probability of x_i.
The relative frequency and cumulative frequency functions are defined for a sample; the
corresponding functions for the population are approached as limits as n → ∞ and Δx → 0. In the limit, the relative frequency function divided by the interval length Δx becomes the probability density function f(x):

f(x) = lim_{n→∞, Δx→0} f_s(x) / Δx

The cumulative frequency function becomes the probability distribution function F(x):

F(x) = lim_{n→∞, Δx→0} F_s(x)

For a given value of x, F(x) is the cumulative probability P(X ≤ x), which can be expressed as the integral of the probability density function over the range X ≤ x:
P(X ≤ x) = F(x) = ∫_{−∞}^{x} f(u) du
where u is a dummy variable of integration.
One of the best-known probability density functions is that forming the familiar bell-shaped curve
for the normal distribution.
Distribution Characteristics
There are four principal characteristics of probability distributions. Their parameters are estimated
from observed sample data and are then used as estimates of the parameters of
the population distributions.
1. Central Tendency
a. Arithmetic Mean
This is the most common and most reliable measure of central tendency. It is the first moment
about the origin and can be expressed as

E(x) = μ = ∫_{−∞}^{∞} x f(x) dx
An estimate of the population mean is obtained from a sample as
x̄ = (1/n) Σ_{i=1}^{n} x_i
2. Dispersion
The two most common measures of dispersion or variability are the range and the variance. For a
sample, the range is the difference between the largest and smallest values, and it conveys an idea
of the spread of the data. The range of many continuous hydrologic variables is from 0 to ∞.
The most commonly used measure of dispersion is the variance, or its square root, the standard deviation. The
variance is the second moment about the mean and is expressed for a continuous variable X as

Var[X] = σ² = E[(X − μ)²] = E[X²] − μ²

This expresses the variance as the average squared deviation about the mean.
For a discrete population of size n,

σ² = (1/n) Σ_{i=1}^{n} (x_i − μ)²
Because μ is not known precisely, an estimate of the variance, s², is computed from the observed sample as

s² = (1/(n − 1)) Σ_{i=1}^{n} (x_i − x̄)²
If the sample data are grouped, then

s² = Σ_{i=1}^{k} f(x_i)(x_i − x̄)²

where k is the number of class intervals and f(x_i) is the relative frequency of the ith class interval.
3. Symmetry (Skewness)
For a sample of n values, the third moment about the mean, M₃, is estimated as
M₃ = (1/n) Σ_{i=1}^{n} (x_i − x̄)³
However, the most commonly used measure of asymmetry is the coefficient of skewness, Cs, which is defined as the ratio of the third central moment to the cube of the standard
deviation:

Cs = μ₃ / σ³
An unbiased estimate of Cs for a sample of size n is obtained as

Cs = n² M₃ / [(n − 1)(n − 2) s³]
This coefficient is dimensionless and useful in comparing distributions. For a symmetrical
distribution, μ₃ = 0 and therefore Cs = 0. For a distribution that has a long tail to the right, Cs > 0 and the
distribution is positively skewed. If the distribution has a long tail to the left, it has negative
skewness. If the sample size n is small, the estimate of Cs can be strongly biased.
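The sample estimators above (x̄, s² in its 1/(n − 1) form, M₃, and Cs) can be sketched together as follows; the function name and the tiny test sample are mine:

```python
import math

# Sketch of the sample estimators discussed above; in practice x would
# be a series such as annual maximum flows from gauged records.

def sample_stats(x):
    n = len(x)
    mean = sum(x) / n                                    # x-bar
    s2 = sum((v - mean) ** 2 for v in x) / (n - 1)       # sample variance s^2
    s = math.sqrt(s2)                                    # standard deviation
    m3 = sum((v - mean) ** 3 for v in x) / n             # third moment M3
    cs = n * n * m3 / ((n - 1) * (n - 2) * s ** 3)       # skewness coeff. Cs
    return mean, s, cs

# A symmetrical sample gives Cs = 0
mean, s, cs = sample_stats([1.0, 2.0, 3.0, 4.0, 5.0])
print(mean, s, cs)   # 3.0, ~1.581, 0.0
```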
2- Log Normal Distribution
This is an extension of the normal distribution wherein the logarithms of a sequence are
considered to be normally distributed. In this form it is a two-parameter, bell-shaped, symmetrical
distribution. In terms of the untransformed variate, x, it is a three-parameter (skewed) distribution having a range from 0 to ∞.
3- Extreme Value Distribution
Consider n data series with m observations in each series. The largest or smallest value is
obtained from the m observations in each series, giving n such extreme values. The
probability distribution of these extreme values depends on the sample size, m, and the parent
distribution of the series. Fréchet, and Fisher and Tippett, found that the distribution approaches an
asymptotic form as m is increased indefinitely. The type of asymptotic form depends on
the parent distribution from which the extreme values were obtained. Three types of
asymptotic distributions have been developed based on different parent distributions.
The type I extreme value distribution, also known as the Gumbel distribution, results from the exponential-type parent distribution.
The type II distribution originates from a Cauchy-type parent distribution
but has little application to hydrologic events.
The type III, or Weibull, distribution also arises from an exponential-type parent distribution
but is limited in the direction of the extreme values.
4- Log-Pearson Type III (Gamma-Type) Distribution
Karl Pearson proposed a general equation for a distribution that fits many distributions,
including the normal, beta, and gamma distributions, by choosing appropriate values for its
parameters. A form of the Pearson function similar to the gamma distribution is known as the Pearson type III distribution. It is a three-parameter distribution with a limited range in the left
direction, unbounded to the right, and can have a large skew. Since flood flow series commonly
show considerable skew, it is used as a distribution of flood peaks. The distribution is
usually fitted to the logarithms of the flood values because this results in lower skewness. The log-
Pearson type III distribution has been adopted as a standard by US federal agencies for flood
analyses.
Method of Flood Frequency Analysis
The different methods to prepare a flood frequency curve are:
1. Graphical method
2. Analytical method
3. Using a spreadsheet (Microsoft Excel)
4. Using special software (SMADA or HEC)
Graphical Method using Probability Paper
The cumulative probability of a theoretical distribution may be represented graphically on
probability paper designed for that distribution. On such paper the ordinate usually represents the
value of x in a certain scale, and the abscissa represents the probability P(X ≤ x) or P(X ≥ x), the
return period T, or the reduced variate y_T. The ordinate and abscissa scales are designed so that the
data to be fitted are expected to appear close to a straight line. The purpose of using
probability paper is to linearize the probability relationship so that the plotted data can be used for interpolation, extrapolation, or comparison.
Plotting Positions
Plotting position refers to the probability value assigned to each piece of data to be plotted.
Numerous methods have been proposed for determining plotting positions, most of which
are empirical. If n is the total number of values to be plotted and m is the rank of a value in a list ordered by descending magnitude, then for large n the exceedance probability of the mth largest value, x_m, is

P(X ≥ x_m) = m / n
However, this simple formula (known as California's formula) produces a probability of 100% for m = n, which cannot easily be plotted on a probability scale. As an adjustment, the above
formula may be modified to

P(X ≥ x_m) = (m − 1) / n

While this formula does not produce 100%, it yields a zero probability (for m = 1), which cannot
easily be plotted on a probability scale either.
Several formulas have been suggested, as shown in Table 3.
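As a sketch of how plotting positions are computed, take the Weibull formula P = m/(n + 1), a widely used formula of the kind collected in Table 3 (assumed here, since the table itself is not reproduced); the function name and the flow values are illustrative:

```python
# Sketch: plotting positions for an annual flood series using the
# Weibull formula P = m / (n + 1), where m is the rank in descending
# order. This avoids both the 0% and 100% endpoints.

def plotting_positions(flows):
    """Return (flow, exceedance probability, return period) tuples,
    ranked from largest to smallest."""
    ranked = sorted(flows, reverse=True)     # step 1: rank the data
    n = len(ranked)
    out = []
    for m, q in enumerate(ranked, start=1):
        p = m / (n + 1)                      # step 2: plotting position
        out.append((q, p, 1.0 / p))          # T = 1 / P
    return out

floods = [410.0, 290.0, 520.0, 350.0, 600.0, 440.0, 380.0, 310.0, 470.0]
for q, p, T in plotting_positions(floods):
    print(f"{q:6.1f}  P = {p:.3f}  T = {T:5.2f} yr")
```

With 9 values, the largest flood gets P = 1/10 = 0.1, i.e. a 10-year plotting position.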
Procedure
1. Rank the data from the largest to the smallest value. If two or more observations have the same value, assume that they have slightly different values and assign each a different
rank.
2. Calculate the plotting positions using one of the formulas given in Table 3.
3. Do not omit any years during the period of record, since this will have a biasing effect. If any
data are missing, estimate them.
4. Select the type of probability paper to be used.
5. Plot the magnitude of the flood on the ordinate and the corresponding plotting position on the
abscissa, with the probability of exceedance on one side of the scale and the
return period on the other.
Extrapolation of the data to longer return periods should be done very cautiously, because the
probability distribution is very sensitive in the tail of the curve.
Probability plotting on a spreadsheet
Normal distribution
For the normal distribution, P_i is converted to another scale Z_i by the following relation:

Z_i = 5.063 [ P_i^0.135 − (1 − P_i)^0.135 ]

Now x_i and Z_i are linearly related, and a plot of x_i vs Z_i will give a straight line on an arithmetic scale
if the data are normally distributed.
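A sketch of this transformation, treating P_i as a cumulative (non-exceedance) probability, under which the relation closely approximates the standard normal quantile:

```python
# Sketch: approximate standard-normal scale Z for probability P,
# using the relation above: Z = 5.063 * (P**0.135 - (1 - P)**0.135).
# P is taken here as a non-exceedance probability (an assumption).

def z_scale(p):
    """Approximate standard-normal transform of probability p (0 < p < 1)."""
    return 5.063 * (p ** 0.135 - (1.0 - p) ** 0.135)

# P = 0.5 maps to Z = 0; P = 0.975 maps to roughly 1.97, close to the
# exact standard normal quantile 1.96.
print(z_scale(0.5))
print(round(z_scale(0.975), 2))
```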
Lognormal distribution
Exactly the same procedure as for normal plotting, but a log scale is used for the y-axis.
Gumbel Probability Plot
To perform a Gumbel plot, P_i needs to be converted to another scale y_i by using the following
relation:

y_i = −ln[−ln(P_i)]

Now x_i and y_i are linearly related; a plot of x_i vs y_i will give a straight line on an arithmetic scale if
the data are Gumbel distributed.
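A sketch of the Gumbel transformation, again treating P_i as a non-exceedance probability (an assumption; if exceedance plotting positions are used, substitute 1 − P_i):

```python
import math

# Sketch: Gumbel reduced variate y = -ln(-ln(P)) for non-exceedance
# probability P; plotting x_i against y_i linearizes a Gumbel fit.

def gumbel_reduced_variate(p):
    """Reduced variate y for non-exceedance probability p (0 < p < 1)."""
    return -math.log(-math.log(p))

# The reduced variate grows with the return period T, since P = 1 - 1/T:
for T in (2, 10, 100):
    p = 1.0 - 1.0 / T
    print(T, round(gumbel_reduced_variate(p), 3))
```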
Log-Gumbel Probability Plotting
Exactly the same procedure as for Gumbel plotting, but use a log scale for the y-axis.
Goodness-of-fit tests
The distribution that has been chosen to fit the observed data can be tested for goodness-of-fit.
This is done by comparing theoretical and sample values. With graphical methods this is
done by visually assessing the goodness-of-fit. Statistical goodness-of-fit tests are available to test the
hypothesis that the observed data come from the fitted probability distribution.
Probability plot correlation coefficient (PPCC) tests.
This test is usually performed to test the linearity of a probability plot on normal, lognormal,
Gumbel, or log-Gumbel paper.
PPCC test for normal probability plot
Compute the correlation coefficient r between x_i and z_i. If r is greater than the critical r value
at α = 5%, then the data are normally distributed.
PPCC test for lognormal probability plot
Compute the correlation coefficient r between log(x_i) and z_i. If r is greater than the critical r
value at α = 5%, then the data are lognormally distributed.
PPCC test for Gumbel probability plot
Compute the correlation coefficient r between x_i and y_i. If r is greater than the critical r value
at α = 5%, then the data are Gumbel distributed.
PPCC test for log-Gumbel probability plot
Compute the correlation coefficient r between log(x_i) and y_i. If r is greater than the critical r
value at α = 5%, then the data are log-Gumbel distributed.
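The PPCC computation for, say, the Gumbel plot can be sketched as follows; the plotting-position formula and the sample data are assumptions of mine, and the critical r value must still be looked up in published PPCC tables:

```python
import math

# Sketch: PPCC for a Gumbel probability plot. Pearson's r is computed
# between the ordered data x_i and the Gumbel reduced variates y_i
# derived from Weibull plotting positions (an assumed choice).

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def gumbel_ppcc(flows):
    x = sorted(flows)                          # data in ascending order
    n = len(x)
    # non-exceedance plotting positions m/(n+1), m = 1..n (ascending)
    y = [-math.log(-math.log(m / (n + 1))) for m in range(1, n + 1)]
    return pearson_r(x, y)

flows = [310.0, 350.0, 380.0, 410.0, 440.0, 470.0, 520.0, 600.0, 290.0]
r = gumbel_ppcc(flows)
print(round(r, 3))   # r close to 1 indicates a nearly linear Gumbel plot
# Compare r with the critical value from a PPCC table at alpha = 5%.
```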