Frequency Analysis

Uploaded by dante-lau-jing-teck, 04-Jun-2018

KNS3143/07-08/RB
10 Frequency Analysis

FREQUENCY ANALYSIS

    1 Introduction

Hydrologic processes such as rainfall, snowfall, floods and droughts are usually investigated by analyzing their records of observations. Many characteristics of these processes seem to vary in a way not amenable to deterministic analysis. Frequency analysis is therefore performed to determine the frequency of likely occurrence of hydrologic events. This information is required in the solution of a variety of water-resource problems; examples include the design of reservoirs, floodways, irrigation systems, stream-control works, water-supply systems and hydroelectric power plants.

    2 Motivation for Using Probability

An engineering design always takes extreme situations into account. When a highway bridge is built, it must be able to pass the design discharge of specified magnitude without being flooded during its life span. The same holds when a dam or a drainage structure is built. When an urban area is developed, storm drainage is provided so that the area is not likely to be flooded. When a building is constructed, it must withstand design wind loads of given magnitude. When a water-supply scheme is designed, it must be able to supply water to the specified extent.

Clearly, a design event is required for designing a water-resources project. The magnitude of this design event varies from one project to another; it will be different for different structures such as culverts, bridges and dams. No matter how large the design event used, there is always some chance (or risk) that it will be exceeded during the intended useful life of the project. This and similar questions can be answered using concepts of probability.

    Concepts of Probability

Let us consider an experiment of tossing a coin, which has a head (H) and a tail (T). When the coin is tossed, we get either H or T, but cannot get both at the same time, for they are mutually exclusive. Since there are only two possibilities, the probability of getting H and of getting T is 0.5 each. Thus, in a single trial, the probability of an equally likely event is the number of favourable outcomes divided by the total number of possible outcomes. In the toss, H is an event, the appearance of H is an outcome, and H and T are the possible outcomes. Over many trials, the probability of an event is the number of successes divided by the number of trials. The probability of getting H or T is always 1, for success is achieved every time the coin is tossed. This leads to the following rules.

    Rule 1

The probability of an event (E) is non-negative and less than or equal to 1:

0 ≤ P(E) ≤ 1

The sum of the probabilities of all N possible outcomes in any trial is 1:

Σ_{i=1}^{N} P(Ei) = 1


    Rule 2

For two mutually exclusive events E1 and E2, the probability of E1 or E2 is equal to the probability of E1 plus the probability of E2:

P(E1 ∪ E2) = P(E1 or E2) = P(E1) + P(E2)

    Rule 3

For two independent events E1 and E2, the probability of E1 and E2 is equal to the product of the individual probabilities of E1 and E2:

P(E1 ∩ E2) = P(E1 and E2) = P(E1) · P(E2)
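The coin-toss setting makes Rules 1 to 3 easy to verify numerically. The following sketch (not part of the original notes) checks each rule for a fair coin:

```python
# Illustration of the probability rules using a fair coin, P(H) = P(T) = 0.5.

p_h, p_t = 0.5, 0.5

# Rule 1: each probability lies in [0, 1] and all outcomes sum to 1.
assert 0 <= p_h <= 1 and 0 <= p_t <= 1
assert p_h + p_t == 1.0

# Rule 2: H and T are mutually exclusive in a single toss,
# so P(H or T) = P(H) + P(T).
p_h_or_t = p_h + p_t

# Rule 3: successive tosses are independent,
# so P(H on first toss and H on second) = P(H) * P(H).
p_hh = p_h * p_h

print(p_h_or_t)  # 1.0
print(p_hh)      # 0.25
```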

    Return Period

Suppose a coin is tossed once a year. On average, its head will appear once in 2 years, or once in 1/P(H) years. This reciprocal of the probability of occurrence is termed the return period, or recurrence interval, T, in hydrology, and is widely used in hydrologic frequency analysis:

T = 1 / P

Clearly, T is an average value and not an actual interval between occurrences of the associated outcome. Thus a storm that has been exceeded on the average once in 10 years has a probability of exceedance in any year of 1/10, or 0.1. In other words, the storm that is exceeded on average once in 10 years has a 0.1 × 100 = 10% probability of being exceeded in any year. This does not mean that a storm of that magnitude will occur every 10 years. The probability that the storm will occur in any year is

P = 1 / T

The probability that the storm will not occur, P̄, in any year is

P̄ = 1 − P = 1 − 1/T

The probability that the storm will not occur for n successive years is given by

P̄^n = (1 − P)^n = (1 − 1/T)^n

This is because the probability of storm occurrence is the same from year to year. The probability that the storm will occur at least once in n successive years, sometimes also called the risk R, is

R = 1 − P̄^n = 1 − (1 − P)^n = 1 − (1 − 1/T)^n

The above equation can be used to calculate the return periods for various degrees of risk and expected project life, as given in Table 1.
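The risk relation can be evaluated in both directions: the risk of a given design event over a project life, or the return period required to hold the risk to a target value. A minimal sketch (the numbers are illustrative, not from Table 1):

```python
# Risk and return-period relations from the notes:
#   R = 1 - (1 - 1/T)^n            risk of at least one exceedance in n years
#   T = 1 / (1 - (1 - R)^(1/n))    return period needed for a target risk R

def risk(T, n):
    """Probability that the T-year event occurs at least once in n years."""
    return 1.0 - (1.0 - 1.0 / T) ** n

def return_period(R, n):
    """Return period whose event has risk R of occurring within n years."""
    return 1.0 / (1.0 - (1.0 - R) ** (1.0 / n))

# A 100-year flood has about a 39.5% chance of occurring
# at least once in a 50-year project life.
print(round(risk(100, 50), 3))         # 0.395

# To keep the risk at 10% over 50 years, design for roughly a 475-year event.
print(round(return_period(0.10, 50)))  # 475
```

This is why structures with long service lives are designed for return periods far exceeding the project life.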


    Frequency and Probability Functions

    If the observations in a sample are identically distributed (each sample value drawn from the

    same probability distribution), they can be arranged to form a frequency histogram. First, the

    feasible range of the random variable is divided into discrete intervals, then the number of

observations falling into each interval is counted, and finally the result is plotted as a bar graph, as shown in Fig. 1.

The width Δx of the interval used in setting up the frequency histogram is chosen to be as small as possible while still having sufficient observations falling into each interval for the histogram to show a reasonably smooth variation over the range of the data.

If the number of observations ni in interval i, covering the range (xi − Δx, xi), is divided by the total number of observations n, the result is called the relative frequency function fs(x).


fs(xi) = ni / n

which is an estimate of P(xi − Δx ≤ X ≤ xi), the probability that the random variable X will lie in the interval [xi − Δx, xi]. The subscript s indicates that the function is calculated from the sample data.

The sum of the values of the relative frequencies up to a given point is the cumulative frequency function Fs(x):

Fs(xi) = Σ_{j=1}^{i} fs(xj)

This is an estimate of P(X ≤ xi), the cumulative probability of xi.
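These two definitions translate directly into a few lines of code. A sketch, with an illustrative data set and interval width Δx = 2 that are assumptions, not taken from the notes:

```python
# Relative and cumulative frequency functions from a sample:
#   f_s(x_i) = n_i / n  and  F_s(x_i) = sum of f_s up to interval i.

import numpy as np

data = np.array([12.0, 15.5, 18.2, 14.1, 16.7, 13.3, 19.8, 15.0, 17.4, 14.9])

# Divide the feasible range of the variable into discrete intervals of width dx.
dx = 2.0
edges = np.arange(12.0, 22.0, dx)           # interval boundaries
counts, _ = np.histogram(data, bins=edges)  # n_i for each interval

f_s = counts / data.size   # relative frequency function
F_s = np.cumsum(f_s)       # cumulative frequency function

print(f_s)      # each value is n_i / n
print(F_s[-1])  # the cumulative frequency reaches 1 over the full range
```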

The relative frequency and cumulative frequency functions are defined for a sample; the corresponding functions for the population are approached as limits as n → ∞ and Δx → 0. In the limit, the relative frequency function divided by the interval length Δx becomes the probability density function f(x):

f(x) = lim_{n→∞, Δx→0} fs(x) / Δx

The cumulative frequency function becomes the probability distribution function F(x):

F(x) = lim_{n→∞, Δx→0} Fs(x)

For a given value of x, F(x) is the cumulative probability P(X ≤ x), and it can be expressed as the integral of the probability density function over the range X ≤ x:

P(X ≤ x) = F(x) = ∫_{−∞}^{x} f(u) du

where u is a dummy variable of integration.

    One of the best-known probability density functions is that forming the familiar bell-shaped curve

    for the normal distribution.
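The integral relation between f(x) and F(x) can be checked numerically for that bell-shaped curve. A sketch (the trapezoidal integration and the lower limit of −10 standing in for −∞ are implementation choices, not from the notes):

```python
# Numerical check of F(x) = integral of f(u) du from -infinity to x
# for the standard normal density.

import math

def normal_pdf(u, mu=0.0, sigma=1.0):
    """Bell-shaped normal probability density function."""
    return math.exp(-0.5 * ((u - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, lower=-10.0, steps=10000):
    """Trapezoidal integration of the density from 'lower' (approx. -inf) up to x."""
    du = (x - lower) / steps
    total = 0.5 * (normal_pdf(lower) + normal_pdf(x))
    total += sum(normal_pdf(lower + k * du) for k in range(1, steps))
    return total * du

# By symmetry, half of the probability lies below the mean.
print(round(normal_cdf(0.0), 4))  # 0.5
```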

    Distribution Characteristics

There are four principal characteristics of probability distributions. These parameters are estimated from the distribution of observed sample data and are then used as estimates of the parameters of the population distributions.

    1. Central Tendency

a. Arithmetic Mean

This is the most common and most reliable measure of central tendency. It is the first moment about the origin and can be expressed as

E(x) = μ = ∫_{−∞}^{∞} x f(x) dx

An estimate of the population mean is obtained from a sample as

x̄ = (1/n) Σ_{i=1}^{n} xi

    2. Dispersion

The two most common measures of dispersion or variability are the range and the variance. For a sample, the range is the difference between the largest and smallest values, and conveys an idea of the spread of the data. The range of many continuous hydrologic variables is from 0 to ∞.

The most commonly used measure of dispersion is the variance, or its square root, the standard deviation. The variance is the second moment about the mean and is expressed for a continuous variable X as

Var(X) = σ² = E[(X − μ)²] = E[X²] − μ²

This expresses the variance as the average squared deviation about the mean.

For a discrete population of size n,

σ² = (1/n) Σ_{i=1}^{n} (xi − μ)²

Because μ is not known precisely, an estimate of the variance, s², is computed from the observed sample as

s² = (1/(n − 1)) Σ_{i=1}^{n} (xi − x̄)²

If the sample data are grouped, then

s² = Σ_{i=1}^{k} f(xi) (xi − x̄)²

where k is the number of class intervals, and f(xi) is the relative frequency of the ith class interval.

3. Skewness

The skewness measures the asymmetry of a distribution and is based on the third moment about the mean. For a sample of size n, the third central moment M3 is estimated as

M3 = (1/n) Σ_{i=1}^{n} (xi − x̄)³

However, the most commonly used measure of asymmetry is the coefficient of skewness, Cs, which is defined as the ratio of the third central moment to the cube of the standard deviation:

Cs = μ3 / σ³

An unbiased estimate of Cs for a sample of size n is obtained as

Cs = n² M3 / [(n − 1)(n − 2) s³]

This coefficient is dimensionless and is useful in comparing distributions. For a symmetrical distribution, μ3 = 0 and therefore Cs = 0. For a distribution that has a long tail to the right, Cs > 0 and the distribution is positively skewed. If the distribution has a long tail to the left, it has negative skewness.
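The sample estimates of these characteristics (mean, variance, third central moment and skewness coefficient) can be computed together. A sketch; the annual-flood values are illustrative only:

```python
# Sample estimates of the distribution characteristics:
#   mean x-bar, unbiased variance s^2, third central moment M3,
#   and skewness coefficient Cs = n^2 M3 / ((n-1)(n-2) s^3).

import math

x = [310.0, 285.0, 402.0, 199.0, 350.0, 265.0, 440.0, 300.0]
n = len(x)

mean = sum(x) / n
s2 = sum((xi - mean) ** 2 for xi in x) / (n - 1)   # unbiased variance
s = math.sqrt(s2)                                  # standard deviation
M3 = sum((xi - mean) ** 3 for xi in x) / n         # third central moment
Cs = n ** 2 * M3 / ((n - 1) * (n - 2) * s ** 3)    # skewness coefficient

# Cs > 0 here: the sample has a long tail to the right (positive skew).
print(round(mean, 1), round(s, 1), round(Cs, 3))
```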


2- Log-Normal Distribution

This is an extension of the normal distribution wherein the logarithms of the data sequence are considered to be normally distributed. In this form it is a two-parameter, bell-shaped, symmetrical distribution. In terms of the untransformed variate x, it is a three-parameter (skewed) distribution having a range from 0 to ∞.

3- Extreme Value Distribution

Consider n data series with m observations in each series. A largest (or smallest) value is obtained out of the m observations in each series, giving n such extreme values. The probability distribution of these extreme values depends on the sample size m and the parent distribution of the series. Frechet, and Fisher and Tippett, found that the distribution approaches an asymptotic form as m is increased indefinitely. The type of asymptotic form depends on the parent distribution from which the extreme values were obtained. Three types of asymptotic distributions have been developed based on different parent distributions.

The type I extreme value distribution, also known as the Gumbel distribution, results from an exponential-type parent distribution.

The type II distribution originates from a Cauchy-type parent distribution, but it has little application to hydrologic events.

The type III, or Weibull, distribution also arises from an exponential-type parent distribution but is limited in the direction of extreme values.

4- Log-Pearson Type III (Gamma-Type) Distribution

Karl Pearson proposed a general equation for a distribution that fits many distributions, including the normal, beta and gamma distributions, by choosing appropriate values for its parameters. A form of the Pearson function similar to the gamma distribution is known as the Pearson type III distribution. It is a three-parameter distribution with a limited range in the left direction, unbounded to the right, and with a large skew. Since flood flow series commonly show considerable skew, it is used as the distribution of flood peaks. The distribution is usually fitted to the logarithms of the flood values, because this results in lower skewness. The log-Pearson type III distribution has been adopted as a standard by US federal agencies for flood analyses.


    Method of Flood Frequency Analysis

The different methods to prepare a flood frequency curve are:

1. Graphical method
2. Analytical method
3. Using a spreadsheet (Microsoft Excel)
4. Using special software (SMADA or HEC)

    Graphical Method using Probability Paper

The cumulative probability of a theoretical distribution may be represented graphically on probability paper designed for that distribution. On such paper the ordinate usually represents the value of x in a certain scale, and the abscissa represents the probability P(X ≥ x) or P(X ≤ x), the return period T, or the reduced variate yT. The ordinate and abscissa scales are designed so that the data to be fitted are expected to appear close to a straight line. The purpose of using probability paper is to linearize the probability relationship, so that the plotted data can be used for interpolation, extrapolation or comparison purposes.

    Plotting Positions

Plotting position refers to the probability value assigned to each piece of data to be plotted. Numerous methods have been proposed for the determination of plotting positions, most of which are empirical. If n is the total number of values to be plotted and m is the rank of a value in a list ordered by descending magnitude, the exceedance probability of the mth largest value, xm, is, for large n,

P(X ≥ xm) = m / n

However, this simple formula (known as California's formula) produces a probability of 100% for m = n, which may not be easily plotted on a probability scale. As an adjustment, the formula may be modified to

P(X ≥ xm) = (m − 1) / n

While this formula does not produce 100%, it yields a zero probability (for m = 1), which may not be easily plotted on a probability scale either.

Several formulas have been suggested, as shown in Table 3.
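A sketch of computing plotting positions with the Weibull formula P = m / (n + 1), one of the common formulas of the kind listed in Table 3 (the choice of formula and the flood values are assumptions, since the table itself is not reproduced here):

```python
# Plotting positions by the Weibull formula, P = m / (n + 1),
# with m = 1 for the largest value in the record.

floods = [420.0, 380.0, 510.0, 300.0, 450.0, 350.0]

# Rank the data from largest to smallest.
ranked = sorted(floods, reverse=True)
n = len(ranked)

# (rank m, magnitude, exceedance probability) for each value.
positions = [(m, x, m / (n + 1)) for m, x in enumerate(ranked, start=1)]
for m, x, p in positions:
    print(m, x, round(p, 3))
```

Note that this formula yields neither 0% nor 100%, avoiding both plotting problems described above.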

    Procedure

1. Rank the data from the largest to the smallest value. If two or more observations have the same value, assume that they have slightly different values and assign each a different rank.
2. Calculate the plotting positions by using one of the formulas given in Table 3.
3. Do not omit any years during the period of record, since doing so has a biasing effect. If any data are missing, estimate them.


4. Select the type of probability paper to be used.
5. Plot the magnitude of the flood on the ordinate and the corresponding plotting position on the abscissa, representing the probability of exceedance on one side of the scale and the return period on the other side.

The extrapolation of the data to longer return periods should be done very cautiously, because the probability distribution is very sensitive in the tail part of the curve.

    Probability plotting on a spreadsheet

    Normal distribution

For the normal distribution, Pi is converted to another scale Zi by the following relation:

Zi = 5.063 [Pi^0.135 − (1 − Pi)^0.135]

Now xi and Zi are linearly related, and a plot of xi vs Zi will give a straight line on an arithmetic scale if the data are normally distributed.
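The transform is a one-line function. A sketch, with spot checks that are properties of the formula rather than values from the notes:

```python
# Conversion of a probability P_i to the normal plotting scale:
#   Z_i = 5.063 * (P_i**0.135 - (1 - P_i)**0.135)

def normal_z(p):
    """Approximate standard normal variate for probability p (0 < p < 1)."""
    return 5.063 * (p ** 0.135 - (1.0 - p) ** 0.135)

# The transform is antisymmetric about P = 0.5, where Z = 0.
print(round(normal_z(0.5), 6))  # 0.0

# At P = 0.9 it is close to the exact standard normal quantile 1.282.
print(round(normal_z(0.9), 3))  # 1.281
```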

Lognormal distribution

Exactly the same procedure as for normal plotting, but a log scale is used for the y-axis.

    Gumbel Probability Plot

To perform a Gumbel plot, Pi needs to be converted to another scale yi by using the following relation:

yi = −ln[−ln(1 − Pi)]

Now xi and yi are linearly related, and a plot of xi vs yi will give a straight line on an arithmetic scale if the data are Gumbel distributed.

    Log Gumbel Probability plotting

Exactly the same procedure as for Gumbel plotting, but use a log scale for the y-axis.
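The Gumbel reduced variate is equally simple to compute; a sketch, assuming Pi is the exceedance probability (so 1 − Pi is the non-exceedance probability used inside the double logarithm):

```python
# Gumbel reduced variate:  y_i = -ln[-ln(1 - P_i)],
# where P_i is the exceedance probability of the ith value.

import math

def gumbel_y(p_exceed):
    """Gumbel reduced variate for an exceedance probability (0 < p < 1)."""
    return -math.log(-math.log(1.0 - p_exceed))

# Rarer events (smaller exceedance probability) map to larger y.
print(round(gumbel_y(0.5), 4))  # 0.3665
print(round(gumbel_y(0.01), 3))
```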

    Goodness-of-fit tests

The distribution that has been chosen to fit the observed data can be tested for goodness-of-fit by comparing the theoretical and sample values. With graphical methods this is done by visually assessing the goodness-of-fit. Statistical goodness-of-fit tests are available to test the hypothesis that the observed data come from the fitted probability distribution.

Probability plot correlation coefficient (PPCC) tests

This test is usually performed to test the linearity of a probability plot on normal, lognormal, Gumbel or log-Gumbel paper.

    PPCC test for normal probability plot


Compute the correlation coefficient r between xi and Zi. If r is greater than the critical r value at α = 5%, the data are normally distributed.

    PPCC test for Lognormal probability plot

    Compute the correlation of coefficient (r) between Log(xi) and zi. If r is greater than the critical r

    value at =5%, then the data are Lognormally distributed.

    PPCC test for Gumbel probability plot

    Compute the correlation of coefficient (r) between xiand yi. If r is greater than the critical r value

    at =5%, then the data are Gumbel distributed.

    PPCC test for LogGumbel probability plot

    Compute the correlation of coefficient (r) between Log(xi) and yi. If r is greater than the critical r

    value at =5%, then the data are LogGumbel distributed.
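The PPCC idea for the normal case can be sketched end to end: rank the data, assign plotting positions, convert them to the normal scale, and correlate. The flood values, the Weibull plotting positions, and the critical value r_crit below are all assumptions for illustration; the actual α = 5% critical value must be looked up in published PPCC tables for the sample size at hand.

```python
# PPCC test sketch for a normal probability plot: correlate the ordered
# data with the normal-scale variate and compare r with a critical value.

import math

def normal_z(p):
    """Normal plotting-scale transform Z = 5.063 (p^0.135 - (1-p)^0.135)."""
    return 5.063 * (p ** 0.135 - (1.0 - p) ** 0.135)

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

floods = sorted([420.0, 380.0, 510.0, 300.0, 450.0, 350.0], reverse=True)
n = len(floods)

# Weibull plotting positions (exceedance probabilities), largest value first.
p_exceed = [m / (n + 1) for m in range(1, n + 1)]

# Convert the non-exceedance probability of each value to the normal scale.
z = [normal_z(1.0 - p) for p in p_exceed]

r = pearson_r(floods, z)
r_crit = 0.90  # placeholder, NOT a tabulated value; look up alpha = 5% for this n
print("r =", round(r, 3), "-> accepted" if r > r_crit else "-> rejected")
```

The same skeleton covers the other three tests by substituting log(xi) for xi and/or the Gumbel reduced variate for Zi.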