prof v.v.srinivas lecture-2 feb8 2012 given [compatibility mode]
TRANSCRIPT
Regional Frequency Analysis of FloodsRegional Frequency Analysis of Floods
Dr. V. V. SrinivasAssociate ProfessorAssociate Professor
Department of Civil Engineering, IISc Bangalore 560 012, India
Overview of PresentationOverview of Presentation
• Introduction
• Definitions
• Methods for Design Flood Estimation
• Case studies
Introduction
• Ungauged sites
Common problems in hydrologic design
• Ungauged sites
• Inadequate at-site information at gauged sites
Waterlogged field
Flooded Culvert
Culvert Failure
Levee Failure
Floodwaters washed away a culvert
Forum photo by Andy Blenkush
Introduction
Typical Applications of Predicting Floods
Planning and design of water-resources systems
Risk assessment of water control/conveyance structures where f il d t tifailure can cause devastation
• Culverts, Bridges, Storm Sewers, Dams, Levees, Flood Walls, floodway channels etc.
Economic evaluation of Engineering projects
Land use planning and management
Insurance assessment
Definitions
Complete duration series:All the peak flow data available for a location
Annual maximum seriesA series prepared using the largestvalues occurring in each of the years ofthe recordthe record.
Definitions
Partial duration series [or peak-over-threshold (POT) series]A series prepared using data points selectedfrom the observed record so that theirmagnitude is greater than a predefinedthreshold value
POT(K) series: PDS in which the number ofdata points is about K times the number ofyears of the record
A l d i [ POTAnnual exceedence series [or POT(1) series]
Preparation of POT series (Ref: FEH, 1999)
(i) Two consecutive peaks must be separated by at least three times the averagetime to rise.
At least five clean (i.e., not multi-peaked) events, whose peaks exceed thethreshold must be considered.
Time to rise
Engineering Hydrology By Subramanya
Figure: Elements of Flood Hydrograph
Preparation of POT series (Ref: FEH, 1999)
(ii) The minimum flow value in the trough between two peaks must be less than two-thirdsof the flow value of the first of the two peaksof the flow value of the first of the two peaks.
Average time to rise = 5 hoursTwo consecutive peaks must be separated by at least 15 hoursPeak ‘E’ is the largest and is therefore independentPeak ‘D’ occurs less than 15 hours before peak ‘E’ and is defined as dependentPeak ‘F’ is defined as dependent since although it occurs more than 15 hours after the peak ‘E’, the minimum discharge in the trough between the two peaks does not fall by more than two thirds of the peak discharge for event ‘E’
(FEH, 1999)
more than two-thirds of the peak discharge for event E
Peak ‘C’ is larger than peaks ‘A’ and ‘B’ and is judged independent of peak ‘E’ because it occurs more than 15 hours beforehand th i i di h i th t h b t th t i l th t thi d f ththe minimum discharge in the trough between the two is less than two-thirds of the discharge for peak C.
Peak ‘B’ occurs less than 15 hours before peak ‘C’ and is therefore dependentPeak ‘A’ is below the threshold and therefore no a POT event
(FEH, 1999)
Peak A is below the threshold and therefore no a POT event
Definitions
Random variableA variable is said to be random if its value depends on the outcome of a chance event
Examples of Event:• Culvert capacity being exceeded• Stage of flood flow exceeding deck height of a bridge
Recurrence interval (τ)Time interval between any two occurrences of an event
Return Period (T)Return Period (T)Expected value of τ for events equaling or exceeding a specified magnitude T= E(τ)
Probability of occurrence of an eventLet p=P(X ≥ xT) denote the probability of occurrence of an event X ≥ xTLet p P(X ≥ xT) denote the probability of occurrence of an event X ≥ xT
p can be related to return period T as, p=1/T
Probability of failure (non-occurrence of the event)Probability of failure (non occurrence of the event)
( ) ( ) 11 1T TF P X x P X xT
= < = − ≥ = −
Probability of occurrence of an event
For each observation in peak flow time series, there are two possible outcomes• Occurrence of event if X ≥ xT• Non-occurrence (failure) if X < xT
In peak flow time series, observations must be independent of each other.
The probability of a recurrence interval of duration τ = ( ) 11p p τ −−
Return period of the event = T=
( ) ( ) ( ) ( ) ( )1 2 31 1 2 1 3 1 4 1E C p p p p p pτττ∞ − ⎡ ⎤= − = + − + − + −∑ ⎣ ⎦
The expression in parenthesis has the form of the power series expansionwith x = -(1-p) and n = -2
( ) ( ) ( ) ( ) ( )11
1 1 2 1 3 1 4 1E C p p p p p pτ
τ=
⎡ ⎤= = + + +∑ ⎣ ⎦
Power series expansion, ( ) ( ) ( )( )2 31 1 21 1
2! 3!n n n n n n
x nx x x− − −
+ = + + + +K
( ) ( ) ( ) 2 11 1 1nT E p x p p pτ − −∴ = = + = − − =⎡ ⎤⎣ ⎦( ) ( ) ( )1p T
⎣ ⎦
⇒ =
(Ref: Chow et al., 1988)
(Ref: Chow et al., 1988)
DefinitionsRisk
A flood control structure fails if peak flow value exceeds Tx RiskT
(years)
For design life =100 yr
The hydrologic risk of failure can be computed as,
( )1 1 en
TR P X x= − − ≥⎡ ⎤⎣ ⎦
0.10 9500.15 6160.20 449( )T⎣ ⎦
en : Expected (or design) life of the structure
0.25 3480.30 2810.35 233e
( )1
1/ nT =
0.40 1960.45 1680.50 145
( )11 1 e/ nR− − 0.55 1260.60 1100 65 960.65 960.70 84
( ) ( ) 11 1T TF P X x P X xT
= < = − ≥ = −
Estimation of Return period (in years)Design life of structure (years)
Risk 100 50 250.10 950 475 2380 15 616 308 1540.15 616 308 1540.20 449 225 1130.25 348 174 870.30 281 141 710.35 233 117 590.40 196 98 490.45 168 84 420 50 145 73 370.50 145 73 370.55 126 63 320.60 110 55 280.65 96 48 240.70 84 42 210.75 73 37 190.80 63 32 160 85 53 27 140.85 53 27 140.90 44 22 11
How much at-site data do we need for Hydrologic design?
Institute of Hydrology, UKAt least T and preferably 2T
140
Limitations100
120
140
ars)
100
120
140
s)
• Ungauged sites
• Locations having paucity (scarcity) of flood data60
80
100
engt
h (y
ea
60
80
100
engt
h (y
ears
g p y ( y)
20
40
60
Rec
ord
L
20
40
60
Reco
rd L
e0
0 100 200 300 400
G i St ti i Ohi USA
00 100 200 300
Gauging Stations in Indiana USAGauging Stations in Ohio, USA Gauging Stations in Indiana, USA
(RL<60 years: 85%)(RL<60 years: 77%)
45
50
25
30
3540
Leng
th (Y
ears
)
0
5
1015
20
Rec
ord
L
00 10 20 30 40 50
Gauging stations in Krishna basin
40
50
ars)
20
30
ord
Leng
th (Y
ea
0
10
0 10 20 30 40 50 60 70
Rec
o
0 10 20 30 40 50 60 70Gauging stations in Godavari basin
(RL< 50 years: 100% of stations)
Methods for Design Flood Estimationg
Empirical methods
Dicken’s formula, Ryve’s formula, Inglis’ formula, Nawab Jung Bahadur formula, Creager’s formula and Rational formula
Frequency analysis methodsq y y
Rainfall-runoff methods
Unit hydrograph methodsy g p
Empirical Methods for Design Flood Estimationp g
(a) Peak discharge formulae involving drainage area only
Dicken’s formula
3 4/pQ CA=
Qp – Peak discharge in m3/s
A – Catchment area in km2
C=11.5 (North India); 14-19.5 (Central India); 22 to 26 (Western Ghats of India)
C - Constant
C depends on Rainfall and Altitude of catchment
Ryve’s formulaRyve’s formula
C = 6 8 for catchment areas within 80 km from the coast
2 3/pQ CA=
C = 6.8 for catchment areas within 80 km from the coast
= 8.8 for catchment areas within 80-2400 km from the coast
Empirical Methods for Design Flood Estimationp g
(a) Peak discharge formulae involving drainage area only
Inglis’ formula
(f f h d t h t i ld B b t t )
12310 4pAQ
A .=
+
Qp – Peak discharge in m3/s
A – Catchment area in km2
(for fan shaped catchments in old Bombay state)
N b J B h d f l
A' – Catchment area in mi2
C - ConstantNawab Jung Bahadur formula
(f H d b d D C t h t ) C 1 77 177
( )0 93 1 14. / log ApQ CA ′−⎡ ⎤⎣ ⎦′=
(for Hyderabad Deccan Catchments); C=1.77 - 177
Oth f l C ’ f l J i f l M difi d M ’ f lOther formulae: Creager’s formula; Jarvis formula; Modified Myer’s formula
Empirical Methods for Design Flood Estimationp g
(b) Peak discharge formulae involving drainage area and its shape
Dredge or Burge formula (for Indian catchments)Dredge or Burge formula (for Indian catchments)
( )2 3 2 319 6 19 6p / /
BLAQ . .L L
= =Qp – Peak discharge in m3/s
A – Catchment area in km2A Catchment area in km
L – Average length of catchment area in km
B – Average width of the catchment in km
Pettis formula P – Probable 100 year maximum 1 dayrainfall in cm
( )5 4/Q C P B= ⋅
C = 1.5 (humid areas); 0.2 (desert areas)
C - Constant( )pQ C P B
( ) ( )
(for northern USA, Ohio to Cennecticut)
Empirical Methods for Design Flood Estimationp g
(c) Peak discharge formulae involving rainfall intensity and drainage area
Rational formula
pQ CiA=Qp – Peak discharge in cfs
A – Catchment area in acres
C – Runoff coefficient (0 ≤ C ≤1)
i – Intensity of rainfall in inches/hr
The peak discharge corresponds to time of concentration
Used widely in storm sewer designs, because of its simplicity
1 cfs = 1 .008 acre.in/hr(the conversion is considered to be included
in the value of C)
The assumption of fixed C (i.e., ratio of peak discharge to rainfall rate) for a drainage basin is invalid.
In rational formula, the value of runoff coefficient ‘C’ depends on integrated effectsof several factors.
Character and condition of soil
Rainfall intensity
Proximity of the water table
Degree of soil compaction
Porosity of subsoilPorosity of subsoil
Vegetation
Ground slopeGround slope
Ponding character of surface of catchment - Depression storage
R i f ll i t it i i d t i d b d i t it d ti f (IDF)Rainfall intensity i is determined based on intensity-duration-frequency (IDF) relationships.
Design duration D is equal to the time of concentration for the catchment.
The frequency F (or corresponding return period, T ) is chosen by hydrologist.
2.5
2525
Fig. Intensity-duration-frequency curves of maximum rainfall in Chicago, USA (Source: Chow et al., 1988)
(Ref: Chow et al., 1988)
(Ref: Chow et al., 1988)
Issues with use of Empirical Methods
None of the empirical formulae will give precise results for all locations
Magnitude of flood in a catchment depends on a large set of predictor variables, and none of the formulae consider adequate number of variables.and none of the formulae consider adequate number of variables.
Predictor variables - Examples1) Precipitation
2) Physiographic catchment characteristicsDrainage Area; Length of longest channel
3) Location attributes3) Location attributesLatitude; Longitude; Altitude
4) Landuse/landcoverB il A A i l l A F A W b di W l dBuilt-up Area; Agricultural Area; Forest Area; Water bodies; Waste lands
5) Catchment shapeCompactness coefficient; Circularity ratio; Form factor; Elongation ratio
6) Drainage CharacteristicsDrainage density; Slope; Soils
Form factor: Ratio of area of the basin to square of its axial length (Large)
Compactness coefficient: Ratio of the perimeter of the basin to theCompactness coefficient: Ratio of the perimeter of the basin to thecircumference of a circle of area equal to the basin area. (Small)
Elongation ratio: Ratio of the diameter of a circle having area the same as theb i t th i l th f th b i (L )basin area, to the maximum length of the basin. (Large)
Circularity ratio: Ratio of the basin area to the area of a circle of sameperimeter as that of the basin. (Small)p ( )
Frequency Analysis Methods for Design Flood Estimation q y y g
Information at target site is augmented with that from gauged sites depicting similar hydrological behavior
Data or model parametersData or model parameters
Functional relationships between model parameters and watershed attributes
Region: A set of gauged sites depicting similar hydrological behavior
Regionalization: Process of identifying regions
R i l F A l i F l i b d R i l i f tiRegional Frequency Analysis: Frequency analysis based on Regional information
Approaches to Regionalization
Regression Approach- Thomas and Benson [1970], Wandle [1977], Glatfelter [1984]Choquette [1988]
ROI approach- Acreman and Wiltshire [1987] Burn [1990 a b] Zrinji and Burn [1994]- Acreman and Wiltshire [1987], Burn [1990 a,b], Zrinji and Burn [1994]
Hierarchical approach & Extension to ROI- Gabriele and Arnell [1991], Zrinji and Burn [1996]
Classification and Regression Tree (CART) based Approach- Breiman et al. [1984], Laaha and Bloschl [2006]
Canonical correlation analysis- Cavadias [1989, 1990, 1995], Cavadias et al. [2001]
Approaches to Regionalization
Nearest Neighbor approach- Vogel [2005]
Ensemble Prediction Approach- McIntyre et al. [2005]
Cluster analysis- Tasker [1982], Burn [1989], Burn et al. [1997], Burn and Goel [2000],
Hall and Minns [1999], Hall et al. [2002], Jingyi and Hall [2004]Rao and Srinivas [2006a b]Rao and Srinivas [2006a,b]
Frequency Analysis Methods for Design Flood Estimation
When T ≤ 27 years (FEH, IH Wallingford, 1999; Vol.3, P.46)
q y y g
Length of record (years)
Site Analysis Pooled Analysis
Recommendedmethod
y ( g )
(y ) y method
<T/2 No Yes Pooled analysis
T/2 to T For confirmation Yes Pooled analysis prevails
T to 2T Yes Yes Joint (site and Pooled) Analysis
> 2T Yes For confirmation Site analysis> 2T Yes For confirmation Site analysis prevails
Frequency Analysis Methods for Design Flood Estimation
When T > 27 years (FEH, IH Wallingford, 1999; Vol.3, P.46)
q y y g
Length of record (years)
Site Analysis Pooled Analysis
Recommendedth d
y ( g )
(years) Analysis method
<14 No Yes Pooled analysis
14 to T For confirmation Yes Pooled analysis prevails
T to 2T Yes Yes Joint (site and Pooled) Analysis
> 2T Yes For confirmation Site analysis> 2T Yes For confirmation Site analysis prevails
Assumptions in flood frequency analysis (FFA)p q y y ( )
At-site FFA: The flood (predictand) data at the target site are• serially independent• identically distributed in time
Regional FFA : The flood data at any site in a region are• serially independent• identically distributed in timey
The flood data at different sites in the region• are independent in space• have identical frequency distributions apart from a scale factor• have identical frequency distributions, apart from a scale factor
The causal precipitation (predictor) system is stochastic space independent andThe causal precipitation (predictor) system is stochastic, space-independent and time-independent (Chow et al., 1988).
Rainfall-runoff method for Flood Estimation
Involves estimation of flood frequency from rainfall frequency using ahydrological model which links rainfall to resultant runoff.y g
The method is appealing as it yields a hydrograph of river flow rather thanjust an estimate of peak flow.
Unit hydrograph method for Design Flood Estimation
Involves developing regression relationships between the physicalcharacteristics of catchments and parameters of their unit hydrographs, toarrive at a synthetic unit hydrograph for estimation of design floodarrive at a synthetic unit hydrograph for estimation of design flood.
Factors influencing the choice of Flood Estimation Approach
1. Type of Problem
h i f ll ff h h ld ll b d f i i fThe rainfall-runoff approach should generally be used for estimation of• Reservoir flood
• Probable maximum flood
P bl i l i i• Problems involving storage routing
2. Type of catchment
• Statistical approach should generally be used for large catchments (>1000 km2)
• Rainfall-runoff approach is favored when sub-catchments are disparate/dissimilar
3. Type of data
• Statistical approach is favoured when the gauged record is longer than 2 or 3 years
R i f ll ff h i f d h th i ti fl d b t• Rainfall-runoff approach is favored when there is no continuous flow record, butrainfall and flow data are available for 5 or more flood events
(FEH, 1999)
Points to be noted with reference to case studies in India
Several of the sub-zones delineated by CWC (1983) are heterogeneous
Need to delineate alternative homogeneous regions
Need to consider Partial duration Series
Regions need not be strictly contiguous
Regions need not be hard and can be fuzzyRegions need not be hard, and can be fuzzy
Selection ofattributes
Preparing feature vectors Steps in RegionalFrequency Analysis of watersheds
Forming clusters
Selecting optimum number of regionsSelecting optimum number of regions
Testing the regions for homogeneity
Are the regionsNo
Adjustment of Homogeneous?
Yes
heterogeneous regions
Determining values of predictands
Selection ofattributes
Preparing feature vectors Steps in RegionalFrequency Analysis of watersheds
Forming clusters
Selecting optimum number of regionsSelecting optimum number of regions
Testing the regions for homogeneity
Are the regionsNo
Adjustment of Homogeneous?
Yes
heterogeneous regions
Determining values of predictands
B i Ch t i ti F R i l A l i
Physiographic characteristics
Basin Characteristics For Regional Analysis
y g pDrainage area, Length and slope of main stream
Soil characteristicsInfiltration potential; Runoff coefficient Permeability Bulk densityInfiltration potential; Runoff coefficient, Permeability, Bulk density
Land use characteristicsForests, Agricultural, Suburban or urban land
Drainage characteristicsDrainage density
Meteorological characteristicsStorm direction, Mean annual precipitation
Directional statisticsDirectional statisticsSeasonality of flood events
Basin Characteristics For Regional Analysis (Contd )
Geographical location attributesLatitude longitude and Altitude
Basin Characteristics For Regional Analysis (Contd…)
Latitude, longitude, and Altitude
Geologic features of the basinFraction of the basin underlain by rock formationsFraction of the basin underlain by rock formations
Shape indicators of catchmentsForm factor, Compactness coefficient, Elongation ratio, Circularity ratio
Feature vector : vector containing basin characteristics
Selection ofattributes
Preparing feature vectors Steps in RegionalFrequency Analysis of watersheds
Forming clusters
Selecting optimum number of regionsSelecting optimum number of regions
Testing the regions for homogeneity
Are the regionsNo
Adjustment of Homogeneous?
Yes
heterogeneous regions
Determining values of predictands
Clustering Clustering Algorithms available in literatureAlgorithms available in literatureClustering Clustering Algorithms available in literatureAlgorithms available in literature
The clustering algorithms available in literature can be broadly classified as
Partitional [e.g., K-means and K-medoids algorithms, fuzzy c-means clustering]
Hierarchical [e.g., single-linkage, complete-linkage and Ward’s algorithms.]
Density-based [e.g., DBSCAN (Density-Based Spatial Clustering of
Applications with Noise) and OPTICS (Ordering Points To Identify the Clustering
Structure]
Grid-based [e.g., STING (a STatistical INformation Grid approach) and CLIQUE
(Clustering In QUEst).].
Model-based and [e.g., COBWEB ]
Boundary-detecting. [e.g., SVC (Support Vector Clustering) ]
Regionalization using Soft Clustering Techniques
Fuzzy ClusteringBargaoui et al [1998] Hall and Minns [1999] Rao and Srinivas[2006]Bargaoui et al., [1998], Hall and Minns [1999], Rao and Srinivas[2006]
Artificial Neural Network Clustering
SOFMsHall and Minns [1999], Hall et al. [2002], Jingyi and Hall [2004]
G i N l G N t kGrowing Neural Gas NetworksSrinivas and Tripathi [2006]
Fuzzy SOFMSrinivas et al. [2008]
Fuzzy C-Means Algorithm
),()() :,( 2
1 1ik
c
i
N
kik du VXXVUJ ∑ ∑
= =
μ=Minimize
subject to the following constraints:
{ }N,,kuc
iik K1 1
1∈∀=∑
=
subject to the following constraints:
{ }c,,iNuN
kik K1 0
1∈∀<< ∑
=
initinit
∈[1, ] refers to weight exponent or fuzzifierμ ∞
(1) Initialise fuzzy cluster centroids: initc
init VV ,,1 K
(2) Compute fuzzy memberships:
1 ( 1)
2
1 ( 1)
1( )
for 1 11
/
initk i
ik /c
d ,u i c, k N
μ
μ
−
−
⎛ ⎞⎜ ⎟⎝ ⎠= ≤ ≤ ≤ ≤⎛ ⎞
X V
21
1( )
c
initi k id ,=
⎛ ⎞⎜ ⎟⎝ ⎠
∑ X V
( )N
ik ku μ∑ X(3) Update fuzzy cluster centroids for i =1,2,…,c
( )1
1
ki N
ikk
u μ
=
=
=
∑V
(4) Update the fuzzy membership as :iju
1 ( 1)
21
( )f 1 1
/
k id ,k N
μ−⎛ ⎞⎜ ⎟⎝ ⎠X V
1 ( 1)
21
( ) for 1 1
1( )
k iik /c
i k i
,u i c, k N
d ,
μ−
=
⎝ ⎠= ≤ ≤ ≤ ≤⎛ ⎞⎜ ⎟⎝ ⎠
∑ X V
Repeat steps (3) and (4) until the change in the value of the memberships between two successive iterations becomes sufficiently small
⎝ ⎠
successive iterations becomes sufficiently small.
Conventional methods suggest hardening fuzzy partition matrix
Nk uuu 1111 ⎤⎡ LL
iNiki
Nk
uuu
uuu
1
1111
⎥⎥⎥⎥⎥⎤
⎢⎢⎢⎢⎢⎡
=MMM
LL
MMM
U
NccNckc uuu 1 ×⎥⎥⎦⎢
⎢⎣ LL
MMM
1 }{ max1
==≤≤
ikci
jk uu jiuik ≠∀= 0
dd xv −== min}{min 1=jku jiu ≠∀= 0thenkici
ikci
jk dd xv −==≤≤≤≤ 11
min}{min 1=jku jiuik ≠∀= 0then
Instead, Rao and Srinivas (2006a) suggested using a threshold membership of 1/c to form fuzzy clusters
Srinivas et al. (2008) defined threshold for each feature vector
⎭⎬⎫
⎩⎨⎧
⎥⎦⎤
⎢⎣⎡=
≤≤)u(,
cmaxT ik
cik max
1211
Self-Organizing Feature Map (SOFM) Clustering
(1) Initialize weights {wij , i =1,…n, j =1,…m} of the connections from the n input nodes to the m output nodes, randomly.
kx1 m output nodes
nkx
p
(2) Draw an input vector Xk (randomly without replacement) and compute its distance from j-th output node Wj . Find the winning output node as:
mjttarg jkj
,1,2, )()(min ω K=−= WX
kx1 m output nodes
nkx
p
(3) Update the weight vectors of winner and its neighboring nodes:
)]()()[()()()1( ω ttthttt jkjjj WXWW −+=+ η )]()()[()()()( ω jk,jjj η
Repeat steps (2) and (3) over several iterations, until no changesin the Kohonen layer mapping are observed.
kx1 m output nodes
nkx
p
Decrease in topological neighborhood with increase in iterations
kx1 m output nodes
nkx
p
)(ω, th jNeighborhood function: Learning rate parameter :
)(tη
⎞⎛)0(η has value close to 0.1⎟⎟
⎠
⎞⎜⎜⎝
⎛−=
1exp)0()(
τηη tt
, a constant, is set equal to the maximum number of iterations (1000 in this study).
1τ maxt
⎟⎟⎞
⎜⎜⎛−=)( 2
2ω,
ωd
expth jj ⎟
⎠⎜⎝ )(2
)( 2ω,t
pjσ
is the topological distance between the winning node and its neighboring node jjd ,ω ω
jjd rr −= ωω, ⎟⎟⎠
⎞⎜⎜⎝
⎛−=
2)0()(
τσσ texpt
Effective width of the topological neighborhood at time t :
(0)ln max
2 στ
t=
)(tσ
ωr
Effective width of the topological neighborhood at time t :
Discrete vector defining the position of the winning node :
Discrete vector defining the position of the neighboring node j : r
)(tσ
Discrete vector defining the position of the neighboring node j :
Radius of lattice in the output layer of SOFM :
jr)0(σ
Existing thumb rules to decide number of nodes in output layer of SOFM
chose nodes to be at least two times CExp (Hall and Minns,1999)
chose nodes to be at least three times CExp (Hall et al., 2002)
CExp : Expected number of clusters
N t C i t d i i t k i i
Proposed method to design output layer of SOFM
Note: CExp in a study region is not known a priori.
Proposed method to design output layer of SOFM
For accurate estimation of flood quantiles with non-exceedance probability upto 0.99,regions of size 20 sites are reasonable (p.59, pp.119-123, Hosking and Wallis, 1997).g (p , pp , g , )
CExp = Number of sites in study region/20 = 245/20 ≈ 12
15
20
25
hits
5
10
15
Num
ber o
f
Case study:Indiana
01 3 5 7 9 11 13 15 17 19 21 23
Output neuron
65
10
15
Num
ber of Hi 7
80
5
10
15
Num
ber of Hits
2
3
4
5
2
34
560
its
23
45
67
34
56
780
s
11 12
12
Count maps for Kohonen layers obtained with classical SOFM
Case study: Ohio
30
40
H
20
40Num
be
3
4
3
40
10
20
Hits
3
4
5
3
4
50
20
er of Hits
1
2
1
2
1
2
1
2
3
The feature maps generated did not reveal any information that would allow selectionThe feature maps generated did not reveal any information that would allow selection of an appropriate number of clusters
3 options considered:
1-D SOFM with number of nodes in Kohonen layer equal to the expected number of regions Srinivas et al. [2003]
3 options considered:
Growing Neural Gas Networks Srinivas and Tripathi [2006]
Fuzzy SOFM Srinivas et al. [2008]
Fuzzy SOFM
c clusters formedat level-2
kx1 m´ prototypes formed at level-1
nkx
Srinivas et al. (2008)
Selection ofattributes
Preparing feature vectors Steps in RegionalFrequency Analysis of watersheds
Forming clusters
Selecting optimum number of regionsSelecting optimum number of regions
Testing the regions for homogeneity
Are the regionsNo
Adjustment of Homogeneous?
Yes
heterogeneous regions
Determining values of predictands
Selecting optimum number of regions using cluster validity measures
Partition Coefficient
Selecting optimum number of regions using cluster validity measures
(Bezdek, 1974)
In Hydrology:Bargaoui et al., 1998; Hall and Minns 1999
2
1 1
1 c N
PC iki k
V ( ) ( u )N = =
= ∑∑UMax Range: [1/c,1]
Partition Coefficient ( e de , 9 )
Hall and Minns, 1999
N⎡ ⎤
Partition Entropy (Bezdek, 1974)
1 1
1 c N
PE ik iki k
V ( ) u log( u )N = =
⎡ ⎤= − ⎢ ⎥
⎣ ⎦∑∑UMin Range: [0, log c]
11)(
1)(−×
−=c
VcFPI PC U
U Range: [0, 1]Min
Fuzziness Performance Index In Hydrology:Guler and Thine, 2004
(Roubens, 1982)
1−c
(Roubens, 1982)Normalized Classification Entropy
)log()()(
cVNCE PE UU = Range: [0, 1]Min In Hydrology:
Guler and Thine, 2004
Extended Xie-Beni Index (Xie and Beni, 1991)
2
1 12( : )
min
c N
ij i ki k
XB,mi k
( u )V ,
N
μ
= =−
=−
∑∑ V XU V X
V V In Hydrology:i ki ,i k≠
Kwon’s Index (Kwon, 1998)
In Hydrology:Rao and Srinivas, 2006
22
1 1 12
1
( : )min
c N c
ij i k ii k i
XB,m
( u )cV ,
N
μ
= = =− + −
=∑∑ ∑V X V V
U V XV V
( )
min i ki ,i kN
≠−V V
Selection ofattributes
Preparing feature vectors Steps in RegionalFrequency Analysis of watersheds
Forming clusters
Selecting optimum number of regionsSelecting optimum number of regions
Testing the regions for homogeneity
Are the regionsNo
Adjustment of Homogeneous?
Yes
heterogeneous regions
Determining values of predictands
Definition of Regional HomogeneityDefinition of Regional Homogeneity
A region is said to be homogeneous if data at all sites in that region followA region is said to be homogeneous if data at all sites in that region followsame distribution.
Degree of homogeneity is decide by comparing the between-site dispersionof the sample L-moment ratios for the sites in a region, with dispersion thatwould be expected for a homogeneous region.
66
Testing Regions for HomogeneityTesting Regions for Homogeneity using Heterogeneity Measures of (Hosking and Wallis, 1997)
Heterogeneity measure (HM) based on L-coefficient of variation (L-CV):
V
V1 σ
)μ(VH
−=
)μ(V
2
2
V
V22 σ
)μ(VH
−=HM based on L-CV and L-Skewness:
HM based on L-Skewness and L-Kurtosis: 3
V
V33 σ
)μ(VH
−=
3Vσ
)(itL-CV at site i :
)(3it
)(i
L-skewness at site i :
k i i
NN iR )(
)(4itL-kurtosis at site i :
∑∑==
=i
ii
ii
R ntnt11
)(Regional average L-CV:
∑∑==
=N
ii
N
i
ii
R ntnt11
)(33
Regional average L-skewness:
Regional average L-kurtosis: ∑∑==
=N
ii
N
i
ii
R ntnt11
)(44
Weighted standard deviation of at-site sample L-CVs:
{ } 2/1
11
2)( )(V ∑∑ −===
N
ii
N
i
Ri
i nttn
Weighted average distance from the site i to the group weighted mean in the 2-dimensional space of L-CV and L-skewness:
{ } ∑∑ −+−=N
ii
N
i
RiRii nttttn
11
2/123
)(3
2)(2 )()(V
== ii 11
Weighted average distance from the site i to the group weighted mean in the 2-dimensional space of L-skewness and L-Kurtosis:
{ } ∑∑ −+−=N
ii
N
i
RiRii nttttn
11
2/124
)(4
23
)(33 )()(V
in the 2 dimensional space of L skewness and L Kurtosis:
== ii 11
Vμ : Mean of V values from Monte Carlo (MC) simulationsVμ
2Vμ
: Mean of V values from Monte-Carlo (MC) simulations
: Mean of V2 values from MC simulations
3Vμ : Mean of V3 values from MC simulations
Vσ
2Vσ
: Standard deviation (SD) of V values from MC simulations
: SD of V2 values from MC simulations2
3Vσ : SD of V3 values from MC simulations
MC simulations 500
HETEROGENEITY MEASURES (Hosking and Wallis, 1997)
VALIDATION
Heterogeneity measure (HM) based L-CV: V
V1 σ
)μ(VH
−=
2V22
)μ(VH
−=HM based on L-CV and L-Skewness:
Vσ
2V2 σ
H
3V3 )μ(V −HM based on L-Skewness and L-Kurtosis:
3
3
V
V33 σ
)μ(H =
Acceptably homogeneous : H < 1
Possibly heterogeneous : 1≤ H < 2
Definitely heterogeneous : H ≥ 2
Selection ofattributes
Preparing feature vectors Steps in RegionalFrequency Analysis of watersheds
Forming clusters
Selecting optimum number of regionsSelecting optimum number of regions
Testing the regions for homogeneity
Are the regionsNo
Adjustment of Homogeneous?
Yes
heterogeneous regions
Determining values of predictands
Adjustment of Heterogeneous ClustersHosking and Wallis (1997):
eliminating one or more sites from the data set
dividing a region to form two or more new regions
allowing a site to be shared by two or more regions
i it f i t th imoving one or more sites from a region to other regions
dissolving regions by transferring their sites to other regions
merging a region with one or more regionsg g g g
merging two or more regions and redefining groups
obtaining more data and redefining regions
Adjustment of Heterogeneous ClustersAdjustment of Heterogeneous Clusters
The primary option that is considered for adjusting a heterogeneous cluster is elimination of one or more sites that are grossly discordant
ith t t th it ithi th l t Th l di d twith respect to other sites within the cluster. The grossly discordant sites are identified using the discordancy measure given by Equation
11 T 11 ( ) ( )3i k i iD N −= − −u u S u u
where is discordancy measure for ith site in a cluster k having sites.iD kNwhere is discordancy measure for ith site in a cluster k having sites.
is a vector containing L- moment ratios (LMR) of predictand at site i
i th i ht d f th LMR d S i i t i
i k
T( ) ( )( )
i 3 4 i iit t t⎡ ⎤⎢ ⎥⎣ ⎦
=u
u is the unweighted group average of the LMR and S is a covariance matrix
kNi∑ u T( )( )
kN∑S
u
1i
i
kN=∑
=u
uT
1( )( )i i
i== − −∑S u u u u
74
Selection ofattributes
Preparing feature vectors Steps in RegionalFrequency Analysis of watersheds
Forming clusters
Selecting optimum number of regionsSelecting optimum number of regions
Testing the regions for homogeneity
Are the regionsNo
Adjustment of Homogeneous?
Yes
heterogeneous regions
Determining values of predictands
Growth curve estimation using index flood method
The flood quantile estimate at site i for T-year return period is determined using index flood method (Dalrymple, 1960) by constructing a growth curve as,
ˆ ˆ ˆ( ) ( )i i kQ T q Tμ=
ˆiμ : Index flood value for site i (e.g., mean, or median of flood data)
Probability Distributions to fit peak flowsProbability Distributions to fit peak flows
Two parameter distributionsGumbel distribution (Extreme value type-1)Log Normal 2 (LN2) parameter distributionLog-Normal 2 (LN2) parameter distributionTwo parameter Gamma distribution
Three parameter distributionsGeneralized Pareto (GPA) distributionGeneralized Extreme value (GEV) distributionGeneralized logistic (GLO) distributionLog-Normal 3-parameter (LN3) distribution; Generalized Normal (GNO) distributionLog Normal 3 parameter (LN3) distribution; Generalized Normal (GNO) distributionThree parameter Gamma distribution; Pearson type-3 (P3) distributionLog- Pearson type-3 (LP3) distribution
Four parameter distributionFour parameter distributionKappa distributionFour-parameter Wakeby distribution
Five parameter distributionWakeby distribution
Which probability distribution fits the flood data?Which probability distribution fits the flood data?
Procedures for Identification
Probability papers
Goodness-of-fit testsGoodness of fit tests
Chi-Square test
Kolmogorov-Smirnov test
L-moment ratio diagram
Probability Papery pFlood data are plotted on probability paper to check if a frequency distribution fits the data
Probability papers are specifically designed for the frequency distributions
One of the axes represents the values of peak flows other axis represents exceedence or non-exceedence probability associated with peak flows, or return period
The plotted data appears close to a straight line if the frequency distribution fits the data
Flood quantile is estimated graphically by interpolation or extrapolation of the determined linear relationship between abscissa and ordinatelinear relationship between abscissa and ordinate
S No Formula b Equation
Plotting Position Formulae
S.No. Formula b Equation
1 California -
2 M difi d C lif i
( )m
mP X xn
≥ =
1m2 Modified California -
3 Hazen 0.5
( ) 1m
mP X xn−
≥ =
( ) 0 5m
m .P X x −≥ = ( )
1 2m
m bP X xb
−≥ =
+4 Gringorten 0.4
4
( )m n
( ) 0 440 12m
m .P X xn .−
≥ =+
( )1 2n b+ −
5 Blom 3/8
6 Tukey 1/3
( ) 3 81 4m
m /P X xn /−
≥ =+
( ) 1 3m /P X x −≥ =
7 Chegodayev 0.3
( )1 3mP X x
n /≥
+
( ) 0 30 4m
m .P X xn .−
≥ =+
8 Weibull 0 ( )1m
mP X xn
≥ =+
Results from study of Cunnane (1978)Results from study of Cunnane (1978)
Probability distribution Best plotting position formula
Log-normal / Normal Blom
EV-1 Gringorten
( ) 3 81 4m
m /P X xn /−
≥ =+
0 44m −EV-1(Extreme value type-1)
Gringorten
Exponential Gringorten
( ) 0 440 12m
m .P X xn .−
≥ =+
( ) 0 440 12m
m .P X xn−
≥ =+
Uniform Weibull
LP 3
0 12n .+
( )1m
mP X xn
≥ =+
LP-3(Log Pearson type 3) 3 8 for 0
3 8 for 0s
s
b / Cb / C> >< <
( )1 2m
m bP X xn b
−≥ =
+ −
( ) ( )1 mF x P X x= − ≥
Goodness-of-fit tests
Chi-square testThe flood data at target site are divided into m class intervals.
( )2m O E
Oi: The observed number of occurrences in class interval iE Th t d b f i l i t l i
( )22
1
mi i
ii
O EE
χ=
−=∑
Ei: The expected number of occurrences in class interval i
If the class intervals are chosen such that each class interval corresponds to an
E /equal probability, then , where n is the sample size of data.iE n / m=
2 2m
im O nχ = −∑
1i
iO n
nχ
== ∑
The distribution of is chi-square with degrees of freedom 2χ 1m pυ = − −
p : The number of parameters of distribution that is being tested to fit the data.
2 2zυ
χ ∑
: random variable among independent standard normal random variables
1i
izυχ
==∑
iz
The hypothesis that the data are from a specified distribution is rejected at significance level if following Equation is satisfied.
α
2 21,υ αχ χ −>
Kolmogorov-Smirnov Goodness-of-fit testKolmogorov Smirnov Goodness of fit test
The Kolmogorov-Smirnov test compares the sample cumulative frequency function and e o ogo o S o test co pa es t e sa p e cu u at e eque cy u ct o a dtheoretical cumulative distribution function (determined from the theoretical frequency distribution under test) to judge whether a chosen distribution fits the data acceptably closely.
The observed flood data at target site are divided into m class intervals, and the maximum deviation over all class intervals is determined as
( ) ( )i s iD max F x F x= −
: The upper limit of ith class interval
( ) ( )1
i s ii m
D max F x F x≤ ≤
ix
The hypothesis that the data are from a specified distribution is rejected at significance level if following Equation is satisfied.
α
n,D D α≥
L-Moment Ratio Diagram
1.0
0.8
0.9
1.0
0.6
0.7
GLO
urt
osis
0.4
0.5GPAGEVLN3PE3
L-K
u
0.2
0.3GE
0.0
0.1
100 080 060 040 020 000 020 040 060 080 100-0.1
-1.00 -0.80 -0.60 -0.40 -0.20 0.00 0.20 0.40 0.60 0.80 1.00L-skewness
L-Moment Ratio Diagram - Region 4L-Moment Ratio Diagram - Region 4
0.65
0.45
0.55
GLOGPA
0.25
0.35
L-ku
rtos
is
GEVLN3PE3Gumbel
0.05
0.15 ExponentialSites of Region-4Group Mean
-0.15
-0.05
-0.15 -0.05 0.05 0.15 0.25 0.35 0.45 0.55 0.65L-skewness
1 n
Wang (1996) derived the following direct sample estimators of the L-moments
111
1 n
j:nnj
ˆ xC
λ=
= ∑
( )12 1 1
12
1 12
nj n j
j:nnj
ˆ C C xC
λ − −= −∑122 jC =
( )1 13 2 1 1 2
1 1 23
nj j n j n j
j:nˆ C C C C xλ − − − −= − +∑( )3 2 1 1 2
133 j:nnjC =∑
( )1 1 11 1 nj j j j j jˆ ∑( )1 1 1
4 3 2 1 1 2 314
1 1 3 34
j j n j j n j n jj:nn
j
ˆ C C C C C C xC
λ − − − − − −
== − + −∑
2 1Coefficient of L variation ( CV), L t l l− − =
3 3 2L skewness, = t l l−
4 4 2L kurtosis, t l l− =
2In general, = r rt l l
Regional Goodness-of-fit Test
: Number of sites in region kkN
in : Number of peak flow values at site i
( )it ( )3it ( )
4it : L-moment ratios computed using peak flow values at site i4 p g p
Compute the regional average L-moment ratios
( )
1 1
k kN Nr ii i
i i
N N
t n t n= =
⎫= ∑ ∑ ⎪
⎪⎪⎪⎪( )
3 31 1
k k
k k
N Niri i
i i
N N
t n t n= =
⎪⎪
= ∑ ∑ ⎬⎪⎪⎪⎪( )
4 41 1
k kN Niri i
i it n t n
= =
⎪= ∑ ∑ ⎪⎭
1set 1rl = by dividing flood values at each site by their mean
Fit the distribution being tested using the regional average L-moments
DIST4DISTτ L-kurtosis of the fitted distribution
4rt L-kurtosis of the observed data
The goodness of fit is judged by the difference between and 4DISTτ 4
rt
Compare the difference with standard deviation (i.e., sampling variability) of
obtained by repeated simulation of the homogeneous region using kappa distribution
4rt
Regional goodness-of-fit measure ( )4 4 4
DIST rDIST
t BZ
τ − +=g g
4σ
Regional goodness-of-fit measure ( )4 4 4
DIST rDIST
t BZ
τ
σ
− +=
4σ
[ ]( )44
simNm rt t−∑4B 4
rt: Bias of ( )1
4m
simB
Nθ==
4 4t: Bias of
[ ]( )( )
22
4 441
4 1
simNm r
simm
t t N B
Nσ =
− −=
∑4rt: Standard deviation of 4σ
( )1simN −
[ ] : Regional average L-kurtosis computed using the synthetic data simulated for mth realization [ ]4mt
Simulate a large number of realizations (Nsim = 500) of the region using kappa distribution. Each realization constitutes a homogeneous region, with synthetic data of flood simulated at each of the sites equal to its real-world counterpart. Further, in each realization, the data simulated at any site in the region is serially independent and the data simulated at different sites in the region are not cross-correlated
Hosking & Wallis (1997)
1.0
0.8
0.9
0.6
0.7
GLOGPAu
rtos
is
0.4
0.5GPAGEVLN3PE3G
L-K
u
0.2
0.3GE
Fittedobserved
0.0
0.1
-1.00 -0.80 -0.60 -0.40 -0.20 0.00 0.20 0.40 0.60 0.80 1.00
observed
-0.11.00 0.80 0.60 0.40 0.20 0.00 0.20 0.40 0.60 0.80 1.00
L-skewness
Example: Generalized extreme value distribution
( ) ( )11, if 0
yk y ex k kξ α
−−∞ < ≤ + >⎧
⎪⎨
Probability density function
( ) ( )11 , , if 0, if 0
k y ef x e x kk x k
αξ α
− − −− ⎪= −∞ < < ∞ =⎨⎪ + ≤ < ∞ <⎩
( )1 1 , if 0k x
ln kk
y
ξα
⎧ ⎡ ⎤−− − ≠⎪ ⎢ ⎥⎪ ⎣ ⎦= ⎨( )
, if 0
yx
kξ
α
⎨−⎪
=⎪⎩
( )yeF x e
−−=Cumulative distribution function
2
3
2 27 8590 2 9554 , 33 r
lnk . c . c clnt
= + = −+
Shape parameter
( ) ( )2
1 2 1
r
kl k
kα
−
⋅=
− Γ +Scale parameter
( )1 1 1rl kkαξ ⎡ ⎤= + Γ + −⎣ ⎦Location parameter
How to compute ? ( )1 kΓ +
( )1 1 0y ty t e dt , y∞
−Γ + = + >∫
( )Γ : Gamma function
( ) ( ) ( )3 15 1 2 15 2 15 2 15. . . .Γ = Γ + = Γ0∫
Properties of Gamma function
( ) ( ) ( )( ) ( )
( )2 15 1 1 15 2 15 1 15 1 15
2 15 1 15 1 0 15
. . . . .
. . .
= Γ + = × Γ
= × Γ +
( ) ( )( ) ( )( ) ( )
1 if 0
1 if 1
1 ! if is a positive integer
y y y , y
y y / y, y
n n n
Γ + = Γ >
Γ = Γ + <
Γ
( ) ( ) ( )
( )
0 75 1 0 250 75
0 75 0 75. .
.. .
Γ − + ΓΓ − = =
− −( ) ( )1 ! if is a positive integern n nΓ = − ( )1 0 25
0 75 0 25.
. .Γ +
=− ×
( ) ( )2 81 2 81 1 if 0 y 1y b y b y b y yεΓ + = + + + + + ≤ ≤K
1 5
2 6
3 7
0.577191652 0.7567040780.988205891 0.4821993940.897056937 0.193527818
b bb b
b b
= − = −= == − = − ( ) 73 10yε −≤ ×
3 7
4 80.918206857 0.035868343b b= =
1.0
0.8
0.9
0.6
0.7
GLOGPAu
rtos
is
0.4
0.5GPAGEVLN3PE3G
L-K
u
0.2
0.3GE
Fittedobserved
0.0
0.1
-1.00 -0.80 -0.60 -0.40 -0.20 0.00 0.20 0.40 0.60 0.80 1.00
observed
-0.11.00 0.80 0.60 0.40 0.20 0.00 0.20 0.40 0.60 0.80 1.00
L-skewness
Application Examples
REGIONALIZATION OF INDIANA WATERSHEDS
Table 1.5.1. Attributes considered for regionalization of Indiana watersheds.
Attribute Range
Attributes considered for regionalizationAttribute Range
Drainage Area 0.28 – 28,813 km2 Mean Annual Precipitation 86.36 – 116.84 cm
Main channel Slope 0.17 – 50.57 m/kmpMain channel Length 0.48 – 506.94 km
Basin Elevation 125.6 – 362.7 m Storage* 0% – 11%g
Soil Runoff coefficient 0.30 –1.00 Forest cover in Drainage Area 0.0 – 88.4%
I(24,2)** 6.60 – 8.51 cm( , )
*Storage – percentage of the contributing drainage area covered by lakes, ponds or wetlands **I(24,2) – 24-hour rainfall having a recurrence interval of 2 years.
Source: Indiana Department of Natural Resources (IDNR), Division of Water; USGS
Characteristics of the regions formed using Regression Approach (Glatfelter, 1984)
Region number NS RS
Heterogeneity Measure Region type
H H H
g g g pp ( , )
NS RS H1 H2 H3
1 16 598 4.85 1.39 -0.62 Heterogeneous
2 31 1191 4.99 0.96 -0.62 Heterogeneous
3 60 3449 2.09 0.14 -0.40 Heterogeneous
4 46 1294 1.08 1.65 0.20 Possibly HomogeneousHomogeneous
5 35 852 2.96 -0.24 -1.47 Heterogeneous
6 32 913 4.57 2.96 2.13 Heterogeneous
7 22 901 10.81 2.31 0.72 Heterogeneous
NS : Number of stationsRS : Region size in station years
Characteristics of the regions formed using Fuzzy c-means clustering (Rao and Srinivas 2006)
Region number NS RS
Heterogeneity Measure Region type
H H H
(Rao and Srinivas, 2006)
NS RS H1 H2 H3
1 52 911 0.63 0.83 -0.07 Homogeneous
2 60 2010 0.89 0.92 -0.08 Homogeneous
3 40 837 0.23 0.95 0.11 Homogeneous
4 75 3274 0.79 0.80 -0.08 Homogeneous
22 9 0 9 0 13 0 H5 22 975 0.97 -0.13 -0.77 Homogeneous
6 10 431 13.51 5.71 2.62 Heterogeneous
7 24 1012 0.48 0.03 0.79 Homogeneousg
8 14 160 0.99 -0.24 -1.70 Homogeneous
NS : Number of stationsRS : Region size in station years
Characteristics of the regions formed using Fuzzy SOFM (Srinivas et al 2008):
Region number NS RS
Heterogeneity Measure Region type
H1 H2 H3
Characteristics of the regions formed using Fuzzy SOFM (Srinivas et al., 2008):
H1 H2 H3
1 45 674 0.65 -0.30 -0.93 Homogeneous
2 55 1869 0.76 0.41 -0.37 Homogeneous
3 29 608 -0.32 0.93 0.31 Homogeneous
4 62 2820 0.86 0.69 -0.46 Homogeneous
5 24 1121 0 93 0 11 0 82 Homogeneous5 24 1121 0.93 -0.11 -0.82 Homogeneous
6 11 467 13.77 6.05 2.41 Heterogeneous
7 25 990 0.82 0.42 1.27 Homogeneous
8 16 188 0.54 -0.69 -1.99 Homogeneous
NS : Number of stationsRS : Region size in station years
12
7126
345 45
Glatfelter Regions Regions by Fuzzy SOFM clustering
Assessment of Predictive PerformanceAssessment of Predictive Performanceusing Leave one-out Cross Validation
Flood data and basin attributes for sitess =1,…, N
Take s = 1
Consider site s as ungauged
For SOFM method compute centroids ofFor SOFM method compute centroids of region(s) Vi=1,…, NR
For SOFM method assign site s to the regions using the procedure given in section 2.6
For Glatfelter regions assign site s to a region based on itsFor Glatfelter regions assign site s to a region based on its geographical location
For CCA method find neigborhoods for site s using procedure described in Appendex B of Ouarda et al.(2001) for chosen
confidence level
Determine growth curves of region(s)/neighborhoods of s using index flood method
Determine regression relationship between basin attributes and mean annual flood for region(s)/neigborhood
s = s + 1
For SOFM method compute flood quantiles of site s using Eq.(36), whereas for CCA and Glatfelter method use Eq.(35)
Is s > N ?No
YesCompute performance measures (e.g., R-bias, R-RMSE, NMSE)
1
ˆ ˆ1ˆ
N a Ri i
aii
Q QR biasN Q
=
⎛ ⎞−− = ⎜ ⎟⎜ ⎟
⎝ ⎠∑1i=
2ˆ ˆ1N a RQ Q⎛ ⎞∑
1
1ˆ
i iaii
Q QR RMSEN Q
=
⎛ ⎞−− = ⎜ ⎟⎜ ⎟
⎝ ⎠∑A i fl d ilˆ aQ
21 ˆ ˆ( )N
a Ri iQ Q−∑
At-site flood quantile
ˆ RiQ Regional quantile
aiQ
1
2
( )
1 ˆ ˆ( )( 1)
i ii
Na ai i
Q QN
NMSEQ Q
N
==−
−
∑
∑1( 1) iN =−
Leave one-out cross validationea e o e ou c oss a da o
T
Two-level SOFM Regional Regression CCA
R-bias R-RMSE NMSE R-bias R-RMSE NMSE R-bias R-RMSE NMSET R-bias R-RMSE NMSE R-bias R-RMSE NMSE R-bias R-RMSE NMSE
2 -0.139 0.618 0.065 -0.174 0.811 0.075 -0.198 0.755 0.071
10 -0.141 0.592 0.073 -0.163 0.785 0.086 -0.217 0.798 0.081
20 -0.145 0.605 0.080 -0.162 0.779 0.092 -0.223 0.816 0.086
100 -0.163 0.688 0.107 -0.166 0.805 0.117 -0.242 0.884 0.108
Ordinary least square regional regression equation is used to estimateindex flood value for the target ungauged site
REGIONALIZATION OF Ohio WATERSHEDS
GNG Clusters GNG Regions
B
A
B
A
C
Koltun (2003) GNG Regions
Application Example
REGIONALIZATION OF WATERSHEDSREGIONALIZATION OF WATERSHEDS
IN GODAVARI BASIN
Forming Watershed Clusters using Cluster Analysis
Hard clusteringMosley [1981], Tasker [1982], Burn et al. [1997] (Hierarchical)
Burn [1989], Burn and Goel [2000] (Partitional)Rao and Srinivas[2006] (Hybrid)[ ] ( y )
Fuzzy ClusteringBargaoui et al., [1998], Hall and Minns [1999], Rao and Srinivas[2006]
Artificial Neural Network Clustering
SOFMsHall and Minns [1999] Hall et al [2002] Jingyi and Hall [2004]Hall and Minns [1999], Hall et al. [2002], Jingyi and Hall [2004]
Growing Neural Gas NetworksSrinivas and Tripathi [2006]
Fuzzy SOFMSrinivas et al. [2008]
Study Area
Fig Stream network and gauges in Godavari basin (Area=3 12 813 km² )Fig. Stream network and gauges in Godavari basin (Area=3,12,813 km )
Performance Measures Considered to Compare the RegionsPerformance Measures Considered to Compare the Regions
ˆ ˆ( ) ( )1( ) 100ˆ ( )
N ai i
ai
q T q TR bias TN q T
⎛ ⎞−− = ×⎜ ⎟⎜ ⎟
⎝ ⎠∑2
ˆ ˆ( ) ( )1( ) 100N a
i iq T q TR RMSE T⎡ ⎤⎛ ⎞−⎢ ⎥×⎜ ⎟⎜ ⎟∑
1( )ii
q T= ⎝ ⎠
1
( ) 100ˆ ( )
i iaii
R RMSE TN q T
=
⎢ ⎥− = ×⎜ ⎟⎜ ⎟⎢ ⎥⎝ ⎠⎢ ⎥⎣ ⎦∑
1 N2
1
2
1 ˆ ˆ[ ( ) ( )]( )
1 ˆ ˆ[ ( ) ( )]( 1)
ai i
iN
a ai i
q T q TN
NMSE Tq T q T
N
=
−=
−
∑
∑1( 1) iN =− ∑
ˆ ( )iq T ˆ ( )aiq Tand are regional and at-site estimates of T-year quantile of predictand
at site iat site iˆ ( )aiq T represents mean of for all the sites in the study region.ˆ ( )a
iq T
119
Performance comparison – New Regions and CWC sub-zones inGodavari river basin.
T Newly delineated regions Sub-zones
R-bias R-RMSE NMSE R-bias R-RMSE NMSE
5 -3 635 6 788 0 002 -5 185 8 741 0 0045 3.635 6.788 0.002 5.185 8.741 0.004
10 -5.564 10.653 0.006 -9.609 16.049 0.042
25 -6.254 17.363 0.016 -12.133 26.609 0.154
50 6 205 22 802 0 028 12 737 34 300 0 27450 -6.205 22.802 0.028 -12.737 34.300 0.274
100 -6.066 28.474 0.046 -12.800 41.712 0.417
Conclusions
Cluster analysis based on soft computing techniques is effective indetermining homogeneous regions for flood frequency analysis
In regionalization using FCM algorithm the fuzzy memberships ofwatersheds in clusters are useful in adjusting the regions
It is not always possible to interpret clusters from the output of anSOFM, and Fuzzy SOFM and GNG network are feasiblealternatives.
The effort needed to determine regions using conventionalregression analysis is considerable, and the approach is inefficient.
Publications
Journal:
A R Rao and V V Srinivas (2006) Regionalization of Watersheds by Fuzzy ClusterA. R. Rao and V. V. Srinivas (2006) Regionalization of Watersheds by Fuzzy ClusterAnalysis. Journal of Hydrology, Elsevier, Vol.318, Issues 1-4, pp.57-79.
A. R. Rao and V. V. Srinivas (2006) Regionalization of Watersheds by Hybrid ClusterAnalysis Journal of Hydrology Elsevier Vol 318 Issues 1-4 pp 37-56Analysis. Journal of Hydrology, Elsevier, Vol.318, Issues 1 4, pp.37 56.
Srinivas, V. V., Tripathi, S., Rao, A. R., Govindaraju, R. S., 2008. Regional floodfrequency analysis by combining self-organizing feature maps and fuzzy clustering.Journal of Hydrology, 348, 148-166.Journal of Hydrology, 348, 148 166.
Text Book:Rao, A. R., Srinivas V. V., 2008. Regionalization of Watersheds – An Approach Basedon Cluster Analysis. Springer Publishers, Dordrecht, The Netherlands.on Cluster Analysis. Springer Publishers, Dordrecht, The Netherlands.
THANK YOUTHANK YOU