prof v.v.srinivas lecture-2 feb8 2012 given [compatibility mode]

Regional Frequency Analysis of Floods

Dr. V. V. Srinivas, Associate Professor

Department of Civil Engineering, IISc Bangalore 560 012, India

Overview of Presentation

• Introduction

• Definitions

• Methods for Design Flood Estimation

• Case studies

Introduction

Common problems in hydrologic design

• Ungauged sites

• Inadequate at-site information at gauged sites

Waterlogged field

Flooded Culvert

Culvert Failure

Levee Failure

Floodwaters washed away a culvert

Forum photo by Andy Blenkush

Introduction

Typical Applications of Predicting Floods

Planning and design of water-resources systems

Risk assessment of water control/conveyance structures where failure can cause devastation

• Culverts, Bridges, Storm Sewers, Dams, Levees, Flood Walls, floodway channels etc.

Economic evaluation of Engineering projects

Land use planning and management

Insurance assessment

Definitions

Complete duration series: All the peak flow data available for a location

Annual maximum series: A series prepared using the largest values occurring in each of the years of the record.

Definitions

Partial duration series [or peak-over-threshold (POT) series]: A series prepared using data points selected from the observed record so that their magnitude is greater than a predefined threshold value

POT(K) series: PDS in which the number of data points is about K times the number of years of the record

Annual exceedance series [or POT(1) series]

Preparation of POT series (Ref: FEH, 1999)

(i) Two consecutive peaks must be separated by at least three times the average time to rise.

At least five clean (i.e., not multi-peaked) events, whose peaks exceed the threshold, must be considered.

Time to rise

Engineering Hydrology By Subramanya

Figure: Elements of Flood Hydrograph

Preparation of POT series (Ref: FEH, 1999)

(ii) The minimum flow value in the trough between two peaks must be less than two-thirds of the flow value of the first of the two peaks.

Average time to rise = 5 hours, so two consecutive peaks must be separated by at least 15 hours.

Peak ‘E’ is the largest and is therefore independent.

Peak ‘D’ occurs less than 15 hours before peak ‘E’ and is defined as dependent.

Peak ‘F’ is defined as dependent: although it occurs more than 15 hours after peak ‘E’, the minimum discharge in the trough between the two peaks does not fall below two-thirds of the peak discharge for event ‘E’.

Peak ‘C’ is larger than peaks ‘A’ and ‘B’ and is judged independent of peak ‘E’ because it occurs more than 15 hours beforehand and the minimum discharge in the trough between the two is less than two-thirds of the discharge for peak ‘C’.

Peak ‘B’ occurs less than 15 hours before peak ‘C’ and is therefore dependent.

Peak ‘A’ is below the threshold and is therefore not a POT event.

(FEH, 1999)
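The two independence rules can be sketched in code. This is a minimal illustration, not the FEH procedure itself: the function name, the "largest peak first" ordering (suggested by the worked example) and the hydrograph values are assumptions made for the example.

```python
def select_independent_peaks(times, flows, peak_indices, threshold, avg_time_to_rise):
    """Return indices of peaks judged independent under rules (i) and (ii)."""
    min_separation = 3.0 * avg_time_to_rise
    # Only peaks above the threshold are POT candidates; examine the largest first.
    candidates = sorted(
        (i for i in peak_indices if flows[i] > threshold),
        key=lambda i: flows[i],
        reverse=True,
    )
    accepted = []
    for i in candidates:
        independent = True
        for j in accepted:
            first, second = min(i, j), max(i, j)
            trough = min(flows[first:second + 1])
            # Rule (i): separation of at least three times the average time to rise.
            too_close = times[second] - times[first] < min_separation
            # Rule (ii): trough must drop below two-thirds of the first peak.
            shallow_trough = trough >= (2.0 / 3.0) * flows[first]
            if too_close or shallow_trough:
                independent = False
                break
        if independent:
            accepted.append(i)
    return sorted(accepted)


# Hypothetical hydrograph (time in hours, flow in m3/s), not taken from the slides.
times = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
flows = [10, 30, 10, 70, 20, 90, 50, 120, 90, 100, 20, 60, 10]
peaks = [1, 3, 5, 7, 9, 11]
pot_events = select_independent_peaks(times, flows, peaks, threshold=40, avg_time_to_rise=5)
```

With these made-up values, the peaks at 25 h and 45 h are rejected because they lie within 15 hours of the largest peak at 35 h.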

Definitions

Random variable: A variable is said to be random if its value depends on the outcome of a chance event

Examples of events:
• Culvert capacity being exceeded
• Stage of flood flow exceeding deck height of a bridge

Recurrence interval (τ): Time interval between any two occurrences of an event

Return period (T): Expected value of τ for events equaling or exceeding a specified magnitude, $T = E(\tau)$

Probability of occurrence of an event

Let $p = P(X \ge x_T)$ denote the probability of occurrence of an event $X \ge x_T$

p can be related to return period T as $p = 1/T$

Probability of failure (non-occurrence of the event)

$F = P(X < x_T) = 1 - P(X \ge x_T) = 1 - \dfrac{1}{T}$

Probability of occurrence of an event

For each observation in peak flow time series, there are two possible outcomes
• Occurrence of event if $X \ge x_T$
• Non-occurrence (failure) if $X < x_T$

In peak flow time series, observations must be independent of each other.

The probability of a recurrence interval of duration τ is $(1-p)^{\tau-1}\,p$

Return period of the event:

$T = E(\tau) = \sum_{\tau=1}^{\infty} \tau\,(1-p)^{\tau-1}\,p = p\left[1 + 2(1-p) + 3(1-p)^2 + 4(1-p)^3 + \cdots\right]$

The expression in brackets has the form of the power series expansion with $x = -(1-p)$ and $n = -2$

Power series expansion:

$(1+x)^n = 1 + nx + \dfrac{n(n-1)}{2!}x^2 + \dfrac{n(n-1)(n-2)}{3!}x^3 + \cdots$

$\therefore\; T = E(\tau) = p\,(1+x)^n = p\left[1-(1-p)\right]^{-2} = \dfrac{1}{p}$

$\Rightarrow\; p = \dfrac{1}{T}$

(Ref: Chow et al., 1988)
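The result T = 1/p can be checked numerically by summing the series directly. The sketch below is only a sanity check; the function name and the truncation at 20000 terms are choices made for this example.

```python
def expected_recurrence_interval(p, n_terms=20000):
    """Truncated sum of tau * (1 - p)**(tau - 1) * p for tau = 1, 2, ..., n_terms."""
    return sum(tau * (1.0 - p) ** (tau - 1) * p for tau in range(1, n_terms + 1))


# For p = 0.01 the sum should be very close to T = 1/p = 100 years.
T_100 = expected_recurrence_interval(0.01)
```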

Definitions: Risk

A flood control structure fails if the peak flow value exceeds $x_T$

The hydrologic risk of failure can be computed as

$R = 1 - \left[1 - P(X \ge x_T)\right]^{n_e}$

$n_e$ : Expected (or design) life of the structure

With $F = P(X < x_T) = 1 - P(X \ge x_T) = 1 - \dfrac{1}{T}$, the return period for a given risk is

$T = \dfrac{1}{1 - (1-R)^{1/n_e}}$

Estimation of return period T (in years)

        Design life of structure (years)
Risk     100     50     25
0.10     950    475    238
0.15     616    308    154
0.20     449    225    113
0.25     348    174     87
0.30     281    141     71
0.35     233    117     59
0.40     196     98     49
0.45     168     84     42
0.50     145     73     37
0.55     126     63     32
0.60     110     55     28
0.65      96     48     24
0.70      84     42     21
0.75      73     37     19
0.80      63     32     16
0.85      53     27     14
0.90      44     22     11
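The tabulated return periods follow directly from the risk equation. A short sketch (function names are illustrative) that reproduces a few table entries:

```python
def risk_of_failure(T, design_life):
    """Hydrologic risk R = 1 - (1 - 1/T)**n_e."""
    return 1.0 - (1.0 - 1.0 / T) ** design_life


def return_period_for_risk(R, design_life):
    """Invert the risk equation: T = 1 / (1 - (1 - R)**(1/n_e))."""
    return 1.0 / (1.0 - (1.0 - R) ** (1.0 / design_life))


# Risk of 10% over a 100-year design life requires T of about 950 years.
T_design = round(return_period_for_risk(0.10, 100))
```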

How much at-site data do we need for Hydrologic design?

Institute of Hydrology, UK: At least T and preferably 2T

Limitations

• Ungauged sites

• Locations having paucity (scarcity) of flood data

[Figure: Record length (years) at gauging stations in Ohio, USA (RL < 60 years: 85%) and in Indiana, USA (RL < 60 years: 77%)]

[Figure: Record length (years) at gauging stations in Krishna basin and in Godavari basin (RL < 50 years: 100% of stations)]

Methods for Design Flood Estimation

Empirical methods

Dicken’s formula, Ryve’s formula, Inglis’ formula, Nawab Jung Bahadur formula, Creager’s formula and Rational formula

Frequency analysis methods

Rainfall-runoff methods

Unit hydrograph methods

Empirical Methods for Design Flood Estimation

(a) Peak discharge formulae involving drainage area only

Dicken’s formula

$Q_p = C\,A^{3/4}$

Qp – Peak discharge in m3/s

A – Catchment area in km2

C – Constant; C depends on rainfall and altitude of catchment

C = 11.5 (North India); 14 to 19.5 (Central India); 22 to 26 (Western Ghats of India)

Ryve’s formula

$Q_p = C\,A^{2/3}$

C = 6.8 for catchment areas within 80 km from the coast

  = 8.8 for catchment areas within 80-2400 km from the coast
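A small sketch of the two area-only formulae, using the constants quoted above. The function names and the example catchment areas are illustrative assumptions.

```python
def dickens_peak(area_km2, C):
    """Dicken's formula: Qp = C * A**(3/4), Qp in m3/s, A in km2."""
    return C * area_km2 ** 0.75


def ryves_peak(area_km2, C):
    """Ryve's formula: Qp = C * A**(2/3), Qp in m3/s, A in km2."""
    return C * area_km2 ** (2.0 / 3.0)


# Hypothetical 16 km2 catchment in North India (C = 11.5): Qp = 11.5 * 8 = 92 m3/s.
q_dickens = dickens_peak(16.0, C=11.5)
# Hypothetical 27 km2 coastal catchment (C = 6.8): Qp = 6.8 * 9 = 61.2 m3/s.
q_ryves = ryves_peak(27.0, C=6.8)
```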


(a) Peak discharge formulae involving drainage area only

Inglis’ formula (for fan shaped catchments in old Bombay state)

$Q_p = \dfrac{123\,A}{\sqrt{A + 10.4}}$

Qp – Peak discharge in m3/s

A – Catchment area in km2

Nawab Jung Bahadur formula (for Hyderabad Deccan catchments)

$Q_p = C\,A'^{\,\left[0.93 - (1/14)\log A'\right]}$

A' – Catchment area in mi2

C – Constant; C = 1.77 to 177

Other formulae: Creager’s formula; Jarvis formula; Modified Myer’s formula


(b) Peak discharge formulae involving drainage area and its shape

Dredge or Burge formula (for Indian catchments)

$Q_p = 19.6\,\dfrac{A}{L^{2/3}} = 19.6\,\dfrac{B\,L}{L^{2/3}}$

Qp – Peak discharge in m3/s

A – Catchment area in km2

L – Average length of catchment area in km

B – Average width of the catchment in km

Pettis formula (for northern USA, Ohio to Connecticut)

$Q_p = C\,(P \cdot B)^{5/4}$

P – Probable 100 year maximum 1 day rainfall in cm

C – Constant; C = 1.5 (humid areas); 0.2 (desert areas)


(c) Peak discharge formulae involving rainfall intensity and drainage area

Rational formula

$Q_p = C\,i\,A$

Qp – Peak discharge in cfs

A – Catchment area in acres

C – Runoff coefficient (0 ≤ C ≤ 1)

i – Intensity of rainfall in inches/hr

The peak discharge corresponds to time of concentration

Used widely in storm sewer designs, because of its simplicity

1 cfs = 1.008 acre·in/hr (the conversion is considered to be included in the value of C)
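A minimal sketch of the rational formula in the units listed above. The function name, the range check and the example numbers are assumptions made for illustration.

```python
def rational_peak_cfs(C, intensity_in_per_hr, area_acres):
    """Rational formula Qp = C * i * A (cfs), with the 1.008 factor absorbed in C."""
    if not 0.0 <= C <= 1.0:
        raise ValueError("runoff coefficient C must lie between 0 and 1")
    return C * intensity_in_per_hr * area_acres


# Hypothetical 10-acre catchment, C = 0.5, design intensity 2 in/hr.
q_rational = rational_peak_cfs(0.5, 2.0, 10.0)
```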

The assumption of fixed C (i.e., ratio of peak discharge to rainfall rate) for a drainage basin is invalid.

In rational formula, the value of runoff coefficient ‘C’ depends on integrated effectsof several factors.

Character and condition of soil

Rainfall intensity

Proximity of the water table

Degree of soil compaction

Porosity of subsoil

Vegetation

Ground slope

Ponding character of surface of catchment - Depression storage

Rainfall intensity i is determined based on intensity-duration-frequency (IDF) relationships.

Design duration D is equal to the time of concentration for the catchment.

The frequency F (or corresponding return period, T ) is chosen by hydrologist.

Fig. Intensity-duration-frequency curves of maximum rainfall in Chicago, USA (Source: Chow et al., 1988)

Issues with use of Empirical Methods

None of the empirical formulae will give precise results for all locations

Magnitude of flood in a catchment depends on a large set of predictor variables, and none of the formulae consider adequate number of variables.

Predictor variables - Examples

1) Precipitation

2) Physiographic catchment characteristics: Drainage Area; Length of longest channel

3) Location attributes: Latitude; Longitude; Altitude

4) Landuse/landcover: Built-up Area; Agricultural Area; Forest Area; Water bodies; Waste lands

5) Catchment shape: Compactness coefficient; Circularity ratio; Form factor; Elongation ratio

6) Drainage Characteristics: Drainage density; Slope; Soils

Form factor: Ratio of area of the basin to square of its axial length (Large)

Compactness coefficient: Ratio of the perimeter of the basin to the circumference of a circle of area equal to the basin area. (Small)

Elongation ratio: Ratio of the diameter of a circle having area the same as the basin area, to the maximum length of the basin. (Large)

Circularity ratio: Ratio of the basin area to the area of a circle of same perimeter as that of the basin. (Small)
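The four shape indicators defined above translate directly into code. The sketch below uses hypothetical function names and checks the definitions on a circular basin, for which compactness coefficient, elongation ratio and circularity ratio all equal 1.

```python
import math


def form_factor(area, axial_length):
    """Basin area divided by the square of its axial length."""
    return area / axial_length ** 2


def compactness_coefficient(area, perimeter):
    """Basin perimeter divided by the circumference of a circle of equal area."""
    return perimeter / (2.0 * math.sqrt(math.pi * area))


def elongation_ratio(area, max_length):
    """Diameter of a circle of equal area divided by the maximum basin length."""
    return 2.0 * math.sqrt(area / math.pi) / max_length


def circularity_ratio(area, perimeter):
    """Basin area divided by the area of a circle with the same perimeter."""
    return 4.0 * math.pi * area / perimeter ** 2


# Circular basin of radius 1 (area pi, perimeter 2*pi, length 2) as a check case.
circle_area, circle_perimeter, circle_length = math.pi, 2.0 * math.pi, 2.0
```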

Frequency Analysis Methods for Design Flood Estimation

Information at target site is augmented with that from gauged sites depicting similar hydrological behavior

Data or model parameters

Functional relationships between model parameters and watershed attributes

Region: A set of gauged sites depicting similar hydrological behavior

Regionalization: Process of identifying regions

Regional Frequency Analysis: Frequency analysis based on regional information

Approaches to Regionalization

Regression Approach - Thomas and Benson [1970], Wandle [1977], Glatfelter [1984], Choquette [1988]

ROI approach - Acreman and Wiltshire [1987], Burn [1990a,b], Zrinji and Burn [1994]

Hierarchical approach & Extension to ROI - Gabriele and Arnell [1991], Zrinji and Burn [1996]

Classification and Regression Tree (CART) based Approach - Breiman et al. [1984], Laaha and Bloschl [2006]

Canonical correlation analysis - Cavadias [1989, 1990, 1995], Cavadias et al. [2001]

Approaches to Regionalization

Nearest Neighbor approach - Vogel [2005]

Ensemble Prediction Approach - McIntyre et al. [2005]

Cluster analysis - Tasker [1982], Burn [1989], Burn et al. [1997], Burn and Goel [2000], Hall and Minns [1999], Hall et al. [2002], Jingyi and Hall [2004], Rao and Srinivas [2006a,b]

Frequency Analysis Methods for Design Flood Estimation

When T ≤ 27 years (FEH, IH Wallingford, 1999; Vol. 3, p. 46)

Length of record (years)   Site analysis      Pooled analysis    Recommended method
< T/2                      No                 Yes                Pooled analysis
T/2 to T                   For confirmation   Yes                Pooled analysis prevails
T to 2T                    Yes                Yes                Joint (site and pooled) analysis
> 2T                       Yes                For confirmation   Site analysis prevails

Frequency Analysis Methods for Design Flood Estimation

When T > 27 years (FEH, IH Wallingford, 1999; Vol. 3, p. 46)

Length of record (years)   Site analysis      Pooled analysis    Recommended method
< 14                       No                 Yes                Pooled analysis
14 to T                    For confirmation   Yes                Pooled analysis prevails
T to 2T                    Yes                Yes                Joint (site and pooled) analysis
> 2T                       Yes                For confirmation   Site analysis prevails
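The two FEH tables can be expressed as one small lookup. This is a sketch only: the function name is hypothetical, and the treatment of records falling exactly on a class boundary (T/2, 14, T or 2T) is an assumption, since the tables do not say which side the boundary belongs to.

```python
def recommended_method(record_years, T):
    """Recommended analysis for target return period T (years), after FEH (1999)."""
    # Lower limit of the "for confirmation" band: T/2 when T <= 27, else 14 years.
    lower = T / 2.0 if T <= 27 else 14.0
    if record_years < lower:
        return "Pooled analysis"
    if record_years < T:
        return "Pooled analysis prevails"
    if record_years <= 2 * T:
        return "Joint (site and pooled) analysis"
    return "Site analysis prevails"


# A 20-year record used to estimate the 100-year flood.
choice = recommended_method(20, 100)
```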

Assumptions in flood frequency analysis (FFA)

At-site FFA: The flood (predictand) data at the target site are
• serially independent
• identically distributed in time

Regional FFA: The flood data at any site in a region are
• serially independent
• identically distributed in time

The flood data at different sites in the region
• are independent in space
• have identical frequency distributions, apart from a scale factor

The causal precipitation (predictor) system is stochastic, space-independent and time-independent (Chow et al., 1988).

Rainfall-runoff method for Flood Estimation

Involves estimation of flood frequency from rainfall frequency using a hydrological model which links rainfall to resultant runoff.

The method is appealing as it yields a hydrograph of river flow rather than just an estimate of peak flow.

Unit hydrograph method for Design Flood Estimation

Involves developing regression relationships between the physical characteristics of catchments and parameters of their unit hydrographs, to arrive at a synthetic unit hydrograph for estimation of design flood.

Factors influencing the choice of Flood Estimation Approach

1. Type of Problem

The rainfall-runoff approach should generally be used for estimation of

• Reservoir flood

• Probable maximum flood

• Problems involving storage routing

2. Type of catchment

• Statistical approach should generally be used for large catchments (>1000 km2)

• Rainfall-runoff approach is favored when sub-catchments are disparate/dissimilar

3. Type of data

• Statistical approach is favored when the gauged record is longer than 2 or 3 years

• Rainfall-runoff approach is favored when there is no continuous flow record, but rainfall and flow data are available for 5 or more flood events

(FEH, 1999)

Points to be noted with reference to case studies in India

Several of the sub-zones delineated by CWC (1983) are heterogeneous

Need to delineate alternative homogeneous regions

Need to consider Partial duration Series

Regions need not be strictly contiguous

Regions need not be hard, and can be fuzzy

Steps in Regional Frequency Analysis of watersheds

1. Selection of attributes

2. Preparing feature vectors

3. Forming clusters

4. Selecting optimum number of regions

5. Testing the regions for homogeneity

6. Are the regions homogeneous?
   No: Adjustment of heterogeneous regions
   Yes: Determining values of predictands

Basin Characteristics For Regional Analysis

Physiographic characteristics: Drainage area, length and slope of main stream

Soil characteristics: Infiltration potential, runoff coefficient, permeability, bulk density

Land use characteristics: Forests, agricultural, suburban or urban land

Drainage characteristics: Drainage density

Meteorological characteristics: Storm direction, mean annual precipitation

Directional statistics: Seasonality of flood events

Basin Characteristics For Regional Analysis (Contd…)

Geographical location attributes: Latitude, longitude, and altitude

Geologic features of the basin: Fraction of the basin underlain by rock formations

Shape indicators of catchments: Form factor, compactness coefficient, elongation ratio, circularity ratio

Feature vector : vector containing basin characteristics






Clustering Algorithms Available in Literature

The clustering algorithms available in literature can be broadly classified as

Partitional [e.g., K-means and K-medoids algorithms, fuzzy c-means clustering]

Hierarchical [e.g., single-linkage, complete-linkage and Ward’s algorithms]

Density-based [e.g., DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and OPTICS (Ordering Points To Identify the Clustering Structure)]

Grid-based [e.g., STING (a STatistical INformation Grid approach) and CLIQUE (Clustering In QUEst)]

Model-based [e.g., COBWEB]

Boundary-detecting [e.g., SVC (Support Vector Clustering)]

Regionalization using Soft Clustering Techniques

Fuzzy Clustering: Bargaoui et al. [1998], Hall and Minns [1999], Rao and Srinivas [2006]

Artificial Neural Network Clustering

SOFMs: Hall and Minns [1999], Hall et al. [2002], Jingyi and Hall [2004]

Growing Neural Gas Networks: Srinivas and Tripathi [2006]

Fuzzy SOFM: Srinivas et al. [2008]

Fuzzy C-Means Algorithm

Minimize

$J(U, V : X) = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ik})^{\mu}\, d^2(X_k, V_i)$

subject to the following constraints:

$\sum_{i=1}^{c} u_{ik} = 1 \quad \forall\, k \in \{1, \ldots, N\}$

$0 < \sum_{k=1}^{N} u_{ik} < N \quad \forall\, i \in \{1, \ldots, c\}$

$\mu \in [1, \infty)$ refers to weight exponent or fuzzifier

(1) Initialise fuzzy cluster centroids: $V_1^{init}, \ldots, V_c^{init}$

(2) Compute fuzzy memberships:

$u_{ik}^{init} = \dfrac{\left[1 / d^2(X_k, V_i^{init})\right]^{1/(\mu-1)}}{\sum_{j=1}^{c} \left[1 / d^2(X_k, V_j^{init})\right]^{1/(\mu-1)}} \quad \text{for } 1 \le i \le c,\; 1 \le k \le N$

(3) Update fuzzy cluster centroids for i = 1, 2, …, c:

$V_i = \dfrac{\sum_{k=1}^{N} (u_{ik})^{\mu}\, X_k}{\sum_{k=1}^{N} (u_{ik})^{\mu}}$

(4) Update the fuzzy membership $u_{ik}$ as:

$u_{ik} = \dfrac{\left[1 / d^2(X_k, V_i)\right]^{1/(\mu-1)}}{\sum_{j=1}^{c} \left[1 / d^2(X_k, V_j)\right]^{1/(\mu-1)}} \quad \text{for } 1 \le i \le c,\; 1 \le k \le N$

Repeat steps (3) and (4) until the change in the value of the memberships between two successive iterations becomes sufficiently small.
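Steps (1) to (4) can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the code used in the cited studies: squared Euclidean distance is assumed for d², initial centroids are supplied by the caller, μ must be greater than 1 for the exponent 1/(μ−1) to be defined, and a feature vector that coincides with a centroid is given full membership in it.

```python
def squared_distance(x, v):
    return sum((a - b) ** 2 for a, b in zip(x, v))


def update_memberships(X, V, mu):
    """Steps (2)/(4): memberships u[i][k] of feature vector k in cluster i."""
    c, N = len(V), len(X)
    U = [[0.0] * N for _ in range(c)]
    for k, x in enumerate(X):
        d2 = [squared_distance(x, v) for v in V]
        zero = [i for i, d in enumerate(d2) if d == 0.0]
        if zero:  # assumption: a point sitting on a centroid belongs fully to it
            for i in zero:
                U[i][k] = 1.0 / len(zero)
            continue
        w = [(1.0 / d) ** (1.0 / (mu - 1.0)) for d in d2]
        total = sum(w)
        for i in range(c):
            U[i][k] = w[i] / total
    return U


def update_centroids(X, U, mu):
    """Step (3): centroids as membership-weighted means of the feature vectors."""
    dim = len(X[0])
    V = []
    for row in U:
        w = [u ** mu for u in row]
        total = sum(w)
        V.append([sum(wk * x[j] for wk, x in zip(w, X)) / total for j in range(dim)])
    return V


def fuzzy_c_means(X, init_centroids, mu=2.0, tol=1e-6, max_iter=200):
    V = [list(v) for v in init_centroids]          # step (1)
    U = update_memberships(X, V, mu)               # step (2)
    for _ in range(max_iter):
        V = update_centroids(X, U, mu)             # step (3)
        U_new = update_memberships(X, V, mu)       # step (4)
        change = max(abs(a - b) for ra, rb in zip(U_new, U) for a, b in zip(ra, rb))
        U = U_new
        if change < tol:
            break
    return U, V


# Hypothetical two-attribute feature vectors forming two obvious groups.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
U, V = fuzzy_c_means(X, init_centroids=[[0.0, 0.0], [10.0, 10.0]])
```

In practice the attributes would be rescaled before clustering and several initialisations tried; both are outside this sketch.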

Conventional methods suggest hardening fuzzy partition matrix

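$U = \begin{bmatrix} u_{11} & \cdots & u_{1k} & \cdots & u_{1N} \\ \vdots & & \vdots & & \vdots \\ u_{i1} & \cdots & u_{ik} & \cdots & u_{iN} \\ \vdots & & \vdots & & \vdots \\ u_{c1} & \cdots & u_{ck} & \cdots & u_{cN} \end{bmatrix}_{c \times N}$

Maximum-membership rule: if $u_{jk} = \max_{1 \le i \le c} \{u_{ik}\}$, then $u_{jk} = 1$ and $u_{ik} = 0 \;\; \forall\, i \ne j$

Minimum-distance rule: if $d_{jk} = \min_{1 \le i \le c} \{d_{ik}\}$, where $d_{ik} = \lVert x_k - v_i \rVert$, then $u_{jk} = 1$ and $u_{ik} = 0 \;\; \forall\, i \ne j$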

Instead, Rao and Srinivas (2006a) suggested using a threshold membership of 1/c to form fuzzy clusters

Srinivas et al. (2008) defined threshold for each feature vector

$T_k = \max\left\{ \dfrac{1}{c},\; \dfrac{1}{2}\left[ \max_{1 \le i \le c} (u_{ik}) \right] \right\}$
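The difference between hardening and threshold-based fuzzy clusters can be sketched as follows. Function names are hypothetical; treating a membership equal to the threshold as "in the cluster" is an assumption, and the per-site threshold is coded as the larger of 1/c and half the largest membership of that site, which is how the slide formula is read here.

```python
def harden(U):
    """Maximum-membership rule: each site goes to exactly one cluster."""
    c, N = len(U), len(U[0])
    H = [[0] * N for _ in range(c)]
    for k in range(N):
        j = max(range(c), key=lambda i: U[i][k])
        H[j][k] = 1
    return H


def fuzzy_clusters(U, thresholds=None):
    """Sites whose membership reaches the threshold; a site may appear in several clusters."""
    c, N = len(U), len(U[0])
    if thresholds is None:
        thresholds = [1.0 / c] * N          # threshold membership of 1/c
    return [[k for k in range(N) if U[i][k] >= thresholds[k]] for i in range(c)]


def site_thresholds(U):
    """Per-site threshold: max of 1/c and half the largest membership of the site."""
    c, N = len(U), len(U[0])
    return [max(1.0 / c, 0.5 * max(U[i][k] for i in range(c))) for k in range(N)]


# Hypothetical memberships of two sites in three clusters.
U_example = [[0.5, 0.1], [0.4, 0.2], [0.1, 0.7]]
hard = harden(U_example)
soft = fuzzy_clusters(U_example)
```

Here site 0 is assigned only to cluster 0 after hardening, but belongs to both clusters 0 and 1 under the 1/c threshold.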

Self-Organizing Feature Map (SOFM) Clustering

(1) Initialize weights $\{w_{ij},\; i = 1, \ldots, n,\; j = 1, \ldots, m\}$ of the connections from the n input nodes to the m output nodes, randomly.

[Figure: SOFM architecture with input nodes $x_{1k}, \ldots, x_{nk}$ and m output nodes]

(2) Draw an input vector $X_k$ (randomly without replacement) and compute its distance from j-th output node $W_j$. Find the winning output node as:

$\omega = \arg\min_j \lVert X_k(t) - W_j(t) \rVert, \quad j = 1, 2, \ldots, m$

(3) Update the weight vectors of winner and its neighboring nodes:

$W_j(t+1) = W_j(t) + \eta(t)\, h_{j,\omega}(t)\, [X_k(t) - W_j(t)]$

Repeat steps (2) and (3) over several iterations, until no changes in the Kohonen layer mapping are observed.

[Figure: Decrease in topological neighborhood with increase in iterations]

Learning rate parameter:

$\eta(t) = \eta(0)\, \exp\left(-\dfrac{t}{\tau_1}\right)$

$\eta(0)$ has value close to 0.1

$\tau_1$, a constant, is set equal to the maximum number of iterations $t_{max}$ (1000 in this study).

Neighborhood function:

$h_{j,\omega}(t) = \exp\left(-\dfrac{d_{j,\omega}^2}{2\,\sigma^2(t)}\right)$

$d_{j,\omega} = \lVert r_j - r_\omega \rVert$ is the topological distance between the winning node ω and its neighboring node j

Effective width of the topological neighborhood at time t:

$\sigma(t) = \sigma(0)\, \exp\left(-\dfrac{t}{\tau_2}\right), \quad \tau_2 = \dfrac{t_{max}}{\ln \sigma(0)}$

$r_\omega$ : Discrete vector defining the position of the winning node

$r_j$ : Discrete vector defining the position of the neighboring node j

$\sigma(0)$ : Radius of lattice in the output layer of SOFM
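The decay schedules and one update step can be sketched as below. This assumes a 1-D output lattice with node positions given as numbers, Euclidean distance for selecting the winner, and σ(0) > 1 so that ln σ(0) is positive; the function names are illustrative.

```python
import math


def learning_rate(t, eta0=0.1, tau1=1000.0):
    """eta(t) = eta(0) * exp(-t / tau1)."""
    return eta0 * math.exp(-t / tau1)


def neighborhood_width(t, sigma0, t_max=1000):
    """sigma(t) = sigma(0) * exp(-t / tau2), with tau2 = t_max / ln(sigma(0))."""
    tau2 = t_max / math.log(sigma0)
    return sigma0 * math.exp(-t / tau2)


def neighborhood(distance, t, sigma0, t_max=1000):
    """h(t) = exp(-d^2 / (2 sigma(t)^2)) for lattice distance d from the winner."""
    sigma = neighborhood_width(t, sigma0, t_max)
    return math.exp(-distance ** 2 / (2.0 * sigma ** 2))


def winning_node(x, W):
    """Index of the output node whose weight vector is closest to x."""
    return min(range(len(W)), key=lambda j: sum((a - b) ** 2 for a, b in zip(x, W[j])))


def sofm_step(x, W, positions, t, sigma0, t_max=1000, eta0=0.1):
    """One pass of steps (2) and (3) for a single input vector; W is updated in place."""
    omega = winning_node(x, W)
    eta = learning_rate(t, eta0, float(t_max))
    for j in range(len(W)):
        d = abs(positions[j] - positions[omega])   # distance on the 1-D lattice
        h = neighborhood(d, t, sigma0, t_max)
        W[j] = [w + eta * h * (a - w) for w, a in zip(W[j], x)]
    return omega


# Three output nodes on a line, one hypothetical two-attribute input vector.
W = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
winner = sofm_step([0.9, 0.9], W, positions=[0, 1, 2], t=0, sigma0=2.0)
```

With this choice of τ2 the neighborhood width shrinks from σ(0) to 1 by the final iteration.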

Existing thumb rules to decide number of nodes in output layer of SOFM

choose nodes to be at least two times CExp (Hall and Minns, 1999)

choose nodes to be at least three times CExp (Hall et al., 2002)

CExp : Expected number of clusters

Note: CExp in a study region is not known a priori.

Proposed method to design output layer of SOFM

For accurate estimation of flood quantiles with non-exceedance probability up to 0.99, regions of size 20 sites are reasonable (p. 59, pp. 119-123, Hosking and Wallis, 1997).

CExp = Number of sites in study region/20 = 245/20 ≈ 12

Case study: Indiana

[Figure: Number of hits at each output neuron]

[Figure: Count maps (number of hits) for Kohonen layers obtained with classical SOFM]

Case study: Ohio

[Figure: Count maps (number of hits) for Kohonen layers]

The feature maps generated did not reveal any information that would allow selection of an appropriate number of clusters

3 options considered:

1-D SOFM with number of nodes in Kohonen layer equal to the expected number of regions: Srinivas et al. [2003]

Growing Neural Gas Networks: Srinivas and Tripathi [2006]

Fuzzy SOFM: Srinivas et al. [2008]

Fuzzy SOFM

[Figure: Inputs $x_{1k}, \ldots, x_{nk}$; m´ prototypes formed at level-1; c clusters formed at level-2 (Srinivas et al., 2008)]






Selecting optimum number of regions using cluster validity measures

Partition Coefficient (Bezdek, 1974)

$V_{PC}(U) = \dfrac{1}{N} \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ik})^2$

Optimum: Max; Range: [1/c, 1]; In Hydrology: Bargaoui et al., 1998; Hall and Minns, 1999

Partition Entropy (Bezdek, 1974)

$V_{PE}(U) = -\dfrac{1}{N} \left[ \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik} \log(u_{ik}) \right]$

Optimum: Min; Range: [0, log c]

Fuzziness Performance Index (Roubens, 1982)

$FPI(U) = 1 - \dfrac{c \times V_{PC}(U) - 1}{c - 1}$

Optimum: Min; Range: [0, 1]; In Hydrology: Güler and Thyne, 2004

Normalized Classification Entropy (Roubens, 1982)

$NCE(U) = \dfrac{V_{PE}(U)}{\log(c)}$

Optimum: Min; Range: [0, 1]; In Hydrology: Güler and Thyne, 2004

Extended Xie-Beni Index (Xie and Beni, 1991)

$V_{XB,m}(U, V : X) = \dfrac{\sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ik})^{\mu}\, \lVert V_i - X_k \rVert^2}{N\, \min_{i \ne k} \lVert V_i - V_k \rVert^2}$

In Hydrology: Rao and Srinivas, 2006

Kwon’s Index (Kwon, 1998)

$V_{K}(U, V : X) = \dfrac{\sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ik})^{\mu}\, \lVert V_i - X_k \rVert^2 + \dfrac{1}{c} \sum_{i=1}^{c} \lVert V_i - \bar{V} \rVert^2}{\min_{i \ne k} \lVert V_i - V_k \rVert^2}$

In Hydrology: Rao and Srinivas, 2006
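The four membership-only validity measures (partition coefficient, partition entropy, FPI and NCE) can be computed from the partition matrix alone. A minimal sketch with illustrative function names; natural logarithms are assumed, and terms with zero membership are skipped, taking u log u as 0 in that limit.

```python
import math


def partition_coefficient(U):
    N = len(U[0])
    return sum(u * u for row in U for u in row) / N


def partition_entropy(U):
    N = len(U[0])
    return -sum(u * math.log(u) for row in U for u in row if u > 0.0) / N


def fuzziness_performance_index(U):
    c = len(U)
    return 1.0 - (c * partition_coefficient(U) - 1.0) / (c - 1.0)


def normalized_classification_entropy(U):
    return partition_entropy(U) / math.log(len(U))


# Limiting cases: a crisp partition and a completely fuzzy one (c = 2, N = 2).
U_crisp = [[1.0, 0.0], [0.0, 1.0]]
U_fuzzy = [[0.5, 0.5], [0.5, 0.5]]
```

The crisp partition gives the best values (PC = 1, PE = FPI = NCE = 0) and the uniform one the worst (PC = 1/c, PE = log c, FPI = NCE = 1), matching the ranges listed above.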






Definition of Regional Homogeneity

A region is said to be homogeneous if data at all sites in that region follow the same distribution.

Degree of homogeneity is decided by comparing the between-site dispersion of the sample L-moment ratios for the sites in a region with the dispersion that would be expected for a homogeneous region.

Testing Regions for Homogeneity using Heterogeneity Measures of Hosking and Wallis (1997)

Heterogeneity measure (HM) based on L-coefficient of variation (L-CV):

$H_1 = \dfrac{V - \mu_V}{\sigma_V}$

HM based on L-CV and L-Skewness:

$H_2 = \dfrac{V_2 - \mu_{V_2}}{\sigma_{V_2}}$

HM based on L-Skewness and L-Kurtosis:

$H_3 = \dfrac{V_3 - \mu_{V_3}}{\sigma_{V_3}}$

L-CV at site i : $t^{(i)}$

L-skewness at site i : $t_3^{(i)}$

L-kurtosis at site i : $t_4^{(i)}$

Regional average L-CV:

$t^R = \sum_{i=1}^{N} n_i\, t^{(i)} \Big/ \sum_{i=1}^{N} n_i$

Regional average L-skewness:

$t_3^R = \sum_{i=1}^{N} n_i\, t_3^{(i)} \Big/ \sum_{i=1}^{N} n_i$

Regional average L-kurtosis:

$t_4^R = \sum_{i=1}^{N} n_i\, t_4^{(i)} \Big/ \sum_{i=1}^{N} n_i$

Weighted standard deviation of at-site sample L-CVs:

$V = \left\{ \sum_{i=1}^{N} n_i\, (t^{(i)} - t^R)^2 \Big/ \sum_{i=1}^{N} n_i \right\}^{1/2}$

Weighted average distance from the site i to the group weighted mean in the 2-dimensional space of L-CV and L-skewness:

$V_2 = \sum_{i=1}^{N} n_i \left\{ (t^{(i)} - t^R)^2 + (t_3^{(i)} - t_3^R)^2 \right\}^{1/2} \Big/ \sum_{i=1}^{N} n_i$

Weighted average distance from the site i to the group weighted mean in the 2-dimensional space of L-skewness and L-Kurtosis:

$V_3 = \sum_{i=1}^{N} n_i \left\{ (t_3^{(i)} - t_3^R)^2 + (t_4^{(i)} - t_4^R)^2 \right\}^{1/2} \Big/ \sum_{i=1}^{N} n_i$

$\mu_V$ : Mean of V values from Monte-Carlo (MC) simulations

$\mu_{V_2}$ : Mean of V2 values from MC simulations

$\mu_{V_3}$ : Mean of V3 values from MC simulations

$\sigma_V$ : Standard deviation (SD) of V values from MC simulations

$\sigma_{V_2}$ : SD of V2 values from MC simulations

$\sigma_{V_3}$ : SD of V3 values from MC simulations

Number of MC simulations: 500

HETEROGENEITY MEASURES (Hosking and Wallis, 1997)

VALIDATION

Heterogeneity measure (HM) based on L-CV: $H_1 = \dfrac{V - \mu_V}{\sigma_V}$

HM based on L-CV and L-Skewness: $H_2 = \dfrac{V_2 - \mu_{V_2}}{\sigma_{V_2}}$

HM based on L-Skewness and L-Kurtosis: $H_3 = \dfrac{V_3 - \mu_{V_3}}{\sigma_{V_3}}$

Acceptably homogeneous : H < 1

Possibly heterogeneous : 1≤ H < 2

Definitely heterogeneous : H ≥ 2
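A sketch of how the L-CV based measure and the three classes fit together. It assumes n_i is the record length at site i, takes the simulated V values as given (the Monte-Carlo simulation of homogeneous regions is not shown), and uses the sample standard deviation; all function names and numbers are illustrative.

```python
import math
import statistics


def weighted_sd_of_lcv(t, n):
    """V: record-length-weighted standard deviation of the at-site L-CVs."""
    total = sum(n)
    t_regional = sum(ni * ti for ni, ti in zip(n, t)) / total
    return math.sqrt(sum(ni * (ti - t_regional) ** 2 for ni, ti in zip(n, t)) / total)


def heterogeneity_measure(v_observed, v_simulated):
    """H = (V - mean of simulated V) / SD of simulated V."""
    mu = statistics.mean(v_simulated)
    sigma = statistics.stdev(v_simulated)   # sample SD, an assumption of this sketch
    return (v_observed - mu) / sigma


def classify_region(H):
    if H < 1.0:
        return "acceptably homogeneous"
    if H < 2.0:
        return "possibly heterogeneous"
    return "definitely heterogeneous"


# Two hypothetical sites with 10-year records and L-CVs of 0.2 and 0.4.
V_obs = weighted_sd_of_lcv([0.2, 0.4], [10, 10])
```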






Adjustment of Heterogeneous Clusters

Hosking and Wallis (1997):

eliminating one or more sites from the data set

dividing a region to form two or more new regions

allowing a site to be shared by two or more regions

moving one or more sites from a region to other regions

dissolving regions by transferring their sites to other regions

merging a region with one or more regions

merging two or more regions and redefining groups

obtaining more data and redefining regions

Adjustment of Heterogeneous Clusters

The primary option that is considered for adjusting a heterogeneous cluster is elimination of one or more sites that are grossly discordant with respect to other sites within the cluster. The grossly discordant sites are identified using the discordancy measure given by

$D_i = \dfrac{1}{3}\, N_k\, (\mathbf{u}_i - \bar{\mathbf{u}})^T\, \mathbf{S}^{-1}\, (\mathbf{u}_i - \bar{\mathbf{u}})$

where $D_i$ is the discordancy measure for the ith site in a cluster k having $N_k$ sites.

$\mathbf{u}_i = \left[\, t^{(i)} \;\; t_3^{(i)} \;\; t_4^{(i)} \,\right]^T$ is a vector containing L-moment ratios (LMR) of the predictand at site i

$\bar{\mathbf{u}}$ is the unweighted group average of the LMR and $\mathbf{S}$ is a covariance matrix:

$\bar{\mathbf{u}} = \dfrac{1}{N_k} \sum_{i=1}^{N_k} \mathbf{u}_i$

$\mathbf{S} = \sum_{i=1}^{N_k} (\mathbf{u}_i - \bar{\mathbf{u}})(\mathbf{u}_i - \bar{\mathbf{u}})^T$
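The discordancy measure can be sketched without external libraries by solving S y = (u_i − ū) instead of inverting S. The linear solver, function names and the six L-moment ratio vectors below are illustrative assumptions, not data from the case studies. A useful check is that the D_i values always sum to N_k.

```python
def solve_linear(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for cc in range(col, n + 1):
                M[r][cc] -= factor * M[col][cc]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (M[r][n] - sum(M[r][cc] * y[cc] for cc in range(r + 1, n))) / M[r][r]
    return y


def discordancy(u):
    """D_i = (N/3) (u_i - u_bar)^T S^-1 (u_i - u_bar) for each site i."""
    N, dim = len(u), len(u[0])
    mean = [sum(ui[j] for ui in u) / N for j in range(dim)]
    dev = [[ui[j] - mean[j] for j in range(dim)] for ui in u]
    S = [[sum(d[a] * d[b] for d in dev) for b in range(dim)] for a in range(dim)]
    return [N / 3.0 * sum(di * yi for di, yi in zip(d, solve_linear(S, d))) for d in dev]


# Hypothetical (L-CV, L-skewness, L-kurtosis) at six sites of one cluster.
lmr = [
    [0.20, 0.15, 0.12],
    [0.22, 0.18, 0.15],
    [0.19, 0.20, 0.10],
    [0.25, 0.10, 0.14],
    [0.21, 0.16, 0.20],
    [0.40, 0.45, 0.05],
]
D = discordancy(lmr)
```

Sites with large D_i would then be compared against a critical value, which is not given on the slide and is left out of this sketch.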






Growth curve estimation using index flood method

The flood quantile estimate at site i for T-year return period is determined using index flood method (Dalrymple, 1960) by constructing a growth curve as,

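$\hat{Q}_i(T) = \hat{\mu}_i\, \hat{q}_k(T)$

$\hat{\mu}_i$ : Index flood value for site i (e.g., mean or median of flood data)

$\hat{q}_k(T)$ : Value of the growth curve at return period T for the region (cluster) k containing site i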

Probability Distributions to fit peak flows

Two parameter distributions
• Gumbel distribution (Extreme value type-1)
• Log-Normal 2-parameter (LN2) distribution
• Two parameter Gamma distribution

Three parameter distributions
• Generalized Pareto (GPA) distribution
• Generalized Extreme value (GEV) distribution
• Generalized logistic (GLO) distribution
• Log-Normal 3-parameter (LN3) distribution; Generalized Normal (GNO) distribution
• Three parameter Gamma distribution; Pearson type-3 (P3) distribution
• Log-Pearson type-3 (LP3) distribution

Four parameter distributions
• Kappa distribution
• Four-parameter Wakeby distribution

Five parameter distribution
• Wakeby distribution

Which probability distribution fits the flood data?

Procedures for Identification

Probability papers

Goodness-of-fit tests

Chi-Square test

Kolmogorov-Smirnov test

L-moment ratio diagram

Probability Paper

Flood data are plotted on probability paper to check if a frequency distribution fits the data

Probability papers are specifically designed for the frequency distributions

One of the axes represents the values of peak flows; the other axis represents exceedance or non-exceedance probability associated with peak flows, or return period

The plotted data appear close to a straight line if the frequency distribution fits the data

Flood quantile is estimated graphically by interpolation or extrapolation of the determined linear relationship between abscissa and ordinate

Plotting Position Formulae

General form: $P(X \ge x_m) = \dfrac{m - b}{n + 1 - 2b}$

S.No.  Formula               b      Equation
1      California            -      $P(X \ge x_m) = \dfrac{m}{n}$
2      Modified California   -      $P(X \ge x_m) = \dfrac{m - 1}{n}$
3      Hazen                 0.5    $P(X \ge x_m) = \dfrac{m - 0.5}{n}$
4      Gringorten            0.44   $P(X \ge x_m) = \dfrac{m - 0.44}{n + 0.12}$
5      Blom                  3/8    $P(X \ge x_m) = \dfrac{m - 3/8}{n + 1/4}$
6      Tukey                 1/3    $P(X \ge x_m) = \dfrac{m - 1/3}{n + 1/3}$
7      Chegodayev            0.3    $P(X \ge x_m) = \dfrac{m - 0.3}{n + 0.4}$
8      Weibull               0      $P(X \ge x_m) = \dfrac{m}{n + 1}$

Results from study of Cunnane (1978)

Probability distribution         Best plotting position formula
Log-normal / Normal              Blom: $P(X \ge x_m) = \dfrac{m - 3/8}{n + 1/4}$
EV-1 (Extreme value type-1)      Gringorten: $P(X \ge x_m) = \dfrac{m - 0.44}{n + 0.12}$
Exponential                      Gringorten: $P(X \ge x_m) = \dfrac{m - 0.44}{n + 0.12}$
Uniform                          Weibull: $P(X \ge x_m) = \dfrac{m}{n + 1}$
LP-3 (Log Pearson type 3)        $P(X \ge x_m) = \dfrac{m - b}{n + 1 - 2b}$ with $b > 3/8$ for $C_s > 0$ and $b < 3/8$ for $C_s < 0$

$F(x) = 1 - P(X \ge x_m)$
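The general plotting position formula can be sketched as follows. It assumes the usual convention that m is the rank of an observation when the n peak flows are sorted in descending order (m = 1 for the largest), which the slide does not state explicitly; names and sample values are illustrative.

```python
B_VALUES = {
    "hazen": 0.5,
    "gringorten": 0.44,
    "blom": 3.0 / 8.0,
    "tukey": 1.0 / 3.0,
    "chegodayev": 0.3,
    "weibull": 0.0,
}


def exceedance_probability(m, n, b):
    """General form P(X >= x_m) = (m - b) / (n + 1 - 2b)."""
    return (m - b) / (n + 1.0 - 2.0 * b)


def plotting_positions(flows, b=0.0):
    """Return (flow, exceedance probability, return period) for ranked flows."""
    ranked = sorted(flows, reverse=True)   # rank 1 = largest flow
    n = len(ranked)
    result = []
    for m, q in enumerate(ranked, start=1):
        p = exceedance_probability(m, n, b)
        result.append((q, p, 1.0 / p))
    return result


# Three hypothetical annual maxima with the Weibull formula (b = 0).
positions = plotting_positions([30.0, 10.0, 20.0], b=B_VALUES["weibull"])
```

The largest of three flows gets P = 1/4, that is, a return period of 4 years under the Weibull formula.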

Goodness-of-fit tests

Chi-square test

The flood data at the target site are divided into m class intervals.

$$\chi^2 = \sum_{i=1}^{m} \frac{(O_i - E_i)^2}{E_i}$$

$O_i$: The observed number of occurrences in class interval i
$E_i$: The expected number of occurrences in class interval i

If the class intervals are chosen such that each class interval corresponds to an equal probability, then $E_i = n/m$, where n is the sample size of the data.

$$\chi^2 = \frac{m}{n} \sum_{i=1}^{m} O_i^2 - n$$

The distribution of $\chi^2$ is chi-square with $\upsilon = m - p - 1$ degrees of freedom.

p: The number of parameters of the distribution that is being tested to fit the data.

$$\chi^2_{\upsilon} = \sum_{i=1}^{\upsilon} z_i^2$$

$z_i$: the ith random variable among $\upsilon$ independent standard normal random variables

The hypothesis that the data are from a specified distribution is rejected at significance level $\alpha$ if the following equation is satisfied.

$$\chi^2 > \chi^2_{\upsilon,\,1-\alpha}$$
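The two forms of the statistic can be checked against each other with a short sketch. The class counts below are hypothetical, and the critical value 5.991 is the tabulated chi-square quantile for 2 degrees of freedom at the 5% significance level; in practice it would be read from a table or a statistics library.

```python
def chi_square_statistic(observed, expected):
    """General form: sum over classes of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_equal_probability(observed):
    """Shortcut for equal-probability classes, where E_i = n / m."""
    m = len(observed)
    n = sum(observed)
    return (m / n) * sum(o * o for o in observed) - n


def degrees_of_freedom(m, p):
    """upsilon = m - p - 1 for m classes and p fitted parameters."""
    return m - p - 1


# Hypothetical counts in m = 5 equal-probability classes (n = 50).
counts = [7, 12, 9, 8, 14]
n, m = sum(counts), len(counts)
stat = chi_square_statistic(counts, [n / m] * m)
shortcut = chi_square_equal_probability(counts)
nu = degrees_of_freedom(m, p=2)   # e.g. a two-parameter distribution
critical = 5.991                  # tabulated chi-square, nu = 2, alpha = 0.05
print(stat, shortcut, nu, "reject" if stat > critical else "do not reject")
```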

Kolmogorov-Smirnov Goodness-of-fit test

The Kolmogorov-Smirnov test compares the sample cumulative frequency function $F_s(x)$ and the theoretical cumulative distribution function $F(x)$ (determined from the theoretical frequency distribution under test) to judge whether a chosen distribution fits the data acceptably closely.

The observed flood data at the target site are divided into m class intervals, and the maximum deviation over all class intervals is determined as

$$D = \max_{1 \le i \le m} \left| F(x_i) - F_s(x_i) \right|$$

$x_i$: The upper limit of the ith class interval

The hypothesis that the data are from a specified distribution is rejected at significance level $\alpha$ if the following equation is satisfied.

$$D \ge D_{n,\alpha}$$
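A minimal sketch of the statistic D follows, assuming the sample cumulative frequency at each class upper limit is the fraction of observations not exceeding that limit. The function name `ks_statistic`, the data and the exponential CDF used as the candidate distribution are hypothetical; the critical value $D_{n,\alpha}$ is not computed here and would come from Kolmogorov-Smirnov tables.

```python
import math


def ks_statistic(data, upper_limits, cdf):
    """Maximum |F(x_i) - F_s(x_i)| over the class upper limits x_i."""
    n = len(data)
    d = 0.0
    for x in upper_limits:
        f_sample = sum(1 for v in data if v <= x) / n  # F_s(x_i)
        d = max(d, abs(cdf(x) - f_sample))
    return d


# Hypothetical peak flows and an exponential CDF with an assumed mean of 250.
peaks = [120.0, 95.0, 310.0, 180.0, 260.0, 75.0, 410.0, 150.0]
limits = [100.0, 200.0, 300.0, 450.0]
d = ks_statistic(peaks, limits, lambda x: 1.0 - math.exp(-x / 250.0))
print(f"D = {d:.3f}")
```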

L-Moment Ratio Diagram

[Figure: L-moment ratio diagram, L-kurtosis versus L-skewness, with curves for the GLO, GPA, GEV, LN3 and PE3 distributions]

L-Moment Ratio Diagram - Region 4

[Figure: L-moment ratio diagram for Region 4, L-kurtosis versus L-skewness. Legend: GLO, GPA, GEV, LN3, PE3, Gumbel, Exponential, Sites of Region-4, Group Mean]

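Wang (1996) derived the following direct sample estimators of the L-moments

$$\hat{\lambda}_1 = \frac{1}{\binom{n}{1}} \sum_{j=1}^{n} x_{j:n}$$

$$\hat{\lambda}_2 = \frac{1}{2} \frac{1}{\binom{n}{2}} \sum_{j=1}^{n} \left[ \binom{j-1}{1} - \binom{n-j}{1} \right] x_{j:n}$$

$$\hat{\lambda}_3 = \frac{1}{3} \frac{1}{\binom{n}{3}} \sum_{j=1}^{n} \left[ \binom{j-1}{2} - 2\binom{j-1}{1}\binom{n-j}{1} + \binom{n-j}{2} \right] x_{j:n}$$

$$\hat{\lambda}_4 = \frac{1}{4} \frac{1}{\binom{n}{4}} \sum_{j=1}^{n} \left[ \binom{j-1}{3} - 3\binom{j-1}{2}\binom{n-j}{1} + 3\binom{j-1}{1}\binom{n-j}{2} - \binom{n-j}{3} \right] x_{j:n}$$

Coefficient of L-variation (L-CV), $t = l_2 / l_1$

L-skewness, $t_3 = l_3 / l_2$

L-kurtosis, $t_4 = l_4 / l_2$

In general, $t_r = l_r / l_2$ for $r \ge 3$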
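The estimators translate almost directly into code. The sketch below assumes $x_{j:n}$ denotes the jth smallest value in a sample of size n and uses `math.comb` for the binomial coefficients (it returns 0 when the lower index exceeds the upper one). The function names and the sample are illustrative.

```python
from math import comb


def sample_l_moments(data):
    """First four sample L-moments by the direct estimators of Wang (1996)."""
    x = sorted(data)  # x[j-1] is x_{j:n}, the jth smallest value
    n = len(x)
    if n < 4:
        raise ValueError("at least four observations are needed for l4")
    l1 = l2 = l3 = l4 = 0.0
    for j, xj in enumerate(x, start=1):
        a, b = j - 1, n - j  # values below and above x_{j:n}
        l1 += xj
        l2 += (comb(a, 1) - comb(b, 1)) * xj
        l3 += (comb(a, 2) - 2 * comb(a, 1) * comb(b, 1) + comb(b, 2)) * xj
        l4 += (comb(a, 3) - 3 * comb(a, 2) * comb(b, 1)
               + 3 * comb(a, 1) * comb(b, 2) - comb(b, 3)) * xj
    return (l1 / comb(n, 1), l2 / (2 * comb(n, 2)),
            l3 / (3 * comb(n, 3)), l4 / (4 * comb(n, 4)))


def l_moment_ratios(data):
    """L-CV, L-skewness and L-kurtosis from the sample L-moments."""
    l1, l2, l3, l4 = sample_l_moments(data)
    return {"L-CV": l2 / l1, "L-skewness": l3 / l2, "L-kurtosis": l4 / l2}


# Hypothetical annual peaks, for illustration only.
print(l_moment_ratios([320.0, 150.0, 410.0, 275.0, 198.0, 240.0, 505.0]))
```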

Regional Goodness-of-fit Test

$N_k$: Number of sites in region k

$n_i$: Number of peak flow values at site i

$t^{(i)}$, $t_3^{(i)}$, $t_4^{(i)}$: L-moment ratios computed using peak flow values at site i

Compute the regional average L-moment ratios

$$t^{r} = \frac{\sum_{i=1}^{N_k} n_i\, t^{(i)}}{\sum_{i=1}^{N_k} n_i}, \qquad t_3^{r} = \frac{\sum_{i=1}^{N_k} n_i\, t_3^{(i)}}{\sum_{i=1}^{N_k} n_i}, \qquad t_4^{r} = \frac{\sum_{i=1}^{N_k} n_i\, t_4^{(i)}}{\sum_{i=1}^{N_k} n_i}$$

Set $l_1^{r} = 1$ by dividing the flood values at each site by their mean.

Fit the distribution being tested using the regional average L-moments.

$\tau_4^{DIST}$: L-kurtosis of the fitted distribution

$t_4^{r}$: L-kurtosis of the observed data

The goodness of fit is judged by the difference between $\tau_4^{DIST}$ and $t_4^{r}$.

Compare the difference with the standard deviation $\sigma_4$ (i.e., sampling variability) of $t_4^{r}$, obtained by repeated simulation of the homogeneous region using the kappa distribution.

Regional goodness-of-fit measure

$$Z^{DIST} = \frac{\tau_4^{DIST} - t_4^{r} + B_4}{\sigma_4}$$

$B_4$: Bias of $t_4^{r}$

$$B_4 = \frac{1}{N_{sim}} \sum_{m=1}^{N_{sim}} \left( t_4^{[m]} - t_4^{r} \right)$$

$\sigma_4$: Standard deviation of $t_4^{r}$

$$\sigma_4 = \sqrt{ \frac{1}{N_{sim} - 1} \left[ \sum_{m=1}^{N_{sim}} \left( t_4^{[m]} - t_4^{r} \right)^2 - N_{sim} B_4^2 \right] }$$

$t_4^{[m]}$: Regional average L-kurtosis computed using the synthetic data simulated for the mth realization

Simulate a large number of realizations ($N_{sim}$ = 500) of the region using the kappa distribution. Each realization constitutes a homogeneous region in which the length of the synthetic flood record simulated at each site equals that of its real-world counterpart. Further, in each realization, the data simulated at any site in the region are serially independent, and the data simulated at different sites in the region are not cross-correlated.

Hosking & Wallis (1997)
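The arithmetic of the regional average and of $Z^{DIST}$ can be sketched as follows. The kappa-distribution simulation itself is not reproduced: the list `t4_sim` stands in for the simulated regional L-kurtosis values $t_4^{[m]}$, and all numbers and function names are hypothetical.

```python
import math


def regional_average(ratios, record_lengths):
    """Record-length-weighted average of an at-site L-moment ratio."""
    total = sum(record_lengths)
    return sum(n * t for n, t in zip(record_lengths, ratios)) / total


def z_dist(tau4_dist, t4_regional, t4_simulated):
    """Z = (tau4_DIST - t4_r + B4) / sigma4, with B4 and sigma4 from simulations."""
    n_sim = len(t4_simulated)
    b4 = sum(t - t4_regional for t in t4_simulated) / n_sim
    ss = sum((t - t4_regional) ** 2 for t in t4_simulated)
    sigma4 = math.sqrt((ss - n_sim * b4 ** 2) / (n_sim - 1))
    return (tau4_dist - t4_regional + b4) / sigma4


# Hypothetical at-site L-kurtosis values and record lengths for three sites.
t4_r = regional_average([0.12, 0.18, 0.15], [30, 45, 25])
# Placeholder for t4 values from N_sim simulated regions (only 5 shown here).
t4_sim = [0.13, 0.17, 0.16, 0.14, 0.19]
print(f"t4_r = {t4_r:.4f}, Z = {z_dist(0.17, t4_r, t4_sim):.2f}")
```

Hosking & Wallis (1997) suggest treating a fit as acceptable when $|Z^{DIST}| \le 1.64$.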

[Figure: L-moment ratio diagram, L-kurtosis versus L-skewness, with curves for GLO, GPA, GEV, LN3 and PE3, showing the fitted and observed values]

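Example: Generalized extreme value distribution

Probability density function

$$f(x) = \alpha^{-1} e^{-(1-k)y - e^{-y}}, \qquad \begin{cases} -\infty < x \le \xi + \alpha/k, & \text{if } k > 0 \\ -\infty < x < \infty, & \text{if } k = 0 \\ \xi + \alpha/k \le x < \infty, & \text{if } k < 0 \end{cases}$$

where

$$y = \begin{cases} -k^{-1} \ln\left[ 1 - \dfrac{k (x - \xi)}{\alpha} \right], & \text{if } k \ne 0 \\ \dfrac{x - \xi}{\alpha}, & \text{if } k = 0 \end{cases}$$

Cumulative distribution function

$$F(x) = e^{-e^{-y}}$$

Shape parameter

$$k = 7.8590\,c + 2.9554\,c^2, \qquad c = \frac{2}{3 + t_3^{r}} - \frac{\ln 2}{\ln 3}$$

Scale parameter

$$\alpha = \frac{l_2^{r}\, k}{\left( 1 - 2^{-k} \right) \Gamma(1 + k)}$$

Location parameter

$$\xi = l_1^{r} + \frac{\alpha}{k} \left[ \Gamma(1 + k) - 1 \right]$$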
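The parameter formulas can be tried out with a short sketch. It uses Python's built-in `math.gamma` for $\Gamma(1+k)$ and adds a quantile function obtained by inverting the CDF, $x = \xi + \alpha\,[1 - (-\ln F)^k]/k$. The k = 0 (Gumbel) case is deliberately not handled, and the function names and input L-moments are hypothetical.

```python
import math


def gev_parameters(l1, l2, t3):
    """Location xi, scale alpha and shape k of the GEV from l1, l2 and t3."""
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c
    if abs(k) < 1e-6:
        raise ValueError("k is close to 0; use the k = 0 (Gumbel) form instead")
    g = math.gamma(1.0 + k)
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    xi = l1 + alpha * (g - 1.0) / k
    return xi, alpha, k


def gev_cdf(x, xi, alpha, k):
    """F(x) = exp(-exp(-y)) with y = -ln[1 - k (x - xi) / alpha] / k, k != 0."""
    y = -math.log(1.0 - k * (x - xi) / alpha) / k
    return math.exp(-math.exp(-y))


def gev_quantile(f, xi, alpha, k):
    """Inverse of gev_cdf for non-exceedance probability f, k != 0."""
    return xi + alpha * (1.0 - (-math.log(f)) ** k) / k


# Hypothetical regional L-moments (l1 = 1 after scaling by the at-site mean).
xi, alpha, k = gev_parameters(l1=1.0, l2=0.2, t3=0.25)
print(f"xi = {xi:.4f}, alpha = {alpha:.4f}, k = {k:.4f}")
print(f"100-year growth factor = {gev_quantile(0.99, xi, alpha, k):.3f}")
```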

How to compute $\Gamma(1 + k)$?

$\Gamma(\cdot)$: Gamma function

$$\Gamma(1 + y) = \int_0^{\infty} t^{y} e^{-t}\, dt, \qquad 1 + y > 0$$

Properties of Gamma function

$$\Gamma(1 + y) = y\, \Gamma(y), \quad \text{if } y > 0$$

$$\Gamma(y) = \Gamma(1 + y) / y, \quad \text{if } y < 1$$

$$\Gamma(n) = (n - 1)!, \quad \text{if } n \text{ is a positive integer}$$

Examples:

$$\Gamma(3.15) = \Gamma(1 + 2.15) = 2.15\, \Gamma(2.15) = 2.15\, \Gamma(1 + 1.15) = 2.15 \times 1.15\, \Gamma(1.15) = 2.15 \times 1.15 \times \Gamma(1 + 0.15)$$

$$\Gamma(-0.75) = \frac{\Gamma(-0.75 + 1)}{-0.75} = \frac{\Gamma(0.25)}{-0.75} = \frac{\Gamma(1 + 0.25)}{-0.75 \times 0.25}$$

Polynomial approximation:

$$\Gamma(1 + y) = 1 + b_1 y + b_2 y^2 + \cdots + b_8 y^8 + \varepsilon(y), \quad \text{if } 0 \le y \le 1$$

$$|\varepsilon(y)| \le 3 \times 10^{-7}$$

$b_1 = -0.577191652$    $b_5 = -0.756704078$
$b_2 = 0.988205891$     $b_6 = 0.482199394$
$b_3 = -0.897056937$    $b_7 = -0.193527818$
$b_4 = 0.918206857$     $b_8 = 0.035868343$
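The recurrence properties and the polynomial can be combined into a small routine, sketched below and compared with Python's `math.gamma`. The function names are illustrative; the reduction loops simply repeat the two recurrences until the argument lies between 1 and 2.

```python
import math

# Coefficients b1..b8 of the polynomial for Gamma(1 + y), 0 <= y <= 1.
B = [-0.577191652, 0.988205891, -0.897056937, 0.918206857,
     -0.756704078, 0.482199394, -0.193527818, 0.035868343]


def gamma_1_plus_y(y):
    """Polynomial approximation of Gamma(1 + y) for 0 <= y <= 1."""
    if not 0.0 <= y <= 1.0:
        raise ValueError("polynomial is valid only for 0 <= y <= 1")
    total, power = 1.0, 1.0
    for b in B:
        power *= y
        total += b * power
    return total


def gamma_approx(x):
    """Gamma(x) via the recurrences plus the polynomial."""
    if x <= 0 and x == int(x):
        raise ValueError("Gamma is undefined at zero and negative integers")
    factor = 1.0
    while x > 2.0:   # Gamma(x) = (x - 1) * Gamma(x - 1)
        x -= 1.0
        factor *= x
    while x < 1.0:   # Gamma(x) = Gamma(x + 1) / x
        factor /= x
        x += 1.0
    return factor * gamma_1_plus_y(x - 1.0)


for value in (3.15, -0.75, 1.0 - 0.1215):
    print(value, gamma_approx(value), math.gamma(value))
```

For x = 3.15 the loop reproduces the slide's chain 2.15 × 1.15 × Γ(1 + 0.15), and for x = -0.75 it gives Γ(1 + 0.25) / (-0.75 × 0.25).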

[Figure: L-moment ratio diagram, L-kurtosis versus L-skewness, with curves for GLO, GPA, GEV, LN3 and PE3, showing the fitted and observed values]

Application Examples

REGIONALIZATION OF INDIANA WATERSHEDS

Table 1.5.1. Attributes considered for regionalization of Indiana watersheds.

Attribute                        Range
Drainage Area                    0.28 – 28,813 km²
Mean Annual Precipitation        86.36 – 116.84 cm
Main channel Slope               0.17 – 50.57 m/km
Main channel Length              0.48 – 506.94 km
Basin Elevation                  125.6 – 362.7 m
Storage*                         0% – 11%
Soil Runoff coefficient          0.30 – 1.00
Forest cover in Drainage Area    0.0 – 88.4%
I(24,2)**                        6.60 – 8.51 cm

*Storage – percentage of the contributing drainage area covered by lakes, ponds or wetlands
**I(24,2) – 24-hour rainfall having a recurrence interval of 2 years.

Source: Indiana Department of Natural Resources (IDNR), Division of Water; USGS

Characteristics of the regions formed using Regression Approach (Glatfelter, 1984)

                           Heterogeneity Measure
Region number  NS   RS     H1      H2      H3      Region type
1              16   598    4.85    1.39    -0.62   Heterogeneous
2              31   1191   4.99    0.96    -0.62   Heterogeneous
3              60   3449   2.09    0.14    -0.40   Heterogeneous
4              46   1294   1.08    1.65    0.20    Possibly Homogeneous
5              35   852    2.96    -0.24   -1.47   Heterogeneous
6              32   913    4.57    2.96    2.13    Heterogeneous
7              22   901    10.81   2.31    0.72    Heterogeneous

NS : Number of stations
RS : Region size in station years
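The region-type column appears consistent with the heterogeneity criteria of Hosking and Wallis (1997) applied to H1: a value below 1 is acceptably homogeneous, a value from 1 up to 2 is possibly heterogeneous (labelled "Possibly Homogeneous" in the table), and a value of 2 or more is definitely heterogeneous. The sketch below encodes that rule; basing the label on H1 alone is an assumption made here, and the function name is hypothetical.

```python
def region_type(h1):
    """Classify a region from its H1 heterogeneity measure.

    Thresholds follow Hosking and Wallis (1997); the labels follow the table.
    """
    if h1 < 1.0:
        return "Homogeneous"
    if h1 < 2.0:
        return "Possibly Homogeneous"
    return "Heterogeneous"


# H1 values of the seven Glatfelter regions, taken from the table above.
glatfelter_h1 = [4.85, 4.99, 2.09, 1.08, 2.96, 4.57, 10.81]
for number, h1 in enumerate(glatfelter_h1, start=1):
    print(number, h1, region_type(h1))
```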

Characteristics of the regions formed using Fuzzy c-means clustering (Rao and Srinivas, 2006)

                           Heterogeneity Measure
Region number  NS   RS     H1      H2      H3      Region type
1              52   911    0.63    0.83    -0.07   Homogeneous
2              60   2010   0.89    0.92    -0.08   Homogeneous
3              40   837    0.23    0.95    0.11    Homogeneous
4              75   3274   0.79    0.80    -0.08   Homogeneous
5              22   975    0.97    -0.13   -0.77   Homogeneous
6              10   431    13.51   5.71    2.62    Heterogeneous
7              24   1012   0.48    0.03    0.79    Homogeneous
8              14   160    0.99    -0.24   -1.70   Homogeneous


Characteristics of the regions formed using Fuzzy SOFM (Srinivas et al., 2008):

                           Heterogeneity Measure
Region number  NS   RS     H1      H2      H3      Region type
1              45   674    0.65    -0.30   -0.93   Homogeneous
2              55   1869   0.76    0.41    -0.37   Homogeneous
3              29   608    -0.32   0.93    0.31    Homogeneous
4              62   2820   0.86    0.69    -0.46   Homogeneous
5              24   1121   0.93    -0.11   -0.82   Homogeneous
6              11   467    13.77   6.05    2.41    Heterogeneous
7              25   990    0.82    0.42    1.27    Homogeneous
8              16   188    0.54    -0.69   -1.99   Homogeneous


[Figure: maps of the Glatfelter regions and of the regions formed by Fuzzy SOFM clustering]

Assessment of Predictive Performance using Leave-one-out Cross Validation

(i) Input: flood data and basin attributes for sites s = 1,…, N.

(ii) Take s = 1.

(iii) Consider site s as ungauged.

(iv) Form the region(s) or neighborhood for site s:
• For the SOFM method, compute centroids of the region(s) Vi, i = 1,…, NR, and assign site s to the regions using the procedure given in section 2.6.
• For Glatfelter regions, assign site s to a region based on its geographical location.
• For the CCA method, find neighborhoods for site s using the procedure described in Appendix B of Ouarda et al. (2001) for the chosen confidence level.

(v) Determine growth curves of the region(s)/neighborhoods of s using the index flood method.

(vi) Determine the regression relationship between basin attributes and mean annual flood for the region(s)/neighborhood.

(vii) For the SOFM method compute flood quantiles of site s using Eq.(36), whereas for the CCA and Glatfelter methods use Eq.(35).

(viii) Set s = s + 1.

(ix) Is s > N? If no, return to step (iii).

(x) If yes, compute performance measures (e.g., R-bias, R-RMSE, NMSE).

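$$\text{R-bias} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{\hat{Q}_i^{a} - \hat{Q}_i^{R}}{\hat{Q}_i^{a}} \right)$$

$$\text{R-RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( \frac{\hat{Q}_i^{a} - \hat{Q}_i^{R}}{\hat{Q}_i^{a}} \right)^2 }$$

$$\text{NMSE} = \frac{ \dfrac{1}{N} \sum_{i=1}^{N} \left( \hat{Q}_i^{a} - \hat{Q}_i^{R} \right)^2 }{ \dfrac{1}{N - 1} \sum_{i=1}^{N} \left( \hat{Q}_i^{a} - \bar{Q}^{a} \right)^2 }$$

$\hat{Q}_i^{a}$: At-site flood quantile

$\hat{Q}_i^{R}$: Regional quantile

$\bar{Q}^{a}$: Mean of the at-site flood quantiles over all N sites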
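The three measures are simple to compute once the at-site and regional quantiles are available. The sketch below takes the difference as at-site minus regional and treats R-RMSE as a root mean square; both are readings of the slide formulas rather than something the text states outright. The function names and the quantile values are hypothetical.

```python
import math


def r_bias(at_site, regional):
    """Mean relative difference (Q_a - Q_R) / Q_a over all sites."""
    return sum((a - r) / a for a, r in zip(at_site, regional)) / len(at_site)


def r_rmse(at_site, regional):
    """Root of the mean squared relative difference."""
    n = len(at_site)
    return math.sqrt(sum(((a - r) / a) ** 2 for a, r in zip(at_site, regional)) / n)


def nmse(at_site, regional):
    """Mean squared error normalised by the sample variance of at-site quantiles."""
    n = len(at_site)
    mean_a = sum(at_site) / n
    mse = sum((a - r) ** 2 for a, r in zip(at_site, regional)) / n
    variance = sum((a - mean_a) ** 2 for a in at_site) / (n - 1)
    return mse / variance


# Hypothetical at-site and regional quantile estimates at three sites.
q_at_site = [100.0, 200.0, 300.0]
q_regional = [110.0, 180.0, 300.0]
print(r_bias(q_at_site, q_regional), r_rmse(q_at_site, q_regional),
      nmse(q_at_site, q_regional))
```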

Leave-one-out cross validation

       Two-level SOFM            Regional Regression       CCA
T      R-bias  R-RMSE  NMSE      R-bias  R-RMSE  NMSE      R-bias  R-RMSE  NMSE
2      -0.139  0.618   0.065     -0.174  0.811   0.075     -0.198  0.755   0.071
10     -0.141  0.592   0.073     -0.163  0.785   0.086     -0.217  0.798   0.081
20     -0.145  0.605   0.080     -0.162  0.779   0.092     -0.223  0.816   0.086
100    -0.163  0.688   0.107     -0.166  0.805   0.117     -0.242  0.884   0.108

An ordinary least squares regional regression equation is used to estimate the index flood value for the target ungauged site.
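The leave-one-out idea for the index flood can be sketched with a deliberately simplified regression. The example below assumes a single basin attribute (drainage area) and a log-log ordinary least squares fit; the actual study uses several attributes and region-specific equations, so the model form, the function names and the data are all illustrative assumptions.

```python
import math


def fit_loglog_ols(areas, floods):
    """OLS fit of ln(Q) = c + e * ln(A); returns (c, e)."""
    xs = [math.log(a) for a in areas]
    ys = [math.log(q) for q in floods]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return y_mean - slope * x_mean, slope


def leave_one_out_predictions(areas, floods):
    """Treat each site in turn as ungauged and predict its mean annual flood."""
    predictions = []
    for s in range(len(areas)):
        train_areas = areas[:s] + areas[s + 1:]     # drop site s
        train_floods = floods[:s] + floods[s + 1:]
        c, e = fit_loglog_ols(train_areas, train_floods)
        predictions.append(math.exp(c + e * math.log(areas[s])))
    return predictions


# Hypothetical drainage areas (km2) and mean annual floods (m3/s).
areas = [12.0, 85.0, 240.0, 960.0, 3100.0]
floods = [9.0, 41.0, 88.0, 260.0, 610.0]
for a, q, q_hat in zip(areas, floods, leave_one_out_predictions(areas, floods)):
    print(f"A = {a:7.1f}  observed = {q:6.1f}  predicted = {q_hat:6.1f}")
```

The predicted index floods would then be multiplied by the regional growth curve to obtain quantiles, which feed the performance measures.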

REGIONALIZATION OF OHIO WATERSHEDS

[Figure: GNG clusters and GNG regions (labelled A, B, C) for Ohio watersheds, shown alongside the regions of Koltun (2003)]

Application Example

REGIONALIZATION OF WATERSHEDS IN GODAVARI BASIN

Forming Watershed Clusters using Cluster Analysis

Hard clustering
• Mosley [1981], Tasker [1982], Burn et al. [1997] (Hierarchical)
• Burn [1989], Burn and Goel [2000] (Partitional)
• Rao and Srinivas [2006] (Hybrid)

Fuzzy Clustering
• Bargaoui et al. [1998], Hall and Minns [1999], Rao and Srinivas [2006]

Artificial Neural Network Clustering
• SOFMs: Hall and Minns [1999], Hall et al. [2002], Jingyi and Hall [2004]
• Growing Neural Gas Networks: Srinivas and Tripathi [2006]
• Fuzzy SOFM: Srinivas et al. [2008]

Study Area

Fig. Stream network and gauges in Godavari basin (Area = 3,12,813 km²)

Performance Measures Considered to Compare the Regions

$$\text{R-bias}(T) = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{\hat{q}_i^{a}(T) - \hat{q}_i(T)}{\hat{q}_i^{a}(T)} \right) \times 100$$

$$\text{R-RMSE}(T) = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( \frac{\hat{q}_i^{a}(T) - \hat{q}_i(T)}{\hat{q}_i^{a}(T)} \right)^2 } \times 100$$

$$\text{NMSE}(T) = \frac{ \dfrac{1}{N} \sum_{i=1}^{N} \left[ \hat{q}_i^{a}(T) - \hat{q}_i(T) \right]^2 }{ \dfrac{1}{N - 1} \sum_{i=1}^{N} \left[ \hat{q}_i^{a}(T) - \bar{q}^{a}(T) \right]^2 }$$

$\hat{q}_i(T)$ and $\hat{q}_i^{a}(T)$ are the regional and at-site estimates of the T-year quantile of the predictand at site i.

$\bar{q}^{a}(T)$ represents the mean of $\hat{q}_i^{a}(T)$ for all the sites in the study region.


Performance comparison – New Regions and CWC sub-zones in Godavari river basin.

       Newly delineated regions      Sub-zones
T      R-bias   R-RMSE   NMSE        R-bias    R-RMSE   NMSE
5      -3.635   6.788    0.002       -5.185    8.741    0.004
10     -5.564   10.653   0.006       -9.609    16.049   0.042
25     -6.254   17.363   0.016       -12.133   26.609   0.154
50     -6.205   22.802   0.028       -12.737   34.300   0.274
100    -6.066   28.474   0.046       -12.800   41.712   0.417

Conclusions

Cluster analysis based on soft computing techniques is effective in determining homogeneous regions for flood frequency analysis.

In regionalization using the FCM algorithm, the fuzzy memberships of watersheds in clusters are useful in adjusting the regions.

It is not always possible to interpret clusters from the output of an SOFM; Fuzzy SOFM and GNG networks are feasible alternatives.

The effort needed to determine regions using conventional regression analysis is considerable, and the approach is inefficient.

Publications

Journal:

A. R. Rao and V. V. Srinivas (2006) Regionalization of Watersheds by Fuzzy Cluster Analysis. Journal of Hydrology, Elsevier, Vol.318, Issues 1-4, pp.57-79.

A. R. Rao and V. V. Srinivas (2006) Regionalization of Watersheds by Hybrid Cluster Analysis. Journal of Hydrology, Elsevier, Vol.318, Issues 1-4, pp.37-56.

Srinivas, V. V., Tripathi, S., Rao, A. R., Govindaraju, R. S., 2008. Regional flood frequency analysis by combining self-organizing feature maps and fuzzy clustering. Journal of Hydrology, 348, 148-166.

Text Book:

Rao, A. R., Srinivas V. V., 2008. Regionalization of Watersheds – An Approach Based on Cluster Analysis. Springer Publishers, Dordrecht, The Netherlands.

THANK YOU
