prof v.v.srinivas lecture-2 feb8 2012 given [compatibility mode]

123
Regional Frequency Analysis of Floods Regional Frequency Analysis of Floods Dr. V. V. Srinivas Associate Professor Associate Professor Department of Civil Engineering, IISc Bangalore 560 012, India

Upload: narmathagce3424

Post on 30-Apr-2017

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Regional Frequency Analysis of FloodsRegional Frequency Analysis of Floods

Dr. V. V. SrinivasAssociate ProfessorAssociate Professor

Department of Civil Engineering, IISc Bangalore 560 012, India

Page 2: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Overview of PresentationOverview of Presentation

• Introduction

• Definitions

• Methods for Design Flood Estimation

• Case studies

Page 3: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Introduction

• Ungauged sites

Common problems in hydrologic design

• Ungauged sites

• Inadequate at-site information at gauged sites

Page 4: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]
Page 5: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Waterlogged field

Flooded Culvert

Culvert Failure

Levee Failure

Floodwaters washed away a culvert

Forum photo by Andy Blenkush

Page 6: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Introduction

Typical Applications of Predicting Floods

Planning and design of water-resources systems

Risk assessment of water control/conveyance structures where f il d t tifailure can cause devastation

• Culverts, Bridges, Storm Sewers, Dams, Levees, Flood Walls, floodway channels etc.

Economic evaluation of Engineering projects

Land use planning and management

Insurance assessment

Page 7: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Definitions

Complete duration series:All the peak flow data available for a location

Annual maximum seriesA series prepared using the largestvalues occurring in each of the years ofthe recordthe record.

Page 8: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Definitions

Partial duration series [or peak-over-threshold (POT) series]A series prepared using data points selectedfrom the observed record so that theirmagnitude is greater than a predefinedthreshold value

POT(K) series: PDS in which the number ofdata points is about K times the number ofyears of the record

A l d i [ POTAnnual exceedence series [or POT(1) series]

Page 9: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Preparation of POT series (Ref: FEH, 1999)

(i) Two consecutive peaks must be separated by at least three times the averagetime to rise.

At least five clean (i.e., not multi-peaked) events, whose peaks exceed thethreshold must be considered.

Page 10: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Time to rise

Engineering Hydrology By Subramanya

Figure: Elements of Flood Hydrograph

Page 11: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Preparation of POT series (Ref: FEH, 1999)

(ii) The minimum flow value in the trough between two peaks must be less than two-thirdsof the flow value of the first of the two peaksof the flow value of the first of the two peaks.

Page 12: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Average time to rise = 5 hoursTwo consecutive peaks must be separated by at least 15 hoursPeak ‘E’ is the largest and is therefore independentPeak ‘D’ occurs less than 15 hours before peak ‘E’ and is defined as dependentPeak ‘F’ is defined as dependent since although it occurs more than 15 hours after the peak ‘E’, the minimum discharge in the trough between the two peaks does not fall by more than two thirds of the peak discharge for event ‘E’

(FEH, 1999)

more than two-thirds of the peak discharge for event E

Page 13: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Peak ‘C’ is larger than peaks ‘A’ and ‘B’ and is judged independent of peak ‘E’ because it occurs more than 15 hours beforehand th i i di h i th t h b t th t i l th t thi d f ththe minimum discharge in the trough between the two is less than two-thirds of the discharge for peak C.

Peak ‘B’ occurs less than 15 hours before peak ‘C’ and is therefore dependentPeak ‘A’ is below the threshold and therefore no a POT event

(FEH, 1999)

Peak A is below the threshold and therefore no a POT event

Page 14: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Definitions

Random variableA variable is said to be random if its value depends on the outcome of a chance event

Examples of Event:• Culvert capacity being exceeded• Stage of flood flow exceeding deck height of a bridge

Recurrence interval (τ)Time interval between any two occurrences of an event

Return Period (T)Return Period (T)Expected value of τ for events equaling or exceeding a specified magnitude T= E(τ)

Probability of occurrence of an eventLet p=P(X ≥ xT) denote the probability of occurrence of an event X ≥ xTLet p P(X ≥ xT) denote the probability of occurrence of an event X ≥ xT

p can be related to return period T as, p=1/T

Probability of failure (non-occurrence of the event)Probability of failure (non occurrence of the event)

( ) ( ) 11 1T TF P X x P X xT

= < = − ≥ = −

Page 15: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Probability of occurrence of an event

For each observation in peak flow time series, there are two possible outcomes• Occurrence of event if X ≥ xT• Non-occurrence (failure) if X < xT

In peak flow time series, observations must be independent of each other.

The probability of a recurrence interval of duration τ = ( ) 11p p τ −−

Return period of the event = T=

( ) ( ) ( ) ( ) ( )1 2 31 1 2 1 3 1 4 1E C p p p p p pτττ∞ − ⎡ ⎤= − = + − + − + −∑ ⎣ ⎦

The expression in parenthesis has the form of the power series expansionwith x = -(1-p) and n = -2

( ) ( ) ( ) ( ) ( )11

1 1 2 1 3 1 4 1E C p p p p p pτ

τ=

⎡ ⎤= = + + +∑ ⎣ ⎦

Power series expansion, ( ) ( ) ( )( )2 31 1 21 1

2! 3!n n n n n n

x nx x x− − −

+ = + + + +K

( ) ( ) ( ) 2 11 1 1nT E p x p p pτ − −∴ = = + = − − =⎡ ⎤⎣ ⎦( ) ( ) ( )1p T

⎣ ⎦

⇒ =

Page 16: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

(Ref: Chow et al., 1988)

Page 17: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

(Ref: Chow et al., 1988)

Page 18: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

DefinitionsRisk

A flood control structure fails if peak flow value exceeds Tx RiskT

(years)

For design life =100 yr

The hydrologic risk of failure can be computed as,

( )1 1 en

TR P X x= − − ≥⎡ ⎤⎣ ⎦

0.10 9500.15 6160.20 449( )T⎣ ⎦

en : Expected (or design) life of the structure

0.25 3480.30 2810.35 233e

( )1

1/ nT =

0.40 1960.45 1680.50 145

( )11 1 e/ nR− − 0.55 1260.60 1100 65 960.65 960.70 84

( ) ( ) 11 1T TF P X x P X xT

= < = − ≥ = −

Page 19: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Estimation of Return period (in years)Design life of structure (years)

Risk 100 50 250.10 950 475 2380 15 616 308 1540.15 616 308 1540.20 449 225 1130.25 348 174 870.30 281 141 710.35 233 117 590.40 196 98 490.45 168 84 420 50 145 73 370.50 145 73 370.55 126 63 320.60 110 55 280.65 96 48 240.70 84 42 210.75 73 37 190.80 63 32 160 85 53 27 140.85 53 27 140.90 44 22 11

Page 20: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

How much at-site data do we need for Hydrologic design?

Institute of Hydrology, UKAt least T and preferably 2T

140

Limitations100

120

140

ars)

100

120

140

s)

• Ungauged sites

• Locations having paucity (scarcity) of flood data60

80

100

engt

h (y

ea

60

80

100

engt

h (y

ears

g p y ( y)

20

40

60

Rec

ord

L

20

40

60

Reco

rd L

e0

0 100 200 300 400

G i St ti i Ohi USA

00 100 200 300

Gauging Stations in Indiana USAGauging Stations in Ohio, USA Gauging Stations in Indiana, USA

(RL<60 years: 85%)(RL<60 years: 77%)

Page 21: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

45

50

25

30

3540

Leng

th (Y

ears

)

0

5

1015

20

Rec

ord

L

00 10 20 30 40 50

Gauging stations in Krishna basin

40

50

ars)

20

30

ord

Leng

th (Y

ea

0

10

0 10 20 30 40 50 60 70

Rec

o

0 10 20 30 40 50 60 70Gauging stations in Godavari basin

(RL< 50 years: 100% of stations)

Page 22: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Methods for Design Flood Estimationg

Empirical methods

Dicken’s formula, Ryve’s formula, Inglis’ formula, Nawab Jung Bahadur formula, Creager’s formula and Rational formula

Frequency analysis methodsq y y

Rainfall-runoff methods

Unit hydrograph methodsy g p

Page 23: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Empirical Methods for Design Flood Estimationp g

(a) Peak discharge formulae involving drainage area only

Dicken’s formula

3 4/pQ CA=

Qp – Peak discharge in m3/s

A – Catchment area in km2

C=11.5 (North India); 14-19.5 (Central India); 22 to 26 (Western Ghats of India)

C - Constant

C depends on Rainfall and Altitude of catchment

Ryve’s formulaRyve’s formula

C = 6 8 for catchment areas within 80 km from the coast

2 3/pQ CA=

C = 6.8 for catchment areas within 80 km from the coast

= 8.8 for catchment areas within 80-2400 km from the coast

Page 24: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Empirical Methods for Design Flood Estimationp g

(a) Peak discharge formulae involving drainage area only

Inglis’ formula

(f f h d t h t i ld B b t t )

12310 4pAQ

A .=

+

Qp – Peak discharge in m3/s

A – Catchment area in km2

(for fan shaped catchments in old Bombay state)

N b J B h d f l

A' – Catchment area in mi2

C - ConstantNawab Jung Bahadur formula

(f H d b d D C t h t ) C 1 77 177

( )0 93 1 14. / log ApQ CA ′−⎡ ⎤⎣ ⎦′=

(for Hyderabad Deccan Catchments); C=1.77 - 177

Oth f l C ’ f l J i f l M difi d M ’ f lOther formulae: Creager’s formula; Jarvis formula; Modified Myer’s formula

Page 25: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Empirical Methods for Design Flood Estimationp g

(b) Peak discharge formulae involving drainage area and its shape

Dredge or Burge formula (for Indian catchments)Dredge or Burge formula (for Indian catchments)

( )2 3 2 319 6 19 6p / /

BLAQ . .L L

= =Qp – Peak discharge in m3/s

A – Catchment area in km2A Catchment area in km

L – Average length of catchment area in km

B – Average width of the catchment in km

Pettis formula P – Probable 100 year maximum 1 dayrainfall in cm

( )5 4/Q C P B= ⋅

C = 1.5 (humid areas); 0.2 (desert areas)

C - Constant( )pQ C P B

( ) ( )

(for northern USA, Ohio to Cennecticut)

Page 26: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Empirical Methods for Design Flood Estimationp g

(c) Peak discharge formulae involving rainfall intensity and drainage area

Rational formula

pQ CiA=Qp – Peak discharge in cfs

A – Catchment area in acres

C – Runoff coefficient (0 ≤ C ≤1)

i – Intensity of rainfall in inches/hr

The peak discharge corresponds to time of concentration

Used widely in storm sewer designs, because of its simplicity

1 cfs = 1 .008 acre.in/hr(the conversion is considered to be included

in the value of C)

The assumption of fixed C (i.e., ratio of peak discharge to rainfall rate) for a drainage basin is invalid.

Page 27: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

In rational formula, the value of runoff coefficient ‘C’ depends on integrated effectsof several factors.

Character and condition of soil

Rainfall intensity

Proximity of the water table

Degree of soil compaction

Porosity of subsoilPorosity of subsoil

Vegetation

Ground slopeGround slope

Ponding character of surface of catchment - Depression storage

R i f ll i t it i i d t i d b d i t it d ti f (IDF)Rainfall intensity i is determined based on intensity-duration-frequency (IDF) relationships.

Design duration D is equal to the time of concentration for the catchment.

The frequency F (or corresponding return period, T ) is chosen by hydrologist.

Page 28: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

2.5

2525

Fig. Intensity-duration-frequency curves of maximum rainfall in Chicago, USA (Source: Chow et al., 1988)

Page 29: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

(Ref: Chow et al., 1988)

Page 30: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

(Ref: Chow et al., 1988)

Page 31: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Issues with use of Empirical Methods

None of the empirical formulae will give precise results for all locations

Magnitude of flood in a catchment depends on a large set of predictor variables, and none of the formulae consider adequate number of variables.and none of the formulae consider adequate number of variables.

Predictor variables - Examples1) Precipitation

2) Physiographic catchment characteristicsDrainage Area; Length of longest channel

3) Location attributes3) Location attributesLatitude; Longitude; Altitude

4) Landuse/landcoverB il A A i l l A F A W b di W l dBuilt-up Area; Agricultural Area; Forest Area; Water bodies; Waste lands

5) Catchment shapeCompactness coefficient; Circularity ratio; Form factor; Elongation ratio

6) Drainage CharacteristicsDrainage density; Slope; Soils

Page 32: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Form factor: Ratio of area of the basin to square of its axial length (Large)

Compactness coefficient: Ratio of the perimeter of the basin to theCompactness coefficient: Ratio of the perimeter of the basin to thecircumference of a circle of area equal to the basin area. (Small)

Elongation ratio: Ratio of the diameter of a circle having area the same as theb i t th i l th f th b i (L )basin area, to the maximum length of the basin. (Large)

Circularity ratio: Ratio of the basin area to the area of a circle of sameperimeter as that of the basin. (Small)p ( )

Page 33: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Frequency Analysis Methods for Design Flood Estimation q y y g

Information at target site is augmented with that from gauged sites depicting similar hydrological behavior

Data or model parametersData or model parameters

Functional relationships between model parameters and watershed attributes

Region: A set of gauged sites depicting similar hydrological behavior

Regionalization: Process of identifying regions

R i l F A l i F l i b d R i l i f tiRegional Frequency Analysis: Frequency analysis based on Regional information

Page 34: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Approaches to Regionalization

Regression Approach- Thomas and Benson [1970], Wandle [1977], Glatfelter [1984]Choquette [1988]

ROI approach- Acreman and Wiltshire [1987] Burn [1990 a b] Zrinji and Burn [1994]- Acreman and Wiltshire [1987], Burn [1990 a,b], Zrinji and Burn [1994]

Hierarchical approach & Extension to ROI- Gabriele and Arnell [1991], Zrinji and Burn [1996]

Classification and Regression Tree (CART) based Approach- Breiman et al. [1984], Laaha and Bloschl [2006]

Canonical correlation analysis- Cavadias [1989, 1990, 1995], Cavadias et al. [2001]

Page 35: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Approaches to Regionalization

Nearest Neighbor approach- Vogel [2005]

Ensemble Prediction Approach- McIntyre et al. [2005]

Cluster analysis- Tasker [1982], Burn [1989], Burn et al. [1997], Burn and Goel [2000],

Hall and Minns [1999], Hall et al. [2002], Jingyi and Hall [2004]Rao and Srinivas [2006a b]Rao and Srinivas [2006a,b]

Page 36: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Frequency Analysis Methods for Design Flood Estimation

When T ≤ 27 years (FEH, IH Wallingford, 1999; Vol.3, P.46)

q y y g

Length of record (years)

Site Analysis Pooled Analysis

Recommendedmethod

y ( g )

(y ) y method

<T/2 No Yes Pooled analysis

T/2 to T For confirmation Yes Pooled analysis prevails

T to 2T Yes Yes Joint (site and Pooled) Analysis

> 2T Yes For confirmation Site analysis> 2T Yes For confirmation Site analysis prevails

Page 37: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Frequency Analysis Methods for Design Flood Estimation

When T > 27 years (FEH, IH Wallingford, 1999; Vol.3, P.46)

q y y g

Length of record (years)

Site Analysis Pooled Analysis

Recommendedth d

y ( g )

(years) Analysis method

<14 No Yes Pooled analysis

14 to T For confirmation Yes Pooled analysis prevails

T to 2T Yes Yes Joint (site and Pooled) Analysis

> 2T Yes For confirmation Site analysis> 2T Yes For confirmation Site analysis prevails

Page 38: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Assumptions in flood frequency analysis (FFA)p q y y ( )

At-site FFA: The flood (predictand) data at the target site are• serially independent• identically distributed in time

Regional FFA : The flood data at any site in a region are• serially independent• identically distributed in timey

The flood data at different sites in the region• are independent in space• have identical frequency distributions apart from a scale factor• have identical frequency distributions, apart from a scale factor

The causal precipitation (predictor) system is stochastic space independent andThe causal precipitation (predictor) system is stochastic, space-independent and time-independent (Chow et al., 1988).

Page 39: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Rainfall-runoff method for Flood Estimation

Involves estimation of flood frequency from rainfall frequency using ahydrological model which links rainfall to resultant runoff.y g

The method is appealing as it yields a hydrograph of river flow rather thanjust an estimate of peak flow.

Page 40: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Unit hydrograph method for Design Flood Estimation

Involves developing regression relationships between the physicalcharacteristics of catchments and parameters of their unit hydrographs, toarrive at a synthetic unit hydrograph for estimation of design floodarrive at a synthetic unit hydrograph for estimation of design flood.

Page 41: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Factors influencing the choice of Flood Estimation Approach

1. Type of Problem

h i f ll ff h h ld ll b d f i i fThe rainfall-runoff approach should generally be used for estimation of• Reservoir flood

• Probable maximum flood

P bl i l i i• Problems involving storage routing

2. Type of catchment

• Statistical approach should generally be used for large catchments (>1000 km2)

• Rainfall-runoff approach is favored when sub-catchments are disparate/dissimilar

3. Type of data

• Statistical approach is favoured when the gauged record is longer than 2 or 3 years

R i f ll ff h i f d h th i ti fl d b t• Rainfall-runoff approach is favored when there is no continuous flow record, butrainfall and flow data are available for 5 or more flood events

(FEH, 1999)

Page 42: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Points to be noted with reference to case studies in India

Several of the sub-zones delineated by CWC (1983) are heterogeneous

Need to delineate alternative homogeneous regions

Need to consider Partial duration Series

Regions need not be strictly contiguous

Regions need not be hard and can be fuzzyRegions need not be hard, and can be fuzzy

Page 43: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Selection ofattributes

Preparing feature vectors Steps in RegionalFrequency Analysis of watersheds

Forming clusters

Selecting optimum number of regionsSelecting optimum number of regions

Testing the regions for homogeneity

Are the regionsNo

Adjustment of Homogeneous?

Yes

heterogeneous regions

Determining values of predictands

Page 44: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Selection ofattributes

Preparing feature vectors Steps in RegionalFrequency Analysis of watersheds

Forming clusters

Selecting optimum number of regionsSelecting optimum number of regions

Testing the regions for homogeneity

Are the regionsNo

Adjustment of Homogeneous?

Yes

heterogeneous regions

Determining values of predictands

Page 45: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

B i Ch t i ti F R i l A l i

Physiographic characteristics

Basin Characteristics For Regional Analysis

y g pDrainage area, Length and slope of main stream

Soil characteristicsInfiltration potential; Runoff coefficient Permeability Bulk densityInfiltration potential; Runoff coefficient, Permeability, Bulk density

Land use characteristicsForests, Agricultural, Suburban or urban land

Drainage characteristicsDrainage density

Meteorological characteristicsStorm direction, Mean annual precipitation

Directional statisticsDirectional statisticsSeasonality of flood events

Page 46: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Basin Characteristics For Regional Analysis (Contd )

Geographical location attributesLatitude longitude and Altitude

Basin Characteristics For Regional Analysis (Contd…)

Latitude, longitude, and Altitude

Geologic features of the basinFraction of the basin underlain by rock formationsFraction of the basin underlain by rock formations

Shape indicators of catchmentsForm factor, Compactness coefficient, Elongation ratio, Circularity ratio

Feature vector : vector containing basin characteristics

Page 47: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Selection ofattributes

Preparing feature vectors Steps in RegionalFrequency Analysis of watersheds

Forming clusters

Selecting optimum number of regionsSelecting optimum number of regions

Testing the regions for homogeneity

Are the regionsNo

Adjustment of Homogeneous?

Yes

heterogeneous regions

Determining values of predictands

Page 48: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Clustering Clustering Algorithms available in literatureAlgorithms available in literatureClustering Clustering Algorithms available in literatureAlgorithms available in literature

The clustering algorithms available in literature can be broadly classified as

Partitional [e.g., K-means and K-medoids algorithms, fuzzy c-means clustering]

Hierarchical [e.g., single-linkage, complete-linkage and Ward’s algorithms.]

Density-based [e.g., DBSCAN (Density-Based Spatial Clustering of

Applications with Noise) and OPTICS (Ordering Points To Identify the Clustering

Structure]

Grid-based [e.g., STING (a STatistical INformation Grid approach) and CLIQUE

(Clustering In QUEst).].

Model-based and [e.g., COBWEB ]

Boundary-detecting. [e.g., SVC (Support Vector Clustering) ]

Page 49: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Regionalization using Soft Clustering Techniques

Fuzzy ClusteringBargaoui et al [1998] Hall and Minns [1999] Rao and Srinivas[2006]Bargaoui et al., [1998], Hall and Minns [1999], Rao and Srinivas[2006]

Artificial Neural Network Clustering

SOFMsHall and Minns [1999], Hall et al. [2002], Jingyi and Hall [2004]

G i N l G N t kGrowing Neural Gas NetworksSrinivas and Tripathi [2006]

Fuzzy SOFMSrinivas et al. [2008]

Page 50: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Fuzzy C-Means Algorithm

),()() :,( 2

1 1ik

c

i

N

kik du VXXVUJ ∑ ∑

= =

μ=Minimize

subject to the following constraints:

{ }N,,kuc

iik K1 1

1∈∀=∑

=

subject to the following constraints:

{ }c,,iNuN

kik K1 0

1∈∀<< ∑

=

initinit

∈[1, ] refers to weight exponent or fuzzifierμ ∞

(1) Initialise fuzzy cluster centroids: initc

init VV ,,1 K

(2) Compute fuzzy memberships:

1 ( 1)

2

1 ( 1)

1( )

for 1 11

/

initk i

ik /c

d ,u i c, k N

μ

μ

⎛ ⎞⎜ ⎟⎝ ⎠= ≤ ≤ ≤ ≤⎛ ⎞

X V

21

1( )

c

initi k id ,=

⎛ ⎞⎜ ⎟⎝ ⎠

∑ X V

Page 51: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

( )N

ik ku μ∑ X(3) Update fuzzy cluster centroids for i =1,2,…,c

( )1

1

ki N

ikk

u μ

=

=

=

∑V

(4) Update the fuzzy membership as :iju

1 ( 1)

21

( )f 1 1

/

k id ,k N

μ−⎛ ⎞⎜ ⎟⎝ ⎠X V

1 ( 1)

21

( ) for 1 1

1( )

k iik /c

i k i

,u i c, k N

d ,

μ−

=

⎝ ⎠= ≤ ≤ ≤ ≤⎛ ⎞⎜ ⎟⎝ ⎠

∑ X V

Repeat steps (3) and (4) until the change in the value of the memberships between two successive iterations becomes sufficiently small

⎝ ⎠

successive iterations becomes sufficiently small.

Page 52: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Conventional methods suggest hardening fuzzy partition matrix

Nk uuu 1111 ⎤⎡ LL

iNiki

Nk

uuu

uuu

1

1111

⎥⎥⎥⎥⎥⎤

⎢⎢⎢⎢⎢⎡

=MMM

LL

MMM

U

NccNckc uuu 1 ×⎥⎥⎦⎢

⎢⎣ LL

MMM

1 }{ max1

==≤≤

ikci

jk uu jiuik ≠∀= 0

dd xv −== min}{min 1=jku jiu ≠∀= 0thenkici

ikci

jk dd xv −==≤≤≤≤ 11

min}{min 1=jku jiuik ≠∀= 0then

Instead, Rao and Srinivas (2006a) suggested using a threshold membership of 1/c to form fuzzy clusters

Srinivas et al. (2008) defined threshold for each feature vector

⎭⎬⎫

⎩⎨⎧

⎥⎦⎤

⎢⎣⎡=

≤≤)u(,

cmaxT ik

cik max

1211

Page 53: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Self-Organizing Feature Map (SOFM) Clustering

(1) Initialize weights {wij , i =1,…n, j =1,…m} of the connections from the n input nodes to the m output nodes, randomly.

kx1 m output nodes

nkx

p

Page 54: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

(2) Draw an input vector Xk (randomly without replacement) and compute its distance from j-th output node Wj . Find the winning output node as:

mjttarg jkj

,1,2, )()(min ω K=−= WX

kx1 m output nodes

nkx

p

Page 55: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

(3) Update the weight vectors of winner and its neighboring nodes:

)]()()[()()()1( ω ttthttt jkjjj WXWW −+=+ η )]()()[()()()( ω jk,jjj η

Repeat steps (2) and (3) over several iterations, until no changesin the Kohonen layer mapping are observed.

kx1 m output nodes

nkx

p

Page 56: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Decrease in topological neighborhood with increase in iterations

kx1 m output nodes

nkx

p

Page 57: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

)(ω, th jNeighborhood function: Learning rate parameter :

)(tη

⎞⎛)0(η has value close to 0.1⎟⎟

⎞⎜⎜⎝

⎛−=

1exp)0()(

τηη tt

, a constant, is set equal to the maximum number of iterations (1000 in this study).

1τ maxt

⎟⎟⎞

⎜⎜⎛−=)( 2

2ω,

ωd

expth jj ⎟

⎠⎜⎝ )(2

)( 2ω,t

pjσ

is the topological distance between the winning node and its neighboring node jjd ,ω ω

jjd rr −= ωω, ⎟⎟⎠

⎞⎜⎜⎝

⎛−=

2)0()(

τσσ texpt

Effective width of the topological neighborhood at time t :

(0)ln max

2 στ

t=

)(tσ

ωr

Effective width of the topological neighborhood at time t :

Discrete vector defining the position of the winning node :

Discrete vector defining the position of the neighboring node j : r

)(tσ

Discrete vector defining the position of the neighboring node j :

Radius of lattice in the output layer of SOFM :

jr)0(σ

Page 58: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Existing thumb rules to decide number of nodes in output layer of SOFM

chose nodes to be at least two times CExp (Hall and Minns,1999)

chose nodes to be at least three times CExp (Hall et al., 2002)

CExp : Expected number of clusters

N t C i t d i i t k i i

Proposed method to design output layer of SOFM

Note: CExp in a study region is not known a priori.

Proposed method to design output layer of SOFM

For accurate estimation of flood quantiles with non-exceedance probability upto 0.99,regions of size 20 sites are reasonable (p.59, pp.119-123, Hosking and Wallis, 1997).g (p , pp , g , )

CExp = Number of sites in study region/20 = 245/20 ≈ 12

Page 59: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

15

20

25

hits

5

10

15

Num

ber o

f

Case study:Indiana

01 3 5 7 9 11 13 15 17 19 21 23

Output neuron

65

10

15

Num

ber of Hi 7

80

5

10

15

Num

ber of Hits

2

3

4

5

2

34

560

its

23

45

67

34

56

780

s

11 12

12

Count maps for Kohonen layers obtained with classical SOFM

Page 60: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Case study: Ohio

30

40

H

20

40Num

be

3

4

3

40

10

20

Hits

3

4

5

3

4

50

20

er of Hits

1

2

1

2

1

2

1

2

3

The feature maps generated did not reveal any information that would allow selectionThe feature maps generated did not reveal any information that would allow selection of an appropriate number of clusters

3 options considered:

1-D SOFM with number of nodes in Kohonen layer equal to the expected number of regions Srinivas et al. [2003]

3 options considered:

Growing Neural Gas Networks Srinivas and Tripathi [2006]

Fuzzy SOFM Srinivas et al. [2008]

Page 61: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Fuzzy SOFM

c clusters formedat level-2

kx1 m´ prototypes formed at level-1

nkx

Srinivas et al. (2008)

Page 62: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Selection ofattributes

Preparing feature vectors Steps in RegionalFrequency Analysis of watersheds

Forming clusters

Selecting optimum number of regionsSelecting optimum number of regions

Testing the regions for homogeneity

Are the regionsNo

Adjustment of Homogeneous?

Yes

heterogeneous regions

Determining values of predictands

Page 63: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Selecting optimum number of regions using cluster validity measures

Partition Coefficient

Selecting optimum number of regions using cluster validity measures

(Bezdek, 1974)

In Hydrology:Bargaoui et al., 1998; Hall and Minns 1999

2

1 1

1 c N

PC iki k

V ( ) ( u )N = =

= ∑∑UMax Range: [1/c,1]

Partition Coefficient ( e de , 9 )

Hall and Minns, 1999

N⎡ ⎤

Partition Entropy (Bezdek, 1974)

1 1

1 c N

PE ik iki k

V ( ) u log( u )N = =

⎡ ⎤= − ⎢ ⎥

⎣ ⎦∑∑UMin Range: [0, log c]

11)(

1)(−×

−=c

VcFPI PC U

U Range: [0, 1]Min

Fuzziness Performance Index In Hydrology:Guler and Thine, 2004

(Roubens, 1982)

1−c

Page 64: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

(Roubens, 1982)Normalized Classification Entropy

)log()()(

cVNCE PE UU = Range: [0, 1]Min In Hydrology:

Guler and Thine, 2004

Extended Xie-Beni Index (Xie and Beni, 1991)

2

1 12( : )

min

c N

ij i ki k

XB,mi k

( u )V ,

N

μ

= =−

=−

∑∑ V XU V X

V V In Hydrology:i ki ,i k≠

Kwon’s Index (Kwon, 1998)

In Hydrology:Rao and Srinivas, 2006

22

1 1 12

1

( : )min

c N c

ij i k ii k i

XB,m

( u )cV ,

N

μ

= = =− + −

=∑∑ ∑V X V V

U V XV V

( )

min i ki ,i kN

≠−V V

Page 65: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Selection ofattributes

Preparing feature vectors Steps in RegionalFrequency Analysis of watersheds

Forming clusters

Selecting optimum number of regionsSelecting optimum number of regions

Testing the regions for homogeneity

Are the regionsNo

Adjustment of Homogeneous?

Yes

heterogeneous regions

Determining values of predictands

Page 66: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Definition of Regional HomogeneityDefinition of Regional Homogeneity

A region is said to be homogeneous if data at all sites in that region followA region is said to be homogeneous if data at all sites in that region followsame distribution.

Degree of homogeneity is decide by comparing the between-site dispersionof the sample L-moment ratios for the sites in a region, with dispersion thatwould be expected for a homogeneous region.

66

Page 67: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Testing Regions for HomogeneityTesting Regions for Homogeneity using Heterogeneity Measures of (Hosking and Wallis, 1997)

Heterogeneity measure (HM) based on L-coefficient of variation (L-CV):

V

V1 σ

)μ(VH

−=

)μ(V

2

2

V

V22 σ

)μ(VH

−=HM based on L-CV and L-Skewness:

HM based on L-Skewness and L-Kurtosis: 3

V

V33 σ

)μ(VH

−=

3Vσ

Page 68: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

)(itL-CV at site i :

)(3it

)(i

L-skewness at site i :

k i i

NN iR )(

)(4itL-kurtosis at site i :

∑∑==

=i

ii

ii

R ntnt11

)(Regional average L-CV:

∑∑==

=N

ii

N

i

ii

R ntnt11

)(33

Regional average L-skewness:

Regional average L-kurtosis: ∑∑==

=N

ii

N

i

ii

R ntnt11

)(44

Page 69: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Weighted standard deviation of at-site sample L-CVs:

{ } 2/1

11

2)( )(V ∑∑ −===

N

ii

N

i

Ri

i nttn

Weighted average distance from the site i to the group weighted mean in the 2-dimensional space of L-CV and L-skewness:

{ } ∑∑ −+−=N

ii

N

i

RiRii nttttn

11

2/123

)(3

2)(2 )()(V

== ii 11

Weighted average distance from the site i to the group weighted mean in the 2-dimensional space of L-skewness and L-Kurtosis:

{ } ∑∑ −+−=N

ii

N

i

RiRii nttttn

11

2/124

)(4

23

)(33 )()(V

in the 2 dimensional space of L skewness and L Kurtosis:

== ii 11

Page 70: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Vμ : Mean of V values from Monte Carlo (MC) simulationsVμ

2Vμ

: Mean of V values from Monte-Carlo (MC) simulations

: Mean of V2 values from MC simulations

3Vμ : Mean of V3 values from MC simulations

2Vσ

: Standard deviation (SD) of V values from MC simulations

: SD of V2 values from MC simulations2

3Vσ : SD of V3 values from MC simulations

MC simulations 500

Page 71: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

HETEROGENEITY MEASURES (Hosking and Wallis, 1997)

VALIDATION

Heterogeneity measure (HM) based L-CV: V

V1 σ

)μ(VH

−=

2V22

)μ(VH

−=HM based on L-CV and L-Skewness:

2V2 σ

H

3V3 )μ(V −HM based on L-Skewness and L-Kurtosis:

3

3

V

V33 σ

)μ(H =

Acceptably homogeneous : H < 1

Possibly heterogeneous : 1≤ H < 2

Definitely heterogeneous : H ≥ 2

Page 72: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Selection ofattributes

Preparing feature vectors Steps in RegionalFrequency Analysis of watersheds

Forming clusters

Selecting optimum number of regionsSelecting optimum number of regions

Testing the regions for homogeneity

Are the regionsNo

Adjustment of Homogeneous?

Yes

heterogeneous regions

Determining values of predictands

Page 73: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Adjustment of Heterogeneous ClustersHosking and Wallis (1997):

eliminating one or more sites from the data set

dividing a region to form two or more new regions

allowing a site to be shared by two or more regions

i it f i t th imoving one or more sites from a region to other regions

dissolving regions by transferring their sites to other regions

merging a region with one or more regionsg g g g

merging two or more regions and redefining groups

obtaining more data and redefining regions

Page 74: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Adjustment of Heterogeneous ClustersAdjustment of Heterogeneous Clusters

The primary option that is considered for adjusting a heterogeneous cluster is elimination of one or more sites that are grossly discordant

ith t t th it ithi th l t Th l di d twith respect to other sites within the cluster. The grossly discordant sites are identified using the discordancy measure given by Equation

11 T 11 ( ) ( )3i k i iD N −= − −u u S u u

where is discordancy measure for ith site in a cluster k having sites.iD kNwhere is discordancy measure for ith site in a cluster k having sites.

is a vector containing L- moment ratios (LMR) of predictand at site i

i th i ht d f th LMR d S i i t i

i k

T( ) ( )( )

i 3 4 i iit t t⎡ ⎤⎢ ⎥⎣ ⎦

=u

u is the unweighted group average of the LMR and S is a covariance matrix

kNi∑ u T( )( )

kN∑S

u

1i

i

kN=∑

=u

uT

1( )( )i i

i== − −∑S u u u u

74

Page 75: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]
Page 76: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Selection ofattributes

Preparing feature vectors Steps in RegionalFrequency Analysis of watersheds

Forming clusters

Selecting optimum number of regionsSelecting optimum number of regions

Testing the regions for homogeneity

Are the regionsNo

Adjustment of Homogeneous?

Yes

heterogeneous regions

Determining values of predictands

Page 77: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Growth curve estimation using index flood method

The flood quantile estimate at site i for T-year return period is determined using index flood method (Dalrymple, 1960) by constructing a growth curve as,

ˆ ˆ ˆ( ) ( )i i kQ T q Tμ=

ˆiμ : Index flood value for site i (e.g., mean, or median of flood data)

Page 78: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Probability Distributions to fit peak flowsProbability Distributions to fit peak flows

Two parameter distributionsGumbel distribution (Extreme value type-1)Log Normal 2 (LN2) parameter distributionLog-Normal 2 (LN2) parameter distributionTwo parameter Gamma distribution

Three parameter distributionsGeneralized Pareto (GPA) distributionGeneralized Extreme value (GEV) distributionGeneralized logistic (GLO) distributionLog-Normal 3-parameter (LN3) distribution; Generalized Normal (GNO) distributionLog Normal 3 parameter (LN3) distribution; Generalized Normal (GNO) distributionThree parameter Gamma distribution; Pearson type-3 (P3) distributionLog- Pearson type-3 (LP3) distribution

Four parameter distributionFour parameter distributionKappa distributionFour-parameter Wakeby distribution

Five parameter distributionWakeby distribution

Page 79: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Which probability distribution fits the flood data?Which probability distribution fits the flood data?

Procedures for Identification

Probability papers

Goodness-of-fit testsGoodness of fit tests

Chi-Square test

Kolmogorov-Smirnov test

L-moment ratio diagram

Page 80: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Probability Papery pFlood data are plotted on probability paper to check if a frequency distribution fits the data

Probability papers are specifically designed for the frequency distributions

One of the axes represents the values of peak flows other axis represents exceedence or non-exceedence probability associated with peak flows, or return period

The plotted data appears close to a straight line if the frequency distribution fits the data

Flood quantile is estimated graphically by interpolation or extrapolation of the determined linear relationship between abscissa and ordinatelinear relationship between abscissa and ordinate

Page 81: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]
Page 82: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]
Page 83: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

S No Formula b Equation

Plotting Position Formulae

S.No. Formula b Equation

1 California -

2 M difi d C lif i

( )m

mP X xn

≥ =

1m2 Modified California -

3 Hazen 0.5

( ) 1m

mP X xn−

≥ =

( ) 0 5m

m .P X x −≥ = ( )

1 2m

m bP X xb

−≥ =

+4 Gringorten 0.4

4

( )m n

( ) 0 440 12m

m .P X xn .−

≥ =+

( )1 2n b+ −

5 Blom 3/8

6 Tukey 1/3

( ) 3 81 4m

m /P X xn /−

≥ =+

( ) 1 3m /P X x −≥ =

7 Chegodayev 0.3

( )1 3mP X x

n /≥

+

( ) 0 30 4m

m .P X xn .−

≥ =+

8 Weibull 0 ( )1m

mP X xn

≥ =+

Page 84: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Results from study of Cunnane (1978)Results from study of Cunnane (1978)

Probability distribution Best plotting position formula

Log-normal / Normal Blom

EV-1 Gringorten

( ) 3 81 4m

m /P X xn /−

≥ =+

0 44m −EV-1(Extreme value type-1)

Gringorten

Exponential Gringorten

( ) 0 440 12m

m .P X xn .−

≥ =+

( ) 0 440 12m

m .P X xn−

≥ =+

Uniform Weibull

LP 3

0 12n .+

( )1m

mP X xn

≥ =+

LP-3(Log Pearson type 3) 3 8 for 0

3 8 for 0s

s

b / Cb / C> >< <

( )1 2m

m bP X xn b

−≥ =

+ −

( ) ( )1 mF x P X x= − ≥

Page 85: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Goodness-of-fit tests

Chi-square testThe flood data at target site are divided into m class intervals.

( )2m O E

Oi: The observed number of occurrences in class interval iE Th t d b f i l i t l i

( )22

1

mi i

ii

O EE

χ=

−=∑

Ei: The expected number of occurrences in class interval i

If the class intervals are chosen such that each class interval corresponds to an

E /equal probability, then , where n is the sample size of data.iE n / m=

2 2m

im O nχ = −∑

1i

iO n

== ∑

Page 86: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

The distribution of is chi-square with degrees of freedom 2χ 1m pυ = − −

p : The number of parameters of distribution that is being tested to fit the data.

2 2zυ

χ ∑

: random variable among independent standard normal random variables

1i

izυχ

==∑

iz

The hypothesis that the data are from a specified distribution is rejected at significance level if following Equation is satisfied.

α

2 21,υ αχ χ −>

Page 87: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Kolmogorov-Smirnov Goodness-of-fit testKolmogorov Smirnov Goodness of fit test

The Kolmogorov-Smirnov test compares the sample cumulative frequency function and e o ogo o S o test co pa es t e sa p e cu u at e eque cy u ct o a dtheoretical cumulative distribution function (determined from the theoretical frequency distribution under test) to judge whether a chosen distribution fits the data acceptably closely.

The observed flood data at target site are divided into m class intervals, and the maximum deviation over all class intervals is determined as

( ) ( )i s iD max F x F x= −

: The upper limit of ith class interval

( ) ( )1

i s ii m

D max F x F x≤ ≤

ix

The hypothesis that the data are from a specified distribution is rejected at significance level if following Equation is satisfied.

α

n,D D α≥

Page 88: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

L-Moment Ratio Diagram

1.0

0.8

0.9

1.0

0.6

0.7

GLO

urt

osis

0.4

0.5GPAGEVLN3PE3

L-K

u

0.2

0.3GE

0.0

0.1

100 080 060 040 020 000 020 040 060 080 100-0.1

-1.00 -0.80 -0.60 -0.40 -0.20 0.00 0.20 0.40 0.60 0.80 1.00L-skewness

Page 89: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

L-Moment Ratio Diagram - Region 4L-Moment Ratio Diagram - Region 4

0.65

0.45

0.55

GLOGPA

0.25

0.35

L-ku

rtos

is

GEVLN3PE3Gumbel

0.05

0.15 ExponentialSites of Region-4Group Mean

-0.15

-0.05

-0.15 -0.05 0.05 0.15 0.25 0.35 0.45 0.55 0.65L-skewness

Page 90: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

1 n

Wang (1996) derived the following direct sample estimators of the L-moments

111

1 n

j:nnj

ˆ xC

λ=

= ∑

( )12 1 1

12

1 12

nj n j

j:nnj

ˆ C C xC

λ − −= −∑122 jC =

( )1 13 2 1 1 2

1 1 23

nj j n j n j

j:nˆ C C C C xλ − − − −= − +∑( )3 2 1 1 2

133 j:nnjC =∑

( )1 1 11 1 nj j j j j jˆ ∑( )1 1 1

4 3 2 1 1 2 314

1 1 3 34

j j n j j n j n jj:nn

j

ˆ C C C C C C xC

λ − − − − − −

== − + −∑

Page 91: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

2 1Coefficient of L variation ( CV), L t l l− − =

3 3 2L skewness, = t l l−

4 4 2L kurtosis, t l l− =

2In general, = r rt l l

Page 92: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Regional Goodness-of-fit Test

: Number of sites in region kkN

in : Number of peak flow values at site i

( )it ( )3it ( )

4it : L-moment ratios computed using peak flow values at site i4 p g p

Compute the regional average L-moment ratios

( )

1 1

k kN Nr ii i

i i

N N

t n t n= =

⎫= ∑ ∑ ⎪

⎪⎪⎪⎪( )

3 31 1

k k

k k

N Niri i

i i

N N

t n t n= =

⎪⎪

= ∑ ∑ ⎬⎪⎪⎪⎪( )

4 41 1

k kN Niri i

i it n t n

= =

⎪= ∑ ∑ ⎪⎭

Page 93: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

1set 1rl = by dividing flood values at each site by their mean

Fit the distribution being tested using the regional average L-moments

DIST4DISTτ L-kurtosis of the fitted distribution

4rt L-kurtosis of the observed data

The goodness of fit is judged by the difference between and 4DISTτ 4

rt

Compare the difference with standard deviation (i.e., sampling variability) of

obtained by repeated simulation of the homogeneous region using kappa distribution

4rt

Regional goodness-of-fit measure ( )4 4 4

DIST rDIST

t BZ

τ − +=g g

Page 94: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Regional goodness-of-fit measure ( )4 4 4

DIST rDIST

t BZ

τ

σ

− +=

[ ]( )44

simNm rt t−∑4B 4

rt: Bias of ( )1

4m

simB

Nθ==

4 4t: Bias of

[ ]( )( )

22

4 441

4 1

simNm r

simm

t t N B

Nσ =

− −=

∑4rt: Standard deviation of 4σ

( )1simN −

[ ] : Regional average L-kurtosis computed using the synthetic data simulated for mth realization [ ]4mt

Simulate a large number of realizations (Nsim = 500) of the region using kappa distribution. Each realization constitutes a homogeneous region, with synthetic data of flood simulated at each of the sites equal to its real-world counterpart. Further, in each realization, the data simulated at any site in the region is serially independent and the data simulated at different sites in the region are not cross-correlated

Page 95: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Hosking & Wallis (1997)

Page 96: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

1.0

0.8

0.9

0.6

0.7

GLOGPAu

rtos

is

0.4

0.5GPAGEVLN3PE3G

L-K

u

0.2

0.3GE

Fittedobserved

0.0

0.1

-1.00 -0.80 -0.60 -0.40 -0.20 0.00 0.20 0.40 0.60 0.80 1.00

observed

-0.11.00 0.80 0.60 0.40 0.20 0.00 0.20 0.40 0.60 0.80 1.00

L-skewness

Page 97: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Example: Generalized extreme value distribution

( ) ( )11, if 0

yk y ex k kξ α

−−∞ < ≤ + >⎧

⎪⎨

Probability density function

( ) ( )11 , , if 0, if 0

k y ef x e x kk x k

αξ α

− − −− ⎪= −∞ < < ∞ =⎨⎪ + ≤ < ∞ <⎩

( )1 1 , if 0k x

ln kk

y

ξα

⎧ ⎡ ⎤−− − ≠⎪ ⎢ ⎥⎪ ⎣ ⎦= ⎨( )

, if 0

yx

α

⎨−⎪

=⎪⎩

( )yeF x e

−−=Cumulative distribution function

Page 98: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

2

3

2 27 8590 2 9554 , 33 r

lnk . c . c clnt

= + = −+

Shape parameter

( ) ( )2

1 2 1

r

kl k

⋅=

− Γ +Scale parameter

( )1 1 1rl kkαξ ⎡ ⎤= + Γ + −⎣ ⎦Location parameter

How to compute ? ( )1 kΓ +

( )1 1 0y ty t e dt , y∞

−Γ + = + >∫

( )Γ : Gamma function

( ) ( ) ( )3 15 1 2 15 2 15 2 15. . . .Γ = Γ + = Γ0∫

Properties of Gamma function

( ) ( ) ( )( ) ( )

( )2 15 1 1 15 2 15 1 15 1 15

2 15 1 15 1 0 15

. . . . .

. . .

= Γ + = × Γ

= × Γ +

( ) ( )( ) ( )( ) ( )

1 if 0

1 if 1

1 ! if is a positive integer

y y y , y

y y / y, y

n n n

Γ + = Γ >

Γ = Γ + <

Γ

( ) ( ) ( )

( )

0 75 1 0 250 75

0 75 0 75. .

.. .

Γ − + ΓΓ − = =

− −( ) ( )1 ! if is a positive integern n nΓ = − ( )1 0 25

0 75 0 25.

. .Γ +

=− ×

Page 99: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

( ) ( )2 81 2 81 1 if 0 y 1y b y b y b y yεΓ + = + + + + + ≤ ≤K

1 5

2 6

3 7

0.577191652 0.7567040780.988205891 0.4821993940.897056937 0.193527818

b bb b

b b

= − = −= == − = − ( ) 73 10yε −≤ ×

3 7

4 80.918206857 0.035868343b b= =

Page 100: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

1.0

0.8

0.9

0.6

0.7

GLOGPAu

rtos

is

0.4

0.5GPAGEVLN3PE3G

L-K

u

0.2

0.3GE

Fittedobserved

0.0

0.1

-1.00 -0.80 -0.60 -0.40 -0.20 0.00 0.20 0.40 0.60 0.80 1.00

observed

-0.11.00 0.80 0.60 0.40 0.20 0.00 0.20 0.40 0.60 0.80 1.00

L-skewness

Page 101: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Application Examples

REGIONALIZATION OF INDIANA WATERSHEDS

Page 102: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Table 1.5.1. Attributes considered for regionalization of Indiana watersheds.

Attribute Range

Attributes considered for regionalizationAttribute Range

Drainage Area 0.28 – 28,813 km2 Mean Annual Precipitation 86.36 – 116.84 cm

Main channel Slope 0.17 – 50.57 m/kmpMain channel Length 0.48 – 506.94 km

Basin Elevation 125.6 – 362.7 m Storage* 0% – 11%g

Soil Runoff coefficient 0.30 –1.00 Forest cover in Drainage Area 0.0 – 88.4%

I(24,2)** 6.60 – 8.51 cm( , )

*Storage – percentage of the contributing drainage area covered by lakes, ponds or wetlands **I(24,2) – 24-hour rainfall having a recurrence interval of 2 years.

Source: Indiana Department of Natural Resources (IDNR), Division of Water; USGS

Page 103: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Characteristics of the regions formed using Regression Approach (Glatfelter, 1984)

Region number NS RS

Heterogeneity Measure Region type

H H H

g g g pp ( , )

NS RS H1 H2 H3

1 16 598 4.85 1.39 -0.62 Heterogeneous

2 31 1191 4.99 0.96 -0.62 Heterogeneous

3 60 3449 2.09 0.14 -0.40 Heterogeneous

4 46 1294 1.08 1.65 0.20 Possibly HomogeneousHomogeneous

5 35 852 2.96 -0.24 -1.47 Heterogeneous

6 32 913 4.57 2.96 2.13 Heterogeneous

7 22 901 10.81 2.31 0.72 Heterogeneous

NS : Number of stationsRS : Region size in station years

Page 104: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Characteristics of the regions formed using Fuzzy c-means clustering (Rao and Srinivas 2006)

Region number NS RS

Heterogeneity Measure Region type

H H H

(Rao and Srinivas, 2006)

NS RS H1 H2 H3

1 52 911 0.63 0.83 -0.07 Homogeneous

2 60 2010 0.89 0.92 -0.08 Homogeneous

3 40 837 0.23 0.95 0.11 Homogeneous

4 75 3274 0.79 0.80 -0.08 Homogeneous

22 9 0 9 0 13 0 H5 22 975 0.97 -0.13 -0.77 Homogeneous

6 10 431 13.51 5.71 2.62 Heterogeneous

7 24 1012 0.48 0.03 0.79 Homogeneousg

8 14 160 0.99 -0.24 -1.70 Homogeneous

NS : Number of stationsRS : Region size in station years

Page 105: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Characteristics of the regions formed using Fuzzy SOFM (Srinivas et al 2008):

Region number NS RS

Heterogeneity Measure Region type

H1 H2 H3

Characteristics of the regions formed using Fuzzy SOFM (Srinivas et al., 2008):

H1 H2 H3

1 45 674 0.65 -0.30 -0.93 Homogeneous

2 55 1869 0.76 0.41 -0.37 Homogeneous

3 29 608 -0.32 0.93 0.31 Homogeneous

4 62 2820 0.86 0.69 -0.46 Homogeneous

5 24 1121 0 93 0 11 0 82 Homogeneous5 24 1121 0.93 -0.11 -0.82 Homogeneous

6 11 467 13.77 6.05 2.41 Heterogeneous

7 25 990 0.82 0.42 1.27 Homogeneous

8 16 188 0.54 -0.69 -1.99 Homogeneous

NS : Number of stationsRS : Region size in station years

Page 106: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

12

7126

345 45

Glatfelter Regions Regions by Fuzzy SOFM clustering

Page 107: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]
Page 108: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]
Page 109: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Assessment of Predictive PerformanceAssessment of Predictive Performanceusing Leave one-out Cross Validation

Page 110: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Flood data and basin attributes for sitess =1,…, N

Take s = 1

Consider site s as ungauged

For SOFM method compute centroids ofFor SOFM method compute centroids of region(s) Vi=1,…, NR

For SOFM method assign site s to the regions using the procedure given in section 2.6

For Glatfelter regions assign site s to a region based on itsFor Glatfelter regions assign site s to a region based on its geographical location

For CCA method find neigborhoods for site s using procedure described in Appendex B of Ouarda et al.(2001) for chosen

confidence level

Determine growth curves of region(s)/neighborhoods of s using index flood method

Determine regression relationship between basin attributes and mean annual flood for region(s)/neigborhood

s = s + 1

For SOFM method compute flood quantiles of site s using Eq.(36), whereas for CCA and Glatfelter method use Eq.(35)

Is s > N ?No

YesCompute performance measures (e.g., R-bias, R-RMSE, NMSE)

Page 111: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

1

ˆ ˆ1ˆ

N a Ri i

aii

Q QR biasN Q

=

⎛ ⎞−− = ⎜ ⎟⎜ ⎟

⎝ ⎠∑1i=

2ˆ ˆ1N a RQ Q⎛ ⎞∑

1

i iaii

Q QR RMSEN Q

=

⎛ ⎞−− = ⎜ ⎟⎜ ⎟

⎝ ⎠∑A i fl d ilˆ aQ

21 ˆ ˆ( )N

a Ri iQ Q−∑

At-site flood quantile

ˆ RiQ Regional quantile

aiQ

1

2

( )

1 ˆ ˆ( )( 1)

i ii

Na ai i

Q QN

NMSEQ Q

N

==−

∑1( 1) iN =−

Page 112: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Leave one-out cross validationea e o e ou c oss a da o

T

Two-level SOFM Regional Regression CCA

R-bias R-RMSE NMSE R-bias R-RMSE NMSE R-bias R-RMSE NMSET R-bias R-RMSE NMSE R-bias R-RMSE NMSE R-bias R-RMSE NMSE

2 -0.139 0.618 0.065 -0.174 0.811 0.075 -0.198 0.755 0.071

10 -0.141 0.592 0.073 -0.163 0.785 0.086 -0.217 0.798 0.081

20 -0.145 0.605 0.080 -0.162 0.779 0.092 -0.223 0.816 0.086

100 -0.163 0.688 0.107 -0.166 0.805 0.117 -0.242 0.884 0.108

Ordinary least square regional regression equation is used to estimateindex flood value for the target ungauged site

Page 113: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

REGIONALIZATION OF Ohio WATERSHEDS

Page 114: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

GNG Clusters GNG Regions

Page 115: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

B

A

B

A

C

Koltun (2003) GNG Regions

Page 116: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Application Example

REGIONALIZATION OF WATERSHEDSREGIONALIZATION OF WATERSHEDS

IN GODAVARI BASIN

Page 117: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Forming Watershed Clusters using Cluster Analysis

Hard clusteringMosley [1981], Tasker [1982], Burn et al. [1997] (Hierarchical)

Burn [1989], Burn and Goel [2000] (Partitional)Rao and Srinivas[2006] (Hybrid)[ ] ( y )

Fuzzy ClusteringBargaoui et al., [1998], Hall and Minns [1999], Rao and Srinivas[2006]

Artificial Neural Network Clustering

SOFMsHall and Minns [1999] Hall et al [2002] Jingyi and Hall [2004]Hall and Minns [1999], Hall et al. [2002], Jingyi and Hall [2004]

Growing Neural Gas NetworksSrinivas and Tripathi [2006]

Fuzzy SOFMSrinivas et al. [2008]

Page 118: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Study Area

Fig Stream network and gauges in Godavari basin (Area=3 12 813 km² )Fig. Stream network and gauges in Godavari basin (Area=3,12,813 km )

Page 119: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Performance Measures Considered to Compare the RegionsPerformance Measures Considered to Compare the Regions

ˆ ˆ( ) ( )1( ) 100ˆ ( )

N ai i

ai

q T q TR bias TN q T

⎛ ⎞−− = ×⎜ ⎟⎜ ⎟

⎝ ⎠∑2

ˆ ˆ( ) ( )1( ) 100N a

i iq T q TR RMSE T⎡ ⎤⎛ ⎞−⎢ ⎥×⎜ ⎟⎜ ⎟∑

1( )ii

q T= ⎝ ⎠

1

( ) 100ˆ ( )

i iaii

R RMSE TN q T

=

⎢ ⎥− = ×⎜ ⎟⎜ ⎟⎢ ⎥⎝ ⎠⎢ ⎥⎣ ⎦∑

1 N2

1

2

1 ˆ ˆ[ ( ) ( )]( )

1 ˆ ˆ[ ( ) ( )]( 1)

ai i

iN

a ai i

q T q TN

NMSE Tq T q T

N

=

−=

∑1( 1) iN =− ∑

ˆ ( )iq T ˆ ( )aiq Tand are regional and at-site estimates of T-year quantile of predictand

at site iat site iˆ ( )aiq T represents mean of for all the sites in the study region.ˆ ( )a

iq T

119

Page 120: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Performance comparison – New Regions and CWC sub-zones inGodavari river basin.

T Newly delineated regions Sub-zones

R-bias R-RMSE NMSE R-bias R-RMSE NMSE

5 -3 635 6 788 0 002 -5 185 8 741 0 0045 3.635 6.788 0.002 5.185 8.741 0.004

10 -5.564 10.653 0.006 -9.609 16.049 0.042

25 -6.254 17.363 0.016 -12.133 26.609 0.154

50 6 205 22 802 0 028 12 737 34 300 0 27450 -6.205 22.802 0.028 -12.737 34.300 0.274

100 -6.066 28.474 0.046 -12.800 41.712 0.417

Page 121: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Conclusions

Cluster analysis based on soft computing techniques is effective indetermining homogeneous regions for flood frequency analysis

In regionalization using FCM algorithm the fuzzy memberships ofwatersheds in clusters are useful in adjusting the regions

It is not always possible to interpret clusters from the output of anSOFM, and Fuzzy SOFM and GNG network are feasiblealternatives.

The effort needed to determine regions using conventionalregression analysis is considerable, and the approach is inefficient.

Page 122: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

Publications

Journal:

A R Rao and V V Srinivas (2006) Regionalization of Watersheds by Fuzzy ClusterA. R. Rao and V. V. Srinivas (2006) Regionalization of Watersheds by Fuzzy ClusterAnalysis. Journal of Hydrology, Elsevier, Vol.318, Issues 1-4, pp.57-79.

A. R. Rao and V. V. Srinivas (2006) Regionalization of Watersheds by Hybrid ClusterAnalysis Journal of Hydrology Elsevier Vol 318 Issues 1-4 pp 37-56Analysis. Journal of Hydrology, Elsevier, Vol.318, Issues 1 4, pp.37 56.

Srinivas, V. V., Tripathi, S., Rao, A. R., Govindaraju, R. S., 2008. Regional floodfrequency analysis by combining self-organizing feature maps and fuzzy clustering.Journal of Hydrology, 348, 148-166.Journal of Hydrology, 348, 148 166.

Text Book:Rao, A. R., Srinivas V. V., 2008. Regionalization of Watersheds – An Approach Basedon Cluster Analysis. Springer Publishers, Dordrecht, The Netherlands.on Cluster Analysis. Springer Publishers, Dordrecht, The Netherlands.

Page 123: Prof v.v.srinivas Lecture-2 Feb8 2012 Given [Compatibility Mode]

THANK YOUTHANK YOU