improving real time flood forecasting using fuzzy inference system



Journal of Hydrology 509 (2014) 25–41

Contents lists available at ScienceDirect

Journal of Hydrology

journal homepage: www.elsevier.com/locate/jhydrol

Improving real time flood forecasting using fuzzy inference system

Anil Kumar Lohani a,⇑, N.K. Goel b, K.K.S. Bhatia c

a National Institute of Hydrology, Jal Vigyan Bhawan, Roorkee 247667, India
b Department of Hydrology, Indian Institute of Technology, Roorkee 247667, India
c Poornima Group of Institutions, Jaipur, India

⇑ Corresponding author. Tel.: +91 1332249214; fax: +91 1332272123.
E-mail addresses: [email protected], [email protected], [email protected] (A.K. Lohani), [email protected] (N.K. Goel), [email protected] (K.K.S. Bhatia).

0022-1694/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.jhydrol.2013.11.021

Article info

Article history:
Received 5 February 2013
Received in revised form 9 November 2013
Accepted 14 November 2013
Available online 22 November 2013
This manuscript was handled by Andras Bardossy, Editor-in-Chief, with the assistance of Ezio Todini, Associate Editor

Keywords:
Fuzzy inference system
Artificial neural network
Subtractive clustering
Self Organizing Map
Flood forecasting
Lead period

Summary

In order to improve the real time forecasting of floods, this paper proposes a modified Takagi Sugeno (T–S) fuzzy inference system, termed the threshold subtractive clustering based Takagi Sugeno (TSC-T–S) fuzzy inference system, by introducing the concept of rare and frequent hydrological situations into the fuzzy modeling system. The proposed modified fuzzy inference system provides an option of analyzing and computing cluster centers and membership functions for two different hydrological situations, i.e. low to medium flows (frequent events) as well as high to very high flows (rare events), generally encountered in real time flood forecasting. The methodology has been applied for flood forecasting using the hourly rainfall and river flow data of the upper Narmada basin, Central India. The available rainfall–runoff data have been classified into frequent and rare events, and suitable TSC-T–S fuzzy model structures have been suggested for better forecasting of river flows. The performance of the model during calibration and validation is evaluated by performance indices such as root mean square error (RMSE), model efficiency and coefficient of correlation (R). In flood forecasting, it is very important to know the performance of a flow forecasting model in predicting higher magnitude flows. The performance criteria described above do not express the prediction ability of the model precisely from the higher to the lower flow region. Therefore, a new model performance criterion termed peak percent threshold statistics (PPTS) is proposed to evaluate the performance of a flood forecasting model. The developed model has been tested for different lead periods using hourly rainfall and discharge data. Further, the proposed fuzzy model results have been compared with artificial neural networks (ANN), ANN models for different classes identified by Self Organizing Map (SOM), and the subtractive clustering based Takagi Sugeno fuzzy model (SC-T–S fuzzy model). It has been concluded from the study that the TSC-T–S fuzzy model provides reasonably accurate forecasts with sufficient lead-time.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

Real time flood forecasting is used to provide timely warning to people residing in flood plains and can alleviate a lot of distress and flood damage. Flood forecasting also provides useful information to water management personnel for making optimal decisions related to flood control structures and reservoir operation. Floods are natural phenomena and are inherently complex to model. Conventional methods of flood forecasting are based either on simple empirical black box models, which do not try to mimic the physical processes involved, or on complex models, which aim to recreate the physical processes and the concept of basin behavior in complex mathematical expressions (Lohani et al., 2005a). In between these two there is a wide variety of models, e.g. deterministic and stochastic, lumped and distributed, event driven and continuous, or their combinations (Nielsen and Hansen, 1973; Box and Jenkins, 1976; Lundberg, 1982; Yakowitz, 1985; Yapo et al., 1993; Chatterjee et al., 2001), which are the basis of conventional flood forecasting systems. Existing flood forecasting models are highly data specific and complex and make various simplifying assumptions (Hecht-Nielsen, 1991; Hykin, 1992). For a reliable forecast, Singh (1989) has listed three basic criteria, i.e. accuracy, reliability, and timeliness. Timeliness of forecasting is extremely important, and it can be achieved by simple and robust forecasting models.

Recently there has been a growing interest in soft computing techniques, viz. artificial neural networks (ANNs) and fuzzy logic. ANNs are basically a data driven approach and are considered black box models (Bishop, 1994) in the hydrological context. These models are capable of adapting to the non-linear relationship (Hecht-Nielsen, 1991; Flood and Kartam, 1994) between rainfall and runoff, in contrast to conventional techniques, which assume a linear relationship between rainfall and runoff. ANNs have strong generalization ability, which means that once they have been properly trained, they are able to provide accurate results even for cases they have never experienced before (Imrie et al., 2000; Lohani et al., 2012). Previous studies have shown that ANNs are capable of reproducing an unknown rainfall–runoff relationship adequately (ASCE, 2000a, 2000b). ANN is also a powerful tool for solving complex nonlinear river flow forecasting problems (Hsu et al., 1995, 2002; Thirumalaiah and Deo, 1998a,b; Atiya et al., 1999; Toth, 2009; Birikundavyi et al., 2002; Kar et al., 2010), in particular when the time required to generate a forecast is very short. Sahoo and Ray (2006) demonstrated that the ANN can outperform rating curves for discharge forecasting. The suitability of some deterministic and statistical techniques, along with an ANN, to model an event based rainfall–runoff process has been investigated by Jain and Indurthy (2003). This investigation of ANNs with varying architecture, training rules and error back propagation establishes the suitability of ANN in flow forecasting. A comprehensive review of ANN applications in the prediction and forecasting of water resources variables can be found in the work of Maier and Dandy (2000).

The fuzzy rule based method, introduced by Zadeh (1965), is another soft computing technique that has recently received attention for modeling hydrological processes. It is a qualitative modeling scheme where the system behavior is described using natural language (Sugeno and Yasukawa, 1993). Dubois et al. (1998) state that the real power of fuzzy logic lies in its ability to combine modeling (constructing a function that accurately mimics the given data) and abstracting (articulating knowledge from the data). See and Openshaw (1999, 2000) indicated that fuzzy logic can be used in combination with other soft computing techniques to create sophisticated river level monitoring and forecasting systems. Hundecha et al. (2001) and Lohani et al. (2009, 2011) have demonstrated the applicability of the fuzzy logic approach in rainfall–runoff modeling. Rule based fuzzy logic modeling techniques for forecasting water supply were investigated by Mahabir et al. (2003). A number of studies demonstrated that fuzzy rule-based models for deriving stage–discharge–sediment relationships and sediment concentration forecasts produce much better results than the conventional rating curve models (Kisi, 2004, 2005; Kisi et al., 2006; Lohani et al., 2007a).

Luchetta and Manetti (2003) developed a fuzzy logic based approach to the forecasting of hydrological levels, particularly suitable to cope with extreme situations, by setting different rules for trivial and rare situations. A neurofuzzy technique based on the combination of backpropagation and least square error methods for parameter optimization was applied to short term flood forecasting by Nayak et al. (2005b), who pointed out that the number of parameters grows exponentially with the number of membership functions, resulting in a large training time. Lohani et al. (2012) compared the performance of ANFIS with back propagation algorithm based ANN and AR models for hydrological time series modeling. Kisi et al. (2012) applied ANN and ANFIS to forecast daily lake-level variations. Ren et al. (2010) presented a new classified real-time flood forecasting framework by integrating a fuzzy clustering model and a neural network with a conceptual hydrological model. The Takagi–Sugeno (T–S) fuzzy technique has been applied to rainfall–runoff modeling and flood forecasting by various researchers (Xiong et al., 2001; Vernieuwe et al., 2005; Jacquin and Shamseldin, 2006; Lohani et al., 2011; Kar et al., 2012b). The T–S fuzzy structure identification is obtained directly by a fuzzy clustering approach (Chiu, 1994). Fuzzy clustering also plays an important role in finding homogeneous regions in regional flood frequency analysis (Kar et al., 2012a).

The above discussion reveals that the core of the Takagi–Sugeno fuzzy structure identification method is in the clustering and the projection. A limitation of the Subtractive Clustering based Takagi Sugeno (SC-T–S) fuzzy model is that if any data point falls away from the cluster or outside the clusters, the model performance may not be satisfactory (Nayak et al., 2005a). Particularly in non-structural flood management, a slight improvement in the accuracy of the real time flood forecasts has many direct advantages. The input data vectors which are used to train and build the short term flood forecasting model do not all have the same importance. In such cases the time series of river flow values contains both low to medium flows (frequent events) and high to very high flows (rare events). In order to improve the real time forecasting of floods, this paper proposes a Threshold Subtractive Clustering based Takagi Sugeno (TSC-T–S) fuzzy inference system. The proposed TSC-T–S fuzzy model is applied to the forecasting of hourly river flow of the river Narmada, India, by evaluating the fuzzy membership functions for frequent and rare events. In the proposed method the input–output data space is classified into frequent and rare events to preserve the generalization capability of the Subtractive Clustering based Takagi Sugeno (SC-T–S) fuzzy inference system while improving forecasting. The results of the proposed TSC-T–S fuzzy model are compared with the forecasts from ANN, SOM and SC-T–S (interchangeably referred to as TS or T–S fuzzy model) fuzzy models at different lead periods.

2. Fuzzy models

Unlike classical logic, which requires a deep understanding of a system, exact mathematical equations, and precise numeric values, fuzzy logic incorporates an alternative way of thinking, which allows modeling of complex systems using a higher level of abstraction originating particularly from our knowledge and experience. Fuzzy logic allows this knowledge to be expressed with subjective concepts that are mapped into exact numeric ranges. In ordinary (non fuzzy) set theory, elements either fully belong to a set or are fully excluded from it. The membership $\mu_A(x)$ of an element $x$ in a classical set $A$, as a subset of the universe $X$, is defined by:

$$
\mu_A(x) =
\begin{cases}
1, & \text{iff } x \in A \\
0, & \text{iff } x \notin A
\end{cases}
\tag{1}
$$

This means that an element $x$ is either a member of set $A$ ($\mu_A(x) = 1$) or not ($\mu_A(x) = 0$). This strict classification is useful in mathematics and other sciences.

The general linguistic fuzzy model of a multi-input single-output system is interpreted by rules with multi-antecedent and single-consequent variables such as the following:

$$
\text{Rule } R_i:\ \text{if } x_1 \text{ is } A_{i1} \text{ and if } x_2 \text{ is } A_{i2} \text{ and } \ldots \text{ and if } x_n \text{ is } A_{in}\ \text{THEN } y \text{ is } B_i, \quad i = 1, 2, \ldots, k
\tag{2}
$$

where $x_1, x_2, \ldots, x_n$ are input variables and $y$ is the output, and $A_{ij}$ ($i = 1, \ldots, k$; $j = 1, \ldots, n$) and $B_i$ ($i = 1, \ldots, k$) are fuzzy sets.

The corresponding fuzzy rules and the antecedent and consequent membership functions can be generated by (i) knowledge of human experts; and (ii) a suitable identification technique. If no a priori knowledge exists for a given system, a fuzzy clustering technique can be useful.

3. Fuzzy structure identification

Data driven fuzzy identification is an effective tool for the approximation of uncertain non-linear systems (Hellendoorn and Driankov, 1997). Fuzzy models and their respective characteristics are developed using clusters derived from the measured input and output data. A number of clustering algorithms have been reviewed:

(1) K-means or C-means clustering (Krishnaiah and Kanal, 1982).

(2) Fuzzy C-means (FCM) clustering method (Bezdek, 1981; Bezdek and Pal, 1992).

(3) Mountain clustering (Yager and Filev, 1994).

(4) Subtractive clustering (Chiu, 1994).

K-means/C-means clustering and the fuzzy C-means (FCM) clustering method are iterative techniques that start with some initial cluster centers and generate membership grades, which are used to induce new cluster centers. The performance of these methods depends directly on the chosen initial positions of the cluster centers. In mountain clustering a grid is formed on the data set and a potential value (mountain function) is computed for each point on the grid, based on its distance to the actual data points. A finer gridding increases the number of potential cluster centers (Jang et al., 2002). The grid vertex with the highest potential represents the first cluster, and the remaining clusters are determined subsequently by adjusting the potential of all remaining grid points. The major drawback of this method is that the computation grows exponentially with the dimension of the problem. Subtractive clustering, proposed by Chiu (1994), is an extension of the mountain clustering method in which data points, instead of grid points, are considered as candidate cluster centers. Subtractive clustering has an advantage over mountain clustering because its computation requirement is simply proportional to the number of data points and independent of the dimension of the problem (Jang et al., 2002).

Although clustering is generally associated with classification problems, Chiu (1994) has used the cluster estimation method as a basis of a fuzzy model identification algorithm. The core of the fuzzy structure identification method is in the clustering and the projection. First, the output space is partitioned using a fuzzy clustering algorithm. Second, the partitions (clusters) are projected onto the space of the input variables. The output partition and its corresponding input partitions are the consequents and antecedents, respectively. Then, by projecting each cluster onto each input variable, temporary clusters in the input space are obtained. This may be implemented by using the subtractive clustering method, which automatically determines the number of clusters. The subtractive clustering method uses the following formula to express the potential as a sum of contributions of the Euclidean distances between a given point and all other data points (Chiu, 1994):

$$
D_i = \sum_{j=1}^{N} d_{ij}
\tag{3}
$$

$$
d_{ij} = e^{-\alpha \| x_i - x_j \|^2}, \quad i = 1, 2, \ldots, N
\tag{4}
$$

where $D_i$ is the potential of the data point $x_i$ to be a cluster center, $d_{ij}$ denotes the contribution of every single distance, $N$ is the number of training data samples, $\alpha = 4/R_a^2$, and $R_a$ is the cluster radius.
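The potential in Eqs. (3) and (4) can be illustrated with a minimal NumPy sketch. The function name `potentials`, the sample points and the radius value are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

def potentials(points: np.ndarray, ra: float) -> np.ndarray:
    """Potential D_i of every data point, Eqs. (3)-(4), with alpha = 4 / ra**2."""
    alpha = 4.0 / ra**2
    # Pairwise squared Euclidean distances, shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff**2, axis=-1)
    # Sum the contributions d_ij over j for every i.
    return np.exp(-alpha * sq_dist).sum(axis=1)

# Hypothetical normalized data: a tight group of three points plus one outlier.
data = np.array([[0.10, 0.10], [0.12, 0.11], [0.11, 0.09], [0.90, 0.95]])
D = potentials(data, ra=0.5)
first_center = data[np.argmax(D)]  # the point with the highest potential
```

A point surrounded by many neighbors accumulates a potential close to the number of neighbors, while an isolated point keeps a potential close to 1 (its own contribution).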

3.1. Computation of clusters for hydrological data

In short term flood forecasting applications, the number of input–output data pairs is very large. The input data vectors, which are used to train and build the T–S fuzzy model, do not all have the same importance. Particularly in small to medium size catchments, the river flow shows a very high rate of rise and fall in shorter spells due to upstream rain, and such high spikes in the time series of flow data do not show any periodicity. In such cases the time series of river flow values contains both low to medium flows (frequent events) and high to very high flows (rare events). In general the high flow values are very few in number but important in forecasting. The main purpose of any flood forecasting model is to predict 'rare events' or catastrophic events (Luchetta and Manetti, 2003). In the subtractive clustering approach, an adequate choice of the cluster radius ($R_a$) matches the input–output pairs to a given accuracy. Generally, the value of $R_a$ is determined by trial and error, and it always lies between 0 and 1. A small value of $R_a$ results in a larger number of clusters, i.e. a granular partition with minimum matching error, but this reduces the generalization capability of a fuzzy model. A higher value of $R_a$, in turn, reflects a rough partition of the input space into separate clusters and hence into separate fuzzy sets (Abonyi et al., 2002). The parameter $R_a$ is used as an input parameter for the generation of the subtractive clustering based T–S fuzzy inference system, and the same value of cluster radius ($R_a$) is used for defining the membership functions of the input variables. Such a subtractive clustering based T–S fuzzy model, when applied to real time flood forecasting, tries to mimic varying hydrologic situations with similar non-linear membership functions. Furthermore, the clusters obtained from the input (rainfall and/or runoff) are generally biased towards frequent events. Clusters with different radii of influence, and thus with different Gaussian membership function widths, may provide a better input–output mapping of continuous short interval rainfall–runoff data originating from a mixed population. This can be achieved by the proposed TSC-T–S fuzzy inference system, which deals with the frequent and rare events separately during the estimation of clusters and membership functions. By carefully examining the available input–output data pairs, threshold values for each input variable can be decided, which subdivide the data into two classes: (i) frequent events ($N_f$) and (ii) rare events ($N_r$). Let there be $N$ input–output data pairs in an $n$ dimensional data space; then

$$
N_f + N_r = N
\tag{5}
$$

and

$$
\frac{N_f}{N_r} \gg 1
\tag{6}
$$
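The threshold-based subdivision into frequent and rare events can be sketched as follows. This is an illustration under stated assumptions: the rule that a record is rare when any input variable reaches its threshold, the function name `split_events`, and the sample values and thresholds are all hypothetical and do not come from the paper.

```python
import numpy as np

def split_events(X, thresholds):
    """Split the rows of X into frequent and rare events.

    Assumption (not stated in the text): a record is 'rare' when at least
    one input variable reaches its threshold.
    """
    X = np.asarray(X, dtype=float)
    rare = np.any(X >= np.asarray(thresholds, dtype=float), axis=1)
    return X[~rare], X[rare]

# Hypothetical records: [discharge (m^3/s), rainfall (mm/h)].
records = np.array([
    [120.0, 2.0], [150.0, 0.0], [135.0, 5.0],
    [2400.0, 40.0], [110.0, 1.0], [160.0, 3.0],
])
frequent, rare = split_events(records, thresholds=[1000.0, 30.0])
n_f, n_r = len(frequent), len(rare)  # N_f + N_r = N, with N_f much larger than N_r
```

Each class can then be normalized and clustered with its own radius of influence, which is the idea developed in the remainder of this section.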

In subtractive clustering the data points have to be normalized in each direction within a unit hypercube. Following the previous definition, the normalized data vector is subdivided into a collection of $N_f$ frequent event data points $\{x_{f1}, \ldots, x_{fk}, \ldots, x_{fN_f}\}$ and $N_r$ rare event data points $\{x_{r1}, \ldots, x_{rl}, \ldots, x_{rN_r}\}$ in the $n$ dimensional data space. Since each data point is considered as a potential cluster center, a density measure at data point $x_{fk}$ of the frequent events is defined as

$$
D_{fk} = \sum_{j_f=1}^{N_f} \exp\left( -\frac{\| x_{fk} - x_{fj_f} \|^2}{\left( R_{fa}/2 \right)^2} \right)
\tag{7}
$$

Similarly, the density measure at data point $x_{rl}$ of the rare events is

$$
D_{rl} = \sum_{j_r=1}^{N_r} \exp\left( -\frac{\| x_{rl} - x_{rj_r} \|^2}{\left( R_{ra}/2 \right)^2} \right)
\tag{8}
$$

where $R_{fa}$ and $R_{ra}$ are the radii of influence of the clusters in frequent and rare events, respectively.

The measure of density for a data point is a function of its distance to all other data points. A data point with many neighboring data points will have a high potential value. The constants $R_{fa}$ and $R_{ra}$ are effectively the radii defining the neighborhood, and data points outside this radius have little influence on the density measure. The data points with the highest density measures, denoted by $D^{*}_{fk}$ and $D^{*}_{rl}$, are considered as the first cluster centers in frequent ($x^{*}_{fj_f}$) and rare events ($x^{*}_{rj_r}$). For both rare and frequent events the density measure is then recalculated for all other points, excluding the first cluster centers, by the following formulas.

For frequent events:

$$
D_{fk} = D_{fk} - D^{*}_{fk} \sum_{j_f=1}^{N_f} \exp\left( -\frac{\| x_{fk} - x^{*}_{fj_f} \|^2}{\left( \eta R_{fa}/2 \right)^2} \right)
\tag{9}
$$

For rare events:

$$
D_{rl} = D_{rl} - D^{*}_{rl} \sum_{j_r=1}^{N_r} \exp\left( -\frac{\| x_{rl} - x^{*}_{rj_r} \|^2}{\left( \eta R_{ra}/2 \right)^2} \right)
\tag{10}
$$

where $\eta$ is a positive constant and $\eta R_{fa}$ and $\eta R_{ra}$ are the radii defining the neighborhood that has measurable reductions in potential. Further,

$$
\eta R_{fa} \ge R_{fa}
\tag{11}
$$

$$
\eta R_{ra} \ge R_{ra}
\tag{12}
$$

Again, the data point with the highest density measure is considered as the next cluster center. This process is repeated until a sufficient number of cluster centers are generated. A sophisticated stopping criterion using density measures and the minimal distance between clusters, given by Chiu (1994, 1996), is generally applied (Vernieuwe et al., 2005). The procedure of acquiring a new cluster center and revising the density measure (Eqs. (9) and (10)) repeats until the kth cluster center is very close to the lth cluster center, such that $D^{*}_{rk} < \varepsilon D^{*}_{rl}$ or $D^{*}_{fk} < \varepsilon D^{*}_{fl}$ (where $\varepsilon$ is a small fraction). If $\varepsilon$ is too large, too few points will be accepted as cluster centers, and if $\varepsilon$ is too small, too many points will be accepted as cluster centers. The additional criteria defined by Chiu (1994) for accepting/rejecting cluster centers are applied here, i.e. if $D^{*}_{rk} > \bar{\varepsilon} D^{*}_{rl}$ and $D^{*}_{fk} > \bar{\varepsilon} D^{*}_{fl}$, accept the cluster center and continue, and if $D^{*}_{rk} < \varepsilon D^{*}_{rl}$ and $D^{*}_{fk} < \varepsilon D^{*}_{fl}$, reject the cluster center and end the clustering process. Here, $\bar{\varepsilon}$ specifies a threshold for the potential above which the data point is accepted as a cluster center; $\varepsilon$ specifies a threshold below which the data point is rejected. After computing the clusters, both frequent and rare event clusters are pooled together.

Further, it may be possible that the highest cluster center in the frequent events and the lowest cluster center in the rare events are close to each other. In that case the two clusters may be clubbed together so as to reduce the closely spaced clusters and thus improve the generalization capability of the fuzzy model. In the subtractive clustering approach proposed by Chiu (1994), when the maximum potential ratio lies in between the accept and reject ratios, a new cluster center is accepted if

$$
P_r + d_{\min} \ge 1
\tag{13}
$$

or

$$
d_{\min} \ge 1 - P_r
\tag{14}
$$

where $P_r$ = maximum potential ratio and $d_{\min}$ = minimum distance from previously found clusters.
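The selection loop described above can be sketched for a single group of events (frequent or rare) as follows. This is a hedged illustration, not the authors' implementation: it follows the standard procedure of Chiu (1994), in which the potential is reduced by one exponential term per accepted center, and it assumes that $d_{\min}$ is measured relative to the cluster radius. The function name and the default values of `eta`, `accept` and `reject` are assumptions.

```python
import numpy as np

def subtractive_clustering(X, ra, eta=1.5, accept=0.5, reject=0.15):
    """Cluster centers of X (rows scaled to the unit hypercube).

    Sketch of the standard Chiu (1994) loop; eta, accept and reject are
    illustrative defaults, not values from the paper.
    """
    alpha = 4.0 / ra**2           # density kernel, as in exp(-||.||^2 / (ra/2)^2)
    beta = 4.0 / (eta * ra)**2    # wider kernel used to reduce the potential
    sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    density = np.exp(-alpha * sq).sum(axis=1)
    first_peak = density.max()
    centers = []
    while True:
        k = int(np.argmax(density))
        peak = density[k]
        ratio = peak / first_peak            # maximum potential ratio P_r
        if ratio < reject:                   # below the reject ratio: stop
            break
        if ratio <= accept:                  # between reject and accept ratios
            d_min = min(np.linalg.norm(X[k] - c) for c in centers) / ra
            if d_min + ratio < 1.0:          # criterion of Eq. (13) not met
                density[k] = 0.0             # discard this candidate, try the next
                continue
        centers.append(X[k].copy())
        # Subtract the influence of the accepted center from every point.
        density = density - peak * np.exp(-beta * sq[k])
    return np.array(centers)

# Hypothetical data: two compact groups of five points each.
offsets = np.array([[0, 0], [0.02, 0], [-0.02, 0], [0, 0.02], [0, -0.02]])
X = np.vstack([np.array([0.2, 0.2]) + offsets, np.array([0.8, 0.8]) + offsets])
centers = subtractive_clustering(X, ra=0.5)
```

In the threshold-based variant discussed here, such a routine would be called once on the frequent events with $R_{fa}$ and once on the rare events with $R_{ra}$, and the two sets of centers would then be pooled.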

Now, in the modified TSC-T–S fuzzy inference system, the two nearest clusters from the frequent and rare event groups (i.e. the highest one from the frequent events and the lowest one from the rare events) are accepted for developing fuzzy rules when

$$
D_{\min} \ge 1 - P_r
\tag{15}
$$

where $D_{\min}$ = minimum distance between frequent and rare event clusters. Here, the maximum potential ratio is taken as the accept ratio so as to accept the most closely spaced clusters from the two different groups.

Therefore, to verify the spacing between two cluster centers of two different groups, the following check is applied:

$$
\frac{\left( x^{*}_{fj\,\mathrm{Maxf}} - x^{*}_{fj\,\mathrm{Minr}} \right)}{R_{fa}} \ge 1 - \text{Accept ratio}
\quad \text{and} \quad
\frac{\left( x^{*}_{rj\,\mathrm{Maxf}} - x^{*}_{rj\,\mathrm{Minr}} \right)}{R_{ra}} \ge 1 - \text{Accept ratio}
\tag{16}
$$

If the above check is satisfied, then the cluster centers of the entire data vector, i.e. including both frequent and rare events of the jth data set, are:

$$
x^{*}_{j} = \left[ x^{*}_{fj_f} \;\; x^{*}_{rj_r} \right]
\tag{17}
$$

where $x^{*}_{j}$ represents the total $C$ clusters in the data set, $x^{*}_{fj_f}$ represents the total $C_f$ clusters in the frequent events and $x^{*}_{rj_r}$ represents the total $C_r$ clusters in the rare events. Therefore, the total number of clusters in the data set is defined as:

$$
C = C_f + C_r
\tag{18}
$$

If the check defined by Eq. (16) is not satisfied, then the two closely spaced cluster centers are clubbed together and replaced with a new cluster center defined as:

$$
x^{*}_{j\,\mathrm{Revised}} = \frac{\left( x^{*}_{fj\,\mathrm{Maxf}} + x^{*}_{rj\,\mathrm{Minr}} \right)}{2}
\tag{19}
$$

where $x^{*}_{j\,\mathrm{Revised}}$ is the revised cluster center. This reduces the closely spaced cluster centers and thus the number of rules in the fuzzy inference system. Therefore, the revised cluster centers in this case are:

$$
x^{*}_{j} = \left[ x^{*-}_{fj_f} \;\; x^{*-}_{rj_r} \;\; x^{*}_{j\,\mathrm{Revised}} \right]
\tag{20}
$$

where $x^{*-}_{fj_f}$ are the cluster centers in frequent events excluding the highest cluster center and $x^{*-}_{rj_r}$ are the cluster centers in rare events excluding the lowest cluster center.

Therefore, the total number of clusters is defined as:

$$
C = \left( C^{*}_{f} + C^{*}_{r} + 1 \right)
\tag{21}
$$

where $C^{*}_{f} = C_f - 1$ is the number of revised clusters in the frequent events and $C^{*}_{r} = C_r - 1$ is the number of revised clusters in the rare events, so that $C = C_f + C_r - 1$.
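The pooling step of Eqs. (16)–(21) can be sketched for one input variable as follows. Several points are assumptions made for illustration: the centers are treated as scalars of a single normalized input variable, the spacing is taken as the gap between the highest frequent center and the lowest rare center, the merged center is taken as the midpoint of the two, and the function name and numeric values are hypothetical.

```python
import numpy as np

def pool_centers(freq_centers, rare_centers, rfa, rra, accept_ratio=0.5):
    """Pool frequent and rare cluster centers of one input variable.

    Assumptions: scalar centers, spacing measured between the highest
    frequent center and the lowest rare center, midpoint used for merging.
    """
    f = np.sort(np.asarray(freq_centers, dtype=float))
    r = np.sort(np.asarray(rare_centers, dtype=float))
    gap = abs(r[0] - f[-1])
    far_enough = (gap / rfa >= 1 - accept_ratio) and (gap / rra >= 1 - accept_ratio)
    if far_enough:
        # Keep every center: C = Cf + Cr.
        return np.concatenate([f, r])
    # Club the two neighboring centers into one: C = Cf + Cr - 1.
    merged = 0.5 * (f[-1] + r[0])
    return np.concatenate([f[:-1], [merged], r[1:]])

# Closely spaced groups are merged; well separated groups are kept as they are.
merged_case = pool_centers([0.1, 0.3], [0.35, 0.8], rfa=0.4, rra=0.6)
separate_case = pool_centers([0.1, 0.3], [0.7, 0.9], rfa=0.4, rra=0.6)
```

Merging removes one rule from the rule base, which is how the procedure limits the number of closely spaced membership functions.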

The cluster centers defined by Eqs. (17) and (20) reveal certain characteristics related to frequent and rare situations of the system to be modeled and can reasonably be used as the centers for the fuzzy rules' premise and antecedent membership functions that describe the system behavior. To generate rules, the cluster centers ($x^{*}_{j}$) are used as the centers for the premise sets, and the membership of input $x_j$ to the jth premise part of the ith rule is defined by the Gaussian membership function:

$$
\mu_{fij}(x_j) = \exp\left[ -\left( \frac{x_j - x^{*}_{fji}}{R_{fai}/2} \right)^2 \right] \quad \text{when } i \in C_f
\tag{22}
$$

$$
\mu_{rij}(x_j) = \exp\left[ -\left( \frac{x_j - x^{*}_{rji}}{R_{rai}/2} \right)^2 \right] \quad \text{when } i \in C_r
\tag{23}
$$

where $x_j$ is the jth variable of the input data vector, $x^{*}_{ji}$ is the ith cluster center of the jth input variable, $R_{ai}$ is the cluster radius of the ith cluster and $i$ ($= 1, \ldots, C$) is the cluster radius index or number of rules. The shape of the Gaussian membership function defined by Eqs. (22) and (23) indicates that, for every input vector, a membership degree greater than zero is obtained for each fuzzy set, and all the rules in the rule-base fire simultaneously (Lohani et al., 2007b). This leads to the possibility of generating only a few rules for describing the relationship between input and output accurately.
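The Gaussian membership functions of Eqs. (22) and (23) share one form and differ only in the center and radius used. A minimal sketch follows; the function name, the centers and the radii are illustrative assumptions.

```python
import numpy as np

def gaussian_membership(x, center, radius):
    """Membership degree exp(-((x - center) / (radius / 2))**2)."""
    return np.exp(-((x - center) / (radius / 2.0)) ** 2)

# Hypothetical frequent-event set (narrow) and rare-event set (wide).
x = 0.35
mu_frequent = gaussian_membership(x, center=0.2, radius=0.3)
mu_rare = gaussian_membership(x, center=0.8, radius=0.6)
```

Because the exponential never reaches zero, every rule receives a positive membership degree for any input, which is why all rules fire simultaneously.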

The input membership function matrix of the TSC-T–S fuzzy model can be represented as:

$$
\begin{bmatrix}
\mu_{11} & \mu_{12} & \cdots & \mu_{1(C_f-1)} & \mu_{1R} & \mu_{12} & \cdots & \mu_{1(C_r-1)} \\
\mu_{21} & \mu_{22} & \cdots & \mu_{2(C_f-1)} & \mu_{2R} & \mu_{22} & \cdots & \mu_{2(C_r-1)} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
\mu_{(n-1)1} & \mu_{(n-1)2} & \cdots & \mu_{(n-1)(C_f-1)} & \mu_{(n-1)R} & \mu_{(n-1)2} & \cdots & \mu_{(n-1)(C_r-1)} \\
\mu_{n1} & \mu_{n2} & \cdots & \mu_{n(C_f-1)} & \mu_{nR} & \mu_{n2} & \cdots & \mu_{n(C_r-1)}
\end{bmatrix}
$$


3.2. SC-T–S and TSC-T–S Fuzzy Models

Fuzzy relational models can be regarded as an extension of linguistic models, which allow for different degrees of association between the antecedent and the consequent linguistic terms. A major distinction can be made between the linguistic model, which has fuzzy sets in both the antecedents and the consequents of the rules, and the T–S model, where the consequents are (crisp) functions of the input variables. Consider the identification of the following unknown nonlinear hydrological system based on the available input–output data sets $x_k = [x_{1k}, x_{2k}, \ldots, x_{nk}]^T$ and $y_k$ (for $k = 1, \ldots, N$, with $N = N_f + N_r$), respectively:

$$ y = f(x) \tag{24} $$

A model describing the above unknown nonlinear system using a Takagi–Sugeno (T–S) fuzzy model (Takagi and Sugeno, 1985) for an n-dimensional input space and a single output consists of a set of rules $R_i$, $i = 1, \ldots, C$, as given below:

$$ R_i:\ \text{if } x_1 \text{ is } \mu_{i1} \text{ and if } x_2 \text{ is } \mu_{i2} \text{ and } \ldots \text{ and if } x_n \text{ is } \mu_{in} \ \text{ THEN } y = f_i(x_1, x_2, \ldots, x_n) \tag{25} $$

where $x_1, x_2, \ldots, x_n$ are the antecedents and $y$ is the consequent, $\mu_{i1}, \mu_{i2}, \ldots, \mu_{in}$ are fuzzy sets and $f_i(x_1, x_2, \ldots, x_n)$ is a linear function of the form:

$$ f_i(x_1, x_2, \ldots, x_n) = a_{0i} + a_{1i}x_1 + a_{2i}x_2 + \cdots + a_{ni}x_n \tag{26} $$

with $a_{0i}, a_{1i}, \ldots, a_{ni}$ the parameters of the consequent part of rule $R_i$. The total output of the T–S fuzzy model (Lohani et al., 2006, 2007b) of the nonlinear system represented by C cluster centers is computed by:

$$ y = \frac{\sum_{i=1}^{C} \mu_i f_i(x_1, x_2, \ldots, x_n)}{\sum_{i'=1}^{C} \mu_{i'}} \tag{27} $$

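$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix}
w^f_{11} & w^f_{11}x_{11} & \cdots & w^f_{11}x_{1n} & \cdots & w^f_{1C_f} & w^f_{1C_f}x_{11} & \cdots & w^f_{1C_f}x_{1n} & w^r_{11} & w^r_{11}x_{11} & \cdots & w^r_{11}x_{1n} & \cdots & w^r_{1C_r} & w^r_{1C_r}x_{11} & \cdots & w^r_{1C_r}x_{1n} \\
w^f_{21} & w^f_{21}x_{21} & \cdots & w^f_{21}x_{2n} & \cdots & w^f_{2C_f} & w^f_{2C_f}x_{21} & \cdots & w^f_{2C_f}x_{2n} & w^r_{21} & w^r_{21}x_{21} & \cdots & w^r_{21}x_{2n} & \cdots & w^r_{2C_r} & w^r_{2C_r}x_{21} & \cdots & w^r_{2C_r}x_{2n} \\
\vdots & \vdots & & \vdots & & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & & \vdots \\
w^f_{N1} & w^f_{N1}x_{N1} & \cdots & w^f_{N1}x_{Nn} & \cdots & w^f_{NC_f} & w^f_{NC_f}x_{N1} & \cdots & w^f_{NC_f}x_{Nn} & w^r_{N1} & w^r_{N1}x_{N1} & \cdots & w^r_{N1}x_{Nn} & \cdots & w^r_{NC_r} & w^r_{NC_r}x_{N1} & \cdots & w^r_{NC_r}x_{Nn}
\end{bmatrix}
\cdot
\begin{bmatrix} a_{10} & a_{11} & \cdots & a_{1n} & a_{20} & a_{21} & \cdots & a_{2n} & \cdots & a_{C0} & a_{C1} & \cdots & a_{Cn} \end{bmatrix}^{T}
\tag{34}
$$

In this matrix the first subscript of $w$ and $x$ is the data point index and the second is the rule index (for $w$) or the input variable index (for $x$). Each row lists, for every frequent rule and then every rare rule, the normalized weight followed by its products with the n inputs of that data point.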

The overall truth value of the ith rule in the proposed SC-T–S fuzzy model for an n-dimensional input vector is defined by means of the product operator:

$$ \mu_i(x) = \prod_{j=1}^{n} \mu_{ij}(x_j) \quad \text{when } i \in C \tag{28} $$

where $x_j$ is the jth input variable in the n-dimensional input vector, and $\mu_{ij}$ is the membership degree of $x_j$ to the fuzzy set describing the jth premise part of the ith rule.

The total output of the proposed TSC-T–S fuzzy model of the nonlinear system represented by $C = C_f + C_r - 1$ cluster centers is computed by:

$$ y = \frac{\sum_{i=1}^{C_f-1} \mu^f_i f_i(x_1, x_2, \ldots, x_n) + \mu_{revised} f_i(x_1, x_2, \ldots, x_n) + \sum_{i=2}^{C_r-1} \mu^r_i f_i(x_1, x_2, \ldots, x_n)}{\sum_{i'=1}^{C_f-1} \mu^f_{i'} + \mu_{revised} + \sum_{i'=2}^{C_r-1} \mu^r_{i'}} \tag{29} $$

Similarly, the total output of the proposed TSC-T–S fuzzy model of the nonlinear system represented by $C = C_f + C_r$ cluster centers is computed by:

$$ y = \frac{\sum_{i=1}^{C_f} \mu^f_i f_i(x_1, x_2, \ldots, x_n) + \sum_{i=1}^{C_r} \mu^r_i f_i(x_1, x_2, \ldots, x_n)}{\sum_{i'=1}^{C_f} \mu^f_{i'} + \sum_{i'=1}^{C_r} \mu^r_{i'}} \tag{30} $$

where $\mu_i \in [0, 1]$ is the degree to which the antecedent of rule $R_i$ holds.

For an n-dimensional input vector, the overall truth value of the ith rule in the proposed TSC-T–S fuzzy model is defined by means of the product operator:

$$ \mu^f_i(x) = \prod_{j=1}^{n} \mu^f_{ij}(x_j) \quad \text{when } i \in C_f \tag{31} $$

$$ \mu_{revised}(x) = \prod_{j=1}^{n} \mu_{revised,ij}(x_j) \quad i \in C_{Revised} \tag{32} $$

$$ \mu^r_i(x) = \prod_{j=1}^{n} \mu^r_{ij}(x_j) \quad \text{when } i \in C_r \tag{33} $$

where $x_j$ is the jth input variable in the n-dimensional input vector, and $\mu^f_{ij}$ or $\mu^r_{ij}$ is the membership degree of $x_j$ to the fuzzy set describing the jth premise part of the ith rule for frequent or rare events, respectively. Likewise, $\mu_{revised}$ is the membership degree of $x_j$ to the fuzzy set describing the jth premise part of the rule given by the revised cluster between rare and frequent events.

Now, the parameters ($a_i = [a_{i0}, a_{i1}, \ldots, a_{in}]$) of the consequent part of the proposed TSC-T–S fuzzy model output ($y = [y_1, y_2, \ldots, y_N]^T$) given by Eq. (30) can be computed by the global least squares method by solving the system in the following form:

$$ y = x \cdot a $$

or, written out in full, in the form of Eq. (34), where

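$$ w^f_{ik}(x_k) = \frac{\mu^f_i(x_k)}{\sum_{i'=1}^{k} \mu^f_{i'}(x_k)} \quad \text{when } i \in C_f \tag{35} $$

$$ w^r_{ik}(x_k) = \frac{\mu^r_i(x_k)}{\sum_{i'=1}^{k} \mu^r_{i'}(x_k)} \quad \text{when } i \in C_r \tag{36} $$

and $k = 1, \ldots, N$; $i = 1, \ldots, C$, with N the number of data points and C the number of rules.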

In the proposed fuzzy clustering model, threshold subtractive clustering assigns a set of rules and antecedent membership functions for rare and frequent situations that model the data behavior. Further, each rule's consequent equation (Eq. (34)) is determined by global linear least squares estimation. The advantage of this method is that it generates Gaussian membership functions (Eqs. (22) and (23)) for frequent and rare situations as fuzzy sets, which by nature have infinite support. Therefore, for every input vector a membership degree greater than 0 is computed for each fuzzy set, and hence every rule in the rule base fires. This makes it possible to describe the relationship between input and output channels accurately enough with only a couple of rules. Fig. 1 illustrates the components of the proposed model and its data flow.
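The inference and estimation steps of Section 3.2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the per-rule width `sigma` and the exact Gaussian form are assumptions standing in for Eqs. (22) and (23), and the cluster centers are taken as given rather than obtained by threshold subtractive clustering. Frequent and rare cluster centers are simply stacked, so the normalization runs over all rules as in Eq. (30).

```python
import numpy as np

def firing_strengths(X, centers, sigma):
    """Truth value of each rule (product of Gaussian memberships, cf. Eqs. 31/33).

    X: (N, n) inputs; centers: (C, n) stacked frequent and rare cluster centers;
    sigma: (C,) assumed Gaussian width of each rule. Returns an (N, C) array.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    diff = X[:, None, :] - centers[None, :, :]      # shape (N, C, n)
    sq_dist = (diff ** 2).sum(axis=2)               # product of exps = exp of sum
    return np.exp(-0.5 * sq_dist / sigma[None, :] ** 2)

def design_matrix(X, mu):
    """Rows of the matrix in Eq. (34): for each rule, w_k * [1, x_k1, ..., x_kn]."""
    X = np.asarray(X, dtype=float)
    w = mu / mu.sum(axis=1, keepdims=True)          # normalized firing strengths
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend the constant term
    return (w[:, :, None] * X1[:, None, :]).reshape(X.shape[0], -1)

def fit_consequents(X, y, centers, sigma):
    """Global least squares estimate of all consequent parameters at once."""
    A = design_matrix(X, firing_strengths(X, centers, sigma))
    a, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return a

def predict(X, centers, sigma, a):
    """Weighted average of the linear rule outputs (cf. Eq. 30)."""
    return design_matrix(X, firing_strengths(X, centers, sigma)) @ a
```

Because every Gaussian membership is strictly positive, each row of the design matrix has non-zero entries for every rule, which is the property the paragraph above relies on when it states that every rule fires.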

4. Study area and data used

The Narmada River originates at Amarkantak in the Shahdol district of Madhya Pradesh in Central India at an elevation of 1057 m above mean sea level. The river travels a distance of 1312 km before it falls into the Gulf of Cambay in the Arabian Sea near Bharuch in Gujarat. The Narmada basin extends over an area of 98,796 sq. km and lies between longitudes 72°32′E and 81°45′E and latitudes 21°20′N and 23°45′N. In the present study the upper Narmada basin up to the Mandla G&D site, covering a catchment area of 13,120 sq. km, has been selected for flood forecasting (Fig. 2). The hourly rainfall data for Jamtara, Dindori and Malankhand and the hourly discharge data at Mandla and Manot are available from 1989 to 1995. For development of a real time flood forecasting model at the Mandla site, river discharge data of the Mandla gauge site and the Manot gauge site (upstream of Mandla) and rainfall for the monsoon period have been used. Areal rainfall computed by the Thiessen polygon method serves as the input to the forecasting model. The input vector is generally selected by trial and error (Maier and Dandy, 2000).

Fig. 1. Flow chart of the TSC-T–S Fuzzy model algorithm.

Determining the number of antecedent rainfall and discharge values involves identifying the lags of rainfall and discharge that have a significant influence on the forecasted flow. The influencing values corresponding to different lags can be established through statistical analysis of the data series. A simple and effective method based on statistical analysis has been developed by Sudheer et al. (2002). The statistical parameters used for identification of inputs, namely the auto correlation function (ACF), partial auto correlation function (PACF) and cross correlation function (CCF) (Lohani et al., 2005a,b; Sudheer et al., 2002; Nayak et al., 2005a; Kisi, 2008), provide the following two model structures:

I. Model M: Considering basin rainfall and antecedent discharge at Mandla site.

The cross correlation between the spatially averaged rainfall and runoff at Mandla indicates that the rainfall at lags 16, 17 and 18 influences the runoff:

$$ Q_{Mandla,t} = f(R_{t-16}, R_{t-17}, R_{t-18}, Q_{Mandla,t-1}, Q_{Mandla,t-2}, Q_{Mandla,t-3}, Q_{Mandla,t-4}, Q_{Mandla,t-5}, Q_{Mandla,t-6}) \tag{37} $$

II. Model MM: Considering basin rainfall and antecedent discharge at Mandla and Manot gauging sites.

The cross correlation of discharge at Mandla with discharge at Manot indicates that the flow at the Manot gauging site at lags 3 and 4 influences the runoff at Mandla:

$$ Q_{Mandla,t} = f(R_{t-16}, R_{t-17}, R_{t-18}, Q_{Mandla,t-1}, Q_{Mandla,t-2}, Q_{Mandla,t-3}, Q_{Mandla,t-4}, Q_{Mandla,t-5}, Q_{Mandla,t-6}, Q_{Manot,t-3}, Q_{Manot,t-4}) \tag{38} $$

where $Q_{Mandla,t-1}$ is the observed discharge at the Mandla gauging site at t − 1 h, $Q_{Manot,t-3}$ is the observed discharge at the Manot gauging site at t − 3 h, $R_{t-16}$ is the spatially averaged rainfall at t − 16 h, and t is lead time (hours).
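As an illustration of how the input vector of Eq. (37) could be assembled from hourly series, the following Python sketch builds the lagged input matrix for Model M. The function name and arguments are hypothetical and not taken from the paper; the default lags are those stated above.

```python
import numpy as np

def build_model_m_inputs(rain, q_mandla,
                         rain_lags=(16, 17, 18),
                         q_lags=(1, 2, 3, 4, 5, 6)):
    """Return (X, y): lagged inputs and the target discharge Q_Mandla,t.

    rain and q_mandla are hourly series of equal length. Rows start at the
    first hour for which every requested lag is available.
    """
    rain = np.asarray(rain, dtype=float)
    q = np.asarray(q_mandla, dtype=float)
    max_lag = max(max(rain_lags), max(q_lags))
    t = np.arange(max_lag, len(q))                  # usable time indices
    columns = [rain[t - lag] for lag in rain_lags]  # R_{t-16}, R_{t-17}, R_{t-18}
    columns += [q[t - lag] for lag in q_lags]       # Q_{t-1}, ..., Q_{t-6}
    return np.column_stack(columns), q[t]
```

Model MM would add two further columns for the Manot discharge at lags 3 and 4 in the same way.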

5. ANN model

The flood forecasting models have also been developed using a feed forward neural network with one hidden layer, with the discharge as the target variable to be forecasted. The feed forward hierarchical architecture is the most commonly used neural network structure (Maier and Dandy, 2000). As discussed in the previous section and presented in Eqs. (37) and (38), 9 and 11 input variables have been identified for developing the real time flood forecasting models at the Mandla site. The output layer has one neuron corresponding to the forecasted discharge at time t, giving ANN architectures of 9-N-1 and 11-N-1. A sigmoid activation function was used for the hidden layer and a linear transfer function for the output layer. The optimal number of neurons (N) in the hidden layer was identified by trial and error, varying the number of hidden neurons from 2 to 10 in increments of 1. The ANN model was developed using the newff function available in the neural network toolbox of MATLAB. The weights and biases of the network were adjusted during training using the gradient descent with momentum weight and bias learning function. The optimal network architecture was selected on the basis of the minimum root mean square error (RMSE). ANNs can suffer from overfitting, which is especially dangerous because it can easily lead to predictions that are far beyond the range of the training data. To avoid overfitting, the "early stopping" method has been applied. ANN structures consisting of 9 input neurons, 4 hidden neurons and 1 output neuron (9–4–1) and of 11 input neurons, 4 hidden neurons and 1 output neuron (11–4–1) have been adopted as the best flood forecasting model structures for the models represented by Eqs. (37) and (38), respectively.


Fig. 2. Index map of study area.


6. Self Organizing Map (SOM)

The Kohonen neural network, also known as the self-organizing feature map, is a realistic, although very simplified, artificial neural network model of the human brain that belongs to the unsupervised type (Kohonen, 1982, 2001). The purpose of the SOM is to capture the topology and probability distribution of the input data. Hall and Minns (1999) indicated the feasibility of employing a Kohonen neural network for the classification of hydrologically homogeneous regions.

The learning procedure in a Kohonen map is unsupervised competitive learning. Only the winning node and its neighbors are updated during learning. The weights $w_{ij}$ are updated using the following formula:

$$ w_{ij}(new) = w_{ij}(old) + \phi\,[x_i - w_{ij}(old)] \tag{39} $$

where $x_i$ is the ith input signal, $w_{ij}$ is the weight of the connection from node i to node j and $\phi$ is the learning rate. The winning node is determined by a similarity measure, which can be the Euclidean distance or the dot product of two vectors. The Euclidean distance ($D_j$), which is the most commonly used similarity measure, is calculated as:

$$ D_j = \sqrt{\sum_{i=1}^{n} (x_i - w_{ij})^2} \tag{40} $$

The Kohonen map based data-clustering technique shows how multi-dimensional datasets can be reduced to 2-D (feature) maps that reveal clusters of similar data items (Kiang et al., 1997).
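A single competitive learning step following Eqs. (39) and (40) can be sketched in Python as below. This is only an illustration: the nodes are assumed to lie on a one-dimensional chain, and the neighborhood `radius` and the function name are hypothetical choices that the paper does not specify.

```python
import numpy as np

def som_step(weights, x, learning_rate, radius=1):
    """Update a 1-D chain of SOM nodes in place and return the winner index.

    weights: (n_nodes, n_inputs) float array, one weight vector per node.
    x: (n_inputs,) input vector.
    """
    x = np.asarray(x, dtype=float)
    # Eq. (40): Euclidean distance between the input and every node
    distances = np.sqrt(((x - weights) ** 2).sum(axis=1))
    winner = int(np.argmin(distances))
    # Only the winner and its chain neighbours within `radius` are updated
    lo = max(0, winner - radius)
    hi = min(len(weights), winner + radius + 1)
    # Eq. (39): move the selected weight vectors towards the input
    weights[lo:hi] += learning_rate * (x - weights[lo:hi])
    return winner
```

In practice the learning rate and the neighborhood radius are usually reduced as training proceeds, so that the map first orders itself globally and then fine-tunes locally.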

SOM is applied in the present study with the same model inputs and outputs as presented in Eqs. (37) and (38). Toth (2009) has mentioned that there is no predefined number of possible conditions for clustering using SOM. In this study nine nodes (each corresponding to a class) have been considered and the SOM was initially applied to the calibration set. Further, keeping in view the patterns of the input–output data, some clusters have been merged and the system has been classified into seven major classes, to obtain wider classes suitable for preserving both the hydrological distinctiveness of the classes and the accuracy of the forecast. Seven different neural networks, with ANN structures consisting of 9 input neurons, 4 hidden neurons and 1 output neuron (9–4–1) and of 11 input neurons, 4 hidden neurons and 1 output neuron (11–4–1), have been adopted as the best flood forecasting model structures in the same way as discussed in the previous section.

7. Results and discussion

The threshold subtractive clustering algorithm is employed together with the global least squares method to identify the T–S fuzzy model on the training data set. A suitable transformation of the hydrologic series aids in improving the model performance (Sudheer and Jain, 2003). Before developing the model, a logarithmic transformation as suggested by Nayak et al. (2005b) is applied to the data. Clustering partitions a data set into a number of groups such that the similarity within a group is larger than that among groups. Most similarity metrics are sensitive to the range of the elements in the input vectors, which may influence the performance of the clustering algorithm (Babuška, 1998; Höppner et al., 1999). Therefore, the transformed data set is additionally standardized as:

$$ x = \frac{x - \bar{x}}{\sigma} \tag{41} $$

where $\bar{x}\ \left(= \frac{1}{n}\sum_{i=1}^{n} x_i\right)$ is the mean and $\sigma$ is the standard deviation of the training data set. Further, the standardized data set has been normalized within the unit hypercube.
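The three preprocessing steps described above (logarithmic transformation, standardization by Eq. (41), normalization to the hypercube) can be sketched as follows. The sketch rests on assumptions that the paper does not state: the natural logarithm is used, all values are strictly positive (zero rainfall would need an offset), and the hypercube normalization is a min–max scaling to [0, 1] fitted on the training data. The function names are hypothetical.

```python
import numpy as np

def fit_scaler(train):
    """Estimate transformation parameters from the training data only."""
    z = np.log(np.asarray(train, dtype=float))   # assumes strictly positive data
    mean, std = z.mean(axis=0), z.std(axis=0)
    s = (z - mean) / std                          # Eq. (41)
    return {"mean": mean, "std": std, "lo": s.min(axis=0), "hi": s.max(axis=0)}

def transform(x, params):
    """Apply log transform, standardization and [0, 1] scaling column-wise."""
    z = np.log(np.asarray(x, dtype=float))
    s = (z - params["mean"]) / params["std"]
    return (s - params["lo"]) / (params["hi"] - params["lo"])
```

Fitting the parameters on the training set and reusing them for validation data keeps the validation period independent of the calibration step.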

In total, six parameters ($\eta, r_{af}, r_{ar}, \bar{\varepsilon}, \varepsilon, s$) influence the number of rules in the proposed threshold subtractive clustering approach. Values of $\eta = 1.5$, $\bar{\varepsilon} = 0.5$ and $\varepsilon = 0.15$ (Chiu, 1994) are often used in the subtractive clustering approach. The value of the threshold parameter (s) varies from 0 to 1. A value of s = 0 or 1 indicates that the data set is treated as a single series without dividing it into rare and frequent events. The subtractive clustering algorithm (Chiu, 1994) is therefore a special case of the proposed threshold subtractive clustering algorithm when s = 0 or 1 and $r_{af} = r_{ar}$. To compute the value of the threshold parameter (s), the training data set was sorted in ascending order. A plot of the sorted data provides a basis for classifying the data set into rare and frequent events and thus helps in deciding the value of s.

In order to find the optimum cluster centers, and thus the optimum fuzzy model, the threshold subtractive clustering algorithm is initially used as a subtractive clustering algorithm (considering s = 0 or 1 and $r_{af} = r_{ar}$), and the cluster radius was varied between 0.1 and 1 in steps of 0.02. These cluster centers, and thus the Gaussian membership functions obtained from the training data set, were used to compute the consequent parameters through a linear least squares method, and a model was built. The optimal parameter combination of the model was sought by evaluating the model with the global performance indices (Table 1): the root mean square error (RMSE) between the computed and observed discharge, the correlation coefficient and the model efficiency (Nash and Sutcliffe, 1970). Once the optimization process is finished, the optimized membership functions for each input variable and the consequent parameters define an optimized T–S fuzzy model. This provides a conventional T–S fuzzy model.

Table 1. Global performance evaluation criteria.

$$ \text{Coefficient of Correlation} = \frac{\sum_{i=1}^{N} (Q_o - \bar{Q}_o)(Q_p - \bar{Q}_p)}{\sqrt{\sum_{i=1}^{N} (Q_o - \bar{Q}_o)^2 \sum_{i=1}^{N} (Q_p - \bar{Q}_p)^2}} $$

$$ \text{Root Mean Square Error} = \sqrt{\frac{\sum_{i=1}^{N} (Q_{oi} - Q_{pi})^2}{N}} $$

$$ \text{Nash Sutcliffe Efficiency} = 100 \times \left[ 1 - \frac{\sum_{i=1}^{N} (Q_o - Q_p)^2}{\sum_{i=1}^{N} (Q_o - \bar{Q}_o)^2} \right] $$

where N is the number of observations; $Q_o$ is the observed flow; $Q_p$ is the predicted flow; $\bar{Q}_o$ is the mean of the observed flow; and $\bar{Q}_p$ is the mean of the predicted flow.
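The three global indices of Table 1 translate directly into code. The following Python sketch is an illustration with a hypothetical function name; it assumes two equally long series of observed and predicted flows that are not constant.

```python
import numpy as np

def global_indices(q_obs, q_pred):
    """Return the correlation coefficient, RMSE and Nash-Sutcliffe efficiency (%)."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    do = q_obs - q_obs.mean()        # deviations of observed flow from its mean
    dp = q_pred - q_pred.mean()      # deviations of predicted flow from its mean
    correlation = (do * dp).sum() / np.sqrt((do ** 2).sum() * (dp ** 2).sum())
    rmse = np.sqrt(((q_obs - q_pred) ** 2).mean())
    efficiency = 100.0 * (1.0 - ((q_obs - q_pred) ** 2).sum() / (do ** 2).sum())
    return {"R": correlation, "RMSE": rmse, "efficiency": efficiency}
```

A constant bias leaves the correlation at 1 while lowering the efficiency, which is one reason the three indices are reported together.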

For developing the TSC-T–S fuzzy model, the graphically selected value of s along with the values $\eta = 1.5$, $\bar{\varepsilon} = 0.5$ and $\varepsilon = 0.15$ were considered. For training the TSC-T–S fuzzy model, the values of $r_{ar}$ and $r_{af}$ were varied in steps of 0.02 between 0.1 and 1, while the values of the other parameters (viz. $\eta$, $\bar{\varepsilon}$, $\varepsilon$) were fixed. The number of rules obtained depends on the respective parameter values, and a large influence is observed for $r_{af}$ and $r_{ar}$. Further, smaller values of the radius of influence of the clusters ($r_{ar}$ and $r_{af}$) result in models with a higher number of rules.

Since the model subdivides the entire data set into two sets, a better representation of rare events is obtained through revised cluster centers. The hard boundary between frequent and rare events, which is used for independently clustering the data of the two separate groups, is removed, and the cluster centers obtained for frequent and rare events are combined. Further, the membership function is computed at each cluster center using Eqs. (22) and (23). These membership functions overlap; therefore the input data falling in the frequent events have some membership in the rare events and, vice versa, the input data falling in the rare events have some membership in the frequent events. The algorithm gives equal weight to rare and frequent events by independently computing the cluster centers in the two data sets. Therefore, it improves the performance of the model particularly for rare events, and thus the overall model performance also increases. An optimal threshold value, which provides the best fuzzy model, can be obtained by varying the value of s in steps of 0.1 on both the lower and the higher side (s ± 0.1) and recomputing the clusters and thus the T–S fuzzy model. The performance of the model may be further improved by considering an optimal combination ($r_{af}$, $r_{ar}$) of cluster radii for frequent and rare events. To achieve this, the values of $r_{ar}$ and $r_{af}$ are varied in steps of 0.02 to obtain the combination of parameters that produces the TSC-T–S fuzzy model with the best performance indices. Different values of the cluster radius for rare and frequent events result in membership functions of different widths. This indicates more linguistic relevance of the model.

Table 2. Statistical properties of the data selected for modeling (calibration period).

| Model No. | Rare events (%) | Period | Qmin (m3/s) | Qmax (m3/s) |
|---|---|---|---|---|
| M8990/MM8990 | 21 | 1989–1990 | 8.86 | 2097.4 |
| M9091/MM9091 | 23 | 1990–1991 | 5.76 | 11794.0 |
| M9192/MM9192 | 24 | 1991–1992 | 2.64 | 4354.5 |
| M9293/MM9293 | 24 | 1992–1993 | 2.64 | 4354.5 |
| M9394/MM9394 | 24 | 1993–1994 | 3.15 | 6330.4 |
| M8994/MM8994 | 25 | 1989–1994 | 2.64 | 11794.0 |

Further, using the methodology explained earlier, these cluster centers were used to compute the consequent parameters. By training the model with different cluster radii and threshold values, the optimal combination of $r_{af}$, $r_{ar}$ and s that produces the best TSC-T–S fuzzy model is identified. The process is repeated for each data set separately.

The model performance indices such as correlation, efficiency and RMSE indicate the overall model performance. To describe the model performance throughout the calibration and validation periods and to test the robustness of the developed model, performance evaluation criteria such as the average absolute relative error (AARE) and threshold statistics (Jain and Ormsbee, 2002; Nayak et al., 2005a) have been employed extensively in the literature. The AARE statistic provides an overall performance index in terms of the absolute relative error between observed and predicted flow (absolute prediction error), while the threshold statistic (TS) provides the distribution of the absolute prediction error in terms of the number of data points considered in calibration and validation. These statistics can be calculated using the following equations:

$$ AARE = \frac{1}{n} \sum_{i=1}^{n} |\xi_i| \tag{42} $$

in which

$$ \xi_i = \frac{Q^o_i - Q^p_i}{Q^p_i} \times 100 \tag{43} $$

where $\xi_i$ is the relative error between observed and predicted flow in %, $Q^o_i$ is the observed flow and $Q^p_i$ is the predicted flow.

$$ TS_j = \frac{y_j}{N} \times 100 \tag{44} $$

where $y_j$ is the number of stream flows (out of a total of N computed stream flows) for which the absolute relative error between computed and observed flows is less than j%.
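Eqs. (42) to (44) can be illustrated with a short Python sketch. The function names are hypothetical. The relative error is normalized by the predicted flow, following Eq. (43) as printed; studies that normalize by the observed flow instead would change only the denominator.

```python
import numpy as np

def relative_error(q_obs, q_pred):
    """Relative error in percent, Eq. (43) (denominator: predicted flow)."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    return 100.0 * (q_obs - q_pred) / q_pred

def aare(q_obs, q_pred):
    """Average absolute relative error, Eq. (42)."""
    return np.abs(relative_error(q_obs, q_pred)).mean()

def threshold_statistic(q_obs, q_pred, j):
    """Percentage of points whose absolute relative error is below j%, Eq. (44)."""
    abs_err = np.abs(relative_error(q_obs, q_pred))
    return 100.0 * np.count_nonzero(abs_err < j) / abs_err.size
```

For example, observed flows of 110, 100, 90 and 200 against a constant prediction of 100 give relative errors of 10, 0, −10 and 100%, hence AARE = 30, TS5 = 25 and TS20 = 75.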

It is clear from the definitions that higher values of TS and lower values of AARE indicate a better model. In flood forecasting, it is very important to know the performance of the flow forecasting model in predicting higher magnitude flows. The performance criteria described above do not express the prediction ability of the model precisely across the range from high to low flows. Therefore, a new model performance criterion is introduced here, termed the peak percent threshold statistics of prediction between the top u% and l% of the data ($PPTS_{(l,u)}$). $PPTS_{(l,u)}$ is the average absolute relative error in the prediction of flows lying in the band between the top u% and l% of the data. To compute $PPTS_{(l,u)}$, the observed data are arranged in descending order and Eq. (45) is used.

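Table 2 (continued). Calibration-period standard deviation and validation-period statistics, with rows in the same model order as the calibration-period part of the table.

| Model No. | Calibration SD (m3/s) | Validation period | Qmin (m3/s) | Qmax (m3/s) | SD (m3/s) |
|---|---|---|---|---|---|
| M8990/MM8990 | 282.1 | 1991 | 5.76 | 11794.0 | 1051.6 |
| M9091/MM9091 | 792.8 | 1992 | 2.64 | 4354.5 | 513.6 |
| M9192/MM9192 | 513.6 | 1993 | 5.30 | 3379.3 | 350.5 |
| M9293/MM9293 | 436.7 | 1994 | 3.15 | 6330.4 | 944.1 |
| M9394/MM9394 | 735.4 | 1995 | 4.37 | 3730.4 | 417.3 |
| M8994/MM8994 | 677.4 | 1995 | 4.37 | 3730.4 | 417.3 |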


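Table 3. Performance indices of 1 h lead models (model M). Each cell gives the values for the two models named in the "Models" column, in that order.

| Index | Models | M9394 calibration | M9394 validation (1995) | M9394 validation (1989–1992) | M8994 calibration | M8994 validation |
|---|---|---|---|---|---|---|
| Correlation | TSC-T–S / SOM | 0.9911 / 0.9816 | 0.9915 / 0.9826 | 0.9901 / 0.9819 | 0.9901 / 0.9819 | 0.9905 / 0.9822 |
| Correlation | SC-T–S / ANN | 0.9828 / 0.9760 | 0.9830 / 0.9771 | 0.9823 / 0.9752 | 0.9828 / 0.9732 | 0.9840 / 0.9740 |
| Efficiency | TSC-T–S / SOM | 96.3102 / 95.6218 | 96.5286 / 95.8391 | 96.4537 / 95.7213 | 96.3721 / 95.5989 | 96.4545 / 95.7712 |
| Efficiency | SC-T–S / ANN | 95.8045 / 95.2321 | 95.8888 / 95.6092 | 95.8144 / 95.5010 | 95.6424 / 95.4762 | 95.8153 / 95.5161 |
| RMSE (m3/s) | TSC-T–S / SOM | 88.731 / 91.811 | 88.689 / 91.003 | 88.712 / 92.772 | 84.002 / 85.141 | 83.830 / 85.013 |
| RMSE (m3/s) | SC-T–S / ANN | 90.452 / 95.447 | 89.919 / 94.749 | 90.052 / 95.591 | 84.452 / 85.539 | 84.931 / 85.429 |
| AARE | TSC-T–S / SOM | 3.789 / 3.941 | 3.756 / 3.867 | 3.854 / 3.963 | 3.623 / 3.762 | 3.546 / 3.788 |
| AARE | SC-T–S / ANN | 3.924 / 4.102 | 3.803 / 4.061 | 3.907 / 4.187 | 3.663 / 3.928 | 3.595 / 3.842 |
| TS1 | TSC-T–S / SOM | 46.280 / 45.976 | 49.316 / 47.841 | 49.504 / 47.827 | 49.021 / 45.616 | 49.125 / 47.158 |
| TS1 | SC-T–S / ANN | 45.525 / 45.503 | 47.832 / 47.219 | 47.941 / 47.492 | 45.924 / 44.731 | 47.733 / 45.805 |
| TS2 | TSC-T–S / SOM | 54.123 / 53.747 | 55.115 / 54.102 | 56.319 / 55.231 | 56.549 / 54.494 | 56.668 / 54.289 |
| TS2 | SC-T–S / ANN | 53.633 / 52.164 | 54.030 / 53.627 | 55.228 / 54.174 | 53.723 / 52.679 | 53.801 / 52.730 |
| TS5 | TSC-T–S / SOM | 78.481 / 77.201 | 80.784 / 78.012 | 82.782 / 81.905 | 82.531 / 81.824 | 82.620 / 81.796 |
| TS5 | SC-T–S / ANN | 76.749 / 73.263 | 79.542 / 76.778 | 82.221 / 80.441 | 81.912 / 80.997 | 82.057 / 80.312 |
| TS10 | TSC-T–S / SOM | 93.318 / 92.221 | 94.083 / 93.114 | 93.892 / 93.207 | 93.631 / 92.297 | 93.884 / 91.465 |
| TS10 | SC-T–S / ANN | 91.227 / 90.193 | 92.610 / 92.020 | 92.787 / 92.992 | 92.457 / 91.552 | 92.741 / 90.953 |
| TS20 | TSC-T–S / SOM | 96.746 / 95.229 | 98.598 / 97.672 | 99.305 / 98.839 | 99.104 / 97.219 | 99.293 / 98.401 |
| TS20 | SC-T–S / ANN | 95.206 / 94.761 | 97.689 / 97.509 | 98.417 / 97.642 | 97.224 / 96.233 | 98.379 / 97.555 |
| TS50 | TSC-T–S / SOM | 99.071 / 98.389 | 96.684 / 98.242 | 97.101 / 98.011 | 98.354 / 97.771 | 99.083 / 99.092 |
| TS50 | SC-T–S / ANN | 98.414 / 97.110 | 99.056 / 99.836 | 98.143 / 99.768 | 97.812 / 96.564 | 99.121 / 98.734 |
| PPTS(2) | TSC-T–S / SOM | 3.181 / 5.973 | 3.310 / 3.602 | 3.296 / 3.638 | 2.801 / 3.308 | 2.842 / 3.291 |
| PPTS(2) | SC-T–S / ANN | 3.421 / 6.684 | 3.246 / 3.563 | 3.212 / 3.623 | 2.983 / 3.457 | 3.035 / 3.593 |
| PPTS(3) | TSC-T–S / SOM | 5.415 / 6.029 | 4.913 / 6.129 | 4.577 / 6.093 | 3.481 / 4.113 | 3.537 / 4.228 |
| PPTS(3) | SC-T–S / ANN | 5.844 / 6.717 | 5.750 / 6.648 | 5.681 / 6.491 | 3.544 / 4.327 | 3.630 / 4.450 |
| PPTS(5) | TSC-T–S / SOM | 8.937 / 9.604 | 8.877 / 9.595 | 8.988 / 9.711 | 7.902 / 7.989 | 7.976 / 8.091 |
| PPTS(5) | SC-T–S / ANN | 9.621 / 9.992 | 9.454 / 9.897 | 9.712 / 9.903 | 7.772 / 7.903 | 7.881 / 8.124 |
| PPTS(10) | TSC-T–S / SOM | 6.221 / 6.733 | 6.019 / 6.587 | 6.446 / 6.779 | 6.104 / 6.544 | 6.335 / 6.572 |
| PPTS(10) | SC-T–S / ANN | 6.742 / 7.102 | 6.527 / 6.909 | 6.636 / 6.914 | 6.256 / 6.667 | 6.392 / 6.719 |
| PPTS(20) | TSC-T–S / SOM | 5.896 / 6.593 | 4.866 / 5.428 | 5.218 / 5.560 | 5.003 / 5.195 | 5.078 / 5.496 |
| PPTS(20) | SC-T–S / ANN | 5.637 / 6.704 | 5.444 / 5.569 | 5.4725 / 5.639 | 5.312 / 5.497 | 5.375 / 5.581 |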

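Table 4. Performance indices of 1 h lead models (model MM). Each cell gives the values for the two models named in the "Models" column, in that order.

| Index | Models | M9394 calibration | M9394 validation | M8994 calibration | M8994 validation |
|---|---|---|---|---|---|
| Correlation | TSC-T–S / SOM | 0.9931 / 0.9891 | 0.9927 / 0.9869 | 0.9927 / 0.9887 | 0.9926 / 0.9869 |
| Correlation | SC-T–S / ANN | 0.9894 / 0.9834 | 0.9874 / 0.9823 | 0.9887 / 0.9819 | 0.9882 / 0.9815 |
| Efficiency | TSC-T–S / SOM | 96.850 / 96.582 | 96.580 / 95.898 | 97.500 / 96.881 | 96.730 / 95.939 |
| Efficiency | SC-T–S / ANN | 96.570 / 96.480 | 95.970 / 95.690 | 97.100 / 96.500 | 96.020 / 95.540 |
| RMSE (m3/s) | TSC-T–S / SOM | 83.324 / 87.233 | 88.123 / 91.013 | 80.283 / 84.006 | 82.127 / 83.829 |
| RMSE (m3/s) | SC-T–S / ANN | 84.721 / 89.962 | 88.552 / 93.578 | 82.455 / 85.534 | 83.726 / 84.323 |
| AARE | TSC-T–S / SOM | 3.567 / 3.872 | 3.544 / 3.922 | 5.501 / 5.811 | 3.582 / 3.767 |
| AARE | SC-T–S / ANN | 3.782 / 4.040 | 3.880 / 4.139 | 5.577 / 6.073 | 3.669 / 3.915 |
| TS1 | TSC-T–S / SOM | 47.221 / 46.911 | 49.832 / 48.641 | 50.538 / 46.201 | 49.265 / 46.557 |
| TS1 | SC-T–S / ANN | 46.389 / 46.167 | 48.660 / 48.110 | 45.956 / 45.532 | 47.870 / 45.840 |
| TS2 | TSC-T–S / SOM | 56.548 / 56.403 | 55.474 / 54.117 | 58.534 / 56.008 | 57.512 / 54.812 |
| TS2 | SC-T–S / ANN | 56.429 / 56.253 | 54.840 / 53.670 | 56.181 / 55.945 | 54.910 / 53.600 |
| TS5 | TSC-T–S / SOM | 83.869 / 82.003 | 82.736 / 79.569 | 84.325 / 83.394 | 82.820 / 81.977 |
| TS5 | SC-T–S / ANN | 83.385 / 81.228 | 80.240 / 76.842 | 83.620 / 82.982 | 82.210 / 81.130 |
| TS10 | TSC-T–S / SOM | 93.729 / 93.128 | 94.225 / 93.428 | 94.137 / 93.331 | 94.397 / 92.339 |
| TS10 | SC-T–S / ANN | 93.671 / 92.789 | 93.290 / 92.181 | 93.327 / 93.270 | 93.420 / 91.740 |
| TS20 | TSC-T–S / SOM | 98.069 / 97.907 | 98.867 / 97.944 | 99.392 / 97.309 | 99.330 / 98.957 |
| TS20 | SC-T–S / ANN | 97.954 / 97.156 | 97.890 / 97.793 | 97.418 / 97.200 | 98.530 / 98.330 |
| TS50 | TSC-T–S / SOM | 99.442 / 98.949 | 99.171 / 99.453 | 99.082 / 99.128 | 99.610 / 99.438 |
| TS50 | SC-T–S / ANN | 99.213 / 98.575 | 99.710 / 99.861 | 99.065 / 99.057 | 99.520 / 99.230 |
| PPTS(2) | TSC-T–S / SOM | 3.162 / 6.422 | 3.205 / 3.579 | 2.618 / 3.023 | 2.833 / 3.248 |
| PPTS(2) | SC-T–S / ANN | 3.419 / 6.563 | 3.121 / 3.516 | 2.815 / 3.261 | 3.022 / 3.573 |
| PPTS(3) | TSC-T–S / SOM | 5.085 / 5.227 | 4.832 / 5.843 | 3.367 / 4.025 | 3.527 / 4.007 |
| PPTS(3) | SC-T–S / ANN | 5.067 / 5.794 | 5.579 / 6.249 | 3.388 / 4.151 | 3.614 / 4.426 |
| PPTS(5) | TSC-T–S / SOM | 5.079 / 6.116 | 8.241 / 9.018 | 7.647 / 7.688 | 7.952 / 8.155 |
| PPTS(5) | SC-T–S / ANN | 5.095 / 6.403 | 9.188 / 9.415 | 7.504 / 7.634 | 7.846 / 8.079 |
| PPTS(10) | TSC-T–S / SOM | 6.128 / 6.462 | 5.920 / 6.012 | 5.511 / 6.002 | 6.316 / 6.601 |
| PPTS(10) | SC-T–S / ANN | 6.292 / 6.751 | 6.212 / 6.233 | 5.651 / 6.089 | 6.363 / 6.683 |
| PPTS(20) | TSC-T–S / SOM | 5.434 / 5.887 | 4.722 / 5.136 | 4.110 / 4.434 | 5.063 / 5.329 |
| PPTS(20) | SC-T–S / ANN | 5.485 / 6.036 | 4.969 / 5.216 | 4.536 / 4.685 | 5.351 / 5.551 |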


Fig. 3. Variation of correlation coefficients along the forecast time horizon for different data sets of river Narmada at Mandla gauging site (validation result-model MM).


$$ PPTS_{(l,u)} = \frac{1}{(k_l - k_u + 1)} \sum_{i=k_l}^{k_u} |\xi_i| \tag{45} $$

in which $k_l = \frac{l \times N}{100}$ and $k_u = \frac{u \times N}{100}$, where l and u are respectively the lower and higher limits in percentage, N is the number of data and $\xi_i$ is the relative error of the ith data point. This statistic can map the performance of the model in various magnitude ranges of the data. When u = 100%, $PPTS_{(l,u)}$ can be written as $PPTS_{(l,100)}$ or simply $PPTS_{(l)}$. Further, $PPTS_{(l)}$ indicates the peak percent threshold statistics of the top l% of the data. Similarly, the same statistic can be used for evaluating the model performance in low flow modeling by taking the smallest l% of the data from the descending series.
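The statistic for the top l% of the flows, $PPTS_{(l)}$, can be sketched in Python as follows. This is an interpretation of the verbal definition above rather than a literal transcription of Eq. (45): the observed flows are sorted in descending order and the absolute relative errors of the first k points are averaged. The function name, the rounding of k to the nearest integer, and the use of the predicted flow in the denominator of the relative error (as in Eq. (43)) are assumptions.

```python
import numpy as np

def ppts_top(q_obs, q_pred, l):
    """Average absolute relative error (%) over the top l% of observed flows."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    order = np.argsort(q_obs)[::-1]                  # indices of flows, largest first
    k = max(1, int(round(l * q_obs.size / 100.0)))   # number of points in the top l%
    xi = 100.0 * (q_obs - q_pred) / q_pred           # relative error in percent
    return np.abs(xi[order][:k]).mean()
```

Evaluating the function for l = 2, 3, 5, 10 and 20 gives a profile of how the prediction error changes from the highest peaks towards more moderate flows.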

7.1. Forecasting at very short time (1 h lead time)

For the observed flow data of the Mandla gauging site, the forecasting models (Eqs. (37) and (38)) have been calibrated and validated for six different data sets (Table 2) in order to verify the robustness of forecasting models developed using data of different periods and lengths. In the first five cases (i.e. M8990–M9394 and MM8990–MM9394), 2 years of data were used for model calibration and 1 year of data for model validation; in the sixth case (M8994 or MM8994), 6 years of data were used for calibration and 1 year of data for validation. Further, model M9394 is also validated for the period 1989–1992 to verify the model robustness for a varying validation period (Table 3). Varying the length of the calibration period is also useful in verifying the effect of input data length on model development and performance. Furthermore, the input data were divided into two regimes, namely frequent and rare events, by arranging them in ascending order. It is observed that the flow data of the river Narmada at the Mandla and Manot sites are composed of about 76% to 80% frequent events and 24% to 20% rare events during the different selected periods. After classifying the input data set into two classes, the value of s is taken as 0.76–0.80 for the different cases. Using these values of s and varying the radii of influence for frequent and rare events between 0.1 and 0.9 with a step size of 0.01, TSC-T–S fuzzy models were developed. Furthermore, using each data set, ANN, SOM, SC-T–S fuzzy and TSC-T–S fuzzy models were developed to predict river flows for a 1 h lead period. The values of the performance indices of the models developed using the two model input structures (Eqs. (37) and (38)), considering the calibration periods 1993–1994 and 1989–1994 and a forecast of 1 h lead period at the Mandla site, are presented in detail in Tables 3 and 4. Further, the correlation coefficients for the different models developed using six different data sets and two different input model structures (Models MM) are illustrated in Fig. 3. The plotted values of the correlation coefficient indicate a definite improvement


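Table 5. Performance indices of >1 h lead models (validation results, model M; data set M8994). Each cell gives the values for the two models named in the "Models" column, in that order.

| Index | Models | 2 h | 3 h | 4 h | 5 h | 6 h |
|---|---|---|---|---|---|---|
| Correlation | TSC-T–S / SOM | 0.9854 / 0.9745 | 0.9744 / 0.9534 | 0.9598 / 0.9631 | 0.9483 / 0.9445 | 0.9157 / 0.9334 |
| Correlation | SC-T–S / ANN | 0.9745 / 0.9690 | 0.9687 / 0.9587 | 0.9587 / 0.9575 | 0.9466 / 0.9422 | 0.9129 / 0.9082 |
| Efficiency | TSC-T–S / SOM | 91.260 / 91.107 | 89.220 / 88.670 | 86.510 / 85.883 | 83.580 / 82.274 | 78.630 / 77.107 |
| Efficiency | SC-T–S / ANN | 91.190 / 90.980 | 89.020 / 88.440 | 86.140 / 85.390 | 83.150 / 81.910 | 77.560 / 76.660 |
| RMSE (m3/s) | TSC-T–S / SOM | 115.770 / 120.826 | 152.420 / 154.551 | 191.864 / 204.309 | 236.264 / 241.089 | 278.321 / 290.054 |
| RMSE (m3/s) | SC-T–S / ANN | 117.300 / 122.141 | 152.645 / 155.239 | 196.270 / 208.334 | 238.350 / 242.760 | 284.980 / 292.385 |
| AARE | TSC-T–S / SOM | 5.485 / 5.604 | 7.373 / 9.556 | 11.673 / 12.131 | 12.453 / 14.341 | 18.369 / 19.375 |
| AARE | SC-T–S / ANN | 5.555 / 5.724 | 9.441 / 9.709 | 12.254 / 12.811 | 14.974 / 15.586 | 18.562 / 19.669 |
| TS1 | TSC-T–S / SOM | 25.780 / 24.776 | 23.230 / 21.898 | 10.090 / 7.234 | 8.240 / 6.899 | 5.370 / 4.557 |
| TS1 | SC-T–S / ANN | 25.260 / 24.070 | 22.790 / 21.350 | 8.910 / 6.790 | 7.240 / 6.050 | 4.640 / 4.480 |
| TS2 | TSC-T–S / SOM | 37.420 / 36.834 | 33.360 / 32.489 | 18.570 / 14.778 | 14.130 / 13.874 | 11.130 / 10.828 |
| TS2 | SC-T–S / ANN | 37.220 / 36.360 | 32.460 / 32.300 | 18.450 / 14.270 | 13.790 / 13.130 | 10.790 / 10.100 |
| TS5 | TSC-T–S / SOM | 61.610 / 61.619 | 57.830 / 54.993 | 36.970 / 35.697 | 29.680 / 28.498 | 24.540 / 23.732 |
| TS5 | SC-T–S / ANN | 61.500 / 61.390 | 57.760 / 54.780 | 36.960 / 35.140 | 29.690 / 28.450 | 24.480 / 23.530 |
| TS10 | TSC-T–S / SOM | 84.990 / 82.980 | 72.420 / 71.867 | 62.550 / 61.891 | 59.630 / 58.211 | 42.920 / 40.889 |
| TS10 | SC-T–S / ANN | 84.120 / 82.560 | 72.010 / 71.600 | 62.110 / 61.670 | 58.750 / 58.320 | 42.650 / 40.740 |
| TS20 | TSC-T–S / SOM | 94.200 / 94.343 | 88.790 / 87.451 | 83.380 / 81.731 | 77.910 / 77.902 | 71.750 / 71.472 |
| TS20 | SC-T–S / ANN | 94.150 / 94.280 | 88.770 / 87.130 | 83.320 / 81.690 | 77.700 / 77.820 | 71.530 / 71.140 |
| TS50 | TSC-T–S / SOM | 99.260 / 99.391 | 97.520 / 97.512 | 96.600 / 95.882 | 94.830 / 94.190 | 92.900 / 92.379 |
| TS50 | SC-T–S / ANN | 99.210 / 99.260 | 97.330 / 97.380 | 96.190 / 95.290 | 94.810 / 94.160 | 92.730 / 92.190 |
| PPTS(2) | TSC-T–S / SOM | 6.120 / 6.832 | 8.560 / 10.127 | 11.760 / 13.324 | 14.180 / 16.116 | 17.200 / 18.269 |
| PPTS(2) | SC-T–S / ANN | 6.630 / 7.040 | 9.330 / 11.050 | 12.560 / 13.130 | 15.790 / 16.790 | 18.180 / 18.940 |
| PPTS(3) | TSC-T–S / SOM | 6.810 / 8.117 | 10.770 / 12.473 | 14.680 / 16.815 | 18.370 / 21.228 | 23.360 / 26.214 |
| PPTS(3) | SC-T–S / ANN | 7.660 / 8.380 | 11.680 / 13.130 | 14.770 / 16.970 | 20.900 / 23.450 | 25.190 / 26.610 |
| PPTS(5) | TSC-T–S / SOM | 15.320 / 15.912 | 23.270 / 24.006 | 28.200 / 31.114 | 31.620 / 35.739 | 38.140 / 38.703 |
| PPTS(5) | SC-T–S / ANN | 15.870 / 16.280 | 24.060 / 24.170 | 30.040 / 31.790 | 34.140 / 37.600 | 38.440 / 39.110 |
| PPTS(10) | TSC-T–S / SOM | 11.540 / 11.488 | 16.940 / 17.429 | 22.090 / 23.911 | 25.980 / 27.791 | 29.420 / 31.381 |
| PPTS(10) | SC-T–S / ANN | 11.670 / 11.510 | 17.380 / 17.910 | 22.680 / 24.780 | 26.690 / 28.300 | 30.610 / 31.720 |
| PPTS(20) | TSC-T–S / SOM | 9.260 / 9.266 | 14.310 / 14.449 | 19.260 / 20.827 | 22.940 / 23.002 | 26.400 / 26.907 |
| PPTS(20) | SC-T–S / ANN | 9.310 / 9.280 | 14.490 / 14.540 | 19.250 / 20.810 | 23.080 / 23.170 | 26.580 / 27.180 |

Table 6. Performance indices of >1 h lead models (validation results, model MM; data set M8994). Each cell gives the values for the two models named in the "Models" column, in that order.

| Index | Models | 2 h | 3 h | 4 h | 5 h | 6 h |
|---|---|---|---|---|---|---|
| Correlation | TSC-T–S / SOM | 0.9880 / 0.9772 | 0.9820 / 0.9681 | 0.9760 / 0.9632 | 0.9660 / 0.9552 | 0.9520 / 0.9805 |
| Correlation | SC-T–S / ANN | 0.9770 / 0.9720 | 0.9690 / 0.9610 | 0.9640 / 0.9590 | 0.9560 / 0.9480 | 0.9430 / 0.9320 |
| Efficiency | TSC-T–S / SOM | 92.159 / 91.483 | 90.089 / 88.979 | 87.356 / 86.227 | 84.4039 / 82.847 | 79.401 / 77.027 |
| Efficiency | SC-T–S / ANN | 91.510 / 91.191 | 89.331 / 88.648 | 86.444 / 85.586 | 83.4451 / 82.100 | 77.832 / 76.833 |
| RMSE (m3/s) | TSC-T–S / SOM | 115.765 / 118.341 | 152.123 / 154.408 | 191.558 / 204.878 | 235.940 / 241.045 | 277.300 / 288.521 |
| RMSE (m3/s) | SC-T–S / ANN | 117.298 / 121.460 | 152.270 / 154.700 | 196.267 / 207.100 | 237.352 / 242.756 | 283.977 / 290.000 |
| AARE | TSC-T–S / SOM | 5.430 / 5.523 | 7.299 / 9.458 | 11.556 / 12.080 | 12.328 / 14.461 | 18.185 / 19.116 |
| AARE | SC-T–S / ANN | 5.444 / 5.581 | 9.252 / 9.466 | 12.009 / 12.491 | 14.674 / 15.197 | 18.191 / 19.177 |
| TS1 | TSC-T–S / SOM | 26.033 / 24.686 | 23.459 / 21.948 | 10.187 / 7.678 | 8.319 / 6.778 | 5.422 / 4.847 |
| TS1 | SC-T–S / ANN | 25.346 / 24.128 | 22.869 / 21.397 | 8.939 / 6.807 | 7.263 / 6.065 | 4.651 / 4.493 |
| TS2 | TSC-T–S / SOM | 37.783 / 36.917 | 33.691 / 32.589 | 18.751 / 14.998 | 14.269 / 13.451 | 11.242 / 10.904 |
| TS2 | SC-T–S / ANN | 37.350 / 36.445 | 32.577 / 32.376 | 18.520 / 14.305 | 13.835 / 13.159 | 10.826 / 10.122 |
| TS5 | TSC-T–S / SOM | 62.217 / 61.705 | 58.394 / 55.742 | 37.330 / 35.806 | 29.969 / 28.988 | 24.775 / 23.671 |
| TS5 | SC-T–S / ANN | 61.713 / 61.528 | 57.965 / 54.907 | 37.089 / 35.225 | 29.794 / 28.514 | 24.569 / 23.581 |
| TS10 | TSC-T–S / SOM | 85.822 / 83.855 | 73.131 / 72.019 | 63.160 / 62.371 | 60.215 / 58.991 | 43.342 / 41.130 |
| TS10 | SC-T–S / ANN | 84.416 / 82.751 | 72.265 / 71.768 | 62.325 / 61.807 | 58.955 / 58.456 | 42.795 / 40.833 |
| TS20 | TSC-T–S / SOM | 95.123 / 94.652 | 89.663 / 87.957 | 84.199 / 82.031 | 78.671 / 78.012 | 72.457 / 71.807 |
| TS20 | SC-T–S / ANN | 94.481 / 94.497 | 89.081 / 87.330 | 83.610 / 81.883 | 77.969 / 77.995 | 71.779 / 71.302 |
| TS50 | TSC-T–S / SOM | 100.235 / 99.525 | 98.480 / 97.789 | 97.551 / 95.879 | 95.761 / 94.878 | 93.811 / 92.423 |
| TS50 | SC-T–S / ANN | 99.563 / 99.491 | 97.670 / 97.601 | 96.534 / 95.512 | 95.149 / 94.376 | 93.056 / 92.399 |
| PPTS(2) | TSC-T–S / SOM | 5.944 / 6.439 | 8.309 / 10.014 | 11.413 / 12.731 | 13.762 / 15.007 | 16.692 / 17.334 |
| PPTS(2) | SC-T–S / ANN | 6.348 / 6.674 | 8.927 / 10.481 | 12.021 / 12.451 | 15.109 / 15.926 | 17.399 / 17.964 |
| PPTS(3) | TSC-T–S / SOM | 6.612 / 7.537 | 10.455 / 11.929 | 14.251 / 16.521 | 17.826 / 21.116 | 22.670 / 24.636 |
| PPTS(3) | SC-T–S / ANN | 7.329 / 7.949 | 11.174 / 12.454 | 14.138 / 16.100 | 20.005 / 22.239 | 24.108 / 25.238 |
| PPTS(5) | TSC-T–S / SOM | 14.867 / 15.220 | 22.588 / 22.867 | 27.368 / 29.367 | 30.692 / 33.215 | 37.018 / 37.855 |
| PPTS(5) | SC-T–S / ANN | 15.186 / 15.436 | 23.022 / 22.921 | 28.753 / 30.153 | 32.675 / 35.664 | 36.786 / 37.094 |
| PPTS(10) | TSC-T–S / SOM | 11.196 / 10.987 | 16.444 / 16.773 | 21.444 / 23.243 | 25.212 / 26.451 | 28.558 / 29.244 |
| PPTS(10) | SC-T–S / ANN | 11.166 / 10.915 | 16.637 / 16.986 | 21.705 / 23.504 | 25.545 / 26.842 | 29.291 / 30.085 |
| PPTS(20) | TSC-T–S / SOM | 8.984 / 8.881 | 14.257 / 13.789 | 18.697 / 19.789 | 22.268 / 21.746 | 25.620 / 25.801 |
| PPTS(20) | SC-T–S / ANN | 8.908 / 8.803 | 13.864 / 13.792 | 18.425 / 19.736 | 22.089 / 21.976 | 25.442 / 25.779 |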


Fig. 4. Variation of RMSE along the forecast time horizon for different data sets of river Narmada at Mandla gauging site (validation result-model-MM).


in forecasting of flow series through the proposed model in comparison to the SC-T–S fuzzy, SOM and ANN models. The correlation coefficients of the validation results of the TSC-T–S fuzzy models M8990, M9091, M9192, M9293, M9394 and M8994 are 0.9905, 0.9915, 0.9910, 0.9921, 0.9915 and 0.9905, and those of models MM8990, MM9091, MM9192, MM9293, MM9394 and MM8994 are 0.9941, 0.9930, 0.9920, 0.993, 0.9927 and 0.9926, respectively, for the one hour ahead forecast. It is also observed from Fig. 3 that the forecasting models validated for different time periods show very similar performance.

The RMSE value of the SC-T–S fuzzy model is generally lower than that of the SOM and ANN models in almost all the models developed using two different input structures and six different datasets. Furthermore, the TSC-T–S fuzzy model shows a further reduction in RMSE in comparison to the SC-T–S fuzzy model, as demonstrated by the model results shown in Tables 3 and 4. The values of AARE and the TS statistics, which map the performance of the models in terms of error, indicate that the fuzzy models, and in particular the SC-T–S fuzzy model, are better than the SOM and ANN models in predicting a larger number of flow values accurately. Lower values of the PPTS statistics, i.e. PPTS(2), PPTS(3), PPTS(5), etc., indicate the capability of a model in forecasting higher flow values. The values of these statistics are presented in Tables 3 and 4. The lower values of the PPTS statistics (Tables 3 and 4) confirm that the proposed TSC-T–S fuzzy model is capable of forecasting the higher flow values more accurately than the corresponding ANN, SOM and SC-T–S fuzzy models.
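The two error-based indices compared above can be illustrated with a short sketch. It assumes the usual definitions (RMSE as the root of the mean squared forecast error, and AARE as the mean absolute relative error expressed in percent of the observed flow). The function names and the sample flows are hypothetical and are not taken from the study.

```python
import math

def rmse(observed, forecast):
    """Root mean square error, in the units of the flows (e.g. m3/s)."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(o - f) ** 2 for o, f in zip(observed, forecast)]
    return math.sqrt(sum(squared) / len(squared))

def aare(observed, forecast):
    """Average absolute relative error in percent (observed flows must be non-zero)."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    relative = [abs(o - f) / o for o, f in zip(observed, forecast)]
    return 100.0 * sum(relative) / len(relative)

# Hypothetical hourly flows (m3/s), for illustration only
observed = [100.0, 200.0, 400.0, 800.0]
forecast = [110.0, 190.0, 400.0, 760.0]
print(f"RMSE = {rmse(observed, forecast):.3f} m3/s")
print(f"AARE = {aare(observed, forecast):.3f} %")
```

In this example the largest absolute error (40 m3/s at the peak) dominates the RMSE, whereas the AARE weights every hour by its own flow magnitude. This is one reason the two indices can rank models differently.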

7.2. Forecasting at short times (>1 h lead time)

Fig. 5. Variation of NS-Efficiency along the forecast time horizon for different data sets of river Narmada at Mandla gauging site (validation result model-MM).

The short term forecast for several hours provides a clear guide in project operation, a warning to people likely to be affected by inundation, or an alert to the teams keeping vigil on embankments and levees, etc., along rivers. A forecasting model needs meteorological data from the catchment and river flow data in reaches or stretches of river at the analysis point at the earliest. It is understood that the accuracy of flood forecasting generally decreases as the forecasting lead time increases (Nayak et al., 2005a). For real time forecasting it is necessary to have a model that can operate in the adaptive mode (Wood and Connell, 1985). Forecasting at lead periods of more than one hour (>1 h) can be more accurately modeled by adding more input information at previous time steps. In practice the discharge values of previous time steps (Eqs. (37) and (38)) are not readily available. Therefore, a simple recursive algorithm is used to obtain the forecast for successive lead times. In this process known values of inputs are used to forecast Qt+1 at Mandla gauging site, which in turn serves to predict Qt+2. This procedure is repeated until the spectrum of forecasted values ranging from Qt+1 to Qt+6 is obtained. Similarly, the model input represented by Eq. (38) indicates that the forecasted values at Manot gauging site (for lead periods 1 h and 2 h) are required for forecasting discharge at Mandla gauging site at lead periods 5 h and 6 h. Therefore, the flow values of Manot site forecasted from the previous three flow values were supplied as an input to the model by incorporating a suitable subroutine in the flood forecasting model developed for Mandla gauging site. Using computed discharge as input to these models may cause the error to carry over from one step to another. However, such carry over errors may not have any significant effect due to the higher accuracy of the fuzzy and ANN models (Sudheer and Jain, 2003).
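The recursive procedure described in this section can be sketched as follows. This is a minimal illustration under simplifying assumptions: the one-step model takes only lagged discharges as input (the input structures of Eqs. (37) and (38) also contain other variables), and `linear_trend_model` is a hypothetical stand-in for a trained fuzzy or ANN model.

```python
from typing import Callable, List, Sequence

def recursive_forecast(one_step_model: Callable[[Sequence[float]], float],
                       recent_flows: Sequence[float],
                       horizon: int) -> List[float]:
    """Forecast lead times 1..horizon by feeding each forecast back as an input."""
    window = list(recent_flows)          # known flows, oldest first
    forecasts = []
    for _ in range(horizon):
        q_next = one_step_model(window)  # forecast one step ahead
        forecasts.append(q_next)
        # Slide the input window: drop the oldest value, append the new forecast.
        window = window[1:] + [q_next]
    return forecasts

def linear_trend_model(lags: Sequence[float]) -> float:
    """Hypothetical one-step model: extrapolate the last observed change."""
    return lags[-1] + (lags[-1] - lags[-2])

# Hypothetical flows Q(t-2), Q(t-1), Q(t) in m3/s
known_flows = [100.0, 120.0, 150.0]
six_hour_forecast = recursive_forecast(linear_trend_model, known_flows, horizon=6)
print(six_hour_forecast)  # forecasts for Q(t+1) ... Q(t+6)
```

Because every forecast beyond the first step is computed from earlier forecasts, any one-step error is propagated along the horizon, which is the carry-over effect noted above.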

Results showing the performance statistics of all models at different lead periods are presented in Tables 5 and 6. Further, the performance of the models in terms of correlation between the forecasted and observed values of flows is presented in Fig. 3. All four models show a reduction in correlation coefficient with increasing forecasting lead period. Further, it is apparent from these figures that the correlation statistic of the TSC-T–S fuzzy model is superior to that of the other models at all lead periods.

The RMSE values at different lead times are presented in Fig. 4. Fig. 4 shows that the RMSE increases with forecasting lead period in all the models. However, the rate of rise of RMSE with lead time is smaller in the case of the TSC-T–S fuzzy model. The TSC-T–S fuzzy model for M8994 forecasted the flows with an RMSE of 278.321 m3/s at 6 h, whereas the SC-T–S, SOM and ANN models forecasted the flows with RMSEs of 284.98, 290.054 and 292.385 m3/s, respectively. Similarly, the TSC-T–S fuzzy model for MM8994 forecasted the flows with an RMSE of 277.30 m3/s at 6 h, whereas the SC-T–S, SOM and ANN models forecasted the flows with RMSEs of 283.977, 288.521 and 290.00 m3/s, respectively.
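The statement about the rate of rise of RMSE can be checked with simple arithmetic on the values reported in Table 6 (2 h and 6 h lead, model MM). The sketch below uses the mean rise per hour between these two lead times as a rough measure; this measure and the variable names are illustrative choices, not part of the original analysis.

```python
# RMSE (m3/s) at 2 h and 6 h lead time, copied from Table 6
rmse_2h = {"TSC-T-S": 115.765, "SC-T-S": 117.298, "SOM": 118.341, "ANN": 121.460}
rmse_6h = {"TSC-T-S": 277.300, "SC-T-S": 283.977, "SOM": 288.521, "ANN": 290.000}

def mean_rise_per_hour(start: float, end: float, hours: float) -> float:
    """Average increase of RMSE per hour of additional lead time."""
    return (end - start) / hours

# Four hours separate the 2 h and 6 h lead times.
rates = {model: mean_rise_per_hour(rmse_2h[model], rmse_6h[model], hours=4)
         for model in rmse_2h}

for model, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{model}: {rate:.2f} m3/s per hour of lead time")
```

With these figures the TSC-T–S model shows the smallest mean rise (about 40.4 m3/s per hour), against roughly 41.7 to 42.5 m3/s per hour for the other three models.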

Furthermore, the efficiency of the forecasting model at all lead periods is improved when the TSC-T–S fuzzy model is used (Fig. 5). The values of correlation coefficient and efficiency show a continuous falling trend with lead period. The rate of reduction of correlation coefficient and efficiency is highest in the case of the ANN model, whereas the TSC-T–S fuzzy model shows a comparatively lower rate of reduction of both. Furthermore, it is also observed that the correlation and efficiency statistics are consistently of the same order during calibration and validation for the TSC-T–S fuzzy model, which confirms a good generalization capability of the model.
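For reference, the two statistics discussed here can be computed as in the following sketch. It assumes that model efficiency is the Nash and Sutcliffe (1970) efficiency expressed in percent and that the correlation is the Pearson coefficient between observed and forecasted flows; the sample data and function names are hypothetical.

```python
import math

def nash_sutcliffe_efficiency(observed, forecast):
    """Efficiency in percent: 100 * (1 - SSE / sum of squared deviations of observations)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, forecast))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observed flows are constant; efficiency is undefined")
    return 100.0 * (1.0 - sse / spread)

def correlation(observed, forecast):
    """Pearson correlation coefficient between observed and forecasted flows."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_fc = sum(forecast) / n
    cov = sum((o - mean_obs) * (f - mean_fc) for o, f in zip(observed, forecast))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_fc = sum((f - mean_fc) ** 2 for f in forecast)
    return cov / math.sqrt(var_obs * var_fc)

# Hypothetical hourly flows (m3/s), for illustration only
observed = [100.0, 200.0, 400.0, 800.0]
forecast = [110.0, 190.0, 400.0, 760.0]
print(f"Efficiency  = {nash_sutcliffe_efficiency(observed, forecast):.3f} %")
print(f"Correlation = {correlation(observed, forecast):.4f}")
```

Efficiency penalizes systematic bias as well as scatter, while correlation is insensitive to a constant offset or scaling of the forecasts, so the two indices are usefully reported together.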

Fig. 6. Variation of TS-statistics along the forecast time horizon for different data sets of river Narmada at Mandla gauging site (validation result model-MM).

Values of AARE and threshold statistics also indicate a reduction in prediction accuracy with increased lead period (Tables 5 and 6). The TS statistics of the models indicate that fewer data points can be accurately forecasted as the forecasting lead period increases. These values indicate that all the models show comparable performance in terms of AARE for different lead periods, while the threshold statistics indicate that the number of data points falling under different error ranges is quite different for different models. The model validation results (Fig. 6) indicate that between 32% and 49% of flow values are predicted at 1 h lead period by the TSC-T–S fuzzy model with 1% error, compared to 31.5% to 48.6% for the SC-T–S fuzzy model, 31.2% to 48.5% for SOM and 24.12% to 48.1% for ANN. Fig. 6 also shows that the number of flow values predicted at threshold statistic 1 (i.e. 1% error range) is drastically reduced when predicting the flow at higher lead periods. At a higher threshold level, i.e. at 20% (TS20), the difference between the threshold statistics of the 1 h and 6 h lead period forecasts is small compared to that at lower threshold levels. The threshold statistics indicate the number of data points forecasted with the desired accuracy. However, due to the large variation in a continuous river flow data series consisting of both high and low flow values, the TS statistics hardly provide any significant information about model performance.
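A threshold statistic of the kind discussed above can be sketched as follows, assuming that TSx is the percentage of data points whose absolute relative error is less than x%. The function name and the sample flows are hypothetical.

```python
def threshold_statistic(observed, forecast, level_percent):
    """Percentage of forecasts with absolute relative error below level_percent."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    within = sum(
        1 for o, f in zip(observed, forecast)
        if 100.0 * abs(o - f) / o < level_percent   # relative error in percent
    )
    return 100.0 * within / len(observed)

# Hypothetical flows (m3/s); relative errors are 0.5, 4, 0, 4.5 and 15 percent
observed = [100.0, 200.0, 400.0, 800.0, 1000.0]
forecast = [100.5, 192.0, 400.0, 764.0, 1150.0]

for level in (1, 5, 20):
    print(f"TS{level} = {threshold_statistic(observed, forecast, level):.1f} %")
```

In this example TS1 is 40%, TS5 is 80% and TS20 is 100%. The single 15% error occurs at the highest flow, yet TS20 does not reveal where in the flow range the larger errors lie, which illustrates the limitation noted above.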

The PPTS criterion discussed in the previous section was used to verify the model performance at high floods. For this purpose PPTS values for the highest 2%, 3%, 5%, 10% and 20% of flows have been computed. Fig. 7 shows that the PPTS values increase, i.e. the accuracy at high flows reduces, with the increase in lead period (see also Tables 5 and 6). Furthermore, the TSC-T–S fuzzy model shows its superiority over the other models in predicting high flow values. This can further be verified from the low numerical values of the PPTS statistics for the TSC-T–S fuzzy model presented in Tables 5 and 6. However, in other flow regions all the models show comparable performance. The PPTS statistics show different patterns for different cases (M8990–M8994 and MM8990–MM8994). This indicates that the variation in PPTS statistics differs between data sets, and PPTS can therefore serve as an alternative criterion for critically selecting a suitable flood forecasting model.
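The idea behind the PPTS criterion can be sketched as below. The sketch assumes that PPTS(p) is the mean absolute relative error, in percent, over the highest p% of observed flows; rounding the number of peak values up with `ceil` is an assumption made here for illustration, and the sample data are hypothetical.

```python
import math

def ppts(observed, forecast, top_percent):
    """Mean absolute relative error (%) over the highest top_percent of observed flows."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    # Rank observation/forecast pairs by observed flow, largest first.
    pairs = sorted(zip(observed, forecast), key=lambda p: p[0], reverse=True)
    # Number of peak values considered (assumption: round up, at least one).
    k = max(1, math.ceil(len(pairs) * top_percent / 100.0))
    errors = [100.0 * abs(o - f) / o for o, f in pairs[:k]]
    return sum(errors) / k

# Hypothetical flows (m3/s): low flows forecasted exactly, the two peaks under-predicted
observed = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0, 900.0, 1000.0]
forecast = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0, 810.0, 800.0]

print(f"PPTS(10)  = {ppts(observed, forecast, 10):.1f} %")
print(f"PPTS(20)  = {ppts(observed, forecast, 20):.1f} %")
print(f"PPTS(100) = {ppts(observed, forecast, 100):.1f} %")
```

Here the error averaged over all flows, PPTS(100), is only 3%, while PPTS(10) is 20% and PPTS(20) is 15%. A model that looks accurate on average can therefore still perform poorly at the peaks, which is the situation PPTS is meant to expose.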

Fig. 7. Variation of PPTS-statistics along the forecast time horizon for different data sets of river Narmada at Mandla gauging site (validation result model MM).

It is also observed that the forecasting models developed using the input structure represented by Eq. (38) are superior to those developed using the input structure represented by Eq. (37). This is because the inclusion of upstream discharge in the forecasting model has a direct impact on model performance. Furthermore, the models developed using long term data, i.e. 1989–1994 (models M8994 and MM8994), show nearly average model performance indices, whereas the models developed using shorter data lengths show varying performance. It is also to be noted that in general the models developed from different data lengths show almost similar performance indices and that the TSC-T–S fuzzy model is always the superior model. Fig. 8 also illustrates that the TSC-T–S fuzzy model provides a better forecast in comparison to the SC-T–S, SOM and ANN models. Furthermore, the proposed TSC-T–S method provides a single model instead of different models for different hydrologic conditions. Therefore, the proposed TSC-T–S fuzzy model is easy to develop and has better operational utility.

Fig. 8. The observed and forecast hydrograph (model M, lead time 3 h).

8. Summary and conclusions

A new threshold subtractive clustering based T–S fuzzy model (TSC-T–S) algorithm has been developed for the continuous forecasting of river flow during floods. Further, the described model has been tested for the forecasting of hourly flood discharge at the Mandla stream gauging site of river Narmada in Central India. The proposed approach is a superset of the classical subtractive clustering based T–S fuzzy inference system (SC-T–S) and serves as a useful tool for developing flood forecasting models. The performance of the model during calibration and validation is evaluated by performance indices such as root mean square error (RMSE), model efficiency and coefficient of correlation (R). A new performance index termed peak percent threshold statistics is proposed to evaluate the performance of a flood forecasting model.


The developed model has been tested for different lead periods using hourly rainfall and discharge data. A comparison with ANN, SOM and SC-T–S fuzzy models has shown a better performance of the proposed technique. Although all of the fuzzy, SOM and ANN based models performed almost equally well, the advantage of the proposed fuzzy model is that it can forecast high flows more accurately, which is the most important task in flood forecasting. Practical application of the model takes only seconds for execution on a desktop computer. Therefore, the proposed fuzzy algorithm enables and supports the creation and execution of a real time flood forecasting model. Flooding is a very complex and inherently uncertain phenomenon. Applications of fuzzy set theory in various fields of engineering have successfully demonstrated its potential in modeling uncertainty. This paper has demonstrated the application of fuzzy set theory in flood forecasting by introducing the threshold subtractive clustering approach. Furthermore, the new model performance criterion proposed herein, termed peak percent threshold statistics (PPTS), provides very useful information for the evaluation of the performance of flood forecasting models. Additional research could be carried out in order to establish the suitability of the proposed model and performance criterion in various regions.

References

Abonyi, J., Babuska, R., Szeifent, F., 2002. Modified Gath-Geva fuzzy clustering for identification of Takagi-Sugeno fuzzy models. IEEE Transactions on Systems Man Cybernetics, Part B 32 (5), 612–621.

ASCE Task Committee on Application of Artificial Neural Networks in Hydrology, 2000a. Artificial neural networks in hydrology. I: Preliminary concepts. Journal of Hydrologic Engineering 5 (2), 115–123.

ASCE Task Committee on Application of Artificial Neural Networks in Hydrology, 2000b. Artificial neural networks in hydrology. II: Hydrologic applications. Journal of Hydrologic Engineering 5 (2), 124–137.

Atiya, A., El-Shoura, S., Shaheen, I., El-Sherif, M., 1999. A comparison between neural network forecasting techniques case study: River flow forecasting. IEEE Transactions on Neural Network 10 (2), 402–409.

Babuška, R., 1998. Fuzzy Modeling for Control, International Series in Intelligent Technologies. Kluwer Academic Publishers, Boston, USA.

Bezdek, J.C., 1981. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York.

Bezdek, J.C., Pal, S.K., 1992. Fuzzy Models for Pattern Recognition. IEEE Publication, New York, NY.

Birikundavyi, S., Labi, R., Trung, H.T., Rousselle, J., 2002. Performance of neural networks in daily streamflow forecasting. Journal of Hydrologic Engineering 7 (5), 392–398.

Bishop, C.M., 1994. Neural network and their applications. Review of Scientific Instruments 65, 1803–1832.

Box, G.E.P., Jenkins, G.M., 1976. Time Series Analysis: Forecasting and Control. Holden-Day, Oakland, CA.

Chatterjee, C., Jha, R., Lohani, A.K., Kumar, R., Singh, R., 2001. Runoff curve number estimation for a basin using remote sensing and GIS. Asian-Pacific Remote Sensing and GIS Journal 14, 1–7.

Chiu, S., 1994. Fuzzy model identification based on cluster estimation. Journal on Intelligent Fuzzy Systems 2, 267–278.

Chiu, S., 1996. Method and software for extracting fuzzy classification rules by subtractive clustering. In: Fuzzy Information Proceeding Society, Biennial Conference of the North American, pp. 461–465.

Dubois, D., Nguyen, H.T., Prade, H., Sugeno, M., 1998. Introduction: the real contribution of fuzzy systems. In: Nguyen, H.T., Sugeno, M. (Eds.), Fuzzy Systems: Modeling and Control. Kluwer, Dordrecht, pp. 1–17.

Flood, I., Kartam, N., 1994. Neural networks in civil engineering. I: Principles and understanding. Journal of Computing in Civil Engineering 8 (2), 131–148.

Hall, M.J., Minns, A.W., 1999. The classification of hydrologically homogeneous regions. Hydrological Sciences – Journal 44 (5), 693–704.

Hellendoorn, H., Driankov, D. (Eds.), 1997. Fuzzy Model Identification: Selected Approaches. Springer, Berlin, Germany.

Hecht-Nielsen, R., 1991. Neurocomputing. Addison-Wesley Publishing Company, New York.

Höppner, F., Klawonn, F., Kruse, R., Runkler, T., 1999. Fuzzy Cluster Analysis. Wiley, New York.

Hsu, K.-L., Gupta, H.V., Sorooshian, S., 1995. Artificial neural network modeling of the rainfall-runoff process. Water Resources Research 31 (10), 2517–2530.

Hsu, Kuo-lin, Gupta, Hoshin V., Gao, Xiaogang, Sorooshian, S., Imam, B., 2002. Self-organizing linear output map (SOLO): an artificial neural network suitable for hydrologic modeling and analysis. Water Resources Research 38 (12), 38-1.

Hundecha, Y., Bardossy, A., Theisen, H.W., 2001. Development of a fuzzy logic-based rainfall-runoff model. Hydrologic Sciences Journal 46 (3), 363–376.

Hykin, S., 1992. Neural Networks: A Comprehensive Foundation. Macmillan, Indianapolis, Indiana.

Imrie, C.E., Durucan, S., Korre, A., 2000. River flow forecasting using artificial neural networks: generalization beyond the calibration range. Journal of Hydrology 233, 138–153.

Jacquin, Alexandra P., Shamseldin, Asaad Y., 2006. Development of rainfall–runoff models using Takagi–Sugeno fuzzy inference systems. Journal of Hydrology 329 (1–2), 154–173.

Jain, A., Ormsbee, L.E., 2002. Short-term water demand forecast modeling techniques: conventional methods versus AI. Journal of American Water Works Association 94 (7), 64–72.

Jain, A., Indurthy, S.K.V.P., 2003. Comparative analysis of event based rainfall-runoff modeling techniques-deterministic, statistical, and artificial neural networks. Journal of Hydrologic Engineering 8 (2), 93–98.

Jang, J.-S.R., Sun, C.-T., Mizutani, E., 2002. Neuro-Fuzzy and Soft Computing. Prentice Hall of India Private Limited, New Delhi.

Kar, A.K., Lohani, A.K., Goel, N.K., Roy, G.P., 2010. Development of Flood Forecasting System Using Statistical and ANN Techniques in the Downstream Catchment of Mahanadi Basin, India. Journal of Water Resource and Protection 2, 880–887. http://dx.doi.org/10.4236/jwarp.2010.210105, <http://www.SciRP.org/journal/jwarp>.

Kar, A.K., Goel, N., Lohani, A.K., Roy, G.P., 2012a. Application of clustering techniques using prioritized variables in regional flood frequency analysis – case study of Mahanadi Basin. Journal of Hydrologic Engineering 17 (1), 213–223.

Kar, A.K., Winn, L., Lohani, A., Goel, N., 2012b. Soft computing–based workable flood forecasting model for Ayeyarwady River Basin of Myanmar. Journal of Hydrologic Engineering 17 (7), 807–822.

Kiang, M.Y., Kulkarni, U.R., Goul, M.R., Philppakkis, A., Chi, R.T., Turban, E., 1997. "Improving the effectiveness of self organization map networks using a circular Kohonen layer". 30th HI International Conference on System Sciences (HICSS). Advanced Technology Track 5, 521–529.

Kisi, O., 2004. Daily suspended sediment modeling using a fuzzy-differential evolution approach. Hydrological Sciences Journal 49 (1), 183–197.

Kisi, O., 2005. Suspended sediment estimation using neuro-fuzzy and neural network approaches. Hydrological Sciences Journal 50 (4), 683–696.

Kisi, O., Karahan, M.E., Sen, Z., 2006. River suspended sediment modeling using fuzzy logic approach. Hydrological Processes 20 (20), 4351–4362.

Kisi, O., 2008. Constructing neural network sediment estimation models using a data-driven algorithm. Mathematics and Computers in Simulation 79 (1), 94–103.

Kisi, O., Shiri, J., Nikoofar, B., 2012. Forecasting daily lake levels using artificial intelligence approaches. Computers & Geosciences 41, 169–180.

Kohonen, T., 1982. Self-organized formation of topologically correct feature maps. Biological Cybernetics 43, 59–69.

Kohonen, T., 2001. Self-Organizing Maps, third ed. Springer, Berlin, Germany.

Krishnaiah, P.R., Kanal, L.N., 1982. Classification, Pattern Recognition and Reduction of Dimensionality. Hand Book of Statistics, vol. 2. North Holland, Amsterdam.

Lohani, A.K., Goel, N.K., Bhatia, K.K.S., 2005a. Real time flood forecasting using fuzzy logic. In: Perumal, M. (Ed.), Hydrological Perspectives for Sustainable Development, vol. I. Allied Publishers Pvt. Ltd., New Delhi, pp. 68–176.

Lohani, A.K., Goel, N.K., Bhatia, K.K.S., 2005b. Development of fuzzy logic based real time flood forecasting system for river Narmada in Central India. In: International Conference on Innovation Advances and Implementation of Flood Forecasting Technology, ACTIF/Floodman/Flood Relief, October, 2005, Tromso, Norway. <http://www.Actif.cc.net/conference2005/proceedings>.

Lohani, A.K., Goel, N.K., Bhatia, K.K.S., 2006. Takagi-Sugeno fuzzy inference system for modeling stage-discharge relationship. Journal of Hydrology 331, 146–160.

Lohani, A.K., Goel, N.K., Bhatia, K.K.S., 2007a. Deriving stage–discharge–sediment concentration relationships using fuzzy logic. Hydrological Sciences – Journal 52 (4), 793–807.

Lohani, A.K., Goel, N.K., Bhatia, K.K.S., 2007b. Reply to comments provided by Z. Sen on "Takagi–Sugeno fuzzy system for modeling stage-discharge relationship" by A.K. Lohani, N.K. Goel and K.K.S. Bhatia. Journal of Hydrology 337 (1–2), 244–247.

Lohani, A.K., Goel, N.K., Bhatia, K.K.S., 2009. Rainfall Runoff Modelling Using Fuzzy Rule Based Approach. International Conference on Water, Environment, Energy and Society (WEES). Hydrologic and Hydraulic Modelling Volume I, 257–263.

Lohani, A.K., Goel, N.K., Bhatia, K.K.S., 2011. Comparative study of neural network, fuzzy logic and linear transfer function techniques in daily rainfall-runoff modeling under different input domains. Hydrological Processes 25, 175–193.

Lohani, A.K., Rakesh Kumar, Singh, R.D., 2012. Hydrological time series modeling: a comparison between adaptive neuro fuzzy, neural network and autoregressive techniques. Journal of Hydrology 442–443 (6), 23–35.

Luchetta, A., Manetti, S., 2003. A real time hydrological forecasting system using a fuzzy clustering approach. Computers and Geosciences 29, 1111–1117.

Lundberg, A., 1982. Combination of a conceptual model and an autoregressive error model for improving short time forecasting. Nordic Hydrology, 233–246.

Mahabir, C., Hicks, F.E., Fayek, A.R., 2003. Application of fuzzy logic to forecast seasonal runoff. Hydrologic Processes 17, 3749–3762.

Maier, H.R., Dandy, G.C., 2000. Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications. Environmental Modelling & Software 15, 101–124.

Nash, J., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models: part 1. A discussion of principles. Journal of Hydrology 10, 282–290.


Nayak, P.C., Sudheer, K.P., Ramasastri, K.S., 2005a. Fuzzy computing based rainfall-runoff model for real time flood forecasting. Hydrological Processes 19, 955–968. http://dx.doi.org/10.1002/hyp.5553.

Nayak, P.C., Sudheer, K.P., Rangan, D.M., Ramasastri, K.S., 2005b. Short-term flood forecasting with a neurofuzzy model. Water Resources Research 41, W04004. http://dx.doi.org/10.1029/2004WR003562.

Nielsen, S.A., Hansen, E., 1973. Numerical simulation of the rainfall-runoff processes on a daily basis. Nordic Hydrology 4 (3), 171–190.

Ren, Minglei, Wang, Bende, Liang, Qiuhua, Fu, Guangtao, 2010. Classified real-time flood forecasting by coupling fuzzy clustering and neural network. International Journal of Sediment Research 25 (2), 134–148.

Sahoo, G.B., Ray, C., 2006. Flow forecasting for a Hawaii stream using rating curves and neural networks. Journal of Hydrology 317 (1–2), 63–80.

See, L., Openshaw, S., 1999. Applying soft computing approaches to river level forecasting. Hydrological Science Journal 44 (5), 763–778.

See, L., Openshaw, S., 2000. A hybrid multi-model approach to river level forecasting. Hydrological Science Journal 45 (4), 523–536.

Singh, V.P., 1989. Hydrologic Systems – Rainfall–Runoff Modeling, vol. II. Prentice Hall Inc., Englewood Cliffs, New Jersey, USA.

Sudheer, K.P., Gosain, A.K., Ramasastri, K.S., 2002. A data-driven algorithm for constructing artificial neural network rainfall runoff models. Hydrological Processes 16, 1325–1330.

Sudheer, K.P., Jain, S.K., 2003. Radial basis function neural network for modeling rating curves. Journal of Hydrologic Engineering 8 (3), 161–164.

Sugeno, M., Yasukawa, T., 1993. A fuzzy-logic based approach to qualitative modeling. IEEE Transactions on Fuzzy Systems 1, 7–31.

Takagi, T., Sugeno, M., 1985. Fuzzy identification of systems and its application to modeling and control. IEEE Transactions Systems, Man and Cybernetics 15 (1), 116–132.

Thirumalaiah, K., Deo, M.C., 1998a. Real-time flood forecasting using neural networks. Computer-Aided Civil and Infrastructure Engineering 13 (2), 101–111.

Thirumalaiah, K., Deo, M.C., 1998b. River stage forecasting using artificial neural networks. Journal of Hydrologic Engineering 3 (1), 26–32.

Toth, E., 2009. Classification of hydro-meteorological conditions and multiple artificial neural networks for streamflow forecasting. Hydrology and Earth System Sciences 13, 1555–1566. http://dx.doi.org/10.5194/hess-13-1555-2009.

Vernieuwe, H., Georgieva, O., De Baets, B., Pauwels, V.R.N., Verhoest, N.E.C., Troch, F.P.De, 2005. Comparison of Data driven Takagi Sugeno models of rainfall discharge dynamics. Journal of Hydrology 302, 173–186.

Wood, E.F., Connell, P.E., 1985. Real time forecasting. In: Anderson, M.G., Burt, T.P. (Eds.), Hydrologic Forecasting. John Wiley & Sons Ltd.

Xiong, L., Shamseldin, A., O’Connor, K., 2001. A non linear combination of the forecasts of rainfall-runoff models by the first order Takagi-Sugeno fuzzy system. Journal of Hydrology 245, 196–217.

Yager, R.R., Filev, D.P., 1994. Generation of fuzzy rules by mountain clustering. Journal of Intelligent and Fuzzy Systems 2, 209–219.

Yakowitz, S.J., 1985. Markov flow models and the flood warning problem. Water Resources Research 21, 81–88.

Yapo, P., Sorooshian, S., Gupta, V., 1993. A Markov chain flow model for flood forecasting. Water Resources Research 29, 2427–2436.

Zadeh, L.A., 1965. Fuzzy sets. Information and Control 8, 338–353.