
Journal of Hydrology 385 (2010) 257–268


Clustering-based hybrid inundation model for forecasting flood inundation depths

Li-Chiu Chang a,*, Hung-Yu Shen a, Yi-Fung Wang b, Jing-Yu Huang a, Yen-Tso Lin a

a Department of Water Resources and Environmental Engineering, Tamkang University, Taiwan, ROC
b Water Resources Agency, Ministry of Economic Affairs, Taiwan, ROC

Article info

Article history: Received 29 June 2009; received in revised form 12 February 2010; accepted 23 February 2010.

This manuscript was handled by A. Bardossy, Editor-in-Chief, with the assistance of Efrat Morin, Associate Editor.

Keywords: Flood inundation map; K-means clustering; Back-propagation neural network; Linear regression

0022-1694/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jhydrol.2010.02.028

* Corresponding author. Address: 151 Ying-Chuan Rd., Tamsui, Taipei County 25137, Taiwan, ROC. Tel.: +886 2 26258523; fax: +886 2 26209651.

E-mail address: [email protected] (L.-C. Chang).

Summary

Estimation of flood depths and extents may provide disaster information for dealing with contingency and alleviating risk and loss of life and property. We present a two-stage procedure underlying CHIM (clustering-based hybrid inundation model), which is composed of linear regression models and ANNs (artificial neural networks), to build the regional flood inundation forecasting model. The two-stage procedure comprises data preprocessing and model building. In the data preprocessing stage, K-means clustering is used to categorize the data points of the different flooding characteristics in the study area and to identify the control point(s) from individual flooding cluster(s). In the model building stage, three classes of flood depth forecasting models are built in each cluster: the back-propagation neural network (BPNN) for each control point, the linear regression models for the grids that have highly linear correlation with the control point, and a multi-grid BPNN for the grids that do not have highly linear correlation with the control point. The practicability and effectiveness of the proposed approach are tested in Dacun Township, Changhua County, in Central Taiwan. The results show that the proposed CHIM can continuously and adequately provide 1-h-ahead flood inundation maps that match the simulated flood inundation results well while reducing CPU time by 99%.

© 2010 Elsevier B.V. All rights reserved.

Introduction

Floods are one of the most dangerous natural hazards and the greatest challenge for hydrologists due to their mass force and short response time. Floods occur regularly, both nationally and globally, and have become more frequent and severe in the past decades due to land use and climate change. Disastrous floods can cause major loss of life and result in catastrophic outcomes. Taiwan is located in the northwestern Pacific Ocean where the activities of the subtropical jet stream are frequent. In the last century, about 360 typhoons, an average of 3.6 annually, hit Taiwan. Typhoons are usually coupled with huge amounts of rain from June to October, and disastrous flooding results from the intense bursts of rainfall. The rivers on the island are short and steep, and their flows are relatively quick, with floods lasting only a few hours. The large flood peaks with fast-rising limbs unavoidably cause serious disasters. For example, in 1996 Typhoon Herb triggered a storm with maximum accumulated precipitation reaching 160 mm/h and 1748 mm/day, causing considerable casualties and property damage and severely harming the basic infrastructure of the country (Cheng and Wang, 2004); Typhoon Nari struck northern Taiwan with stunning rainfall on September 17th, 2001, with the highest precipitation reaching 149 mm/h. It caused 27 deaths and many civilian injuries, and severely damaged city functions, resulting in extensive economic losses.

The recent sequence of floods has raised public awareness of flooding. As a result, increased expenditure is being made on flood defenses. Historically, hard engineering solutions, such as levees and flood detention, have been preferred despite their associated environmental disbenefits, but alternative approaches may have significant environmental benefits. In the last decades, the Water Resources Agency (WRA), Taiwan, has devoted a great deal of effort to flood defense strategies. Providing protection against rare floods would not be economically viable; consequently, in recent years, flood damage reduction and prevention in Taiwan has focused more on non-structural efforts, such as flood alert and warning systems.

Establishing flood and inundation forecasting systems could provide information that would be helpful in dealing with contingencies and emergencies, and thereby alleviate the risk and loss of life and property. The ability to model potential flood inundation and produce maps is not only of major concern for disaster response and flood management but also poses a key task for hydrologists to generate a reference for damage estimation. During past decades, predictions of flood inundation extent have been made possible by advances in numerical modeling techniques and the synergistic use of radar and optical remote sensing in conjunction with GIS modeling. For example, Hsu et al. (2002) proposed a two-dimensional (2D) inundation model to simulate flood inundation due to storm rainfall. Horritt and Bates (2001, 2002) indicated that simulated topographic properties significantly affect simulation results and that topography is a major factor in determining flood inundation patterns. Kang (2009) applied an integrated urban inundation model to estimate the influence of high building density on the extent of flood inundation. On the other hand, remote sensing has proved invaluable in mapping flood extent for deterministic model calibration (Horritt, 2000) and validation (Horritt and Bates, 2002). The Flood Space Monitoring Information System has been operated in Kazakhstan for several years. It can decode space images to localize high flooding and to forecast the development of flooding (Spivak et al., 2004). Bates et al. (2004) estimated spatially distributed uncertainty in models conditioned against binary pattern data contained in flood inundation maps by using the generalized likelihood uncertainty estimation procedure. Horritt (2006) presented a comparison of uncertain spatial predictions of flood extent with observed data. SAR (synthetic aperture radar) has been combined with LiDAR (light detection and ranging) measurements to improve flood extent delineation or the estimation of flood parameters (Mason et al., 2007; Zwenzner and Voigt, 2008). Schumann et al. (2007) presented an improved version of the regression and elevation-based flood information extraction model (REFIX) to fit river bank elevation from SAR.

Fig. 1. The flowchart of the proposed CHIM.

Conventional inundation models need long computation times because iterative solutions are required to carry out the high-resolution spatial discretization. Moreover, implementing appropriate mitigation strategies requires analyzing different flood scenarios under varied hydro-meteorological conditions as well as with different mitigation alternatives. Consequently, it is very difficult to achieve on-line simulation and/or forecasting of the inundation extent by using conventional inundation models. Artificial neural networks (ANNs) have been applied within the field of hydrological modeling in recent years (Imrie et al., 2000; Chang et al., 2001; Sivakumar et al., 2002; Rajurkar et al., 2004; Anctil and Rat, 2005; Chaves and Chang, 2008; Chiang and Chang, 2009). This computational method offers real advantages over conventional modeling, especially when the underlying physical relationships are not fully understood. However, relatively little attention has been paid to the use of ANNs for flood estimation in ungauged catchments (Dastorani and Wright, 2001; Dawson et al., 2006; Lin and Wu, 2007) and/or for inundation maps. The main purpose of this study is to propose a novel methodology for regional flood inundation forecasting that uses a hybrid of K-means clustering, linear regression and ANN methods. To compensate for the lack of observed flood inundation data, the regional flood inundation forecasting models are trained and tested on flood inundation data generated by the two-dimensional non-inertial overland flow simulation model with a large number of pre-designed rainfall patterns and scenarios; a methodology for building the inundation extent forecasting model is then developed. The practicability and accuracy of the proposed methodology are evaluated in Dacun Township, Changhua County, in Central Taiwan. We demonstrate that the proposed methods can be applied with high accuracy to one-step-ahead forecasting of flood inundation extent and flood depths.

Methodology

During floodplain inundation, hydro-geomorphologic factors can significantly influence inundation depth and duration. For example, low-lying land is easily inundated, and flooding is usually serious near rivers and drainage systems, where floodwater might overflow the levee and/or could not be drained effectively. To provide an effective and timely regional inundation map, the regional flooding characteristics should be investigated and the whole area clustered into several groups; models are then built for each cluster.

We propose a two-stage approach to build the regional flood inundation forecasting model that includes data preprocessing and model building, as shown in Fig. 1. The simulated flood inundation potential data of the study area are partitioned into three independent subsets (training, validation and testing) for cross validation to assess the proposed approach. The training subset is used to adjust the model parameters, the validation subset is used to choose the best model, and the testing subset is used to measure the generalization performance of the selected model (Haykin, 2009). In the data preprocessing stage, we use the training subset to identify all the grids in the study area as inundated or non-inundated grids; then, the K-means clustering method is used to categorize the different flooding characteristics and to find the K cluster centers as the control points of the K individual clusters. The coordinates (X, Y) and the 24 h flood depth histogram of each grid are taken as the flooding characteristics; then, given an appropriate number of clusters, the K-means method finds the cluster centers, which can be taken as the flood index of each cluster or individual flooding group, and the grids in the investigated area are assigned to the affiliated cluster. Therefore, the grids within a cluster have a similar flood index (characteristics). The grids in each cluster are further divided into three classes to build their respective inundation forecasting models: the control point, the "linear grids" that have a highly linear relationship with the control point, and the "nonlinear grids" that have a nonlinear relationship with the control point. In this study, the criterion for linear correlation is a coefficient of determination (R²), the square of the correlation coefficient between the flood depths of the grid and those of its relative control point, of at least 0.9.

Fig. 2. The elevation distribution of Dacun Township and location of the drainage systems.

In the model building stage, three classes of flood depth forecasting models are built in each cluster: a back-propagation neural network (BPNN) for each control point, linear regression models for the linear grids, and a multi-grid BPNN for the nonlinear grids. To distinguish the two types of BPNN used in this study, the flood depth forecasting model of each control point is called the single-grid BPNN, which relates only to the specific control point, and the model of the nonlinear grids is called the multi-grid BPNN, which can forecast 1-h-ahead flood depths of multiple grids in a specific cluster.
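The division of grids into the three classes can be illustrated with a short sketch. It assumes that each grid is described by its series of training flood depths, that a grid counts as non-inundated when its mean training depth is below 0.01 m, and that R² is the squared Pearson correlation with the control point series. The function names `r_squared` and `classify_grid` are hypothetical and are not taken from the study.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no linear information
    return sxy ** 2 / (sxx * syy)


def classify_grid(depths, control_depths, dry_threshold=0.01, r2_threshold=0.9):
    """Return 'non-inundated', 'linear' or 'nonlinear' for one grid.

    depths: training flood depths (m) at the grid.
    control_depths: training flood depths (m) at the cluster's control point.
    Both thresholds are assumptions that mirror the values quoted in the text.
    """
    if sum(depths) / len(depths) < dry_threshold:
        return "non-inundated"
    if r_squared(depths, control_depths) >= r2_threshold:
        return "linear"
    return "nonlinear"
```

A grid whose depths are an exact linear function of the control point depths has R² = 1 and is labelled "linear"; a grid that is wet but poorly correlated with the control point falls into the "nonlinear" class.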

The K-means clustering analysis

In rudimentary exploratory procedures, a clustering analysis is quite helpful in identifying groups that both minimize within-group variation for data in a cluster and maximize between-group variation to identify potential differences between clusters. K-means clustering (MacQueen, 1967; Anderberg, 1973) is one of the simplest clustering algorithms and has been applied in a variety of fields, including image recognition (Shapiro and Stockman, 2001; Duygulu et al., 2002), biology (Tavazoie et al., 1999; Bozinov and Rahnenfuhrer, 2002; Wei et al., 2005) and hydrological models (Fowler and Kilsby, 2002; Lin and Chen, 2004).

Fig. 3. The architecture of the BPNN flood inundation forecasting model at the control points.

Fig. 4. The architecture of the multi-grid BPNN flood inundation forecasting model at the nonlinear grids.

The main idea is to partition a collection of data into k mutually exclusive clusters and to assign each object to the cluster that has the closest center. K-means clustering uses the squared-error criterion function (defined as Eq. (1)) as a performance index when assigning cluster centers. First, the initial cluster centers are usually picked randomly from among all of the data points. Then, hill climbing techniques (a family of local search methods) are used to minimize Eq. (1) to determine the k cluster centers and to separate the objects into clusters.

$$J = \sum_{j=1}^{k} J_j = \sum_{i=1}^{n} \sum_{j=1}^{k} w_{ij} \left\| x_i - c_j \right\|^2 \tag{1}$$

where n is the number of data points, k is the number of clusters, $x_i$ is the ith data point, $c_j$ is the jth cluster center, and $W = [w_{ij}]$ is an n × k pattern association matrix whose element $w_{ij}$ is 1 if $x_i$ belongs to cluster j and 0 otherwise. $w_{ij}$ is defined as

$$w_{ij} = \begin{cases} 1, & \text{if } \|x_i - c_j\| \le \|x_i - c_m\|, \ \forall m \ne j \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

with the properties $\sum_{j=1}^{k} w_{ij} = 1, \ \forall i = 1, \ldots, n$ and $\sum_{j=1}^{k} \sum_{i=1}^{n} w_{ij} = n$.

The algorithm is computationally efficient and gives surprisingly good results if the clusters are compact, hyper-spherical in shape and well-separated in the feature space. The performance depends strongly on the choice of initial centers and the number of clusters. A possible approach is to repeat the operation several times with different sets of initial cluster centers and different numbers of clusters, and to choose the best solution based on the tightness of clusters.

Table 1
Number of grids in five clusters of 24 h flood depth histogram.

Cluster       Mean depth (m)
              Training  Validation  Testing    Non-inundated  BPNN  Linear model  Total grids
1             0.01      0.01        0.02       2117           411   76            2604
2             0.11      0.11        0.11       2              506   532           1040
3             0.36      0.37        0.37       0              407   112           519
4             0.80      0.80        0.80       0              127   242           369
5             1.29      1.29        1.30       0              23    60            83
Total grids   –         –           –          2119           1474  1022          4615

Table 2
The performance of BPNN at the control points.

Cluster   No. of neurons   RMSE (m)
                           Training  Validation  Testing
1         4                0.021     0.024       0.022
2         3                0.003     0.003       0.003
3         3                0.040     0.049       0.044
4         3                0.047     0.051       0.051
5         3                0.040     0.041       0.041
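The K-means procedure of Eqs. (1) and (2), including the suggestion to repeat it from several initial center sets, can be sketched as follows. This is a minimal illustration under assumptions: the alternating assign-and-update scheme is one common form of the local search, the feature vectors are plain tuples, and the names `kmeans` and `best_kmeans` are hypothetical rather than the implementation used in the study.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """One run of K-means on a list of tuples; returns (centers, labels, J)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers from the data
    labels = None
    for _ in range(n_iter):
        # Assignment step: w_ij = 1 for the nearest center j, as in Eq. (2).
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Squared-error criterion J of Eq. (1).
    J = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, J


def best_kmeans(points, k, n_restarts=10):
    """Repeat K-means from different initial centers and keep the lowest J."""
    return min((kmeans(points, k, seed=s) for s in range(n_restarts)),
               key=lambda result: result[2])
```

In the setting of this paper each point would hold a grid's coordinates (X, Y) and its 24 h flood depth histogram; how those features are scaled relative to each other is not specified here and would have to be chosen by the user.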

The back-propagation neural network

The back-propagation neural network (BPNN) is one of the most popular and widely used ANNs. It is a multilayer feedforward artificial neural network that consists of an input layer, one or more hidden layers and an output layer. In the training phase, the synaptic weights of the feedforward network are adjusted by supervised learning to minimize the error between the network output and the desired output. In this study, the energy function is minimized by the steepest descent method, and on-line learning is used to adjust the synaptic weights on an example-by-example basis. After training, the weights are fixed and the input value is propagated through the network until it reaches the output layer. Details on the BPNN algorithm can be found in the literature (Rumelhart et al., 1986; Haykin, 2009).
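The on-line steepest descent training described above can be illustrated with a small network. The sketch below assumes a single hidden layer, sigmoid activations in both layers, a squared-error energy function and a fixed learning rate; the class name `BPNN`, the weight initialization range and the learning rate are illustrative choices, not settings reported in the study.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class BPNN:
    """One-hidden-layer feedforward network trained example by example."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights of one neuron; the last entry is the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        """Return hidden activations and the network output for input list x."""
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
             for row in self.w_hidden]
        y = sigmoid(sum(w * v for w, v in zip(self.w_out, h + [1.0])))
        return h, y

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr=0.5):
        """Steepest-descent update on the squared error of one example."""
        h, y = self.forward(x)
        delta_out = (y - target) * y * (1.0 - y)
        # Hidden deltas use the output weights before they are updated.
        delta_h = [delta_out * self.w_out[j] * h[j] * (1.0 - h[j])
                   for j in range(len(h))]
        for j, v in enumerate(h + [1.0]):
            self.w_out[j] -= lr * delta_out * v
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * delta_h[j] * v
        return 0.5 * (y - target) ** 2
```

Because the output neuron is a sigmoid, targets have to be scaled into the interval (0, 1) before training; the scaling scheme is left to the user.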

Fig. 5. The distribution of clusters.

Application

The municipal area of Dacun Township, Changhua County, in Central Taiwan covers about 31 km² with elevations ranging from 14 to 252 m, and the Shigo drainage canal passes through the town centre, as shown in Fig. 2. This canal is 21 km long and has a maximum design drainage capacity for a 5-year return period flood. Concentrated rainfall often exceeds the drainage capacity, and flood inundation may occur. According to past reports and historical hazard records, this town is an inundation-prone area that frequently suffers from severe flood disasters that cause serious loss of agricultural crops. In recent years, the government has spent a great deal of effort and money on dredging the rivers and improving the drainage systems and flood-proofing measures. To further decrease flood damage, a flood warning system should be established for forecasting/estimating flood depths and flood extents.

Table 3
The performance of the linear regression models at the linear grids.

Cluster   Linear parameters               RMSE (m)
          a           b                   Training  Validation  Testing
1         0.27–1.17   −0.03 to 0.002      0.046     0.048       0.046
2         0.56–7.38   −0.025 to 0.015     0.017     0.017       0.017
3         0.69–1.64   −0.05 to 0.061      0.067     0.067       0.067
4         0.74–2.23   −0.03 to 0.24       0.104     0.105       0.105
5         0.59–2.58   −0.32 to 0.9        0.182     0.182       0.186
Mean      1.30        −0.056              0.073     0.073       0.074

The available data

The flood inundation data, published by the Water Resources Agency (WRA), Taiwan, were simulated by linking the HEC-1 model, SWMM (the storm water management model) and the two-dimensional non-inertial overland flow simulation model. The model inputs were the designed events of five nearby rain gauges (R1–R5) and the elevation and Manning roughness of the grids. The designed events of the five nearby rain gauges were 20-, 25-, 50-, 100- and 200-year 24-h duration rainfall; the elevation and Manning roughness were obtained from the DEM (digital elevation model) and the DTM (digital terrain model) with land uses, respectively. The spatial resolution is 80 m × 80 m, and the vertical accuracy is 0.1 m. Manning's n roughness coefficient was set to 0.02 for bare earth, 0.05 for commercial and industrial land, 0.08 for residential land, and 0.15 for agricultural land. The investigated area is 29.5 km², which is divided into a total of 4615 grids, each 80 m × 80 m in size. A total of 120 events were simulated and divided into 60 events for training, 30 events for validation and 30 events for testing. Each event has 24-h hydrographs of flood depths for the whole investigated area. The total number of data is 13,291,200 (120 events × 24 h × 4615 grids).

Fig. 6. Comparison of simulated versus estimated flood depths at the fifth cluster center: (a) training data; (b) validation data; (c) testing data. Axes: simulated flood depth (m) and estimated flood depth (m).

Fig. 7. Comparison of simulated versus estimated flood depths at all linear grids: (a) training data; (b) validation data; (c) testing data.

Table 4
The performance of multi-grid BPNN at the nonlinear grids.

Cluster   No. of neurons   RMSE (m)
                           Training  Validation  Testing
1         13               0.052     0.066       0.067
2         7                0.065     0.078       0.079
3         7                0.077     0.164       0.167
4         10               0.132     0.208       0.210
5         8                0.210     0.291       0.290
Mean      –                0.081     0.126       0.127

Forecasting model set-up

The regional flood inundation forecasting model was built from three classes of models in each cluster: the linear regression models for the linear grids, and two classes of BPNN models for the control points and the nonlinear grids. For the BPNN models, one or two hidden layers are tested in the study case, the activation functions are sigmoid transfer functions for both hidden and output layers, and the number of neurons is identified by a trial-and-error procedure.

Fig. 8. Comparison of simulated versus estimated flood depths at the nonlinear grids of cluster 5: (a) training data; (b) validation data; (c) testing data.

Fig. 9. Comparison of simulated versus estimated flood depths at all nonlinear grids: (a) training data; (b) validation data; (c) testing data.

Control point: the single-grid BPNN model

For each control point, 1-h-ahead flood depth forecasting is performed by the single-grid BPNN model. The input variables are the current and previous (t and t − 1) rainfall of the five nearby rain gauges and the current and previous (t and t − 1) forecast flood depths of the control point, and the output is the next-step (t + 1) flood depth, as shown in Fig. 3. The input dimension is 12 (5 gauges × 2 time steps + 2 depths) and the output dimension is 1. There are 1440 training data sets (60 events × 24 h), and 720 data sets (30 events × 24 h) each for validation and testing.
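The 12-element input and the way forecast depths are fed back as inputs can be made concrete with a short sketch. It assumes hourly rainfall records with five gauge values each and treats the trained network as an arbitrary callable; the names `build_input` and `forecast_event`, the ordering of the inputs and the zero initial depth are hypothetical choices for illustration.

```python
def build_input(rain_t, rain_prev, depth_t, depth_prev):
    """Assemble the 12 inputs: 2 x 5 rainfall values plus 2 depths."""
    if len(rain_t) != 5 or len(rain_prev) != 5:
        raise ValueError("expected rainfall from five gauges")
    return list(rain_t) + list(rain_prev) + [depth_t, depth_prev]


def forecast_event(model, rainfall, depth0=0.0):
    """Roll a 1-h-ahead model through one event.

    rainfall: list of hourly records, each holding five gauge values.
    model: any callable mapping the 12 inputs to the depth at t + 1.
    Forecast depths are fed back as inputs for the next step, since
    observed depths are not available during a flood.
    """
    depths = [depth0, depth0]  # depths at t - 1 and t before the first step
    for t in range(1, len(rainfall)):
        x = build_input(rainfall[t], rainfall[t - 1], depths[-1], depths[-2])
        depths.append(model(x))
    return depths[2:]  # the successive t + 1 forecasts
```

Any scaling of rainfall and depth into the range expected by the network would be applied inside `model`; it is omitted here.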

Linear grids: the linear regression models

For every linear grid in each cluster, we build a simple regression model for its 1-h-ahead flood depth forecasting. Every grid also has 1440 training data sets. The simple linear regression equation is defined by

$$Y = aX + b \tag{3}$$

where Y is the forecasted flood depth at the linear grid at time t + 1; X is the forecasted flood depth at the relative control point at time t + 1; and a and b are parameters.
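Eq. (3) can be fitted for each linear grid and then applied to the control point forecast, as in the sketch below. It assumes that a and b are estimated by ordinary least squares, which the text does not state explicitly; the names `fit_line` and `predict_linear_grids` are hypothetical.

```python
def fit_line(x, y):
    """Least-squares estimates of a and b in Y = a X + b.

    x: control point depths (m); y: depths (m) at one linear grid.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    return a, b


def predict_linear_grids(params, control_forecast):
    """Apply each grid's (a, b) to the control point's t + 1 forecast."""
    return [a * control_forecast + b for a, b in params]
```

Once the control point forecast for t + 1 is available, every linear grid in the cluster is updated with one multiplication and one addition, which is where much of the speed of the hybrid approach comes from.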

Nonlinear grids: the multi-grid BPNN model

For the grids with nonlinear behavior in a cluster, one multi-grid BPNN model is constructed to forecast 1-h-ahead flood depths for all of these grids in the specific cluster, as shown in Fig. 4. The input variables include meteorological information, topographic information, hydro-geomorphologic information and the flood depths at that grid and its relative control point. The meteorological information is the current (t) rainfall of the five nearby rain gauges; the topographic information is the distance between the forecasted grid and its relative control point; the hydro-geomorphologic information is Manning's roughness coefficient of the forecasted grid; the flood depths are the relative control point's next-step and current (t + 1 and t) forecast flood depths and the grid's own current and previous (t and t − 1) estimated flood depths. Note that only model-forecast (estimated) depths are available for the control point and the grids because real inundation depths cannot be observed during a flood. The dimension of the inputs is 12, and the number of training data is 60 events × 24 h × the number of nonlinear grids in each cluster.

Fig. 10. Comparison of simulated and estimated peak flood depth for the 006 event: (a) simulated result; (b) estimated result.

Table 5
Evaluation of the peak flood depths in case 006 event.

Flood depth (m)   MAE (m)   No. of grids            Inundated volume (10³ m³)
                            Simulated  Estimated    Simulated  Estimated  TIVPE (%)
0–0.5             0.04      3858       3840         1922       1368       −29
0.5–1             0.14      354        392          1665       1609       −3.3
1–2               0.13      350        327          2808       2776       −1.1
2–3               0.14      46         50           676        669        −1.1
3–4               0.25      7          6            146        143        −2.3
Total             0.06      4615       4615         7217       7086       −9.0

Results

The performance of the proposed approach is presented based on the criteria of mean absolute error (MAE), root mean square error (RMSE) and total inundated volume percentage error (TIVPE), defined as follows:

$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} \left| \hat{H}_i - H_i \right|}{n} \tag{4}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left( \hat{H}_i - H_i \right)^2}{n}} \tag{5}$$

$$\mathrm{TIVPE} = \frac{\sum_{i=1}^{n} \hat{V}_i - \sum_{i=1}^{n} V_i}{\sum_{i=1}^{n} V_i} \times 100\% \tag{6}$$

where $\hat{H}_i$ and $\hat{V}_i$ are the estimated flood depth and volume at the ith grid, respectively; $H_i$ and $V_i$ are the simulated flood depth and volume at the ith grid. TIVPE measures the total estimated inundated volume compared with the simulated volume.
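The three criteria of Eqs. (4)–(6) translate directly into code. The sketch below assumes the estimated and simulated values are passed as equal-length sequences; the function names are hypothetical.

```python
from math import sqrt


def mae(est, sim):
    """Mean absolute error, Eq. (4)."""
    return sum(abs(e - s) for e, s in zip(est, sim)) / len(sim)


def rmse(est, sim):
    """Root mean square error, Eq. (5)."""
    return sqrt(sum((e - s) ** 2 for e, s in zip(est, sim)) / len(sim))


def tivpe(est_vol, sim_vol):
    """Total inundated volume percentage error in percent, Eq. (6)."""
    return (sum(est_vol) - sum(sim_vol)) / sum(sim_vol) * 100.0
```

As a check against Table 5, `tivpe([1368], [1922])` gives about −28.8, which matches the −29% reported for the 0–0.5 m depth class.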

In the study area of Dacun Township, the K-means method partitioned the inundated area into five clusters, shown in Fig. 5. For each cluster, the 24-h average flood depth is calculated and the number of grids is counted, as shown in Table 1. Grids whose average flood depth in the training data is less than 0.01 m are taken as non-inundated grids and do not need any model for forecasting flood depth. Therefore, all grids are first identified as inundated or non-inundated; then, according to R², the inundated grids are separated into linear and BPNN classes for flood depth forecasting. In cluster 1, 2117 of the 2604 grids are non-inundated grids without any model, 76 grids are estimated by the linear models, and 411 grids are estimated by a BPNN. In other words, most of the grids in this cluster have never been inundated. In clusters 2–5, larger portions of the grids can be estimated by linear regression, which means that those grids have a highly linear correlation (R² ≥ 0.9) with their relative control points.

For each cluster, we first define the K-means cluster center as the control point and construct a single-grid BPNN based on the information of the nearby rain gauges to estimate its inundation variability. Table 2 shows that only a small number of hidden nodes is required to construct a suitable single-grid BPNN for each control point with satisfactory accuracy. The constructed single-grid BPNNs could be well trained and provide very small RMSEs (about 5 cm or less) in all situations (i.e. training, validation and testing cases) for all the control points (cluster centers). To demonstrate the performance more clearly, a comparison of the estimated results with simulated flood depths at the fifth cluster center is shown in Fig. 6. Almost all pairs of estimated and simulated depths are close to the 1-to-1 line, which indicates very good forecasting performance. These results demonstrate that inundation variability in each cluster could be predicted with high accuracy by the constructed BPNN based on the nearby rain gauges and relative control point information.

For those grids that have a highly linear correlation with their control points, we can construct a simple linear regression model for each grid. The number of regression models for each cluster is equal to the number of linear grids in that cluster. The linear parameters of Eq. (3) range from 0.27 to 7.38 for a and from −0.32 to 0.9 for b, and the average results (RMSE) of the models in each cluster are shown in Table 3. The RMSEs are quite small in all cases. The largest RMSEs are obtained in cluster 5 and are less than 19 cm. Note that the average flood depth in cluster 5 is about 1.29 m; consequently, this performance is adequate and acceptable. Fig. 7 shows the performance of the linear regression models on the training, validation and testing data sets in all clusters. The linear regression models fit the simulated flood depths of the linear grids well in all cases.

For those grids that do not have a highly linear correlation with their corresponding control points, a multi-grid BPNN is constructed to estimate their inundation variability. The numbers of hidden neurons of the BPNNs and the model performance (RMSE) are shown in Table 4. The 1-h-ahead flood depths could be well estimated by the constructed multi-grid BPNNs, especially for the grids in clusters 1 and 2, where the RMSEs are all less than 8 cm. Again, the forecast errors (RMSE) for cluster 5 are much greater than those of the other clusters, mainly because the flood depth of cluster 5 is much greater than in all other clusters. The estimated and simulated flood depths of cluster 5 are shown in Fig. 8. All pairs of estimated and simulated depths fall within a limited range of the diagonal line, which indicates that suitable forecasting has been achieved. Figs. 8 and 9 show that the BPNN is able to fit the nonlinear flood depths at the nonlinear grids in all clusters.

In summary, the study area has 4615 grids, which are divided into five clusters. Five single-grid BPNNs are constructed for the five control points, 1022 linear regression models for the 1022 linear grids, and five multi-grid BPNNs for the 1474 nonlinear grids; no model is needed for the remaining 2119 non-inundated grids.

To map model performance for the entire study area, a comparison of the flood depths and extents of a testing event obtained by the simulation model and the proposed models is shown in Fig. 10. Table 5 shows that the values of MAE are small, and that the numbers of grids and the inundated volumes are very close to the simulated inundation data at the different flood depth levels. The distribution of flood depths is almost the same. The map obtained by the proposed models fits that of the simulation model well. The hybrid models took less than 1 s, while the simulation models needed more than 17 min.


Conclusion

We present a framework for the clustering-based hybrid inundation model (CHIM) with a two-stage procedure to build the regional inundation model. By employing a clustering analysis, the modeling process can be simplified into a few clusters of similar properties suited to a linear regression model or a BPNN. The linear regression model is a simple but useful formula for predicting the output value when that variable has a highly linear correlation with the input variable. The BPNN has powerful computation and an outstanding ability to forecast the flood depths of a single control point or multiple grids. Owing to the high flexibility of the framework, the CHIM can have a number of variants. For instance, the clustering algorithm can be replaced by other algorithms for data classification. Furthermore, the BPNN can be replaced with other powerful or suitable stochastic or deterministic models.

In the last 5 years, the government has spent considerable effort and funding in Taiwan to enhance the flood inundation potential database with high-resolution DEMs; the database was generated by a series of numerical simulation models. Although the database would be very helpful for flood mitigation and flood defense strategies, this high-resolution database with a huge amount of data faces great challenges of storage, querying processes and application limitations. The proposed CHIM is an effective methodology with powerful and fast computation ability for generating a 1-h-ahead flood inundation map within a few seconds; moreover, a huge amount of inundation data does not need to be saved in data storage devices for flood warning applications. In future work, the lead time could be extended by using this flexible framework together with rainfall forecasting techniques.

Acknowledgements

The authors gratefully acknowledge the financial support for this research by the Water Resources Agency, Ministry of Economic Affairs, Taiwan, ROC (Grant No. MOEAWRA0970303). Valuable information and simulated data provided by Dr. Lai J. S., Hydrotech Research Institute, National Taiwan University, are greatly appreciated.

References

Anctil, F., Rat, A., 2005. Evaluation of neural network streamflow forecasting on 47watersheds. Journal of Hydrologic Engineering 10 (1), 85–88.

Anderberg, M.R., 1973. Cluster Analysis for Application. Acdemic, New York.Bates, P.D., Horritt, M.S., Aronica, G., Beven, K., 2004. Bayesian updating of flood

inundation likelihoods conditioned on flood extent data. Hydrological Processes18 (17), 3347–3370.

Bozinov, D., Rahnenfuhrer, J., 2002. Unsupervised technique for robust targetseparation and analysis of DNA microarray spots through adaptive pixelclustering. Bioinformatics 18 (5), 747–756.

Chang, F.J., Liang, J.M., Chen, Y.C., 2001. Flood forecasting using radial basis functionneural networks. IEEE Transactions on Systems, Man, and Cybernetics Part C:Applications and Reviews 31 (4), 530–535.

Chaves, P., Chang, F.J., 2008. Intelligent reservoir operation system based onevolving artificial neural networks. Advances in Water Resources 31, 926–936.

Chiang, Y.M., Chang, F.J., 2009. Integrating hydrometeorological information forrainfall–runoff modeling by artificial neural networks. Hydrological Processes23 (11), 1650–1659.

Cheng, S.P., Wang, R.Y., 2004. Analyzing hazard potential of typhoon damage byapplying grey analytic hierarchy process. Natural Hazards 33 (1), 77–103.

Dawson, C.W., Abrahart, R.J., Shamseldin, A.Y., Wilby, R.L., 2006. Flood estimation atungauged sites using artificial neural networks. Journal of Hydrology 319, 391–409.

Dastorani, M.T., Wright, N.G., 2001. Application of artificial neural networks forungauged catchment flood prediction. In: Floodplain Management AssociationConference, San Diego, CA (March).

Duygulu, P., Barnard, K., de Freitas, J.F.G., Forsyth, D.A., 2002. Object Recognition asmachine translation: learning a lexicon for a fixed image. Lecture Notes inComputer Science 2353, 349–354.

Fowler, H.J., Kilsby, C.G., 2002. A weather-type approach to analysing waterresource drought in the Yorkshire region from 1881 to 1998. Journal ofHydrology 262, 177–192.

Haykin, S., 2009. Neural Networks and Learning Machines, third ed. PearsonPrentice Hall, New Jersey.

Horritt, M.S., 2000. Calibration of a two-dimensional finite element flood flowmodel using satellite radar imagery. Water Resources Research 36 (11), 3279–3291.

Horritt, M.S., 2006. A methodology for the validation of uncertain flood inundationmodels. Journal of Hydrology 326, 153–165.

Horritt, M.S., Bates, P.D., 2001. Predicting floodplain inundation: raster-basedmodelling versus the finite-element approach. Hydrological Processes 15 (5),825–842.

Horritt, M.S., Bates, P.D., 2002. Evaluation of 1D and 2D numerical models forpredicting river flood inundation. Journal of Hydrology 268 (1–4), 87–99.

Hsu, M.H., Chen, S.H., Chang, T.J., 2002. Dynamic inundation simulation ofstormwater interaction between sewer system and overland flows. JournalChinese Institute of Engineers 25 (2), 171–177.

Imrie, C.E., Durucan, S., Korre, A., 2000. River flow prediction using artificial neuralnetworks: generalisation beyond the calibration range. Journal of Hydrology233, 138–153.

Kang, S.H., 2009. The application of integrated urban inundation model in Republicof Korea. Hydrological Processes (online).

Lin, G.F., Chen, L.H., 2004. A non-linear rainfall–runoff model using radial basisfunction network. Journal of Hydrology 289, 1–8.

Lin, G.F., Wu, M.C., 2007. A SOM-based approach to estimating design hyetographsof ungauged sites. Journal of Hydrology 339, 216–226.

MacQueen, J. B., 1967. Some methods for classification and analysis of multivariateobservations. In: Proceedings of 5th Berkeley Symposium on MathematicalStatistics and Probability, vol. 1. University of California Press, pp. 281–297.

Mason, D.C., Horritt, M.S., Dall’Amico, J.T., Scott, T.R., Bates, P.D., 2007. Improvingriver flood extent delineation from Synthetic Aperture Radar using airbornelaser altimetry. IEEE Transactions on Geoscience and Remote Sensing 45, 3932–3943.

Rajurkar, M.P., Kothyari, U.C., Chaube, U.C., 2004. Modeling of the daily rainfall–runoff relationship with artificial neural network. Journal of Hydrology 285, 96–113.

Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning internal representationsby error propagation. Parallel Distributed Processing 1, 318–362.

Schumann, G., Hostache, R., Puech, C., Hoffmann, L., Matgen, P., Pappenberger, F.,Pfister, L., 2007. High-resolution 3D flood information from radar for effectiveflood hazard management. IEEE Transactions on Geoscience and RemoteSensing 45, 1715–1725.

Shapiro, L.G., Stockman, G.C., 2001. Computer Vision. Prentice Hall, Upper SaddleRiver, NJ.

Sivakumar, B., Jayawardena, A.W., Fernando, TMKG., 2002. River flow forecasting:use of phase-space reconstruction and artificial neural networks approaches.Journal of Hydrology 265, 225–245.

Spivak, L., Arkhipkin, O., Pankratov, V., Vitkovskaya, I., Sagatdinova, G., 2004. Spacemonitoring of floods in Kazakhstan. Mathematics and Computers in Simulation67, 365–370.

Tavazoie, S., Hughes, J.D., Campbell, M.J., Cho, R.J., Church, G.M., 1999.Systematic determination of genetic network architecture. Nature Genetics22, 281–285.

Wei, Z., Altun, G., Harrison, R., Tai, P.C., Yi, P., 2005. Improved K-meansclustering algorithm for exploring local protein sequence motifs representingcommon structural property. IEEE Transactions on NanoBioscience 4 (3), 255–265.

Zwenzner, H., Voigt, S., 2008. Improved estimation of flood parameters bycombining space based SAR data with very high resolution digital elevationdata. Hydrology and Earth System Sciences Discussions 5 (5), 2951–2973.
