

HYDROLOGICAL PROCESSES
Hydrol. Process. 28, 5495–5502 (2014)
Published online 20 September 2013 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/hyp.10019

ROC-based calibration of flood inundation models

G. J.-P. Schumann,1* H. Vernieuwe,2 B. De Baets2 and N. E. C. Verhoest3

1 Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, 91109, USA
2 Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Coupure links 653, B-9000, Ghent, Belgium
3 Laboratory of Hydrology and Water Management, Ghent University, Coupure links 653, B-9000, Ghent, Belgium


Abstract:

The use of spatial patterns of flood inundation (often obtained from remotely sensed imagery) to calibrate flood inundation models has been widespread over the last 15 years. Model calibration is most often achieved by employing one or even several performance measures derived from the well-known confusion matrix based on a binary classification of flooding. However, it was recognized relatively early on that the use of commonly reported performance measures (such as the F measure) for calibrating flood inundation models is hampered: the calibration procedure commonly utilizes only one possible solution of a wet/dry classification of a remote sensing image [most often acquired by a synthetic aperture radar (SAR)] to calibrate or validate models, and these measures are biased towards either over-prediction or under-prediction of flooding. Despite the call in several studies for an alternative statistic, to date very few, if any, unbiased performance measures based on the confusion matrix have been proposed for flood model calibration/validation studies. In this paper, we employ a robust statistical measure that operates in the receiver operating characteristics (ROC) space and allows automated model calibration with high identifiability of the best model parameter set, without the need to classify the SAR image. The ROC-based method for flood model calibration is demonstrated using two different flood event test cases with flood models of varying degree of complexity and boundary conditions of varying degree of accuracy. Verification of the calibration results and of the optional SAR classification is successfully performed with independent observations of the events. We believe that this proposed alternative approach to flood model calibration using spatial patterns of flood inundation should be employed instead of the performance measures commonly used in conjunction with a binary flood map.
© 2013 California Institute of Technology. Hydrological Processes © 2013 John Wiley & Sons, Ltd.

KEY WORDS spatial patterns of inundation; SAR image; flood inundation model; calibration; ROC

Received 17 June 2013; Accepted 13 August 2013

INTRODUCTION

It is widely recognized that remote sensing can provide either direct observations of hydrologic and hydraulic processes from radar altimetry and interferometry (Alsdorf et al., 2001; LeFavour and Alsdorf, 2005; Frappart et al., 2006; Kiel et al., 2006; Alsdorf et al., 2007) or indirect estimations from visible, thermal (Smith, 1997; Marcus and Fonstad, 2008; Smith and Pavelsky, 2008) or radar (Smith, 1997; Schumann et al., 2007b; Hostache et al., 2009) imagery. Systematic application of imagery from optical or thermal sensors to flood mapping is hampered by persistent cloud cover during floods, particularly in small-sized to medium-sized catchments where flood waters often recede before weather conditions improve. Also, the inability to map flooding beneath vegetation canopies (see, e.g. Hess et al., 2003; Wilson et al., 2007) limits the applicability of these sensors. Routine flood detection and monitoring thus seems realistically feasible only with radar, because microwaves penetrate cloud cover and are reflected away from the sensor by smooth open water bodies (e.g. Matgen et al., 2007; Di Baldassarre et al., 2009a). However, if available, optical imagery can be easily interpreted, and flood mapping is usually more straightforward than from radar imagery (Marcus and Fonstad, 2008).

*Correspondence to: Guy J.-P. Schumann, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109, USA. E-mail: [email protected]

Although there is a clear increase in the number of remote platforms and satellites that may help in the monitoring of floods and that foster process understanding (Schumann et al., 2009b), remotely sensed data cannot dynamically reproduce processes over time, which is a crucial part of any flood management plan. Therefore, the use of hydrodynamic models, 1D, 2D or even 3D, is indispensable (e.g. Werner et al., 2005; Hunter et al., 2005). To be reliable, flood inundation models need careful calibration and, preferably, also verification, for which ground data are often scarce and relatively costly, especially for high-magnitude events (e.g. Werner et al., 2005; Hunter et al., 2005; Schumann et al., 2009b). However, satellites, which provide data at a much lower cost once in operation, are very complementary to ground data and offer an inviting alternative if no other data are available (e.g. Alsdorf et al., 2007). There are several notable studies that successfully demonstrate the integration of remotely sensed data with hydrodynamic models (Bates et al., 1997) to assess the reliability of the latter (Horritt, 2006) and quantify performance (see, e.g. Pappenberger et al., 2007) or to reduce calibration uncertainties (see, e.g. Hostache et al., 2009) and improve models using local error information (Schumann et al., 2007a). More recently, studies have looked at uncertainty in remotely sensed flood area and extent to augment the information contained in remotely sensed imagery (e.g. Schumann et al., 2009a; Stephens et al., 2012). This type of information has been shown to be more beneficial than a traditional binary (i.e. wet/dry) classification, particularly for calibrating flood inundation models (Di Baldassarre et al., 2009b).

In a similar context, other studies concerned with the integration of flood inundation models with remotely sensed information have provided critical accounts of the use of flooded area information with traditional performance measures from contingency tables (Stephens et al., 2013) and have advocated the use of water elevations at the flood edge to overcome limitations of traditional binary flood maps (Mason et al., 2009; Stephens et al., 2012). Flood model calibration conducted with binary flood maps in conjunction with traditional performance indices, such as the F measure (as commonly referred to in flood inundation studies) and others (summarized in Schumann et al., 2009b), tends to be geared towards either over-estimation or under-estimation of flooding (Schumann et al., 2009b; Stephens et al., 2013), and often the correct parameter set cannot be identified, as illustrated by Di Baldassarre et al. (2009b). In addition to ambiguities associated with performance measures for model calibration, there is the issue of choosing among the many image processing techniques that exist to derive flood area or extent from synthetic aperture radar (SAR) imagery (cf. Hess et al., 1995; Oberstadler et al., 1997; Horritt et al., 2001; Brivio et al., 2002; Schumann et al., 2009a).

The objective of this paper is to introduce an alternative to traditional measures for calibrating flood inundation models with single remotely sensed (SAR) images of flooding. The proposed method employs the receiver operating characteristics (ROC) graph, a long-used method in signal detection theory (Bradley, 1997) for visualizing, organizing and selecting classifiers on the basis of their performance, which has increasingly been used in machine learning (Fawcett, 2006). The area under the ROC curve (AUC) (Bradley, 1997; Hanley and McNeil, 1982) can then be used as a measure reflecting the performance of the different classifiers. The major advantage of the ROC curve is that it provides an automated method that does not require an optimal threshold to be chosen in order to classify the SAR image.

Table I. Confusion matrix

                              True class
                              Positives    Negatives
Predicted class   Positives   Tp           Fp
                  Negatives   Fn           Tn
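To make Table I concrete for the flood case, the short Python sketch below counts the four cells for a simulated and an observed wet/dry map and evaluates the F measure mentioned in the Introduction, in the form widely used in flood inundation studies, F = Tp/(Tp + Fp + Fn), i.e. the wet area common to both maps divided by the wet area of their union. It assumes, for illustration only, that both maps are already on a common grid and flattened to lists of booleans (True = flooded); the function names are hypothetical.

```python
from typing import Dict, Sequence


def confusion_counts(predicted: Sequence[bool], observed: Sequence[bool]) -> Dict[str, int]:
    """Cells of Table I for two wet/dry maps on the same grid (True = flooded = positive).

    'predicted' (e.g. the model) plays the predicted class, 'observed' (e.g. a
    classified SAR image) the true class.
    """
    if len(predicted) != len(observed):
        raise ValueError("maps must cover the same pixels")
    counts = {"Tp": 0, "Fp": 0, "Fn": 0, "Tn": 0}
    for p, o in zip(predicted, observed):
        if p and o:
            counts["Tp"] += 1  # wet in both maps
        elif p:
            counts["Fp"] += 1  # predicted wet, observed dry: over-prediction
        elif o:
            counts["Fn"] += 1  # predicted dry, observed wet: under-prediction
        else:
            counts["Tn"] += 1  # dry in both maps
    return counts


def f_measure(c: Dict[str, int]) -> float:
    """Wet area common to both maps divided by the wet area of their union."""
    union = c["Tp"] + c["Fp"] + c["Fn"]
    return c["Tp"] / union if union else float("nan")


model = [True, True, True, False, False, False]
sar = [True, True, False, True, False, False]
counts = confusion_counts(model, sar)
print(counts, f_measure(counts))  # {'Tp': 2, 'Fp': 1, 'Fn': 1, 'Tn': 2} 0.5
```

Because F depends on Fp and Fn only through their sum, a run that over-predicts and a run that under-predicts by the same number of pixels receive the same score, and Tn does not enter at all.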

The paper is organized as follows. The second section briefly explains the meaning of a ROC curve and its AUC. The third section describes the two flood event test cases used to demonstrate the proposed method, and the fourth section illustrates and discusses the application of ROC curves to calibrate a hydrodynamic model on the basis of SAR imagery obtained for the two flood events.

ROC CURVE ANALYSIS

In a binary or two-class classification problem, instances belong to either a positive (e.g. flooded) or a negative (e.g. non-flooded) class. To assign instances to those classes, a classification model is generally needed. On the basis of a test set of labelled instances and a classification model, a contingency table or confusion matrix can be constructed, from which basic metrics or performance criteria can be calculated. Such a confusion matrix (Table I) depicts the difference between the true and predicted classes for those instances. The number of positive (negative) instances that have correctly been classified as positive (negative) is denoted as true positives (negatives) Tp (Tn), whereas the negative (positive) instances that have been erroneously classified as positive (negative) are denoted as false positives (negatives) Fp (Fn).

When a classifier yields a continuous output, a threshold is generally employed such that a binary output is obtained, and the confusion matrix is constructed for that particular threshold. Yet, to compare two classifiers, a single measure that is invariant to the selected threshold is needed. A measure that meets this criterion is the AUC. In a ROC graph (Figure 1), the Tp rate, Tp/(Tp + Fn), is plotted as a function of the Fp rate, Fp/(Fp + Tn). Hence, the relative trade-offs between benefits (Tp) and costs (Fp) are depicted (Fawcett, 2006). A perfect binary classification model is represented by the point (0, 1), reflecting that it correctly classified all positive instances (Tp rate = 1) while no negative instance was misclassified (Fp rate = 0). The diagonal line Tp rate = Fp rate represents the strategy of randomly guessing a class, and a random classification model hence produces a ROC point that 'slides' back and forth on the diagonal (Fawcett, 2006). For classification models with a continuous output, the ROC curve can be constructed by varying a threshold from −∞ to +∞ (see Figure 1 for a representation of ROC space and a hypothetical example of a ROC curve). The ability of two classification models to rank positive instances relative to negative instances can then be compared on the basis of their AUC (represented by the grey area in Figure 1). The model with the largest AUC shows the best performance. As a model that randomly guesses a class yields the diagonal as its ROC curve, and hence an AUC of 0.5, the AUC of a realistic classification model should be greater than 0.5.

Figure 1. Representation of a ROC curve in ROC space. The area under the curve, with an AUC value of approximately 0.67, is indicated in grey. The dashed diagonal line (Tp rate = Fp rate) represents a random guess with an AUC of 0.5.
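The ROC construction and AUC described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: it assumes higher scores mean 'more likely positive', sweeps the threshold over every distinct score (equivalent to sweeping from −∞ to +∞, because the confusion matrix only changes at observed score values), integrates the resulting curve with the trapezoidal rule and cross-checks the area against its rank interpretation, the probability that a randomly chosen positive instance scores higher than a randomly chosen negative one (Hanley and McNeil, 1982). Function names are hypothetical, and the quadratic loops favour clarity over speed.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[bool]) -> List[Tuple[float, float]]:
    """(Fp rate, Tp rate) pairs obtained by sweeping a threshold over all distinct scores.

    An instance is predicted positive when its score is >= the threshold. The curve
    starts at (0, 0) (threshold above every score) and ends at (1, 1).
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative instance")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a ROC curve whose points are ordered by non-decreasing Fp rate."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, True, False, False]
curve = roc_points(scores, labels)
print(auc_trapezoid(curve), auc_rank(scores, labels))  # 0.8125 0.8125
```

Ties between a positive and a negative score produce a diagonal step in the curve, which the trapezoidal rule credits with one half, matching the rank version.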

The idea that a ROC curve reflects the ability to distinguish positive and negative instances, without the need to first identify an optimal threshold, is adopted in this paper. By moving the threshold from −∞ to +∞ on a SAR image, and calculating the corresponding points in ROC space w.r.t. a binary flood map simulated by a hydrodynamic model for a certain combination of parameter values, all information present in the SAR image is employed. By repeating the identification of a ROC curve for other combinations of model parameter values, several ROC curves, with corresponding AUCs, are obtained. The statistical test of De Long et al. (1988), developed for the comparison of AUCs, was employed, and a Bonferroni–Holm correction was applied to the obtained p-values to adjust them for the fact that multiple pairwise comparisons are performed.

TEST CASES AND HYDRODYNAMIC MODELLING

This section outlines the two modelling test cases that were used to demonstrate the ROC-based method for calibration of flood inundation models using spatial patterns of inundated area from space-borne SAR. Both test cases are shown in Figure 2 and described in detail in the literature (Schumann et al., 2009a; Di Baldassarre et al., 2009b; Schumann et al., 2013) and will only be briefly recapitulated here. In our case studies, the SAR flood images (C-band: ∼5.6 cm wavelength) were acquired with two different resolution modes, and the models applied to each test case had a different formulation, although in both cases the skeleton of the model was LISFLOOD-FP (Bates and De Roo, 2000; Bates et al., 2010).

Figure 2. Demonstration test case of (a) the River Dee and (b) the Lower Zambezi River showing floodplain elevation including delineation of low-lying area and river centerlines as well as the SAR image of the flood event. Note that the domain size of the River Dee test case is about 46 km² while that of the Lower Zambezi River is about 170 000 km².
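Before the individual test cases are described, the Bonferroni–Holm step mentioned at the end of the ROC curve analysis section can be made concrete. The Python sketch below adjusts a family of p-values with Holm's step-down procedure; it assumes the De Long p-values for the pairwise AUC comparisons have already been computed (the De Long test itself is not reproduced here), and the labels and p-values in the example are invented for illustration, not results from this study.

```python
from typing import Dict


def holm_adjust(p_values: Dict[str, float]) -> Dict[str, float]:
    """Holm (step-down Bonferroni) adjusted p-values for a family of m tests.

    The i-th smallest raw p-value (i = 1..m) is multiplied by (m - i + 1);
    a running maximum keeps the adjusted values in the same order as the raw
    ones, and every value is capped at 1.
    """
    m = len(p_values)
    adjusted: Dict[str, float] = {}
    running_max = 0.0
    for i, (label, p) in enumerate(sorted(p_values.items(), key=lambda kv: kv[1])):
        running_max = max(running_max, min(1.0, (m - i) * p))  # i is 0-based here
        adjusted[label] = running_max
    return adjusted


# Invented p-values for comparing the best run's AUC with four other runs.
raw = {"n=0.01": 0.001, "n=0.05": 0.20, "n=0.07": 0.03, "n=0.10": 0.004}
for label, p_adj in holm_adjust(raw).items():
    verdict = "different" if p_adj < 0.05 else "not significantly different"
    print(f"{label}: adjusted p = {p_adj:.3f} ({verdict})")
```

Holm's procedure controls the family-wise error rate like the plain Bonferroni correction but is uniformly less conservative, which helps when many parameter sets are compared with one another.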

The River Dee test case

The River Dee flood event in December 2006 is well documented by Schumann et al. (2009a) and Di Baldassarre et al. (2009b) in terms of the SAR data and the hydrodynamic modelling with LISFLOOD-FP, respectively. The flood image used for this test case was a 12.5-m resolution precision SAR image (PRI) in VV polarization, originating from a 25-m geometric resolution recorded by the ERS-2 satellite right after the prolonged flood peak.

The event was simulated using the inundation model LISFLOOD-FP after Bates and De Roo (2000) in 1D–2D mode (hereafter referred to as LISFLOOD-FP). The channel in the model was discretized separately from the overlying floodplain raster grid using width and elevation from cross-section surveys conducted by the Environment Agency of England and Wales. The model interpolates between cross-sections and provides a seamless combination with the floodplain digital terrain model. Flows along the channel are simulated using a simple approximation to the 1D St-Venant equations, and when bankfull depth is exceeded, water spills out onto the adjacent floodplain, where flows are simulated using an approximation to a 2D diffusive wave implemented on a regular grid. The LISFLOOD-FP model was applied to the test site using a grid resolution of 20 m, obtained by averaging the LiDAR floodplain topography. The free parameters in this model that require calibration are Manning's roughness coefficients (n) for the channel and floodplain.
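For the River Dee, the SAR image (12.5-m pixels) and the model grid (20 m) differ in resolution, and the results section explains that each SAR pixel was matched to the model cell that contains it rather than resampling either raster. The Python sketch below shows one way to perform such a lookup, assuming both grids are north-up, share one projected coordinate system and are each described by an upper-left corner and a square cell size; the corner coordinates and function names are hypothetical.

```python
import math
from typing import Optional, Tuple


def cell_index(x: float, y: float, origin_x: float, origin_y: float, cell_size: float,
               n_rows: int, n_cols: int) -> Optional[Tuple[int, int]]:
    """(row, col) of the north-up grid cell containing map point (x, y), or None if outside.

    origin_x, origin_y: coordinates of the grid's upper-left corner. Rows increase
    southwards (decreasing y) and columns eastwards (increasing x).
    """
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((origin_y - y) / cell_size)
    if 0 <= row < n_rows and 0 <= col < n_cols:
        return row, col
    return None


def pixel_centre(row: int, col: int, origin_x: float, origin_y: float,
                 cell_size: float) -> Tuple[float, float]:
    """Map coordinates of the centre of pixel (row, col) on a north-up grid."""
    return origin_x + (col + 0.5) * cell_size, origin_y - (row + 0.5) * cell_size


# Hypothetical SAR (12.5 m) and model (20 m) grids sharing an upper-left corner.
X0, Y0 = 330000.0, 360000.0
x, y = pixel_centre(3, 5, X0, Y0, cell_size=12.5)
print(cell_index(x, y, X0, Y0, cell_size=20.0, n_rows=100, n_cols=100))  # (2, 3)
```

Because every SAR pixel keeps its own value, the full SAR image still enters the ROC analysis; the price is that, on average, about (20/12.5)² ≈ 2.6 SAR pixels point at the same model cell.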

The Lower Zambezi River test case

The second test case is a large-scale application (170 000 km²) to the Lower Zambezi River in the Mozambique Delta. The flood chosen for this demonstration is the February 2007 event, which was captured by the Envisat-ASAR instrument at 75-m pixel resolution in wide swath mode and HH polarization, just a few days after the high-magnitude peak.

A 2D flood model at 1-km² grid resolution was built for the Lower Zambezi basin including the coastal region of Mozambique. The setup and performance validation of the model are described in detail in Schumann et al. (2013), who coupled this model with streamflow forecasts from a large-scale hydrologic model. The version of the LISFLOOD-FP model used in this test case is based on the same skeleton as that applied to the River Dee test case. The main difference is that this model employs a recent update of the model physics, which includes the inertial term of the 1D St-Venant equations (Bates et al., 2010) (hereafter referred to as LISFLOOD-FP ACC). In addition, for this particular test case, Schumann et al. (2013) complemented this update with a sub-grid structure developed by Neal et al. (2012) for simulating flows in channels much smaller than the actual grid resolution of the model without the need for a separate 1D channel model (as in the River Dee test case).

Setting up the channel sub-grid structure requires actual (sub-grid) channel width and bank height information, which were obtained from the global river widths database produced by Andreadis et al. (2013) and the 90-m Shuttle Radar Topography Mission digital elevation model (SRTM DEM), respectively. We resampled the 90-m SRTM DEM, which has a reported mean vertical floodplain error of 1.95 m in the study area (Karlsson and Arnberg, 2011), to 1 km for simulating the floodplain. Removal of tall floodplain vegetation was deemed unnecessary in our case, as land cover in the Lower Zambezi floodplain is primarily short vegetation made up of savanna, herbaceous and degraded vegetation, and agricultural fields (The World Bank, 2006). Resampling to 1 km speeds up the model by about 150 times, thus allowing calibration of the model at the scale we wish to simulate. As the model inverts these bank heights to channel depths, smoothing of the bank heights in the flow direction, using local weighted regression, was necessary to avoid model instabilities that might be caused by the large height variations along riverbanks typically present in SRTM data. The sub-grid channel formulation converts bank height into channel bed elevation using a derivation of downstream hydraulic geometry:

$$d = a\,w^{b} \tag{1}$$

where d is the depth below the river bank elevation, w is the bankfull width, for which estimates now exist at the global level (Andreadis et al., 2013) or could be derived from satellite imagery, and both the coefficient a and the exponent of w, i.e. b, are free parameters of the model formulation that need to be estimated or calibrated (Neal et al., 2012) in addition to Manning's roughness (n).

This particular setup allows modelling of flood flows in two dimensions with efficient computational speeds while still allowing representation of channels smaller than the model grid resolution (1 km² in our case).
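Equation (1) can be applied cell by cell to convert a (smoothed) SRTM bank elevation into a sub-grid channel bed elevation. The Python sketch below assumes that d and w are both expressed in metres; a and b are set to the calibrated Zambezi values reported in the results (a = 0.055, b = 0.82), whereas the bank elevation and widths are hypothetical, as is the function name.

```python
def channel_bed_elevation(bank_elevation: float, width: float, a: float, b: float) -> float:
    """Bed elevation implied by Equation (1), d = a * w**b, with d measured below the bank."""
    if width <= 0:
        raise ValueError("bankfull width must be positive")
    depth = a * width ** b
    return bank_elevation - depth


# a, b: calibrated Zambezi values quoted in the results; bank elevation and widths: hypothetical.
bank = 20.0
for w in (50.0, 200.0, 800.0):
    bed = channel_bed_elevation(bank, w, a=0.055, b=0.82)
    print(f"w = {w:5.0f} m -> d = {bank - bed:5.2f} m, bed elevation = {bed:6.2f} m")
```

Because b < 1, doubling the width increases the depth by a factor of 2^b ≈ 1.77 rather than 2.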

RESULTS AND DISCUSSION

Model calibration and SAR classification

For the River Dee test case, following the model setup described earlier, ten different LISFLOOD-FP model simulations were performed by varying Manning's n from 0.01 to 0.1 in steps of 0.01. The model structure employed for the Lower Zambezi domain required the calibration of three parameters, as explained in the third section: the hydraulic geometry parameters a and b as well as n.
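Combining these parameter sweeps with the procedure of the second section, calibration reduces to scoring every simulated wet/dry map against the same unclassified SAR image by its AUC and keeping the largest; the classification threshold can afterwards be read off the winning curve, as described later in this section. The Python sketch below illustrates that loop under several assumptions: maps are flattened to lists on a common grid; `simulate` is a placeholder for a LISFLOOD-FP run; the Zambezi step sizes are guessed (chosen only so that the grid has the 165 members reported); 'furthest from the random guess line' is implemented as the maximum of Tp rate − Fp rate (Youden's J); and the eight-pixel transect and its three runs are invented. The De Long comparison is omitted.

```python
import itertools
from typing import Callable, Dict, Hashable, Iterable, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Rank form of the AUC: P(random positive outscores random negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("a simulation needs both wet and dry pixels")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def calibrate(parameter_sets: Iterable[Hashable],
              simulate: Callable[[Hashable], Sequence[bool]],
              backscatter_db: Sequence[float]) -> Tuple[Hashable, Dict[Hashable, float]]:
    """AUC of the SAR image against each simulated wet/dry map; returns the best set and all AUCs."""
    # Smooth open water reflects energy away from the sensor, so low backscatter suggests
    # flooding; negating the dB values gives "higher score = more likely wet".
    scores = [-s for s in backscatter_db]
    aucs = {p: auc(scores, simulate(p)) for p in parameter_sets}
    best = max(aucs, key=aucs.get)
    return best, aucs


def youden_threshold(backscatter_db: Sequence[float], wet: Sequence[bool]) -> float:
    """Threshold t (pixel classified wet if its backscatter <= t) maximising Tp rate - Fp rate."""
    n_pos = sum(wet)
    n_neg = len(wet) - n_pos

    def j(t: float) -> float:
        tp = sum(1 for s, w in zip(backscatter_db, wet) if s <= t and w)
        fp = sum(1 for s, w in zip(backscatter_db, wet) if s <= t and not w)
        return tp / n_pos - fp / n_neg

    return max(sorted(set(backscatter_db)), key=j)


# Dee: n = 0.01, ..., 0.10 as in the text. Zambezi: ranges from the text, but the
# step sizes are an assumption chosen only to give 11 x 3 x 5 = 165 runs.
dee_grid = [(round(0.01 * k, 2),) for k in range(1, 11)]
a_values = [round(0.045 + 0.01 * k, 3) for k in range(11)]
b_values = [0.78, 0.80, 0.82]
n_values = [round(0.02 + 0.01 * k, 2) for k in range(5)]
zambezi_grid = list(itertools.product(a_values, b_values, n_values))
print(len(dee_grid), len(zambezi_grid))  # 10 165

# Invented 8-pixel transect: SAR backscatter (dB) and the wet/dry maps of three hypothetical runs.
sar_db = [-5.0, -6.0, -16.0, -17.0, -18.0, -15.0, -6.0, -4.0]
runs = {
    (0.03,): [True, True, True, True, False, False, False, False],
    (0.06,): [False, True, True, True, True, False, False, False],
    (0.09,): [False, False, True, True, True, True, False, False],
}
best, aucs = calibrate(runs, runs.__getitem__, sar_db)
print(best, aucs, youden_threshold(sar_db, runs[best]))
# (0.09,) {(0.03,): 0.53125, (0.06,): 0.90625, (0.09,): 1.0} -15.0
```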


Following the setup in Schumann et al. (2013), LISFLOOD-FP ACC in sub-grid mode was run 165 times in OpenMP mode (Neal et al., 2009) using combinations of parameter values. A limited but physically sensible range of values [0.78, 0.82] for the depth exponent, b, was taken from Park (1977). As the depth parameterization is correlated with the roughness coefficient in the model, we also chose a limited but plausible range for the latter, between 0.02 and 0.06. This means that only the depth coefficient, a, was allowed to vary substantially [0.045, 0.145], which we believe to be in line with the fact that we only had one type of data available for calibration; calibrating several model parameters with only limited data can lead to model equifinality (Beven, 2006). For clarity, we note that in both case studies Manning's coefficient refers to the channel only; the floodplain roughness value was kept constant at 0.06.

Prior to the ROC analysis (i.e. model calibration), a sigma Lee spatial smoothing filter (9×9 kernel) was applied to both SAR images to remove most speckle (i.e. random image noise with a 'salt and pepper' effect obscuring features of interest), given a low signal-to-noise ratio at high spatial resolutions. In the case of the Zambezi River, the SAR image was aggregated to the model resolution (1 km) given the size of the image (12.2 million pixels) and the number of model runs (note that in the proposed ROC-based method, the entire image is considered for each of the 165 model simulations).

As outlined previously, we selected the ROC-based
approach for model calibration with SAR imagery of flooding because it allows positive and negative instances to be distinguished without the need to first identify an optimal image classification threshold and hence classify the image into a binary flood map. This characteristic is desirable for obvious reasons, especially when working with SAR imagery, which presents high bit-rate data and can be very noisy (as illustrated well in the River Dee test case example in Figure 2a). Furthermore, the well-known nuisance of blurred land/water edges, emergent flooded vegetation and roughening of water surfaces in SAR images complicates the classification of inundation patterns and the derivation of associated uncertainties, which can be very large. By moving the threshold from −15.7 to +21.8 dB on the River Dee SAR image and from −21.5 to −2.7 dB on the Zambezi River SAR image, and calculating the corresponding points in ROC space w.r.t. a binary flood map obtained by the hydrodynamic model for a certain combination of parameter values, all information present in the SAR image is employed, with no need for a priori knowledge of the SAR image characteristics and degree of information content. To account for the difference in spatial resolution between the model and the SAR image (in the River Dee test case), the corresponding pixel indices in the model simulation were computed for each SAR pixel coordinate pair, thus avoiding the need to resample either one. Repeating this process for every combination of model parameter values allowed the generation of many different ROC curves, with corresponding AUCs. Figure 3 shows all the ROC curves obtained by combining the SAR image with each model simulation, for both test cases.

Figure 3. ROC curves obtained for both (a) the River Dee test case and (b) the Zambezi River test case. Every point on a ROC curve is obtained by computing the Tp rate and the Fp rate from a direct comparison between a model simulation and a classified SAR image using a particular threshold varied from −15.7 to +21.8 dB (for the River Dee) and from −21.5 to −2.7 dB (for the Zambezi River). For each test case, the AUC for each model simulation is also plotted. The model simulation yielding the largest AUC (indicated by the red arrow) gives the best performance (see text in the second section for more details).

All corresponding AUCs were computed and statistically compared (using the De Long test and a Bonferroni–Holm correction to account for multiple pairwise comparisons) to guide identification of the optimal model parameter set as well as the optimal SAR image classification threshold. As each point on a ROC curve represents a direct comparison between a model simulation and a SAR image classified
with a particular threshold, the optimal threshold can be found, if necessary, by selecting, on the ROC curve with the largest AUC, the point that is furthest from the random guess line and located in the upper left part of ROC space. Figure 4 shows the simulation of the calibrated model at the time of the SAR satellite overpass and the optimally classified image for both the River Dee and Zambezi River test cases.

With regard to the ROC curves, the River Dee example shows curves of a more ideal shape (AUC for all model simulations ranged from 0.70 to 0.84), i.e. curves that bend further towards coordinate (0, 1) of ROC space, the point that the best possible combination of a SAR classification and model simulation would yield (i.e. a perfect classification or prediction), representing no false negatives (100% sensitivity) and no false positives (100% specificity). Although the Zambezi River example is much less ideal, achieving model calibration and an optimal SAR image classification was still possible, given that all model simulations, when assessed with all the information contained in the SAR image, showed 'realistic' prediction skill (i.e. most of the ROC curve points lie above the random guess line), with all AUCs between 0.55 and 0.61 (Figure 3). Also noteworthy is that in both test cases, some model simulations generated AUCs that were not significantly different from the largest AUC (e.g. for the River Dee test case, those were the models run with n values of 0.05, 0.06 and 0.07). Nonetheless, the parameter (set) that generated the model simulation with the largest AUC is easily identifiable and was selected as the optimum (indicated by the red arrow in Figure 3).

Figure 4. ROC analysis results. Flood event simulation of the calibrated model at the time of SAR overpass as well as the optimal SAR image classification for (a) the River Dee test case and (b) the Zambezi River test case are shown.

Verification of results

Verification of the calibration results was performed using independent observations of the events. In the case of the River Dee, the calibrated model, i.e. the simulation that yielded the maximum AUC, had a Manning's n value of 0.06. This is the same optimum as that obtained by Di Baldassarre et al. (2009b) for the same test case using a probability-based calibration method with a multi-algorithm flood map obtained by fusing an ensemble of different wet/dry classifications from SAR (Schumann et al., 2009a). Validation of the classified SAR image shown in Figure 4(a), obtained with the proposed ROC-based method, was carried out using the same multi-algorithm probability flood map produced by Schumann et al. (2009a), which was employed by Di Baldassarre et al. (2009b) for model calibration. Performing an overlay operation between the two maps and taking the average of the probabilities of the pixels that are denoted as flooded
according to the binary flood map give a value of 0.87(where a value of 1 in the probability of inundation mapdenotes wet in all classification cases). Although thesecomparisons might not be objective as the methods onwhich the verification is performed first classify the SARimage into binary flood maps, we wish to stress that thoseare the only independent results available for assessment.In the Zambezi River test case, the area predicted

correctly between the calibrated model simulation and theLandsat image of the flood event used for model validationin Schumann et al. (2013) is 86.3%. Even though thecalibrated parameter values (a=0.055, b= 0.82, n=0.02)are close to the boundaries of the parameter space, thesimulated in-channel water levels are still within theuncertainty of the ICESat laser altimeter-observed waterlevels (± 3σ, after Hall et al., 2012) used for calibration inSchumann et al. (2013). Although no independent obser-vations were available at the time of the SAR acquisition toallow validation of the image classification shown inFigure 4(b), it is noteworthy that the optimal SARbackscatter (σ) threshold of � 8.96 dB agrees well withtypical values for non-forested flood waters in HHpolarization [�12 dB, �8 dB] reported in the literature(e.g. Manjusree et al., 2012) and is very close to thatidentified for the River Dee image {� 9.84 dB; see alsoManjusree et al. (2012) for a range of typical SAR σvalues [�15 dB, �6 dB] for non-forested flood waters inVV polarization}.Although a combination of the information contained in

the SAR image and a particular simulation of the modeldetermines the position of the calibration curves in ROCspace, the lower performance during calibration in the caseof the Zambezi River can be largely attributed to the muchpoorer quality of the model boundary conditions rather thanthe amount of useful information contained in the SARimage.Model inflows for the Zambezi River were simulatedusing the large-scale Variable Infiltration Capacity (VIC)hydrology model forced with Tropical Rainfall MeasuringMission (TRMM) satellite precipitation and EuropeanCentre for Medium-Range Weather Forecasts (ECMWF)analysis data [see Schumann et al., (2013) for details], andLISFLOOD-FP ACC was run using SRTM floodplaintopography and parameterized channel bathymetry in sub-grid mode, whereas in the test case of the River Dee, gaugedflows and water levels, LiDAR topography and surveyedchannel bathymetric data were available.
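The two verification checks described under "Verification of results" reduce to simple array operations. The sketch below is a minimal illustration, not the authors' code. It assumes that the binary flood map and the multi-algorithm probability map are co-registered NumPy arrays, and it reads the ±3σ criterion as "simulated level within the observed level ± 3σ". Both function names are hypothetical, and the toy values are invented.

```python
import numpy as np


def mean_probability_over_wet(binary_wet, prob_map):
    """Average an inundation-probability map over the pixels that a binary
    flood map marks as wet (hypothetical helper; grids must be co-registered)."""
    wet = np.asarray(binary_wet, dtype=bool)
    prob = np.asarray(prob_map, dtype=float)
    if wet.shape != prob.shape:
        raise ValueError("binary and probability maps must share one grid")
    if not wet.any():
        raise ValueError("the binary map contains no wet pixels")
    # A probability of 1 means every classification in the ensemble called the pixel wet.
    return float(prob[wet].mean())


def within_altimetry_uncertainty(simulated, observed, sigma, k=3.0):
    """Element-wise check that simulated water levels fall inside the
    observed level +/- k*sigma band (k=3 mirrors the +/-3 sigma criterion)."""
    sim = np.asarray(simulated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    sig = np.asarray(sigma, dtype=float)
    return np.abs(sim - obs) <= k * sig


if __name__ == "__main__":
    # Toy 2 x 3 grids, invented for illustration only.
    binary = np.array([[1, 1, 0], [0, 1, 0]])
    prob = np.array([[1.0, 0.8, 0.1], [0.0, 0.9, 0.2]])
    print(mean_probability_over_wet(binary, prob))  # about 0.9
    print(within_altimetry_uncertainty([10.2, 11.0], [10.0, 10.0], [0.1, 0.2]))
```

In the River Dee case, the first function applied to the ROC-derived binary map and the probability map of Schumann et al. (2009a) would correspond to the reported value of 0.87.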

CONCLUDING REMARKS

In this paper, we proposed an alternative to traditional measures for calibrating flood inundation models with single remotely sensed SAR images of flooding. Traditional performance measures based on the well-known confusion matrix are commonly used to calibrate or validate flood inundation models with regard to spatial patterns of inundation. However, using these performance measures with a binary classification of flooding to calibrate flood inundation models has been criticized over the years for two reasons. First, such studies use only one possible solution of a wet/dry classification of the SAR image to calibrate or validate models. Second, the measures are biased towards either over-prediction or under-prediction of flooding.

The proposed method operates in the ROC space, which depicts the relative trade-offs between benefits (Tp) and costs (Fp) when a SAR image is compared with model simulations of different parameterizations. ROC curves are constructed by computing the Tp rate and the Fp rate from the confusion matrix of a model simulation and a SAR image classified with a particular threshold, with the threshold varied from −∞ to +∞. The AUC quantifies the performance of each model; a realistic model simulation has an AUC greater than 0.5 (the random guess line). At the same time, the point on the ROC curve of the best model (i.e. the ROC curve with the largest AUC) that lies furthest from the random guess line pinpoints the optimal SAR classification threshold. This avoids any a priori processing of the image into wet/dry classes.

The ROC-based method for flood model calibration is statistically robust. It has been demonstrated using two different flood event test cases with flood models of varying complexity and boundary conditions of varying accuracy. Verification of the calibration results and SAR classification was successfully performed with independent observations of the events. We believe that this alternative approach to flood model calibration using spatial patterns of flood inundation should be employed instead of the performance measures commonly used in conjunction with a binary flood map. The ROC-based method avoids any a priori classification of the flooded area and, as demonstrated, leads to high identifiability of an optimal model parameter set. It also avoids subjectivity altogether, be it in the selection of an image classification threshold or of the best possible model parameter set.
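The ROC construction and selection steps summarized above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Each simulated wet/dry map is treated as the reference, and the thresholded backscatter is treated as the classifier. A pixel is classified wet when its backscatter is at or below the threshold, since open water appears dark in SAR. The AUC is computed with the trapezoidal rule.

To find the point "furthest from the random guess line", the sketch relies on a simple geometric fact: the perpendicular distance from (FPR, TPR) to the diagonal TPR = FPR is (TPR − FPR)/√2, so the sketch maximizes TPR − FPR. All function names, the dictionary layout and the toy data are hypothetical. The sketch does not test whether AUC differences are significant.

```python
import numpy as np


def roc_curve_from_sar(backscatter_db, simulated_wet):
    """Sweep a backscatter threshold from -inf to +inf and return ROC points.

    Assumption: a pixel is classified wet when backscatter <= threshold, and
    the simulated wet/dry map is the reference. Returns (fpr, tpr, thresholds)
    running from (0, 0) to (1, 1).
    """
    score = np.asarray(backscatter_db, dtype=float).ravel()
    label = np.asarray(simulated_wet, dtype=bool).ravel()
    if score.shape != label.shape:
        raise ValueError("SAR image and simulation must share one grid")
    n_pos = int(label.sum())
    n_neg = label.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("simulation must contain both wet and dry pixels")
    order = np.argsort(score, kind="mergesort")
    score, label = score[order], label[order]
    # Raising the threshold past a pixel moves it into the SAR "wet" class.
    tp = np.cumsum(label)
    fp = np.cumsum(~label)
    # Pixels with tied backscatter enter together: keep the last index of each tie run.
    last = np.r_[np.nonzero(np.diff(score))[0], score.size - 1]
    tpr = np.r_[0.0, tp[last] / n_pos]
    fpr = np.r_[0.0, fp[last] / n_neg]
    thresholds = np.r_[-np.inf, score[last]]
    return fpr, tpr, thresholds


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def calibrate(backscatter_db, simulations):
    """Pick the simulation with the largest AUC, then the threshold on its
    ROC curve furthest above the random guess line (max of TPR - FPR).

    `simulations` maps a parameter value (e.g. Manning's n) to a boolean
    wet/dry map on the SAR grid; this structure is hypothetical.
    """
    scored = {}
    for param, wet in simulations.items():
        fpr, tpr, thr = roc_curve_from_sar(backscatter_db, wet)
        scored[param] = (auc(fpr, tpr), fpr, tpr, thr)
    best = max(scored, key=lambda p: scored[p][0])
    best_auc, fpr, tpr, thr = scored[best]
    j = int(np.argmax(tpr - fpr))
    return best, best_auc, float(thr[j])


if __name__ == "__main__":
    # Six invented pixels (dB) and two invented simulations keyed by n.
    sar = np.array([-15.0, -14.0, -12.0, -7.0, -5.0, -3.0])
    sims = {
        0.03: np.array([1, 1, 1, 0, 0, 0], dtype=bool),
        0.05: np.array([1, 0, 1, 0, 1, 0], dtype=bool),
    }
    print(calibrate(sar, sims))
```

In this toy case, the first simulation separates dark and bright pixels perfectly. It therefore wins with an AUC of 1, and its optimal threshold is −12 dB. The second simulation reaches an AUC of only 2/3.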

ACKNOWLEDGEMENTS

This research was performed in the framework of project G.0837.10 granted by the Research Foundation Flanders (FWO) and the STEREO project (SR/02/152) financed by the Belgian Science Policy Office (BELSPO). Part of this research was also carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.


REFERENCES

Alsdorf DE, Smith LC, Melack JM. 2001. Amazon floodplain water level changes measured with interferometric SIR-C radar. IEEE Transactions on Geoscience and Remote Sensing 39(2): 423–431.

Alsdorf DE, Rodriguez E, Lettenmaier DP. 2007. Measuring surface water from space. Reviews of Geophysics 45. DOI: 10.1029/2006RG000197.

Andreadis KM, Schumann GJ-P, Pavelsky T. 2013. A simple global river bankfull width and depth database. Water Resources Research. DOI: 10.1002/wrcr.20440.

Bates PD, De Roo AP. 2000. A simple raster-based model for flood inundation simulation. Journal of Hydrology 236: 54–77.

Bates PD, Horritt MS, Smith CN, Mason DC. 1997. Integrating remote sensing observations of flood hydrology and hydraulic modelling. Hydrological Processes 11: 1777–1795.

Bates PD, Horritt MS, Fewtrell TJ. 2010. A simple inertial formulation of the shallow water equations for efficient two dimensional flood inundation modelling. Journal of Hydrology 387: 33–45.

Beven K. 2006. A manifesto for the equifinality thesis. Journal of Hydrology 320: 18–36.

Bradley AP. 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30(7): 1145–1159.

Brivio PA, Colombo R, Maggi M, Tomasoni R. 2002. Integration of remote sensing data and GIS for accurate mapping of flooded areas. International Journal of Remote Sensing 23(3): 429–441.

De Long ER, De Long DM, Clarke-Pearson DL. 1988. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44(3): 837–845.

Di Baldassarre G, Schumann G, Bates P. 2009a. Near real time satellite imagery to support and verify timely flood modelling. Hydrological Processes 23: 799–803.

Di Baldassarre G, Schumann G, Bates PD. 2009b. A technique for the calibration of hydraulic models using uncertain satellite observations of flood extent. Journal of Hydrology 367: 276–282.

Fawcett T. 2006. An introduction to ROC analysis. Pattern Recognition Letters 27: 861–874.

Frappart F, Do Minh K, L'Hermitte J, Cazenave A, Ramillien G, Le Toan T, Mognard-Campbell N. 2006. Water volume change in the lower Mekong from satellite altimetry and imagery data. Geophysical Journal International 167: 570–584.

Hall AC, Schumann GJ-P, Bamber JL, Bates PD, Trigg MA. 2012. Geodetic corrections to Amazon River water level gauges using ICESat altimetry. Water Resources Research 48. DOI: 10.1029/2011WR010895.

Hanley JA, McNeil BJ. 1982. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143: 29–36.

Hess LL, Melack JM, Filoso S, Wang Y. 1995. Delineation of inundated area and vegetation along the Amazon floodplain with the SIR-C synthetic aperture radar. IEEE Transactions on Geoscience and Remote Sensing 33(4): 896–904.

Hess LL, Melack JM, Novo EMLM, Barbosa CCF, Gastil M. 2003. Dual-season mapping of wetland inundation and vegetation for the central Amazon basin. Remote Sensing of Environment 87: 404–428.

Horritt MS. 2006. A methodology for the validation of uncertain flood inundation models. Journal of Hydrology 326: 153–165.

Horritt MS, Mason DC, Luckman AJ. 2001. Flood boundary delineation from synthetic aperture radar imagery using a statistical active contour model. International Journal of Remote Sensing 22(13): 2489–2507.

Hostache R, Matgen P, Schumann G, Puech C, Hoffmann L, Pfister L. 2009. Water level estimation and reduction of hydraulic model calibration uncertainties using satellite SAR images of floods. IEEE Transactions on Geoscience and Remote Sensing 47: 431–441.

Hunter NM, Bates PD, Horritt MS, De Roo PJ, Werner M. 2005. Utility of different data types for flood inundation models within a GLUE framework. Hydrology and Earth System Sciences 9: 412–430.

Karlsson JM, Arnberg W. 2011. Quality analysis of SRTM and HYDRO1K: a case study of flood inundation in Mozambique. International Journal of Remote Sensing 32: 267–285.

Kiel B, Alsdorf D, LeFavour G. 2006. Capability of SRTM C- and X-band DEM data to measure water elevations in Ohio and the Amazon. Photogrammetric Engineering and Remote Sensing 72: 313–320.


LeFavour G, Alsdorf D. 2005. Water slope and discharge in the Amazon River estimated using the shuttle radar topography mission digital elevation model. Geophysical Research Letters 32: 5. DOI: 10.1029/2005GL023836.

Manjusree P, Kumar LP, Bhatt CM, Rao GS, Bhanumurthy V. 2012. Optimization of threshold ranges for rapid flood inundation mapping by evaluating backscatter profiles of high incidence angle SAR images. International Journal of Disaster Risk Science 3(2): 113–122.

Marcus WA, Fonstad MA. 2008. Optical remote mapping of rivers at sub-meter resolutions and watershed extents. Earth Surface Processes and Landforms 33: 4–24.

Mason DC, Bates PD, Dall'Amico JT. 2009. Calibration of uncertain flood inundation models using remotely sensed water levels. Journal of Hydrology 368: 224–236.

Matgen P, Schumann G, Henry JB, Hoffmann L, Pfister L. 2007. Integration of SAR-derived inundation areas, high precision topographic data and a river flow model toward real-time flood management. International Journal of Applied Earth Observation and Geoinformation 9(3): 247–263.

Neal JC, Fewtrell TJ, Trigg MA. 2009. Parallelisation of storage cell flood models using OpenMP. Environmental Modelling and Software 24(7): 872–877.

Neal JC, Schumann G, Bates PD. 2012. A subgrid channel model for simulating river hydraulics and floodplain inundation over large and data sparse areas. Water Resources Research 48. DOI: 10.1029/2012WR012514.

Oberstadler R, Hönsch H, Huth D. 1997. Assessment of the mapping capabilities of ERS-1 SAR data for flood mapping: a case study in Germany. Hydrological Processes 10: 1415–1425.

Pappenberger F, Frodsham K, Beven K, Romanowicz R, Matgen P. 2007. Fuzzy set approach to calibrating distributed flood inundation models using remote sensing observations. Hydrology and Earth System Sciences 11: 739–752.

Park CC. 1977. World-wide variations in hydraulic geometry exponents of stream channels: an analysis and some observations. Journal of Hydrology 33: 133–146.

Schumann G, Matgen P, Pappenberger F, Hostache R, Pfister L. 2007a. Deriving distributed roughness values from satellite radar data for flood inundation modelling. Journal of Hydrology 344: 96–111.

Schumann G, Matgen P, Pappenberger F, Hostache R, Puech C, Hoffmann L, Pfister L. 2007b. High-resolution 3D flood information from radar for effective flood hazard management. IEEE Transactions on Geoscience and Remote Sensing 45: 1715–1725.

Schumann G, Di Baldassarre G, Bates PD. 2009a. The utility of space-borne radar to render flood inundation maps based on multi-algorithm ensembles. IEEE Transactions on Geoscience and Remote Sensing 47: 2801–2807.

Schumann GJ-P, Bates PD, Horritt MS, Matgen P, Pappenberger F. 2009b. Progress in integration of remote sensing derived flood extent and stage data and hydraulic models. Reviews of Geophysics 47. DOI: 10.1029/2008RG000274.

Schumann GJ-P, Neal JC, Voisin N, Andreadis KM, Pappenberger F, Phanthuwongpakdee N, Hall AC, Bates PD. 2013. A first large scale flood inundation forecasting model. Submitted in revised form to Water Resources Research. In press.

Smith LC. 1997. Satellite remote sensing of river inundation area, stage, and discharge: a review. Hydrological Processes 11: 1427–1439.

Smith LC, Pavelsky TM. 2008. Estimation of river discharge, propagation speed, and hydraulic geometry from space: Lena River, Siberia. Water Resources Research 44. DOI: 10.1029/2007WR006133.

Stephens EM, Bates PD, Freer JE, Mason DC. 2012. The impact of uncertainty in satellite data on the assessment of flood inundation models. Journal of Hydrology 414–415: 162–173.

Stephens EM, Schumann G, Bates PD. 2013. Problems with binary pattern measures for flood model evaluation. Hydrological Processes. DOI: 10.1002/hyp.9979.

The World Bank. 2006. Lower Zambezi river basin baseline data on land use, biodiversity, and hydrology. Draft Report 2, The World Bank: Washington, DC, USA.

Werner M, Blazkova S, Petr J. 2005. Spatially distributed observations in constraining inundation modelling uncertainties. Hydrological Processes 19(16): 3081–3096.

Wilson M, Bates P, Alsdorf D, Forsberg B, Horritt M, Melack J, Frappart F, Famiglietti J. 2007. Modeling large-scale inundation of Amazonian seasonally flooded wetlands. Geophysical Research Letters 34. DOI: 10.1029/2007GL030156.
