Probabilistic Flood Loss Models for CompaniesLukas Schoppa1,2*, Tobias Sieg1,2, Kristin Vogel2, Gert Zöller3, and Heidi Kreibich1
6 May 2020*[email protected]
Picture: DLR
1 Section Hydrology, GFZ German Research Centre for Geosciences, Potsdam, Germany 2 Institute of Environmental Science and Geography, University of Potsdam, Potsdam, Germany 3 Institute of Mathematics, University of Potsdam, Potsdam, Germany
• Commonly involves stage-damage functions
• Most stage-damage functions are univariable models: loss depends only on water depth
• Few existing flood loss models account for uncertainty in loss estimates
Status Quo in Flood Loss Modeling
06.05.2020 Background – Methods – Results – Conclusions 2
Water Depth0
1
Rel
ativ
e Lo
ss
• Towards multivariable and probabilistic models; e.g.:• Multivariate generalized regression
• Rule-based models
• Decision trees
• Bayesian networks
• Despite significant contributions of companies to total flood losses, development focused on private households
Multivariable, probabilistic flood loss models for companies are lacking
New Approaches to Flood Loss Modeling
06.05.2020 Background – Methods – Results – Conclusions 3
• Development and validation of three multivariable, probabilistic flood loss models for companies for the assets• Building (BUI)
• Equipment (EQU)
• Goods and stock (GNS)
• Comparing predictive performance of Bayesian networks, Bayesian regression and random forest
1) against an established benchmark model: probabilistic square-root stage damage function (SDF)
2) against each other
Study objectives
06.05.2020 Background – Methods – Results – Conclusions 4
• Company loss data (n=1306) stems from four telephone surveys after major floods in Germany in the period 2002 – 2013
• Collected data cover flood intensity, company characteristics, warning and emergency measures, flood experience, and private precaution
Survey Data
06.05.2020 Background – Methods – Results – Conclusions 5
Variable Abbreviation
Predictor Variables
Water depth wd
Inundation duration dur
Return period rp
Business sector sec
Company size size
Spatial situation spat
Flood experience exp
Precaution ratio pre
Response Variables
Relative loss building rloss
Relative loss equipment rloss
Relative loss goods/stock rloss
Predictor (n=8) and response (n=1) variables
Building (n=545) Equipment (n=829) Goods & Stock (n=902)
0.00
0.25
0.50
0.75
1.00
Variable
wd
dur
sec
size
spat
exp
pre
rp
rloss
Model variables
06.05.2020 Background – Methods – Results – Conclusions 6
Kernel density estimates of model data for the three company assets building, equipment, and goods and stock. For comparability, all variables are scaled from zero to one in this plot. The lines in the violin plots indicate the quartiles while the dot represents the mean. The predictor set is comprised by nominal, ordinal and continuous variables. The response variable, relative loss (rloss), is defined on the interval [0, 1] and contains significant shares of zeros and ones, which correspond to companies with no and total loss.
Multivariable Models
06.05.2020 Background – Methods – Results – Conclusions 7
+
BetaBernoulli
Bayesian Networks (BN) Bayesian Regression (BR) Random Forest (RF)
• Graphical probabilistic model
• Network structure learned from data
• Zero-and-one inflated beta distribution
• Inflation accounts for zero and one loss cases
• Quantile regression forest• Conditional inference tree
algorithm
The multivariable models and the univariable stage-damage function all return probabilistic predictions of relative flood loss in the form of samples. We developed individual models for losses to the assets building, equipment, and goods and stock.
• Water depth and precaution are dominant predictor variables for all assets and models
• The predictor importance measures of the BN, BR, and RF are plausible and agree with previous findings
• The candidate models consistently identify the same predictors as most relevant
• Damage processes differ across assets: separate models are justified
Comparing Model Fits
06.05.2020 Background – Methods – Results – Conclusions 8
pre
rloss
rp
spat
exp
wd
dur
sec
size
dur
rp
pre
rloss
sec
spat
exp
wd
size
exp
pre
rloss
rp
sec
spat
wd
dur
size
BUI EQU GNS
-0.5 0.0 0.5 -0.5 0.0 0.5 -0.5 0.0 0.5
sec4
sec3
sec2
spat4
spat3
spat2
size
pre
exp
rp
dur
wd
Parameter estimate
BUI EQU GNS
0.0000 0.0025 0.0050 0.0075 0.00 0.01 0.02 0.03 0.00 0.01 0.02 0.03 0.04 0.05
sec
spat
size
pre
exp
rp
dur
wd
Mean decrease prediction accuracy
pre
rloss
rp
spat
exp
wd
dur
sec
size
dur
rp
pre
rloss
sec
spat
exp
wd
size
exp
pre
rloss
rp
sec
spat
wd
dur
size
BUI EQU GNS
-0.5 0.0 0.5 -0.5 0.0 0.5 -0.5 0.0 0.5
sec4
sec3
sec2
spat4
spat3
spat2
size
pre
exp
rp
dur
wd
Parameter estimate
BUI EQU GNS
0.0000 0.0025 0.0050 0.0075 0.00 0.01 0.02 0.03 0.00 0.01 0.02 0.03 0.04 0.05
sec
spat
size
pre
exp
rp
dur
wd
Mean decrease prediction accuracy
pre
rloss
rp
spat
exp
wd
dur
sec
size
dur
rp
pre
rloss
sec
spat
exp
wd
size
exp
pre
rloss
rp
sec
spat
wd
dur
size
BUI EQU GNS
-0.5 0.0 0.5 -0.5 0.0 0.5 -0.5 0.0 0.5
sec4
sec3
sec2
spat4
spat3
spat2
size
pre
exp
rp
dur
wd
Parameter estimate
BUI EQU GNS
0.0000 0.0025 0.0050 0.0075 0.00 0.01 0.02 0.03 0.00 0.01 0.02 0.03 0.04 0.05
sec
spat
size
pre
exp
rp
dur
wd
Mean decrease prediction accuracy
Building Equipment
Goods and Stock
Bayesian network structures, which were learned from the survey data, for the three assets. The BN structures show that the damage processes differ between building loss and losses to equipment, and goods and stock.
Probabilistic Predictions
06.05.2020 Background – Methods – Results – Conclusions 9
Obs. Loss: 0.4 | ID: 330 Obs. Loss: 1 | ID: 104 Obs. Loss: 1 | ID: 136
Obs. Loss: 0.04 | ID: 392 Obs. Loss: 0.21 | ID: 165 Obs. Loss: 0.21 | ID: 450
Obs. Loss: 0 | ID: 183 Obs. Loss: 0 | ID: 261 Obs. Loss: 0 | ID: 37
0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00
Relative Loss Building [-]
Model BN BR RF SDF
• Prediction accuracy and sharpness appear to be higher for minor losses than for severe losses
• Stage-damage function (SDF) predictions are often bimodal (high density at 0/1)
• Predictive densities of the multivariable models (BN, BR, RF) are more flexible; e.g. de-/inflation of predictive densities at 0/1
Examples of predictive densities from the four models (color-coded) for the building loss of nine randomly selected companies (identified by ID). The observed loss is indicated by the black lines.
Predictive Performance
06.05.2020 Background – Methods – Results – Conclusions 10
MAE Mean CRPS MBE
BUI EQU GNS BUI EQU GNS BUI EQU GNS
-0.2
-0.1
0.0
0.1
0.2
0.0
0.1
0.2
0.3
0.4
0.0
0.1
0.2
0.3
0.4
Model BN BR RF SDF
MAE Mean CRPS MBE
BUI EQU GNS BUI EQU GNS BUI EQU GNS
-0.2
-0.1
0.0
0.1
0.2
0.0
0.1
0.2
0.3
0.4
0.0
0.1
0.2
0.3
0.4
Model BN BR RF SDF
Performance metrics mean average error (MAE), mean continuous ranked probability score (CRPS), and mean bias error (MBE) for the four models (color-coded) and assets (x-axis). Each boxplot summarizes 100 repetitions of a 10-fold cross-validation with varying data partitioning.
* CRPS: generalization of the MAE; optimum at zero
* • Multivariable models outperform stage-damage functions for all assets/performance metrics
• Performance differences among multivariable models are small
Predictive Performance
06.05.2020 Background – Methods – Results – Conclusions 11
mean CRPS:
0.099
mean CRPS:
0.161
mean CRPS:
0.162
mean CRPS:
0.103
mean CRPS:
0.165
mean CRPS:
0.158
mean CRPS:
0.099
mean CRPS:
0.159
mean CRPS:
0.154
mean CRPS:
0.125
mean CRPS:
0.204
mean CRPS:
0.208
BN BR RF SDF
BU
IE
QU
GN
S
0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
Observed Relative Loss [-]
CR
PS
[-]
mean CRPS:
0.099
mean CRPS:
0.161
mean CRPS:
0.162
mean CRPS:
0.103
mean CRPS:
0.165
mean CRPS:
0.158
mean CRPS:
0.099
mean CRPS:
0.159
mean CRPS:
0.154
mean CRPS:
0.125
mean CRPS:
0.204
mean CRPS:
0.208
BN BR RF SDF
BU
IE
QU
GN
S
0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
Observed Relative Loss [-]
CR
PS
[-]
Scatter plots of observed relative loss versus cross-validated continuous ranked probability scores (CRPS) for all models (columns, color-coded) and building loss (BUI). Each symbol represents the prediction error incurred by the respective model for one company. The black, step-wise lines show the average CRPS in different intervals of observed relative loss. The labels in the top-left corner of each panel contain the mean CRPS over all predictions of the respective model. We observed similar predictive errors for the assets equipment, and goods and stock.
• The scatter plots confirm that model errors are larger for severe losses
• The variation in the predictions and, hence, the errors is larger for multivariable models than for the stage-damage function
• Still, on average, the multivariable models perform better
Bias-variance tradeoff
Advantages of Multivariable Models
06.05.2020 Background – Methods – Results – Conclusions 12
• Water depth insufficiently discriminates between minor and major losses (see figure); additional predictors contain valuable information
• The structural complexity of the multivariable models allows for large flexibility in the predictive distributions, which is required to model the large shares of 0/1-cases in the data
BUI EQU GNS
0 250 500 750 1000 0 250 500 750 1000 0 250 500 750 1000
0.00
0.25
0.50
0.75
1.00
Water Depth [cm]
Re
lative
Lo
ss [
-]
Scatter plots of water depth versus relative loss for the assets building (BUI), equipment (EQU), and goods and stock (GNS). The relationship between water depth and relative loss is characterized by pronounced noise.
Conclusions
06.05.2020 Background – Methods – Results – Conclusions 13
• Bayesian networks, Bayesian regression, and random forest outperform the stage-damage function
• Performance differences among multivariable models are small; model choice depends on data availability and study task
• Large predictive errors for severe losses could be caused by imbalances in the data: extreme losses are less frequent than minor losses
• Wide predictive densities suggest that the uncertainty in the predictive distributions is generally high and, hence, requires quantification through probabilistic loss models (especially for large losses)