prioritising water pipes for condition …ozwater.org/sites/all/files/ozwater/009 ywang.pdf ·...


PRIORITISING WATER PIPES FOR CONDITION ASSESSMENT WITH DATA ANALYTICS

Bin Li 1, Bang Zhang 1, Zhidong Li 1, Yang Wang 1, Fang Chen 1, Dammika Vitanage 2

1. National ICT Australia, Sydney, NSW

2. Sydney Water, Sydney, NSW

ABSTRACT

This work proposes a method for prioritising high-risk critical water mains (CWMs) in terms of both failure probability and consequence cost. The method improves the prioritisation of high-risk CWMs that require condition assessment in the near future and can further optimise the operational maintenance of water utilities. With better prioritisation, water utilities can reduce economic losses by predicting more CWM failures with high consequences and renewing only high-risk mains. Our contributions lie mainly in two data analytics techniques: 1) an efficient and effective method of CWM failure prediction based on a Bayesian nonparametric model called the hierarchical beta process (HBP), and 2) a risk-aversion method of selecting CWMs for condition assessment based on constrained binary integer programming (BIP). Test results on a dataset from the water utility show that the proposed method outperforms previous methods in both pipe failure prediction and consequence cost savings, doubling the failure prediction performance.

INTRODUCTION

As some of the most critical and valuable urban assets, water supply networks must be maintained effectively and efficiently to avoid pipe failures and keep the pipes in good condition. A water supply network contains two kinds of water mains: reticulation water mains (i.e., small-diameter pipes), whose failures usually have low consequences, and critical water mains (CWMs, i.e., large-diameter pipes), whose failures typically have severe consequences due to service interruptions and negative economic and social impacts such as flooding and traffic disruption (see Figure 1). The financial and social costs of reactive repairs of CWMs amount to more than $1 billion annually in Australia. It is therefore crucial for water utilities to develop financially viable preventative maintenance strategies for CWMs.
In water industry practice, a common approach to preventative risk management of CWMs is to prioritise asset renewal in terms of both failure probability and consequence. Better prioritisation of high-risk CWMs can bring significant capital and operational savings: on the one hand, water utilities can reduce economic losses by predicting more CWM failures with high consequences; on the other hand, they can target inspection at mains of higher risk. In this work, we consider how to 1) improve failure prediction for CWMs and 2) select high-risk CWMs in terms of both failure probability and consequence.

Figure 1: Scenes of critical water main failure captured on site.

Firstly, prioritising CWMs requires good estimates of their failure likelihoods (probabilities). The mechanisms of water pipe failure have been studied for decades, and various physical and mechanical models, based for example on pipe wall thickness (Ferguson et al., 1996), material deterioration under different environmental conditions and manufacturing quality (Rajani and Kleiner, 2001), and hydraulic characteristics (Misiunas, 2005), have been developed to estimate pipe condition. Such models need detailed physical information about each pipe; statistical methods circumvent this requirement by using historical failure data to predict future failures. In this view, the probability of a pipe failure is itself a random variable that depends on a set of physical pipe attributes (e.g., age) and environmental conditions (e.g., soil type). Various parametric and semi-parametric models have been developed for water pipe failure analysis in mechanical, civil, structural, and environmental engineering. Recently, machine learning based data analytics techniques, including nonparametric Bayesian methods, have become popular and widely accepted in practice because they offer more flexible modelling with fewer assumptions about the model structure. Nonparametric learning has been applied successfully in various fields, for instance to predict remission times for leukemia patients and times between explosions in coal mines, and for weather forecasting (Ibrahim et al., 2001). While the general framework of nonparametric learning can be found in the literature on survival analysis and topic modelling (Chen et al., 2012), to our knowledge nonparametric Bayesian methods have not been investigated for pipe failure prediction. In this work, we adapt this technique to predict CWM failures.

Secondly, prioritising high-risk CWMs should take into account both failure probabilities and consequences. For example, CWMs with high failure probabilities but low consequences should not be selected, because the economic losses are small even if they do fail in the near future. A good prioritisation strategy should target inspection at CWMs that are likely to fail in the near future while giving more weight to CWMs with high consequences. In this work, we exploit a risk-aversion method of CWM selection for condition assessment.

Machine learning, the key data analytics technique used in this work, can be explained with a metaphor: the machine is treated like a child who is taught to use observations (attributes and historical failure records of CWMs) as inputs and corresponding labels (failure probabilities of CWMs) as outputs. The combination of observations and labels is called training data. Our aim is to let the machine “learn” a failure prediction rule that maps the observations to the labels while avoiding biased predictions.
The contributions of this work mainly lie in two machine learning components:

1) an efficient and effective method of CWM failure prediction based on Bayesian nonparametric modelling called hierarchical beta process (HBP); and

2) a risk-aversion method of CWM selection for condition assessment, considering both failure probabilities and consequence costs, based on constrained binary integer programming (BIP).

DATA COLLECTION

Regions and datasets

This work uses data from two geographical regions in the greater Sydney area. The critical and reticulation networks of each region are plotted in Figure 2.

Figure 2. Water pipe networks in the two regions.

For each region, two datasets were available:

1) Pipe Network: the attributes of all CWMs in the region, including identification number, laid date, length, material, diameter, location, protective coating, surrounding soil type, etc. The oldest pipes were laid before 1900, and the average pipe age across the regions is about 45 years;

2) Failure Records: failure records from 1999 to 2012, including report date, type of failure, and failure location.

Statistical details about the CWMs in these regions are given in Table 1. The 14-year observation period is short compared to the life cycle of water pipes, and, crucially, most pipes (about 99%) either did not fail or failed only once during the observation period.

Table 1: Statistics on CWMs (proportion of the greater Sydney total in brackets).

Region     Number of pipes    Length of pipes    Number of failures
A          5944 (8.0%)        234 km (4.6%)      434 (6.9%)
B          5041 (6.8%)        379 km (7.5%)      563 (8.9%)
Total      17235 (23.2%)      1017 km (20.2%)    1949 (31.0%)

Each CWM is associated with a consequence cost (see Figure 3), which covers both the direct and indirect costs to the water utility in case of failure, such as the allowance for minimising the social impacts of water supply interruption, traffic disruption, property damage, etc. (Kane et al., 2013).

METHOD

Hierarchical beta process for the estimation of pipe failure probability

During the analysis, we found two facts that may significantly influence the results of CWM failure prediction:

1) The failure rate of CWMs does not always increase with age. Beyond the features available in the dataset and in public resources, many hidden factors, such as manufacturing conditions, affect the condition of water mains. For example, the records show that some old CWMs failed less often than younger ones; and

2) CWM failures (breaks) are sparse, as already noted above.

To enhance the performance of CWM failure prediction, these two facts were carefully considered when formulating our nonparametric Bayesian method.

Figure 3. Ranking of consequence costs for CWMs in the two regions (panels A and B; legend from highest to lowest) (Data source: Sydney Water Corporation).


We propose a nonparametric Bayesian learning algorithm called the hierarchical beta process (HBP) to predict the condition of water pipes. Historical water pipe data can be incorporated, and the model can grow to accommodate future data as necessary. This novel modelling approach has the potential to work effectively across many different pipe types and local conditions. It improves on parametric models such as the Weibull model in that it avoids assuming a model structure from the outset. Technically, the beta process provides a Bayesian nonparametric prior for statistical models with binary outcomes, and the hierarchical structure reflects the dependency among different groups of pipes. Our work extends that of Thibaux and Jordan (2007) by developing an efficient inference algorithm for sparse incident data and by estimating both the mean and concentration parameters during inference rather than using a fixed concentration. The technical details are given in Li et al. (2014); that method is used here as the component for CWM failure probability prediction.

Details of hierarchical beta process

Consider a water distribution system that consists of $K$ groups of pipes, and let $\pi_{k,i}$ denote the failure probability of the $i$-th pipe in the $k$-th group. Pipe failures are modelled with the following hierarchical construction:

$$q_k \sim \mathrm{Beta}\big(c_0 q_0,\; c_0 (1 - q_0)\big), \quad k = 1, \dots, K$$

$$\pi_{k,i} \sim \mathrm{Beta}\big(c_k q_k,\; c_k (1 - q_k)\big), \quad i = 1, \dots, n_k$$

$$z_{k,i,j} \sim \mathrm{Bernoulli}(\pi_{k,i}), \quad j = 1, \dots, m_{k,i}$$

where $q_k$ and $c_k$ are the mean and concentration parameters for the $k$-th group, $q_0$ and $c_0$ are hyper-parameters of the hierarchical beta process, $n_k$ is the number of pipes in group $k$, and $X_{k,i} = \{z_{k,i,j} \mid j = 1, \dots, m_{k,i}\}$ is the historical failure record of pipe $i$ in group $k$ over $m_{k,i}$ observed years: $z_{k,i,j} = 1$ means the pipe failed in the $j$-th year, and $z_{k,i,j} = 0$ otherwise.
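The generative structure above can be simulated in a few lines. The sketch below is an illustration only: it assumes the group concentrations $c_k$ are supplied as inputs and that every pipe is observed for the same number of years, the function names are hypothetical, and it is not the inference method of Li et al. (2014). It also shows a standard consequence of beta-Bernoulli conjugacy: given $q_k$ and $c_k$, a pipe with $s$ failures in $m$ years has posterior mean failure probability $(c_k q_k + s)/(c_k + m)$, so a single failure in a sparse record is shrunk towards the group mean rather than taken at face value.

```python
import random


def sample_hbp(q0, c0, group_concentrations, pipes_per_group, years, seed=0):
    """Draw synthetic yearly failure records from the hierarchical beta-Bernoulli model.

    group_concentrations[k] is c_k and pipes_per_group[k] is n_k (both assumed given).
    Returns one list per group; each pipe's record is a list of 0/1 values of length `years`.
    """
    rng = random.Random(seed)
    eps = 1e-12  # keep Beta parameters strictly positive if a draw underflows to 0 or 1
    groups = []
    for c_k, n_k in zip(group_concentrations, pipes_per_group):
        # Group mean: q_k ~ Beta(c0*q0, c0*(1 - q0))
        q_k = rng.betavariate(c0 * q0, c0 * (1 - q0))
        q_k = min(max(q_k, eps), 1 - eps)
        records = []
        for _ in range(n_k):
            # Pipe failure probability: pi_ki ~ Beta(c_k*q_k, c_k*(1 - q_k))
            pi_ki = rng.betavariate(c_k * q_k, c_k * (1 - q_k))
            # Yearly outcomes: z_kij ~ Bernoulli(pi_ki)
            records.append([1 if rng.random() < pi_ki else 0 for _ in range(years)])
        groups.append(records)
    return groups


def posterior_mean(q_k, c_k, s, m):
    """Posterior mean of pi_ki given q_k, c_k and s failures in m observed years."""
    return (c_k * q_k + s) / (c_k + m)


groups = sample_hbp(q0=0.01, c0=20, group_concentrations=[50, 50, 200],
                    pipes_per_group=[100, 80, 120], years=14)
print([sum(map(sum, g)) for g in groups])  # total synthetic failures per group
print(posterior_mean(0.01, 50, s=1, m=14))  # 0.0234375, versus a raw rate of 1/14
```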

An inference algorithm was developed to estimate both the mean and concentration parameters of the hierarchical beta process. For critical main failure data, the available observation period is short compared to the life cycle of the pipes, so most pipes (about 99%) do not fail or fail only once during the observation period. This sparseness is exploited to approximate the inference process and further reduce its computational complexity. The inference algorithm is described in Li et al. (2014).
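Why sparseness helps computationally can be illustrated, under assumptions, with a simpler estimator than the one in Li et al. (2014). With $\pi_{k,i}$ integrated out, the likelihood of a pipe's record depends only on its failure count $s$ and number of observed years $m$: $P(X_{k,i} \mid q_k, c_k) = B(c_k q_k + s,\; c_k(1-q_k) + m - s) / B(c_k q_k,\; c_k(1-q_k))$, where $B$ is the beta function. When almost every pipe has $s \in \{0, 1\}$, a whole group collapses to a handful of distinct $(s, m)$ pairs. The sketch below uses this to fit $(q_k, c_k)$ for one group by a plain grid search on the marginal likelihood (an empirical-Bayes shortcut); the grid search, grids, and function names are illustrative assumptions, not the paper's inference algorithm.

```python
import math
from collections import Counter


def log_beta(a, b):
    """log B(a, b) computed with log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(q, c, s, m):
    """log P(a specific record with s failures in m years | q, c), with pi integrated out."""
    a, b = c * q, c * (1 - q)
    return log_beta(a + s, b + m - s) - log_beta(a, b)


def fit_group(records, q_grid, c_grid):
    """Grid-search (q, c) maximising the group's marginal likelihood.

    Records are compressed to counts of distinct (failures, years) pairs, so each
    likelihood evaluation costs O(#distinct pairs) rather than O(#pipes x #years).
    """
    counts = Counter((sum(r), len(r)) for r in records)
    best = None
    for q in q_grid:
        for c in c_grid:
            ll = sum(n * log_marginal(q, c, s, m) for (s, m), n in counts.items())
            if best is None or ll > best[0]:
                best = (ll, q, c)
    return best[1], best[2], len(counts)


# Hypothetical sparse group: 990 pipes with no failures, 10 with one failure, 14 years each.
records = [[0] * 14 for _ in range(990)] + [[1] + [0] * 13 for _ in range(10)]
q_hat, c_hat, n_distinct = fit_group(records, [0.0005, 0.001, 0.002, 0.005], [1, 10, 100, 1000])
print(n_distinct)  # 2 distinct (s, m) pairs stand in for 1000 pipe records
```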

Risk-aversion strategy of pipe selection for condition assessment

The economic cost of a water main failure depends on both its failure probability and its consequence, so a risk-aversion strategy was designed to select pipes for condition assessment considering both aspects. Most economic models adopted by the water industry for CWM selection allocate CWMs into a risk table whose rows and columns correspond to several levels of failure probability and consequence, respectively. The risk levels are calibrated by domain experts, and CWMs ranked highly in both rows and columns are prioritised. This economic model has achieved substantial success in practice but has two important limitations:

1) It relies heavily on expert knowledge and experience, and hence on subjective judgement, to define the levels of failure probability and consequence (i.e., to define thresholds); and

2) It can only rank pipes at a coarse granularity. For instance, pipes in cells of the same colour are assigned the same risk level. This becomes problematic (or at least depends on domain expertise) when the condition assessment budget requires fine-grained prioritisation.
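The coarse-granularity limitation can be seen in a minimal sketch of a risk table. The thresholds below are hypothetical placeholders (in practice they come from domain experts), and levels are numbered from 0 (lowest) upwards rather than laid out as coloured cells.

```python
import bisect

# Hypothetical expert-defined thresholds (3 levels on each axis).
PROB_THRESHOLDS = [0.01, 0.05]            # failure probability
COST_THRESHOLDS = [100_000, 1_000_000]    # consequence cost in AU$


def risk_cell(p, r):
    """Map a pipe's (failure probability, consequence cost) to a risk-table cell.

    Returns (probability level, consequence level), each from 0 (lowest) to 2 (highest).
    """
    return (bisect.bisect_right(PROB_THRESHOLDS, p),
            bisect.bisect_right(COST_THRESHOLDS, r))


# Two pipes whose expected consequences p*r differ by a factor of more than 60
# (72,000 vs 4,500,000 AU$) still land in the same, highest-risk cell.
print(risk_cell(0.06, 1_200_000), risk_cell(0.5, 9_000_000))  # (2, 2) (2, 2)
```

Any budget that covers only part of a cell forces an ad hoc tie-break among its pipes, which is the fine-grained prioritisation problem described in limitation 2.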

To overcome these limitations, a risk-aversion approach was developed to automatically select the highest-risk pipes for condition assessment. It considers both failure probability and consequence in a principled way, aiming to identify more high-risk CWMs with the same inspection effort. The term “risk aversion” is borrowed from economics, where it means preferring the less uncertain of two investments with the same expected gain. The idea stems from the mean-variance theory of modern portfolio selection (Markowitz, 1991). In water pipe selection, a collection of pipes can be regarded as a portfolio in which each pipe has an expected cost saving. We adapt the theory to maximise the consequence cost savings while keeping the uncertainty, in terms of failure probabilities, within a reasonable degree, and we formulate the water pipe selection problem as a standard binary integer programming (BIP) problem.

Details of the risk-aversion strategy

We assume there are $N$ pipes available for selection, with the following properties:

• failure probability $p_i$ for $i = 1, \dots, N$ (from a probability prediction model);

• consequence cost $r_i$ for $i = 1, \dots, N$ (from Sydney Water); and

• pipe length $l_i$ for $i = 1, \dots, N$.

We assign a selection variable $x_i \in \{0, 1\}$ to each pipe $i = 1, \dots, N$: $x_i = 1$ means the $i$-th pipe is selected for condition assessment, and $x_i = 0$ otherwise. Pipe selection then becomes an optimisation problem over the selection variables $x_i$ with the following goals:

• The selected pipes should have relatively high expected consequence costs (taking into account both their failure probabilities and consequence costs) within the selection budget constraint.

• Meanwhile, the failure probability of each selected pipe should be high enough to avoid selecting a pipe with a very high consequence cost but a small failure probability.
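The two goals can be tried out on a toy instance. The sketch below enumerates every possible selection for a handful of pipes and assumes the constraint forms of the formulation that follows: a cap $B$ on the total selected length and a lower bound $\theta$ on $\sum_i x_i \log p_i$. Exhaustive search costs $2^N$ evaluations and is only for illustration; the function name and numbers are hypothetical, and realistic networks need an off-the-shelf BIP solver.

```python
import itertools
import math


def select_pipes(p, r, l, budget, theta):
    """Exhaustively solve the risk-aversion pipe-selection BIP for small N.

    maximise    sum_i x_i * r_i * p_i
    subject to  sum_i x_i * l_i       <= budget
                sum_i x_i * log(p_i)  >= theta
    Returns (best selection vector, objective value), or (None, None) if infeasible.
    """
    best_x, best_value = None, None
    for x in itertools.product((0, 1), repeat=len(p)):
        if sum(xi * li for xi, li in zip(x, l)) > budget:
            continue  # violates the inspection-length budget
        if sum(xi * math.log(pi) for xi, pi in zip(x, p)) < theta:
            continue  # selected pipes are collectively too unlikely to fail
        value = sum(xi * ri * pi for xi, ri, pi in zip(x, r, p))
        if best_value is None or value > best_value:
            best_x, best_value = x, value
    return best_x, best_value


# Pipe 0: low probability, high consequence; pipes 1-2: moderate on both (hypothetical).
p = [0.02, 0.2, 0.25]
r = [10.0, 1.0, 0.6]   # consequence cost, AU$ million
l = [1.0, 1.0, 1.0]    # length, km
print(select_pipes(p, r, l, budget=2.0, theta=-100.0))  # picks pipes 0 and 1
print(select_pipes(p, r, l, budget=2.0, theta=-4.0))    # picks pipes 1 and 2
```

With a loose $\theta$ the objective alone picks the low-probability, high-consequence pipe 0; tightening $\theta$ swaps it for pipe 2, which has a lower expected consequence but is more likely to fail, mirroring the illustration of sets S1 and S2 that follows.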

The first aim is consistent with the risk table’s strategy that the top-left cells are the most risky. The second aim can be understood through the following illustration. Suppose we have two sets of selected pipes at the same expected risk level, i.e., $r \cdot p$ is constant for all the pipes in the sets (the same curve in the top panel of Figure 4). One set, S1, has relatively low failure probabilities and high consequences; the other set, S2, has moderate failure probabilities and moderate consequences. If we inspect a given length of pipes in S1 and in S2, the saved consequence as a function of the inspected length is shown in the bottom panel of Figure 4 (black curve for S1 and red curve for S2). Because the pipes in S2 are relatively likely to fail, we are more likely to detect true failures, but each correct detection saves only a moderate consequence. In contrast, the pipes in S1 are less likely to fail, so we are less likely to detect true failures, but each correct detection saves a high consequence. Although both sets have the same expected consequence saving in the long run, S2’s curve usually lies above S1’s curve part-way through the inspection. Since in practice only a limited budget is allocated per year for risk analysis (e.g., 1% of the total length), selecting pipes with higher failure probabilities can lead to earlier correct failure detection (and thus earlier consequence savings). We can now formulate the pipe selection problem for condition assessment as a standard binary integer programming (BIP) problem (Papadimitriou and Steiglitz, 1998) as follows:

Figure 4. Illustration of the failure probability constraints.

$$\max_{x_1, \dots, x_N \in \{0,1\}} \; \sum_{i=1}^{N} x_i \, (r_i p_i)$$

subject to

$$\sum_{i=1}^{N} x_i \, l_i \le B \tag{1}$$

$$\sum_{i=1}^{N} x_i \log p_i \ge \theta \tag{2}$$

where $B$ denotes the condition assessment budget, namely the upper limit on the total length of the selected pipes, and $\theta$ is the threshold constraining the failure probabilities, which needs to be tuned for each probability prediction model. This problem can be solved with any off-the-shelf BIP solver.

RESULTS

Risk Maps

The failure probabilities estimated by both methods (Weibull and HBP) allow all individual pipes to be ranked by risk at the finest level of granularity. Figure 5 shows, for the two regions, risk maps highlighting the predicted top 5% of pipes most likely to fail, together with the actual failures correctly predicted by the Weibull and HBP methods. More pipe failures were correctly predicted by the HBP method, demonstrating the benefits of this new method.
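Building such a risk map amounts to ranking pipes by predicted failure probability and checking the top slice against the failures observed in the test year. A minimal sketch, assuming per-pipe predictions and 0/1 failure indicators for 2012 are available as plain lists (the function name and data are hypothetical):

```python
def top_fraction_hits(pred_prob, failed, fraction=0.05):
    """Return the indices of the top `fraction` of pipes by predicted failure
    probability, and how many of those pipes actually failed in the test year."""
    n_top = max(1, round(fraction * len(pred_prob)))
    ranked = sorted(range(len(pred_prob)), key=lambda i: pred_prob[i], reverse=True)
    top = ranked[:n_top]
    return top, sum(failed[i] for i in top)


pred = [0.9, 0.1, 0.5, 0.3] + [0.0] * 16   # 20 hypothetical pipes
failed = [1, 1, 0, 0] + [0] * 16           # observed failures in the test year
print(top_fraction_hits(pred, failed))         # ([0], 1)
print(top_fraction_hits(pred, failed, 0.10))   # ([0, 2], 1)
```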

Figure 5. Predicted top 5% of pipes most likely to fail in the two regions (panels A and B). Asterisks represent actual failures in 2012 correctly predicted by each method.


Figure 6. Detection results for the first 2.5% of all critical water mains in the two regions. The horizontal axis represents the cumulative length of inspected water pipes, and the vertical axis represents the number of detected pipe failures.

Practical Benefits

The proposed HBP model can significantly improve the efficiency of condition assessment by selecting high-risk pipes accurately. The experimental results for the two regions, shown in Figure 6, show that the HBP model identifies almost 100% more failures than the Weibull model for the same inspection effort (1% of the network length inspected). In Figure 6, which compares the failure detection performance of HBP and Weibull, the vertical purple line marks the inspection of 1% of the network length, and the blue line shows the performance of random selection. HBP detects failures more efficiently than Weibull: for example, in Region A, HBP detects the first failure after inspecting only about 200 metres of mains, whereas Weibull requires a much longer length of pipe to be inspected before detecting its first failure. More importantly, with the same effort (inspecting 1% of the network length), doubling the failure prediction performance means that 100% more failures can be prevented in advance. This would substantially improve service continuity and reduce the potential cost of reactive repairs.

Economic Savings

We compared the economic savings of the two risk analysis methods, “Weibull model + Risk table” and the proposed “HBP + constrained BIP”, based on their respective failure probability predictions for 2012 and the same consequence data (provided by Sydney Water Corporation, see Figure 3). The economic savings for the two regions are reported in Figure 7. Each region’s savings are calculated as the sum of the consequence savings of each correctly identified CWM in that region in 2012, as follows: we first list the asset IDs selected by each method and check whether these pipes failed in 2012; for each correctly identified failure, we add the consequence cost of the corresponding asset to the savings. Within the assets comprising the 1% of CWMs most likely to fail at high cost, the proposed method would have more than doubled the consequence cost savings in 2012.
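The detection counts and savings above can be reproduced in outline by walking down a ranked list of pipes until the inspected length reaches the budget. The sketch assumes whole pipes are inspected in descending score order and that inspection stops at the first pipe that would exceed the budget; the ranking score (for example predicted failure probability, or an ordering of the pipes chosen by the BIP), the function name, and the data are illustrative assumptions.

```python
def inspect_within_budget(score, length, failed, consequence, budget_fraction=0.01):
    """Inspect pipes in descending score order while the cumulative inspected length
    stays within budget_fraction of the total network length.

    Returns (failures detected, consequence cost saved, detection curve), where the
    curve lists (cumulative inspected length, cumulative failures detected).
    """
    budget = budget_fraction * sum(length)
    order = sorted(range(len(score)), key=lambda i: score[i], reverse=True)
    used, detected, saved = 0.0, 0, 0.0
    curve = []
    for i in order:
        if used + length[i] > budget:
            break  # next pipe would exceed the inspection budget
        used += length[i]
        if failed[i]:
            detected += 1
            saved += consequence[i]  # a correctly identified failure saves its consequence
        curve.append((used, detected))
    return detected, saved, curve


# Hypothetical network of four pipes with a total length of 100 (same length unit throughout).
print(inspect_within_budget([0.9, 0.8, 0.1, 0.05], [1, 1, 1, 97],
                            [1, 1, 0, 0], [5.0, 3.0, 1.0, 1.0]))  # (1, 5.0, [(1.0, 1)])
```

Running the same function on two rankings (one per method) over the same failure and consequence data gives a like-for-like comparison of detected failures and savings at a fixed inspection effort.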

Figure 7. Cost saving results for the first 1% and 2.5% of all critical water mains in the two regions, comparing “Weibull + Risk table” with “HBP + Constrained BIP”. The vertical axis represents the saved costs in millions of AU$.

CONCLUSION

This work proposes an approach to critical water main (CWM) failure prediction and selection for condition assessment based on two data analytics techniques: nonparametric Bayesian modelling and constrained binary integer programming.


The test results on a dataset from the water utility show that the proposed method outperforms the previous method for CWM condition assessment (classical parametric modelling with a risk table). In particular, assuming 1% of CWMs are inspected in 2012, the proposed method on average more than doubles the performance of the previous method in both failure prediction and economic savings.

REFERENCES

Chen, X., Zhou, M. and Carin, L. (2012), The contextual focused topic model. In 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 96-104.

Ferguson, P., Heathcote, M., Moore, G. and Russell, D. (1996), Condition assessment of water mains using remote field technology. Journal of AWWA.

Ibrahim, J., Chen, M.-H. and Sinha, D. (2001), Bayesian Survival Analysis. Springer.

Kane, G., Zhang, D., Lynch, D. and Bendeli, M. (2013), Sydney Water's critical water main strategy and implementation - a quantitative, triple-bottom line approach to risk based asset management. LESAM 2013 - IWA Leading-Edge Strategic Asset Management.

Li, Z., Zhang, B., Wang, Y., Chen, F., Taib, R., Whiffin, V. and Wang, Y. (2014), Water pipe condition assessment: a hierarchical beta process approach for sparse incident data. Machine Learning, vol. 95, pp. 11-26.

Markowitz, H.M. (1991), Portfolio Selection, second edition. Blackwell.

Misiunas, D. (2005), Failure monitoring and asset condition assessment in water supply systems. Doctoral dissertation, Lund University, Sweden.

Papadimitriou, C.H. and Steiglitz, K. (1998), Combinatorial Optimization: Algorithms and Complexity. Mineola, NY: Dover.

Rajani, B.B. and Kleiner, Y. (2001), Comprehensive review of structural deterioration of water mains: physically based models. Technical report NRCC-43722, National Research Council Canada.