Asset Life Prediction and Maintenance Decision-Making Using a Non-Linear Non-Gaussian State Space Model
By
Yifan Zhou
Supervised by
Prof. Lin Ma Prof. Joseph Mathew Prof. Rodney Wolff
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
School of Engineering Systems Faculty of Built Environment and Engineering
Queensland University of Technology 2010
Statement of Original Authorship
The work contained in this thesis has not been previously
submitted to meet requirements for an award at this or any
other higher education institution. To the best of my knowledge
and belief, the thesis contains no material previously published
or written by another person except where due reference is
made.
Abstract
Estimating and predicting degradation processes of engineering assets is crucial for
reducing the cost and ensuring the productivity of enterprises. Assisted by modern
condition monitoring (CM) technologies, most asset degradation processes can be
revealed by various degradation indicators extracted from CM data. Maintenance
strategies developed using these degradation indicators (i.e. condition-based
maintenance) are more cost-effective, because unnecessary maintenance activities
are avoided when an asset is still in a decent health state. A practical difficulty in
condition-based maintenance (CBM) is that degradation indicators extracted from
CM data can only partially reveal asset health states in most situations.
Underestimating this uncertainty in relationships between degradation indicators and
health states can cause excessive false alarms or failures without pre-alarms. The
state space model provides an efficient approach to describe a degradation process
using these indicators that can only partially reveal health states. However, existing
state space models that describe asset degradation processes largely depend on
assumptions such as discrete time, discrete state, linearity, and Gaussianity. The
discrete time assumption requires that failures and inspections only happen at fixed
intervals. The discrete state assumption entails discretising continuous degradation
indicators, which requires expert knowledge and often introduces additional errors.
The linear and Gaussian assumptions are not consistent with nonlinear and
irreversible degradation processes in most engineering assets. This research proposes
a Gamma-based state space model that is free of the discrete time, discrete state,
linear, and Gaussian assumptions to model partially observable degradation
processes. Monte Carlo-based algorithms are developed to estimate model
parameters and asset remaining useful lives. In addition, this research also develops
a continuous state partially observable semi-Markov decision process (POSMDP) to
model a degradation process that follows the Gamma-based state space model and is
under various maintenance strategies. Optimal maintenance strategies are obtained
by solving the POSMDP. Simulation studies are performed in MATLAB;
case studies using data from an accelerated life test of a gearbox and from the liquefied
natural gas industry are also conducted. The results show that the proposed Monte
Carlo-based EM algorithm can estimate model parameters accurately. The results
also show that the proposed Gamma-based state space model fits the monotonically
increasing degradation data from the accelerated life test of a gearbox better than
linear and Gaussian state space models do.
Furthermore, both simulation studies and case studies show that the prediction
algorithm based on the Gamma-based state space model can identify the mean value
and confidence interval of asset remaining useful lives accurately. In addition, the
simulation study shows that the proposed maintenance strategy optimisation method
based on the POSMDP is more flexible than a method that assumes a predetermined strategy
structure and uses renewal theory. Moreover, the simulation study also shows
that the proposed maintenance optimisation method can obtain more cost-effective
strategies than a recently published maintenance strategy optimisation method by
optimising the next maintenance activity and the waiting time till the next
maintenance activity simultaneously.
Keywords: Degradation model, EM algorithm, Particle filter, Particle smoother,
State space model, Partially observable Markov decision process
Acknowledgements
I wish to express my sincere thanks to Prof. Lin Ma, who not only led me into the
area of engineering asset management but also taught me principles of academic
research. Without the help from Prof. Lin Ma, I could not have overcome the
obstacles and finished my research. Moreover, Prof. Lin Ma also helped me
understand western culture and enjoy my life in Australia.
I would like to thank Prof. Joseph Mathew and Prof. Rodney Wolff for their
valuable advice on my research and assistance in proofreading my papers.
I appreciate the financial support from Queensland University of Technology, China
Scholarship Council, and the Cooperative Research Centre for Integrated
Engineering Asset Management. With their generous support, I could concentrate on
my PhD study without taking a part-time job.
I really want to thank my parents, Lihong Zhou and Meijun Fan. They always
encouraged me when I faced difficulties during the PhD study.
I am also grateful to Dr. Sheng Zhang and Dr. Yong Sun for their support, help, and
advice.
Last but not least, I would like to thank Dr. Liqun Zhang, Dr. Eric Kim, Dekui Mu,
Yi Yu, Nima Gorjian, Ruizi Wang, Dr. Avin Mathew, Vladimir Frolov, Fengfeng
Li, and Tony Kim who helped me improve my English, inspired me through fruitful
discussions, and made my life in Australia more memorable.
Table of Contents
1 Introduction
1.1 Introduction of the Research
1.2 Research Objectives and Methodologies
1.3 Relationships of the Developed Models and Algorithms
1.4 Originality and Significance
1.5 Related Publications of the Candidate
1.6 Structure of the Thesis
2 Literature Review
2.1 Degradation Modelling
2.1.1 Threshold Crossing Models
2.1.2 Degradation Models Based on the Hazard Rate Process
2.1.3 State Space Degradation Models
2.1.4 Comments
2.2 Condition-based Maintenance Decision-Making
2.2.1 Inspection Scheduling
2.2.2 CBM Optimisation Objectives
2.2.3 CBM Optimisation Methods
2.2.4 Imperfect Inspections
2.2.5 Comments
2.3 Solving Algorithms for Nonlinear Non-Gaussian State Space Models
2.3.1 Basic Inference Algorithms
2.3.2 Parameter Estimation Algorithm
2.3.3 Control Algorithms for the State Space Model
2.3.4 Comments
3 Modelling Correlated Degradation Processes of Direct and Indirect Indicators
3.1 Introduction
3.2 Model Formulations and Solving Algorithms
3.2.1 Model Formulations
3.2.2 Parameter Estimation
3.2.3 Variance-Covariance Matrix of the Parameter Estimates
3.2.4 Model Selection
3.2.5 Monte Carlo-Based Lifetime Prediction
3.3 Simulation Study
3.3.1 Parameter Estimation
3.3.2 Performance Investigation
3.3.3 Life Prediction
3.4 Case Study: Crack Size Propagation Modelling
3.5 Chapter Summary
4 Joint Modelling of Failure Events and Multiple Indirect Indicators
4.1 Introduction
4.2 Model Formulations and Solving Algorithms
4.2.1 Model Formulations and Notations
4.2.2 Parameter Estimation
4.2.3 Indicator Effectiveness Evaluation
4.3 Simulation Study
4.3.1 Parameter Estimation
4.3.2 Lifetime Prediction
4.3.3 Effectiveness Evaluation of Indicators
4.4 Case Study: Lifetime Prediction for the Bearing on a Liquefied Natural Gas (LNG) Pump
4.4.1 Data Introduction
4.4.2 Model Application
4.4.3 Discussion
4.5 Chapter Summary
5 Maintenance Strategy Optimisation Using the POSMDP
5.1 Problem Formulation
5.2 Regular Maintenance Intervals
5.2.1 Solving the POSMDP
5.2.2 Simulation Study
5.3 State-Dependent Maintenance Intervals
5.3.1 The Formulations and Solution Method of the POSMDP
5.3.2 Simulation Study
5.4 Maintenance Strategy Considering Imperfect Maintenance
5.4.1 The Formulations and the Solution Method of the POSMDP
5.4.2 Simulation Study
5.5 Chapter Summary
6 Conclusions and Future Research Directions
6.1 Conclusions
6.1.1 Modelling Correlated Degradation Processes of Direct and Indirect Indicators
6.1.2 Joint Modelling of Failure Events and Multiple Indirect Indicators
6.1.3 Maintenance Strategy Optimisation Using the Continuous State POSMDP
6.2 Future Research
7 References
8 Appendix
List of Figures
Figure 1-1: Relationships of developed models and algorithms
Figure 3-1: The simulated indirect indicators and direct indicators
Figure 3-2: The development of the parameter estimates
Figure 3-3: MSEs of the direct indicator estimates when the observation noise is 0.5
Figure 3-4: MSEs of the direct indicator estimates when the observation noise is 0.05
Figure 3-5: Life prediction results when the failure is observable
Figure 3-6: The lifetime distribution predicted at different time points
Figure 3-7: The lifetime distribution prediction at 251 when the failure is not observable
Figure 4-1: Three simulated degradation indicators
Figure 4-2: The convergence process of the EM algorithm
Figure 4-3: Estimation of underlying health states
Figure 4-4: RUL prediction results
Figure 4-5: Pump schematic
Figure 4-6: Outer raceway spall of P301C
Figure 4-7: Inner raceway flaking of P301D
Figure 4-8: RUL prediction results of the bearing on P301C
Figure 5-1: Parameters spreading of the censored Gaussian distribution
Figure 5-2: Parameters spreading of the Beta distribution
Figure 5-3: Minimum long-run average cost according to different inspection intervals when actual health states are observable
Figure 5-4: The results of the policy iteration when maintenance intervals are regular and the standard deviation of the observation noise is σ = 0.3
Figure 5-5: Some results of the policy iteration for POMDP with irregular maintenance intervals (the numbers in rectangles are the optimal waiting durations till the corresponding maintenance actions)
Figure 5-6: Some results of the policy iteration for POMDP considering imperfect maintenance (the numbers in rectangles are the optimal waiting durations till the corresponding maintenance actions)
List of Tables
Table 3-1: The mean likelihood function values and the elapsed times of the three strategies
Table 3-1: The measurements of the crack size during the accelerated life test
Table 3-2: The AICc of different models
Table 4-1: The results of effectiveness evaluation for indicators
Table 4-2: The specifications of the pump
Table 4-3: Vibration data features
Table 4-4: Effectiveness evaluation for the three features extracted from the vibration signals
Table 4-5: RUL prediction results of the bearing on P301C
Table 5-1: Mean likelihood values of the censored Gaussian distribution and the Beta distribution under different observation noise
Table 5-2: The Monte Carlo-based method that calculates the transition matrix
Table 5-3: The process of policy iteration for the POSMDP
Table 5-4: The long-run average costs derived by three methods (i.e., the method simply ignoring the observation noise, the heuristic method, and POSMDP) when the observation noise level is different
Table 5-5: The long-run average costs per unit time derived by the POSMDP with irregular inspection interval and the method proposed by Wang (Wang and Christer 2000; Wang 2003b)
Table 5-6: The process to calculate the transition matrix using the Monte Carlo-based method
List of Notations
Notations used in different chapters are summarised as follows:
Notations used in Chapter 3:
The direct indicator at time
The indirect indicator at time
·,· The PDF of the Gamma distribution
The shape function of the Gamma process
The scale parameter of the Gamma process
The observation noise of the Gamma-based state space model
The standard deviation of the observation noise
The failure threshold on the direct indicator
·,· The PDF of the Gaussian distribution
The time to perform the th inspection
The direct indicator value at the th inspection
The indirect indicator value at the th inspection
The number of inspections
· The indicator function of inspections when the direct indicator is
observable
The inspection index of the th observable direct indicator
The number of observable direct indicators
The parameter set of the Gamma-based state space model
The parameter set in the system equation of the Gamma-based state space
model
The parameter set in the observation equation of the Gamma-based state
space model
The increment of the direct indicator before the th inspection
The increment of the shape function before the th inspection
·,· The PDF of the Beta distribution
· | · The PDF of the importance density used in particle filtering
The th sample generated at the th inspection time according to the
importance density
The weight of the sample
The th particle at the th inspection time after particle filtering
|, The weight of the th filtering particle at the th inspection corresponding to
the th smoothing particle at the 1th inspection
The th particle at the th inspection time after particle smoothing
The model parameters used to generate simulation data
The parameter estimates derived at the th EM iteration
· The PDF of the observation noise
· The Dirac delta measure
Additional notations used in Chapter 4
The underlying health state at time
The indirect indicator vector at time
The size of the indirect indicator vector
The failure time
The censoring time
The relative contribution ratio
Additional notations used in Chapter 5
The cost incurred by an inspection
The cost incurred by a preventive replacement activity
The cost incurred by an imperfect maintenance activity
The cost incurred by an unexpected breakdown
The duration of an inspection
The duration of a preventive replacement activity
The duration of an imperfect maintenance activity
The duration of an unexpected breakdown
The original belief of the Gamma-based state space model obtained by
particle filtering
The space of the original belief
·; The projected parametric distribution
Θ The parameter space of the projected parametric distribution ·;
Ω The projected parametric density space
Ω b The density projection function
Ω The discretised projected parametric density space
The th elements in the discretised projected parametric density space
Ω
The projected belief corresponding to the brand new health state
the relative cost starting in the projected belief state
The relative costs starting in if the “do nothing” strategy is adopted
The relative costs starting in if the “preventive replacement”
strategy is adopted
∆ The inspection interval
∆ | The expected reliability at the next inspection epoch given that the
current belief state is projected as
∆ | The expected survival time during the next inspection interval when
the current projected belief is
The long-run minimum expected cost (downtime) per unit time
The transition matrix in the discretised projected belief space Ω over
one inspection interval
·,· A distance measure defined in the projected belief space
· The policy function
· The optimal maintenance strategy obtained by the policy iteration
∆ The interval of the sampling points of the waiting time for the next
preventive replacement
∆ The waiting duration till the next preventive replacement
∆ The maximum waiting time for the next preventive replacement
∆ The interval of the sampling points of the waiting time for the next
inspection
∆ The waiting duration till the next inspection
∆ The maximum waiting time for the next inspection
∆ The interval of the sampling points of the waiting time for the next
imperfect maintenance
∆ The waiting duration till the next imperfect maintenance
∆ The maximum waiting time for the next imperfect maintenance
, ∆ The relative cost (downtime) when the initial projected belief state is
and inspection is performed after ∆
, ∆ The relative cost (downtime) when the initial projected belief state is
and preventive replacement is performed after ∆
, ∆ The relative cost (downtime) when the initial projected belief state is
and imperfect maintenance is performed after ∆
The transition matrix of the discretised projected beliefs given that the
transition epoch is ∆ and a health inspection is conducted
The interval of the expected reliability when is required to be
calculated
| The expected waiting time till the next maintenance activity when the
current belief can be projected to and preventive replacement is
performed
| The expected waiting time till the next maintenance activity when the
current belief can be projected to and an inspection is performed
| The expected waiting time till the next maintenance activity when the
current belief can be projected to and imperfect maintenance is
performed
Δ | The expected duration of the imperfect maintenance performed after
Δ given the current projected belief state
The transition matrix of the projected belief states after imperfect
maintenance
List of Abbreviations
AHM Additive hazard model
AIC Akaike's information criterion
AICc Akaike's information criterion with a second order correction
BIC Bayesian information criterion
CBM Condition-based Maintenance
CDF Cumulative distribution function
CM Condition monitoring
DPCA Dynamic principal component analysis
E step Expectation step
EKF Extended Kalman filter
EM algorithm Expectation-maximisation algorithm
FFT Fast Fourier transform
HMM Hidden Markov model
HPF High pass filter
KL divergence Kullback–Leibler divergence
LNG Liquefied natural gas
M step Maximisation step
MCMC Markov chain Monte Carlo
MDP Markov decision process
MLE Maximum likelihood estimation
MSE Mean square error
PCA Principal component analysis
PDF Probability density function
PHM Proportional hazard model
POMDP Partially observable Markov decision process
POSMDP Partially observable semi-Markov decision process
PWLC Piecewise-linear and convex
RMS Root mean square
RUL Remaining useful life
SAME State-augmentation for marginal estimation
SIR filter Sampling importance resampling filter
SMDP Semi-Markov decision process
SPC Statistical process control
UKF Unscented Kalman filter
1 Introduction
1.1 Introduction of the Research
The availability and capability of engineering assets are important business
objectives in modern engineering asset management. An unexpected failure of a
critical engineering asset can cause the breakdown of the whole plant. For high-risk
assets (e.g. helicopters, aircraft, bridges, and dams), reliability and safety are even
more crucial. Therefore, effective maintenance strategies should be executed to
enhance the reliability and availability of essential assets. The optimisation of
maintenance strategies, in turn, largely depends on the prediction of asset health
condition and failure time.
Conventional research on asset life prediction up to the early nineties was based
on lifetime distributions. However, assets employed in modern industry are becoming
more and more reliable due to the development of material science and
manufacturing technology. As a result, reliability analysis relying on lifetime
distributions cannot be performed effectively due to the scarcity of failure events.
On the other hand, advanced sensors and computer systems have made more
condition monitoring (CM) data available. Effective indicators extracted from these
CM data can be used to model asset degradation processes. Based on these
degradation indicators, asset lives can be predicted. When durations and costs of
breakdowns and maintenance activities are known, optimal maintenance strategies
can be further developed.
In reality, degradation indicators extracted from CM data have different
relationships with failure mechanisms. Wang classified the information from
degradation processes as “direct information” and “indirect information” (Wang et
al. 2000). Motivated by the research of Wang, this research divides degradation
indicators into two categories: (1) direct indicators (e.g. the thickness of a brake pad,
and the crack depth on a gear) which directly relate to a failure mechanism; and (2)
indirect indicators (e.g. indicators extracted from vibration signals and oil analysis
data) which can only partially reveal a failure mechanism.
Direct and indirect indicators both have advantages and disadvantages. In
degradation modelling, direct indicators are often used to represent the underlying
degradation process of an asset. An asset is regarded as failed when one of its direct
indicators crosses a predetermined failure threshold. In contrast, setting a
predetermined failure threshold on indirect indicators can cause excessive false
alarms or failures without pre-alarms. Therefore, direct indicators are preferable for
asset degradation process modelling given their deterministic failure thresholds
(Grall et al. 2002; Liao et al. 2006b; Crowder and Lawless 2007). However, direct
indicators are often technically or economically impossible to sample frequently. For
example, the crack on the tooth of a gear cannot be measured online. Similarly, the
wear of the impeller in a pump cannot be measured during its operating period.
Directly applying degradation models to these direct indicators with limited sample
size is often not practically possible. Moreover, for some engineering assets, the
failure mechanisms are complex and no direct indicator is available to represent the
underlying degradation processes. For example, Wang used a generic wear condition
as a direct indicator of aircraft engines (Wang 2007). This generic wear condition
was an abstract concept and was not extracted directly from the CM data. Whitmore
proposed a similar model in which a failure was assumed to happen when a latent
process crossed a predetermined failure threshold. This latent process did not have a
particular physical meaning and was only known when a failure happened (Whitmore
et al. 1998). Unlike direct indicators, indirect indicators can often be
obtained easily through various CM techniques. However, indirect indicators can
only partially reveal the degradation process of an asset. Consequently, an
appropriate mathematical model should be developed to describe the relationship
between indirect indicators and the related underlying degradation process.
The state space model is an effective approach to reveal the underlying degradation
process of an engineering asset using indirect indicators. The state space degradation
model consists of a state equation and an observation equation. The underlying
degradation process of an engineering asset is modelled by the state equation, and
the relationship between the underlying degradation process and indirect indicators
is described by the observation equation. Consequently, the state space model
combines both the information from the stochastic underlying degradation process
and the uncertain relationship between the underlying degradation process and
indirect indicators. Moreover, the state space model is an effective tool for indicator
fusion. Compared with commonly used multivariate statistical approaches and
multivariate time series analysis methods, the state space model can analyse
degradation indicators with uneven sampling intervals.
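The two-equation structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the state equation uses Gamma-distributed increments (anticipating the model adopted later in this research), the shape function is taken as linear in time, the observation equation adds Gaussian noise to the hidden state, and the function name and all parameter values are hypothetical rather than taken from this thesis. The inspection times are deliberately uneven to show that the formulation does not require regular sampling.

```python
import numpy as np

def simulate_state_space(times, shape_rate, scale, noise_sd, rng):
    """Simulate a hidden degradation state and a noisy indirect indicator.

    Assumed state equation: independent Gamma increments whose shape is
    proportional to the elapsed time between inspections.
    Assumed observation equation: hidden state plus Gaussian noise.
    """
    times = np.asarray(times, dtype=float)
    # Elapsed time before each inspection; the intervals may be uneven.
    dt = np.diff(times, prepend=0.0)
    # State equation: non-negative increments accumulate into the hidden state.
    increments = rng.gamma(shape=shape_rate * dt, scale=scale)
    state = np.cumsum(increments)
    # Observation equation: the indirect indicator only partially reveals the state.
    observation = state + rng.normal(0.0, noise_sd, size=state.shape)
    return state, observation

rng = np.random.default_rng(0)
inspection_times = [0.5, 1.0, 3.0, 3.2, 7.5, 10.0]  # uneven sampling intervals
state, obs = simulate_state_space(inspection_times, shape_rate=2.0,
                                  scale=0.5, noise_sd=0.3, rng=rng)
for t, x, y in zip(inspection_times, state, obs):
    print(f"t={t:5.1f}  hidden state={x:6.3f}  indirect indicator={y:6.3f}")
```

Because each increment depends only on the elapsed time since the previous inspection, nothing in the sketch requires the inspections to be equally spaced.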
Difficulties still exist when the state space model is applied to describe practical
asset degradation processes. First of all, existing state space degradation models are
largely discrete in time or state (Jie et al. 2000; Wang 2002; Wang 2006). In
contrast, most asset degradation processes are continuous both in time and state. The
discrete state assumption requires discretising continuous degradation indicators,
which needs expert knowledge and may introduce additional errors. The discrete
time assumption, on the other hand, assumes that maintenance activities and failures
can only happen at discrete time points with regular intervals, which is not consistent
with reality. To overcome the shortcomings of discrete degradation models, some
continuous state space degradation models have been proposed. Nevertheless, most
of these continuous models adopt linear and Gaussian assumptions (Whitmore et al.
1998; Hashemi et al. 2003). When a degradation process follows the linear and
Gaussian assumptions, the degradation process is not monotonically increasing. In
contrast, most degradation processes of engineering assets (e.g. wear,
corrosion, and crack growth) are not reversible between two maintenance activities. In
addition, the Gaussian process possesses a diffusion property. Therefore, conditional
probability density functions (PDF) are involved when the likelihood function is
constructed to ensure that a Gaussian process does not drift beyond its failure
threshold between two normal health states (Yuan 2007). Integrals are often needed
to calculate these conditional PDFs, which increases difficulties in establishing and
evaluating the likelihood function. Therefore, by removing the discrete, linear, and
Gaussian assumptions, the state space model can describe asset degradation
processes that are partially revealed by indirect indicators more effectively.
This research adopts a Gamma-based state space degradation model that does not
have discrete state, discrete time, linear, or Gaussian assumptions. Two types of
underlying degradation processes are considered in this research. The first type is
represented by a direct indicator that has particular physical meaning and can be
measured. The second type indicates the overall health condition of an asset and is
only known at failure times. Both types are partially revealed by
indirect indicators. Monte Carlo-based parameter estimation algorithms are
developed to address the non-linear and non-Gaussian properties of the Gamma-based
state space model. Lifetime prediction and maintenance strategy optimisation
methods for the Gamma-based state space model are also investigated. Research
objectives and methodologies are introduced in detail as follows.
1.2 Research Objectives and Methodologies
1. Modelling correlated degradation processes of direct and indirect indicators
The first objective of this research is to model the correlated degradation processes
of direct and indirect indicators.
In some applications, direct indicators can be revealed by indirect indicators. For
example, the wear status of the impeller in a slurry pump can be assessed through
the cumulative amplitude measure evaluated from its vane pass frequency (Mani et
al. 2008). Therefore, direct indicators can be estimated through related indirect
indicators. This research develops a Gamma-based state space model to model the
correlated degradation processes of direct and indirect indicators. The Gamma-based
state space model consists of a state equation and an observation equation. The state
equation describes the degradation process of a direct indicator. In this research, the
Gamma process is adopted to model the degradation process of a direct indicator.
The Gamma process has been widely used to model a range of direct indicator
degradation processes, e.g. fatigue crack growth (Lawless and Crowder 2004),
corrosion of pressure vessels (Kallen and Van Noortwijk 2005), and brake-pad wear
for automobiles (Crowder and Lawless 2007). The observation equation models the
relationship between direct and indirect indicators. In this research, the indirect
indicator is assumed to be a function of a direct indicator plus additive Gaussian
noise. A parameter estimation algorithm for the Gamma-based state space model
is also developed. This algorithm must be able to handle incomplete observations
of direct indicators, which arise because direct indicators are difficult to measure. After
model parameters are estimated, a life prediction method based on the Gamma-based
state space model is developed.
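The state equation and observation equation described above can be illustrated with a short simulation. This is a minimal sketch rather than the estimation algorithm developed in this research: it assumes a linear shape function (a constant `shape_rate` per unit time), and the function name `simulate_gamma_state_space`, the power-law link passed as `obs_fn`, and all numerical values are hypothetical choices made only for demonstration.

```python
import random


def simulate_gamma_state_space(times, shape_rate, scale, obs_fn, noise_sd, rng):
    """Simulate a hidden Gamma-process state and noisy indirect observations.

    times      : increasing inspection times (intervals may be irregular)
    shape_rate : shape accumulated per unit time (assumed linear shape function)
    scale      : scale parameter of the Gamma increments
    obs_fn     : hypothetical function linking the direct to the indirect indicator
    noise_sd   : standard deviation of the additive Gaussian observation noise
    """
    state, prev_t = 0.0, 0.0
    states, observations = [], []
    for t in times:
        # State equation: independent, non-negative Gamma increment.
        state += rng.gammavariate(shape_rate * (t - prev_t), scale)
        prev_t = t
        states.append(state)
        # Observation equation: function of the state plus Gaussian noise.
        observations.append(obs_fn(state) + rng.gauss(0.0, noise_sd))
    return states, observations


rng = random.Random(0)
times = [0.5, 1.0, 2.5, 3.0, 4.7, 6.0]  # uneven sampling intervals
states, observations = simulate_gamma_state_space(
    times,
    shape_rate=2.0,
    scale=0.5,
    obs_fn=lambda x: x ** 1.5,  # illustrative non-linear link only
    noise_sd=0.3,
    rng=rng,
)
```

The simulated direct indicator never decreases, whereas the indirect indicator can fluctuate because of the observation noise.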
2. Joint modelling of failure events and multiple indirect indicators
The second objective of this research is to jointly model failure events and multiple
indirect indicators.
The Gamma-based state space model with multivariate observations is adopted to
combine failure events and multiple indirect indicators. The multiple indirect
indicators are modelled by the multivariate observations and the failure times are
modelled by the first crossing time of the underlying system state process. A
parameter estimation algorithm is developed for the Gamma-based state space model
with multivariate observations. The algorithm uses both
failure times and multiple indirect indicators. Moreover, it
must also handle degradation sequences without failure times, i.e.,
censored data. Censored data arise from preventive replacement or from failure
events that were not observed. This research also provides a parametric bootstrap
method to evaluate the effectiveness of different degradation indicators in parameter
estimation and lifetime prediction. After the effectiveness of different indicators is
identified, a more economical CM system can be obtained by omitting unnecessary
sensors.
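To make the notions of first crossing time and censored data concrete, the following sketch simulates lifetimes from a hidden Gamma process. It is an illustration under stated assumptions, not the parameter estimation algorithm of this research: the shape function is assumed linear, the process is advanced on a discrete grid of width `dt`, and the function name `simulate_unit`, the threshold, and the preventive replacement time `replace_at` are hypothetical.

```python
import random


def simulate_unit(shape_rate, scale, threshold, dt, replace_at, rng):
    """Return (time, failed) for one simulated unit.

    The unit fails at the first grid time at which the hidden level reaches
    `threshold`. If that has not happened by `replace_at`, the unit is
    preventively replaced and its record is right-censored.
    """
    level = 0.0
    n_steps = int(round(replace_at / dt))
    for k in range(1, n_steps + 1):
        level += rng.gammavariate(shape_rate * dt, scale)
        if level >= threshold:
            return k * dt, True
    return replace_at, False


rng = random.Random(1)
records = [
    simulate_unit(1.0, 1.0, threshold=10.0, dt=0.1, replace_at=10.0, rng=rng)
    for _ in range(200)
]
n_failed = sum(1 for _, failed in records if failed)
n_censored = len(records) - n_failed
```

Each censored record still carries information: the hidden process is known to have stayed below the threshold up to the replacement time.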
3. Maintenance strategy optimisation
The third objective of this research is to develop a maintenance strategy optimisation
algorithm for an asset whose degradation process follows the Gamma-based state
space model.
This research develops a Monte Carlo-based continuous state partially observable
semi-Markov decision process (POSMDP) to model the maintenance
decision-making process of an asset that follows the Gamma-based state space model.
Optimal maintenance strategies can be obtained by solving the POSMDP.
Inspections, preventive replacement, corrective replacement, and imperfect
maintenance are all considered. Strategies that minimise the expected cost per unit
time and maximise the availability are both investigated. The next maintenance
activity and the waiting time till the next maintenance activity are optimised
simultaneously to achieve an optimal maintenance strategy.
4. Model and algorithm validations
The last objective of this research is to validate the developed models and algorithms
using simulation and field data.
Firstly, this research performs simulation studies to investigate the performance of the
developed parameter estimation algorithms, life prediction algorithms, and
maintenance strategy optimisation algorithms. Secondly, a case study using the data
collected from the accelerated life test of a gear box is conducted to validate the
advantages of the state space degradation model without linear and Gaussian
assumptions when processing monotonically increasing direct indicators. Thirdly, a
case study that uses a field dataset from the liquefied natural gas industry is conducted
to validate the asset RUL prediction ability of the Gamma-based state space model.
1.3 Relationships of the Developed Models and Algorithms
As shown in Figure 1-1, this research is divided into three parts, i.e., modelling
correlated degradation processes of direct and indirect indicators, joint modelling of
failure events and multiple indirect indicators, and maintenance strategy
optimisation. The first part models the degradation processes of direct and indirect
indicators by the Gamma-based state space model. The second part jointly models
failure events and multiple indirect indicators by the Gamma-based state space
model with multivariate observations. The outputs of the first two parts are model
parameters, model formulations, and RULs. The RUL is only a reference for asset
management decision support. After costs and durations of breakdowns and
maintenance activities are known, an optimal maintenance strategy can be further
developed to reduce costs and enhance availability. Consequently, the last part of
this research (i.e., maintenance strategy optimisation) is based on the model
parameters and formulations that are derived by the first two parts and additional
information about costs and durations of breakdowns and maintenance activities.
[Figure 1-1 (flowchart). Input data: direct indicators, indirect indicators, failure times, and costs and durations of breakdowns and maintenance activities. Three parts of this research: 1. Modelling correlated degradation processes of direct and indirect indicators; 2. Joint modelling of failure events and multiple indirect indicators; 3. Maintenance strategy optimisation. Output results: model parameters and formulations, remaining useful life, and optimal maintenance strategies.]
Figure 1-1: Relationships of developed models and algorithms
1.4 Originality and Significance
The state space model is an effective tool to model partially observable asset
degradation processes. However, only a few state space models
applied to degradation modelling are free of discrete time, discrete state, linear, and
Gaussian assumptions. These applications largely adopt the physics-based approach
and assume that model formulations and parameters are known (Cadini et al. 2009;
Orchard et al. 2009).
This research for the first time adopts a Gamma-based state space model to describe
asset degradation processes. The parameter estimation, lifetime prediction, and
maintenance strategy optimisation algorithms are systematically investigated. The
algorithms developed in this research can also be used to process other nonlinear
non-Gaussian state space degradation models. The originality and significance
are summarised in the following three parts:
1. This research develops a Gamma-based state space model to describe the
correlated degradation processes of direct and indirect indicators
This research uses, for the first time, a Gamma-based state space model to model the
correlated degradation processes of direct and indirect indicators. A parameter
estimation method that considers indirect indicators and incomplete direct indicator
observations is developed. Issues encountered when a nonlinear non-Gaussian state
space model is applied to degradation modelling are addressed systematically.
Advantages of the developed models and algorithms are as follows:
1) The monotonically increasing property of the Gamma process is consistent
with irreversible degradation processes of most direct indicators.
Consequently, the Gamma process has been widely applied in modelling
direct indicators (Lawless and Crowder 2004; Kallen and Van Noortwijk
2005; Park and Padgett 2005b; Liao et al. 2006a; Yuan 2007; van Noortwijk
2009). Therefore, the Gamma-based state space model is expected to achieve
a better fit when used to model the correlated degradation
processes of direct and indirect indicators.
2) The observation equation of the Gamma-based state space model does not
assume linearity. Therefore, more complex relationships between
direct and indirect indicators in practice can be described.
3) The developed EM algorithm can process incomplete direct indicator
observations. Direct indicators are often more difficult to obtain than indirect
indicators, and samples of direct indicators are often incomplete. Therefore,
the developed EM algorithm can make use of practical degradation datasets
more efficiently.
4) Existing research on the Gamma process also provides approaches to
consider operation conditions and unit-specific random effects during
degradation modelling (Lawless and Crowder 2004). Based on these
approaches, the proposed Gamma-based state space model can be extended
to deal with more complicated case studies.
2. The developed Gamma-based state space model is also extended to jointly
model failure times and multiple degradation indicators
In this research, the Gamma-based state space model also jointly models failure
events and multiple indirect indicators. Some original work has been done:
1) This research develops a Monte Carlo-based EM algorithm to estimate the
parameters of the Gamma-based state space model. The multiple degradation
indicators and event data are all considered during parameter estimation. The
proposed parameter estimation method can also use censored data whose
failure time is unknown.
2) This research develops an initial parameter identification method for the
Gamma-based state space model using the method of moments and
properties of the Gamma process. The proposed method can process
degradation data with irregular inspection intervals.
3) This research develops a parametric bootstrap method to evaluate the
effectiveness of different degradation indicators that are adopted in the
Gamma-based state space model.
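As an illustration of how the method of moments can cope with irregular inspection intervals, the sketch below matches the first two moments of Gamma-process increments. It is a simplified variant and not the initial parameter identification method of this research: it assumes that the degradation level is observed directly, that the shape function is linear in time, and that the function name `gamma_process_moments` and the synthetic data are hypothetical. For an increment over an interval of length dt, the mean is shape_rate * scale * dt and the variance is shape_rate * scale^2 * dt, which gives the two moment equations used in the code.

```python
import random


def gamma_process_moments(times, levels):
    """Moment-matching estimates (shape_rate, scale) for a Gamma process
    with a linear shape function, observed at irregular times."""
    dts = [t1 - t0 for t0, t1 in zip(times, times[1:])]
    dxs = [x1 - x0 for x0, x1 in zip(levels, levels[1:])]
    # First moment: E[dx] = shape_rate * scale * dt.
    mean_rate = sum(dxs) / sum(dts)
    # Second moment: E[(dx - mean_rate * dt)^2 / dt] = shape_rate * scale^2.
    var_rate = sum(
        (dx - mean_rate * dt) ** 2 / dt for dx, dt in zip(dxs, dts)
    ) / (len(dxs) - 1)
    scale = var_rate / mean_rate
    return mean_rate / scale, scale


# Synthetic check with irregular inspection intervals (illustrative values).
rng = random.Random(2)
true_rate, true_scale = 2.0, 0.5
times, levels = [0.0], [0.0]
for _ in range(5000):
    dt = rng.uniform(0.5, 1.5)
    times.append(times[-1] + dt)
    levels.append(levels[-1] + rng.gammavariate(true_rate * dt, true_scale))
est_rate, est_scale = gamma_process_moments(times, levels)
```

Dividing each squared residual by its own interval length is what allows intervals of different lengths to be pooled in a single estimate.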
Advantages of the developed models and algorithms are as follows:
1) The situation where event data are insufficient can be overcome by
considering both degradation indicators and event data. In addition, the
Gamma-based state space model can also consider the censored data whose
failure time is unknown. These additional censored data can improve the
accuracy of parameter estimation and RUL prediction (Heng et al. 2009).
2) The monotonically increasing property of the Gamma process makes the
establishment of a likelihood function easier when a failure is considered.
For example, during the calculation of the likelihood function for a state
space model based on the Gaussian process, conditional PDFs are required to
ensure that the underlying degradation process does not drift across a failure
threshold between two normal states (Whitmore et al. 1998). Integrals are
needed in these conditional PDFs, which increases difficulties in establishing
and evaluating likelihood functions.
3) This research provides a method to evaluate the effectiveness of different
indicators. Once the effectiveness of each degradation indicator is identified,
a more cost-effective condition monitoring system can be established by
installing only the necessary sensors, and the size of the database that stores
degradation indicators can also be reduced.
4) The algorithms developed in this research can also be used to process state
space models with other non-Gaussian underlying system processes. As a
result, the failure time following different distributions can be modelled.
3. This research proposes a continuous state POSMDP to optimise the
maintenance strategy for an asset whose degradation process follows the
Gamma-based state space model
Existing partially observable Markov decision processes (POMDPs) that are applied
in maintenance strategy optimisation are discrete in time and state, while the
POSMDP developed in this research can process state space degradation models
that are continuous in time and state. Moreover, the proposed POSMDP can deal with
non-Gaussian state space degradation models, which have not been discussed in the
literature.
The proposed continuous state POSMDP has the following advantages when used to
optimise maintenance strategies:
1) The continuous time property allows failures to happen at any time and
accommodates irregular inspection intervals. The continuous state, on the other
hand, avoids discretising continuous asset health states, which may introduce
errors and affect the maintenance strategy optimisation results.
2) Monte Carlo-based methods are used to solve the POSMDP. Consequently,
the proposed POSMDP can be adopted to process various state space models
without Gaussian assumptions.
3) As an extension of the Markov decision process (MDP), the POSMDP can
optimise maintenance strategies without specifying a predetermined strategy
structure (e.g. the control limit theory). Therefore, the POSMDP can derive
more flexible maintenance strategies when multiple maintenance activities
are available.
4) The POSMDP decomposes a long-run decision process into single steps.
Subsequently, some practical issues (e.g., state-dependent maintenance costs
and durations, and uncertain maintenance effects) can be formulated
concisely.
1.5 Related Publications of the Candidate
1. Refereed International Journals
Zhou, Y., L. Ma, et al. (2009). "Asset Life Prediction Using Multiple Degradation
Indicators and Failure Events: A Continuous State Space Model Approach."
Maintenance and Reliability 44: 72-81.
Zhou, Y., Y. Sun, et al. "Latent Degradation Indicators Estimation and Prediction: a
Monte Carlo Approach." Mechanical Systems and Signal Processing. In
Press.
Zhou, Y., L. Ma, et al. "Maintenance Decision-Making with Multiple Maintenance
Options Using a Continuous-State Partially Observable Semi-Markov
Decision Process." Microelectronics Reliability. In Press.
2. Refereed International Conferences
Zhou, Y., L. Ma, et al. (2009). Asset Life Prediction Using Multiple Degradation
Indicators and Lifetime Data: a Gamma-Based State Space Model Approach.
ICRMS' 2009. Chengdu, China, IEEE.
Zhou, Y. (2010). Maintenance Decision-Making Using a Continuous-State Partially
Observable Semi-Markov Decision Process. IEEE – Prognostics & System
Health Management Conference. Macau, China, IEEE.
Zhou, Y., L. Ma, et al. (2008). A Gamma-based Continuous State Space Model for
Asset Degradation. WCEAM-IMS. Beijing, China, Springer-Verlag London
Ltd: 1981-1991.
Zhou, Y., L. Ma, et al. (2008). Latent Degradation Indicator Estimation Using
Condition Monitoring Information. WCEAM-IMS 2008. Beijing, China,
Springer-Verlag London Ltd: 1967-1980.
Yu, Y., L. Ma, et al. (2008). Confidence Interval of Lifetime Distribution Using
Bootstrap Method. WCEAM-IMS. Beijing, China, Springer-Verlag London
Ltd: 1883-1890.
1.6 Structure of the Thesis
Chapter 1 Introduction
At the beginning of this chapter, the background and topic of this research are
presented. Then, the objectives and methodologies of this research are identified.
After that, the relationships of the models and algorithms developed in this research
are presented. Finally, the originality and significance of this research are
summarised.
Chapter 2 Literature Review
The literature review is divided into three parts, i.e., degradation modelling, CBM
decision-making, and solving algorithms for nonlinear non-Gaussian state space
models. The first part surveys different aspects of degradation modelling. The
second part reviews literature that solves different issues involved in CBM decision-
making. The last part of the literature review investigates existing algorithms that
process nonlinear non-Gaussian state space models.
Chapter 3 Modelling Correlated Degradation Processes of Direct and Indirect
Indicators
In this chapter, the underlying health state of an asset is represented by a direct
indicator that relates to a failure mechanism directly. The direct indicator cannot be
sampled frequently due to difficulties of measurement. However, the direct indicator
is assumed to be partially revealed by an indirect indicator. The Gamma-based state
space model with a single degradation indicator is used to model the correlated
degradation processes of direct and indirect indicators. The parameter estimation and
lifetime prediction algorithms for the Gamma-based state space model are
developed. The performance of the developed algorithms is evaluated by simulation
studies. Finally, a case study using the data collected from the accelerated life test of
a gear box is conducted to demonstrate the disadvantage of the linear and Gaussian
assumptions in modelling this particular asset degradation process.
Chapter 4 Joint Modelling of Failure Events and Multiple Indirect Indicators
This chapter considers the situation in which direct indicators are not available. In this
situation, the underlying degradation process becomes an abstract process that is
assumed to be known only at failure times. A Gamma-based state space model with
multiple indicators is used as a joint model of multiple degradation indicators and
failure events. A Monte Carlo-based parameter estimation method that considers
multiple degradation indicators is developed. Censored data whose failure times are
unknown can be used during the parameter estimation. A parametric bootstrap
method is developed to evaluate the effectiveness of different indicators. Finally, a
case study that uses vibration data collected from the liquefied natural gas (LNG)
industry is conducted to validate the RUL prediction ability of the Gamma-based
state space model.
Chapter 5 Maintenance Strategy Optimisation Using the POSMDP
This chapter develops a continuous state POSMDP to optimise maintenance
strategies of an asset whose degradation process follows the Gamma-based state
space model. The continuous state POSMDP is converted to a semi-Markov decision
process (SMDP) through a Monte Carlo-based density projection. Optimal
maintenance strategies are then obtained by solving the converted SMDP through
policy iteration. Maintenance strategies with regular inspection intervals,
irregular inspection intervals, and imperfect maintenance are investigated using the
POSMDP. Simulation studies are carried out to validate the effectiveness of the
POSMDP.
Chapter 6 Conclusions and Future Research Directions
The last chapter summarises the whole thesis and identifies some possible future
research directions.
2 Literature Review
Before the Gamma-based state space model is introduced and the corresponding
RUL prediction and maintenance strategy optimisation algorithms are investigated,
the related literature is reviewed and discussed. This literature review is divided
into three parts. Because asset life prediction and maintenance strategy optimisation
both depend on asset degradation process modelling, commonly used degradation
modelling methods are reviewed in Section 2.1. After that, Section 2.2 discusses
different issues involved in maintenance decision-making. In this research, RUL
prediction and maintenance strategy optimisation are both based on the solving
algorithms for nonlinear non-Gaussian state space models. Therefore, these solving
algorithms are summarised in Section 2.3.
2.1 Degradation Modelling
From the perspective of statisticians, degradation or aging “pertains to a unit’s
position in a state space wherein the probabilities of failure are greater than in a
former position” (Singpurwalla 2006). According to that definition, degradation is
an abstract concept which cannot be observed or measured directly. However,
some physical indicators that reveal a degradation process may exist. When these
degradation indicators are obtained, it is possible to describe a degradation process
with mathematical models. In the reliability community, these mathematical models
are called degradation models. Once a degradation model is established, asset health
prediction and maintenance strategy optimisation can be conducted. Hence,
degradation models play a vital part in condition-based maintenance (CBM).
A degradation model usually contains two components, i.e., degradation processes
of indicators, and the relationship between degradation indicators and failure events.
According to how they describe the relationship between degradation
indicators and failure events, commonly used degradation models can be classified
into three types. The first type is threshold crossing models in which a failure time is
modelled by the time when the degradation process of an indicator crosses a failure
threshold. The second type of degradation model depends on a hazard rate process. In
these degradation models, degradation indicators are related to a failure time
distribution through a hazard rate process. The last type of degradation models,
namely the state space model, consists of a state equation and an observation
equation. The state equation models an underlying degradation process, and the
observation equation describes the relationship between the underlying degradation
process and related degradation indicators. The three types of degradation models
are reviewed in detail in the following sections.
2.1.1 Threshold Crossing Models
Threshold crossing models are the most commonly used degradation models in
engineering asset management. In a threshold crossing model, a failure is assumed to
happen when a degradation indicator crosses a failure threshold. A threshold
crossing model contains two critical components, i.e., degradation processes of
indicators, and failure thresholds on these indicators. In practice, the value of a
degradation indicator at a certain time point follows a conditional distribution given
the operation time, historical degradation indicator observations, and historical
working environment. A flexible mathematical model with a reasonable number of
parameters should be established to derive this conditional distribution. Section
2.1.1.1 reviews commonly used approaches to modelling the stochastic development
process of degradation indicators. The second issue of the threshold crossing model
is the identification of a failure threshold. For a direct indicator that directly relates
to a failure mechanism (e.g., the thickness of a brake pad and the crack depth on a
gear), a fixed failure threshold can be identified. On the other hand, a random failure
threshold must be set on an indirect indicator that does not directly relate to a
failure mechanism (e.g., the indicators extracted from vibration signals and oil
analysis data). Methods to identify a failure threshold are reviewed in Section
2.1.1.2.
2.1.1.1 Degradation Process Modelling
Indicators obtained during asset deterioration are essential information to disclose
asset degradation processes. An appropriate model for degradation indicators is
indispensable for accurate asset life prediction and effective maintenance strategy
optimisation. According to the difference in mathematical assumptions, degradation
indicator modelling methods can be largely classified into four categories, i.e. the
general path model, the random process model, the Markovian stochastic process
model, and the time series model. The four types of modelling methods are reviewed
in the following paragraphs. In addition, this section also surveys methods to model
multivariate degradation processes.
General Path Models
The general path model or the degradation path approach assumes that the
degradation curves of a group of assets follow an identical function form. To model
the difference among individual assets, some of the function parameters are assumed
to be random variables. For example, the degradation curves of $N$ similar assets
follow the product of a common function $f(t)$ and different random variables
$\{A_i;\ i = 1, 2, \ldots, N\}$, i.e., the degradation curve of the $i$th asset is $A_i \cdot f(t)$ (Zuo et
al. 1999). The simplest general path model is the random deterioration rate model
(Frangopol et al. 2004). In the random deterioration rate model, the degradation process
is described as $X(t) = A \cdot t$, where $A$ is a random deterioration rate.
The assumption of the general path model is that the function forms of degradation
curves are known except several individual-independent random parameters.
Therefore, the general path model needs relatively few training samples and is still
effective when only sparse samples of degradation indicators are available.
However, the general path model is not flexible enough. The temporal uncertainty is
not considered due to the predetermined function form of degradation curves.
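The lack of temporal uncertainty in the general path model can be seen in a small simulation of the random deterioration rate model. This is a sketch under stated assumptions: the lognormal distribution of the rates, the function name `random_rate_paths`, and the failure threshold are hypothetical choices, not taken from the cited references.

```python
import random


def random_rate_paths(n_units, times, mu, sigma, rng):
    """Degradation paths of the form rate * t, with one random rate per unit.

    The rates are drawn from a lognormal distribution (an assumption made
    here only to keep them positive).
    """
    rates = [rng.lognormvariate(mu, sigma) for _ in range(n_units)]
    paths = [[rate * t for t in times] for rate in rates]
    return rates, paths


rng = random.Random(4)
times = [1.0, 2.0, 4.0, 8.0]
rates, paths = random_rate_paths(5, times, mu=0.0, sigma=0.3, rng=rng)

threshold = 10.0  # hypothetical fixed failure threshold
failure_times = [threshold / rate for rate in rates]

# A single noise-free observation fixes the whole future path of a unit,
# so no uncertainty remains about how its degradation will evolve over time.
recovered_rates = [path[0] / times[0] for path in paths]
```

Units differ from each other, but each individual path is deterministic once its rate is known, which is the inflexibility noted above.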
Random Process Model
The random process model presumes that the degradation indicators of different
individual assets at the $j$th inspection time point $t_j$ ($j = 1, 2, \ldots, M$) follow a certain
distribution, where $M$ is the number of inspections. The parameters of that distribution
form a time-dependent parameter vector $\boldsymbol{\theta}(t)$. The function of this time-dependent
parameter vector $\boldsymbol{\theta}(t)$ can be estimated by regression using the distribution
parameter vectors estimated at different times, i.e. $\{\boldsymbol{\theta}(t_j);\ j = 1, 2, \ldots, M\}$.
Consequently, stochastic degradation processes of indicators are represented by
distribution parameters that change between inspections (Zuo et al. 1999).
Compared with the general path model, the random process model does not require
the degradation curve of each unit to follow the same function form. Therefore, it is
more flexible. However, the random process model requires distribution parameters
of a degradation indicator at every inspection. As a result, there should be enough
degradation indicator observations at each inspection. This assumption is often
impractical. Moreover, inspection schedules of different units may not be identical.
In this case, interpolation is required (Jiang and Jardine 2008). The interpolation
may introduce additional uncertainties into the process of parameter estimation.
Another disadvantage of the random process model is that the distributions of
degradation indicators of different individuals depend only on time. Consequently,
the heterogeneity of different subjects is not considered.
Markovian Stochastic Process Model
Both the general path model and the random process model are largely based on
regression. These regression models have been widely applied in practice. However,
limitations of these conventional regression based models were reported recently
(Yuan 2007). One of the limitations is that the regression model does not consider
the temporal uncertainty during the progress of degradation. In practice, most
degradation processes experience temporal uncertainty because of individual
randomness and dynamic environment.
Distinct from the general path model and the random process model, the stochastic
process model introduces time-varying uncertainties into degradation modelling.
Therefore, the stochastic process model is more flexible when fitted to practical
degradation indicators. Various stochastic processes have found their applications in
degradation process modelling (Singpurwalla 1995). Most of these stochastic
processes have the assumption of independent increments. The occurrence times and
the sizes of these increments follow different distributions for different stochastic
processes (Yuan 2007). The following paragraphs introduce some stochastic processes
commonly used in degradation modelling.
The Markov chain is a widely used stochastic degradation model that is discrete in time
and state. The Markov chain can be represented as $\{X_n;\ n = 0, 1, 2, \ldots\}$, where $X_n$
denotes the degradation indicator at time $n$. All the possible values of $X_n$ are
contained in a state space $S$. The states in the space $S$ can shift to each other
according to a transition matrix $P$. In the state space $S$, there is a special state
called the absorbing state, which represents the failure state of an asset. The number of
remaining steps required to move from the current state to the absorbing state is
the RUL of the asset. The distribution of the RUL can be obtained according to the
transition matrix. The Markov chain degradation model relies on two assumptions.
One is that the inspection interval is regular, and the other is that the future health
state depends only on the current one. Morcous investigated the impact of the two
assumptions on the effectiveness of a degradation model using the Markov chain
(Morcous 2006).
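The way the RUL distribution follows from the transition matrix can be sketched as follows. The three-state matrix, the function name `rul_distribution`, and the truncation at `max_steps` are illustrative assumptions, not values from the cited work. The probability that the RUL equals k steps is the increase in the probability of being in the absorbing state between step k-1 and step k.

```python
def rul_distribution(transition, start, absorbing, max_steps):
    """Return [P(RUL = 1), ..., P(RUL = max_steps)] for a Markov chain.

    transition : square matrix (list of rows) of one-step transition probabilities
    start      : index of the current state
    absorbing  : index of the absorbing (failure) state
    """
    n = len(transition)
    dist = [0.0] * n
    dist[start] = 1.0
    pmf = []
    absorbed_prev = dist[absorbing]
    for _ in range(max_steps):
        # Propagate the state distribution by one step.
        dist = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
        # Newly absorbed probability mass is the probability of failing at this step.
        pmf.append(dist[absorbing] - absorbed_prev)
        absorbed_prev = dist[absorbing]
    return pmf


# Hypothetical chain: 0 = good, 1 = degraded, 2 = failed (absorbing).
P = [
    [0.9, 0.1, 0.0],
    [0.0, 0.8, 0.2],
    [0.0, 0.0, 1.0],
]
pmf = rul_distribution(P, start=0, absorbing=2, max_steps=200)
mean_rul = sum(k * p for k, p in enumerate(pmf, start=1))
```

For this chain the mean RUL from the good state is the sum of the mean sojourn times in the two working states, 1/0.1 + 1/0.2 = 15 steps, which the truncated sum reproduces closely.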
A drawback of the Markov chain is that the transition probability does not depend on
the time already spent in the current state. The semi-Markov model
overcomes this shortcoming by assuming that the sojourn time in a state follows a
specified distribution. Thus, the transition probability depends on the time spent at
the current state. Black et al. described the process of fitting a semi-Markov model
to a degradation dataset (Black et al. 2005). A case study using a degradation dataset
of switchgear oil was conducted in that paper.
The semi-Markov model removes the discrete time assumption of the Markov chain.
However, it still has the discrete state assumption. The discrete state assumption
requires discretisation of continuous asset health states. To avoid errors brought in
by this discretisation, stochastic degradation processes that are continuous in state have
also been adopted in degradation modelling. Two continuous state stochastic processes
commonly used in degradation modelling are the Wiener process and the Gamma
process. The stochastic process $\{X(t);\ t \ge 0\}$ is said to be a Wiener process when
the independent increment $X(t + \Delta t) - X(t)$ follows a Gaussian distribution with
mean $\mu \Delta t$ and variance $\sigma^2 \Delta t$, for all $t, \Delta t \ge 0$. The $\mu$ and $\sigma$ are called the drift
parameter and the diffusion parameter, respectively. Similarly, a continuous
stochastic process $\{X(t);\ t \ge 0\}$ is a Gamma process when the independent
increment $X(t + \Delta t) - X(t)$ follows a Gamma distribution $Ga(\alpha(t + \Delta t) - \alpha(t), \beta)$.
The increasing function $\alpha(t)$ is the shape function, while $\beta > 0$ is the scale
parameter. The normal distribution and the Gamma distribution both belong to the
class of infinitely divisible distributions. Therefore, they are adopted as the
distribution of independent increments of continuous stochastic processes (Castanier et
al. 2005).
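The difference between the two processes can be seen by simulating both from their independent increments. The sketch below is illustrative only: it assumes a linear shape function for the Gamma process (a constant `shape_rate` per unit time), and the function name `simulate_paths` and the parameter values are hypothetical.

```python
import random


def simulate_paths(n_steps, dt, drift, diffusion, shape_rate, scale, rng):
    """Simulate one Wiener path and one Gamma path on a regular time grid."""
    wiener, gamma = [0.0], [0.0]
    for _ in range(n_steps):
        # Wiener increment: Gaussian with mean drift*dt and variance diffusion^2*dt.
        wiener.append(wiener[-1] + rng.gauss(drift * dt, diffusion * dt ** 0.5))
        # Gamma increment: shape shape_rate*dt and the given scale, never negative.
        gamma.append(gamma[-1] + rng.gammavariate(shape_rate * dt, scale))
    return wiener, gamma


rng = random.Random(3)
wiener, gamma = simulate_paths(
    n_steps=200, dt=0.1, drift=1.0, diffusion=1.0, shape_rate=2.0, scale=0.5, rng=rng
)
wiener_decreases = sum(1 for a, b in zip(wiener, wiener[1:]) if b < a)
gamma_decreases = sum(1 for a, b in zip(gamma, gamma[1:]) if b < a)
```

With these values both processes have the same mean growth per unit time, yet only the Wiener path moves downwards on some steps.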
The Wiener process has been widely used in degradation modelling. Whitmore and
Schenkelberg developed a degradation model using the Wiener process with a time
scale transformation (Whitmore and Schenkelberg 1997). A case study using
accelerated life testing data was conducted in that paper. Whitmore et al. developed
a bivariate Wiener process to deal with the situation when degradation indicators did
not relate to a failure deterministically (Whitmore et al. 1998). By defining a failure
as the first time at which the Wiener process crosses a certain threshold, Lee et al.
assessed the mortality risk of railway workers in relation to their work environment (Lee et al.
2004). The most attractive property of the Wiener process is its mathematical
convenience. For the Wiener process with a white noise, an
analytical likelihood function can be obtained (Hashemi et al. 2003). Explicit
expressions are available when stress and strength are assumed to be two
independent Wiener processes (van Noortwijk 2009). However, because the Wiener
process is not monotonically increasing, it is poorly suited to describing
unrecoverable engineering asset deterioration processes.
Different from the Wiener process, the Gamma process has a monotonically
increasing property. This monotonically increasing property makes the Gamma
process more appropriate to model an irreversible degradation process. In addition,
efficient algorithms have also been developed for the simulation and parameter
estimation of the Gamma process. Due to the monotonically increasing property and
mathematical tractability, the Gamma process has been extensively adopted in
engineering asset degradation modelling. Two case studies about carbon-film
resistors and fatigue crack sizes were carried out using the Gamma process (Park
and Padgett 2005a). Kallen and van Noortwijk used the Gamma process to model a
corrosion damage mechanism while accounting for imperfect inspections (Kallen
and Van Noortwijk 2005). The random effects among individuals as well as
environment covariates were considered while modelling degradation processes
using the Gamma process (Lawless and Crowder 2004). A comprehensive review
about applications of the Gamma process in degradation modelling was conducted
by Van Noortwijk (van Noortwijk 2009). The parameter estimation and simulation
methods of the Gamma process were summarized in that paper.
Time series models
Time series refers to a sequence of data points obtained at uniform time intervals.
Time series analysis aims to understand the underlying mechanism of an observed
data sequence. Based on the underlying mechanism, useful information can be
obtained and the accurate prediction of the observed sequence can be conducted.
Time series is a special case of stochastic processes. Compared with Markovian
stochastic processes, time series models do not necessarily follow the Markovian
assumption. Consequently, time series analysis can handle more general
degradation sequences. Some applications of time series models to describe asset
deterioration processes have been conducted (Stavropoulos and Fassois 2000;
Huitian et al. 2001; Lu et al. 2001).
Compared with Markovian stochastic processes, applications of the time series
analysis in degradation modelling are relatively few. A main reason is that, in time
series analysis, a relatively long sequence of data is required for model identification
and prediction. In reality, such long degradation data sequences are not
common. However, sufficiently long degradation data sequences will become available
for some critical assets with the development of CM technology. Furthermore,
extensions of time series analysis theories (e.g., multivariate time series, intervention
analysis, and nonlinear time series) will make the modelling of practical
degradation data more feasible.
Multivariate Degradation Processes
In reality, an asset may have multiple failure modes; a failure mode may be revealed
by several degradation indicators. The issue of multiple degradation indicators has
been widely discussed in the literature (Whitmore et al. 1998; Lu et al. 2001; Wang and
Coit 2004; Xu and Zhao 2005). The multiple degradation indicators may be
governed by a single degradation process. Therefore correlations may exist among
these degradation indicators. A degradation model should consider these correlations
among indicators. Alternatively, the inter-dependent degradation indicators should
be transformed into independent indicators by dimension reduction algorithms.
Principal component analysis (PCA) is a commonly used dimension reduction
technique. The idea of PCA is to generate a new set of variables, namely principal
components. Each principal component is a linear combination of the original
variables. All the principal components are orthogonal to each other. Consequently
there is no redundant information. Lin et al. applied PCA to reduce the number of
covariates in the proportional hazard model (PHM) (Lin et al. 2006). Wang and
Zhang adopted the PCA to reduce the dimension of the original data set while doing
oil analysis (Wang and Zhang 2005). The original PCA was extended to dynamic
principal component analysis (DPCA) to deal with autoregressive multivariate
degradation data (Makis et al. 2006).
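As a rough illustration of the PCA idea, the sketch below extracts the first principal component of a small set of indicator readings. It uses power iteration on the sample covariance matrix, which is chosen here only to keep the example self-contained and is not taken from the cited studies; the two-indicator data set is hypothetical.

```python
def first_principal_component(rows, n_iter=100):
    """Return (direction, scores) of the first principal component.

    rows: list of observations, each a list of indicator values.
    direction: unit vector of weights that defines the linear combination.
    scores: the value of the first principal component for each observation.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the indicators.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Hypothetical data: the second indicator is an exact linear function of the first,
# so a single principal component carries all the information.
readings = [[float(t), 2.0 * t + 1.0] for t in range(10)]
direction, scores = first_principal_component(readings)
print(direction)
```

In this constructed case the two indicators are fully redundant, so one composite variable replaces both without loss.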
When the number of observations is tractable, asset health prediction can also be
performed without dimension reduction. Liao et al. handled each of multiple
degradation indicators separately by nonlinear model fitting (Liao et al. 2006b).
However, the correlations among indicators should be taken into consideration when
they are significant. Wang and Coit showed that an incorrect independence
assumption may lead to underestimation of system reliability (Wang and Coit 2004). Whitmore
et al. treated the multiple observations as a multivariate Wiener process, in which the
relationships among observations were modelled by the covariance matrix
(Whitmore et al. 1998). Lu et al. employed a multivariate time series model to describe
interdependent degradation data (Lu et al. 2001).
2.1.1.2 Threshold Identification
Once the development process of a degradation indicator is described by a
mathematical model, a well defined threshold is required to indicate the occurrence
of a failure. Various threshold identification approaches have been developed
according to various failure mechanisms, diverse degradation indicator properties,
and different failure data accessibility.
Failures of assets are broadly divided into two categories, i.e. soft (degradation)
failure, and hard (catastrophic) failure (Zuo et al. 1999). A soft failure happens when
the performance of a device deteriorates to an unacceptable level. A soft failure does
not cause an immediate breakdown. For a soft failure, the threshold largely depends
on industry standards, expert knowledge, and the result of optimisation. A hard
failure refers to the complete breakdown of an asset. It can happen at any time
during asset operation. However, the probability of a hard failure often relates to the
health state of an asset (Singpurwalla 2006).
The threshold of a hard failure can also be determined by industry standards or
expert knowledge. However, when enough failure records are available, statistical
approaches to identifying the threshold are preferable. Three commonly used
approaches are adopted to identify the relationship between survival probability and
degradation indicators, i.e. PHM, logistic regression, and multiple time-scale
modelling. The PHM was originally used to model the effects of multiple
environmental covariates. However, some researchers also employed the PHM to
identify the distribution of a random failure threshold on multiple indicators (Jiang
and Jardine 2008). The PHM is introduced in Section 2.1.2.1 in detail. The logistic
regression is a standard statistical technique for binomially distributed
response/dependent variables. When used to identify the failure threshold, the
logistic regression can be written as
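$\Pr(\text{failure} \mid \mathbf{z}) = \dfrac{\exp\left(\sum_i \beta_i z_i\right)}{1 + \exp\left(\sum_i \beta_i z_i\right)}$, (2-1)
where $\boldsymbol{\beta}$ is the regression coefficient vector and $\mathbf{z}$ is the degradation indicator vector.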
Xu and Zhao adopted the logistic regression model to identify the relationship
between degradation indicators and the probability of a catastrophic failure (Xu and
Zhao 2005). The multiple time-scale modelling method identifies a new time scale
that consists of various usages and calendar time. In the new time scale, the lifetime
distribution has the minimum coefficient of variation. In other words, the failure
time is the most predictable in the obtained composite time scale. A failure threshold
can then be set on that composite time scale. Besides calendar time and various
usages, the time scale can also include degradation indicators (Jiang and Jardine
2006).
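The logistic relationship between degradation indicators and failure probability can be sketched as follows. This is a minimal illustration, not the model fitted by Xu and Zhao: the coefficient values are hypothetical, and an intercept is included by the assumed convention of making the first indicator a constant 1.

```python
import math

def failure_probability(beta, z):
    """Logistic failure probability exp(eta) / (1 + exp(eta)) with eta = sum(beta_i * z_i).

    The two branches are algebraically identical; they only avoid overflow in exp().
    """
    eta = sum(b * x for b, x in zip(beta, z))
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

# Hypothetical coefficients: intercept, a temperature-like indicator, a vibration-like indicator.
beta = [-6.0, 0.04, 0.9]
p_healthy = failure_probability(beta, [1.0, 40.0, 1.0])
p_worn = failure_probability(beta, [1.0, 90.0, 3.0])
print(p_healthy, p_worn)
```

With positive coefficients, larger indicator values translate into a higher failure probability, so a probabilistic threshold replaces a fixed indicator level.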
In most situations, an asset degradation process is revealed by multiple
degradation indicators. Two methods can be adopted to identify a failure threshold
using multiple degradation indicators. The first approach combines multiple
degradation indicators into one composite degradation indicator through the PHM or
the multiple time scale model (Jiang and Jardine 2006; Makis et al. 2006). A
threshold can be set up on the composite degradation indicator. An alternative
approach treats the development of multiple degradation indicators as a
degradation curve in a multi-dimensional space. Several threshold boundaries can be
set in the multi-dimensional space. Each boundary corresponds to a particular failure
mode (Lu et al. 2001; Lee and Whitmore 2006).
Most threshold regression methods require enough failure event data. However,
failure history is not always sufficient in reality. Several approaches have been
proposed to overcome the shortage of failure data. Besides the incorporation of
industry standards and expert knowledge, the statistical process control (SPC) is
another approach to set up a criterion for failure occurrences. SPC is an effective
abnormality detection tool, which has the ability to disclose abnormal behaviours
from the processes. Different from other threshold identification methods, SPC is
not based on the abnormal data. Instead, it infers thresholds and control principles
from the normal data. SPC has been used to detect abnormality from the CM signals.
Goode et al. developed an SPC method to divide the whole operational cycle of the
machine into a stable stage and a failure stage (Goode et al. 2000).
2.1.2 Degradation Models Based on the Hazard Rate Process
Hazard rate, a measure of development of risk, plays a fundamental role in reliability
analysis. Hazard rate is defined as the rate of failure for the survivors during the next
instant of time. In the discrete time situation, the hazard rate is given by
$h(t) = \dfrac{R(t) - R(t + \Delta t)}{\Delta t \cdot R(t)}$, (2-2)
where $R(t)$ is the survival function. For the continuous time situation, the hazard rate is
written as
$h(t) = \lim_{\Delta t \to 0} \dfrac{R(t) - R(t + \Delta t)}{\Delta t \cdot R(t)} = f(t) / R(t)$, (2-3)
where $f(t)$ is the PDF of the lifetime. In this review, the hazard rate is continuous in
time unless otherwise stated. When the hazard rate function $h(t)$ is identified, the
CDF $F(t)$ of the failure time can be calculated as
$F(t) = 1 - \exp\left(-\int_0^t h(s)\,ds\right)$. (2-4)
Therefore, the lifetime distribution can be calculated if the hazard rate function is
obtained. Some degradation models assume that the hazard rate is a function of
degradation indicators or environment covariates. Consequently, the lifetime
distribution is related to values of future degradation indicators or environment
covariates. These degradation models that are based on the hazard rate process
consist of two components: (1) the relationships between the covariates and the
hazard rates, (2) the degradation processes of degradation indicators and the change
of environment covariates.
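The step from a hazard rate function to the lifetime CDF in Equation (2-4) can be checked numerically. The sketch below integrates the hazard with the trapezoidal rule; the two hazard functions are hypothetical examples chosen because their CDFs are known in closed form.

```python
import math

def failure_cdf(hazard, t, n_steps=1000):
    """F(t) = 1 - exp(-integral of hazard from 0 to t), using the trapezoidal rule."""
    if t <= 0:
        return 0.0
    dt = t / n_steps
    cumulative_hazard = 0.0
    for i in range(n_steps):
        s0 = i * dt
        s1 = s0 + dt
        cumulative_hazard += 0.5 * (hazard(s0) + hazard(s1)) * dt
    return 1.0 - math.exp(-cumulative_hazard)

def constant_hazard(s):
    """Constant hazard, which gives an exponential lifetime."""
    return 0.2

def increasing_hazard(s):
    """Linearly increasing hazard, which gives a Weibull lifetime with shape 2 and scale 10."""
    return 0.02 * s

print(failure_cdf(constant_hazard, 5.0))     # compare with 1 - exp(-0.2 * 5)
print(failure_cdf(increasing_hazard, 10.0))  # compare with 1 - exp(-(10 / 10) ** 2)
```

The same routine applies unchanged when the hazard is a function of covariates that vary with time, provided their future values are known or predicted.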
2.1.2.1 Covariate-Hazard Relationship Modelling
The PHM proposed by Cox (Cox 1972) is a commonly used approach to model the
relationship between covariates and hazard rates. The formulation of PHM is given
by
$h(t \mid \mathbf{z}(t)) = h_0(t) \exp\left(\boldsymbol{\beta}^{\mathsf{T}} \mathbf{z}(t)\right)$, (2-5)
where $h_0(t)$ is the baseline hazard rate at time $t$, $\mathbf{z}(t)$ is the covariate vector at
time $t$, and $\boldsymbol{\beta}$ is the corresponding regression coefficient vector. Parameter
estimation and goodness-of-fit test approaches for the PHM with time independent
covariates are described in (Prasad and Rao 2002). Liao et al. introduced a
parameter estimation method for the PHM with time dependent covariates (Liao et
al. 2006b). However, sufficient failure event data required by existing parameter
estimation methods are sometimes not available in reality. To overcome the shortage
of data, an approach to incorporate expert knowledge into parameter estimation was
proposed (Zuashkiani et al. 2006). An important assumption of the PHM is that the
effects of covariates are time independent. This time independent assumption of the
PHM is not always true. Kumar and Westberg provided a method to convert a time
dependent covariate to several time independent covariates which can be processed
by the PHM (Kumar and Westberg 1996).
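The proportional structure of the PHM can be illustrated with a short sketch. The baseline hazard, coefficients, and covariate values below are all hypothetical; the point is only that, for covariates fixed in time, the ratio of two assets' hazards does not change with time.

```python
import math

def phm_hazard(t, baseline, beta, covariates):
    """PHM hazard: baseline(t) * exp(sum(beta_i * z_i(t))), where covariates(t) returns z(t)."""
    z = covariates(t)
    return baseline(t) * math.exp(sum(b * x for b, x in zip(beta, z)))

def baseline(t):
    """Hypothetical increasing baseline hazard."""
    return 0.01 * t

beta = [0.8, 0.05]  # hypothetical regression coefficients

def asset_a(t):
    """Covariates of asset A: harsh-environment flag set, load level 20."""
    return [1.0, 20.0]

def asset_b(t):
    """Covariates of asset B: benign environment, same load level."""
    return [0.0, 20.0]

ratio_early = phm_hazard(1.0, baseline, beta, asset_a) / phm_hazard(1.0, baseline, beta, asset_b)
ratio_late = phm_hazard(50.0, baseline, beta, asset_a) / phm_hazard(50.0, baseline, beta, asset_b)
print(ratio_early, ratio_late)
```

Both ratios equal the exponential of the first coefficient, which is the proportional hazards assumption in its simplest form.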
An alternative way to model the relationship between covariates and a hazard rate
process is the additive hazard model (AHM). In contrast to the multiplicative form
in (2-5), the formulation of the AHM, given by
$h(t \mid \mathbf{z}(t)) = h_0(t) + \boldsymbol{\beta}^{\mathsf{T}} \mathbf{z}(t)$, (2-6)
follows an additive form. The AHM does not have the proportional hazard rate
assumption. Therefore, the AHM is more flexible than the PHM. For some
applications, the AHM gives more plausible results than the PHM (Lin and Ying
1994). In the original AHM, the regression vector is time dependent. This time
dependent property makes the AHM more flexible. However, the number of
covariates is limited due to the complexity of the parameter estimation. Lin and Ying
treated the regression vector as time independent to simplify the AHM.
Subsequently, they could estimate the model parameters using a partial likelihood
function similar to the PHM (Lin and Ying 1994). To strike a balance between the
flexibility and mathematical tractability, McKeague and Sasieni proposed a partly
parametric additive risk model (McKeague and Sasieni 1994). The partly parametric
additive risk model assumed that only some elements of the regression vector were
time dependent.
Some researchers developed a hybrid model by combining the PHM and the AHM:
$h(t \mid \mathbf{z}_1(t), \mathbf{z}_2(t)) = \boldsymbol{\beta}_1^{\mathsf{T}} \mathbf{z}_1(t) + h_0(t) \exp\left(\boldsymbol{\beta}_2^{\mathsf{T}} \mathbf{z}_2(t)\right)$. (2-7)
Lin and Ying investigated the additive-multiplicative hazard model and proposed a
class of efficient parameter estimation methods (Lin and Ying 1995). Based on the
hybrid model proposed by Lin and Ying, Torben and Thomas treated the regression
coefficients $\boldsymbol{\beta}_1$ and $\boldsymbol{\beta}_2$ as time dependent variables (Torben and Thomas 2002). The
additive-multiplicative hazard model was employed to investigate the mortality from
cancer (Kravdal 1997). A transformed hazard model was proposed as a unified
formulation of the additive, multiplicative and hybrid hazard models (Zeng et al.
2005).
2.1.2.2 Degradation Indicator and Environmental Covariate Modelling
After the relationship between covariates and hazard rate is established, the hazard
rate process can be derived from the processes of degradation indicators and
environmental covariates. In some applications, properties of an asset that can affect
the failure time (e.g., material of a component) and some environmental covariates
are time independent (Prasad and Rao 2002). For these time independent covariates,
the hazard rate process is calculated as a deterministic function of time, and the
lifetime distribution can be obtained straightforwardly according to Equation (2-4).
For degradation indicators or dynamic environmental covariates, deterministic
functions of time were used as approximations in some applications. Liao et al.
approximated indicator degradation processes as polynomial functions of time (Liao
et al. 2006b). Then, a deterministic hazard rate process was calculated according to
these indicator functions and the PHM. In reality, a deterministic function of time is
often not flexible enough for stochastic development processes of degradation
indicators and environmental covariates. Banjevic and Jardine employed a
continuous time discrete state Markov process to model degradation indicators that
are extracted from the results of oil analysis (Banjevic and Jardine 2006). Makis et
al. adopted the same approach as Banjevic and Jardine (Makis et al. 2006). However,
dynamic principal component analysis (DPCA) was performed before the PHM
was applied, in order to reduce the size of the degradation indicator vector. Some
stochastic processes continuous in time and state are also used to model degradation
indicators and dynamic environmental covariates. However, to facilitate the
calculation of the survival function, some assumptions about degradation indicator
(environmental covariate) processes and the relationship between the hazard rate and
degradation indicators (environmental covariates) are often made. Yashin et al.
assumed that degradation indicators (environmental covariates) follow a Wiener
process and that the hazard rate is a function of the covariates (Yashin and Manton
1997; Yashin et al. 2007). The survival function was then calculated according to
the Cameron-Martin approach.
2.1.3 State Space Degradation Models
In a state space model, the dynamic characteristics of a system are modelled by a
system state process. A general formulation of a discrete time state space model is
given by
$x_{k+1} = A x_k + \Gamma u_k + G w_k$ (2-8)
and
$y_k = C x_k + D u_k + H v_k$ (2-9)
(Garcia Marquez et al. 2007), where $x_k$ is the system state at time $k$, $u_k$ is the input of
the system at time $k$, and $y_k$ is the observation of the system at time $k$. System
disturbance and measurement noise are denoted by $w_k$ and $v_k$ respectively. The
corresponding coefficients, i.e., $A$, $\Gamma$, $G$, $C$, $D$, and $H$, which can be time-dependent,
are determined by the system characteristics. Equation (2-8) is called the state
equation which describes the evolution of system states. Equation (2-9) is the
observation equation that addresses the relationships between observations and
system states. In the original state space model, the observations $y_k$ are
conditionally independent from each other given the underlying system states $x_k$.
However, some extended state space models bring in direct relationships between
observations, e.g., autoregressive hidden Markov model (HMM) (Logan and
Robinson 1997).
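A scalar, linear Gaussian special case of the state equation (2-8) and observation equation (2-9) can be simulated and filtered in a few lines. The sketch below is an illustration under stated assumptions: the coefficient names `a`, `gamma`, and `c`, the noise variances `q` and `r`, and all numerical values are hypothetical, and the standard Kalman filter is used because the model is linear and Gaussian.

```python
import random

def simulate(a, gamma, c, q, r, inputs, x0, rng):
    """Simulate x_{k+1} = a*x_k + gamma*u_k + w_k and y_k = c*x_k + v_k (scalar case)."""
    x, states, observations = x0, [], []
    for u in inputs:
        x = a * x + gamma * u + rng.gauss(0.0, q ** 0.5)  # state equation
        states.append(x)
        observations.append(c * x + rng.gauss(0.0, r ** 0.5))  # observation equation
    return states, observations

def kalman_filter(a, gamma, c, q, r, inputs, observations, m0, p0):
    """Return the filtered state means of the scalar linear Gaussian model."""
    m, p, means = m0, p0, []
    for u, y in zip(inputs, observations):
        m_pred = a * m + gamma * u          # predicted state mean
        p_pred = a * a * p + q              # predicted state variance
        gain = p_pred * c / (c * c * p_pred + r)
        m = m_pred + gain * (y - c * m_pred)
        p = (1.0 - gain * c) * p_pred
        means.append(m)
    return means

rng = random.Random(3)
inputs = [1.0] * 200  # hypothetical constant load acting on the asset
states, observations = simulate(1.0, 0.05, 1.0, 0.01, 1.0, inputs, 0.0, rng)
means = kalman_filter(1.0, 0.05, 1.0, 0.01, 1.0, inputs, observations, 0.0, 1.0)

filter_mse = sum((m - x) ** 2 for m, x in zip(means, states)) / len(states)
raw_mse = sum((y - x) ** 2 for y, x in zip(observations, states)) / len(states)
print(filter_mse, raw_mse)
```

The filtered estimate uses the state equation to pool information across observations, so its error is expected to be smaller than that of the raw observations when the measurement noise dominates.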
When the state space model is used to describe the degradation of an asset, the
underlying asset degradation process is modelled by the state equation, and the
underlying degradation process is partially revealed by observations (i.e.,
degradation indicators). Compared with other degradation models discussed above,
the state space model considers both stochastic asset degradation processes and
uncertain relationships between degradation indicators and health states. Therefore,
the state space model can handle partially observable degradation processes more
efficiently, and no additional mathematical model for time-dependent degradation
indicators is needed. Moreover, the state space model is an effective tool for
indicator fusion. Compared with commonly used multivariate statistical approaches
and multivariate time series analysis methods, the state space model can analyse
degradation indicators with uneven sampling intervals.
When system processes are discrete in time and state, a typical example of the state
space model is the HMM. The system state process of the HMM is a Markov chain
which is not observable. This hidden Markov chain is revealed by observations
probabilistically. The HMM, as a powerful pattern recognition tool, has been widely
used in engineering asset diagnosis. Miao adopted wavelet modulus maxima as a defect
feature and, in order to provide decision information for CBM, presented a two-stage
HMM-based classification system using this feature (Miao 2005). Li et al. obtained
defect feature vectors by the FFT, wavelet transform and bispectrum from the speed-up
and speed-down process in rotating machinery, and then employed HMMs as
classifiers to recognise faults (Li and Pham 2005). Ge et al. used a number of
autoregressive models to describe monitoring signals in different time periods of a
stamping operation and used the residuals as the features. Then, an HMM was
introduced for classification (Ge et al. 2004).
Some research also employed the HMM to model degradation processes of assets.
When the HMM is used to model a degradation process, asset health states are
described by an unobservable Markov chain. An asset suffers a failure when the
underlying Markov chain reaches an absorbing state that represents a failure state.
The underlying Markov chain is revealed by observations probabilistically. The
HMM is an appropriate tool to combine information from inspections with event
data (Jardine et al. 2006). An HMM was constructed by Bunks, McCarthy et al., in which
the state probability densities and state transition probabilities were modelled.
Sixty-eight states due to different torque levels and defect types were used in the
model (Bunks et al. 2000). Wang modelled partially observable asset health states as
a three-state (i.e. good, defective, and failed) Markov chain. The delay time concept,
HMM and filtering theory were combined to form a prognosis model (Wang 2006).
The HMM was extended by adopting a continuous time discrete state Markov
process as the latent system process (Makis and Jiang 2003).
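How a hidden Markov chain is revealed probabilistically by observations can be sketched with the standard forward filtering recursion. The three-state chain (good, defective, failed), the two observation symbols, and all probabilities below are hypothetical and are not taken from the cited models.

```python
def forward_filter(initial, transition, emission, observations):
    """Filtered state probabilities of a discrete HMM.

    initial[i]: state probabilities before the first transition.
    transition[i][j]: probability of moving from state i to state j in one step.
    emission[j][o]: probability of observing symbol o when the chain is in state j.
    Returns one probability vector per observation.
    """
    n = len(initial)
    belief, history = list(initial), []
    for obs in observations:
        # Prediction step: propagate the belief through the Markov chain.
        predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
        # Update step: reweight by the likelihood of the observed symbol.
        unnormalised = [predicted[j] * emission[j][obs] for j in range(n)]
        total = sum(unnormalised)
        belief = [w / total for w in unnormalised]
        history.append(belief)
    return history

# Hypothetical model: states 0 = good, 1 = defective, 2 = failed (absorbing).
initial = [1.0, 0.0, 0.0]
transition = [[0.90, 0.08, 0.02],
              [0.00, 0.85, 0.15],
              [0.00, 0.00, 1.00]]
# Observation symbols: 0 = normal reading, 1 = abnormal reading.
emission = [[0.9, 0.1],
            [0.3, 0.7],
            [0.1, 0.9]]
history = forward_filter(initial, transition, emission, [0, 0, 1, 1, 1])
for belief in history:
    print([round(p, 3) for p in belief])
```

A run of abnormal readings shifts probability mass away from the good state even though no single reading identifies the state with certainty.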
The HMM assumes asset health states to be discrete. However, most engineering assets
degrade continuously. Therefore state space models continuous in state are often
more appropriate for engineering assets. Christer et al. developed a continuous state
discrete time state space model to estimate and predict the erosion status of a furnace
through its conductance ratios (Christer et al. 1997). Recently, Wang proposed a
new state space model by assuming that increments of underlying health states follow a
Beta distribution (Wang 2007). Consequently, Wang’s new model had a monotonically
increasing underlying degradation process that was more similar to irreversible
engineering asset wear processes. However, both models developed by
Christer and Wang were discrete in time, which assumed that inspections and
failures can only happen at discrete time points with regular intervals. In practice,
irregular inspection intervals are often more cost-effective, and failures generally
happen between these discrete inspections.
Some state space degradation models continuous in time and state have also been
proposed. However, these state space degradation models largely follow linear and
Gaussian assumptions. In a linear Gaussian state space model, both the state
equation and the observation equation have a linear form with a Gaussian
random component. Wang et al. developed a state space model to predict the RUL of
bearings using root mean square (RMS) values of vibration signals (Wang 2002).
Wang’s model used values of RUL as underlying health states. This deterministic
underlying degradation process did not consider stochastic heterogeneous
degradation processes of different individuals. Whitmore et al. proposed a bivariate
Wiener process (Whitmore et al. 1998) to model a partially revealed degradation
process. The bivariate Wiener process is also a special type of state space model.
However, the bivariate Wiener process only considered the covariates collected at
failure or censoring times, while degradation indicators at other occasions were
ignored. Hashemi et al. formulated a joint model of a counting process and a
sequence of longitudinal measurements, using the state space model based on the
Wiener process (Hashemi et al. 2003). Proust extended the model of Hashemi to a
nonlinear situation (Proust et al. 2006). The linear and Gaussian properties
provide convenience in mathematical operations. However, the degradation process
of a direct indicator and the relationships between direct and indirect indicators are
not necessarily linear. Moreover, the Gaussian assumption allows a degradation
process to decrease as well as increase. In contrast, most degradation processes
of direct indicators (e.g. wear, corrosion, and crack depth growth) are not reversible.
Nonlinear non-Gaussian state space models have also found applications in
degradation modelling. Cadini et al. used a state space model to describe a fatigue crack
degradation process that followed the Paris–Erdogan model and was monitored by
non-destructive ultrasonic inspection. A particle-based method was used to estimate the
failure probability, and an optimal preventive replacement strategy was also developed
(Cadini et al. 2009). Orchard et al. predicted degradation indicators using the particle
filter with a feedback correction loop that could improve solution accuracy and
reduce uncertainty bounds (Orchard et al. 2009). However, these applications of
nonlinear non-Gaussian state space models largely adopt physics-based approaches
and assume that model parameters are known.
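To make the particle-based idea concrete, the sketch below runs a generic bootstrap particle filter on a simple non-Gaussian state space model. It is not the method of Cadini et al. or Orchard et al.: the state is assumed to grow by Gamma distributed increments, the observation is assumed to be the state plus Gaussian noise, the model parameters are assumed known, and all numerical values are hypothetical.

```python
import math
import random

def simulate_degradation(n_steps, shape, scale, obs_sd, rng):
    """Hidden state grows by Gamma increments; each observation adds Gaussian noise."""
    state, states, observations = 0.0, [], []
    for _ in range(n_steps):
        state += rng.gammavariate(shape, scale)
        states.append(state)
        observations.append(state + rng.gauss(0.0, obs_sd))
    return states, observations

def particle_filter(observations, n_particles, shape, scale, obs_sd, rng):
    """Bootstrap particle filter returning the filtered mean of the hidden state."""
    particles = [0.0] * n_particles
    estimates = []
    for y in observations:
        # Propagate every particle through the (non-Gaussian) state equation.
        particles = [x + rng.gammavariate(shape, scale) for x in particles]
        # Weight each particle by the Gaussian likelihood of the observation.
        weights = [math.exp(-0.5 * ((y - x) / obs_sd) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # Fall back to equal weights if every likelihood underflows.
            weights, total = [1.0] * n_particles, float(n_particles)
        weights = [w / total for w in weights]
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

rng = random.Random(42)
true_states, observations = simulate_degradation(30, 2.0, 0.5, 1.0, rng)
estimates = particle_filter(observations, 500, 2.0, 0.5, 1.0, rng)
for truth, estimate in zip(true_states[-3:], estimates[-3:]):
    print(round(truth, 2), round(estimate, 2))
```

Every particle follows a non-decreasing path, so the filter respects the irreversibility of the degradation process, which a Gaussian state equation cannot guarantee.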
2.1.4 Comments
Degradation modelling has been investigated intensively. However, two practical
issues are only partially addressed by existing research. The first issue is identifying
uncertain failure thresholds of degradation indicators. In reality, an identical
indicator of different individuals may have diverse values when a failure happens.
Therefore, setting a fixed failure threshold on this kind of indicator is not
appropriate. The second issue is fusing multiple degradation indicators extracted
from condition monitoring data. A failure mechanism is often revealed by more than
one degradation indicator, and information from these degradation indicators
should be fused properly. The state space model can address these two issues
effectively. However, existing state space degradation models largely depend on
assumptions such as discrete time, discrete state, linearity, and Gaussianity. The
discrete time assumption requires that inspections and failures happen only at
discretised time points, which is not realistic. The discrete state assumption entails
discretising continuous degradation states, which often introduces additional errors.
The linear and Gaussian assumptions are not consistent with the nonlinear and
irreversible degradation processes of most engineering assets. Therefore,
nonlinear non-Gaussian state space models are expected to model asset
degradation processes more effectively.
2.2 Condition-based Maintenance Decision-Making
Most engineering assets experience deterioration with age and usage. During a
degradation process, the running cost of an asset increases while asset capability
decreases. When the degradation process crosses a failure threshold, a failure will
take place. As a result, additional expenditure for the unexpected breakdown is
incurred. To enhance asset capability and reduce cost, maintenance activities are
required during asset operation. Maintenance is defined as actions to
“control the deterioration process leading to failure of a system” and “restore the
system to its operational state through corrective actions after a failure” (Blischke
and Murthy 2000). In light of this definition, maintenance can be
categorised into preventive maintenance and corrective maintenance. Preventive
maintenance is adopted to control a deterioration process while corrective
maintenance is carried out to bring a failed asset back to a working state.
The preventive maintenance can be divided into three categories, i.e. design-out
maintenance, time-based maintenance, and condition-based maintenance (CBM)
(Blischke and Murthy 2000). The design-out maintenance refers to carrying out
optimisation during the design stage of a component. The time-based maintenance
can be further classified into three subcategories, i.e. clock-based maintenance, age-
based maintenance, and usage-based maintenance. The clock-based maintenance is
carried out at specified times. The age-based maintenance is performed at a certain
age of a component. The usage-based maintenance, on the other hand, is scheduled
based on the usage of a component. When the condition monitoring information of a
component is available, the CBM is preferable. A major advantage of the CBM is
that unnecessary maintenance when an asset is in a good health state can be avoided
(Zhou 2007). This literature review focuses on the research of CBM strategies.
To obtain an optimal CBM strategy, several issues are to be addressed. The first is
the underlying degradation models that have been discussed in Section 2.1. Under
the CBM scheme, a degradation process is assumed to be revealed by the
information collected during inspections. Inspection scheduling is discussed in
Section 2.2.1. In Section 2.2.2, different objectives of maintenance strategy
optimisation are introduced. To achieve these maintenance optimisation objectives,
various optimisation algorithms have been developed. These optimisation algorithms
are reviewed in Section 2.2.3. Additional problems brought in by imperfect
inspections are discussed in Section 2.2.4.
2.2.1 Inspection Scheduling
Asset health inspection is a fundamental approach to acquiring information for CBM
decision-making. Similar to other maintenance activities, an inspection can incur
additional costs. Some inspection methods even require the shutdown of an asset.
Therefore, inspections should be well scheduled to reduce cost and enhance asset
availability.
Asset health inspections can be carried out continuously or only at discrete time
points. In practice, continuous condition monitoring is often technically or
economically impossible. Therefore, most current CBM methods adopt discrete
inspections. Wang et al. classified the inspection intervals of discrete inspections into three
categories: regular inspection intervals, inspection intervals with a limited number of
different lengths, and inspection intervals with arbitrary lengths (Wang et al. 2000).
For example, Amari and McLaughlin employed a maintenance strategy with regular
inspection intervals (Amari and McLaughlin 2004). Grall et al. assumed that
inspection intervals of different lengths could be chosen according to the
current health state (Grall et al. 2002). Identification of the next inspection epoch
was regarded as an optimisation problem given the past inspection information in the
research of Christer and Wang, and arbitrary length of inspection intervals could be
used (Christer and Wang 1995).
Some research about CBM strategies with continuous inspections is also available.
Marseguerra et al. considered a continuously monitored multi-component system
and used a Genetic Algorithm (GA) to determine the optimal CBM policy
(Marseguerra et al. 2002). A condition-based maintenance model was developed for
degradation processes that follow the Gamma process and are under continuous
monitoring (Liao et al. 2006a). Barata et al. modelled a continuously monitored
system through a Monte Carlo simulation method (Barata et al. 2002).
2.2.2 CBM Optimisation Objectives
A maintenance strategy is optimised according to a single or multiple objectives.
These objectives relate to the property of an engineering asset and its functions in an
enterprise. Most research in maintenance optimisation focuses on two aspects, i.e.
cost, and availability.
Cost is the most commonly used criterion to optimise maintenance strategies
(Hontelez et al. 1996; Barata et al. 2002). Most maintenance activities, e.g., health
inspections, preventive maintenance, and corrective maintenance, incur a certain
amount of costs. In addition, a failure usually causes an additional cost. Maintenance
costs can be state-independent (Wang et al. 2000), predetermined functions of health
states (Moustafa et al. 2004), or random variables that depend on health states (Chen
and Trivedi 2005). Health states can also affect operating costs (Moustafa et al.
2004) and production profits (Wang 2009). The cost objectives can be evaluated as
the expected cost per unit time over an infinite or finite horizon.
Cost is an effective criterion for maintenance decision-making. However, the costs of
maintenance activities or of an unexpected breakdown of an asset are often difficult to
evaluate. On the other hand, the down time of an asset can often be measured
accurately. In these situations, availability becomes a more reasonable criterion
for optimising maintenance strategies. The availability is the proportion of time for
which a machine is available for use. The formulation of the availability is given by
Availability = Up Time / (Up Time + Down Time), (2-10)
where the up time is the time for which equipment is operable and the down time
refers to inoperable time. Amari and McLaughlin illustrated algorithms to find the
optimal model parameters that maximise the system availability (Amari and
McLaughlin 2004). A condition-based availability limit policy was developed for a
continuously monitored degrading system (Liao et al. 2006a).
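As a small illustration of Equation (2-10), the following sketch computes the availability from recorded up time and down time. The function name and the example figures are hypothetical and are not taken from the cited studies.

```python
def availability(up_time: float, down_time: float) -> float:
    """Proportion of the total time for which the equipment is operable."""
    total = up_time + down_time
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return up_time / total


# Hypothetical record: 950 h operable and 50 h inoperable.
example = availability(950.0, 50.0)
```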
Some maintenance strategy optimisation methods consider multiple objectives.
Marseguerra et al. performed maintenance decision-making as a multi-objective
search according to both the profit and availability of a multi-component system
(Marseguerra et al. 2002). Bris et al. optimised maintenance strategies according to
the cost under the constraint of availability (Bris et al. 2003). Muñoz et al.
considered two objective functions (i.e. cost and risk) when optimising maintenance
strategies (Muñoz et al. 1997). A constraint was set on one of the objective functions
while the other objective function was adopted to optimise the maintenance strategy.
Constraints were also set on the values of variables in the objective functions.
From the perspective of the entire business process, a maintenance strategy should
be integrated with other related strategies of enterprise management (e.g. the spare
inventory strategy, and the configuration of manufacturing system). Zhou performed
a joint optimisation of maintenance scheduling and the production dispatching in a
complex multi-product manufacturing system (Zhou 2007). The optimisation of
spare part inventory and maintenance policy was carried out simultaneously by Ilgin
and Tunali (Ilgin and Tunali 2007).
2.2.3 CBM Optimisation Methods
The objective function of maintenance decision-making can be established
according to a property of the renewal reward process (Ross 1996). The renewal
reward process can be defined as pairs $(X_n, R_n);\ n = 1, 2, \ldots$, where $X_n$ with
distribution $F$ are the interval lengths of a renewal process $\{N(t);\ t \ge 0\}$ and $R_n$ are
the rewards earned during the renewals. The rewards $R_n$ are independent and
identically distributed. The total reward up to time $t$ is given by $R(t) = \sum_{n=1}^{N(t)} R_n$. If
the expected length of a renewal interval and the expected reward per interval satisfy
$E[X] < \infty$ and $E[R] < \infty$, the long-run expected reward per unit time can be
obtained as
$\lim_{t \to \infty} E[R(t)] / t = E[R] / E[X]$. (2-11)
The asset life between two maintenance activities (e.g., preventive or corrective
replacement) that bring the asset to an as good as new state can be modelled as a
renewal interval. During the interval, a certain cost can be incurred. Therefore, a
degradation process under maintenance can be modelled as a renewal reward
process, and the long-run expected cost per unit time can be calculated as Equation
(2-11).
When the average cost per unit time can be calculated efficiently by Equation (2-11),
the optimal maintenance strategy can be obtained through directly searching the
strategy space. Park optimised the preventive replacement threshold and the
inspection interval for a Gamma degradation process using the renewal reward
process (Park 1988). Crowder and Lawless developed a maintenance strategy for an
asset whose degradation process follows a Gamma process with a
random effect that accounts for heterogeneity across units (Crowder and Lawless
2007). The expected cost per unit time was calculated according to Equation (2-11).
Grall et al. proposed a multi-level control limit strategy for a continuous degradation
process (Grall et al. 2002). The analytical formulation of the long-run expected cost
per unit time was established using the property of the renewal reward process.
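The renewal reward argument can be illustrated by a minimal Monte Carlo sketch. It assumes a Gamma degradation process inspected at regular intervals, a preventive replacement threshold, and a failure level whose crossing is only discovered at an inspection. All parameter values and names (`cost_rate`, `fail_level`, the cost figures) are hypothetical and do not reproduce any of the cited models. The long-run cost per unit time is estimated as the ratio of the average cycle cost to the average cycle length.

```python
import numpy as np


def cost_rate(threshold, interval, n_cycles=20000, seed=0,
              shape_rate=1.0, scale=1.0, fail_level=10.0,
              c_insp=1.0, c_prev=20.0, c_fail=100.0):
    """Monte Carlo estimate of E[cycle cost] / E[cycle length]."""
    rng = np.random.default_rng(seed)
    total_cost = 0.0
    total_time = 0.0
    for _ in range(n_cycles):
        level, elapsed, cost = 0.0, 0.0, 0.0
        while True:
            # Gamma-process increment over one inspection interval.
            level += rng.gamma(shape_rate * interval, scale)
            elapsed += interval
            cost += c_insp
            if level >= fail_level:      # failure found at the inspection
                cost += c_fail
                break
            if level >= threshold:       # preventive replacement
                cost += c_prev
                break
        # Each replacement renews the asset, so a cycle is a renewal interval.
        total_cost += cost
        total_time += elapsed
    return total_cost / total_time
```

Searching the strategy space then amounts to evaluating `cost_rate` over a grid of candidate thresholds and inspection intervals and keeping the pair with the lowest estimate.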
The renewal reward process is a useful tool to optimise maintenance strategies.
However, in some applications, the number of variables to optimise is large and
some constraints may be applied to maintenance strategies. In these applications, it
is difficult to obtain the optimal maintenance strategy by directly searching the
strategy space. Artificial intelligence algorithms are often used to optimise
maintenance strategies in these situations. Marseguerra et al. used the genetic
algorithm (GA) to optimise the maintenance strategy regarding both availability and
profits (Marseguerra et al. 2002). Ilgin and Tunali used the GA to optimise the
preventive maintenance policy and spare provision simultaneously (Ilgin and Tunali
2007). The GA and hybrid GA/simulated annealing (SA) techniques were compared
when maintenance scheduling was optimised (Mohanta et al. 2007).
To calculate the expected cost per unit time using Equation (2-11), the structural
characteristics of the optimal maintenance strategy (e.g., control limit theory) should
be identified first. In some situations, the structure of the optimal maintenance
strategy may differ from the structure that was assumed in advance. For example,
Moustafa et al. showed that the optimal maintenance strategy for the degradation
system that was discussed in their paper did not necessarily follow the control limit
theory (Moustafa et al. 2004). Maillart also demonstrated counterintuitive structural
properties of the optimal maintenance strategy for a Markovian deterioration system
with obvious failures (Maillart 2006). When multiple maintenance activities are
available, it becomes even more difficult to identify the maintenance structure.
The Markov decision process (MDP) and its extensions are often used to investigate
the optimal structure of a maintenance strategy. Makis and Jardine used the MDP to
model the maintenance decision process for an asset that follows PHM, and the
condition for the effectiveness of the control limit theory on a hazard rate process
was derived (Makis and Jardine 1992). The structure characteristics of the optimal
maintenance strategy for a classic two-state production process were investigated by
several papers using the partially observable Markov decision process (POMDP)
that is an extension of MDP (Ross 1971; Wang 1976; White 1978; White 1979;
Grosfeld-Nir 2007). Hopp and Kuo investigated the structure of the optimal
maintenance strategy of partially observable aircraft engine components using the
POMDP (Hopp and Kuo 1998).
Besides the structural properties of an optimal maintenance strategy, the parameters of
an optimal maintenance strategy can also be derived by the MDP and its extensions.
Chen and Trivedi performed joint optimisation of inspection rate and maintenance
activities using the semi-Markov decision process (SMDP) (Chen and Trivedi 2005).
Amari et al. developed a maintenance strategy optimisation method that could be
applied to a wide range of stochastic deterioration processes (Amari et al. 2006).
Chan and Asgarpoor developed a maintenance optimisation method that considered
both random failures and failures due to deterioration using the MDP (Chan and
Asgarpoor 2006). Moustafa et al. investigated maintenance strategies for a multi-
state semi-Markov deterioration process that involved multiple maintenance actions
(Moustafa et al. 2004). Both the control limit theory and the policy iteration for the
SMDP were applied to derive the optimal maintenance strategies. The results
showed that more cost-effective maintenance strategies could be developed by the
SMDP.
2.2.4 Imperfect inspections
In some situations, asset health states cannot be acquired deterministically by
inspections. Simply ignoring the uncertainty of asset health state estimates can cause
excessive false alarms or breakdowns without pre-alarms.
A commonly used approach to model the uncertain relationship between asset health
states and degradation observations is the PHM. Some research has been conducted
to optimise the maintenance strategy of an asset whose degradation process is
described by the PHM. Makis and Jardine used a backward recursion algorithm to
obtain the optimal maintenance strategy for a degradation process that followed the
PHM (Makis and Jardine 1992). Some applications and extensions of the method of
Makis and Jardine have also been published (Vlok et al. 2002; Lin et al. 2006; Ghasemi et al.
2008). Kumar and Westberg used a total time on test (TTT) plot based on the PHM to
estimate the optimal maintenance time interval and threshold values for monitored
variables (Kumar and Westberg 1997). Kobbacy et al. proposed a full history PHM
for preventive maintenance scheduling, which considered multiple maintenance
cycles. However, their method assumed that covariates were time independent
between two maintenance activities (Kobbacy et al. 1997).
An alternative approach to describing a partially observable degradation process is
the state space model. Compared with the PHM that estimates a health state only
based on the information acquired from the latest inspection, the state space model
uses the degradation observations up to the current time to identify a health state.
After the effects and costs of maintenance activities are considered, the state space
degradation model becomes a POMDP. Existing research on maintenance strategy
optimisation using the POMDP can be largely divided into two types. The first type
investigates the structure property of the optimal maintenance strategy. Monahan
(Monahan 1982) reviewed some early papers (Ross 1971; Wang 1976; White 1978;
White 1979) investigating maintenance strategies for a classic two-state production
process. Optimal maintenance strategy structures under different assumptions were
identified using the POMDP. More recent research by Grosfeld-Nir investigated the
two-state production process again using the POMDP and obtained a weaker
condition for the optimal policy to be of a control limit type (Grosfeld-Nir 2007).
The second type of research further identifies the optimal maintenance strategies by
solving POMDPs. Ghasemi et al. derived the optimal condition-based maintenance
policy with regular maintenance intervals for a discrete state degradation process
using the POMDP (Ghasemi et al. 2008). Maillart optimised the inspection intervals
and maintenance actions at different health states using the POMDP (Maillart 2006).
Both perfect and imperfect inspections were considered. However, Maillart assumed
that the degradation process was discrete in time and state.
2.2.5 Comments
CBM has been investigated comprehensively. However, most research assumes
that inspections reveal health states completely. Existing approaches to
optimising maintenance strategies for partially observable degradation processes are
still limited, and they largely assume discrete
time and states. On the other hand, most practical asset degradation processes are
continuous in state. Moreover, failures and maintenance activities do not only
happen at discrete time epochs with regular intervals. Therefore, maintenance
strategy optimisation methods for partially observable degradation processes that are
continuous in time and state need further investigation.
2.3 Solving Algorithms for Nonlinear Non-Gaussian State
Space Models
This research investigates algorithms for asset life prediction and maintenance
strategy optimisation using the Gamma-based state space model. The Gamma-based
state space model does not have the linear and Gaussian assumptions. Therefore,
existing exact solving algorithms (e.g., Kalman filter) are not effective for the
Gamma-based state space model. This section reviews commonly used approximate
solving algorithms for nonlinear non-Gaussian state space models. Three types of
solving algorithms are encountered during asset life prediction and maintenance
strategy optimisation. The first type comprises basic inference algorithms that estimate
distributions of underlying system states using observations. These inference
algorithms can be conducted recursively using Bayesian theory. The second type
comprises parameter estimation algorithms for the state space model. The
last type addresses the control of the state space model. In
practice, a change of state can incur costs or rewards; e.g., in
degradation modelling, when the change of the underlying health state indicates a
failure, the costs of a breakdown and corrective maintenance are incurred. Control
algorithms optimise the actions that can affect the state transition to minimise the
costs or maximise the rewards. The three types of solving algorithms for the
nonlinear non-Gaussian state space model are reviewed in the following sections.
2.3.1 Basic Inference Algorithms
Two basic inference algorithms for the state space model are used in this research,
i.e., filtering and smoothing.
2.3.1.1 Filtering
The filtering algorithm estimates the present system state using the observations up
to the current time. For a state space model continuous in state, the filtering can be
performed recursively as
$p(x_k | y_{1:k}) \propto p(y_k | x_k) \int p(x_k | x_{k-1})\, p(x_{k-1} | y_{1:k-1})\, dx_{k-1}$, (2-12)
where $x_k$ and $y_k$ denote the system state and the observation at the $k$th inspection. In
this research, the filtering algorithm is used to estimate the present health state given
degradation indicators up to the current time. In addition, the filtering algorithm is
the basis of the other solving algorithms for the state space model.
For state space models with the linear and Gaussian assumptions, the Kalman filter
is used to estimate system states analytically. In a linear and Gaussian state space
model, if the filtering result at the $k$th inspection, i.e., $p(x_k | y_{1:k})$, follows the
Gaussian distribution, the next filtering result $p(x_{k+1} | y_{1:k+1})$ will also follow the
Gaussian distribution. Therefore, if the initial state is known or follows the Gaussian
distribution, the following state estimates will all follow the Gaussian distribution. A
Gaussian distribution can be represented by two variables, i.e., the mean value $\mu_k$,
and the variance $\sigma_k^2$. Consequently, the filtering algorithms essentially calculate
mean values and variance values of system states at different time steps. The
Kalman filter provides an approach to calculate the mean value and the variance
value recursively using $\mu_{k-1}$, $\sigma_{k-1}^2$, and the current observation $y_k$. For the
derivation of the Kalman filter, readers can refer to (Yu et al. 2004).
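The recursion on the mean and the variance can be sketched for a scalar model. The sketch assumes a hypothetical state equation x_k = a x_{k-1} + process noise with variance q and a hypothetical observation equation y_k = c x_k + observation noise with variance r. The function name and default values are illustrative only.

```python
def kalman_step(mean_prev, var_prev, y, a=1.0, c=1.0, q=0.1, r=1.0):
    """One predict/update cycle of a scalar Kalman filter."""
    # Prediction: push the previous Gaussian through the linear state equation.
    mean_pred = a * mean_prev
    var_pred = a * a * var_prev + q
    # Update: correct the prediction with the current observation y.
    gain = var_pred * c / (c * c * var_pred + r)
    mean_new = mean_pred + gain * (y - c * mean_pred)
    var_new = (1.0 - gain * c) * var_pred
    return mean_new, var_new
```

Calling `kalman_step` once per inspection, feeding each output back in as the next input, yields the sequence of Gaussian filtering results.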
For nonlinear non-Gaussian state space models, the filtering result $p(x_k | y_{1:k})$ does
not necessarily follow a particular parametric distribution. Therefore, the
distribution $p(x_k | y_{1:k})$ cannot be represented exactly by a fixed number of
parameters, and approximate filtering algorithms are required to process
nonlinear non-Gaussian state space models. A commonly used approximate filtering
algorithm is the extended Kalman filter (EKF). The EKF performs local linearization
of the state equation and the observation equation by derivatives. After the
linearization, the original state space model is approximated as a linear and Gaussian
state space model and the Kalman filter can be adopted. However, when the state
equation and the observation equation are highly nonlinear, the derivative-based
linearization does not give satisfactory approximations. To improve on the performance of the EKF,
another approximate filtering algorithm named the unscented Kalman filter (UKF)
was developed (Julier and Uhlmann 1997). The UKF uses a deterministic sampling
technique, i.e., the unscented transform, to pick a minimal set of sample points (i.e.,
sigma points). These sigma points are processed by the state equation and the
observation equation. After that, the mean value and the variance value are
recovered according to these sigma points.
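The unscented transform can be sketched for a one-dimensional state. The sketch uses the basic three sigma point scheme with a scaling parameter `kappa`. The function name and the default `kappa = 2` are assumptions made for illustration, and a full UKF would additionally handle the observation update.

```python
import math


def unscented_transform_1d(mean, var, f, kappa=2.0):
    """Propagate a scalar Gaussian (mean, var) through f using three sigma points."""
    n = 1  # state dimension
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    # Process every sigma point by the (possibly nonlinear) function.
    ys = [f(p) for p in points]
    # Recover the mean and the variance from the transformed sigma points.
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var
```

For a linear function the recovered mean and variance are exact, which gives a simple sanity check of an implementation.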
Another filtering algorithm for the nonlinear non-Gaussian state space model is the
particle filter. Different from the UKF, the particle filter represents the distribution
of a system state by a large number of random samples instead of several
deterministic sampling points. In addition, the result of the particle filter is not fitted
by a Gaussian distribution. Therefore, the particle filter can obtain more accurate
estimation results than the UKF at the expense of lower computational efficiency. With the
growth of computational power, the particle filter is becoming prevalent in
processing nonlinear non-Gaussian state space models.
The process of the particle filter follows the principle of importance sampling: a
function $p(\cdot)$ is assumed to be the PDF of a distribution difficult to draw samples
from directly. The values of function $\pi(\cdot)$ are proportional to those of $p(\cdot)$. An
“importance density” $q(\cdot)$ that can generate random numbers easily is selected to
generate a certain number of particles $x^i \sim q(x),\ i = 1, 2, \ldots, N_s$, where $i$ is the index
of a particle, and $N_s$ is the number of particles. Following on, the distribution $p(\cdot)$ is
represented approximately as
$p(x) \approx \sum_{i=1}^{N_s} w^i \delta(x - x^i)$, (2-13)
where $\delta(\cdot)$ is the Dirac delta measure given by
$\delta(x) = \begin{cases} 0, & x \neq 0 \\ 1, & x = 0 \end{cases}$, (2-14)
and the weight of the $i$th particle is calculated according to
$w^i \propto \pi(x^i) / q(x^i)$. (2-15)
For the particle filter, Equation (2-15) can be written as
$w_k^i \propto p(x_{0:k}^i | y_{1:k}) / q(x_{0:k}^i | y_{1:k})$, (2-16)
and the weights $\{w_k^i;\ i = 1, 2, \ldots, N_s\}$ can be calculated recursively
according to
$w_k^i \propto w_{k-1}^i \cdot p(y_k | x_k^i)\, p(x_k^i | x_{k-1}^i) / q(x_k^i | x_{k-1}^i, y_k)$. (2-17)
For the derivation of Equation (2-17) from Equation (2-16), readers can refer to
(Arulampalam et al. 2002b). After $w_k^i$ are worked out, the posterior distribution of
the system state at the $k$th inspection is approximated as
$p(x_k | y_{1:k}) \approx \sum_{i=1}^{N_s} w_k^i \delta(x_k - x_k^i), \quad k = 1, 2, \ldots, T$, (2-18)
where $T$ is the number of inspections.
A problem of the particle filter is degeneracy. After several time steps, the weight of
one particle may have a dominant value, while the weights of the other particles tend
to zero. In this situation, most of the computational effort is spent on particles whose
effects on the filtering results are negligible. The degeneracy is caused by the difference
between the importance density $q(x_k | x_{k-1}, y_k)$ and the posterior density $p(x_k | y_{1:k})$.
Doucet et al. proved that the variance of the importance weights $w_k^i$ increases with the
time index (Doucet et al. 2000). The degeneracy can be alleviated by adopting an
importance density close to the posterior density and can be overcome by resampling
the particles.
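A common diagnostic for degeneracy, which is not part of the text above but is widely used alongside it, is the effective sample size computed from the normalised weights. The following sketch assumes the weights are held in a NumPy array, and the function name is illustrative. A value close to the number of particles indicates evenly spread weights, while a value close to one indicates that a single particle dominates.

```python
import numpy as np


def effective_sample_size(weights):
    """Effective sample size 1 / sum(w_i^2) of a set of importance weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise so that the weights sum to one
    return 1.0 / np.sum(w ** 2)
```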
Arulampalam et al. summarised the approaches to obtaining an importance density that
approximates the posterior density during particle filtering (Arulampalam et al. 2002b).
Doucet et al. proposed a local linearization method, similar to the EKF, to obtain an
importance density that is close to the posterior density (Doucet et al. 2000).
The unscented particle filter (Van Der Merwe et al. 2000) obtains the
importance density using the UKF. These methods can improve
the filtering results when the posterior density $p(x_k | y_{1:k})$ is significantly different
from the prior density $p(x_k | y_{1:k-1})$. However, the additional approximation step makes
the filtering algorithm more computationally expensive. Therefore, when the posterior
density and the prior density do not differ significantly, increasing the
number of particles is a more efficient way to improve the filtering result.
Adopting an importance density close to the posterior density can only alleviate the
degeneracy. The variance of the importance weights $w_k^i$ still increases over time.
Therefore, resampling particles according to their weights is often indispensable in
particle filtering methods. However, the resampling brings in another problem. After
resampling, a particle with a large weight can be duplicated many times and the
diversity among the particles is lost. This phenomenon is named sample
impoverishment. A filtering result that suffers from severe sample impoverishment
is a poor representation of the posterior density. Some methods to reduce sample
impoverishment were discussed in (Arulampalam et al. 2002a). Similar to the
algorithms for approximating posterior densities, the algorithms that reduce
sample impoverishment also require additional computational effort.
Both the significant difference between the importance density and the posterior
density and sample impoverishment are caused by small observation noise.
However, in degradation modelling, the observation noise is often considerable;
otherwise the threshold crossing model is more appropriate than the state space
degradation model. Therefore, this research does not consider the approximation of the
posterior density during importance sampling. The Sampling Importance
Resampling (SIR) filter, one of the most commonly used particle filters, is adopted
in this research. The SIR filter chooses the prior density as the importance density,
i.e. $q(x_k | x_{k-1}^i, y_k) = p(x_k | x_{k-1}^i)$, and particles are resampled in every time step.
During the resampling, random numbers $\{x_k^{i*};\ i = 1, 2, \ldots, N_s\}$ are
sampled from $\{x_k^i;\ i = 1, 2, \ldots, N_s\}$ according to the weights $w_k^i$. When the
observation noise is not much smaller than the process noise, the SIR filter is an
efficient and effective approach to estimate underlying system states of a state space
model.
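The SIR filter described above can be sketched as follows. The sketch assumes a hypothetical degradation model in which the hidden state grows by Gamma-distributed increments and is observed with additive Gaussian noise. The model, the parameter values, and the function name are illustrative assumptions rather than the model developed in this research. Because the prior density is used as the importance density, the weights reduce to the observation likelihood.

```python
import numpy as np


def sir_filter(observations, n_particles=2000, shape=1.0, scale=1.0,
               obs_std=1.0, seed=0):
    """Return the filtered mean of the hidden state after each observation."""
    rng = np.random.default_rng(seed)
    particles = np.zeros(n_particles)  # initial state assumed known and equal to 0
    means = []
    for y in observations:
        # Importance sampling from the prior (state transition) density.
        particles = particles + rng.gamma(shape, scale, size=n_particles)
        # Weights are proportional to the Gaussian observation likelihood.
        log_w = -0.5 * ((y - particles) / obs_std) ** 2
        w = np.exp(log_w - log_w.max())  # subtract the maximum for stability
        w /= w.sum()
        # Multinomial resampling at every time step.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
        means.append(particles.mean())
    return np.array(means)
```

Multinomial resampling is used here for brevity. Other resampling schemes, such as systematic resampling, could be substituted without changing the structure of the loop.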
2.3.1.2 Smoothing
Filtering algorithms only use the observations up to the current time when estimating
the posterior distribution of a system state. Smoothing algorithms, on the other hand,
use the entire sequence of observations $y_{1:T}$ to estimate the distribution of a system state,
i.e. $p(x_k | y_{1:T})$. Therefore, smoothing algorithms can obtain more accurate and
robust estimates of underlying system states.
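Two types of particle smoothing algorithms are commonly used, i.e. the forward-backward
smoother, and the two-filter smoother. The two methods are both based on
the result of particle filtering, i.e., the resampled particles
$\{x_k^i;\ i = 1, 2, \ldots, N_s;\ k = 1, 2, \ldots, T\}$. During
the forward-backward smoothing, the filtering result at the final time step $T$ is adopted as the
smoothing result at that time, i.e. $\tilde{x}_T^i = x_T^i$. Then the weights are calculated
recursively from the end to the beginning according to
$w_{k|T}^i = \frac{1}{N_s} \sum_{j=1}^{N_s} \frac{p(\tilde{x}_{k+1}^j | x_k^i)}{\sum_{l=1}^{N_s} p(\tilde{x}_{k+1}^j | x_k^l)}, \quad i = 1, 2, \ldots, N_s;\ k = 1, 2, \ldots, T-1$, (2-19)
where $\{\tilde{x}_k^i;\ i = 1, 2, \ldots, N_s;\ k = 1, 2, \ldots, T\}$ are the smoothing results. After
that, the smoothing results at the $k$th step are resampled from the filtering result $\{x_k^i\}$
according to the weights $w_{k|T}^i$. The idea of the two-filter smoother can be
demonstrated as
$p(x_k | y_{1:T}) = p(x_k | y_{1:k-1}, y_{k:T}) = \frac{p(x_k | y_{1:k-1})\, p(y_{k:T} | y_{1:k-1}, x_k)}{p(y_{k:T} | y_{1:k-1})} \propto p(x_k | y_{1:k-1})\, p(y_{k:T} | x_k)$. (2-20)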
For details of the two-filter smoother, readers can refer to (Klaas et al. 2006). A
drawback of the two smoothing algorithms is that the system states at different time
steps are estimated independently. In many applications, the joint distribution of
system states at different time steps is required. For example, in this research, joint
distributions of two adjacent system states are required during parameter
estimation.
To address this drawback of particle smoothing, Godsill et al. (Godsill et al. 2004)
proposed the particle smoother using the backward simulation algorithm. This algorithm
can obtain the joint distribution of the whole sequence of system states given the
entire sequence of observations. Therefore, this smoothing algorithm has been widely
adopted in the parameter estimation of nonlinear non-Gaussian state space models
(Gibson and Ninness 2005; Kim 2005; Schön et al. 2006). The particle smoother
using the backward simulation algorithm is based on the results of the particle filter,
i.e. the resampled particles $\{x_k^i\}$. The recursive algorithm to calculate the weights of the smoothing
particles is given by
$w_{k|k+1}^{i,j} = \frac{p(\tilde{x}_{k+1}^j | x_k^i)}{\sum_{l=1}^{N_s} p(\tilde{x}_{k+1}^j | x_k^l)}, \quad i = 1, 2, \ldots, N_s;\ k = 1, 2, \ldots, T-1$, (2-21)
where $w_{k|k+1}^{i,j}$ denotes the weight of the $i$th filtering particle at the $k$th time step
corresponding to the $j$th smoothing particle at the $(k+1)$th time step. A random
number $\tilde{x}_k^j$ is then resampled from $\{x_k^i\}$ according to the weights
$\{w_{k|k+1}^{i,j};\ i = 1, 2, \ldots, N_s\}$.
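Backward simulation can be sketched as drawing one state trajectory from stored filtering particles. The sketch below is written for the general case in which filtering particles may carry unequal weights (with equally weighted, resampled particles the weights simply cancel). The Gaussian random walk transition density and all names are hypothetical choices made only to keep the example self-contained.

```python
import numpy as np


def gaussian_rw_pdf(x_next, x_prev, std=1.0):
    """Unnormalised transition density of a hypothetical Gaussian random walk."""
    return np.exp(-0.5 * ((x_next - x_prev) / std) ** 2)


def backward_simulate(particles, weights, trans_pdf, rng):
    """Draw one smoothed trajectory.

    particles, weights: arrays of shape (T, N) holding the filtering particles
    and their normalised weights at each of the T time steps.
    """
    n_steps, n_particles = particles.shape
    trajectory = np.empty(n_steps)
    # Start from the filtering result at the final time step.
    j = rng.choice(n_particles, p=weights[-1])
    trajectory[-1] = particles[-1, j]
    # Move backwards, reweighting by the transition density to the drawn state.
    for k in range(n_steps - 2, -1, -1):
        w = weights[k] * trans_pdf(trajectory[k + 1], particles[k])
        w = w / w.sum()
        j = rng.choice(n_particles, p=w)
        trajectory[k] = particles[k, j]
    return trajectory
```

Repeating the call with the same particles yields independent draws from the approximate joint smoothing distribution, from which pairs of adjacent states can be read off.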
2.3.2 Parameter Estimation Algorithms
The main difficulty in parameter estimation for a state space model is that the
underlying system states are not observable. Therefore, the complete likelihood
function cannot be directly used to estimate the parameters. Instead, the parameters
are obtained by maximising the marginal likelihood function of the observations.
This marginal likelihood function entails integration. For a linear Gaussian state
space model, the marginal likelihood function can be calculated analytically
(Christer et al. 1997). In contrast, for a nonlinear non-Gaussian state space
model, a closed form of this marginal likelihood function is not available.
Consequently, Monte Carlo-based methods are often used in parameter estimation.
Three types of parameter estimation algorithms are often used for nonlinear
non-Gaussian state space models, i.e., gradient-based methods, EM algorithms, and
Markov chain Monte Carlo (MCMC) algorithms.
2.3.2.1 Gradient-based Methods
The gradient of the marginal likelihood function of a nonlinear non-Gaussian state
space model can be evaluated by sequential Monte Carlo methods. These sequential
Monte Carlo methods were reviewed in (Andrieu et al. 2004). Given these
calculation methods, the marginal likelihood function can be maximised through
gradient-based methods. Doucet and Tadić developed gradient-based parameter
estimation methods for nonlinear non-Gaussian state space models (Doucet and
Tadić 2003). The parameter estimation method developed by Doucet and Tadić can be
performed recursively, either in batch or online.
For a nonlinear non-Gaussian state space model, the marginal likelihood function
and its gradients are approximated by particle-based methods. Consequently, a large
number of local maxima exist in the approximated marginal likelihood function.
Schön et al. showed that gradient-based methods can easily converge to these local
maxima (Schön et al. 2006).
2.3.2.2 Expectation-maximization (EM) Algorithms
The EM algorithm is an extension of the maximum likelihood estimation (MLE)
method to deal with a model with unobservable variables. The EM algorithm was
first proposed by Dempster et al. (Dempster et al. 1977). After that, Wu investigated
the convergence property of the EM algorithm (Wu 1983). The EM algorithm
consists of two steps, i.e., the Expectation (E) step and the Maximization (M) step.
The E step estimates the expected complete likelihood function given a set of
parameters. In the M step, a new set of parameters is obtained by maximising the
expected complete likelihood function that is established in the E step. These new
parameters are again used in the E step. The E-M iteration continues until a
convergence condition is satisfied. The EM algorithm uses the expectation of the
complete likelihood function instead of the marginal likelihood function. In most
situations, the complete likelihood function can be estimated and evaluated more
easily. Therefore, the EM algorithm is often used to estimate the parameters of a
model with unobservable variables. When the EM algorithm is used to estimate the
parameters of a state space model, the gradient of the marginal likelihood function is
not required. Furthermore, the EM algorithm is more robust against attraction to
local maxima than gradient-based methods (Gibson and Ninness 2005).
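The alternation between the E step and the M step can be sketched on a deliberately simple latent variable model rather than on a state space model. The sketch assumes an equally weighted mixture of two unit-variance Gaussians whose component labels are unobserved. The model, the function name, and the fixed number of iterations (used here in place of a convergence condition) are illustrative assumptions.

```python
import numpy as np


def em_two_gaussians(y, mu_init=(-1.0, 1.0), n_iter=50):
    """EM estimates of the two means of a 50/50 mixture of unit-variance Gaussians."""
    mu = np.array(mu_init, dtype=float)
    y = np.asarray(y, dtype=float)
    for _ in range(n_iter):
        # E step: posterior probability of each component for every data point,
        # computed under the current parameter values.
        log_p = -0.5 * (y[:, None] - mu[None, :]) ** 2
        p = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        resp = p / p.sum(axis=1, keepdims=True)
        # M step: maximise the expected complete log-likelihood, which for this
        # model gives responsibility-weighted means.
        mu = (resp * y[:, None]).sum(axis=0) / resp.sum(axis=0)
    return mu
```

For a nonlinear non-Gaussian state space model the E step has no closed form, which is why it is typically approximated with smoothed particles.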
Some research has been performed to estimate the parameters of nonlinear non-
Gaussian state space models through the EM algorithm. Schön et al. developed an
EM algorithm based on the particle smoother for a state space model with Gaussian
noise (Schön et al. 2006). In their research, a numerical experiment was also carried
out to demonstrate the robustness of the EM algorithm against local maxima. Wills et al.
developed an EM algorithm based on the particle smoother for a general stochastic
nonlinear state space model (Wills et al. 2008). Kim used the EM algorithm based
on the particle smoother to estimate the parameters of the stochastic volatility
models (Kim 2005). The missing observation issue was also discussed by Kim. In
addition, Kim also developed a method of moments to identify the initial parameters
for the EM algorithm. Olsson et al. applied the fixed-lag particle smoother to process
a long sequence of observations when performing the EM algorithm (Olsson et al.
2008).
2.3.2.3 Markov Chain Monte Carlo (MCMC) Algorithms
The MCMC parameter estimation algorithm is similar to the EM algorithm. The
difference is that the EM algorithm updates parameters deterministically while the
MCMC method generates new parameters from a distribution. This property of
the MCMC algorithm can effectively prevent parameter estimates from converging to a
local mode.
Chopin proposed an MCMC algorithm to perform particle filtering and identify
parameters simultaneously (Chopin 2002). Doucet et al. combined the idea of
simulated annealing to the process of MCMC parameter estimation and developed
an algorithm named State-Augmentation for Marginal Estimation (SAME) (Doucet
et al. 2002). Jacquier et al. used an algorithm similar to SAME to estimate the
parameters of two latent state models central to financial econometrics (Jacquier et
al. 2007). Jacquier also proposed methods that provide standard errors and
convergence diagnostics.
Compared with the EM algorithm, the MCMC algorithm is more robust against
attraction to local maxima. However, in some situations, especially when the size of
a parameter vector is large, generating a sample from the conditional distribution of
the parameter vector can be troublesome and inefficient. In addition, the MCMC
algorithm may suffer from an accumulation of error over time and can even diverge
over time (Andrieu et al. 2004).
2.3.3 Control Algorithms for the State Space Model
In some applications of the state space model, the transition of system states can
incur rewards or costs, and some actions can be adopted to change the transition
probability of the system states. Control algorithms for the state space model are
used to select an optimal action according to the current system state so as to
minimise the cost or maximise the reward. A commonly used model to describe this
control process is the POMDP. By solving the POMDP, the optimal strategy can be
obtained to minimise the cost or maximise the reward.
To further discuss the POMDP, the completely observable MDP needs to be introduced
first. As an extension of the Markov chain, the MDP considers additional actions
that can change state transition probabilities and the rewards (costs) that are caused
by state transitions. A typical MDP can be represented by a tuple
$\langle S, A, P_{\cdot}(\cdot,\cdot), R_{\cdot}(\cdot,\cdot) \rangle$, where $S$ and $A$ denote the finite sets of states and actions, respectively. The
transition probability $P_a(s, s') = \Pr(s_{t+1} = s' | s_t = s, a_t = a)$ denotes the
probability that the state at time $t+1$ is $s'$ given that the state at time $t$ was $s$ and an
action $a$ was adopted at time $t$. The reward function $R_a(s, s')$ represents the reward
that can be obtained when the state changes from $s$ to $s'$ and an action $a$ is selected.
In some applications, the reward function $R_a(s, s')$ is replaced by the cost function
$C_a(s, s')$. An optimal policy can be obtained as a policy function $\pi: S \to A$ by
solving the MDP. When the parameters of an MDP are known, two commonly used
solving algorithms are the value iteration and the policy iteration. For details of the
two solving algorithms, readers can refer to (Puterman 1994). During the
optimisation, the objective function is the expected long-term average reward (cost)
per unit time or the expected long-term discounted reward (cost).
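Value iteration for a discounted-cost MDP can be sketched on a small, hypothetical maintenance problem with three health states (good, degraded, failed) and two actions (continue, replace). The transition probabilities, the costs, and the discount factor are invented for illustration. The costs are expressed as expected immediate costs per state and action rather than per transition, which is a simplification of the reward function described above.

```python
import numpy as np


def value_iteration(P, C, gamma=0.95, tol=1e-8, max_iter=10000):
    """P[a, s, s2]: transition probabilities; C[a, s]: expected immediate cost."""
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        Q = C + gamma * (P @ V)          # action values, shape (actions, states)
        V_new = Q.min(axis=0)            # minimise the expected discounted cost
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    policy = (C + gamma * (P @ V)).argmin(axis=0)
    return V, policy


# Hypothetical example: action 0 = continue, action 1 = replace.
P_example = np.array([
    [[0.8, 0.15, 0.05], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]],   # continue
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],     # replace
])
C_example = np.array([
    [0.0, 0.0, 50.0],     # continue: only the failed state incurs a cost
    [10.0, 10.0, 30.0],   # replace: preventive (10) or corrective (30)
])
V_example, policy_example = value_iteration(P_example, C_example)
```

The returned policy maps each state to an action index, which corresponds to the policy function obtained by solving the MDP.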
In a POMDP underlying states of a system is partially revealed by observations, and
a probability distribution over the current state can be obtained by filtering
algorithm. This probability distribution, namely the belief, summarises the entire
history of observations and actions. A POMDP can be converted to an MDP by
maintaining a consistent belief set. However, since the belief space of a POMDP is
continuous, conventional solution methods for a discrete state MDP cannot be
directly applied to solve a POMDP. The main difficulty in solving the POMDP is
the representation of the value function that is a crucial component of both the value
iteration and the policy iteration. For a discrete state MDP, the value function can be
maintained easily as a table with one entry per state. However, for a POMDP, the
belief space is continuous and the value function can be an arbitrary function over
this continuous belief space. This arbitrary function cannot be represented by a table
with a finite number of entries.
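For a discrete state POMDP, the filtering step that maintains the belief can be sketched as a single Bayes update. The array layout (`T[a, s, s2]`, `O[a, s2, o]`) and the two-state example values below are assumptions made for illustration only.

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """One Bayes-filter step for a discrete POMDP.

    T[a, s, s2] = Pr(s2 | s, a) and O[a, s2, o] = Pr(o | s2, a).
    """
    predicted = belief @ T[action]                         # prediction step
    unnormalised = predicted * O[action][:, observation]   # correction step
    total = unnormalised.sum()
    if total <= 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalised / total

# Hypothetical two-state example with a single action.
T = np.array([[[0.9, 0.1], [0.0, 1.0]]])
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])
new_belief = belief_update(np.array([0.5, 0.5]), action=0, observation=1, T=T, O=O)
print(new_belief)
```

Because the updated belief is a point in a continuous simplex, a value function defined on it cannot be stored as a finite table, which is the difficulty described above.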
Fortunately, for a discrete state POMDP, the value function is piecewise-linear and
convex (PWLC). This PWLC function can be represented by the supremum of a
finite number of hyperplanes, each denoted by an $\alpha$-vector (Monahan 1982).
Based on these hyperplanes, some effective solution algorithms have been
developed (Sondik 1978; Cassandra et al. 1997; Kaelbling et al. 1998). However,
these methods for a discrete state POMDP cannot be generalised to solve a
continuous state POMDP, because infinite-dimensional $\alpha$-vectors are required to
represent the value function of a continuous state POMDP.
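The PWLC representation can be sketched as a maximum over inner products between the belief and a finite set of $\alpha$-vectors. The three $\alpha$-vectors below are hypothetical and serve only to show the evaluation and the resulting convexity.

```python
import numpy as np

def pwlc_value(belief, alpha_vectors):
    """Evaluate V(b) = max_k <alpha_k, b> and return the maximising index."""
    values = alpha_vectors @ belief
    best = int(np.argmax(values))
    return float(values[best]), best

# Hypothetical alpha-vectors for a two-state problem (one row per hyperplane).
alpha_vectors = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]])

v_mid, k_mid = pwlc_value(np.array([0.5, 0.5]), alpha_vectors)
v_left, k_left = pwlc_value(np.array([1.0, 0.0]), alpha_vectors)
v_right, k_right = pwlc_value(np.array([0.0, 1.0]), alpha_vectors)
print(v_mid, k_mid, v_left, k_left, v_right, k_right)
```

Each $\alpha$-vector has one entry per state, so a continuous state space would require a function rather than a finite vector for each hyperplane.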
To address this difficulty, some approximate solution methods for a continuous state
POMDP have been proposed. Porta generalised $\alpha$-vectors to $\alpha$-functions and
modelled beliefs, actions, observations, and rewards by Gaussian mixtures (Porta et
al. 2005). A disadvantage of Porta’s method is that the number of Gaussian mixtures
used to represent the functions of interest increases exponentially with the number of
value iterations. Bertsekas proposed a closed-form solution method for a linear
system with quadratic cost (Bertsekas 2005). The linear and quadratic assumption
limits the application of Bertsekas’ method. Thrun presented a Monte Carlo
algorithm for learning to act in a continuous state POMDP (Thrun 2000). In Thrun’s
algorithm, each belief state was represented by a group of samples, and each sample
was represented by particles. The number of the samples determined the dimension of
the belief space. A large number of samples were required to get a close
approximation result. In addition, Kullback–Leibler divergences between a new
belief and different samples were required in the process of reinforcement learning.
Therefore, the efficiency of Thrun’s algorithm still needed to be enhanced. An
effective method to improve the efficiency of solution methods for a continuous
POMDP is reducing the dimension of the belief space. Brooks et al. proposed a
parametric method to solve continuous state POMDPs (Brooks et al. 2006). Brooks’
method reduced the dimension of a POMDP by representing the belief of a POMDP
using a Gaussian distribution. Thus Brooks compressed the dimension of a POMDP
from infinite to two. The EKF was used to estimate the belief of a POMDP at
different time points. To make the solution method more effective for nonlinear non-
Gaussian state space model, a more recent paper of Brooks adopted the particle filter
to estimate the belief of a POMDP (Brooks and Williams 2007). The Gaussian
distribution was still used to approximate beliefs. Zhou proposed a similar Monte
Carlo based solution method for POMDP (Zhou et al. to appear). However, Zhou
considered the whole exponential family distributions when approximating the belief
of a POMDP. This extension made Zhou’s method more appropriate when the belief
state did not follow the Gaussian distribution. Moreover, Zhou developed rigorous
theoretical error bounds for her algorithm.
2.3.4 Comments
Compared with state space models with discrete states and those with linear and
Gaussian assumptions, most nonlinear non-Gaussian state space models cannot be
solved analytically. Some approximate algorithms are used to calculate system state
estimates, model parameters, and the optimal control policy of nonlinear non-Gaussian
state space models. As computing power has increased, Monte Carlo-based algorithms
have become increasingly popular for processing nonlinear non-Gaussian state space
models. Some effective Monte Carlo-based algorithms have been proposed. However, the
applications of these algorithms
in asset degradation process modelling and maintenance strategy optimisation are
still limited. This research systematically addresses the issues that are encountered
when these Monte Carlo-based algorithms are used to predict asset lives and
optimise maintenance strategies based on the Gamma-based state space degradation
model.
3 Modelling Correlated Degradation Processes of Direct and Indirect Indicators
3.1 Introduction
Asset health inspections can produce two types of indicators: (1) direct indicators
(e.g. the thickness of a brake pad and the crack depth on a gear) which directly relate
to a failure mechanism; and (2) indirect indicators (e.g. the indicators extracted from
vibration signals and oil analysis data) which can only partially reveal a failure
mechanism.
Direct and indirect indicators both have advantages and disadvantages. Direct
indicators provide more accurate references for asset degradation modelling, while
they are often technically or economically impossible to sample frequently. For
example, the crack on the tooth of a gear cannot be measured online. Similarly, the
wear of the impeller in a pump cannot be measured during its operating period.
Directly applying degradation models to these direct indicators with limited sample
size is often not practical. Unlike direct indicators, indirect
indicators can often be obtained easily through various CM techniques. However,
some statistical models (e.g., PHM (Jiang and Jardine 2008), and the logistic
regression model (Xu and Zhao 2005)) are required to identify the uncertain failure
threshold – “gray boundary” (Liao et al. 2006b) on an indirect indicator. These
statistical models require sufficient failure history which is sometimes not available
in practice.
Instead of directly applying degradation models to direct indicators or identifying
“gray boundaries” on indirect indicators, some researchers investigated quantitative
relationships between direct and indirect indicators. For example, the wear status of
the impeller in a slurry pump can be assessed through the cumulative amplitude
measure evaluated from its vane pass frequency (Mani et al. 2008). The average
vibration amplitude sampled at a bearing changes with the degrees of the angular
misalignment of the shaft (Sun et al. 2006). Indicators extracted from a vibration
signal relate to the crack size on a bearing (Shiroishi et al. 1997). These relationships
between the two types of indicators make it possible to estimate more desirable
direct indicators through the related indirect indicators which can be obtained more
easily.
An efficient approach to estimating direct indicators using indirect indicators is the
state space model. In the state space model, the degradation process of a direct
indicator is modelled by a state equation, and the relationship between a direct
indicator and its corresponding indirect indicator is described by an observation
equation. Consequently, the state space model is able to consider both the
information from the stochastic degradation process of the direct indicator and the
uncertain relationship between a direct indicator and an indirect indicator.
This chapter develops a state space model that does not have discrete state, discrete
time, linear, and Gaussian assumptions to describe degradation processes of direct
and indirect indicators. Among non-Gaussian stochastic processes, the Gamma
process has been widely used to model a range of direct indicator degradation
processes, e.g. fatigue crack growth (Lawless and Crowder 2004), corrosion of
pressure vessel (Kallen and Van Noortwijk 2005), and brake-pad wear for
automobiles (Crowder and Lawless 2007). The prevalence of the Gamma process is
due to its monotonically increasing property which is consistent with most direct
indicator degradation processes. Therefore, a Gamma-based state space model is
investigated as an example of the non-Gaussian non-linear state space model. Monte
Carlo-based parameter estimation and life prediction algorithms are developed to
solve the Gamma-based state space model.
The body of this chapter is organised as follows. Section 3.2 introduces the
formulations and solving methods of the Gamma-based state space degradation
model. Then, simulation studies are performed in Section 3.3 to demonstrate the
performance of the solving algorithms. A case study using the data from an
accelerated life test of a gearbox is conducted in Section 3.4. Finally, a brief
summary of this chapter is provided in Section 3.5.
3.2 Model Formulations and Solving Algorithms
3.2.1 Model Formulations
The formulation of the Gamma-based state space model can be divided into two
components, i.e. the state equation given by
$$x_{t+\Delta t} - x_t \sim Ga\big(\eta(t+\Delta t) - \eta(t),\ \lambda\big), \tag{3-1}$$
and the observation equation given by
$$y_t = g(x_t) + \varepsilon_t. \tag{3-2}$$
Here, $x_t$ denotes the direct indicator at time $t$, and $y_t$ represents the indirect
indicator at time $t$. As shown in Equation (3-1), the direct indicator is assumed to
follow a Gamma process whose increments follow the Gamma distribution,
where $Ga\big(\eta(t+\Delta t) - \eta(t),\ \lambda\big)$ denotes the Gamma distribution with a shape
parameter $\eta(t+\Delta t) - \eta(t)$ and a scale parameter $\lambda$. In asset degradation modelling,
a commonly used formulation of the shape function $\eta(t)$ is given by
$$\eta(t) = c \cdot t^{b}. \tag{3-3}$$
A brief introduction to the Gamma process has been given in Section 2.1.1.1, and
readers can also refer to (van Noortwijk 2009). The observation equation (3-2)
assumes that the indirect indicator follows a function of the corresponding direct
indicator, i.e. $g(x_t)$, plus the observation noise $\varepsilon_t$. In this chapter, $g(x_t)$ follows
a power formulation, i.e.
$$g(x_t) = \beta \cdot x_t^{\gamma}. \tag{3-4}$$
This power formulation can model various nonlinear relationships and has only two
parameters. Consequently, the power formulation is an appropriate candidate to
model the nonlinear relationship between direct and indirect indicators. For some
practical datasets, other formulations may be more appropriate. These formulations
can also be handled by the algorithms proposed in this chapter. The formulation
selection methods for a practical dataset are discussed in Section 3.2.4. The
observation noise at different inspections is presumed to be independent and
identically normally distributed, i.e. $\varepsilon_t \sim N(0, \sigma^2)$. Both the underlying health state
process $x_t$ and the observation process $y_t$ are continuous in time and state. In
addition, the underlying state process follows a non-stationary Gamma process.
Consequently, the proposed Gamma-based state space model can operate without the
discrete state, discrete time, linear, and Gaussian assumptions.
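The model structure described above can be illustrated by simulating one degradation sequence. This is a minimal sketch under stated assumptions: the direct indicator starts from zero at time zero, the shape function is a power law, the observation function is a power formulation with additive normal noise, and all parameter names and numerical values are illustrative rather than taken from this chapter.

```python
import numpy as np

def simulate_gamma_state_space(times, c, b, lam, beta, gamma_exp, sigma, rng):
    """Simulate direct (x) and indirect (y) indicators at the given times.

    Increments of x follow Gamma(shape = eta(t_i) - eta(t_{i-1}), scale = lam)
    with eta(t) = c * t**b; y = beta * x**gamma_exp + N(0, sigma**2) noise.
    Assumes x(0) = 0 and strictly increasing, positive times.
    """
    eta = c * times ** b
    d_eta = np.diff(eta, prepend=0.0)          # shape-function increments
    increments = rng.gamma(shape=d_eta, scale=lam)
    x = np.cumsum(increments)                  # monotonically increasing path
    y = beta * x ** gamma_exp + rng.normal(0.0, sigma, size=x.size)
    return x, y

rng = np.random.default_rng(0)
times = np.arange(1.0, 201.0)                  # hourly inspections, 200 hours
x, y = simulate_gamma_state_space(times, c=0.8, b=1.0, lam=0.1,
                                  beta=1.5, gamma_exp=1.2, sigma=0.5, rng=rng)
print(x[-1], y[-1])
```

The simulated direct indicator never decreases, whereas the indirect indicator can fluctuate because of the observation noise.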
The degradation indicator observations used in this chapter are denoted as follows.
Only a single sequence of degradation indicators is considered to make the
formulations in parameter estimation algorithm more concise and understandable.
The formulations for multiple indicator sequences can be established by extending
the formulations developed in this chapter without any theoretical difficulties.
Inspection times are assumed to be $t_{1:n} = \{t_i;\ i = 1, 2, \ldots, n\}$, where $n$ is the number
of inspections. The direct and indirect indicators at the $i$th inspection are denoted as
$x_i$ and $y_i$, respectively. Only a part of the direct indicators $x_{1:n} = \{x_i;\ i = 1, 2, \ldots, n\}$
is assumed to be observable due to the difficulties of measurement, while the indirect
indicators $y_{1:n} = \{y_i;\ i = 1, 2, \ldots, n\}$ are all known. A function $I(\cdot)$ given by
$$I(x_i) = \begin{cases} 0, & x_i \text{ is not observable} \\ 1, & x_i \text{ is observable} \end{cases} \tag{3-5}$$
is used to indicate the observability of a direct indicator. The inspection index of the $j$th observable direct indicator is denoted as $o_j$ ($j = 1, \ldots, m$), where $m$ is the number of
observable direct indicators. Obviously, $I(x_{o_j}) = 1$, $j = 1, \ldots, m$. The direct
indicator is assumed to be observable at least at one inspection time, i.e., $m > 0$.
3.2.2 Parameter Estimation
The EM algorithm is adopted to estimate the parameters of the Gamma-based state
space model. Three issues need to be addressed to perform the EM algorithm.
Firstly, the state process of the Gamma-based state space model does not follow the
linear and Gaussian assumptions. Consequently, the expectation of the complete
likelihood function, the marginal likelihood function, and the variance-covariance
matrix of parameter estimates cannot be calculated analytically. Motivated by (Kim
2005), this research uses the particle filter and smoother that are based on Monte
Carlo simulations to deal with this non-Gaussian and non-linear situation. The
second issue is the incorporation of observable direct indicators into the marginal
likelihood function. These observable direct indicators are brought in by the particle
filter and smoother during the E step of the EM algorithm. The last issue is
enhancing the efficiency of the time-consuming Monte Carlo-based EM algorithm.
This issue is addressed by dividing the EM algorithm into two stages with different
numbers of particles, and improving the convergence checking strategy for the EM
iterations.
The whole process of the EM algorithm can be divided into four steps. The first step
estimates initial parameters. Inappropriate initial parameters may cause the final
optimisation result to become trapped in a local maximum point, or even make the
EM algorithm divergent (Wu 1983). The second step, namely the E step, estimates
the expectation of the complete likelihood function. The third step, namely the M step,
obtains a new set of parameters by maximising the expected complete likelihood
function. The final step checks the convergence of the EM loop. If the
convergence condition is satisfied, the final result of parameter estimation is
obtained. Otherwise, an additional EM iteration begins. These four steps are
discussed in detail as follows:
3.2.2.1 Initial parameter identification
According to the assumptions of this research, direct indicators are known at some
inspection time points. The complete likelihood function can be established based on
these observable direct indicators $x_{o_1:o_m} = \{x_{o_j};\ j = 1, 2, \ldots, m\}$ and their
corresponding indirect indicators $y_{o_1:o_m} = \{y_{o_j};\ j = 1, 2, \ldots, m\}$. Subsequently, initial
parameters can be obtained by maximising this complete likelihood function based
on $x_{o_1:o_m}$ and $y_{o_1:o_m}$.
3.2.2.2 E step
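The E step estimates the expectation of the complete likelihood function that can be
decomposed into two components as
$$E\big[\log p(x_{1:n}, y_{1:n} \mid \theta)\big] = E\big[\log p(x_{1:n} \mid \theta_1)\big] + E\big[\log p(y_{1:n} \mid \theta_2, x_{1:n})\big]. \tag{3-6}$$
In Equation (3-6), the notations $\theta$, $\theta_1$, and $\theta_2$ denote the vectors of parameters to
estimate, where $\theta = (c, b, \lambda, \beta, \gamma, \sigma)$, $\theta_1 = (c, b, \lambda)$, and $\theta_2 = (\beta, \gamma, \sigma)$.
The two components of Equation (3-6) can be further calculated as
$$E\big[\log p(x_{1:n} \mid \theta_1)\big] = E\Big[\log \prod_{i=2}^{n} Ga(\Delta x_i;\ \Delta\eta_i, \lambda)\Big] = \sum_{i=2}^{n} \Big[ -\Delta\eta_i \log\lambda - \log\Gamma(\Delta\eta_i) + (\Delta\eta_i - 1)\, E(\log \Delta x_i) - E(\Delta x_i)/\lambda \Big] \tag{3-7}$$
and
$$E\big[\log p(y_{1:n} \mid \theta_2, x_{1:n})\big] = -n\log\sigma - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\Big[ y_i^2 - 2\beta y_i\, E(x_i^{\gamma}) + \beta^2 E(x_i^{2\gamma}) \Big], \tag{3-8}$$
respectively, where $\Delta x_i = x_i - x_{i-1}$, $i = 2, 3, \ldots, n$, and
$\Delta\eta_i = \eta(t_i) - \eta(t_{i-1})$, $i = 2, 3, \ldots, n$. The four expected values (i.e., $E(x_i^{\gamma})$, $E(x_i^{2\gamma})$, $E(\Delta x_i)$, and $E(\log \Delta x_i)$) in
Equations (3-7) and (3-8) are estimated through the particle smoothing algorithm.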
To perform the particle filter and smoother, the state process of the Gamma-based
state space model should be identified first. According to the Gamma Bridge
property (van Noortwijk 2009), the system state process changes from the Gamma
process to a hybrid stochastic process after the observations of direct indicators are
considered. The hybrid stochastic process can be written as
: , :
,
; ,0,
|; , 0,
, (3-9)
where is the inspection index of the next observable direct indicator given the
current inspection index , i.e. ; , 1 , and
·; , denotes the PDF of the Beta distribution with shape parameters and .
The derivation of Equation (3-9) is given in the Appendix. When the state
process follows other stochastic processes, the corresponding hybrid stochastic
processes considering observable direct indicators can be derived in a similar
way.
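The Gamma bridge property can be illustrated with a short sampler. Conditional on the values of a Gamma process at both ends of an interval, the fraction of the remaining increment consumed up to an intermediate time follows a Beta distribution whose two shape parameters are the corresponding increments of the shape function, and the scale parameter drops out. The function name and the numerical values below are hypothetical.

```python
import numpy as np

def sample_gamma_bridge(eta, x_start, x_end, rng):
    """Sample a Gamma-process path between two known values.

    eta holds the shape-function values at [t_start, t_1, ..., t_k, t_end].
    Given the previous value and the known end value, the normalised increment
    is Beta(eta[i] - eta[i-1], eta[-1] - eta[i]) distributed.
    """
    path = [x_start]
    for i in range(1, len(eta) - 1):
        a = eta[i] - eta[i - 1]        # shape increment up to the next time
        b = eta[-1] - eta[i]           # shape increment left until the end
        fraction = rng.beta(a, b)
        path.append(path[-1] + fraction * (x_end - path[-1]))
    path.append(x_end)
    return np.array(path)

rng = np.random.default_rng(1)
eta = 0.8 * np.arange(0.0, 21.0)       # linear shape function, 20 unit intervals
bridge = sample_gamma_bridge(eta, x_start=2.0, x_end=4.0, rng=rng)
print(bridge)
```

Every sampled path is monotone and passes exactly through the two known direct indicator values, which is why particles can be pinned to the observable direct indicators.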
The formulations of the original particle filter and smoother are modified according
to the posterior stochastic process given by Equation (3-9). The weights of filtering
particles are updated recursively according to
· | | , | , ,
· | | | ,. (3-10)
The two importance density functions, i.e. , , and , ,
follow the two equations in Equation (3-9), i.e. , and | . At the
time points : ; 1,2, … , , when direct indicators are known, all the
particles are simply set to the values of observable direct indicators : . At the
other time steps, the SIR algorithm is performed. Similarly, the recursive weights
evaluation equation for smoothing particles is modified to
|, | , ∑ ,
| ∑ . (3-11)
The two components in Equation (3-11), | , and , follow
the two equations in Equation (3-9). At times , when 1 or 1
1 , the results of particle filtering are directly taken as the results of particle
smoothing. At the other time points, particle smoothing is carried out by the
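backwards simulation algorithm. Finally, $M$ sequences of samples are generated,
i.e. $\tilde{x}_{1:n}^{1:M} = \{\tilde{x}_i^{(j)};\ i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, M\}$. Using $\tilde{x}_{1:n}^{1:M}$, the expected values in
Equations (3-7) and (3-8) are calculated according to
$$E(x_i^{\gamma}) \approx \frac{1}{M}\sum_{j=1}^{M} \big(\tilde{x}_i^{(j)}\big)^{\gamma}, \qquad E(x_i^{2\gamma}) \approx \frac{1}{M}\sum_{j=1}^{M} \big(\tilde{x}_i^{(j)}\big)^{2\gamma},$$
$$E(\Delta x_i) \approx \frac{1}{M}\sum_{j=1}^{M} \Delta\tilde{x}_i^{(j)}, \qquad E(\log \Delta x_i) \approx \frac{1}{M}\sum_{j=1}^{M} \log \Delta\tilde{x}_i^{(j)}. \tag{3-12}$$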
3.2.2.3 M step
After the expectation of the complete likelihood function is obtained by the E step,
the maximisation step (i.e. M step) can be carried out. Equations (3-7) and (3-8) are
optimised separately during the M step.
By calculating the partial derivative of Equation (3-7) with respect to the scale parameter $\lambda$,
the parameter $\lambda$ can be represented as:
$$\lambda = E(x_n - x_1) \big/ \big(\eta(t_n) - \eta(t_1)\big). \tag{3-13}$$
A new equation with the shape function parameters $c$ and $b$ can be established by substituting Equation
(3-13) into Equation (3-7). Estimates of parameters $c$ and $b$ can then be achieved by
maximising the new equation using a multivariate optimisation algorithm. Estimates
of $c$ and $b$ from the last EM iteration can be used as the initial values for the
multivariate optimisation algorithm. According to the two estimates $\hat{c}$ and $\hat{b}$, the
parameter estimate $\hat{\lambda}$ can be calculated from Equation (3-13).
Similarly, by calculating partial derivatives of Equation (3-8), relationships between
parameters can be obtained as:
$$\beta = \sum_{i=1}^{n} y_i\, E(x_i^{\gamma}) \Big/ \sum_{i=1}^{n} E(x_i^{2\gamma}), \tag{3-14}$$
and
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\Big[ y_i^2 - 2\beta y_i\, E(x_i^{\gamma}) + \beta^2 E(x_i^{2\gamma}) \Big]. \tag{3-15}$$
A new equation with the exponent parameter $\gamma$ can be established by substituting Equations
(3-14) and (3-15) into Equation (3-8). The parameter estimate $\hat{\gamma}$ can then be
obtained by maximising the new equation. Subsequently, the parameter estimates
$\hat{\beta}$ and $\hat{\sigma}$ can be calculated according to Equations (3-14) and (3-15).
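The M step for the observation equation can be sketched as a one-dimensional profile search. The sketch assumes a power observation function with additive normal noise, approximates the required expectations by averages over smoothing samples, and replaces the optimiser with a plain grid search over the exponent. For a fixed exponent the scale coefficient and noise variance have closed forms, and maximising the profiled expected log-likelihood is equivalent to minimising the profiled noise variance. The names `samples` and `gamma_grid` and the synthetic data are hypothetical.

```python
import numpy as np

def m_step_observation(y, samples, gamma_grid):
    """Estimate (beta, gamma, sigma) of y = beta * x**gamma + noise.

    samples[j, i] is the j-th smoothing sample of the direct indicator at
    inspection i; expectations are replaced by averages over j.
    """
    n = y.size
    best = None
    for g in gamma_grid:
        e1 = np.mean(samples ** g, axis=0)            # approximates E(x_i^gamma)
        e2 = np.mean(samples ** (2.0 * g), axis=0)    # approximates E(x_i^(2 gamma))
        beta = np.sum(y * e1) / np.sum(e2)            # closed form for fixed gamma
        sigma2 = np.sum(y ** 2 - 2.0 * beta * y * e1 + beta ** 2 * e2) / n
        if best is None or sigma2 < best[2]:
            best = (g, beta, sigma2)
    gamma_hat, beta_hat, sigma2_hat = best
    return beta_hat, gamma_hat, float(np.sqrt(sigma2_hat))

# Synthetic check with degenerate "samples" equal to a known state path.
rng = np.random.default_rng(2)
x_true = np.linspace(1.0, 20.0, 50)
y_obs = 1.5 * x_true ** 1.2 + rng.normal(0.0, 0.05, size=x_true.size)
samples = np.tile(x_true, (100, 1))
beta_hat, gamma_hat, sigma_hat = m_step_observation(
    y_obs, samples, gamma_grid=np.linspace(0.5, 2.0, 151))
print(beta_hat, gamma_hat, sigma_hat)
```

A grid search is used only to keep the sketch dependency-free; any scalar optimiser could take its place.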
3.2.2.4 Convergence checking
The EM algorithm essentially maximises the marginal likelihood function.
Therefore, the increment of the marginal likelihood function is a commonly used
indicator for the convergence of the EM algorithm. However, the computation of the
marginal likelihood function of the Gamma-based state space model involves
recursive Monte Carlo sampling, which cannot be performed efficiently. An
alternative method is to calculate the relative likelihood function (Kim 2005). This
calculation is based on the result of the particle smoother in the E step with no
additional Monte Carlo sampling being required. Consequently, evaluating the
relative likelihood function is a more efficient solution than directly calculating the
marginal likelihood function. The relative likelihood function of the Gamma-based
state space model is given by
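$$\log\frac{p_{\theta^{(k)}}(y_{1:n}, x_{o_1:o_m})}{p_{\theta^{(k-1)}}(y_{1:n}, x_{o_1:o_m})} = \log\int \frac{p_{\theta^{(k)}}(x_{1:n}, y_{1:n})}{p_{\theta^{(k-1)}}(x_{1:n}, y_{1:n})}\; p_{\theta^{(k-1)}}(x_{1:n} \mid y_{1:n}, x_{o_1:o_m})\, dx_{1:n} \approx \log\frac{1}{M}\sum_{j=1}^{M} \frac{p_{\theta^{(k)}}(\tilde{x}_{1:n}^{(j,k)}, y_{1:n})}{p_{\theta^{(k-1)}}(\tilde{x}_{1:n}^{(j,k)}, y_{1:n})}, \tag{3-16}$$
where $\tilde{x}_{1:n}^{(j,k)}$ is the $j$th sequence of smoothing particles during the $k$th EM loop, and
$p_{\theta^{(k)}}(\cdot)$ denotes the PDF about direct and indirect indicators calculated using the
parameters estimated at the $k$th EM loop.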
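The relative likelihood can be estimated from quantities that the E step already provides: the complete-data log-likelihood of each smoothing particle sequence under the new and the old parameters. The sketch below averages the likelihood ratios in a numerically stable way. The function name and the example inputs are hypothetical, and the smoothing particles are assumed to have been drawn under the old parameters.

```python
import numpy as np

def relative_log_likelihood(logp_new, logp_old):
    """Return log( mean_j exp(logp_new[j] - logp_old[j]) ).

    logp_new[j] and logp_old[j] are complete-data log-likelihoods of the j-th
    smoothing particle sequence under the new and the old parameter vectors.
    """
    diff = np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float)
    shift = diff.max()                      # guard against overflow in exp
    return float(shift + np.log(np.mean(np.exp(diff - shift))))

# Hypothetical log-likelihood values for four particle sequences.
logp_old = np.array([-120.4, -119.8, -121.1, -120.0])
unchanged = relative_log_likelihood(logp_old, logp_old)
improved = relative_log_likelihood(logp_old + 0.3, logp_old)
print(unchanged, improved)
```

When the parameters stop changing between iterations, the estimate is exactly zero, which matches the theoretical limit of the relative likelihood.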
Theoretically, the relative likelihood function converges to zero. However, the EM
algorithm used in this research is based on Monte Carlo approximation, and the
relative likelihood function itself is estimated using a Monte Carlo method.
Therefore, the relative likelihood function does not decrease smoothly to zero.
Rather, according to the simulation study, it decreases with an obvious trend during
the first several iterations and then fluctuates more strongly. Furthermore, if the
number of particles used in the EM algorithm increases, the relative likelihood
function decreases again and fluctuates within a smaller range closer to zero.
Therefore, the fluctuating value of the relative likelihood function indicates the
convergence of the EM algorithm for the current number of particles.
An alternative method to check the convergence of the EM algorithm is to directly
monitor the trend of parameter estimates. When EM iterations converge, parameter
estimates finally fluctuate within a certain range. This range becomes smaller when
more particles are used. Therefore, the development of parameter estimates is also
an indicator for the convergence of the EM algorithm. However, it usually takes a
relatively larger number of EM iterations to detect that parameter estimates are
fluctuating without an apparent trend.
Following the idea of (Kim 2005), this research develops a two-stage EM
algorithm to strike a balance between the efficiency and accuracy of parameter
estimation. During the first stage, 1,000 particles are used. The development
processes of parameter estimates are used as the indicator of convergence, because
the relative likelihood function calculated using 1,000 particles contains relatively
larger errors. Another reason is that EM iterations using 1,000 particles are still
efficient, and a small number of additional iterations do not cause a drop in overall
efficiency. In the second stage, 2,000 particles are adopted. At this stage, the relative
likelihood function – calculated using 2,000 particles – is preferred because it is
more accurate. Thus, the parameters estimated at the last loop of the second stage are
taken as the final results. The number of particles was chosen after some simulation
experiments that are discussed in Section 3.3.1.
3.2.3 Variance-Covariance Matrix of the Parameter Estimates
After the parameters of the Gamma-based state space model are estimated, the
variance-covariance matrix should be calculated to obtain the confidence intervals of
parameter estimates. Kim gives a method to calculate the variance-covariance
matrix via particle smoothing (Kim 2005). According to Kim’s method, the
observed information matrix can be written as
: , : ∑ : , :
∑ : , : · : , :
∑ : , : ∑ : , :
, (3-17)
where : is the th sequence of smoothing particles. Once the observed information
matrix is calculated, the variance-covariance matrix can be obtained by taking the
inverse of it.
3.2.4 Model Selection
In reality, a degradation dataset can often be fitted by state space models with
different formulations. Among these candidate formulations, the one with the best
fit can be identified by using a model selection criterion.
Several model selection methods have been proposed in the literature. The first type is
sequential null hypothesis methods, which allow variables to be
added or deleted at each step. The sequential null hypothesis methods are mainly
used to deal with nested models, and their results may depend on subjectively
chosen significance levels. When a model is used for estimation and prediction, a
more straightforward solution is to compare the mean square error (MSE) of the
results derived by the candidate models. This kind of method is effective and easy to
carry out. However, a large number of samples is required to reach a confident
conclusion. Some methods, e.g. cross-validation, have been proposed to apply this
approach when the dataset is limited. However, it is still computationally
intensive to derive the MSE many times, which is required when the MSEs
derived by two models are similar to each other. Two more quantitative and efficient
model selection methods are Akaike's information criterion (AIC) and the Bayesian
information criterion (BIC). AIC, proposed by Akaike (Akaike 1974), is
developed based on information theory. On the other hand, BIC, proposed by
Schwarz, is based on Bayesian theory (Schwarz 1978). AIC and BIC have been used
in some papers on degradation modelling (Park and Padgett 2005a; Park and Padgett
2006). AIC and BIC are both based on an important assumption: the candidate
models should use the same dataset, and AIC further requires a large sample size.
Cavanaugh and Shumway summarised and compared the results of model selection
criteria in dealing with the state space model (Cavanaugh and Shumway 1997). A
new bootstrap variant of AIC was developed in their research to deal with small
sample sizes. This bootstrap variant may give better results for state space models.
However, its computational intensity makes it impractical for the proposed
non-Gaussian state space model.
In this chapter, the choice of formulations for different components in the state space
model is conducted using Akaike's information criterion with a second-order
correction (AICc) (Cavanaugh and Shumway 1997). The AICc is a relative measure
of the information lost when a given model is used to describe a real dataset. A smaller
value of AICc indicates a better fit. Compared with the commonly used
AIC, the AICc is more effective for a small sample size. As illustrated in
$$AICc = 2k - 2\log L + \frac{2k(k+1)}{n-k-1}, \tag{3-18}$$
the AICc considers the value of the marginal likelihood function $L$, the parameter
number $k$, and the sample size $n$. For a given dataset, a model with a high likelihood
and a small number of parameters is preferable.
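The criterion can be sketched as a small helper function. The sketch uses the standard small-sample form of the AICc (the AIC plus a correction term that vanishes as the sample size grows); the function name and the example numbers are hypothetical.

```python
def aicc(log_likelihood, n_params, n_samples):
    """Second-order corrected AIC: 2k - 2 log L + 2k(k+1) / (n - k - 1)."""
    if n_samples - n_params - 1 <= 0:
        raise ValueError("AICc requires n_samples > n_params + 1")
    aic = 2.0 * n_params - 2.0 * log_likelihood
    correction = 2.0 * n_params * (n_params + 1) / (n_samples - n_params - 1)
    return aic + correction

# Hypothetical comparison of two candidate formulations on the same dataset.
score_five_params = aicc(log_likelihood=-100.0, n_params=5, n_samples=200)
score_six_params = aicc(log_likelihood=-99.8, n_params=6, n_samples=200)
print(score_five_params, score_six_params)
```

In this hypothetical comparison the extra parameter improves the log-likelihood by only 0.2, which does not offset the penalty, so the five-parameter formulation has the smaller AICc.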
The model selection criteria, i.e., BIC, AIC, AICc, all require the value of the
marginal likelihood function. For the Gamma-based state space model the marginal
likelihood function cannot be calculated analytically. An algorithm based on the
particle filter is developed in this section to obtain the marginal likelihood function
of the Gamma-based state space model.
The marginal likelihood function of the Gamma-based state space model is given by
: , :
| ∏ : , , , : , ,, (3-19)
where is the inspection index of the last observable direct indicator given the
current inspection index , max ; , 1 . Due to the non-
Gaussian non-linear property of the Gamma-based state space model, the conditional
PDFs at the right side of the equality sign in Equation (3-19) are calculated by
particle filtering.
Unlike the particle filter developed during the E step of the EM algorithm, the
particle filter used to estimate the conditional PDFs in Equation (3-19) does not
depend on the direct indicators after the current inspection time. Subsequently, the
importance density function is given by
, ; , . (3-20)
After filtering, the PDFs in Equation (3-19) can be calculated by
: , , | , : , ,
∑ · ; 0, (3-21)
and
, : , , | , : , ,
· ; 0, · ∑ ; · ,, (3-22)
using the filtering results :: and :
: , where : represent the samples
generated from the prior density function : , , :
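The idea of estimating the marginal likelihood by particle filtering can be sketched with a bootstrap (SIR) filter. This is a simplified sketch rather than the algorithm of this section: it assumes that only indirect indicators are observed, that the direct indicator starts from zero, that the Gamma transition density serves as the importance density, and that the observation function is a power formulation with normal noise. All names and parameter values are illustrative.

```python
import numpy as np

def particle_filter_loglik(y, eta, lam, beta, gamma_exp, sigma, n_particles, rng):
    """Bootstrap particle filter returning (log marginal likelihood, filtered means)."""
    x = np.zeros(n_particles)                 # assumed initial state x(0) = 0
    prev_eta = 0.0
    loglik = 0.0
    filtered_mean = np.empty(len(y))
    for i, (y_i, eta_i) in enumerate(zip(y, eta)):
        # Propagate with the Gamma transition (importance density).
        x = x + rng.gamma(eta_i - prev_eta, lam, size=n_particles)
        prev_eta = eta_i
        # Log-weights from the normal observation density.
        resid = y_i - beta * x ** gamma_exp
        logw = -0.5 * (resid / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)
        shift = logw.max()
        w = np.exp(logw - shift)
        loglik += shift + np.log(w.mean())    # log of the predictive density estimate
        w /= w.sum()
        filtered_mean[i] = np.sum(w * x)
        # Multinomial resampling at every step (SIR).
        x = x[rng.choice(n_particles, size=n_particles, p=w)]
    return loglik, filtered_mean

# Synthetic data generated from the same model with illustrative parameters.
rng = np.random.default_rng(4)
times = np.arange(1.0, 101.0)
eta = 0.8 * times
x_true = np.cumsum(rng.gamma(np.diff(eta, prepend=0.0), 0.1))
y_obs = 1.5 * x_true ** 1.2 + rng.normal(0.0, 0.5, size=times.size)
loglik, filtered_mean = particle_filter_loglik(
    y_obs, eta, lam=0.1, beta=1.5, gamma_exp=1.2, sigma=0.5,
    n_particles=1000, rng=rng)
print(loglik)
```

The accumulated log of the average unnormalised weight is the particle estimate of the log marginal likelihood, which is the quantity required by the AIC, AICc, and BIC.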
3.2.5 Monte Carlo-Based Lifetime Prediction
In this chapter, a failure is assumed to happen when a direct indicator crosses a
failure threshold . Two situations are considered here: One is that failures can be
detected during inspections; the other is that failures are unobservable during
inspections, which can happen when failures do not cause immediate breakdowns or
sharp changes of indirect indicators.
When a failure is observable, the survival function of the lifetime (i.e., the
complement of its CDF) is given by
Pr | : , Pr | | : , (3-23)
which consists of two components. The first component is the PDF of the current
direct indicator , given the observations of indirect indicators up to the current
time and the fact that a failure has not yet happened; i.e. | : , , where
denotes the current inspection index. The PDF | : , can be obtained
by particle filtering. After filtering, | : , is represented by a set of
particles : . The second component is the conditional survival function given the
current direct indicator , which is obtained as
Pr | Pr , ⁄ (3-24)
according to the properties of the Gamma process, where · is the indicator
function given by
0,1, . (3-25)
After substituting Equation (3-24) into Equation (3-23), and using the result of
particle filtering, the survival function is obtained as
Pr | : , Λ Pr Λ Λ | | :
∑ Pr Λ Λ |. (3-26)
When $\eta(t)$ is differentiable, the conditional PDF of the lifetime can be calculated as
| : , Λ ∑
ln. (3-27)
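The particle-based survival function can be sketched as follows. Instead of evaluating the Gamma probability in closed form for each particle, this sketch simulates the future Gamma increments of every filtering particle and counts the paths that remain below the failure threshold, which is a Monte Carlo substitute chosen only to keep the example dependency-free. The names `particles` and `threshold` and all numerical values are hypothetical.

```python
import numpy as np

def survival_curve(particles, eta_now, eta_future, lam, threshold, rng):
    """Estimate Pr(no failure by each future time) from filtering particles.

    particles: samples of the current direct indicator (all below threshold).
    eta_now / eta_future: shape-function values at the current and future times.
    """
    d_eta = np.diff(np.concatenate(([eta_now], eta_future)))
    increments = rng.gamma(d_eta[None, :], lam,
                           size=(particles.size, d_eta.size))
    paths = particles[:, None] + np.cumsum(increments, axis=1)
    # A path survives up to a time if it is still below the threshold there;
    # paths are monotone, so the estimated curve is non-increasing.
    return np.mean(paths < threshold, axis=0)

rng = np.random.default_rng(3)
particles = np.clip(rng.normal(10.0, 0.2, size=2000), None, 10.8)
hours_ahead = np.arange(1.0, 61.0)
survival = survival_curve(particles, eta_now=0.8 * 100.0,
                          eta_future=0.8 * (100.0 + hours_ahead),
                          lam=0.1, threshold=12.0, rng=rng)
print(survival[[0, 29, 59]])
```

Differencing the estimated survival curve over the time grid gives a discretised approximation of the lifetime PDF.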
When a failure is unobservable, the survival function becomes a piecewise function
given by
| : | :
∑,
∑∞
∑B ; ,
B ,
·∑
0
. (3-28)
Equation (3-28) is constructed using the results of particle smoothing, i.e. :: .
After the current inspection time , the survival function is similar to Equation
(3-26). Before , the survival function is inferred from smoothing particles.
Between two known points, the sample points of a Gamma process follow the Beta
distribution. According to the characteristics of the Beta distribution, the second
equation of Equation (3-28) is obtained, where denotes the next inspection
index given the time , Be , 1 is the Beta function, and
Be ; , 1 is the incomplete Beta function.
The lifetime PDF is also divided into two components by the current inspection
time . After , the lifetime PDF is similar to Equation (3-27); before , the
lifetime PDF is approximated by calculating the average values of inspection
intervals:
| : | :
∑ ·
log∞
∑ ∑
·0
.(3-29)
3.3 Simulation Study
To demonstrate the implementation and the performance of the proposed
algorithms, a simulation study was conducted. When a modest sized training sample
was used, the standard deviation of the parameter estimate in the shape function
given by Equation (3-3) was significant. Moreover, the variance-covariance matrix
of parameter estimates showed that the estimates and were highly correlated.
Therefore, and cannot be regarded as unknown simultaneously for average sized
training data. This chapter only considers the situation when is fixed to one.
The parameters of the Gamma-based state space model investigated in this
simulation study were set as $(0.8,\ 0.1,\ 1.5,\ 1.2,\ 0.5)$.
The simulation data were assumed to be collected from a test lasting 200 hours. The
sampling intervals of direct and indirect indicators were 20 hours and one hour,
respectively. A sequence of simulated data was plotted in Figure 3-1.
Figure 3-1: The simulated indirect indicators and direct indicators
3.3.1 Parameter Estimation
First of all, initial parameters were estimated using observable direct indicators and
the corresponding indirect indicators. In this situation, only 10 of 200 indirect
indicators were used. The initial parameters were then obtained as
$(0.8401,\ 0.0956,\ 1.4847,\ 1.2024,\ 0.3174)$.
Next, the EM iterations were conducted in two stages. In the first stage,
1,000 particles were used, and the EM loop converged after 71 iterations. In the
second stage, 2,000 particles were used to refine the result, and the EM loop
converged after eight additional iterations. The convergence of the individual
parameter estimates is presented in Figure 3-2, which shows that the estimates
fluctuated less when 2,000 particles were used. The final parameter estimates
were obtained as
(0.7893, 0.1179, 1.5221, 1.1938, 0.4878).
The variance-covariance matrix of the parameter estimates was then calculated as
\[
\Sigma =
\begin{bmatrix}
1.6255 & 1.0011 & 0.2171 & 0.0571 & 0.1071 \\
1.0011 & 0.8298 & 0.1434 & 0.0393 & 0.0640 \\
0.2171 & 0.1434 & 2.2141 & 0.7738 & 0.0194 \\
0.0571 & 0.0393 & 0.7738 & 0.2758 & 0.0073 \\
0.1071 & 0.0640 & 0.0194 & 0.0073 & 0.6991
\end{bmatrix}
\times 10^{-3}.
\]
According to the variance-covariance matrix, the standard deviations of the
parameter estimates were:
\[
\sqrt{\operatorname{diag}(\Sigma)} = (0.0403,\ 0.0288,\ 0.0471,\ 0.0166,\ 0.0264).
\]
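The standard deviations can be checked directly from the diagonal of the variance-covariance matrix. The short sketch below does this, assuming the matrix entries are in units of 10^-3 (the scale that reproduces the reported standard deviations); the variable names are illustrative only.

```python
import math

# Variance-covariance matrix as printed above, assumed to be in units of 1e-3
cov = [
    [1.6255, 1.0011, 0.2171, 0.0571, 0.1071],
    [1.0011, 0.8298, 0.1434, 0.0393, 0.0640],
    [0.2171, 0.1434, 2.2141, 0.7738, 0.0194],
    [0.0571, 0.0393, 0.7738, 0.2758, 0.0073],
    [0.1071, 0.0640, 0.0194, 0.0073, 0.6991],
]

# Standard deviation of each estimate = square root of the diagonal entry
std = [math.sqrt(cov[i][i] * 1e-3) for i in range(len(cov))]
print([round(s, 4) for s in std])  # [0.0403, 0.0288, 0.0471, 0.0166, 0.0264]
```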
The estimation result shows that the proposed EM algorithm has the power to
recover unknown parameters.
Figure 3-2: The development of the parameter estimates
To investigate the performance of the two-stage EM algorithm, ten additional
degradation sequences were generated. The generated simulation data were
processed by EM algorithms using six different strategies, i.e., a single-stage
strategy with 1000 particles, a single-stage strategy with 1500 particles, a single-
stage strategy with 2000 particles, a two-stage strategy with 1000 and 1500 particles,
a two-stage strategy with 1000 and 2000 particles, and a two-stage strategy with
1500 and 2000 particles. Originally, 500 particles were also considered; this made
EM loops more efficient. However, the small population of particles caused EM
iterations to sometimes diverge. In contrast, 1,000 particles made EM loops more
stable, and the elapsed time of EM iterations was still satisfactory. Similarly, it is
possible to consider more than 2,000 particles. However, additional particles could
not improve parameter estimation results significantly while the efficiency of EM
loops dropped considerably.
The three single-stage strategies used the relative likelihood function as the
convergence criterion. The simulation study was carried out on a laptop computer
with an Intel T2400 processor and 1 GB of memory. The elapsed times and the mean
likelihood function values of the last three EM iterations are recorded in
Table 3-1. The results show that the two-stage strategy with 1000 and 2000
particles had the smallest relative likelihood value, which indicates better
parameter estimation results. In addition, the two-stage strategy with 1000-2000
particles consumed less time than both the single-stage strategy with 2000
particles and the two-stage strategy with 1500-2000 particles. In practice,
degradation models are often trained offline using historical data, so the
efficiency requirement is relatively low and a strategy that derives better
parameter estimates is preferred. Therefore, the two-stage EM algorithm with
1000-2000 particles, which had the smallest mean relative likelihood value, was
adopted.
Table 3-1: The mean likelihood function values and the elapsed times of six different strategies
Number of stages   Number of particles   Relative likelihood function (×10^-3)   Elapsed time (seconds)
Single             1000                  5.816                                   1111
Single             1500                  5.094                                   1872
Single             2000                  3.296                                   3257
Two                1000-2000             2.590                                   2648
Two                1000-1500             3.271                                   2505
Two                1500-2000             3.044                                   4509
3.3.2 Performance Investigation
This section demonstrates the advantages of the proposed EM algorithm and the
state space model by comparing three approaches to estimating direct indicators. In
the first approach, the parameters of the state space model were identified by the EM
algorithm, and then the direct indicators were estimated by the particle filter. In the
second and the third approaches, the model parameters were both estimated by the
maximum likelihood method which only considers observable direct indicators with
their corresponding indirect indicators. Direct indicators were estimated using the
particle filter in the second approach while they were estimated only using the
observation equation (3-2) in the third approach.
First, the simulated training data were generated. To investigate the effects of
the observation noise, two Gamma-based state space models with different levels of
observation noise (0.5 and 0.05) were considered. The other parameters were the
same as those in Section 3.3.1. For each of the two state space models, 60
sequences of simulated data with 200 indirect indicator observations were
generated. To explore the effects of the quantity of observable direct indicators,
the 60 simulated sequences were divided into three equal-sized groups, with 5, 10,
and 20 observable direct indicators, respectively. Then, 60 additional sequences
of simulated data were generated for testing. The training and testing data were
processed by the three approaches discussed above, and the estimates of the direct
indicators in the testing dataset were obtained. To evaluate the effectiveness of
the three methods, the mean square errors (MSE) of the direct indicator estimates
were calculated. The MSE values for observation noise of 0.5 and 0.05 are shown in
Figure 3-3 and Figure 3-4, respectively. In these two figures, MSE1 denotes the
MSE of the direct indicator estimates from the particle filter whose parameters
were identified by the EM algorithm (i.e. the first approach). MSE2 denotes the
MSE of the direct indicator estimates from the particle filter whose parameters
were estimated using observable direct indicators and their corresponding indirect
indicators (i.e. the second approach). MSE3 denotes the MSE of the direct
indicators estimated by the observation equation whose parameters were estimated
using observable direct indicators and their corresponding indirect indicators
(i.e. the third approach).
Figure 3-3: MSEs of the direct indicator estimates when the observation noise is 0.5
Figure 3-4: MSEs of the direct indicator estimates when the observation noise is 0.05
In Figure 3-3 (observation noise of 0.5), MSE2 is 42.41%, 57.62%, and 66.43%
smaller than MSE3 for the three different numbers (i.e., 5, 10, and 20) of
observable direct indicators. This indicates that the more accurate the parameter
estimates, the more the particle filter improves on the estimates obtained from
the observation equation alone. In contrast, MSE1 is 43.63%, 41.68%, and 23.04%
smaller than MSE2 in the three situations. The decreasing difference between MSE1
and MSE2 shows that the EM algorithm achieves a more significant performance
improvement when fewer observations of the underlying health state are available.
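The percentage comparisons above follow from the MSE of each approach. A minimal sketch of that arithmetic is given below; the function names and the sample numbers are hypothetical and are not the simulation results of this chapter.

```python
def mse(estimates, truths):
    """Mean square error between estimated and true direct indicators."""
    if len(estimates) != len(truths):
        raise ValueError("estimates and truths must have the same length")
    return sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)

def relative_reduction(mse_new, mse_ref):
    """Percentage by which mse_new is smaller than mse_ref."""
    return 100.0 * (mse_ref - mse_new) / mse_ref

# Hypothetical values for illustration only
truth = [1.0, 2.0, 3.0, 4.0]
filter_estimates = [1.1, 1.9, 3.2, 3.9]       # e.g. particle filter output
equation_estimates = [1.4, 1.5, 3.6, 3.5]     # e.g. observation equation only

mse_filter = mse(filter_estimates, truth)
mse_equation = mse(equation_estimates, truth)
print(f"{relative_reduction(mse_filter, mse_equation):.2f}% smaller")
```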
In Figure 3-4 (observation noise of 0.05), when 10 or 20 direct indicators were
observable, MSE1 was slightly larger than MSE2. In this situation, considering
only the indirect indicators whose corresponding direct indicators were observable
gave better parameter estimates. The reason is that the sample size of direct
indicators was relatively large and the errors introduced by the Monte Carlo
algorithms in the E step were significant compared with the small observation
noise (0.05). In practice, the number of observable direct indicators is usually
limited due to the difficulty of measurement. Moreover, due to the uncertain
relationship between direct and indirect indicators and the measurement errors of
direct indicators, the noise in the observation equation is often significant in
reality. Therefore, in most real case studies it is beneficial to use the proposed
EM algorithm to consider additional indirect indicators that have no
corresponding observable direct indicators.
In most situations of the simulation experiment, MSE2 was smaller than MSE3.
Therefore, the particle filter based on the state space model can produce a more
accurate estimate than that obtained using the observation equation alone. The
only exception occurred when the observation noise was 0.05 and only five direct
indicators with their corresponding indirect indicators were used. In this
situation, the estimate derived by the particle filter was less accurate than that
achieved using the observation equation only. The reason is as follows. The
parameter estimates for the state equation can be poor when only five direct
indicators are available. This problem is known as overfitting. In statistical
modelling, overfitting refers to a model with too many degrees of freedom relative
to the training sample size. Overfitting also occurs when the direct and indirect
indicators are used to estimate the parameters of the observation equation. In
Figure 3-3, the particle filter still achieves better underlying state estimates
than the observation equation when the parameter estimates of both the state
equation and the observation equation are poor. When the observation noise is
insignificant (0.05), the parameters of the observation equation can be identified
accurately from only a small sample. However, the overfitting of the state
equation remains. In this situation, considering the state equation cannot improve
the underlying state estimates derived from the observation equation alone. This
overfitting problem was solved by using the whole sequence of indirect indicators.
In reality, indirect indicators can often be sampled easily. Therefore, this
overfitting problem of the state space model can be overcome by increasing the
sampling rate of indirect indicators.
3.3.3 Life Prediction
After the model parameters have been estimated, the lifetime of an engineering asset
can be predicted according to the algorithm developed in Section 3.2.5. The
parameter estimates obtained in Section 3.3.1 were used, i.e.,
(0.7893, 0.1179, 1.5221, 1.1938, 0.4878).
A sequence of simulated degradation indicators was generated for testing. After a
failure threshold of 8 was set on the sequence of direct indicators, the failure
time was obtained as 253.3.
In the first situation, the failure was assumed to be observable. The conditional
survival function given indirect indicator observations was then obtained using
Equation (3-26). The survival functions predicted at different inspections are
plotted in Figure 3-5, which shows that the drops of the survival functions were
sharper and closer to the actual failure time when more indirect indicators were
available. The lifetime PDF can be calculated via Equation (3-27). The lifetime
distributions derived at times 0 and 240 are shown in Figure 3-6. The figure
shows that the lifetime PDF is initially biased and has a wide confidence
interval, whereas at the last stage of life the failure time can be predicted
accurately. The reason is that, at the initial stage, the lifetime PDF is a prior
distribution; this prior is subsequently updated by the information from indirect
indicators and by the fact that failure has not yet occurred.
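The idea of predicting survival from the current state estimate can be illustrated with a simple Monte Carlo sketch. It is not Equation (3-26): it assumes a stationary Gamma process, takes a handful of hypothetical particles as the current state estimate, and uses the fact that, for a monotone process, surviving past a horizon is equivalent to the state still being below the threshold at that horizon. All names and numbers are illustrative assumptions.

```python
import random

def survival_curve(particles, threshold, shape_rate, scale, horizons,
                   n_sim=200, seed=0):
    """Estimate P(state < threshold at now + h) for each horizon h."""
    rng = random.Random(seed)
    curve = []
    for h in horizons:
        alive = 0
        total = 0
        for x in particles:
            for _ in range(n_sim):
                # Gamma increment accumulated over the horizon h (zero if h == 0)
                inc = rng.gammavariate(shape_rate * h, scale) if h > 0 else 0.0
                if x + inc < threshold:
                    alive += 1
                total += 1
        curve.append(alive / total)
    return curve

# Hypothetical particles approximating the current health state
particles = [5.0, 5.5, 6.0]
horizons = [0, 50, 100, 200]
curve = survival_curve(particles, threshold=8.0, shape_rate=0.3, scale=0.1,
                       horizons=horizons)
for h, s in zip(horizons, curve):
    print(h, round(s, 3))
```

A tighter particle cloud (for example, after more indirect indicators have been processed) produces a sharper drop in the estimated curve, which is the behaviour described for Figure 3-5.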
Figure 3-5: Life prediction results when the failure is observable
Figure 3-6: The lifetime distribution predicted at different time points
Figure 3-7: The lifetime distribution prediction at when the failure is not observable
The situation in which the failure is unobservable was also considered. The
survival function and the lifetime PDF derived by Equations (3-28) and (3-29) were
similar to those plotted in Figure 3-5 and Figure 3-6. The difference was that the
survival function and the lifetime PDF were not conditional on the fact that the
item had survived to the current time. Therefore, as shown in Figure 3-7, the
values of the lifetime PDF before the current time were not necessarily equal to
zero. The piece-wise lifetime PDF in Figure 3-7 was calculated using Equation
(3-29).
The life prediction result shows that the proposed life prediction method can
combine the information from indirect indicators and the age of an engineering
asset.
3.4 Case Study: Crack Size Propagation Modelling
The data used in this case study were collected from the accelerated life test of
a single-stage spur gearbox. The gear investigated was 10 mm wide and had 27
teeth. The shaft speed was 2400 RPM. To accelerate the degradation process, a
semi-circular notch of 1 mm radius was initially spark eroded at the root fillet
of a tooth, and the gearbox was operated under an overload condition. The
vibration signal was sampled at 73 time points with irregular intervals. In
contrast, the crack depth was measured at only six time points (listed in
Table 3-2) due to the difficulty of measurement. This case study modelled the
development of the crack depth (a direct indicator) and the change of the related
indicator (an indirect indicator) extracted from the vibration signals.
Table 3-2: The measurements of the crack size during the accelerated life test
Measure time (hour)   0.0917   3.3383   3.7536   4.6383   5.5064   5.6864
Crack depth (mm)      1        2.57     2.73     3.11     3.81     4.16
To extract an indicator that relates to the crack depth on the gear, this case
study adopted the indicator extraction method discussed in (Wang 2003a). First, a
residual signal was obtained from the signal average by filtering out the gear
meshing harmonics (i.e. using a multi band-stop filter). For healthy gears, this
residual signal represents random transmission errors. For faulty gears (e.g.
gears with tooth cracking or tooth pitting), the transmission errors include a
sudden change (e.g. a spike) and the signal becomes non-Gaussian. Kurtosis is a
good measure of non-Gaussianity (e.g. spikiness) in a signal, so the kurtosis of
the residual signal is an effective indicator of crack development. Previous
research has also shown that the kurtosis of the residual signal correlates well
with the crack on the test gear (Wang and Wong 2002; Wang 2003a). Therefore, the
kurtosis of the residual signal was adopted as the indirect indicator of the crack
depth.
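To illustrate why kurtosis responds to spikes, the sketch below computes the (non-excess) kurtosis of a synthetic Gaussian signal and of the same signal with a few added spikes. The signals are synthetic stand-ins for a residual signal, and the choice of the fourth standardised moment as the kurtosis definition is an assumption; the cited extraction method may differ in detail.

```python
import random

def kurtosis(signal):
    """Fourth standardised moment (about 3 for a Gaussian signal)."""
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((x - mean) ** 2 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    return m4 / (m2 ** 2)

rng = random.Random(0)
healthy = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # random errors only
faulty = list(healthy)
for i in range(0, 5000, 500):
    faulty[i] += 8.0  # hypothetical spike, e.g. once per revolution

print(round(kurtosis(healthy), 2), round(kurtosis(faulty), 2))
```

The spiky signal yields a clearly larger kurtosis than the Gaussian one, which is the property exploited when kurtosis is used as an indirect indicator.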
In this case study, the Gamma-based state space model given by Equations (3-1),
(3-2), and (3-3) was considered first, where the crack depth was denoted by Λ ,
and the kurtosis of the residual vibration signal was represented by . The crack
depth on a gear cannot decrease during a degradation process. Therefore, the
monotonically increasing Gamma process was an appropriate candidate to model the
enlargement of the crack depth. The effectiveness of the Gamma process in
modelling the development of a crack has been verified by existing research
(Lawless and Crowder 2004; Park and Padgett 2005a). As the development of the
crack depth demonstrated a nonlinear relationship with the time, the nonlinear
Gamma process given by Equations (3-1) and (3-3) was used to model the
development of the crack depth. Due to the small sample size, the one parameter
shape function (i.e. ) was adopted by the nonlinear Gamma process. The
relationship between kurtosis values of the residual vibration signal and the crack
depth was also nonlinear. Consequently, the power formulation (i.e. Λ ·
Λ ) was used in Equation (3-2). For a practical degradation process, the
observation noise in Equation (3-2) does not necessarily follow an independent and
identically distributed normal distribution. Motivated by Christer’s research
(Christer et al. 1997), four typical formulations of the observation noise were
considered, as shown in Table 3-3, i.e. time-independent noise, noise with a
linear standard deviation, noise with a linear variance, and noise with an
exponential variance. More complex formulations of the observation noise were not
considered in this case study, because the sample size was limited and additional
parameters may cause overfitting.
Table 3-3: The AICc of different models
Observation noise                         AICc
Time-independent noise                    117.9231
Noise with a linear standard deviation    138.4049
Noise with a linear variance              117.1404
Noise with an exponential variance        106.9659
The AICc was used to choose the most appropriate formulation of the observation
noise from the four candidates listed in Table 3-3. The AICc values of the four
different formulations of the observation noise were calculated (see Table 3-3).
According to the results, the noise with exponential variance had the lowest AICc
value and was selected to model the observation noise. Finally, the model
parameters were estimated as:
(1.939, 0.1087, 0.2269, 1.893, 0.1456).
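For reference, the selection step can be sketched with the standard small-sample corrected Akaike information criterion, AICc = 2k - 2 ln L + 2k(k + 1)/(n - k - 1), where k is the number of parameters, n the sample size, and L the maximised likelihood. The candidate names, log-likelihoods, parameter counts, and sample size below are hypothetical and are not the values behind Table 3-3.

```python
def aicc(log_likelihood, k, n):
    """Corrected AIC for k parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + 2 * k * (k + 1) / (n - k - 1)

# Hypothetical candidates: (name, maximised log-likelihood, number of parameters)
candidates = [
    ("time-independent noise", -54.0, 5),
    ("exponential variance", -47.0, 6),
]
n_obs = 73  # hypothetical sample size
scores = {name: aicc(ll, k, n_obs) for name, ll, k in candidates}
best = min(scores, key=scores.get)
print(best, round(scores[best], 3))
```

The candidate with the lowest AICc is preferred; the correction term penalises extra parameters more heavily when the sample is small.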
A state space model with linear and Gaussian assumption given by
Λ Δ Λ ~ · Δ , · √Δ (3-30)
and
~ · Λ t , (3-31)
was also fitted to the dataset collected from the accelerated life test. When the
linear and Gaussian assumptions are adopted, the parameters can be estimated
efficiently using the EM algorithm based on the Kalman smoother (Khan and Dutt
2007). The parameter estimates were:
(0.5557, 2.288, 0.5443, 0.2487).
The corresponding AICc value was calculated as 1227.997, which is much greater
than those of the Gamma-based state space models. In addition, the Gaussian
assumption allows the modelled crack depth to fluctuate, which is inconsistent
with the fact that the crack depth increases monotonically. Therefore, the linear
and Gaussian assumptions are not appropriate for the dataset collected in the
accelerated life test of the gearbox.
3.5 Chapter Summary
This chapter models correlated degradation processes of direct and indirect
indicators using a Gamma-based state space model. An EM algorithm based on the
particle smoother has been developed to estimate the parameters of the Gamma-
based state space model. The results of the simulation experiment demonstrate that
the proposed EM algorithm can estimate the underlying parameters accurately.
When samples of a direct indicator are limited and observation noise is significant,
the proposed EM algorithm can improve the parameter estimates by considering
more indirect indicators that are easier to obtain in reality. In addition, a lifetime
prediction approach using the particle filter, the particle smoother, and Bayesian
theory has been developed. The simulation study shows that the lifetime prediction
algorithm can combine indirect indicator observations and age information to
estimate the failure time of an engineering asset. Finally, a case study using
experimental data has been conducted to demonstrate the model selection method,
which identifies the candidate model formulation with the best fit. The case
study also shows that the linear and Gaussian assumptions are not appropriate for
some practical data.
The proposed Monte Carlo-based algorithms enable the state equation of the state
space model to adopt other non-linear non-Gaussian stochastic processes. The
observation equation can also use a range of non-linear formulations to describe the
relationship between direct and indirect indicators. These state space models –
without linear and Gaussian assumptions – are expected to be more effective when
fitted to practical data. Moreover, the parameter estimation method developed in this
research can deal with the situation when more indirect indicators than direct
indicators are known. Consideration of these additional indirect indicators can
improve parameter estimation results and avoid the overfit problem when the
observations of direct indicators are limited.
4 Joint Modelling of Failure Events and Multiple Indirect Indicators
4.1 Introduction
Chapter 3 investigates the state space degradation model that describes the
correlated degradation processes of a direct indicator and an indirect indicator.
However, for some engineering assets, the failure mechanism is complex and no
physical direct indicator can be extracted to represent the underlying degradation
process. For example, Wang used a generic wear condition as a direct indicator of
aircraft engines (Wang 2007). This generic wear condition was not extracted
directly from the condition monitoring data. When no direct indicator can be
extracted from CM data, the underlying degradation process is only observable at
failure times.
Therefore, the indicators extracted from the CM data and the lifetime data should be
combined to model the degradation process when a direct indicator is not available.
Moreover, in some situations multiple indirect indicators can be extracted from CM
data. The effectiveness of these indicators in life prediction should be evaluated, and
information from these indirect indicators should be fused properly. This chapter
develops a state space model that describes an asset degradation process using
multiple degradation indicators and failure events.
The state space model is an effective mathematical model that can combine multiple
degradation indicators and lifetime data. The state space model presumes the
existence of an underlying degradation process. When the underlying degradation
process crosses a predetermined threshold, a failure happens. The underlying
degradation process is partially revealed by multiple degradation indicators.
Compared with other degradation models, the state space model considers both the
stochastic underlying degradation process and uncertain relationships between the
underlying degradation process and the degradation indicators. Therefore,
degradation indicators are used more efficiently, and no additional mathematical
models for time dependent degradation indicators are needed when predicting asset
lives. Moreover, the state space model is an effective tool for indicator fusion.
Compared with commonly used multivariate statistical approaches and multivariate
time series analysis methods, the state space model can analyse degradation
indicators with uneven sampling intervals.
Existing research on state space degradation models that combine degradation
indicators and failure events largely adopts discrete time or discrete state
assumptions. Wang proposed a state space model whose underlying health state
increments followed a beta distribution (Wang 2007). Consequently, Wang’s model
had a monotonically increasing underlying degradation process that resembled
irreversible engineering asset wear processes. However, the model was discrete in
time. Makis and Jiang developed a state space model based on a continuous time
discrete state Markov process (Makis and Jiang 2003). The discrete state assumption
requires discretising continuous degradation processes, which needs expert
knowledge and may introduce additional errors. To remove the discrete time and
state assumptions, state space models that are continuous in both time and state
have also been developed. Wang et al. developed a state space model to predict the
RUL of bearings using RMS values of vibration signals (Wang 2002). Wang’s model
used values of RUL as underlying health states. This deterministic underlying
degradation process did not consider the stochastic, heterogeneous degradation
processes of different
individuals. Whitmore et al. proposed a bivariate Wiener process (Whitmore et al.
1998) to model a partially revealed degradation process. However, the bivariate
Wiener process only considered the covariates collected at failure and censoring
times, while degradation indicators at other occasions were ignored.
To address the limitations in the existing state space degradation models, this
chapter applies the Gamma-based state space model to combine multiple
degradation indicators and failure events. Continuous time property enables the
proposed model to process irregular inspection intervals. Continuous states, on the
other hand, avoid discretising indicators with continuous values. This chapter uses
Monte Carlo based parameter estimation and lifetime prediction algorithms to
process the Gamma-based state space model. The censored failure data problem,
which has been ignored by most existing state space degradation models (Christer et
al. 1997; Wang 2002; Makis and Jiang 2003; Wang 2007), is also considered. In addition,
a parametric Bootstrap algorithm is developed to evaluate the effectiveness of
different indicators in asset degradation modelling. The proposed algorithms are
validated by both simulated data and field data.
4.2 Model Formulations and Solving Algorithms
4.2.1 Model Formulations and Notations
In this chapter, the system equation of the Gamma-based state space model given by
Λ Δ Λ ~Ga · Δ , (4-1)
is assumed to follow a Gamma process. The scalar variable Λ 0 denotes the
underlying health state at time 0. A larger value of Λ indicates a worse health
state, and a failure is assumed to happen when Λ crosses a predetermined
threshold . An asset is assumed to be non-defective at the initial time, i.e.,
Λ 0 0. The increments of Λ follow a Gamma distribution given by Equation
(4-1), where Ga · Δ , denotes the Gamma distribution with shape parameter
· Δ and scale parameter . The second component of the Gamma-based state
space model is the observation equation. In this chapter indirect indicators are
assumed to follow a multivariate normal distribution given by
~N · Λ , Σ , (4-2)
where denotes the indirect indicator vector at time , and N · Λ , Σ denotes
the multivariate normal distribution with mean vector · Λ and covariance
matrix Σ. The multivariate normal distribution is selected for the following
reasons. It is the most important multivariate continuous distribution; its
mathematical properties have been well investigated, and the associated inference
algorithms are well developed. Furthermore, as the most commonly used multivariate
distribution, it has been widely used to approximate joint random variables in
practice. In addition, the multivariate normal distribution is widely used in
state space models (Stathopoulos and Karlaftis 2003; Proust et al. 2006;
Proust-Lima and Jacqmin-Gadda 2007).
To formulate the parameter estimation algorithm more concisely, only degradation
indicators from one degradation process are considered in this chapter. Inspection
times are denoted as 1,2, … , ), where is the number of inspections. The
values of the underlying health state and indirect indicator vector at the th inspection
are denoted as and respectively. The failure time and the failure threshold of the
underlying degradation state are denoted as and . Note that is assumed equal
to 1, because the identical lifetime distribution can be obtained by changing the
scale parameter for different values of . For an asset preventively replaced
before failure, the censoring time is denoted as . Unlike the commonly used PHM,
the degradation indicators at or are not indispensable during parameter
estimation of the Gamma-based state space model.
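The normalisation of the failure threshold to 1 can be checked numerically: dividing a Gamma process by the threshold gives another Gamma process whose scale parameter is divided by the same amount, so the first crossing time is unchanged. The sketch below demonstrates this on one simulated path with unit time steps; the function name and all parameter values are hypothetical.

```python
import random

def first_crossing_step(shape_per_step, scale, threshold, seed, max_steps=100000):
    """Index of the first step at which the simulated Gamma path reaches threshold."""
    rng = random.Random(seed)
    level = 0.0
    for step in range(1, max_steps + 1):
        # Draw a unit-scale Gamma increment and apply the scale explicitly
        level += scale * rng.gammavariate(shape_per_step, 1.0)
        if level >= threshold:
            return step
    return None

threshold = 4.0  # hypothetical failure threshold
original = first_crossing_step(0.5, 0.2, threshold, seed=7)
rescaled = first_crossing_step(0.5, 0.2 / threshold, 1.0, seed=7)
print(original == rescaled)  # True
```

Because only the ratio of the scale parameter to the threshold matters for the lifetime distribution, fixing the threshold removes one redundant parameter from the estimation problem.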
4.2.2 Parameter Estimation
Similar to Chapter 3, this chapter uses the Monte Carlo based EM algorithm to
estimate the parameters of the Gamma-based state space model. In this chapter, the
underlying degradation process is only observable at the failure time. Therefore, the
initial parameters cannot be estimated using the method adopted in Chapter 3. This
chapter estimates the initial parameters using the method of moments. In addition,
due to the differences in model formulations and assumptions, the E step of the EM
algorithm also differs from that in Chapter 3. The initial parameter estimation
method and the E step of the EM algorithm are detailed below, while the other
steps, which are identical to those in Chapter 3, are not discussed in this
chapter.
4.2.2.1 Initial parameters estimation
The initial parameters for the EM algorithm are estimated by the method of
moments. Due to uneven inspection intervals, the increments of the degradation
indicator vectors should be scaled before being treated by the method of moments.
The method of moments used in this research is motivated by that adopted in
(Cinlar et al. 1977). First, the equation
Λ · ·· ·
· · · · (4-3)
can be obtained according to the property of the Gamma process. Then the first-
order and second-order moments of the scaled increments of degradation indicator
vectors can be calculated as
∑ ∑ · 1 · (4-4)
and
∑ · ∑· · ·
∑ · · · · · · ·· ·
. (4-5)
After that, given an initial estimate , the estimate of , and Σ are calculated using
·⁄ , (4-6)
∑ ∑/
∑ , (4-7)
and
∑ · · · ∑
1 · · 2 ∑ (4-8)
The estimate is obtained by experience. When any diagonal element of is
negative, a bigger value of is required.
4.2.2.2 E step
The E step is to estimate the expectation of the complete likelihood function. In this
section, both complete and censored failure data are considered. When complete
failure data are available, the expected complete likelihood function given
degradation indicators and failure time can be written as:
: , log : , : ,
: , log : ,: , log : | : ,
, (4-9)
where , , and represent the model parameters
to estimate. To make the equations more concise, in this chapter, ; ,
1, … , is denoted by : ; similarly ; , 1, … , is denoted by : .
The two components of Equation (4-9) can be written as:
: , log : | ∑ log log Γ 1
·: , log
: , : , 1
(4-10)
and
: , log : | : , ⁄ log Σ
tr Σ ∑: , · · ·
, (4-11)
respectively, where , , 2,3, … , 1 , and is
the size of the indirect indicator vector. To achieve a shorter equation, denotes
, and represents in Equation (4-10). To calculate Equations (4-10) and
(4-11), three components (i.e., : , ,
: , , and : , log
should be estimated first. The three components are estimated through the particle
smoother algorithm. The particle smoother can approximate conditional distributions
of underlying health states given degradation indicators : and failure time by a
set of random samples :: ; 1,2, … , 1 1,2, … , as:
: , ∑ 1,2, … , . (4-12)
In Equation (4-12), · is the Dirac delta measure given by
0, 1,
. (4-13)
Using these smoothing results :: , the three components in Equations (4-10) and
(4-11) can be approximated as:
: ,1 ∑
: , log1 ∑ log 1
: ,2 1 ∑
: ,
2
: ,
2
. (4-14)
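The particle approximation of such conditional expectations amounts to averaging a function of the smoothed particles. The sketch below shows this for equally weighted particles and three generic functions (the value, its logarithm, and its square); it is a generic illustration, not a transcription of Equation (4-14), and the particle values and names are hypothetical.

```python
import math

def particle_expectations(particles):
    """Equal-weight sample averages of x, log(x) and x**2 over smoothed particles."""
    n = len(particles)
    if n == 0:
        raise ValueError("need at least one particle")
    mean_x = sum(particles) / n
    mean_log_x = sum(math.log(x) for x in particles) / n  # requires x > 0
    mean_x2 = sum(x * x for x in particles) / n
    return mean_x, mean_log_x, mean_x2

# Hypothetical smoothed particles of a positive health-state quantity
smoothed = [0.20, 0.25, 0.30, 0.25]
mean_x, mean_log_x, mean_x2 = particle_expectations(smoothed)
print(mean_x, mean_log_x, mean_x2)
```

With weighted particles, each average would become a weighted sum using the normalised particle weights instead of 1/n.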
To conduct the particle smoother, the conditional PDF of the underlying health
state at the next inspection time given the failure time and the current health
state should be calculated first. In the developed model, the failure time is
assumed to be the first crossing time of the underlying Gamma process Λ ; 0 to a
predetermined failure threshold . Therefore, the conditional PDF of the underlying
health state at the next inspection time can be written as:
, Be ; , (4-15)
according to the Gamma bridge property.
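The standard Gamma bridge property can be sketched as follows: for a stationary Gamma process with known values at times s and T, the fraction of the total increment accumulated by an intermediate time t follows a beta distribution with parameters proportional to (t - s) and (T - t), independent of the scale parameter. The code below samples from this bridge. It conditions on a known end-point value, which is a simplification of the first-crossing condition used in this chapter, and the function name and numbers are hypothetical.

```python
import random

def sample_bridge(level_s, level_T, s, t, T, shape_rate, rng):
    """Sample the state at time t (s < t < T) given the states at s and T."""
    # Fraction of the increment over [s, T] that has accumulated by time t
    frac = rng.betavariate(shape_rate * (t - s), shape_rate * (T - t))
    return level_s + frac * (level_T - level_s)

rng = random.Random(3)
samples = [sample_bridge(0.2, 1.0, 0.0, 2.0, 10.0, 1.5, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
# Theoretical mean: level_s + (t - s) / (T - s) * (level_T - level_s) = 0.36
print(round(mean, 3))
```

Every sampled value lies between the two end-point states, so the bridge preserves the monotonicity of the underlying degradation process.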
For censored data, the expected complete likelihood function is similar to Equation
(4-9), except that the failure time is replaced with the censoring time . The
expected complete likelihood function for censored data is also approximated by the
results of particle smoothing. When conducting particle smoothing, the conditional PDF of the
underlying health state at the 1 th inspection point is modified from Equation
(4-15) to
, Λ Λ Ga ; ,
· , / /, / /
1, … , 1. (4-16)
The derivation of Equation (4-16) is given in the Appendix.
4.2.3 Indicator Effectiveness Evaluation
In real applications, it is important to evaluate the relative effectiveness of different
degradation indicators in parameter estimation and lifetime prediction. After
effective indicators are identified, a more economical condition monitoring system
can be built by installing only the necessary sensors. Moreover, the size of the database
that stores condition monitoring data can also be reduced. In addition, the over-fitting
problem that arises when a degradation model is applied to a real dataset may be
overcome by ignoring unnecessary degradation indicators. Some degradation models
can identify the effectiveness of different degradation indicators. For example, the
importance of different covariates of the PHM can be revealed by the regression
coefficients. For the composite scale model, the effectiveness of different
degradation indicators can be disclosed by weight parameters and mean values of
degradation indicators (Jiang and Jardine 2006). In the proposed Gamma-based state
space model, the relationships between degradation indicators and underlying health
states are modelled by an observation equation that can take various forms.
Consequently, the effectiveness of a degradation indicator cannot be read directly
from a single parameter.
This research develops a parametric bootstrap method to evaluate the effectiveness
of indicators. Because the parameters of the Gamma-based state space model cannot
be estimated efficiently, a bootstrap that re-estimates the parameters for a
large number of simulated datasets is not practical. An alternative is to
compare the influence of different indicators on the result of particle filtering. An
indicator that affects particle filtering results significantly can have a considerable
impact on parameter estimation, because the expected complete likelihood function
in the EM algorithm is estimated by particle filtering and smoothing. In addition,
the asset life prediction method also relies on the particle filter. Therefore, the
influence of an indicator during particle filtering reveals the effectiveness of the
indicator in degradation modelling and life prediction.
The proposed indicator effectiveness evaluation method proceeds as follows.
Firstly, the proposed model is fitted to a training dataset to obtain the parameter
estimates. Then, a number of sequences of simulated data are generated using these
parameter estimates. After that, the particle filter is carried out to estimate the
underlying health states of the simulated degradation sequences. During the
particle filtering, each degradation indicator is omitted in turn, and the MSE of the
underlying health state estimates is calculated. Thus $m$ MSEs are obtained as
$\mathrm{MSE}_j$ ($j = 1, 2, \ldots, m$), where $m$ is the size of the degradation
indicator vector and $\mathrm{MSE}_j$ denotes the MSE of the underlying health state
estimates when the $j$th indicator is omitted. A particle filter considering all the
indicators is then applied to the simulated data, and the MSE of the underlying health
state estimates is obtained as $\mathrm{MSE}_0$. A relative contribution ratio is
calculated as $r_j = \mathrm{MSE}_j / \mathrm{MSE}_0$ (obviously $1 \le r_j < \infty$)
for the $j$th degradation indicator. A larger value of $r_j$ indicates that the $j$th
degradation indicator is more important. Conversely, if $r_j$ is close to one, the $j$th
degradation indicator can be omitted. However, degradation indicators that are
highly correlated with each other may have relative contribution ratios close to one
simultaneously. These indicators cannot all be removed together. One solution is to
omit only the indicator with the smallest relative contribution ratio, and then to
recalculate the relative contribution ratios of the remaining indicators. In this way,
highly correlated degradation indicators are not omitted simultaneously.
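The ratio calculation and the one-at-a-time elimination rule described above can be
sketched in Python as follows. This is an illustration under stated assumptions, not the
implementation used in this research: the function names, the cut-off `tol` and the
callback `mse_fn` are hypothetical, and the particle-filter MSE is replaced by a toy
closed-form stand-in (the posterior variance of a scalar Gaussian state observed through
independent Gaussian indicators).

```python
def contribution_ratios(indicators, mse_fn):
    """Return {j: MSE without indicator j / MSE with all indicators}."""
    mse_all = mse_fn(tuple(indicators))
    return {j: mse_fn(tuple(k for k in indicators if k != j)) / mse_all
            for j in indicators}

def prune_indicators(indicators, mse_fn, tol=1.05):
    """Drop one indicator at a time while the smallest ratio stays below tol."""
    kept = list(indicators)
    while len(kept) > 1:
        ratios = contribution_ratios(kept, mse_fn)
        weakest = min(ratios, key=ratios.get)
        if ratios[weakest] >= tol:
            break
        kept.remove(weakest)  # ratios are recomputed on the next pass
    return kept

# Toy stand-in for the particle-filter MSE (assumption): posterior variance of a
# scalar Gaussian state observed through independent Gaussian indicators.
noise_var = {1: 1.0, 2: 0.005, 3: 0.005}

def toy_mse(subset, prior_var=1.0):
    return 1.0 / (1.0 / prior_var + sum(1.0 / noise_var[j] for j in subset))

ratios = contribution_ratios([1, 2, 3], toy_mse)
kept = prune_indicators([1, 2, 3], toy_mse)
```

In this toy setting the noisy first indicator has a ratio very close to one and is the
only one removed, because the ratios of the two remaining indicators are recomputed
before any further removal is considered.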
4.3 Simulation Study
To investigate the performance of the proposed algorithms, a simulation study was
conducted. First, a simulation dataset was generated. It consisted of two complete
and two censored sequences of degradation indicators. The parameters adopted to
generate the simulation dataset were as follows:
0.005, 0.05, (2, 2.5, 3), and
$$\Sigma = \begin{pmatrix} 5 & 1 & 1 \\ 1 & 5 & 2 \\ 1 & 2 & 6 \end{pmatrix} \times 10^{\ldots}.$$
These parameters are illustrative only and have no particular physical meaning. The
inspection interval was assumed to be 60 hours. One of the four
sequences of degradation indicators is shown in Figure 4-1.
Figure 4-1: Three Simulated degradation indicators
4.3.1 Parameter Estimation
Given the four degradation sequences, parameter estimation was conducted. First,
according to Equations (4-6), (4-7), and (4-8), the initial parameters were estimated
as:
0.01, 0.02535, (2.007, 2.322, 2.811),
and
$$\begin{pmatrix} 5.057 & 0.317 & 2.298 \\ 0.317 & 5.624 & 3.572 \\ 2.298 & 3.572 & 7.892 \end{pmatrix} \times 10^{\ldots}.$$
Then, EM iterations started with this initial parameter set. The EM iterations were
conducted in two stages. In the first stage, which lasted 57 iterations, 1,000 particles
were used to perform particle smoothing. In the second stage, 2,000 particles were
adopted for a better estimation result. As shown in Figure 4-2, the convergence
of the parameter estimates became much smoother when 2,000 particles were
used. After 67 iterations, the final results were acquired as:
0.005475, 0.04454, (2.024, 2.516, 3.037),
and
$$\begin{pmatrix} 4.729 & 1.256 & 0.927 \\ 1.256 & 4.481 & 1.965 \\ 0.927 & 1.965 & 6.162 \end{pmatrix} \times 10^{\ldots}.$$
The final estimates are close to the values used to generate the data (for example,
0.005475 against 0.005 and 0.04454 against 0.05), which shows that the proposed EM
algorithm can recover the unknown parameters accurately.
Figure 4-2: The convergence process of the EM algorithm
4.3.2 Lifetime Prediction
To test the lifetime prediction ability of the proposed model, an additional simulated
sequence of degradation indicators was generated. As described in Section 3.2.5, the
lifetime prediction algorithm is divided into two steps. The first step is to estimate
the distribution of the current underlying health state using the particle filter. For the
simulated test data, the underlying health states estimated at different inspections
are shown in Figure 4-3. The second step is to predict the RUL based on the
underlying health state estimates. The life prediction results and the
corresponding confidence intervals are shown in Figure 4-4. As more condition
monitoring observations became available, the RUL predictions became more
accurate and the confidence intervals narrower.
The reason is that the prior estimate of the RUL was updated by more degradation
indicators and by the fact that the asset had survived so far. Therefore, the proposed
lifetime prediction algorithm can combine the information from degradation indicators and
survival time.
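The second step can be illustrated with a simple forward simulation. The sketch below is
not the algorithm of Section 3.2.5; it assumes a Gamma process with shape parameter
`alpha` per unit time and scale parameter `beta`, starts from a set of filtering
particles, and records when each simulated path first exceeds the failure threshold on a
time grid of width `dt`. All names and numerical values are hypothetical.

```python
import numpy as np

def simulate_rul(particles, alpha, beta, threshold, dt, rng, max_steps=100000):
    """Simulate the time for each particle to first exceed the threshold."""
    state = np.array(particles, dtype=float)
    rul = np.zeros_like(state)
    alive = state < threshold
    steps = 0
    while alive.any() and steps < max_steps:
        n_alive = int(alive.sum())
        # Independent Gamma increments over one grid step for surviving paths.
        state[alive] += rng.gamma(shape=alpha * dt, scale=beta, size=n_alive)
        rul[alive] += dt
        alive = state < threshold
        steps += 1
    return rul

rng = np.random.default_rng(5)
# Stand-in for filtering particles concentrated around a health state of 1.0.
particles = np.clip(rng.normal(1.0, 0.1, size=5000), 0.0, None)
rul = simulate_rul(particles, alpha=2.5, beta=0.5, threshold=4.0, dt=0.05, rng=rng)
lower, upper = np.percentile(rul, [2.5, 97.5])  # empirical 95% interval
```

Because each particle carries its own simulated failure time, the spread of `rul` reflects
both the uncertainty of the current health state and the randomness of future
degradation.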
Figure 4-3: Estimation of underlying health states
Figure 4-4: RUL prediction results
4.3.3 Effectiveness Evaluation of Indicators
The effectiveness evaluation method for indicators was also tested by a simulation
study. Firstly, forty-four complete sequences of simulated degradation indicators
were generated using the parameters:
0.005, 0.05, (0.2, 2.5, 3),
and
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.005 & 0 \\ 0 & 0 & 0.005 \end{pmatrix}.$$
The inspection interval was again assumed to be 60 hours. Four of these simulated
sequences were used as training data; the other 40 sequences
were used as test data. Based on the training data, the parameters were estimated as:
0.004903, 0.04979, (0.2292, 2.568, 3.08),
and
$$\begin{pmatrix} 1032 & 1.677 & 5.426 \\ 1.677 & 4.622 & 0.4843 \\ 5.426 & 0.4843 & 4.613 \end{pmatrix} \times 10^{-3}.$$
The bootstrap algorithm developed in Section 4.2.3 was then conducted. Forty
sequences of simulated indicators were generated during the bootstrap process, and
the relative contribution ratios of the indicators are listed in the second row
of Table 4-1. On a laptop computer with an Intel T2400 CPU and 1 GB of memory, the
bootstrap algorithm took 126 seconds.
Table 4-1: The results of effectiveness evaluation for indicators
Index of the indicator j       1             2             3
Relative contribution ratio    1.012         1.655         2.205
MSE                            4.511×10^-4   6.729×10^-4   11.58×10^-4
To investigate the performance of the proposed effectiveness evaluation algorithm
for indicators, parameter estimation was conducted using the original training
dataset with different indicators omitted. When the first indicator was not
considered, the parameters were estimated as:
0.004847, 0.05037, (2.577, 3.091),
$$\begin{pmatrix} 4.67 & 0.4555 \\ 0.4555 & 4.67 \end{pmatrix} \times 10^{-3}.$$
Similarly, when the second and the third indicators were omitted in turn, the parameter
estimates were:
0.004746, 0.05144, (0.2343, 3.11),
$$\begin{pmatrix} 1031 & 8.172 \\ 8.172 & 4.631 \end{pmatrix} \times 10^{-3},$$
and
0.004288, 0.05693, (0.2334, 2.631),
$$\begin{pmatrix} 1032 & 2.761 \\ 2.761 & 4.534 \end{pmatrix} \times 10^{-3}.$$
Using these parameter estimates, the particle filter was carried out on the test
data. The MSEs of the underlying health state estimates are
given in the third row of Table 4-1. The MSE of the underlying health state
estimates using all three indicators was 3.904×10^-4. The results
in Table 4-1 show that ignoring an indicator with a larger relative
contribution ratio during parameter estimation causes a larger error in
underlying health state estimation. Conversely, including the first indicator,
whose relative contribution ratio is near one, does not improve the underlying health
state estimates significantly. Therefore, the proposed indicator effectiveness evaluation
method can recognize the importance of different degradation indicators.
4.4 Case Study: Lifetime Prediction for the Bearing on a Liquefied Natural Gas (LNG) Pump
4.4.1 Data Introduction
LNG pumps are critical in the LNG industry. An unexpected breakdown of an LNG
pump can reduce the amount of LNG at the receiving terminal and cause
performance degradation of the whole plant. The specifications of LNG pumps
investigated in this case study are listed in Table 4-2, and the structure of an LNG
pump is shown in Figure 4-5. The LNG pump is enclosed within a suction vessel
and mounted with a vessel top plate. Three ball bearings are installed to support the
entire dynamic load of the integrated shaft of the pump and a motor. The three
bearings in the LNG pump are self-lubricated at both sides of the rotor shaft and tail
using LNG. Due to the low viscosity of LNG (about 0.16 cP), the three bearings
are poorly lubricated. In addition, the bearings work at a high speed (3,600 rpm).
Therefore, bearings installed in these LNG pumps are failure-prone.
Table 4-2: The specifications of the pump
Capacity       Pressure         Impeller Stage   Speed       Voltage   Rating   Current
241.8 m^3/hr   88.7 kg/cm^2.g   9                3,585 RPM   6,600 V   746 kW   84.5 A
Figure 4-5: Pump schematic
To monitor the health of the bearings, three accelerometers were installed on the
housing near each bearing assembly, in the horizontal, vertical, and axial
directions respectively. In this case study, vibration signals from two bearings
installed on two LNG pumps were investigated. The vibration signals were sampled
at irregular intervals. At the beginning and last stage of life, the vibration signals
were measured more frequently; while at the middle stage of life, the vibration
signals were collected at relatively larger intervals. This kind of irregular inspection
strategy is often used in reality, because it is not necessary to measure vibration
signals frequently when a bearing is running smoothly. The vibration signals
investigated in this case study were all measured in the horizontal direction. The
overall features of the vibration signals are listed in Table 4-3. The outer raceway
spalling and the inner raceway flaking on the bearings are shown in Figure 4-6 and
Figure 4-7. In this case study, vibration signals collected from the bearing installed
on Pump P301D were used to estimate the parameters of the proposed model, while
the vibration signals collected from the bearing installed on Pump P301C were used
to test the lifetime prediction ability of the proposed model.
Table 4-3: Vibration data features
Machine No   Life Time   Failure Mode            Sample Number   Sampling Frequency
P301C        4,698 hrs   Outer raceway spall     120             12,800 Hz
P301D        3,511 hrs   Inner raceway flaking   136             12,800 Hz
Figure 4-6: Outer raceway spall of P301C
Figure 4-7: Inner raceway flaking of P301D
4.4.2 Model Application
Bearing failures (e.g. inner race crack, outer race crack, and rolling element crack)
often generate shock pulses whose energy is concentrated in a relatively high frequency
band. Therefore, a vibration signal that has passed through a high pass filter (HPF) is
often more sensitive to early defects of a bearing. For a raw vibration signal, the
kurtosis and the crest factor, which reflect the presence of extreme deviations, can
also indicate early defects. After investigating different features of the vibration
signals used in this case study, three features were adopted as degradation indicators
of the proposed model: the entropy of the vibration signal after an HPF at 3,000 Hz,
the crest factor of the vibration signals after an HPF at 2,500 Hz, and the crest
factor of the raw vibration signals.
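As an illustration of why the crest factor and the kurtosis respond to shock pulses, the
following Python sketch computes both features for a synthetic smooth signal and for the
same signal with a few added spikes. The signals are invented for demonstration and are
not the LNG pump data; the high pass filtering and entropy steps are omitted, and the
function names are hypothetical.

```python
import numpy as np

def crest_factor(signal):
    """Peak absolute amplitude divided by the root-mean-square amplitude."""
    signal = np.asarray(signal, dtype=float)
    return float(np.max(np.abs(signal)) / np.sqrt(np.mean(signal ** 2)))

def kurtosis(signal):
    """Fourth central moment divided by the squared variance (non-excess form)."""
    centred = np.asarray(signal, dtype=float) - np.mean(signal)
    return float(np.mean(centred ** 4) / np.mean(centred ** 2) ** 2)

fs = 12800  # sampling frequency in Hz, as listed in Table 4-3
t = np.arange(fs) / fs  # one second of samples
smooth = np.sin(2 * np.pi * 50 * t)  # synthetic signal without shocks
impulsive = smooth.copy()
impulsive[::1280] += 5.0  # add ten shock-like spikes (purely illustrative)
```

For the pure sine wave the crest factor equals the square root of two and the kurtosis
equals 1.5, while both features increase once the spikes are added.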
Using vibration signals collected from Pump P301D, the parameters of the proposed
model were estimated as:
0.01087, 0.02621, (1.658, 0.6134, 2.392),
and
$$\Sigma = \begin{pmatrix} 5.295 & 1.439 & 1.764 \\ 1.439 & 5.356 & 1.099 \\ 1.764 & 1.099 & 5.741 \end{pmatrix} \times 10^{\ldots}.$$
Next, the effectiveness of the three indicators was investigated. Table 4-4
shows that the crest factor of the raw signals has the highest relative contribution
ratio. However, the relative contribution ratios of the three features are of similar
magnitude, and all of them are clearly above one (between 1.305 and 2.155).
Therefore, none of the features can be omitted.
Table 4-4: Effectiveness evaluation for the three features extracted from the vibration signals
Features                             Relative contribution ratio
Entropy after HPF at 3000 Hz         1.594
Crest factor after HPF at 2500 Hz    1.305
Crest factor of the raw signal       2.155
Using the model parameters estimated from the vibration signals collected from
P301D, the RUL of the bearing installed on Pump P301C was estimated as shown in
Figure 4-8. At the beginning, the prediction error was significant. This was caused by the
difference between the lifetimes of the training dataset and the test dataset. At the
beginning, only a few condition monitoring observations had been collected, so the RUL
was predicted largely on the basis of the lifetime of the training dataset (3,511 hours),
which was much shorter than that of the test data (4,698 hours). Consequently, the
predicted RUL was shorter than the actual value. When a longer indicator history was
considered, the slower degradation of the bearing from P301C was detected. As a result,
the prediction error decreased. In the last stage of life in particular, the prediction
results were very close to the real values. Figure 4-8 also illustrates that most actual
RUL values fall within the 95% confidence interval, even at the beginning of the life.
Figure 4-8: RUL prediction results of the bearing on P301C
This research also used degradation indicators from both P301C and P301D as
training data. The RUL of the bearing on P301C was then predicted using this
training result. The RUL prediction errors of P301C at different times, using both
degradation indicator sequences and using one degradation indicator sequence, are
shown in Table 4-5. After the degradation data sequence from P301C was added to the
training dataset, the prediction errors were smaller at eight of the twelve tabulated
times, and the mean of the tabulated errors fell from about 603 hours to about 432
hours. The over-conservative estimate of RUL at the beginning was partially overcome;
the non-conservative estimate of RUL was also resolved. This indicates that more reliable
prediction results can be obtained if more training datasets are available.
Table 4-5: RUL prediction results of the bearing on P301C
Operation hours   RUL prediction error (hours),   RUL prediction error (hours),
                  both degradation sequences      one degradation sequence
1                 540                             1141
342               333                             889
480               480                             948
654               394                             950
836               352                             777
1170              938                             1038
2072              250                             390
2876              560                             346
3369              712                             373
3482              375                             45
3783              162                             126
4228              87                              207
4.4.3 Discussion
In this case study, the inspection intervals were highly irregular, varying
from 3 hours to 133 hours. Converting these uneven observation intervals to equal
ones by interpolation is extremely difficult. Therefore, degradation models with the
discrete time assumption (e.g. Wang 2007) are not appropriate for this case
study. Moreover, discretising the degradation indicators is also difficult because of
the inadequate knowledge of the degradation process of a bearing on an LNG pump.
Therefore, the Gamma-based state space degradation model, which is continuous in time
and state, is preferable in this case study.
The results of this case study show that the proposed Gamma-based state space
model can combine the information from degradation indicators and lifetimes.
Furthermore, using the particle filtering method, the remaining useful life estimate
can be updated recursively by considering the current degradation indicators.
4.5 Chapter Summary
This chapter jointly models multiple indirect indicators and event data using a
Gamma-based state space model. To deal with the non-Gaussian property of the
proposed model, a Monte Carlo-based EM algorithm has been proposed to estimate
the parameters, and censored degradation data have been considered in the
parameter estimation algorithm. The asset life prediction algorithm has also been
developed using the Monte Carlo method and Bayesian theory. In addition, this
chapter has developed an effectiveness evaluation method for degradation indicators
to identify the relative importance of the degradation indicators adopted in the state
space model. The performance of the proposed algorithms has been evaluated in
simulation studies and a real application.
Compared with existing state space degradation models, the developed model is
continuous in time and state, and does not rely on the Gaussian assumption. This
continuous property enables the proposed model to process irregular inspection
intervals and to avoid discretising continuous degradation indicators. Furthermore, the
monotonically increasing Gamma process used in the proposed model is more
appropriate for modelling irreversible asset health degradation processes than the
commonly used Gaussian process. The monotonically increasing property of the
Gamma process also makes the construction of the likelihood function easier than
for non-monotonic stochastic processes when failure events are
considered.
5 Maintenance Strategy Optimisation Using the POSMDP
Chapters 3 and 4 develop degradation modelling methods that can consider direct
indicators, indirect indicators, and event data. Based on these degradation modelling
methods and additional information about costs and durations of maintenance
activities, optimal maintenance strategies with respect to long-run average cost per
unit time or availability can be further developed. This chapter develops a POSMDP
to optimise maintenance strategies of engineering assets with continuous
degradation processes and partially observable health states.
Due to the limitation of current CM technologies, the actual health state of an asset
may not be revealed accurately by health inspections. A maintenance strategy
ignoring this uncertainty of health inspections can cause additional costs or
downtime. Therefore, the maintenance decision-making should be based on a
degradation model that considers these imperfect inspections, when the uncertainty
of asset health inspections is not negligible. In this chapter, the state space model is
used to model degradation processes with imperfect inspections. In the state space
degradation model, the current health state can be represented as a distribution
conditional on historical maintenance activities and inspection results. For a state
space model discrete in state, the dimension number of this distribution is equal to
the number of health states minus one. For a continuous state space model, the
dimension number of this distribution can become infinite. Maintenance decision-
making based on these multi-dimensional health state distributions is more complex
than that based on known values of health states.
A commonly used approach to maintenance strategy optimisation for a
partially observable degradation process is the POMDP. As an extension of the
MDP, the POMDP can deal effectively with state dependent maintenance costs (or
durations) and multiple maintenance actions. Moreover, when
performing maintenance strategy optimisation, the POMDP does not assume special
strategy structures (e.g., the control limit theory), which are not necessarily optimal.
However, the existing POMDPs adopted in maintenance
decision-making are largely discrete in time and have a limited number of health
states. While these two assumptions make the POMDP more mathematically
tractable, the discrete time assumption requires that health state transitions and
maintenance activities happen only at discrete epochs, which cannot model the
failure time accurately and is not cost-effective. A limited number of health states,
on the other hand, may not be fine-grained enough to improve the effectiveness of
maintenance.
To optimise maintenance strategy for the Gamma-based state space model that is
continuous in time and state, this chapter develops a POSMDP which is continuous
in time and state. When the state of a POMDP is continuous, the dimension number
of the health state distributions may become infinite. To reduce the dimension
number of the health state distributions, this research adopts the density projection
method that was used in the parametric POMDP (Brooks et al. 2006; Brooks and
Williams 2007; Zhou et al. to appear). By using a Monte Carlo-based density
projection method, the POSMDP is converted to a completely observable SMDP.
The converted SMDP is then solved using the policy iteration adopted in (Tijms and
van der Duyn Schouten 1985; Moustafa et al. 2004). Because Monte Carlo-based
methods are used during density projection, the proposed POSMDP can deal with
non-Gaussian non-linear state space models.
The remainder of this chapter is organised as follows. Section 5.1 introduces
formulations and notations used in this chapter. Section 5.2 applies the POSMDP to
optimise the maintenance strategy in which inspection intervals are fixed and
preventive replacement can only happen immediately after inspections. Section 5.3
optimises both the next maintenance activity and the waiting duration until the next
maintenance activity simultaneously. Section 5.4 further considers imperfect
maintenance with random effects and state dependent random durations.
5.1 Problem Formulation
In this chapter, the degradation process of an engineering asset is assumed to follow
the Gamma-based state space model given by the state equation
$$x(t + \Delta t) - x(t) \sim \mathrm{Ga}(\alpha \cdot \Delta t, \beta)$$ (5-1)
and the observation equation
$$y(t) \sim N\left(x(t), \sigma^2\right),$$ (5-2)
where $x(t)$ denotes the health state of an asset at time $t$. A larger value of $x(t)$
indicates a worse health state. When $x(t)$ crosses a predetermined failure threshold
$\Lambda$, a failure will occur. The underlying health state $x(t)$ follows a Gamma process.
The function $\mathrm{Ga}(\alpha \cdot \Delta t, \beta)$ denotes the PDF of a Gamma
distribution with a shape parameter $\alpha \cdot \Delta t$ and a scale parameter $\beta$.
An asset does not have any initial defects ($x(0) = 0$). The observation $y(t)$ of the
health state is assumed to follow a normal
distribution with a mean value $x(t)$ and a standard deviation $\sigma$.
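A degradation sequence that follows this model can be simulated in a few lines. The
Python sketch below is illustrative only: the function and argument names are
hypothetical, inspections are taken at a fixed interval `dt`, and the numerical values
mirror those of the simulation study in Section 5.2.1.1, with the assignment of 2.5 to
the shape parameter and 0.5 to the scale parameter being an assumption.

```python
import numpy as np

def simulate_degradation(alpha, beta, sigma, threshold, dt, rng, max_steps=10000):
    """Simulate inspected health states and noisy observations up to failure.

    Returns the inspection times, the hidden states and the observations for
    all inspections at which the state is still below the failure threshold.
    """
    times, states, observations = [], [], []
    x, t = 0.0, 0.0  # no initial defect
    for _ in range(max_steps):
        x += rng.gamma(shape=alpha * dt, scale=beta)  # Gamma increment
        t += dt
        if x >= threshold:
            break  # failure occurred during this interval
        times.append(t)
        states.append(x)
        observations.append(rng.normal(loc=x, scale=sigma))
    return np.array(times), np.array(states), np.array(observations)

rng = np.random.default_rng(2)
times, states, obs = simulate_degradation(alpha=2.5, beta=0.5, sigma=0.3,
                                          threshold=4.0, dt=0.65, rng=rng)
```

The hidden states never decrease, whereas the noisy observations may, which is the reason
the health state has to be inferred rather than read directly from the inspections.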
Three types of maintenance activities are considered in this chapter, i.e., health
inspections, replacement (preventive or corrective), and imperfect maintenance. The
costs of the inspection, replacement, and imperfect maintenance are denoted by ,
, and . The corresponding durations are , , and . An additional cost
and a breakdown of length will be incurred when a failure happens. The failure
is assumed to be detected immediately, and followed by a corrective replacement.
Both the corrective and preventive replacement bring an asset to a state as good as
new, while an imperfect maintenance improves the asset health to a state better than
old and worse than new.
In Sections 5.2 and 5.3, the objective function of the maintenance strategy
optimisation is the long-run expected cost per unit time. The optimisation with the
objective to maximise long-run availability is investigated in Section 5.4.
5.2 Regular Maintenance Intervals
This section demonstrates the solving process and the performance of the POSMDP
by investigating a CBM strategy with regular maintenance intervals. In this
maintenance strategy, only preventive replacement and health inspections are
considered, and the inspection interval ∆ is a state independent constant. The
degradation process and the inspection results are assumed to follow Equations (5-1)
and (5-2). A failure is assumed to be detected immediately, and is followed
immediately by a corrective maintenance. The durations to carry out maintenance
activities and the breakdown caused by a failure are assumed negligible (i.e.,
0 ). An optimal maintenance strategy minimising the long-run
expected cost per unit time is developed using the POSMDP. The obtained strategy
is compared with a strategy simply ignoring the observation noise and a heuristic
strategy setting a fixed threshold on average values of filtering particles.
5.2.1 Solving the POSMDP
When a degradation process follows the Gamma-based state space model, the
particle filter is used to estimate the health state, and the dimension of the
resulting beliefs is very large. These high dimensional beliefs make the POSMDP
difficult to solve. To reduce the dimension of the beliefs, this research first performs
a Monte Carlo-based density projection, which projects the beliefs to a parametric
distribution space. Grid points are then selected in this projected belief space. After
that, the relative cost functions starting at these grid points are established, and
policy iteration is conducted to identify the optimal maintenance strategies at these
grid points. The detailed solving process of the POSMDP is introduced as follows.
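Before the individual steps are described, it may help to see how a particle belief arises
in the first place. The Python sketch below shows one step of a basic bootstrap particle
filter for a Gamma state equation with Gaussian observation noise. It is a simplified
illustration, not the filter used in this research: the failure threshold is ignored,
multinomial resampling is assumed, and all names and numerical values are hypothetical.

```python
import numpy as np

def particle_filter_step(particles, y, alpha, beta, sigma, dt, rng):
    """Propagate, weight and resample particles for one inspection."""
    # Propagate through the state equation with Gamma increments.
    proposed = particles + rng.gamma(shape=alpha * dt, scale=beta, size=particles.size)
    # Weight by the Gaussian observation likelihood (log scale for stability).
    log_w = -0.5 * ((y - proposed) / sigma) ** 2
    weights = np.exp(log_w - log_w.max())
    weights /= weights.sum()
    # Multinomial resampling gives an equally weighted set for the next step.
    idx = rng.choice(particles.size, size=particles.size, p=weights)
    return proposed, weights, proposed[idx]

rng = np.random.default_rng(3)
alpha, beta, sigma, dt = 2.5, 0.5, 0.3, 0.65
true_x, particles = 0.0, np.zeros(2000)
errors = []
for _ in range(4):
    true_x += rng.gamma(shape=alpha * dt, scale=beta)  # hidden state
    y = rng.normal(true_x, sigma)                      # noisy inspection
    proposed, weights, particles = particle_filter_step(
        particles, y, alpha, beta, sigma, dt, rng)
    errors.append(float(np.sum(weights * proposed) - true_x))
```

The pair `(proposed, weights)` is the weighted particle belief after each inspection; its
size grows with the number of particles, which is why a lower-dimensional representation
is needed.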
5.2.1.1 Density Projection
After particle filtering, the beliefs of the POSMDP can be obtained as a set of
particles and the corresponding weights, i.e.
$b = \{(x^i, w^i);\ i = 1, 2, \ldots, N_s\}$ with $\sum_{i=1}^{N_s} w^i = 1$, where
$x^i$ is the $i$th particle of the filtering result, $w^i$ is the weight of $x^i$, and
$N_s$ is the number of particles used during the filtering. The dimension number of the
belief space $B$ is equal to $2N_s - 1$. A large number of particles are often used during
particle filtering to obtain accurate estimates of asset health states. Consequently, the
belief space has a high dimension. To reduce the belief dimension, the space $B$ is
projected to a new parametric density space
$\Omega = \{f(\cdot; \theta);\ \theta \in \Theta\}$, where $f(\cdot; \theta)$ is
the PDF of a certain parametric distribution, and $\Theta$ is the parameter space of this
distribution. This density projection is performed by maximum likelihood
estimation as
$$\mathrm{Proj}_{\Omega}(b) = \arg\max_{f \in \Omega} \sum_{i=1}^{N_s} w^i \log f(x^i; \theta).$$ (5-3)
Thus the dimension number of the projected belief space $\Omega$ is reduced to the
number of parameters used by the distribution $f(\cdot; \theta)$.
The parametric distribution $f(\cdot; \theta)$ should be able to closely approximate the
original belief $b$ with a small number of parameters. According to the particle
filtering results of the Gamma-based state space model, two candidates are
considered in this research, i.e. the Gaussian distribution and the Beta distribution.
The Gaussian distribution is one of the most commonly used distributions, and is
straightforward to apply. However, the domain of the Gaussian distribution is
$(-\infty, \infty)$, while the filtering particles vary from 0 to $\Lambda$. Consequently,
this research censors the original Gaussian distribution to the domain $[0, \Lambda]$.
The PDF of the censored Gaussian distribution is given by
$$f(x; \mu_b, \sigma_b) = \frac{\dfrac{1}{\sqrt{2\pi}\,\sigma_b} \exp\left(-\dfrac{(x - \mu_b)^2}{2\sigma_b^2}\right)}{\displaystyle\int_0^{\Lambda} \dfrac{1}{\sqrt{2\pi}\,\sigma_b} \exp\left(-\dfrac{(u - \mu_b)^2}{2\sigma_b^2}\right) \mathrm{d}u}, \quad 0 \le x \le \Lambda.$$ (5-4)
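The projection of a weighted particle set onto a Gaussian restricted to the interval
between zero and the failure threshold can be sketched as follows. This is a minimal
illustration under stated assumptions: the maximisation is done by a plain grid search,
which stands in for whatever optimiser is actually used, and the function name, grid
ranges and test particles are hypothetical.

```python
import math
import numpy as np

# Standard normal CDF applied element-wise (built from math.erf).
_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def project_to_truncated_gaussian(particles, weights, upper, mu_grid, sd_grid):
    """Return the (mu, sd) grid point maximising the weighted log-likelihood."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    x = np.asarray(particles, dtype=float)
    s1, s2 = np.sum(w * x), np.sum(w * x ** 2)  # weighted sufficient statistics
    mu, sd = np.meshgrid(mu_grid, sd_grid, indexing="ij")
    sq_dev = s2 - 2.0 * mu * s1 + mu ** 2  # sum_i w_i (x_i - mu)^2
    # Probability mass of the unrestricted Gaussian inside [0, upper].
    mass = _norm_cdf((upper - mu) / sd) - _norm_cdf(-mu / sd)
    loglik = (-np.log(sd) - 0.5 * math.log(2.0 * math.pi)
              - sq_dev / (2.0 * sd ** 2) - np.log(mass))
    i, j = np.unravel_index(np.argmax(loglik), loglik.shape)
    return float(mu_grid[i]), float(sd_grid[j])

rng = np.random.default_rng(4)
# Stand-in particle belief centred at 2.0 with equal weights.
particles = np.clip(rng.normal(2.0, 0.2, size=2000), 0.0, 4.0)
weights = np.full(particles.size, 1.0 / particles.size)
mu_hat, sd_hat = project_to_truncated_gaussian(
    particles, weights, upper=4.0,
    mu_grid=np.linspace(0.0, 4.0, 201), sd_grid=np.linspace(0.05, 0.6, 111))
```

Two numbers now summarise a belief that was originally described by thousands of
particles and weights, which is the dimension reduction that the density projection is
intended to deliver.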
Unlike the Gaussian distribution, the Beta distribution is defined on the
domain $[0, 1]$. Therefore, the ratio $r = x / \Lambda$ of the health state to the failure
threshold can be assumed to follow the Beta distribution, and the PDF of $r$ is given by
$$f(r; p, q) = \frac{1}{B(p, q)}\, r^{p - 1} (1 - r)^{q - 1}.$$ (5-5)
To compare the two candidate parametric distributions, a simulation study was carried
out. In the simulation study, the parameters of the Gamma-based state space
model were set as 2.5 and 0.5. Different levels of observation noise were
considered, i.e. σ = 0.1, 0.3, 0.5, 0.7, and 0.9. The failure
threshold was Λ = 4. These parameters were selected for demonstration only and
have no particular physical meaning. The inspection interval was ∆ = 0.65, which
is the optimal inspection interval when health states are completely observable. The
derivation of this optimal inspection interval is given in Section 5.2.2. One thousand
sequences of simulated degradation data were generated and processed by particle
filtering with 1,000 particles. The filtering results were then fitted by the censored
Gaussian distribution and the Beta distribution respectively. Because both
distributions have two parameters, the goodness of fit can be compared directly
through the likelihood values. The distribution with the higher likelihood value is
preferred. The mean likelihood values obtained when the censored Gaussian distribution
and the Beta distribution were fitted to the data are listed in Table 5-1. Table 5-1
shows that the censored Gaussian distribution fitted better than the
Beta distribution at every noise level, especially for small observation noise. The
parameter estimates obtained when the two distributions were fitted to the filtering
results of the simulation data with observation noise σ = 0.3 are plotted in Figure 5-1
and Figure 5-2. The two figures show that the parameters of the censored Gaussian
distribution are spread more regularly than those of the Beta distribution.
Consequently, the parameter space of the censored Gaussian distribution can be
discretised more easily. For these two reasons, the censored Gaussian distribution is
adopted as the projected parametric distribution $f(\cdot; \theta)$.
Figure 5-1: Parameters spreading of the censored Gaussian distribution
Figure 5-2: Parameters spreading of the Beta distribution
Table 5-1: Mean likelihood values of the censored Gaussian distribution and the Beta distribution under different observation noise
Observation noise σ              0.1      0.3      0.5      0.7      0.9
Censored Gaussian distribution   956.5    17.62    -331.1   -522.2   -641.4
Beta distribution                -2331    -1384    -1037    -850.0   -732.0
To solve the POSMDP, the projected belief space
$\Omega = \{f(\cdot; \theta);\ \theta \in \Theta\}$ should be
discretised. The discretisation of $\Omega$ is essentially selecting a set of grid points
$\{\theta_k = (\mu_{b,k}, \sigma_{b,k});\ k = 1, 2, \ldots, N_g\}$ in the parameter space
$\Theta$, where $N_g$ is the number of the
grid points. The corresponding sampling points in the projected belief space are
$\Omega_g = \{f(\cdot; \theta_k);\ k = 1, 2, \ldots, N_g\} \subset \Omega$. As shown in
Figure 5-1, most points of
parameters appear in a certain area of the parameter space, i.e.,
$\{(\mu_b, \sigma_b) \mid 0 \le \mu_b \le 4,\ 0.1 \le \sigma_b \le 0.34\}$, where $\mu_b$
and $\sigma_b$ denote the two parameters of the censored Gaussian distribution. In this
situation, the grid points are only
chosen from this area with a proper resolution. The principle of selecting the grid
points in $\Theta$ depends on the relative cost function that is used in policy iteration
and is discussed in Section 5.2.1.2.
5.2.1.2 The Relative Cost Function of the POSMDP
The relative cost function is a crucial part of the policy iteration algorithm that
solves the SMDP (Maillart 2006). It formulates the relative cost of a single step in a
long-run decision process. According to the assumptions discussed at the beginning
of Section 5.2, the relative cost function can be written as
min , . (5-6)
Here, denotes the relative cost starting in the projected belief state , where
and denote the relative costs starting in if the “preventive
replacement” and “do nothing” strategies are adopted, respectively. Further,
is given by
, (5-7)
where denotes the relative cost when an asset is brand new, i.e., 0;
can be calculated as
1 ∆ |
∑ ∆ | ∆ |; (5-8)
∆ is the inspection interval; is the long-run minimum expected cost per unit time;
∆ | is the expected reliability at the next inspection epoch given that the
current belief state is projected as ; ∆ | is the expected survival time during
the next inspection interval when the current projected belief is . According to the
properties of the Gamma process, ∆ | and ∆ | can be calculated as
∆ | Pr Λ ∆ |·∆ ,
·∆ √
√
(5-9)
and
∆ | |∆
· ,· √
∆
√
. (5-10)
The matrix is the transition matrix in the discretised projected belief space
Ω over one inspection interval, i.e.
Pr ∆ , ∆ . (5-11)
The calculation of is discussed in detail later.
After the relative cost functions are established, an efficient strategy to select grid
points from the parameter space Θ can be developed. According to Equation
(5-7), the relative cost is independent of the projected belief state when
preventive replacement is the optimal maintenance action. Preventive
replacement is optimal only if an asset is in a poor health state. Consequently, when
constructing grid points in Θ, only one value of is needed to represent the situation
when the optimal strategy is preventive replacement. On the other hand, a high
resolution should be applied when is near the preventive replacement threshold.
According to Equation (5-8), the calculation of the relative cost at when is near
the preventive replacement threshold may depend on the relative costs at all the
sample points. However, due to the monotonically increasing property of the
underlying degradation process, the value is close to zero if . Therefore,
the resolution of can be lower when it is much smaller than the preventive replacement
threshold.
5.2.1.3 Calculation of the Transition Matrix
Due to the non-Gaussian property of the Gamma-based state space model, the
transition matrix is calculated through Monte Carlo-based methods. The belief
state is obtained using the particle filter based on both the previous belief state
∆ and the current observation . Therefore, the observation is
considered during the calculation of . The elements of are calculated as
Pr ∆ , ∆Pr ∆ , ∆ , ∆
· ∆ , ∆. (5-12)
The first component of Equation (5-12) denotes the conditional probability density
of the observation after one inspection interval given the current belief state and the
fact that the failure does not happen during that inspection interval. The second
component of Equation (5-12) is the conditional probability that the discretised
projected belief state equals to at the next inspection epoch given the current
projected belief state, the observation at the next inspection epoch, and the fact that
the failure does not happen. According to Equation (5-12) the Monte Carlo-based
algorithm that calculates is developed as in Table 5-2.
Table 5-2: The Monte Carlo-based method that calculates the transition matrix
Step 1: Generate 2 samples of the health state , , … , from the censored
Gaussian distribution ·, .
Step 2: Predict the corresponding health states , , … , after one inspection
interval according to the state equation (2-1).
Step 3: Resample 2 samples of the health state , , … , from ;
1,2, … ,2 and , i.e. the subset of the health state samples
, , … , whose values indicate that a failure does not happen.
Step 4: Generate observation samples , , … , corresponding to the
health state samples , , … , using the observation equation (5-2).
Step 5: Calculate the weights of the health state samples , , … , ,
using each observation sample , according to the observation equation
(5-2). The calculation process is as follows:
; , ∑ ; , ; 1,2, … , ;
1,2, … , .
Step 6: Project the sample-weight sets , ; 1,2, … , ;
1,2, … , to the parametric distribution space Ω according to the projection
function (5-3), and get samples in the projected belief space Ω as
; 1,2, … , .
Step 7: Find the nearest neighbour of each projected belief state in the
discretised projected belief space Ω ; 1,2, … , , record the
frequency of all the elements in Ω as ; 1,2, … , , obviously,
∑ .
Step 8: Obtain elements in the th row of the transition matrix , as
/ ; 1,2, … , . The elements in the other rows of can be obtained
in the same way.
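The steps of Table 5-2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this research: the state equation is taken to add Gamma-distributed increments, the observation equation is taken to add zero-mean Gaussian noise, the censored Gaussian is drawn by clipping a Gaussian to the interval from zero to the failure threshold, and the projection is replaced by simple moment matching of the weighted samples. The function name and all argument names (`transition_row`, `shape_rate`, `scale`, `x_fail`, `obs_sigma`) are hypothetical.

```python
import numpy as np

def transition_row(mu, sigma, grid_mu, grid_sigma, n, dt,
                   shape_rate, scale, x_fail, obs_sigma, rng):
    """Estimate one row of the transition matrix by Monte Carlo (illustrative)."""
    # Step 1: draw 2n health states; clipping stands in for the censored Gaussian (assumption).
    x0 = np.clip(rng.normal(mu, sigma, 2 * n), 0.0, x_fail)
    # Step 2: propagate one inspection interval with Gamma increments (assumed state equation).
    x1 = x0 + rng.gamma(shape_rate * dt, scale, 2 * n)
    # Step 3: keep only samples below the failure threshold and resample 2n of them.
    alive = x1[x1 < x_fail]
    if alive.size == 0:
        raise ValueError("every sample failed within the interval")
    x1 = rng.choice(alive, size=2 * n, replace=True)
    x_obs, x_state = x1[:n], x1[n:]  # separate halves keep observations and states independent
    # Step 4: simulate observations with additive Gaussian noise (assumed observation equation).
    y = x_obs + rng.normal(0.0, obs_sigma, n)
    # Step 5: normalised weights w[i, j] of state sample i given observation j.
    log_w = -0.5 * ((y[None, :] - x_state[:, None]) / obs_sigma) ** 2
    w = np.exp(log_w - log_w.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)
    # Step 6: project each weighted sample set by moment matching (stand-in for the projection).
    m = (w * x_state[:, None]).sum(axis=0)
    s = np.sqrt((w * (x_state[:, None] - m[None, :]) ** 2).sum(axis=0))
    # Step 7: nearest grid point per parameter axis, then count the hits.
    i_mu = np.abs(m[:, None] - grid_mu[None, :]).argmin(axis=1)
    i_sd = np.abs(s[:, None] - grid_sigma[None, :]).argmin(axis=1)
    counts = np.bincount(i_mu * grid_sigma.size + i_sd,
                         minlength=grid_mu.size * grid_sigma.size)
    # Step 8: relative frequencies form the row of the transition matrix.
    return counts / n
```

Calling the function once per grid point, each time with that grid point's mean and standard deviation, would fill the matrix row by row; the returned row is conditional on no failure occurring during the interval.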
In the first step, 2 instead of samples of the health state are generated so that
the independence between the observation samples , , … , and the health
state samples , , … , is guaranteed. The resampling in the third step
satisfies the condition contained in the first component of Equation (5-12), i.e. the
failure does not happen. When the distance between two distributions is measured
during the seventh step, the commonly used Kullback–Leibler divergence (KL
divergence) is a candidate. However, in this research, the grid points
, ; 1,2, … , in the parameter space are all located at the vertices of rectangles.
Therefore, for any point in the parameter space, there is a unique grid point
that satisfies the following two equations simultaneously: arg min , ,…, |
| and arg min , ,…, | | . This research defines a distance measure
·,· given by
·, , ·, | |, (5-13)
which can be calculated much more efficiently than the KL divergence. Because the
calculation of the distance between distributions is performed times for each row
in , adopting ·,· instead of the commonly used KL divergence can improve the
overall efficiency of the algorithm in Table 5-2 significantly.
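The saving described above can be illustrated with a short sketch. Because the grid is rectangular, the nearest grid point can be found by one independent search along the mean axis and one along the standard deviation axis, instead of scanning every grid point. The function names are hypothetical, and the brute-force reference uses the sum of absolute parameter differences as an assumed distance, included only to check that both searches agree.

```python
def nearest_grid_point(mean, std, grid_means, grid_stds):
    """Per-axis search: cost grows with len(grid_means) + len(grid_stds)."""
    i = min(range(len(grid_means)), key=lambda k: abs(grid_means[k] - mean))
    j = min(range(len(grid_stds)), key=lambda k: abs(grid_stds[k] - std))
    return i, j

def nearest_grid_point_brute_force(mean, std, grid_means, grid_stds):
    """Reference scan over every grid point (assumed distance: sum of absolute differences)."""
    best = min((abs(gm - mean) + abs(gs - std), i, j)
               for i, gm in enumerate(grid_means)
               for j, gs in enumerate(grid_stds))
    return best[1], best[2]

grid_means = [0.5, 1.0, 1.5, 2.0]
grid_stds = [0.1, 0.2, 0.3]
print(nearest_grid_point(1.2, 0.27, grid_means, grid_stds))
print(nearest_grid_point_brute_force(1.2, 0.27, grid_means, grid_stds))
```

For sorted axes the per-axis search could be replaced by a binary search, which would reduce the cost further; that refinement is omitted here for brevity.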
5.2.1.4 Policy Iteration
Policy iteration finds an optimal maintenance policy that minimises the
long-run expected cost per unit time. A policy is denoted as
Ω and , , where represents “do nothing”, and stands for
“preventive replacement”. The main idea of policy iteration is calculating a new
policy · iteratively by minimising the relative cost obtained using the current
policy · . This iteration continues until · converges to an optimal policy
· . For this particular maintenance strategy optimisation problem, the process of
policy iteration is demonstrated in Table 5-3.
Table 5-3: The process of policy iteration for the POSMDP
Step 1: Set an initial maintenance policy: and for
1,2, … , , where the belief denotes the brand new health state.
Step 2: Solve the following system of equations of ; 1,2, … , and :
· · ; 1,2, … ,
, where 0, and are given by (5-8) and (5-7), and
· is the indicator function given by
0,1, (5-14)
Step 3: Calculate relative cost functions and 1,2, … , using
the solutions obtained in Step 2.
Step 4: Obtain the improved policy by:
,, 1,2, … , (5-15)
Step 5: If · · , the optimal maintenance strategy · is obtained as
· . Otherwise, go to Step 2 and start a new iteration.
The obtained optimal maintenance strategy · is defined in the discretised
projected belief space Ω . Therefore, when this strategy is implemented, the belief
obtained by the particle filter should be projected to the parametric distribution space
Ω, and then discretised to the space Ω using the nearest neighbour method.
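The loop in Table 5-3 follows the usual pattern of average-cost policy iteration, which the following sketch illustrates on a generic two-action problem. It is not a transcription of Equations (5-7) and (5-8): the inputs `P` (transition matrix given survival), `R` (reliability over one interval), `tau` (expected sojourn time), `c_dn` (expected one-step cost of "do nothing") and `c_pr` (replacement cost) are hypothetical, replacement is assumed to be instantaneous, and a failure is assumed to return the asset to the brand new state, whose relative cost is fixed at zero.

```python
import numpy as np

DO_NOTHING, REPLACE = 0, 1

def evaluate(policy, P, R, tau, c_dn, c_pr):
    """Solve the linear system for the cost rate g and relative costs h, with h[0] = 0."""
    n = len(policy)
    dn = policy == DO_NOTHING
    A = np.eye(n)
    A[dn] -= R[dn, None] * P[dn]       # surviving transitions under "do nothing"
    A[:, 0] = np.where(dn, tau, 0.0)   # h[0] is fixed at 0, so column 0 carries g instead
    b = np.where(dn, c_dn, c_pr)       # "replace" rows reduce to h[i] = c_pr + h[0]
    sol = np.linalg.solve(A, b)
    h = sol.copy()
    h[0] = 0.0
    return sol[0], h

def policy_iteration(P, R, tau, c_dn, c_pr, max_iter=100):
    n = len(R)
    policy = np.full(n, REPLACE)
    policy[0] = DO_NOTHING             # the brand new state is never replaced
    g = None
    for _ in range(max_iter):
        g, h = evaluate(policy, P, R, tau, c_dn, c_pr)
        v_dn = c_dn - g * tau + R * (P @ h)   # relative cost of "do nothing"
        improved = np.where(c_pr < v_dn, REPLACE, DO_NOTHING)
        improved[0] = DO_NOTHING
        if np.array_equal(improved, policy):  # policy unchanged: stop
            break
        policy = improved
    return policy, g
```

On a small toy problem with five states ordered from new to worn, this sketch returns a control-limit style policy that does nothing in the healthiest states and replaces in the rest. In the POSMDP the states are discretised projected beliefs rather than health levels, so the resulting policy need not have such a simple structure.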
5.2.2 Simulation Study
The process of solving the POSMDP entails the projection of beliefs that are
obtained by particle filtering to a parametric distribution space, the discretisation of
the projected belief space, and the Monte Carlo-based method that calculates the
transition matrix. These approximations may affect the optimisation results.
Therefore, it is important to investigate the performance of the developed POSMDP
through a simulation study. During this simulation study, the maintenance strategy
developed using the POSMDP was compared with a strategy simply ignoring the
observation noise and a heuristic strategy setting a preventive replacement threshold
on the mean values of filtering particles.
In this simulation study, the parameters of the Gamma-based state space model were
selected as follows: 2.5, 0.5, and different values of σ were considered,
i.e., σ = 0.1, σ = 0.3, σ = 0.5, and σ = 0.9. The failure threshold on the
underlying Gamma process was set as 4. A large number (i.e. 10^6) of
degradation sequences were generated for each value of σ. The costs of
inspection, preventive replacement, and unexpected breakdown were set as 0.1, 1, and 10, respectively.
When the underlying health states can be observed deterministically, the renewal
theory can be applied to identify the preventive replacement threshold that
minimises the long-run expected cost per unit time. The derivation of the preventive
replacement threshold using the renewal theory was discussed in (Park 1988). Using
the algorithm developed in (Park 1988), the optimal thresholds for different
inspection intervals were calculated and the corresponding long-run expected costs
per unit time are plotted in Figure 5-3. As shown in Figure 5-3, the optimal inspection
interval is 0.65 and the corresponding preventive replacement threshold is 1.836.
Applying this preventive replacement threshold directly to the imperfect inspection
results of the simulation data gives the average costs per unit time listed in the first
column of Table 5-4. The preventive replacement threshold was also set on the mean values of
filtering particles, and the obtained average costs per unit time are listed in the
second column of Table 5-4. During the particle filtering, 400 particles were used.
Figure 5-3: Minimum long-run average cost according to different inspection intervals when
actual health states are observable
This maintenance decision-making problem was also solved through the POSMDP.
When discretising the projected belief space, the selection of grid points followed
the principle discussed in Section 5.2.1.2. For example, when the observation noise
σ = 0.3, the sampling points of the standard deviation were selected from 0.1 to
0.34 with a sample interval of 0.03. The mean value was sampled with multiple
resolutions: from 0.2 to 1.4 the resolution was 0.2; from 1.5 to 1.86 the resolution
was 0.01; and a single sample point at 1.9 was selected to represent the defective health
state that needs preventive replacement. Four hundred particles were used in the
Monte Carlo method that calculates the transition matrix . The obtained optimal
maintenance strategies when observation noise σ = 0.3 are shown in Figure 5-4.
The figure shows that the optimal maintenance strategies depend not only on the
mean value parameter of the censored Gaussian distribution but also on its
standard deviation parameter. Therefore, the maintenance strategy developed using
the POSMDP differs from the heuristic strategy that just sets a threshold on the
mean value of the filtering results. The average costs per unit time derived by the
POSMDP are listed in the last column of Table 5-4.
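The multi-resolution grid described above for σ = 0.3 can be written down explicitly. The sketch below uses the sampling ranges quoted in the text; the helper name `arange_inclusive` is hypothetical, and combining every mean value with every standard deviation value as a full Cartesian product is an assumption, since the text does not state how the single point at 1.9 is paired with the standard deviation samples.

```python
from itertools import product

def arange_inclusive(start, stop, step, ndigits=2):
    """Evenly spaced values from start to stop inclusive, rounded to avoid float drift."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + k * step, ndigits) for k in range(count)]

# Standard deviation parameter: 0.1 to 0.34 in steps of 0.03.
stds = arange_inclusive(0.10, 0.34, 0.03)

# Mean parameter: coarse far below the threshold, fine near it, one point beyond it.
means = (arange_inclusive(0.2, 1.4, 0.2)
         + arange_inclusive(1.50, 1.86, 0.01)
         + [1.9])

# Assumed pairing: full Cartesian product of the two axes.
grid = list(product(means, stds))
print(len(means), len(stds), len(grid))
```

The fine region between 1.5 and 1.86 brackets the threshold of 1.836 obtained from the renewal theory, which is where the choice between the two actions is most sensitive to the belief state.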
Figure 5-4: The results of the policy iteration when maintenance intervals are regular and the
standard deviation of the observation noise is σ = 0.3.
Table 5-4: The long-run average costs derived by three methods (i.e., the method simply
ignoring the observation noise, the heuristic method, and the POSMDP) for different
observation noise levels
σ      Ignoring the noise   Heuristic method   POSMDP
0.1    0.8028               0.8029             0.8028
0.3    0.8176               0.8163             0.8157
0.5    0.8474               0.8427             0.8397
0.7    0.8902               0.8779             0.8706
0.9    0.9363               0.9157             0.9002
Table 5-4 shows that the average costs are almost the same when the observation
noise is not significant. In contrast, when the observation noise becomes
considerable, the POSMDP outperforms the method simply ignoring the observation
noise and the heuristic method that applies a preventive replacement threshold on the
mean values of filtering particles. Therefore, the POSMDP shows its advantages
when dealing with partially observable degradation processes, although
approximations are involved in its solution algorithm. However, the advantages of
the POSMDP cannot be fully reflected under this simple maintenance strategy
structure. The benefits of using the POSMDP are more obvious when state-dependent
inspection intervals, state-dependent costs, and durations of maintenance activities
are involved in maintenance strategy optimisation. In these more complex situations,
the long-run expected cost per unit time derived by the commonly used renewal
theory is difficult to evaluate. The POSMDP is used to investigate the maintenance
decision-making problems under more complex situations in the following two
sections.
5.3 State-Dependent Maintenance Intervals
A fixed maintenance interval is often not cost-effective in practice, and a dynamic
maintenance interval that depends on the current health state of an asset is more
rational. When a defect is detected, a further inspection or a replacement should be
scheduled in a short time to avoid a failure without pre-alarm. On the other hand,
unnecessary maintenance activities should be avoided when an asset is still in a good
health condition, because these unnecessary maintenance activities can introduce
additional cost. A premature preventive replacement can reduce the useful life of an
asset. Some health inspections can also be expensive. For example, the inspection of
compressor blades on an aircraft engine involves engine disassembly (Hopp and
Kuo 1998). Some inspections in a process industry require disturbance of production
and the removal of highly corrosive and/or toxic chemicals from equipment.
Therefore, maintenance intervals should be optimised according to the current health
state to reduce the cost.
This section develops a POSMDP that can produce maintenance strategies with
state-dependent maintenance intervals. In this section, the action space of the
POSMDP consists of two components: one is the next maintenance activity (i.e. the
inspection or the preventive replacement), and the other is the waiting duration till
the next maintenance activity. Consequently, both the next maintenance activity and
its corresponding waiting time are optimised by the POSMDP.
In this section, the degradation process still follows Equations (5-1) and (5-2). The
costs of inspection, replacement, and unexpected breakdown (i.e., , , and )
are assumed to be state independent. The durations for maintenance activities and
the unexpected breakdowns are still not considered.
5.3.1 The Formulations and Solution Method of the POSMDP
When the waiting duration for the next maintenance activity is considered, the
relative cost function becomes
min , ,,
, ∆ , , ∆ . (5-16)
Here, , ∆ given by
, ∆1 ∆ |∆ | ∆ |
1, ,
0
(5-17)
denotes the relative cost when the current projected belief state is , and preventive
replacement is conducted after ∆ , where 0,1, , and ∆ is the
maximum waiting time for the next preventive replacement. Similarly,
, ∆ given by
, ∆ 1 ∆ |∑ ∆ | ∆ | (5-18)
denotes the relative cost when the current projected belief state is , and an
inspection is conducted after ∆ , where 1,2, , and ∆ is the
maximum delay time for the next inspection. In this research, the resolution of the
sampling points of the waiting time for the next preventive replacement is higher
than that of the waiting time for the next inspection, i.e. ∆ ∆ . This difference
in resolutions is due to two reasons. Firstly, the next replacement time needs to be
determined more accurately, because the cost of preventive replacement is much
higher than that of an inspection in most situations, i.e., . Secondly,
, ∆ can be calculated more efficiently and a higher resolution does not
bring down the overall efficiency of the solving algorithm.
In Equations (5-17) and (5-18), · | · and · | · are the expected conditional
reliability and survival time which are calculated according to Equations (5-9) and
(5-10) respectively. ( 1,2, , ) is the transition matrix of the
discretised projected beliefs given that the transition epoch is ∆ and a health
inspection is conducted. The calculation of follows the same process as in Table
5-2. The values of · | · and · | · are calculated analytically, while ;
1,2, , are identified through a time-consuming Monte Carlo-based method.
Therefore, the calculation of ; 1,2, , is the bottleneck of the
efficiency of the whole solving algorithm. This research partially addresses this
bottleneck by avoiding calculating the unnecessary rows in ; 1,2, , .
According to Equations (5-17) and (5-18), is only used to calculate the relative
cost when an inspection is conducted after ∆ . Therefore, when the reliability of
an asset is above some threshold, i.e., ∆ | , the inspection is not
necessary and is not required. Similarly, when the reliability is below some
threshold, i.e. ∆ | , preventive replacement is preferred and is
not necessary either. By choosing proper thresholds and , the computing time
can be reduced significantly. Because rows in are calculated independently,
additional rows can be added to the original calculation results when optimisation
results show that and are not set appropriately.
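The row selection described above amounts to a simple mask over the expected reliabilities. In the sketch below, `reliability[k, i]` is a hypothetical array holding the expected reliability at the k-th candidate inspection delay for the i-th discretised projected belief state, and `r_lower` and `r_upper` stand for the two thresholds; only the entries flagged `True` would need a Monte Carlo estimate of the corresponding transition matrix row.

```python
import numpy as np

def rows_to_compute(reliability, r_lower, r_upper):
    """Flag the (delay, belief state) pairs whose transition rows must be estimated.

    Above r_upper an inspection is unnecessary; below r_lower replacement is preferred.
    """
    return (reliability >= r_lower) & (reliability <= r_upper)

# Illustrative reliabilities for two delays (rows) and three belief states (columns).
reliability = np.array([[0.99995, 0.99, 0.90],
                        [0.999,   0.96, 0.50]])
mask = rows_to_compute(reliability, r_lower=0.95, r_upper=0.9999)
print(mask)
print("rows skipped:", mask.size - int(mask.sum()))
```

Because each row is estimated independently, widening the thresholds later only requires computing the newly flagged rows rather than repeating the whole calculation.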
After the elements in Equations (5-17) and (5-18) have been calculated, the optimal
maintenance activity and waiting time according to each discretised projected belief
state can be identified using the policy iteration.
In this section, beliefs of the POSMDP are still projected into the censored Gaussian
distribution given by Equation (5-4). However, the sampling strategy of the
parameters in Equation (5-4) is different from that used in Section 5.2. In Section 5.2
the mean parameter of the projected belief states near the preventive replacement
threshold was sampled at a higher resolution. In this section, both the maintenance
activity and its waiting duration are to be optimised. The approximate thresholds for
these action-duration combinations are difficult to estimate beforehand. Therefore,
regular grid points are adopted to discretise the parameter space of the projected
parametric density in this section.
5.3.2 Simulation Study
A simulation study was conducted to investigate the performance of the developed
POSMDP with state-dependent maintenance intervals. Firstly, optimal maintenance
strategies for different observation noise levels and inspection costs were identified
and the effects of these two parameters on the optimal maintenance strategy structures
were investigated. Secondly, optimisation results obtained using the POSMDP were
compared with those derived by another maintenance optimisation algorithm
developed by Wang (Wang and Christer 2000; Wang 2003b).
In this simulation study, parameters of the Gamma-based state space model were
assumed as: 2.5, 0.5, and 4. Different values of σ (i.e., σ = 0.1,
0.3, 0.5, and 0.7) were used. The costs of preventive replacement
and unexpected breakdown were 1 and 10, respectively. Different values of
inspection cost, i.e., 0.1, 0.3, and 0.5, were used.
In this simulation study, the sampling resolutions and the maximum sampling unit
number of the waiting duration till the next maintenance action were ∆ 0.01,
200, ∆ 0.05, and 26. The mean parameter of the projected belief
state was sampled from 0.2 to 3.8 with a fixed interval of 0.02. The sampling points
of the standard deviation parameter were selected according to different standard
deviations of observation noise. The upper and lower thresholds of the reliability
for calculating the elements of transition matrices were set as: 0.9999 ,
0.95. The results of the policy iteration showed that the interval
included all the situations when the inspection was the optimal action. When
calculating the elements of the transition matrices, 400 particles were used by the
Monte Carlo method. Optimal maintenance strategies were then obtained through
the policy iteration. Some results of the policy iteration for different standard
deviations of observation noise and inspection costs are demonstrated in Figure 5-5,
where the numbers above the marks are the waiting durations till the corresponding
maintenance actions. As shown in Figure 5-5, when the observation noise and
inspection cost were not substantial, an inspection was the optimal maintenance
activity when the asset health state was under a certain threshold. On the contrary,
when the observation noise was significant or the inspection was costly, the only
optimal maintenance activity was preventive replacement. In this situation,
performing an inspection was not economical for every projected belief state and the
optimal maintenance strategy became a time-based preventive maintenance strategy.
In this simulation study, when the (observation noise, inspection cost) pair was
(0.1, 0.5), (0.3, 0.5), (0.5, 0.3), (0.5, 0.5), (0.7, 0.3), or (0.7, 0.5), the
inspection was not economical for any projected belief state. The optimal time-based
replacement interval obtained by the POSMDP was 1.5. This result was consistent with
that derived using the renewal theory, i.e. 1.5014. Therefore, the proposed POSMDP
can adopt different maintenance strategy structures according to changes in the
observation noise level and inspection cost.
Figure 5-5 (a) shows that the waiting durations can increase with standard deviation
for small mean values at certain points. Similarly, Figure 5-5 (c) shows that the
optimal maintenance strategy shifts from preventive replacement to inspections as
the mean value increases. This is due to the approximations adopted in the
developed solving algorithms. The censored normal distribution is not close to the
filtering result of the Gamma-based state space model when the mean parameter is
small. Therefore, using different types of distributions to approximate filtering
results can improve the maintenance strategy optimisation result at the expense of
increasing complexity of algorithm. The result of this simulation study shows that
current algorithms have already obtained better performance than that of an existing
approximate optimisation algorithm proposed by Wang (Wang and Christer 2000;
Wang 2003b). The POSMDP that performs density projection to a space with
multi-type distributions will be investigated in the future.
Figure 5-5: Some results of the policy iteration for the POSMDP with irregular maintenance
intervals (the numbers in rectangles are the optimal waiting durations till the corresponding
maintenance actions). Panels, given as (observation noise standard deviation, inspection cost):
(a) (0.3, 0.1); (b) (0.3, 0.5); (c) (0.1, 0.3); (d) (0.7, 0.3).
These obtained maintenance policies were applied to the simulated degradation data.
For each pair of an observation noise standard deviation value and an inspection
cost, 5 10 sequences of simulated degradation data were generated. The average
costs per unit time were then calculated as the third column of Table 5-5. The
simulation results showed that a lower average cost could be obtained when the
observation noise was small and the inspection was inexpensive. This is
consistent with intuition: the maintenance cost can be reduced by using accurate
and cost-effective health inspection technologies.
The proposed POSMDP was compared with another approximate maintenance
strategy optimisation method developed by Wang (Wang and Christer 2000; Wang
2003b). Wang identified the optimal replacement time given a fixed inspection
interval through an approximate renewal theory. The optimal inspection interval was
identified through simulation studies. The state space degradation model
investigated by Wang is different from the Gamma-based state space model.
However, the assumptions of the two models are similar, and the method developed
by Wang can also be used to process the Gamma-based state space model with slight
modification. For the Gamma-based state space model, the approximate renewal
theory used by Wang can be written as
1 | 1 ∆ 1 | , (5-19)
where is the remaining time to a replacement and denotes the health state
estimate at the th inspection point. For the Gamma-based state space model, the
health state estimates are obtained by particle filtering. The | is the expected
reliability before the next replacement given the current belief state . The |
is the expected survival time before the next replacement when the current belief
state is . An optimal replacement time can then be obtained by minimising
Equation (5-19). If ∆ , a preventive replacement is carried out after .
Otherwise, another inspection is performed after ∆ , and a new decision is to be
made based on the new inspection result.
To identify a cost-effective inspection interval, Wang performed repeated simulation
studies using different inspection intervals. The inspection interval that has the
lowest average cost per unit time is selected as the optimal inspection interval ∆ . In
this research, the number of simulations for each inspection interval is 6×10^4. The
average costs corresponding to the optimal inspection interval ∆ are listed in the last
column of Table 5-5.
As shown in Table 5-5 and Table 5-4, both the POSMDP that considers
state-dependent maintenance intervals and Wang’s method can develop more
cost-effective maintenance strategies than the POSMDP that adopts fixed maintenance
intervals. However, the POSMDP performs better than the approximate
renewal theory developed by Wang, especially when the inspection noise and cost are
not significant. The reason is that the proposed POSMDP uses a state-dependent
inspection interval instead of the fixed inspection interval adopted by Wang. Wang
also provided a method to identify state-dependent inspection intervals in (Wang
2003b) by conducting intensive simulation studies after every inspection. This
method that identifies the optimal inspection interval is applicable to Wang’s state
space model, in which the approximate average cost per unit time can be evaluated
efficiently. For the Gamma-based state space model, Wang’s method is not
applicable due to its low efficiency. On the other hand, when the strategy derived by
the POSMDP is implemented, the optimal maintenance activities and their
corresponding waiting durations are identified according to the pre-calculated policy
function · . Therefore, the proposed POSMDP can consider the state-dependent
inspection interval more efficiently.
Table 5-5: The long-run average costs per unit time derived by the POSMDP with irregular
inspection interval and the method proposed by Wang (Wang and Christer 2000; Wang 2003b).
Observation   Inspection   Average cost derived   Average cost derived
noise (σ)     cost         by POSMDP              by Wang’s method
0.1           0.1          0.7329                 0.7548
0.1           0.3          0.8745                 0.8765
0.1           0.5          0.8927                 0.8925
0.3           0.1          0.7477                 0.7723
0.3           0.3          0.8871                 0.8924
0.3           0.5          0.8920                 0.8949
0.5           0.1          0.7739                 0.7985
0.5           0.3          0.8930                 0.8955
0.5           0.5          0.8926                 0.8938
0.7           0.1          0.8020                 0.8239
0.7           0.3          0.8921                 0.8934
0.7           0.5          0.8928                 0.8940
5.4 Maintenance Strategy Considering Imperfect Maintenance
The above two sections only consider preventive replacement that brings an asset to
a brand new state. However, for some engineering assets, a more cost-effective
option for a moderate degradation state is imperfect maintenance. Compared with
preventive replacement, imperfect maintenance can be performed more
economically and efficiently, although it can only partially improve the health state of
an asset. Therefore, when both preventive replacement and imperfect maintenance
are available, a maintenance strategy should strike a balance between maintenance
effects and maintenance costs (or durations). For the commonly used renewal
theory, the choice between preventive replacement and imperfect maintenance is not
straightforward. A special strategy structure (e.g., control limit theory) must be
assumed to establish the ratio of the expected cost per renewal cycle to the
expected length of a renewal cycle. Unfortunately, these special strategy structures
are not optimal in all situations (Moustafa et al. 2004).
As an extension of MDP, the proposed POSMDP simply treats preventive
replacement and imperfect maintenance as two different actions when modelling a
maintenance decision process. The two actions are selected according to the current
health state of an asset, and no special strategy structure is required. Therefore,
the POSMDP can develop a strategy considering both preventive replacement and
imperfect maintenance under a flexible strategy structure. In addition, the POSMDP
decomposes a long-run maintenance decision process into single maintenance
cycles. Consequently, random effects and durations of imperfect maintenance can be
formulated easily.
In this section, imperfect maintenance is assumed to improve the health of an asset
to a random level which depends on the current health state. The duration of
imperfect maintenance is also assumed to be a random value that relates to the
current health state. In this section, the objective function of the maintenance
strategy optimisation is the long-run availability (i.e. the ratio of the running time to
the total time). In some real applications, the costs of maintenance activities and
failures are difficult to estimate accurately, and the cost objective function may be
sensitive to the uncertainty of these cost estimates. On the other hand, the running
time and the down time of an asset can be measured accurately. Therefore,
availability is a more effective objective for optimising maintenance strategies in these
applications, and this section shows that the proposed POSMDP can also be used to
develop a maintenance strategy that maximises the availability of an asset.
It is worth mentioning that the uncertainty of the costs of different maintenance
actions can also be considered by the algorithms developed in this section. Due to
limited course duration, this consideration has not been explored further in this
thesis.
5.4.1 The Formulations and the Solution Method of the POSMDP
In this section, the degradation process is still assumed to follow the Gamma-based
state space model given by Equations (5-1) and (5-2). Three types of maintenance
activities are considered, i.e. health inspections, replacement, and imperfect
maintenance. The duration of a health inspection is denoted by ; the duration of
the replacement is ; the length of a breakdown caused by a failure is denoted by
. It is assumed that , and , , are all state-independent. The
duration of imperfect maintenance follows the exponential distribution with a mean
value given by
$$E(T_M \mid x) = a_1 + a_2 \cdot \exp(a_3 x), \tag{5-20}$$
where $x$ is the current underlying health state and $T_M$ denotes the corresponding
random duration of imperfect maintenance. The ratio of the underlying health states
after and before imperfect maintenance is modelled by a Beta distribution, and the
PDF of the underlying health state after the imperfect maintenance can be calculated
as:
$$f(x^{+} \mid x^{-}) = \frac{1}{x^{-}} \cdot \mathrm{Beta}\!\left(\frac{x^{+}}{x^{-}};\, p,\, q\right), \tag{5-21}$$
where $x^{-}$ and $x^{+}$ denote the underlying health states before and after the
imperfect maintenance respectively. It is assumed that inspections, replacement, and
imperfect maintenance can only be performed when an asset is shut down.
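The two imperfect-maintenance assumptions above can be illustrated with a short simulation. The following Python sketch is not part of the original work. The function names, the symbols a1, a2, a3, p and q, and the constant-plus-exponential form assumed for the mean duration are illustrative assumptions, and the default values are taken from the simulation study in Section 5.4.2 under an assumed mapping to these symbols. The sketch draws an exponentially distributed maintenance duration and a Beta-distributed ratio of the post- to pre-maintenance health state, and evaluates the implied PDF of the post-maintenance state through a change of variables.

```python
import math
import numpy as np


def mean_duration(x, a1=0.02, a2=0.003, a3=0.75):
    """Assumed state-dependent mean duration of imperfect maintenance."""
    return a1 + a2 * np.exp(a3 * x)


def sample_imperfect_maintenance(x_before, rng, p=2.0, q=3.0):
    """Draw one (duration, post-maintenance state) pair."""
    # Exponentially distributed duration with a state-dependent mean.
    duration = rng.exponential(mean_duration(x_before))
    # Ratio of the health states after and before maintenance, in (0, 1).
    ratio = rng.beta(p, q)
    return duration, ratio * x_before


def post_maintenance_pdf(x_after, x_before, p=2.0, q=3.0):
    """PDF of the post-maintenance state given the pre-maintenance state."""
    r = np.atleast_1d(np.asarray(x_after, dtype=float)) / x_before
    beta_fn = math.gamma(p) * math.gamma(q) / math.gamma(p + q)
    pdf = np.zeros_like(r)
    inside = (r > 0.0) & (r < 1.0)
    # Beta density of the ratio, divided by x_before (Jacobian of r = x'/x).
    pdf[inside] = (r[inside] ** (p - 1) * (1 - r[inside]) ** (q - 1)
                   / (beta_fn * x_before))
    return pdf


rng = np.random.default_rng(0)
draws = [sample_imperfect_maintenance(4.0, rng) for _ in range(5000)]
durations = np.array([d for d, _ in draws])
states = np.array([s for _, s in draws])
grid = np.linspace(0.0, 4.0, 4001)
pdf_mass = post_maintenance_pdf(grid, 4.0).sum() * (grid[1] - grid[0])
```

In this sketch the post-maintenance state is always smaller than the pre-maintenance state, which reflects the assumption that imperfect maintenance improves, but does not fully restore, the health of the asset.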
Maximising the long-run availability is equivalent to minimising the long-run expected
downtime per unit time. Consequently, the relative cost functions developed in
Sections 5.2 and 5.3 are modified to a relative downtime function given by
$$D(\mathbf{b}) = \min\left\{ \min_{\Delta t_R} D_R(\mathbf{b}, \Delta t_R),\ \min_{\Delta t_I} D_I(\mathbf{b}, \Delta t_I),\ \min_{\Delta t_M} D_M(\mathbf{b}, \Delta t_M) \right\}. \tag{5-22}$$
Here, $D_R(\mathbf{b}, \Delta t_R)$ denotes the relative downtime when the initial projected
belief state is $\mathbf{b}$ and preventive replacement is performed after $\Delta t_R$. Similarly,
the relative downtimes when the inspection and the imperfect maintenance are selected are
denoted as $D_I(\mathbf{b}, \Delta t_I)$ and $D_M(\mathbf{b}, \Delta t_M)$, respectively, where $\Delta t_I$ and
$\Delta t_M$ are the delay times to carry out an inspection and imperfect maintenance. An
assumption used in this section is that every imperfect maintenance activity is
followed by an immediate inspection. This inspection measures the result of the
imperfect maintenance, and its duration is included in that of the imperfect
maintenance.
The relative downtime starting in projected belief state when preventive
replacement will be performed after ∆ is calculated as
, ∆1 ∆ |∆ | ∆ |
1, ,
0
, (5-23)
where ∆ | is the expected waiting time till the next maintenance activity.
The next maintenance activity can be the planned preventive replacement, or a
corrective replacement that follows an unexpected failure. ∆ | is
calculated as
∆ | |∆ 1 ∆ | . (5-24)
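As an illustration of the expected waiting time described above, the following Python sketch estimates it from simulated failure times. It is a minimal sketch under assumptions: the function names and the sample values are hypothetical, and the failure times are taken as given rather than generated from the Gamma-based state space model. Because the next maintenance activity is either the planned replacement or a corrective replacement after failure, the waiting time is the smaller of the failure time and the planned delay. The second function splits the same expectation into a failure part and a survival part.

```python
import numpy as np


def expected_waiting_time(failure_times, delay):
    """Monte Carlo estimate of E[min(failure time, planned delay)]."""
    t = np.asarray(failure_times, dtype=float)
    return float(np.minimum(t, delay).mean())


def decomposed_waiting_time(failure_times, delay):
    """Same expectation, split into failure and survival contributions."""
    t = np.asarray(failure_times, dtype=float)
    failed = t <= delay
    # Contribution of samples that fail before the planned action.
    failure_part = t[failed].sum() / t.size
    # Contribution of samples that survive until the planned action.
    survival_part = delay * (1.0 - failed.mean())
    return float(failure_part + survival_part)


samples = [1.0, 2.0, 3.0, 4.0]  # hypothetical failure times
direct = expected_waiting_time(samples, 2.5)
split = decomposed_waiting_time(samples, 2.5)
```

Both functions return the same value for any sample set, so the decomposition can serve as a consistency check on an implementation.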
The relative downtime when the inspection is performed after ∆ is formulated
as
, ∆ 1 ∆ |∑ ∆ | ∆ | . (5-25)
The expected waiting time till the next maintenance activity, i.e., ∆ | , is
given by
∆ | |∆
∆ | 1 ∆ |. (5-26)
The relative downtime starting in projected belief state when the imperfect
maintenance is performed after ∆ is given by
, ∆1 ∆ |
∆ | ∑· ∆ | ∆ |
1, ,
0| ∑ 0| 0
. (5-27)
Here, Δ | is the expected duration of the imperfect maintenance performed
after Δ given the current projected belief state , which is given by
Δ | Δ , Δ
· exp Gam ; · ∆ , ; ,. (5-28)
After Δ | is worked out, the expected waiting time till the next maintenance
activity can be calculated as
∆ | |∆
∆ | 1 ∆ |. (5-29)
Another component in Equation (5-27) is the transition matrix (i.e.
0,1 , )) of the projected belief states after imperfect maintenance.
Elements of can be denoted as
Pr b ∆ |b , ∆ ,
∆ , Λ ∆, (5-30)
where ∆ means that the optimal action after ∆ is
imperfect maintenance, and ∆ is the observation after the imperfect
maintenance. The transition matrix is worked out through a Monte Carlo-based
method, as summarised in Table 5-6.
Table 5-6: The process to calculate the transition matrix using the Monte Carlo-based method
Step 1: Generate 2 samples of the health state , , … , from the censored
Gaussian distribution ·, .
Step 2: Predict the corresponding health states , , … , after ∆
according to the system equation (5-1).
Step 3: Resample 2 samples of the health state , , … , from ;
1,2, … ,2 , i.e., the subset of the samples , , … ,
whose values indicate a failure does not happen.
Step 4: Generate 2 samples of the health state , , … , after the
imperfect maintenance corresponding to the original health state samples
, , … , through Equation (5-21).
Step 5: Generate observation samples , , … , corresponding to the
health state samples , , … , using the observation equation (5-2).
Step 6: For each observation sample , calculate the weights of the health state
samples , , … , , according to the observation equation (5-2):
; , ∑ ; , ; 1,2, … , ;
1,2, … , .
Step 7: Project the sample-weight sets , ; 1,2, … , ;
1,2, … , to the parametric distribution space Ω according to the projection
function (5-3), and get samples in the projected belief space Ω as
; 1,2, … , .
Step 8: Find the nearest neighbour of each projected belief state in the
discretised projected belief space Ω ; 1,2, … , , record the
frequency of all the elements in Ω as ; 1,2, … , , obviously,
∑ .
Step 9: Obtain the th row of the transition matrix , as / ;
1,2, … , . The elements in the other rows of can be obtained in the
same way.
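The steps of Table 5-6 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in this research. Here the system equation is assumed to add a Gamma-distributed increment, the observation equation is assumed to add zero-mean Gaussian noise, the censored Gaussian is approximated by rejecting samples outside the interval from zero to a hypothetical failure threshold, the projection is approximated by weighted moment matching, and the nearest neighbour is found by Euclidean distance in the (mean, standard deviation) plane. All names, sample sizes and default parameter values are hypothetical.

```python
import numpy as np


def transition_row(mu, sigma, delay, grid, rng, n_state=2000, n_obs=200,
                   shape_rate=2.5, scale=0.5, obs_sd=0.3, threshold=10.0,
                   p=2.0, q=3.0):
    """Estimate one row of the belief transition matrix (assumed model)."""
    # Step 1: sample health states from a Gaussian restricted to [0, threshold).
    x = rng.normal(mu, sigma, size=4 * n_state)
    x = x[(x >= 0.0) & (x < threshold)]
    # Step 2: predict the health states after the delay (Gamma increments).
    x = x + rng.gamma(shape_rate * delay, scale, size=x.size)
    # Step 3: keep the samples without failure and resample to a fixed size.
    x = x[x < threshold]
    if x.size == 0:
        raise ValueError("no surviving samples; increase the sample size")
    x = rng.choice(x, size=n_state, replace=True)
    # Step 4: apply imperfect maintenance through a Beta-distributed ratio.
    x_after = x * rng.beta(p, q, size=n_state)
    # Step 5: generate noisy observations of some post-maintenance states.
    y = rng.choice(x_after, size=n_obs) + rng.normal(0.0, obs_sd, size=n_obs)
    # Step 6: normalised weights of every state sample for each observation.
    log_w = -0.5 * ((y[:, None] - x_after[None, :]) / obs_sd) ** 2
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # Step 7: project each weighted sample set onto (mean, std) parameters.
    m = w @ x_after
    s = np.sqrt(np.maximum(w @ x_after ** 2 - m ** 2, 1e-12))
    # Step 8: nearest neighbour in the discretised projected belief space.
    dist = ((m[:, None] - grid[None, :, 0]) ** 2
            + (s[:, None] - grid[None, :, 1]) ** 2)
    counts = np.bincount(dist.argmin(axis=1), minlength=len(grid))
    # Step 9: relative frequencies give the row of the transition matrix.
    return counts / n_obs


rng = np.random.default_rng(1)
grid = np.array([(m, s) for m in np.linspace(0.0, 8.0, 9)
                 for s in (0.2, 0.6, 1.0)])
row = transition_row(4.0, 0.5, 0.5, grid, rng)
```

Calling the function once for every element of the discretised projected belief space, and for every candidate delay, would give the complete set of transition matrices. The cost of this repetition is one reason why the sample sizes are kept as function arguments.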
After the different components of the relative downtime function are worked out,
policy iteration is used to identify the optimal maintenance strategy that maximises
the availability.
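For readers unfamiliar with policy iteration under the average criterion, the following Python sketch shows the procedure on a finite semi-Markov decision process. It is a generic textbook-style sketch, not the algorithm of this thesis: it assumes a small discrete state space and a unichain structure, and the two-state example (a good state and a worn state, with the actions "wait" and "replace") and all of its numbers are hypothetical. The array `cost` holds the expected downtime of a decision epoch, `tau` its expected length, and `P` the transition probabilities, so the gain `g` is the long-run downtime per unit time and the availability is `1 - g`.

```python
import numpy as np


def evaluate(policy, cost, tau, P):
    """Solve h(i) = c(i) - g*tau(i) + sum_j P(i,j) h(j) with h(0) = 0."""
    n = len(policy)
    idx = np.arange(n)
    c, t = cost[idx, policy], tau[idx, policy]
    A = np.eye(n) - P[policy, idx, :]
    # h(0) is fixed at zero, so its column is reused for the unknown gain g.
    A[:, 0] = t
    z = np.linalg.solve(A, c)
    h = z.copy()
    h[0] = 0.0
    return z[0], h


def policy_iteration(cost, tau, P, max_iter=100):
    """Policy iteration for an average-cost semi-Markov decision process."""
    n_states = cost.shape[0]
    idx = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(max_iter):
        g, h = evaluate(policy, cost, tau, P)
        # Test quantity for every state-action pair.
        q = cost - g * tau + np.einsum("aij,j->ia", P, h)
        new_policy = q.argmin(axis=1)
        # Keep the current action when it is already as good as the best one.
        keep = np.isclose(q[idx, policy], q.min(axis=1))
        new_policy[keep] = policy[keep]
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, g, h


# Hypothetical example: rows are states, columns are actions (wait, replace).
cost = np.array([[0.0, 0.1], [1.0, 0.1]])   # expected downtime per epoch
tau = np.array([[1.0, 0.1], [1.5, 0.1]])    # expected epoch length
P = np.array([[[0.0, 1.0], [1.0, 0.0]],     # wait: good -> worn, worn -> good
              [[1.0, 0.0], [1.0, 0.0]]])    # replace: always back to good
optimal_policy, downtime_rate, _ = policy_iteration(cost, tau, P)
availability = 1.0 - downtime_rate
```

In this example the procedure selects "wait" in the good state and "replace" in the worn state, because waiting in the worn state leads to a long breakdown.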
5.4.2 Simulation Study
This simulation study investigated the effects of different maintenance activity
durations on policy iteration results. The parameters of the Gamma-based state space
model were selected as 2.5, 0.5, 0.3. The length of the breakdown
caused by an unexpected failure was assumed to be 1. Four pairs of durations for
inspections and the preventive replacement were adopted, i.e., (0.003, 0.1),
(0.01, 0.1), (0.003, 0.3), and (0.003, 0.05).
The expected duration of imperfect maintenance was assumed to follow Equation
(5-20), where: 0.02 , 0.003 , and 0.75 . The effects of imperfect
maintenance were given by Equation (5-21), where 2, and 3.
The optimal maintenance strategies were derived following the process described in Section
5.4.1. Different strategies, shown in Figure 5-6, were obtained for the different
inspection and replacement durations. The four subfigures in Figure 5-6
demonstrate four different maintenance strategy structures. Figure 5-6 (a) shows a typical
maintenance strategy structure that uses all three types of maintenance activities.
The inspection is adopted when an asset is in a good health state. After the asset
degrades to a certain level, the imperfect maintenance is a better option. As the
degradation continues, a preventive replacement becomes the most cost-effective
option. Figure 5-6 (b) shows that an inspection is not the optimal action for any belief
state if the duration of an inspection is too long. Similarly, as shown in Figure 5-6
(c), when a replacement is time-consuming, the imperfect maintenance is performed
even when the asset is in a highly degraded state. In contrast, as shown in
Figure 5-6 (d), when the preventive replacement can be carried out efficiently, the
imperfect maintenance is not optimal in any situation.
Figure 5-6: Some results of the policy iteration for POMDP considering imperfect maintenance, for inspection and replacement durations of (a) 0.003, 0.1; (b) 0.01, 0.1; (c) 0.003, 0.3; (d) 0.003, 0.05
(the numbers in rectangles are the optimal waiting durations till the corresponding maintenance actions)
This simulation study shows that the POSMDP can develop maintenance strategies
with various structures according to different durations of maintenance activities.
Therefore, the POSMDP is an effective tool to optimise the maintenance strategy
when multiple maintenance actions are available. Moreover, because the
POSMDP decomposes a long-run decision process into single steps, the
state-dependent durations and effects of maintenance activities can be formulated easily.
Similar to the results obtained in Section 5.3, the next optimal maintenance activities
and the corresponding waiting times demonstrated in Figure 5-6 do not change
monotonically. This is caused by the approximate solving algorithm used in this
research. Adopting a hybrid parametric distribution set as the projected belief space
may improve the result. However, the current solving algorithm already shows its
effectiveness in identifying the structural properties of maintenance strategies, and the
non-monotonicity of the obtained optimal maintenance strategies is not significant. The
POSMDP using multi-type parametric distributions as the projected belief space will be
investigated in the future.
5.5 Chapter Summary
This chapter has developed a POSMDP for maintenance strategy optimisation when
the health state of an asset can only be partially observed. Compared with the
existing POMDP methods that optimise maintenance strategies, the developed
POSMDP does not assume discrete time or discrete states. Without these
two assumptions, degradation processes can be modelled more accurately and more
cost-effective maintenance strategies can be developed. In this chapter, the
formulations and solving methods of the POSMDP for three different maintenance
decision-making problems have been discussed in detail. Simulation studies have
been performed to validate the effectiveness of the POSMDP applied to maintenance
decision-making. The results show that the developed POSMDP can derive cost-
effective maintenance strategies with flexible structures.
The proposed maintenance decision-making method has several advantages. Firstly,
Monte Carlo-based methods are used to solve the POSMDP. Consequently, the
POSMDP can be adopted to deal with various state space models without Gaussian
and linear assumptions. Secondly, as an extension of the MDP, the POSMDP can
optimise maintenance strategies without specifying a predetermined strategy
structure. Therefore, the POSMDP can derive more flexible maintenance strategies
when multiple maintenance activities are available. Finally, the POSMDP
decomposes a long-run decision process into single steps. Therefore, some practical
issues (e.g., state-dependent maintenance costs or durations, and uncertain
maintenance effects) can be formulated easily.
Although this chapter has investigated several aspects of the POSMDP,
further research is still needed. One possible extension of the proposed POSMDP is
changing its horizon from infinite to finite. For a finite
horizon, the policy iteration used in this chapter is no longer applicable and a new
solving method is required.
6 Conclusions and Future Research Directions
6.1 Conclusions
The state space model has proven to be an effective tool to model the asset
degradation process where health states are only partially observable. A
comprehensive literature review has revealed that the existing state space models used in
asset degradation modelling largely follow discrete time, discrete state, linear and
Gaussian assumptions:
1) The discrete time assumption implies that inspections and failures only
happen at predetermined discrete times with fixed intervals.
2) The discrete state assumption requires classifying continuous degradation
states into a finite number of states. This classification largely depends on
expert knowledge. The finite number of states may not be detailed enough
to describe the asset health state.
3) The linear and Gaussian assumptions, on the other hand, are inconsistent
with the nonlinear and monotonically increasing properties of most engineering
asset degradation processes between two adjacent maintenance activities.
To address these limitations, this research adopts a Gamma-based state space model
that describes partially observable asset degradation processes. The parameter
estimation, lifetime prediction, and maintenance strategy optimisation algorithms for
the Gamma-based state space degradation model have been developed. The
developed models and algorithms have been validated using simulation and field data.
The thesis has presented three pieces of original work as follows:
1) Degradation process modelling of direct and indirect indicators using the
Gamma-based state space model;
2) Joint modelling of failure events and multiple indirect indicators using the
Gamma-based state space model;
3) Maintenance strategy optimisation using the continuous state POSMDP.
Detailed conclusions are summarised in the following sections.
6.1.1 Modelling Correlated Degradation Processes of Direct and Indirect Indicators
Direct indicators provide more accurate references for asset life prediction and
maintenance decision-making; however, they are often more difficult to obtain and
are often incomplete. Indirect indicators, on the other hand, can be collected more
easily through various condition monitoring techniques and are more readily available. The
state space model provides an efficient approach to estimate direct indicators using
indirect indicators. However, existing state space models that describe the
degradation processes of direct and indirect indicators largely follow the discrete
time, discrete state, linear and Gaussian assumptions.
To address this research gap, several original contributions have been made:
1) This research for the first time uses a Gamma-based state space model to
describe the degradation processes including both direct and indirect
indicators.
2) This research has developed a Monte Carlo-based EM algorithm to estimate
the model parameters. Both direct and indirect indicator observations are
considered by the EM algorithm.
The Gamma-based state space model has several advantages when modelling the
degradation processes of direct and indirect indicators:
1) The underlying Gamma process can achieve a better fit than the Gaussian
process when describing the monotonically increasing degradation process of
a direct indicator.
2) More complex relationships between direct and indirect indicators can be
described by the nonlinear property of the Gamma-based state space model.
3) The situation in which direct indicator observations are fewer than indirect indicator
observations can be dealt with by the EM algorithm developed in this
research.
6.1.2 Joint Modelling of Failure Events and Multiple Indirect Indicators
For some degradation processes, direct indicators are difficult to obtain; instead,
failure event information is more readily available. In these situations, failure event data
should be utilised jointly with multiple indirect
indicators to improve the confidence of the model outcome for an asset degradation
process. Two issues need to be addressed in this regard. Firstly, both the failure
times and censoring times of assets should be considered. Secondly,
the information from multiple indirect indicators should be fused appropriately. This
research uses the Gamma-based state space model to address these two issues, and
the following original work has been done:
1) A Monte Carlo-based EM algorithm that considers multiple indirect indicators,
failure times, and censoring times has been developed.
2) A parametric bootstrap method to evaluate the effectiveness of different
degradation indicators has also been developed.
The models and algorithms developed in this research for asset life prediction using
failure events and multiple degradation indicators have several advantages:
1) The situation where event data are insufficient can be overcome by considering
both degradation indicators and event data.
2) The likelihood function that considers failure events can be established more
concisely using the monotonically increasing Gamma process.
3) After the effectiveness of different degradation indicators is identified, a more
cost-effective condition monitoring system can be established by installing
only the necessary sensors, and the size of the database that stores
degradation indicators can be reduced.
4) Other types of nonlinear non-Gaussian state space models can also be
processed by the algorithms developed in this research. Therefore, failure
times following different distributions can be modelled by the state space
degradation model.
6.1.3 Maintenance Strategy Optimisation Using the Continuous State POSMDP
This research models the maintenance decision-making process for the Gamma-
based state space model as a continuous state POSMDP. When the state of the
POSMDP is continuous, the dimension of the belief space can become infinite. This
research converts the POSMDP to a SMDP through a Monte Carlo density
projection method. The converted SMDP is then solved by the policy iteration.
Compared with existing POMDPs that are also applied in maintenance strategy
optimisation, the new features of the POSMDP are as follows:
1) The POSMDP used in this research is continuous in time, so it can model
maintenance activities and failures that happen at random times.
2) The POSMDP is continuous in state, while existing POMDPs used in
maintenance strategy optimisation only have a limited number of states.
3) The POSMDP adopted in this research is based on a non-Gaussian state
space degradation model.
The simulation study shows that the continuous state POSMDP has several
advantages:
1) The POSMDP can derive more cost-effective maintenance strategies than
existing approximate maintenance strategy optimisation methods when state
space degradation models continuous in time and state are used.
2) Maintenance strategies with both regular and state-dependent inspection
intervals can be optimised by the POSMDP.
3) Flexible maintenance strategies with multiple maintenance activities can be
developed for different values of maintenance costs (durations).
4) The POSMDP decomposes a long-run decision process into single steps.
Therefore, concise formulations can be obtained.
6.2 Future Research
As stated earlier, this research for the first time systematically investigates the
application of nonlinear non-Gaussian state space models in asset degradation
modelling. Several potential future research directions are presented as follows.
1) The computational efficiency of the proposed Monte Carlo-based algorithms
is still to be enhanced. The parameter estimation and maintenance
optimisation algorithms rely on Monte Carlo-based methods that are
computationally demanding. Although this research has developed strategies to make the
algorithms less computationally expensive, more efficient algorithms will
improve the applicability of nonlinear non-Gaussian state space models in
real-world scenarios.
2) During the parameter estimation of the Gamma-based state space model, the
choice of particle number and the convergence criterion largely follow
empirical approaches. A method for adaptively choosing the particle number
is to be developed. In addition, a more effective and efficient convergence
criterion is to be investigated.
3) In this research, an asset is regarded as a single component. In practice, an
asset often consists of multiple components. Two relationships between the
components should be taken into consideration, i.e. the stochastic
dependence and the economic dependence (Castanier et al. 2005).
Applications of the state space model to the life prediction and maintenance
decision-making of a multi-component system are required.
4) This research optimises maintenance strategies according to the long-run
expected cost per unit time or long-run expected availability. The
maintenance strategy optimisation method for a finite horizon has not been
discussed. When the optimisation horizon is finite, the POSMDP cannot be
solved by the policy iteration, and further research is required.
5) This research used the censored Gaussian distribution as the projected
belief space in the POSMDP. In practice, using multi-type parametric
distributions as the projected space may improve the results of maintenance
strategy optimisation. However, efficient projection and distance
measurement algorithms are to be developed before applying the projected
belief space with multi-type distributions.
6) This research optimises maintenance strategies by simply minimising or
maximising an objective function. In reality, some constraints need to be
considered. These constraints can be reliability, availability, costs, or the
resources used during maintenance activities.
7 References
Akaike, H. (1974). "A New Look at the Statistical Model Identification." Automatic
Control, IEEE Transactions on 19(6): 716-723.
Amari, S. V. and L. McLaughlin (2004). Optimal Design of a Condition-Based
Maintenance Model. Reliability and Maintainability, 2004 Annual
Symposium - RAMS: 528-533.
Amari, S. V., L. McLaughlin, et al. (2006). Cost-Effective Condition-Based
Maintenance Using Markov Decision Processes. Reliability and
Maintainability Symposium, 2006. RAMS '06. Annual: 464-469.
Andrieu, C., A. Doucet, et al. (2004). "Particle Methods for Change Detection,
System Identification, and Control." Proceedings of the IEEE 92(3): 423-
438.
Arulampalam, M. S., S. Maskell, et al. (2002a). "A Tutorial on Particle Filters for
Online Nonlinear/Non-Gaussian Bayesian Tracking." Signal Processing,
IEEE Transactions on 50(2): 174-188.
Arulampalam, M. S., S. Maskell, et al. (2002b). "A Tutorial on Particle Filters for
Online Nonlinear/Non-Gaussian Bayesian Tracking." IEEE Transactions on
Signal Processing 50(2): 174-188.
Banjevic, D. and A. K. S. Jardine (2006). "Calculation of Reliability Function and
Remaining Useful Life for a Markov Failure Time Process." IMA Journal of
Management Mathematics 17(2): 115-130.
Barata, J., C. G. Soares, et al. (2002). "Simulation Modelling of Repairable Multi-
Component Deteriorating Systems for 'on Condition' Maintenance
Optimisation." Reliability Engineering & System Safety 76(3): 255-264.
Bertsekas, D. P. (2005). Dynamic Programming and Optimal Control. Belmont,
Mass., Athena Scientific.
Black, M., A. T. Brint, et al. (2005). "A Semi-Markov Approach for Modelling
Asset Deterioration." The Journal of the Operational Research Society
56(11): 1241.
Blischke, W. R. and D. N. P. Murthy (2000). Reliability : Modeling, Prediction, and
Optimization. New York, Wiley.
Bris, R., E. Châtelet, et al. (2003). "New Method to Minimize the Preventive
Maintenance Cost of Series-Parallel Systems." Reliability Engineering &
System Safety 82(3): 247-255.
Brooks, A., A. Makarenko, et al. (2006). "Parametric POMDPs for Planning in
Continuous State Spaces." Robotics and Autonomous Systems 54(11): 887-
897.
Brooks, A. and S. B. Williams (2007). A Monte Carlo Update for Parametric
POMDPs. International Symposium on Research Robotics.
Bunks, C., D. McCarthy, et al. (2000). "Condition-Based Maintenance of Machines
Using Hidden Markov Models." Mechanical Systems and Signal Processing
14(4): 597-612.
Cadini, F., E. Zio, et al. (2009). "Model-Based Monte Carlo State Estimation for
Condition-Based Component Replacement." Reliability Engineering &
System Safety 94(3): 752-758.
Cassandra, A. R., M. L. Littman, et al. (1997). Incremental Pruning: A Simple, Fast,
Exact Method for Partially Observable Markov Decision Processes.
Uncertainty in Artificial Intelligence (UAI).
Castanier, B., A. Grall, et al. (2005). "A Condition-Based Maintenance Policy with
Non-Periodic Inspections for a Two-Unit Series System." Reliability
Engineering & System Safety 87(1): 109-120.
Cavanaugh, J. and R. Shumway (1997). "A Bootstrap Variant of AIC for State-Space
Model Selection." Statistica Sinica 7: 473-496.
Chan, G. K. and S. Asgarpoor (2006). "Optimum Maintenance Policy with Markov
Processes." Electric Power Systems Research 76(6-7): 452-456.
Chen, D. and K. S. Trivedi (2005). "Optimization for Condition-Based Maintenance
with Semi-Markov Decision Process." Reliability Engineering & System
Safety 90(1): 25-29.
Chopin, N. (2002). "A Sequential Particle Filter Method for Static Models."
Biometrika 89(3): 539-551.
Christer, A. H. and W. Wang (1995). "A Simple Condition Monitoring Model for a
Direct Monitoring Process." European Journal of Operational Research
82(2): 258-269.
Christer, A. H., W. Wang, et al. (1997). "A State Space Condition Monitoring Model
for Furnace Erosion Prediction and Replacement." European Journal of
Operational Research 101(1): 1-14.
Cinlar, E., E. Osman, et al. (1977). "Stochastic Process for Extrapolating Concrete
Creep." Journal of the Engineering Mechanics Division 103(6): 1069-1088.
Cox, D. R. (1972). "Regression Models and Life-Tables." Journal of the Royal
Statistical Society. Series B (Methodological) 34(2): 187-220.
Crowder, M. and J. Lawless (2007). "On a Scheme for Predictive Maintenance."
European Journal of Operational Research 176(3): 1713-1722.
Dempster, A. P., N. M. Laird, et al. (1977). "Maximum Likelihood from Incomplete
Data Via the EM Algorithm." Journal of the Royal Statistical Society. Series
B (Methodological) 39(1): 1-38.
Doucet, A., S. Godsill, et al. (2000). "On Sequential Monte Carlo Sampling Methods
for Bayesian Filtering." Statistics and Computing 10(3): 197-208.
Doucet, A., S. J. Godsill, et al. (2002). "Marginal Maximum a Posteriori Estimation
Using Markov Chain Monte Carlo." Statistics and Computing 12(1): 77-84.
Doucet, A. and V. Tadić (2003). "Parameter Estimation in General State-Space
Models Using Particle Methods." Annals of the Institute of Statistical
Mathematics 55(2): 409-422.
Frangopol, D. M., M.-J. Kallen, et al. (2004). "Probabilistic Models for Life-Cycle
Performance of Deteriorating Structures: Review and Future Directions."
Steel Construction 6(4): 197-212.
Garcia Marquez, F. P., D. J. Pedregal Tercero, et al. (2007). "Unobserved
Component Models Applied to the Assessment of Wear in Railway Points: A
Case Study." European Journal of Operational Research 176(3): 1703-1712.
Ge, M., R. Du, et al. (2004). "Hidden Markov Model Based Fault Diagnosis for
Stamping Processes." Mechanical Systems and Signal Processing 18(2): 391-
408.
Ghasemi, A., S. Yacout, et al. (2008). "Optimal Strategies for Non-Costly and Costly
Observations in Condition Based Maintenance." IAENG International
Journal of Applied Mathematics 38(2).
Gibson, S. and B. Ninness (2005). "Robust Maximum-Likelihood Estimation of
Multivariable Dynamic Systems." Automatica 41(10): 1667-1682.
Godsill, S. J., A. Doucet, et al. (2004). "Monte Carlo Smoothing for Nonlinear Time
Series." Journal of the American Statistical Association 99(465): 156.
Goode, K. B., J. Moore, et al. (2000). "Plant Machinery Working Life Prediction
Method Utilizing Reliability and Condition-Monitoring Data." Proceedings
of the Institution of Mechanical Engineers 214(2): 109.
Grall, A., C. Berenguer, et al. (2002). "A Condition-Based Maintenance Policy for
Stochastically Deteriorating Systems." Reliability Engineering & System
Safety 76(2): 167-180.
Grosfeld-Nir, A. (2007). "Control Limits for Two-State Partially Observable
Markov Decision Processes." European Journal of Operational Research
182(1): 300-304.
Hashemi, R., H. Jacqmin-Gadda, et al. (2003). "A Latent Process Model for Joint
Modeling of Events and Marker." Lifetime Data Analysis 9(4): 331-343.
Heng, A., A. C. C. Tan, et al. (2009). "Intelligent Condition-Based Prediction of
Machinery Reliability." Mechanical Systems and Signal Processing 23(5):
1600-1614.
Hontelez, J. A. M., H. H. Burger, et al. (1996). "Optimum Condition-Based
Maintenance Policies for Deteriorating Systems with Partial Information."
Reliability Engineering & System Safety 51(3): 267-274.
Hopp, W. J. and Y.-L. Kuo (1998). "An Optimal Structured Policy for Maintenance
of Partially Observable Aircraft Engine Components." Naval Research
Logistics 45(4): 335-352.
Huitian, L., W. J. Kolarik, et al. (2001). "Real-Time Performance Reliability
Prediction." Reliability, IEEE Transactions on 50(4): 353-357.
Ilgin, M. and S. Tunali (2007). "Joint Optimization of Spare Parts Inventory and
Maintenance Policies Using Genetic Algorithms." The International Journal
of Advanced Manufacturing Technology 34(5): 594-604.
Jacquier, E., M. Johannes, et al. (2007). "MCMC Maximum Likelihood for Latent
State Models." Journal of Econometrics 137(2): 615-640.
Jardine, A. K. S., D. Lin, et al. (2006). "A Review on Machinery Diagnostics and
Prognostics Implementing Condition-Based Maintenance." Mechanical
Systems and Signal Processing 20(7): 1483-1510.
Jiang, R. and A. K. S. Jardine (2006). "Composite Scale Modeling in the Presence of
Censored Data." Reliability Engineering & System Safety 91(7): 756-764.
Jiang, R. and A. K. S. Jardine (2008). "Health State Evaluation of an Item: A
General Framework and Graphical Representation." Reliability Engineering
& System Safety 93(1): 89-99.
Jie, Y., T. Kirubarajan, et al. (2000). "A Hidden Markov Model-Based Algorithm for
Fault Diagnosis with Partial and Imperfect Tests." Systems, Man, and
Cybernetics, Part C: Applications and Reviews, IEEE Transactions on 30(4):
463-473.
Julier, S. J. and J. K. Uhlmann (1997). A New Extension of the Kalman Filter to
Nonlinear Systems. Int. Symp. Aerospace/Defense Sensing, Simul. and
Controls: 182-193.
Kaelbling, L. P., M. L. Littman, et al. (1998). "Planning and Acting in Partially
Observable Stochastic Domains." Artificial Intelligence 101(1-2): 99-134.
Kallen, M. J. and J. M. Van Noortwijk (2005). "Optimal Maintenance Decisions
under Imperfect Inspection." Reliability Engineering & System Safety 90(2-
3): 177-185.
Khan, M. E. and D. N. Dutt (2007). "An Expectation-Maximization Algorithm
Based Kalman Smoother Approach for Event-Related Desynchronization
(ERD) Estimation from EEG." Biomedical Engineering, IEEE Transactions on
54(7): 1191-1198.
Kim, J. (2005). Parameter Estimation in Stochastic Volatility Models with Missing
Data Using Particle Methods and the EM Algorithm. United States --
Pennsylvania, University of Pittsburgh.
Klaas, M., M. Briers, et al. (2006). Fast Particle Smoothing: If I Had a Million
Particles. Proceedings of the 23rd international conference on Machine
learning. Pittsburgh, Pennsylvania, ACM.
Kobbacy, K. A. H., B. B. Fawzi, et al. (1997). "A Full History Proportional Hazards
Model for Preventive Maintenance Scheduling." Quality and Reliability
Engineering International 13(4): 187-198.
Kravdal, Ø. (1997). "The Attractiveness of an Additive Hazard Model: An Example
from Medical Demography." European Journal of Population/Revue
européenne de Démographie 13(1): 33-47.
Kumar, D. and U. Westberg (1996). "Proportional Hazards Modeling of Time-
Dependent Covariates Using Linear Regression: A Case Study [Mine Power
Cable Reliability]." Reliability, IEEE Transactions on 45(3): 386-392.
Kumar, D. and U. Westberg (1997). "Maintenance Scheduling under Age
Replacement Policy Using Proportional Hazards Model and TTT-Plotting."
European Journal of Operational Research 99(3): 507-515.
Lawless, J. and M. Crowder (2004). "Covariates and Random Effects in a Gamma
Process Model with Application to Degradation and Failure." Lifetime Data
Analysis 10(3): 213-227.
Lee, M.-L. T. and G. A. Whitmore (2006). "Threshold Regression for Survival
Analysis: Modeling Event Times by a Stochastic Process Reaching a
Boundary." Statistical Science 21(4): 501-513.
Lee, M.-L. T., G. A. Whitmore, et al. (2004). "Assessing Lung Cancer Risk in
Railroad Workers Using a First Hitting Time Regression Model."
Environmetrics 15(5): 501-512.
Li, W. and H. Pham (2005). "An Inspection-Maintenance Model for Systems with
Multiple Competing Processes." Reliability, IEEE Transactions on 54(2):
318-327.
Liao, H., E. A. Elsayed, et al. (2006a). "Maintenance of Continuously Monitored
Degrading Systems." European Journal of Operational Research 175(2): 821-
835.
Liao, H., W. Zhao, et al. (2006b). Predicting Remaining Useful Life of an Individual
Unit Using Proportional Hazards Model and Logistic Regression Model.
Reliability and Maintainability Symposium, 2006. RAMS '06. Annual: 127-
132.
Lin, D., D. Banjevic, et al. (2006). "Using Principal Components in a Proportional
Hazards Model with Applications in Condition-Based Maintenance." The
Journal of the Operational Research Society 57(8): 910.
Lin, D. Y. and Z. Ying (1994). "Semiparametric Analysis of the Additive Risk
Model." Biometrika 81(1): 61-71.
Lin, D. Y. and Z. Ying (1995). "Semiparametric Analysis of General Additive-
Multiplicative Hazard Models for Counting Processes." The Annals of
Statistics 23(5): 1712-1734.
Logan, B. T. and A. J. Robinson (1997). Enhancement and Recognition of Noisy
Speech within an Autoregressive Hidden Markov Model Framework Using
Noise Estimates from the Noisy Signal. 1997 IEEE International Conference on
Acoustics, Speech, and Signal Processing (ICASSP-97). 2: 843-846.
Lu, S., H. Lu, et al. (2001). "Multivariate Performance Reliability Prediction in
Real-Time." Reliability Engineering & System Safety 72(1): 39-45.
Maillart, L. M. (2006). "Maintenance Policies for Systems with Condition
Monitoring and Obvious Failures." IIE Transactions 38: 463-475.
Makis, V. and A. K. S. Jardine (1992). "Optimal Replacement in the Proportional
Hazards Model." INFOR 30(2): 172-183.
Makis, V. and X. Jiang (2003). "Optimal Replacement under Partial Observations."
Mathematics of Operations Research 28(2): 382.
Makis, V., J. Wu, et al. (2006). "An Application of DPCA to Oil Data for CBM
Modeling." European Journal of Operational Research 174(1): 112-123.
Mani, G., D. Wolfe, et al. (2008). Slurry Pump Wear Assessment through Vibration
Monitoring. WCEAM-IMS 2008. Beijing, China, Springer-Verlag London
Ltd: 1068-1076.
Marseguerra, M., E. Zio, et al. (2002). "Condition-Based Maintenance Optimization
by Means of Genetic Algorithms and Monte Carlo Simulation." Reliability
Engineering & System Safety 77(2): 151-165.
McKeague, I. W. and P. D. Sasieni (1994). "A Partly Parametric Additive Risk
Model." Biometrika 81(3): 501-514.
Miao, Q. (2005). Application of Wavelets and Hidden Markov Model in Condition-
Based Maintenance. Canada, University of Toronto (Canada).
Mohanta, D. K., P. K. Sadhu, et al. (2007). "Deterministic and Stochastic Approach
for Safety and Reliability Optimization of Captive Power Plant Maintenance
Scheduling Using GA/SA-Based Hybrid Techniques: A Comparison of
Results." Reliability Engineering & System Safety 92(2): 187-199.
Monahan, G. E. (1982). "A Survey of Partially Observable Markov Decision
Processes: Theory, Models, and Algorithms." Management Science 28(1): 1-
16.
Morcous, G. (2006). "Performance Prediction of Bridge Deck Systems Using
Markov Chains." Journal of Performance of Constructed Facilities 20(2):
146-155.
Moustafa, M. S., E. Y. A. Maksoud, et al. (2004). "Optimal Major and Minimal
Maintenance Policies for Deteriorating Systems." Reliability Engineering &
System Safety 83(3): 363-368.
Muñoz, A., S. Martorell, et al. (1997). "Genetic Algorithms in Optimizing
Surveillance and Maintenance of Components." Reliability Engineering &
System Safety 57(2): 107-120.
Olsson, J., O. Cappé, et al. (2008). "Sequential Monte Carlo Smoothing with
Application to Parameter Estimation in Nonlinear State Space." Bernoulli
14(1): 155-179.
Orchard, M., G. Kacprzynski, et al. (2009). Advances in Uncertainty Representation
and Management for Particle Filtering Applied to Prognostics. Applications
of Intelligent Control to Engineering Systems: 23-35.
Park, C. and W. Padgett (2005a). "Accelerated Degradation Models for Failure
Based on Geometric Brownian Motion and Gamma Processes." Lifetime
Data Analysis 11(4): 511-527.
Park, C. and W. J. Padgett (2005b). "New Cumulative Damage Models for Failure
Using Stochastic Processes as Initial Damage." Reliability, IEEE
Transactions on 54(3): 530-540.
Park, C. and W. J. Padgett (2006). "Stochastic Degradation Models with Several
Accelerating Variables." Reliability, IEEE Transactions on 55(2): 379-390.
Park, K. S. (1988). "Optimal Continuous-Wear Limit Replacement under Periodic
Inspections." Reliability, IEEE Transactions on 37(1): 97-102.
Porta, J. M., M. T. J. Spaan, et al. (2005). Robot Planning in Partially Observable
Continuous Domains. Robotics: Science and Systems I. Cambridge,
Massachusetts.
Prasad, P. V. N. and K. R. M. Rao (2002). Reliability Models of Repairable Systems
Considering the Effect of Operating Conditions. Reliability and
Maintainability Symposium, 2002. Proceedings. Annual: 503-510.
Proust-Lima, C. and L. L. H. Jacqmin-Gadda (2007). "A Nonlinear Latent Class
Model for Joint Analysis of Multivariate Longitudinal Data and a Binary
Outcome." Statistics in Medicine 26(10): 2229-2245.
Proust, C., H. Jacqmin-Gadda, et al. (2006). "A Nonlinear Model with Latent
Process for Cognitive Evolution Using Multivariate Longitudinal Data."
Biometrics 62(4): 1014-1024.
Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic
Programming. Hoboken, N.J., Wiley-Interscience.
Ross, S. M. (1971). "Quality Control under Markovian Deterioration." Management
Science 17(9): 587-596.
Ross, S. M. (1996). Stochastic Processes. New York, Wiley.
Schön, T., A. Wills, et al. (2006). Maximum Likelihood Nonlinear System
Estimation. Proceedings of the 14th IFAC Symposium on System Identification.
Schwarz, G. (1978). "Estimating the Dimension of a Model." The Annals of
Statistics 6(2): 461-464.
Shiroishi, J., Y. Li, et al. (1997). "Bearing Condition Diagnostics Via Vibration and
Acoustic Emission Measurements." Mechanical Systems and Signal
Processing 11(5): 693-705.
Singpurwalla, N. D. (1995). "Survival in Dynamic Environments." Statistical
Science 10(1): 86-103.
Singpurwalla, N. D. (2006). Reliability and Risk: A Bayesian Perspective. New
York, J. Wiley & Sons.
Sondik, E. J. (1978). "The Optimal Control of Partially Observable Markov
Processes over the Infinite Horizon: Discounted Costs." Operations Research
26(2): 282-304.
Stathopoulos, A. and M. G. Karlaftis (2003). "A Multivariate State Space Approach
for Urban Traffic Flow Modeling and Prediction." Transportation Research
Part C: Emerging Technologies 11(2): 121-135.
Stavropoulos, C. N. and S. D. Fassois (2000). "Non-Stationary Functional Series
Modeling and Analysis of Hardware Reliability Series: A Comparative Study
Using Rail Vehicle Interfailure Times." Reliability Engineering & System
Safety 68(2): 169-183.
Sun, Y., L. Ma, et al. (2006). "Mechanical Systems Hazard Estimation Using
Condition Monitoring." Mechanical Systems and Signal Processing 20(5):
1189-1201.
Thrun, S. (2000). "Monte Carlo POMDPs." Advances in Neural Information
Processing Systems 12: 1064-1070.
Tijms, H. C. and F. A. van der Duyn Schouten (1985). "A Markov Decision
Algorithm for Optimal Inspections and Revisions in a Maintenance System
with Partial Information." European Journal of Operational Research 21(2):
245-253.
Torben, M. and H. S. Thomas (2002). "A Flexible Additive Multiplicative Hazard
Model." Biometrika 89(2): 283.
Van Der Merwe, R., A. Doucet, et al. (2000). The Unscented Particle Filter.
Advances in Neural Information Processing Systems.
van Noortwijk, J. M. (2009). "A Survey of the Application of Gamma Processes in
Maintenance." Reliability Engineering & System Safety 94(1): 2-21.
Vlok, P. J., J. L. Coetzee, et al. (2002). "Optimal Component Replacement Decisions
Using Vibration Monitoring and the Proportional-Hazards Model." The
Journal of the Operational Research Society 53(2): 193-202.
Wang, P. and D. W. Coit (2004). Reliability Prediction Based on Degradation
Modeling for Systems with Multiple Degradation Measures. Reliability and
Maintainability, 2004 Annual Symposium - RAMS: 302-307.
Wang, R. C. (1976). "Computing Optimal Quality Control Policies: Two Actions."
Journal of Applied Probability 13(4): 826-832.
Wang, W. (2002). "A Model to Predict the Residual Life of Rolling Element
Bearings Given Monitored Condition Information to Date." IMA Journal of
Management Mathematics 13(1): 3.
Wang, W. (2003a). "An Evaluation of Some Emerging Techniques for Gear Fault
Detection." Structural Health Monitoring 2(3): 225-242.
Wang, W. (2003b). "Modelling Condition Monitoring Intervals: A Hybrid of
Simulation and Analytical Approaches." The Journal of the Operational
Research Society 54(3): 273.
Wang, W. (2006). "Modelling the Probability Assessment of System State Prognosis
Using Available Condition Monitoring Information." IMA Journal of
Management Mathematics 17(3): 225.
Wang, W. (2007). "A Prognosis Model for Wear Prediction Based on Oil-Based
Monitoring." Journal of the Operational Research Society 58: 887-893.
Wang, W. (2009). "An Inspection Model for a Process with Two Types of
Inspections and Repairs." Reliability Engineering & System Safety 94(2):
526-533.
Wang, W. and A. H. Christer (2000). "Towards a General Condition Based
Maintenance Model for a Stochastic Dynamic System." The Journal of the
Operational Research Society 51(2): 145-155.
Wang, W., P. A. Scarf, et al. (2000). "On the Application of a Model of Condition-
Based Maintenance." The Journal of the Operational Research Society
51(11): 1218.
Wang, W. and A. K. Wong (2002). "Autoregressive Model-Based Gear Fault
Diagnosis." Journal of Vibration and Acoustics 124(2): 172-179.
Wang, W. and W. Zhang (2005). "A Model to Predict the Residual Life of Aircraft
Engines Based Upon Oil Analysis Data." Naval Research Logistics 52(3):
276-284.
White, C. C., III (1978). "Optimal Inspection and Repair of a Production Process
Subject to Deterioration." The Journal of the Operational Research Society
29(3): 235-243.
White, C. C., III (1979). "Bounds on Optimal Cost for a Replacement Problem with
Partial Observations." Naval Research Logistics Quarterly 26(3): 415-422.
Whitmore, G. and F. Schenkelberg (1997). "Modelling Accelerated Degradation
Data Using Wiener Diffusion with a Time Scale Transformation." Lifetime
Data Analysis 3(1): 27-45.
Whitmore, G. A., M. J. Crowder, et al. (1998). "Failure Inference from a Marker
Process Based on a Bivariate Wiener Model." Lifetime Data Analysis 4(3):
229-251.
Wills, A., T. B. Schön, et al. (2008). Parameter Estimation for Discrete-Time
Nonlinear Systems Using EM. 17th IFAC World Congress. COEX, Korea.
Wu, C. F. J. (1983). "On the Convergence Properties of the EM Algorithm." The
Annals of Statistics 11(1): 95-103.
Xu, D. and W. Zhao (2005). Reliability Prediction Using Multivariate Degradation
Data. Reliability and Maintainability Symposium, 2005. Proceedings.
Annual: 337-341.
Yashin, A. I., K. G. Arbeev, et al. (2007). "Stochastic Model for Analysis of
Longitudinal Data on Aging and Mortality." Mathematical Biosciences
208(2): 538-551.
Yashin, A. I. and K. G. Manton (1997). "Effects of Unobserved and Partially
Observed Covariate Processes on System Failure: A Review of Models and
Estimation Strategies." Statistical Science 12(1): 20-34.
Yu, B. M., K. V. Shenoy, et al. (2004). Derivation of Kalman Filtering and
Smoothing Equations, Department of Electrical Engineering Stanford
University.
Yuan, X. (2007). Stochastic Modeling of Deterioration in Nuclear Power Plant
Components. Civil and Environmental Engineering. Waterloo, University of
Waterloo.
Zeng, D., G. Yin, et al. (2005). "Inference for a Class of Transformed Hazards
Models." Journal of the American Statistical Association 100(471): 1000.
Zhou, E., M. C. Fu, et al. (to appear). "Solving Continuous-State POMDPs Via
Density Projection." IEEE Transactions on Automatic Control.
Zhou, J. (2007). Joint Decision Making on Preventive Maintenance and
Reconfiguration in Complex Manufacturing Systems. Michigan, United States,
University of Michigan.
Zuashkiani, A., D. Banjevic, et al. (2006). Incorporating Expert Knowledge When
Estimating Parameters of the Proportional Hazards Model. Reliability and
Maintainability Symposium, 2006. RAMS '06. Annual: 402-408.
Zuo, M. J., R. Jiang, et al. (1999). "Approaches for Reliability Modeling of
Continuous-State Devices." Reliability, IEEE Transactions on 48(1): 9-18.
8 Appendix
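The derivation of the conditional PDF of the underlying health state $x_t$
given the health states $x_{t_1}$ and $x_{t_2}$, where $t_1 < t < t_2$:
According to Bayesian theory and Markovian property, the conditional PDF can be
calculated as:
\[
p(x_t \mid x_{t_1}, x_{t_2})
= \frac{p(x_t, x_{t_2} \mid x_{t_1})}{p(x_{t_2} \mid x_{t_1})}
= \frac{p(x_t \mid x_{t_1})\, p(x_{t_2} \mid x_t)}{p(x_{t_2} \mid x_{t_1})},
\tag{8-1}
\]
where
\[
p(x_t \mid x_{t_1}) = \mathrm{Ga}\big(x_t - x_{t_1};\ \alpha (t - t_1),\ \beta\big),
\tag{8-2}
\]
\[
p(x_{t_2} \mid x_t) = \mathrm{Ga}\big(x_{t_2} - x_t;\ \alpha (t_2 - t),\ \beta\big),
\tag{8-3}
\]
and
\[
p(x_{t_2} \mid x_{t_1}) = \mathrm{Ga}\big(x_{t_2} - x_{t_1};\ \alpha (t_2 - t_1),\ \beta\big).
\tag{8-4}
\]
The $p(x_t \mid x_{t_1}, x_{t_2})$ can be finally obtained as
\[
p(x_t \mid x_{t_1}, x_{t_2})
= \frac{1}{x_{t_2} - x_{t_1}}\,
\mathrm{Be}\!\left(\frac{x_t - x_{t_1}}{x_{t_2} - x_{t_1}};\ \alpha (t - t_1),\ \alpha (t_2 - t)\right).
\tag{8-5}
\]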
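The reduction from the ratio of Gamma densities in (8-1) to the Beta form in (8-5) can be checked numerically. The following is a minimal sketch, not the thesis implementation: it assumes the increments of the health state are Gamma distributed with a common scale and with shapes `a1` (from the earlier state to the current one) and `a2` (from the current state to the later one). All function names and numerical values are hypothetical and chosen only for illustration.

```python
import math
import random


def gamma_pdf(z, shape, scale):
    """Gamma PDF with the given shape and scale, evaluated at z."""
    if z <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(z) - z / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def beta_pdf(u, a, b):
    """Beta(a, b) PDF evaluated at u in (0, 1)."""
    if not 0 < u < 1:
        return 0.0
    log_pdf = ((a - 1) * math.log(u) + (b - 1) * math.log(1 - u)
               + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    return math.exp(log_pdf)


def bridge_pdf_bayes(x, x1, x2, a1, a2, scale):
    """Conditional PDF as the ratio of Gamma densities (Bayes + Markov)."""
    numerator = gamma_pdf(x - x1, a1, scale) * gamma_pdf(x2 - x, a2, scale)
    return numerator / gamma_pdf(x2 - x1, a1 + a2, scale)


def bridge_pdf_beta(x, x1, x2, a1, a2):
    """Same conditional PDF written as a rescaled Beta density."""
    span = x2 - x1
    return beta_pdf((x - x1) / span, a1, a2) / span


def sample_bridge(x1, x2, a1, a2, rng=random):
    """Draw an intermediate state given both end states."""
    return x1 + (x2 - x1) * rng.betavariate(a1, a2)


if __name__ == "__main__":
    # Hypothetical values: end states 3.0 and 6.0, shapes 3.0 and 3.0, scale 0.5.
    for x in (3.5, 4.2, 5.7):
        print(x, bridge_pdf_bayes(x, 3.0, 6.0, 3.0, 3.0, 0.5),
              bridge_pdf_beta(x, 3.0, 6.0, 3.0, 3.0))
    print(sample_bridge(3.0, 6.0, 3.0, 3.0))
```

The two density functions agree for any scale, because the scale cancels in the ratio. This is why the intermediate state can be drawn directly from a Beta distribution without reference to the scale parameter.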
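The derivation of the conditional PDF of underlying health states for
censored data, where $t_c$ is the censored time and $t_1 < t < t_c$:
\[
\begin{aligned}
p(x_t \mid x_{t_1},\, x_{t_c} < \Lambda)
&= \frac{p(x_t,\, x_{t_c} < \Lambda \mid x_{t_1})}{\Pr(x_{t_c} < \Lambda \mid x_{t_1})} \\
&= \frac{p(x_t \mid x_{t_1})\, \Pr(x_{t_c} < \Lambda \mid x_t)}{\Pr(x_{t_c} < \Lambda \mid x_{t_1})} \\
&= \mathrm{Ga}\big(x_t - x_{t_1};\ \alpha (t - t_1),\ \beta\big)
\cdot \frac{1 - \Gamma\big(\alpha (t_c - t),\ (\Lambda - x_t)/\beta\big) / \Gamma\big(\alpha (t_c - t)\big)}
{1 - \Gamma\big(\alpha (t_c - t_1),\ (\Lambda - x_{t_1})/\beta\big) / \Gamma\big(\alpha (t_c - t_1)\big)},
\end{aligned}
\]
where $\Gamma(\cdot,\cdot)$ is the incomplete Gamma function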