Analyzing Transformer Replacement Policies: A Simulation Approach to
Reducing Failure Risk
Daniel P. Chen
Advisor: Professor Warren B. Powell
Submitted in partial fulfillment
of the requirements for the degree of
Bachelor of Science in Engineering
Department of Operations Research and Financial Engineering
Princeton University
April 14th, 2014
I hereby declare that I am the sole author of this thesis.
I authorize Princeton University to lend this thesis to other institutions or individuals
for the purpose of scholarly research.
Daniel P. Chen
I further authorize Princeton University to reproduce this thesis by photocopying or
by other means, in total or in part, at the request of other institutions or individuals
for the purpose of scholarly research.
Daniel P. Chen
Abstract
PSE&G and utilities nationwide face a considerable amount of operational and financial risk from the possibility of widespread transformer failure. The current policy that PSE&G uses for transformer replacement does not replace transformers until they are close to failure and is not sufficient to protect PSE&G from significant failure risk. This paper implements several replacement policies to reduce failure risk. It focuses on policies that utilize both chronological age and the number of faults experienced as criteria for replacement. The results show that these policies are effective at reducing failure risk while incurring significant opportunity costs. The final part of this paper explores the trade-off between failure risk and opportunity cost in order to inform the future decisions of the utility.
Acknowledgements
First and foremost, I would like to thank my advisor, Professor Warren Powell, without whom this thesis would not have been possible. Thank you for introducing me to the problem of transformer replacement as well as consistently pushing me to explore the problem in new and interesting ways. I am grateful for your constant encouragement, especially during the many times I came to you with what I thought were unsolvable obstacles throughout this process.
I would also like to thank Richard Wernsing and Angela Rothweiler from the Asset Strategy team at PSE&G. Thank you for taking the time to respond to the many inquiries I had and helping me understand the basics of operating a utility company. I hope that the results of this thesis will be half as helpful to you as you have been to me.
I would be remiss if I did not acknowledge the many friends who have not only made this thesis process bearable but also provided an incredible source of support and hilarity throughout these past 4 years. Thank you for putting up with me.
To the Princeton Tower Club and members of the Centennial Room, for always providing lively dinner table conversation and an endless supply of coffee. To the ORF crew, for ensuring that I was never alone throughout my many late nights and supplying much needed distractions to keep me from being too productive. And to the gentlemen of Dod 1S, for being the best roommates a guy could ask for. I can only hope that we will get the chance to continue our adventures after graduation.
Specifically, I would also like to thank Ashley Chiang, Shreya Nathan, Medha Ranka, and Satyajeet Pal for their help with the editing process. Thank you for taking the time out of your busy schedules to read my thesis.
And finally, to my brother, parents, and grandparents who have made everything at Princeton possible. Thank you for always being there for me throughout my life and believing that I can achieve anything I set my mind to.
To Mom, Dad, and Patrick
Contents
1 Introduction
1.1 PSE&G
1.1.1 The Grid
1.2 Literature Review
1.2.1 Transformer Lifetime
1.2.2 Transformer Aging
1.2.3 Handling Transformer Failure
1.3 Information from Diagnostics
1.4 Overview of Thesis
2 The Stochastic Model
2.1 Assumptions
2.1.1 Transformer Location Correlation
2.1.2 Aging Process
2.1.3 Failure Threshold
2.1.4 Failure and Replacement
2.2 The Unobservable Model
2.2.1 Initialization of Model
2.2.2 The State Variable
2.2.3 Exogenous Information
2.2.4 Transition Functions
3 Model Selection
3.1 Transformer Location Initialization
3.2 Number of Faults Per Year
3.2.1 The Poisson Distribution
3.2.2 The Data
3.2.3 Fitting the Poisson Distribution
3.2.4 The Negative Binomial Distribution
3.3 Location of Faults
3.4 Transformer Correlation
3.5 Failure Times and Fault Magnitude
3.5.1 Methodology
3.5.2 The Base Model
3.5.3 Failure Times
3.5.4 Magnitude of Faults
3.5.5 Parameter Selection
4 Policies
4.1 The Observable Model
4.1.1 The State Variable
4.1.2 Decision Variables
4.1.3 Exogenous Information
4.1.4 Transition Functions
4.1.5 Objective Function
4.2 Base Policy
4.2.1 Simulating DGA Testing
4.3 Pure Aging Policy
4.4 Variance Reduction Policy
4.5 Threshold Policy
4.6 Lookahead Policy
4.6.1 Overview of Lookahead Policies
4.6.2 1 year
4.6.3 2 year
4.7 Policies To Estimate the Value of Information
4.7.1 Lookahead Plus Policy
4.7.2 Perfect Information Policy
5 Results
5.1 Evaluation Metrics
5.1.1 Opportunity Cost
5.1.2 Cost of Failure
5.1.3 Empirical Objective Function
5.1.4 Failure Risk vs. Opportunity Cost per Replacement
5.2 Comparing Policy Performance
5.3 Policy Specific Results
5.3.1 Base Policy
5.3.2 Pure Aging Policy
5.3.3 Variance Reduction Policy
5.3.4 Threshold Policy
5.3.5 Lookahead Policy
5.4 Comparison Across Different Risk Measures
5.4.1 Case Study: Zero Risk Tolerance
5.5 Limitations of Policies
5.5.1 Chronological Age vs. Weighted Ranking
5.5.2 Variance in Fault Magnitude
5.5.3 Expected Amount of Aging Per Year
5.6 Value of Information
5.6.1 Estimating the Value of Information
5.6.2 Impact of Measurement Noise
6 Conclusion
6.1 Areas for Further Research
6.2 Final Remarks
List of Figures
1.1 A map of PSE&G’s service area
1.2 PSE&G’s Maintenance Policies
1.3 Summary of gas concentrations from DGA (Hamrick, 2009)
1.4 Gas concentration over time from Karlsson (2007)
1.5 The basic structure of the U.S. electric grid (Council of Economic Advisers, 2013)
3.1 Histogram showing the ages of current transformers
3.2 Historical number of unique incidents per year
3.3 Comparison of the Poisson and Negative Binomial distributions. The green line shows the p.m.f. of the fitted Negative Binomial distribution and the blue line shows the p.m.f. of the fitted Poisson distribution
3.4 Distribution of the number of transformers affected per fault on average
3.5 The number of failures per year in the Base Model
3.6 Sample paths of failure rates using the fitted stochastic model
5.1 Values of the objective function of policies with different values for the tunable parameters. The orange line represents the minimum value across all policies and the labels above each policy represent the minimum value within each policy.
5.2 The best and worst values of the objective function within each policy, with costs broken down by opportunity cost and failure cost
5.3 The relationship between the number of replacements made in the time interval t = [70, 120] and the opportunity cost per replacement compared across policies
5.4 A sample path of the Base policy from t = [70, 120]. The blue line represents the number of replacements and the red line represents the number of failures
5.5 Performance of the Pure Aging policy across the 3 different risk measures with different values for α. α decreases from left to right.
5.6 Sample paths of the Pure Aging policy with different values for α
5.7 Performance of the Variance Reduction policy across the 3 different risk measures with β = 0.2 and different values for η. η decreases from left to right.
5.8 Sample paths of the Variance Reduction policy with different values for η
5.9 Performance of the Threshold policy across the 3 different risk measures with β = 0.2 and different values for τ. τ decreases from left to right.
5.10 Sample paths of the Threshold policy with different values for τ
5.11 Performance of the Lookahead policies across the 3 different risk measures with β = 0.2, $\delta_1 = 1$, $\delta_2 = 0.5$ and different values for γ. γ decreases from left to right.
5.12 Performance of different policies in terms of reducing $p_9$ with varying tunable parameters for each policy
5.13 Performance of different policies in terms of reducing $p_{12}$ with varying tunable parameters for each policy
5.14 Performance of different policies in terms of reducing $p_{15}$ with varying tunable parameters for each policy
5.15 Comparison of different policies across risk measures in the scenario of zero risk tolerance where failure risk is completely minimized
5.16 Fitted probability distribution functions of failure values using the two criteria. The x-axis represents the percentage of the mean value
5.17 Fitted probability distribution function of all fault magnitudes
5.18 Fitted probability distribution function of fault magnitudes separated by fault type
5.19 Distribution functions of the annual amount of aging experienced by transformers
5.20 Values of the objective function of policies including policies to estimate the value of information. The top orange line represents the minimum value across all currently feasible policies. The middle orange line represents the minimum value in the Lookahead Plus policy. The bottom orange line represents the minimum value of the Perfect Information policy.
5.21 Performance of the Lookahead Plus policy under different ε on the $p_9$ risk measure. $\delta_p$ increases from left to right.
5.22 Comparison of $P_t^i$ with different values of ε at t = 90 in the same sample path.
5.23 Impact of measurement noise on Lookahead Plus policy performance compared to Threshold policy.
List of Tables
3.1 Parameter selection for correlation matrix
3.2 Summary of best fit parameters
5.1 Results for Base policy
5.2 Results for Pure Aging policy
5.3 Results for Variance Reduction Policy
5.4 Results for Threshold Policy
5.5 Results for Lookahead Policy
5.6 Statistics for the failure values for the chronological age and weighted ranking
5.7 Statistics for the fault magnitude of $m_2$ and $m_3$ faults
5.8 Statistics for the amount of aging per year
5.9 Summary of the value of information
Chapter 1
Introduction
Power transformers are one of the most important components of a power grid:
they play a key role in the distribution of power to homes, factories, and other units
on the grid. From the perspective of the utility, the protection and maintenance
of transformers is of the utmost importance. The failure of a transformer is most
often the initial cause or an exacerbating factor in large power outages. Because
transformers are extremely costly and difficult to replace, utilities are constrained in
the number of transformers that they can replace in a given year. Given the uncertainty
surrounding when a transformer will fail, the utility faces a considerable amount
of risk in the failure of transformers. This thesis formulates a stochastic model for
transformer failures in order to design a policy that minimizes the failure risk faced
by a utility.
1.1 PSE&G
This thesis was inspired by close collaboration with Professor Warren Powell and
the Asset Strategy Team at PSE&G led by Richard Wernsing and Angela Rothweiler.
After initial meetings with the Asset Strategy Team, it became clear that PSE&G was
running a network with a significant amount of uncertainty surrounding the failure of
its transformers. As a regulated utility with tightly controlled resources for management
of its grid, PSE&G must implement an effective policy to prevent transformer
failures in the future. Given these resource constraints, PSE&G is especially susceptible
to high levels of risk. Thus, transformer reliability represents a top priority for
PSE&G.
1.1.1 The Grid
Figure 1.1: A map of PSE&G’s service area
PSE&G operates a network that serves a densely populated area of New Jersey
that stretches from New York City to Philadelphia, shown in Figure 1.1. PSE&G’s
service area is home to more than 60% of New Jersey residents despite occupying less
than 30% of the state’s area. The network consists of 358 transformers in operation,
with a significant number of transformers that are around 50-70 years old. The
primary concern of PSE&G is the failure of a large number of transformers within a
short time frame. Given that more than a third of the transformers in the network
are around 60 years old, the probability of a large number of failures occurring in the
same year is high.
More importantly, the problem of having a sizable cluster of transformers around
the same age is not unique to PSE&G. After World War II, a period of global industrial
growth led to a dramatic increase in energy demand. Between 1950 and
1970, global energy consumption grew from 1 billion MWh to 10 billion MWh (Bartley,
2003). This growth in demand necessitated rapid infrastructure investment and
transformer installation. According to the Council of Economic Advisers, U.S. his-
torical transmission construction reached a peak in 1967, before declining rapidly
through the 1970s and 1980s (Council of Economic Advisers, 2013). Utility compa-
nies nationwide experienced rapid expansion of their networks in a short period of
time, and many now face the same problem as PSE&G.
1.2 Literature Review
1.2.1 Transformer Lifetime
A great deal of literature exists on how to model transformer lifetime in terms
of reliability theory. However, there is no consensus on which model is best
for transformer lifetime. Chen and Egan (2006) formulate transformer retirement
patterns using Iowa Curves (asset retirement patterns based on research at Iowa
State University). Given a family of Iowa Curves, Chen and Egan (2006) use a
Bayesian method and a Perks’ hazard function to select the best-fitting curve, i.e.,
the one with the highest probability.
van Schijndel et al. (2006) explore the concept of an integral transformer lifetime
model. An integral transformer lifetime model is defined as one that accounts for all
relevant degradation mechanisms of all relevant subcomponents. Such a model would
ideally be able to take past degradation data and predict future failure probabilities.
van Schijndel et al. (2006) explore an application to winding insulation degradation,
but stop short of developing a full model. Further research is required on other forms
of degradation and on externally measurable parameters in order to make the model
fully usable.
Hong et al. (2009) develop a statistical procedure for modeling lifetimes of trans-
formers in a system of an unnamed utility company. The procedure recognizes the
difference in lifetime between “New group” and “Old group” transformers and mod-
els them separately. This highlights how the engineering of transformers has changed
over time. Transformers more than 50 years old were built without the aid of
computer simulation models, which led manufacturers to build them with significant
overcapacity. Additionally, Hong et al. (2009) identify certain trends among
different manufacturers. A limitation of the procedure, however, is that the predic-
tion interval is extremely wide, since the model was based on limited data. Hong
et al. (2009) recognize that a better model could be built with more detailed usage
data that is not necessarily collected by utilities.
1.2.2 Transformer Aging
The factors that cause transformers to age are well understood. McNutt (1992)
identifies the 3 main factors that impact transformer aging: temperature, moisture,
and oxygen. These 3 factors weaken the insulation, which can significantly
increase the likelihood of transformer failure. McNutt (1992) also highlights
the importance of minimizing the levels of moisture, oxygen, and heat in extending
transformer lifetime.
Arshad et al. (2004) corroborate the same 3 factors that impact transformer
aging. This paper also identifies the occurrence of through faults (“faults”) that
exert significant thermal and electromagnetic forces that weaken the tensile strength
of transformer insulation. Moreover, Arshad et al. (2004) highlight that moisture
and oxygen penetrate the insulation from their presence in the atmosphere. Thus,
given proper maintenance of the transformer, the aging due to moisture and oxygen
is largely constant over time.
Brown (2007) further emphasizes the role of thermal aging and through faults
in transformer failure. Transformer overload is a major source of thermal aging.
“Through faults cause extreme physical stress on transformer windings,
and are the major cause of transformer failures.” (Brown, 2007, pg. 19-11)
1.2.3 Handling Transformer Failure
The literature on how to handle transformer failures can be divided into 2 categories:
determining the optimal number of spare transformers and determining when
to replace transformers.
Spares
Kogan et al. (1996) identify a procedure to find an optimal number of spares to
keep on hand. Kogan et al. (1996) treat the problem as a standard inventory problem,
and formulate the expected cost as the sum of the cost of nonavailability of a spare
when needed and the cost of owning a spare. In order to accomplish this, Kogan et al.
(1996) assume a Poisson distribution of transformer failures.
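The inventory framing can be illustrated with a short sketch. It assumes that annual failures follow $N \sim \text{Poisson}(\lambda)$, that each spare carries a holding cost `hold`, and that each failure finding no spare incurs a nonavailability cost `short`. These parameters, the function names, and the single-period structure are hypothetical simplifications, not the exact formulation of Kogan et al. (1996). The expected shortfall is computed exactly through the identity $E[(N-s)^+] = \lambda - \sum_{k=0}^{s-1} P(N > k)$.

```python
import math


def poisson_cdf(lam: float, upto: int) -> list[float]:
    """Return [P(N <= 0), ..., P(N <= upto)] for N ~ Poisson(lam)."""
    pmf = math.exp(-lam)  # P(N = 0)
    cdf = 0.0
    values = []
    for k in range(upto + 1):
        if k > 0:
            pmf *= lam / k  # recursive pmf: P(N = k) = P(N = k - 1) * lam / k
        cdf += pmf
        values.append(cdf)
    return values


def expected_annual_cost(spares: int, lam: float, hold: float, short: float) -> float:
    """Cost of owning `spares` units plus expected cost of failures with no spare.

    hold and short are hypothetical per-unit costs (holding and nonavailability).
    """
    cdf = poisson_cdf(lam, max(spares - 1, 0))
    # E[min(N, s)] = sum_{k=0}^{s-1} P(N > k), so E[(N - s)^+] = lam - E[min(N, s)]
    covered = sum(1.0 - cdf[k] for k in range(spares))
    shortfall = lam - covered
    return hold * spares + short * shortfall


def optimal_spares(lam: float, hold: float, short: float, max_spares: int = 50) -> int:
    """Brute-force search over 0..max_spares for the lowest expected annual cost."""
    return min(range(max_spares + 1),
               key=lambda s: expected_annual_cost(s, lam, hold, short))


if __name__ == "__main__":
    # Illustrative numbers only: 2 expected failures per year, shortage cost 10x holding cost.
    print(optimal_spares(lam=2.0, hold=1.0, short=10.0))  # → 4
```

Adding the (s+1)-th spare saves `short` × P(N > s) in expected nonavailability cost at a price of `hold`. The optimum is therefore the first s at which that saving no longer exceeds the holding cost, and the brute-force search simply makes this explicit.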
Chowdhury and Koval (2005) build upon Kogan et al. (1996) to determine a
procedure for finding the optimal number of spares. Failure rates, repair times, and
transformer inventory information are taken as inputs into the model. Chowdhury and
Koval (2005) identify the importance of substation design for the reliability
requirements of a system of transformers. Depending on the redundancy of transformers at
substations in the system, a spare may or may not be needed in the case of failure at
a location with transformer redundancy.
Replacement
Fischer et al. (2010) explore how to determine the sequence of replacement for
transformers. This is done using 2 criteria: condition and default risk. Specifically,
default risk incorporates the transformer’s importance to the rest of the grid. Default
risk is assessed based on the condition of the insulating system, which is measured
through scheduled diagnostic tests on each transformer.
Dixon et al. (2010) analyze when a utility should replace an aging transformer
and highlight the importance of the insulation system in the functioning of a
transformer. Although there exist certain symptoms of imminent failure, there is usually
significant uncertainty in predicting when a transformer will fail. Dixon
et al. (2010) also note the operational differences between older and newer transformers,
observing that older transformers operate cooler and are more robustly designed.
1.3 Information from Diagnostics
In order to gain information on the condition of transformers in its network,
PSE&G regularly conducts maintenance diagnostics on its transformers. The primary
method of diagnostics is “Dissolved Gas Analysis” (DGA). These tests
are conducted on an annual basis on each transformer in the network. Figure 1.2 is
an excerpt from PSE&G’s “Substation Maintenance Manual”.
Figure 1.2: PSE&G’s Maintenance Policies
PSE&G currently uses the results of the DGA to determine when to
replace transformers. However, the limitation of this strategy is that DGA only
detects when a transformer is about to fail. Figure 1.3 from Hamrick (2009) shows the
difference in gas concentrations between a transformer operating under normal conditions
and one operating under failure conditions. From Figure 1.3, it is clear that the
concentrations of several gases differ substantially between the “Action Limits” and
the “Normal Limits.” Moreover, according to Hamrick (2009), the transition from
“Normal Limits” to “Action Limits” occurs rapidly. “Where DGA results include
a sharp increase in key gas concentration levels and/or normal limits have been ex-
ceeded, it is suggested that an additional sample and analysis be performed to confirm
the previous evaluation...Once key gas concentrations have exceeded normal limits,
other analysis techniques should be considered for determining the potential problem
within the transformer” (Hamrick, 2009, pg. 2).
Figure 1.3: Summary of gas concentrations from DGA (Hamrick, 2009)
Karlsson (2007) describes the accumulation of gases in more detail by defining
the “gas rate” G:
$$G = \frac{\text{Gas level}(t_2) - \text{Gas level}(t_1)}{t_2 - t_1}$$
For the majority of a transformer’s lifetime, the gas level increases linearly, so the
gas rate remains constant. At some point, the gas level increases sharply in
a short period of time, and the gas rate becomes extremely high. Figure 1.4 shows how
this phenomenon occurs. Karlsson (2007) indicates that once the gas rate exceeds
a certain threshold, the transformer is assumed to be in a phase of accelerating
wear-out. From this, we can see that DGA results will not indicate a problem
for most of a transformer’s lifetime, until the sudden increase in gas
levels occurs.
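Karlsson’s gas rate is a finite difference between consecutive samples, so a screening rule built on it can be sketched in a few lines. The sample times, gas levels, and the `rate_threshold` value below are hypothetical illustrations. Karlsson (2007) is not quoted here with a numeric threshold, and none should be inferred from this example.

```python
def gas_rates(times: list[float], levels: list[float]) -> list[float]:
    """Gas rate G = (level(t2) - level(t1)) / (t2 - t1) for each pair of consecutive samples."""
    return [
        (g2 - g1) / (t2 - t1)
        for t1, t2, g1, g2 in zip(times, times[1:], levels, levels[1:])
    ]


def first_alarm(times: list[float], levels: list[float], rate_threshold: float):
    """Return the first sample time whose incoming gas rate exceeds rate_threshold, or None."""
    for t2, rate in zip(times[1:], gas_rates(times, levels)):
        if rate > rate_threshold:
            return t2
    return None


if __name__ == "__main__":
    # Hypothetical annual DGA readings: linear growth for years 0-3, then a sharp jump.
    years = [0, 1, 2, 3, 4, 5]
    ppm = [10, 12, 14, 16, 40, 90]
    print(gas_rates(years, ppm))          # → [2.0, 2.0, 2.0, 24.0, 50.0]
    print(first_alarm(years, ppm, 10.0))  # → 4
```

The stretch of constant rates is exactly the period in which DGA reveals little about how far the transformer has aged: the alarm fires only once the sharp increase has already begun.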
Unfortunately, this means that DGA does not give very precise information about
the age of the transformer. Its usefulness lies primarily in its ability to detect when the
transformer is close to failure. It should be noted that the results of DGA require some
interpretation and are not the sole determinant of transformer replacement. However,
for the purposes of this thesis, it is assumed that once DGA results show that there
has been an increase in the gas concentration, the decision is made to replace the
transformer shortly thereafter. Because DGA provides little information
on the interim aging process leading up to a transformer’s failure, annual DGA tests
alone are not sufficient to protect PSE&G from the widespread transformer failure
it is trying to avoid. If too many transformers fail DGA testing in the same year,
PSE&G will not be able to replace them all in such a short time frame. These
constraints create the need for replacement policies that take into account
the aging process over time.
Figure 1.4: Gas concentration over time from Karlsson (2007)
1.4 Overview of Thesis
If PSE&G believes that its current policy for replacement will lead to years
when a significant number of transformer failures occur, what kind of policy should
PSE&G implement to prevent this from occurring? Given PSE&G’s constraint on
the maximum number of transformer replacements per year, it must begin to replace
transformers before they fail DGA testing. A simple policy that is sure to reduce
failure risk is to replace the 10 oldest transformers in the network every year. Although
this would drastically reduce failure risk, it is also extremely inefficient. PSE&G
must consider the opportunity cost of replacing transformers early when evaluating
replacement policies. The literature establishes that DGA testing is effective at
identifying transformers on the brink of failure. Therefore, the problem is to develop a
policy that identifies transformers for replacement in a given year in addition to the
ones that fail DGA testing. The question then becomes: what criteria should be used to
identify these transformers efficiently, so that the opportunity cost of early
replacement is minimized? One criterion is chronological age, where
transformers that exceed a certain chronological age threshold are replaced.
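A chronological-age criterion of this kind can be sketched as follows. The sketch assumes that transformers failing DGA are always replaced first and that any remaining annual capacity goes to the oldest transformers above `age_threshold`. The function name, the priority rule, and the `capacity` parameter are illustrative assumptions. They are neither PSE&G’s rule nor the exact policies defined later in this thesis.

```python
def select_for_replacement(
    ages: list[float],
    failed_dga: set[int],
    age_threshold: float,
    capacity: int,
) -> list[int]:
    """Hypothetical chronological-age rule with a cap on replacements per year.

    DGA failures take priority; remaining slots go to the oldest transformers
    whose chronological age exceeds age_threshold.
    """
    chosen = sorted(failed_dga)[:capacity]
    candidates = sorted(
        (i for i, age in enumerate(ages) if age > age_threshold and i not in failed_dga),
        key=lambda i: ages[i],
        reverse=True,  # oldest first
    )
    for i in candidates:
        if len(chosen) >= capacity:
            break
        chosen.append(i)
    return chosen


if __name__ == "__main__":
    ages = [65, 30, 72, 55, 68]
    print(select_for_replacement(ages, failed_dga={1}, age_threshold=60, capacity=3))
    # → [1, 2, 4]
```

Lowering `age_threshold` pulls more transformers into early replacement. This reduces failure risk but raises the opportunity cost discussed above, which is the trade-off the policies in this thesis are tuned against.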
This thesis introduces the idea of utilizing the occurrence of faults in addition
to chronological age as a criterion for replacement. Although there is a wealth of
literature that explores how to model transformer lifetime, there is very little literature
that incorporates fault information. This gap arises from the difficulty of detecting the
occurrence of a fault and measuring its magnitude. Although PSE&G does not have
fault data, it does have data about outages in the network. After consultation with
the Asset Strategy team, an assumption was made that an outage on a substation
circuit indicates a fault occurrence at the transformers on that circuit. A substation
circuit is shown in green in Figure 1.5. From this assumption, the number of faults
that a transformer has experienced over its lifetime can be found in addition to its
chronological age. Although the severity of each fault is still unobservable to PSE&G,
a preliminary hypothesis is that by utilizing this additional information about the
number of faults, a more efficient replacement policy can be implemented.
Figure 1.5: The basic structure of the U.S. electric grid (Council of Economic Advisers, 2013)
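Under the outage-as-fault assumption, each transformer’s fault count is simply the number of recorded outages on its circuit. A minimal sketch follows, assuming outage records reduced to circuit identifiers and a transformer-to-circuit mapping. These data structures and identifiers are hypothetical, since the format of PSE&G’s outage data is not described here.

```python
from collections import Counter


def fault_counts(outage_circuits: list[str], transformer_circuit: dict[str, str]) -> dict[str, int]:
    """Count faults per transformer, treating every outage on a circuit as a fault
    at each transformer on that circuit."""
    outages_per_circuit = Counter(outage_circuits)
    # Counter returns 0 for circuits with no recorded outages.
    return {t: outages_per_circuit[c] for t, c in transformer_circuit.items()}


if __name__ == "__main__":
    outages = ["C1", "C2", "C1", "C3", "C1"]  # one entry per recorded outage
    circuits = {"T1": "C1", "T2": "C1", "T3": "C2", "T4": "C4"}
    print(fault_counts(outages, circuits))
    # → {'T1': 3, 'T2': 3, 'T3': 1, 'T4': 0}
```

T1 and T2 receive identical counts because they share a circuit. The outage data reveal how many faults each transformer experienced, but not how severe those faults were.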
This thesis approaches the problem in two parts. Part 1 develops a stochastic
model of transformer failures that takes into account the arrival of faults in the net-
work. Using empirical data from PSE&G, appropriate distributions are fit for the
number of faults that occur per year and other stochastic elements of the grid. From
this, failure times are randomly generated for each transformer based on its chronolog-
ical age and the amount of aging caused by faults. Part 2 compares the effectiveness
of different replacement policies in reducing transformer failure by implementing
them in the failure model developed in Part 1. Policies that are implemented include
PSE&G’s current policy, policies that utilize chronological age, and policies that use
both chronological age and fault occurrences. In addition to finding the best perform-
ing feasible policy, policies that utilize information that is currently not observable
to PSE&G are also implemented. These “infeasible policies” are used to estimate
the value of information that could be acquired from investment in new diagnostic
technologies.
Chapter 2
The Stochastic Model
The model simulates the arrival of faults in the system over the course of the
next 50 years. There are currently 358 transformers in the system, with the oldest
transformers between 60 and 70 years of age. Thus, in order to capture the arrival of
faults across all the transformers in the system, we simulate the past 70 years as well,
for a total of 120 years of simulation. When looking at the impact of different policies,
the policies are only implemented after year 70 (present day) in the simulation.
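The simulation clock described above can be sketched as a loop over 120 years. The first 70 years act as a warm-up that builds up each transformer’s history, and the policy is consulted only from year 70 onward. The `step` and `policy` callbacks, and the choice to let the policy act before each year’s aging step, are placeholder assumptions.

```python
from typing import Callable

HISTORY_YEARS = 70  # simulated past: years 0..69, no replacement policy
FUTURE_YEARS = 50   # simulated future: years 70..119
TOTAL_YEARS = HISTORY_YEARS + FUTURE_YEARS


def run_horizon(step: Callable[[int], None], policy: Callable[[int], None]) -> list[int]:
    """Advance the model one year at a time; return the years in which the policy acted."""
    policy_years = []
    for t in range(TOTAL_YEARS):
        if t >= HISTORY_YEARS:  # present day reached: policies may now act
            policy(t)
            policy_years.append(t)
        step(t)  # placeholder for one year of fault arrivals, aging, and failure checks
    return policy_years


if __name__ == "__main__":
    stepped = []
    acted = run_horizon(step=stepped.append, policy=lambda t: None)
    print(len(stepped), len(acted), acted[0], acted[-1])  # → 120 50 70 119
```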
2.1 Assumptions
2.1.1 Transformer Location Correlation
Each of the 358 transformers corresponds to a location in the network, and each
fault is simulated to occur at a random location. The model assumes that there is a
correlation between certain locations in the network. This assumption rests on the fact
that a fault affects every transformer on the circuit where it occurs: all transformers at
the same substation and any other transformers at substations on the same circuit.
Moreover, each fault incident can occur at multiple locations in the
network, so a single incident can affect multiple transformers to
different degrees. This shared exposure is the foundation of the correlation assumption.
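The correlation mechanism can be illustrated with a sketch in which a fault incident strikes a random location and every transformer on that location’s circuit receives its own aging increment. Transformers sharing a circuit therefore age together, but not identically. The uniform choice of location, the exponential aging draw, and `mean_aging` are placeholder assumptions, since the fitted distributions come from Chapter 3. A single struck location per incident also simplifies the multi-location incidents described above.

```python
import random


def apply_fault_incident(
    fault_aging: list[float],
    circuit_of: list[str],
    rng: random.Random,
    mean_aging: float,
) -> list[int]:
    """Apply one fault incident in place and return the indices of affected transformers."""
    struck = rng.randrange(len(circuit_of))  # hypothetical: uniform over locations
    circuit = circuit_of[struck]
    affected = [i for i, c in enumerate(circuit_of) if c == circuit]
    for i in affected:
        # Independent draw per transformer: same incident, different degrees of aging.
        fault_aging[i] += rng.expovariate(1.0 / mean_aging)
    return affected


if __name__ == "__main__":
    rng = random.Random(0)
    circuit_of = ["A", "A", "B", "B", "B", "C"]
    fault_aging = [0.0] * len(circuit_of)
    hit = apply_fault_incident(fault_aging, circuit_of, rng, mean_aging=0.5)
    print(hit, [round(x, 3) for x in fault_aging])
```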
2.1.2 Aging Process
The aging process of transformers is assumed to incorporate two factors: chronological aging and the impact of faults. Chronological age is equal to the time that a transformer has been in operation; each year that a transformer is in operation, its chronological age increases by 1 year. The impact of faults is simulated in terms of the amount of aging it causes in a transformer. The combination of these two factors is defined as real age. At any point in time, the real age of a transformer is simply the sum of its chronological age and the aging that faults have caused over the course of its lifetime.
2.1.3 Failure Threshold
Failure thresholds are assumed to be in terms of real age. Each transformer in the
network is simulated with its own specific failure threshold. Once a transformer’s real
age passes its failure threshold, it is assumed to be in failure. The failure threshold is
also assumed to be different for transformers of different ages. Transformers produced
more than 50 years ago contain more excess capacity than transformers produced
more recently. Thus, the failure threshold is assumed to be higher for those older
transformers.
2.1.4 Failure and Replacement
When a transformer is in failure, it is recorded as a failure and replaced with a new transformer in the same location. Similarly, if the policy chooses a transformer to be replaced before failure, it is also replaced with a new transformer in the same location. No replacement policies are implemented before year 70 (present day) in the model. The new transformer that is installed is assumed to be of the younger kind and therefore has the lower failure threshold explained in section 2.1.3.
2.2 The Unobservable Model
This model captures the failure rate of transformers. It is important to recognize that this model simulates the ground truth. In this respect, it is unobservable to the policies and distinct from the model they use. Because the policies cannot see this truth, the problem is “partially observable” from their perspective. The model used by the policies is discussed in Chapter 4, Policies.
2.2.1 Initialization of Model
The total number of transformers in the network is N = 358 and the transformers
are denoted by their location 1, 2, ..., N . When a new transformer is installed, it
inherits the location denotation of the transformer it is replacing. Time is denoted
by t, with t = 1, 2, ..., 120, where t = 1 is 70 years ago, t = 70 is present day, and
t = 120 is 50 years from today and the end of the simulation. The model assumes
that decisions are made at the end of time t. Therefore, the time period t denotes
the period after time point t and before t+ 1.
2.2.2 The State Variable
The state variable $s_t$ includes the following:
1. $a^i_t$, a boolean for transformer $i \in [1, N]$
Denotes whether or not the transformer has entered operation. $a^i_t = 1$ if the transformer at location $i$ has been initialized and 0 otherwise. Because the transformers in the network are of different ages, the model must keep track of when each transformer is first initialized. This is only important for $t = 1, 2, \ldots, 70$. After $t = 70$, $a^i_t = 1$ for all $i$.
2. $c^i_t$ for transformer $i \in [1, N]$
Denotes the chronological age of transformer $i$.
3. $r^i_t$ for transformer $i \in [1, N]$
Denotes the real age of transformer $i$.
4. $L^i_t$ for transformer $i \in [1, N]$
Denotes the failure threshold (limit) of transformer $i$.
5. $F$, a vector of length 120
Denotes the number of failures in each time period $t$. At the end of each time period $t$, $F[t]$ is updated to the number of failures that occurred during $t$. This is the ultimate output of the Stochastic Model.
We denote the unobservable state $s_t = ([a^i_t], [c^i_t], [r^i_t], [L^i_t], F)$.
2.2.3 Exogenous Information
Exogenous information includes any events that are random. As such, the exogenous information in this case includes the arrival of faults. The exogenous information is as follows:
1. $\hat{N}^f_t$
Denotes the number of faults that occur in time period $t$.
For every fault that occurs, there is additional exogenous information associated with that fault. Let $j = 1, 2, \ldots, \hat{N}^f_t$. The additional exogenous information is:
2. $l^j_t$ for every fault $j$
Denotes the location at which the fault occurs. $l^j_t$ is constrained to be an integer in the interval $[1, N]$.
3. $m^j_t$ for every fault $j$
Denotes the magnitude of the fault, measured in units of real age.
In addition, the exogenous information includes which transformers go into operation at the end of time period $t$:
4. $a_t$, a vector
Denotes which transformers enter operation at the end of time $t$.
Lastly, the exogenous information includes certain static variables, namely the correlation matrix between transformer locations and the failure threshold of new transformers:
5. $\rho$, a matrix of dimensions $N \times N$
Denotes the correlation between two transformer locations, such that $\rho(x, y)$ is the correlation between transformers $x$ and $y$.
6. $L$
Denotes the failure threshold of a newly installed transformer.
2.2.4 Transition Functions
As noted above, these transition functions occur at the end of time period t.
It is important that these transition functions do not take into account replacement
policies, because the Stochastic Model is used for simulating failure times. Certain
transition functions, such as in the case where a transformer is replaced before reach-
ing the failure threshold, will have to be adjusted in the policy simulations, which is
detailed in Chapter 4, Policies.
1. $a^i_{t+1} = 1$ if $i \in a_t$
Denote transformer $i$ as entering operation if the exogenous information indicates so.
2. $c^i_{t+1} = c^i_t + 1$ if $a^i_t = 1$
Increment the chronological age of transformer $i$ if it was in operation during time period $t$.
3. In order to update $r^i_{t+1}$, we must iterate through each of the $\hat{N}^f_t$ faults that occur in time period $t$ and determine which transformers are affected by each. This updating is done using Algorithm 1.
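Algorithm 1 Transition Function for $r^i_{t+1}$
for i = 1 to N do
    $r^i_{t+1} = r^i_t + a^i_t$    (one year of chronological aging if in operation)
end for
for j = 1 to $\hat{N}^f_t$ do
    for i = 1 to N do
        if $a^i_t = 1$ then
            $r^i_{t+1} = r^i_{t+1} + m^j_t \cdot \rho(l^j_t, i)$
        end if
    end for
end for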
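The transition for real age can be sketched in vectorized form. This is a minimal illustration, not the thesis implementation: the function and argument names are hypothetical, locations are 0-based (location $i$ in the text is index $i - 1$), and it follows the definition of real age in section 2.1.2, adding one year of chronological aging to every operating transformer plus $m^j_t \cdot \rho(l^j_t, i)$ for every fault $j$.

```python
import numpy as np

def age_one_period(real_age, in_operation, fault_locations, fault_magnitudes, rho):
    """Return r_{t+1} given r_t, a_t and the faults of period t (hypothetical helper).

    real_age         : float array (N,), r_t
    in_operation     : bool array (N,), a_t
    fault_locations  : int array (J,), 0-based l_t^j
    fault_magnitudes : float array (J,), m_t^j
    rho              : float array (N, N), correlation matrix
    """
    # Row l_t^j of rho scaled by m_t^j is the aging fault j adds at every location;
    # the matrix product sums these contributions over all J faults.
    fault_aging = fault_magnitudes @ rho[fault_locations, :]
    # Operating transformers gain one year plus fault aging; others are unchanged.
    return np.where(in_operation, real_age + 1.0 + fault_aging, real_age)

# Hypothetical three-location example: locations 0 and 1 are correlated (0.5).
rho = np.array([[1.0, 0.5, 0.0],
                [0.5, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
r_next = age_one_period(
    real_age=np.array([10.0, 20.0, 0.0]),
    in_operation=np.array([True, True, False]),
    fault_locations=np.array([0, 2]),
    fault_magnitudes=np.array([2.0, 1.0]),
    rho=rho,
)
print(r_next)
```

Summing the fault contributions before adding them avoids overwriting $r^i_{t+1}$ inside the fault loop, so a transformer hit by several faults in one year accumulates all of them.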
The above transition functions take into account the aging that has occurred to the
transformers over the course of time period t. Next, the model must check if any
of these transformers have experienced failure. This yields the following transition
function:
4. If a transformer fails, the transformer must be replaced and its age variables reset. This means that $L^i_{t+1}$ and $F[t]$ must be updated together. In addition, in the event that transformer $i$ fails, $r^i_{t+1}$ and $c^i_{t+1}$ must also be reset. This is all done through Algorithm 2.
Algorithm 2 Transition Function for $L^i_{t+1}$, $r^i_{t+1}$, $c^i_{t+1}$ in the event of failure
F[t] = 0
for i = 1 to N do
    if $r^i_{t+1} > L^i_t$ then
        F[t] = F[t] + 1
        $L^i_{t+1} = L$
        $r^i_{t+1} = c^i_{t+1} = 0$
    else
        $L^i_{t+1} = L^i_t$
    end if
end for
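A vectorized sketch of the failure check and reset in Algorithm 2 follows. The names are hypothetical, and one assumption goes beyond the algorithm as written: instead of a single constant $L$, each replacement draws a fresh threshold from the newer-transformer distribution of section 3.5.3, $L_2 \sim \text{Unif}(R_f - 40, R_f - 20)$, with $R_f = 210$ as selected in section 3.5.5.

```python
import numpy as np

def apply_failures(real_age, chrono_age, threshold, sample_new_threshold, rng):
    """Return (r_{t+1}, c_{t+1}, L_{t+1}, F[t]) after checking r_{t+1} > L_t."""
    failed = real_age > threshold
    n_failed = int(failed.sum())                    # F[t]
    real_age = np.where(failed, 0.0, real_age)      # reset real age of replaced units
    chrono_age = np.where(failed, 0, chrono_age)    # reset chronological age
    threshold = threshold.copy()
    # Replacements are newer transformers and receive a freshly drawn threshold.
    threshold[failed] = sample_new_threshold(rng, n_failed)
    return real_age, chrono_age, threshold, n_failed

def new_threshold(rng, k):
    # Assumed newer-transformer threshold: L2 ~ Unif(Rf - 40, Rf - 20) with Rf = 210.
    return rng.uniform(170.0, 190.0, size=k)

rng = np.random.default_rng(0)
r, c, L, F_t = apply_failures(
    real_age=np.array([100.0, 215.0, 50.0]),
    chrono_age=np.array([30, 60, 5]),
    threshold=np.array([200.0, 210.0, 180.0]),
    sample_new_threshold=new_threshold,
    rng=rng,
)
print(F_t)  # F[t] = 1: only the second transformer exceeded its threshold
```

Passing the threshold sampler in as a function keeps the failure logic independent of how new-transformer thresholds are modeled.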
Chapter 3
Model Selection
The previous chapter outlines the basics of the failure model. The next task
is to make this model as realistic as possible. This chapter discusses the fitting of
distributions for the stochastic processes outlined in the previous chapter. The parts
of the model incorporating uncertainty are all formulated as part of the exogenous
information. For each of these variables, we choose a distribution and fit appropriate
parameters. Given the data that is available from PSE&G, different variables were
fit using different methodologies. Fitting to existing data was used where possible.
However, in some cases, there was little or no data, requiring assumptions that were
formulated in conjunction with the Asset Strategy Team.
3.1 Transformer Location Initialization
The time of initialization of each transformer location is based on the ages of the transformers in the existing network. This corresponds to the $a_t$ variable. PSE&G provided data on their current set of transformers, including the year of installation. Figure 3.1 shows the distribution of ages in PSE&G’s current network. 180 of the 358 transformers are between 50 and 70 years of age (Group 1). As Figure 3.1 shows, the number of transformers in this age range is significantly higher than in any other age range. Since this cohort is the basis of the entire problem, the model for transformer initialization must first and foremost take it into account. The younger transformers are relatively evenly distributed between 0 and 50 years of age (Group 2). There is also a handful of transformers around 80 years old. These older transformers were built with different capacities than Group 1 or Group 2 transformers. Thus, in order to avoid needless complexity from these older transformers that are not pertinent to the problem, the oldest group of transformers is assumed to belong to Group 2.
Figure 3.1: Histogram showing the ages of current transformers
Thus, 2 different distributions are needed to simulate the ages of Group 1 and
Group 2 transformers. The transformers are indexed by i ∈ [1, 358]. Transformers
[1, 180] belong to Group 1 and transformers [181, 358] belong to Group 2. Based on
the above data, the following model of ages was chosen:
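$$\text{Ages}_i \sim \begin{cases} \text{Unif}(60, 70), & \text{if } i \in [1, 180] \\ \text{Unif}(0, 50), & \text{if } i \in [181, 358] \end{cases}$$
The following equation describes the relationship between Ages and $a_t$:
$$a_t = \{\, i : 70 - \text{Ages}_i = t \,\}$$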
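The age model and its mapping to $a_t$ can be sketched as follows. Since the sampled ages are continuous while $t$ is an integer period, this sketch assumes (hypothetically) that $70 - \text{Ages}_i$ is rounded up, which places every installation in a period between 1 and 70; the helper names are illustrative.

```python
import numpy as np

N_GROUP1, N_GROUP2 = 180, 178   # locations 1..180 and 181..358

def sample_installation_periods(rng):
    """Draw present-day ages and map each to the period t in which it enters operation."""
    ages = np.concatenate([
        rng.uniform(60.0, 70.0, size=N_GROUP1),   # Group 1: Unif(60, 70)
        rng.uniform(0.0, 50.0, size=N_GROUP2),    # Group 2: Unif(0, 50)
    ])
    # Assumption: round 70 - Ages_i up to an integer period in 1..70.
    periods = np.ceil(70.0 - ages).astype(int)
    return ages, periods

rng = np.random.default_rng(42)
ages, periods = sample_installation_periods(rng)
# a_t as a mapping from period t to the (0-based) locations entering operation.
entering = {t: np.flatnonzero(periods == t) for t in range(1, 71)}
```

Under this rounding, Group 1 transformers enter operation in periods 1 to 10 and Group 2 transformers in periods 21 to 70.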
3.2 Number of Faults Per Year
The key part of this model is the number of faults that occur in a given year.
PSE&G has historical data on the number of faults that have occurred between 2002
and 2013. This data will be used to fit the distribution of the $\hat{N}^f_t$ variable.
3.2.1 The Poisson Distribution
A Poisson process is a stochastic process which counts the number of events that
occur in a given time interval. In this case, we are concerned with how many faults
occur every year. Thus, the Poisson distribution is a natural first candidate for simulating the faults experienced by transformers.
Let $\hat{N}^f_t$ denote the number of faults that occur in a given year $t$. If $P_t$ is a Poisson process, then:
$$\hat{N}^f_t = P_t - P_{t-1}$$
$P_t$, by virtue of being a Poisson process, has the following properties:
1. $P_0 = 0$
2. $P_t$ follows a Poisson distribution with parameter $\lambda t$. For every $k = 0, 1, 2, \ldots$, the probability mass function of $P_t$ is
$$P(P_t = k) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}$$
3. The increments are independent, so that $P_t - P_{t-1}$ is independent of $P_{t+1} - P_t$.
4. The increments are stationary, so that the expected number of arrivals in a given interval is $E(P_t - P_{t-1}) = \lambda(t - (t - 1)) = \lambda$.
Thus, according to this model, $E(\hat{N}^f_t) = \lambda$ for a given year $t$. In order to use this model in the simulation, fault occurrences are assumed to follow a homogeneous Poisson process, meaning that $\lambda$ is constant over time. This is a realistic assumption because the number of faults experienced by transformers should remain roughly constant over time. For the model to be valid over the simulated past, the assumption of homogeneity must hold.
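The properties above can be illustrated by simulation. This sketch (helper name hypothetical) builds a homogeneous Poisson process from i.i.d. exponential inter-arrival times with mean $1/\lambda$ and counts arrivals per unit-length year; the yearly increments then have mean and variance both close to $\lambda$, here set to the sample mean of 230.6 reported in section 3.2.3.

```python
import numpy as np

def poisson_process_counts(lam, n_years, rng):
    """Yearly arrival counts of a homogeneous Poisson process with rate lam per year."""
    # Inter-arrival times of a Poisson process are i.i.d. Exponential with mean 1/lam.
    expected = lam * n_years
    gaps = rng.exponential(1.0 / lam, size=int(expected + 10 * np.sqrt(expected)) + 10)
    arrival_times = np.cumsum(gaps)
    # Extend the arrivals if they do not yet cover the whole horizon.
    while arrival_times[-1] < n_years:
        more = rng.exponential(1.0 / lam, size=int(lam) + 10)
        arrival_times = np.concatenate([arrival_times, arrival_times[-1] + np.cumsum(more)])
    arrival_times = arrival_times[arrival_times < n_years]
    # An arrival in [t - 1, t) belongs to year t (0-based index t - 1).
    return np.bincount(arrival_times.astype(int), minlength=n_years)

rng = np.random.default_rng(5)
counts = poisson_process_counts(230.6, 2000, rng)
print(counts.mean(), counts.var(ddof=1))  # both close to lambda
```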
3.2.2 The Data
The fault data comes from the Plant Operations Report (POR) from PSE&G.
The POR lists all the outages or incidents that have occurred in the network. Note
that these outages are outages of the power grid and not specific to transformers. The
POR is the best data that PSE&G has since it does not have detailed transformer
fault data. After consultation with the Asset Strategy Team, a reasonable assumption
was made that each outage on a substation circuit corresponds to a fault occurring
at a transformer on the circuit. Each outage in the POR is identified by an incident
ID and a circuit location. The assumption is that each outage that occurs in a given
circuit location means that a fault of some sort has occurred at the transformers on
that circuit.
Moreover, analyzing the incident IDs shows that a given incident can impact
multiple locations and by extension the transformers on those circuits. This is the
basis of the assumption of correlation between faults at different transformer locations,
since the data shows that a fault at one transformer is likely tied to the same incident
that caused faults at other transformer locations. Thus, $\hat{N}^f_t$ is fitted to the number
of unique incidents that occur every year. Figure 3.2 shows the graph of the number
of incidents per year in the years for which there is data.
Figure 3.2: Historical number of unique incidents per year
3.2.3 Fitting the Poisson Distribution
Analysis of the data yields:
$\mu = 230.6$
$\sigma^2 = 1{,}767.5$
A Poisson distribution has 1 parameter $\lambda$ such that $\mu = \sigma^2 = \lambda$. Given the estimates of $\mu$ and $\sigma^2$ from the data, the preliminary results indicate that the Poisson distribution might not yield a good fit. The poissfit function in MATLAB finds $\lambda$ using the
Maximum Likelihood Estimator. Using the poissfit function on the historical fault
data yields:
λ = 230.6
Unsurprisingly, the MLE estimate for λ is equal to the mean of the historical data. In
order to check the goodness-of-fit of the fitted model, the Kolmogorov-Smirnov test
(KStest) is used. The results of the KStest are:
$p_{\text{Poisson}} = 0.3079$
Although the p-value is high enough that the KS test fails to reject the null hypothesis that the empirical data come from the Poisson distribution, failing to reject is not evidence of a good fit, and the Poisson distribution is not an especially good fit for the historical fault data. Given $\mu$ and $\sigma^2$, the empirical data have a variance far greater than the mean (roughly 7.7 times). This problem is known as overdispersion: the variance of the empirical data is higher than expected under a given statistical model.
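As a complementary check that is not part of the original analysis, the Poisson index-of-dispersion test compares $(n - 1)s^2/\bar{x}$ with a chi-square distribution on $n - 1$ degrees of freedom. With only a dozen annual observations the KS test has limited power, whereas this statistic targets the variance-to-mean mismatch directly. The sketch uses only the $\mu$ and $\sigma^2$ reported above and assumes one observation per year from 2002 through 2013 ($n = 12$).

```python
from scipy import stats

mean_faults = 230.6   # sample mean of unique incidents per year
var_faults = 1767.5   # sample variance
n_years = 12          # assumption: one observation per year, 2002 through 2013

# Index of dispersion: close to 1 for Poisson data.
dispersion_index = var_faults / mean_faults
# Under a Poisson model, (n - 1) * s^2 / xbar is approximately chi-square with n - 1 df.
statistic = (n_years - 1) * dispersion_index
p_value = stats.chi2.sf(statistic, df=n_years - 1)
print(f"index = {dispersion_index:.2f}, statistic = {statistic:.1f}, p = {p_value:.1e}")
```

A dispersion index near 7.7 gives a statistic of about 84 on 11 degrees of freedom, far into the upper tail, which is consistent with the overdispersion described above.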
3.2.4 The Negative Binomial Distribution
Ismail and Jemain (2007) identify the use of the Negative Binomial distribu-
tion as a suitable variation on the Poisson distribution to handle the problem of
overdispersion. Furthermore, the Negative Binomial distribution is used in the ex-
isting literature to model grid outages (Liu et al., 2005). A process following the
Negative Binomial distribution is characterized by many of the same properties as a
Poisson process, such as independent and stationary increments. Such a process has
the following additional properties:
1. The Negative Binomial distribution has 2 parameters, $r$ and $p$.
2. If $B_t$ describes a process following a Negative Binomial distribution, the probability mass function is:
$$P(B_t = k) = \binom{k + r - 1}{k} (1 - p)^r p^k$$
3. $\mu_{NB} = \dfrac{pr}{1 - p}$
4. $\sigma^2_{NB} = \dfrac{pr}{(1 - p)^2}$
A Poisson distribution has a single parameter $\lambda$ that remains constant. The Negative Binomial distribution has one additional parameter, which specifies a distribution for $\lambda$ itself. In this way, the Negative Binomial distribution is equivalent to a Poisson distribution whose rate $\lambda$ is drawn from a Gamma distribution with shape $r$ and scale $p/(1 - p)$, so that $E(\lambda) = pr/(1 - p) = \mu_{NB}$.
The nbinfit function in MATLAB is used to compute the MLE estimates for
parameters r and p. The nbinfit function yields the following fitted values:
r = 39.18
p = 0.8548
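The Gamma-Poisson equivalence and the fitted moments can be checked numerically. This is an illustration, not the original MATLAB analysis. SciPy's `nbinom(n, q)` has pmf $\binom{k+n-1}{k} q^n (1-q)^k$, so the chapter's $p$ enters as $q = 1 - p$. Drawing $\lambda \sim \text{Gamma}(r, \text{scale} = p/(1-p))$ and then a Poisson count with that rate should reproduce the same distribution.

```python
import numpy as np
from scipy import stats

r, p = 39.18, 0.8548   # fitted values, in this chapter's parameterization

# Closed-form moments from the formulas above.
mean_nb = p * r / (1 - p)
var_nb = p * r / (1 - p) ** 2

# SciPy's success probability is 1 - p under this parameterization.
dist = stats.nbinom(r, 1 - p)

# Gamma-Poisson mixture: lambda ~ Gamma(shape=r, scale=p/(1-p)), then N ~ Poisson(lambda).
rng = np.random.default_rng(7)
lam = rng.gamma(shape=r, scale=p / (1 - p), size=200_000)
draws = rng.poisson(lam)

print(mean_nb, var_nb)          # approximately 230.65 and 1588.5
print(dist.mean(), dist.var())
print(draws.mean(), draws.var())
```

The fitted mean matches the sample mean of 230.6, while the fitted variance of about 1588.5 sits somewhat below the sample variance of 1,767.5. A maximum likelihood fit does not have to reproduce the sample variance exactly.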
Figure 3.3 shows the difference between the fitted Negative Binomial and Poisson distributions. As expected, the Poisson distribution has a much lower variance, with most values concentrated within a small range around the mean. In comparison, the Negative Binomial distribution has a much higher variance.
Figure 3.3: Comparison of the Poisson and Negative Binomial distributions. The green line shows the p.m.f. of the fitted Negative Binomial distribution and the blue line shows the p.m.f. of the fitted Poisson distribution
In order to compare the goodness-of-fit of the two models, the KS test is run comparing the empirical data with the fitted Negative Binomial distribution, yielding:
$p_{NB} = 0.9625$
Comparing the two p-values shows that the Negative Binomial distribution has a considerably higher p-value than the Poisson distribution. Thus, the Negative Binomial distribution is a better fit for the historical data and is chosen as the distribution for the faults:
$$\hat{N}^f_t \sim NB(39.18,\ 0.8548)$$
3.3 Location of Faults
The location of each fault that occurs is assumed to be random. It is assumed
that each fault has an equal probability of occurring at any of the transformer lo-
cations in the grid. Thus, the location of each fault is sampled from a uniform
distribution:
$$l^j_t \sim \text{Unif}\{1, 2, \ldots, 358\}$$
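Putting sections 3.2 and 3.3 together, one simulated year of fault arrivals can be sketched as below (function name hypothetical). NumPy's `Generator.negative_binomial(n, q)` counts failures before `n` successes with success probability `q`, so, as with SciPy, the chapter's $p$ enters as $q = 1 - p$; locations are drawn as equally likely integers from 1 to 358.

```python
import numpy as np

N = 358
R_NB, P_NB = 39.18, 0.8548   # fitted values in the chapter's parameterization

def sample_fault_year(rng):
    """Draw the fault count and the location of each fault for one simulated year."""
    # negative_binomial(n, q) has mean n(1-q)/q, so q = 1 - P_NB gives mean p*r/(1-p).
    n_faults = rng.negative_binomial(R_NB, 1.0 - P_NB)
    # integers() excludes its upper bound, so this samples locations 1..N.
    locations = rng.integers(1, N + 1, size=n_faults)
    return n_faults, locations

rng = np.random.default_rng(11)
n_faults, locations = sample_fault_year(rng)
yearly_counts = np.array([sample_fault_year(rng)[0] for _ in range(5000)])
print(n_faults, yearly_counts.mean())
```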
3.4 Transformer Correlation
Correlation between each of the transformer locations is based on a combination of the Plant Operations Report and the current list of transformers in the network. This corresponds to the $\rho$ variable.
Using the unique incident IDs in the POR, we can find the number of circuit locations that each incident affected. We denote the number of circuit locations affected by an incident in 2013 as $N_l$. From the list of transformers, we can find the number of transformers on each circuit. The mean number of transformers per circuit is denoted $\mu_T$. Thus, $\mu_T N_l$ estimates the number of transformers affected by each incident. From the data:
$\mu_T = 2.73$
Figure 3.4 shows the distribution of $\mu_T N_l$. The mean of $\mu_T N_l$ is 8.28.
Figure 3.4: Distribution of the number of transformers affected per fault on average
There is considerable uncertainty surrounding the amount of correlation between transformers. Certainly, a fault does not affect every transformer equally. After consultation with PSE&G, an assumption was made that the correlations between transformers would vary between 0.5 and 0.9. Thus, Algorithm 3 was used to generate $\rho$ for each simulation, with tunable parameter $p$ giving the probability that a pair of locations is correlated.
In order to tune p, we ran 1,000 simulations and checked the average number of
correlated transformers in the generated correlation matrix. Values of p in the interval
of [0.01,0.10] were tested. Table 3.1 shows the results of tuning p. For p = 0.02, the
average number of correlated transformers was 8.24, which was the closest to the
empirical average of 8.28. Thus, $\rho$ was generated using Algorithm 3 with tunable parameter $p = 0.02$.
Algorithm 3 Generation of Correlation Matrix ρ with Tunable Parameter p
for i = 1 to N do
    ρ(i, i) = 1
    for j = i + 1 to N do
        if Unif(0, 1) < p then
            ρ(i, j) = Unif(0.5, 0.9)
        else
            ρ(i, j) = 0
        end if
        ρ(j, i) = ρ(i, j)
    end for
end for

p       Average number of correlated transformers (µT·Nl)
0.01     4.56
0.02     8.24
0.03    11.74
0.04    14.87
0.05    18.37
0.06    22.41
0.07    25.98
0.08    29.00
0.09    33.14
0.10    36.70
Table 3.1: Parameter selection for correlation matrix
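A minimal NumPy sketch of the correlation-matrix generator follows, with hypothetical names. It links each unordered pair $i < j$ independently with probability $p$ and mirrors the value, so $\rho$ is symmetric with a unit diagonal. Under this construction each row has on average $1 + (N - 1)p$ nonzero entries (the location itself plus its linked pairs), about 8.14 for $N = 358$ and $p = 0.02$. This is in line with the roughly linear growth of about 3.6 per 0.01 step in Table 3.1.

```python
import numpy as np

def generate_correlation_matrix(n, p, rng, low=0.5, high=0.9):
    """Symmetric matrix with unit diagonal; each pair i < j is linked with probability p."""
    rho = np.eye(n)
    rows, cols = np.triu_indices(n, k=1)          # all pairs with i < j
    linked = rng.random(rows.size) < p
    values = rng.uniform(low, high, size=int(linked.sum()))
    rho[rows[linked], cols[linked]] = values
    rho[cols[linked], rows[linked]] = values      # mirror so rho(j, i) = rho(i, j)
    return rho

rng = np.random.default_rng(1)
N, P_LINK = 358, 0.02
matrices = [generate_correlation_matrix(N, P_LINK, rng) for _ in range(20)]
# Average number of nonzero entries per row (the location itself plus linked pairs).
avg_correlated = np.mean([np.count_nonzero(m, axis=1).mean() for m in matrices])
print(avg_correlated, 1 + (N - 1) * P_LINK)   # simulated average vs. the expected value
```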
3.5 Failure Times and Fault Magnitude
The failure times and fault magnitudes are both expressed in terms of real age $r^i_t$. Since real age is a constructed measure of aging that has no corresponding physical data, we must find reasonable distributions for the failure times and the fault magnitudes that produce failure rates similar to observed or expected failure rates. This corresponds to the exogenous variables $L$ and $m^j_t$.
A transformer $i$ experiences failure when $r^i_t > L^i_t$. $r^i_t$ is determined by both the chronological age of the transformer and the aging caused by faults, but the uncertainty in $r^i_t$ comes entirely from the aging caused by faults. Thus, $L$ and $m^j_t$ together determine when a transformer will fail, meaning that the distributions for these two pieces of exogenous information must be fitted jointly.
3.5.1 Methodology
There is no data on the real age of transformers and the real age impact of
faults. However, we do know that the result of aging is transformer failures. Thus,
given an assumption of what the rate of transformer failures will be, we can test the
goodness-of-fit of distributions for failure times and fault magnitude.
First, a reference failure-rate path, the Base Model, is specified to represent what the failure rates will look like in $t = [70, 120]$, based on assumptions from PSE&G. Second, distributions with unknown parameters are chosen for $L$ and $m^j_t$. Third, we iterate through different values of the parameters and compare the sample failure rate produced by each set of parameters with the Base Model. The p-value of the Kolmogorov-Smirnov test between the Base Model and the sample failure rate is used to measure goodness-of-fit.
3.5.2 The Base Model
The Base Model was chosen based on certain assumptions about the failure rate going forward. Figure 3.5 shows the number of failures per year in the Base Model.
Figure 3.5: The number of failures per year in the Base Model
The Base Model was chosen because its characteristics match the assumptions that PSE&G indicated they would expect to see in the failure rates:
1. The model shows no failures before $t = 70$. This is a strong assumption because the current ages of transformers are based on transformers that have not yet failed and are still in operation today. Thus, the failure model that is ultimately chosen should not show a high number of failures before $t = 70$. Given the uncertainty surrounding the problem, we expect the final model to show a small number of transformers failing before $t = 70$ in some realizations, but the goal is to minimize the number of these early failures.
2. The mean number of failures per year in the interval t = [70, 120] is 6. Not
accounting for the spikes, the mean number of failures is closer to 5 per year.
Based on consultation with PSE&G, 5 failures per year is the number of failures
that they expect to see in a normal year.
3. There are 3 years with observed spikes in the failure rate. Two of these spikes exceed 15 failures, and the third is smaller, at 10 failures. Based on consultation with the Asset Strategy team, these spikes are what PSE&G is trying to avoid. It is important to note that the failure rates are the number of failures observed, not the number of replacements. Given a replacement policy, it would be possible to observe lower spikes if certain transformers are replaced early.
4. The frequency and magnitude of the spikes are key characteristics of the Base Model. The small number of spike years is a strong assumption for PSE&G, which is especially worried about just a couple of years with spikes; a higher frequency of spikes would not be reasonable. The magnitude of the spikes also matters: every spike stays below 20 failures per year. From PSE&G’s perspective, more than 20 failures in a given year would be catastrophic, and they do not expect to see such a high failure rate. These assumptions also matter for designing a more efficient replacement policy, since a high frequency and magnitude of failures would make it impossible for PSE&G to avoid system-wide failure.
3.5.3 Failure Times
The failure times in terms of real age are assumed to be different for older and newer transformers. The older transformers are those more than 50 years old, corresponding to transformers $i \in [1, 180]$. Newer transformers are those less than 50 years old, which include transformers $i \in [181, 358]$. Moreover, all new transformers that are installed are assumed to be newer transformers and have lower failure thresholds in terms of real age.
Thus, distributions for $L$ must be fit separately for transformers installed at $t < 20$ and at $t > 20$ (that is, more and less than 50 years before the present), denoted $L_1$ and $L_2$ respectively. Based on assumptions from PSE&G, a reasonable difference between $L_1$ and $L_2$ is 30 real age units. We denote by $R_f$ the failure threshold, in terms of real age, of the older transformers. In addition, there is a certain amount of randomness in the failure thresholds of transformers: the transformers in each group are not identical and as such would not have exactly the same failure thresholds. Thus, failure thresholds were assumed to be distributed in a ±10 real age unit band around $R_f$ (and around $R_f - 30$ for newer transformers). Therefore:
$$L_1 \sim \text{Unif}(R_f - 10,\ R_f + 10)$$
$$L_2 \sim \text{Unif}(R_f - 30 - 10,\ R_f - 30 + 10) = \text{Unif}(R_f - 40,\ R_f - 20)$$
The parameter for the distribution of failure times that must be fit is Rf .
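A short sketch of the threshold draws follows, with hypothetical names and $R_f = 210$ as later selected in section 3.5.5. With that value, older thresholds lie in $[200, 220]$ and newer ones in $[170, 190]$. The bands do not overlap, so under this model every replacement transformer fails at a lower real age than any original older transformer.

```python
import numpy as np

def sample_thresholds(n_old, n_new, r_f, rng):
    """Draw failure thresholds (real-age units) for older and newer transformers."""
    l1 = rng.uniform(r_f - 10, r_f + 10, size=n_old)            # L1 ~ Unif(Rf-10, Rf+10)
    l2 = rng.uniform(r_f - 30 - 10, r_f - 30 + 10, size=n_new)  # L2 ~ Unif(Rf-40, Rf-20)
    return l1, l2

rng = np.random.default_rng(3)
old_thresholds, new_thresholds = sample_thresholds(180, 178, r_f=210.0, rng=rng)
print(old_thresholds.min(), new_thresholds.max())
```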
3.5.4 Magnitude of Faults
“We think that the distribution of faults is such that 10% of the time faults
have no impact, 80% of the time there is some impact, and the other 10%
of the time the fault has a significant impact on the transformer.”
Richard Wernsing, Manager of Asset Strategy
This assumption forms the basis of the model for fault magnitude. We denote $m_1$, $m_2$, and $m_3$ such that:
$$P(m^j_t \sim m_1) = 0.1$$
$$P(m^j_t \sim m_2) = 0.8$$
$$P(m^j_t \sim m_3) = 0.1$$
Thus, we must find distributions for m1,m2,m3. Of these, we are most concerned
with m3, as the assumption is that these are the faults that produce significant impact
and therefore will affect the failure times the most. For $m_1$, the assumption is that these faults have no impact on the aging of transformers. Therefore, the fitting of $m_1$ is straightforward:
$$m_1 = 0$$
For $m_2$, we assume that the impact of these faults is relatively minor. Therefore, it is assumed that:
$$m_2 \sim \text{Unif}(0, 0.5)$$
The fitting of $m_3$ is the most important. It is assumed that there is a significant amount of uncertainty in the faults that fall under the category of $m_3$. PSE&G has experienced significant faults ranging from those that merely age transformers to those that cause extensive outages. Therefore, we assume that $m_3$ also comes from a set of distributions, to capture the wide range of significant faults. We denote $m_{3,1}$, $m_{3,2}$, $m_{3,3}$ such that:
$$P(m_3 \sim m_{3,1}) = 0.1$$
$$P(m_3 \sim m_{3,2}) = 0.8$$
$$P(m_3 \sim m_{3,3}) = 0.1$$
We assume that these distributions increase in severity such that, on average, $m_{3,3} > m_{3,2} > m_{3,1}$. However, we do not have an assumption for the relative scale of these three distributions, so we further denote scaling parameters $a_m$, $b_m$ such that:
$$m_{3,1} \sim \text{Unif}(0, 1)$$
$$m_{3,2} \sim \text{Unif}(0,\ a_m(3 + b_m))$$
$$m_{3,3} \sim \text{Unif}(3a_m,\ a_m(5 + b_m))$$
am and bm control the relative and absolute differences between the distributions. We
can test different relative scales by varying am, bm. The parameters that must be fit
for fault magnitude are thus am, bm.
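The nested mixture for fault magnitude can be sampled as sketched below (function name hypothetical). The defaults $a_m = 1.8$ and $b_m = -0.3$ are the values selected in section 3.5.5. With them, $m_{3,2} \sim \text{Unif}(0, 4.86)$ and $m_{3,3} \sim \text{Unif}(5.4, 8.46)$. The expected magnitude per fault is $0.8 \cdot 0.25 + 0.1 \cdot (0.1 \cdot 0.5 + 0.8 \cdot 2.43 + 0.1 \cdot 6.93) \approx 0.469$ real age units.

```python
import numpy as np

def sample_magnitudes(size, rng, a_m=1.8, b_m=-0.3):
    """Draw fault magnitudes (real-age units) from the nested mixture above."""
    tier = rng.choice(3, size=size, p=[0.1, 0.8, 0.1])   # 0 -> m1, 1 -> m2, 2 -> m3
    sub = rng.choice(3, size=size, p=[0.1, 0.8, 0.1])    # component of m3 if tier == 2
    m = np.zeros(size)                                   # m1 = 0: no aging
    is_m2 = tier == 1
    m[is_m2] = rng.uniform(0.0, 0.5, size=int(is_m2.sum()))
    # Bounds of m3,1, m3,2 and m3,3, indexed by the sub-component.
    lows = np.array([0.0, 0.0, 3 * a_m])
    highs = np.array([1.0, a_m * (3 + b_m), a_m * (5 + b_m)])
    is_m3 = tier == 2
    k = sub[is_m3]
    m[is_m3] = rng.uniform(lows[k], highs[k])
    return m

rng = np.random.default_rng(2)
m = sample_magnitudes(400_000, rng)
print(m.mean(), (m == 0).mean())
```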
3.5.5 Parameter Selection
The parameters Rf , am, bm were tested iteratively and each combination of the
3 parameters was run for 50 iterations. The ranges for each parameter were as fol-
lows:
Rf = [110, 210] in increments of 5
am = [0.5, 1.9] in increments of 0.1
bm = [−2, 1.9] in increments of 0.1
21 values were tested for Rf , 15 for am, and 40 for bm, yielding a total of 12,600
combinations of the 3 parameters.
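The size of the search grid can be checked directly. This sketch (variable names illustrative) enumerates the three ranges using integer steps, dividing by 10 for $a_m$ and $b_m$ to avoid floating-point accumulation.

```python
from itertools import product

rf_values = range(110, 211, 5)                   # 110, 115, ..., 210
am_values = [k / 10 for k in range(5, 20)]       # 0.5, 0.6, ..., 1.9
bm_values = [k / 10 for k in range(-20, 20)]     # -2.0, -1.9, ..., 1.9

grid = list(product(rf_values, am_values, bm_values))
print(len(rf_values), len(am_values), len(bm_values), len(grid))
print(len(grid) * 50)   # simulation runs at 50 iterations per combination
```

At 50 iterations per combination, the search amounts to 630,000 simulation runs.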
As described earlier, the p-value of the KSTest between the sample failure rate
and the Base Model was used to choose the best combination of parameters. We use
2 versions of this statistic:
1. $p_{\text{tot}}$ denotes the p-value when the sample failure rate and the Base Model are compared for $t = [1, 120]$.
2. $p_{50}$ denotes the p-value when the sample failure rate and the Base Model are compared for $t = [70, 120]$.
Both statistics are used in the parameter selection because each statistic tests for
a different characteristic. p50 tests for the similarity of the failure rates specifically
in the simulation time period. p50 is the more important statistic since the failure
rate in the future is central to the problem at hand. However, ptot must also be
considered since, as earlier explained, the chosen model should minimize the number
of transformer failures before t = 70. ptot considers the entire timeframe so it tests
for this characteristic. Without considering ptot, a model that has a high p50 value
might closely match the failure rates in the simulation time period but also have a
significant number of failures before t = 70.
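The two statistics can be computed as sketched below. This assumes the two-sample KS test is applied to the annual failure counts, treating each path's values as a sample; the text does not specify the exact test call, and the function name and toy paths are hypothetical. Because failure counts are discrete with many ties, KS p-values here are only approximate.

```python
import numpy as np
from scipy import stats

def fit_pvalues(sample_failures, base_failures):
    """Return (p_tot, p_50) for failure-count paths indexed by t = 1..120.

    Both inputs are length-120 arrays; element t - 1 is the number of failures in period t.
    """
    p_tot = stats.ks_2samp(sample_failures, base_failures).pvalue
    # Periods t = 70..120 are the 0-based positions 69..119.
    p_50 = stats.ks_2samp(sample_failures[69:], base_failures[69:]).pvalue
    return p_tot, p_50

# Hypothetical paths for illustration only (not PSE&G data).
base = np.concatenate([np.zeros(69), np.full(51, 5.0)])
p_tot_same, p_50_same = fit_pvalues(base, base)
p_tot_shift, p_50_shift = fit_pvalues(base + 10.0, base)
print(p_tot_same, p_50_same, p_tot_shift, p_50_shift)
```

Ranking parameter combinations then amounts to sorting by $p_{50}$ while checking that $p_{\text{tot}}$ stays reasonable.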
Rank   Rf    am    bm     ptot    p50
1      210   1.8   -0.3   0.519   0.132
2      205   1.0    1.3   0.545   0.123
3      210   1.1    1.6   0.533   0.122
4      205   1.2    0.6   0.541   0.121
5      210   1.2    1.3   0.517   0.113
6      210   1.2    1.5   0.509   0.111
7      210   1.1    1.3   0.536   0.109
8      200   1.2    0.2   0.536   0.108
9      200   1.3   -0.2   0.503   0.108
10     200   1.0    1.5   0.490   0.108
Table 3.2: Summary of best fit parameters
Table 3.2 shows the top 10 combinations of parameters ranked by $p_{50}$. All of the top-ranked combinations also have similar $p_{\text{tot}}$ values (between 0.490 and 0.545), so it is reasonable to choose based solely on $p_{50}$. Thus, combination 1 is chosen and the final parameters are:
Rf = 210
am = 1.8
bm = −0.3
Figure 3.6 shows sample paths of the failure model using the chosen parameters.
Figure 3.6: Sample paths of failure rates using the fitted stochastic model, panels (a) through (d)
Chapter 4
Policies
This chapter details the different policies that we have tested. The policies oper-
ate using a different model, denoted as the Observable Model. All of the tested policies
are evaluated in comparison to PSE&G’s current policy of replacement (denoted as
the Base policy), which replaces transformers when they fail DGA testing. In order
to avoid years in which there are too many failures to replace (“spikes”), the policies choose additional transformers to replace beyond those that fail DGA testing. Furthermore, the policies are only implemented for time periods $t = [70, 120]$.
This is because the policies are tested for effectiveness in preventing failures in the
future.
4.1 The Observable Model
In contrast to the Unobservable Model explained in Chapter 2, this section out-
lines the Observable Model. The Observable Model contains only the information
that the policies have access to. There are certain state variables that are in both
the unobservable and observable models, such as the chronological age of the trans-
formers. In addition, the observable state will include statistics that certain policies
utilize.
4.1.1 The State Variable
The state variable $s_t$ includes the following:
1. $a^i_t$ for the transformer at location $i \in [1, N]$
Denotes whether or not the transformer has entered operation. $a^i_t = 1$ if the transformer at location $i$ has been initialized and 0 otherwise. Because the transformers in the network are of different ages, the model must keep track of when each transformer is first initialized. This is only important for $t = 1, 2, \ldots, 70$. After $t = 70$, $a^i_t = 1$ for all $i$.
2. $c^i_t$ for the transformer at location $i \in [1, N]$
Denotes the chronological age of transformer $i$.
3. $N^i_{t,f}$ for the transformer at location $i \in [1, N]$
Denotes the number of faults that each transformer has experienced over its lifetime.
Thus, the observable state is $s_t = ([a^i_t], [c^i_t], [N^i_{t,f}])$. Compared to the unobservable state, the observable state contains few variables. This reflects the core of the problem that PSE&G is facing: there are many factors affecting the transformers for which the utility does not have information.
4.1.2 Decision Variables
The decision that must be made in each time period is which transformers should
be replaced. The decision variable is as follows:
1. $x_t$, a vector of which transformers to replace
Denotes the transformers to be replaced at the end of time $t$. Since replacement policies only begin after $t = 70$, $x_t = \emptyset$ for $t < 70$.
4.1.3 Exogenous Information
The exogenous information at time t in the observable model does not all ar-
rive at the same time. Rather, the exogenous information is divided into two time
periods:
1. The information that arrives before the replacement decision xt is made
2. The information that arrives after the decision is made but before the time
period t+ 1 begins
Moreover, in terms of the simulation, the exogenous information comes from 2 sources:
the unobservable model and events that occur with some randomness. In order to
distinguish between the unobservable and observable exogenous information, the su-
perscript “o” in addition to the ˆ will be used to denote observable exogenous
information.
The observable exogenous information that arrives before the replacement deci-
sions is as follows:
1. $a^o_t$, a vector
Denotes which transformers enter operation in time $t$. This information is the same as the unobservable exogenous information $a_t$ detailed in section 3.1, pg. 19.
2. $\hat{N}^{f,o}_t$
Denotes the number of faults that occur in time period $t$. This information is the same as the unobservable exogenous information $\hat{N}^f_t$ detailed in section 3.2, pg. 21.
3. $\hat{l}^{j,o}_t$, a vector for every fault $j$
Denotes the transformer locations at which the fault is observed.
$\hat{l}^{j,o}_t$ is distinct from, but related to, the exogenous information denoted $l^j_t$ in the unobservable model detailed in section 3.3, pg. 27. In the unobservable model, the impact of fault $j$ on transformer $i$ is calculated to be $m^j_t \cdot \rho(l^j_t, i)$, where $m^j_t$ is the magnitude of the fault and $\rho$ is the correlation between locations (see pg. 17). However, $m^j_t$ is not observable to the policy, so the observable information is just every location on which the fault had an impact. Therefore,
$$\hat{l}^{j,o}_t = \{\, i : m^j_t \cdot \rho(l^j_t, i) > 0 \,\}$$
Thus, the observable model is able to observe the location and the number of faults that occur in a given year. It is not able to observe the magnitude of the faults.
4. $D_t$: a vector of the transformers that fail DGA testing
Denotes which transformers fail DGA testing in a given year. The results of
DGA testing are a function of the physical condition of the transformers. For simulation
purposes, a threshold with some uncertainty was chosen to
simulate DGA testing. The chosen parameters are detailed in section 4.2.1, pg. 45.
After the replacement decision is made, the exogenous information that arrives is as
follows:
5. $F_t$: a vector of the transformers that experience failure
Denotes which transformers have experienced failures. This information comes
from the unobservable model in section 3.5, pg. 29: transformer $i$
experiences failure when $r^i_t > L^i_t$. For the policy simulation, we are primarily
concerned with $t \in [70, 120]$. The stochastic model of failure times is selected
to minimize the number of failures that occur before $t = 70$, but inevitably
some will occur. These transformers are assumed to be replaced after failure and are not
considered in the policy testing results.
4.1.4 Transition Functions
The transition functions are as follows:
1. $a^i_{t+1} = 1$ if $i \in \hat{a}^o_t$
Transformer $i$ is marked as entering operation if the exogenous information
indicates so.
2. $$c^i_{t+1} = \begin{cases} 0, & \text{if } i \in x_t \\ 0, & \text{if } i \in F_t \\ c^i_t + 1, & \text{otherwise} \end{cases}$$
Reset the chronological age of a transformer if it fails or is replaced; otherwise,
increment the chronological age by one year. Note that in the unobservable
model, the real age $r^i_t$ is also reset when $c^i_t$ is reset.
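3. $$N^i_{t+1,f} = \begin{cases} 0, & \text{if } i \in x_t \\ 0, & \text{if } i \in F_t \\ N^i_{t,f} + \sum_{j=1}^{\hat{N}^{f,o}_t} \mathbb{1}\{ i \in \hat{l}^{j,o}_t \}, & \text{otherwise} \end{cases}$$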
Similar to the chronological age, the counter for the number of faults experienced
is set to 0 if the transformer fails or is replaced. Otherwise, the number of faults
that affected that transformer location is added to the previous total.
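The three transition equations can be sketched in Python as a single state update. This is a minimal illustration, not the thesis's simulation code: transformers are indexed from 0, the function name `observable_transition` is hypothetical, and it assumes (the text does not say) that a transformer already in operation stays in operation.

```python
from typing import List, Sequence, Set, Tuple


def observable_transition(
    a: Sequence[int], c: Sequence[int], n_f: Sequence[int],
    entering: Set[int], x_t: Set[int], failed: Set[int],
    fault_locations: Sequence[Set[int]],
) -> Tuple[List[int], List[int], List[int]]:
    """Advance the observable state (a_t, c_t, N_t,f) by one period.

    entering: transformers entering operation in t (hat a^o_t)
    x_t, failed: transformers replaced at the end of t, and F_t
    fault_locations: one set of affected locations per fault j (hat l^{j,o}_t)
    """
    a_next, c_next, n_next = [], [], []
    for i in range(len(c)):
        # Assumption: once in operation, a transformer stays in operation.
        a_next.append(1 if (a[i] == 1 or i in entering) else 0)
        if i in x_t or i in failed:
            # Replacement or failure resets both age and fault counter.
            c_next.append(0)
            n_next.append(0)
        else:
            c_next.append(c[i] + 1)
            # Add one for every fault j whose affected set contains location i.
            n_next.append(n_f[i] + sum(1 for locs in fault_locations if i in locs))
    return a_next, c_next, n_next
```

For example, with two faults affecting locations {1, 2} and {1}, a transformer at location 1 that is neither replaced nor failed gains two faults in one period.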
4.1.5 Objective Function
The objective function is to minimize the cost to PSE&G and is formally written
as:
$$\min_\pi \left[ \sum_{t=70}^{120} C(s_t, X^\pi(s_t)) \right]$$
In the context of this problem, the costs incurred by PSE&G can be decomposed into
two costs:
1. $C^o(s_t, X^\pi(s_t))$
Denotes the opportunity cost, which is the cost of replacing a transformer before
it reaches failure. All policies that seek to prevent failure will incur some amount
of opportunity cost.
2. $C^f(s_t, X^\pi(s_t))$
Denotes the cost of failure, which is the cost of too many transformer failures in
time $t$. Because PSE&G is concerned with minimizing risk, the cost of failure
must be evaluated in terms of the expected value of some risk measure.
The objective function can be reformulated as:
$$\min_\pi \left[ \sum_{t=70}^{120} C^o(s_t, X^\pi(s_t)) + \mathbb{E} \sum_{t=70}^{120} C^f(s_t, X^\pi(s_t)) \right]$$
This objective function represents the trade-off that PSE&G faces in solving this
problem. PSE&G could avoid the possibility of a high number of transformer failures
by replacing all of its transformers well before they are close to failure. However,
this would incur significant opportunity costs. Thus, the objective function seeks
to minimize the sum of these two costs, and the best performing policy will be one
that minimizes failures while also keeping opportunity costs low. Although the
concepts of these two costs are straightforward, quantifying the costs is a slightly more
difficult problem. The process of quantifying these costs is detailed in section 5.1.3,
pg. 60.
4.2 Base Policy
The Base policy is chosen to be PSE&G’s current policy for transformer re-
placement. The current policy is to replace a transformer when a set of diagnostics,
the most prominent being Dissolved Gas Analysis (DGA) testing, indicates that the
transformer is about to fail. The literature (Hamrick (2009), Karlsson (2007)) shows
that DGA testing only indicates a problem when the transformer is on the verge of
failing. Thus, DGA testing gives no information about the aging of a transformer
before it reaches this point. The exogenous information $D_t$ indicates which
transformers fail DGA testing in a given year $t$.
PSE&G’s current policy is simply to replace whichever transformers fail DGA testing.
However, PSE&G faces a constraint on how many transformers it can replace
in a given year, denoted $\nu$. According to the Asset Strategy team, the maximum
number of replacements that can occur in a year is $\nu = 8$. Thus, even
if DGA testing indicates that more than $\nu$ transformers are near failure, we assume
that PSE&G will not be able to replace all of them.
Thus, the base policy can be formulated as:
$$X^{BP}(s_t) = \begin{cases} D_t, & \text{if } n(D_t) \le \nu \\ D_t[1 : \nu], & \text{if } n(D_t) > \nu \end{cases}$$
where $n(\cdot)$ denotes the number of elements in a set. Moreover, we assume that
elements in $D_t$ are arranged in temporal order, such that the first element in $D_t$ is
the first transformer to fail DGA testing in time $t$.
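A minimal Python sketch of $X^{BP}$, assuming $D_t$ is passed as a list already in temporal order; the name `base_policy` is hypothetical. Slicing covers both cases of the definition, since a list with at most $\nu$ elements is returned whole.

```python
from typing import List, Sequence


def base_policy(d_t: Sequence[int], nu: int = 8) -> List[int]:
    """X^BP(s_t): replace transformers that fail DGA testing, up to nu per year.

    d_t is assumed to be in temporal order, so truncation keeps the
    first nu transformers to fail testing in year t.
    """
    return list(d_t[:nu])
```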
4.2.1 Simulating DGA Testing
DGA testing results are a function of the physical condition of the transformer.
However, we do not simulate the specific physical conditions of transformers. Thus,
we must find a way, in terms of the conditions that are simulated, to decide when
a transformer fails DGA testing. The process of determining DGA failure will have
access to the full unobservable model.
We assume that the threshold for DGA failure is defined in terms of the real age $r^i_t$ and
failure threshold $L^i_t$. Since transformer failure occurs when $r^i_t > L^i_t$, DGA failure
must be triggered while $r^i_t \le L^i_t$. We assume the following distribution for
$D_t$:
$$D_t = \{ i \in [1, N] \mid r^i_t > L^i_t - d \}$$
with parameter $d$ indicating the distance, in terms of real age, from failure at which DGA
testing would indicate a problem.
Since DGA testing is not exact, $d$ is not a static value but is drawn from a
distribution in order to capture this uncertainty:
$$d \sim \mathrm{Unif}(1, d_u)$$
Since failing a DGA test must occur before transformer failure, the lower bound of $d$
was fixed at 1. However, the upper bound $d_u$ is a parameter that must be chosen in
order to best match reality. After testing over 1,000 iterations of the failure model,
$d_u = 3$ was determined to be the most realistic. This was based on two criteria:
1. The average number of transformers that fail DGA testing per year in $t \in [70, 120]$.
According to PSE&G, the average in reality should be between 5 and 6 per year.
2. The base policy of replacing transformers that fail DGA testing, up to 8 per year,
should still produce a few years with spikes in failures. This assumption
is core to the problem we are trying to solve: if the base policy sufficiently
prevented all failures, there would be no need to find better policies.
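The DGA rule above can be sketched as follows. This is an illustration under stated assumptions: a fresh $d$ is drawn independently for each transformer in each year (the text does not specify whether $d$ is shared), $D_t$ is returned in index order rather than temporal order, and the name `simulate_dga` is hypothetical.

```python
import random
from typing import List, Optional, Sequence


def simulate_dga(r_t: Sequence[float], L_t: Sequence[float], d_u: float = 3.0,
                 rng: Optional[random.Random] = None) -> List[int]:
    """Return D_t = {i : r^i_t > L^i_t - d}, with d ~ Unif(1, d_u)."""
    rng = rng or random.Random()
    failed = []
    for i, (r, L) in enumerate(zip(r_t, L_t)):
        # Assumption: an independent draw of d for each transformer.
        d = rng.uniform(1.0, d_u)
        if r > L - d:
            failed.append(i)
    # Index order; the temporal order within the year is not modeled here.
    return failed
```

With $d_u = 3$, a transformer within one unit of real age of its threshold always fails the test, one more than three units away never does, and anything in between fails with some probability.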
4.3 Pure Aging Policy
The Pure Aging policy chooses transformers to replace based solely on the
chronological age of the transformer. This relatively simple policy seeks to replace
all transformers whose age exceeds a certain threshold, denoted $\alpha$. As with all
the policies, this policy gives priority to transformers that fail DGA testing.
The policy chooses additional replacements in years in which $n(D_t) < \nu$.
Let $A_t$ denote the set of all transformers whose chronological age exceeds
$\alpha$:
$$A_t = \{ i \in [1, N] \mid c^i_t > \alpha \}$$
where $\alpha$ is a tunable parameter. We assume that the elements in $A_t$ are sorted in
descending order of age. The Pure Aging policy is then formulated as:
$$X^{PA}(s_t) = \begin{cases} X^{BP}(s_t), & \text{if } n(D_t) > \nu \\ D_t \cup A_t[1 : (\nu - n(D_t))], & \text{else if } n(D_t) + n(A_t) > \nu \\ D_t \cup A_t, & \text{else if } n(D_t) + n(A_t) \le \nu \end{cases}$$
The Pure Aging policy tests the correlation between transformer failure and
chronological age. If all transformers fail in a similar age range, the Pure Aging
policy should perform very well with a properly tuned $\alpha$ parameter. On the other
hand, if the correlation is low, the Pure Aging policy might still be able to reduce
transformer failures, but at a very high opportunity cost.
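A minimal sketch of $X^{PA}$, with `pure_aging_policy` a hypothetical name. It assumes that transformers already in $D_t$ are excluded from $A_t$ so that the union does not count them twice; ties in age are broken by index. Taking a slice of at most $\nu - n(D_t)$ elements covers both of the last two cases of the definition.

```python
from typing import List, Sequence


def pure_aging_policy(d_t: Sequence[int], c_t: Sequence[int],
                      alpha: float, nu: int = 8) -> List[int]:
    """Sketch of X^PA: DGA failures first, then the oldest transformers past alpha."""
    if len(d_t) > nu:
        return list(d_t[:nu])  # X^BP(s_t)
    in_d = set(d_t)
    # A_t: age above alpha, sorted by descending chronological age
    # (assumption: members of D_t are excluded to avoid double counting).
    a_t = sorted((i for i, c in enumerate(c_t) if c > alpha and i not in in_d),
                 key=lambda i: c_t[i], reverse=True)
    # Fill the remaining capacity; the slice covers both remaining cases.
    return list(d_t) + a_t[: nu - len(d_t)]
```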
4.4 Variance Reduction Policy
The Variance Reduction policy seeks to replace the same number of transformers
every year. The motivation for the Variance Reduction policy comes from the
replacement path of the Base policy. In the Base policy, the number of replacements
per year varies significantly, with some years at the maximum of 8 replacements
and other years as low as 1-2 replacements. The Variance Reduction policy tries to
“smooth” the number of replacements per year by replacing a constant number every
year, denoted by parameter η.
The question arises of how to choose which transformers to replace. If $n(D_t) < \eta$,
we must find a way to choose the additional $\eta - n(D_t)$ transformers. We introduce
the concept of the weighted ranking, denoted $W^i_t$. $W^i_t$ uses the chronological age,
$c^i_t$, and the number of faults experienced, $N^i_{t,f}$, to assign a ranking to each transformer
$i$:
$$W^i_t = c^i_t + \beta N^i_{t,f}$$
where $\beta$ is a tunable parameter that determines the weight given to the number
of faults experienced.
We further denote by $W_t$ the array of all $W^i_t$ sorted in descending order. The
Variance Reduction policy can then be formulated:
$$X^{VR}(s_t) = \begin{cases} X^{BP}(s_t), & \text{if } n(D_t) > \eta \\ D_t \cup W_t[1 : (\eta - n(D_t))], & \text{otherwise} \end{cases}$$
The tunable parameters in this policy are $\beta$ and $\eta$.
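The weighted ranking and $X^{VR}$ can be sketched as follows. The function names are hypothetical, transformers already in $D_t$ are assumed to be excluded from $W_t$, and the first branch reuses the Base policy's cap of $\nu$, as in the definition of $X^{BP}$.

```python
from typing import List, Sequence


def weighted_ranking(c_t: Sequence[float], n_f: Sequence[float],
                     beta: float) -> List[float]:
    """W^i_t = c^i_t + beta * N^i_{t,f} for every transformer i."""
    return [c + beta * n for c, n in zip(c_t, n_f)]


def variance_reduction_policy(d_t: Sequence[int], c_t: Sequence[float],
                              n_f: Sequence[float], beta: float,
                              eta: int, nu: int = 8) -> List[int]:
    """Sketch of X^VR: replace eta transformers per year when possible."""
    if len(d_t) > eta:
        return list(d_t[:nu])  # X^BP(s_t)
    w = weighted_ranking(c_t, n_f, beta)
    in_d = set(d_t)
    # W_t: remaining transformers in descending order of weighted ranking
    ranked = sorted((i for i in range(len(w)) if i not in in_d),
                    key=lambda i: w[i], reverse=True)
    return list(d_t) + ranked[: eta - len(d_t)]
```

A larger $\beta$ lets a younger transformer that has absorbed many faults outrank an older one with few.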
4.5 Threshold Policy
The Threshold policy builds upon the intuition of the Variance Reduction policy
by making more direct use of $W^i_t$. Rather than replacing a constant number of
transformers every year, the Threshold policy replaces transformer $i$ when $W^i_t$ exceeds
a threshold value, denoted $\tau$.
Let $T_t$ denote the set of transformers whose $W^i_t$ exceeds $\tau$:
$$T_t = \{ i \in [1, N] \mid W^i_t > \tau \}$$
Then, the Threshold policy is formulated as:
$$X^{T}(s_t) = \begin{cases} X^{BP}(s_t), & \text{if } n(D_t) > \nu \\ D_t \cup T_t[1 : (\nu - n(D_t))], & \text{else if } n(D_t) + n(T_t) > \nu \\ D_t \cup T_t, & \text{else if } n(D_t) + n(T_t) \le \nu \end{cases}$$
The tunable parameters in this policy are $\tau$ and, as in the Variance Reduction
policy, $\beta$.
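A corresponding sketch of $X^{T}$ follows. The text does not state how $T_t$ is ordered when it must be truncated; this sketch assumes descending $W^i_t$, mirroring $W_t$, and excludes members of $D_t$ from $T_t$. The name `threshold_policy` is hypothetical.

```python
from typing import List, Sequence


def threshold_policy(d_t: Sequence[int], c_t: Sequence[float],
                     n_f: Sequence[float], beta: float, tau: float,
                     nu: int = 8) -> List[int]:
    """Sketch of X^T: DGA failures first, then every W^i_t above tau, up to nu."""
    if len(d_t) > nu:
        return list(d_t[:nu])  # X^BP(s_t)
    in_d = set(d_t)
    w = [c + beta * n for c, n in zip(c_t, n_f)]  # W^i_t
    # T_t, assumed sorted by descending weighted ranking
    t_t = sorted((i for i, wi in enumerate(w) if wi > tau and i not in in_d),
                 key=lambda i: w[i], reverse=True)
    return list(d_t) + t_t[: nu - len(d_t)]
```

Unlike the Variance Reduction sketch, the number of extra replacements here varies from year to year with how many rankings cross $\tau$.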
Both the Threshold and Variance Reduction policies rely on the concept of the
weighted ranking, $W^i_t$. The intuition behind $W^i_t$ is to make better use of the information that
is available to PSE&G. Currently, PSE&G does not use the number of jolts $N^i_{t,f}$ in
any way when making the replacement decision. By implementing the Threshold and
Variance Reduction policies, we test the effectiveness of incorporating $N^i_{t,f}$ into the
decision-making process.
4.6 Lookahead Policy
4.6.1 Overview of Lookahead Policies
Lookahead policies describe a class of policies that plan over future periods
in order to make the current decision. In other words, Lookahead policies consider
$s_{t+1}, s_{t+2}, \ldots, s_{t+T}$ in order to make the decision $x_t$ at time $t$.
In contrast to the other policies tested, the Lookahead policy evaluates not just
the current state of the system but also future states. As
such, the Lookahead policy requires an accurate way to predict how $s_t$ will
evolve over time. A Lookahead policy that looks ahead $T$ time periods can be
generically formulated as:
$$X(s_t) = \arg\min_{x_t, x_{t+1}, \ldots, x_{t+T}} \left( C(s_t, x_t) + \sum_{t'=t+1}^{t+T} \mathbb{E}[C(s_{t'}, x_{t'})] \right)$$
where $C(\cdot)$ is a cost function used to evaluate the cost of being in a state.
In the case of this problem, the objective is to minimize the occurrences of years
when the number of failures exceeds $\nu$. Thus:
$$C(s_t, x_t) = 1 \cdot (n(F_t) - \nu)^+$$
However, the cost function also has to take into account the replacement policy in
time $t$. We assume that in every case, transformers will fail DGA testing before
failing outright. Thus, PSE&G will replace all transformers that fail DGA testing, up
to $\nu$, and the cost function is adapted to be:
$$C(s_t, x_t) = 1 \cdot (n(D_t) - \nu)^+$$
The difficult aspect of the Lookahead policy is predicting the future. We
use a point estimate of how many failures occur in future time periods for the
Lookahead policy. This point estimate is based on the weighted ranking, $W^i_t$.
We assume that $W^i_t$ increases by some amount every year, denoted by parameter $\delta$.
Similar to the Threshold policy, a threshold parameter, denoted $\gamma$ in this case, is
chosen such that $W^i_t > \gamma$ indicates DGA failure. Thus, at time $t$, for any time $t' > t$,
the belief model is:
$$\mathbb{E}(W^i_t) = W^i_t$$
$$\mathbb{E}(W^i_{t'}) = \mathbb{E}(W^i_{t'-1}) + \delta$$
$$\mathbb{E}(n(D_{t'})) = n(\{ i \in [1, N] \mid \mathbb{E}(W^i_{t'}) > \gamma \})$$
$$\mathbb{E}(C(s_{t'}, x_{t'})) = 1 \cdot (\mathbb{E}(n(D_{t'})) - \nu)^+$$
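Unrolling the recursion gives $\mathbb{E}(W^i_{t+k}) = W^i_t + k\delta$, so the belief model reduces to counting rankings above $\gamma$. The sketch below computes $\mathbb{E}(n(D_{t'}))$ for $t' = t+1, \ldots, t+T$ and the corresponding expected cost. Like the point estimate it mirrors, it ignores any replacements made between $t$ and $t'$; the function names are hypothetical.

```python
from typing import List, Sequence


def expected_dga_counts(w_t: Sequence[float], delta: float, gamma: float,
                        horizon: int) -> List[int]:
    """E(n(D_{t+k})) for k = 1..horizon under the point-estimate belief model."""
    counts = []
    for k in range(1, horizon + 1):
        # Unrolled recursion: E(W^i_{t+k}) = W^i_t + k * delta
        counts.append(sum(1 for w in w_t if w + k * delta > gamma))
    return counts


def expected_cost(expected_count: int, nu: int = 8) -> int:
    """E(C(s_t', x_t')) = 1 * (E(n(D_t')) - nu)^+."""
    return max(expected_count - nu, 0)
```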
We implement two versions of the Lookahead policy. The first is a 1 year Lookahead
policy that, at time $t$, predicts the number of failures that will occur at time $t + 1$.
The second is a 2 year Lookahead policy that predicts the number of failures that will
occur at times $t + 1$ and $t + 2$. Let $L^k_t$ be the set of transformers chosen for replacement by
a $k$ year Lookahead policy. The Lookahead policy can then be formulated as:
$$X^{L_k}(s_t) = \begin{cases} X^{BP}(s_t), & \text{if } n(D_t) > \nu \\ D_t \cup L^k_t, & \text{otherwise} \end{cases}$$
The tunable parameters in this policy are $\gamma$ and $\delta$. The algorithms for deciding $L^k_t$
are detailed in the sections below.
4.6.2 1 year
The 1 year Lookahead policy uses the predicted number of failures that will occur
at time $t + 1$ to choose $x_t$. Algorithm 4 details the algorithm used to determine $L^1_t$.
Algorithm 4 Determine Transformer Replacement in 1 Year Lookahead Policy
    $W_t = [W^1_t, W^2_t, \ldots, W^N_t]$
    $W_{t+1} = W_t + \delta \cdot \mathbf{1}$
    $\mathbb{E}N^f_{t+1} = 0$
    for $i = 1$ to $N$ do
        if $W^i_{t+1} > \gamma$ then
            $\mathbb{E}N^f_{t+1} = \mathbb{E}N^f_{t+1} + 1$
        end if
    end for
    $n = \max(\min(\nu - n(D_t), \mathbb{E}N^f_{t+1} - \nu), 0)$
    $L^1_t = W_t[1 : n]$
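Algorithm 4 can be written as runnable Python as follows. Two assumptions are labeled in the code: transformers already in $D_t$ are skipped when taking the top of $W_t$, and $n$ is clamped at zero so that no extra replacements are made when no spike is predicted. The name `lookahead_1yr` is hypothetical.

```python
from typing import List, Sequence


def lookahead_1yr(w_t: Sequence[float], d_t: Sequence[int], delta: float,
                  gamma: float, nu: int = 8) -> List[int]:
    """Sketch of Algorithm 4: returns L^1_t as transformer indices."""
    # W_{t+1} = W_t + delta * 1; count transformers predicted above gamma.
    en_next = sum(1 for w in w_t if w + delta > gamma)
    # Replace only the predicted excess over nu, within this year's spare
    # capacity. Assumption: clamp at 0 when no spike is predicted.
    n = max(min(nu - len(d_t), en_next - nu), 0)
    in_d = set(d_t)
    # W_t[1:n]: highest weighted rankings first (assumption: skip D_t members).
    order = sorted((i for i in range(len(w_t)) if i not in in_d),
                   key=lambda i: w_t[i], reverse=True)
    return order[:n]
```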
4.6.3 2 year
The 2 year Lookahead policy looks at the number of failures that will occur at
$t + 1$ and $t + 2$ to choose $x_t$. In contrast to the 1 year Lookahead policy, the 2 year
Lookahead is conditioned upon the decision in time $t + 1$. This means that the 2
year Lookahead policy must loop through every potential path in $t + 1$. Algorithm 5
details the algorithm used to determine $L^2_t$.
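Algorithm 5 Determine Transformer Replacement in 2 Year Lookahead Policy
    for $a = 0$ to $\nu - n(D_t)$ do
        for each $i \in W_t[1 : a]$: set $c^i_t = 0$
        $W_t = [W^1_t, W^2_t, \ldots, W^N_t]$
        $\mathbb{E}W_{t+1} = W_t + \delta \cdot \mathbf{1}$
        $\mathbb{E}N^f_{t+1} = 0$
        for $i = 1$ to $N$ do
            if $\mathbb{E}W^i_{t+1} > \gamma$ then
                $\mathbb{E}N^f_{t+1} = \mathbb{E}N^f_{t+1} + 1$
            end if
        end for
        $\mathbb{E}[n(\hat{D}_{t+1})] = \min(\mathbb{E}N^f_{t+1}, \nu)$
        for $b = 0$ to $\nu - \mathbb{E}[n(\hat{D}_{t+1})]$ do
            $\mathbb{E}W_{t+1}[1 : b] = 0$
            $\mathbb{E}W_{t+2} = \mathbb{E}W_{t+1} + \delta \cdot \mathbf{1}$
            $\mathbb{E}N^f_{t+2} = 0$
            for $i = 1$ to $N$ do
                if $\mathbb{E}W^i_{t+2} > \gamma$ then
                    $\mathbb{E}N^f_{t+2} = \mathbb{E}N^f_{t+2} + 1$
                end if
            end for
            $n_b = \max(\mathbb{E}N^f_{t+2} - \nu, 0)$
        end for
        $n_a = \min_b n_b$
    end for
    $n = \arg\min_a n_a$
    $L^2_t = W_t[1 : n]$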
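A Python sketch of the enumeration in Algorithm 5 follows. It simplifies the state update in ways that are our own assumptions, not the thesis's: a replaced transformer's ranking is reset to 0 (the algorithm resets only $c^i_t$), transformers in $D_t$ are treated as replaced at $t$, and at $t + 1$ the $\mathbb{E}[n(\hat{D}_{t+1})]$ transformers predicted to fail DGA testing plus $b$ more are reset. Ties in $n_a$ are broken toward the smallest $a$. The function names are hypothetical.

```python
from typing import List, Sequence


def count_above(values: Sequence[float], gamma: float) -> int:
    return sum(1 for v in values if v > gamma)


def lookahead_2yr(w_t: Sequence[float], d_t: Sequence[int], delta: float,
                  gamma: float, nu: int = 8) -> List[int]:
    """Sketch of Algorithm 5: returns L^2_t as transformer indices."""
    in_d = set(d_t)
    order = sorted((i for i in range(len(w_t)) if i not in in_d),
                   key=lambda i: w_t[i], reverse=True)
    best_a, best_spill = 0, None
    for a in range(nu - len(d_t) + 1):
        # Hypothetically replace D_t and the top-a candidates at t (ranking -> 0).
        replaced = in_d | set(order[:a])
        ew1 = [(0.0 if i in replaced else w) + delta for i, w in enumerate(w_t)]
        n_dga1 = min(count_above(ew1, gamma), nu)  # E[n(D_{t+1})]
        order1 = sorted(range(len(ew1)), key=lambda i: ew1[i], reverse=True)
        spills = []
        for b in range(nu - n_dga1 + 1):
            # At t+1, reset the predicted DGA failures plus b more (assumption).
            reset = set(order1[: n_dga1 + b])
            ew2 = [(0.0 if i in reset else v) + delta for i, v in enumerate(ew1)]
            spills.append(max(count_above(ew2, gamma) - nu, 0))  # n_b
        n_a = min(spills)
        # Strict '<' keeps the smallest a among ties (fewest early replacements).
        if best_spill is None or n_a < best_spill:
            best_a, best_spill = a, n_a
    return order[:best_a]
```

The nested loops make the cost of the 2 year version roughly $\nu^2$ times that of the 1 year version, since every pair $(a, b)$ is evaluated.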
4.7 Policies To Estimate the Value of Information
In addition to the policies outlined above, we also test two additional policies in
order to estimate the value of information. These two policies incorporate information
that is not currently observable to PSE&G and therefore could not realistically be
implemented. The aim of these policies is to estimate how much our policies would
improve if given access to better information.
The two policies are tested under different assumptions and are summarized as
follows:
1. The Lookahead Plus policy is an extension of the 2 year Lookahead policy outlined
above. This policy incorporates better information about the current
state of the system, rather than using the weighted ranking as an approximation.
This estimates the value of better information about the current state
of the grid.
2. The Perfect Information policy assumes perfect information not only about the
current state of the grid but also about the aging that will occur
in the future.
4.7.1 Lookahead Plus Policy
The Lookahead Plus policy assumes that PSE&G can observe how close to failure
each transformer is. In other words, at time $t$, the policy is assumed to be able to
observe the quantity $L^i_t - r^i_t$. Let $P_t$ denote the exogenous variable representing the arrival of
this information. Moreover, we can test how the quality of this information affects
its value by introducing a noise term $\varepsilon$ such that:
$$P^i_t = (L^i_t - r^i_t) \cdot U^i_t, \quad U^i_t \sim \mathrm{Unif}(1 - \varepsilon, 1 + \varepsilon)$$
$$P_t = [P^1_t, P^2_t, \ldots, P^N_t]$$
It is assumed that $P_t$ is sorted in ascending order such that the first transformers are
closest to failure. We further let $L^p_t$ be the set of transformers chosen for replacement
by the Lookahead Plus policy. Algorithm 6 details the algorithm for finding $L^p_t$.
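The noisy observation $P_t$ and its ascending ordering can be sketched as follows; `observe_distance_to_failure` is a hypothetical name, and each $U^i_t$ is assumed to be drawn independently. Setting $\varepsilon = 0$ recovers the exact distance to failure.

```python
import random
from typing import List, Optional, Sequence, Tuple


def observe_distance_to_failure(r_t: Sequence[float], L_t: Sequence[float],
                                eps: float,
                                rng: Optional[random.Random] = None
                                ) -> Tuple[List[float], List[int]]:
    """P^i_t = (L^i_t - r^i_t) * U^i_t with U^i_t ~ Unif(1 - eps, 1 + eps)."""
    rng = rng or random.Random()
    p = [(L - r) * rng.uniform(1.0 - eps, 1.0 + eps) for r, L in zip(r_t, L_t)]
    # P_t sorted ascending: transformers closest to failure come first.
    order = sorted(range(len(p)), key=lambda i: p[i])
    return p, order
```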
The tunable parameters in this policy are $\varepsilon$ and $\delta_p$. It is important to recognize
that even though the Lookahead Plus policy incorporates better information about
the current state of the system, the $\delta_p$ parameter is still an estimate of how the
transformers will age in the future. Thus, there is still uncertainty about the future
of the system.
Algorithm 6 Determine Transformer Replacement in Lookahead Plus Policy
    for $a = 0$ to $\nu - n(D_t)$ do
        for each $i \in P_t[1 : a]$: set $r^i_t = 0$
        $P_t = [P^1_t, P^2_t, \ldots, P^N_t]$
        $\mathbb{E}P_{t+1} = P_t + \delta_p \cdot \mathbf{1}$
        $\mathbb{E}N^f_{t+1} = 0$
        for $i = 1$ to $N$ do
            if $\mathbb{E}P^i_{t+1} < 0$ then
                $\mathbb{E}N^f_{t+1} = \mathbb{E}N^f_{t+1} + 1$
            end if
        end for
        $\mathbb{E}[n(\hat{D}_{t+1})] = \min(\mathbb{E}N^f_{t+1}, \nu)$
        for $b = 0$ to $\nu - \mathbb{E}[n(\hat{D}_{t+1})]$ do
            $\mathbb{E}P_{t+1}[1 : b] = \infty$
            $\mathbb{E}P_{t+2} = \mathbb{E}P_{t+1} + \delta_p \cdot \mathbf{1}$
            $\mathbb{E}N^f_{t+2} = 0$
            for $i = 1$ to $N$ do
                if $\mathbb{E}P^i_{t+2} < 0$ then
                    $\mathbb{E}N^f_{t+2} = \mathbb{E}N^f_{t+2} + 1$
                end if
            end for
            $n_b = \max(\mathbb{E}N^f_{t+2} - \nu, 0)$
        end for
        $n_a = \min_b n_b$
    end for
    $n = \arg\min_a n_a$
    $L^p_t = P_t[1 : n]$
4.7.2 Perfect Information Policy
The Perfect Information policy assumes perfect information about the current
state of the system as well as perfect information about how the system will age in
the future. In order to simulate the Perfect Information policy, the amount of aging
due to faults was pregenerated for $t \in [70, 120]$.
Algorithm 7 Determine Transformer Replacement in Perfect Information Policy
    for $t = 70$ to $120$ do
        $F_t = \{ i \in [1, N] \mid r^i_t > L^i_t \}$
        if $n(F_t) > \nu$ then
            $x_t = F_t[1 : \nu]$
            for each $i \in x_t$: set $r^i_t = 0$
            $O = F_t[(\nu + 1) : n(F_t)]$
            $t' = t$
            while $n(O) > 0$ do
                $t' = t' - 1$
                if $n(x_{t'}) < \nu$ then
                    $m = \nu - n(x_{t'})$
                    $x_{t'} = [x_{t'}, O[1 : m]]$
                    for each $i \in O[1 : m]$: set $r^i_{t'} = 0$, then for $j = t' + 1$ to $t$, $r^i_j = r^i_{j-1} + A_r[i, j]$
                    $O[1 : m] = [\,]$
                end if
            end while
        else
            $x_t = F_t$
            for each $i \in x_t$: set $r^i_t = 0$
        end if
    end for
Since the Perfect Information policy assumes perfect information about the future,
let $A_r$ denote the array that records the amount of aging each transformer
experiences over the entire time period. For example, $A_r[i, t]$ is the amount of aging
experienced by transformer $i$ in time $t$. The simulation is first run once over all $t$
without any policy to generate $A_r$. Algorithm 7 details the algorithm used to implement the
Perfect Information policy.
The Perfect Information policy thus uses a “spillage” method to prevent failures.
The policy iterates forward from $t = 70$ until the number of failures exceeds $\nu$
in a given year. When this occurs, the policy iterates backward to check whether there is
excess replacement capacity in previous years. If there is capacity, the policy
replaces transformers in those years. The policy continues iterating backward until all
transformer failures in excess of $\nu$ in the original year are accounted for. In this way,
the policy ensures that there are no years in which $n(F_t) > \nu$. Moreover, the
Perfect Information policy gives a lower bound on the opportunity cost of policies that
prevent excessive failures. Policies that we implement will always have a higher opportunity cost than
the Base policy, because they try to replace transformers in advance of failure. The
Perfect Information policy shows how low that opportunity cost can
be while preventing excessive transformer failures, and thus gives important
context for the performance of the other policies.
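The spillage logic of Algorithm 7 can be sketched in Python as follows. It is a simplified, self-contained illustration rather than the thesis's simulation: periods are indexed from 0, each threshold $L^i$ is assumed constant over time, a replaced transformer restarts at real age 0 and then re-accumulates the pregenerated aging $A_r$, and if spillage reaches the first period with failures still unassigned, those failures are simply left in place. The name `perfect_information` is hypothetical.

```python
from typing import List, Sequence


def perfect_information(r0: Sequence[float], L: Sequence[float],
                        A: Sequence[Sequence[float]], nu: int) -> List[List[int]]:
    """Sketch of Algorithm 7. A[i][t] is the pregenerated aging of i in period t."""
    N, T = len(r0), len(A[0])
    # r[i][t]: real age at the end of period t, before any replacement
    r = [[0.0] * T for _ in range(N)]
    for i in range(N):
        age = r0[i]
        for t in range(T):
            age += A[i][t]
            r[i][t] = age
    x: List[List[int]] = [[] for _ in range(T)]

    def replace(i: int, t_rep: int) -> None:
        # Replacement at t_rep resets real age, which then re-accumulates A.
        r[i][t_rep] = 0.0
        for j in range(t_rep + 1, T):
            r[i][j] = r[i][j - 1] + A[i][j]

    for t in range(T):
        failed = [i for i in range(N) if r[i][t] > L[i]]
        for i in failed[:nu]:
            x[t].append(i)
            replace(i, t)
        overflow = failed[nu:]
        t_prev = t
        # Spill the excess backward into earlier years with spare capacity.
        while overflow and t_prev > 0:
            t_prev -= 1
            room = nu - len(x[t_prev])
            if room > 0:
                moved, overflow = overflow[:room], overflow[room:]
                for i in moved:
                    x[t_prev].append(i)
                    replace(i, t_prev)
    return x
```

With $\nu = 1$ and two transformers crossing their thresholds in the same final period, the sketch replaces one in that period and spills the other into the previous, empty period.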
Chapter 5
Results
This chapter presents the results of the policy simulations. First, we identify
the metrics that we use to evaluate the performance of each of the policies. Then,
the results of each policy are presented and compared to each other. The simulation
of each policy involved $K = 1{,}000$ iterations. Moreover, many of the
policies have tunable parameters, which can be tuned to different values depending
on the assumed risk tolerance. We show the results of the policies using different
values for the tunable parameters.
5.1 Evaluation Metrics
5.1.1 Opportunity Cost
The opportunity cost is the cost of replacing a transformer before it experiences
failure. The earlier a transformer is replaced, the higher the opportunity cost will be.
We assume that the opportunity cost of replacing a transformer equals its distance
from the failure threshold at the time of replacement. The policy is implemented
in the time interval $t \in [70, 120]$, and $x_t$ represents the replacement decision at time
$t$. The opportunity cost for the $k$-th simulation is then:
$$OC_k = \sum_{t=70}^{120} \sum_{i \in x_t} \max(L^i_t - r^i_t, 0)$$
Note that we do not consider negative opportunity costs: in other words, if a
transformer has already exceeded the failure threshold at the time of replacement, the
opportunity cost is assumed to be 0. Furthermore, we define the unit opportunity cost,
$OC^u_k$, as the opportunity cost incurred per replacement in the $k$-th simulation:
$$OC^u_k = \frac{OC_k}{\sum_{t=70}^{120} n(x_t)}$$
$OC_k$ is the total opportunity cost in terms of real age. To quantify this cost,
we translate it into lost operation time. Let $\mu_a$ denote
the average annual increase in real age. Through several thousand iterations of
the model, we find the policy-independent value $\mu_a = 2.7$. Thus, $OC_k / \mu_a$ is the
total operation time, in years, lost due to early replacement of transformers in the $k$-th
simulation.
PSE&G gives us some insight into the cost of losing 1 year of operation. Accord-
ing to PSE&G, the cost of purchasing a new transformer is $500,000. Assuming an
average transformer lifetime of 70 years, the value of each year of operation is:
$$c_o = \frac{500{,}000}{70} = \$7{,}142.86$$
Thus, the total dollar value of the opportunity cost, $OC^d_k$, of a policy in simulation $k$
is defined as:
$$OC^d_k = \frac{OC_k}{\mu_a} \, c_o$$
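The three opportunity-cost quantities can be computed per simulation as sketched below. The input format (one pair $(L^i_t, r^i_t)$ per replacement made in $t \in [70, 120]$) and the function name are assumptions; $\mu_a = 2.7$ and $c_o = 500{,}000/70$ are the values given above.

```python
from typing import Sequence, Tuple


def opportunity_costs(replacements: Sequence[Tuple[float, float]],
                      mu_a: float = 2.7,
                      c_o: float = 500_000 / 70) -> Tuple[float, float, float]:
    """Return (OC_k, OC^u_k, OC^d_k) for one simulation k.

    replacements: one (L, r) pair per replacement, i.e. the failure threshold
    and the real age of the replaced transformer at replacement time.
    """
    # OC_k: distance to the failure threshold, floored at 0 (no negative costs)
    oc = sum(max(L - r, 0.0) for L, r in replacements)
    # OC^u_k: opportunity cost per replacement
    oc_unit = oc / len(replacements) if replacements else 0.0
    # OC^d_k: convert real age to lost years via mu_a, then to dollars via c_o
    oc_dollars = oc / mu_a * c_o
    return oc, oc_unit, oc_dollars
```

For instance, replacing a transformer exactly $\mu_a = 2.7$ units of real age before its threshold costs one year of operation, about \$7,142.86.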
5.1.2 Cost of Failure
Recall from section 4.1.5 that PSE&G is primarily concerned with minimizing
the risk of excessive transformer failures. We introduce three risk measures that were
determined after consultation with PSE&G:
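1. $p_9$ is the probability that the number of failures in at least one year of $t \in [70, 120]$ is greater than or equal to 9.
2. $p_{12}$ is the probability that the number of failures in at least one year of $t \in [70, 120]$ is greater than or equal to 12.
3. $p_{15}$ is the probability that the number of failures in at least one year of $t \in [70, 120]$ is greater than or equal to 15.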
These risk measures are derived from the fact that $\nu = 8$. We use the term
“spike” to mean a year in which the number of failures exceeds $\nu$. Nine failures in
a year indicate that the number of failures exceeds the number that PSE&G can
replace. Thresholds of 12 and 15 failures mark levels at which PSE&G would incur increasing
costs due to failure.
These probabilities are computed empirically over the $K$ iterations of each policy.
Here, the term “failures” is defined loosely to mean the sum of the number of
transformers that fail DGA testing and the number of transformers that experience
failure, $n(D_t) + n(F_t)$. Each probability is the fraction of
iterations in which the maximum number of failures in a year reached the threshold:
$$p_9 = \frac{1}{K} \sum_{k=1}^{K} \mathbb{1}\left\{ \max_{t \in [70, 120]} \left( n(D_t) + n(F_t) \right) \ge 9 \right\}$$
$$p_{12} = \frac{1}{K} \sum_{k=1}^{K} \mathbb{1}\left\{ \max_{t \in [70, 120]} \left( n(D_t) + n(F_t) \right) \ge 12 \right\}$$
$$p_{15} = \frac{1}{K} \sum_{k=1}^{K} \mathbb{1}\left\{ \max_{t \in [70, 120]} \left( n(D_t) + n(F_t) \right) \ge 15 \right\}$$
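The empirical estimates of $p_9$, $p_{12}$, and $p_{15}$ reduce to taking each iteration's worst year and counting how often it reaches each threshold. A minimal sketch, assuming each iteration's yearly counts $n(D_t) + n(F_t)$ are available as a list; `risk_measures` is a hypothetical name.

```python
from typing import Dict, Iterable, Sequence


def risk_measures(failure_paths: Sequence[Sequence[int]],
                  thresholds: Iterable[int] = (9, 12, 15)) -> Dict[int, float]:
    """failure_paths[k][t] holds n(D_t) + n(F_t) for iteration k."""
    K = len(failure_paths)
    worst = [max(path) for path in failure_paths]  # worst year per iteration
    return {m: sum(1 for w in worst if w >= m) / K for m in thresholds}
```

Because only the worst year of each iteration matters, an iteration with one spike and an iteration with several spikes of the same size contribute equally.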
In order to translate these risk measures into quantifiable costs, we must assign a
dollar value to transformer failure. We turn to the existing literature on transformer
failure to estimate this value. Bartley (2003) examines the cost of transformer
failures from the perspective of insurance claims, covering insurance claims
on transformer failures between 1997 and 2001. Specifically, for utility substation
transformers, the average cost per transformer claim, $\mu_f$, is \$520,974, which includes
the cost of property damage and business interruption (Bartley, 2003, pg. 3).
The value from Bartley (2003) is used to derive the cost of failure. We assume
that the cost of failure for a policy is the sum of the risk measures, each weighted by the
cost of the corresponding number of failures. Moreover, since multiple transformer failures
in the same year incur higher costs per transformer failure, we apply scaling
factors of 1.5 and 2 to the cost per failure for years with at least 12
and 15 failures, respectively. Thus, the cost of failure is defined as:
$$FC^d = p_9 (9 \mu_f) + 1.5 \, p_{12} (12 \mu_f) + 2 \, p_{15} (15 \mu_f)$$
This formulation assumes that failure costs are only incurred when the number of
transformer failures in a given year exceeds 8, because otherwise PSE&G would be
able to replace all failing transformers.
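The formula for $FC^d$ translates directly into code; this sketch uses the Bartley (2003) value of $\mu_f$ as its default, and the function name is hypothetical.

```python
def failure_cost(p9: float, p12: float, p15: float,
                 mu_f: float = 520_974.0) -> float:
    """FC^d = p9 (9 mu_f) + 1.5 p12 (12 mu_f) + 2 p15 (15 mu_f)."""
    return p9 * (9 * mu_f) + 1.5 * p12 * (12 * mu_f) + 2.0 * p15 * (15 * mu_f)
```

For instance, $p_9 = 1$ with $p_{12} = p_{15} = 0$ gives $FC^d = 9\mu_f = \$4{,}688{,}766$.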
5.1.3 Empirical Objective Function
Recall from section 4.1.5 that the objective function is defined as:
$$\min_\pi \left[ \sum_{t=70}^{120} C^o(s_t, X^\pi(s_t)) + \mathbb{E} \sum_{t=70}^{120} C^f(s_t, X^\pi(s_t)) \right]$$
where the first term represents the opportunity cost and the second term represents
the cost of failure. Given the formulation of the cost of failure in the previous
section, the failure cost is quantified after $K$ iterations of the simulation. $OC^d_k$
represents the opportunity cost of the $k$-th simulation. We introduce the average
opportunity cost over $K$ simulations, $\overline{OC}^d$, such that:
$$\overline{OC}^d = \frac{1}{K} \sum_{k=1}^{K} OC^d_k$$
And thus, we define the empirical objective function to be:
$$\min_\pi \left( \overline{OC}^d + FC^d \right)$$
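Putting the pieces together, the empirical objective averages $OC^d_k$ over the iterations and adds $FC^d$; choosing the best policy is an argmin over the candidates. The dictionary layout and policy labels in this sketch are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def empirical_objective(oc_dollars: Sequence[float], fc_dollars: float) -> float:
    """Mean of OC^d_k over the K iterations, plus FC^d."""
    return sum(oc_dollars) / len(oc_dollars) + fc_dollars


def best_policy(results: Dict[str, Tuple[Sequence[float], float]]) -> str:
    """Argmin of the empirical objective over candidate policies or settings."""
    return min(results, key=lambda name: empirical_objective(*results[name]))
```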
5.1.4 Failure Risk vs. Opportunity Cost per Replacement
We introduce another way to present our results: visualizing the trade-off
between failure risk and opportunity cost. This trade-off is presented graphically,
where:
1. The average unit opportunity cost in years, $\overline{OC}^u / \mu_a$, where $\overline{OC}^u$ is defined
as the average unit opportunity cost over $K$ simulations, is on the x-axis.
2. One of the risk measures, $p_9$, $p_{12}$, or $p_{15}$, which gives the probability of a spike
at or above the respective threshold, is on the y-axis.
This graphical representation has some key advantages over simply looking at the
objective function:
1. A very simple representation of the results from the perspective of the policy
maker. The utility can decide its own risk tolerance and pick a policy that
reduces failure risk to within that tolerance. For example, a utility might decide
that it is willing to incur extra opportunity cost in order to reduce risk further.
This makes the results of our simulations more applicable to different
policy makers.
2. The results are presented without dollar values assigned to them.
This makes the results independent of the assumptions about costs and allows
different utilities to assign costs appropriate for their own grids.
From this, we can find the efficient frontier. Policies that lie on the efficient
frontier have the lowest unit opportunity cost for a given level of failure risk.
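The efficient frontier can be extracted with a single sweep: sort the policies by unit opportunity cost and keep each one whose risk is strictly lower than that of every cheaper policy. A minimal sketch, assuming each tested parameter setting is summarized as a pair $(\overline{OC}^u/\mu_a, p)$ for one chosen risk measure $p$; labels are hypothetical.

```python
from typing import Dict, List, Tuple


def efficient_frontier(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Labels on the frontier, in increasing unit opportunity cost."""
    frontier: List[str] = []
    best_risk = float("inf")
    # Sort by cost, breaking ties by risk, then keep strict risk improvements.
    for label, (cost, risk) in sorted(points.items(), key=lambda kv: kv[1]):
        if risk < best_risk:
            frontier.append(label)
            best_risk = risk
    return frontier
```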
5.2 Comparing Policy Performance
First, we evaluate the policy performance in terms of the objective function (or
cost). Figure 5.1 shows the values of the objective function for the simulation results
of the different policies. Each policy is tested with various values for the tunable
parameters. In Figure 5.1, we show only the results using the best sets
of tunable parameters for each policy; we look at the effect of tuning parameters
in detail later in this section. Even so, we see that tuning the parameters has
a large effect on the objective function in almost all the policies. The Threshold
policy performed the best, with the minimum cost across all policies. The 1-year
and 2-year Lookahead policies incurred costs that were not much higher than the
Threshold policy and had the next best performance. Neither the Pure Aging nor the Variance
Reduction policy was able to outperform the Base policy in terms of cost. This
does not mean that these two policies were ineffective at reducing failure risk, but
rather that the trade-off in opportunity costs was too high. Next, we look at how the
different policies perform in terms of this trade-off.
Figure 5.1: Values of the objective function of policies with different values for the tunable parameters. The orange line represents the minimum value across all policies, and the labels above each policy represent the minimum value within each policy.
Figure 5.2: The best and worst values of the objective function within each policy, with costs broken down by opportunity cost and failure cost
Figure 5.2 shows the highest and lowest costs within each policy and compares
the cost breakdown in terms of cost of failure, $FC^d$, and opportunity cost, $\overline{OC}^d$.
We see that in the Variance Reduction, Threshold, and both Lookahead policies, the
highest cost occurs when the cost is almost entirely opportunity cost. This suggests
that these policies are able to fully eliminate failure risk, but at an extremely high
opportunity cost. In terms of the lowest costs within each policy, the best performing
policies (Threshold and Lookaheads) have a more balanced breakdown between cost
of failure and opportunity cost. This shows that these policies are able to reduce
failure risk enough that the trade-off in opportunity cost is worthwhile. By comparison,
the lowest-cost Pure Aging and Variance Reduction policies
have a much higher proportion of costs due to failure. In fact, the
lowest Pure Aging and Variance Reduction costs do not have much lower costs of
failure than the Base policy. The preliminary conclusion is that these two
policies are less effective than the Threshold and Lookahead policies at identifying
which transformers to replace, resulting in opportunity costs that outweigh the reduction in
failure risk. We can look at the relationship between the number of replacements
made in $t \in [70, 120]$ and the unit opportunity cost to better understand
the opportunity cost.
Figure 5.3: The relationship between the number of replacements made in the time interval $t \in [70, 120]$ and the opportunity cost per replacement, compared across policies
From Figure 5.3, we see that the number of replacements chosen by a policy
has a direct relationship with the unit opportunity cost. This is expected because
as the total number of replacements increases, the number of replacements per year
increases, which results in more transformers being replaced before failure. The slope
of this graph indicates the rate at which the opportunity cost increases as the number
of replacements increases. Since we want to minimize the opportunity cost, a
smaller slope indicates better performance, because it shows that a policy is choosing
additional replacements while incurring lower opportunity costs. The Threshold and
Lookahead policies show very similar shallow slopes. In contrast, the Pure Aging
policy has the steepest slope, showing that it replaces a small number of
additional transformers at an extremely high opportunity cost.
As a reference point, the Base policy replaces 279 transformers on
average, while the total number of transformer failures is 290 on average. Although
the Base policy fails to replace only 11 transformers on average, these additional
failures all occur in just a couple of years, which results in the spikes we are worried
about. However, given that the total number of transformer failures was 290, we
see from Figure 5.3 that many of the policies over-replace transformers. For many
of the policies, even when replacing far more than 290 transformers, the
failure risk is still significant. This suggests that the policies are not very good at
determining which transformers are about to fail, and in order to reduce failure risk,
they must cast a wider net and incur significant opportunity costs.
In this section, we have presented a high-level analysis of the policies by
comparing their performance using the objective function. Next, we
look at the performance of each policy in detail.
5.3 Policy Specific Results
5.3.1 Base Policy
The Base policy represents PSE&G’s current policy of replacing transformers
when they fail DGA testing. Table 5.1 shows the results for the Base policy, where
the columns represent the unit opportunity cost in years, the 3 risk measures, and
the value of the objective function respectively.
OCu/µa p9 p12 p15 Obj.
0.215 0.994 0.382 0.042 9328.28
Table 5.1: Results for Base policy
Since DGA testing indicates when a transformer is close to failure, we see that
the opportunity cost of the Base policy is very low. Because all the other policies choose
replacements in addition to the ones that fail DGA testing, the opportunity cost of
the Base policy is a lower bound on the opportunity cost of every policy we test. In terms
of failure risk, the Base policy confirms PSE&G's fears regarding spikes in the number
of failures per year. p9 is close to 1, indicating an almost 100% probability that there
will be spikes of 9 or more failures in a year. Figure 5.4 shows a sample path
of the Base policy. We see that when the failures do occur, the policy chooses to
replace 8 transformers, which is the maximum number of transformers that can be
replaced in a given year. This indicates that the DGA testing is able to identify the
transformers that are about to fail, but there is not enough capacity to replace them
all. This means that in order to prevent these failures, some transformers will need to
be replaced before the year they fail, indicating that the opportunity cost of policies
that seek to reduce failure risk will always be higher than the opportunity cost of the
Base policy.
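The capacity bottleneck described above can be illustrated with a minimal sketch. It assumes, as a simplification that is not part of the simulation model, that every transformer failing DGA testing in a year will fail that year unless it is replaced, and it uses the 8-per-year capacity stated above. The function name `base_policy_path` and the yearly flag counts are hypothetical.

```python
"""Sketch: the Base policy under a fixed annual replacement capacity.

Simplifying assumption (not the thesis model): a transformer that fails DGA
testing in a year fails that same year unless it is replaced.
"""

MAX_REPLACEMENTS_PER_YEAR = 8  # annual capacity stated in the text


def base_policy_path(flagged_per_year, capacity=MAX_REPLACEMENTS_PER_YEAR):
    """Return (replacements, unreplaced) per year for yearly DGA-flag counts."""
    replacements = []
    unreplaced = []
    for flagged in flagged_per_year:
        replaced = min(flagged, capacity)      # cannot exceed capacity
        replacements.append(replaced)
        unreplaced.append(flagged - replaced)  # flagged units left in service
    return replacements, unreplaced


if __name__ == "__main__":
    # Hypothetical flag counts for five consecutive years.
    reps, missed = base_policy_path([5, 6, 12, 10, 4])
    print(reps)    # [5, 6, 8, 8, 4]
    print(missed)  # [0, 0, 4, 2, 0]
```

Shortfalls appear only in years where the number of flagged transformers exceeds capacity, which mirrors how the Base policy's unreplaced failures concentrate in a few years.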
Figure 5.4: A sample path of the Base policy for t ∈ [70, 120]. The blue line represents the number of replacements and the red line represents the number of failures.
5.3.2 Pure Aging Policy
α OCu/µa p9 p12 p15 Obj.
110 0.526 0.996 0.370 0.037 9802.06
105 0.758 0.983 0.346 0.045 10132.14
100 1.016 0.987 0.305 0.032 10102.58
95 1.271 0.975 0.281 0.029 10309.26
90 1.588 0.952 0.261 0.031 10714.34
85 2.011 0.913 0.190 0.013 10487.27
80 2.750 0.807 0.103 0.011 10728.42
75 4.089 0.559 0.024 0.000 11556.87
70 6.079 0.368 0.015 0.000 14954.43
Table 5.2: Results for Pure Aging policy
The Pure Aging policy chooses additional transformers to replace based purely
on chronological age. Recall from section 4.3 that α is a tunable parameter that
indicates a chronological age threshold for replacement. Table 5.2 shows the results
of the Pure Aging policy. We see that as α decreases, the failure risk decreases across
all 3 risk measures since transformers get replaced at younger and younger ages.
However, as failure risk decreases, the opportunity cost increases. Figure 5.5 gives
a graphical representation of the trade-off between failure risk and opportunity cost.
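The Pure Aging rule can be sketched as follows. This is an illustrative reading of the policy, not the implementation from section 4.3: it assumes that DGA-flagged transformers are replaced first, that the age test is inclusive (age ≥ α), and that any remaining capacity goes to the oldest qualifying transformers. The `Transformer` class and `pure_aging_policy` function are hypothetical names.

```python
"""Sketch of a Pure Aging replacement rule (hypothetical implementation)."""
from dataclasses import dataclass

MAX_REPLACEMENTS_PER_YEAR = 8  # annual capacity stated in the text


@dataclass
class Transformer:
    """Hypothetical record of the observable state of one transformer."""
    name: str
    age: int          # chronological age in years
    failed_dga: bool  # True if DGA testing flags the unit this year


def pure_aging_policy(fleet, alpha, capacity=MAX_REPLACEMENTS_PER_YEAR):
    """Return the names of transformers chosen for replacement this year."""
    # DGA-flagged units are replaced first, as under the Base policy.
    chosen = [t for t in fleet if t.failed_dga][:capacity]
    # Assumed tie-break: leftover capacity goes to the oldest units with age >= alpha.
    candidates = sorted(
        (t for t in fleet if not t.failed_dga and t.age >= alpha),
        key=lambda t: t.age,
        reverse=True,
    )
    chosen += candidates[: capacity - len(chosen)]
    return [t.name for t in chosen]


if __name__ == "__main__":
    fleet = [
        Transformer("A", 60, True),
        Transformer("B", 90, False),
        Transformer("C", 110, False),
        Transformer("D", 70, False),
    ]
    print(pure_aging_policy(fleet, alpha=85))  # ['A', 'C', 'B']
```

Lowering α enlarges the candidate set, so once much of the fleet is near α the remaining capacity is filled every year, consistent with the sustained periods of maximum replacement discussed below.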
Figure 5.5: Performance of the Pure Aging policy across the 3 different risk measures with different values for α. α decreases from left to right.
We see that α can be tuned depending on the risk tolerance of the utility. A
more risk-averse utility would prefer a lower value for α that decreases failure risk
despite incurring higher opportunity costs. We note that the Pure Aging policy is
not able to reduce the failure risk completely: even at low α values, there is still
significant risk of 9 or more failures. Figure 5.6 shows the sample paths of the Pure
Aging policy with different values of α.
The sample paths in Figure 5.6 each show a sustained period where the policy
chooses to replace the maximum number of transformers. As α increases, the sustained
period begins later. This suggests that the sustained period of
maximum replacements occurs when a significant portion of the transformer popula-
tion has a chronological age close to α. In Figures 5.6a and 5.6b, we clearly see that
the replacement path becomes similar to the Base policy shown in Figure 5.4 after
the sustained period ends. This is when all of the transformers in the network are
younger than α and the Pure Aging policy reverts to the Base policy. The presence of
a sustained period likely indicates that the α value is too low, since transformers
are consistently reaching α years of age without failing. In contrast, Figure 5.6d shows
no sustained period, but α = 105 does not significantly reduce the failure risk. From
this, we conclude that chronological age is not a good predictor of transformer
failure.
(a) α = 75 (b) α = 85
(c) α = 95 (d) α = 105
Figure 5.6: Sample paths of the Pure Aging policy with different values for α
5.3.3 Variance Reduction Policy
The Variance Reduction policy is the first policy we test that utilizes the weighted
ranking to choose transformers to replace and seeks to replace a constant number of
transformers over time. The weighted ranking adds information by utilizing both the
chronological age of the transformer as well as the number of jolts it has experienced.
The tunable parameters in this policy are β and η. β is a tunable parameter for the
calculation of the weighted ranking, $W^i_t$. After extensive testing, β was chosen to be
0.2. This value of β is used for the calculation of $W^i_t$ for all other policies that utilize
it. η represents the constant number of replacements per year. Table 5.3 shows the
results of the Variance Reduction policy.
η OCu/µa p9 p12 p15 Obj.
4 0.839 0.988 0.284 0.021 9330.07
5 1.676 0.892 0.151 0.017 9348.77
6 3.776 0.251 0.014 0.001 9580.87
7 7.460 0.003 0.000 0.000 18667.92
Table 5.3: Results for Variance Reduction Policy
Similar to α in the Pure Aging policy, η can be tuned depending on the risk tolerance
of the utility. As η increases, the failure risk decreases across all 3 risk measures.
For η = 4, the policy does not reduce p9 by very much but does reduce the other 2 risk
measures significantly over the Base policy. Overall, we see that for η = 4, 5, 6, the
policy does a decent job of reducing risk while not incurring high opportunity costs.
In these cases, the objective function values are higher than the Base policy, but not
by much. Figure 5.7 gives a graphical representation of this relationship.
We see that between η = 5 and η = 6, the policy is able to dramatically decrease
the failure risk. The average number of failures per year across the time horizon is
slightly less than 6, so replacing 6 transformers per year roughly keeps pace with failures
and this decrease in failure risk makes sense. Unlike the Pure Aging policy, the Variance
Reduction policy is able to almost entirely eliminate the failure risk when η = 7
(p9 = 0.003). However, we see that the opportunity cost increases significantly
between η = 6 and η = 7, suggesting that the trade-off in opportunity cost may not
be worth it. Figure 5.8 shows sample paths for different values of η. We see that
for η = 5, 6, the number of failures is reduced significantly compared to when η = 4.
Figure 5.7: Performance of the Variance Reduction policy across the 3 different risk measures with β = 0.2 and different values for η. η decreases from left to right.
Comparing the graphs for η = 6 and η = 7, we see a dramatic difference. When η = 6,
there are still a few years when the number of replacements exceeds 6. However,
when η = 7, the policy never replaces more than 7 transformers. Since the policy
only chooses to replace more than η transformers when DGA testing indicates more
than η failures, we see that η = 7 is dramatically over-replacing transformers.
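A minimal sketch of the Variance Reduction rule, under stated assumptions: DGA-flagged transformers are always replaced (up to the 8-per-year capacity), and when fewer than η are flagged, the shortfall is filled with the unflagged transformers that have the highest weighted ranking. The function name and the ranking values in the example are hypothetical.

```python
"""Sketch of a Variance Reduction replacement rule (hypothetical implementation)."""

MAX_REPLACEMENTS_PER_YEAR = 8  # annual capacity stated in the text


def variance_reduction_policy(flagged, rankings, eta,
                              capacity=MAX_REPLACEMENTS_PER_YEAR):
    """flagged: ids failing DGA this year; rankings: dict id -> weighted ranking W."""
    chosen = list(flagged[:capacity])
    already = set(chosen)
    # Unflagged units ordered from highest to lowest weighted ranking.
    others = sorted((i for i in rankings if i not in already),
                    key=rankings.get, reverse=True)
    # Top up to eta replacements only if DGA flagged fewer than eta.
    shortfall = max(0, min(eta, capacity) - len(chosen))
    return chosen + others[:shortfall]


if __name__ == "__main__":
    W = {"a": 130.0, "b": 125.0, "c": 110.0, "d": 118.0}
    print(variance_reduction_policy(["a"], W, eta=3))                 # ['a', 'b', 'd']
    print(variance_reduction_policy(["a", "b", "c", "d"], W, eta=3))  # ['a', 'b', 'c', 'd']
```

When more than η transformers are flagged, the rule replaces all of them up to capacity, which is the only way it exceeds η replacements in a year.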
The Variance Reduction policy clearly performs better than
the Pure Aging policy. This suggests that the weighted ranking serves as a better
predictor of failure than chronological age. However, the Variance Reduction
policy is limited by the fact that η must be an integer that realistically lies
between 4 and 7, so the policy lacks fine-grained control over risk. This is shown by the
huge difference in results between η = 6 and η = 7. We next look at the Threshold
policy, which will also use the weighted ranking but will allow for the policy to be
more flexible in choosing when to replace transformers.
(a) η = 4 (b) η = 5
(c) η = 6 (d) η = 7
Figure 5.8: Sample paths of the Variance Reduction policy with different values for η
5.3.4 Threshold Policy
Similar to the Variance Reduction policy, the Threshold policy uses the weighted
ranking to choose replacements. The tunable parameters in this policy are β and τ .
As before, β is chosen to be 0.2. τ is a parameter that represents the nominal
“threshold” in terms of $W^i_t$, past which a transformer should be replaced. As with
the other policies, τ can be tuned depending on the utility’s risk tolerance. Table 5.4
shows the results from the Threshold policy.
τ OCu/µa p9 p12 p15 Obj.
150 0.218 0.994 0.390 0.046 9472.02
145 0.227 0.991 0.383 0.043 9364.74
140 0.268 0.997 0.348 0.036 9039.88
135 0.392 0.983 0.291 0.028 8570.88
130 0.696 0.951 0.212 0.018 8168.57
125 1.305 0.828 0.139 0.011 8141.12
120 2.313 0.529 0.039 0.004 8006.10
115 3.786 0.171 0.006 0.000 9507.75
110 5.710 0.024 0.000 0.000 13673.72
105 7.952 0.002 0.000 0.000 19836.75
100 9.963 0.002 0.000 0.000 26141.16
Table 5.4: Results for Threshold Policy
As τ decreases, the failure risk decreases. Unsurprisingly, as failure risk is reduced,
the opportunity cost increases. However, in contrast to the other policies, the
objective function actually starts to decrease as the failure risk is reduced up to a
point. The minimum value of the objective function is at τ = 120, which moderately
reduces failure risk while not incurring significant opportunity cost. This trend is
significant, because all of the other policies, while successful at reducing the failure
risk, always increased the objective function over the Base policy. This suggests that
the Threshold policy is more efficient at reducing failure risk compared to the policies
we have tested. Figure 5.9 gives a graphical representation of this relationship.
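The location of the minimum can be read directly off Table 5.4. The short check below assumes, as in the text, that a lower objective value is better, and compares each τ against the Base policy's objective from Table 5.1.

```python
"""Locate the best tau in Table 5.4 and the settings that beat the Base policy."""

BASE_OBJECTIVE = 9328.28  # Table 5.1

# (tau, objective) pairs transcribed from Table 5.4
THRESHOLD_RESULTS = [
    (150, 9472.02), (145, 9364.74), (140, 9039.88), (135, 8570.88),
    (130, 8168.57), (125, 8141.12), (120, 8006.10), (115, 9507.75),
    (110, 13673.72), (105, 19836.75), (100, 26141.16),
]

# Lower objective is better, so the best setting is the row with the minimum.
best_tau, best_obj = min(THRESHOLD_RESULTS, key=lambda row: row[1])
beats_base = [tau for tau, obj in THRESHOLD_RESULTS if obj < BASE_OBJECTIVE]

print(best_tau, best_obj)  # 120 8006.1
print(beats_base)          # [140, 135, 130, 125, 120]
```

Only τ from 140 down to 120 improve on the Base policy; below τ = 120 the rising opportunity cost outweighs the further reduction in failure risk.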
Compared to the Variance Reduction policy, Figure 5.9 shows a more gradual
decrease in failure risk. This is expected since τ is a continuous parameter controlling
risk tolerance whereas η is discrete. This also means that the Threshold policy is able
to eliminate risk at a lower opportunity cost, compared to the Variance Reduction
policy. Moreover, at similar risk levels, the Threshold policy has a lower opportunity
cost compared to the Variance Reduction policy. This suggests that the Threshold
policy is overall more efficient than the Variance Reduction policy.
Figure 5.10 shows sample paths for the Threshold policy with varying values for τ .
We see that in Figure 5.10d where τ = 110, the policy suffers from a similar problem as
the low-α Pure Aging policies in terms of a sustained period of maximum replacement.
Figure 5.9: Performance of the Threshold policy across the 3 different risk measures with β = 0.2 and different values for τ. τ decreases from left to right.
In this case, the result tells a similar story: the presence of a sustained period of
maximum replacement indicates that τ is too low, since many transformers are
reaching the threshold before failing. We also see smaller sustained
periods of maximum replacement in Figures 5.10a, 5.10b, and 5.10c. However, these
periods are much shorter and coincide with periods of transformer failures.
This indicates that spikes occur when many transformers in the system
are close to the threshold. This suggests that, given an appropriate τ, the Threshold
policy is better at identifying which transformers are about to fail than the other
policies we have tested. In contrast with the Pure Aging policy, this corroborates
our hypothesis that the weighted ranking is a more useful
predictor of transformer failure than chronological age. Nevertheless, although the
sustained periods are shorter and occur at more appropriate times, their occurrences
show that the Threshold policy still over-replaces transformers.
(a) τ = 140 (b) τ = 130
(c) τ = 120 (d) τ = 110
Figure 5.10: Sample paths of the Threshold policy with different values for τ
5.3.5 Lookahead Policy
The Lookahead policy that is implemented is based on the same methodology
as the Threshold policy. In this way, the Lookahead policy is more of an extension
of the Threshold policy, and the Threshold policy can be thought of as a “0-year
Lookahead”. Tables 5.5a, 5.5b show the results of the 1 year and 2 year Lookahead
Policies. The tunable parameters in this policy are γ and δ. We assume that $W^i_t$
increases by δ each year, meaning that δ captures our expectation of the average
increase in $W^i_t$ in a given year. After testing, we found that δ1 = 1 for
the 1-year Lookahead and δ2 = 0.5 for the 2-year Lookahead gave the best
performance. It is interesting that the two versions of the Lookahead policy have
different best-fit values for δ. However, since the 2-year Lookahead is projecting
further into the future, it does make sense that it would have a more conservative,
lower value for the expected amount of aging.
γ OCu/µa p9 p12 p15 Obj.
140 0.215 0.997 0.35 0.031 8871.45
135 0.217 0.994 0.363 0.034 9028.82
130 0.288 0.992 0.377 0.042 9421.95
125 0.707 0.954 0.239 0.030 8644.64
120 1.631 0.764 0.099 0.010 8169.88
115 3.001 0.388 0.03 0.002 8804.46
110 4.845 0.065 0.001 0.000 11540.28
105 7.230 0.008 0.000 0.000 17654.57
100 9.613 0.001 0.000 0.000 24861.14
(a) 1 Year Lookahead
γ OCu/µa p9 p12 p15 Obj.
140 0.215 0.99 0.368 0.040 9146.86
135 0.215 0.997 0.370 0.037 9150.58
130 0.216 0.991 0.380 0.036 9205.02
125 0.310 0.989 0.324 0.039 8910.25
120 0.930 0.926 0.219 0.026 8735.08
115 2.090 0.637 0.077 0.001 8249.48
110 3.809 0.182 0.007 0.000 9495.73
105 6.046 0.019 0.001 0.000 14360.82
100 8.710 0.001 0.000 0.000 21891.05
(b) 2 Year Lookahead
Table 5.5: Results for Lookahead Policy
γ is the Lookahead analog for τ in the Threshold policy in that it determines the
threshold past which a transformer should be replaced. The difference is that, in the
Lookahead, γ is used to predict transformer failures in future periods rather than the
current period. Similar to τ , γ can be tuned depending on the risk tolerance of the
utility.
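The relationship between the Threshold and Lookahead policies can be made explicit in a short sketch. It assumes that each transformer's weighted ranking grows by exactly δ per year, that a transformer is replaced when its projected ranking reaches γ (inclusive), that DGA-flagged transformers are replaced first, and that candidates are taken in order of projected ranking up to the 8-per-year capacity. With `years_ahead=0` the rule reduces to the Threshold policy with τ = γ, matching the “0-year Lookahead” view. The function name and example rankings are hypothetical; δ = 1 and δ = 0.5 echo the tuned values reported above.

```python
"""Sketch of a Lookahead replacement rule (hypothetical implementation)."""

MAX_REPLACEMENTS_PER_YEAR = 8  # annual capacity stated in the text


def lookahead_policy(flagged, rankings, gamma, delta, years_ahead,
                     capacity=MAX_REPLACEMENTS_PER_YEAR):
    """flagged: ids failing DGA; rankings: dict id -> current weighted ranking W."""
    chosen = list(flagged[:capacity])
    already = set(chosen)
    # Assumed constant growth: W is projected forward by delta per year.
    projected = {i: w + years_ahead * delta
                 for i, w in rankings.items() if i not in already}
    # Units whose projected ranking reaches gamma, highest first.
    due = sorted((i for i, w in projected.items() if w >= gamma),
                 key=projected.get, reverse=True)
    return chosen + due[: capacity - len(chosen)]


if __name__ == "__main__":
    W = {"a": 119.5, "b": 118.2, "c": 121.0}
    print(lookahead_policy([], W, gamma=120, delta=1.0, years_ahead=0))  # ['c']
    print(lookahead_policy([], W, gamma=120, delta=1.0, years_ahead=1))  # ['c', 'a']
    print(lookahead_policy([], W, gamma=120, delta=0.5, years_ahead=2))  # ['c', 'a']
```

Because the projection adds the same constant to every transformer, it shifts which rankings clear γ but cannot distinguish a transformer that is about to suffer a large fault from one that is not.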
The results show that the Lookahead policies perform quite similarly to each
other. It is interesting that the 2 year Lookahead Policy does not perform signifi-
cantly better than the 1 year Lookahead despite projecting further into the future.
Moreover, both 1 and 2 year Lookahead perform very similarly to the Threshold
policy. Figure 5.11 gives a graphical representation of the relationship between op-
portunity cost and failure risk for the Lookahead policies.
(a) 1 Year (b) 2 Year
Figure 5.11: Performance of the Lookahead policies across the 3 different risk measures with β = 0.2, δ1 = 1, δ2 = 0.5 and different values for γ. γ decreases from left to right.
We see that Figures 5.11a and 5.11b look remarkably similar across all 3 of the risk
measures. Overall, the Lookahead policies surprisingly do not perform significantly
better than the Threshold policy. We analyze why this may be the case in section 5.5.3.
5.4 Comparison Across Different Risk Measures
In this section, we compare the results of the different policies across the 3 risk
measures. Although the Threshold policy performed the best in terms of the objective
function (see Figure 5.1, pg. 62), we want to further investigate which policy performs
the best across different risk tolerances. In this way, we can find the efficient frontier
that defines which policy incurs the lowest opportunity cost for a given amount of
risk that a utility is willing to take on. Graphically, a more efficient policy will have
a steeper slope. Moreover, we analyze the 3 different risk measures separately to
explore if our results vary depending on risk measure.
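The efficient frontier can be computed as the set of non-dominated (opportunity cost, risk) pairs pooled across policies. As a minimal sketch, the example below pools the p9 columns of Tables 5.2 and 5.4; it compares tabulated points only and does not interpolate between them as the figures do. The function name `efficient_frontier` is hypothetical.

```python
"""Sketch: efficient frontier of pooled (opportunity cost, p9) points."""

# (unit opportunity cost, p9, policy) transcribed from Tables 5.2 and 5.4
POINTS = [
    (0.526, 0.996, "Pure Aging"), (0.758, 0.983, "Pure Aging"),
    (1.016, 0.987, "Pure Aging"), (1.271, 0.975, "Pure Aging"),
    (1.588, 0.952, "Pure Aging"), (2.011, 0.913, "Pure Aging"),
    (2.750, 0.807, "Pure Aging"), (4.089, 0.559, "Pure Aging"),
    (6.079, 0.368, "Pure Aging"),
    (0.218, 0.994, "Threshold"), (0.227, 0.991, "Threshold"),
    (0.268, 0.997, "Threshold"), (0.392, 0.983, "Threshold"),
    (0.696, 0.951, "Threshold"), (1.305, 0.828, "Threshold"),
    (2.313, 0.529, "Threshold"), (3.786, 0.171, "Threshold"),
    (5.710, 0.024, "Threshold"), (7.952, 0.002, "Threshold"),
    (9.963, 0.002, "Threshold"),
]


def efficient_frontier(points):
    """Keep the points that no other point beats on both cost and risk.

    After sorting by cost (then risk), a point is kept only if its risk is
    strictly lower than every risk seen earlier in the scan.
    """
    frontier = []
    best_risk = float("inf")
    for point in sorted(points):
        risk = point[1]
        if risk < best_risk:
            frontier.append(point)
            best_risk = risk
    return frontier


frontier = efficient_frontier(POINTS)
print(sorted({policy for _, _, policy in frontier}))  # ['Threshold']
```

On these two tables every frontier point comes from the Threshold policy, in line with the finding that the Pure Aging policy is the least efficient; the same function applies unchanged to the other tables.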
Figure 5.12: Performance of different policies in terms of reducing p9 with varying tunable parameters for each policy
Figure 5.12 shows the performance of the different policies on the p9 risk measure.
We see that the performance of the Threshold and Lookahead policies is extremely
similar. These policies perform virtually identically on the interval p9 ∈ [0.7, 1.0].
However, in the interval p9 ∈ [0.1, 0.7], the Threshold policy manages to outperform
the Lookahead policies, consistently incurring slightly lower opportunity costs for the
same level of risk. In the final interval, p9 ∈ [0, 0.1], we again see that the
Threshold and Lookahead policies perform very similarly. Both of these policies are
consistently more efficient than the Variance Reduction and the Pure Aging policies.
We see that the Pure Aging policy is by far the least efficient at reducing risk. Overall,
the Threshold policy proves to be the most efficient policy across all risk levels.
Figure 5.13: Performance of different policies in terms of reducing p12 with varying tunable parameters for each policy
Figure 5.13 shows the performance of the different policies on the p12 risk mea-
sure. We see that the relative performance on p12 largely tells the same story as p9.
The Threshold policy still consistently outperforms all of the other policies (except for
around when p12 = 0.1 where the 1 year Lookahead policy performs slightly better).
Overall, we see that although the relative performance of the policies remains largely
unchanged, the difference in slopes is smaller. The Pure Aging policy’s performance
on p12 is much closer to that of the other policies than it was on p9. This is because
p12 measures a more extreme event than p9 (12 or more failures in a year rather than 9
or more), so its probabilities are lower under every policy. Thus, the worse-performing
policies are relatively more efficient compared to the better-performing policies on the
higher risk measures.
Figure 5.14: Performance of different policies in terms of reducing p15 with varying tunable parameters for each policy
Figure 5.14 shows the performance of the different policies on the p15 risk measure.
We see some interesting behavior from the Lookahead and Pure Aging policies,
where the risk measure actually increases before decreasing. We attribute this
to the fact that 15 failures in a year is reached so rarely that the policies are unable
to consistently reduce the probability at high risk tolerances. Nevertheless, the
Threshold policy is still consistently the most efficient policy across all risk levels.
In any case, p15 is extremely low and should not be a significant concern for
the utility.
5.4.1 Case Study: Zero Risk Tolerance
We explore a specific case where we assume that the utility is extremely risk
averse and has zero risk tolerance. In this case, we consider the opportunity cost
incurred across the different policies when the failure risk is reduced to 0. Note
that for p12 and p15, we consider when the policy actually reaches 0. However, since
none of the policies are able to completely reduce p9 to 0, we consider p9 completely
reduced when the risk measure is less than 0.005. Figure 5.15 shows an extrapolated
version of the graphs shown in the previous section to highlight when the risk measure
approaches 0.
(a) p9
(b) p12
(c) p15
Figure 5.15: Comparison of different policies across risk measures in the scenario of zero risk tolerance where failure risk is completely minimized
We see that in Figure 5.15a, the Variance Reduction policy actually reduces p9
below 0.005 at a lower opportunity cost than the other policies. In Figure 5.15b, the
Threshold policy reduces p12 to 0 at the lowest opportunity cost, while the Variance
Reduction policy performs worse than the Threshold and Lookahead policies. Note
that the Pure Aging policy is not shown in Figure 5.15a or 5.15b because the policy
does not actually reduce the risk measure enough to approach 0. In Figure 5.15c, we
see that again the Threshold Policy reduces p15 to 0 at the lowest opportunity cost.
However, we also see that the Pure Aging policy manages to reach 0 in this case and
at a lower opportunity cost than the Lookahead policies. The Variance Reduction
policy does not reduce p15 to 0 until incurring a unit opportunity cost of 7.5 years
and is not shown in Figure 5.15c.
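The p9 part of this case study can be approximated from the tables alone. The sketch below finds, for each policy, the smallest tabulated unit opportunity cost at which p9 < 0.005. Because it uses only the reported grid points rather than the extrapolated curves of Figure 5.15, its numbers need not match the figure exactly; the helper name `cheapest_cost_below` is hypothetical.

```python
"""Cheapest tabulated setting that drives p9 below 0.005, per policy.

Uses only the grid points reported in Tables 5.2-5.5 (no extrapolation).
"""

# (unit opportunity cost, p9) pairs transcribed from Tables 5.2-5.5
P9_RESULTS = {
    "Pure Aging": [(0.526, 0.996), (0.758, 0.983), (1.016, 0.987),
                   (1.271, 0.975), (1.588, 0.952), (2.011, 0.913),
                   (2.750, 0.807), (4.089, 0.559), (6.079, 0.368)],
    "Variance Reduction": [(0.839, 0.988), (1.676, 0.892), (3.776, 0.251),
                           (7.460, 0.003)],
    "Threshold": [(0.218, 0.994), (0.227, 0.991), (0.268, 0.997),
                  (0.392, 0.983), (0.696, 0.951), (1.305, 0.828),
                  (2.313, 0.529), (3.786, 0.171), (5.710, 0.024),
                  (7.952, 0.002), (9.963, 0.002)],
    "1 Year Lookahead": [(0.215, 0.997), (0.217, 0.994), (0.288, 0.992),
                         (0.707, 0.954), (1.631, 0.764), (3.001, 0.388),
                         (4.845, 0.065), (7.230, 0.008), (9.613, 0.001)],
    "2 Year Lookahead": [(0.215, 0.99), (0.215, 0.997), (0.216, 0.991),
                         (0.310, 0.989), (0.930, 0.926), (2.090, 0.637),
                         (3.809, 0.182), (6.046, 0.019), (8.710, 0.001)],
}


def cheapest_cost_below(points, target):
    """Smallest opportunity cost whose risk is strictly below target, else None."""
    feasible = [cost for cost, risk in points if risk < target]
    return min(feasible) if feasible else None


costs = {name: cheapest_cost_below(pts, 0.005) for name, pts in P9_RESULTS.items()}
reachable = {name: c for name, c in costs.items() if c is not None}
best = min(reachable, key=reachable.get)
print(costs)
print(best)  # Variance Reduction
```

On the grid points, the Variance Reduction policy (η = 7) is again the cheapest way to push p9 below 0.005, and the Pure Aging policy never reaches it.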
This case study is just one example of exploring a specific risk tolerance. De-
pending on what kind of risk the utility is willing to tolerate, we can extrapolate
different parts of the graph to find which policies perform the best in the given risk
interval. Moreover, comparing across the 3 risk measures in this case study highlights
how the relative performance of different policies changes depending on which risk
measure we are looking at.
5.5 Limitations of Policies
From the results presented so far, we see that the Threshold policy performs
the best out of all of the policies. Specifically, we saw that the policies that utilized
the weighted ranking performed better than the Pure Aging policy that only relied
on chronological age. However, the policies with the weighted ranking still incurred
significant opportunity costs, especially at very low risk levels. Finally, we saw that
the Lookahead policies actually performed slightly worse than the Threshold policy
despite attempting to account for failures in the future. In this section, we will
investigate these limitations and seek to explain why the policies perform in this
way.
5.5.1 Chronological Age vs. Weighted Ranking
The results showed that the policies using the weighted ranking performed bet-
ter than the Pure Aging policy using only chronological age. Recall from section 4.4,
pg. 47 that the weighted ranking, $W^i_t$, is derived from a combination of the chronological
age and the number of jolts that the transformer has experienced over its lifetime.
By taking into account the number of jolts experienced, the hypothesis is that the
weighted ranking more accurately identifies when transformers will fail. The results of the
policy simulations suggest that this is the case; to examine it more directly, we look at the
distribution of each criterion at the time transformers fail in the simulation. Table 5.6
shows the value of each criterion at failure, where µF is the mean of the failure values and
σF is the standard deviation of the failure values. We see that the failure values for
chronological age have a significantly higher standard deviation compared to the weighted
ranking, which indicates that transformers fail at a wider range of chronological ages
than weighted rankings. Because the two criteria are measured on different scales, the
ratio σF/µF gives the fairer comparison, and it is also much larger for chronological age
(0.261) than for the weighted ranking (0.099). Figure 5.16 shows the difference in the
dispersion of failure values between chronological age and weighted ranking.
µF σF σF/µF
Chronological age 72.45 18.99 0.261
Weighted ranking 118.28 11.65 0.099
Table 5.6: Statistics for the failure values for the chronological age and weighted ranking
Figure 5.16: Fitted probability distribution functions of failure values using the two criteria. The x-axis represents the percentage of the mean value.
Figure 5.16 clearly shows that the failure values using the weighted ranking are
significantly less dispersed than the chronological age. Given this dramatic difference,
it is not surprising that policies that use weighted ranking as a criterion for replacement
perform better.
5.5.2 Variance in Fault Magnitude
Although we have shown that using the weighted ranking as a replacement
criterion offers better performance than chronological age, we still see that the
policies utilizing weighted ranking incur significant opportunity costs. We hypothe-
size that this is due to the fact that the variance in fault magnitude is extremely high.
This is problematic for the weighted ranking formula, because the formula weights
every fault the same since the fault magnitude is unobservable to the policy. Recall
from section 3.5.4, pg. 33, that the magnitude of faults is assumed to come from 3
distributions, denoted m1,m2,m3 in increasing magnitude. m1 faults have 0 magni-
tude and are not counted in the weighted ranking formula. m2 faults constitute 80%
of the faults, but are assumed to have a minor impact on aging. m3 faults constitute
10% of the faults, but have a significant impact on aging. Figure 5.17 shows the
p.d.f. of the fault magnitude.
Figure 5.17: Fitted probability distribution function of all fault magnitudes
Figure 5.17 shows that, as expected, the majority of faults are m2 faults. However, m3
faults, while rare compared to the m2 faults, have magnitudes so high that they have the
most significant impact on transformer failure. Figure 5.18 shows the separate p.d.f.’s
for m2 and m3 faults.
From Figure 5.18a, we see that m2 fault magnitudes lie within the interval of [0, 0.5].
From Figure 5.18b, we see that m3 fault magnitudes show an extremely long right
tail. Table 5.7 shows the statistics for the fault magnitudes, where µM denotes mean
fault magnitude, σM denotes the standard deviation, and Min. and Max. represent
the minimum and maximum observed fault magnitudes. We see that the maximum
m3 fault magnitude is more than 5 standard deviations away from the mean. More-
over, we see that the standard deviation of m3 fault magnitudes is very high relative
to the mean.
(a) m2 faults (b) m3 faults
Figure 5.18: Fitted probability distribution function of fault magnitudes separated by fault type
µM σM Min. Max.
m2 faults 0.185 0.116 0.00 0.50
m3 faults 2.77 2.21 0.50 13.86
Table 5.7: Statistics for the fault magnitude of m2 and m3 faults
Fault magnitude is in terms of real age, which is the unobservable parameter
that determines transformer failure. In this way, both chronological age and weighted
ranking are criteria that try to approximate real age using observable information.
We have shown in the previous section that the correlation between real age and
chronological age is low. Although weighted ranking is a better approximation of real
age than chronological age, we show in this section that the weighted ranking has
significant limitations as well. The magnitude of faults experienced by transformers
has an extremely high variance. The weighted ranking considers every fault to be
equal, ignoring the dramatic differences in magnitude. This is the biggest shortcoming
of the weighted ranking formula, and it most likely explains why the
policies still incur significant opportunity cost when reducing risk. We will explore
the importance of knowing the real age of transformers later on in section 5.6.
5.5.3 Expected Amount of Aging Per Year
The results indicated that the Threshold policy performed better than the 1-year
and 2-year Lookahead policies. This is surprising because the policies are based on
the same methodology, with the Lookahead policies forecasting into the future. Recall
from section 4.6, pg. 49 that the Lookahead policy has a tunable parameter δ that
indicates the expected amount of aging per year. Since δ is constant, if the observed
amount of aging per year has a high variance, δ would not accurately capture the
expected amount of aging.
(a) Fitted p.d.f. (b) Empirical c.d.f
Figure 5.19: Distribution functions of the annual amount of aging experienced by transformers
Figure 5.19 shows the observed distribution functions of the amount of aging in
terms of real age experienced by all transformers over the course of one simulation.
The results show that the amount of aging per year shows a similar trend to the
magnitude of faults (see Figure 5.18). The amount of aging per year is characterized
by an extremely long right tail, indicating that, in a small number of years, a transformer
can experience significantly greater aging. Table 5.8 shows the statistics of the
observed amount of aging per year across all transformers, with µA representing the
mean amount of aging per year, σA representing the standard deviation, and Min.
and Max. indicating the minimum and maximum observed values over the course of
one simulation.
µA σA Min. Max.
Aging per year 2.87 2.39 1 27.79
Table 5.8: Statistics for the amount of aging per year
Note that µA is different from the value of µa defined earlier since µA
is only the mean across 1 sample path. We see that σA is fairly high and almost equal
to µA. Moreover, the maximum observed value is more than 10 standard deviations
away from the mean. This indicates that the length of the right tail is extremely
large.
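The “more than 5 standard deviations” and “more than 10 standard deviations” statements follow directly from Tables 5.7 and 5.8. A few lines of Python reproduce the arithmetic and also report the coefficient of variation σ/µ, which makes the heavy right tails easier to compare; the helper name `sds_above_mean` is illustrative only.

```python
"""Check the tail claims made about Tables 5.7 and 5.8."""


def sds_above_mean(value, mean, sd):
    """Number of standard deviations separating value from mean."""
    return (value - mean) / sd


m3_max = sds_above_mean(13.86, 2.77, 2.21)     # Table 5.7, m3 faults
aging_max = sds_above_mean(27.79, 2.87, 2.39)  # Table 5.8, aging per year
cv_m3 = 2.21 / 2.77                            # sigma / mu for m3 fault magnitude
cv_aging = 2.39 / 2.87                         # sigma / mu for annual aging

print(round(m3_max, 2), round(aging_max, 2))  # 5.02 10.43
print(round(cv_m3, 2), round(cv_aging, 2))    # 0.8 0.83
```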
These results indicate that the variance in the amount of aging experienced per
year is high. Since δ is constant, even with a finely tuned value, δ would not be
able to accurately forecast the future. Moreover, since the real age is unobservable
to the policy, the Lookahead policy, like the Threshold policy, utilizes the weighted
ranking to approximate real age. We showed in the previous section that the weighted
ranking faces significant limitations in predicting real age. This fact, combined with the
inaccuracy of forecasting into the future, explains why the Lookahead policies perform
slightly worse than the Threshold policy. Moreover, the inaccuracy in forecasting also
explains why the 2 year Lookahead Policy does not perform any better than the 1
year Lookahead.
5.6 Value of Information
In this section, we will look at the value of information in terms of the objective
function. The two policies that have been implemented to test the value of informa-
tion are the Lookahead Plus policy and the Perfect Information policy. Recall from
section 4.7.1 that the Lookahead Plus policy is a version of the Lookahead policy that
is able to observe the real age of every transformer. The Perfect Information policy
assumes that the policy perfectly knows the future. In this way, these 2 policies can
be used to estimate:
1. $VI_C$. Denotes the value of information about the current state of the grid.
This is estimated by the improvement in the objective function of the
Lookahead Plus policy over the best currently feasible policy.
2. $VI_F$. Denotes the value of information about the future aging of the grid.
This is estimated by the improvement in the objective function of the Perfect
Information policy over the Lookahead Plus policy.
The total value of perfect information is therefore denoted $VI$, where
$$VI = VI_C + VI_F$$
5.6.1 Estimating the Value of Information
Figure 5.20 shows the performance of the Lookahead Plus and Perfect Informa-
tion policies compared to the Base policy and Threshold policy. As expected, the
minimum costs of the Lookahead Plus and Perfect Information policies are both far
lower than the current minimum in the Threshold policy. Like the other policies we
have tested, the Lookahead Plus policy can be tuned depending on the utility’s risk
tolerance. Figure 5.20 shows that many of the costs incurred at different risk tolerances
for the Lookahead Plus policy, not just the minimum cost, are lower than the current
feasible minimum. This indicates that, given accurate information on the current
state of the grid, the Lookahead Plus policy is much more efficient than currently
feasible policies. Moreover, we see that $VI_C$ is significantly larger than $VI_F$.
Table 5.9 shows a summary of the values of different information.
These results indicate that the value of being able to observe the current real
Figure 5.20: Values of the objective function of policies, including the policies used to estimate the value of information. The top orange line represents the minimum value across all currently feasible policies. The middle orange line represents the minimum value in the Lookahead Plus policy. The bottom orange line represents the minimum value of the Perfect Information policy.
Value (dollars) % of $VI$
$VI_F$ 5,295 81.7
$VI_C$ 1,187 18.3
$VI$ 6,482 100.0
Table 5.9: Summary of the value of information
age of transformers is significantly higher than the value of information about future
aging. From the perspective of the utility, these results estimate how much money
the utility would save if it were to be able to see this information. In this way, the
estimated value of information can serve as a guide for the utility on how much it
should be willing to spend on investing in potential new technologies to acquire these
capabilities. This paper does not explore what potential investments these may be
or whether or not these investments exist.
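The decomposition $VI = VI_C + VI_F$ can be written as a small function of three objective minima: the best currently feasible policy, the Lookahead Plus policy, and the Perfect Information policy. This is a sketch under the assumption that lower objective values are better and that $VI_C$ is measured against the best currently feasible policy, as Figure 5.20 suggests; the function name and input numbers are hypothetical and are not the values behind Table 5.9.

```python
"""Sketch: decomposing the value of information from three objective minima."""


def value_of_information(feasible_min, lookahead_plus_min, perfect_info_min):
    """Return VI_C, VI_F, their total VI, and each term's percentage share."""
    vi_c = feasible_min - lookahead_plus_min      # information about the current state
    vi_f = lookahead_plus_min - perfect_info_min  # information about future aging
    vi = vi_c + vi_f                              # telescopes to feasible_min - perfect_info_min
    return {"VI_C": vi_c, "VI_F": vi_f, "VI": vi,
            "share_C": 100 * vi_c / vi, "share_F": 100 * vi_f / vi}


if __name__ == "__main__":
    # Hypothetical minima for illustration only.
    print(value_of_information(100.0, 70.0, 60.0))
```

Because the two terms telescope, the total always equals the gap between the best currently feasible policy and the Perfect Information policy, regardless of where the Lookahead Plus minimum falls.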
Regardless of the limits of new technology, these results are promising for the
utility. Technology that better captures the current state of transformers is likely
more feasible than technology that forecasts future aging. Therefore, it is promising
that $VI_C$ accounts for the overwhelming majority of the total $VI$. However, these
results assume that the policy can perfectly observe the real age of transformers.
Recall from section 4.7.1 that the Lookahead Plus policy has a noise parameter ε that
affects the accuracy of the information. In this case, ε is assumed to be zero. In the
next section, we investigate the performance of the policy when ε > 0.
5.6.2 Impact of Measurement Noise
Since there is currently no way to measure real age at all, it is reasonable to
assume that technology that measures real age perfectly will not be developed for
some time. More likely, the first such technology will measure real age with some
amount of noise. In this section, we explore how measurement noise affects the
performance of the Lookahead Plus policy.
Recall from section 4.7.1, pg. 53 that the tunable parameters for the Lookahead
Plus policy are $\delta_p$, which represents the expected amount of aging per year, and ε,
the noise term. Unlike in the Lookahead policies, where δ was fixed after testing,
$\delta_p$ can be tuned to the risk tolerance of the utility because the failure threshold
is fixed at $P_{it} < 0$.
Figure 5.21 compares the performance of the Lookahead Plus policy for increasing
values of ε on the p9 risk measure. We omit the graphs for the p12 and p15 risk
measures because they show largely the same pattern. As expected, the policy becomes
consistently less efficient as measurement noise increases. For ε = 0.2 and ε = 0.25,
the policy starts at a much lower risk level even though the same $\delta_p$ values are
used. Because these lower risk levels can also be reached at a lower opportunity cost
when ε is smaller, this does not indicate that higher measurement noise is more
efficient. Rather, it suggests that higher measurement noise causes $P_{it} < 0$ for more
transformers, which leads the policy to over-replace transformers.

Figure 5.21: Performance of the Lookahead Plus policy under different values of ε on the p9
risk measure. $\delta_p$ increases from left to right.
Figure 5.22 compares the distributions of $P_{it}$ under different values of ε in the
same simulation path. Figure 5.22a shows the distribution with zero measurement
noise and indicates that 5 transformers are near failure. Figure 5.22b shows the
same distribution with significant measurement noise (ε = 0.2) and indicates that 29
transformers are near failure. Increasing measurement noise therefore causes the
policy to believe that more transformers are near failure, which explains the lower
risk levels shown in Figure 5.21. Moreover, these distributions represent the current
belief. Since the Lookahead Plus policy uses the current belief to forecast future
failure, error caused by measurement noise propagates into the forecast and affects
the replacement decision.
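The mechanism described above can be illustrated with a toy model. The sketch assumes additive Gaussian noise with standard deviation ε on each transformer's $P_{it}$ and a linear forecast that subtracts $\delta_p$ per year; the actual measurement and forecasting model of section 4.7.1 may differ, and all function names and fleet values are hypothetical.

```python
import random


def noisy_belief(true_index, eps, rng):
    """Hypothetical measurement model: each transformer's P_it is observed
    with additive Gaussian noise whose standard deviation is eps."""
    return [p + rng.gauss(0.0, eps) for p in true_index]


def flagged_for_replacement(belief, delta_p, horizon=1):
    """Indices of transformers whose forecast P_it falls below zero.

    Assumed forecast: the current belief is aged linearly by delta_p per year,
    so any measurement error in `belief` carries straight into the forecast.
    """
    return [i for i, p in enumerate(belief) if p - delta_p * horizon < 0]


def mean_flagged(true_index, eps, delta_p, trials=1000, seed=0):
    """Monte Carlo average number of transformers flagged for a given eps."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        belief = noisy_belief(true_index, eps, rng)
        total += len(flagged_for_replacement(belief, delta_p))
    return total / trials


if __name__ == "__main__":
    # Made-up fleet: 3 transformers whose forecast already crosses zero,
    # 50 moderately worn transformers, and 200 healthy ones
    fleet = [0.02] * 3 + [0.3] * 50 + [1.0] * 200
    for eps in (0.0, 0.1, 0.2, 0.25):
        print(eps, mean_flagged(fleet, eps, delta_p=0.05))
```

In this made-up fleet, the many moderately worn transformers sit above the threshold, so noise pushes some of them below zero far more often than it lifts the few truly worn ones back above it, and the average flagged count rises above the noise-free count of 3.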
Figure 5.22: Comparison of $P_{it}$ under different values of ε at t = 90 in the same sample
path. (a) ε = 0; (b) ε = 0.2.
Although measurement noise clearly degrades the performance of the Lookahead
Plus policy, how much does it reduce the value of information on the current state
of the grid, $VI_C$? More importantly, how much does measurement noise affect
performance relative to the Threshold policy, the best-performing feasible policy?
Figure 5.23a shows the performance of the policy with different amounts of
measurement noise compared to the Threshold policy on the p9 risk measure. Figure
5.23b shows the minimum objective function value under each scenario compared
with the minimum Threshold policy value. Even with significant measurement noise,
the Lookahead Plus policy consistently outperforms the Threshold policy. With
ε = 0.25, the policy still improves the objective function significantly and captures
60% of $VI_C$. This is promising for the utility because it shows that perfect
measurements of real age are not necessary to reduce costs. We note that even
though the minimum costs for ε = 0.2 and ε = 0.25 are lower than that of the
Threshold policy, these levels of measurement noise will likely lead to
over-replacement of transformers. Thus, high measurement noise makes the policy
less flexible than low measurement noise, which allows the policy to reduce risk
efficiently across all risk tolerances.
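The 60% figure can be read as a ratio of improvements over the Threshold policy. A minimal sketch, assuming that $VI_C$ is measured as the gap between the minimum Threshold objective and the minimum noise-free Lookahead Plus objective, and that a noisy policy "captures" whatever part of that gap it still closes; the function name and example values are hypothetical.

```python
def fraction_of_vi_c_captured(j_threshold, j_lookahead_noisy, j_lookahead_exact):
    """Share of VI_C retained by a noisy Lookahead Plus policy.

    All arguments are minimum objective values (lower is better):
      j_threshold       -- best Threshold policy
      j_lookahead_noisy -- Lookahead Plus with measurement noise eps > 0
      j_lookahead_exact -- Lookahead Plus with eps = 0
    """
    vi_c = j_threshold - j_lookahead_exact  # assumed definition of VI_C
    if vi_c <= 0:
        raise ValueError("noise-free Lookahead Plus must improve on the Threshold policy")
    return (j_threshold - j_lookahead_noisy) / vi_c
```

For example, with made-up minima of 100.0 (Threshold), 94.0 (noisy) and 90.0 (noise-free), the noisy policy closes 6 of the 10 units of improvement, or 60%.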
Figure 5.23: Impact of measurement noise on Lookahead Plus policy performance compared to the
Threshold policy. (a) p9 risk measure; (b) objective function.
From these results, we conclude that the value of information on the current
state of the grid should not be underestimated. The value of information on the
current state is far higher than the value of information on future aging. Moreover,
these results show the limitations of currently feasible policies that use the weighted
ranking. Because policies that use better information still significantly outperform
the best currently feasible policies even under significant measurement noise, the
weighted ranking is shown to be a relatively inaccurate approximation of real age.
We hope that these results encourage PSE&G and other utilities to recognize the
importance of investing in more accurate diagnostic methods that would make this
information accessible, even if the diagnostics are not perfectly accurate.
Chapter 6
Conclusion
The results of the policy simulations confirm the preliminary hypothesis that
policies that use the weighted ranking perform better than policies that use only
chronological age. The weighted ranking policies reduce failure risk at a lower
opportunity cost and can also eliminate failure risk entirely. However, even though
the Threshold policy defines most of the efficient frontier, the question remains of
how PSE&G can use the results of this thesis to inform its replacement policy going
forward.
The extent to which failure risk is reduced depends on the values of the tunable
parameters. The efficient frontier determines the lowest possible opportunity cost
for a given level of failure risk, but it is up to the utility to decide what level of
failure risk is appropriate. The utility must consider that as failure risk decreases,
the marginal reduction in failure risk per unit of opportunity cost also decreases.
This suggests that, even on the efficient frontier, the additional opportunity cost
required to eliminate failure risk completely is not worth the benefit. The objective
function formulated in this thesis attempts to quantify the trade-off between failure
risk and opportunity cost based on existing literature; however, PSE&G and other
utilities may operate with different assumptions and risk tolerances that affect how
they weigh these costs. Furthermore, the regulatory environment of public utilities
may affect this decision process. Implementing a more aggressive replacement policy
that significantly reduces failure risk would require increased financial outlays that
must be budgeted in advance. The tight regulations that utilities face may cap the
amount they can spend on replacement, which can limit the type of policy that can
be implemented. Ultimately, the final decision on replacement policies must be made
after careful consideration of the level of risk that can be tolerated as well as the
financial capabilities of the utility.
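The diminishing marginal return described above can be checked on any set of simulated (opportunity cost, failure risk) pairs produced by sweeping a policy's tunable parameters. The following is a minimal sketch with hypothetical function names and made-up points: it keeps the non-dominated pairs (the efficient frontier) and then computes the failure risk removed per additional unit of opportunity cost between consecutive frontier points.

```python
def efficient_frontier(points):
    """Return the (opportunity_cost, failure_risk) pairs not dominated by any other.

    After sorting by cost (ties broken by lower risk first), a point is on the
    frontier exactly when its risk is strictly below that of every cheaper point.
    """
    frontier = []
    best_risk = float("inf")
    for cost, risk in sorted(points):
        if risk < best_risk:
            frontier.append((cost, risk))
            best_risk = risk
    return frontier


def marginal_risk_reduction(frontier):
    """Failure risk removed per extra unit of opportunity cost between
    consecutive frontier points (costs are strictly increasing on the frontier)."""
    rates = []
    for (c0, r0), (c1, r1) in zip(frontier, frontier[1:]):
        rates.append((r0 - r1) / (c1 - c0))
    return rates


if __name__ == "__main__":
    # Made-up sweep results; (2.5, 5) is dominated by (2, 4)
    sweep = [(0, 10), (1, 6), (2, 4), (2.5, 5), (3, 3), (4, 2.5)]
    frontier = efficient_frontier(sweep)
    print(frontier)
    print(marginal_risk_reduction(frontier))
```

On these made-up points the marginal rates fall from 4 to 2 to 1 to 0.5 along the frontier, which is the pattern that makes the last increments of risk reduction the most expensive.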
Regardless of how PSE&G approaches its replacement policy going forward, it
must also consider the limitations of these policies and what potential improvements
can be made. A significant limitation is the inability to measure the amount of aging
that a transformer has experienced before it fails DGA testing. The results of the
policy simulations indicate that the value of being able to measure such information
is quite high. This thesis does not examine whether such measurements are possible,
but the results should inform how PSE&G and other utilities approach investing in
research that could lead to the development of such measurement techniques.
6.1 Areas for Further Research
This thesis represents a first effort at formulating the complex problem of dealing
with transformer failures faced by PSE&G and other utilities. Although great care
was taken to ensure that the problem is formulated as realistically as possible, certain
assumptions were made in order to simplify the problem. Thus, while the results of
this thesis provide many insights into a potential solution, there are still many areas
for further research.
First, a more complex model for transformer lifetime could be used to simulate
transformer failures. The literature includes research that models transformer lifetime
using a hazard function, such as the logistic or Perks' hazard function. The difficulty
in this approach lies in deciding which model to use, since there is no consensus in
the literature. Moreover, many of these more complex functions use time as the only
independent variable, raising the question of how to incorporate the impact of a
fault into the hazard rate function.
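As a starting point for this direction, the sketch below uses one common form of Perks' hazard, $h(t) = (a + b e^{ct}) / (1 + d e^{ct})$, which reduces to a logistic-type hazard when $a = 0$, together with the survival function $S(t) = \exp(-\int_0^t h(u)\,du)$ computed by the trapezoid rule. The parameter values are arbitrary, and `fault_adjusted_hazard`, which treats each fault as adding a fixed number of years of effective age, is only one hypothetical way to let faults enter a time-only hazard; it is not a model proposed or tested in this thesis.

```python
import math


def perks_hazard(t, a, b, c, d):
    """One common form of Perks' hazard: (a + b*e^(c t)) / (1 + d*e^(c t)).
    Setting a = 0 gives a logistic-type hazard."""
    g = math.exp(c * t)
    return (a + b * g) / (1.0 + d * g)


def survival(t, hazard, steps=1000):
    """S(t) = exp(-integral of hazard from 0 to t), via the trapezoid rule."""
    if t <= 0:
        return 1.0
    dt = t / steps
    area = 0.5 * (hazard(0.0) + hazard(t))
    area += sum(hazard(k * dt) for k in range(1, steps))
    return math.exp(-area * dt)


def fault_adjusted_hazard(hazard, n_faults, years_per_fault):
    """Hypothetical: evaluate the hazard at an effective age in which each
    fault adds `years_per_fault` years to chronological age."""
    shift = n_faults * years_per_fault
    return lambda t: hazard(t + shift)


if __name__ == "__main__":
    # Arbitrary illustrative parameters (increasing hazard since b > a*d)
    h = lambda t: perks_hazard(t, a=0.001, b=0.0005, c=0.1, d=0.01)
    print(survival(40.0, h), survival(40.0, fault_adjusted_hazard(h, 2, 3.0)))
```

With an increasing hazard, shifting effective age forward raises the hazard at every point, so a transformer with faults always has a lower survival probability at a given chronological age.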
Second, several simplifying assumptions were made about the operation of
transformers. These include a simplified DGA testing process, a simplified correlation
between transformer locations, and a single definition of what a fault entails. Because
this is not an electrical engineering thesis, these assumptions were made to simplify
the modeling process. A more realistic extension of this thesis would revise these
assumptions based on the electrical engineering literature.
Third, we do not consider the availability of spare transformers when evaluating
the replacement decision. Choosing how many spares to keep at any given point in
time adds another layer of complexity to the problem. However, from the perspec-
tive of the utility, the replacement decision must be evaluated in tandem with the
availability of spares at the time.
6.2 Final Remarks
Providing a resilient and stable power grid is the foremost priority of a public
utility. Transformers serve as the heart of the grid, meaning that potential failures
represent an area of enormous risk in achieving this goal. For PSE&G and utilities
nationwide, it is clear that the current policy of relying on DGA testing for transformer
replacement will not be sufficient to prevent transformer failures in the near future.
However, the results of this thesis are promising in showing that by implementing an
early replacement strategy, PSE&G can significantly reduce the risk of transformer
failure. Given careful consideration of its risk tolerance and the opportunity cost
involved, we are confident that PSE&G can implement a policy that matches its
financial capabilities and risk profile.
Bibliography
Arshad, M., Islam, S. M., and Khaliq, A. (2004). Power transformer aging and life
extension. 8th International Conference on Probabilistic Methods Applied to Power
Systems.
Bartley, W. H. (2003). Analysis of transformer failures. International Association of
Engineering Insurers 36th Annual Conference.
Brown, R. E. (2007). Power Systems, chapter Power System Reliability, pages
19–1–19–14. CRC Press.
Chen, Q. and Egan, D. M. (2006). A Bayesian method for transformers life estimation
using Perks' hazard function. IEEE Transactions on Power Systems, 21(4).
Chowdhury, A. A. and Koval, D. O. (2005). Development of probabilistic models for
computing optimal distribution substation spare transformers. IEEE Transactions
on Industry Applications, 41(6).
Council of Economic Advisers (2013). Economic benefits of increasing electric grid
resilience to weather outages. Executive Office of the President.
Dixon, F. L., Steward, D., and Hoffmeister, J. (2010). When to replace aging
transformers. 2010 Industry Applications Society 57th Annual Petroleum and Chemical
Industry Conference.
Fischer, M., Tenbohlen, S., Schafer, M., and Haug, R. (2010). Determining power
transformers sequence of maintenance and repair in power grids. 2010 IEEE
International Symposium on Electrical Insulation.
Hamrick, L. (2009). Dissolved gas analysis for transformers. ESCO Energy Services.
Hong, Y., Meeker, W. Q., and McCalley, J. D. (2009). Prediction of remaining life of
power transformers based on left truncated and right censored lifetime data. The
Annals of Applied Statistics, 3(2).
Ismail, N. and Jemain, A. A. (2007). Handling overdispersion with negative binomial
and generalized Poisson regression models. Casualty Actuarial Society Forum.
Karlsson, S. (2007). A review of lifetime assessment of transformers and the use of
dissolved gas analysis. KTH School of Electrical Engineering.
Kogan, V. I., Roger, C. J., and Tipton, D. E. (1996). Substation distribution
transformers failures and spares. IEEE Transactions on Power Systems, 11(4).
Liu, H., Davidson, R. A., Rosowsky, D. V., and Stedinger, J. R. (2005). Negative
binomial regression of electric power outages in hurricanes. Journal of Infrastructure
Systems.
McNutt, W. J. (1992). Insulation thermal life considerations for transformer loading
guides. Transactions on Power Delivery, 7(1).
van Schijndel, A., Wetzer, J. M., and Wouters, P. (2006). Forecasting transformer
reliability. 2006 Annual Report Conference on Electrical Insulation and Dielectric
Phenomena.