Technical Report

Maintenance Best Practices for Switching Equipment and Transformers with Key Performance Indicators (KPIs) and Algorithms for “Living” Reliability Centered Maintenance (RCM) and Performance Based Maintenance (PBM)


EPRI Project Manager B. Desai

ELECTRIC POWER RESEARCH INSTITUTE 3420 Hillview Avenue, Palo Alto, California 94304-1395 ▪ PO Box 10412, Palo Alto, California 94303-0813 ▪ USA

800.313.3774 ▪ 650.855.2121 ▪ [email protected] ▪ www.epri.com

Maintenance Best Practices for Switching Equipment and Transformers with Key Performance Indicators (KPIs) and Algorithms for “Living” Reliability Centered Maintenance (RCM) and Performance Based Maintenance (PBM) 1010555

Interim Report, December 2005


DISCLAIMER OF WARRANTIES AND LIMITATION OF LIABILITIES

THIS DOCUMENT WAS PREPARED BY THE ORGANIZATION(S) NAMED BELOW AS AN ACCOUNT OF WORK SPONSORED OR COSPONSORED BY THE ELECTRIC POWER RESEARCH INSTITUTE, INC. (EPRI). NEITHER EPRI, ANY MEMBER OF EPRI, ANY COSPONSOR, THE ORGANIZATION(S) BELOW, NOR ANY PERSON ACTING ON BEHALF OF ANY OF THEM:

(A) MAKES ANY WARRANTY OR REPRESENTATION WHATSOEVER, EXPRESS OR IMPLIED, (I) WITH RESPECT TO THE USE OF ANY INFORMATION, APPARATUS, METHOD, PROCESS, OR SIMILAR ITEM DISCLOSED IN THIS DOCUMENT, INCLUDING MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, OR (II) THAT SUCH USE DOES NOT INFRINGE ON OR INTERFERE WITH PRIVATELY OWNED RIGHTS, INCLUDING ANY PARTY'S INTELLECTUAL PROPERTY, OR (III) THAT THIS DOCUMENT IS SUITABLE TO ANY PARTICULAR USER'S CIRCUMSTANCE; OR

(B) ASSUMES RESPONSIBILITY FOR ANY DAMAGES OR OTHER LIABILITY WHATSOEVER (INCLUDING ANY CONSEQUENTIAL DAMAGES, EVEN IF EPRI OR ANY EPRI REPRESENTATIVE HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES) RESULTING FROM YOUR SELECTION OR USE OF THIS DOCUMENT OR ANY INFORMATION, APPARATUS, METHOD, PROCESS, OR SIMILAR ITEM DISCLOSED IN THIS DOCUMENT.

ORGANIZATION(S) THAT PREPARED THIS DOCUMENT

Maintenance and Test Engineering, LLC

NOTE

For further information about EPRI, call the EPRI Customer Assistance Center at 800.313.3774 or e-mail [email protected].

Electric Power Research Institute and EPRI are registered service marks of the Electric Power Research Institute, Inc.

Copyright © 2005 Electric Power Research Institute, Inc. All rights reserved.


CITATIONS

This report was prepared by

Maintenance and Test Engineering, LLC
2037 North Berry Street
Olympia, WA 98506

Principal Investigator J. Skog

This report describes research sponsored by the Electric Power Research Institute (EPRI).

The report is a corporate document that should be cited in the literature in the following manner:

Maintenance Best Practices for Switching Equipment and Transformers with Key Performance Indicators (KPIs) and Algorithms for “Living” Reliability Centered Maintenance (RCM) and Performance Based Maintenance (PBM). EPRI, Palo Alto, CA: 2005. 1010555.


PRODUCT DESCRIPTION

Over the past several decades, utilities have taken two significant approaches to improve the effectiveness and efficiency of their maintenance programs. These approaches have focused on maintenance task improvements and technology improvements associated with the equipment operation and the use of on-line monitors. While these two approaches have resulted in improvements, they have not necessarily taken full advantage of existing data available through supervisory control and data acquisition (SCADA) devices and intelligent electronic devices (IEDs), optimized maintenance cycles, or focused on improving the overall performance of the maintenance program.

Performance focused maintenance (PFM) is an all-inclusive approach to maintenance. PFM brings together what previously appeared to be distinctly different approaches to maintenance under a single umbrella. PFM recognizes that maintenance is both a technical and business process that must be managed and, at a high level, should be very similar across the whole landscape of utilities. PFM acknowledges that the specific application of these processes and approaches will differ due to the wide range of customer requirements, electric infrastructures, and maintenance organizations. The adaptive approach of PFM allows utilities to meet their own specific maintenance and operational goals and at the same time be confident that they are effectively managing the process and following industry best practices.

Results and Findings

This report outlines the initial effort to identify the value of PFM and to provide a broad overview of the topics involved. This work will provide direction for future EPRI efforts. The key purposes of this report are to establish the need for PFM in the minds of utility personnel and to demonstrate its potential value. The previous paradigm for maintenance is no longer sufficient to ensure optimal performance; PFM will provide the next step in maintenance optimization. This report contains suggestions to improve maintenance performance using techniques that have been developed but only partially tested and formally documented. Because this document is a work-in-process report, these suggestions are in transition.

Challenges and Objectives

Although this report outlines the PFM approach for substations, PFM is directed at all utility personnel involved in the management of maintenance processes. It gives power delivery maintenance managers access to a series of tools and thought processes that will enable them to identify areas for improvement using PFM strategies.

Improving maintenance remains crucial to improved utility financial performance. Operations and maintenance (O&M) are the largest controllable costs for most utility organizations. Because both maintenance and operations are labor intensive, proper management of the workforce is vital to success in each area. EPRI has been a leader in the application of reliability centered maintenance (RCM) and other maintenance performance enhancement tools. PFM is the logical extension of these tools.

Applications, Value, and Use

PFM is applicable to all forms of asset management in the power delivery sector. This report provides an overview of a comprehensive yet adaptable approach to maintenance that can easily be applied to a utility’s unique asset management strategy. PFM is an overall process of utility service optimization that the utility can apply in totality or on a targeted basis.

EPRI Perspective

This work is only a beginning in the development of PFM. PFM incorporates an assortment of tools previously developed by EPRI as well as forward-looking maintenance strategies. PFM combines the elements into a master framework that allows the utility to integrate them in a cohesive manner into its maintenance program. PFM is a vital part of power delivery asset management and is linked to future asset management projects. EPRI has long been a leader in developing and adapting new approaches for improving maintenance performance. In addition, EPRI has an unmatched ability to extract intelligence from utility personnel, vendors, and other industry leaders and to combine that intelligence into a program that will benefit the entire community.

Approach

This report explains the value of PFM and outlines a possible approach to implementing it. To achieve this objective, EPRI obtained the services of key industry leaders in the areas of performance enhancement including maintenance management workstation (MMW), the integrated monitoring and diagnostics (IMD) and XVisor programs, and RCM. These industry leaders synthesized the lessons learned from the past 10 years of EPRI-sponsored optimization efforts to generate the contents of this report. PFM will serve as a basis for ideas for the next generation of EPRI-sponsored products. EPRI has established plans for industry-wide review of this report to determine the future direction of its efforts.

Keywords

Availability
Key performance indicator (KPI)
Performance focused
Reliability
Aging models
Maintenance optimization


ABSTRACT

This interim report introduces an advanced and comprehensive approach to power delivery maintenance—performance focused maintenance (PFM). PFM is an all-inclusive approach to maintenance that goes well beyond reliability centered maintenance (RCM) and condition based maintenance (CBM). PFM includes not only the technical aspects of maintenance but also the business, risk management, economic, organizational, and continuous improvement processes. PFM emphasizes the appropriate and judicious use of data and establishes feedback loops. Specifically, this report emphasizes the feedback loops involving all levels of the maintenance organization, from feedback on the performance of individual tasks to feedback on the performance of the utility as a maintenance provider and manager, to meet the continuing competitive challenge of improving maintenance performance while reducing costs.


CONTENTS

1 INTRODUCTION
  Goals of the Project
  The PFM Concept

2 BALANCED MAINTENANCE APPROACH
  Definitions Used in this Report
    Maintenance Philosophy
    Maintenance Strategy
      Reliability Centered Maintenance
      Maintenance Basis
      Maintenance Tasks
        Corrective Maintenance Tasks
        Preventive Maintenance Tasks
        Condition Directed/Based Maintenance Tasks
        Predictive Maintenance Tasks (Also Referred to as Condition Based Maintenance)
        Hidden Failure Finding Tasks

3 ELEMENTS OF PFM
  1. Planning – Aligning Maintenance with Utility Goals
    PFM Planning Process Objective
    Elements of the Planning Process
      Executive Sponsorship and Reporting
      System Selection
      Team Assembly
      Understanding Utility Goals
  2. Developing a Technical Maintenance Approach
    PFM Technical Process Objectives
    Elements for Developing a Technical Maintenance Approach


      Identifying Critical Functions
      How Do Failures Manifest Themselves?
      Identifying the Effects of Failure
      Selecting the Right Preventive Strategy
      Data and Measures
  3. Building an Aging Model
    PFM Aging Model Objectives
    Elements for Building Aging Models
      Determining the Aging Mechanism
      Can Aging Be Measured?
      What is an Acceptable Level of Risk?
      Limiting the Risk
  4. Creating a Maintenance Plan – Best Practices
    PFM Plan Objectives
    Elements for Building the Maintenance Plan
      Task Triggers
      Optimizing Maintenance Intervals
      Building a Maintenance Plan
      Dynamic Prioritization
  5. Measuring Performance
    PFM Measurement Objectives
      Setting Specific Maintenance Goals
      Developing Metrics and KPIs
      Determining Data Requirements
      Setting Targets
      Identifying the Current State of Maintenance
      Gap Analysis
  6. Documentation and Implementation
    PFM Documentation and Implementation Objectives
    Elements for Documenting and Implementing the PFM Recommendations
      Reconciliation
      Identifying Change
      Impact Analysis
      Change Management
      Implementation Plans


  7. Measurement and Feedback
    PFM Measurement and Feedback Objectives
    Elements for Measuring Maintenance Effectiveness and Providing Feedback
      Measurement
      Reporting
      Making Corrections
      Implementing New Technologies and Maintenance Tasks

4 USING A TARGETED APPROACH WITH PFM
  Closing the Gaps with PFM Selective Activities
    Goals Are Unclear
    Reliability Is Below Expectations
    Executives Are Confused About the Value of Their Maintenance Investments
    Regulators Are Challenging Your Maintenance Program
    Availability Requirements Are Tightened
    Maintenance Tasks Are Not Achieving Desired Results
    Want to Make Better Use of Data
    Maintenance Task Intervals Are Suboptimal
    There Is Too Much Work and Not Enough Resources
    A Replacement Strategy Is Needed
    Intellectual Property Is Lost

5 EXAMPLE APPLICATION OF PFM MEASURE AND PERFORMANCE ACTIVITIES
  Overview
  PFM Findings
    1. LTC Oil Temperature
    2. Differential Temperature
    3. Differential Temperature with Trending
    4. Temperature Index
  Using Readily Available Data
  LTC Failure Avoided

6 THE ROLE OF DATA IN PFM
  Multiple Uses of Data
  Where Data Are Applied in PFM


7 EFFECTIVELY USING DATA FOR RISK ANALYSIS
  Introduction
  Data Drivers and Risk Management
  Risk Decision Process
    Risk Assessment
    An Effective Assessment Model at the Network/System and Asset/Component Levels
  Example Using the Assessment Model
    Pre-Service Information
    Information Regarding Service Life
    Analysis of the Situation
    Decision Based Upon Analysis
  Technical PFM Interaction
  Practical Application of the Risk Assessment Approach
  The Decision Process Considering Various Scenarios
  Assessment Steps
    Susceptibility Assessment
    Consequence Assessment
    Technical Assessment (of Expected Asset Performance)
    Economic Assessment
  The Decision Support Model – Functionality and Structure
    Decision Model Supporting Effective Use of Data
    The Decision Support Model, a Practical Example
  Condition Analysis
    Consistency of Information
    Maximum Quality of Condition Information
    Data Mining and Decision Support
  Practical Example of Data Mining: Cable Condition Assessment
    Condition Analysis of Power Cables
    Knowledge Rules
    Database for Condition Assessment Support
    Determinations of Norms and Criteria
    Database Application for Condition Assessment


8 PROJECT OPPORTUNITIES
  Load-Tap-Changer Opportunities
  Medium-Voltage Circuit Breakers
  High-Voltage SF6 Circuit Breakers

9 NEXT STEPS

10 REFERENCES

A APPLICATION STUDY FOR LOAD-TAP-CHANGERS
  Performance Focused Maintenance – LTC Application
    LTC Population Characteristics
    Operating and Maintenance History
    LTC Diagnostics and Observations
    Industry LTC Experience
    Main Insulation Package Diagnostics and Observations
    PFM Technical Analysis
    PFM Technical Summary
    PFM Risk Analysis
    Developing Aging Models
    Implications of Aging/Wear Models
    Transformer Winding Maintenance
    LTC Maintenance
    Performance Measurement
  Conclusion


LIST OF FIGURES

Figure 2-1 The PFM Framework
Figure 3-1 PFM Process Block Diagram
Figure 3-2 The Relationships Among Utility Goal, Maintenance, and Performance
Figure 4-1 Simplified PFM Decision Diagram for Selective Implementation
Figure 5-1 LTC and Main Tank Temperature Profile
Figure 5-2 Failed Reversing Identified by Temperature Index
Figure 7-1 Establishing Priorities
Figure 7-2 Gearbox Approaches to Balancing Stakeholder Needs
Figure 7-3 Risk Decision Process
Figure 7-4 Asset Grouping by Risk and Performance Expectations
Figure 7-5 Risk-Based Failure Consequence and Probability Matrix
Figure 7-6 Asset-Directed Activity Matrix
Figure 7-7 Single-Line Diagram and Gas Sectionalizing Compartments (Inlay)
Figure 7-8 GIS Example, Transforming Information into Consequence and Activity Matrices
Figure 7-9 PFM Technical Analysis for Switchgear
Figure 7-10 PFM/FMECA for a Transformer
Figure 7-11 Information Flow and Processing
Figure 7-12 Five Steps Decision Flowchart for Asset Management (AM) Decision
Figure 7-13 Flowchart for Determining Redundancy Factor
Figure 7-14 Flowchart Example End Result
Figure 7-15 Exchange of Condition Data
Figure 7-16 Example of Analysis Tool
Figure 7-17 Schematic Structure of Data Mining Process
Figure 7-18 Relations of the Directly and Indirectly Analyzed PD Properties
Figure 7-19 Decision Support Flow Diagram for PD Diagnosis
Figure 7-20 Example of Time (Upper) and Type (Lower) Analysis
Figure 7-21 Schematic Structure of a Diagnostics Database
Figure 7-22 Screenshot of Cable Sections
Figure 7-23 Measurement Add and Update Screen
Figure 7-24 Dialog for Adding and Updating a Filter
Figure 7-25 Histograms Created in the Type View


Figure 7-26 Cable System in Original (Left) and Modified (Right) Form
Figure 7-27 Experience Norms/Rejection Levels for the PD Amplitude Levels
Figure 7-28 PD Occurrence Frequency
Figure 7-29 Database View of the Different Diagnosed Cable Systems
Figure A-1 PFM Framework
Figure A-2 Winding Failure Risk Analysis – Older Westinghouse
Figure A-3 Winding Failure Risk Analysis – Others
Figure A-4 LTC Failure Risk Analysis – Poor Performer
Figure A-5 LTC Failure Risk Analysis – Fair Performer
Figure A-6 LTC Failure Risk Analysis – Good Performer
Figure A-7 Main Winding Aging Model (Normal Loading)
Figure A-8 LTC Wear Model
Figure A-9 Average Failure Rate and Risk for an Aging Fleet of Transformers
Figure A-10 Example of Optimizing Maintenance Intervals Based on Lowest Life-Cycle Cost

LIST OF TABLES

Table 1-1 Typical PFM Drivers and Benefits
Table 4-1 Selective PFM Benefits
Table 5-1 Novel Condition Monitoring Approaches Used to Trigger Just in Time LTC Maintenance
Table 7-1 Scenario Approach
Table 7-2 FMECA of the Switchgear Secondary (Control) System
Table 7-3 Part of the FMECA Drive System Circuit Breaker
Table 7-4 Final Step: Decision With Action
Table A-1 LTC Population Characteristics, PFM Drivers and Benefits
Table A-2 LTC Condition Summary
Table A-3 Industry Experience with LTCs
Table A-4 Transformer Insulation Condition
Table A-5 PFM Technical Analysis Summary
Table A-6 Number of LTC Operations Where 63% Contact Wear Is Expected
Table A-7 Metrics for LTC Performance
Table A-8 Metrics for Main Insulation Performance

1 INTRODUCTION

Performance focused maintenance (PFM) is a methodology to help maintenance and asset managers direct their limited resources to maintenance tasks that will best contribute to reaching the organization’s business goals. PFM integrates various technologies and techniques into maintenance in a phased approach that can be used to augment existing maintenance policies and programs and also develop new programs. The intent is to strengthen and build upon sound maintenance foundations rather than replace a utility’s current maintenance practices. PFM recognizes that, at the highest level, the maintenance process should be very similar for most utilities. However, PFM also recognizes that the approach to and application of these processes will differ from company to company due to individual circumstances, including the wide range of customer requirements, electric infrastructures, and maintenance organizations.

The flexible approach of PFM allows utilities to meet their specific operation and maintenance (O&M) goals and at the same time be confident that they are following an industry-accepted practice and the latest developments in maintenance technologies and methodologies.

Goals of the Project

The goal of the Electric Power Research Institute (EPRI) PFM project is to provide a maintenance framework that integrates many of the technical, economic, and managerial concepts that have been a foundation of maintenance for the past several decades as well as recently introduced concepts and ideas. PFM incorporates the concepts of modern asset management and integrates many previous EPRI activities, allowing utilities to build custom maintenance strategies that meet their business needs and utilize the best practices of the industry.

The PFM Concept

At the highest level, PFM is a methodology to answer the question: Are my maintenance resources being used in the most effective and efficient way to achieve the desired performance goals of my utility? A utility need not be dissatisfied with its existing maintenance program to apply some or all of the PFM concepts; in fact, existing maintenance programs generally serve the utility well. It is also expected that some PFM elements are already incorporated into the utility’s maintenance strategy but that there is room for improvement. PFM provides a structured but flexible approach to improving overall maintenance effectiveness that can adapt to an individual utility’s needs and resources.

There are two basic starting points for reviewing and improving maintenance:

• One can start from an identified issue that requires correction and then work back, analyzing and reviewing the maintenance tasks and strategies that influence that issue. The scope of such a review depends on the issues being addressed. For example, if the issue identified is excessive maintenance costs across the board, a total review of maintenance could be initiated. However, if the concerns were about excessive corrective maintenance costs, an analysis and identification of the major contributors to those costs would be indicated. If a particular type or model of equipment was identified as a significant factor resulting in corrective maintenance, the PFM review could be directed at that equipment and the maintenance associated with it.

• The other starting point would be at the level of a particular maintenance subprogram or task in order to evaluate the results achieved by that activity in relationship to the costs expended. This approach addresses issues such as: Is this the most efficient task for the desired results? Can monitoring be cost effectively substituted for scheduled maintenance? What are the impacts on performance if a particular task is eliminated?

Regardless of the starting point, application of PFM should result in a similar set of recommendations. PFM directs review and analysis only at the areas targeted for improvement. Potential benefits of PFM are listed in Table 1-1. The PFM methodology will be further explained and illustrated with examples in the following sections.

Table 1-1 Typical PFM Drivers and Benefits

Problem Benefits

Underfunding

• Identifies where cutbacks are prudent

• Identifies consequences of reduced maintenance

• Links maintenance requirements with utility goals and objectives

Reduced reliability

• Identifies appropriate tasks for preventing loss of function and risk of failure

• Establishes realistic reliability goals

• Suggests design changes to improve reliability

Inefficient use of data

• Identifies what data are needed for maintenance and how the data are to be used in a continuous improvement process

• Reveals how data can be used on a predictive basis

Lack of executive support

• Tightly links maintenance activities to executive goals

• Measures continuous progress

• Identifies risks associated with program changes

• Reassures regulatory bodies that maintenance is being effectively managed

Regulatory oversight

• Ensures that regulatory requirements are followed

• Charts progress

• Provides a documented basis for the current approach

2 BALANCED MAINTENANCE APPROACH

Over the past decades, utility maintenance has transformed from a quiet, routine, and behind-the-scenes activity to one of the most dynamic and high-interest segments of the utility industry. Numerous approaches to maintenance have been developed and publicized as the most advanced approach to maintenance. While some of these new approaches have been more revolutionary than others, they have fueled a desire to identify a single best practice approach to maintenance. However, because there is no one-size-fits-all approach to maintenance, many times the benefits of these new approaches are either overstated or not realized.

PFM acknowledges that each utility must find its own balance between the desire to prevent failures and the ability to finance maintenance, between risk aversion and the ability to predict failures, and finally between what its customers desire in terms of reliability and what its regulators are willing to include in rates. To achieve and maintain these balances, PFM includes a framework of key elements that are pictured in Figure 2-1.

Figure 2-1 The PFM Framework

The PFM framework is designed to:

• Provide a methodology to effectively and appropriately apply new maintenance concepts

• Incorporate risk analysis

• Identify a detailed and in-depth approach to managing and prioritizing the maintenance of specific assets and an overall approach to optimize asset and task performance

• Allow utilities to build their own individual maintenance strategies with the assurance that they are incorporating best practices that fit their own specific corporate and customer service objectives

• Leverage currently available data and information resources to use in algorithms to predict asset deterioration and incipient functional failure

• Embrace business goals and customer service objectives by employing maintenance approaches that are both technically and economically effective

• Support dynamic prioritization

• Set realistic maintenance goals

• Supply feedback so that continuous maintenance improvement is a requirement, not an option, of maintenance

• Identify best practice building blocks

• Identify models that take greater advantage of data and underutilized native intelligence to trigger maintenance and replacement decisions

• Build from previous EPRI projects

Definitions Used in this Report

Utility maintenance has made some revolutionary changes during the past few decades. With these changes has come a new vocabulary that sometimes adds as much confusion as clarity to the subject. Because PFM encompasses many of these recent maintenance concepts, it is important to establish a clear set of definitions.

In the past, maintenance terms and definitions varied from source to source. A dictionary might define maintenance as the act of keeping equipment in a state of repair. In PFM, a much broader definition is used: maintenance refers to all activities performed on equipment and systems in order to manage, assess, maintain, or restore their operating functionality. It is important to note that monitoring, inspecting, testing, and measuring are maintenance activities.

In the utility industry, maintenance has been targeted for improvement with a focus on maximizing equipment reliability while minimizing the cost of performing time-based and condition-based maintenance tasks. As a result, a significant effort has been placed on creating a common set of maintenance process and maintenance task definitions. EPRI has recently published various documents in the Generation sector with a clear set of maintenance terms and definitions. To optimize the use of this guide, it will be useful to review and understand these basic terms and concepts that are now accepted and understood across the industry. These definitions are consistent with the fossil and nuclear utility industries and are widely accepted and practiced definitions [1]. Also, these definitions are consistent and easily associated with the many terms used in the transmission and substation utility business units.

Maintenance Philosophy

A maintenance philosophy is an organization’s basic set of beliefs for developing a strategy to meet its overall business goals. Maintenance philosophies include goals such as maintaining high reliability, being the low-cost provider, minimizing capital investments, and increasing customer satisfaction.

Maintenance Strategy

A maintenance strategy is the specific set of actions and plans developed by an organization to support its philosophy and accomplish its goals. The specific strategy that an organization deploys consistent with its philosophy includes an appropriate mix of the following described maintenance processes and tasks/activities.

Reliability Centered Maintenance

Reliability centered maintenance (RCM) is a process to study and analyze equipment criticality, functions, failure modes, and causes to determine the appropriate technical mix of maintenance tasks that will best help an organization achieve its reliability goals. It is a step-by-step approach to optimize the maintenance task balance by incorporating equipment/plant knowledge, maintenance history, and industry experience. The RCM process results in a documented set of technically and economically effective maintenance tasks.

Maintenance Basis

The maintenance basis (MB) is the documented rationale that describes expected equipment and system failures and justifies the maintenance tasks and task frequencies selected to achieve an organization’s desired goals for safety, the environment, equipment reliability, and O&M costs.

Maintenance Tasks

An organization performs five basic types of maintenance tasks to protect or restore component/equipment/system functions. These are:

• Corrective maintenance (CM) tasks

• Preventive maintenance (PM) tasks

• Condition directed/based maintenance (CDM) tasks

• Predictive maintenance (PdM) tasks or condition based maintenance (CBM)

• Hidden failure finding (HFF) tasks

Corrective Maintenance Tasks

The corrective maintenance (CM) process is the most basic of maintenance processes. It is also commonly referred to as reactive maintenance. CM is the process of restoring equipment or components affecting personnel safety or equipment/plant reliability that have failed, are degraded, or do not conform to their original design, configuration, performance criteria, or intended functions. A component should be considered failed or degraded if the deficiency is similar to any of the following examples:

• Is removed from service because of actual or incipient failure

• Does not meet design specifications for configuration or performance

• Creates a personnel safety hazard or equipment reliability concern

• Adversely affects the performance of nearby equipment (for example, missing piping insulation that increases the operating temperature of nearby electrical equipment)

• Releases fluids that create contamination concerns (or have the potential to, under postulated accident conditions)

• Adversely affects controls or process indications that directly or indirectly impair an operator’s ability to operate the equipment or reduce redundancy of important equipment

There are two types or classifications of CM tasks:

• If an organization’s strategy for a component or equipment is to run the component/equipment to failure (RTF), the failure is expected. The resulting maintenance task to restore or repair the component or equipment is considered to be an expected corrective maintenance activity (CM-E).

• If an organization’s strategy for a component or equipment is to prevent or avoid equipment failure due to the significance of the function of a component or equipment, the resulting maintenance task to restore or repair the component or equipment is considered to be an unexpected or undesired corrective maintenance activity (CM-U).

Preventive Maintenance Tasks

The preventive maintenance (PM) process includes all program aspects needed to effectively manage periodic condition monitoring and periodic time-based actions taken to maintain or ascertain the condition of a piece of equipment within design operating conditions and to extend its life. These actions are performed before equipment failure to avoid performance degradation or to reduce the likelihood of equipment failure.

The maintenance tasks/activities that are generated as a result of the PM process are:

• Time-based tasks to restore a piece of equipment to new or an improved condition or to replace a piece of equipment at a certain point in time. These tasks are identified as PM-RRs.

• Condition monitoring tasks that include the collection of data that could be used to indicate condition, such as visual inspections, electrical testing, infrared thermography, and oil or gas analysis. These tasks are identified as PM-CMTs.

Condition Directed/Based Maintenance Tasks

Condition directed/based maintenance (CDM) tasks are the resulting tasks that are triggered when a condition monitoring task indicates that end-of-life is near. These tasks renew the item to a “like new” or “good as new” condition.

Predictive Maintenance Tasks (Also Referred to as Condition Based Maintenance)

Predictive maintenance (PdM) tasks are maintenance activities that require models, technologies, people skills, and communication to integrate all equipment condition data to make timely decisions about maintenance requirements. PdM tasks are often confused with condition monitoring tasks (PM-CMTs), which consist of collecting condition monitoring data.

The result of an effective PdM process is informed and effective CDM decisions. The resulting maintenance tasks taken based on maintenance decisions generated by the PdM process are referred to as CDMs.

Hidden Failure Finding Tasks

Hidden failure finding (HFF) tasks are a special form of condition monitoring. Failure finding involves the detection of a failed function that is not obvious to the operators of the utility system. The task generally involves the functional operation of the device or system, resulting in a “go” or “no-go” decision. If the function cannot be performed, a condition directed task is implemented to return the item to a fully functional condition.

3 ELEMENTS OF PFM

To provide a balanced approach to maintenance, PFM employs several subprocesses to ensure that the needs of all maintenance stakeholders are addressed. This section of the report describes each of these subprocesses at a high level. Detailed documentation of subprocess procedures will be part of future EPRI work.

In its simplest form, PFM can be broken into seven major sequential processes, as described in Figure 3-1. These processes are:

1. Planning

2. Technical maintenance approach

3. Aging and modeling

4. Best practices

5. Measuring performance

6. Documentation and implementation

7. Measurement and feedback

Figure 3-1 PFM Process Block Diagram

1. Planning – Aligning Maintenance with Utility Goals

Regardless of the state of the current maintenance program, it is critical that one understands that maintenance is just one of many functions contained in the utility operating and business structures. Maintenance must not function as an isolated organization but integrate with all of the utility business elements to ensure that the overall goals and objectives of the utility are being met. Linking maintenance to important strategic utility goals guarantees long-term support and helps define maintenance success.

PFM Planning Process Objective

The objective of the PFM planning process is to assemble the appropriate team for performing the PFM process and to ensure that the team understands how maintenance supports all or parts of each specific strategic utility business, employee, and customer service goal.

Elements of the Planning Process

Executive Sponsorship and Reporting

Development of effective and far-reaching initiatives will not be possible if the PFM process is not supported at the appropriate utility management level. Sustained executive sponsorship and visibility is necessary for success. The PFM process requires resources from many organizations and could have far-reaching impacts. Clear identification of sponsors, reporting structures, analysis scope, and timetables must be made before undertaking all or part of a PFM process.

System Selection

Maintenance is focused on ensuring the functional performance of specific utility assets. Each of these assets has its own technical operating and business requirements. PFM is a focused process that requires the resources of subject matter experts (SMEs). Each asset or system of assets has its own set of experts. To provide focus for the PFM team and to ensure that it includes the right mix of resources, the assets and systems to be analyzed must be clearly defined.

Team Assembly

PFM is performed by a multi-disciplined team that has a good understanding of critical maintenance stakeholder needs. Team members are not necessarily involved in all phases of the PFM process but should understand the objectives and agree on the findings. Potential team members include:

• Asset managers

• Maintenance engineers

• Maintenance supervisors

• Maintenance technicians

• System planners

• Apparatus engineers

• Data analysts

• Financial analysts

• System operators

Understanding Utility Goals

Maintenance is a necessary element of every utility. It is a support organization that plays a major role in ensuring that the strategic utility goals are consistently met. The PFM team must understand how maintenance successes and failures impact each of these strategic goals. The team must also understand how the utility goals place constraints on maintenance; future PFM recommendations must be consistent with both these goals and constraints.

2. Developing a Technical Maintenance Approach

At the most fundamental level, the maintenance goal is to prevent the loss of critical functions by performing an appropriate set of maintenance tasks, thus reducing the likelihood of functional failure to an acceptable level.

PFM Technical Process Objectives

The objectives of this PFM process are to develop a technical understanding of how failures manifest themselves, the effects of failure, and how the failures can be effectively prevented.

Elements for Developing a Technical Maintenance Approach

Identifying Critical Functions

Every asset in the utility has a series of critical functions that it is expected to perform at a very high degree of reliability. These critical functions must be identified so that the appropriate maintenance strategy for guaranteeing their success can be developed.

How Do Failures Manifest Themselves?

Only by understanding the mode of failure and the precursors to failure can one develop an effective, preventive strategy.

Identifying the Effects of Failure

Failures with benign effects are good corrective maintenance candidates. Failures with severe failure effects are worthy of a preventive strategy or even a design change. The effect of a failure along with its probability identifies the risk associated with the failure. Understanding the effects of a failure in terms of safety, costs, system impact, and customer outage consequences is helpful in setting the level of acceptable risk.
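The relationship among effect, probability, and risk described above can be sketched in a few lines of Python. This is a minimal illustration only: the multiplicative form of risk, the function names, the example failure modes, the dollar consequences, and the two thresholds are all assumptions made for the example and are not values taken from this report.

```python
def failure_risk(annual_probability: float, consequence_cost: float) -> float:
    """Expected annual loss, assuming risk = probability x consequence."""
    return annual_probability * consequence_cost


def suggest_strategy(risk: float, acceptable_risk: float, severe_risk: float) -> str:
    """Map a risk value to a candidate strategy (thresholds are hypothetical)."""
    if risk <= acceptable_risk:
        return "corrective (run to failure)"
    if risk <= severe_risk:
        return "preventive"
    return "preventive or design change"


# Hypothetical failure modes: (annual failure probability, consequence in dollars)
modes = {
    "indicator lamp failure": (0.20, 500.0),
    "LTC contact wear": (0.02, 250_000.0),
    "main winding failure": (0.01, 3_000_000.0),
}

strategies = {
    name: suggest_strategy(
        failure_risk(probability, cost),
        acceptable_risk=1_000.0,
        severe_risk=10_000.0,
    )
    for name, (probability, cost) in modes.items()
}

for name, strategy in strategies.items():
    print(f"{name}: {strategy}")
```

In practice the consequence term would combine safety, cost, system impact, and customer outage effects, and the thresholds would come from the utility's own level of acceptable risk.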

Selecting the Right Preventive Strategy

Often, several effective methods exist for preventing a failure from taking place or reducing its impacts. Choosing the best single approach requires both a technical understanding of the equipment and a practical understanding of utility operations. PFM task selection models help to ensure that technically and economically effective preventive strategies are developed.

Data and Measures

Many maintenance strategies are driven by operational and diagnostic data. A listing of all available data elements is developed and a determination is made as to their value, usefulness, and impacts on future maintenance activities. The resultant list provides a foundation for future database requirements, performance metrics, and the development of predictive algorithms.

3. Building an Aging Model

Each failure has its own unique mechanism of manifesting itself. Understanding this mechanism allows one to identify conditional changes that occur prior to failure and to build models that predict those conditional changes.

PFM Aging Model Objectives

The objective of the PFM aging model process is to use information and insight gained during the development of a technical maintenance approach to build models that describe how an item ages. These models focus on the root causes of failure and their associated aging mechanisms. Because the probability of failure changes with age for most failure mechanisms, the risk associated with failure also changes. By developing a mathematical aging model, one can better assess the risk associated with various maintenance strategies as well as develop a predictive approach to some maintenance tasks.

Elements for Building Aging Models

Determining the Aging Mechanism

Time is not the only driver of an aging process. Aging is often influenced by operating events, temperature, loading, or other factors. The dominant cause of aging must be identified in order to build an appropriate aging model.

Can Aging Be Measured?

Aging models are often built from data collected during maintenance and diagnostic testing. If historical data are available, models can be created directly from the data. If data are not currently available, the analysis team should determine what data can be collected in the future to calibrate the model.

What is an Acceptable Level of Risk?

Acceptable risk levels are rarely explicitly defined by the utility. These levels are more generally alluded to in the utility’s stated goals and objectives in various forms. These forms may include:

• System average interruption duration index (SAIDI) and system average interruption frequency index (SAIFI) targets

• Maximum allowable failure rates

• Environmental exposure

• Safety goals

These generalized goals and objectives must be transformed into specific failure probabilities or risk limits.
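As an illustration of how a system-level target can be translated into a failure-rate limit, the sketch below computes SAIFI and SAIDI from a list of interruptions using their standard definitions (customer interruptions, and customer interruption minutes, each divided by customers served). The `max_failures_per_year` helper, the event data, and the idea of allotting a fixed SAIDI budget to one failure type are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Interruption:
    customers_affected: int
    duration_minutes: float


def saifi(events, customers_served):
    """Average number of interruptions per customer served."""
    return sum(e.customers_affected for e in events) / customers_served


def saidi(events, customers_served):
    """Average interruption minutes per customer served."""
    customer_minutes = sum(e.customers_affected * e.duration_minutes for e in events)
    return customer_minutes / customers_served


def max_failures_per_year(saidi_budget, customers_affected, duration_minutes,
                          customers_served):
    """Failure-rate limit implied by a SAIDI budget (hypothetical allocation)."""
    saidi_per_failure = customers_affected * duration_minutes / customers_served
    return saidi_budget / saidi_per_failure


# Hypothetical year of interruptions on a 50,000-customer system
events = [Interruption(1_000, 90.0), Interruption(250, 40.0)]
system_saifi = saifi(events, 50_000)
system_saidi = saidi(events, 50_000)

# Hypothetical: 6 SAIDI minutes per year are allotted to one failure type that
# interrupts 5,000 of 100,000 customers for 240 minutes each time it occurs.
limit = max_failures_per_year(6.0, 5_000, 240.0, 100_000)
print(system_saifi, system_saidi, limit)
```

The resulting limit is an allowable number of failures per year for the asset class, which can then be compared against the failure probability predicted by an aging model.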

Limiting the Risk

Risk is a function of the probability of failure and the total effects of a failure. The aging model that predicts end-of-life can also be used to predict probability of failure and the associated risk. By applying the maintenance tasks identified above at the appropriate apparatus age, the amount of risk being taken by the utility can now be capped.
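The idea of capping risk by acting at an appropriate apparatus age can be sketched as follows. The example assumes a two-parameter Weibull aging model and defines risk as the failure rate multiplied by a single dollar consequence; neither choice is prescribed by this section, and the parameter values are hypothetical.

```python
import math


def weibull_cdf(age, eta, beta):
    """Probability of failure by a given age (Weibull, scale eta, shape beta)."""
    return 1.0 - math.exp(-((age / eta) ** beta))


def weibull_hazard(age, eta, beta):
    """Instantaneous failure rate at a given age."""
    return (beta / eta) * (age / eta) ** (beta - 1.0)


def age_at_risk_cap(risk_cap, consequence, eta, beta):
    """Age at which hazard x consequence reaches the cap (needs beta > 1)."""
    if beta <= 1.0:
        raise ValueError("risk does not increase with age unless beta > 1")
    return eta * (risk_cap * eta / (beta * consequence)) ** (1.0 / (beta - 1.0))


# Hypothetical values: 40-year characteristic life, wear-out shape of 3,
# $2,000,000 consequence per failure, $20,000 per year acceptable risk.
eta, beta = 40.0, 3.0
consequence, risk_cap = 2_000_000.0, 20_000.0

cap_age = age_at_risk_cap(risk_cap, consequence, eta, beta)
print(f"Risk reaches the cap at about {cap_age:.1f} years")
print(f"Fraction failed at characteristic life: {weibull_cdf(eta, eta, beta):.3f}")
```

For a Weibull model, about 63% of a population is expected to have failed by the characteristic life, regardless of the shape parameter. A maintenance or replacement task scheduled before the computed age would keep the risk under the cap under these assumptions.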

4. Creating a Maintenance Plan – Best Practices

Maintenance plans are composed of a series of tasks aimed at reducing the probability of a functional failure in a technically effective way. Each task has its own requirements and costs. Because maintenance involves human and equipment resources, the various maintenance tasks must be coordinated in a manner that produces high reliability, high availability, and lower costs.

PFM Plan Objectives

The objective of PFM plan development is to ensure that maintenance tasks are logically integrated and that intervals and task triggers result in the correct balance between reliability and cost.

Elements for Building the Maintenance Plan

Task Triggers

Maintenance activities must be performed at the appropriate time, either on a periodic calendar-based schedule or after a series of operating events. Often, maintenance is performed when a specific condition is observed or measured (resulting in CDM).

For each maintenance task, the appropriate trigger must be identified. A determination of the urgency of each task should also be made so that an understanding of the risk associated with delays can be predetermined.

Optimizing Maintenance Intervals

Many times the scheduling bandwidth associated with periodic maintenance is quite wide. Reduced maintenance intervals may result in reduced risk and higher reliability at the expense of increased maintenance costs. Extended periodic maintenance intervals can have the opposite impact. Choosing the right periodic interval should result in:

• Not exceeding risk levels

• Low cost

• High reliability

The aging and risk models developed previously are used to identify the risk, reliability, and costs associated with various periodic maintenance intervals. These models are then used to set the optimal maintenance interval.
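One way to picture this trade-off is the sketch below, which selects the lowest-cost interval from a set of candidates subject to a risk cap. It assumes a Weibull wear-out model in which each periodic task restores the item to like-new condition and failures between tasks are repaired without resetting age; the cost figures, parameters, and function names are hypothetical.

```python
def annual_cost(interval, pm_cost, failure_cost, eta, beta):
    """Average yearly cost: one PM task plus expected failures per interval."""
    expected_failures = (interval / eta) ** beta  # cumulative Weibull hazard
    return (pm_cost + failure_cost * expected_failures) / interval


def end_of_interval_risk(interval, failure_cost, eta, beta):
    """Risk rate (cost per year) just before the next scheduled task."""
    hazard = (beta / eta) * (interval / eta) ** (beta - 1.0)
    return hazard * failure_cost


def best_interval(candidates, pm_cost, failure_cost, eta, beta,
                  risk_cap=float("inf")):
    """Lowest-cost candidate interval whose end-of-interval risk is within the cap."""
    feasible = [t for t in candidates
                if end_of_interval_risk(t, failure_cost, eta, beta) <= risk_cap]
    if not feasible:
        raise ValueError("no candidate interval satisfies the risk cap")
    return min(feasible,
               key=lambda t: annual_cost(t, pm_cost, failure_cost, eta, beta))


# Hypothetical inputs: intervals in years, costs in dollars
candidates = [1, 2, 3, 4, 5, 6]
params = dict(pm_cost=5_000.0, failure_cost=100_000.0, eta=10.0, beta=2.5)

unconstrained = best_interval(candidates, **params)
constrained = best_interval(candidates, risk_cap=3_000.0, **params)
print(unconstrained, constrained)
```

With these inputs the cost-optimal interval is three years, but a risk cap of $3,000 per year shortens it to two years, which illustrates why the risk limit and the cost curve need to be evaluated together.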

Building a Maintenance Plan

Effective maintenance takes full advantage of available O&M resources. This means that the technical maintenance strategy identified previously is not implemented in a vacuum. The development of a maintenance plan recognizes that maintenance must also be a coordinated process making effective use of:

• Planned outages

• Routine observations

• Automated data and event collection

• Specialty human and technical resources

• Customer needs

Dynamic Prioritization

Even the best maintenance plans cannot guarantee the availability of all necessary resources to ensure that they are performed on schedule. As maintenance is based more and more on diagnostics, condition monitoring, and predictive algorithms, it becomes difficult to develop detailed, long-range maintenance plans. It is inevitable that some maintenance delays will take place. It is important that these maintenance delays be managed in a way that minimizes their impact.

Dynamic prioritization involves understanding the effects of delayed maintenance and the risk sensitivity of each maintenance trigger. Dynamic prioritization implies that maintenance scheduling delays are acceptable and each maintenance plan has its own resilience to these delays. Dynamic prioritization allows dissimilar maintenance plans to be compared and identifies the relative criticality of each plan.
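A simple way to compare dissimilar maintenance tasks is to rank them by how quickly risk grows when they are delayed, relative to the resources they consume. The sketch below is one possible illustration, not a method defined in this report: the `Task` fields, the greedy selection rule, and the example values are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    labor_hours: float
    risk_growth_per_month: float  # hypothetical: added risk per month of delay


def prioritize(tasks, available_hours):
    """Schedule tasks with the highest risk growth per labor hour first."""
    ranked = sorted(tasks,
                    key=lambda t: t.risk_growth_per_month / t.labor_hours,
                    reverse=True)
    scheduled, deferred = [], []
    remaining = available_hours
    for task in ranked:
        if task.labor_hours <= remaining:
            scheduled.append(task.name)
            remaining -= task.labor_hours
        else:
            deferred.append(task.name)
    return scheduled, deferred


# Hypothetical backlog competing for 30 crew hours
backlog = [
    Task("breaker timing test", 8.0, 400.0),
    Task("LTC inspection", 24.0, 3_000.0),
    Task("fence painting", 16.0, 10.0),
    Task("oil sample (DGA)", 2.0, 500.0),
]

scheduled, deferred = prioritize(backlog, available_hours=30.0)
print("scheduled:", scheduled)
print("deferred:", deferred)
```

Tasks that end up deferred are those whose plans are most resilient to delay under the assumed risk sensitivities; rerunning the ranking as conditions change gives the prioritization its dynamic character.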

5. Measuring Performance

After a maintenance plan has been developed but before it is implemented, the expected results of the plan must be identified. These performance targets allow for meaningful external review by the maintenance stakeholders and set the stage for the continuous improvement of the maintenance program.

PFM Measurement Objectives

The goal of the PFM performance measurement process is to identify a set of metrics and key performance indicators (KPIs) that objectively and meaningfully identify the progress of maintenance and provide linkage to higher level strategic utility goals and objectives. For each metric and KPI, a set of targets must be set so that success is defined. Data required to calculate these metrics and KPIs must be readily available, or an action plan to obtain these data must be created.

Setting Specific Maintenance Goals

Regardless of the forces that are driving a review of the current maintenance program, realistic goals must be set so that targeted actions can take place. These goals must be specific, be achievable, and address the requirements of all major utility stakeholders. Without goals it is difficult to determine if overall improvement really takes place and impossible to know when success is achieved.

For maintenance, these goals can be generalized and quantified in terms of:

• Safety

• Functional reliability

• Equipment or system availability

• Equipment maintainability

• Economics

• Quality of service

These goals should form the foundation of any maintenance program and serve as both the starting point and final destination for PFM.

If goals have not been correctly established, it is difficult to choose the appropriate course of action. If stakeholders do not agree that the goals are adequate, they will never be satisfied. Many times the goals of the maintenance organization, while valiant, do not meet and sometimes conflict with the expectations of the various stakeholders.

Developing realistic goals that can be embraced by all stakeholders provides the foundation for building a sustainable maintenance program. Sometimes every goal the stakeholders desire can be met, and sometimes it cannot. In the latter case, re-evaluation of the goals, system design, or both may be required.

Developing Metrics and KPIs

For each maintenance plan, a specific set of metrics must be developed to ensure that the plan is technically effective and that utility, reliability, and operating objectives will continue to be met into the future. These metrics and KPIs address:

• Reliability

• Availability

• Maintainability

• Supportability

• Financial prudence
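
As a minimal illustration of how reliability, availability, and maintainability metrics of this kind might be calculated, the following Python sketch derives availability, mean time between failures (MTBF), and mean time to repair (MTTR) from a list of outage records. The record layout, the function name, and the sample values are hypothetical and are not taken from the PFM templates.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    hours_down: float  # duration of the outage in hours
    forced: bool       # True for a failure, False for scheduled maintenance


def basic_kpis(period_hours: float, outages: list[Outage]) -> dict[str, float]:
    """Return availability, MTBF, and MTTR for one asset over one period."""
    downtime = sum(o.hours_down for o in outages)
    failures = [o for o in outages if o.forced]
    uptime = period_hours - downtime
    availability = uptime / period_hours
    # MTBF and MTTR consider forced outages only (an assumption of this sketch).
    mtbf = uptime / len(failures) if failures else float("inf")
    mttr = sum(o.hours_down for o in failures) / len(failures) if failures else 0.0
    return {"availability": availability, "mtbf_hours": mtbf, "mttr_hours": mttr}


# One year of service with one scheduled outage and two failures (made-up data).
kpis = basic_kpis(8760.0, [Outage(24.0, False), Outage(12.0, True), Outage(36.0, True)])
```

Because scheduled and forced downtime both reduce availability in this sketch, it is consistent with treating all downtime, planned or not, as a loss of availability.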

Determining Data Requirements

When developing the technical approach for maintenance, a list of available data elements is developed. This list must be reviewed in the context of metrics and KPIs. Each data element should support an immediate need to take action, a predictive aging model, and one or more metrics or KPIs. Data that do not support any of these items must be reassessed for their value. Conversely, each metric and KPI should rely on readily available data. If data are not available, a determination of the value of the metric and KPI must be made, resulting in either a change in metrics or KPIs or the development of a new data collection requirement.
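
The cross-check described above can be sketched in a few lines of Python. The example below flags data elements that support no action, model, or KPI, and lists KPIs whose required data are not available. All data element names, KPI names, and the dictionary layout are hypothetical and serve only to illustrate the review.

```python
# Hypothetical inventory: what each available data element is used for.
data_uses = {
    "ltc_oil_temperature": {"actions": ["high temperature alarm"],
                            "models": ["temperature index"], "kpis": []},
    "operations_counter": {"actions": [], "models": ["contact wear"],
                           "kpis": ["contact life consumed"]},
    "cabinet_paint_code": {"actions": [], "models": [], "kpis": []},
}

# Hypothetical KPI definitions: the data elements each KPI relies on.
kpi_inputs = {
    "contact life consumed": ["operations_counter"],
    "forced outage rate": ["outage_log"],
}


def unsupported_data(uses_by_element):
    """Data elements that support no action, aging model, or KPI."""
    return sorted(name for name, uses in uses_by_element.items()
                  if not any(uses.values()))


def kpis_missing_data(inputs_by_kpi, available):
    """KPIs mapped to the required data elements that are not available."""
    available = set(available)
    missing = {}
    for kpi, needed in inputs_by_kpi.items():
        absent = sorted(set(needed) - available)
        if absent:
            missing[kpi] = absent
    return missing


to_reassess = unsupported_data(data_uses)
to_resolve = kpis_missing_data(kpi_inputs, data_uses)
```

Items in `to_reassess` are candidates for reassessment of their value, while items in `to_resolve` call for either a change to the KPI or a new data collection requirement.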

Setting Targets

All maintenance metrics and KPIs must have a set of targets that indicate the following:

• Goals are being met.

• Goals are not being met, but the program is under control.

• The maintenance plan is not effective, and changes are necessary.

The targets must align with the aging model development and risk analysis previously performed. Failing to align these items will result in targets that are well below the risk acceptance level or well beyond the capabilities of the existing maintenance resources.
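
One possible way to encode the three target conditions listed above is to give each metric a goal and a control limit. The Python sketch below assumes exactly that two-threshold scheme; the function name, the threshold names, and the sample values are illustrative assumptions rather than part of the PFM process.

```python
def classify_kpi(value: float, goal: float, control_limit: float,
                 higher_is_better: bool = True) -> str:
    """Map a KPI value onto one of the three target conditions."""
    if not higher_is_better:
        # Flip signs so the comparisons below work for "lower is better" KPIs.
        value, goal, control_limit = -value, -goal, -control_limit
    if value >= goal:
        return "goal met"
    if value >= control_limit:
        return "goal not met, program under control"
    return "plan not effective, change needed"


# Availability (higher is better): goal 99.0%, control limit 97.0%.
status_availability = classify_kpi(0.98, goal=0.99, control_limit=0.97)

# Forced outages per year (lower is better): goal 2, control limit 4.
status_outages = classify_kpi(5, goal=2, control_limit=4, higher_is_better=False)
```

In practice the goal and control limit for each metric would be derived from the aging models and risk analysis rather than chosen arbitrarily as they are here.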

Identifying the Current State of Maintenance

Once the goals have been established, it is necessary to determine the maintenance program’s current proximity to these goals. If historical data are available, this task should be relatively simple and accurate. If historical data are not available or are in the wrong form, performance estimates must be made from expert knowledge and whatever data are available. A lack of data should not be an insurmountable impediment; performance estimates can be made using data from a random sample of equipment. Care must be taken to ensure that the sample is large enough to represent the whole population and covers an appropriate time frame. Figure 3-2 shows the relationships among a utility’s goals, maintenance performed to achieve those goals, and the performance results.
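
The effect of sample size on a sample-based estimate can be illustrated with a short Python sketch. It estimates the fraction of units with a defect from a random sample and attaches an approximate 95% interval using the normal approximation to the binomial distribution. The choice of this approximation, the function name, and the counts are assumptions made for illustration only.

```python
import math


def sample_failure_rate(failures: int, sample_size: int, z: float = 1.96):
    """Return (estimate, lower, upper) for a proportion from a random sample.

    Uses the normal approximation, which is only reasonable when the sample
    is not very small and the proportion is not close to 0 or 1.
    """
    p = failures / sample_size
    half_width = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Same observed rate (10%) from a small and a large sample (made-up counts).
small = sample_failure_rate(failures=6, sample_size=60)
large = sample_failure_rate(failures=60, sample_size=600)
```

The interval from the larger sample is much narrower, which is the practical meaning of a sample being large enough to represent the whole population.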

Gap Analysis

The difference between the desired goal and the current situation is known as a gap. The wider the gap, the more likely the current maintenance program is in need of in-depth review or change. Similarly, if there is a large number of gaps, a broader analysis must be made in order to understand the cause for these gaps and the best methodology to close them.

The identification of gaps may indicate a need to revise previously identified maintenance program targets or a need to conclusively determine if the proposed maintenance program can effectively meet expectations.
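
A gap analysis of this kind reduces to a comparison of goals with current values. The following Python sketch computes the shortfall for each KPI and ranks the KPIs by relative gap, widest first. The KPI names, the goal values, and the use of a relative gap for ranking are hypothetical choices for illustration.

```python
def shortfall(goal: float, current: float, higher_is_better: bool) -> float:
    """Size of the gap; zero when the goal is already met."""
    raw = goal - current if higher_is_better else current - goal
    return max(0.0, raw)


def gap_report(goals, current):
    """Return (kpi, gap, relative_gap) tuples, widest relative gap first."""
    rows = []
    for kpi, (goal, higher_is_better) in goals.items():
        gap = shortfall(goal, current[kpi], higher_is_better)
        rows.append((kpi, gap, gap / abs(goal)))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Goal value and direction for each KPI (made-up values).
goals = {
    "availability": (0.995, True),
    "forced_outages_per_year": (2.0, False),
    "pm_cost_per_unit": (10000.0, False),
}
current = {
    "availability": 0.990,
    "forced_outages_per_year": 3.0,
    "pm_cost_per_unit": 9500.0,
}

report = gap_report(goals, current)
```

A report with many non-zero rows would suggest the broader analysis described above, whereas a single wide gap points to a targeted review.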

Figure 3-2 The Relationships Among Utility Goal, Maintenance, and Performance

6. Documentation and Implementation

Comprehensible documentation is necessary to gain outside support and to keep the maintenance program “living.” The documentation forms the basis for plan acceptance and future enhancements. Although each step of the PFM process should be documented at the time it is performed, a final summary document must be developed.

Implementation of a new maintenance program is the most overlooked aspect of any maintenance review project. Unless the existing maintenance program has been well refined or resistance to change has been insurmountable, it is necessary to formulate an implementation plan. It should be recognized that:

• Implementation failures account for a large percentage of program failures.

• Implementation costs make up about 25% of total PFM program costs.

• Implementation is the time to solidify worker and management acceptance and buy-in.

PFM Documentation and Implementation Objectives

The objective of the implementation and documentation process is to ensure that the PFM recommendations are implemented with minimal objection, that the work of the PFM team can be understood by all maintenance stakeholders, and that continuous improvements can be sustained well into the future. The process requires full reconciliation with the existing maintenance strategy and documentation of why the change is necessary.

Elements for Documenting and Implementing the PFM Recommendations

Reconciliation

If a maintenance program previously existed, the PFM-based program must be compared with it. This comparison will identify differences that must be justified and potentially effective maintenance tasks that were overlooked or discarded during the PFM technical processes. Any resulting program changes must be identified and provided with supporting documentation so that a change implementation plan can be developed.

Identifying Change

Change can be a difficult process, especially if it is the response to a knee-jerk reaction. The PFM process will identify many changes that should be made to the existing maintenance program and its ongoing support processes and organization. Many of these changes will be identified by the stakeholders themselves and thus will have a pre-established base of support. It must be recognized that even this base of support does not guarantee that all organizations will immediately understand and embrace the change.

Although not all changes require up-front approval and buy-in, there will be a few changes that are pivotal to the implementation of the PFM findings. These critical changes must be identified well in advance and formal change-management activities initiated. This change-management process may need the support of:

• Asset management

• Information technology

• Labor relations

• Maintenance craft personnel

• First-line supervisors

• Upper management

Impact Analysis

Every change to an existing maintenance plan has some impact. Some of these impacts are minor, while others may be severe and require significant resources to mitigate. The impact of each change must be identified. Then, the overall value of the change to the ability of the maintenance program to meet its goals and objectives must be determined. In some cases, the impacts might exceed the value of the change, and thus a different approach is required.

Change Management

Whether a change is welcome or not, it generally does not happen without support and management. All changes identified by the PFM process must be managed to ensure that they are made correctly and in a timely manner.

Implementation Plans

Changes to an existing maintenance plan and system can be significant events themselves. Many times these changes require both financial and technical resources that must be secured in advance. To ensure that the implementation goes smoothly, specific implementation plans must be developed to address:

• Changes to the maintenance management system

• Resources needed for implementation

• Additional training

• New equipment purchases

• New procedures

• New monitoring equipment

• Responsibilities for action and approvals, interfaces, schedules, and milestones

Implementation planning also requires:

• Executive sponsorship

• Training

• Leadership program development team

• Tight link between program developers and craft

• Feedback mechanism

• Sustained support

7. Measurement and Feedback

No single piece of equipment is 100% reliable nor is any PM program 100% effective. PM programs improve with age only if O&M targets are defined and focused. The path to meeting these goals is not singular and linear but iterative and circular.

PFM Measurement and Feedback Objectives

The objectives of the measurement and feedback process are to ensure that the maintenance program is on track to meet its technical, financial, and operational objectives and to communicate progress to all maintenance stakeholders.

Elements for Measuring Maintenance Effectiveness and Providing Feedback

Measurement

The metrics and KPIs previously identified must be updated on a periodic basis. Data should be readily obtained from appropriate collection systems and suitable calculations made. Depending on the KPI or metric, updating may be necessary only at quarterly or yearly intervals.

Reporting

Not all maintenance stakeholders are interested in all the KPIs or metrics. Only items of interest need to be periodically reported to each stakeholder group. The reporting should include:

• Historical trends

• Summary analysis of the data

• Future expected maintenance actions that have influence on these items

Making Corrections

Each metric or KPI has a predefined acceptance level. Negative trends or off-target conditions for any of these reporting items must receive follow-up analysis. This analysis must identify:

• Data or operating anomalies

• New information not identified or addressed by previous PFM analyses

• Needs to take action

• Suggested corrective actions
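
A simple screen for negative trends or off-target conditions can be sketched as follows. The Python example fits a least-squares line through equally spaced KPI readings and flags the KPI for follow-up analysis if the latest reading is outside the acceptance level or the trend is moving in the wrong direction. The use of a linear fit, the function names, and the sample readings are assumptions for illustration; a practical version would likely add a tolerance so that small random variations do not trigger follow-up.

```python
def slope(values):
    """Least-squares slope of equally spaced readings (needs two or more)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def needs_follow_up(values, acceptance_level, higher_is_better=True):
    """True if the latest value is off target or the trend is unfavorable."""
    trend = slope(values)
    latest = values[-1]
    if higher_is_better:
        return latest < acceptance_level or trend < 0
    return latest > acceptance_level or trend > 0


# Quarterly availability readings against an acceptance level of 0.97.
declining = needs_follow_up([0.990, 0.985, 0.980, 0.975], 0.97)
improving = needs_follow_up([0.980, 0.985, 0.990], 0.97)
```

In the first series the KPI is still above its acceptance level, but the declining trend alone is enough to request follow-up analysis.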

Implementing New Technologies and Maintenance Tasks

New technologies and maintenance techniques are continually being developed. Many of these items have the potential of improving the existing maintenance program, while others may just be a cost resulting in little or no performance improvement. These new opportunities must be analyzed, tested, and validated before they are made part of the mainstream maintenance program.

4 USING A TARGETED APPROACH WITH PFM

Although PFM is a comprehensive process for developing and assessing a maintenance program, it does not need to be employed in totality. PFM can be applied on a targeted basis, using only selected PFM activities. Figure 4-1 shows a logic diagram for selectively implementing PFM.

Figure 4-1 Simplified PFM Decision Diagram for Selective Implementation

Some of the potential benefits that can be gained through selective implementation are listed in Table 4-1.

Table 4-1 Selective PFM Benefits

Activity Typical Benefits Realized from the Selective Activity

Stakeholder identification

• Identifies your customers/maintenance allies

• Expands the list of actual maintenance beneficiaries

• Clearly identifies the actual needs of all maintenance beneficiaries

• Legitimizes the impacts of failure

• Sets a foundation for building business cases for various maintenance initiatives

Maintenance goal setting

• Sets specific and measurable maintenance targets

• Provides a path for continuous improvement

• Provides a cause and effect relationship between maintenance strategy and maintenance results

Measures and metrics

• Identifies specific methods for determining success and the needs for change

• Provides clarity to data collection requirements

• Identifies the need for corrective actions when metrics exceed boundaries

• Provides input to reliability models

• Charts progress

KPI development

• Provides executives with a meaningful view of the value of maintenance

• Provides regulators with a level of confidence that maintenance is being managed

• Charts progress

Function and failure analysis

• Identifies most effective maintenance tasks

• Eliminates ineffective and redundant tasks

• Identifies the need for design change

Predictive modeling

• Leverages the use of data

• Reduces maintenance costs

• Increases equipment reliability and availability

• Forms a technical basis for equipment replacement

Task interval optimization

• Reduces total overall maintenance costs

• Provides a direct linkage between reliability and task frequency

• Provides justification for maintenance intervals

• Identifies the impacts of budget changes

Maintenance strategy documentation

• Provides a consistent basis for maintenance decisions

• Instills confidence in the minds of executives and regulators that maintenance is being well managed

Task prioritization

• Helps allocate limited resources effectively

• Ensures that the utility gets the biggest “bang” for its maintenance dollar

Closing the Gaps with PFM Selective Activities

The following sections describe common ways that the PFM templates have been used to enhance mature maintenance programs.

Goals Are Unclear

The PFM performance measurement activity provided the utility asset manager with a methodology for setting maintenance goals with results that are measurable.

Reliability Is Below Expectations

Reliability requirements were identified for each critical function, and appropriate metrics were developed that clearly indicated whether or not progress was being made. The metrics template provided the PFM leader with a framework for identifying suitable metrics and measures.

Executives Are Confused About the Value of Their Maintenance Investments

Maintenance is a complex subject that must be managed by experts. Executives cannot be expected to understand maintenance at the same level as asset managers, but they must be able to see how maintenance allows the utility to meet its business and customer service goals. PFM KPI activities were used to distill maintenance activities into a balanced scorecard that provided executives with constant insight into how maintenance was effectively using precious resources to meet critical utility goals and how maintenance objectives aligned with the utility’s risk profile.

Regulators Are Challenging Your Maintenance Program

Regulators understand process and results, but they do not understand the details of maintenance. When utility operation and customer service results do not meet expectations and there are no supporting processes to validate the current maintenance approach, regulators respond with orders that are not necessarily in the best interest of the customer or the utility. The PFM activity for building an active and documented maintenance strategy that could withstand the scrutiny of regulators was employed. It provided persuasive arguments supporting the utility’s maintenance approach.

Availability Requirements Are Tightened

Reduced downtime, whether scheduled or nonscheduled, is the only way to increase equipment availability. Successful application of PFM technical approaches for developing a predictive maintenance strategy improved both reliability and availability.

Maintenance Tasks Are Not Achieving Desired Results

New technologies promise improved reliability at reduced costs, but will they work? The PFM task selection process ensures that the most proactive maintenance practices are being used.

Want to Make Better Use of Data

The approach to maintenance must be intelligent and business based, effectively using readily available data to predict equipment condition and measure both asset and program performance. PFM aging models have been used to better predict end-of-life.

Maintenance Task Intervals Are Suboptimal

Periodic maintenance intervals have been optimized based on technical and economic objectives, not arbitrary time intervals.

There Is Too Much Work and Not Enough Resources

PFM prioritization and resource allocation processes ensured that the most critical functions were receiving the appropriate attention of maintenance.

A Replacement Strategy Is Needed

Efficient direction of capital expenditures for replacement of equipment occurs when maintenance cannot achieve reliability and availability requirements and risk tolerances are exceeded. PFM activities for evaluating risk will serve as the basis for building an equipment replacement strategy.

Intellectual Property Is Lost

Expert system knowledge currently possessed by experienced personnel was captured and institutionalized, elevating the proficiency level of all technical and operations personnel during PFM documentation stages.

5 EXAMPLE APPLICATION OF PFM MEASURE AND PERFORMANCE ACTIVITIES

Overview

A large East Coast utility had embraced the concepts of RCM and applied the concepts to much of its substation maintenance program. The revised maintenance program was superior to previous activities and was based on a strong technical foundation. Condition-monitoring activities were heavily employed to initiate “just in time” maintenance for their load tap changing (LTC) transformers. The novel condition monitoring approaches and their derived benefits are listed in Table 5-1.

Table 5-1 Novel Condition Monitoring Approaches Used to Trigger Just in Time LTC Maintenance

LTC Attribute Monitored Derived Benefits

Operations counter

• Frequent use may indicate excessive contact wear and require maintenance.

• Infrequent use may indicate that the tap changer requires exercising through the tap change or reversing switch.

• Assesses contact life from cumulative operations.

Operations per tap

• Number of operations at each tap position may indicate excessive wear on certain taps or the reversing switch.

• Ensures regulating voltage range.

• Comparison with system voltage levels may indicate a tap changer control deficiency such as phase out of sync.

• Assesses contact life of each tap from cumulative operations.

Limit switch counter

• Limit switch operations may indicate a gearing, voltage, or control deficiency.

Motor run time

• Changes in run times above baseline levels may indicate a mechanical deficiency such as contact wear, gear fouling, motor fouling, or AC supply failure.

• Indicates contact or bearing wear.

Motor current

• Changes in current may indicate a power supply failure or mechanical deficiency such as gear or motor fouling.

Gas-in-oil in diverter compartment oil

• Changes in concentration of gas-in-oil above the baseline may indicate coking or incipient dielectric failure.

• Indicates overheating due to high resistance current paths.

• Indicates contact wear.
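
Two of the monitored attributes in Table 5-1 lend themselves to very small calculations. The Python sketch below counts operations at each tap position and flags motor run times that rise above a baseline. The 20% allowed increase, the function names, and the sample readings are hypothetical; the report does not state the thresholds the utility actually used.

```python
from collections import Counter
from statistics import mean


def operations_per_tap(tap_positions):
    """Count how often each tap position appears in the recorded sequence."""
    return Counter(tap_positions)


def run_time_alarm(baseline_runs, recent_runs, allowed_increase=0.20):
    """True if the recent mean run time exceeds the baseline by the allowance."""
    baseline = mean(baseline_runs)
    return mean(recent_runs) > baseline * (1 + allowed_increase)


# Recorded tap positions after each operation (made-up sequence).
tap_counts = operations_per_tap([3, 4, 3, 4, 5, 4])

# Motor run times in seconds: commissioning baseline and two recent periods.
baseline_seconds = [5.0, 5.2, 4.8]
alarm_raised = run_time_alarm(baseline_seconds, [6.5, 6.3, 6.4])
alarm_clear = run_time_alarm(baseline_seconds, [5.1, 5.3])
```

A per-tap count that is heavily concentrated on a few positions, or a run time alarm, would then serve as a trigger for closer inspection.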

PFM Findings

The PFM measurement and performance process revealed a significant gap between reliability and availability requirements and actual LTC performance. LTC contact wear was not being effectively monitored by the techniques presented in Table 5-1. A PFM technical maintenance approach was performed, and it was determined that LTC temperature modeling could provide significant insight into the state of each contact.

Overheated contacts in an LTC can result from various causes, such as coking, misalignment, and loss of spring pressure. Because contact temperature cannot be directly measured, the overheating will generally be detected by an increase in the LTC oil temperature. If the overheating progresses to an advanced stage, the oil quality will deteriorate and bubbles may form. A flashover between contacts could occur, which would place a short circuit on the regulating winding and cause the transformer to fail.

The technical analysis identified that LTC temperature profiles are normally influenced by weather conditions, cooling bank status, and electrical load. However, abnormal sources of energy (losses) also affected the temperature profile. Four potentially effective methods for determining when one or more contacts were near end-of-life were identified.

1. LTC Oil Temperature

The simplest temperature-related diagnostic involves monitoring the temperature level. LTC temperature in excess of a certain level may be an indication of equipment trouble. Setting a high temperature level for triggering LTC maintenance is quite challenging because normal loading and ambient temperatures cause LTC oil temperatures to change significantly.

2. Differential Temperature

Monitoring the temperature difference between the main tank and LTC compartment when the tap changer is in a compartment separate from the main tank can indicate high resistance conduction paths in the LTC. Typically, the main tank temperature will be higher than the tap changer compartment temperature. Many factors influence differential temperature. Excessive losses caused by bad contacts or coking in the tap changer may be detectable. However, the LTC temperature can exceed main tank temperature periodically under normal conditions. Hourly variations in electrical load, weather conditions, and cooling bank activation can result in main tank temperatures below the tap changer.

Figure 5-1 shows a typical curve of the top oil temperature in the main tank and of the LTC compartment temperature for reactance-type tap changers. It should be noted that a signature needs to be developed for each type of tap changer. The top trace (black) is the main tank top oil temperature, and the bottom trace (gray) is the LTC compartment temperature.

Figure 5-1 LTC and Main Tank Temperature Profile

3. Differential Temperature with Trending

A method to distinguish between normal and abnormal differential temperature is to include load trends. If the LTC temperature is greater than that of the main tank when load is decreasing, this is deemed a normal condition. However, if the tap changer temperature exceeds the main tank temperature when load is increasing, this may indicate an LTC contact problem.
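
The rule described above can be sketched in Python as follows. The example flags a sample only when the LTC compartment is hotter than the main tank and the load has risen since the previous sample. Treating the load trend as a simple sample-to-sample difference is an assumption of this sketch, as are the function name and the readings.

```python
def differential_alarms(samples):
    """Return indices of samples that may indicate an LTC contact problem.

    samples: list of (ltc_temp_c, main_tank_temp_c, load_mva) tuples taken
    at equal time intervals.
    """
    alarms = []
    for i in range(1, len(samples)):
        ltc_temp, tank_temp, load = samples[i]
        previous_load = samples[i - 1][2]
        load_increasing = load > previous_load
        if ltc_temp > tank_temp and load_increasing:
            alarms.append(i)
    return alarms


# Hourly readings (made-up values).
readings = [
    (50.0, 55.0, 30.0),  # normal: main tank hotter
    (52.0, 54.0, 28.0),  # normal: main tank hotter
    (56.0, 55.0, 26.0),  # LTC hotter, but load is decreasing: normal
    (58.0, 56.0, 29.0),  # LTC hotter while load is increasing: suspect
]
suspect_hours = differential_alarms(readings)
```

A field implementation would probably smooth the load signal over several samples before deciding whether it is rising or falling.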

4. Temperature Index

Another method used to examine temperature differential involves computing the area between the two temperature curves over a rolling window of time. This quantity is called the temperature index and is usually expressed in degree-hours.

Normal temperature difference (defined as the main tank level above the LTC) is counted as negative area, and the reverse is positive area. Therefore, over a period of time, the index reflects the general relationship between the two measurements without changing significantly due to normal daily variations in temperature. Under abnormal conditions, the index will exhibit an increasing trend because the LTC tends to operate at a higher temperature relative to the main tank. This method eliminates false alarms associated with simple differential monitoring but responds slowly to abnormal conditions.
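
A minimal sketch of such a temperature index is given below in Python. It approximates the area between the two curves with a rectangle rule over a rolling window, using the sign convention stated above (main tank hotter counts as negative area). The sampling interval, the window length, and the function name are assumptions; the report does not specify the window the utility used.

```python
def temperature_index(ltc_temps, tank_temps, interval_hours=1.0, window_hours=168.0):
    """Rolling area between LTC and main tank temperature, in degree-hours.

    Negative values mean the main tank has been hotter over the window
    (normal); a rising trend toward positive values is the abnormal case.
    """
    window = max(1, int(round(window_hours / interval_hours)))
    areas = [(ltc - tank) * interval_hours
             for ltc, tank in zip(ltc_temps, tank_temps)]
    index = []
    running = 0.0
    for i, area in enumerate(areas):
        running += area
        if i >= window:
            running -= areas[i - window]  # drop the sample leaving the window
        index.append(running)
    return index


# Hourly readings with a three-hour window so the result is easy to follow.
ltc = [50.0, 50.0, 50.0, 60.0, 60.0]
tank = [55.0, 55.0, 55.0, 55.0, 55.0]
index = temperature_index(ltc, tank, interval_hours=1.0, window_hours=3.0)
```

In this example the index moves from negative to positive as the LTC starts to run hotter than the main tank, which is the increasing trend that would trigger maintenance.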

Using Readily Available Data

Temperature monitoring of transformer main tank oil through SCADA was typical for this utility. Monitoring of LTC oil temperature with SCADA was not typical but was easy and inexpensive to implement. Temperature data were archived in the SCADA data historian, which was accessible to the utility’s maintenance management workstation (MMW).

Development of an LTC temperature indexing algorithm was straightforward. The algorithm calculated the temperature index on a periodic basis and provided both reports and maintenance triggering.

LTC Failure Avoided

Implementation of the temperature index model on a 65-MVA LTC transformer had positive results. Within a short period of time, an incipient reversing switch failure was detected and averted. The potential impact of this was $1.2 million in avoided replacement expenditures. The nearly failed contacts are shown in Figure 5-2.

Figure 5-2 Failed Reversing Switch Identified by Temperature Index

6 THE ROLE OF DATA IN PFM

PFM is a consumer of data and a generator of information. Within the PFM construct, it is realized that proper use of data can not only improve equipment reliability but also improve the maintenance management process by delivering important information to numerous organizations throughout the utility.

Developing and implementing algorithms to identify asset degradation and incipient functional failures is an important approach to maintenance. Traditional approaches to PdM have required the installation of numerous sensors and expensive data collection systems. PFM embraces the use of readily available data from different data sources and can reduce the cost of implementing and the time required to develop predictive maintenance strategies.

Typical data elements used by PFM include:

• Design

– Original equipment manufacturer (OEM) design

– System design

– Fault duty

• O&M

– Loading, voltage, and current

– Fault and switching operations

– Operating events and times

– Temperatures

– Run times

– Maintenance events

– Troubles and failures

– Outage times

– Part usage

• Diagnostic

– Periodic off-line measurements and diagnostic tests

– On-line monitoring

– Visual observations

– Laboratory tests

– Forensic tests

• Industry

– Trouble and failure experiences for similar equipment

– Generic failure models

– Availability statistics

– Reliability statistics

• Economic

– Asset replacement costs

– Labor costs – CM

– Labor costs – PM

– Labor costs – CDM

– Parts costs

– Failure effects

Multiple Uses of Data

Utility maintenance organizations are no different than many other businesses when it comes to the topic of data; they find themselves inundated with data but challenged as to how to transform them into real information. For maintenance managers, the challenge is twofold:

• How can existing data sources be leveraged to improve the current maintenance process?

• How can a maintenance organization transform its data into meaningful information that can be of value to other departments and organizations within the utility?

Data and data utilization are extremely important elements of PFM. The PFM project includes the building of models that integrate several utility data sources, domain expertise, and existing EPRI tools, resulting in improved maintenance approaches that target specific equipment families.

Consider today’s microprocessor protective relays. These devices now not only detect the presence of a fault, they also measure real-time currents and record sequences of events. This information is not only useful in evaluating the performance of the protective scheme but can also be used to evaluate:

• Breaker mechanism performance

• Breaker contact wear

• Transformer through-fault impacts
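
As one illustration of how relay records could feed a breaker contact wear estimate, the Python sketch below accumulates interrupted current raised to an exponent for each recorded operation and compares the total with a rated duty. This accumulation approach, the default exponent of 2, the rated duty value, and the function names are all assumptions introduced for illustration; they are not described in this report, and an actual wear model would depend on the breaker type and manufacturer data.

```python
def cumulative_contact_duty(interrupted_currents_ka, exponent=2.0):
    """Sum of interrupted current (kA) raised to an exponent, per operation."""
    return sum(current ** exponent for current in interrupted_currents_ka)


def wear_fraction(interrupted_currents_ka, rated_duty, exponent=2.0):
    """Fraction of a (hypothetical) rated cumulative duty already consumed."""
    return cumulative_contact_duty(interrupted_currents_ka, exponent) / rated_duty


# Fault currents interrupted by one breaker, taken from relay event records
# (made-up values in kA).
recorded_currents = [10.0, 20.0, 5.0]
duty = cumulative_contact_duty(recorded_currents)
consumed = wear_fraction(recorded_currents, rated_duty=2100.0)
```

The point of the sketch is only that data already captured by the relay can be reused for a maintenance purpose without installing additional sensors.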

Where Data Are Applied in PFM

PFM leverages data whenever they are available. When they are not available, PFM suggests the initial use of proxies until the time that solid data are obtained. Typical areas where data are used to make decisions include:

• Understanding failure consequence

• Determining current failure probability

• Identifying and quantifying risk

• Building failure models

• Optimizing maintenance intervals

• Developing predictive models

• Setting norms

• Triggering maintenance

7 EFFECTIVELY USING DATA FOR RISK ANALYSIS

Introduction

Asset maintenance strategies based on performance require external drivers or circumstances that set the performance level. Maintenance plans will differ depending on the specific demands but will nearly always strive to achieve maximum performance given a set of performance requirements. Performance requirements differ in character depending on the stakeholder being addressed. Shareholders may be interested in short-term capital growth, long-term capital growth, or both; industrial consumers may be interested in a variable relationship between availability and cost; and society (including captive customers and government) is generally interested in continuous availability at the lowest possible price as well as a safe and clean environment.

Making high level maintenance decisions should not be driven by a ruling principle forced by privatization or regulation. Opinions and/or (contractual) requirements on the one hand and technical information regarding asset performance on the other should be the performance drivers. Information as a result of effective analysis of the proper data should lead to decisions with respect to maintenance frequency, refurbishment, replacement, and task content, all at the utility level. The effective use of data, leading to the essential information, is dependent on the data collected, the collection process, and the analysis methodology. It is obvious that the quality of such an analysis is improved if it is based upon a larger data population rather than solely based on limited data derived from the utility alone. Maximizing the data population can be realized by either making use of industrial data concerning failure analysis or by sharing condition/performance data regarding the asset (or group/type of assets) with other owners of the same asset type. The leading guideline in all analysis processes is very simple: do the right things right at the right moment. Simply collecting mountains of data is not in itself valuable; the real trick is to focus on the essential data and convert them into useable management information.

This section addresses the process of deciding how to identify the data that will be needed for various maintenance information activities and how to apply this information in a decision process. It shows the necessity of applying subjective data related to the position of the asset in the network against the more objective asset condition information. It concentrates on information that should support decisions regarding the handling of equipment in relation to the total relevant maintenance stakeholder environment. It emphasizes and describes methods that can be applied to generate the proper information/data in a robust collection process. In many situations, this means not only mining existing databases but also using special distilling processes such as Delphi methods, applied in discussions with experienced engineers and technicians. Moreover, once it has been determined what information is relevant to the maintenance process, a consistent method of generating information is required.

At the asset level, proper technical PFM activities such as failure modes, effects, and criticality analysis (FMECA) become the basis for determining what information is relevant. PFM analysis tells us that the best way to apply FMECA is based upon a functional division. This approach enables us to understand the mutual relationships between critical functions and the desired performance of each function. Apart from deciding which data entities are of relevance for maintenance decisions, the number of data entities becomes relevant when performance databases are designed. Collecting and storing measured data in a consistent way offers possibilities for applying different analytical tools that optimize the generation of expert knowledge. If properly stored in a database of sufficient quality and structure, combinations of measured data can be analyzed per type, fleet, or location. Knowledge about performance can thus be turned into new expert rules and applied to maintenance decisions, especially if the collected data are shared with others.

Data Drivers and Risk Management

As stated in the introduction, the main challenge of asset management is to make the right decision at the right moment. This series of correct actions implies a decision process at different levels, leading to the proper order/priority of the execution of activities. The decision as to whether or not to execute a maintenance action requires insight not only into the asset’s performance but also into its short-, mid-, and long-term effects at the system level and the respective consequences for the stakeholders.

General utility management must realize that they severely challenge the abilities of the asset managers when they set qualitative and quantitative requirements for O&M. Requirements that address expectations regarding the quality of supply, environmental load, customer service, and employee satisfaction challenge the asset manager to develop plans that realize these goals while simultaneously achieving the lowest possible long-term cost of ownership. Many times these requirements force the asset manager to find an optimal balance between the often conflicting satisfaction factors of stakeholders.

The asset manager has to assign a (limited) budget in the correct order to a series of capital and O&M projects. The order of assignment should depend on the contribution the project/activity makes to closing the gap between the assumed actual (is) and the anticipated (should be) needs for all stakeholders. This balancing process is often referred to as risk management. Figure 7-1 depicts the challenges the asset manager is facing in this priority process.
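The ordering rule described here can be sketched in a few lines of Python. This is a minimal illustration only, assuming that each project's contribution to closing the gap has already been expressed as a single number. The `Project` fields, the sample figures, and the simple rank-and-fund rule are hypothetical and are not taken from this report.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    cost: float           # budget required (hypothetical units)
    gap_reduction: float  # assumed score: how much of the "is" vs. "should be" gap is closed


def prioritize(projects, budget):
    """Fund projects in order of decreasing gap contribution while budget remains."""
    funded = []
    remaining = budget
    for p in sorted(projects, key=lambda p: p.gap_reduction, reverse=True):
        if p.cost <= remaining:  # skip projects that no longer fit the budget
            funded.append(p.name)
            remaining -= p.cost
    return funded, remaining


# Hypothetical example data
candidates = [
    Project("breaker overhaul", cost=40.0, gap_reduction=5.0),
    Project("transformer replacement", cost=80.0, gap_reduction=9.0),
    Project("relay upgrade", cost=30.0, gap_reduction=3.0),
]
funded, left = prioritize(candidates, budget=120.0)
```

In practice the gap contribution would itself come out of the stakeholder assessment described in the remainder of this section, so the single score used here is a simplification.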

Figure 7-1 Establishing Priorities

Risk management may be expressed, in the utility environment, as a function of the probability of the occurrence of supply failures. The translation from accepted risk levels into activities necessary to maintain the supply function through the upkeep of the asset performance is thus of extreme importance. Methods to measure the influence of asset performance on the system level are based upon historic information regarding the quality of supply. Methods to measure the quality and expected performance of the asset itself strongly depend on the information with respect to the specific asset type as integrated into the PFM failure modes and effects analysis (FMEA). Industry data or, preferably, data for a complete asset fleet of a certain type can generate only generic performance information. Applying data-mining techniques to a combination of utility-specific and industry databases provides better insight into future asset behavior and becomes a better platform for making risk decisions. A properly designed data collection process will support not only the quality of the decision but also the application of a certified process.

Most asset management strategies attempt to achieve an optimum balance between investment return and stakeholders’ values. Risk decisions are based upon two things: the judgment of the acceptable risk and the (assumed or predicted) performance of the asset string concerned. To be fully informed about the performance capabilities of the asset string, CBM tasks are applied intensively. A PFM technical analysis of the asset string (such as FMECA) creates a standardized and human-independent environment for the identification and execution of condition measurements and CDM.

The drivers that stimulate the asset manager to show maximum and continuous effort are:

• The requirements set by utility executives

• The challenge and possibility to be creative in the translation from data to information

• The desire to develop a solid information basis for making optimal decisions

Risk Decision Process

The secret of a well-designed and applied risk management process lies in finding the optimal balance of value-added service versus risk taken between shareholders, regulators, employees, consumers, and society. Risk decisions are made by defining alternatives in terms of financial, quality, and environmental impacts. Based upon the risk analysis, control measures, and execution of actions, stepwise actualization of improvements occurs in a more or less continuous process.

Risk management implies that decisions are based not only upon realizing the agreed-upon reliability targets or the profit promised but also the desires of other stakeholders. All decision aspects are based on economic needs, technical requirements, environmental impacts, and customer and employee values. The situation can be regarded as a gearbox relation of cogwheels as given in Figure 7-2. The cogwheels represent the same functions in both diagrams and show the mutual influence of each other’s position. Turning the “maintenance activity” wheel will influence all other positions, and a careful balance must be found.

Figure 7-2 Gearbox Approaches to Balancing Stakeholder Needs

The issue of risk management is a constant and complex theme for the asset manager. One way to perform risk assessments is to split observations and studies into two activities, one for the system and one for components, using reliability management as the linking process. This approach eases the handling of data and distinguishes between the system risk and asset failure mode while guaranteeing the interaction between supply/demand and decisions regarding the execution of maintenance activities. The asset manager basically applies a step-by-step decision process as shown in Figure 7-3. The decision process again is separated into a risk (network) and a condition (component) orientation with reliability management as the link between both. A corporate level is added, addressing the risks involved in items like safety, corporate image, and environmental damage.

Based on the information available (external, financial, and asset information sources), different impact assessments are executed. Regarding the first two levels, “asset” and “system,” the proper information is collected in a process as shown in Figure 7-3 (right). This process shows the steps taken from data input up to the decision-making level. It strongly emphasizes the necessity of a data warehouse approach.

The effectiveness of making use of data is strongly influenced by approaches as shown in Figure 7-3 (left). Reliability management aims to identify the processes of degradation while describing and quantifying the processes’ effects on the availability of a system. It opts to combine the (expected) condition of the system elements with financial consequences, such as investments, needed to realize required availability. Weak results may lead to a change in the design of the system as well as a replacement of a component or another maintenance strategy being applied. Taking into account societal information, one reaches the decision at the “corporate” level, and the risk management process is completed. Items such as utility image, environmental impacts, and personnel safety (as well as failure consequences at the societal/strategic level) are taken into account by this model at the “corporate” level.

Figure 7-3 Risk Decision Process

The described impact assessment approach necessitates three elementary types of information:

• Feedback from the corporate level:

– Analysis reports of effects from realized investment and upkeep activities on the corporate strategic level as related to stakeholder satisfaction

– Reports on environmental consequences

– Possible safety hazards for personnel and the environment

• Feedback from the system level:

– Information on interruptions

– Quality of the supply to customers

– Energy not supplied

• Feedback from the component level:

– Information on equipment failures and measured conditions

– Maintenance activity

– Operational activity

Risk Assessment

Based upon the theoretical model of the risk decision process described in Figure 7-3, the risk assessment approach is visualized more practically in its pre-design schema, as shown in Figure 7-4 and in Figures 7-5 and 7-6.

Simply put, in an asset life-cycle management process, the asset manager’s main objective is to find out which assets need to be maintained, which need to be replaced, and which can be ignored for a while. Then, given that information, a decision must be made about the order in which the chosen actions should be executed to obtain optimal system, societal, and financial results.

Roughly translated, the asset manager has to keep risk at an accepted level. The asset manager has to determine which assets have had too much money invested in them (so that a decrease would be acceptable) and which assets are at high risk and warrant more spending. With respect to Figure 7-4, more money should be spent on assets positioned in the top right corner of the figure, and less spending is required on assets positioned in the bottom left corner. After making this investment decision, the asset manager can determine which actions are reasonable and appropriate: refurbish, renovate, replace, or redesign.

Figure 7-4 Asset Grouping by Risk and Performance Expectations

The action-type decisions are based upon analyzing the different solution scenarios and determining their specific economic impact and the value added to other stakeholders. Table 7-1 exemplifies the use of a tool for evaluating several different action scenarios.

Table 7-1 Scenario Approach

Scenario Process Decision Evaluation Matrix with Stakeholders’ Preferences

Scenario   Process       Flexibility   Safety   Reliability   Durability   NPV/EVA
           Alternative   Score         Score    %             Score        $
1          A             ++            +        99.84         +            X
1          B             -             +        96.70         -            Y
2          C             +             ++       99.00         +            Z
2          B             -             +        96.00         -            W

In this example, only one hard data element is present: the costs related to the scenario involved, expressed as net present value (NPV) or economic value added (EVA). The EVA approach not only takes into account the life-cycle costs expressed in present money (that is, NPV) but also evaluates the consequences of borrowing money (weighted average cost of supplied capital), corporate revenues and costs, and long-term consequences of changes in corporate rating. Normally, the cheapest solution has the preference of the shareholders, although flexibility scores can also influence their desires. The flexibility scores express the possibility of deferring the actual decision to a later period or allowing for additional options on future decisions. The other stakeholder preferences are covered by a quality element in safety (both for personnel and for the environment), reliability/availability (the situation with respect to redundancy and asset performance), and durability (mainly environmental). Presenting and assessing alternatives in this way makes the decision-making process more objective, although it remains partly subjective.
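As an illustration of the hard cost element, the NPV part of such a comparison can be sketched as follows. This is a simplified example under stated assumptions: the cash flows, the discount rate, and the scenario names are hypothetical, and the additional EVA terms (cost of capital, corporate revenues, rating effects) are not modeled.

```python
def npv(rate, cash_flows):
    """Net present value of yearly cash flows; cash_flows[0] occurs today (year 0)."""
    return sum(cf / (1.0 + rate) ** year for year, cf in enumerate(cash_flows))


# Hypothetical scenarios: initial outlay followed by yearly O&M costs (negative = cost).
scenarios = {
    "refurbish": [-100.0, -20.0, -20.0, -20.0],
    "replace": [-150.0, -5.0, -5.0, -5.0],
}
discount_rate = 0.08  # assumed; a utility would use its own cost of capital

results = {name: npv(discount_rate, flows) for name, flows in scenarios.items()}
cheapest = max(results, key=results.get)  # least negative NPV = lowest life-cycle cost
```

The outcome is sensitive to the evaluation horizon and the discount rate, which is one reason the softer scores in the matrix still need to be weighed alongside the cost figure.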

An Effective Assessment Model at the Network/System and Asset/Component Levels

In present utility settings, there are numerous data entities stored in databases. On many occasions, it is recognized that decision-supporting data were available but not known at the moment of decision. This situation also implies that many data entities are collected and stored for no reason at all. It is the intention of this section to give guidance on the process of deciding which data are relevant and important to collect and store in databases. To meet this objective, the approach, as depicted in Figure 7-4, is described in more detail in Figure 7-5 and Figure 7-6. Besides further detailing the risk-performance model, the separation of the system/network level and the asset levels is introduced. This separation results in splitting the discussions between the more theoretical system/network circumstances and the more practically oriented asset needs.

It is on the corporate/system level that the influence of the different stakeholders becomes relevant and the first step of deciding which information is relevant is taken. Stakeholders are not interested in whether an asset operates well; they are interested only in whether the energy supply delivers what was promised: safe and environmentally acceptable energy, profit, and costs.

Figure 7-5 shows a nine-field matrix where the classification of risk depends on the relation of consequences and probabilities of occurrence. The latter is roughly derived from the susceptibility to failures of a circuit, asset, or subsystem. The consequence axis aims to take the different stakeholder categories into account. The approach forms the basis for a number of (commercially available) tools commonly used by some utilities. For reasons of understanding and learning, this section describes the basic approach (including an example). Once understood, an internally developed, spreadsheet-based software support tool will provide most engineers and analysts with good service. Of course, one can make a more detailed matrix (containing 16, 25, or even 36 fields) in order to further detail the decisions to be made.

Figure 7-5 Risk-Based Failure Consequence and Probability Matrix

In the approach, one can recognize the following:

• On the X-axis, the introduction of the different consequences for the different stakeholder areas is apparent. Each (stakeholder) consequence class (low, medium, high) has its own measurable level per stakeholder category. The asset manager decides under which circumstances the different consequences lead to a certain consequence class. A situation with physical long-term effects is easily judged; a combination of medium-classed “medical treatment” and “negative PR” is more difficult and will depend on utility objectives (or history).

• On the Y-axis, the concept of circuit, network, or asset susceptibility is introduced. Susceptibility classification is almost an intuitive process, taking into account the average asset age, the level of redundancy, the influence of polluted/salted air, aggressive soil, distance to sea, and the influence of sand or ice.

By ranking susceptibility and consequences for (all) stakeholders, including penalties and constraint costs, a non-critical, critical, or vital situation is judged.
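The nine-field lookup can be sketched as a small function. This is an illustration under explicit assumptions: the assignment of matrix cells to “non-critical,” “critical,” and “vital” is hypothetical (Figure 7-5 defines the actual one), and combining the stakeholder categories by taking the worst case is only one possible rule, since the text leaves that judgment to the asset manager. The category names in the example are likewise invented.

```python
LEVELS = {"low": 0, "medium": 1, "high": 2}


def consequence_class(stakeholder_consequences):
    """Combine per-stakeholder consequence levels; taking the worst case is an assumption."""
    return max(stakeholder_consequences.values(), key=LEVELS.__getitem__)


def failure_consequence_class(susceptibility, consequence):
    """Map a (susceptibility, consequence) cell of a 3 x 3 matrix to a judgment.

    The cell assignment below is hypothetical; Figure 7-5 defines the real one.
    """
    total = LEVELS[susceptibility] + LEVELS[consequence]
    if total <= 1:
        return "non-critical"
    if total == 2:
        return "critical"
    return "vital"


# Hypothetical circuit assessment
consequences = {"safety": "medium", "societal": "high", "financial": "low"}
worst = consequence_class(consequences)
judgment = failure_consequence_class("high", worst)
```

Extending the sketch to a 16- or 25-field matrix only requires more entries in `LEVELS` and adjusted thresholds.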

Although more rankings are possible, experience shows that a three-level/nine-field approach is already quite complex and results in sufficient sensitivity. With respect to “effectively making use of data,” the figure forms the basis for the choice of which data entities are of relevance to measure at this level. These entities are not limited to this example; in many situations, an intense discussion between experts (the so-called Delphi method) will give access to the relevant items to collect.

The next step is the transformation of this knowledge into the actions to execute on the assets being part of the circuit or system part. This is shown in the matrix given in Figure 7-6. The judgment regarding the “failure consequence class” is placed on the risk axis (Y) of the asset-directed activity matrix. Accordingly, this risk (failure consequence class) is weighted against the condition or performance of the asset. The resulting matrix field gives insight into which decision should be taken regarding maintenance and/or reinvestment priorities. The two matrices are thus linked via the consequence judgment “vital,” “critical,” or “non-critical.” The risk analysis matrix in Figure 7-5 is related to the activity matrix of Figure 7-6 by stakeholder consequence management. The basic idea is to translate the network failure probabilities and consequences for stakeholders into decisions at the component (asset) level. The matrix in Figure 7-6 also provides information about the data needed at the asset level—a qualitative indicator giving information about the condition of the assets belonging to a circuit or subsystem.

Such an analysis is normally based upon the results of PFM technical activities. In ideal situations, a data mining process using all relevant asset information, preferably set against a larger population of measured data from identical or comparable asset types, improves this judgment.

Figure 7-6 Asset-Directed Activity Matrix

The condition class given in Figure 7-6 has two components. One component is given by the PFM-based maintenance and condition measurement/assessment and is referred to as a quality or asset health rating of 1, 3, or 5 (or whatever is practiced). The other component expresses the failure history of the asset (corrective maintenance). If less than one corrective measure per year was necessary, the asset is judged good; less than three per year is medium class; and more than three per year is classified as bad. Of course, the setting of these thresholds depends on the opinion of the organization.
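The failure-history component can be expressed as a simple threshold rule. The sketch below uses the thresholds quoted in the text; the function name is hypothetical, and the treatment of exactly three corrective actions per year (which the text leaves open) is an assumption.

```python
def failure_history_class(corrective_actions, years):
    """Classify failure history from the average number of corrective actions per year."""
    if years <= 0:
        raise ValueError("years must be positive")
    rate = corrective_actions / years
    if rate < 1:
        return "good"
    if rate < 3:
        return "medium"
    return "bad"  # exactly three per year is treated as bad here (assumption)
```

For example, `failure_history_class(20, 10)` corresponds to two corrective actions per year and falls in the medium class.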

By deciding to which failure consequence class a circuit, asset, or system belongs, one can determine which action must be executed to bring the asset to the necessary performance level. If the quality level is too high compared to the failure consequence class, one can decide to decrease investments in maintenance. On the other hand, if the performance is too low given the failure consequences, a decision to increase maintenance or even to refurbish or replace the asset might be logical. Alternatively, one can decide to intervene in the susceptibility class of Figure 7-5 and lower the probability (risk of occurrence) by investing in redundancy or in reducing asset age.
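The comparison between condition and failure consequence class can be sketched as follows. This is a simplified illustration: it assumes that the two scales align one to one (a vital asset calls for good condition, a critical asset for medium, and a non-critical asset tolerates bad). The real cell contents are defined by Figure 7-6, and the names used here are hypothetical.

```python
CONSEQUENCE_RANK = {"non-critical": 0, "critical": 1, "vital": 2}
CONDITION_RANK = {"bad": 0, "medium": 1, "good": 2}


def maintenance_direction(consequence_class, condition_class):
    """Compare asset condition with the performance its consequence class calls for.

    Assumes a one-to-one alignment of the two scales; see Figure 7-6 for the
    actual matrix.
    """
    gap = CONDITION_RANK[condition_class] - CONSEQUENCE_RANK[consequence_class]
    if gap > 0:
        return "decrease maintenance investment"
    if gap < 0:
        return "increase maintenance, refurbish, or replace"
    return "keep current maintenance level"
```

Under these assumptions, a vital asset in bad condition yields a recommendation to increase maintenance, refurbish, or replace, while a non-critical asset in good condition is a candidate for reduced spending.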

Example Using the Assessment Model

To better explain the previous approaches, an example from a utility practice is presented. The following sections describe the example, using the assessment model:

• Pre-service information

• Information regarding service life

• Analysis of the situation

• Decision based upon analysis

Pre-Service Information

In the early 1980s, a 12-bay, double-busbar, gas-insulated substation (GIS) switchyard of the latest generation was placed into service. The 12 bays were divided over two (8/4) busbar sections, according to the single-line diagram in Figure 7-7. The busbar sections were separated by longitudinal sectionalizers, and only the 8-bay section was equipped with a coupling bay. The switchyard was connected with circuit breakers to another older GIS through another busbar system. Although the system was operated at 150 kV, the design was based upon 300 kV (at higher gas pressure).

Figure 7-7 Single-Line Diagram and Gas Sectionalizing Compartments (Inlay)

The installation was connected to one generator, three 50-kV power transformers, three connections that incorporated the substation into the 150-kV grid, and two cable connections that fed a large industrial facility. The inlay of the single-line diagram shows the complex gas sectionalizing arrangement.

During installation and acceptance testing, some high levels of partial discharges (PDs) were detected, and some voltage transformers produced excessive noise. These problems were corrected by the supplier, and follow-up testing was performed to the utility’s satisfaction.

Information Regarding Service Life

Routine PD tests were performed periodically through a special connection and a special GIS testing transformer. During these testing periods, it appeared that it was very difficult to take part of the system out of service. This was caused by the arrangement of the different feeders/grid connections, transformers, and industry connections over the busbars. Another cause was the requirement, from a safety point of view, that it was not acceptable to have rated power on one side of an isolating point and a test voltage on the other (which is the standard situation using single-gap isolators). The routine tests did reveal a relatively high level of PDs. In the first few years of its service life, these PDs were found in the circuit breakers and voltage transformers; in the later period, PDs were also found in the busbar systems. Further investigations indicated that there were probably two causes for this:

• The mounting construction was designed incorrectly and had to be adapted during erection.

• The cleaning of the internal parts was not executed at normal quality.

The first cause resulted in high mechanical stresses on the circuit breakers, voltage transformers, and current transformers during switching operations, causing loose particles to move and insulators to be damaged. The second cause—probably based upon the idea that cleaning was not so critical because of the higher rated (design) voltage—resulted in the presence of many particles in the system. In practice, a number of internal breakdowns appeared over the first 10 years of operation. Damage of gas-tight insulators, in some cases, could result in a personnel hazard due to explosion after opening a compartment for repair. The gas compartmentalization, resulting from logical requirements, made it impossible to open a single compartment if opposite parts were in service or if neighboring compartments were at rated pressure. As a result of this design, small repairs required a significant bus outage.

Analysis of the Situation

Modeling susceptibility for this situation using the upper part of the risk-based failure consequence matrix shown in Figure 7-5, one initially concludes that there was a reasonable level of redundancy (double busbar, two busbar sections, connection to another switchyard, 50-kV supporting grid, and higher rated design voltage). In practice, however, this appeared not to be true because of the distribution of the different incoming and outgoing bays over both busbar sections. Moreover, the configuration (generation plant, heavy industry) did not allow maintenance to freely choose the maintenance periods for inspection and performing diagnostic and condition monitoring tasks. Although the heavy industry and sea coast were nearby (the GIS was mounted in a building), pollution was not an issue nor was the age of the installation. Apart from the busbar configuration, the installation of the system was critical with many (potential) fault situations, and thus the susceptibility was rated as “high.”

Applying the lower part of the risk-based failure consequence matrix found in Figure 7-5 to identify risk from a stakeholder point of view, one can argue from a technical point of view that the mechanical forces experienced during operation caused a considerable increase in the aging process as evidenced by:

• Cracks in insulators

• Loosening bolts in voltage transformers

• Loosening bolts in parts of circuit breaker constructions

• Short circuits in primary circuits of current transformers

• Internal breakdown from loose free particles

This situation led to, at minimum, the judgment “medium aged” after only 10 years of service. There was also an incident where experienced personnel were almost injured during the opening of the system when an insulator disk broke. This created the possibility of an internal breakdown during routine operation, and thus the health and safety index was set to “medium” (with a strong tendency to set it to “high”). Societal damage and image impacts were rated as “severe” because the loss of production in the heavy industry had significant regional economic impacts and would definitely tarnish the utility’s image. Finally, the switchyard was very clearly situated in a “heavy industry” area, also the highest category. Including all the specific consequences for stakeholders resulted in a failure consequence class rating somewhere between “critical” and “vital,” as shown in the left matrix of Figure 7-8—a rather harsh rating for such a young installation.

Figure 7-8 GIS Example, Transforming Information into Consequence and Activity Matrices

Transposing this same information to the asset-directed activity matrix found in Figure 7-6, it is now essential to have a proper assessment with respect to the expected performance of the switchgear. One of the measurable elements is the number of forced outages per year that are caused by the system and followed by corrective maintenance. In the period of analysis, this was between one and three failures per year, sometimes minor, sometimes major. Although no complete PFM technical analysis was applied, standard maintenance activities such as functional testing (for protection and control), PD measurement, and breaker opening and closing velocity measurements were applied regularly. This information revealed that, apart from the critical situation with respect to PDs, large time delays for circuit breaker pole openings were also routinely observed. Theoretically, this could have caused overvoltages at the terminals of the connected 150-kV transformers and could have led to more extensive damage. The general conclusion (see the right matrix in Figure 7-8) was that the performance of the asset was poor and, even worse, could not be improved because of limitations for taking it out of service for refurbishment.

Note: Where Figure 7-8 gives a more or less qualitative approach, a quantitative analysis based upon the same example will be described later.

Decision Based Upon Analysis

As shown in the analysis, the utility clearly had a situation that needed to be rectified. A station with a very critical/vital failure consequence classification had a very low performance rating. Based upon a discussion like the one shown in the decision evaluation matrix of Table 7-1 and given specific situations with respect to expansion as well as decreasing prices of new switchgear, a dramatic decision was made to replace the switchyard completely with a new installation (after only 20 years of service). The lessons of the previous situation were taken into account. Sectionalizing was optimized over feeders and large customers, mounting design was scrutinized and improved (same building), and enhancements were made to facilitate the testing and maintenance of the bus during service.

Technical PFM Interaction

Given the approach described in Figure 7-6, performing a PFM FMECA becomes necessary. Properly applied, this analysis provides information on the current and expected performance of an item. A specific FMECA approach, based upon functional analysis, may help set the proper actions for the improvement of performance. This approach is shown in Figure 7-9 where switchgear is analyzed. This approach can also be applied to all other asset types and is thoroughly described in a previous section of this report. The type of FMECA presented in Figure 7-9 is not based upon collecting all possible data but focused on the potential hazards or defects that can occur. It is mainly based upon answering questions such as:

• What are the main functions of the asset?

• How can the internal functional relationships be described?

Data needs are thus limited compared to the more detailed, parts-oriented PFM technical analysis.

Figure 7-9 PFM Technical Analysis for Switchgear

The approach forces one to judge the specific outputs that interact with the different functions, to validate them, and to describe the risk of occurrence and the effect of failure on the system as a whole. The end result is a set of condition measurements and preventive actions that describe the current quality of the asset and its expected performance.

The PFM technical analysis is an important part of risk analysis. A simple example of an FMECA is provided to strengthen these relationships and concepts. The FMECA approach is an inductive method of performing a qualitative asset reliability or safety analysis. It addresses the failure behavior and criticality of a specific stand-alone asset (one that is not related to a system). The FMECA study also forms the necessary basis for deciding which measurements can be of use in a CBM process.

The FMECA process determines the critical parts of an asset. Logically, each type of switchgear will have its own FMECA, based upon the functional division, as shown in Figure 7-9. Addressed subsystems for this example include:

• Mechanical

• Dielectric – primary

• Control – secondary

• Stored energy system

Each subsystem has its own unique primary function and contains all components that provide this function. For example, the dielectric subsystem contains all components such as oil, epoxy, SF6 gas, and insulating material that provide the insulation of the high-voltage (HV) parts of the switchgear to ground. The inventory of components and failure information is compiled through brainstorming sessions with technicians, by analyzing event reports and archives, by observing maintenance activities, and through discussions with manufacturers and (preferably) other users.

All components and their failure behavior are weighted by the four criteria as described hereafter. All criteria are defined from a technical and a more practical point of view. As the risk element is already covered (or will be) by the described approach, the FMECA does not consider this. Consequences at the stakeholder level are thus not taken into account, which significantly eases the analysis. The score for each component is achieved by using the Delphi method in brainstorming sessions and is used to estimate the necessity to address the possible hazard by maintenance or design action, either preventive or predictive. Of course, these actions give, at the same time, a good impression of the asset quality/expected performance. The abbreviations and possible values that are used in Table 7-2 are as follows:

• Failure frequency (F)

– 0 = No failure has occurred yet

– 1 = Failure incidentally occurs (< once a year)

– 2 = Failure frequently occurs (> once a year)

• Impact of failure on energy delivery (I)

– 1 = No impact

– 2 = Has impact only on the circuit that is directly connected to the switchgear

– 3 = Has impact on more than just the circuit that is connected to the switchgear

• Corrective costs (C)

– 1 = None

– 2 = < $5,000

– 3 = > $5,000

• Environmental impact (E)

– 1 = Technical impact only

– 2 = Impact on personnel and environmental safety

The final score for each component is achieved by multiplying the scores of the four criteria. The weight factors depend on a company’s philosophy and are changeable. For example, in this case, the safety impact of a failure weighs twice as much as the technical impact but could be higher if a utility has safety as a key utility objective. Another example is the failure of a protective relay, which has no impact on safety but has consequences for the circuit, which is connected to the switchgear.
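The scoring rule can be sketched as follows, using the value ranges listed for the four criteria. The function name and the component scores in the example are hypothetical and serve only to show the arithmetic.

```python
ALLOWED = {
    "F": {0, 1, 2},   # failure frequency
    "I": {1, 2, 3},   # impact of failure on energy delivery
    "C": {1, 2, 3},   # corrective costs
    "E": {1, 2},      # environmental impact
}


def fmeca_score(F, I, C, E):
    """Multiply the four criterion scores after checking them against the listed values."""
    values = {"F": F, "I": I, "C": C, "E": E}
    for name, value in values.items():
        if value not in ALLOWED[name]:
            raise ValueError(f"{name}={value} is not one of {sorted(ALLOWED[name])}")
    return F * I * C * E


# Hypothetical components and scores, for illustration only
components = {
    "auxiliary contact": fmeca_score(F=1, I=2, C=2, E=1),  # 4
    "trip coil": fmeca_score(F=2, I=3, C=2, E=1),          # 12
    "wiring": fmeca_score(F=0, I=2, C=2, E=1),             # 0: no failure observed yet
}
ranking = sorted(components, key=components.get, reverse=True)
```

Because the criteria are multiplied, a component that has never failed (F = 0) scores zero regardless of the other criteria, and the highest possible score with these ranges is 2 × 3 × 3 × 2 = 36.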

Finally, a classification should be determined according to the utility’s ability to detect a deviation, to measure the condition, or to replace the part causing the problem. A component should be replaced on a regular basis if its failure mode is not detectable or worthwhile to measure. With respect to the detectable and diagnosable critical components, an optimized inspection and diagnostic program can be determined, including the specific inspection points and measurement types.
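This classification step reduces to a simple rule, sketched below. The function and argument names are hypothetical, and a real program would also record the specific inspection points and measurement types for each component.

```python
def maintenance_approach(detectable, worthwhile_to_measure):
    """Pick an approach for a critical component from its detectability classification."""
    if detectable and worthwhile_to_measure:
        return "inspection and diagnostic program"
    # Failure mode cannot be detected, or measuring it is not worth the effort
    return "scheduled replacement"
```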

Table 7-2 shows an example of an FMECA for a switchgear control (secondary) subsystem. The main function of the control subsystem is to protect and operate the switchgear at the right moment. Components of this subsystem can be defined as necessary parts that contribute to this main function. The failure description of the component describes each situation that deviates from the component’s healthy situation. In this example, three deviated situations are described for auxiliary contacts as part of the switchgear control subsystem. Oxidation, burned contacts, and loose wiring are deviated situations. These deviated situations are defined as touchable and visible.

The root cause must have a causal relationship with the deviated situation. For example, low-contact pressure for the auxiliary contacts is not a deviated situation until it damages the contacts. Only if low-contact pressure results in a burned contact is there a causal relationship between low-contact pressure and the burned contacts. In this case, the low-contact pressure is defined as the real cause of the burned contacts as a deviated situation. The effect of the deviated situation has to be defined from the point of view of the function of the component within the subsystem. In this example, moisture causes the oxidation of the contacts, which might cause a malfunction or even inoperability of the switchgear. Another example is the failure of the trip coil. The trip coil translates the control command into the operation of the mechanical drive mechanism. A poorly moving plunger in the trip coil is a well-known failure caused by oxidation of the plunger. As such, the plunger itself is not the deviated situation; the oxidation caused by a high moisture level is. The effect is deviated or delayed switchgear operation.

Table 7-2 FMECA of the Switchgear Secondary (Control) System

FMECA for Secondary Subsystem Switchgear Type (weight factor: F x I x C x E = W)

| Component | Deviated Situation | Cause | Effect | F | I | C | E | W | Measurement/Maintenance | Measurable Quantity |
|---|---|---|---|---|---|---|---|---|---|---|
| Auxiliary contacts | Oxidated contacts | Moisture | Refused command | 2 | 3 | 2 | 1 | 12 | Visual inspection | Coil time |
| Auxiliary contacts | Burned contacts | Low-contact pressure | Refused command | 1 | 3 | 2 | 1 | 6 | Trip coil measurement | Oxidation level wiring connection |
| Auxiliary contacts | Loose wiring | Vibration | Refused command | 2 | 3 | 2 | 1 | 12 | | |
| Trip coil | Burned coil | Deviated powering | Refused switchgear operation | 1 | 3 | 1 | 1 | 3 | Visual inspection | Coil surface |
| Trip coil | Oxidated plunger | Moisture | Deviated switchgear operation | 2 | 2 | 1 | 1 | 4 | Trip coil measurement | Coil time and curve |

Within this framework of conceptual and practical thinking, all components must be analyzed according to the subsystem division. The weight factors are determined for all components as described. Depending on the score on each of the four criteria, different failures on the same component can have different final weight factors. For example, a deviated situation of auxiliary contacts has low impact on environmental safety but might cause a trip failure in the case of isolating a damaged power cable. In this situation, the damaged power cable will be switched off by the backup breakers, resulting in the switching of an unnecessarily large area caused by an unresponsive switching command. For this reason, a deviated situation of the auxiliary contact has impact on more than just the control circuit that is connected to the switchgear.

Based on the classification of the components and their associated weight factors, the necessary maintenance activities can be determined. The deviated situation of auxiliary contacts can be detected by visual inspections, functional tests, timing measurements, or the trip coil resistance measurement. From a technical point of view, the information entered into the “measurement/maintenance” and “measurable quantity” cells describes the total maintenance and measurement activities for the specific type of switchgear, except for their frequency. The frequency for applying these maintenance or measurement activities depends on the component’s aging process and the accepted level of risk.

Table 7-3 covers the driving subsystem of the circuit breaker. Its main function is to store energy for operating the main contacts and to actuate the circuit breaker’s main contacts with the right acceleration and velocity. The functional demand on the drive system is to supply the right amount of stored energy to achieve the right switching curve.

Table 7-3 Part of the FMECA Drive System Circuit Breaker

Driving Subsystem Circuit Breaker (CB) Type (weight factor: F x I x C x E = W)

| Component | Deviated Situation | Cause | Effect | F | I | C | E | W | Measurement/Maintenance | Measurable Quantity |
|---|---|---|---|---|---|---|---|---|---|---|
| Closing springs | Decreased spring energy | Springs overwound for a long period | Slow closing or failure to close | 1 | 2 | 3 | 1 | 6 | CB timing; minimum energy check; visual inspection | Speed and contact time; CB operates in minimum energy position; distance between spring housing and spring deviated compared to other CB/OK/wrong distance but repaired |
| Motor | Burnout | Overheating of windings | Closing springs are unwound; motor failed to operate | 1 | 2 | 2 | 1 | 4 | Functional test | Motor operates/does not operate/does not operate and replaced |
| Motor | Broken brush caps | Aging or overheating of holder | Closing springs are unwound; motor failed to operate | 1 | 2 | 2 | 1 | 4 | Visual inspection | Broken/unbroken but replaced |
| Gearing | Broken tooth | Jumped mechanism; loose mounting of the motor | No winding of the spring; strip the gear | 1 | 2 | 2 | 1 | 4 | Visual inspection | Broken tooth/OK/broken tooth and replaced |
| Spring winding gear/worm gear | Incorrectly greased | Grease supplied to the spring winding gear | Disruption of self-lubricating system | 2 | 2 | 1 | 1 | 4 | Maintenance | Remove grease |
| Spring housing | Loose bolts | Aging/vibration | Failure to wind the springs | 0 | 2 | 1 | 1 | 0 | Visual inspection | Tight/loose but tightened |
| Close latch cam | Stiff roller | Drying of grease | Failure to close | 0 | 2 | 2 | 1 | 0 | Functional test | Free/stiff but freed |
| Closing latch block | Stiff latch block rollers | Drying of grease, incorrect use of grease | Failure to close | 1 | 2 | 2 | 2 | 8 | Functional test | Free/stiff but freed |

As will be shown later, the PFM technical analysis results are of major importance for collecting condition/performance asset data in a way that minimizes human interference. Avoiding individual and subjective opinions in this way is necessary for a sound analysis of each situation.

Finally, and just to show that the approach is not limited to switchgear or circuit breakers, a model for transformers is shown in Figure 7-10.

Figure 7-10 PFM/FMECA for a Transformer

Practical Application of the Risk Assessment Approach

The approach described in the previous section should be applied within asset management departments. Working through this step-by-step process is very instructive for understanding the difficulties met when implementing risk-based asset management. The approach fosters a constant awareness that a sustainable asset management process considers technical and economic as well as sociological and ecological/environmental aspects. A balance must be found between all stakeholder values and requirements. Achieving effective asset management and a good service level is nothing more or less than finding a proper balance in satisfying all, often conflicting, stakeholder needs. Unfortunately, senior utility management often faces a lack of well-prepared information, skills, and decision-supporting tools. In some instances, this may lead to polarized strategies that fail to satisfy the previously mentioned objectives of sustainability and overall stakeholder satisfaction. Corrective measures (that is, from regulatory or other bodies) that the asset manager would prefer to avoid then become necessary. The information in this section addresses the challenge of how to change data into information that can support the decisions relevant to the asset manager.

The Decision Process Considering Various Scenarios

A structured approach that turns data into asset decision information, following the three decision levels of Figure 7-3, is given in Figure 7-11. It proceeds step by step, starting with information at the asset level, followed by the influence at the system/network level and the consequences at the corporate level.

Figure 7-11 Information Flow and Processing

Based upon this detailed approach, multiple scenarios can be found to influence the performance at all three levels—increasing or decreasing maintenance and inspection intervals, replacement or refurbishment, and changing the maintenance strategy. Also, a combination of technical information with relevant economic data and future system performance will result in quantified benefits. Benefits are not exclusively expressed in economic terms, but also in terms of reliability. On the corporate level, balancing the cost and benefits of the scenarios with the risk involved with each scenario will result in the strategic decision that has the best-managed risk. On this level, all stakeholders’ expectations can be taken into account when they are formulated as business values, resulting in a set of performance indicators to give expression to those expectations.

The societal and economic information, together with the reliability of the equipment inventory and the system performance, form the ingredients of the risks involved. Balancing the risks will lead to the final decision.

The asset management decision process that is supported by this information flow model consists of three assessment types, which, if applied correctly and completely, support decisions while taking account of the system-oriented risk and the asset-based activity.

Assessment Steps

Based upon previous statements and having developed an approach for network/asset risk assessment (see the matrices in Figures 7-5 and 7-6), a controllable assessment process has to be started. This process is based upon dedicated assessments that will ultimately lead to a proper decision. Each assessment step is defined as follows.

Susceptibility Assessment

A susceptibility assessment focuses on topics such as:

• Distance to sea

• Polluted air

• Specific soil situation (for cable connections)

• General age of circuit or asset

• Level of meshed network

• Level of redundancy

Consequence Assessment

A consequence assessment should cover all mentioned aspects shown in Figure 7-6, including:

• Economic damage, including penalties and constraint costs

• Aging

• Societal assessment

• Issues such as utility image, safety, and environmental hazards

• Relationship with respect to number and type of customers

The combination of susceptibility and consequence assessment will lead to a proper failure consequence class (criticality level) that, in turn, is weighted against the results of the technical and economic assessments.

Technical Assessment (of Expected Asset Performance)

The technical assessment will be the basis for a decision on the asset level and focuses on topics such as:

• Technical remaining life

• Condition of the asset

• Dielectric strength

• Condition of the mechanical controls and parts

Data mining techniques are applied to generate the best possible knowledge about the norms and criteria used for judging the quality level of the asset. Aging/degradation processes can thus be monitored, which might influence both the susceptibility of the circuit for disturbances and the condition of the specific asset for PdM and CDM activity decisions. This process will improve significantly if asset condition information is shared between asset owners.

This process will lead to a positioning in one of the fields in the activity matrix and will be followed by an economic assessment to make an optimal decision with the best overall effects.

Economic Assessment

The expenses related to the operation of the asset are reviewed. Life-cycle costing (LCC) is one way of looking at the economic settings of an asset. Other important topics that are taken into consideration in this stage are:

• Depreciation

• Weighted average costs of capital (WACC)

The benefits related to the operation of the asset are considered in the approach illustrated in Table 7-1, in both financial and more practical terms, for all preferred scenarios.
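As a small illustration of how WACC can enter such an economic comparison, the sketch below discounts a series of yearly cash flows to a net present value. The report does not prescribe this calculation; the function name `npv`, the 8% rate, and the cash-flow figures are hypothetical and serve only to show the standard discounting formula.

```python
from typing import Sequence


def npv(rate: float, cash_flows: Sequence[float]) -> float:
    """Net present value of yearly cash flows.

    cash_flows[0] occurs at t = 0 and is not discounted; cash_flows[t]
    is divided by (1 + rate) ** t. The rate would typically be the WACC.
    """
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical scenarios: an up-front investment followed by yearly net benefits.
wacc = 0.08  # assumed value, for illustration only
refurbish = [-400_000, 90_000, 90_000, 90_000, 90_000, 90_000]
replace = [-900_000, 180_000, 180_000, 180_000, 180_000, 180_000]

for name, flows in (("refurbish", refurbish), ("replace", replace)):
    print(name, round(npv(wacc, flows)))
```

A full life-cycle costing would add maintenance, outage, and disposal costs over the asset's remaining life; the sketch only shows the discounting step.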

The Decision Support Model – Functionality and Structure

Based upon the discussions in the previous sections, a model, as shown in Figure 7-12, can be applied. Although basically a transformation of the approach, as described in Figure 7-5 and Figure 7-6, it also takes into account the detailed decision process model, as shown in Figure 7-11.

Figure 7-12 Five Steps Decision Flowchart for Asset Management (AM) Decision

The process itself starts with a profiling phase to get a better picture of the stakeholder’s requirements. In the decision stage, it will become clear how well the advice complies with the objectives, given the set of stakeholder demands and wishes. Apart from the very important analysis of the stakeholder’s expectations, an analysis of the environmental situation and system development stage should clarify the susceptibility of the judged system/asset. Both analyses are partly characterized by “fuzzy” elements and thus are strongly dependent on the engineer’s or manager’s opinion regarding these subjects.

During the analysis of the (expected) technical performance, the asset is analyzed by assessing the influence of the condition measurement results and the historic information regarding failures. If relevant industrial failure data can be used here, an assessment based upon a larger data population of the same asset type is preferable. A very practical way of approaching a not-too-detailed analysis is to introduce the categories bad, medium, or good (or, in figures, 1, 3, or 5). The same approach can be used with the economic and societal assessment, although this will be strongly influenced by utility objectives. The category division is not fixed and can be adapted to the specific situation. In the final step, the position of the asset in regard to its necessary/expected performance is clarified. Once that position is confirmed, the subsequent analysis stage is based solely upon the level of value added to the stakeholder’s requirements, as shown in the processes described in Table 7-1.

Because it is beyond the scope of this report to describe the complete scheme, a single example (see Figure 7-13) is given based upon the assessment of the susceptibility class of a cable circuit. To guarantee a certain quality of service level, some redundancy is applied. To assess how well redundancy is implemented in the case of cable connections, two measurable entities are taken into account:

• The time needed for switching of the short-circuit, τ

• The location of the asset within the circuit

The ranges of the three categories of the τ-parameter were defined from data obtained with the Delphi method:

• τ ≤ 1 ms category 5

• 1 ms < τ ≤ 1 s category 3

• τ > 1 s category 1

The categories of the other parameter are defined by the three grid configurations that are commonly used:

• Radial configuration category 1

• Loop configuration category 3

• Meshed configuration category 5

The evaluating method used to obtain a value for the redundancy factor is displayed in Figure 7-13.

Figure 7-13 Flowchart for Determining Redundancy Factor

The following equations show the decision rules that can be applied and the results in the redundancy factor (RF):

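$$
\left.
\begin{array}{l}
Q_1 \in \{1, 3, 5\};\ Q_2 = 1 \\
Q_1 = 1;\ Q_2 = 3
\end{array}
\right\}\ RF = 1 \qquad \text{Eq. 7-1}
$$

$$
\left.
\begin{array}{l}
Q_1 = 1;\ Q_2 = 5 \\
Q_1 = 3;\ Q_2 = 3
\end{array}
\right\}\ RF = 3 \qquad \text{Eq. 7-2}
$$

$$
\left.
\begin{array}{l}
Q_1 = 1;\ Q_2 = 5 \\
Q_1 = 3;\ Q_2 = 5 \\
Q_1 = 5;\ Q_2 = 5
\end{array}
\right\}\ RF = 5 \qquad \text{Eq. 7-3}
$$

where:

$Q_1$: Duration of the service down period ($\tau$)

$Q_2$: Net configuration

$RF$: (Value) redundancy factor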

These decision rules can also be expressed in text in the following manner:

• A mesh configuration results in a redundancy factor of 5, as long as the switching time is 1 second or less.

• A radial configuration results in a redundancy factor of 1, independent of the switching time.
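The category definitions and decision rules can be sketched as a small lookup in Python. This is an illustration under stated assumptions, not the report's implementation: the function and table names are hypothetical, and the rule table is pieced together from Eq. 7-1 to Eq. 7-3 and the two text rules above. The combination Q1 = 1, Q2 = 5 (meshed grid, switching time above 1 second) appears under both Eq. 7-2 and Eq. 7-3; the sketch assumes the value 3, following the text rule that a mesh yields 5 only when the switching time is 1 second or less. No rule is assumed for a loop configuration with τ ≤ 1 ms, so that combination raises an error.

```python
# Category of the net configuration (Q2), as listed in the text.
CONFIG_CATEGORY = {"radial": 1, "loop": 3, "meshed": 5}

# Assumed rule table keyed by (Q1, Q2); see the lead-in for the caveats.
RF_RULES = {
    (1, 1): 1, (3, 1): 1, (5, 1): 1,  # radial: always 1
    (1, 3): 1,                        # loop, slow switching
    (3, 3): 3,                        # loop, medium switching
    (1, 5): 3,                        # meshed, slow switching (assumption)
    (3, 5): 5, (5, 5): 5,             # meshed, switching time <= 1 s
}


def tau_category(tau_seconds: float) -> int:
    """Category Q1 of the switching time tau, given in seconds."""
    if tau_seconds <= 1e-3:
        return 5
    if tau_seconds <= 1.0:
        return 3
    return 1


def redundancy_factor(tau_seconds: float, configuration: str) -> int:
    """Look up the redundancy factor RF for a switching time and grid type."""
    q1 = tau_category(tau_seconds)
    q2 = CONFIG_CATEGORY[configuration]
    try:
        return RF_RULES[(q1, q2)]
    except KeyError:
        raise ValueError(f"no rule assumed for Q1={q1}, Q2={q2}") from None


print(redundancy_factor(0.5, "meshed"))  # meshed, tau within 1 s
print(redundancy_factor(5.0, "radial"))  # radial, any tau
```

Keeping the rules in a table rather than in nested conditions makes it easy to adjust individual combinations once the Delphi sessions have settled them.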

By addressing the other factors (such as environmental conditions and age) in the same way and by deciding about the mutual weight, ultimately, a circuit or subsystem falls into one of the three categories: high, medium, or low. Applying the same systematic approach to the stakeholders’ requirements will give comparable results. A weighting of the situation into “vital,” “critical,” and “non-critical” is the result of the complete analysis.

Decision Model Supporting Effective Use of Data

Based upon the previously described approach, a decision support system can be applied, either commercially purchased or internally developed. All these tools are more or less based upon the previously given description and need the following as input:

• Specific parameter data, based upon a well-designed and detailed discussion with regard to the measuring parameters

• A pre-design decision regarding the different weights of the influencing factors

• A pre-design decision regarding inter-related weights of the different factors

The described risk and activity analysis is strongly dependent on the use of accurate data. The process identifies which data are essential for analysis and identifies the possibility of ignoring data that are not relevant.

Considering the division into technical, economic, and social/societal areas, it is best to decide which data to collect per area. In the first stage of risk analysis (see Figure 7-5), information regarding environment and relevant position to stakeholders is of importance, while technical information differs from the activity matrix (see Figure 7-6). Economic issues are taken into account later in the decision evaluation process for different scenarios. This stage can be seen as a discussion of “where and how” to spend money and will be strongly influenced by utility philosophies. It is essential that top management understands this process and does not limit its target to financial results alone.

In the second stage, the activity matrix, the technical data describe the condition/performance of the asset. Depending on the number of assets per type, these data can be improved by sharing them. Whereas the other analysis issues discussed are partly influenced by subjective opinions, the technical assessment should be made as objective as possible.

The final outcome of the model is a set of advisable actions, which can take several forms. One direction is to improve asset quality and bring the asset to a less critical level in view of its position in the system. Obviously, this can be realized by replacement, refurbishment, or intensified maintenance. The other direction is to accept the expected (limited) performance but decrease the susceptibility of the system/circuit and, in that way, bring the asset to a less critical position. This could be realized by making better use of the connected (lower voltage) grid, by protecting the switchgear against ambient conditions (building), or by improving the redundancy situation.

As long as the decision attempts to balance all stakeholders’ requirements, the best decision under the circumstances will be taken. Sensitivity analysis of different scenarios will support the decision on how best to spend the money and in which order of priority.

The Decision Support Model, a Practical Example

The GIS substation situation, as described earlier in the qualitative example, is used to give a further practical quantitative example of the application of the different models presented. The example follows the AM flowchart decision model found in Figure 7-12. The process illustrated there can be applied in the following analysis stages:

1. Analysis regarding the susceptibility. In the assessed situation, age and pollution were not an issue (building), nor was soil quality. Redundancy, however, was considered to be very limited due to the restricted possibility of using the lower voltage grid, the specific design with respect to bay distribution, and the position of the longitudinal disconnectors. The susceptibility rating was also influenced by mounting difficulties experienced during the erection stage, and an overall susceptibility rating of medium to bad quality (4) was given (see Figure 7-14).

2. Stakeholder consequence analysis. The stakeholder situation was judged quite severely, considering:

• The possible effects of SF6 pollution on both personnel and the environment.

• The image consequences, which were very negative and, in principle, could mean the loss of a major customer.

• From a societal point of view, no further harm to the industrial area was acceptable; the area was considered to have the largest concentration of employers in the region.

• Aging processes in the specific equipment were very evident and could not be stopped due to the circumstances; moreover, the regularly occurring overvoltages had the potential of damaging/aging the connected transformers.

• Direct damages to the surrounding industries were in the range of $1–5 million.

As such, the situation was judged very seriously—level 5.

3. Failure consequence class. Susceptibility was rated medium to bad while the consequence category was set on high. This led to the assessment conclusion of “vital.”

4. Transposing the failure consequence assessment result (in this case, “vital”) to the risk analysis matrix provides a basis for determining the expected performance level. This expected situation is compared to the actual asset performance, based upon PFM-related condition measurements and analyses. In the assessed situation, the condition class was considered to be level 1 because of high PD levels and inconsistent pole velocities of circuit breakers.

Figure 7-14 Flowchart Example End Result

In the final analysis step, the result of the assessment becomes clear: one either has to do something about the asset quality or bring the situation to a lower failure consequence class. In this practical example, the position of the switchgear was clearly in the top right corner of the matrix and an action was urgently needed.

Table 7-4 Final Step: Decision With Action

Scenario Process Decision Evaluation Matrix with Stakeholders’ Preferences

| Scenario | Alternative | Flexibility Class | Safety Class | Reliability (%) | Durability Class | NPV/EVA (USD) |
|---|---|---|---|---|---|---|
| 1 | A | ++ | + | 99.84 | + | X |
| 1 | B | - | + | 96.70 | - | Y |
| 2 | C | + | ++ | 99.00 | + | Z |
| 2 | B | - | + | 96.00 | - | W |

It was apparent that the current layout of the busbar was inadequate and three designs were considered (alternatives A, B, and C). From a technical point of view, there were only two general options for each design: complete refurbishment or replacement (scenarios 1 and 2). While this left the possibility of six different solutions, only four combinations were practical. The four solutions for rectifying the situation are listed in Table 7-4. Although the NPV of all alternatives was low, the two different approaches associated with design “B” (scenarios 1 and 2, alternative B) were considered unacceptable because of lack of flexibility and low reliability. From the remaining design alternatives, which were basically both applicable because of complete erection alongside the original switchgear, alternative A won mainly because of flexibility.
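The screening described above can be mimicked with a short Python sketch. It is only an illustration: the numeric mapping of the class symbols, the 99.0% reliability threshold, and all names are assumptions introduced here, and the NPV/EVA column is left out because Table 7-4 gives only the placeholders X, Y, Z, and W.

```python
from dataclasses import dataclass

# Assumed numeric reading of the class symbols used in Table 7-4.
CLASS_SCORE = {"++": 2, "+": 1, "-": -1}

# Assumed screening threshold; the report only calls design B's reliability "low".
MIN_RELIABILITY_PCT = 99.0


@dataclass(frozen=True)
class Option:
    scenario: int
    alternative: str
    flexibility: str
    safety: str
    reliability_pct: float
    durability: str


# Rows of Table 7-4 (without the NPV/EVA placeholders).
OPTIONS = [
    Option(1, "A", "++", "+", 99.84, "+"),
    Option(1, "B", "-", "+", 96.70, "-"),
    Option(2, "C", "+", "++", 99.00, "+"),
    Option(2, "B", "-", "+", 96.00, "-"),
]


def acceptable(option: Option) -> bool:
    """Screen out options with poor flexibility or low reliability."""
    return (CLASS_SCORE[option.flexibility] > 0
            and option.reliability_pct >= MIN_RELIABILITY_PCT)


def preferred(options):
    """Among the acceptable options, prefer the most flexible one."""
    shortlist = [o for o in options if acceptable(o)]
    return max(shortlist, key=lambda o: CLASS_SCORE[o.flexibility])


print(preferred(OPTIONS).alternative)
```

With these assumptions, both design B options are screened out and flexibility decides between the remaining two, which mirrors the reasoning given in the text.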

Condition Analysis

It is obvious that the condition of an asset forms an important trigger when deciding which activity should be performed. Depending on the risk axis (failure consequence class), as shown in Figure 7-6 and as referred to in Figure 7-2, this is in fact the only steering entity/variable of the approach (maybe apart from redundancy). Information about the condition (present and future performance) can be collected from available industrial figures and relevant measurement data. Industrial data provide information about generic failure rates, minutes of energy not supplied, mean times between failures of specific switchgear principles/drives, and more. These data can be very valuable but say nothing about the specific condition of the specific device/asset. The latter information is based upon the earlier described PFM technical analysis, which identified the performance entities to measure and the norms to apply.

Consistency of Information

Deciding upon the performance data necessary to make a rigorous asset condition analysis is only part of the problem. Although this process is necessary to limit the quantity of data to those relevant for the decision, one should also guarantee the quality of the data. This means providing intensive support for maintenance technicians performing measurements and maintenance.

Maximum Quality of Condition Information

Determining the condition of an asset is not dependent solely on the local situation. One can use information from other assets as well, especially assets of the same type and with the same operational requirements, thus avoiding being too subjective in this part of the decision process. A maximally objective decision regarding asset performance is supported by broadly collected asset condition information that is analyzed and translated into knowledge rules and norms. Some commercially available software tools support this idea of a generic or generally applied data bank/database with the sole objective of storing a maximum amount of measured condition data for analysis purposes. The basic idea is shown in Figure 7-15.

Figure 7-15 Exchange of Condition Data

As a result of this joint initiative, participants can expect a far better judgment that supports an improved decision regarding the risk-based activity to be executed on the asset.

Data Mining and Decision Support

In general, a data mining process involves the earlier stages of data selection and data transformation and the subsequent stages of validation and interpretation. Data mining aims to provide an alternative to a data analysis based upon hypothesis and theory. The idea behind data mining is to find intelligible patterns that are not predicted by the established theories. Formatting the output data in a visual form that human intelligence can interpret is important, especially the evaluation of existing experiences in condition monitoring, including validity and relevance of results.

The full potential of condition and maintenance information cannot always be realized by using traditional techniques of data handling and analysis. There are often underlying trends or features of the data that are not evident from the usual analysis techniques. Such details and trends can be important for the assessment of the equipment operation. Increasing requirements from operators and asset managers force utilities to fully exploit the capabilities of these data to optimize the use of an HV electrical plant. The method of extracting full value from such extensive databases using new analysis techniques is commonly called data mining. In essence, data mining is the application of relatively new data-driven approaches to find patterns in data obtained from (in this case) electrical equipment. The data mining techniques are then used to relate these patterns to the operational condition of the equipment and to provide new knowledge about aging mechanisms and norms and, as such, to gain control of maintenance activities. To support the data mining process, the utility may apply a knowledge expert database in which all the measured data and failure information are stored and analyzed. Figure 7-16 shows an example of such an analysis tool that is based upon a data mining expert system. The general features of this approach will be shown later.

Figure 7-16 Example of Analysis Tool

Practical Example of Data Mining: Cable Condition Assessment

For the classification of the conditional quality of HV assets, rejection levels are necessary to determine the necessity of CDM actions or replacement. For example, a PD source is not always harmful to the asset’s insulation in the short term. To distinguish the condition state of an asset, decision support needs to be available. One way to develop this decision support is by using statistical analysis on large numbers of field measurement data. This will help determine experience norms (rejection levels) for different diagnostic properties applicable for condition assessment of HV assets, in order to find a basis for rejection of an asset.

Figure 7-17 Schematic Structure of Data Mining Process

During routine maintenance activities, large amounts of condition-related data from different components are collected. These data can be used to perform data analysis on different levels. This data mining is carried out using a condition database, which contains all the information of the asset characteristics and its related diagnostic information. From the collected data in the database (see Figure 7-17), statistical distributions for analysis can be obtained for different diagnostic properties. Furthermore, between different diagnostic properties and component failures, correlations can be obtained that justify condition assessment actions. This process provides three outcomes (A, B, and C) of the data mining approach:

• Outcome A refers to new knowledge about aging mechanisms of power cable components.

• Outcome B refers to recommended maintenance activities on power cable components, resulting from the database analysis.

• Outcome C refers to the operating norms and criteria that, owing to the large amount of measurement data stored in the software data system, are continuously updated and fed back to workers in the field.
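One simple way to derive an experience norm from the statistical distribution of a diagnostic property is to take a high percentile of the stored field measurements. The sketch below assumes that approach; the report does not state how its norms are computed, so the 95th percentile, the function name `experience_norm`, and the sample PD magnitudes (in pC) are all hypothetical.

```python
import statistics
from typing import Sequence


def experience_norm(values: Sequence[float], percentile: int = 95) -> float:
    """Return the given percentile of the measured values.

    Values above the returned level would be flagged for further
    assessment. The percentile choice is an assumption of this sketch.
    """
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # quantiles() with n=100 returns the 99 percentile cut points.
    cut_points = statistics.quantiles(values, n=100, method="inclusive")
    return cut_points[percentile - 1]


# Hypothetical PD magnitudes (pC) for one component type in the database.
pd_magnitudes = [120, 150, 180, 200, 240, 260, 300, 350, 420, 900, 2500]

norm = experience_norm(pd_magnitudes)
flagged = [v for v in pd_magnitudes if v > norm]
print(norm, flagged)
```

As more measurements are added to the database, recomputing the percentile updates the norm, which corresponds to the continuous feedback described under outcome C.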

When PFM technical analyses on power cables are performed, the result is that the major failure causes are related to damages of an external nature (digging activities in the ground) and internal insulation problems in the cable and its accessories. Analyzing the frequently occurring defect types and the degradation modes of the different insulation materials, the material degradation in the cable network can be categorized into four local degradation processes that are related to PDs. As a result, PD characteristics provide a sensitive parameter to detect degradation processes in the power cables. With PD diagnostics, the insulation defects can be pinpointed to a specific component of the cable system. The location of the PD sources along the cable length can be analyzed by time domain reflectometry.

Condition Analysis of Power Cables

Decision making for the condition assessment of power cables is mainly related to their dielectric condition, to which different PD properties contribute. The relation of these PD properties to insulation degradation of a cable system and its components is shown in Figure 7-18.

These different PD properties should be taken into account for the determination of the insulation quality. Their effect on the degradation of the insulation is dependent on each PD property. To discern the contributions of different PD properties, different weights are assigned to each PD property.

Figure 7-18 gives a general overview of PD measurements on power cable systems. The relations shown are rules of thumb that support the analysis of the measurement results. The decision criteria are based on the different aspects of the cable’s condition obtained from inspections (a set of PD properties obtained from a measurement). The knowledge rules describe the effect of each PD property on the insulation degradation, as determined by the size of the detected PD property and its weight.

Figure 7-18 Relations of the Directly and Indirectly Analyzed PD Properties

To support decision making based on the insulation quality of a cable system, a flow diagram that uses the different PD properties can be applied, as shown in Figure 7-19. By analyzing the measurement data according to the diagram, the insulation condition of the cable system is assigned to one of three classes:

• Not OK: Defective cable component (or length) in the cable system should be replaced, as multiple PD properties are outside the experience norms.

• Trending: Possible degradation. Trending of the cable component is required (for example, after 1 year or 3 years), as some of the PD properties are outside the experience norms.

• OK: No weak spots in the cable system. Cable system is OK, as none of the most important PD properties are outside the experience norms.
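
The three classes can be illustrated with a minimal Python sketch. It is a simplification and not the flow diagram of Figure 7-19: it ignores the order and the weights of the PD properties and only counts how many properties fall outside their norms. The property names, the norm values, and the `not_ok_count` threshold are hypothetical and chosen for illustration only.

```python
from enum import IntEnum


class Condition(IntEnum):
    """Insulation classes from the text; a larger value means a worse condition."""
    OK = 0
    TRENDING = 1
    NOT_OK = 2


def outside_norm(value, kind, limit):
    """Return True if a measured value violates its norm.

    kind "max": value must not exceed limit (for example, a PD level).
    kind "min": value must not fall below limit (for example, the PDIV).
    """
    if kind == "max":
        return value > limit
    if kind == "min":
        return value < limit
    raise ValueError(f"unknown norm kind: {kind!r}")


def classify_component(measured, norms, not_ok_count=2):
    """Classify one cable component from its measured PD properties.

    not_ok_count is an assumed threshold: this many violated norms
    (or more) is read as "multiple PD properties outside the norms".
    """
    violated = [
        name
        for name, (kind, limit) in norms.items()
        if name in measured and outside_norm(measured[name], kind, limit)
    ]
    if len(violated) >= not_ok_count:
        return Condition.NOT_OK, violated
    if violated:
        return Condition.TRENDING, violated
    return Condition.OK, violated


# Hypothetical experience norms for one component type (illustrative values).
NORMS = {
    "pdiv_kv": ("min", 9.0),
    "pd_level_u0_pc": ("max", 1000.0),
    "pd_level_2u0_pc": ("max", 5000.0),
}

if __name__ == "__main__":
    sample = {"pdiv_kv": 7.5, "pd_level_u0_pc": 400.0, "pd_level_2u0_pc": 900.0}
    condition, reasons = classify_component(sample, NORMS)
    print(condition.name, reasons)
```

Returning the list of violated properties together with the class keeps the decision traceable to the norms that caused it.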

Figure 7-19 Decision Support Flow Diagram for PD Diagnosis

The order of the PD properties in the decision diagram is based on the contribution and the recognition value of each PD property for the degradation processes. On the basis of these weight factors, a decision can be formulated on the insulation condition. Because the cable system consists of a series of different components, the condition of a cable system is determined by its weakest link. So, by combining the conditions of the individual components, with the worst component governing, the quality of a cable system can be indicated as “not OK,” “trending,” or “OK.”

This approach provides a defensible maintenance plan in the case of a critical decision. The measurement values and the norms can be compared on a per-component basis.

Knowledge Rules

To find the rejection levels of the different properties, replaced cable components are often inspected visually. However, it is very difficult to establish general relationships between the degradation symptoms and the forensic evidence. In most cases, an inspection gives only a limited view of the interior of the opened component. Opening a component requires destructive actions, which affect the accuracy of the inspection. The analysis of the forensic evidence also often relies on subjective judgments made during the visual inspection. For example, it is not possible to determine visually whether the electric field at the location of a cavity in the insulation is high enough to ignite PD. When a cable component is opened, usually more than one defect is found, and manually constructed components often show additional defects. It is therefore always difficult to relate the defects found to the detected PD properties by visual means.

The relationship between the PD symptoms and the forensic evidence is therefore mostly inaccurate. The solution for analyzing the diagnosis of a cable system (or its components) is to compare it to other measurement results. This comparison can be performed in two different ways, as illustrated in Figure 7-20:

• Time analysis: Comparison of a PD property to the values obtained in previous measurements on the same object. If the trend of a PD property is unfavorable (for example, increasing PD magnitudes), the condition of that component is deteriorating from the point of view of that property. Trending of a power cable condition may be time-consuming and costly, but it clearly shows the advancement of a degradation process. A trend line together with a norm level contributes to a failure prediction.

• Type analysis: Comparison of a PD property to the same property as obtained for the total population of the same component type. If a PD property of a specific component is outside the range of the typical observations, the condition of that component deviates from the normal condition of the total population from the point of view of that property. As a result, the value of a PD property obtained in a measurement can be compared to the statistical distribution of that PD property in the true population. A sample is generally rejected as belonging to a certain population when it falls outside the 90% or 95% confidence interval of the true population.
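
Both comparisons can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method used in the database: the time analysis fits an ordinary least-squares line and extrapolates it to the norm level, and the type analysis uses a one-sided empirical quantile (nearest rank) because a high PD magnitude is the unfavorable direction. All function names and numbers are hypothetical.

```python
import math


def fit_trend(times, values):
    """Least-squares line through (time, value) pairs; returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("at least two different measurement times are needed")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def time_to_norm(times, values, norm):
    """Time analysis: time at which the trend line reaches the norm level.

    Returns None when the fitted trend is flat or decreasing.
    """
    slope, intercept = fit_trend(times, values)
    if slope <= 0:
        return None
    return (norm - intercept) / slope


def outside_population(value, population, level=0.95):
    """Type analysis: True if value exceeds the empirical quantile at `level`."""
    ordered = sorted(population)
    rank = max(1, math.ceil(level * len(ordered)))
    return value > ordered[rank - 1]


if __name__ == "__main__":
    years = [0, 1, 2, 3]            # hypothetical measurement times (years)
    pd_pc = [100, 200, 300, 400]    # hypothetical PD magnitudes (pC)
    print(time_to_norm(years, pd_pc, norm=1000))
    print(outside_population(400, population=list(range(1, 101))))
```

A linear trend is only one possible choice; a degradation process that accelerates would call for a different model.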

Figure 7-20 Example of Time (Upper) and Type (Lower) Analysis

Both of the previously described analyses show that there is an independent way to determine diagnostic norms for a condition assessment. Tolerance levels are necessary to assess the deviation of a diagnostic property in relation to time or population. These tolerance levels can be determined independently by making statistical analyses for large amounts of condition data. From these large amounts of data, the statistical distributions can be obtained for the different diagnostic properties. The populations used for the statistical analysis should be representative of the total population of the full network. In order to make these statistical analyses of field measurements, a database is necessary for the consistency of the performed analysis.

Database for Condition Assessment Support

To support a condition assessment of distribution power cables, database systems are needed to manage the data from measurements and inspections. The database for management of measurement data is also used to perform statistical analysis from which knowledge rules and norms can be set. The system, as applied for the data collection and analysis in this report, consists of the following components:

• Module to input or define the cable data (such as structure and component types)

• Module to input measurements

• Module to analyze the data

• Database to store the data

Separating the functionality of the system into these four blocks has the advantage that extra modules can be added to store data in, or to use data from, the database.

The database system should be able to interact with its environment to be functional. To do so, four modules will be implemented as separate software programs, respectively called:

• Definition

• Input

• Measurement

• Analysis

A schematic structure of a diagnostics database is shown in Figure 7-21.

Figure 7-21 Schematic Structure of a Diagnostics Database

A set of diagnostic tools provides a number of algorithms for the indication of the condition of an HV component. Depending on the diagnostics, a choice has to be made as to which data should be collected in the database. In this respect, only the condition data that provide relevant information on the condition of an HV component and that can be used for data mining purposes should be stored. Free-text remarks written by individuals are very hard to compare, but standard remark texts or numerical properties can be used for a variety of analysis purposes.

The different cable component types can also be defined in the database. For each cable component type, norms can be assigned for the different PD properties. The diagnosed cable systems are composed of these components and added to the database, as shown in Figure 7-22. All of the required information on the cable system can be added. The measurement data obtained are assigned to the individual cable components or to the full cable system.

Figure 7-22 Screenshot of Cable Sections

After one of the cables in the database is selected, measurement results can be added, removed, or updated for the cable system or for the individual components, as can be seen in Figure 7-23. On the left, an overview of a cable system is shown; after a selection is made, the measurement input can be filled in. The entered measurement data are then available for time analysis or type analysis. Because the database can contain a large amount of data, filters can be used to restrict the amount of data for analysis purposes.

The analysis can be performed on three different levels:

• Component level

• Type level

• Group level

The component view is used to view and analyze the measurement data per component. The type view enables statistical analysis of components of the same type. The group view is used to view and analyze the measurement data on groups of components.

Figure 7-23 Measurement Add and Update Screen

A filter is constructed of rules, and rules are constructed of conditions. After selecting the root of the tree, controls appear for adding rules. When a rule is selected, conditions can be added to this rule, or the rule can be removed. A condition can be used to restrict the value of a property of a component (such as a joint, termination, cable part, or cable system) (see Figure 7-24). Expressions can be composed of one or two properties and an operator to restrict the amount of data to be analyzed. By combining these restrictions, specific subsets of the data can be selected for analysis purposes (for example, all paper-oil-insulated cable joints of type A with a partial discharge inception voltage [PDIV] below service voltage).
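
The filter structure can be sketched as follows. The text does not state how conditions and rules are combined, so this sketch assumes that all conditions within a rule must hold (AND) and that a record passes the filter if any rule matches (OR). The class, function, and property names are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, List

# Comparison operators available in a condition.
OPERATORS = {
    "<": operator.lt, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
}


@dataclass
class FilterCondition:
    """One expression: a property, an operator, and a constant or second property."""
    prop: str
    op: str
    operand: Any
    operand_is_property: bool = False

    def matches(self, record: Dict[str, Any]) -> bool:
        if self.prop not in record:
            return False
        if self.operand_is_property:
            if self.operand not in record:
                return False
            right = record[self.operand]
        else:
            right = self.operand
        return OPERATORS[self.op](record[self.prop], right)


def rule_matches(rule: List[FilterCondition], record: Dict[str, Any]) -> bool:
    """Assumption: every condition in a rule must hold."""
    return all(condition.matches(record) for condition in rule)


def apply_filter(rules, records):
    """Assumption: a record passes if any rule matches; an empty filter passes all."""
    if not rules:
        return list(records)
    return [r for r in records if any(rule_matches(rule, r) for rule in rules)]


# The example from the text: type A paper-oil joints with PDIV below service voltage.
EXAMPLE_RULE = [
    FilterCondition("component", "==", "joint"),
    FilterCondition("type", "==", "A"),
    FilterCondition("insulation", "==", "paper-oil"),
    FilterCondition("pdiv_kv", "<", "service_kv", operand_is_property=True),
]
```

The last condition compares two properties of the same record, which corresponds to an expression composed of two properties and an operator.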

The analysis on the component level shows the properties that are relevant to the selected component for the different measurement sessions. An output matrix shows the PD properties for each of the measurements. When norms are in use, the elements of the component tree are colored red if they are outside a norm and green if they are inside a norm. This coloring is recursive, so a cable system is colored red if one of its components is red.
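
The recursive coloring can be illustrated with a short sketch. The `Node` class and the treatment of unmeasured elements as green are assumptions made for this example and are not taken from the software described here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """An element of the component tree (cable system, cable part, joint, ...)."""
    name: str
    inside_norm: Optional[bool] = None  # None: no norm result for this element
    children: List["Node"] = field(default_factory=list)


def color(node: Node) -> str:
    """Return "red" if the node or any descendant is outside a norm, else "green"."""
    if node.inside_norm is False:
        return "red"
    if any(color(child) == "red" for child in node.children):
        return "red"
    return "green"


if __name__ == "__main__":
    system = Node("cable system", children=[
        Node("termination 1", inside_norm=True),
        Node("joint A", inside_norm=False),
        Node("termination 2", inside_norm=True),
    ])
    print(color(system))
```

Because the color of a parent depends only on its own result and on the colors of its children, the same function works for any depth of the tree.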

Figure 7-24 Dialog for Adding and Updating a Filter

The analysis at the type level is used to display statistical analyses of measurement data, as shown in Figure 7-25. The tree on the left displays the different types of components. Selecting a component type and a PD property displays the histogram of the measurement results that satisfy the filter and were performed on the selected component type. The measurement data in the histogram can also be displayed and exported as a table.
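
As a minimal sketch of the table form of such a histogram, the following function groups the measurement values of one PD property into bins of equal width. The function name, the bin width, and the sample values are hypothetical.

```python
import math


def histogram_table(values, bin_width):
    """Return rows (bin_start, bin_end, count) for all non-empty bins."""
    counts = {}
    for value in values:
        start = math.floor(value / bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return [(start, start + bin_width, counts[start]) for start in sorted(counts)]


if __name__ == "__main__":
    # Hypothetical PD levels (pC) measured on components of one type.
    for row in histogram_table([120, 180, 250, 900], bin_width=200):
        print(row)
```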

Figure 7-25 Histograms Created in the Type View

When a database is used as described, the integrity of the collected data is an important issue. Suppose that the measurement data in one file or from one measurement session contain information about a cable system and its components, and that one of the subcomponents is later replaced. The measurement data are then no longer valid for the new combination of the component and subcomponents. If these data stay in the database unchanged, they no longer describe the actual configuration. If these data are deleted, information about the non-replaced components is lost.

The cable system changes when components are replaced or when the cable system is extended with another cable system or divided into two parts. The new cable system is no longer the same cable system as the old one. A new cable system should be entered in the database, and the old cable system is marked as “historic” and remains in the database. In this way, cable parts, joints, or terminations can belong to multiple cable systems in the database. Only one of these cable systems can be the “current” cable system. The older cable systems must all be historic.

By using a reference to its former cable system(s), the history of a cable system can be maintained. Because PD property values are linked to the components themselves, the historic data from the components are still accessible from the new “current” cable system. An example is shown in Figure 7-26. At time t, joint B is replaced by two joints, C and D, and a short cable part. The measurement data on terminations 1 and 2 and joint A are the only ones that remain valid in the new situation. The date on which a cable system became historic should be stored in the database so that measurement data that apply only to the historic cable system can still be viewed.
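
The bookkeeping described above can be sketched as a small data model. The class and function names are hypothetical, and the sketch does not preserve the physical order of the components along the cable. Measurements are attached to components, and components are shared between the historic and the current cable system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Component:
    """A cable part, joint, or termination; measurements stay attached to it."""
    name: str
    measurements: List[dict] = field(default_factory=list)


@dataclass
class CableSystem:
    name: str
    components: List[Component]
    predecessor: Optional["CableSystem"] = None
    historic_since: Optional[date] = None  # None means "current"

    @property
    def is_current(self) -> bool:
        return self.historic_since is None


def modify_system(old, removed_names, added, when, new_name):
    """Mark `old` as historic and return the new current cable system.

    Components that are kept are the same objects in both systems, so their
    measurement history stays reachable from the new system.
    """
    if not old.is_current:
        raise ValueError("only the current cable system can be modified")
    kept = [c for c in old.components if c.name not in removed_names]
    old.historic_since = when
    return CableSystem(new_name, kept + list(added), predecessor=old)
```

For the example of Figure 7-26, joint B would be passed in `removed_names`, and joints C and D and the short cable part would be passed in `added`. The measurements on joint B then remain reachable only through the historic system.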

Figure 7-26 Cable System in Original (Left) and Modified (Right) Form

Determinations of Norms and Criteria

The scatter in the distributions of the populations can be rather large because of the relatively small number of observations. For statistical analysis of the different properties, the condition data can be characterized by the type and the shape of the distribution. Calculations on the representative distributions can be performed to determine the deviation of a diagnostic property or to determine the deviation limits (norms) for a population of a property.

From the condition data per type of component, different statistical distributions, as shown in Figure 7-25, can be obtained, depending on the type of diagnostic properties. The distribution used for the statistical analysis should be representative of the total population of the full network.

Figures 7-27 and 7-28 show examples of different PD diagnostic properties for distribution power cables. For the PD level, a Weibull distribution can be applied as a mathematical fit to the distribution of the PD magnitude levels. The Weibull distribution can accommodate the different shapes of the distribution of PD amplitude levels. The parameters of the distribution can then be determined from the sampled measurement data. For the PD occurrence frequency at a location, a Poisson distribution can be applied.

Figure 7-27 Experience Norms/Rejection Levels for the PD Amplitude Levels

Figure 7-28 PD Occurrence Frequency

After the type and shape of a distribution are determined, the required rejection levels are calculated. Because the distribution represents the true population, the norm levels of the different diagnostic properties can be calculated from it. The diagnostic norm level can be set at a chosen level of the distribution (for example, the 95% level). An asset whose value lies within this norm is considered typical for this diagnostic property and asset type and is not expected to fail within the next maintenance interval.
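
As an illustration of how a norm level follows from a fitted distribution, the sketch below computes the quantile of a two-parameter Weibull distribution and of a Poisson distribution. It assumes that the distribution parameters have already been estimated from the measurement data; the parameter values shown are hypothetical.

```python
import math


def weibull_quantile(p, scale, shape):
    """Value below which a fraction p of a two-parameter Weibull population lies."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be between 0 and 1")
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def poisson_quantile(p, mean):
    """Smallest count k for which the Poisson cumulative probability reaches p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be between 0 and 1")
    k = 0
    term = math.exp(-mean)  # probability of a count of 0
    cumulative = term
    while cumulative < p:
        k += 1
        term *= mean / k    # probability of count k from count k - 1
        cumulative += term
    return k


if __name__ == "__main__":
    # Hypothetical fitted parameters for one component type.
    print(weibull_quantile(0.95, scale=500.0, shape=2.0))  # PD level norm (pC)
    print(poisson_quantile(0.95, mean=3.0))                # PD occurrence norm
```

Choosing a different level, such as 90% instead of 95%, moves the norm and therefore changes how many components are rejected.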

The diagnostic norms that are determined depend on the goals of the asset manager. If the failure rate of a component increases, the experience norm should be adapted so that more defective components are taken out of service. Likewise, if the condition assessment interval is increased from 5 to 10 years, the experience norm can be decreased to keep the risk at the same level. Furthermore, because real-time condition data of large populations of various assets are used, the rejection levels are determined in an independent manner. Also, the data are obtained from the same service-aged assets for which the diagnostic norms are determined.

Database Application for Condition Assessment

In the condition database, the cable system component data, the condition data from the diagnostic inspections, and the experience norms (knowledge rules) are combined for condition assessment. As a result, based on the decision support as described previously in this section, an overview of the diagnosed cables in the network (or part of the network) can be obtained, as shown in Figure 7-29.

Figure 7-29 Database View of the Different Diagnosed Cable Systems

Figure 7-29 shows a database view of the different diagnosed cable systems of a specific network owner. The tree of the cable systems, on the left side of the figure, shows the condition of each cable system by color (red, orange, and green). Because a cable system consists of a series of different components, its overall condition is determined by the component with the worst insulation condition. Therefore, the tree can be expanded to show the different components of that cable system, each with its individual insulation condition.

Because all of the PD properties are stored in the database, it is possible to trace back to the criteria on which the decision is based. A filter can be created to decrease the total amount of data for analysis or condition assessment. The right side of the figure shows the PD properties of a cable component with the filter applied, which indicates the PD properties that determine the condition of that component. In this case, the PD properties of a diagnosed cable termination are shown, indicating that its condition is determined by the PD activity in phase L3. The PDIV is below the operating voltage (9 kV), and the PD levels at U0 and 2U0 are above the experience norm for the shrink termination. Phase L1 also shows PD activity in this termination, but the PD properties are not critical.

8 PROJECT OPPORTUNITIES

PFM embraces the use of models in its subprocesses. These models are very useful in predicting such things as:

• Maintenance costs

• Optimum maintenance intervals

• Wear

• End-of-life

• Risk

• Maintenance effectiveness

The development of accurate maintenance models requires an in-depth understanding of functions as well as practical insight into their operation, risk environment, and failure mechanisms. The result of this amassed knowledge is a comprehensive maintenance strategy that properly applies relevant technologies, uses readily available data, and meets reliability and availability goals at the lowest total cost of asset ownership.

These models cannot be developed in a vacuum but must be developed in a collaborative environment involving researchers, asset managers, and maintenance technicians.

Load-Tap-Changer Opportunities

Load-tap-changers (LTCs) continue to be one of the larger consumers of maintenance resources. While great strides have been made in modeling the aging mechanism of some models and subcomponents, the work is far from complete. Prudent application of life extension technologies coupled with better identification of subcomponent deterioration/failure and improved end-of-life predictors can dramatically reduce maintenance costs while simultaneously improving reliability.

By working collaboratively with utilities that have a keen interest in optimizing their LTC maintenance program, a comprehensive methodology for developing and managing these maintenance programs can be developed. The resultant product will include:

• A documented methodology and base data for applying a strategic asset maintenance model

• Revised methodologies on how to perform PFM studies

• For condition assessment tasks, specifically adding:

– Triggers for scheduling condition assessment tasks

– Assessment determinants for triggering condition-directed tasks as a result of the assessment process

• Optimal interval determination

• KPIs:

– Measurement techniques

– Metrics

– Actions to be taken when KPIs show inappropriate trends

• Prioritization methodology

• Linkage to:

– Life extension project

– Best practices project

– Industry database

Medium-Voltage Circuit Breakers

Medium-voltage circuit breakers are the largest major asset components found in the substation arena. These devices are a critical element of the substation’s protection scheme and see numerous switching and fault operations in a typical year. While their individual replacement costs are on the low end of the spectrum compared to power transformers, their population is quite large and the effects of a failure are significant to both the customer and the operating utility.

Because the risk of functional failure is significant and the population is large, traditional maintenance approaches not only dominate the maintenance landscape but also consume a large amount of labor resources. An improved understanding of the failure mechanism and increased use of readily available data from SCADA systems and inspection activities have the potential to not only drive down maintenance costs but also improve reliability and extend useful operating life.

By working collaboratively with utilities that have a keen interest in optimizing their medium-voltage circuit breaker maintenance program, a comprehensive methodology for developing and managing these maintenance programs can be developed. The resultant product will include:

• A documented methodology and base data for applying a strategic asset maintenance model

• Development and implementation of predictive maintenance algorithms

• Revised methodologies on how to perform PFM studies

• For condition assessment tasks, specifically adding:

– Triggers for scheduling condition assessment tasks

– Assessment determinants for triggering condition-directed tasks as a result of the assessment process

• Optimal interval determination

• KPIs:

– Measurement techniques

– Metrics

– Actions to be taken when KPIs show inappropriate trends

• Prioritization methodology

• Linkage to:

– Life extension project

– Best practices project

– Industry database

High-Voltage SF6 Circuit Breakers

High-voltage circuit breakers are the largest major asset components found in the transmission substation arena. These devices are a critical element of the substation’s protection scheme and see numerous switching and fault operations in a typical year. While their individual replacement costs are on the low end of the spectrum compared to power transformers, their population is quite large and the effects of a failure are significant to both the customer and the operating utility.

Because the risk of functional failure is significant and the population is large, traditional maintenance approaches not only dominate the maintenance landscape but also consume a large amount of labor resources. An improved understanding of the failure mechanism and increased use of readily available data from SCADA systems and inspection activities have the potential to not only drive down maintenance costs but also improve reliability and extend useful operating life.

By working collaboratively with utilities having a keen interest in optimizing their high-voltage SF6 circuit breaker maintenance program, a comprehensive methodology for developing and managing these maintenance programs can be developed. The resultant product will include:

• A documented methodology and base data for applying a strategic asset maintenance model

• Development and implementation of predictive maintenance algorithms

• Revised methodologies on how to perform PFM studies

• For condition assessment tasks, specifically adding:

– Triggers for scheduling condition assessment tasks

– Assessment determinants for triggering condition-directed tasks as a result of the assessment process

• Optimal interval determination

• KPIs:

– Measurement techniques

– Metrics

– Actions to be taken when KPIs show inappropriate trends

• Prioritization methodology

• Linkage to:

– Life extension project

– Best practices project

– Industry database

9 NEXT STEPS

This phase of the PFM project was focused on developing the concepts and creating an integrated framework. Many of the concepts have been tested with various degrees of documentation.

The next phase of this project requires a set of collaborating utilities to share their maintenance performance data and jointly develop a core set of performance metrics and aging models. Once real data and aging models become available, predictive models can be further refined and updated by using near real-time data from existing SCADA and other monitoring systems. The result would be a robust set of algorithms that can be used to trigger maintenance and predict remaining equipment life.

10 REFERENCES

1. Technical Update: Maintenance and Monitoring Best Practices for Substations Equipment – 2004. EPRI, Palo Alto, CA: 2004. 1008673.

A APPLICATION STUDY FOR LOAD-TAP-CHANGERS

Performance Focused Maintenance – LTC Application

Over the past several decades, utilities have taken two significant approaches to improve the maintenance effectiveness and efficiency for their LTC transformers. These approaches have focused on maintenance task improvements and technology improvements associated with the operation of the LTC and the use of on-line monitors. While these two approaches have resulted in improvements, they have not necessarily optimized the use of existing data available through SCADA and IEDs or focused on improving the overall performance of the LTC and its associated maintenance.

PFM is an all-inclusive approach to maintenance. PFM brings together what previously appeared to be distinctly different approaches to maintenance under a single umbrella. PFM recognizes that maintenance is both a technical and business process that must be managed and should be very similar across the whole landscape of utilities. PFM acknowledges that the specific application of these processes and approaches will differ due to the wide range of customer requirements, electric infrastructures, and maintenance organizations. The adaptive approach of PFM allows utilities to meet their own specific maintenance and operational goals and at the same time be confident that they are effectively managing the process and following best industry practices.

A framework for PFM, shown in Figure A-1, displays the key concepts that go into a performance-based approach to maintenance. From the diagram, one can see that the approach is quite robust and contains the technical and business elements needed for a world-class maintenance program.

Figure A-1 PFM Framework

To further these concepts, a simple example focused on distribution LTC transformers is presented. This example application will apply only some of the PFM concepts to a family of 20 MVA LTC power transformers. Each of these transformers has an LTC, which will be the focus of this study. Due to a limited amount of available data, only a limited analysis of the main winding will be made, and there will be no analysis of the bushings.

LTC Population Characteristics

Nineteen LTC transformers are included in the analysis population. These transformers are rated:

• Primary voltage = 110 kV

• Secondary voltage = 12.5 kV and 24 kV

• MVA rating = 12/16/20 MVA

All of the LTCs use a reactive type design with a preventive autotransformer. The LTC models and populations are listed in Table A-1.

Table A-1 LTC Population Characteristics, PFM Drivers and Benefits

| Manufacturer | LTC Model | Population | Average Age (years) |
|---|---|---|---|
| Allis Chalmers | TLH21 | 3 | 31.0 |
| General Electric | LR654A | 5 | 33.0 |
| McGraw Edison/Pennsylvania | 550B | 3 | 37.7 |
| McGraw Edison | 550C | 2 | 24.0 |
| RTE | UZD | 2 | 26.0 |
| Westinghouse | URT | 3 | 45.7 |
| Westinghouse | UVT | 2 | 31.0 |

Operating and Maintenance History

For this example application of PFM concepts, a utility has provided EPRI with a limited amount of LTC O&M history. Due to the small amount of available data, some general assumptions are made. These assumptions, while typical of many LTC transformers, may need to be adjusted at a future date in order to better describe the actual condition of the transformers and the future anticipated operating performance of the LTCs.

Specific assumptions being made include:

• No failures have taken place.

• Contact replacement takes place at the time of internal inspection of the LTC even if some life is still left.

• LTC oil is filtered at the time of each inspection.

• No transformer is loaded beyond its nameplate rating.

• The aging of the main insulation package can be determined by oil testing.

• No bushing analysis is made.

LTC Diagnostics and Observations

Recent oil samples from the LTC were analyzed by an oil laboratory, and several observations were made. A summary of these observations is listed in Table A-2.

Table A-2 LTC Condition Summary

| Manufacturer | LTC Model | Oil Test Results and Other Observations |
|---|---|---|
| Allis Chalmers | TLH21 | High moisture (45 ppm) content in LTC oil. |
| Allis Chalmers | TLH21 | Possibility of mild overheating or coking in LTC. |
| General Electric | LR654A | LTC needs minor repairs. |
| McGraw Edison/Pennsylvania | 550B | LTC oil has high arcing products, and oil probably is oxidized. Coking or overheating may have occurred. |
| McGraw Edison | 550C | No problems reported. No problems reported. |
| RTE | UZD | No problems reported. No problems reported. |
| Westinghouse | URT | LTC oil in poor condition. Coking or overheating may be present. |
| Westinghouse | URT | LTC is in bad shape. Needs serious repair to the gearbox and control parts. |
| Westinghouse | URT | LTC oil is oxidized and needs refurbishing. |
| Westinghouse | UVT | LTC oil in worse condition. Moisture: 98 ppm, dielectric strength: 13 kV, IFT: 19 mN/m; needs rerating of bank after LTC internal inspection. |

Industry LTC Experience

A survey of other utilities was made to better understand the experience of others and determine if any of the transformers had a higher than normal failure or LTC wear rate. A summary of this survey is shown in Table A-3.

Table A-3 Industry Experience with LTCs

| Manufacturer | LTC Model | Industry Experience | Industry Rating of LTC |
|---|---|---|---|
| Allis Chalmers | TLH21 | • Proper contact alignment is critical. • Excessive wear above 800 amps. | Poor |
| General Electric | LR654A | • Solid tap changer at currents less than 1000 amps. | Fair |
| McGraw Edison/Pennsylvania | 550B | • Poor contact performance. • Reversing switch was of a poor design. | Poor |
| McGraw Edison | 550C | • Improvement of the 550B model. | Fair |
| RTE | UZD | • Good contact performance. • Have found problems with the barrier board studs. | Good |
| Westinghouse | URT | • Solid tap changer at currents less than 1000 amps. | Fair |
| Westinghouse | UVT | • Some problems with controls. • Vacuum bottles reduce contact wear significantly. • Needs desiccant breather. | Good |

Main Insulation Package Diagnostics and Observations

Recent oil samples from the main tank were analyzed by an oil laboratory, and several observations were made. A summary of these observations is described in Table A-4.

Table A-4 Transformer Insulation Condition

| Manufacturer | LTC Model | Aging | Other Observations |
|---|---|---|---|
| Allis Chalmers | TLH21 | Moderate to marginal | 40–70% remaining life by DP analysis |
| General Electric | LR654A | Normal | 50–95% remaining life by DP analysis |
| McGraw Edison/Pennsylvania | 550B | Marginal | 67% remaining life by DP analysis |
| McGraw Edison | 550C | Marginal | 45–77% remaining life by DP analysis |
| RTE | UZD | Normal | 77–84% remaining life by DP analysis |
| Westinghouse | URT | Accelerated aging on one unit | 12–95% remaining life by DP analysis |
| Westinghouse | UVT | Excellent | 95% remaining life by DP analysis |

PFM Technical Analysis

A traditional failure mode and criticality analysis was performed on this family of distribution class transformers with LTC. The results are summarized in Table A-5.

Table A-5 PFM Technical Analysis Summary

| Functions and Regulatory Requirements | Critical Function (Mark with an “X”) | Failure Modes | Dominant Failure Modes (0 = Exceptional, 1 = Seldom, 2 = Real Possibility) | Failure Effects: Equipment | Failure Effects: System | Failure Effects: Remote | Failure Effects: Customer | Failure Causes | Dominant Cause (0 = Rare, 1 = Seldom, 2 = Real Possibility, 3 = Exceptional Problem) | Aging Mechanisms | Applicable Performance Metrics | Safety Impact? (Yes, No) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Transform voltage at rated KVA | High | Fails to transform voltage at rated MVA | 0; never happens on its own | Component failure | Loss of critical function | Highside protection operates | Extended outage | Result of other failure modes | 0 | No aging mechanism | None | N/A |
| | High | Fails to provide cooling | 1; multi-stage cooling | Loss of life | Could lead to dielectric failure | Highside protection operates | Extended outage | Cooling control failure or loss of station service | 1 | Random | Failure rate of cooling control system | No |
| Automatically adjust output voltage (LTC) | High | Fails to adjust output voltage | 2 | Component failure | Loss of critical function | Maintenance inconvenience | Quality of service | Loose electrical connections | 0 | OEM workmanship | Failure rate for this mode | No |
| | | Fails to adjust output voltage | 2 | Component failure | Loss of critical function | Maintenance inconvenience | None; must take place when xfr is off-line | Contact misalignment or failure | 2 | OEM workmanship | Failure rate for this mode | No |
| | | Fails to adjust output voltage | 2 | Component failure | Loss of critical function | Loss of transfer capacity, possible transformer failure | Extended outage | Drive mechanism | 1 | OEM workmanship | Failure rate for this mode | No |
| | | Fails to adjust output voltage | 2 | Component failure | Loss of critical function | Loss of transfer capacity, possible transformer failure | Extended outage | Failed reversing switch | 2 | OEM workmanship | Failure rate for this mode | No |
| | | Fails to adjust output voltage | 2 | Component failure | Loss of critical function | Maintenance inconvenience | Quality of service reduction (back-up controls prevent overvoltage condition) | Failed controls | 1 | OEM workmanship | Failure rate for this mode | No |
| | | Fails to adjust output voltage | 2 | Component failure | Loss of critical function | Maintenance inconvenience | Quality of service reduction (back-up controls prevent overvoltage condition) | Loss of sensing (voltage input) | 1 | OEM workmanship | Failure rate for this mode | No |
| Manually adjust output voltage (NLTC) | High | Fails to adjust output voltage | 0 = in service; 1 = during installation | Component failure | Loss of critical function | Maintenance inconvenience | None; must take place when xfr is off-line | Loose electrical connection | 0 | OEM workmanship | Report exception during installation or resetting | No |
| | | Fails to adjust output voltage | 0 = in service; 1 = during installation | Component failure | Loss of critical function | Maintenance inconvenience | None; must take place when xfr is off-line | Contact misalignment | 0 | OEM workmanship | Report exception during installation or resetting | No |
| | | Fails to adjust output voltage | 0 = in service; 1 = during installation | Component failure | Loss of critical function | Maintenance inconvenience | None; must take place when xfr is off-line | Broken mechanical connection | 0 | OEM workmanship | Report exception during installation or resetting | No |
| Provide oil level indication | | Incorrectly indicates oil level is too high | 0 | Gauge or float figure | Loss of alarming | None | None | Mechanical binding | 0 | Random | Report exception during inspection | No |
| | | Indicates oil level is ok but is low | 0 | Gauge or float figure | Loss of alarming | None | None | Mechanical binding | 0 | Random | Report exception during inspection | No |
| | | Incorrectly indicates oil level is too low | 0 | Gauge or float figure | Loss of alarming | None | None | Broken float | 0 | Random | Report exception during inspection | No |
| Contain oil | High | Fails to provide containment; minor leak | 1 | Loss of cooling and insulation | Possible dielectric failure as a result of oil heating | Loss of transfer capacity | None; planned load transfer | Rust/corrosion | 1 | Time and location dependent (environment) | Failure rate for this mode | No |
| | | Fails to provide containment; minor leak | 1 | Loss of cooling and insulation | Possible dielectric failure as a result of oil heating | Loss of transfer capacity | None; planned load transfer | Gasket failure | 1 | Time and temperature | Failure rate for this mode | No |
| | | Fails to provide containment; minor leak | 1 | Loss of cooling and insulation | Possible dielectric failure as a result of oil heating | Loss of transfer capacity | None; planned load transfer | Weld fatigue | 0 | Time-OEM manufacturing error | Failure rate for this mode | No |
| | | Fails to provide containment; major leak | 0 | Loss of cooling and insulation | Possible dielectric failure as a result of oil heating | Highside protection operates | Extended outage | Vandalism | 2 | Random | Failure rate for this mode | No |
| Provide connectivity to HV or LV system | High | Failure to provide conduction path | 1 | Local overheating | Hotspot | Could lead to flashover and outage | Voltage flicker and possible outage | Loose bushing connection | 1 | Random | Failure rate for this mode | No |
| | | | 1 | Bushing failure | Loss of function | Highside protection operates | Extended outage | Broken bushing | 1 | Random | Failure rate for this mode | No |
| Provide rated insulation | High | Fails to provide insulation-normal operation | 2 | Loss of function | Damaged coils | Highside protection operates | Extended outage | Overload, overheating | 1 | Time and temperature | Failure rate for this mode | No |
| | | | 2 | Loss of function | Tank rupture or fire (rare) | Highside protection operates | Extended outage | Contamination, water ingress | 2 | Time or random event that allows water ingress | Failure rate for this mode | No |
| | | | 2 | Loss of function | Tank rupture or fire (rare) | Highside protection operates | Extended outage | Manufacturing contaminants | 0 | N/A | Failure rate for this mode | No |
| | | | 2 | Loss of function | Damaged coils | Highside protection operates | Extended outage | Age | 2 | Time | Failure rate for this mode | No |
| | | | 2 | Loss of function | Damaged coils | Highside protection operates | Extended outage | Low oil, loss of cooling | 1 | Temperature | Failure rate for this mode | No |
| | High | Fails to provide insulation-surge and transient | 2 | Loss of function | Damaged coils | Highside protection operates | Extended outage | Overload, overheating | 1 | Time and temperature | Failure rate for insulation failure mode (only root cause analysis can differentiate the exact cause and mode) | No |
| | | | 2 | Loss of function | Damaged coils | Highside protection operates | Extended outage | Contamination, water ingress | 1 | Time or random event that allows water ingress | Failure rate for insulation failure mode (only root cause analysis can differentiate the exact cause and mode) | No |
| | | | 2 | Loss of function | Damaged coils | Highside protection operates | Extended outage | Manufacturing contaminants | 0 | N/A | Failure rate for insulation failure mode (only root cause analysis can differentiate the exact cause and mode) | No |
| | | | 2 | Loss of function | Damaged coils | Highside protection operates | Extended outage | Age | 1 | Time | Failure rate for insulation failure mode (only root cause analysis can differentiate the exact cause and mode) | No |
| | | | 2 | Loss of function | Damaged coils | Highside protection operates | Extended outage | Loss of coil compression | 2 | Time | Failure rate for insulation failure mode (only root cause analysis can differentiate the exact cause and mode) | No |
| | | | 2 | Loss of function | Damaged coils | Highside protection operates | Extended outage | Low oil | 0 | Random | Failure rate for insulation failure mode (only root cause analysis can differentiate the exact cause and mode) | No |
| Provide pressure relief | High | Fails to open | 0 | Damage to pressure relief device | Tank deformation, rupture, or fire (rare) | The initiating event that caused a change in pressure generally causes a protective device to operate. | Extended outage | Corrosion or contamination in spring | 0 | Time and environment | Report exception during inspection | Yes |
| | | Fails to reset | 0 | Damage to pressure relief device | Could allow contamination to enter the transformer | None | May lead to future insulation failure and thus, extended outage | Spring failure | 0 | | Report exception during inspection | No |

PFM Technical Summary

The PFM analysis identified the dominant modes and causes of failure to be:

• Failure to adjust output voltage automatically

– Stationary contact failure

– Moving contact failure

– Reversing switch failure

• Failure to provide insulation – surge and transient

– Loss of clamping pressure

• Failure to provide insulation – normal operation

– Age

– Moisture intrusion

Other causes of functional failure (such as leaks) will not be further investigated because the current substation and transformer maintenance inspection program is a prudent and successful approach.

PFM Risk Analysis

An analysis of the risk associated with the dominant causes of failure was performed. The results are summarized in Figure A-2 through Figure A-6 for the failure modes of:

• Failure to adjust output voltage automatically

• Failure to provide insulation – surge and transient

• Failure to provide insulation – normal operation

Figure A-2 Winding Failure Risk Analysis – Older Westinghouse

Figure A-3 Winding Failure Risk Analysis – Others

Figure A-4 LTC Failure Risk Analysis – Poor Performer

Figure A-5 LTC Failure Risk Analysis – Fair Performer

Figure A-6 LTC Failure Risk Analysis – Good Performer

From the risk analysis, several potential maintenance/reliability areas requiring more in-depth analyses or potentially a change in strategy were recognized. Three areas that will be further analyzed in this report include:

• Advanced aging of some main winding insulation systems

• Short contact lives associated with the TLH21 model of LTC

• The potential to extend maintenance intervals for type UVT and UZD model tap changers

Note: Due to limited information, other areas could not be sufficiently reviewed.

Developing Aging Models

The initial risk analysis gave good insight into areas of potential concern and opportunity. Because little data was available, the process was largely subjective. That limitation does not make the risk analysis less valuable; on the contrary, the analysis also identifies which future data collection activities will benefit the utility.

Two aging models were developed, using PFM techniques to model:

• The aging process of paper insulation in transformers

• The wear process for LTC contacts

The two models developed are graphed in Figure A-7 and Figure A-8.

Figure A-7 Main Winding Aging Model (Normal Loading)

Figure A-8 LTC Wear Model

Implications of Aging/Wear Models

The above aging/wear models are second approximations of how the aging process actually takes place. They improve on first-approximation models, sometimes referred to as failure rate models, in several ways. The improvements include:

• Realization that most aging processes are not linear or constant. The rate of aging can change with time.

• Time is not the only aging mechanism.

• Wear is described as the probability of end-of-life.

• Models are specific to each equipment design.

With the wear models, risk can now be quantified as a function of age, acknowledging that risk is dynamic and changes with time. Risk and benefit can be defined as:

$$\text{Risk} = (\text{Impacts of failure in dollars}) \times (\text{Probability of failure})$$

$$\text{Benefit} = \text{Risk}_1 - \text{Risk}_2$$

Eq. A-1

Where:

$\text{Risk}_1$ = Risk associated with operation or maintenance scenario 1

$\text{Risk}_2$ = Risk associated with operation or maintenance scenario 2
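The arithmetic of Eq. A-1 can be sketched in a few lines of Python. The function names and all dollar and probability values below are hypothetical, chosen only to show how the two scenarios are compared; they are not taken from the study.

```python
def risk(impact_dollars: float, probability_of_failure: float) -> float:
    """Risk in dollars = impact of failure (dollars) x probability of failure."""
    if not 0.0 <= probability_of_failure <= 1.0:
        raise ValueError("probability_of_failure must be between 0 and 1")
    return impact_dollars * probability_of_failure


def benefit(risk_1: float, risk_2: float) -> float:
    """Benefit of moving from scenario 1 to scenario 2 = Risk1 - Risk2."""
    return risk_1 - risk_2


# Hypothetical inputs, used only to illustrate the calculation.
IMPACT = 500_000.0             # assumed cost of one failure, in dollars
risk_1 = risk(IMPACT, 0.04)    # scenario 1: assumed 4% annual failure probability
risk_2 = risk(IMPACT, 0.01)    # scenario 2: assumed 1% annual failure probability
annual_benefit = benefit(risk_1, risk_2)
print(f"Risk1 = ${risk_1:,.0f}, Risk2 = ${risk_2:,.0f}, Benefit = ${annual_benefit:,.0f}")
```

A positive benefit means scenario 2 carries less risk than scenario 1; the cost of achieving scenario 2 would still have to be weighed against it.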

This risk can now be applied to the fleet of transformers to determine the average annual failure rates and the expected budget impacts as shown in Figure A-9.

Figure A-9 Average Failure Rate and Risk for an Aging Fleet of Transformers
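As a rough illustration of how a per-unit risk can be rolled up to a fleet, the sketch below sums expected failures over an age distribution. The age-dependent failure probability, the fleet composition, and the impact value are all assumptions made for the example; the report's actual aging curves are in Figures A-7 and A-8 and are not reproduced here.

```python
from typing import Callable, Dict


def fleet_risk(units_by_age: Dict[int, int],
               annual_failure_probability: Callable[[int], float],
               impact_dollars: float) -> Dict[str, float]:
    """Expected annual failures, average failure rate, and budget impact for a fleet."""
    total_units = sum(units_by_age.values())
    if total_units <= 0:
        raise ValueError("fleet must contain at least one unit")
    expected_failures = sum(
        count * annual_failure_probability(age)
        for age, count in units_by_age.items()
    )
    return {
        "expected_failures": expected_failures,
        "average_failure_rate": expected_failures / total_units,
        "expected_budget_impact": expected_failures * impact_dollars,
    }


def example_probability(age_years: int) -> float:
    """Hypothetical failure probability that rises with age (not from the report)."""
    return min(1.0, 0.002 + 0.0005 * age_years)


# Hypothetical fleet: {age in years: number of transformers}.
fleet = {10: 40, 30: 50, 50: 10}
summary = fleet_risk(fleet, example_probability, impact_dollars=500_000.0)
print(summary)
```

Re-running the same calculation with each unit's age advanced by one year gives the year-by-year trend of the kind plotted for an aging fleet.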

Transformer Winding Maintenance

The PFM analysis reinforced the facts that:

• Renewal of the paper insulation system is not a maintenance activity.

• Renewal of the paper insulation system is, in most cases, not cost effective.

• Coil clamping pressure directly affects the capability of the transformer to withstand a through-fault.

• Aging is a function of:

– Time

– Temperature/loading

– Available oxygen

• An age limit on transformer operating life is prudent and could be determined from the utility’s risk tolerance level.
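The report does not give a formula for the temperature dependence of aging. One widely used relation, offered here only as an illustration, is the aging acceleration factor from the IEEE C57.91 loading guide for thermally upgraded paper, which equals 1.0 at a 110 °C hottest-spot temperature. The sketch below covers temperature and time only; it does not model available oxygen, and the daily temperature profile is hypothetical.

```python
import math
from typing import Iterable, Tuple


def aging_acceleration_factor(hot_spot_c: float) -> float:
    """F_AA = exp(15000/383 - 15000/(hot_spot_c + 273)); equals 1.0 at 110 C."""
    return math.exp(15000.0 / 383.0 - 15000.0 / (hot_spot_c + 273.0))


def equivalent_aging_factor(intervals: Iterable[Tuple[float, float]]) -> float:
    """Time-weighted mean of F_AA over (hours, hot_spot_c) intervals."""
    intervals = list(intervals)
    total_hours = sum(hours for hours, _ in intervals)
    if total_hours <= 0:
        raise ValueError("total duration must be positive")
    weighted = sum(hours * aging_acceleration_factor(temp) for hours, temp in intervals)
    return weighted / total_hours


# Hypothetical daily profile: 16 h at 95 C and 8 h at 120 C hottest-spot temperature.
daily_factor = equivalent_aging_factor([(16.0, 95.0), (8.0, 120.0)])
print(f"Equivalent aging factor for the day: {daily_factor:.2f}")
```

A factor above 1.0 means the insulation consumed more than one day of normal life during the day; a factor below 1.0 means it consumed less.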

A third approximation of winding age should now be developed in order to more accurately quantify the annual risk of failure. This model can take into account the previously mentioned aging elements and be calibrated by a combination of:

• Power factor test measurements

• Oil (furan) tests

• Dissolved gas analysis (DGA) tests (CO and CO2)

The topic of transformer coil clamping pressure must be further investigated. Test techniques such as swept frequency response analysis (SFRA) can detect winding movement resulting from through-faults and reduced clamping pressure. This analysis is after the fact, however: although it can alert the utility to an abnormal risk of near-term failure, it does not prevent winding movement.

LTC Maintenance

The maintenance of LTCs is a traditional renewal activity that can be improved by the results of the above analysis. It is possible to set more exact maintenance triggers for LTC inspection based on the above models and the level of risk the utility is willing to take. Table A-6 identifies some possible trigger levels for LTC internal inspection.

Table A-6 Number of LTC Operations Where 63% Contact Wear Is Expected

| Model | Operations Before Maintenance |
|---|---|
| TLH21 | 15,000 |
| LR654 | 40,000 |
| 550B | 30,000 |
| 550C | 40,000 |
| UZD | 80,000 |
| URT | 30,000 |
| UVT | 60,000 |
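The trigger levels in Table A-6 lend themselves to a simple lookup against each LTC's operation counter. The sketch below is one possible way to apply them; the function name, the three status labels, and the 80% early-warning fraction are assumptions made for illustration, not part of the study.

```python
# Trigger levels transcribed from Table A-6 (operations before maintenance).
OPERATIONS_BEFORE_MAINTENANCE = {
    "TLH21": 15_000,
    "LR654": 40_000,
    "550B": 30_000,
    "550C": 40_000,
    "UZD": 80_000,
    "URT": 30_000,
    "UVT": 60_000,
}


def inspection_status(model: str,
                      operations_since_maintenance: int,
                      warning_fraction: float = 0.8) -> str:
    """Return 'due', 'approaching', or 'ok' for an LTC internal inspection.

    warning_fraction is a hypothetical early-warning margin, not a value
    from the report.
    """
    limit = OPERATIONS_BEFORE_MAINTENANCE[model]
    if operations_since_maintenance >= limit:
        return "due"
    if operations_since_maintenance >= warning_fraction * limit:
        return "approaching"
    return "ok"


print(inspection_status("TLH21", 15_200))  # at or past the trigger level
print(inspection_status("UZD", 70_000))    # within the warning margin
print(inspection_status("UVT", 10_000))    # well below the trigger level
```

A utility with a lower risk tolerance could scale the limits down, which is the same adjustment the text describes as choosing the level of risk it is willing to take.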

Another approach to triggering LTC maintenance is to perform maintenance when the total cost of maintenance and risk is minimal. This approach requires the utility to quantify:

• The cost of maintenance

• The cost of failure:

– Equipment repair and replacement costs

– Contractual impacts

– Supply impacts

– Revenue impacts

• Social impacts:

– Customer impacts

– Environmental impacts

– Political impacts

This approach is location sensitive and results in an optimum maintenance interval. It is approximated in Figure A-10 for demonstration of the concept only.

Figure A-10 Example of Optimizing Maintenance Intervals Based on Lowest Life-Cycle Cost
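The lowest-cost idea can be sketched numerically. The example below assumes a Weibull wear-out curve for the probability of failure before maintenance and a simplified cost rate of (maintenance cost + failure cost × probability of failure) divided by the interval. The Weibull form, its parameters, and both cost figures are hypothetical; the report's own curves are not reproduced here.

```python
import math
from typing import Iterable


def weibull_cdf(t: float, eta: float, beta: float) -> float:
    """Probability of failure by t operations for an assumed Weibull model."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def cost_per_operation(interval: float, maintenance_cost: float,
                       failure_cost: float, eta: float, beta: float) -> float:
    """Simplified cost rate: planned maintenance plus expected failure cost, per operation."""
    expected_failure_cost = failure_cost * weibull_cdf(interval, eta, beta)
    return (maintenance_cost + expected_failure_cost) / interval


def optimum_interval(candidates: Iterable[int], maintenance_cost: float,
                     failure_cost: float, eta: float, beta: float) -> int:
    """Grid search for the candidate interval with the lowest cost per operation."""
    return min(
        candidates,
        key=lambda t: cost_per_operation(t, maintenance_cost, failure_cost, eta, beta),
    )


# Hypothetical inputs: costs in dollars, eta in operations, beta > 1 for wear-out.
candidates = range(1_000, 100_001, 1_000)
best = optimum_interval(candidates, maintenance_cost=5_000.0,
                        failure_cost=200_000.0, eta=40_000.0, beta=3.0)
print(f"Lowest-cost maintenance interval: {best:,} operations")
```

Short intervals are dominated by the cost of frequent maintenance and long intervals by the cost of failure, so the minimum falls between the two. Because the failure cost is location sensitive, the same LTC model can have different optimum intervals at different substations.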

Beyond setting operation limits for each LTC model, the analysis identified a need to reassess the maintenance approach for the model TLH21 tap changer and potentially other models as well. This revised approach may include:

• Use of on-line LTC oil filters

• LTC temperature monitoring – temperature index

• LTC DGA analysis

The second approximation model of LTC contact wear can clearly be improved. A third approximation can be developed that incorporates:

• Actual loading at the time of contact change

• Tap positions at the time of change

• A large sample of observations drawn from multiple utilities

This improvement can be personalized by using readily available, utility-specific O&M data that come from field inspections, SCADA systems, and data historians.

Performance Measurement

Both the PFM technical analysis and the age model identified important data elements to measure during routine transformer O&M as well as metrics to calculate for measuring transformer and LTC performance. Some of the measures and metrics identified during this analysis are shown in Tables A-7 and A-8.

Table A-7 Metrics for LTC Performance

| Function | Information to Collect/Use | Source | Equip. Spec. or Aggregate? | Metrics and KPIs (mark only one: Static Equip. Data? / Measurement Data? / Metric Info? / KPI Info?) | Action If Off-Target | Report Needed |
|---|---|---|---|---|---|---|
| Adjust output voltage (LTC) | Loading | SCADA/historian | Specific | | N/A | Base data |
| Adjust output voltage (LTC) | Tap position | SCADA/historian | Specific | | N/A | Base data |
| Adjust output voltage (LTC) | LTC oil temperature | SCADA/historian | Specific | | N/A | Base data |
| Adjust output voltage (LTC) | LTC oil temperature | SCADA/historian | Specific | | N/A | Base data |
| Adjust output voltage (LTC) | Failure event | CMMS | Specific | | N/A | Base data |
| Adjust output voltage (LTC) | Percent wear at the time of maintenance | CMMS | Specific | | N/A | Base data |
| Adjust output voltage (LTC) | Contact wear as function of operations and model | CMMS | Aggregate | X | Determine if maintenance triggers are correct or if impacted by other factors, such as loading | Wear distribution by age/operations |
| Adjust output voltage (LTC) | Reversing switch wear as function of operations and model | CMMS | Aggregate | X | Determine if maintenance triggers are correct or if impacted by other factors, such as loading | Wear distribution by age/operations |
| Adjust output voltage (LTC) | LTC temperature index | MMW | Specific | X | Perform DGA and analysis and/or schedule internal inspection of LTC | 13-month index history of all LTCs |
| Adjust output voltage (LTC) | Operations per month | MMW | Specific | X | Investigate bandwidth and time delay settings on LTC controls | Items with more operations than monthly threshold |
| Adjust output voltage (LTC) | Reversing switch operations per month | MMW | Specific | X | Investigate NLTC setting | Items with fewer than needed operations of the reversing switch |
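The "operations per month" metric and its exception report can be sketched as a simple filter. The LTC identifiers, the operation counts, and the monthly threshold below are hypothetical; the report does not state a threshold value.

```python
from typing import Dict, List


def over_monthly_threshold(monthly_operations: Dict[str, int],
                           monthly_threshold: int) -> List[str]:
    """Return the LTC identifiers whose monthly operation count exceeds the threshold."""
    return sorted(
        ltc_id
        for ltc_id, operations in monthly_operations.items()
        if operations > monthly_threshold
    )


# Hypothetical monthly counts, for example derived from SCADA/historian tap-position data.
counts = {"SUB1-T1": 310, "SUB1-T2": 1_450, "SUB2-T1": 920, "SUB3-T1": 2_100}
exceptions = over_monthly_threshold(counts, monthly_threshold=1_000)
print(exceptions)  # candidates for a review of bandwidth and time delay settings
```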

Table A-8 Metrics for Main Insulation Performance

| Function | Information to Collect/Use | Source | Equip. Spec. or Aggregate? | Metrics and KPIs (mark only one: Static Equip. Data? / Measurement Data? / Metric Info? / KPI Info?) | Action If Off-Target | Report Needed |
|---|---|---|---|---|---|---|
| Provide rated insulation | Outage event/frequency for insulation failure mode | | Specific | X | Determine if it is a batch or age problem | Outage count distribution by mode and age |
| Provide rated insulation | Outage duration for insulation failure mode | | Specific | X | Look at emergency replacement process | Outage duration distribution by mode |
| Provide rated insulation | Customers affected for insulation failure mode | | Specific | X | Determine if it is a batch or age problem | Customers impacted distribution by mode and age |
| Provide rated insulation | DGA results | | Specific | X | Determine if a specific transformer problem exists | Items exceeding gas threshold |
| Provide rated insulation | Furan results | | Specific | X | Determine if a specific transformer problem exists | Items exceeding implied age or furan threshold |
| Provide rated insulation | Failure mode distribution by age | | Transformer application specific | X | Investigate OEM quality control | Failure mode distribution by age |
| Provide rated insulation | Failure rate by age for insulation failure mode | | Transformer application specific | X | Investigate OEM quality control | Failure rate distribution by mode |
| Provide rated insulation | SAIDI for all xfr insulation failure modes | | Aggregate | X | Modify replacement criteria or aging model | SAIDI report by application |
| Provide rated insulation | SAIDI for all xfr insulation failure modes | | Aggregate | X | Modify replacement criteria or aging model | SAIFI report by application |
| Provide rated insulation | Operating life index distribution | | Aggregate | X | Review replacement plan | Graph showing the current operating life distribution and the trends toward older or newer |

Conclusion

The concepts of PFM are thorough and cover the technical, business, and information requirements of a robust and responsive maintenance program. Although the analysis presented above is short and limited, it shows how these concepts can be expanded to give further insight into how to manage and improve maintenance. Conclusions drawn from a larger pool of data than was available here could have a significant financial and reliability impact on a utility. Potential impacts include:

• Developing an objective transformer replacement program

• Delaying premature transformer replacements where prudent

• Limiting risk equitably across the whole population of transformers

• Changing LTC purchase specifications so that life-cycle O&M costs are minimal and maximum reliability is realized

• Identifying how existing data sources can be used as drivers in a PM approach

• Statistically demonstrating the value of maintenance

• Identifying areas of maintenance that are not under control

© 2005 Electric Power Research Institute (EPRI), Inc. All rights reserved. Electric Power Research Institute and EPRI are registered service marks of the Electric Power Research Institute, Inc.

Printed on recycled paper in the United States of America

Program:

Substations

1010555
